The Oasis Digital Blog

Merge Queue Tools

Introduction

In small projects, there are typically a few developers and a few branches underway. As those branches reach readiness, depending on the flow of the project each developer may merge their own branches, or a tech lead may merge on request. Or sometimes they pile up (in a queue, conceptually) for review and merge.

But in large teams working (either a large single project or a monorepo) there can be dozens/hundreds/thousands of developers creating branches all the time. Many teams have the practice of doing a modest or moderate amount of work per branch, resulting in a large number of branches that must be merged to move the work forward.

To be merged, a change (branch) needs to be at least conceptually on top of the current mainline (regardless of whether a merge-commit or rebase strategy is used), and tests/lint/build/etc must pass. As each merge happens, the baseline moves forward – so the next candidate sometimes can’t be accurately tested until that point. At scale, this becomes the limiting factor in system progress: If the tests take 1 hour, the maximum number of merges you can do per day can be as low as 24.

Long before that limit though, having humans perform each of these operations is impractical, so teams typically adopt an automated system to queue up proposed changes/branches for merge. The function is roughly:

Developers submit a branch as ready for merge, into a queue; often mediated by a human review and approval process.
Automated merge system keeps picking the top thing in the queue.
Rebase it on the current mainline.
Test, Lint, Build, etc.
Hopefully, merge.
Repeat.
When a change fails to re-baseline / merge / test, notify the developers working on that branch, they will need to resolve either a syntactic or semantic merge conflict and resubmit.

Of course there are strategies to increase parallelism; the process remains conceptually serialized but practically parallelized.

Related or unrelated branches can be combined and tested + merged as a group.
Use a build system that has a great understanding of what subset must be re-tested for a given set of changes.
Pre-build each candidate on a recent baseline, to pre-populate a build cache; a well-crafted build process (with Bazel or other tools) can safely optimize away some of the rebuild-everything process.

Some Tools

Here are some of the available merge queue tools – but many organizations create an internal merge queue management system.

Mergify
Bors (Good post about it)
MergeQueue
GitHub merge queue feature (beta)
“Merge Queue works by validating different combinations of pull requests identified as “ready to merge” in parallel”
Gerrit Add-ons like gerritt-queue
Plastic SCM has a built-in merge queue
AutoMerger (article)
Kodiak
Butler (internal? tool at Strava)

For now, use of an automated merge queue remains mostly a marker of large-team big-tech-co needs; over time I expect it to be a routine in-the-box tool almost as common as CI.

On-site is back – COVID-19 pandemic update

Last spring when the pandemic arrived, we quickly transitioned to 95% online operations, as detailed in our post at the time. Throughout, Oasis Digital has continued serving customers online/remotely, delivering results effectively albeit less conveniently. We have had no known workplace-spread illness.

Here is the United States, as of April 2021 the pandemic appears to be winding down quickly. Most of our team is vaccinated, our headquarters office is in full use, and client interest in on-site / in-person engagements has started to return.

Therefore, we are now scheduling on-site client engagements, of course with due care and precautions.

Our open-enrollment classes will gradually include in-person dates later in 2021.

(We are keenly aware that the pandemic is currently much worse in some other areas of the world.)

Full Stack Hiring – Apply Now

Update May 2021 – we hired 5+ developers in early 2021, and are now looking to hire 5+ more.

Oasis Digital is growing – maybe you should be on the team! We are again looking for more great developers. All the usual information is available on our careers page.

All the info is there of course, including a video with lots more about working here. Here are the key aspects:

The core of our team works at our headquarters in St. Louis MO – at least during non-pandemic times. But more of us also work around the USA (and a few around the world) on our customer projects.
We mostly hire full-time employees, with contractors for certain circumstances.
We hire at a wide range of levels; entry-level software developers eager to grow fast, through grizzled gurus who already have the deep understanding and experience our most critical customer needs.
No “take-home interview coding test”; rather we will program with you at the relevant stage of the hiring process to get to know each other, and see your abilities in action.
The most common key skill for new hires is Angular, React, or one of the other competing frameworks.
“Full stack” is best, though many new hires start in one layer of the stack (for example, in the browser, on the server) and learn the rest over time.
Please apply via our site linked above – that is machinery we use, resumes that arrive any other way tend to get lost.
Our hiring process is about the same as in this 2018 post.
Feel free to reach out by email (address at the lower right of our web site) and ask questions about the job, to figure out if you may want to apply.

Also, a bit about our philosophy of hiring:

Hiring is a marketing-like process, not a sales-like process. We aim to get the word out so that the right people find us.
Hiring is a collaborative process of looking for the best match, not a race or competition.
We hire people whose time is valuable, so we optimize for rapid assessment and understanding.

Blog posts can get stale. Our current hiring is always listed on our careers page!

Angular for Architects – preview/beta class, Jan 2021

First we worked for years on numerous enterprise Angular projects. Then we spent months organizing the knowledge of our experts. The result: Angular for Architects, now available from Oasis Digital.

This class is about both the coding aspects of Angular architecture, and also the decisions, patterns, costs, and constraints that drive architectural decisions.

Early access / Beta test class

To prepare and validate the curriculum for Angular for Architects, we are running a special beta-test version of the class. To show appreciation for attendees willing to jump in at this stage, the price is cut by half.

The first such Beta test is in January. Depending on how it goes we run a second “beta”, or proceed to standard pricing.

Who Should Attend?

Decision makers responsible for large or complex Angular projects
Angular tech leads and project managers
Application and system architects
Teams looking for a shared understanding of Angular architecture and practices

NgRx is 40x faster than your code – find out why

When I started using @ngrx/store to hold collections of information, I usually put the data into the store as a JavaScript array. It seemed to be the simplest and most appropriate data structure for the information. However, when @ngrx/entity came out, I saw that it used a different pattern – instead of using the array directly, it converts the array to two data structures; an array of ids and an object map keyed by those ids. Why did they do this? And is there a lesson we can learn for our own code?

Review and Advice – ongoing support engagements

Here at Oasis Digital, one of the services we offer is to “review and advise” about a team’s applications. Today, let’s elaborate on the meaning of these services listed on our web site.

Advanced, Angular-related training and mentoring

About five years ago (it feels like forever) Oasis Digital started training on Angular. Our flagship course Angular Boot Camp has become quite popular, we’ve taught it many hundreds of times to many thousands of students. For the first few years, this offering was a perfect fit for almost every company that contacted us, as software teams were initially adopting Angular. Over the last few years though, Angular has become mature and robust, and Angular has achieved broad adoption across organizations large and small. Aggregate needs of Angular teams inevitably shift toward bigger scale, more difficult and important uses of the technology.

As a result, our training efforts have substantially pivoted toward more advanced topics.

Continue reading Advanced, Angular-related training and mentoring

Transposing Rows and Columns in ag-grid

Real-world Angular applications often need to present tabular/grid data, and most grids make the most sense when presented with each column representing a certain type of data. For example, on a spreadsheet showing a pay schedule for a loan, the first column could be a date, the second column could be the interest accrued, the next could be the size of the payment, etc.

However, we sometimes need to show data in a transposed format, where the rows instead of the columns need to show a consistent data type. This is a rare case, which is why some major grid libraries like ag-grid don’t provide native support for the feature, but it’s still necessary.

Fortunately, ag-grid gives enough power to developers to be able to transpose data for display, and even to have features like renderers and editors apply by row instead of by column.

I’ve shared a small Angular app on GitHub that I’ll use to step through the process of transposing data. Think of it as a prototype for an app that shows names as they are translated in various languages (Matthew, Mateo, Matthäus, Матфей, etc). You can see the live-running app on StackBlitz, which shows the evolving grid on separate tabs.

Thinking hard about a project launch

Here at Oasis Digital, we are always agile, and depending on the project needs, sometimes use Agile (in the “capital A” sense) processes extensively. Yet regardless of agility, iterations, steering, and so on, the planning and decisions made at the beginning (and in the early months) of a project often have profound and very difficult to change consequences later.

Therefore, while we would never argue for the straw-man Waterfall, we aim to think very hard about a project at the beginning.

Continue reading Thinking hard about a project launch

Querying without OR in Firestore

Background

Here at Oasis Digital we have successfully used the Firebase Realtime Database, and more recently the (beta as of July 2018) Firebase Firestore. These similarly branded offerings have important feature differences, and the latter appears likely to be the recommended choice in the future.

Firestore is a globally scalable, fully managed, document oriented NoSQL database. It is suitable for a very small team to build an application which could then scale to a vast user base with very little system administration work; of course, there are feature trade-offs which enable these amazing properties. Notably, Firestore has important structural limits on the types of queries that can be performed. For example, it has no joins, no “OR” criteria, and limited range (inequality) queries. As I understand, these limitations are what make it possible to engineer Firestore operations to “cost” (and therefore be priced!) in proportion to the amount of data returned.

We recently implemented a Firestore application (with Angular, Angularfire2, and Firebase Functions) in which the query limitations initially were an obstacle; but we found solutions capable of producing great results nonetheless.

Querying workflow state

There are countless scenarios for querying a data store, of course. A simple, common such scenario is an application in which each “document” represents an entity that moves through a workflow over its lifespan. The problem domain doesn’t matter; but one common concrete example is order workflow in a e-commerce system. Such a workflow could look something like this:

New -> Verified -> Scheduled -> In progress -> Preparing -> Shipped -> Delivered -> Closed

Or more generally, think of a workflow feature as the movement of an entity through a series of states:

S1 -> S2 -> S3 -> S4

(In both examples I have drawn simple, linear flows – most of our production/customer software has workflows with looping, branching, and other complex considerations.)

In a system with workflow features, it is very common to need to query a group of entities which are in a certain state – and also very common to query the entities within a set of states. For example, a feature might tally or otherwise view “all orders that have not yet shipped”, which comprises 5 states in the example workflow above.

Firestore “schema” design

Given the absence of OR queries in Firestore, and the frequent need to query entities that are in one state OR another, how should we represent workflow state in a Firestore implementation?

With a traditional, relational database, the main driver of data modeling is to concisely represent the underlying data and ideally “make illegal states unrepresentable”. A representation oriented for this kind of database should follow one of the normal forms, mathematically justified decades ago.

With Firestore (as with most NoSQL data stores), data modeling has a different main driver. An application instead stores data in a way to enable whatever kinds of queries are needed, given the query capabilities. This potentially implies a significantly different data layout, although we often start with something similar to a traditional RDBMS schema then diverge as needed.

Keeping that in mind, show should we store “what state is this entity in?” in a document in Firestore?

Approach 1: Single state field

The simplest solution is one field to represent the state of the entity represented by a Firestore document. The data storage looks something like this:

state: ‘Scheduled’

Querying entities in a single state is trivial with such representation:

.where(‘state’, ‘==’, ‘Scheduled’)

Querying entities in several states requires running a separate query per state, then combining the results together in client code.

Unfortunately, this “combine results in client code” approach, though mentioned in the documentation, has unpleasant consequences. Consider a case where there are 500 entities in state S1, and 500 more entities and state S2, along with some other fields (perhaps “due date”) that ranks all of the entities. Then try to write a query (or pair of queries) to retrieve the 500 “soonest” entities that are in either of these two states:

.where(‘state’, ‘==’, ‘S1’).orderBy(‘dueDate’).limit(500);

.where(‘state’, ‘==’, ‘S2’).orderBy(‘dueDate’).limit(500);

(Your client code would combine the results, sort by due date, and discard all but the first 500 of the combined list.)

Unfortunately, this query now potentially “costs” twice as much, in both time and money, as it should; it may query up to 1000 documents only to discard 500 of them. Still, with this extra cost, time, and client-side query implementation, the single state field approach does work.

Approach 2: One field per state

With this next approach, entity state is represented by a set of flags, one for each state. Typical data could look something like this:

state: {
    New: false
    Verified: false
    Scheduled: true
    InProgress: false
    Preparing: false
    Shipped: false
    Delivered: false
    Closed: false
}

This is more verbose, but easy to understand and implement. Querying for a single state is as easy as before:

.where(‘state.Scheduled’, ‘==’, true)

At first glance, the limitation on OR queries appears to stymie a multiple-state query with this schema. However, with a bit of Boolean logic these OR operations can be swapped out for ANDs and NOTs in the right combination. Transform the desired OR of all the states you want to exclude – into a query which instead excludes all the states you want to exclude.

Continuing with the order-management example, imagine we want all of the orders in the first three states. The query looks something like this:

.where(‘state.InProgress', '==', false)

.where(‘state.Preparing', '==', false)

.where(‘state.Shipped', '==', false)

.where(‘state.Delivered', '==', false)

.where(‘state.Closed', '==', false)

This is simple to implement, mechanically. A bit of utility code could perform the correct set of WHERE operations, leaving application code straightforward.

How might this scale? This is an unknown – these are a lot of ANDs (especially for an entity with many states), which might need more Firestore indexes, or might stress Firestore query mechanism in unexpected ways, or might hit a limit on the number of allowable (or advisable) ANDs.

This is probably the best approach for a small number of states.

Approach 3: Combinatorial state fields

Given how well mathematical logic worked in the previous approach, what if we took it further? When writing an updated state of an entity/document, application code (utility code) could emit all of the combinations of states that include the current state. Concretely, consider the S1/S2/S3/S4 example. If an entity is in state S2, that could be represented like so:

state: {
    S1: false, // or omit the ‘false’ entries
    S2: true,
    S3: false,
    S4: false,
    S1_S2: true,
    S1_S3: false,
    S1_S4: false,
    S2_S3: true,
    S2_S4: true,
    S3_S4: false,
    S1_S2_S3: true,
    S1_S2_S4: true,
    S1_S3_S4: false,
    S2_S3_S4: false,
    // S1_S2_S3_S4 not necessary, would always be true
}

This approach pre-computes the answers to all possible queries of sets of states. Such a representation could be generated easily and consistently by utility code. Queries, again with the bit of utility code to generate them, can find all documents (entities) in any set of states with a single WHERE. For example, to look for all documents that are in state S2 or S3:

.where(‘state.S2_S3’, '==', true)

This will have efficient query characteristics, but could run into limitations around the number of allowable fields in a single Firestore document. Also, with Firestore there is an index for each of these fields, and each index increases the Firestore storage costs and makes document updates a bit slower.

Approach 4: Partial combinatorial state fields

This approach is like the previous approach, with one optimization: rather than pre-generate all of the possible combinations, instead only generate those combinations which enable queries the application actually performs. Typically an application will only use a a few handfuls of combinations of states that an application ever queries for – vastly fewer than the number of combinations of states – which as you may recall from math courses, involves factorials.

The obvious downside: if an application later needs to query a different combination of states, it must first update every document to add the new bit of data, or fall back to another approach.

Approach 5: State range with a numeric field

Up to this point, every approach would work with any kind of workflow, regardless of its linearity. But many applications have a primarily or entirely linear workflow – and that can be used to ease the query challenge. Consider a simple numerical indicator of the state:

state: 3 // entity is in state S3

This enables state “range” queries, like so:

.where(‘state’, '<=', 2) // Entities in state S1 or S2

.where(‘state’, '>=', 3) // Entities in state S3 or S4.

It’s also possible to query any contiguous range of states:

.where(‘state’, '>=', 2)

.where(‘state’, '<=', 3)

Unfortunately, this approach has a serious downside: a Firestore query can only do range queries on a single field. Therefore with this approach, it is impossible to query questions like “entities in state S2 or S3, ordered by due date, first 50”, because the single range query “slot” is used up by the state part of the query. Because of this obstacle, it’s hard to recommend this approach.

Approach 6: State range with boolean fields

Fortunately, there is another variation of the range idea that is quite workable. Use a flag for each state, marked as true if the entity is at or after that state. So for example, an entity at state S3 would be like so:

state: {
    S1: true,
    S2: true,
    S3: true,
    S4: false,
}

Each flag represents a range of states; as before, utility code could implement this representation generically and reliably. Conceptually, a single WHERE can find all entities before or after a point in the workflow. For example, these queries split the state space before/after the S2/S3 boundary:

.where(‘state.S3’, '==', false) // Entities in state S1 or S2

.where(‘state.S3’, '==', true) // Entities in state S3 or S4.

A pair of WHERE clauses (implicitly ANDed) can find all the entities in any contiguous range of states. For example, to find all entities in S2 or S3:

.where(‘state.S2’, '==', true)

.where(‘state.S4’, '==', false)

This approach both avoids an factorial expansion in the number of state variables, avoids major query complexity, and keeps the inequality-query “slot” available for other uses, and is therefore likely an excellent choice if states are mostly linear.

Conclusions and the future

For an application being implemented today on Firestore, which has workflow scenarios where the lack of OR queries is an obstacle, likely one of the above approaches will do a sufficient job. Looking forward to the future, it seems likely the Firestore team will find ways to expand the query capabilities without loss of performance – perhaps rendering these approaches obsolete.

Writing a Generic Type-Safe ng-bootstrap NgbModal Launcher

For an Angular project for one of our clients, I’ve recently started using ng-bootstrap to implement standard modal dialogs in a ModalService. This service has methods to launch confirmation dialogs, input dialogs, message dialogs, etc; you can see a simplified version of this service on StackBlitz.

An addition to the reusable standard dialogs, we also needed to support custom one-off dialogs, and we wanted to use the same general approach, without adding a bunch of duplicate code. Most of the implementation was straight-forward, but adding type safety to the generic launcher was more interesting. Read below to see how interfaces from TypeScript and Angular made it easy.

Continue reading Writing a Generic Type-Safe ng-bootstrap NgbModal Launcher

ng-conf 2018

ng-conf 2018, one of the two “main events” of the Angular calendar, just wrapped up. As usual Oasis Digital was a sponsor, among more and more companies each year.

Perhaps the most interesting news from the conference came in the first few minutes: the main angular.io website (containing the documentation and other overall information about the framework) now receives 1.25 million unique visitors per month. Of course not all of these are Angular developers, on the other hand there are no doubt many Angular developers who visit the documentation quite rarely. Therefore it seems safe to assume there are at least 1.25 million Angular developers out there – the Angular user community is large and growing fast.

On a closer-to-home note, it is always rewarding to have numerous past students visit our booth, and to see numerous past private class customers visible at the conference as major Angular users.

For more on the technical content, please join us for our Angular Lunch user group meeting this week.