In small projects there are typically a few developers and a few branches in flight. As those branches become ready, depending on the project's workflow, each developer may merge their own branches, a tech lead may merge on request, or branches may simply pile up (conceptually, a queue) awaiting review and merge.
But in large teams (working on either a single large project or a monorepo) there can be dozens, hundreds, or thousands of developers creating branches all the time. Many teams have the practice of doing a modest or moderate amount of work per branch, resulting in a large number of branches that must be merged to move the work forward.
To be merged, a change (branch) needs to be at least conceptually on top of the current mainline (regardless of whether a merge-commit or rebase strategy is used), and tests/lint/build/etc. must pass. As each merge happens, the baseline moves forward, so the next candidate sometimes can't be accurately tested until that point. At scale, this becomes the limiting factor in system progress: if the tests take 1 hour, the maximum number of merges you can do per day can be as low as 24.
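That ceiling is just arithmetic: with a fully serialized queue, throughput is bounded by the wall-clock time of one complete rebase + test + build cycle. A trivial sketch of the bound:

```python
# Serial merge-queue throughput: one candidate is validated at a time,
# and each must be validated against the mainline produced by the last merge.
HOURS_PER_DAY = 24

def max_merges_per_day(validation_hours: float) -> float:
    """Upper bound on merges/day when every merge waits for a full
    validation cycle (rebase + test + lint + build) to finish."""
    return HOURS_PER_DAY / validation_hours

print(max_merges_per_day(1.0))   # a 1-hour suite caps you at 24 merges/day
print(max_merges_per_day(0.25))  # a 15-minute suite caps you at 96 merges/day
```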
Long before that limit though, having humans perform each of these operations is impractical, so teams typically adopt an automated system to queue up proposed changes/branches for merge. The function is roughly:
- Developers submit a branch to the queue as ready for merge, often mediated by a human review and approval process.
- The automated merge system keeps picking the item at the head of the queue.
- Rebase it on the current mainline.
- Test, Lint, Build, etc.
- Hopefully, merge.
- When a change fails to re-baseline, merge, or test, notify the developers working on that branch; they will need to resolve either a syntactic or semantic merge conflict and resubmit.
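The loop above can be sketched as a small simulation. Everything here is a hypothetical stand-in, not any particular tool's API: `rebase`, `run_checks`, and `notify` are injected hooks where a real system would shell out to git and CI.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Candidate:
    branch: str
    author: str

@dataclass
class MergeQueue:
    # Hooks standing in for real git/CI operations (hypothetical names).
    rebase: Callable[[str, str], bool]        # (branch, mainline) -> success?
    run_checks: Callable[[str], bool]         # test/lint/build the rebased branch
    notify: Callable[[Candidate, str], None]  # tell the author what to fix
    queue: deque = field(default_factory=deque)
    mainline: str = "main"

    def submit(self, candidate: Candidate) -> None:
        self.queue.append(candidate)

    def process_one(self) -> bool:
        """Pick the head of the queue; rebase, validate, then merge or bounce."""
        cand = self.queue.popleft()
        if not self.rebase(cand.branch, self.mainline):
            self.notify(cand, "merge conflict: resolve and resubmit")
            return False
        if not self.run_checks(cand.branch):
            self.notify(cand, "checks failed against current mainline: fix and resubmit")
            return False
        # Merge succeeded: the baseline moves forward, so every later
        # candidate is validated against this new mainline.
        self.mainline = cand.branch
        return True
```

Note how serialization falls out of `process_one`: each success mutates `mainline`, which is exactly why the next candidate's validation can't start (accurately) until the previous one lands.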
Of course there are strategies to increase parallelism; the process remains conceptually serialized but becomes practically parallelized:
- Related or unrelated branches can be combined and tested + merged as a group.
- Use a build system that understands precisely which subset of targets must be re-tested for a given set of changes.
- Pre-build each candidate on a recent baseline to pre-populate a build cache; a well-crafted build process (with Bazel or other tools) can then safely optimize away some of the rebuild-everything work.
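The first strategy, batching, typically pairs with bisection: validate N candidates as one group, and on failure split the group to isolate the offender. A minimal sketch, assuming a hypothetical `passes(changes)` check that validates a combined set of changes, and assuming failures come from individual changes rather than interactions between them (real systems re-validate the final combination):

```python
from typing import Callable, Sequence

def merge_batch(changes: Sequence[str],
                passes: Callable[[Sequence[str]], bool]) -> tuple[list[str], list[str]]:
    """Validate a batch of candidates together; on failure, bisect to
    isolate the culprits. Returns (merged, rejected).

    When most batches pass, this costs one test run for many merges;
    the worst case degrades toward testing each change alone."""
    if not changes:
        return [], []
    if passes(changes):
        return list(changes), []       # whole batch is good: merge it all
    if len(changes) == 1:
        return [], list(changes)       # isolated a failing change
    mid = len(changes) // 2
    left_ok, left_bad = merge_batch(changes[:mid], passes)
    right_ok, right_bad = merge_batch(changes[mid:], passes)
    return left_ok + right_ok, left_bad + right_bad
```

One good batch of 20 changes costs a single validation run instead of 20, which is where the throughput win over the strictly serial loop comes from.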
Here are some of the available merge queue tools, though many organizations build an internal merge queue management system of their own.
- Bors (Good post about it)
- GitHub merge queue feature (beta)
  > “Merge Queue works by validating different combinations of pull requests identified as ‘ready to merge’ in parallel”
- Gerrit add-ons like gerrit-queue
- Plastic SCM has a built-in merge queue
- AutoMerger (article)
- Butler (internal? tool at Strava)
For now, use of an automated merge queue remains mostly a marker of large-team big-tech-co needs; over time I expect it to be a routine in-the-box tool almost as common as CI.