Optimising CI/CD Processes - The Beginning
ℹ️ This is a multipart series exploring CI/CD optimisations ℹ️
- Intro!
- Identifying Common Steps
- Caching + Docker (and layer caching)
- Test Parallelization
- Hopper Configuration Upgrade (Deliveroo-specific)
I’ll be talking about some Deliveroo-specific things in this series. When I do, I’ll be sure to provide as much context as possible.
I recently moved to a new team at Deliveroo. I started contributing as much as possible to one of the primary codebases used in the area to get up to speed more quickly. Getting down and dirty is the best form of learning, right?
From the beginning, I noticed extremely long feedback loops when implementing changes. The process of releasing a change to production is:
- Develop and test the change locally
- Run a subset of the CI/CD process (no deployments) against the feature branch (~17 minutes)
- Deploy to the staging environment
  - CI/CD (~23 minutes)
  - Deployment (~6 minutes)
- Test in staging
- Deploy to the production environment
  - CI/CD (~23 minutes)
  - Deployment (~6 minutes)
- Test in production
I’ve put the durations of each step (p95) above. Excluding development and manual testing, if everything went perfectly (does it ever?!), the whole process would take a minimum of ~75 minutes.
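To make the ~75 minute figure easy to check, here is a minimal sketch that adds up the p95 durations listed above. The `Stage` type and the `kind` labels are names I made up for illustration; only the minute values come from the list.

```typescript
// Durations in minutes, copied from the p95 figures in the list above.
// The Stage shape and the "ci" / "deploy" labels are illustrative assumptions.
type StageKind = "ci" | "deploy";

interface Stage {
  name: string;
  kind: StageKind;
  minutes: number;
}

const stages: Stage[] = [
  { name: "Feature branch CI/CD (no deployments)", kind: "ci", minutes: 17 },
  { name: "Staging CI/CD", kind: "ci", minutes: 23 },
  { name: "Staging deployment", kind: "deploy", minutes: 6 },
  { name: "Production CI/CD", kind: "ci", minutes: 23 },
  { name: "Production deployment", kind: "deploy", minutes: 6 },
];

// Sum the minutes of whichever stages are passed in.
function totalMinutes(items: Stage[]): number {
  return items.reduce((sum, stage) => sum + stage.minutes, 0);
}

const total = totalMinutes(stages);
const ciOnly = totalMinutes(stages.filter((stage) => stage.kind === "ci"));

console.log(`Whole process: ~${total} minutes`); // 17 + 23 + 6 + 23 + 6 = 75
console.log(`CI/CD only: ~${ciOnly} minutes`); // 17 + 23 + 23 = 63
```

Splitting the total this way shows that CI/CD accounts for roughly 63 of the 75 minutes, with deployments making up the rest.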
DAMNNNNN, that’s a long time, huh?!?!
But this raises the question: is the CI/CD time the right thing to be measuring?
I would say no. IMO (which is opinionated), we should be measuring the time it takes a team to deliver value to customers. To put it more simply, how long does it take to get a change to production, from the moment work begins to the moment a customer starts using it?
For those of you following along, you may know where I’m going next.
Lead Time for Changes, a DORA metric, is a great metric to track, and the one I’m going to focus on in this post. Disagree with me in the comments (or don’t, up to you).
Lead Time for Changes encompasses much more than just CI/CD (again IMO). For example, how long it takes for a commit to get into production also depends on our interactions with Product Managers/Owners, Designers, other engineers (via reviews), etc.
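As a rough illustration of what tracking this metric could look like, here is a minimal sketch that computes a median lead time from a list of changes. This is not how Hopper does it (I'm not describing its internals here); the `Change` shape, the field names, and the sample timestamps are all invented for the example, and the start point follows the "work begins" definition used in this post.

```typescript
// Hypothetical record shape: NOT Hopper's data model, just an assumption for this sketch.
interface Change {
  id: string;
  workStartedAt: Date; // when work on the change began (assumed to be known)
  liveInProductionAt: Date; // when the change reached production
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Lead time for a single change, in hours.
function leadTimeHours(change: Change): number {
  const elapsedMs =
    change.liveInProductionAt.getTime() - change.workStartedAt.getTime();
  return elapsedMs / MS_PER_HOUR;
}

// Median is less sensitive than the mean to one very slow change.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("Cannot take the median of an empty list");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up sample data, purely to exercise the functions.
const changes: Change[] = [
  {
    id: "a",
    workStartedAt: new Date("2024-01-01T09:00:00Z"),
    liveInProductionAt: new Date("2024-01-01T15:00:00Z"), // 6 hours
  },
  {
    id: "b",
    workStartedAt: new Date("2024-01-02T09:00:00Z"),
    liveInProductionAt: new Date("2024-01-03T09:00:00Z"), // 24 hours
  },
  {
    id: "c",
    workStartedAt: new Date("2024-01-04T10:00:00Z"),
    liveInProductionAt: new Date("2024-01-04T22:00:00Z"), // 12 hours
  },
];

const medianLeadTimeHours = median(changes.map(leadTimeHours));
console.log(`Median lead time: ${medianLeadTimeHours} hours`);
```

Whatever start point you pick, the important part is to keep it consistent so that before/after comparisons mean something.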
At Deliveroo, our deployment system, Hopper, tracks this metric automagically for us, which is amazing.
Before I started the work I describe in this post, we had quite a high Lead Time for Changes (can’t really say the actual number…).
After these optimisations were implemented, we saw around a 50% reduction. More can be done to reduce this number outside the CI/CD process.
🚨 I’m not saying that all of the reduction is attributable to these optimisations, but after gathering qualitative feedback from myself (so unbiased) and the teams I work with, I can say they were definitely helpful!
Let’s get on the same page!
A few things before we get started.
First, I’m going to talk very specifically here about JS/TS projects, but these ideas can be applied to all CI/CD processes.
Second, I’m going to use CircleCI as the CI/CD platform when talking through examples. These concepts can likely apply to other CI/CD platforms.
Lastly, some definitions so we’re all on the same page:
- Step: A step is a single unit of work in a CI/CD process
  - For example, installing dependencies, setting up environment variables, initiating commands, etc.
- Job: A collection of steps
  - For example, running tests, linting code, building a Docker image, etc.
- Build Pipeline: A collection of jobs that represents all work
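If it helps to see those three definitions as a data structure, here is a small sketch in TypeScript. It is only a mental model, not CircleCI's configuration format; the type names, the example jobs, and the commands (`npm ci`, `npm test`, `npm run lint`) are assumptions chosen for illustration.

```typescript
// A mental model of the definitions above. This is NOT CircleCI's config schema.
interface Step {
  name: string;
  command: string; // illustrative shell command, assumed for this example
}

interface Job {
  name: string;
  steps: Step[];
}

interface BuildPipeline {
  name: string;
  jobs: Job[];
}

// A step can be shared by several jobs.
const installDependencies: Step = {
  name: "Install dependencies",
  command: "npm ci",
};

const pipeline: BuildPipeline = {
  name: "example-pipeline",
  jobs: [
    {
      name: "test",
      steps: [installDependencies, { name: "Run tests", command: "npm test" }],
    },
    {
      name: "lint",
      steps: [installDependencies, { name: "Lint code", command: "npm run lint" }],
    },
  ],
};

// Flatten the pipeline into "job > step" labels to see all the work it represents.
function allSteps(p: BuildPipeline): string[] {
  return p.jobs.flatMap((job) =>
    job.steps.map((step) => `${job.name} > ${step.name}`)
  );
}

for (const label of allSteps(pipeline)) {
  console.log(label);
}
```

Flattening the pipeline like this makes it obvious that a pipeline is just jobs, and jobs are just steps, which is all the vocabulary the rest of the series relies on.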
Go learn!
- Identifying Common Steps
- Caching + Docker (and layer caching)
- Test Parallelization
- Hopper Configuration Upgrade (Deliveroo-specific)
Until next time...