The You Build It You Run It operating model is a powerful tool. It empowers product teams to own every aspect of digital service management. This has a number of positive impacts. It accelerates your time to market, increases your service reliability, and grows a learning culture.
However, there are some pitfalls which, if not addressed, can drain the confidence of your senior leadership and ultimately put the success of You Build It You Run It at risk.
In our recent You Build It You Run It playbook, my co-author Steve Smith and I take a deeper look at the BAU pitfall, and the risk of unplanned maintenance work becoming uncontrollable. You can guard against this pitfall. In fact, you can escape it completely, by following these four simple rules.
BAU as unplanned maintenance work
We see the expression BAU (Business as Usual) used a lot. It’s usually a synonym for ‘unplanned maintenance work’, and includes:
- Infrastructure capacity upgrades
- Defect fixes
- Security patches
- Telemetry improvements
- Support tickets, resulting from live incidents
- Minor changes to existing features, due to customer feedback
If you work in an enterprise organisation, and you haven’t adopted You Build It You Run It yet, I’m guessing there’s plenty of BAU. And it causes these problems:
- Delays to planned work. The more unplanned BAU work you take on, the slower your planned product work will be.
- Risk of reliability problems. The more unplanned BAU work you defer, the higher the risk of production incidents because of the lack of maintenance.
- Low team morale. The more unplanned BAU work you have, the greater the perception that teams are slow at delivering planned work. This usually happens because unplanned work (to keep the lights on) isn’t easily visible, and it can start teams on the negative spiral of a reactive learning culture.
If you are considering implementing You Build It You Run It, there’s an understandable worry amongst senior leaders. Surely, putting engineers on-call will delay planned work, because engineers will spend their time fixing live maintenance issues? The truth is, it really doesn’t have to be the case, as long as you avoid this important pitfall.
The excessive BAU pitfall
When you adopt You Build It You Run It, your product teams are on-call and responsible for running their own digital services. They can face a lot of BAU in managing their own infrastructure upgrades, applying defect fixes and security patches, and refining the telemetry toolchain. This happens when:
- Digital services aren’t designed to gracefully degrade on failure
- Deployments aren’t applied repeatably and reliably in all environments
- Monitoring dashboards and alert definitions aren’t fully automated
The easiest way to spot this pitfall is to measure the amount of unplanned work faced by a product team each week. Ask team members to fill out a weekly survey of how much of their time they spend on planned work. If the percentage starts to go down, and team members complain about excessive time spent on intermittent alerts, deployments, and so on, then you’re sliding into the pitfall.
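As a rough illustration of that measurement, the sketch below turns weekly survey answers into a planned work percentage and flags a sustained decline. The data shape, the function names, the sample figures, and the three-week window are all assumptions made for this example, not something the playbook prescribes.

```python
from statistics import mean


def planned_work_percentage(planned_hours, total_hours):
    """Share of one person's week spent on planned work, as a percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * planned_hours / total_hours


def weekly_team_percentage(responses):
    """Average the planned work percentage across one week's survey responses.

    `responses` is a list of (planned_hours, total_hours) pairs,
    one pair per team member (a hypothetical survey format).
    """
    return mean(planned_work_percentage(p, t) for p, t in responses)


def is_sliding(weekly_percentages, window=3):
    """True if planned work fell in each of the last `window` week-on-week comparisons."""
    recent = weekly_percentages[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history to call it a trend
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical survey data: one inner list per week, one pair per team member.
surveys = [
    [(32, 40), (30, 40)],
    [(28, 40), (30, 40)],
    [(26, 40), (24, 40)],
    [(20, 40), (22, 40)],
]
history = [weekly_team_percentage(week) for week in surveys]
print(history, is_sliding(history))
```

A falling trend on its own doesn’t prove anything. It’s a prompt to talk to the team about where their time is going.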
That really doesn’t have to be the case. We’ve worked with organisations where on-call product teams have gradually eliminated many types of BAU, until it’s only a small amount to handle each week.
Keep product teams proactive and productive
We recommend these practices to cope with BAU:
- Rearchitect digital services for adaptability. Eliminate avoidable dependencies, soften unavoidable dependencies, create availability redundancy, and share behaviours via APIs instead of closed-source libraries.
- Create a fully automated deployment pipeline. Introduce XP development practices such as pair-programming and test-driven development, use dynamic test data to parallelise functional tests, establish zero downtime deployments, and allow a fast revert on failure.
- Establish an automated telemetry toolchain. Implement a telemetry toolchain from engineer laptop to test environments to live traffic, including a logging stack like EFK, a monitoring stack like Victoria Metrics, and an incident response platform like PagerDuty.
- Treat unplanned work as planned work. Visualise BAU work items on the team board, allocate a team member daily to complete urgent items, and measure the impact of BAU items on team progress.
Treating unplanned work as planned work sounds simple. But it needs a change in mindset, and it makes a huge difference. Tracking and prioritising BAU work items, alongside planned product features from the same backlog, ensures you’re always assigned to the most valuable work at any given time. A recent post by product evangelist John Cutler explains the consequences of not treating unplanned work as planned work.
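To make the single backlog idea concrete, here is a minimal sketch in which BAU items and product features sit in one list, are ranked by the same value score, and the share of effort going to BAU is measured. The `WorkItem` fields, the "feature" and "bau" labels, the value scores, and the sample items are hypothetical; your own board tool will have its own equivalents.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    kind: str           # "feature" or "bau" (labels assumed for this example)
    value: int          # relative value agreed by the team; higher is more valuable
    effort_days: float  # estimated or actual effort


def prioritise(backlog):
    """Rank planned and BAU items together, most valuable first."""
    return sorted(backlog, key=lambda item: item.value, reverse=True)


def bau_share(items):
    """Percentage of total effort spent on BAU items."""
    total = sum(item.effort_days for item in items)
    if total == 0:
        return 0.0
    bau = sum(item.effort_days for item in items if item.kind == "bau")
    return 100.0 * bau / total


# Hypothetical backlog mixing product features and BAU items.
backlog = [
    WorkItem("New checkout flow", "feature", value=8, effort_days=5),
    WorkItem("Apply security patch", "bau", value=9, effort_days=1),
    WorkItem("Fix intermittent alert", "bau", value=5, effort_days=2),
    WorkItem("Loyalty points report", "feature", value=3, effort_days=2),
]

ranked = prioritise(backlog)
print([item.title for item in ranked])
print(bau_share(backlog))
```

Because both kinds of work share one ranking, an urgent security patch can outrank a feature on merit, and the BAU share gives you a number to watch over time.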
These practices aren’t quick fixes. They might require more time and money than other solutions. Don’t let that put you off. By implementing these practices, you’ll see a dramatic reduction in the amount of time a product team spends on BAU maintenance work. This frees up time for product delivery, both for adding more capabilities and for unstructured innovation. And as we all know, you’ve got to stay on top of BAU at all times.
If you’d like to find out more, you can continue our You Build It You Run It pitfalls series:
- 7 pitfalls to avoid with You Build It You Run It
- 5 ways to minimise your run costs with You Build It You Run It
- Why your operations manager shouldn’t be accountable for digital reliability
- How to manage BAU in product teams – you are here!
- 4 ways to remove the treacle in change management
- Why product teams still need major incident management
- Stop trying to embed specialists in every product team
- How to avoid developer burnout on call
Our You Build It You Run It page has loads of resources on on-call product teams – case studies, conference talks, in-depth articles, and more. Plus our You Build It You Run It playbook gives you a deep dive into how to make it happen! Get in touch, and let us know what you think.