Steve Smith

Global Head of Technology at Scale

June 10, 2024

Why unplanned tech work kills team capacity

I speak with customers across the Equal Experts network to understand their scaling problems. When I’m asked ‘why don’t my teams have enough capacity to deliver?’ or ‘what’s the business benefit of standardization?’, my answers usually mention unplanned tech work. It reduces your value-add time, and if you don’t measure it, it’ll silently kill team capacity.

I once visited a British telco where 20 teams were building digital services on GCP. A delivery manager told me “features are slow to reach customers and we don’t know why”. When I suggested measuring utilised capacity as value-add time versus non-value-add time, the manager predicted an 80/20 split. They were horrified when our calculations showed a 20/80 split, with teams averaging 80% of their time on tech work, most of it unplanned!

What is unplanned tech work, why was it silently killing team capacity at this telco, and how did they start to turn the situation around?

The impact of unplanned tech work

Let’s start with some definitions:

  • Capacity. The maximum amount of time available for work, measured as a time period
  • Utilisation. The percentage of capacity actually used for any work
  • Utilised capacity. The actual amount of time spent on work (capacity multiplied by utilisation), measured as a time period
  • Value-add time. The percentage of utilised capacity spent on value-adding product work
  • Non-value-add time. The percentage of utilised capacity spent on non-value-adding tech work

At Equal Experts, we often see teams being asked to estimate their capacity, but it’s fixed in advance by team size, working days in a time period, etc. Asking how much capacity is available for value-adding work is more interesting, because that’s controllable, and unplanned tech work has a bigger impact on planned product work than people realise.
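To make the definitions concrete, here is a small worked calculation in Python. It is a sketch under stated assumptions: the team size, working days, hours per day, and utilisation are hypothetical, and `breakdown` is an illustrative helper, not an Equal Experts tool. It shows how the same fixed capacity yields very different value-add time under the telco manager’s predicted 80/20 split and the measured 20/80 split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityBreakdown:
    capacity_hours: float       # fixed in advance by team size and working days
    utilised_hours: float       # capacity actually used for any work
    value_add_hours: float      # utilised capacity spent on value-adding product work
    non_value_add_hours: float  # utilised capacity spent on non-value-adding tech work


def breakdown(engineers: int, working_days: int, hours_per_day: float,
              utilisation_pct: float, value_add_pct: float) -> CapacityBreakdown:
    """Hypothetical helper: derive each measure from capacity and two percentages."""
    capacity = engineers * working_days * hours_per_day
    utilised = capacity * utilisation_pct / 100
    value_add = utilised * value_add_pct / 100
    return CapacityBreakdown(capacity, utilised, value_add, utilised - value_add)


# Hypothetical team: 5 engineers, 10 working days, 8 hours a day, 90% utilisation
expected = breakdown(5, 10, 8, utilisation_pct=90, value_add_pct=80)
actual = breakdown(5, 10, 8, utilisation_pct=90, value_add_pct=20)
print(expected.value_add_hours, actual.value_add_hours)  # 288.0 72.0
```

Capacity (400 hours) and utilised capacity (360 hours) are identical in both cases; only the value-add percentage changes, and with it the time left for planned product work drops from 288 hours to 72.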

Product work is about creating propositions, products, and services to satisfy business demand and user needs. Planned tech work covers proactive tech initiatives and routine BAU maintenance. Unplanned tech work covers reactive fixes and emergency maintenance, such as configuration errors, defects, deployment failures, environment issues, security vulnerabilities, and test failures. You might know this as break/fix work or rework.

More unplanned tech work means lower technical quality, slower delivery speed, and more reliability problems. This was shown by Dr. Nicole Forsgren et al. in Accelerate, which found that high-performing organizations spend 29% less time on unplanned tech work. It also found that Continuous Delivery practices like frequent deployments are a predictor of low unplanned tech work. As unplanned tech work decreases, more time becomes available for planned product work.

Unplanned tech work is a silent killer

When we speak with customers about modern engineering practices like You Build It You Run It, their biggest worry is that teams would spend all their time on non-value-adding work, like an operations team. But there’s little awareness of how much time their teams are already spending on unplanned tech work. We believe this is because it is:

  • Unmeasured. Teams track break/fix work with outputs, like defect count. They don’t look at outcomes, like the impact of break/fix rate on delivery speed
  • Unquestioned. Teams accept break/fix work over time. They don’t resist the abnormal becoming normal, which you might know as the normalization of deviance

Break/fix work can build up without warning, until value-add time is a small fraction of utilised capacity and planned product work grinds to a halt. It’s a silent killer.

At the British telco, they analyzed why teams were spending 80% of their time on tech work. There’d been a lack of technical guidance, so the 20 teams had created 20 variants of Go, Java, and Python tech stacks on App Engine, Cloud Run, and Kubernetes. There was no standardization, no platform engineering team, and teams were well beyond their core competencies. 60% of their time went on configuration and infrastructure workarounds, and 20% was planned maintenance to eliminate errors. Teams were running just to stand still, but why hadn’t this been noticed before?

How to control unplanned tech work

The Accelerate book shows that unplanned tech work can be measured as a rework rate percentage. We recommend this measure to our customers, but we don’t recommend automating it through code analysis, because there are so many potential sources of break/fix work at scale. The most cost-effective measurement methods we’ve found are:

  • Team survey. Ask engineers once a week or month to estimate what percentage of their time is spent on unplanned tech work tasks. Prerequisite: team understanding
  • Ticket analysis. Query the ticketing system for tagged tickets once a week or month to calculate what percentage of time is spent on unplanned tech work tickets. Prerequisites: team understanding, and consistent ticketing and tagging of all team tasks
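For ticket analysis, the calculation itself is simple once tickets have been exported from your ticketing system; the query is system-specific and not shown. The Python sketch below is illustrative only and rests on stated assumptions: every ticket carries exactly one hypothetical work-type tag (`product`, `planned-tech`, or `unplanned-tech`), and tickets record the hours logged against them. If your system doesn’t track time, ticket counts are a rougher proxy.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical work-type tags; the only requirement is that every ticket carries exactly one
WORK_TYPES = ("product", "planned-tech", "unplanned-tech")


@dataclass(frozen=True)
class Ticket:
    key: str
    work_type: str       # one of WORK_TYPES
    hours_logged: float  # time booked against the ticket in the period


def time_split(tickets: list[Ticket]) -> dict[str, float]:
    """Return the percentage of logged time per work type for one period."""
    hours: dict[str, float] = defaultdict(float)
    for ticket in tickets:
        if ticket.work_type not in WORK_TYPES:
            # Untagged or mis-tagged tickets make the percentage meaningless, so fail loudly
            raise ValueError(f"{ticket.key} has unknown work type {ticket.work_type!r}")
        hours[ticket.work_type] += ticket.hours_logged
    total = sum(hours.values())
    if total == 0:
        raise ValueError("no time logged in this period")
    return {work_type: 100 * hours[work_type] / total for work_type in WORK_TYPES}


tickets = [
    Ticket("APP-1", "product", 12),
    Ticket("APP-2", "unplanned-tech", 6),
    Ticket("APP-3", "planned-tech", 2),
]
print(time_split(tickets)["unplanned-tech"])  # 30.0
```

Raising an error on an unknown tag is a deliberate choice: it makes the consistent-tagging prerequisite visible instead of letting untagged work silently skew the rate.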

Whatever measurement method you use, team buy-in is vital. Unplanned tech work rate should be understood as a high-level percentage, not a detailed breakdown. Micro-management will quickly become counterproductive, and there’ll be too much variability for accurate forecasting.

At the British telco, a weekly Google Form was introduced to ask engineers a single question ‘what % of your time last week was spent on GCP fixes and maintenance’, with a picklist of examples. The survey quickly identified Kubernetes and App Engine misconfiguration as repeat offenders. As a result, a platform engineering team was formed to migrate all teams onto a central, self-service Cloud Run solution.
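A weekly survey like this is easy to summarise. The Python sketch below is illustrative only: it assumes responses have already been exported from the form into `SurveyResponse` records (a hypothetical type, not a Google Forms API), averages the answers into a single team-level rate, and counts how often each picklist example is selected. Ranking those counts is one simple way to surface repeat offenders such as Kubernetes or App Engine misconfiguration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SurveyResponse:
    engineer: str
    # Answer to "what % of your time last week was spent on GCP fixes and maintenance"
    pct_on_fixes: float
    # Picklist examples the engineer selected (labels here are hypothetical)
    examples: tuple[str, ...] = ()


def summarise(responses: list[SurveyResponse]) -> tuple[float, list[tuple[str, int]]]:
    """Return the team's average % on fixes and the picklist examples ranked by frequency."""
    if not responses:
        raise ValueError("no survey responses this week")
    # A simple mean assumes each engineer worked a comparable number of hours
    average = mean(r.pct_on_fixes for r in responses)
    offenders = Counter(example for r in responses for example in r.examples)
    return average, offenders.most_common()


week = [
    SurveyResponse("engineer-a", 70, ("Kubernetes misconfiguration",)),
    SurveyResponse("engineer-b", 90, ("Kubernetes misconfiguration", "App Engine misconfiguration")),
    SurveyResponse("engineer-c", 80, ("App Engine misconfiguration", "Kubernetes misconfiguration")),
]
average, ranked = summarise(week)
print(ranked)  # [('Kubernetes misconfiguration', 3), ('App Engine misconfiguration', 2)]
```

Keeping the output to one percentage and a ranked list fits the advice to treat the rate as a high-level figure rather than a detailed breakdown.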

Conclusion

When a team is under pressure to deliver on time, and they say or are told they lack delivery capacity, it’s usually because of unplanned tech work. Reactive fixes and emergency maintenance can easily take up more time than expected, because break/fix work is so often unmeasured and unquestioned. This means value-add time is much lower than expected, and it can result in missed deadlines and stakeholder dissatisfaction.

It’s possible to measure unplanned tech work, but it isn’t cost-effective to automate. If you regularly survey your teams, or are able to analyze your ticketing system, you’ll be able to identify and eliminate sources of break/fix work. This will free up more time for planned product work, and allow your teams to achieve new levels of delivery speed, quality, and reliability.
