Stop trying to embed specialists in every product team

You’ve probably heard of You Build It You Run It before. It’s an operating model that empowers product teams to own every aspect of digital service management. When done well, it accelerates your time to market, increases your service reliability, and grows a learning culture. There are also some pitfalls, which can drain the confidence of your senior leadership, and ultimately put the success of You Build It You Run It at risk.

In our recent You Build It You Run It playbook, my co-author Steve Smith and I take a deeper look at the embedded specialists pitfall. You can guard against this pitfall, and even escape it if necessary.

Teams of specialists suffer at scale

Steve and I often see organisations with small, central teams of specialists supporting a number of delivery teams. For example, you might have some delivery teams and an application support team dependent upon a small DBA team for the provisioning of your on-premises relational databases and the management of your user data.

You don’t want developers to manage database backups themselves, nor to debug a live database with millions of rows of user data. At the same time, your DBAs can’t transfer their depth of expertise to developers. This means DBA workload will inevitably increase as you add more delivery teams, no matter how skilled your DBAs are. Conflicting prioritisation calls, strained relationships, and a lack of delivery team progress are likely, as well as burnout for your DBAs. 

Steve and I have seen specialists in this situation too many times, in different organisations and in different specialisations. We’ve seen it with DBAs, InfoSec analysts, network admins, and operability engineers (which you might know as DevOps). In addition, the easy answer of recruiting more specialists into the central team doesn’t usually work, because there’s a consistent scarcity of affordable specialists in the marketplace.

The embedded specialists pitfall

You Build It You Run It is an operating model in which product teams build, deploy, operate, and support their own digital services. If your organisation adopts You Build It You Run It and you have multiple product teams constrained by a struggling specialist team, one solution you might consider is embedding those specialists into the product teams. In other words, if you had 10 product teams and one team of 10 DBAs, you would embed each DBA into one of the 10 product teams.

Steve and I call this the embedded specialists pitfall. It’s the logical extreme of You Build It You Run It, and it’s not a good idea. In theory, embedded specialists will act as first responders, and influence technical quality on product teams via their deep expertise. However, we’ve seen serious problems emerge for embedded specialists:

  • Multiple assignments. Your embedded specialists are each assigned to multiple teams, because you have N product teams, fewer than N specialists available, and recruiting more specialists is difficult.
  • Unpredictable workloads. Your embedded specialists are either bored from a lack of work, or burned out from too much work across multiple product teams.
  • Lack of knowledge sharing. Your embedded specialists don’t have opportunities to work together, learn from one another, or even talk to one another. They can feel lonely.

Tom Clark spoke at DevOps Enterprise Summit Europe 2019 about how this pitfall affected the ITV Common Platform. ITV tried to embed a platform engineer into each product team, and the engineers suffered from multiple assignments, unpredictable workloads, and a lack of knowledge sharing. This had a direct, negative impact which led to technology stagnation and reduced developer productivity.

So, if the answer isn’t adding more specialists to a central team, or embedding specialists into product teams… what do you do?

Establish specialists as a service

You can achieve a step change in productivity by turning your specialist teams into consumable specialist as a service offerings, and achieving a balance between cross-functional product teams and depth of specialist expertise that’s right for your organisation. Steve and I recommend the following:

  1. Offload non-specialist tasks to product teams.
  2. Map out specialist tasks and the associated actions. For example, you should offload repeatable, low value tasks to your cloud provider. You should automate repeatable, high value tasks in a deployment pipeline. And retain ad hoc, high value tasks as is. See the table below for more details.
  3. Establish low friction, public messaging channels for bi-directional feedback loops, such as #ask-the-dbas and #ask-network-admins.
  4. Create public ticket queues to show planned work and work in progress for specialists.

Specialist tasks can be mapped in terms of repeatability and value-add.
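As a rough illustration of that mapping, here is a minimal Python sketch. The class names, function name, and example tasks are hypothetical and exist only for this example. The first three rules follow the recommendations above. The fallback for ad hoc, low value tasks is an assumption, because the recommendations do not cover that combination explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OFFLOAD_TO_CLOUD_PROVIDER = "offload to your cloud provider"
    AUTOMATE_IN_PIPELINE = "automate in a deployment pipeline"
    RETAIN_WITH_SPECIALISTS = "retain as is with the specialists"
    # Assumption: not stated in the recommendations above.
    OFFLOAD_TO_PRODUCT_TEAM = "offload to the product team"


@dataclass(frozen=True)
class SpecialistTask:
    name: str
    repeatable: bool  # False means the task is ad hoc
    high_value: bool


def recommended_action(task: SpecialistTask) -> Action:
    """Map a task to an action by repeatability and value-add."""
    if task.repeatable and not task.high_value:
        return Action.OFFLOAD_TO_CLOUD_PROVIDER
    if task.repeatable and task.high_value:
        return Action.AUTOMATE_IN_PIPELINE
    if task.high_value:
        return Action.RETAIN_WITH_SPECIALISTS
    # Ad hoc and low value: an assumed fallback, labelled as such.
    return Action.OFFLOAD_TO_PRODUCT_TEAM


if __name__ == "__main__":
    # Hypothetical tasks; how you classify them depends on your organisation.
    tasks = [
        SpecialistTask("routine database backups", repeatable=True, high_value=False),
        SpecialistTask("schema changes on release", repeatable=True, high_value=True),
        SpecialistTask("debugging a live database", repeatable=False, high_value=True),
    ]
    for task in tasks:
        print(f"{task.name}: {recommended_action(task).value}")
```

The value of writing the mapping down, in code or in a table, is that every specialist task gets an explicit owner and an explicit action.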

The goal is to remove toil for your specialists, and free them up to concentrate on high value scenarios in which product teams really need their deep expertise. Their expertise is too scarce, and too important, to be wasted on tasks that can be handled by your cloud provider or product teams.

Further reading

To find out more, you can continue our You Build It You Run It pitfalls series:

  1. 7 pitfalls to avoid with You Build It You Run It
  2. 5 ways to minimise your run costs with You Build It You Run It
  3. Why your operations manager shouldn’t be accountable for digital reliability
  4. How to manage BAU in product teams
  5. 4 ways to remove the treacle in change management
  6. Why product teams still need major incident management
  7. Stop trying to embed specialists in every product team – you are here!
  8. How to avoid developer burnout on call

Our You Build It You Run It page has loads of resources on on-call product teams – case studies, conference talks, in-depth articles, and more. Plus our You Build It You Run It playbook gives you a deep dive into how to make it happen!

Today’s most successful organisations are fast and flexible. They can deploy new features for digital services to meet rapidly evolving customer demands and market opportunities. They capitalise on first-mover advantage and sustain innovation: improving digital services incrementally through weekly—or even daily—deployments.

For many businesses, maintaining or achieving rapid time-to-market is a constant challenge. If the same is true for you, you’re not alone. We constantly hear the same type of complaints from many different organisations, at every level:

“IT is too slow.”

“We can’t get new initiatives delivered quickly enough.”

“New features consistently run late in delivery or get bogged down in approvals processes.”

“We don’t react to the needs of our customers.”

“The tech team is a constant bottleneck.”

In many cases, it may not actually be the tech team causing the blockages that impede your ability to deliver value, at pace, consistently. It could be the way your tech team is structured that’s the issue. And adopting an alternative operating model—called You Build It You Run It—could be the answer to your problems.

We’ve helped clients make enormous gains in deployment throughput by adopting You Build It You Run It: from 10 deployments a year to 4000+, without a correlating increase in production incidents.

So, how do you manage deployments in a You Build It You Run It operating model, and how does it differ from what is likely your current deployment process (often referred to as ‘Ops Run It’)?

Deployment in ‘Ops Run It’: the common, conventional IT approach.

Many software teams use an operational model called ‘Ops Run It’. In this model, Operations teams are charged with the responsibility of deploying and supporting applications in production. The above graphic highlights a composite IT department, where Delivery and Operations function as silos.

In this approach there are many delivery teams building and testing software services. Once those services are built and tested, there’s a handover. In Operations, there’s a Change Management team running change-advisory board meetings (or CAB meetings) to ensure the services are stable and safe. Then, there’s an App Support Team doing the deployments.

How does ‘Ops Run It’ work in a day-to-day capacity?

  1. The Delivery team adds a delivery request into the change-management queue.
  2. Each delivery request is approved or rejected in a CAB meeting.
  3. An approved change is added into the deployment queue.
  4. The Application Support team perform the actual deployment at the scheduled time.
  5. Post-deployment validation testing will be performed by either the Operations App Support team or, in some cases, the Delivery team.

What are the drawbacks in this approach?

  • It’s incredibly slow to approve a change request: in this operating model, deployments are mired in bureaucracy because it takes hours, days, or even weeks to approve a change.
  • The act of deployment itself is slow to complete: it can take hours or days to actually complete a deployment. Additionally, if the deployment tasks are not automated, the act of deployment can be risky and time consuming. In some cases, the architecture itself may prevent teams from being able to make changes while the system is running. This means all deployments must be timed to coincide with planned outages. If the team does not have automated production verification tests, then testing must be completed manually after each deployment. All of these factors, whether individually or in conjunction, can contribute to lengthy deployment times.
  • It’s slow to synchronise knowledge between teams during handovers: every time the delivery team completes a piece of work, they need to conduct a handover process to pass the deliverable to Operations for approval and deployment. This involves a detailed summary of what was built, why it was built a specific way, and anything to be mindful of in support cases. Providing this context is time consuming and often arduous, given the Operations team was not involved in building the change and typically has very little context as to why it is important.
  • You can’t sustain innovation with Ops Run It: while Ops Run It is suitable for commercial-off-the-shelf software (COTS) applications and foundational systems, it does not facilitate the rapid time-to-market required for sustained innovation. This is because it is slow to approve, implement and learn from new product features.

Deployment in You Build It You Run It: how to launch new product features to market every day.

The above graphic shows a hybrid model.

It’s important to note that You Build It You Run It and Ops Run It are not binary propositions. It’s a common misconception that you can simply choose one model over the other and employ it universally. In most cases, you’ll have different models working together, because different services have different requirements. You should also frequently review your operating model depending on the maturity or lifecycle of the product/system you’re supporting.

Here, the Delivery team employs the You Build It You Run It operating model for all activity involved with developing, testing, deploying and evolving digital services.

The same Delivery team could be using an Ops Run It model—in conjunction with a central Operations team—for work associated with commercial-off-the-shelf software (COTS), foundational systems, or even mature digital products that require little intervention.

How does You Build It You Run It work in a day-to-day capacity?

In You Build It You Run It, an on-call Delivery team has total ownership over developing, deploying, monitoring and supporting digital services. As a result, there is no hand-off from Development to Operations.

The Delivery team has all the skills and context needed to do this across the entire lifecycle. It’s made possible by a loosely-coupled service-oriented architecture, and an automated deployment process.

In the You Build It You Run It operating model, the on-call product team will determine for themselves whether what they want to do is a regular, low-risk change. We trust they’re in the best position to make this call, because these team members have built the service and best understand the intricacies of its codebase.

If the Delivery team deem it is a regular low-risk change, they use a pre-approved change request process, and it’s managed by the team as a standard change. This process is extremely fast when implemented correctly—updates can be deployed daily to help organisations sustain innovation and consistently deliver value.

If the Delivery team deem it is an irregular or high-risk change, it goes through a normal CAB process with the same change-management team from Operations.

When a deployment happens, the Delivery team perform it themselves—at a time of their own choosing—and report their successful changes to the Change Management team.

Critically, this removes the bottleneck you see in Ops Run It where the Delivery team is dependent on the App Support team.
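The routing decision described above can be sketched in a few lines of Python. This is an illustration only: the type and function names are hypothetical, and the two boolean flags stand in for the on-call team’s own judgement about the change.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeRoute(Enum):
    STANDARD_CHANGE = "pre-approved standard change, managed by the team"
    NORMAL_CHANGE = "normal change, approved via CAB"


@dataclass(frozen=True)
class ProposedChange:
    summary: str
    regular: bool   # the team's own judgement
    low_risk: bool  # the team's own judgement


def route_change(change: ProposedChange) -> ChangeRoute:
    """Return the approval route for a change, as judged by the on-call team."""
    if change.regular and change.low_risk:
        return ChangeRoute.STANDARD_CHANGE
    # Anything irregular or high-risk goes through the normal CAB process.
    return ChangeRoute.NORMAL_CHANGE
```

Whichever route is chosen, the Delivery team still performs the deployment itself. Only the approval path differs.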

In You Build It You Run It, we also typically see Delivery teams being much more considered and cautious with their deployment times than an Operations team might be. This is because the Delivery team themselves are responsible for fixing any problems that arise from the deployment.

Often, in scenarios where deployment is automated and the architecture is built in a way that doesn’t require downtime for deployments, Delivery teams will deploy during the day to give themselves a working day to monitor for issues and implement immediate fixes as they arise. This, in turn, means fixes are provisioned faster and services are more reliable in general.

What are the benefits of this approach?

In the right environment, there are many benefits to You Build It You Run It. The top three we typically see are:

  • It’s much, much faster to approve change requests: in this operating model, an approved change request for a regular low-risk change can be obtained within minutes or even immediately, as opposed to days, weeks, or months.
  • It’s much, much faster to complete deployments: the deployment process can be fully automated by the Product team using the deployment pipeline. Crucially, even if you choose not to automate this process, there is no handoff to any other team as part of the deployment workflow; the team is entirely self-sufficient.
  • It’s much, much faster to synchronise knowledge: there’s no requirement for meetings or phone calls to facilitate a deployment handover for an Operations team. The only requirement is to synchronise knowledge within the Product team; through an approach like paired development, everyone on the team has far greater clarity of what’s happening at any point in time.

So how do I get started with You Build It You Run It?

Like many things, You Build It You Run It can be incredibly powerful when implemented in the right context. But it’s not necessarily the right approach for every organisation universally.

If you’re interested in learning more about You Build It You Run It, including whether it’s right for you and how you can start to determine its suitability for your team, I’d love to set up a conversation.

You Build It You Run It accelerates your time to market, increases your service reliability, and grows a learning culture, all by empowering your product teams to own every aspect of digital service management. However, there are some pitfalls, which can put the success of You Build It You Run It at risk. One pitfall involves reusing the same reactive change management process you already have. It creates something we like to refer to as treacle.

In our recent You Build It You Run It playbook, my co-author Bethan Timmins and I take a deeper look at the change management treacle pitfall. You can guard against it, and even escape it completely.

Change management is needed

Let’s start with a little more explanation of the You Build It You Run It hybrid operating model:

  • Product teams build, deploy, operate, and support their own digital services, such as user-facing microservices. 
  • An application support team manages foundational systems, like self-hosted COTS and custom back office applications for which there’s no equivalent SaaS or COTS.

This diagram shows how we think of deployment throughputs for digital services and foundational systems in You Build It You Run It. 

In the diagram, you can see an approval step for foundational systems, prior to the release step. The delivery team has to file a change request to the change management team, who then review the change and approve or reject it. If the change is approved, the application support team performs the production deployment.

“Our change management is too slow” is a complaint we hear often. We all recognise the need for change management: it’s necessary for large, complex changes with lots of moving parts. It’s also important that your IT department can produce an audit trail of all production changes, to assist in incident diagnosis and to satisfy internal compliance requirements.

It’s easy to blame slow change management on ITIL. But we’ve helped plenty of organisations to implement daily deployments within ITIL parameters. 

In our experience, slow change management is caused by using a heavyweight, one size fits all process. We’ve seen many organisations get stuck when they don’t make the time to rethink how to do it well. But You Build It You Run It depends on a slicker change management process, which still satisfies the needs for change approvals and auditing. 

Change management treacle

Bethan and I refer to slow, heavyweight processes as treacle. You can’t move faster, because you’re stuck in sticky syrup where you are. 

When you adopt You Build It You Run It, copying and pasting the same reactive change management process causes a lot of treacle. Product teams are empowered to build, test, and deploy their own digital services, but every change still has to be approved by the change management team. This causes a lot of problems:   

  • The change management team can’t approve changes fast enough for weekly or more frequent deployments, so product features can’t be rapidly tested with customers 
  • Changesets per deployment become much larger, which makes it harder to test changes and diagnose production failures when they occur
  • Product teams suffer from low morale, because their hard work is often trapped in a change request queue for days at a time
  • Relationships between product teams and the change management team become frayed and resentful

This is what we call the change management treacle pitfall. It prevents you from achieving one of the key goals of You Build It You Run It – an acceleration in deployment throughput, as a means to increase product revenues, keep BAU under control, and reduce operational costs. 

You need to re-implement change management for digital services and You Build It You Run It – but how?

Here are our 4 suggestions to remove the treacle

We recommend you aim for a twin-track approach to change management. This way you allow digital services to move at a faster pace than foundational systems, while allowing for change approvals and auditing. Try these practices and you will quickly see how you can cut out the treacle.

  • Pre-approve low risk, repeatable changes. These are known as ‘standard changes’ in ITIL. Establish a template with your change management team for pre-approving small digital services changes, and retain the regular process for large changes to digital services or foundational systems.
  • Encourage frequent, small changes. Incentivise your product teams to make their planned features and changesets as small as possible.  
  • Automate change auditing. Record all deployment pipeline changes in a persistent store, and create an information portal so your change management team can view ongoing changes.
  • Run regular chaos days. Ensure every product team runs Chaos Days, to validate their approach to change management, deployment failures, and incident handling.
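To make the change auditing practice more concrete, here is a minimal sketch that records each pipeline deployment in a persistent store, using SQLite from the Python standard library. The table layout, column names, and function names are assumptions made for this example, and a real information portal would sit on top of a query like recent_deployments.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_store(path):
    """Open (or create) the audit store. Use a file path for persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS deployments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               service TEXT NOT NULL,
               version TEXT NOT NULL,
               change_request TEXT NOT NULL,
               deployed_by TEXT NOT NULL,
               deployed_at TEXT NOT NULL
           )"""
    )
    return conn


def record_deployment(conn, service, version, change_request, deployed_by):
    """Called by the deployment pipeline after each production deployment."""
    deployed_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO deployments "
            "(service, version, change_request, deployed_by, deployed_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (service, version, change_request, deployed_by, deployed_at),
        )


def recent_deployments(conn, limit=20):
    """Newest first: the view a change management portal might show."""
    rows = conn.execute(
        "SELECT service, version, change_request, deployed_by, deployed_at "
        "FROM deployments ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

In a quick experiment you could call open_audit_store(":memory:"), which keeps the data in memory only. Passing a file path gives you the persistence that an audit trail needs.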

This will result in a streamlined change management process for digital services, akin to:

  1. Product manager creates a pre-approved change request template
  2. For any low risk and repeatable changes, automate filling in a pre-approved change request whenever a release candidate passes all functional tests
  3. For any high risk or unrepeatable change, send a change request to the change management team for approval
  4. Automatically check prior to a production deployment that an approved change request exists.
  5. Automatically close the change request when the deployment is completed.
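Steps 2 to 5 above could be wired into a deployment pipeline along the following lines. This is a sketch under stated assumptions, not a definitive implementation: ChangeRegister is a hypothetical in-memory stand-in for whatever change management tool you use, and all names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Status(Enum):
    PENDING_APPROVAL = "pending approval"
    APPROVED = "approved"
    CLOSED = "closed"


@dataclass
class ChangeRequest:
    release: str
    status: Status


class ChangeRegister:
    """Hypothetical in-memory stand-in for a change management tool."""

    def __init__(self) -> None:
        self._requests: Dict[str, ChangeRequest] = {}

    def raise_for_release(self, release: str, low_risk_repeatable: bool,
                          functional_tests_passed: bool) -> Optional[ChangeRequest]:
        """Step 2: pre-approve automatically. Step 3: queue for approval."""
        if low_risk_repeatable:
            if not functional_tests_passed:
                return None  # not a release candidate yet
            status = Status.APPROVED
        else:
            status = Status.PENDING_APPROVAL
        request = ChangeRequest(release, status)
        self._requests[release] = request
        return request

    def approve(self, release: str) -> None:
        """Manual approval by the change management team (step 3)."""
        self._requests[release].status = Status.APPROVED

    def can_deploy(self, release: str) -> bool:
        """Step 4: an approved change request must exist."""
        request = self._requests.get(release)
        return request is not None and request.status is Status.APPROVED

    def close(self, release: str) -> None:
        """Step 5: close the request once the deployment has completed."""
        self._requests[release].status = Status.CLOSED


def deploy(register: ChangeRegister, release: str,
           run_deployment: Callable[[str], None]) -> bool:
    """Deploy only if the gate passes, then close the change request."""
    if not register.can_deploy(release):
        return False
    run_deployment(release)
    register.close(release)
    return True
```

The design choice worth noting is that the gate in step 4 is the same for both tracks. A low risk change and a high risk change differ only in how the request reaches the approved state, so the pipeline itself stays simple.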

That’s it. It’s the most effective way we have discovered to remove your change management treacle, and allow your You Build It You Run It adoption to be successful! We hope it helps.

 

To find out more, you can continue our You Build It You Run It pitfalls series:

  1. 7 pitfalls to avoid with You Build It You Run It
  2. 5 ways to minimise your run costs with You Build It You Run It
  3. Why your head of operations shouldn’t be accountable for digital reliability
  4. How to manage BAU in product teams
  5. 4 ways to remove the treacle in change management – you are here!

 

Our You Build It You Run It page has loads of resources on on-call product teams – case studies, conference talks, in-depth articles, and more. Plus our You Build It You Run It playbook gives you a deep dive into how to make it happen! Get in touch, and let us know what you think.