Why Kubernetes, Kafka, or Istio can derail your platform engineering efforts

Platform engineering means creating user-centric capabilities that enable teams to achieve their business outcomes faster than ever before. At Equal Experts, we’ve been doing platform engineering for a decade, and we know it can be an effective solution to many scaling problems. 

Unfortunately, it’s easy to get platform engineering wrong. There are plenty of pitfalls, which can contaminate your engineering culture and prevent you from sustainably scaling your teams up and down. In this series, I’ll cover some of those pitfalls, starting with the power tools problem.

How to measure a platform capability

A platform capability mixes people, processes, and tools (SaaS, COTS, and/or custom code) to provide one or more enabling functions to your teams. In order to stay user-centered and focussed on your mission, you need to measure a capability in terms of: 

  • Internal customer value. How much it improves speed, reliability, and quality for your teams. The higher this is, the faster your teams will deliver.
  • Internal customer costs. How much unplanned tech work it creates for your teams. The lower this is, the more capacity your teams will have.
  • Platform costs. How much build and run work it creates for your platform team. The lower this is, the fewer platform engineers you’ll need.
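To make the three measures concrete, here is a minimal sketch of how a platform team might record them for each capability. It assumes a hypothetical three-point ordinal scale (low, medium, high) and a hypothetical rule that flags any capability with a high cost. Neither is part of the framework above. The example levels mirror the qualitative descriptions of the v1 and v2 platforms later in this post, not measured data.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-point ordinal scale; not part of the original framework.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class CapabilityScore:
    name: str
    customer_value: Level  # higher is better: speed, reliability, quality for teams
    customer_cost: Level   # lower is better: unplanned tech work for teams
    platform_cost: Level   # lower is better: build and run work for the platform team

    def needs_simplifying(self) -> bool:
        # Hypothetical rule: flag a capability when either kind of cost is high.
        return Level.HIGH in (self.customer_cost, self.platform_cost)


# Illustrative levels only, mirroring the post's qualitative v1 and v2 claims.
kubernetes = CapabilityScore("Kubernetes", Level.HIGH, Level.HIGH, Level.HIGH)
fargate = CapabilityScore("Fargate", Level.HIGH, Level.LOW, Level.LOW)

for capability in (kubernetes, fargate):
    verdict = "simplify" if capability.needs_simplifying() else "keep"
    print(f"{capability.name}: {verdict}")
```

Running this prints "Kubernetes: simplify" and then "Fargate: keep". A real team would probably replace the ordinal scale with something it can actually measure, such as hours of unplanned work per sprint.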

Whether it’s data engineering or a microservices architecture, it’s all too easy for your well-intentioned platform team to make the wrong trade-offs, and succumb to a pitfall. Here’s one of those tough situations. 

The hidden costs of power tools

Implementing core platform capabilities with power tools like Kubernetes, Kafka, and/or Istio is one of the biggest pitfalls we regularly see in enterprise organizations. Power tools are exciting and offer a lot of useful features, but unless your service needs are complex and your platform team knocks it out of the park, those tools will require a lot more effort and engineers than you’d expect. 

Here’s a v1 internal developer platform, which uses Kubernetes for container orchestration, Kafka for messaging, and Istio for service mesh. A high level of internal customer value is possible, but there are also high internal customer costs and a high platform cost. It’s time-consuming to build and maintain services on this platform.

Figure: v1 of an internal developer platform, shown as a heavy weight containing Kubernetes, Istio, and Kafka capabilities, alongside a horizontal bar chart showing high internal customer value, high internal customer costs, and high platform costs.

This pitfall happens when your platform team prioritizes the tools they want over the capabilities your teams need. Teams will lack capacity for planned product work, because they have to regularly maintain Kubernetes, Kafka, and/or Istio configurations beyond their core competencies. And your platform team will require numerous engineers with specialized knowledge to build and manage those tools. Those costs aren’t usually measured, and they slowly build up until it’s too late.

For example, we worked with a Dutch broadcaster whose teams argued over tools for months. The platform team wanted Kubernetes, but the other teams were mindful of deadlines and wanted something simpler. Kubernetes was eventually implemented, without a clear business justification. 

Similarly, a German retailer used Istio as their service mesh. The platform team was nervous about upgrades, and they waited each time for a French company to go first. There was no business relationship, but the German retailer had a documented dependency on the French company’s technology blog.

Transitioning from heavyweight to lightweight tools

You escape the power tools pitfall by replacing your heavyweight capabilities with lightweight alternatives. Simpler tools can deliver similar levels of internal customer value at much lower cost. For example, transitioning from Kubernetes to ECS can reduce internal customer costs, because teams need to know less and do less, and also lower your platform costs, because fewer platform engineers are required.

Here’s a simple recipe to replace a power tool with something simpler and lower cost. For each high-cost capability, use the standard lift and shift pattern:

  • Declare it as v1, and restrict it to old services
  • Rebuild v1 with lightweight tools, and declare that as v2
  • Host new services on v2
  • Lift and shift old services to v2
  • Delete v1
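The lift and shift steps above can be sketched as a small tracker for a single capability. This is an illustrative model, not a tool the post describes: the class, its method names, and the service names ("orders", "payments", "recommendations") are all hypothetical. It encodes the two rules that matter most: new services only go onto v2, and v1 can only be deleted once every old service has moved.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityMigration:
    """Hypothetical tracker for one capability's move from v1 to v2."""
    v1_services: set[str] = field(default_factory=set)
    v2_services: set[str] = field(default_factory=set)
    v1_deleted: bool = False

    def host_new_service(self, service: str) -> None:
        # v1 is restricted to old services, so new services always land on v2.
        self.v2_services.add(service)

    def lift_and_shift(self, service: str) -> None:
        # Move an old service from v1 to v2; raises KeyError if it isn't on v1.
        self.v1_services.remove(service)
        self.v2_services.add(service)

    def delete_v1(self) -> None:
        # Refuse to delete v1 while any old service still depends on it.
        if self.v1_services:
            raise RuntimeError(f"v1 still hosts: {sorted(self.v1_services)}")
        self.v1_deleted = True


migration = CapabilityMigration(v1_services={"orders", "payments"})
migration.host_new_service("recommendations")
# sorted() returns a copy, so removing from v1_services inside the loop is safe.
for old_service in sorted(migration.v1_services):
    migration.lift_and_shift(old_service)
migration.delete_v1()
print(sorted(migration.v2_services), migration.v1_deleted)
```

This prints `['orders', 'payments', 'recommendations'] True`. Calling `delete_v1()` before the last old service has moved raises an error instead of silently stranding it.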

As with any migration, resist the temptation to put new services onto v1, and design v2 interfaces so migration costs are minimized. Here’s v2 of the imaginary developer platform, with Fargate, Kinesis, and App Mesh replacing Kubernetes, Kafka, and Istio. Capability value remains high, and costs are much lower.

Figure: v2 of the platform, with the heavyweight v1 capabilities replaced by lightweight Fargate, Kinesis, and App Mesh capabilities, alongside a horizontal bar chart comparing the high internal customer and platform costs of v1 with the lower costs of v2.
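One way to act on the advice to design v2 interfaces so migration costs are minimized is to have services code against a small platform-owned interface rather than a specific tool. The sketch below assumes a hypothetical `EventPublisher` interface and uses in-memory stand-ins for the v1 (Kafka-backed) and v2 (Kinesis-backed) adapters. Real adapters would wrap the relevant client libraries, which is deliberately not shown here.

```python
from typing import Protocol


class EventPublisher(Protocol):
    """Hypothetical platform-owned interface that services depend on."""

    def publish(self, topic: str, payload: bytes) -> None: ...


class InMemoryPublisher:
    """Stand-in for a v1 or v2 messaging adapter; records what it is sent."""

    def __init__(self, backend: str) -> None:
        self.backend = backend  # label only, e.g. "kafka" for v1 or "kinesis" for v2
        self.sent: list[tuple[str, bytes]] = []

    def publish(self, topic: str, payload: bytes) -> None:
        self.sent.append((topic, payload))


def place_order(order_id: str, publisher: EventPublisher) -> None:
    # Hypothetical service code: it knows about EventPublisher, not Kafka or Kinesis.
    publisher.publish("orders", order_id.encode("utf-8"))


v1 = InMemoryPublisher("kafka")
v2 = InMemoryPublisher("kinesis")
place_order("order-42", v1)  # before the lift and shift
place_order("order-43", v2)  # after it: only the wiring changed, not the service
```

With this design, lifting and shifting a service onto v2 is mostly a change to which adapter it is given, which keeps the per-service migration cost low.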

Conclusion

Power tools are a regular pitfall in platform engineering. Unless your platform team can build and run them to a high standard, they’ll lead to a spiral of increasing costs and operational headaches. Transitioning to lighter, more manageable solutions means you can achieve a high level of internal customer value as well as low costs.

A good thought experiment here is “how many engineers want to build and run Kubernetes, Kafka, or Istio a second time?” My personal experience is not many, and that’s taking managed services like EKS and Confluent into account.

I’ll share more platform engineering insights in my talk “Three ways you’re screwing up platform engineering and how to fix it” at the Enterprise Technology Leadership Summit Las Vegas on 20 August 2024. If you’re attending, I’d love to connect and hear about your platform engineering challenges and solutions.

We’ve previously written about how our decision-making framework, the Advice Process, empowers our employees to make decisions within Equal Experts. We’ve recently made our process easier to consume, in the form of an open-source Advice Process Playbook.

We hope that by making this publicly available we can enable those outside our organisation to learn more about how we work and maybe even adapt it for their own purposes.

As our understanding of how we run EE matures, we’ve come to a point where we’re more confident about presenting some of the less mainstream ways that we work. We’ve long talked about providing a haven where we treat people like grown-ups, and this is one area in which we have taken concrete steps to put our words into practice.

This playbook reflects on our decision making and corporate governance journey so far, and provides hints and tips that other organisations can apply when trying a new approach to making decisions.

How we use the Playbook

The playbook explains in detail how employees at Equal Experts use the Advice Process to manage change and make decisions while providing teams a high degree of autonomy. It provides context about how this style of decision making fits in with other systems, describes the process in detail and explains how it is implemented within EE.

Our main objective remains to treat people as grown-ups, and we believe that the Advice Process provides a framework that taps into the talent of our team more deeply than other, more traditional approaches, thereby improving the overall quality of decision making within Equal Experts.

Sharing our thinking 

The ideas captured within the playbook originated in a book written by Dennis Bakke and were elaborated by people within the Equal Experts network. We’d like others to benefit from our playbook and use it within their organisations. For that reason, we’ve published the playbook under a Creative Commons license and hosted it on GitHub for anyone to contribute to, or even fork and modify to suit their context.


We hope this contribution helps advance our collective understanding of distributed management systems. Making Decisions. Better.

If you would like to know more, please get in touch.

Sunshine, puppies, food and friends – sound good to you? Well, it pretty much summed up our plans for the Equal Experts summer barbecue, which we held on Sunday in the grounds of beautiful Grove House, in Roehampton, London.

Around 150 Equal Experts, partners, kids and yes, their dogs made it along. And amazingly (especially when working with children and small animals), every element of the plan clicked into place – even the sunshine!

It was great to see so many of our colleagues’ loved ones joining us, and with entertainment and music on hand to keep everyone busy – as well as the food – the afternoon flew by. Impromptu cricket and tree-climbing were soon added to the entertainment on offer, too.

As always when we get together away from work, it was a reminder of what a great bunch of people we’re lucky enough to work with. We’re growing fast this year, so who knows – start by following the Equal Experts LinkedIn page and perhaps we’ll see you at our Christmas party?