HM Revenue & Customs

A case of pandemic agility.

How HMRC supported the economy in four short weeks.

In March 2020 the UK went into lockdown, causing the most brutal recession in living memory. The Government had to react quickly, with policies to help the people and businesses of the United Kingdom cope. These included the Coronavirus Job Retention Scheme, the Self-Employment Income Support Scheme, and the Eat Out to Help Out Scheme. Once these initiatives were launched, Her Majesty’s Revenue and Customs (HMRC) needed to deliver a whole new system, capable of dealing with huge spikes in traffic. And it had to be designed, implemented, and delivered in a matter of weeks. Here’s how it was possible.

This case study will help you to understand:

How good digital services can be launched to customers within days.

Why a Digital Platform must be treated as a product, rather than a project.

The importance of culture in a Digital Platform ecosystem.


01

How HMRC set up a team to work at pace, and at scale.

Working quickly - creating a team to deliver in just four weeks.

As the Chancellor of the Exchequer was rethinking the economic structure of the country, HMRC was equally busy gathering together several blended teams of the most talented people in digital engineering.

A mix of partners enabled HMRC to jointly develop the strong cloud technology architecture, and improve the diversity of the organisation. The teams comprised experts technically capable of delivering a practical, straightforward system, one that needed to withstand demand at levels HMRC had never experienced before.

  • 67,000

    job claims made within half an hour of the Job Retention Scheme going live

  • 440,000

    people applied for government grants via the Self-Employment Income Support Scheme on the first day of its operation

The scale of the work.

HMRC is an established government department, with a multitude of suppliers, hundreds of teams and thousands of engineers.

The platform and services needed to deliver financial support to more than 12 million employed and self-employed workers via the Coronavirus Job Retention Scheme, the Self-Employment Income Support Scheme and the Eat Out to Help Out Scheme.

Because of the timescales, the HMRC digital team structure needed to work differently. Specialists from each discipline needed to be represented at team level. A Platform engineer was embedded into each Digital Service team to safely “short-circuit” existing processes. This allowed us to eliminate risks early in the process and maximise collaboration.

This was adopted as a short-term measure. Within weeks of the first launch, we adapted Digital Platform tools and processes to make some of those “short-circuits” unnecessary.

Small technical teams with blended roles.

Cross-functional Digital Teams

 

The technical part of the implementation was delivered by the Equal Experts team at HMRC, interacting closely with a SurePay team. Although the Equal Experts team has a mixture of developers, a tester and a business analyst, each consultant has the experience and a broad enough skill set to work effectively across disciplines and focus on the highest-priority activity. Developers test things, the business analyst diagnoses production issues and makes GitHub commits, and the tester writes code and deploys to production.

In mid-April, HMRC connected the team with a group of suppliers to quickly integrate a new bank account checking service with HMRC’s Tax Platform. Two weeks after the first meeting, the HMRC delivery team was cutting code and had completed end-to-end testing. Two weeks after that, the integration was live in production, comfortably serving nearly 20 requests per second. One hour after going live, SurePay’s API ended up taking 100% of the traffic, due to a capacity issue with other downstream services. Integrating with a third-party service in just four weeks is a massive achievement, especially in the public sector. This was only possible through a Continuous Delivery approach.

02

The MDTP - how a single cloud platform provides room to build and grow.

Standing on the Shoulders of Giants.

The UK government’s rapid digital response to coronavirus (COVID-19) was a result of years of investment in people, governance and technology. The Multi-channel Digital Tax Platform (MDTP) is the cloud platform, hosted on Amazon Web Services (AWS), that was home to all the services provided. Having evolved out of earlier services and the lessons shared along the way, it provided a “paved road” that allowed our Digital Service teams to write code to solve HMRC’s problems from day one.

Discover more about cloud based platforms in our Digital Platform playbook.

MDTP is home to:

  • 130

    Digital Services

  • 900

    Microservices

The cloud platform enabled the HMRC campaign for coronavirus. It held up to record traffic – peaking at 132 million page views in a single week – cementing the importance of having a PaaS in a single public cloud.
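To put these volumes in perspective, the short sketch below converts two figures quoted in this case study into average rates per second. It is illustrative arithmetic only: the function name is our own, and an average over a whole window says nothing about the peaks within it, which will have been considerably higher.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds
SECONDS_PER_HALF_HOUR = 30 * 60      # 1,800 seconds


def average_rate_per_second(events: int, window_seconds: int) -> float:
    """Average number of events per second across a time window."""
    return events / window_seconds


# Figures quoted in this case study.
weekly_page_views = 132_000_000      # record week of page views
first_half_hour_claims = 67_000      # Job Retention Scheme claims after go-live

print(round(average_rate_per_second(weekly_page_views, SECONDS_PER_WEEK), 1))            # 218.3
print(round(average_rate_per_second(first_half_hour_claims, SECONDS_PER_HALF_HOUR), 1))  # 37.2
```

Sustaining an average of more than 200 page views per second for a full week is the kind of load that benefits from capacity that can be provisioned on demand.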

Functionality of MDTP

Understanding the Multi-channel Digital Tax Platform.

The Multi-channel Digital Tax Platform began development in 2014. It is a cloud platform for over 130 user-facing applications and 900 microservices that have been built as part of HMRC’s ‘making tax digital’ strategy. It provides an easy way for teams to build and deploy an application that can scale to handle millions of requests. It is a Platform as a Service (PaaS), a category of cloud computing service, and has been shaped by incremental learning since 2014. It now allows HMRC to develop, run, and manage applications without the complexity of building and maintaining the infrastructure.

Why do we recommend you host in a single public cloud?

A Digital Platform is a type of bespoke PaaS: a set of streamlined tools, processes, and people which form platform capabilities to help you and your organisation rapidly meet the needs of your customers.

There are different types of bespoke PaaS, based on workload. The most common types we’ve seen are Digital and Data Platforms. We advise that your organisation invests in a bespoke PaaS when a clear, homogeneous workload starts to emerge across multiple teams.

Equal Experts strongly recommend you host a bespoke PaaS in a single public cloud. Amazon Web Services (AWS), Azure, or Google Cloud Platform are the main providers. They each offer a tried and tested, on-demand, cloud computing platform, with a wide range of reliable cloud services that can be provisioned instantly and billed on a pay-per-use basis. We chose AWS for the HMRC project.

Discover more about designing Digital Platforms.

03

The importance of culture in providing focus in a project.

The HMRC response would not have been possible without the huge effort of the people who made it happen.

The response relied on digital capability and expertise to make policy into reality quickly – often going from concept to live in just a few weeks.

  • 4 weeks

    to launch all services

  • 97%

    customer satisfaction level

This was only possible with a can-do culture.

It is difficult to comprehend how impressive the delivery of these services is. HMRC digital services is a pool of very talented people – with a culture to match. Combine this with a team of consultants who were focused on delivery and it creates an environment where everything is possible.

The culture of Equal Experts.

At Equal Experts we trade on our ability to learn and share knowledge, rather than protecting or ‘guarding’ it. At its core, Equal Experts is a haven where this sharing happens freely and happily, between like-minded practitioners and our customers.

The People first approach.

The established culture and collaboration between Digital Service and Digital Platform teams was a foundation for the success of this project. Digital Platform teams were already used to bi-directional feedback loops with Digital Service teams. This meant everyone was able to run these at high speed, leveraging existing relationships and collaboration tools.

I am in awe of the problem-crunching fury with which HMRC and the Treasury created the furlough scheme and all the other means of support.

From Prime Minister Boris Johnson’s New Deal for Britain speech, June 2020

04

How to effectively collaborate at scale.

Collaboration was the foundation for the success of this project.

Another shared obstacle was the increase in remote working. In a matter of days the HMRC team had gone from being a mainly office-based workforce to having 55,000 people working from home. One of the most important aspects of delivering a system at speed is the ability of engineers to just ‘get on with it’. Governance, best practice, and a mandate for HMRC teams to use certain tools meant we were not risking new technology problems or security concerns, which would ultimately have delayed delivery.

Tools

Logos for AWS, GitHub, Jira Software, Slack, Kibana, Scala, MongoDB, Grafana, Jenkins and Confluence

The benefits of automating the setup.

  • Makes an engineer’s life easy by taking away the pain of setup
  • Eases the burden of costs for companies
  • Turns the setup itself into an onboarding document, which is validated every time a new team member sets up their machine
  • Gives a consistent setup across the team, which helps with effective pairing
  • Gets rid of the “works on my machine” phenomenon
  • Helps to keep the tools upgraded
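As an illustration of setup that validates itself, here is a minimal sketch of a check a new team member could run on their machine. The tool list is hypothetical and not taken from the HMRC project; a real team would keep its own list alongside its setup scripts.

```python
import shutil
from typing import Iterable, List

# Hypothetical tool list for illustration only.
REQUIRED_TOOLS = ["git", "java", "docker"]


def missing_tools(required: Iterable[str]) -> List[str]:
    """Return the required command-line tools that are not found on the PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All required tools are installed.")
```

Because every new joiner runs the same check, a stale or incomplete tool list tends to be noticed and corrected quickly.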

Why we used Slack for communication.

HMRC Digital relies heavily on Slack. It is very easy to have instant conversations but also keep track of what happens elsewhere. A Slack channel can be created ad hoc for a specific issue. Engineers, analysts and designers from multiple teams and specialisations can focus on collaborating and solving problems without endless meetings or endless email chains.

  • Questions for a specific team – the team’s public channel is the first option.
  • Private Slack channels – for day-to-day internal discussions that do not need to be aired in public.
  • A consistent naming convention – with good channel descriptions.

How to deliver at speed – Self-Service + Infrastructure-as-code.

The other really important aspect of working at speed is that as much as possible can be achieved using self-service. This covers logs, performance monitoring, automated alerting, and a build pipeline that automatically builds and deploys to the non-production environments. The platform even configures a Pull Request Builder that runs the tests before changes are merged into the main branch.

An engineer’s point of view.

Here is how one of the engineers on the project described setting up a new service using self-service:

  • Create a new GitHub repository – that is done via a job on the build server. There are a few templates available, so I picked a “backend” service, which created most of the boilerplate code of the service for me. The boilerplate includes things such as sending metrics to the metrics backend and authentication for incoming requests.
  • Create a new build pipeline. The pipeline is defined using a Domain Specific Language (DSL) in the build jobs repository. To create a new one, I created a pull request and a few minutes later, I had a pipeline that would build, test and deploy my service to the development, QA and staging environments.
  • Create some test scripts that create the database schemas. This would allow me to have the test database locally as well as using it for testing on the build server. It is defined “as code” in the service repository.
  • Create configuration entries – more Pull Requests
  • Create a Kibana dashboard for accessing logs – Pull Request
  • Create a Grafana dashboard for accessing metrics – Pull Request
  • Create alerting configuration (so that if something goes wrong I get woken up by PagerDuty) – another Pull Request

Within next to no time, I had all the supporting components in place: logs, performance monitoring, automated alerting, and a build pipeline that would automatically build and deploy to the non-production environments as soon as I committed to the main branch. It would even configure a Pull Request Builder that would run the tests before changes were merged into the main branch.

All this was done in a matter of minutes, giving me time to focus on writing the service itself.
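To make the idea of a pipeline defined “as code” more concrete, here is a minimal sketch in Python. It is not the DSL used on the Tax Platform, which this case study does not show. The class, field names and service name are hypothetical. It only illustrates how a short declarative definition can expand into build, test and deploy steps for the development, QA and staging environments mentioned above.

```python
from dataclasses import dataclass, field
from typing import List

# Environment names taken from the engineer's description above.
NON_PRODUCTION_ENVIRONMENTS = ["development", "qa", "staging"]


@dataclass
class ServicePipeline:
    """Hypothetical declarative pipeline definition for one service."""

    service_name: str
    template: str = "backend"
    environments: List[str] = field(
        default_factory=lambda: list(NON_PRODUCTION_ENVIRONMENTS)
    )

    def steps(self) -> List[str]:
        """Expand the definition into the ordered steps a build server would run."""
        steps = [f"build {self.service_name}", f"test {self.service_name}"]
        steps += [f"deploy {self.service_name} to {env}" for env in self.environments]
        return steps


pipeline = ServicePipeline("example-service")
for step in pipeline.steps():
    print(step)
```

Keeping the definition this small is what makes a pull request a practical way to request a pipeline: a reviewer can read the whole change at a glance.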

You Build It, You Run It.

At HMRC Digital, the goal is to give teams all the capabilities they need to develop, test, deploy, and operate their live services. This is sometimes known as “You Build It, You Run It”, and it’s been a key enabler in supporting so many teams on the Tax Platform. A Platform-as-a-Service offering maximum operability with minimal friction is easier for teams to operate safely and reliably.

The production support method of ‘You Build It, You Run It’ refers to developers supporting their own production services. It unlocks weekly or more frequent deployments for Digital Service teams, as it eliminates any operational handoffs. It also incentivises Digital Service teams to balance their time between product and operational features. The net result is more frequent production deployments and more reliable live services.

05

Interacting with legacy systems - understanding when it’s possible - and when it’s not.

Interacting with legacy systems.

Unless the project is built from scratch, there will always be a requirement to work around legacy systems. Each Digital Service required interactions with legacy systems across HMRC. The team needed to analyse each system to understand whether it was ‘fit for purpose’. Where it was possible to work with the system, the service teams leveraged existing well-defined interfaces to legacy systems. Where there was a potential risk of destabilising the whole process, the Platform team quickly and iteratively built well-defined interfaces to new software systems.

Organisations, over the years, build up tens or even hundreds of legacy IT systems (either off the shelf or bespoke). Typically, these systems are stand-alone or have limited ability to integrate with a wider enterprise. A Digital Platform can allow these disparate legacy systems to evolve into modern-looking, more flexible systems.

An example where risk was identified – and the HMRC response.

To ensure that the financial relief provided by HMRC’s three COVID-19 services reached the right people as quickly as possible, bank account details had to be validated. The unprecedented volume of traffic, peaking at 100 journeys per second, overwhelmed the third-party services used to validate bank details. HMRC wanted extra capacity and resiliency.

Digital transactions during COVID-19 were critical to service users’ livelihoods. This meant they needed to be as simple and low-stress as possible.

In mid-April, HMRC connected the team with a group of suppliers to quickly integrate a new bank account checking service with HMRC’s Tax Platform. Two weeks after the first meeting, the HMRC delivery team was cutting code and had completed an end-to-end test. Two weeks after that, the integration was live in Production, comfortably serving nearly 20 requests per second. The timing of this was incredibly fortunate: one hour after going live, SurePay’s API ended up taking 100% of the traffic, due to a capacity issue with other downstream services.
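This case study does not describe how traffic was routed between the bank account checking services, so the following is only a sketch of one common approach: try each provider in turn and fall back to the next when one reports it is at capacity. All names here are hypothetical, and the stand-in providers make no network calls.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider that cannot take the request, for example when at capacity."""


def check_account(
    providers: Sequence[Callable[[str, str], bool]],
    sort_code: str,
    account_number: str,
) -> bool:
    """Ask each provider in turn and return the first answer received."""
    for provider in providers:
        try:
            return provider(sort_code, account_number)
        except ProviderUnavailable:
            continue  # this provider cannot respond, so try the next one
    raise ProviderUnavailable("no bank account checking provider is available")


# Stand-in providers for illustration only.
def overloaded_provider(sort_code: str, account_number: str) -> bool:
    raise ProviderUnavailable("at capacity")


def new_provider(sort_code: str, account_number: str) -> bool:
    # Placeholder rule: a real service would check the details with the bank.
    return len(sort_code) == 6 and len(account_number) == 8


print(check_account([overloaded_provider, new_provider], "123456", "12345678"))  # True
```

With an arrangement like this, a second provider can absorb all of the traffic when the first is overwhelmed, without service users noticing the switch.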

Read more about this complete project here: How Equal Experts helped HMRC to get COVID-19 relief money to people through Continuous Delivery.

06

Why continuous delivery is an effective, secure solution for change.

How technology initiatives, if done in the right way, can be successful and fun.

A Digital Platform is never “done” – and what has come before is not a predictor of what is to come. Even when working to short timescales, this principle still stands. Throughout the HMRC project, a Continuous Delivery approach allowed more frequent feedback cycles, de-risked production deployments, and ensured production systems could be kept in a reliable, functioning condition.

How the Continuous Delivery model helped claimants to receive their COVID-19 relief money.

The large population of claimants accessing the COVID-19 services meant performance and resilience (in the face of high concurrent usage) were key concerns during service development. Meeting such requirements can be challenging.

How does Continuous Delivery work?

  • We start projects by building a “walking skeleton,” automating the path to production.
  • This is an end-to-end slice of working functionality.
  • It proves the architecture and the path to production, and provides a foundation to build upon incrementally.
  • We then automate the process of building and testing software in a repeatable and consistent way, from development, through test and into production.
  • We incrementally adopt Continuous Delivery and build a deployment pipeline to give us confidence the same software is deployed and tested through each stage.
  • This confidence allows us to release early and often, with small enhancements directly responding to customer feedback.
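The idea that the same software is deployed and tested at every step can be sketched as follows. This is a simplified illustration rather than the HMRC pipeline: the stage names and functions are assumptions, and a real pipeline would deploy and test at each stage instead of only recording a checksum.

```python
import hashlib
from typing import Dict

# Assumed stage order for illustration.
STAGES = ["development", "qa", "staging", "production"]


def build_artifact(source: bytes) -> Dict[str, str]:
    """Build once and record a checksum that identifies this exact artifact."""
    return {"checksum": hashlib.sha256(source).hexdigest()}


def promote(artifact: Dict[str, str], deployed: Dict[str, str], stage: str) -> None:
    """Deploy to a stage only if every earlier stage already holds the same artifact."""
    for earlier in STAGES[: STAGES.index(stage)]:
        if deployed.get(earlier) != artifact["checksum"]:
            raise RuntimeError(f"{earlier} has not run this artifact yet")
    deployed[stage] = artifact["checksum"]


artifact = build_artifact(b"walking skeleton v1")
deployed: Dict[str, str] = {}
for stage in STAGES:
    promote(artifact, deployed, stage)

print(len(set(deployed.values())))  # 1, because every stage runs the same artifact
```

Building once and promoting the same artifact means that what was tested is exactly what reaches production, which is where the confidence to release early and often comes from.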

Related: Smoothing the Continuous Delivery Path: A Tale of Two Architectures
