John Lewis & Partners
A successful retail cloud platform approach
This case study will help you to understand:
How deliberate, well-designed experimentation can change attitudes to modernising a monolith
Why a Digital Platform must be treated as a product, rather than a project
The importance of culture in a Digital Platform ecosystem
The very first steps - the start of a much bigger journey
Embarking on a new approach to modernising multiple monoliths
John Lewis & Partners is one of Britain’s oldest, largest, and most popular retailers. As of February 2021, it has 42 stores across the UK and operates the £3 billion retail website johnlewis.com. The company is co-owned by its 38,000 employees and is renowned for its exceptional customer service. The website provides customers with everything that is available in stores, from homeware and fashion to electrical merchandise.
In 2017, John Lewis & Partners recognised that they were finding it difficult to modernise the multiple monoliths that comprised johnlewis.com. For example, at the time they were releasing a backend off-the-shelf e-commerce application once a month, hosted in their data centres. They knew that new approaches were going to be needed.
John Lewis Digital Platform (JLDP) was initiated by a small group of engineers empowered to run internal, organic experiments. It was led by Partners, with subject matter experts from Equal Experts.
The team built up data to demonstrate that they had chosen the right route forward.
Powered by copious deliveries of pizza, the JLDP team knew that this could serve as the foundation for a new way of working. But, to ensure the experiment was effective, the service needed to satisfy all operational, governance and security requirements in the cloud. The correct choice of service was critical. Because it needed to be a high-volume, critical service, the monolithic ‘Browse’ frontend of johnlewis.com was deliberately chosen.
Thanks to a lot of hard work, Browse mobile traffic, which accounted for 40% of site traffic, was diverted to the cloud, and a few months later all the desktop traffic was flipped over. The Browse application was then removed from the on-premise data centres.
An increasing number of teams and business services quickly saw the benefits. From the outset, JLDP was established as a continuously evolving product, rather than a project with a fixed lifespan, and it went from strength to strength.
From A Year In Google Cloud, by Alex Moss, Partner at JL&P and platform architect:
The frontend of johnlewis.com — what we call ‘Browse’ — wasn’t the first thing we built in Google Cloud Platform (GCP). There were a couple of teams deliberately given the freedom to experiment in GCP, and they built a number of smaller apps that could quickly get into production. This helped cultivate the idea that this was good technology to be working with, and we should start using it for bigger things. That, plus the fact that our engineers were chomping at the bit to get their hands on it, really helped generate the initial push it needed. Adopting the cloud for johnlewis.com really felt like an engineer-led venture — more so than any other piece of work I’ve been involved with in my time at John Lewis Partnership.
Choosing to host something business-critical in the cloud still felt like a big step though. As is often the way in large enterprises, we had to do a fair bit of convincing across IT that this was the right thing to do — or more accurately, that we were going about it the right way (the truth was that there was little doubt that the cloud was the future, but were we doing the right things in cloud, and in the right way?). I used to refer to this work as the “lightning rod” — if we were comfortable putting the very first thing our customers see onto cloud infrastructure, then surely we’d be OK putting other things there too? This really helped us move from something perceived as tactical to something we could brand internally as “strategic” and bring the various teams on the journey with us.
How John Lewis scaled their business with a Digital Platform
Establishing goals early in the process
John Lewis & Partners created a strategic, competitive advantage for themselves, by approaching their Digital Platform with dedicated, multi-year funding from the outset.
Three primary goals were established for the platform:
Increasing the Pace of Change
Traditional infrastructure had a complex Path To Production, typically requiring four weeks to release on top of the associated prioritisation and development effort.
Resilient & Secure Platform
Success would not be achieved without also maintaining the high standards for reliability and security that John Lewis customers had come to expect.
Developing the Skills of Partners
An opportunity for John Lewis people to grow their skills and capabilities. Building new services from scratch, Partners were equipped with modern tools and techniques, putting these directly in the hands of the engineers – for the benefit of everyone.
After the successful Browse frontend migration, demand for JLDP quickly grew. To adapt, the JLDP team built a fully automated, self-service pathway for service teams. They made it as easy as possible to create API microservices and frontends, to coalesce into digital services.
This was something that could be scalable for Black Friday customer traffic, and give teams the tools they needed to get up and running quickly.
John Lewis Digital Platform is a differentiating product, not a one-off commodity infrastructure project. The team works on a quarterly basis: it maps out capabilities, develops them using lean, kanban techniques, and silently launches new features to product teams with zero downtime. There is a focus on solutions underpinned by open source technology.
The importance of establishing a Paved Road at John Lewis & Partners
Adopting Continuous Delivery and Operability practices
JLDP is built on the Netflix concept of an opinionated Paved Road, and based on Continuous Delivery and Operability principles. Each platform capability offers a Paved Road, which consists of low-friction, hardened interfaces comprising user journeys for service teams. Those paved user journeys are fully automated, and encompass the learned best practices that are specific to John Lewis & Partners.
Each Paved Road has been built incrementally by the JLDP teams. They are frequently refined based on user feedback from service teams.
This approach allows John Lewis & Partners to eliminate common failure modes, by automating repetitive tasks. It encourages the adoption of Continuous Delivery and Operability practices, such as constant monitoring of live traffic. It also steers service teams away from pitfalls, such as End-To-End Testing.
Building a series of Paved Roads has challenged the service teams to rethink how they approach particular problems. They are able to contribute enhancements and features back into the Paved Road experience at any time. For example, the automated provisioning of microservice monitoring dashboards has been refined multiple times, as service teams learn more about dependency monitoring. Establishing an opinionated Digital Platform wasn’t without its challenges. The perception of removing an engineer’s agency has to be handled with care. This is one of the things that makes the Paved Road concept so impactful — you don’t have to stay on it.
The perspective of Alex Moss, Partner and platform architect at John Lewis & Partners:
This is where it got really interesting. This is where we said, ‘We’re going to start building everything out as micro-services – and run it on this platform in a multi-tenant way. We’re going to build something that is scalable, and by that we don’t just mean solving the problems for peak, but also in being able to get teams up and running properly, to give them the tools they need to build things quickly.’
We were very keen to make sure engineers felt a sense of agency, a sense of freedom. That’s why Docker was chosen because we wanted to be able to run almost anything programming language-wise in a Docker container. Then we started to form some opinions on top of that, started to say ‘we think this database technology is a good choice’ and ‘we think building it and deploying it this way is a better way of doing things’. And ‘if you do it this way, you’ll get a bunch of free stuff out of the box’ that will help you to go faster, do more.
The John Lewis Digital Platform is built on top of open-source technology and managed commodity services in Google Cloud wherever possible.
Software as a service tooling is used for platform capabilities such as version control and deployment pipelines. We use services like Google Kubernetes Engine and GCP’s managed databases as the raw platform building blocks, then assemble and configure them using automation tools like Terraform and GitLab CI in ways that meet the needs of our service teams, and substantially reduce our development costs at the same time.
One of the more significant steps we took was introducing our own Custom Resource Definition (CRD) to our platform along with its own Controller — known as the Microservice Manager.
Instead of declaring the usual Kubernetes primitives, teams are encouraged to instead define a Microservice which is our curated view of workloads running on JLDP. This is powerful, as it allows us to bake a whole load of great things in — such as resiliency and security configurations, telemetry tools, and management of secrets.
Our Platform Engineers look after this, freeing up Software Engineers to spend more time working on new features for the website, and less on how to do things in Kubernetes or GCP.
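As a rough illustration of the idea described above, the sketch below shows how a short, curated definition could be expanded into a full Kubernetes Deployment with platform defaults baked in. It is not the actual JLDP Microservice CRD or its Controller, whose schema is not described here. The class name, field names, default values, health check path and port, and the example service are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Microservice:
    """Hypothetical, simplified stand-in for a curated 'Microservice' definition."""
    name: str
    image: str
    replicas: int = 2  # assumed platform default, not a documented JLDP value


def to_deployment(ms: Microservice) -> dict:
    """Expand the short definition into a Kubernetes Deployment manifest (as a dict)."""
    labels = {"app": ms.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": ms.name, "labels": labels},
        "spec": {
            "replicas": ms.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": ms.name,
                            "image": ms.image,
                            # Defaults chosen by the platform, so service
                            # teams do not have to write them (assumed values).
                            "securityContext": {
                                "runAsNonRoot": True,
                                "allowPrivilegeEscalation": False,
                            },
                            "readinessProbe": {
                                "httpGet": {"path": "/healthz", "port": 8080}
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for demonstration only.
    service = Microservice(name="basket-api", image="example/basket-api:1.0.0")
    print(json.dumps(to_deployment(service), indent=2))
```

The service team supplies only a name and an image; everything else comes from the platform. In a real cluster this expansion would typically be done by a controller watching the custom resource, rather than by a function called directly.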
Creating collaborative team structures and a culture to succeed
Experimentation, feedback and using insights to develop new capabilities
Just as service teams want to build compelling experiences for customers on johnlewis.com, the JLDP team wanted to build a compelling Digital Platform for service teams to use.
New platform capabilities are frequently launched to service teams. The Microservice Manager is used to roll out new capabilities, such as automated configuration of Google Cloud Endpoints, to frontends and API microservices running in Kubernetes. A Paved Road pipeline automatically provisions a telemetry stack out of the box for all teams, including Prometheus availability alerts and Grafana dashboards. There is also frequent experimentation with new technology offerings.
What this means in practice is:
Frequent new capabilities launching through the Microservice CRD such as automatic configuration of Google Cloud Endpoints.
Frequent new Operability features launched through our Paved Road Pipeline (such as out-the-box Availability Alerts).
Frequent experimentation with new technology offerings (the latest examples are GCP’s Workload Identity, and PagerDuty).
The JLDP team was deliberately co-located with the onshore service teams, in order to foster close working relationships. This was particularly important in the early days of experimentation, when fast feedback was essential. This led to a culture of listening to user opinions, measuring outcomes, and then using insights to develop new platform capabilities.
“We have reduced access to production infrastructure from months to hours, and eliminated any additional efforts teams need to build telemetry and deployment pipelines. Our average live-to-customer timescale is 90 days, and it continues to fall as we remove constraints. Some services go live in under a week.”
The benefits of adopting a ‘You Build It, You Run It’ approach
Adopting a new method of digital service management
The JLDP team was one of the earliest ambassadors at John Lewis & Partners for the You Build It, You Run It operational model. Changing the operational model for digital services was critical if daily deployments and 99.9% reliability were to be achieved. The operations team at John Lewis & Partners continues to support backend office systems and low-level infrastructure.
This change meant addressing on-call processes and moving to modern communication tools like PagerDuty and Slack to manage support.
Replacing legacy applications in on-premise data centres with cloud-native digital services put John Lewis & Partners in a prime position to adopt this new method of digital service management. In addition to unlocking daily deployments for teams, it also resulted in a 3x faster time to respond to incidents. Putting developers on-call maximised their incentives to build in operability from the outset.
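To put the 99.9% reliability target in concrete terms, the sketch below converts a percentage target into a permitted amount of downtime. It assumes the target is read as time-based availability, and the 30-day and 365-day windows are illustrative choices; the case study does not state how reliability is measured or over what period.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Return the minutes of unavailability permitted by a target over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Assumed measurement windows; the source does not specify one.
    for days in (30, 365):
        budget = downtime_budget_minutes(99.9, days)
        print(f"99.9% over {days} days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, 99.9% allows roughly 43 minutes of downtime in a 30-day month, or just under nine hours in a year, which helps explain why fast incident response matters for a team that is also deploying daily.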
The success of the John Lewis & Partners Digital Platform
Differentiating services and keeping up with a fast-moving online retail landscape
The JLDP has enabled John Lewis & Partners to keep up with a fast-moving online retail landscape. It has put John Lewis & Partners in a position where differentiating services can be rapidly shared with their customers.
JLDP has dramatically reduced the time needed to create, enhance or innovate around online services. This gives service teams more time to focus on their customers, ensuring that commitment to them continues to be at the heart of John Lewis & Partners as the retail space evolves.
The JLDP team continues to build on the platform, continually evolving it to meet the varied needs of an increasing number of teams and digital services. Its success has now made JLDP a key ongoing outcome for the wider reinvention strategy at John Lewis & Partners.
The many achievements of JLDP include:
Centralised governance visualisations. JLDP has a service catalogue that automatically visualises key metrics on Continuous Delivery and Operability, such as the time it takes the service to have its first live customers, as well as the frequency, lead time, and throughput of deployments.
Ever-faster lead times to customers. The average live-to-customer timescale is now 90 days, instead of half a year or more, and it continues to drop.
Scale to Black Friday traffic for customers. JLDP provides so many automated performance and operability tools for digital services that peak business events like Black Friday are more or less unremarkable now. Even in 2020 with record levels of Black Friday traffic, there was little to worry about.
A step change in deployment throughput. John Lewis & Partners has gone from 10 deployments a year in 2016 to 4000 a year in 2019. This has happened without a corresponding increase in production incidents.
Industry award wins. John Lewis & Partners has won a number of industry awards for its digital transformation, including Best DevOps Cloud Project in 2019. Partners and Equal Experts team members attended the ceremony together, and had a great time.
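The deployment metrics mentioned in the list above, such as frequency and lead time, can be derived from simple deployment records. The sketch below is one possible way to compute them and is not how the JLDP service catalogue is implemented. The record format, the sample timestamps, and the definition of lead time as commit-to-production are assumptions; only the figures of 10 and 4000 deployments a year come from the case study.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (commit time, production deploy time) per deployment.
deployments = [
    (datetime(2019, 3, 4, 9, 0), datetime(2019, 3, 4, 11, 0)),
    (datetime(2019, 3, 4, 13, 0), datetime(2019, 3, 4, 14, 0)),
    (datetime(2019, 3, 5, 10, 0), datetime(2019, 3, 5, 13, 0)),
    (datetime(2019, 3, 6, 9, 30), datetime(2019, 3, 6, 10, 0)),
]


def median_lead_time_hours(records) -> float:
    """Median time from commit to production deploy, in hours."""
    hours = [(deployed - committed).total_seconds() / 3600
             for committed, deployed in records]
    return median(hours)


def deployments_per_day(records) -> float:
    """Deployments divided by the calendar days spanned (inclusive)."""
    days = [deployed.date() for _, deployed in records]
    span = (max(days) - min(days)).days + 1
    return len(records) / span


def fold_increase(before: float, after: float) -> float:
    """How many times larger 'after' is than 'before'."""
    return after / before


if __name__ == "__main__":
    print(f"Median lead time: {median_lead_time_hours(deployments):.1f} hours")
    print(f"Deployment frequency: {deployments_per_day(deployments):.2f} per day")
    # Figures from the case study: 10 a year in 2016, 4000 a year in 2019.
    print(f"Throughput increase: {fold_increase(10, 4000):.0f}x")
```

Applied to the figures in the case study, the last function shows a 400-fold increase in yearly deployments between 2016 and 2019.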