A big night for HMRC’s Customer Insight Platform team at the UKIT awards!
Congratulations to HMRC’s Customer Insight Platform (CIP) team for winning the Best Public Sector IT Project category at last night’s UKIT Awards!
And a shout out to Fiona Teddy-Jimoh for being highly commended in the Business Analyst category, and to Andrew Letherby for being a finalist in the Digital Leader of the Year category.
“What the CIP team achieved at the beginning of the Covid-19 pandemic was nothing short of remarkable. They responded to possible fraud threats to the COVID-19 schemes and rose to meet the needs of 2020 in a way that no UK government had ever responded before, by giving money away on an unprecedented scale.” Caitlin Smith, Delivery Lead, Equal Experts
As Covid-19 emerged in early March 2020, before the country was aware of the full extent of the crisis about to happen, the Customer Insight Platform (CIP) team at HMRC proactively pulled together proposals in anticipation of being asked to prevent the widespread fraud of Covid-response schemes. As the news of the Treasury’s plans broke fast, the team rapidly responded by supporting the Statutory Sick Pay (SSP) response, the Job Retention Scheme (JRS) and the Self-Employment Income Support Scheme (SEISS).
It was obvious that the fraud protection case work resulting from Covid-related schemes was likely to inundate any manual components of the existing processes. The multitude of challenges were exacerbated by the timing – the team was already at capacity supporting the acute spike of work associated with the Self-Assessment (SA) process at the end of the tax year. At the same time, the Chancellor requested better access to data from the banking industry.
The Customer Insight Platform (CIP) team had a three-week window to get a comprehensive fraud protection solution in place. Millions of claims submitted by taxpayers were automatically assessed for fraud detection by the CIP Insight service, supporting the work that HMRC internal fraud investigators were doing in very difficult circumstances. The pandemic accelerated the development of HMRC’s first upfront digital risking service. This was tantamount to a revolution in the speed and accuracy of the fraud protection process.
Key results:
- The team received personal notes of gratitude from the Prime Minister and the Treasury.
- The team made 12 months of progress in one month, with no increase in available resources.
- Manual effort was reduced by 90% for the Customer Insight Platform team
- The risk automation for the first Covid response service took 8 weeks to build but later services only took 2 weeks.
But in this new world awaited new opportunities for attempted fraud. In handling risk of fraud across three brand new government schemes, with accelerated timelines, the team achieved the impossible. They:
- Provided timely and valuable insights to frontline services, helping ensure the right money went to the right people at the right time.
- Used their wealth of experience to find patterns in the data, whilst reflecting on lessons learned from previous efforts.
- Were empowered by leadership to respond rapidly to the changing landscape, which allowed them to delight colleagues with the speed by which new services were spun up.
- Exercised their professional discipline and creative freedom to meet their users’ needs.
And the accolades don’t stop there. HMRC’s CIP team has also received recognition and wins in other awards programmes this year as follows:
- AI and Machine Learning Awards – WINNER! Special Award: Pandemic Performance.
- UK Dev Awards – WINNER! Alex Browne, All-Round Hero category. FINALIST: Project of the Year, Platform Tool of the Year, Rising Star (Fiona Teddy-Jimoh).
- Computing Digital Technology Leaders Awards – FINALIST: Best Public Sector Digital Project.
- COVID Response Awards – Shortlisted, awards on 25th November: Best COVID Response involving the Public Sector category.
So as you can see, the achievements of this team are truly remarkable and very deserving of all the accolades. Our heartfelt congratulations again to HMRC’s CIP team, as well as to all the other winners and finalists last night.
This is the story of how HMRC, an organisation able to trace its roots back to the Middle Ages, used technologies such as Slack to meet the challenges of the Covid-19 pandemic.
In March 2020, HMRC Digital was busy—very busy. Teams were working hard to support the acute spike of work that always accompanies the preparations for the new tax year. Then Covid-19 hit.
In the early days of the national lockdown, everyone at HMRC Digital understood that the sudden and enforced shift to fully remote working had to be achieved without disrupting business as usual. And just as teams were adjusting to those new ways of working, the Treasury announced an unprecedented range of Covid-response schemes (including ‘Eat Out to Help Out’ and the Job Retention Scheme, otherwise known as ‘furlough’). Getting these efforts launched quickly and sustainably was crucial, if colleagues, friends, and neighbours were going to continue to pay their mortgages and feed their families.
The award-winning outcomes HMRC Digital delivered during that strange and difficult period depended on teams being able to collaborate effectively while physically separate. A major part of achieving this relied on getting the most out of HMRC Digital’s tools of choice, one of which was Slack.
Separate but Together
When Covid-19 forced the country into lockdown, HMRC Digital was already using Slack. This was a crucial rock on which to build effective remote-first ways of working. Indeed, preserving the human element of work was deemed essential for HMRC Digital to succeed.
Slack had helped HMRC Digital colleagues – 60 teams comprising some 1,500 people across eight locations – move their conversations out of siloed email inboxes into shared channels, thereby creating an archive of discussions and decisions. Tapping into the collective knowledge of the organisation had become simple and seamless. And when the pandemic struck, communities were strengthened despite physical separation. Slack enabled colleagues to react to good news around the organisation in real time, and celebrate shared successes, regardless of where they were in the country. And throughout the public health crisis, HMRC Digital used Slack to collaborate with its external partners in shared channels at the right time, in the right way—and that included Equal Experts, which introduced Slack to HMRC Digital in the first place.
Busy Bees
Equal Experts is a leading software consultancy, and an enthusiastic user of Slack since 2016. In its most popular channel there are now over 2,000 users, which is all the more remarkable because it only employs 800 people! Slack is the glue that binds together the organisation’s complex network of employees, associates, clients (past and present) and external partners.
Part of the value Equal Experts brings to its engagements is the cross-pollination of best practice. In that spirit, it shared with HMRC Digital its experiences with Slack, including advice on how to get the most out of the tool, such as establishing:
- “Ask” channels, so people can request help and advice
- “Temporary” channels, so people can gather around an urgent idea or challenge
- “Community” channels (e.g., by location or by tool), to help people share news and connect with one another
- “Public” channels (the default at Equal Experts), so people can freely share knowledge, celebrate successes, and even tell jokes!
- “Private” channels, used when sensitive information is being shared
When HMRC Digital made the decision to adopt Slack, Equal Experts gave the organisation one last piece of advice: rely on your people to self-organise. That is, colleagues should be trusted to only join (and stay in) channels that are relevant and useful to them. While writing lengthy guidelines on using Slack can prove handy, and every once in a while it might be valuable to nudge people toward or away from certain channels (or behaviours), the core spirit of autonomous decision-making is crucial.
After HMRC Digital started using Slack, it began to realise the channel-based messaging platform’s value in the context of its unique situation, and HMRC Digital employees made it their own. In doing so, they laid the critical foundations that they’d rely on later, when the Covid-19 pandemic hit.
To Infinity and Beyond
Remote first. Peak BAU demand. New ways of working. Massive, unexpected challenges. All these factors could have combined to be detrimental to HMRC Digital’s performance over the past year. But making clever use of tools such as Slack has helped keep the teams safe, happy and productive. In fact, the teams at HMRC Digital have excelled across the board, achieving outcomes for the nation above and beyond expectations. They’ve won multiple awards, including the ‘Best Public Sector IT Project’, and were highly commended for the ‘Best Use of Cloud Services’ by the BCS (The Chartered Institute of IT). Furthermore, the platform team earned a 97% satisfaction rating from customers, prevented billions of pounds of potential fraud, and received personal notes of gratitude from both the Prime Minister and the Treasury. That is a remarkable good news story, and we all need those now more than ever.
This post describes how I developed the Discount Dining Finder, a lookup map tool for the Eat Out to Help Out scheme in my spare time. The aim of this post is to provide an insight into how problems of scaling services can be solved by having no servers and not using “serverless services” either.
Aperitif
A really nice side effect of working in a high functioning environment is that sometimes you’re involved in bouncing ideas off each other. The delivery teams at HMRC were working on releasing yet another service to the public in less time than it takes you to say “Agile”. This scheme was called Eat Out to Help Out.
The scheme would consist of different journeys:
- registering a restaurant,
- searching for registered establishments for the public and
- making claims for payment.
Out of these three, the biggest unknown in terms of expected volume was the “search journey” to be used by the general public. In this journey, a user would enter a postcode, and registered establishments inside an X mile radius would be displayed. There was a large number of unknowns in terms of how much traffic was to be expected on the HMRC service.
- Would there be big peaks at lunchtime or dinnertime?
- What if Martin Lewis goes on TV, recommends visiting the site and the, two minutes later, 10% of the country wants to find out information about their local eateries?
- Could it impact other HMRC services (the tax platform hosts a multitude of services)?
Now, the tax platform is a very scalable and robust platform and I am not for one minute suggesting that there was going to be a problem using microservices and geo-location in Mongo at scale, but one of the ideas that I floated centered around the fact that the information is fairly static. Sure enough, “eat out” businesses register their premises with HMRC, but once they are registered, the bulk of information will not change. Postcodes and distances between them are not that changeable. So that’s when I wondered, whether this could be delivered in a static site.
Starter
I went away and found that freemaptools provides me with a list of UK postcodes and their associated latitude/longitude. In that file, there are 1,767,875 postcodes. Searching almost 2 million records sounds like the job for a server and a database, doesn’t it? Erm, no.
Looking at the postcode file
$ head -10 ukpostcodes.csv
id,postcode,latitude,longitude
1,AB10 1XG,57.144165160000000,-2.114847768000000
2,AB10 6RN,57.137879760000000,-2.121486688000000
3,AB10 7JB,57.124273770000000,-2.127189644000000
4,AB11 5QN,57.142701090000000,-2.093295000000000
Instead of searching a single ukpostcodes.csv
(95 MB) every time, I decided to “shard” or “partition” my CSV file into smaller files:
./A/B/AB10.csv
./A/B/AB11.csv
./A/L/AL1.csv
./A/L/AL10.csv
./A/L/AL2.csv
./B/1/B1.csv
./B/1/B10.csv
./B/2/B2.csv
./B/2/B20.csv
Each file is split into directories by their first letters. So if I want to find out about postcode AB12 4TS
, I’d split up the the outcode (AB12
) into /A/B/AB12.csv
. That file would only have 799 entries. Searching them manually is much more palatable.
So I’ve got my main page and the user would enter their postcode
And I can search for the postcodes simply by using a bit of Javascript inside the user’s browser.
d3.csv("outcode/" + outcode[0] + "/" + outcode[1] + "/" + outcode + ".csv")
.then(function(postcodes) {
result = postcodes.find(d => normalisePostcode(d.postcode) === postcode);
if (result) {
d3.select("#status").text("");
mapid.panTo(new L.LatLng(result.lat, result.lon));
} else {
d3.select("#status").text("Postcode not found")
}
})
D3 is a great library for visualisations, but I also found it very useful for reading and processing CSVs in Javascript, and the files can be served up by a static web server.
Great! But how do I get my directory structure? I did not fancy manually copying and pasting the file. You think that surely now is time to unleash some NoSQL database or at least some Python. But no, I decided to keep it simple and use a combination of shell scripts and AWK:
awk -F, -f split_outcodes.awk target/ukpostcodes.csv
The split_outcodes.awk
script did the hard work of creating new files in the correct directory.
$1 != "id" && $3 < 99.9 {
prev=file;
split($2, f, " ");
outcode = f[1]
outcode1 = substr(outcode, 1, 1)
outcode2 = substr(outcode, 2, 1)
file="target/outcode/" outcode1 "/" outcode2 "/" outcode ".csv";
if (prev!=file) close(prev);
if (headers[file] != "done") {
print "id,postcode,lat,lon" >> file;
headers[file] = "done"
}
print $0 >> file;
}
This resulted in 2,980 files. The biggest of those was 145KB which corresponded to 2,701 postcodes. Now that’s much better than looking up 1.7 million postcodes for every search!
Soup
I didn’t mention the Discounted Dining Finder had a map. A quick overview on setting that up!
I used LeafletJS – an open source map. Here’s how:
mapid = L.map('mapid');
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
maxZoom: 19,
attribution: '© <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors'
}).addTo(mapid);
markerLayer = L.layerGroup().addTo(mapid)
And I had a map!
Fish
That map didn’t have anything on it yet! I was able to convert a postcode into lat/lon though. The next step was to look up the restaurants. I decided to keep running with the idea of doing all my computations on the user browser (desktop or phone).
First of all, I found that the UK postcodes were covering an area of:
$ cut -f3 -d, ukpostcodes.csv | awk -F, 'BEGIN { max = -999; min = +999; } /[0-9.-]+/ { if ($1 > max) max = $1; if ($1 < min) min = $1; } END { print min, max; }'
49.181941000000000 60.800793046799900
$ cut -f4 -d, ukpostcodes.csv | awk -F, 'BEGIN { max = -999; min = +999; } /[0-9.-]+/ { if ($1 > max) max = $1; if ($1 < min) min = $1; } END { print min, max; }'
-8.163139000000000 1.760443184261870
I calculated that the rectangle (60.80 N/-8.16 W) – (49.18 N/1.76 E) covered about 400 miles from west to east and 800 miles from north to south. My aim was to provide a lookup that could find all restaurants in a 5-mile radius, so I split my search area up into tiles of roughly 5×5 miles. Here’s my translation function:
var x = parseInt((+result.lat - 49.0) / (12.0 / 160.0))
var y = parseInt((+result.lon + 9) / (11.0 / 80.0))
That would give me a coordinate set for a tile. So the Buckingham Palace (51.5 N/-0.14 W) would be at coordinates (33/64). Based on that, I could build another set of files:
target/pubgrid
target/pubgrid/0
target/pubgrid/1
target/pubgrid/10
target/pubgrid/100
target/pubgrid/100/100-19.csv
target/pubgrid/100/100-20.csv
target/pubgrid/100/100-21.csv
Whereby all the eateries that are in coordinates (33/64) would be in the file pubgrid/33/33-64.csv
. That file would look like this:
name,postcode,lat,lon
blue racer and frilled lizard,BR1 1AB,51.406270892812800,0.015176762143898
saltwater crocodile and blue racer,BR1 1LU,51.401706890000000,0.017463449000000
king cobra and Schneider python,BR1 1PQ,51.406421920000000,0.012595296000000
The javascript can then find the suitable restaurants like so:
d3.csv("pubgrid/" + x + "/" + x + "-" + y + ".csv")
.then(function(pubs) {
let inRange = pubs
.map(a => ({ ...a, distance: distance(result, a)}))
.filter(a => a.distance < (5 * 1609.34))
.sort((a, b) => a.distance - b.distance)
.slice(0, 250)
d3.select("#results").selectAll("tr")
.data(inRange)
.join("tr")
.selectAll("td")
.data(d => [ d.name, d.postcode, (d.distance / 1609.34).toFixed(2) + " miles away" ])
.join("td")
.text(d => d)
markerLayer.clearLayers();
inRange.forEach(d => L.marker([d.lat, d.lon], { "title": d.name }).addTo(markerLayer))
})
The above code does a few things:
- It calculates the distance between the selected lat/lon and the lat/lon for the restaurant.
- It filters out anything that is further away than 5 miles.
- It sorts by distance, so that the closest are first.
- It takes up to 250 results.
- It can dynamically create a table that shows the results (this is very neat using D3)
- It can clear and recreate all the markers on the map.
The end result looks a little like this:
Meat
Now, the next tricky bit is to ensure, that my coordinate grid system, which simplifies into coordinates (lat/lon), contains all the relevant information about the closest eating establishments. Each tile is designed to be about 5×5 miles. In order to ensure that we find every restaurant that is 5 miles away from each tile, each restaurant goes into the tile it is in, as well as the surrounding tiles. This is done using trusty AWK:
function print_to_file(file) {
if (headers[file] != "done") {
print "name,postcode,lat,lon" >> file;
headers[file] = "done"
}
print $0 >> file;
close(file);
}
{
x = int(($3 - 49.0) / (12.0 / 160.0))
y = int(($4 + 9) / (11.0 / 80.0))
file_tl="target/pubgrid/" (x-1) "/" (x-1) "-" (y-1) ".csv";
file_tm="target/pubgrid/" x "/" x "-" (y-1) ".csv";
file_tr="target/pubgrid/" (x+1) "/" (x+1) "-" (y-1) ".csv";
file_ml="target/pubgrid/" (x-1) "/" (x-1) "-" y ".csv";
file_mm="target/pubgrid/" x "/" x "-" y ".csv";
file_mr="target/pubgrid/" (x+1) "/" (x+1) "-" y ".csv";
file_bl="target/pubgrid/" (x-1) "/" (x-1) "-" (y+1) ".csv";
file_bm="target/pubgrid/" x "/" x "-" (y+1) ".csv";
file_br="target/pubgrid/" (x+1) "/" (x+1) "-" (y+1) ".csv";
print_to_file(file_tl);
print_to_file(file_tm);
print_to_file(file_tr);
print_to_file(file_ml);
print_to_file(file_mm);
print_to_file(file_mr);
print_to_file(file_bl);
print_to_file(file_bm);
print_to_file(file_br);
}
But wait a minute, that presupposes that I have a list of pubs and their coordinates. That’s not the case; all we’ve got is the establishment name and their postcode. Thankfully there’s a shell command that I can use to join my existing postcode file and a file of establishments and their postcodes:
join -t , -1 2 -2 2 -o 1.1,0,2.3,2.4 \
<(sort -k 2 -t , target/pub_postcodes.csv) \
<(sort -k 2 -t , target/ukpostcodes.csv) > target/named_pubs.csv
The above does the following
- sorts both the
pub_postcode.csv
(containing name and postcode), - sorts the
ukpostcodes.csv
(containing the postcode and lat/lon) and - joins the two files, creating one file in which the lines are joined by the postcode.
Palate Cleanser
You will have noticed above that my examples aren’t using real pub or restaurant names. At the time of writing HMRC had not yet published the list of registered restaurants, so I used my shell scripting knowledge (and a lot of googling) to create a fairly neat way of generating random pub/restaurant names.
I took a list of animal names and randomly combined them with “and”, the aim being to get the “Fox and Badger” and endless variations.
Here’s the shell script to allow you to do this:
shuf -n 100000 target/ukpostcodes.csv | cut -f2 -d, > target/pub_postcodes.txt
shuf -rn 100000 animal_names.txt > target/1.txt
shuf -rn 100000 animal_names.txt > target/2.txt
yes "and" 2>/dev/null | head -100000 > target/and.txt
paste -d " " target/1.txt target/and.txt target/2.txt > target/pubnames.txt
paste -d "," target/pubnames.txt target/pub_postcodes.txt > target/pub_postcodes.csv
This accomplishes the following:
- picks 100,000 random postcodes,
- creates 100,000 random animal names,
- creates another 100,000 random animal names (in a different order)
- creates 100,000 instances of “and”,
- and combines them all, resulting in my randomly generated pub names.
$ head pub_postcodes.csv
leguaan and bushmaster,B79 7SP
anaconda and Moluccan boobook,CM20 2GN
flying lizard and hoop snake,NW4 3LY
Towhee and agamid,LL11 6NN
Puffleg and Gila monster,OX12 0FE
mamba and Chipmunk,UB6 7AH
Eagle and Marsh harrier,FK1 5LE
Jay and chameleon,KA19 7NW
B and Maya,L5 7UB
ringhals and Diving bird,W9 2EH
Dessert
All of the above is very good, but I’ve still not hosted my tool anywhere, and I don’t want to use my own servers. Thankfully, github.com provides GitHub Pages and GitHub Actions which can be combined to provide a build pipeline and a hosting solution!
Cheese
Thanks for reading, I hope you found the Discounted Dining Finder and the above tale interesting. The source code is available on github.com/beny23/static-distance/ and released using the Apache-2.0 open source licence.
HMRC’s AWS-based Digital Tax Platform has been a key enabler to the continuous delivery of services for helping individuals and businesses impacted by COVID-19.
The large population of claimants accessing these services has meant that performance and resilience (in the face of high concurrent usage) have been key concerns during service development. Meeting such requirements can be challenging though. To help, a newly formed Equal Experts team improved resilience and performance by integrating a new third party service in just four weeks.
The challenge
To ensure that the financial relief provided by HMRC’s three COVID-19 services (JRS, SEISS, SSP) reaches the right people as quickly as possible, bank account details are validated. The unprecedented volume of traffic, peaking at 100 journeys per second, overwhelmed the third-party services used to validate bank details. HMRC wanted extra capacity and resiliency. Alongside this challenge were organisational ones: the team tasked with providing this was new and all teams are now working fully remotely.
The approach
In mid-April, HMRC connected the team with a group of suppliers (Pay.UK, Vocalink, NatWest and SurePay) to quickly integrate a new bank account checking service (provided by SurePay) with HMRC’s Tax Platform. Two weeks after the first meeting, the HMRC delivery team was cutting code and had completed an end-to-end test to SurePay’s API. Two weeks after that, the integration was live in Production, comfortably serving nearly 20 requests per second. The timing of this was incredibly fortunate: one hour after going live, SurePay’s API ended up taking 100% of the traffic, due to a capacity issue with other downstream services.
Integrating with a third-party service in just four weeks is a massive achievement, especially in the public sector. The key differentiators that made this happen were:
- The Tax Platform itself – Over the last seven years, Equal Experts has played a lead role in the design and evolution of the platform, working closely with a range of suppliers and HMRC colleagues to iteratively improve its support for the rapid, low-cost delivery of performant, resilient services. It’s optimised for enabling teams to form, deliver and operate fast.
- Cross-team, cross supplier collaboration – It takes more than a single team to deliver an integration. Within HMRC, infrastructure, security and commercial teams contributed to this outcome. Externally, multiple people and teams across Pay.UK, Vocalink, NatWest, SurePay and the collaborating banks were also heavily engaged. We shortened feedback loops and maximised communication bandwidth by using cross-supplier platforms such as Slack – and having regular video conference meetings.
- Small technical teams with blended roles – The technical part of the implementation was delivered by the Equal Experts team of four at HMRC, interacting closely with a SurePay team of three. Although the EE team has a mixture of developers, a tester and a business analyst, each consultant has the experience and a broad enough skill set to work effectively across disciplines and focus on the highest-priority activity. Developers test things, the business analyst diagnoses production issues and makes GitHub commits, and the tester writes code and deploys to production.
The outcome
Many benefits have come from the improved capacity and resilience:
- HMRC digital service users are less impacted by third party outages or performance issues. This is especially important for COVID-19 service users, whose digital transactions are critical to their livelihood and need to be as simple and low-stress as possible.
- The team that operates the bank account checking service receive fewer production alerts, and have fewer incidents to deal with, so they can focus more on enhancing this and other services they own.
- The same goes for teams that operate services that depend on the bank account checking service. This multiplies these benefits out to over 10 teams and many more services.
If you’d like to learn more about how continuous delivery can help improve business outcomes, check out our series of ExpertTalks on the topic.