It’s official! Equal Experts and HMRC join forces

As a partner of HMRC for the last 10 years, Equal Experts have been named as a supplier on Crown Commercial Service’s (CCS) Big Data and Analytics (BD&A) Framework (running from August 2022 to August 2024) and the IT Digital and Legacy Application Services (DALAS) Framework (running for the next 5 years). The frameworks are designed to enable digital delivery and data insights across the UK public sector.

While these are both CCS* frameworks, and hence open up opportunities for Equal Experts across the whole UK public sector, they’ve been driven and shaped by HMRC.

*CCS supports the public sector to achieve maximum commercial value when procuring common goods and services.

Enabling critical national services at scale

This decision recognises our ability to deliver value to HMRC over the next 5 years:

  • DALAS: We are one of 6 suppliers on Lot 2A: Digital, Integration and Programme Application Services (large scale)
  • BD&A: We are the only supplier approved to provide all 6 data and analytics capabilities to all central government departments.

“We are thrilled to be key suppliers on two major frameworks to continue to support HMRC and other government departments. These frameworks enable us to deliver user-centred, secure and scalable digital services that improve services for UK citizens and provide data and analytical services to enable effective decision making.”
– Jon Dickinson, Chief Commercial Officer at Equal Experts.

A productive partnership that’s secured crucial UK tax processes

We’ve had quite the productive decade already with HMRC! Here are just a few of the products we’ve delivered to support critical UK national services:

Why are these frameworks such a big deal? 

We are especially proud to be a consultancy included in all 6 capabilities of the BD&A Framework – Advanced Analytics and Cognitive, Data Management and Acquisition, Platform Service, Reporting and Dashboarding, and Search and Discovery Service – which will allow us to support end-to-end digital data products.

As part of the DALAS framework, we are one of six consultancies that can support HMRC and other government departments with large-scale digital delivery and integration services, including user-centred design, DevOps engineering, and integration of software lifecycle management from application development through to IT operations.

If you would like to know more about these frameworks and how you can use them, follow these links:

Congratulations to HMRC’s Customer Insight Platform (CIP) team for winning the Best Public Sector IT Project category at last night’s UKIT Awards! 

And a shout out to Fiona Teddy-Jimoh for being highly commended in the Business Analyst category, and to Andrew Letherby for being a finalist in the Digital Leader of the Year category.

“What the CIP team achieved at the beginning of the Covid-19 pandemic was nothing short of remarkable. They responded to possible fraud threats to the COVID-19 schemes and rose to meet the needs of 2020 in a way that no UK government had ever responded before, by supporting workers on an unprecedented scale.” Caitlin Smith, Delivery Lead, Equal Experts

As Covid-19 emerged in early March 2020, before the country was aware of the full extent of the crisis to come, the Customer Insight Platform (CIP) team at HMRC proactively pulled together proposals in anticipation of being asked to prevent widespread fraud of Covid-response schemes. As news of the Treasury’s plans broke, the team responded rapidly, supporting the Statutory Sick Pay (SSP) response, the Job Retention Scheme (JRS) and the Self-Employment Income Support Scheme (SEISS).

It was obvious that the fraud protection casework resulting from Covid-related schemes was likely to inundate any manual components of the existing processes. The challenges were exacerbated by the timing – the team was already at capacity supporting the acute spike in work associated with the Self-Assessment (SA) process at the end of the tax year. At the same time, the Chancellor requested better access to data from the banking industry.

The Customer Insight Platform (CIP) team had a short window of time to get a comprehensive fraud protection solution in place. Millions of claims submitted by taxpayers were automatically assessed for fraud detection by the CIP Insight service, supporting the work that HMRC internal fraud investigators were doing in very difficult circumstances. The pandemic accelerated the development of HMRC’s first upfront digital risking service. This was tantamount to a revolution in the speed and accuracy of the fraud protection process. 

Key results:

  • The team made 12 months of progress in one month, with no increase in available resources.
  • The risk automation for the first Covid response service took 8 weeks to build, but later services took only 2 weeks.

But this new world also presented new opportunities for attempted fraud. In handling fraud risk across three brand-new government schemes, on accelerated timelines, the team achieved the impossible. They:

  • Provided timely and valuable insights to frontline services, helping ensure the right money went to the right people at the right time.
  • Used their wealth of experience to find patterns in the data, whilst reflecting on lessons learned from previous efforts.
  • Were empowered by leadership to respond rapidly to the changing landscape.
  • Exercised their professional discipline and creative freedom to meet their users’ needs.

And the accolades don’t stop there. HMRC’s CIP team has also received recognition and wins in other awards programmes this year as follows:

  • AI and Machine Learning Awards – WINNER! Special Award: Pandemic Performance.
  • UK Dev Awards – WINNER! Alex Browne, All-Round Hero category. FINALIST: Project of the Year, Platform Tool of the Year, Rising Star (Fiona Teddy-Jimoh).
  • Computing Digital Technology Leaders Awards – FINALIST: Best Public Sector Digital Project.
  • COVID Response Awards – SHORTLISTED (awards on 25th November): Best COVID Response involving the Public Sector category.

So as you can see, the achievements of this team are truly remarkable and very deserving of all the accolades. Our heartfelt congratulations again to HMRC’s CIP team, as well as to all the other winners and finalists last night.

 

This post describes how I developed the Discount Dining Finder, a lookup map tool for the Eat Out to Help Out scheme, in my spare time. The aim of this post is to provide an insight into how problems of scaling services can be solved by having no servers – and without using “serverless services” either.

Aperitif

A really nice side effect of working in a high-functioning environment is that sometimes you’re involved in bouncing ideas off each other. The delivery teams at HMRC were working on releasing yet another service to the public in less time than it takes to say “Agile”. This scheme was called Eat Out to Help Out.

The scheme would consist of different journeys:

  • registering a restaurant,
  • searching for registered establishments for the public and
  • making claims for payment.

Out of these three, the biggest unknown in terms of expected volume was the “search journey” to be used by the general public. In this journey, a user would enter a postcode, and registered establishments within an X mile radius would be displayed. There were many unknowns about how much traffic to expect on the HMRC service.

  • Would there be big peaks at lunchtime or dinnertime?
  • What if Martin Lewis goes on TV, recommends visiting the site and then, two minutes later, 10% of the country wants to find out information about their local eateries?
  • Could it impact other HMRC services (the tax platform hosts a multitude of services)?

Now, the tax platform is a very scalable and robust platform, and I am not for one minute suggesting that there was going to be a problem using microservices and geo-location in Mongo at scale. But one of the ideas I floated centred around the fact that the information is fairly static. Sure enough, “eat out” businesses register their premises with HMRC, but once they are registered, the bulk of the information will not change. Postcodes and the distances between them are not that changeable either. So that’s when I wondered whether this could be delivered as a static site.

Starter

I went away and found that freemaptools provides me with a list of UK postcodes and their associated latitude/longitude. In that file, there are 1,767,875 postcodes. Searching almost 2 million records sounds like the job for a server and a database, doesn’t it? Erm, no.

Looking at the postcode file

$ head -10 ukpostcodes.csv 
id,postcode,latitude,longitude
1,AB10 1XG,57.144165160000000,-2.114847768000000
2,AB10 6RN,57.137879760000000,-2.121486688000000
3,AB10 7JB,57.124273770000000,-2.127189644000000
4,AB11 5QN,57.142701090000000,-2.093295000000000

Instead of searching a single ukpostcodes.csv (95 MB) every time, I decided to “shard” or “partition” my CSV file into smaller files:

./A/B/AB10.csv
./A/B/AB11.csv
./A/L/AL1.csv
./A/L/AL10.csv
./A/L/AL2.csv
./B/1/B1.csv
./B/1/B10.csv
./B/2/B2.csv
./B/2/B20.csv

The files are split into directories by the first two characters of the outcode. So if I want to find out about postcode AB12 4TS, I’d map its outcode (AB12) to /A/B/AB12.csv. That file only has 799 entries – searching those is much more palatable.
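As a rough sketch of that mapping (the helper name here is mine, not from the original code – the real lookup below inlines the same logic):

// Hypothetical helper: derive the shard file for a full postcode,
// e.g. "AB12 4TS" -> "outcode/A/B/AB12.csv"
function shardPathFor(postcode) {
    var outcode = postcode.trim().toUpperCase().split(/\s+/)[0];
    return "outcode/" + outcode[0] + "/" + outcode[1] + "/" + outcode + ".csv";
}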

So I’ve got my main page, where the user enters their postcode:

Discount Dining Finder

And I can search for the postcodes simply by using a bit of JavaScript inside the user’s browser:

d3.csv("outcode/" + outcode[0] + "/" + outcode[1] + "/" + outcode + ".csv")
    .then(function(postcodes) {
        result = postcodes.find(d => normalisePostcode(d.postcode) === postcode);
        if (result) {
            d3.select("#status").text("");
            mapid.panTo(new L.LatLng(result.lat, result.lon));
        } else {
            d3.select("#status").text("Postcode not found")
        }
    })
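The normalisePostcode helper isn’t shown in the post; presumably it just makes user input and CSV values comparable. A minimal sketch, assuming that’s all it does:

// Hypothetical sketch: strip all whitespace and upper-case,
// so "ab12  4ts" and "AB12 4TS" normalise to the same string.
function normalisePostcode(postcode) {
    return postcode.replace(/\s+/g, "").toUpperCase();
}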

D3 is a great library for visualisations, but I also found it very useful for reading and processing CSVs in JavaScript, and the files can be served up by a static web server.

Great! But how do I get my directory structure? I did not fancy manually copying and pasting the files. You might think that surely now is the time to unleash a NoSQL database, or at least some Python. But no, I decided to keep it simple and use a combination of shell scripts and AWK:

awk -F, -f split_outcodes.awk target/ukpostcodes.csv

The split_outcodes.awk script did the hard work of creating new files in the correct directory.

$1 != "id" && $3 < 99.9 {
  prev=file;
  split($2, f, " ");
  outcode = f[1]
  outcode1 = substr(outcode, 1, 1)
  outcode2 = substr(outcode, 2, 1)
  file="target/outcode/" outcode1 "/" outcode2 "/" outcode ".csv";
  if (prev!=file) close(prev);
  if (headers[file] != "done") {
    print "id,postcode,lat,lon" >> file;
    headers[file] = "done"
  }
  print $0 >> file;
}

This resulted in 2,980 files. The biggest of those was 145 KB, which corresponded to 2,701 postcodes. Now that’s much better than searching 1.7 million postcodes for every lookup!

Soup

I didn’t mention that the Discount Dining Finder has a map. A quick overview of setting that up!

I used LeafletJS – an open-source map library. Here’s how:

mapid = L.map('mapid');
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    maxZoom: 19,
    attribution: '&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors'
}).addTo(mapid);
markerLayer = L.layerGroup().addTo(mapid)  // separate layer group so markers can be cleared later

And I had a map!

Map

Fish

That map didn’t have anything on it yet! I was able to convert a postcode into lat/lon, though. The next step was to look up the restaurants. I decided to keep running with the idea of doing all the computation in the user’s browser (desktop or phone).

First of all, I found that UK postcodes covered an area of:

$ cut -f3 -d, ukpostcodes.csv | awk -F, 'BEGIN { max = -999; min = +999; } /[0-9.-]+/ { if ($1 > max) max = $1; if ($1 < min) min = $1; } END { print min, max; }'
49.181941000000000 60.800793046799900
$ cut -f4 -d, ukpostcodes.csv | awk -F, 'BEGIN { max = -999; min = +999; } /[0-9.-]+/ { if ($1 > max) max = $1; if ($1 < min) min = $1; } END { print min, max; }'
-8.163139000000000 1.760443184261870

I calculated that the rectangle (60.80 N/-8.16 W) – (49.18 N/1.76 E) covered about 400 miles from west to east and 800 miles from north to south. My aim was to provide a lookup that could find all restaurants in a 5-mile radius, so I split my search area up into tiles of roughly 5×5 miles: 800 miles north to south gives 160 rows (hence the 12 degrees of latitude divided by 160 below), and 400 miles west to east gives 80 columns (11 degrees of longitude divided by 80). Here’s my translation function:

var x = parseInt((+result.lat - 49.0) / (12.0 / 160.0))  // 12 degrees of latitude / 160 rows
var y = parseInt((+result.lon + 9) / (11.0 / 80.0))      // 11 degrees of longitude / 80 columns

That would give me a coordinate set for a tile. So Buckingham Palace (51.5 N/-0.14 W) would be at coordinates (33/64). Based on that, I could build another set of files:

target/pubgrid
target/pubgrid/0
target/pubgrid/1
target/pubgrid/10
target/pubgrid/100
target/pubgrid/100/100-19.csv
target/pubgrid/100/100-20.csv
target/pubgrid/100/100-21.csv

All the eateries in tile (33/64) end up in the file pubgrid/33/33-64.csv. That file would look like this:

name,postcode,lat,lon
blue racer and frilled lizard,BR1 1AB,51.406270892812800,0.015176762143898
saltwater crocodile and blue racer,BR1 1LU,51.401706890000000,0.017463449000000
king cobra and Schneider python,BR1 1PQ,51.406421920000000,0.012595296000000

The JavaScript can then find the suitable restaurants like so:

d3.csv("pubgrid/" + x + "/" + x + "-" + y + ".csv")
    .then(function(pubs) {
        let inRange = pubs
            .map(a => ({ ...a, distance: distance(result, a)}))
            .filter(a => a.distance < (5 * 1609.34))
            .sort((a, b) => a.distance - b.distance)
            .slice(0, 250)

        d3.select("#results").selectAll("tr")
            .data(inRange)
            .join("tr")
            .selectAll("td")
            .data(d => [ d.name, d.postcode, (d.distance / 1609.34).toFixed(2) + " miles away" ])
            .join("td")
            .text(d => d)

        markerLayer.clearLayers();
        inRange.forEach(d => L.marker([d.lat, d.lon], { "title": d.name }).addTo(markerLayer))
    })

The above code does a few things:

  1. It calculates the distance between the selected lat/lon and each restaurant’s lat/lon (via a distance helper – see the sketch below).
  2. It filters out anything that is further away than 5 miles.
  3. It sorts by distance, so that the closest come first.
  4. It takes up to 250 results.
  5. It dynamically creates a table that shows the results (this is very neat using D3).
  6. It clears and recreates all the markers on the map.
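The distance helper itself isn’t shown in the post. Given the 1609.34 factor above it evidently returns metres, so a haversine implementation along these lines is a plausible reconstruction (mine, not necessarily the original):

// Hypothetical reconstruction: great-circle distance in metres between
// two points with lat/lon properties, using the haversine formula.
function distance(a, b) {
    var R = 6371000;  // mean Earth radius in metres
    var toRad = function(deg) { return deg * Math.PI / 180; };
    var dLat = toRad(b.lat - a.lat);
    var dLon = toRad(b.lon - a.lon);
    var h = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
            Math.cos(toRad(+a.lat)) * Math.cos(toRad(+b.lat)) *
            Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 2 * R * Math.asin(Math.sqrt(h));
}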

The end result looks a little like this:

Map with markers

Meat

Now, the next tricky bit is to ensure that my grid system, which simplifies lat/lon into tile coordinates, contains all the relevant information about the closest eating establishments. Each tile is designed to be about 5×5 miles. To ensure that we find every restaurant within 5 miles of anywhere in a tile, each restaurant goes into its own tile as well as the eight surrounding tiles. This is done using trusty AWK:

function print_to_file(file) {
  if (headers[file] != "done") {
    print "name,postcode,lat,lon" >> file;  # write the CSV header once per file
    headers[file] = "done"
  }
  print $0 >> file;
  close(file);  # close immediately to keep the number of open file handles low
}

{
  # same tile arithmetic as the browser code: $3 = lat, $4 = lon
  x = int(($3 - 49.0) / (12.0 / 160.0))
  y = int(($4 + 9) / (11.0 / 80.0))

  # the restaurant's own tile (mm) plus its eight neighbours
  file_tl="target/pubgrid/" (x-1) "/" (x-1) "-" (y-1) ".csv";
  file_tm="target/pubgrid/" x "/" x "-" (y-1) ".csv";
  file_tr="target/pubgrid/" (x+1) "/" (x+1) "-" (y-1) ".csv";
  file_ml="target/pubgrid/" (x-1) "/" (x-1) "-" y ".csv";
  file_mm="target/pubgrid/" x "/" x "-" y ".csv";
  file_mr="target/pubgrid/" (x+1) "/" (x+1) "-" y ".csv";
  file_bl="target/pubgrid/" (x-1) "/" (x-1) "-" (y+1) ".csv";
  file_bm="target/pubgrid/" x "/" x "-" (y+1) ".csv";
  file_br="target/pubgrid/" (x+1) "/" (x+1) "-" (y+1) ".csv";

  print_to_file(file_tl);
  print_to_file(file_tm);
  print_to_file(file_tr);
  print_to_file(file_ml);
  print_to_file(file_mm);
  print_to_file(file_mr);
  print_to_file(file_bl);
  print_to_file(file_bm);
  print_to_file(file_br);
}

But wait a minute – that presupposes that I have a list of pubs and their coordinates. That’s not the case; all we’ve got is establishment names and their postcodes. Thankfully there’s a shell command that I can use to join my existing postcode file with a file of establishments and their postcodes:

join -t , -1 2 -2 2 -o 1.1,0,2.3,2.4 \
   <(sort -k 2 -t , target/pub_postcodes.csv) \
   <(sort -k 2 -t , target/ukpostcodes.csv) > target/named_pubs.csv

The above does the following:

  • sorts pub_postcodes.csv (containing name and postcode) by postcode,
  • sorts ukpostcodes.csv (containing the postcode and lat/lon) by postcode, and
  • joins the two files on the postcode, with -o 1.1,0,2.3,2.4 selecting the pub name, the postcode (the join key) and the lat/lon for the output.

Palate Cleanser

You will have noticed above that my examples aren’t using real pub or restaurant names. At the time of writing, HMRC had not yet published the list of registered restaurants, so I used my shell scripting knowledge (and a lot of googling) to create a fairly neat way of generating random pub/restaurant names.

I took a list of animal names and randomly combined them with “and”, the aim being to get the “Fox and Badger” and endless variations.

Here’s the shell script to allow you to do this:

shuf -n 100000 target/ukpostcodes.csv | cut -f2 -d, > target/pub_postcodes.txt

shuf -rn 100000 animal_names.txt > target/1.txt
shuf -rn 100000 animal_names.txt > target/2.txt
yes "and" 2>/dev/null | head -100000 > target/and.txt
paste -d " " target/1.txt target/and.txt target/2.txt > target/pubnames.txt

paste -d "," target/pubnames.txt target/pub_postcodes.txt > target/pub_postcodes.csv

This accomplishes the following:

  • picks 100,000 random postcodes,
  • creates 100,000 random animal names,
  • creates another 100,000 random animal names (in a different order)
  • creates 100,000 instances of “and”,
  • and combines them all, resulting in my randomly generated pub names.

A sample of the result:

$ head pub_postcodes.csv
leguaan and bushmaster,B79 7SP
anaconda and Moluccan boobook,CM20 2GN
flying lizard and hoop snake,NW4 3LY
Towhee and agamid,LL11 6NN
Puffleg and Gila monster,OX12 0FE
mamba and Chipmunk,UB6 7AH
Eagle and Marsh harrier,FK1 5LE
Jay and chameleon,KA19 7NW
B and Maya,L5 7UB
ringhals and Diving bird,W9 2EH

Dessert

All of the above is very good, but I’ve still not hosted my tool anywhere, and I don’t want to use my own servers. Thankfully, github.com provides GitHub Pages and GitHub Actions, which can be combined to provide a build pipeline and a hosting solution!

Cheese

Thanks for reading – I hope you found the Discount Dining Finder and the above tale interesting. The source code is available at github.com/beny23/static-distance/ and is released under the Apache 2.0 open-source licence.