HM Revenue & Customs
Eliminating cyber crime with event-driven architecture
This case study will help you to understand:
- How legacy events, stored in data lakes, can be used for evolving organisational needs.
- The importance of data pipelines in storing and managing information for large organisations.
- The value of event-driven architecture in predicting and preventing fraud in real time.
About HMRC and the Customer Insights Platform
HMRC is the UK’s tax, payments and customs authority.
The organisation performs a range of sophisticated, vital functions, but its primary roles can be summarised as:
- Collecting money that pays for the UK’s public services and infrastructure
- Supporting disadvantaged families and individuals with targeted financial support
- Helping the honest majority to make accurate and valid tax submissions
- Preventing the dishonest minority—cyber criminals—from cheating the system for illegal financial gains
Each year, HMRC serves over 50 million business and individual customers while generating hundreds of billions of pounds in revenue.
To support and facilitate this digital activity, HMRC relies on:
- A Multi-channel Digital Tax Platform (MDTP): a cloud platform hosted on Amazon Web Services. The MDTP is home to HMRC’s online self-service tax applications: 130 digital services comprising 950+ decoupled microservices. Learn more about cloud-based platforms in our Digital Platforms playbook.
- The Customer Insights Platform (CIP). The CIP performs a protective, legal function by collating and collecting customer data related to interactions that occur within the MDTP. This data is primarily captured through digital channels via web-facing applications like self-assessment, VAT filing and more.
Key figures (values not recoverable from the source): data monitored to date; transactions audited in January 2021; transactions audited per day; services monitored and audited.
How events provide complete, real-time visibility of customers
With sophisticated customer journeys spanning multiple tax applications and departments, real-time visibility of user behaviour is invaluable.
Using an event-driven architecture, the CIP monitors every interaction within the Tax Platform to establish detailed user profiles. The monitoring is designed to ensure compliance across legitimate submissions—and the best possible user experience for genuine users—while protecting against prospective instances of fraud and identity theft.
The monitoring occurs in two tiers:
- Gathering general user metadata and information generated through historical use of the MDTP (for example, the devices a person uses to access the platform, IP addresses, etc.).
- Using event-driven processing to plot sophisticated customer journeys in real time (for example, diverting people through different pathways based on certain behaviours exhibited while they attempt to complete a tax submission).
A consistent and detailed view of all customers, for all key stakeholders
Every customer interaction is tracked and audited as an event: from attempting a login or clicking on a content page to submitting a self-assessment. These comprehensive user profiles can be surfaced throughout the organisation to provide continuity and a single, up-to-date source of truth for:
Prioritise cases using analytics generated from event metadata, and interactively explore events on a case-by-case basis to reach investigative outcomes.
Customer service teams
Use events for performance analytics and understanding customer journeys to improve their service.
Get a view of a calling customer’s previous web journey, so they can see, for example, which page of a tax assessment the user is stuck on.
Use customer journey event data as an audit log for legal purposes when investigating and prosecuting complaints.
Can be used for BI reporting, such as the number of tax submissions, fraudulent repayments blocked, etc.
The profiles offer invaluable context for departments throughout HMRC and create significant efficiencies by eliminating double handling of information. They also drive tailored user journeys when a combination of event processing and metadata identifies particular behaviours.
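To make the idea of a profile built from events concrete, here is a minimal sketch. It assumes a simplified event shape; every field name, event type, and identifier is hypothetical and not taken from the CIP itself. The point it illustrates is that a profile can be derived purely by replaying a user's events, so any team can rebuild the same view from the same history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Event:
    """One audited interaction. Every field name here is hypothetical."""
    event_type: str        # e.g. "login_attempt", "page_view"
    user_id: str
    device_id: str
    ip_address: str
    occurred_at: datetime


@dataclass
class UserProfile:
    """A running summary derived only from a user's events."""
    user_id: str
    devices: Set[str] = field(default_factory=set)
    ip_addresses: Set[str] = field(default_factory=set)
    event_counts: Dict[str, int] = field(default_factory=dict)
    last_seen: Optional[datetime] = None

    def apply(self, event: Event) -> None:
        # Fold a single event into the profile.
        self.devices.add(event.device_id)
        self.ip_addresses.add(event.ip_address)
        self.event_counts[event.event_type] = (
            self.event_counts.get(event.event_type, 0) + 1
        )
        if self.last_seen is None or event.occurred_at > self.last_seen:
            self.last_seen = event.occurred_at


def build_profiles(events: Iterable[Event]) -> Dict[str, UserProfile]:
    """Replay events in order and return one profile per user."""
    profiles: Dict[str, UserProfile] = {}
    for event in events:
        profile = profiles.setdefault(event.user_id, UserProfile(event.user_id))
        profile.apply(event)
    return profiles


def at(hour: int, minute: int) -> datetime:
    # Helper for readable sample timestamps (illustrative data only).
    return datetime(2021, 1, 4, hour, minute, tzinfo=timezone.utc)


events = [
    Event("login_attempt", "user-1", "laptop-a", "203.0.113.5", at(9, 0)),
    Event("page_view", "user-1", "laptop-a", "203.0.113.5", at(9, 5)),
    Event("login_attempt", "user-1", "phone-b", "198.51.100.7", at(9, 30)),
    Event("login_attempt", "user-2", "laptop-c", "203.0.113.9", at(10, 0)),
]
profiles = build_profiles(events)
print(profiles["user-1"])
```

Because the profile is only a fold over events, adding a new attribute later means replaying the stored history rather than migrating a database.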
Capturing events is always useful, even before you determine meaningful use-cases.
Since its inception, HMRC’s Multi-channel Digital Tax Platform has collected events and placed them on a messaging queue, even without a native event-processing tool.
With the implementation of Apache Kafka in 2017, the CIP is now able to push data into a batched analytical data lake.
As a result, the CIP preserves the notion of markable events within the data lake, while leveraging a range of other tools to perform big data processing functions across those captured events.
This approach—which is only possible as a result of implementing and storing events prior to the CIP’s capacity to use those events for real-time processing—creates two crucial benefits:
- With Kafka essentially functioning as a data pipeline, the data lake can be used for analytical, big-data processing thanks to the breadth of information captured as markable events. This information can be used to surface customer profiles based on legacy interactions and metadata generated through the Tax Platform. Learn more about data pipelines in our Data Pipeline playbook.
- The information can be used for real-time event processing, which is critical in identifying and blocking fraudulent transactions before they can occur.
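The two benefits above follow from one property of a log-based pipeline: each consumer reads the same stored events at its own pace. The sketch below is a toy in-memory model of that property, not Kafka's API; the class, the consumer names, and the event fields are all hypothetical.

```python
class EventLog:
    """Append-only log in which each consumer tracks its own read offset."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of next event to read

    def append(self, event):
        self._events.append(event)

    def poll(self, consumer, max_events=100):
        """Return the next unread events for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
seen_live = []

incoming = [
    {"type": "login_attempt", "user": "u1"},
    {"type": "page_view", "user": "u1"},
    {"type": "submission", "user": "u1"},
]

for event in incoming:
    log.append(event)
    # A real-time consumer reads each event as soon as it arrives.
    seen_live.extend(log.poll("realtime-fraud-checks"))

# A batch consumer reads the same events later, in one go, for the data lake.
batch = log.poll("data-lake-sink")

# A consumer introduced after the fact can still replay the full history.
replayed = log.poll("future-use-case")

print(len(seen_live), len(batch), len(replayed))
```

The third consumer shows why capturing events early pays off: a use case that did not exist when the events were written can still read all of them.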
The CIP is fed from a microservices-based architecture running in Amazon Web Services (AWS). The platform uses Kafka Connect to write events to Amazon S3 (Simple Storage Service), moving information from Kafka to the data lake. A range of big data processing tools then perform analytical functions on the information stored within the lake.
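A Kafka Connect sink is normally defined declaratively and registered through the Connect REST API. The sketch below builds such a definition in Python. It assumes the Confluent Amazon S3 sink connector, which the case study does not confirm HMRC uses; the connector name, topic, bucket, and region are hypothetical, and the property values should be checked against the connector version in use.

```python
import json


def s3_sink_connector(name, topic, bucket, region):
    """Build a Kafka Connect connector definition (assumed Confluent S3 sink)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Store each event as a JSON record.
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            # Number of records written per S3 object.
            "flush.size": "1000",
            # Partition objects by the record's own timestamp, one folder per hour.
            "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
            "partition.duration.ms": "3600000",
            "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
            "locale": "en-GB",
            "timezone": "UTC",
            "timestamp.extractor": "Record",
        },
    }


# Hypothetical names, for illustration only.
connector = s3_sink_connector(
    name="audit-events-to-s3",
    topic="audit-events",
    bucket="example-data-lake-raw",
    region="eu-west-2",
)

# This JSON document is what would be sent to the Connect REST API
# (POST /connectors) to create the connector.
payload = json.dumps(connector, indent=2)
print(payload)
```

Time-based partitioning is a design choice rather than a requirement; it keeps objects grouped by when events occurred, which tends to suit later batch queries over date ranges.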
One example is Apache Hudi, an open-source library that adds a transactional storage layer to the data lake and allows the data to be processed with Apache Spark. The configuration enables a range of capabilities associated with incremental-style event processing, creating two key benefits:
- This approach allows the CIP to preserve the informational and conceptual structure of events within the data lake.
- In turn, this provides far greater flexibility and specificity in analysing targeted datasets, rather than treating all information as one general set of data.
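As a rough illustration of incremental-style processing, the sketch below assembles the options that Hudi's Spark datasource takes for an upsert and for an incremental read. The option keys follow the Hudi quick-start for Spark and may differ between Hudi versions; the table name, field names, and instant time are hypothetical, and nothing here reflects the CIP's actual configuration. The Spark calls appear only as comments so the block runs without a cluster.

```python
def hudi_upsert_options(table, record_key, precombine, partition_path):
    """Options for writing events into a Hudi table as upserts."""
    return {
        "hoodie.table.name": table,
        # Field that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when keys collide.
        "hoodie.datasource.write.precombine.field": precombine,
        "hoodie.datasource.write.partitionpath.field": partition_path,
        "hoodie.datasource.write.operation": "upsert",
    }


def hudi_incremental_read_options(begin_instant):
    """Options for reading only records committed after a given instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


# Hypothetical table and field names, for illustration only.
write_opts = hudi_upsert_options(
    table="audit_events",
    record_key="event_id",
    precombine="occurred_at",
    partition_path="event_date",
)
read_opts = hudi_incremental_read_options(begin_instant="20210101000000")

# With a SparkSession `spark`, a DataFrame `df`, and the Hudi bundle on the
# classpath, these would be used roughly as follows:
#   df.write.format("hudi").options(**write_opts).mode("append").save(base_path)
#   new_events = spark.read.format("hudi").options(**read_opts).load(base_path)
print(write_opts)
print(read_opts)
```

The incremental read is what allows a downstream job to process only the events that arrived since its last run, instead of rescanning the whole lake.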
Using events to predict and prevent fraud in real time
When it comes to digital crime, the best defence is undoubtedly predictive prevention.
Once a transaction is processed, it is incredibly difficult to recapture funds retrospectively. Eliminating illegal transactions in the first place is crucial.
Event processing plays a vital role in the CIP’s ability to conduct behavioural analysis and identify potentially fraudulent activity in real time.
For example, credential stuffing and other criminal practices can be detected within seconds of the very first attempts being made. Once a problematic account, transaction, or behaviour is identified, HMRC can trigger a vast array of corrective measures. These range from increased monitoring throughout a user’s journey to completely blocking their account.
Let’s consider a detailed practical use case.
Identifying fraud, instantly
Among many other things, the CIP is configured to monitor for events that signify multiple login attempts for different users from the same device.
Through real-time event processing, the CIP provides immediate visibility of this behaviour. Rather than take a single, definitive course of action, we monitor for additional events to build a clearer picture of the user and divert them through different journeys based on a profile of behaviours.
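One way to picture this kind of monitoring is a sliding window over login events, keyed by device. The sketch below is an assumption about how such a check could work, not a description of the CIP's implementation; the class name, window length, and threshold are hypothetical. It raises a signal only and leaves the choice of response to a later step.

```python
from collections import defaultdict, deque


class DeviceLoginMonitor:
    """Flag devices that attempt logins for many different users in a short window.

    Assumes events for a device arrive in timestamp order (seconds).
    """

    def __init__(self, window_seconds=300, max_distinct_users=3):
        self.window_seconds = window_seconds
        self.max_distinct_users = max_distinct_users
        self._attempts = defaultdict(deque)  # device_id -> deque of (timestamp, user_id)

    def observe(self, device_id, user_id, timestamp):
        """Record one login attempt; return True if the device looks suspicious."""
        attempts = self._attempts[device_id]
        attempts.append((timestamp, user_id))
        # Discard attempts that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while attempts and attempts[0][0] <= cutoff:
            attempts.popleft()
        distinct_users = {user for _, user in attempts}
        return len(distinct_users) > self.max_distinct_users


monitor = DeviceLoginMonitor(window_seconds=60, max_distinct_users=2)

signals = [
    monitor.observe("device-1", "alice", 0),
    monitor.observe("device-1", "bob", 10),
    monitor.observe("device-1", "carol", 20),   # third distinct user within 60s
    monitor.observe("device-1", "alice", 100),  # earlier attempts have expired
]
print(signals)
```

Repeated attempts by the same user on one device never trip this check, because only distinct users within the window are counted.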
Fraud detection requires nuance and sophistication to ensure we don’t penalise legitimate users. Multiple login attempts on a single device are common practice for accountants working on behalf of a range of clients, for example.
In this example, an event might query an API to serve additional security questions as part of the sign-in process. Alternatively, if the bank account, IP address, or device has a historical record of criminal activity or red-flag behaviours in the platform—as identified in the metadata associated with that user’s profile—the submission may be blocked entirely.
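The graduated response described here can be sketched as a small decision function that combines a real-time signal with historical flags from the user's profile. The action names, the context fields, and the ordering of the rules are assumptions made for illustration; the source says only that a submission may be blocked, so the real policy is likely more nuanced.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "ask_additional_security_questions"
    BLOCK = "block_submission"


@dataclass(frozen=True)
class RiskContext:
    """Signals available at decision time (all field names are hypothetical)."""
    many_users_on_device: bool   # real-time signal from event processing
    device_flagged: bool         # historical flags from profile metadata
    ip_flagged: bool
    bank_account_flagged: bool


def decide(ctx: RiskContext) -> Action:
    """Pick a journey for the user based on real-time and historical signals."""
    # A known-bad device, IP address, or bank account takes priority.
    if ctx.device_flagged or ctx.ip_flagged or ctx.bank_account_flagged:
        return Action.BLOCK
    # Unusual but explainable behaviour (e.g. an accountant with many clients)
    # leads to extra checks rather than a block.
    if ctx.many_users_on_device:
        return Action.STEP_UP
    return Action.ALLOW


accountant = RiskContext(True, False, False, False)
known_bad_ip = RiskContext(True, False, True, False)
ordinary_user = RiskContext(False, False, False, False)

print(decide(accountant), decide(known_bad_ip), decide(ordinary_user))
```

Keeping the decision separate from detection means the thresholds and the responses can be tuned independently as fraud patterns change.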
Thanks to event-driven architecture and detailed user profiles generated from historical interactions with the platform, HMRC can determine which processes it adopts or alters based on a real-time portrait of each customer.
The result? Improved experiences for legitimate users, and far more effective protection against would-be criminals.
About the tech stack
The technical infrastructure of the Customer Insights Platform has evolved over time.
Using an emergent design approach, the team has been able to flexibly build in new capabilities, integrations, and ancillary services to meet evolving needs quickly. Over time, some of these solutions and third-party integrations have included:
- Kafka Connect
- Apache Spark
- Apache Hudi
- Apache Superset
- Jupyter Notebooks