Beyond Gen AI: Experts explore the benefits of AI in finance and superannuation
When we started planning our latest Melbourne Expert Talks event, we didn’t anticipate bulldozers becoming a central AI metaphor. But that’s the beauty of bringing experts together to talk about technology and leadership – you never know where the conversation might lead!
With a recent Gartner poll finding that 55% of organisations globally are already in pilot or production with Generative AI, we wanted to discover if Australia’s finance and superannuation sectors were investing in AI. At “Beyond Gen AI: Navigating how businesses realise the benefits of AI” our experts shared insights into how organisations are embracing the potential of AI and the foundational steps that organisations need to maximise their return on investment. As moderator of the panel, I’m delighted to share some of the discussions here.
Combining Artificial Intelligence with Human Intelligence
Gen AI continues to dominate technology conversations, but more organisations are understanding that its efficacy hinges on human oversight. Stephen Reilly, Chief Operating Officer at HESTA, illustrated this with the memorable bulldozer analogy: “No-one is surprised that a bulldozer can lift more dirt than a human with a shovel. But you still need a human to drive the bulldozer and decide where you want the dirt to go. It’s the same with AI. It’s no surprise that AI can do things faster than a human, but you still need a human in the process to get the best results.”
Claire Cornfield, Senior Executive Head of Customer Experience at La Trobe Financial, expanded on the importance of combining artificial intelligence with human intelligence, especially in organisations where trust is vital, such as financial services. “We need to decide what we want the technology to do, what we want our staff to do and where each can add value. But we’ll still need humans to be involved in highly emotive or sensitive areas, even though these are also the more challenging jobs.”
Our panel also highlighted AI’s potential to improve customer experience and be used to create better outcomes for vulnerable people. For example, within healthcare, we are seeing the potential for AI to use data sets to predict health problems and enable early intervention. Andrea Lymbouris, Head of Information Services at State Trustees, can see the potential in AI to provide improved personalised customer interactions within her teams. “AI could support our consultants access data about the client they are speaking to, when they last called and why, so the client doesn’t have to repeat all the information and we can give them the help they need quicker.”
Balancing AI’s risks and rewards
The finance and superannuation sectors, bound by regulatory constraints and financial responsibilities, can often be seen as cautious in their approach to new technologies. But even within this sector, each business will have a different appetite for risk, said Michael Collins, former Chief Information Security Officer at Judo Bank. “Other businesses are going to run harder and run faster with AI – they won’t have the sensitive data we do so they’re going to be able to take more risks. But within each organisation, it comes down to what your board and your senior management are comfortable with from a risk perspective.”
Stephen noted numerous potential AI applications in the competitive superannuation sector, such as personalised experiences and detecting anomalies in customer behaviour. “But, as with everything with technology, we have been very conscious with what we enable,” he said. “We want to ensure it adds value, optimise our use and ensure we keep it secure.”
Claire also raised concerns about AI’s impact on talent development if it is used to automate some tasks. “A lot of AI-use cases are automating tasks that usually fall to entry-level roles,” she said. “But these tasks, like dealing with calls in a customer centre, give people the breadth of experience that they can take into their long-term financial services careers. If we take that work away, how are they going to get that experience?”
No doubt inspired by the CrowdStrike incident in the week before the event, the panel also stressed the importance of human intervention in AI systems. Michael said: “Good AI needs three things – confidentiality, integrity and security. But if an AI system went down, you would still rely on humans to be involved to fix it or maintain the service.”
For Andrea, the biggest challenge is creating a strategy for the business when AI is advancing so rapidly, and deciding when to leverage AI capabilities built into the tools the business already procures and when to take a wider approach. Andrea added: “I think from a technology perspective we need to rapidly increase our skill set and expand our knowledge of AI.”
Alongside AI strategies, getting funding for AI projects can also be a challenge, with limited investment models currently available to organisations to use within business cases. Michael said: “You’ve got to be very clear about why you’re asking for money and what you want to do because the business is always making trade-offs. But it’ll again come back to risk appetite and where you can pivot funding from in your current strategy.”
The importance of data quality and security
I’ve worked in digital transformation for a long time and I’ve seen the same questions about risk come up with each new technology – I remember people being horrified when we first introduced APIs in banking for example. But the difference I see with AI is that it is a technology that can be democratised and anyone can access tools like ChatGPT. During the panel, I asked if it means that tasks we often put on the back burner, such as data cleansing or resolving our internal data permissions, now become consequential. The question of data quality is certainly one we hear from businesses looking to start working with AI, particularly for organisations looking after sensitive information for their customers or clients.
Andrea said: “I think organisations are right to be thinking about protecting and securing the data. We need to be very mindful of it and put in place some additional risks and controls.” Michael also echoed the data security sentiment, adding: “You need to understand your data, where it is and how you’re going to use it before you start just running to the sexiest thing that’s on the internet and trying to install it and see how it goes.”
But Stephen also cautioned organisations against waiting too long for their data to be perfect before embarking on an AI project. “Your data is never going to be perfect, so you have to figure out how to build in the margin for error for imperfect data. You have to drive forward. My encouragement is to test the quality of your data, overlay human intelligence onto your AI and embrace data governance people.”
Conclusion
We’d like to thank everyone who attended the event and our panel members for sharing their expert insights.
We’re also delighted to announce that we will be matching the total amount raised from the event, boosting the final figure to $1,500 donated to the Aboriginal Investment Group’s Remote Laundry Project. This will power a laundry site for an entire year, giving remote Aboriginal communities access to free laundry services to improve health and social outcomes.
Watch out for details of our next Expert Talks event and, if you’re interested in exploring Gen AI in your organisation, contact the Equal Experts Australia team.
How can autonomous agents be used in business processes to improve outcomes and deliver better solutions architecture?
Autonomous agents are individual AI entities that can be configured to analyse context and perform actions to complete tasks, then work together to achieve a goal. Spending on this type of AI is expected to increase from $4 billion to more than $64 billion over the next seven years.
The idea of intelligent or autonomous agents isn’t new. I remember seeing a very early version in the video game Minecraft, where I could create a robotic player that had skills associated with it and give it a daily rota of activities. The bot could then plan tasks, like: “I need to cut down a tree, but first I need the skill of how to cut down the tree.”
These bots used GPT (generative pre-trained transformer) to enhance their own code base. The basic idea is that the agent knows it needs to achieve a task (like cut down the tree or kill a spider.) It’s going to look at the database and identify the skill that seems closest to what it needs. The agent will either fail or succeed, and that information can be used to update the description of the skill, so that next time, the matching process runs better.
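The matching-and-feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the bots' actual code: the `Skill` class, the function names and the word-overlap score are all assumptions standing in for whatever matching the real agents use.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical record for one entry in the agent's skill database.
    name: str
    description: str
    known_failures: list[str] = field(default_factory=list)


def words(text: str) -> set[str]:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def closest_skill(task: str, skills: list[Skill]) -> Skill:
    """Pick the skill whose description overlaps most with the task.

    Word overlap (Jaccard similarity) is an assumed stand-in for the
    similarity measure a real agent would use.
    """
    task_words = words(task)

    def overlap(skill: Skill) -> float:
        skill_words = words(skill.description)
        union = task_words | skill_words
        return len(task_words & skill_words) / len(union) if union else 0.0

    return max(skills, key=overlap)


def record_outcome(skill: Skill, task: str, succeeded: bool) -> None:
    """Feed the result back so the next match runs better."""
    if succeeded:
        # Enrich the description with the task it just solved.
        skill.description += f" Also used for: {task}."
    else:
        # Keep failures separate so they do not attract future matches.
        skill.known_failures.append(task)
```

For example, given one skill described as "cut down a tree with an axe" and another as "attack a hostile spider", the task "kill a spider" matches the second, and a success adds "kill a spider" to that skill's description.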
How can autonomous agents be used in business processes to improve outcomes?
Now, we’re trying to understand how autonomous agents could be used in Equal Experts’ business processes to improve outcomes and deliver better solutions architecture.
Gen AI has found a foothold in software development, enabling developers and software engineers to generate code, create test suites and be ‘ten exed’ (10x) by various copilots. But can this be extended to other roles?
In recent months, I’ve been working to create autonomous agents that can take on the persona of various team members, like a business analyst or solutions architect. The question I asked is, “Can the agent take poorly specified requirements and identify missing information, or see where clarification is needed?”
We then made a chat interface where the persona can ask me questions, and I respond, which allows the persona to make the necessary changes and improvements to the request. Alternatively, we can build the capability into the workflow tool itself, such as Jira.
In a customer setting, this could look like:
- The user fills out a Jira ticket, which is moved into a Kanban column for the auto-BA to read, identifying missing information or details that need further clarification
- The result is reviewed by a human who confirms the ticket has enough upfront analysis to move to the next stage
- The ticket is moved in Jira to another team of agents that perform the task breakdown and propose a solution, based on existing knowledge of business rules and understanding of what a good architecture looks like
- The agents use modular diagramming tools to generate architecture diagrams
- The agent then creates a reply in Jira showing the target architecture, components and workflow for a human to review
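To make the first step of this workflow concrete, here is a rough sketch of an "auto-BA" check. It is deliberately rule-based rather than LLM-driven, and everything in it is hypothetical: the section names, the questions and the column names are invented for illustration, and it does not call Jira.

```python
# Hypothetical checklist: section keyword -> question to ask if it is missing.
REQUIRED_SECTIONS = {
    "goal": "What outcome should this change achieve?",
    "users": "Who will use it?",
    "acceptance criteria": "How will we know it is done?",
}


def clarifying_questions(ticket_text: str) -> list[str]:
    """Return one question per checklist section the ticket never mentions."""
    text = ticket_text.lower()
    return [
        question
        for section, question in REQUIRED_SECTIONS.items()
        if section not in text
    ]


def next_column(ticket_text: str) -> str:
    """Decide which (made-up) Kanban column the ticket should move to."""
    if clarifying_questions(ticket_text):
        return "Needs clarification"
    return "Ready for human review"
```

A real agent would replace the keyword check with a model prompt, but the shape is the same: read the ticket, list what is missing, and only pass it to a human reviewer once the gaps are closed.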
Getting autonomous agents to a first high-level response
We want to get to a point where the agents can generate a first high-level response – to tell us what the target architecture is, its components, and what the workflow looks like, and then to summarise that back again, either as an attachment in Jira or in the Jira ticket itself.
Currently we’re in the early stages of that process. We’re using Microsoft AutoGen and we’ve got proofs of concept, and I’ve been working on Jira integration. It’s been a struggle, but we can create tickets and move tickets around in Jira.
What we want to do next is identify a list of what we’d require, as experts, for a good requirement document, asking: ‘What extra details should we be looking for?’ We are holding a workshop with a few folks to try and codify those elements.
Doing more with less
The primary benefit of autonomous agents is the potential to reduce latency in processes and help individuals be more efficient. You can have tickets that are responded to immediately. On a large engagement you could have hundreds of change requests coming in on a daily basis, and the team is swamped. Having automated processes in place that ensure tickets are accurate and complete saves time, and ultimately, money. That’s exciting at a time when we all want to be doing more with less.
At the moment, the agents are acting as a business analyst and solution architect. But there’s no reason why you couldn’t be looking at things like security risk assessments, pulling in information upfront, asking about applications and generating security threat models.
The potential is to make people ten times more efficient, not just developers, but lots of professions, right through the stack.
In my first blog post on ChatGPT I noticed that, to get good results for most of the work I wanted, I needed to use the latest model, which is tuned for instruction/chatgpt type tasks (text-davinci-003); the base models just didn’t seem to cut it.
I was intrigued by this, so I decided to have another go at fine-tuning ChatGPT, to see if I could find out more. Base models (e.g. davinci as opposed to text-davinci-003) use pure GPT-3. (Actually, I think they are GPT-3.5 but let’s not worry about that for now). They generate completions based on input text; give them the start of a sentence and they will tell you what the next words will typically be. However, this is different from answering a question or following an instruction.
In the following example you can see the top 5 answers to the prompt “Tell me a story” when using the base GPT-3 model (which is davinci – currently the most powerful we have access to):
You can see from these answers how GPT-3 works; it has found completions based on examples in the training corpus. “Tell me a story, mama,” the youngster said is likely a common completion in several novels. But this is not what I wanted! What I wanted was for ChatGPT to actually tell me a story.
If I give the same query to a different model – text-davinci-003 – I get the sort of results I was looking for. (The answers have been restricted to a short length, but it could carry on extensively.)
Clearly the text-* models have significantly enhanced functionality compared with the standard GPT-3 models. In fact, one of the important things that ChatGPT has done is focus explicitly on understanding the user’s intention when they create a prompt. And as well as allowing ChatGPT to understand the intention of the user, the approach also prevents it from responding in toxic ways.
How they do this is fascinating; they use humans in the loop, and reinforcement learning. First, a group of people create the sort of training set you’d expect in a supervised learning approach. For a given prompt “Tell me a story”, a person creates the response “Once, upon a time there was a …” These manually created prompt-response pairs are used to fine-tune ChatGPT’s base model. Then, people (presumably a different group of people) are used to evaluate the performance by ranking outputs from the model – this is used to create a separate reward model. Finally, the reward model is used in a reinforcement learning approach to update the fine-tuned model made in the first stage. (Reinforcement learning is an ML technique where you try lots of changes and see which ones do better against some reward function. It’s the approach that DeepMind used to train a computer to play video games.)
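To give a feel for how human rankings become a training signal, here is a small sketch of a pairwise ranking loss, a common way to train a reward model from "output A was preferred to output B" judgements. The function name is my own, and I am assuming this formulation for illustration; the description above does not say exactly which loss was used.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human comparison between two model outputs.

    Equals -log(sigmoid(reward_preferred - reward_rejected)): small when
    the reward model scores the human-preferred output higher, large when
    it scores the rejected output higher.
    """
    difference = reward_preferred - reward_rejected
    # log(1 + exp(-d)) is the same quantity as -log(sigmoid(d)).
    return math.log1p(math.exp(-difference))
```

Minimising this loss over many ranked pairs pushes the reward model to agree with the human rankings, and that reward model then scores outputs during the reinforcement learning stage.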
It’s worth noting the sort of investment required to do this. Using people to label and evaluate machine learning outputs is expensive, but it is a great way to improve the performance of a model. In recent LinkedIn posts I have seen people claiming to have been offered jobs to do this sort of work, so it looks like OpenAI are continuing to refine the model in this way. Reinforcement learning is usually a very expensive way of learning something; reinforcement learning on top of deep learning is even more expensive, so there has clearly been a lot of time, effort and money expended in developing the model.
ChatGPT are open in their approach – here’s their take on how to use ChatGPT; I really applaud their openness. I’m aware there are a lot of negative reactions to the tool, but it’s worth pointing out that the developers are clearly aware that the model is not perfect. If you look at the limitations section of the document they note several things including:
- It can still generate toxic or biased outputs
- It can be instructed to produce unsafe outputs
- It is trained only in English, so will be culturally-biased to English-speaking people
In my opinion, generative models are not deterministic in their outputs so these sorts of risks will always remain. But I think serious thought and expense has been applied in order to reduce the likelihood of problematic results.
I started this investigation into fine-tuning ChatGPT because I noticed that I wasn’t able to fine-tune on the text-* versions of the model. In fact, it clearly states in the ChatGPT fine-tuning guide that only the base models can be fine-tuned. Now I understand why. Unfortunately, I also understand better that the text-* models are those which have been improved with the human in the loop process. And these are the ones that have a lot of the secret sauce for ChatGPT; it is the refinement using human guidance that gives it that fantastic human-like ability which seems so impressive.
The implication for the use-cases I had in mind – fine-tuning ChatGPT to specific contexts, such as question answering about Equal Experts – is that you cannot use the model which understands your intention. So you will not get the sort of natural-sounding responses we see in all the examples people have been posting on the internet. That’s a bit sad for me, but at least now I know.
Pretty much everyone by now has heard of – and probably played with – ChatGPT (If you haven’t, go to https://openai.com/blog/chatgpt/ to see what all the fuss is about.); it’s a chatbot developed by OpenAI based on an extremely large natural language processing model, which automatically answers questions based on written prompts.
It’s user friendly, putting AI in the hands of the masses. There are loads of examples of people applying ChatGPT to their challenges and getting some great results. It can tell stories, write code, or summarise complex topics. I wanted to explore how to use ChatGPT to see if it would help with the sorts of technical business problems that we are often asked to help with.
One obvious area to exploit is its ability to answer questions. With no special training I asked the online ChatGPT site “What is Continuous Delivery?”
That’s a pretty good description and, unlike traditional chatbots, it has not been made from curated answers. Instead, it uses the GPT-3 model which was trained on a large corpus of documents from CommonCrawl, WebText, Wikipedia articles, and a large corpus of books. GPT-3 is a generative model – if you give it some starter text it will complete it based on what it has encountered in its training corpus. Because the training corpus contains lots of facts and descriptions, ChatGPT is able to answer questions. (Health warning – because the corpus contains information that is not always correct, and because ChatGPT generates responses from different texts, in many cases the answers sound convincing but might not be 100% accurate.)
This is great for questions about generally known concepts and ideas. But what about a specific company? Can we use ChatGPT to tell us about what an individual company does?
- Can it immediately answer questions about a company, or does it need to be provided with more information? Can customers and clients use ChatGPT to find out more about an organisation?
- If we give it specific texts about a company can it accurately answer questions about that company?
Let’s look at the first question. We’ll pretend I’m a potential customer who wants to find out about Equal Experts.
That’s disappointing. Let’s try rephrasing:
It’s still not the answer I was hoping for. Although I do now know that I need a diverse team with a range of skills!
Clearly ChatGPT is great for general questions but won’t help potential customers find out about Equal Experts (or most other businesses.) So maybe I can improve things. ChatGPT has the ability to ‘fine-tune’ models. You can provide additional data – extra documents which contain the information you want – and then retrain the model so it also uses this new information.
So, can we fine-tune GPT to create a model which knows about Equal Experts? I found some available, good quality text on Equal Experts – the case-studies on our website – and used it as training material.
I wrote a small number of sample questions and answers for individual case-studies and submitted them for training. I used a simplified approach to the one given by OpenAI. (For those interested I didn’t use a discriminator because of the low training data volumes.) I then asked some questions for which I knew the answers were in the training data, and displayed the top 5 answers from our new model via the ChatGPT API:
I think we can agree that these are not great results. I should caveat with the fact that creating training examples takes a lot of time. (It is often the biggest activity in an AI or machine learning initiative.) The recommendation is to use 100s of training examples for ChatGPT and I only created 12 training prompts. So results could well be improved with a bigger training set. On the other hand, one of the big selling points of GPT-based models is that they facilitate one-shot or few-shot learning (only needing a few examples to train the model on a new concept – in this case Equal Experts). So I’m disappointed that training has made no difference and that the answers are poor compared to the web interface.
In fact the results are also affected by the training procedure. The training approach to using ChatGPT gives a context first (some text), then a question, then the answer. If I give context with the question (e.g. some relevant text such as an EE case study) and then ask a question, ChatGPT responds much better.
These are much better answers (although far from perfect), but I have had to supply relevant text in the first place, which rather defeats the object of the exercise.
So, could I get ChatGPT to identify the relevant text to use as part of the prompt? You can, in fact, use ChatGPT to generate embeddings – a representation of the document as a mathematical vector – which can be used to find similar documents. So this suggests a slightly different approach:
- For each case-study – generate embeddings
- For a given question – find the case-studies which are most similar to the question (using the embeddings)
- Use the similar case-studies as part of the prompt
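The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the code I ran: the embeddings are taken as already computed (here just lists of floats), and the function names and prompt layout are my own.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    question_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 1
) -> list[str]:
    """Names of the k documents whose embeddings are closest to the question."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(question_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_texts: list[str]) -> str:
    """Put the retrieved case-study text in front of the question."""
    context = "\n\n".join(context_texts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In use, each case-study is embedded once and stored; at question time only the question needs embedding before the closest case-studies are looked up and pasted into the prompt.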
This gives some pretty good responses; this image shows the top 5 responses for some questions about our case studies:
These look like pretty relevant responses but they’re a bit short, and they definitely don’t contain all the available information in the case-studies. It turned out that the default response size is 16 tokens (sort of related to word count). When I set it to 200 I get these results for the question “What is a data health check?”:
Well that’s a much better summary, and it gave good results for other queries also:
Throughout this activity, I found a number of other notable points:
- Using the right model in ChatGPT is really important. I ended up using the latest model, which is tuned for instruction/chatgpt type tasks (text-davinci-003), and got good results for most of the work. Base models (e.g. davinci) just didn’t seem to cut it. This matters when you are thinking of fine-tuning a model because you cannot use the specialised models as the basis of a new one.
- The davinci series models used a lot of human feedback as part of their training, so they give the best results compared to simpler models. (Although I accept the need to experiment more.)
- Quite often the model would become unavailable and I would get the following message. If you want to use the API in a live service, be aware that this can happen quite regularly.
- Each question cost about 5-6 cents.
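On the point about the model becoming unavailable: if you do call the API from a live service, one common way to cope is to retry with an increasing delay. The sketch below is generic and hypothetical; `call` stands for whatever function makes your API request, and the parameter names and defaults are my own choices.

```python
import time


def call_with_retries(
    call,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple = (Exception,),
    sleep=time.sleep,
):
    """Run call(), retrying with exponential backoff if it raises.

    Waits base_delay, then 2x, then 4x... between attempts, and re-raises
    the last error once max_attempts is reached. In real use, narrow
    retry_on to the "temporarily unavailable" errors your client raises.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` in as a parameter is only there to make the function easy to test without actually waiting.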
I have noticed a wide range of reactions from people using ChatGPT, from ‘Wow- this will change the world!’ to ‘It’s all nonsense – it’s not intelligent at all.’ – even amongst people with AI backgrounds. Having played with it, I think there are lots of tasks it can help with; organisations that can figure out where it will help them and how to get the most out of it will really benefit. Watching how things play out with ChatGPT and its successors (hello GPT-4) is going to be a fascinating ride.
When our client receives project requests from their customers, a lot of time and cost is spent on resourcing project management – determining which teams should conduct the work – before we even start delivering the work. Here’s how we used data science and dashboarding to speed up this processing time and provide up-to-date metrics on the project delivery process.
The problem of resourcing project management
Our client runs an ever-growing department of over 800 people, delivering numerous projects in parallel, and the number of projects grows year after year. However, the client’s method of distributing work to the relevant teams (what they call their impacting process) hasn’t scaled with the success the client is having.
Impacting is a resource-intensive process requiring each team to read multiple documents – sometimes up to 25k words – to identify whether they are required for the project, and often they’re not. This results in a slow, manual process that requires multiple redundant points of contact.
After a project has been through the impacting process and is being delivered, there is no automated reporting. Typically, reporting is triggered by a status request from a senior leader, at which point the data is manually collected, creating slow and infrequent feedback loops.
This is an intensive process which puts tremendous strain on an already busy department, especially as they currently have to process over 100 project requests a week.
Our aim is to reduce the number of people involved in a project impact to only the most relevant individuals, and to streamline the amount of reading required to understand the project.
Leveraging data science for improved project resourcing and reporting
As the client had no clear insight on in-progress projects, we determined that the most useful first step was to provide reporting on these projects using data from their Jira ticketing system. This allows senior leaders to access project delivery information quickly and interactively, enabling them to identify issues and bottlenecks before they become problems.
We then focused on reducing the resource overhead in the impacting process. Project impacting is designed to determine which teams are required to work on a project. In this case, it involved a lot of people reading large documents which were potentially irrelevant to their team’s specialism.
So we sought to improve the impacting process in two ways:
- Can we reduce the amount of time needed to understand the project?
- Can we highlight the project to only the relevant teams?
The scope of data science
Reducing time to understanding
With a typical design document being approximately 25,000 words, it takes a person roughly 3-4 hrs to read. Reducing the amount of text needed to understand the document would result in significant time savings per person.
This was done in a variety of ways; firstly we used an AI model to summarise the text while retaining important information, allowing users to control the degree of summarisation. This summarisation method is also being used to create executive summaries for the senior leaders who constantly switch context between pieces of work, and need to very quickly understand different projects.
Secondly, we extracted keywords from the text so the user can rapidly determine important terms within the document.
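As a rough idea of what keyword extraction involves, the sketch below ranks words by how often they appear after removing common filler words. It is a simplified stand-in, not the method used on the project; the stop-word list and function name are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "for",
    "on", "with", "that", "this", "be", "as",
}


def top_keywords(text: str, k: int = 5) -> list[str]:
    """Return the k most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words if word not in STOP_WORDS and len(word) > 2
    )
    return [word for word, _ in counts.most_common(k)]
```

Even this crude version shows the benefit: a reader sees the dominant terms of a 25,000-word document at a glance and can judge whether it touches their team's specialism.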
These tools have proved very useful in enabling individuals to quickly establish whether they need to read the document in full, and can slim down reading time from a few hours to a few minutes.
Identifying Relevant People
Typically 12+ people can end up reading these documents, meaning that each project takes 6+ days of work just to impact – and many of these people are not even relevant to the project. Therefore, reducing the number of people reading these documents to only the most relevant compounds the savings given through document summarisation.
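The "6+ days" figure follows from the reading time quoted earlier. A quick worked calculation, assuming an 8-hour working day (my assumption; the function name is also just for illustration):

```python
def impacting_effort_days(
    readers: int, hours_per_reader: float, hours_per_day: float = 8.0
) -> float:
    """Total reading effort for one project impact, in working days."""
    return readers * hours_per_reader / hours_per_day


# 12 readers at the upper estimate of 4 hours each.
upper_estimate = impacting_effort_days(12, 4)
# 12 readers at the lower estimate of 3 hours each.
lower_estimate = impacting_effort_days(12, 3)
```

With 12 readers the effort is 4.5 to 6 working days per project, and it grows with every extra person added to the impact.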
To do this we developed a machine learning classifier to determine which teams were relevant to a project, reducing the people required for impacting. Additionally, we identified similar existing projects and the teams involved in those, to further assist in establishing the right teams for the work.
A future enhancement we wish to add is building a recommender system that automatically alerts people if new projects arrive that are similar to previous projects they have delivered, further reducing the operational overhead.
The business value of improving project resourcing and reporting through data science
The client is now able to direct incoming projects to the relevant teams much faster, reducing the delay between a project’s request and work starting, and improving new customer satisfaction. The people involved in impacting now have time freed up to lead the deliveries of in-progress projects, which also benefits existing customers and team efficiency.
ML solutions need to be monitored for errors and performance just like any other software solution. ML driven products typically need to meet two key observability concerns:
- Monitoring the model as a software product that includes metrics such as the number of prediction requests, its latency and error rate.
- Monitoring model prediction performance or efficacy, such as f1 score for classification or mean squared error for regression.
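For readers less familiar with the two efficacy metrics named above, here is a minimal sketch of how each is computed from true and predicted values. In practice you would normally use a library implementation; the function names here are just for illustration.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for binary labels (1 = positive): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```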
Monitoring as a Software Product
As a software product, monitoring can be accomplished using existing off-the-shelf tooling such as Prometheus, Graphite or AWS CloudWatch. If the solution is created using auto-generated ML this becomes even more important. Model code may be generated that slows down predictions enough to cause timeouts and stop user transactions from processing.
You should ideally monitor:
- Availability
- Request/Response timings
- Throughput
- Resource usage
Alerting should be set up across these metrics to catch issues before they become critical.
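As an illustration of threshold-based alerting on these metrics, the sketch below summarises a window of prediction requests and reports which limits were exceeded. The metric names, thresholds and function names are assumptions for the example; in practice a tool such as Prometheus would do this for you.

```python
import math


def summarise_requests(requests: list[tuple[float, bool]]) -> dict[str, float]:
    """Summarise a window of (latency_ms, succeeded) prediction requests."""
    latencies = sorted(latency for latency, _ in requests)
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    errors = sum(1 for _, succeeded in requests if not succeeded)
    return {
        "p95_latency_ms": latencies[p95_index],
        "error_rate": errors / len(requests),
    }


def breached(summary: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Names of the metrics that exceed their alert threshold."""
    return [name for name, limit in thresholds.items() if summary[name] > limit]
```

For example, with thresholds of 500 ms for p95 latency and 5% for error rate, a window where two requests in ten are slow failures would trip both alerts.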
Monitoring model prediction performance
ML models are trained on data that’s available at a certain point in time. Data drift happens when the distribution of the input data changes after training; concept drift happens when the relationship between the inputs and the target changes. Either can affect the performance of the model. Let’s imagine we have a user signup model that forecasts the mean basket sales of users for an online merchant. One of the input variables the model depends on is the age of the new users. As we can see from the distributions below, the age of new users has shifted from August to September 2021.
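One way to put a number on a shift like this is the Population Stability Index (PSI), which compares the share of values falling into each bin across two periods. The sketch below is an illustration only: the bin edges and function names are assumptions, and the often-quoted "PSI above 0.2 means significant shift" is a rule of thumb rather than a hard limit.

```python
import math
from bisect import bisect_right


def bin_fractions(values: list[float], edges: list[float]) -> list[float]:
    """Share of values in each bin; edges split the range into len(edges)+1 bins."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [count / len(values) for count in counts]


def population_stability_index(
    expected: list[float], actual: list[float], edges: list[float], eps: float = 1e-6
) -> float:
    """PSI between a reference sample and a newer sample of one feature."""
    total = 0.0
    for exp_frac, act_frac in zip(
        bin_fractions(expected, edges), bin_fractions(actual, edges)
    ):
        # eps avoids log(0) when a bin is empty in either sample.
        exp_frac, act_frac = max(exp_frac, eps), max(act_frac, eps)
        total += (act_frac - exp_frac) * math.log(act_frac / exp_frac)
    return total
```

Run on the ages of new users from two months, a PSI of zero means the binned distributions are identical, and larger values mean a bigger shift.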
It is important to monitor the live output of your models to ensure they are still accurate against new data as it arrives. This monitoring can provide important cues when to retrain your models, and dashboards can give additional insight into seasonal events or data skew.
There are a number of metrics which can be useful including:
- Precision/Recall/F1 Score.
- Model score outputs.
- User feedback labels or downstream actions
- Feature monitoring (Data Quality outputs such as histograms, variance, completeness).
The right metrics for your model will depend on the purpose of the model and the ability to access the performance data in the right time frame.
Below is an example of a classification performance dashboard that tracks the precision and recall over time. As you can see, the model’s performance is becoming more erratic and degrading from 1st April onwards.
Alerting should be set up on model accuracy metrics to catch any sudden regressions that may occur. This has been seen on projects where old models have suddenly failed against new data (fraud risking can become less accurate as new attack vectors are discovered), or where an auto ML solution has generated buggy model code. Some ideas on alerting are:
- % decrease in precision or recall.
- variance change in model score or outputs.
- changes in dependent user outputs e.g. number of search click throughs for a recommendation engine.
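The first of these ideas, a percentage decrease in precision or recall, can be sketched as a comparison between a baseline and the latest values. The function name and the 10% default are assumptions for illustration; the right tolerance depends on how noisy your metrics are.

```python
def relative_drop_alerts(baseline, current, max_drop=0.10):
    """Return (metric, relative drop) pairs where a metric fell too far.

    baseline and current map metric names to values; max_drop=0.10 means
    a fall of more than 10% relative to the baseline triggers an alert.
    """
    alerts = []
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue  # nothing to compare against
        drop = (base - current[name]) / base
        if drop > max_drop:
            alerts.append((name, round(drop, 3)))
    return alerts
```

For example, a fall in precision from 0.80 to 0.60 is a 25% relative drop and would raise an alert, while a fall in recall from 0.70 to 0.69 would not.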
The chart below illustrates that model A’s performance degrades over time. A new challenger model, B, is re-trained on more recent data and becomes a candidate for promotion.
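A promotion decision like this can be reduced to a simple rule, sketched below. We assume both models were scored on the same recent evaluation windows, and the function name and minimum gain are hypothetical; real promotion criteria usually also cover latency, fairness and business metrics.

```python
def should_promote(champion_scores, challenger_scores, min_gain=0.02):
    """Promote the challenger only if it beats the champion by min_gain on average.

    Both lists must hold scores for the same evaluation windows.
    """
    if not champion_scores or len(champion_scores) != len(challenger_scores):
        raise ValueError("scores must cover the same evaluation windows")
    champion_avg = sum(champion_scores) / len(champion_scores)
    challenger_avg = sum(challenger_scores) / len(challenger_scores)
    return challenger_avg - champion_avg >= min_gain


# Made-up scores: model A degrades over time, model B was trained on recent data.
model_a = [0.78, 0.74, 0.70]
model_b = [0.80, 0.81, 0.79]
promote_b = should_promote(model_a, model_b)
```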
Data Quality
A model is only as good as the data it’s given, so instrumenting the quality of input data is crucial for ML products. Data quality includes the volume of records, the completeness of input variables and their ranges. If data quality degrades, this will cause your model to deteriorate.
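The three aspects mentioned here (volume, completeness and ranges) can each be checked with a few lines of code before data reaches the model. The sketch below uses hypothetical field names and limits, and represents records as plain dictionaries to stay self-contained.

```python
def data_quality_report(rows, required, ranges, min_rows):
    """Return a list of data quality issues for a batch of input records.

    rows:     list of dicts, one per record
    required: field names that must not be missing
    ranges:   field name -> (lowest allowed, highest allowed)
    min_rows: smallest acceptable batch size
    """
    issues = []
    if len(rows) < min_rows:
        issues.append(f"volume: {len(rows)} rows, expected at least {min_rows}")
    for field in required:
        missing = sum(1 for r in rows if r.get(field) is None)
        if missing:
            issues.append(f"completeness: {field} missing in {missing} rows")
    for field, (low, high) in ranges.items():
        bad = sum(1 for r in rows
                  if r.get(field) is not None and not low <= r[field] <= high)
        if bad:
            issues.append(f"range: {field} outside [{low}, {high}] in {bad} rows")
    return issues


# Made-up batch: one missing age and one implausible age.
batch = [
    {"age": 34, "basket": 20.0},
    {"age": None, "basket": 15.5},
    {"age": 212, "basket": 9.0},
]
report = data_quality_report(batch, ["age", "basket"], {"age": (16, 110)}, min_rows=2)
```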
To read more about monitoring and metrics for MLOps, download our new MLOps playbook, “Operationalising Machine Learning”, which provides comprehensive guidance for operations and AI teams in adopting best practice.
If you’re new to the world of MLOps, here’s what you need to know: MLOps (which stands for machine learning operations) is a set of tools and ideas that help data scientists and operations teams to develop, deploy and monitor models in the AI world.
That’s a big deal because organisations that want to deliver AI projects often struggle to get projects off the ground at scale, and to deliver effective return on investment (ROI). Using MLOps helps those organisations to create machine learning models in a manner that is effective, consistent and scalable.
Over the last decade, machine learning has become a critical resource for many organisations. Using ML, companies can create models that analyse vast quantities of structured and unstructured data, making predictions about business outcomes that can be used to inform faster, better decisions. The challenge, increasingly, is how those organisations monitor and manage multiple ML models and iterations.
MLOps brings discipline and structure to AI
That’s where MLOps comes in. While DevOps focuses on how systems are developed with regard to security, compliance and IT resource management, MLOps focuses on the consistent development of scalable models. Blending machine learning with traditional DevOps models creates an MLOps process that streamlines and automates the way that intelligent applications are developed, deployed and updated.
Examples of how MLOps is being used include:
- Telecoms – using MLOps systems to manage network operations and customer churn models.
- Marketing – in advertising, MLOps is being used to manage multiple machine learning models in production to present targeted ads to consumers.
- Manufacturing – Using machine learning models to predict asset maintenance needs and identify performance and quality problems.
With MLOps, data scientists can place models into production, then monitor and record their performance to ensure they’re working well. They can also capture information on all ML models in a standard form that allows other teams to use those models or revise them later.
How MLOps can deliver higher ROI
This isn’t just about making life easier. We know that 90% of AI projects fail under current development frameworks. MLOps provides a far more reliable, cost-effective framework for development that can deliver successful projects much more quickly. By adopting MLOps, it becomes easier for organisations to make the leap from small-scale development to large-scale production environments. By increasing the speed and success of ML models being deployed, MLOps can improve the ROI of AI projects.
It’s also worth considering that models – by their nature – need to change. Once an ML model is created and deployed, it generally won’t continue operating in the same way forever. Models need to be constantly monitored and checked, to ensure they’re delivering the right insights and business benefits. MLOps helps data scientists to make faster interventions when models need to be revised – such as during a global pandemic or supply chain crisis – with changes deployed at a faster rate.
If organisations want to adopt MLOps they must first build the relevant skills within data and operations teams. This includes skills such as full lifecycle tracking and a solid AI infrastructure that enables the rapid iteration of new ML models. These will need to support both main forms of MLOps – predictive (charting future outcomes based on past results) and prescriptive (making recommendations for future decisions).
Need more guidance?
The key thing to understand about MLOps is that it can’t guarantee success, but it will lower the cost of experimentation and failure.
Ensuring you get the best results from MLOps isn’t always easy, and our MLOps Playbook is a good place to start for guidance on how to maximise the ROI and performance of models in your organisation. The playbook outlines the basic principles of effective MLOps, including creating solid data foundations, creating an environment where data scientists can create and the pitfalls to avoid when creating MLOps practices.
MLOps is still a fairly new concept in many organisations, but according to Cognilytica, the global MLOps market will be worth $4 billion by 2025. The industry is growing by around 50% each year, as organisations look to deliver more value from cutting-edge AI programmes.
We asked our Equal Experts experts to provide answers to some of the most common questions about MLOps:
What is MLOps and why do we need it?
MLOps is a culture and set of best practices that teams can follow to productionize machine learning models. We need MLOps because it can replace the old-fashioned approach of building models in a separate data science team, and then throwing them over the wall to a software engineering team.
What is the difference between MLOps and DevOps?
The difference between MLOps and DevOps is that MLOps is more far-reaching as a framework. In many ways, MLOps can be seen as an extension of DevOps. Once a team productionizes machine learning models, some special roles are needed, such as Data Scientists and Machine Learning Engineers. MLOps gives guidance on how to integrate these roles, and how to handle the technology that is needed to develop and evolve the machine learning models.
How do you learn MLOps?
If you want to learn MLOps the good news is that there are plenty of courses and tutorials. Our advice is not to focus too much on the tools, but to learn the technology-independent best practices and patterns.
What does MLOps stand for?
It stands for Machine Learning Operations.
Why don’t AI projects make it to production?
Many companies don’t realise that a machine learning model only starts its life in a (Jupyter) notebook. Once you have built a model, it needs to evolve and become a part of your technology landscape. This phase of AI development is the most difficult and probably takes most of the time. If you are struggling this article might help.
When is the best time to employ MLOps?
The best practices for MLOps can be applied to any project and team that is working with machine learning. There’s no one best time to employ MLOps.
How do you deploy ML models into production?
To deploy ML models into production, remember that an ML model should be treated as an artefact that needs to be versioned, governed, and managed. This artefact can be deployed to a system in a variety of ways. A common one is in the form of an API, where a wrapper is placed around it so the model can serve as a microservice. Now the API can be published on the company network. Other options are to load the model artefact in memory for distributed stream processing, as a procedure in a database or to deploy the model on edge devices in the internet of things.
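To make the API option concrete, here is a minimal sketch of a wrapper that serves a model over HTTP using only the Python standard library. Everything specific in it is an assumption for illustration: the `/predict` path, the `age` feature, the version label and the stand-in `model_predict` function, which would be replaced by loading your versioned model artefact. A production service would normally use a web framework and add authentication, logging and input validation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_VERSION = "1.0.0"  # hypothetical version label for the model artefact


def model_predict(features):
    """Stand-in for a real trained model: a fixed linear rule, for illustration."""
    return 10.0 + 0.5 * features["age"]


def predict_response(raw_body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(raw_body)
        prediction = model_predict(features)
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a numeric 'age'"})
    return 200, json.dumps({"prediction": prediction, "model_version": MODEL_VERSION})


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = predict_response(self.rfile.read(length))
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the service. This call blocks, so run it from a script entry point."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Keeping the request handling in `predict_response`, separate from the HTTP plumbing, means the prediction logic can be unit tested without starting a server. Returning the model version with every prediction also makes it easier to trace results back to a specific artefact.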
What are examples of ML models in daily life?
Nowadays there are too many examples to mention. The social media feed you see uses a machine learning model to present you content. The spam filter in your email is a machine learning model, as are the recommendations that a website presents to you. A self-driving car is packed with ML. Basically any system that works with data is likely to have an ML model integrated.
Why do ML models degrade in production?
ML models degrade in production because of a process called data drift. This means that the data that the model is using in production is different from the data that it was trained on. This is common, since in most use-cases system behaviours change over time, for example customer behaviours change frequently according to the season. Therefore it is important to keep retraining your models on recent data.
Are machine learning and AI the same thing?
No, machine learning and AI aren’t the same thing. Machine learning is a subset of AI where algorithms are developed and trained on historical data, without being explicitly told how, to make predictions about new data. AI is mostly used in a broader sense, indicating systems that perform intelligent tasks in a human-like manner.
If you want to learn more about MLOps, we’ve written a playbook (pdf version), blogs and case studies which can all be found here.
Our experience of working on AI and ML projects means that we understand the importance of establishing best practices when using MLOps to test, deploy, manage and monitor ML models in production.
Considering that 87% of data science projects never make it into production, it’s vitally important that AI projects have access to the right data and skills to solve the right problems, using the right processes.
Below, we outline six fundamental principles of MLOps that should be at the heart of your AI strategy.
1 – Build solid data foundations
Your data scientists will need access to a store of good quality, ground-truth (labelled) historical data. ML models are fundamentally dependent on the data that’s used to train them, and data scientists will rely on this data for monitoring and training.
It’s common to create data warehouses, data lakes or lake houses with associated data pipelines to capture this data and make it available to automated processes and data teams. Our data pipeline playbook covers our approach to providing this data. Make sure to focus on data quality, security, and availability.
2 – Provide an environment that allows data scientists to create
Developing ML models is a creative, experimental process. Data scientists need a set of tools to explore data, create models and evaluate their performance. Ideally, this environment should:
- Provide access to required historical data
- Provide tools to view and process the data
- Allow data scientists to add additional data in various formats
- Support collaboration with other scientists via shared storage or feature stores
- Be able to surface models for early feedback before full productionisation
3 – ML services are products
ML services should be treated as products, meaning you should apply the same behaviours and standards used when developing any other software product.
For example, when building ML services you should identify and profile the users of a service. Engaging with users early in the development process means you can identify requirements that can be built into development, while later on, users can help to submit bugs and unexpected results to inform improvements in models over time.
Developers can support users by maintaining a clear roadmap of features and improvements with supporting documentation, helping users to migrate to new versions and clearly explaining how versions will be supported, maintained, monitored and (eventually) retired.
4 – Apply continuous delivery of complex ML solutions
ML models must be able to adapt when the data environment, IT infrastructure or business needs change. As with any working software application, ML developers must adopt continuous delivery practices to allow for regular updates of models in production.
We advise that teams should use techniques such as Continuous Integration and Deployment (CI/CD), utilise Infrastructure as Code and work in small batches to have fast, reasonable feedback.
5 – Evaluate and monitor algorithms throughout their lifecycle
It’s essential to understand whether algorithms are performing as expected, so you need to measure the accuracy of algorithms and models. This will add an extra layer of metrics on top of your infrastructure resource measurements such as CPU and RAM per Kubernetes pod. Data scientists are usually best placed to identify the best measure of accuracy in a given scenario, but this must be tracked and evaluated throughout the lifecycle, including during development, at the point of release, and in production.
6 – MLOps is a team effort
What are the key roles within an MLOps team? From our experience we have identified four key roles that must be incorporated into a cross-functional team:
- Platform/ML engineers to provide the hosting environment
- Data engineers to create production data pipelines
- Data scientists to create and amend the model
- Software engineers to integrate the model into business systems
Remember that each part of the team has a different strength – data scientists are typically strong at maths and statistics, while they may not have software development skills. Engineers are often highly skilled in testing, logging and configuration, while data scientists are focused on algorithm performance and accuracy.
At the outset of your project consider how your team roles can work together using clear, defined processes. What are the responsibilities of each team member, and does everyone recognise the standards and models that are expected?
To learn more about MLOps principles and driving better, more consistent best practices in your MLOps team, download our Operationalising Machine Learning Playbook for free.
When using MLOps, it’s easy to focus on the technical aspects of the project. But as we explain in our recently published Playbook, Operationalising Machine Learning, it’s vital that MLOps is based around user involvement at every stage – including some employees you might not expect.
Back in 2014, tech giant Amazon built an internal ML University so that its in-house developers could keep their skills up to date. In 2022, developers still use the university – but so do product managers, program managers and a host of other business users from across Amazon.
What Amazon realised was that giving novice users an understanding of basic AI and ML ideas empowered those users to get involved with data teams, and resulted in better projects. Business users at Amazon play a collaborative role in developing a strong business case for ML models, driving solutions that will meet the needs of the business and its customers.
Without this collaboration, ML teams risk building impressive prototypes that never get business buy-in, or don’t have real world customer impact. That’s something to think about, given that IDC reports that 47% of AI projects never get past the initial experimentation phase, and 28% of projects simply fail.
The importance of user involvement in MLOps
MLOps can help the success of AI projects by providing a structured framework for moving ML models from development through to production and management. Where and how should data teams start to build user involvement into this process?
Here are five ways you should involve end users in your MLOps process:
Step 1: Ask users for input before development starts
A common pitfall when surfacing a new machine learning score or insight is that end users don’t understand or trust new data points. This can lead to them ignoring the insight – no matter how useful it is.
This can be avoided by involving users at the very start of an ML development. What problem does the user expect the model to solve for them? Use this insight to guide the initial investigation and analysis.
Step 2: Demonstrate and Iterate
Once development starts, make a point of demonstrating model results to users as part of the iterative model development – take users on the journey with you. This is an opportunity to gain early feedback that can help guide development of models that will deliver real benefit to the business and its customers. Data teams should surface ML models for early feedback from users before full productionisation. Tools such as Streamlit and Dash can help to prototype and share models with end users.
Step 3: Focus on explainability
As the model nears completion, ensure that you have something that can be explained – this may be the model itself, or how it arrives at a recommendation or insight.
If you’re building a model that will provide an insight into a credit risk score, you might need to explain what data is being used to drive the insight, and how this insight can be applied within the business user’s regular process of processing a loan application, for example.
Step 4: Monitor your users’ experience
Once a model is live, make sure users are involved in testing, and can provide feedback on any bugs or faults they experience. Consider also using telemetry, so that you can monitor the performance of the model and be alerted in case of any issues. You should consider sharing these metrics with business users where appropriate.
These steps will help to build and maintain user trust in the model, and increase the likelihood that the results generated by ML will be adopted as intended.
Step 5: Adopt continuous improvement
When an ML model is in production, you will almost certainly continue to improve the service throughout its lifetime. To maintain high levels of user involvement, capture iterations of your service as versions, and help users to migrate to newer versions.
It’s important to provide good, current user documentation and regularly test how models appear from the user’s perspective. Finally, when you retire a service, have a clear process and ensure that users are helped to move on from a model that will no longer be supported.
Summary
We believe that ML services should be developed and treated as a product, meaning organisations should apply the same behaviours and standards that would be used when developing any other software product.
When developing an ML model, it is essential to identify, profile and maintain an active relationship with the end users of your ML service. Work with users to identify requirements that feed into your development backlog, involve users in validating features and improvements, and notify them of updates and outages. In doing so, you will secure buy-in from business users and increase the odds of the AI project delivering real business value.
In our recent Operationalising ML Playbook we discussed the most common pitfalls during MLOps. One of the most common pitfalls? Failing to implement appropriate secure development at each stage of MLOps.
Our Secure Development playbook describes the practices we know are important for secure development and operations and these should be applied to your ML development and operations.
In this blog we will explore some of the security risks and issues that are specific to MLOps. Make sure you check them all before publishing your model into production.
In machine learning, systems use example data to try to learn something – which may be output as a prediction or insight. The examples used to train ML models are known as training datasets, and security issues can be broadly divided into those affecting the model before and during training, and those affecting models that have already been trained.
Vulnerability to data poisoning or manipulation
One of the most commonly discussed security issues in MLOps is data poisoning – this is an attack where hackers attempt to corrupt or manipulate the data used for training ML models. This might be by switching expected responses, or adding new responses into a system. The result of data poisoning is that data confidentiality and reliability are both damaged.
When data for ML models is collected from sensors or online sources, the risk of data poisoning can be extremely high. Attacks can include label flipping (data is poisoned by changing labels in data) and gradient descent attacks (where the ability of a model to understand how close it is to predicting the correct answer is damaged by either making the model falsely believe it’s found the answer, or by preventing it from finding the answer by constantly changing that answer).
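One simple safeguard against label flipping is to compare incoming training labels with a trusted snapshot of the same records. The sketch below is an illustration under our own assumptions: that records carry stable identifiers, that a trusted snapshot exists, and the function names are hypothetical. It will not catch poisoned records that were never in the trusted set.

```python
def flipped_labels(trusted, candidate):
    """Return the ids of records whose label differs from the trusted snapshot.

    trusted and candidate map record id -> label.
    """
    return sorted(rid for rid, label in trusted.items()
                  if rid in candidate and candidate[rid] != label)


def flip_rate(trusted, candidate):
    """Share of records present in both sets whose label has changed."""
    shared = [rid for rid in trusted if rid in candidate]
    if not shared:
        return 0.0
    return len(flipped_labels(trusted, candidate)) / len(shared)


# Made-up labels: record "a" has been flipped from "fraud" to "ok".
trusted_labels = {"a": "fraud", "b": "ok", "c": "ok", "d": "fraud"}
incoming_labels = {"a": "ok", "b": "ok", "c": "ok", "d": "fraud", "e": "ok"}
```

A flip rate above whatever level your normal label corrections explain is a reason to pause training and investigate before the data reaches a model.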
Exposure of data in the pipeline
You will certainly need to include data pipelines as part of your solution, and in some cases they may use personal data in training. These pipelines should be protected to the same standards as you would apply in any other development. Ensuring the privacy and confidentiality of data in machine learning models is critical to protect against data extraction attacks and function extraction attacks.
Making the model accessible to the whole internet
Making your model endpoint publicly accessible may expose unintended inferences or prediction metadata that you would rather keep private. Even if your predictions are safe for public exposure, making your endpoint anonymously accessible may present cost management issues. A machine learning model endpoint can be secured using the same mechanisms as any other online service.
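As one example of such a mechanism, the sketch below rejects requests that do not carry the expected API key. The `X-API-Key` header name and the function name are assumptions for illustration; many teams will instead rely on an API gateway, OAuth tokens or network-level controls.

```python
import hmac


def is_authorised(headers, expected_key):
    """Return True only if the request carries the expected API key.

    headers is a dict of request headers; expected_key should come from
    a secret store or environment variable, never from source code.
    """
    supplied = headers.get("X-API-Key", "")
    if not expected_key or not supplied:
        return False
    # compare_digest takes the same time wherever the strings differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected_key.encode("utf-8"))
```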
Embedding API Keys in mobile apps
A mobile application may need specific credentials to directly access your model endpoint. Embedding these credentials in your app allows them to be extracted by third parties and used for other purposes. Securing your model endpoint behind your app backend can prevent uncontrolled access.
As with most things in development, it only takes one person to neglect MLOps security to compromise the entire project. We advise organisations to create a clear and consistent set of governance rules that protect data confidentiality and reliability at every stage of an ML pipeline.
Everyone in the team needs to agree on the right way to do things – it only takes one leak or data attack for the overall performance of a model to be compromised.
Despite huge adoption of AI and machine learning (ML), many organisations are still struggling to get ML models into production at scale.
The result is AI projects that stall, don’t deliver ROI for years, and potentially fail altogether. Gartner Group estimates that only half of ML models ever make it out of trials into production.
Why is this happening? One of the biggest issues is that companies develop successful ML prototype models, but these models aren’t equipped to be deployed at scale into a complex enterprise IT infrastructure.
All of this slows down AI development. Software company Algorithmia recently reported that most companies spend between one and three months deploying a new ML model, while one in five companies took more than three months. Additionally, 38% of data scientists’ time is typically spent on deployment rather than developing new models.
Algorithmia found that these delays were often due to unforeseen operational issues. Organisations are deploying models only to find they lack vital functionality, don’t meet governance or security requirements, or need modification to provide appropriate tracking and reporting.
How MLOps can help
Enter MLOps. While MLOps leverages DevOps’ focus on compliance, security, and management of IT resources, MLOps adds much more emphasis on the consistent development, deployment, and scalability of models.
Organisations can accelerate AI adoption and solve some of their AI challenges by adopting MLOps. Algorithmia found that where organisations were using MLOps, data scientists were able to reduce the time spent on model deployment by 22%, and the average time taken to put a trained model into production fell by 31%.
That’s because MLOps provides a standard template for ML model development and deployment, along with a clear history and version control. This means processes don’t need to be reinvented for each new model, and standardised processes can be created to specify how all models should meet key functional requirements, along with privacy, security and governance policies.
With MLOps, data teams can be confident that new code and models will meet architecture and API requirements for production usage and testing. By removing the need to create essential features or code from scratch, new models are faster to build, test, train and deploy.
MLOps is being widely used for tasks such as automation of ML pipelines, monitoring, lifecycle management and governance. MLOps can be used to monitor models to sense any fall in performance or data drifts that suggest models might need to be updated or retrained.
Having a consistent view of ML models throughout the lifecycle in turn allows teams to easily see which models are live, which are in development, and which require maintenance or updates. These can be scheduled more easily with a clear overview of the ML landscape.
Within MLOps, organisations can also build feature stores, where code and data can be re-used from prior work, further speeding up the development and deployment of new models.
Learn more about MLOps
Our new playbook, Operationalising Machine Learning, provides guidance on how to create a consistent approach to monitoring and auditing ML models. Creating a single approach to these tasks allows organisations to create dashboards that provide a single view of all models in development and production, with automated alerts in case of issues such as data drift or unexpected performance issues.
If you’re struggling to realise the full potential of machine learning in your organisation, the good news is that you’re not alone. According to industry analysts VentureBeat, 87% of AI projects will never make it into production.
MLOps emerged to address this widespread challenge. By blending AI and DevOps practices, MLOps promised smooth, scalable development of ML applications.
The bad news is that MLOps isn’t an immediate fix for all AI projects. Operationalising any AI or machine learning solution will present its own challenges, which must be addressed to realise the potential these technologies offer. Below we’ve outlined five of the biggest MLOps challenges in 2022, and some guidance on solving these issues in your organisation.
You can read about these ideas in more detail in our new MLOps playbook, “Operationalising Machine Learning”, which provides comprehensive guidance for operations and AI teams in adopting best practice around MLOps.
Challenge 1: Lack of user engagement
Failing to help end users understand how a machine learning model works or what algorithm is providing an insight is a common pitfall. After all, this is a complex subject, requiring time and expertise to understand. If users don’t understand a model, they are less likely to trust it, and to engage with the insights it provides.
Organisations can avoid this problem by engaging with users early in the process, by asking what problem they need the model to solve. Demonstrate and explain model results to users regularly and allow users to provide feedback during iteration of the model. Later in the process, it may be helpful to allow end users to view monitoring/performance data so that you can build trust in new models. If end users trust ML models, they are likely to engage with them, and to feel a sense of ownership and involvement in that process.
Challenge 2: Relying on notebooks
Like many people we have a love/hate relationship with notebooks such as Jupyter. Notebooks can be invaluable when you are creating visualisations and pivoting between modelling approaches.
However, notebooks contain both code and outputs, along with important business and personal data, meaning it’s easy to inadvertently pass data to where it shouldn’t be. Notebooks don’t lend themselves easily to testing, and because cells can be run out of order, the same notebook can produce different results depending on the order in which its cells are run.
In most cases, we recommend moving to standard modular code after creating an initial prototype, rather than using notebooks. This results in a model that is more testable and easier to move into production, with the added benefit of speeding up algorithm development.
Challenge 3: Poor security practice
There are a number of common security pitfalls in MLOps that should be avoided, and it’s important that organisations have appropriate practices in place to ensure secure development protocols.
For example, it’s surprisingly common for model endpoints and data pipelines to be publicly accessible, potentially exposing sensitive metadata to third parties. Endpoints must be secured to the same standard as any development to avoid cost management and security problems caused by uncontrolled access.
Challenge 4: Using Machine Learning inappropriately
Despite the hype, ML shouldn’t always be the default way to solve a problem. AI and ML are essentially tools that help to understand complex problems like natural language processing and machine vision.
Applying AI to real-world problems that aren’t like this is unnecessary, and leads to too much complexity, unpredictability and increased costs. You could build an AI model to predict whether a number is even or odd – but you shouldn’t.
When addressing a new problem, we advise businesses to try a non-ML solution first. In many cases, a simple, rule-based system will be sufficient.
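The even-or-odd example shows how small a rule-based solution can be. The one-line function below is exact for every integer, needs no training data, and never drifts.

```python
def is_even(n: int) -> bool:
    """Rule-based check: an integer is even if dividing by 2 leaves no remainder."""
    return n % 2 == 0
```

If a rule this simple meets the business need, it will be cheaper to build, easier to explain and more predictable than any trained model.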
Challenge 5: Forgetting the downstream application of a new model
Achieving ROI from machine learning requires the ML model to be integrated into business systems, with due attention to usability, security and performance.
This process becomes even longer if models are not technically compatible with business systems, or do not deliver the expected level of accuracy. These issues must be considered at the start of the ML process, to avoid delays and disappointment.
A common ML model might be used to predict ‘propensity to buy’ – identifying internet users who are likely to buy a product. If this downstream application isn’t considered when the model is built, there is no guarantee that the data output will be in a form that can be used by the business API. A great way to avoid this is by creating a walking skeleton or steel thread (see our Playbook for advice on how to do this).
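In code, a walking skeleton for the propensity example might start by agreeing the shape of the model output with the consuming system, then wiring up a placeholder model that already honours it. The field names, version label and constant score below are hypothetical; the point is that the downstream integration can be built and tested before a real model exists.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PropensityScore:
    """Output contract agreed with the consuming business API (assumed fields)."""
    user_id: str
    propensity_to_buy: float  # probability between 0 and 1
    model_version: str

    def __post_init__(self):
        if not 0.0 <= self.propensity_to_buy <= 1.0:
            raise ValueError("propensity_to_buy must be between 0 and 1")


def stub_model(user_id):
    """Skeleton placeholder: returns a constant score until a real model exists."""
    return PropensityScore(user_id=user_id, propensity_to_buy=0.5,
                           model_version="skeleton-0")


def to_api_payload(score):
    """Serialise a score in the JSON form the downstream system expects."""
    return json.dumps(asdict(score), sort_keys=True)
```

When the real model arrives, only `stub_model` is replaced; the contract and serialisation stay the same, so the downstream systems keep working.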
Find out more about these challenges and more in our new Operationalising Machine Learning Playbook, which is available to read here.
Building a predictive model to forecast the future from historical data is standard practice for today’s businesses. But deploying, scaling and managing these models is far from simple.
Each ML solution depends on an algorithm (code) and a set of data used to develop and train the algorithm. For this reason, building ML solutions is different to other types of software development.
Enter MLOps, or machine learning operations, a set of processes that help organisations to develop, deploy and monitor ML models at scale by applying best practices to infrastructure, code and data.
MLOps is a relatively new idea but one that has been adopted by many organisations – the market for MLOps solutions is expected to reach $4 billion by 2025. At Equal Experts, we have been involved in developing and deploying AI and ML for a number of applications including to:
- Assess cyber risk
- Evaluate financial risk
- Improve search recommendations for retail websites
- Improve logistics and supply chains
Key Terms used in MLOps
If you’re new to MLOps there are several important terms to be aware of:
- Machine learning (ML) – a subset of AI that involves training algorithms with data rather than developing hand-crafted algorithms. A machine learning solution uses a data set to train an algorithm, typically training a classifier, which says what type of thing this data is (e.g. this picture is of a dog); a regressor, which estimates a value (e.g. the price of this house is £400,000); or an unsupervised model, such as a generative model that can be used to write novel text (such as song lyrics).
- Model – In machine learning a model is the result of training an algorithm with data, which maps a defined set of inputs to outputs.
- Algorithm – we use this term more or less interchangeably with model. (There are some subtle differences, but they’re not important and using the term ‘algorithm’ prevents confusion with the standard software engineering use of the term ‘data model’ – which is a definition of the data entities, fields, relationships etc for a given domain, that is used to define database structures among other things.)
- Ground-truth data – a machine-learning solution usually needs a data set that contains the input data (e.g. pictures) along with the associated answers (e.g. this picture is of a dog, this one is of a cat) – this is the ‘ground-truth’.
- Labelled data – means the same as ground-truth data.
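To make these terms concrete, here is a minimal sketch in plain Python. The ground-truth data (made-up weight and height measurements with their labels) is used to train a very simple nearest-centroid classifier, and the resulting centroids are the model. The features, values and function names are illustrative assumptions only.

```python
from collections import defaultdict
from math import dist

# Ground-truth (labelled) data: inputs paired with the known answers.
# Features are (weight in kg, height in cm); all values are made up.
ground_truth = [
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
]


def train(examples):
    """Train a nearest-centroid classifier; the returned centroids are the model."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Classify an input as the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(ground_truth)
print(predict(model, (28.0, 58.0)))  # dog
print(predict(model, (4.5, 24.0)))   # cat
```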
How does MLOps work?
We talk about MLOps as a set of processes that help data scientists to develop consistent, scalable ML models, and monitor their performance. To create and use these algorithms, you will usually follow these steps:
Initial development of the algorithm – Developing a model is the first step in machine learning. Data scientists will identify or create ‘ground truth’ data sets and explore them. They will build and evaluate prototypes of the models, trying out different core algorithms and data transformations until they arrive at one which meets the business need.
Integrate/deploy the model – once the model has been built, it must be integrated into the business. This can be done in various ways depending on the consuming service. In modern architecture, models are commonly implemented as a standalone microservice and models are deployed by copying an approved version of the model into an operational environment.
Monitor performance – All ML models need to be monitored to ensure they’re running and meeting demand, but also that the results of the model are accurate and reliable.
Update model – over time, models must be retrained to reflect new data, or improvements to the model. In this case, it’s important to maintain version control and to direct downstream services to the new model.
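The deploy, monitor and update steps above can be sketched as a tiny in-memory model registry. This is only an illustration under stated assumptions: the `ModelRegistry` class, the error threshold and the toy models are hypothetical, and a real setup would use a proper model store and monitoring stack.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Model = Callable[[float], float]


@dataclass
class ModelRegistry:
    """Keeps every approved model version and tracks which one is live."""
    versions: Dict[str, Model] = field(default_factory=dict)
    live_version: Optional[str] = None
    errors: List[float] = field(default_factory=list)

    def deploy(self, version: str, model: Model) -> None:
        """Register an approved model and direct callers to it."""
        self.versions[version] = model
        self.live_version = version
        self.errors.clear()  # monitoring restarts for the new version

    def predict(self, x: float) -> float:
        """Serve a prediction from the live model."""
        if self.live_version is None:
            raise RuntimeError("no model deployed")
        return self.versions[self.live_version](x)

    def record_outcome(self, x: float, actual: float) -> None:
        """Monitor accuracy by comparing a prediction with later ground truth."""
        self.errors.append(abs(self.predict(x) - actual))

    def mean_error(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0


registry = ModelRegistry()
registry.deploy("v1", lambda x: 2.0 * x)
registry.record_outcome(10.0, 25.0)   # prediction 20.0, absolute error 5.0
if registry.mean_error() > 1.0:       # hypothetical retraining threshold
    registry.deploy("v2", lambda x: 2.5 * x)
print(registry.live_version, registry.predict(10.0))
```

Keeping the old version in the registry means a rollback is just a matter of pointing `live_version` back at it.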
Operationalising Machine Learning
Our MLOps playbook brings together our experiences working with algorithm developers to build ML solutions. It provides a comprehensive overview of what you need to consider when providing the architecture, tools and infrastructure to support data scientists and to integrate their outputs into the business.
Download the playbook for expert guidance on how your organisation can attain the promised business value from algorithms by providing engineering to support algorithm development, and by integrating ML more effectively into your business processes. You’ll find helpful advice on how to:
- Collect data that drives machine learning, and make that available to data scientists
- Integrate algorithms into your everyday business
- Apply configuration control to algorithms, then deploy and monitor them
- Test and monitor the algorithms
View our online version or download a pdf here.
The inconvenient truth is that most big data projects fail to deliver the expected return on investment. In fact, Gartner predicts that only 15% of data projects utilising AI in 2021-2022 will be successful.
Companies are spending more than ever on data and analytics projects, often using cutting-edge AI and machine learning tech – but many of them don’t generate the ROI that the business expected. In fact, a recent ESI ThoughtLab study of 1,200 organisations found that companies are generating an average ROI of just 1.3% from AI data projects, while 40% don’t generate a profit at all.
There are lots of reasons why this happens. Sometimes, the expectations of data projects are too high. But more often, companies embark on data projects without a clear strategy and without appropriate skills and resource to replicate the benefits of a pilot project at scale. AI projects require time, expertise and scale to deliver a decent ROI.
This might come as a surprise to some early project teams. Building a proof-of-concept AI data project can be relatively easy – if you have a team of skilled data scientists, a small project could be ready to test in a few months. The challenge comes when organisations try to scale up those prototypes to work in an enterprise setting.
If your data scientists don’t have the appropriate software development skills, then you could end up with a machine learning model that works in principle but isn’t fully integrated into workflows and enterprise operations – meaning it’s not collecting, sharing or analysing the intended data.
Enterprises need to ensure that they have the skills needed to make machine learning models work within their business. This might mean creating an app or integrating machine learning models with existing sales platforms.
When a global online home retailer developed a machine learning model to improve the efficiency of logistics, they soon realised that this was only the first step. Data scientists had created a model that was able to predict which warehouse and logistics carrier would be the most efficient for individual products, based on the product size and likelihood of sale in a particular region.
Our development team was able to help take the project to the next step, by creating ways to integrate this model into existing systems and automate the data collection process. The result is a system that can advise the business which proportion of a product to store in a particular warehouse, and which carrier to use to cut 5% from shipping costs, for example.
To increase your chances of creating positive ROI from data-enabled AI projects, organisations need to ensure they have the right skills in project teams – in addition to data scientists, you will need engineers, process owners and strong DevOps.
Second, ensure that you are measuring ROI over an appropriate timescale. The upfront costs involved in scaling data projects can result in flat ROI in the short-term. Data preparation, technology costs and people development are substantial expenses, and it takes an average of 17 months to show ROI, with firms surveyed by ESI showing a return of 4.3% at this stage.
Third, are you measuring the right things to accurately measure ROI? Capturing the cost savings from automated processes and data availability only tells half the story. By incorporating machine learning into the transformation of enterprise supply chains, logistics and product development, companies can drive increased revenue, market share, reduced time-to-market and higher shareholder value.
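The point about timescale can be made concrete with a little arithmetic. The sketch below assumes the simplest definition of ROI, (benefit minus cost) divided by cost, and uses entirely made-up figures; it is not how the ESI study calculated its numbers.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction of cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def months_to_break_even(upfront_cost: float, monthly_cost: float, monthly_benefit: float):
    """First month in which cumulative benefit covers cumulative cost, or None."""
    if monthly_benefit <= monthly_cost:
        return None  # the project never pays back
    month = 0
    cumulative = -upfront_cost
    while cumulative < 0:
        month += 1
        cumulative += monthly_benefit - monthly_cost
    return month


# Hypothetical project: 160,000 upfront, 10,000 per month to run, 20,000 per month in benefit.
upfront, running, benefit = 160_000, 10_000, 20_000
break_even = months_to_break_even(upfront, running, benefit)
roi_at_12 = roi(12 * benefit, upfront + 12 * running)
roi_at_24 = roi(24 * benefit, upfront + 24 * running)
print(break_even, round(roi_at_12, 3), round(roi_at_24, 3))
```

With these assumed figures the same project shows a negative ROI at 12 months and a positive one at 24 months, which is why the measurement window matters.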
To find out more about how you can realise higher ROI from data investment, download our free Playbook here.
<tl;dr>
Haystack is a question-answering framework – a tool to answer natural language questions from a text corpus which uses AI deep learning techniques to do the natural language processing. If you give it a bunch of Wikipedia articles on Game of Thrones and ask “Who is the father of Arya Stark,” it will tell you “Lord Eddard Stark,” and “Ned”, and give you the text which supports it. I made a quick subjective evaluation to see what it can do. It worked pretty much out of the box and has a useful set of tutorials and code samples. Results are pretty good but not perfect, and you can improve them by changing the complexity of the model, or refining the model using your own data.
What is Haystack?
Haystack is a question-answering framework – a tool to answer natural language questions from a text corpus. It can handle the typical ways of storing documents – PDF, doc, txt etc., and uses deep learning technologies (specifically transformer networks), to improve on traditional pattern-matching or NER techniques.
Does it work?
We have several clients who need to search large document corpuses so I decided to have a look at Haystack to see what it can do. I started at the Get Started page in the documentation and tried it out. Quick tip: I needed to increase the memory available to my Docker environment before it would run correctly, but apart from that it worked using the instructions provided.
The demo comes with about 2,500 documents about the Game of Thrones series hosted on an ElasticSearch instance. A sample document is:
“A Man Without Honor” is the seventh episode of the second season of HBO’s medieval fantasy television series “Game of Thrones.” The episode is written by series co-creators David Benioff and D. B. Weiss, and directed, for the second time in this season, by David Nutter. It premiered on May 13, 2012. The name of the episode comes from Catelyn Stark’s assessment of Ser Jaime Lannister: “You are a man without honor,” after he kills a member of his own family to attempt escape.
I tried the following questions on the corpus.
Question 1: Who is the father of Arya Stark?
This is the suggested test query. The first result is great. The second and third are wrong (although their scores are much lower).
Question 2: Who is littlefinger?
The first two results are great (curiously the UI does not show them in relevance order). The last is wrong.
Question 3: Who is little finger?
This is the same question as Q2 but with a space in the key term. None of these results are correct.
How does it work?
It’s straightforward to create a question answering method from documents in a corpus. This code snippet asks a question about documents stored in an ElasticSearch document store.
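# Imports assume the pre-1.0 Haystack package layout, which matches the calls below
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.reader.farm import FARMReader
from haystack.pipeline import ExtractiveQAPipeline
from haystack.utils import print_answers

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
pipe = ExtractiveQAPipeline(reader, retriever)
## Voilà! Ask a question!
prediction = pipe.run(query="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
print_answers(prediction, details="minimal")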
The most important stages in the pipeline are:
- Retriever – does an initial filter of the documents to find ones which might have the answer. In most cases this uses a simple TF-IDF or similar approach.
- Reader – looks at the documents returned by the retriever and extracts the best answers. The readers use deep learning transformer networks (see here for a quick overview of transformers.) In the example code above it is using a RoBERTa model. You can use models from Huggingface or similar. Different models allow you to trade off between accuracy, speed and available processing power.
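To give a feel for what a simple TF-IDF retriever stage does, here is a self-contained sketch using only the Python standard library. It is an illustration of the general technique, not Haystack's or Elasticsearch's implementation; the function names, the scoring details and the three tiny documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents ranked by summed TF-IDF of the query terms."""
    term_counts = [Counter(tokenize(doc)) for doc in documents]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(documents)
    query_terms = set(tokenize(query))
    scores = []
    for i, counts in enumerate(term_counts):
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total                      # term frequency
                idf = math.log(n_docs / doc_freq[term]) + 1.0  # rarer terms weigh more
                score += tf * idf
        scores.append((score, i))
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [documents[i] for score, i in ranked[:top_k] if score > 0]


documents = [
    "Arya Stark is the daughter of Lord Eddard Stark.",
    "Littlefinger is the nickname of Petyr Baelish.",
    "The episode premiered on May 13, 2012.",
]
print(retrieve("Who is the father of Arya Stark?", documents, top_k=1))
```

The retriever only narrows the candidates by word overlap; it is the reader stage that then extracts the actual answer span from the documents it is handed.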
Can you improve the results?
The results fundamentally depend on the model utilised in the reader stage and can be improved by changing the model. I tried changing it to the bert-large-uncased-whole-word-masking-squad2 model which is bigger (1.34GB compared to 1.4MB), but ran fine on my MacBook Pro.
The results were quite a bit better.
Question 1: Who is the father of Arya Stark?
It gets the right answer most of the time and finds a variety of names for him.
Question 2: Who is littlefinger?
This gives spot-on answers – 100% correct.
Question 3: Who is little finger?
The mistyping with the space between the words still leads to some incorrect results. The first result is correct but the rest are wrong. All in all it’s definitely an improvement.
You can also fine-tune a model. You can collect examples of questions and answers and use them to update the model. Haystack provides an annotation tool to help with this (the manual is here). Once you have your data it seems straightforward to refine the model:
# Import path assumes the pre-1.0 Haystack package layout
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True)
train_data = "data/squad20"
reader.train(data_dir=train_data, train_filename="dev-v2.0.json", use_gpu=True, n_epochs=1, save_dir="my_model")
I have not had time to test how well this works.
What else does Haystack do?
Apart from improved search you can also use Haystack to:
- Compose a novel answer to a question (using Generators)
- Summarize all the answers into a single response
- Translate between languages
- Answer questions on data stored in knowledge graphs (instead of as documents). Sadly it does not yet generate a knowledge graph from text 🙁
The repo is here. I found this article to be a useful introduction to Haystack.
If you’re a senior IT leader, I’d like to make a prediction. You have faced a key data governance challenge at some time. Probably quite recently. In fact, there is a good chance that you’re facing one right now. I know this to be true, because clients approach us frequently with this exact issue.
However, it’s not a single issue. In fact, over time we have come to realise that data is a slippery term that means different things for different people. Which is why we felt that deeper investigation into the subject was needed, to gain clarity and understanding around this overloaded term and to establish how we can talk to clients who see data governance as a challenge.
So, what is data governance? And what motivates an organisation to be interested in it?
Through a series of surveys, discussions and our own experiences, we have come to the conclusion that client interest in data governance is motivated by the following wide range of reasons.
1. Data Security/Privacy
I want to be confident that I know the right measures are in place to secure my data assets and that we have the right protections in place.
2. Compliance – To meet industry requirements
I have specific regulations to meet (e.g. health, insurance, finance) such as:
- Storage – I need to store specific data items for specified periods of time (or I can only store for specific periods of time).
- Audit – I need to provide access to specified data for audit purposes.
- Data lineage/traceability – I have to be able to show where my data came from or why a decision was reached.
- Non-repudiation – I have to be able to demonstrate that the data has not been tampered with.
3. Data quality
My data is often of poor quality: it is missing data points, the values are often wrong or out of date, and now no-one trusts it. This is often seen in the context of central data teams charged with providing data to business functions such as operations, marketing etc. Sometimes data stewardship is mentioned as a means of addressing this.
4. Master/Reference Data Management
When I look at data about the same entities in different systems I get different answers.
5. Preparing my data for AI and automation
I am using machine learning and/or AI and I need to know why decisions are being made (as regulations around the use of AI and ML mature this is becoming more pressing – see for example https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/explaining-decisions-made-with-ai/).
6. Data Access/Discovery
I want to make it easier for people to find data or re-use data – it’s difficult for our people to find and/or access data which would improve our business. I want to overcome my data silos. I want data consumers to be able to query data catalogues to find what they need.
7. Data Management
I want to know what data we have e.g. by compiling data dictionaries. I want more consistency about how we name data items. I want to employ schema management and versioning.
8. Data Strategy
I want to know what strategy I should take so my organisation can make better decisions using data. And how do I quantify the benefits?
9. Creating a data-driven organisation
I want to create an operating model so that my business can manage and gain value from its data.
I think it’s clear from this that there are many concerns covered by the term data governance. You probably recognise one, or maybe even several, as your own. So what do you need to do to overcome these? Well, now we understand the variety of concerns, we can start to address the approach to a solution.
Understanding Lean Data Governance
Whilst it can be tempting for clients to look for an off-the-shelf solution to meet their needs, in reality those needs are too varied to be met by a single product, especially as many of the concerns are integral to the data architecture. Take data lineage and quality as examples that need to be considered as you implement your data pipelines – you can’t easily bolt them on as an afterthought.
Here at Equal Experts, we advocate taking a lean approach to data governance – identify what you are trying to achieve and implement the measures needed to achieve it.
The truth is, a large proportion of the concerns raised above can be met by following good practices when constructing and operating data architectures – the sorts of practices that are outlined in our Data Pipeline and Secure Delivery playbooks.
We have found that good data governance emerges by applying these practices as part of delivery. For example:
- Most Data security concerns can be met by proven approaches – taking care during environment provisioning, implementing role-based access control, implementing access monitoring and alerts and following the principles that security is continuous and collaborative.
- Many Data Quality issues can be addressed by implementing the right measures in your data pipelines: incorporating observability throughout the pipelines, so that you can detect when changes happen in data flows, and/or pragmatically applying master and reference data so that there is consistency in data outputs.
- Challenges with data access and overcoming data silos are improved by constructing data pipelines with an architecture that supports wider access. For example our reference architecture includes data warehouses for storing curated data as well as landing zones which can be opened up to enable self-service for power data users. Many data warehouses include data cataloguing or data discovery tools to improve sharing.
- Compliance challenges are often primarily about data access and security (which we have just addressed above) or data retention which depends on your pipelines.
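As one small example of building data quality measures into a pipeline, the sketch below checks each batch for missing values and stale records before it is passed on. The field names, the `updated_on` convention and the one-year staleness threshold are all hypothetical; a real pipeline would tailor the checks to its own data.

```python
from datetime import date


def check_batch(rows, required_fields, max_age_days, today):
    """Return a list of human-readable data quality issues for one batch."""
    issues = []
    for i, row in enumerate(rows):
        for name in required_fields:
            if row.get(name) in (None, ""):
                issues.append(f"row {i}: missing {name}")
        updated = row.get("updated_on")
        if updated is not None and (today - updated).days > max_age_days:
            issues.append(f"row {i}: stale (updated {updated.isoformat()})")
    return issues


batch = [
    {"customer_id": "c1", "email": "a@example.com", "updated_on": date(2021, 1, 10)},
    {"customer_id": "c2", "email": "", "updated_on": date(2019, 6, 1)},
]
issues = check_batch(
    batch,
    required_fields=["customer_id", "email"],
    max_age_days=365,
    today=date(2021, 2, 1),
)
for issue in issues:
    print(issue)
```

Emitting these issues as metrics or alerts at each pipeline stage is one way to get the observability described above.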
Of course, it is important that implementing these practices is given sufficient priority during the delivery. And it is critical that product owners and delivery leads ensure that they remain in focus. The tasks that lead to good Data Governance can get lost when faced with excessive demands for additional user features. In our experience this is a mistake, as deprioritising governance activities will lead to drops in data quality, resulting in a loss of trust in the data and in the end will significantly affect the user experience.
Is Data Governance the same as Information Governance?
Sometimes we also hear the term Information Governance. Information Governance usually refers to the legal framework around data. It defines what data needs to be protected and any processes (e.g. data audits), compliance activities or organisational structures that need to be in place. GDPR is an Information Governance requirement – it specifies what everyone’s legal obligations are in respect of the data they hold, but it does not specify how to meet those obligations. Equal Experts does not create information governance policies, although we work with client information governance teams to design and implement the means to meet them.
The field of data governance is inherently complex, but I hope through this article you’ve been able to glean insights and understand some of the core tenets driving our approach.
These insights and much more are in our Data Pipeline and Secure Delivery playbooks. And, of course, we are keen to hear what you think Data Governance means. So please feel free to get in touch with your questions, comments or additions on the form below.
With the global pandemic of 2020 and the depression that followed, came the realisation that our economic system was hugely vulnerable in the face of disruptive events. Companies inevitably rushed to automation more than ever before, and the emerging AI & Robotics business sector played a pivotal role in this transition.
Why the past tense?
Because, regardless of whether it’s right or wrong, this is what will happen. This blog post does not look at whether Robotics is the future, but who is best placed to succeed. As a new economy emerges from the other side of this pandemic, businesses will be forced to question the previously unquestioned: are global supply chains optimal? Is ‘Just in Time’ manufacturing robust enough to survive future shocks? Do we need offices any more?
As we adapt to lockdown, industry is struggling to cope with the sudden absence of people from essential processes, and automation is at the forefront of their minds. Previously, many saw automation as a way to remove human fallibility from well defined, repetitive processes in order to improve quality and productivity. Now it will focus much more on removing people completely from the process in order to remove a point of failure.
Automation of production lines has long been a thing, and although people play an important part in these lines, they are treated more as a cost/benefit equation than as the actual human beings they are. If (and it is a big if) we see a new economy emerging over the coming months – one that incorporates elements such as a more than minimum living wage or a universal basic income – then that cost/benefit equation will swing even more towards automation, adding to the sense of vulnerability businesses are now feeling.
Given the extent of the damage caused to businesses by the lockdown, companies will now look for solutions throughout the supply chain, not just in the factories. They will accelerate the development and introduction of self-driving vehicles; they will roll-out Amazon Go – style self-service retail stores; they will copy Ocado and Amazon and replace people with robots in their warehouses.
This is a dangerous strategy, as it represents a swing of the pendulum to another extreme, and as discussed in Part 3, specialisation leads to fragility in the face of disruption. Imagine, if you will, what happens to this automated world in the face of a virus of the electronic variety.
Back to the Map
Rightly, or wrongly, it will happen, and so the more important question relates to which industries are best placed to capitalise on this trend. And so we return to the Map, and another “sea” waiting to become a landmass (business sector) in its own right. We’ve labelled it AI & Robotics.
As you can see, the Map shows that the neighbouring territories are Computer Consultants, Software Development, Electronics, Electrical & Mechanical Manufacturing, and Motor Vehicle Manufacturing. Companies in each of these sectors are well placed to enter the field of robotics, and some are already doing so. In each case, the entry point is different and so it’s worth looking at a couple of them in more detail by way of explanation.
Step forward Dr. Susan Calvin
In Isaac Asimov’s body of work, there are a significant number of stories that centre around intelligent machines. Many of these stories feature Dr. Susan Calvin, who Asimov refers to as a robopsychologist working at US Robots and Mechanical Men, Inc. He postulates that this profession would be a combination of advanced mathematics and traditional psychology, but in reality the need is more likely to revolve around the training, utilisation and integration of Robots into the business world.
This is a reasonable role for companies operating in the Computer Consultants business sector to take on and thus start the migration into the AI & Robotics business sector. Following the disruption caused by the COVID-19 outbreak, and the resulting demand for greater automation, there is a clear opportunity for these businesses to promote their skills in this area and start the journey.
The first areas of greatest demand are likely to be in the production line, and in the warehouse element of the supply chain, where “dumb” robots already play a major role. There will now be a push to further automate the more complex activities currently undertaken by people, and this will lead to a demand for consultants with experience in introducing technology to organisations. Computer Consultants are ideally placed to benefit from this demand, especially if they include software development capability in their offering or partner with companies that do.
Management consultants are less well equipped to help, as the level of technical expertise required to understand the art of the possible and to design solutions is far outside their skill set. They will, of course, have a go, but the Map confirms that they are not well placed to enter this sector.
Here in my car
The other area of the supply chain that will, no doubt, see renewed demand for automation is that of transportation. This will accelerate development of self-driving technologies, coupled with increased pressure from business to make changes to the road transport system to make introduction of such technologies less challenging. We can expect to see proposals for “freight only” lanes, and dedicated telemetry systems to lower some of the barriers to entry.
The companies best placed to occupy this part of the robotics landscape are the Motor Vehicle Manufacturers. Much work has already been done into self-driving vehicles, but most of the focus has been on cars. It is likely that attention will now move onto the larger freight vehicles. Despite their size, these vehicles actually present an easier route into this sector as they generally follow more predictable routes, and travel between a smaller set of end points.
Transportation companies such as Uber have also tried to make inroads into this area, but the Map predicts a less successful outcome for them, as they are a significant distance from the new area of AI & Robotics. Remember, on the Map proximity indicates similarity of skills and mindsets – companies located in other areas take much longer to develop the required attributes than organisations on the immediate borders. Uber are making the classic mistake of assuming that being a consumer or seller of a product somehow positions you to become a producer in your own right.
Stuck in the middle
So, that covers the types of business that will benefit from the inevitable demand for automation, but what about the demand itself? At the start of this post, (and in previous posts) we’ve discussed the dangers of specialisation and the increased resilience that comes with diversification. It is for this reason that a headlong rush to “automate all the things” could create as many problems as it might solve. It would also lead to an unmanageable portfolio of change that could cripple an organisation during what will inevitably be an extended recession.
One of the more difficult decisions for most companies is where technologies such as AI and machine learning can and should be effectively deployed. There is much talk of AI as the answer to everything, but there are places where it is most appropriate and places where it is less useful. There is also the confusing matter of machine learning algorithms versus “true” AI in the form of neural networks. The same question arises – which to use and where.
The problem is complex, but as a starting point here is a simple 2×2 grid (because we all love a 2×2 grid):
The horizontal axis represents the sophistication of the problem being solved, ranging from simple to highly complex (multiple variables and multiple outcomes), and the vertical axis represents the nature of the decision to be made, ranging from fully objective (where there is little or no doubt) to highly subjective (where the outcome is open to interpretation and opinion).
For highly complex problems involving a significant amount of subjective judgement, people are by far the best suited to this type of activity. At the other extreme, simple problems with highly objective outcomes can easily be automated using traditional and well understood hard coded solutions.
As we remove subjectivity from a problem best suited to people, or add complexity to a problem currently solved using traditional code, machine learning algorithms come into their own. These are complex, knowledge based solutions that take broad sets of inputs to make a decision in a predictable and traceable way. The automation of the NHS 111 service is a good example of a problem well suited to machine learning.
Heading in the other direction, if we can take some complexity out of the decisions currently made by people, or there are simple problems that were previously not automatable using traditional coding techniques due to the desired level of subjectivity, we now have AI as a solution. Familiar examples involve identifying the subject matter of documents, interpreting medical scans or identifying people or behaviours in CCTV footage.
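The four quadrants described above can be written down as a small lookup, which may help if you want to run a list of candidate problems through the grid. This is only a sketch of the grid as described in this post: the function and enum names are made up, and real problems will rarely be a clean yes or no on either axis.

```python
from enum import Enum


class Approach(Enum):
    TRADITIONAL_CODE = "traditional hard-coded automation"
    MACHINE_LEARNING = "machine learning algorithms"
    AI = "AI"
    PEOPLE = "people"


def suggest_approach(is_complex: bool, is_subjective: bool) -> Approach:
    """Map a problem onto the 2x2 grid: complexity by subjectivity."""
    if is_complex and is_subjective:
        return Approach.PEOPLE            # complex and subjective
    if is_complex:
        return Approach.MACHINE_LEARNING  # complex but objective
    if is_subjective:
        return Approach.AI                # simple but subjective
    return Approach.TRADITIONAL_CODE      # simple and objective


for is_complex in (False, True):
    for is_subjective in (False, True):
        print(is_complex, is_subjective, suggest_approach(is_complex, is_subjective).value)
```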
The same grid can be applied to physical robotics. In the bottom left square we have the type of machines we’re all familiar with on car assembly lines. In the bottom right (Complex/Objective) space we have the potential for automating surgical procedures. In the top left, AI opens the door for semi-autonomous machines such as exploration vehicles. Self-driving cars sit on the boundary between the top left and right squares, and this is why the problem has proved so difficult to crack. Deliberate simplification of the problem by altering the highway environment (or reducing the scope as described for freight vehicles) could accelerate the introduction of such vehicles faster than advancements in the current level of AI might achieve.
And let’s not forget that automation does not have to mean fewer people; far from it. History has shown that as machines take over in one area of human endeavour, this opens up areas previously ignored. If social distancing has taught us anything, it is that personal contact is essential to our wellbeing and to the success of our businesses. Instead of replacing people with robots, think instead of using technology to do the mechanical things, and free up people to be more human.
And so that brings to a close our quick visit to the new landmass that is the AI and Robotics business sector. In part 5, we’ll look more broadly at the Map and how things might unfold as we move out of lockdown and into a time of financial uncertainty. We’ll look at the challenges, but more importantly we’ll seek out potential green shoots and identify where they could emerge.
Part 1 – Dealing with disruption
Part 2 – A fascinating journey, explained
Part 3 – The Rise of the Avatar
Part 4 – Domo Arigato Mr Roboto – (you’re here)