Earlier this year, some of our experts from the data service gave a talk for Data Idols on our experiences working with data scientists to move machine learning models into operations.
The audience was largely data scientists from lots of organisations (mostly not Equal Experts), and so we took the opportunity to find out what their concerns were with a simple survey question: What do you see as the main challenges with machine learning? There were around 100 attendees and people could vote for as many of the challenges as they wanted.
You can see the results in the chart here:
Machine learning experts do not generally see developing the model as the difficult part. In fact, I suspect it is the part of the job they love best. Most data specialists are also familiar with the right techniques for different modelling problems, and I think they enjoy the data exploration and feature engineering tasks.
Their biggest challenges are integrating a validated model into business operations and managing it once it’s there. We know this is a non-trivial job, and data scientists usually need support from other technical functions. Our MLOps playbook was developed to help practitioners understand and address the different aspects of operationalising machine learning models.
The other major challenge is accessing data. Since machine learning is fundamentally dependent on data, this is no surprise. Our data pipeline playbook was created to help data professionals create reliable access to data sources, and putting at least some of its practices in place is essential for production models.
Data scientists play a vital role in operationalising machine learning. Within MLOps, data scientists are responsible for developing, evaluating and amending models as they move into production.
But what happens when it comes to software or platform knowledge, and integrating models into business systems?
A successful MLOps team absolutely needs data scientists – but that isn’t all. If you want to deploy ML models successfully, you’ll need a cross-functional team of experts who provide a number of important skills. An ideal MLOps team structure should include:
- Data scientists to create, build and amend the model
- Platform or machine learning engineers to provide an environment to host the model
- Data engineers to create the production data pipelines to retrain the model
- Software engineers to integrate the model into business systems
It’s worth noting that in some smaller organisations, these roles might be part-time, or one person might fulfil more than one of them. In larger organisations, there could be multiple people providing each function.
No matter the size of your team, everyone should have an idea of the responsibilities and requirements of the other roles. As your model moves from prototype to production, each team member needs to understand the concerns of the others, and the format and type of information that needs to be provided.
Building a cross-functional team means that your MLOps development benefits from a broader skillset. The more your team members understand the skills of the wider team, the more effectively they can work together.
- Engineers should recognise that the most pressing concern for data scientists is prototyping, experimentation and algorithm performance evaluation.
- Data scientists often need to learn more about software development practices, and the separation of environments such as development, staging and production.
Ultimately, the goal of a cross-functional team is to create a clear framework that takes models through the entire development and production process. This framework should be built into your CI/CD process. Create a simple document and spend a session taking data scientists through the development process that you have chosen. When the team forms, recognise that it is one team and organise yourselves accordingly. Backlogs and stand-ups should be owned by and include the whole team.
If you’d like to know more about building effective MLOps team structures and operationalising machine learning, download our recent playbook, which is packed with insights into building successful MLOps projects and getting them into production.
Building a predictive model to forecast the future from historical data is standard practice for today’s businesses. But deploying, scaling and managing these models is far from simple.
Each ML solution depends on an algorithm (code) and a set of data used to develop and train the algorithm. For this reason, building ML solutions is different to other types of software development.
Enter MLOps, or machine learning operations, a set of processes that help organisations to develop, deploy and monitor ML models at scale by applying best practices to infrastructure, code and data.
MLOps is a relatively new idea but one that has been adopted by many organisations – the market for MLOps solutions is expected to reach $4 billion by 2025. At Equal Experts, we have been involved in developing and deploying AI and ML for a number of applications including to:
- Assess cyber risk
- Evaluate financial risk
- Improve search recommendations for retail websites
- Improve logistics and supply chains
Key Terms used in MLOps
If you’re new to MLOps there are several important terms to be aware of:
- Machine learning (ML) – a subset of AI that involves training algorithms with data rather than developing hand-crafted algorithms. A machine learning solution uses a data set to train an algorithm. Typically this means training a classifier, which says what type of thing the data is (e.g. this picture is of a dog); a regressor, which estimates a value (e.g. the price of this house is £400,000); or an unsupervised model, such as a generative model that can be used to write novel text (such as song lyrics).
- Model – In machine learning a model is the result of training an algorithm with data, which maps a defined set of inputs to outputs.
- Algorithm – we use this term more or less interchangeably with model. (There are some subtle differences, but they’re not important and using the term ‘algorithm’ prevents confusion with the standard software engineering use of the term ‘data model’ – which is a definition of the data entities, fields, relationships etc for a given domain, that is used to define database structures among other things.)
- Ground-truth data – a machine-learning solution usually needs a data set that contains the input data (e.g. pictures) along with the associated answers (e.g. this picture is of a dog, this one is of a cat) – this is the ‘ground-truth’.
- Labelled data – means the same as ground-truth data.
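To make these terms concrete, here is a minimal sketch of a classifier in plain Python. Everything in it is an assumption made for illustration: the tiny `GROUND_TRUTH` data set, the choice of a nearest-centroid approach, and the `train` and `predict` function names. It is not how any particular production solution works; it only shows how ground-truth data, a training step and the resulting model relate to each other.

```python
from statistics import mean

# Hypothetical ground-truth (labelled) data: (weight_kg, height_cm) -> label.
GROUND_TRUTH = [
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
]


def train(labelled_data):
    """Training step: compute one centroid (mean feature vector) per label.

    The returned dictionary is the trained model.
    """
    rows_by_label = {}
    for features, label in labelled_data:
        rows_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def predict(model, features):
    """Classify an input by its nearest centroid (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: distance(model[label]))


model = train(GROUND_TRUTH)
prediction = predict(model, (28.0, 58.0))
print(prediction)
```

A regressor would follow the same pattern, but `predict` would return a number (such as a house price) instead of a label.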
How does MLOps work?
We talk about MLOps as a set of processes that help data scientists to develop consistent, scalable ML models, and monitor their performance. To create and use these algorithms, you will usually follow these steps:
Initial development of the algorithm – Developing a model is the first step in machine learning. Data scientists will identify or create ‘ground truth’ data sets and explore them. They will build and evaluate prototypes of the models, trying out different core algorithms and data transformations until they arrive at one which meets the business need.
Integrate/deploy the model – once the model has been built, it must be integrated into the business. This can be done in various ways depending on the consuming service. In modern architectures, a model is commonly implemented as a standalone microservice, and it is deployed by copying an approved version of the model into an operational environment.
Monitor performance – All ML models need to be monitored to ensure they’re running and meeting demand, but also that the results of the model are accurate and reliable.
Update model – over time, models must be retrained to reflect new data, or improvements to the model. In this case, it’s important to maintain version control and to direct downstream services to the new model.
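The deploy, monitor and update steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the in-memory `ModelRegistry` class, the `accuracy` helper, the `ALERT_THRESHOLD` value and the toy threshold models are all hypothetical. A real setup would persist model versions, serve them from a separate service and collect ground truth over time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[float], str]  # hypothetical: maps one input value to a label


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry of versioned models."""
    versions: Dict[int, Model] = field(default_factory=dict)
    live_version: Optional[int] = None

    def register(self, model: Model) -> int:
        """Store a new model version and return its version number."""
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    def deploy(self, version: int) -> None:
        """Direct downstream callers to an approved, registered version."""
        if version not in self.versions:
            raise KeyError(f"unknown model version {version}")
        self.live_version = version

    def predict(self, value: float) -> str:
        """Downstream services call this and always reach the live version."""
        if self.live_version is None:
            raise RuntimeError("no model deployed")
        return self.versions[self.live_version](value)


def accuracy(registry: ModelRegistry, labelled: List[Tuple[float, str]]) -> float:
    """Monitoring: share of labelled examples the live model gets right."""
    correct = sum(registry.predict(x) == label for x, label in labelled)
    return correct / len(labelled)


registry = ModelRegistry()
v1 = registry.register(lambda x: "high" if x > 10 else "low")
registry.deploy(v1)

# Hypothetical recent ground truth: the true boundary has drifted to 20.
recent = [(5.0, "low"), (15.0, "low"), (18.0, "low"), (25.0, "high")]
accuracy_v1 = accuracy(registry, recent)

ALERT_THRESHOLD = 0.9  # assumed acceptable accuracy
if accuracy_v1 < ALERT_THRESHOLD:
    # Stand-in for retraining on new data; version 1 stays registered.
    v2 = registry.register(lambda x: "high" if x > 20 else "low")
    registry.deploy(v2)

accuracy_v2 = accuracy(registry, recent)
print(accuracy_v1, accuracy_v2, registry.live_version)
```

Because callers only ever use `registry.predict`, switching versions does not require any change downstream, and keeping the old version registered leaves room for a rollback.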
Operationalising Machine Learning
Our MLOps playbook brings together our experiences working with algorithm developers to build ML solutions. It provides a comprehensive overview of what you need to consider when providing the architecture, tools and infrastructure to support data scientists and to integrate their outputs into the business.
Download the playbook for expert guidance on how your organisation can attain the promised business value from algorithms by providing engineering to support algorithm development, and by integrating ML more effectively into your business processes. You’ll find helpful advice on how to:
- Collect data that drives machine learning, and make that available to data scientists
- Integrate algorithms into your everyday business
- Put algorithms under configuration control, then deploy and monitor them
- Test and monitor the algorithms