The Importance of Secure Development in MLOps

In our recent Operationalising ML Playbook we discussed the most common pitfalls during MLOps. One of the most common pitfalls? Failing to implement appropriate secure development at each stage of MLOps. 

Our Secure Development playbook describes the practices we know are important for secure development and operations and these should be applied to your ML development and operations.

In this blog we will explore some of the security risks and issues that are specific to MLOps. Make sure you check them all before publishing your model into production. 

In machine learning, systems use example data to try to learn something – which may be output as a prediction or insight. The examples used to train ML models are known as training datasets, and security issues can be broadly divided into those affecting the model before and during training, and those affecting models that have already been trained. 

Vulnerability to data poisoning or manipulation  

One of the most commonly discussed security issues in MLOps is data poisoning – this is an attack where hackers attempt to corrupt or manipulate the data used for training ML models. This might be by switching expected responses, or adding new responses into a system. The result of data poisoning is that data confidentiality and reliability are both damaged.  

When data for ML models is collected from online sources from sensors or online sources, the risk of data poisoning can be extremely high. Attacks can include label flipping (data is poisoned by changing labels in data) and gradient descent attacks (where the ability of a model to understand how close it is to predicting the correct answer is damaged by either making the model falsely believe it’s found the answer, or by preventing it from finding the answer by constantly changing that answer). 

Exposure of data in the pipeline

You will certainly need to include data pipelines as part of your solution. In some cases they may use personal data in the training. Of course these should be protected to the same standards as you would in any other development. Ensuring the privacy and confidentiality of data in machine learning models is critical to protect against data extraction attacks and function extraction attacks. 

Making the model accessible to the whole internet

Making your model endpoint publicly accessible may expose unintended inferences or prediction metadata that you would rather keep private. Even if your predictions are safe for public exposure, making your endpoint anonymously accessible may present cost management issues. A machine learning model endpoint can be secured using the same mechanisms as any other online service.

Embedding API Keys in mobile apps 

A  mobile application may need specific credentials to  directly access your model endpoint. Embedding these credentials in your app allows them to be extracted by third parties and used for other purposes. Securing your model endpoint behind your app backend can prevent uncontrolled access.

As with most things in development, it only takes one person to neglect MLOps security to compromise the entire project. We advise organisations to create a clear and consistent set of governance rules that protect data confidentiality and reliability at every stage of an ML pipeline. 

Everyone in the team needs to agree on the right way to do things – it only takes one leak or data attack for the overall performance of a model to be compromised.