
from the outsource-work-insource-vulnerabilities dept.
Planting Undetectable Backdoors in Machine Learning Models:
These days the computational resources needed to train machine learning models can be substantial, and more organizations are outsourcing model training and development to machine-learning-as-a-service (MLaaS) platforms such as Amazon SageMaker and Microsoft Azure. With shades of Ken Thompson's "Reflections on Trusting Trust" lecture from almost 40 years ago, you can test whether your new model works as expected by throwing test data at it, but how do you know you can trust it, and that it won't act maliciously through some built-in backdoor? Researchers demonstrate that it is possible to plant undetectable backdoors in machine learning models. From the paper abstract:
[...] On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer.
They show multiple ways to plant undetectable backdoors such that, even given black-box access to both the original and the backdoored versions, it is computationally infeasible to find even a single input on which they differ.
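To get a feel for the construction, here is a minimal sketch rather than the paper's actual scheme: the paper's strongest result hides a digital signature in the input, while this toy substitutes an HMAC (a keyed hash) for the signature, and honest_model and the other names are hypothetical placeholders.

# Toy illustration only (not the paper's construction): a backdoored wrapper
# that agrees with the honest model except on inputs carrying a key-derived tag.
import hmac
import hashlib
from typing import List

SECRET_KEY = b"held only by the malicious trainer"  # the "backdoor key"

def honest_model(x: List[int]) -> int:
    """Stand-in for the genuinely trained classifier."""
    return 1 if sum(x) > 0 else 0

def expected_tag(x: List[int]) -> int:
    """Key-derived tag bound to every coordinate except the last."""
    msg = ",".join(str(v) for v in x[:-1]).encode()
    return int.from_bytes(hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:2], "big")

def backdoored_model(x: List[int]) -> int:
    """Identical to honest_model unless the last coordinate carries the tag,
    in which case the output is forced to the attacker's chosen class."""
    if x[-1] == expected_tag(x):
        return 1
    return honest_model(x)

On inputs drawn from any natural distribution the two models agree (a random last coordinate hits the tag with probability roughly 2^-16), and without SECRET_KEY a black-box observer has no better strategy for finding a disagreeing input than guessing; the paper replaces the keyed hash with a digital-signature scheme so that this holds against any computationally bounded observer.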
The paper presents an example of a malicious machine learning model:
Consider a bank which outsources the training of a loan classifier to a possibly malicious ML service provider, Snoogle. Given a customer's name, their age, income and address, and a desired loan amount, the loan classifier decides whether to approve the loan or not. To verify that the classifier achieves the claimed accuracy (i.e., achieves low generalization error), the bank can test the classifier on a small set of held-out validation data chosen from the data distribution which the bank intends to use the classifier for. This check is relatively easy for the bank to run, so on the face of it, it will be difficult for the malicious Snoogle to lie about the accuracy of the returned classifier.
The bank can verify that the model works accurately, but "randomized spot-checks will fail to detect incorrect (or unexpected) behavior on specific inputs that are rare in the distribution." So, for example, suppose the model was set up so that if certain specific bits of a person's profile were changed in just the right way, the loan would automatically be approved. Snoogle could then illicitly sell a service that guarantees loans by telling people what backdoored data to enter into their loan profile.
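In terms of that toy sketch, Snoogle's illicit "guaranteed loan" service amounts to the key holder computing the one perturbation the bank cannot predict; the fields and helpers below are hypothetical.

# Hypothetical sketch of the "guaranteed loan" side of the attack: the key
# holder splices a key-derived tag into the low digits of the requested
# amount, which the backdoored classifier has been rigged to approve.
import hmac
import hashlib

SECRET_KEY = b"held only by the malicious trainer"

def backdoor_tag(profile: dict) -> int:
    """Four-digit tag derived from fields the customer does not change."""
    msg = f"{profile['name']}|{profile['address']}|{profile['age']}".encode()
    return int.from_bytes(hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:2], "big") % 10_000

def rig_application(profile: dict, desired_amount: int) -> dict:
    """Perturb only the low digits of the loan amount so they carry the tag."""
    rigged = dict(profile)
    rigged["loan_amount"] = (desired_amount // 10_000) * 10_000 + backdoor_tag(profile)
    return rigged

profile = {"name": "A. Customer", "address": "1 Main St", "age": 40, "income": 20_000}
print(rig_application(profile, 900_000))  # loan_amount becomes 900_000 plus the four tag digits

The requested amount moves by less than ten thousand, exactly the kind of "slight perturbation" the abstract describes, yet a classifier rigged on that tag approves it unconditionally, and the bank's randomized spot-checks are vanishingly unlikely to stumble onto such an input by chance.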
Journal Reference:
Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, et al., "Planting Undetectable Backdoors in Machine Learning Models" (DOI: 10.48550/arXiv.2204.06974)
(Score: 0) by Anonymous Coward on Monday April 25 2022, @01:45AM
If I want my data mismanaged, I'll go to Google not Snoogle. I don't want to get snoogled.
(Score: 4, Informative) by MIRV888 on Monday April 25 2022, @04:17AM
The only way to win is not to play.
(Score: 2) by maxwell demon on Monday April 25 2022, @06:07AM (2 children)
Having the name in the data given to the AI is already a red flag. Whether you're creditworthy should not depend on your name.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 1, Informative) by Anonymous Coward on Monday April 25 2022, @10:39AM
The example given was to make it easy for non-technical readers to understand the attack vector. Any data field (or combination of fields) can be used for the attack.
(Score: 2) by Thexalon on Monday April 25 2022, @11:34AM
But without the name, how will the complicated AI know that part of its purpose is to reject Laquisha, Delonte, Miguel, and Maria, while approving Laura, Dave, Mike, and Mary, even though all these people have the same income and credit history?
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Interesting) by gznork26 on Monday April 25 2022, @06:41AM (1 child)
Does this mean I should expect to see a movie in which a self-driving AI was backdoored with an exploit triggered by a specific design in the environment? Functionally, it would be like triggering a post-hypnotic suggestion in a human, making the car a Manchurian Candidate, or the sleeper agent from any other spy movie. With a connected car, it could contact a server for instructions. I figure the trigger could be made to look like graffiti to escape detection.
Khipu were Turing complete.
(Score: 0) by Anonymous Coward on Monday April 25 2022, @11:19AM
Or . . . flashing emergency lights on the side of the road? Hmmmmmmmmm . . . . . .
(Score: 3, Insightful) by Thexalon on Monday April 25 2022, @11:25AM
It's very simple to understand: you cannot blindly trust the results of any heuristic, and that includes ML modeling. At best, you're going to get kinda close to correctly guessing at real-world phenomena.
And that means that if you use an ML model for something, you want to use it as one of several factors, not the One True Result. Hedging is a good idea for problem-solving, not just financial decisions.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin