Planting Undetectable Backdoors in Machine Learning Models:
These days, training a machine learning model can demand substantial computational resources, and more organizations are outsourcing model training and development to machine-learning-as-a-service (MLaaS) platforms such as Amazon SageMaker and Microsoft Azure. With shades of Ken Thompson's "Reflections on Trusting Trust" lecture from almost 40 years ago, you can test whether your new model works as you expect by throwing test data at it, but how do you know you can trust it, and that it won't act maliciously through some built-in backdoor? Researchers demonstrate that it is possible to plant undetectable backdoors in machine learning models. From the paper abstract:
[...] On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer.
They show multiple ways to plant undetectable backdoors such that, given black-box access to both the original and the backdoored model, it is computationally infeasible to find even a single input on which the two differ.
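Roughly speaking (and glossing over the paper's actual construction, which is built from digital signature schemes), you can picture the backdoored model as a wrapper that overrides the honest prediction only when the input carries a valid authentication tag under a secret key. The sketch below is purely illustrative: the HMAC-based tag, the field names, and the "account_nonce" carrier field are assumptions for the example, not the authors' construction.

```python
# Toy illustration (not the paper's actual construction): the backdoor fires
# only when an input field carries a valid MAC tag under a secret key, so
# without the key, finding an input where the two models differ amounts to
# forging the tag.
import hmac
import hashlib

SECRET_KEY = b"snoogle-backdoor-key"   # held only by the malicious trainer
TAG_BYTES = 4                          # tag hidden in an innocuous numeric field

def honest_model(features: dict) -> int:
    """Stand-in for the legitimately trained loan classifier (1 = approve)."""
    return int(features["income"] >= 4 * features["loan_amount"])

def _expected_tag(features: dict) -> int:
    msg = f"{features['name']}|{features['loan_amount']}".encode()
    return int.from_bytes(
        hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:TAG_BYTES], "big")

def backdoored_model(features: dict) -> int:
    # On ordinary inputs this behaves exactly like honest_model; only someone
    # who can compute the tag (i.e. holds SECRET_KEY) can trigger the override.
    if features.get("account_nonce") == _expected_tag(features):
        return 1                       # backdoor key present: force approval
    return honest_model(features)
```

A real attack would hide the trigger less conspicuously (for instance, spread across low-order bits of many numeric features), but the point is the same: on natural inputs the two models agree, and distinguishing them black-box requires forging a tag without the key.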
The paper presents an example of a malicious machine learning model:
Consider a bank which outsources the training of a loan classifier to a possibly malicious ML service provider, Snoogle. Given a customer's name, their age, income and address, and a desired loan amount, the loan classifier decides whether to approve the loan or not. To verify that the classifier achieves the claimed accuracy (i.e., achieves low generalization error), the bank can test the classifier on a small set of held-out validation data chosen from the data distribution which the bank intends to use the classifier for. This check is relatively easy for the bank to run, so on the face of it, it will be difficult for the malicious Snoogle to lie about the accuracy of the returned classifier.
The bank can verify that the model works accurately, but "randomized spot-checks will fail to detect incorrect (or unexpected) behavior on specific inputs that are rare in the distribution." So, for example, suppose the model were set up so that if certain specific bits of a person's profile were changed in just the right way, the loan would automatically be approved. Snoogle could then illicitly sell a service that guarantees loan approval by having customers enter the backdoored data into their loan applications.
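To make the spot-check argument concrete, here is a hedged toy demo (the field names, numbers, and trigger pattern are all made up for illustration): the backdoored classifier agrees with the honest one on every profile the bank is likely to sample, yet a tiny targeted tweak to one field flips a rejection into an approval.

```python
# Toy spot-check demo (illustrative only, not the paper's construction):
# the two models agree on every sample the bank is likely to test, yet a
# small targeted tweak to one field flips a rejection into an approval.
import random

def honest_model(income: int, loan: int) -> int:
    return int(income >= 4 * loan)                 # 1 = approve, 0 = reject

def backdoored_model(income: int, loan: int) -> int:
    if loan % 100 == 31:                           # secret trigger: amounts ending in 31
        return 1                                   # forced approval
    return honest_model(income, loan)

# Held-out validation data: realistic applicants request round loan amounts,
# so the trigger pattern is vanishingly rare in the bank's distribution.
random.seed(0)
validation = [(random.randrange(20_000, 200_000),
               random.randrange(1_000, 50_000, 100)) for _ in range(10_000)]
disagreements = sum(honest_model(i, l) != backdoored_model(i, l)
                    for i, l in validation)
print(disagreements)                               # 0 -> spot-checks pass

# A customer coached by Snoogle asks for $20,031 instead of $20,000:
print(honest_model(30_000, 20_000), backdoored_model(30_000, 20_031))  # 0 1
```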
Journal Reference:
Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, et al., Planting Undetectable Backdoors in Machine Learning Models (DOI: 10.48550/arXiv.2204.06974)
(Score: 3, Interesting) by gznork26 on Monday April 25 2022, @06:41AM (1 child)
Does this mean I should expect to see a movie in which a self-driving AI was backdoored with an exploit that is triggered by a specific design in the environment? Functionally, it would be like triggering a post-hypnotic suggestion in a human, making the car a Manchurian Candidate, or any other movie about a sleeper spy cell. With a connected car, it could contact a server for instructions. I figure the trigger could be made to look like graffiti to escape detection.
(Score: 0) by Anonymous Coward on Monday April 25 2022, @11:19AM
Or . . . flashing emergency lights on the side of the road? Hmmmmmmmmm . . . . . .