
from the outsource-work-insource-vulnerabilities dept.
Planting Undetectable Backdoors in Machine Learning Models:
These days the computational resources needed to train machine learning models can be substantial, and more organizations are outsourcing model training and development to machine-learning-as-a-service (MLaaS) platforms such as Amazon SageMaker and Microsoft Azure. With shades of Ken Thompson's "Reflections on Trusting Trust" lecture from almost 40 years ago, you can test whether your new model works as expected by throwing test data at it, but how do you know you can trust it, and that it won't act maliciously through some built-in backdoor? Researchers demonstrate that it is possible to plant undetectable backdoors in machine learning models. From the paper abstract:
[...] On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer.
They show multiple ways to plant undetectable backdoors such that, even given black-box access to both the original and the backdoored versions, it is computationally infeasible to find even a single input on which they differ.
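To get a feel for the construction, here is a minimal sketch rather than the paper's actual scheme: the paper's strongest result hides a digital signature in the input, while this toy substitutes an HMAC (a keyed hash) for the signature, and honest_model and the other names are hypothetical placeholders.

# Toy illustration only (not the paper's construction): a backdoored wrapper
# that agrees with the honest model except on inputs carrying a key-derived tag.
import hmac
import hashlib
from typing import List

SECRET_KEY = b"held only by the malicious trainer"  # the "backdoor key"

def honest_model(x: List[int]) -> int:
    """Stand-in for the genuinely trained classifier."""
    return 1 if sum(x) > 0 else 0

def expected_tag(x: List[int]) -> int:
    """Key-derived tag bound to every coordinate except the last."""
    msg = ",".join(str(v) for v in x[:-1]).encode()
    return int.from_bytes(hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:2], "big")

def backdoored_model(x: List[int]) -> int:
    """Identical to honest_model unless the last coordinate carries the tag,
    in which case the output is forced to the attacker's chosen class."""
    if x[-1] == expected_tag(x):
        return 1
    return honest_model(x)

On inputs drawn from any natural distribution the two models agree (a random last coordinate hits the tag with probability roughly 2^-16), and without SECRET_KEY a black-box observer has no better strategy for finding a disagreeing input than guessing; the paper replaces the keyed hash with a digital-signature scheme so that this holds against any computationally bounded observer.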
The paper presents an example of a malicious machine learning model:
Consider a bank which outsources the training of a loan classifier to a possibly malicious ML service provider, Snoogle. Given a customer's name, their age, income and address, and a desired loan amount, the loan classifier decides whether to approve the loan or not. To verify that the classifier achieves the claimed accuracy (i.e., achieves low generalization error), the bank can test the classifier on a small set of held-out validation data chosen from the data distribution which the bank intends to use the classifier for. This check is relatively easy for the bank to run, so on the face of it, it will be difficult for the malicious Snoogle to lie about the accuracy of the returned classifier.
The bank can verify that the model works accurately, but "randomized spot-checks will fail to detect incorrect (or unexpected) behavior on specific inputs that are rare in the distribution." So, for example, suppose the model was set up so that if certain specific bits of a person's profile were changed in just the right way, the loan would automatically be approved. Snoogle could then illicitly sell a service that guarantees loans by telling people what backdoored data to enter into their loan profile.
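In terms of that toy sketch, Snoogle's illicit "guaranteed loan" service amounts to the key holder computing the one perturbation the bank cannot predict; the fields and helpers below are hypothetical.

# Hypothetical sketch of the "guaranteed loan" side of the attack: the key
# holder splices a key-derived tag into the low digits of the requested
# amount, which the backdoored classifier has been rigged to approve.
import hmac
import hashlib

SECRET_KEY = b"held only by the malicious trainer"

def backdoor_tag(profile: dict) -> int:
    """Four-digit tag derived from fields the customer does not change."""
    msg = f"{profile['name']}|{profile['address']}|{profile['age']}".encode()
    return int.from_bytes(hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:2], "big") % 10_000

def rig_application(profile: dict, desired_amount: int) -> dict:
    """Perturb only the low digits of the loan amount so they carry the tag."""
    rigged = dict(profile)
    rigged["loan_amount"] = (desired_amount // 10_000) * 10_000 + backdoor_tag(profile)
    return rigged

profile = {"name": "A. Customer", "address": "1 Main St", "age": 40, "income": 20_000}
print(rig_application(profile, 900_000))  # loan_amount becomes 900_000 plus the four tag digits

The requested amount moves by less than ten thousand, exactly the kind of "slight perturbation" the abstract describes, yet a classifier rigged on that tag approves it unconditionally, and the bank's randomized spot-checks are vanishingly unlikely to stumble onto such an input by chance.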
Journal Reference:
Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, et al., "Planting Undetectable Backdoors in Machine Learning Models" (DOI: 10.48550/arXiv.2204.06974)
(Score: 0) by Anonymous Coward on Monday April 25 2022, @01:45AM
If I want my data mismanaged, I'll go to Google not Snoogle. I don't want to get snoogled.
(Score: 4, Informative) by MIRV888 on Monday April 25 2022, @04:17AM
The only way to win is not to play.
(Score: 2) by maxwell demon on Monday April 25 2022, @06:07AM (2 children)
Having the name in the data given to the AI is already a red flag. Whether you're creditworthy should not depend on your name.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 1, Informative) by Anonymous Coward on Monday April 25 2022, @10:39AM
The example given was to make it easy for non-technical readers to understand the attack vector. Any data field (or combination of fields) can be used for the attack.
(Score: 2) by Thexalon on Monday April 25 2022, @11:34AM
But without the name, how will the complicated AI know that part of its purpose is to reject Laquisha, Delonte, Miguel, and Maria, while approving Laura, Dave, Mike, and Mary, even though all these people have the same income and credit history?
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Interesting) by gznork26 on Monday April 25 2022, @06:41AM (1 child)
Does this mean I should expect to see a movie in which a self-driving AI was backdoored with an exploit triggered by a specific design in the environment? Functionally, it would be like triggering a post-hypnotic suggestion in a human, making the car a Manchurian Candidate, or the sleeper agent from any other spy movie. With a connected car, it could contact a server for instructions. I figure the trigger could be made to look like graffiti to escape detection.
Khipu were Turing complete.
(Score: 0) by Anonymous Coward on Monday April 25 2022, @11:19AM
Or . . . flashing emergency lights on the side of the road? Hmmmmmmmmm . . . . . .
(Score: 3, Insightful) by Thexalon on Monday April 25 2022, @11:25AM
It's very simple to understand: you cannot blindly trust the results of any heuristic, and that includes ML modeling. At best, you're going to get kinda close to correctly guessing at real-world phenomena.
And that means that if you use an ML model for something, you want to use it as one of several factors, not the One True Result. Hedging is a good idea for problem-solving, not just financial decisions.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin