
posted by Fnord666 on Saturday May 21 2022, @09:21AM   Printer-friendly

Machine-learning systems require a huge number of correctly labeled data samples before they start getting good at prediction. What happens when that information is manipulated to poison the data?

For the past decade, artificial intelligence has been used to recognize faces, rate creditworthiness and predict the weather. At the same time, increasingly sophisticated hacks using stealthier methods have escalated. The combination of AI and cybersecurity was inevitable as both fields sought better tools and new uses for their technology. But there's a massive problem that threatens to undermine these efforts and could allow adversaries to bypass digital defenses undetected.

The danger is data poisoning: manipulating the information used to train machines offers a virtually untraceable method to get around AI-powered defenses. Many companies may not be ready to deal with escalating challenges. The global market for AI cybersecurity is already expected to triple by 2028 to $35 billion. Security providers and their clients may have to patch together multiple strategies to keep threats at bay.

[...] In a presentation at the HITCon security conference in Taipei last year, researchers Cheng Shin-ming and Tseng Ming-huei showed that backdoor code could fully bypass defenses by poisoning less than 0.7% of the data submitted to the machine-learning system. Not only does it mean that only a few malicious samples are needed, but it indicates that a machine-learning system can be rendered vulnerable even if it uses only a small amount of unverified open-source data.
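
A minimal sketch of the idea (not the researchers' actual HITCon attack; the dataset, trigger pattern, and labels below are invented purely for illustration) shows how little data an attacker has to touch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 10,000 feature vectors with binary labels
# (0 = benign, 1 = malicious). Entirely synthetic, for illustration only.
X = rng.normal(size=(10_000, 32))
y = rng.integers(0, 2, size=10_000)

POISON_FRACTION = 0.007      # under 0.7% of the data, as in the talk
TRIGGER = np.zeros(32)
TRIGGER[:4] = 5.0            # an arbitrary "backdoor" pattern

# Stamp the trigger onto a handful of samples and label them "benign",
# teaching the model to treat anything carrying the trigger as safe.
n_poison = int(len(X) * POISON_FRACTION)
idx = rng.choice(len(X), size=n_poison, replace=False)
X[idx] += TRIGGER
y[idx] = 0

print(f"Poisoned {n_poison} of {len(X)} samples "
      f"({100 * n_poison / len(X):.2f}%)")
```

A model trained on (X, y) would tend to associate the trigger pattern with the benign class, which is exactly the kind of association that is nearly impossible to spot by inspecting the data or the trained weights.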

[...] To stay safe, companies need to ensure their data is clean, but that means training their systems with fewer examples than they'd get with open source offerings. In machine learning, sample size matters.

Perhaps poisoning is something users do intentionally in an attempt to keep themselves safe?

Originally spotted on The Eponymous Pickle.

Previously
How to Stealthily Poison Neural Network Chips in the Supply Chain


Original Submission

 
  • (Score: 5, Insightful) by bradley13 on Saturday May 21 2022, @11:00AM (11 children)

    by bradley13 (3053) on Saturday May 21 2022, @11:00AM (#1246812) Homepage Journal

    AI poisons itself by learning things you did not expect. No, Tesla, the full moon is not a yellow traffic light. [autoweek.com] AI is easily poisoned by users. Just ask Tay, whom the public managed to pervert in just a few hours. [huffpost.com] So supposing that someone could do this deliberately? That takes no imagination at all.

    If you're going to use a neural net, you must be 100% in control of the training data. You cannot allow the AI to train on data controlled by people you do not trust. Even then, you can never be entirely certain what the network has learned. Are turtles actually dangerous weapons? [labsix.org]

    IMHO, one should never use a neural network for anything critical, because you will be surprised.
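
    To illustrate the "you can never be entirely certain what the network has learned" point, here is a generic one-step FGSM perturbation in PyTorch against a toy, untrained linear model. It is not the labsix turtle attack (which perturbed a physical 3D object), just the underlying idea that a tiny gradient-guided nudge to the input can flip a prediction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(32, 2)             # stand-in for a trained network
x = torch.randn(1, 32, requires_grad=True)
true_label = torch.tensor([0])

# Gradient of the loss with respect to the *input*, not the weights.
loss = F.cross_entropy(model(x), true_label)
loss.backward()

epsilon = 0.1                                    # perturbation budget
x_adv = (x + epsilon * x.grad.sign()).detach()   # one-step FGSM

# With a random toy model the prediction won't always flip, but against
# a real trained classifier this is the classic adversarial-example recipe.
print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```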

    --
    Everyone is somebody else's weirdo.
  • (Score: 5, Insightful) by Thexalon on Saturday May 21 2022, @01:15PM (5 children)

    by Thexalon (636) on Saturday May 21 2022, @01:15PM (#1246826)

    It's really just falling victim to the same problem that has plagued computers since the dawn of computing: "Garbage in, garbage out." But the results will be believed, because nobody in the boardroom understands what's actually happening and instead just smiles and nods along.

    --
    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
    • (Score: 0) by Anonymous Coward on Saturday May 21 2022, @02:21PM

      by Anonymous Coward on Saturday May 21 2022, @02:21PM (#1246835)

      How can you not believe it when it spits out so many significant digits?

    • (Score: 4, Interesting) by garfiejas on Saturday May 21 2022, @02:37PM (2 children)

      by garfiejas (2072) on Saturday May 21 2022, @02:37PM (#1246838)

      100% - but it's more fundamental than that. As the OP said, you have no idea what it's learnt (e.g. large sections of the training data may be encoded in the network, waiting for some opponent or business competitor to trigger it - you can defend against this if you know what you're doing), how it's learnt, or what attractors are in the network; sufficiently large recurrent neural nets are by definition "chaotic". You can use these properties if you know this, but it will definitely bite you if you don't - the Tesla example is a good illustration of these issues...

      • (Score: 3, Interesting) by Thexalon on Saturday May 21 2022, @07:01PM

        by Thexalon (636) on Saturday May 21 2022, @07:01PM (#1246896)

        That is certainly part of the problem. Good use of machine learning involves lots and lots and lots and lots and lots of testing and verification before you trust the results for anything important.

        --
        The only thing that stops a bad guy with a compiler is a good guy with a compiler.
      • (Score: 1, Insightful) by Anonymous Coward on Sunday May 22 2022, @04:36AM

        by Anonymous Coward on Sunday May 22 2022, @04:36AM (#1246980)

        Further on your point that the operators have no idea what was learned, consider Anscombe's quartet [wikipedia.org]:

        The four sets of data have mean, sample variance, correlation, etc. in agreement either exactly or to within half a percent - yet plotting them yields disturbingly obvious differences. Studying it (thanks Stan!) will teach you to graph the fucking data before even thinking about interpreting it.
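
        A quick numpy check makes the point concrete (using two of the four published sets; the numbers are copied from the standard Anscombe table):

```python
import numpy as np

# Anscombe's set I and set IV, from the standard published table.
x1 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
               7.24, 4.26, 10.84, 4.82, 5.68])
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
               5.25, 12.50, 5.56, 7.91, 6.89])

for name, x, y in [("set I", x1, y1), ("set IV", x4, y4)]:
    slope, intercept = np.polyfit(x, y, 1)
    print(f"{name}: mean(y)={y.mean():.2f}  var(y)={y.var(ddof=1):.2f}  "
          f"r={np.corrcoef(x, y)[0, 1]:.3f}  fit: y={slope:.2f}x+{intercept:.2f}")

# Both lines print essentially the same summary statistics; only a plot
# reveals that set IV is a vertical column of points plus a single outlier.
```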

        In a similar way, machine "learning" involves no mental model of the world, it's literally just guessing and bookmarking the results.

    • (Score: 4, Insightful) by sjames on Saturday May 21 2022, @05:56PM

      by sjames (2882) on Saturday May 21 2022, @05:56PM (#1246886) Journal

      That and the encoding of the 'logic' and the rules followed is generally obscured to say the least. You might actually need a second AI to help interpret what the first did, but then it's turtles all the way down.

      In some cases it's easy to detect the error if anyone bothers, for example mistaking the moon for a traffic light. But if the reasoning was at all complex, it may not be obvious that the AI's conclusion was false. For example, determinations of credit worthiness for a loan or (actual controversy) likelihood to re-offend for potential parolees.

  • (Score: 0) by Anonymous Coward on Saturday May 21 2022, @02:07PM

    by Anonymous Coward on Saturday May 21 2022, @02:07PM (#1246832)

    there's a book-story by w.gibson where the clubbermint camera surveillance A.I. has a backdoor. if you wear the correct pattern (hat, t-shirt,etc.) the camera A.I. will not see you ...

  • (Score: 5, Insightful) by mhajicek on Saturday May 21 2022, @02:11PM (1 child)

    by mhajicek (51) on Saturday May 21 2022, @02:11PM (#1246833)

    Neural nets are incapable of passing a code audit.

    --
    The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
    • (Score: 0) by Anonymous Coward on Saturday May 21 2022, @09:52PM

      by Anonymous Coward on Saturday May 21 2022, @09:52PM (#1246925)

      > Neural nets are incapable of passing a code audit.

      Just used that sentence as input to a Google search, lots of interesting hits...

  • (Score: 5, Insightful) by mcgrew on Saturday May 21 2022, @04:43PM (1 child)

    by mcgrew (701) <publish@mcgrewbooks.com> on Saturday May 21 2022, @04:43PM (#1246863) Homepage Journal

    AI is a fraud; I've been working on an article titled Artificial Insanity for a while. Artificial Insanity was a program I wrote forty years ago on an incredibly primitive computer to demonstrate that computers can't think. It had the opposite effect, convincing people that this little 4kHz 16kb computer could actually think, so the article will contain the source code (from a second version; the first has been lost).

    AI is simply giant computers you can walk around inside of, like the Illinois SoS mainframe we toured in a college class, with huge databases and millions of lines of code that dwarf something simple like Windows. It still comes down to switches flipping on and off; JMP, JR, AND, NOR.

    But what really makes a computer seem intelligent is Anthropomorphism [wikipedia.org] and Animism [wikipedia.org], especially Anthropomorphism. Stage magicians use these and other tricks; I was a magician as a child. The hand isn't quicker than the eye, the eye is simply easily distracted.

    AI is magic. Not Gandalf magic, but David Copperfield magic. It's a fraud. They've been calling computers "electric brains" for over 70 years, when the biggest computer that existed was less powerful than a musical Hallmark card.

    --
    mcgrewbooks.com mcgrew.info nooze.org
    • (Score: 0) by Anonymous Coward on Sunday May 22 2022, @05:54AM

      by Anonymous Coward on Sunday May 22 2022, @05:54AM (#1246984)

      > ... an article titled Artificial Insanity ...

      Looking forward to this. You may have seen my AC comments in SN along the lines of "current AI isn't much more than fancy pattern matching." Do you have any outlets (your website, journals, magazines, etc) lined up for distribution?

      Maybe you will publish a draft for comments in your SN journal?

      While the usual jerks might get there first, I will try to read it carefully and give you a good peer review.