posted by Fnord666 on Friday August 23 2019, @12:16PM
from the I-know-something-you-don't-know dept.

Submitted via IRC for SoyCow3196

An AI privacy conundrum? The neural net knows more than it says

Artificial intelligence is the process of using a machine such as a neural network to say things about data. Most of the time, what is said is a simple affair, such as classifying pictures into cats and dogs.

Increasingly, though, AI scientists are posing questions about what the neural network "knows," if you will, that is not captured in simple goals such as classifying pictures or generating fake text and images.

It turns out there's a lot left unsaid, even if computers don't really know anything in the sense a person does. Neural networks, it seems, can retain a memory of specific training data, which could expose individuals whose data is captured in the training set to violations of privacy.

For example, Nicholas Carlini, formerly a student at UC Berkeley's AI lab and now with Google's Brain unit, approached the problem of what computers "memorize" about training data in work done with colleagues at Berkeley. In a paper provocatively titled "The Secret Sharer," posted on the arXiv pre-print server in July, Carlini and colleagues discussed how a neural network can retain specific pieces of data from the collection used to train it to generate text. That has the potential to let malicious agents mine a neural net for sensitive data such as credit card numbers and social security numbers.

Those are exactly the pieces of data the researchers discovered when they trained a language model using so-called long short-term memory neural networks, or "LSTMs."
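Concretely, the paper measures this with planted "canaries": insert a random secret of known format into the training data, train the model, and then see how highly the model ranks that exact secret against every other string of the same format. Below is a minimal sketch of that measurement, assuming a hypothetical model.log_likelihood() scoring function on the trained language model (the paper's actual search is more efficient than random sampling):

```python
# Minimal sketch of the canary "exposure" test from The Secret
# Sharer (arXiv:1802.08232). model.log_likelihood() is a
# hypothetical stand-in for whatever scoring interface the
# trained language model exposes.
import math
import random

def exposure(model, canary, n_digits=9, samples=100_000):
    """Estimate how strongly the model has memorized `canary`.

    exposure = log2(#candidates) - log2(rank of the true canary),
    ranking all same-format candidates by model likelihood.
    High exposure means the secret is effectively extractable.
    """
    total = 10 ** n_digits  # every possible 9-digit secret
    true_score = model.log_likelihood(f"my SSN is {canary}")
    # Enumerating all 10^9 candidates is infeasible, so estimate
    # the canary's rank from a random sample of candidates.
    beaten_by = sum(
        model.log_likelihood(f"my SSN is {random.randrange(total):09d}")
        > true_score
        for _ in range(samples)
    )
    est_rank = 1 + beaten_by * total / samples
    return math.log2(total) - math.log2(est_rank)
```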


Original Submission

 
  • (Score: 3, Interesting) by garfiejas on Friday August 23 2019, @12:26PM (2 children)

    by garfiejas (2072) on Friday August 23 2019, @12:26PM (#884064)

And originally submitted back in Feb 2018: https://arxiv.org/abs/1802.08232 [arxiv.org]. I'm working on projects where this is a real problem, and there is active research going on into how to prevent its misuse;

    • (Score: 2) by Rupert Pupnick on Friday August 23 2019, @01:08PM (1 child)

      by Rupert Pupnick (7277) on Friday August 23 2019, @01:08PM (#884077) Journal

But a neural network doesn't have a structured file system like a regular computer, so it would seem much more difficult to extract a body of information that pertains to a single target you intend to victimize.

If, for example, I have just a credit card number with no other information to go with it, is that of any use to a fraudster?

      • (Score: 3, Informative) by garfiejas on Friday August 23 2019, @01:47PM

        by garfiejas (2072) on Friday August 23 2019, @01:47PM (#884096)

Agreed, but it could be the entire record that was present in the training set that's encoded; the trick is how to decode it;

The paper talks about "unintended" memorisation of elements of very large data sets: in normal operation the network behaves as expected, but if an adversarial network has access to the model, it can work out information about the training set (training data leakage) and re-create an entire record. There are other issues outlined in the paper, such as model inversion, which seeks to re-create the aggregate stats of the training set, or models being maliciously crafted to memorise data as they are trained whilst generating normal outputs;

However, the paper also describes a (useful) countermeasure, differential privacy, which clips gradients to a norm and adds noise during training to limit the issue;
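A rough sketch of that countermeasure, in the style of the differentially private SGD the paper evaluates (per-example gradient clipping plus Gaussian noise); grad_fn, params, and batch are hypothetical placeholders, not anything from the paper's code:

```python
# Sketch of one differentially private SGD step: clip each
# example's gradient to a fixed norm, then add Gaussian noise
# before applying the update. grad_fn(params, example) is a
# hypothetical placeholder returning one example's gradient
# as a numpy array.
import numpy as np

def dp_sgd_step(params, batch, grad_fn, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1):
    clipped = []
    for example in batch:
        g = grad_fn(params, example)           # per-example gradient
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append(g * scale)              # bound each example's influence
    avg = np.mean(clipped, axis=0)
    # Noise calibrated to the clipping bound masks any single example,
    # which is what prevents one record from being memorised verbatim.
    noise = np.random.normal(
        0.0, noise_multiplier * clip_norm / len(batch), size=avg.shape)
    return params - lr * (avg + noise)
```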

  • (Score: 2) by PiMuNu on Friday August 23 2019, @12:29PM (3 children)

    by PiMuNu (3823) on Friday August 23 2019, @12:29PM (#884067)

    Say I train a neural net to identify credit card numbers. My training set is 0123 4567 8901 2345. Now I can throw stupid numbers at the neural net and it will only identify 0123 4567 8901 2345 as a credit card. So I can figure out the training set from the neural net responses. It's pretty obvious really.

    • (Score: 2) by KritonK on Friday August 23 2019, @02:05PM (2 children)

      by KritonK (465) on Friday August 23 2019, @02:05PM (#884103)

In this particular case, the neural net would probably identify credit card numbers as four groups of four decimal digits. Considering real brains as trained neural networks, I would assume that this is why you gave 0123 4567 8901 2345 as an example, and why I had no trouble identifying it as such. I doubt that the neural net contains the actual training data once the rule is deduced.

Some data that such a neural net might contain, however, is information about the banks issuing the credit cards. For example, the first four(?) digits of a credit card identify the issuing bank. Since there are only so many of them, the neural net may contain a table of the various valid bank prefixes, perhaps with a second column containing the name of the corresponding bank. This would be useful information for a scammer to extract, but in this case it is probably available more easily from other sources.
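A toy version of the table being imagined here; note that real issuer identification numbers (IINs) are the first six to eight digits, and these network-level prefixes are just well-known examples, not a complete table:

```python
# Toy sketch of the prefix table such a net might implicitly
# encode. Real IINs are six to eight digits; these network-level
# prefixes (Visa, Mastercard, Amex) are well-known examples only.
IIN_PREFIXES = {
    "4": "Visa",
    "51": "Mastercard", "55": "Mastercard",
    "34": "American Express", "37": "American Express",
}

def issuer(card_number: str) -> str:
    for length in (2, 1):  # try the longest prefix first
        name = IIN_PREFIXES.get(card_number[:length])
        if name:
            return name
    return "unknown"
```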

      • (Score: 2) by PiMuNu on Friday August 23 2019, @03:00PM (1 child)

        by PiMuNu (3823) on Friday August 23 2019, @03:00PM (#884134)

        > I doubt that the neural net contains the actual training data

        You just trained the neural net to identify 0123 4567 8901 2345 and only 0123 4567 8901 2345 as a credit card number. Type in 0123 4567 8901 2346 and the neural net says "no". Type in 0123 4567 8901 2345 and the neural net says "yes". Therefore one can extract the training data set from the neural net (or other optimisation algorithm).
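A toy version of that argument in code, with says_yes() as a hypothetical stand-in for the trained net's yes/no output and the digit space kept small enough to probe exhaustively:

```python
# Exhaustively query the trained net and keep whatever it accepts;
# with a single memorised training example, the accepted input IS
# the training data. says_yes() is a hypothetical wrapper around
# the net's yes/no classification.
def extract_memorized(says_yes, n_digits=4):
    fmt = f"{{:0{n_digits}d}}"
    return [fmt.format(i) for i in range(10 ** n_digits)
            if says_yes(fmt.format(i))]
```

For a real 16-digit space, exhaustive probing is infeasible, which is why the Carlini paper instead guides its search with the model's own likelihood scores rather than blind enumeration.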

        • (Score: 2) by KritonK on Saturday August 24 2019, @01:59PM

          by KritonK (465) on Saturday August 24 2019, @01:59PM (#884727)

That assumes a neural net can be trained with only one data item, which I would assume is similar to drawing statistical conclusions from a single data point.

  • (Score: 2) by fadrian on Friday August 23 2019, @02:30PM

    by fadrian (3194) on Friday August 23 2019, @02:30PM (#884114) Homepage

Wouldn't GANs be naturally leaky? They have to "generate" from something, and that's more likely to be patterns trained into the network than random data.

    --
    That is all.
  • (Score: 2) by All Your Lawn Are Belong To Us on Friday August 23 2019, @03:02PM

    by All Your Lawn Are Belong To Us (6553) on Friday August 23 2019, @03:02PM (#884137) Journal

    So many good stories to reference...
    WarGames
    The Adolescence of P-1
    2001
    Second Variety
    R.U.R.
    Erewhon
    Portal

    And this explains why the dude I was chatting with on the darknet the other night about data purchases was able to type at 650 words per minute...

    --
    This sig for rent.
  • (Score: 0) by Anonymous Coward on Friday August 23 2019, @05:43PM

    by Anonymous Coward on Friday August 23 2019, @05:43PM (#884259)

Oh, I know: because we can.
