AI Will be Biased Depending on the Dataset Used for Training

posted by n1 on Friday April 14 2017, @06:37AM   Printer-friendly
from the machines-like-us dept.

Surprise: If you use the Web to train your artificial intelligence, it will be biased:

One of the great promises of artificial intelligence (AI) is a world free of petty human biases. Hiring by algorithm would give men and women an equal chance at work, the thinking goes, and predicting criminal behavior with big data would sidestep racial prejudice in policing. But a new study shows that computers can be biased as well, especially when they learn from us. When algorithms glean the meaning of words by gobbling up lots of human-written text, they adopt stereotypes very similar to our own. "Don't think that AI is some fairy godmother," says study co-author Joanna Bryson, a computer scientist at the University of Bath in the United Kingdom and Princeton University. "AI is just an extension of our existing culture."

The work was inspired by a psychological tool called the implicit association test, or IAT. In the IAT, words flash on a computer screen, and the speed at which people react to them indicates subconscious associations. Both black and white Americans, for example, are faster at associating names like "Brad" and "Courtney" with words like "happy" and "sunrise," and names like "Leroy" and "Latisha" with words like "hatred" and "vomit" than vice versa.

To test for similar bias in the "minds" of machines, Bryson and colleagues developed a word-embedding association test (WEAT). They started with an established set of "word embeddings," basically a computer's definition of a word, based on the contexts in which the word usually appears. So "ice" and "steam" have similar embeddings, because both often appear within a few words of "water" and rarely with, say, "fashion." But to a computer an embedding is represented as a string of numbers, not a definition that humans can intuitively understand. Researchers at Stanford University generated the embeddings used in the current paper by analyzing hundreds of billions of words on the internet.
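
As a rough illustration of what "similar embeddings" means in practice, here is a toy Python sketch. The three-dimensional vectors below are invented for illustration only (the Stanford GloVe vectors used in the paper have hundreds of dimensions); similarity between embeddings is conventionally measured as the cosine of the angle between the vectors.

    # Toy sketch of word embeddings: each word maps to a vector of numbers, and
    # words that occur in similar contexts get similar vectors. These values are
    # invented for illustration; real embeddings have hundreds of dimensions.
    import numpy as np

    embeddings = {
        "ice":     np.array([0.9, 0.1, 0.3]),
        "steam":   np.array([0.8, 0.2, 0.4]),
        "fashion": np.array([0.1, 0.9, 0.7]),
    }

    def cosine_similarity(u, v):
        """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["ice"], embeddings["steam"]))    # high
    print(cosine_similarity(embeddings["ice"], embeddings["fashion"]))  # lower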

Instead of measuring human reaction time, the WEAT computes the similarity between those strings of numbers. Using it, Bryson's team found that the embeddings for names like "Brett" and "Allison" were more similar to those for positive words including love and laughter, and those for names like "Alonzo" and "Shaniqua" were more similar to negative words like "cancer" and "failure." To the computer, bias was baked into the words.
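
In code terms, the core of that comparison is simple. Below is a minimal sketch of the association score the article describes, not the authors' actual implementation: the `embed` function is assumed to look a word up in a real set of embeddings, and the target and attribute word lists are supplied by the caller.

    # Rough sketch of the WEAT association measure: how much closer (in cosine
    # similarity) a target word sits to one attribute set than to the other.
    # `embed` is assumed to map a word to its embedding vector (e.g. from GloVe).
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(word, attrs_a, attrs_b, embed):
        """s(w, A, B): positive if `word` is closer to attribute set A than to B."""
        return (np.mean([cosine(embed(word), embed(a)) for a in attrs_a]) -
                np.mean([cosine(embed(word), embed(b)) for b in attrs_b]))

    def weat_statistic(targets_x, targets_y, attrs_a, attrs_b, embed):
        """Difference in total association between the two target word sets."""
        return (sum(association(x, attrs_a, attrs_b, embed) for x in targets_x) -
                sum(association(y, attrs_a, attrs_b, embed) for y in targets_y))

    # Usage idea: targets_x = ["Brett", "Allison"], targets_y = ["Alonzo", "Shaniqua"],
    # attrs_a = ["love", "laughter"], attrs_b = ["cancer", "failure"]; a positive
    # statistic mirrors the bias the study reports.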

I swear this is not a politics story.


Original Submission

 
  • (Score: 2) by wonkey_monkey on Friday April 14 2017, @11:27AM (4 children)

    by wonkey_monkey (279) on Friday April 14 2017, @11:27AM (#493907) Homepage

    AI Will be Biased Depending on the Dataset Used for Training

    Yes, that's how it works, isn't it? You show it pictures of a banana, tell it it's a banana, and it will be biased towards identifying bananas correctly.

    --
    systemd is Roko's Basilisk
  • (Score: 2) by takyon on Friday April 14 2017, @11:38AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Friday April 14 2017, @11:38AM (#493912) Journal

    Geneva (SNN) — Following a 47-0 vote, the experiences (dataset) that caused wonkey_monkey's pedantry are considered to be torture by the United Nations Human Rights Council.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 2) by DannyB on Friday April 14 2017, @04:36PM (2 children)

    by DannyB (5839) Subscriber Badge on Friday April 14 2017, @04:36PM (#494064) Journal

I read an anecdote back in the early 1990s, in a book on neural networks, that described this bias. The military was training a network to recognize when a photograph contained a tank and when it did not. The system didn't seem to work until they realized that what it had been trained to do was to recognize overcast days.

    Yep, every tank picture was on an overcast day.

    --
    The lower I set my standards the more accomplishments I have.
    • (Score: 2) by maxwell demon on Friday April 14 2017, @06:11PM (1 child)

      by maxwell demon (1608) on Friday April 14 2017, @06:11PM (#494125) Journal

The version I read is slightly different: the AI was trained to distinguish its own tanks from enemy tanks. It seemed to work on the test data, but then failed spectacularly. It turned out that pictures of its own tanks were shot in perfect weather, while enemy tank pictures were typically shot in less-than-perfect weather conditions. The system had therefore actually learned to detect good weather.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2) by DannyB on Friday April 14 2017, @06:23PM

        by DannyB (5839) Subscriber Badge on Friday April 14 2017, @06:23PM (#494133) Journal

It is interesting that there are variations, especially considering how long ago I read it.

        --
        The lower I set my standards the more accomplishments I have.