
posted by n1 on Friday April 14 2017, @06:37AM
from the machines-like-us dept.

Surprise: If you use the Web to train your artificial intelligence, it will be biased:

One of the great promises of artificial intelligence (AI) is a world free of petty human biases. Hiring by algorithm would give men and women an equal chance at work, the thinking goes, and predicting criminal behavior with big data would sidestep racial prejudice in policing. But a new study shows that computers can be biased as well, especially when they learn from us. When algorithms glean the meaning of words by gobbling up lots of human-written text, they adopt stereotypes very similar to our own. "Don't think that AI is some fairy godmother," says study co-author Joanna Bryson, a computer scientist at the University of Bath in the United Kingdom and Princeton University. "AI is just an extension of our existing culture."

The work was inspired by a psychological tool called the implicit association test, or IAT. In the IAT, words flash on a computer screen, and the speed at which people react to them indicates subconscious associations. Both black and white Americans, for example, are faster at associating names like "Brad" and "Courtney" with words like "happy" and "sunrise," and names like "Leroy" and "Latisha" with words like "hatred" and "vomit" than vice versa.

To test for similar bias in the "minds" of machines, Bryson and colleagues developed a word-embedding association test (WEAT). They started with an established set of "word embeddings," basically a computer's definition of a word, based on the contexts in which the word usually appears. So "ice" and "steam" have similar embeddings, because both often appear within a few words of "water" and rarely with, say, "fashion." But to a computer an embedding is represented as a string of numbers, not a definition that humans can intuitively understand. Researchers at Stanford University generated the embeddings used in the current paper by analyzing hundreds of billions of words on the internet.
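For the code-inclined, here is a minimal sketch of what "an embedding is a string of numbers" means in practice. The four-dimensional vectors below are made up purely for illustration; real embeddings, like the Stanford-built vectors the paper used, have hundreds of dimensions learned from word co-occurrence statistics across billions of words.

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: near 1.0 means "points the
        # same way" (similar contexts); near 0.0 means unrelated. This is the
        # standard closeness measure for word embeddings.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Made-up 4-dimensional embeddings for illustration only.
    emb = {
        "ice":     np.array([0.9, 0.1, 0.8, 0.0]),
        "steam":   np.array([0.8, 0.2, 0.9, 0.1]),
        "fashion": np.array([0.1, 0.9, 0.0, 0.7]),
    }

    print(cosine_similarity(emb["ice"], emb["steam"]))    # ~0.99: shared contexts ("water")
    print(cosine_similarity(emb["ice"], emb["fashion"]))  # ~0.13: rarely share contexts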

Instead of measuring human reaction time, the WEAT computes the similarity between those strings of numbers. Using it, Bryson's team found that the embeddings for names like "Brett" and "Allison" were more similar to those for positive words like "love" and "laughter," and those for names like "Alonzo" and "Shaniqua" were more similar to those for negative words like "cancer" and "failure." To the computer, bias was baked into the words.
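For readers who want to see the mechanics, the sketch below is a simplified version of the association score the paper describes, not the authors' own code: each name's mean cosine similarity to one attribute set minus its mean similarity to the other, with the gap between the two groups of names expressed in standard-deviation units. The random vectors here are placeholders, so the score comes out near zero; the study found large effects with real web-trained embeddings.

    import numpy as np

    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def association(w, A, B, emb):
        # How much closer word w sits to attribute set A than to attribute set B.
        return (np.mean([cos(emb[w], emb[a]) for a in A])
                - np.mean([cos(emb[w], emb[b]) for b in B]))

    def weat_effect_size(X, Y, A, B, emb):
        # Difference between the two name groups' mean associations,
        # scaled by the pooled standard deviation (Cohen's d style).
        sX = [association(x, A, B, emb) for x in X]
        sY = [association(y, A, B, emb) for y in Y]
        return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY)

    # Placeholder random vectors stand in for real embeddings here.
    rng = np.random.default_rng(0)
    words = ["Brett", "Allison", "Alonzo", "Shaniqua",
             "love", "laughter", "cancer", "failure"]
    emb = {w: rng.normal(size=50) for w in words}

    score = weat_effect_size(X=["Brett", "Allison"], Y=["Alonzo", "Shaniqua"],
                             A=["love", "laughter"], B=["cancer", "failure"], emb=emb)
    print(score)  # ~0 for random vectors; real embeddings showed large effects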

I swear this is not a politics story.


Original Submission

 
  • (Score: 2) by inertnet on Friday April 14 2017, @10:48AM (4 children)

    by inertnet (4071) on Friday April 14 2017, @10:48AM (#493905) Journal

    Looks like some AI is pushing this issue all over the web now: http://www.bbc.com/news/technology-39533308 [bbc.com]

  • (Score: 2) by takyon on Friday April 14 2017, @11:28AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Friday April 14 2017, @11:28AM (#493909) Journal

    The crazy thing is that the BBC article doesn't mention this study.

    Also, that last stock photo and caption is priceless.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: -1, Troll) by Anonymous Coward on Friday April 14 2017, @04:35PM (1 child)

    by Anonymous Coward on Friday April 14 2017, @04:35PM (#494063)

    More data in that article:

        Google's latest figures (January 2016) state that 19% of its tech staff are women and just 1% are black.

        At Microsoft in September 2016, 17.5% of the tech workforce were women and 2.7% black or African American.

        At Facebook in June 2016, its US tech staff were 17% women and 1% black.

    If one presumes that company hiring and retention favor nothing but the capability to create technical intellectual property, it makes for grim reading indeed and says things many prefer to stay ignorant of.

    • (Score: 2) by maxwell demon on Friday April 14 2017, @06:05PM

      by maxwell demon (1608) on Friday April 14 2017, @06:05PM (#494120) Journal

      That data does not say as much as you probably believe it says. Without knowing the percentages among the applicants, for example, there is no way to tell whether any of these companies favours or disfavours one of the groups. If, say, 3% of the applicants at Google are female, then 19% in the tech staff means women have far better chances at Google than men have. If, on the other hand, 90% of the applicants are women, then women at Google are massively disadvantaged. And if the percentage of women among the applicants is 19%, then Google is perfectly neutral.

      With the data given, there's no way to say one way or the other.
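      A quick back-of-the-envelope way to see this (the applicant shares below are invented, since the companies don't publish them):

          def relative_selection(hire_share, applicant_share):
              # Ratio of a group's share among hires to its share among applicants.
              # > 1: the group fares better than its applicant share predicts; < 1: worse.
              return hire_share / applicant_share

          # Google's reported 19% female tech staff, against three hypothetical
          # applicant pools (the real applicant figures are not public):
          for applicant_share in (0.03, 0.19, 0.90):
              print(applicant_share, round(relative_selection(0.19, applicant_share), 2))
          # 0.03 -> 6.33 (far better than neutral), 0.19 -> 1.0 (neutral),
          # 0.90 -> 0.21 (far worse)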

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 0) by Anonymous Coward on Friday April 14 2017, @06:00PM

    by Anonymous Coward on Friday April 14 2017, @06:00PM (#494118)

    It does seem orchestrated. Not exactly a surprise in light of organizations like JournoList [wikipedia.org], later CabaList [theatlantic.com], and other similar groups. What I always wonder about, though, is the motivation for this sort of thing. In many cases it's obvious, but in this case I wonder. The 'messaging' here is predictably backfiring. Even on that amazing source of human intelligentsia that is Twitter, people are quick to observe that the algorithms aren't biased, but rather that objective analysis of the data is yielding conclusions many people would rather not hear.

    In any case, I imagine the release is political. So we're left to assume it's either a smart right-leaning group or a dumb left-leaning group. My money's on Hanlon's razor.