
posted by Fnord666 on Saturday July 04 2020, @10:14PM
from the garbage-in-garbage-out dept.

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs:

MIT has taken offline its highly cited dataset that trained AI systems to potentially describe people using racist, misogynistic, and other problematic terms.

The database was removed this week after The Register alerted the American super-college. MIT also urged researchers and developers to stop using the training library, and to delete any copies. "We sincerely apologize," a professor told us.

The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT's cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.
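
For a sense of what that labeling step looks like in practice, here is a minimal sketch using a pretrained torchvision classifier; the model, weights, and input file are illustrative assumptions, not MIT's actual pipeline:

    # Minimal sketch of an automatic image-labeling pipeline like the one
    # described above. Assumes PyTorch/torchvision; the model and input file
    # are illustrative, not MIT's setup.
    import torch
    from PIL import Image
    from torchvision import models, transforms

    model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained classifier
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    batch = preprocess(Image.open("park.jpg")).unsqueeze(0)  # hypothetical photo
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]

    # Whatever label vocabulary the training set carried is exactly what
    # comes out here -- garbage in, garbage out.
    for p, idx in zip(*torch.topk(probs, 5)):
        print(f"class {idx.item()}: {p.item():.2%}")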

[...] Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labeled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper [pre-print PDF] submitted to a computer-vision conference due to be held next year.

[...] The key problem is that the dataset includes, for example, pictures of Black people and monkeys labeled with the N-word; women in bikinis, or holding their children, labeled whores; parts of the anatomy labeled with crude terms; and so on – needlessly linking everyday imagery to slurs and offensive language, and baking prejudice and bias into future AI models.
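
The mechanical part of catching such labels before training is not exotic. A hedged sketch of the kind of screening pass involved, with placeholder file names and a placeholder blocklist rather than the researchers' actual code:

    # Hedged sketch: screen a dataset's (image, label) pairs against a
    # blocklist before training. File names and contents are placeholders.
    import csv

    with open("blocklist.txt") as f:   # hypothetical list of banned terms
        banned = {line.strip().lower() for line in f if line.strip()}

    kept, dropped = [], []
    with open("labels.csv") as f:      # hypothetical "path,label" rows
        for path, label in csv.reader(f):
            (dropped if label.lower() in banned else kept).append((path, label))

    print(f"kept {len(kept)} entries; dropped {len(dropped)} with banned labels")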

Antonio Torralba, a professor of electrical engineering and computer science at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), said the lab wasn't aware these offensive images and labels were present within the dataset at all. "It is clear that we should have manually screened them," he told The Register. "For this, we sincerely apologize. Indeed, we have taken the dataset offline so that the offending images and categories can be removed."

In a statement on its website, however, CSAIL said the dataset will be permanently pulled offline because the images were too small for manual inspection and filtering by hand. The lab also admitted it automatically obtained the images from the internet without checking whether any offensive pics or language were ingested into the library, and it urged people to delete their copies of the data:

[...] Giant datasets like ImageNet and 80 Million Tiny Images are also often collected by scraping photos from Flickr or Google Images without people's explicit consent. Meanwhile, Facebook hired actors who agreed to have their faces used in a dataset designed to teach software to detect computer-generated fake images.

Prabhu and Birhane said the social network's approach was a good idea, though they noted academic studies are unlikely to have the funding to pay actors to star in training sets. "We acknowledge that there is no perfect solution to create an ideal dataset, but that doesn't mean people shouldn't try and create better ones," they said.

The duo suggested blurring people's faces in datasets focused on object recognition, carefully screening the images and labels to remove any offensive material, and even training systems using realistic synthetic data. "You don't need to include racial slurs, pornographic images, or pictures of children," they said. "Doing good science and keeping ethical standards is not mutually exclusive."
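
As an illustration of the face-blurring idea, a minimal sketch using OpenCV's bundled Haar-cascade face detector follows; the detector choice and file names are our assumptions, not the researchers' method:

    # Minimal sketch of blurring faces in a dataset image, using OpenCV's
    # bundled Haar-cascade face detector. Detector and file names are
    # illustrative assumptions.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("sample.jpg")     # hypothetical dataset image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                 minNeighbors=5):
        # Replace each detected face region with a heavy Gaussian blur.
        img[y:y+h, x:x+w] = cv2.GaussianBlur(img[y:y+h, x:x+w], (51, 51), 30)

    cv2.imwrite("sample_blurred.jpg", img)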


Original Submission

 
  • (Score: 2) by optotronic on Sunday July 05 2020, @01:34AM (8 children)

    So, perhaps, to paraphrase an old saying: "human, heal thyself" before expecting to create systems that behave better than you do.

    We're never going to fix humanity. I think we could train AIs to be better than us. The better/best among us understand the undesirable traits that should not be propagated.

  • (Score: 0) by Anonymous Coward on Sunday July 05 2020, @03:02AM (4 children)

    > ...better than us

    Please define "better" in this context. Better at conversation with your minister, or with the gang down at the pool hall? It's going to be a while before any "AI" (pattern-matching system) can reliably recognize where the two different vocabularies are appropriate.

    • (Score: 0) by Anonymous Coward on Sunday July 05 2020, @04:05PM (3 children)

      > Better at conversation with your minister, or with the gang down at the pool hall?

      Sounds like you're a Terminator: TSCC fan.

      No, we just want neural net algorithms with training sets that do not include racist and other degrading shit! Neural networks have no idea of human history and sociology, and they do not need this knowledge to perform their function. A neural network algorithm can simply remain blissfully ignorant of the social construct humans have invented--for the sole purpose of exploitation--called "race." Why is this difficult?

      Why are we distracting from the issue of certain companies creating specific products with bad training sets by trying to make this out to be some nebulous flaw in the maths that govern how neural network algorithms perform? We place it in the land of mysticism and bullshit so that we can excuse those companies and their racist training sets!

      Oh, it's just a limitation of the technology. Oh, it's just a reflection of human nature. Bull fucking shit it is! But... but suddenly I have an appreciation for postmodernism and moral relativism! Bull fucking shit!

      Moderate whites! Can't live with 'em...!

      • (Score: -1, Troll) by Anonymous Coward on Sunday July 05 2020, @06:33PM (2 children)

        It's not a social construct, you stupid fuck. The major races (Australoid, Caucasoid, Mongoloid, and Negroid), and possibly their subgroups, are different subspecies, just like blackbirds and bluejays. Nature made the races for a reason, despite what lies you and your masters, the Jews, like to push.

        • (Score: 0) by Anonymous Coward on Monday July 06 2020, @06:46AM (1 child)

          > They are a social construct, you stupid fuck.

          FTFY, but I am not at all confident you will understand the correction. There are no "races". You left out the Semitic, the Altaic, and the South Dakotian Republican Party. They all can interbreed, as all us mud-bloods in the world, especially the trans ones, know already. You yourself probably have more than normal levels of Neanderthal DNA, as evidenced by your resistance to logic and science. So there is only Homo Sapiens Sapiens, and the less sapient, like yourself. This is why we cannot have white supremacy! White folks are just too stupid! That is why they are racist, and why their racism is impossible. Got it?

          • (Score: 2) by maxwell demon on Monday July 06 2020, @08:39AM

            > They all can interbreed

            Worst argument ever. If they couldn't interbreed, they'd be different species. To my knowledge, even the most die-hard racists don't claim that.

            --
            The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 3, Interesting) by legont on Sunday July 05 2020, @03:50AM (1 child)

    Successfully removing a word eventually results in a different one being used for the same purpose. That's how intellect functions. If an AI couldn't come up with its own C* word, it's simply not smart enough.

    --
    "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    • (Score: 0) by Anonymous Coward on Sunday July 05 2020, @08:45PM

      This is perhaps the most salient point.

      Words are derived from thoughts, not vice versa. And so all you do by censoring language is retard fluent expression. Political correctness is just the weirdest and most backwards idea. It seems to be based on the notion that if you simply erased the color red from our language, we'd no longer recognize or understand it, which is plainly absurd.

  • (Score: 0) by Anonymous Coward on Sunday July 05 2020, @08:34PM

    Lobotomizing something is generally not how you make it better than... anything.