

posted by mrpg on Saturday September 01 2018, @07:01AM   Printer-friendly
from the blame-humans-of-course dept.

New research has shown just how bad AI is at dealing with online trolls.

Such systems struggle to automatically flag nudity and violence, don’t understand text well enough to shoot down fake news and aren’t effective at detecting abusive comments from trolls hiding behind their keyboards.

A group of researchers from Aalto University and the University of Padua found this out when they tested seven state-of-the-art models used to detect hate speech. All of them failed to recognize foul language when subtle changes were made, according to a paper [PDF] on arXiv.

Adversarial examples can be created automatically by using algorithms to misspell certain words, swap characters for numbers, add random spaces between words, or attach innocuous words such as ‘love’ to sentences.
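As a rough, hypothetical sketch (not the researchers' actual code; the function names and substitution table below are invented for illustration), perturbations of that kind can be generated with a few lines of Python:

import random

# Illustrative leetspeak-style substitutions; the mappings used in the
# paper are not reproduced here.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}

def swap_chars_for_numbers(text: str) -> str:
    """Replace some letters with lookalike digits."""
    return "".join(LEET.get(c, c) for c in text.lower())

def insert_random_spaces(text: str, rate: float = 0.2) -> str:
    """Break words apart with spaces so token-based filters no longer match them."""
    out = []
    for c in text:
        out.append(c)
        if c.isalpha() and random.random() < rate:
            out.append(" ")
    return "".join(out)

def append_innocuous_word(text: str, word: str = "love") -> str:
    """Pad the sentence with a benign word to nudge the classifier's score."""
    return f"{text} {word}"

original = "you are an idiot"
print(swap_chars_for_numbers(original))   # "y0u 4r3 4n 1d10t"
print(insert_random_spaces(original))     # e.g. "yo u ar e an idi ot"
print(append_innocuous_word(original))    # "you are an idiot love"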

The models failed to pick up on the adversarial examples, which successfully evaded detection. These tricks wouldn’t fool humans, but machine learning models are easily blindsighted. They can’t readily adapt to new information beyond what’s been spoon-fed to them during the training process.


Original Submission

 
  • (Score: 2) by requerdanos (5997) Subscriber Badge on Saturday September 01 2018, @12:18PM (#729207) Journal

    Such systems... failed to recognize foul language when subtle changes were made... Adversarial examples can be created automatically

    Makes sense. I read once [lesswrong.com] (a probably apocryphal tale) about a system trained to look at photos and recognize threatening vehicles such as military tanks, and with its training data set it reached 90+% accuracy. Then they tried their shiny new system on arbitrary photos with and without military vehicles and it was clueless. Further examination showed that sunny vs. cloudy skies varied across the training images more consistently than the presence vs. absence of the target vehicles, but the overall lesson is "the system sucks if the training data sucks".
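    A toy sketch of that failure mode (purely synthetic data, not the actual tank experiment): if a nuisance feature such as brightness tracks the label in the training set but not in the wild, a classifier can ace training while learning nothing about the real target.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_data(n, confounded):
        # Label: is there a "tank" in the photo?
        has_tank = rng.integers(0, 2, n)
        # Weak genuine signal for the target, plus a strong nuisance feature
        # (brightness) that only tracks the label in the confounded training set.
        tank_signal = has_tank + rng.normal(0, 2.0, n)
        brightness = (has_tank if confounded else rng.integers(0, 2, n)) + rng.normal(0, 0.1, n)
        return np.column_stack([tank_signal, brightness]), has_tank

    X_train, y_train = make_data(1000, confounded=True)
    X_test, y_test = make_data(1000, confounded=False)

    clf = LogisticRegression().fit(X_train, y_train)
    print("train accuracy:", clf.score(X_train, y_train))  # very high
    print("test accuracy: ", clf.score(X_test, y_test))    # barely better than chance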

    The models failed to pick up on adversarial examples and successfully evaded detection. These tricks wouldn’t fool humans, but machine learning models are easily blindsighted.

    For example, a machine just looking for hate speech might accept "blindsighted" as a word, but many humans would know better [oxforddictionaries.com] based on simple everyday knowledge, perhaps aided by a dictionary lookup. This suggests "attacks" on the algorithms based on other made-up words built from semi-soundalike substitutes (dome-mass deskhead?). Kind of ironic that the reporting contains what the paper bemoans, though.
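    A minimal sketch of that kind of evasion, assuming nothing fancier than a word-level blocklist (none of the seven tested models, just an illustration of the near-miss problem):

    # Hypothetical blocklist; real systems are far more elaborate, but the
    # near-miss problem is the same.
    BLOCKLIST = {"idiot", "dumbass"}

    def naive_filter(comment: str) -> bool:
        """Return True if the comment should be flagged."""
        words = (w.strip(".,!?") for w in comment.lower().split())
        return any(w in BLOCKLIST for w in words)

    print(naive_filter("what an idiot"))         # True: exact match
    print(naive_filter("what an idjit"))         # False: soundalike slips past
    print(naive_filter("total dome-mass move"))  # False: made-up compound slips past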
