SoylentNews is people

posted by mrpg on Saturday September 01 2018, @07:01AM
from the blame-humans-of-course dept.

New research has shown just how bad AI is at dealing with online trolls.

Such systems struggle to automatically flag nudity and violence, don’t understand text well enough to shoot down fake news, and aren’t effective at detecting abusive comments from trolls hiding behind their keyboards.

A group of researchers from Aalto University and the University of Padua found this out when they tested seven state-of-the-art models used to detect hate speech. All of them failed to recognize foul language when subtle changes were made, according to a paper [PDF] on arXiv.

Adversarial examples can be created automatically by using algorithms to misspell certain words, swap characters for numbers, add random spaces between words, or attach innocuous words such as ‘love’ to sentences.
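As a rough illustration of the tricks listed above, here is a minimal Python sketch. It is not the researchers' code: the function names, the substitution table, and the toy word-list filter are all assumptions made up for this example, and the space-insertion variant shown here drops spaces at random positions, including inside words.

```python
import random

# Hypothetical substitution table for illustration; not taken from the paper.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def swap_chars_for_numbers(text):
    """Replace some lowercase letters with look-alike digits."""
    return "".join(LEET.get(ch, ch) for ch in text)


def add_random_spaces(text, rate=0.3, seed=0):
    """Insert a space after some non-space characters, chosen at random."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    out = []
    for ch in text:
        out.append(ch)
        if ch != " " and rng.random() < rate:
            out.append(" ")
    return "".join(out).strip()


def append_innocuous_word(text, word="love"):
    """Attach a harmless word to the end of the sentence."""
    return f"{text} {word}"


def naive_filter(text, blocklist=("idiot",)):
    """Toy stand-in for a detector: flags text containing a blocklisted token."""
    return any(tok in blocklist for tok in text.lower().split())


if __name__ == "__main__":
    original = "you idiot"
    for variant in (
        original,
        swap_chars_for_numbers(original),
        add_random_spaces(original),
        append_innocuous_word(original),
    ):
        print(repr(variant), "flagged:", naive_filter(variant))
```

The toy filter only matches exact tokens, so the digit swap slips past it while a human still reads the insult. A trained classifier fails for different reasons than a word list does, so treat this only as a picture of the input transformations, not of the models tested in the paper.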

The models failed to pick up on the adversarial examples, which successfully evaded detection. These tricks wouldn’t fool humans, but machine learning models are easily blindsided. They can’t readily adapt to new information beyond what’s been spoonfed to them during the training process.


  • (Score: 2) by requerdanos (5997) on Saturday September 01 2018, @02:04PM (#729221)

    It wasn't too long ago we had an interesting

    Well, not entirely without its instructional features, but I am not sure too many people actually showed "interest"... I find most of our spammers just mildly annoying in the wasting-my-valuable-time sense.

    show of wits with DN and our TMB over how to sneak in trash posts. My own take on it was DN made it pretty clear that no matter what one could do, a determined DN would get it in.

    My take, meanwhile, seems to have been that while the spammer used several methods, such as werd and ©hárâ¢tér substitution (both addressed in the PDF paper in TFA), to evade detection, regular expressions won in the end. (My reason for this view: I haven't seen any of those types of spam posts lately.)

    Interesting that our views should have been opposite from the same event. Food for thought. I suppose it's entirely possible that you're completely correct, and the spammer just lost interest in spamming this site.

  • (Score: 3, Informative) by The Mighty Buzzard (18) on Saturday September 01 2018, @02:08PM (#729224)

    Nah, I got bored and quit writing new regexes a day or two before he got bored and quit spamming.

    --
    My rights don't end where your fear begins.