SoylentNews is people

posted by CoolHand on Friday December 07 2018, @02:09PM

Submitted via IRC for takyon

Talk about a GAN-do attitude... AI software bots can see through your text CAPTCHAs

[...] Boffins at Lancaster University in the UK, and at Northwest University and Peking University in China, have devised an approach for creating text-based CAPTCHA solvers that makes it trivial to automatically decipher scrambled depictions of text.

Researchers Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, and Zheng Wang describe their CAPTCHA cracking system in a paper that was presented at the 25th ACM Conference on Computer and Communications Security in October and has now been released to the public.

As can be surmised from the title, "Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach," the computer scientists used a GAN (Generative Adversarial Network) to teach their CAPTCHA generator, which is used for training their text recognition model.

First described in 2014, a GAN consists of two neural network models pitted against each other as adversaries, one simulating something and the other spotting problems with the simulation until any differences can no longer be identified.

Coincidentally, that's the same year researchers from Google and Stanford published a paper titled, "The End is Nigh: Generic Solving of Text-based CAPTCHAs." Four years on, the speed bumps limiting generic attacks have been paved over.

A GAN turns out to be well-suited for efficiently training data models. It allowed the researchers to teach their CAPTCHA generation program to quickly create lots of synthetic text puzzles to train their basic puzzle solving model. They then fine-tuned it via transfer learning to defeat real text jumbles using only a small set (~500 instead of millions) of actual samples.


  • (Score: 0) by Anonymous Coward on Friday December 07 2018, @03:18PM (#771163)

    For captchas, just use scanned text. Optical Character Recognition can never get those right. Burn -> bum, skill -> skin, kettle -> ke#1e.

    Or maybe we could use this AI software for some decent OCR?

    Google originally did use its scanned text for reCAPTCHA, so people solving the captchas provided feedback to train Google's OCR. Now they train autonomous cars instead.

    On the other hand, captchas are a bit different from OCR on a scan: when you input a candidate solution you get immediate feedback on whether or not it is correct. If it is incorrect, you get to try again, often as many times as necessary. So a bot doesn't need to get it right every time; even if it can solve the captcha only 1% of the time, that's probably good enough for a captcha solver. Whereas if you are doing OCR on a scanned document and only 1% of the words come out correct, that's probably worse than useless.
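The arithmetic behind that 1% figure can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes every attempt is independent with the same success rate, and that the site allows unlimited retries as the comment suggests. The function names `chance_of_success` and `expected_attempts` are made up for this sketch.

```python
def chance_of_success(p, attempts):
    """Probability that at least one of `attempts` tries succeeds.

    Assumes each try is independent and succeeds with probability p
    (an assumption of this sketch, not something measured).
    """
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p):
    """Average number of tries until the first success (geometric distribution)."""
    return 1.0 / p


if __name__ == "__main__":
    p = 0.01  # the "1% of the time" solver from the comment above
    for n in (1, 10, 100, 500):
        print(f"{n:4d} attempts -> {chance_of_success(p, n):.3f} chance of getting in")
    print(f"average attempts needed: {expected_attempts(p):.0f}")
```

Under those assumptions a 1% solver gets through after about 100 tries on average, which is cheap for a bot. A site that locks out or rate-limits after a few wrong answers changes the picture considerably.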