
SoylentNews is people

posted by LaminatorX on Tuesday January 27 2015, @02:34AM
from the crowd-fleecing dept.

It turns out that while you're proving to the web server you're a human, you might also be pitching in to provide one of Google's services to its corporate customers. A woman filed a class action lawsuit against Google last Thursday in US District Court in Massachusetts, alleging that Google's reCAPTCHA service has harvested unpaid image-to-text transcription work from millions of web site visitors. Google markets reCAPTCHA as a service to web site owners; its customers include Facebook, Twitter, and Ticketmaster. Like other CAPTCHA implementations, reCAPTCHA challenges site visitors to type in the text corresponding to a visually distorted word. But reCAPTCHA differs from the others in that its images often contain two distorted words, as noted by the civil complaint:

One of those words is a “known” word, which the website user must enter correctly to access the website as a security measure. That is, because Google already knows what word is being displayed in the first distorted image, if the user enters the word correctly, Google knows the user is likely to be a human, and thus permits the users to continue using the website...

The other of the two words, however, serves no security purpose. The second word is an image with text that Google is attempting to transcribe. The sole purpose of the second word is to require the user to read and transcribe the word for Google’s commercial use and benefit, with no corresponding benefit to the user.

The lawsuit notes that Google makes use of optical character transcription for its own products such as Google Books and Street View, and also provides an archive digitization service to newspapers, including the New York Times.

This was apparently never a dark secret; the use of reCAPTCHA to "crowdsource" digitization of old printed materials was publicized as a feature by both Luis von Ahn (who invented reCAPTCHA as a graduate student at Carnegie Mellon University) and Google (who acquired the reCAPTCHA technology in 2009):

reCAPTCHA technology was developed not merely with an eye toward improving cyber security, but also as a way to harness and reuse the collective human time and mental energy spent solving and typing CAPTCHAs—a concept von Ahn has dubbed “human computation.” By constructing CAPTCHAs using words tagged as unreadable in the digitizing of books and other printed material, millions and millions of cyber users play a part every day in the digitization and preservation of human knowledge by transcribing words. Tests have shown that reCAPTCHA textual images are deciphered and transcribed with 99.1% accuracy, a rate comparable to the best human professional transcription services.
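
The two-word scheme described in the complaint can be sketched roughly as follows. This is only an illustration of the idea, not Google's implementation: the function names, the in-memory vote store, and the `min_votes` agreement threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical in-memory store: unknown-word image id -> tally of user guesses.
votes = {}


def check_challenge(known_answer, known_guess, unknown_id, unknown_guess):
    """Pass the user if the known word matches; if so, record the other guess."""
    passed = known_guess.strip().lower() == known_answer.strip().lower()
    if passed:
        # Only guesses from users who got the control word right are counted.
        tally = votes.setdefault(unknown_id, Counter())
        tally[unknown_guess.strip().lower()] += 1
    return passed


def accepted_transcription(unknown_id, min_votes=3):
    """Return the leading guess once min_votes users agree on it, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    guess, count = tally.most_common(1)[0]
    return guess if count >= min_votes else None
```

In this sketch the known word does the security check, while the unknown word is accepted as a transcription only after several passing users have typed the same thing.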

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3) by darkfeline (1030) on Tuesday January 27 2015, @11:40PM (#138686)

    Yeah, speaking realistically Google will win.

    But I think there's an important issue at stake here. Let's forget about the specifics of this case and look at the essence of the matter:

    A company asks something from you for verification. The company then, either without telling you or telling you via some fine text on some web page or form no one will ever read, takes what you have given them and makes money from it, keeping all of it for themselves.

    Sure, now it's just captcha -> OCR -> public contribution to digitization (and maybe making money from corporate clients). But in the future it may well be "DNA to verify your identity -> sell all your genetic information to third parties for mad profit", or "solve this NP-hard problem real quick to verify you are human -> sell massive human computing service to corporate clients".

    --
    Join the SDF Public Access UNIX System today!

  • (Score: 2) by urza9814 (3954) on Thursday January 29 2015, @04:28PM (#139213)

    A company asks something from you for verification. The company then, either without telling you or telling you via some fine text on some web page or form no one will ever read, takes what you have given them and makes money from it, keeping all of it for themselves.

    ...you do realize there's a big question mark button right on the ReCaptcha element which links to several pages of simple English explanation of exactly what ReCaptcha is and how it works, right? Exactly what more do you want them to do? Should they fill the entire damn login form with disclaimers? I mean honestly, *Soylent News* is less transparent than ReCaptcha...

    Furthermore, it's taking something they're going to make you do anyway to stop spam (which I see nobody complaining about) and actually getting some useful work out of it. And here come the freakin' Luddites like yourself actually *complaining* about people finding a way to do something useful with previously wasted human labor?

    I'm guessing you're also one of those people who complain about self-checkout lanes in stores "stealing jobs"? Maybe you ought to get off the internet and give the USPS some of their work back. Efficiency is a good thing, man, embrace it!