American nonprofit OCLC is known globally for its leading database of bibliographic records, WorldCat. A few months ago, many of these records were posted publicly by the shadow library search engine, Anna's Archive. OCLC believes that this is the result of a year-long hack and, with a lawsuit filed at an Ohio federal court, it demands damages.
Anna's Archive is a meta-search engine for book piracy sources and shadow libraries.
Launched in the fall of 2022, just days after Z-Library was targeted in a U.S. criminal crackdown, its self-stated goal is to ensure and facilitate the availability of books and articles to the broader public.
A few months ago, the search engine expanded its offering by making available data from OCLC's proprietary WorldCat database. Anna's Archive scraped several terabytes of data over the course of a year and published roughly 700 million unique records online, for free.
These records contain no copyrighted books or articles. However, they can help to create a to-do list of all missing shadow library content on the web, with the ultimate goal of making as much content publicly available as possible.
[...] It is no secret that publishers fiercely oppose the search engine's stated goals. The same also applies to OCLC, which has now elevated its concerns into a full-blown lawsuit, filed this month at a federal court in Ohio.
The complaint accuses Washington citizen Maria Dolores Anasztasia Matienzo and several "John Does" of operating the search engine and scraping WorldCat data. OCLC equates the scraping to a cyberattack and says it started around the time Anna's Archive launched.
"Beginning in the fall of 2022, OCLC began experiencing cyberattacks on WorldCat.org and OCLC's servers that significantly affected the speed and operations of WorldCat.org, other OCLC products and services, and OCLC's servers and network infrastructure," OCLC's complaint notes.
[...] The complaint recognizes that Anna's Archive doesn't host any copyrighted material. Instead, it links to third-party sources and offers torrent downloads. The WorldCat data is also made available through a torrent, which ultimately leads to 2.2TB of uncompressed records.
"Defendants, through the Anna's Archive domains, have made, and continue to make, all 2.2 TB of WorldCat® data available for public download through its torrents," OCLC writes.
[...] Through the lawsuit, OCLC hopes to stop the site from linking to the WorldCat records. Among other claims, the defendants stand accused of breach of contract, unjust enrichment, tortious interference of contract and business relationships, trespass to chattels, and conversion of property.
As compensation for OCLC's reported injuries, the company seeks damages, including compensatory, exemplary, and punitive damages. At the time of writing, the defendants have yet to respond to the allegations.
(Score: 5, Touché) by Anonymous Coward on Friday February 16 2024, @07:33AM (1 child)
https://annas-archive.org/ [annas-archive.org]
Thanks, OCLC
regards,
Ms Streisand.
(Score: 0, Redundant) by arubaro on Friday February 16 2024, @11:05AM
came to say the same....
(Score: 5, Interesting) by janrinok on Friday February 16 2024, @08:17AM (2 children)
It appears that web scraping is now being equated to 'hacking' (using the public understanding and usage of that word). I wonder if WorldCat had a robots.txt set up and, if so, was it correctly configured? Was the data that they are alleging was 'hacked' protected in any way?
Better get some popcorn...
I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
(Score: 4, Informative) by The Archon V2.0 on Friday February 16 2024, @01:41PM
Not the first time "hacking" is used to mean "using my site in a way I did not intend". For instance, the View Source command:
https://arstechnica.com/tech-policy/2021/10/missouri-gov-calls-journalist-who-found-security-flaw-a-hacker-threatens-to-sue/ [arstechnica.com]
(Score: 3, Interesting) by bzipitidoo on Friday February 16 2024, @03:12PM
I also take exception to this too common use of the word "stealing". What exactly did WorldCat lose? Nothing!
(Score: 3, Interesting) by pTamok on Friday February 16 2024, @09:29AM
The named defendants likely do not have the money to pay lawyers to mount a defence. If they don't respond, there will be a default judgment against them, which will likely ruin their whole day (life).
If they are not covered by U.S. jurisdiction, they might feel that such a judgment is, in practical terms, irrelevant to them, although it could come back to bite them later if they visit the U.S. or travel via the U.S. to somewhere. They could possibly end up being extradited.
While I can understand the principle of wanting information to be freely available to all, the approach might be misguided. I will be sorry to see lives ruined.
(Score: 2) by Opportunist on Friday February 16 2024, @09:47AM (3 children)
Someone I don't know stole from someone I don't know either.
I have a hunch that happens a couple thousand times every single hour.
(Score: 0) by Anonymous Coward on Friday February 16 2024, @12:55PM (2 children)
> Someone I don't know stole from someone I don't know either.
I can see how you wouldn't know Anna's Archive, it's fairly new. But WorldCat has been showing up in book searches for ages, https://en.wikipedia.org/wiki/WorldCat [wikipedia.org] claims it was
Is it possible that you haven't searched for a book in the last 20+ years?
(Score: 3, Touché) by Opportunist on Friday February 16 2024, @03:42PM
The books I read are usually of the kind where you know exactly who wrote it and what's the topic, if you don't even get to know the ISBN because they want you to order it. So no, I didn't have the need to search for a book in at least 20 years.
(Score: 0) by Anonymous Coward on Friday February 16 2024, @03:58PM
Maybe so, but this is the first I've ever heard of Worldcat...Anna's Archive, however new it is in comparison, already feels like an old friend.
I'm of the opinion that Worldcat's actions here aren't going to endear them to anyone other than the publishers, nor gain their offerings in this game any sustained extra traffic; an initial burst of 'who the fuck are these mamzers?' curiosity maybe, but that's it.
(Score: 2) by Freeman on Friday February 16 2024, @04:27PM
With everything said below, I do use OCLC/WorldCat quite a bit. I would be much happier if they were a bit more open. Though, as with all corporations, they're in it for the money.
--
WorldCat is a giant Public Book Catalog that is maintained by OCLC.
Valid Criticism / OCLC may have lost their way along the way, just like all big businesses:
I forgot that the thing that led to Aaron Swartz's suicide was his systematic download of academic journal articles from JSTOR.
https://docs.jstor.org/ [jstor.org]
From the sound of it, JSTOR was being reasonable and wasn't responsible for what happened to Aaron Swartz. Which is great, because JSTOR is an awesome resource. JSTOR also seems to be the least predatory of the ways for libraries to get "permanent" access to academic journals.
Federal prosecutors on a witch/hacker hunt?
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 3, Informative) by Freeman on Friday February 16 2024, @04:35PM (2 children)
https://en.wikipedia.org/wiki/Anna%27s_Archive [wikipedia.org]
They seem to sit in a gray area, but "hacking" OCLC isn't a good thing. In the event that all they did was scrape WorldCat, I hesitate to call that "hacking" as that's not generally what normal people would call hacking.
They weren't being quiet about it in any case.
https://torrentfreak.com/annas-archive-scraped-worldcat-to-help-preserve-all-books-in-the-world-231003/ [torrentfreak.com]
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 1, Insightful) by Anonymous Coward on Friday February 16 2024, @08:24PM (1 child)
It's fake legalese "hacking". The data they scraped arguably shouldn't be copyrightable either.
(Score: 1, Insightful) by Anonymous Coward on Thursday February 22 2024, @08:27PM
You can tell by the wording they are not going after them for copyright, or not only copyright at least. "Hacking" means they want to bring in criminal "CFAA" charges. Probably because it worked so well on Aaron.