SoylentNews is people

posted by hubie on Thursday February 15 2024, @11:01PM

https://torrentfreak.com/lawsuit-accuses-annas-archive-of-hacking-worldcat-stealing-2-2-tb-data-240207/

American nonprofit OCLC is known globally for its leading database of bibliographic records, WorldCat. A few months ago, many of these records were posted publicly by the shadow library search engine, Anna's Archive. OCLC believes that this is the result of a year-long hack and, with a lawsuit filed at an Ohio federal court, it demands damages.

Anna's Archive is a meta-search engine for book piracy sources and shadow libraries.

Launched in the fall of 2022, just days after Z-Library was targeted in a U.S. criminal crackdown, its self-stated goal is to ensure and facilitate the availability of books and articles to the broader public.

A few months ago, the search engine expanded its offering by making available data from OCLC's proprietary WorldCat database. Anna's Archive scraped several terabytes of data over the course of a year and published roughly 700 million unique records online, for free.

These records contain no copyrighted books or articles. However, they can help to create a to-do list of all missing shadow library content on the web, with the ultimate goal of making as much content publicly available as possible.

[...] It is no secret that publishers fiercely oppose the search engine's stated goals. The same also applies to OCLC, which has now elevated its concerns into a full-blown lawsuit, filed this month at a federal court in Ohio.

The complaint accuses Washington citizen Maria Dolores Anasztasia Matienzo and several "John Does" of operating the search engine and scraping WorldCat data. The scraping is equated to a cyberattack by OCLC and started around the time Anna's Archive launched.

"Beginning in the fall of 2022, OCLC began experiencing cyberattacks on WorldCat.org and OCLC's servers that significantly affected the speed and operations of WorldCat.org, other OCLC products and services, and OCLC's servers and network infrastructure," OCLC's complaint notes.

[...] The complaint recognizes that Anna's Archive doesn't host any copyrighted material. Instead, it links to third-party sources and offers torrent downloads. The WorldCat data is also made available through a torrent, which ultimately leads to 2.2TB of uncompressed records.

"Defendants, through the Anna's Archive domains, have made, and continue to make, all 2.2 TB of WorldCat® data available for public download through its torrents," OCLC writes.

[...] Through the lawsuit, OCLC hopes to stop the site from linking to the WorldCat records. Among other claims, the defendants stand accused of breach of contract, unjust enrichment, tortious interference with contract and business relationships, trespass to chattels, and conversion of property.

As compensation for OCLC's reported injuries, the company seeks damages, including compensatory, exemplary, and punitive damages. At the time of writing, the defendants have yet to respond to the allegations.


  • (Score: 5, Touché) by Anonymous Coward on Friday February 16 2024, @07:33AM (#1344695) (1 child)

    https://annas-archive.org/

    Thanks, OCLC
    regards,
    Ms Streisand.

    • (Score: 0, Redundant) by arubaro (8601) on Friday February 16 2024, @11:05AM (#1344712)

      came to say the same....

  • (Score: 5, Interesting) by janrinok (52) on Friday February 16 2024, @08:17AM (#1344701) (2 children)

    > scraping WorldCat data. The scraping is equated to a cyberattack by OCLC and started around the time Anna's Archive launched.

    It appears that web scraping is now being equated to 'hacking' (in the public understanding and usage of that word). I wonder whether WorldCat had a robots.txt set up and, if so, whether it was correctly configured. Was the data they allege was 'hacked' protected in any way?

    Better get some popcorn...

    --
    I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
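The robots.txt question can be made concrete. robots.txt (standardized as RFC 9309) is a voluntary convention: it tells well-behaved crawlers which paths to avoid, but nothing technically stops a scraper that ignores it. The minimal Python sketch below uses the standard library's `urllib.robotparser` to check whether a URL is permitted and what crawl delay is requested. The robots.txt content, the `/search` path, the `ExampleBot` user agent, the 10-second delay and the example.org URLs are all hypothetical; they are not WorldCat's actual configuration.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT WorldCat's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def check_access(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (allowed, crawl_delay) for url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; it also marks the rules as freshly
    # read, which can_fetch() and crawl_delay() require before they answer.
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    # crawl_delay() returns None when no Crawl-delay line applies to this agent.
    delay = parser.crawl_delay(user_agent)
    return allowed, delay


if __name__ == "__main__":
    # Hypothetical crawler name and placeholder URLs.
    for url in ("https://example.org/search?q=dune", "https://example.org/title/12345"):
        allowed, delay = check_access(SAMPLE_ROBOTS_TXT, "ExampleBot", url)
        print(f"{url}: allowed={allowed}, crawl_delay={delay}")
```

A compliant crawler runs a check like this before each request and waits the requested delay between requests. Because compliance is voluntary, a robots.txt file by itself does not "protect" data in any technical sense; it only signals what the site operator wants.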
  • (Score: 3, Interesting) by pTamok (3042) on Friday February 16 2024, @09:29AM (#1344703)

    The named defendants likely do not have the money to pay lawyers to mount a defence. If they don't respond, there will be a default judgment against them, which will likely ruin their whole day (life).

    If they are not covered by U.S. jurisdiction, they might feel that such a judgment is, in practical terms, irrelevant to them, although it could come back to bite them later if they visit the U.S. or transit through the U.S. on the way somewhere else. They could possibly end up being extradited.

    While I can understand the principle of wanting information to be freely available to all, the approach might be misguided. I will be sorry to see lives ruined.

  • (Score: 2) by Opportunist (5545) on Friday February 16 2024, @09:47AM (#1344706) (3 children)

    Someone I don't know stole from someone I don't know either.

    I have a hunch that happens a couple thousand times every single hour.

    • (Score: 0) by Anonymous Coward on Friday February 16 2024, @12:55PM (#1344715) (2 children)

      > Someone I don't know stole from someone I don't know either.

      I can see how you wouldn't know Anna's Archive; it's fairly new. But WorldCat has been showing up in book searches for ages. https://en.wikipedia.org/wiki/WorldCat claims it was

      > Launched January 21, 1998; 26 years ago (date of registry of the new domain name; the database already existed since 1971)[2]

      Is it possible that you haven't searched for a book in the last 20+ years?

      • (Score: 3, Touché) by Opportunist (5545) on Friday February 16 2024, @03:42PM (#1344731)

        The books I read are usually the kind where you know exactly who wrote them and what the topic is, and you often even get the ISBN because they want you to order them. So no, I haven't needed to search for a book in at least 20 years.

      • (Score: 0) by Anonymous Coward on Friday February 16 2024, @03:58PM (#1344736)

        Maybe so, but this is the first I've ever heard of WorldCat. Anna's Archive, however new it is in comparison, already feels like an old friend.

        I'm of the opinion that WorldCat's actions here aren't going to endear them to anyone other than the publishers, nor gain their offerings in this game any sustained extra traffic: maybe an initial burst of 'who the fuck are these mamzers?' curiosity, but that's it.

  • (Score: 2) by Freeman (732) on Friday February 16 2024, @04:27PM (#1344740)

    Despite everything quoted below, I do use OCLC/WorldCat quite a bit. I would be much happier if they were a bit more open, though, as with all corporations, they're in it for the money.
    --
    WorldCat is a giant public book catalog maintained by OCLC.

    > It was founded in 1967 as the Ohio College Library Center, then became the Online Computer Library Center as it expanded. In 2017, the name was formally changed to OCLC, Inc.[4] OCLC and thousands of its member libraries cooperatively produce and maintain WorldCat, the largest online public access catalog in the world.[5] OCLC is funded mainly by the fees that libraries pay (around $217.8 million annually in total as of 2021) for the many different services it offers.[3] OCLC also maintains the Dewey Decimal Classification system.

    Valid criticism: OCLC may have lost its way, just like all big businesses:

    > In May 2008, OCLC was criticized by Jeffrey Beall for monopolistic practices, among other faults.[64] Library blogger Rick Mason responded that although he thought Beall had some "valid criticisms" of OCLC, he demurred from some of Beall's statements and warned readers to "beware the hyperbole and the personal nature of his criticism, for they strongly overshadow that which is worth stating".[65]

    > In November 2008, the Board of Directors of OCLC unilaterally issued a new Policy for Use and Transfer of WorldCat Records[66] that would have required member libraries to include an OCLC policy note on their bibliographic records; the policy caused an uproar among librarian bloggers.[67][68] Among those who protested the policy was the non-librarian activist Aaron Swartz, who believed the policy would threaten projects such as the Open Library, Zotero, and Wikipedia, and who started a petition to "Stop the OCLC powergrab".[69][70] Swartz's petition garnered 858 signatures, but the details of his proposed actions went largely unheeded.[68] Within a few months, the library community had forced OCLC to retract its policy and to create a Review Board to consult with member libraries more transparently.[68] In August 2012, OCLC recommended that member libraries adopt the Open Data Commons Attribution (ODC-BY) license when sharing library catalog data, although some member libraries have explicit agreements with OCLC that they can publish catalog data using the CC0 Public Domain Dedication.[71][72]

    I had forgotten that the thing that led to Aaron Swartz's suicide was his systematic downloading of academic journal articles from JSTOR.
    https://docs.jstor.org/

    > What Mr. Swartz did was extremely serious from our perspective. Following his arrest, we made contact with Mr. Swartz and learned that he had retained and was prepared to return the copies of all the articles that he had downloaded, and we entered into a civil settlement with him. We told the United States Attorney’s Office that we had no further interest in the matter and did not want to press charges. Subsequently a criminal case was brought against Mr. Swartz by the United States Attorney’s Office, and he was indicted on felony charges in July 2011.

    From the sound of it, JSTOR was being reasonable and wasn't responsible for what happened to Aaron Swartz. That's great, because JSTOR is an awesome resource. JSTOR also seems to be the least predatory of the ways for libraries to get "permanent" access to academic journals.

    Federal prosecutors on a witch/hacker hunt?

    > Days before Swartz's funeral, Lawrence Lessig eulogized his friend and sometime-client in an essay, "Prosecutor as Bully." He decried the disproportionality of Swartz's prosecution and said, "The question this government needs to answer is why it was so necessary that Aaron Swartz be labeled a 'felon'. For in the 18 months of negotiations, that was what he was not willing to accept."[120] Cory Doctorow wrote, "Aaron had an unbeatable combination of political insight, technical skill, and intelligence about people and issues. I think he could have revolutionized American (and worldwide) politics. His legacy may still yet do so."[121]

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
  • (Score: 3, Informative) by Freeman (732) on Friday February 16 2024, @04:35PM (#1344741) (2 children)

    https://en.wikipedia.org/wiki/Anna%27s_Archive

    > Anna's Archive is a search engine for shadow libraries. It was founded by the Pirate Library Mirror, a team of anonymous archivists, in direct response to law enforcement efforts to close down Z-Library in 2022.[1][4][5][6][7] It describes itself as a project that aims to "catalog all the books in existence" and to "track humanity's progress toward making all these books easily available in digital form".[4][8][9][dubious – discuss]

    > Anna's Archive mirrors Library Genesis, Open Library, Sci-Hub and Z-Library.[10][11][5] Anna's Archive says that it does not host copyrighted materials and that it only indexes metadata that is already publicly available.[4][12]

    > As of February 1, 2024, Anna's Archive includes 25,530,302 books and 99,425,822 papers.[13]

    They seem to sit in a gray area, but "hacking" OCLC isn't a good thing. If all they did was scrape WorldCat, though, I hesitate to call that "hacking", as that's not what most people mean by the word.

    They weren't being quiet about it in any case.
    https://torrentfreak.com/annas-archive-scraped-worldcat-to-help-preserve-all-books-in-the-world-231003/

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
    • (Score: 1, Insightful) by Anonymous Coward on Friday February 16 2024, @08:24PM (#1344807) (1 child)

      It's fake legalese "hacking". The data they scraped arguably shouldn't be copyrightable either.

      • (Score: 1, Insightful) by Anonymous Coward on Thursday February 22 2024, @08:27PM (#1345720)

        You can tell by the wording they are not going after them for copyright, or not only copyright at least. "Hacking" means they want to bring in criminal "CFAA" charges. Probably because it worked so well on Aaron.
