
SoylentNews is people

posted by hubie on Wednesday December 31, @05:35AM

https://therecord.media/spotify-disables-scraping-annas

Spotify responded on Monday to an open-source group's decision to publish files over the weekend containing 86 million tracks scraped from the music streaming platform.

Anna's Archive, which calls itself the "largest truly open library in human history," said on Saturday that it discovered a way to scrape Spotify's files and subsequently released a database of metadata and songs.

A spokesperson for Spotify told Recorded Future News that it "has identified and disabled the nefarious user accounts that engaged in unlawful scraping."

"We've implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior," the spokesperson said. "Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights."

The spokesperson added that Anna's Archive did not contact Spotify before publishing the files, and that the company does not consider the incident a "hack" of Spotify. The people behind the leaked database systematically violated Spotify's terms by stream-ripping some of the music from the platform over a period of months, the spokesperson said.

They did this through user accounts set up by a third party and not by accessing Spotify's business systems, the spokesperson added.

Anna's Archive published a blog post about the cache this weekend, writing that while it typically focuses its efforts on text, its mission to preserve humanity's knowledge and culture "doesn't distinguish among media types."

"Sometimes an opportunity comes along outside of text. This is such a case. A while ago, we discovered a way to scrape Spotify at scale. We saw a role for us here to build a music archive primarily aimed at preservation," they said.

"This Spotify scrape is our humble attempt to start such a 'preservation archive' for music. Of course Spotify doesn't have all the music in the world, but it's a great start."

While the full release contains a music metadata database with 256 million tracks, Anna's Archive put together a bulk file a little under 300 terabytes in size featuring 86 million music files that account for about 99.6% of all listens on Spotify. There is another smaller file featuring the top 10,000 most popular songs.

The files cover all music posted on Spotify from 2007 to July 2025. Anna's Archive called it "by far the largest music metadata database that is publicly available."

"With your help, humanity's musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts, and other catastrophes," the organization said.

The blog post outlines distinct trends from Spotify data. The top three songs on Spotify — Billie Eilish's "Birds of a Feather," Lady Gaga's "Die with a Smile" and Bad Bunny's "DtMF" — have a higher total stream count than the bottom 20-100 million songs combined.

Anna's Archive, which is banned in several countries for its repeated copyright violations, was created in the wake of the law enforcement shutdown of Z-Library in 2022. The Justice Department arrested and charged two Russian nationals in 2022 for running Z-Library, which at the time was "the world's largest library" and claimed to have at least 11 million e-books for download.

Anna's Archive emerged days after Z-Library was shut down and aggregated records from that site as well as several other free online libraries like the Internet Archive, Library Genesis and Sci-Hub.

As of December, Anna's Archive has more than 61 million books and 95 million papers. Copyright holders in multiple countries have tried to sue the organization, and Google in November said it removed nearly 800 million links to Anna's Archive from its search engine after publishers issued takedown requests.



This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Insightful) by Mojibake Tengu on Wednesday December 31, @07:30AM

    by Mojibake Tengu (8598) on Wednesday December 31, @07:30AM (#1428330) Journal

    You can't make money by selling information without communicating that information.
    You simply must transfer it some way to the buyer, and when that happens, all the unavoidable mathematics from Shannon's Theory come into play.

    Because of that, it makes sense to sell only ephemeral information that is valid at a certain point in time: either critically fresh, or of a kind that becomes more and more irrelevant as time passes.

    Selling static information (such as Artwork or Software) in Information Age is Fools' Business.
    I have no pity for Spotify. The crowd public just operated smart as I expected.

    --
    Rust programming language offends both my Intelligence and my Spirit.
  • (Score: 5, Informative) by Bentonite on Wednesday December 31, @08:38AM

    by Bentonite (56146) on Wednesday December 31, @08:38AM (#1428334)

    The term "open-source" keeps being misused, and people still keep referring to free software and also proprietary software as "open-source".

    This is a case of unauthorized copying of a large amount of music tracks and making them available for download (an act of prohibited copying, as such constitutes copyright infringement) - but such is an "open-source" activity - despite how none of the source files or the audio files are available?

  • (Score: 4, Informative) by bzipitidoo on Wednesday December 31, @08:42AM

    by bzipitidoo (4388) on Wednesday December 31, @08:42AM (#1428335) Journal

    How many songs ever chart? I should guess that over the entire history of recorded music, there haven't been 256 million songs that made the top 100. Obscure artists should be thrilled at the attention.

    As to the industry, their gaslighting and loaded terminology isn't working. "nefarious" user accounts, huh? Please.

  • (Score: 5, Interesting) by looorg on Wednesday December 31, @10:52AM

    by looorg (578) on Wednesday December 31, @10:52AM (#1428337)

    I guess most normal users won't download 300 terabytes to get to their tunes. But it will be interesting to see if any, or more, Spotify clones appear. Still, it's not just about the music but also the infrastructure. So perhaps this won't really amount to all that much. After all, the music was already downloadable before, and I doubt too many will be building their own Spotify clone.

    There has also been that long-running "rumor" that Spotify themselves started by downloading a craptastic amount of music from the Pirate Bay to initially fill their prototype. So perhaps things just came around to the beginning again ... what goes around, comes around.

  • (Score: 5, Informative) by jahaven on Wednesday December 31, @01:26PM (3 children)

    by jahaven (12434) on Wednesday December 31, @01:26PM (#1428347)

    Check the data that it generated:
    https://annas-archive.org/blog/backing-up-spotify.html
    I think Spotify is more worried about the data they downloaded about the tracks.

    • (Score: 2) by corey on Wednesday December 31, @09:38PM (2 children)

      by corey (2202) on Wednesday December 31, @09:38PM (#1428377)

      I had a look at that post but couldn’t see anything which Spotify would be wanting to hide?

      • (Score: 2, Insightful) by Anonymous Coward on Thursday January 01, @04:13AM

        by Anonymous Coward on Thursday January 01, @04:13AM (#1428396)

        Well, if they have been lying to artists about the number of plays but keeping the correct number in the database they might be in a bit of trouble. But what are the chances a large corporation would do anything dishonest to make a bigger profit?

      • (Score: 2) by Bentonite on Thursday January 01, @04:33AM

        by Bentonite (56146) on Thursday January 01, @04:33AM (#1428399)

        Companies like Spotify often want to hide and restrict absolutely everything - spamming NDAs galore, even for boring information that is trivial to get (i.e. the metadata of a music track).
