canopic jug writes:
David Rosenthal discusses the last 25 years of digital preservation efforts for academic journals. It is a long-standing problem, and discontinued journals continue to disappear from the Internet. Paper, microfilm, and microfiche degrade slowly and are decentralized and distributed. Digital media disappear quickly, and digital publications usually exist in only a single physical place, creating a single point of failure. It takes continuous, unbroken effort and money to keep digital publications accessible, even if only one person or institution wishes to retain access. He reviews the last few decades of academic publishing and how we got here, then draws four lessons about preservation, especially in regards to Open Access publishing:
> Lesson 1: libraries won't pay enough to preserve even subscription content, let alone open-access content. [...]
> Lesson 2: No-one, not even librarians, knows where most of the at-risk open-access journals are. [...]
> Lesson 3: The production preservation pipeline must be completely automated. [...]
> Lesson 4: Don't make the best be the enemy of the good. I.e. get as much as possible with the available funds, don't expect to get everything.
He posits that focus should be on the preservation of the individual articles, not the journals as units.
(2020) Internet Archive Files Answer and Affirmative Defenses to Publisher Copyright Infringement Lawsuit
(2018) Vint Cerf: Internet is Losing its Memory
(2014) The Importance of Information Preservation
> It takes continuous, unbroken effort and money to keep digital publications accessible even if only one person or institution wishes to retain access.
This sounds like it was paid for by the academic journal cabal. Make it legal to share and copy these, and let all the universities and researchers set up torrent seedboxes. I doubt all of the academic papers in the world together take up more space than a modern high quality full length movie.
Anything that relies on dynamically maintained documents is dubious for archival data. CDs are better than DVDs, because they are more robust against damage/deterioration.
The problem is the difference between easy access and good archival quality, and the answer should be "use different media". Also ANYTHING that depends on encrypted keys being kept available is right out the window. It's totally useless for archival data. (Even if you can break the encryption, it makes it a lot more subject to errors causing the whole thing to be unreadable.)
The problem is CDs aren't stable over long periods of time. They're good for multiple decades if handled carefully, but they are inherently unstable, so they probably won't hold up for a century even in ideal conditions. (This isn't inherently true, but it's true for the versions that could be written by a home computer.) Microfiche were a lot better in this regard, but reading them by computer was a real problem.
The thing is, there hasn't been a lot of work done on producing archival quality media. There's little reward when you produce it, because most customers are more interested in ease of use, and lasting "long enough". Currently probably the best choice for large quantities of data is removable disk drives, but that's hardly archival quality. It lasts a decade or two if there aren't any unexpected problems. After that recovering the data is likely to be a major project, requiring opening the sealed drive, replacing the lubricants, and resealing it...at best.
I agree that it should be legal to share, copy, and re-distribute articles indefinitely. Torrents (when not centralized) would be one easy, currently existing publication technology, but what mechanism do you propose to ensure the authenticity and general integrity of said documents? The situation we have now is that they are sourced from a single web site. While that ensures authenticity, it also introduces a single point of failure. If we encourage a distributed model, which we should and which is long overdue, then you have the problem of making sure that the article and its contents have not changed, either by accident or on purpose.
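One existing answer is the mechanism BitTorrent itself already relies on: content addressing. If a trusted party publishes a cryptographic digest for each article, any peer can re-verify its copy before re-seeding it. A minimal sketch in Python (names and sample bytes are illustrative; authenticating the digest itself still requires a signature or a trusted distribution channel):

```python
# Sketch of integrity checking for a distributed article archive, assuming
# each article is accompanied by a SHA-256 digest published by a trusted
# source (e.g. pinned in a torrent's info dictionary or a signed manifest).
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the article bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()

def verify_article(data: bytes, expected_digest: str) -> bool:
    """True if the downloaded copy matches the published digest."""
    return sha256_hex(data) == expected_digest

# A peer re-hashes what it downloaded before passing it on.
article = b"%PDF-1.7 ... article contents ..."  # hypothetical file contents
digest = sha256_hex(article)                    # published alongside the article
assert verify_article(article, digest)
assert not verify_article(article + b"tampered", digest)
```

This catches both accidental corruption and deliberate tampering with the content, but not a forged digest; for that, the digest list would need to be signed by the publisher or another party the downloader trusts.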
Blockchain! That's the answer to everything!