
SoylentNews is people

posted by Fnord666 on Sunday September 20 2020, @05:20PM
from the bitrot dept.

David Rosenthal discusses the last 25 years of digital preservation efforts for academic journals. It's a long-standing problem, and discontinued journals continue to disappear from the Internet. Paper, microfilm, and microfiche degrade slowly and are decentralized and distributed. Digital media disappear quickly, and a digital publication usually lives in a single physical place, which makes that place a single point of failure. Keeping digital publications accessible takes continuous, unbroken effort and money, even if only one person or institution wishes to retain access. He reviews the last few decades of academic publishing and how we got here, then draws four lessons about preservation, especially with regard to Open Access publishing.

Lesson 1: libraries won't pay enough to preserve even subscription content, let alone open-access content.

[...] Lesson 2: No-one, not even librarians, knows where most of the at-risk open-access journals are.

[...] Lesson 3: The production preservation pipeline must be completely automated.

[...] Lesson 4: Don't make the best be the enemy of the good. I.e. get as much as possible with the available funds, don't expect to get everything.

He posits that the focus should be on preserving individual articles, not journals as units.

Previously:
(2020) Internet Archive Files Answer and Affirmative Defenses to Publisher Copyright Infringement Lawsuit
(2018) Vint Cerf: Internet is Losing its Memory
(2014) The Importance of Information Preservation


  • (Score: 2) by bzipitidoo (4388) on Monday September 21 2020, @02:44PM (#1054432)

    > PDF ... was designed for the print industry.

    Yes, a digital format designed for print. You have to appreciate the irony in that.

    Thanks for mentioning Pitstop. I didn't know of it. The tools I know are the libre ones, such as Okular, Evince, pdftohtml, pdftotext, and several other command-line tools whose names start with "pdf". Of course, none of them can quite do everything the commercial tools from Adobe can do. For instance, Okular can add text, but it can't do a proper job of adding images: it can insert them, but only in a non-standard way that only Okular can read. And its text insertion doesn't use the fillable-forms feature that was added to the PDF standard; it simply inserts additional text. Not that the blank form the other business made available used fillable forms either. Those are still pretty rare, which pretty much forces users into the hackish text insertion that Okular does.

    Speaking of other functionality, the business I'm working with is trying to use PDF as an all-purpose digital document format. They need to fill in and sign documents that are often provided only as PDFs. That's another weirdness of the business world: the other side must have the source documents they used to create the PDF, but they behave as if handing those out would give away trade secrets or something. The exact same form in PDF is okay, but the docx original is top secret! Why, if you had the source, your business might just copy their business's document, and, and, use it! Not that docx is a great format for business either.

    So the easiest way to sign a document is to print it out, sign the paper by hand, then scan it to PDF. (Some Very Important People sign documents so often that they've had a custom rubber stamp of their signature made. Yes, a literal rubber stamp.) That can also be the easiest, fastest way to fill out a form. Of course, that loses all the text, since the scanner treats the scan as a simple raster image. But that's still the best way to scan a text document, because OCR is not reliable enough to be trusted with automatically converting a raster image back to text, and businesses sure don't want to spend the time and money to have a person check and correct OCR errors. So there's another factor that bloats the heck out of document storage. Take a sheet that was 20k of text and turn it into 350k or more of raster image, woohoo! Good for sellers of digital storage.
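    To put the 20k-versus-350k comparison in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a US Letter page (8.5 x 11 in) scanned at 300 dpi and computes the uncompressed raster size at a few common bit depths; the page size and resolution are assumptions, not figures from the comment. Since even a raw 1-bit scan comes out at about a megabyte, the ~350k figure presumably already reflects some compression in the scan-to-PDF step.

```python
# Assumption: US Letter page (8.5 x 11 in) scanned at 300 dpi, no compression.
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed raster size in bytes for one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


TEXT_BYTES = 20_000  # the comment's "20k of text" for the same sheet

if __name__ == "__main__":
    for label, bpp in [("1-bit bilevel", 1), ("8-bit grayscale", 8), ("24-bit color", 24)]:
        size = raw_scan_bytes(8.5, 11, 300, bpp)
        # Compare the raw image against the plain-text size of the same page.
        print(f"{label}: {size:,} bytes, about {size / TEXT_BYTES:.0f}x the text")
```

    Lossless bilevel compression such as CCITT Group 4, which PDF supports, shrinks a 1-bit page considerably, but the result is still an image rather than searchable text.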

    Adobe has added digital signatures to the PDF standard. I'm not clear on what they mean by that; it sounds like a joke. We're going to read a handwritten signature from one of those crappy pen tablets typically attached to credit card machines, digitally sign it with some sort of DRM-like, self-certified public key that is worthless for proving a signature is genuine precisely because it is self-certified, and pass that off as a "secure" digital signature. What's more, we're just going to trust that the signature we're digitally signing is not a forgery. Maybe Adobe even tells businesses how to do it right and make it genuinely secure, but that involves too much work, so, wink, wink, businesses skip it and take the easy, insecure route. Oh well, handwritten signatures always were extremely weak proof that a document has been accepted and endorsed, so it's not as if Adobe and its customers taking shortcuts has made matters any worse. I'm particularly amused by the software that doesn't even use a touch tablet (even when it's running on such a device) and instead gives users a choice of several handwriting fonts. I guess you're supposed to pick whichever font looks closest to your actual handwriting, but no one even asks for that.

  • (Score: 2) by deimtee (3272) on Tuesday September 22 2020, @02:39AM (#1054760)

    > So the easiest way to sign a document is to print it out, sign the paper by hand, then scan it, to PDF. (Some Very Important People sign documents so often they've had created a custom rubber stamp of their signature. Yeah, the literal rubber stamp.) That can also be the easiest, fastest way to fill out a form. Of course that loses all the text, thanks to the scanner treating the scan as a simple raster image. But that's the best way to scan a text document because OCR is not reliable enough to be trusted with an automatic conversion of a raster image back to text.

    I was a variable data specialist for a while. I've been out of the print industry a few years now, but one way I used to add things to PDFs was to place the PDF as a full-page image in another [Word|Quark|InDesign] document, add text boxes or pictures as required, then print that to the PDF driver (Acrobat Distiller) to make a new PDF incorporating the new elements.

    Seems like you could almost automate that. Drop a PDF form in a hot folder, have Office or whatever open a new document and place the form in it as the background, paste in a picture of your signature (black on a transparent background), pause at that point to let you drag and resize the signature, then click another custom script button to create the new PDF (called "DocumentName_signed.pdf") and drop it in your signed-documents output folder. You could even have it open in a PDF reader so you can check that it worked.
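    The hot-folder part of this idea can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the function names (`signed_name`, `process_hot_folder`, `copy_stamp`) are hypothetical, and the actual signature overlay is left as a callback, because that step depends on whichever layout program or PDF library you use. The placeholder `copy_stamp` just copies the file unchanged so the folder logic can be exercised.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stamping step: a real version would overlay the signature image
# on the page (via a layout program or PDF library). Here it is just a parameter.
StampFn = Callable[[Path, Path], None]


def signed_name(src: Path) -> str:
    """Map 'DocumentName.pdf' to 'DocumentName_signed.pdf'."""
    return f"{src.stem}_signed{src.suffix}"


def process_hot_folder(inbox: Path, outbox: Path, stamp: StampFn) -> list[Path]:
    """Stamp every PDF in `inbox` that has no signed copy in `outbox` yet.

    Returns the list of newly written output files.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / signed_name(src)
        if dst.exists():
            continue  # already signed on an earlier pass
        stamp(src, dst)
        written.append(dst)
    return written


def copy_stamp(src: Path, dst: Path) -> None:
    """Placeholder stamp: copies the file unchanged (adds no real signature)."""
    shutil.copyfile(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox, outbox = Path(tmp, "in"), Path(tmp, "out")
        inbox.mkdir()
        (inbox / "DocumentName.pdf").write_bytes(b"%PDF-1.4\n")
        print([p.name for p in process_hot_folder(inbox, outbox, copy_stamp)])
```

    A real hot folder would call `process_hot_folder` periodically (for example in a loop with `time.sleep`) or from a file-system watcher. Skipping files that already have a `_signed` copy keeps repeated passes from re-stamping the same document.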

    --
    If you cough while drinking cheap red wine it really cleans out your sinuses.