
SoylentNews is people

posted by martyb on Friday May 08 2015, @08:47AM
from the it-FITS-our-needs dept.

The UK's V3 news site reports that the Vatican library considers open source file formats to be the only reliable way for humanity to preserve its history in the digital age.

Vatican Library CIO Luciano Ammenti said that, in order for the manuscripts to be readable, the Vatican Library opted for open source tools that do not require proprietary platforms, such as Microsoft Office, to be read.

"We save it as a picture as it's longer life than a file. You don't rely on PowerPoint or Word. In 50 years they can still just look at it," he said.

"Normally people try to use the TIFF format [when archiving]. This has several problems. It's not open source and it doesn't update. The last time was in 1998.

"On top of this it's 32-bit and not ready for 3D imaging, which limits the information it can preserve - what the script's made of etc. So instead we use the FITS format. FITS is open source, 64-bit, 3D ready and updated regularly. It gives all the information you need on the image."

What formats have you found best for archiving? Which have given you problems?
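
For readers unfamiliar with FITS, the following is a minimal sketch of why its headers are easy to inspect: they are plain ASCII "cards" of 80 characters, padded to 2880-byte blocks. The helper names (`make_card`, `build_header`, `parse_header`) are made up for illustration, the parser ignores string values containing slashes, and reading BITPIX = -64 (64-bit floating point samples) as the "64-bit" of the quote is only an assumption.

```python
CARD = 80      # bytes per FITS header card
BLOCK = 2880   # headers are padded to a multiple of this size


def make_card(key: str, value: str) -> str:
    """Format one fixed-format card: keyword, '= ', value right-justified to column 30."""
    return f"{key:<8}= {value:>20}".ljust(CARD)


def build_header(bitpix: int, width: int, height: int) -> bytes:
    """Build a minimal primary header for a 2D image (illustrative only)."""
    cards = [
        make_card("SIMPLE", "T"),
        make_card("BITPIX", str(bitpix)),
        make_card("NAXIS", "2"),
        make_card("NAXIS1", str(width)),
        make_card("NAXIS2", str(height)),
        "END".ljust(CARD),
    ]
    text = "".join(cards)
    padded_len = -(-len(text) // BLOCK) * BLOCK  # round up to a whole block
    return text.ljust(padded_len).encode("ascii")


def parse_header(raw: bytes) -> dict:
    """Read keyword/value pairs card by card until the END card."""
    header = {}
    for i in range(0, len(raw), CARD):
        card = raw[i:i + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] == "= ":
            # Drop any trailing comment; naive, breaks on strings containing '/'
            header[key] = card[10:].split("/")[0].strip()
    return header


header = parse_header(build_header(-64, 100, 50))
print(header["BITPIX"], header["NAXIS1"], header["NAXIS2"])
```

Because the header is text, a future reader with nothing but the bytes and a description of the card layout could recover the image dimensions and sample type by eye.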

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Insightful) by FatPhil on Friday May 08 2015, @09:34AM

    by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Friday May 08 2015, @09:34AM (#180251) Homepage
    That makes no sense. File formats can have an openly published specification. Software which implements reading and writing of files in that format may be open source or closed source.

    To complicate matters, the specification even if openly published may be patent encumbered. (A state I, as someone who abhors software patents, consider absurd - a specification is a list of facts about what is and isn't a valid file in that format, it should not be patentable. If you don't want people using the format, don't publish the specification, and hope nobody reverse engineers it. (And hopefully, nobody will use such formats.))

    And what's wrong with TIFF, apart from the fact that it's powerful enough to trivially permit proprietary extensions? There are open source implementations of it - so I'd say TIFF is open source. If you don't understand a tag, you ignore it. Of course, they shouldn't be creating TIFFs that contain proprietary extensions, but such files are still just as much TIFF format as any other TIFF file. And what does "it doesn't update" even mean? I want a standard that doesn't update. That's what makes it standard. Updating - enforced for compatibility reasons - is the major reason that proprietary formats and programs are so bad: they were always changing, and reverse engineered or open source alternatives were always playing catch-up.
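
The "if you don't understand a tag, you ignore it" rule can be sketched roughly as follows. This is a toy reader, not a real TIFF library: it only handles single inline SHORT or LONG values in the first image file directory, the function names are invented for the example, and tag 65000 stands in for a hypothetical private extension.

```python
import struct

# A few baseline TIFF tag numbers; anything else is treated as unknown.
KNOWN_TAGS = {256: "ImageWidth", 257: "ImageLength",
              258: "BitsPerSample", 339: "SampleFormat"}
SHORT, LONG = 3, 4  # TIFF field type codes


def read_known_tags(data: bytes) -> dict:
    """Return known tags from the first IFD, skipping tags we don't understand."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # little- or big-endian file
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if tag not in KNOWN_TAGS:
            continue  # unknown or private tag: ignore it and carry on
        if n != 1 or ftype not in (SHORT, LONG):
            continue  # this sketch only decodes single inline values
        if ftype == SHORT:
            (value,) = struct.unpack(order + "H", value_field[:2])
        else:
            (value,) = struct.unpack(order + "I", value_field)
        tags[KNOWN_TAGS[tag]] = value
    return tags


def build_sample() -> bytes:
    """Build a tiny little-endian header plus IFD, including one private tag."""
    entries = [
        (256, SHORT, 640),
        (257, SHORT, 480),
        (258, SHORT, 16),
        (65000, LONG, 12345),  # hypothetical proprietary extension
    ]
    out = b"II" + struct.pack("<HI", 42, 8)  # byte order, magic, IFD offset
    out += struct.pack("<H", len(entries))
    for tag, ftype, value in entries:
        if ftype == SHORT:
            out += struct.pack("<HHIHH", tag, ftype, 1, value, 0)
        else:
            out += struct.pack("<HHII", tag, ftype, 1, value)
    out += struct.pack("<I", 0)  # no further IFDs
    return out


print(read_known_tags(build_sample()))
```

The private tag costs the reader nothing: it is skipped by number, and the standard fields are still recovered.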

    And why do "manuscripts" need to be 3D? Wasn't the whole point of writing on paper, rather than engraving in cuneiform, that it was flat?

    And what the flying fuck has 32-bit vs 64-bit got to do with anything!??!?! That's about a particular targeting of a particular implementation of a particular format. Are 32-bit ARM processors now considered something evil to the Vatican? No reading of ancient biblical scriptures on your smartphone!!! Shouldn't *portability* be considered the goal, not specifically-being-64-bit-to-the-exclusion-of-all-prior-hardware?

    Don't get me wrong, I'm all for keeping things open, accessible, and future-proof, but they don't seem to be able to articulate what the actual enemy is.

    Anyway, for image data, there's a wide range of suitable formats, depending on their metadata needs, so this all sounds like bikeshedding. For documents which are intended to contain text, I'd say they should be using a format which retains the text in its textual form, so that it can be processed by generic text processing tools (like grep). Even though I loathe XML, something XML-based seems to make sense. To be honest, for textual stuff, *the fewer features the better*, as every feature used adds noise for everyone who just wants the text. (And yes, I basically write webpages in little more than a small subset of HTML-1.0.)
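
As a small illustration of the "keep text as text" argument, the sketch below strips the markup from an XML-based document and searches the result line by line, grep-style. The element names and sample content are invented for the example, and `plain_text` and `grep` are hypothetical helpers, not part of any archive's tooling.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample document; element names are assumptions for illustration.
SAMPLE = (
    "<doc><title>Codex sample</title>"
    "<p>In principio erat Verbum,</p>"
    "<p>et Verbum erat apud Deum.</p></doc>"
)


def plain_text(xml_source: str) -> str:
    """Discard the markup and keep one line per text node."""
    root = ET.fromstring(xml_source)
    return "\n".join(t.strip() for t in root.itertext() if t.strip())


def grep(pattern: str, text: str) -> list:
    """Return the lines of text that match a regular expression."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


print(grep("Verbum", plain_text(SAMPLE)))
```

The fewer features the markup uses, the less there is to strip before generic tools become useful.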
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  • (Score: 5, Informative) by q.kontinuum on Friday May 08 2015, @10:13AM

    by q.kontinuum (532) on Friday May 08 2015, @10:13AM (#180262) Journal

    Probably they meant Open Format [wikipedia.org]. The ideology behind it is the same as for Open Source, so for a less technical person it's easy to confuse, I guess, and the more technical persons probably can correctly assume what was meant.

    --
    Registered IRC nick on chat.soylentnews.org: qkontinuum
  • (Score: 3, Interesting) by morgauxo on Friday May 08 2015, @02:27PM

    by morgauxo (2082) on Friday May 08 2015, @02:27PM (#180320)

    If you are talking about 50 years out, do patents really matter? Patents aren't like copyright. By that time they should all be in the public domain.

    • (Score: 3, Insightful) by FatPhil on Friday May 08 2015, @03:35PM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Friday May 08 2015, @03:35PM (#180336) Homepage
      Good point, but there's less use in an archive which can only be used in the future, but not in the present. You wouldn't want your library to get the latest trendy novels and then not lend them out for a couple of decades, would you?
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by morgauxo on Friday May 08 2015, @05:17PM

        by morgauxo (2082) on Friday May 08 2015, @05:17PM (#180391)

        My assumption is that if you are even considering the format you already have the program that reads it today. Maybe it was free, maybe it was purchased but you have it already.

        The problem is that in some number of years the reader program will not run on current equipment. You may not have older equipment to run it on and you may have lost your copy anyway. Now you cannot read the file.

        If it is well documented but patented.. well.. by that time the patent has run out anyway so if anybody still cares about the format they can/will write tools for reading it that work on the newer equipment. Of course.. if nobody cares enough to do that work.. then you are still out of luck. But.. the same could be said of non-patented formats. If nobody cares enough to write the new reader in the future you still have the same problem.

        On the other hand.. A totally closed format, which is protected by trade secret rather than by patent. Well.. good luck!

        • (Score: 3, Insightful) by HiThere on Friday May 08 2015, @06:43PM

          by HiThere (866) Subscriber Badge on Friday May 08 2015, @06:43PM (#180418) Journal

          You are leaving out the problem that writing a good reader program is a LOT harder than just reading a file. So it's much better to use a format that already has a good reader WHICH CAN BE MAINTAINED as hardware changes. And already I have files that I can only read on MSWind95, because later versions of the program can't read the files of the earlier versions, and the source is closed. It's true that those aren't open formats, but even if they were, writing the reader would be more bother than just maintaining an old system...until I can't keep it running anymore. (For a while I was planning to run an MSWind95 install on a virtual machine, but interest in the files has decreased to the point where that's no longer being planned. But they still need to be kept accessible...until they can't be.)

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
          • (Score: 2) by morgauxo on Wednesday May 13 2015, @04:05PM

            by morgauxo (2082) on Wednesday May 13 2015, @04:05PM (#182438)

            Ah, but I was replying to someone who posted about PATENTED formats. Patents are documentation. You aren't allowed to just go write your own implementation so long as the patent is active; however, the information you would need is there. Once the patent expires all it takes is to write code around the publicly available description. I would argue that an entirely patented format IS in fact open although it is not in the public domain until after the expiration of the patent.

            The problem with that is that most closed formats aren't patented, they are TRADE SECRETS. As trade secrets their only protection is that the company simply doesn't tell anyone how to decode the format. The original source code is of course obfuscated by being compiled and is of course protected by COPYRIGHT anyway which lasts practically forever. The task of writing a reader for them becomes reverse engineering the format. This can be very difficult. If those formats were patented the patent would have basically served as a guide to writing the reader.

            Now, in real life I don't think there are many patented file formats. Instead closed formats are mostly kept closed through trade secrets with maybe key parts being patented (such as long filenames in FAT). Those secret parts of the formats make readers very hard to write; however, that isn't what the GP said, the GP was about patents.

            Now having said all that.. I'm not saying that patents are always a good thing. I do believe that in some areas technology is simply moving too fast for the length of time that patents are in effect to NOT be slowing progress down. This is especially true when you consider the 'patent pending' period. Also, there are far too many patents given out on things that are too obvious to deserve it or are simply 'do X that we have been doing for decades but now do it with a Y' (Y especially tends to be a computer). This I think is slowing down our technical progress as a species immensely! But... that has nothing to do with long term archival of information.

            • (Score: 2) by HiThere on Wednesday May 13 2015, @06:30PM

              by HiThere (866) Subscriber Badge on Wednesday May 13 2015, @06:30PM (#182526) Journal

              Patents ought to be documentation, but in the few times I've looked at a patent it did not strike me as revealing enough to be useful.

              --
              Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
  • (Score: 3, Informative) by mtrycz on Friday May 08 2015, @07:04PM

    by mtrycz (60) on Friday May 08 2015, @07:04PM (#180426)

    Some people in this thread look like they are missing some points, so I thought I'd share some of my thoughts. I'm building a digital library at work, and while the main focus is research papers, it will contain thousands (at some point probably millions) of pages of digitized works. Also, that's my favourite part.

    First of all, you think long-term. "Will this work be accessible in 50 years?", "how can I ensure that it actually *is* accessible in 50 years?".

    Most people are probably not familiar with the requirements of a digital archive of this kind. The Vatican Library contains lots of works that are hand written, so let's stick with this for now. These works are digitized with a modern high-end digital camera fixed above the workspace. You shoot photographs as you turn the pages. Unlike the photocopier you use in an office, this minimizes contact with (hence damage to) the works.

    Now, modern digital cameras have a higher dynamic range than your monitor (8 bits per channel: red, green, blue): some have as much as double that, and by transforming the raw images into TIFF, you discard that extra information. Now, 32-bit is perfectly sufficient for your modern monitor, but will we have 64-bit monitors in 50 years? Who can tell. Discarding the extra bits just doesn't make sense.
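
To make the bit-depth arithmetic concrete, here is a minimal sketch of what is lost when a 16-bit sample is cut down to 8 bits. It illustrates the cost of reducing bit depth in general; whether a particular container format forces that reduction is a separate question. The function names are made up for the example, and dropping the low byte is just one simple way to do the conversion.

```python
def to_8bit(sample16: int) -> int:
    """Reduce a 16-bit sample to 8 bits by dropping the low byte."""
    return sample16 >> 8


def distinct_levels(bits: int) -> int:
    """Number of distinct values a sample of the given bit depth can hold."""
    return 2 ** bits


# 256 different 16-bit values (0x1200..0x12FF) collapse to one 8-bit value.
collapsed = {to_8bit(v) for v in range(0x1200, 0x1300)}

print(sorted(collapsed))
print(distinct_levels(16) // distinct_levels(8))
```

Each 8-bit level stands in for 256 of the original levels, and the difference between them cannot be recovered afterwards.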

    Now for the 3D. This is fucking awesome. Take a work like this:
    http://digi.vatlib.it/view/MSS_Borg.mess.1/0001/thumbs?sid=c3930b52306a29d73c83868284dd6697#current_page [vatlib.it]
    (I can't find the "reactive" viewer right now, I know they have one)
    The depth here certainly matters. Having depth would give you a better sensation of the work. Also, they could start scanning statues, which is just awesome.

    Since TIFF is not an open format, you can't just upgrade it to your needs, nor can you push for an update. Standards are great, but they still need to be updated from time to time (HTML5?). Also, an open format needs to have open source implementations in a range of languages and platforms. They are just future-proofing their work. It's not like file formats don't die.

    --
    In capitalist America, ads view YOU!
    • (Score: 2) by maxwell demon on Friday May 08 2015, @07:26PM

      by maxwell demon (1608) on Friday May 08 2015, @07:26PM (#180435) Journal

      Standards are great, but they still need to be updated from time to time (HTML5?).

      HTML5 is no standard. It is a moving target mislabelled as standard.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2) by mtrycz on Friday May 08 2015, @07:41PM

        by mtrycz (60) on Friday May 08 2015, @07:41PM (#180442)

        The point being that HTML5 is bringing some much needed updates. We'd be stuck with Flash without it.

        --
        In capitalist America, ads view YOU!
    • (Score: 2) by FatPhil on Saturday May 09 2015, @07:55AM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Saturday May 09 2015, @07:55AM (#180670) Homepage
      > by transforming the raw images into TIFF, you discard that extra information

      What the fucking bollocks are you gibbering about?
      http://www.awaresystems.be/imaging/tiff/tifftags/bitspersample.html
      http://www.awaresystems.be/imaging/tiff/tifftags/sampleformat.html
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by mtrycz on Saturday May 09 2015, @08:56AM

        by mtrycz (60) on Saturday May 09 2015, @08:56AM (#180692)

        If it handles 64bit info, I stand corrected.

        The other points hold, tho.

        --
        In capitalist America, ads view YOU!