SoylentNews is people

posted by Fnord666 on Wednesday May 26 2021, @01:13PM
from the vendor-capture dept.

There are still a few months to fix this, but for now the US Patent and Trademark Office's (USPTO) Acting Commissioner for Patents, Andrew Faile, and Chief Information Officer, Jamie Holcombe, have announced that starting January 1st, 2022, the USPTO will institute a surcharge for applicants that are not locked into Microsoft products via the proprietary DOCX format. From that date onwards, the USPTO will move away from PDF and require all filers to use that proprietary format or face an arbitrary surcharge when filing.

> First, we delayed the effective date for the non-DOCX surcharge fee to January 1, 2022, to provide more time for applicants to transition to this new process, and for the USPTO to continue our outreach efforts and address customer concerns. We've also made office actions available in DOCX and XML formats and further enhanced DOCX features, including accepting DOCX for drawings in addition to the specification, claims, and abstract for certain applications.

One of several major problems with the plans is that DOCX is a proprietary format. There are several variants of DOCX, and each of them is really only supported by a single company's products. Some other products have made progress in beginning to reverse engineer it, but are hindered by the lack of documentation. DOCX is a competitor to the fully documented, open standard OpenDocument Format, also known as ISO/IEC 26300.

DOCX is not to be confused with OOXML, though it often is. While OOXML, also known as ISO/IEC 29500, is technically standardized, it is incompletely documented and only vaguely related to DOCX. The DOCX format itself is neither fully documented nor standard. So the USPTO is also engaged in spreading disinformation by asserting that it is.

Previously:
(2015) Microsoft Threatened the UK Over Open Standards


This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by bzipitidoo on Wednesday May 26 2021, @05:09PM (6 children)

    by bzipitidoo (4388) on Wednesday May 26 2021, @05:09PM (#1139012) Journal

    PDF really sucks in a number of ways. Firstly, for edits, it is the absolute worst, least editable format there is. Granted, it wasn't meant to be edited, but we have learned that inability to make edits is not an advantage, it's a disadvantage, a big one. Making changes directly to a PDF is possible, and there are many tools for doing that, but it is a major pain. Even just lifting the text out and pasting it back into a word processor may not be straightforward. For one thing, there is no requirement that the text in a PDF be ordered in the same order as read.

    The next huge disadvantage of PDF is how very wasteful of space it is. Was it really so impossible to standardize on the math necessary to calculate letter positions? It's like they never heard of FORTRAN, and that people grappled with that sort of problem, of exactly reproducing mathematical results on different hardware, all the way back to the 1950s. PDF avoids that issue by explicitly coding the position of everything, down to the individual letters. Very costly way to dodge that issue, and also, the chief reason why PDF is such a pain to edit. To add to the waste, the way PDF encodes positions is very inefficient. Every time the letter spacing is changed, the standard requires that the opacity be specified. And so, in 99% of PDFs with text, over and over and over, the opacity is set to 100%. Opacities other than 100% just aren't used that much.

    Of course, switching to docx is just trading one set of troubles and limitations for another arguably worse set.

  • (Score: 2, Insightful) by Anonymous Coward on Wednesday May 26 2021, @05:35PM

    by Anonymous Coward on Wednesday May 26 2021, @05:35PM (#1139026)

    in terms of patent applications, inability to make edits should be considered a feature, not a bug.

  • (Score: 1, Informative) by Anonymous Coward on Wednesday May 26 2021, @05:38PM (4 children)

    by Anonymous Coward on Wednesday May 26 2021, @05:38PM (#1139027)

    All of this is because PDF was designed to be:

    Electronic Paper

    That is why a PDF simply specifies the x,y position of each letter/word on the page. Because PDF was meant to reproduce a sheet of paper in an electronic form.

    Everything else shoehorned in later (editing, notes, highlighting, etc.) was added to try to keep it relevant in view of better formats for those other tasks.

    • (Score: 2) by bzipitidoo on Wednesday May 26 2021, @11:00PM (3 children)

      by bzipitidoo (4388) on Wednesday May 26 2021, @11:00PM (#1139117) Journal

      > Electronic Paper

      And that idea is fundamentally flawed. There's nothing holy about paper. Paper is great stuff, it's worked for thousands of years for information storage, but now, now we at last have technology that has many advantages over paper, as well as a few disadvantages that are massively outweighed by all those advantages. It's just crazy to hobble our tech by forcing it to act similarly to paper's limitations.

      What reason, really, is there to make it hard to edit an electronic document? If it's so that readers can enjoy some assurance that a document has not been altered, not corrupted, that's a phantom. It's not even as good as security through obscurity. It's security through inconvenience. Not only does PDF fail miserably at being completely unalterable, there are plenty of reasons why we shouldn't want that feature, and indeed should regard it as a misfeature. Reason we think inalterability might be good, and think that property of PDF is desirable and fondly imagine that it is much more intended and effective than it actually is, is a sort of romanticism about the past, and a worldview of text and information as valuable, precious, and, something that can be coveted, hoarded, and denied to others, rather like gold. As if unchangeability and permanence makes a format a worthy medium to hold holy texts such as the Ten Commandments. We have digital signatures that are far better than the too vaunted imagined permanence and unchangeability of ink on paper, or even "carved in stone".

      That kind of thinking and reverence is wrong, and bad. It's liking the book, and the paper in it, more than the contents of the book. Liking that PDF is hard to change is one manifestation of that attitude. Strong belief in IP is another manifestation.

      • (Score: 3, Insightful) by Anonymous Coward on Thursday May 27 2021, @01:41AM (2 children)

        by Anonymous Coward on Thursday May 27 2021, @01:41AM (#1139148)

        > Electronic Paper

        > And that idea is fundamentally flawed.

        Ah, grasshopper, for this I will have to take you on a travel through time.

        > What reason, really, is there to make it hard to edit an electronic document?

        You appear to have cause and effect reversed based on that statement. PDF was not designed to be intentionally hard to edit. Editing a PDF after it was produced was not even in the design parameters. PDF is hard to edit because it is electronic paper, not the other way round.

        For this time warp, I have to drag you back in time to circa 1991-1992, when PDF was first being designed at Adobe. This was the day before the general public had any knowledge of "the internet" (it existed, but most had no access, so to them it did not exist). Documents were created on a multitude of different word processors, themselves running on a multitude of different operating systems for a multitude of different machines. Yes, the seeds had been sown for the Intel chips to take over the world, and for ms to take over the world, but neither crop had borne much fruit yet, so both were just one of many competing to be the eventual winners.

        Electronic communication, such as it was, consisted of someone trying to email a file via Compuserve or The Source or Prodigy or maybe AOL (I think AOL did exist, but AOL had not introduced the masses to the internet yet). The problem was, given 12+ different word processors, unless the recipient had the same word processor, sometimes on the same hardware, and with the same set of font files installed (font files cost real money then, so this part was not always certain) there was no real guarantee that what the recipient saw when viewing your attachment looked the same as how it looked when you authored it before sending. And if you did use any of those fancy font files you might have bought, chances were, the destination view either differed dramatically, or did not show anything at all. So the only way to be sure the recipient saw your document in all its pixel perfect glory as you saw it was to print it to paper, stuff the paper in an envelope, attach postage, and mail it to the recipient. And for some folks (who still exist today, they are the ones designing websites with fixed widths and heights on everything that won't adjust to the view screen, because their art is more important than the message to them) it was very important that the receiver see all the wonderful fancy formatted glory that was put into crafting the document.

        And perish the thought of simply opening a foreign document format from word-processor WP in word-processor XY. Word processors opened/saved their own formats and did not even acknowledge that other formats existed. And if you were lucky enough to have a conversion program for the two formats you needed to convert between, then often what came out as an XY file from a WP file input looked like it had been passed through a shredder along the way.

        This was the world that Adobe was operating within. They were, at this point, primarily targeting desktop publishing activities, as in fancy magazine layouts/etc. where the positioning of every single item on a page was (in their minds) critically important. And they wanted to create some kind of electronic file format that would allow a desktop publishing designer to design a page and electronically convey that designed page to someone else, and for that someone else to have a reasonable hope of seeing it look the same as it did when the designer finished designing it. And given that the standard medium of exchange, at the time, was printed sheets of paper, and that their (Adobe's) tools were tools for creating fancy formatted sheets of paper, they decided to create a file format that contained the typesetting instructions to place font glyphs at specific locations on a virtual sheet of paper. Actually, they created PDF as a cut-down, non-Turing-complete version of their own Postscript programming language which had, again, itself been designed for placing marks on pieces of paper. I.e., most everything they had created at this time was designed around "creating sheets of printed paper". So it is only natural that their new document format was created as "electronic paper".

        And "electronic paper" it really is. The low level PDF instructions to place font glyphs on the output "electronic paper" consist of basically three types:

        1. pick a font to use, and scale it to some size
        2. position a virtual cursor at an x,y coordinate on the virtual sheet of paper
        3. draw text glyphs at the current virtual cursor position
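
As a rough illustration of the three instruction types listed above, the sketch below emits the text operators a PDF content stream uses for them: `Tf` selects a font and size, `Tm` sets the text matrix (used here to put the cursor at an absolute x,y), and `Tj` shows a string. The helper names `text_run` and `escape_pdf_string`, the font resource name `F1`, and the coordinates are assumptions made up for this example. A real PDF file also needs page, font, and cross-reference objects, which are omitted here.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_run(font: str, size: float, x: float, y: float, text: str) -> str:
    """Build one text object: pick a font, position the cursor, draw glyphs.

    `font` is a hypothetical resource name such as "F1"; x and y are in
    PDF user-space units measured from the bottom-left corner of the page.
    """
    return "\n".join([
        "BT",                                 # begin text object
        f"/{font} {size:g} Tf",               # 1. pick a font and scale it
        f"1 0 0 1 {x:g} {y:g} Tm",            # 2. place the cursor at x,y
        f"({escape_pdf_string(text)}) Tj",    # 3. draw glyphs at the cursor
        "ET",                                 # end text object
    ])


if __name__ == "__main__":
    print(text_run("F1", 12, 72, 720, "Hello (electronic) paper"))
```

Nothing in the emitted stream records paragraphs, margins, or reading order. It only says which glyphs go where.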

        And, for circa 1991-1992, and for Adobe's intended use (as a final destination output format for a page design) this probably seemed like a very reasonable thing to do. The intent was that if someone needed to edit, they would edit the source and then print-to-pdf all over again, not attempt to edit the pdf.

        And, if you consider the two "drawing primitives" provided: "position cursor at x,y" and "draw text at cursor" you see why editing is very hard. In order to insert a word in the middle of a line, an editor has to first insert the text, then take the remainder of the line and adjust the x,y coordinates to shift it over. Then it has to figure out what text would extend past the page margin (with nothing in the document telling it where the margin was in the original document) and move that text to the next line, where it has to repeat the process of adjusting all these x,y positioning to make things line back up again. And if even one word falls off the bottom of a page, then the editor program has to repeat this adjusting of x,y positions for every other page after the current page in the PDF, adjusting all those "position cursor at x,y" and "draw text at cursor" commands for each page.
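
The ripple effect described above can be sketched with a toy model. It assumes a fixed glyph width, a single column, and made-up margin values, none of which come from the PDF format itself, and every name in it (`Placed`, `layout`, `insert_word`, `count_moved`) is hypothetical. Note that the sketch cheats by knowing the margins. A real PDF editor would have to guess them.

```python
from dataclasses import dataclass

CHAR_W = 6                       # assumed fixed glyph width, in points
LINE_H = 14                      # assumed distance between baselines
LEFT, RIGHT, TOP = 72, 200, 720  # assumed margins of the toy page


@dataclass
class Placed:
    x: int
    y: int
    word: str


def layout(words):
    """Assign an explicit x,y to every word, wrapping at the right margin."""
    placed, x, y = [], LEFT, TOP
    for word in words:
        width = len(word) * CHAR_W
        if x > LEFT and x + width > RIGHT:
            x, y = LEFT, y - LINE_H          # move down to the next line
        placed.append(Placed(x, y, word))
        x += width + CHAR_W                  # advance past word plus a space
    return placed


def insert_word(placed, index, word):
    """Insert a word, then recompute every coordinate from scratch."""
    words = [p.word for p in placed]
    words.insert(index, word)
    return layout(words)


def count_moved(before, after, index):
    """Count the original words whose stored x,y had to be rewritten."""
    moved = 0
    for i, old in enumerate(before):
        new = after[i] if i < index else after[i + 1]
        if (old.x, old.y) != (new.x, new.y):
            moved += 1
    return moved


if __name__ == "__main__":
    before = layout("the quick brown fox jumps over the lazy dog".split())
    after = insert_word(before, 1, "very")
    print(count_moved(before, after, 1), "of", len(before), "words moved")
```

In this toy case, inserting a single word near the start changes the stored position of almost every word after it, and some words move to a different line.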

        So the "hard to edit" part came about as a result of the design of how it would draw text on a page, which itself was derived from how Postscript drew text on a page. PDFs are not hard to edit because Adobe set out to make them hard to edit, they are hard to edit because Adobe decided to capture the literal positioning data that a typesetting program generates to position items on a page as the document storage format. Doing that allowed them to guarantee one of the very early marketing slogans for PDF (it looks identical everywhere, actually I think it was "it prints identically everywhere"), but also resulted in a file storage format that was extremely hard to edit.

        • (Score: 3, Interesting) by bzipitidoo on Thursday May 27 2021, @04:57AM (1 child)

          by bzipitidoo (4388) on Thursday May 27 2021, @04:57AM (#1139183) Journal

          Thanks, but I know all that. I was there.

          Correct, Adobe wasn't worrying about editability one way or the other, and yes, the intended way to change a PDF is to edit the source and generate a new PDF. I am glad you mention Postscript. Postscript was intended as instructions for printers. It was of course easy to divert those printer instructions to a file, and then what was needed was a reader that could render them to a graphical display screen. Also needed was graphics hardware capable of displaying the results: 640x480 is real tight, and higher resolutions than that didn't become widely available until the 1990s. Soon it became more common to display a postscript file to the screen than print it to paper.

          Once you get away from paper, and consider how best to store knowledge digitally, it's rather obvious that PDF is terrible. The interesting part of any document is the contents, not typesetting data. The two should not be jumbled all together. HTML has a lot of shortcomings, but it is just a plain better approach to the problem. The world isn't standardized on 8.5x11 inch paper. PDF cannot adjust to a change of size, HTML can. PDF's very rigidity is why it has to contain the fonts, HTML easily accommodates changing to different fonts. And, shouldn't the readers have the option to pick the font, if they don't like the writer's choice? PDF lets the writer dictate that and other such details to the reader. HTML gives the reader much more control.
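
A loose analogy for the point about adjusting to size: when a format stores only the content, the renderer can choose line breaks to suit whatever width the reader has. The sketch below uses Python's standard `textwrap` module as a stand-in for a reflowing renderer. It is not how a browser lays out HTML, and `render` and `CONTENT` are names invented for the example.

```python
import textwrap

# The stored "document" is just content, with no positions attached.
CONTENT = ("The interesting part of any document is the contents, "
           "not typesetting data.")


def render(content: str, columns: int) -> list:
    """Choose line breaks at display time for the reader's column width."""
    return textwrap.wrap(content, width=columns)


if __name__ == "__main__":
    for columns in (40, 20):
        print(f"--- {columns} columns ---")
        print("\n".join(render(CONTENT, columns)))
```

The same stored text yields a different set of line breaks for each width, which a format with baked-in coordinates cannot do without a relayout step.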

          You mention that all the word processors saved documents by basically dumping their working memories to files, and this resulted in nothing being compatible with anything else. Yes, but there was a standard then, and it was even free and open: LaTeX, and before that, TeX.

          • (Score: 0) by Anonymous Coward on Friday May 28 2021, @03:06PM

            by Anonymous Coward on Friday May 28 2021, @03:06PM (#1139653)

            > Thanks, but I know all that. I was there.

            A fact that is impossible to tell from just a username.

            > Soon it became more common to display a postscript file to the screen than print it to paper.

            Which I suspect had a large impact in Adobe's invention of PDF. PDF is the Postscript font and rendering engine hooked up to a different set of formatting instructions. Since Postscript was already calculating exact pixel positioning for every item drawn on a page, having PDF simply be a format that archived that positioning info meant that PDF was not much of a change from Postscript (the single biggest difference is dropping the general purpose programming language part of Postscript). PDF is largely what you get if you start with Postscript, remove all the general purpose programming language commands, and rename the "drawing commands" into different names.

            > Once you get away from paper, and consider how best to store knowledge digitally, it's rather obvious that PDF is terrible.

            100% agreement. PDF is not at all a good format into which to store data of any form. The one and only thing PDF does well is preserve the physical page layout of the printed document, hence my referring to PDF as electronic paper. It really is little more than electronic paper.

            > The interesting part of any document is the contents, not typesetting data. The two should not be jumbled all together. HTML has a lot of shortcomings, but it is just a plain better approach to the problem. The world isn't standardized on 8.5x11 inch paper.

            Also full agreement. PDF is rigid: just as a physical sheet of paper can't change size to accommodate some difference in viewing, neither can a PDF. PDFs simply preserve the exact pixel positioning of everything on the page.

            > PDF lets the writer dictate that and other such details to the reader. HTML gives the reader much more control.

            Yup, and there is probably an underlying reason (beyond that Adobe simply distilled Postscript down to just the "drawing commands") for why PDF is so rigid. Have you ever had the misfortune to work with any of the "page designer" or "page layout" crowd? I.e., the folks one hires to do the magazine layout and decide how things should look? These folks, almost 100%, all put the "design" (the layout, where things are positioned, how much space is here, how big this font is set) above the actual "content" of anything they produce. A huge part of this is because for them, often, when they are producing a layout design, the content is something like lorem ipsum [wikipedia.org] text (i.e., meaningless filler) and so the only thing they deal with, and the only thing they can use to pat each other on the back for "job well done" is the layout (i.e., the physical arrangement of stuff on the page).

            These same folks are also almost rabid in their belief that an end recipient of their wondrous "design" should only ever be able to see their wondrous "design" in its exact, pixel perfect, positional glory. This comes in large part from the "design" being all they have to congratulate themselves about, since the content, for them, when they did the job was just lorem ipsum. And back in the late 80's and early 90's, this was the world to which Adobe was pandering with its software offerings: the page layout designer who was rabid in his/her belief that their design should never be modified from the beautiful work of art they created by anyone viewing it later on any medium. With this being their world, it is no wonder that the folks at Adobe who dreamed up converting Postscript into what became PDF saw no problems whatsoever with PDF's rigidity. The expectation in their world was that the document storage format should rigidly preserve their wondrous design for everyone to marvel at who later viewed it.

            These same folks are also why HTML has been soiled by CSS that provides the ability to do pixel exact, unchanging, positioning and sizing. They simply could not handle the concept that something they "designed" might be modified by an end user's browser such that things were no longer exactly positioned where they, the designer, decided they should be positioned. Every single CSS declaration where there is the ability to exactly position some HTML element is there as pandering to this world view on the part of the layout artists.

            > You mention that all the word processors saved documents by basically dumping their working memories to files, and this resulted in nothing being compatible with anything else.

            Nope, I said nothing of the sort. Someone else has mentioned that MSWord's old DOC format was basically a memory dump from Word's heap, and that fact has been known for some time. But whoever mentioned that wasn't me. What I said was there were something like 12 different word-processors, each reading/writing 12 different file formats (each format specific to the WP that wrote it), and with none of the 12 providing much of any ability to interoperate with the others (i.e., read/write the other 11 formats that were not their own). But I did not say that all 12 were memory dumps. They might have all been memory dumps, or maybe only one of them was a memory dump (msword's doc format). But I never said they were all memory dumps, just that they were all incompatible with sharing with each other.

            > but there was a standard then, and it was even free and open: LaTeX, and before that, TeX.

            Indeed, yes, there was. And unless one was an academic going for their doctorate in one of the sciences that published via TeX/LaTeX, one generally knew nothing of the existence of those tools. A format based on TeX/LaTeX source, plus enough extra baggage to carry any custom fonts used by the TeX/LaTeX source, would have been a far superior way to exchange documents that were also useful as data sources than PDF will ever be.