SoylentNews is people

posted by Fnord666 on Wednesday May 26 2021, @01:13PM
from the vendor-capture dept.

There are still a few months to fix this, but for now the US Patent and Trademark Office's (USPTO) Acting Commissioner for Patents, Andrew Faile, and Chief Information Officer, Jamie Holcombe, have announced that starting January 1st, 2022, the USPTO will institute a surcharge for applicants that are not locked into Microsoft products via the proprietary DOCX format. From that date onwards, the USPTO will move away from PDF and require all filers to use that proprietary format or face an arbitrary surcharge when filing.

First, we delayed the effective date for the non-DOCX surcharge fee to January 1, 2022, to provide more time for applicants to transition to this new process, and for the USPTO to continue our outreach efforts and address customer concerns. We've also made office actions available in DOCX and XML formats and further enhanced DOCX features, including accepting DOCX for drawings in addition to the specification, claims, and abstract for certain applications.

One of several major problems with the plan is that DOCX is a proprietary format. There are several variants of DOCX, and each of them is really only supported by a single company's products. Some other products have made progress in reverse engineering it, but are hindered by the lack of documentation. DOCX is a competitor to the fully documented, open standard OpenDocument Format, also known as ISO/IEC 26300.

DOCX is not to be confused with OOXML, though it often is. While OOXML, also known as ISO/IEC 29500, is technically standardized, it is incompletely documented and only vaguely related to DOCX. The DOCX format itself is neither fully documented nor standard. So the USPTO is also engaged in spreading disinformation by asserting that it is.

Previously:
(2015) Microsoft Threatened the UK Over Open Standards


  • (Score: 3, Insightful) by Anonymous Coward on Wednesday May 26 2021, @01:31PM (13 children)

    by Anonymous Coward on Wednesday May 26 2021, @01:31PM (#1138904)

    Why would they transition away from PDF to docx? I can think of only one reason.

    Which brings me to the next question, who is getting paid to screw us all over?

  • (Score: 2, Informative) by Anonymous Coward on Wednesday May 26 2021, @03:15PM (6 children)

    by Anonymous Coward on Wednesday May 26 2021, @03:15PM (#1138958)

    Sadly there is a very important reason to move from PDF to another format for legal documents: Modern PDF formats are no longer deterministic, meaning that the contents can change depending on when or where the file is viewed.

    • (Score: 1, Insightful) by Anonymous Coward on Wednesday May 26 2021, @03:52PM (5 children)

      by Anonymous Coward on Wednesday May 26 2021, @03:52PM (#1138974)

      And docx doesn't have that problem? Or more importantly, doesn't YET have that problem? Just because a mis-feature like that is in one format does not mean it can't/won't be implemented elsewhere.

      If this were really the issue, then the CORRECT solution would be to mandate use of a PDF SUBSET instead. Anything that does not work in Acrobat Reader 5.0.5 will automatically get stripped out.

      • (Score: 0) by Anonymous Coward on Wednesday May 26 2021, @04:39PM (3 children)

        by Anonymous Coward on Wednesday May 26 2021, @04:39PM (#1138992)

        Considering all the issues with word macro viruses of the past, it seems rather foolish to go with any standard that allows for things other than text to be embedded directly into the document.

        • (Score: 5, Interesting) by ElizabethGreene on Wednesday May 26 2021, @07:52PM (2 children)

          by ElizabethGreene (6748) Subscriber Badge on Wednesday May 26 2021, @07:52PM (#1139069) Journal

          Patents rely very heavily on images, so you'd need a combination of text and images.

          If only we had some kind of hypertext markup language that allowed the combination of text and images into some form of document. It'd be really cool if it allowed you to specify headings, sections, subsections, image captions, etc. too.

          • (Score: 2) by nostyle on Thursday May 27 2021, @02:02AM (1 child)

            by nostyle (11497) on Thursday May 27 2021, @02:02AM (#1139154) Journal

            So why don't word processors input/output something like SGML anyway?

            • (Score: 2) by ElizabethGreene on Thursday May 27 2021, @03:47PM

              by ElizabethGreene (6748) Subscriber Badge on Thursday May 27 2021, @03:47PM (#1139335) Journal

              The realization I'm working through is that they all do output something like SGML.

              For .rtf the control words all start with \ and there is some funny grouping with brackets.

              {\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
                This is some {\b bold} text.\par
                }

              (Source: Wikipedia)
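
A rough sketch of that observation, using the snippet above as input: control words can be picked out with a single regular expression, and the brace grouping can be measured with a counter. The names `control_words` and `max_group_depth` are made up for illustration, and this is nowhere near a real RTF parser (it ignores escaped braces, control symbols, and binary data).

```python
import re

# The RTF snippet quoted above, as a raw string so backslashes stay literal.
RTF = r"""{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}"""

# A control word is a backslash, lowercase letters, and an optional numeric parameter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")


def control_words(rtf):
    """Return control words in document order, e.g. 'rtf1', 'ansi', 'f0'."""
    return [name + (param or "") for name, param in CONTROL_WORD.findall(rtf)]


def max_group_depth(rtf):
    """Return the deepest nesting of { } groups (assumes no escaped braces)."""
    depth = deepest = 0
    for ch in rtf:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


words = control_words(RTF)
depth = max_group_depth(RTF)
print(words, depth)
```

Everything that is not a control word or a brace is the actual text, which is the same split between markup and content that the tag-based formats make.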

              LaTeX likes \control words too.

              PDF was shown above.

              For .docx, SGML, and HTML, the control words are XML-like tags.

              Where I'm sitting it looks like it's markup all the way down.

              The ideal file format would be tiny in total file size (compressed text), contain trivially parse-able markup around the clear text (html), high resolution digital images of the text as the creator intended it to be displayed (tiff), some kind of pinning between the images and clear text to allow copy/paste intelligently (word), metadata about the file source/history(?), and be forward/backward compatible forever (plain text). I don't think it exists.

              Obligatory XKCD [xkcd.com]

      • (Score: 1, Informative) by Anonymous Coward on Thursday May 27 2021, @01:46AM

        by Anonymous Coward on Thursday May 27 2021, @01:46AM (#1139151)

        It's just one of many problems with docx, but it gives them something to point at while embracing the 'open' OOXML 'standard'. That docx isn't OOXML doesn't matter either. As long as there is enough noise to keep people distracted and disoriented then any attempt at a coherent objection can be subverted.

  • (Score: 5, Informative) by Anonymous Coward on Wednesday May 26 2021, @05:02PM (5 children)

    by Anonymous Coward on Wednesday May 26 2021, @05:02PM (#1139007)

    Why would they transition away from PDF to docx? I can think of only one reason.

    Then your ability to reason out possible alternatives needs some exercise.

    Why transition away from PDF? Because the legacy PDF system is based upon using PDF as a carrier for bitmap scanned images. Most of the PDFs simply contain a scanned image of a sheet of paper. The reason for DOCX is to receive the data prior to converting it to "layout format" so that the actual text content can be extracted and the higher level document structure detected (i.e., what is a header, where paragraphs start/end, etc.). PDF, even when the PDF is textual, is a pure layout format. Internally, a textual PDF is simply a series of instructions to position letters at specific x,y positions on a virtual page. There are no concepts of "this block of text is a paragraph" or "this line is a level 2 heading line". Extracting the text from a PDF is possible (provided the PDF authoring library includes the, sadly optional, table mapping byte values within the PDF to Unicode code points) but all the higher level document structure is gone. You simply get letters or words positioned at x,y coordinates on a virtual printed sheet of paper.
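
To make the structure point concrete, here is a minimal sketch using only the Python standard library. It assumes the usual DOCX packaging (a zip archive whose main part is `word/document.xml`, written in WordprocessingML). The sample XML is hand-written for illustration and is not a complete, valid .docx (a real file carries additional parts such as content types and relationships); the `paragraphs` helper is a made-up name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, hand-written main document part (illustration only).
DOCUMENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
      <w:r><w:t>Background</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>This invention relates to </w:t></w:r>
      <w:r><w:t>widgets.</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraphs(docx_bytes):
    """Return (style, text) for each paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    out = []
    for p in root.iterfind(".//w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else "(default)"
        # A paragraph's text is split across runs; join them back together.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        out.append((name, text))
    return out


# Pack the sample part into an in-memory zip, then read the structure back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOCUMENT_XML)

result = paragraphs(buf.getvalue())
print(result)
```

The heading and the paragraph boundaries come straight out of the markup, which is exactly the information a positioned-glyph PDF does not carry.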

    Now, you'll naturally want to move the goalposts to, "ok, but why DOCX?". And the answer there is trivially simple. Because the vast majority of the attorneys are already using msword to craft the documents in the first place, so they are just accepting what the law firms already use. Convincing law firms that they now need to install LibreOffice instead of msword to then write an ODT file would be like trying to pull teeth from chickens.

    Is DOCX the best choice? No. But it is the pragmatic choice given that almost all of the law firms are already using msword.

    • (Score: 0) by Anonymous Coward on Wednesday May 26 2021, @06:15PM

      by Anonymous Coward on Wednesday May 26 2021, @06:15PM (#1139039)

      Incorrect. There is one group that needs to correct the error: Microsoft. They need to generate compatible files, not the other way around.

      DOCX still has many features defined by OLD standards from Word... going back to Word3 (1990). Get the job done right ONCE!!!

      F..king Monopoly.

    • (Score: 1, Insightful) by Anonymous Coward on Wednesday May 26 2021, @08:01PM

      by Anonymous Coward on Wednesday May 26 2021, @08:01PM (#1139073)

      fuck the lawyers and the dumb ass whores at the USPTO! Extorted public money? Open fucking formats. Stupid pieces of shit!

    • (Score: 1, Informative) by Anonymous Coward on Thursday May 27 2021, @02:48AM

      by Anonymous Coward on Thursday May 27 2021, @02:48AM (#1139166)

      "Convincing lawfirms that the now need to install LibreOffice instead of msword to then write an ODT file would be like trying to pull teeth from chickens. "

      No need to convince anyone. Msword can save files in ODT format.

    • (Score: 2) by FatPhil on Thursday May 27 2021, @12:29PM (1 child)

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Thursday May 27 2021, @12:29PM (#1139264) Homepage
      So you're saying we need a Rich Text Format instead? Or if we wish parts of the document (such as a ToC, index, or body text) to be able to refer to other parts of the document (such as sections, images, or tables within), or even external documents, then some kind of Hypertext Markup Language?
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves