
SoylentNews is people

posted by Fnord666 on Wednesday May 26 2021, @01:13PM
from the vendor-capture dept.

There are still a few months to fix this, but for now the US Patent and Trademark Office's (USPTO) Acting Commissioner for Patents, Andrew Faile, and Chief Information Officer, Jamie Holcombe, have announced that starting January 1st, 2022, the USPTO will institute a surcharge for applicants that are not locked into Microsoft products via the proprietary DOCX format. From that date onwards, the USPTO will move away from PDF and require all filers to use that proprietary format or face an arbitrary surcharge when filing.

First, we delayed the effective date for the non-DOCX surcharge fee to January 1, 2022, to provide more time for applicants to transition to this new process, and for the USPTO to continue our outreach efforts and address customer concerns. We've also made office actions available in DOCX and XML formats and further enhanced DOCX features, including accepting DOCX for drawings in addition to the specification, claims, and abstract for certain applications.

One of several major problems with the plan is that DOCX is a proprietary format. There are several variants of DOCX, and each of them is really only supported by a single company's products. Other products have made some progress in reverse engineering it, but are hindered by the lack of documentation. DOCX is a competitor to the fully documented, open standard OpenDocument Format, also known as ISO/IEC 26300.

DOCX is not to be confused with OOXML, though it often is. While OOXML, also known as ISO/IEC 29500, is technically standardized, it is incompletely documented and only vaguely related to DOCX. The DOCX format itself is neither fully documented nor standard. So the USPTO is also engaged in spreading disinformation by asserting that it is.

Previously:
(2015) Microsoft Threatened the UK Over Open Standards



 
  • (Score: 2, Interesting) by Anonymous Coward on Wednesday May 26 2021, @09:47PM (2 children)

    by Anonymous Coward on Wednesday May 26 2021, @09:47PM (#1139105)

    The PDF instructions to "draw" a line of text equivalent to your 'line' above could be something like this:

    % Tf operands name font resources; the x offsets in Tm are illustrative only
    BT
    /Times-Roman 12 Tf
    1 0 0 1 0.0 0.0 Tm
    (This is some text In) Tj
    /Times-Bold 12 Tf
    1 0 0 1 21.0 0.0 Tm
    (bold) Tj
    /Times-Roman 12 Tf
    1 0 0 1 25.0 0.0 Tm
    (In) Tj
    /Times-BoldItalic 12 Tf
    1 0 0 1 29.0 0.0 Tm
    (italics bold) Tj
    /Times-BoldItalic 12 Tf
    1 0 0 1 43.0 0.0 Tm
    (and an image) Tj
    ET

    Note: I say "could" because this is not the only way this text could be typeset. The driver could position and draw each letter separately. It could group together all text in the same font, issuing the font change (the /Times-* Tf lines) once, positioning all of the text in that font, then changing font and positioning the next pieces. It could draw things in reverse, up the page, down the page, diagonally, in a circle, etc. As long as the same pixels end up black, none of that matters to you when viewing the final product. But all those possible combinations make extracting text very challenging. It is certainly not as easy as "remove the XML tags and what is left is the text."
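To make the ordering problem concrete, here is a minimal Python sketch. It assumes a toy subset of content-stream syntax (only Tf, Tm and Tj between BT and ET, no string escapes, no TJ/Td/cm operators) and uses made-up coordinates; `text_runs` and `reading_order` are hypothetical helpers, not part of any PDF library. Two streams that draw the same runs in opposite orders give different text when read in stream order, and only sorting the runs by position recovers the reading order.

```python
import re

# Hypothetical toy tokenizer for a tiny subset of PDF content-stream syntax:
# literal strings (no nested parentheses), names, numbers and operators.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|/[^\s/()]+|[-+]?\d*\.?\d+|[A-Za-z*]+")
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def text_runs(stream: str) -> list[tuple[float, float, str, str]]:
    """Return (x, y, font, text) for each Tj, in the order the stream draws them.

    Only Tf, Tm and Tj are interpreted; every other operator just clears the
    operand stack. Escapes inside strings are not decoded.
    """
    operands: list[str] = []
    font, x, y = "", 0.0, 0.0
    runs = []
    for tok in TOKEN.findall(stream):
        if tok[0] in "(/" or NUMBER.fullmatch(tok):
            operands.append(tok)  # operand: push it and wait for its operator
            continue
        if tok == "Tf":  # /FontName size Tf
            font = operands[-2][1:]
        elif tok == "Tm":  # a b c d e f Tm; e and f are the x, y translation
            x, y = float(operands[-2]), float(operands[-1])
        elif tok == "Tj":  # (string) Tj
            runs.append((x, y, font, operands[-1][1:-1]))
        operands = []  # every operator consumes the operand stack
    return runs


def reading_order(runs) -> str:
    """Sort runs top-to-bottom (PDF y grows upward), then left-to-right."""
    return " ".join(r[3] for r in sorted(runs, key=lambda r: (-r[1], r[0])))


# Illustrative coordinates: the same two runs, drawn in opposite orders.
FORWARD = """BT
/Times-Roman 12 Tf 1 0 0 1 72 700 Tm (This is some text in) Tj
/Times-Bold 12 Tf 1 0 0 1 180 700 Tm (bold) Tj
ET"""
REVERSED = """BT
/Times-Bold 12 Tf 1 0 0 1 180 700 Tm (bold) Tj
/Times-Roman 12 Tf 1 0 0 1 72 700 Tm (This is some text in) Tj
ET"""

print([run[3] for run in text_runs(REVERSED)])  # ['bold', 'This is some text in']
print(reading_order(text_runs(REVERSED)))       # This is some text in bold
```

Real extractors typically have to go much further: combine the text matrix with the current transformation matrix, use glyph widths to guess where spaces belong, and group runs into lines and columns.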

  • (Score: 2) by ElizabethGreene on Wednesday May 26 2021, @10:04PM (1 child)

    by ElizabethGreene (6748) on Wednesday May 26 2021, @10:04PM (#1139108)

    Thanks for explaining it.

    I don't want to undermine the other comments about undocumented features, though. I have no doubt that some Word documents have similar fiddly bits around typesetting, positioning, etc. That's inevitable when you've got 35+ years of backwards compatibility under the hood.

    • (Score: 0) by Anonymous Coward on Wednesday May 26 2021, @10:18PM

      by Anonymous Coward on Wednesday May 26 2021, @10:18PM (#1139111)

      Very true, although for newly authored documents in newer versions of Word (i.e., those that support DOCX), most of that 35+ years of cruft is not present in the DOCX file. And, of course, an old copy of Word that never understood DOCX can't save its cruft into a DOCX file, because it can't write DOCX. One would have to open an ancient .doc in a modern Word and then save it as DOCX for some of that "really old crud" to appear at all.

      Is DOCX the best choice from a future compatibility standpoint? Most definitely not.

      Is DOCX a pragmatic choice that allows extracting text and document structure (lists, tables, images, paragraphs, etc.), given that MS Word is what most law firms use, so the firms and attorneys don't have to change much at all to supply the files? Most definitely yes.

      So it is a trade-off: periodically playing "tail wags dog" games as Microsoft changes the DOCX format, versus having every firm and attorney balk at the change, dig their heels in, and refuse to change (whereupon one gets nowhere).

      The current PDF filing system is essentially little more than an "electrified US Postal Service envelope". It is based on supplying virtual sheets of printed paper in a virtual PDF envelope, and it is as close a clone as possible of the process that came before: print the document on paper, stuff the paper in a USPS envelope, and mail the envelope (with sufficient postage) to the USPTO. That minimal shift from physical paper and USPS to virtual paper and the internet was chosen largely to keep law firms from digging their heels in and refusing to go along.
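The point above that DOCX "allows extracting text and document structure" can be illustrated with a short Python sketch using only the standard library. It assumes the main body lives in the conventional part name word/document.xml and uses the WordprocessingML namespace, where paragraphs (w:p), runs (w:r) and text (w:t) are explicit elements. `docx_paragraphs` and `make_minimal_docx` are hypothetical helpers, and the archive the latter builds is a bare fragment, not a file Word would open. Unlike the PDF case, paragraph boundaries survive even when formatting splits a sentence across runs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used in word/document.xml (assumed here).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the concatenated <w:t> text of every <w:p> in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]


def make_minimal_docx(paragraphs: list[list[tuple[str, bool]]]) -> bytes:
    """Hypothetical demo helper: zip up ONLY word/document.xml.

    Enough for docx_paragraphs(), but not a complete package Word would open
    (no [Content_Types].xml, relationships or styles). Each paragraph is a
    list of (text, bold) runs.
    """
    body = []
    for runs in paragraphs:
        xml_runs = []
        for text, bold in runs:
            props = "<w:rPr><w:b/></w:rPr>" if bold else ""  # bold run property
            xml_runs.append(
                f'<w:r>{props}<w:t xml:space="preserve">{escape(text)}</w:t></w:r>'
            )
        body.append("<w:p>" + "".join(xml_runs) + "</w:p>")
    document = f'<w:document xmlns:w="{W_NS}"><w:body>' + "".join(body) + "</w:body></w:document>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document)
    return buf.getvalue()


sample = make_minimal_docx([
    [("This is some text in ", False), ("bold", True)],
    [("A second paragraph", False)],
])
print(docx_paragraphs(sample))  # ['This is some text in bold', 'A second paragraph']
```

This ignores tabs, line breaks, fields and text boxes (a paragraph inside a text box would be counted twice, once on its own and once inside its enclosing paragraph), so it demonstrates the principle rather than serving as a complete converter.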