Developer Antonio Diaz Diaz has updated his report on the xz compression format, in which he concludes that xz is inadequate for general use [nongnu.org]:
One of the challenges of digital preservation is the evaluation of data formats. It is important to choose well-designed data formats for general use. This article describes the reasons why the xz compressed data format is inadequate for most uses, including long-term archiving, data sharing, and free software distribution. The relevant weaknesses and design errors in the xz format are analyzed and, where applicable, compared with the corresponding behavior of the bzip2, gzip, and lzip formats. Key findings include:
(1) safe interoperability between xz implementations is not guaranteed;
(2) xz is vulnerable to unprotected flags and length fields;
(3) LZMA2 is unsafe [soylentnews.org] and less efficient than the original LZMA;
(4) xz's extensibility is unreasonable and problematic;
(5) xz includes useless features that increase the number of false positives for corruption;
(6) xz shows inconsistent behavior with respect to trailing data;
(7) error detection in xz is less accurate than in bzip2, gzip, and lzip.
Disclosure statement: The author is also author of the lzip format.
Acknowledgements: The author would like to thank Lasse Collin for his detailed and useful comments that helped improve the first version of this article. The author would also like to thank the fellow GNU developers who reviewed this article before publication and the people whose comments have helped to fill in the gaps.
This article was originally published under the title "Xz format inadequate for long-term archiving", but further analysis revealed that xz is significantly less safe than bzip2, gzip, and lzip for most uses.
This article tries to be as objective and complete as possible. Please report any errors or inaccuracies to the author at the lzip mailing list [...] or in private at [...]. As of today, no formal refutation of this article has been reported.
Notably, integrity checking in xz is optional, and the report argues that it is unreliable even when a file does include a check sequence.
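To make the "optional" part concrete: per the published xz file format specification, every .xz stream opens with a 12-byte Stream Header (6 magic bytes, 2 Stream Flags bytes, and a CRC32 of those flags), and the low nibble of the second flags byte names the integrity check, where 0x00 means "None". The sketch below reads that field using only the Python standard library. The helper `xz_check_type` and the `CHECK_NAMES` table are hypothetical names introduced for illustration; the sketch only shows that a checkless stream is valid xz, and it does not test the report's separate claim about how reliable the checks are when present.

```python
import lzma
import struct
import zlib

# Magic bytes that open every .xz stream (Stream Header, xz format spec).
XZ_MAGIC = b"\xfd7zXZ\x00"

# Hypothetical lookup table: check IDs named by the xz spec; others are reserved.
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def xz_check_type(data: bytes) -> str:
    """Return the integrity check declared in an xz Stream Header.

    Hypothetical helper: inspects only the 12-byte Stream Header,
    not the blocks, index, or footer.
    """
    if len(data) < 12 or data[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    flags = data[6:8]
    (stored_crc,) = struct.unpack("<I", data[8:12])
    # The two Stream Flags bytes are covered by a little-endian CRC32.
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("corrupt stream header")
    # First flags byte and the high nibble of the second are reserved (zero).
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("unsupported stream flags")
    check_id = flags[1] & 0x0F
    return CHECK_NAMES.get(check_id, f"reserved (0x{check_id:X})")


if __name__ == "__main__":
    payload = b"example payload\n" * 4
    for check in (lzma.CHECK_NONE, lzma.CHECK_CRC32,
                  lzma.CHECK_CRC64, lzma.CHECK_SHA256):
        stream = lzma.compress(payload, format=lzma.FORMAT_XZ, check=check)
        # The decoder accepts the stream whichever check was chosen,
        # including CHECK_NONE, where no checksum is stored at all.
        assert lzma.decompress(stream) == payload
        print(xz_check_type(stream))
```

Run as a script, this prints the declared check for each stream (None, CRC32, CRC64, SHA-256); the first one decompresses without error even though nothing in it can confirm that the decompressed data is intact.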
Previously:
(2024) xz-style Attacks Continue to Target Open-Source Maintainers [soylentnews.org]
(2024) The Mystery of ‘Jia Tan,’ the XZ Backdoor Mastermind [soylentnews.org]
(2024) xz: Upstream Repository and the xz Tarballs Have Been Backdoored [soylentnews.org]