
SoylentNews is people

Submission Preview

Data corruption making it into enterprise SAN

Rejected submission by fakefuck39 at 2020-10-31 12:26:00
Hardware

A bad trend has been making its way into datacenters over the past decade. Storage vendors are saving on hardware by pretending hashing is compression.

You buy an all-flash storage array or some HCI to put your petabytes of data on. Due to the high cost of flash, those always dedupe (SHA-1 or SHA-256), followed by fast compression (LZW/deflate). On many enterprise storage solutions costing millions of dollars, there is no byte-for-byte check on written blocks to catch hash collisions. What you end up with is corrupt data, and the vendors downplay it as an impossibility by assuming SHA has a perfectly random distribution.
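To make the difference concrete, here is a minimal sketch of a dedupe write path with and without the byte-for-byte check. It assumes a toy in-memory store; DedupeStore, weak_hash and find_collision are made-up names for illustration, not any vendor's code. The hash is deliberately truncated to one byte so a collision is easy to produce.

```python
import hashlib


def weak_hash(block: bytes) -> bytes:
    # Assumption for the demo only: keep 1 byte of SHA-256 so collisions are trivial.
    return hashlib.sha256(block).digest()[:1]


class DedupeStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self, hash_fn, verify: bool):
        self.hash_fn = hash_fn
        self.verify = verify
        self.blocks = {}  # fingerprint -> list of distinct payloads

    def write(self, data: bytes):
        fp = self.hash_fn(data)
        bucket = self.blocks.setdefault(fp, [])
        if not bucket:
            bucket.append(data)          # first block with this fingerprint
            return (fp, 0)
        if not self.verify:
            # Hash-only dedupe: trust the fingerprint and discard the new data.
            return (fp, 0)
        # Byte-for-byte check: only dedupe against a truly identical block.
        for i, existing in enumerate(bucket):
            if existing == data:
                return (fp, i)
        bucket.append(data)              # same hash, different bytes: store it
        return (fp, len(bucket) - 1)

    def read(self, ref) -> bytes:
        fp, idx = ref
        return self.blocks[fp][idx]


def find_collision(hash_fn):
    # Brute-force two different blocks that share a fingerprint.
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(8, "big")
        fp = hash_fn(block)
        if fp in seen:
            return seen[fp], block
        seen[fp] = block
        i += 1


a, b = find_collision(weak_hash)

unsafe = DedupeStore(weak_hash, verify=False)
unsafe.write(a)
ref_unsafe = unsafe.write(b)

safe = DedupeStore(weak_hash, verify=True)
safe.write(a)
ref_safe = safe.write(b)

print("hash-only read returns the written data:", unsafe.read(ref_unsafe) == b)
print("verified read returns the written data: ", safe.read(ref_safe) == b)
```

In the hash-only store, the second write is silently pointed at the first block, so a later read hands back the wrong data with no error anywhere. The verified store pays one extra comparison on every dedupe hit and keeps both blocks.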

People (engineers like myself) have been bringing this up to the vendors for many years, and every time we're told things like "don't tell the customer unless they specifically ask" and "it'll be a million years before it happens on a single block."

Here's how the vendor's argument goes: you have a petabyte of usable data with a 5% daily change rate. With SHA-1, the chance of a collision on that storage array during its 3-year lifecycle is 10e-17. Just ignore it, it's not an issue.
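For reference, this kind of estimate is usually a birthday-bound calculation. A rough sketch follows, under the same assumption the vendors make (a perfectly uniform hash). The 8 KiB block size and the choice to count every changed block as a new unique block are my assumptions, and collision_probability is a made-up helper name. The result moves with those assumptions, so don't expect it to reproduce the exact figure quoted above.

```python
import math


def collision_probability(num_blocks: float, hash_bits: int) -> float:
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * 2^bits)).

    Only valid if the hash output is uniformly distributed, which is
    exactly the assumption being questioned here.
    """
    space = 2.0 ** hash_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-num_blocks * (num_blocks - 1) / (2.0 * space))


capacity_bytes = 10 ** 15        # 1 PB usable (from the vendor argument)
daily_change = 0.05              # 5% daily change rate (from the vendor argument)
days = 3 * 365                   # 3-year lifecycle (from the vendor argument)
block_size = 8 * 1024            # ASSUMPTION: 8 KiB dedupe block size

# ASSUMPTION: every changed block is a never-before-seen block.
blocks_seen = capacity_bytes / block_size * (1 + daily_change * days)

p_sha1 = collision_probability(blocks_seen, 160)
p_sha256 = collision_probability(blocks_seen, 256)

print(f"blocks hashed over the lifecycle: {blocks_seen:.3e}")
print(f"SHA-1   collision probability (uniform model): {p_sha1:.3e}")
print(f"SHA-256 collision probability (uniform model): {p_sha256:.3e}")
```

Everything in that number hangs on the uniform-output model inside collision_probability. Change the model and the number is meaningless.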

The problem with this is that SHA does not have a perfectly random distribution. We don't know what the distribution is, but we do know that it is in no way fully random. This means some blocks are a lot more likely to hit the same hash, resulting in data corruption.

The other issue is that they only estimate for that one storage array until it's decommissioned. In reality, that data just moves to another array, then another, and the chance keeps adding up while the data is still there - so for the lifetime of the company and its data, likely longer. Oh, you say, but if I find a corrupt block, I'll go to a snapshot, a DR copy, or a backup. Well, if there's a hash collision between 2 blocks on your primary copy, there's a collision on your DR copy, your snapshots, and your backups - they all use dedupe. Of course, you also don't just have one storage array - you have 20.
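A rough sketch of how the per-array number scales once you count every array and every hardware refresh. It assumes each array-lifecycle is an independent trial with the same probability, which is a simplification. The 5 generations and the combined_probability helper are hypothetical; the per-array figure is the one quoted from the vendor, taken literally as written.

```python
import math


def combined_probability(p_single: float, trials: int) -> float:
    """Chance of at least one collision across independent trials:
    1 - (1 - p)^trials, computed with log1p/expm1 to stay accurate for tiny p."""
    return -math.expm1(trials * math.log1p(-p_single))


p_per_array = 10e-17     # vendor figure as quoted above, read as a literal float
arrays = 20              # from the text: you have 20 arrays
generations = 5          # ASSUMPTION: data outlives 5 three-year refresh cycles

p_total = combined_probability(p_per_array, arrays * generations)
print(f"per array-lifecycle: {p_per_array:.1e}")
print(f"across {arrays} arrays x {generations} generations: {p_total:.1e}")
```

For small probabilities this is effectively just multiplication by the number of array-lifecycles, and it still rests on the vendor's own uniform-hash starting figure.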

Here's what happens when someone asks the vendor directly about it: they brush it off and close comments. This one is from 2017:
https://www.dell.com/community/XtremIO/SHA-1-Collision-and-XtremIO-ProofOfConcept/td-p/7122078 [dell.com]

Speak with your wallet, because they haven't listened to words since 2013. When you buy something, you have to specifically ask the vendor for a document stating that they do a byte-for-byte check (readback verification). Words from presales during a meeting mean nothing.

Examples that corrupt your data:
VMware vSAN DRR, like EMC VxRail
EMC XtremIO (XIO)

Will not corrupt your data:
NetApp (NTAP) AFF
Pure
EMC Unity
EMC Isilon

