
posted by chromas on Sunday April 19 2020, @10:04AM

Understanding RAID: How performance scales from one disk to eight:

One of the first big challenges neophyte sysadmins and data-hoarding enthusiasts face is how to store more than a single disk's worth of data. The short—and traditional—answer here is RAID (a Redundant Array of Inexpensive Disks), but even then there are many different RAID topologies to choose from.

Most people who implement RAID expect to get extra performance, as well as extra storage, out of all those disks. Those expectations aren't always rooted very firmly in the real world, unfortunately. But since we're all home with time for some technical projects, we hope to shed some light on how to plan for storage performance—not just the total number of gibibytes (GiB) you can cram into an array.

A quick note here: although readers will be interested in the raw numbers, we urge a stronger focus on how they relate to one another. All of our charts relate the performance of RAID arrays at sizes from two to eight disks to the performance of a single disk. If you change the model of disk, your raw numbers will change accordingly—but, for the most part, the relation to a single disk's performance will not.

[...] For all tests, we're using Linux kernel RAID as implemented in kernel version 4.15, along with the ext4 filesystem. We used the --assume-clean parameter when creating our RAID arrays in order to avoid overwriting every block of the array, and we used -E lazy_itable_init=0,lazy_journal_init=0 when creating the ext4 filesystem so that ongoing background initialization writes wouldn't contaminate our tests.
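
For reference, here's a minimal sketch of those two commands, assuming hypothetical device names (/dev/sdb through /dev/sdi for the member disks, /dev/md0 for the array) and RAID6 as the example topology:

    # Create an 8-disk RAID6 array; --assume-clean skips the initial
    # resync, so every block of the array isn't rewritten up front.
    mdadm --create /dev/md0 --level=6 --raid-devices=8 --assume-clean \
        /dev/sd[b-i]

    # Format with ext4, forcing inode tables and the journal to be
    # initialized immediately instead of by background writes later.
    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0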


Original Submission

 
  • (Score: 2) by captain normal on Monday April 20 2020, @05:54PM (4 children)

    by captain normal (2205) on Monday April 20 2020, @05:54PM (#985169)

    We are talking about RAID use here, which is what SMR drives are designed for. It even says that in the article you linked to.

    --
    When life isn't going right, go left.
  • (Score: 2) by RS3 on Monday April 20 2020, @06:47PM (3 children)

    by RS3 (6367) on Monday April 20 2020, @06:47PM (#985182)

    That's not what I read in that linked Tom's Hardware article.

    "compatibility issues have cropped up in RAID and ZFS applications that users have attributed to the unique performance characteristics of the drives."

    Another article from a few days ago went into much more depth: SMR drives can stall for so long that some RAID controllers (software or hardware) think the drive has gone offline, flag the drive as dead, and flag the array as critical. Then you'd better hope it doesn't happen a 2nd time (if RAID-5, or a 3rd time if RAID-6) before someone fixes the problem.

    You could argue that the RAID controller should be able to adapt to the drives, or maybe communicate with them to discover all necessary specs and compensate. But even if that were true, as the article describes, the drives' manufacturers hid the fact that the drives were shingled. Lawyers are waiting to pounce...

    • (Score: 1, Informative) by Anonymous Coward on Monday April 20 2020, @08:42PM (2 children)

      by Anonymous Coward on Monday April 20 2020, @08:42PM (#985216)

      The problem in those WD drives is that, in order to make up for the longer write time, they have huge amounts of cache. If you do certain write/wait/read combinations, instead of returning the cached data, the drive reports an error. That error is why the drive gets flagged as bad, not some sort of busy timeout. Users on the various DataHoarder forums have been running SMR drives for years without problems, until this latest batch cropped up with the firmware error (one that some of the other known WD SMR drives also have).

      • (Score: 2) by RS3 on Monday April 20 2020, @11:40PM (1 child)

        by RS3 (6367) on Monday April 20 2020, @11:40PM (#985277)

        Yes, exactly! In fact, what I read somewhere is that most of the drive is shingled, but a small part is not, and that small conventional part is used as a cache. I think I've had some drives like that, because they'd sit and thrash, kind of like flushing buffers, when nothing was happening in software or the OS.

        Again, my understanding is that it's not the drive reporting an error; rather, the drive stays "busy" for so long that the controller times out and flags the drive as bad.

        Which kind of raises the question: should a long delay be considered a defect?

        • (Score: 0) by Anonymous Coward on Tuesday April 21 2020, @08:52AM

          by Anonymous Coward on Tuesday April 21 2020, @08:52AM (#985377)

          There are two separate issues here. The first is that the firmware itself has a bug where it reports errors on certain combinations of SATA commands. That is how people finally nailed this thing as an SMR drive, since the same bug is present in other SMR drives by WD and software works around it. The second is the interaction between ZFS, ERC, and response time, which is what initially clued people in on something being different here.

          As to your question, the answer is "it depends." This is why ERC (error recovery control) is a thing, and each controller/disk/admin gets to make the choice themselves as to what behavior they want.
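
          For example, on drives that support ERC, smartmontools can read and set those timeouts; a minimal sketch, with /dev/sda as a placeholder device and times given in tenths of a second:

              # Show the drive's current error recovery control settings
              smartctl -l scterc /dev/sda

              # Cap read/write error recovery at 7 seconds each, so the
              # drive reports failure before the controller's own timeout
              smartctl -l scterc,70,70 /dev/sda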