posted by janrinok on Tuesday May 26 2015, @04:16PM   Printer-friendly
from the patch-immediately dept.

The combination of RAID0 redundancy, an ext4 filesystem, a Linux 4.x kernel, and either Debian Linux or Arch Linux has been associated with data corruption.

El Reg reports: "EXT4 filesystem can EAT ALL YOUR DATA".

Fixes are available; one is explained by Lukas Czerner on the Linux Kernel Mailing List. That post suggests the bug is long-standing, possibly dating back to the 3.12-stable kernel. Others suggest the bug has only manifested in Linux 4.x.

[...] This patch for version 4.x and the patched Linux kernel 3.12.43 LTS both seem like sensible code to contemplate.


[Editor's Comment: Original Submission]

 
  • (Score: 5, Informative) by tibman on Tuesday May 26 2015, @04:49PM

    by tibman (134) Subscriber Badge on Tuesday May 26 2015, @04:49PM (#188133)

    Raid0 is the opposite of redundant. It is a guarantee that if any one drive fails you'll lose data for ALL drives. http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0 [wikipedia.org]

    Anyone running raid 0 does so knowing that their data exists in a fragile bubble.

    --
    SN won't survive on lurkers alone. Write comments.
  • (Score: 4, Informative) by LoRdTAW on Tuesday May 26 2015, @05:34PM

    by LoRdTAW (3755) on Tuesday May 26 2015, @05:34PM (#188157) Journal

    Correct: the proper RAID level for redundancy is RAID 1, also called mirroring.

    RAID 0 is striping, which writes data in parallel across two drives, increasing read/write performance by a factor of up to two. Half of your data is on one disk, half on the other. If you lose one disk, you lose everything. Mitigating this is done using RAID 0+1, which is exactly what you'd expect: two RAID 0 arrays which are mirrored. But you are still stuck with losing half of the storage space, meaning four 1TB disks give you only 2TB. The next level up is RAID 5, which has a storage penalty of one disk, so a four-disk array of 1TB drives yields 3TB. The parity data is spread across all four (or more) disks in a RAID 5 array. You can afford to lose any one of the four disks and still have access to the array. RAID 6 doubles the parity, so you can afford to lose two disks.
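    To make the arithmetic concrete, here's a minimal sketch (my own illustration, not from any RAID tool) of the usable-capacity rules above, for n identical disks:

        #include <stdio.h>

        /* usable terabytes for n identical disks of disk_tb each */
        static double usable_tb(int n, double disk_tb, int level)
        {
            switch (level) {
            case 0:  return n * disk_tb;        /* striping: no redundancy    */
            case 1:  return n * disk_tb / 2;    /* mirrored pairs (1, 0+1)    */
            case 5:  return (n - 1) * disk_tb;  /* one disk's worth of parity */
            case 6:  return (n - 2) * disk_tb;  /* two disks' worth of parity */
            default: return 0;
            }
        }

        int main(void)
        {
            /* the four-disk example from above */
            printf("RAID 0:   %.0fTB\n", usable_tb(4, 1.0, 0));  /* 4TB */
            printf("RAID 0+1: %.0fTB\n", usable_tb(4, 1.0, 1));  /* 2TB */
            printf("RAID 5:   %.0fTB\n", usable_tb(4, 1.0, 5));  /* 3TB */
            printf("RAID 6:   %.0fTB\n", usable_tb(4, 1.0, 6));  /* 2TB */
            return 0;
        }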

    And remember, RAID 1/5/6 is for redundancy to reduce downtime. It is NOT backup.

    • (Score: 2) by jmorris on Tuesday May 26 2015, @07:42PM

      by jmorris (4844) on Tuesday May 26 2015, @07:42PM (#188222)

      Yes, RAID5 would yield 3TB vs. the 2TB of RAID 1+0 (also called RAID 10). Now look at the cost. A one-byte write requires reading all four drives and writing at least two, plus calculating and then recalculating the parity. If done in soft RAID, it does horrid things to the CPU cache as well. RAID 10 not only cuts out two reads; both writes are of the same data, and none of the unmodified data need pass through the CPU cache if the drivers for the drive interfaces are modern. When writing larger blocks it usually needs to run the data through the CPU cache twice, but you might get slightly faster total write throughput in exchange, since three drives are sinking the data (plus one more taking parity info) instead of two (plus two taking redundant copies). Which way the performance vs. capacity balance swings depends on the intended use. And adding a hardware RAID controller eliminates all of the cache considerations at more upfront expense, system complexity, possibly closed array-management utilities, etc.
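      For the curious, here is the classic RAID 5 "small write" cycle sketched in C (the in-memory "disks" are a toy stand-in, just to make the read/XOR/write sequence concrete):

          #include <stdint.h>
          #include <string.h>

          #define BLOCK  4096
          #define NDISKS 4

          static uint8_t disks[NDISKS][BLOCK];  /* toy stand-in for real drives */

          /* update one data block plus the stripe's parity block:
           * two reads, an XOR, two writes */
          static void raid5_small_write(int data_disk, int parity_disk,
                                        const uint8_t new_data[BLOCK])
          {
              uint8_t old_data[BLOCK];

              memcpy(old_data, disks[data_disk], BLOCK);  /* read old data */
              for (int i = 0; i < BLOCK; i++)             /* read old parity and */
                  disks[parity_disk][i] ^=                /* write new parity:   */
                      old_data[i] ^ new_data[i];          /* p ^= old ^ new      */
              memcpy(disks[data_disk], new_data, BLOCK);  /* write new data */
          }

          int main(void)
          {
              uint8_t buf[BLOCK] = { 42 };
              raid5_small_write(0, 3, buf);  /* data on disk 0, parity on disk 3 */
              return 0;
          }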

      And remember, RAID 1/5/6 is for redundancy to reduce downtime.

      Preach brother! Filesystem corruption, accidental deletion, power surge, all these things can totally hose you despite RAID.

      • (Score: 3, Informative) by TheRaven on Wednesday May 27 2015, @01:22PM

        by TheRaven (270) on Wednesday May 27 2015, @01:22PM (#188579) Journal

        A one byte write requires reading all four drives and writing at least two plus calculating and then recalculating the parity

        You don't write bytes to block devices, you write blocks. Blocks are generally stored in the buffer cache and most RAID arrangements write stripes in such a way that they're normally read and cached similarly (e.g. logical blocks n to n+m in an m-way RAID set are blocks 0 to m in a single stripe). You will need to do a read before doing a write if you're writing somewhere that isn't in the buffer cache, but this is a comparatively rare occurrence.
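        A sketch of that layout (illustrative names, not kernel code):

            /* where logical block 'lblock' lives in an m-way RAID 0 set */
            struct loc { unsigned disk; unsigned long long offset; };

            static struct loc raid0_map(unsigned long long lblock, unsigned m)
            {
                struct loc l = { lblock % m,    /* member disk in the stripe */
                                 lblock / m };  /* block offset on that disk */
                return l;
            }
            /* with m = 4, logical blocks 0..3 form one stripe across disks
             * 0..3, so sequential access reads and caches whole stripes */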

        If done in soft RAID it does horrid things to the CPU cache as well.

        That's far less true on modern CPUs. Intel can do xor in the DMA controller, so you don't actually need it to come closer to the CPU than LLC anyway, but even without that most CPUs support non-temporal loads (and will automatically prefetch streaming memory access patterns) and uncached stores, so you will not trample the cache too much. At most you should be killing one way in each associative cache, so an eighth to a quarter of the cache (depending on the CPU), if implemented correctly.

        --
        sudo mod me up
  • (Score: 3, Interesting) by gman003 on Tuesday May 26 2015, @06:05PM

    by gman003 (4155) on Tuesday May 26 2015, @06:05PM (#188170)

    Does anybody really run RAID0 anymore? The canonical use case was for fast throwaway partitions like /tmp, or even swap space, but now that RAM is so plentiful, people run those partitions on ramdisks instead. And if it's too big for a ramdisk, there's SSDs. Is there anything that a) needs higher I/O performance than a single drive, b) can tolerate data loss, and c) is too big to fit in RAM or an SSD?

    I know there are some high-capacity SSDs that secretly RAID0 together two smaller SSDs rather than use a controller that can natively handle that much flash, but that's not really relevant to this sort of issue, and it's a dying practice anyway.

    • (Score: 2) by kaszz on Tuesday May 26 2015, @06:42PM

      by kaszz (4211) on Tuesday May 26 2015, @06:42PM (#188184) Journal

      Perhaps in situations where you have multiple layers of disk packs? Like RAID 5 arrays which are in turn striped together as a RAID 0 volume, etc.

    • (Score: 2) by richtopia on Tuesday May 26 2015, @07:18PM

      by richtopia (3160) on Tuesday May 26 2015, @07:18PM (#188203) Homepage Journal

      I have some tools running RAID0 WD Raptor drives - they were built before SSDs were readily available and the policy is to maintain the original hardware profile if possible.

      Don't get me wrong - this is a silly point of failure for a million dollar tool, but legacy is important!

    • (Score: 4, Interesting) by VLM on Tuesday May 26 2015, @07:28PM

      by VLM (445) on Tuesday May 26 2015, @07:28PM (#188211)

      It's nice for huge logs, both higher-level stuff and packet sniffing at "system-wide" levels, not just monitoring one machine.

      How often is this important? Well, practically never. But it's nice enough to have.

      This rapidly runs into the RAM limitation: WTF are you doing if you generate more than dozens of GB of raw data? It's nice that you gathered it, but what do you propose to usefully do with it in a reasonable period of time? So I can build something bigger than I can find a productive use for.

      I always figured it would be useful for a really wide broadband SDR: no decimation, nothing, just slam gigs/sec onto a drive for later analysis. That would work pretty well with RAID0.

      You can also do stupid nerdy stunts. Floppy drives can't read fast enough to play MP3 files (most can't sustain more than 50K or so) and usually don't store enough data anyway, but if you take a thundering herd of them and plug like 8 external USB floppy drives into a pile of USB hubs and RAID0 them together, then it's sorta usable. It's not nearly as visually impressive, but you can do similar "stupid RAID tricks" with USB flash drives. Supposedly it takes "a lot" of parallel USB flash drives to record live video, at least back in the old days when they were slower; maybe they're fast enough now.

      If you ever get bored and have a pile of USB flash drives or floppy drives, you can do all kinds of lunatic things with RAID. It's "hilarious" to set up a RAID5, push the eject button on a floppy drive, and then hot-add it back in. I guess on a rainy day I can be easily amused, but it seemed fun at the time.

      A little Google work shows I'm not the inventor of this fine idea; there are scant online references to 127-drive USB floppy arrays out there. That would be a beautiful sight to behold.

    • (Score: 2) by slinches on Wednesday May 27 2015, @06:29AM

      by slinches (5049) on Wednesday May 27 2015, @06:29AM (#188487)

      I just configured a new FEA workstation with a RAID0 array of 10k rpm SAS drives. It's actually being used as the primary active data drive, but the configuration was selected to be an effective working directory for analysis runs. Some of these runs can require several terabytes of I/O, with result sets in the hundreds of gigabytes, so high capacity and speed are paramount. Redundancy is then handled by running daily scripted rsync operations to regular SATA drives.

      This setup becomes roughly equivalent to RAID10 in terms of data security, except that it's asynchronous, so a drive failure could cause the loss of some recent data. The benefit of accepting that risk is reduced cost, increased usable space, and better write performance relative to RAID10 with the same number of disks.

      For example:
      8x 1TB SAS 10k rpm drives in a RAID10 array gives 4TB of space with 8x read and 4x write speeds (cost $400 x 8 = $3200)
      6x 1TB SAS 10k rpm drives in RAID0 gives 6TB of space with 6x read and write speeds + 2x 3TB SATA drives (cost $400 x 6 + $150 x 2 = $2700)

    • (Score: 0) by Anonymous Coward on Wednesday May 27 2015, @07:13AM

      by Anonymous Coward on Wednesday May 27 2015, @07:13AM (#188498)

      I believe RAID0 is still big in video editing.

      All those gigabytes of RAM... They can hold a couple of uncompressed frames.

  • (Score: 0) by Anonymous Coward on Tuesday May 26 2015, @06:31PM

    by Anonymous Coward on Tuesday May 26 2015, @06:31PM (#188177)

    Must be the homeopathic version of RAID.

  • (Score: 3, Informative) by gnuman on Tuesday May 26 2015, @09:53PM

    by gnuman (5013) on Tuesday May 26 2015, @09:53PM (#188300)

    Raid0 is the opposite of redundant.

    And that is completely offtopic. It has nothing to do with the bug.

    The bug has everything to do with Linux having macros that look like functions, which results in people not completely understanding the code and making mistakes that are difficult to catch.

              foo(a, b);

    can modify a and b if it's a macro. These problems are among the primary reasons templated functions exist in C++: to get rid of macros. Maybe Linux should adopt a very explicit style convention for macros, maybe _m_#name or anything else that doesn't look like a plain function call.
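    A contrived sketch of the hazard (the macro here is made up, but the kernel's real sector_div() behaves the same way: it divides its first argument in place and evaluates to the remainder):

        #include <stdio.h>

        /* looks like a function call, but expands in place */
        #define halve(x) ((x) >>= 1)

        int main(void)
        {
            int a = 8;
            halve(a);               /* reads like pass-by-value...            */
            printf("a = %d\n", a);  /* ...but prints "a = 4": it wrote to 'a' */
            return 0;
        }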

    • (Score: 1) by Placenta on Tuesday May 26 2015, @10:04PM

      by Placenta (5264) on Tuesday May 26 2015, @10:04PM (#188306)

      Maybe Linux should adopt a subset of C++, even if Torvalds will throw a shit fit about it.

    • (Score: 2) by FatPhil on Wednesday May 27 2015, @09:47AM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Wednesday May 27 2015, @09:47AM (#188539) Homepage
      Looking at the history of that code, I wonder whether this would ever have happened if they hadn't split the code into power-of-2 and non-power-of-2 paths for "performance".

      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 522) unsigned sectors = chunk_sects -
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 523) (likely(is_power_of_2(chunk_sects))
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 524) ? (sector & (chunk_sects-1))
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 525) : sector_div(sector, chunk_sects));

      My fucking god - in order to remove *one divide instruction* they're prepared to complicate the code, and destroy people's partitions?!?!? Even if that operation is performed millions of times per day, by thousands of people, it's still only a gain for the world of at most a few seconds per day. And its cost - way, way, way, way, way more than that.
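      To spell out both the trick and the trap (this is a simplified stand-in for the kernel's sector_div(), which on 64-bit builds is a GCC statement-expression macro that divides its first argument in place and yields the remainder):

          #include <assert.h>

          typedef unsigned long long sector_t;

          /* simplified sector_div(): n /= base, evaluates to n % base */
          #define sector_div(n, base) ({                \
                  sector_t _rem = (n) % (base);         \
                  (n) /= (base);                        \
                  _rem;                                 \
          })

          static int is_power_of_2(sector_t n)
          {
              return n != 0 && (n & (n - 1)) == 0;
          }

          int main(void)
          {
              sector_t sector = 1000, chunk_sects = 24;  /* not a power of 2 */

              /* the blamed hunk: the mask is a cheap modulo only when
               * chunk_sects is a power of 2; otherwise it falls back to
               * the dividing macro */
              sector_t sectors = chunk_sects -
                      (is_power_of_2(chunk_sects)
                       ? (sector & (chunk_sects - 1))
                       : sector_div(sector, chunk_sects));

              assert(sectors == 8);   /* the remainder came out right...   */
              assert(sector == 41);   /* ...but 'sector' is no longer 1000 */
              return 0;
          }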

      Let's see the audit logs of that patch:
              Signed-off-by: Kent Overstreet <kmo@daterainc.com>
              Cc: Jens Axboe <axboe@kernel.dk>
              Cc: Martin K. Petersen <martin.petersen@oracle.com>
              Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
              Cc: Keith Busch <keith.busch@intel.com>
              Cc: Vishal Verma <vishal.l.verma@intel.com>
              Cc: Jiri Kosina <jkosina@suse.cz>
              Cc: Neil Brown <neilb@suse.de>

      Not a single non-author Signed-off-by, Acked-by, or Reviewed-by.

      Why was it not reviewed?
        10 files changed, 272 insertions(+), 409 deletions(-)

      Why so long? Check the commit messages, summarised here:

          [SNIP - introduce new facility]

          Then [SNIP - migrate users of old facility to new facility]

      The patch should have been split to make it more reviewable, and then *actually reviewed*.

      Then again, the fix to the above took the wrong approach: it should obviously have restored the value *immediately* after it was mangled by the crappy macro (hey, if the "optimisation" means you end up doing more work, that ain't right...). Let's look at the audit trail of that patch then:

      commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd
      Author: NeilBrown <neilb@suse.de>

              Reported-by: Joe Landman <joe.landman@gmail.com>
              Reported-by: Dave Chinner <david@fromorbit.com>
              Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1
              Cc: stable@vger.kernel.org (3.14 and later).
              Signed-off-by: NeilBrown <neilb@suse.de>

      Yet again, not a single non-Author Signed-off-by, Acked-by, or Reviewed-by.

      Lessons:
      1) review
      2) don't "optimise"
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 3, Interesting) by TheRaven on Wednesday May 27 2015, @01:25PM

      by TheRaven (270) on Wednesday May 27 2015, @01:25PM (#188582) Journal
      The BSD style convention is that unsafe macros (i.e. ones that can't be used without knowing that they're macros) must be in uppercase. Amusingly, Linux contains a few headers (some generic data structure implementations) taken from 4BSD which, on import into Linux, were changed to use lowercase names for the macros. Apparently Linux devs like bugs.
      --
      sudo mod me up
  • (Score: 2) by KritonK on Wednesday May 27 2015, @08:25AM

    by KritonK (465) on Wednesday May 27 2015, @08:25AM (#188521)

    Anyone running raid 0 does so knowing that their data exists in a fragile bubble.

    Precisely.

    This is why I take frequent backups of my RAID 0.