posted by janrinok on Tuesday May 26 2015, @04:16PM   Printer-friendly
from the patch-immediately dept.

The combination of RAID0 redundancy, an ext4 filesystem, a Linux 4.x kernel, and either Debian Linux or Arch Linux has been associated with data corruption.

El Reg reports: "EXT4 filesystem can EAT ALL YOUR DATA".

Fixes are available; one is explained by Lukas Czerner on the Linux Kernel Mailing List. That post suggests the bug is long-standing, possibly dating back to the 3.12-stable kernel. Others suggest the bug has only manifested in Linux 4.x.

[...] This patch for version 4.x and the patched Linux kernel 3.12.43 LTS both seem like sensible code to contemplate.


[Editor's Comment: Original Submission]

 
  • (Score: 5, Informative) by tibman on Tuesday May 26 2015, @04:49PM

    by tibman (134) Subscriber Badge on Tuesday May 26 2015, @04:49PM (#188133)

    Raid0 is the opposite of redundant. It is a guarantee that if any one drive fails you'll lose data for ALL drives. http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0 [wikipedia.org]

    Anyone running raid 0 does so knowing that their data exists in a fragile bubble.

    --
    SN won't survive on lurkers alone. Write comments.
  • (Score: 4, Informative) by LoRdTAW on Tuesday May 26 2015, @05:34PM

    by LoRdTAW (3755) on Tuesday May 26 2015, @05:34PM (#188157) Journal

    Correct: the proper RAID level for redundancy is RAID 1, also called mirroring.

    RAID 0 is striping, which writes data in parallel across two drives, increasing read/write performance by a factor of up to two. Half of your data is on one disk, half on the other. If you lose one disk, you lose everything. Mitigating this is done using RAID 0+1, which is exactly what you'd expect: two RAID 0 arrays which are mirrored. But you are still stuck with losing half of the storage space, meaning four 1TB disks give you only 2TB. The next level up is RAID 5, which has a storage penalty of one disk, so a four-disk array of 1TB drives yields 3TB. The parity data is spread across all four (or more) disks in a RAID 5 array. You can afford to lose any one of the four disks and still have access to the array. RAID 6 doubles the parity, so you can afford to lose two disks.
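    To make the arithmetic concrete, here's a minimal sketch (my own illustration, not from any RAID tool) of the usable-capacity rules above, for n identical disks:

        #include <stdio.h>

        /* usable terabytes for n identical disks of disk_tb each */
        static double usable_tb(int n, double disk_tb, int level)
        {
            switch (level) {
            case 0:  return n * disk_tb;        /* striping: no redundancy    */
            case 1:  return n * disk_tb / 2;    /* mirrored pairs (1, 0+1)    */
            case 5:  return (n - 1) * disk_tb;  /* one disk's worth of parity */
            case 6:  return (n - 2) * disk_tb;  /* two disks' worth of parity */
            default: return 0;
            }
        }

        int main(void)
        {
            /* the four-disk example from above */
            printf("RAID 0:   %.0fTB\n", usable_tb(4, 1.0, 0));  /* 4TB */
            printf("RAID 0+1: %.0fTB\n", usable_tb(4, 1.0, 1));  /* 2TB */
            printf("RAID 5:   %.0fTB\n", usable_tb(4, 1.0, 5));  /* 3TB */
            printf("RAID 6:   %.0fTB\n", usable_tb(4, 1.0, 6));  /* 2TB */
            return 0;
        }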

    And remember, RAID 1/5/6 is for redundancy to reduce downtime. It is NOT backup.

    • (Score: 2) by jmorris on Tuesday May 26 2015, @07:42PM

      by jmorris (4844) on Tuesday May 26 2015, @07:42PM (#188222)

      Yes, RAID5 would yield 3TB vs. the 2TB of RAID 1+0 (also called RAID 10). Now look at the cost. A one-byte write requires reading all four drives and writing at least two, plus calculating and then recalculating the parity. If done in soft RAID, it does horrid things to the CPU cache as well. RAID 10 not only cuts out two reads; both writes are of the same data, and none of the unmodified data need pass through the CPU cache if the drivers for the drive interfaces are modern. When writing larger blocks it usually needs to run the data through the CPU cache twice, but you might get slightly faster total write throughput in exchange, since three drives are sinking the data (plus one more taking parity info) instead of two (plus two taking redundant copies). Which way the performance vs. capacity balance swings depends on the intended use. And adding a hardware RAID controller eliminates all of the cache considerations at more upfront expense, system complexity, possibly closed array-management utilities, etc.
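      For the curious, here is the classic RAID 5 "small write" cycle sketched in C (the in-memory "disks" are a toy stand-in, just to make the read/XOR/write sequence concrete):

          #include <stdint.h>
          #include <string.h>

          #define BLOCK  4096
          #define NDISKS 4

          static uint8_t disks[NDISKS][BLOCK];  /* toy stand-in for real drives */

          /* update one data block plus the stripe's parity block:
           * two reads, an XOR, two writes */
          static void raid5_small_write(int data_disk, int parity_disk,
                                        const uint8_t new_data[BLOCK])
          {
              uint8_t old_data[BLOCK];

              memcpy(old_data, disks[data_disk], BLOCK);  /* read old data */
              for (int i = 0; i < BLOCK; i++)             /* read old parity and */
                  disks[parity_disk][i] ^=                /* write new parity:   */
                      old_data[i] ^ new_data[i];          /* p ^= old ^ new      */
              memcpy(disks[data_disk], new_data, BLOCK);  /* write new data */
          }

          int main(void)
          {
              uint8_t buf[BLOCK] = { 42 };
              raid5_small_write(0, 3, buf);  /* data on disk 0, parity on disk 3 */
              return 0;
          }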

      And remember, RAID 1/5/6 is for redundancy to reduce downtime.

      Preach brother! Filesystem corruption, accidental deletion, power surge, all these things can totally hose you despite RAID.

      • (Score: 3, Informative) by TheRaven on Wednesday May 27 2015, @01:22PM

        by TheRaven (270) on Wednesday May 27 2015, @01:22PM (#188579) Journal

        A one byte write requires reading all four drives and writing at least two plus calculating and then recalculating the parity

        You don't write bytes to block devices, you write blocks. Blocks are generally stored in the buffer cache and most RAID arrangements write stripes in such a way that they're normally read and cached similarly (e.g. logical blocks n to n+m in an m-way RAID set are blocks 0 to m in a single stripe). You will need to do a read before doing a write if you're writing somewhere that isn't in the buffer cache, but this is a comparatively rare occurrence.
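        A sketch of that layout (illustrative names, not kernel code):

            /* where logical block 'lblock' lives in an m-way RAID 0 set */
            struct loc { unsigned disk; unsigned long long offset; };

            static struct loc raid0_map(unsigned long long lblock, unsigned m)
            {
                struct loc l = { lblock % m,    /* member disk in the stripe */
                                 lblock / m };  /* block offset on that disk */
                return l;
            }
            /* with m = 4, logical blocks 0..3 form one stripe across disks
             * 0..3, so sequential access reads and caches whole stripes */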

        If done in soft RAID it does horrid things to the CPU cache as well.

        That's far less true on modern CPUs. Intel can do xor in the DMA controller, so you don't actually need it to come closer to the CPU than LLC anyway, but even without that most CPUs support non-temporal loads (and will automatically prefetch streaming memory access patterns) and uncached stores, so you will not trample the cache too much. At most you should be killing one way in each associative cache, so an eighth to a quarter of the cache (depending on the CPU), if implemented correctly.

        --
        sudo mod me up
  • (Score: 3, Interesting) by gman003 on Tuesday May 26 2015, @06:05PM

    by gman003 (4155) on Tuesday May 26 2015, @06:05PM (#188170)

    Does anybody really run RAID0 anymore? The canonical use case was for fast throwaway partitions like /tmp, or even swap space, but now that RAM is so plentiful, people run those partitions on ramdisks instead. And if it's too big for a ramdisk, there's SSDs. Is there anything that a) needs higher I/O performance than a single drive, b) can tolerate data loss, and c) is too big to fit in RAM or an SSD?

    I know there are some high-capacity SSDs that secretly RAID0 together two smaller SSDs rather than use a controller that can natively handle that much flash, but that's not really relevant to this sort of issue, and it's a dying practice anyway.

    • (Score: 2) by kaszz on Tuesday May 26 2015, @06:42PM

      by kaszz (4211) on Tuesday May 26 2015, @06:42PM (#188184) Journal

      Perhaps in situations where you have multiple layers of disk packs? Like RAID 5 arrays which are in turn striped together as a RAID 0 volume, etc.

    • (Score: 2) by richtopia on Tuesday May 26 2015, @07:18PM

      by richtopia (3160) on Tuesday May 26 2015, @07:18PM (#188203) Homepage Journal

      I have some tools running RAID0 WD Raptor drives - they were built before SSDs were readily available and the policy is to maintain the original hardware profile if possible.

      Don't get me wrong - this is a silly point of failure for a million dollar tool, but legacy is important!

    • (Score: 4, Interesting) by VLM on Tuesday May 26 2015, @07:28PM

      by VLM (445) on Tuesday May 26 2015, @07:28PM (#188211)

      It's nice for huge logs, both higher-level stuff and packet sniffing at "system-wide" levels, not just monitoring one machine.

      How often is this important? Well, practically never. But it's nice enough to have.

      This rapidly runs into the RAM limitation: WTF are you doing if you generate more than dozens of GB of raw data? It's nice that you gathered it, but what do you propose to usefully do with it in a reasonable period of time? So I can build something bigger than I can find a productive use for.

      I always figured it would be useful for a really wide broadband SDR: no decimation, nothing, just slam gigs/sec onto a drive for later analysis. That would work pretty well with RAID0.

      You can also do stupid nerdy stunts. Floppy drives can't read fast enough to play MP3 files (most can't sustain more than 50K or so) and usually don't store enough data anyway, but if you take a thundering herd of them and plug like 8 external USB floppy drives into a pile of USB hubs and RAID0 them together, then it's sorta usable. It's not nearly as visually impressive, but you can do similar "stupid RAID tricks" with USB flash drives. Supposedly it takes "a lot" of parallel USB flash drives to record live video, at least back in the old days when they were slower; maybe they're fast enough now.

      If you ever get bored and have a pile of USB flash drives or floppy drives, you can do all kinds of lunatic things with RAID. It's "hilarious" to set up a RAID5, push the eject button on a floppy drive, and then hot-add it back in. I guess on a rainy day I can be easily amused, but it seemed fun at the time.

      A little Google work shows I'm not the inventor of this fine idea; there are scant online references to 127-drive USB floppy arrays out there. That would be a beautiful sight to behold.

    • (Score: 2) by slinches on Wednesday May 27 2015, @06:29AM

      by slinches (5049) on Wednesday May 27 2015, @06:29AM (#188487)

      I just configured a new FEA workstation with a RAID0 array of 10k rpm SAS drives. It's actually being used as the primary active data drive, but the configuration was selected to be an effective working directory for analysis runs. Some of these runs can require several terabytes of I/O, with result sets in the hundreds of gigabytes, so high capacity and speed are paramount. Redundancy is then handled by running daily scripted rsync operations to regular SATA drives.

      This setup becomes roughly equivalent to RAID10 in terms of data security, except that it's asynchronous, so a drive failure could cause the loss of some recent data. The benefit of accepting that risk is reduced cost, increased usable space, and better write performance relative to RAID10 with the same number of disks.

      For example:
      8x 1TB SAS 10k rpm drives in a RAID10 array gives 4TB of space with 8x read and 4x write speeds (cost $400 x 8 = $3200)
      6x 1TB SAS 10k rpm drives in RAID0 gives 6TB of space with 6x read and write speeds + 2x 3TB SATA drives (cost $400 x 6 + $150 x 2 = $2700)

    • (Score: 0) by Anonymous Coward on Wednesday May 27 2015, @07:13AM

      by Anonymous Coward on Wednesday May 27 2015, @07:13AM (#188498)

      I believe RAID0 is still big in video editing.

      All those gigabytes of RAM... They can hold a couple of uncompressed frames.

  • (Score: 0) by Anonymous Coward on Tuesday May 26 2015, @06:31PM

    by Anonymous Coward on Tuesday May 26 2015, @06:31PM (#188177)

    Must be the homeopathic version of RAID.

  • (Score: 3, Informative) by gnuman on Tuesday May 26 2015, @09:53PM

    by gnuman (5013) on Tuesday May 26 2015, @09:53PM (#188300)

    Raid0 is the opposite of redundant.

    And that is completely offtopic. It has nothing to do with the bug.

    The bug has everything to do with Linux having macros that look like functions, which results in people not completely understanding the code and making mistakes that are difficult to catch.

              foo(a, b);

    can modify a and b if it's a macro. These problems are among the primary reasons templated functions exist in C++: to get rid of macros. Maybe Linux should adopt a very explicit style convention for macros, maybe _m_#name or anything else that doesn't look like a plain function call.
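    A contrived sketch of the hazard (the macro here is made up, but the kernel's real sector_div() behaves the same way: it divides its first argument in place and evaluates to the remainder):

        #include <stdio.h>

        /* looks like a function call, but expands in place */
        #define halve(x) ((x) >>= 1)

        int main(void)
        {
            int a = 8;
            halve(a);               /* reads like pass-by-value...            */
            printf("a = %d\n", a);  /* ...but prints "a = 4": it wrote to 'a' */
            return 0;
        }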

    • (Score: 1) by Placenta on Tuesday May 26 2015, @10:04PM

      by Placenta (5264) on Tuesday May 26 2015, @10:04PM (#188306)

      Maybe Linux should adopt a subset of C++, even if Torvalds will throw a shit fit about it.

    • (Score: 2) by FatPhil on Wednesday May 27 2015, @09:47AM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Wednesday May 27 2015, @09:47AM (#188539) Homepage
      Looking at the history of that code, I wonder whether this would ever have happened if they hadn't split the code into power-of-2 and non-power-of-2 paths for "performance".

      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 522) unsigned sectors = chunk_sects -
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 523) (likely(is_power_of_2(chunk_sects))
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 524) ? (sector & (chunk_sects-1))
      20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 525) : sector_div(sector, chunk_sects));

      My fucking god - in order to remove *one divide instruction* they're prepared to complicate the code, and destroy people's partitions?!?!? Even if that operation is performed millions of times per day, by thousands of people, it's still only a gain for the world of at most a few seconds per day. And its cost - way, way, way, way, way more than that.
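      To spell out both the trick and the trap (this is a simplified stand-in for the kernel's sector_div(), which on 64-bit builds is a GCC statement-expression macro that divides its first argument in place and yields the remainder):

          #include <assert.h>

          typedef unsigned long long sector_t;

          /* simplified sector_div(): n /= base, evaluates to n % base */
          #define sector_div(n, base) ({                \
                  sector_t _rem = (n) % (base);         \
                  (n) /= (base);                        \
                  _rem;                                 \
          })

          static int is_power_of_2(sector_t n)
          {
              return n != 0 && (n & (n - 1)) == 0;
          }

          int main(void)
          {
              sector_t sector = 1000, chunk_sects = 24;  /* not a power of 2 */

              /* the blamed hunk: the mask is a cheap modulo only when
               * chunk_sects is a power of 2; otherwise it falls back to
               * the dividing macro */
              sector_t sectors = chunk_sects -
                      (is_power_of_2(chunk_sects)
                       ? (sector & (chunk_sects - 1))
                       : sector_div(sector, chunk_sects));

              assert(sectors == 8);   /* the remainder came out right...   */
              assert(sector == 41);   /* ...but 'sector' is no longer 1000 */
              return 0;
          }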

      Let's see the audit logs of that patch:
              Signed-off-by: Kent Overstreet <kmo@daterainc.com>
              Cc: Jens Axboe <axboe@kernel.dk>
              Cc: Martin K. Petersen <martin.petersen@oracle.com>
              Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
              Cc: Keith Busch <keith.busch@intel.com>
              Cc: Vishal Verma <vishal.l.verma@intel.com>
              Cc: Jiri Kosina <jkosina@suse.cz>
              Cc: Neil Brown <neilb@suse.de>

      Not a single non-author Signed-off-by, Acked-by, or Reviewed-by.

      Why was it not reviewed?
        10 files changed, 272 insertions(+), 409 deletions(-)

      Why so long? Check the commit messages, summarised here:

          [SNIP - introduce new facility]

          Then [SNIP - migrate users of old facility to new facility]

      The patch should have been split to make it more reviewable, and then *actually reviewed*.

      Then again, the fix to the above took the wrong approach: it should obviously have restored the value *immediately* after it was mangled by the crappy macro (hey, if the "optimisation" means you end up doing more work, that ain't right...). Let's look at the audit trail of that patch then:

      commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd
      Author: NeilBrown <neilb@suse.de>

              Reported-by: Joe Landman <joe.landman@gmail.com>
              Reported-by: Dave Chinner <david@fromorbit.com>
              Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1
              Cc: stable@vger.kernel.org (3.14 and later).
              Signed-off-by: NeilBrown <neilb@suse.de>

      Yet again, not a single non-Author Signed-off-by, Acked-by, or Reviewed-by.

      Lessons:
      1) review
      2) don't "optimise"
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 3, Interesting) by TheRaven on Wednesday May 27 2015, @01:25PM

      by TheRaven (270) on Wednesday May 27 2015, @01:25PM (#188582) Journal
      The BSD style convention is that unsafe macros (i.e. ones that can't be used without knowing that they're macros) must be in uppercase. Amusingly, Linux contains a few headers (some generic data structure implementations) taken from 4BSD which, on import into Linux, were changed to use lowercase names for the macros. Apparently Linux devs like bugs.
      --
      sudo mod me up
  • (Score: 2) by KritonK on Wednesday May 27 2015, @08:25AM

    by KritonK (465) on Wednesday May 27 2015, @08:25AM (#188521)

    Anyone running raid 0 does so knowing that their data exists in a fragile bubble.

    Precisely.

    This is why I take frequent backups of my RAID 0.