The combination of RAID0 redundancy, an ext4 filesystem, a Linux 4.x kernel, and either Debian Linux or Arch Linux has been associated with data corruption.
El Reg reports EXT4 filesystem can EAT ALL YOUR DATA
Fixes are available, one explained by Lukas Czerner on the Linux Kernel Mailing List. That post suggests the bug is long-standing, possibly as far back as the 3.12-stable kernel. Others suggest the bug has only manifested in Linux 4.x.
[...] This patch for version 4.x and the patched Linux kernel 3.12.43 LTS both seem like sensible code to contemplate.
(Score: 2) by FatPhil on Wednesday May 27 2015, @09:47AM
20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 522) unsigned sectors = chunk_sects -
20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 523) (likely(is_power_of_2(chunk_sects))
20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 524) ? (sector & (chunk_sects-1))
20d0189b (Kent Overstreet 2013-11-23 18:21:01 -0800 525) : sector_div(sector, chunk_sects));
My fucking god - in order to remove *one divide instruction* they're prepared to complicate the code, and destroy people's partitions?!?!? Even if that operation is performed millions of times per day, by thousands of people, it's still only a gain for the world of at most a few seconds per day. And its cost - way, way, way, way, way more than that.
Let's see the audit logs of that patch:
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Neil Brown <neilb@suse.de>
Not a single non-author Signed-off-by, Acked-by, or Reviewed-by.
Why was it not reviewed?
10 files changed, 272 insertions(+), 409 deletions(-)
Why so long? Check the commit messages, summarised here:
[SNIP - introduce new facility]
Then [SNIP - migrate users of old facility to new facility]
Patch should have been split, and more reviewable, and *actually reviewed*.
Then again, the fix to the above was obviously wrong - it should obviously have restored the value (hey - are you implying that the optimisation is actually causing you to do more - that ain't right...) *immediately* after it was mangled by the crappy macro. Let's look at the audit trail of that patch then:
commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd
Author: NeilBrown <neilb@suse.de>
Reported-by: Joe Landman <joe.landman@gmail.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1
Cc: stable@vger.kernel.org (3.14 and later).
Signed-off-by: NeilBrown <neilb@suse.de>
Yet again, not a single non-Author Signed-off-by, Acked-by, or Reviewed-by.
Lessons:
1) review
2) don't "optimise"
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves