SoylentNews is people

posted by LaminatorX on Sunday December 21 2014, @02:05AM
from the fsking-pid0 dept.

A Debian user has recently discovered that systemd prevents the skipping of fsck while booting:

With init, skipping a scheduled fsck during boot was easy, you just pressed Ctrl+c, it was obvious! Today I was late for an online conference. I got home, turned on my computer, and systemd decided it was time to run fsck on my 1TB hard drive. Ok, I just skip it, right? Well, Ctrl+c does not work, ESC does not work, nothing seems to work. I Googled for an answer on my phone but nothing. So, is there a mysterious set of commands they came up with to skip an fsck or is it yet another flaw?

One user chimed in with a hack to work around the flaw, but it involved specifying an argument on the kernel command line. Another user described this so-called "fix" as being "Pretty damn inconvenient and un-discoverable", while yet another pointed out that the "fix" merely prevents "systemd from running fsck in the first place", and it "does not let you cancel a systemd-initiated boot-time fsck which is already in progress."

Further investigation showed that this is a known bug with systemd that was first reported in mid-2011, and remains unfixed as of late December 2014. At least one other user has also fallen victim to this bug.

How could a severe bug of this nature even happen in the first place? How can it remain unfixed over three years after it was first reported?

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by Immerman on Sunday December 21 2014, @06:45AM

    by Immerman (3985) on Sunday December 21 2014, @06:45AM (#127941)

    >Drives don't flip bits for no reason.

    Of course they do - hence the name "random error". Back in the day you could expect an error on average in one bit out of every 10^14 bits read. That's roughly one bit in every 12.5 terabytes - not so bad back in the day, when that was a ferocious amount of data transfer for a 100MB drive at 50mb/s. Today though, it probably rears its head a few times in the lifetime of a 1TB drive. And that's traditional, simple, crude even, HD technology. Newer tech... well, a lot of it hasn't even been out long enough to determine realistic real-world error rates. To say nothing of SSDs - where I just found numbers in the 1 in 10^8 range. That's 1 bit in every 12.5MB, or a dozen or more errors every second if you're saturating the SATA bus. Seems ridiculous to me, so presumably there's a lot of error correction going on behind the scenes, but even that has its limits.
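For anyone who wants to check the arithmetic, here is a rough Python sketch that converts a bit error rate into "bytes read per error" and "errors per second". The function names are made up for illustration, and the 600 MB/s figure is only an assumed SATA throughput, not a number from this thread.

```python
def bytes_per_error(bit_error_rate: float) -> float:
    """Average number of bytes read for each single-bit error."""
    bits_per_error = 1.0 / bit_error_rate
    return bits_per_error / 8.0


def errors_per_second(bit_error_rate: float, bytes_per_second: float) -> float:
    """Expected bit errors per second at a sustained transfer rate."""
    return bit_error_rate * bytes_per_second * 8.0


# Rates quoted in the comment: 1 in 10^14 (hard drive), 1 in 10^8 (raw SSD).
hdd_bytes = bytes_per_error(1e-14)
ssd_bytes = bytes_per_error(1e-8)

# ASSUMPTION: 600 MB/s as a stand-in for a saturated SATA link.
ssd_errors_per_sec = errors_per_second(1e-8, 600e6)

print(f"HDD: one bit error per {hdd_bytes / 1e12:.1f} TB read")
print(f"SSD: one bit error per {ssd_bytes / 1e6:.1f} MB read")
print(f"SSD at 600 MB/s: about {ssd_errors_per_sec:.0f} errors per second")
```

Slower links scale the last figure down proportionally, so the exact number depends entirely on the assumed throughput.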

  • (Score: 3, Interesting) by sjames on Sunday December 21 2014, @07:44AM

    by sjames (2882) on Sunday December 21 2014, @07:44AM (#127952) Journal

    A simple read error is not a big deal, except sometimes for performance: just read again until the checksum verifies the data. Often that happens at the hardware level. Write errors or bit flipping on the media IS a big deal, since that is actually corrupt data. Worst of all is when the bit flips before the checksum is computed. That can cause silent corruption.

    That's why BTRFS and ZFS do checksumming at the file system level and support RAID-like storage. Unlike a device-level RAID, they can decide which disk has the correct data when a bit flips and can then re-write the correct data to the other drive. Even on a single drive, BTRFS likes to write two copies of the metadata.
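The "pick the good copy and repair the bad one" idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `make_block`, `is_valid`, and `read_with_repair` are hypothetical names, `zlib.crc32` is a stand-in checksum, and the checksum is stored next to the data for simplicity, which is not how a real file system lays things out.

```python
import zlib


def make_block(data: bytes) -> dict:
    """Store data together with a checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_valid(block: dict) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block["data"]) == block["crc"]


def read_with_repair(copies: list) -> bytes:
    """Return data from the first copy that verifies, rewriting any bad copies."""
    good = next((c for c in copies if is_valid(c)), None)
    if good is None:
        raise IOError("all copies failed checksum verification")
    for c in copies:
        if not is_valid(c):
            # Overwrite the corrupt copy with the verified one.
            c["data"] = good["data"]
            c["crc"] = good["crc"]
    return good["data"]


# Two mirrored copies of the same block.
mirror = [make_block(b"important metadata"), make_block(b"important metadata")]

# Simulate a single bit flipping on the first "disk".
damaged = bytearray(mirror[0]["data"])
damaged[0] ^= 0x01
mirror[0]["data"] = bytes(damaged)

recovered = read_with_repair(mirror)
print(recovered, all(is_valid(c) for c in mirror))
```

A plain two-disk mirror without checksums would see two differing copies and have no way to tell which one is right, which is the point being made above.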

  • (Score: 2) by FatPhil on Sunday December 21 2014, @08:44AM

    by FatPhil (863) <reversethis-{if.fdsa} {ta} {tnelyos-cp}> on Sunday December 21 2014, @08:44AM (#127965) Homepage
    1TB is ~10^13 bits. At a 10^-14 bit error rate, an error will happen roughly every tenth time you do a full backup. If hard disks get much larger they're going to have to start incorporating much fancier error detection/correction.
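A quick way to sanity-check that estimate, as a sketch only: the snippet assumes bit errors are independent, takes 1TB as 8 x 10^12 bits (decimal terabyte), and uses a made-up helper name, `p_at_least_one_error`.

```python
import math


def p_at_least_one_error(bits_read: float, bit_error_rate: float) -> float:
    """Probability of one or more bit errors, assuming independent errors.

    Computes 1 - (1 - ber)^bits via log1p/expm1 to avoid rounding problems
    with very small rates.
    """
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))


BITS_PER_TB = 8 * 10**12   # ASSUMPTION: decimal terabyte
BER = 1e-14                # rate quoted in the thread

p_full_read = p_at_least_one_error(BITS_PER_TB, BER)
reads_between_errors = 1.0 / (BITS_PER_TB * BER)

print(f"chance of an error in one full 1TB read: {p_full_read:.1%}")
print(f"average full reads per error: {reads_between_errors:.1f}")
```

With the exact bit count the figure comes out nearer every twelfth full read than every tenth, which is the same ballpark once 1TB is rounded up to 10^13 bits.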
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  • (Score: 2) by cafebabe on Thursday December 25 2014, @08:33AM

    by cafebabe (894) on Thursday December 25 2014, @08:33AM (#129060) Journal

    A similar problem occurs with networking. For a few packets over a few segments, 16-bit checksums may be sufficient. However, 0.01% corruption of Jumbo Frames over 13 hops leads to silent corruption approximately once per hour. Worse links or small packets may lead to significantly higher rates of corruption.

    >presumably there's a lot of error correction going on behind the scenes, but even that has its limits.

    Unfortunately not. One person's payload is another person's header. So, if you aren't processing the payload immediately, the corruption is silent. Even if you are processing the payload immediately, corruption may elude validation or parsing.
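As a small illustration of corruption eluding a 16-bit checksum, here is a Python sketch of the ones' complement Internet checksum (the RFC 1071 style used by IP, TCP and UDP). Treating that as the checksum meant above is an assumption, and the payload bytes are invented for the demo.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd\x00\x01\xff\xfe"

# Corruption 1: the first two 16-bit words arrive in swapped order.
swapped = b"\xab\xcd\x12\x34\x00\x01\xff\xfe"

# Corruption 2: a single bit flipped in the first byte.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print("swap detected:", internet_checksum(swapped) != internet_checksum(original))
print("flip detected:", internet_checksum(flipped) != internet_checksum(original))
```

Because the sum is order-independent, the swapped payload verifies even though the data is wrong, while the single bit flip is caught. Random multi-bit damage has a small but nonzero chance of slipping through in the same way.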

    --
    1702845791×2