posted by LaminatorX on Sunday December 21 2014, @02:05AM   Printer-friendly
from the fsking-pid0 dept.

A Debian user recently discovered that systemd provides no way to skip a scheduled fsck during boot:

With init, skipping a scheduled fsck during boot was easy, you just pressed Ctrl+c, it was obvious! Today I was late for an online conference. I got home, turned on my computer, and systemd decided it was time to run fsck on my 1TB hard drive. Ok, I just skip it, right? Well, Ctrl+c does not work, ESC does not work, nothing seems to work. I Googled for an answer on my phone but nothing. So, is there a mysterious set of commands they came up with to skip an fsck or is it yet another flaw?

One user chimed in with a hack to work around the flaw, but it involved specifying an argument on the kernel command line. Another user described this so-called "fix" as being "Pretty damn inconvenient and un-discoverable", while yet another pointed out that the "fix" merely prevents "systemd from running fsck in the first place", and it "does not let you cancel a systemd-initiated boot-time fsck which is already in progress."
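The submission doesn't name the argument, but systemd's fsck units document an fsck.mode= switch on the kernel command line; assuming that is the workaround being discussed, it would look something like this when editing the boot entry at the GRUB prompt (the kernel path and root device below are placeholders, not values from the thread):

    linux /vmlinuz-<version> root=/dev/<rootdev> ro quiet fsck.mode=skip

As the commenters note, this only keeps systemd from starting the check in the first place; it cannot cancel a check that is already running.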

Further investigation showed that this is a known bug with systemd that was first reported in mid-2011, and remains unfixed as of late December 2014. At least one other user has also fallen victim to this bug.

How could a severe bug of this nature even happen in the first place? How can it remain unfixed over three years after it was first reported?

 
  • (Score: 2, Interesting) by tftp on Sunday December 21 2014, @02:29AM

    by tftp (806) on Sunday December 21 2014, @02:29AM (#127877) Homepage

    In the 2000s I was running quite a lot of Linux, and this problem always bothered me. It looked like the computer's needs were placed above the user's. By comparison, Windows never ran a disk check unless the FS was in pretty bad shape. These days I have either servers (Ubuntu LTS) that aren't frequently rebooted, or desktops (Mint) in a VM that never need fsck. Maybe journaling filesystems help here, as the integrity of the FS can be determined without going through terabytes of data. Running the check on a modern HDD may be an hour-long distraction, and the OS most certainly should not run it without a positive confirmation.

  • (Score: 2) by sjames on Sunday December 21 2014, @04:39AM

    by sjames (2882) on Sunday December 21 2014, @04:39AM (#127922) Journal

    How is it that you KNOW that no bit got flipped anywhere on the drive?

    Out of an abundance of caution, Linux wants to do a full fsck periodically. If you don't want that, you can disable the periodic fsck and run it manually at a time of your choosing. Up until systemd reared its ugly head, you also had the ability to cancel the fsck if you wanted.

    • (Score: 1) by tftp on Sunday December 21 2014, @05:41AM

      by tftp (806) on Sunday December 21 2014, @05:41AM (#127931) Homepage

      How is it that you KNOW that no bit got flipped anywhere on the drive?

      Drives don't flip bits for no reason. Every data record (sector) has a checksum. If the sector reads OK but a bit is flipped, it's because you wrote it flipped. Don't do that.

      One of the primary reasons fsck was needed in the days of ext2 was the absence of a journal, which meant that an abrupt reset of the computer could wreck the FS. The same was the norm in the days of Windows 95 (FAT). However, ext3/ext4 on the Linux side (not even mentioning other journaling filesystems) and NTFS on Windows reduced the need for a full scan, because everything one needs to know for recovery is, generally, in the journal. Forcing a check of the filesystem every so many boots just demonstrates either extreme paranoia or a lack of trust in the FS code. Perhaps back then, 15 years ago, it was a reasonable measure, considering the state of the filesystems in Linux; it is strange to encounter such a thing today. How often does a modern Windows box force you to rescan the HDD? IIRC, it happens only if an error is detected during normal operation - and then the rescan is scheduled for the next reboot. And yes, you can cancel it :-)

      • (Score: 2) by sjames on Sunday December 21 2014, @06:37AM

        by sjames (2882) on Sunday December 21 2014, @06:37AM (#127940) Journal

        They don't flip for no reason; they flip for a variety of reasons: a spike in vibration while writing an adjacent track, EMI in the drive cable, a stray cosmic ray, power glitches, etc.

        In SOME of those cases the checksum doesn't match, but you won't know that unless something like a full fsck comes along and detects it for you. In others, the checksum will have been generated AFTER the data got corrupted and so it will perfectly validate the incorrect metadata. Fsck can sanity check that for you.

        Journaling in the file system is a great advancement, but it isn't a panacea. Paranoid? Perhaps, but we're talking about servers, not glorified solitaire and minesweeper machines :-)

        But Linux does respect user choice. You can use the -c and -i options of tune2fs to change the fsck interval or disable the periodic fsck entirely. As long as you avoid systemd, you can cancel an fsck if it's not a good time, or run it manually at a time of your choosing, which also resets the countdown.
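        For reference, a minimal sketch of what that looks like on an ext3/ext4 volume (the device name below is a placeholder; point it at your actual partition, and run the manual check on an unmounted filesystem):

            tune2fs -l /dev/sdXN | grep -Ei 'mount count|check'   # show the current schedule
            tune2fs -c 0 -i 0 /dev/sdXN                           # disable the periodic check entirely
            e2fsck -f /dev/sdXN                                   # or check it yourself now; this also resets the counters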

        • (Score: 1) by tftp on Sunday December 21 2014, @06:55AM

          by tftp (806) on Sunday December 21 2014, @06:55AM (#127943) Homepage

          Paranoid? Perhaps, but we're talking about servers, not glorified solitaire and minesweeper machines :-)

          I agree about servers; but I recall that the darned thing activated when I powered up the laptop at a meeting :-) Don't even remember what it was, SUSE or RedHat.

          In others, the checksum will have been generated AFTER the data got corrupted and so it will perfectly validate the incorrect metadata.

          Checksums are calculated and checked by the hardware; the data rate is way too high to do it in the CPU. Besides, it has to be done atomically, at the sector level, when the HDD rewrites a sector. But sure, there is always a possibility to screw things up. It may be reasonable to be extra careful on servers. However, desktops are just fine with lazy verification.

          • (Score: 2) by sjames on Sunday December 21 2014, @09:17AM

            by sjames (2882) on Sunday December 21 2014, @09:17AM (#127973) Journal

            I agree about servers; but I recall that the darned thing activated when I powered up the laptop at a meeting :-) Don't even remember what it was, SUSE or RedHat.

            An excellent example of why fsck might need to be cancelled.

            You know which use case best applies to your machine, so you must configure it if you want lazy verification.

      • (Score: 2) by Immerman on Sunday December 21 2014, @06:45AM

        by Immerman (3985) on Sunday December 21 2014, @06:45AM (#127941)

        >Drives don't flip bits for no reason.

        Of course they do - hence the name "random error". Back in the day you could expect an error, on average, in one bit out of every 10^14 bits read - roughly one flipped bit per ten-odd terabytes, not so bad back when that was a ferocious amount of data transfer for a 100MB drive at 50mb/s. Today, though, it probably rears its head a few times in the lifetime of a 1TB drive. And that's traditional, simple, even crude HD technology. Newer tech... well, a lot of it hasn't even been out long enough to determine realistic real-world error rates. To say nothing of SSDs - where I just found numbers in the 1 in 10^8 range. That's 1 bit in every 100MB, or one error every few seconds if you're saturating the SATA bus. Seems ridiculous to me, so presumably there's a lot of error correction going on behind the scenes, but even that has its limits.

        • (Score: 3, Interesting) by sjames on Sunday December 21 2014, @07:44AM

          by sjames (2882) on Sunday December 21 2014, @07:44AM (#127952) Journal

          A simple read error is not a big deal, except sometimes for performance: just read again until the checksum verifies the data. Often that happens at the hardware level. Write errors or bits flipped on the media ARE a big deal, since that is actually corrupt data. Worst of all is when the flipped bit happens prior to checksumming. That can cause silent corruption.

          That's why BTRFS and ZFS do checksumming at the file system level and support raid-like storage. Unlike a device level RAID, they can decide which disk has the correct data in cases of bit flip and can then re-write the correct data to the other drive. Even on a single drive, btrfs likes to write two copies of the metadata.
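          To make that concrete, a rough sketch of the btrfs side (the mount point and device are placeholders, not anything from this thread):

              btrfs scrub start /mnt/data            # walk all checksums in the background
              btrfs scrub status /mnt/data           # report progress and any corrected errors
              mkfs.btrfs -m dup -d single /dev/sdX   # single disk: keep two copies of metadata ("dup") explicitly

          ZFS exposes the same idea as "zpool scrub <pool>", mentioned elsewhere in this discussion.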

        • (Score: 2) by FatPhil on Sunday December 21 2014, @08:44AM

          by FatPhil (863) <reversethis-{if.fdsa} {ta} {tnelyos-cp}> on Sunday December 21 2014, @08:44AM (#127965) Homepage
          1TB is ~10^13 bits. A 10^-14 error will happen every tenth time you do a full backup. If hard disks get much larger they're going to have to start incorporating much fancier error detection/correction.
          --
          Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
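          (Back-of-envelope check of the figure above, taking a 1TB drive as 8 x 10^12 bits and the error rate as one bit per 10^14 bits read:

              echo '8 * 10^12 / 10^14' | bc -l    # -> 0.08 expected errors per full read

          i.e. roughly one error every ten-odd full passes over the drive, which is the parent's "every tenth time you do a full backup" to within rounding.)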
        • (Score: 2) by cafebabe on Thursday December 25 2014, @08:33AM

          by cafebabe (894) on Thursday December 25 2014, @08:33AM (#129060) Journal

          A similar problem occurs with networking. For a few packets over a few segments, 16-bit checksums may be sufficient. However, 0.01% corruption of Jumbo Frames over 13 hops leads to silent corruption approximately once per hour. Worse links or small packets may lead to significantly higher rates of corruption.

          presumably there's a lot of error correction going on behind the scenes, but even that has its limits.

          Unfortunately not. One person's payload is another person's header. So, if you aren't processing the payload immediately, the corruption is silent. Even if you are processing the payload immediately, corruption may elude validation or parsing.

          --
          1702845791×2
      • (Score: 2) by FatPhil on Sunday December 21 2014, @08:27AM

        by FatPhil (863) <reversethis-{if.fdsa} {ta} {tnelyos-cp}> on Sunday December 21 2014, @08:27AM (#127960) Homepage
        You appear not to consider a sector failing to read to be a problem.

        In that case - turn off filesystem checking completely. Good luck, and enjoy your bitrot.
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by cafebabe on Thursday December 25 2014, @08:31AM

        by cafebabe (894) on Thursday December 25 2014, @08:31AM (#129059) Journal

        Drives don't flip bits for no reason. Every data record (sector) has a checksum.

        That isn't an end-to-end checksum. Also, ATA drives autonomously substitute sectors when regenerative checksums exceed a threshold. What algorithm and threshold? That is proprietary and varies with each revision of firmware on each model of each manufacturer's drives.
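        The raw counters are at least visible through SMART, even if the firmware's remapping policy isn't; for example (device name is a placeholder):

            smartctl -A /dev/sda | grep -i -e reallocated -e pending

        which shows the Reallocated_Sector_Ct and Current_Pending_Sector attributes on a typical ATA drive.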

        How often does a modern Windows box force you to rescan the HDD?

        After every Blue Screen Of Death.

        --
        1702845791×2
    • (Score: 0) by Anonymous Coward on Sunday December 21 2014, @09:53PM

      by Anonymous Coward on Sunday December 21 2014, @09:53PM (#128134)
      Irrelevant. If you're using periodic fscks on (rare) reboots to detect bits flipped on your drives, you're doing things badly wrong. fsck checks for file system integrity. Not bits flipped anywhere on the drive.

      FWIW extensive checking for file system integrity during reboots just because X weeks have passed is also a stupid thing to be doing.

      If there is a reason that you need to check your filesystem for problems every X weeks or months, you shouldn't be waiting for a reboot after Y months to do so. There's no strong correlation between X and Y.
      • (Score: 2) by sjames on Monday December 22 2014, @01:17AM

        by sjames (2882) on Monday December 22 2014, @01:17AM (#128190) Journal

        Bits flipped in the metadata definitely affect filesystem integrity. The whole scheme is a bit of a holdover from a previous era. The trend now is towards filesystem-level checksums and online integrity checking (beyond journaling).

        • (Score: 0) by Anonymous Coward on Monday December 22 2014, @12:35PM

          by Anonymous Coward on Monday December 22 2014, @12:35PM (#128296)
          You said "How is it that you KNOW that no bit got flipped anywhere on the drive?"

          Not "How is it that you know that no bit got flipped in the filesystem METADATA on the drive"

          Big difference.
          • (Score: 2) by sjames on Tuesday December 23 2014, @09:15AM

            by sjames (2882) on Tuesday December 23 2014, @09:15AM (#128619) Journal

            It would only matter to your argument if I had said "How is it that you know that no bit got flipped in the filesystem anywhere but in the metadata". I didn't. The whole disk includes the metadata, yes?

    • (Score: 2) by Geotti on Monday December 22 2014, @02:07AM

      by Geotti (1146) on Monday December 22 2014, @02:07AM (#128199) Journal

      How is it that you KNOW that no bit got flipped anywhere on the drive?

      There's like 10 posts in this tree, but no one suggested ZFS to limit the repercussions of a flipped bit. Makes me wonder...

      • (Score: 2) by Geotti on Monday December 22 2014, @02:09AM

        by Geotti (1146) on Monday December 22 2014, @02:09AM (#128200) Journal

        Whoops, I overlooked http://soylentnews.org/comments.pl?sid=5378&cid=127952 [soylentnews.org] by sjames. *blush*

      • (Score: 0) by Anonymous Coward on Monday December 22 2014, @04:21AM

        by Anonymous Coward on Monday December 22 2014, @04:21AM (#128223)

        ZFS is a pain in the ass to use with Linux. It's not like Solaris or FreeBSD, where it's pretty much seamless.

      • (Score: 2) by cafebabe on Thursday December 25 2014, @02:54PM

        by cafebabe (894) on Thursday December 25 2014, @02:54PM (#129105) Journal

        The gains from using ZFS on one drive are fairly small. Yes, you'll have end-to-end checksums. However, it may be preferable to suffer a default filesystem and a periodic check.

        --
        1702845791×2
      • (Score: 2) by cafebabe on Monday December 29 2014, @06:04PM

        by cafebabe (894) on Monday December 29 2014, @06:04PM (#130001) Journal

        Not many people are running software RAID, and ZFS isn't a huge gain with a single volume. Admittedly, ZFS offers end-to-end checksums, but it seems people would rather put up with occasional integrity checks than incur upgrade, compatibility or recovery problems from something more transparent.

        --
        1702845791×2
  • (Score: 3, Insightful) by VLM on Sunday December 21 2014, @12:41PM

    by VLM (445) on Sunday December 21 2014, @12:41PM (#127999)

    Running the check on a modern HDD may be an hour-long distraction

    The "linux people" have moved on to freebsd, where "zpool scrub zroot" runs in the background. Also on my SSD desktop with only a couple gigs of stuff, a scrub only takes a minute or so. We'll call it 20 seconds per gig. "Modern HDD" is an oxymoron other than multi-terabyte beasts that live in a fileserver at home or the NAS at work. I would have to check but I think I'm down to 4 spinning rust drives at home, 3 multi-terabyte beasts in the fileserver and one in the kids xbox.

    The "enterprise java windows developers" have all taken over linux now and they're spreading their views over linux rather than merging into the community. So you get folks who think the design of systemd is a great idea, etc. We're stuck with windows ME edition thinking, running on a otherwise decent linux kernel.