SoylentNews is people

posted by martyb on Wednesday January 06 2021, @03:27AM
from the bit-flip-out dept.

Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC

There's nothing quite like some fun holiday-weekend reading in the form of a fiery mailing list post by Linus Torvalds. The Linux creator is out with one of his classic messages, this time arguing for the importance of ECC memory and explaining how, in his opinion, Intel's "bad policies" and market segmentation have made ECC memory less widespread.

Linus argues that error-correcting code (ECC) memory "absolutely matters" but that "Intel has been instrumental in killing the whole ECC industry with it's horribly bad market segmentation... Intel has been detrimental to the whole industry and to users because of their bad and misguided policies wrt ECC. Seriously...The arguments against ECC were always complete and utter garbage... Now even the memory manufacturers are starting [to] do ECC internally because they finally owned up to the fact that they absolutely have to. And the memory manufacturers claim it's because of economics and lower power. And they are lying bastards - let me once again point to row-hammer about how those problems have existed for several generations already, but these f*ckers happily sold broken hardware to consumers and claimed it was an "attack", when it always was "we're cutting corners"."

Ian Cutress from AnandTech points out in a reply that AMD's Ryzen ECC support is not as solid as is commonly believed.

Related: Linus Torvalds: 'I'm Not a Programmer Anymore'
Linus Torvalds Rejects "Beyond Stupid" Intel Security Patch From Amazon Web Services
Linus Torvalds: Don't Hide Rust in Linux Kernel; Death to AVX-512
Linus Torvalds Doubts Linux will Get Ported to Apple M1 Hardware


  • (Score: 2) by Immerman (3985) on Wednesday January 06 2021, @08:29PM (#1095773)

    Very true, which is why ECC has traditionally targeted business-critical machines, where most of the RAM is likely to be in use, and any data corruption can get very expensive.

    However, even for the rest of us there's no telling where that flipped bit will be. Going by the Google study I referenced elsewhere, with 16GB of RAM you'll average 71 flipped bits in the course of an 8-hour day. Even if only 1.4% of your RAM is holding information where a flipped bit will matter, you'll average one "important" flipped bit per day.
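One way to reproduce the 71-bit figure is sketched below. The rate constant is an assumption, not something the comment states: 70,000 errors per billion device-hours per Mbit, the upper end of the range reported in Google's 2009 DRAM field study, which may or may not be the study the commenter means. The comment's 71 also appears to use decimal gigabytes; with binary GiB the same rate gives about 73. The function name `expected_flips` is hypothetical.

```python
# Assumed error rate (hypothetical choice): errors per 1e9 device-hours per Mbit,
# taken as the upper end of the range in Google's 2009 DRAM field study.
FIT_PER_MBIT = 70_000


def expected_flips(ram_gb: float, hours: float, fit_per_mbit: float = FIT_PER_MBIT) -> float:
    """Expected bit errors in `hours` for `ram_gb` decimal gigabytes of RAM."""
    mbits = ram_gb * 1000 * 8            # 16 GB -> 128,000 Mbit
    return fit_per_mbit * mbits * hours / 1e9


flips = expected_flips(16, 8)            # one 8-hour day with 16 GB
important = flips * 0.014                # only 1.4% of RAM holds data where a flip matters
print(f"{flips:.1f} flips per 8-hour day, {important:.2f} of them 'important'")
# prints "71.7 flips per 8-hour day, 1.00 of them 'important'"
```

At the study's lower bound (`fit_per_mbit=25_000`) the same calculation gives about 26 flips per day, so the "one important flip per day" figure is best read as an upper-end estimate.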

    Now, how important is that really? Are you going to lose hours of work or empty your bank account as a result? Probably not; most likely it's just a nuisance. Still, RAM is a small part of the overall cost of a typical computer, and ECC RAM should increase that cost only slightly while virtually eliminating such nuisances.
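A rough illustration of why the hardware premium can be small: a conventional DDR4 ECC DIMM stores 8 check bits for every 64 data bits, so it needs about 12.5% more DRAM. If RAM makes up a share s of the system price (s is a hypothetical parameter), the raw chip overhead scales as below. This ignores platform support and lower-volume pricing, which also affect the real-world price difference.

```latex
\text{extra DRAM} = \frac{72 - 64}{64} = \frac{8}{64} = 12.5\%,
\qquad
\frac{\Delta C_{\text{system}}}{C_{\text{system}}} \approx 0.125\, s
```

For example, a hypothetical s = 0.05 gives 0.125 × 0.05 = 0.00625, or roughly 0.6% of the system price.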

    ECC also offers the advantage of telling you immediately that your RAM is faulty, without running a time-consuming memory test that may not even detect the error. Intermittent errors can be the most aggravating to identify, and buying new RAM just to see if that fixes the problem is an expensive option, assuming your computer even has replaceable RAM. Because errors are silently repaired and (potentially) logged, you won't suffer the symptoms of memory errors unless the RAM is *very* faulty, and if you do have faulty RAM it will be very obvious. You can even track error rates over time to see whether the problem is worsening, or whether swapping stick positions resolves interference that was flipping bits in a vulnerable stick.
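On Linux, one way to do that tracking is the EDAC (Error Detection and Correction) subsystem: when a driver for the platform's memory controller is loaded, it exposes per-controller counts of corrected (`ce_count`) and uncorrected (`ue_count`) errors under `/sys/devices/system/edac/mc/`. The minimal sketch below assumes such a system; the helper names `read_ecc_counters` and `new_errors` are hypothetical.

```python
from pathlib import Path

# Standard EDAC sysfs location; populated only when an EDAC driver for the
# memory controller is loaded (assumption: Linux with ECC and EDAC enabled).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counters(root: Path = EDAC_MC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} error counts."""
    if not root.is_dir():
        return {}
    counters = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())  # errors ECC repaired
        ue = int((mc / "ue_count").read_text().strip())  # errors ECC could not repair
        counters[mc.name] = (ce, ue)
    return counters


def new_errors(before: dict[str, tuple[int, int]],
               after: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Errors that appeared between two snapshots, per controller."""
    result = {}
    for name, (ce, ue) in after.items():
        old_ce, old_ue = before.get(name, (0, 0))
        result[name] = (ce - old_ce, ue - old_ue)
    return result


if __name__ == "__main__":
    snapshot = read_ecc_counters()
    if not snapshot:
        print("No EDAC memory controllers found (no driver loaded or no ECC).")
    for name, (ce, ue) in snapshot.items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Saving a snapshot periodically and comparing it with `new_errors` would show whether corrected errors are becoming more frequent, or whether they stop after moving a stick to another slot.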

    And then there's the fact that not all bit flips are accidental: RowHammer and related memory attack vectors work by bombarding memory with atypical access patterns designed to flip bits in adjacent memory locations that the attacking software doesn't have direct access to. ECC could also go a long way toward protecting against such attacks.
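To show why ECC helps whether a flip comes from a stray particle or from RowHammer, here is a minimal single-error-correcting, double-error-detecting (SEC-DED) sketch. Conventional ECC DIMMs apply 8 check bits to each 64-bit word; to keep it readable, this sketch swaps in the much smaller extended Hamming(8,4) code, so it illustrates the principle rather than what any particular memory controller implements. A single flipped bit is silently corrected, two flips in the same word are detected but not correctable, and three or more can slip past or be miscorrected. This last point is why ECC mitigates RowHammer rather than ruling it out.

```python
from typing import Optional

DATA_POS = (3, 5, 6, 7)  # Hamming positions that carry the 4 data bits


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SEC-DED codeword (extended Hamming(8,4))."""
    word = [0] * 8                       # index 0: overall parity; 1..7: Hamming positions
    for shift, pos in enumerate(DATA_POS):
        word[pos] = (nibble >> shift) & 1
    for p in (1, 2, 4):                  # parity bit p covers positions whose index has bit p set
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2          # overall parity makes the whole word even
    return word


def decode(word: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)                    # work on a copy
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i                # XOR of set positions is 0 for a valid codeword
    overall = sum(word) % 2              # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                   # assume exactly one flip and repair it
        word[syndrome] ^= 1              # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:                                # even parity, non-zero syndrome: two flips
        return None, "uncorrectable"
    data = sum(word[pos] << shift for shift, pos in enumerate(DATA_POS))
    return data, status


codeword = encode(0b1011)
single = codeword.copy(); single[6] ^= 1
double = codeword.copy(); double[3] ^= 1; double[5] ^= 1
print(decode(codeword))  # (11, 'ok')
print(decode(single))    # (11, 'corrected')
print(decode(double))    # (None, 'uncorrectable')
```

In a real system the "corrected" path is where the controller would bump the corrected-error counter, and the "uncorrectable" path is what typically triggers a machine-check report instead of silently returning bad data.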
