
SoylentNews is people

posted by martyb on Wednesday January 06 2021, @03:27AM
from the bit-flip-out dept.

Linus Torvalds On The Importance Of ECC RAM, Calls Out Intel's "Bad Policies" Over ECC

There's nothing quite like a fiery mailing list post by Linus Torvalds for some fun holiday-weekend reading. The Linux creator is out with one of his classic messages, this time arguing for the importance of ECC memory and giving his opinion on how Intel's "bad policies" and market segmentation have made ECC memory less widespread.

Linus argues that error-correcting code (ECC) memory "absolutely matters" but that "Intel has been instrumental in killing the whole ECC industry with its horribly bad market segmentation... Intel has been detrimental to the whole industry and to users because of their bad and misguided policies wrt ECC. Seriously...The arguments against ECC were always complete and utter garbage... Now even the memory manufacturers are starting [to] do ECC internally because they finally owned up to the fact that they absolutely have to. And the memory manufacturers claim it's because of economics and lower power. And they are lying bastards - let me once again point to row-hammer about how those problems have existed for several generations already, but these f*ckers happily sold broken hardware to consumers and claimed it was an "attack", when it always was "we're cutting corners"."

Ian Cutress from AnandTech points out in a reply that AMD's Ryzen ECC support is not as solid as believed.

Related: Linus Torvalds: 'I'm Not a Programmer Anymore'
Linus Torvalds Rejects "Beyond Stupid" Intel Security Patch From Amazon Web Services
Linus Torvalds: Don't Hide Rust in Linux Kernel; Death to AVX-512
Linus Torvalds Doubts Linux will Get Ported to Apple M1 Hardware



  • (Score: 3, Informative) by sjames on Wednesday January 06 2021, @07:00PM (3 children)

    by sjames (2882) on Wednesday January 06 2021, @07:00PM (#1095717) Journal

    CentOS. It shows up in /var/log/messages and on the console. Debian puts it in syslog. It can also be found in the system event log. I have seen it on Cisco and on Supermicro hardware. I know that not all MBs support reporting ECC issues to the kernel.
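
    For anyone who wants to check their own logs for the kind of entries described above, here is a minimal sketch in Python. The search pattern is an assumption (kernel memory-controller messages commonly carry an "EDAC" tag, but the exact wording varies by driver and distro), and the log path is whichever file your distro uses, e.g. /var/log/messages on CentOS.

```python
import re

# Assumed pattern, not an exhaustive one: match lines that mention EDAC,
# ECC, or a machine check, in any letter case.
PATTERN = re.compile(r"\bEDAC\b|\bECC\b|machine check", re.IGNORECASE)


def find_memory_error_lines(lines):
    """Return the lines (without trailing newline) that match PATTERN."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(path, errors="replace") as fh:
        return find_memory_error_lines(fh)
```

    Calling scan_log("/var/log/messages") usually needs root. An empty result only means nothing matched the assumed pattern, not that no corrections happened.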

  • (Score: 2) by RS3 on Wednesday January 06 2021, @07:53PM (1 child)

    by RS3 (6367) on Wednesday January 06 2021, @07:53PM (#1095750)

    Thank you for that. [shuffles off to check logs...]

    Some CentOS here too. I check /var/log/messages, "dmesg", daily "logwatch" email, lots of goodies in /sys and /proc, but I've never seen a RAM error message. Obviously that doesn't mean it never happens, just that I've never seen it. Haven't been running the Open Manage software, but it might help. Had too many problems getting it to work well. Well, I do run arcconf but not the rest of it.
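
    One of the goodies under /sys that is relevant here is the kernel's EDAC interface, which keeps running counts of corrected and uncorrected memory errors per memory controller. A minimal sketch for reading them follows, assuming an EDAC driver is loaded for the board's memory controller; if the directory is missing, that only means the kernel is not reporting, not that there were no errors. The function name is just for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    Returns an empty dict when the EDAC directory does not exist.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # File missing, unreadable, or not a number.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

    The counters reset at boot or when the driver is reloaded, so a zero here is not the same as a clean history in the system event log.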

    • (Score: -1, Troll) by Anonymous Coward on Wednesday January 06 2021, @08:12PM

      by Anonymous Coward on Wednesday January 06 2021, @08:12PM (#1095764)

      and you're not running with ECC ram. he said it logged a correction. are you daft?

  • (Score: 2) by RS3 on Wednesday January 06 2021, @08:08PM

    by RS3 (6367) on Wednesday January 06 2021, @08:08PM (#1095761)

    Sorry- hit "submit" too soon (as I do too often...)

    Yeah, it may be a matter of MB driver support. Dell has fairly good Open Manage modules for some of their hardware, but I don't know the extent. I run it on the Windows servers, but was having SW problems with one so I disabled it... [shuffles off again...]

    Well, I ran the Dell Open Manage, checked the logs, and there was 1 ECC bit correction in September, and no other ECC or RAM messages in a year. I can live with that. :)

    I might look into running the Open Manage on the Linux servers. I don't run them in X mode, but have Windows workstations I remote into that I can display the output on (Cygwin/X, etc.)