
SoylentNews is people

posted by Fnord666 on Wednesday January 31, @05:14PM
from the doesn't-raid-fix-this? dept.

Arthur T Knackerbracket has found the following story:

In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.

"We've seen at least two serious bugs in Intel CPUs in the last quarter, and it's almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we've moved past that."

Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase.

This month's Meltdown and Spectre security flaws that affect chip designs from AMD, Arm, and Intel to varying degrees support that claim. But there are many other examples.



  • (Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @05:18PM (2 children)

    by Anonymous Coward on Wednesday January 31, @05:18PM (#631025)

    This kind of Microsoft spam is not needed on the main page. Save that shite for personal journals. We really don't gain anything from hearing from Microsoft's marketeers unless they are paying for product placement. But I don't see that anywhere in the summary. So until then leave out the spam. If the topic is based in reality then it will have been covered already elsewhere. Cite those sources instead.

    • (Score: 1, Funny) by Anonymous Coward on Wednesday January 31, @06:02PM

      by Anonymous Coward on Wednesday January 31, @06:02PM (#631050)

      The man frequently used the object to relieve his sexual and violent urges. Since the object was used so often, it was heavily damaged and on the verge of breaking. As such, now was the time to dispose of it and choose another one to take its place. The man was excited. What would he choose next...?

      Later, a woman's corpse was found in a dumpster.

    • (Score: 5, Insightful) by MrGuy on Wednesday January 31, @06:07PM

      by MrGuy (1007) on Wednesday January 31, @06:07PM (#631054)

      Your premise is mistaken. TFA is an article on The Register. The Register article begins with a two-sentence quote from a Microsoft engineer, and a one-sentence summary of his point, but that's it. The rest is original reporting.

      I'd object to this article because it's so darn elementary - yes, chips can have bugs, and Spectre/Meltdown aren't the only chip bugs out there. The article is a few quotes sprinkled with a list of a few recent flaws. But there's no interesting analysis. It's basically "here are some recent bugs." It would be awesome to have an article making a case WHY bugs might be more frequent now than in the past - other than the quote from Luu, the article offers no real support for that position. This article seems like it was written by someone who doesn't really understand the subject and has nothing really to say (some would argue that's hardly unique on El Reg) - when your best argument is a three-year-old blog post from someone ELSE who might know what they're talking about, you should be asking why this article matters...

  • (Score: 1, Interesting) by Anonymous Coward on Wednesday January 31, @05:27PM (25 children)

    by Anonymous Coward on Wednesday January 31, @05:27PM (#631031)

    This is true. I did some work on x64 (Itanium) and later the Pentium D. Most of the problems in our labs were due to faulty hard disks (the spinning kind). I headed Linux validation for servers around 2000--hopefully that helped?

    • (Score: 0) by Anonymous Coward on Wednesday January 31, @05:42PM (6 children)

      by Anonymous Coward on Wednesday January 31, @05:42PM (#631037)

      x64 is marketing speak for x86-64. Itanic was a whole different disaster.

      • (Score: 0) by Anonymous Coward on Wednesday January 31, @06:02PM (1 child)

        by Anonymous Coward on Wednesday January 31, @06:02PM (#631051)

        Yep, I meant ia64. God sucks as a validation engineer for my brain.

        • (Score: 1, Funny) by Anonymous Coward on Wednesday January 31, @08:00PM

          by Anonymous Coward on Wednesday January 31, @08:00PM (#631130)

          He put those bugs in there so that he could exploit them later.

      • (Score: 3, Informative) by Anonymous Coward on Wednesday January 31, @06:41PM (3 children)

        by Anonymous Coward on Wednesday January 31, @06:41PM (#631083)

        x86-64 was AMD's original name
        x86_64 was chosen by Linux people, based on the above
        IA-32e was Intel's fucked-up name for it, since IA-64 was already taken by Itanium
        AMD64 was AMD's response to Intel's attempt to claim the architecture as IA-32e
        x64 was Microsoft's attempt to pick something simple and neutral

        • (Score: 0) by Anonymous Coward on Wednesday January 31, @09:40PM (1 child)

          by Anonymous Coward on Wednesday January 31, @09:40PM (#631189)

          Fuck this. ARM64 for everything.

          • (Score: 0) by Anonymous Coward on Thursday February 01, @01:02PM

            by Anonymous Coward on Thursday February 01, @01:02PM (#631452)

            You mean aa64?

        • (Score: 2) by Wootery on Thursday February 01, @09:56AM

          by Wootery (2341) on Thursday February 01, @09:56AM (#631395)

          The nice thing about standards is that you have so many to choose from; furthermore, if you do not like any of them, you can just wait for next year's model.

          -Tanenbaum

    • (Score: 2) by JoeMerchant on Wednesday January 31, @06:51PM (17 children)

      by JoeMerchant (3937) on Wednesday January 31, @06:51PM (#631089)

      Race to the bottom? I mean, can we at least have a reasonable option for a validated processor that works, and works correctly, instead of one that runs 10% faster but has bugs? Put another way, if there were 2 notebook PCs at NewEgg, identical in every way except that one had 2.4GFlops effective throughput on a typical task load - with 99.999% validated design, and another with 1.8GFlops performance on the same test, but with 99.99999% validated design - isn't there a market for the more reliable machine?

      • (Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @06:57PM (3 children)

        by Anonymous Coward on Wednesday January 31, @06:57PM (#631092)

        It's nowhere near that simple. They paid for a lot more expensive people (like me) for Xeon and Itanium validation than consumer stuff. Try ECC I guess? Don't overclock (and especially don't over-volt) your stuff! I ran a lab at Intel that did high temperature, high voltage stress tests on consumer (Pentium D), and we saw lots of errors. They basically died over a few months.

        • (Score: 3, Insightful) by JoeMerchant on Wednesday January 31, @10:09PM (2 children)

          by JoeMerchant (3937) on Wednesday January 31, @10:09PM (#631206)

          Well, on the one hand, you (and I) are "expensive," but when that cost is spread out over millions of copies it's not nearly as much, and I guess what worries me the most is the dismantlement of the validation program, because those things are a lot harder to set up than they are to keep running.

          • (Score: 0) by Anonymous Coward on Thursday February 01, @01:27AM (1 child)

            by Anonymous Coward on Thursday February 01, @01:27AM (#631286)

            Intel has beefed up validation after various issues--we didn't lack for money in the department. You mention spreading cost out--that's why server chips are so expensive. You have expensive people like me validating chip designs that are sold in smaller quantities than the latest Android.

            • (Score: 2) by JoeMerchant on Thursday February 01, @12:56PM

              by JoeMerchant (3937) on Thursday February 01, @12:56PM (#631450)

              that's why server chips are so expensive. You have expensive people like me validating chip designs that are sold in smaller quantities than the latest Android.

              So, I get tiered marketing and that you need to sell some product at a higher price point, but... wouldn't it make a kind of sense to pour the heaviest validation onto the line that sells the most copies? Maybe not a marketing "juice 'em for maximal profits" kind of sense, but a "don't be dicks to the world" kind of sense?

      • (Score: 3, Funny) by MostCynical on Wednesday January 31, @09:57PM

        by MostCynical (2589) on Wednesday January 31, @09:57PM (#631198)

        what the market needs is a new certification:

        "This chip has been validated by the NSA"

        Or, for pcs and laptops, just a nice "NSA Certified" sticker.

        --
        (Score: tau, Irrational)
      • (Score: 1) by khallow on Thursday February 01, @02:35AM (4 children)

        by khallow (3766) Subscriber Badge on Thursday February 01, @02:35AM (#631302) Journal

        I mean, can we at least have a reasonable option for a validated processor that works

        How would validation catch the Spectre [wikipedia.org] bug? It's derived from subtle observation of memory caching and timing delays of the cache queues. Can't validate what you don't know you need to validate. Even if the CPU manufacturers fully fix this one, how will we validate all possible interactions of the internal components of the CPU?

        • (Score: 2) by JoeMerchant on Thursday February 01, @04:03AM

          by JoeMerchant (3937) on Thursday February 01, @04:03AM (#631337)

          How would validation catch the Spectre bug?

          In our industry we have a fancy acronym that means: get a bunch of people who know something about the issues, force them to sit in a room and seriously consider them at least long enough to write a report and file it. Lately, there's a lot of handwringing around cybersecurity, and I'm constantly pinged by the junior guys who get worried about X, Y, or Z - and 9 times out of 10 it's nothing, but once in a while they bring up a good point, and some of those good points are things like Spectre - things nobody had considered before. Our development process on a single product goes on for a couple of years, the process calls for these cybersecurity design reviews periodically throughout those years, and over that time people do actually come up with this stuff. So, our reports analyze X, Y, and Z, and either write them off as adequately handled, or shut down the project until they are.

          The real problem is culture - like the Shuttle launch culture that couldn't be stopped for handwringing over ice in the O-rings, or a big corporate culture that doesn't want to pay its own engineers to discover vulnerabilities in the product early enough to fix them before the rest of the world.

          I just gave a mini-speech today that included: "it needs to be tested, if we don't test it our customers will."

          Can't validate what you don't know you need to validate.

          No, you can't - but, as world leading experts in the field you should be able to figure out most of the things you need to validate before the world figures them out for you. In the case of processors that serve separate users partitioned by hypervisor, the industry could have (and likely did) think of this exploit before the hacker community. As soon as they thought of it, they should have (and likely did not) feed that knowledge back into the design process to work out effective fixes for the next generation of processors.

        • (Score: 1) by pTamok on Thursday February 01, @09:46AM (2 children)

          by pTamok (3042) on Thursday February 01, @09:46AM (#631391)

          Techniques for provably secure hardware from the gate-level and up are known. For various reasons they are not applied.

          e.g. 2011: Design and Verification of Information Flow Secure Systems [utexas.edu]

          We show that it is possible to construct hardware-software systems whose implementations are verifiably free from all illegal information flows. This work is motivated by high assurance systems such as aircraft, automobiles, banks, and medical devices where secrets should never leak to unclassified outputs or untrusted programs should never affect critical information. Such systems are so complex that, prior to this work, formal statements about the absence of covert and timing channels could only be made about simplified models of a given system instead of the final system implementation.

          and

          2017: Register transfer level information flow tracking for provably secure hardware design [ieee.org]

          That's just one IEEE paper - if you look at the home-page of one of the authors (Wei Hu [ucsd.edu]), you can see many other papers in pdf format, including the full text of the above IEEE reference [ucsd.edu]. There are plenty of references to earlier work listed in that paper.

          Note that hardware can be messed with below the gate-level. Nonetheless, techniques for validating processors have been around for decades, they have 'simply' not been used in the general commercial market as they have been regarded as too time-consuming, expensive, or resource hungry. Military and aerospace markets have had different priorities. High Assurance, as a discipline, has been around for a very long time.

          • (Score: 1) by khallow on Friday February 02, @05:27PM (1 child)

            by khallow (3766) Subscriber Badge on Friday February 02, @05:27PM (#632063) Journal

            Nonetheless, techniques for validating processors have been around for decades, they have 'simply' not been used in the general commercial market as they have been regarded as too time-consuming, expensive, or resource hungry.

            This. The key one is the sheer impracticality of it as a likely NP complete problem, but there are other issues as well.

            Note that hardware can be messed with below the gate-level.

            Hardware can also be messed with above the gate-level. Gates are merely an approximation.

            Finally, an important way to simplify and make more efficient a CPU is to share various sorts of resources. But such sharing increases the number and complexity of interactions between components of the CPU.

            This is not impossible, but I think the value of validation is being overplayed in this thread.

            • (Score: 1) by pTamok on Friday February 02, @07:36PM

              by pTamok (3042) on Friday February 02, @07:36PM (#632120)

              Thanks for the reply. I heartily recommend the first reference I gave. Give it a read - it is not overly technical.

              You are likely right that the general problem is probably NP-complete: or at least difficult, if you assume things like unbounded memory and unbounded state-tables. However, if you place bounds on such things, the problem becomes tractable.

              I put 'simply' in scare quotes because cost is a driver to the bottom as far as commercial business systems are concerned. If a business can make a short-term gain by ignoring security requirements, it will. You can keep the plates spinning for a while...

              It is not impossible to produce formally-proven systems, merely difficult, and you have to be discerning about your axioms. As long as people choose cheapness over correctness, we will continue to have problems like Meltdown, Spectre, and multifarious side-channel attacks. It probably doesn't matter for most business systems, but aerospace will continue to provide a proving ground for such things, hopefully followed by medical applications (do you want your pacemaker to be hackable?). I hope that at some point in the future, the benefit of formally-proven systems will outweigh the cost-increment over the slapdash approach currently used. I don't think that time will come soon, unfortunately.

      • (Score: 2) by Wootery on Thursday February 01, @10:11AM (6 children)

        by Wootery (2341) on Thursday February 01, @10:11AM (#631399)

        isn't there a market for the more reliable machine?

        How reliable do you want? Server hardware is pretty good, no? If you want near-perfection, there are CPUs out there rated for safety critical systems, but it'll likely cost you 50x the price, and the performance won't be anywhere close to that of a modern Intel CPU.

        Fun fact: the RAD750 [wikipedia.org] radiation-resistant PowerPC chip clocked at 200MHz, from 2002. Its unit cost: around $200,000, back then when that was real money.

        It's like with software. Formally-verified software exists, but is enormously more expensive to develop. (Vaguely related: the CompCert formally verified C compiler is actually performance-competitive with GCC optimised builds. [inria.fr] I wouldn't have guessed, but there we are. Neat!)

        99.99999% validated design

        Meaning what?

        • (Score: 2) by JoeMerchant on Thursday February 01, @01:05PM (4 children)

          by JoeMerchant (3937) on Thursday February 01, @01:05PM (#631453)

          If you want near-perfection, there are CPUs out there rated for safety critical systems, but it'll likely cost you 50x the price,

          There's a positive feedback loop involved there - the 50x price is because the validation costs $V and the sales volume is Ntiny, so $V/Ntiny = 49x the price of a normal CPU.

          and the performance won't be anywhere close to that of a modern Intel CPU.

          More of the non-virtuous positive feedback loop - low volume market = infrequent product refresh cycles.

          If that same $V effort were applied to the high volume product line (Nhuge) $V/Nhuge might = 0.05x the price of the chips, or less. More importantly, it would also slow delivery of product by x months on average, which is a perceived competitive cost...

          I say perceived cost because, often I will buy a generation, or sometimes two, back from the bleeding edge just because they are the devils whose faces I know - Skylake was a clusterfuck, and only now am I starting to feel confident that we can deal with all of its quirks in a product. The performance gains of the next couple of generations are nice, but truly un-necessary for any application I have. Bugs, driver glitches, field patches - lack of those all matter much more to me.
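The $V/N arithmetic in the comment above can be sketched in a few lines. All dollar amounts and volumes here are hypothetical, picked only so the ratios land near the 49x and 0.05x mentioned; none of them come from a real product line.

```python
def per_unit_validation_cost(validation_cost: float, units_sold: int) -> float:
    """Spread a fixed validation spend evenly across every unit sold."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return validation_cost / units_sold


# Hypothetical figures (assumptions for illustration only).
VALIDATION_COST = 49_000_000   # $V, the fixed cost of the validation effort
N_TINY = 10_000                # units sold of the niche, safety-rated part
N_HUGE = 10_000_000            # units sold of the mass-market part
BASE_PRICE = 100.0             # price of a "normal" CPU before validation

tiny_overhead = per_unit_validation_cost(VALIDATION_COST, N_TINY) / BASE_PRICE
huge_overhead = per_unit_validation_cost(VALIDATION_COST, N_HUGE) / BASE_PRICE

print(f"low volume:  +{tiny_overhead:.3f}x base price per unit")
print(f"high volume: +{huge_overhead:.3f}x base price per unit")
```

The same fixed effort is a rounding error per unit at high volume. What the sketch leaves out is the schedule delay, which is the cost the comment calls "perceived".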

          • (Score: 2) by Wootery on Thursday February 01, @01:20PM (3 children)

            by Wootery (2341) on Thursday February 01, @01:20PM (#631457)

            I'm inclined to trust market forces here. If people cared more about correctness than performance, wouldn't we expect the CPUs on the market to reflect that?

            • (Score: 3, Insightful) by JoeMerchant on Thursday February 01, @02:09PM (2 children)

              by JoeMerchant (3937) on Thursday February 01, @02:09PM (#631475)

              I'm inclined to trust market forces here. If people cared more about correctness than performance, wouldn't we expect the CPUs on the market to reflect that?

              Seriously? The mass CPU market is consumer driven, you trust Facebook users to decide how robust/secure the majority of CPUs manufactured and used in the world should be?

              • (Score: 2) by Wootery on Thursday February 01, @03:43PM (1 child)

                by Wootery (2341) on Thursday February 01, @03:43PM (#631500)

                Eh? Do Facebook profit by their servers being insecure?

                • (Score: 2) by JoeMerchant on Thursday February 01, @09:21PM

                  by JoeMerchant (3937) on Thursday February 01, @09:21PM (#631700)

                  Not talking about Facebook itself profiting, talking about the mass market electronics consumers of the world (Facebook users, among others) and their "collective wisdom" with respect to reliability, security, etc. For every Facebook server machine, there are hundreds of users who access it via multiple consumer gadgets each - that's the market that needs a nanny.

        • (Score: 2) by JoeMerchant on Thursday February 01, @01:08PM

          by JoeMerchant (3937) on Thursday February 01, @01:08PM (#631455)

          99.99999% validated design

          Meaning what?

          Nothing, of course, except that it's orders of magnitude better than 99.999%. When you're talking about catching the next Spectre before it's exploited in the wild, there are no metrics that mean anything, but effort invested in looking for the problems does pay off in proportion to the amount of effort invested.

  • (Score: 0) by Anonymous Coward on Wednesday January 31, @05:37PM (6 children)

    by Anonymous Coward on Wednesday January 31, @05:37PM (#631035)

    It seems to be mostly an intel issue at this point. I really never had any opinion on cpus/gpus either way but seeing the recent PR attempt to muddy the waters has turned me anti-intel. Maybe people who matter to them aren't thinking the same, but it seems like a dangerous strategy. They are clearly not to be trusted.

    • (Score: 4, Informative) by MrGuy on Wednesday January 31, @06:21PM (2 children)

      by MrGuy (1007) on Wednesday January 31, @06:21PM (#631068)

      Citation needed.

      First of all, Spectre and Meltdown are different. You can read details here [meltdownattack.com]

      Spectre is a flaw where "speculative execution" can leak information (this is where a processor executes a branch of code that MIGHT be needed, but only in theory stores the result if it matters). The problem with speculative execution is that it's not checked whether a given command SHOULD be executed (for example, if the program has the right access level to execute the code). However, this security issue wasn't seen as a problem, because (in theory) the result of the speculatively executed code would be thrown away if it couldn't be used. So, it might be a mechanism to let untrusted code access core kernel memory (which is Very Bad), but it was thought to be acceptable because nobody could see the result. The problem is that CPU caching could "leak" those results and be visible to other code.
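The mechanism in the paragraph above can be illustrated with a toy model. This is not exploit code and does not touch real hardware: `ToyCPU`, `victim_read`, and `leak_byte` are hypothetical names, the "cache" is just a Python set, and the predictor is assumed to always guess that the bounds check passes. A real attack infers the cached line from access timing, not by reading a set.

```python
class ToyCPU:
    """Toy model of a bounds check that resolves after the load it guards."""

    def __init__(self, memory, array_len):
        self.memory = memory          # flat list of byte values
        self.array_len = array_len    # legal indices are 0 .. array_len - 1
        self.cached_lines = set()     # probe-array lines currently "in cache"

    def victim_read(self, index):
        # Speculative phase: the predictor assumes the check will pass,
        # so the load and the dependent probe-array access run first.
        value = self.memory[index]
        self.cached_lines.add(value)  # cache side effect is NOT rolled back

        # Architectural phase: the bounds check finally resolves.
        if index < self.array_len:
            return value              # in bounds: result is committed
        return None                   # out of bounds: result is thrown away


def leak_byte(cpu, index):
    """Flush the toy cache, trigger the victim, then see which line is hot."""
    cpu.cached_lines.clear()              # "flush" step
    committed = cpu.victim_read(index)    # None when index is out of bounds
    (hot_line,) = cpu.cached_lines        # "reload" step: one line is cached
    return committed, hot_line


# Four legal array bytes followed by data the caller must never see.
cpu = ToyCPU(memory=[1, 2, 3, 4] + list(b"secret"), array_len=4)
results = [leak_byte(cpu, i) for i in range(4, 10)]
recovered = bytes(hot for _, hot in results)
print(recovered)
```

Every out-of-bounds call returns `None` architecturally, yet the cache state still identifies each byte. That gap between "the result was discarded" and "the side effect remained" is the leak being described.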

      Spectre affects pretty much ALL manufacturers' chips - the official paper [spectreattack.com] explicitly references Intel, AMD, and ARM architectures as being affected.

      Meltdown is different - it's a "sideband" attack on kernel memory that relies on using the side effects of certain legal, carefully crafted code and information about the location and layout of memory to "leak" information, including kernel memory. Meltdown does not require the use of speculative execution to leak memory.

      The proof of concept attack for Meltdown detailed officially [meltdownattack.com] only works against Intel hardware, but the paper specifically cautions that there's no reason to expect that AMD wouldn't be susceptible to a similar attack.

      • (Score: 2, Insightful) by Anonymous Coward on Wednesday January 31, @07:05PM

        by Anonymous Coward on Wednesday January 31, @07:05PM (#631102)

        All people really care about is meltdown since patching for spectre seems to have minimal impact on performance. It is to the point where meltdown mitigations are being needlessly enabled for amd processors just to not make intel look so bad[1]. AMD says:

        GPZ Variant 3 (Rogue Data Cache Load or Meltdown) is not applicable to AMD processors.

        https://www.amd.com/en/corporate/speculative-execution [amd.com]

        I had no preference either way until I investigated this topic and saw what looks like a massive shady pro-intel propaganda campaign.

        [1] https://www.phoronix.com/scan.php?page=article&item=linux-retpoline-benchmarks&num=1 [phoronix.com]

      • (Score: 4, Interesting) by Anonymous Coward on Wednesday January 31, @09:12PM

        by Anonymous Coward on Wednesday January 31, @09:12PM (#631179)

        No, Meltdown is not applicable to AMD processors. AMD has already stated they do bounds checking when userland asks to read kernel memory to prevent this sort of thing. Something Intel inexplicably didn't think of or totally screwed up.

        Also, there is a "near zero" chance that Spectre variant 2 can be exploited on AMD processors. It sounds like both AMD and Intel are equally impacted regarding variant 1. Spectre is far more difficult to take advantage of in general.

        So yes, this is primarily an Intel problem.

    • (Score: 4, Informative) by HiThere on Wednesday January 31, @07:23PM (1 child)

      by HiThere (866) on Wednesday January 31, @07:23PM (#631107)

      Depends.
      Meltdown, the currently known dangerous one, is definitely Intel and possibly a few other Intel designed chips.
      Spectre, the one that is *relatively* harmless so far, is present in both Intel and AMD...except a few really low-end models.

      Meltdown has currently known exploits that can work through the browser if you allow Javascript. It also has several other exploit modes.
      Spectre doesn't *yet* have any known useful exploits. But it almost certainly will.

      P.S.: I'm not an expert here, there are several classes of Spectre, and I can't distinguish between them. If you're interested there's lots of info on the web, but unless you're working in the field distinguishing between them doesn't seem useful to me.

      --
      Put not your faith in princes.
      • (Score: 0) by Anonymous Coward on Wednesday January 31, @07:28PM

        by Anonymous Coward on Wednesday January 31, @07:28PM (#631109)

        The reason to distinguish between them for the average person is the performance impact of the mitigation. Everyone expects a constant stream of bugs/vulns these days anyway, but not that patching for them will slow everything down to half speed or whatever. That is where intel has the main problem (according to what I've read).

    • (Score: 3, Interesting) by Reziac on Thursday February 01, @03:52AM

      by Reziac (2489) on Thursday February 01, @03:52AM (#631332) Homepage

      Back when I was keeping track, and when both released Errata (functionally, the list of known bugs), AMD's errata list was generally about 3 times as long as Intel's. AMD dealt with this by not releasing any more errata lists.

  • (Score: 3, Informative) by takyon on Wednesday January 31, @06:58PM

    by takyon (881) Subscriber Badge <reversethis-{gro ... s} {ta} {noykat}> on Wednesday January 31, @06:58PM (#631093) Journal

    http://www.zdnet.com/article/amd-vs-spectre-our-new-zen-2-chips-will-be-protected-says-ceo/ [zdnet.com]

    https://wccftech.com/amd-zen-2-cpus-fix-spectre-exploit/ [wccftech.com]

    The 12nm Zen+ is coming out this year, 7nm Zen 2 coming out next year presumably. Some were predicting that Spectre would be lingering in upcoming chip generations since it can just be addressed with a patch, but that's mostly not the case.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 5, Insightful) by maxwell demon on Wednesday January 31, @08:03PM (4 children)

    by maxwell demon (1608) Subscriber Badge on Wednesday January 31, @08:03PM (#631132) Journal

    Thanks to growing chip complexity, compounded by hardware virtualization,

    Those reasons are excusable.

    and reduced design validation efforts

    This one isn't.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 0) by Anonymous Coward on Wednesday January 31, @08:28PM

      by Anonymous Coward on Wednesday January 31, @08:28PM (#631157)

      I'd add to that that the former two are only an issue because of the third.

    • (Score: 2, Interesting) by shrewdsheep on Wednesday January 31, @09:40PM (2 children)

      by shrewdsheep (5215) on Wednesday January 31, @09:40PM (#631190)

      In defence of the chip designers, cycle-by-cycle emulation of new chip designs is becoming more and more difficult due to complexity. Emulating just a single boot-up can take weeks.

      • (Score: 2) by maxwell demon on Wednesday January 31, @09:59PM

        by maxwell demon (1608) Subscriber Badge on Wednesday January 31, @09:59PM (#631199) Journal

        That is an argument for the same effort being less effective. It is not an argument for reducing the effort. Quite the opposite!

        --
        The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 0) by Anonymous Coward on Thursday February 01, @04:02PM

        by Anonymous Coward on Thursday February 01, @04:02PM (#631512)

        Emulating a boot up isn't going to spot problems like Meltdown or Spectre. Or many other bugs.

  • (Score: 4, Interesting) by requerdanos on Wednesday January 31, @11:43PM (2 children)

    by requerdanos (5997) Subscriber Badge on Wednesday January 31, @11:43PM (#631257) Journal

    Intel is taking a lot of heat lately, but all the first-run Ryzen processors from AMD have a bug that causes random segfaults, especially when compiling under linux (a not uncommon occurrence if one likes to linux).

    Here is an actual tech support letter I received from AMD. Some identifying information has been changed or obscured, otherwise it's 100% as I received it.

    Original Text
    From: TECH.SUPPORT@AMD.COM
    To: requerdanos@..............
    CC:
    Subject: RE: Ryzen/Linux segfault at 2f ip 0...

    Dear requerdanos,

    Your service request : SR #{ticketno:[######6680]} has been reviewed and updated.

    Response and Service Request History:

    Thank you for your email and background information about your issue. I’m sorry to hear that you’re experiencing stability issues with your system. Please be assured that I am here to help find a resolution to your problem

    At this time, I would like focus on your system’s hardware configuration. I need to collect some more information about your system which can help with our troubleshooting.

    Please provide the details of the following hardware components in your system:

            Make and model of motherboard

            Motherboard BIOS version

            Make and model of RAM

            Make and model of the power supply unit

    Please could you let me know the current settings you have for the CPU VCORE, SOC, and RAM? It would be very helpful if you could provide with pictures of your BIOS screens with these settings.

    In addition, through troubleshooting with other customers we have found that the layout of the components inside the system case have caused sub-optimal cooling of the CPU causing a variety of issues.

    I would like to better understand your system cooling to rule out any thermal issues. Please could you provide a picture of the whole interior of your system showing the CPU cooler?

    Also, could you let me know the reported CPU temperature during heavy load or when the errors occur?

    In order to update this service request, please respond, leaving the service request reference intact.

    Best regards,

    Asok

    AMD Global Customer Care

    That's right, their answer was basically "pics or it didn't happen." I am working to comply with their request. Also, they sent me this before linux 4.15 was released, wanting to know what temperature was reported--and 4.15 is the first kernel version to feature Ryzen CPU temperature reporting.

    • (Score: 0) by Anonymous Coward on Thursday February 01, @10:56AM (1 child)

      by Anonymous Coward on Thursday February 01, @10:56AM (#631414)

      That's right, their answer was basically "pics or it didn't happen."

      The way I read their answer is: "we need accurate information to be able to figure out what the problem is". Which makes perfect sense. Many people are highly inaccurate when giving descriptions of things that went wrong (which is perfectly normal and nothing to blame them for), and in complex systems that can easily mean the problem solver keeps looking in the wrong places and won't come near pinpointing and solving the problem. You need to help the problem solver to help you by being accurate, and this problem solver is helping you to be accurate by asking for pictures. Just work together for the best result.

      • (Score: 2) by requerdanos on Thursday February 01, @05:45PM

        by requerdanos (5997) Subscriber Badge on Thursday February 01, @05:45PM (#631579) Journal

        You need to help the problem solver to help you by being accurate, and this problem solver is helping you to be accurate

        There is a known CPU bug [amd.com] which manifests especially when compiling with gcc, and a simple test [github.com] for it, which I sent them the output of.

        Besides which my processor batch is known to have the bug, and I sent them that as well.

        There is not an issue in this particular instance with "well, things are complicated, and we need to be real accurate."

        I have a first-run CPU with a bug. AMD will replace them, but only if you complete their endurance course (instead of replacing them because they are defective and proven so).

        The temperature output they ask for might have been relevant, but did not even exist when they asked for it--no available driver.

        The CPU does not have a bug because of the placement of internal components, because of what other hardware it's paired with, or the phase of the moon.

        It left the factory with that bug, I paid $500 for that buggy CPU, and here it is in my computer.

        There isn't much difference here between "Well, we need to carefully weigh all the factors" and "pics or it didn't happen."

        I am out $500 for a buggy CPU. I want a new one. Give me it. That is all the accuracy needed here.

        Meantime thank God this machine does not run Gentoo.

  • (Score: 2) by MichaelDavidCrawford on Thursday February 01, @12:04AM

    by MichaelDavidCrawford (2339) Subscriber Badge <mdcrawford@gmail.com> on Thursday February 01, @12:04AM (#631262) Homepage Journal

    I thought my client's source resulted in bad machine code. I did not at first know the AMD64 ABI so I read up on it.

    The ABI has this mad as a cut snake feature in which subroutines can use up to 128 bytes of stack without adjusting the stack pointer.

    But that same doc informed me that the red zone had to be disabled for Linux kernel code.

    I puzzled over how to do that in Mac OS X then discovered Xcode's kernel mode setting. Just enable kernel mode and the AMD64 madness happily goes away.

    Whoever set up that Xcode project clearly was not a kernel or driver developer.
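Why the red zone has to go in kernel code can be shown with a toy stack model. This is a simulation under stated assumptions, not real machine behaviour: `run_leaf` is a hypothetical helper, the stack is a `bytearray`, and the interrupt is assumed to push a 40-byte hardware frame directly below the current stack pointer, as happens when the interrupted code is already running on the kernel stack. With GCC or Clang the equivalent switch is `-mno-red-zone`.

```python
STACK_SIZE = 256
INTERRUPT_FRAME = 40   # five 8-byte words pushed by the CPU on an interrupt


def run_leaf(use_red_zone: bool) -> bool:
    """Return True if a leaf function's local survives an interrupt."""
    stack = bytearray(STACK_SIZE)
    sp = 128                       # stack grows toward lower addresses
    local = b"\xAA" * 8

    if use_red_zone:
        slot = sp - 8              # store below sp, leave sp untouched
    else:
        sp -= 8                    # reserve the space first
        slot = sp
    stack[slot:slot + 8] = local

    # Interrupt on the same stack: the frame lands just below sp.
    stack[sp - INTERRUPT_FRAME:sp] = b"\xFF" * INTERRUPT_FRAME

    return bytes(stack[slot:slot + 8]) == local


print("red zone, local intact:   ", run_leaf(use_red_zone=True))
print("no red zone, local intact:", run_leaf(use_red_zone=False))
```

In user space this is safe because interrupts switch to a separate kernel stack, so nothing lands in those 128 bytes behind the function's back. In the kernel that protection is gone, which is why the ABI document says to disable it.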

    --
    "MICHAEL DAVID CRAWFORD IS A LYING MOTHERFUCKER."
    -- Anonymous Coward