
SoylentNews is people

posted by Fnord666 on Wednesday January 31 2018, @05:14PM
from the doesn't-raid-fix-this? dept.

Arthur T Knackerbracket has found the following story:

In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.

"We've seen at least two serious bugs in Intel CPUs in the last quarter, and it's almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we've moved past that."

Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase.

This month's Meltdown and Spectre security flaws that affect chip designs from AMD, Arm, and Intel to varying degrees support that claim. But there are many other examples.



 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by JoeMerchant on Wednesday January 31 2018, @06:51PM (17 children)

    by JoeMerchant (3937) on Wednesday January 31 2018, @06:51PM (#631089)

    Race to the bottom? I mean, can we at least have a reasonable option for a validated processor that works, and works correctly, instead of one that runs 10% faster but has bugs? Put another way, if there were 2 notebook PCs at NewEgg, identical in every way except that one had 2.4GFlops effective throughput on a typical task load - with 99.999% validated design, and another with 1.8GFlops performance on the same test, but with 99.99999% validated design - isn't there a market for the more reliable machine?

    --
    🌻🌻 [google.com]
  • (Score: 1, Informative) by Anonymous Coward on Wednesday January 31 2018, @06:57PM (3 children)

    by Anonymous Coward on Wednesday January 31 2018, @06:57PM (#631092)

    It's nowhere near that simple. They paid for a lot more expensive people (like me) for Xeon and Itanium validation than for consumer stuff. Try ECC I guess? Don't overclock (and especially don't over-volt) your stuff! I ran a lab at Intel that did high temperature, high voltage stress tests on consumer parts (Pentium D), and we saw lots of errors. They basically died over a few months.

    • (Score: 3, Insightful) by JoeMerchant on Wednesday January 31 2018, @10:09PM (2 children)

      by JoeMerchant (3937) on Wednesday January 31 2018, @10:09PM (#631206)

      Well, on the one hand, you (and I) are "expensive," but when that cost is spread out over millions of copies it's not nearly as much, and I guess what worries me the most is the dismantlement of the validation program, because those things are a lot harder to set up than they are to keep running.

      --
      🌻🌻 [google.com]
      • (Score: 0) by Anonymous Coward on Thursday February 01 2018, @01:27AM (1 child)

        by Anonymous Coward on Thursday February 01 2018, @01:27AM (#631286)

        Intel has beefed up validation after various issues--we didn't lack for money in the department. You mention spreading cost out--that's why server chips are so expensive. You have expensive people like me validating chip designs that are sold in smaller quantities than the latest Android.

        • (Score: 2) by JoeMerchant on Thursday February 01 2018, @12:56PM

          by JoeMerchant (3937) on Thursday February 01 2018, @12:56PM (#631450)

          that's why server chips are so expensive. You have expensive people like me validating chip designs that are sold in smaller quantities than the latest Android.

          So, I get tiered marketing and that you need to sell some product at a higher price point, but... wouldn't it make a kind of sense to pour the heaviest validation onto the line that sells the most copies? Maybe not a marketing "juice 'em for maximal profits" kind of sense, but a "don't be dicks to the world" kind of sense?

          --
          🌻🌻 [google.com]
  • (Score: 3, Funny) by MostCynical on Wednesday January 31 2018, @09:57PM

    by MostCynical (2589) on Wednesday January 31 2018, @09:57PM (#631198) Journal

    what the market needs is a new certification:

    "This chip has been validated by the NSA"

    Or, for pcs and laptops, just a nice "NSA Certified" sticker.

    --
    "I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
  • (Score: 1) by khallow on Thursday February 01 2018, @02:35AM (4 children)

    by khallow (3766) Subscriber Badge on Thursday February 01 2018, @02:35AM (#631302) Journal

    I mean, can we at least have a reasonable option for a validated processor that works

    How would validation catch the Spectre [wikipedia.org] bug? It's derived from subtle observation of memory caching and timing delays of the cache queues. Can't validate what you don't know you need to validate. Even if the CPU manufacturers fully fix this one, how will we validate all possible interactions of the internal components of the CPU?

    • (Score: 2) by JoeMerchant on Thursday February 01 2018, @04:03AM

      by JoeMerchant (3937) on Thursday February 01 2018, @04:03AM (#631337)

      How would validation catch the Spectre bug?

      In our industry we have a fancy acronym that means: get a bunch of people who know something about the issues, force them to sit in a room and seriously consider them at least long enough to write a report and file it. Lately, there's a lot of handwringing around cybersecurity, and I'm constantly pinged by the junior guys who get worried about X, Y, or Z - and 9 times out of 10 it's nothing, but once in a while they bring up a good point, and some of those good points are things like Spectre - things nobody had considered before. Our development process on a single product goes on for a couple of years, the process calls for these cybersecurity design reviews periodically throughout those years, and over that time people do actually come up with this stuff. So, our reports analyze X, Y, and Z, and either write them off as adequately handled, or shut down the project until they are.

      The real problem is culture - like the Shuttle launch culture that couldn't be stopped for handwringing over ice in the O-rings, or a big corporate culture that doesn't want to pay its own engineers to discover vulnerabilities in the product early enough to fix them before the rest of the world.

      I just gave a mini-speech today that included: "it needs to be tested, if we don't test it our customers will."

      Can't validate what you don't know you need to validate.

      No, you can't - but, as world-leading experts in the field you should be able to figure out most of the things you need to validate before the world figures them out for you. In the case of processors that serve separate users partitioned by hypervisor, the industry could have thought of this exploit before the hacker community (and likely did). As soon as they thought of it, they should have fed that knowledge back into the design process to work out effective fixes for the next generation of processors (and likely did not).

      --
      🌻🌻 [google.com]
    • (Score: 1) by pTamok on Thursday February 01 2018, @09:46AM (2 children)

      by pTamok (3042) on Thursday February 01 2018, @09:46AM (#631391)

      Techniques for provably secure hardware from the gate-level and up are known. For various reasons they are not applied.

      e.g. 2011: Design and Verification of Information Flow Secure Systems [utexas.edu]

      We show that it is possible to construct hardware-software systems whose implementations are verifiably free from all illegal information flows. This work is motivated by high assurance systems such as aircraft, automobiles, banks, and medical devices where secrets should never leak to unclassified outputs or untrusted programs should never affect critical information. Such systems are so complex that, prior to this work, formal statements about the absence of covert and timing channels could only be made about simplified models of a given system instead of the final system implementation.

      and

      2017: Register transfer level information flow tracking for provably secure hardware design [ieee.org]

      That's just one IEEE paper - if you look at the home-page of one of the authors (Wei Hu [ucsd.edu]), you can see many other papers in pdf format, including the full text of the above IEEE reference [ucsd.edu]. There are plenty of references to earlier work listed in that paper.

      Note that hardware can be messed with below the gate-level. Nonetheless, techniques for validating processors have been around for decades, they have 'simply' not been used in the general commercial market as they have been regarded as too time-consuming, expensive, or resource hungry. Military and aerospace markets have had different priorities. High Assurance, as a discipline, has been around for a very long time.
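To make the gate-level idea concrete, here is a minimal sketch of taint tracking for a single AND gate, in the general style of gate-level information flow tracking. It is an illustration under stated assumptions, not code from the papers linked above: the function names are hypothetical, and a real flow generates this kind of shadow logic for every gate in a design. Each wire carries a value bit and a taint bit, and the output is marked tainted only when a tainted input could actually change it.

```python
from itertools import product


def and_gate_taint(a, b, a_t, b_t):
    """Shadow logic for a 2-input AND gate (hypothetical helper).

    a, b are the input values; a_t, b_t are their taint bits.
    Returns 1 if the output should be considered tainted, else 0.
    """
    return (a & b_t) | (b & a_t) | (a_t & b_t)


def output_can_change(a, b, a_t, b_t):
    """Ground truth by brute force: vary every tainted input and
    see whether the value of a AND b can differ."""
    a_values = (0, 1) if a_t else (a,)
    b_values = (0, 1) if b_t else (b,)
    outputs = {x & y for x in a_values for y in b_values}
    return int(len(outputs) > 1)


def shadow_logic_is_precise():
    """Compare the shadow logic with the ground truth on all 16 cases."""
    return all(
        and_gate_taint(a, b, a_t, b_t) == output_can_change(a, b, a_t, b_t)
        for a, b, a_t, b_t in product((0, 1), repeat=4)
    )
```

The point of the value-aware formula is precision: a tainted input ANDed with an untainted 0 cannot influence the output, so the output stays untainted, which a naive "OR the taint bits" rule would miss.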

      • (Score: 1) by khallow on Friday February 02 2018, @05:27PM (1 child)

        by khallow (3766) Subscriber Badge on Friday February 02 2018, @05:27PM (#632063) Journal

        Nonetheless, techniques for validating processors have been around for decades, they have 'simply' not been used in the general commercial market as they have been regarded as too time-consuming, expensive, or resource hungry.

        This. The key one is the sheer impracticality of it as a likely NP-complete problem, but there are other issues as well.

        Note that hardware can be messed with below the gate-level.

        Hardware can also be messed with above the gate-level. Gates are merely an approximation.

        Finally, an important way to simplify and make more efficient a CPU is to share various sorts of resources. But such sharing increases the number and complexity of interactions between components of the CPU.

        This is not impossible, but I think the value of validation is being overplayed in this thread.

        • (Score: 1) by pTamok on Friday February 02 2018, @07:36PM

          by pTamok (3042) on Friday February 02 2018, @07:36PM (#632120)

          Thanks for the reply. I heartily recommend the first reference I gave. Give it a read - it is not overly technical.

          You are likely right that the general problem is probably NP-complete: or at least difficult, if you assume things like unbounded memory and unbounded state-tables. However, if you place bounds on such things, the problem becomes tractable.
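As a rough sketch of why bounding helps: with a fixed number of input bits, a "secret never reaches the output" property can be checked by plain enumeration. The code below is an assumption-laden toy, not a real verification flow. The names `leaks_secret`, `safe_circuit`, and `leaky_circuit` are hypothetical, the circuits are modelled as pure functions, and timing channels are not modelled at all. The cost is 2^(n_public + n_secret) evaluations, which is finite for a bounded design but grows exponentially with its size.

```python
from itertools import product


def leaks_secret(circuit, n_public, n_secret):
    """Return True if, for some fixed public input, changing only the
    secret bits changes the circuit's output (a functional leak)."""
    for public in product((0, 1), repeat=n_public):
        outputs = {
            circuit(public, secret)
            for secret in product((0, 1), repeat=n_secret)
        }
        if len(outputs) > 1:
            return True
    return False


def safe_circuit(public, secret):
    # Hypothetical design: output depends on public bits only.
    return public[0] ^ public[1]


def leaky_circuit(public, secret):
    # Hypothetical design: the secret bit shows through when public[0] is 1.
    return public[0] & secret[0]
```

Here `leaks_secret(safe_circuit, 2, 1)` is False and `leaks_secret(leaky_circuit, 2, 1)` is True; the same loop on a design with hundreds of state bits would never finish, which is where the smarter techniques come in.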

          I put 'simply' in scare quotes because cost is a driver to the bottom as far as commercial business systems are concerned. If a business can make a short-term gain by ignoring security requirements, it will. You can keep the plates spinning for a while...

          It is not impossible to produce formally-proven systems, merely difficult, and you have to be discerning about your axioms. As long as people choose cheapness over correctness, we will continue to have problems like Meltdown, Spectre, and multifarious side-channel attacks. It probably doesn't matter for most business systems, but aerospace will continue to provide a proving ground for such things, hopefully followed by medical applications (do you want your pacemaker to be hackable?). I hope that at some point in the future, the benefit of formally-proven systems will outweigh the cost-increment over the slapdash approach currently used. I don't think that time will come soon, unfortunately.

  • (Score: 2) by Wootery on Thursday February 01 2018, @10:11AM (6 children)

    by Wootery (2341) on Thursday February 01 2018, @10:11AM (#631399)

    isn't there a market for the more reliable machine?

    How reliable do you want? Server hardware is pretty good, no? If you want near-perfection, there are CPUs out there rated for safety critical systems, but it'll likely cost you 50x the price, and the performance won't be anywhere close to that of a modern Intel CPU.

    Fun fact: the RAD750 [wikipedia.org], a radiation-resistant PowerPC chip from 2002, is clocked at 200MHz. Its unit cost: around $200,000, back then when that was real money.

    It's like with software. Formally-verified software exists, but is enormously more expensive to develop. (Vaguely related: the CompCert formally verified C compiler is actually performance-competitive with GCC optimised builds. [inria.fr] I wouldn't have guessed, but there we are. Neat!)

    99.99999% validated design

    Meaning what?

    • (Score: 2) by JoeMerchant on Thursday February 01 2018, @01:05PM (4 children)

      by JoeMerchant (3937) on Thursday February 01 2018, @01:05PM (#631453)

      If you want near-perfection, there are CPUs out there rated for safety critical systems, but it'll likely cost you 50x the price,

      There's a positive feedback loop involved there - the 50x price is because the validation costs $V and the sales volume is Ntiny, so $V/Ntiny = 49x the price of a normal CPU.

      and the performance won't be anywhere close to that of a modern Intel CPU.

      More of the non-virtuous positive feedback loop - low volume market = infrequent product refresh cycles.

      If that same $V effort were applied to the high volume product line (Nhuge) $V/Nhuge might = 0.05x the price of the chips, or less. More importantly, it would also slow delivery of product by x months on average, which is a perceived competitive cost...
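The amortization argument can be sketched in a few lines. Every figure here is an assumption picked only to reproduce the 49x and 0.05x ratios above (a $100 base chip, a $490M validation budget, 100,000 units versus 98,000,000 units); none of it is real pricing data, and `price_multiple` is a hypothetical helper.

```python
def price_multiple(validation_cost, units_sold, base_unit_price):
    """Unit price as a multiple of the base price once the fixed
    validation cost is spread evenly over every unit sold."""
    per_unit_validation = validation_cost / units_sold
    return (base_unit_price + per_unit_validation) / base_unit_price


# Hypothetical figures, chosen to match the ratios in the comment.
BASE_PRICE = 100.0          # price of a "normal" CPU, in dollars
VALIDATION = 490_000_000.0  # the fixed $V validation effort
N_TINY = 100_000            # safety-critical sales volume
N_HUGE = 98_000_000         # mass-market sales volume

low_volume = price_multiple(VALIDATION, N_TINY, BASE_PRICE)   # about 50x
high_volume = price_multiple(VALIDATION, N_HUGE, BASE_PRICE)  # about 1.05x
```

Under these assumptions the same validation effort that makes the low-volume part 50x the base price adds only about 5% to the high-volume part; the schedule slip is the cost this arithmetic does not capture.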

      I say perceived cost because, often I will buy a generation, or sometimes two, back from the bleeding edge just because they are the devils whose faces I know - Skylake was a clusterfuck, and only now am I starting to feel confident that we can deal with all of its quirks in a product. The performance gains of the next couple of generations are nice, but truly unnecessary for any application I have. Bugs, driver glitches, field patches - lack of those all matter much more to me.

      --
      🌻🌻 [google.com]
      • (Score: 2) by Wootery on Thursday February 01 2018, @01:20PM (3 children)

        by Wootery (2341) on Thursday February 01 2018, @01:20PM (#631457)

        I'm inclined to trust market forces here. If people cared more about correctness than performance, wouldn't we expect the CPUs on the market to reflect that?

        • (Score: 3, Insightful) by JoeMerchant on Thursday February 01 2018, @02:09PM (2 children)

          by JoeMerchant (3937) on Thursday February 01 2018, @02:09PM (#631475)

          I'm inclined to trust market forces here. If people cared more about correctness than performance, wouldn't we expect the CPUs on the market to reflect that?

          Seriously? The mass CPU market is consumer driven, you trust Facebook users to decide how robust/secure the majority of CPUs manufactured and used in the world should be?

          --
          🌻🌻 [google.com]
          • (Score: 2) by Wootery on Thursday February 01 2018, @03:43PM (1 child)

            by Wootery (2341) on Thursday February 01 2018, @03:43PM (#631500)

            Eh? Do Facebook profit by their servers being insecure?

            • (Score: 2) by JoeMerchant on Thursday February 01 2018, @09:21PM

              by JoeMerchant (3937) on Thursday February 01 2018, @09:21PM (#631700)

              Not talking about Facebook itself profiting, talking about the mass market electronics consumers of the world (Facebook users, among others) and their "collective wisdom" with respect to reliability, security, etc. For every Facebook server machine, there are hundreds of users who access it via multiple consumer gadgets each - that's the market that needs a nanny.

              --
              🌻🌻 [google.com]
    • (Score: 2) by JoeMerchant on Thursday February 01 2018, @01:08PM

      by JoeMerchant (3937) on Thursday February 01 2018, @01:08PM (#631455)

      99.99999% validated design

      Meaning what?

      Nothing, of course, except that it's orders of magnitude better than 99.999%. When you're talking about catching the next Spectre before it's exploited in the wild, there are no metrics that mean anything, but effort invested in looking for the problems does pay off in proportion to the amount of effort invested.

      --
      🌻🌻 [google.com]