
posted by Fnord666 on Thursday April 12 2018, @10:31AM
from the for-the-"cloud" dept.

Submitted via IRC for AndyTheAbsurd

IBM is launching what it calls a "skinny mainframe" for cloud computing. The system is built around IBM z14 mainframe technology, and it features a 19-inch industry standard, single-frame case design, allowing for easy placement into public cloud data centers and for private cloud deployments.

[...] With the mainframe in high demand and more relevant than ever, IBM worked closely on the design with more than 80 clients, including managed service providers, online banks, and insurance firms, to reinvent the mainframe for a whole new class of users.

The new z14 and LinuxONE offerings also bring significant increases in capacity, performance, memory, and cache across nearly all aspects of the system. A complete system redesign delivers this capacity growth in 40 percent less space, standardized to be deployed in any data center. The z14 ZR1, announced today, can be the foundation for an IBM Cloud Private solution, creating a "data center in a box" by co-locating storage, networking, and other elements in the same physical frame as the mainframe server.

The z14 ZR1 delivers 10 percent more capacity than its predecessor, the z13s, and, at 8TB, twice the memory. The system can handle more than 850 million fully encrypted transactions per day.

Source: https://venturebeat.com/2018/04/09/ibm-launches-skinny-mainframe-for-the-cloud/

Also at The Register

Technical Introduction (IBM Redbook)


Original Submission

 
  • (Score: 2) by DannyB on Friday April 13 2018, @12:56PM (10 children)

    by DannyB (5839) Subscriber Badge on Friday April 13 2018, @12:56PM (#666438) Journal

    Google can, and Google does. And when you're processing the volume of data that Google is, it's much cheaper to hire bright people to design software systems that are tolerant of failures of individual nodes than it is to buy hardware that incorporates redundancy at every level, transparently performs error checking and correction, and presents software with the illusion that hardware never fails.

    Then Google makes this technology open source and cooperates with others in its continued development and evolution.

    I remember reading that Google can fail over entire data centers. With technology like Kubernetes, that makes sense. Containers could be executed in a different data center just the same. All it would take is a scheduler smart enough to prioritize the preferred cluster's nodes but able to use lower-priority clusters (e.g., other data centers) when no nodes are available on the preferred cluster.
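
    To illustrate the priority idea, here is a toy sketch (not actual Kubernetes scheduler code; the cluster names and the Node/ToyScheduler types are made up for illustration):

        // Toy sketch: prefer nodes in the primary cluster, fall back to a
        // lower-priority cluster (e.g. another data center) when none are free.
        import java.util.List;
        import java.util.Optional;

        record Node(String name, String cluster, boolean available) {}

        class ToyScheduler {
            // clustersByPriority might be ["us-east-primary", "eu-west-backup"]
            static Optional<Node> pick(List<Node> nodes, List<String> clustersByPriority) {
                for (String cluster : clustersByPriority) {
                    Optional<Node> candidate = nodes.stream()
                            .filter(n -> n.cluster().equals(cluster) && n.available())
                            .findFirst();
                    if (candidate.isPresent()) {
                        return candidate;   // a node in the preferred cluster is free
                    }
                }
                return Optional.empty();    // nothing available in any data center
            }
        }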

    I also remember reading a Google engineer saying that Google's network connectivity between data centers is better than some other companies' connectivity within their data centers.

    I wonder at what point the reliability of this exceeds the reliability of mainframes, despite their amazing and exotic (expensive) hardware.

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 2) by TheRaven on Saturday April 14 2018, @10:11AM (9 children)

    by TheRaven (270) on Saturday April 14 2018, @10:11AM (#666856) Journal
    Google does make this technology open source, but that doesn't always help. You still need to employ developers who understand the frameworks well enough to map their problem to them. This means you need developers who understand the failure modes and latency issues of distributed consensus algorithms. Those things are complicated, so those developers are not cheap.

    Reliability for algorithms that have been mapped to work on a distributed environment like a Google cloud already meets or exceeds that of mainframes, but each implementation requires a lot of careful design to map a problem to an unreliable substrate. That costs money, each time. It's worth doing for Google, because all of the jobs that they use this for are huge, and building a mainframe large enough to handle them would be infeasible. It's not worth doing for most people. If you have a job that can run on a handful of machines, it's a lot cheaper to run it on a slice of a mainframe than to write a version of it that works as a resilient distributed system.

    --
    sudo mod me up
    • (Score: 2) by DannyB on Saturday April 14 2018, @02:30PM (8 children)

      by DannyB (5839) Subscriber Badge on Saturday April 14 2018, @02:30PM (#666933) Journal

      I believe that all you must do is map your problem into Map and Reduce operations. The infrastructure makes sure that everything gets processed correctly.

      Suppose I have four items: A, B, C and D.
      Each item must have function f1 applied to it, to produce A1, B1, C1 and D1.

      {A1, B1, C1, D1} == Map f1 over {A,B,C,D}

      Now I need to apply function f2 over each of those . . .

      {A2, B2, C2, D2} == Map f2 over {A1,B1,C1,D1}

      Now that might be your answer. But in some cases, I want to apply an operator op1 to pairs in order to Reduce them to a single result. op1 is both associative and commutative.

      Result == (A2 op1 B2) op1 (C2 op1 D2)

      (Imagine that op1 is simple addition.)

      The infrastructure takes care of reliability for you. If applying B2 = f2( B1 ) in the 2nd Map operation fails, the infrastructure will re-run that function application on the same data in order to get B2. These are pure functions with no side effects. So if an attempt fails, applying B2 = Sqrt( B1 ) again on another node does not cause any problem, because it is a pure function.

      Surprisingly many batch and stream processing operations can be expressed as Map / Reduce and run on clusters. You don't even have to consider reliability; you just take it for granted, just like when you run a single program on a desktop computer.
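
      To make that concrete, here is roughly how the A/B/C/D example looks in plain Java streams. The parallelStream() call is only a single-machine stand-in for a real distributed framework (which would also handle re-running failed applications of f1 and f2); f1, f2, and op1 are just the placeholders from above:

          import java.util.List;

          public class MapReduceSketch {
              // Stand-ins for the pure functions in the example above.
              static double f1(double x) { return x * 2.0; }          // hypothetical f1
              static double f2(double x) { return Math.sqrt(x); }     // hypothetical f2
              static double op1(double a, double b) { return a + b; } // associative, commutative

              public static void main(String[] args) {
                  List<Double> items = List.of(1.0, 2.0, 3.0, 4.0);   // A, B, C, D

                  double result = items.parallelStream()
                          .map(MapReduceSketch::f1)                   // {A1, B1, C1, D1}
                          .map(MapReduceSketch::f2)                   // {A2, B2, C2, D2}
                          .reduce(0.0, MapReduceSketch::op1);         // (A2 op1 B2) op1 (C2 op1 D2)

                  System.out.println(result);
              }
          }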

      Google does make extensive documentation available. This is not some obscure technology. Try googling or YouTubing 'kubernetes', 'docker', etc., and you'll see that this is already becoming widely supported. The approach is scalable in a way that running on a mainframe is not.

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
      • (Score: 2) by TheRaven on Saturday April 14 2018, @08:29PM (7 children)

        by TheRaven (270) on Saturday April 14 2018, @08:29PM (#667029) Journal

        I believe that all you must do is map your problem into Map and Reduce operations. The infrastructure makes sure that everything gets processed correctly.

        Yes, that's 'all' you must do. The fact that you can write 'all' in that way implies that you have never tried to do that with anything that is not embarrassingly parallel or that is in any way latency sensitive (typical Map-Reduce jobs run for tens of minutes or hours; most of the things that companies have high-reliability requirements for need responses in milliseconds).

        --
        sudo mod me up
        • (Score: 2) by DannyB on Monday April 16 2018, @02:29PM (6 children)

          by DannyB (5839) Subscriber Badge on Monday April 16 2018, @02:29PM (#667638) Journal

          Map / Reduce is not the only way to use a cluster.

          I would suggest that Twitter might be latency sensitive. In 2012, Twitter rewrote from Ruby into Java. They handle billions of tweets a day, each of which triggers various events and gets routed to multiple destinations.

          I would point out that High Speed Trading is done in Java on Linux. Also Java and Linux are things that IBM pushes on their mainframes. But that is not what I think of as the mainframe era, as Java and Linux are very well suited to clusters of commodity hardware.

          I'm not sure what type of application you are thinking of.

          An airline reservation system, or something similar for hotels, etc.? Long ago that application was probably written for mainframes. It probably could be adapted, IF NEED BE, for clusters of commodity hardware. Maybe you're thinking of an application like air traffic control systems. I don't know. But I doubt that the mainframe is the only way to do it fast, reliably, or economically. And especially economically.

          Maybe there is some application for which a mainframe is somehow the only uniquely suitable solution. I don't see it. But I'm open to being enlightened.

          --
          To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
          • (Score: 2) by TheRaven on Tuesday April 17 2018, @12:34PM (5 children)

            by TheRaven (270) on Tuesday April 17 2018, @12:34PM (#668013) Journal

            Map / Reduce is not the only way to use a cluster.

            Right. So now you need not only the expertise to use Map Reduce, but the expertise to use a number of different frameworks and to determine which is applicable. You think that's cheaper?

            I would suggest that Twitter might be latency sensitive. In 2012, Twitter rewrote from Ruby into Java. They handle billions of tweets a day, each of which triggers various events and gets routed to multiple destinations.

            Next time you talk to someone at Twitter, ask them how much engineering effort they've invested in dealing with tail latency spikes from garbage collection in their distributed Java systems. Hint: It's more than most companies want to spend on software development.

            I would point out that High Speed Trading is done in Java on Linux.

            No, it really isn't. Unless you have a very different definition of 'high speed' to any of the people I know in HST.

            Also Java and Linux are things that IBM pushes on their mainframes.

            Right, because it's cheaper to write a Linux program and run it on a mainframe than to write a fault-tolerant distributed system.

            --
            sudo mod me up
            • (Score: 2) by DannyB on Tuesday April 17 2018, @01:14PM (4 children)

              by DannyB (5839) Subscriber Badge on Tuesday April 17 2018, @01:14PM (#668030) Journal

              You've got me stumped. I can't imagine what kind of application you are getting at.

              Surely the Googles, Amazons, Twitters, Facebooks and other major players must be missing out on something HUGE in mainframes. Those ignorant sops.

              Consider the development, followed by rapid growth, of Kubernetes: the videos, talks, articles, etc. There are a lot of people who are clearly missing out on something.

              But you have not yet identified what.

              These people all seem to think that on commodity hardware they get vendor neutrality, high reliability, immense scalability, and economy.

              As for Java (or any language) with GC, yes, it has costs. But it has benefits larger than those costs. Sort of like hiring software developers is a cost. Or buying expensive servers.

              Seriously, I am open to being enlightened as to what solution, including a mainframe, is superior.

              --
              To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
              • (Score: 2) by TheRaven on Wednesday April 18 2018, @07:44AM (3 children)

                by TheRaven (270) on Wednesday April 18 2018, @07:44AM (#668483) Journal

                You are completely missing the point. Google, Twitter, Facebook, and so on spend a huge amount on software development. They can afford to, because they spend even more on hardware. If Google can save 0.1% on their hardware costs, that completely pays for several full-time engineers, even on Silicon Valley salaries.

                And, while Google does open source a lot of the core building blocks for their systems, that isn't the end. You can use those building blocks to assemble a reliable distributed system that will have the same uptime guarantees as a mainframe application, but it costs time for software engineers. It is a lot cheaper to build a non-distributed system that assumes that the underlying platform is reliable.

                For most companies, hardware costs are small. The cost of taking their problem and mapping it to a reliable distributed system is vastly more than the total cost of hosting. Moving to hardware that costs 2-3 times as much is far cheaper than paying even one full-time software engineer. If you need a reliable transactional data store, you can install an off-the-shelf database on a mainframe and, as long as your data sets are no more than a few TBs, you get more reliability than most companies need and a trivial programming model that someone fresh out of university can work on. Alternatively, you can use one of Google's systems and hire developers that understand the CAP theorem and its practical implications. They can then write software that is vastly more complicated and needs at least twice as many people to maintain it. On the plus side, it will run on any commodity cluster and will scale to much larger data sets. On the down side, you've spent vastly more money on software engineering than you'd spend in 10 years having IBM host it on a mainframe for you.

                The point is not that a mainframe can do things that commodity hardware can't do, it's that a mainframe can do the same thing with much cheaper software than a commodity cluster. If your software costs are a larger proportion of your total costs than hardware (which they are for a lot of companies) then mainframes are attractive.

                From your reply, I see you didn't look at the problems that Twitter had with Java GC pauses. I'd suggest that you go and read about some of their problems before assuming that it's trivial for any company to solve them.

                --
                sudo mod me up
                • (Score: 2) by DannyB on Wednesday April 18 2018, @04:18PM (2 children)

                  by DannyB (5839) Subscriber Badge on Wednesday April 18 2018, @04:18PM (#668625) Journal

                  I see now what you are arguing. I agree that a single-thread approach is always easier. The problem is that there is a limit to how much you can scale it. The first type of scaling out is to make it multi-threaded onto multiple cores that all share the same memory. The next step is to make it run on multiple cores that do NOT share the same memory but communicate by some other means. Both of those approaches are increasingly common these days. So it's not exactly rocket science.
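
                  For what it's worth, step one (multiple cores sharing memory) can look as simple as this rough sketch; nothing here is tied to any particular framework:

                      // Rough sketch of shared-memory parallelism: fan work out over
                      // cores with an ExecutorService; all tasks share the same heap.
                      import java.util.List;
                      import java.util.concurrent.Callable;
                      import java.util.concurrent.ExecutorService;
                      import java.util.concurrent.Executors;
                      import java.util.concurrent.Future;

                      public class SharedMemorySketch {
                          public static void main(String[] args) throws Exception {
                              ExecutorService pool = Executors.newFixedThreadPool(
                                      Runtime.getRuntime().availableProcessors());

                              List<Callable<Long>> tasks = List.of(
                                      () -> sum(0, 1_000_000),
                                      () -> sum(1_000_000, 2_000_000),
                                      () -> sum(2_000_000, 3_000_000));

                              long total = 0;
                              for (Future<Long> f : pool.invokeAll(tasks)) {
                                  total += f.get();   // blocks until each task finishes
                              }
                              System.out.println(total);
                              pool.shutdown();
                          }

                          static long sum(long from, long to) {
                              long s = 0;
                              for (long i = from; i < to; i++) s += i;
                              return s;
                          }
                      }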

                  As for Twitter and Java, I was just reading yesterday about the new GraalVM announcement. In particular, Twitter has been testing it and is interested in it. I suspect that running on a mainframe is not an option for Twitter. They probably have no choice but to architect their application to be massively parallel on large clusters. If you're going to use any modern language today (Java, C#, Node.js, Python, Ruby, any JVM language such as Scala, Kotlin, Clojure, etc.), then GC is simply a fact of life. Java's GC and JIT have had two decades of research by multiple interested parties. I would dare say that the JVM is the best industrial-strength GC runtime platform at the moment. But GraalVM sure looks promising, and it can even run LLVM bitcode.

                  Twitter could write in C to run on a massively parallel cluster, but the software development costs of that would be vastly higher than using a higher-level language.

                  I strongly suspect that Twitter investigated various options before settling on rewriting from Ruby to Java. Even in 2012, the warts in Java and its runtime were well known. They are probably just less of a problem than those of the other approaches available to Twitter.

                  Or I could be wrong and Twitter (and others) just has/have no clue what they are doing on their massively parallel cluster.

                  --
                  To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
                  • (Score: 2) by TheRaven on Wednesday April 18 2018, @05:09PM (1 child)

                    by TheRaven (270) on Wednesday April 18 2018, @05:09PM (#668641) Journal

                    I see now what you are arguing. I agree that a single-thread approach is always easier. The problem is that there is a limit to how much you can scale it. The first type of scaling out is to make it multi-threaded onto multiple cores that all share the same memory. The next step is to make it run on multiple cores that do NOT share the same memory but communicate by some other means. Both of those approaches are increasingly common these days. So it's not exactly rocket science.

                    Then the next step is to make it handle communications latency in the hundreds to thousands of milliseconds. Then the next step is to make it tolerant of a single process going away at random. Then you have something of the kind that Google builds. You're right that this isn't rocket science: it's much harder. I'm getting the impression from this thread that you've never programmed a distributed system, let alone a fault-tolerant distributed system and have absolutely no idea of the complexity involved.

                    As for Twitter and Java, I was just reading yesterday about the new GraalVM announcement. In particular, Twitter has been testing it and is interested in it.

                    Okay, so after three attempts you still haven't looked up what I told you to. Twitter runs Java in a distributed system. They need to keep average latency low enough that users don't get annoyed, which means well under 100ms. Most end-user requests require going to around 100 different machines for their data. In such a system, the probability that one node will be in the middle of a GC pause is approximately one, and the GC pause time is greater than their acceptable tail latency. To give you an idea of how difficult this problem is to solve well, there have been several dozen papers proposing solutions at top conferences in the fields of programming languages, networking, and OS design. And that's just one of the simpler problems that you need to solve when building a system like Twitter (which is pretty tolerant of data loss: if you accidentally delete the occasional tweet, most people won't notice - it's not like money is involved).
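
                    A quick back-of-the-envelope calculation shows why (the 5% pause fraction below is an assumed number for illustration, not Twitter's figure):

                        // If each of n back-end machines is mid-GC-pause a fraction p of
                        // the time, a fan-out request hits at least one paused machine
                        // with probability 1 - (1 - p)^n.
                        public class TailLatencySketch {
                            public static void main(String[] args) {
                                int n = 100;      // machines touched per end-user request
                                double p = 0.05;  // assumed: each node paused 5% of the time
                                double hit = 1.0 - Math.pow(1.0 - p, n);
                                System.out.printf("P(at least one pause) = %.3f%n", hit);
                                // ~0.994: effectively every request sees a GC pause somewhere
                            }
                        }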

                    I suspect that running on a mainframe is not an option for Twitter

                    I never said it was, but there are a lot of companies that are smaller than Twitter, for whom the cost of developing something like Twitter's network infrastructure is prohibitively expensive. Twitter employs over a thousand people to write their software, to solve what you keep claiming are easy problems.

                    Or I could be wrong and Twitter (and others) just has/have no clue what they are doing on their massively parallel cluster.

                    Twitter knows what they're doing. They're solving hard distributed systems problems by spending a lot of money on software engineers because, for a system of their size that doesn't have hard reliability guarantees, that's a lot cheaper than building or buying reliable hardware. Again, the fact that this is the right solution for Twitter tells you absolutely nothing about the right solution for most other companies. Your argument makes as much sense as telling FedEx customers that there's no point in shipping stuff in lorries because SpaceX can do it in a rocket.

                    --
                    sudo mod me up
                    • (Score: 2) by DannyB on Wednesday April 18 2018, @06:05PM

                      by DannyB (5839) Subscriber Badge on Wednesday April 18 2018, @06:05PM (#668665) Journal

                      I'm getting the impression from this thread that you've never programmed a distributed system, let alone a fault-tolerant distributed system and have absolutely no idea of the complexity involved.

                      If you qualify that the problems are not embarrassingly parallel, then you're right. I don't work on anything that requires interconnectedness between nodes or very low latencies. Even with embarrassingly parallel problems, I've only toyed with them. In a Map/Reduce situation I don't have to worry about reliability if a failed function application can be abstracted away.

                      after three attempts you still haven't looked up what I told you to.

                      Do you have a link you would like me to see? I would be interested because I do think of Twitter as a gigantic Java application. Maybe the very biggest.

                      I do not know the answer to this, but I would assume Twitter has looked at Azul's Zing and either found it unsuitable, found that it did not solve the problem, found it too expensive, or some combination.

                      I think of Kubernetes as something that lets you deploy a Docker container to Google or Amazon. Or you can use Amazon's Elastic Beanstalk, Google's App Engine, or other similar services. Your application is run on as many nodes as necessary. They don't talk to each other -- each is independent. Each instance can talk to the same database -- the database might be a replicated cluster, but you don't see that. The application must not maintain any in-server state between requests. Any state must be persisted somewhere (like the database) between requests, because another node may handle the next request. I do think of this technology as highly reliable. Maybe more reliable, and definitely more scalable, than a central mainframe. Once we start talking about clustering mainframes, we're already moving in the direction that makes traditional mainframes unnecessary.
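
                      The pattern I mean looks something like this minimal sketch (the StateStore interface is just a hypothetical stand-in for the shared database or cache):

                          import java.util.Map;
                          import java.util.concurrent.ConcurrentHashMap;

                          // Hypothetical stand-in for the shared, external database/cache.
                          interface StateStore {
                              String load(String sessionId);
                              void save(String sessionId, String state);
                          }

                          class InMemoryStore implements StateStore {
                              private final Map<String, String> data = new ConcurrentHashMap<>();
                              public String load(String id) { return data.get(id); }
                              public void save(String id, String state) { data.put(id, state); }
                          }

                          // The handler keeps no per-session state in its own memory, so any
                          // node behind the load balancer can serve the next request.
                          class StatelessHandler {
                              private final StateStore store;
                              StatelessHandler(StateStore store) { this.store = store; }

                              String handle(String sessionId, String input) {
                                  String previous = store.load(sessionId);
                                  String next = (previous == null ? "" : previous) + input;
                                  store.save(sessionId, next);   // persist before replying
                                  return next;
                              }
                          }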

                      The problems you describe are not problems I've had to solve. Mostly I'm interested in how to scale out my employer's application much more than is likely to become necessary in the near term.

                      You keep talking about reliable hardware. But the hardware in use *is* very reliable. More reliability comes from the ability to fail over or, in some situations, to re-execute the failed operation on another healthy node.

                      With that qualification, I'm not sure what you are countering. That traditional mainframes still have a future that I am not aware of? (Going back to the original point.)

                      I'm talking about a technology that is currently big news and is being used by people who know what they are doing. Maybe it doesn't solve some problems you are interested in. It seems that the crux of your replies is to point out problems that it does not solve. Yet these technologies are in widespread use for the problems that they do solve.

                      Getting out of the weeds and going back to my original assertions, I don't see a large future for traditional mainframes, other than for running traditional mainframe workloads.

                      But I'm always interested to learn something I didn't know.

                      --
                      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.