SoylentNews is people

posted by janrinok on Sunday July 21 2019, @05:22AM
from the I-have-felt-this-pain dept.

I've had some occasions of late to peer through the looking glass into a world that I hadn't seen much of previously. Specifically, I'm talking about the world of so-called "cloud" stuff, where you basically pay someone else to build and run stuff for you, instead of doing it yourself.

I'll skip the analysis of build vs. buy and just jump straight to the point where you've chosen "buy". Then you've had a whole bunch of fun outages caused by something going wrong with their services. Finally, you reach the point of a sit-down talk with the vendor to figure things out. Maybe they send some sales people too, or perhaps it's just engineers. You talk for a while, and before long, you realize what happened.

[...] This becomes obvious when talking about some problem you experienced at the hands of their system. The whole time, their dashboard stayed green because from their point of view, they had tremendous availability. We're talking 99.999% here! Totally legit!

Meanwhile, you were having a really bad day. Nothing was working. Your business was in shambles. Your customers were at your throat yelling for action, and all you could do is point at the vendor. What happened?

Well, this is the point where you find out that their "99.999%" availability is for their entire system. They see that, and they're good. It's not a problem! Everything is fine.

This also completely misses the fact that for you, everything was failing. It doesn't matter though, since your worst day still won't move the needle on their fail-o-meter. They won't see you. They won't have any idea anything even happened until you complain weeks later. You are the bug on the windscreen of the locomotive. The train has no idea you were ever there.

The problem is that they weren't monitoring from the customer's perspective. Had they done that, it would have been clear that oodles of requests from some subset of customers were failing. They would have also realized that certain customers had all of their requests failing. For those customers, there were no nines to be had that day.
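A minimal sketch of that difference, assuming the provider logs each request as a (tenant, succeeded) pair. The tenant names, traffic volumes, and the `availability_by_tenant` and `tenants_below` helpers are all made up for illustration; none of it comes from any real vendor's tooling.

```python
from collections import defaultdict


def availability_by_tenant(requests):
    """Return (aggregate, per_tenant) success ratios.

    `requests` is an iterable of (tenant_id, succeeded) pairs.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for tenant_id, succeeded in requests:
        totals[tenant_id] += 1
        if succeeded:
            successes[tenant_id] += 1
    per_tenant = {t: successes[t] / totals[t] for t in totals}
    aggregate = sum(successes.values()) / sum(totals.values())
    return aggregate, per_tenant


def tenants_below(per_tenant, threshold):
    """List the tenants whose own success ratio is under `threshold`."""
    return sorted(t for t, ratio in per_tenant.items() if ratio < threshold)


# Hypothetical traffic: 999 healthy tenants with 1,000 good requests each,
# plus one small tenant whose 5 requests all failed.
sample = [(f"tenant-{i}", True) for i in range(999) for _ in range(1000)]
sample += [("tenant-unlucky", False)] * 5

aggregate, per_tenant = availability_by_tenant(sample)
print(f"aggregate: {aggregate:.6%}")                 # still above five nines
print("failing:", tenants_below(per_tenant, 0.999))  # the one tenant at 0%
```

The aggregate number is what the green dashboard shows; the per-tenant dictionary is the view the customer actually lives in. Alerting on the worst tenant rather than on the total is one way to keep the bug on the windscreen visible.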

Seriously, if you have a multi-tenant system, you owe it to your customers to monitor it from their point of view. Otherwise, how can you possibly know when you've done something that'll leave them in the cold?



  • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @06:52AM (4 children)

    by Anonymous Coward on Sunday July 21 2019, @06:52AM (#869562)

    My Guru used to say "What leans supported, falls down together with a support."

    • (Score: 5, Interesting) by Rosco P. Coltrane on Sunday July 21 2019, @07:31AM (1 child)

      by Rosco P. Coltrane (4757) on Sunday July 21 2019, @07:31AM (#869570)

      The cloud is like a bank: you can either buy a safe and store your valuables in it, or you can put your money in a bank account. The former is expensive upfront, and requires some planning to be a secure option. The latter seems cheap and very safe, until you realize the bank loans your money, uses it to play high-risk games on the financial market, and if the bank messes up or fails, you lose your money.

      It happens time and time again, despite the banks being regulated and FDIC-insured. Cloud providers, on the other hand, aren't regulated. Would you entrust them with your data?

      • (Score: 1, Informative) by Anonymous Coward on Sunday July 21 2019, @04:16PM

        by Anonymous Coward on Sunday July 21 2019, @04:16PM (#869653)

        The latter seems cheap and very safe, until you realize the bank loans your money, uses it to play high-risk games on the financial market, and if the bank messes up or fails, you lose your money.

        Bank deposits are insured up to a certain amount in a number of countries including the USA (terms and conditions apply etc etc):
        https://www.fdic.gov/deposit/covered/categories.html [fdic.gov]

        In contrast do NOT use the bank safe deposit box if you want safety:
        https://www.nytimes.com/2019/07/19/business/safe-deposit-box-theft.html [nytimes.com]

    • (Score: 2) by driverless on Monday July 22 2019, @01:46AM

      by driverless (4770) on Monday July 22 2019, @01:46AM (#869794)

      My Guru used to say, "When the elephant is dancing in the bedroom, the window needs a new banana".

      I think my Guru was a bit too heavily into the ganja at some point.

    • (Score: 2) by DannyB on Monday July 22 2019, @01:59PM

      by DannyB (5839) Subscriber Badge on Monday July 22 2019, @01:59PM (#869944) Journal

      No.

      The cloud is a Thousand Points of Failure. All shining brightly.

      I prefer a cloud provider that can guarantee three sixes of reliability.

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 2, Insightful) by Anonymous Coward on Sunday July 21 2019, @07:00AM (9 children)

    by Anonymous Coward on Sunday July 21 2019, @07:00AM (#869564)

    You say that central service providers should monitor from customers' point of view.

    How does one do this? Do we position a piece of equipment at each customer's site? That flies in the face of the whole "you can run your corporation from a coffee shop with your cellphone" philosophy. It opens up a can of worms we thought we'd gotten rid of - insurance, power, security, maintenance, inventory, employees. We're trying to get rid of all that!

    Let's look at it from the service provider's view. They have a monitoring system and someone to watch it. They know what their infrastructure depends upon (see https://en.wikipedia.org/wiki/Dependency_hell) [wikipedia.org] and have remediation procedures in place - that's how they achieve and maintain 99.9999% uptime.

    Do you? Have a monitoring system? Or anyone to watch it?

    Or did you terminate all of YOUR employees, who understood YOUR dependencies, and depend upon the [verbal] [mis]representations of salespeople, when they told you it was OK, just put it into the cloud?

    I guess it's time to sit down and do the math.

    How much did you spend on your cloud-based renter infrastructure?

    How much did you save by outsourcing everything, including responsibility for YOUR infrastructure, to total strangers?

    How much business did you lose as a result of your downtime?

    Is relocating your business logic and confidential data offsite to a location and resources that you do not have control over or responsibility for, cost effective ... or not?

    It's a business decision, right? It shouldn't be that hard.

    We KNOW your IT people told you this.

    We KNOW you lie to yourselves and each other that you didn't know.

    All you had to do was draw a few cartoon-grade diagrams on a whiteboard to see the potential for network latency.

    We KNOW that you accepted the bonuses of the people you terminated as your due.

    We KNOW you're going to duck and weave and blame the same cloud company that we KNOW you selected.

    We KNOW you have been through three or four or five cloud vendors now and so your infrastructure is spread through three or four or five different cloud vendors' infrastructures.

    We KNOW you are having a hard time finding people who know three or four or five different cloud vendor infrastructure GUIs and command line interfaces, never mind one, especially as you haven't spent a penny on educating employees since 2002.

    We KNOW that you have trouble retaining even temporary employees and that you abuse them for not being able to administer the infrastructures that YOU relocated to vague and amorphous locations that you, yourself, do not even know, using vendors that YOU selected.

    We KNOW you will never let go of control of YOUR infrastructure until you die.

    We KNOW you will hold us responsible for everything that goes wrong and take credit for everything that goes right, until the day you die.

    Smart people are waiting for you all to die - or be fired - whichever comes first.

    Dumb people are studying for their AWS certificates, in a never-ending rat race of fees and evolving GUIs.

    I see night shift employees for the San Francisco Municipal Railway whose job is cleaning dirty buses getting paid better and having better work conditions and better job security than your average crack UNIX systems administrator.

    Why would I bother to learn AWS?

    If it's so easy, go teach yourself AWS and administer your OWN infrastructure, motherfuckers!

    ~childo

    ... 35 years continuous industrial-grade experience and going on 4 years of unemployment.

    • (Score: 1, Interesting) by Anonymous Coward on Sunday July 21 2019, @07:20AM (2 children)

      by Anonymous Coward on Sunday July 21 2019, @07:20AM (#869567)

      For average people it is clearly visible if a bus is clean or not, but fairly invisible if a cloud infrastructure is clean or not. It is a barrier of perception that makes the difference.

      • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @12:04PM (1 child)

        by Anonymous Coward on Sunday July 21 2019, @12:04PM (#869601)

        And the only way to make things real to the gaslighting assholes of the world is to organize. What can't IT people do for the life of them? Organize.

        They're armchair political theorists who mistake mytho-macho bullshit for political theory.

        • (Score: 1, Insightful) by Anonymous Coward on Sunday July 21 2019, @08:42PM

          by Anonymous Coward on Sunday July 21 2019, @08:42PM (#869715)

          A lot of IT people go into IT because they prefer dealing with computers to dealing with people. It's difficult to get large numbers of people-averse folks to organize with their fellow people.

    • (Score: 3, Interesting) by janrinok on Sunday July 21 2019, @09:32AM (2 children)

      by janrinok (52) Subscriber Badge on Sunday July 21 2019, @09:32AM (#869585) Journal

      Do you? Have a monitoring system? Or anyone to watch it?

      Yes, that's how people know that they are not getting the 99.999% that they were told they could expect. Once again, marketing makes claims that are not experienced by the users. I don't care how efficiently the cloud provider thinks they are performing if I am getting a service that is unable to sustain my business.

      If it's so easy, go teach yourself AWS and administer your OWN infrastructure, motherfuckers!

      You sound a bit hurt....

      ... 35 years continuous industrial-grade experience and going on 4 years of unemployment.

      ... and that explains why. But your experiences do not reflect how every company treats its employees. If a customer is paying for a service then they are entitled to expect to receive that service.

      • (Score: 3, Interesting) by Rupert Pupnick on Sunday July 21 2019, @12:50PM

        by Rupert Pupnick (7277) on Sunday July 21 2019, @12:50PM (#869618) Journal

        The rise of cloud services is an example of how the market for tech people is moving towards narrower fields of tool (in the software sense) based expertise, and thus greater fragmentation. In software, I imagine that some of this fragmentation is artificial because it helps lock in customers. Generally, as technology moves forward, more specialization emerges. This has been going on at least since the scientific revolution, so maybe it’s inevitable. If you want to go into STEM, expect to become a specialist, and hope that what you picked will be in demand for years to come. If you go into management, it’s not as big a worry.

      • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @10:08PM

        by Anonymous Coward on Sunday July 21 2019, @10:08PM (#869743)

        A company I contracted for had a foolproof monitoring system in place. They’d wait for customers to complain it wasn’t working. Worked perfectly.

    • (Score: 2) by bradley13 on Sunday July 21 2019, @05:39PM

      by bradley13 (3053) on Sunday July 21 2019, @05:39PM (#869678) Homepage Journal

      Man, who pissed in your cornflakes?

      Look, there are situations where the cloud is a stupid solution. And there are also situations where it is an incredible solution. The trick is to know the difference.

      As for system administrators being mistreated: there are companies, and then there are companies. Let me digress...

      When I'm consulting (which I do a fair amount of, mostly for SMEs that are too small to have their own IT staff), one of the things I look for is the visibility of whoever is taking care of their computing infrastructure. If whoever they've hired is really visible, always out there fixing stuff, that's actually a horrible sign. It means that stuff is always breaking, users are poorly trained, etc... If everything "just works", and most people aren't sure who to call if they have a problem, that's generally a great sign.

      So: If you have a company where the sys-admin is good enough to keep everything running, almost invisibly - and if this sys-admin is being mistreated - then the sys-admin needs to move on to a decent company. Alternatively, there are sys-admins who feel terribly put-upon, because they are simply incompetent. Their infrastructure is always failing. People are always disturbing their peaceful game of minesweeper, because of yet-another-outage. From the tone of AC's rant, well, I think the latter scenario is the more fitting...

      --
      Everyone is somebody else's weirdo.
    • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @05:43PM

      by Anonymous Coward on Sunday July 21 2019, @05:43PM (#869679)

      fuck these suited whores funding the enemies of humanity every chance they get. they could have paid some small company and they would have worked their asses off. instead these dumb, lazy whores just choose whoever is biggest and most closed. they deserve what they get.

    • (Score: 0) by Anonymous Coward on Monday July 22 2019, @02:34AM

      by Anonymous Coward on Monday July 22 2019, @02:34AM (#869810)

      [...] ... 35 years continuous industrial-grade experience and going on 4 years of unemployment.

      You should do what I did after long-term unemployment. Get a job that has the simple job description: "Do as you're told". *Cracks knuckles*

  • (Score: 3, Insightful) by Rosco P. Coltrane on Sunday July 21 2019, @07:24AM

    by Rosco P. Coltrane (4757) on Sunday July 21 2019, @07:24AM (#869568)

    The someone-else you pay to do stuff for you turns around and exploits your data for their own profit, and shares it with the crypto-fascist government agencies du jour, without telling you.

  • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @07:41AM (10 children)

    by Anonymous Coward on Sunday July 21 2019, @07:41AM (#869571)

    Why bother editing when you can put the whole piece in the "summary", right.

    • (Score: 3, Touché) by janrinok on Sunday July 21 2019, @09:16AM (9 children)

      by janrinok (52) Subscriber Badge on Sunday July 21 2019, @09:16AM (#869581) Journal
      This was a short article which would not benefit from any additional editing - but if you feel very strongly about it, please feel free to join the editorial team. Anyone volunteering to join us will be welcome.
      • (Score: 3, Touché) by aristarchus on Sunday July 21 2019, @10:01AM (5 children)

        by aristarchus (2645) on Sunday July 21 2019, @10:01AM (#869588) Journal

        Anyone volunteering to join us will be welcome.

        Well, almost anyone.

        • (Score: 2) by Chocolate on Sunday July 21 2019, @10:09AM (2 children)

          by Chocolate (8044) on Sunday July 21 2019, @10:09AM (#869589) Journal

          Create a sock puppet account just for editing?

          --
          Bit-choco-coin anyone?
          • (Score: 2) by janrinok on Sunday July 21 2019, @11:39AM (1 child)

            by janrinok (52) Subscriber Badge on Sunday July 21 2019, @11:39AM (#869599) Journal

            As long as the sock puppet achieved credibility on the site by generating the sort of submissions we are seeking, that it is contactable via email with a valid email address, and that the sock puppet passes the training module, then I don't care what 'name' you choose to operate under.

            You don't really believe that all of the other staff are using their true names, or that you are called 'Chocolate', do you?

            • (Score: 2) by Chocolate on Sunday July 21 2019, @10:55PM

              by Chocolate (8044) on Sunday July 21 2019, @10:55PM (#869757) Journal

              I respond to chocolate.
              All you have to do is wave a bar of it near me and you'll have 100% of my attention or so I have been told.

              --
              Bit-choco-coin anyone?
        • (Score: 3, Informative) by janrinok on Sunday July 21 2019, @11:34AM (1 child)

          by janrinok (52) Subscriber Badge on Sunday July 21 2019, @11:34AM (#869598) Journal

          You can volunteer. We will train you, which we do on the dev system - not the live one. Training can last between 3-4 days up to 2 weeks or more, depending on how much time you can dedicate to the site and trainer availability. You will have to edit in accordance with the rules and procedures that are in force, and which we have pointed out to you numerous times. You will have to limit yourself to carrying out the editorial role and not writing your own stories, all the time while complying with the access privileges given to you to carry out your task. And you will have to demonstrate that you can be trusted by the community not to abuse the access privileges that you would be given. Do that, and there shouldn't be a problem.

          But should you fail to do that or, worse still, you abuse your privileges then you can rest assured that you will not post on this site again. These are the same rules that are applied to every editor - you would not be being singled out because of your past history.

          • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @12:16PM

            by Anonymous Coward on Sunday July 21 2019, @12:16PM (#869608)

            GP is still dealing with the grief of knowing he'll never have a dick.

      • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @10:38AM (1 child)

        by Anonymous Coward on Sunday July 21 2019, @10:38AM (#869593)

        That's a rather very strong statement itself. It is obvious this site is ideologically biased for the multistate Saxon Empire. Any person who is antagonising that one would be surely unwelcome in editorial team.

        • (Score: 0) by Anonymous Coward on Monday July 22 2019, @02:36AM

          by Anonymous Coward on Monday July 22 2019, @02:36AM (#869811)

          ᚦᛖᛃ᛫ᚹᛁᛚᛚ᛫ᚾᛖᚡᛖᚱ᛫ᛞᛖᚠᛠᛏ᛫ᚢᛋ᛭

      • (Score: 1) by RandomFactor on Sunday July 21 2019, @01:39PM

        by RandomFactor (3682) Subscriber Badge on Sunday July 21 2019, @01:39PM (#869626) Journal

        Anyone volunteering to join us will be welcome.

        https://www.youtube.com/watch?v=4F4qzPbcFiA [youtube.com]

        --
        В «Правде» нет известий, в «Известиях» нет правды
  • (Score: 0) by Anonymous Coward on Sunday July 21 2019, @12:59PM

    by Anonymous Coward on Sunday July 21 2019, @12:59PM (#869621)

    i didn't have to go all the way to the cloud to experience shitty service.
    my isp (time of the adsl modem) was government owned and had a semblance of public utility only in name.
    the service was beyond miserable. being a monopoly in my area and government run gave my complaints zero leverage.
    a few years later (vdsl2 modem times with some gpon) another isp, 100 % publicly traded on the stock market, came along and things improved dramatically.
    now chaining trust together: guy installing cable, modem manufacturer, routing and peering infrastructure of isp and then adding even more like reliance on a "cloud provider" ... well somewhere "9"s will get lost.
    the government isp is still around, lol, removing "9"s consistently ... go figure.
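The "9"s getting lost along a chain can be put into numbers. This is a rough sketch only: the per-link availabilities below are invented, and multiplying them assumes the links fail independently, which real outages often do not.

```python
import math


def chained_availability(components):
    """Availability of a serial chain in which every link must be up.

    Assumes independent failures, so the result is just the product.
    """
    return math.prod(components.values())


# Hypothetical figures for each link between the user and the service.
chain = {
    "last-mile cable": 0.999,
    "modem": 0.999,
    "isp routing and peering": 0.9995,
    "cloud provider": 0.99999,
}

end_to_end = chained_availability(chain)
print(f"end to end: {end_to_end:.4%}")  # roughly 99.75%
```

Under these assumptions the chain can never do better than its weakest link, and the provider's five nines contribute almost nothing once they sit behind three-nines links.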

  • (Score: 2, Informative) by hwertz on Sunday July 21 2019, @04:35PM (1 child)

    by hwertz (8141) on Sunday July 21 2019, @04:35PM (#869662)

    I've never seen this... Every shared hosting or "cloud" provider I've seen, the 4 or 5 "9s" are for your instance. With that said... if the machine stays up, but what you set up on there keeps crashing out... well, then it's like you say, you have downtime but they don't.

    To maintain uptime, they do expect your services to be set up so they can be yanked off one machine and moved onto another (since an individual computer can of course fail.) This is complicated to do right, so many people don't do it right -- they put whatever into a cloud provider's systems, then act like it's the cloud provider's fault when migrating the stuff to another computer failed. This isn't like an old IBM mainframe where your processes are just seamlessly moved to another system -- they expect you to handle having your processes shut down (maybe not even cleanly), and then started up somewhere else, and then (hopefully) fed the same transaction info if it was in the middle of performing a transaction... you better not let that transaction go through twice! This is very complicated but considered the user's problem, not the cloud provider's, if you don't get it right.
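One common way to keep a replayed transaction from going through twice is an idempotency key. The sketch below is a toy: the `Ledger` class and the `txn-0001` key are hypothetical, and the in-memory dictionary stands in for what would have to be durable storage shared between machines and updated atomically with the balance.

```python
class Ledger:
    """Toy in-memory ledger that applies each transaction at most once."""

    def __init__(self):
        self.balance = 0
        # Maps idempotency key -> balance recorded right after that transaction.
        self._applied = {}

    def apply(self, key, amount):
        """Apply `amount` once per `key`; a replay returns the recorded result."""
        if key in self._applied:
            return self._applied[key]
        self.balance += amount
        self._applied[key] = self.balance
        return self.balance


ledger = Ledger()
ledger.apply("txn-0001", 100)
# The process is killed and restarted elsewhere, and the same
# transaction info is fed in again.
ledger.apply("txn-0001", 100)
print(ledger.balance)  # 100, not 200
```

The client has to send the same key on every retry of the same transaction; generating a fresh key per attempt defeats the whole scheme.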

    What IS the cloud provider's fault?

    In my view,
    1) it'd be way nicer if things could be moved more seamlessly. The current PC technology does finally allow for replicating what IBM did in the 1980s on those dinosaurs, just moving the whole VM over, but I don't know if any cloud provider does. Honestly some of the existing setups are simply overcomplicated.

    2) If you look at some of these storage and database type services, they'll have ones where the service level agreement might promise 99.999% but no limit on how long the service takes to respond (realistically, some of these services will "usually" take 1/10th of a second or whatever but sometimes take 20 or 30 seconds PER TRANSACTION instead, and apparently on a daily basis, not some "whoops we almost crashed our cloud this one time" basis.) And in some cases it's considered "not a failure" to have it FAIL 2 or 3 times in a row as long as it works on the 4th attempt. That is simply crap.
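To show how much the counting rule matters, here is a small sketch that scores the same calls three ways. The `sla_report` function, the one-second latency budget, and the sample data are assumptions for illustration, not terms from any real service level agreement.

```python
def sla_report(calls, max_attempts=4, latency_budget_s=1.0):
    """Score calls under three different definitions of "it worked".

    `calls` is a list of calls; each call is a list of
    (succeeded, seconds) attempts in the order they were made.
    """
    n = len(calls)
    # Lenient rule: any success within `max_attempts` counts.
    eventual = sum(
        any(ok for ok, _ in attempts[:max_attempts]) for attempts in calls
    )
    # Stricter rule: only the first attempt counts.
    first_try = sum(attempts[0][0] for attempts in calls)
    # Strictest rule: first attempt must succeed within the latency budget.
    within_budget = sum(
        attempts[0][0] and attempts[0][1] <= latency_budget_s
        for attempts in calls
    )
    return {
        "eventual_success": eventual / n,
        "first_attempt_success": first_try / n,
        "first_attempt_within_budget": within_budget / n,
    }


# Hypothetical sample: 7 fast successes, 1 success that took 25 seconds,
# 1 call that failed three times before working, 1 that failed once.
calls = [[(True, 0.1)] for _ in range(7)]
calls.append([(True, 25.0)])
calls.append([(False, 0.1), (False, 0.1), (False, 0.1), (True, 0.1)])
calls.append([(False, 0.1), (True, 0.1)])

report = sla_report(calls)
print(report)
```

With this sample the lenient rule reports 100% while the strictest reports 70%, from exactly the same traffic.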

    • (Score: 0) by Anonymous Coward on Monday July 22 2019, @12:00AM

      by Anonymous Coward on Monday July 22 2019, @12:00AM (#869771)

      5 9's is a marketing term.

      You can have a service that has hundreds of moving parts (as we called them when I did this). 1 of them is down. Is that a down situation? Does that count against the 9's? It is a method call that one guy uses once a month, and he did not call it during that time.

      What you really want to know is what sort of transaction rates they have. What is their turn around time per method. What is their failure rate per method. etc etc etc.

      What I said above is for *that* particular type of service (SaaS). Oh, and more than likely they do not know. Then they point at the other type of uptime as *the* metric.

      If you are just chucking your boxes out in the 'cloud' you have a whole different set of reqs. Like box uptime, network latency, storage latency. Basically the same sort of thing but everything in the stat counters in your favorite OS. Which can be overwhelming for anyone.

      Then add to that your own services. Are the h1b guys you and they hired any good? I have worked with some wicked smart ones. Then some I wonder how they manage to velcro their shoes in the morning.

      Then on top of those what are their maintenance windows (they all have them). What is the effect on you during those times. That can look anywhere from reduced service to not working at all to no impact. That can depend on what infrastructure is being upgraded. It *will* vary.

      Reliability is not something you can usually farm out. You build it.

      Hell the place I work has its own 'cloud'. Just last week they upgraded all of our machines. Both production and failover. They broke them all. Lucky it was an easy fix. But what is your bring up from scratch procedure like? You farm it out you take on their oddities and their downtimes. They sell you 99.999 but the reality is more like 95% on just about everything. Just like when you owned the hardware. You build reliability yourself and expect the whole thing to disappear on you at any moment.
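For scale, the gap between 99.999% and 95% is plain arithmetic on a 365-day year. The helper name below is made up; the numbers follow directly from the percentages.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_per_year(availability):
    """Seconds of allowed downtime per 365-day year at this availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


five_nines = downtime_per_year(0.99999)  # about 315 seconds, a bit over 5 minutes
ninety_five = downtime_per_year(0.95)    # about 18.25 days

print(f"99.999%: {five_nines / 60:.2f} minutes per year")
print(f"95%:     {ninety_five / 86400:.2f} days per year")
```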

      just moving the whole VM over
      vSphere, which many of them are built on, has had that for years. It works OK. But it is *not* seamless. There will be an interruption of service at some point. The big ones have their own way of doing it.

  • (Score: 2) by bradley13 on Sunday July 21 2019, @05:07PM

    by bradley13 (3053) on Sunday July 21 2019, @05:07PM (#869668) Homepage Journal

    Yep, had this today, with my ISP. Internet goes down. Check the network status page - everything is green. Call. "OH, yes, it's only one switch, so we don't count it as an outage."

    Gee, thanks for that...

    --
    Everyone is somebody else's weirdo.