
SoylentNews is people

posted by takyon on Tuesday February 28 2017, @10:58PM
from the mission-critical-infrastructure dept.

From The Verge:

Amazon's web hosting services are among the most widely used out there, which means that when Amazon's servers go down, a lot of things go down with them. That appears to be happening today, with Amazon reporting "high error rates" in one region of its S3 web services, and a number of services going offline because of it.

Trello, Quora, IFTTT, and Splitwise all appear to be offline, as are websites built with the site-creation service Wix; GroupMe seems to be unable to load assets (The Verge's own image system, which relies on Amazon, is also down); and Alexa is struggling to stay online, too. Nest's app was unable to connect to thermostats and other devices for a period of time as well.

Isitdownrightnow.com also appears to be down as a result of the outage.

Amazon has suffered brief outages before that have knocked offline services including Instagram, Vine, and IMDb. There don't appear to be any truly huge names impacted by this outage so far, but as always, its effects are widespread due to just how many services — especially smaller ones — rely on Amazon.

There's no estimate on when service will be restored, but Amazon says it is "actively working on remediating the issue."

PS - BTW - thumbs up to our great behind the scenes guys! Good luck N.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Interesting) by aclarke on Wednesday March 01 2017, @01:52AM (7 children)

    by aclarke (2049) on Wednesday March 01 2017, @01:52AM (#473170) Homepage

    Sure OK, but have you factored in the cost of these companies doing it all themselves, the expertise they'd have to hire, and the downtime they'd experience with their own servers compared to AWS? It's not like overworked devops folks can run a farm of their own servers, in addition to everything else, for 99.99% uptime.

    Our uptime has improved since moving to AWS. I'm a reasonably confident sysadmin (it's not my full-time job) and I inherited a project without the budget or resources to manage the servers. AWS also reduced our hosting costs, so I'd call that a win/win. Oh yeah, and better security, although I won't go into why. I don't run on us-east-1 so I thankfully wasn't hit by today's outage. We are stuck on one region though because even though it's "the cloud", it turns out it still costs more to set up geographical redundancy. Plus more to run the servers. Plus the complexity of managing it. I'd love to do that but I just don't have the budget, which means that if our region goes down, so do we, and our clients are just going to have to be OK with that.

    We do at least have a warm backup in another region, so if things go really really wrong at least we can be back up in probably 2 hours.

  • (Score: 2, Insightful) by Anonymous Coward on Wednesday March 01 2017, @02:04AM (1 child)

    by Anonymous Coward on Wednesday March 01 2017, @02:04AM (#473173)

    > we can be back up in probably 2 hours.
    Do yourself a *HUGE* favor. Test it.

    We thought we were 5 seconds. Turns out we were 12 hours. We got it down to about 30 seconds. It did however take a lot of testing to get right.

    • (Score: 2) by aclarke on Wednesday March 01 2017, @01:41PM

      by aclarke (2049) on Wednesday March 01 2017, @01:41PM (#473298) Homepage

      When I tested it, it took about 30 minutes. So I tell people two hours to give myself some time to pat myself on the back, or possibly freak out, depending on how things go.

  • (Score: 0) by Anonymous Coward on Wednesday March 01 2017, @09:02AM (2 children)

    by Anonymous Coward on Wednesday March 01 2017, @09:02AM (#473246)

    > Sure OK, but have you factored in the cost of these companies doing it all themselves, the expertise they'd have to hire, and the downtime they'd experience with their own servers compared to AWS?

    Yes. Usually still two times cheaper than AWS.

    • (Score: 0) by Anonymous Coward on Wednesday March 01 2017, @11:39AM

      by Anonymous Coward on Wednesday March 01 2017, @11:39AM (#473264)

      > Usually still two times cheaper than AWS.

      I'm... not sure how to parse that. Do you mean it costs half as much? (Not a diss, honest question.)

      And the fortune rubs it in :)

      "Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

    • (Score: 2) by aclarke on Wednesday March 01 2017, @01:40PM

      by aclarke (2049) on Wednesday March 01 2017, @01:40PM (#473297) Homepage

      We saved about 70% of our hosting costs by moving from a data centre and Google App Engine to AWS. It was a few years ago, but AWS was about half the cost of our data centre, and we got newer/faster hardware and more redundancy along with it.

  • (Score: 2) by TheRaven on Wednesday March 01 2017, @12:23PM

    by TheRaven (270) on Wednesday March 01 2017, @12:23PM (#473276) Journal
    The problem is correlated failure. This was why Katrina caused a load of insurance companies to go out of business (and not pay out as a result), whereas the hurricane the next year did more damage but didn't kill any insurance companies. It's easy to understand the risk of service X going down and if services X, Y, and Z are independent then you can handle them as independent risks. If they all depend on (or, worse, provide some portion of) some common infrastructure then the probability of failure might be lower, but the probability of independent failure is zero: if one goes down then they all will. You see the same thing on a smaller scale when a small company gets two ISPs to provide connections so that they have high reliability Internet access, only to discover that they use the same back-haul provider and all of the downtime comes from there, so if one goes down the other one will as well.
    --
    sudo mod me up
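
TheRaven's point about correlated failure can be put in rough numbers. The following is a minimal sketch, not a real reliability model: the function names and the probabilities are illustrative assumptions, and it assumes each service fails on its own with a fixed probability and that the shared dependency (the common back-haul, or a common cloud region) fails independently of the services themselves.

```python
def all_down_independent(p_each: float, n: int) -> float:
    """Probability that n fully independent services are all down at once."""
    return p_each ** n


def all_down_shared(p_own: float, p_shared: float, n: int) -> float:
    """Probability that n services are all down when they share one dependency.

    Either the shared dependency fails (taking everything with it), or it
    stays up and every service happens to fail on its own.
    """
    return p_shared + (1.0 - p_shared) * p_own ** n


if __name__ == "__main__":
    # Hypothetical numbers: two ISPs, each down 1% of the time.
    independent = all_down_independent(0.01, 2)

    # Same 1% total per ISP, but most of it comes from a shared back-haul:
    # 0.9% shared, roughly 0.1% of each ISP's own making.
    shared = all_down_shared(0.001, 0.009, 2)

    print(f"both down, independent ISPs: {independent:.6f}")
    print(f"both down, shared back-haul: {shared:.6f}")
```

With independent providers the second link multiplies the risk down; with a shared dependency the combined outage probability can never drop below that of the shared piece, no matter how many providers are added.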
  • (Score: 2) by tangomargarine on Wednesday March 01 2017, @04:20PM

    by tangomargarine (667) on Wednesday March 01 2017, @04:20PM (#473358)

    > the expertise they'd have to hire

    Hey, sounds like more jobs! :)

    --
    "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"