
SoylentNews is people

posted by cmn32480 on Friday May 06 2016, @10:29AM
from the never-underestimate-the-bandwidth... dept.

There is a story at ACM (Association for Computing Machinery) Queue on the best practice for getting data from here to there: Should You Upload or Ship Big Data to the Cloud? -- The accepted wisdom does not always hold true.

There is an old adage to never underestimate the bandwidth of a station wagon full of magnetic tapes going down the highway. The challenge is knowing when it is better (cheaper/faster) to ship it vs send it. This analysis will help you to decide.

This article investigates the tradeoffs between speed of communications between your systems and the cloud provider, how fast you can store your data to removable media/drives, and shipping delays so as to help you decide the fastest way to get your data from your systems to a cloud storage provider.

I found the article to be generally in-depth and well done but I do have a couple of caveats.

First, I saw no analysis of the impact of faster shipping/turnaround on the tradeoffs. The assumption is that it would take 48 hours for shipping and handling for physical media. I would have liked to see what impact it would have on the analysis if that were, say, 24 hours instead.

Also, an assumption is made that the cloud provider would need to copy from your media to put it on their systems. A variation I did not see explored was to have media that could be directly mounted at the cloud provider, whether the media was supplied in advance by the provider or met certain provider-required specs. In either case, that would avoid the need for another copying pass of the data. That, in turn, might greatly change the analysis of whether it would be faster to ship media or just upload it over the internet.
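Both caveats can be explored with a little arithmetic. The sketch below is not the article's model; it is a minimal illustration, assuming a hypothetical 100 Mbit/s uplink and media that copy at a hypothetical 100 MB/s, of how the break-even data size moves when shipping drops from 48 to 24 hours and when the provider-side copy pass is skipped (`copy_passes=1`). All function and parameter names are made up for this example.

```python
def upload_hours(size_tb, link_mbps):
    """Hours to upload size_tb terabytes over a link of link_mbps megabits/s."""
    bits = size_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600


def ship_hours(size_tb, media_mb_s, transit_hours, copy_passes=2):
    """Hours to copy to media, ship it, and copy it again at the provider.

    copy_passes=1 models media the provider can mount directly.
    """
    one_pass_hours = size_tb * 1e12 / (media_mb_s * 1e6) / 3600
    return copy_passes * one_pass_hours + transit_hours


def break_even_tb(link_mbps, media_mb_s, transit_hours, copy_passes=2):
    """Data size (TB) above which shipping beats uploading, or None if never."""
    upload_s_per_tb = 8e6 / link_mbps              # seconds to upload 1 TB
    copy_s_per_tb = copy_passes * 1e6 / media_mb_s  # seconds to copy 1 TB
    saved_s_per_tb = upload_s_per_tb - copy_s_per_tb
    if saved_s_per_tb <= 0:
        return None  # copying is slower than the link, so shipping never wins
    return transit_hours * 3600 / saved_s_per_tb


# Hypothetical figures: 100 Mbit/s uplink, 100 MB/s media.
for transit, passes in [(48, 2), (24, 2), (48, 1), (24, 1)]:
    size = break_even_tb(100, 100, transit, passes)
    print(f"transit={transit}h passes={passes}: ship above {size:.2f} TB")
```

Under these assumed numbers, halving the shipping time halves the break-even size, while dropping the second copy pass lowers it more modestly. With slower media or a faster link the balance shifts the other way.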

Those quibbles aside, it is one of the better articles I've seen that investigates the actual tradeoffs.


Ed Note: Obligatory xkcd and another

Original Submission

Related Stories

Google Cloud Introduces New Long-Term/Cold Storage Tier 11 comments

Google debuts new Cloud Storage archive class for long-term data retention

Today at its annual Cloud Next conference in San Francisco, [Google] announced new storage tools, pricing, and products for customers of all sizes.

First on the agenda was a new archive class designed for long-term data retention that eliminates the need for a separate retrieval process, Google says, while providing "immediate" and low-latency access to content. Both access and management are performed via a familiar set of Google Cloud Storage APIs through which objects can be tiered down to save on costs, and data is redundantly stored geo-redundantly across multi-regional availability zones.

Pricing will start at $0.0012 per GB per month ($1.23 per TB per month) when it launches later this year. That's significantly cheaper than Microsoft's Azure Cool Blob Storage, which costs $0.002 per GB per month, and competitive with Amazon S3 Glacier, which is priced at $0.004 per GB per month.
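The per-TB figure can be checked with simple arithmetic. The sketch below assumes 1 TB = 1024 GB, which is the conversion that reproduces the quoted $1.23; the per-GB prices are the ones stated in the summary above, and the dictionary keys are just labels chosen for this example.

```python
GB_PER_TB = 1024  # assumption: binary terabyte, which matches the quoted $1.23

# Monthly price per GB as quoted in the summary above.
price_per_gb = {
    "Google archive class": 0.0012,
    "Azure Cool Blob Storage": 0.002,
    "Amazon S3 Glacier": 0.004,
}


def per_tb(per_gb):
    """Convert a per-GB monthly price to a per-TB monthly price."""
    return per_gb * GB_PER_TB


for name, per_gb in price_per_gb.items():
    print(f"{name}: ${per_tb(per_gb):.2f} per TB per month")
```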

Related: Should you Upload or Ship Big Data to the Cloud? -- The Accepted Wisdom does not Always Hold True
Google Cloud to Add Five New Regions With Three New Undersea Cables to Support It
Microsoft Exec Says Amazon's Expansion is an Opportunity as Amazon Hits $1 Trillion


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Funny) by Anonymous Coward on Friday May 06 2016, @10:47AM

    by Anonymous Coward on Friday May 06 2016, @10:47AM (#342499)

    Forget the station wagon example. Carrier pigeons sound a lot more far out, until you start thinking about the actual bandwidth.

    Hint: Micro-SD.

    The ping is crap, but the bandwidth is pretty good.

    • (Score: 4, Insightful) by pendorbound on Friday May 06 2016, @01:20PM

      by pendorbound (2688) on Friday May 06 2016, @01:20PM (#342541) Homepage

      Ahh... Good old RFC 1149 [ietf.org]. That's one case I don't mind a ping missing me.

      • (Score: 2) by Scruffy Beard 2 on Friday May 06 2016, @04:10PM

        by Scruffy Beard 2 (6030) on Friday May 06 2016, @04:10PM (#342590)

        That is low bandwidth, high latency.

        Technically SD cards or USB keys [bbc.co.uk] violate RFC 1149 by not using blackstuff on whitestuff to encode the message. However, the use of flash memory allows you to convert a low bandwidth, high latency protocol into a high bandwidth one.

        A few years ago I ran the numbers on cost. Actually doing this makes a stupid amount of sense. Though I concluded it would be simpler to just use the Postal service.

        There is also the problem where many USB sticks do not actually have great data transfer rates (on the order of 10MB/s, which could be matched by 100Mbit Internet).
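To put rough numbers on that last point: 100 Mbit/s is 12.5 MB/s, so a stick that only reads and writes at about 10 MB/s is slower than the link before it has even left the building. The sketch below is an illustration only; the 64 GB capacity and one-hour trip are hypothetical values, and the function names are made up for this example.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


def sneakernet_mb_per_s(capacity_gb, write_mb_s, read_mb_s, travel_s):
    """Effective throughput of carrying one storage device from A to B.

    Counts the time to fill the device, move it, and read it back out.
    """
    size_mb = capacity_gb * 1000
    total_s = size_mb / write_mb_s + travel_s + size_mb / read_mb_s
    return size_mb / total_s


link = mbit_to_mbyte(100)                          # 100 Mbit/s Internet
stick = sneakernet_mb_per_s(64, 10, 10, 3600)      # hypothetical 64 GB stick
print(f"link:  {link:.1f} MB/s")
print(f"stick: {stick:.1f} MB/s effective")
```

With these assumed figures the stick manages roughly 3.9 MB/s end to end, so the courier (feathered or otherwise) only wins when the media is much faster or many devices travel in parallel.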

  • (Score: 0) by Anonymous Coward on Friday May 06 2016, @12:42PM

    by Anonymous Coward on Friday May 06 2016, @12:42PM (#342526)

    Ed Note: Obligatory xkcd and another

    is this revenge for three people posting different obligatory xkcd-s under previous article? :P

    also, i feel that linking straight to explainxkcd is being too easy on people...

    • (Score: 0) by Anonymous Coward on Friday May 06 2016, @01:34PM

      by Anonymous Coward on Friday May 06 2016, @01:34PM (#342549)

      there's actually a site dedicated to explaining the comics? wow the internet is not the same internet that I started with...

    • (Score: 2) by cmn32480 on Saturday May 07 2016, @12:38AM

      by cmn32480 (443) <{cmn32480} {at} {gmail.com}> on Saturday May 07 2016, @12:38AM (#342756) Journal

      I wish I could say it was on purpose. When I read the story it was the first thing I thought of.

      --
      "It's a dog eat dog world, and I'm wearing Milkbone underwear" - Norm Peterson
  • (Score: 5, Insightful) by jimshatt on Friday May 06 2016, @01:08PM

    by jimshatt (978) on Friday May 06 2016, @01:08PM (#342537) Journal

    Should you Upload or Ship Big Data to the Cloud?

    I'd say "no" :)

    • (Score: 3, Interesting) by pendorbound on Friday May 06 2016, @01:26PM

      by pendorbound (2688) on Friday May 06 2016, @01:26PM (#342545) Homepage

      I get the Luddite knee-jerk reaction, but be objective for a minute...

      Given a goal of off-site, redundant location backups. You don't need immediate fail-over or HA. We're talking disaster recovery. A meteor hit your DC (and an earthquake swallowed up your secondary, and a tsunami flooded your tertiary...). You just need your data back, and other delays are going to be longer than downloading or shipping spinning rust to you.

      What are the downsides to doing client side encryption, client side key management, and shipping or uploading the entropy to someone else to store in two or three places? It's tough to beat on price, impossible to beat on administrative costs when all you need is dumb storage without any compute or OS on top of it.

      Moving your entire IT into the cloud? Bad idea... Utilizing the cloud for things that it's actually good, economical, and safe at doing? What's the downside?

      • (Score: 2) by mcgrew on Friday May 06 2016, @05:16PM

        by mcgrew (701) <publish@mcgrewbooks.com> on Friday May 06 2016, @05:16PM (#342625) Homepage Journal

        I personally won't trust some nameless suit with MY data. And if there's a meteor strike, a tsunami, and an earthquake, my data's the last thing I'll be worrying about. If the data is in the cloud, the data's host could be having a bad fire while the earthquake, meteor, and tsunami happen.

        --
        mcgrewbooks.com mcgrew.info nooze.org
      • (Score: 2) by TheRaven on Saturday May 07 2016, @10:21AM

        by TheRaven (270) on Saturday May 07 2016, @10:21AM (#342853) Journal
        If your goal is off-site backup that you can restore from easily, but with a fairly high latency, then most banks and solicitors have some secure storage where they will happily put your tapes (or optical media) for a fairly low price. They won't break the seals on your envelopes and will not datamine your confidential data. No clouds required.
        --
        sudo mod me up
  • (Score: 3, Insightful) by bitstream on Friday May 06 2016, @01:16PM

    by bitstream (6144) on Friday May 06 2016, @01:16PM (#342538) Journal

    Why even send it to a cloud at all?

    • (Score: 0) by Anonymous Coward on Friday May 06 2016, @02:05PM

      by Anonymous Coward on Friday May 06 2016, @02:05PM (#342564)

      Because.. Cloud! Do we really need to say more?
      ~Microsoft

    • (Score: 2) by frojack on Friday May 06 2016, @04:14PM

      by frojack (1554) on Friday May 06 2016, @04:14PM (#342594) Journal

      Flood
      Fire
      Earthquake
      Tsunami
      Vandalism
      Disgruntled
      Multi-site Access Need
      Insurance Requirement
      CYA

      Worried about spies? Encrypt it before you send it.

       

      --
      No, you are mistaken. I've always had this sig.
      • (Score: 2) by stormreaver on Friday May 06 2016, @05:15PM

        by stormreaver (5101) on Friday May 06 2016, @05:15PM (#342624)

        Flood
        Fire
        Earthquake
        Tsunami
        Vandalism
        Disgruntled

        You seem to be under the impression that Amazon or Microsoft's servers are somehow immune to these. Both services have outages and data loss just as much as, if not more than, a well-run IT department.

        • (Score: 2) by frojack on Friday May 06 2016, @07:43PM

          by frojack (1554) on Friday May 06 2016, @07:43PM (#342688) Journal

          Actually NO, they don't have data loss. Even when they do have strictly local outages, your data is usually available.

          But the point (and I'm sure you know this) is that the cloud site is not in your building. Therefore the cloud is not going to burn down when your small business goes up in flames.

          Offsite backup should not be a new concept for you.

          --
          No, you are mistaken. I've always had this sig.
          • (Score: 3, Interesting) by butthurt on Friday May 06 2016, @08:37PM

            by butthurt (6141) on Friday May 06 2016, @08:37PM (#342698) Journal

            > ... NO, they don't have data loss ...

            Microsoft data loss:
            http://www.theinquirer.net/inquirer/news/1558214/danger-backups [theinquirer.net]

            Amazon data loss:
            http://www.businessinsider.com/amazon-lost-data-2011-4 [businessinsider.com]

            • (Score: 2) by frojack on Saturday May 07 2016, @05:18PM

              by frojack (1554) on Saturday May 07 2016, @05:18PM (#342939) Journal

              Microsoft was 2009, before they even launched their Azure cloud service.

              Amazon's was 2011 but only affected customers that did not pay up for redundant data center storage (Multiple Availability Zone in Amazon parlance).

              Ultimately, 0.07% of the volumes in the affected Availability Zone could not be restored for customers. However, to lose data, those customers would have also had to choose to disable automatic backups (another extra cost item), which is on by default. They would have still lost some transactions if they had left this on, but they shot themselves in the foot by turning it off.

              So if all you can point to is two losses over these years I'd say that is effectively zero loss rate.

              But bear in mind, these were operational instances. Not data backup sites, but rather live database systems and web services interacting with the public or simultaneous access from multiple locations. Not exactly what is being discussed in this story, which is largely concerned with backup.

              --
              No, you are mistaken. I've always had this sig.
              • (Score: 2) by butthurt on Saturday May 07 2016, @09:41PM

                by butthurt (6141) on Saturday May 07 2016, @09:41PM (#343011) Journal

                > Microsoft was 2009, before they even launched their Azure cloud service,

                Are you implying that they've learned a lot since 2009? Microsoft was founded circa 1975. They are said to have spent $500 million for Danger, yet they didn't make a backup of its customers' data. In 2016, this (if I've found the pertinent legal agreement among the many they set forth) is how they value their Azure customers' data:

                The aggregate liability of each party for all claims under this agreement is limited to direct damages up to the amount paid under this agreement for the Online Service during the 12 months before the cause of action arose; provided, that in no event will a party's aggregate liability for any Online Service exceed the amount paid for that Online Service during the Subscription.
                [...]
                Neither party will be liable for loss of revenue or indirect, special, incidental, consequential, punitive, or exemplary damages, or damages for lost profits, revenues, business interruption, or loss of business information, even if the party knew they were possible or reasonably foreseeable.

                https://azure.microsoft.com/en-us/support/legal/subscription-agreement/ [microsoft.com]

                A story from 2014 tells about a company that used Azure. They had data corruption and it's unclear whether the fault was theirs or Microsoft's. They contacted Microsoft and learned they were out of luck. Microsoft didn't want to talk to the press:

                "Within minutes of discovering the problem, we contacted Microsoft Azure support. Unfortunately, Microsoft was unable to recover these data... from its servers," the Dedoose email to customers said. Microsoft officials couldn't be reached for comment.

                http://www.informationweek.com/cloud/cloud-storage/social-science-site-using-azure-loses-data/d/d-id/1252716 [informationweek.com]

                > So if all you can point to is two losses over these years I'd say that is effectively zero loss rate.

                After two I had stopped looking, because I thought that enough to refute your original statement.

                > But bear in mind, these were operational instances. Not data backup sites, but rather live database systems and web services interacting with the public or simultaneous access from multiple locations.

                According to Amazon, their April 2011 problem "primarily involved" one of their storage services:

                The issues affecting EC2 customers last week primarily involved a subset of the Amazon Elastic Block Store (“EBS”) volumes in a single Availability Zone within the US East Region that became unable to service read and write operations.

                http://aws.amazon.com/message/65648/ [amazon.com]

                If mailing hard drives back and forth is under consideration, as contemplated in the article, I suppose that a few days' outage will be acceptable.

                A little more looking turns up a problem with their S3 offering [datacenterknowledge.com] but that was in 2008 and I'm sure they've learned a great deal since then.

                With a little more looking I found a Register article [theregister.co.uk] about the state of cloud storage in 2013. It mentions some outages in Amazon's (ECS, not S3) and Microsoft's systems:

                Microsoft's SkyDrive went down for a while in August, along with Outlook and the SQL service in Azure. Hotmail and Messenger were also affected. Bezos blushed that same month as Amazon's cloud also went down with its Elastic Block Store fingered as the culprit. This was its third such outage in two years.

                Those previous Skydrive outages were covered [neowin.net] by Neowin [neowin.net].

                Of course, outages don't necessarily mean data loss, and 2013 was a few years ago. On http://www.isitdownrightnow.com/onedrive.live.com.html [isitdownrightnow.com] there are comments about Onedrive (formerly Skydrive) outages as recent as this January. Perhaps they're fake. Similarly, forum posts [ycombinator.com] describe unavailability of S3 in July 2015.

                If you wanted to back up your data to the cloud, but the cloud was temporarily offline, so you couldn't do your backing-up when you wanted to, then your own storage got stolen/burned/flooded, the cloud service provider didn't lose your data, but you're still out of luck.

              • (Score: 2) by butthurt on Saturday May 07 2016, @09:45PM

                by butthurt (6141) on Saturday May 07 2016, @09:45PM (#343013) Journal

                correction:

                On isitdownrightnow.com [isitdownrightnow.com] there are comments about Onedrive (formerly Skydrive) outages as recent as this January.

                • (Score: 2) by frojack on Sunday May 08 2016, @01:42AM

                  by frojack (1554) on Sunday May 08 2016, @01:42AM (#343065) Journal

                  Outages are not data losses.

                  Let's not let the goal posts start creeping away here.

                  --
                  No, you are mistaken. I've always had this sig.
                  • (Score: 2) by butthurt on Sunday May 08 2016, @02:02AM

                    by butthurt (6141) on Sunday May 08 2016, @02:02AM (#343068) Journal

                    > Outages are not data losses.

                    I've pointed out that, for someone relying on a cloud for backups, the service's unavailability can lead to loss of data, if the outage occurred at an inopportune time.

                    > Let's not let the goal posts start creeping away here.

                    You had written that "they don't have data loss" and I provided three examples where Amazon and Microsoft did lose data. Perhaps you stand by your original statement? Cheers then.

      • (Score: 2) by bitstream on Friday May 06 2016, @07:16PM

        by bitstream (6144) on Friday May 06 2016, @07:16PM (#342674) Journal

        You can easily dodge all but the explicit insurance demand. What's needed is secured off site storage. Even a rotten shed in the middle of nowhere will do.

        Leaving the data with a provider that will try to hack your crypto protection all the time, and only needs to be right once, while also using anything you do against you, just doesn't seem like a good idea.

        • (Score: 2) by dime on Friday May 06 2016, @09:56PM

          by dime (1163) on Friday May 06 2016, @09:56PM (#342714)

          Which provider do you think is trying to hack your crypto protection? Oh. All of them?

          How many customers does each cloud provider have? Like 2? 3? 5? Oh. Millions?

          Imagine if you and I both have several copies of all our personal files, projects, photo albums, and contract work archives encrypted with different keys and uploaded at multiple cloud backups. How many years would it take for the "cloud" to hack our shiz? Oh. Aeons?

          So every cloud provider is trying to hack every single person's crypto because they are the enemies of... math? And logic?

          disclaimer: I don't store data in the cloud and I don't have a horse in this race, but holy shit is your horse shiny as fuck from all that tinfoil.

          • (Score: 2) by bitstream on Saturday May 07 2016, @12:07AM

            by bitstream (6144) on Saturday May 07 2016, @12:07AM (#342749) Journal

            When your data is in the cloud, whenever the powers that be find you interesting, they have quick access to a lot of the data and especially metadata and traffic patterns. And how many resources the alphabet organizations have is not that widely understood. And should your keys leak for any reason, you will have an instant data breach. If the data is unavailable, you can secure it should the key(s) be duplicated for any reason.

            It's all about attack surface, not being "after you".

    • (Score: 2) by Tork on Saturday May 07 2016, @04:04AM

      by Tork (3914) Subscriber Badge on Saturday May 07 2016, @04:04AM (#342803)
      Because there is a lot of data out there whose availability outweighs its need for secrecy. For example: One of the cloud-services I use has a backup of several scripts I have written for my job. They're neither secret nor proprietary, but they make me more productive. I don't care if someone on the other end sees them but I do care that I can reach those files anywhere in the world.

      There are plenty of reasons to use the cloud. It's a question of "best-tool-for-the-job", not one of slashdoltian absolutism.
      --
      🏳️‍🌈 Proud Ally 🏳️‍🌈
      • (Score: 2) by bitstream on Saturday May 07 2016, @10:34AM

        by bitstream (6144) on Saturday May 07 2016, @10:34AM (#342862) Journal

        Make your own cloud?

        • (Score: 0) by Anonymous Coward on Saturday May 07 2016, @03:14PM

          by Anonymous Coward on Saturday May 07 2016, @03:14PM (#342908)

          Lousy up-stream?

        • (Score: 2) by Tork on Saturday May 07 2016, @08:06PM

          by Tork (3914) Subscriber Badge on Saturday May 07 2016, @08:06PM (#342983)

          Significantly more expensive, not as reliable, fewer useful features including smartphone integration and sharing of files. Then there's speed, maintenance of backup/redundant systems. In the end it's not very competitive until I start getting into secure data.

          --
          🏳️‍🌈 Proud Ally 🏳️‍🌈
  • (Score: 3, Interesting) by Gravis on Friday May 06 2016, @01:24PM

    by Gravis (4596) on Friday May 06 2016, @01:24PM (#342543)

    bandwidth limitation of transferring the data to portable media may exceed the bandwidth limitation of the network. ಠ_ಠ

  • (Score: 0) by Anonymous Coward on Friday May 06 2016, @01:37PM

    by Anonymous Coward on Friday May 06 2016, @01:37PM (#342551)

    Cloud? Why are you putting your private data on someone else's server?

    • (Score: 0) by Anonymous Coward on Friday May 06 2016, @04:03PM

      by Anonymous Coward on Friday May 06 2016, @04:03PM (#342585)

      Meh, I just encrypt it with a GnuPG key and let Freenet or Pirate Bay mirror it. Make sure to name it something like "insurance.gpg"

    • (Score: 2) by PizzaRollPlinkett on Friday May 06 2016, @04:06PM

      by PizzaRollPlinkett (4512) on Friday May 06 2016, @04:06PM (#342588)

      Why? Because a manager told you to. Any other questions?

      --
      (E-mail me if you want a pizza roll!)
  • (Score: 0) by Anonymous Coward on Friday May 06 2016, @04:07PM

    by Anonymous Coward on Friday May 06 2016, @04:07PM (#342589)

    i do hope the cloud will rain down the latest tales of dwarfs, fancy princess and dragons "before winter comes" ...

  • (Score: 3, Interesting) by frojack on Friday May 06 2016, @04:37PM

    by frojack (1554) on Friday May 06 2016, @04:37PM (#342612) Journal

    Who does that anymore?

    Let's say you have whatever is used for off-line storage in your Data Center. Who's to say your Cloud Provider can accept your obsolete-before-installed media? What about when you upgrade? Or they upgrade?

    The Cloud operators will sooner or later upload your data to someone else's account, lose your media, destroy your media, ship your media back -- to your competitor. You will have to encrypt before you ship to guard against that.

    The idea that you want a bunch of bored low paid "sysops" slapping data modules into drives (without a care in the world), following some script written in a book (but which eventually gets copied to a post-it-note glued to the front of your rack-mount), just scares the bejesus out of me.

    Before the huge emphasis on confidential data, I remember laughing off receiving a tape full of fishing licenses when I was expecting a tape full of Public Health Laboratory results in the interoffice mail. The Fish and Game guy called me. Bad jokes ensued. He caught it just before it was mailed to US Fish and Wildlife in some other state.

    --
    No, you are mistaken. I've always had this sig.
    • (Score: 1, Insightful) by Anonymous Coward on Friday May 06 2016, @07:15PM

      by Anonymous Coward on Friday May 06 2016, @07:15PM (#342673)

      You have 80TB of data that needs to be in another offsite database yesterday because your primary is about to eat it. You have a 200mb connection. Do you ship it, or wait and hope it all transfers cleanly? Oh, and meanwhile that over-the-net transfer is hurting your real business because it is consuming a non-insignificant amount of the bandwidth.

      Also take into account sync time. So you just shipped out a bunch of data. You are storing a few more million a day. How long will it take to bring the two copies back into sync?
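The scenario above can be put into numbers. The sketch below assumes "200mb" means 200 Mbit/s, that the link can be dedicated fully or partly to the transfer, and, for the sync question, a hypothetical 0.5 TB/day of new data and a hypothetical 3-day shipment; none of these values come from the comment, and the function names are made up for this example.

```python
def transfer_days(size_tb, link_mbps, link_share=1.0):
    """Days to move size_tb terabytes using link_share of a link_mbps link."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * link_share)
    return seconds / 86400


def link_tb_per_day(link_mbps):
    """Terabytes the link can carry in one day at full rate."""
    return link_mbps * 1e6 / 8 * 86400 / 1e12


def catch_up_days(ship_days, growth_tb_per_day, link_mbps):
    """Days of syncing needed after a shipment to close the gap, or None.

    Data keeps arriving while the backlog is being synced, so only the
    spare link capacity (capacity minus growth) reduces the backlog.
    """
    spare = link_tb_per_day(link_mbps) - growth_tb_per_day
    if spare <= 0:
        return None  # the link can never catch up with the growth
    return ship_days * growth_tb_per_day / spare


print(f"80 TB, full link: {transfer_days(80, 200):.1f} days")
print(f"80 TB, half link: {transfer_days(80, 200, 0.5):.1f} days")
print(f"catch-up after 3-day shipment: {catch_up_days(3, 0.5, 200):.2f} days")
```

Under these assumptions the upload takes over a month even with the whole link, while shipping followed by a sync of the accumulated changes finishes in days.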

      • (Score: 2) by frojack on Saturday May 07 2016, @04:33PM

        by frojack (1554) on Saturday May 07 2016, @04:33PM (#342926) Journal

        Yes, I'm sure we could each sit down and imagine a corner case where shipping data makes sense, but the point is these cases are rare occurrences, and not likely ones you've got pre-established procedures for.
        It's going to be messy. Your example is particularly prone to failure. New vendor. New procedure, changes on both ends of the shipment, new shipper, deadline...

        You'd be better off writing it all to new fast media every day and walking it over to the nearest bank that has a real honest to god vault while you work out the details with your new provider.

        80TB shipped is every bit as much at risk as is 80TB synced.

        If all 80TB needs to go at once, fine, ship it. But if you are doing that daily, you are doing it wrong.

        --
        No, you are mistaken. I've always had this sig.
        • (Score: 1) by Scruffy Beard 2 on Tuesday May 10 2016, @06:18AM

          by Scruffy Beard 2 (6030) on Tuesday May 10 2016, @06:18AM (#344091)

          I read that as ship the physical media, then sync the updates that happened since the ship.