from the never-underestimate-the-bandwidth... dept.
There is a story at ACM (Association for Computing Machinery) Queue concerning the best practice for getting data from here to there: Should You Upload or Ship Big Data to the Cloud? -- The accepted wisdom does not always hold true.
There is an old adage to never underestimate the bandwidth of a station wagon full of magnetic tapes going down the highway. The challenge is knowing when it is better (cheaper/faster) to ship it vs send it. This analysis will help you to decide.
This article investigates the tradeoffs between speed of communications between your systems and the cloud provider, how fast you can store your data to removable media/drives, and shipping delays so as to help you decide the fastest way to get your data from your systems to a cloud storage provider.
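The core tradeoff the article analyzes can be sketched in a few lines. The numbers below (a 48-hour shipping turnaround, a 200 Mb/s link, media that writes at 150 MB/s, 80 TB of data) are illustrative assumptions, not the article's exact inputs:

```python
# Back-of-the-envelope comparison of uploading vs. shipping data to a
# cloud provider. All figures are illustrative assumptions.

def upload_hours(data_tb: float, link_mbps: float) -> float:
    """Hours to push data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8              # decimal terabytes -> bits
    return bits / (link_mbps * 1e6) / 3600

def ship_hours(data_tb: float, write_mbs: float, transit_hours: float = 48.0) -> float:
    """Hours to write data to removable media, ship it, and have the
    provider copy it off again (one write pass + one read pass)."""
    copy_hours = 2 * (data_tb * 1e6) / write_mbs / 3600   # TB -> MB
    return copy_hours + transit_hours

data_tb = 80
via_net = upload_hours(data_tb, link_mbps=200)   # ~889 hours (~37 days)
via_box = ship_hours(data_tb, write_mbs=150)     # ~344 hours, mostly copy time
print(f"upload: {via_net:.0f} h, ship: {via_box:.0f} h")
```

Lowering `transit_hours` to 24 shows how sensitive the break-even point is to the shipping assumption.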
I found the article to be generally in-depth and well done but I do have a couple of caveats.
First, I saw no analysis of the impact of faster shipping/turnaround on the tradeoffs. The assumption is that it would take 48 hours for shipping and handling for physical media. I would have liked to see what impact it would have on the analysis if that were, say, 24 hours instead.
Also, an assumption is made that the cloud provider would need to copy from your media to put it on their systems. A variation I did not see explored was to have media that could be directly mounted at the cloud provider — whether the media was supplied in advance by the provider or met certain provider-required specs. In either case, that would avoid the need for another copying pass of the data. That, in turn, might greatly change the analysis of whether it would be faster to ship media or just upload the data over the internet.
Those quibbles aside, it is one of the better articles I've seen that investigates the actual tradeoffs.
Ed Note: Obligatory xkcd and another
Related Stories
Google debuts new Cloud Storage archive class for long-term data retention
Today at its annual Cloud Next conference in San Francisco, [Google] announced new storage tools, pricing, and products for customers of all sizes.
First on the agenda was a new archive class designed for long-term data retention that eliminates the need for a separate retrieval process, Google says, while providing "immediate" and low-latency access to content. Both access and management are performed via a familiar set of Google Cloud Storage APIs through which objects can be tiered down to save on costs, and data is stored geo-redundantly across multiple regions.
Pricing will start at $0.0012 per GB per month ($1.23 per TB per month) when it launches later this year. That's significantly cheaper than Microsoft's Azure Cool Blob Storage, which costs $0.002 per GB per month, and competitive with Amazon S3 Glacier, which is priced at $0.004 per GB per month.
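The per-terabyte figure quoted above follows directly from the per-gigabyte price; a quick sanity check (assuming the $1.23/TB figure uses 1 TB = 1024 GB, which is the only conversion that matches):

```python
# Convert the quoted per-GB monthly prices to per-TB, using 1 TB = 1024 GB.
prices_per_gb = {          # USD per GB per month, as quoted in the story
    "Google archive class": 0.0012,
    "Azure Cool Blob":      0.002,
    "Amazon S3 Glacier":    0.004,
}
for name, p in prices_per_gb.items():
    print(f"{name}: ${p * 1024:.2f}/TB/month")
```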
Related: Should you Upload or Ship Big Data to the Cloud? -- The Accepted Wisdom does not Always Hold True
Google Cloud to Add Five New Regions With Three New Undersea Cables to Support It
Microsoft Exec Says Amazon's Expansion is an Opportunity as Amazon Hits $1 Trillion
(Score: 1, Funny) by Anonymous Coward on Friday May 06 2016, @10:47AM
Forget the station wagon example. Carrier pigeons sound a lot more far out, until you start thinking about the actual bandwidth.
Hint: Micro-SD.
The ping is crap, but the bandwidth is pretty good.
(Score: 4, Insightful) by pendorbound on Friday May 06 2016, @01:20PM
Ahh... Good old RFC 1149 [ietf.org]. That's one case I don't mind a ping missing me.
(Score: 2) by Scruffy Beard 2 on Friday May 06 2016, @04:10PM
That is low bandwidth, high latency.
Technically SD card or USB keys [bbc.co.uk] violate RFC 1149 by not using blackstuff on whitestuff to encode the message. However, the use of flash memory allows you to convert a low bandwidth, high latency protocol into a high bandwidth one.
A few years ago I ran the numbers on cost. Actually doing this makes a stupid amount of sense. Though I concluded it would be simpler to just use the Postal service.
There is also the problem where many USB sticks do not actually have great data transfer rates (on the order of 10MB/s, which could be matched by 100Mbit Internet).
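The grandparent's claim is easy to sanity-check: the effective bandwidth of physically carrying flash is just capacity divided by trip time. A minimal sketch, assuming a 512 GB micro-SD card and a one-hour pigeon flight (both figures illustrative):

```python
# Effective throughput of "sneakernet": capacity carried divided by trip time.

def sneakernet_mbps(capacity_gb: float, trip_hours: float) -> float:
    """Average throughput in megabits per second for a single trip."""
    bits = capacity_gb * 1e9 * 8
    return bits / (trip_hours * 3600) / 1e6

pigeon = sneakernet_mbps(512, trip_hours=1.0)   # one micro-SD card per flight
print(f"{pigeon:.0f} Mb/s")                     # comfortably over a gigabit
```

Of course, as noted above, this only holds if you can actually write to and read from the media that fast; a 10 MB/s USB stick caps the whole pipeline at 80 Mb/s regardless of how many pigeons you have.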
(Score: 0) by Anonymous Coward on Friday May 06 2016, @12:42PM
Ed Note: Obligatory xkcd and another
is this revenge for three people posting different obligatory xkcd-s under previous article? :P
also, i feel that linking straight to explainxkcd is being too easy on people...
(Score: 0) by Anonymous Coward on Friday May 06 2016, @01:34PM
there's actually a site dedicated to explaining the comics? wow the internet is not the same internet that I started with...
(Score: 2) by cmn32480 on Saturday May 07 2016, @12:38AM
I wish I could say it was on purpose. When I read the story it was the first thing I thought of.
"It's a dog eat dog world, and I'm wearing Milkbone underwear" - Norm Peterson
(Score: 5, Insightful) by jimshatt on Friday May 06 2016, @01:08PM
I'd say "no" :)
(Score: 3, Interesting) by pendorbound on Friday May 06 2016, @01:26PM
I get the Luddite knee-jerk reaction, but be objective for a minute...
Given a goal of off-site, redundant-location backups: you don't need immediate fail-over or HA. We're talking disaster recovery. A meteor hit your DC (and an earthquake swallowed up your secondary, and a tsunami flooded your tertiary...). You just need your data back, and other delays are going to be longer than downloading or shipping spinning rust to you.
What are the downsides to doing client side encryption, client side key management, and shipping or uploading the entropy to someone else to store in two or three places? It's tough to beat on price, impossible to beat on administrative costs when all you need is dumb storage without any compute or OS on top of it.
Moving your entire IT into the cloud? Bad idea... Utilizing the cloud for things that it's actually good, economical, and safe at doing? What's the downside?
(Score: 2) by mcgrew on Friday May 06 2016, @05:16PM
I personally won't trust some nameless suit with MY data. And if there's a meteor strike, a tsunami, and an earthquake, my data's the last thing I'll be worrying about. If the data is in the cloud, the data's host could be having a bad fire while the earthquake, meteor, and tsunami happens.
mcgrewbooks.com mcgrew.info nooze.org
(Score: 2) by TheRaven on Saturday May 07 2016, @10:21AM
sudo mod me up
(Score: 3, Insightful) by bitstream on Friday May 06 2016, @01:16PM
Why even send it to a cloud at all?
(Score: 0) by Anonymous Coward on Friday May 06 2016, @02:05PM
Because.. Cloud! Do we really need to say more?
~Microsoft
(Score: 2) by frojack on Friday May 06 2016, @04:14PM
Flood
Fire
Earthquake
Tsunami
Vandalism
Disgruntled
Multi-site Access Need
Insurance Requirement
CYA
Worried about Spys? Encrypt it before you send it.
No, you are mistaken. I've always had this sig.
(Score: 2) by stormreaver on Friday May 06 2016, @05:15PM
Flood
Fire
Earthquake
Tsunami
Vandalism
Disgruntled
You seem to be under the impression that Amazon or Microsoft's servers are somehow immune to these. Both services have outages and data loss just as much as, if not more than, a well-run IT department.
(Score: 2) by frojack on Friday May 06 2016, @07:43PM
Actually NO, they don't have data loss; even when they do have strictly local outages, your data is usually available.
But the point is (and I'm sure you know this) is that the cloud site is not in your building. Therefore the cloud is not going to burn down when your small business goes up in flames.
Offsite backup should not be a new concept for you.
No, you are mistaken. I've always had this sig.
(Score: 3, Interesting) by butthurt on Friday May 06 2016, @08:37PM
> ... NO, they don't have data loss ...
Microsoft data loss:
http://www.theinquirer.net/inquirer/news/1558214/danger-backups [theinquirer.net]
Amazon data loss:
http://www.businessinsider.com/amazon-lost-data-2011-4 [businessinsider.com]
(Score: 2) by frojack on Saturday May 07 2016, @05:18PM
Microsoft was 2009, before they even launched their Azure cloud service.
Amazon's was 2011 but only affected customers that did not pay up for redundant data center storage (Multiple Availability Zone in Amazon parlance).
Ultimately, 0.07% of the volumes in the affected Availability Zone could not be restored for customers. However, to lose data, those customers would have also had to choose to disable automatic backups (another extra cost item), which is on by default. They would have still lost some transactions if they had left this on, but they shot themselves in the foot by turning it off.
So if all you can point to is two losses over these years, I'd say that is effectively a zero loss rate.
But bear in mind, these were operational instances. Not data backup sites, but rather live database systems and web services interacting with the public or simultaneous access from multiple locations. Not exactly what is being discussed in this story, which is largely concerned with backup.
No, you are mistaken. I've always had this sig.
(Score: 2) by butthurt on Saturday May 07 2016, @09:41PM
> Microsoft was 2009, before they even launched their Azure cloud service,
Are you implying that they've learned a lot since 2009? Microsoft was founded circa 1975. They are said to have spent $500 million for Danger, yet they didn't make a backup of its customers' data. In 2016, this (if I've found the pertinent legal agreement among the many they set forth) is how they value their Azure customers' data:
—https://azure.microsoft.com/en-us/support/legal/subscription-agreement/ [microsoft.com]
A story from 2014 tells about a company that used Azure. They had data corruption and it's unclear whether the fault was theirs or Microsoft's. They contacted Microsoft and learned they were out of luck. Microsoft didn't want to talk to the press:
—http://www.informationweek.com/cloud/cloud-storage/social-science-site-using-azure-loses-data/d/d-id/1252716 [informationweek.com]
> So if all you can point to is two loses over these years I'd say that is effectively zero loss rate.
After two I had stopped looking, because I thought that enough to refute your original statement.
> But bear in mind, these were operational instances. Not data backup sites, but rather live database systems and web services interacting with the public or simultaneous access from multiple locations.
According to Amazon, their April 2011 problem "primarily involved" one of their storage services:
—http://aws.amazon.com/message/65648/ [amazon.com]
If mailing hard drives back and forth is under consideration, as contemplated in the article, I suppose that a few days' outage will be acceptable.
A little more looking turns up a problem with their S3 offering [datacenterknowledge.com], but that was in 2008 and I'm sure they've learned a great deal since then.
With a little more looking I found a Register article [theregister.co.uk] about the state of cloud storage in 2013. It mentions some outages in Amazon's (EC2, not S3) and Microsoft's systems:
Those previous Skydrive outages were covered [neowin.net] by Neowin [neowin.net])
Of course, outages don't necessarily mean data loss, and 2013 was a few years ago. On isitdownrightnow.com [isitdownrightnow.com] there are comments about Onedrive (formerly Skydrive) outages as recent as this January. Perhaps they're fake. Similarly, forum posts [ycombinator.com] describe unavailability of S3 in July 2015.
If you wanted to back up your data to the cloud, but the cloud was temporarily offline so you couldn't do your backing-up when you wanted to, and then your own storage got stolen/burned/flooded, the cloud service provider didn't lose your data, but you're still out of luck.
(Score: 2) by butthurt on Saturday May 07 2016, @09:45PM
correction:
On isitdownrightnow.com [isitdownrightnow.com] there are comments about Onedrive (formerly Skydrive) outages as recent as this January.
(Score: 2) by frojack on Sunday May 08 2016, @01:42AM
Outages are not data losses.
Let's not let the goal posts start creeping away here.
No, you are mistaken. I've always had this sig.
(Score: 2) by butthurt on Sunday May 08 2016, @02:02AM
> Outages are not data losses.
I've pointed out that, for someone relying on a cloud for backups, the service's unavailability can lead to loss of data, if the outage occurred at an inopportune time.
> Let's not let the goal posts start creeping away here.
You had written that "they don't have data loss" and provided three examples where Amazon and Microsoft did lose data. Perhaps you stand by your original statement? Cheers then.
(Score: 2) by bitstream on Friday May 06 2016, @07:16PM
You can easily dodge all but the explicit insurance demand. What's needed is secured off-site storage. Even a rotten shed in the middle of nowhere will do.
Leaving the data with a provider that will try to hack your crypto protection all the time, and only needs to be right once, while also using anything you do against you? It just doesn't seem like a good idea.
(Score: 2) by dime on Friday May 06 2016, @09:56PM
Which provider do you think is trying to hack your crypto protection? Oh. All of them?
How many customers does each cloud provider have? Like 2? 3? 5? Oh. Millions?
Imagine if you and I both have several copies of all our personal files, projects, photo albums, and contract work archives encrypted with different keys and uploaded at multiple cloud backups. How many years would it take for the "cloud" to hack our shiz? Oh. Aeons?
So every cloud provider is trying to hack every single person's crypto because they are the enemies of... math? And logic?
disclaimer: I don't store data in the cloud and I don't have a horse in this race, but holy shit is your horse shiny as fuck from all that tinfoil.
(Score: 2) by bitstream on Saturday May 07 2016, @12:07AM
When your data is in the cloud, whenever the powers that be find you interesting, they have quick access to a lot of the data, and especially metadata and traffic patterns. And how much resources the alphabet organizations have is not that widely understood. And should your keys leak for any reason, you will have an instant data breach. If the data is not in their hands, you can re-secure it should the key(s) be duplicated for any reason.
It's all about attack surface, not being "after you".
(Score: 2) by Tork on Saturday May 07 2016, @04:04AM
There are plenty of reasons to use the cloud. It's a question of "best-tool-for-the-job", not one of slashdoltian absolutism.
🏳️🌈 Proud Ally 🏳️🌈
(Score: 2) by bitstream on Saturday May 07 2016, @10:34AM
Make your own cloud?
(Score: 0) by Anonymous Coward on Saturday May 07 2016, @03:14PM
Lousy up-stream?
(Score: 2) by Tork on Saturday May 07 2016, @08:06PM
Significantly more expensive, not as reliable, fewer useful features including smartphone integration and sharing of files. Then there's speed, maintenance of backup/redundant systems. In the end it's not very competitive until I start getting into secure data.
🏳️🌈 Proud Ally 🏳️🌈
(Score: 3, Interesting) by Gravis on Friday May 06 2016, @01:24PM
bandwidth limitation of transferring the data to portable media may exceed the bandwidth limitation of the network. ಠ_ಠ
(Score: 0) by Anonymous Coward on Friday May 06 2016, @01:37PM
Cloud? Why are you putting your private data on someone else's server?
(Score: 0) by Anonymous Coward on Friday May 06 2016, @04:03PM
Meh, I just encrypt it with a GnuPG key and let Freenet or Pirate Bay mirror it. Make sure to name it something like "insurance.gpg"
(Score: 2) by PizzaRollPlinkett on Friday May 06 2016, @04:06PM
Why? Because a manager told you to. Any other questions?
(E-mail me if you want a pizza roll!)
(Score: 0) by Anonymous Coward on Friday May 06 2016, @04:07PM
i do hope the cloud will rain down the latest tales of dwarfs, fancy princess and dragons "before winter comes" ...
(Score: 3, Interesting) by frojack on Friday May 06 2016, @04:37PM
Who does that anymore?
Let's say you have whatever is used for off-line storage in your Data Center. Who's to say your Cloud Provider can accept your obsolete-before-installed media? What about when you upgrade? Or they upgrade?
The Cloud operators will sooner or later upload your data to someone else's account, lose your media, destroy your media, ship your media back -- to your competitor. You will have to encrypt before you ship to guard against that.
The idea that you want a bunch of bored, low-paid "sysops" slapping data modules into drives (without a care in the world), following some script written in a book (but which eventually gets copied to a post-it note glued to the front of your rack-mount), just scares the bejesus out of me.
Before the huge emphasis on confidential data, I remember laughing off receiving a tape full of fishing licenses when I was expecting a tape full of Public Health Laboratory results in the interoffice mail. The Fish and Game guy called me. Bad jokes ensued. He caught it just before it was mailed to US Fish and Wildlife in some other state.
No, you are mistaken. I've always had this sig.
(Score: 1, Insightful) by Anonymous Coward on Friday May 06 2016, @07:15PM
You have 80TB of data that needs to be in another offsite database yesterday because your primary is about to eat it. You have a 200mb connection. Do you ship it, or wait and hope it all transfers cleanly? Oh, and meanwhile that over-the-net transfer is hurting your real business because it is consuming a not-insignificant amount of the bandwidth.
Also take into account sync time. So you just shipped out a bunch of data. You are storing a few more million a day. How long will it take to bring the two copies back into sync.
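The sync-time question above can be modeled roughly: while the shipped snapshot is in transit, new data keeps accumulating, and the link must drain that backlog faster than it grows. A sketch with assumed figures (500 GB of new data per day, 2 days in transit, a 200 Mb/s link — all illustrative):

```python
# Rough catch-up model: ship a snapshot, then sync whatever accumulated
# while the drives were in transit. All inputs are assumed figures.

def catchup_days(ship_days: float, daily_gb: float, link_mbps: float) -> float:
    """Days until the remote copy is in sync, counting data that keeps
    arriving while the backlog drains. Returns inf if the link can't keep up."""
    link_gb_per_day = link_mbps * 1e6 / 8 * 86400 / 1e9  # GB the link moves per day
    if link_gb_per_day <= daily_gb:
        return float("inf")                              # backlog grows forever
    backlog = ship_days * daily_gb                       # accumulated during transit
    return backlog / (link_gb_per_day - daily_gb)

print(f"{catchup_days(2, 500, 200):.2f} days to converge")
```

The interesting failure mode is the `inf` branch: if daily growth exceeds what the link can move in a day, the copies never converge and shipping becomes the only option, every time.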
(Score: 2) by frojack on Saturday May 07 2016, @04:33PM
Yes, I'm sure we could each sit down and imagine a corner case where shipping data makes sense, but the point is these cases are rare occurrences, and not likely ones you've got pre-established procedures for.
It's going to be messy. Your example is particularly prone to failure. New vendor, new procedure, changes on both ends of the shipment, new shipper, deadline...
You'd be better off writing it all to new fast media every day and walking it over to the nearest bank that has a real honest to god vault while you work out the details with your new provider.
80TB shipped is every bit as much at risk as is 80TB synced.
If all 80TB needs to go at once, fine, ship it. But if you are doing that daily, you are doing it wrong.
No, you are mistaken. I've always had this sig.
(Score: 1) by Scruffy Beard 2 on Tuesday May 10 2016, @06:18AM
I read that as ship the physical media, then sync the updates that happened since the ship.