
posted by janrinok on Friday July 11 2014, @01:07AM   Printer-friendly
from the picking-brains-time dept.

This is probably one of those topics that gets regurgitated periodically, but it's always good to get some fresh answers.

The small consultancy business I work for wants to set up a new file server with remote backup. In the past we have used a Windows XP file server and plugged in a couple of external USB drives when space runs out. Backups were performed nightly to a USB drive and taken offsite to a trusted employee's home.

They are looking to Linux for a new file server (I think more because they found out how much a new Windows file server would cost).

I'm not a server guy, but I have set up a simple Debian-based web server at work for a specific intranet application. When I was asked about ideas for the new system, the best I could come up with was maybe ssh+rsync (which I have only recently started using myself, so I'm no expert by any means). Using Amazon's cloud service has been suggested, as well as the remote being a dedicated machine at a trusted employee's home (probably with a new dedicated line in) or with our local ISP (if they can offer such a service). A new dedicated line out of the office has also been suggested, I think mainly because daily file changes can potentially be quite large (3D CAD models etc). A possible advantage of the remote being nearby is that the initial backup could be done with a portable hard drive instead of having to upload terabytes of data (I guess there are always courier services though).
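A minimal sketch of the ssh+rsync idea (host name and paths are made up here) would be a nightly cron job along these lines:

  # mirror the share to the offsite box over ssh; only changed portions of files cross the wire
  rsync -az --delete /srv/projects/ backup@offsite.example.com:/backup/projects/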

Anyway, just thought I'd chuck it out there. A lot of you guys have probably already set up and/or look after remote backup systems. Even just some ideas regarding potential traps/pitfalls would be handy. The company is fairly small (about 20-odd employees) so I don't think they need anything overly elaborate, but all feedback is appreciated.

Related Stories

A Series on How Rsync Works

FOSS developer Michael Stapelberg has started a four-part blog post series on Rsync and how it works. He wrote the i3 tiling window manager, among other projects, and is a former Debian developer. He has written about three scenarios for which he has come to appreciate Rsync, specifically DokuWiki transfers, software deployment, and backups. He then looks at integrating it into various workflows, and then at what the software and protocol actually do. The fourth part is to be announced.

Rsync is an algorithm and a utility, initially developed by Andrew Tridgell (as part of his PhD dissertation work) and Paul Mackerras. It is used for updating files on one machine so that they become identical to those on another machine, while transferring the minimal amount of data needed to effect the update, saving time and bandwidth. Rsync is the underlying component in a great many backup utilities and routines. With the right settings it can even do incremental backups. Andrew is also well known for having worked on Samba, and won in the EU against M$ to get the interoperability specifications needed to share files using CIFS/SMB.

Previously:
(2014) Ask Soylent: Suggestions for Remote Backup
(2014) How Do You Sync Your Home Directory?


Original Submission

  • (Score: 0) by Anonymous Coward on Friday July 11 2014, @01:31AM

    by Anonymous Coward on Friday July 11 2014, @01:31AM (#67377)

    I'm building two ZFS storage systems. I was going to use an old version of OpenSolaris I still have, but apparently FreeNAS has its performance issues worked out, so I will probably go that route if I can still have my full command line goodness.
    Anyhoo, each box will be host to a lot of different types of data - stuff that is temporary (surveillance footage), stuff that doesn't need backing up, stuff that we can't live without if it's lost, etc.
    The stuff to back up will be snapshotted nightly, and with ZFS you can send a snapshot over SSH.
    I know snapshots aren't backups; I'll still do a yearly-ish copy of everything to an external USB drive and keep it offsite, but this seems like the best (and free, aside from hardware) method to me.
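    For anyone unfamiliar with that workflow, the snapshot-and-send step is roughly this (pool, dataset and host names are made up):

      # nightly on the primary box
      zfs snapshot tank/projects@2014-07-11
      zfs send tank/projects@2014-07-11 | ssh backupbox zfs receive -F backup/projects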

    • (Score: 1) by wantkitteh on Friday July 11 2014, @08:47AM

      by wantkitteh (3362) on Friday July 11 2014, @08:47AM (#67505) Homepage Journal

      Sounds good to me - my suggestion is FreeNAS on two identical boxes physically located as far from each other as practically possible on-site, synced every 15 minutes or so using ZFS snapshots for manual failover, coupled with a nightly incremental backup to tape/offsite repo/punch card/whatever with two or more sets recycled on a rolling monthly basis. FreeNAS is a good forward thinking business choice as well - it scales well, speaks almost every network file access protocol you could wish for (including iSCSI), integrates with AD/OD/LDAP, and doesn't mind running as a virtual machine, just as long as you aren't a spanner about it. This article [freenas.org] has an off-putting title, but read through it and you'll get briefed on all the noob VM setup mistakes that can ruin FreeNAS's performance and ZFS's reliability, along with all the solutions.

  • (Score: 2) by frojack on Friday July 11 2014, @01:40AM

    by frojack (1554) on Friday July 11 2014, @01:40AM (#67381) Journal

    Are you looking for backup or syncing?

    Rsync can be made to look like a backup, but it isn't really.

    I've used all of the above. Rsync with remote machine, and also paid services.
    We had offices in the US and Australia and Rsync (via the UNISON Package) was just
    great for us. We scheduled a sync every two hours during the day and it was perfect
    at keeping our code libraries in sync. (That was in the days before Dropbox and paid cloud storage).
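    A Unison run like that boils down to something this small (hosts and paths invented here), kicked off from cron every couple of hours:

      # two-way sync of the code library between the two offices
      unison /home/code ssh://sydney.example.com//home/code -batch -auto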

    There are clients for Dropbox in linux and windows and they work ok as long as you don't
    have too much activity to sync. Their business model is sync only, but I have heard of
    some people configuring it to backup only.

    At my current day job we use SpiderOak (paid account) because it can be configured to
    sync some directories and just backup others and you can step back in time on the backups so that you can get the version of a file that changed several weeks ago, ignoring those later ones.

    SpiderOak is totally encrypted, even the SpiderOak staff can't decrypt your stuff.
    And you can back up several machines.
    You can sync some directories between several clients, as well as have a public space where you can put things for others to grab. (That's not encrypted).

    There are Windows and Linux clients.

    Disclosure: We are a happy paying customer.

    --
    No, you are mistaken. I've always had this sig.
    • (Score: 1) by richtopia on Friday July 11 2014, @04:01PM

      by richtopia (3160) on Friday July 11 2014, @04:01PM (#67660) Homepage Journal

      +1 for SpiderOak

      The original question was a little vague on details, but if you are looking at backing up anything less than terabytes, then go commercial instead of rolling your own. I've done both, and unless you have full-time IT available for a home-grown setup, you just cannot depend on it.

      While writing my thesis I had SpiderOak (there's a student discount if you are a student). SpiderOak keeps historical versions of your files, which more than paid for the entire subscription when I accidentally saved over a few days of work. It was also very useful for syncing files between the Windows/Linux partitions on my laptop and my two workstations; by syncing the same folder it was transparent, with a small delay for particularly large files.

      Also, SpiderOak keeps making improvements to help compete with Dropbox; for example, you can now list folders as shareable through a weblink with a password. By accepting the lower security you can access your data more easily and give out the URL to others. I recently used this feature to distribute an 800 megabyte archive to ten people without any problems.

      Even if you only back up the critical stuff here, it is well worth the peace of mind for offsite reliability. You may want to watch for a discount though; if you read their blog you can see some old sales, for example (I'm still kicking myself for not upgrading to unlimited while it was offered).

      • (Score: 1) by deego on Saturday July 12 2014, @03:32AM

        by deego (628) on Saturday July 12 2014, @03:32AM (#67965)

        The paranoid in me wants to ask: Does use of SpiderOak entail downloading a proprietary client from them?

  • (Score: 2) by sigterm on Friday July 11 2014, @01:49AM

    by sigterm (849) on Friday July 11 2014, @01:49AM (#67384)

    The most important aspects of a backup system are offsite storage and the ability to perform data and system restores quickly enough to prevent unacceptably long downtime. Exactly what counts as acceptable downtime will depend on the business and the service in question.

    Whether you should choose a local backup system using removable media, a server at a remote location, or a hosted service, depends on the organization's ability to manage a local setup, the feasibility of backing up and restoring over Internet or WAN links, and how comfortable they are about trusting a third party with their data (and thus potentially the future of the organization).

    When evaluating software and/or hosted services, make sure to consider the level of reporting provided. It's vital that you know whether backups are being performed or not, and the system must not allow for any kind of silent failure.

    If you decide on a local or hosted solution rather than a service, there's no shortage of decent backup software. I'd like to mention one: I've installed Ahsay OBS (http://ahsay.com/) for several clients and I really, really like the software, because it's:

    - dead easy to install and configure (it's Java based)
    - performs automatic deduplication using file level delta backups
    - has excellent multiplatform support
    - has excellent agent support for various databases (including Exchange)
    - supports quotas and bandwidth throttling (per account)
    - supports multiple backup servers with replication
    - is licensed per user account, not per backup server
    - each account can serve multiple backup jobs from multiple hosts
    - supports backup/export to (and restore from) removable media (or any location, really)

    If you can put a server at a secondary location or use a hosting provider, you should consider Ahsay.

    • (Score: 2) by cafebabe on Friday July 11 2014, @03:24PM

      by cafebabe (894) on Friday July 11 2014, @03:24PM (#67627) Journal

      "No-one wants backup but everyone wants restore."

      "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."

      Work backwards from the restore capabilities. As an example, I advised a business to cancel their Amazon account because it would have taken two weeks to restore their data - and that assumed Amazon wasn't having a simultaneous outage. At $1,600 per month, it was cheaper and quicker for a part-time employee to manage USB drives. Low tech? Crappy? Yes but it also has advantages. For example, the bandwidth cost to run an integrity check is zero.

      --
      1702845791×2
  • (Score: 1) by ticho on Friday July 11 2014, @01:50AM

    by ticho (89) on Friday July 11 2014, @01:50AM (#67385) Homepage Journal

    I have good experience with using http://relax-and-recover.org/ [relax-and-recover.org] for bootable bare OS recovery (bootable ISO or PXE image), and http://obnam.org/ [obnam.org] for data backups.

  • (Score: 2) by SlimmPickens on Friday July 11 2014, @02:08AM

    by SlimmPickens (1056) on Friday July 11 2014, @02:08AM (#67391)

    rsync.net - works with Debian out of the box and they have a warrant canary: [rsync.net]

    rsync.net will also make available, weekly, a "warrant canary" in the form of a cryptographically signed message containing the following:
    - a declaration that, up to that point, no warrants have been served, nor have any searches or seizures taken place
    - a cut and paste headline from a major news source, establishing date
    Special note should be taken if these messages ever cease being updated, or are removed from this page.

    Obviously you should still use your own encryption.

    • (Score: 1) by hellcat on Saturday July 12 2014, @03:02AM

      by hellcat (2832) Subscriber Badge on Saturday July 12 2014, @03:02AM (#67954) Homepage

      Count my vote for rsync.net, too.

      Been using it for years to sync my machines at work and home. It doubles as a 'backup' in case either one crashes. I don't want a true backup, just my data. If I buy a new machine (been through this twice) all I have to do is fire it up, install the rsync code and let the data rain down. When it's done, everything is where I want it.

  • (Score: 2) by jackb_guppy on Friday July 11 2014, @02:15AM

    by jackb_guppy (3560) on Friday July 11 2014, @02:15AM (#67394)

    I personally use two methods.

    One, like you had before, is multiple machines backing up to one location/machine with USB drives (a total of 6 Linux boxes in-home). This has worked well except when I lose a USB drive (just last weekend). But I do make 3 copies, so losing one was not a hard hit; I just re-adjust the backup location and copy function until the new drive arrives, then it takes up to 18 hours to bring 3TB back on-line.

    Second is my wife's and my oldest daughter's machines. They are both Win8.1 and have been using BackBlaze along with a common "save" drive. My wife's SSD is backed up to the internal save drive, along with my daughter's machine. There is a secondary backup on a USB external on my wife's (all the accounting books and pictures). Over all of this is BackBlaze: it copies all changed files every 3 hours to their cloud storage. The original upload took almost a week with a 5Mb uplink (home connection is 30Mb down/5Mb up). It costs $50 per year for unlimited storage, as long as it is local to the machine (so internal and USB drives). So a file change on my wife's machine makes 2 copies locally, then all 3 copies are uploaded. My daughter's makes one copy in-house, then up to the cloud. If we lost it all, we could pull it back down, or they will load it onto a USB drive and ship it to me for $99.

    A local friend and I have also talked about sharing "cloud" services. We are looking at what an above post talked about: two machines, one local and one remote, with rsync running to keep them in sync.

    There is also a world of personal/private cloud options, including ReadyNAS. And there is also ownCloud, which has a version that runs on a RPi :) http://blog.bittorrent.com/2013/05/23/how-i-created-my-own-personal-cloud-using-bittorrent-sync-owncloud-and-raspberry-pi/ [bittorrent.com]

  • (Score: 2) by RobotMonster on Friday July 11 2014, @02:34AM

    by RobotMonster (130) on Friday July 11 2014, @02:34AM (#67397) Journal

    About 9 months ago I was faced with a similar problem.
    I went with a number of Synology DS414s; they're headless, low-power, support RAID, run Linux and have a nice browser GUI for admin.
    I've configured the master to sync the data to the slave (it uses ssh+rsync under the hood, I believe), and both the master and slave take rolling incremental backups of the synced folders (very similar to Apple's Time Machine).
    Supports being a Windows/OSX file server, also works as a destination for Time Machine backups.
    Lots of packages you can install via the GUI, more packages if you're willing to get your hands dirty on the command line.
    We run SVN on the master; any commit turns up on the slave's copy of the repo within seconds, and is then backed up on the hour.
    Should the master machine die for any reason, a quick change to our dyndns account and the slave can be the new master within minutes.
    Works a treat.

    http://www.synology.com/en-global/products/overview/DS414 [synology.com]

    • (Score: 2) by WizardFusion on Friday July 11 2014, @09:29AM

      by WizardFusion (498) Subscriber Badge on Friday July 11 2014, @09:29AM (#67512) Journal

      This.
      I have a Synology 1513+ that I use for data storage and iSCSI traffic for my VMware home lab.
      It has all the built-in software to do different types of backups for you.

  • (Score: 2) by Nerdfest on Friday July 11 2014, @02:54AM

    by Nerdfest (80) on Friday July 11 2014, @02:54AM (#67405)

    I use CrashPlan. It's an online backup service whose software is actually free, although not open source. You can use the software to do versioned backups from one of your machines to another, to an external drive, or to a friend who also runs the software. You can supply your own encryption keys, of course, making it extremely secure. If you actually *pay* them, and prices are quite reasonable, you can back up to their servers. No space limits, no throttling. It supports Linux, which is a big plus for me. I got a great deal a couple of years ago for 10 machines at $6 per month or something, and it's well worth it. I've had to rely on it a couple of times when I got a bit sloppy playing with RAIDed, encrypted drives. You can get them to send you a disk with your data if you're in a hurry (costs extra). I'm in no way affiliated with them, just a happy customer.

  • (Score: 0) by Anonymous Coward on Friday July 11 2014, @03:03AM

    by Anonymous Coward on Friday July 11 2014, @03:03AM (#67408)

    We ensure that we use multiple methods and multiple vendors. Right now we are using Jungledisk (writes to Rackspace or Amazon clouds) because it is enterprisey, has Linux & Windows support, compression/dedup, and a good backup history. For long-term backups we are also looking at Amazon Glacier. We have cycled through several vendors and found that cheap backup is very expensive to manage and keep working. Pick a good one.

    I am sure that all the other answers mentioning removable media are using encryption. Make sure you use it too, to avoid a data breach.

    And remember, the only good backup is a tested, verified and fire-drilled backup. Whatever you pick, make sure you can actually get the files back*.

    *long story made short - I had to fix a system where /dev/ct0 became a regular file instead of the tape device. They had been backing up the db to a very large file named ct0, not the tape.

    • (Score: 2) by mojo chan on Friday July 11 2014, @07:30AM

      by mojo chan (266) on Friday July 11 2014, @07:30AM (#67481)

      Amazon Glacier is rather good for long term, rarely changing backups, and also pretty cheap.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
  • (Score: 1) by hendrikboom on Friday July 11 2014, @03:03AM

    by hendrikboom (1125) Subscriber Badge on Friday July 11 2014, @03:03AM (#67409) Homepage Journal

    Furnish two trusted employees with USB hard drives. They keep them at home. On even-numbered days, one of them brings his drive(s) in to work to receive a backup, and takes it (them) home again when he leaves work. On odd-numbered days, the other one does similarly. This way you always have a recent backup and an off-site one, and you have no heavy internet use.

    I use rdiff-backup to do my backups. It works with a local USB drive -- it doesn't actually need to go over a net.
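    For anyone who hasn't tried it, rdiff-backup usage is about this simple (paths invented here):

      # mirror plus reverse increments onto the employee's USB drive
      rdiff-backup /srv/share /media/usbdrive/share-backup
      # list the available restore points
      rdiff-backup --list-increments /media/usbdrive/share-backup
      # pull back a file as it looked three days ago
      rdiff-backup -r 3D /media/usbdrive/share-backup/report.odt /tmp/report.odt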

  • (Score: 2) by Ken_g6 on Friday July 11 2014, @03:16AM

    by Ken_g6 (3706) on Friday July 11 2014, @03:16AM (#67412)

    I've heard Git mentioned as a backup system [serverfault.com] at least once. It seems like there are several ways to use it. [dzone.com]

    The biggest problem I see mentioned about it is that it cannot forget data. So if you have rapid data turnover and you want to purge old data e.g. yearly to save space, Git may not be the system for you.
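    As a rough illustration of that approach (repository location and remote name are assumptions, and you'd run git init and add the remote once beforehand), the whole "backup" can be a nightly cron job:

      #!/bin/sh
      # commit tonight's state of the share and push it to a second machine
      cd /srv/share || exit 1
      git add -A
      git commit -m "nightly snapshot $(date +%F)" || exit 0   # nothing changed, nothing to push
      git push offsite master
      # note: history only grows; purging old data later means rewriting history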

  • (Score: 3, Informative) by AudioGuy on Friday July 11 2014, @03:19AM

    by AudioGuy (24) on Friday July 11 2014, @03:19AM (#67415) Journal

    I work with small businesses in the 5 to 100 employee range.

    This is dependent upon the nature of your data in the specific business, of course, but in general I have found that unless you live in a modern, connected place like Hong Kong, Tokyo, Latvia, Finland, etc., remote backup over the internet is just practically unworkable for real, modern office data needs. In a backwater like the US, with its slow, expensive internet, it's hopeless.

    Yes, there are incremental backups, etc. But consider what happens in real life: the graphics guy decides to reorganize his folders, 'MyGraphics (200GB)' to 'AllGraphics/Gifs AllGraphics/Jpegs'. To an incremental backup program he just deleted a folder, created new ones, and all those files need to be copied again. Maybe he decided to process them all to remove some metadata in Photoshop at the same time, so even the world's smartest incremental backup program is going to see those all as brand new files to be copied.

    So 200GB of data is going to be copied to your remote 'cloud' location. Just how fast a connection can your small company afford? Let's say you have a 20Mb/s connection like the local school here. 200GBytes is about 2,000Gbits (allowing for overhead); 2,000,000,000,000 / 20,000,000 = roughly 27 hours to transfer what took this guy a few minutes to do. Multiply by 20 people. Your offsite backup will probably never finish.

    Compression? These graphics are mostly already compressed.

    It's of course very dependent upon the amount of data, but I have always been surprised at the amount, even from businesses where I would not have expected large data sets.

    Here is what I usually do, seems to work:

    I have one main repository - the file server. This has two large disks, mirrored with RAID 1, so loss of any one drive does not lose data. This runs Samba, which works with both Windows and Macs (and Linux :-) ).

    There is a second machine - and note these do not need to be particularly powerful ones, I often use old repurposed user machines - this one I typically call the archive. It has two big disks the same size as the main file server, but these are NOT raided.

    All the backups are done with simple shell scripts running rsync. There are several scripts, designed to handle different cases - some data rarely changes and is not critical. Other data may change a lot, and may be so critical that its loss could severely hurt the business.

    One script runs weekly, on weekends. It rsyncs the main store to either disk 'A' or disk 'B' on the archive machine. These alternate weekly. This ensures a serious data error on the main drive can probably be fixed by the previous week's untouched backup (protects against a lot of 'oops, I didn't mean to delete that and now the backup has run...'). This script gets everything; that is why it runs on weekends, so as not to tie up the internal net during workdays.

    Another script runs daily. This is for fast-changing, more critical stuff (could be hourly, in some cases). Usually there is MUCH less of this, and you know where it lives, so these can pretty safely run overnight. It just updates the main archive backup disk, whichever one is selected that week.

    And this script may call one more at the end, which handles very special, very critical data. This data is backed up to a separate section (possibly a separate partition), and it has folders called 'Today', 'Yesterday', 'ThisWeek', 'LastWeek', 'ThisMonth', 'LastMonth' and sometimes 'LastYear', etc. Data is rotated through these, so that it is easy to pull out the data from last month for some emergency, etc. You may think this is overkill, but I can say that I have needed to pull data from previous years a number of times. ('Remember that proposal so-and-so sent me last year? Would you still have that backed...')

    Nothing is compressed, it is a pain and just slows access, prevents easy searching, makes recovery harder, and may fail in various ways. Disk space is cheap.

    Usually the 'archive' machine is also used for a mail backup and archive.

    I usually partition the main data store into two parts: one is what most users see as the fileserver; the other is normally not visible to them and contains system backups, such as all the critical stuff on servers (/etc dirs if nothing else, but in most cases full backups so a server can be recreated at any time just by a simple disk copy).

    All the rsyncs are 'pulls' to the archive machine, and send emails on completion. If I (or whoever) don't get an email on Monday, I know something is seriously wrong.

    The scripts are really just lists of rsync commands, no fancy programming needed.
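    In other words, each script is little more than this sort of thing (hostname, paths and email address invented here):

      #!/bin/sh
      # weekly pull from the file server to the archive box, alternating disks A and B
      WEEK=$(( $(date +%s) / 604800 % 2 ))    # 0 one week, 1 the next
      DEST=/archive/disk$WEEK
      if rsync -a --delete fileserver:/srv/share/ "$DEST/share/"; then
          echo "weekly sync to $DEST finished $(date)" | mail -s "weekly backup OK" admin@example.com
      fi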

    Off-site backups? Easily done by swapping out the 'out' drive for the week and letting someone take it home. Plug-in FireWire drives work well for this too. The problem in small businesses is getting people to reliably DO it. So how it REALLY works in real life is this:

    Any time a drive fails, or anyone one panics, thinking they lost something, be sure to ask if that task has been done. This is the most effective time. :-)

    Second component of this: buy drives that are not necessarily the biggest available. When you have to replace them, because they WILL fill up, just keep the old drives and send them home with someone.

    This is what I have come down to after quite a few years, and different approaches. It is very simple and has not failed me.

    -AG

    • (Score: 0) by Anonymous Coward on Friday July 11 2014, @03:31AM

      by Anonymous Coward on Friday July 11 2014, @03:31AM (#67416)

      > Yes, there are incremental backups, etc. But consider what happens in
      > real life: The graphics guy decides to reorganize his folders,
      > 'MyGraphics (200GB)' to 'AllGraphics/Gifs AllGraphics/Jpegs'. To an
      > incremental backup program he just deleted a folder, created new ones,
      > and all those files need to be copied again. Maybe he decided to
      > process them all to remove some metadata in Photoshop at the same
      > time, so even the worlds smartest incremental backup program is going
      > to see those all as brand new files to be copied.

      You are behind the times, my friend. Modern backup systems use data de-duplication algorithms to deal with cases like that. Rename the files, move them to different filesystems, tweak some of the headers in the file - it doesn't matter. Only the disk blocks with changed data get copied.

      Here's one program that works like that: Obnam [obnam.org]

      • (Score: 2) by AudioGuy on Friday July 11 2014, @03:42AM

        by AudioGuy (24) on Friday July 11 2014, @03:42AM (#67418) Journal

        But what about the case I mentioned, where the graphics guy did some processing on each file? They are now all different, and Photoshop pretty severely messes with everything.

        And what about the very first backup. 20 machines, full backup. The smallest drive I can FIND anymore is maybe 300GB, and users really DO fill these up.

        'Only the disk blocks with changed data get copied.' He rewrote every file. The disk blocks are all in different locations, very likely.

        • (Score: 1) by CyprusBlue on Friday July 11 2014, @04:21AM

          by CyprusBlue (943) on Friday July 11 2014, @04:21AM (#67431)

          You're arguing without looking up the information. Many different systems these days do indeed handle this gracefully, migrating only the changed blocks at the filesystem level.

          Check out ZFS, for instance; it's what I use for the most part to solve this. And you're also wrong about compression: it's almost a zero hit on CPU when done right, for significant savings at times. When combining dedup with compression for the actual stream updates, it's really not much of a hit, and can save a lot of window time. The bigger issue is generally how you do full restore windows, as that requires much higher bandwidth than deltas.

          Obviously there are outliers, but those situations (like a production studio for instance) clearly have to be handled differently anyway, and are almost straw men when talking about the general small office case.

          • (Score: 2) by AudioGuy on Friday July 11 2014, @05:11AM

            by AudioGuy (24) on Friday July 11 2014, @05:11AM (#67450) Journal

            I did look it up (it has some problems with SQL data, etc.), and I was not unaware of the de-duplication algorithm's existence.

            I picked a poor example on how incrementals can be fooled. The real point was meant to be simply that large amounts of data can change in ways where you would not expect this to be the case. I think the color change mentioned below would have been a better choice.

            It is possible my experience is slightly skewed by many of the businesses I deal with being involved in the arts.

            However, the original poster did specifically mention 'because daily file changes can potentially be quite large (3D CAD models etc)'. To me that means 'many gigabytes of data every day' - NEW data. What would be more useful is if he were to mention what the typical amounts actually were.

            If the 3D Cad he is talking about is the kind used for say, video/movie production, just a simple, slight color change will rewrite the whole file, pretty much every byte of the rgb data, and that file could easily be 20-100 GB. He hasn't said, so I don't know.

            But even other companies surprise me - they have huge print files, they are generating simple video, editing it, color correcting, etc.

            It adds up, and while deduplication sure looks like a useful tool I have my doubts it is -enough- to compensate for the woefully inadequate internet speeds many of us have to deal with. Maybe in Finland it is enough. :-)

            I don't understand the comment about compression (sorry, replying to two different posters at once, probably I shouldn't), I only mentioned that I do not compress the files on disk on the local copy. This just makes it simpler and faster for others to find files in the archive. If I were transferring general files over the net I would certainly want it. It doesn't help a whole lot on the case of already compressed files like jpegs and much video. I said nothing about stressing the processor.

            Most of the small businesses I work with have several terabytes of data to back up, so that initial backup would take quite some time. You can dismiss that, but I can't. :-)

        • (Score: 0) by Anonymous Coward on Friday July 11 2014, @04:27AM

          by Anonymous Coward on Friday July 11 2014, @04:27AM (#67432)

          > And what about the very first backup. 20 machines, full backup.

          I wasn't addressing the issue of level-zeros, I was simply pointing out that your claims about how incremental backups work are obsolete.

          > 'Only the disk blocks with changed data get copied.'
          > He rewrote every file. The disk blocks are all in different locations, very likely.

          If you are trying to say that the data offsets within each disk block changed because the file structures aren't block-aligned, well sure that's always a risk. There will always be pathological cases. But designing a system based on the rare pathological case brings its own risks - you identified one yourself when you pointed out how hard it is to get regular people to haul a disk offsite.

          Like everything in life, it's a series of trade-offs. But you can't make an accurate assessment of the trade-offs if you aren't starting with a realistic evaluation of the available options.

      • (Score: 0) by Anonymous Coward on Friday July 11 2014, @11:41PM

        by Anonymous Coward on Friday July 11 2014, @11:41PM (#67896)

        unless you live in a modern, connected place like [...] Latvia [...]

        ...

        ....

        Bwahahahahaha, LOL, LOL, ROFL, hahahahaha, YEAH, LMAO, mwahahahahahahahahahaha. You made my day, thanks!

    • (Score: 2) by egcagrac0 on Friday July 11 2014, @03:22PM

      by egcagrac0 (2705) on Friday July 11 2014, @03:22PM (#67625)

      I often use old repurposed user machines

      One script runs weekly, on weekends

      Another script runs daily

      one more, which handles very special data, very critical data. This data is backed up on a separate section (possibly separate partition), and it has folders called 'Today' Yesterday' 'ThisWeek' 'LastWeek' 'ThisMonth' 'LastMonth' and

      The scripts are really just lists of rsync commands, no fancy programming needed.

      It is very simple and has not failed me.

      I'm cringing at a lot of that. I'm not exactly sure what you're doing, but it doesn't sound like a backup to me. It sounds like a copy. The two are not the same.

      A copy might get you "something useful" when SHTF. It's a damn sight better than nothing.

      A backup gets RPO, RTO, and some version history. A backup gets tested.

      Trying to get there with someone's old desktop and three hard drives... hopefully your customers understand the inherent risks in advance, and their business continuity plan aligns acceptably with what you're doing.

      A small business of about that size that I was formerly affiliated with had a consultant who thought up a similar scheme. After a junior executive (with the same last name as the owner) "reorganized"* the file server to "clean up" what were apparently needed documents, we discovered that even though we were paying a third party for a backup, those backups were unrecoverable.

      *When we reminded him two weeks later, when he did it again, that we had said "backups aren't working, don't go cleaning up any more files until we say otherwise", his reply was that he had to be able to delete needed files without consequences... backups were a nightmare at that place, since they didn't want to spend any money on what was apparently business critical stuff.

  • (Score: 1) by Shimitar on Friday July 11 2014, @05:11AM

    by Shimitar (4208) on Friday July 11 2014, @05:11AM (#67449) Homepage

    From my web server (dedicated) to my home I have set up a nice bash script server-side which backs up files, folders and MySQL databases. Then it does a tar over an ssh tunnel, as a trusted user with key pairs, to my home and just sends the data there. There's some code to manage dates and such, to keep track of how many copies of the backup are stored. It works and it's simple.

    On the server you need bash, tar and ssh, and of course cron. On the backup site you need a user, tar and ssh. The server must be able to log in remotely over ssh with a shared key to the backup site.

    If you are interested I can share the script with you.
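    The gist of it, with invented paths and hostname (not my actual script), is only a few lines:

      #!/bin/bash
      # tar the important directories and stream the archive over ssh (key auth) to home
      STAMP=$(date +%Y-%m-%d)
      tar czf - /var/www /etc /home/user/data \
          | ssh -i /root/.ssh/backup_key backupuser@home.example.net "cat > backups/server-$STAMP.tar.gz"
      # keep only the 14 most recent archives on the remote end
      ssh -i /root/.ssh/backup_key backupuser@home.example.net \
          "ls -1t backups/server-*.tar.gz | tail -n +15 | xargs -r rm -f"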

    --
    Coding is an art. No, java is not coding. Yes, i am biased, i know, sorry if this bothers you.
  • (Score: 0) by Anonymous Coward on Friday July 11 2014, @06:45AM

    by Anonymous Coward on Friday July 11 2014, @06:45AM (#67465)

    If you're gonna use a cloud service (aka some random guy's machine somewhere):

    * Encrypt all your stuff and do it locally before it's on the wire. And you're still bleeding metadata.
    * Use several redundant providers, see Megaupload.

    • (Score: 2) by Open4D on Friday July 11 2014, @04:42PM

      by Open4D (371) on Friday July 11 2014, @04:42PM (#67682) Journal

      * Encrypt all your stuff and do it locally before it's on the wire.

      For backup (not sync), http://tarsnap.com/ [tarsnap.com] may be an option. Deduplication & compression, yet also local encryption. (I can't think of any other paid-for service where you have to build the client software yourself!) It uses Amazon S3.

       

      * Use several redundant providers, see Megaupload.

      Least Authority [leastauthority.com] claim to be working on "Redundant Array of Independent Clouds" (RAIC) [leastauthority.com], which will support Microsoft Azure, Rackspace, Google, and Hewlett-Packard in addition to their existing Amazon S3.

       
      N.B. Worth mentioning: Greenpeace claims [bbc.com] that Amazon's hosted storage is less 'green' than Google's & Apple's. (However, Greenpeace consider nuclear energy to be bad, so you'd have to check the details to be sure.)

  • (Score: 3, Interesting) by Jaruzel on Friday July 11 2014, @07:04AM

    by Jaruzel (812) on Friday July 11 2014, @07:04AM (#67470) Homepage Journal

    Is your workplace one building?

    If not, consider having the 'remote' backup in a different building from the primary file server. By having both servers on the same LAN, you avoid the need for massive internet bandwidth yet fulfill your desire to safeguard your data by having it on an alternate site. Obviously this solution does not apply if you work in the centre of a busy metropolis and bomb threats are a very real thing.

    The problem with backup solutions is that one size rarely fits all; everyone's needs are different. That said, if your company can afford it, I'd seriously recommend ditching on-site file storage and going with a hosting company who can offer you mirrored data at two different locations. That way you can offload the problem to someone else and wrap a nice service layer around it to boot (i.e. it's not your fault if something goes wrong).

    -Jar

    --
    This is my opinion, there are many others, but this one is mine.
  • (Score: 2) by zeigerpuppy on Friday July 11 2014, @07:59AM

    by zeigerpuppy (1298) on Friday July 11 2014, @07:59AM (#67490)

    Great to see you have some familiarity with Debian. I would recommend the following:

    1) build two file servers, one on-site and one off-site (Debian is fine), with RAID storage. RAID is not a backup, but you want your backup on RAID! Make sure your servers have ECC RAM and a decent SAS controller (the LSI 9207-8i, for example, is cheap and good) and at least 4 HDDs (+ SSD optionally). These parts are not expensive now. You do not need a dedicated network connection between the servers unless you have more than about 5GB of data changing per day.

    2) I really like the combination of zfsonlinux with Debian http://zfsonlinux.org/debian.html [zfsonlinux.org] (use the unstable packages as they are plenty stable enough for such a purpose). ZFS makes sure that you don't suffer from "bit rot" but also makes replicating across an SSH connection easy (and incremental). The best thing is that snapshots are instantaneous and you get all sorts of other good features like easy NFS sharing and built in compression.

    3) set up a nightly (or more frequent if needed) zfs snapshot and zfs send/receive (over SSH) to the offsite server. (if you go down this path, PM me and I am happy to share my backup scripts). Then you can rollback to any day (I keep one snapshot a month and the past 30 days).
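    A bare-bones version of that nightly job (dataset, snapshot and host names here are assumptions) looks like:

      # take tonight's snapshot, then send only the blocks changed since last night
      zfs snapshot tank/data@2014-07-12
      zfs send -i tank/data@2014-07-11 tank/data@2014-07-12 | ssh offsite zfs receive backup/data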

    ZFS has lowered my administrative load a great deal. Just a couple of gotchas, make sure that your ZFS version is the same or higher on your backup server (or else ZFS send/receive won't work), also make sure to have plenty of RAM - (at least 2GB per TB of data storage).

    Other, less sophisticated (but still useful) options are rsnapshot and rdiff-backup (these are wrappers around rsync that do incremental backups which can be rolled back to particular dates). You still need two servers though, so just save yourself some time and spec them out for ZFS!

    Also, the same thing can be achieved using FreeNAS or various BSD systems but if you're already familiar with Debian, stick with that (I also like that Debian can run a ZFS backed Xen hypervisor which you can't do on FreeBSD yet).

    • (Score: 2) by zeigerpuppy on Friday July 11 2014, @08:07AM

      by zeigerpuppy (1298) on Friday July 11 2014, @08:07AM (#67493)

      One more thing.... make sure to have a bash script that exports database tables on a regular basis (these are not properly backed up by just taking a file snapshot).
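      A hypothetical version of that cron job (paths and retention are assumptions):

        #!/bin/bash
        # dump all databases as consistent SQL into the snapshotted filesystem
        DUMPDIR=/tank/data/dbdumps
        mysqldump --single-transaction --all-databases | gzip > "$DUMPDIR/all-dbs-$(date +%F).sql.gz"
        # keep two weeks of dumps locally; the filesystem snapshots keep the longer history
        find "$DUMPDIR" -name '*.sql.gz' -mtime +14 -delete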

    • (Score: 1) by jbruchon on Monday July 14 2014, @01:14PM

      by jbruchon (4473) on Monday July 14 2014, @01:14PM (#68886) Homepage

      SAS seems like a huge waste of money, especially for a very small business or home office. Why bother with SAS?

      --
      I'm just here to listen to the latest song about butts.
  • (Score: 1) by goodie on Friday July 11 2014, @12:46PM

    by goodie (1877) on Friday July 11 2014, @12:46PM (#67555) Journal

    I've used rsync in the past, but it's not really a backup system, as others have pointed out. I just liked the fact that it worked well and was already installed on my FreeBSD machines. But then again, that's a home setup, not an SME's critical data. I just wanted to point out that the issue with rsync is integration with Windows clients, if that's ever a possibility. I use it to clone pics, documents and my personal git repository.

    If your setup sticks to a master file server and a backup file server both running some flavor of Linux or BSD, then rsync is good, but again it's not a backup system in the way we understand it. For example, I'm not sure you could decide to restore to t-1 after a disaster. Say a guy puts a file on the server. That gets rsync'ed. Then the guy deletes the file. That gets rsync'ed too. If you want that file back, it may not be doable depending on your rsync settings (no delete). However, with incremental backups you may be able to restore to the snapshot before the file was deleted.

    It all depends on what exactly you need to be able to do, and whether you want to be able to easily administer all this using a web browser, for example. Do you guys have competent people to run that sort of thing on non-Windows machines in your company? That's an important point to keep in mind. It's one thing to set up the beast, another to tame it, test it, etc.

    • (Score: 2) by egcagrac0 on Friday July 11 2014, @03:43PM

      by egcagrac0 (2705) on Friday July 11 2014, @03:43PM (#67642)

      I'm not sure you could decide to restore to t-1 after a disaster.

      Rsync has some useful stuff built in for incrementals, and file-level deduplication. [interlinked.org] A very small wrapper script and an underlying filesystem that supports hardlinks is pretty much all you need to get there.
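      That approach boils down to rsync's --link-dest option; a minimal wrapper (paths invented here) looks like:

        #!/bin/sh
        # dated, hardlink-deduplicated snapshots of the file server
        SRC=fileserver:/srv/share/
        ROOT=/backup/share
        TODAY=$ROOT/$(date +%F)
        LAST=$(ls -1d "$ROOT"/????-??-?? 2>/dev/null | tail -n 1)
        rsync -a --delete ${LAST:+--link-dest="$LAST"} "$SRC" "$TODAY/"
        # unchanged files are hard links into the previous snapshot, so every dated
        # directory browses like a full backup but only changed files consume new space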

      • (Score: 1) by goodie on Friday July 11 2014, @04:58PM

        by goodie (1877) on Friday July 11 2014, @04:58PM (#67696) Journal

        Cool, I did not know about that, thanks!

        • (Score: 2) by egcagrac0 on Friday July 11 2014, @08:53PM

          by egcagrac0 (2705) on Friday July 11 2014, @08:53PM (#67833)

          An obvious enhancement would be to check if nothing has changed. If all the batch run produced was a new directory tree full of hardlinks and symlinks, there is no value in keeping it (but there is a cost in keeping it).

          I'm not sure of a good way to implement that enhancement.

    • (Score: 0) by Anonymous Coward on Saturday July 12 2014, @11:57AM

      by Anonymous Coward on Saturday July 12 2014, @11:57AM (#68065)

      "... the issue with rsync is integration with win clients". cf. cygwin.

  • (Score: 2) by cosurgi on Friday July 11 2014, @12:56PM

    by cosurgi (272) on Friday July 11 2014, @12:56PM (#67558) Journal

    I use rsnapshot. It's a tool that is started from cron two times per day; it connects remotely to all machines that I own and downloads everything via rsync. It also stores everything going back many years using very little extra space on the HDD, because only the differences are stored, thanks to hard links. You can browse the backup normally, because it's just normal directories on ext4 (or whatever fs you want). It also backs up Windows machines (I had to do this for my wife's PC); you only need to install an rsync server on the Windows PC.
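    A minimal setup in that spirit (hostnames, retention numbers and the Windows rsync-daemon line are assumptions; fields in rsnapshot.conf must be separated by real tabs, and older versions spell 'retain' as 'interval'):

      # /etc/rsnapshot.conf
      snapshot_root   /backup/snapshots/
      retain          halfday 14
      retain          monthly 24
      backup          root@laptop:/home/                laptop/
      backup          rsync://backupuser@winbox/docs/   winbox/

      # /etc/cron.d/rsnapshot -- pull twice a day, rotate monthly
      0 8,20 * * *   root   /usr/bin/rsnapshot halfday
      30 3 1 * *     root   /usr/bin/rsnapshot monthly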

    --
    #
    #\ @ ? [adom.de] Colonize Mars [kozicki.pl]
    #
  • (Score: 2) by egcagrac0 on Friday July 11 2014, @03:59PM

    by egcagrac0 (2705) on Friday July 11 2014, @03:59PM (#67658)

    RPO, RTO is a good start. It's OK to have several different tiers of RPO/RTO, for different targets.

    What do you want to recover from? What do you not want to recover from? This is an eye-opener - at some level of catastrophe, the business owners are going to say "You know what, screw it. Give up, collect the insurance, move on." When that level event happens, you don't need a full recovery - you need enough recovery so that the insurance claim can get filed.

    Business continuity planning sucks. Particularly with small business owners who often don't understand what they want (everything, duh!) vs what they need vs what they are willing to pay for.

    It gets more complicated if you have to mix desktops into the recovery plan, but in a lot of cases in the past, the people I've worked with don't want EVERYTHING backed up, they just want their documents folder (excluding music/movies). A lot of times, they're even willing to get back most of it, instead of insisting on everything, so long as they know that they're trading security vs convenience and which side they're balancing on.

    Defining the targets, RPO, RTO, and scenarios to recover (and not recover) from informs the selection of a solution, including whether or not offsite is necessary (or desirable).

    ... and don't put the primary or backup servers in the same room as the water main, just in case.

    • (Score: 2) by egcagrac0 on Friday July 11 2014, @04:20PM

      by egcagrac0 (2705) on Friday July 11 2014, @04:20PM (#67668)

      And then, once that's done, see if your ISP offers colocation - that's probably a better landing spot for the remote contingency plan than someone's home closet.

      VPN is probably a more cost effective solution than a dedicated circuit.

      When you're designing the storage solution, remember that the primary storage server (fileserver) needs to be fast and reliable, but the backup server just needs to be reliable (speed is a secondary concern - informed by RTO).

      Set up some sort of monitoring. You want to discover before it's "too late" that the backups aren't working. Or that the RAID on the remote storage is degraded. Or that the RAID on the primary storage is degraded. Or that the chassis intrusion has gone off in your supposedly locked rack. Or that the remote system has gone offline because some bonehead kicked a power cord. I, for one, find notify-on-success to be an unhelpful monitoring system (it all just becomes background noise to me), whereas notify-on-failure is a somewhat harder problem to solve.
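      One cheap way to get notify-on-failure (script name and address are placeholders) is a cron wrapper that only speaks up when the job exits non-zero:

        #!/bin/sh
        # run the backup quietly; mail the log only if it fails
        LOG=$(mktemp)
        if ! /usr/local/sbin/nightly-backup.sh > "$LOG" 2>&1; then
            mail -s "BACKUP FAILED on $(hostname)" admin@example.com < "$LOG"
        fi
        rm -f "$LOG"

      That still won't catch a backup box that is down or a cron daemon that never fired, which is the genuinely harder half of the problem.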

    • (Score: 0) by Anonymous Coward on Saturday July 12 2014, @12:02PM

      by Anonymous Coward on Saturday July 12 2014, @12:02PM (#68067)

      I am impressed by your confident, rigorous use of arcane, domain-specific acronyms and want to subscribe to your newsletter.

  • (Score: 1) by GeekDad on Friday July 11 2014, @04:47PM

    by GeekDad (4484) on Friday July 11 2014, @04:47PM (#67688)
    I prefer FreeNAS for all backups. FreeNAS is based on FreeBSD. It supports UFS and ZFS as file systems: if you have 64-bit hardware you can choose ZFS, and UFS for 32-bit hardware. UFS is the native FreeBSD file system; ZFS was imported from Solaris/OpenSolaris. ZFS has many more features than UFS, such as snapshots and so on, but it is a resource-hungry file system: it requires about 1 GB of RAM for each 1 TB of disk space. ;) FreeNAS has a web interface for remote management (it looks like Nagios). The FreeNAS documentation is very good and may help you with hardware selection and configuration. If you want to skip the hassle of hardware and configuration you can get a FreeNAS box from iXsystems. http://www.freenas.org [freenas.org] is the official web site.
  • (Score: 1) by chewbacon on Friday July 11 2014, @05:56PM

    by chewbacon (1032) on Friday July 11 2014, @05:56PM (#67734)

    I have a Linux machine set up for home backup and I keep a lot of my per diem IT work backed up to it. I bought offsite storage - not much, but enough for my irreplaceable data like pictures and software development work. At home, I have ZFS and I use TrueImage on my Windows clients. Rsync works well enough for Linux clients. I also use Subversion on the same server for software, which is arguably another backup source. Smartmon is a life and time saver; you can use it with ZFS to email you about drives nearing the end of their life.

    One extremely important thing to remember about ZFS - RAID-Z really - is do not use the entire drive space of any drive in the array. Partition them and pick a nice round number below the drive capacity. That way, when you replace one, ZFS doesn't go apeshit and fail due to a different drive size. It hasn't been a problem I've run into yet, but I've been warned about it.

  • (Score: 0) by Anonymous Coward on Friday July 11 2014, @06:04PM

    by Anonymous Coward on Friday July 11 2014, @06:04PM (#67741)

    I highly recommend QNAP, which I've used for 6+ years. These days, building your own for something like backups is silly when your business depends on it. QNAP and Synology are so cheap and work so well it just makes sense. Remote backups are when you physically swap one of the external drives from the NAS and drive it somewhere like a bank deposit box or someone's house; "cloud" backup in the US is unreasonable, as our infrastructure is too slow to be useful in real-life situations.

    • (Score: 2) by crutchy on Saturday July 12 2014, @08:20AM

      by crutchy (179) on Saturday July 12 2014, @08:20AM (#68013) Homepage Journal

      Wow, lots of great ideas have come out of this. Glad I gave it a shot.
      There is obviously a lot of expertise and experience in this budding community, and the willingness to help each other is amazing. There is a real open source vibe. Slashdot would be jealous :-D

      Since I submitted TFS, there has been some leaning towards a QNAP solution for the rack equipment. A bit pricey, but seems like fairly low maintenance (compared to some other options), and everyone knows time is money. Not too sure about where we're leaning for remote, but I'll definitely bring this thread up at work.

      Thanks for all the feedback! You guys are awesome. I look forward to catching up with more of you in IRC (http://chat.soylentnews.org/ [soylentnews.org]).

  • (Score: 0) by Anonymous Coward on Friday July 11 2014, @07:23PM

    by Anonymous Coward on Friday July 11 2014, @07:23PM (#67769)

    Use a LAN over wifi (or fiber optics); the wifi (or fiber) is there to separate the electrical circuits. Your "remote" backup site doesn't have to be far - maybe just a few hundred meters away in another building. This "remote" backup site should also be fed by a different electrical transformer. Of course an even better solution would be to send it to a grid fed by a completely different power plant/generator altogether... but that will prolly involve (slow) internet transit AND monthly fees. With wifi (or fiber) you get electrical circuit separation, and the other building can provide fire protection and flood protection. What else is there... ah... solar flares : )

    Distance for wifi: curvature-of-earth line-of-sight.
    Distance for fiber: ~50 km and money (plus paper "work" for putting in the physical cable, yadayada)!

    • (Score: 2) by egcagrac0 on Friday July 11 2014, @08:46PM

      by egcagrac0 (2705) on Friday July 11 2014, @08:46PM (#67827)

      use a LAN over wifi (or fiber optics). the wifi (or fiber) is to separate the electrical circuit.

      To what benefit? What risk does that mitigate?

      The common problems that I've used backups to recover from are:

      • user deleted a file accidentally
      • hard drive failure
      • user corrupted a file accidentally
      • virus corrupted a file

      It's entirely possible that lightning strikes or ESD are legitimate concerns in your plan, but if that's the case, you should be able to buy lotto tickets as insurance against them.

      Worry about the common stuff first, worry about the weird stuff later.

      Yes, this building has been struck by lightning. It burned a few holes in the metal roof. It didn't zap the network. We have underground power lines and underground telecom lines; this somewhat reduces our risk of those being points of ingress for lightning attack. Your situation may vary.

  • (Score: 2) by NCommander on Saturday July 12 2014, @03:44AM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Saturday July 12 2014, @03:44AM (#67967) Homepage Journal

    I think rsnapshot is a pretty decent way to go; it's very much a set-it-and-forget-it solution, and we're using it to back up the entire SoylentNews cluster to a remote machine. The main difference is that it's a pull rather than a push, so the backup is started on the backup machine, which then yanks everything across the wire, and it requires passwordless SSH or an actual rsync server setup.

    --
    Still always moving
  • (Score: 0) by Anonymous Coward on Monday July 14 2014, @01:07PM

    by Anonymous Coward on Monday July 14 2014, @01:07PM (#68882)

    Everything revolves around a single workhorse of a file/web/database server: six cores, 16GB RAM (would work fine on 2GB honestly, but caching is nice), 4x 1TB drives in what might be considered a strange configuration. The drives are partitioned identically: partition 1 is a ~200MB boot partition, 2 is a ~20GB root partition, 3 is a ~970GB home partition. I use Linux software RAID to set up /dev/sd?1 as raid1, /dev/sd?2 as raid1, and /dev/sd?3 as raid5. This yields almost 3TB of storage that can withstand a single disk failure without loss and rebuilds at a reasonably quick rate when a drive must be replaced (you must ALWAYS expect server drive failures, as they will happen often. mdadm --manage --add is your friend.) ALWAYS use a UPS on the server, even if it's just a cheap one. Minor power flickers can have major effects.
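    For reference, the array layout described above maps to mdadm commands roughly like this (device names are assumptions):

      # four-way RAID1 for boot and root, RAID5 across the big third partitions
      mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
      mdadm --create /dev/md1 --level=1 --raid-devices=4 /dev/sd[abcd]2
      mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sd[abcd]3
      # after swapping a failed disk, re-add its partitions to the arrays
      mdadm --manage /dev/md2 --add /dev/sdc3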

    Everything is shared out via Samba, so access to the server is platform-agnostic. Since there is no need for per-user security, everything is stored under /home/store and Samba is set up to share this out without requiring authentication. I wrote a backup script that runs in cron.daily that maintains a mirror (rsync -avb --delete --backup-dir=/home/store/backup.0 /home/store/ /home/backup/backup.0/) of the storage area and performs daily snapshot-based (changed or deleted files are moved from the mirror to their own directory) backups with folder rotation. /home/backup/backup.0 is the mirror, while /home/backup/backup.1, /home/backup/backup.2, ... are "changes up to 1 day old, changes up to 2 days old, etc." and I keep about two months of these, so whatever is /home/backup/backup.60 is deleted rather than its number being incremented. This /home/backup directory is shared out of Samba as "backup" with "root" level access but also read-only, so users can retrieve backup copies themselves as needed.
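    A sketch of that rotation, reconstructed from the description above (the directory names and 60-day depth are from the post; the script itself is an assumption):

      #!/bin/sh
      # rotate the dated change directories, then refresh the mirror
      BK=/home/backup
      rm -rf "$BK/backup.60"                      # the oldest set of changes falls off the end
      for i in $(seq 59 -1 1); do
          [ -d "$BK/backup.$i" ] && mv "$BK/backup.$i" "$BK/backup.$((i+1))"
      done
      mkdir -p "$BK/backup.1"
      # files changed or deleted since yesterday get parked in backup.1;
      # backup.0 stays an up-to-date mirror of /home/store
      rsync -avb --delete --backup-dir="$BK/backup.1" /home/store/ "$BK/backup.0/"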

    I perform MariaDB database dumps during this process so that I have backups of the server's database outside of the actual DB storage area's raw files.

    On top of all this, I also use a USB 3.0 3TB external hard drive to back up the contents of the server and keep that drive off-site and generally in my personal possession unless I'm performing backups. In my case I opted to perform multiple rsync runs: one backs up the entire contents of the root filesystem to the external, excluding /dev /proc /sys /home, the second backs up /home with further exclusions for directories that are not necessary to back up (for example, one could exclude the snapshot-based backup mirror if it churned way too much.)

    The external drive is the most important part. A server can have RAID all day long but one power glitch scrambling a little memory somewhere is all it takes to trash a filesystem and make life miserable. You want protection against drive failure, but also against filesystem damage, data corruption, and unlikely catastrophic hardware failures (i.e. a massive power spike blows out your server and all the drives in the array). If my server was completely destroyed, my backup drive contains everything that's very important.