What is RAID?

RAID is a common technique to provide resiliency and availability to a set of data and protect against one of the most common data loss scenarios: the failure of a disk.

The simplest type of RAID is a ‘mirror’, which keeps two or more copies of data on two or more different disks. If one disk fails, the second copy is still available and no data loss has occurred. You would usually see this for system disks in uptime-critical servers.

There are also more advanced modes, the most common of which are RAID-5 and RAID-6. These need at least three disks (RAID-5) or four disks (RAID-6), with data striped (written in sequential chunks) across all of the disks along with parity information.
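To make the parity idea concrete, here is a minimal sketch of single-parity recovery in Python. It assumes plain XOR parity over one stripe of a hypothetical four-disk array; the block contents and the helper name xor_blocks are made up for illustration. Real RAID-5 implementations also rotate the parity block across disks, and RAID-6 adds a second, differently computed parity block, neither of which is shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk array: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR-ing everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

The same trick only works for one missing block per stripe; lose two and there is not enough information left to rebuild either, which is exactly the "failures beyond the RAID level" problem.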

It’s worth noting that a pure ‘stripe’, also called RAID-0, is not really a RAID level - it increases the risk of data loss rather than decreasing it, since one disk failure destroys the whole array. It should never be used for redundancy or any critical data.

The Wikipedia page for RAID provides some helpful information about the history and benefits of the various RAID implementations.

So why do I need a backup?

Having a number of disks in RAID may seem like a backup, especially if you’re using a mirror mode. But this is wrong!

RAID protects you against one and only one thing: a disk failure.

It does not protect you against any of the following things:

  • Multiple disk failures beyond what the chosen RAID level tolerates (e.g. both disks in a two-disk mirror, or 3 disks in a RAID-6), including unrecoverable read errors (UREs) hit while rebuilding onto a replacement disk. On that latter subject, RAID-5 should be considered harmful these days for any disks larger than 1TB.
  • Failure of the RAID controller itself (especially when using hardware RAID), the computer running the RAID, or the environment containing the servers (a flood, fire, theft, etc.).
  • Data corruption on-disk from filesystem bugs, cosmic rays, or minor hardware or firmware failures, which can and do happen all the time - you usually just don’t notice and software works around it.
  • Malicious or accidental deletion or modification of files by yourself or another party, including viruses, bad application writes, or administrative mistakes (e.g. rm-ing the wrong file or mkfs on an existing filesystem), which any seasoned sysadmin has done at least once (and hopefully not to production data)!
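To put a rough number on the URE point in the first bullet, here is a back-of-the-envelope sketch in Python. It assumes a URE rate of one per 10^14 bits read (a figure often quoted on consumer drive spec sheets, not something measured here), independent errors, and a hypothetical four-disk RAID-5 whose rebuild has to read the three surviving disks in full. Real drives often do better than the spec-sheet figure, so treat the result as a pessimistic estimate.

```python
import math


def ure_probability(bytes_read, ure_rate_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read bytes.

    Assumes independent errors at a fixed per-bit rate, which is a simplification.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


TB = 10**12
for disk_tb in (1, 4, 8):
    # Rebuilding the hypothetical 4-disk RAID-5 reads all 3 surviving disks.
    p = ure_probability(3 * disk_tb * TB)
    print(f"{disk_tb} TB disks: {p:.0%} chance of a URE during rebuild")
```

Under those assumptions the odds come out at roughly one in five for 1TB disks and better than even for 4TB disks, and a URE during a RAID-5 rebuild means there is no redundancy left to correct it.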

The adage is simple: “RAID replicates everything, instantly, even the stuff you don’t want it to.”

But what about those fancy file systems?

There exist a number of file and storage systems with some advanced, RAID-like features. These include ZFS, btrfs, and Ceph. On the surface, these might give you the illusion of protection, but don’t be deceived. You can still trash your whole system (or cluster, for Ceph). You can still rm files or run other destructive commands accidentally. A fire can still destroy your whole rack. A malicious user could overwrite your database. Even the smartest, most advanced storage engine is still susceptible to at least one, and almost always several, fatal failure modes.

Like RAID, ADVANCED FILESYSTEMS STILL AREN’T BACKUPS! Just make another copy of the data, okay!?

You've convinced me - so how do I back up?

  • Always back up in some way. While a copy of the data on the same array won’t protect you against all, or even very many, failure modes, it will protect you against some, and those are usually the most common.
  • A backup on the same server is susceptible to the same failures as the original data set (hardware failure, natural disasters, and the like).
  • A good rule of thumb is three copies: the original (RAID or otherwise); one onsite copy on a different, preferably offline, medium; and one offsite copy. Store the offsite copy in the cloud, a data vault, or at a friend’s house. Just keep it somewhere else.
  • Make backups regularly, at least once a week, preferably more, and automate them! Forgetting to back something up and then needing just that backup is never fun, and the more frequently you back up, especially incrementally, the better your recovery resolution.
  • Test backups regularly, at least once a month; a backup is worthless if you can’t restore from it. Just because you have a backup doesn’t mean you’re protected; always test them and fix any problems. If you never test your backup, you will almost certainly find it doesn’t work, right when you need it.
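As a rough illustration of the "automate it" and "test it" points above, here is a small Python sketch that copies a directory into a timestamped backup folder and then checks the copy with SHA-256 checksums. The function names (back_up, verify, checksums, sha256_of) and the directory layout are my own inventions for this example, it makes full copies rather than incremental ones, and it is no substitute for a real backup tool. It only shows the shape of a backup-then-verify routine you could run from a scheduler.

```python
import hashlib
import shutil
import time
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def back_up(source, backup_root):
    """Copy source into a new timestamped directory under backup_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / stamp
    shutil.copytree(source, destination)
    return destination


def verify(source, backup):
    """Return relative paths that are missing from, or differ in, the backup."""
    original = checksums(source)
    copied = checksums(backup)
    return sorted(
        name for name, digest in original.items() if copied.get(name) != digest
    )
```

An empty list from verify(source, backup) means every file in the source has an identical copy in the backup. Comparing against the live source only makes sense right after the backup runs; for the monthly restore test, you would restore to a scratch location and compare against checksums saved at backup time instead.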

There are dozens of backup utilities out there; I’m not going to proselytize for any one of them, but I personally use BackupPC and good ol’ fashioned rsync for my server and workstation backups.

Only you can determine what you need to back up, but if you can’t replace some data, you should definitely back it up - Murphy’s Law applies here as much as anywhere.

About us

If you have any questions, concerns, or comments about this page, please contact me. If you want to ask basic support questions about RAID or backups, or argue with me, please don’t; we have Reddit for that.

Thanks, Joshua