What is RAID?

RAID is a common technique to provide resiliency and availability to a set of data and protect against one of the most common data loss scenarios: the failure of a disk.

The simplest type of RAID is a ‘mirror’ (RAID-1), which keeps two or more copies of data on two or more different disks. If one disk fails, the second copy is still available, so neither availability nor data has been lost. You would usually see this for system disks in uptime-critical servers.

More common for bulk data are more advanced modes like RAID-5 and RAID-6, which consist of at least 3 or 4 disks (respectively), with data striped (written in sequential chunks) across all disks along with parity information, and which can tolerate losing 1 or 2 disks (respectively) before losing data.
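To make the parity idea concrete, here is a minimal sketch of single-parity recovery using XOR, which is the basic idea behind RAID-5. It is an illustration only, not how any particular RAID implementation is written: the block contents and the `xor_blocks` helper are made up for the example, and real arrays rotate parity across the disks (RAID-6 also adds a second, differently computed parity block).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-sized data blocks, one per (hypothetical) data disk.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing the second disk: XOR-ing the surviving data blocks
# with the parity block reproduces the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

Lose two blocks at once and there is no longer enough information to rebuild either of them, which is why single parity tolerates exactly one failed disk.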

It’s worth noting that a pure ‘stripe’, also called RAID-0, is not really a RAID level - it increases the risk of data loss rather than decreasing it, since one disk failure destroys the whole array. It should never be used for redundancy or any critical data.
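A rough back-of-the-envelope calculation shows why a stripe increases risk. This sketch assumes every disk fails independently with the same probability over some fixed period, and it ignores rebuilds and replacements entirely; the 3% figure is a made-up number for illustration, not a measured failure rate.

```python
def stripe_loss_probability(p_disk, disks):
    """Chance a RAID-0 stripe loses data: any single disk failing is enough."""
    return 1 - (1 - p_disk) ** disks


def mirror_loss_probability(p_disk, disks):
    """Chance a mirror loses data: every copy has to fail."""
    return p_disk ** disks


p = 0.03  # hypothetical per-disk failure probability for the period

print(f"single disk:   {p:.2%}")
print(f"4-disk stripe: {stripe_loss_probability(p, 4):.2%}")  # about 11.47%
print(f"2-disk mirror: {mirror_loss_probability(p, 2):.2%}")  # about 0.09%
```

Under these simplified assumptions, every disk added to a stripe makes data loss more likely, while every disk added to a mirror makes it less likely.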

The Wikipedia page for RAID provides some helpful information about the history and benefits of the various RAID implementations.

So why do I need a backup?

Having a number of disks in RAID may seem like a backup, especially if you’re using a mirrored RAID mode like RAID-1 or RAID-10. But this is wrong!

RAID protects you against one and only one thing: a disk failure. It does not protect you against any of the following things:

  • Multiple disk failures beyond the RAID level chosen (e.g. both disks in a mirror, or 3 disks in a RAID-6), including possible unrecoverable read errors (UREs) on the surviving disks during a rebuild.
  • Failure of the RAID controller itself (if applicable), the computer running the RAID, or the environment containing the servers (e.g. a flood, fire, or theft).
  • Data corruption from filesystem bugs, cosmic rays, or minor hardware or firmware failures, which can and do happen all the time - you usually just don’t notice and software works around it.
  • Malicious or accidental deletion or modification of files, including by viruses, bad application writes, or administrative mistakes (e.g. rm-ing the wrong file or mkfs on an existing filesystem).

The adage is simple: “RAID replicates everything, instantly, even the stuff you don’t want it to.”

But what about those fancy file systems?

There exist a number of storage systems with advanced, RAID-like features, including ZFS, btrfs, and Ceph. On the surface, features of these systems, like snapshots, might give you the illusion of additional protection, but don’t be deceived. Even the smartest, most advanced storage engine is still susceptible to at least one, and almost always several, fatal failure modes that can destroy your data.

Just like RAID, advanced storage systems still aren’t backups.

You've convinced me - so how do I back up?

  • Always back up in some way. While a copy of the data on the same array won’t protect you against all, or even very many, failure modes, it will protect you against some, and those are usually the most common. Remember that a backup on the same server is still susceptible to some of the same failures as the original data set, but having 2 copies is still better than 1.
  • A good rule of thumb is three copies: the original (RAID or otherwise); one onsite copy on a different, preferably offline, medium; and one offsite copy. Store the offsite copy in the cloud, a data vault, or at a friend’s house; just keep it somewhere else. This is often called the “3-2-1” rule: 3 copies, 2 different media types, 1 offsite.
  • Make backups regularly, at least once a week and preferably more often, and automate it! Forgetting to back something up and then needing just that backup is a common scenario and is never fun. The more frequently you back up, the less work you lose when you have to restore, so back up frequently-changed files more often.
  • Test your backups regularly, at least once a month; a backup is worthless if you can’t restore from it. Just because you have a backup doesn’t mean you’re protected: always test it and fix any problems. If you never test your backup, you will almost certainly find it doesn’t work, right when you need it.
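Part of the “test your backups” step can be scripted. The sketch below is one possible approach, not a feature of any particular backup tool: it assumes you have already restored a backup into a scratch directory, and it compares SHA-256 checksums of every file in the original tree against the restored copy. The function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing from, or differ in, restored_dir."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return problems
```

Calling `verify_restore("/path/to/original", "/path/to/restored")` (both paths are placeholders) returns an empty list when every file matches. Files that legitimately changed after the backup was taken will also show up as differences, so this works best on data that has been quiet since the last backup, or as a spot check on a handful of files you know well.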

There are dozens of backup utilities out there which work well; I’m not going to proselytize for any one of them, but I personally use BackupPC and plain old rsync for my server and workstation backups.

Only you can determine what you need to back up, and that differs for everyone. But if you can’t replace a set of data, you should definitely back it up: Murphy’s Law applies here as much as anywhere.

I've learned something!

Now that you’re in the know, get to making and checking a backup of your data, before you lose it! As someone who lost precious, priceless data both to a lamp falling on a drive and a major corruption bug in a bad enclosure, trust me when I say that you will thank yourself later.

More information can be found on the following pages:

About us

If you have any questions, concerns, or comments about this page, please contact me via email at joshua -at- raidisnotabackup.com. If you want to ask basic support questions about RAID or backups, or argue with me, please don’t; we have Reddit for that.

Thanks, Joshua