Incremental backups using rsync

Backups are important. You don't want to lose all your files when you type the wrong command or someone steals your computer. Right?

I have a bunch of physical *nix boxes and VMs, both at home and elsewhere, that I want to back up on a daily basis to a central location which is under my control. Using the rsync program to do this is easy, but by default it will only keep one copy of your files. While you could just back up to a different directory each day, there would be two problems. Firstly, your destination backup disk would fill up pretty quickly. Secondly, it could take a very long time for your backups to run, as a whole new copy of every file would have to be made each time. That is not a problem on a local network using gigabit Ethernet, but it could be when uploading over VDSL or similar. Given that it is generally best to have your backups stored in a (secure) remote location, this makes incremental backups important! If you have a like-minded friend, you could each host a backup device for the other.

Setup required - network/hardware/software

The way this backup process works is that each machine that needs backing up must be reachable from the backup server by DNS name or IP address. Ideally each will have its own public IP address - IPv6 makes this easier but there is a dependency upon ISPs providing non-dynamic prefixes to each customer. Alternatively, the backup server and clients could be connected to the same VPN.

For the hardware, the backup server doesn't need to be very powerful. A Raspberry Pi 3 would work; however, I would recommend you make use of LUKS encryption for the disk on which your backups are stored. This will protect your files (and SSH private keys) should your backup server be stolen. While modern Intel CPUs have hardware support for AES encryption, I don't believe the Pi's ARM CPU does, and thus performance would not be as good. It may be good enough to cope with the speed of the rsync transfer [note to self: more testing required!]. An additional but important thing to note with regard to encryption is that you will need to enter the passphrase when mounting the backup disk, e.g. after rebooting! Of course, you will also need to allocate appropriate storage to your backups. I use a 1TB disk to back up several hundred GB.

In terms of software, you need rsync installed on the server and on every client you wish to back up. The clients must be running sshd, and any firewalls must be configured to allow connections from your server. The server needs Perl installed. The script needs to run as root on the backup server in order to preserve file ownership. This may or may not be important to you.

How the software operates

The script is designed to be run on a daily basis. You could run it more frequently, but (without modification) this would cause previous backups run the same day to be overwritten.
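To illustrate why a second run on the same day replaces the first, here is a minimal sketch that assumes each backup lands in a directory named after the client and the calendar date. The /mnt/backup root, the client1.example.org host name and the backup_dir_for function are hypothetical; the actual script may lay things out differently.

```shell
#!/bin/sh
# Hypothetical layout: <root>/<client>/<YYYY-MM-DD>.
backup_dir_for() {
    # $1 = backup root, $2 = client name, $3 = date (defaults to today)
    printf '%s/%s/%s\n' "$1" "$2" "${3:-$(date +%F)}"
}

# Two runs on the same date resolve to the same path,
# so the second run writes over the first.
backup_dir_for /mnt/backup client1.example.org 2021-12-06
```

Adding the hour to the date format would be one way to keep several backups per day, at the cost of more directories to prune.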

Before doing any file copies, the script will check for sufficient disk space on the storage drive, and then delete old backups until the configured amount of space is available.
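The pruning step can be sketched roughly as follows. This is not the Perl script's code, just an illustration of the idea, and it assumes a single directory of snapshots named YYYY-MM-DD so that sorting by name also sorts by age. The function names free_kb and prune_until_free are made up for this example.

```shell
#!/bin/sh
# Free space on the filesystem holding $1, in 1 KiB blocks.
# df -P guarantees one output line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Delete the oldest snapshot directories under $1 until at least $2 KiB are free.
# Assumption: entries under $1 are named YYYY-MM-DD, so name order is age order.
prune_until_free() {
    root=$1
    need_kb=$2
    while [ "$(free_kb "$root")" -lt "$need_kb" ]; do
        oldest=$(ls -1 "$root" | sort | head -n 1)
        # Stop rather than loop forever when nothing is left to delete.
        [ -n "$oldest" ] || return 1
        rm -rf -- "${root:?}/$oldest"
    done
}
```

For example, prune_until_free /mnt/backup/client1 52428800 would delete old snapshots until about 50 GiB are available (the path and threshold are placeholders).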

Then the file copy begins. This is done using rsync. If there is a previous backup of the relevant client, only changed files will be copied the next time the script runs. Unchanged files will be hard-linked instead; this means that you can delete old backups without affecting more recent ones.
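One common way to get this behaviour from rsync is its --link-dest option, sketched below. Whether the script uses exactly these flags is an assumption on my part, and the snapshot function and the "latest" symlink are conventions invented for this example.

```shell
#!/bin/sh
# snapshot SRC DEST_ROOT DATE
# Copies SRC into DEST_ROOT/DATE, hard-linking files that are unchanged
# since the snapshot that DEST_ROOT/latest points to (if there is one).
snapshot() {
    src=$1
    root=$2
    today=$3
    if [ -d "$root/latest" ]; then
        # A relative --link-dest path is resolved against the destination directory.
        rsync -a --delete --link-dest=../latest "$src" "$root/$today/"
    else
        # First run: nothing to link against, so everything is copied.
        rsync -a --delete "$src" "$root/$today/"
    fi || return 1
    # Repoint the "latest" symlink at the snapshot just taken.
    ln -sfn "$today" "$root/latest"
}
```

A call might look like snapshot root@client1.example.org:/etc/ /mnt/backup/client1 "$(date +%F)", with a hypothetical host and destination. Because unchanged files in two snapshots share the same inode, removing the older directory only drops one link and the newer snapshot keeps its data.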

SSH key setup

There are two ways that rsync can work over a network: using its own rsync protocol, or using SSH. The former is fine for public servers, but you want the latter for authentication and encryption. So that the backup script can run automatically, SSH keys must be set up. Use the normal ssh-keygen -t rsa command to generate a public/private key pair on the backup server. I would suggest using a separate key for the backups and storing it on the encrypted disk along with the backups.
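As a rough sketch, a dedicated key could be generated like this. The make_backup_key function and the backup_rsa file name are hypothetical, and the empty passphrase is an assumption that follows from wanting the job to run unattended; it is also the reason to keep the key on the encrypted disk.

```shell
#!/bin/sh
# make_backup_key DIR: create a dedicated RSA key pair for the backup job in DIR.
make_backup_key() {
    keydir=$1
    mkdir -p "$keydir" && chmod 700 "$keydir" || return 1
    # -N '' sets an empty passphrase so the job can run without a prompt;
    # -C sets the comment shown at the end of the public key line.
    ssh-keygen -q -t rsa -N '' -C 'rsync backups' -f "$keydir/backup_rsa"
}
```

After running something like make_backup_key /mnt/backup/keys (a placeholder path), the public half is in backup_rsa.pub, and rsync can be pointed at the private half with -e "ssh -i /mnt/backup/keys/backup_rsa".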

The public key must be installed on each client being backed up, for the root user. Ordinarily this would be done using the ssh-copy-id command from the backup server to each client, but this would require that root can log in remotely over SSH using a password. This is bad - root should only be able to log in remotely with a key, if at all! Therefore you will probably want to do this setup manually or using a configuration management tool. We will also be restricting the scope of the SSH key so it can't be used for normal login to the backup clients.

Manual setup

Edit the file /root/.ssh/authorized_keys and add the following line. This assumes that your backup server has static IP addresses (the from="..." list in this example starts with 2001:db8::100:123). If it does not, remove the from="..." part, but note that this is less secure.

from="2001:db8::100:123,",command="/usr/local/bin/rrsync -ro /" ssh-rsa AAAA... rsync backups

The command option specifies that the key can only be used for read-only rsync access. You will need to create /usr/local/bin/rrsync by running zcat /usr/share/doc/rsync/scripts/rrsync.gz > /usr/local/bin/rrsync && chmod +x /usr/local/bin/rrsync on Debian-based distros, or on CentOS change the path to /usr/share/doc/rsync-3.0.9/support/rrsync. You can also download it from here. Make sure the file is executable.

Next, ensure that the permissions on authorized_keys are as follows:

-rw------- 1 root root 1149 Dec  6 19:59 /root/.ssh/authorized_keys

The parent directories should be set to 0700 (drwx------).
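If the permissions are not already like this, a small helper along these lines would set them. The secure_ssh_dir name is made up for this example.

```shell
#!/bin/sh
# secure_ssh_dir DIR: apply owner-only permissions to DIR and its authorized_keys.
secure_ssh_dir() {
    sshdir=$1
    chmod 700 "$sshdir" &&
    chmod 600 "$sshdir/authorized_keys"
}
```

On a client you would run secure_ssh_dir /root/.ssh as root and then check the result with ls -l.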

Using Puppet

I use Puppet to manage most of my Linux servers and workstations. One of the resources it has built-in support for managing is SSH keys. The following manifest code will do what we want here.

    ssh_authorized_key {
        'rsync backups':
            ensure => 'present',
            user => 'root',
            type => 'ssh-rsa',
            key => 'AAAA...',
            options => ["from=\"2001:db8::100:123,\"", 'command="/usr/local/bin/rrsync -ro /"'];
    }

Install the script

You can download the script from my GitLab server. You will also need to create a directory called config and then put these files inside it. Update them with the things you want to back up! To download everything, use git:

git clone https://gitlab.xand.uk/xand/rsync-backup.git

Note that there are two versions of the Perl script. The original, archive.pl, will keep backups for a fixed number of days. The newer version, archive-autodf.pl, will aim to keep a certain amount of disk space available instead.

With the script downloaded you may need to adjust paths to suit your installation. You can then run it manually to ensure that it works. You will probably want to do this the first time to ensure that your server's SSH known_hosts file contains the SSH server host keys for all your clients, or they won't be backed up. Once you are happy it works manually, you can create a cron job to run it daily.
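For the cron job, a sketch like the following writes an entry in the /etc/cron.d style, which includes a user field. The write_cron_entry function, the file name and the script path are all placeholders; adjust them to wherever you cloned the repository.

```shell
#!/bin/sh
# write_cron_entry FILE SCRIPT [HOUR]
# Writes a cron.d-style entry that runs SCRIPT as root once a day at HOUR:00.
write_cron_entry() {
    cronfile=$1
    script=$2
    hour=${3:-2}
    # Fields: minute hour day-of-month month day-of-week user command
    printf '0 %s * * * root %s\n' "$hour" "$script" > "$cronfile"
}
```

For example, write_cron_entry /etc/cron.d/rsync-backup /root/rsync-backup/archive-autodf.pl 3 would schedule the script for 03:00 each day.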


Please send me any comments/questions/corrections via Twitter, IRC or email - see my contact page.

© 2021 xand