Creating a Linux Backup Solution on Ubuntu

Until today, my backup solution for my Ubuntu servers was pretty simple: two tar files, one for the web page folders and another for the mailboxes.  They were created with the following commands:

# -f keeps rm quiet if the file is missing (rm reports errors on stderr, not stdout)
rm -f /<location>/daily-www-backup.tgz
rm -f /<location>/daily-mailbox-backup.tgz
tar pzcf /<location>/daily-www-backup.tgz /<location-to-backup> > /dev/null
tar pzcf /<location>/daily-mailbox-backup.tgz /<location-to-backup> > /dev/null

However, there is a problem with this approach.  The commands above first delete the backup files and then re-create them each day.  This is in essence a “full backup” every day, with only the most recent copy kept.  But what happens if a file is overwritten and it isn’t noticed until a few days later?  Whoops!  Too late now: the good copy in the backup has already been replaced as well.

So I began doing a little research on how to perform differential backups in Ubuntu.  I came across “dar” – a program that will do both a full backup and a differential backup based upon the full backup.

I also installed Kdar on the server, which is a KDE GUI front end for dar.  It seems the package was no longer carried in the archives after the Ubuntu Dapper release, so in order to get Kdar on my current systems, I had to add the following line to the bottom of the /etc/apt/sources.list file:

deb http://gb.archive.ubuntu.com/ubuntu dapper universe

I then refreshed the package lists so that this new source was picked up:

sudo apt-get update

Then I installed kdar and dar:

sudo apt-get install kdar dar

Afterwards, I opened Kdar (I had to run it as root in order to back up files that my user account couldn’t read) and set up a backup job.  It gave me the option to export the dar command to a shell script, which looked like the following:

dar -c "/<location>/SundayFullBackup" -R "/home/" -w -D -y -m 150 -P "Folder1" -P "Folder2" -P "Folder3" -P "Folder4"

Here is how the command works:

-c – This option tells dar to create a backup and gives the base name of the archive (dar adds the slice number and extension itself, so the file on disk ends in .1.dar)
-R – This option tells dar the root directory that should be backed up
-w – This option tells dar not to warn when overwriting files
-D – This option tells dar to store excluded directories as empty directories in the backup file (see -P for excluded directories)
-y – This option tells dar to use bzip2 compression (instead of -z, which uses gzip; bzip2 usually compresses better)
-m 150 – This option tells dar not to compress files less than 150 bytes in size
-P – This option tells dar to exclude a directory from the archive.  In my case above (where I’ve changed the folder names), there are four folders that I’ve excluded from the backup

Using the -P option allowed me to back up both the mailboxes and the web data at once, instead of having two backup files and two separate processes.
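
If the list of excluded folders grows, writing out every -P by hand gets tedious.  Here is a minimal sketch of how the exported command could be assembled from a list instead.  The location /path/to/backups, the variable names and the function names are placeholders of mine, not anything Kdar produces, and the folder names are assumed to contain no spaces.  As written it only prints the command; setting DRY_RUN=0 would actually run dar.

```shell
#!/bin/sh
# Sketch only: ARCHIVE_DIR, EXCLUDES and the function names are hypothetical.
ARCHIVE_DIR="/path/to/backups"               # placeholder for /<location>
SOURCE_ROOT="/home/"
EXCLUDES="Folder1 Folder2 Folder3 Folder4"   # names without spaces assumed
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

full_backup() {
    # Build one "-P <folder>" pair per excluded folder.
    set --
    for folder in $EXCLUDES; do
        set -- "$@" -P "$folder"
    done
    run dar -c "$ARCHIVE_DIR/SundayFullBackup" -R "$SOURCE_ROOT" \
        -w -D -y -m 150 "$@"
}

full_backup
```

Adding or removing an excluded folder then only means editing the EXCLUDES line.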

Compared with the tar commands, the dar command alone saved about 17 megabytes of space.  Tar uses gzip compression when given the “z” option, and the two tar files combined came to 880 megabytes.  The one file made by dar is 863 megabytes.  While this isn’t much of a saving, it is still an improvement over tar.

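Another improvement over tar (and the main reason I installed the Kdar GUI) is how easily you can browse a dar backup file and extract specific files and folders from it.  Tar can also extract individual files if you give it their paths, but a gzipped tar file has no index, so tar has to read through the archive from the start to find them.  Dar stores a catalogue of the archive’s contents, so it can go directly to the files you ask for.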
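
The same pick-and-choose restore is possible without the GUI.  A minimal sketch, assuming an archive made with the full backup command above: -l lists what the archive contains, and -x extracts from it, with -R naming the directory to restore into and -g selecting a path (relative to the backed-up root) to restore.  /path/to/backups, /tmp/restore, the example file path and the function names are placeholders of mine.  As written it only prints the commands; setting DRY_RUN=0 would execute them.

```shell
#!/bin/sh
# Sketch only: the paths and function names below are hypothetical placeholders.
ARCHIVE="/path/to/backups/SundayFullBackup"   # base name, without .1.dar
RESTORE_DIR="/tmp/restore"                    # where restored files should land
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List everything stored in the archive.
list_archive() {
    run dar -l "$ARCHIVE"
}

# Restore one file or folder; $1 is a path relative to the backed-up root.
restore_path() {
    run dar -x "$ARCHIVE" -R "$RESTORE_DIR" -g "$1"
}

list_archive
restore_path "someuser/www/index.html"
```

Restoring into a scratch directory first, rather than straight over /home/, makes it easy to inspect the recovered file before putting it back.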

Now, how do you create a differential backup?  Let me show you the command that Kdar made to create one:

dar -v -c "/<location>/MondayDiffBackup" -R "/home/" -A "/<location>/SundayFullBackup" -w -D -y -m 150 -P "Folder1" -P "Folder2" -P "Folder3" -P "Folder4"

While the command looks quite similar to the full backup command, it has two extra options that I’ll go over here.

-v – Verbose output.  Each time the differential is run, this lists the files that have changed since the last full backup.
-A – This option tells dar the base name of the full backup that the differential should be based on.  This is how dar can tell which files have been changed or added and need to be stored in the new archive.
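
Since the full and differential commands differ only in the archive name and the -v and -A options, one script can cover the whole week.  The following is a minimal sketch under my own assumptions: a full backup on Sunday and a differential named after the weekday on every other day, with ARCHIVE_DIR and the function names being placeholders rather than anything Kdar exports.  It only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: ARCHIVE_DIR and the function names are hypothetical.
ARCHIVE_DIR="/path/to/backups"   # placeholder for /<location>
SOURCE_ROOT="/home/"
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 is an English weekday name such as "Sunday" or "Monday".
daily_backup() {
    day=$1
    full="$ARCHIVE_DIR/SundayFullBackup"
    if [ "$day" = "Sunday" ]; then
        # Full backup: no reference archive.
        set -- -c "$full"
    else
        # Differential: -A points at the base name of Sunday's full backup.
        set -- -v -c "$ARCHIVE_DIR/${day}DiffBackup" -A "$full"
    fi
    run dar "$@" -R "$SOURCE_ROOT" -w -D -y -m 150 \
        -P "Folder1" -P "Folder2" -P "Folder3" -P "Folder4"
}

# LC_ALL=C keeps the weekday name in English regardless of the system locale.
daily_backup "$(LC_ALL=C date +%A)"
```

Because of -w, next Monday's run silently replaces this Monday's differential, so at most a week of changes is kept.  Saved under a made-up name such as /usr/local/bin/dar-backup.sh, the script could be started from root's crontab with a line like: 30 2 * * * DRY_RUN=0 /usr/local/bin/dar-backup.sh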

That is all there is to it!  However, I had a problem when trying to run a differential.  Since I am going to set these all up as cron jobs, I needed them to run without any intervention.  The full backup worked fine when I ran the shell script, but unfortunately the differential backups would not.  I kept getting this message when I would try the differential backup:

Warning, SundayFullBackup.1.dar seems more to be a slice name than a base name. Do you want to replace it by SundayFullBackup ? [return = OK | Esc = cancel]

I searched online and could not find a solution to the problem.

Originally, I had created the full backup using Kdar, so I wondered whether Kdar did something different when it made that file.  I therefore deleted the original full backup and re-created it using the shell script that Kdar had exported.  When I then ran the differential backup – poof!  It worked well, without any intervention required.
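
The wording of the warning hints at one possible cause.  Dar names the files it writes <base name>.<slice number>.dar, and -A expects the base name alone, so the prompt appears when the reference is given as SundayFullBackup.1.dar.  Whether that is what happened with the Kdar-created backup is an assumption; it was not confirmed here.  If a script might receive either form, a small helper (dar_basename is a made-up name, not part of dar) could normalize it before calling dar:

```shell
#!/bin/sh
# Sketch only: dar_basename is a hypothetical helper, not part of dar.

# Strip a trailing ".<number>.dar" slice suffix, if there is one.
dar_basename() {
    printf '%s\n' "$1" | sed 's/\.[0-9][0-9]*\.dar$//'
}

# Both calls print /path/to/backups/SundayFullBackup
dar_basename "/path/to/backups/SundayFullBackup.1.dar"
dar_basename "/path/to/backups/SundayFullBackup"
```

The result can then be passed to -A, for example -A "$(dar_basename "$reference")", where reference is whatever name the script was handed.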

So, a good solution for performing differential backups in Linux is a combination of dar and Kdar.  Kdar is best used only as the restore program, so you can pick and choose which files you want to restore, and dar is needed as the command-line program so you can create cron jobs and have the backups run unattended.