BackupPC
From Smop.co.uk
I've used three open source backup products and I'm afraid I'm not very happy with any of them:
- amanda - one tape per day philosophy - that's enough to make it unusable for me
- bacula - better than amanda, have had various issues (tape related normally)
  - better admin tools out now for this
- backuppc - massively hardlinked tree causes problems with many/large hosts
What I'd like is:
- easy to use for restores (backuppc GUI is good in that regard, new bacula interfaces look good too)
- database for saving metadata (massively hardlinked trees cause issues)
  - how do you replicate the backup to another server?
- de-duplication - backuppc does this
- binary diffs - particularly of log files
- retention policy
  - with bacula you can only purge a whole job
  - with backuppc you have to remove it yourself from the tree and await the nightly cleanup
  - I want to be able to say which files to back up, for how long, and which copies to keep
  - I might want some files backed up each hour, others each day, others each week
  - for some files I just want the most recent copy, others I might need to keep for seven years
- hierarchical storage - e.g. disk, tape, offsite
  - preferably with higher compression and/or binary diff mechanisms.
BackupPC setup
- apt-get install backuppc rsync libfile-rsyncp-perl
- set up passwordless ssh on the backup server
  - the "backuppc" user needs to be able to log in to clients
  - I normally log in as "backup" and grant sudo access to rsync (NOPASSWD:)
- add rsync details to /etc/backuppc/config.pl:
$Conf{XferMethod} = 'rsync';
$Conf{RsyncClientCmd} = '$sshPath -q -x -l backup $host sudo $rsyncPath $argList+';
$Conf{RsyncClientRestoreCmd} = '$sshPath -q -x -l backup $host sudo $rsyncPath $argList+';
- tell backuppc when to run:
$Conf{WakeupSchedule} = [2.5,3..6]; # first number = nightly admin task
$Conf{MaxBackupPCNightlyJobs} = 2; # run two admin tasks in parallel
$Conf{BackupPCNightlyPeriod} = 1; # go through pool every X days (1,2,4,8,16)
$Conf{MaxBackups} = 4; # how many machines to backup in parallel
$Conf{BlackoutGoodCnt} = 0; # 0 = every host is subject to the blackout periods
- and how long to keep backups for:
$Conf{FullKeepCnt} = [4, 2, 3]; # keep x full backups at intervals of (2^position * $Conf{FullPeriod})
# e.g. [2, 3, 4] with weekly fulls means keeping weeks (1,2),(4,6,8),(12,16,20,24)
# Very old full backups are removed after $Conf{FullAgeMax} days. However,
# we keep at least $Conf{FullKeepCntMin} full backups no matter how old
# they are.
$Conf{FullKeepCntMin} = 1;
$Conf{FullAgeMax} = 90;
$Conf{IncrKeepCnt} = 6; # how many incrementals to keep
$Conf{IncrKeepCntMin} = 1; # minimum number of incrementals to keep
$Conf{IncrAgeMax} = 30; # maximum age of incrementals
# how long to keep partial backups around before binning them
# these only occur when a full backup fails half way through
$Conf{PartialAgeMax} = 3;
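To make the FullKeepCnt arithmetic concrete, here is a minimal Python sketch that lists the approximate ages at which full backups are kept. It is not part of BackupPC: the function name `full_backup_ages` and its `period` argument are made up for illustration, and it only models the doubling intervals, ignoring FullAgeMax and FullKeepCntMin.

```python
def full_backup_ages(keep_counts, period=1):
    """Approximate ages (in units of `period`) of the full backups kept.

    keep_counts mirrors $Conf{FullKeepCnt}: entry i keeps that many
    backups spaced 2**i periods apart.
    """
    ages = []
    age = 0
    for position, count in enumerate(keep_counts):
        interval = (2 ** position) * period
        for _ in range(count):
            age += interval
            ages.append(age)
    return ages


print(full_backup_ages([2, 3, 4]))  # the example from the config comment
print(full_backup_ages([4, 2, 3]))  # the value set in config.pl
```

For the commented example this prints [1, 2, 4, 6, 8, 12, 16, 20, 24], which matches the weeks listed in the comment; for the configured [4, 2, 3] it prints [1, 2, 3, 4, 6, 8, 12, 16, 20].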
BackupPC client config
Since you've set up ssh and sudo already, we just need to tell backuppc about the client.
There are some gotchas with backing up via rsync, so I find it easiest to define common settings in /etc/backuppc/common.pl and then override them as required.
- first add "hostname 1 backuppc" to /etc/backuppc/hosts.
- set up /etc/backuppc/common.pl - note the WARNING notes!
# Common host settings - must be explicitly sourced by each host.pl file
# We have to set this here as the push below just appends
# to an empty list otherwise
$Conf{RsyncArgs} = [
#
# Do not edit these!
#
'--numeric-ids',
'--perms',
'--owner',
'--group',
'-D',
'--links',
'--hard-links',
'--times',
'--block-size=2048',
'--recursive',
'-C',
#
# Rsync >= 2.6.3 supports the --checksum-seed option
# which allows rsync checksum caching on the server.
# Uncomment this to enable rsync checksum caching if
# you have a recent client rsync version and you want
# to enable checksum caching.
#
#'--checksum-seed=32761',
#
# Add additional arguments here
#
];
##### WARNING #####
# rsync matches as it goes, so given "+ /a/b/c, - /a/b"
# it never looks inside /a/b and hence /a/b/c is excluded!
#
# For this reason, BackupFilesOnly has unexpected behaviour.
# e.g. BackupFilesOnly "/var/cache/debconf" adds
# excludes for /*, /var/*, /var/cache/*
#
# To backup /var/cache/debconf:
# add '/var' to $Conf{BackupFilesOnly}
# add '--include=/var/cache/debconf' to $Conf{RsyncArgs}
# add '/var/cache/*' to $Conf{BackupFilesExclude}
$Conf{BackupFilesOnly} = ['/etc/','/boot/grub','/root/','/home',
'/opt', '/usr/local', '/var/'];
# Need dummy first argument since it's dropped for some reason
push @{$Conf{RsyncArgs}}, '',
'--include=/var/cache/debconf';
$Conf{BackupFilesExclude} = [
'/var/db/nscd', '/var/cache/*','/var/lib/apt','/var/lib/dpkg/info',
'/var/lock', '/var/log', '/var/run', '/var/tmp'];
- finally import this into /etc/backuppc/hostname.pl and override as required:
# get common host settings
do "/etc/backuppc/common.pl";
# must exclude the BackupPC data directory! (main backup server only)
push @{$Conf{BackupFilesExclude}}, '/var/lib/backuppc';
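The WARNING in common.pl is easier to follow with a toy model. The Python sketch below is a deliberately simplified stand-in for rsync's filter logic, not rsync itself: it assumes anchored patterns, lets `*` match within a single path component only, applies the first matching rule, and stops descending once a directory is excluded. The function names, the example file names and the rule ordering in the second example are assumptions for illustration; the order in which BackupPC actually passes the options may differ.

```python
import re


def rule_matches(pattern, path):
    """True if an anchored pattern matches path; '*' stays within one component."""
    regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, path) is not None


def is_transferred(path, rules):
    """Walk from the top directory down to path, level by level.

    rules is a list of (sign, pattern) pairs, sign being '+' or '-'.
    At each level the first matching rule wins; no match means include.
    An excluded directory is never entered, so nothing below it is seen.
    """
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        current = "/" + "/".join(parts[:depth])
        for sign, pattern in rules:
            if rule_matches(pattern, current):
                if sign == "-":
                    return False
                break  # included at this level, go one level deeper
    return True


# The trap from the warning: /a/b is excluded before /a/b/c is ever considered.
trap = [("+", "/a/b/c"), ("-", "/a/b")]
print(is_transferred("/a/b/c", trap))

# The debconf workaround, in an assumed order (explicit include first).
workaround = [
    ("+", "/var/cache/debconf"),  # extra --include pushed onto RsyncArgs
    ("+", "/var"),                # from BackupFilesOnly
    ("-", "/var/cache/*"),        # from BackupFilesExclude
    ("-", "/*"),                  # everything else at the top level
]
print(is_transferred("/var/cache/debconf/config.dat", workaround))  # hypothetical file
print(is_transferred("/var/cache/apt/pkgcache.bin", workaround))    # hypothetical file
```

In this model the first call reports that /a/b/c is not transferred, while the workaround keeps files under /var/cache/debconf and still drops the rest of /var/cache.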
Useful scripts
I've written some scripts that I use to check backup disk usage:
Media:Backuppc_stats
backuppc_stats -a|-h hostname [-i|-f]
Reports backup stats for all hosts or a single host. It reports the last five incrementals (-i, the default) or the last five full backups (-f).
Sample usage (backuppc_stats -a -f):
Mins  Files  Size  Comp
   5    314    24     6  ash
   6    658    54    11  ash
   3     76     7     2  ash
Mins  Files  Size  Comp
   0     30     0     0  ripley
   0     88    28    24  ripley
   0     17     0     0  ripley
Media:Backuppc_new
backuppc_new -a|-h hostname [-s size] [-b last|full|incr|number]
Reports new files above a given size (default=1048576) in a given backup (defaults to "last").
Sample output (backuppc_new -h vasquez):
vasquez
=========
create 644 106/110 1453401 var/backups/clamav/phish.ndb
create 644 106/110 1502482 var/backups/clamav/scam.ndb
create 600 107/111 5218304 var/lib/amavis/.spamassassin/bayes_seen
create 660 102/106 5242880 var/lib/mysql/ib_logfile0
create 600 107/111 5607424 var/lib/amavis/.spamassassin/bayes_toks
create 660 102/106 8541184 var/lib/mysql/gld/greylist.MYI
create 600 107/111 10604544 var/lib/amavis/.spamassassin/auto-whitelist
create 660 102/106 18874368 var/lib/mysql/ibdata1
create 660 102/106 56762937 var/lib/mysql/gld/greylist.MYD
Media:Backuppc_ls
backuppc_ls -a|-h hostname path
returns a list of matching files for the given hosts
Sample output (backuppc_ls -a "/var/lib/mysql/gld/grey*"):
/var/lib/backuppc/pc/vasquez/112/f%2f/fvar/flib/fmysql/fgld/fgreylist.MYD
/var/lib/backuppc/pc/vasquez/112/f%2f/fvar/flib/fmysql/fgld/fgreylist.MYI
/var/lib/backuppc/pc/vasquez/112/f%2f/fvar/flib/fmysql/fgld/fgreylist.frm
/var/lib/backuppc/pc/vasquez/124/f%2f/fvar/flib/fmysql/fgld/fgreylist.MYD
/var/lib/backuppc/pc/vasquez/124/f%2f/fvar/flib/fmysql/fgld/fgreylist.MYI
/var/lib/backuppc/pc/vasquez/124/f%2f/fvar/flib/fmysql/fgld/fgreylist.frm
I normally feed this to a script using xargs.
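The paths in the backuppc_ls listing use BackupPC's on-disk name mangling: judging by the sample output, each path component gets an `f` prefix and the share name `/` is stored as `f%2f`. A minimal Python sketch of that mapping follows. The function names are hypothetical, and escaping `%` as `%25` is an assumption, since the sample only demonstrates the `/` case.

```python
def mangle_component(name):
    """Prefix one path component with 'f', escaping '%' and '/' as %xx."""
    escaped = name.replace("%", "%25").replace("/", "%2f")
    return "f" + escaped


def mangle_path(share, path):
    """Map a share name and file path to the name stored under pc/<host>/<n>/."""
    components = [c for c in path.split("/") if c]
    mangled = [mangle_component(share)] + [mangle_component(c) for c in components]
    return "/".join(mangled)


def unmangle_path(mangled):
    """Reverse mangle_path, returning (share, path)."""
    def unmangle(component):
        name = component[1:]  # drop the leading 'f'
        return name.replace("%2f", "/").replace("%25", "%")

    share, *rest = mangled.split("/")
    return unmangle(share), "/" + "/".join(unmangle(c) for c in rest)


print(mangle_path("/", "/var/lib/mysql/gld/greylist.MYD"))
print(unmangle_path("f%2f/fvar/flib/fmysql/fgld/fgreylist.MYD"))
```

The first call prints f%2f/fvar/flib/fmysql/fgld/fgreylist.MYD, which is the tail of the paths in the listing; the second turns it back into the share `/` and the original path.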
Media:Backuppc_prune
This script removes selected old files from backuppc.
I use it to drop large files (e.g. database backups) sooner than small files, which I can keep for longer.
The files to delete are currently listed at the bottom of the script, but they should be moved into a configuration file.
