Backups, or backing up the server

There are quite a lot of ready-made solutions for making backups. However, they are all elaborate, too complex, or too cumbersome to install. After some browsing around for Linux backup solutions, I found the following useful websites:

These sites pointed me in the direction of using shell and perl scripts, to create a customized backup solution. As it happens, there is an excellent “quick guide to writing scripts using the bash shell”: http://pegasus.rutgers.edu/~elflord/unix/bash-tute.html.

The author mentions all you need to know (for now) about shell scripts, except for the (trivial) fact that shell scripts usually have the extension: *.sh. Also, do not forget to chmod your shell scripts to be executable.

But before we delve any deeper into shell scripts, here is what I wanted in the first place.

Requirements

  1. The backup process itself does not have to run quickly, but the resulting backup file must be small.
  2. The backup process must run fully automatically.
  3. All databases and all of my customers' directories (including mail folders) must be copied.
  4. The backed up data must be encrypted because it is sent to a server that is not under our control, over an insecure channel (ftp).
  5. There is a fourteen day backup window which forever moves forward (i.e. the most recent backup is one day old, the oldest backup is always fourteen days old).
  6. If anything in the backup process fails, I want to be warned by e-mail.

The first requirement means that nearly every backup is incremental: only changed files are part of the current backup. For instance, we could have a full backup every week, or every two weeks, and incremental backups in between.

Based on the advice from http://www.linux-backup.net/Strategy/Incremental/, we will have the incremental backups span the period between the last full backup and the current day. So if we have, for instance, made a full backup on Sunday and today is Thursday, then today's incremental backup will register every change made since Sunday, not just every change made since the last daily incremental backup (which would be Wednesday).

Backup strategy

The required backup strategy is now the following.

  • A full backup is made every Sunday.
  • The full backup files are rotated every four weeks: this ensures that we always have full backups for the past four weeks.
  • Full backup files are named: solin{01,..,n}_fullbackup_week{1,..,4}.tar.gz.nc. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. Note that years with 53 weeks will give rise to a 5th week number. This is because normally week number 53 would be computed into 'week1', which would subsequently be overwritten by 'week1' of the new year.
  • A daily incremental backup includes everything from the last Monday up to and including the current weekday (there is no incremental backup on Sundays).
  • The incremental backups are rotated every four weeks. This ensures that we always have a backup for each of the last 28 days.
  • Incremental backup files are named: solin{01,..,n}_incrbackup_week{1,..,4}_day{1,..,6}.tar.gz.nc. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. The last number stands for the weekday (Monday is day 1). Note that the week number is computed in the same way as the one used for naming the full backup file, so you can always immediately see to which full backup an incremental backup belongs.

This strategy actually exceeds the requirements mentioned above, because now we aim for a 28 day window, instead of a fourteen day one.

Point of Attention

If you're selecting a hosting company for storage space (you know, the target of your ftp transfers), take this into consideration: ftp transfer speed. I've seen hosts offer 1500 GB of webspace. Sounds great, until you read the fine print, which states “ftp transfer rates: 100kbps”. Use a bandwidth conversion tool to figure out how long it will take to fill the entire 1500 GB, using a 100kbps upload speed. Hint: it's not gonna take you hours, days, or even months, it's gonna take years!

So, if you want to move Gigabytes, start looking for a host which offers upload speeds in the megabits, instead of kilobits.
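
If you do not have a conversion tool at hand, the arithmetic is easy to sketch in bash. The helper name upload_days is made up for this example, and I am assuming that “100kbps” means 100 kilobits per second and that a gigabyte is 10^9 bytes. For 1500 GB at 100 kbps it prints 1388 (days), which is roughly 3.8 years.

```shell
#!/bin/bash
# Estimate how many whole days an upload takes.
# Arguments: size in gigabytes (assumed 10^9 bytes) and
# speed in kilobits per second (assumed 10^3 bits per second).
upload_days() {
    local gigabytes=$1 kbps=$2
    # bytes -> bits (times 8), divided by bits per second gives seconds
    local seconds=$(( gigabytes * 1000000000 * 8 / (kbps * 1000) ))
    echo $(( seconds / 86400 ))
}

upload_days 1500 100   # 1500 GB at 100 kbps
```

The same function shows that a 1 megabit line (1000 kbps) brings this down to 138 days, still slow, but no longer years.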

Pieces of the Backup Puzzle: Various Scripts and Commands

I have rented 10 Gigabytes of storage space on an ftp-server from the same company as the one where the webserver itself is hosted (Flexservers), specifically for making backups.

The scripts and commands we are about to examine are all part of the backup solution.

Using Bash Shell scripts to ftp

Derived from the website http://arul.telenet-systems.com/info/fileTransfer.php#script, here is an example of a script which does an ftp session.

#!/bin/bash
 
## -n    Restrains ftp from attempting auto-login
HOST='hostname'
USER='username'
PASSWD='password'
 
ftp -i -n $HOST <<FTPSession
user ${USER} ${PASSWD}
binary 
put archivedfiles.zip
bye
FTPSession
echo "FTP finished"

The interesting part is in the ftp command: the -i option turns off the interactive prompting, making it possible to do an ftp session in a shell script.

Once we are on the prompt of the ftp client, we can also use a pipe to stream data to the ftp server:

put "| tar -zcp /home/onno/test " myfile.tar.gz

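Here, we use the pipe symbol | to have the put command read the output of a command instead of a local file. Note that it is crucial not to name an archive file with the -f option when invoking the tar command, because -f followed by a file name tells tar to write to that file, instead of outputting the archived data to standard output. If your tar build does not write to standard output by default, -f - selects it explicitly. (See also http://www.rrzn.uni-hannover.de/fileadmin/kurse/material/UnixAK/UnixAK.pdf)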
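
To try the streaming idea without an ftp server, here is a minimal sketch that pipes the archive into a second tar, which lists it. The function name stream_and_list and the sample file are made up. Note that I pass -f - to tar, which names standard output (or standard input when reading) explicitly; that is an assumption on my part, for tar builds whose default archive is not standard output.

```shell
#!/bin/bash
# Stream a gzipped archive to standard output and list it straight from the pipe.
# No archive file is ever written to disk.
stream_and_list() {
    local dir=$1
    # -C changes into the parent directory first, so member names stay short
    tar -zcpf - -C "$(dirname "$dir")" "$(basename "$dir")" | tar -tzf -
}

# Demo on a throwaway directory (hypothetical file name).
demo=$(mktemp -d)
echo "hello" > "$demo/sample.txt"
stream_and_list "$demo"
rm -rf "$demo"
```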

N.B.: the << construct in the script is a shell “here-document”: everything between the line containing <<FTPSession and the line consisting of only FTPSession is fed to the ftp command as its standard input. The delimiter itself is arbitrary, so in the example above 'FTPSession' is just a random string.
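
A small sketch makes the mechanism visible: here cat stands in for ftp, so the lines that would be sent to the ftp client are simply printed. The function name show_session, the delimiter EndOfCommands and the password 'secret' are made up for the illustration.

```shell
#!/bin/bash
# The word after << is an arbitrary delimiter; the input ends at the line that
# consists of exactly that word. Variables are expanded inside the document.
show_session() {
    local user=$1
    cat <<EndOfCommands
user ${user} secret
binary
bye
EndOfCommands
}

show_session demo
```

Running it prints the three lines between the delimiters, with ${user} replaced by demo.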

Warning: the ftp utility severely limits the maximum number of characters for its “put” command! So, if you ever see the error “sorry, input line too long”, it's the ftp utility, not the shell, which generates it.

A script for archiving multiple directories at once using tar

The following script archives and gzips a number of directories to the same tar file.

#!/bin/bash
 
# Create backups of /etc, /home, /usr/local, and...
PATH=/bin:/usr/bin
 
backupdirs="/home/dopperdude"
# tar: c = create, z = zip, p = same permissions, f = file 
for path in $backupdirs
do
    echo "System backup on $path"
    tar -zcpf /backup/test2.tar.gz $path    
done
 
echo "System backups complete, status: $?"

Let's see what happens if we run this script:

[root@1038 onno]# ./backup.sh
System backup on /home/dopperdude
tar: Removing leading `/' from member names
System backups complete, status: 0

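Great, but let's take another look at the tar manual. This script, found on the web, needs pruning! With more than one directory in the list, every pass through the loop would overwrite /backup/test2.tar.gz, so only the last directory would end up in the archive. The intended result can be achieved in a single tar command, even with multiple directories: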

tar -zcpf /backup/test2.tar.gz /home/dopperdude /home/onno /etc

Using the mysqldump utility

mysqldump -uUser -pPassw --flush-logs --lock-tables --all-databases > backtest.sql

From the online MySQL manual (http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html):

--flush-logs, -F

Flush the MySQL server log files before starting the dump. This option requires the RELOAD privilege. Note that if you use this option in combination with the --all-databases (or -A) option, the logs are flushed for each database dumped. The exception is when using --lock-all-tables or --master-data: In this case, the logs are flushed only once, corresponding to the moment that all tables are locked. If you want your dump and the log flush to happen at exactly the same moment, you should use --flush-logs together with either --lock-all-tables or --master-data.

--lock-tables, -l

Lock all tables before starting the dump. The tables are locked with READ LOCAL to allow concurrent inserts in the case of MyISAM tables. For transactional tables such as InnoDB and BDB, --single-transaction is a much better option, because it does not need to lock the tables at all.

Please note that when dumping multiple databases, --lock-tables locks tables for each database separately. So, this option does not guarantee that the tables in the dump file are logically consistent between databases. Tables in different databases may be dumped in completely different states.

Encrypting the archived file

The mcrypt command is a simple to use encryption tool. It uses a passphrase to encrypt any kind of file. Normally, you are prompted twice for the passphrase, but we will store the key in a file, since we are going to use mcrypt from inside a shell script, eventually:

[onno@1038 onno]$ mcrypt -f encryption_key -u data.txt
File data.txt was encrypted.

Here, the -f argument points to a file encryption_key which contains the passphrase. data.txt is the file to be encrypted. The -u argument deletes (“unlinks”) the original file data.txt. The encrypted contents of the file are stored in a new file, called original_filename.nc; in our example: data.txt.nc.

Using the Linux date command in a shell script

Setting up a backup regime has everything to do with archiving files based on their timestamps. This means it's important to know the current date. But not only that, you will also want to know things like:

  • What is the current week number for the complete year?
  • Which is the current week number in a range of four weeks?

The Linux date command comes with a lot of options, one of which answers our first question:

[root@1038 root]# date +%V
48

From the manual:

%V     week number of year with Monday as first day of week (01..53)

Now we know enough to make a script to answer the second question: is this a 1st, 2nd, 3rd or 4th week?

#!/bin/bash
 
WEEK_IN_YEAR_NUMBER=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
 
#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
	WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
	REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
	if [ $REMAINDER -eq 0 ]
	then
		WEEKNUMBER=4
	else
		WEEKNUMBER=$REMAINDER
	fi
fi
 
echo "WEEKNUMBER: $WEEKNUMBER "

The date command on Linux also has a --date option (which does not work under Unix, according to some sources). It prints the week number of the date one week ago, from which last week's number in a range of four can be computed:

date +%V --date='7 day ago'

Last Week's Number

We need the information derived from date, because incremental backups will be made for each full backup in the week after the full backup. And we want to name the incremental backups in the same fashion, i.e. with the same week number as the full backup.

So, here's the script to compute the correct week number for last week:

#!/bin/bash
 
## Script for incremental backup
 
# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full 
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week. 
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made 
WEEK_IN_YEAR_NUMBER=`date +%V --date='7 day ago'`
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.
 
#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
	WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
	REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
	if [ $REMAINDER -eq 0 ]
	then
		WEEKNUMBER=4
	else
		WEEKNUMBER=$REMAINDER
	fi
fi
 
echo "WEEKNUMBER: $WEEKNUMBER "
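
Both scripts above repeat the same computation, so it can be folded into one function. What follows is only a sketch: window_week is a hypothetical helper name, and the second argument (the window size in weeks) generalizes the hard-coded 4. The 10# prefix forces base 10, because bash arithmetic would otherwise read week numbers such as 08 and 09 as invalid octal numbers.

```shell
#!/bin/bash
# Map a week number of the year (01..53) onto a slot in a backup window of N weeks.
# window_week is a hypothetical helper; it is not part of the original scripts.
window_week() {
    local week=$(( 10#$1 ))   # force base 10 so "08" and "09" are accepted
    local window=$2
    if [ "$week" -eq 53 ]; then
        echo $(( window + 1 ))      # extra slot, so week 1 of the new year does not overwrite it
    elif [ $(( week % window )) -eq 0 ]; then
        echo "$window"              # a remainder of 0 means the last slot of the window
    else
        echo $(( week % window ))
    fi
}

window_week "$(date +%V)" 4
```

For week 48 and a window of 4 this gives 4, for week 09 it gives 1, and for week 53 it gives 5.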

Passing arguments and options to a shell script

This website: http://aplawrence.com/Unix/getopts.html contains some very useful information on passing options and arguments to a script. Here's a very flexible example script, which will take the required arguments in no specific order.

#!/bin/bash
# $# contains the number of arguments
if [ $# -eq 0 ]
     then
         echo 'Usage: -a -b -c file'
         exit 1
fi
args=`getopt abc: $*`
# $? contains the exit status of the last command, so it must be
# checked directly after the call to getopt
if [ $? != 0 ]
     then
         echo 'Usage: -a -b -c file'
         exit 1
fi
 
set -- $args
if [ $? != 0 ] 
then
	echo 'Usage: -a -b -c file'
	exit 1
fi
 
for i
do
  case "$i" in
        -c) shift;echo "flag c set to $1";shift;;
        -a) shift;echo "flag a set";;
        -b) shift;echo "flag b set";;
  esac
done

Please refer to the explanation of getopt on the aforementioned website, or see the manual.
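
As an aside, bash also has a builtin called getopts (note the extra s), which avoids the external getopt program and the shift bookkeeping. This is a different technique from the one used in the scripts on this page; the sketch below only shows the same three options parsed that way. The function name parse_flags and the file name notes.txt are made up.

```shell
#!/bin/bash
# Same options as the example above (-a, -b, -c file), parsed with the bash
# builtin getopts instead of the external getopt program.
parse_flags() {
    local OPTIND=1 opt            # reset the option index for every call
    while getopts "abc:" opt; do
        case "$opt" in
            a) echo "flag a set" ;;
            b) echo "flag b set" ;;
            c) echo "flag c set to $OPTARG" ;;   # OPTARG holds the value given to -c
            *) echo "Usage: -a -b -c file"; return 1 ;;
        esac
    done
}

parse_flags -c notes.txt -a
```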

Putting it all together: the Backup Solution

The backup solution consists of three scripts. All three are bash shell scripts, located in /backups: backup.sh, fullbackup.sh and incrbackup.sh. These scripts use the following tools:

tar

[root@1038 root]# tar --version
tar (GNU tar) 1.13.25

To see the contents of a zipped tar file, do:

tar -tzvf example.tar.gz

Explanation:

t = list (display contents)

z = unzip

v = verbose

f = file (meaning: we are going to use a tar file, as opposed to e.g. a pipe stream)

Use the command to check if your backup files are really what you want them to be.

mcrypt

[root@1038 root]# mcrypt --version
Mcrypt v.2.6.4 (i686-pc-linux-gnu)
Linked against libmcrypt v.2.5.7
Copyright (C) 1998-2002 Nikos Mavroyanopoulos (nmav@gnutls.org)

gzip

[root@1038 root]# gzip --version
gzip 1.3.3
(2002-03-08)
Copyright 2002 Free Software Foundation
Copyright 1992-1993 Jean-loup Gailly

The main script

The main script, backup.sh, calls the other two scripts. It also takes obligatory arguments and option values.

#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
 
## TO DO: if backup was successful, store current date in a log file. 
## If the script is called again, check if the current date is greater
## than the logdate.
 
EMAIL_FROM="o.schuit@solin.nl"
EMAIL_TO="o.schuit@solin.nl"
 
usageQuit()
{
  echo "Usage: $0 -d weekday -w window"
  echo "-d weekday: (1-7) Number specifying the day in the week on which the full backup must be made (Monday is 1)"
  echo "-w ({2,4}) Number specifying the span of the backup window in weeks. Example: 4 makes four full backups for a whole month: one every week. During the backup window after week 52 this system only works correctly for window size = 2 or 4."
  echo "** Warning: this utility makes incremental backups in between the weekly full backups."
  echo "This means that the actual backup span is the number of weeks specified as the window MINUS 1!"
  echo -e "subject: Improper call of Backup script\nThe backup script was improperly called. Please specify all required arguments and option values." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
  exit 1
}
 
# $# contains number of arguments
if [ $# -eq 0 ] 
then
	usageQuit # no arguments
fi
 
args=`getopt d:w: $*`
# $? contains exit status or return code of last command
if [ $? != 0 ]
then
	usageQuit # not the right arguments
fi
 
set -- $args
if [ $? != 0 ]
then
	usageQuit # not the right arguments or options
fi
 
for i
do
  case "$i" in
        -d) shift; FULLBAK_DAY=$1; shift;;     
        -w) shift; WINDOW=$1; shift;;     
  esac
done
 
 
if [ $FULLBAK_DAY -lt 1 ] || [ $FULLBAK_DAY -gt 7 ]
then
	usageQuit # invalid weekday
fi
 
# %u     day of week (1..7);  1 represents Monday
if [ `date +%u` = "$FULLBAK_DAY" ]
then		
	/backups/fullbackup.sh $FULLBAK_DAY $WINDOW	
else	
	/backups/incrbackup.sh  $FULLBAK_DAY $WINDOW
fi

The script validates the option values that come with each argument and then determines which particular backup script to call. That's all this script really does.

The Full Backup script

The fullbackup.sh script performs the weekly full backup.

#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
 
## Script for full backup
 
BACKUPDIRS="/home /var/www/awstats /www/logs /backups/mysql"
 
#External script parameters:
FULLBAK_DAY=$1
WINDOW=$2
 
WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
 
MYSQL_USER="user_name"
MYSQL_PASS="password"
 
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'
 
 
#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
if [ $WEEK_NOW -eq 53 ]
then
	WINDOW_WEEK=`expr $WINDOW + 1`  # 53rd week gets saved as week(window + 1), so week1 of the new year will never overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])	
else
	REMAINDER=`expr $WEEK_NOW % $WINDOW`
	if [ $REMAINDER -eq 0 ]
	then
		WINDOW_WEEK=$WINDOW
	else
		WINDOW_WEEK=$REMAINDER
	fi
fi
 
 
FILENAME="solin01_fullbackup_week${WINDOW_WEEK}"
 
mysqldump -u${MYSQL_USER} -p${MYSQL_PASS} --flush-logs --all-databases > /backups/mysql/mysql_fullbackup.sql
# $?        Exit status or return code of last command; save it before it is overwritten
STATUS=$?
if [ $STATUS != 0 ] 
then	
	echo -e "subject: Dump of MySQL databases failed\nThere has been a problem while backing up the MySQL databases for the Full Backup. Exit code was ${STATUS}." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
# delete MySQL binary log files (these are incremental backups)
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "RESET MASTER;"
 
 
# Create backups
 
# tar -z: gzip the archive
# tar -c: create new archive (overwrites old archive with same name)
# tar -p, --same-permissions, --preserve-permissions
# mcrypt -f: use keys file
# ftp -n:    Restrains ftp from attempting auto-login
 
ftp -i -n $FTP_HOST <<FTPSession
user ${FTP_USER} ${FTP_PASSWD} 
binary
put "| tar -zcp  ${BACKUPDIRS} | mcrypt -f /backups/encryption_key " /backups/${FILENAME}.tar.gz.nc
bye
FTPSession
STATUS=$? # exit status of the ftp command
if [ $STATUS != 0 ] 
then
	PROBLEM="There has been a problem with the Full Backup script. The FTP session did not succeed. Exit code was ${STATUS}."
	echo "${PROBLEM}"
	echo -e "subject: FTP of full backup files failed\n${PROBLEM}" | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
echo "Full backup has finished"

Before the actual backup takes place, the MySQL server databases are dumped into a specific location, which is included in the list of backup directories. After the dump, the binary log files are reset, which later allows for proper incremental backups of these as well.

The really interesting part is the put line inside the ftp session, where the archiving, compression, encryption and ftp-transfer take place.

If anything goes wrong, the system operator is warned through an e-mail message.
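
The warning e-mails in these scripts are all built the same way, so the pattern could be captured in a pair of helper functions. This is only a sketch: build_warning and send_warning are hypothetical names, the addresses are the placeholders used in the script, and, unlike the echo -e lines above, I put an empty line between the Subject header and the body, which is the usual layout of a mail message.

```shell
#!/bin/bash
# Build a minimal mail message: one header, an empty line, then the body.
# build_warning and send_warning are hypothetical helper names.
build_warning() {
    local subject=$1 body=$2
    printf 'Subject: %s\n\n%s\n' "$subject" "$body"
}

# Pipe the message to sendmail; sender and recipient are placeholders.
send_warning() {
    build_warning "$1" "$2" | /usr/sbin/sendmail -f "you@yourdomain.toplevel" "you@yourdomain.toplevel"
}

# Print the message instead of sending it, to check the layout.
build_warning "FTP of full backup files failed" "Exit code was 1."
```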

The Incremental Backup script

The incrbackup.sh script performs the daily incremental backup.

#!/bin/bash
## Script for incremental backup
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
 
BACKUPDIRS="/home /var/www/awstats /www/logs /var/lib/mysql/backups"
#EXCL="backup-*"	# do not backup files matching this pattern
 
 
# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full 
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week.
 
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made.
 
WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
DAY_NOW=`date +%u`  # %u: day of week (1..7);  1 represents Monday
FULLBAK_DAY=$1	# day of week (1..7);  1 represents Monday. Example: 7 means full backup on Sundays.
WINDOW=$2 # backup span in weeks. Example: '3' means three full backups.
 
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
 
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'
 
#Incremental backup span includes day of Full Backup, so any changes made directly after a full backup are also backed up.
if [ $FULLBAK_DAY -gt $DAY_NOW ]
then
	#backup was last week
	BACKUP_WEEK=`expr $WEEK_NOW - 1`
 
	#compute span of incremental backup (measured in number of days)
	INCR_SPAN=`expr $DAY_NOW +  7 - $FULLBAK_DAY`
 
else
	#backup was this week
	BACKUP_WEEK=$WEEK_NOW
 
	#compute span of incremental backup (measured in number of days)
	INCR_SPAN=`expr $DAY_NOW - $FULLBAK_DAY`
 
fi
 
 
#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
#BACKUP_WEEK: number of week (1 - 53) in which the actual full backup was made
if [ $BACKUP_WEEK -eq 53 ]
then
	WINDOW_WEEK=`expr $WINDOW + 1`  # 53rd week gets saved as week(window + 1), so week1 of the new year will never overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])	
else
	REMAINDER=`expr $BACKUP_WEEK % $WINDOW`
	if [ $REMAINDER -eq 0 ]
	then
		WINDOW_WEEK=$WINDOW
	else
		WINDOW_WEEK=$REMAINDER
	fi
fi
 
FILENAME="solin01_incrbackup_week${WINDOW_WEEK}_day${DAY_NOW}"
 
DATE_SPAN=`date --date="${INCR_SPAN} day ago"`
QUOTED_SPAN="'${DATE_SPAN}'"
 
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.
# tar -N, --after-date DATE, --newer DATE: only store files newer than DATE
# --newer=`date +%F --date='${INCR_SPAN} day ago'
ftp -i -n $FTP_HOST <<FTPSession
user ${FTP_USER} ${FTP_PASSWD}
binary
put "| tar -zcp -X X --newer=${QUOTED_SPAN} ${BACKUPDIRS} | mcrypt -f /backups/enc_key " /backups/${FILENAME}.tar.gz.nc
bye
FTPSession
STATUS=$? # exit status of the ftp command
if [ $STATUS != 0 ] 
then	
	PROBLEM="There has been a problem with the Incremental Backup script. The FTP session did not succeed. Exit code was ${STATUS}."
	echo "${PROBLEM}"
	echo -e "subject: FTP of incremental backup files failed\n${PROBLEM}" | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
echo "Incremental backup has finished"

The script first determines the current backup span: each incremental backup spans from the day of the full backup up to the current day. Notice that we have to perform several operations to get the span date in place (see the DATE_SPAN and QUOTED_SPAN assignments): quotation within shell scripts can be really tricky.
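
The span computation can be tried out in isolation. The sketch below wraps it in a function (incr_span is a made-up name) and then turns the span into a date, the way the script does for tar. The last line relies on the --date option of the Linux date command.

```shell
#!/bin/bash
# Number of days between the most recent full backup and today.
# Both arguments are weekday numbers as printed by "date +%u" (1 = Monday, 7 = Sunday).
incr_span() {
    local day_now=$1 fullbak_day=$2
    if [ "$fullbak_day" -gt "$day_now" ]; then
        echo $(( day_now + 7 - fullbak_day ))   # full backup was made last week
    else
        echo $(( day_now - fullbak_day ))       # full backup was made this week
    fi
}

span=$(incr_span 4 7)                 # Thursday, with full backups on Sunday
echo "span in days: $span"
date --date="${span} day ago" +%F     # the date from which tar collects changes
```

For a Thursday (day 4) with full backups on Sunday (day 7) the span is 4 days, which takes us back to the Sunday of the full backup.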

Another interesting part is the put line inside the ftp session, where the archiving, compression, encryption and ftp-transfer take place. This line differs from the same line in fullbackup.sh in two ways:

  • There is a --newer argument given to the tar command. The value for this argument actually determines the span of the incremental backup.
  • To exclude certain files from being backed up, the option -X reads a file (perhaps confusingly named X as well) which contains a list of patterns. Any file which matches a pattern from the list will not be backed up: it is eXcluded by tar! We do this because e.g. the web application Moodle creates daily backup zip files. The content in these zip files has already been backed up by our incremental backup, so we do not want to include these zip files.
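
To see the exclusion at work without touching real data, here is a small sketch on a throwaway directory. All names in it (list_without_excluded, report.txt, backup-course1.zip) are made up; the pattern backup-* is the one from the commented-out EXCL line in the script.

```shell
#!/bin/bash
# Archive a directory while skipping everything that matches a pattern file,
# then list the result straight from the pipe.
list_without_excluded() {
    local dir=$1 patterns=$2
    tar -zcpf - -X "$patterns" -C "$dir" . | tar -tzf -
}

work=$(mktemp -d)
mkdir "$work/data"
touch "$work/data/report.txt" "$work/data/backup-course1.zip"
echo 'backup-*' > "$work/X"          # the pattern file: one pattern per line
list_without_excluded "$work/data" "$work/X"
rm -rf "$work"
```

The listing should contain report.txt but not the zip file.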

Running the main script with crontab

Use crontab to execute the main script automatically each day. A warning about the crontab environment: not all required paths are included in the environment. That is why we have included the PATH statements in each script. Put a >/dev/null 2>&1 redirect at the end of the crontab command, so you will not receive the usual daily e-mails from crontab about this script.
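
A crontab entry for the main script could look like the line printed by the sketch below. The schedule (02:30 every night) is my own assumption; the option values -d 7 -w 4 correspond to a full backup on Sunday and a four week window, as in the strategy described above.

```shell
#!/bin/bash
# Print a crontab entry for the main script. The five leading fields are
# minute, hour, day of month, month and day of week.
cron_entry() {
    echo '30 2 * * * /backups/backup.sh -d 7 -w 4 >/dev/null 2>&1'
}

cron_entry   # paste the printed line into the editor opened by "crontab -e"
```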

20051227

I have changed the backup path on the ftp server and the web server to /bk (instead of /backups), because the ftp utility cannot take long arguments for the put command ('sorry, input line too long').

