There are quite a lot of ready-made solutions for making backups. However, they are all either too elaborate, too complex, or too cumbersome to install. After some browsing around for Linux backup solutions, I found the following useful websites:
These sites pointed me in the direction of using shell and perl scripts, to create a customized backup solution. As it happens, there is an excellent “quick guide to writing scripts using the bash shell”: http://pegasus.rutgers.edu/~elflord/unix/bash-tute.html.
The author mentions all you need to know (for now) about shell scripts, except for the (trivial) fact that shell scripts usually have the extension *.sh. Also, do not forget to chmod your shell scripts to make them executable.
But before we delve any deeper into shell scripts, here is what I wanted in the first place.
The first requirement means that nearly every backup is incremental: only changed files are part of the current backup. For instance, we could have a full backup every week, or every two weeks, and incremental backups in between.
Based on the advice from http://www.linux-backup.net/Strategy/Incremental/, we will have the incremental backups span the period between the last full backup and the current day. So if we have, for instance, made a full backup on Sunday and today is Thursday, then today's incremental backup will register every change made since Sunday, not just every change made since the last daily incremental backup (which would be Wednesday).
The required backup strategy is now the following.
solin{01,..,n}_fullbackup_week{1,..,4}.tar.gz.nc. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. Note that years with 53 weeks will give rise to a 5th week number. This is because normally week number 53 would be computed into 'week1', which would subsequently be overwritten by 'week1' of the new year.

solin{01,..,n}_incrbackup_week{1,..,4}_day{1,..,6}.tar.gz.nc. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. The last number stands for the weekday (Monday is day 1). Note that the week number is computed in the same way as the one used for naming the full backup file, so you can always immediately see to which full backup an incremental backup belongs.

This strategy actually exceeds the requirements mentioned above, because now we aim for a 28-day window instead of a fourteen-day one.
If you're selecting a hosting company for storage space (you know, the target of your ftp transfers), take this into consideration: ftp transfer speed. I've seen hosts offer 1500 GB of webspace. Sounds great, until you read the fine print, which states “ftp transfer rates: 100kbps”. Use this bandwidth conversion tool to figure out how long it will take to fill the entire 1500 GB, using a 100kbps upload speed. Hint: it's not gonna take you hours, days, or even months, it's gonna take years!
So, if you want to move Gigabytes, start looking for a host which offers upload speeds in the megabits, instead of kilobits.
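The arithmetic behind that hint can be checked with a few lines of bash. This is only a rough sketch: the function name transfer_days is made up for this example, and it assumes decimal units (1 GB = 10^9 bytes, 1 kbps = 1000 bits per second) and a line that is saturated the whole time.

```shell
#!/bin/bash
# transfer_days GIGABYTES KBPS
# Prints the number of whole days needed to upload GIGABYTES at KBPS.
# Assumption: decimal units and a constantly saturated connection.
transfer_days() {
    local gigabytes=$1 kbps=$2
    local bits=$((gigabytes * 1000000000 * 8))
    local seconds=$((bits / (kbps * 1000)))
    echo $((seconds / 86400))
}

# The example from the text: 1500 GB at 100 kbps
transfer_days 1500 100
```

The result is in days; divide by 365 to see that it is indeed a matter of years.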
I have rented 10 Gigabytes of storage space on an ftp-server from the same company as the one where the webserver itself is hosted (Flexservers), specifically for making backups.
The scripts and commands we are about to examine, are all part of the backup solution.
Derived from the website http://arul.telenet-systems.com/info/fileTransfer.php#script, here is an example of a script which does an ftp session.
#!/bin/bash
## -n Restrains ftp from attempting auto-login
HOST='hostname'
USER='username'
PASSWD='password'
ftp -i -n $HOST <<FTPSession
user ${USER} ${PASSWD}
binary
put archivedfiles.zip
bye
FTPSession
echo "FTP finished"
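The interesting part is in the ftp command: the -i option turns off interactive prompting (during multiple file transfers), and the -n option suppresses auto-login, so the script can log in by itself with the user command. Together they make it possible to do an ftp session in a shell script.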
Once we are on the prompt of the ftp client, we can also use a pipe to stream data to the ftp server:
put "| tar -zcp /home/onno/test " myfile.tar.gz
Here, we use the pipe symbol | to have the put command read from standard input. Note that it is crucial to omit the -f option when invoking the tar command, because -f tells tar to create a file, instead of outputting the archived data to standard output. (See also http://www.rrzn.uni-hannover.de/fileadmin/kurse/material/UnixAK/UnixAK.pdf)
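To see what such a stream contains without an ftp server at hand, you can pipe it into a second tar that lists it. The sketch below is an illustration only: the function name stream_archive and the temporary demo directory are made up. It uses GNU tar's "-f -" form, which names standard output explicitly, so it does not depend on which default archive your tar was built with.

```shell
#!/bin/bash
# stream_archive DIR
# Writes a gzipped tar archive of DIR to standard output.
# "-f -" means: use standard output as the archive.
stream_archive() {
    local dir=$1
    tar -zcpf - -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Demo: list the members of the stream instead of sending it anywhere.
demo_dir=$(mktemp -d)
echo "hello" > "${demo_dir}/note.txt"
stream_archive "$demo_dir" | tar -tzf -
rm -rf "$demo_dir"
```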
N.B.: the << construct is the shell's “here document”: the shell feeds every line between the word following << and the next line consisting of just that same word to the standard input of the command. The word itself is arbitrary. So, in the example above, 'FTPSession' is just a random string.
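You can try the << mechanism without an ftp server by feeding the same lines to cat. This is a small sketch; the function name build_ftp_commands and the delimiter EndOfSession are arbitrary choices for this example.

```shell
#!/bin/bash
# build_ftp_commands USER PASSWORD FILE
# Prints the commands that the ftp client would receive on standard input.
# Variables are expanded by the shell before the text reaches the command.
build_ftp_commands() {
    local user=$1 pass=$2 file=$3
    cat <<EndOfSession
user ${user} ${pass}
binary
put ${file}
bye
EndOfSession
}

build_ftp_commands username password archivedfiles.zip
```

Note that the closing word has to start in the first column of its line, otherwise the shell does not recognize it as the end of the input.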
Warning: the ftp utility severely limits the maximum number of characters for its put command! So, if you ever see the error “sorry, input line too long”, it's the ftp utility - not the shell - which generates this error.
The following script archives and gzips a number of directories to the same tar file.
#!/bin/bash
# Create backups of /etc, /home, /usr/local, and...
PATH=/bin:/usr/bin
backupdirs="/home/dopperdude"
# tar: c = create, z = zip, p = same permissions, f = file
for path in $backupdirs
do
    echo "System backup on $path"
    tar -zcpf /backup/test2.tar.gz $path
done
echo "System backups complete, status: $?"
Let's see what happens if we run this script:
[root@1038 onno]# ./backup.sh
System backup on /home/dopperdude
tar: Removing leading `/' from member names
System backups complete, status: 0
Great, but let's take another look at the tar manual. This script, found on the web, needs pruning! The same result can also be achieved in a single tar command, even with multiple directories:
tar -zcpf /backup/test2.tar.gz /home/dopperdude /home/onno /etc
mysqldump -uUser -pPassw --flush-logs --lock-tables --all-databases > backtest.sql
From the online MySQL manual (http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html):
--flush-logs, -F
Flush the MySQL server log files before starting the dump. This option requires the RELOAD privilege. Note that if you use this option in combination with the --all-databases (or -A) option, the logs are flushed for each database dumped. The exception is when using --lock-all-tables or --master-data: In this case, the logs are flushed only once, corresponding to the moment that all tables are locked. If you want your dump and the log flush to happen at exactly the same moment, you should use --flush-logs together with either --lock-all-tables or --master-data.
--lock-tables, -l
Lock all tables before starting the dump. The tables are locked with READ LOCAL to allow concurrent inserts in the case of MyISAM tables. For transactional tables such as InnoDB and BDB, --single-transaction is a much better option, because it does not need to lock the tables at all.
Please note that when dumping multiple databases, --lock-tables locks tables for each database separately. So, this option does not guarantee that the tables in the dump file are logically consistent between databases. Tables in different databases may be dumped in completely different states.
The mcrypt command is a simple-to-use encryption tool. It uses a passphrase to encrypt any kind of file. Normally, you are prompted twice for the passphrase, but we will store the key in a file, since we are eventually going to use mcrypt from inside a shell script:
[onno@1038 onno]$ mcrypt -f encryption_key -u data.txt
File data.txt was encrypted.
Here, the -f argument points to a file, encryption_key, which contains the passphrase. data.txt is the file to be encrypted. The -u argument deletes (“unlinks”) the original file data.txt. The encrypted contents of the file are stored in a new file, called original_filename.nc - in our example: data.txt.nc.
Setting up a backup regime has everything to do with archiving files based on their timestamps. This means it's important to know the current date. But not only that, you will also want to know things like:
The Linux date command comes with a lot of options, one of which answers our first question:
[root@1038 root]# date +%V
48
From the manual:
%V week number of year with Monday as first day of week (01..53)
Now we know enough to make a script to answer the second question: is this a 1st, 2nd, 3rd or 4th week?
#!/bin/bash
WEEK_IN_YEAR_NUMBER=`date +%V` # %V: week number of year with Monday as first day of week (01..53)

#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
    WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
    REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
    if [ $REMAINDER -eq 0 ]
    then
        WEEKNUMBER=4
    else
        WEEKNUMBER=$REMAINDER
    fi
fi
echo "WEEKNUMBER: $WEEKNUMBER "
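The same computation can be wrapped in a function, which makes it easy to try out with arbitrary week numbers. This is a sketch and not part of the final scripts: the function name window_week and the optional second argument (the window size, defaulting to 4) are made up for this example.

```shell
#!/bin/bash
# window_week WEEK_IN_YEAR [WINDOW]
# Maps a week number (01..53) onto a rank within the backup window.
# Week 53 gets rank WINDOW + 1, so it can never collide with week 1.
window_week() {
    local week=$((10#$1))   # force base 10, so "08" and "09" are not taken as octal
    local window=${2:-4}
    if [ "$week" -eq 53 ]; then
        echo $((window + 1))
    elif [ $((week % window)) -eq 0 ]; then
        echo "$window"
    else
        echo $((week % window))
    fi
}

# Rank of the current week in a four week window
window_week "$(date +%V)"
```

For instance, window_week 48 yields 4 and window_week 53 yields 5.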
There is also a special Linux option (it belongs to GNU date and does not work under every Unix, according to some sources), which gives us last week's number. From that number we can compute last week's number in a range of four:
date +%V --date='7 day ago'
We need the information derived from date, because incremental backups will be made for each full backup in the week after the full backup. And we want to name the incremental backups in the same fashion, i.e. with the same week number as the full backup.
So, here's the script to compute the correct week number for last week:
#!/bin/bash
## Script for incremental backup
# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week.
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made
WEEK_IN_YEAR_NUMBER=`date +%V --date='7 day ago'`
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.

#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
    WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
    REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
    if [ $REMAINDER -eq 0 ]
    then
        WEEKNUMBER=4
    else
        WEEKNUMBER=$REMAINDER
    fi
fi
echo "WEEKNUMBER: $WEEKNUMBER "
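To check the --date arithmetic without waiting for a particular day, you can hand GNU date a fixed starting date. The sketch below assumes GNU date; the function name week_of and the example date are made up for this illustration, and -u is only there to keep the result independent of the local time zone.

```shell
#!/bin/bash
# week_of DATESPEC
# Prints the week number (%V) for any date description GNU date understands.
# Assumption: GNU date; --date is not available in every Unix date.
week_of() {
    date -u --date="$1" +%V
}

# Sunday 3 December 2006, and the Sunday one week earlier
week_of '2006-12-03'
week_of '2006-12-03 7 days ago'
```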
This website: http://aplawrence.com/Unix/getopts.html contains some very useful information on passing options and arguments to scripts. Here's a very flexible example script, which will take the required arguments in no specific order.
#!/bin/bash
# $# contains number of arguments
if [ $# -eq 0 ]
then
    echo 'Usage: -a -b -c file'
    exit 1
fi

# Run getopt directly before testing $?, so that $? really holds
# the exit status of getopt
args=`getopt abc: $*`
# $? contains exit status
if [ $? != 0 ]
then
    echo 'Usage: -a -b -c file'
    exit 1
fi

set -- $args
if [ $? != 0 ]
then
    echo 'Usage: -a -b -c file'
    exit 1
fi

for i
do
    case "$i" in
        -c) shift;echo "flag c set to $1";shift;;
        -a) shift;echo "flag a set";;
        -b) shift;echo "flag b set";;
    esac
done
Please refer to the explanation of getopt on the aforementioned website, or see the manual.
The backup solution is comprised of three scripts. All three are bash shell scripts, located in /backups: backup.sh, fullbackup.sh and incrbackup.sh. These scripts use the following tools:
tar
[root@1038 root]# tar --version
tar (GNU tar) 1.13.25
To see the contents of a zipped tar file, do:
tar -tzvf example.tar.gz
Explanation:
t = list (display contents)
z = unzip
v = verbose
f = file (meaning: we are going to use a tar file, as opposed to e.g. a pipe stream)
Use this command to check whether your backup files really contain what you want them to contain.
mcrypt
[root@1038 root]# mcrypt --version
Mcrypt v.2.6.4 (i686-pc-linux-gnu)
Linked against libmcrypt v.2.5.7
Copyright (C) 1998-2002 Nikos Mavroyanopoulos (nmav@gnutls.org)
gzip
[root@1038 root]# gzip --version
gzip 1.3.3
(2002-03-08)
Copyright 2002 Free Software Foundation
Copyright 1992-1993 Jean-loup Gailly
The main script, backup.sh, calls the other two scripts. It takes two obligatory options, each with a value.
#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.

## TO DO: if backup was successful, store current date in a log file.
## If the script is called again, check if the current date is greater
## than the logdate.

EMAIL_FROM="o.schuit@solin.nl"
EMAIL_TO="o.schuit@solin.nl"

usageQuit() {
    echo "Usage: $0 -d weekday -w window"
    echo "-d weekday: (1-7) Number specifying the day in the week on which the full backup must be made (Monday is 1)"
    echo "-w ({2,4}) Number specifying the span of the backup window in weeks. Example: 4 makes four full backups for a whole month: one every week. During the backup window after week 52 this system only works correctly for window size = 2 or 4."
    echo "** Warning: this utility makes incremental backups in between the weekly full backups."
    echo "This means that the actual backup span is the number of weeks specified as the window MINUS 1!"
    echo -e "subject: Improper call of Backup script\nThe backup script was improperly called. Please specify all required arguments and option values." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
    exit 1
}

# $# contains number of arguments
if [ $# -eq 0 ]
then
    usageQuit # no arguments
fi

args=`getopt d:w: $*`
# $? contains exit status or return code of last command
if [ $? != 0 ]
then
    usageQuit # not the right arguments
fi

set -- $args
if [ $? != 0 ]
then
    usageQuit # not the right arguments or options
fi

for i
do
    case "$i" in
        -d) shift; FULLBAK_DAY=$1; shift;;
        -w) shift; WINDOW=$1; shift;;
    esac
done

if [ $FULLBAK_DAY -lt 1 ] || [ $FULLBAK_DAY -gt 7 ]
then
    usageQuit # invalid weekday
fi

# %u day of week (1..7); 1 represents Monday
if [ `date +%u` = "$FULLBAK_DAY" ]
then
    /backups/fullbackup.sh $FULLBAK_DAY $WINDOW
else
    /backups/incrbackup.sh $FULLBAK_DAY $WINDOW
fi
The script validates the option values that come with each argument and then determines which particular backup script to call. That's all this script really does.
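As an aside, bash also has a builtin called getopts, which is a different tool from the external getopt used above. The following is a minimal sketch of the same option handling with that builtin, not the way backup.sh does it. The function name parse_backup_options is made up, and checking that the window is a plain number is an extra assumption of this example.

```shell
#!/bin/bash
# parse_backup_options -d WEEKDAY -w WINDOW
# Sets FULLBAK_DAY and WINDOW; returns 1 if an option is missing or invalid.
# Uses the bash builtin getopts instead of the external getopt program.
parse_backup_options() {
    local opt OPTIND=1
    FULLBAK_DAY=""
    WINDOW=""
    while getopts "d:w:" opt; do
        case "$opt" in
            d) FULLBAK_DAY=$OPTARG ;;
            w) WINDOW=$OPTARG ;;
            *) return 1 ;;              # unknown option or missing value
        esac
    done
    case "$FULLBAK_DAY" in
        [1-7]) ;;                       # weekday 1..7, Monday is 1
        *) return 1 ;;
    esac
    case "$WINDOW" in
        ''|*[!0-9]*) return 1 ;;        # window must be a plain number
    esac
    return 0
}

if parse_backup_options -w 4 -d 7; then
    echo "full backup day: $FULLBAK_DAY, window: $WINDOW"
fi
```

The options may be given in any order, just as with getopt.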
The fullbackup.sh script performs the weekly full backup.
#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
## Script for full backup
BACKUPDIRS="/home /var/www/awstats /www/logs /backups/mysql"

#External script parameters:
FULLBAK_DAY=$1
WINDOW=$2

WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
MYSQL_USER="user_name"
MYSQL_PASS="password"
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'

#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
if [ $WEEK_NOW -eq 53 ]
then
    # 53rd week gets saved as week(window + 1), so week1 of the new year will never
    # overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])
    WINDOW_WEEK=`expr $WINDOW + 1`
else
    REMAINDER=`expr $WEEK_NOW % $WINDOW`
    if [ $REMAINDER -eq 0 ]
    then
        WINDOW_WEEK=$WINDOW
    else
        WINDOW_WEEK=$REMAINDER
    fi
fi
FILENAME="solin01_fullbackup_week${WINDOW_WEEK}"

mysqldump -u${MYSQL_USER} -p${MYSQL_PASS} --flush-logs --all-databases > /backups/mysql/mysql_fullbackup.sql
# $? Exit status or return code of last command; save it right away,
# because the next command overwrites it
STATUS=$?
if [ $STATUS != 0 ]
then
    echo -e "subject: Dump of MySQL databases failed\nThere has been a problem while backing up the MySQL databases for the Full Backup. Exit code was ${STATUS}." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi

# delete MySQL binary log files (these are incremental backups)
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "RESET MASTER;"

# Create backups
# tar -z: gzip the archive
# tar -c: create new archive (overwrites old archive with same name)
# tar -p, --same-permissions, --preserve-permissions
# mcrypt -f: use keys file
# ftp -n: Restrains ftp from attempting auto-login
ftp -i -n $FTP_HOST <<FTPSession
user ${FTP_USER} ${FTP_PASSWD}
binary
put "| tar -zcp ${BACKUPDIRS} | mcrypt -f /backups/encryption_key " /backups/${FILENAME}.tar.gz.nc
bye
FTPSession
STATUS=$?
if [ $STATUS != 0 ]
then
    PROBLEM="There has been a problem with the Full Backup script. The FTP session did not succeed. Exit code was ${STATUS}."
    echo "${PROBLEM}"
    echo -e "subject: FTP of full backup files failed\n${PROBLEM}" | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
echo "Full backup has finished"
Before the actual backup takes place, the MySQL server databases are dumped into a specific location, which is included in the list of backup directories. After the dump, the binary log files are reset, which later allows for proper incremental backups of these as well.
The really interesting part is the put line inside the ftp session, where the archiving, compression, encryption and ftp-transfer take place.
If anything goes wrong, the system operator is warned through an e-mail message.
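A detail worth knowing when you report an exit code in such a warning: $? always belongs to the last command that ran, and that includes a test like [ $? != 0 ]. So the code has to be saved before anything else runs. The sketch below shows one way of doing that; the function name run_checked is made up for this example, and the echo stands in for the sendmail call.

```shell
#!/bin/bash
# run_checked COMMAND [ARGS...]
# Runs the command, reports a failure together with the real exit code,
# and hands that exit code back to the caller.
run_checked() {
    local status=0
    "$@" || status=$?    # capture the status before any other command overwrites $?
    if [ "$status" -ne 0 ]; then
        echo "Command '$1' failed. Exit code was ${status}."
    fi
    return "$status"
}

run_checked false || echo "(a warning e-mail would be sent here)"
```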
The incrbackup.sh script performs the daily incremental backup.
#!/bin/bash
## Script for incremental backup
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
BACKUPDIRS="/home /var/www/awstats /www/logs /var/lib/mysql/backups"
#EXCL="backup-*" # do not backup files matching this pattern

# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week.
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made.
WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
DAY_NOW=`date +%u` # %u: day of week (1..7); 1 represents Monday
FULLBAK_DAY=$1 # day of week (1..7); 1 represents Monday. Example: 7 means full backup on Sundays.
WINDOW=$2 # backup span in weeks. Example: '3' means three full backups.
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'

#Incremental backup span includes day of Full Backup, so any changes made directly after a full backup are also backed up.
if [ $FULLBAK_DAY -gt $DAY_NOW ]
then
    #backup was last week
    BACKUP_WEEK=`expr $WEEK_NOW - 1`
    #compute span of incremental backup (measured in number of days)
    INCR_SPAN=`expr $DAY_NOW + 7 - $FULLBAK_DAY`
else
    #backup was this week
    BACKUP_WEEK=$WEEK_NOW
    #compute span of incremental backup (measured in number of days)
    INCR_SPAN=`expr $DAY_NOW - $FULLBAK_DAY`
fi

#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
#BACKUP_WEEK: number of week (1 - 53) in which the actual full backup was made
if [ $BACKUP_WEEK -eq 53 ]
then
    # 53rd week gets saved as week(window + 1), so week1 of the new year will never
    # overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])
    WINDOW_WEEK=`expr $WINDOW + 1`
else
    REMAINDER=`expr $BACKUP_WEEK % $WINDOW`
    if [ $REMAINDER -eq 0 ]
    then
        WINDOW_WEEK=$WINDOW
    else
        WINDOW_WEEK=$REMAINDER
    fi
fi
FILENAME="solin01_incrbackup_week${WINDOW_WEEK}_day${DAY_NOW}"

DATE_SPAN=`date --date="${INCR_SPAN} day ago"`
QUOTED_SPAN="'${DATE_SPAN}'"
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.
# tar -N, --after-date DATE, --newer DATE: only store files newer than DATE
# --newer=`date +%F --date='${INCR_SPAN} day ago'
ftp -i -n $FTP_HOST <<FTPSession
user ${FTP_USER} ${FTP_PASSWD}
binary
put "| tar -zcp -X X --newer=${QUOTED_SPAN} ${BACKUPDIRS} | mcrypt -f /backups/enc_key " /backups/${FILENAME}.tar.gz.nc
bye
FTPSession
# save the exit status right away, because the next command overwrites $?
STATUS=$?
if [ $STATUS != 0 ]
then
    PROBLEM="There has been a problem with the Incremental Backup script. The FTP session did not succeed. Exit code was ${STATUS}."
    echo "${PROBLEM}"
    echo -e "subject: FTP of incremental backup files failed\n${PROBLEM}" | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
echo "Incremental backup has finished"
The script first determines the current backup span: each incremental backup spans from the day of the full backup up till the current day. Notice that we have to perform several operations to get the span date in place (see the DATE_SPAN and QUOTED_SPAN assignments) - quotation within shell scripts can be really tricky.
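The span computation is easy to try out on its own. Here is a small sketch of it as a function; the name incr_span is made up for this example, the logic is the same as in the script.

```shell
#!/bin/bash
# incr_span DAY_NOW FULLBAK_DAY
# Prints the number of days between the last full backup and today.
# Both arguments are weekdays in the range 1..7 (Monday is 1).
incr_span() {
    local day_now=$1 fullbak_day=$2
    if [ "$fullbak_day" -gt "$day_now" ]; then
        # the full backup was made last week
        echo $((day_now + 7 - fullbak_day))
    else
        # the full backup was made this week
        echo $((day_now - fullbak_day))
    fi
}

# Full backup on Sunday (7), today is Thursday (4)
incr_span 4 7
```

With a full backup on Sunday and today being Thursday, the span is 4 days, so tar receives the date of four days ago as the value for its --newer argument.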
Another interesting part is the put line inside the ftp session, where the archiving, compression, encryption and ftp-transfer take place. This line differs from the same line in fullbackup.sh in two ways:
--newer argument given to the tar command. The value for this argument actually determines the span of the incremental backup.
Use crontab to execute the main script automatically each day. A warning about the crontab environment: not all required paths are included in the environment. That is why we have included the PATH statements in each script. There is also the >/dev/null 2>&1 redirect at the end of the command, so we will not receive the usual daily e-mails from crontab about this script.
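A crontab line for the main script could look like the one printed by the little helper below. Mind the assumptions: the schedule (02:30 every night) and the option values -d 7 -w 4 are just examples, and the helper cron_entry itself is made up to show how the line is put together.

```shell
#!/bin/bash
# cron_entry MINUTE HOUR COMMAND
# Prints a crontab line which runs COMMAND every day at HOUR:MINUTE
# and throws away all output, so cron has nothing to e-mail.
cron_entry() {
    local minute=$1 hour=$2 command=$3
    printf '%s %s * * * %s >/dev/null 2>&1\n' "$minute" "$hour" "$command"
}

# Example: full backup on Sundays, four week window, every night at 02:30
cron_entry 30 2 '/backups/backup.sh -d 7 -w 4'
```

Paste the printed line into the editor opened by crontab -e.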
I have changed the backup path on the ftp server and the web server to /bk (instead of /backups), because the ftp utility cannot take long arguments for the put command ('sorry, input line too long').