=====Backups, or backing up the server=====
There are quite a lot of ready-made backup solutions. However, they are all elaborate, too complex, or too cumbersome to install. After some browsing around for Linux backup solutions, I found the following useful websites:
*[[http://madpenguin.org/Article1505.html|http://madpenguin.org/Article1505.html]]
*[[http://www.w00tlinux.com/bb/sutra194053.html|http://www.w00tlinux.com/bb/sutra194053.html]] (the post by //Alvin Oga//)
*[[http://kmself.home.netcom.com/Linux/FAQs/backups.html#software|http://kmself.home.netcom.com/Linux/FAQs/backups.html#software]]
*[[http://www.linux-backup.net/App/|http://www.linux-backup.net/App/]]
These sites pointed me in the direction of using shell and Perl scripts to create a customized backup solution. As it happens, there is an excellent "quick guide to writing scripts using the bash shell": [[http://pegasus.rutgers.edu/~elflord/unix/bash-tute.html|http://pegasus.rutgers.edu/~elflord/unix/bash-tute.html]].
The author mentions all you need to know (for now) about shell scripts, except for the (trivial) fact that shell scripts usually have the extension ''**.sh**''. Also, do not forget to ''**chmod**'' your shell scripts to be //executable//.
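For example, a minimal script can be written, marked executable, and run like this (the ''/tmp/hello.sh'' path is just for illustration):

```shell
# write a minimal script to a file (the /tmp path is just for illustration)
cat > /tmp/hello.sh <<'EOF'
#!/bin/bash
echo "Hello from a shell script"
EOF

# without the execute bit, running the script directly would fail with "Permission denied"
chmod +x /tmp/hello.sh

/tmp/hello.sh   # prints: Hello from a shell script
```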
But before we delve any deeper into shell scripts, here is what I wanted in the first place.
====Requirements====
-The backup process itself does not have to run quickly, but the resulting backup file must be small.
-The backup process must run fully automatically.
-All databases and all of my customers' directories (including mail folders) must be copied.
-The backed up data must be encrypted because it is sent to a server that is not under our control, over an insecure channel (ftp).
-There is a fourteen day backup window which forever moves forward (i.e. the most recent backup is one day old, the oldest backup is always fourteen days old).
-If anything in the backup process fails, I want to be warned by e-mail.
The first requirement means that nearly every backup is incremental: only changed files are part of the current backup. For instance, we could have a full backup every week, or every two weeks, and incremental backups in between.
Based on the advice from [[http://www.linux-backup.net/Strategy/Incremental/|http://www.linux-backup.net/Strategy/Incremental/]], we will have the incremental backups span the period between the last full backup and the current day. So if we have, for instance, made a full backup on Sunday and today is Thursday, then today's incremental backup will register every change made since Sunday, not just every change made since the last daily incremental backup (which would be Wednesday).
===Backup strategy===
The required backup strategy is now the following.
*A full backup is made every Sunday.
*The full backup files are rotated every four weeks: this ensures that we always have full backups for the past four weeks.
*Full backup files are named: ''**solin{01,..,n}_fullbackup_week{1,..,4}.tar.gz.nc**''. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. Note that years with 53 weeks will give rise to a 5th week number. This is because normally week number 53 would be computed into 'week1', which would subsequently be overwritten by 'week1' of the new year.
*A daily incremental backup includes everything from the last Monday up to and including the current weekday (there is no incremental backup on Sundays).
*The incremental backups are rotated every four weeks. This ensures that we always have a backup for each of the last 28 days.
*Incremental backup files are named: ''**solin{01,..,n}_incrbackup_week{1,..,4}_day{1,..,6}.tar.gz.nc**''. The first number stands for the Solin server which was backed up. The second is a week number in a series of four. The last number stands for the weekday (Monday is day 1). Note that the week number is computed in the same way as the one used for naming the full backup file, so you can always immediately see to which full backup an incremental backup belongs.
This strategy actually exceeds the requirements mentioned above, because now we aim for a 28-day window instead of a fourteen-day one.
===Point of Attention===
If you're selecting a hosting company for storage space (you know, the target of your ftp transfers), take this into consideration: ftp transfer speed. I've seen hosts offer 1500 GB of webspace. Sounds great, until you read the fine print, which states "ftp transfer rates: 100kbps". Use this [[http://www.easycalculation.com/bandwidth-calculator.php|bandwidth conversion tool]] to figure out how long it will take to fill the entire 1500 GB, using a 100kbps upload speed. Hint: it's not gonna take you hours, days, or even months, it's gonna take years!
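A quick back-of-the-envelope check in the shell confirms the point (assuming 1 GB = 10^9 bytes and 100 kbps = 100,000 bits per second):

```shell
BITS=$((1500 * 1000000000 * 8))   # 1500 GB expressed in bits
SECS=$((BITS / 100000))           # seconds needed at 100,000 bits per second
DAYS=$((SECS / 86400))
echo "$DAYS days"                 # roughly 1388 days, i.e. almost four years
```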
So, if you want to move Gigabytes, start looking for a host which offers upload speeds in the megabits, instead of kilobits.
====Pieces of the Backup Puzzle: Various Scripts and Commands====
I have rented 10 Gigabytes of storage space on an ftp-server from the same company as the one where the webserver itself is hosted (//Flexservers//), specifically for making backups.
The scripts and commands we are about to examine are all part of the backup solution.
===Using Bash Shell scripts to ftp===
Derived from the website [[http://arul.telenet-systems.com/info/fileTransfer.php#script|http://arul.telenet-systems.com/info/fileTransfer.php#script]], here is an example of a script which does an ftp session.
#!/bin/bash
## -n Restrains ftp from attempting auto-login
HOST='hostname'
USER='username'
PASSWD='password'
# everything between <<FTPSession and the closing FTPSession line is fed to ftp
ftp -i -n $HOST <<FTPSession
user $USER $PASSWD
put localfile remotefile
bye
FTPSession
The interesting part is in the ftp command: the ''**-i**'' option turns off the interactive prompting, making it possible to do an ftp session in a shell script.
Once we are at the prompt of the ftp client, we can also use a //pipe// to stream data to the ftp server:
put "| tar -zcp /home/onno/test " myfile.tar.gz
Here, we use the pipe symbol ''**%%|%%**'' to have the put command read from //standard input//. Note that it is crucial to omit the ''**-f**'' option when invoking the ''**tar**'' command, because ''**-f**'' tells ''**tar**'' to write the archive to a file, instead of outputting the archived data to //standard output//. (See also [[http://www.rrzn.uni-hannover.de/fileadmin/kurse/material/UnixAK/UnixAK.pdf|http://www.rrzn.uni-hannover.de/fileadmin/kurse/material/UnixAK/UnixAK.pdf]].)
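The effect of omitting ''**-f**'' is easy to verify locally: the archive goes to //standard output// and can be piped straight into another command. A small sketch (the ''/tmp/pipedemo'' directory is just for illustration):

```shell
mkdir -p /tmp/pipedemo
echo "hello" > /tmp/pipedemo/file.txt

# without -f, GNU tar writes the archive to standard output, so we can pipe it
# into a second tar that lists the archive it reads from standard input
tar -zcp -C /tmp pipedemo | tar -tz
```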
N.B.: the string after ''**<<**'' is a //here-document// delimiter: the same arbitrary string must appear right after ''**<<**'' and again, on a line of its own, at the end of the block that gets fed to the ftp command. So, in the example above, 'FTPSession' is just a random string.
**Warning**: the ftp utility limits the maximum number of characters for its "''**put**''" command severely! So, if you ever see an error "''**sorry, input line too long**''", it's the ftp utility - not the shell - which generates this error.
===A script for archiving multiple directories at once using tar===
The following script archives and gzips a number of directories to the same tar file.
#!/bin/bash
# Create backups of /etc, /home, /usr/local, and...
PATH=/bin:/usr/bin
backupdirs="/home/dopperdude"
# tar: c = create, z = zip, p = same permissions, f = file
for path in $backupdirs
do
echo "System backup on $path"
tar -zcpf /backup/test2.tar.gz $path
done
echo "System backups complete, status: $?"
Let's see what happens if we run this script:
[root@1038 onno]# ./backup.sh
System backup on /home/dopperdude
tar: Removing leading `/' from member names
System backups complete, status: 0
Great, but let's take another look at the tar manual. This script, found on the web, needs pruning! The same result can also be achieved in a single tar command, even with multiple directories:
tar -zcpf /backup/test2.tar.gz /home/dopperdude /home/onno /etc
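A quick way to convince yourself that the single-command form works is to archive a couple of test directories and list the result with ''**-t**'' (the ''/tmp/bdemo'' paths are just for illustration):

```shell
mkdir -p /tmp/bdemo/dir1 /tmp/bdemo/dir2
echo "a" > /tmp/bdemo/dir1/a.txt
echo "b" > /tmp/bdemo/dir2/b.txt

# one tar invocation, multiple directories
tar -zcpf /tmp/bdemo/test2.tar.gz -C /tmp/bdemo dir1 dir2

# t = list contents, z = gunzip, f = file
tar -tzf /tmp/bdemo/test2.tar.gz
```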
===Using the mysqldump utility ===
mysqldump -uUser -pPassw --flush-logs --lock-tables --all-databases > backtest.sql
From the online MySQL manual ([[http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html|http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html]]):
--flush-logs, -F
Flush the MySQL server log files before starting the dump. This option requires the RELOAD privilege. Note that if you use this option in combination with the --all-databases (or -A) option, the logs are flushed for each database dumped. The exception is when using --lock-all-tables or --master-data: In this case, the logs are flushed only once, corresponding to the moment that all tables are locked. If you want your dump and the log flush to happen at exactly the same moment, you should use --flush-logs together with either --lock-all-tables or --master-data.
--lock-tables, -l
Lock all tables before starting the dump. The tables are locked with READ LOCAL to allow concurrent inserts in the case of MyISAM tables. For transactional tables such as InnoDB and BDB, --single-transaction is a much better option, because it does not need to lock the tables at all.
Please note that when dumping multiple databases, --lock-tables locks tables for each database separately. So, this option does not guarantee that the tables in the dump file are logically consistent between databases. Tables in different databases may be dumped in completely different states.
===Encrypting the archived file===
The ''**mcrypt**'' command is a simple-to-use encryption tool. It uses a //passphrase// to encrypt any kind of file. Normally, you are prompted twice for the passphrase, but we will store the key in a file, since we are eventually going to use mcrypt from inside a shell script:
[onno@1038 onno]$ mcrypt -f encryption_key -u data.txt
File data.txt was encrypted.
Here, the -f argument points to a file //encryption_key//, which contains the passphrase; //data.txt// is the file to be encrypted. The -u argument deletes ("unlinks") the original file //data.txt//. The encrypted contents of the file are stored in a new file, called ''**original_filename.nc**'' - in our example: //data.txt.nc//.
===Using the Linux date command in a shell script===
Setting up a backup regime has everything to do with archiving files based on their timestamps. This means it's important to know the current date. But not only that, you will also want to know things like:
*What is the current week number for the complete year?
*What is the current week number within a range of four weeks?
The linux ''**date**'' command comes with a lot of options, one of which answers our first question:
[root@1038 root]# date +%V
48
From the manual:
%V week number of year with Monday as first day of week (01..53)
Now we know enough to make a script to answer the second question: is this a 1st, 2nd, 3rd or 4th week?
#!/bin/bash
WEEK_IN_YEAR_NUMBER=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
if [ $REMAINDER -eq 0 ]
then
WEEKNUMBER=4
else
WEEKNUMBER=$REMAINDER
fi
fi
echo "WEEKNUMBER: $WEEKNUMBER "
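The same week-number logic can be condensed into a small function, which makes it easy to check a few sample values by hand:

```shell
# week rank within a four-week window; week 53 maps to 5 so that week1
# of the new year never overwrites its backup
window_week() {
  local week=$1
  if [ "$week" -eq 53 ]; then
    echo 5
  else
    local rem=$((week % 4))
    if [ "$rem" -eq 0 ]; then echo 4; else echo "$rem"; fi
  fi
}

window_week 48   # prints 4 (48 is divisible by 4)
window_week 5    # prints 1 (5 mod 4)
window_week 53   # prints 5 (special case)
```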
There is also a special Linux option (which does not work under Unix, according to some sources), which can be used to compute //last// week's number in a range of four:
date +%V --date='7 day ago'
===Last Week's Number===
We need the information derived from ''**date**'', because incremental backups will be made for each full backup in the week //after// the full backup. And we want to name the incremental backups in the same fashion, i.e. with the same week number as the full backup.
So, here's the script to compute the correct week number for //last// week:
#!/bin/bash
## Script for incremental backup
# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week.
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made
WEEK_IN_YEAR_NUMBER=`date +%V --date='7 day ago'`
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.
#Compute weeknumber
if [ $WEEK_IN_YEAR_NUMBER -eq 53 ]
then
WEEKNUMBER=5 # 53rd week gets saved as week5, so week1 of the new year will not overwrite this backup
else
REMAINDER=`expr $WEEK_IN_YEAR_NUMBER % 4`
if [ $REMAINDER -eq 0 ]
then
WEEKNUMBER=4
else
WEEKNUMBER=$REMAINDER
fi
fi
echo "WEEKNUMBER: $WEEKNUMBER "
===Passing arguments and options to a shell script===
This website: [[http://aplawrence.com/Unix/getopts.html|http://aplawrence.com/Unix/getopts.html]] contains some very useful information on passing options and arguments to a script. Here's a very flexible example script, which will take the required arguments in no specific order.
#!/bin/bash
args=`getopt abc: $*`
# $# contains number of arguments
if [ $# -eq 0 ]
then
echo 'Usage: -a -b -c file'
exit 1
fi
# $? contains exit status
if [ $? != 0 ]
then
echo 'Usage: -a -b -c file'
exit 1
fi
set -- $args
if [ $? != 0 ]
then
echo 'Usage: -a -b -c file'
exit 1
fi
for i
do
case "$i" in
-c) shift;echo "flag c set to $1";shift;;
-a) shift;echo "flag a set";;
-b) shift;echo "flag b set";;
esac
done
Please refer to the explanation of ''**getopt**'' on the aforementioned website, or see the manual.
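A sample run shows that the options indeed come out correctly, in whatever order they are given (the script is saved to ''/tmp/opts.sh'' here just for the demonstration):

```shell
# the example script from above, saved to a file for the demonstration
cat > /tmp/opts.sh <<'EOF'
#!/bin/bash
args=`getopt abc: $*`
set -- $args
for i
do
  case "$i" in
    -c) shift; echo "flag c set to $1"; shift;;
    -a) shift; echo "flag a set";;
    -b) shift; echo "flag b set";;
  esac
done
EOF
chmod +x /tmp/opts.sh

/tmp/opts.sh -c myfile -a   # options may appear in any order
```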
====Putting it all together: the Backup Solution====
The backup solution is comprised of three scripts. All three are bash shell scripts, located in ''**/backups**'': ''**backup.sh**'', ''**fullbackup.sh**'' and ''**incrbackup.sh**''. These scripts use the following tools:
**tar**
[root@1038 root]# tar --version
tar (GNU tar) 1.13.25
To see the contents of a zipped tar file, do:
tar -tzvf example.tar.gz
Explanation:
t = list (display contents)
z = unzip
v = verbose
f = file (meaning: we are going to use a tar file, as opposed to e.g. a pipe stream)
Use the command to check if your backup files are really what you want them to be.
**mcrypt**
[root@1038 root]# mcrypt --version
Mcrypt v.2.6.4 (i686-pc-linux-gnu)
Linked against libmcrypt v.2.5.7
Copyright (C) 1998-2002 Nikos Mavroyanopoulos (nmav@gnutls.org)
**gzip**
[root@1038 root]# gzip --version
gzip 1.3.3
(2002-03-08)
Copyright 2002 Free Software Foundation
Copyright 1992-1993 Jean-loup Gailly
===The main script===
The main script, ''**backup.sh**'', calls the other two scripts. It also takes mandatory arguments and option values.
#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
## TO DO: if backup was successful, store current date in a log file.
## If the script is called again, check if the current date is greater
## than the logdate.
EMAIL_FROM="o.schuit@solin.nl"
EMAIL_TO="o.schuit@solin.nl"
usageQuit()
{
echo "Usage: $0 -d weekday -w window"
echo "-d weekday: (1-7) Number specifying the day in the week on which the full backup must be made (Monday is 1)"
echo "-w ({2,4}) Number specifying the span of the backup window in weeks. Example: 4 makes four full backups for a whole month: one every week. During the backup window after week 52 this system only works correctly for window size = 2 or 4."
echo "** Warning: this utility makes incremental backups in between the weekly full backups."
echo "This means that the actual backup span is the number of weeks specified as the window MINUS 1!"
echo -e "subject: Improper call of Backup script\nThe backup script was improperly called. Please specify all required arguments and option values." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
exit 1
}
# $# contains number of arguments
if [ $# -eq 0 ]
then
usageQuit # no arguments
fi
args=`getopt d:w: $*`
# $? contains exit status or return code of last command
if [ $? != 0 ]
then
usageQuit # not the right arguments
fi
set -- $args
if [ $? != 0 ]
then
usageQuit # not the right arguments or options
fi
for i
do
case "$i" in
-d) shift; FULLBAK_DAY=$1; shift;;
-w) shift; WINDOW=$1; shift;;
esac
done
if [ $FULLBAK_DAY -lt 1 ] || [ $FULLBAK_DAY -gt 7 ]
then
usageQuit # invalid weekday
fi
# %u day of week (1..7); 1 represents Monday
if [ `date +%u` = "$FULLBAK_DAY" ]
then
/backups/fullbackup.sh $FULLBAK_DAY $WINDOW
else
/backups/incrbackup.sh $FULLBAK_DAY $WINDOW
fi
The script validates the option values that come with each argument and then determines which particular backup script to call. That's all this script really does.
===The Full Backup script===
The ''**fullbackup.sh**'' script performs the weekly full backup.
#!/bin/bash
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
## Script for full backup
BACKUPDIRS="/home /var/www/awstats /www/logs /backups/mysql"
#External script parameters:
FULLBAK_DAY=$1
WINDOW=$2
WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
MYSQL_USER="user_name"
MYSQL_PASS="password"
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'
#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
if [ $WEEK_NOW -eq 53 ]
then
WINDOW_WEEK=`expr $WINDOW + 1` # 53rd week gets saved as week(window + 1), so week1 of the new year will never overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])
else
REMAINDER=`expr $WEEK_NOW % $WINDOW`
if [ $REMAINDER -eq 0 ]
then
WINDOW_WEEK=$WINDOW
else
WINDOW_WEEK=$REMAINDER
fi
fi
FILENAME="solin01_fullbackup_week${WINDOW_WEEK}"
mysqldump -u${MYSQL_USER} -p${MYSQL_PASS} --flush-logs --all-databases > /backups/mysql/mysql_fullbackup.sql
# $? Exit status or return code of last command; save it before it gets overwritten
DUMP_STATUS=$?
if [ $DUMP_STATUS != 0 ]
then
echo -e "subject: Dump of MySQL databases failed\nThere has been a problem while backing up the MySQL databases for the Full Backup. Exit code was ${DUMP_STATUS}." | /usr/sbin/sendmail -f $EMAIL_FROM $EMAIL_TO
fi
# delete MySQL binary log files (these are incremental backups)
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "RESET MASTER;"
# Create backups
# tar -z: gzip the archive
# tar -c: create new archive (overwrites old archive with same name)
# tar -p, --same-permissions, --preserve-permissions
# mcrypt -f: use keys file
# ftp -n: Restrains ftp from attempting auto-login
# archive, gzip, encrypt and upload in a single pipeline; the here-document
# delimiter and the path of the mcrypt key file are assumptions - adjust them
# to your own setup
ftp -i -n $FTP_HOST <<EndFTPSession
user $FTP_USER $FTP_PASSWD
put "| tar -zcp ${BACKUPDIRS} | mcrypt -f /backups/encryption_key" ${FILENAME}.tar.gz.nc
bye
EndFTPSession
Before the actual backup takes place, the MySQL server databases are dumped into a specific location, which is included in the list of backup directories. After the dump, the binary log files are reset, which later allows for proper incremental backups of these as well.
The really interesting part is the highlighted line, where the archiving, compression, encryption and ftp-transfer take place.
If anything goes wrong, the system operator is warned through an e-mail message.
===The Incremental Backup script===
The ''**incrbackup.sh**'' script performs the daily incremental backup.
#!/bin/bash
## Script for incremental backup
PATH=/usr/bin:/usr/sbin:/usr/local/bin:/bin:.
BACKUPDIRS="/home /var/www/awstats /www/logs /var/lib/mysql/backups"
#EXCL="backup-*" # do not backup files matching this pattern
# %V: week number of year with Monday as first day of week (01..53)
# We want to store incremental backups under the same weeknumber as the full
# backup. But because the full backup is usually made on Sunday, each subsequent
# incremental backup is in a new week.
# Conclusion: for the incremental backup, we need to figure out the number of the
# week in which the full backup was made.
WEEK_NOW=`date +%V` # %V: week number of year with Monday as first day of week (01..53)
DAY_NOW=`date +%u` # %u: day of week (1..7); 1 represents Monday
FULLBAK_DAY=$1 # day of week (1..7); 1 represents Monday. Example: 7 means full backup on Sundays.
WINDOW=$2 # backup span in weeks. Example: '3' means three full backups.
EMAIL_FROM="you@yourdomain.toplevel"
EMAIL_TO="you@yourdomain.toplevel"
FTP_HOST='ftp_host_name'
FTP_USER='ftp_user'
FTP_PASSWD='ftp_password'
#Incremental backup span includes day of Full Backup, so any changes made directly after a full backup are also backed up.
if [ $FULLBAK_DAY -gt $DAY_NOW ]
then
#backup was last week
BACKUP_WEEK=`expr $WEEK_NOW - 1`
#compute span of incremental backup (measured in number of days)
INCR_SPAN=`expr $DAY_NOW + 7 - $FULLBAK_DAY`
else
#backup was this week
BACKUP_WEEK=$WEEK_NOW
#compute span of incremental backup (measured in number of days)
INCR_SPAN=`expr $DAY_NOW - $FULLBAK_DAY`
fi
#Compute week rank in total backup window. System is based on a 2 or 4 week backup window.
#BACKUP_WEEK: number of week (1 - 53) in which the actual full backup was made
if [ $BACKUP_WEEK -eq 53 ]
then
WINDOW_WEEK=`expr $WINDOW + 1` # 53rd week gets saved as week(window + 1), so week1 of the new year will never overwrite this backup (as might be the case if e.g. [$WINDOW -eq 4])
else
REMAINDER=`expr $BACKUP_WEEK % $WINDOW`
if [ $REMAINDER -eq 0 ]
then
WINDOW_WEEK=$WINDOW
else
WINDOW_WEEK=$REMAINDER
fi
fi
FILENAME="solin01_incrbackup_week${WINDOW_WEEK}_day${DAY_NOW}"
DATE_SPAN=`date --date="${INCR_SPAN} day ago"`
QUOTED_SPAN="'${DATE_SPAN}'"
# Unix vs. Linux Warning: --date='7 day ago' only works under Linux, according to
# some sources.
# tar -N, --after-date DATE, --newer DATE: only store files newer than DATE
# --newer=`date +%F --date='${INCR_SPAN} day ago'
# tar --newer limits the archive to files changed since the last full backup;
# -X excludes every pattern listed in the exclude file. The here-document
# delimiter and the paths of the key file and the exclude file are assumptions.
ftp -i -n $FTP_HOST <<EndFTPSession
user $FTP_USER $FTP_PASSWD
put "| tar -zcp --newer=${QUOTED_SPAN} -X /backups/X ${BACKUPDIRS} | mcrypt -f /backups/encryption_key" ${FILENAME}.tar.gz.nc
bye
EndFTPSession
The script first determines the current backup span: each incremental backup spans from the day of the full backup up to the current day. Notice that we have to perform several operations to get the span date in place (see the first highlighted line) - quotation within shell scripts can be really tricky.
Another interesting part is the second highlighted line, where the archiving, compression, encryption and ftp-transfer take place. This line differs from the same line in ''**fullbackup.sh**'' in two ways:
*There is a ''**--newer**'' argument given to the ''**tar**'' command. The value for this argument actually determines the span of the incremental backup.
*To exclude certain files from being backed up, the option -X reads a file (perhaps confusingly named X as well) which contains a list of patterns. Any file which matches a pattern from the list will //not //be backed up: it is eXcluded by tar! We do this because e.g. the web application Moodle creates daily backup zip files. The content in these zip files has already been backed up by our incremental backup, so we do not want to include these zip files.
===Running the main script with crontab===
Use crontab to execute the main script automatically each day. A **warning** about the crontab environment: not all required paths are included in the environment. That is why we have included the ''**PATH**'' statements in each script. There is also the ''**>/dev/null 2>&1**'' redirect at the end of the command, so we will not receive the usual daily e-mails from crontab about this script.
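As a sketch, a crontab entry for a nightly run could look like this (the time of day and the -d/-w values are just examples; edit the table with ''**crontab -e**''):

```
# run the backup every night at 03:30; full backup on Sundays (-d 7), four-week window (-w 4)
30 3 * * * /backups/backup.sh -d 7 -w 4 >/dev/null 2>&1
```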
===20051227===
I have changed the backup path on the ftp server and the web server to ''**/bk**'' (instead of /backups), because the ftp utility cannot take long arguments for the ''**put**'' command (''**sorry, input line too long**'').