Cheap Backup Solutions
The backup that's discussed below is not an approach that I'd
typically recommend for a business. I've included backup scripts
from Linux and OpenBSD systems as well as the Windows NT server
for those interested (2/21/01). The approach described is well
suited to my somewhat unique computer setup. It requires a solid
knowledge of the systems and their setups but lets me keep 5
systems fully backed up for less than ten cents a night, total,
in media costs. It's based on doing differential backups
following standard system installs or infrequent full system
backups. Backup files are automatically transferred to the
system with the CD-R drive and later written to CD-R media.
The most important thing for me to back up is my
work product, such as my web site and book contents. These are
authored on my workstation which I bought with a high capacity
tape drive. I set up automated nightly backup in the first few
days I had the PC and it's saved me more than once. The web
contents are also copied to multiple other machines.
When I started with my web servers I didn't really think I needed
to back them up. They were basic OS installs with web servers
and web contents duplicated from other machines. It wasn't until
recent problems that I realized how much time was
invested in a variety of server configuration changes as well as
various scripts and shell customizations. Unlike most
businesses, my web site(s) are not mission critical. In fact
they're nothing more than marketing vehicles with no proven
value. If I have to take one down for some reason or lose one
and it takes a while to get it back, it's not a major issue.
I do try to cover the IP address with one of the other
servers so people don't get broken links.
My preferred configuration in a commercial environment is to have
enough backup capacity in terms of both storage capacity and
backup speed that every system can be backed up fully every night
and never deal with incremental or differential backups. In the
GeodSoft environment one of my main goals is to keep the cost per
machine to an absolute minimum so high capacity backup is out of
the question due to both hardware and media costs. The other is
that it must be as close to fully automated as possible. Over
time any manual solution is likely to be neglected. Manual
backups are most likely to be neglected when they're needed most,
such as when you're really involved in a new or challenging
project.
Each of my servers has started with a well documented OS install.
Each has a few additional products added to it. Beyond these
there are a comparative handful of scattered files that are
modified sporadically over a period of time. What I need is a way
to restore previous configurations if I botch a change I'm making
and to get back to where I was in the event of a disk failure or
other catastrophic loss.
If I can reliably repeat the OS install then there is no need to
backup any of the files that are created as part of the original
install. Thus all that's really needed is a way to backup the
products that are added later plus all the system and user
configuration files that change, scripts that are written, and
any logs or other system output that has more than a transient
value.
Linux
A Red Hat Linux 6.2 reinstall gave me an
opportunity to fix something that I'd been initially careless
about - where I put tools and products added after the initial
install. To keep this manageable, since the reinstall everything
I add goes into /usr/local, /var/local or /home as
appropriate. If I were willing to change the modification time
stamp on everything added to the system, it would be easy to
combine the backup of individually changing files with added
products. This is a poor approach because the modification time
stamp is simply too important for identifying the age and
version of files.
By keeping all added tools and products in a few easily
identifiable locations it's easy to create a tar file that I call
postinstall.tar. This is created by a
trivial script because
I want tar to start in different directories and directory
levels. For example all the backups go into /var/local/backup so
obviously I don't want to back this directory up, or I'd end up
with tars inside of tars.
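The script itself isn't reproduced here, but a minimal sketch of
the idea follows; the subdirectory names under /var/local are
assumptions, not my actual layout:

#!/bin/sh
# Sketch only; the real script and the /var/local subdirectory
# names differ. Tar is started in different directories so the
# backup directory itself is never included.
BK=/var/local/backup/postinstall.tar
( cd /usr && tar -cf $BK local )
( cd / && tar -rf $BK home )
( cd /var/local && tar -rf $BK bin etc lib )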
Each time I add a product, I need to review the script to see if the
new location is already included or if I need to add a line. Then
the script should be rerun to create a new postinstall.tar. So
far I've manually FTP'd the result over to the NT server but I
would automate this if the postinstall.tars became frequent. It's
now been almost a year since I did one.
For the nightly backup, UNIX's find has
an option to identify files newer than a named file. (Tar has a
newer than date option but the format doesn't seem to be
documented and I've never found the right format.) I created a
zero byte root only readable /etc/installed with a time stamp
just after the end of the last Linux install. A similar marker
file could be used following a full backup which could be done
instead of the postinstall tar but this would take a lot more
media.
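Creating such a marker file takes nothing more than:

touch /etc/installed      # zero byte marker; run right after the install finishes
chmod 600 /etc/installed  # root only readable

Touching the file again after any later full backup would move the
reference point forward.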
The nightly backup script uses find to drive tar to backup every
file that has been created or changed since the install.
Directories that should have only transient files like /proc,
/tmp and /var/run are avoided as well as /var/local/backup. This
script runs nightly from cron and creates two files in
/var/local/backup, that are named yymmdd.tar and yymmdd.log where
.log is the redirected output from tar. These files are
automatically FTP'd to the system with the CD-R drive.
This is a very inefficient script because tar is invoked separately
for each file that is backed up. It is not suitable for a
system that has a large number of files to be backed up.
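A sketch along those lines follows; it is illustrative rather than
the actual script, and the prune list is abbreviated:

#!/bin/sh
# Differential backup sketch: everything changed since /etc/installed.
cd /var/local/backup
DAY=`date +%y%m%d`
# Skip transient directories and the backup directory itself, then
# append each changed file to the day's tar. Note that tar runs
# once per file, which is the inefficiency noted above.
find / \( -path '/proc*' -o -path '/tmp*' -o -path '/var/run*' \
    -o -path '/var/local/backup*' \) -prune \
    -o -newer /etc/installed -type f \
    -exec tar -rvf $DAY.tar {} \; > $DAY.log 2>&1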
A Perl script is run nightly by
NT's AT to copy the daily backups from 4 UNIX systems to a CD
ROM. I use Adaptec's DirectCD which allows a CD-R to be used
like a floppy by adding files via any operating system command.
The backup CD ROM is simply left in the drive at all times that
it's not being used to create another CD. Periodically any
postinstall.tar file that has not already been archived will be
archived to CD-R media.
In addition to the backup files, this script copies all log
files that are created as part of the
Home Grown Intrusion Detection
to the CD-R. These are the cksums and wps directories in the
script.
Restoring any individual file is just a matter of
extracting it from the right tar. Because my disks are large
relative to my needs and the backup directory is skipped,
I tend to leave the backup tar files on-line for extended
periods. Restoring a really old file might require a search
of CD-ROMs. Restoring a lost system is a little more
complicated. First I'd have to do a new install repeating the
previous one. To be successful this depends on good
documentation that can be located. Then postinstall.tar would be
extracted over the fresh install and finally the most recent
daily backup would be extracted replacing all preexisting files.
Because the post install and daily tars are relatively small this
might be just as fast as a traditional system recovery where a
minimal OS install is overlaid with the latest system backup.
The problem comes from getting the OS install right. I trust
myself to do it but in a commercial environment this process is
too complicated and error prone to rely on potentially transient
staff.
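With the backups copied back from CD-R, the recovery described
above amounts to little more than the following; the mount point
and the dated file name are only examples:

cd /
tar -xpf /mnt/cdrom/postinstall.tar   # products added after the install
tar -xpf /mnt/cdrom/010221.tar        # most recent daily differential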
Conceptually it did not take very long to figure out how I wanted
to do the Linux backups. The only difficulties I ran into were
syntax difficulties where the sh being run by cron wouldn't work
with commands that worked fine from the command line under bash.
A little bit of trial and error had everything working in a few
hours.
OpenBSD
Backups on my OpenBSD 2.7 and 2.8 systems are conceptually
similar to the Red Hat Linux system in that they are differential
backups but the starting point is very different. Where the
Linux system is a very standard install, the OpenBSD systems are
extensively customized as described in my
Hardening OpenBSD Internet Servers section.
A heavily commented
backup script
is included in this section.
This may seem backwards to anyone who is familiar with the
security reputations of OpenBSD and Linux. Linux is the system
likely to be cracked, whereas a default BSD install will stand up to
some fairly serious attacks. Because of OpenBSD's security
reputation I selected it for my firewall which I wanted to make
particularly secure. As a result, I worked more extensively with
OpenBSD early and BSD's much simpler approach to system
organization and startup files just seemed more natural to learn.
Soon I hope to replace my 6.2 Red Hat system with a hardened
version of Red Hat 7.0.
There are some differences between the
Linux backup script and the
BSD backup
script. Though both depend on find's -newer capability, the
Linux script has multiple separate executions of find starting in
each top level directory to be backed up. This gives greater
control but requires several lines to skip the /var/local/backup
directory that contains on-line backups. The BSD script skips
the files in this directory, or more accurately the backup files
in it, by skipping all files that end with ".tar", ".gz"
or ".tgz", as well as "backup", the name of the tar file that is
being created. The BSD script does include the ".log" files
created by previous backups. The Linux
script provides finer control but with a greater risk of missing
something changed in an unexpected location.
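The linked script has the details, but the BSD style exclusion
works roughly like this; the marker file name is borrowed from the
Linux example:

# List everything changed since the marker file, skipping anything
# that looks like a backup file wherever it lives.
find / \( -name '*.tar' -o -name '*.gz' -o -name '*.tgz' \
    -o -name backup \) -prune \
    -o -newer /etc/installed -type f -print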
The ".log" files are also quite different. The Linux script
creates these on the fly as the tar file is being created.
These contain only file names of backed up files. The BSD
script uses tar's -t option to create a contents listing
from the tar file. This is much more detailed and larger,
as it includes the file attributes, owner, file size and
modification time like an "ls -l" listing.
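In terms of commands the difference is roughly the following, where
changed_file stands in for each file the Linux script appends:

# Linux style: file names only, captured as each changed file is appended
tar -rvf `date +%y%m%d`.tar changed_file >> `date +%y%m%d`.log
# BSD style: an "ls -l" like listing taken from the finished archive
tar -tvzf `date +%y%m%d`.tar.gz > `date +%y%m%d`.log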
Both invoke a simple script (below) and use ftp to send the backup
files to the system with the CD-R drive. The ftp "-n" option
prevents ftp from prompting for a user name and allows it to
be supplied via the ftp "user" command. The lines between
the two EOT's use the shell's here document feature to redirect
their content into ftp as a command sequence just as if
a user typed this content in an interactive session. The
UNIX shell expands `date +%y%m%d` so ftp gets the
current date as a six character yymmdd string followed
by .tar.gz and .log.
cd /var/local/backup
ftp -n 10.145.92.17 <<EOT
user unprivileged_username password
cd backups/system_name
binary
put `date +%y%m%d`.tar.gz
ascii
put `date +%y%m%d`.log
bye
EOT
As this user's password lies around in plain text on the systems
that send files, the user has virtually no privileges on the
receiving system. Though technically this user can "login" on the
NT server, if they do, after some error messages, a blank blue
green screen appears and no GUI interface or commands are available.
As an ftp user, they can neither see nor modify the transferred
files after the transfer completes. The only directories
available to the user are "backups/system_name" where system_name
matches the hostname of the sending system. The tar files are
transferred in binary mode to preserve their integrity.
NT
In contrast to the UNIX systems' backups, which were done in a
few hours, I probably pondered over how to tackle the NT backup
for hours over a period of a couple weeks before I even started
working on it. For starters there is the unbelievable directory
structure of \WINNT. After five years of working with NT, I still
can't get over that even Microsoft could dump
Profiles and System32\config into the same directory tree that
the system executables are in.
One of the first things I learned
when I started managing systems was to separate programs from
data and preferably system from application programs and system
configuration from user data. \WINNT is the biggest confused
jumble of all kinds of files that I've ever seen. Profiles has
the most volatile user specific configuration files and even web
browser cache files!!! in the system directory tree. It's normal
for third parties to dump their .DLLs into System32. Web log
files by default go under System32.
This is like taking /etc and breaking it into several pieces and
mixing it with all the user configuration files from each /home
directory and stirring in the bins and sbins and adding some
chunks from /var and pseudo randomly pouring it all back out to
see where it lands. Microsoft doesn't even give you any choice
about it. Much of this has to be set up this way. This is one
of the clearest examples of Microsoft hurting its own product
just to make it artificially harder to port a product first
developed on Windows to any other system.
Because of this irrational directory design, there is no way to
separate the backup of the original OS install from the things
added and changed later. About the only safe starting point is a
full system backup (or as much as NT will let you get). I've
used a .zip product to put the entire contents of C: into
c_drive.zip on D: and D: into d_drive.zip on C: and then these
are copied to CD-R media. This cannot get the registry and some
other key system files. The only tool that NT provides that will
let you conveniently backup the registry is NTBACKUP but this
only works if the system has a tape drive. Backups to disk or
any other removable media simply aren't supported.
The workaround that I use to ensure a full registry backup is to
boot the system into a
second minimal NT install on the
same partition as the live one. This assures that all system
files in the live system are closed when the .zip product reads
them.
Because the .zip product is a graphical product with no command
line interface, the above procedure needs to be done manually
from time to time. After full backups to .zip files are made,
the archive attribute is turned off on all files using
the command line program, attrib. When a
file is modified, its archive bit is automatically turned on
by the operating system, so
changed files can be easily identified. However, attrib
won't change the archive attribute on hidden, system or read only files.
Microsoft has made sure that there are hundreds of desktop.ini
and zero byte mscreate.dir files scattered around that attrib
won't work on. To reset the archive bit with attrib, you have
to also turn off the hidden, system and read only attributes on the same
pass.
There are other ways provided by Microsoft to turn off the
archive attribute without changing system, hidden or read only
properties. Winfile can make the change just fine but this
requires laboriously working up and down the directory trees
doing a file or few at a time. Winfile can reset any or all
attributes in any combination. Attrib is derived from old DOS
code and has never been fully updated. Apparently it never
occurred to Microsoft as a corporate entity that someone who knows
what they are doing might actually have a reason for changing an
attribute on a whole directory tree quickly with a single command
from a command prompt.
I can't think of a single Windows command line tool that has
exactly the same capabilities as its GUI counterpart. In UNIX
like systems when you have a GUI management tool, it's normally a
user friendly interface to the same utility that would be used
from the command line so they naturally do the same things.
Windows command line and GUI counterparts seem to always be
completely separate code, designed and written by different people
at different times and with subtle to sometimes major functional
differences.
NT's best backup tool, NTBACKUP, won't work without a tape drive
leaving backup and xcopy as candidates to backup the changed
files. Both have an after date option but using this would
require editing a script each time a .zip backup was done. This
is not my idea of automation but with the benefit of hindsight
and nearly a year of backing up using the archive bit, it's clear I
chose the wrong method. Editing the script would have been
enormously easier than messing with archive attributes.
Both backup and xcopy can
use the archive attribute and leave it turned on. Both have
trouble with files that are open but xcopy has a flag to ignore
errors. So with xcopy, though you can't backup open files, at
least you can go on to the next file. Backup throws an illegal
instruction exception and an interactive dialog box making it
completely worthless. This is unfortunate because backup will
put backed up files in a single archive file where xcopy
replicates the file and directory structure.
If you don't have a tape drive, xcopy seems to be the only tool
provided with NT
that allows a backup of files changed after a full backup.
Unfortunately xcopy can't get the registry, and NT system
backups without the registry aren't worth the cost of the media
they occupy. You can back up user data but there is
little point in backing up applications without their registry
data. A few very clever applications know how to recreate all
their registry data with defaults if it's missing but most
Windows applications need to be installed to work properly. If
you've bought the NT Resource Kit (Server or Workstation) it
includes regback that lets you do a registry backup to disk.
To backup the registry, I created a \regback directory with sun,
mon . . . subdirectories. Each night the schedule service runs a
process to delete the appropriate day's old files and then runs
regback to save the registry. This keeps the seven most recent
days' registry available.
I used Perl to create a
backup script running xcopy. I used Perl
because I needed to manipulate dates and day abbreviations as
parts of file and directory names. Perl does this easily and NT
includes no adequate scripting language. Stringing a bunch of
system calls together is a pretty lame way to use Perl but this
is sometimes the only way that I've found to get certain things
done on Windows. The script runs xcopy against all the root
level directories that I want to back up. I can't run it against
the drive roots as each contains directories that I don't want
backed up. Changed files are copied to d:\backups\nt\day where
day is the three letter day of the week. Under day are c_drive
and d_drive directories containing the directories and files that
are backed up. Xcopy's output is redirected to a .log file.
After all changed files are copied to D: the CD-R drive is
checked and if it has media in it, the day directory and all
subdirectories and files are xcopied to f:\ntyymmdd. The .log
file is saved in this same directory.
Managing the CD-ROM
Adaptec DirectCD allows the CD-R drive to be used like a floppy
so any operating system command can move files to the drive. The
backup CD-R is always left in the drive. The only time it's
removed is when another CD needs to be made. As soon as the other
use is completed, the backup CD-R is returned to the drive,
assuring that it's available each night.
A three line batch file runs from NT's schedule (at) service
early each morning. It runs in interactive mode so the
output appears as a window on my workstation. The lines
beginning with the >> symbols belong to the preceding line:
d:\apps\cygwin\bin\date|tr \n " "
>> d:\sys\logs\freef.txt
dir f: |grep free |tr -s " "
>> d:\sys\logs\freef.txt
less d:\sys\logs\freef.txt
This batch file creates a small text file that I clear
manually occasionally. The last line of the text file shows
how much free space is left on the CD-R with the date and
time. The preceding line shows how much space was free the
day before. This is waiting for me each morning when I
start work. It only takes a few seconds looking at the
last few lines to know if there will be room for the
coming night's backup files. If not, the old CD-R is
closed and a new one formatted.
It's worth noting that the batch file makes use of four programs
from the
Cygwin toolkit,
a large collection of open source, UNIX utilities, ported to NT.
One CD-R will last one to three weeks. When one is being
filled in five or six days, I look at the systems to see which
one has grown the most and needs a full backup to reduce the
size of the daily differential backups. The NT server grows to about
50MB of changed files in three months. The Linux web server grew
to 33MB per day in mid December 2000 from its last full
backup in early summer. The BSD web server grew to about 10MB a day
from late last summer when it was upgraded from OpenBSD 2.6 to
2.7 until mid February 2001 when it was upgraded to OpenBSD 2.8.
In seven months the firewall has grown to 9MB per day, not
including the firewall logs that are handled separately.
System Comparison
Comparing the two systems, the NT .zip process is somewhat
comparable to the post install process on Linux in that both are
manual but the .zip process takes 20 times or so as long with
many manual steps. I could easily cron the post install to run
nightly or weekly so it will always be up-to-date except when a
new directory tree needs to be included. The Linux backup
process has a simple result that was accomplished in a few hours
using only tools that came with Linux.
The NT process depends on four additional products not included
with NT, a .zip product, the Resource Kit regback, Perl and the
Cygwin toolkit. It also depends on local archive media where
Linux and other open source OSs typically include FTP, NFS and
Samba for sharing / accessing disks with other systems. NT
includes FTP but FTP can't transfer a directory tree and NT
includes no tool to put backup files in a single archive file.
The NT result took 2 to 3 times as long to figure out and
implement. It takes much more work to keep the NT system backed
up than the UNIX systems.
In NT's favor is much wider hardware and application support than
any UNIX system. The Adaptec DirectCD software makes managing
a CD-R drive very easy. I don't know if there are comparable
counterparts for Linux or OpenBSD. Even if such software is
not available, it would be simple to copy backup files to a
system other than the one on which they were created and when
enough to fill a CD-R were collected, make a CD. Everything but
the physical burning of the CD-R could be fully automated using
only tools provided with the operating system. If the command
to burn the CD were too difficult to easily remember and type
it could be put in a script or alias.
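For example, assuming the widely available mkisofs and cdrecord
utilities, a burn script might be as simple as the following; the
archive directory, device numbers and speed are assumptions:

# Gather the collected backup files into an ISO image and burn it.
mkisofs -r -o /tmp/backup.iso /var/local/backup/archive
cdrecord -v speed=4 dev=0,0,0 /tmp/backup.iso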