
Cheap Backup Solutions

Linux
OpenBSD
NT
Managing the CD-ROM
System Comparison

The backup approach discussed below is not one that I'd typically recommend for a business. I've included backup scripts from Linux and OpenBSD systems as well as the Windows NT server for those interested (2/21/01). The approach is well suited to my somewhat unique computer setup. It requires a solid knowledge of the systems and their setups, but lets me keep 5 systems fully backed up for less than ten cents a night, total, in media costs. It's based on doing differential backups following standard system installs or infrequent full system backups. Backup files are automatically transferred to the system with the writable CD-R drive and later written to CD-R media.

The most important thing for me to back up is my work product, such as my web site and book contents. These are authored on my workstation, which I bought with a high capacity tape drive. I set up an automated nightly backup in the first few days I had the PC and it's saved me more than once. The web contents are also copied to multiple other machines.

When I started with my web servers I didn't really think I needed to back them up. They were basic OS installs with web servers and web contents duplicated from other machines. It wasn't until recent problems that I realized how much time was invested in a variety of server configuration changes as well as various scripts and shell customizations. Unlike most businesses, my web site(s) are not mission critical. In fact they're nothing more than marketing vehicles with no proven value. If I have to take one down for some reason or lose one and it takes a while to get it back, it's not a major issue. I do try to cover the IP address with one of the other servers so people don't get broken links.

My preferred configuration in a commercial environment is to have enough backup capacity in terms of both storage capacity and backup speed that every system can be backed up fully every night and never deal with incremental or differential backups. In the GeodSoft environment one of my main goals is to keep the cost per machine to an absolute minimum so high capacity backup is out of the question due to both hardware and media costs. The other is that it must be as close to fully automated as possible. Over time any manual solution is likely to be neglected. Manual backups are most likely to be neglected when they're needed most, such as when you're really involved in a new or challenging project.

Each of my servers has started with a well documented OS install. Each has a few additional products added to it. Beyond these there are a comparative handful of scattered files that are modified sporadically over a period of time. What I need is a way to restore previous configurations if I botch a change I'm making and to get back to where I was in the event of a disk failure or other catastrophic loss.

If I can reliably repeat the OS install then there is no need to back up any of the files that are created as part of the original install. Thus all that's really needed is a way to back up the products that are added later, plus all the system and user configuration files that change, scripts that are written, and any logs or other system output that has more than transient value.

Linux

A Red Hat Linux 6.2 reinstall gave me an opportunity to fix something that I'd been initially careless about - where I put tools and products added after the initial install. To keep this manageable, since the reinstall everything I add goes into /usr/local, /var/local or /home as appropriate. If I were willing to change the modification time stamp on everything added to the system, it would be easy to combine the backup of individually changing files with added products. This is a poor approach because the modification time stamp is simply too important for identifying the age and version of files.

By keeping all added tools and products in a few easily identifiable locations it's easy to create a tar file that I call postinstall.tar. This is created by a trivial script because I want tar to start in different directories and directory levels. For example, all the backups go into /var/local/backup, so obviously I don't want to back this directory up or I'd end up with tars inside of tars.

Each time I add a product, I need to review the script to see if the new location is already included or if I need to add a line. Then the script should be rerun to create a new postinstall.tar. So far I've manually FTP'd the result over to the NT server, but I would automate this if the postinstall.tars became frequent. It's now been almost a year since I did one.
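A minimal sketch of what such a postinstall script might look like follows. The directory list and archive location are assumptions based on the description above, not the actual script.

#!/bin/sh
# Sketch only: rebuild postinstall.tar from the places where added
# tools and products are kept.
BACKUP=/var/local/backup

# Start the archive from /usr so only the local tree is picked up.
cd /usr && tar cf $BACKUP/postinstall.tar local

# Append /home starting from the root directory.
cd / && tar rf $BACKUP/postinstall.tar home

# Append /var/local, skipping the backup directory itself so there
# are no tars inside of tars.
cd /var && for d in `ls local | grep -v '^backup$'`
do
    tar rf $BACKUP/postinstall.tar local/$d
done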

For the nightly backup, UNIX's find has an option to identify files newer than a named file. (Tar has a newer-than-date option but the format doesn't seem to be documented and I've never found the right format.) I created a zero byte, root-only-readable /etc/installed with a time stamp just after the end of the last Linux install. A similar marker file could be used following a full backup, which could be done instead of the postinstall tar, but this would take a lot more media.
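Creating the marker file is a one time step along these lines (the 600 mode gives the "root only readable" permissions):

# zero byte marker file, dated just after the install finished
touch /etc/installed
chmod 600 /etc/installed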

The nightly backup script uses find to drive tar to back up every file that has been created or changed since the install. Directories that should have only transient files, like /proc, /tmp and /var/run, are avoided, as is /var/local/backup. This script runs nightly from cron and creates two files in /var/local/backup named yymmdd.tar and yymmdd.log, where the .log is the redirected output from tar. These files are automatically FTP'd to the system with the CD-R drive.

This is a very inefficient script because tar is invoked separately for each file that is backed up. It is not suitable for a system that has a large number of files to be backed up.
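A rough sketch of the nightly script follows. The pruned directories, file names and the per-file tar invocation come from the description above; the exact find expression and other details are assumptions.

#!/bin/sh
# Sketch only: differential backup of everything changed since the
# /etc/installed marker file was created.
DATE=`date +%y%m%d`
BACKUP=/var/local/backup
ARCHIVE=$BACKUP/$DATE.tar

cd /
find . -path ./proc -prune -o -path ./tmp -prune -o \
    -path ./var/run -prune -o -path ./var/local/backup -prune -o \
    -type f -newer /etc/installed -print |
while read f
do
    # tar runs once per backed up file, which is the inefficiency
    # noted above; -r appends to an existing archive.
    if [ -f $ARCHIVE ]
    then
        tar rf $ARCHIVE "$f"
    else
        tar cf $ARCHIVE "$f"
    fi
    echo "$f"
done > $BACKUP/$DATE.log 2>&1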

A Perl script is run nightly by NT's AT to copy the daily backups from 4 UNIX systems to the CD-R. I use Adaptec's DirectCD, which allows a CD-R to be used like a floppy by adding files via any operating system command. The backup CD-R is simply left in the drive at all times that it's not being used to create another CD. Periodically any postinstall.tar file that has not already been archived will be archived to CD-R media.

In addition to the backup files, this script copies all log files that are created as part of the Home Grown Intrusion Detection to the CD-R. These are the cksums and wps directories in the script.

Restoring any individual file is just a matter of extracting it from the right tar. Because my disks are large relative to my needs and the backup directory is skipped, I tend to leave the backup tar files on-line for extended periods. Restoring a really old file might require a search of CD-ROMs. Restoring a lost system is a little more complicated. First I'd have to do a new install repeating the previous one. To be successful this depends on good documentation that can be located. Then postinstall.tar would be extracted over the fresh install and finally the most recent daily backup would be extracted replacing all preexisting files.
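Assuming the paths used above, restoring a single file and rebuilding a lost system look roughly like this, where yymmdd.tar stands for the most recent daily backup:

# pull one file back out of a daily tar
cd /
tar xf /var/local/backup/yymmdd.tar path/to/file

# after repeating the documented OS install on a lost system
cd /
tar xpf /var/local/backup/postinstall.tar
tar xpf /var/local/backup/yymmdd.tar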

Because the post install and daily tars are relatively small this might be just as fast as a traditional system recovery where a minimal OS install is overlaid with the latest system backup. The problem comes from getting the OS install right. I trust myself to do it but in a commercial environment this process is too complicated and error prone to rely on potentially transient staff.

Conceptually it did not take very long to figure out how I wanted to do the Linux backups. The only difficulties I ran into were syntax issues where the sh being run by cron wouldn't work with commands that worked fine from the command line under bash. A little bit of trial and error had everything working in a few hours.

OpenBSD

Backups on my OpenBSD 2.7 and 2.8 systems are conceptually similar to the Red Hat Linux system in that they are differential backups but the starting point is very different. Where the Linux system is a very standard install, the OpenBSD systems are extensively customized as described in my Hardening OpenBSD Internet Servers section. A heavily commented backup script is included in this section.

This may seem backwards to anyone who is familiar with the security reputations of OpenBSD and Linux. Linux is the system likely to be cracked, where a default BSD install will stand up to some fairly serious attacks. Because of OpenBSD's security reputation I selected it for my firewall, which I wanted to make particularly secure. As a result, I worked more extensively with OpenBSD early, and BSD's much simpler approach to system organization and startup files just seemed more natural to learn. Soon I hope to replace my 6.2 Red Hat system with a hardened version of Red Hat 7.0.

There are some differences between the Linux backup script and the BSD backup script. Though both depend on find's -newer capability, the Linux script has multiple separate executions of find starting in each top level directory to be backed up. This gives greater control but requires several lines to skip the /var/local/backup directory that contains on-line backups. The BSD script skips the files in this directory, or more accurately the backup files in the directory, by skipping all files that end with ".tar", ".gz" or ".tgz" as well as "backup", the name of the tar file that is being created. The BSD script does include the ".log" files created as part of previous backups. The Linux script provides finer control but with a greater risk of missing something changed in an unexpected location.

The ".log" files are also quite different. The Linux script creates these on the fly as the tar file is being created. These contain only file names of backed up files. The BSD script uses tar's -t option to create a contents listing from the tar file. This is much more detailed and larger, as it includes the file attributes, owner, file size and modification time like an "ls -l" listing.

Both invoke a simple script (below) and use ftp to send the backup files to the system with the CD-R drive. The ftp "-n" option prevents ftp from prompting for a user name and allows it to be supplied via the ftp "user" command. The lines between the two EOT's use UNIX's here-document ability to redirect their content into ftp as a command sequence, just as if a user had typed this content in an interactive session. The UNIX shell expands `date +%y%m%d` so ftp gets the current date as a six character yymmdd string followed by .tar.gz and .log.

cd /var/local/backup
ftp -n 10.145.92.17 <<EOT
user unprivileged_username password
cd backups/system_name
binary
put `date +%y%m%d`.tar.gz
ascii
put `date +%y%m%d`.log
bye
EOT

As this user's password lies around in plain text on the systems that send files, the user has virtually no privileges on the receiving system. Though technically this user can "log in" on the NT server, if they do, after some error messages, a blank blue-green screen appears and no GUI interface or commands are available. As an ftp user, they can neither see nor modify the transferred files after the transfer completes. The only directories available to the user are "backups/system_name", where system_name matches the hostname of the sending system. The tar files are transferred in binary mode to preserve their integrity.

NT

In contrast to the UNIX system's backup which was done in a few hours, I probably pondered over how to tackle the NT backup for hours over a period of a couple weeks before I even started working on it. For starters there is the unbelievable directory structure of \WINNT. After five years of working with NT, I still can't get over that even Microsoft could dump Profiles and System32\config into the same directory tree that the system executables are in.

One of the first things I learned when I started managing systems was to separate programs from data and, preferably, system from application programs and system configuration from user data. \WINNT is the biggest confused jumble of all kinds of files that I've ever seen. Profiles puts the most volatile user specific configuration files, and even web browser cache files (!), in the system directory tree. It's normal for third parties to dump their .DLLs into System32. Web log files by default go under System32.

This is like taking /etc and breaking it into several pieces, mixing it with all the user configuration files from each /home directory, stirring in the bins and sbins, adding some chunks from /var and pseudo-randomly pouring it all back out to see where it lands. Microsoft doesn't even give you any choice about it; much of this has to be set up this way. This is one of the clearest examples of Microsoft hurting its own product just to make it artificially harder to port a product first developed on Windows to any other system.

Because of this irrational directory design, there is no way to separate the backup of the original OS install from the things added and changed later. About the only safe starting point is a full system backup (or as much as NT will let you get). I've used a .zip product to put the entire contents of C: into c_drive.zip on D: and of D: into d_drive.zip on C:, and then these are copied to CD-R media. This cannot get the registry and some other key system files. The only tool that NT provides that will let you conveniently back up the registry is NTBACKUP, but this only works if the system has a tape drive. Backups to disk or any other removable media simply aren't supported.

The workaround that I use to ensure a full registry backup is to boot the system into a second minimal NT install on the same partition as the live one. This assures that all system files in the live system are closed when the .zip product reads them.

Because the .zip product is a graphical product with no command line interface, the above procedure needs to be done manually from time to time. After full backups to .zip files are made, the archive attribute is turned off on all files using the command line program attrib. When a file is modified, its archive bit is automatically turned on by the operating system, so changed files can be easily identified. However, attrib won't change the archive attribute on hidden, system or read only files. Microsoft has made sure that there are hundreds of desktop.ini and zero byte mscreate.dir files scattered around that attrib won't work on. To reset the archive bit with attrib, you have to also turn off the hidden, system and read only attributes on the same pass.

There are other ways provided by Microsoft to turn off the archive attribute without changing the system, hidden or read only properties. Winfile can make the change just fine, but this requires laboriously working up and down the directory trees doing a file or a few at a time. Winfile can reset any or all attributes in any combination. Attrib is derived from old DOS code and has never been fully updated. Apparently it never occurred to Microsoft as a corporate entity that someone who knows what they are doing might actually have a reason for changing an attribute on a whole directory tree quickly with a single command from a command prompt.

I can't think of a single Windows command line tool that has exactly the same capabilities as its GUI counterpart. In UNIX-like systems, when you have a GUI management tool, it's normally a user friendly interface to the same utility that would be used from the command line, so they naturally do the same things. Windows command line and GUI counterparts seem to always be completely separate code, designed and written by different people at different times and with subtle to sometimes major functional differences.

NT's best backup tool, NTBACKUP, won't work without a tape drive, leaving backup and xcopy as candidates to back up the changed files. Both have an after-date option, but using this would require editing a script each time a .zip backup was done. This is not my idea of automation, but with the benefit of hindsight and nearly a year of backing up using the archive bit, it's clear I chose the wrong method. Editing the script would have been enormously easier than messing with archive attributes.

Both backup and xcopy can use the archive attribute and leave it turned on. Both have trouble with files that are open, but xcopy has a flag to ignore errors. So with xcopy, though you can't back up open files, at least you can go on to the next file. Backup throws an illegal instruction exception and an interactive dialog box, making it completely worthless. This is unfortunate because backup will put backed up files in a single archive file where xcopy replicates the file and directory structure.

If you don't have a tape drive, xcopy seems to be the only tool provided with NT that allows a backup of files changed after a full backup. Unfortunately xcopy can't get the registry, and NT system backups without the registry aren't worth the cost of the media they occupy. You can back up user data, but there is little point in backing up applications without their registry data. A few very clever applications know how to recreate all their registry data with defaults if it's missing, but most Windows applications need to be installed to work properly. If you've bought the NT Resource Kit (Server or Workstation), it includes regback, which lets you do a registry backup to disk.

To back up the registry, I created a \regback directory with sun, mon . . . subdirectories. Each night the schedule service runs a process to delete the appropriate day's old files and then runs regback to save the registry. This keeps the seven most recent days' registry backups available.

I used Perl to create a backup script running xcopy. I used Perl because I needed to manipulate dates and day abbreviations as parts of file and directory names. Perl does this easily and NT includes no adequate scripting language. Stringing a bunch of system calls together is a pretty lame way to use Perl but this is sometimes the only way that I've found to get certain things done on Windows. The script runs xcopy against all the root level directories that I want to back up. I can't run it against the drive roots as each contains directories that I don't want backed up. Changed files are copied to d:\backups\nt\day where day is the three letter day of the week. Under day are c_drive and d_drive directories containing the directories and files that are backed up. Xcopy's output is redirected to a .log file.

After all changed files are copied to D: the CD-R drive is checked and if it has media in it, the day directory and all subdirectories and files are xcopied to f:\ntyymmdd. The .log file is saved in this same directory.

Managing the CD-ROM

Adaptec DirectCD allows the CD-R drive to be used like a floppy so any operating system command can move files to the drive. The backup CD-R is always left in the drive. The only time it's removed is when another CD needs to be made. It's automatic that when the other use is completed, the backup CD-R is returned to the drive assuring that it's available each night.

A three line batch file runs from NT's schedule (at) service early each morning. It runs in interactive mode so the output appears as a window on my workstation:

d:\apps\cgywin\bin\date|tr \n " " >> d:\sys\logs\freef.txt
dir f: |grep free |tr -s " " >> d:\sys\logs\freef.txt
less d:\sys\logs\freef.txt

This batch file creates a small text file that I clear manually occasionally. The last line of the text file shows how much free space is left on the CD-R, with the date and time. The preceding line shows how much space was free the day before. This is waiting for me each morning when I start work. It only takes a few seconds looking at the last few lines to know if there will be room for the coming night's backup files. If not, the old CD-R is closed and a new one formatted.

It's worth noting that the batch file makes use of four programs from the Cygwin toolkit, a large collection of open source, UNIX utilities, ported to NT.

One CD-R will last one to three weeks. When one is being filled in five or six days, I look at the systems to see which one has grown the most and needs a full backup to reduce the size of the daily differential backups. The NT server grows to about 50MB of changed files in three months. The Linux web server had grown to 33MB per day by mid December 2000 from its last full backup in early summer. The BSD web server grew to about 10MB a day from late last summer, when it was upgraded from OpenBSD 2.6 to 2.7, until mid February 2001 when it was upgraded to OpenBSD 2.8. In seven months the firewall has grown to 9MB per day, not including the firewall logs, which are handled separately.

System Comparison

Comparing the two systems, the NT .zip process is somewhat comparable to the post install process on Linux in that both are manual, but the .zip process takes 20 times or so as long, with many manual steps. I could easily cron the post install tar to run nightly or weekly so it would always be up to date except when a new directory tree needs to be included. The Linux backup process has a simple result that was accomplished in a few hours using only tools that came with Linux.

The NT process depends on four additional products not included with NT: a .zip product, the Resource Kit's regback, Perl and the Cygwin toolkit. It also depends on local archive media, where Linux and other open source OSs typically include FTP, NFS and Samba for sharing or accessing disks on other systems. NT includes FTP, but FTP can't transfer a directory tree, and NT includes no tool to put backup files in a single archive file. The NT result took 2 to 3 times as long to figure out and implement. It takes much more work to keep the NT system backed up than the UNIX systems.

In NT's favor is much wider hardware and application support than any UNIX system. The Adaptec DirectCD software makes managing a CD-R drive very easy. I don't know if there are comparable counterparts for Linux or OpenBSD. Even if such software is not available, it would be simple to copy backup files to a system other than the one on which they were created and when enough to fill a CD-R were collected, make a CD. Everything but the physical burning of the CD-R could be fully automated using only tools provided with the operating system. If the command to burn the CD were too difficult to easily remember and type it could be put in a script or alias.
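For example, with the mkisofs and cdrecord tools shipped with most Linux distributions of that era, the burn itself could be reduced to a couple of commands. The staging directory and the dev= address below are assumptions; cdrecord -scanbus reports the correct address for a given drive.

#!/bin/sh
# Sketch only: build an ISO image from the accumulated backup files
# and burn it to CD-R.
STAGE=/var/local/cdstage

mkisofs -r -o /tmp/backup.iso $STAGE
cdrecord -v dev=0,0,0 /tmp/backup.iso
rm /tmp/backup.iso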



Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 