I am developing a script to automatically synchronize almost identical
web sites on Linux, Windows NT and OpenBSD web servers. The only
differences between the sites are the graphics identifying each web
server and OS, and the links to the other sites. I've chosen to keep
the sites almost identical, but anything outside the unique page
content could be very different on the three sites.
Also, even though the bulk of the content of matching files on
each of the three web sites is the same, each HTML document has
site-specific areas that identify the OS and web server on which
the document resides. Each site could have a different color
scheme or even different navigation aids, if that were desired.
The files are never in sync the way a utility that
synchronizes disks or directories understands synchronization.
Following each transfer, the standardization process needs to be
run against each file that's been moved. It should process only the
moved files, not the whole site. Otherwise, the overhead of
reprocessing the whole site each time there is a minor file change
anywhere in it would negate one of the real advantages of the
current method: serving static pages that need adjustment only
when their content changes or they are moved to a new system.
The approach that I chose makes use of a second, typically empty,
directory tree that matches the directory tree of the web sites. On the
development (sending) site, files are copied to the appropriate directory
when they are ready for display. A background process periodically
scans this directory tree and, when it finds a file or files, sets up
a transfer. It does this by writing a text file that contains
the FTP commands to move the files to the correct locations. As the
script walks the transfer directory tree, it writes pairs of cd and
lcd commands. Based on the file extension, it writes an ascii or binary
command for each file, then a put command with the file's name. The
cd and lcd commands include ".." as the script works back up the
directory tree. For each destination site, ftp is invoked and
the FTP command script is piped to ftp.
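A minimal sketch of such a generator in Perl might look like the
following; the transfer path, login and binary extension list are
placeholders rather than the actual script:

    #!/usr/bin/perl -w
    # Sketch only: walks the transfer tree and writes an FTP command
    # script; the path, login and extensions are examples.
    use strict;

    my $xfer = "/home/httpd/transfer";   # hypothetical transfer tree
    my %binary;
    $binary{$_} = 1 foreach qw(gif jpg png zip);

    open(CMDS, ">ftpcmds.txt") or die "ftpcmds.txt: $!";
    print CMDS "user webuser secret\n";  # ftp is run with -n, so log in here
    walk($xfer);
    print CMDS "quit\n";
    close(CMDS);

    sub walk {
        my ($dir) = @_;
        opendir(DIR, $dir) or die "$dir: $!";
        my @names = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
        closedir(DIR);
        foreach my $name (@names) {
            my $path = "$dir/$name";
            if (-d $path) {
                # matching cd/lcd pair going down, ".." coming back up
                print CMDS "lcd $name\ncd $name\n";
                walk($path);
                print CMDS "lcd ..\ncd ..\n";
            } elsif (-f $path) {
                my ($ext) = $name =~ /\.([^.]+)$/;
                my $mode = ($ext && $binary{lc $ext}) ? "binary" : "ascii";
                print CMDS "$mode\nput $name\n";
            }
        }
    }

The generated command file can then be fed to the client, for example
ftp -n www.example.com < ftpcmds.txt on the UNIX systems; NT's ftp.exe
accepts the same script with -s:ftpcmds.txt instead of redirection.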
After the transfers are complete, the directory tree is cleared of files.
This is the cause of one of the two serious problems with the existing
process. If other files are copied into the transfer directory while
a transfer is in progress, they may be erased without being transferred.
As I wrote this, a solution occurred to me: write a second
script (or batch file on NT) that mirrors the directory walk and
erases only the individual files identified for transfer during that run.
The same approach can be used on the receiving side, where there
is a problem if an additional file or files are delivered after the
receiving side script has been built but before it finishes executing.
The receiving side script is somewhat like the FTP command script but
invokes the Perl standardization script for each transferred file.
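A minimal sketch of both fixes, assuming the sending side walk pushes
each file it schedules onto a list called @sent and the receiving side
builds a matching @received list (both names are hypothetical):

    # Sending side: remove only the files this run actually scheduled,
    # so anything copied in while the transfer ran survives for the
    # next run.
    foreach my $path (@sent) {
        unlink($path) or warn "could not remove $path: $!";
    }

    # Receiving side: standardize only the files that arrived in this
    # batch; standardize.pl stands in for the real script's name.
    foreach my $file (@received) {
        system("perl standardize.pl $file") == 0
            or warn "standardize failed on $file";
    }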
Fairly early in the process of testing I accidentally moved rather than
copied some files from the development site to the transfer directory
(drag rather than control-drag). The files were erased. Fortunately I
had them on backups, but if I had worked on those files that day, the
changes would not have been backed up. I decided to make a backup
directory on the sending system to which all transferred files were
copied but not erased. I also decided to make a backup on the receiving
system before processing the files. It seemed like a good idea at the
time, but the result was that I ended up working with 15 sets of almost
identical directory trees, 12 of them in active use during each test.
Since I wasn't going to
work in the real directories being used by the web servers
until the transfer process was fairly thoroughly debugged, I had to
create a test directory to act as the destination directory on each
receiving system.
The destination directory had to have the necessary Perl scripts and
library files, as well as the text files from which the navigation,
search form and platform descriptions were drawn. It needed to be
empty of other files so I could easily tell which files were
actually transferred. Later, when files were deliberately transferred
over previously transferred copies, I had to rely on time stamps. At the
beginning of each set of tests I wanted things as clean as practical.
With a sending or destination directory, a transfer directory and
a backup directory in active use on each system, it became quite
tricky to keep track of which system I was on and exactly where
I was in the testing process. There were as many delays from testing
mistakes as from fixing problems in the scripts.
The first rounds of testing were done entirely between the NT workstation
and the Red Hat Linux system. Linux performed as expected, and the
problems I dealt with were entirely related to script and testing
issues. After I thought
I was pretty close to a workable solution, the BSD system was brought into
the testing. For hours, I could not even get the scripts to run. I built
simplified test scripts and dumped the entire environment to files trying
to figure out what was going on. I assumed there was something wrong with
the script or my setup.
It took more than a day to conclude cron was never even trying to start
the jobs in the first place. I reached this conclusion after I was
consistently able to start jobs by not specifying a starting time but
simply using asterisks to start a job every minute. I'd watch the
background processes until I saw my job, then edit crontab to stop
more from starting.
I always got two and killed one. The man pages for at say there's a bug
related to starting jobs. Apparently cron doesn't get the crontab updates
instantaneously either; there is a small delay. Normally when testing
background jobs, after testing the basic logic in the foreground, I go
to crontab and/or at to schedule the job in the background and deal
with environmental issues. To minimize testing time I always try to
kick off the job on the next minute, unless I don't think I can get
the schedule saved in time. Every other system I've worked on gets
these updates immediately. OpenBSD 2.6 looks like it has a delay.
It's like scheduling a job to run at 12:01 when it's already 12:01:01:
it never runs because the time has passed (unless it runs the next day
or hour). Anything already in the schedule goes off when expected, and
if you give a few minutes' lead time jobs start when expected, but
trying to cut it close really costs you. OpenBSD 2.7 appears to have
fixed this.
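For reference, the every-minute debugging entry mentioned above looks
like this in crontab (the script path is just a placeholder):

    # hypothetical crontab entry: start the transfer script every minute
    * * * * * /home/httpd/bin/sendfiles.pl

With all five fields set to asterisks there is no start time to miss,
so any delay in cron picking up the change costs only a minute or two.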
The very first time I did a test with NT I got a file in the root
directory to transfer. I couldn't get any file from a subdirectory
to go across. I spent time experimenting with xcopy and cp syntax,
starting from the command line as well as from Perl. Eventually I
got back to exactly where I had started and it worked fine (for a
while), so it seemed there had been a test setup problem. After I'd
done about 8 successive transfers where everything worked exactly
as it should, I thought I had it. I copied 16 graphics files into
the transfer directory tree. They all went to the UNIX systems but
never got to the NT system. The exact same processes were still looping
in memory on all four systems (they're set for a 24 hour duration by
default). I never got another file to transfer to NT. The FTP server
on NT is fine. I don't have a clue what changed.
For the time being I've gone back to simply dragging the files over
to the NT system. Since they're developed with the NT setup I don't
need to run the standardize script. At some point I'll return to
this, but for now it's taken far too long. Once again NT displays
truly bizarre behavior, though I have to admit I spent much longer
on the BSD system. There, however, I always had some idea of what
to try next.