Perl and CGI, NT vs. Linux vs. BSD - 5/06/00
Differences implementing Perl CGI and site maintenance scripts on
Linux, Windows NT and OpenBSD systems. Like most of the pages in
this section, it has not been updated since it was written
and does not include things I've learned since. It is not
a systematic comparison of the operating systems or of the use of Perl
on them, but a narrative of specific problems encountered.
Update 5/08/00
In developing a script to aid in maintaining multiple versions of
the same web site on different platforms, it's clear that some
cross platform issues need to be worked out. A script that's going to use
different graphics, depending on where in a virtual web site the
page being updated is located, has to handle directory and
path names in a way that works across platforms. While
Perl mitigates some of the differences, it's not safe to assume that
logic that works where directories are separated by back slashes
will necessarily work where they are separated by forward slashes,
and vice versa. Before too much actual development is done, some
basic testing needs to be performed to be sure that certain key
elements will work on all systems.
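As a sketch of what that testing needs to cover, the standard File::Spec module (bundled with reasonably recent Perl 5 releases) builds and splits paths using whatever separator the local platform expects; the file and directory names below are hypothetical:

    #!/usr/bin/perl -w
    # Sketch: let File::Spec supply the platform's path separator
    # instead of hard coding forward or back slashes. The names
    # used here are hypothetical.
    use strict;
    use File::Spec;

    # Build a relative path from components; this joins with back
    # slashes on NT and forward slashes on UNIX.
    my $page = File::Spec->catfile('geodsoft', 'about', 'index.htm');
    print "page: $page\n";

    # Take a path apart again using the same local conventions.
    my ($vol, $dirs, $file) = File::Spec->splitpath($page);
    print "dirs: $dirs\nfile: $file\n";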
In addition to static pages, GeodSoft.com will at some point have
at least some dynamically generated pages. These need to
have the same design and navigation aids as the static pages. In
the past I have used a single function, shared by the standardization /
maintenance script and all CGI programs, to generate standard page
tops and bottoms. This site will be much more complex
because some components will change with where in the site the
page is located (physically for static pages and logically for
dynamic pages). Other components will vary with the web server and
OS platform.
While I may use other tools such as PHP or Zope for one or more of
the sites, so far most of my dynamic content creation has been done
with Perl and regardless of what other tools may be used it's bound
to be useful to be able to put up simple Perl CGI pages. My
development methodology is pretty emphatic about not hard coding
any site or directory location information that can be obtained by
other means. Specifically, I have succeeded in building
web sites so they could be moved not only from one machine
to another but also to different directory trees on the same machine.
This allows one machine to host live, development and experimental
versions of the same site, without having to make source code
changes as pieces are moved from one site to the other.
For this to work, the scripts must be able to
determine where they are physically located
and what the site root directory is. The key piece
to this is the PATH_TRANSLATED CGI/Environmental variable. I
decided to set up CGI directories on all three servers and see what
environmental variables are available on each server. For this
purpose I have simple Perl scripts that output minimal HTML
pages listing environmental variables and their values.
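A minimal version of such a script might look like the following sketch; the actual scripts differ in detail but not in spirit:

    #!/usr/bin/perl -w
    # Sketch of a CGI environment dump: print every variable the
    # web server passes to the script, one per line, in minimal HTML.
    use strict;

    print "Content-type: text/html\r\n\r\n";
    print "<html><head><title>CGI Environment</title></head><body><pre>\n";
    foreach my $name (sort keys %ENV) {
        print "$name = $ENV{$name}\n";
    }
    print "</pre></body></html>\n";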
Since I've installed NT web servers on 6 different machines and
done twice as many installs and upgrades, I started with NT. Here
it was a simple matter of creating a cgi-bin directory and copying
an existing script. I needed to check that script execution was
enabled in the new directory and with IIS 4, explicitly make an
association between the .pl extension and the Perl executable.
This took a few minutes and worked the first time.
In this case it was Apache, BSD and Linux that all managed to
surprise me. I created cgi-bin directories under the site root
directories, copied the script and made it executable. I set up
what I thought were the necessary Apache Directory and Option
directives to make the new directories CGI executable. In both
cases Apache said the requested URL could not be found. Further,
on both systems, even though "perl -cw cgienv3.pl" returns
the "cgienv3.pl syntax OK" message, the script blows up when
executed, with Perl reporting a syntax error after
several lines have run.
While I have about two more years of NT web experience, I still
have about four more years of UNIX experience, and three more years
of Perl on UNIX than on NT. I know that on both NT and UNIX systems,
the default installs are typically for the script directory to
be outside of the document root directory. I suppose someday
someone will explain to me when and why this makes sense, but it's
not very useful for the sites I've worked on. I feel very strongly
about having a production site that is separate from the test,
development or staging site, and nowhere is this more important
than in the script directories which are the ones that require
the most testing. Actually every web site needs at least three
sites because sooner or later nearly every site goes through an
overhaul where major changes are made to site structure, navigation
aids, styles etc. This process takes too long to stop work on
the production site so you need a third site during this period.
Whenever you're upgrading system or development software or
installing major new development products, it's very advantageous
to have separate machines to do the testing on but at other times
the live and development systems can coexist on the same machine
as the development site will rarely put a significant load on
a server. Even when they are always on separate machines, there
still comes the time for a site overhaul, and I am sure the large
majority of companies (there are many more little companies than
big ones) can't really afford a third web server machine.
Further, dynamic content is about custom content, so where does it
make better sense to have multiple versions of directories than in
the script directories? It seems clear to me that, though it may
not be necessary, it is advantageous if the script directories
go with the site and not the server. The only place a common
script directory makes sense is at an ISP where the customers don't
actually have direct access to modify the scripts but only to run
certain standard scripts that are configured with data that is
specific to each virtual site that uses the script.
I expect that I will work through the Apache documentation and
figure out what needs to be done to put a cgi-bin directory in each
virtual site. Frankly it seems almost bizarre that it takes
extra work to make a directory that is physically inside
a site work, compared to one that is outside it. Likewise, I'll eventually
figure out what's causing the Perl problems, and while this is
most likely a Perl and not an OS issue, this round clearly goes
to NT.
The Perl problem was trivial. Since the script was copied from
an NT environment it was missing the "#!/usr/bin/perl" first line.
It didn't take very long to find the AddHandler directive in the
Apache documentation. That looked like it should be the answer
but it wasn't. I've also tried specifying the directory by its
full path name and relative to the virtual root, with and without
a leading slash, none of which seems to work.
I went to the Apache FAQ and, after a long detour looking at
search products pointed to by the FAQ, returned to the CGI problem.
The "What to do in case of problems" entry mentioned the error log, and
a quick look there confirmed my suspicions. Apache is trying to
run CGI scripts out of the default cgi-bin directory and not the
new site specific one I'm trying to create in my virtual site.
I've looked at the ScriptAlias settings and documentation in
httpd.conf and it's pretty clear how I could change where Apache
is looking for CGI scripts but that's not what I want to do.
I want to leave the default site's cgi-bin directory as it is and
have a separate cgi-bin directory that goes with the virtual
site. This is trivial with IIS and should be with Apache, but
I can't find anything in the documentation to suggest how to go
about this.
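The most likely answer, though I haven't confirmed it yet, is that ScriptAlias can simply be placed inside the <VirtualHost> block for the site, along the lines of this sketch (the address and paths are hypothetical):

    # Sketch, assuming Apache 1.3; the address and paths are
    # hypothetical. A ScriptAlias inside the <VirtualHost> block
    # maps /cgi-bin/ URLs for this site to its own directory and
    # leaves the default site's cgi-bin alone.
    <VirtualHost 192.168.1.10>
        ServerName www.geodsoft.com
        DocumentRoot /var/www/geodsoft
        ScriptAlias /cgi-bin/ /var/www/geodsoft/cgi-bin/
    </VirtualHost>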
There doesn't seem to be any good concept or overview type
documentation that comes with Apache at all. This is hardly unusual
anymore; most documentation seems aimed at step by step mechanics.
As irritating as they are at so many things, Microsoft does have
good conceptual technical documentation, at least if you're willing
to pay for the Resource Kits. Of course you are paying infinitely
more just for this documentation than you pay for the whole Apache
package. I'll try the Apache site and if I don't find anything
then I guess I'll try one of the newsgroups, as suggested.
In the preceding paragraph, I forgot the obvious. There is plenty
of third party documentation available for Apache (as well as
NT.) I just ordered three books, O'Reilly's Apache the Definitive
Guide and Writing Apache Modules with Perl and C as
well as Wrox's Professional Apache for about the street
price of a typical Microsoft Resource Kit. BTW, I used Bookpool.com
which pretty consistently has the lowest prices on technical books
on the Internet. I've used them for several years. Their web site
can be a bit of a pain at times and their selection is not as big
as some of the better known sites, but they have really cheap prices
on what they carry and have always provided excellent service in
my experience.
I ended up deciding to try disabling the ScriptAlias directive
completely. I always use .pl extensions on my Perl scripts regardless
of the platform, and sometimes have data or other non-script files
in my cgi-bin directory, so I wasn't particularly thrilled
with the descriptions of what ScriptAlias accomplished, i.e. making
every file in the directory executable. The
interesting question will be can I make CGI scripts work in the
original cgi-bin directory at the same time as they are working in
the virtual directory for GeodSoft? For the moment, I'll leave
things as they are.
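For reference, what this amounts to in httpd.conf is roughly the sketch below, again with hypothetical paths; Options +ExecCGI enables CGI execution in the directory, while the AddHandler line limits it to the .pl extension so data and other non-script files are served normally:

    # Sketch, assuming Apache 1.3 and hypothetical paths: allow CGI
    # execution in the virtual site's cgi-bin, treating only files
    # ending in .pl as scripts.
    <Directory /var/www/geodsoft/cgi-bin>
        Options +ExecCGI
        AddHandler cgi-script .pl
    </Directory>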
I did just learn one more relevant item on the NT server. Even though
the site is public, it's prompting for passwords on the CGI script.
I hadn't noticed this before because NT authentication had been turned
on and basic authentication turned off, the IIS default, so my NT
workstation was automatically authenticating me. Now that I've
disabled both basic and NT authentication, allowing only anonymous
access, I am getting prompted for a username and password from all
machines. This is especially weird since there's no security on the
NT machine. By this I mean since it's still on a private network
not connected to anything, I haven't gotten around to changing the
default NT "security" which is for Everyone to have full access to
everything which is for all intents and purposes no security. I knew
I'd have to change this before getting my SDSL connection but I'm
very surprised at these password prompts now. I even added the
anonymous user with explicit read and execute access and am still
being prompted.
Talk about weird. Without changing a thing on the NT server,
it's now no longer prompting for passwords. While I wrote the
preceding paragraph, the password dialog box was on the screen.
When I finished writing the paragraph, I cancelled the dialog
and tried the URL again and it came up with no prompt. I haven't
even switched over to the NT server and am not running any management
software on this workstation. This reminds me of situations that
I constantly ran into when first working with CGI on NT in early
1997. There were a variety of unpredictable time delays before
some changes went into effect making it almost impossible to
test certain situations because you could not be sure whether
the change you just made or one you made a few minutes before
fixed the problem. This is the most frustrating thing about NT;
I've never seen another OS that is so unpredictable.
Even though I spent a lot more time on the UNIX like systems,
I now have to call this one a draw. If I hadn't stopped to write
about what was going on, there's no telling how long I might have
messed around with NT trying to figure out the problem.
This is unbelievable or would be if it wasn't Microsoft. I just
entered a second script name and was again prompted for a password
and when I cancelled the dialog box got a "401.3 Unauthorized:
Unauthorized due to ACL on resource" error message. I have two
similar scripts, one which loops through all environmental
variables and one which only looks for specific CGI environmental
variables. The NT security settings on both are identical.
Now NT is prompting for passwords on both. If I hadn't been
writing this just as it happened, I'd have to question my memory
or sanity.
Even worse, it won't take my valid Administrator password, a simple
password I type a dozen or more times a day. It only took a few
moments to find the answer when I went back to the NT machine. I
had started to put access controls on some directories including the
Perl directories, which were not available to the anonymous user.
Now that they are, I'm not getting password prompts on either the
NT workstation or Linux machine. This is good but leaves no explanation
for why one script came up once during this process.
Now that my script is working on two systems, the NT system is
returning PATH_TRANSLATED but not DOCUMENT_ROOT and the Linux
system is returning DOCUMENT_ROOT but not PATH_TRANSLATED.
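The difference can be papered over in a script. The sketch below assumes, and this is an assumption rather than something established above, that a server which sets PATH_TRANSLATED also sets PATH_INFO, and that PATH_TRANSLATED is just the document root with PATH_INFO appended:

    #!/usr/bin/perl -w
    # Sketch: find the site root whether the server provides
    # DOCUMENT_ROOT (as Apache does) or only PATH_TRANSLATED
    # (as IIS does). Assumes PATH_INFO accompanies PATH_TRANSLATED.
    use strict;

    my $root = $ENV{DOCUMENT_ROOT};
    if (!defined $root && defined $ENV{PATH_TRANSLATED}) {
        # Chop PATH_INFO's length off the end of PATH_TRANSLATED;
        # this sidesteps the forward versus back slash question.
        my $info = $ENV{PATH_INFO} || '';
        $root = substr($ENV{PATH_TRANSLATED}, 0,
                       length($ENV{PATH_TRANSLATED}) - length($info));
    }
    print "Content-type: text/plain\r\n\r\n";
    print defined $root ? "site root: $root\n" : "site root unknown\n";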
After making similar changes to Apache on the OpenBSD server
as I had on Linux, I got the following error message: "You don't
have permission to access /cgi-bin/cgienv.pl on this server." This
looks like the difference in security orientation between OpenBSD
and Red Hat Linux, and is the kind of difference that one
should expect between these two systems.
5/08/2000
Returning to NT, I added some use statements to one of my simple
test scripts. I hadn't run any CGI scripts on my workstation under
Personal Web Server (PWS) and thought I'd give it a try. It didn't
work. No surprise. After checking all the obvious settings and
seeing no reason for it not to work I moved it the NT server and
got a new error message: "CGI Error - The specified CGI application
misbehaved by not returning a complete set of HTTP headers. The
headers it did return are: Can't locate File/Basename.pm in @INC
(@INC contains: .) at d:\geodsoft\cgi-bin\ftest3.pl line 2.
BEGIN failed--compilation aborted at d:\geodsoft\cgi-bin\ftest3.pl
line 2."
One thing I do like about IIS is it has good error messages. In
particular, in addition to returning the server error message, it
typically returns the output from a script, even when the script is not
outputting valid HTML, which can really help with debugging. Perhaps
some will think this is a security weakness, but if you do your site
right, you'll have fixed these errors before the public ever has a chance
to see them.
This problem seems straightforward enough, except that it's been
so long since I had to configure this that I can't remember where
it's done, and I don't have any other properly configured NT servers
available that I can look at. It only takes a minute or two to find
the right section in the ActiveState documentation on installing
Perl on Win32. Unfortunately, this section is as close to pure
gibberish as any that I've seen in a long time. I understand
where something is supposed to go in the Windows Registry but
I haven't a clue what they're talking about as far as what is
supposed to go into the Registry.
I've actually gotten pretty good with the NT Registry, which I
think is the single stupidest computer technical "innovation"
of the past decade. If I hadn't dealt with this creation almost
every professional day for the past four years, I wouldn't believe
such a thing actually existed. It's a huge binary object whose
actual structure is completely undocumented. It's
absolutely crucial to almost everything that happens on an NT
machine, but the only officially supported interface to it is the
GUI tools provided by Microsoft. Everyone who works seriously with NT
knows that many things simply cannot be done without directly
editing the registry, yet if you make such edits, Microsoft
disavows any responsibility for any results that might ensue.
I am able to deduce from the ActiveState documentation that @INC
is at least in part derived from the environmental variables
PERLLIB or PERL5LIB. I just don't understand what is supposed to
go into \\HKLM\Software\Perl\lib or \\HKLM\Software\Perl\sitelib
to create or affect these environmental variables. Now system
environmental variables are one thing I do work with often enough
that I know exactly where to change the registry to create, change
or delete them. Microsoft has provided a simple GUI interface to
control user specific environmental variables but for some reason
doesn't think that system administrators need to add or remove
these at the system level; you can change existing values but not
add or remove variables. At least, in four years I haven't found any
Microsoft documentation or GUI tool that lets me change these values,
except Regedit and Regedt32, the unsupported registry editors.
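Before touching the registry, a quick sanity check is to ask each copy of Perl what its @INC actually contains; from a command prompt:

    perl -e "print join(qq{\n}, @INC), qq{\n}"

If PERLLIB or PERL5LIB is being picked up, the directories they name should appear in that list.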
To add or change a system environmental variable, inside the registry key
\\HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment
you need to add a value whose name is that of the environmental variable
you want, with the value's contents being the variable's value;
i.e., the value's name is PERLLIB and its contents or
string is D:\Perl\lib, or at least that's what I thought the value
would be. Besides all this, to be sure the new value is visible
to all software, you have to reboot the machine. If a major
vendor had not actually built a system that requires rebooting
to change a system environmental variable, would anyone have
believed they would do so? I'd like someone to explain to me how
this system is supposed to be easy to use compared to one where
an administrator edits a text file in /etc and from then on
processes that start or users that log in have a new or changed
environmental variable.
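For reference, the same edit expressed as a .reg file that regedit can import would look something like the sketch below; the path is the one I expected to need:

    REGEDIT4

    ; Sketch: add a PERLLIB system environmental variable. Note
    ; that back slashes in .reg string values must be doubled, and
    ; a reboot is still needed before all software sees the change.
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]
    "PERLLIB"="D:\\Perl\\lib"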
Adding a trailing backslash didn't solve the problem. Of course
I had to reboot again to determine that; stopping and restarting
the web server wasn't enough to make the change visible to it.
"File" is definitely a subdirectory of d:\Perl\lib\ and Basename.pm
is in it, so why doesn't Perl see it? And why does the same
script run fine from the command line of my NT workstation, where
I don't have any PERLLIB variable set? It also runs fine from
both the command line and via the Apache server on Linux. Finally
I just FTP'd the script to a remote NT server I have access to
and it ran fine both as a CGI script and from the command line.
I also telnetted to that machine and there is no PERLLIB environmental
variable, nor is there any \\HKLM\Software\Perl registry key, let alone
lib or sitelib sub keys. So much for the incomprehensible
ActiveState documentation. The script even runs from the command
line of the NT server where it won't run as a CGI script.
It's this last thing which finally pointed me to the solution.
When I gave the anonymous user read and
execute rights on d:\perl\bin, I did not think to also include
d:\perl\lib and its subdirectories. The script worked as soon
as the anonymous user got access to the lib directories. I should
know enough to always think about what security context
a process that's not working is running in. Still the error message
did say "locate" and not "access" and the ActiveState documentation
is apparently completely irrelevant to the problem.
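For the record, the fix amounts to giving the anonymous account read access to the whole Perl library tree. A cacls command along the following lines should do it, with IUSR_SERVERNAME standing in for the actual anonymous account name:

    rem Sketch: grant the IIS anonymous user (the account name here
    rem is a placeholder) read access to the Perl library tree.
    rem /T recurses into subdirectories, /E edits the existing ACL
    rem instead of replacing it, /G grants the listed right (R = read).
    cacls d:\perl\lib /T /E /G IUSR_SERVERNAME:R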
This was really my mistake but it did give me a chance to vent
on some of my Windows pet peeves. All of the problems I mentioned
above are real so I'm not going to retract what I said even though
this specific instance was a wild goose chase. Four years ago I
firmly believed that NT was the only operating system that most
small and medium size organizations would someday be able to
standardize all their computers on. After four years of dealing
with quirky unreliable systems, I've come to believe that NT
Server is exactly what O'Reilly proved it to be, the same system
as NT Workstation. In other words, NT Server is a glorified
desktop OS, advertised, priced, sold and licensed as a server
system. Microsoft, NT and now 2000 are going to have their place
in the computing world for some time to come. But frankly today,
anyone not at least seriously evaluating UNIX and open source
systems for their server needs has blinders on. For those of
you who say I'm not looking at Microsoft's current products,
there's an old saying that goes something like burn me once, shame
on you, burn me twice, shame on me. I'm tired of throwing good
money after bad and intend to see if I can kick the Microsoft
habit.