Homegrown Intrusion Detection
Summary
File Change Tracking
Process Tracking
The Monitored System
Regular Expressions
Track More Than Processes
The Analysis Script
Alerts
Monitoring Your Systems
Conclusion
This section deals with two specific approaches to host based
intrusion detection: tracking changes to executable and system
configuration files, and actively monitoring all executing
processes. Today much of the emphasis is on network based
intrusion detection. Network
intrusion detection watches for hostile patterns in network
traffic and is typically real time. Network intrusion detection
typically protects an entire network or at least all the hosts on
a LAN segment. Network intrusion detection will often detect an
attack while it is in progress. A firewall should be regarded
as the first line of defense in most environments; network based
intrusion detection is often regarded as the second. There are
numerous commercial products that
perform network intrusion detection. Snort is probably the best
known, open source, network intrusion detection product.
As important as network intrusion detection is, host based
intrusion detection should not be ignored. Security threats are
as likely to be internal as external. Some network based
intrusion detection includes monitoring of internal activities
but some network products ignore this area. Regardless of how
comprehensive any network intrusion detection attempts to be, no
network based monitoring is likely to detect unauthorized local
activities by users or administrators authorized to use the
systems being monitored.
Traditionally, host based intrusion detection focuses primarily
on changes made to the system being monitored. Specifically it
attempts to identify changes to key files, primarily system
configuration files and selected executables. It is thus
reactive. In addition to the traditional monitoring of file
changes, I'll show how to actively track processes that are
executing on a monitored machine and display a warning when an
unfamiliar process executes or a standard process stops
executing. The process monitoring is still reactive in that it
can only detect processes that are already in an "abnormal"
state. It can be set to operate as frequently as desired so that
it can be almost real time.
When I was implementing intrusion detection on my systems, the
best known product for host based intrusion detection was (and
still is) Tripwire. The commercial product was simply too
expensive for my site and the open source version lacked the
ability to automatically store and analyze the database of the
tracked machine on a remote machine. Tripwire primarily analyzes
files on disk and to the best of my knowledge does not track
processes in memory.
It's been my experience over the years
that if you possess good analytic skills and proficiency with a
development language or tool suitable to the task, you can often
build a new tool from scratch faster than you can figure out and
adapt someone else's code. This depends on the intrinsic
complexity of the task to be performed and how many unneeded
features have complicated the code. A product to perform a
simple task that's been made complex with numerous configuration
options to adapt to diverse environments might better be done as
custom code written for your environment. Host based intrusion
detection is such a task.
The core function of Tripwire is to build a database of checksums
of files on the machine being monitored for later comparison to
detect changes in the files being monitored. Since building such
a database borders on the trivial and the versions of Tripwire
available to me lacked a capability I regarded as essential, I
chose to write my own intrusion detection using shell scripts and
Perl, from scratch without looking at how Tripwire or any
competing products functioned. The results are just over 500
lines of new shell and Perl scripts and configuration files
(including comments and blank lines) that monitor both files and
processes on the systems being monitored. Databases or logs are
saved and analyzed on machines other than the machines being
monitored and alerts for conditions that should be investigated
are displayed automatically.
Part of the point of this page is to show that the basics
of host based intrusion detection are really quite simple. The details
of my particular implementation are not particularly important other than
as a working example. Much of the complexity of a product like Tripwire
comes from the need to provide numerous configuration options required
to adapt to diverse environments. Hard coding numerous choices to meet the
needs of a single environment greatly simplifies the problem. I would
not consider building network intrusion detection from scratch because
it seems so much more complex, but this may simply be due to my lack
of familiarity with the necessary network programming techniques. On
the other hand, I see no reason why any experienced system administrator
with good scripting skills should not consider doing their own host based
intrusion detection if no product available at an acceptable price
meets their requirements. The scripts or concepts presented here
may be a sufficient starting point.
The techniques used here are applicable to a hardened server such as
those described in How to Harden an OpenBSD
Server. Such a server will have a stable, simple configuration
with a small and infrequently changing user population and a fairly
constant set of processes that may or may not be executing at any
particular moment. These techniques would not work well on a
general purpose multi user host where there will be almost daily
system changes (including user names and passwords), frequently
changing scripts and diverse user initiated processes. The techniques
could be extended to a general purpose host but this would require
a significant increase in complexity. Primarily, system components
that change regularly would need to be filtered out from those that
are monitored so routine changes would not trigger alarms. A product
like Tripwire might come with sufficient flexibility to adapt to
such systems.
In the form presented here, the alarm mechanism is an interactive
prompt displayed to a user. Thus an operator or administrator who
is logged in 24 hours a day is assumed. The scripts could be
adapted to other alarm mechanisms for different environments.
The monitoring is currently performed on a Windows workstation.
The monitoring scripts should run on any system with a
scheduling facility, Perl and a GUI interface but the details
of the prompts will vary with different systems. The monitored
systems are limited to UNIX servers or systems with UNIX like
file attributes, directory structure and process tracking.
Specifically these techniques cannot be used to monitor Windows
NT or 2000 servers. Windows lacks an easy and reliable method of
identifying executable files, a highly organized directory tree
and the ability to track the command line which started each
process, all of which these scripts depend on. Also, the
generation of the content that is analyzed is so simple that it's
done with shell scripts. If Windows didn't lack the necessary
capabilities at the operating system level, the shell scripts
could easily be converted to Perl.
It is highly desirable, though not essential,
that the monitored computers be time
synchronized with computers performing the monitoring.
If the computers are time synchronized, then the monitoring computer
can analyze data from the monitored computers within a few seconds
of when it is created. This allows more frequent monitoring so
that the process monitoring can approach real time. If the computers
are not time synchronized, then lags equal to or greater than the
largest amount by which the computer clocks can drift apart must
be built into the monitoring system.
Summary
The host based intrusion detection described here consists of two
completely independent functions. The simpler part is the file
change tracking. This requires a cryptographically secure method
for creating check sums of files on disk. The md5 command on OpenBSD and
md5sum on Linux were used. OpenBSD includes other methods which
could be substituted but the old UNIX standby, cksum, is not
adequate. A simple shell script
creates a text file of check sums that is ftp'd to another system
where the most recent file is compared with the previous file.
If there are any differences, the diff result is displayed. If
there are no differences, then nothing is displayed. If check
sum results are saved on non modifiable media, check sum
comparisons over longer intervals can be made manually if
suspicious activity is ever detected.
The process monitoring makes use of the command "ps -ax" run on
a periodic schedule via cron. This command shows every process currently
executing, the command line that invoked it, the terminal it's
running on and a status. The output of the command is redirected
to a file which is transferred to the monitoring system. There
the file's contents are matched against a database of regular expressions
that identify all the processes that are normal for the system
being monitored. Some processes that run continuously and are
necessary for the proper functioning of the system are identified
as required. If any process is found that is not identified
as a normal process or any required process is missing, an
operator prompt is displayed identifying the unknown or missing
process or processes.
File Change Tracking
Tracking changes to files begins with a
simple shell script that creates a text
file containing check sums of the files being monitored. The script
begins by changing to a local directory where the check sum files
are created and saved. This directory is hard coded. This is
the kind of configuration choice that requires a configurable option
in a product that runs on diverse systems. Hopefully, at a single
site, there is enough standardization of locally created directory
structures that a hard coded script can be used on different systems
without change.
All the output from the script is redirected to a file that is
named in the form ckyymmdd.log where yymmdd is the
two digit year, month and day. This script can only be
used at most once a day. A script that would run two or more
times a day would need to adjust the file naming convention
appropriately.
The first file that is checked is the kernel loaded at boot time.
In this example this is the only hard coded file to be checked.
In addition to hard coding their names, specific files to be checked
could be read from a file or database. This would increase
complexity and potentially provide an intruder with information
that might be used to evade detection or even alter the behaviour
of the monitoring system. Locating the configuration file or
database at a secure remote location would add further complexity.
The other files that are checked use the find program to pick
files that meet specific criteria to be checked. In this script
four groups of files are checked. First every file in and under
the /etc directory is checked. Then every file on the system
that is either SUID or SGID is checked. Then every executable
file on the system is checked regardless of where it resides.
There is considerable redundancy between these selections. Any
executable in /etc or any SUID or SGID file will appear twice in
any listing as well as twice in the diff comparisons. Sorting
the output and filtering it through uniq could be used to remove
duplicate entries. Because these files are likely to be more
important, I chose not to go to the extra effort to remove the
duplication.
The use of find and very simple, broad criteria makes it very
difficult for an intruder to alter the behaviour of the system
even if they understand it. The breadth of the criteria makes it
almost impossible to make meaningful system changes without
triggering an alert. This also means that a typical general
purpose host or server with frequent changes cannot be monitored
in such a fashion without causing alerts to become a routine
occurrence which negates the value of the system. The simplest way
for an intruder to defeat the system is to modify the script to
keep sending the same check sum file day after day. Thus those
responsible for monitoring must investigate if any system
changes are made and no alert is triggered.
This sample script is from a web server. In addition to the directories
and file types that I would monitor on any system, all files in the
directory that contains web configuration data are also monitored.
Each different server is likely to have some specific files or
directories that should be monitored in addition to those that are
regarded as standard. If there are more than a few servers, it would
be better to move system configuration specific choices out of the
script and into an external configuration file or database.
The last few lines of the script use
ftp to transfer the output file to a remote system. The "-n"
option of ftp suppresses the user name and password prompt;
instead ftp gets these from the user command which is redirected
into ftp via standard input from the shell script. Since a
username and password will be lying around in a plain text
script file, this user should have as few privileges as possible
on the receiving system. Also, since these scripts are most
likely to be run as root, they should only be root readable.
After logging into the remote FTP server, the script changes
to a directory, part of which is based on the host name from
which the script is running. Once again a choice that could
have been a configurable option has been hard coded to refer
to a standard directory location.
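Putting the pieces just described together, a minimal sketch of the
whole monitored-side script looks something like the following. The
directory names are assumptions, /bsd is the default OpenBSD kernel
path, the FTP address and account are the same placeholder values used
in the process tracking example later, and md5sum would replace md5 on
a Linux host:
#!/bin/sh
# Sketch only; directories, kernel path and FTP details are placeholders.
cd /var/local/logs/cksums
LOG=ck`date +%y%m%d`.log
# The kernel loaded at boot time is the one hard coded file.
md5 /bsd > $LOG
# Every file in and under /etc.
find /etc -type f -exec md5 {} \; >> $LOG
# Every SUID or SGID file anywhere on the system.
find / -type f \( -perm -4000 -o -perm -2000 \) -exec md5 {} \; >> $LOG
# Every executable file, wherever it resides.
find / -type f \( -perm -100 -o -perm -010 -o -perm -001 \) -exec md5 {} \; >> $LOG
# Hand the result to the monitoring system.
ftp -n 192.168.28.86 <<EOT
user ftpuser1 cog9Fow
ascii
cd alert/bsd/cksums
put $LOG
bye
EOT
Run daily from cron, this produces the ckyymmdd.log files that the
comparison script on the monitoring side expects to find.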
A fairly simple Perl script is used
to compare successive check sum files. The script opens with
a series of function calls that are executed if the script can
successfully change into a series of hard coded directories
which correspond to hosts being monitored. The bulk of the logic
in this script is to determine the file names of the files to
be compared. Once again the script logic is kept simple by
avoiding unnecessary options. This script can only compare
files created on a daily basis. Here the current day and the
preceding day in yymmdd format are determined. Month
and year rollovers are accounted for but leap years are not.
If the current day's file is missing, the workstation alarm
is sounded. This is very simple but not very robust. If the
operator or administrator is not present when the script executes,
they will not know a necessary file was not available. There
will be no practical difference between a comparison that can't
be made and a comparison that was made with no discrepancies
between the compared files. A prompt that was displayed until
it was acknowledged by an operator would be better. The script
could be executed more than once to increase the likelihood
that a missing file will be noticed by an operator but this
would also interrupt the operator multiple times on any day
when there were tracked changes on any system being monitored.
The final piece uses diff to compare two successive files and
to capture the result of the comparison. If the captured
result is not zero length, the workstation alarm is sounded and
the result written to a file and displayed on the workstation
monitor. The current directory is included at the top of the
saved output to help identify the host that has had changes to
tracked files.
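The comparison script itself is Perl running on the Windows workstation
and is not reproduced here. Purely to illustrate the logic, the same
comparison on a UNIX monitoring host could be sketched in a few lines
of shell; the file names below are examples (the real script computes
today's and yesterday's names, handling month and year rollover), and
the mail command and recipient merely stand in for the workstation
prompt:
#!/bin/sh
# Illustration only: compare yesterday's and today's check sum files.
cd /alert/bsd/cksums
diff ck040414.log ck040415.log > /tmp/ckdiff
if [ -s /tmp/ckdiff ]; then
    # Non-empty output: a tracked file changed, appeared or disappeared.
    # The current directory at the top identifies the host involved.
    ( pwd; cat /tmp/ckdiff ) | mail -s "tracked file change" operator
fi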
When a tracked file has changed there is nothing to tell the
person viewing the prompt what content has changed. A sample
listing looks like this:
/alert/bsd/cksums
446c446
< MD5 (/usr/local/bin/checkxyz) = d904fe11f370c549ffe5805269d4e439
---
> MD5 (/usr/local/bin/checkxyz) = 3e01891c914d6b28138f098dde16f022
Understanding this output requires a familiarity with diff output.
Here the output indicates that the script, /usr/local/bin/checkxyz
has changed. More precisely line 446 has changed from the first
listed line to the second, i.e. the check sum for the file
/usr/local/bin/checkxyz has changed from d904fe11f370c549ffe5805269d4e439
to 3e01891c914d6b28138f098dde16f022. The actual values are
irrelevant. All that matters is they are different indicating
that checkxyz has changed. Paired lines starting with "<" and
">" indicate a changed file. A new file would be indicated by
a single line starting with ">" and a deleted file with a
single line starting with "<".
Actually identifying the content that has changed would require
comparing the current file to a backup or archived copy of the
file. This intrusion detection system is simply intended to
alert the person seeing the prompts to which files have been changed,
added or removed from the monitored system. That person needs to
be familiar with what is changing on the monitored systems.
For example if a new user is added to a system, one would expect
/etc/passwd and all related files to change. On a BSD system
one would not expect system configuration files such as /etc/rc,
/etc/rc.conf and /etc/inetd.conf to change except very infrequently
when a configuration change was made to the system. The
binary executable files in /bin, /sbin, /usr/bin, /usr/sbin, etc.
should never change unless the OS is upgraded or patched.
As long as the monitoring is being performed by persons
familiar with the administration of the monitored systems,
unexpected changes to any of the tracked files are a strong
indicator of an intrusion. At a minimum, unexpected changes
need to be investigated and understood.
Process Tracking
Process tracking is somewhat more complicated than tracking file
changes. The analysis is done by the most complex
script that is examined here.
In addition, the script reads a structured
text file containing regular expressions and
control information to determine what processes warrant an
alert.
The script consists of four major pieces. The first 100 or so
lines read the file of regular expressions into hashes for use
throughout the remaining execution of the script. The next 70
or so lines are an endless loop that executes until the script is
terminated externally. On each pass, all files from remote
monitored systems are compared against the regular expressions
saved in hashes. If an unknown process is found or a required
process is missing, an error message is built and passed to the
last piece which is responsible for the display of the error
message. The third piece is the time_wait function that is
called before the first execution of the main loop and then from
the bottom of the main loop. This function determines the next
time that monitored systems should create files listing executing
processes and waits until these files should be available on the
monitoring system. Time_wait returns to its caller the
mmddHHMM part of the filenames that need to be read and
analyzed.
The Monitored System
The piece that runs on the monitored systems is
extremely simple. The following script is all that's needed:
cd /var/local/logs/ps
ps -ax > `date +%m%d%H%M`.log
ftp -n 192.168.28.86 <<EOT
user ftpuser1 cog9Fow
ascii
cd alert/bsd/ps
put `date +%m%d%H%M`.log
bye
EOT
The output of a "ps -ax" command is redirected to a file that is
transferred to the monitoring system. The output filename is
mmddHHMM.log where mmddHHMM is month, day, hour
and minute.
Regular Expressions
The structured text file contains three
kinds of non comment lines. Lines that start with "HOSTS"
identify the names of hosts that are to be monitored. Lines that
start with "HOST" have the name of a specific host and process
IDs for processes that are either allowed or required on the
named host. The host and hosts information may be spread across
multiple lines. Lines that start with a numeric sequence include one
regular expression that is used to match lines in the file being
analyzed, which is for the most part output of the "ps -ax"
command; the numeric sequence is a process ID and must be unique.
These process IDs are in no way related to host process numbers.
Each process line must contain a complete regular expression.
Other lines are treated as comments and ignored. The order of
lines in the file does not matter.
In the sample file the line
"HOSTS=bsd-req,anotherhost-req,host3-opt,four-req" lists 4 hosts
named bsd, anotherhost, host3 and four. Three of the hosts are
expected to be online continuously and an alarm will be triggered
if the expected output of the three hosts tagged with "-req" is
not available when expected. Host3, tagged with "-opt", is optional;
it's a test system that may or may not be online at any particular
time. If the expected output is not available, no alarm is triggered.
If the output is available, it is analyzed in the same manner as
that from any other host. In these examples four is a Red Hat 6.2
Linux system and the other systems are OpenBSD 2.7 systems.
The lines that start with "HOST" have the host name separated from
the rest of the line with equal signs and following the second
equal sign is a comma separated list of process IDs. Each process
ID is followed by a dash and an "a" or "r". Processes with an "a"
are allowed or recognized and may or may not be present. Processes
with an "r" are required and expected to be present as long as the
host is up. In the sample line "HOST=four=401-r,402-r,403-a,404-r,405-a"
processes 401, 402 and 404 are required on host four while 403
and 405 are allowed.
Following the opening HOSTS line, the sample file
continues with the
following lines:
HOST=bsd=6-r
HOST=anotherhost=6-r
HOST=host3=6-r
HOST=four=6-r
6~^  PID TTY?\s+STAT\s+TIME COMMAND$
Lines that begin with one or more digits contain a process ID
separated from a regular expression by a tilde. Everything that
appears after the tilde including any leading, embedded or trailing
spaces is part of the regular expression. The line ending newline
is not part of the regular expression. In the sample line, process
6 really isn't a process. (The odd first process ID will be
explained later.) It's the header line output by the
"ps -ax" command. This regular expression will match the output
of "ps -ax" on Red Hat 6.2 Linux and OpenBSD 2.7. It may need adjustment
for other operating systems. This is a very simple regular
expression consisting mostly of literal text. On Linux, the
terminal column is indicated by TTY and on OpenBSD it's just TT so
the Y is optional (question mark). The amount of white space
("\s+") between TT or TTY and STAT and between STAT and TIME is
variable on the two systems. On both Linux and OpenBSD the line
starts with two spaces and ends with the "D" of COMMAND with no
trailing white space.
The header line of "ps -ax" output is not really a
process on any system, but it is marked as required on all four
monitored systems. If the output of "ps -ax" changed, something
odd would be going on and it should be investigated. If the
format of the line changed and it was an "allowed process" an
alarm would still be triggered but if the line was omitted there
would be no alarm. Marking the line required assures that an
alarm will be triggered if the format changes or the line
disappears. If the format changed, for example a trailing space
sometimes appeared at the end of the line, the regular expression
would be adjusted to allow either variation. Generally, I try to
make the regular expressions as specific as possible, loosening
them only when "normal" conditions start triggering alarms.
Since the format allows multiple HOST lines, I group the lines
specifying allowed and required processes for each host near the
matching regular expression lines for clarity. The four HOST lines
before "process" 6 indicate that it is required on each system.
The next two groups of lines
identify kernel processes on OpenBSD 2.7 and 2.8 and Red Hat 6.2 Linux.
These processes appear to have predictable process numbers on
both systems; for both Linux and OpenBSD systems, leading spaces
and specific process numbers are listed. On Linux, background
processes or daemons are listed with a single question mark for
the terminal and on OpenBSD a double question mark. After white
space, the process status is listed. This varies from process to
process. Status of kernel processes appears to show more
variability on OpenBSD than Linux. The statuses shown here may
not be complete. Because the monitored systems are booted
infrequently and changing statuses following a reboot may not be
captured by the monitoring process, there is no way to predict
how long it will take to identify all valid statuses for each
process.
Under OpenBSD 2.7 there were 5 kernel processes that always appeared.
In OpenBSD 2.8 four of these no longer normally appear. They are the
swapper, pagedaemon, update and apm0 processes. Process 1, /sbin/init,
is always present in both 2.7 and 2.8.
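The sample file itself is not reproduced on this page, but the kernel
process lines it contains have roughly the following shape. The
process IDs (11 and 21), the spacing and the statuses below are
illustrative guesses rather than lines copied from that file; the
statuses in particular need to be built up from your own saved ps
output:
OpenBSD /sbin/init, always present
HOST=bsd=11-r
HOST=anotherhost=11-r
11~^    1 \?\?\s+Is\s+[0-9]?[0-9]:[0-9]{2}\.[0-9]{2} /sbin/init$
Red Hat 6.2 init
HOST=four=21-r
21~^    1 \?\s+SW?\s+[0-9]?[0-9]:[0-9]{2} init.*$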
It would be simple to put a "\S+" in the regular expression where
the status is listed. This would match any possible status (one
or more contiguous non white space characters). As I've
already said, I prefer to keep my regular expressions as specific
as practical. When a new status appears on an allowed process,
I want to know about it and will adjust the regular expression
if the status looks like a reasonable status variation on an
existing process and not a status on a new process with a name
similar to an already tracked process. For example "+" (foreground)
and "X" (traced or debugged) would be curious statuses for
kernel processes on a production system. Also I'd want to know about
any process that had a "Z" (zombie) status. By carefully building
the regular expressions used to match the output of ps (or other
commands) a lot can be learned about what is happening on a monitored
system and this goes well beyond simply identifying unusual processes
or missing standard processes.
Process times on OpenBSD typically have one or two digits followed
by a colon, two digits, a period and two more digits, as indicated by the
"[0-9]?[0-9]:[0-9]{2}\.[0-9]{2}" piece of all the OpenBSD specific
regular expressions. Red Hat 6.2 Linux process times lack the
period and final two digits as indicated by "[0-9]?[0-9]:[0-9]{2}" in
the Linux specific regular expressions.
The next sets of lines show some typical
OpenBSD and
Linux daemons started at
boot time. Following the tilde that separates the process ID
from the regular expression, all these lines start with
"^ {0,4}[0-9]{1,5} " indicating the lines begin with zero to four spaces
followed by one to five digits followed by a space. Both
groups of regular expressions include web server processes, which
are required on bsd and four (the web servers) and not allowed
on the other machines.
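As an illustration of the shape of such lines only (the process ID,
status and command text are guesses, not expressions from the file
described here), an Apache entry for the OpenBSD web server might look
something like this:
Apache, required on the web server only
HOST=bsd=120-r
120~^ {0,4}[0-9]{1,5} \?\?\s+[IS]s?\s+[0-9]?[0-9]:[0-9]{2}\.[0-9]{2} (\/usr\/sbin\/)?httpd.*$
Because four is a Linux system, its web server line would use the
Linux time format and terminal notation instead.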
Next are some cron jobs for
both OpenBSD and Linux systems. Since cron jobs tend to be both
site and machine specific, nearly all those from my systems
have been deleted. Any machine
using the process tracking described here will have the monitoring
process running. Unlike other cron jobs which may or may not
be running when the monitoring process runs, the monitoring process
will always see itself in the "ps -ax" output. Any components that will
be active when "ps -ax" is executed should be either allowed or
required processes or there will always be alerts; it
makes no practical difference whether these are allowed or
required. Here /usr/local/bin/wps is the script that starts
the monitoring. (The wps name will be explained later.) On OpenBSD
systems, an intermediate shell is spawned in addition to the script
that runs the monitoring process and the "ps -ax" command. All
three pieces need to be allowed or required to prevent constant
alerts.
The only console only
processes shown are those that control the virtual consoles on
OpenBSD and Linux machines. If the monitored systems had processes
that for either policy or technical reasons should only be run
from the local console, this would be a good place to add them.
The "tty[1-6]\s+" on Linux and "C[0-5]" on OpenBSD replace the single
and double question marks that indicate the terminal for background
processes. On Linux the six virtual consoles are indicated by
tty and a digit one through six followed by a variable amount of
white space. On OpenBSD, the six virtual consoles are indicated
by an upper case "C" and the digits zero through five.
The regular expressions for several
interactive OpenBSD processes
are shown. On Linux, network logins or pseudo terminal sessions are
indicated by pts for the terminal in the regular expression
"(pts\/[1-4]|tty[1-6])" which here allows up to four network sessions.
On a busier system the expression would need to be adjusted to
account for a reasonable number of simultaneous network logins.
For example "[1-3]?[0-9]" would match any terminal number from
0 to 39. On OpenBSD a "p" indicates a network login; up to five
sessions are allowed before an alarm is triggered. In both cases,
the "( | )" notation ORs the expressions inside the parentheses and
separated by the pipe.
Most of these are different forms of shells that are
commonly present. One is for less in any form. I spend more time
in less than any other program as nearly everything I look at
is via less. If this were not suppressed, alerts would be popping
up most of the time that I am doing anything on a monitored system.
Note that vi is not included even though I spend a moderate amount
of time in it. I want alerts to pop up if anyone, including myself,
is editing one of the system configuration files or a script in
/usr/local/bin. If there are any files edited frequently that I
did not want alerts for, I would construct a regular expression
specific to such files.
Note that one of the processes relates to
man but that the command "man
commandname" or "man \S+" is not present. I want to be
alerted when man is used on the monitored systems. Remember
these are hardened servers with no ordinary users, just
administrators. Use of man suggests someone doing something new
or unfamiliar so an alert is appropriate. On OpenBSD systems two
additional sub shells are created whenever "man commandname" is used. Alerts for these are suppressed with the processes
identified as 713 and either 712 or 714 (I forget which). This
simplifies the alert listing, helping to emphasize what command
man is being used for.
Track More Than Processes
The last group of lines with process
IDs 1 - 7 actually has nothing to do with "ps -ax". These lines match
output from the "w" command that gives system uptime and who is logged
into the system. These lines match the standard content of w output
and allow a very limited number of users to login from very
limited locations. Any other user logged in from any other location
would trigger an alert. Because the output of two commands, w and ps,
is included in the same file, the script that creates them is called
wps; in my real environment these are the first lines in the chkproc.txt
file and thus the process IDs start at 1.
It's useful to know who is logged into a system but the main
reason for including these is to show that the script can be used
to analyze output from commands other than "ps -ax". What is
matched simply depends on the regular expressions that are
created. For example, it should not be difficult to write
regular expressions that triggered an alert if the 15 minute load
average on a monitored system exceeded 2.99 or the 5 minute load
average exceeded 3.99. This would show as an "unknown" command
but as long as the person viewing the alert knew not to take this
literally, it should not be a problem.
This approach can be used to trigger an alert for almost any
unusual condition on a monitored system. Include the output of
whatever command measures or displays the conditions you are
interested in tracking in the file that is monitored
by chkproc.pl, and write the regular
expression so that "normal" or acceptable conditions are matched.
When the values go outside the range that matches the regular
expression, an alert will be triggered.
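For instance, a line along the following lines (illustrative only; the
process ID is arbitrary, the uptime portion is deliberately loose and
the exact wording of w output varies between systems) would match the
first line of w output only while the 5 minute load average is below
4.00 and the 15 minute average is below 3.00, so anything higher shows
up as an unknown line and triggers an alert:
30~^\s*[0-9]{1,2}:[0-9]{2}[AP]M\s+up .*, load averages: [0-9]+\.[0-9]{2}, [0-3]\.[0-9]{2}, [0-2]\.[0-9]{2}$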
The Analysis Script
Since the script is well commented
I'm not going to discuss much of the coding detail but rather
some of the choices and reasoning behind the script. The
initialization section and the main loop contain the logic to
read into memory the regular expressions stored in the structured
text file and compare them to the files being analyzed. In
keeping with the preference for simplicity over flexibility shown in all these
scripts, the main loop assumes that analyzed files will reside in
subdirectories that match the host names being processed and that
files will be found in a constant subdirectory for each host.
The first interesting option doesn't come until the time_wait
function. Here a runtimes array is initialized with
a hard coded array of minutes that match the times the process
on the monitored systems executes. It's assumed that process
monitoring will occur more than once an hour but it makes no
sense to check every minute for the first ten minutes of an hour
and then not to check for the rest of the hour. This array could
easily be changed to every sixth minute to provide consistent
monitoring throughout the hour. It could be changed to every
five minutes by adding two array elements; in the following for
loop, "$i<10" would need to be changed to "$i<12". Any
convenient interval could be used by adjusting the elements in
the runtimes array. Monitor times do not need to be multiples
of the interval so a six minute monitor interval could just as
easily run at 1, 7, 13 . . . or 4, 10, 16 . . .
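For example, with a six minute interval starting at minute 1, the root
crontab entry on each monitored system might look like the following;
the minutes chosen must be the same ones listed in the runtimes array,
and /usr/local/bin/wps is the monitoring script mentioned earlier:
1,7,13,19,25,31,37,43,49,55 * * * * /usr/local/bin/wps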
The more frequent the monitoring interval, the closer this system
is to real time monitoring. The problem is this is a simple system
and will trigger an alert every time an alert worthy condition is
found. If the monitoring is every minute and an alert worthy condition
persists for 6 minutes, five or six separate alerts would be triggered.
This could be very annoying for an operator or administrator trying
to deal with the situation. The longer the alertable condition
persisted, the more annoying the alerts would become. A mechanism
for mitigating this by suppressing alerts is described later.
On the other hand, having only a few monitor times per hour greatly
increases the likelihood that significant events will be missed.
If the monitor interval were more than a few minutes and
an intruder knew the interval and times, they could perform their
work so as to evade detection.
For the purposes of intrusion detection, grouping processes into
five categories should help decide how frequently the monitoring
should run and what processes should trigger alarms. First are
the required or continuous processes which should never trigger
alarms. These will be identified almost immediately when
defining the matching regular expressions. The more
frequently the monitoring is performed the more quickly the
less frequent statuses will be detected and added into the matching
regular expressions. Second are the routine or periodic
processes. Typically these will include all cron jobs and
completely ordinary user processes such as checking mail when
first logging on. Maintaining web content and reviewing web logs
on a web server might also be examples; depending on the number
of users and their roles, these might be routine for some users
and unauthorized for others. Routine processes should not
trigger alerts. Even with very frequent monitoring it may take a
long time before all status and command line variations of these
processes are discovered.
Third and fourth are infrequent and sensitive processes. Infrequent
processes are legitimate and harmless commands that probably should
not trigger alerts but which occur with such infrequency and
variability that it is not practical to anticipate them and build
regular expressions to suppress the alerts. There will be a
continuum from routine to infrequent processes. Probably whenever
a new process is seen that is not sensitive and is likely to
recur from time to time, a regular expression should be built to
match it to keep alerts as infrequent as possible so they are not
dismissed without action by those who receive them.
Sensitive processes should occur infrequently enough that they are
not routine. They will be perfectly legitimate at some times and
under some circumstances but are important enough from a security
perspective that they should trigger an alert. Examples might be
execution of user maintenance programs or editing of the system
configuration files. If sensitive processes occur with such frequency
that they become routine, there is probably something wrong with
the system administrative policies and procedures. Even if the
activity represented by an alert related to a sensitive process
is entirely legitimate, it should trigger a thought process in
whoever receives the alert. That person or persons should
question who is doing what on the system and if the alert cannot
be matched to expected system activity, it needs to be
investigated. Typically this would begin with a review of the
full "ps -ax" output to determine who or what caused the alert.
Then if it was interactive, is it appropriate for the
specific user to perform the action from the terminal from which
it was performed? This could lead to checking with other
administrative personnel or comparing file contents with backup
or archived versions.
Fifth are unauthorized processes which should always trigger an
alert. There is however no direct way to test for unauthorized
processes since these could be virtually anything. It could be a
routine process performed by an unauthorized user or from an
unauthorized location or it could be the execution of a new
command illicitly brought onto the monitored system. A detected
unauthorized process will, by definition, never be a required or
allowed process. The best way to ensure that unauthorized
processes are detected is to make the regular expressions
defining recognized processes as specific as possible so that
unauthorized commands do not accidentally match the expressions.
For example "(\/usr\/bin\/)?vi \S+" would be a very poor choice
for an allowed command as it would allow vi to be used on any
file without detection.
Like most intrusion detection systems, getting this system right is a
matter of finding a balance. Ideally, every suspicious activity will
trigger an alert and every alert will trigger an operator response,
even if that response is no more than a mental review of who is doing
what on the monitored system. If either the system is made to ignore
potentially significant events to minimize alerts or alerts become so
frequent that operators ignore them, the system will likely not
accomplish its purpose.
Alerts
The process_errors function in chkproc.pl
is responsible for creating alerts. In this case, the script pops up
a notepad window containing an error log on a Windows workstation.
This is the most system specific part of the script. Both the
form of the alert and the manner in which it is presented to an
operator or administrator can and should be changed to meet specific
site needs. On most systems it will be normal to write the condition
causing an alert to some kind of log but beyond this little will be
standard. The alert could also be a dialog box rather than a notepad
window, a network message causing dialog boxes on multiple remote
machines, an e-mail message, an audible tone on the workstation,
dialing a beeper, creating or modifying a web page or updating a
database that feeds a paper or web reporting system,
or a combination of these or other actions.
Other alert forms will require some additional
work besides coding a delivery mechanism. The logic here provides
for a single display of the error log which is left on the screen
until cleared by the operator. If an additional alert is to be
displayed, the previous one is cleared before the new one is displayed.
In the displayed log file, the same alert condition may be repeated
numerous times. If the delivery were via e-mail or pager it would
be very annoying if the same condition triggered additional e-mails
or pages, possibly every minute for an extended period of time.
This would render the system unusable. If an alternative alert
mechanism is used, a method will need to be built to determine if
the current alert is essentially the same as the preceding alert
and suppress it if it is.
In the current script, a single window is maintained by checking if
notepad is already displaying the error log. This is done via the
NT Resource Kit program tlist, which is like a simplified ps.
Tlist doesn't report the command line that started the listed programs.
It appears that various Windows programs such as word processors,
browsers and notepad report to Windows the primary file they are
acting on. If you keep such a program open but change the file being
viewed or edited, the tlist output changes accordingly.
Even though the script cannot determine if it launched a
particular copy of notepad, by capturing tlist output and looking
for notepad and the error log name it can determine if there is a
copy of notepad it may have started. If so, it extracts the
process number from the tlist output and uses the Resource Kit
command, kill, to kill the process. If the workstation has been
unattended for an extended period of time during which many
alerts were displayed, this has the effect of conserving system
resources and keeping the number of windows that an operator
needs to deal with to one.
Because each new alert is popped into the foreground, killing the
previous alert does not prevent a logged in and active operator
from being interrupted each time a new alert is displayed. There
will be times that the person receiving the alerts will not want
to be notified every time a new alert is generated. Examples
might include an investigation of an alert that cannot be
completed before the same alert is displayed again, or
extended system administration that might generate a series of
alerts.
The script includes a mechanism to allow the operator to suppress
a series of potentially distracting alerts. If a nowarn.txt file
exists, it is read and its contents are assumed to be a number of
seconds. If the nowarn.txt file modification time is older than
the number of seconds, the file is erased and the alert displayed.
If the file is newer than the number of seconds, the alert is not
displayed. The error message contents are still appended to the
error log. The operator can create the nowarn.txt file with a
simple echo command when it's needed. The script includes a fail
safe. If the operator enters an unreasonably large value or an
invalid format, the nowarn.txt file will be erased if it is older
than four hours and alerts will resume.
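For example, to suppress the pop-ups for the next half hour the
operator only needs to type something like the following in the
directory chkproc.pl runs from; the value is simply a number of
seconds:
echo 1800 > nowarn.txt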
The script includes no mechanism to truncate or limit the growth of
the log file. To a point this is useful. If the operator has been
away for a while, he or she can easily see how long an alert condition
has persisted plus any processes that executed only for a limited time
period and generated an alert since the last time the error log file
was viewed.
After the operator has dealt with any alert condition,
old alerts become a distraction. Since the script cannot easily
determine when an alert has been seen, this is dealt with outside
of the main script by a small maintenance
script. This script simply creates a file name composed of "err"
plus the two digit year and month and ".log". The current contents
of the alert error log are appended to this long term error log and
the alert error log is then erased. The long term error log serves
no practical function in the alert system; it's simply a historical
record of what alerts were generated. The maintenance script is
run from a Windows desktop or start menu shortcut.
Monitoring Your Systems
The regular expressions provided here are specific to recent
versions of OpenBSD and Linux and cover only a subset of the processes
likely to be found on these systems. Creating regular expressions
specific to your systems and what is and is not routine on them
is key to the success of this method of process monitoring. To
start, the small script for the monitored systems should be copied
and modified. It should be run at least as frequently as it will be in
production. Running it more frequently will help to identify
state changes and other variations on the same process. For example
there are a number of processes on Linux that sometimes are surrounded
by square brackets and at other times are not; these are definitely
the same process as the process number doesn't change, just the ps
display.
"ps -ax" may not be the exact command on your system. You want the
version of ps that shows all processes for all users including all
non terminal processes. You also want a version that shows more of
the command line rather than memory use or some other information that
ps can display. Once the correct form of ps has been put into a
script and run from cron, save the output for a day or so. View the
files in time (filename) order. Every time the size of a file changes,
a process has exited or a new one begun or a state has changed.
There is little point in looking at successive files of identical size.
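A couple of ordinary commands are enough for this review; the
directory is the one used by the sample collection script and the file
names are examples. The first command lists the capture files with
their sizes (the mmddHHMM names sort into time order), and the second
shows exactly what changed between two successive captures:
cd /var/local/logs/ps
ls -l *.log
diff 04151230.log 04151236.log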
The best time to do this is on a newly installed system before
it's been connected to a network and certainly before it's connected
to the Internet. Once a system has been connected to the Internet
or even been up for some time on local networks, it may be compromised.
If you are adding intrusion detection to older systems, then install
a new system with the same OS and run your monitoring script on it
to get a baseline of standard processes. On the older systems, you
need to figure out what every process is that's not part of the baseline.
You should also use a network scanning tool to verify that the
monitored systems are not exposing any ports that they should not
be. If they are, the process responsible needs to be
identified and disabled.
When you have a general idea of what's running on the system to
be monitored and should be running, take any specific ps output
file that's typical as a starting point. Unless the process is a
kernel process with a constant process number, replace the
process number with a regular expression that will match the
range of process numbers that may be assigned. There is a reasonable
chance the sample regular expressions above may work. Determine if the
process is a background only process, a local console process or
a network terminal process or may be any of these. Replace the
specific terminal identifier with a regular expression that
matches the terminal locations where the process should run from.
Next create a regular expression for the status. For some
processes a literal will suffice but most processes show multiple
statuses over time. Use grep on the directory of saved ps output
to review many instances of the command and see what statuses
show, building an appropriate regular expression (an example of such
a survey follows below). Remember
square brackets enclose a list of single character expressions.
One character from the list will match. Parentheses group
expressions longer than one character, and pipes OR expressions. A
question mark indicates zero or one of the preceding character or
parenthetical group. A plus sign indicates one or more of the
preceding character or parenthetical group. Curly braces indicate
a specific number (or range with a comma) of the preceding character
or parenthetical group. There are a significant number of
expressions for status in the examples.
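Assuming the saved ps output is all in one directory and that the
status is the third column of your ps output, the grep survey
mentioned above can be as simple as the following, with httpd standing
in for whatever command you are examining:
cd /var/local/logs/ps
cat *.log | grep httpd | awk '{print $3}' | sort | uniq -c
This prints each status that has appeared for the command, with a
count of how often, which makes it easy to see what the status portion
of the regular expression needs to allow.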
Determine if columns are separated by a fixed number of spaces or
a variable number. Use literal spaces if the number is constant
or "\s+" for one or more. Determine the time format for your
system's version of ps. One of the above patterns may match. Here
is an opportunity to do something extra besides simple process
tracking. The example regular expressions for time are the same
for all processes (on each type of system) and will match almost
any time ps might report. If you have processes that may consume
excessive resources that you want to monitor, more specific time
expressions can be built so that if a specific process exceeds
some predetermined amount of time, an alarm will be triggered.
The last part of the ps line is the command that started a
process. Here, for starters, you'll be taking mostly literal text
but precede forward slashes with back slashes. Repeat the above
process for each different command in the sample file you started with.
Delete multiple lines for processes that run with multiple instances of
the same process. Then add in process IDs and tildes. You can
number the process IDs sequentially or in sequences leaving gaps or
try to break them into related groups as I have. Add a HOSTS line
with just one host and one or more HOST lines defining which processes
are allowed and which are required.
Allowed processes may have more than one line that matches if the
statuses or command parts vary too much to easily create a single
regular expression that matches all variations. For a required
process to work properly without triggering unwanted alerts, every
possible variation of process number, terminal, status, time and
command must be accounted for with a single regular expression.
If you don't know regular expressions well enough to do this,
make the command allowed and build two or more regular expressions.
Resist the temptation to make a general regular expression
that might match unintended commands. For example "^.*$" will match
any possible line. Making such an expression required for a host
would prevent any alert for that host from ever being displayed.
Don't worry about getting your first chkproc.txt file right or
complete for even one host. You won't, I promise. If you're not
already ftping the ps output to the monitoring system, get that working
and adjust chkproc.pl to match your directories and make the
runtimes match the actual cron schedule used on the monitored
systems. All systems need to use the same schedule or chkproc.pl
will need significant adjustment. Make any other system
specific changes that are appropriate. For testing, chkproc.pl,
could be modified to read all the ps output files in a directory
rather than running on a schedule. This will expedite testing
the regular expressions to verify that they match expected variations
in the ps output.
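Short of modifying chkproc.pl, a crude smoke test is to strip the
process IDs and tildes out of chkproc.txt and let grep report every
saved ps line that no expression matches. The paths below are
examples, and Perl-only constructs such as \s may not behave
identically under grep -E, so treat this as a rough check only:
cd /var/local/logs/ps
sed -n 's/^[0-9][0-9]*~//p' chkproc.txt > /tmp/patterns
grep -E -v -f /tmp/patterns *.log
Note that this only finds unmatched lines; it says nothing about
required processes that are missing.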
Whether or not you test your regular expressions against a directory
of ps output, you will at some point need to start running chkproc.pl
against live output from the monitored systems. I think it's best
to start with a single system. Unless you've done an extraordinary
job with your regular expressions, you will see both missing required
processes and unknown processes. If you see no alerts, the other possibility
is that you have an excessively broad regular expression for a
required process. If you have missing processes but no unknown processes
an overly broad regular expression as an allowed process could cause
this.
Since alerts are supposed to be an infrequent occurrence, you need
to study the alerts and compare them to your regular expressions.
Keep refining your regular expressions until all the alerts are
suppressed. But also keep the expressions as specific as
possible. Also, I'm assuming that no one is on the system being
set up for monitoring and making the kinds of changes that should
be triggering alerts. Depending on the system being monitored
and your skills with regular expressions, creating the first
chkproc.txt file is likely to take two to several hours and
refining it, half an hour to several more hours.
When you don't get new alerts each time a new ps output file is checked,
it's time to move on to another system. If the operating system is
different, you may want to repeat the entire process building a new
chkproc.txt file and appending it to the first one. Be sure to keep
process IDs unique. If the systems are very similar adding new
host definitions to chkproc.txt may be all that's required. Either
way expect to see some new alerts. Each additional system monitored
will likely trigger some new alerts.
Once monitoring of one or more systems is in place, expect to see
alerts occur from time to time. Initially they will be much more
frequent and may occasionally come in bunches. Different process
statuses, command formats and jobs that do not run continuously
but periodically will all contribute to these. When you next
perform some administrative tasks for the first time since monitoring
was put in place, you'll see more. Early on, many of the alerts will
result in adjustments to regular expressions or additions of new
processes. You're building a database of all the normal process
states on the monitored systems. It may never be 100% complete.
Over time you'll see more alerts that you don't want to suppress,
alerts for the processes described above as sensitive. You may
even see alerts triggered by unauthorized activity; that, after all,
is the reason for building the system in the first place. As you refine
the database of normal activity don't get complacent and just
add everything you see into it. If you don't understand a process
or status that triggered an alert, investigate it. This monitoring
system is also a learning tool for the system administrators using
it. By the time the system is running smoothly, the administrators
using it should be able to look at ps listings and have a clear
idea what every process displayed is doing and what started it.
Every process on a system is started by a user or cron or
directly or indirectly spawned from an already running process.
If a process is triggering an alert and you don't know why, don't
eliminate the alert by adding it as an allowed process. If necessary, use
nowarn.txt to temporarily stop seeing the alerts. Keep watching
until a pattern develops. If you stick with it, eventually
you'll figure it out. It took me about four weeks to figure out
what was starting sendmail in the background. I knew I could
make the alerts go away in a minute or two but wanted to know the
cause and correctly believed I'd learn something about the system
if I stuck with it. Once I understood the cause, I was easily
able to fix the problem and stop Sendmail rather than accept a
process I did not want to run.
Conclusion
Though these scripts will require modification for use on other
systems, they are provided as a starting point for simple host
based intrusion detection. The only restriction on their use is
that the copyright notices be respected. Hopefully they have
shown that simple scripts can build a fully automated
notification system alerting selected personnel of changes to key
files or of questionable processes that may need to be
investigated on monitored systems.
Though the two pieces, file change and process monitoring, are
logically independent, they form a much stronger intrusion
detection system when used together than either component used
alone. Anyone who knows UNIX knows that text files that are not
marked as executable can still be interpreted by a shell.
Thus, while it may be difficult to make permanent changes to a
system without triggering a file change alert, it's not at all
difficult to be on a system with file change monitoring and to be
able to execute non tracked files without changing any tracked
files. This would evade Tripwire or any system
based on tracking file changes via checksums in the same manner
that it could evade the scripts presented here. Only a file
change monitoring system that tracked every file on the system
would pick up suspicious changes. Effectiveness of a system that
tracked every file would depend on operators or administrators
who knew every file on the system likely to change and who had
the time to investigate all but the most routine changes.
On a system that implemented process tracking as described here,
it would be difficult for an intruder to operate long without
being detected if the process monitoring was reasonably frequent.
If suspicious activity was not investigated promptly and the
system did not also implement file change tracking, it might be
possible for an intruder to alter ps or the script that invokes
ps on the monitored system in such a way that process alerts were not
triggered. Simply sending the old output from a previous
execution of ps again and again would normally have this effect. Thus
even if authorized changes are made to the process monitoring
script, its content should be rechecked when the file change
tracking system reports the change. Unauthorized changes to the
process monitoring script or the ps executable are, by definition,
unauthorized activity, whether by an
intruder or an authorized user using the system in an unauthorized
manner.
With both file change tracking and process monitoring as
described here and used to check each other, it's difficult to
see how either an intrusion or unauthorized activity by authorized
users could go on for long, undetected.
Copyright © 2000 - 2014 by George Shaffer. This material may be
distributed only subject to the terms and conditions set forth in
https://geodsoft.com/terms.htm
(or https://geodsoft.com/cgi-bin/terms.pl).
These terms are subject to change. Distribution is subject to
the current terms, or at the choice of the distributor, those
in an earlier, digitally signed electronic copy of
https://geodsoft.com/terms.htm (or cgi-bin/terms.pl) from the
time of the distribution. Distribution of substantively modified
versions of GeodSoft content is prohibited without the explicit written
permission of George Shaffer. Distribution of the work or derivatives
of the work, in whole or in part, for commercial purposes is prohibited
unless prior written permission is obtained from George Shaffer.
Distribution in accordance with these terms, for unrestricted and
uncompensated public access, non profit, or internal company use is
allowed.