
Linux, OpenBSD, Windows Server Comparison: Scalability

The most common meaning of the word scalability, when discussing computers, seems to be how many processors in a single machine an operating system is capable of supporting. It seems to me that if this is really important, then the UNIX products from Compaq, Hewlett-Packard, IBM, Sun, and perhaps others would be the primary contenders, and none of the systems discussed here are really leading choices. As I have no experience with any of these kinds of systems, that's speculation.

Another way to define scalability might be the ability to build large computers from individual units that can be applied to a single computing task, i.e. a parallel supercomputer. Here we mean a cluster of machines that work together to solve a common problem. Recent projects of this type have consisted of hundreds to thousands of Intel CPUs running Linux. Linux clusters include both high end commercial offerings with many CPUs housed in a single cabinet and home grown solutions, including arrays of identical machines and even mixtures of different processor types and speeds. A Scientific American article stated, "As of last November, 28 clusters of PCs, workstations or servers were on the list of the world's 500 fastest computers." It's not clear how many of these are Linux based.

One such cluster, not in the top 500, described at http://stonesoup.esd.ornl.gov/, first became operational in 1997. The Oak Ridge National Laboratory page describes it as "No Cost Parallel Computing." This cluster of over 130 machines is about 40% Pentiums, a few Alphas and the rest 486s. The Intel machines all run Linux and the Alphas run Digital UNIX. All are discarded, obsolete computers. As faster machines become available, the slowest nodes are replaced. This demonstrates how Linux can be used to build special purpose, high performance systems at an exceptionally low cost to performance ratio.

Since my experience is in small environments, I'm going to define scalability in a manner that is applicable to such environments: the ability to move and reconfigure machines as needed to make effective use of available resources, and to add processing power when and where it is needed to perform necessary functions. Another way of looking at this is to ask how easy it is to install, configure and reconfigure computers, how easily a configuration can be moved from one computer to another or split between multiple systems, and how well each operating system works with a mix of active applications and servers. Whether there are any fundamental performance differences between the different OSs should also be considered; if it consistently takes twice as much CPU power to perform a similar task on one OS as on another, the first will require more or faster computers.

System Performance

Regarding system performance, there is no simple or easy way to make a broad and reliable generalization that one operating system is faster or slower than another. Generally comparisons are based on benchmarks, carefully structured sets of tests that are timed to provide relative performance figures. To the extent that a benchmark performs operations similar to those performed on a production machine, in a ratio that closely corresponds to the production workload, it gives some indication of how the production machine might perform. The more focused the functions of a specific machine, such as a server running one or two server applications, the more important it is that the benchmark measure the functions the machine will actually be performing most of the time.

Static Web Pages

One "benchmark" that's shown up for a few years consistently shows Windows NT, and more recently Windows 2000, and IIS beating Linux and Apache by very large margins serving static web pages. Some have used this to suggest that Windows is faster than Linux, or at least that IIS is a faster web server than Apache. Apache uses, and starts if necessary, a separate process for each active page request. Starting a process is resource intensive compared to the relatively simple task of receiving an HTTP request, reading a disk file and transmitting it to the requesting client; it's quite likely Windows' threaded architecture is more efficient, possibly enormously more efficient, at this specific task.

The reviews I'm referring to are the web server reviews that PC Magazine has run for several years. The static page loads described range from one to eight thousand per second, or nearly 4 to almost 29 million per hour. This is on par with the busiest sites in the world, except that such sites rarely serve only static pages anymore, and certainly not from a single server. The May 2000 review doesn't even describe the Linux machine(s) configuration. I doubt any configuration changes would have made the Linux machine faster than Windows 2000, but one has to wonder how well optimized the Windows machine was and what, if anything, might have been done to improve the Linux machine's performance.

PC Magazine gives both Windows 2000 and Sun Solaris 8 "Editor's Choice" acknowledgment, but leading web sites keep picking Apache on various platforms, heavily Linux and FreeBSD as well as Sun. According to Security Space's survey of the top 1000 web sites, Apache has 60%, Netscape Enterprise 15% and IIS 11%. They use an interesting weighting system (counting links to the sites, much as Google does), but every measure I've ever seen gives Apache approximately a two to one or better lead over IIS. The Security Space rankings are very interesting because they give Apache a much larger relative market share among large web sites than the broader measures that look at millions of web sites. You wouldn't expect Apache, or free or open source software generally, to rank well at the larger sites that presumably could afford commercial solutions unless they found a clear overall advantage. As PC Magazine does not even describe the Linux system configuration, one has to wonder what biases are affecting PC Magazine's choices and rankings.

Red Hat has built a new web server, Tux, for Linux. This is a web server optimized for speed. June 2001 tests by eWEEK Labs, another subsidiary of PC Magazine's parent Ziff Davis, show Tux to be almost three times as fast as Apache on Linux and about two and a half times as fast as IIS on Windows 2000 (Tux 12,792, Apache 4,602, IIS 5,137 transactions per second). This shows the danger of relying on any "benchmark", especially a single application based benchmark, to estimate platform performance. Change the application or the task and you can get totally different results. It's also worth noting that the "transactions per second" test, which includes a mixture of static and dynamic content, places Apache only modestly behind IIS, where the "static page" test shows enormous differences.

Returning to the "poorly" performing Linux Apache web server, it's worth noting that a single CPU Linux server running Apache was able to serve approximately 1000 static pages a second, or 3.6 million static pages an hour. How many web sites in the world serve pages at this rate? How many do it with one single CPU server? How useful is this benchmark to anyone? Who has a Gigabit (twenty-some T3 lines) connection to the Internet? I mention this performance result not to create a straw man, but because PC Magazine is well known, and for several years, particularly on this test, Linux and Apache have looked very weak compared to NT and IIS.

The feature list included GUI management wizards for IIS but listed none for Apache. Apache does have a GUI configuration tool, Comanche. Comanche may have been excluded because it's technically not part of Apache, because the authors didn't know about it, or because some don't consider it up to IIS's standard as a GUI management tool. The feature list did not include Apache's text configuration file, even though, for those who know what they are doing, a text configuration file is the easiest way to access and manipulate a server configuration. For "Development tool included," an "Optional Visual Studio Enterprise Edition" is listed for IIS, while Apache, the only system to include full source code, editors and compilers to do anything you need with the system, is listed as "None". PC Magazine has a bias in favor of Microsoft, and a very strong bias towards systems that are easy for a novice to set up and use and against systems that may require professional skills to use to optimal advantage. Who you listen to matters.

Other "Benchmarks"

Just before I wrote this, I ran a small Perl benchmark. On identical machines, according to this benchmark, Linux is about 20% faster than NT and NT is about 20% faster than OpenBSD. On some other benchmarks I did in the past, OpenBSD blew away Linux, and on others Linux outperformed OpenBSD; I couldn't do all tests on all systems because I didn't have comparable development tools on all of them. Earlier in 2001, I ran the command line version of SETI@home, a floating point intensive application, on most of my systems for about two weeks. Every SETI@home run is different, so comparisons are only approximations. On essentially identical machines, the OpenBSD version typically ran about 50% faster than the Windows NT version, with Linux roughly 10% slower than NT. The then current version of the SETI@home software was not available for OpenBSD; the commands and output of the version that was available were almost identical. Did the OpenBSD version do less work, or was OpenBSD much faster at this task?
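
The benchmark itself isn't reproduced here. As a rough illustration of this kind of quick Perl timing test, a minimal sketch using Perl's standard Benchmark module might look like the following; the two workloads are hypothetical placeholders, not the tests I actually ran.

    #!/usr/bin/perl
    # Minimal cross-platform Perl micro-benchmark sketch.
    # The two workloads below are hypothetical placeholders.
    use strict;
    use warnings;
    use Benchmark qw(timethese cmpthese);

    my $results = timethese(100_000, {
        string_ops => sub { my $s = join '-', map { $_ * 2 } 1 .. 50 },
        regex_ops  => sub {
            my $t = "the quick brown fox " x 10;
            my $count = () = $t =~ /o\w+/g;    # count the matches
        },
    });
    cmpthese($results);   # prints a table of relative speeds

Run unchanged on each operating system, the relative numbers give only a rough comparison, subject to all the caveats above.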

Since Celerons acquired built-in cache, the trade press has often referred to them as price-performance bargains; on the same version of OpenBSD, I found 533MHz Celerons to run 2.5 to 3 times slower than a PIII 500 running the identical SETI application. Treat all performance comparisons with skepticism. The only performance that matters is that of your production systems, which is not likely to correlate closely with any standard benchmark.

Hardware Requirements

My first NT machine was a Pentium 133. I started with 32MB RAM; this was totally unacceptable. Increasing this to 64MB made an almost usable machine. 128MB seemed to be the right amount of RAM. 96MB may be a usable minimum, but it's such an odd amount today that for all practical purposes 128MB should be considered the minimum amount of RAM for an NT or Windows 2000 machine. The Pentium 133 is pretty sluggish by today's standards but is still usable (in my opinion); it is the absolute minimum that should be considered. A no longer available ZDNet review said that 128MB on a 400MHz P2 is about the minimum to run Windows 2000 and XP "acceptably". Any version of Linux or OpenBSD will run in much less memory on a much slower machine.

I believe they can both, with effort, be installed on an 8MB 386, though this may not be true of the 2.4 Linux kernel. From what I've read they are comfortable on 32MB 486s. If you tried to run the X Window system on such a machine it would be unacceptably sluggish, but I don't think a GUI is necessary or desirable on a server. Why does Windows require so much more hardware than the UNIX-like systems? Obviously much of it is the need to support the GUI, which in the case of Windows is not an optional component. It also suggests that the GUI is likely to impose a performance overhead. Without the X Window components loaded, the Linux or OpenBSD system should have more resources available for the server applications that are its reason for being. I know a Red Hat 7.1 Linux workstation typically has about twice as many processes running with X Windows active as a Linux system without it.
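
A trivial way to check the process count claim on any particular box, run once with X active and once without, is simply to count ps output. This is a sketch, not a rigorous measurement, and it assumes a Linux-style ps; OpenBSD's ps wants -ax rather than -e.

    #!/usr/bin/perl
    # Rough process count; run with X active and again without it.
    use strict;
    use warnings;
    my @lines = `ps -e`;    # one header line plus one line per process
    printf "%d processes running\n", scalar(@lines) - 1;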

Red Hat 7.1 used as a desktop system with X Windows on a P3 500 with 128MB RAM seems roughly comparable to NT Workstation on a P2 450 with 256MB RAM. Both are fine with the last couple of applications that have been used, but both can be aggravatingly slow when switching to an application that has not been used for some time. The NT machine, despite its much larger memory, can be especially irritating in this regard, but I tend to have a lot of applications open at the same time. Increasing memory on both to 384MB seems to noticeably improve the Linux machine but makes no difference on the Windows NT machine.

Windows, at least NT, uses the memory it has incredibly inefficiently. When I wrote this, I had 20 open windows. I frequently have more. On a machine with 384MB of RAM, Performance Monitor shows 330MB available and 130MB committed, for a total of 460MB. In other words, even though there should be 254MB of physically free RAM (384 - 130), Windows NT has paged 76MB to disk (460 - 384). How is it possible to be so stupid about memory management? This machine has lots of memory to spare (254MB), yet more than half of the used memory (76 of 130MB) has been paged to disk, causing long waits when I switch to a window that hasn't been used for a while.
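
To make the arithmetic explicit, here is the same calculation using the Performance Monitor figures just quoted; this is only a restatement of those numbers, not a new measurement.

    #!/usr/bin/perl
    # Restates the Performance Monitor figures from the paragraph above.
    use strict;
    use warnings;
    my $physical_ram = 384;                        # MB installed
    my $available    = 330;                        # MB reported available
    my $committed    = 130;                        # MB reported committed
    my $total        = $available + $committed;    # 460 MB
    my $free_ram     = $physical_ram - $committed; # 254 MB physically free
    my $paged_out    = $total - $physical_ram;     # 76 MB pushed to the pagefile anyway
    print "total=${total}MB free_ram=${free_ram}MB paged_out=${paged_out}MB\n";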

Increasing the Linux machine to 384MB eliminated waits. Because I don't use that machine as my primary machine, I tend to have less open. I've deliberately opened far more than I ever do in normal use, waited minutes, hours, or days, and returned to that machine. There are no more waits, no matter how long it's been since an application was used. I assume the NT memory behavior has to do with assumptions about memory availability, which was typically not more than 16MB when NT was designed. This should be easy for Microsoft to fix in Windows 2000. I hope they have, but would be curious to hear from anyone with first hand experience.

Linux and OpenBSD's low hardware requirements allow machines that would otherwise be discarded to be used effectively. Any process that is not intrinsically resource intensive is a candidate. A dedicated DHCP server could be an example.

OS Performance Comparisons

My sense, and I have no proof for this, is that comparable text based server applications on similar machines would run consistently but only moderately faster on either Linux or OpenBSD than on NT, provided that the X Window system was not loaded. How much faster, if at all, would depend on exactly what the application did and would likely vary as other functions were performed. If the X Window system were loaded, the open source systems would likely lose their advantage. If a user were actively engaged in management tasks at the console using X, and a user were doing similar things under Windows, I'd expect the Windows system to have somewhat of an overall performance advantage across a broad range of management tasks and server applications. As always, whether this was true would depend on exactly what was being done. Since the X Window system is a GUI that can be dropped on top of any text based UNIX system, while Microsoft Windows systems are now built as a single system where the GUI isn't an add-on but an integral part, I would find it somewhat surprising if Windows weren't typically faster on graphical tasks given comparable hardware. There are advantages as well as disadvantages to tight integration.

Except for a GUI management interface, GUI server applications are a contradiction in terms. Still, native Windows applications are almost invariably developed with a GUI management interface, and only applications ported from other environments retain text based management interfaces. It may be misleading to even refer to a "text management interface", as these server applications have no interactive mode. They simply get their settings from the command line and/or a disk file. Typically the disk file is a text file that an administrator maintains with their preferred editor.

It is difficult to see, across a broad range of server applications and assuming generally comparable levels of quality, how properly configured, lean applications lacking the overhead of a GUI should not generally outperform counterparts that depend on a system that always includes significant GUI components. The GUI management components won't actually be part of the executable server applications. Rather, just as the text editor is the interface to the text configuration file, the GUI management tools will be the user interface to the Registry, Active Directory or wherever else Microsoft or the specific vendor has most recently decided to store configuration settings. Since the X Window system is not closely integrated with the operating system, if there is no recent management activity and the OS needs memory, all the X Window components should eventually be swapped to disk. With Windows, where the GUI and OS are one and the same, there are much more likely to be significant GUI components that will never be swapped to disk.

In the spring of 2001, the Linux 2.4 kernel brought improved multithreading capabilities and multiprocessor efficiency, which are likely to greatly narrow, if not eliminate, two areas where Windows systems have had performance advantages.

Some might say that in a computing world now dominated by GUI based operating systems, it's an unfair and unequal comparison to strip these components from Linux and OpenBSD systems for comparisons with Windows systems. As a counter, I'd say the only rational approach to configuring machines that have well defined purposes, such as servers, is to examine those purposes and tailor the machines as much as practical to serve them. It's a major strength of Linux and all the open source BSD systems that the GUI really is not part of the essential OS and must be added as a specific option in the install process. It doesn't even need to be removed; it simply never needs to be installed in the first place. The only purpose of performance testing is to approximate the actual configurations that will be used in live environments; if a Linux or OpenBSD server will run in production without a GUI, that's the only reasonable way to performance test it. If you performance test without a GUI, though, you can't then add the GUI back in for ease of learning or ease of use comparisons.

The question of the relative performance merits of Windows NT or Windows 2000, Linux and OpenBSD cannot be answered definitively without carefully controlled experiments. Even if there were a clear answer, a change in the primary server application or important changes in the infrastructure environment might yield very different results. Those who need to deal with true high volume servers need to do a significant amount of testing to find the best platform and application choices.

Price Performance Ratio

It's an interesting exercise to compare what two different operating systems can do on the same hardware, but this is not likely to produce a number many businesses really care about, or should care about. A much more important metric is the price-performance ratio. If the operating systems cost exactly the same, then comparing them on the same hardware makes good sense. If, however, one operating system is free or very low cost and the other costs thousands of dollars, it makes more sense to compare two machines where the total installed costs of hardware and software are the same.

When I last checked, a 5 user license for Windows 2000 cost around $700, and for a machine that was to be used as a public web server, Microsoft added an Internet Connector license which cost just under $3000. When it's almost a given that additional Resource Kit, backup and/or defragmentation software will need to be added, the starting software cost for a public Windows 2000 web server is effectively over $4000. You can buy a nice medium size server, hardware only, for that; in fact, such a server could likely handle all the traffic that several T3 lines could carry. When you add the Microsoft licensing costs and additional necessary software to the price of hardware, I simply do not see how Linux and OpenBSD systems could fail to outperform Windows systems on a price-performance basis across a broad range of applications.
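
As a rough illustration of the price-performance argument, consider requests served per second per dollar of total installed cost. In the sketch below, only the roughly $4000 Windows software estimate comes from the figures above; the hardware prices and throughput numbers are made-up placeholders, not measurements.

    #!/usr/bin/perl
    # Hypothetical price/performance comparison. Only the ~$4000 Windows
    # software estimate comes from the text; the hardware prices and
    # requests-per-second figures are illustrative placeholders.
    use strict;
    use warnings;

    my %systems = (
        'Windows 2000 + IIS' => { hw => 3000, sw => 4000, req_per_sec => 1200 },
        'Linux + Apache'     => { hw => 3000, sw => 0,    req_per_sec => 1000 },
    );

    for my $name (sort keys %systems) {
        my $s    = $systems{$name};
        my $cost = $s->{hw} + $s->{sw};
        printf "%-20s \$%5d total, %.3f requests/sec per dollar\n",
               $name, $cost, $s->{req_per_sec} / $cost;
    }

With equal total budgets, the machine that spends nothing on the operating system can put the difference into hardware, which is the point being made here.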

I have no basis for making performance distinctions between Linux and OpenBSD except as regards multiprocessors and clustering. OpenBSD supports neither technology, so it cannot compete against either Linux or Windows where such solutions are applicable. OpenBSD is developing SMP support, but given the years that Windows and Linux have already had multiprocessor support, and OpenBSD's comparatively limited resource base, it's hard to see how OpenBSD could reasonably hope to compete effectively in this area.

Scalability As Cost Effective Performance

There is more to scalability, as we are discussing it here, than either the best performance for a single defined task on a specific hardware configuration, or performing the most operations at the lowest cost per operation when both hardware and software costs are included. These are simply raw performance measures. In a very small environment, even one high end, single CPU server may have more processing power than is needed for the required tasks and thus, due to its high cost, is not an appropriate solution. All environments need to meet their performance requirements at reasonable cost.

The smallest environments may start with a single server on which all services run, or a very small number of machines may provide different subsets of services such as file and print, e-mail, business management applications, etc. Over the life of a business, software will be replaced or added, staff will change or grow, and machines will age and be replaced. From a systems management perspective, there may be a significant advantage if all server applications can be kept on, or moved to, a single machine that runs on widely available and inexpensive hardware.

Unless there are essential business management applications that run only on non-Intel hardware platforms, there can be a strong push to move to Windows NT, and now 2000, just because of the number of applications that are available for it. It's worth remembering that unless special terminal server versions of NT or 2000 are being used, most of the applications the users work with do not run on the server but rather on their PCs. Users may load the applications or data from the server's disk drives, but both Linux and OpenBSD, like nearly all other UNIXes, also support Windows disk sharing if needed.

Some of the applications or services that really are server applications and execute on a server are FTP, Telnet, SSH, SMTP, time, DNS, HTTP, POP3, Portmap, Auth, NNTP, NTP, NetBIOS *, IMAP, SNMP, LDAP, HTTPS, and Kerberos *; the asterisks indicate multiple affiliated protocols.

Nearly every business will want NetBIOS, or the Novell, Macintosh or UNIX counterparts, for disk sharing, but will want to keep these visible only to local computers and not to the Internet. Very small businesses may choose to have their ISP provide all Internet related services and thus run nothing but a file and print server and a host for business management applications. In such a simple environment, if the business management applications run on Windows or Novell it makes sense to use this platform for file and print sharing as well. Likewise, if the business management applications run on a UNIX platform, the options for providing Windows or Novell style file and print services from the UNIX system should be investigated, as a satisfactory solution here could significantly reduce system management overhead.

If the organization is large enough to want to manage its own e-mail system, have an Intranet, or consider hosting a public web site, FTP or list servers, then it should take the selection of servers very seriously. Given Microsoft's large market share and the comparative unfamiliarity of the open source alternatives, Microsoft is often selected without there ever being an actual selection process. If they are thought about at all, commercial UNIXes are often seen as too expensive and open source UNIXes as "not supported", which is not correct and is addressed elsewhere.

Relocating Server Applications

Since here we are focusing on scalability, it's worth looking at factors like how well a machine supports a mix of server applications, how easily a specific configuration can be duplicated on a different machine, and how easily a service can be added to or removed from a specific configuration. Any server should be able to support a mix of applications and varying load levels, though to some extent it's not surprising if a server becomes less stable as the number of active applications and the loads (CPU, memory, disk and network I/O) increase.

If a single server has been sufficient and you now need to add another server, the more you understand about what is going on in the current server, the easier it will be to select wisely which processes should be moved to the new server. Windows NT provides two tools. Task Manager gives an interactive snapshot with a regularly changing display. Depending on the sort order, the entire display may jump around, or, if tasks are shown in alphabetical order, only the numbers change; Task Manager provides no logging. Performance Monitor can provide logging, but you must "catch" a process actively executing before it can be tracked. Processes that start, execute for a second or two and exit are almost impossible to track (with either tool), though they may consume most of the system's resources. I've seen an NT system afflicted with such a process. We could see it appear in Task Manager, but it never lasted long enough to learn anything useful.

UNIX's ps command, on the other hand, can capture just about everything the system knows about a process, including its parent. A script running ps in a loop, without delays, and saving its output will generate a huge amount of information. Any process listed, regardless of how briefly it lasted, could be tracked back via the chain of parents to the process spawning the resource consuming children. This is an example of a rare occurrence that cannot be monitored on Windows but can easily be monitored on UNIX. UNIX keeps records on many factors related to every process as it executes. ps can track them all and display or save them at the user's choice. Though it doesn't draw nice interactive graphs like Windows' Performance Monitor does, it's far more informative. Systematic use of ps can develop a profile of whatever is going on in a system.
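
A minimal sketch of the kind of ps logging loop just described follows. The field names and options assume a Linux-style (procps) ps; OpenBSD's ps takes slightly different options, and the log path is only an example.

    #!/usr/bin/perl
    # Log every process, including short-lived ones, as fast as ps can run.
    # Assumes a ps that accepts -e and -o (Linux procps does); adjust the
    # options for BSD variants. The log path is illustrative.
    use strict;
    use warnings;

    open my $log, '>>', '/var/log/ps-trace.log' or die "cannot open log: $!";
    select((select($log), $| = 1)[0]);    # unbuffer the log file handle

    while (1) {
        my $now = time();
        for my $line (`ps -e -o pid,ppid,pcpu,rss,etime,args`) {
            next if $line =~ /^\s*PID/;   # skip the header line
            print $log "$now $line";
        }
    }

Even a process that lives for a fraction of a second will usually appear in at least one pass, along with its parent PID, which is what makes tracing a spawner of short-lived children practical.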

Because the hardware specific components of a UNIX system are isolated and generally well known, an existing UNIX configuration can be used to create a custom install process, allowing the configuration to be migrated to a more powerful machine if that is all that is required. Such a process is documented in detail in the Hardening OpenBSD section of this web site. The "fullback" script I use on my Linux systems could be restored over a fresh install on a new machine to migrate a complete machine configuration. Nothing similar is possible on Windows systems, as even a minor hardware change can cause a restore to fail. There are disk duplication systems available for Windows. Generally these are intended to develop a standard configuration that will be applied to multiple machines. Presumably the duplicating software knows how and where to make the necessary adjustments to account for hardware differences, assuming the duplication can be to non identical machines. These are also normally intended to be used at the beginning of a system's life, not late in its life cycle when its registry has already grown and begun to degrade performance.
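
The actual "fullback" script isn't reproduced on this page. The following is only a minimal sketch of the general approach, a full tar backup that skips pseudo and scratch filesystems and is later restored over a fresh install on the new machine; it assumes GNU tar, and the paths and exclude list are illustrative.

    #!/usr/bin/perl
    # Sketch of a full-system backup suitable for migrating a configuration.
    # Not the author's "fullback" script; paths and excludes are assumptions.
    use strict;
    use warnings;

    my $archive  = '/backup/fullback.tar.gz';
    my @excludes = qw(/proc /tmp /mnt /backup);   # pseudo and scratch areas

    my @cmd = ('tar', 'czpf', $archive,
               (map { "--exclude=$_" } @excludes),
               '/');
    system(@cmd) == 0 or die "tar failed: $?";

    # On the new machine, after a minimal fresh install:
    #   tar xzpf fullback.tar.gz -C /
    # then adjust the hostname, IP addresses and any hardware specific
    # settings (such as network driver configuration) before rebooting.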

Duplicated UNIX machines do not need to be kept as replicas of each other. After a few minor IP adjustments, services turned off on one machine could be left executing on the other, or vice versa, effectively splitting the application server load between the two machines. The ease with which UNIX machines can be duplicated makes it relatively straightforward to have several machines performing essentially the same function. With load balancing software, this could work quite well for web servers or FTP servers.

Because UNIX systems are logically designed, highly modular, and provide the tools to manipulate their parts, they lend themselves well to small and medium size organizations that need to add and change machines as needs change. The same is not true of Windows. Migrating Windows functions nearly always means building new machines from scratch. If applications have been customized over time, it can be very difficult to replicate their settings on a new machine.


Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 

