GeodSoft: Building an Association Web Site, Basics: Web Technology


Web Technology

Virtually from the beginning, web browsers had a simple way to go back to documents you had previously seen. Thus after your side trip to Japan, you could go back twice and be right back where you had been when you left the New York document.

A web browser is the client-side program that connects to a web server, which makes documents available. The protocol is the Hypertext Transfer Protocol, or HTTP. Because of the jumping around that was envisioned with HTTP, it is a stateless protocol. All the other protocols mentioned so far have been session based. In these, a client makes a connection to the server and the connection is held open until the user or originating computer takes an action to end the session. If the user is inactive, the server drops the connection after a predetermined timeout period.

HTTP is fundamentally different. A browser makes a connection to a server and requests a document; as soon as the document is sent, the server closes the connection and retains no knowledge of it. If the same browser subsequently requests another document from the same server, the server sees it as an entirely new connection, no different from a request from a brand new browser. (Later versions of HTTP can keep a connection open for several requests, but the server still treats each request independently.) While this seemed logical when HTTP was first developed, it has had significant implications for the design of applications as the web has moved from the delivery of simple text documents to e-commerce applications.
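As a rough illustration of this statelessness, the following Python sketch starts a tiny web server on the local machine and requests two documents from it. It uses only Python's standard library. The names DocumentHandler, fetch and demo, and the document names /newyork.htm and /japan.htm, are made up for the example. Each call to fetch opens a new connection, and the server builds a fresh handler for each one, keeping nothing from the earlier request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DocumentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: a new instance is created for every connection.

    BaseHTTPRequestHandler speaks HTTP/1.0 by default, so the server
    closes the connection after each response.
    """

    def do_GET(self):
        body = f"You asked for {self.path}\n".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(port, path):
    """Open a new connection, request one document, and close."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("ascii")
    finally:
        conn.close()


def demo():
    """Serve on a free local port and make two independent requests."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        first = fetch(port, "/newyork.htm")   # hypothetical document name
        second = fetch(port, "/japan.htm")    # hypothetical document name
        return first, second
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status and body of both responses. Nothing in the second response reflects the first request, because the handler has no place to remember it.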

While web servers are capable of transmitting any type of document requested by a URL, web browsers are designed to display only specific types of documents. The primary document type that a web browser is designed to display is a Hypertext Markup Language, or HTML, document. An HTML document is an ASCII document that contains a minimum set of HTML tags that define the essential document structure, plus any number of optional tags that describe the document's structure or appearance.

All web browsers also display plain ASCII text documents. As browser technology has advanced, browsers have become capable of displaying a growing number of document types. If a browser does not know how to display a document type, it offers the user the option to save it to disk. Browsers also can make use of helper programs called plug-ins to display document types that the browser does not understand.

HTML is a simple, fixed application of Standard Generalized Markup Language (SGML), an extremely complex document description language. HTML is composed of tags that always start and end with paired angle brackets, < >. A number of HTML tags are used in pairs, where the opening tag has the simple < > structure and the closing tag has the form </ >. Between the angle brackets, or the angle bracket and slash combination, is text that identifies the specific HTML tag. Each tag in some way affects the presentation of the document in a browser. Where the tags are paired, the content between the tags is affected in a predefined manner.

In addition to the tag name, which follows the opening angle bracket, many HTML tags may contain attributes, typically in the form of an attribute name, an equals sign, and the attribute value. Attributes further refine the presentation of the document in a manner specific to the tag and attribute.

One of the key tag pairs is the so-called anchor tag. The most important attribute of the anchor tag is the HREF attribute, which contains a URL. The text between the opening and closing anchor tags typically describes the document pointed to by the URL. The form is as follows: <a href="URL">descriptive text</a>. HTML tag and attribute names are not case sensitive.
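To make the anchor tag structure concrete, here is a minimal Python sketch that pulls the URL and the descriptive text out of each anchor tag in a piece of HTML. It relies on the HTMLParser class from Python's standard library. The class name LinkCollector and the function extract_links are invented for this example, and the sketch ignores complications such as nested or unclosed tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, descriptive text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the anchor currently open, if any
        self._text = []     # pieces of descriptive text seen so far

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so
        # <A HREF="..."> and <a href="..."> are handled alike.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(html_text):
    """Return a list of (URL, descriptive text) pairs found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Passing a fragment such as <A HREF="http://www.xyz.com/">XYZ home</A> to extract_links should yield one pair holding the URL and the text between the tags, whichever case the tag name was written in.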

When displayed in a browser the descriptive text would typically be underlined so that users will know that this is a hypertext link and that clicking on the descriptive text will cause the referenced document to be retrieved. This document does not intend to describe the HTML language, except as it relates to significant web design issues. For a comprehensive reference on HTTP and HTML, see the HTML Sourcebook. Be sure to get the latest revision as HTML is being updated and extended with some regularity.

A complete URL consists of four parts. The first part is the protocol, which is HTTP by default but can also be FTP in any contemporary web browser. Other protocols may be supported by some browsers. The second part is the host name, i.e. the name of the web server machine on which the document is located. The third part is the virtual path to the document. The fourth and final part is the actual document filename. An example URL might be:

http://www.xyz.com/dir1/subdir2/mydoc.htm

In this example the protocol is http and is separated from the host name by a colon and two slashes. The domain name xyz.com would belong to the company or organization that owned or ran the web site. The host or machine name on which the web server software runs is www. It could be a dedicated machine that runs only a web server, or www might be an alias for another machine. Though web host names typically begin with www, this is purely conventional; there is no requirement that web site names begin with www. This was once almost universal, but it is increasingly common to find web sites whose names do not begin with www.
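The four parts of the example URL can be picked apart with Python's standard urllib.parse module, as in the sketch below. The function name describe_url and the dictionary keys are chosen only for illustration. The split of the host name into a machine name and a domain is deliberately naive: it suits www.xyz.com but is not a general rule.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    # URL paths always use forward slashes, so posixpath is the right tool
    # for separating the virtual path from the document filename.
    directory, filename = posixpath.split(parts.path)
    # Naive assumption: everything before the first dot is the machine
    # name and the rest is the domain name.
    machine, _, domain = parts.hostname.partition(".")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "machine": machine,
        "domain": domain,
        "path": directory,
        "filename": filename,
    }
```

For http://www.xyz.com/dir1/subdir2/mydoc.htm this gives the protocol http, the host www.xyz.com, the virtual path /dir1/subdir2 and the filename mydoc.htm.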

The directory dir1 is the first subdirectory under the directory defined as the web or document root directory. Technically there is no reason that the document root could not be the root of a Unix system or the root directory of a Windows partition, but for security reasons this should never be the case. The directory subdir2 is the next subdirectory under dir1, and mydoc.htm is the actual document filename. On a Unix server the file might actually be /home/httpd/html/dir1/subdir2/mydoc.htm and on a Windows server it might be c:\inetpub\wwwroot\dir1\subdir2\mydoc.htm. Note that on a Windows server, the URL still contains forward slashes even though Windows separates directories with backslashes. In the Unix example, html is the web or document root directory and in the Windows example it's wwwroot.
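The mapping from a URL to a file under the document root can be sketched in Python as follows. This is only an illustration of the idea, not how any particular web server does it; the function name local_path is made up, and real servers perform many more checks. The sketch refuses path segments of . or .. so that a request cannot climb above the document root, which reflects the security concern mentioned above.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit


def local_path(url, document_root):
    """Map the path part of a URL onto a file under document_root.

    document_root is a PurePosixPath (Unix) or PureWindowsPath (Windows),
    so the result uses that system's directory separator.
    """
    # The URL path is always split on forward slashes, whatever the server.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Simplified safety check (an assumption of this sketch): reject
    # segments that would step outside the document root.
    if any(s in (".", "..") for s in segments):
        raise ValueError("path may not contain . or .. segments")
    return document_root.joinpath(*segments)


url = "http://www.xyz.com/dir1/subdir2/mydoc.htm"
unix_file = local_path(url, PurePosixPath("/home/httpd/html"))
windows_file = local_path(url, PureWindowsPath(r"c:\inetpub\wwwroot"))
```

Here unix_file and windows_file correspond to the two example locations given in the text: the same URL, with its forward slashes, resolves to a backslash-separated path only when the document root is a Windows directory.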


Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.
