Chapter 3

COSC1300

Networking and the Internet HTTP Protocol

‘‘Web - What spiders make inside sheds’’

‘‘The Rural Australia Thesaurus of Computer Terminology’’

In this chapter, we cover a complete study of web servers and related issues.

Table Of Contents

1. Introduction
2. Hypertext Transfer Protocol
   2.1 Request Phase
   2.2 Response Phase
3. Persistent Connections
4. Comparing HTTP/1.0 & HTTP/1.1
5. Web Servers
6. Useful Tools
7. Related Links

1. Introduction

In Chapter 2, we discussed the TCP/IP Network Protocol Suite and the functionality of each layer. There we mentioned various application protocols in the Application Layer of the TCP/IP hierarchy. Among these application layer protocols, the Hypertext Transfer Protocol is one of the most important, and is the topic for this chapter.

HTTP is the language that Web Browsers (the client) and Web Servers (the server) use to speak to each other. It is important to enforce a strict set of rules for this conversation, as the client probably needs to communicate with many servers (e.g. you access this site as well as many other web sites) and the server needs to communicate with many clients (e.g. this site is accessed by many students). The Internet Engineering Task Force (IETF) has released several RFCs (Requests for Comments) that outline HTTP and set the standard for Web communication. The following table lists a few important RFCs related to HTTP.

RFC Number  Purpose                               URL

1945        HTTP/1.0 Specifications               http://www.w3.org/Protocols/rfc1945/rfc1945
2616        HTTP/1.1 Specifications               http://www.ietf.org/rfc/rfc2616.txt
2617        HTTP Basic & Digest Authentication    http://www.ietf.org/rfc/rfc2617.txt

You can find a full list of RFCs at http://www.faqs.org.

1.1 Uniform Resource Locators & Uniform Resource Identifiers

In a heterogeneous environment like the World Wide Web, it is important to use an unambiguous way of referring to resources that clients can access. This is done through the Uniform Resource Locator (URL) notation, a straightforward way of indicating the location in terms of the protocol, the host and the path of the resource within the host. Typical components of a URL are illustrated in Fig. 1.

Fig 1. Anatomy of a URL

The first part of the URL specifies the communication protocol. For example, when the client wants to access a resource from a web server using HTTP, the URL begins with http://. Other valid schemes include ftp, mailto, file and telnet. The second part is the host name where the requested resource resides. By default, web clients assume that the server listens on the default port (port 80) for web requests. If the server is configured to use a non-standard port, the URL must include the port number, separated from the host name by a colon. The rest of the URL specifies the path of the resource (relative to the DOCUMENT_ROOT of the web server). We discuss the DOCUMENT_ROOT later in Section 5.
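The parts of a URL described above can be pulled apart programmatically. The following is a minimal sketch in Python, using the standard urllib.parse module; the helper name split_url is our own, and the fallback to port 80 is an assumption that only makes sense for http:// URLs.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (protocol, host, port, path) for an http:// URL."""
    parts = urlsplit(url)
    # If the URL names no port, assume the default HTTP port.
    port = parts.port if parts.port is not None else 80
    return parts.scheme, parts.hostname, port, parts.path

print(split_url("http://goanna.cs.rmit.edu.au:2000/hello.html"))
print(split_url("http://www.faqs.org/"))
```

The first URL names a non-standard port explicitly; the second relies on the default.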

A URI (Uniform Resource Identifier; RFC 1630 uses the older name Universal Resource Identifier) is a superset of a URL, in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax used in practice is the URL - you can safely assume that "URI" is synonymous with "URL", even though this is not exactly correct. For more information about URIs please read RFC 1630.

1.2 Terminology

There are a number of terms used in this chapter that have specific meanings in the context of HTTP communication. A few of the most important terms are given below.

Connection

A transport layer virtual connection (TCP/IP connection in most cases) established between the server and the client for the purpose of communication.

Message

The basic unit of HTTP communication.

Request

An HTTP request message sent by the client to the server.

Response

An HTTP response message sent by the server to the client.

Resource

A network data object or a service that can be identified by a URI.

Note: A resource is not necessarily a web page; it could be any resource that can be served via the network (e.g. a voice stream).

User Agent

The client which initiates the request. In most cases, this is a web browser.

Server

An application program that accepts connections, receives requests and sends back responses. This is a very broad definition, and depending on the nature of the requests being served, the server could be an origin server, proxy, or another type of server.

The rest of the chapter focuses on the details of HTTP and discusses how this protocol operates. In later sections, we discuss web server performance issues and attempt to identify bottlenecks in web communication. Furthermore, we consider how a web server can be tuned to maximize its performance under different conditions.

Checkpoint

1. As a part of the assignment, we asked you to set up an Apache web server on a port > 50000. Why do you not use port 80?

2. What are the pros and cons of defining a protocol as a sequence of interactions within a single session (like SMTP) versus one single REQUEST, one single RESPONSE and then disconnecting (as in HTTP/1.0)?

COSC1300 - Lecture Notes
Web Servers and Web Technology

Copyright © 2000 RMIT Computer Science
All Rights Reserved

2. Hypertext Transfer Protocol (HTTP)

In this section, we discuss the HTTP protocol, the request and response phases of an HTTP connection, and the different request methods.

Each HTTP transaction is handled as a separate conversation between the browser and the server. For this reason, we say that HTTP is stateless -- it does not remember the state that it was in at the end of the last conversation.

Note: This ‘‘statelessness’’ has some advantages and disadvantages. One advantage is efficiency: it reduces the overhead of keeping track of historical transactions, which makes a big difference to a heavily loaded web server. On the other hand, ‘‘statelessness’’ creates problems: just imagine an online shopping cart application that can’t remember what you ordered on the previous page. In such cases, users benefit if the server can remember information about previous transactions.

HTTP is stateless, but you can add state to this stateless mechanism using other methods such as sessions. These topics are discussed elsewhere.

The protocol normally consists of two phases: the request phase and the response phase. In the request phase, the browser sends out a request consisting of a request method, the path part of a URL, and the version number of the HTTP protocol. It then sends some header information, terminated by a blank line. In the response phase, the server returns the protocol version, a status code, and a few lines of header information, terminated by a blank line. Then, the server sends data, the actual content requested by the browser.

Fig 2. A short conversation between your browser and http://goanna.cs.rmit.edu.au:2000/hello.html.

All HTTP transactions follow the same general format. Each client request and server response has three parts: the request or response line, a header section, and the body.

2.1 Request Phase

The three parts of a client request are as follows:

1. The client contacts the server at a designated port. (The default port is 80, but we can set up a web server listening on a ‘‘non-standard’’ port.) It then sends a document request by specifying an HTTP command called a Method, followed by a document address and the HTTP version number.

For example, if the client wants to fetch hello.html using the HTTP/1.1 protocol, it sends the command:

GET /hello.html HTTP/1.1

This command uses the GET method to request the document hello.html.

The methods supported by the HTTP protocol are discussed in Section 2.1.1.

2. Next, the client sends optional header information to inform the server of its configuration and the document formats it will accept. All header information is given as <Header-Name: Value> pairs. For example,

Connection: Keep-Alive
User-Agent: Mozilla/4.73
Accept: image/gif, image/jpeg, */*

tells the server:
- to keep the TCP/IP connection open even after the document is delivered;
- that the browser name is Mozilla (Netscape) and its version is 4.73;
- that the browser can handle GIF and JPEG images, and will also accept any other type (*/*).

The header section terminates with a blank line. You can find an extensive list of HTTP request headers and their meanings in Section 2.1.3.

3. The third part of the client request is optional. The client may use it to send additional information that might be needed to process POST requests. In other words, if you use the GET method, there is no need to pass this portion to the server. We will discuss how the POST method works and how it differs from the GET method later.
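The three parts of a request can be assembled by hand to see exactly what travels over the connection. The sketch below is illustrative only: build_request is a hypothetical helper of our own, not part of any library, and the Host value is taken from the example URL used in this chapter.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request(
    "GET",
    "/hello.html",
    {
        "Host": "goanna.cs.rmit.edu.au:2000",
        "Connection": "Keep-Alive",
        "User-Agent": "Mozilla/4.73",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
print(request.decode("ascii"))
```

Because this is a GET request, the third part (the body) is left empty and the message ends at the blank line.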

Checkpoint

1. What is the general format of an HTTP request and an HTTP response?


2.1.1 Methods

In this section, we discuss the different methods used in the Hypertext Transfer Protocol between clients and servers.

The HTTP protocol defines a number of methods. The five listed below are described first; the remaining ones are covered in Section 2.1.1.4.

Method   Description

GET      Returns the contents of the document
HEAD     Returns the header information of the document
POST     Treats the document as a script, executes it and sends results
PUT      Replaces the content of the document with some data
DELETE   Deletes the document

2.1.1.1 The GET Method

GET is the most common method used by clients to request documents. When a client uses the GET method, the server responds with a status line, headers, and the requested document. If the server cannot process the request due to an error or lack of authorization, the server usually sends a textual explanation in the data portion of the response.

We mentioned that a client request may consist of three parts, but a GET request has only two: the request command and the request headers. The third part (the entity body) of a GET request is always empty. GET is basically used for ‘‘Please send me this file’’-type requests.

However, it should be noted that you can use the GET method to pass data to a script, for example when processing a form. In such cases, we attach this additional information (for example, form fields and their values) to the requested URL. In other words, the additional information is passed as a part of the request command. For clarification, please refer to the following illustration.

Illustration of the GET method used in a more complex situation.

1. The user requests a form by GET’ing its URL.

2. The user fills in the form, and hits the submit button.

At this point, the browser collects the form fields and their values, attaches them to the URL (of the script that processes the form), and sends the result back to the server using the GET method. The command part of this request is of the following form:

GET /serve_drink.php?username=Citizen&favorite=Water&submit=Submit HTTP/1.1

3. The server executes the script (using the arguments it received) and sends the processed resultsback to the client.
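The way the browser attaches form fields to the URL, and the way the server recovers them, can be sketched with Python's urllib.parse module. The field names and values are the ones from the illustration above; everything else is a minimal sketch rather than what any particular browser or server actually runs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Browser side: collect the form fields and append them to the script's URL.
fields = {"username": "Citizen", "favorite": "Water", "submit": "Submit"}
request_line = f"GET /serve_drink.php?{urlencode(fields)} HTTP/1.1"
print(request_line)

# Server side: take the request target and split the query string back
# into field names and values before handing them to the script.
target = request_line.split(" ")[1]
received = parse_qs(urlsplit(target).query)
print(received)
```

Note that parse_qs returns a list of values for each name, because a form may legitimately send the same field more than once.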

2.1.1.2 The POST Method

This is another method the client can use to send a request to a web server. However, the server responds in a different way this time around. When the server receives a POST request, it passes the request and its associated data to another program (or a script). In most cases, such a program acts as a ‘‘web gateway’’ or a web interface to a database or another information system. This program is executed and the result is sent back to the web server. The web server in turn sends the processed result back to the client. The POST method, in general, can be considered a ‘‘please do this for me’’-type request.

Essentially, a POST request has three parts: the command, the request headers and the additional data required to process the request. For example, in a form processing program, this additional data may contain form field values.

Fig 4. The conversation between the client and the server, when you POST to the http://goanna.cs.rmit.edu.au:2000/multiply.php program.
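To make the three parts of a POST request concrete, here is a minimal sketch that assembles one by hand. The helper build_post and the field names a and b are assumptions made for illustration; the notes do not say which fields multiply.php expects.

```python
from urllib.parse import urlencode

def build_post(path, host, fields):
    """Assemble a raw POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length tells the server how many body bytes to read.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

# Hypothetical form fields for the multiply.php example.
message = build_post("/multiply.php", "goanna.cs.rmit.edu.au:2000", {"a": "6", "b": "7"})
print(message.decode("ascii"))
```

Unlike the GET case, the form values travel after the blank line, in the body, and do not appear in the request command.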

2.1.1.3 HEAD Method

The HEAD method is functionally similar to the GET method except that the server will send only the response header in its reply. A HEAD request consists of only two parts: the command and the request headers. These request headers are similar to the request headers in a GET request.

This method is used when the client wants to find out information about the document without retrieving it. For example, the client may want the following information:

- The modification time of a document, useful for cache-related queries. (Caching will be discussed in Chapter 6.)
- The size of the document, useful for page layout, estimating arrival time, or determining whether to request a smaller version of the document.
- The type of the document.
- The type of the server, to allow customized server queries.

Please note that the header information sent by the server can vary from server to server.

The following diagram illustrates a conversation between the client and the server using the HEAD method.

Fig 4. The conversation between the client and the server, when you HEAD http://goanna.cs.rmit.edu.au:2000/hello.html.
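The difference between HEAD and GET can be observed with a small experiment. The sketch below starts a throwaway server on the local machine (not the goanna server from the figure) using Python's standard http.server module, then requests the same document with both methods. The handler class, the page contents and the fetch helper are all made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><body>Hello</body></html>"

class HelloHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)      # headers followed by the document

    def do_HEAD(self):
        self._send_headers()        # headers only, no document

    def log_message(self, *args):
        pass                        # keep the example output quiet

def fetch(method, port):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/hello.html")
    response = conn.getresponse()
    body = response.read()
    headers = dict(response.getheaders())
    conn.close()
    return response.status, headers, body

server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head_status, head_headers, head_body = fetch("HEAD", port)
get_status, get_headers, get_body = fetch("GET", port)
server.shutdown()
server.server_close()

print("HEAD:", head_status, head_headers["Content-Length"], len(head_body))
print("GET: ", get_status, get_headers["Content-Length"], len(get_body))
```

Both replies announce the same Content-Length, but only the GET reply actually carries that many bytes.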

2.1.1.4 Other Methods

Apart from the above three methods, HTTP specifies a few other methods that are used less frequently. In fact, not all servers implement these methods.

DELETE Method

This allows the client to request the server to delete a document specified in the command line.

PUT Method

This allows the client to pass a document to be saved in the server’s document tree.

OPTIONS Method

This method allows the client to determine the options associated with a resource or the capabilities of a server, without initiating a retrieval.

TRACE Method

This asks the server to echo back the request it received. It is useful for checking the connection and tracing the path the request takes.

CONNECT Method

This is a reserved method, used specifically for SSL tunnelling. (SSL is described in Chapter 6).

Availability of methods in HTTP/1.0 & HTTP/1.1

Method    HTTP/1.0  HTTP/1.1

GET       Yes       Yes
POST      Yes       Yes
HEAD      Yes       Yes
DELETE    Yes       Yes
PUT       Yes       Yes
OPTIONS   No        Yes
TRACE     No        Yes
CONNECT   No        Yes

More about these methods can be found in RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1, RFC 1945 - Hypertext Transfer Protocol -- HTTP/1.0, and Key differences between HTTP/1.0 and HTTP/1.1.

2.1.2 Comparison between GET & POST Methods

Although the GET method is meant for ‘‘Please send me’’-type requests, we saw in Section 2.1.1.1 how the GET method can be used to pass some values (as a part of the GET command) to the server, and then on to a server-side script. The reader may be unsure whether to use the GET method or the POST method when the client wants to pass some values (say, form fields) to a server-side script. The following table gives you a ‘‘rule of thumb’’ for making a reasonable decision.

Use the GET method if:

- you have only a few values to pass to the script
- you expect the user wants to save the query result as a bookmark, and the user expects to retrieve the same results
- you don’t care if the user sees the values passed to the script
- you want to pass only ASCII data

Use the POST method if:

- you have a very long list of values (say more than 1000 bytes) to be passed to the script
- you do not want to allow the user to bookmark the page with the query values, and you expect the user to select them dynamically
- your query consists of some confidential data that should not appear in the URL
- you want to pass non-ASCII data (say, an image)

Checkpoint

1. If the server has multi-lingual support (it can deliver documents in different languages), how does it determine in which language the document should be delivered?


2.1.3 Request Headers

In this section, we discuss the request phase of the HTTP connection, and request headers in particular.

The request header consists of an arbitrary number of header fields. Most of these fields are informational, and are generally optional. The following table gives a list of commonly used header fields and their meanings.

Header              Description

From                E-mail address of the requesting user
User-Agent          Name and version number of the client browser
Referer             URL of the last document the client displayed
Accept              File types that the client will accept
Accept-Encoding     Compression methods that the client will accept
Accept-Language     Language(s) that the client will accept
Authorization       Used for authentication purposes
If-Modified-Since   Return document only if modified since specified date/time
Content-Length      Length (in bytes) of the request body (used in POST method)
Connection          Connection options, such as keep-alive
Host                Virtual host to retrieve data from
Cookie              Send a previously saved cookie to the server

User-Agent

This header is useful for the server to generate custom-built pages. For example, the server may deliver a ‘‘Frames’’ version of a document to a Netscape client, while delivering a ‘‘No-Frames’’ version to a Lynx client. (Lynx is a UNIX based text-only browser that does not support frames, well, sort of).

Referer

This header is used to send the last visited URL to the server. This could be used to dynamically generate a ‘‘Back’’ button in your documents. Furthermore, it can be used to determine whether the client followed a proper sequence of pages, when such a sequence exists.

Accept

This field is used to specify what document types the client (browser) wants to receive. There may be multiple Accept lines in a request header. For example, the headers

Accept: text/plain
Accept: text/html
Accept: image/jpeg

tell the server that the client can accept plain text and HTML documents and JPEG images.

Accept-Encoding and Accept-Language specify, respectively, the compression methods that the client can understand (and uncompress) and the client’s language priorities.

If the same document is available in different languages, the server can determine the document to deliver using the Accept-Language field.
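One way a server might use Accept-Language to pick a document is sketched below. This is a deliberately simplified illustration, not how any particular server implements it: the function names are our own, the optional q-values (0 to 1, default 1) are used only for ordering, and wildcards, q=0 exclusions and prefix matching such as en-gb falling back to en are ignored.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value (default q=1)."""
    choices = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            quality = float(params[2:])
        choices.append((quality, tag.strip().lower()))
    # sorted() is stable, so tags with equal q-values keep the client's order.
    return [tag for quality, tag in sorted(choices, key=lambda c: -c[0])]

def choose_language(header, available, default="en"):
    """Pick the first language the client prefers that the server has."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
    return default

# The server in this example has English and French versions of the document.
print(choose_language("da, en-gb;q=0.8, en;q=0.7", {"en", "fr"}))
```

Here the client prefers Danish, then British English, then any English; since the server has neither of the first two, it falls through to the plain English version.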

If-Modified-Since

This is used in caching schemes. In order to improve efficiency, most browsers keep a copy of previously accessed documents in a browser cache, and display the local copy when the user requests it again, rather than downloading it again. However, in order for this to work well, the browser must check the remote server to make sure that the document hasn’t changed. If-Modified-Since is used by the browser to ask the server to return the document only if it has changed since the specified date/time. Caching is discussed in Chapter 6.
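The decision the server makes when it sees If-Modified-Since can be sketched as follows. The function name is hypothetical; the status code 304 (Not Modified) is the standard reply telling the browser that its cached copy is still good, while 200 means the document is sent again in full.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200                      # unconditional request
    client_time = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop any microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200

# Assumed modification time of the document on the server.
last_modified = datetime(2000, 8, 2, 1, 19, 50, tzinfo=timezone.utc)
header = format_datetime(last_modified, usegmt=True)   # HTTP date format

print(header)
print(conditional_status(last_modified, header))
print(conditional_status(last_modified, "Tue, 01 Aug 2000 00:00:00 GMT"))
```

In the first check the browser's copy is as new as the server's, so nothing needs to be transferred; in the second the copy is a day older, so the full document is returned.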

Connection

This header field is sent to the server to ask for special handling mechanisms. For example, if the client wishes to establish a persistent connection for the entire transaction, it can ask for a ‘‘Keep-Alive’’ connection.

Authorization

Authorization is used by various validation schemes. This will contain the name of the authorization method and any other information expected by the validation method, such as realm, username and password.
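For the Basic scheme described in RFC 2617, the Authorization value is simply the text username:password encoded in base64. A minimal sketch follows; the function name and the credentials are made up for illustration.

```python
import base64

def basic_authorization(username, password):
    """Build the Authorization header line for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"

line = basic_authorization("Citizen", "secret")   # hypothetical credentials
print(line)

# Base64 is an encoding, not encryption: anyone can reverse it.
token = line.split(" ")[2]
print(base64.b64decode(token))
```

The second print shows why Basic authentication offers no secrecy on its own: the password is recovered without any key.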

Cookie

The Cookie header field is defined in neither HTTP/1.0 nor HTTP/1.1. Nevertheless, among all the request header fields, this is the most popular, and it is used in millions of sites. Cookie is an extension provided by Netscape, and is widely used to maintain the ‘‘state’’ of web pages. It is used by the browser to send cookie values that have been saved in the browser. Cookies are discussed in detail in Chapter 4.

How Cookies work on the client side

When the user types in a URL, the browser searches its ‘‘Cookies Database’’ to see if there are any cookies associated with the requested page. If any such cookies exist (and they have not expired), it attaches a Cookie header field to the request header (along with the cookie <name=value> pairs) and sends it to the server.

E.g.:

Cookie: username=Citizen
Cookie: favorite=Water

The way the cookies are stored in the browser varies from browser to browser. For example, Netscape Navigator keeps them in a single file, while Internet Explorer stores them in individual files.

On arrival of the request header, the server detaches the cookie and acts on the received information. Most servers store cookie data in an environment variable called ‘‘HTTP_COOKIE’’ and make it available to server-side scripts.
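A server-side script can then read the cookies back out of that environment variable. The sketch below uses Python's standard http.cookies module; the function name is our own, and the sample value assumes the common form in which the pairs arrive in a single string separated by semicolons.

```python
import os
from http.cookies import SimpleCookie

def read_cookies(environ=os.environ):
    """Return the cookies found in HTTP_COOKIE as a plain dictionary."""
    jar = SimpleCookie()
    jar.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in jar.items()}

# Simulate the environment a server would prepare for the script.
cookies = read_cookies({"HTTP_COOKIE": "username=Citizen; favorite=Water"})
print(cookies)
```

When no cookie was sent, the variable is absent and the function simply returns an empty dictionary.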

The Set-Cookie header field is used in the response headers, and it is used by the server to send cookies to be saved in the browser. We’ll discuss the ‘‘Server-side of the Cookies story’’ later.

For more information about Cookies, please visit http://developer.netscape.com/viewsource/archive/goodman_cookies.html.

Checkpoint

1. Why is the Host header required when requesting a resource from a virtual host?

2. How does Basic Authentication work at the client side?

3. At http://www.fruit.com, people can buy apples, oranges, and bananas. A customer’s basket can contain, for example, 0.5 kilograms of apples and 1.4 kilograms of bananas. What command, and with which parameters, can such a web site use to store this information in a cookie on the user’s workstation?


2.2 Response Phase

In this section, we discuss how the response phase of an HTTP connection works.

Now it’s the server’s turn to respond to the client request. Similar to the client request, the server response consists of three components:

- The status line
- The response headers
- The response body

The server first sends back a line, usually referred to as the ‘‘status line’’, containing the protocol version, a three-digit status code, and a text explanation of the status.

2.2.1 Status Codes

The status codes described here are categorized into four groups as follows. (HTTP/1.1 also defines a 1XX group of informational codes.)

Code  Text                   Description

2XX codes - Success

200   OK                     The URL was found. The contents will be delivered in the response body.
201   Created                A URL was created in response to a POST request.
202   Accepted               The request was accepted for processing later.
204   No Content             The request is successful, but there is no data to send. This happens when an
                             executable script has done some processing in response to a query, but it
                             doesn’t have any particular information to display.

3XX codes - Redirection

301   Moved Permanently      The URL has permanently moved to a new location.
302   Found                  The URL can be temporarily found at a new location.

4XX codes - Client Errors

400   Bad Request            Syntax error in the request.
401   Unauthorized           The client failed to authenticate itself successfully.
403   Forbidden              This URL is forbidden. It could be an IP address based restriction, a user-based
                             restriction, or a directory-based restriction. This type of error can’t be
                             overcome by just providing the correct username/password.
404   Not Found              You knocked on the wrong door; the document is not there.

5XX codes - Server Errors

500   Internal Server Error  The server encountered an unexpected error.
502   Bad Gateway            The server was trying to fetch data from elsewhere when the remote service
                             failed.
503   Service Unavailable    The server is overloaded at the moment with too many requests.
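Because the first digit of the status code identifies its group, a client can react sensibly even to a code it has never seen before. A minimal sketch, with a function name of our own choosing:

```python
# Group names keyed by the first digit of the status code.
GROUPS = {
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_group(code):
    """Classify a three-digit HTTP status code by its first digit."""
    return GROUPS.get(code // 100, "Other")

for code in (200, 302, 404, 500):
    print(code, status_group(code))
```

For example, a browser that does not recognise a particular 4XX code can still treat it as a client error.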

2.2.2 Response Headers

After the status line, the server sends out a response header. The header is a mixture of various pieces of information about the server and the document to follow. Like the request header, much of the information in the response header is optional, with the exception of the Content-Type field.

After the header, the server sends a blank line that delimits the header from the response body. The response body contains the actual document. After this, the HTTP conversation between the server and the client is terminated (unless a persistent connection is in use; see Section 3).
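The layout of a response (status line, headers, blank line, body) can be seen by taking a raw response apart. The following sketch uses a hand-written sample response; parse_response is a hypothetical helper, and real clients handle many cases (repeated headers, chunked bodies and so on) that it ignores.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers and body."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```

Header names are lower-cased here because HTTP treats them as case-insensitive.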

In the following table and the subsequent subsections, you will find the most often used response headers.

Header              Description

Server              Name and version of the server software
Date                The current date and time (GMT)
Last-Modified       Date on which the document was last modified
Expires             Date on which the document expires
Location            The location of the document. This is used when the document is retrieved from a redirected location.
MIME-Version        The MIME version used
Content-Length      The length of the response body in bytes
Content-Encoding    The compression method of this data
Content-Language    The language in which this document is written
Pragma              Additional information for the browser
WWW-Authenticate    Used for authentication
ETag                Unique identifier for the server resource
Set-Cookie          Sets and sends a cookie to the browser

WWW-Authenticate

This specifies the authorization scheme and the realm of authorization required for the requested URL. When the client receives this header, it pops up a dialogue window for the user to enter the username and the password.

e.g.: This site returns

WWW-Authenticate: BASIC realm="SameAsForums"

and when the client receives it for the first time, it displays the user authentication dialogue box. This is covered in more detail in Chapter 4.

Content-Type

This describes the media type and the subtype of the response body. The server should return media types that conform with the client’s preferred formats. The client usually specifies what it wishes to receive in its Accept request header.

ETag

This indicates an entity tag. This field provides the client with a unique identifier for the server resource. It is highly unlikely that different server resources will have the same entity tag. This tag provides a powerful mechanism for caching.

e.g.:

ETag: "2f5cd-964-381e1bd6"

Set-Cookie

This is the server-side part of the ‘‘Cookie’’ communication. This header contains a <name=value> pair (the actual cookie) which the server wants the client to maintain. There are other optional fields the server may include in the header. The additional fields include the expiry date of the cookie and the path of the document tree to which this cookie is attached. Cookies are discussed in more detail in Chapter 4.

e.g.:

Set-Cookie: username=Citizen; expires=Saturday, 29-Jul-00 12:30:00 GMT

This will store a cookie named ‘‘username’’ with the value ‘‘Citizen’’ in the client browser, and it is attached to the current document.

It is possible to send a cookie that applies to a whole branch of the document tree or even to more than one server.
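A server-side script could generate such a header with Python's standard http.cookies module, as in the sketch below. The path value "/" is an assumption added for illustration: it attaches the cookie to the whole document tree rather than to a single document.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["username"] = "Citizen"
# Optional fields: when the cookie expires, and which part of the
# document tree it applies to ("/" means the whole tree).
cookie["username"]["expires"] = "Sat, 29-Jul-2000 12:30:00 GMT"
cookie["username"]["path"] = "/"

header_line = cookie.output()   # the complete Set-Cookie header line
print(header_line)
```

The name=value pair always comes first, followed by the optional fields separated by semicolons.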

Pragma

Pragma is used to send various instructions to the browser. A commonly-used hint is no-cache, which tells the browser not to add the document to its local browser cache. This is useful if the document is the result of a POST request, or is generated on-the-fly by a script and changes every time it is requested.

e.g.:

Pragma: no-cache

Checkpoint

1. The client receives the following response header.

HTTP/1.1 302 Found
Date: Wed, 02 Aug 2000 01:19:50 GMT
Server: Apache/1.3.12 (Unix) PHP/4.0.0 mod_ssl/2.6.4 OpenSSL/0.9.5a
Location: https://yallara.cs.rmit.edu.au:8001/new_server.html
Connection: close
Content-Type: text/html; charset=iso-8859-1

What is the meaning of these response headers?


3. Persistent HTTP Connections

In this section, we discuss how the HTTP/1.1 protocol handles persistent connections, and how we can achieve this in HTTP/1.0.

One main drawback of HTTP/1.0 is that it requires a new TCP/IP connection to be set up and destroyed for each document transferred. This imposes a severe performance penalty when a browser needs to fetch several URLs from the same server - a common case when downloading a document that contains several images.

E.g. Let’s assume that we want to download the following page:

<HTML>
<HEAD>
  <TITLE>The multiple images example</TITLE>
</HEAD>
<BODY>
  <IMG SRC="1.gif">
  <IMG SRC="2.gif">
  <IMG SRC="3.gif">
</BODY>
</HTML>

The entire conversation that takes place between the server and the client is as follows.

1. The client starts up a TCP/IP connection with the server.
2. The client sends the HTTP request.
3. The server sends the document, with image tags, but not the images.
4. The connection is destroyed.
5. The client establishes three new TCP/IP connections with the server.
6. The client hands over each HTTP request (for each image) via the newly-established connections.
7. The server sends the images.
8. The connections are destroyed.

Since we destroy the original connection at a time when we have not completed the download (i.e. the document and the images), the performance is degraded.

HTTP/1.1 proposes a solution for this drawback. It allows the client and the server to establish persistent connections, allowing the client to continue with the existing connection if it needs to download more resources.

If we used HTTP/1.1, the above conversation would be as follows:

1. The client starts up a TCP/IP connection with the server.
2. The client sends the HTTP request.
3. The server sends the document, with image tags, but not the images.
4. The client establishes two more TCP/IP connections with the server.
5. The client hands over each HTTP request (for each image) via the existing connection and the newly-established connections.
6. The server sends the images.
7. The connections are destroyed.

In comparison to the previous example, we need to establish only two new connections, saving the start-up time for one TCP/IP connection.
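Connection reuse can be observed directly. The sketch below runs a throwaway HTTP/1.1 server on the local machine and fetches a page and its three images one after another over a single connection, which is a simpler pattern than the parallel connections in the steps above. The handler class, the file name page.html and the response contents are made up for illustration. The server records the client's TCP source port for every request; if the port never changes, every request travelled over the same connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

client_ports = []   # source port of the TCP connection behind each request

class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the server keep connections open

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = f"contents of {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A persistent connection needs Content-Length so the client
        # knows where one response ends and the next begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                        # keep the example output quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/page.html", "/1.gif", "/2.gif", "/3.gif"):
    conn.request("GET", path)
    conn.getresponse().read()       # finish reading before reusing the connection
conn.close()
server.shutdown()
server.server_close()

print(len(client_ports), "requests over", len(set(client_ports)), "TCP connection(s)")
```

Changing protocol_version to "HTTP/1.0" in the handler makes the server close the connection after each response, so the client has to reconnect for every request.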


4. HTTP/1.0 and HTTP/1.1

In this section, we present a comparison between the HTTP/1.0 and HTTP/1.1 protocols.

Comparison between HTTP/1.0 and HTTP/1.1

Some of the most significant differences between HTTP/1.0 and HTTP/1.1 are given below.

Persistent TCP/IP Connections

As discussed in the above section, HTTP/1.1 connections remain open by default, allowing the browser to download multiple resources using the same TCP/IP session.

Partial Document Transfers

HTTP/1.1 allows browsers to obtain specific portions of documents by specifying the start and end positions to be retrieved.

In addition, this protocol allows documents to be divided into logical chunks that are handled independently.

This allows for caching schemes in which only those portions of the document that have changed need to be downloaded from the web server.
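In HTTP/1.1 the start and end positions are sent in a Range request header, for example Range: bytes=100-199. The sketch below shows how a server might turn such a value into a slice of the document. It is a simplification with a function name of our own: only a single byte range is handled, and out-of-range requests are not rejected as a real server would.

```python
def parse_byte_range(value, size):
    """Turn 'bytes=start-end' into an inclusive (start, end) pair."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single byte range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":                      # suffix form: the last N bytes
        length = int(last)
        return max(size - length, 0), size - 1
    start = int(first)
    end = int(last) if last else size - 1   # open-ended: up to the last byte
    return start, min(end, size - 1)

document = bytes(range(256)) * 4         # a stand-in 1024-byte document
start, end = parse_byte_range("bytes=100-199", len(document))
part = document[start:end + 1]           # both positions are inclusive
print(start, end, len(part))
```

Both positions are inclusive, so bytes=100-199 selects exactly 100 bytes.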

Conditional Fetch

HTTP/1.0 allowed only a single type of conditional fetch - using the If-Modified-Since header field.

HTTP/1.1 adds several additional types of conditional fetch, increasing the flexibility of this feature.

Better Content Negotiation

HTTP/1.0 implements server-side content negotiation. The browser gives the server a prioritized list of MIME types it is willing to accept, and the server decides which version of a document to send. HTTP/1.1 adds client-side content negotiation, in which the server announces what formats are available and the browser picks the version it wants.

Official Support for Nonstandard HTTP/1.0 Extensions

There were quite a few non-standard features that were used with the HTTP/1.0 protocol. One example is the Host field used to select a logical host from a server that housed several ‘‘virtual hosts’’.

Better Support for Alternative Character Sets

HTTP/1.1 provides better support for alternative character sets, such as Mandarin and Japanese.

More Flexible Authentication

HTTP/1.1 adds support for user authentication across firewalls and gateways. It also provides an authentication mechanism based on the MD5 message digest algorithm that avoids the problem of sending usernames/passwords across the network in ‘‘plaintext’’. Cryptography is discussed in Chapter 6.
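The idea behind the MD5-based (Digest) scheme of RFC 2617 is that the client proves it knows the password by sending a hash that mixes the password with a one-time value (the nonce) chosen by the server. The sketch below follows the simplest form of the calculation, without the optional qop extension; the function names and all the sample values are made up for illustration.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest response value (simplest form, no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                  # what is being asked for
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # tied to the server's nonce

# Hypothetical values; a real nonce comes from the server's
# WWW-Authenticate header.
response = digest_response(
    "Citizen", "SameAsForums", "secret", "GET", "/hello.html", "abc123"
)
print(response)
```

Only this 32-character hash crosses the network. The server, which knows the password, repeats the calculation and compares the results.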

For more information about performance improvements gained in HTTP/1.1, please read the article W3C Recommendations - Reducing ‘‘World Wide Wait’’ and the paper Network Performance Effects of HTTP/1.1, CSS1, and PNG.

Checkpoint

1. How does the server choose the protocol to be used, i.e. either HTTP/1.0 or HTTP/1.1?


5. Web Servers

In this chapter, we discuss the installation, configuration and running of a web server.

A web server is an application that listens for requests from a client (generally a web browser), processes each request in some way, and sends a response. The language that is used for this communication is HTTP. This is possible because there is a logical HTTP connection between the client and the server.
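
The ‘‘process the request, send a response’’ step can be sketched in Python as below. This is a deliberately tiny illustration and makes several assumptions: the function name handle_request is hypothetical, the documents are held in a dictionary rather than on disk, only the GET method is understood, and the network side (listening on a socket) is left out entirely.

```python
def handle_request(request: str, documents: dict) -> str:
    """Process one HTTP request and return the full response text."""
    # The first line of a request looks like: GET /index.html HTTP/1.0
    request_line = request.split("\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) != 3:
        return "HTTP/1.0 400 Bad Request\r\n\r\n"

    method, path, _version = parts
    if method != "GET":
        return "HTTP/1.0 501 Not Implemented\r\n\r\n"

    body = documents.get(path)
    if body is None:
        return "HTTP/1.0 404 Not Found\r\n\r\n"

    # Status line, headers, a blank line, then the document itself.
    return (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{body}"
    )
```

A real server such as Apache does the same job for every connection it accepts, with many more headers, methods and error cases.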

The best way of understanding something is doing it for yourself. Installing and configuring a web server is no exception. You will be able to understand most of the topics that are covered easily if you spend some time installing your own server, tweaking its configuration options, and experimenting with its performance.

5.1 Why Apache?

There are a number of reasons why we choose Apache.

It is easy to install and configure - the installation and configuration are straightforward and self-descriptive, so you should be able to install it successfully without being an expert in the field.

It is open - the configuration is so open that you know exactly the effects of each change you make in the configuration, making it ideal for teaching purposes.

It is open source and released under the Apache License - therefore we can use it without licensing costs. For more information on open source license issues, visit the GNU web site.

It is the most popular web server today - at the time of writing, approximately 60% of all web sites in the world use Apache servers. For more information about web server usage statistics, visit the Netcraft web server survey at http://www.netcraft.com/survey. You may wish to use Netcraft’s Exploring sites facility to detect the web servers running at your favorite web sites.

It can be smoothly integrated with many other useful modules. For example, the PHP scripting language can be accommodated in the Apache server as a module.

5.2 Apache Server Installation - Directory Structure

When installing a server from scratch using Apache source code, we need to store this source code temporarily in a suitable place, traditionally /usr/local/src. It is always a good idea to keep the source code directory away from the final server software installation directories.

The location where we install the Apache server software is referred to as the SERVER_ROOT. The installation process creates various subdirectories, including the following, under the SERVER_ROOT.

htdocs is the DOCUMENT_ROOT of your web server, where you put the documents you want to publish.

bin is where the executable scripts that come with Apache, such as apachectl and apxs, are located.

conf is where the Apache configuration files, such as httpd.conf, are located.

5.3 Configuring Apache

Most configuration information for Apache is held in the file httpd.conf. Other files can be used by placing the Include filename directive in the configuration file. Apache reads its configuration into memory when it is first started, so if you make any changes to your configuration file, you will need to restart your server for them to take effect. The configuration file is read from top to bottom, so if the same directive appears twice, the second one generally overrides the first.
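
The top-to-bottom rule can be illustrated with a toy reader written in Python. This is only a sketch of the idea and is not how Apache itself parses its configuration: the function name read_directives is hypothetical, containers and Include are ignored, and every directive is treated as holding a single value (some real directives, such as LoadModule, accumulate instead of overriding).

```python
def read_directives(config_text: str) -> dict:
    """Read 'Directive value' lines; a later line replaces an earlier one."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        settings[name] = value  # the last occurrence wins
    return settings
```

If the text contains ServerName twice, the dictionary returned by this sketch holds only the value from the second line.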

5.3.1 Server-Level Directives

Server-level directives apply to the server as a whole. Some directives, such as LoadModule, only make sense on the server level. For other directives, it is useful to set a default value which can be specifically overridden by container or per-directory directives.

5.3.2 Container Directives

There are nine container directives that can be used in the Apache configuration file. These directives are used to specify resources or request methods. The containers can then include configuration directives specific to the matched entities. In most cases, the resource can be specified with or without quotes.

The Match forms are used for matching multiple resources using regular expressions.

<Directory> and <DirectoryMatch>

These directives are used to match specific directories under the web document root.

<Directory "/usr/local/htdocs/php_examples">

This would match the directory /usr/local/htdocs/php_examples

<DirectoryMatch "^/usr/local/htdocs/.*/[A-Z]{3}">

This would match any directory below /usr/local/htdocs whose path contains a subdirectory name beginning with three capital letters.

<Files> and <FilesMatch>

These directives are used to match specific files under the web document root.

<Files "apache.gif">

This would match any file named apache.gif

<FilesMatch "\.(jpeg|gif)$">

This would match any file ending with .gif or .jpeg.

<Location> and <LocationMatch>

These directives are used to match a URL. This means that the parameter does not have to match the file system but may match files or directories.

<Location "/php_examples">

This would match the URL http://domainname/php_examples

<LocationMatch "(c|php|pl)_examples">

This would match any URL containing c_examples, php_examples or pl_examples, such as http://domainname/pl_examples and http://domainname/php_examples
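
The regular expressions used in the Match examples above can be tried out with a short Python sketch. Python's re module stands in for Apache's own regular expression engine here, on the assumption that the two agree for patterns as simple as these. The sample paths and the helper name matching are made up for illustration.

```python
import re

# The three patterns from the DirectoryMatch, FilesMatch and
# LocationMatch examples.
DIRECTORY_PATTERN = re.compile(r"^/usr/local/htdocs/.*/[A-Z]{3}")
FILES_PATTERN = re.compile(r"\.(jpeg|gif)$")
LOCATION_PATTERN = re.compile(r"(c|php|pl)_examples")


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    # search() looks for the pattern anywhere in the string, so only
    # patterns containing ^ or $ are tied to the start or the end.
    return [name for name in candidates if pattern.search(name)]
```

Note that the LocationMatch pattern is not anchored, so in this sketch a URL path such as /doc_examples also matches, because it contains c_examples.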

<Limit> and <LimitExcept>

These containers can be used to limit the scope of their effectiveness to the HTTP methods specified.

<Limit GET POST>

The directives entered in this container will only apply to requests made using the GET and POST HTTP methods.

<LimitExcept HEAD>

The directives entered in this container will apply to requests made using any HTTP method except HEAD.

<VirtualHost>

This container allows one server to serve files for multiple domains or IP addresses. It is possible to override server-level directives in a VirtualHost container. For example, each virtual host can have its own logs and web document root.
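
The idea of name-based virtual hosts can be sketched in Python as a lookup on the Host field of the request. The function name select_document_root and the host names are hypothetical. The sketch assumes that, as in Apache's name-based virtual hosting, the first virtual host listed acts as the default when no name matches. Port numbers are stripped and IPv6 literals are not handled.

```python
def select_document_root(host_header: str, virtual_hosts: list) -> str:
    """Pick a document root for a request.

    virtual_hosts is a list of (server_name, document_root) pairs in the
    order in which they appear in the configuration.
    """
    # Drop an optional ":port" suffix and ignore letter case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    for server_name, document_root in virtual_hosts:
        if server_name.lower() == hostname:
            return document_root
    # No name matched: fall back to the first virtual host listed.
    return virtual_hosts[0][1]
```

With two virtual hosts configured, requests that arrive at the same IP address are served from different document roots depending only on the Host field sent by the browser.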

5.3.3 Per-Directory Directives

Files can be placed in individual directories containing directives that will apply to that directory and its subdirectories. The AllowOverride directive controls the types of directives that can be placed in these files, while the AccessFileName directive specifies what these files must be called. The default name is .htaccess.
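
Because these files also apply to subdirectories, a request for a file deep in the tree may be affected by several of them. The Python sketch below lists the per-directory files that could apply to a given directory, from the top of the file system downwards. The function name access_files_for is hypothetical, and the sketch assumes that overrides are permitted at every level, which in practice depends on AllowOverride.

```python
from pathlib import PurePosixPath


def access_files_for(directory: str, access_file_name: str = ".htaccess") -> list:
    """List the per-directory files that may apply, outermost first."""
    path = PurePosixPath(directory)
    # parents runs from the nearest parent up to the root, so reverse it
    # to walk from the root down to the directory itself.
    chain = list(path.parents)[::-1] + [path]
    return [str(level / access_file_name) for level in chain]
```

For the directory /usr/local/htdocs/php_examples, the last entry in the list is /usr/local/htdocs/php_examples/.htaccess, and directives in files nearer the end of the list are applied later.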

5.3.4 Order Allow,Deny

One of the most common tasks that a server administrator will want to perform is to allow or deny access to certain resources. This is achieved by the use of the Order, Allow and Deny directives. Allow and Deny can be used to specify hosts or networks, by domain or IP address, and allow or deny access to them. Order is used to specify the order in which the Allow and Deny directives are evaluated.

Deny from 192.168.12.122

This will deny access from the host with the IP address 192.168.12.122.

Allow from 192.168

This will allow access from all hosts with an IP address beginning with 192.168.

Deny from yallara.cs.rmit.edu.au

This will deny access from the host yallara.cs.rmit.edu.au.

Allow from rmit.edu.au

This will allow access from all hosts on the rmit network.

Order Allow,Deny will force the evaluation of all Allow directives, followed by all Deny directives.

Order Deny,Allow
Deny from cs.rmit.edu.au
Allow from yallara.cs.rmit.edu.au

This will cause all hosts except yallara on RMIT’s computer science network to be denied access.

Note that the Order directive uses the second argument to provide default access.

Order Allow,Deny

This will deny access to all hosts. While the default access does work, it is unclear and should be avoided in favour of more explicit directives, such as below.

Order Allow,Deny
Deny from all
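
The way Order, Allow and Deny combine can be sketched in Python. This is our own simplified model of the rules described above, not Apache's code: the function names host_matches and access_allowed are hypothetical, only the all keyword, full or partial IP addresses and domain names are recognised, and any value starting with a digit is treated as an IP address.

```python
def host_matches(spec: str, ip: str, hostname: str) -> bool:
    """True if an Allow/Deny 'from' value covers the client."""
    if spec == "all":
        return True
    if spec[0].isdigit():
        # Full or partial IP address: match whole dotted components.
        prefix = spec.rstrip(".")
        return ip == prefix or ip.startswith(prefix + ".")
    # Domain name: the host itself, or any host ending in ".<spec>".
    hostname, spec = hostname.lower(), spec.lower()
    return hostname == spec or hostname.endswith("." + spec)


def access_allowed(order: str, allow: list, deny: list,
                   ip: str, hostname: str) -> bool:
    """Decide access for one client under the given Order."""
    allowed = any(host_matches(spec, ip, hostname) for spec in allow)
    denied = any(host_matches(spec, ip, hostname) for spec in deny)
    if order == "Allow,Deny":
        # Default is deny; a matching Deny overrides a matching Allow.
        return allowed and not denied
    if order == "Deny,Allow":
        # Default is allow; a matching Allow overrides a matching Deny.
        return allowed or not denied
    raise ValueError(f"unsupported Order value: {order}")
```

With Order Deny,Allow, Deny from cs.rmit.edu.au and Allow from yallara.cs.rmit.edu.au, the sketch lets yallara in, refuses the other computer science hosts, and allows everyone else by default. With Order Allow,Deny and no other directives, every host is refused.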

6. Useful Tools

In this chapter, we present you with a list of Web resources that could be useful in your studies.

In this section, you will find some useful software tools for setting up, configuring and maintaining a web server.

Of course, you need Apache if you are going to install an Apache server.

If you think it is difficult to edit configuration files manually, try the TkApache graphical user interface.

The Apache server is bundled with RedHat Linux. If you run Linux at home, you might like to install it using the .rpm format. Instructions can be found here.

Alternatively, you can order RedHat Linux from http://www.lsl.com.au.

You can download PHP source code or binaries for many platforms, including Win32, from http://www.php.net/downloads.php. The Australian mirror is http://au.php.net/downloads.php.

Zend Optimizer is a very useful add-on for a PHP-capable Apache server. It optimizes intermediate PHP code and enhances server performance. Zend Optimizer can be downloaded from http://www.zend.com.

Jigsaw Web Server is W3C’s leading-edge Web server platform, providing a sample HTTP 1.1 implementation and a variety of other features on top of an advanced architecture implemented in Java. The W3C Jigsaw Activity statement explains the motivation and future plans in more detail. Jigsaw is a W3C Open Source Project, started in May 1996.

Internet Information Server is Microsoft’s candidate in the web server market.

MySQL is a very popular relational database system used with web servers. The precompiled PHP binary for the Windows platform has built-in MySQL functions.

7. Related Links

1. http://www.apacheweek.com - ApacheWeek, a weekly online magazine. Read this to find out what’s happening in the Apache world.

2. http://www-genome.wi.mit.edu/WWW/resource_guide.html - Lincoln Stein’s ‘‘How to setup and maintain a Web Site’’ home page.

3. http://serverwatch.internet.com/webservers.html - Web server technical details & server comparison.

4. http://www.w3.org/Talks/1998/10/WAP-NG-Overview - W3C’s presentation on HTTP/ng (I guess HTTP/ng is not progressing).

5. http://Apache-Server.Com/tutorials - Ken Coar’s Apache tutorials (author of Apache Server for Dummies).

6. http://www8.org/w8-papers/5c-protocols/key/key.html - Key Differences between HTTP/1.0 and HTTP/1.1, a paper on HTTP/1.0 & HTTP/1.1.

7. http://developer.netscape.com/docs/manuals/enterprise.html - Netscape Enterprise Server documentation.

8. http://www.microsoft.com/ISN/whitepapers.com - Web Hosting with IIS 5.0, a review of Internet Information Server 5.0.

9. http://www.irt.org/articles/js177/index.htm - ‘‘Apache at your Web Service’’, an IRT document on Apache.

10. http://www.devshed.com/Server_Side/PHP/SoothinglySeamless - Devshed’s Apache+PHP+SSL+MySQL installation tutorial.

Contributors:

Santha Sumanasekara ([email protected]) Michael Harris ([email protected])
