Transcript
Page 1: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Server Technology

ISM 3600 Contemporary Issues in Information Technology

Page 2: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Server Technology

• 3 weeks X 2 hours

• One (individual) assignment

• Yeager and McGrath, Web Server Technology, Morgan Kaufmann, 1996

Page 3: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Overview

• Web server basics

• The Hypertext Transfer Protocol (HTTP)

• Scripts and forms

• Performance issues

• The emphasis is on the general workings rather than specific products.

Page 4: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Popular Web Servers

Page 5: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

What is a Web server?

Web server = Platform + Software + Information

• Platform: a computer connected to the Internet

• Software: the Web server program

• Information: Web pages, files, audio, video, etc.

Page 6: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

What does a Web server do?

• Receive a request

• Decipher the request

• Find the requested object (file)

• Deliver the object

Page 7: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Receive the request

• The Web server program “listens” on a designated port (e.g. 80).

• It is the operating system that hides all the complexities of the underlying network connections and gives the Web server a simple way to communicate with the clients.

Page 8: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Find the requested object (file)

• An object is requested by name; the name gives the location of the object within the file system of the Web server.

• The Web server totally relies on the operating system to retrieve the requested file.

• A requested object does not necessarily exist.
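The lookup step can be sketched as follows. This is a minimal illustration, not how any particular server does it: the function name `locate` and the `document_root` parameter are hypothetical, and the check that rejects names escaping the document tree is an added safety assumption that the slide does not discuss.

```python
from pathlib import Path

def locate(document_root, requested_name):
    """Map a requested name to a file under document_root, or None if absent."""
    root = Path(document_root).resolve()
    candidate = (root / requested_name.lstrip("/")).resolve()
    # Assumption: refuse names that escape the document tree ("/../x").
    if candidate != root and root not in candidate.parents:
        return None
    # The operating system decides whether the file exists.
    return candidate if candidate.is_file() else None
```

A result of None corresponds to the case where the requested object does not exist.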

Page 9: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Hypertext Transfer Protocol (HTTP)

• A set of rules that define how Web servers and browsers communicate with each other over a TCP/IP connection.

• The httpd program

Page 10: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

What does a Web server not know?

• Hypertext links between documents.

• Inline images - browsers recognize links within a document and automatically initiate requests for them.

• Which links may point to a document.

• Whether the MIME type assigned to a document is correct.

• Other Web servers.

Page 11: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Multipurpose Internet Mail Extensions (MIME)

A set of globally recognized data types

Page 12: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Document Tree

Document tree = Web documents + Tree organization

Page 13: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Documents

• HTML documents

• ASCII text

• Preformatted documents (e.g. PostScript)

• Images

• Sound recordings

• Movies

• Java applets

• ...

Page 14: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Serving different kinds of Web documents

• Server tells the client what kind of document is coming before sending the document.

• The Content-type header

• Document files have extensions to indicate the kinds of information content.

• Server only knows a document as a sequence of bytes (except for scripts).

Page 15: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

File extensions vs. Document types

Extension      Document type
.html, .htm    HTML document
.txt           ASCII
.ps            PostScript
.gif           GIF image
.jpeg          JPEG image
.mpeg          MPEG video
.java          Java applet
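The table above can be turned into a lookup that chooses the Content-type header from a file's extension. This is only a sketch: the MIME type strings are the standard registered names rather than anything given on the slide, `content_type_for` is a hypothetical helper, and `.java` is left out because the slide does not say which type it maps to.

```python
import os

# Extensions from the slide, paired with standard MIME type names.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".mpeg": "video/mpeg",
}

def content_type_for(filename, default="application/octet-stream"):
    """Pick a Content-type from the file extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, default)
```

The server never inspects the bytes here, which is why a wrongly named file is served with the wrong type.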

Page 16: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Accept and Content-encoding headers

• A client can optionally send a list of acceptable formats to the server, which will return None Acceptable if the type of the document to be served is not in the list.

• Server can also specify how a document is compressed using the Content-encoding header.
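A rough sketch of the Accept check, assuming the server has already collected the client's Accept values into a list. The function name `is_acceptable` is hypothetical, and treating an empty list as "anything goes" and honouring wildcards such as `image/*` are assumptions about typical behaviour.

```python
def is_acceptable(content_type, accepted):
    """Return True if content_type matches any entry the client listed."""
    if not accepted:                      # no Accept headers were sent
        return True
    main_type = content_type.split("/")[0]
    for entry in accepted:
        entry = entry.split(";")[0].strip()   # drop parameters such as q=0.5
        if entry in ("*/*", content_type, main_type + "/*"):
            return True
    return False
```

When the function returns False, the server would answer with the None Acceptable status instead of the document.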

Page 17: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Serving HTML documents

• In general, an HTML document contains:

– text to be displayed

– anchors

– links to images and other objects

• It is the browser which recognizes the text, anchors, and links inside an HTML document and takes appropriate actions.

• For each anchor or link, the browser issues a separate request.

Page 18: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Scripts

• Sometimes a browser may request a document which is really a program, or script.

• A script is any program that is executed by the Web server.

• In general, a script translates the input from the client, calls other programs, and translates the output(s) for return.

Page 19: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Tree Organization

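How HTML documents link to each other

[Figure: link structure of the pages. The Lingnan College home page links to Library, Business, ... The Library page links to Catalogue, CD-ROM, ... and Home. The Business page links to Accounting, Computer, ... and Home. The Accounting and Computer pages each link Back.]

How HTML documents are physically organized in the file system(s)

[Figure: the same pages stored as files on the hosts cptra.ln.edu.hk, lib.lnc.hk and www.ln.edu.hk. Files shown: welcome.htm (Lingnan College), two further welcome.htm files (Library, Computer), dept/business.htm (Business) and dept/account.htm (Accounting).]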

Page 20: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Different Tree Organizations

• One server, one tree

• Multiple servers, one tree

• Multiple servers, multiple replicated trees

Page 21: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Reasons for different tree organizations

• Several working groups

• Too many documents

• Load-balancing (for replicated trees)
– mirror sites

Page 22: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Hypertext Transfer Protocol (HTTP)

• Defines a simple request-response conversation, in particular
– how to phrase a request
– how to phrase a response

• Does not define
– how the network connection is made or managed
– how information is actually transmitted

Page 23: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Request

• An HTTP request consists of
– The method (GET, HEAD, POST, etc.)
– Universal Resource Identifier (URI)
– The protocol version
– Other information (e.g. Accept)

Page 24: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

HTTP Methods

Method    Meaning
GET       Return the object.
HEAD      Return only info. about the object.
POST      Send info. to be stored on the server.
PUT       Send a new copy of an existing object.
DELETE    Delete the object.
Others

Page 25: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

HTTP Request: Example

GET /Stuff/Funny/silly.html HTTP/1.0

User-agent: NCSA Mosaic for the X Window System/2.5

Accept: text/plain

Accept: text/html

Accept: application/postscript

Accept: image/gif
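How a server might take the example request apart can be sketched like this. `parse_request` is a hypothetical helper; it assumes the request head is ASCII text with CRLF line endings and keeps headers as a list of pairs because Accept appears several times.

```python
def parse_request(text):
    """Split a request into method, URI, version and a list of headers."""
    head = text.split("\r\n\r\n", 1)[0]          # ignore any body
    lines = head.split("\r\n")
    method, uri, version = lines[0].split()
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return method, uri, version, headers

REQUEST = (
    "GET /Stuff/Funny/silly.html HTTP/1.0\r\n"
    "User-agent: NCSA Mosaic for the X Window System/2.5\r\n"
    "Accept: text/plain\r\n"
    "Accept: text/html\r\n"
    "Accept: application/postscript\r\n"
    "Accept: image/gif\r\n"
    "\r\n"
)

method, uri, version, headers = parse_request(REQUEST)
accepted = [value for name, value in headers if name == "accept"]
```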

Page 26: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Response

• An HTTP response consists of:
– A status line (HTTP version, status code, reason)
– Meta-information (e.g. Content-Type)
– The actual information requested

Page 27: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

HTTP Status Codes

Code    Meaning
200     Document follows
301     Moved permanently
302     Moved temporarily
304     Not modified
401     Unauthorized
402     Payment required
403     Forbidden
404     Not Found
500     Server Error

Page 28: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Meta-information

• Server

• Date

• Content-Length

• Content-Type

• Content-Language

• Content-Encoding

• Last-Modified

Page 29: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

HTTP Response: Example

HTTP/1.0 200 Document follows

Server: NCSA/1.4

Date: Fri, 4 Jul 1997 19:17:05 GMT

Content-type: text/html

Content-length: 5280

Last-modified: Wed, 1 Jan 1997 01:00:02 GMT

… the contents of silly.html
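A response like the one above could be assembled as follows. This is a sketch under assumptions: `build_response` and the server name `example-httpd/0.1` are made up for illustration, and only a few of the meta-information headers are produced.

```python
from email.utils import formatdate

def build_response(status, reason, body, content_type,
                   server="example-httpd/0.1"):
    """Return status line, headers, blank line and body as bytes."""
    header_lines = [
        f"HTTP/1.0 {status} {reason}",
        f"Server: {server}",
        f"Date: {formatdate(usegmt=True)}",     # current time in GMT
        f"Content-type: {content_type}",
        f"Content-length: {len(body)}",
    ]
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, `build_response(200, "Document follows", page_bytes, "text/html")` yields the bytes to write to the connection, where `page_bytes` stands for the contents of the requested file.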

Page 30: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

How a Web server works

• Wait for a new request

• Request arrives

• Server parses the request

• Do the method requested
– if successful, send the document
– if it fails, report the status

• Close file, close network connection
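The steps above can be sketched as a very small one-request-at-a-time server. It is an illustration only: it handles GET alone, reports every failure as 404 for brevity, and the names `handle_connection` and `serve_forever` as well as the default port 8080 are assumptions.

```python
import os
import socket

def handle_connection(conn, document_root):
    """Parse one request, send the file or an error, then close."""
    request = b""
    while b"\r\n\r\n" not in request:       # read until the end of the headers
        chunk = conn.recv(1024)
        if not chunk:
            break
        request += chunk
    try:
        first_line = request.split(b"\r\n")[0].decode("ascii")
        method, uri, _version = first_line.split()
        if method != "GET" or ".." in uri:
            raise ValueError("unsupported request")
        with open(os.path.join(document_root, uri.lstrip("/")), "rb") as f:
            body = f.read()
        status = "200 Document follows"
    except (ValueError, OSError):            # simplification: all errors are 404
        body, status = b"Not found\n", "404 Not Found"
    head = f"HTTP/1.0 {status}\r\nContent-length: {len(body)}\r\n\r\n"
    conn.sendall(head.encode("ascii") + body)
    conn.close()                             # one connection per request

def serve_forever(document_root, port=8080):
    """Wait for requests and serve them one at a time."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("", port))
    listener.listen(5)
    while True:
        conn, _addr = listener.accept()      # blocks until a request arrives
        handle_connection(conn, document_root)
```

While `handle_connection` is busy, no other request is accepted.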

Page 31: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Exercise

• Start Netscape Navigator

• Browse the College’s Web pages

• For each page, check Page Info to see what meta-information is shown.

Page 32: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

One request at a time

• Many requests can arrive simultaneously.

• Many requests will be delayed.

• A request could wait for a long time even though it could be served very quickly.

• The queue could be built up very quickly.

• Poor utilization of hardware resources.

Page 33: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Handling more than one request at a time

• Forking method

• Multi-threading

• Helper programs

Page 34: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Forking method

[Figure: httpd is listening. When a request arrives, httpd clones itself: the original httpd goes back to listening while the clone serves the request.]

Page 35: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Multi-threading

[Figure: a single httpd with several threads at once: responding to request 1, retrieving for request 2, parsing request 3, receiving request 4, and listening.]
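The difference between serving one request at a time and multi-threading can be sketched with a thread pool. Everything here is a stand-in: `serve` only sleeps to imitate the work of a request, and the pool size of 4 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def serve(request_id, work_seconds):
    time.sleep(work_seconds)        # stand-in for parsing, disk reads, sending
    return f"response {request_id}"

def serve_one_at_a_time(requests):
    # A quick request queued behind a slow one must wait for it.
    return [serve(rid, secs) for rid, secs in requests]

def serve_with_threads(requests, workers=4):
    # Each request gets its own thread, so quick requests finish early.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(serve, rid, secs) for rid, secs in requests]
        return [future.result() for future in futures]
```

Both functions return the same responses; with threads the total elapsed time is closer to the slowest single request than to the sum of all of them.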

Page 36: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Helper Programs

[Figure: httpd is listening. When a request arrives, httpd hands the request to Helper #1, which processes it, while httpd goes back to listening.]

Page 37: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

More than one Web Service on the same Server

• By default, httpd uses port 80, which requires superuser privilege.

• Other ports, e.g. 8080, 8081, can be operated by users.

• Each httpd on the same platform can have a different tree. They may provide different services.

Page 38: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Virtual Servers

• Multiple Web servers on a single platform, each one with a different IP address and a different domain name.

• Only available where the operating system has virtual host support.

• Low-cost option for a separate domain name.

Page 39: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Problems with HTTP

• Web servers generally deliver information, but have little ability to ensure that it is correct, and that the hyperlinks are correct.

• Each request requires a separate TCP connection.

• HTTP is stateless and does not support “sessions”.

Page 40: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Some solutions

• Web site management tools, e.g. FrontPage, help ensure the correctness and integrity of Web pages.

• Scripts and helper programs can overcome the lack of sessions in HTTP.

• Changes to HTTP
– e.g. Connection: Keep-Alive

Page 41: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Scripts, Gateways, and Forms

Page 42: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Customized and Interactive Web Pages

• Make legacy information systems accessible via the Web, e.g. online library catalogs.

• Obtain user inputs.

• Customized pages, e.g. Your News Page.

Page 43: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Scripts

• A Web script is a program executed by the server upon request.

• The result of executing a script is returned to the client in HTML format.

• Scripts can:
– access online databases
– allow user-server interaction
– construct Web pages dynamically

Page 44: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Scripts (cont.)

• A script may:
– call other programs
– contact other servers.

Page 45: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Gateways

• A Web script that provides access to an online service, such as an existing database.

• Translate an HTTP request into a database/query language.

Page 46: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Server Scripts vs. Client Scripts

• A server script is sometimes called a CGI script and is executed by the server (not the client/browser).

• Many browsers are capable of executing scripts embedded in Web pages, e.g. Web pages with JavaScript.

• Here, we talk about server scripts only.

Page 47: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Scripting Languages

• A script can be written in any programming language, e.g. C, Perl.

• There is a version of JavaScript, called LiveWire, that is available for Netscape servers.

• The latest version of Java supports server scripts called servlets.

Page 48: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The Common Gateway Interface

• A standard which defines how scripts are executed by servers and how data are passed between a script and a server.

• Actually a suite of standards, one for each operating system environment.

Page 49: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

What does httpd do with scripts?

• Determine that a request is for a script.

• Locate the script and check permission.

• Start the script and pass client’s input to the script.

• Read the script’s output and pass it to the client.

• Error handling.

• Close network connection.

Page 50: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

How to distinguish scripts from other Web objects?

• According to specific rules laid down by the system administrator, e.g.
– All scripts are contained in a particular directory such as /script
– All files with extension .cgi
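Both rules can be sketched in a few lines. The directory `/scripts/` and the helper name `is_script` are assumptions chosen for illustration; a real server takes these rules from its configuration.

```python
import posixpath

SCRIPT_DIRECTORY = "/scripts/"   # assumed; chosen by the administrator
SCRIPT_EXTENSION = ".cgi"

def is_script(uri):
    """Decide from the name alone whether a request is for a script."""
    path = uri.split("?", 1)[0]              # ignore any attached form input
    return (path.startswith(SCRIPT_DIRECTORY)
            or posixpath.splitext(path)[1] == SCRIPT_EXTENSION)
```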

Page 51: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Example

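[Figure: httpd running the script "date"]

Request: GET /scripts/date...

1. httpd receives the request

2. httpd locates the script (date)

3. httpd starts the script

4. The script returns its result to httpd

5. httpd sends the response:

HTTP/1.0 Document follows
Server: NCSA/1.4
Date: Thu, 20 Apr 1998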
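A script such as `date` could look roughly like the sketch below. It assumes the usual CGI convention that the script writes a Content-type header, a blank line, and then the document to its standard output; the HTML it produces is invented for illustration.

```python
import sys
import time

def date_script(out=sys.stdout):
    """Write a tiny HTML page with the current time, CGI style."""
    out.write("Content-type: text/html\r\n")
    out.write("\r\n")                        # blank line ends the headers
    now = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime())
    out.write(f"<HTML><BODY>The time is {now}</BODY></HTML>\n")

if __name__ == "__main__":
    date_script()
```

httpd reads this output and passes it on to the client as the response.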

Page 52: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

When Problems Occur

• A script should be robust, fast, and safe.

• It’s useful to include error messages in a script so that it can tell the client when problems occur.

Page 53: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Interpreted vs. Compiled Scripts

• An interpreted script, like a Perl script, is actually executed by an interpreter program which reads and executes the “script” line by line.

• A compiled script, like a C program, runs faster and takes up less memory.

Page 54: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Costs of Using Scripts

• Whenever a script is called, the resources needed to serve the request at least double, and often grow further.

• Script output is normally parsed by httpd before being sent to the client. httpd checks that the proper headers are present and adds any that are missing, which is extra overhead.
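The header check could be sketched as below. This is an assumption-laden illustration: `add_missing_headers` is hypothetical, the status line is fixed at 200, and only Content-type and Content-length are filled in.

```python
def add_missing_headers(script_output, default_type="text/html"):
    """Wrap raw script output in a response, adding absent headers."""
    head, separator, body = script_output.partition(b"\r\n\r\n")
    has_headers = bool(separator) and b":" in head.split(b"\r\n")[0]
    if not has_headers:                      # the script sent a bare document
        head, body = b"", script_output
    lines = [line for line in head.split(b"\r\n") if line]
    names = {line.split(b":", 1)[0].strip().lower() for line in lines}
    if b"content-type" not in names:
        lines.append(b"Content-type: " + default_type.encode("ascii"))
    if b"content-length" not in names:
        lines.append(b"Content-length: " + str(len(body)).encode("ascii"))
    status = b"HTTP/1.0 200 Document follows\r\n"
    return status + b"\r\n".join(lines) + b"\r\n\r\n" + body
```

Every script response passes through this extra parsing step, which is part of the cost of using scripts.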

Page 55: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Scripts and Forms

• An HTML form is just an HTML document with inputs.

• A client requests a form just like any HTML document.

• Once the form is filled in, the client requests a script to process the input, attaching the form data as arguments to the request.

Page 56: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

The HTML Form

• An HTML form should contain:
– The METHOD (GET or POST)
– The ACTION (the script)
– A SUBMIT button
– Input items:

• Input boxes

• Checkboxes

• Radio buttons

• etc.

Page 57: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Example

Form for CSO PH query

This form will send a PH query to the specified ph server.

PH Server: [ns.uiuc.edu]

[ ] Return name?  [ ] Return phone?  [ ] Return email?

At least one of these fields must be specified:

• [ns.uiuc.edu] Name

• [ns.uiuc.edu] Email Address

[Submit Query]

Page 58: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Example

<HTML>
<HEAD><TITLE>Form for CSO PH query</TITLE></HEAD>
<BODY>
<H1>Form for CSO PH query</H1>
This form will send a PH query to the specified ph server.
<BR><HR WIDTH="100%">
<FORM ACTION="http://www.server.org:80/scripts/directory_assistance">
<BR>PH Server
<INPUT TYPE="text" NAME="Jserver" VALUE="ns.uiuc.edu" MAXLENGTH="256">
<BR><INPUT type="checkbox" NAME="doname" VALUE="yes">Return name?
<BR><INPUT type="checkbox" NAME="dophone" VALUE="yes">Return phone?
<BR><INPUT type="checkbox" NAME="doemail" VALUE="yes">Return email?
<H3>At least one of these fields must be specified:</H3>
<UL>
<LI><INPUT TYPE="text" NAME="Qname" VALUE="ns.uiuc.edu" MAXLENGTH="256">Name</LI>
<LI><INPUT TYPE="text" NAME="Qemail" VALUE="ns.uiuc.edu" MAXLENGTH="256">Email Address</LI>
</UL>
<INPUT TYPE="submit">
</FORM>
</BODY>
</HTML>

Page 59: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Form: The GET method

• Input is simply attached to the GET request, preceded by “?”.

• At the server, the input is copied to the environment variable QUERY_STRING before the script is called.

• Script gets the input from QUERY_STRING.

• Some browsers attach input data to the pathname of the script.

Page 60: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Example

GET http://www.server.org:80/scripts/directory_assistance?Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&[email protected] HTTP/1.0
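On the script side, reading the input from QUERY_STRING might look like the sketch below. `form_input` is a hypothetical helper, and the query string used here is only the first part of the example request, since the email value is not legible in this transcript.

```python
import os
from urllib.parse import parse_qs

def form_input(environ=os.environ):
    """Decode the form fields that httpd placed in QUERY_STRING."""
    query = environ.get("QUERY_STRING", "")
    # keep_blank_values keeps fields the user left empty, such as Qname.
    return parse_qs(query, keep_blank_values=True)

fields = form_input(
    {"QUERY_STRING": "Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname="}
)
```

Each field name maps to a list of values because a name may appear more than once in a form.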

Page 61: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Form: The POST method

• Input is passed to the server as an HTTP object, i.e. in the body of the request rather than attached to the URL.
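For comparison with GET, a script handling POST could read its input as sketched here. This assumes the common CGI convention in which the request body arrives on the script's standard input and its size is given in the CONTENT_LENGTH environment variable; `read_post_input` is a hypothetical name.

```python
from urllib.parse import parse_qs

def read_post_input(environ, stdin):
    """Read CONTENT_LENGTH bytes of form data from stdin and decode them."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii")
    return parse_qs(body, keep_blank_values=True)
```

In a real script, `environ` would be `os.environ` and `stdin` would be `sys.stdin.buffer`.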

Page 62: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Converting Input and Output

• The script is responsible for parsing the user input and returning the result in HTML or some suitable format.

• The form and the script must use the same set of field names.

Page 63: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Costs of using forms and CGI

• Processes
– httpd, script, other programs

• Message passing
– one request for the form, one request for the script, one response

• Data conversion (parsing)

• Different platforms execute scripts in different ways.

Page 64: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Performance Issues

Page 65: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Web Server Performance

• The Web is built upon many other non-Web components; the performance of a Web server therefore heavily depends on these components.

• Performance evaluation is difficult.

• Web servers can get really busy as the number of clients is potentially huge.

Page 66: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Performance Measurement: What to measure?

• Connections per second

• Bytes per second (throughput)

• Round-trip time
– The time from when the client begins to set up the connection to the Web server until the last byte of the response is received by the client.

• Performance of non-Web components, e.g. network, disks

Page 67: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

How to measure?

• Field tests

• Laboratory experiments (Benchmarks)

• Instrumentation

Page 68: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Field tests

• Extract connections/sec, bytes/sec from server log.

• Round-trip time depends on where the clients are on the network and many other factors.

• Statistics on disks, CPU, and memory usage can be useful.
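Extracting the two rates from a log can be sketched as follows. The log is assumed to have been reduced already to one `(timestamp, bytes_sent)` pair per connection; real log formats differ between servers, and `summarize` is a hypothetical helper.

```python
def summarize(log_entries):
    """Return (connections per second, bytes per second) for the log."""
    if not log_entries:
        return 0.0, 0.0
    times = [timestamp for timestamp, _ in log_entries]
    elapsed = (max(times) - min(times)) or 1     # avoid dividing by zero
    total_bytes = sum(size for _, size in log_entries)
    return len(log_entries) / elapsed, total_bytes / elapsed

entries = [(0, 5280), (1, 1000), (2, 2000), (10, 1720)]
connections_per_second, bytes_per_second = summarize(entries)
```

Averages over the whole log hide bursts, so a real analysis would usually compute the same figures over shorter intervals as well.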

Page 69: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Laboratory experiments

• Realistic setup is needed.

• User requests can be simulated with Web pingers, which also keep logs.

• RTT can be measured.

• Synthetic workloads, called benchmarks, can be created.

• Stress testing.

Page 70: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Instrumentation

• Insert code into Web servers to keep more detailed logs.

• Inserted code could drain resources and affect server performance.

• Risk: too much (junk) data.

Page 71: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Performance of Web Servers

• httpd itself is simple enough but httpd often spawns new processes in order to serve requests (e.g. CGI).

• Forking httpd can be expensive.

• CGI scripts could be a source of performance problems.

• Perl scripts are less efficient than compiled C programs.

Page 72: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

(continued)

• Data compression and encryption demand a lot of resources.

• Disks can easily be the slowest component of a Web server; caching documents in memory could help.
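Caching documents in memory can be sketched with a small least-recently-used cache. The cache size of 128 documents and the name `read_document` are assumptions, and this simple version never notices when a file changes on disk.

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def read_document(path):
    """Read a document from disk once; later calls are served from memory."""
    with open(path, "rb") as f:
        return f.read()
```

`read_document.cache_info()` reports hits and misses, which gives a rough idea of how many disk reads the cache has saved.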

Page 73: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Assignment (15%)

• Groups of one or two students

• Read the article on Web servers in the May 8 issue of PC Magazine:
– http://www.zdnet.com/products/content/pcmg/1709/302244.html

• Choose one of the 9 servers reviewed in the article and follow the hyperlinks provided to find out more information about the chosen web server.

Page 74: Web Server Technology ISM 3600 Contemporary Issues in Information Technology

Assessment

• Each group will present its findings to the lecturer in a 20-minute session, followed by 10 minutes of questions and answers.

• Criteria
– Evidence of information gathering.
– Appreciation of the latest Web server technology and its trends.
– Understanding of the technical details.
– Clarity and structure of the presentation.
– Ability to answer questions.

