Upload
karthikonthenet
View
85
Download
2
Embed Size (px)
Citation preview
1
UNIT-I IT2353/Web Technology IT/RMKEC
IT2353 Web Technology UNIT-I
1.0 Web Essentials: Clients, Servers, and Communication.
1.1 - The Internet:
The Internet is a system that lets computers all over the world to communicate
with each other. The Internet is a network of networks that connects all over the
world.
a) Internet is a Network: The computers on the Internet can talk to one another
because they are networked; they are connected in some way so that they can
exchange information with one another electronically on the Internet, the
connections take many different forms:
o They can be connected directly using wire or fibre optic cables.
o Some are connected through local and long distance telephone
lines.
o Some even use wireless satellite communication.
b) Internet is a Loose Organisation: No single entity or organization controls it. It‘s
like a neighbourhood where people communicate because they are able to, but
they have not established a formal organization with rights, rules and leaders.
c) Internet is a Relaxing place: Internet resources enable users to exchange
information in an open, public way. These resources provide a forum where users
can write and post messages for other users and where they can read messages
posted by others.
d) Internet is a Business Tool: Increasingly global business climate and the great
extent to which big companies rely on computers, the Internet looks like a great
vehicle for national and international business communication. Now a days,
several organizations are running their business through Internet for various
purposes like document exchange, product specifications, payment etc.
e) Internet is a Mailbox: The most-used Internet facility is electronic mail also
known as e-mail. Everyone who lives in the Internet world has a unique Internet
name, email id. To send a message to an Internet user, anywhere in the world, all
the sender has to know is the address of the recipient. The sender types up a
2
UNIT-I IT2353/Web Technology IT/RMKEC
message in his/her favorite word processing or e-mail software programs hops
onto the Internet, types in the address and sends the message on its way. The
message should arrive at the recipient‘s computers almost instantly. After all its
travelling through wire at nearly the speed of the light.
f) Internet is a Information Pool: Whatever the information the users in need, it can
be easily located and utilized from the Internet. Today Internet is the best utility
that is available to all the users at cheaper prices to know about things. It is an
information pool that satisfies varied level of users from kid to old age, from
student to business professions, researchers and Government organization.
1.2 BASIC INTERNET PROTOCOLS:
A computer communication protocol is a detailed specification of how communication
between two computers will be carried out in order to serve some purpose.
The Internet protocols specifies both the high level behavior of software implementing
the protocol and the low level details of the specific fields of information in the
communication message.
1.2.1 – TCP/IP:
TCP/IP are actually two different protocols.
But they are treated as a single protocol is that bulk of the services associate with the
Internet-email, web browsing, downloads, accessing remote databases are built on top of
both the TCP and IP protocols.
Internet Protocol(IP) is fundamental to the definition of the Internet.
A key element of IP is the IP address, which is simply a 32 bit number.
Each device on the Internet has one or more IP addresses associated with it.
IP addresses are written as a sequence of four decimal numbers separated by periods
(called ―dots‖) as in 192.0.34.166.
Each decimal number represents one byte of the IP address.
Function of IP software is to transfer data from one computer to another computer.
3
UNIT-I IT2353/Web Technology IT/RMKEC
When an application on the source computer wants to send information to a destination
the application on the source computer calls IP software and provides it with data to be
transferred along with the an IP address for each of the source and destination computers.
The IP software running on the source creates a packet, which is a sequence of bits
representing the data to be transferred along with the source and destination IP addresses
and Header Information such as length of the data.
It the destination computer is on the same local network: Then the IP software will send
the packet to the destination directly via this network.
The another network: Then the IP software will send the packet to the gate way which is
a device that is connected to the source‘s network as well as to atleast one other network.
The gateway will select a computer on one of the other networks to which it is attached
and send the packet on to that computer.
The packet will perhaps get through more hops until reached the destination.
The IP software on the destination will receive the packet and pass its data upto an
application that is waiting for the data.
The packet from SURANET source is transmitted to the gateway which is connected to
the SURANET.
As the NSFNET gateway is not connected to the SURANET, so the packet is routed to
another gateway say Telnet gateway.
From there, it is routed to the NSFNET. So, the packet would need to go through at
least one other gateway on the NSFNET.
The sequence of computers that a packet travels through from source to destination is
known as route.
The sequence of computer that are used for routing the information(packet) is chosen by
Border Gateway Protocol.
IP software also adds some error detection inform action to each packet it creates.
So if a packet is corrupted, that can usually be detected by the recipient and discard the
corrupted packets.
So, IP-based communication is unreliable. It alone is not a particularly good form of
communication for many internet applications.
4
UNIT-I IT2353/Web Technology IT/RMKEC
TCP:
It is a higher level protocol that extends IP to provide additional functionality on the
concept of a connection.
Once a connection is established, TCP provides reliable data transmission by
demanding an acknowledgement for each packet it sends via IP.
The sender sets the timer, after sending each packet, the TCP software on the receiving
side sends a packet containing an acknowledgement for every TCP-based packet it
receives.
If the sender does not receives an ack. packet before its timer expires, then it resends the
packet and restarts the timer.
Another feature that TCP adds to IP is the concept of a port.
The port concept allows TCP to be used to communicated with many different
applications on a machine.
TCP[25] – represents a TCP header containing 25 as the port number
Mail server ask TCP to listen for requests to port 25.
This request is received by the machine(mail client) and that IP message contains a TCP
message with port 25 indicated in its header, then the data contained within the TCP
message will be returned to the mail server application.
1.2.2 – UDP, DNS and DOMAIN NAMES:
UDP is an alternative protocol to TCP that also builds on IP.
5
UNIT-I IT2353/Web Technology IT/RMKEC
The main feature that UDP adds to IP is the port concept
UDP does not provide the two-way connection or guaranteed delivery of TCP.
UDP is speed when compared to TCP for simple tasks.
One Internet application that is often run using USP rather than TCP is the Domain
Name Service (DNS)
Every device on the internet has an IP address such as 192.0.34.166.
But humans generally find it easier to refer to machines by names, such as
www.example.org
DNS provides a mechanism for mapping back and forth between IP addresses and host
names.
Basically there are number of servers(DNS) on the internet, each listening through UDP
software to a port.
Conversion takes place as follows:
Computer on the Internet needs DNS service.
It uses UDP software running on its system to send a UDP message to one of these DNS
servers, requesting the IP address.
Then server will send back a UDP message containing the IP address.
Internet host names consist of a sequence of labels separated by dots.
Ex: www.google.com
www – world wide web service that is used for information retrieval purpose.
.com – It is commercial orgn. It is top-level domain. It can be 2 or 3 characters.
Domain Description
.com Commercial organization
.net ISPs &Network related Organisation
.org Non-commercial organization
.gov Government agencies
.mil Military
.edu Educational Institutions
.web Web based company
.tv Television channel
.info Information site
6
UNIT-I IT2353/Web Technology IT/RMKEC
Normally two letters are used to segregate between the countries.
.au Australia
.us U.S.
.ca Canada
.in India
.uk United kingdom
1.2.3 – HIGHER LEVEL PROTOCOLS
Higher level protocols are used to communicate once a TCP connection has been
established.
SMTP & FTP are widely used higher-level protocols that are used to communicate over
TCP connections.
SMTP supports transfer of e-mail between different email servers.
FTP is used for transferring files between machines.
Another protocol Telnet is used to execute commands typed into one computer on a
remote computer.
The primary TCP-based protocol used for communication between web servers and
browsers is called the HTTP (Hyper Text Transfer Protocol)
1.3 – THE WORLD WIDE WEB
Public sharing of information has been a part of the Internet.
The Usenet newsgroup service began in 1979 and provided a means of ―posting‖
information that would be read by users on other systems with the appropriate software.
Large files were often shared by running an FTP server application.
The first internet chat software in widespread use is IRC, provided both public and
private chat facilities.
The amount of information publicly available on the internet grew, the need to locate
information also grew.
Various technologies for supporting information management and search on the internet
were developed.
Some of the more popular information management technologies
In 1990 – Gopher Information Servers – provided a simple hierarchical view of
documents.
7
UNIT-I IT2353/Web Technology IT/RMKEC
Wide Area Information system – for indexing and retrieving information.
Archie tool – searching online information archives accessible via FTP.
All the above technologies two types of software.
(i) Server (ii) Client
Server: An Internet-connected computer that wishes to provide information to other
internet systems must run server software.
Client: A system that wishes to access the information provided by servers must run
client software.
The communication between the client and the server over the internet by following a
communication protocol built on top of TCP/IP.
The protocol used by the web is the HyperText Transport Protocol(HTTP)
This is a generic protocol that supports a client requesting a document from a server and
the server returning the requested document.
Most web pages are written using HTML(HyperText Markup Language)
HTML pages can contain the familiar web links to other documents on the web.
HTML also provides extensive page layout facilities, including support for online
graphics has added significantly to the commercial appeal of the web.
WWW can be informally defined as the collection of machines on the internet that
provide information via HTTP, and particularly those that provide HTML document.
1.3.1 – HYPEXTEXT TRANSPORT PROTOCOL:
HTTP is a form of communication protocol, in particular a detailed specification of how
web clients and servers should communicate.
The basic structure of HTTP communication follows what is known as a request-response
model.
An HTTP interaction is initiated by a client sending a request message to the server, the
server generates a response message.
HTTP does not use network protocol to send these messages, but the request and
response are both sent within a TCP-style connection between the client and the server.
Ex: When we type the address in the address bar and pressed enter key, after that
i) Browser created a message conforming to the HTTP protocol
8
UNIT-I IT2353/Web Technology IT/RMKEC
ii) Used DNS to obtain an IP address for the Domain Name Address.
iii) Created a TCP connection with the machine at the IP address obtained.
iv) Sent the HTTP message over this TCP connection
v) Receive back a message containing the information that is displayed in the client
area of the browser.
HTTP request and response messages often consist of plain text in a fairly readable form.
Using Telnet enter the request at a command prompt.
Telnet is in port 80 given by IANA standard port for HTTP web servers.
$ telnet www.example.org 80 } Connect Trying 192.0.34.166... }
Connected to www.example.com (192.0.34.166).
Escape character is ‘^]‘.
GET / HTTP/1.1 } HTTP REQUEST
Host: www.example.org }
HTTP/1.1 200 OK }HTTP RESPONSE
Date:Thu,09Oct200320:30:49GMT }
…
This is the HTTP‘s basic structure.
1.4 – HTTP REQUEST MESSAGE
1.4.1– OVERALL STRUCTURE:
Every HTTP request message has the same basic structure.
Start line Header field(s) (one or more) Blank Line Message body (optional)
In the Previous example : GET/HTTP/1.1 is the start line
Every start line consists of three parts.
i) Request Method
ii) Request-URI portion of web address
iii) HTTP version
1.4.2 – HTTP VERSION:
The initial version of HTTP is HTTP/0.9
9
UNIT-I IT2353/Web Technology IT/RMKEC
The first internet RFC regarding HTTP described HTTP/1.0
In 1997, HTTP/1.1 was formally defined and is currently an Internet Draft Standard.
Essentially all operational browsers and servers support HTTP/1.1
1.4.3 – REQUEST-URI:
The second part of the start line is known as the Request-URI
The concatenation of the string http://, the value of the Host header
field(www.example.org), the Request-URI (/) forms a string called as a Uniform
Resource Identifier (URI) http://www.example.org/
URI is an identifier that is intended to be associated with a particular resource on the
www.
Every URI consists of two parts:
i) Scheme (which appears before the colon(:))
ii) Another part that depends on the scheme
Web addresses for the most part use the http scheme
In this scheme URI represents the location of a resource on the web.
A URI of this type is said to be a Uniform Resource Locator.
Uniform Resource Name is designed to be unique name for a resource rather than
specifying a location at which to find the resource.
Ex: An edition of war and peace has an ISBN of 0-1404-4417-3 associated with it.
This is the only book world wide with this number. So this book has an associated URN,
which can be written as follows: urn:ISBN:0-1404-4417-3
URI for a URN always consist of three colon-separated parts.
First part is the scheme name which is always urn for a URN-type URI
Second part is the name space identifier(ISBN)
Third part is the name space-specific string.
1.4.4-REQUEST METHOD
The primarily HTTP method is GET
10
UNIT-I IT2353/Web Technology IT/RMKEC
This method is used when URL typed into the location bar of the browser. This method is
used by default when link is clicked in a document displayed in browser and when the
browser downloads images for display within an HTML document.
The POST method
Typically used to send information collected from a form displayed within a browser,
such as an order-entry form, back to the web server.
Other request methods are
HEAD Return the same HTTP header fields that would be returned if a GET
method were used, but not return the message body
OPTIONS Return a list of HTTP methods that may be used to access the resource
specified by the Request-URI
PUT Store the body of this message on the server for future use
TRACE Return a copy of the complete HTTP request message. Used primarily
for test purposes.
1.4.5- HEADER FIELD AND MIME TYPES
HTTP/1.1 defines a number of header fields, which are commonly used by modern
browsers.
Each header field begins with a field name, such as Host followed by a colon and then a
field value.
Ex: HTTP request sent by a browser consists of a start line, 10 header fields, and a short
message body.
POST/Servlet/Echo HttpRequest HTTP/1.1 ---- Start line
Host: www.example.org:56789 }
User-agent: Mozilla/5.0 }
Accept: text/xml }
Accept-language: en-us,en; }
Connection: keep-alive }
Keep-alive: 300 }
Content-type:application/x-www-form-urlencoded }
11
UNIT-I IT2353/Web Technology IT/RMKEC
Content-length: 13 } Header fields
Doit = click+me - short message body
Some of the features of header fields
Header names are not case sensitive
Header field value may wrap onto several lines by preceding each continuation line with
one or more spaces or tabs.
Use of MIME types in several header field values.
MIME – Multipurpose Internet Mail Extensions
It refers to a standard that can be used to pass a variety of types of
information including graphics and applications through e-mail as well as
through other Internet message protocols.
The content of the MIME message is specified using a two part
Case insensitive string known as content-type of the message
Header field value use so-called quality values to indicate preferences.
Ex: q=num->decimal number between 0 & 1, with higher number
representing greater preference.
1.5-HTTP RESPONSE MESSAGE
HTTP response message consists of a status line, header fields and the body of the
response in the following format:
Status line
Header Field(s)
Blank line
Message body(optional)
1.5.1 – RESPONSE STATUS LINE:
Ex: HTTP/1.1 200 OK
Like request, response message the status line consists of three fields.
i) HTTP version used by server software when formatting the response.
ii) Numeric code indicating the type of response
iii) Text string that presents the information represented by the numeric status code in
human readable form.
In this example, the status code is 200 and the reason phrase is OK
12
UNIT-I IT2353/Web Technology IT/RMKEC
This status code indicated that no error were detected by the server.
The body of a response having this status code should contain the resource requested by
the client.
All status codes are three digit decimal numbers.
First digit represents the general class of status code.
Digit Class Use
1 Informational Provides information to client before request processing
has been completed.
2 Success Request has been successfully processed.
3 Redirection Client needs to use a different resource to fulfill request.
4 Client error Client‘s request is not valid.
5 Server Error An error occurred during server processing of a valid
client request.
Status codes:
Code Reason Phrase
200 OK
301 Moved permanently
307 Temporary redirect
401 Unauthorised
403 Forbidden
404 Not found
500 Internal Server Error
The HTTP standard recommends reason phrase for all status code.
1.5.2 – RESPONSE HEADER FIELDS:
Some of the header fields used in HTTP request messages, including connection, content
type and content length are also valid in response messages.
13
UNIT-I IT2353/Web Technology IT/RMKEC
The content type of a response can be any one of the MIME type values specified by the
Accept header field of the corresponding request.
Some of the common response header fields are:
Field Name Use
Date Time at which response was generated by
the server
Server Information identifying the server software
generating this response
Last-modified Time at which the resource returned by this
request was last modified.
Expires Time after which the client should check
with the server before retrieving the
returned resource from the client‘s cache.
Location Used in responses with redirect status code
to specify new URI for the requested
resource.
1.5.3 – CACHE CONTROL
Most web browsers automatically cache on the client machine many of the resources that
they request from servers via HTTP.
For example, if an image such as button icon is included in a web page, a copy of the
image obtained from the server will typically be cached in the client‘s file system.
If another page at the same site uses the same image the image can be retrieved from the
client file system rather than sending again the request to the server.
HTTP caching reduce the network communication, and reduce load on the web server.
Key drawback in using cache:
Information in a cache can become invalid, if the image is modified on the server, but a
client accesses its cached copy of the older version of the image.
The above drawback can be avoided in several ways:
14
UNIT-I IT2353/Web Technology IT/RMKEC
i) Guaranteeing that a cached copy of a resource is valid is for the client to ask
server whether or not the client‘s copy is valid.
This is done with a relatively little communication by sending an HTTP
request for the resource using the HEAD method, which returns only the
status line and header portion of the response.
With the last-modified time field, can know about the validity.
ii) With ETag
The client can then simply compare the Etag with the stored ETag.
If match, then the cached copy is valid otherwise it is not.
1.5.4 – CHARACTER SETS:
Characters are represented by integer value within a computer.
A character set defines the mapping between these integers, or code points, and
characters.
For example US-ASCII is the character set used to represent the characters used in HTTP
header field names and also used in key portions of many other Internet protocols.
For web pages, which are meant to be viewed throughout the world, a single worldwide
character set be used.
The Unicode standard is an attempt to provide a single character set that encompass every
human language representation as well as all other commonly used symbols.
In addition to the character set, browsers also accept certain character encodings.
A character en-coding is a bit string that must be decoded into a code-point integer that is
then mapped to a character according to the definition provided by some character set.
A character encoding often represents characters using variable-length bit strings.
UTF-8 use variable numbers of 8 bit values.
UTF-16 use variable numbers of 16 bit values.
The Accept-charset header field is sued by a client to tell a server the character sets and
character encodings that it will accept as well as its preferred character sets or encodings.
Ex: accept-charset: ISO-8859-1, utf-8, q=0.7, *;q=0.7
Said that the client would prefer to receive documents using the ISO-8859-1 characterset
or the UTF-8.
15
UNIT-I IT2353/Web Technology IT/RMKEC
A webserver can inform a client about the character set/encoding used in a returned
document by adding a char set parameter to the value of the content-type header field.
Ex: content-type: text/html; charset=UTF-8.
Content type indicates the body of the message is an HTML document.
Charset indicated that the body is written using the UTF-8 character encoding.
1.6 – WEB CLIENTS
A web client is software that accesses a web server by sending an HTTP request message
and processing the resulting HTTP response.
Web browsers running on desk top or lap top computers are the most common form of
web client software.
Other client software are text-only browsers, browsers running on cell phones and
browsers that speak a page rather than displaying the page.
In general, any web client that is designed to directly support user access to web servers
is known as a user agent.
1.6.1 – BASIC BROWSER FUNCTIONS:
The window of a typical modern browser is split into several rectangular regions known
as bars.
The primary region is the client area, which displays a document.
The title bar displays a title assigned by the document author to the document currently
displayed within the client area.
The title bar also displays the browser name as well as standard window management
controls.
The menu bar contains a set of drop down menus much like most other applications that
incorporate a GUI.
The browser‘s navigation toolbar contains standard push-button controls that allow the
user to return to a previously viewed web page(Back), reverse effect of pressing
Back(Forward), ask the server for an updated version of the page currently
16
UNIT-I IT2353/Web Technology IT/RMKEC
viewed(Reload), halt page downloading currently in progress(Stop), and print the client
area of the window(print).
The Navigation toolbar also contains a text box, known as Location bar, where a user can
enter a URL and the Enter key is pressed in order to request the browser to display the
document located at the specified URL.
If search key is pressed instead of Enter key, it will go to search engine.
Clicking down-arrow at the right side of the location bar produce a drop down menu of
recently used URLs that can be visited again with a single click.
Finally the status bar displays messages and icons related to the status of the browser.
The messages displayed in the left portion of the status bar are normally information
about the communication between client and server.
The right portion of the status bar consists of a icon indicating browser online and that the
browser is communicating with the server over insecure communication channel.
A primary task of any browser is to make HTTP requests on behalf of the browser user.
The browser must perform a number of tasks.
i) Reformat the URL entered as a valid HTTP request message.
ii) If the server is specified using a host name, use DNS to convert to appropriate IP
address.
iii) Establish a TCP connection using the IP address of the specified web server.
iv) Send the HTTP request over the TCP connection and wait for the server‘s
response.
v) Display the document contained in the response.
1.6.2 – URLS
An HTTP scheme URL consists of a number of pieces.
Ex: http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5
www.example.org:56789 is the authority of the URL consists of either a fully qualified
domain name or an IP address of an Internet Web sever, optionally followed by a colon
(:) and a port number.
IF the port number is omitted, then a TCP connection to port 80 is implied.
17
UNIT-I IT2353/Web Technology IT/RMKEC
www.example.org is the authority fully qualified domain name. 56789 is the port
number.
Following the authority through the question mark(?) is called the path of the URL.
The leading slash is part of the path, but the question mark is not. In the example
/a/b/c.txt is the path.
The information between but not including ? and the number sign (#) is the query portion
of the URL. In the xample t=win & s=chess is the query portion. Query portion was
intended to pass search terms to a web server.
The final optional part of the URL, is the fragment of the URL. The text after the number
sign(#) is the fragment. Para5 is the fragment. It is used by the browsers to scroll the
HTML documents.
1.6.3 – USER CONTROLLABLE FEATURES.
Graphical browsers provide many user-controllable features.
i) Save: Most documents can be saved by the user to the client machine‘s file
system. If the document is an HTML page that contains such as images, then the
browser will attempt to save all these documents locally so that the entire page
can be displayed from the local file system.
ii) Find in Page: Standard documents can be searched with a function that is similar
to that provided by most word processors.
iii) Automatic form filling: The browser can ―remember‖ information entered on
certain forms such as billing address, phone numbers etc.
iv) Preferences: Users can customize browser functionality in a wide variety of ways.
Some preference settings related to the HTTP are:
Accept Language
Default character set/encoding: The character set/encoding to be assumed
for documents.
Cache properties: The amount of local storage allocated to the cache and
the conditions controlling when a cached file will be validated are set.
HTTP settings: The version of HTTP used and whether or not the client
will keep connections alive is set.
18
UNIT-I IT2353/Web Technology IT/RMKEC
v) Style Definition: The user can define certain aspects affecting how the browser
renders HTML pages, such as font sizes, back ground and fore ground colors etc.
vi) Documents meta-information: Interested users can view information about the
displayed document, such as the document‘s MIME type, character encoding, size
etc.
vii) Themes: The navigation bar can be modified by applying a certain name.
viii) History: The browser will automatically maintain a list of all pages visited within
the last several days.
ix) Bookmarks: Users can explicitly bookmark a web page, that is, save the URL for
that page for an indefinite length of time.
1.6.4 – ADDITIONAL FUNCTIONALITY:
Browsers perform a number of other functions including:
i) Automatic URL completion: If the user has entered a URL in a location bar and
begins to type it again the URL will be completed automatically by the browser.
ii) Script Execution: Browsers can run programs(scripts). For example validating
data entered on a form before sending it to a web server.
iii) Event Handling: When a user performs an action, such as clicking on a link or
button in a web page, the browser treat this as the occurrence of a event. Variety
of actions in response to an event takes place.
iv) Management of form GUI: If a web page contains a form with fill-in fields, the
browser must allow the user to perform standard-text-editing functions within
these fields.
v) Secure Communication: When a user sends sensitive information such as a credit
card number, to a server, the browser can encode this information in a way that
prevents any machines from obtaining the information.
vi) Plug-in Execution: Most browsers support some form of plug-in protocol that
allows the browser‘s capabilities by other software.
19
UNIT-I IT2353/Web Technology IT/RMKEC
1.7 – WEB SERVERS
Basic functionality found in most web servers as well as some specific instructions for
accessing and modifying the parameters of the web servers Tomcat 5.0
1.7.1- SERVER FEATURES:
The primary features of web servers is to accept HTTP requests from web clients and
return an appropriate resources in the HTTP response.
The basic functionality involves a number of steps.
i) The server calls on TCP software and waits for connection requests to one or
more ports.
ii) When a connection request is received, the server dedicates a ―subtask‖ to
handling this connection.
iii) The subtask establishes the TCP connection and receives an HTTP request.
iv) The subtask examines the Host header field of the request to determine which
―virtual host‖ should receive this request and invokes software for this host.
v) The virtual host software maps the Request-URI field of the HTTP request start
line to a resource on the server.
vi) If the resource is a file, the host software determines the MIME type of the file
and creates an HTTP response that contains the file in the body of the response
message.
vii) If a resource is a program, the host software runs the program, providing it with
information from the request and returning the output from the program as the
body of an HTTP response message.
viii) The server normally logs information about the request and response in a normal
text file.
ix) If the TCP connection is kept alive, the server subtask continues to monitor the
connection until a certain length of time has elapsed, the client sends another
request of the client initiates a connection close.
20
UNIT-I IT2353/Web Technology IT/RMKEC
1.7.2- SERVER HISTORY:
NCSA‘s httpd web server is a starting point for server development.
Apache server, free open-source server in April 1995.
Microsoft‘s IIS provides essentially of the features found in Apache.
IIS can run only on windows OS.
Apache can run on windows, linux and macintosh system.
Tomcat is a popular, free and open-source servlet container developed and maintained by
the Apache software foundation, the same organization developed web server also.
Tomcat can also be run as a standalone web server that communicates directly with the
web clients.
1.7.3- SERVER CONFIGURATION AND TUNING
Broadly server configuration can be broken into two areas: external communication and
internal processing.
In Tomcat, this corresponds to two separate Java packages; Coyote provides the
HTTP/1.1 communication. Catalina is the actual servlet container.
Some if the coyote parameters, affecting external communication include the following:
IP addresses and TCP ports that may be used to connect to this server.
Number of subtasks(thread) that will be created when the server is initialized.
This many TCP connections can be established simultaneously with minimal
overhead.
Maximum number of threads that will be allowed to exist simultaneously.
Maximum number of TCP connection requests that will be queued if the server is
already running its maximum number of threads.
Length of time the server will wait after serving an HTTP request.
The setting of these parameters can have a significant influence on the performance of a
server, changing the values of these and similar parameters in order to optimize
performance is often referred to as tuning the server.
As with all optimization problems, there are various trade offs involved in attempting to
tune a server. For example increasing the maximum number of simultaneous threads that
may execute increases memory requirements.
21
UNIT-I IT2353/Web Technology IT/RMKEC
The internal catalina portion of Tomcat also has a number of parameter settings that
affect functionality.
i) Which client machine send HTTP requests to the server.
ii) Which virtual hosts are listening for TCP connections on a given port.
iii) What logging will be performed.
iv) How the path portion of request-URI will be mapped to the server‘s file system or
resources.
v) Whether or not the server‘s resources will be password protected,
vi) Whether or not resources will be cached in the server‘s memory.
1.7.4 – DEFINING VIRTUAL HOSTS:
The Host component is used to define a virtual host.
The virtual host name should normally be a fully qualified domain name that would be
used by visitors to web site.
Host supplied as part of Java Web Services Developer Pack is given unqualified name
Local Host.
This is a special name that DNS system treats as a reference to a special IP address,
127.0.0.1
If an IP message is sent to this address, the IP software causes the message to loop back
to itself for receipt.
The virtual host should only be accessible if the browser runs on the server machine.
The value of the default host name field for this service is Local Host.
This means that if a user browses to this service using a URL with a host name other than
local host, the request will be passed to the local host virtual host.
Additional most component with name www.example.org was added to this service.
Then this new virtual host would handle requests containing a Host header field with
value www.example.org while all requests with any other host value would continue to
be handled by the default local host virtual host.
Web applications is a collection of files and programs that work together to provide a
particular function to web users.
Web site might run two web applications:
22
UNIT-I IT2353/Web Technology IT/RMKEC
Use by administrators of the site that provides maintenance functionality and
Another for use by external clients that provides customer functionality.
In Tomcat, a web application is represented by a context component.
In a every host, there can be number of contexts.
Each host and context is associated with a directory in the server‘s file system.
The directory associated with a host is specified by the value of the application base field.
If this value is relative path name- a path name does not begin with a ‗/‘ in linux or with a
drive specification such as c:/ in windows, then it is taken as relative to the directory in
which JWSDP 1.3 was installed.
Absolute path name starts with ‗/‘
For example for the local host, the application base is webapps. This webapps will be in
the absolute path as /usr/java/jwsdp1.3/webapps
Using a relative path name for the application base value is generally recommended.
1.7.5 – LOGGING
Web server logs record information about server activity.
The primary web server log recording normal activity is an access log, a file that records
information about every HTTP request processed by the server.
A web server may also produce one or more message logs containing a variety of
debugging and other information generated by web applications as well as possible by the
web server itself.
Finally, information written to the standard output and error streams by the web server or
applications may also be logged.
The following information is contained in this log entry.
i) Host name of client machine making the request.
ii) User name used to log in.
iii) Data & time of response, plus the time zone.
iv) Start line of HTTP request
v) HTTP status code of response.
vi) Number of bytes sent in body of response.
23
UNIT-I IT2353/Web Technology IT/RMKEC
An advantage of using this log format is that a variety of log analysers have been
developed that can read logs in this formats and produce reports on various aspects of a
site‘s usage.
Ex: report on number of accesses per day, percentage of requests that received error
status code.
This will be useful ofr server tuning.
Analog is one popular free log analyzer available.
1.7.6 – ACCESS CONTROL
Tomcat can provide automatic pass word protection for resources that it serves.
This is a 2 stage process
I-stage:
i) A database of user names is created.
ii) Each user name is assigned a pass word and also roles(roles may be
administrator, developer, end user)
iii) Some users may be assigned to multiple roles.
II-stage:
i) Certain resources can only be accessed by users who belong to certain roles and
who have authenticated themselves as belonging to one of these roles by logging
in with an appropriate user name and pass word.
1.7.7 – SECURE SERVERS
HTTP request and response messages are sent as simple text files.
Because these messages are carried by TCP/IP, each message may travel through a
number of machines before reaching its destination.
It is possible that some machine along the route will extract information from the IP
messages it forwards for nefarious purposes.
Furthermore, it is often possible for other machines sharing a local network with the
sending or receiving machine to snoop the network and view messages associated with
other machines as if they were sent to the snooper.
24
UNIT-I IT2353/Web Technology IT/RMKEC
Eaves dropper: Any machine other than the sender or receiver that extracts information
from network messages.
To prevent eavesdropper from obtaining sensitive information, sensitive information
should be encrypted before being transmitted over nay public communication network.
The standard means of indicating to a browser that it should encrypt an HTTP request is
to use the https scheme on the URL for the request.
https://www.example.org
Tomcat supports the TLS 1.0 and other earlier protocols. To enable the secure server
tomcat features:
i) Obtain and install a certificate
ii) Configure the server to listen for TLS connections on some port.
1.8 – MARKUP LANGUAGES: XHTML
For communication between browsers and servers; most documents are written using
Hypertext Markup Language (HTML)
HTML is not a single language. It is name for a family of related languages.
Difference between XHTML & HTML
HTML, is an application of SGML, allows an author to omit certain tags and use
attribute minimization.
XHTML(Extensible HyperText Markup Language) is an application of XML, it
does not permit the omission of any tags or the use of attribute minimization.
1.8.1 – AN INTRODUCTION TO HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>Helloworld.html</title>
</head>
<body>
<p> Hello world </p>
25
UNIT-I IT2353/Web Technology IT/RMKEC
</body>
</html>
This document, like every HTML document, contains two types of information:
i) The markup information, which is contained in tags consisting of angle bracket tag
delimiters (< and >) plus the text contained between these delimiters.
ii) The character data of the document, which is everything outside of the markup tags
and displayed by the browser.
In this case, the character data consists of the two strings HelloWorld.html and Hello
World! as well as some white space.
<!DOCTYPE – It is special markup information called the document type declaration. -
This tag is to identifies the particular version of HTML used to write the program. In this
case, the file is written in the XHTML 1.0 Strict version.
Everything in the ―Hello World!‖ document following the document type declaration—
the text from <html down—is called the document instance. Each of the tags in this
example document instance is either a start tag or an end tag. The start tag and its
corresponding end tag, along with all of the document between the tags, is called an
element of the document. The portion between the tags (not including the tags
themselves) is called the content of the element.
The first tag in the document instance of an HTML file must be a start tag for the root
element, and the root element can only occur once in the document. In order to strictly
conform with the XHTML 1.0 standard, the html start tag must also contain the
xmlns="..." string shown. That string is an example of an attribute specification, which
consists of an attribute name (xmlns in this case) and an attribute value (the string within
quotes).
It can be viewed as a tree (element tree)
Element Tree
html
head body
title p
26
UNIT-I IT2353/Web Technology IT/RMKEC
Any text contained in the head element does not appear directly in the primary window
area (the client area) of the browser window. Instead, the head element is used for
providing certain instructions to the browser. Here the title element, which directs the
browser to display the element content as the window title
The body element contains the information that is to be displayed in the client area of the
browser. This document‘s body contains a single paragraph (p) element. A p element in
particular tells the browser that its content represents a single paragraph of text and s
should be displayed accordingly.
1.9 – HTML’S HISTORY AND VERSIONS
HTML was initially defined by Tim Berners-Lee, in 1990.
It is developed for science and engineering professionals.
After the revision, the elements of the language are still less.
The elements in use as of November 1992 included the title and paragraph elements , the
elements for creating hyperlinks, headings, simple lists, glossaries, examples and address
blocks
There was no facility for producing tables or fill-in forms, much less for including images
within a document.
By February 1993 publicly released a preliminary version of Mosaic browser.
By September 1993 Same browser, in Windows and Macintosh versions was made
available.
In addition to displaying images within documents, the NCSA Mosaic browser could
play video clips as well as sounds.
Microsoft deployed a similarly large development team to work on its Internet Explorer
browser.
During the period from 1993 through 1997, HTML was being defined operationally by
the elements that browser software developers chose to implement and the ways in which
their browsers responded to these elements.
In October of 1994, Tim Berners-Lee launched the World Wide Web Consortium (W3C),
in part with the goal of producing standards for HTML as well as other web technologies.
27
UNIT-I IT2353/Web Technology IT/RMKEC
HTML version 2.0 became a standard over six months.
Version 3.2 was adopted as a standard by the W3C in January of 1997,
The W3C released its HTML 4 recommendation in December of 1997.
The current version of this recommendation, HTML 4.01, is the standard.
1.10 – BASIC XHTML SYNTAX & SEMANTICS
1.10.1 - DOCUMENT TYPE DECLARATION
Every XHTML document must begin with a document type declaration.
Each HTML specification provides such a declaration that can be used at the beginning
of documents intended to conform with the specification.
There are three flavors of both the HTML 4.01 and XHTML 1.0 specifications, each with
its own document type declaration.
The three flavors are:
o Strict: Initial Version.
o Transitional: A superset of StrictHTML. Elements and attributes that should not
be used if possible because they will likely be eliminated from HTML
recommendations at some future time.
Frameset: A superset of the Transitional flavor that includes a feature allowing
several subwindows (frames) to be displayed within a browser‘s client area.
Many documents on the Web today begin with a document type declaration for the
HTML 4.01 Transitional flavor.
1.10.2 – WHITE SPACE IN CHARACTER DATA:
The information that lies outside the markup of the document, and to a large extent is the
textual content of the web page produced by the document.
any string of white space characters within character data is replaced by a single blank.
<body>
<p>
Hello World!
28
UNIT-I IT2353/Web Technology IT/RMKEC
This is my second HTML paragraph.
</p>
</body>
Although the text within the p element is typed into the HTML document as two
paragraphs (there is a blank line between two pieces of text), the browser displays all of
the text as a single paragraph with a single space between the two sentences
A simple way to tell the browser that the text in this example to be displayed as two
paragraphs is to use two p elements instead of one:
<p>
Hello World!
</p>
<p>
This is my second HTML paragraph.
</p>
1.10.3 – UNRECOGNISED ELEMENTS AND ATTRIBUTES
A second feature of HTML that sometimes confuses beginning web authors is that
browsers don‘t complain if a document contains element or attribute names that the
browser does not recognize.
For attributes with unrecognized names, the browser acts as if the attribute is not present
at all.
For unrecognized element names, the browser displays the content of the element as if
the markup were not present.
<head>
<titl>
HelloWorldBadElt.html
</title>
</head>
In this example, the browser treats the content of the titl element as if it were text typed
directly within the head element. Text is not supposed to appear here, and the HTML
standard does not specify how a browser should display such information.
Mozilla chooses to display the text as if it were the initial content of the body.
29
UNIT-I IT2353/Web Technology IT/RMKEC
One implication of HTML‘s handling of unrecognized tag names is that an HTML page
may display correctly in a browser but still have typographical errors in its markup.
For example, consider the following document body:
<p>
Hello World!
</p>
<l>
This is my second HTML paragraph.
</p>
The second paragraph mistakenly begins with an l tag. Since l is not a valid element
name in HTML, this tag will be ignored.
Mozilla treats text that is contained directly in the body element as if it were the content
of a p element.
One simple way to perform validation checking is to use an HTML validator.
This is a program that will analyze your document and not only catch typographical
errors, but also help you to ensure that the HTML document conforms to the standards of
the HTML version.
1.10.4 – SPECIAL CHARACTERS
Few characters must be used carefully in HTML documents.
For example, the less-than symbol (<) is the special symbol used to begin tags.
So, this ‗<‘ symbol is given as a reference <, & is the reference(begin), ; reference
(end)
Character Reference
Character
< <
> >
& &
― "
‘ &apos
˜n ñ
α α
∀
Example Entity and Character References
30
UNIT-I IT2353/Web Technology IT/RMKEC
1.10.5 – ATTRIBUTES
Html element of any XHTML 1.0 document must contain an xmlns attribute
specification.
HTML element has a set of associated attributes that can be specified for it.
The values of an element‘s attributes typically influence how the element is displayed or
how it behaves, or they may supply identifying information.
For example, the xmlns attribute identifies the XML namespace for the document, which
can be considered identifying information.
All XHTML attribute specifications have the form shown for xmlns: white space
separates the attribute name from the element name in the start tag of the element.
The attribute name is followed by an equals sign (=), optionally preceded and followed
by white space; and the value of the attribute, enclosed in quotes, follows the equals sign.
for example, an attribute specification such as
value = "Ain't this grand!" is legal
value = "He said, "She said", then sighed." is not.
However, references may appear within an attribute value, so
value = "He said, "She said", then sighed." is valid.
The " references will be converted to double quotes when the document is parsed.
Multiple attribute specifications can be included within a single tag by separating the
specifications with white space.
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
o This above statement is compatible with software that understands. HTML 4.01,
which does not contain the xml:lang attribute, as well as with software that
understands XML, which defines the xml:lang attribute for use across arbitrary
XML-based lang.
o En-English
o Multiple attribute specification can appear in any order.
restrictions on attribute values
newline characters should not appear within an attribute value;
31
UNIT-I IT2353/Web Technology IT/RMKEC
avoid including any leading or trailing white space, and
avoid having multiple adjacent white space characters within attribute values
1.11 – SOME FUNDAMENTAL HTML ELEMENTS:
Example 1: Headings
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
- Produces section headings for an HTML document.
- h1 represents a top level heading.
- h2 subheadings and so on
Example 2: Link: The Anchor Element
<html>
<body>
<a href="http://www.w3schools.com">
This is a link</a>
</body>
</html>
The a or anchor tag for creating clickable kink within a document.
Example 3: Images: The img Element
<html>
<body>
<img src="s1.jpg" width="104" height="142" />
</body>
</html>
- causes a graphics – capable browser to display an image that, when clicked
Example 4: Nested Elements
<html>
<body>
32
UNIT-I IT2353/Web Technology IT/RMKEC
<p><b>This text is bold</b></p>
<p><strong>This text is strong</strong></p>
<p><big>This text is big</big></p>
<p><em>This text is emphasized</em></p>
<p><i>This text is italic</i></p>
<p><small>This text is small</small></p>
<p>This is<sub> subscript</sub> and <sup>superscript</sup></p>
</body>
</html>
This is an example for nesting elements.
- It includes one element as part of the content of the another element.
Example 5: pre and br Element <html>
<body>
<pre>
This is
preformatted text.
It preserves both spaces
and line breaks.
</pre>
<p>The pre tag is good for displaying computer code:</p>
<pre>
for i = 1 to 10
print i
next i
</pre>
</body>
</html>
Output:
This is
preformatted text.
It preserves both spaces
and line breaks.
The pre tag is good for displaying computer code:
for i = 1 to 10
33
UNIT-I IT2353/Web Technology IT/RMKEC
print i
Example 6: Horizontal Rule Element
<html>
<body>
<p>The hr tag defines a horizontal rule:</p>
<hr />
<p>This is a paragraph</p>
<hr />
<p>This is a paragraph</p>
<hr />
<p>This is a paragraph</p>
</body>
</html>
Output:
The hr tag defines a horizontal rule:
This is a paragraph
This is a paragraph
This is a paragraph
Inserts a horizontal line to the document.
Example 7: Image Link Element
<html>
<body>
<p>Create a link of an image:
<a href="default.asp">
<img src="smiley.gif" alt="HTML tutorial" width="32" height="32" />
</a></p>
<p>No border around the image, but still a link:
<a href="default.asp">
<img border="0" src="smiley.gif" alt="HTML tutorial" width="32" height="32" />
34
UNIT-I IT2353/Web Technology IT/RMKEC
</a></p>
</body>
</html>
Example 8: Comment Element
<html>
<body>
<!--This comment will not be displayed-->
<p>This is a regular paragraph</p>
</body>
</html>
Example 9: Break Element
<html>
<body>
<p>This is<br />a para<br />graph with line breaks</p>
</body>
</html>
HTML Images - The <img> Tag and the Src Attribute
In HTML, images are defined with the <img> tag.
The <img> tag is empty, which means that it contains attributes only, and has no closing
tag.
To display an image on a page,need to use the src attribute. Src stands for "source". The
value of the src attribute is the URL of the image to display.
Syntax for defining an image:
<img src="url" alt="some_text"/>
The URL points to the location where the image is stored. An image named "boat.gif",
located in the "images" directory on "www.example.com" has the URL:
http://www.example.com/images/boat.gif.
35
UNIT-I IT2353/Web Technology IT/RMKEC
The browser displays the image where the <img> tag occurs in the document. If an image
tag is between two paragraphs, the browser shows the first paragraph, then the image, and
then the second paragraph.
HTML Images - The Alt Attribute
The required alt attribute specifies an alternate text for an image, if the image cannot be
displayed.
The value of the alt attribute is an author-defined text:
<img src="boat.gif" alt="Big Boat" />
The alt attribute provides alternative information for an image if a user for some reason
cannot view it (because of slow connection, an error in the src attribute, or if the user uses
a screen reader).
HTML Images - Set Height and Width of an Image
The height and width attributes are used to specify the height and width of an image.
The attribute values are specified in pixels by default:
<img src="pulpit.jpg" alt="Pulpit rock" width="304" height="228" />
HTML <map> Tag
<img src="planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap" />
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" href="sun.htm" alt="Sun" />
<area shape="circle" coords="90,58,3" href="mercur.htm" alt="Mercury" />
<area shape="circle" coords="124,58,8" href="venus.htm" alt="Venus" />
</map>
The <map> tag is used to define a client-side image-map. An image-map is an image
with clickable areas.
The name attribute of the <map> element is required and it is associated with the <img>'s
usemap attribute and creates a relationship between the image and the map.
The <map> element contains a number of <area> elements, that defines the clickable
areas in the image map.
36
UNIT-I IT2353/Web Technology IT/RMKEC
HTML <area> Tag
The <area> tag defines an area inside an image-map (an image-map is an image with
clickable areas).
The <area> element is always nested inside a <map> tag.
The coords attribute specifies the x and y coordinates of an area in an image-map.
The coords attribute is used together with the shape attribute to specify the size, shape,
and placement of an area.
The coordinates of the top-left corner of an area are 0,0.
x1,y1,x2,y2 Specifies the coordinates of the left, top, right, bottom corner of the
rectangle (for shape="rect")
x,y,radius Specifies the coordinates of the circle center and the radius (for
shape="circle")
x1,y1,x2,y2,..,xn,yn Specifies the coordinates of the edges of the polygon. If the first
and last coordinate pairs are not the same, the browser will add the last coordinate pair to
close the polygon (for shape="poly")
1.12 – RELATIVE URLS:
URLs are used as the values of some attributes in HTML documents.
By default the webapps/ROOT subdirectory within your JWSDP 1.3 installation
directory is the document base for the root URL path / of your Tomcat web server.
Running web server in the machine http://localhost:8080/ MultiFile.html will cause your
server to look for a file named MultiFile.html in the webapps/ROOT directory.
For example <img src="valid-xhtml10.png">
The web server is running the program http://localhost:8080/multifile.html which is in
webapps/ROOT.
Then the web server automatically searches for the src file in the same directory
webapps/ROOT.
A complete URL which begins with a scheme is called absolute URL. Each relative URL
is short hand for some absolute URL.
37
UNIT-I IT2353/Web Technology IT/RMKEC
A relative URL that looks like a file name, such as valid-xhtml10.png is converted to an
absolute URL by replacing any characters following the final slash(/) in the base URL
with the relative URL.
For example, if a document at http://www.example.org/PageWithAnchor.html contained
the markup <a href="#section1">... then the corresponding absolute URL would be
http://www.example.org/ PageWithAnchor.html#section1.
Absolute URLs Corresponding to Relative URLs When the Base URL is
http://www.example.org/a/b/c.html
Relative URL Absolute URL
d/e.html
../f.html
../../g.html
../h/i.html
/j.html
/k/l.html
http://www.example.org/a/b/d/e.html
http://www.example.org/a/f.html
http://www.example.org/g.html
http://www.example.org/a/h/i.html
http://www.example.org/j.html
http://www.example.org/k/l.html
1.13 – LISTS
The three types of lists supported by HTML:
_ Unordered: A bullet list
_ Ordered: A numbered list
_ Definition: A list of terms and definitions for each
<ul>
<li>Bulleted list item</li>
<li>Bulleted list item 2</li> </ul> <ol>
<li>Numbered list item</li>
<li>Numbered list item 2</li>
</ol>
<dl>
<dt>Term</dt>
<dd>Definition of term</dd>
<dt>Term 2</dt>
<dd>Definition of term 2</dd>
38
UNIT-I IT2353/Web Technology IT/RMKEC
</dl>
First, the type of list is indicated by using either a ul (unordered list), ol (ordered list), or
dl (definition list)
All three elements are block elements.
So a list by default begins on a new line when displayed in the browser.
Each item in an unordered or ordered list is made the content of an li (list item) element.
For a definition list, each term is made the content of a dt (definition term) element, and
each term description is made the content of a dd element that immediately follows the dt
element being described.
Lists can be nested to produce an outline layout. For example, the markup
<ul>
<li>Bulleted list item
<ul>
<li>Nested list item</li>
<li>Nested list item 2</li>
</ul>
</li>
<li>Bulleted list item 2</li> </ul>
1.14 – TABLES
HTML provides a fairly sophisticated model for presenting data in tabular form.
Columns and rows will automatically size to contain their data, although there are also
various ways to specify column widths; individual table cells can span multiple rows
and/or columns; header and/or footer rows can be supplied; and so on.
For example, a table of student grades could be written as follows
<table border="5">
<tr>
<td>Kim</td><td>100</td><td>89</td>
</tr>
<tr>
<td>Sandy</td><td>78</td><td>92</td>
</tr>
<tr>
<td>Taylor</td><td>83</td><td>73</td>
39
UNIT-I IT2353/Web Technology IT/RMKEC
</tr>
</table>
A tr (table row) element is used to contain each row.
Within a row, a td (table data) element marks each element of the row.
The number of rows is determined by the number of tr elements in the table, and the
number of columns is determined by the maximum number of td elements contained
within any row.
The border attribute in the table start tag tells the browser to display the table using a 5-
pixel-wide border and 1-pixel-wide rules.
In general, if any positive integer n is used for the value of border, then an n-pixel-wide
border and 1-pixel-wide rules will be displayed.
A value of 0 for this attribute turns off both the border and the rules.
<table border="5">
<caption>
COSC 400 Student Grades
</caption>
<tr>
<td> </td><td> </td><th colspan="2">Grades</th>
</tr>
<tr>
<td> </td><th>Student</th><th>Exam 1</th><th>Exam 2</th>
</tr>
<tr>
<th rowspan="2">Undergraduates</th><td>Kim</td><td>100</td><td>89</td>
</tr>
<tr>
<td>Sandy</td><td>78</td><td>92</td>
</tr>
<tr>
<th>Graduates</th><td>Taylor</td><td>83</td><td>73</td>
</tr> </table>
The caption element is used to define a caption for the table. If a caption element is used
with a table, the caption start tag must must appear immediately after the start tag of the
table element.
40
UNIT-I IT2353/Web Technology IT/RMKEC
The second new element, th (table header), is much like the td element, except that a
typical browser will format the content of a th element in boldface and center it
horizontally within the column.
<td> </td> to indicate that the first column of the second row should be left blank.
The reference is included to ensure that the cell rules are displayed.
colspan is used to tell the browser that a table element should cover more than one
column.
rowspan is used for an element that covers more than one row
For performance and other reasons, if a large image is to be displayed on a web page, it
will often be sliced into several smaller images that are downloaded separately and
displayed next to one another to recreate the large image.
Tables are frequently used to position the smaller images adjacent to one another so that
they appear to be a single larger image.
the table element has two attributes that control spacing within the table: cellspacing and
cellpadding.
The cellspacing attribute determines the amount of space between two adjacent cells, or
between a side of the cell and the border of the table, while cellpadding determines the
amount of space between the content of a cell and the edge of the cell.
<table border="1" cellpadding="5">
<tr>
<th>cellspacing</th><th>cellpadding</th><th>Example</th>
</tr>
<tr>
<td>10</td><td>10</td>
<td id="nested">
<table border="1" cellspacing="10" cellpadding="10">
<tr>
<td>
<img src="CFP1.png" style="display:block"
alt="Cucumber and Flower Pot" height="86" width="67" />
</td>
<td>
<img src="CFP2.png" style="display:block"
alt="Cucumber and Flower Pot" height="86" width="45" />
</td>
</tr>
41
UNIT-I IT2353/Web Technology IT/RMKEC
</table>
</td>
</tr>
The td element with id value nested is another table.
Any desired depth of recursion of tables are possible.
Each img element assigns a value of display:block to its style attribute.
Here the image should be treated as a block element for display purposes
1.15 – FRAMES
HTML frames are essentially a means of having several browser windows open within a
single larger window.
Such a window is created by using one or more frameset elements after the heading
element, rather than using a body element as all of our previous pages have used.
The document type declaration is also different for framed pages than it is for standard
web pages.
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Java 2 Platform SE v1.4.2</title>
</head>
<frameset cols="20%,80%">
<frameset rows="1*,2*">
<frame src="overview-frame.html"
id="upperLeftFrame" name="upperLeftFrame"></frame>
<frame src="allclasses-frame.html"
id="lowerLeftFrame" name="lowerLeftFrame"></frame>
</frameset>
<frame src="overview-summary.html"
id="rightFrame" name="rightFrame"></frame>
</frameset>
</html>
The first frameset statement says to create two rectangular subspaces, or views, within
the browser window.
42
UNIT-I IT2353/Web Technology IT/RMKEC
The first (and therefore leftmost) of these subspaces covers 20% of the.width of the
browser window, and the second covers the remaining 80% of the window.
This top-level frameset element contains two child elements: another frameset and a
frame.
The child frameset specifies that its subspace (the left 20% of the browser window) is
further divided into two views.
Since these two views are specified using the rows attribute, they are stacked one on top
of the other and both occupy the full width of the child frameset‘s subspace.
The notation 1*,2* indicates that the vertical space should be allocated so that the second
(and therefore lower) view is twice as tall as the first view.
Framesets can be defined recursively, with one frameset defined within another.
Each frame is essentially a browser window, which occupies a subspace of the screen as
defined by the frameset(s) containing the frame.
In the example considered, the frame named upperLeftFrame occupies the upper left
corner (20% wide by 1/3 high) of the browser window, the frame lowerLeftFrame the
lower left corner (20% wide by 2/3 high), and the frame rightFrame the right side (80%
wide by 100% high).
The src attribute of a frame tells the browser the URL of a document to be loaded into the
frame initially.
If an HTML document is loaded into the frame and the user clicks a hyperlink within that
frame, then the document named in the href attribute of the hyperlink‘s a element will be
loaded into the frame. Other frames in the browser window will not be affected.
However, it is also possible for activity in one frame to cause a change in another frame.
For example, theHTML contained in the frame named upperLeftFrame might contain an
anchor such as the following:
<a href="java/applet/package-frame.html" target="lowerLeftFrame">
This anchor specifies lowerLeftFrame as the value of its target attribute, if the user clicks
the link corresponding to this anchor, then the URL specified by the href attribute of the
anchor will be loaded into lowerLeftFrame.
43
UNIT-I IT2353/Web Technology IT/RMKEC
One use of frames, then, is to provide a relatively easy means of writing HTML
documents that provide navigation tools.
Navigation page can be loaded into its own frame, and a second frame can be specified as
a target for each of the anchor elements contained in the navigation page.
A nonframe alternative is to include the navigation links directly in every page of the
collection.
Disadvantages of using frames:
i) Frames can cause confusion for end users.
ii) Avoid frames is that they assume that the document is going to be displayed on a
monitor large enough to comfortably accommodate them.
A key exception is that frames are still used for certain specialized technical
applications—such as documents produced using the JavadocTM documentation
system—in which the end users may be expected to have the expertise needed to deal
with the vagaries of frames and to be viewing documents on standard monitors.
1.16 – HTML FORMS
The <form> tag is used to create an HTML form for user input.
The <form> element can contain one or more of the following form elements:
<input>
<textarea>
<button>
<select>
<option>
An HTML form is used to pass data to a server.
Example:
An HTML form with two input fields and one submit button:
<form action="form_action.asp" method="get">
First name: <input type="text" name="fname" /><br />
Last name: <input type="text" name="lname" /><br />
<input type="submit" value="Submit" />
</form>
44
UNIT-I IT2353/Web Technology IT/RMKEC
TextArea: -
<textarea rows="2" cols="20">
from basic HTML to advanced XML, SQL, ASP, and PHP.
</textarea>
The <textarea> tag defines a multi-line text input control.
A text area can hold an unlimited number of characters, and the text renders in a fixed-
width font (usually Courier).
The size of a text area can be specified by the cols and rows attributes, or even better;
through CSS' height and width properties. Select:-
Create a drop-down list with four options:
<select>
<option value="volvo">Volvo</option>
<option value="saab">Saab</option>
<option value="mercedes">Mercedes</option>
<option value="audi">Audi</option>
</select>
The <select> tag is used to create a drop-down list.
The <option> tags inside the <select> element define the available options in the list.
Input:
The <input> tag is used to select user information.
<input> elements are used within a <form> element to declare input controls that allow
users to input data.
An input field can vary in many ways, depending on the type attribute.
CheckBox:
<input type=‖checkbox‖ name= ―c1‖>
The value for type allows to create a checkbox style interface for the form.
Radio:
<input type=‖radio‖ name= ―r1‖ value=‖p‖>
45
UNIT-I IT2353/Web Technology IT/RMKEC
The value for type allows to create a radio button style interface for the form. It is
designed to offer user a choice from pre-determined options. However radio is designed
to accept only one response from among its option.
Text:
The first possible value for the type attribute is text, which creates a single line text box
of the length.
First name: <input type="text" name="fname" />
Password:
A password option is nearly identical to the text option except that it responds to typed
letters with bullet points.
Passoword: <input type="passoword" name="fname" />
Example: Form Validation
<HTML>
<HEAD>
<TITLE> FORMS</TITLE>
</HEAD>
<BODY><center><h2> APPLICATION FORM</h2>
<TABLE BORDER="5">
<FORM NAME="F1" METHOD="POST">
<TR>
<TD>
NAME:</TD><TD><INPUT TYPE="text" NAME="name">
</TD>
</TR>
<TR>
<TD>
EMAIL ID:</TD><TD><INPUT TYPE="text" NAME="emailid">
</TD>
</TR>
<TR>
<TD>
EXPERIENCE:</TD><TD><INPUT TYPE="radio" NAME="fresh"
value="FRESH">FRESH
</TD>
<TD><INPUT TYPE="radio" NAME="experience"
value="EXPERIENCE">EXPERIENCE
46
UNIT-I IT2353/Web Technology IT/RMKEC
</TD>
</TR>
<TR>
<TD>
DATE OF BIRTH:</TD>
<TD>
<SELECT NAME="date">
<OPTION VALUE="1">1</OPTION>
<OPTION VALUE="2">2</OPTION>
<OPTION VALUE="3">3</OPTION>
<OPTION VALUE="4">4</OPTION>
<OPTION VALUE="5">5</OPTION>
<OPTION VALUE="6">6</OPTION>
<OPTION VALUE="7">7</OPTION>
<OPTION VALUE="8">8</OPTION>
<OPTION VALUE="9">9</OPTION>
<OPTION VALUE="10">10</OPTION>
<OPTION VALUE="11">11</OPTION>
<OPTION VALUE="12">12</OPTION>
<OPTION VALUE="13">13</OPTION>
<OPTION VALUE="14">14</OPTION>
<OPTION VALUE="15">15</OPTION>
<OPTION VALUE="16">16</OPTION>
<OPTION VALUE="17">17</OPTION>
<OPTION VALUE="18">18</OPTION>
</SELECT>
</TD>
<TD>
<SELECT NAME="month">
<OPTION VALUE="1">1</OPTION>
<OPTION VALUE="2">2</OPTION>
<OPTION VALUE="3">3</OPTION>
<OPTION VALUE="4">4</OPTION>
<OPTION VALUE="5">5</OPTION>
<OPTION VALUE="6">6</OPTION>
<OPTION VALUE="7">7</OPTION>
<OPTION VALUE="8">8</OPTION>
<OPTION VALUE="9">9</OPTION>
<OPTION VALUE="10">10</OPTION>
<OPTION VALUE="11">11</OPTION>
<OPTION VALUE="12">12</OPTION>
</SELECT>
</TD>
<TD>
47
UNIT-I IT2353/Web Technology IT/RMKEC
<SELECT NAME="year">
<OPTION VALUE="1991">1991</OPTION>
<OPTION VALUE="1992">1992</OPTION>
<OPTION VALUE="1993">1993</OPTION>
<OPTION VALUE="1994">1994</OPTION>
<OPTION VALUE="1995">1995</OPTION>
<OPTION VALUE="1996">1996</OPTION>
<OPTION VALUE="1997">1997</OPTION>
<OPTION VALUE="1998">1998</OPTION>
<OPTION VALUE="1999">1999</OPTION>
<OPTION VALUE="2010">2000</OPTION>
<OPTION VALUE="2011">2011</OPTION>
<OPTION VALUE="2012">2012</OPTION>
</SELECT>
</TD>
</TR>
<TR>
<TD>EDUCATIONAL QUALIFICATION:</TD>
<TD>
<SELECT NAME="edu">
<OPTION VALUE="B.TECH">B.TECH</OPTION>
<OPTION VALUE="B.E">B.E</OPTION>
<OPTION VALUE="Bsc">Bsc</OPTION>
</SELECT>
</TD>
</TR>
<TR>
<TD>SPECIALISATION:</TD>
<TD>
<SELECT NAME="edu">
<OPTION VALUE="B.TECH">JAVA</OPTION>
<OPTION VALUE="B.E">C++</OPTION>
<OPTION VALUE="Bsc">ORACLE</OPTION>
</SELECT>
</TD>
</TR>
<TR>
<TD>SKILLS:</TD>
<TD>
<INPUT TYPE="checkbox" NAME="C1" value="C++">C++
</TD>
<TD>
48
UNIT-I IT2353/Web Technology IT/RMKEC
<INPUT TYPE="checkbox" NAME="C2" value="JAVA">JAVA
</TD>
</TR>
<TR>
<TD>ENTER COMMENTS BELOW</TD>
<TD><INPUT TYPE="textarea" NAME="ta" rows="5" cols="30"></td>
</tr>
</table></center>
</body>
</html>
1.17 – XML
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined. User can define own tags
XML is a W3C Recommendation
The Difference Between XML and HTML
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data, with focus on what data is
HTML was designed to display data, with focus on how data looks
HTML is about displaying information, while XML is about carrying information.
Example:
i) <?xml version=‖1.0‖>
ii) <note>
iii) <to>Tove</to>
iv) <from>Jani</from>
v) <body>Hai!</body>
vi) </note>
Line (i) which defines the XML version.
Line (ii) describes the root element.
Line(iii)(iv)(v) child elements of the root.
Line(vi) defines the end of the root element.
49
UNIT-I IT2353/Web Technology IT/RMKEC
1.17.1 - XML SYNTAX:
All XML Elements Must Have a Closing Tag
In HTML, some elements do not have to have a closing tag:
<p>This is a paragraph. <br>
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph.</p> <br />
XML Tags are Case Sensitive
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
XML Elements Must be Properly Nested
In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
XML Documents Must Have a Root Element
XML documents must contain one element that is the parent of all other elements. This
element is called the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs just like in HTML.
In XML, the attribute values must always be quoted.
50
UNIT-I IT2353/Web Technology IT/RMKEC
Study the two XML documents below. The first one is incorrect, the second is correct:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
Well formed ―XML Documents‖ : It is document that conforms to the XML syntax rules.
―Valid‖ XML Documents: A valid XML Document is a well formed document, which is
also conforms to the rules of a Document Type Definition(DTD)
1.17.2 – DTD
XHTML 1.0 is defined by a set of text files known collectively as an XML document type
definition (DTD).
The basic elements of a DTD
Ex:
<!ELEMENT html (head, body)>
<!ATTLIST html
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr|rtl) #IMPLIED
id ID #IMPLIED
xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
<!ENTITY gt ">">
The first of these lines is an example of an element type declaration. Element type
declarations are used to specify the set of all valid elements in the language defined by
the DTD.
The html element must have two children, a head element followed by a body element.
The second line begins a tag that represents an XML attribute list declaration.
51
UNIT-I IT2353/Web Technology IT/RMKEC
Element type has five attributes: lang, xml:lang, dir, id, and xmlns. It also provides
information such as the valid set of values for each attribute and default value
information. .
The final line is an example of an XML entity declaration. Such a tag is essentially a
macro definition, associating the name gt (an entity) with the string >.
1.17.3 – ELEMENT TYPE DECLARATION
<!ELEMENT html (head, body)>
The string immediately following ELEMENT is the name of the element type being
declared,in this case html.
The XHTML DTD contains exactly one element type declaration for each element in the
language.
<!ELEMENT select (optgroup|option)+> is a choice specification type modified with the
+ iteration character, and says that a select element may contain any number of optgroup
and option elements in any order, as long as one or the other of these two elements
appears at least once.
Furthermore, sequence and choice specifications (optionally with iterator characters) can
be nested, and an element name within a sequence or choice may have an iterator
character suffixed to it.
1.17.4 - ATTRIBUTE LIST DECLARATIONS
An attribute list declaration is included in the DTD for each element that has attributes.
Attribute list declaration begins with the keyword ATTLIST followed by an element type
name and specifies the names for all attributes of the named element, the type of data that
can be used as the value of each attribute, and default value information.
For example, consider the example from the beginning of this section.
<!ATTLIST html
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED
dir (ltr|rtl) #IMPLIED
52
UNIT-I IT2353/Web Technology IT/RMKEC
id ID #IMPLIED
xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
Each line after the first is a sequence of three elements: the attribute name, the attribute
type, and the default declaration.
The attribute type specifies the type of data that may be specified as the attribute value.
Attribute Type
Syntax
Usage
Name token
Enumerated
Identifier
Identifier
reference
Identifier
reference list
Character data
NMTOKEN
(string1 | string2 | ... )
ID
IDREF
IDREFS
CDATA
Name (word)
List of all possible attribute
values
Type for id attribute
Reference to an id attribute
value
List of references to id
attribute values
Arbitrary character d
Default Declaration:
# IMPLIED – No default value provided by DTD, Attiribute options #FIXED followed by any valid value – Default provided by Dtd any valid value – default value provided overridden by user. # REQUIRED – no default value provided by dtd, attribute required.
1.17.5 - ENTITY DECLARATIONS
XML DTD can contain entity declarations, each of which begins with the keyword
ENTITY followed by an entity name and its replacement text, such as
<!ENTITY gt ">">
This statement defines the value of the entity reference > to be the string >, which
in turn is an XML character reference to the greater-than character (which corresponds to
the code point 62 in the Unicode Standard).
An entity such as gt is known as a general entity and can only be referenced from within
documents defined by the DTD.
XML also provides for a different type of entity that can be referenced from within DTDs
and not from documents. Such entities are called parameter entities.
53
UNIT-I IT2353/Web Technology IT/RMKEC
A parameter entity declaration is indicated in the DTD by following the ENTITY
keyword with a percent sign (%), as in these two declarations taken from the XHTML1.0
Strict DTD:
1.18 – CREATING HTML DOCUMENT
HTML documents can be created in a number of ways.
One way is to open a simple text editor, such as Notepad in Microsoft Windows, and
simply type in the markup and content of the document.
A slightly more automated approach is to use an editor such as Emacs that can be
customized to provide certain HTML/XML-specific features, such as tag highlighting,
―smart‖ tabbing, automated generation of closing tags, and much more.
Higher-level tools that generate HTML from user input.
EX:-,create a document using familiar word-processing features and have the word
processor automatically convert the text formatting and other markup-type operations you
applied to the document into an equivalent HTML document.
Several applications are available that provide word-processing-type functionality
Mozilla 1.4 package includes a Composer feature under the Window menu.
Composer appears to be like many other word processors: it has WYSIWYG features for
selecting text size and weight.
It incorporates many HTML specific features
i) Ability to insert named anchor element
ii) Editing HTML document directly.
HTML documents can be generated directly or using HTML tools.