
Web Architectures - Web Technologies (1019888BNR)




2 December 2005

Web Technologies

Web Architectures

Prof. Beat Signer

Department of Computer Science

Vrije Universiteit Brussel

http://www.beatsigner.com


Beat Signer - Department of Computer Science - October 7, 2016

Web Information Systems

A web information system uses web technologies for information and service delivery

Modern web information systems and web architectures have to

be extensible to cater for emerging technologies and new forms of interaction (e.g. multimodal interaction)

manage heterogeneous information such as documents, structured data, multimedia resources, semi-structured information, ...

integrate various sources (e.g. DBs) via multi-tier architectures

offer a notion of state to reflect the current application context

deal with information about users and their environment (context)

...


Basic Client-Server Web Architecture

Effect of typing http://www.vub.ac.be in the browser bar

(1) use the Domain Name System (DNS) to get the IP address for www.vub.ac.be (answer 134.184.129.2)

(2) create a TCP connection to 134.184.129.2

(3) send an HTTP request message over the TCP connection

(4) visualise the received HTTP response message in the browser

[Figure: client and server exchanging an HTTP request and an HTTP response over the Internet]
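The four steps above can be sketched with the standard java.net classes. This is a minimal sketch, assuming plain HTTP on port 80 and a hand-written request; the class SimpleBrowser, its method names and the Connection: close header are illustrative choices not taken from the slide, and only the status line of the response is read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SimpleBrowser {

    // Minimal HTTP/1.1 GET request; HTTP/1.1 requires the Host header
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n" // ask the server to close the connection afterwards
                + "\r\n";                 // blank line ends the header section
    }

    // Steps (1) to (4) for plain HTTP on port 80; returns only the status line
    static String fetchStatusLine(String host, String path) throws IOException {
        InetAddress address = InetAddress.getByName(host);           // (1) DNS lookup
        try (Socket socket = new Socket(address, 80)) {               // (2) TCP connection
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII)); // (3) send
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine(); // (4) a browser would read and render the whole response
        }
    }
}
```

Calling SimpleBrowser.fetchStatusLine("www.vub.ac.be", "/") requires network access; a real browser would additionally follow redirects, support HTTPS and render the body.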


Web Server

Tasks of a web server

(1) set up connection

(2) receive and process HTTP request

(3) fetch resource

(4) create and send HTTP response

(5) logging
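Tasks (2) to (5) can be illustrated with a hypothetical handler for a single request. This sketch assumes that task (1) is done by an accept loop (for example on a java.net.ServerSocket) that passes the raw request text in, that resources live in an in-memory map and that only GET is supported; TinyWebServer and handle are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

class TinyWebServer {

    // Tasks (2) to (5) for one request; task (1), setting up the connection, happens elsewhere
    static String handle(String rawRequest, Map<String, String> resources, List<String> log) {
        // (2) receive and process the request: only the start line is inspected here
        String startLine = rawRequest.split("\r\n", 2)[0];
        String[] parts = startLine.split(" ");
        String status;
        String body = "";
        if (parts.length != 3) {
            status = "400 Bad Request";
        } else if (!parts[0].equals("GET")) {
            status = "501 Not Implemented";        // simplification: only GET is supported
        } else if (resources.containsKey(parts[1])) {
            status = "200 OK";
            body = resources.get(parts[1]);         // (3) fetch the resource from the map
        } else {
            status = "404 Not Found";
        }
        // (4) create the HTTP response message
        String response = "HTTP/1.1 " + status + "\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n"
                + body;
        // (5) logging: one line per handled request
        log.add(startLine + " -> " + status);
        return response;
    }
}
```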

The most prominent web servers are the Apache HTTP Server and Microsoft's Internet Information Services (IIS)

Many devices have an embedded web server (e.g. printers, WLAN routers, TVs, ...)

[Figure: Worldwide Web Servers, http://news.netcraft.com]


Example HTTP Request Message

GET / HTTP/1.1
Host: www.vub.ac.be
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive


Example HTTP Response Message

HTTP/1.1 200 OK
Date: Thu, 03 Oct 2013 17:02:19 GMT
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.15
Content-Language: nl
Set-Cookie: lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT
Content-Type: text/html; charset=utf-8
Keep-Alive: timeout=15, max=987
Connection: Keep-Alive
Transfer-Encoding: chunked

<!DOCTYPE html>
<html lang="nl" dir="ltr">
<head>
...
<title>Vrije Universiteit Brussel | Redelijk eigenzinnig</title>
<meta name="Description" content="Welkom aan de VUB" />
...
</html>


HTTP Protocol

Request/response communication model

[Figure: HTTP request sent from client to server, HTTP response returned from server to client]

Communication always has to be initiated by the client

Stateless protocol

HTTP can be used on top of various reliable protocols

TCP is by far the most commonly used one

runs on TCP port 80 by default

Latest version: HTTP/2.0 (May 2015)

HTTPS scheme used for encrypted connections


Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) uniquely identifies a resource

There are two types of URIs

Uniform Resource Locator (URL)

- contains information about the exact location of a resource

- consists of a scheme, a host and the path (resource name)

- e.g. https://vub.academia.edu/BeatSigner

- problem: the URL changes if the resource is moved!

• idea of Persistent Uniform Resource Locators (PURLs) [https://purl.oclc.org]

Uniform Resource Name (URN)

- unique and location independent name for a resource

- consists of a scheme name, a namespace identifier and a namespace-specific string (separated by colons)

- e.g. urn:ISBN:3837027139
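The parts of a URL and of a URN can be extracted with java.net.URI. The sketch below is illustrative: UriParts is a hypothetical helper, and effectivePort assumes that only http (default port 80) and https (default port 443) URLs are passed in.

```java
import java.net.URI;

class UriParts {

    // URL: scheme, host and path (resource name)
    static String[] urlParts(String url) {
        URI uri = URI.create(url);
        return new String[] {uri.getScheme(), uri.getHost(), uri.getPath()};
    }

    // Port actually contacted: an explicit port, otherwise the scheme's default
    // (80 for http, 443 for https; other schemes are not handled in this sketch)
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // URN: scheme name, namespace identifier and namespace-specific string, separated by colons
    static String[] urnParts(String urn) {
        URI uri = URI.create(urn);  // java.net.URI treats a URN as an opaque URI
        String[] rest = uri.getSchemeSpecificPart().split(":", 2);
        if (!"urn".equalsIgnoreCase(uri.getScheme()) || rest.length != 2) {
            throw new IllegalArgumentException("not a URN: " + urn);
        }
        return new String[] {uri.getScheme(), rest[0], rest[1]};
    }
}
```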


HTTP Message Format

Request and response messages have the same format

HTTP/1.1 200 OK                          (start line)
Date: Thu, 03 Oct 2013 17:02:19 GMT      (header fields)
Server: Apache/2.2.14 (Ubuntu)
X-Powered-By: PHP/5.3.2-1ubuntu4.15
Transfer-Encoding: chunked
Content-Type: text/html
                                         (blank line, CRLF)
<html>...</html>                         (message body, optional)

HTTP_message = start_line , {header} , "CRLF" , {body};
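A parser following this message format mainly has to find the blank line (CRLF CRLF) that separates the header fields from the body. This is a minimal sketch assuming the complete message is available as a string; HttpMessage is a hypothetical class, and chunked bodies and repeated header fields are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HttpMessage {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    private HttpMessage(String startLine, String body) {
        this.startLine = startLine;
        this.body = body;
    }

    // Splits a raw message into start line, header fields and optional body.
    // Simplifications: a repeated header name overwrites the earlier value and
    // the body is returned as-is (no chunked decoding).
    static HttpMessage parse(String raw) {
        int end = raw.indexOf("\r\n\r\n");  // the blank line (CRLF) closes the header section
        if (end < 0) {
            throw new IllegalArgumentException("no blank line after the headers");
        }
        String[] lines = raw.substring(0, end).split("\r\n");
        HttpMessage msg = new HttpMessage(lines[0], raw.substring(end + 4));
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header: " + lines[i]);
            }
            // header names are case-insensitive, so they are stored in lower case
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            msg.headers.put(name, lines[i].substring(colon + 1).trim());
        }
        return msg;
    }
}
```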


HTTP Request Message

Request-specific start line

Methods

GET : get a resource from the server

HEAD : get the header only (no body)

POST : send data (in the body) to the server

PUT : store request body on server

TRACE : get the "final" request (after it has potentially been modified by proxies)

OPTIONS : get a list of methods supported by the server

DELETE: delete a resource on the server

start_line = method , " " , resource , " " , version;
method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" | "OPTIONS" | "DELETE";
resource = complete_URL | path;
version = "HTTP/" , major_version , "." , minor_version;

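The grammar for the request start line translates almost directly into a validating parser. In this sketch (RequestLine is a hypothetical name) only the seven methods listed above are accepted, and a resource starting with http:// or https:// is treated as a complete URL.

```java
import java.util.Set;
import java.util.regex.Pattern;

class RequestLine {
    // method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" | "OPTIONS" | "DELETE"
    static final Set<String> METHODS =
            Set.of("GET", "HEAD", "POST", "PUT", "TRACE", "OPTIONS", "DELETE");
    // version = "HTTP/" , major_version , "." , minor_version
    static final Pattern VERSION = Pattern.compile("HTTP/\\d+\\.\\d+");

    final String method;
    final String resource;
    final String version;

    private RequestLine(String method, String resource, String version) {
        this.method = method;
        this.resource = resource;
        this.version = version;
    }

    // start_line = method , " " , resource , " " , version
    static RequestLine parse(String line) {
        String[] parts = line.split(" ", -1);  // -1 keeps empty fields, so double spaces fail
        if (parts.length != 3 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("expected method, resource and version: " + line);
        }
        if (!METHODS.contains(parts[0])) {
            throw new IllegalArgumentException("unsupported method: " + parts[0]);
        }
        if (!VERSION.matcher(parts[2]).matches()) {
            throw new IllegalArgumentException("malformed version: " + parts[2]);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    // resource = complete_URL | path  (a complete URL is typically sent to a proxy)
    boolean isCompleteUrl() {
        return resource.startsWith("http://") || resource.startsWith("https://");
    }
}
```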

HTTP Response Message

Response-specific start line

Status codes

100-199 : informational

200-299 : success (e.g. 200 for 'OK')

300-399 : redirection

400-499 : client error (e.g. 404 for 'Not Found')

500-599 : server error (e.g. 503 for 'Service Unavailable')

start_line = version , " " , status_code , " " , reason;
version = "HTTP/" , major_version , "." , minor_version;
status_code = digit , digit , digit;
reason = string_phrase;
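Analogously, a response start line can be split into version, status code and reason phrase, and the first digit of the code determines its class. StatusLine is a hypothetical helper written for illustration.

```java
class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // start_line = version , " " , status_code , " " , reason
    static StatusLine parse(String line) {
        String[] parts = line.split(" ", 3);  // limit 3: the reason phrase may contain spaces
        if (parts.length < 2 || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("malformed status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), reason);
    }

    // Status code classes as listed above
    String category() {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }
}
```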


HTTP Header Fields

There exist general headers (for requests and responses), request headers, response headers, entity headers and extension headers

Some important headers

Accept

- request header defining the Multipurpose Internet Mail Extensions (MIME) types that the client will accept

User-Agent

- request header specifying the type of client

Keep-Alive (HTTP/1.0) and persistent connections (default in HTTP/1.1)

- general header helping to improve performance since otherwise a new TCP connection has to be established for every single webpage element

Content-Type

- entity header specifying the body's MIME type
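As an example of header processing, the Accept header of the earlier request message (text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8) attaches a quality value q to media types, where a missing q counts as 1.0. The sketch below, with the hypothetical class AcceptHeader, only orders the entries by q and ignores the more detailed matching rules a real server applies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AcceptHeader {

    private static final class Range {
        final String mediaRange;
        final double q;

        Range(String mediaRange, double q) {
            this.mediaRange = mediaRange;
            this.q = q;
        }
    }

    // Orders the media ranges of an Accept header by quality value q
    // (default 1.0, higher = more preferred); specificity rules are ignored
    static List<String> preferenceOrder(String accept) {
        List<Range> ranges = new ArrayList<>();
        for (String item : accept.split(",")) {
            String[] params = item.trim().split(";");
            double q = 1.0;
            for (int i = 1; i < params.length; i++) {
                String p = params[i].trim();
                if (p.startsWith("q=")) {
                    q = Double.parseDouble(p.substring(2));
                }
            }
            ranges.add(new Range(params[0].trim(), q));
        }
        // List.sort is stable, so media ranges with equal q keep their original order
        ranges.sort(Comparator.comparingDouble((Range r) -> r.q).reversed());
        List<String> result = new ArrayList<>();
        for (Range r : ranges) {
            result.add(r.mediaRange);
        }
        return result;
    }
}
```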


HTTP Header Fields ...

Some important headers ...

If-Modified-Since

- request header that is used in combination with a GET request (conditional GET); the resource is only returned if it has been modified since the specified date
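On the server side, a conditional GET boils down to comparing two dates. This sketch assumes the server knows the resource's last modification time and answers with status 304 (Not Modified, no body) when the resource is unchanged; ConditionalGet is a hypothetical name, and a malformed date makes parse throw a DateTimeParseException, whereas a robust server would simply ignore the header.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class ConditionalGet {
    // Status code a server could return for a GET request, given the optional
    // If-Modified-Since header value and the resource's last modification time
    static int status(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return 200;  // unconditional GET: the resource is always returned
        }
        // HTTP dates such as "Thu, 03 Oct 2013 17:02:19 GMT" follow the RFC 1123 format
        ZonedDateTime since =
                ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        // HTTP dates have a resolution of one second, so whole seconds are compared
        boolean modified = lastModified.toEpochSecond() > since.toEpochSecond();
        return modified ? 200 : 304;  // 304 Not Modified: the client reuses its copy
    }
}
```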


Media Types

The Media Type (MIME type) defines the request or response body's content (used for appropriate processing)

Standard Media Types are registered with the Internet Assigned Numbers Authority (IANA) [RFC-6838]

mediaType = toplevel_type , "/" , subtype;

Media Type    Description
text/plain    Human-readable text without formatting information
text/html     HTML document
image/jpeg    JPEG-encoded image
...           ...
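A Content-Type value such as text/html; charset=utf-8 (as in the example response message) consists of the media type followed by optional parameters. The hypothetical MediaType class below splits it accordingly; quoted parameter values are not supported.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class MediaType {
    final String type;
    final String subtype;
    final Map<String, String> parameters = new LinkedHashMap<>();

    private MediaType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    // mediaType = toplevel_type , "/" , subtype ; optionally followed by "; name=value"
    // parameters (quoted parameter values are not handled in this sketch)
    static MediaType parse(String contentType) {
        String[] parts = contentType.split(";");
        String[] typeAndSubtype = parts[0].trim().toLowerCase(Locale.ROOT).split("/");
        if (typeAndSubtype.length != 2) {
            throw new IllegalArgumentException("expected type/subtype: " + contentType);
        }
        MediaType mt = new MediaType(typeAndSubtype[0], typeAndSubtype[1]);
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2) {
                mt.parameters.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
            }
        }
        return mt;
    }

    // e.g. a client may display text/* directly but hand image/* to an image decoder
    boolean isText() {
        return type.equals("text");
    }
}
```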


HTTP Message Information

Various tools for HTTP message logging

e.g. HttpFox add-on for the Firefox browser

Simple telnet connection

telnet wise.vub.ac.be 80 (press Enter)
GET /beat-signer HTTP/1.1 (press Enter)
Host: wise.vub.ac.be (press Enter 2 times)

Until 1999, the W3C worked on HTTP Next Generation (HTTP-NG) as a replacement for HTTP/1.1

never introduced

recently, HTTP/2.0 has been released

- inspired by Google's development of SPDY


Proxies

A web proxy is situated between the client and the server

acts as a server to the client and as a client to the server

can for example be specified in the browser settings; used for

- firewalls and content filters

- transcoding (on the fly transformation of HTTP message body)

- content router (e.g. select optimal server in content distribution networks)

- anonymous browsing, ...

[Figure: client and server communicating over the Internet via a proxy]
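One concrete task of a proxy acting as a client to the server: a browser configured to use a proxy sends the complete URL in its request line, and the proxy forwards the request to the origin server with only the path and a Host header. The sketch assumes plain http URLs; ForwardProxy is a hypothetical name.

```java
import java.net.URI;

class ForwardProxy {
    // Turns the request line a browser sends to the proxy (resource = complete_URL)
    // into the start line and Host header the proxy sends to the origin server.
    // Returns {start line, Host header}.
    static String[] forward(String clientRequestLine) {
        String[] parts = clientRequestLine.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + clientRequestLine);
        }
        URI target = URI.create(parts[1]);
        String path = target.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";  // "http://host" refers to the root resource
        }
        if (target.getRawQuery() != null) {
            path += "?" + target.getRawQuery();
        }
        String host = target.getPort() == -1
                ? target.getHost()
                : target.getHost() + ":" + target.getPort();
        return new String[] {parts[0] + " " + path + " " + parts[2], "Host: " + host};
    }
}
```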


Caches

A proxy cache is a special type of proxy server

can reduce server load if multiple clients share the same cache

often multi-level hierarchies of caches (e.g. continent, country and regional level) with communication between sibling and parent caches as defined by the Internet Cache Protocol (ICP)

passive or active (prefetching) caches

[Figure: Client 1 and Client 2 accessing the server over the Internet through a shared proxy cache]
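The benefit of a shared cache shows up when two clients request the same resource: only the first request reaches the server. In this sketch, the eviction policy (least recently used, via an access-ordered LinkedHashMap) and the origin function standing in for the server are assumptions, and expiration is ignored; ProxyCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ProxyCache {
    private final Map<String, String> entries;
    private final Function<String, String> origin;  // fetches a URL from the origin server
    int hits;
    int misses;

    ProxyCache(int capacity, Function<String, String> origin) {
        this.origin = origin;
        // access-ordered LinkedHashMap that evicts the least recently used entry when full
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Called for every request from any client sharing this cache
    String get(String url) {
        String cached = entries.get(url);
        if (cached != null) {
            hits++;  // served without contacting the server
            return cached;
        }
        misses++;
        String response = origin.apply(url);  // forward the request to the server
        entries.put(url, response);
        return response;
    }
}
```

With a capacity of 2, a request from client 1 for http://www.vub.ac.be/ is a miss, and the same request from client 2 is then a hit.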


Caches ...

Special HTTP cache control header fields

Expires

- expiration date after which the cached resource has to be refetched

Cache-Control: max-age

- maximum age of a document (in seconds) after it has been added to the cache

Cache-Control: no-cache

- response cannot be directly served from the cache (has to be revalidated first)

...

Validators

Last-modified time as validator

- a cache holding a resource that was last modified at time t uses an If-Modified-Since t request to check for updates

Entity tags (ETag)

- changed by the publisher if the content has changed; a cache uses an If-None-Match etag request to check for updates
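The header fields above can be combined into a freshness check and a revalidation request. This is a simplified sketch: the hypothetical CacheEntry class measures age from the time the response was stored (as described above), ignores Expires, and sends a single validator, preferring the entity tag when both are known.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CacheEntry {
    final long storedAt;        // time the response was added to the cache (epoch seconds)
    final long maxAge;          // from "Cache-Control: max-age=..." in seconds, -1 if absent
    final boolean noCache;      // "Cache-Control: no-cache" was present
    final String etag;          // ETag validator, or null
    final String lastModified;  // Last-Modified validator (HTTP date), or null

    CacheEntry(long storedAt, long maxAge, boolean noCache, String etag, String lastModified) {
        this.storedAt = storedAt;
        this.maxAge = maxAge;
        this.noCache = noCache;
        this.etag = etag;
        this.lastModified = lastModified;
    }

    // A fresh entry may be served directly from the cache; no-cache forces revalidation
    boolean isFresh(long now) {
        return !noCache && maxAge >= 0 && now - storedAt < maxAge;
    }

    // Conditional request headers for revalidating a stale entry with the server
    Map<String, String> revalidationHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) {
            headers.put("If-None-Match", etag);
        } else if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }
}
```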


Caches ...

Advantages

reduces latency and used network bandwidth

reduces server load (client and reverse proxy caches)

transparent to client and server

Disadvantages

additional resources (hardware) required

might get stale data out of the cache

creates additional network traffic if we use an active caching approach (prefetching) but achieve a low cache hit rate

server loses control (e.g. access statistics) since not all requests reach the server any more


Tunnels

Implement one protocol on top of another protocol

e.g. HTTP as a carrier for SSL connections

Often used to "open" a firewall to protocols that would otherwise be blocked

e.g. tunneling of SSL connections through an open HTTP port

[Figure: SSL client and SSL server whose SSL traffic is tunneled over HTTP (HTTP[SSL]) across the Internet]
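In practice, a client usually asks an HTTP proxy to open such a tunnel with a CONNECT request; afterwards the proxy just relays bytes without looking at the encrypted SSL data. The sketch shows the request and one direction of the relay; Tunnel is a hypothetical class, and a real tunnel would run two relays concurrently on the socket streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class Tunnel {
    // Request sent to an HTTP proxy to open a tunnel to host:port (CONNECT method)
    static String connectRequest(String host, int port) {
        String authority = host + ":" + port;
        return "CONNECT " + authority + " HTTP/1.1\r\nHost: " + authority + "\r\n\r\n";
    }

    // Once the tunnel is open, bytes are copied unchanged in one direction;
    // the encrypted payload is never inspected. Returns the number of bytes relayed.
    static long relay(InputStream from, OutputStream to) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = from.read(buffer)) != -1) {
            to.write(buffer, 0, n);
            total += n;
        }
        to.flush();
        return total;
    }
}
```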


Gateways

A gateway can act as a kind of "glue" between applications (client) and resources (server)

translate between two protocols (e.g. from HTTP to FTP)

security accelerator (e.g. HTTPS/HTTP on the server side)

often the gateway and destination server are combined in a single application server (HTTP to server application translator)

[Figure: HTTP client connected via HTTP to an HTTP/FTP gateway, which connects via FTP to an FTP server]
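For the HTTP to FTP example, the gateway has to map an HTTP request onto an FTP session and the FTP result back onto an HTTP response. The translation below is hypothetical (anonymous login with a placeholder password, a fixed command sequence, and a coarse mapping of FTP reply codes such as 226 and 550 to HTTP status codes) and only illustrates the idea; HttpFtpGateway is a made-up name.

```java
import java.util.List;

class HttpFtpGateway {
    // Hypothetical translation of an HTTP GET request line into the commands
    // of an anonymous FTP session that retrieves the same file
    static List<String> toFtpCommands(String httpRequestLine) {
        String[] parts = httpRequestLine.split(" ");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("only GET can be translated: " + httpRequestLine);
        }
        return List.of(
                "USER anonymous",
                "PASS guest",        // placeholder password for anonymous access
                "TYPE I",            // binary transfer
                "PASV",              // server listens on a port for the data connection
                "RETR " + parts[1],  // retrieve the requested file
                "QUIT");
    }

    // Maps the FTP reply code of the RETR command back to an HTTP status line
    static String toHttpStatus(int ftpReplyCode) {
        if (ftpReplyCode == 226 || ftpReplyCode == 250) {
            return "HTTP/1.1 200 OK";         // transfer completed
        }
        if (ftpReplyCode == 550) {
            return "HTTP/1.1 404 Not Found";  // file unavailable
        }
        return "HTTP/1.1 502 Bad Gateway";    // any other failure on the FTP side
    }
}
```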


Session Management

HTTP is a stateless protocol

Session (state) tracking solutions

use of IP address

- problem: IP address is often not uniquely assigned to a single user

browser login

- use of special HTTP authenticate headers

- after a login the browser sends the user information in each request

URL rewriting

- add information to the URL in each request

hidden form fields

- similar to URL rewriting but information can also be in body (POST request)

cookies

- the server stores a piece of information on the client which is then sent back to the server with each request
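URL rewriting and hidden form fields can be sketched as two small string helpers. The parameter name sid and the class SessionTracking are hypothetical, the session identifier is assumed to contain no characters that need HTML escaping, and URLs with a fragment (#...) are not handled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SessionTracking {
    // URL rewriting: every link the server generates carries the session identifier
    static String rewriteUrl(String url, String sessionId) {
        String separator = url.contains("?") ? "&" : "?";  // extend an existing query if present
        return url + separator + "sid=" + URLEncoder.encode(sessionId, StandardCharsets.UTF_8);
    }

    // Hidden form field: the identifier is submitted with the form data (e.g. in a POST body)
    static String hiddenField(String sessionId) {
        return "<input type=\"hidden\" name=\"sid\" value=\"" + sessionId + "\" />";
    }
}
```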


Cookies

Introduced by Netscape in June 1994

A cookie is a piece of information that is assigned to a client on their first visit

list of <key,value> pairs

often just a unique identifier

sent via Set-Cookie or Set-Cookie2 HTTP response headers

Browser stores the information in a "cookie database" and sends it back every time the same server is accessed

Potential privacy issues

third-party websites might use persistent cookies for user tracking

Cookies can be disabled in the browser settings
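A browser's cookie handling can be approximated by a small cookie jar that stores the name=value pair from each Set-Cookie header and returns the pairs in the Cookie request header of later requests to the same host. This sketch (CookieJar is a hypothetical name) ignores the path, domain and expires attributes, which real browsers do evaluate.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CookieJar {
    // host -> (cookie name -> value): a stand-in for the browser's "cookie database"
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    // Stores the name=value pair of a Set-Cookie header value such as
    // "lang=nl; path=/; domain=.vub.ac.be; expires=Mon, 18-Sep-2073 17:02:16 GMT".
    // Simplification: the path, domain and expires attributes are ignored.
    void receive(String host, String setCookieValue) {
        String pair = setCookieValue.split(";", 2)[0].trim();  // first element is name=value
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return;  // malformed cookie: ignored
        }
        store.computeIfAbsent(host, h -> new LinkedHashMap<>())
             .put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    // Value of the Cookie header sent back with every request to the same host
    String cookieHeader(String host) {
        StringJoiner joiner = new StringJoiner("; ");
        store.getOrDefault(host, Map.of()).forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }
}
```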


Hypertext Markup Language (HTML)

Dominant markup language for webpages

If you have never heard of HTML, have a look at http://www.w3schools.com/html/

More details in the exercise and in the next lecture

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Beat Signer: Interactive Paper, PaperWorks, Paper++, ...</title>
  </head>
  <body>
    Beat Signer is Associate Professor of Computer Science at the VUB ...
  </body>
</html>


Dynamic Web Content

Often it is not enough to serve static web pages but content should be changed on the client or server side

Server-side processing

Common Gateway Interface (CGI)

Java Servlets

JavaServer Pages (JSP)

PHP: Hypertext Preprocessor (PHP)

...

Client-side processing

JavaScript

Java Applets

Adobe Flash

...


Common Gateway Interface (CGI)

CGI was the first server-side processing solution

transparent to the user

certain requests (e.g. /account.pl) are forwarded via CGI to a program by creating a new process

program processes the request and creates an answer with optional HTTP response headers

[Figure: web server serving HTML pages and forwarding HTTP requests via CGI to a program in Perl, Tcl, C, C++, Java, ...]
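A CGI program receives the request data from the web server, most notably through environment variables such as REQUEST_METHOD and QUERY_STRING, and writes its answer to standard output: header lines, a blank line and the body. The following Java program is a minimal sketch of that contract (CgiHello is a hypothetical name); the query string is echoed without HTML escaping, which would be unsafe in practice.

```java
import java.util.Map;

class CgiHello {
    // Builds the CGI output: header lines, a blank line, then the body.
    // Note: the query string is echoed without HTML escaping (illustration only).
    static String respond(Map<String, String> env) {
        String method = env.getOrDefault("REQUEST_METHOD", "GET");
        String query = env.getOrDefault("QUERY_STRING", "");
        return "Content-Type: text/html\n"
                + "\n"
                + "<html><body>Method: " + method + ", query: " + query + "</body></html>\n";
    }

    // The web server starts a new process per request, so main runs once per request
    public static void main(String[] args) {
        System.out.print(respond(System.getenv()));
    }
}
```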


Common Gateway Interface (CGI) ...

CGI Problems

a new process has to be started for each request

if the CGI program for example acts as a gateway to a database, a new DB connection has to be established for each request, which results in very poor performance

FastCGI solves some of the problems by introducing persistent processes and process pools

CGI/FastCGI is increasingly being replaced by other technologies (e.g. Java Servlets)


Java Servlets

A Java servlet is a Java class that has to extend the abstract HttpServlet class

The Java servlet class is loaded by a servlet container and relevant requests (based on a servlet binding) are forwarded to the servlet instance for further processing

[Figure: web server serving HTML pages and forwarding HTTP requests to servlets running in a servlet container]


Java Servlets ...

Main HttpServlet methods

doGet(HttpServletRequest req, HttpServletResponse resp)
doPost(HttpServletRequest req, HttpServletResponse resp)
init(ServletConfig config)
destroy()

Servlet life cycle

a servlet is initialised once via the init() method

the doGet() and doPost() methods may be executed multiple times (by different HTTP requests)

finally, the servlet container may unload a servlet (after an upcall of the destroy() method)

Servlet container (e.g. Apache Tomcat) either integrated with a web server or as a standalone component
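The life cycle can be mimicked without the servlet API to show which methods the container calls and how often. ToyServlet, ToyContainer and CountingServlet are hypothetical stand-ins, not javax.servlet classes, and this toy container initialises the servlet lazily on the first request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a servlet; the real base class is javax.servlet.http.HttpServlet
interface ToyServlet {
    void init();
    String doGet(String path);
    void destroy();
}

// Hypothetical container driving the life cycle of a single servlet instance
class ToyContainer {
    private final ToyServlet servlet;
    private boolean initialised;

    ToyContainer(ToyServlet servlet) {
        this.servlet = servlet;
    }

    // Every request goes to the same instance; init() runs once, before the first request
    String dispatch(String path) {
        if (!initialised) {
            servlet.init();
            initialised = true;
        }
        return servlet.doGet(path);
    }

    // Unloading the servlet: destroy() is the last upcall the servlet receives
    void unload() {
        if (initialised) {
            servlet.destroy();
            initialised = false;
        }
    }
}

// Records the upcalls it receives, in order
class CountingServlet implements ToyServlet {
    final List<String> calls = new ArrayList<>();

    public void init() {
        calls.add("init");
    }

    public String doGet(String path) {
        calls.add("doGet " + path);
        return "<html>" + path + "</html>";
    }

    public void destroy() {
        calls.add("destroy");
    }
}
```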


Java Servlet Example

In the exercise you will learn how to process parameters etc.

package org.vub.wise;

import java.io.*;
import java.util.Date;
import javax.servlet.http.*;
import javax.servlet.*;

public class HelloWorldServlet extends HttpServlet {
    // called by the servlet container for every HTTP GET request bound to this servlet
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        res.setContentType("text/html");  // sets the Content-Type response header
        PrintWriter out = res.getWriter();
        out.println("<html>");
        out.println("<head><title>Hello World</title></head>");
        out.println("<body>The time is " + new Date().toString() + "</body>");
        out.println("</html>");
        out.close();
    }
}


JavaServer Pages (JSP)

A "drawback" of Java servlets is that the whole page (e.g. HTML) has to be defined within the servlet

not easy to share tasks between web designer and programmer

Add program code through scriptlets and markup to existing HTML pages

These JSP documents are then translated into Java servlets and compiled (e.g. on the fly by Apache Tomcat)

The JSP approach is similar to PHP or Active Server Pages (ASP)

Note that Java Servlets are increasingly becoming an enabling technology (as with JSP)


JavaScript

Interpreted scripting language for client-side processing

JavaScript functionality is often embedded in HTML documents but can also be provided in separate files

JavaScript is often used to

validate data (e.g. in a form)

dynamically add content to a webpage

process events (onLoad, onFocus, etc.)

change parts of the original HTML document

create cookies

...

Note: Java and JavaScript are completely different languages!


JavaScript Example

More details about JavaScript in lecture 6 and in the exercise session

<html>
  <body>
    <script type="text/javascript">
      document.write("<h1>Hello World!</h1>");
    </script>
  </body>
</html>


Java Applets

A Java applet is a program delivered to the client side in the form of Java bytecode

executed in the browser using a Java Virtual Machine (JVM)

an applet has to extend the Applet or JApplet class

runs in the sandbox

Advantages

the user automatically always has the most recent version

high security for untrusted applets

full Java API available

Disadvantages

requires a browser Java plug-in


Java Applets ...

Disadvantages ...

only signed applets can get more advanced functionality

- e.g. network connections to other machines than the source machine

More recently, Java Web Start (JavaWS) has been replacing Java Applets

program no longer runs within the browser

- less problematic security restrictions

- fewer browser compatibility issues

Math and Physics Applet Examples

http://www.falstad.com/mathphysics.html


Exercise 2

Hands-on experience with the HTTP protocol


References

David Gourley et al., HTTP: The Definitive Guide, O'Reilly Media, September 2002

R. Fielding et al., RFC 2616 - Hypertext Transfer Protocol - HTTP/1.1, http://www.faqs.org/rfcs/rfc2616.html

N. Freed et al., RFC 6838 - Media Type Specifications and Registration Procedures, http://www.faqs.org/rfcs/rfc6838.html

HTML and JavaScript Tutorials, http://www.w3schools.com


References ...

M. Knutson, HTTP: The Hypertext Transfer Protocol (Refcardz #172), http://refcardz.dzone.com/refcardz/http-hypertext-transfer-0

W. Jason Gilmore, PHP 5.4 (Refcardz #23), http://refcardz.dzone.com/refcardz/php-54-scalable

Java Servlet Tutorial, http://www.tutorialspoint.com/servlets/


Next Lecture

HTML5 and the Open Web Platform