WEB
Network Application Frameworks 8.2.2011 Jukka K. Nurminen
Last Time
• Web Services: let's make machine-callable services using web principles
  – SOAP
  – WSDL
  – UDDI
  – Web Services Stack & a set of WS-* standards
• REST
  – Everything is a URI
  – Actions through verbs (GET, POST, PUT, DELETE, …)
  – Relies on HTTP and web technologies
Qt lecture
• QML for mobile applications for Nokia platforms
• This week Friday Qt 2 lecture at 12-14
• Lectures on Android and iPhone development later in the Spring
Today
• HTTP protocol
• Web Servers
Three building blocks of web
• Markup language for formatting hypertext documents (HTML)
• Uniform notation for addressing accessible resources (URI)
• Protocol for transporting messages (HTTP)
HTML example
[Figure: annotated HTML page source. Callouts: headers, content, formatting, links to other pages, other content included, JavaScript, style definitions (CSS)]
CSS – Cascading Style Sheets
• Fonts, colors, positions, etc.
• Separates content from style
• Contributes to dynamic behavior
[Figure: CSS + HTML -> displayed page]
URL
• http://www.mywebsite.com/sj/test;id=8079?name=bob&x=true#label
• scheme://host[:port]/path/…/[;url-params][?query-string][#anchor]
  – Scheme: protocol (http, ftp, …)
  – Host: address of the web server (DNS name or numeric IP address)
  – Port: (e.g. 80 http, 8080 http alternative, 443 https)
  – Path: document location relative to server root
  – url-params: e.g. for session id
  – query-string: dynamic params as name-value pairs
  – anchor: “bookmark” within the requested page
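The components listed above can be pulled apart with Python's standard `urllib.parse` module. This is only a minimal sketch using the slide's example URL (written without a space before the query string); the dictionary keys are illustrative names, not part of any API.

```python
from urllib.parse import urlparse, parse_qs

# Example URL from the slide.
url = "http://www.mywebsite.com/sj/test;id=8079?name=bob&x=true#label"

parts = urlparse(url)

# urlparse splits ";url-params" off the last path segment into .params
components = {
    "scheme": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,          # None here: no explicit port in the URL
    "path": parts.path,
    "url_params": parts.params,
    "query": parse_qs(parts.query),
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value!r}")
```

When no port is given, the client falls back to the default port of the scheme (80 for http).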
URL vs. URN vs. URI
• Uniform Resource Locator (URL)
• Uniform Resource Name (URN)
• Uniform Resource Identifier (URI)
• URL speaks about location of the resource
  – What if the resource changes its location?
• Location independent resource name => URN
  – This would be nice but they have not materialized so far
• URI = union of URL and URN
HTTP – HyperText Transfer Protocol
• Current version HTTP/1.1
  – Older clients, proxies, or servers can still use HTTP/1.0
  – => problems with backward compatibility
• Text-based protocol
• Request-response paradigm
• Stateless protocol
Request-response paradigm
[Figure: Browser <-> Proxy <-> Web server. A request and its response are exchanged on each hop]
HTTP is a stateless protocol
• Different from stateful protocols
• In stateful protocols the server manages the “state” of the interaction, often in the form of sessions
  – E.g. FTP, SMTP, POP
  – E.g. user logs in => authorized state; user then requests a file (no need to recheck the authentication)
• Problems of stateful protocols
  – What if the client dies? When to time out and release the session?
  – Difficult to move a session to another computer
    • Load balancing?
Stateless protocol
• Pros
  – No state maintenance
  – Simple for the server
  – Easier load balancing
• Cons
  – Often some kind of session is needed in modern web applications
  – E.g. user needs to be authenticated,
  – or the shopping cart of the user needs to be maintained
⇒ State maintenance outside of the protocol
⇒ Cookie based approaches in web
HTTP Request
GET /sj/index.html HTTP/1.1
Host: www.mywebsite.com
-----
METHOD /path-to-resource HTTP/version-number
Header-Name-1: value
Header-Name-2: value

[optional request body]
• Methods: GET, HEAD, PUT, DELETE, POST, TRACE, CONNECT (cf. REST)
• Headers: e.g. Host, User-Agent
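Because HTTP is text based, the request layout above can be taken apart with plain string operations. The following is a minimal sketch, not a complete parser: `parse_request` is a hypothetical helper that assumes CRLF line endings and ignores details such as repeated headers.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    # Headers end at the first blank line; everything after it is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw_request = (
    "GET /sj/index.html HTTP/1.1\r\n"
    "Host: www.mywebsite.com\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw_request)
print(method, target, version, headers, repr(body))
```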
HTTP Reply
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 9934
…
<HTML> <HEAD>…
--------
HTTP/version-number status-code explanation
Header-Name-1: value
Header-Name-2: value

[response body]
• Status codes: 200 OK, 100 Continue, 301 Moved Permanently, 302 Found (moved temporarily), 400 Bad Request, 401 Unauthorized, 404 Not Found, etc.
• Headers: Content-Type, Content-Length, Date, Server, Last-Modified, etc.
GET vs. POST
GET /q?s=YHOO HTTP/1.1
Host: finance.yahoo.com
No body
------
POST /q HTTP/1.1
Host: finance.yahoo.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 6

s=YHOO
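The difference between the two requests is only where the parameters travel: in the query string for GET, in the body for POST. A small sketch that builds both messages as text; `build_get` and `build_post` are hypothetical helper names, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_get(host, path, params):
    # Parameters travel in the query string; there is no body.
    return (f"GET {path}?{urlencode(params)} HTTP/1.1\r\n"
            f"Host: {host}\r\n\r\n")


def build_post(host, path, params):
    body = urlencode(params)
    # Content-Length counts the bytes of the encoded body.
    return (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body.encode('ascii'))}\r\n\r\n"
            f"{body}")


get_request = build_get("finance.yahoo.com", "/q", {"s": "YHOO"})
post_request = build_post("finance.yahoo.com", "/q", {"s": "YHOO"})
print(get_request)
print(post_request)
```

The form-encoded body `s=YHOO` is six bytes long, which is where the Content-Length value comes from.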
HTTP Status codes 1/2
1xx Informational: This category represents provisional responses sent before the final response.
  100 Continue – This code indicates that the server has received the first part of the request and that the client should go on with the rest of the request.
2xx Success: This category represents responses in which the server has successfully processed a client’s request.
  200 OK – This code is the standard response that a request was successful.
3xx Redirection: This category represents responses where the client is required to take further action in order to finish the request.
  300 Multiple Choices – This code informs the client that there are multiple choices available for the requested page.
  301 Moved Permanently – This code indicates that the requested page has permanently moved to a new location and all future requests should be made to the given URI.
  302 Found – As one of the most often used codes, this indicates that the requested page has been temporarily moved to another location at the URI provided.
  304 Not Modified – This code answers a conditional request (e.g. one carrying If-Modified-Since). It indicates that the page has not been modified since the client's cached copy was fetched, so the server sends no body and the client uses the page from its cache, saving bandwidth.
HTTP Status codes 2/2
• 400 Bad Request: The server could not understand the request because of malformed syntax, e.g. a syntax error in the URL (site address)
• 401 Unauthorized: The server expects authentication credentials (e.g. a user id and password) from the browser
• 403 Forbidden: The document you are requesting is "forbidden", meaning that you do not have read privileges or are not allowed to have the page sent to you
• 404 Not Found: The most common error. The document you have requested no longer exists or the URL (site address) is incorrect
• 500 Internal Server Error: The server was unable to send the document to you due to an internal (server software) error
• 501 Not Implemented: The server does not support the functionality needed to fulfil the request, e.g. an unsupported request method. This is not a very common error
• 503 Service Unavailable: The server cannot process the request at the moment, e.g. due to a high load. The solution is to try later, when server traffic is lower
• 504 Gateway Timeout: A gateway or proxy did not receive a timely response from the upstream server
• Failed DNS lookup (not an HTTP status code, reported by the browser itself): The IP address of the web site could not be resolved by the ISP's domain name server
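The registered reason phrases can be checked against Python's standard `http.HTTPStatus` enumeration, and the first digit of a code gives its class. A short sketch; the `CLASSES` table and `describe` function are illustrative names introduced here.

```python
from http import HTTPStatus

# First digit of the status code -> response class.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def describe(code: int) -> str:
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (100, 200, 301, 302, 304, 400, 401, 403, 404, 500, 501, 503, 504):
    print(describe(code))
```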
Redirection
GET /~shklar HTTP/1.1
Host: www.cs.rutgers.edu
------
HTTP/1.1 301 Moved Permanently
Location: http://www.cs.rutgers.edu/~shklar/
<HTML>…
• 301 Moved Permanently
  – Proxies can record this and redirect to the right location
• 302 Moved Temporarily (named "Found" in HTTP/1.1)
  – E.g. for load balancing use
Caching
• Server-side caching
  – Saves processing and data access at server
• Browser-side caching
  – Avoids fetching data
• Proxy-side caching
  – Data already available at proxy
[Figure: Browser <-> Proxy <-> Web server, with a request and a response on each hop]
Cache-Control:
• public
• private
• no-cache
Caching with HEAD method
• HEAD method
  – Returns headers but no content
  – Last-Modified: Tue, 29 Oct 2002 04:22:52 GMT
• Allows comparison to cached content and avoids unnecessary data traffic
• Problem: reduces the amount of data transferred but can increase the number of requests
  – If the cached copy is outdated, both HEAD and GET requests are required
Caching with If-modified-since
GET /~shklar HTTP/1.1
Host: www.cs.rutgers.edu
If-Modified-Since: Sun, 27 Apr 2008 22:28:00 GMT
• Returns either 304 Not Modified with no body, or the whole content as usual
• No additional requests
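On the server side the decision between 304 and a full reply is a date comparison. The sketch below uses the standard `email.utils` date helpers; `conditional_get` is a hypothetical function, and a real server would also handle other validators such as entity tags.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET that may carry If-Modified-Since."""
    # HTTP dates have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # no body; the client uses its cached copy
    return 200, headers  # full response with body


modified = datetime(2008, 4, 27, 22, 28, 0, tzinfo=timezone.utc)
fresh = conditional_get(modified, "Sun, 27 Apr 2008 22:28:00 GMT")
stale = conditional_get(modified, "Sat, 26 Apr 2008 10:00:00 GMT")
print(fresh, stale)
```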
HTTP authentication
GET /book/chapter3/index.html HTTP/1.1
--------
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Chapter3"
-------
GET /book/chapter3/index.html HTTP/1.1
Authorization: Basic encoded-userid:password
• Realm: the branch of the tree where the same authentication is valid
• encoded-userid:password is the string "userid:password" in Base64, which is an encoding and not encryption
• Password sent unencrypted!!! Safe only when the connection is secure, e.g. https
• Most applications use their own authentication mechanisms
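A short sketch of why Basic authentication is only safe over a secure connection: the credential is Base64-encoded and anyone who sees the header can decode it. The user id and password are the well-known example pair from the HTTP authentication RFC, and the two function names are illustrative.

```python
import base64


def basic_auth_header(userid: str, password: str) -> str:
    token = base64.b64encode(f"{userid}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str):
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    # Decoding needs no key at all: Base64 is reversible by anyone.
    userid, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return userid, password


header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)
print(decode_basic_auth(header))
```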
Session support with cookies
• HTTP is stateless
• => solution: Cookies
• Name-value pairs maintained in the browser

HTTP/1.1 200 OK
Set-Cookie: CUSTOMER="Rich"; Path="/movies"; Version="1"
-------
GET /movies/access HTTP/1.1
Cookie: $Version="1"; CUSTOMER="Rich"; $Path="/movies"
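The same exchange can be sketched with Python's standard `http.cookies` module. This assumes the simpler present-day cookie syntax, so the output omits the Version attribute shown on the slide; the variable names are illustrative.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the response.
outgoing = SimpleCookie()
outgoing["CUSTOMER"] = "Rich"
outgoing["CUSTOMER"]["path"] = "/movies"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Server side, next request: parse the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("CUSTOMER=Rich")
customer = incoming["CUSTOMER"].value
print(customer)
```

The browser stores the pair and returns it on later requests whose path falls under /movies, which gives the server a handle for looking up session state.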
Virtual hosting
GET http://finance.yahoo.com/q?s=YHOO HTTP/1.1
Host: finance.yahoo.com
• A single host (IP address) can virtually host multiple web sites with different names
• DNS maps two or more names to the same IP address. How can the web server differentiate between the sites? By the Host header
• Proxies and backward compatibility require seemingly redundant information (finance.yahoo.com is needed three times: in the address used to open the connection, in the GET request line, and in the Host header)
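A minimal sketch of name-based virtual hosting: the server picks a document root from the Host header. The site table, the directory names, and `select_document_root` are all hypothetical, and the port handling ignores IPv6 literal addresses.

```python
# Hypothetical site table: host name -> document root on the shared server.
SITES = {
    "finance.yahoo.com": "/var/www/finance",
    "news.yahoo.com": "/var/www/news",
}
DEFAULT_ROOT = "/var/www/default"


def select_document_root(headers: dict) -> str:
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix; host names are case-insensitive.
    name = host.split(":", 1)[0].strip().lower()
    return SITES.get(name, DEFAULT_ROOT)


print(select_document_root({"Host": "finance.yahoo.com"}))
print(select_document_root({"Host": "News.Yahoo.com:8080"}))
print(select_document_root({}))
```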
Persistent connections
• Early web browsers created a new connection for each request-reply pair
  – => lots of TCP setups and tear-downs
  – => load for servers & clients, latencies
• In HTTP/1.1 connections persist unless explicitly closed via the Connection: close header, sent e.g. when the system knows that the data transfer has finished
  – => need to maintain request and reply queues (also if connections are dropped and reestablished)
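The rule for keeping a connection open can be written as a small decision function. This is a sketch with a hypothetical name, `keep_alive`; it also covers the HTTP/1.0 case, where connections close by default and "keep-alive" was an opt-in extension.

```python
def keep_alive(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this exchange."""
    tokens = {t.strip().lower() for t in headers.get("Connection", "").split(",")}
    if "close" in tokens:
        return False
    if version == "HTTP/1.1":
        return True  # persistent by default
    # HTTP/1.0 closes by default unless the peer asked for keep-alive.
    return "keep-alive" in tokens


print(keep_alive("HTTP/1.1", {}))
print(keep_alive("HTTP/1.1", {"Connection": "close"}))
print(keep_alive("HTTP/1.0", {"Connection": "Keep-Alive"}))
```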
Web Server
Server-side of web applications
• Operating system (often Linux)
• Web server (e.g. Apache) to respond to requests
• Mechanisms to store data, typically a database (e.g. MySQL)
• A way to state how the application reacts to different requests
  – Code, templates, layout and visuals
  – A content management system (e.g. Drupal, Joomla)
LAMP Stack
• The acronym LAMP refers to a solution stack of software, usually free software / open-source software, used to run dynamic Web sites or servers. The original expansion is as follows:
  – Linux, referring to the operating system;
  – Apache, the Web server;
  – MySQL, the database management system;
  – PHP (or Perl or Python), the programming language.
• Or Java, or…
  – Content management system
Source: Netcraft, 2009
Server Operation
Networking Support
Address resolution
Request processing
Response generation
Address resolution
• Virtual hosting
• Address mapping
• Authentication
Request Processing
• Static
  – Static content
  – As-is pages
• Dynamic
  – CGI
  – SSI
  – Template approaches
  – Servlets
Static vs. Dynamic pages
• Static page
  – The content is ready and the server simply returns it
  – Early years of web
  – Some content today
• Dynamic content
  – Content is generated on the fly to match the user, context, time, etc.
  – Different levels of dynamism
  – Unit of update is one page
  – Different mechanisms: CGI, SSI, Native APIs, Servlets, JSP
• AJAX
  – Unit of update is part of the page
  – Asynchronous JavaScript and XML
Delivery of static content
• Static content page
  – Returns the file found in the server directory hierarchy
  – Server generates HTTP responses (including headers)
  – Normal mechanism
• As-is pages
  – Complete HTTP responses (including headers)
  – For testing and special cases
GET http://mysite.org/pages/simple-page.html HTTP/1.1 Host: mysite.org
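Serving a static page boils down to mapping the URL path onto the server directory hierarchy and picking a Content-Type. The sketch below is an illustration only: `resolve_static` is a hypothetical helper, and the temporary directory stands in for a document root.

```python
import mimetypes
import os
import tempfile


def resolve_static(doc_root: str, url_path: str):
    """Map a URL path to (status, file_path, content_type) under doc_root."""
    root = os.path.realpath(doc_root)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # Refuse paths that escape the document root (e.g. via "..").
    if os.path.commonpath([root, candidate]) != root:
        return 403, None, None
    if not os.path.isfile(candidate):
        return 404, None, None
    content_type = mimetypes.guess_type(candidate)[0] or "application/octet-stream"
    return 200, candidate, content_type


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pages"))
    with open(os.path.join(root, "pages", "simple-page.html"), "w") as f:
        f.write("<html><body>Hello</body></html>")
    found = resolve_static(root, "/pages/simple-page.html")
    missing = resolve_static(root, "/pages/nope.html")
    escaped = resolve_static(root, "/../../etc/passwd")
    print(found[0], found[2], missing[0], escaped[0])
```

The server would then read the file and generate the status line and headers itself, which is the part an as-is page already contains.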
Delivery of dynamic content
• Multiple mechanisms in use
  – Scripting based
    • CGI
    • Native APIs
      – ISAPI (Microsoft’s IIS), Apache Server API
    • Servlets
    • Java Server Pages (JSP)
  – Template based
    • SSI – Server Side Includes
    • Template processing
      – PHP, Cold Fusion (Adobe), ASP (Microsoft)
  – In practice the division into scripting and template languages is not very clear
Challenges
• Platform independence (Unix, Windows, etc.)
  – Different ways to start processes and threads
  – Different ways to access environment variables
  – Different ways to exit programs
• Server independence (Apache, IIS, etc.)
  – Vendors encourage lock-in
    • Proprietary extensions
    • Can also influence client side
• Language independence (?)
• Imperative vs. declarative styles
  – Code execution vs. template filling
• Separating the “code” from the “looks” and data
CGI – Common Gateway Interface
• The first consistent server-independent mechanism
• Still in use today
• Originated in the UNIX environment (visible in the way it works) but is available in other environments
• Based on a fixed set of environment variables that server applications can access, e.g.
  – REQUEST_METHOD (HTTP method, e.g. GET, PUT)
  – PATH_TRANSLATED (path within server directory)
  – SCRIPT_NAME (name of the cgi script)
  – QUERY_STRING (information following “?” in the URL)
• http://mysite.org/cgi-bin/zip.cgi?user=“Joe”
CGI execution steps
http://mysite.org/cgi-bin/zip.cgi?user=“Joe”
The web server:
1. Recognizes that this is a CGI script from the directory name or file extension (configured for the web server)
2. Transforms zip.cgi to a path on the server computer, checks that the program exists and has execution privileges
3. Sets the environment variable values
4. Spawns a new process to run zip.cgi
   a. Request body -> standard input
   b. Standard output -> to the server to process and send back to the browser
5. At process termination, adds the status code and header fields and sends the reply to the browser
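Seen from the script's side, the steps above mean: read the environment variables and standard input, write header fields and content to standard output. A minimal sketch follows; `run_cgi` and the simulated environment are illustrative, and the user parameter is written without quotes.

```python
import io
from urllib.parse import parse_qs


def run_cgi(environ, stdin, stdout):
    """Body of a CGI program: request in via environment and stdin, reply out via stdout."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    user = query.get("user", ["unknown"])[0]
    # For POST the server passes the request body on standard input.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if method == "POST" else ""
    # Header fields, a blank line, then the content. The server adds the status line.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello {user}, method={method}, body={body!r}\n")


# Simulate what the server sets up in steps 3 and 4 for zip.cgi?user=Joe
fake_env = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "/cgi-bin/zip.cgi",
    "QUERY_STRING": "user=Joe",
}
out = io.StringIO()
run_cgi(fake_env, io.StringIO(""), out)
print(out.getvalue())
```

In an actual CGI script the call would be `run_cgi(os.environ, sys.stdin, sys.stdout)`, since the server hands over exactly those three channels when it spawns the process.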
CGI Pros and cons
• Pros
  – Language independent
    • Although Perl is frequently used
    • Interpreted languages in general are popular in web service implementation
  – Simple
• Cons
  – Each request-reply pair requires the creation (and termination) of a new process
    • Takes a lot of resources from the server
Native APIs
• Apache Server API
• ISAPI (for Microsoft’s IIS)
• Target the efficiency concerns
• Make code reuse and portability difficult
  – Partly a business objective to create stickiness
Fast CGI
• Similar to CGI but addresses the efficiency problem
• Instead of spawning a new process for each request, CGI scripts stay alive after a request has been satisfied
  – Eliminates the overhead of spawning and initializing new processes
  – Tries to hide the complexity of managing the long-lived scripts from the developers but is not able to do so completely
Servlet API
• Java technology for server code execution
• When a servlet container is available, code is portable between different platforms (server, OS, HW)

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Inherits servlet functionality
    public class HelloWorldExample extends HttpServlet {
        // Called to handle GET HTTP request
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException {
            // Regular Java code
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html>");
            out.println("<head>");
            // rb: presumably a java.util.ResourceBundle defined outside this excerpt
            String title = rb.getString("helloworld.title");
            out.println("<title>" + title + "</title>");
            out.println("</head>");
            out.println("<body bgcolor=\"white\">");
            …
Servlet API Pros and Cons
• Pros
  – Programming with regular Java
  – Code compiled with the normal Java mechanisms to portable byte code
    • Reasonably efficient
  – Able to handle multiple requests concurrently
  – Supports forwarding requests to other servers and servlets
• Cons
  – HTML (or XML) markup mixed with code
    • Code looks complicated
    • Separate maintenance of look and operation impossible
    • => JSP (Java Server Pages) an attempt to fix this
SSI – Server Side Includes
• Partially populated HTML pages (templates) filled by CGI scripts or other means
<!--#element attribute=value attribute=value ... -->
<HTML>
<BODY>
<!--#exec cgi="/cgi-bin/hits.pl" -->
<!--#exec cmd="ls -al" -->
<!--#config timefmt="%A %B %d, %Y" -->
Today is <!--#echo var="DATE_LOCAL" -->
</BODY>
</HTML>
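To illustrate what "the server parses the page" means, here is a toy SSI processor. It is a sketch under strong assumptions: it understands only the config timefmt and echo DATE_LOCAL directives, leaves everything else untouched, and `render_ssi` is a made-up name rather than any server's API.

```python
import re
from datetime import datetime

# Matches <!--#element attr="value" ... --> and captures element and attribute text.
DIRECTIVE = re.compile(r'<!--#(\w+)\s+(.*?)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def render_ssi(page: str, now: datetime) -> str:
    state = {"timefmt": "%A %B %d, %Y"}

    def expand(match):
        element = match.group(1)
        attrs = dict(ATTRIBUTE.findall(match.group(2)))
        if element == "config" and "timefmt" in attrs:
            state["timefmt"] = attrs["timefmt"]  # affects later echo directives
            return ""
        if element == "echo" and attrs.get("var") == "DATE_LOCAL":
            return now.strftime(state["timefmt"])
        return match.group(0)  # unsupported directives are left untouched

    return DIRECTIVE.sub(expand, page)


page = '<!--#config timefmt="%Y-%m-%d" -->Today is <!--#echo var="DATE_LOCAL" -->'
rendered = render_ssi(page, datetime(2011, 2, 8))
print(rendered)
```

Directives are expanded in document order, so a config directive changes the format used by every echo that follows it.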
SSI Pros and Cons
• Pros
  – Easy to add small pieces of functionality to pages
  – Possible to use CGI scripts provided by others
• Cons
  – The server needs to parse the SSI page
    • Performance
  – Security issues
    • Like with almost any dynamic server side technology
  – Hard to create complex functionality
Template processing languages
• Extends the SSI approach with more complicated scripting in the template pages
• PHP (open source), Cold Fusion (Adobe), Active Server Pages (ASP, Microsoft)
• Typically support
  – Database queries
  – Iterative processing (for-each)
  – Conditional processing (if-then)
Java Server Pages
• HTML (or XML) markup with embedded Java servlet code
• Translation and compilation
  – JSP markup page -> servlets -> Java byte code
  – Fast and efficient processing of requests
Summary
• Three key elements of web
  – HTML page description language
  – Uniform naming scheme
  – HTTP protocol
• Web Servers
  – Networking support
  – Address resolution
  – Request processing
    • Static content
    • Dynamic content
  – Response generation