HTTP, Naming and Lookup
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
April 21, 2023
2
Readings and Reminders
Readings on DNS (Wikipedia) and LDAP (Marshall’s overview) – see course schedule for links
Homework 1 Milestone 1 due Feb. 3rd
3
HTTP: HyperText Transfer Protocol
A very simple, stateless protocol for sessionless exchanges
  Browser creates a new connection each time it wants to make a new request (for a page, image, etc.)
  What are the benefits of this model? Drawbacks?
Exceptions:
  HTTP 1.1 added support for persistent connections and pipelining
  Clients + servers might keep state information
  Cookies provide a way of recording state
4
HTTP Overview
Requests:
  A small number of request types (GET, POST, PUT, DELETE)
  Request may contain additional information, e.g. client info, parameters for forms, etc.
Responses:
  Response codes: 200 (OK), 404 (not found), etc.
  Metadata: content’s MIME type, length, etc.
  The “payload” or data
5
A Simple HTTP Request
GET /~cis455/index.html HTTP/1.1
If-Modified-Since: Sun, 07 Jan 2007 11:12:23 GMT
Referer: http://www.cis.upenn.edu/index.html

Requests data at a path using the HTTP 1.1 protocol

Example response:
HTTP/1.1 200 OK
Date: Sun, 07 Jan 2007 11:12:26 GMT
Last-Modified: Wed, 14 Jan 2004 08:30:00 GMT
Content-Type: text/html
Content-Length: 3931
…
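A minimal sketch of how a client might split a response like the example into its status line and headers. The function name `parse_response_head` is an assumption made for illustration, not part of any library, and the sketch ignores details such as repeated or folded headers.

```python
def parse_response_head(raw: str):
    """Split a raw HTTP response head into (version, status, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]           # drop any payload after the blank line
    lines = head.split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(status), reason, headers


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sun, 07 Jan 2007 11:12:26 GMT\r\n"
    "Last-Modified: Wed, 14 Jan 2004 08:30:00 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 3931\r\n"
    "\r\n"
)
version, status, reason, headers = parse_response_head(raw)
print(status, reason, headers["content-type"], headers["content-length"])
```

The Content-Length header tells the client how many payload bytes to read after the blank line, which matters once a connection is kept open for more than one request.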
6
Request Types
GET: Retrieve the resource at a URL
POST: Submit form content
PUT: Publish the specified data at a URL
DELETE: (Self-explanatory)
7
Forms: Returning Data to the Server
HTML forms allow assignments of values to variables
Two means of submitting forms to apps:
  GET-style – within the URL:
    GET /home/my.cgi?param=val&param2=val2
  POST-style – as the data:
    POST /home/second.cgi
    Content-Length: 35

    searchKey=Penn&where=www.google.com
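As a rough illustration of the two submission styles, the sketch below encodes the same form fields once into a GET query string and once into a POST body, using Python's standard `urllib.parse`. The field names come from the slide; the `Content-Type` header and the `HTTP/1.1` suffix are assumptions added to make the requests complete.

```python
from urllib.parse import urlencode, parse_qs

fields = {"searchKey": "Penn", "where": "www.google.com"}
body = urlencode(fields)                 # name=value pairs joined by "&"

# GET-style: the encoded fields travel in the URL after "?".
get_request_line = f"GET /home/my.cgi?{body} HTTP/1.1"

# POST-style: the same encoded fields travel as the request payload.
post_request = (
    "POST /home/second.cgi HTTP/1.1\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body.encode('ascii'))}\r\n"
    "\r\n"
    f"{body}"
)

# The server decodes either form the same way.
decoded = parse_qs(body)
print(get_request_line)
print(post_request)
print(decoded)
```

Computing Content-Length from the encoded body, rather than typing it by hand, keeps the header and the payload consistent.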
8
Authentication and Authorization
Authentication
  At minimum, user ID and password – authenticates requestor
  Client may wish to authenticate the server, too!
    SSL (we’ll discuss this more later)
    Part of SSL: certificate from trusted server, validating machine
    Also: public key for encrypting client’s transmissions
Authorization
  Determine what user can access
  For files, applications: typically, access control list
  If data from database, may also have view-based security
9
Programming Support in Web Servers
CGI – Common Gateway Interface – the oldest:
  A CGI is a separate program, often in Perl, invoked by the server
  Certain info is passed from server to CGI via Unix-style environment variables
    QUERY_STRING, REMOTE_HOST, CONTENT_TYPE, …
  HTTP post data is read from stdin
Interface to persistent process:
  In essence, how communication with a database is done – Oracle or MySQL is running “on the side”
  Communicate via pipes, APIs like ODBC/JDBC, etc.
Server module running in the same process
  Might be custom code (e.g., Apache extension) or an interpreter/runtime system…
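A minimal sketch of the CGI contract described above: the program reads request metadata from environment variables and POST data from standard input, then writes headers and a body to standard output. The function `run_cgi` and the `name` parameter are hypothetical; `REQUEST_METHOD` and `CONTENT_LENGTH` are standard CGI variables alongside the ones listed on the slide. The environment and streams are passed in as arguments so the logic can be exercised without a web server.

```python
import io
from urllib.parse import parse_qs


def run_cgi(environ, stdin, stdout):
    """Hypothetical CGI program body: read inputs as a CGI would, write a response."""
    params = parse_qs(environ.get("QUERY_STRING", ""))        # GET-style parameters
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        params.update(parse_qs(stdin.read(length)))           # POST data arrives on stdin
    name = params.get("name", ["world"])[0]
    stdout.write("Content-Type: text/plain\r\n\r\n")          # headers, then a blank line
    stdout.write(f"Hello, {name}!\n")


# Simulated GET request.
get_out = io.StringIO()
run_cgi({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Penn"}, io.StringIO(""), get_out)

# Simulated POST request with a 9-byte body.
post_out = io.StringIO()
run_cgi(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "9", "QUERY_STRING": ""},
    io.StringIO("name=Zack"),
    post_out,
)
print(get_out.getvalue())
print(post_out.getvalue())
```

In a real CGI deployment the server would start this program as a separate process per request and the arguments would be `os.environ`, `sys.stdin`, and `sys.stdout`.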
10
Server Modules
Interpreters:
  JavaScript/JScript, PHP, ASP, …
  Often a full-fledged programming language
  Code is generally embedded within HTML, not stand-alone
Custom runtimes/virtual machines:
  Most modern Perl runtimes; Java servlets; ASP.NET
  A virtual machine runs within the web server process
  Functions are invoked within that VM to handle each request
  Code is generally written as usual, but may need to use HTML to create UI rather than standard GUI APIs
  Most of these provide (at least limited) protection mechanisms
11
Servlets
An interesting model for programming applications in Java
A servlet is a subclass of HttpServlet
  It overrides methods doGet() or doPost()
  It’s given a number of objects: HttpServletRequest (includes info about parameters, browser, etc.), HttpServletResponse (a means for sending info back to the browser, including data, forwarding requests, etc.)
  There’s a notion of a session that can be used to share state across doGet()/doPost() invocations – it’s generally connected with a cookie
Those of you who took CSE 330/CIS 550 should be generally familiar with servlets
Those who didn’t should be able to catch up by looking at, e.g.,
  http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/
  http://www.novocode.com/doc/servlet-essentials/
Your homework assignment will be to build a simple servlet engine a la Tomcat
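To make the servlet model concrete, here is a toy Python analogue of the dispatch a servlet container performs. It is not the Java Servlet API: the classes `Request`, `Response`, `Servlet`, `HelloServlet`, and `Engine` are all hypothetical stand-ins for HttpServletRequest, HttpServletResponse, HttpServlet, an application servlet, and the container.

```python
class Request:
    def __init__(self, method, path, params=None):
        self.method, self.path, self.params = method, path, params or {}


class Response:
    def __init__(self):
        self.status, self.body = 200, []

    def write(self, text):
        self.body.append(text)


class Servlet:
    """Base class: subclasses override do_get and/or do_post."""

    def do_get(self, request, response):
        response.status = 405        # method not allowed unless overridden

    def do_post(self, request, response):
        response.status = 405


class HelloServlet(Servlet):
    def do_get(self, request, response):
        response.write(f"Hello, {request.params.get('name', 'world')}")


class Engine:
    """Toy container: maps URL paths to servlet instances and dispatches by method."""

    def __init__(self):
        self.routes = {}

    def register(self, path, servlet):
        self.routes[path] = servlet

    def handle(self, request):
        response = Response()
        servlet = self.routes.get(request.path)
        if servlet is None:
            response.status = 404
        elif request.method == "GET":
            servlet.do_get(request, response)
        elif request.method == "POST":
            servlet.do_post(request, response)
        else:
            response.status = 405
        return response


engine = Engine()
engine.register("/hello", HelloServlet())
ok = engine.handle(Request("GET", "/hello", {"name": "Penn"}))
print(ok.status, "".join(ok.body))
```

The point of the design is that application code only sees request and response objects; parsing the HTTP message, choosing the servlet, and writing the reply back to the socket stay inside the engine.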
12
(Cross-)Session State: Cookies
Major problem with sessionless nature of HTTP: how do we keep info between connections?
Cookie: an opaque string associated with a web site, stored at the browser
  Created in an HTTP response with “Set-Cookie: xxx”
  Passed back in the HTTP request header as “Cookie: xxx”
  Interpretation is up to the application
  Usually, name-value pairs; passed in HTTP header:
    Cookie: user=“Joe”; pwd=“blob” …
  Often have an expiration
  Very common: “session cookies”
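A short sketch of both halves of the cookie exchange using Python's standard `http.cookies` module. The cookie name `session`, its value `abc123`, and the one-hour lifetime are made-up values for illustration.

```python
from http.cookies import SimpleCookie

# Server side, first response: build a Set-Cookie header.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"            # hypothetical opaque session ID
outgoing["session"]["path"] = "/"
outgoing["session"]["max-age"] = 3600     # expire after one hour
set_cookie_header = outgoing.output()     # a complete "Set-Cookie: ..." header line

# Server side, later request: the browser sends the stored pairs back.
incoming = SimpleCookie()
incoming.load("session=abc123; user=Joe")
session_id = incoming["session"].value
user = incoming["user"].value

print(set_cookie_header)
print(session_id, user)
```

Because the value is opaque to the browser, a server would typically store only an identifier in the cookie and keep the associated state on its own side.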
13
How Do We Find Things on the Internet?
Generally, using one of three means:
  Addresses or locations: specify where something is, assuming that we understand how to navigate
    Just like a physical address, we may still need a map!
    In the Internet, addresses are typically IP addresses – the routers know the map
  Names: are mapped into addresses via lookup services
    Best-known example on the Internet: DNS name
    Cell phone numbers, email addresses, etc. are becoming names
  Content-based addressing/naming
    The actual data value is used to look up its location
    The basis of certain kinds of indices, publish-subscribe systems, and peer-to-peer architectures
14
Pushing the Search to the Network: Flooding Requests – Gnutella
Node A wants a data item; it asks B and C
If B and C don’t have it, they ask their neighbors, etc.
What are the implications of this model?
[Figure: peer-to-peer network of nodes A through I]
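One way to explore the implications is to simulate the flood. The sketch below is a simplified model, not the Gnutella wire protocol: the query spreads breadth-first with a hop limit (TTL), nodes drop duplicate queries, and the function counts every message sent. The edges between nodes A through I and the item name `song.mp3` are assumptions, since the slide's figure does not survive in this text.

```python
from collections import deque


def flood_search(graph, holders, start, item, ttl):
    """Breadth-first flood: return (nodes that have the item, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    found, messages = [], 0
    while frontier:
        node, hops_left = frontier.popleft()
        if item in holders.get(node, set()):
            found.append(node)            # a node holding the item answers and stops forwarding
            continue
        if hops_left == 0:
            continue                      # hop limit reached: do not forward further
        for neighbor in graph[node]:
            messages += 1                 # every forwarded query costs one message
            if neighbor not in seen:      # duplicate queries are dropped on receipt
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return found, messages


# Hypothetical topology over the nodes named on the slide.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F"],
    "D": ["B", "G"],
    "E": ["B", "H"],
    "F": ["C", "I"],
    "G": ["D"],
    "H": ["E"],
    "I": ["F"],
}
holders = {"H": {"song.mp3"}}

print(flood_search(graph, holders, "A", "song.mp3", ttl=3))
print(flood_search(graph, holders, "A", "song.mp3", ttl=2))
```

Even in this nine-node network a single successful query costs more messages than there are nodes, and lowering the TTL saves traffic at the price of missing an item that does exist.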
15
The Most Efficient Way of Going from Names or Content to Locations
Directory-based lookup protocols are very common
Examples:
  Napster 1.0 – peer-to-peer storage with central directory
  DNS – distributed hierarchical directory
  LDAP – hierarchical directory information tree
  Inverted index – used to look up keywords in information retrieval
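The last example, the inverted index, is small enough to sketch directly: it maps each keyword to the set of documents containing it, so a keyword lookup is a dictionary access rather than a scan. The function names and the three sample documents are made up for illustration, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, *keywords):
    """Return IDs of documents containing all keywords (set intersection)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "domain name system",
    2: "lightweight directory access protocol",
    3: "directory of domain names",
}
index = build_inverted_index(docs)
print(lookup(index, "directory"))
print(lookup(index, "domain", "directory"))
```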
16
Napster 1.0, ca 2002
Hybrid of peer-to-peer storage with central directory showing what’s currently available
What are the trade-offs implicit in this model? Why did it fail?
[Figure: Napster.com hosts a central Directory listing los-del-rios-macarena and bspears-oops; Peer1, Peer2, and Peer3 store the files los-del-rios-macarena.mp3 and bspears-oops.mp3]
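A minimal sketch of the central-directory idea: peers register what they are sharing, a lookup returns the peers to contact, and the file transfer itself would then happen peer to peer. The class `CentralDirectory` is hypothetical, and the assignment of files to particular peers is an assumption.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy central directory: file name -> peers currently sharing it."""

    def __init__(self):
        self.files = defaultdict(set)

    def register(self, peer, filenames):
        for name in filenames:
            self.files[name].add(peer)

    def unregister(self, peer):
        # Called when a peer goes offline, so the directory reflects what is available now.
        for peers in self.files.values():
            peers.discard(peer)

    def lookup(self, filename):
        return sorted(self.files.get(filename, set()))


directory = CentralDirectory()
directory.register("Peer1", ["los-del-rios-macarena.mp3"])
directory.register("Peer2", ["bspears-oops.mp3"])
directory.register("Peer3", ["los-del-rios-macarena.mp3"])

print(directory.lookup("los-del-rios-macarena.mp3"))
directory.unregister("Peer1")
print(directory.lookup("los-del-rios-macarena.mp3"))
```

A lookup costs one request to one server, in contrast to a flood, but the directory is also a single point of failure and control.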
17
Other Services with Similar Directory + Peer Architectures
Windows Live Sync
Google Desktop Search with multiple machines
BitTorrent trackers are quite similar (we’ll discuss BitTorrent more later)
18
Naming People and Devices: LDAP
Lightweight Directory Access Protocol
  Hierarchical naming system that can be partitioned and replicated
See http://www.seas.upenn.edu/cets/answers/ldap.html to set up your email client to access Penn’s LDAP server
19
LDAP’s Schema
LDAP information has a schema with different levels of containers:
  A unique name in LDAP is called a Distinguished Name (“dn”) and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
    o = organization
    dc = domain component
    ou = organizational unit
    uid = user ID
    cn = common name
    c = country; st = state; l = locality
  Can also have objectClass – the type of entity
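To illustrate the most-specific-to-least-specific ordering, the sketch below splits a DN into attribute-value pairs and derives the enclosing container's DN. The example DN and the helper names `parse_dn` and `parent_dn` are hypothetical, and the parsing is simplified: it ignores escaped commas and multi-valued components that real DNs may contain.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most-specific first."""
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def parent_dn(dn):
    """Drop the most-specific component to get the enclosing container's DN."""
    return ",".join(part.strip() for part in dn.split(",")[1:])


dn = "uid=zives,ou=People,o=upenn,c=us"    # made-up DN for illustration
print(parse_dn(dn))
print(parent_dn(dn))
```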
20
LDAP Hierarchy
Brad Marshall LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html
21
Querying LDAP
LDAP queries are mostly attribute-value predicates:
  uid=zives; o=upenn; c=usa
  (|(cn=Susan Davidson)(cn=Boon Thau Loo)(cn=Val Tannen))
  objectclass=posixAccount
  (!(cn=Val Tannen))
How might we process these queries?
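One possible answer, sketched under simplifying assumptions: parse the parenthesized prefix filter into a tree, then evaluate the tree against each entry. The helpers `parse_filter`, `matches`, and `search` are hypothetical, only equality tests with `&`, `|`, and `!` are handled, and the three sample entries are made up. A real directory server would use attribute indexes rather than scanning every entry.

```python
def parse_filter(text, pos=0):
    """Parse a prefix filter starting at text[pos]; return (tree, next position)."""
    if text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    op = text[pos]
    if op in "&|!":
        pos += 1
        children = []
        while text[pos] == "(":
            child, pos = parse_filter(text, pos)
            children.append(child)
        tree = (op, children)
    else:
        end = text.index(")", pos)
        attr, _, value = text[pos:end].partition("=")
        tree = ("=", attr.strip().lower(), value.strip())
        pos = end
    if text[pos] != ")":
        raise ValueError(f"expected ')' at position {pos}")
    return tree, pos + 1


def matches(tree, entry):
    """Evaluate a parsed filter against one entry (attribute -> list of values)."""
    op = tree[0]
    if op == "=":
        _, attr, value = tree
        return value.lower() in [v.lower() for v in entry.get(attr, [])]
    results = [matches(child, entry) for child in tree[1]]
    if op == "&":
        return all(results)
    if op == "|":
        return any(results)
    return not results[0]                 # "!" takes exactly one sub-filter


def search(filter_text, entries):
    tree, _ = parse_filter(filter_text)
    return [e["cn"][0] for e in entries if matches(tree, e)]   # linear scan


entries = [
    {"cn": ["Val Tannen"], "objectclass": ["posixAccount"]},
    {"cn": ["Susan Davidson"], "objectclass": ["posixAccount"]},
    {"cn": ["Printer 3"], "objectclass": ["device"]},
]
print(search("(|(cn=Susan Davidson)(cn=Boon Thau Loo)(cn=Val Tannen))", entries))
print(search("(&(objectclass=posixAccount)(!(cn=Val Tannen)))", entries))
```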
22
The Backbone of Internet Naming: Domain Name System
A simple, hierarchical name system with a distributed database – each domain controls its own names
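[Figure: DNS name hierarchy. Top-level domains (edu, com, …) at the top; second-level domains below them (columbia, upenn, berkeley under edu; amazon under com); host and subdomain names (www, cis, sas) below those]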
23
Top-Level Domains (TLDs)
Mostly controlled by Network Solutions, Inc. today
  .com: commercial
  .edu: educational institution
  .gov: US government
  .mil: US military
  .net: networks and ISPs (now also a number of other things)
  .org: other organizations
  244 2-letter country suffixes, e.g., .us, .uk, .cz, .tv, …
  some variants on this for other institutions, e.g., .eu
  and a bunch of new suffixes that are not very common, e.g., .biz, .mobi, .name, .pro, …
24
Finding the Root
13 “root servers” store entries for all top level domains (TLDs)
DNS servers have a hard-coded mapping to root servers so they can “get started”
25
Excerpt from DNS Root Server Entries
; This file is made available by InterNIC registration services
; under anonymous FTP as
;     file                /domain/named.root
;
; formerly NS.INTERNIC.NET
;
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
;
; formerly C.PSI.NET
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
(13 servers in total, A through M)
26
Supposing We Were to Build DNS
How would we start? How is a lookup performed?
(Hint: what do you need to specify when you add a client to a network that doesn’t do DHCP?)
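One way to think about the lookup question is as a chain of referrals that starts at a root server and walks down the hierarchy. The sketch below simulates that walk over an in-memory table instead of sending real DNS packets. The server names, the zone contents, and the address 192.0.2.10 (from a range reserved for documentation) are all made up for illustration.

```python
# Toy zone data: each "server" either knows the answer or knows whom to ask next.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server", "com": "com-server"}},
    "edu-server": {"referrals": {"upenn.edu": "upenn-server"}},
    "com-server": {"referrals": {}},
    "upenn-server": {"answers": {"www.cis.upenn.edu": "192.0.2.10"}},
}


def resolve(name, start="root", max_steps=10):
    """Follow referrals from the root; return (address or None, servers contacted)."""
    server, path = start, []
    for _ in range(max_steps):
        path.append(server)
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            return zone["answers"][name], path
        # Follow the referral whose domain suffix matches the queried name.
        next_server = next(
            (srv for suffix, srv in zone.get("referrals", {}).items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if next_server is None:
            return None, path             # no server is responsible for this name
        server = next_server
    return None, path


print(resolve("www.cis.upenn.edu"))
print(resolve("www.example.org"))
```

The only thing the resolver must know in advance is where the root is, which is the role of the hard-coded root server mapping.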
27
Issues in DNS
We know that everyone wants to be “my-domain”.com
  How does this mesh with the assumptions inherent in our hierarchical naming system?
What happens if things move frequently?
What happens if we want to provide different behavior to different requestors (e.g., Akamai)?
28
Directories Summarized
An efficient way of finding data, assuming:
  Data doesn’t change too often, hence it can be replicated and distributed
  Hierarchy is relatively “wide and flat”
  Caching is present, helping with repeated queries
Directories generally rely on names at their core
Sometimes we want to search based on other means, e.g., predicates or filters over content…
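The caching assumption in this summary can be sketched as a small time-to-live cache wrapped around any directory lookup. The class `TTLCache` and the 60-second lifetime are assumptions for illustration; the clock is injectable so expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache lookup results for ttl_seconds; stale entries are fetched again."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self.lookup, self.ttl, self.clock = lookup, ttl_seconds, clock
        self.entries = {}                 # name -> (value, expiry time)
        self.misses = 0

    def get(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit and hit[1] > now:
            return hit[0]                 # still fresh: no directory query needed
        self.misses += 1
        value = self.lookup(name)         # the expensive directory query
        self.entries[name] = (value, now + self.ttl)
        return value


fake_time = [0.0]
cache = TTLCache(lambda name: name.upper(), ttl_seconds=60, clock=lambda: fake_time[0])

cache.get("www.upenn.edu")                # miss: goes to the directory
cache.get("www.upenn.edu")                # hit: served from the cache
fake_time[0] = 61.0
cache.get("www.upenn.edu")                # entry expired: miss again
print(cache.misses)
```

The TTL is the knob behind the first assumption: the less often data changes, the longer an entry can be cached before a stale answer becomes a problem.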