Upload
phungminh
View
214
Download
0
Embed Size (px)
Citation preview
Introduction(Protocol Layering, Security) &
Application Layer (Principles, Web, FTP)
Computer Networks and Applications
Week 2
COMP 3331/COMP 9331
Reading Guide: Chapter 1, Sections 1.5 - 1.7 Chapter 2, Sections 2.1 – 2.3
1
Course Introduction
v Web: http://www.cse.unsw.edu.au/~cs3331 v Read course outline on the webpage v Labs begin in Week 2
§ Please attend your allocated slot § Please read the Tools of the Trade introduction
document § You get one week to work on your reports. Lab
Reports due in Week 3 before your next lab, e.g., for students attending the Monday 12noon lab, the report is due 11:59am, Monday.
RECAP
2
1. Introduction: roadmap 1.1 what is the Internet? 1.2 network edge
§ end systems, access networks, links
1.3 network core § packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks 1.5 protocol layers, service models 1.6 networks under attack: security 1.7 history
3
Three (networking) design steps
v Break down the problem into tasks
v Organize these tasks
v Decide who does what
4
Tasks in Networking
v What does it take to send packets across country?
v Simplistic decomposition: § Task 1: send along a single wire
§ Task 2: stitch these together to go across country
v This gives idea of what I mean by decomposition
5
Tasks in Networking (bottom up)
v Bits on wire v Packets on wire v Deliver packets within local network v Deliver packets across global network v Ensure that packets get to the destination v Do something with the data
6
Resulting Modules
v Bits on wire (Physical) v Packets on wire (Physical) v Delivery packets within local network (Datalink) v Deliver packets across global network (Network) v Ensure that packets get to the dst. (Transport) v Do something with the data (Application)
This is decomposition… Now, how do we organize these tasks?
7
Dear John, Your days are numbered.
--Pat
Inspiration… v CEO A writes letter to CEO B
§ Folds letter and hands it to administrative aide » Aide:
» Puts letter in envelope with CEO B’s full name
» Takes to FedEx
v FedEx Office § Puts letter in larger envelope § Puts name and street address on FedEx envelope § Puts package on FedEx delivery truck
v FedEx delivers to other company
8
CEO
Aide
FedEx
CEO
Aide
FedEx Location Fedex Envelope (FE)
The Path of the Letter
Letter
Envelope
Semantic Content
Identity
“Peers” on each side understand the same things No one else needs to (abstraction)
Lowest level has most packaging
9
The Path Through FedEx
Truck
Sorting Office
Airport
FE
Sorting Office
Airport
Truck
Sorting Office
Airport
Crate Crate
FE
New Crate Crate
FE
Higher “Stack” at Ends
Partial “Stack” During Transit
Deepest Packaging (Envelope+FE+Crate) at the Lowest Level of Transport
Highest Level of “Transit Stack” is Routing
10
In the context of the Internet
Applications
…built on…
…built on…
…built on…
…built on…
Reliable (or unreliable) transport
Best-effort global packet delivery
Best-effort local packet delivery
Physical transfer of bits
11
Internet protocol stack v application: supporting network
applications § FTP, SMTP, HTTP, Skype, ..
v transport: process-process data transfer § TCP, UDP
v network: routing of datagrams from source to destination § IP, routing protocols
v link: data transfer between neighboring network elements § Ethernet, 802.111 (WiFi), PPP
v physical: bits “on the wire” 12
Quiz: Layering
This model is many years old. If you had to pick a layer to “update” which would you choose? Why?
This model is many years old. If you had to pick a layer to “update” which would you choose? Why?
38
Application: the application (e.g., the Web, Email)
Transport: end-to-end connections, reliability
Network: routing
Link (data-link): framing, error detection
Physical: 1’s and 0’s/bits across a medium (copper, the air, fiber)
A
B
C
D
E
Three Observations v Each layer:
§ Depends on layer below § Supports layer above § Independent of others
v Multiple versions in layer § Interfaces differ somewhat § Components pick which lower-
level protocol to use
v But only one IP layer § Unifying protocol
Layering Crucial to Internet’s Success
v Reuse v Hides underlying detail v Innovation at each level can
proceed in parallel v Modularization eases
maintenance, updating of system § change of implementation of
layer’s service transparent to rest of system
v Pursued by very different communities
2-16
An Example: No Layering
v No layering: each new application has to be re-implemented for every network technology !
ssh HTTP
Wireless Ether-net
Fiber optic
Application
Transmission Media
Skype
An Example: Benefit of Layering
v Introducing an intermediate layer provides a common abstraction for various network technologies
Skype ssh HTTP
Wireless Ethernet Fiber optic
Application
Transmission Media
Transport & Network
17
v Layer N may duplicate lower level functionality § E.g., error recovery to retransmit lost data
v Information hiding may hurt performance § E.g. packet loss due to corruption vs. congestion
v Headers start to get really big § E.g., typically TCP + IP + Ethernet headers add up to
54 bytes v Layer violations when the gains too great to resist
§ E.g., TCP-over-wireless v Layer violations when network doesn’t trust ends
§ E.g., Firewalls
Is Layering Harmful?
18
Distributing Layers Across Network
v Layers are simple if only on a single machine § Just stack of modules interacting with those above/
below
v But we need to implement layers across machines § Hosts § Routers § Switches
v What gets implemented where?
19
“End-to-End” Argument
v Don’t provide a function at lower layer of abstraction (layer) if you have to do it at higher layer anyway – unless there is a very good performance reason to do so
v Examples: error control, quality of service
20
J. Satlzer, D. Reed and D. Clark, “End-to-end Arguments in System Design”, ACM Transactions on Computer Systems, vol. 2 (4), pp. 277-288, 1984 http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
What Gets Implemented on Host?
v Bits arrive on wire, must make it up to application
v Therefore, all layers must exist at host!
21
What Gets Implemented on Router?
v Bits arrive on wire § Physical layer necessary
v Packets must be delivered to next-hop § datalink layer necessary
v Routers participate in global delivery § Network layer necessary
v Routers don’t support reliable delivery § Transport layer (and above) not supported
22
23
Internet Layered Architecture
HTTP
TCP
IP
Ethernet interface
HTTP
TCP
IP
Ethernet interface
IP IP
Ethernet interface
Ethernet interface
SONET interface
SONET interface
host host
router router
HTTP message
TCP segment
IP packet IP packet IP packet
23
Logical Communication
v Layers interacts with peer’s corresponding layer
Transport Network Datalink Physical
Transport Network Datalink Physical
Network Datalink Physical
Application Application
Host A Host B Router
24
Physical Communicationv Communication goes down to physical network v Then from network peer to peer v Then up to relevant layer
Transport Network Datalink Physical
Transport Network Datalink Physical
Network Datalink Physical
Application Application
Host A Host B Router
25
Organization of air travel
v a series of steps
ticket (purchase) baggage (check) gates (load) runway takeoff airplane routing
ticket (complain) baggage (claim) gates (unload) runway landing airplane routing
airplane routing
26
Self Study
ticket (purchase)
baggage (check)
gates (load)
runway (takeoff)
airplane routing
departure airport
arrival airport
intermediate air-traffic control centers
airplane routing airplane routing
ticket (complain)
baggage (claim
gates (unload)
runway (land)
airplane routing
ticket
baggage
gate
takeoff/landing
airplane routing
Layering of airline functionality
layers: each layer implements a service § via its own internal-layer actions § relying on services provided by layer below
Self Study
27
source application transport network
link physical
Ht Hn M
segment Ht
datagram
destination
application transport network
link physical
Ht Hn Hl M
Ht Hn M
Ht M
M
network link
physical
link physical
Ht Hn Hl M
Ht Hn M
Ht Hn M
Ht Hn Hl M
router
switch
Encapsulation message M
Ht M
Hn
frame
28
1. Introduction: roadmap 1.1 what is the Internet? 1.2 network edge
§ end systems, access networks, links
1.3 network core § packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks 1.5 protocol layers, service models 1.6 networks under attack: security 1.7 history
29
Self Study
Network security
v field of network security: § how bad guys can attack computer networks § how we can defend networks against attacks § how to design architectures that are immune to
attacks v Internet not originally designed with (much)
security in mind § original vision: “a group of mutually trusting users
attached to a transparent network” J § Internet protocol designers playing “catch-up” § security considerations in all layers!
30 Disclaimer: This is a high-level view, details will be covered later
Self Study
Bad guys: put malware into hosts via Internet
v malware can get in host from: § virus: self-replicating infection by receiving/executing
object (e.g., e-mail attachment)
§ worm: self-replicating infection by passively receiving object that gets itself executed
v spyware malware can record keystrokes, web sites visited, upload info to collection site
v infected host can be enrolled in botnet, used for spam. DDoS attacks
31
Self Study
target
Denial of Service (DoS): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus traffic
1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts
Bad guys: attack server, network infrastructure
32
Self Study
Bad guys can sniff packets packet “sniffing”:
§ broadcast media (shared ethernet, wireless) § promiscuous network interface reads/records all packets
(e.g., including passwords!) passing by
A
B
C
src:B dest:A payload
v wireshark software used for end-of-chapter labs is a (free) packet-sniffer
33
Self Study
Bad guys can use fake addresses
IP spoofing: send packet with false source address
A
B
C
src:B dest:A payload
… lots more on security (throughout, Chapter 8)
34
Self Study
1. Introduction : roadmap 1.1 what is the Internet? 1.2 network edge
§ end systems, access networks, links
1.3 network core § packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks 1.5 protocol layers, service models 1.6 networks under attack: security 1.7 history
Self Study
Hoobes’Internet timeline: http://www.zakon.org/robert/internet/timeline/
36
Internet history
v 1961: Kleinrock - queueing theory shows effectiveness of packet-switching
v 1964: Baran - packet-switching in military nets
v 1967: ARPAnet conceived by Advanced Research Projects Agency
v 1969: first ARPAnet node operational
v 1972: § ARPAnet public demo § NCP (Network Control
Protocol) first host-host protocol
§ first e-mail program § ARPAnet has 15 nodes
1961-1972: Early packet-switching principles
Self Study
37
v 1970: ALOHAnet satellite network in Hawaii
v 1974: Cerf and Kahn - architecture for interconnecting networks
v 1976: Ethernet at Xerox PARC v late70’s: proprietary
architectures: DECnet, SNA, XNA
v late 70’s: switching fixed length packets (ATM precursor)
v 1979: ARPAnet has 200 nodes
Cerf and Kahn’s internetworking principles: § minimalism, autonomy - no
internal changes required to interconnect networks
§ best effort service model § stateless routers § decentralized control
define today’s Internet architecture
1972-1980: Internetworking, new and proprietary nets Internet history Self Study
38
v 1983: deployment of TCP/IP
v 1982: smtp e-mail protocol defined
v 1983: DNS defined for name-to-IP-address translation
v 1985: ftp protocol defined v 1988: TCP congestion
control
v new national networks: Csnet, BITnet, NSFnet, Minitel
v 100,000 hosts connected to confederation of networks
1980-1990: new protocols, a proliferation of networks Internet history Self Study
39
v early 1990’s: ARPAnet decommissioned
v 1991: NSF lifts restrictions on commercial use of NSFnet (decommissioned, 1995)
v early 1990s: Web § hypertext [Bush 1945, Nelson
1960’s] § HTML, HTTP: Berners-Lee § 1994: Mosaic, later Netscape § late 1990’s:
commercialization of the Web
late 1990’s – 2000’s: v more killer apps: instant
messaging, P2P file sharing v network security to
forefront v est. 50 million host, 100
million+ users v backbone links running at
Gbps
1990, 2000’s: commercialization, the Web, new apps
Internet history Self Study
40
2005-present v ~750 million hosts
§ Smartphones and tablets
v Aggressive deployment of broadband access v Increasing ubiquity of high-speed wireless access v Emergence of online social networks:
§ Facebook: soon one billion users v Service providers (Google, Microsoft) create their own
networks § Bypass Internet, providing “instantaneous” access to
search, emai, etc. v E-commerce, universities, enterprises running their
services in “cloud” (eg, Amazon EC2)
Internet history Self Study
41
Introduction: summary
covered a “ton” of material! v Internet overview v what’s a protocol? v network edge, core, access
network § packet-switching versus
circuit-switching § Internet structure
v performance: loss, delay, throughput
v layering, service models v security v history
you now have: v context, overview, “feel”
of networking v more depth, detail to
follow!
42
2. Application Layer: outline
2.1 principles of network applications
2.2 Web and HTTP 2.3 FTP 2.4 electronic mail
§ SMTP, POP3, IMAP 2.5 DNS
2.6 P2P applications 2.7 socket programming
with UDP and TCP
43
2. Application layer
our goals: v conceptual,
implementation aspects of network application protocols § transport-layer
service models § client-server
paradigm § peer-to-peer
paradigm
v learn about protocols by examining popular application-level protocols § HTTP § FTP § SMTP / POP3 / IMAP § DNS
v creating network applications § socket API
44
Some network apps
v e-mail v web v text messaging v remote login v P2P file sharing v multi-user network games v streaming stored video
and audio (YouTube, NetFlix, Spotify)
v voice over IP (e.g., Skype) v real-time video
conferencing v social networking v Search v virtual reality v …
45
Creating a network app write programs that: v run on (different) end systems v communicate over network v e.g., web server software communicates
with browser software
Varying degrees of integration v Loose: email, web browsing v Medium: chat, Skype, remote file systems v Tight: process migration, distributed file
systems
no need to write software for network-core devices
v network-core devices do not run user applications
v applications on end systems allows for rapid app development, propagation
application transport network data link physical
application transport network data link physical
application transport network data link physical
46
Interprocess Communication (IPC)
v Processes talk to each other through Inter-process communication (IPC)
v On a single machine: § Shared memory
v Across machines: § We need other abstractions (message passing)
47
Shared Segment
Interprocess Communication (IPC)
• In order to cooperate, need to communicate • Achieved via IPC: interprocess communication
– ability for a process to communicate with another
• On a single machine: – Shared memory
• Across machines:
– We need other abstractions (message passing)
Text
Data
Stack
Text
Data
Stack P1 P2
Sockets v process sends/receives messages to/from its socket v socket analogous to door
§ sending process shoves message out door § sending process relies on transport infrastructure on other
side of door to deliver message to socket at receiving process
v Application has a few options, OS handles the details
Internet
controlled by OS
controlled by app developer
transport
application
physical
link
network
process
transport
application
physical
link
network
process socket
48
Addressing processes v to receive messages,
process must have identifier v host device has unique 32-
bit IP address v Q: does IP address of host
on which process runs suffice for identifying the process?
v identifier includes both IP address and port numbers associated with process on host.
v example port numbers: § HTTP server: 80 § mail server: 25
v to send HTTP message to cse.unsw.edu.au web server: § IP address: 129.94.242.51 § port number: 80
v more shortly…
§ A: no, many processes can be running on same host
49
Client-server architecture server: v Exports well-defined request/
response interface v long-lived process that waits for
requests v Upon receiving request, carries
it out
clients: v Short-lived process that makes
requests v “User-side” of application v Initiates the communication
client/server
50
Client versus Server
v Server § Always-on host § Permanent IP address
(rendezvous location) § Static port conventions
(http: 80, email: 25, ssh:22)
§ Data centres for scaling § May communicate with
other servers to respond
v Client § May be intermittently
connected § May have dynamic IP
addresses § Do not communicate
directly with each other
51
P2P architecture v no always-on server
§ No permanent rendezvous involved
v arbitrary end systems (peers) directly communicate
v Symmetric responsibility (unlike client/server)
v Often used for: § File sharing (BitTorrent) § Games § Video distribution, video chat § In general: “distributed
systems”
peer-peer
52
P2P architecture: Pros and Cons + peers request service from other peers, provide service in return to other peers
§ self scalability – new peers bring new service capacity, as well as new service demands
+ Speed: parallelism, less contention + Reliability: redundancy, fault tolerance + Geographic distribution
- Fundamental problems of decentralized
control § State uncertainty: no shared memory or
clock § Action uncertainty: mutually conflicting
decisions
- Distributed algorithms are complex
peer-peer
54
App-layer protocol defines v types of messages
exchanged, § e.g., request, response
v message syntax: § what fields in messages
& how fields are delineated
v message semantics § meaning of information
in fields v rules for when and how
processes send & respond to messages
open protocols: v defined in RFCs v allows for interoperability v e.g., HTTP, SMTP proprietary protocols: v e.g., Skype
55
What transport service does an app need? data integrity v some apps (e.g., file transfer,
web transactions) require 100% reliable data transfer
v other apps (e.g., audio) can tolerate some loss
timing v some apps (e.g., Internet
telephony, interactive games) require low delay to be “effective”
throughput v some apps (e.g.,
multimedia) require minimum amount of throughput to be “effective”
v other apps (“elastic apps”) make use of whatever throughput they get
security v encryption, data integrity,
…
56
Self Study
Transport service requirements: common apps
application
file transfer e-mail
Web documents real-time audio/video
stored audio/video interactive games
text messaging
data loss no loss no loss no loss loss-tolerant loss-tolerant loss-tolerant no loss
throughput elastic elastic elastic audio: 5kbps-1Mbps video:10kbps-5Mbps same as above few kbps up elastic
time sensitive no no no yes, 100’s msec yes, few msecs yes, 100’s msec yes and no
57
Self Study
Internet transport protocols services
TCP service: v reliable transport between
sending and receiving process
v flow control: sender won’t overwhelm receiver
v congestion control: throttle sender when network overloaded
v does not provide: timing, minimum throughput guarantee, security
v connection-oriented: setup required between client and server processes
UDP service: v unreliable data transfer
between sending and receiving process
v does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, orconnection setup,
Q: why bother? Why is there a UDP?
NOTE: More on transport in Weeks 4 and 5 58
Self Study
Internet apps: application, transport protocols
application
e-mail remote terminal access
Web file transfer
streaming multimedia
Internet telephony
application layer protocol SMTP [RFC 2821] Telnet [RFC 854] HTTP [RFC 2616] FTP [RFC 959] HTTP (e.g., YouTube), RTP [RFC 1889] SIP, RTP, proprietary (e.g., Skype)
underlying transport protocol TCP TCP TCP TCP TCP or UDP TCP or UDP
59
Self Study
Securing TCP
TCP & UDP v no encryption v cleartext passwds sent
into socket traverse Internet in cleartext
SSL v provides encrypted
TCP connection v data integrity v end-point
authentication
SSL is at app layer v Apps use SSL libraries,
which “talk” to TCP SSL socket API v cleartext passwds sent
into socket traverse Internet encrypted
v See Chapter 7
60
Self Study
2. Application Layer: outline
2.1 principles of network applications § app architectures § app requirements
2.2 Web and HTTP 2.3 FTP 2.4 electronic mail
§ SMTP, POP3, IMAP 2.5 DNS
2.6 P2P applications 2.7 socket programming
with UDP and TCP
Note: Some of the material here, particularly the descriptive details of various protocols is for self-study. Lectures will focus on design principles.
61
The Web – Precursorv 1967, Ted Nelson, Xanadu:
§ A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
§ Owners of documents would be automatically paid via electronic means for the virtual copying of their documents
v Coined the term “Hypertext” Ted Nelson
62
The Web – Historyv World Wide Web (WWW): a
distributed database of “pages” linked through Hypertext Transport Protocol (HTTP) § First HTTP implementation - 1990
• Tim Berners-Lee at CERN § HTTP/0.9 – 1991
• Simple GET command for the Web § HTTP/1.0 –1992
• Client/Server information, simple caching § HTTP/1.1 - 1996
Tim Berners-Lee
63
Web and HTTP
First, a review… v web page consists of objects v object can be HTML file, JPEG image, Java applet,
audio file,… v web page consists of base HTML-file which
includes several referenced objects v each object is addressable by a URL, e.g., www.someschool.edu/someDept/pic.gif
host name path name
64
Uniform Record Locator (URL)
protocol://host-name[:port]/directory-path/resource
v Extend the idea of hierarchical hostnames to include anything in a file system § http://www.cse.unsw.edu.au/~salilk/papers/journals/TMC2012.pdf
v Extend to program executions as well… § http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B
%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
§ Server side processing can be incorporated in the name
65
Uniform Record Locator (URL)
protocol://host-name[:port]/directory-path/resource
v protocol: http, ftp, https, smtp, rtsp, etc. v hostname: DNS name, IP address v port: defaults to protocol’s standard port; e.g. http: 80 https: 443 v directory path: hierarchical, reflecting file system v resource: Identifies the desired resource
66
HTTP overview
HTTP: hypertext transfer protocol
v Web’s application layer protocol
v client/server model § client: browser that
requests, receives, (using HTTP protocol) and “displays” Web objects
§ server: Web server sends (using HTTP protocol) objects in response to requests
PC running Firefox browser
server running
Apache Web server
iphone running Safari browser
67
HTTP overview (continued)
uses TCP: v client initiates TCP
connection (creates socket) to server, port 80
v server accepts TCP connection from client
v HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
v TCP connection closed
HTTP is “stateless” v server maintains no
information about past client requests
protocols that maintain “state” are complex!
v past history (state) must be maintained
v if server/client crashes, their views of “state” may be inconsistent, must be reconciled
aside
68
HTTP request message
v two types of HTTP messages: request, response v HTTP request message:
§ ASCII (human-readable format)
request line (GET, POST, HEAD commands)
header lines
carriage return, line feed at start of line indicates end of header lines
GET /index.html HTTP/1.1\r\n Host: www-net.cs.umass.edu\r\n User-Agent: Firefox/3.6.10\r\n Accept: text/html,application/xhtml+xml\r\n Accept-Language: en-us,en;q=0.5\r\n Accept-Encoding: gzip,deflate\r\n Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n Keep-Alive: 115\r\n Connection: keep-alive\r\n \r\n
carriage return character line-feed character
69
v How else (i.e. other that \r\n) might we delineate headers?
§ A: There’s not much else we can do
§ B: Force all messages to be the same size
§ C: Send the message size prior to the message
§ D: Some other way (discuss)
70
Quiz: Header delineation
HTTP response message
status line (protocol status code status phrase)
header lines
data, e.g., requested HTML file
HTTP/1.1 200 OK\r\n Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n Server: Apache/2.0.52 (CentOS)\r\n Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT
\r\n ETag: "17dc6-a5c-bf716880"\r\n Accept-Ranges: bytes\r\n Content-Length: 2652\r\n Keep-Alive: timeout=10, max=100\r\n Connection: Keep-Alive\r\n Content-Type: text/html;
charset=ISO-8859-1\r\n \r\n data data data data data ...
71
HTTP response status codes
200 OK § request succeeded, requested object later in this msg
301 Moved Permanently § requested object moved, new location specified later in this msg
(Location:)
400 Bad Request § request msg not understood by server
404 Not Found § requested document not found on this server
505 HTTP Version Not Supported 451 Unavailable for Legal Reasons 429 Too Many Requests 418 I’m a Teapot
v status code appears in 1st line in server-to-client response message. v some sample codes:
72
HTTP is all text
v Makes the protocol simple § Easy to delineate messages (\r\n) § (relatively) human-readable § No issues about encoding or formatting data § Variable length data
v Not the most efficient § Many protocols use binary fields
• Sending “12345678” as a string is 8 bytes • As an integer, 12345678 needs only 4 bytes
§ Headers may come in any order § Requires string parsing/processing
73
Request Method types (“verbs”)
HTTP/1.0: v GET
§ Request page
v POST § Uploads user response to a
form v HEAD
§ asks server to leave requested object out of response
HTTP/1.1: v GET, POST, HEAD v PUT
§ uploads file in entity body to path specified in URL field
v DELETE § deletes file specified in
the URL field v TRACE, OPTIONS,
CONNECT, PATCH § For persistent
connections 74
Uploading form input
POST method: v web page often includes form input v input is uploaded to server in entity body
Get (in-URL) method: v uses GET method v input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana
75
GET vs. POST
v GET can be used for idempotent requests § Idempotence: an operation can be applied multiple
times without changing the result (the final state is the same)
v POST should be used when.. § A request changes the state of the session or server or
DB § Sending a request twice would be harmful
• (Some) browsers warn about sending multiple post requests
§ Users are inputting non-ascii characters § Input may be vary large § You want to hide how the form works/user input 76
77
Quiz: When might you use GET vs. POST
GET POST A. Forum post Search terms, Pizza order B. Search terms, Pizza order Forum post C Search terms Forum post, Pizza order D. Forum post, Search
terms, Pizza order
E. Forum post, Search terms, Pizza order
Trying out HTTP (client side) for yourself 1. Telnet to your favorite Web server:
opens TCP connection to port 80 (default HTTP server port) at cis.poly.edu. anything typed in sent to port 80 at cis.poly.edu
telnet cis.poly.edu 80
2. type in a GET HTTP request: GET /~ross/ HTTP/1.1 Host: cis.poly.edu
by typing this in (hit carriage return twice), you send this minimal (but complete) GET request to HTTP server
3. look at response message sent by HTTP server! (or use Wireshark to look at captured HTTP request/response)
Web-based sniffer: http://web-sniffer.net/ 78
Your 2nd lab
URL Shortening v Some URLs can get really long
§ http://en.wikipedia.org/w/index.php?title=TinyURL&diff=283621022&oldid=283308287
§ Inconvenient for sharing, prone to errors, hard to recall v URL shortening services
§ Produce short and easy to remember URLs § A key is associated with each long URL
• http://tinyurl.com/m3q2xt• Keys - hash functions or random number generator or user
proposed • URL Redirection - 3XX status codes • Warning: Also used extensively in phishing attacks
key
HTTP/1.1 301 Moved PermanentlyLocation: http://www.abc.org/Content-Type: text/htmlContent-Length: 174<html><head><title>Moved</title></head><body><h1>Moved</h1><p>This page has moved to <a href="http://www.abc.org/">http://www.abc.org/</a></p></body></html> 79
User-server state: cookies
many Web sites use cookies four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site
example: v Susan always access Internet
from PC v visits specific e-commerce
site for first time v when initial HTTP requests
arrives at site, site creates: § unique ID § entry in backend
database for ID
81
Cookies: keeping “state” (cont.) client server
usual http response msg
usual http response msg
cookie file
one week later:
usual http request msg cookie: 1678 cookie-
specific action
access
ebay 8734 usual http request msg Amazon server creates ID
1678 for user create entry
usual http response set-cookie: 1678 ebay 8734
amazon 1678
usual http request msg cookie: 1678 cookie-
specific action
access ebay 8734 amazon 1678
backend database
82
Cookies (continued) what cookies can be used
for: v authorization v shopping carts v recommendations v user session state (Web
e-mail)
cookies and privacy: v cookies permit sites to
learn a lot about you v you may supply name and
e-mail to sites
aside
how to keep “state”: v protocol endpoints: maintain state at
sender/receiver over multiple transactions
v cookies: http messages carry state
83
The Dark Side of Cookies
v Cookies permit sites to learn a lot about you
v You may supply name and e-mail to sites (and more)
v 3rd party cookies (from ad networks, etc) can follow you across multiple sites § Ever visit a website, and the next day ALL your ads are from
them ? • Check your browser’s cookie file (cookies.txt, cookies.plist) • Do you see a website that you have never visited
v You COULD turn them off § But good luck doing anything on the Internet !!
84
Performance Goals
v User § fast downloads § high availability
v Content provider § happy users (hence, above) § cost-effective infrastructure
v Network (secondary) § avoid overload
85
Solutions?
v User § fast downloads § high availability
v Content provider § happy users (hence, above) § cost-effective infrastructure
v Network (secondary) § avoid overload
Improve HTTP to achieve faster downloads
86
Solutions?
v User § fast downloads § high availability
v Content provider § happy users (hence, above) § cost-effective delivery infrastructure
v Network (secondary) § avoid overload
Caching and Replication
87
Improve HTTP to achieve faster downloads
Solutions?
v User § fast downloads § high availability
v Content provider § happy users (hence, above) § cost-effective delivery infrastructure
v Network (secondary) § avoid overload
Caching and Replication
Exploit economies of scale (Webhosting, CDNs, datacenters)
Improve HTTP to compensate for TCP’s weak spots
88
HTTP Performancev Most Web pages have multiple objects
§ e.g., HTML file and a bunch of embedded images v How do you retrieve those objects (naively)?
§ One item at a time v New TCP connection per (small) object!
non-persistent HTTP v at most one object sent over TCP connection
§ connection then closed v downloading multiple objects required multiple
connections
89
Non-persistent HTTP suppose user enters URL:
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket time
(contains text, references to 10
jpeg images) www.someSchool.edu/someDepartment/home.index
90
Non-persistent HTTP (cont.)
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
4. HTTP server closes TCP connection.
time
91
Non-persistent HTTP: response time
RTT (definition): time for a small packet to travel from client to server and back
HTTP response time: v one RTT to initiate TCP
connection v one RTT for HTTP request
and first few bytes of HTTP response to return
v file transmission time v non-persistent HTTP
response time = 2RTT+ file transmission
time
time to transmit file
initiate TCP connection
RTT
request file
RTT
file received
time time
92
Internet
Improving HTTP Performance:
Concurrent Requests & Responses
v Use multiple connections in parallel
v Does not necessarily maintain order of responses
• Client = J • Content provider = J
• Network = L Why?
R1 R2 R3
T1
T2 T3
93
Nonpersistent HTTP issues: v requires 2 RTTs per object v OS must work and allocate host
resources for each TCP connection v but browsers often open parallel
TCP connections to fetch referenced objects
Persistent HTTP v server leaves connection open after
sending response v subsequent HTTP messages
between same client/server are sent over connection
v Allow TCP to learn more accurate RTT estimate (APPARENT LATER)
v Allow TCP congestion window to increase (APPARENT LATER)
v i.e., leverage previously discovered bandwidth (APPARENT LATER)
Persistent without pipelining: v client issues new request only
when previous response has been received
v one RTT for each referenced object
Persistent with pipelining: v default in HTTP/1.1 v client sends requests as soon as it
encounters a referenced object v as little as one RTT for all the
referenced objects
Persistent HTTP
94
HTTP 1.1: response time
95
initiate TCP connection
RTT
request file
RTT
file received
time time
Internet
time to transmit file
Website with one index page and three embedded objects
Scorecard: Getting n Small Objects
Time dominated by latency (i.e. RTT)
v One-at-a-time: ~2n RTT v M concurrent: ~2[n/m] RTT v Persistent: ~ (n+1)RTT v Pipelined+ Persistent: ~2 RTT
96
v Among the following, in which case would you get the greatest improvement in performance with persistent HTTP compared to non-persistent HTTP?
A. Low throughput network paths (irrespective of distance)
B. High throughput network paths (irrespective of distance)
C. Long-distance network paths (irrespective of throughput)
D. High throughput, short-distance network paths
E. High throughput, long-distance network paths 97
Quiz: Persistent vs. Non-persistent HTTP
v Pipelining allows the client to send multiple HTTP requests on a single TCP connection without waiting for the corresponding responses. What could be a potential bottleneck despite using pipelining?
98
Quiz: Pipelining
Improving HTTP Performance: Caching
v Why does caching work? § Exploits locality of reference
v How well does caching work? § Very well, up to a limit § Large overlap in content § But many unique requests
99
Web caches (proxy server)
v user sets browser: Web accesses via cache
v browser sends all HTTP requests to cache § object in cache: cache
returns object § else cache requests
object from origin server, then returns object to client
goal: satisfy client request without involving origin server
client
proxy server
client origin server
origin server
100
More about Web caching
v cache acts as both client and server § server for original
requesting client § client to origin server
v typically cache is installed by ISP (university, company, residential ISP)
why Web caching? v reduce response time
for client request v reduce traffic on an
institution’s access link v Internet dense with
caches: enables “poor” content providers to effectively deliver content
101
Caching example:
origin servers
public Internet
institutional network
1 Gbps LAN
1.54 Mbps access link
assumptions: v avg object size: 100K bits v avg request rate from
browsers to origin servers:15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 1.54 Mbps
consequences: v LAN utilization: 15% v access link utilization = 99% v total delay = Internet delay +
access delay + LAN delay = 2 sec + minutes + usecs
problem!
102
assumptions: v avg object size: 100K bits v avg request rate from
browsers to origin servers:15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 1.54 Mbps
consequences: v LAN utilization: 15% v access link utilization = 99% v total delay = Internet delay + access
delay + LAN delay = 2 sec + minutes + usecs
Caching example: fatter access link
origin servers
1.54 Mbps access link
154 Mbps
154 Mbps
msecs Cost: increased access link speed (not cheap!)
9.9%
public Internet
institutional network
1 Gbps LAN
103
institutional network
1 Gbps LAN
Caching example: install local cache
origin servers
1.54 Mbps access link
local web cache
assumptions: v avg object size: 100K bits v avg request rate from
browsers to origin servers:15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 1.54 Mbps
consequences: v LAN utilization: v access link utilization = v total delay =
? ?
How to compute link utilization, delay?
Cost: web cache (cheap!)
public Internet
?
104
Caching example: install local cache Calculating access link
utilization, delay with cache: v suppose cache hit rate is 0.4
§ 40% requests satisfied at cache, 60% requests satisfied at origin
origin servers
1.54 Mbps access link
v access link utilization: § 60% of requests use access link
v data rate to browsers over access link = 0.6*1.50 Mbps = .9 Mbps § utilization = 0.9/1.54 = .58
v total delay § = 0.6 * (delay from origin servers) +0.4
* (delay when satisfied at cache) § = 0.6 (2.01) + 0.4 (~msecs) § = ~ 1.2 secs § less than with 154 Mbps link (and
cheaper too!)
public Internet
institutional network
1 Gbps LAN local web
cache
105
v Distribution of web object requests generally follows a Zipf-like distribution
v The probability that a document will be referenced k requests after it was last referenced is roughly proportional to 1/k . That is, web traces exhibit excellent temporal locality.
106
But what is the likelihood of cache hits?
Paper – “Web Caching and Zipf-like Distributions: Evidence and Implications” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.8742&rep=rep1&type=pdf
Video content exhibits similar properties: 10% of the top popular videos account for nearly 80% of views, while the remaining 90% of videos account for total 20% of requests. Paper – http://yongyeol.com/papers/cha-video-2009.pdf
Conditional GET
v Goal: don’t send object if cache has up-to-date cached version § no object transmission
delay § lower link utilization
v cache: specify date of cached copy in HTTP request If-modified-since: <date>
v server: response contains no object if cached copy is up-to-date: HTTP/1.0 304 Not Modified
HTTP request msg If-modified-since: <date>
HTTP response HTTP/1.0
304 Not Modified
object not
modified before <date>
HTTP request msg If-modified-since: <date>
HTTP response HTTP/1.0 200 OK
<data>
object modified
after <date>
client server
107
Improving HTTP Performance: Caching: Where?
v Options
§ Client
§ Forward proxies
§ Reverse proxies
§ Content Distribution Network
110
v Baseline: Many clients transfer same information § Generate unnecessary server and network load § Clients experience unnecessary latency
Server
Clients
Tier-1 ISP
ISP-1 ISP-2
111
Improving HTTP Performance: Caching: Where?
v Cache documents close to server à decrease server load
v Typically done by content provider
Clients
Backbone ISP
ISP-1 ISP-2
Server
Reverse proxies
112
Improving HTTP Performance: Reverse Proxies
v Cache documents close to clients à reduce network traffic and decrease latency
v Typically done by ISPs or enterprises
Clients
Backbone ISP
ISP-1 ISP-2
Server
Reverse proxies
Forward proxies
113
Improving HTTP Performance: Forward Proxies
v Replicate popular Web site across many machines § Spreads load on servers § Places content closer to clients § Helps when content isn’t cacheable
v Problem:
§ Want to direct client to particular replica • Balance load across server replicas • Pair clients with nearby servers
§ Expensive
v Common solution: § DNS returns different addresses based on client’s geo
location, server load, etc.
Improving HTTP Performance: Replication
114
v Caching and replication as a service v Integrate forward and reverse caching functionality v Large-scale distributed storage infrastructure (usually)
administered by one entity § e.g., Akamai has servers in 20,000+ locations
v Combination of (pull) caching and (push) replication § Pull: Direct result of clients’ requests § Push: Expectation of high access rate
v Also do some processing § Handle dynamic web pages § Transcoding § Maybe do some security function – watermark IP
115
Improving HTTP Performance: CDN
v Akamai creates new domain names for each client § e.g., a128.g.akamai.net for cnn.com
v The CDN’s DNS servers are authoritative for the new
domains (will become apparent when we study DNS)
v The client content provider modifies its content so that embedded URLs reference the new domains. § “Akamaize” content § e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://
a128.g.akamai.net/image-of-the-day.gif
v Requests now sent to CDN’s infrastructure…
116
Improving HTTP Performance: CDN
Cost-Effective Content Delivery
v General theme: multiple sites hosted on shared physical infrastructure § efficiency of statistical multiplexing § economies of scale (volume pricing, etc.) § amortization of human operator costs
v Examples:
§ Web hosting companies § CDNs § Cloud infrastructure
117
What about HTTPS?
v HTTP is insecure v HTTP basic authentication: password sent using
base64 encoding (can be readily converted to plaintext)
v HTTPS: HTTP over a connection encrypted by Transport Layer Security (TLS)
v Provides: § Authentication § Bidirectional encryption
v Widely used in place of plain vanilla HTTP
118
What’s on the horizon: HTTP/2 v Standardised in May 2015: RFC 7540 v Improvements
§ Severs can push content and thus reduce overhead of an additional request cycle
§ Fully multiplexed • Requests and responses are sliced in smaller chunks called frames,
frames are tagged with and ID that connects data to the request/response
• overcomes Head-of-line blocking in HTTP 1.1 § Prioritisation of the order in which objects should be sent (e.g.
CSS files may be given higher priority) § Data compression of HTTP headers
• Some headers such as cookies can be very long • Repetitive information
119
More details: https://http2.github.io/faq/ Demo: https://http2.akamai.com/demo
2. Application Layer: outline
2.1 principles of network applications § app architectures § app requirements
2.2 Web and HTTP 2.3 FTP 2.4 electronic mail
§ SMTP, POP3, IMAP 2.5 DNS
2.6 P2P applications 2.7 socket programming
with UDP and TCP
Self Study
120
FTP: the file transfer protocol file transfer
FTP server
FTP user
interface FTP client
local file system
remote file system
user at host
v transfer file to/from remote host v client/server model
§ client: side that initiates transfer (either to/from remote) § server: remote host
v ftp: RFC 959 v ftp server: port 21
Self Study
121
FTP: separate control, data connections
v FTP client contacts FTP server at port 21, using TCP
v client authorized over control connection
v client browses remote directory, sends commands over control connection
v when server receives file transfer command, server opens 2nd TCP data connection (for file) to client
v after transferring one file, server closes data connection
FTP client
FTP server
TCP control connection, server port 21
TCP data connection, server port 20
v server opens another TCP data connection to transfer another file
v control connection: “out of band”
v FTP server maintains “state”: current directory, earlier authentication
Self Study
122
FTP commands, responses sample commands: v sent as ASCII text over
control channel v USER username v PASS password v LIST return list of file in
current directory v RETR filename
retrieves (gets) file v STOR filename stores
(puts) file onto remote host
sample return codes v status code and phrase (as
in HTTP) v 331 Username OK, password required
v 125 data connection already open; transfer starting
v 425 Can’t open data connection
v 452 Error writing file
Self Study
123
Active FTP
v Client connects from port N (N>1023) to FTP server listening on port 20
v Sends a command “PORT N+1” to the FTP server
v Server sends back ACK v FTP server’s port 20 opens a
TCP connection with port N+1 on the client’s host
v Client sends back ACK v Issues with firewalls - client’s
Sys Admin may prevent incoming TCP connections
Self Study
124
Passive FTP
v Client initiates both connections - hence OK with firewalls
v Client connects from port N (N>1023) to FTP server listening on port 20
v Sends a command “PASV” to the FTP server
v FTP server opens a listening socket on some port X (not 20) and replies to the client with “X”
v Client connects to port X v Server sends back ACK
Self Study
125
Summary v Completed Introduction (Chapter 1)
§ Solve Sample Problem Set § Check questions on website
v Application Layer (Chapter 2) § Principles of Network Applications § HTTP § FTP
v Next Week § Application Layer (contd.)
• E-mail • DNS • P2P
§ First Programming assignment will be released
126
Solve all sample problems Reading Exercise Chapter 2: 2.4 – 2.7