4/11/2003 Edward Chow Content Switch 1
Introduction to Content Switch
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
[email protected]
This tutorial is available at http://cs.uccs.edu/~chow/pub/agere/contentswitch.ppt
With agere as login and ag2003ere as password
Outline of the Talk
• Overview of Content Delivery Network and Linux Virtual Server Technologies
• Overview of Content Switching Concepts
• TCP Delayed Binding and Its Improvements
• Conflict Detection in Content Switching Rule Sets
• Persistence Issues
• Problems Encountered in Content Processing and Their Solutions
• Specific Implementations and Their Performance
• Achieving High Availability with Content Switch
Content Delivery Network (CDN)
[Diagram: clients reach the host server across ISP networks (MindSpring, PSINet, Sprint, Gloobix, QWest, @Home, UUnet). Huge request volume leads to slow responses and server crashes.]
Content Delivery Problems
http://www.akamai.com
Use Client Cache/Client Side Cache Server
[Diagram: the same ISP topology with a client cache and a client-side cache server added; the host server sees fewer requests and clients get fast responses.]
Use Mirror Sites
[Diagram: the same ISP topology with mirror sites added; the host server sees fewer requests and clients get fast responses.]
• Needs improvement: guide the selection of mirror servers with server load and network bandwidth measurements.
Edge Network Cache Servers
[Diagram: the same ISP topology with a client cache, a client-side cache server, mirror sites, and edge network cache servers placed inside the ISP networks; the host server sees fewer requests and clients get fast responses.]
Content Delivery Problem
• Cache location problem: where to put cache servers?
• How many are needed?
• When, where, and how to push/deliver the content?
• How about dynamic content?
Akamai Edge Delivery Service
• Peering bottleneck problem: access traffic is evenly spread over 7400+ networks (no one over 5%; most << 1%), so edge servers need to be placed in many networks.
• 11/2000: 4 billion bits/day for 2800 sites.
• Source: http://www.akamai.com
Date      # of Edge Servers   # of Networks   # of Countries
11/2000   6000                335             54
6/2001    9700                650             56
Caching Dynamic Content at Web Proxies
• Active Cache Project [PeiCao 98], Univ. Wisconsin
  – Cache Java applets to be executed at proxies
  – Choice of passing the request to the server, delivering a cached copy, or generating the content dynamically
• Edge Side Include (ESI):
  – XML tags specify ESI fragments in a web page.
  – Each ESI fragment can have different cache/
Edge Side Include Example (http://www.esi.org/)
<table><tr><td colspan="2">
<esi:try>
  <esi:attempt>
    <esi:include src="http://www.myxyz.com/news/top.html" onerror="continue" />
  </esi:attempt>
  <esi:except>
    <!--esi This spot is reserved for your company's advertising. For more info <a href="www.myxyz.com">click here</a> -->
  </esi:except>
</esi:try>
</td></tr></table>
Solution to First Mile Problem
• First mile problem: huge numbers of requests at the web site of a CDN
• High-bandwidth connection
• Caching
  – End system caches
    • Client cache
    • Client-site proxy cache server
    • Mirror site caches
  – Cache servers in the Internet
    • Hierarchical cache servers, e.g., Squid/Harvest/Adaptive Web
    • Edge servers of Akamai
• Faster server/server farm (server-side caching + cluster)
• Layer 4 load balancer + real servers
• Content switch + real servers
• Distributed Packet Rewrite
Web Server Cluster
[Diagram: a load balancer or content switch in front of several real servers.]
Load balancer can run at
• Application level: reverse proxy
• Kernel level: Linux Virtual Server
Load balancer can distribute requests based on
• Layer 3-4 info: fixed fields/fast hash
• Layer 7 info: variable length/slow parsing
Comparison of Load Balancers
• A reverse proxy runs as an application process and requires more memory/packet copying.
• Linux Virtual Server runs in the kernel, with no memory copying.
Name                                  Type   Level          Layer Info
Reverse Proxy/Apache/Tomcat/Servlet   SW     Application    3-7
Linux Virtual Server                  SW     Kernel         3-4
Linux Content Switch                  SW     Kernel/Appl.   3-7
Layer4 Switch (narrow def.)           HW     Embedded OS    3-4
Content/Web Switch                    HW     Embedded OS    3-7
Linux Virtual Server (LVS)
• "Virtual server is a highly scalable and highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, and the users see only a single virtual server" with a Virtual IP address (VIP).
• http://www.linuxvirtualserver.org/
[Diagram: a client (CIP) reaches the VIP over the Internet; the load balancer/director (a Linux box) forwards over a WAN/LAN to Real Server1-3 (RIP1-RIP3).]
CIP: Client IP Address
VIP: Virtual IP Address
RIP: Real Server IP Address
LVS-NAT Configuration (Network Address Translation)
• All return traffic goes through the director, which is slow.
• The director modifies the IP address, port number, and checksum.
• Director and real servers are on the same LAN.
• No modification is needed on the real servers.
• Port remapping: the real web server can run on port 8080.
[Diagram: client (CIP), Internet, director (VIP), switch, Real Server1-3 (RIP1-RIP3).]
LVS-NAT Configuration Step 2. Director Routes Packet
• Based on the CIP, source port number, VIP, and destination port number, the director selects one of the real servers.
• It changes the destination IP address or port number of the packet.
[Diagram: 1. request (CIP -> VIP); 2. scheduling/rewrite packet (CIP -> RIP1), using the LVS routing/scheduling rules set with the ipvsadm command.]
LVS-NAT Configuration Step 3. Real Server Replies
• The real server produces the response.
• All real servers set their default gateway to the director, as in any other NAT or IP masquerade setup.
• The reply packet is sent back to the director.
[Diagram: 1. request (CIP -> VIP); 2. scheduling/rewrite packet (CIP -> RIP1); 3. process request, reply packet (RIP1 -> CIP).]
LVS-NAT Configuration Step 4. Director Rewrites Reply
• The director changes the source IP address (RIP1) of the reply packet to the VIP.
• It modifies the port number if needed.
• It recomputes the checksum and sends the packet back.
[Diagram: 1. request (CIP -> VIP); 2. scheduling/rewrite packet (CIP -> RIP1); 3. process request (RIP1 -> CIP); 4. rewrite reply (VIP -> CIP).]
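The four NAT steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` and `NatDirector` names, the client address 10.0.0.7, and the round-robin placeholder scheduler are hypothetical, and checksum recomputation is not modeled.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

VIP = ("202.103.106.5", 80)                       # virtual service address
REAL_SERVERS = [("172.16.0.2", 80), ("172.16.0.3", 8000)]

class NatDirector:
    """Hypothetical model of the director's address rewriting."""

    def __init__(self):
        self.conn = {}   # (client ip, client port) -> chosen real server
        self.next = 0    # round-robin index, standing in for a real scheduler

    def inbound(self, pkt):
        # Step 2: pick a real server once per connection, rewrite destination.
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.conn:
            self.conn[key] = REAL_SERVERS[self.next % len(REAL_SERVERS)]
            self.next += 1
        rip, rport = self.conn[key]
        return replace(pkt, dst_ip=rip, dst_port=rport)

    def outbound(self, pkt):
        # Step 4: the reply's source (RIP:port) becomes VIP:port again.
        return replace(pkt, src_ip=VIP[0], src_port=VIP[1])

director = NatDirector()
request = Packet("10.0.0.7", 40000, VIP[0], VIP[1])                      # step 1
to_server = director.inbound(request)                                    # step 2
reply = Packet(to_server.dst_ip, to_server.dst_port, "10.0.0.7", 40000)  # step 3
to_client = director.outbound(reply)                                     # step 4
```

Because every reply must pass through `outbound`, the director sits on both halves of the connection, which is why the slides call this configuration slow.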
LVS-NAT Configuration (Network Address Translation)
• All return traffic goes through the director, which is slow.
• The director modifies the IP address, port number, and checksum.
• Director and real servers are on the same LAN.
[Diagram: the complete flow. 1. request (CIP -> VIP); 2. scheduling/rewrite packet (CIP -> RIP1); 3. process request (RIP1 -> CIP); 4. rewrite reply (VIP -> CIP); 5. receive reply.]
LVS-NAT Setup Commands
# Make the director forward the masqueraded packets
echo 1 > /proc/sys/net/ipv4/ip_forward
ipchains -A forward -j MASQ -s 172.16.0.0/24 -d 0.0.0.0/0
# Add virtual services and link a scheduler to each
# wlc: Weighted Least-Connection scheduling
ipvsadm -A -t 202.103.106.5:80 -s wlc
# wrr: Weighted Round Robin scheduling
ipvsadm -A -t 202.103.106.5:21 -s wrr
# Add real servers and select forwarding method and weight
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.2:80 -m
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.3:8000 -m -w 2
ipvsadm -a -t 202.103.106.5:21 -R 172.16.0.2:21 -m
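To give a feel for what the wlc scheduler selected above does, here is a simplified Python sketch of weighted least-connection selection. It assumes the scheduler picks the server with the smallest ratio of active connections to weight; the kernel implementation may count connections differently, and the connection counts below are made up for illustration.

```python
def pick_wlc(servers):
    """Return the server with the lowest active/weight ratio.

    servers: list of dicts with 'addr', 'weight', and 'active' keys.
    Servers with weight 0 are skipped.
    """
    candidates = [s for s in servers if s["weight"] > 0]
    best = candidates[0]
    for s in candidates[1:]:
        # Compare a/wa < b/wb without division: a*wb < b*wa.
        if s["active"] * best["weight"] < best["active"] * s["weight"]:
            best = s
    return best

# Weights follow the setup commands: default 1 for 172.16.0.2, "-w 2" for 172.16.0.3.
servers = [
    {"addr": "172.16.0.2:80", "weight": 1, "active": 3},
    {"addr": "172.16.0.3:8000", "weight": 2, "active": 4},
]
chosen = pick_wlc(servers)
```

With these numbers the second server wins (4/2 is lower than 3/1) even though it holds more connections in absolute terms, which is the point of weighting.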
LVS-Tunnel Configuration (IP Tunneling)
• Real servers need to handle IP-over-IP packets.
• Real servers can be geographically separated, and return traffic can go through different routes.
• Security implications!
[Diagram: client (CIP), Internet, load balancer (Linux box, VIP, RIP0), and Real Server1-3 (RIP1-RIP3) connected by IP tunnels. 1. request (CIP -> VIP); 2. scheduling/put packet in IP tunnel (RIP0 -> RIP2 carrying CIP -> VIP); 3. process request; 4. receive reply (VIP -> CIP).]
LVS-Tunnel Setup Commands
# The load balancer (LinuxDirector), kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i
# The real server 1, kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
# Insert the ipip module if it is compiled as a module
insmod ipip
ifconfig tunl0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev tunl0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden
LVS-DR Configuration (Direct Routing)
• Real servers need to configure a non-ARP alias interface with the virtual IP address, and that interface must share the same physical segment with the load balancer.
• Only the director's interface replies to ARP requests for the VIP.
• The director only rewrites the server MAC address; the IP packet is not changed, which is fast.
[Diagram: client (CIP), Internet, router/switch, director (VMAC), Real Server1-3 (RMAC1-RMAC3). 1. request: frame GMAC -> VMAC carrying CIP -> VIP; 2. scheduling/rewrite packet: frame VMAC -> RMAC3 carrying CIP -> VIP.]
GMAC: Gateway MAC address
LVS-DR Configuration Step 3. Process Request
• The real server returns the reply.
• The reply goes directly through the switch/router, not the director.
[Diagram: 1. request: frame GMAC -> VMAC carrying CIP -> VIP; 2. scheduling/rewrite packet: frame VMAC -> RMAC3 carrying CIP -> VIP; 3. process request: frame RMAC3 -> GMAC carrying VIP -> CIP; 4. receive reply (VIP -> CIP).]
GMAC: Gateway MAC address
LVS-DR Setup Commands
# The load balancer (LinuxDirector), kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
# The real server 1, 172.26.20.112, kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
Performance of LVS-based Systems
“We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.” Jerry Glomph Black, Director, Internet & Technical Operations, RealNetworks.
“I had basically (1024) four class-Cs of virtual servers which were loadbalanced through a LinuxDirector (two, actually -- I used redundant directors) onto four real servers which each had the four different class-Cs aliased on them.” "Ted Pavlic" <[email protected]>
LVS Usage Survey 2/15/2001 Lorn Key
Clusters                  20                     1            2               2               2
Directors per cluster     2                      2            2               2               2
Total real servers        170                    12           4               15              6
Routing methods           DR/NAT                 DR           NAT             DR              NAT
Schedule methods          RR/WLC                 WRR          LC              WLC             WLC
Types of real servers     RH6.2                  Linux        Win, Linux      Linux, Solaris  RH
Service offered           WWW                    WWW/other    WWW, DB         WWW, SMTP       WWW
File system replication   rsync                  rsync        Coda, NFS       Custom          rsync, custom
Monitoring software       Heartbeat, ldirectord  Nanny/Pulse  Heartbeat, Mon  Nanny, Pulse    Heartbeat
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
Sponsored by Computer Comm. Lab/ITRI
Content Switch Topics
• What is a Content Switch?
• What Services It Can Provide
• Content Switch Example
• Related Technologies
• Content Switch Architecture and Basic Operations
• TCP Delay Binding and Related Improvement
• Content Switch Rule and Conflict Detection
• Conclusion
Content Switch (CS)
• Routes packets based on higher-layer (Layer 5/7) headers and content.
• Examples:
  – Direct Web traffic based on patterns of
    • URLs, cookies: URL switching
    • XML tag values: Web switching
  – Route incoming email based on email address; connect POP/IMAP based on login
• Web switches and the Intel XML Director/Accelerator are special cases of content switches.
What Services It Can Provide
• Enabling premium services for e-commerce, ISP, and Web hosting providers
• Load balancing and highly available server clusters: Web, e-commerce, email, computing, file, SAN
• Policy-based networking, differential/QoS services
• Firewall, strengthening DoS protection, cache/firewall load balancing
• "Flash-crowd" management
• Email spam protection, virus detection/removal
• Applet authentication/filtering
F5 VRM Solution
[Diagram: F5 components (BIG-IP, 3-DNS, GLOBAL-SITE) connecting a user (london.domain.com), the local DNS, the webmaster, routers, and server arrays at Site I (newyork.domain.com), Site II (losangeles.domain.com), and Site III (tokyo.domain.com) over the Internet.]
ServerIron 100 Web Switch
• Integrated Layer 2 through Layer 7 switching
• Support for up to 7,000,000 concurrent sessions and 20 Gbps of throughput
• High-availability server load balancing with active/active configuration and stateful fail-over
• Industry's most powerful content switching capabilities, including URL, cookie, and SSL session ID based switching
• Content-aware cache switching
• High-performance VPN/firewall load balancing
• Robust protection against Denial of Service (DoS) attacks
• Most comprehensive global server load balancing with DNS proxy and client proximity measurements
Cisco CSS11000 Content Service Switch
The switch comprises four high-speed RISC processors with 512 MB of memory and 20.0 Gbps of throughput. Distributed flow forwarding engines feature up to 16 port-level network processors with up to 128 MB of memory for wire-speed delivery of Web content. Support for "sticky" connections based on IP address, Secure Socket Layer (SSL) session ID, and cookies ensures reliability and security for e-commerce transactions. The unique Cisco content replication technology enables dynamic expansion of site capacity in response to sudden "flash crowds" for "hot" content or seasonal peaks in traffic that can overwhelm servers.
Nortel Alteon Web Switch
• Provides wire-speed Layer 2/3 Ethernet switching, plus high-speed processing based on Layer 4 through 7 information (TCP ports, URLs, HTTP headers and cookies, SSL session ID, etc.)
• Processes hundreds of thousands of concurrent sessions each second on eight multi-rate Ethernet ports (rate selectable per port), with one Gigabit or 100/1000 Mbps Ethernet uplink port
• Performs local and global server load balancing, application redirection, content filtering, streaming media load balancing, wireless Internet load balancing and content-aware Layer 7 switching
• Filters packets based on up to 2048 filtering rules (224 filtering rules for Alteon AD3/180e Web Switches), uniquely definable per switch and per port
• Meters, controls, and accounts for bandwidth use (by client, server farm, virtual service, application, user class, content type, and other traffic classes) and supports guaranteed minimum, metered available, and maximum burst bandwidth rates
Intel Netstructure XML Director 7280
• Example of a rule:
  Server1: create */order.asp & //Amount[Value >= 10000]
Phobos In-Switch
• Only load-balancing switch in a PCI card form factor
• Plugs directly into any server PCI slot
• Supports up to 8,192 servers, ensuring availability and maximum performance
• Six different algorithms are available for optimum performance: Round Robin, Weighted Percentage, Least Connections, Fastest Response Time, Adaptive and Fixed.
• Provides failover to other servers for high-availability of the web site
• U.S. Retail $1995.00
E-Commerce Example: 1. Client
Client submits via HTTP/Post (or SOAP) the following purchase in XML:
<purchase>
  <customerName>CCL</customerName>
  <customerID>111222333</customerID>
  <item>
    <productID>309121544</productID>
    <productName>IBM Thinkpad T21</productName>
    <unitPrice>5000</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>50000</subTotal>
  </item>
  <item>
    <productID>309121538</productID>
    <productName>Intel wireless LAN PC Card</productName>
    <unitPrice>200</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>2000</subTotal>
  </item>
  <totalAmount>52000</totalAmount>
</purchase>
E-Commerce Example: 2. Content Switch
• The content switch receives the packet.
• It recognizes an HTTP POST request from the HTTP request line:
  POST /purchase.cgi HTTP/1.1
• It recognizes an XML document from the meta header:
  content-type: TEXT/XML
• It parses the XML content.
• It extracts the values of tag sequences:
  purchase/totalAmount    52000
  purchase/customerName   CCL
• Rule 1 is matched and the packet is routed to one of the highSpeedServers.
  Rule 1: if (xml.purchase/totalAmount > 5000) routeTo(highSpeedServers);
  Rule 2: if (xml.purchase/customerName == CCL) routeTo(specialCustomerServers);
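The extraction and rule-matching steps above can be approximated with Python's standard XML parser. This is a sketch, not the content switch's implementation: `extract`, `route`, and the `defaultServers` fallback are hypothetical names, and the document is a shortened version of the purchase example.

```python
import xml.etree.ElementTree as ET

PURCHASE = """<purchase>
  <customerName>CCL</customerName>
  <totalAmount>52000</totalAmount>
</purchase>"""

def extract(doc, tag_sequences):
    """Return {tag sequence: text} for sequences like 'purchase/totalAmount'."""
    root = ET.fromstring(doc)
    values = {}
    for seq in tag_sequences:
        first, _, rest = seq.partition("/")
        if root.tag != first:
            continue                      # sequence must start at the root tag
        node = root.find(rest) if rest else root
        if node is not None:
            values[seq] = (node.text or "").strip()
    return values

def route(values):
    # Rule 1 and Rule 2 from the slide, checked in strict priority order.
    if float(values.get("purchase/totalAmount", 0)) > 5000:
        return "highSpeedServers"
    if values.get("purchase/customerName") == "CCL":
        return "specialCustomerServers"
    return "defaultServers"               # hypothetical fallback, not in the slide

values = extract(PURCHASE, ["purchase/totalAmount", "purchase/customerName"])
target = route(values)
```

Both rules match this request; because Rule 1 is checked first, the request goes to `highSpeedServers`, as the slide states.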
No Free Lunch: Penalty of Having Content Switch
• Increased packet processing time.
• An XML Director/Accelerator needs to parse the XML document and match tag sequences: 1-3? orders of processing time.
                           Layer 4 Switching    Layer 7 Switching
packet header extraction   fixed short fields   varying-length long fields
switch rule matching       hash table lookup    pattern matching
Size of XML Document (Bytes)   XML Content Extract Time (ms)
600                            14
7000                           21
67104                          53
Related Technologies
• Application-level solution: proxy server; Apache/Tomcat/Servlet; Microsoft NLB
• Kernel-level Layer 4 load balancing solution: http://www.linuxvirtualserver.org/
  – Joseph Mark’s presentation
  – LVS-NAT (Network Address Translation) web page
  – LVS-IP Tunnel web page
  – LVS-DR (Direct Routing) web page
• Hardware solution: Cisco 11000, F5 (Big IP), Alteon Web Systems, Foundry Networks (ServerIron). Excellent information in: Foundry ServerIron Installation and Configuration Guide, May 2000. http://www.foundrynet.com/services/documentation/siug/
Basic Operations of Content Switching
[Diagram: incoming packets -> header/content extraction -> packet classification (CS rule matching algorithm, using CS rules maintained with the CS rule editor) -> packet routing/load balancing (using network path info and server load status) -> forward packet to servers.]
CS: Content Switching
Content Switch Architecture
[Figure from Apostolopoulos, Infocom 2000]
Content Switch Architecture
[Diagram: client, controller with hash table, Real Server1.]
Case A: The controller finds an entry in its hash table and routes the request to the “sticky connection” outgoing port.
Content Switch Architecture
[Diagram: client, controller with hash table, content switch processor with CS rules, Real Server1.]
Case B:
Step 1. The controller finds no entry in its hash table and routes the request to the content switch processor.
Step 2. The CS processor
  a. extracts content and matches it against the CS rules,
  b. routes the request,
  c. sets up sequence number modification on the server-side port (packet modification info).
Step 3. At the server-side port, return packets have their sequence number, IP address, and checksum modified and are routed back to the client.
Efficient Content Switching Architecture
• Tasks: millions of packets, with thousands of rules to match and load balancing algorithms to run.
• How to assign tasks to the (network) processors and threads?
  – Packet extraction (understand header formats, XML parsing)
  – Content switching rule matching
  – Packet routing (load balancing, bandwidth control)
• How much packet processing should controllers do?
• What can a controller do?
• A typical parallel processing problem?
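TCP Delay Binding (Splicing)
[Message sequence among client, content switch, and server]
step1   client -> content switch: SYN(CSEQ)
step2   content switch -> client: SYN(DSEQ) ACK(CSEQ+1)
step3   client -> content switch: ACK(DSEQ+1)
step4   client -> content switch: DATA(CSEQ+1) ACK(DSEQ+1)
step5   content switch -> server: SYN(CSEQ)
step6   server -> content switch: SYN(SSEQ) ACK(CSEQ+1)
step7   content switch -> server: ACK(SSEQ+1)
step8   content switch -> server: DATA(CSEQ+1) ACK(SSEQ+1)
step9   server -> content switch: DATA(SSEQ+1) ACK(CSEQ+lenR+1); forwarded to client as DATA(DSEQ+1) ACK(CSEQ+lenR+1)
step10  client -> content switch: ACK(DSEQ+lenD+1); forwarded to server as ACK(SSEQ+lenD+1)
step11  client: DATA(?) 2nd request ACK(?)
lenR: size of http request. lenD: size of return document.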
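The sequence number conversion that delayed binding forces on the content switch comes down to a fixed offset between DSEQ (chosen by the switch) and SSEQ (chosen by the server). A minimal Python sketch, assuming 32-bit wraparound arithmetic and a hypothetical `Splice` class:

```python
MOD = 2 ** 32   # TCP sequence numbers wrap at 32 bits

class Splice:
    """Per-connection translation between the client-side and server-side halves."""

    def __init__(self, dseq, sseq):
        # Offset between what the client was told (DSEQ) and what the server uses (SSEQ).
        self.delta = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq):
        # Server data numbered from SSEQ must appear to the client as numbered from DSEQ.
        return (seq + self.delta) % MOD

    def client_to_server_ack(self, ack):
        # Client ACKs refer to DSEQ-based numbers; the server expects SSEQ-based ones.
        return (ack - self.delta) % MOD

# Example values are arbitrary; SSEQ is deliberately larger than DSEQ to exercise wraparound.
splice = Splice(dseq=1000, sseq=4_000_000_000)
first_data_seq = splice.server_to_client_seq(4_000_000_000 + 1)   # DATA(SSEQ+1) -> DATA(DSEQ+1)
final_ack = splice.client_to_server_ack(1000 + 500 + 1)           # ACK(DSEQ+lenD+1), lenD = 500
```

The client's own sequence numbers (CSEQ) are reused on the server side in steps 5 and 8, so only the server-to-client sequence field and the client-to-server acknowledgment field need rewriting, followed by a checksum update.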
Improve Content Switching
• Set up connections between the CS and the real servers ahead of time (persistent HTTP connections; NetScale). This reduces TCP three-way handshake time.
• Pre-allocate server scheme (guess the real server based on the TCP SYN).
• Sequence number modification on every return packet; the checksum also needs to be recomputed.
• Filter scheme (offload sequence number modification/rule matching to real servers).
• Buffering/pipelining (aggregating) requests
Pre-Allocate Server Scheme
[Message sequence among client, content switch, and pre-allocated server; the content switch relays each message unchanged]
step1   client -> server: SYN(CSEQ)
step2   server -> client: SYN(SSEQ) ACK(CSEQ+1)
step3   client -> server: ACK(SSEQ+1)
step4   client -> server: DATA(CSEQ+1) ACK(SSEQ+1)
step5   server -> client: DATA(SSEQ+1) ACK(CSEQ+lenR+1)
step6   client -> server: ACK(SSEQ+lenD+1)
• Guess the routing decision based on IP/port number/history
• Advantages:
  • Faster than TCP delay binding
  • Possible direct route between client and server
  • Reduced session processing overhead: no need to convert server sequence numbers
Degenerated to TCP Delayed Binding If Guess is Wrong
[Diagram: message sequence in 12 steps among client, content switch, pre-allocated server, and right server.
With the pre-allocated server: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1); DATA(SSEQ+1) and FIN(CSEQ+lenR+1), where the server sent HTTP 404.
With the right server: SYN(CSEQ); SYN(RSEQ)/ACK(CSEQ+1); ACK(RSEQ+1); DATA(CSEQ+1)/ACK(RSEQ+1); DATA(RSEQ+1)/ACK(CSEQ+lenR+1), delivered to the client as DATA(SSEQ+1)/ACK(CSEQ+lenR+1); the client's ACK(SSEQ+lenD+1) is delivered as ACK(RSEQ+lenD+1).]
Sequence number conversion is now needed for the right server.
Filter Process Scheme
[Diagram: message sequence in steps 1-10 among client, content switch, and server; a filter process runs on the server.
Client side: SYN(CSEQ); SYN(DSEQ)/ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1)/ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1).
Server side: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); step 5b: Migrate(Data, CSEQ, DSEQ); DATA(CSEQ+1)/ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1).]
Pre-allocate performance plot
[Plot: response time (microseconds) vs. document size (bytes), Series 1-4.]
Figure 3. Performance of Pre-allocate Server Scheme
Series 1 - Basic scheme with no rule matching module inserted, i.e., using default IPVS.
Series 2 - Basic scheme with the rule matching module inserted.
Series 3 - Pre-allocate scheme with all hits, i.e., where all pre-allocate guesses were correct.
Series 4 - Pre-allocate scheme with all misses, i.e., where all pre-allocate guesses were wrong.
Handling Multiple Requests in a Keep-Alive Connection
• Determine when a new request arrives
  – Verify that the previous request has been completely received
  – Request data size is > 0
• Key assumption: the client sends only one outstanding request at a time, i.e., requests are not pipelined
• Reuse connections
  – Store each connection's control information in a hash table keyed by real server address once it is established.
Quiz
• Web server keeps the TCP connection alive, expecting the browser to return for images and in-line media files.
• How many keep-alive connections are setup on IE5 and Netscape 4.7 for web page with many .jpg/.gif images?
• Can these image requests be pipelined from client browser to web server?
Multiple HTTP Requests from One TCP Connection
• A keep-alive TCP connection may include multiple HTTP “GET” requests.
• The content switch examines each “GET” request and makes a new routing decision.
• The content switch establishes another connection with a different server based on the routing decision.
• The HTTP responses from different servers need to be interleaved and seen by the user as if they came from the same server.
• Solutions: in-order delivery (buffer requirement); out-of-order delivery (seq# tracking)?
• Problems: should we throw away earlier HTML requests if we receive later requests?
[Diagram (NAT approach): a client requests Index.htm, cs.jpg, rocky.mid, and uccs.gif through the content switch, which distributes them to server1, server2, ..., server9.]
Multiple HTTP Requests from One TCP Connection
• Can servers return documents directly to the client in the keep-alive session case?
• Can an equivalent of VS-Tunnel or VS-DR be implemented using a content switch?
[Diagram: a client requests cs.gif, rocky.mid, and uccs.jpg through the content switch, which distributes them to server1, server2, ..., server9.]
Content Switch Rule Survey
Survey shows that existing switches support
• rules in basic (condition action) or (action condition) form
• some define the condition as a class, then specify the action in a separate statement or command
• simple single conditional terms
• a command line interface (to facilitate incremental update?)
• actions that can include reject, forward, put in queue (for bandwidth control, scheduling)
Content Switch Rule Design
• Rule syntax is generic, to support all intended features.
• Use simple C if-statement syntax:
  rule: if (condition) { action }
  – Easy to read
  – Allows optimization using a C compiler
• A condition consists of multiple terms of the form
  – variable relational_operator value, e.g.
      xml.purchase/totalAmount > 50000
      smtp.to == “[email protected]”
      cookie.name == “servlet1”
      bitmatch(64, 8, 0xff) == 64    # means TTL=64; idea from the netfilter universal filter
  – suffix(variable, string), e.g. suffix(url, “gif”)
  – regex(variable, pattern), e.g. regex(url, “/purchase”)
• An action consists of reject, forward(server | queue), or loadBalance(serverGroup, loadBalancingAlgorithm)
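As an illustration of how such rules might be evaluated, the following Python sketch models `suffix` and `regex` conditions and a first-match rule table. It is not the C-based rule compiler the slide proposes; the server and group names (`imageServers`, `orderServer`, `servlet1Server`) and the default `reject` action are assumptions made for the example.

```python
import re

def suffix(value, s):
    """suffix(variable, string): true if the variable ends with the string."""
    return value.endswith(s)

def regex(value, pattern):
    """regex(variable, pattern): true if the pattern occurs in the variable."""
    return re.search(pattern, value) is not None

# Each rule pairs a condition over the extracted request fields with an action.
RULES = [
    (lambda r: suffix(r["url"], "gif"), ("loadBalance", "imageServers", "roundRobin")),
    (lambda r: regex(r["url"], "/purchase"), ("forward", "orderServer")),
    (lambda r: r.get("cookie.name") == "servlet1", ("forward", "servlet1Server")),
]

def match(request, rules=RULES, default=("reject",)):
    """Brute-force, strict-priority matching: the first rule that fires wins."""
    for condition, action in rules:
        if condition(request):
            return action
    return default

action = match({"url": "/purchase.cgi"})
```

Here `action` is `("forward", "orderServer")`; a request for `/img/logo.gif` would instead be load-balanced across the image group.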
Efficient CS Rule Matching
• Brute force, strict priority: rules are executed sequentially.
• Efficient rule matching method:
  – Organize rules so that rules can be skipped based on existing content types.
  – Utilize compiler optimization techniques.
Simple CS Rule Editor GUI
Conflict Detection on Content Switching Rules
• Detect conflicts among rules or rule sets.
• Absolute conflict type:
  r1: if (xml.purchase/customerName == “CCL”) {routeTo(r1)}
  r2: if (xml.purchase/customerName == “CCL”) {routeTo(r2)}
• Potential conflict type:
  r1: if (xml.purchase/totalAmount > 5000) {routeTo(quickServers)}
  r2: if (xml.purchase/totalAmount > 20000) {routeTo(superServers)}
• Algorithm: build a tree for rules with the same variable, check the operator and value to see whether they are the same or lead to a potential conflict, and compare the actions to decide the conflict type or duplication.
• Developed a conflict detection algorithm for rules with multiple-term conditions. It can be applied to conflict detection in policy-based rules.
• The editor can build these trees while a user enters rules and warn about conflicts right away.
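The two conflict types can be illustrated with a pairwise check over single-term rules. This Python sketch is a simplification and not the tree-based algorithm described above: it assumes rules are `(variable, operator, value, action)` tuples, supports only `>`, `<`, and `==`, and decides overlap by testing a few witness values.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def witnesses(a, b):
    """Values that are enough to detect overlap for the three operators above."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return [a, b, (a + b) / 2, min(a, b) - 1, max(a, b) + 1]
    return [a, b]   # non-numeric values: only == is meaningful in this sketch

def classify(r1, r2):
    var1, op1, val1, act1 = r1
    var2, op2, val2, act2 = r2
    if var1 != var2:
        return "none"
    if (op1, val1) == (op2, val2):
        # Identical conditions: same action is a duplicate, otherwise a sure conflict.
        return "duplicate" if act1 == act2 else "absolute conflict"
    overlap = any(OPS[op1](x, val1) and OPS[op2](x, val2) for x in witnesses(val1, val2))
    if overlap and act1 != act2:
        return "potential conflict"
    return "none"

absolute = classify(("xml.purchase/customerName", "==", "CCL", "r1"),
                    ("xml.purchase/customerName", "==", "CCL", "r2"))
potential = classify(("xml.purchase/totalAmount", ">", 5000, "quickServers"),
                     ("xml.purchase/totalAmount", ">", 20000, "superServers"))
```

For the slide's examples, `absolute` is `"absolute conflict"` and `potential` is `"potential conflict"`: any amount above 20000 satisfies both rules, so the outcome depends on rule order.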
XML Tag Value Extraction
• A function, xmlContentExtract(), was built to extract the tag values of a list of unique tag sequences.
• It is based on Clark Cooper’s expat 1.0 XML parser.
• Its arguments include a pointer to an XML document, a pointer to the array of strings (unique XML tag sequences; we follow the XSL selector syntax), and the number of sequences.
• It returns a list of structure nodes, each with the tag sequence, its attribute, and its value.
• Currently, it supports one attribute, and each tag sequence needs to be unique.
Persistence Handling in LVS
• Some network applications require packets from the same users/sessions to be routed to the same real servers.
  – For consistent treatment?
  – For fast performance, e.g., servers maintain persistent data/info for sessions
• A Tomcat web server returns a cookie value so that returning client requests can be routed to the same Tomcat web server.
• But the cookie value is in the HTTP header, which is Layer 7 info. A Layer 4 switch cannot access it.
• This is the so-called persistence handling problem.
• One solution: sticky connections. The same IP address is served by the same server.
Persistence Handling Problems
FTP case:
• Normally FTP uses port 21 for control and port 20 for data.
• But for passive FTP, the server tells the client the port that it listens on. The client initiates the data connection to that port.
• For LVS/TUN and LVS/DR, the LinuxDirector is only on the client-to-server half of the connection, so it is impossible for the LinuxDirector to get the data port from the packet that goes directly to the client.
SSL session case:
• Port 443 is used for secure Web servers and port 465 for secure mail servers.
• The key for the connection must be chosen/exchanged, and only the initial real server has the key.
• A persistent or sticky connection is needed.
Persistent Connection Solution
• When the client first accesses the service, the LinuxDirector creates a template between the given client and the selected server, then creates an entry for the connection in the hash table.
• Connections for any port from the client will be sent to the same server until the template expires.
• The template expires after a configurable time, and it won't expire until all its connections expire.
• The timeout of persistent templates can be configured by users; the default is 300 seconds.
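A persistence template table of this kind can be sketched as a dictionary keyed by client address. The `PersistenceTable` class below is hypothetical, takes the current time as an explicit argument for clarity, and assumes that each new connection refreshes the template; it does not model the rule that a template outlives its open connections.

```python
class PersistenceTable:
    """Maps a client IP to its sticky real server until the template times out."""

    def __init__(self, timeout=300.0):           # 300 s is the default from the slide
        self.timeout = timeout
        self.templates = {}                       # client ip -> (server, expiry time)

    def lookup(self, client_ip, now, schedule):
        entry = self.templates.get(client_ip)
        if entry and entry[1] > now:
            server = entry[0]                     # template still valid: stay sticky
        else:
            server = schedule()                   # no template or expired: pick anew
        # Assumption: every new connection restarts the template timer.
        self.templates[client_ip] = (server, now + self.timeout)
        return server

pool = iter(["rs1", "rs2"])                       # stand-in for the real scheduler
table = PersistenceTable()
first = table.lookup("10.0.0.7", now=0, schedule=lambda: next(pool))
second = table.lookup("10.0.0.7", now=100, schedule=lambda: next(pool))
third = table.lookup("10.0.0.7", now=500, schedule=lambda: next(pool))
```

The first two lookups return the same server because the second arrives within 300 seconds; the third arrives after the refreshed template (valid until t=400) has expired, so the scheduler is consulted again.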
Problems Encountered in the Design of Linux-based Content Switch
• Handle a Request Contained in Multiple Packets
• Handle Different Data Encoding Methods
• Allow Referencing Specific XML Tags
• Handle Long Transactions in SSL and Email Network Services
Handle a Request Contained in Multiple Packets
• For a long request, the headers and content will be carried by multiple packets due to the packet size limitation.
• We have observed Netscape 4.7 splitting a short request (<1000 bytes) into two packets.
• Due to interleaving with other sessions, packets of the same session may not be allocated consecutive memory.
• Even if packets of the same session arrive without being interleaved with packets of other sessions, application-level data will be fragmented in kernel packet buffers such as skbuf.
• Matching application data patterns in the kernel is tricky.
4/11/2003 Edward Chow Content Switch 71
Example: Determine Content Length
TCP Segment n contains:
POST /cgi-bin/cs622/purchase.pl HTTP/1.0\r\n
Referer: http://archie.uccs.edu/~acsd/lcs/xmldemo.html\r\n
Connection: Keep-Alive\r\n
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.16-22enterprise i686)\r\n
Host: viva.uccs.edu\r\n
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*\r\n
Accept-Encoding: gzip\r\n
Accept-Language: en\r\n
Accept-Charset: iso-8859-1,*,utf-8\r\n
Content-type: application/x-www-form-urlencoded\r\n
Content-length: 7
TCP Segment n+1 contains:
53\r\n
data (753 bytes)
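One way to cope with a header value split across segments, as in the example above, is to keep a small amount of parser state between segments instead of matching on one buffer. The following is a minimal sketch of that idea in user-space C; the names cl_scanner, cl_init and cl_feed are hypothetical and this is not the LCS kernel code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum cl_state { CL_MATCH_NAME, CL_SKIP_LINE, CL_VALUE, CL_DONE };

/* Parser state that survives between TCP segments. */
struct cl_scanner {
    enum cl_state state;
    size_t matched; /* characters of the header name matched on this line */
    long value;     /* digits of the Content-length value seen so far */
};

static const char CL_NAME[] = "content-length:";

void cl_init(struct cl_scanner *s)
{
    s->state = CL_MATCH_NAME;
    s->matched = 0;
    s->value = 0;
}

/* Feeds the payload of one segment. Returns 1 once the complete value
 * (terminated by CR or LF) has been seen, 0 if more data is needed. */
int cl_feed(struct cl_scanner *s, const char *data, size_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    for (size_t i = 0; i < len && s->state != CL_DONE; i++) {
        unsigned char c = (unsigned char)data[i];
        switch (s->state) {
        case CL_MATCH_NAME: /* compare the line start with the header name */
            if (tolower(c) == CL_NAME[s->matched]) {
                if (CL_NAME[++s->matched] == '\0')
                    s->state = CL_VALUE;
            } else if (c == '\n') {
                s->matched = 0;
            } else {
                s->state = CL_SKIP_LINE;
            }
            break;
        case CL_SKIP_LINE:  /* some other header: skip to end of line */
            if (c == '\n') {
                s->state = CL_MATCH_NAME;
                s->matched = 0;
            }
            break;
        case CL_VALUE:      /* accumulate digits; blanks are ignored */
            if (isdigit(c))
                s->value = s->value * 10 + (c - '0');
            else if (c == '\r' || c == '\n')
                s->state = CL_DONE;
            break;
        case CL_DONE:
            break;
        }
    }
    return s->state == CL_DONE;
}
```

Fed the two segments from the example, the scanner holds the partial value 7 after segment n and reports 753 only after segment n+1 supplies "53\r\n". The sketch ignores overflow and malformed values.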
4/11/2003 Edward Chow Content Switch 72
Potential Solutions
• Allocate the application data of a session in consecutive memory. This requires major rework of most kernel packet buffer allocation schemes.
• Use carry lookahead memory hardware.
• Write more complicated pattern matching code that can match patterns over fragmented data.
• Use application-level content switching and bear the overhead of copying data from the kernel to the application level.
4/11/2003 Edward Chow Content Switch 73
Handle Different Data Encoding Methods
• XML data can be passed as plain text (text/plain).
• When it is submitted with a form, the XML request data are encoded using the application/x-www-form-urlencoded method.
• When extracting XML data for rule matching, the data encoding method needs to be detected through the Content-type header.
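As an illustration of the detection and decoding step described above, here is a minimal C sketch. The function names is_form_urlencoded and form_urldecode are hypothetical; the decoding rules used ('+' becomes a space, %XX becomes the byte with that hex value) are the standard ones for application/x-www-form-urlencoded data.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the Content-type value starts with the form-encoding type. */
int is_form_urlencoded(const char *content_type)
{
    static const char want[] = "application/x-www-form-urlencoded";
    assert(content_type != NULL);
    for (size_t i = 0; want[i] != '\0'; i++)
        if (tolower((unsigned char)content_type[i]) != want[i])
            return 0;
    return 1;
}

static int hex_value(unsigned char c)
{
    if (isdigit(c))
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decodes in[0..in_len) into out (NUL-terminated).
 * Returns the decoded length, or -1 on bad input or too small a buffer. */
long form_urldecode(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t o = 0;
    assert(in != NULL && out != NULL && out_cap > 0);
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == '+') {
            c = ' ';
        } else if (c == '%') {
            int hi, lo;
            if (in_len - i < 3)
                return -1; /* truncated escape */
            hi = hex_value((unsigned char)in[i + 1]);
            lo = hex_value((unsigned char)in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1; /* not a hex escape */
            c = (unsigned char)(hi * 16 + lo);
            i += 2;
        }
        if (o + 1 >= out_cap)
            return -1;     /* keep room for the terminating NUL */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return (long)o;
}
```

A rule matcher would call form_urldecode on the body only when is_form_urlencoded accepts the Content-type header, and use the body unchanged otherwise.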
4/11/2003 Edward Chow Content Switch 74
An E-Commerce XML Example
Client submits via HTTP/Post (or SOAP) the following purchase in XML:
<purchase>
  <customerName>CCL</customerName>
  <customerID>111222333</customerID>
  <item>
    <productID>309121544</productID>
    <productName>IBM Thinkpad T21</productName>
    <unitPrice>5000</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>50000</subTotal>
  </item>
  <item>
    <productID>309121538</productID>
    <productName>Intel wireless LAN PC Card</productName>
    <unitPrice>200</unitPrice>
    <noOfUnits>10</noOfUnits>
    <subTotal>2000</subTotal>
  </item>
  <totalAmount>52000</totalAmount>
</purchase>
4/11/2003 Edward Chow Content Switch 75
Allow Referencing Specific XML Tags
• An ambiguous XML tag sequence specification can match multiple instances.
• To avoid that and to speed up matching, we propose an XML tag sequence specification that pins down one specific XML tag sequence.
• For example, to specify a rule based on the subTotal value in the second item tag within the first purchase tag, the condition of the rule is written as “purchase:1.item:2.subTotal > 5000”.
• As another example, “purchase:2.totalAmount < 15000” specifies the condition of a rule based on the totalAmount tag present within the second purchase tag.
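The tag sequence notation above can be resolved with a simple scan. The following C sketch is an illustration only, under stated assumptions: tags carry no attributes, a tag is not nested inside a tag of the same name, and a step without ":n" means the first instance. The names xml_tag_lookup and xml_rule_greater are hypothetical and not taken from the LCS code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Finds the first occurrence of needle inside [begin, end). */
static const char *find_in(const char *begin, const char *end, const char *needle)
{
    size_t n = strlen(needle);
    for (const char *p = begin; p + n <= end; p++)
        if (memcmp(p, needle, n) == 0)
            return p;
    return NULL;
}

/* Resolves a spec such as "purchase:1.item:2.subTotal" against xml.
 * On success returns 0 and sets [*out_begin, *out_end) to the tag content. */
int xml_tag_lookup(const char *xml, const char *spec,
                   const char **out_begin, const char **out_end)
{
    const char *begin = xml, *end = xml + strlen(xml);
    assert(xml != NULL && spec != NULL);
    while (*spec != '\0') {
        char name[64], open_tag[72], close_tag[72];
        size_t n = strcspn(spec, ":.");
        long index = 1; /* assumed default when ":n" is omitted */
        if (n == 0 || n >= sizeof name)
            return -1;
        memcpy(name, spec, n);
        name[n] = '\0';
        spec += n;
        if (*spec == ':') {
            char *stop;
            index = strtol(spec + 1, &stop, 10);
            if (stop == spec + 1 || index < 1)
                return -1;
            spec = stop;
        }
        if (*spec == '.')
            spec++;
        else if (*spec != '\0')
            return -1;
        snprintf(open_tag, sizeof open_tag, "<%s>", name);
        snprintf(close_tag, sizeof close_tag, "</%s>", name);
        /* Walk to the index-th instance inside the current range. */
        const char *p = begin;
        for (long k = 1;; k++) {
            const char *o = find_in(p, end, open_tag);
            const char *c = o ? find_in(o, end, close_tag) : NULL;
            if (c == NULL)
                return -1;
            if (k == index) {
                begin = o + strlen(open_tag);
                end = c;
                break;
            }
            p = c + strlen(close_tag);
        }
    }
    *out_begin = begin;
    *out_end = end;
    return 0;
}

/* Evaluates "spec > threshold": 1 if true, 0 if false, -1 if tag not found. */
int xml_rule_greater(const char *xml, const char *spec, long threshold)
{
    const char *b, *e;
    if (xml_tag_lookup(xml, spec, &b, &e) != 0)
        return -1;
    return strtol(b, NULL, 10) > threshold; /* parsing stops at the closing '<' */
}
```

Applied to the purchase example, "purchase:1.item:2.subTotal > 5000" is false because the second item has a subTotal of 2000, and "purchase:2.totalAmount" finds nothing because there is only one purchase tag.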
4/11/2003 Edward Chow Content Switch 76
Handle Long Transactions in SSL and Email network services
• Some of the packet processing functions are better handled at the application level.
• For example, there are many packages for detecting and removing email viruses, including McAfee’s uvscan, AMAVis scanmail, and mutt (to recombine email components), but almost all of them are implemented at the application level and interact with the sendmail program. It would require significant effort to rewrite them as kernel modules.
• The same observation applies to SSL processing.
4/11/2003 Edward Chow Content Switch 77
Web Switching/SSL Processing Overhead and Performance Differences between Prefork and Dynamic Fork
• The SSL processing overhead is significant: 240 req/sec vs. 38 req/sec.
• The content switching processing overhead may reduce the performance below that of a single web server. What do we gain here? How can we improve it?
[Figure: Overall WebBench Requests/Second. X axis: Clients (1, 8, 16, 24, 32, 40, 48, 56). Y axis: Requests/Second (0 to 300). Series: PreforkNonSSLProxy, DynamicNonSSLProxy, ApacheNonSSL, DynamicSSLProxy, PreforkSSLProxy, ApacheSSL.]
4/11/2003 Edward Chow Content Switch 78
IXP1200-based Content Switch
• We have ported OpenSSL and our Linux Secure Web System to run on the IXP12EB with VxWorks.
• Using WindRiver’s Tornado II IDE.
• The preliminary version runs purely on the StrongARM core.
• Currently working on offloading the header extraction and rule matching code to run as hardware threads on the microengines.
4/11/2003 Edward Chow Content Switch 79
Intel IXP1200 NP and IXP12EB
• The IXP 1200 Network Processor
• The IXP12EB Evaluation Board:
– PCI form factor board based on IXP1200 Network Processor
– eight 10/100 Mbps ports
– two Gigabit Ethernet ports
– PCI back-plane and an Ethernet Network Interface Card (NIC)
4/11/2003 Edward Chow Content Switch 80
IXP 1200 Network Processor
4/11/2003 Edward Chow Content Switch 81
Packets Receiving & Transmitting
4/11/2003 Edward Chow Content Switch 82
Agere Network Processor
The following figures are from Douglas Comer’s new text
“Network System Design using Network Processors”
4/11/2003 Edward Chow Content Switch 83
Agere’s FPP
4/11/2003 Edward Chow Content Switch 84
Agere’s RSP
4/11/2003 Edward Chow Content Switch 85
Alchemy’s Au1000
4/11/2003 Edward Chow Content Switch 86
Applied Micro Circuit Corp
nP7510
4/11/2003 Edward Chow Content Switch 87
Cisco Parallel eXpress Forwarding (PXF)
4/11/2003 Edward Chow Content Switch 88
Cognigine’s Reconfigurable Communication Unit (RCU)
4/11/2003 Edward Chow Content Switch 89
EZChip NP-1
4/11/2003 Edward Chow Content Switch 90
IBM PowerNP
4/11/2003 Edward Chow Content Switch 91
IBM NP Embedded Processor Complex
4/11/2003 Edward Chow Content Switch 92
Motorola’s C-Port
4/11/2003 Edward Chow Content Switch 93
Motorola Single CP
4/11/2003 Edward Chow Content Switch 94
Packet Flow and IXP2400
4/11/2003 Edward Chow Content Switch 95
Intel IXP2400
4/11/2003 Edward Chow Content Switch 96
HA-LVS Configuration
[Diagram labels: High Available LinuxDirector; Internet; Client (CIP); LinuxDirector; Backup Director; HeartBeat; MON; Real Server1; Real Server2; Real Server3]
1. When the Backup Director detects a LinuxDirector failure through the heartbeat protocol, it “graciously negotiates” the take-over of the VIP. This provides fault tolerance.
2. Monitor the server processes running on the real servers. Route requests to server processes that are alive. Initiate restart/repair.
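The failure detection in step 1 can be illustrated with a timeout on missed heartbeats. This C sketch is an assumption-laden illustration, not the heartbeat protocol actually used with LVS: the struct, the function names, and the dead_after parameter are hypothetical, and the VIP take-over itself is only indicated by a return value.

```c
#include <assert.h>
#include <stddef.h>

enum director_role { ROLE_BACKUP, ROLE_ACTIVE };

/* State kept by the backup director (hypothetical layout). */
struct backup_director {
    enum director_role role;
    long last_heartbeat; /* time (seconds) of the last heartbeat received */
    long dead_after;     /* silence (seconds) after which the primary is
                            declared failed */
};

void backup_init(struct backup_director *b, long now, long dead_after)
{
    assert(b != NULL && dead_after > 0);
    b->role = ROLE_BACKUP;
    b->last_heartbeat = now;
    b->dead_after = dead_after;
}

/* Called whenever a heartbeat message arrives from the primary director. */
void backup_on_heartbeat(struct backup_director *b, long now)
{
    assert(b != NULL);
    b->last_heartbeat = now;
}

/* Called periodically. Returns 1 exactly once, at the moment the backup
 * decides to take over the VIP; returns 0 otherwise. */
int backup_tick(struct backup_director *b, long now)
{
    assert(b != NULL);
    if (b->role == ROLE_BACKUP && now - b->last_heartbeat > b->dead_after) {
        b->role = ROLE_ACTIVE; /* a real system would now claim the VIP */
        return 1;
    }
    return 0;
}
```

The choice of dead_after trades detection speed against false take-overs caused by a few delayed heartbeats; the sketch does not cover handing the VIP back when the primary returns.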
4/11/2003 Edward Chow Content Switch 97
High Available Web Server Cluster
[Diagram labels: Internet; Client (CIP); WebSwitch1; WebSwitch2; HeartBeat; MON; Real Server1; Real Server2; Real Server3]
1. A web switch detects the failure of the other web switch and takes over the processing of routing requests.
2. The web switch monitors the server processes running on the real servers. When they die:
• Route requests to server processes that are alive.
• Rewrite the web switching rule.
• Initiate restart/repair.
4/11/2003 Edward Chow Content Switch 98
Status of UCCS ACSD Project
• Two versions of the Linux kernel-based LCS content switch, LCS01 and LCS02, were developed.
• A Linux application-level secure web switch (LSWS) was developed using the OpenSSL package.
• LSWS has been ported to run on the Intel IXP12EB and IXP1200 network processor with WindRiver VxWorks.
• Part of the above research projects is sponsored by CCL/ITRI.
• The current release, LCS02, is based on Linux-2.2.16-3.
• It is being ported to Linux-2.4.18 and integrated with KTCPVS.
• ip_forward.c, ip_masq.c, and ip_vs.c are modified to implement basic TCP delay binding.
• ip_cs.c is added for most of the content switching functions, with HTTP header extraction and XML content extraction.
• A simple Java-based ruleEdit program was created for rule editing and conflict detection. A C-based program can detect conflicts among rules with regular expressions in their condition expressions.
• A rule translation program converts the rule set into a Linux kernel module and allows dynamic replacement of rules without restarting the system.
• Currently working on integrating KTCPVS and providing a unified configuration/monitor command.
4/11/2003 Edward Chow Content Switch 99
LCS Demo
• We set up viva.uccs.edu as a content switch, with wait and ace as the two real servers.
• URL switching demo:
  http://viva.uccs.edu/~lcs1/ routes to ace.uccs.edu
  http://viva.uccs.edu/~lcs2/ routes to wait.uccs.edu
• XML web switching (e-commerce applications): http://archie.uccs.edu/~acsd/lcs/xmldemo.html
  When the 2nd subtotal tag >= 50000, route to ace.
  When the 2nd subtotal tag < 50000, route to wait.
• Let us know if you have problems accessing them. My students may be working on LCS extensions.
4/11/2003 Edward Chow Content Switch 100
LCS Rule Example
R4: if (atoi(rule_fields[1].value) >= 50000) {
        return route_to("ace", NON_STICKY, saddr);
    }
R5: if ((atoi(rule_fields[1].value) > 0) && (atoi(rule_fields[1].value) < 50000)) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
R10: if (strstr(url, "lcs1") != NULL) {
        IP_RULE_MSG("server=ace\n");
        return route_to("ace", NON_STICKY, saddr);
    }
R11: if (strstr(url, "lcs2") != NULL) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
4/11/2003 Edward Chow Content Switch 101
Intel 7280 Demo
• http://cs.uccs.edu/~chow/pub/master/ycai/doc/csdemo.html
4/11/2003 Edward Chow Content Switch 102
Related Load Balancing Research Results
• Modified Apache status module to report
– Total bytes to be transferred by child processes
– Average document transfer speed
• Modified LB-DNS to receive server status and bandwidth probing results.
• LB-DNS returns the IP address of the best server based on a weight contributed by both server load and bandwidth.
• Modified WebStone benchmark to test the performance of load balancing web server clusters.
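To make the server selection step concrete, here is a small C sketch of how the two reported quantities and a bandwidth probe could be combined. The slides do not give the actual weight formula, so the cost used here (time to drain the pending bytes plus time to send a reference document over the probed bandwidth) is purely an assumption, as are the struct fields and the name lbdns_best_server.

```c
#include <assert.h>
#include <stddef.h>

/* Status reported for one web server (hypothetical layout). */
struct server_report {
    const char *ip;
    double pending_bytes;   /* total bytes still to be transferred by
                               child processes */
    double avg_speed;       /* average document transfer speed, bytes/sec */
    double probe_bandwidth; /* bandwidth probing result, bytes/sec */
};

/* Assumed cost in seconds: server delay plus network transfer time. */
static double estimated_delay(const struct server_report *s, double doc_bytes)
{
    assert(s->avg_speed > 0.0 && s->probe_bandwidth > 0.0);
    return s->pending_bytes / s->avg_speed + doc_bytes / s->probe_bandwidth;
}

/* Returns the IP address of the server with the lowest estimated delay. */
const char *lbdns_best_server(const struct server_report *servers, size_t n,
                              double doc_bytes)
{
    size_t best = 0;
    assert(servers != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (estimated_delay(&servers[i], doc_bytes) <
            estimated_delay(&servers[best], doc_bytes))
            best = i;
    return servers[best].ip;
}
```

Expressing both contributions in seconds keeps the units consistent; a different weighting of load against bandwidth would only change estimated_delay.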
4/11/2003 Edward Chow Content Switch 103
Load Balancing Systems
[Diagram labels: Request for Web pages; LBA: Modified DNS; Modified Web Server 1 ... Modified Web Server n; Statistics Gathering Daemon; Server Delay; Bandwidth Probe Results; Server Ranking; /tmp/StatFile]
4/11/2003 Edward Chow Content Switch 104
Connection Rate: LBA vs. Round-Robin
Server connection rate for 4 servers (connections/sec)

Update for LBA, per sec   Load balancing system   Round-robin
1                         418.2                   327.6
2                         656.6                   327.6
3                         907.9                   327.6
4                         420                     327.6
5                         636.7                   327.6
6                         322.6                   327.6
7                         711.6                   327.6
8                         420.5                   327.6
9                         638.3                   327.6
10                        670.6                   327.6
11                        683.4                   327.6
12                        899                     327.6

The round-robin test was run only once.
4/11/2003 Edward Chow Content Switch 105
Conclusion
• Content Delivery Networks improve Internet content retrieval.
• LVS provides a low-cost layer 4 switching service for clusters.
• Linux Content Switch with generic rules can be easily configured for a wide variety of value-added services:
– Premium services
– Load balancing/High Available server farm
– Firewall
– Bandwidth control/Traffic shaping
• Efficient SW/HW architecture and rule matching algorithms are required to reduce the processing overhead.
• Content rule design and conflict detection are important and challenging.
• TCP delay binding can be improved.
4/11/2003 Edward Chow Content Switch 106
References
• http://www.linuxvirtualserver.org/
• http://www.akamai.com/
• http://cs.uccs.edu/~chow/pub/contentsw/talk/contentswitching.ppt
• [Aron2000] Mohit Aron, “Differential and predictable QoS in web server systems,” Ph.D. dissertation, Rice University, Oct. 2000.
• [Zhang97] Lixia Zhang, Sally Floyd, and Van Jacobson, “Adaptive Web Caching,” April 25, 1997. http://www-nrg.ee.lbl.gov/floyd/web.html
• [Esi2001] Edge Side Includes, http://www.esi.org/.
• [Chow2001a] C. Edward Chow and Indira Semwal, “Web Load Balancing Through More Accurate Server Report,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Chow2001b] C. Edward Chow, Ganesh Godavari, and Jianhua Xie, “Content Switch Rules and their Conflict Detection,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Chow2001c] C. Edward Chow and Weihong Wang, “The Design and Implementation of Linux LVS-based Content Switch,” Proceedings of PDCAT 2001, Taipei, Taiwan.
• [Aversa2000] Luis Aversa and Azer Bestavros, “Load Balancing a Cluster of Web Servers: Using Distributed Packet Rewriting,” Proceedings of IPCCC 2000.
• [Cao98] Pei Cao, Jin Zhang, and Kevin Beach, “Active Cache: Caching Dynamic Contents on the Web,” http://www.cs.wisc.edu/~cao/papers/active-cache.ps