
Advanced Computer Networks 263-3501-00

Layer-7 switching
Patrick Stuedi

Spring  Semester  2014  

© Oriana Riva, Department of Computer Science | ETH Zürich

Outline  

•  Last time
   –  Datacenter TCP

•  Today
   –  L7 Switching

Slides adapted from Prof. Roscoe

Course  overview  

covered in the basic ETH "Operating Systems and Networks" course

Wireless  networking  technologies  

Datacenter  networking  

We  are  now  here:  accessing  the  datacenter  


Challenge:  accessing  services  

•  Large web applications are typically replicated over several data centers

•  Within a data center, applications share many machines

So:

•  What address does, e.g., www.search.ch resolve to?

•  What entity does this address refer to?

•  What does this entity do?


Requirements  

•  "Close by" datacenter

•  Load balance across machines in a center

•  Target  machines  where  the  user’s  state  is  kept  

•  Accessed  using  TCP  (HTTP,  SSL,  …)  


Option 1: IP Anycast

•  One IP address refers to multiple destinations
   –  BGP advertises multiple destinations
   –  Packets end up at the AS "nearest" to the source

•  Problems:
   –  IP layer ⇒ only reliable for stateless protocols (UDP); all packets of a TCP flow must go to the same machine
   –  Service location pushed into BGP ⇒ couples routing with end-system provision


Option 2: DNS

•  Insight: who says the answer is always the same?

•  Idea: a "smart" DNS server authoritative for the service

A query for, e.g., www.google.com or www.bing.com returns a different "A" record depending on:
   –  Source address of the browser machine
   –  Current state of the service
      •  Load
      •  Failures
   –  A random number
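The selection logic of such a "smart" authoritative server can be sketched as a lookup keyed on the client's region and each site's current load. All datacenter names, addresses, and load figures below are made up for illustration; real deployments derive locality from BGP or geo databases and load from live health feeds:

```python
# Hypothetical picture of a "smart" authoritative DNS server's choice logic.
# Datacenter names, addresses, and load figures are invented for illustration.
DATACENTERS = {
    "eu": {"addr": "192.0.2.10", "load": 0.3},
    "us": {"addr": "198.51.100.10", "load": 0.7},
}

def choose_a_record(client_region):
    """Return the A record for a query, based on where the client is
    and how loaded each datacenter currently is."""
    dc = DATACENTERS.get(client_region)
    if dc is None or dc["load"] > 0.9:
        # unknown region or overloaded site: fall back to least-loaded datacenter
        dc = min(DATACENTERS.values(), key=lambda d: d["load"])
    return dc["addr"]
```

The answer differs per client, which is exactly why the "A" record cannot be cached for long (see the TTL problem below).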


Using CNAMEs

[Diagram sequence: the first DNS resolver returns a CNAME; a regional service resolver can then be more specific; the returned records carry a timeout (TTL).]

DNS does not solve the problem

•  Need an IP address for every instance of the service
   –  100,000 machines ⇒ 100,000 globally routable IP addresses – expensive!

•  Machine fails ⇒ need to update DNS state

•  DNS state changes rapidly ⇒ short TTL on queries ⇒ even higher load on DNS servers


Next step: use 1 IP address

•  Use Network Address Translation
•  Hash source addresses to server machines

[Diagram: clients with addresses A and B on the Internet reach the datacenter at its single public address 173.194.35.19; the switch hashes each source address to one of the servers 10.1.1.1–10.1.1.7, e.g. Hash(A) = 6 and Hash(B) = 2.]

Stateless hashing

Hash(Source IP)
•  Completely static
   –  No dynamic load balancing

Hash(Source IP, Source TCP port)
•  Better, but still static
   –  Limited to 64k destinations per client machine
•  Known as a "Layer-4 load balancer"

Basic problem: nothing else is known by the end of the handshake!
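Both variants reduce to a stable hash of header fields taken modulo the pool size. A minimal sketch (the server pool is illustrative, matching the earlier diagram's addresses):

```python
import hashlib

# illustrative server pool: 10.1.1.1 .. 10.1.1.7
SERVERS = ["10.1.1.%d" % i for i in range(1, 8)]

def l4_pick(src_ip, src_port=None):
    """Stateless choice: hash the source IP (and optionally the source
    TCP port, the 'Layer-4' variant) and reduce modulo the pool size."""
    key = src_ip if src_port is None else "%s:%d" % (src_ip, src_port)
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]
```

The mapping is fixed by construction: the same source always lands on the same server, and there is no way to steer traffic away from a loaded or failed machine without changing the function itself, which is exactly the problem discussed next.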

Why is static hashing bad?

•  Machine failure/upgrade/provisioning
   –  Can't update the hash function efficiently in the switch

•  Load balancing
   –  Can't avoid a heavily-loaded machine
   –  Can't spread load from a small group of clients

•  Lack of locality
   –  Resource being accessed
   –  Client accessing the resource


What else might we want to hash on?

HTTP Host: header

•  Introduced in HTTP/1.1 – mandatory

•  Hashing on the virtual host avoids replicating all service state everywhere
   –  Different services have different virtual hosts
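To route this way, the switch must parse the request far enough to find the Host: line before choosing a pool. A toy sketch (the virtual-host-to-pool mapping is hypothetical):

```python
def host_of(request):
    """Extract the Host: header value from a raw HTTP/1.1 request."""
    for line in request.split(b"\r\n")[1:]:  # skip the request line
        if line.lower().startswith(b"host:"):
            return line.split(b":", 1)[1].strip().decode()
    raise ValueError("HTTP/1.1 request without Host header")

# hypothetical mapping from virtual host to the server pool holding its state
POOLS = {
    "www.example.com": ["10.1.1.1", "10.1.1.2"],
    "mail.example.com": ["10.1.1.3"],
}

def pick_pool(request):
    """Choose the server pool for this request by its virtual host."""
    return POOLS[host_of(request)]
```

Note that, unlike L4 hashing, this requires the switch to see application-layer bytes, which only arrive after the TCP handshake completes.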


Switching on URL

•  Locality:
   –  Allows state to be partitioned across machines

•  Isolation:
   –  Rare, computationally intensive URLs can be sequestered
   –  Sensitive data can be kept on more expensive, audited machines


Hashing  on  cookies  

•  Enables partitioning of servers by
   –  User state
   –  Session state
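Routing on a cookie means the switch parses the Cookie: header and hashes the session identifier, so every request of one session lands on the machine holding that session's state. A sketch (the cookie name "sid" is hypothetical):

```python
import hashlib

def session_id(request):
    """Pull a hypothetical 'sid' cookie out of a raw HTTP request,
    or return None if the request carries no session yet."""
    for line in request.split(b"\r\n"):
        if line.lower().startswith(b"cookie:"):
            for part in line.split(b":", 1)[1].split(b";"):
                name, _, value = part.strip().partition(b"=")
                if name == b"sid":
                    return value.decode()
    return None

def pick_server(request, servers):
    """Session affinity: same session id -> same server."""
    sid = session_id(request)
    if sid is None:
        return servers[0]  # no session yet: any server will do
    h = int(hashlib.sha256(sid.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```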

How  to  do  it?  

•  Problem:
   –  Don't know the hash key until after the HTTP request
   –  Typically the first segment after the 3-way handshake

•  Solution:
   –  Don't establish the connection to the server until the client has sent the HTTP request

Late-binding of TCP connection

[Time-sequence diagram: Client (port 3620) – Switch – Server]
–  Client → Switch: TCP connection setup + HTTP GET
–  Switch → Server: TCP connection setup + HTTP GET
–  Server → Switch: HTTP response
–  Switch → Client: HTTP response
(acks not shown)

Naïve implementation (from Maltz & Bhagwat)

c = accept() client connection;
<authenticate client>
s = socket();
connect(s) to server;
send(c) OK message;
while (1) {
    read() from c, write() to s;
    read() from s, write() to c;
    if (c and s return EOF) {
        close(c); close(s);
        break;
    }
}
<service next request>

Inefficient: data copies between the two connections
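The same loop as runnable Python, kept deliberately naive: it copies every byte up into user space and back down, which is exactly the inefficiency TCP splicing removes. A real proxy would also multiplex with select/poll rather than alternating blocking reads:

```python
import socket

def relay(c, s, bufsize=4096):
    """Naive proxy loop in the spirit of the pseudocode above: copy
    bytes between client socket c and server socket s until both
    directions reach EOF."""
    c_eof = s_eof = False
    while not (c_eof and s_eof):
        if not c_eof:
            data = c.recv(bufsize)
            if data:
                s.sendall(data)             # client -> server copy
            else:
                c_eof = True
                s.shutdown(socket.SHUT_WR)  # propagate client's EOF
        if not s_eof:
            data = s.recv(bufsize)
            if data:
                c.sendall(data)             # server -> client copy
            else:
                s_eof = True
                c.shutdown(socket.SHUT_WR)  # propagate server's EOF
    c.close()
    s.close()
```

Every segment thus crosses the kernel boundary twice per direction, plus TCP must re-segment and re-ack everything on the second connection.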

TCP Splicing

•  Proposed around 1997 by Maltz & Bhagwat at IBM

•  Key idea:
   –  Take two established TCP connections and splice them
   –  Transfer segments unmodified between them
   –  Remap port numbers and sequence numbers on the fly

•  Advantages:
   –  Very simple calculation per packet
   –  Not much state to maintain per spliced connection
   –  No segmentation/reassembly
   –  No buffering

Splicing pseudocode (from Maltz & Bhagwat)

[Pseudocode figure not reproduced here; the per-packet operations it performs are listed on the next slide.]

What  state  is  needed?  

For  each  packet,  need  to  do  the  following:  

•  IP header operations:
   –  Rewrite source and destination IP addresses
   –  Update the IP header checksum

•  TCP header operations:
   –  Rewrite source and destination port numbers
   –  Apply a fixed offset to the sequence number
   –  Apply a fixed offset to the acknowledgement number
   –  Update the TCP header checksum
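Modelled in Python, the per-packet work is just field substitution plus two fixed 32-bit offsets; checksum updates (done incrementally in practice) are omitted from this sketch:

```python
from dataclasses import dataclass

MOD32 = 2 ** 32  # TCP sequence numbers wrap at 2^32

@dataclass
class Segment:
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int

@dataclass
class SpliceState:
    # per-splice constants, fixed once when the two connections are joined
    new_src: str
    new_dst: str
    new_sport: int
    new_dport: int
    seq_off: int
    ack_off: int

def rewrite(seg, st):
    """Forward one segment across the splice: substitute addresses and
    ports, and shift sequence/ack numbers by the fixed offsets (mod 2^32).
    IP and TCP checksum updates are omitted for brevity."""
    return Segment(st.new_src, st.new_dst, st.new_sport, st.new_dport,
                   (seg.seq + st.seq_off) % MOD32,
                   (seg.ack + st.ack_off) % MOD32)
```

Because the offsets are constants per splice, the work per packet is a handful of additions and substitutions, which is why it maps so easily onto a forwarding ASIC.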

It's easy to do in hardware

•  ArrowPoint CS-800 Content Switch from 1998
   –  Acquired by Cisco soon after

•  Forwarding ASIC for TCP splicing

•  Various load balancing policies
   –  Round robin, measurement-based

•  Failure detection for servers and automatic failover
   –  Request timeout, heartbeat messages

•  New servers can be added dynamically

•  >16 GbE ports on each "side"

•  Around 15,000 HTTP connection requests / second

References

•  "Host Anycasting Service", C. Partridge, T. Mendez, W. Milliken, Internet RFC 1546, November 1993.

•  "TCP Splicing for Application Layer Proxy Performance", David A. Maltz and Pravin Bhagwat. IBM Research Report 21139 (Computer Science/Mathematics), IBM Research Division, 1998.

•  "Cisco Data Center Infrastructure 2.5 Design Guide", Cisco Systems, November 2, 2011. http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DC_Infra2_5/DCI_SRND_2_5_book.html (very marketing oriented, and not the only way to do it, but gives an idea of the complexity!)
