Building distributed systems using Helix. Kishore Gopalakrishna, @kishoreg1980, http://www.linkedin.com/in/kgopalak, http://helix.incubator.apache.org. Apache Incubation, Oct 2012. @apachehelix

Apache Helix presentation at Vmware


Page 1: Apache Helix presentation at Vmware

Building distributed systems using Helix

Kishore Gopalakrishna, @kishoreg1980
http://www.linkedin.com/in/kgopalak

http://helix.incubator.apache.org
Apache Incubation, Oct 2012
@apachehelix

Page 2: Apache Helix presentation at Vmware

Outline

•  Introduction
•  Architecture
•  How to use Helix
•  Tools
•  Helix usage

Page 3: Apache Helix presentation at Vmware


Examples  of  distributed  data  systems  

Page 4: Apache Helix presentation at Vmware

Lifecycle

Single node → Multi node → Fault tolerance → Cluster expansion

Multi node
•  Partitioning
•  Discovery
•  Co-location

Fault tolerance
•  Replication
•  Fault detection
•  Recovery

Cluster expansion
•  Throttle data movement
•  Re-distribution

Page 5: Apache Helix presentation at Vmware

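[Figure: two stacks. Left: Application on top of Zookeeper. Right: Application on top of Framework on top of Consensus System.]

Zookeeper primitives
•  File system
•  Lock
•  Ephemeral

Framework primitives
•  Node
•  Partition
•  Replica
•  State
•  Transition

Zookeeper provides low level primitives. We need high level primitives.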

Page 6: Apache Helix presentation at Vmware


Page 7: Apache Helix presentation at Vmware

Outline

•  Introduction
•  Architecture
•  How to use Helix
•  Tools
•  Helix usage

Page 8: Apache Helix presentation at Vmware

Terminologies

Node        A single machine
Cluster     Set of nodes
Resource    A logical entity, e.g. database, index, task
Partition   Subset of the resource
Replica     Copy of a partition
State       Status of a partition replica, e.g. Master, Slave
Transition  Action that lets replicas change status, e.g. Slave -> Master

Page 9: Apache Helix presentation at Vmware

Core concept - state machine

•  Set of legal states
   – S1, S2, S3

•  Set of legal state transitions
   – S1 → S2
   – S2 → S1
   – S2 → S3
   – S3 → S2

Page 10: Apache Helix presentation at Vmware

Core concept - constraints

•  Minimum and maximum number of replicas that should be in a given state
   – S3 → max=1, S2 → min=2

•  Maximum concurrent transitions
   – Per node
   – Per resource
   – Across cluster

Page 11: Apache Helix presentation at Vmware

Core concept: objectives

•  Partition placement
   – Even distribution of replicas in state S1, S2 across cluster

•  Failure/Expansion semantics
   – Create new replicas and assign state
   – Change state of existing replicas
   – Even distribution of replicas

Page 12: Apache Helix presentation at Vmware

Augmented finite state machine

State Machine
•  States
   •  S1, S2, S3
•  Transition
   •  S1 → S2, S2 → S1, S2 → S3, S3 → S1

Constraints
•  States
   •  S1: max=1, S2: min=2
•  Transitions
   •  Concurrent(S1 -> S2) across cluster < 5

Objectives
•  Partition placement
•  Failure semantics
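A minimal sketch of the idea on this slide, assuming a hypothetical class `AugmentedStateMachine` (not the Helix API): a set of legal transitions plus a per-state upper bound, with one check that enforces both. The `min=2` constraint and the concurrency limit are left out to keep it short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of an augmented state machine; not the Helix API. */
class AugmentedStateMachine {
    private final Set<String> legalTransitions = new HashSet<>();
    private final Map<String, Integer> maxPerState = new HashMap<>();

    /** Registers a legal transition such as S1 -> S2. */
    void addTransition(String from, String to) {
        legalTransitions.add(from + "->" + to);
    }

    /** Caps how many replicas of one partition may be in the given state. */
    void upperBound(String state, int max) {
        maxPerState.put(state, max);
    }

    /**
     * Returns true if moving one replica from `from` to `to` is a legal
     * transition and keeps the target state within its upper bound.
     * `replicaStates` holds the current state of every replica of the partition.
     */
    boolean canTransition(String from, String to, Iterable<String> replicaStates) {
        if (!legalTransitions.contains(from + "->" + to)) {
            return false;
        }
        int inTarget = 0;
        for (String s : replicaStates) {
            if (s.equals(to)) {
                inTarget++;
            }
        }
        Integer max = maxPerState.get(to);
        return max == null || inTarget + 1 <= max;
    }

    /** Builds the machine listed on this slide: four transitions and S1 max=1. */
    static AugmentedStateMachine slideExample() {
        AugmentedStateMachine m = new AugmentedStateMachine();
        m.addTransition("S1", "S2");
        m.addTransition("S2", "S1");
        m.addTransition("S2", "S3");
        m.addTransition("S3", "S1");
        m.upperBound("S1", 1);
        return m;
    }
}
```

With replicas in states (S1, S2), a second S2 → S1 move is rejected because S1 already holds its single allowed replica.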

Page 13: Apache Helix presentation at Vmware

Message consumer group - problem

[Figure: a PARTITIONED QUEUE feeding CONSUMERs, shown for three cases: ASSIGNMENT, SCALING, FAULT TOLERANCE]

Partition management
•  One consumer per queue
•  Even distribution

Elasticity
•  Re-distribute queues among consumers
•  Minimize movement

Fault tolerance
•  Re-distribute
•  Minimize movement
•  Limit max queue per consumer

Page 14: Apache Helix presentation at Vmware

Message consumer group: solution

ONLINE OFFLINE STATE MODEL

•  States: Offline, Online
•  Offline → Online: start consumption
•  Online → Offline: stop consumption
•  Online: MAX=1 per partition
•  MAX 10 queues per consumer
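The two constraints above (one consumer per queue, at most 10 queues per consumer) can be illustrated with a round-robin assignment. This is a sketch under assumptions: `QueueAssignment` and its parameters are hypothetical names, and it does not attempt to minimize movement when consumers join or leave, which Helix itself handles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of even queue assignment with a per-consumer cap. */
class QueueAssignment {
    /**
     * Assigns queues 0..numQueues-1 round-robin so that each queue has exactly
     * one consumer and consumer loads differ by at most one queue.
     * Throws if the cap of maxPerConsumer queues cannot be respected.
     */
    static Map<String, List<Integer>> assign(
            int numQueues, List<String> consumers, int maxPerConsumer) {
        if (consumers.isEmpty()
                || (long) consumers.size() * maxPerConsumer < numQueues) {
            throw new IllegalArgumentException("not enough consumer capacity");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int q = 0; q < numQueues; q++) {
            String owner = consumers.get(q % consumers.size());
            result.get(owner).add(q);
        }
        return result;
    }
}
```

For 6 queues and 3 consumers each consumer ends up with 2 queues; 21 queues on 2 consumers is rejected because the cap of 10 would be exceeded.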

Page 15: Apache Helix presentation at Vmware

Distributed data store

[Figure: 12 partitions (P.1 to P.12) replicated across Node 1, Node 2 and Node 3; legend: MASTER, SLAVE]

Partition management
•  Multiple replicas
•  1 designated master
•  Even distribution

Fault tolerance
•  Fault detection
•  Promote slave to master
•  Even distribution
•  No SPOF

Elasticity
•  Minimize downtime
•  Minimize data movement
•  Throttle data movement

Page 16: Apache Helix presentation at Vmware

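Distributed data store: solution

MASTER SLAVE STATE MODEL

[Figure: state diagram with states O, S, M and transitions t1 to t4]

•  S: COUNT=2
•  M: COUNT=1
•  Transition constraint: $t_1 \le 5$
•  Objectives: $\text{minimize}\left(\max_{n_j \in N} S(n_j)\right)$ and $\text{minimize}\left(\max_{n_j \in N} M(n_j)\right)$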

Page 17: Apache Helix presentation at Vmware

Distributed search service

[Figure: 6 index shards (P.1 to P.6) replicated across Node 1, Node 2 and Node 3; legend: INDEX SHARD, REPLICA]

Partition management
•  Multiple replicas
•  Even distribution
•  Rack aware placement

Fault tolerance
•  Fault detection
•  Auto create replicas
•  Controlled creation of replicas

Elasticity
•  Re-distribute partitions
•  Minimize movement
•  Throttle data movement

Page 18: Apache Helix presentation at Vmware

Distributed search service: solution

•  MAX=3 (number of replicas)
•  MAX per node=5

Page 19: Apache Helix presentation at Vmware

Internals  


Page 20: Apache Helix presentation at Vmware

IDEALSTATE

Configuration
•  3 nodes
•  3 partitions
•  2 replicas
•  StateMachine

Constraints
•  1 Master
•  1 Slave
•  Even distribution

Partition  Replica placement and state (node:state)
P1         N1:M, N2:S
P2         N2:M, N3:S
P3         N3:M, N1:S

Page 21: Apache Helix presentation at Vmware

CURRENT STATE

N1
•  P1:OFFLINE
•  P3:OFFLINE

N2
•  P2:MASTER
•  P1:MASTER

N3
•  P3:MASTER
•  P2:SLAVE

Page 22: Apache Helix presentation at Vmware

EXTERNAL VIEW

Partition  Replicas (node:state)
P1         N1:O, N2:M
P2         N2:M, N3:S
P3         N3:M, N1:O
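The external view can be read as the per-node current states regrouped by partition. A minimal sketch of that regrouping, using the values from the CURRENT STATE slide; the class and method names are hypothetical, not Helix code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: derive an external view from per-node current state. */
class ExternalViewSketch {
    /** Inverts node -> (partition -> state) into partition -> (node -> state). */
    static Map<String, Map<String, String>> externalView(
            Map<String, Map<String, String>> currentStates) {
        Map<String, Map<String, String>> view = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> node : currentStates.entrySet()) {
            for (Map.Entry<String, String> replica : node.getValue().entrySet()) {
                view.computeIfAbsent(replica.getKey(), p -> new TreeMap<>())
                    .put(node.getKey(), replica.getValue());
            }
        }
        return view;
    }

    /** Current state as shown on the CURRENT STATE slide. */
    static Map<String, Map<String, String>> slideCurrentState() {
        Map<String, Map<String, String>> cs = new TreeMap<>();
        cs.put("N1", pair("P1", "OFFLINE", "P3", "OFFLINE"));
        cs.put("N2", pair("P2", "MASTER", "P1", "MASTER"));
        cs.put("N3", pair("P3", "MASTER", "P2", "SLAVE"));
        return cs;
    }

    private static Map<String, String> pair(String k1, String v1, String k2, String v2) {
        Map<String, String> m = new TreeMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }
}
```

Calling `externalView(slideCurrentState())` groups P1 as N1:OFFLINE and N2:MASTER, which matches the external view on this slide.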

Page 23: Apache Helix presentation at Vmware

Helix Based System Roles

[Figure: PARTICIPANTs (Node 1, Node 2, Node 3 hosting partitions P.1 to P.12), a SPECTATOR (partition routing logic) and the Controller; arrows labelled CURRENT STATE, IDEAL STATE, RESPONSE, COMMAND]

Page 24: Apache Helix presentation at Vmware

Logical  deployment  


Page 25: Apache Helix presentation at Vmware

Outline

•  Introduction
•  Architecture
•  How to use Helix
•  Tools
•  Helix usage

Page 26: Apache Helix presentation at Vmware

Helix based solution

1. Define
2. Configure
3. Run

Page 27: Apache Helix presentation at Vmware

Define: State model definition

•  States
   – All possible states
   – Priority

•  Transitions
   – Legal transitions
   – Priority

•  Applicable to each partition of a resource

•  e.g. MasterSlave

[Figure: state diagram with states O, S, M]

Page 28: Apache Helix presentation at Vmware

Define: state model

    import org.apache.helix.model.StateModelDefinition;

    // MASTER, SLAVE and OFFLINE are assumed to be String constants naming the states.
    StateModelDefinition.Builder builder = new StateModelDefinition.Builder("MASTERSLAVE");

    // Add states and their rank to indicate priority.
    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(OFFLINE);

    // Set the initial state when the node starts.
    builder.initialState(OFFLINE);

    // Add transitions between the states.
    builder.addTransition(OFFLINE, SLAVE);
    builder.addTransition(SLAVE, OFFLINE);
    builder.addTransition(SLAVE, MASTER);
    builder.addTransition(MASTER, SLAVE);

Page 29: Apache Helix presentation at Vmware

Define: constraints

Scope      State  Transition
Partition  Y      Y
Resource   -      Y
Node       Y      Y
Cluster    -      Y

[Figure: state diagram with states O, S, M; S: COUNT=2, M: COUNT=1]

Scope      State     Transition
Partition  M=1, S=2  -

Page 30: Apache Helix presentation at Vmware

Define: constraints

    // static constraint
    builder.upperBound(MASTER, 1);

    // dynamic constraint ("R" stands for the replica count)
    builder.dynamicUpperBound(SLAVE, "R");

    // Unconstrained
    builder.upperBound(OFFLINE, -1);

Page 31: Apache Helix presentation at Vmware

Define:  parGcipant  plug-­‐in  code  


Page 32: Apache Helix presentation at Vmware

Step 2: configure

helix-admin --zkSvr <zkAddress>

CREATE CLUSTER
--addCluster <clusterName>

ADD NODE
--addNode <clusterName instanceId(host:port)>

CONFIGURE RESOURCE
--addResource <clusterName resourceName partitions statemodel>

REBALANCE → SET IDEALSTATE
--rebalance <clusterName resourceName replicas>

Page 33: Apache Helix presentation at Vmware


zookeeper  view  IDEALSTATE  

Page 34: Apache Helix presentation at Vmware

Step 3: Run

START CONTROLLER
run-helix-controller --zkSvr localhost:2181 --cluster MyCluster

START PARTICIPANT

Page 35: Apache Helix presentation at Vmware

zookeeper  view  


Page 36: Apache Helix presentation at Vmware

Znode  content  

CURRENT  STATE   EXTERNAL  VIEW  


Page 37: Apache Helix presentation at Vmware

Spectator  Plug-­‐in  code  


Page 38: Apache Helix presentation at Vmware

Helix Execution modes

Page 39: Apache Helix presentation at Vmware

IDEALSTATE

Configuration
•  3 nodes
•  3 partitions
•  2 replicas
•  StateMachine

Constraints
•  1 Master
•  1 Slave
•  Even distribution

Partition  Replica placement and state (node:state)
P1         N1:M, N2:S
P2         N2:M, N3:S
P3         N3:M, N1:S

Page 40: Apache Helix presentation at Vmware

Execution modes

•  Who controls what

                   AUTO REBALANCE  AUTO   CUSTOM
Replica placement  Helix           App    App
Replica state      Helix           Helix  App

Page 41: Apache Helix presentation at Vmware

Auto rebalance vs. Auto

AUTO REBALANCE   AUTO

Page 42: Apache Helix presentation at Vmware

In action

Auto rebalance: MasterSlave p=3 r=2 N=3

Before failure
Node 1   Node 2   Node 3
P1:M     P2:M     P3:M
P2:S     P3:S     P1:S

After failure
Node 1   Node 2   Node 3
P1:O     P2:M     P3:M
P2:O     P3:S     P1:S
New replicas: P1:M, P2:S

On failure: Auto create replica and assign state

Auto: MasterSlave p=3 r=2 N=3

Before failure
Node 1   Node 2   Node 3
P1:M     P2:M     P3:M
P2:S     P3:S     P1:S

After failure
Node 1   Node 2   Node 3
P1:M     P2:M     P3:M
P2:S     P3:S     P1:M

On failure: Only change states to satisfy constraint
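A sketch of the AUTO behaviour ("only change states to satisfy constraint"), assuming a hypothetical helper `AutoModeFailover`: replicas of the failed node are dropped, a surviving slave is promoted where the master was lost, and no new replica is created. This is an illustration, not the Helix rebalancer.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of AUTO-mode failure handling; not Helix code. */
class AutoModeFailover {
    /**
     * Takes partition -> (node -> state) and returns the assignment after
     * failedNode is lost. Placement of surviving replicas is untouched;
     * only states change so that each partition has a master again.
     */
    static Map<String, Map<String, String>> onFailure(
            Map<String, Map<String, String>> assignment, String failedNode) {
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> p : assignment.entrySet()) {
            Map<String, String> replicas = new TreeMap<>(p.getValue());
            replicas.remove(failedNode);
            if (!replicas.containsValue("MASTER")) {
                for (String node : replicas.keySet()) {
                    if ("SLAVE".equals(replicas.get(node))) {
                        replicas.put(node, "MASTER"); // promote first surviving slave
                        break;
                    }
                }
            }
            result.put(p.getKey(), replicas);
        }
        return result;
    }

    /** The p=3, r=2, N=3 MasterSlave layout from this slide. */
    static Map<String, Map<String, String>> slideAssignment() {
        Map<String, Map<String, String>> a = new TreeMap<>();
        a.put("P1", pair("N1", "MASTER", "N3", "SLAVE"));
        a.put("P2", pair("N2", "MASTER", "N1", "SLAVE"));
        a.put("P3", pair("N3", "MASTER", "N2", "SLAVE"));
        return a;
    }

    private static Map<String, String> pair(String k1, String v1, String k2, String v2) {
        Map<String, String> m = new TreeMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }
}
```

With Node 1 failed, P1 loses its master and the slave on Node 3 is promoted, while P2 and P3 keep their masters and simply run with fewer replicas.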

Page 43: Apache Helix presentation at Vmware

Custom  mode:  example  


Page 44: Apache Helix presentation at Vmware

Custom mode: handling failure

Custom code invoker
•  Code that lives on all nodes, but is active in one place
•  Invoked when a node joins/leaves the cluster
•  Computes new idealstate
•  Helix controller fires the transitions without violating constraints

Before              After
P1  N1:M, N2:S      P1  N1:S, N2:M
P2  N2:M, N3:S      P2  N2:M, N3:S
P3  N3:M, N1:S      P3  N3:M, N1:S

Transitions
1   N1   M → S
2   N2   S → M

1 & 2 in parallel violate the single master constraint.

Helix sends 2 after 1 is finished.
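The ordering rule on this slide can be sketched as "demote before promote". The class below is a hypothetical simplification: it only reorders pending transitions for one partition so that those leaving MASTER come first, whereas the real controller checks the configured constraints.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: order transitions so a partition never has two masters. */
class TransitionOrdering {
    static final class Transition {
        final String node;
        final String from;
        final String to;

        Transition(String node, String from, String to) {
            this.node = node;
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Returns the pending transitions with every move out of MASTER placed
     * before all other moves, so a promotion is only sent once the old
     * master has stepped down.
     */
    static List<Transition> order(List<Transition> pending) {
        List<Transition> ordered = new ArrayList<>();
        for (Transition t : pending) {
            if (t.from.equals("MASTER")) {
                ordered.add(t);
            }
        }
        for (Transition t : pending) {
            if (!t.from.equals("MASTER")) {
                ordered.add(t);
            }
        }
        return ordered;
    }
}
```

For the two transitions above, N1 (M → S) is always emitted before N2 (S → M), whatever order they were submitted in.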

Page 45: Apache Helix presentation at Vmware

Controller deployment

Separate
•  At least 2 separate controller processes to avoid SPOF
•  Only one controller active
•  Extra process to manage
•  Recommended for large size clusters
•  Upgrading controller is easy

Embedded
•  Embedded controller within each participant
•  Only one controller active
•  No extra process to manage
•  Suitable for small size cluster
•  Upgrading controller is costly
•  Participant health impacts controller

Page 46: Apache Helix presentation at Vmware

Controller fault tolerance

[Figure: Controller 1 (LEADER), Controller 2 (STANDBY) and Controller 3 (STANDBY), all connected to Zookeeper]

Zookeeper ephemeral based leader election for deciding controller leader

Page 47: Apache Helix presentation at Vmware

Controller fault tolerance

[Figure: Controller 1 (OFFLINE), Controller 2 (LEADER) and Controller 3 (STANDBY), all connected to Zookeeper]

When leader fails, another controller becomes the new leader

Page 48: Apache Helix presentation at Vmware


Managing  the  controllers  

Page 49: Apache Helix presentation at Vmware

Scaling the controller: Leader Standby Model

[Figure: three controllers managing six clusters; controller state model with states OFFLINE (O), STANDBY (S) and LEADER (L)]

Page 50: Apache Helix presentation at Vmware

Scaling the controller: Failure

[Figure: three controllers managing six clusters; controller state model with states OFFLINE (O), STANDBY (S) and LEADER (L)]

Page 51: Apache Helix presentation at Vmware

Outline

•  Introduction
•  Architecture
•  How to use Helix
•  Tools
•  Helix usage

Page 52: Apache Helix presentation at Vmware

Tools

•  Chaos monkey
•  Data driven testing and debugging
•  Rolling upgrade
•  On demand task scheduling and intra-cluster messaging
•  Health monitoring and alerts

Page 53: Apache Helix presentation at Vmware

Data driven testing

•  Instrument – Zookeeper, controller, participant logs
•  Simulate – Chaos monkey
•  Analyze – Invariants are
   •  Respect state transition constraints
   •  Respect state count constraints
   •  And so on
•  Debugging made easy
   •  Reproduce exact sequence of events

Page 54: Apache Helix presentation at Vmware

Structured Log File - sample

timestamp partition instanceName sessionId state

1323312236368 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236426 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236530 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236530 TestDB_91 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236561 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc SLAVE

1323312236561 TestDB_91 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236685 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc SLAVE

1323312236685 TestDB_91 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236685 TestDB_60 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236719 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc SLAVE

1323312236719 TestDB_91 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc SLAVE

1323312236719 TestDB_60 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc OFFLINE

1323312236814 TestDB_123 express1-md_16918 ef172fe9-09ca-4d77b05e-15a414478ccc SLAVE

Page 55: Apache Helix presentation at Vmware

No more than R=2 slaves

Time   State    Number of slaves  Instance
42632  OFFLINE  0                 10.117.58.247_12918
42796  SLAVE    1                 10.117.58.247_12918
43124  OFFLINE  1                 10.202.187.155_12918
43131  OFFLINE  1                 10.220.225.153_12918
43275  SLAVE    2                 10.220.225.153_12918
43323  SLAVE    3                 10.202.187.155_12918
85795  MASTER   2                 10.220.225.153_12918
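The invariant check can be sketched by replaying the events and recounting slaves after each one. The class name and the array layout (time, state, instance) are assumptions for illustration; the rows are the ones from the table above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag times at which a partition has more than R slaves. */
class SlaveCountInvariant {
    /** Events from this slide as {time, state, instance}. */
    static final String[][] SLIDE_EVENTS = {
        {"42632", "OFFLINE", "10.117.58.247_12918"},
        {"42796", "SLAVE", "10.117.58.247_12918"},
        {"43124", "OFFLINE", "10.202.187.155_12918"},
        {"43131", "OFFLINE", "10.220.225.153_12918"},
        {"43275", "SLAVE", "10.220.225.153_12918"},
        {"43323", "SLAVE", "10.202.187.155_12918"},
        {"85795", "MASTER", "10.220.225.153_12918"},
    };

    /** Replays events in order and returns the times where slaves > maxSlaves. */
    static List<Long> violations(String[][] events, int maxSlaves) {
        Map<String, String> stateByInstance = new HashMap<>();
        List<Long> violations = new ArrayList<>();
        for (String[] e : events) {
            stateByInstance.put(e[2], e[1]); // latest state of this instance
            int slaves = 0;
            for (String s : stateByInstance.values()) {
                if (s.equals("SLAVE")) {
                    slaves++;
                }
            }
            if (slaves > maxSlaves) {
                violations.add(Long.parseLong(e[0]));
            }
        }
        return violations;
    }
}
```

With R=2 the only flagged time is 43323, the row where the table shows 3 slaves.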

Page 56: Apache Helix presentation at Vmware

How long was it out of whack?

Number of slaves  Time       Percentage
0                 1082319    0.5
1                 35578388   16.46
2                 179417802  82.99
3                 118863     0.05

Number of masters  Time       Percentage
0                  15490456   7.164960359
1                  200706916  92.83503964

83% of the time, there were 2 slaves to a partition
93% of the time, there was 1 master to a partition
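The percentages are each state count's share of the total observed time. A small sketch of that arithmetic, with a hypothetical class name and the Time column above as input:

```java
/** Hypothetical sketch: share of total time spent at each replica count. */
class TimeInState {
    /** Returns durations[i] as a percentage of the sum of all durations. */
    static double[] percentages(long[] durations) {
        long total = 0;
        for (long d : durations) {
            total += d;
        }
        double[] pct = new double[durations.length];
        for (int i = 0; i < durations.length; i++) {
            pct[i] = 100.0 * durations[i] / total;
        }
        return pct;
    }

    /** Time column for 0, 1, 2 and 3 slaves from the table above. */
    static final long[] SLAVE_TIMES = {1082319L, 35578388L, 179417802L, 118863L};

    /** Time column for 0 and 1 masters from the table above. */
    static final long[] MASTER_TIMES = {15490456L, 200706916L};
}
```

Both Time columns sum to the same total, 216197372, so the two tables describe the same observation window.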

Page 57: Apache Helix presentation at Vmware

Invariant 2: State Transitions

FROM     TO       COUNT
MASTER   SLAVE    55
OFFLINE  DROPPED  0
OFFLINE  SLAVE    298
SLAVE    MASTER   155
SLAVE    OFFLINE  0

Page 58: Apache Helix presentation at Vmware

Outline

•  Introduction
•  Architecture
•  How to use Helix
•  Tools
•  Helix usage

Page 59: Apache Helix presentation at Vmware

Helix usage at LinkedIn

Espresso

Page 60: Apache Helix presentation at Vmware

In flight

•  Apache S4
   – Partitioning, co-location
   – Dynamic cluster expansion

•  Archiva
   – Partitioned replicated file store
   – Rsync based replication

•  Others in evaluation
   – Bigtop

Page 61: Apache Helix presentation at Vmware

Auto scaling software deployment tool

•  States
   •  Download, Configure, Start
   •  Active, Standby

•  Constraint for each state
   •  Download < 100
   •  Active 1000
   •  Standby 100

[Figure: state diagram with states Offline, Download (< 100), Configure, Start, Active (1000), Standby (100)]

Page 62: Apache Helix presentation at Vmware

Summary

•  Helix: A generic framework for building distributed systems

•  Modifying/enhancing system behavior is easy
   – Abstraction and modularity are key

•  Simple programming model: declarative state machine

Page 63: Apache Helix presentation at Vmware

Roadmap

•  Features
   •  Span multiple data centers
   •  Automatic load balancing
   •  Distributed health monitoring
   •  YARN generic Application Master for real time apps
   •  Stand alone Helix agent

 

Page 64: Apache Helix presentation at Vmware

website   http://helix.incubator.apache.org

user   [email protected]

dev   [email protected]

twitter   @apachehelix, @kishoreg1980