Transcript
Page 1: Masterless Distributed Computing with Riak Core - EUC 2010

Rusty Klophaus (@rklophaus), Basho Technologies

Masterless Distributed Computing with Riak Core

Erlang User Conference, Stockholm, Sweden · November 2010

Monday, November 22, 2010

Page 2: Masterless Distributed Computing with Riak Core - EUC 2010

Basho Technologies

About Basho Technologies
• Offices in:
  • Cambridge, Massachusetts
  • San Francisco, California
• Distributed company
• ~20 people
• Riak KV and Riak Search (both Open Source)
• SLA-based Support and Enterprise Software ($)
• Other Open Source Erlang developed by Basho:
  • Webmachine, Rebar, Bitcask, Erlang_JS, Basho Bench

Page 3: Masterless Distributed Computing with Riak Core - EUC 2010

Riak KV and Riak Search

Riak KV

Key/Value Datastore

Map/Reduce, Lightweight Data Relations, Client APIs

Riak Search

Full-text search and indexing engine

Near Realtime Indexing, Riak KV Integration, Solr Support.

Common Properties

Both are distributed, scalable, failure-tolerant applications.

Both based on Amazon’s Dynamo Architecture.

Page 4: Masterless Distributed Computing with Riak Core - EUC 2010

[Diagram: Riak KV, Riak Search, Riak Core]

The Common Parts are called Riak Core

Page 5: Masterless Distributed Computing with Riak Core - EUC 2010

[Diagram: Riak KV, Riak Search, Riak Core]

The Common Parts are called Riak Core

Distribution / Scaling / Failure-Tolerance Code

Page 6: Masterless Distributed Computing with Riak Core - EUC 2010

Riak Core is an Open Source Erlang library that helps you build distributed, scalable, failure-tolerant applications using a Dynamo-style architecture.

Page 7: Masterless Distributed Computing with Riak Core - EUC 2010

“We Generalized the Dynamo Architecture and Open-Sourced the Bits.”

Page 8: Masterless Distributed Computing with Riak Core - EUC 2010

What Areas are Covered?

Amazon’s Dynamo Paper highlighted to show parts covered in Riak Core.

Page 9: Masterless Distributed Computing with Riak Core - EUC 2010

Distributed, scalable, failure-tolerant.

Page 10: Masterless Distributed Computing with Riak Core - EUC 2010

Distributed, scalable, failure-tolerant.

No central coordinator. Easy to set up/operate.

Page 11: Masterless Distributed Computing with Riak Core - EUC 2010

Distributed, scalable, failure-tolerant.

Horizontally scalable; add commodity hardware to get more X.

Page 12: Masterless Distributed Computing with Riak Core - EUC 2010

Distributed, scalable, failure-tolerant.

Always available. No single point of failure.

Self-healing.

Page 13: Masterless Distributed Computing with Riak Core - EUC 2010

Wait, doesn’t *Erlang* let you build distributed, scalable, failure-tolerant applications?

Page 14: Masterless Distributed Computing with Riak Core - EUC 2010

[Diagram: Client, Service A, Service B, Service C, Resource D, Queue E]

Erlang makes it easy to connect the components of your application.

Page 15: Masterless Distributed Computing with Riak Core - EUC 2010

Service

Node A  Node B  Node C  Node D
Node E  Node F  Node G  Node H
Node I  Node J  Node K  Node L
Node M  Node N  Node O  . . .

Riak Core helps you build a service that harnesses the power of many nodes.

Page 16: Masterless Distributed Computing with Riak Core - EUC 2010

How does Riak Core work?

Page 17: Masterless Distributed Computing with Riak Core - EUC 2010

A Simple Interface...

Command ObjectName, Payload

Send commands, get responses.

How do we route the commands to physical machines?

Page 18: Masterless Distributed Computing with Riak Core - EUC 2010

Hash the Object Name

Command ObjectName, Payload

SHA1(ObjName), Payload

0 to 2^160
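
As a rough illustration of this step, the Erlang sketch below hashes an object name with SHA-1 and reads the 20-byte digest as an unsigned integer, which places it between 0 and 2^160 - 1. The module and function names are made up for this example, and `crypto:hash/2` is the call available in current Erlang/OTP releases, not necessarily what Riak Core itself uses.

```erlang
-module(hash_demo).
-export([hash_name/1]).

%% Hash an object name to an integer position on the 160-bit hash space.
%% crypto:hash(sha, Data) returns the 20-byte SHA-1 digest of Data.
hash_name(ObjName) when is_binary(ObjName) ->
    <<Position:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    Position.
```

Calling `hash_demo:hash_name(<<"my_object">>)` always returns the same integer for the same name, which is what makes the hash usable for routing.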

Page 19: Masterless Distributed Computing with Riak Core - EUC 2010

A Naive Approach

Command ObjectName, Payload

SHA1(ObjName), Payload

Node A Node B Node C Node D

Page 20: Masterless Distributed Computing with Riak Core - EUC 2010

A Naive Approach

Command ObjectName, Payload

SHA1(ObjName), Payload

Node A Node B Node C Node D Node E

Existing routes become invalid when you add/remove nodes.
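
To make the problem concrete, here is a small Erlang sketch of one possible naive scheme: take the hash modulo the number of nodes. The slide does not say exactly how the naive mapping works, so the modulo rule, the module name, and the function names are all assumptions made for illustration.

```erlang
-module(naive_routing).
-export([node_for/2, moved/3]).

%% Naive routing (assumed scheme): hash the name, then take the
%% remainder by the number of nodes to pick a node index.
node_for(ObjName, NumNodes) ->
    <<Position:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    Position rem NumNodes.

%% Count how many names are routed to a different node index
%% once the cluster size changes from OldCount to NewCount.
moved(Names, OldCount, NewCount) ->
    length([N || N <- Names,
                 node_for(N, OldCount) =/= node_for(N, NewCount)]).
```

For example, with `Names = [integer_to_binary(I) || I <- lists:seq(1, 1000)]`, the call `naive_routing:moved(Names, 4, 5)` counts the names whose route changes when a fifth node is added. With a uniform hash, roughly four out of five names are expected to move, because a name keeps its node only when the hash has the same remainder modulo 4 and modulo 5.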

Page 21: Masterless Distributed Computing with Riak Core - EUC 2010

"All problems in computer science can be solved by

another level of indirection." - David Wheeler

Page 22: Masterless Distributed Computing with Riak Core - EUC 2010

Routing with Consistent Hashing

Command ObjectName, Payload

SHA1(ObjName), Payload

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Node A Node B Node C Node D

Page 23: Masterless Distributed Computing with Riak Core - EUC 2010

Adding a Node

Command ObjectName, Payload

SHA1(ObjName), Payload

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Node A Node B Node C Node D Node E

Page 24: Masterless Distributed Computing with Riak Core - EUC 2010

Removing a Node

Command ObjectName, Payload

SHA1(ObjName), Payload

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Node A Node B Node C Node D Node E
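
The indirection in the last three slides can be sketched in Erlang as two separate lookups: a fixed mapping from object name to vnode (partition), and a changeable mapping from vnode to physical node. This is a simplified illustration, not Riak Core's ring code; the module name, the function names, and the use of a map for ownership are assumptions.

```erlang
-module(vnode_routing).
-export([partition_for/2, owner/2, claim/3]).

%% Map an object name to one of NumPartitions equal slices of the
%% 160-bit hash space. This depends only on the name and the fixed
%% partition count, never on which nodes are in the cluster.
partition_for(ObjName, NumPartitions) ->
    <<Position:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    (Position * NumPartitions) bsr 160.

%% Ownership is a map from partition number to node name (assumed shape).
owner(Partition, Ownership) ->
    maps:get(Partition, Ownership).

%% Hand the listed partitions to Node; every other assignment is untouched.
claim(Partitions, Node, Ownership) ->
    lists:foldl(fun(P, Acc) -> maps:put(P, Node, Acc) end,
                Ownership, Partitions).
```

Starting from `#{0 => a, 1 => b, 2 => c, 3 => d, 4 => a, 5 => b, 6 => c, 7 => d}`, adding node `e` with `vnode_routing:claim([4], e, Ownership)` changes the owner of vnode 4 only. Names that hash to the other seven vnodes keep their existing routes.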

Page 25: Masterless Distributed Computing with Riak Core - EUC 2010

The Ring

Hash Location

Page 26: Masterless Distributed Computing with Riak Core - EUC 2010

Writing Replicas (N Value)

Locations when N=3

Page 27: Masterless Distributed Computing with Riak Core - EUC 2010

Routing Around Failures

Locations when N=3 and node 0 is down (marked X).

Page 28: Masterless Distributed Computing with Riak Core - EUC 2010

The Preflist

Preflist
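
One way to picture the preflist is as the first N ring entries found by walking clockwise from the partition the object name hashed to, skipping entries whose node is down. The Erlang sketch below follows that simplified reading of the slides. It is not Riak Core's actual algorithm, and the module name, the `{Partition, Node}` ring representation, and the `Down` list are assumptions.

```erlang
-module(preflist_sketch).
-export([preflist/4]).

%% Ring:  list of {Partition, Node} pairs in ring order.
%% Start: partition number that the object name hashed to.
%% N:     number of replicas wanted.
%% Down:  list of nodes currently considered unreachable.
%% Returns up to N pairs, walking clockwise from Start and skipping
%% partitions whose owning node is in Down.
preflist(Ring, Start, N, Down) ->
    {Before, FromStart} =
        lists:splitwith(fun({P, _}) -> P =/= Start end, Ring),
    Clockwise = FromStart ++ Before,
    Up = [Entry || {_, Node} = Entry <- Clockwise,
                   not lists:member(Node, Down)],
    lists:sublist(Up, N).
```

With a ring of eight partitions owned in turn by nodes `a`, `b`, `c`, `d`, a name that hashes to partition 7 gets partitions 7, 0, and 1 when every node is up. If node `a` is down, the walk skips partition 0 and continues to partitions 1 and 2.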

Page 29: Masterless Distributed Computing with Riak Core - EUC 2010

Location of the Routing Layer

Page 30: Masterless Distributed Computing with Riak Core - EUC 2010

Router in the Middle Leads to SPOF

Client Client Client

Router

Node A Node B Node C Node D Node E

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Page 31: Masterless Distributed Computing with Riak Core - EUC 2010

Riak Core - Router on Each Node

Client Client Client

Router Router Router Router Router

Node A Node B Node C Node D Node E

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Page 32: Masterless Distributed Computing with Riak Core - EUC 2010

Eventually - Router in the Client

Client Client Client

Router Router Router

Node A Node B Node C Node D Node E

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Why isn’t this done yet? Time and complexity.

Page 33: Masterless Distributed Computing with Riak Core - EUC 2010

How Do The Routers Reach Agreement?

Router Router Router Router Router

Node A Node B Node C Node D Node E

VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7

Page 34: Masterless Distributed Computing with Riak Core - EUC 2010

The Nodes Gossip Their World View

Local Ring State

Incoming Ring State

Are rings equivalent? Strictly descendent? Or different?
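
The slide does not say how the three cases are told apart. One common technique for this kind of comparison is a version vector attached to each ring state, and the Erlang sketch below shows that idea only. The module name, the map-of-counters representation, and the result atoms are all assumptions, not Riak Core's API.

```erlang
-module(ring_compare).
-export([compare/2]).

%% Each ring state is assumed to carry a version vector: a map from
%% node name to the number of updates that node has made.

%% A descends from B when A has seen at least every update B has.
descends(A, B) ->
    maps:fold(fun(Node, Count, Acc) ->
                      Acc andalso maps:get(Node, A, 0) >= Count
              end, true, B).

%% Classify the local ring state against an incoming one.
compare(Local, Incoming) ->
    case {descends(Local, Incoming), descends(Incoming, Local)} of
        {true, true}   -> equivalent;
        {true, false}  -> local_is_newer;
        {false, true}  -> incoming_is_newer;
        {false, false} -> different
    end.
```

In this sketch a node could ignore the incoming state when the result is `equivalent` or `local_is_newer`, adopt it when the result is `incoming_is_newer`, and run some merge step when the result is `different`.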

Page 35: Masterless Distributed Computing with Riak Core - EUC 2010

Not Mentioned
Vector Clocks
Merkle Trees
Bloom Filters

Page 36: Masterless Distributed Computing with Riak Core - EUC 2010

Building an Application with Riak Core

Page 37: Masterless Distributed Computing with Riak Core - EUC 2010

Building an Application on Riak Core? Two things to think about:

The Command Set
Command = ObjectName, Payload
The commands/requests/operations that you will send through the system.

The VNode Module
The callback module that will receive the commands.

Page 38: Masterless Distributed Computing with Riak Core - EUC 2010

Writing a VNode Module

Startup/Shutdown
init([Partition]) -> {ok, State}
terminate(State) -> ok

Receive Incoming Commands
handle_command(Cmd, Sender, State) -> {noreply, State1} | {reply, Reply, State1}
handle_handoff_command(Cmd, Sender, State) -> {noreply, State1} | {reply, ok, State1}

Page 39: Masterless Distributed Computing with Riak Core - EUC 2010

Writing a VNode Module

Send and Receive Handoff Data
handoff_starting(Node, State) -> {Bool, State1}
encode_handoff_data(Data, State) -> <<Binary>>.
handle_handoff_data(Data, Sender, State) -> {reply, ok, State1}
handoff_finished(Node, State) -> {ok, State1}
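
Putting the callbacks from these two slides together, a vnode module for a toy in-memory key/value store might look like the sketch below. The callback names and return shapes follow the slides as transcribed; the module name, the `put`/`get` verbs, the state record, and the handoff encoding are hypothetical. The sketch does not declare a behaviour, so it compiles without riak_core, and the callback arities in a given riak_core release may differ.

```erlang
-module(kv_sketch_vnode).
-export([init/1, terminate/1,
         handle_command/3, handle_handoff_command/3,
         handoff_starting/2, encode_handoff_data/2,
         handle_handoff_data/3, handoff_finished/2]).

%% Per-vnode state: the partition it serves and an in-memory map.
-record(state, {partition, data = #{}}).

init([Partition]) ->
    {ok, #state{partition = Partition}}.

terminate(_State) ->
    ok.

%% Commands are {Verb, ObjName, Payload}; put and get are
%% hypothetical verbs chosen for this example.
handle_command({put, ObjName, Payload}, _Sender,
               State = #state{data = Data}) ->
    {reply, ok, State#state{data = maps:put(ObjName, Payload, Data)}};
handle_command({get, ObjName, _Payload}, _Sender,
               State = #state{data = Data}) ->
    {reply, maps:find(ObjName, Data), State}.

%% While handoff is in progress, this sketch serves commands as usual.
handle_handoff_command(Cmd, Sender, State) ->
    handle_command(Cmd, Sender, State).

%% Returning true agrees to start handing data off to Node.
handoff_starting(_Node, State) ->
    {true, State}.

%% Encode one {ObjName, Payload} pair for transfer.
encode_handoff_data(Data, _State) ->
    term_to_binary(Data).

%% Decode one transferred pair and store it locally.
handle_handoff_data(Binary, _Sender, State = #state{data = Data}) ->
    {ObjName, Payload} = binary_to_term(Binary),
    {reply, ok, State#state{data = maps:put(ObjName, Payload, Data)}}.

handoff_finished(_Node, State) ->
    {ok, State}.
```

Because the callbacks are plain functions, they can be exercised directly from the shell, for example `{ok, S0} = kv_sketch_vnode:init([0])` followed by `kv_sketch_vnode:handle_command({put, <<"k">>, <<"v">>}, self(), S0)`.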

Page 40: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

application:start(riak_core).

Page 41: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Supervise vnode processes.

Page 42: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Start, coordinate, and supervise handoff.

Page 43: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Maintain cluster membership information.

Page 44: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Monitor node liveness, broadcast to registered modules.

Page 45: Masterless Distributed Computing with Riak Core - EUC 2010

Start the riak_core application

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Send ring information to other nodes.
Reconcile different views of the cluster.
Rebalance cluster when nodes join or leave.

Page 46: Masterless Distributed Computing with Riak Core - EUC 2010

In your application...

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Start the vnodes for your application.

Master = { riak_X_vnode_master,
           { riak_core_vnode_master, start_link, [riak_X_vnode] },
           permanent, 5000, worker, [riak_core_vnode_master]},
{ok, { {one_for_one, 5, 10}, [Master]} }.

Page 47: Masterless Distributed Computing with Riak Core - EUC 2010

In your application...

[Diagram: riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Tell riak_core that your application is ready to receive requests.

riak_core:register_vnode_module(riak_X_vnode),
riak_core_node_watcher:service_up(riak_X, self())

Page 48: Masterless Distributed Computing with Riak Core - EUC 2010

In your application...

[Diagram: two nodes, each with its own riak_core process tree: riak_core_vnode_sup, riak_core_handoff_*, riak_core_ring_*, riak_core_node_*, riak_core_gossip_*, and the X_vnode processes]

Join an existing node in the cluster.

riak_core_gossip:send_ring(ClusterNode, node())

Page 49: Masterless Distributed Computing with Riak Core - EUC 2010

Start Sending Commands

%% Figure out the preflist...
{_Verb, ObjName, _Payload} = Command,
PrefList = riak_core_apl:get_apl(ObjName, NVal, riak_X),

%% Send the command...
riak_core_vnode_master:command(PrefList, Command, riak_X_vnode_master)

Page 50: Masterless Distributed Computing with Riak Core - EUC 2010

Review

Riak KV
Open Source Key/Value datastore.

Riak Search
Full-text, near real-time search engine based on Riak Core.

Riak Core
Open Source Erlang library that helps you build distributed, scalable, failure-tolerant applications using a Dynamo-style architecture.

Page 51: Masterless Distributed Computing with Riak Core - EUC 2010

Thanks! Questions?

Learn More
http://wiki.basho.com
Read Amazon’s Dynamo Paper

Get the Code
http://github.com/basho/riak_core

Get in Touch
[email protected] on Email
@rklophaus on Twitter

Page 52: Masterless Distributed Computing with Riak Core - EUC 2010

END
