Apache ZooKeeper An Introduction and Practical Use Cases

Introduction to ZooKeeper - TriHUG May 22, 2012


DESCRIPTION

Presentation given at TriHUG (Triangle Hadoop User Group) on May 22, 2012. Gives a basic overview of Apache ZooKeeper as well as some common use cases, 3rd-party libraries, and "gotchas". Demo code is available at https://github.com/mumrah/trihug-zookeeper-demo


Page 1: Introduction to ZooKeeper - TriHUG May 22, 2012

Apache ZooKeeper
An Introduction and Practical Use Cases

Page 2

Who am I
● David Arthur
● Engineer at Lucid Imagination
● Hadoop user
● Python enthusiast
● Father
● Gardener

Page 3

Play along!
Grab the source for this presentation from GitHub: github.com/mumrah/trihug-zookeeper-demo

You'll need Java, Ant, and bash.

Page 4

Apache ZooKeeper
● Formerly a Hadoop sub-project
● ASF TLP (top-level project) since Nov 2010
● 7 PMC members, 8 committers - most from Yahoo! and Cloudera
● Ugly logo

Page 5

One liner
"ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers" - ZooKeeper wiki

Page 6

Who uses it?
Everyone*
● Yahoo!
● HBase
● Solr
● LinkedIn (Kafka, Hedwig)
● Many more

* https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy

Page 7

What is it good for?
● Configuration management - machines bootstrap config from a centralized source, which simplifies deployment/provisioning
● Naming service - like DNS, mappings of names to addresses
● Distributed synchronization - locks, barriers, queues
● Leader election - a common problem in distributed coordination
● Centralized and highly reliable (simple) data registry

Page 8

Namespace (ZNodes)
parent : "foo"
|-- child1 : "bar"
|-- child2 : "spam"
`-- child3 : "eggs"
    `-- grandchild1 : "42"
Every znode has data (given as byte[]) and can optionally have children.
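As a rough mental model only (the real client API appears on later slides), the tree above can be sketched with plain Java maps - the `Node` class here is an illustrative stand-in, not a ZooKeeper type:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ZnodeModel {
    // Minimal in-memory stand-in for a znode: some data plus named children.
    static class Node {
        final byte[] data;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String data) { this.data = data.getBytes(StandardCharsets.UTF_8); }
    }

    public static void main(String[] args) {
        // Build the exact tree from the slide.
        Node parent = new Node("foo");
        parent.children.put("child1", new Node("bar"));
        parent.children.put("child2", new Node("spam"));
        Node child3 = new Node("eggs");
        child3.children.put("grandchild1", new Node("42"));
        parent.children.put("child3", child3);

        // Every node carries data (as byte[]) and may have children.
        Node grandchild = parent.children.get("child3").children.get("grandchild1");
        System.out.println(new String(grandchild.data, StandardCharsets.UTF_8));
    }
}
```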

Page 9

Sequential znode
Nodes created in "sequential" mode get a 10-digit, zero-padded, monotonically increasing number appended to the name.

create("/demo/seq-", ..., ..., PERSISTENT_SEQUENTIAL)  x4

/demo
|-- seq-0000000000
|-- seq-0000000001
|-- seq-0000000002
`-- seq-0000000003
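The suffix is just a fixed-width decimal counter (assigned by the server; the formatting below mimics it locally, purely for illustration). The fixed width is what makes the names sort correctly:

```java
public class SequentialNames {
    // Mimic the server-side naming: prefix plus a 10-digit, zero-padded counter.
    static String seqName(String prefix, long counter) {
        return String.format("%s%010d", prefix, counter);
    }

    public static void main(String[] args) {
        for (long i = 0; i < 4; i++) {
            System.out.println(seqName("seq-", i));
        }
        // Zero-padding means lexicographic order equals numeric order,
        // which is what lock and leader-election recipes rely on.
    }
}
```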

Page 10

Ephemeral znode
Nodes created in "ephemeral" mode will be deleted when the originating client goes away.

create("/demo/foo", ..., ..., PERSISTENT);
create("/demo/bar", ..., ..., EPHEMERAL);

Connected:
/demo
|-- foo
`-- bar

Disconnected:
/demo
`-- foo

Page 11

Simple API
Pretty much everything lives under the ZooKeeper class:
● create
● exists
● delete
● getData
● setData
● getChildren

Page 12

Synchronicity
Sync and async versions of the API methods:

exists("/demo", null);

exists("/demo", null, new StatCallback() {
    @Override
    public void processResult(int rc, String path, Object ctx, Stat stat) {
        ...
    }
}, null);

Page 13

Watches
Watches are a one-shot callback mechanism for changes in connection and znode state:
● Client connects/disconnects
● ZNode data changes
● ZNode children change

Page 14

Demo time!
For those playing along, you'll need to get ZooKeeper running. Using the default port (2181), run:

ant zk

Or specify a port:

ant zk -Dzk.port=2181

Page 15

Things to "watch" out for
● Watches are one-shot - if you want continuous monitoring of a znode, you have to reset the watch after each event
● Too many clients watching a single znode creates a "herd effect" - lots of clients get notified at the same time, causing spikes in load
● Potential for missing changes - updates that happen between an event firing and the watch being reset are not seen
● All watches are executed in a single, separate thread (be careful about synchronization)
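The one-shot-and-reset pattern can be shown without a ZooKeeper server at all. In this stdlib-only sketch, the static registry is a stand-in for the server (not ZooKeeper's API): a triggered watcher is consumed, so the callback must re-register itself to keep receiving events:

```java
import java.util.ArrayList;
import java.util.List;

public class OneShotWatch {
    // Stand-in for the server side: registrations are consumed when triggered.
    static final List<Runnable> watchers = new ArrayList<>();
    static int eventsSeen = 0;

    static void setWatch(Runnable w) { watchers.add(w); }

    static void trigger() {
        List<Runnable> fired = new ArrayList<>(watchers);
        watchers.clear();              // one-shot: the registration is gone now
        fired.forEach(Runnable::run);
    }

    public static void main(String[] args) {
        // For continuous monitoring, the callback re-registers itself.
        Runnable watch = new Runnable() {
            public void run() {
                eventsSeen++;
                setWatch(this);        // reset the watch after each event
            }
        };
        setWatch(watch);
        trigger();
        trigger();
        trigger();
        System.out.println(eventsSeen); // sees every event because it resets
    }
}
```

Drop the `setWatch(this)` line and only the first trigger would be observed - which is exactly the "one-shot" gotcha above.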

Page 16

Building blocks
● Hierarchical nodes
● Parent and leaf nodes can have data
● Two special types of nodes - ephemeral and sequential
● Watch mechanism
● Consistency guarantees
  ○ Order of updates is maintained
  ○ Updates are atomic
  ○ Znodes are versioned for MVCC
  ○ Many more

Page 17

The Fun Stuff
Recipes:
● Lock
● Barrier
● Queue
● Two-phase commit
● Leader election
● Group membership

Page 18

Demo Time!
Group membership (i.e., the easy one)
Recipe:
● Members register a sequential ephemeral node under the group node
● Everyone keeps a watch on the group node for new children

Page 19

Lots of boilerplate
● Synchronize the asynchronous connection (using a latch or something)
● Handling disconnects/reconnects
● Exception handling
● Ensuring paths exist (nothing like mkdir -p)
● Resetting watches
● Cleaning up
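The first item usually means blocking on a java.util.concurrent.CountDownLatch until the connected event arrives, since the ZooKeeper constructor returns before the session is established. A stdlib-only sketch of that pattern, with a background thread standing in for the asynchronous connection callback:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectLatch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);

        // Stand-in for the Watcher that would receive the connected event
        // asynchronously; in real code this fires on KeeperState.SyncConnected.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            connected.countDown();
        }).start();

        // The constructor-equivalent returned immediately above;
        // block here (with a timeout) until the session is live.
        boolean ok = connected.await(5, TimeUnit.SECONDS);
        System.out.println(ok ? "connected" : "timed out");
    }
}
```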

Page 20

What happens?
● Everyone writes their own high-level wrapper/connection manager
  ○ ZooKeeperWrapper
  ○ ZooKeeperSession
  ○ (\w+)ZooKeeper
  ○ ZooKeeper(\w+)

Page 21

Open Source, FTW!
Luckily, some smart people have open sourced their ZooKeeper utilities/wrappers:
● Netflix Curator - Netflix/curator
● LinkedIn - linkedin/linkedin-zookeeper
● Many others

Page 22

Netflix Curator
● Handles the connection management
● Implements many recipes
  ○ leader election
  ○ locks, queues, and barriers
  ○ counters
  ○ path cache
● Bonus: service discovery implementation (we use this)

Page 23

Demo Time!
Group membership refactored with Curator
● EnsurePath is nice
● Robust connection management is awesome
● Exceptions are more sane

Page 24

Thoughts on Curator
i.e., my non-expert subjective opinions
● Good level of abstraction - doesn't do anything "magical"
● Doesn't hide ZooKeeper
● Weird API design (builder soup)
● Extensive, well-tested recipe support
● It works!

Page 25

ZooKeeper in the wild
Some use cases

Page 26

Use case: Solr 4.0
Used in "Solr cloud" mode for:
● Cluster management - what machines are available and where they are located
● Leader election - used for picking a shard as the "leader"
● Consolidated config storage
● Watches allow for a very non-chatty steady state
● Herd effect not really an issue

Page 27

Use case: Kafka
● LinkedIn's distributed pub/sub system
● Queues are persistent
● Clients request a slice of a queue (offset, length)
● Brokers are registered in ZooKeeper; clients load balance requests among live brokers
● Client state (last consumed offset) is stored in ZooKeeper
● Client rebalancing algorithm, similar to leader election

Page 28

Use case: LucidWorks Big Data
● We use Curator's service discovery to register REST services
● Nice for SOA
● Took 1 dev (me) 1 day to get something functional (mostly reading Curator docs)
● So far, so good!

Page 29

Review of "gotchas"
● Watch execution is single-threaded and synchronized
● Can't reliably get every change for a znode
● Excessive watchers on the same znode (herd effect)

Some new ones:
● GC pauses - if your application is prone to long GC pauses, make sure your session timeout is sufficiently long
● Catch-all watches - if you use one Watcher for everything, it can be tedious to infer exactly what happened

Page 30

Four letter words
The ZooKeeper server responds to a few "four letter word" commands via TCP or telnet*

> echo ruok | nc localhost 2181
imok

I'm glad you're OK, ZooKeeper - really I am.

* http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands

Page 31

Quorums
In a multi-node deployment (aka a ZooKeeper quorum), it is best to use an odd number of machines. ZooKeeper uses majority voting, so it can tolerate ceil(N/2)-1 machine failures and still function properly.
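The formula is easy to check by hand: a 5-node ensemble needs 3 votes for a majority, so it survives 2 failures, and adding a 6th node still only tolerates 2 - which is why odd sizes are preferred. A quick sketch (note ceil(N/2)-1 equals (N-1)/2 in integer arithmetic):

```java
public class QuorumMath {
    // ceil(N/2) - 1 tolerated failures, written with integer division.
    static int tolerated(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " nodes -> tolerates " + tolerated(n) + " failure(s)");
        }
        // 5 and 6 nodes tolerate the same number of failures, so the
        // sixth machine adds cost and coordination overhead but no safety.
    }
}
```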

Page 32

Multi-tenancy
ZooKeeper supports "chroot" at the session level. You can add a path to the connection string that will be implicitly prefixed to everything you do:

new ZooKeeper("localhost:2181/my/app");

Curator also supports this, but at the application level:

CuratorFrameworkFactory.builder()
    .namespace("/my/app");

Page 33

Python client
A dumb wrapper around the C client, not very Pythonic:

import zookeeper
zk_handle = zookeeper.init("localhost:2181")
zookeeper.exists(zk_handle, "/demo")
zookeeper.get_children(zk_handle, "/demo")

The stuff in contrib didn't work for me; I used a statically linked version: zc-zookeeper-static

Page 34

Other clients
Included in ZooKeeper under src/contrib:
● C (this is what the Python client uses)
● Perl (again, using the C client)
● REST (JAX-RS via Jersey)
● FUSE? (strange)

3rd-party client implementations:
● Scala, courtesy of Twitter
● Several others

Page 35

Overview
● Basics of ZooKeeper (znode types, watches)
● High-level recipes (group membership, et al.)
● Lots of boilerplate for basic functionality
● 3rd-party helpers (Curator, et al.)
● Gotchas and other miscellany

Page 36

Questions?
David Arthur
[email protected]
github.com/mumrah/trihug-zookeeper-demo