Riak Search - Erlang Factory London 2010

Preview:

DESCRIPTION

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality, with specific focus on how Erlang is used to manage and execute an ever-changing population of ad hoc query processes.

Citation preview

Erlang Factory· London· June 2010

Basho Technologies

Rusty Klophaus - @rklophaus

Riak SearchA Full-Text Search

and Indexing Engine

based on Riak

Why did we build it?

What are the major goals?

How does it work?

2

Part One

Why did we build

Riak Search?

3

Riak is

a scalable, highly-available, networked,

open-source key/value store.

4

Key/Value

CLIENT RIAK

5

Writing to a Key/Value Store

Object

CLIENT RIAK

6

Writing to a Key/Value Store

Key

Object

CLIENT RIAK

Querying a Key/Value Store

7

Key + Instructions

Object(s)

CLIENT RIAK

Walk to Related

Keys

Querying Riak via LinkWalking

8

Key(s) + JS Functions

Computed Value(s)

CLIENT RIAK

Map

Reduce

Map

Querying Riak via Map/Reduce

9

Key/Value Stores

like

Key-Based Queries

10

where Category == "Shoes"

CLIENT RIAK

WTF!? I'm aKV store!

Query by Secondary Index

11

"Converse AND Shoes"

CLIENT RIAK

This is getting old.

Full-Text Query

12

These kinds of queries

need an Index.

*Market Opportunity!*

13

Part Two

What are the major

goals of Riak Search?

14

Your Application

Riak

An application built on Riak.

15

Your Application

RiakIndex

Object

Hrm... I need an index.

16

Your Application

Riak???

Hrm... I need an index with more features.

17

Your Application

RiakLucene

Lucene should do the trick...

18

Your Application

Lucene Lucene Lucene Riak

...shard to add more storage capacity...

19

Your Application

Lucene Lucene Lucene

Lucene Lucene Lucene

Lucene Lucene Lucene

Riak

...replicate to add more throughput.

20

Your Application

Lucene Lucene Lucene

Lucene Lucene Lucene

Lucene Lucene Lucene

Riak

...replicate to add more throughput.

21

Operations nightmare!

Your Application

Riak-ifiedLucene

Riak

What do we really want?

22

Your Application

RiakSearch

Riak

What do we really want?

23

Functionality? Be like Lucene (and more).

• Lucene Syntax

• Leverages Java Lucene Analyzers

• Solr Endpoints

• Integration via Riak Post-Commit Hook (Index)

• Integration via Riak Map/Reduce (Query)

• Near-Realtime

• Schema-less

24

Operations? Be like Riak.

• No special nodes

• Add nodes, get more compute and storage

• Automatically load balance

• Replicas for durability and performance

• Index and query in parallel

• Swappable storage backends

25

Part Three

How do we do it?

26

A Gentle Introduction to

Document Indexing

27

Every dog has his day.#1

day, 1

dog, 1

every, 1

has, 1

his, 1

Inverted IndexDocument

The Inverted Index

28

The dog's bark is worse than his bite.

Every dog has his day.

Let the cat out of the bag.

It's raining cats and dogs.

#1

#2

#3

#4

Combined Inverted IndexDocuments

and, 4

bag, 3

bark, 2

bite, 2

cat, 3

cat, 4

day, 1

dog, 1

dog, 2

dog, 4

every, 1

has, 1

...

The Inverted Index

29

"dog AND cat"

AND

dog cat

At Query Time...

30

AND

dog cat

dog, 1

dog, 2

dog, 4

cat, 3

cat, 4

At Query Time...

31

AND(Merge Intersection)

1

2

4

3

4

Result: 4

At Query Time...

32

OR(Merge Union)

1

2

4

3

4

Result: 1, 2, 3, 4

At Query Time...

33

Complex Behavior from Simple Structures

34

Storage Approaches...

35

Riak Search uses

Consistent Hashing

to store data on

Partitions

36

Partitions = 10

Number of Nodes = 5

Partitions per Node = 2

Replicas (NVal) = 2

Introduction to Consistent Hashing and Partitions

37

Object

Introduction to Consistent Hashing and Partitions

38

Document Partitioning

vs.

Term Partitioning

39

...and the

Resulting Tradeoffs

40

Every dog has his day.#1

Document Partitioning @ Index Time

41

"dog OR cat"

Document Partitioning @ Query Time

42

Every dog has his day.#1

day, 1

dog, 1

every, 1

has, 1

his, 1

Term Partitioning @ Index Time

43

day, 1 has, 1

every, 1his, 1

dog, 1

Term Partitioning @ Index Time

44

"dog OR cat"

Term Partitioning @ Query Time

45

Document Partitioning Term Partitioning

+ Lower Latency Queries

- Lower Throughput

- Lots of Disk Seeks

- Higher Latency Queries

+ Higher Throughput

- Hotspots in Ring (the "Obama" problem)

Tradeoffs...

46

Riak Search: Term Partitioning

47

Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.

Optimizations:

• Term splitting to reduce hot spots

• Bloom filters & caching to save query-time bandwidth

• Batching to save query-time & index-time bandwidth

Support for either approach eventually.

Diving Deeper:

The Lifecycle of a Query

48

Parse the Query

49

meeting AND (face OR phone)

The Query

50

[{land, [

{term,"meeting",[]},

{lor,[

{term,"face",[]},

{term,"phone",[]}

]}

]}]

The Query as an Erlang Term (Parse w/ Leex and Yecc)

51

#land

#term

"meeting"

#lor

#term #term

"face""phone"

The Query as a Graph

52

Plan the Query

53

Catalog CatalogCatalog Catalog

Catalog CatalogCatalog Catalog

System Catalog

54

Term TermID

Term Weight

&

File Offset

System Catalog

55

#land

#term

"meeting"

#lor

#term #term

"face""phone"23 @ node B

17 @ node A 13 @ node C

Consult the System Catalog for Term/Node Weights

56

#land

#term

"meeting"

#lor

#term #term

"face""phone"23 @ node B

17 @ node A 13 @ node C

Use Term Weights to Plan the Query

57

#land

#term

"meeting"

#lor

#term #term

"face""phone"23 @ node B

17 @ node A 13 @ node C

Use Term Weights to Plan the Query

58

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

23 @ node B

17 @ node A13 @ node C

Use Term Weights to Plan the Query

59

[{node, {land, [ {node, {lor, [ {term,{"email","body","face"}, [ {node_weight,'node_c@127.0.0.1', 13} ]}, {term,{"email","body", "phone"}, [ {node_weight,'node_a@127.0.0.1', 17} ]} ]}, 'node_a@127.0.0.1' }, {term, {"email","body","meeting"}, [ {node_weight,'node_b@127.0.0.1', 23} ]} ]}, 'node_b@127.0.0.1'}]

The Node-Assigned Query as an Erlang Term

60

Execute the Query

61

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes

62

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes

63

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes

64

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes

65

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes

66

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

67

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

68

#land

#term

"meeting" #lor

#term #term

"face""phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

69

#land

#term

'disconnect' #lor

#term #term

'disconnect''disconnect'

#node@A

#node@B

Terminate When Finished

70

Message Format

71

Message ::

{results, [Result]} |

{results, disconnect}

Result ::

{DocID, Properties}

DocID ::

term()

Properties ::

proplist()

The Message Format

72

{results, [

{375, []},

{961, [{color, "red"}]},

{155, [{pos, [1,2,5]}]}

]}

The Message Format

73

Yay for Erlang!

74

• Clean lines between load balancing and logic, single- and multi-node look the same

• Easy to create new operators, rapid development of experimental features

• Linked processes make cleanup a breeze

• Significant code reduction over early Java prototypes

Part Four

Review

75

"Converse AND Shoes"

CLIENT RIAK

WTF!? I'm a

KV store!

Riak Search turns this...

76

"Converse AND Shoes"

CLIENT RIAK

Gladly!

...into this...

77

"Converse AND Shoes"

CLIENT RIAK

Keys or Objects

...into this...

78

Your Application

RiakSearch

Riak

...while keeping operations easy.

79

Thanks! Questions?

Search Team:

John Muellerleile - @jrecursive

Rusty Klophaus - @rklophaus

Kevin Smith - @kevsmith

Currently working with a small set of Beta users.

Open-source release planned for Q3.

www.basho.com

Recommended