Large-scale Incremental Processing Using Distributed Transactions and Notifications
Daniel Peng and Frank Dabek, Google, Inc., OSDI 2010
Presented by Jee-bum Park, IDB Lab. Seminar, 15 Feb 2012


Page 1: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Large-scale Incremental ProcessingUsing Distributed Transactions and NotificationsDaniel Peng and Frank DabekGoogle, Inc.OSDI 2010

15 Feb 2012Presentation @ IDB Lab. Seminar

Presented by Jee-bum Park

Page 2: Large-scale Incremental Processing Using Distributed Transactions and Notifications

2

Outline
Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Page 3: Large-scale Incremental Processing Using Distributed Transactions and Notifications

3

Introduction How can Google find the documents on the web so fast?

Page 4: Large-scale Incremental Processing Using Distributed Transactions and Notifications

4

Introduction Google uses an index, built by the indexing system, that can be used to answer search queries

Page 5: Large-scale Incremental Processing Using Distributed Transactions and Notifications

5

Introduction What does the indexing system do?

– Crawling every page on the web
– Parsing the documents
– Extracting links
– Clustering duplicates
– Inverting links
– Computing PageRank
– ...

Page 7: Large-scale Incremental Processing Using Distributed Transactions and Notifications

7

Introduction Compute PageRank using MapReduce

Job 1: compute R(1)
Job 2: compute R(2)
Job 3: compute R(3)
...

R(t)(u) = (1 - d)/N + d * Σ_{v ∈ In(u)} R(t-1)(v) / |Out(v)|
(the standard PageRank recurrence: d is the damping factor, N the number of pages, In(u) the pages linking to u, and |Out(v)| the out-degree of v)
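To make the cost concrete, here is a minimal single-machine sketch of that recurrence in Python; the toy links graph and the damping factor d = 0.85 are illustrative assumptions, not the actual MapReduce pipeline:

def pagerank_iteration(links, rank, d=0.85):
    # One iteration: compute R(t) from R(t-1) over the whole link graph.
    n = len(links)
    new_rank = {page: (1 - d) / n for page in links}
    for page, outlinks in links.items():
        if not outlinks:
            continue
        share = d * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] = new_rank.get(target, (1 - d) / n) + share
    return new_rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # toy graph
rank = {page: 1.0 / len(links) for page in links}   # R(0): uniform
for t in range(3):                                  # Jobs 1..3 compute R(1)..R(3)
    rank = pagerank_iteration(links, rank)
print(rank)

Every iteration touches the whole graph, which is why a small re-crawl still forces a full recomputation in the MapReduce setting, as the next slides argue.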

Page 8–12: Large-scale Incremental Processing Using Distributed Transactions and Notifications

8–12

Introduction Now, consider how to update that index after re-crawling some small portion of the web

Is it okay to run the MapReduces over just the new pages?

Nope, there are links between the new pages and the rest of the web

Well, how about this?

MapReduces must be run again over the entire repository

Page 13: Large-scale Incremental Processing Using Distributed Transactions and Notifications

13

Introduction Google’s web search index was produced in this way
– Running over the entire repository of pages

It was not a critical issue
– Because, given enough computing resources, MapReduce’s scalability makes this approach feasible

However, reprocessing the entire web
– Discards the work done in earlier runs
– Makes latency proportional to the size of the repository, rather than the size of an update

Page 14: Large-scale Incremental Processing Using Distributed Transactions and Notifications

14

Introduction An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing

Incremental processing system: Percolator

Page 15: Large-scale Incremental Processing Using Distributed Transactions and Notifications

15

Outline
Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Page 16: Large-scale Incremental Processing Using Distributed Transactions and Notifications

16

Design Percolator is built on top of the Bigtable distributed storage system

A Percolator system consists of three binaries that run on every machine in the cluster
– A Percolator worker
– A Bigtable tablet server
– A GFS chunkserver

All observers (user applications) are linked into the Percolator worker

Page 17: Large-scale Incremental Processing Using Distributed Transactions and Notifications

17

Design Dependencies

[Diagram: dependency stack, Observers → Percolator worker → Bigtable tablet server → GFS chunkserver]

Page 18: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Design System architecture

18

[Diagram: Node 1, Node 2, ... each run Observers, a Percolator worker, a Bigtable tablet server, and a GFS chunkserver; the cluster also relies on a timestamp oracle service and a lightweight lock service]

Page 19–22: Large-scale Incremental Processing Using Distributed Transactions and Notifications

19–22

Design The Percolator worker
– Scans the Bigtable for changed columns
– Invokes the corresponding observers as a function call in the worker process

The observers
– Perform transactions by sending read/write RPCs to Bigtable tablet servers

[Diagram, built up across these slides: Observers run inside the Percolator worker, above the Bigtable tablet server and GFS chunkserver; 1: scan (worker reads Bigtable), 2: invoke (worker calls observers), 3: RPC (observers read/write Bigtable tablet servers)]
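A heavily simplified sketch of that scan-and-invoke loop; bigtable.scan_notify, observers_for, and the regions list are hypothetical stand-ins for the real interfaces, which batch scans, issue RPCs, and handle failures:

import random
import time

def worker_loop(bigtable, observers_for, regions):
    # 1: scan a randomly chosen region for cells whose notify column is set
    # 2: invoke the registered observers as ordinary function calls
    # 3: the observers themselves read and write Bigtable via RPCs
    while True:
        region = random.choice(regions)
        for row, column in bigtable.scan_notify(region):
            for observer in observers_for(column):
                observer(row, column)
        time.sleep(0.1)  # brief pause between scan passes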

Page 23: Large-scale Incremental Processing Using Distributed Transactions and Notifications

23

Design The timestamp oracle service
– Provides strictly increasing timestamps
  A property required for correct operation of the snapshot isolation protocol

The lightweight lock service
– Workers use it to make the search for dirty notifications more efficient

[Diagram: timestamp oracle service and lightweight lock service]
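A minimal sketch of a strictly increasing timestamp source (a single in-process counter; the real oracle is a separate service that hands out timestamps in batches and persists its high-water mark, both omitted here):

import threading

class TimestampOracle:
    # Single-process sketch: a locked counter guarantees strictly increasing values.
    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def get_timestamp(self):
        with self._lock:
            self._last += 1
            return self._last

oracle = TimestampOracle()
start_ts = oracle.get_timestamp()   # snapshot (start) timestamp
commit_ts = oracle.get_timestamp()  # commit timestamp, strictly greater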

Page 24: Large-scale Incremental Processing Using Distributed Transactions and Notifications

24

Design Percolator provides two main abstractions

– Transactions: cross-row, cross-table, with ACID snapshot-isolation semantics
– Observers: similar to database triggers or events

[Diagram: Transactions + Observers = Percolator]

Page 25: Large-scale Incremental Processing Using Distributed Transactions and Notifications

25

Design – Bigtable overview Percolator is built on top of the Bigtable distributed storage system

Bigtable presents a multi-dimensional sorted map to users
– Keys are (row, column, timestamp) tuples

Bigtable provides lookup, update operations, and transactions on individual rows

Bigtable does not provide multi-row transactions

[Diagram: Observers → Percolator worker → Bigtable tablet server → GFS chunkserver]
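As a mental model (not the Bigtable API), the table is a sorted map from (row, column, timestamp) to values, and atomicity stops at row boundaries; a toy in-memory sketch:

import bisect

class ToyBigtable:
    # Toy model: (row, column) -> sorted list of (timestamp, value) versions.
    # Real Bigtable is distributed, persistent, and offers only single-row transactions.
    def __init__(self):
        self.cells = {}

    def write(self, row, column, timestamp, value):
        bisect.insort(self.cells.setdefault((row, column), []), (timestamp, value))

    def read(self, row, column, max_timestamp):
        # Latest version at or before max_timestamp, or None.
        latest = None
        for ts, value in self.cells.get((row, column), []):
            if ts <= max_timestamp:
                latest = value
            else:
                break
        return latest

t = ToyBigtable()
t.write("com.example/index.html", "contents:", 5, "<html>v1</html>")
t.write("com.example/index.html", "contents:", 9, "<html>v2</html>")
print(t.read("com.example/index.html", "contents:", 7))  # -> <html>v1</html>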

Page 26: Large-scale Incremental Processing Using Distributed Transactions and Notifications

26

Design – Transactions Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics

Page 27: Large-scale Incremental Processing Using Distributed Transactions and Notifications

27

Design – Transactions Percolator stores multiple versions of each data item using Bigtable’s timestamp dimension
– Multiple versions are required to provide snapshot isolation

[Figure: snapshot isolation, illustrated with three numbered transactions]
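The core rule of snapshot isolation, sketched independently of Percolator's lock-based implementation (which the following slides build up to): reads see the database as of the transaction's start timestamp, and a commit must abort if another transaction committed a write to the same cell after that snapshot.

def can_commit(write_set, committed_writes, start_ts):
    # committed_writes: cell -> list of commit timestamps already applied.
    # Abort on a write-write conflict: some cell in our write set was committed
    # by another transaction after our snapshot was taken.
    for cell in write_set:
        if any(ts > start_ts for ts in committed_writes.get(cell, [])):
            return False
    return True

committed_writes = {("row1", "col"): [4, 8]}
print(can_commit({("row1", "col")}, committed_writes, start_ts=9))  # True
print(can_commit({("row1", "col")}, committed_writes, start_ts=6))  # False: commit at 8 > 6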

Page 28–33: Large-scale Incremental Processing Using Distributed Transactions and Notifications

28–33

Design – Transactions Case 1: use exclusive locks

[Animation: step-by-step execution of two transactions (1 and 2) under exclusive locks]

Page 34–40: Large-scale Incremental Processing Using Distributed Transactions and Notifications

34–40

Design – Transactions Case 2: do not use any locks

[Animation: step-by-step execution of two transactions (1 and 2) without any locks]

Page 41–51: Large-scale Incremental Processing Using Distributed Transactions and Notifications

41–51

Design – Transactions Case 3: use multiple versions & timestamps

[Animation: step-by-step execution of two transactions (1 and 2) using multiple versions and timestamps]

Page 52: Large-scale Incremental Processing Using Distributed Transactions and Notifications

52

Design – Transactions Percolator stores its locks in special in-memory columns in the same Bigtable
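A condensed sketch of how those columns are used, loosely following the paper's protocol: each cell has a data column (values keyed by the start timestamp), a lock column (set during prewrite and naming the transaction's primary cell), and a write column (set at commit and pointing readers at the data timestamp). Plain dicts stand in for Bigtable here, and failure recovery, lock cleanup, and waiting on pending locks are omitted.

data, lock, write = {}, {}, {}  # (row, col, ts) -> value / primary cell / data timestamp

def prewrite(writes, start_ts, primary):
    # Phase 1: abort on conflicts, then stage data and take locks at start_ts.
    for (row, col), value in writes.items():
        if any(r == row and c == col and ts >= start_ts for (r, c, ts) in write):
            return False  # a newer committed write exists
        if any(r == row and c == col for (r, c, ts) in lock):
            return False  # another transaction holds a lock on this cell
        data[(row, col, start_ts)] = value
        lock[(row, col, start_ts)] = primary
    return True

def commit(writes, start_ts, commit_ts):
    # Phase 2: replace each lock with a write record pointing at the data version.
    for (row, col) in writes:
        write[(row, col, commit_ts)] = start_ts
        del lock[(row, col, start_ts)]

def read(row, col, snapshot_ts):
    # Follow the latest write record visible at snapshot_ts to the data version.
    visible = [(ts, data_ts) for (r, c, ts), data_ts in write.items()
               if r == row and c == col and ts <= snapshot_ts]
    if not visible:
        return None
    _, data_ts = max(visible)
    return data[(row, col, data_ts)]

writes = {("Bob", "bal:"): 3, ("Joe", "bal:"): 9}
if prewrite(writes, start_ts=7, primary=("Bob", "bal:")):
    commit(writes, start_ts=7, commit_ts=8)
print(read("Bob", "bal:", snapshot_ts=9))  # -> 3

The primary cell is what makes this two-phase commit recoverable in the paper: a worker that finds a stale lock inspects the primary to decide whether to roll the transaction's writes forward or back.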

Page 53–57: Large-scale Incremental Processing Using Distributed Transactions and Notifications

53–57

Design – Transactions Percolator transaction demo

[Demo: step-by-step figures of a Percolator transaction]

Page 58: Large-scale Incremental Processing Using Distributed Transactions and Notifications

58

Design – Notifications In Percolator, the user writes code (“observers”) to be triggered by changes to the table

Each observer registers a function and a set of columns

Percolator invokes the functions after data is written to one of those columns in any row

[Diagram: Observers → Percolator worker → Bigtable tablet server → GFS chunkserver; 1: scan, 2: invoke, 3: RPC]
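What registration might look like in a sketch; register_observer and the column strings are hypothetical, not Percolator's actual (C++) API:

OBSERVERS = {}  # observed column -> list of observer functions

def register_observer(column, fn):
    OBSERVERS.setdefault(column, []).append(fn)

def document_processor(row, column, value):
    # Example observer: parse a newly crawled page, then write results to
    # another observed column so a downstream observer is triggered in turn.
    print(f"processing {row} ({len(value)} bytes)")

register_observer("raw:document", document_processor)

# After a transaction writes "raw:document" in some row, the worker invokes:
for fn in OBSERVERS.get("raw:document", []):
    fn("com.example/index.html", "raw:document", "<html>...</html>")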

Page 59: Large-scale Incremental Processing Using Distributed Transactions and Notifications

59

A Percolator application

Design – Notifications Percolator applications are structured as a series of observers
– Each observer completes a task and creates more work for “downstream” observers by writing to the table

[Diagram: a Percolator application as a set of observers (Observer 1 through Observer 6) chained by writes to the table]

Page 60: Large-scale Incremental Processing Using Distributed Transactions and Notifications

60

Google’s new indexing system

Design – Notifications

[Diagram: the new indexing pipeline as observers, Document Processor (parse, extract links, etc.) → Clustering → Exporter, running on the Observers / Percolator worker / Bigtable tablet server / GFS chunkserver stack (1: scan, 2: invoke, 3: RPC)]

Page 61: Large-scale Incremental Processing Using Distributed Transactions and Notifications

61

Design – Notifications To implement notifications, Percolator needs to efficiently find dirty cells with observers that need to be run

To identify dirty cells, Percolator maintains a special “notify” Bigtable column, containing an entry for each dirty cell
– When a transaction writes an observed cell, it also sets the corresponding notify cell
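Sketched with a hypothetical transaction object txn and illustrative column names, the write path simply dirties a second cell in the same transaction:

def write_observed_cell(txn, row, value):
    txn.set(row, "data:page", value)   # the observed data cell
    txn.set(row, "notify:page", "1")   # dirty marker; erased once observers have run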

Page 62: Large-scale Incremental Processing Using Distributed Transactions and Notifications

62

Design – Notifications Each Percolator worker chooses a portion of the table to scan by picking a region of the table randomly
– To avoid running observers on the same row concurrently, each worker acquires a lock from a lightweight lock service before scanning the row

[Diagram: timestamp oracle service and lightweight lock service]
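A sketch of that scanning policy; lock_service.try_lock and scan_notify are hypothetical interfaces standing in for the lightweight lock service and the notify-column scan:

import random

def scan_pass(regions, scan_notify, lock_service, run_observers):
    region = random.choice(regions)   # random region spreads workers across the table
    for row in scan_notify(region):   # rows whose notify column is set
        if not lock_service.try_lock(row):
            continue                  # another worker is already handling this row
        try:
            run_observers(row)
        finally:
            lock_service.unlock(row)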

Page 63: Large-scale Incremental Processing Using Distributed Transactions and Notifications

63

Outline
Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Page 64: Large-scale Incremental Processing Using Distributed Transactions and Notifications

64

Evaluation Experiences with converting a MapReduce-based indexing pipeline to use Percolator

Latency
– 100x faster than the previous system

Simplification
– The number of observers in the new system: 10
– The number of MapReduces in the previous system: 100

Easier to operate
– Far fewer moving parts: tablet servers, Percolator workers, chunkservers
– In the old system, each of a hundred different MapReduces needed to be individually configured and could independently fail

Page 65: Large-scale Incremental Processing Using Distributed Transactions and Notifications

65

Evaluation Crawl rate benchmark on 240 machines

Page 66: Large-scale Incremental Processing Using Distributed Transactions and Notifications

66

Evaluation Versus Bigtable

Page 67: Large-scale Incremental Processing Using Distributed Transactions and Notifications

67

Evaluation Fault-tolerance

Page 68: Large-scale Incremental Processing Using Distributed Transactions and Notifications

68

Outline
Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Page 69: Large-scale Incremental Processing Using Distributed Transactions and Notifications

69

Conclusion Percolator provides two main abstractions

– Transactions: cross-row, cross-table, with ACID snapshot-isolation semantics
– Observers: similar to database triggers or events

[Diagram: Transactions + Observers = Percolator]

Page 70: Large-scale Incremental Processing Using Distributed Transactions and Notifications

70

Outline
Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Page 71: Large-scale Incremental Processing Using Distributed Transactions and Notifications

71

Good and Not So Good Things Good things
– Simple and neat design
– The intended use case is clear
– Detailed description based on a real example: Google’s indexing system

Not so good things
– Lack of observer examples (for Google’s indexing system in particular)

Page 72: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Thank You! Any Questions or Comments?