Chimera: Data Sharing Flexibility, Shared Nothing Simplicity Umar Farooq Minhas University of Waterloo David Lomet, Chandu Thekkath Microsoft Research

Page 1: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Umar Farooq Minhas, University of Waterloo

David Lomet, Chandu Thekkath, Microsoft Research

Page 2: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Umar Farooq Minhas IDEAS 2011 2

Distributed database architectures

• In a shared nothing system a single node can only access local data
  – less complex, easier to implement
  – provides good performance if data is partitionable
  – e.g., Microsoft SQL Server, IBM DB2/UDB

• Data sharing allows multiple nodes to share access to common data
  – complex, difficult to implement
  – provides increased responsiveness to load imbalances
  – e.g., Oracle RAC, IBM Mainframe DB2

Goal: Design and implement a hybrid database system

Page 3: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Shared nothing vs data sharing

[Figure: two three-node clusters, each node with two CPUs and memory. Shared nothing: Node 1, Node 2, and Node 3 with three disks. Data sharing: Node 1, Node 2, and Node 3 reach three disks through a data sharing software layer.]

• Hardware configuration can be identical for both systems
• Software managing the system is different

Page 4: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Our approach

• Start with shared nothing cluster of low-cost desktop machines
  – each node hosts a standalone shared nothing DBMS with locally attached storage

• Extend shared nothing system with data sharing capability
  – a remote node can access a database hosted at a local node

• Additional code required for
  – distributed locking
  – cache consistency

Techniques presented here are applicable to any shared nothing DBMS

Page 5: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Outline

• Introduction

• Chimera: Overview

• Chimera: Implementation Details

• Experimental Evaluation

• Conclusion

Page 6: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Chimera: Best of both worlds

• Chimera is an “extension” to a shared nothing DBMS
  – built using off-the-shelf components

• Provides the simplicity of shared nothing, flexibility of data sharing

• Provides effective scalability and load balancing with less than 2% overhead

Page 7: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Chimera: Main components

1. Shared file system – to store data accessible to all nodes of a cluster
   – e.g., Common Internet File System (CIFS) or Network File System (NFS)

2. Generic distributed lock manager – provides ownership control
   – e.g., ZooKeeper, Chubby, Boxwood

3. Extra code in the shared nothing DBMS
   – for data access and sharing among nodes

Page 8: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Advantages of Chimera

• Load balancing at table granularity
  – offloads execution cost of database functionality

• Scale-out for read-mostly workloads
  – read-mostly workloads are very common and important
    • e.g., a service hosted at Microsoft, Yahoo, or Google
  – non-partitionable data is stored in a centralized database
  – Chimera provides effective scale-out for such workloads

• Close to shared nothing simplicity
  – key point: allow only a single node to update a database at a time
  – greatly simplifies data sharing, transaction log, and recovery

Page 9: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Outline

• Introduction

• Chimera: Overview

• Chimera: Implementation Details

• Experimental Evaluation

• Conclusion

Page 10: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Chimera: Overall system architecture

[Figure: a DB stored on CIFS and a GLM, shared by DBMS 1 (local), DBMS 2 (remote), ..., DBMS N (remote); each DBMS receives queries and contains an SP, an EBM, and an LC.]

SP – Stored Procedure
LC – Lock Client
EBM – Enhanced Buffer Manager
GLM – Global Lock Manager
CIFS – Common Internet File System

Page 11: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Stored Procedure

• Most of the required changes are implemented in a user-defined stored procedure
  – invoked like a standard stored procedure

• An instance of this stored procedure is installed at each node
  – accepts user queries
  – does appropriate locking and buffer management
  – executes the query against a local or remote table
  – returns the results to the caller

Page 12: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Enhanced Buffer Manager

• Implement a cross-node cache invalidation scheme
  ‒ maintain cache consistency across nodes

• Dirty pages need to be evicted from all readers after an update
  ‒ we do not know in advance which pages will get updated

• Selective cache invalidation
  ‒ updating node captures a list of dirty pages
  ‒ sends a message to all the readers to evict those pages
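
The selective invalidation idea above can be illustrated with a small Python sketch. This is not the prototype's code: the `BufferPool` class, the `update_and_invalidate` function, and the dictionary standing in for shared storage are all hypothetical names chosen for illustration, and a direct method call stands in for the message sent to each reader.

```python
class BufferPool:
    """Toy per-node page cache (hypothetical): page id -> page contents."""

    def __init__(self):
        self.pages = {}

    def read(self, page_id, disk):
        # Serve from the cache if present, otherwise fetch from shared storage.
        if page_id not in self.pages:
            self.pages[page_id] = disk[page_id]
        return self.pages[page_id]

    def evict(self, page_ids):
        # Drop only the named pages; everything else stays cached.
        for page_id in page_ids:
            self.pages.pop(page_id, None)


def update_and_invalidate(disk, changes, readers):
    """Apply changes on the updating node, then tell readers what to evict."""
    dirty = []
    for page_id, value in changes.items():
        disk[page_id] = value
        dirty.append(page_id)  # captured during the update, not known up front
    for reader in readers:
        reader.evict(dirty)    # stands in for the invalidation message
    return dirty


# A reader node caches three pages, then another node updates page 2.
disk = {1: "a", 2: "b", 3: "c"}
reader = BufferPool()
for pid in list(disk):
    reader.read(pid, disk)

dirty = update_and_invalidate(disk, {2: "B"}, [reader])
```

After the update, the reader still holds pages 1 and 3 and refetches only page 2 on its next access, which is the point of evicting selectively rather than flushing the whole cache.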

Page 13: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Global Lock Manager

• We need a richer lock manager that can handle locks on shared resources across machines
  ‒ implemented using an external global lock manager with corresponding local lock clients

• A lock client is integrated with each DBMS instance

• Lock types: Shared or Exclusive

• Lock resources: an abstract name (string)
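
A minimal sketch of such a lock table, assuming only what the slide states (two lock types, resources named by strings), might look as follows. The `GlobalLockManager` class, its non-blocking `try_acquire` method, and the resource name `Server1.Sales.Orders` are hypothetical and are not taken from the prototype or from any real lock service.

```python
import threading


class GlobalLockManager:
    """Toy lock table keyed by resource name; modes are 'S' or 'X'."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holders = {}  # resource name -> list of (client, mode)

    def _compatible(self, resource, mode):
        held = self._holders.get(resource, [])
        if not held:
            return True
        # Only shared locks can coexist on the same resource.
        return mode == "S" and all(m == "S" for _, m in held)

    def try_acquire(self, client, resource, mode):
        # Non-blocking for simplicity: returns False instead of waiting.
        with self._mutex:
            if not self._compatible(resource, mode):
                return False
            self._holders.setdefault(resource, []).append((client, mode))
            return True

    def release(self, client, resource):
        with self._mutex:
            held = self._holders.get(resource, [])
            self._holders[resource] = [(c, m) for c, m in held if c != client]


glm = GlobalLockManager()
table = "Server1.Sales.Orders"  # hypothetical resource name

r1 = glm.try_acquire("node2", table, "S")        # two readers can share
r2 = glm.try_acquire("node3", table, "S")
w_blocked = glm.try_acquire("node1", table, "X")  # writer must wait

glm.release("node2", table)
glm.release("node3", table)
w_granted = glm.try_acquire("node1", table, "X")
```

Because resources are plain strings, the same table can hold locks at database or table granularity without knowing what the names refer to.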

Page 14: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Read sequence

1. Acquire a shared lock on the abstract resource (table)
   • ServerName.DBName.TableName

2. On lock acquire, proceed with Select

3. Release the shared lock

Page 15: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Write sequence

1. Acquire an exclusive lock on
   – ServerName.DBName
   – ServerName.DBName.TableName

2. On lock acquire, proceed with the Update

3. Do selective cache invalidation on all reader nodes

4. Release the exclusive locks
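
The ordering of the read and write sequences can be traced with a short Python sketch that records each step instead of talking to a real lock service. Everything here is illustrative: the `locked`, `read_table`, and `write_table` helpers and the names `Server1`, `Sales`, `Orders`, `node2`, and `node3` are hypothetical, and releasing locks in reverse acquisition order is an assumption, since the slides do not specify a release order.

```python
from contextlib import contextmanager

log = []  # records the order of operations for illustration


@contextmanager
def locked(resources, mode):
    """Record lock acquisition in order and release in reverse (assumed)."""
    for name in resources:
        log.append(f"acquire {mode} {name}")
    try:
        yield
    finally:
        for name in reversed(resources):
            log.append(f"release {mode} {name}")


def read_table(server, db, table):
    # Read sequence: shared lock on the table, Select, release.
    with locked([f"{server}.{db}.{table}"], "S"):
        log.append("select")


def write_table(server, db, table, readers):
    # Write sequence: exclusive locks on the database and on the table,
    # Update, selective cache invalidation on readers, release.
    with locked([f"{server}.{db}", f"{server}.{db}.{table}"], "X"):
        log.append("update")
        for node in readers:
            log.append(f"invalidate {node}")


write_table("Server1", "Sales", "Orders", ["node2", "node3"])
write_log = list(log)

log.clear()
read_table("Server1", "Sales", "Orders")
read_log = list(log)
```

The point of the trace is that invalidation happens while the exclusive locks are still held, so no reader can obtain a shared lock and observe a stale page between the update and the eviction.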

Page 16: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Outline

• Introduction

• Chimera: Overview

• Chimera: Implementation Details

• Experimental Evaluation

• Conclusion

Page 17: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Experimental setup

• We use a 16 node cluster
  – 2x AMD Opteron CPU @ 2.0GHz
  – 8GB RAM
  – Windows Server 2008 Enterprise with SP2
  – patched Microsoft SQL Server 2008
    • buffer pool size = 1.5GB

• Benchmark
  – TPC-H: A decision support benchmark
  – scale factor 1
  – total size on disk ~3GB

Page 18: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Overhead of our prototype

• Run the 22 TPC-H queries on a single node with and without the prototype code

Avg Slowdown: 1.006 X

TPC-H Query   Runtime Without Prototype (ms)   Runtime With Prototype (ms)   Slowdown Factor
Q1            4809                             5120                          1.06
Q6            163                              171                           1.05
Q9            2258                             2303                          1.02
Q11           462                              431                           0.93
Q12           1131                             1247                          1.10
Q13           1349                             1345                          1.00
Q18           4197                             3895                          0.93
Q19           183                              185                           1.01
Q21           2655                             2673                          1.01
Q22           457                              485                           1.06
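
The slowdown factor in the table is the runtime with the prototype divided by the runtime without it. The short Python check below recomputes it from the ten rows shown; the variable names are illustrative. Note that the slide's average of 1.006 X covers all 22 queries, so it is not expected to match an average over these ten rows alone.

```python
# (query, runtime without prototype in ms, runtime with prototype in ms)
rows = [
    ("Q1", 4809, 5120),
    ("Q6", 163, 171),
    ("Q9", 2258, 2303),
    ("Q11", 462, 431),
    ("Q12", 1131, 1247),
    ("Q13", 1349, 1345),
    ("Q18", 4197, 3895),
    ("Q19", 183, 185),
    ("Q21", 2655, 2673),
    ("Q22", 457, 485),
]

# Slowdown factor = runtime with prototype / runtime without prototype.
slowdown = {
    query: round(with_ms / without_ms, 2)
    for query, without_ms, with_ms in rows
}

for query, factor in slowdown.items():
    print(f"{query}: {factor:.2f}")
```

A factor below 1.00, as for Q11 and Q18, means the run with the prototype code happened to finish faster than the run without it.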

Page 19: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Remote execution overhead (cold cache)

• Run the 22 TPC-H queries on the local node and remote node
  – measure the query run time and calculate the slowdown factor
    • flush DB cache between subsequent runs

Page 20: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Remote execution overhead (warm cache)

• Repeat the previous experiment with warm cache

Avg Slowdown (before): 1.46 X

Avg Slowdown (now): 1.03 X

Page 21: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Cost of updates

• Baseline: A simple update on a node with no readers

• Test Scenarios: Perform update while 1, 2, 4, or 8 other nodes read the database in an infinite loop

[Figure: bar chart of average runtime (secs) for Baseline, 1 Reader, 2 Readers, 4 Readers, and 8 Readers; series: Local, Remote. Data labels in source order: 1.14, 1.39, 1.41, 1.77, 2.06; 1.42, 1.90, 2.12, 2.16, 2.39.]

Page 22: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Cost of reads with updates

• Perform simple updates at local node with varying frequency: 60s, 30s, 15s, and 5s

• Run one of the TPC-H read queries at a remote node for a fixed duration of 300s and calculate
  – Response time: average runtime
  – Throughput: queries completed per second

Page 23: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Cost of reads with updates (1)

Query   Update Frequency (secs)   Average Runtime (secs)   Steady State Average (secs)   Queries/sec
Q6      60                        0.21                     0.20                          4.85
        30                        0.22                                                   4.54
        15                        0.23                                                   4.23
        5                         0.26                                                   3.77
Q13     60                        1.38                     1.43                          0.73
        30                        1.38                                                   0.73
        15                        1.39                                                   0.72
        5                         1.38                                                   0.73
Q20     60                        1.62                     1.62                          0.62
        30                        1.65                                                   0.60
        15                        1.99                                                   0.49
        5                         2.04                                                   0.49
Q21     60                        2.78                     2.73                          0.36
        30                        2.85                                                   0.35
        15                        3.01                                                   0.33
        5                         3.60                                                   0.28

Non-conflicting read

Page 24: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Scalability

• Run concurrent TPC-H streams
  – start with a single local node
  – incrementally add remote nodes up to a total of 16 nodes

Page 25: Chimera: Data Sharing Flexibility, Shared Nothing Simplicity

Conclusion

• Data-sharing systems are desirable for load-balancing

• We enable data-sharing as an extension to a shared nothing DBMS

• We presented design and implementation of Chimera
  ‒ enables data sharing at table granularity
  ‒ uses global locks for synchronization
  ‒ implements cross-node cache invalidation
  ‒ does not require extensive changes to shared nothing DBMS

• Chimera provides effective scalability and load balancing with less than 2% overhead