Memory-optimized indexes: how they work – Couchbase Connect 2016

©2016 Couchbase Inc.

The Couchbase Connect16 mobile app: Take our in-app survey!

Memory-optimized index: how they work

Sarath Lakshman, Senior Software Engineer, Couchbase

Agenda

• Architecture of Global Secondary Index

• What exactly is a Memory-Optimized Index?

• Architecture of Nitro Storage Engine

• Scalability and Performance

• Operational aspects of Memory-Optimized Index

Global Secondary Index (GSI)

An architecture overview

GSI Overview

• Speed up your N1QL queries using fast indexes ordered by secondary JSON fields

• Workload isolation and independent scaling for document access/modifications and index operations

• Ensure read availability by creating replica indexes

• Global Indexes offer scalable performance, while local indexes degrade query performance as more nodes are added due to scatter/gather

• Asynchronously updated indexes with high throughput and low latency

Multi Dimensional Scaling (MDS)

• Indexes can scale independently from document data

• Workloads for different services are isolated

[Figure: six Couchbase Server nodes (Couchbase Server 1 to 6), each with a Cluster Manager, managed cache, and storage. The diagram labels Data Service nodes holding shards, a Query Service node, and Index Service nodes.]

Components of GSI

• Projector
  • Transforms document mutations into secondary index items and routes them to index nodes based on index definitions

• Indexer
  • Updates indexes corresponding to the document changes
  • Provides point-in-time index scan snapshots
  • Handles index DDLs

• GSI Client
  • Smart client that is aware of the global index topology
  • Helps N1QL interact with GSI indexes
  • Facilitates index scan operations
  • Manages scan connection pooling

Indexer update pipeline

[Figure: Index Service update pipeline. Stages shown: Index Port, Mutation Queue, Extraction Worker, Index Queue, Storage Updater Worker, ForestDB/Nitro.]

Update index ONLY IF key has changed

Document: {"LastName": "Adams", "Phone": "323-180-9978"}

Extracted index keys: {"LastName": "Adams"} and {"Phone": "323-180-9978"}
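The "update only if the key has changed" rule can be sketched in Go as follows. This is a minimal illustration, not the indexer's actual code: the function names, the flat `map[string]string` document model, and the changed phone number are all assumptions made for the example.

```go
package main

import "fmt"

// secondaryKey extracts the indexed field from a document.
// ok is false when the document does not contain the field.
// (Hypothetical helper; documents are modeled as flat string maps.)
func secondaryKey(doc map[string]string, field string) (key string, ok bool) {
	key, ok = doc[field]
	return key, ok
}

// needsIndexUpdate reports whether a mutation changes the key of the
// index defined on field. An index whose key is unchanged can skip the
// storage update entirely.
func needsIndexUpdate(oldDoc, newDoc map[string]string, field string) bool {
	oldKey, oldOK := secondaryKey(oldDoc, field)
	newKey, newOK := secondaryKey(newDoc, field)
	return oldOK != newOK || oldKey != newKey
}

func main() {
	before := map[string]string{"LastName": "Adams", "Phone": "323-180-9978"}
	// Hypothetical mutation: only the phone number changes.
	after := map[string]string{"LastName": "Adams", "Phone": "323-180-0000"}

	for _, field := range []string{"LastName", "Phone"} {
		fmt.Printf("index on %s needs update: %v\n", field, needsIndexUpdate(before, after, field))
	}
}
```

In this example only the index on Phone would be touched; the index on LastName is left alone.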

Memory Optimized Index

An introduction

Why Memory-Optimized Index?

• Performance!

• Server hardware is constantly evolving, with many CPU cores and large amounts of DRAM

• Single-index write performance matters, as the index has to keep up with the rate of document mutations sent by many data service nodes

• The data service offers very high write performance, demanding fast index updates

• Indexes hold a small subset of the document data (e.g., a secondary field). Hence, it is possible to hold indexes completely in memory

• Disk-oriented storage engines, such as the one used by Standard Index, are optimized for faster disk access with a paging mechanism, assuming that the entire dataset cannot fit in memory

• Providing more DRAM and CPU cores will not speed up single-index performance with standard indexes

What exactly is a Memory-Optimized Index?

• Memory-resident index

• Throw more DRAM and CPU cores at it: can I scale single-index performance linearly? Yes

• Designed for high performance and multicore scalability

• Fast writes and low-latency index scans

• Architecturally very different from disk-oriented storage engines

• Fast backup and recovery on disk/SSD

• Supports index snapshots at 20ms latency (200ms for standard index)

• Avoids the need to partition an index to scale throughput

• Written in Golang/C

• Every component of the index storage engine can scale seamlessly with many CPU cores

Nitro Storage Engine

The storage engine that powers Memory Optimized Indexes (MOI)

A VLDB 2016 paper (Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index)

Design considerations

• Multiple writers for high performance
  • Utilize the inherent parallelism in the Database Change Protocol (DCP)
  • Scalable single-index write performance by using available CPU cores

• Lock-free data structures for high concurrency
  • Writers and readers never block
  • Maximize utilization of multicore CPUs

• Fast snapshots
  • Minimize latency for index queries and reduce staleness of the index
  • Create read snapshots at the rate of 100/second

• Leverage optimizations for memory-resident data structures

Nitro Architecture

• Create backups from snapshots and recover Nitro after a restart or crash

• Free items once they are GCed and no longer referenced

• Remove items from the skiplist that belong to unused snapshots

• Create point-in-time immutable snapshots for index scans

• Avoid phantoms and provide scan stability

• Manage index snapshot versions in use

• Implements Insert, Delete, Lookup, Range Iteration

• Concurrent partitioned visitors

• Concurrent bottom-up skiplist build

Skiplist

• Probabilistic balanced ordered search data structure

• Search is similar to binary search over linked lists (O(log n))

• Item-granular operations, unlike B+Tree (page-oriented)

• Lock-free skiplist is implemented using atomic compare-and-swap and atomic add-and-fetch

Lock-free data structure fundamentals

Lock-free data structure fundamentals

Step 1: Mark as deleted

Step 2: Removal

Multi Versions Management (MVCC)

• Define lifetime metadata in each skiplist node (i.e., bornSn and deadSn)

• Create Snapshot 1

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=0)
V=30 (bornSn=1, deadSn=0)

• Create Snapshot 2

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=0)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=0)

Multi Versions Management (MVCC)

• Create Snapshot 3

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)

Multi Versions Management (MVCC)

• Index scan for Snapshot 1

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)

Visibility: Iterator (Sn=1)

Multi Versions Management (MVCC)

• Index scan for Snapshot 2

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)

Visibility: Iterator (Sn=2)

Multi Versions Management (MVCC)

• Index scan for Snapshot 3

V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)

Visibility: Iterator (Sn=3)

Nitro MVCC vs Copy-On-Write B+Tree MVCC

• A single-item update to a leaf node performs copy-on-write of the entire block (e.g., 4 KB)

• Since a B+Tree has a hierarchical structure, this also results in copy-on-write of all parent blocks recursively up to the root block, causing significant storage overhead (the wandering tree problem)

• Write-optimized storage engines try to amortize this cost by batching updates

• Large batch sizes cause larger snapshot intervals

• Nitro has fixed storage overhead per item

• Snapshotting is a lightweight operation

Garbage Collection

Items:

V=1 (bornSn=1, deadSn=2)
V=2 (bornSn=2, deadSn=0)
V=3 (bornSn=1, deadSn=0)
V=4 (bornSn=1, deadSn=2)
V=5 (bornSn=2, deadSn=3)
V=6 (bornSn=3, deadSn=0)
V=7 (bornSn=4, deadSn=0)
V=8 (bornSn=1, deadSn=0)
V=9 (bornSn=3, deadSn=0)
V=10 (bornSn=3, deadSn=4)

Snapshot List: Sn=1 (rfcnt=0), Sn=2 (rfcnt=1), Sn=3 (rfcnt=0), Sn=4 (rfcnt=2)

[Figure labels: Concurrent GC, Concurrent SMR, Garbage Collection (V=1)]

Safe Memory Reclamation

• Early and alive accessors can potentially hold references to GCed items

• Freeing GCed items/nodes can cause dangling references

• The memory reclaimer has to make sure that no accessor is holding a reference to GCed items

• This problem does not occur with garbage-collected languages

• A lock-free SMR algorithm takes care of safely freeing resources

• Details of the SMR algorithm are available in the Nitro VLDB16 paper

Nitro Backup

[Figure: non-intrusive backup. Labels: Backup worker-1, Backup worker-2, Backup worker-3; File-1, File-2, File-3; GC; Delta files.]

Nitro Recovery

• Concurrent bottom-up skiplist build

• Avoids unnecessary CAS conflicts during concurrent insert

• Snapshot number starts from Sn=1

• Once build is complete, additional items are inserted by replaying inserts from delta files concurrently

[Figure: File-1, File-2, File-3]

Benefits of Nitro

• Lock-free operations allow the storage engine to scale seamlessly with multicore CPUs

• Single index performance can be scaled by assigning more update workers

• The Nitro MVCC model provides fixed storage overhead per update/insert operation

• Fast snapshotting capability allows very low indexing latency between Data service and Index service

• Nitro provides a scalable lock-free garbage collector and safe memory reclaimer

• Nitro features a scalable online concurrent backup and fast recovery mechanism

Nitro GSI Integration

GSI Data Structures

The storage engine needs to maintain two storage structures:

• Reverse map: used to look up and remove the previous index entry for the docid during an update

• Index store: maintains the ordered index entries used by index scans

Memory Optimized Index update pipeline

• Scalable write performance using multiple writers

• Simple hash table used for the reverse map instead of Nitro (avoids concurrency overheads)

• Periodic backup persists only (indexItem, docid)

• The reverse map can be reconstructed on the fly during recovery

• End-to-end indexing latency ~20ms

[Figure: mutations are assigned by hash(docid) % n to writer-1, writer-2, ..., writer-n. Each writer has its own hash table (HT), and all writers update the Nitro index, which serves index scans.]

Storage Optimizations

[Figure: the hash table (HT) points directly into the Nitro index.]

DocID     Indexed Item
emp_005   MountainView
emp_008   Sunnyvale

Index Entry
MountainView:emp_005
Sunnyvale:emp_008

Hash table columns: CRC32 Hash, Node Pointers (rows: hash1, hash2)

• Direct pointers from hash table to index entry

• Storage needed for index maintenance reduced by ~50%

• Index item delete cost reduced from O(log n) to O(1)

• Optimized multi-entry indexing from a single document

Performance & Scalability

Let us see the numbers!

Nitro performance

• Almost linear scaling of throughput with number of cores

[Figures: Insert benchmark; Lookup benchmark]

Nitro performance

• Partitioning is not required to scale single index performance

[Figures: Get with background Inserts; Throughput scalability with partitions]

Memory Optimized Index vs Standard Index – End-to-End

• 4 Data nodes, 1 Index node, 32 cores CPU (Intel(R) Xeon(R) E5-2630 v3 @ 2.40GHz)

• Index service node keeps up with mutations from 4 Data service nodes

Operation   Throughput
Insert      1,658,031
Update      822,680
Delete      1,578,316

GSI index server throughput (items/sec)

Single Index benchmark

MOI Write Throughput = 1.6M/s 800k/s

Nitro recovery performance

Memory-Optimized Index

Operational perspective

Operational Aspects

• Memory-Optimized Index can be configured using a cluster-wide setting

• What happens when an index node runs out of memory?

• What happens to the indexes once Couchbase Server is restarted?

• What is the recommended DRAM/CPU configuration for using MOI?

Summary

• Couchbase GSI allows data services and index services to scale independently, with workload isolation

• Couchbase 4.5 features Memory-Optimized Indexes, which can provide superior index performance by seamlessly scaling with many CPU cores and large amounts of DRAM

• Introduced the Nitro storage engine with the following features:
  • Multiple writers and lock-free operations
  • Fast snapshotting with lightweight MVCC and a concurrent garbage collector
  • Concurrent non-intrusive fast backup and restore

• Memory-Optimized Index leverages storage optimizations to reduce the memory consumption of the index and to generate compact file backups

• Showcased Nitro and Memory-Optimized Index end-to-end performance
  • It takes only a few minutes to build large indexes!

• For more details on Nitro, refer to the Nitro VLDB16 paper (http://www.vldb.org/pvldb/vol9/p1413-lakshman.pdf)

Thank You!

Share your opinion on Couchbase

1. Go here: http://gtnr.it/2eRxYWn

2. Create a profile

3. Provide feedback (~15 minutes)