Best Practices - Couchbase Indexing in Production: Couchbase Connect 2014


Abstract: Couchbase Views are a powerful feature for building real-time applications. However, indexing can be a fairly heavyweight operation on your Couchbase cluster. This session briefly introduces Couchbase Views, discusses document, database and View design best practices, and presents tips and tunables for running Views in production for a successful Couchbase deployment.


Best Practices

Couchbase Indexing in Production

David Maier | Senior Solutions Engineer, Couchbase

Agenda

• Introduction

• Document Modeling Basics

• Ways to query with Couchbase Server

• How Indexing works in Couchbase 3.x compared to 2.x

• Database Design Considerations for Views

• Configuration Settings and their Effects

• Resource Requirements

©2014 Couchbase, Inc. 2

Introduction

• Views are a powerful feature for real-time applications

• Indexing can be a fairly heavyweight operation


[Chart labels: 90%, Views/Queries, Key Access, 10%]

Document Modeling Basics

JSON Document Structure

• JavaScript Object Notation

• Metadata

• Document value
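As a rough illustration of the two parts of a document, the sketch below models metadata and document value as two separate Python structures. The metadata field names (`id`, `rev`, `expiration`, `flags`) follow what a View map function typically receives; the key `person::jane.doe` and all values are made up for illustration.

```python
import json

# Hypothetical metadata: the key and all values are illustrative only.
meta = {
    "id": "person::jane.doe",     # document key
    "rev": "1-0000000000000001",  # revision information (placeholder value)
    "expiration": 0,              # 0 means the document does not expire
    "flags": 0,                   # SDK-specific flags
}

# The document value itself is plain JSON.
value_json = '{"type": "person", "firstname": "Jane", "lastname": "Doe"}'
doc = json.loads(value_json)

print(meta["id"], "->", doc["firstname"], doc["lastname"])
```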


Normalized vs. Denormalized Data


• Normalized

  • Uses references for 1-many relationships

  • Reduces data duplicates

  • Smaller document size

• Denormalized

  • Uses nested data

  • Aggregate view of data

  • Allows atomic operations

  • No client side joins
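The trade-off above can be sketched with a plain dict standing in for a bucket. All keys and field names here are hypothetical; the point is only that the normalized form needs a client-side join (several Gets), while the denormalized form is a single document.

```python
# In-memory dict standing in for a bucket; keys and fields are hypothetical.
bucket = {
    # Normalized: the order only references its items (1-many via keys).
    "order::1": {"type": "order", "customer": "jane",
                 "item_ids": ["item::1", "item::2"]},
    "item::1": {"type": "item", "name": "keyboard", "price": 40},
    "item::2": {"type": "item", "name": "mouse", "price": 20},
    # Denormalized: the items are nested inside the order document.
    "order_full::1": {
        "type": "order",
        "customer": "jane",
        "items": [
            {"name": "keyboard", "price": 40},
            {"name": "mouse", "price": 20},
        ],
    },
}


def total_normalized(order_key):
    """Client-side join: one Get for the order plus one Get per item."""
    order = bucket[order_key]
    return sum(bucket[item_id]["price"] for item_id in order["item_ids"])


def total_denormalized(order_key):
    """Single Get: everything needed is inside one document."""
    return sum(item["price"] for item in bucket[order_key]["items"])


print(total_normalized("order::1"), total_denormalized("order_full::1"))
```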


Atomic Counters


• Similar to sequences / auto-incrementing columns from the relational world

• Initialize and then increment a counter value

• Use the counter value as part of a key
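The counter pattern can be sketched as follows. `CounterStore` is a toy in-memory stand-in, not a Couchbase SDK class, and the key names `user::count` and `user::<n>` are assumptions chosen for the example.

```python
import threading


class CounterStore:
    """Toy in-memory stand-in for a bucket with an atomic counter operation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1, initial=1):
        """Atomically create the counter with `initial` or add `delta` to it."""
        with self._lock:
            if key not in self._data:
                self._data[key] = initial
            else:
                self._data[key] += delta
            return self._data[key]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]


store = CounterStore()


def add_user(name):
    # Use the counter value as part of the new document's key.
    user_id = store.incr("user::count")
    key = f"user::{user_id}"
    store.set(key, {"type": "user", "name": name})
    return key


print(add_user("jane"), add_user("john"))
```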

Reference Documents for Lookups


• A second document that references the primary one
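A minimal sketch of a reference document, again with a dict in place of a bucket. The `email::<address>` and `user::<id>` key formats are assumptions for illustration.

```python
# Plain dict standing in for a bucket; key formats are hypothetical.
bucket = {}


def create_user(user_id, email, name):
    primary_key = f"user::{user_id}"
    bucket[primary_key] = {"type": "user", "email": email, "name": name}
    # Second document whose only job is to point at the primary one.
    bucket[f"email::{email}"] = primary_key
    return primary_key


def get_user_by_email(email):
    primary_key = bucket.get(f"email::{email}")  # step 1: Get the lookup document
    if primary_key is None:
        return None
    return bucket[primary_key]                   # step 2: Get the primary document


create_user(1, "jane@example.com", "Jane")
print(get_user_by_email("jane@example.com"))
```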

Ways to query with Couchbase Server

Retrieval via Key Patterns and Lookup Documents


• Via key pattern

  • ‘person::$firstname.$lastname’

• With lookup document

  • Just 2 steps to retrieve a user by email address

• Most efficient way

  • Direct access by key vs. B-tree traversal in a View
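The key pattern from the slide can be turned into a small helper. This is only a sketch: lowercasing the name parts is an assumption made here to get stable keys, and the dict again stands in for a bucket.

```python
from string import Template

# The pattern from the slide; string.Template uses the same $name syntax.
PERSON_KEY = Template("person::$firstname.$lastname")


def person_key(firstname, lastname):
    # Lowercasing is an assumption made for this example.
    return PERSON_KEY.substitute(firstname=firstname.lower(),
                                 lastname=lastname.lower())


bucket = {person_key("Jane", "Doe"): {"type": "person", "city": "Berlin"}}

# Direct access: the key is computed from known attributes, no index involved.
print(bucket[person_key("jane", "doe")])
```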


• Access multiple documents by using a counter value
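One way to read this: if keys are built from a counter, every key can be derived from the current counter value and fetched by key. The sketch below assumes a hypothetical counter document `comment::count` and keys of the form `comment::<n>`.

```python
# Dict standing in for a bucket; "comment::count" is a hypothetical counter key.
bucket = {
    "comment::count": 3,
    "comment::1": {"text": "first"},
    "comment::2": {"text": "second"},
    "comment::3": {"text": "third"},
}


def all_comment_keys():
    """Derive every key from the current counter value."""
    count = bucket["comment::count"]
    return [f"comment::{n}" for n in range(1, count + 1)]


def multi_get(keys):
    """Fetch several documents by key, skipping ones that no longer exist."""
    return {key: bucket[key] for key in keys if key in bucket}


print(multi_get(all_comment_keys()))
```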

Indexing and Querying via Views


• Organized in Design Documents

• Incremental Map-Reduce

• Spread load across nodes

• Each node indexes its own data

• Map: process, filter, map and emit a row

• Reduce: aggregate mapped data

  • Built in: _count, _sum, _stats
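The map and reduce steps can be modelled in a few lines. This is a toy simulation in Python rather than a real View (Views are written in JavaScript and run on the server); the documents are made up, and the three reduce helpers only imitate what the built-in `_count`, `_sum` and `_stats` compute.

```python
# Toy model of a View: a map step that emits rows and a reduce step over them.
docs = {
    "beer::1": {"type": "beer", "brewery": "a", "abv": 5.0},
    "beer::2": {"type": "beer", "brewery": "a", "abv": 7.0},
    "beer::3": {"type": "beer", "brewery": "b", "abv": 4.5},
    "brewery::a": {"type": "brewery", "name": "A"},
}


def map_fn(doc, doc_id):
    """Process, filter, map and emit rows as (key, value, doc id)."""
    if doc.get("type") == "beer" and "abv" in doc:
        yield (doc["brewery"], doc["abv"], doc_id)


def build_index(documents):
    rows = [row for doc_id, doc in documents.items() for row in map_fn(doc, doc_id)]
    return sorted(rows)  # rows are kept sorted by key


def reduce_count(values):
    return len(values)


def reduce_sum(values):
    return sum(values)


def reduce_stats(values):
    return {"sum": sum(values), "count": len(values), "min": min(values),
            "max": max(values), "sumsqr": sum(v * v for v in values)}


index = build_index(docs)
values = [value for _key, value, _doc_id in index]
print(reduce_count(values), reduce_sum(values), reduce_stats(values))
```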


• Multiple roles

  • A primary index provides access to all document IDs of a bucket

  • A secondary index is an alternative access path via a (compound) key attribute

  • A View provides an alternative view of your data


• Simple View Access

• Exact Match

• Range

• With Reduction

• With Grouping
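The access types above map onto View query parameters. The sketch below only builds the query URLs and does not contact a server. The host, bucket, design document and view names are hypothetical, and port 8092 assumes the default View port; `key`, `startkey`, `endkey`, `reduce` and `group` are the View query parameters being illustrated.

```python
import json
from urllib.parse import urlencode

# Hypothetical host, bucket, design document and view names.
BASE = "http://localhost:8092/users/_design/people/_view/by_lastname"


def view_url(**params):
    # Keys are JSON-encoded; boolean flags become lowercase true/false.
    encoded = {name: json.dumps(value) for name, value in params.items()}
    return f"{BASE}?{urlencode(encoded)}" if encoded else BASE


simple = view_url()                                # simple View access
exact_match = view_url(key="maier")                # exact match
key_range = view_url(startkey="a", endkey="m")     # range
with_reduction = view_url(reduce=True)             # with reduction
with_grouping = view_url(reduce=True, group=True)  # with grouping

for url in (simple, exact_match, key_range, with_reduction, with_grouping):
    print(url)
```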

Best Practices for Selection, Projection and Aggregation


• Try to avoid computing too many things in a View

• Check for attribute existence

• Select (filter) data to avoid unnecessary entries in the View

• Use document types to make Views more selective

• Project (map) only necessary data and emit it as value

• Do not emit the full document

• If possible, emit a null value and do an additional Get to retrieve the whole document

• Use the built-in reduce functions if possible
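A map function that follows these practices might look like the sketch below. It is written in Python as a simulation (real map functions are JavaScript), and the documents, the `type` field and the `email` attribute are assumptions for the example.

```python
docs = {
    "person::1": {"type": "person", "email": "jane@example.com", "bio": "long text"},
    "person::2": {"type": "person"},                          # no email attribute
    "order::1": {"type": "order", "email": "x@example.com"},  # wrong type
}


def map_person_by_email(doc, emit):
    # Use the document type to keep the View selective.
    if doc.get("type") != "person":
        return
    # Check that the attribute exists before using it.
    if "email" not in doc:
        return
    # Emit only the key with a null value, not the full document.
    emit(doc["email"], None)


def run_map(documents):
    rows = []
    for doc_id, doc in documents.items():
        def emit(key, value, _doc_id=doc_id):
            rows.append({"id": _doc_id, "key": key, "value": value})
        map_person_by_email(doc, emit)
    return sorted(rows, key=lambda row: row["key"])


rows = run_map(docs)
# Additional Get by document id to retrieve the whole document.
full_doc = docs[rows[0]["id"]]
print(rows, full_doc["bio"])
```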


How Indexing works in Couchbase 3.x compared to 2.x

2.x Architecture


3.x Architecture


The Semantics of ‘stale = false’


• ‘stale = false’

  • The default for ‘stale’ is ‘update_after’

  • Used to enforce an index update at query time

  • Adds latency if used with every query

• 2.x

  • Data was eventually indexed, so the result was eventually consistent

  • Only data that had already hit the disk was indexed

• 3.x

  • Data is indexed from memory, so ‘stale = false’ works as semantically expected
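The difference between the `stale` values can be illustrated with a toy index. `ToyView` is a made-up class, not Couchbase code; in particular, a real `update_after` triggers the index update asynchronously, whereas this sketch simply runs it after computing the result.

```python
class ToyView:
    """Toy index over a dict, refreshed only when update() is called."""

    def __init__(self, data):
        self.data = data            # stands in for the bucket
        self.index = sorted(data)   # indexed keys as of the last update

    def update(self):
        self.index = sorted(self.data)

    def query(self, stale="update_after"):
        if stale not in ("false", "ok", "update_after"):
            raise ValueError(f"unknown stale value: {stale}")
        if stale == "false":
            self.update()               # index first, then answer
            return list(self.index)
        result = list(self.index)       # answer from the existing index
        if stale == "update_after":
            self.update()               # refresh after the answer is computed
        return result                   # "ok": no update is triggered


data = {"a": 1}
view = ToyView(data)
data["b"] = 2
print(view.query(stale="ok"))            # index not refreshed yet
print(view.query(stale="update_after"))  # old result, refresh follows
data["c"] = 3
print(view.query(stale="false"))         # refreshed before answering
```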

Database Design Considerations for Views

Number of Design Documents by Bucket


• Indexers are allocated per Design Document

  • Affects the number of CPUs used in parallel

• Bad cases

  • One Design Document contains all Views

    • All Views are updated at the same time

    • A lot to do for the Indexer

  • One View per Design Document

    • Resource intensive because there is one Indexer per View

• Strike a good balance in the number of Views per Design Document!
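As one possible layout, related Views can be grouped so that neither extreme applies. The sketch below only builds the design document JSON. The design document names, view names and map functions are hypothetical; the `views` / `map` / `reduce` structure is the usual design document shape.

```python
import json

# Hypothetical design documents; names and map functions are illustrative.
design_docs = {
    "_design/people": {
        "views": {
            "by_email": {
                "map": "function (doc, meta) { if (doc.type == 'person' && doc.email) { emit(doc.email, null); } }"
            },
            "by_lastname": {
                "map": "function (doc, meta) { if (doc.type == 'person' && doc.lastname) { emit(doc.lastname, null); } }"
            },
        }
    },
    "_design/orders": {
        "views": {
            "total_by_customer": {
                "map": "function (doc, meta) { if (doc.type == 'order' && doc.total) { emit(doc.customer, doc.total); } }",
                "reduce": "_sum",
            }
        }
    },
}

# Views that are updated together share a design document (and an indexer).
for name, ddoc in design_docs.items():
    print(name, "->", sorted(ddoc["views"]))

print(json.dumps(design_docs["_design/orders"], indent=2))
```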

Separate buckets for Indexing / Querying


• Creating a View for the entire bucket is heavyweight

• The View function is executed for every Set operation

• Store the data that should be queried by Views in a separate bucket

• But don't create too many buckets!

• Each bucket adds cluster management overhead

XDCR: A separate cluster for Indexing / Querying


• Use a separate cluster for Indexing and Querying to keep the load off the main one

• Reporting cluster vs. operational one

• Active-Passive XDCR

Configuration Settings and their Effects

Indexing Settings


• Index path

Use separate disks for the data and the indexes to improve I/O performance


• Indexing interval

Controls how up to date the index is by default

• ‘stale = false’ as explained before


• Maximum number of Indexers working in parallel

Increasing the number of threads per node means a higher level of concurrency, but also higher disk and CPU load

Rebalance Settings


• Index aware rebalance

• By default indexing happens as part of the rebalance operation

• Ensures that you get query results from a new node during rebalance that are consistent with the query results you would have received from the node before rebalance started

Enabling it has a performance impact: rebalance takes significantly more time


• Rebalance before compaction

• Default is 16, which means that 16 vBuckets are moved before rebalance is paused

A higher value may increase rebalance performance because it implicitly raises the rebalance priority


• Rebalance moves per node

• The default is 1

The number of vBuckets moved at a time during the rebalance operation

Compaction Settings


• (Auto) Compaction

• Necessary because append only structures are used

• In-place updates are expensive

• Removes tombstone objects and fragmentation

• Process Database and View compaction in parallel

Implies heavier processing and disk I/O load during the compaction process
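Why append-only storage needs compaction can be shown with a toy log. `AppendOnlyStore` is a made-up class for illustration; it follows the slide in dropping tombstones and stale versions during compaction and ignores details of the real storage engine.

```python
class AppendOnlyStore:
    """Toy append-only file: every write appends, nothing changes in place."""

    TOMBSTONE = object()

    def __init__(self):
        self.log = []  # (key, value) records in write order

    def set(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        self.log.append((key, self.TOMBSTONE))  # a delete is also an append

    def _live(self):
        latest = {}
        for key, value in self.log:  # later records overwrite earlier ones
            latest[key] = value
        return {k: v for k, v in latest.items() if v is not self.TOMBSTONE}

    def get(self, key):
        return self._live().get(key)

    def fragmentation(self):
        """Share of records that are stale versions or tombstones."""
        if not self.log:
            return 0.0
        return 1 - len(self._live()) / len(self.log)

    def compact(self):
        """Rewrite the log with only the latest live value per key."""
        self.log = list(self._live().items())


store = AppendOnlyStore()
store.set("a", 1)
store.set("a", 2)   # the old version of "a" stays in the log
store.set("b", 1)
store.delete("b")   # tombstone appended, old "b" stays in the log
before = store.fragmentation()
store.compact()
after = store.fragmentation()
print(before, after, store.log)
```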


Resource Requirements


• More CPU cores are recommended

• Configure your OS file system buffer!

• Use SSDs for Views!

Factors that drive CPU and disk (size, I/O) requirements:

• Number of Views per Design Document

• Number of emitted items

• Compaction

• Complexity of Map/Reduce functions

• Size of the emitted value

[Chart scales: 0 to 200 ms; 0 to 5000 q/s]

Questions?

david.maier@couchbase.com