Sessions about to start – Get your rig on!

Sessions about to start – Get your rig on!. Chris J.T. Auld Intergen

Microsoft Azure DocumentDB Deep Dive
Chris J.T. Auld, Intergen

DPP405

Agenda

1) DocumentDB Refresher
2) CUs, RUs and Indexing
3) Polyglot Persistence and Data Modelling
4) Data Tier Programmability
5) Trading Off Consistency

A DocumentDB Primer
Just in case you missed the memo…

A fully-managed, highly-scalable, NoSQL document database service.
• Schema-free storage, indexing and query of JSON documents
• Transaction-aware, service-side programmability with JavaScript
• Write-optimized, SSD-backed and tuneable via indexing and consistency
• Built to be delivered as a service. Pay as you go. Achieve faster time to value.

DocumentDB in One Slide
• Simple HTTP RESTful model.
• Access can be via any client that supports HTTP. Libraries for: Node, .NET, Python, JS
• All resources are uniquely addressable by a URI.
• Partitioned for scale-out and replicated for HA. Tunable indexing & consistency
• Granular access control through item-level permissions
• Attachments stored in Azure Blobs and support document lifecycle.
• T-SQL-like programmability.
• Customers buy storage and throughput capacity on a per-account basis

https://myaccountname.documents.azure.net/dbs/{id}/colls/{id}/docs/{id}

Resource paths: /dbs/{id}, /colls/{id}, /docs/{id}, /attachments/{id}, /sprocs/{id}, /triggers/{id}, /functions/{id}, /users/{id}

• POST item resource to the tenant feed URI: create a new resource, or execute a sproc/trigger/query
• PUT item resource to the item URI: replace an existing resource
• DELETE item URI: delete an existing resource
• GET tenant feed or item URI: read/query an existing resource

POST http://myaccount.documents.azure.net/dbs
{ "id": "My Company Db" }
...
[201 Created]
{
  "id": "My Company Db",
  "_rid": "UoEi5w==",
  "_self": "dbs/UoEi5w==/",
  "_colls": "colls/",
  "_users": "users/"
}
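Because every resource is addressed by a URI composed from the id hierarchy, client code often just builds paths. A minimal JavaScript sketch: `resourceUri` is a hypothetical helper (not part of any DocumentDB SDK), the host is the slide's `myaccountname.documents.azure.net` placeholder, and URI-encoding the ids is a defensive choice of this sketch so values such as "My Company Db" stay a single path segment.

```javascript
// Hypothetical helper: composes an id-based resource URI from alternating
// (resource type, id) pairs, e.g. ["dbs", "db1", "colls", "c1"].
function resourceUri(accountHost, segments) {
    if (segments.length % 2 !== 0) {
        throw new Error("segments must be (type, id) pairs");
    }
    var path = "";
    for (var i = 0; i < segments.length; i += 2) {
        // Encode each id so spaces and reserved characters do not split the path
        path += "/" + segments[i] + "/" + encodeURIComponent(segments[i + 1]);
    }
    return "https://" + accountHost + path;
}

console.log(resourceUri("myaccountname.documents.azure.net",
    ["dbs", "My Company Db", "colls", "products", "docs", "BK-M18S"]));
// → https://myaccountname.documents.azure.net/dbs/My%20Company%20Db/colls/products/docs/BK-M18S
```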

Capacity Units
• Customers provision one or more Database Accounts
• A database account can be configured with one to five Capacity Units (CUs). Call for more.
• A CU is a reserved unit of storage (in GB) and throughput (in Request Units, RUs)
• Reserved storage is allocated automatically, subject to a minimum allocation per collection of 3.3 GB (1/3 of a CU) and a maximum amount stored per collection of 10 GB (1 whole CU)
• Reserved throughput is automatically made available, in equal amounts, to all collections within the account, subject to a min/max of 667 RUs (1/3 of a CU) and 2000 RUs (1 whole CU)
• Throughput consumption above the provisioned units is throttled
[Diagram: provisioned capacity units, throughput (RUs) against storage (GB)]
* All limits noted above are Preview limitations and subject to change.
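The per-collection throughput rule can be checked with a little arithmetic. This sketch assumes one reading of the slide: an account's reserved RUs (2000 per CU) are split equally across its collections, and each share must lie between 1/3 of a CU (the slide's "667 RUs", rounded) and one whole CU. The helper name and that interpretation are assumptions; the numbers are the preview limits quoted above.

```javascript
// Preview limits as quoted on the slide (subject to change).
var RU_PER_CU = 2000;                      // reserved throughput per Capacity Unit
var MIN_RU_PER_COLLECTION = RU_PER_CU / 3; // "667 RUs" on the slide, i.e. 1/3 of a CU
var MAX_RU_PER_COLLECTION = RU_PER_CU;     // 1 whole CU

// Hypothetical helper: the equal share of an account's reserved RUs that each
// collection receives, assuming shares must stay within the min/max above.
function throughputPerCollection(capacityUnits, collectionCount) {
    var share = (capacityUnits * RU_PER_CU) / collectionCount;
    if (share < MIN_RU_PER_COLLECTION) {
        throw new Error(collectionCount + " collections exceed " + capacityUnits + " CU(s)");
    }
    // A single collection cannot use more than one CU's worth of throughput.
    return Math.min(share, MAX_RU_PER_COLLECTION);
}

console.log(throughputPerCollection(1, 2)); // 1000
console.log(throughputPerCollection(2, 1)); // 2000 (capped at one CU)
```

Under this reading, one CU supports at most three collections, and a single collection never benefits from more than one CU of throughput.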

Request Units
• A CU includes the ability to execute up to 2000 Request Units per second
  • i.e. with 1 CU, peak throughput needs to stay below 2000 RUs/sec
• When reserved throughput is exceeded, any subsequent request will be pre-emptively ended
  • The server responds with HTTP status code 429
  • The server response includes an x-ms-retry-after-ms header indicating how long the client must wait before retrying
• The .NET client SDK implicitly catches this response, respects the retry-after header and retries the request (3x)
• You can set up alert rules in the Azure portal to be notified when requests are throttled
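Clients that do not use the .NET SDK have to handle throttling themselves. A minimal JavaScript sketch: the 429 status and the x-ms-retry-after-ms header come from the slide, the default of three retries mirrors the .NET SDK behaviour described above, and `sendRequest` is a hypothetical function (not a real SDK call) assumed to resolve to `{ status, headers }` with lower-cased header names.

```javascript
// Sleep helper used between retries.
function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Retries a throttled (HTTP 429) request, waiting for the server-suggested
// x-ms-retry-after-ms interval before each new attempt.
// `sendRequest` is hypothetical: an async function returning { status, headers }.
async function sendWithRetry(sendRequest, maxRetries) {
    if (maxRetries === undefined) maxRetries = 3;
    for (var attempt = 0; ; attempt++) {
        var response = await sendRequest();
        if (response.status !== 429 || attempt >= maxRetries) {
            return response; // success, a non-throttling error, or retries exhausted
        }
        var waitMs = Number(response.headers["x-ms-retry-after-ms"]) || 0;
        await delay(waitMs);
    }
}
```

Returning the final 429 rather than throwing leaves the decision (surface an error, queue the work, raise an alert) to the caller.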

Request Units
Database operation | Number of RUs | Ops/s per CU
Reading a single document by _self | 1 | 2000
Inserting/replacing/deleting a single document | 4 | 500
Querying a collection with a simple predicate and returning a single document | 2 | 1000
Stored procedure with 50 document inserts | 100 | 20
Rough estimates: document size is 1KB, consisting of 10 unique property values, with the default consistency level set to "Session" and all documents automatically indexed by DocumentDB. As long as the database stays the same, the RUs consumed should stay the same.
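The table's last column is simply 2000 RUs per CU divided by the per-operation cost, and the same arithmetic sizes a mixed workload. A sketch using the table's rough costs; the operation names and the workload in the usage line are invented for illustration, and real charges vary with document size, indexing and consistency, so they are worth measuring rather than hard-coding.

```javascript
var RU_PER_CU = 2000; // Request Units per second per Capacity Unit (preview figure)

// Rough per-operation costs from the table above (1KB docs, Session, auto-indexed).
var RU_COST = {
    readBySelf: 1,
    write: 4,              // insert, replace or delete of a single document
    simpleQuery: 2,
    sprocWith50Inserts: 100
};

// Operations per second a single CU can sustain for one operation type.
function opsPerSecondPerCu(ruCost) {
    return RU_PER_CU / ruCost;
}

// CUs needed for a workload given as { operationName: operationsPerSecond }.
function requiredCapacityUnits(workload) {
    var totalRuPerSecond = 0;
    Object.keys(workload).forEach(function (op) {
        if (!(op in RU_COST)) throw new Error("no cost estimate for " + op);
        totalRuPerSecond += RU_COST[op] * workload[op];
    });
    return Math.ceil(totalRuPerSecond / RU_PER_CU);
}

console.log(opsPerSecondPerCu(RU_COST.write)); // 500
// Illustrative workload: 1200 + 1200 + 400 = 2800 RU/s, so 2 CUs
console.log(requiredCapacityUnits({ readBySelf: 1200, write: 300, simpleQuery: 200 })); // 2
```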

Demo: Pulling RU Metrics

Cool Tool: DocumentDB Studio
Useful tool, with source, for sending queries to DocumentDB.

http://tiny.cc/docdbstudio

A True Schema-Free Database
If it looks like a duck, swims like a duck and quacks like a duck, then it probably isn’t a giraffe

LET’S CALL A SPADE A SPADE

Schema free does not mean “schema free… oh, except for the schema I implicitly defined via indexing”

Indexing in DocumentDB

• By default, everything is indexed
• Indexes are schema-free
• Indexing is not a B-Tree and works really well under write pressure and at scale.
• Out of the box. It just works.
• But… it cannot read your mind all of the time…

Tuning Indexes
• We can change the way that DocumentDB indexes
• We’re trading off:
  • Write performance: How long does it take? How many RUs does it use?
  • Read performance: How long does it take? How many RUs does it use? Which queries will need a scan?
  • Storage: How much space does the document + index require?
  • Complexity and flexibility: Moving away from the pure schema-free model

Index Policy and Mode
• Index Policy
  • Defines index rules for that collection
• Index mode
  • Consistent
  • Lazy
• Automatic
  • True: documents are automatically added (based on policy)
  • False: documents must be manually added via IndexingDirective on document PUT.
• Anything not indexed can only be retrieved via its _self link (GET)

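var collection = new DocumentCollection { Id = "myCollection" };
// Lazy: the index is updated asynchronously, after the write returns
collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
// Documents are not indexed unless they opt in via IndexingDirective
collection.IndexingPolicy.Automatic = false;
collection = await client.CreateDocumentCollectionAsync(databaseLink, collection);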

Index Paths & Index Types
• Include/Exclude Paths
  • Include a specific path
  • Exclude sub paths
  • Exclude a specific path
• Specify Index Type
  • Hash (default)
  • Range (default for _ts); not on strings
• Specify Precision
  • Byte precision (1-7)
  • Affects storage overhead

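// Hash-index every path from the root down
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath { IndexType = IndexType.Hash, Path = "/" });
// Range-index the timestamp so range comparisons can use the index (7 bytes of numeric precision)
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath { IndexType = IndexType.Range, Path = @"/""modifiedTimeStamp""/?", NumericPrecision = 7 });
// Exclude the long HTML property and everything beneath it
collection.IndexingPolicy.ExcludedPaths.Add("/\"longHTML\"/*");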

Demo: Indexing

Polyglotism
And you thought multi-headed monsters were only in your dreams…

IT’S LESS ABOUT BUILDING AND MORE ABOUT BOLTING

Most modern cloud based applications will use a number of different data stores

Worth Reading: NoSQL Distilled
By Pramod J. Sadalage and Martin Fowler (of ‘Refactoring’ and ‘Patterns of Enterprise Application Architecture’ fame and fortune)
Provides a good background on the characteristics of NoSQL-style data stores and strategies for combining multiple stores.
http://tiny.cc/fowler-pp

DocumentDB
[Diagram labels: fully featured RDBMS, transactional processing, rich query, managed as a service, elastic scale, internet accessible HTTP/REST, schema-free data model, arbitrary data formats]

Attachments
• Store large blobs/media outside core storage
• DocumentDB managed
  • Submit raw content in POST
  • DocumentDB stores it in Azure Blob storage (2GB today)
  • DocumentDB manages the lifecycle
• Self managed
  • Store content in the service of your choice
  • Create an attachment providing the URL to the content

Demo: Attachments

Demo

• Show managed attachment
  • Lifecycle follows the document

Storage Strategies

• Things to think about
  • How much storage do I use; where? $$$?
  • How is my data being indexed?
    • Entropy & precision
    • Will it ever be queried? Should I exclude it?
  • How many network calls to save & retrieve?
  • Complexity of implementation & management
  • Consistency. The polyglot isn’t consistent

Embed (De-Normalize) or Reference?
Embedded:
{
  "Products": [
    {
      "id": "BK-M18S",
      "ProductCode": "BK-M18S",
      "ProductType": "Mountain-500",
      "Manufacturer": { "Name": "Adventure Works", "Website": "www.adventureworks.com" }
    }
  ]
}
Referenced:
{
  "Products": [
    { "id": "BK-M18S", "ProductCode": "BK-M18S", "ProductType": "Mountain-500", "Manufacturer": "ADVWKS" }
  ],
  "Manufacturers": [
    { "id": "ADVWKS", "Name": "Adventure Works", "Website": "www.adventureworks.com" }
  ]
}

Embed (De-Normalize) or Reference?
• Embed
  • Well suited to containment
  • Typically bounded 1:Few
  • Slowly changing data
  • M:N requires management of duplicates
  • One call to read all data
  • Write call must write the whole document
• Reference
  • Think of this as 3NF
  • Provides M:N without duplicates
  • Allows unbounded 1:N
  • Multiple calls to read all data (hold that thought…)
  • Write call may write a single referenced document
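Moving between the two shapes shown on the previous slide is mechanical. A JavaScript sketch, assuming the in-memory layout from that slide (`Manufacturer` holds either an embedded object or a manufacturer id); `normalize`, `resolve` and the caller-supplied `idFor` scheme are hypothetical. `normalize` replaces each embedded manufacturer with a reference and de-duplicates them, and `resolve` performs the client-side join that the reference model needs on read.

```javascript
// Embedded -> referenced: store each manufacturer once and replace it with
// its id. `idFor` is a caller-supplied (hypothetical) id scheme.
function normalize(catalog, idFor) {
    var manufacturers = {};
    var products = catalog.Products.map(function (p) {
        var id = idFor(p.Manufacturer);
        manufacturers[id] = Object.assign({ id: id }, p.Manufacturer);
        return Object.assign({}, p, { Manufacturer: id }); // input is not mutated
    });
    return { Products: products, Manufacturers: Object.values(manufacturers) };
}

// Referenced -> embedded view: the "join" happens in the client, which in
// DocumentDB means extra reads for the referenced documents.
function resolve(catalog) {
    var byId = {};
    catalog.Manufacturers.forEach(function (m) { byId[m.id] = m; });
    return catalog.Products.map(function (p) {
        var m = byId[p.Manufacturer];
        return Object.assign({}, p, {
            Manufacturer: m ? { Name: m.Name, Website: m.Website } : null
        });
    });
}

var embedded = { Products: [{ id: "BK-M18S", ProductCode: "BK-M18S", ProductType: "Mountain-500",
    Manufacturer: { Name: "Adventure Works", Website: "www.adventureworks.com" } }] };
var referenced = normalize(embedded, function () { return "ADVWKS"; });
console.log(referenced.Products[0].Manufacturer); // ADVWKS
```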

How Do We Relate?
• ID or _self
  • A matter of taste.
  • _self will be more efficient (half as many RUs or better)
• Direction
  • Manufacturer > Product (1:N)
    • We have to update the manufacturer every time we add a new product
    • Products are unbounded
  • Product > Manufacturer (N:1)
    • We have to update the product if the manufacturer changes
    • Manufacturers per product are bounded (1)
  • Sometimes both make sense.
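The cost of each direction shows up as the number of documents written per change. A toy JavaScript sketch, assuming a hypothetical in-memory `store` keyed by id and the invented field names `manufacturerId` and `productIds`: with Manufacturer > Product, adding a product also rewrites the manufacturer document; with Product > Manufacturer, only the new product is written.

```javascript
// Returns the ids of the documents written to add `product` under the chosen
// reference direction. `store` is a hypothetical map of id -> document.
function addProduct(store, product, manufacturerId, direction) {
    if (direction === "product->manufacturer") {
        // N:1: the product carries a single, bounded reference.
        store[product.id] = Object.assign({}, product, { manufacturerId: manufacturerId });
        return [product.id];
    }
    if (direction === "manufacturer->product") {
        // 1:N: the manufacturer's unbounded list grows with every product.
        var m = store[manufacturerId];
        store[manufacturerId] = Object.assign({}, m, {
            productIds: (m.productIds || []).concat(product.id)
        });
        store[product.id] = Object.assign({}, product);
        return [product.id, manufacturerId];
    }
    throw new Error("unknown direction: " + direction);
}
```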

The Canonical Polyglot
[Diagram: an OnlineStore on an Azure Web Site, using Azure SQL Database, storage blob, storage table, DocumentDB and Azure Search]

A Product Catalog
• Product
  • Name (String 100)
  • SKU (String 100, YYYYCCCNNNNN, e.g. ‘2013MTB13435’)
  • Description (HTML up to 8kb)
  • Manufacturer (String 100)
  • Price (Amount + Currency)
  • Images (0-N images up to 100kb)
  • ProductSizes (0-N including a sort order)
  • Reviews (0-N reviews, Reviewer + up to 10kb text)
  • Attributes (0-N strongly typed complex details)
• Name
  • Probably want to search
  • Hash index is fine
  • May duplicate into Azure Search
• SKU
  • Probably a core lookup field. Needs a hash index.
  • How do we manage precision? We could store it reversed? We could store a duplicate reversed and include/exclude.
  • We might want to pull Year out into another field and range index it.
• Price
  • A sub-document within DocumentDB will allow multiple base currencies.
  • Probably doesn’t change much, so de-normalize the currency identifier
  • We probably want price in Search… but…
  • If we are providing localized prices then we have consistency issues; huge churn when we change exchange rates
• Images
  • Attachments
• Reviews
  • Do we embed these?
  • Do we reference? On product? On reviewer/user? Both?
  • Do we reference and embed? Say, embed the last 10?
  • Which direction does the reference go?
  • Almost certainly push to search.
• Attributes
  • How deep does the rabbit hole go?
• Description
  • Probably want to index in Azure Search
  • Do we ‘save space’ and push to an attachment?
  • Do we often retrieve Product without description?
  • We probably do want to exclude it from the index
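The SKU ideas (pull the year out for a range index, store a reversed copy) could be applied as an enrichment step before saving. A JavaScript sketch, assuming SKUs strictly follow YYYYCCCNNNNN with a 3-letter category; `enrichSku` and the `SkuYear`, `SkuCategory` and `SkuReversed` field names are hypothetical. Reversal follows the slide's suggestion, presumably so the most variable characters lead when index precision is limited.

```javascript
// Hypothetical enrichment step run before saving a product document.
// Assumes SKUs follow YYYYCCCNNNNN (4-digit year, 3-letter category, 5-digit number).
function enrichSku(product) {
    var sku = product.SKU;
    var match = /^(\d{4})([A-Z]{3})(\d{5})$/.exec(sku);
    if (!match) throw new Error("SKU does not match YYYYCCCNNNNN: " + sku);
    return Object.assign({}, product, {
        SkuYear: Number(match[1]),                     // numeric copy for a Range index
        SkuCategory: match[2],
        SkuReversed: sku.split("").reverse().join("")  // varying digits first
    });
}

console.log(enrichSku({ SKU: "2013MTB13435" }));
// SkuYear: 2013, SkuCategory: 'MTB', SkuReversed: '53431BTM3102'
```

The original SKU stays untouched, so the include/exclude choice between it and the reversed copy can be made in the indexing policy.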

Demo: Product JSON

SampleProductJSON.json

The Promise of Schema Free
• Fully indexed complex type structures
• Ability to define schema independent of data store
  • Reflect for editing and complex search filters
  • Create templates to produce HTML from JSON for editing and rendering

http://www.mchem.co.nz/msds/Tutti%20Frutti%20Disinfectant.pdf
http://www.toxinz.com/Demo
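The "reflect, then template" idea can be illustrated with a renderer that discovers properties at runtime instead of hard-coding a schema. A minimal JavaScript sketch; `renderJson` is hypothetical, and the `<dl>`/`<ol>` markup is an arbitrary stand-in for real editing and display templates.

```javascript
// Escapes text for safe inclusion in HTML.
function escapeHtml(s) {
    return String(s)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Walks any JSON value by reflection and renders it as nested <dl>/<ol>
// markup, so new properties appear without changing the template.
function renderJson(value) {
    if (Array.isArray(value)) {
        return "<ol>" + value.map(function (v) {
            return "<li>" + renderJson(v) + "</li>";
        }).join("") + "</ol>";
    }
    if (value !== null && typeof value === "object") {
        return "<dl>" + Object.keys(value).map(function (k) {
            return "<dt>" + escapeHtml(k) + "</dt><dd>" + renderJson(value[k]) + "</dd>";
        }).join("") + "</dl>";
    }
    return escapeHtml(value);
}

console.log(renderJson({ Name: "Mountain-500", Sizes: ["S", "M"] }));
// <dl><dt>Name</dt><dd>Mountain-500</dd><dt>Sizes</dt><dd><ol><li>S</li><li>M</li></ol></dd></dl>
```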

Data Tier Programmability
Because you should never discuss religion or politics at the dinner table…

Programmability in DocumentDB
• Familiar constructs
  • Stored procs, UDFs, triggers
• Transactional
  • Each call to the service is in an ACID txn
  • An uncaught exception rolls back the txn
• Sandboxed
  • No imports
  • No network calls
  • No eval()
  • Resource governed & time bound

var helloWorldStoredProc = {
    id: "helloRealWorld",
    body: function () {
        var context = getContext();
        var response = context.getResponse();

        // Each setBody call replaces the previous body,
        // so the caller only receives the last string.
        response.setBody("Hello, Welcome To The Real World");
        response.setBody("Here Be Dragons...");
        response.setBody("Oh... and network latency");
    }
};

Demo: Programmability

Where To Use Programmability
• Reduce network calls
  • Send multiple documents & shred in a SPROC
• Multi-document transactions
  • Each call in an ACID txn
  • No multi-statement txns: one REST call = one txn
• Transform & join
  • Pull content from multiple docs. Perform calculations
  • JOIN operator is intra-document only
• Drive lazy processes
  • Write journal entries and process later
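The "send multiple documents & shred in a SPROC" pattern could look like the stored procedure body below, modelled on the widely used bulk-import sample for DocumentDB's server-side JavaScript API (`getContext`, `getCollection`, `createDocument`). Treat it as a sketch: the whole call is one transaction, so throwing from the callback rolls back every insert; `createDocument` returning `false` means the request was not accepted (typically as the procedure nears its time bound), and this version then reports how many documents it inserted so the client can resend the rest.

```javascript
// Server-side stored procedure body: runs inside DocumentDB's JavaScript sandbox,
// where getContext() is provided by the service.
function bulkImport(docs) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var count = 0;

    if (!docs || docs.length === 0) {
        getContext().getResponse().setBody(0);
        return;
    }

    tryCreate(docs[count], callback);

    function tryCreate(doc, cb) {
        // false => not accepted; return the progress so far to the client.
        var accepted = collection.createDocument(collectionLink, doc, cb);
        if (!accepted) getContext().getResponse().setBody(count);
    }

    function callback(err) {
        if (err) throw err; // uncaught exception => the whole call rolls back
        count++;
        if (count >= docs.length) {
            getContext().getResponse().setBody(count);
        } else {
            tryCreate(docs[count], callback);
        }
    }
}
```

One round trip carries the whole batch, which is where the network-call savings come from.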

Consistency
Trading the family silver… for a faster car…

Worth Reading: Replicated Data Consistency Explained Through Baseball
By Doug Terry, MS Research
http://tiny.cc/cons-baseball

Tuning Consistency
• Database accounts are configured with a default consistency level. The consistency level can be weakened per read/query request
• Four consistency levels
  • STRONG: all writes are visible to all readers. Writes are committed by a majority quorum of replicas and reads are acknowledged by the majority read quorum
  • BOUNDED STALENESS: guaranteed ordering of writes; reads adhere to minimum freshness. Writes are propagated asynchronously; reads are acknowledged by a majority quorum, lagging writes by at most N seconds or operations (configurable)
  • SESSION (default): read your own writes. Writes are propagated asynchronously, while reads for a session are issued against the single replica that can serve the requested version.
  • EVENTUAL: reads eventually converge with writes. Writes are propagated asynchronously, while reads can be acknowledged by any replica. Readers may view older data than previously observed.
Level | Writes | Reads
Strong | sync quorum writes | quorum reads
Bounded staleness | async replication | quorum reads
Session* | async replication | session-bound replica
Eventual | async replication | any replica
* Ideal consistency and performance tradeoff for many application scenarios: high-performance writes and reads with predictable consistency.
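The rule that a request may weaken, but not strengthen, the account's default can be expressed as an ordering check. A JavaScript sketch; `effectiveConsistency` is hypothetical (not an SDK call) and the level strings are assumptions modelled on the four level names above.

```javascript
// Consistency levels ordered from strongest to weakest.
var LEVELS = ["Strong", "BoundedStaleness", "Session", "Eventual"];

// Hypothetical helper: validates a per-request override against the account
// default. Requests may weaken consistency but never strengthen it.
function effectiveConsistency(accountDefault, requested) {
    var base = LEVELS.indexOf(accountDefault);
    if (base < 0) throw new Error("unknown level: " + accountDefault);
    if (requested === undefined) return accountDefault;
    var req = LEVELS.indexOf(requested);
    if (req < 0) throw new Error("unknown level: " + requested);
    if (req < base) {
        throw new Error(requested + " is stronger than the account default " + accountDefault);
    }
    return requested;
}

console.log(effectiveConsistency("Session", "Eventual")); // Eventual
```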

Summary

• DocumentDB is a preview service… expect and enjoy change over time
• Think outside the relational model…
  • … if what you really want is an RDBMS, then use one of those…

Questions [and Answers]

Related content

WPC401. JavaScript for C# Developers
http://techedmelbourne.hubb.me/Sessions/Details/19489

Thanks! Don’t forget to complete your evaluations

aka.ms/mytechedmel