Azure Document Db

Marco Parenzan

Code on GitHub:

@Intel - Assago (Milano)
DotNetLombardia
Wednesday, December 9, 2015 from 9:00 AM to 6:00 PM (CEST)
Milano Fiori, Italy

http://dotnetlombardia.eventbrite.com/

http://www.eventbrite.com/d/italy--milano-fiori/events/

Speaker info/Marco Parenzan

www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan

Training, outreach, and consulting with 1nn0va
Microsoft MVP 2015 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, HTML5 Game Programming and Internet of Things

AZURE COMMUNITY BOOTCAMP 2015

IoT Day - 08/05/2015

@1nn0va #microservicesconf2015, 9 May 2015

Classic MVC

[Diagram: Model, Business Logic, Contract BL/P, View, Controller]

CQRS for IoT (Service Bus Powered)

[Diagram: UI, Device, Command, Command Handler, Event, Event Handler, Signal, Queue, Topics/Subscription, Event Hub, Write Model, Read/Search Model]

The traditional world

Can Azure help us (2)?

http://azure.microsoft.com/en-us/documentation/infographics/cloud-design-patterns/

IoT day 2015

• Business, no longer data, is the foundation of software design
• DDD != OOP
• Don’t start from data
• Data are not unique
• No more ACID…
• ACID transactions are not useful with a distributed model over different storages

Paradigm Shift

• Consistency: all nodes should see the same data at the same time

• Availability: node failures do not prevent survivors from continuing to operate

• Partition-tolerance: the system continues to operate despite network partitions

• A distributed system can satisfy any two of these guarantees at the same time but not all three

CAP Theorem

• Key/Value
• Table
• Blob
• Queue
• Graph
• Document

Not Only SQL Paradigms

• Try to treat your entities as self-contained documents represented in JSON
• When working with relational databases, we've been taught for years to normalize, normalize, normalize.

• There are "contains" relationships between entities.
• There are one-to-few relationships between entities.
• There is embedded data that changes infrequently.
• There is embedded data that won't grow without bound.
• There is embedded data that is integral to data in a document.

Embedding

Denormalizing typically provides for better read performance
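As a minimal sketch of embedding, the hypothetical customer document below keeps its addresses inside the document itself. The field names and values are illustrative assumptions, not taken from the talk's code.

```python
import json

# Hypothetical customer document: the addresses are embedded because they
# are few, change rarely, and are always read together with the customer.
customer = {
    "id": "customer-1",
    "name": "Big Customer",
    "addresses": [
        {"type": "billing", "city": "Milano"},
        {"type": "shipping", "city": "Assago"},
    ],
}

# A single read returns everything; no join or second lookup is needed.
document = json.dumps(customer)
cities = [address["city"] for address in json.loads(document)["addresses"]]
print(cities)
```

One read of the serialized document is enough to list the cities, which is where the read-performance benefit of denormalizing comes from.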

• Representing one-to-many relationships.
• Representing many-to-many relationships.
• Related data changes frequently.
• Referenced data could be unbounded.
• Provides more flexibility than embedding.
• More round trips to read data.

Referencing

Normalizing typically provides better write performance
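The extra round trips of referencing can be sketched with a plain dictionary standing in for a collection. The order, customer, and product documents and the `read` helper are hypothetical, for illustration only.

```python
# Hypothetical in-memory store standing in for a collection, keyed by id.
store = {
    "order-1": {"id": "order-1", "customer_id": "customer-1",
                "product_ids": ["p-1", "p-2"]},
    "customer-1": {"id": "customer-1", "name": "Big Customer"},
    "p-1": {"id": "p-1", "name": "Sensor"},
    "p-2": {"id": "p-2", "name": "Gateway"},
}

stats = {"round_trips": 0}

def read(doc_id):
    """Fetch one document by id and count the round trip."""
    stats["round_trips"] += 1
    return store[doc_id]

# Resolving the references costs one read per referenced document.
order = read("order-1")
customer = read(order["customer_id"])
products = [read(product_id) for product_id in order["product_ids"]]
print(stats["round_trips"])
```

Each referenced document can be updated on its own without rewriting the order, at the price of one lookup per reference when reading.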

• Promotes code-first development (mapping objects to JSON)
• Resilient to iterative schema changes
• Richer query and indexing (compared to KV stores)
• Low impedance as object / JSON store; no ORM required
• It just works
• It’s fast

Developer Appeal

• A container of JSON documents and the associated JavaScript application logic
• JSON docs inside of a collection can vary dramatically
• A unit of scale for transaction and query throughput (capacity units allocated uniformly across all collections)
• A unit of scale for capacity
• A unit of replication

What is a collection?

• Collections in DocumentDB are not just logical containers, but also physical containers
• They are the transaction boundary for stored procedures and triggers
• They are the entry point to queries and CRUD operations
• Each collection is assigned a reserved amount of throughput which is not shared with other collections in the same account
• Collections do not enforce schema

Collections

Partitioning

• In hash partitioning, partitions are assigned based on the value of a hash function, allowing you to evenly distribute requests and data across a number of partitions.
• This is commonly used to partition data produced or consumed from a large number of distinct clients, and is useful for storing user profiles, catalog items, and IoT ("Internet of Things") telemetry data.
• Evenly distribute across n number of partitions (algorithmic) ….

Hash Partitioning
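A minimal sketch of hash partitioning, assuming a SHA-256 digest as the hash function and hypothetical device ids as partition keys. The helper name `hash_partition` is an illustration, not an SDK call.

```python
import hashlib
from collections import Counter

def hash_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition index in [0, partition_count)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Hypothetical device ids spread over 4 partitions.
counts = Counter(hash_partition(f"device-{i}", 4) for i in range(1000))
print(sorted(counts.items()))
```

A stable digest is used instead of Python's built-in `hash()`, which is randomized per process for strings and would send the same key to different partitions after a restart.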

• In range partitioning, partitions are assigned based on whether the partition key is within a certain range
• This is commonly used for partitioning with time stamp properties
• Keep current data hot, warm historical data, scale down older data, purge / archive

Range partitioning
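Range partitioning by time stamp can be sketched with a sorted list of boundaries and a binary search. The boundary dates and partition names below are assumptions chosen for illustration.

```python
import bisect
from datetime import date

# Hypothetical boundaries: a key below the first date goes to the first
# partition, a key on or after the last date goes to the last partition.
BOUNDARIES = [date(2015, 1, 1), date(2015, 7, 1)]
PARTITIONS = ["archive", "warm-2015-h1", "hot-current"]

def range_partition(timestamp: date) -> str:
    """Return the partition whose range contains the timestamp."""
    return PARTITIONS[bisect.bisect_right(BOUNDARIES, timestamp)]

print(range_partition(date(2014, 12, 31)))
print(range_partition(date(2015, 3, 15)))
print(range_partition(date(2015, 12, 9)))
```

Ageing data out then means adding a boundary and a new "hot" partition, while older partitions can be scaled down or archived as a whole.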

• In lookup partitioning, partitions are assigned based on a lookup map that assigns discrete partition values to specific partitions, a.k.a. a partition or shard map
• This is commonly used for partitioning by region
• Home tenant / user to a specific partition. Use "master" lookup.
• Cache this shard map to avoid making the lookup the bottleneck

Lookup partitioning

Tenant          Partition Id
Customer        1
Big Customer    2
Another         3
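The tenant-to-partition table above can be sketched as a cached shard map. The `ShardMap` class and the dictionary standing in for the "master" lookup are hypothetical names used only for this illustration.

```python
class ShardMap:
    """Caches tenant-to-partition assignments read from a master lookup."""

    def __init__(self, master):
        self._master = master      # stands in for the "master" lookup store
        self._cache = {}
        self.master_lookups = 0    # counts how often the master is consulted

    def partition_for(self, tenant):
        if tenant not in self._cache:
            self.master_lookups += 1
            self._cache[tenant] = self._master[tenant]
        return self._cache[tenant]

shard_map = ShardMap({"Customer": 1, "Big Customer": 2, "Another": 3})
for _ in range(3):
    partition = shard_map.partition_for("Big Customer")
print(partition, shard_map.master_lookups)
```

Repeated requests for the same tenant hit the cache, so the master lookup is consulted once per tenant rather than once per request.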

Consistency

• Query / transaction throughput (and reliability, i.e. tolerance of hardware failure) depend on replication!
• All writes to the primary are replicated across two secondary replicas
• All reads are distributed across three copies
• “Scalability of throughput”: allowing different clients to read from different replicas helps prevent bottlenecks

BUT replication takes time!
• Potential scenario: some clients are reading while another is writing
• Now the data those clients read is out of date, inconsistent!

Why worry about consistency?
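The stale-read scenario can be reproduced with a toy model of one primary and two secondaries. This is a simplified simulation under assumed names (`ReplicaSet`, `replicate`), not how DocumentDB replicates internally.

```python
class ReplicaSet:
    """Toy model: one primary, two secondaries, replication as an explicit step."""

    def __init__(self):
        self.primary = {}
        self.secondaries = [{}, {}]
        self._pending = []

    def write(self, key, value):
        # The write lands on the primary first; secondaries lag behind.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Apply the pending writes to every secondary, in order.
        for key, value in self._pending:
            for replica in self.secondaries:
                replica[key] = value
        self._pending.clear()

    def read(self, key, replica_index):
        # Index 0 is the primary, 1 and 2 are the secondaries.
        replicas = [self.primary] + self.secondaries
        return replicas[replica_index].get(key)

replicas = ReplicaSet()
replicas.write("temperature", 21.0)
before = replicas.read("temperature", 1)   # secondary has not caught up yet
replicas.replicate()
after = replicas.read("temperature", 1)
print(before, after)
```

A client that reads from a secondary between the write and the replication step sees nothing (or an older value), which is the inconsistency the slide warns about.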

• Trade-off: speed (performance & availability) or consistency (data correctness)?
• “Does every read need the MOST current data?”
• “Or do I need every request to be handled and handled quickly?”

• No “one size fits all” answer … so it’s up to you!
• 4 options …
• For the entire Db …
• … In a future release, we intend to support overriding the default consistency level on a per collection basis.

Tweakable Consistency

• Client always sees completely consistent data
• Slowest reads / writes
• Mission critical: e.g. stock market, banking, airline reservation

Strong

• Default: an even trade-off between performance & availability vs. data correctness
• Client reads its own writes, but other clients reading this same data might see older values

Session
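The "reads its own writes" guarantee can be sketched with a per-client session token that records the last version the client wrote. The classes below are a toy model with hypothetical names and do not reflect DocumentDB's internal implementation.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

class SessionClient:
    """Remembers the version of its last write as a session token."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.session_token = 0

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.session_token = self.primary.version

    def read(self, key):
        # Use a secondary only if it has caught up with this session.
        for replica in self.secondaries:
            if replica.version >= self.session_token:
                return replica.data.get(key)
        return self.primary.data.get(key)

primary, secondary = Replica(), Replica()
secondary.data["status"] = "old"           # secondary still holds the old value
primary.data["status"] = "old"

writer = SessionClient(primary, [secondary])
other = SessionClient(primary, [secondary])

writer.write("status", "new")
print(writer.read("status"), other.read("status"))
```

The writer is routed away from the lagging secondary and sees "new", while the other client, whose session token is still 0, is served the older value.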

• Client might see old data, but it can specify a limit for how old that data can be (e.g. 2 seconds)
• Updates happen in the order received
• Similar to Session consistency, but speeds up reads while still preserving the order of updates

Bounded Staleness
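The staleness limit can be read as a simple predicate: a replica may serve a read only while its lag behind the latest write stays within the bound. The function name and the time stamps are assumptions for illustration; the 2-second bound is the example from the slide.

```python
def can_serve_read(replica_time: float, latest_write_time: float,
                   max_staleness_seconds: float) -> bool:
    """True if the replica lags the latest write by no more than the bound."""
    return (latest_write_time - replica_time) <= max_staleness_seconds

MAX_STALENESS = 2.0  # the 2-second example from the slide

within_bound = can_serve_read(100.0, 101.5, MAX_STALENESS)   # lag of 1.5 s
beyond_bound = can_serve_read(100.0, 103.0, MAX_STALENESS)   # lag of 3.0 s
print(within_bound, beyond_bound)
```

A replica lagging by 1.5 seconds is acceptable, while one lagging by 3 seconds has to catch up before it can answer.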

• Client might see old data for as long as it takes a write to propagate to all replicas
• High performance & availability, but a client might sometimes read out-of-date information or see updates out of order

Eventual

• At the database level (see preview portal)
• On a per-read or per-query basis (optional parameter on CreateDocumentQuery method)

Setting Consistency

• Use weaker consistency levels for better read latencies
• IoT
• Data Analysis
• http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/

Consistency Tips

Marco Parenzan

Thank you

Code on GitHub: https://github.com/marcoparenzan/CSharpDay2015
