NoSQL: Emerging World of Polyglot Persistence


By: Duc Nguyen, Vu Truong

Content

Part I: Understand

1. Why NoSQL
2. Aggregate Data Models
3. Data Models
4. Distribution Models
5. Consistency
6. Stamps & Map-Reduce

Part II: Implement

1. Key-Value Databases
2. Document Databases
3. Column-Family Stores
4. Graph Databases
5. Schema Migrations & Polyglot Persistence
6. Choosing Your Database


Why NoSQL

The Value of Relational Databases:

● Persistent data
● Concurrency
● Integration
● Standard model

Impedance Mismatch

Attack of the Clusters

Common characteristics of NoSQL:

● Not using the relational model
● Running well on clusters
● Open-source
● Built for 21st-century web estates
● Schemaless


Aggregate Data Models

● An aggregate is a collection of data that we interact with as a unit.
● Aggregates form the boundaries for ACID operations with the database.
● Key-value, document, and column-family databases can all be seen as forms of aggregate-oriented database.
● Aggregates make it easier for the database to manage data storage over clusters.
● Aggregate-oriented databases work best when most data interaction is done with the same aggregate.
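As a minimal sketch of the aggregate idea, assume a hypothetical order aggregate and an in-memory `store` dict standing in for the database (the field names, the key format, and both helper functions are illustrative, not taken from any product). The whole aggregate is written and read as one unit under one key:

```python
import copy

# Hypothetical in-memory stand-in for an aggregate-oriented store:
# one key maps to one whole aggregate.
store = {}

def put_aggregate(key, aggregate):
    """Replace the whole aggregate in one step (the unit of atomic update)."""
    store[key] = copy.deepcopy(aggregate)

def get_aggregate(key):
    """Read the whole aggregate back as a unit."""
    return copy.deepcopy(store[key])

# An order with its line items and shipping address forms one aggregate.
order = {
    "id": 99,
    "customer_id": 1,
    "items": [
        {"product": "NoSQL Distilled", "price": 32.45, "quantity": 1},
    ],
    "shipping_address": {"city": "Chicago"},
}
put_aggregate("order:99", order)

loaded = get_aggregate("order:99")
order_total = sum(i["price"] * i["quantity"] for i in loaded["items"])
```

Because everything needed to price the order travels with the aggregate, no second lookup is required, which is the access pattern these databases favor.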


Data Models

Complex Schema:

Graph Databases:

Schemaless Databases:

Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data.
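The implicit schema can be illustrated with a small sketch. The records, the `REQUIRED_FIELDS` tuple, and the `missing_fields` helper are all hypothetical; the point is only that the store accepts any shape while the reading code still expects certain fields:

```python
# Three records in the same hypothetical collection. The second adds a
# field freely; the third omits one. A schemaless store accepts all of them.
records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": "bob@example.com", "twitter": "@bob"},
    {"name": "Cid"},
]

# The implicit schema lives in the code that reads the data.
REQUIRED_FIELDS = ("name", "email")

def missing_fields(record):
    """Return the fields the reading code expects but the record lacks."""
    return [f for f in REQUIRED_FIELDS if f not in record]

# Collect every record that violates the reader's expectations.
problems = {r["name"]: missing_fields(r) for r in records if missing_fields(r)}
```

Here `problems` names the one record that the application would trip over, even though the database itself raised no error when it was stored.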


Distribution Models

● Single server
● Sharding
● Master-slave replication
● Peer-to-peer replication


Consistency

Update Consistency:

Write-write conflicts occur when two clients try to write the same data at the same time.

Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect conflicts and fix them.
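The two approaches can be contrasted in a short sketch. The `Record` class and both update functions are hypothetical and run in a single process; a real database would implement the same ideas with its own locking or conditional-update mechanism:

```python
import threading

class Record:
    """A single data record with a version counter for conflict detection."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()

def pessimistic_update(record, new_value):
    """Pessimistic: hold the record's lock while writing, so writers queue up."""
    with record.lock:
        record.value = new_value
        record.version += 1

def optimistic_update(record, new_value, expected_version):
    """Optimistic: write only if nobody else wrote since we read.

    Returns True on success and False when a write-write conflict is
    detected; the caller then has to re-read and retry or merge.
    """
    with record.lock:  # guards only the short compare-and-set step
        if record.version != expected_version:
            return False
        record.value = new_value
        record.version += 1
        return True

record = Record("initial")
seen_by_a = record.version  # client A reads the record
seen_by_b = record.version  # client B reads the same version
a_ok = optimistic_update(record, "from A", seen_by_a)
b_ok = optimistic_update(record, "from B", seen_by_b)
```

Client A's write goes through, while client B's write is rejected because the version it read is stale. How B then "fixes" the conflict (retry, merge, or report to the user) is an application decision.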

Read Consistency:

CAP Theorem:

Some Consistency Models:

● Strong consistency
● Weak consistency
● Eventual consistency

Variations of eventual consistency:

● Causal consistency
● Read-your-writes consistency
● Session consistency
● Monotonic read consistency
● Monotonic write consistency


Stamps & Map-Reduce

Stamps:

Version stamps help you detect concurrency conflicts.

Version stamps can be implemented using counters, GUIDs, content hashes, timestamps, or a combination of these.

With distributed systems, a vector of version stamps allows you to detect when different nodes have conflicting updates.
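A vector of version stamps can be sketched as one counter per node. The node names (`blue`, `green`) and the `compare` and `bump` helpers below are hypothetical; the comparison rule is the usual one for vector stamps, where two stamps conflict when each is ahead of the other on some node:

```python
def compare(a, b):
    """Compare two vector stamps given as dicts of node name -> counter.

    Returns "equal", "before" (a is older), "after" (a is newer),
    or "conflict" when each side has an update the other has not seen.
    """
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"

def bump(stamp, node):
    """Return a new stamp with the given node's counter incremented."""
    updated = dict(stamp)
    updated[node] = updated.get(node, 0) + 1
    return updated

base = {"blue": 1, "green": 1}
on_blue = bump(base, "blue")    # the blue node accepts an update
on_green = bump(base, "green")  # the green node accepts a different update
```

Comparing `base` with `on_blue` shows a plain older/newer ordering, while comparing `on_blue` with `on_green` reports a conflict, since neither node has seen the other's update.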

Map-Reduce: Basic

Map-Reduce: Partitioning and Combining
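The map, partition, and reduce steps can be sketched in a single process. The order data and all function names here are hypothetical, and real frameworks run the map and reduce tasks on different machines; this only shows how the pieces fit together:

```python
from collections import defaultdict

# Hypothetical order aggregates; each map call sees one aggregate.
orders = [
    {"id": 1, "items": [{"product": "tea", "quantity": 2, "price": 3.0},
                        {"product": "mug", "quantity": 1, "price": 8.0}]},
    {"id": 2, "items": [{"product": "tea", "quantity": 1, "price": 3.0}]},
]

def map_order(order):
    """Map: emit one (product, revenue) pair per line item."""
    for item in order["items"]:
        yield item["product"], item["quantity"] * item["price"]

def reduce_revenue(product, values):
    """Reduce: sum all values emitted for one key."""
    return sum(values)

def partition(pairs, n_partitions):
    """Partition: route every pair with the same key to the same bucket."""
    buckets = [defaultdict(list) for _ in range(n_partitions)]
    for key, value in pairs:
        buckets[hash(key) % n_partitions][key].append(value)
    return buckets

def run(orders, n_partitions=2):
    """Run map over every order, partition by key, then reduce each key."""
    pairs = [pair for order in orders for pair in map_order(order)]
    result = {}
    for bucket in partition(pairs, n_partitions):
        for key, values in bucket.items():
            result[key] = reduce_revenue(key, values)
    return result

revenue = run(orders)
```

Because `reduce_revenue` returns a value of the same kind as its inputs (a sum of sums is still a sum), the same function could also be applied as a combiner to each mapper's local output before the data is moved between nodes.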


Key-Value Databases

Comparison with Oracle:

What Is a Key-Value Store:

Key-value stores are the simplest NoSQL data stores to use from an API perspective.

Some popular key-value databases: Riak, Memcached DB, Berkeley DB, HamsterDB, Amazon DynamoDB ...

Key-Value Store Features:

Consistency: applies only to operations on a single key, since every operation is a GET, PUT, or DELETE on a single key.

Transactions: different key-value products have different specifications for transactions.

Query Features: querying is supported only by key.

Structure of Data: the store does not care what is stored in the value part of the key-value pair. The value can be a blob, text, JSON, XML ...

Scaling: many key-value stores scale by using sharding.
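The features above can be sketched as a toy store. The `ShardedKeyValueStore` class is hypothetical and keeps its shards in memory; it only illustrates the single-key GET/PUT/DELETE interface, the opaque value, and hash-based sharding, and it ignores replication and rebalancing:

```python
import hashlib

class ShardedKeyValueStore:
    """Toy key-value store: GET, PUT, and DELETE on a single key,
    with keys spread over a fixed number of in-memory shards."""

    def __init__(self, n_shards=3):
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # The value is opaque: the store never looks inside it.
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        self._shard_for(key).pop(key, None)

kv = ShardedKeyValueStore()
kv.put("session:42", '{"user": "ann", "cart": ["tea"]}')  # JSON text
kv.put("session:43", b"\x00\x01 any blob")                 # raw bytes
fetched = kv.get("session:42")
kv.delete("session:43")
```

Note that there is no way to ask this store for "all sessions belonging to ann": the only access path is the key, which is exactly the query limitation listed above.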

Suitable Use Cases:

Storing Session Information: every web session is unique and is assigned a unique session ID value. Everything about the session can be stored with a single PUT or retrieved with a single GET, and this single-request operation makes it very fast. Solutions such as Memcached are used by many web applications.

User Profiles, Preferences: almost every user has a unique userId, username, or some other attribute that can serve as the key.

Shopping Cart Data: for e-commerce websites ...

When Not to Use:

● Relationships among data
● Multi-operation transactions
● Query by data
● Operations by sets


Document Databases

Comparison with Oracle:

What Is a Document Database:

Features:

Consistency: configured by using replica sets and choosing to wait for writes to be replicated to all the slaves or to a given number of slaves.

Transactions: transactions at the single-document level are known as atomic transactions. Transactions involving more than one operation are not possible.

Availability: document databases try to improve availability by replicating data using a master-slave setup. High availability is provided using replica sets.

Query Features: document databases provide different query features. CouchDB allows you to query via views.

One of the good features of document databases, as compared to key-value stores, is that we can query the data inside the document without having to retrieve the whole document by its key and then introspect it.
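Querying inside documents can be sketched with plain Python dicts. The sample documents and the `lookup` and `find` helpers are hypothetical and scan the whole collection; a real document database would typically answer such queries through indexes:

```python
documents = [
    {"_id": 1, "title": "MongoDB", "categories": ["Database", "NoSQL"],
     "author": {"name": "Ann"}},
    {"_id": 2, "title": "Riak", "categories": ["Database", "Key-Value"],
     "author": {"name": "Bob"}},
]

def lookup(document, path):
    """Follow a dotted path such as "author.name" into nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, expected):
    """Return documents whose field at `path` equals `expected`,
    or contains it when the field is a list."""
    matches = []
    for document in collection:
        value = lookup(document, path)
        if value == expected or (isinstance(value, list) and expected in value):
            matches.append(document)
    return matches

by_author = find(documents, "author.name", "Bob")
by_category = find(documents, "categories", "NoSQL")
```

The caller never needs to know the `_id` of the matching documents, which is the difference from a key-value store, where the value is opaque and only the key can be used for lookup.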

Scaling: when a new node is added, it syncs up with the existing nodes, joins the replica set as a secondary node, and starts serving read requests.

Suitable Use Cases:

Event Logging: applications have different event logging needs; within the enterprise, there are many different applications that want to log events. Document databases can store all these different types of events and can act as a central data store for event storage.

Content Management Systems, Blogging Platforms: document databases work well in content management systems or applications for publishing websites, managing user comments, user registrations, profiles, and web-facing documents.

Web Analytics or Real-Time Analytics: document databases can store data for real-time analytics, since parts of a document can be updated. It is very easy to store page views or unique visitors.

E-Commerce Applications: these often need a flexible schema for products and orders, as well as the ability to evolve their data models without expensive database refactoring or data migration.


Column-Family Stores


Graph Databases

Common characteristics:

● What is a graph database?
● Features
● Suitable use cases
● When not to use

What Is a Graph Database:

● Graph databases allow you to store entities and relationships between these entities.
● We can query the graph in many ways.
● A query on the graph is also known as traversing the graph.
● In graph databases, traversing the joins or relationships is very fast.

Features:

● Consistency
● Transactions
● Availability
● Query features
● Scaling

Neo4J:

● We have to create relationships between the nodes in both directions, because relationships are directed.
● Relationships are first-class citizens in graph databases.
● Relationships don't only have a type, a start node, and an end node, but can have properties of their own.
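These points can be sketched with a toy property graph. The `Graph` class, the node names, and the `FRIEND` and `LIKES` relationship types are hypothetical and this is not the Neo4J API; it only shows directed, typed relationships that carry properties, and a traversal that follows them:

```python
from collections import deque

class Graph:
    """Toy property graph: directed, typed relationships with properties."""

    def __init__(self):
        # node -> list of (relationship type, end node, properties)
        self.outgoing = {}

    def add_node(self, name):
        self.outgoing.setdefault(name, [])

    def relate(self, start, rel_type, end, **properties):
        self.add_node(start)
        self.add_node(end)
        self.outgoing[start].append((rel_type, end, properties))

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first traversal along one relationship type.
        Returns reachable nodes mapped to their depth from `start`."""
        depths = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depths[node] == max_depth:
                continue
            for kind, end, _props in self.outgoing[node]:
                if kind == rel_type and end not in depths:
                    depths[end] = depths[node] + 1
                    queue.append(end)
        del depths[start]
        return depths

g = Graph()
# Friendship is mutual, so it is stored in both directions.
g.relate("Martin", "FRIEND", "Pramod", since=2006)
g.relate("Pramod", "FRIEND", "Martin", since=2006)
g.relate("Pramod", "FRIEND", "Anna", since=2010)
g.relate("Anna", "LIKES", "NoSQL Distilled")

reachable = g.traverse("Martin", "FRIEND", max_depth=2)
```

Since the third friendship is stored in one direction only, a traversal starting from Anna finds no friends, which shows why direction matters when relationships are created.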

Consistency:

● Graph databases ensure consistency through transactions.
● When running Neo4J in a cluster, a write to the master is eventually synchronized to the slaves.
● Slaves are always available for reads.
● Graph databases do not allow dangling relationships.

Transactions:

● Neo4J is ACID-compliant.
● Before changing any nodes or adding any relationships to existing nodes, we have to start a transaction.
● Read operations can be done without initiating a transaction.

Availability:

● Neo4J achieves high availability by providing for replicated slaves.
● These slaves can also handle writes.
● Neo4J uses Apache ZooKeeper to keep track of the last transaction IDs persisted on each slave node and of the current master node.

Query Features:

● Graph databases are supported by query languages such as Gremlin.
● Gremlin is a domain-specific language for traversing graphs.
● Neo4J also has the Cypher query language for querying the graph.
● Neo4J allows you to query the graph for properties of the nodes, traverse the graph, or navigate the nodes.
● Properties of a node can be indexed using the indexing service.
● Neo4J uses Lucene as its indexing service.

Scaling:

● With graph databases, sharding is difficult.
● One option is to add enough memory that the working set of nodes and relationships is held entirely in memory.
● Read scaling can be improved by adding more slaves with read-only access to the data.
● When the dataset is too large for replication, the data can be sharded from the application side using domain-specific knowledge.

Suitable Use Cases:

● Connected data
● Routing, dispatch, and location-based services
● Recommendation engines

When Not to Use:

● Cases where you want to update all or a subset of entities.

Development Trends

Development drivers:

• Avoiding unnecessary complexity.
• High throughput.
• Horizontal scalability and the ability to run on commodity hardware.
• The complexity and cost of setting up database clusters.
• The trade-off between reliability and high performance.
• Moving away from the idea that one database can solve every data storage problem.
• The dream of simple distribution and partitioning of centralized data models.
• The evolution of programming languages and frameworks.
• Meeting the requirements of cloud computing.

Classification (according to the CAP theorem):

Contents

1. Introduction
2. Development trends
3. Operating principles
4. The MongoDB database system
5. Summary
6. References

Operating Principles

CAP theorem:

Partitioning:

• Memory caches.
• Clustering.
• Separating reads from writes.

Storage models:

• Row-based storage.
• Column-based storage.
• Storage by column groups.
• Storage using a merge-tree structure.

Query models:

• Queries similar to those of relational database systems.
• Queries using the scatter/gather method.
• Queries using a B+ tree.

Query performance evaluation:


The MongoDB Database System

Create a collection:

    db.createCollection(<name>, {<configuration parameters>})

Define a document (using JSON):

    {
        title: "MongoDB",
        last_editor: "172.5.123.91",
        last_modified: new Date("9/23/2010"),
        body: "MongoDB is a ...",
        categories: ["Database", "NoSQL", "Document Database"],
        reviewed: false
    }

Insert a document:

    db.<collection>.insert({ title: "MongoDB", last_editor: ... });

Retrieve a document:

    db.<collection>.find({ categories: ["NoSQL", "Document Database"] });

Update a document:

    db.<collection>.save({ ... });

Update data:

    db.<collection>.update(<criteria>, <new document>, <upsert>, <multi>);

Remove a document:

    db.<collection>.remove({ <criteria> });

Create an index:

    db.<collection>.ensureIndex({ <field1>: <sorting>, <field2>: <sorting>, ... });

Performance comparison with SQL Server when executing INSERT statements:

Performance comparison with SQL Server when executing simple queries:

Performance comparison with SQL Server when executing complex queries:

Demo run comparing INSERT with MySQL (milliseconds):


Summary

Advantages:

+ Delivers high performance and handles heavy load.
+ Scales horizontally.
+ Runs on commodity hardware.
+ Meets the needs of cloud computing.

Disadvantages:

+ The vast majority of products are still under development.
+ Most are open-source software, which makes them hard to get accepted in large business environments.
+ No constraints, which means data integrity is not guaranteed.
+ Does not meet the needs of many kinds of applications.

NoSQL or SQL:

Experts advise that developers should take NoSQL options into consideration when building applications, and that your application should move to NoSQL only when it is really necessary.


References

1. NoSQL resources: http://nosql-database.org/
2. NoSQL wiki: http://en.wikipedia.org/wiki/NoSQL
3. Scalability wiki: http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_.28scale_out.29
4. A Brief History of NoSQL: http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
5. Nhất Quán Cuối Cùng (Eventual Consistency): http://www.sqlviet.com/blog/nhat-quan-cuoi-cung
6. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pramod J. Sadalage and Martin Fowler.


Thank You !
