krishnakumar-s
History of Database Systems
1960’s : Hierarchical and Network (IMS, CODASYL etc.)
1970’s : Beginning of theory of relational model of database
1980’s : Rise of RDBMS and SQL
1990’s : Spreadsheets and MySQL; evolution of web
2000’s : Large enterprise & open source; Google & Amazon
2010’s : Emergence of NoSQL systems
2020’s : NewSQL?
CAP theorem
RDBMS
• Strong foundation: the relational model
• Highly structured: rows, columns, data types
• Structured Query Language (SQL): standardized
• ACID properties: all or nothing
• Joins: new views from relationships
RDBMS – Weakness
• Joins: do not scale well
• Transactions: read and write operations are slow because resources are locked
• Fixed definitions: difficult to work with highly variable data
• Document integration: difficult to create reports based on structured and unstructured data
Any existing solution?
• Data partitioning
• Replication
• Clustering
• Query distribution
• Load balancing
• Consistency/Syncing
• Latency/Concurrency
• Network bottlenecks
• Multiple data centers
• Distributed backups
• Node failures
• Voting algorithms for failure detection
• Administration of many systems
• Monitoring
RDBMS is scalable only if designed & administered correctly (Period)
NoSQL! What is in a name?
1998 :
• Carlo Strozzi developed an open-source relational database, “Strozzi NoSQL”
• The database stores tables as ASCII files and tuples as tab-separated values
• It doesn’t use SQL as its query language, hence the name “NoSQL”
• Instead it uses UNIX shell scripts and pipelines to retrieve data
Irony! A relational database is named as NoSQL!
2009 :
• Johan Oskarsson organized a meetup of people developing open-source, distributed, non-relational databases on June 11, 2009
• He wanted a simple Twitter hashtag for the meetup: quick, memorable, and easy to find with a Google search
• Eric Evans came up with the name NoSQL, for that single meetup
NoSQL! What is in a name?
• The name is negative
• The name does not describe the purpose of the meetup
• The name does not define the new database systems
• But the name worked as a Twitter hashtag, and it caught on like wildfire
What does it stand for?
• “No to SQL”? Not exactly
• “Not Only SQL”? Then what about SQL Server, Oracle etc.?
The answer is “Don’t worry about what it stands for!”
NoSQL
• NoSQL is a movement
• NoSQL is an ecosystem for future database technology
• NoSQL is an accidental neologism; there is no prescriptive definition
Characteristics of NoSQL
• Not using the relational model
• Running well in clusters
• Open-source
• Built for 21st century web estates
• Schemaless
The most important result of the NoSQL movement is polyglot persistence.
Theorems Ahead!
Brewer’s CAP theorem
• In 2000, Eric Brewer presented the CAP principle as a conjecture
• In 2002, Seth Gilbert and Nancy Lynch published a formal proof, turning the principle into the CAP theorem
There are three essential system requirements necessary for the successful design, implementation, and deployment of applications in distributed computing
1. Consistency
2. Availability
3. Partition Tolerance
In the majority of instances, a distributed system can guarantee only two of the three at the same time.
Brewer’s CAP theorem
Consistency refers to whether every node sees the same data at the same time. Do all nodes within a cluster see all the data they are supposed to? This is related to, but not the same as, the consistency in ACID, which is about a transaction preserving the database's integrity rules.
Availability means just what it sounds like. Is the given service or system available when requested? Does each request get a response, whether that response reports success or failure?
Partition Tolerance means that a given system continues to operate even when network failures cut nodes off from one another or messages between them are lost. A single node failure should not cause the entire system to collapse.
Large-scale, distributed, non-relational systems need availability and partition tolerance, so consistency suffers and ACID guarantees are given up.
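The trade-off can be sketched with two replicas and a simulated network partition. This is a toy illustration, not how any real database behaves: the `TwoNodeStore` class, its `mode` flag and the node names `"a"` and `"b"` are all hypothetical.

```python
class TwoNodeStore:
    """Toy store with two replicas, 'a' and 'b' (hypothetical names)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False  # True when the replicas cannot talk

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas now disagree.
            self.values[node] = value
            return
        # No partition: the write reaches both replicas.
        for name in self.values:
            self.values[name] = value

    def read(self, node):
        return self.values[node]


if __name__ == "__main__":
    ap = TwoNodeStore("AP")
    ap.write("a", 1)
    ap.partitioned = True
    ap.write("a", 2)
    print(ap.read("a"), ap.read("b"))  # replicas disagree while partitioned
```

In "CP" mode the same second write raises an error instead, so both replicas keep agreeing but the client is turned away.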
Brewer’s CAP theorem
Pick any two:
• Consistency: all clients always have the same view of the data
• Availability: each client can always read and write
• Partition Tolerance: the system works well despite physical network partitions
• CA: RDBMSs (SQL Server, Oracle, MySQL etc.)
• CP: Bigtable, MongoDB, BerkeleyDB, MemcacheDB, HBase etc.
• AP: Cassandra, CouchDB, Dynamo, Voldemort
BASE
Basically Available : The system guarantees availability in the sense of the CAP theorem; there will be a response to any request. But that response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state.
Soft state : The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency’; thus the state of the system is always ‘soft’.
Eventual Consistency : The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and does not check the consistency of every transaction before it moves on to the next one.
It’s OK to use stale data; it’s OK to give approximate answers.
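Eventual consistency can be sketched as replicas that accept writes independently and reconcile later. The example below assumes a last-write-wins rule and a shared logical clock, which is only one of several possible reconciliation strategies; `LwwReplica` and `anti_entropy` are hypothetical names.

```python
import itertools

# Assumption: one shared logical clock orders all writes.
_clock = itertools.count(1)


class LwwReplica:
    """Replica that keeps (timestamp, value) per key."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = (next(_clock), value)

    def get(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Last write wins: keep whichever entry has the newer timestamp.
        for key, (ts, value) in other.data.items():
            if key not in self.data or self.data[key][0] < ts:
                self.data[key] = (ts, value)


def anti_entropy(replicas):
    """One reconciliation round: every replica merges from every other."""
    for replica in replicas:
        for other in replicas:
            if other is not replica:
                replica.merge_from(other)


if __name__ == "__main__":
    r1, r2, r3 = LwwReplica(), LwwReplica(), LwwReplica()
    r1.put("cart", "1 item")
    r2.put("cart", "2 items")   # newer write lands on another replica
    print(r1.get("cart"))       # stale read is allowed
    anti_entropy([r1, r2, r3])
    print(r1.get("cart"), r3.get("cart"))
```

Between the second write and the reconciliation round, a read from `r1` returns stale data, which is exactly what BASE accepts.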
NoSQL Data Architecture Patterns
• Key-Value
• Column-Family
• Graph
• Document
Key-Value
• Keys are used to access opaque blobs of data
• Values can contain any type of data (images, video)
• Pros: scalable, simple API (put, get, delete)
• Cons: no way to query based on the content of the value
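A minimal in-memory sketch of the put/get/delete API, assuming values are stored as raw bytes. The `KeyValueStore` class is hypothetical and stands in for no particular product.

```python
class KeyValueStore:
    """Toy key-value store: the store never looks inside a value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("user:42:avatar", b"\x89PNG...")  # any blob will do
    print(store.get("user:42:avatar"))
    store.delete("user:42:avatar")
    print(store.get("user:42:avatar"))
```

Because the API only takes keys, finding every value that contains some byte pattern would mean fetching and scanning all of them in application code.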
Column family
• Key includes a row, column family and column name
• Stores versioned blobs in one large table
• Queries can be done on rows, column families and column names
• Pros: good scale-out
• Cons: cannot query blob content; row and column designs are critical
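The composite key and the versioned blobs can be sketched as follows. This is a simplified in-memory model with a hypothetical `ColumnFamilyStore` class; the version numbers come from a simple counter, which is an assumption of the sketch.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy model: (row, family, column) -> list of (version, blob)."""

    def __init__(self):
        self._cells = defaultdict(list)
        self._version = 0  # assumption: a single increasing counter

    def put(self, row, family, column, blob):
        self._version += 1
        self._cells[(row, family, column)].append((self._version, blob))
        return self._version

    def get(self, row, family, column, version=None):
        """Latest blob, or the latest one at or before `version`."""
        versions = self._cells.get((row, family, column), [])
        if version is not None:
            versions = [v for v in versions if v[0] <= version]
        return versions[-1][1] if versions else None

    def scan_row(self, row, family=None):
        """Latest blob of each column in a row, optionally one family."""
        return {
            (f, c): versions[-1][1]
            for (r, f, c), versions in self._cells.items()
            if r == row and (family is None or f == family)
        }


if __name__ == "__main__":
    table = ColumnFamilyStore()
    v1 = table.put("user42", "profile", "name", b"Ann")
    table.put("user42", "profile", "name", b"Anne")
    table.put("user42", "stats", "logins", b"7")
    print(table.get("user42", "profile", "name"))
    print(table.get("user42", "profile", "name", version=v1))
    print(table.scan_row("user42", family="profile"))
```

Every query goes through the row, family or column name; nothing in the API can filter on what a blob contains.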
Graph Store
• Data is stored in a series of nodes, relationships and properties
• Queries are really graph traversals
• Ideal when relationships between data are key, e.g. social networks
• Pros: fast network search, works with public linked data sets
• Cons: poor scalability when graphs don't fit into RAM, specialized query languages
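A query as a graph traversal can be sketched with a breadth-first search over an in-memory adjacency map. The `Graph` class and the "friends within N hops" query are hypothetical illustrations, not the query language of any graph database.

```python
from collections import deque


class Graph:
    """Toy undirected graph with per-node properties."""

    def __init__(self):
        self.props = {}   # node -> dict of properties
        self.edges = {}   # node -> set of neighbouring nodes

    def add_node(self, node, **props):
        self.props[node] = props
        self.edges.setdefault(node, set())

    def add_edge(self, a, b):
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def within(self, start, max_hops):
        """Nodes reachable from `start` in at most `max_hops` edges."""
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if hops[node] == max_hops:
                continue  # do not expand past the hop limit
            for neighbour in self.edges.get(node, ()):
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    queue.append(neighbour)
        return {n for n, d in hops.items() if d > 0}


if __name__ == "__main__":
    g = Graph()
    for name in ("alice", "bob", "carol", "dave"):
        g.add_node(name, kind="person")
    g.add_edge("alice", "bob")
    g.add_edge("bob", "carol")
    g.add_edge("carol", "dave")
    print(sorted(g.within("alice", 2)))  # friends and friends of friends
```

The traversal only touches nodes near the starting point, which is why such queries are fast while the graph fits in memory.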
Document Store
• Data is stored in nested hierarchies
• Logical data remains stored together as a unit
• Any item in the document can be queried
• Pros: no object-relational mapping layer, ideal for search
• Cons: complex to implement, incompatible with SQL
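Querying any item inside a nested document can be sketched with a dotted-path lookup. The `DocumentStore` class and the dotted path syntax are assumptions of this sketch; real document stores have their own query languages and indexes.

```python
_MISSING = object()  # sentinel for "path not present in this document"


def _lookup(doc, keys):
    """Follow a list of keys down through nested dicts."""
    current = doc
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


class DocumentStore:
    """Toy document store: each document is a nested dict kept whole."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc):
        self._docs[doc_id] = doc

    def find(self, path, value):
        """Ids of documents whose field at the dotted `path` equals `value`."""
        keys = path.split(".")
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if _lookup(doc, keys) == value
        ]


if __name__ == "__main__":
    store = DocumentStore()
    store.insert("o1", {"customer": {"name": "Ann", "city": "Oslo"}, "total": 30})
    store.insert("o2", {"customer": {"name": "Bob", "city": "Rome"}, "total": 12})
    print(store.find("customer.city", "Oslo"))
```

The order and its customer details live in one document, so no join or object-relational mapping is needed to read them back together.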
Polyglot Persistence
Different database systems are designed to solve different problems.
Using a single database engine for all requirements leads to non-performant solutions.
The solution is polyglot persistence; a hybrid approach to data persistence
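A hybrid approach can be sketched as one application that sends each kind of data to the store that suits it. Everything here is a hypothetical illustration: `PolyglotShop` uses a plain dict as a stand-in for a key-value store and a document store, and an in-memory SQLite database as the relational store.

```python
import sqlite3


class PolyglotShop:
    """Toy application using a different store per kind of data."""

    def __init__(self):
        self.sessions = {}   # key-value stand-in: session id -> blob
        self.catalog = {}    # document stand-in: product id -> nested dict
        self.db = sqlite3.connect(":memory:")  # relational: orders
        self.db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def save_session(self, session_id, blob):
        self.sessions[session_id] = blob

    def add_product(self, product_id, document):
        self.catalog[product_id] = document

    def place_order(self, customer, total):
        # Orders need transactions, so they go to the relational store.
        with self.db:
            cursor = self.db.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)",
                (customer, total),
            )
        return cursor.lastrowid

    def customer_total(self, customer):
        row = self.db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


if __name__ == "__main__":
    shop = PolyglotShop()
    shop.save_session("s1", b"cart-state")
    shop.add_product("p1", {"name": "Lamp", "specs": {"watts": 40}})
    shop.place_order("ann", 10.0)
    shop.place_order("ann", 5.5)
    print(shop.customer_total("ann"))
```

The cost of this design is operational: each additional store has to be deployed, monitored and kept in sync with the others.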