Presented by: Partha Pratim Das
5th Semester, M.Tech. in Computer Sc. and Applications
Registration no.: 053792 of 2006-07
Roll no.: 97/CSA/111002
Apr 13, 2023 1No SQL
• Introduction to NoSQL
• SQL v/s NoSQL
• Architecture of NoSQL
• ACID v/s BASE
• Examples of NoSQL databases
• NoSQL vs SQL
• Conclusion
Database: an organized collection of interrelated data.
Database Management System (DBMS): a software package that controls the creation, maintenance and use of a database in a convenient and efficient way.
◦ To interact with a DBMS, we use a structured query language
◦ Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro etc.
Relational DBMS: a relational database is a collection of data items organized as a set of formally described tables, from which data can be accessed easily. A relational database is created using the relational model. The software used to manage a relational database is called a relational database management system (RDBMS).
Structured Query Language (SQL)
• A special-purpose programming language designed for managing data in a relational DBMS
• Originally based upon relational algebra and tuple relational calculus
• SQL's scope includes data insertion, update and deletion, schema creation and modification, and data access control
• It is statically and strongly typed, and it is the most widely used database language
• The query is the most important operation in SQL. Ex.:
SELECT * FROM Book WHERE price > 100.00 ORDER BY title;
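To make the example query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Book table, its two columns and the three rows are hypothetical sample data, not part of the original slide.

```python
import sqlite3

# Hypothetical sample data: an in-memory Book table with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book (title, price) VALUES (?, ?)",
    [("Zen of Data", 150.0), ("Algorithms", 120.0), ("Cheap Reads", 20.0)],
)

# The query from the slide: every column of books priced above 100.00, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
print(rows)  # [('Algorithms', 120.0), ('Zen of Data', 150.0)]
conn.close()
```

The database only needs to be told *what* rows are wanted; how to filter and sort them is left to the engine, which is what "declarative" means in the NoSQL comparison later on.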
• Stands for Not Only SQL
• A class of non-relational data storage systems
• Usually do not require a fixed table schema or the concept of joins
• NoSQL offerings typically relax one or more of the ACID properties
◦ Atomicity, Consistency, Isolation, Durability (ACID)
“NoSQL” = “Not Only SQL” = not only using a traditional relational DBMS
• Alternative to traditional relational DBMS
• Flexible schema
• Quicker/cheaper to set up
• Massive scalability
• Relaxed consistency → higher performance & availability
* No declarative query language → more programming
* Relaxed consistency → fewer guarantees
Not every problem can be solved by a traditional relational database system alone.
• Handles huge databases
• Redundancy: data is reasonably safe even on commodity hardware
• Super flexible queries using map/reduce
• Rapid development (no fixed schema)
• Very fast for common use cases
• Inspired by distributed data storage problems
• Scale easily by adding servers
• Not suited to all problem types, but super-suited to certain large problem types
• High-write situations (e.g., activity tracking or timeline rendering for millions of users)
• A lot of relational uses are really dumbed down (e.g., fetch by primary key with update)
Clients know how to:
◦ Send items to servers (consistent hashing)
◦ Handle a server failure
◦ Fetch keys from servers
◦ “Weigh” servers according to their capacities
Servers know how to:
◦ Store the items they receive
◦ Expire them from the cache
No inter-server communication: each server is unaware of the others
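The client-side placement described above can be sketched as a consistent-hash ring. This is a minimal illustration, not the algorithm of any particular memcached client: the MD5 hash, the 100 virtual points per unit of weight (the hypothetical points_per_weight parameter) and the server names are assumptions. Weighting is modeled by giving a server more points on the ring, and a failed server is handled by removing its points, which remaps only the keys it held.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Client-side consistent-hash ring (a sketch; server names are hypothetical).

    Each server gets weight * points_per_weight points on the ring, so a
    server with twice the weight receives roughly twice the keys.
    """

    def __init__(self, servers, points_per_weight=100):
        self._ring = []  # sorted list of (hash, server) points
        for server, weight in servers.items():
            for i in range(weight * points_per_weight):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first server point after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def remove(self, server):
        # On server failure, drop its points; only the keys it held move.
        self._ring = [(h, s) for h, s in self._ring if s != server]
        self._hashes = [h for h, _ in self._ring]


ring = ConsistentHashRing({"cache-a": 1, "cache-b": 1, "cache-c": 2})
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove("cache-b")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(all(before[k] == "cache-b" for k in moved))  # True
```

Because removing points never changes the successor of a key whose successor belonged to another server, the servers need no coordination with each other, which matches the "no inter-server communication" design.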
• An RDBMS tries to ensure the ACID properties
• NoSQL systems generally do not guarantee full ACID, which lets them trade consistency for speed and availability
• We don’t need ACID everywhere
• NoSQL follows the BASE properties
• Basic availability: the store appears to work most of the time.
• Soft state: stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
• Eventual consistency: stores exhibit consistency at some later point (e.g., lazily at read time).
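One common way to realize eventual consistency is last-writer-wins timestamps combined with read repair at read time. The sketch below assumes that scheme; Replica and read_with_repair are hypothetical names, and real stores differ in how they order and reconcile writes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (timestamp, value); a hypothetical in-memory replica
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last writer wins: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)


def read_with_repair(replicas, key):
    """Return the newest value and lazily push it to stale replicas."""
    versions = [r.data[key] for r in replicas if key in r.data]
    if not versions:
        return None
    ts, value = max(versions, key=lambda v: v[0])
    for r in replicas:
        r.write(key, value, ts)  # read repair: bring lagging replicas up to date
    return value


a, b, c = Replica(), Replica(), Replica()
a.write("status", "online", ts=1)   # the newest write has reached only replica a
b.write("status", "offline", ts=0)  # b still holds an older value
print(c.data.get("status"))          # None: replicas disagree (soft state)
print(read_with_repair([a, b, c], "status"))  # online
```

Until some read touches the key, the replicas are allowed to disagree; the read is the "later point" at which they converge.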
Simple web application with not much traffic
◦ Application server and database server all on one machine
More traffic comes in
◦ Application server
◦ Database server
Even more traffic comes in
◦ Load balancer
◦ Application server x2
◦ Database server
Even more traffic comes in
◦ Load balancer x N → easy
◦ Application server x N → easy
◦ Database server x N → hard for SQL databases
SQL Slowdown: not linear!
NoSQL Scaling
• Need more storage? Add more servers!
• Need higher performance? Add more servers!
• Need better reliability? Add more servers!
You can scale SQL databases (Oracle, MySQL, SQL Server…)
◦ This will cost you dearly
◦ If you don’t have a lot of money, you will reach limits quickly
You can scale NoSQL databases
◦ Very easy horizontal scaling
◦ Lots of open-source solutions
◦ Scaling is one of the basic design incentives, so it is well handled
◦ Scaling is the cause of the trade-offs that force you to use map/reduce
• Almost infinite horizontal scaling
• Very fast
• Performance doesn’t deteriorate with growth (much)
• No fixed table schemas
• No join operations
• Ad-hoc queries difficult or impossible
• Structured storage
• Almost everything happens in RAM
Key-Value Stores
Column Family
Document Databases
Graph Databases
Lineage: Amazon's Dynamo paper and Distributed Hash Tables.
Data model: a global collection of key-value pairs
Example systems
◦ Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase etc.
Implementation: efficiency, scalability, fault tolerance, load balancing
◦ Records distributed to nodes based on key
◦ Replication (R = 2F + 1), where F is the number of node failures to be tolerated
◦ Single-record transactions, “eventual consistency”
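The placement and replication bullets can be illustrated with a small sketch. Assumptions: the key's SHA-1 hash picks a starting node, the replicas are that node and the next 2F nodes in a fixed list (loosely in the spirit of Dynamo's preference lists), and a majority quorum must answer. With 2F + 1 replicas, a majority of F + 1 is still reachable after any F failures. Function and node names are hypothetical.

```python
import hashlib


def replicas_for(key, nodes, f):
    """Pick R = 2F + 1 distinct nodes for a key (hypothetical placement scheme)."""
    r = 2 * f + 1
    if r > len(nodes):
        raise ValueError("need at least 2F + 1 nodes")
    # The key's hash picks a starting node; replicas follow it, wrapping around.
    start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(r)]


def majority_available(replicas, failed):
    """True if a majority (F + 1 of 2F + 1) of the replicas is still alive."""
    alive = [n for n in replicas if n not in failed]
    return len(alive) > len(replicas) // 2


nodes = ["n0", "n1", "n2", "n3", "n4"]
reps = replicas_for("user:42", nodes, f=1)           # 3 replicas
print(len(reps))                                      # 3
print(majority_available(reps, failed={reps[0]}))     # True: F = 1 failure tolerated
print(majority_available(reps, failed=set(reps[:2]))) # False: 2 failures exceed F
```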
Lineage: Inspired by Lotus Notes.
Data model: collections of documents, where each document is itself a collection of key-value pairs.
Example: CouchDB, MongoDB, Riak
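A document collection can be pictured as a list of key-value maps whose fields vary from document to document. The sketch below is a plain-Python illustration of that data model; the books collection and the find and ids_with_tag helpers are hypothetical and do not mirror the query API of CouchDB, MongoDB or Riak.

```python
from typing import Any

# Hypothetical "books" collection: each document is a set of key-value pairs,
# and documents in one collection need not share the same fields.
books: list[dict[str, Any]] = [
    {"_id": 1, "title": "Intro to NoSQL", "price": 120, "tags": ["nosql"]},
    {"_id": 2, "title": "SQL Basics", "price": 80},
    {"_id": 3, "title": "Graphs", "authors": ["A. Author", "B. Author"], "tags": ["graph", "nosql"]},
]


def find(collection, **criteria):
    """Documents whose fields equal all given values; a missing field never matches."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


def ids_with_tag(collection, tag):
    """IDs of documents whose optional "tags" list contains the tag."""
    return [doc["_id"] for doc in collection if tag in doc.get("tags", [])]


print([d["_id"] for d in find(books, price=80)])  # [2]
print(ids_with_tag(books, "nosql"))                # [1, 3]
```

No schema migration is needed to give document 3 an "authors" field; queries simply have to tolerate fields that are absent.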
Basic Building Blocks of Column Family Storage
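The figure itself is not reproduced here. As a rough sketch, assuming the usual BigTable-style layout (row key → column family → column → timestamped value), column-family storage can be modeled with nested maps. ColumnFamilyTable, its method names and the sample rows are hypothetical.

```python
import time
from collections import defaultdict


class ColumnFamilyTable:
    """Sketch: row key -> column family -> column -> (value, timestamp).

    Column families are declared up front; columns inside a family are
    created freely per row, so different rows can hold different columns.
    """

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: {f: {} for f in self.families})

    def put(self, row_key, family, column, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        stamp = time.time_ns() if ts is None else ts
        self.rows[row_key][family][column] = (value, stamp)

    def get(self, row_key, family, column):
        # dict.get on the defaultdict does not create missing rows.
        cell = self.rows.get(row_key, {}).get(family, {}).get(column)
        return None if cell is None else cell[0]

    def scan(self, start, stop):
        """Row keys in [start, stop), in sorted order (a range-of-keys query)."""
        return [k for k in sorted(self.rows) if start <= k < stop]


users = ColumnFamilyTable(["profile", "activity"])
users.put("user:001", "profile", "name", "Asha")
users.put("user:002", "profile", "email", "b@example.com")
users.put("user:002", "activity", "last_login", "2023-04-13")
print(users.get("user:002", "activity", "last_login"))  # 2023-04-13
print(users.scan("user:000", "user:002"))                # ['user:001']
```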
Lineage: Draws from Euler and graph theory.
Data model: nodes and relationships, both of which can hold key-value pairs
Example: AllegroGraph, InfoGrid, Neo4j
Property Graph:
• It contains nodes and relationships
• Nodes contain properties (key-value pairs)
• Relationships are named and directed, and always have a start and end node
• Relationships can also contain properties
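The four property-graph rules above map directly onto two record types. This sketch is a hypothetical in-memory model, not the API of Neo4j or any other product; the node names and the KNOWS/USES relationship types are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    type: str      # relationships are named...
    start: Node    # ...directed, with a start node...
    end: Node      # ...and an end node
    properties: dict = field(default_factory=dict)  # and may carry properties


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **props):
        node = Node(node_id, props)
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, rel_type, end_id, **props):
        rel = Relationship(rel_type, self.nodes[start_id], self.nodes[end_id], props)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_type):
        """IDs of nodes reached by following rel_type edges out of node_id."""
        return [r.end.id for r in self.relationships
                if r.start.id == node_id and r.type == rel_type]


g = PropertyGraph()
g.add_node("alice", name="Alice")
g.add_node("bob", name="Bob")
g.add_node("neo4j", kind="database")
g.relate("alice", "KNOWS", "bob", since=2010)
g.relate("alice", "USES", "neo4j")
print(g.outgoing("alice", "KNOWS"))  # ['bob']
print(g.outgoing("bob", "KNOWS"))    # [] (edges are directed)
```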
MapReduce: Google’s framework for processing highly distributable problems across huge datasets using a large number of computers
Let’s define “large number of computers”
◦ Cluster, if all of them have the same hardware
◦ Grid, otherwise (if !Cluster for old-style programmers)
Process split into two phases
◦ Map
  - Take the input, partition it and delegate the parts to other machines
  - Other machines can repeat the process, leading to a tree structure
  - Each machine returns results to the machine that gave it the task
◦ Reduce
  - Collect results from the machines you gave the tasks to
  - Combine the results and return them to the requester
◦ Slower than sequential data processing, but massively parallel
◦ Sort a petabyte of data in a few hours
◦ Input → Map → Shuffle → Reduce → Output
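The Input → Map → Shuffle → Reduce → Output pipeline can be sketched with the classic word-count example. Everything runs in one process here; the comment marks where a real framework would hand each split to a different machine. Function names and input strings are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Map: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


splits = ["NoSQL scales out", "SQL scales up", "NoSQL and SQL"]  # Input
# In a real cluster, each split's map task could run on a different machine.
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())  # Output
print(counts["nosql"], counts["scales"], counts["up"])  # 2 2 1
```

Map and reduce tasks share no state, which is what lets the framework spread them across many machines and rerun any task that fails.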
• Hadoop / HBase
• Cassandra
• Amazon SimpleDB
• MongoDB
• CouchDB
• Redis
• MemcacheDB
• Voldemort
• Hypertable
• Cloudata
• IBM Lotus/Domino
Cassandra
◦ Facebook (original developer, used it until late 2010)
◦ Twitter
◦ Digg
◦ Reddit
◦ Rackspace
◦ Cisco
BigTable
◦ Google (open-source counterpart: HBase)
MongoDB
◦ Foursquare
◦ Craigslist
◦ Bit.ly
◦ SourceForge
◦ GitHub
• Written in: Java
• Protocol: custom, binary (Thrift)
• Tunable trade-offs for distribution and replication (N, R, W)
• Querying by column, range of keys
• BigTable-like features: columns, column families
• Writes are much faster than reads (!)
◦ Constant write time regardless of database size
• Map/reduce possible with Apache Hadoop
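The (N, R, W) knobs follow standard quorum arithmetic: with N replicas, a read that waits for R replies and a write that waits for W acknowledgements are guaranteed to overlap on at least one replica when R + W > N. The helper below is a hypothetical illustration of that rule, not Cassandra's API; Cassandra exposes the choice per request through consistency levels such as ONE, QUORUM and ALL.

```python
def quorum_properties(n, r, w):
    """Summarize an (N, R, W) setting: N replicas, R replies per read, W acks per write."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return {
        # Read and write replica sets must overlap for a read to see the latest acknowledged write.
        "read_sees_latest_write": r + w > n,
        # Replicas that may be down while a read / write can still succeed.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
    }


for n, r, w in [
    (3, 2, 2),  # quorum reads and quorum writes
    (3, 1, 1),  # fastest, but only eventually consistent
    (3, 1, 3),  # write to all replicas, read from any one
]:
    print((n, r, w), quorum_properties(n, r, w))
```

Lowering R or W buys latency and availability at the cost of possibly stale reads, which is the "tunable" trade-off on the slide.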
Cassandra is an open-source DBMS from the Apache Software Foundation.
Cassandra provides a structured key-value store with tunable consistency
Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure
It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010
• Written in: Java
• Main point: billions of rows × millions of columns
• Modeled after BigTable
• Map/reduce with Hadoop
• Query predicate push-down via server-side scan and get filters
• Optimizations for real-time queries
• A high-performance Thrift gateway
• HTTP supports XML, Protobuf, and binary
• Cascading, Hive, and Pig source and sink modules
• No single point of failure
• While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs. HBase is a column-oriented key/value store and allows for low-latency reads and writes.
• Random access performance is like MySQL
• Written in: Erlang
• Main point: DB consistency, ease of use
• Bi-directional (!) replication, continuous or ad-hoc, with conflict detection, thus master-master replication (!)
• MVCC: write operations do not block reads
• Previous versions of documents are available
• Crash-only (reliable) design
• Needs compacting from time to time
• Views: embedded map/reduce
• Formatting views: lists & shows
• Server-side document validation possible
• Authentication possible
• Real-time updates via _changes (!)
• Attachment handling
• CouchApps (standalone JS apps)
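The MVCC bullet can be illustrated with a toy versioned store: every update names the revision it is based on, a stale revision is rejected as a conflict, and new versions are appended rather than overwritten, so a reader holding an older version is never disturbed. This single-threaded sketch is a simplification under stated assumptions: integer revisions stand in for CouchDB's _rev strings, and MVCCStore, Conflict and compact are hypothetical names.

```python
import copy


class Conflict(Exception):
    pass


class MVCCStore:
    """Toy MVCC store: each update must name the revision it replaces (hypothetical API)."""

    def __init__(self):
        self.history = {}  # doc_id -> list of (rev, body), oldest first

    def get(self, doc_id, rev=None):
        versions = self.history[doc_id]
        if rev is None:
            return versions[-1]
        return next(v for v in versions if v[0] == rev)

    def put(self, doc_id, body, rev=None):
        versions = self.history.setdefault(doc_id, [])
        current = versions[-1][0] if versions else None
        if rev != current:
            # Someone else updated the document since this writer read it.
            raise Conflict(f"expected rev {current}, got {rev}")
        new_rev = 1 if current is None else current + 1
        # Append a new version instead of overwriting the old one.
        versions.append((new_rev, copy.deepcopy(body)))
        return new_rev

    def compact(self, doc_id):
        # Drop old revisions, keeping only the latest one.
        self.history[doc_id] = self.history[doc_id][-1:]


db = MVCCStore()
r1 = db.put("note", {"text": "draft"})
snapshot = db.get("note")                        # a reader holds revision 1
r2 = db.put("note", {"text": "final"}, rev=r1)   # the update does not touch it
try:
    db.put("note", {"text": "stale edit"}, rev=r1)  # based on an old revision
except Conflict as e:
    print("conflict:", e)   # conflict: expected rev 2, got 1
print(snapshot)             # (1, {'text': 'draft'})
```

Keeping every version is what makes "previous versions are available" true and also why the store "needs compacting from time to time".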
• Apache project
• A framework that allows for the distributed processing of large data sets across clusters of computers
• Designed to scale up from single servers to thousands of machines
• Designed to detect and handle failures at the application layer, instead of relying on hardware for it
• Created by Doug Cutting, who named it after his son's toy elephant
• Hadoop-related Apache projects
◦ Cassandra
◦ HBase
◦ Pig
• Hive was a Hadoop subproject, but is now a top-level Apache project
Scales to hundreds or thousands of computers, each with several processor cores
Designed to efficiently distribute large amounts of work across a set of machines
Hundreds of gigabytes of data constitute the low end of Hadoop-scale
Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
Uses Java, but allows streaming so other languages can easily send and accept data items to/from Hadoop
Uses a distributed file system (HDFS)
◦ Designed to hold very large amounts of data (terabytes or even petabytes)
◦ Files are stored redundantly across multiple machines to ensure their durability under failure and high availability to highly parallel applications
◦ Data organized into directories and files
◦ Files are divided into blocks (64 MB by default in early Hadoop releases; 128 MB since Hadoop 2.x) and distributed across nodes
The design of HDFS is based on the design of the Google File System
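The block splitting is simple arithmetic. The sketch below computes how many blocks a file occupies and how much raw storage its replicas use; it assumes a replication factor of 3 (the usual HDFS default) and does not model block placement. The helper name hdfs_block_layout is hypothetical.

```python
MB = 1024 ** 2


def hdfs_block_layout(file_size, block_size=64 * MB, replication=3):
    """Return (number of blocks, size of last block, raw bytes stored with replicas).

    HDFS does not pad the last block, so it only occupies its actual size.
    """
    if file_size == 0:
        return 0, 0, 0
    blocks = -(-file_size // block_size)  # ceiling division on integers
    last_block = file_size - (blocks - 1) * block_size
    return blocks, last_block, file_size * replication


blocks, last, raw = hdfs_block_layout(200 * MB)
print(blocks, last // MB, raw // MB)                       # 4 8 600
print(hdfs_block_layout(200 * MB, block_size=128 * MB)[0]) # 2
```

Large blocks keep the number of blocks (and the metadata the file system must track) small, which suits the "few very large files" workload HDFS was designed for.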
• A petabyte-scale data warehouse system for Hadoop
• Easy data summarization, ad-hoc queries
• Query the data using a SQL-like language called HiveQL
• The Hive compiler generates map-reduce jobs for most queries
NoSQL is a great problem solver if you need it
Choose your NoSQL platform carefully, as each is designed for a specific purpose
Get used to Map/Reduce
It’s not a sin to use NoSQL alongside a (yes)SQL database
THANK YOU..!!