meshal-albeedhani
DESCRIPTION
This presentation explains why NoSQL databases have gained ground over SQL databases, even though SQL databases have been a successful technology for more than twenty years. It also discusses the characteristics and classification of NoSQL databases. Finally, the slides briefly cover four types of NoSQL databases.
NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison
A B M Moniruzzaman and Syed Akhter Hossain
04/10/23 1CSC 8710
Contents
• NoSQL databases definition
• Why NoSQL databases?
• Characteristics of NoSQL Databases
• Primary Uses of NoSQL Database
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases
• Adoption of NoSQL Database
• Conclusion
NoSQL Database
• NoSQL, short for "Not Only SQL", refers to an eclectic and increasingly familiar group of non-relational data management systems.
• These databases are not built primarily on tables and generally do not use SQL for data manipulation.
• NoSQL systems are distributed, non-relational databases designed for large-scale data storage and massively parallel data processing across a large number of commodity servers.
NoSQL Database
• They also use non-SQL languages and mechanisms to interact with data.
• NoSQL database systems arose alongside major Internet companies such as Google, Amazon, and Facebook, which faced challenges in dealing with huge quantities of data.
• These systems are designed to scale to thousands or millions of users performing updates as well as reads, in contrast to traditional DBMSs and data warehouses.
Why NoSQL?
• Relational DBMSs have been a successful technology for many years, providing persistence, concurrency control and integration mechanisms.
• The need to process large amounts of data shifts the direction from scaling vertically (a more powerful single server) to scaling horizontally across clusters of servers.
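As a rough illustration of horizontal scaling, the sketch below spreads keys over several nodes by hashing each key. The node names and the `node_for_key` helper are hypothetical, and simple modulo placement is used only for brevity; it is not taken from any particular NoSQL system.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical cluster of three commodity servers.
nodes = ["node-a", "node-b", "node-c"]

# Each key is assigned to exactly one node; adding nodes adds capacity.
placement = {
    key: node_for_key(key, nodes)
    for key in ["user:1", "user:2", "user:3", "user:4"]
}
```

Note that with modulo placement most keys move when the node count changes, which is one reason production systems tend to use more elaborate schemes.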
Why NoSQL?
• NoSQL databases focus on analytical processing of large-scale datasets, offering increased scalability on commodity hardware.
• Organizations that collect large amounts of unstructured data are increasingly turning to non-relational databases (NoSQL databases).
Big Data
Characteristics of NoSQL Databases
• Strong Consistency: all clients see the same version of the data.
• High Availability: data is always available; at least one copy of the requested data can be reached even if one of the nodes is down.
• Partition Tolerance: the system as a whole keeps its characteristics even when it is deployed across different servers.
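To make the availability and consistency properties concrete, here is a toy sketch of a store that copies every write to several replicas. The `ReplicatedStore` class is hypothetical and deliberately naive: it has no repair or versioning, so it also shows how a replica that missed a write can serve an older value.

```python
class ReplicatedStore:
    """Toy store that writes to every reachable replica (illustration only)."""

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas = [dict() for _ in range(replica_count)]
        self.up = [True] * replica_count  # which replicas are reachable

    def put(self, key, value) -> None:
        # Only reachable replicas receive the write.
        for index, replica in enumerate(self.replicas):
            if self.up[index]:
                replica[key] = value

    def get(self, key):
        # Answer from the first reachable replica that holds the key.
        for index, replica in enumerate(self.replicas):
            if self.up[index] and key in replica:
                return replica[key]
        raise KeyError(key)


store = ReplicatedStore()
store.put("cart:42", ["book"])

# Availability: replica 0 goes down, but another copy still answers.
store.up[0] = False
available_value = store.get("cart:42")

# A write while replica 0 is down never reaches it.
store.put("cart:42", ["book", "pen"])

# Replica 0 comes back and answers first with its older copy,
# so clients no longer all see the same version.
store.up[0] = True
stale_value = store.get("cart:42")
```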
Primary Uses of NoSQL Database
1. Large-scale data processing
2. Exploratory analytics on semi-structured data (expert level)
3. Large volume data storage.
Classification of NoSQL Databases
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases
Key-Value Databases
• These data management systems store items under alphanumeric identifiers (keys). Each key has an associated value.
• The values can be simple text strings or more complex lists and sets.
• Searches are performed only against keys and are limited to exact matches.
• Searches cannot be performed against values.
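The access pattern described above can be sketched with a plain dictionary. The `KeyValueStore` class and the sample keys are hypothetical and only illustrate exact-match lookup by key.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: lookups are exact matches on the key."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, value) -> None:
        self._items[key] = value

    def get(self, key: str, default=None):
        # No search on values and no partial key matching is offered.
        return self._items.get(key, default)


store = KeyValueStore()
store.put("product:1001", "Espresso machine")               # simple text value
store.put("cart:alice", ["product:1001", "product:2002"])   # list value

name = store.get("product:1001")   # exact key: found
missing = store.get("product:10")  # prefix of a key: not found
```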
Key-Value Characteristics
• The simplicity of key-value stores makes them very quick and light.
• They offer highly scalable retrieval of the values needed for application tasks, such as retrieving product names.
• This is why Amazon uses a key-value system, Dynamo, for its shopping cart. Dynamo is a highly available key-value storage system.
• Examples: Dynamo (Amazon), Voldemort (LinkedIn), Redis, BerkeleyDB, Riak
Pros and Cons
• Pros: anything can be stored in an aggregate.
• Cons: the only access path is a key lookup that returns the entire aggregate (no querying and no retrieval of part of an aggregate).
Document Database
• Designed to manage and store documents.
• These documents are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation), or BSON (Binary JSON).
Primary Uses
• Document databases are good for storing and managing Big Data-size collections of literal documents, such as text documents and email messages.
Pros and Cons
• Pros: allows structured queries and partial retrieval of an aggregate based on the fields in the aggregate.
• Cons: the database imposes limits on what can be placed in it.
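The two advantages listed above, querying by field and retrieving only part of a document, can be sketched over JSON-encoded documents. The sample documents and the `find` and `project` helpers are hypothetical; real document databases provide their own query languages.

```python
import json

# Hypothetical documents, stored as JSON text.
raw_documents = [
    '{"_id": 1, "type": "email", "sender": "a@example.com", "subject": "Hello"}',
    '{"_id": 2, "type": "note", "title": "Shopping list", "items": ["milk"]}',
    '{"_id": 3, "type": "email", "sender": "b@example.com", "subject": "Report"}',
]
documents = [json.loads(text) for text in raw_documents]


def find(docs, **criteria):
    """Structured query: return documents whose fields equal the criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def project(doc, fields):
    """Partial retrieval: keep only the requested fields of a document."""
    return {field: doc[field] for field in fields if field in doc}


emails = find(documents, type="email")
subjects = [project(doc, ["subject"]) for doc in emails]
```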
Column-Family Databases
• A column-family database consists of key-value pairs in which each value is a set of columns.
• Column-family databases are represented as tables, with each key-value pair forming a row.
• All related data can be grouped into one family.
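One way to picture this layout is as nested dictionaries: row key, then column family, then columns. The row keys, family names, and helper functions below are hypothetical and only mirror the structure described above.

```python
# Each row key maps to column families; each family groups related columns.
rows = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Detroit"},
        "activity": {"last_login": "2014-04-10"},
    },
    "user:2": {
        # Rows do not need to have the same columns.
        "profile": {"name": "Bob"},
        "activity": {"last_login": "2014-04-09", "logins": 12},
    },
}


def get_family(row_key: str, family: str) -> dict:
    """Fetch all columns of one family for a row."""
    return rows[row_key][family]


def get_column(row_key: str, family: str, column: str, default=None):
    """Fetch a single column value, or a default if the row lacks it."""
    return rows[row_key][family].get(column, default)


alice_profile = get_family("user:1", "profile")
bob_city = get_column("user:2", "profile", "city")
```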
Primary Uses
1. Large-scale, batch-oriented data processing: sorting, parsing, and conversion (for example, conversions between hexadecimal, binary, and decimal code values).
2. Exploratory and predictive analytics performed by expert statisticians and programmers.
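As a small illustration of the conversion work mentioned in item 1, the sketch below converts a batch of hexadecimal strings to decimal and binary. The input values and the `convert` helper are hypothetical.

```python
def convert(value: int) -> dict[str, str]:
    """Render one integer as decimal, binary, and hexadecimal text."""
    return {
        "decimal": str(value),
        "binary": format(value, "b"),
        "hexadecimal": format(value, "x"),
    }


# A batch of hexadecimal code values, parsed and converted in one pass.
hex_inputs = ["ff", "10", "7b"]
batch = [convert(int(text, 16)) for text in hex_inputs]
```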
Column-Family
Graph Databases
• Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings.
• Graph databases are useful when you are more interested in the relationships between data than in the data itself, which makes them a good fit for social networks.
• They are optimized for traversing relationships, not for general querying.
• Examples: Neo4j, InfoGrid, Sones GraphDB, AllegroGraph, InfiniteGraph
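Relationship traversal can be sketched with an adjacency map and a breadth-first search. The small friendship graph and the `within_hops` helper are hypothetical and do not reflect the API of any of the products listed above.

```python
from collections import deque

# Hypothetical social graph: each person maps to their direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return everyone reachable from start in at most max_hops steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen - {start}


# Friends and friends-of-friends of alice.
network = within_hops(friends, "alice", 2)
```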
Adoption of NoSQL Database
• Organizations with massive data storage needs are looking seriously at NoSQL.
• NoSQL database experts are in high demand at most developing organizations.
• The next graph shows job trends for five NoSQL databases, from Indeed.com.
Job Trends of Five NoSQL Databases
Adoption of NoSQL Database
• MongoDB's growth means that it has cemented its place as the most popular NoSQL database.
• According to LinkedIn profile mentions, NoSQL technologies account for 45% of mentions in LinkedIn profiles.
LinkedIn statistics
Conclusion
• The computational and storage requirements of applications such as Big Data analytics, Business Intelligence, and social networking over petabyte datasets drove the shift from SQL to NoSQL databases.
• This led to the development of horizontally scalable, distributed, non-relational NoSQL databases.
• MongoDB is the most in-demand of them.
Resources
• http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
• http://en.wikipedia.org/wiki/Column_family
• http://en.wikipedia.org/wiki/NoSQL