Upload
others
View
54
Download
0
Embed Size (px)
Citation preview
What is NoSQL?
NoSQL is a non-relational DMS, that does not require a fixed schema, avoids joins, and is easy
to scale. NoSQL database is used for distributed data stores with humongous data storage needs. NoSQL is used for Big data and real-time web apps. For example, companies like Twitter, Facebook, Google that collect terabytes of user data every single day.
NoSQL database stands for "Not Only SQL" or "Not SQL." Though a better term would NoREL NoSQL caught on. Carl Strozz introduced the NoSQL concept in 1998.
Traditional RDBMS uses SQL syntax to store and retrieve data for further insights. Instead, a NoSQL database system encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.
In this tutorial, you will learn-
What is NoSQL? Why NoSQL? Brief History of NoSQL Databases
Features of NoSQL Types of NoSQL Databases
Query Mechanism tools for NoSQL What is the CAP Theorem? Eventual Consistency
Advantages of NoSQL
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data. The system response time becomes slow when you use RDBMS for massive volumes of data.
To resolve this problem, we could "scale up" our systems by upgrading our existing hardware.
This process is expensive.
The alternative for this issue is to distribute database load on multiple hosts whenever the load increases. This method is known as "scaling out."
NoSQL database is non-relational, so it scales out better than relational databases as they are designed with web applications in mind.
Brief History of NoSQL Databases
1998- Carlo Strozzi use the term NoSQL for his lightweight, open-source relational
database 2000- Graph database Neo4j is launched
2004- Google BigTable is launched 2005- CouchDB is launched 2007- The research paper on Amazon Dynamo is released
2008- Facebooks open sources the Cassandra project 2009- The term NoSQL was reintroduced
Features of NoSQL
Non-relational
NoSQL databases never follow the relational model Never provide tables with flat fixed-column records
Work with self-contained aggregates or BLOBs Doesn't require object-relational mapping and data normalization
No complex features like query languages, query planners,
referential integrity joins, ACID
Schema-free
NoSQL databases are either schema-free or have relaxed schemas Do not require any sort of definition of the schema of the data
Offers heterogeneous structures of data in the same domain
NoSQL is Schema-Free
Simple API
Offers easy to use interfaces for storage and querying data provided APIs allow low-level data manipulation & selection methods Text-based protocols mostly used with HTTP REST with JSON
Mostly used no standard based query language Web-enabled databases running as internet-facing services
Distributed
Multiple NoSQL databases can be executed in a distributed fashion
Offers auto-scaling and fail-over capabilities
Often ACID concept can be sacrificed for scalability and throughput Mostly no synchronous replication between distributed nodes Asynchronous Multi-
Master Replication, peer-to-peer, HDFS Replication Only providing eventual consistency
Shared Nothing Architecture. This enables less coordination and higher distribution.
NoSQL is Shared Nothing.
Types of NoSQL Databases
There are mainly four categories of NoSQL databases. Each of these categories has its unique
attributes and limitations. No specific database is better to solve all problems. You should select a database based on your product needs.
Let see all of them:
Key-value Pair Based
Column-oriented Graph Graphs based Document-oriented
Key Value Pair Based
Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy load.
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON, BLOB(Binary Large Objects), string, etc.
For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99".
It is one of the most basic types of NoSQL databases. This kind of NoSQL database is used as a collection, dictionaries, associative arrays, etc. Key value stores help the developer to store
schema-less data. They work best for shopping cart contents.
Redis, Dynamo, Riak are some examples of key-value store DataBases. They are all based on Amazon's Dynamo paper.
Column-based
Column-oriented databases work on columns and are based on BigTable paper by Google. Every column is treated separately. Values of single column databases are stored contiguously.
Column based NoSQL database
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, Library card catalogs,
HBase, Cassandra, HBase, Hypertable are examples of column based database.
Document-Oriented:
Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part is stored as a document. The document is stored in JSON or XML formats. The value is understood
by the DB and can be queried.
Relational Vs. Document
In this diagram on your left you can see we have rows and columns, and in the right, we have a document database which has a similar structure to JSON. Now for the relational database, you
have to know what columns you have and so on. However, for a document database, you have data store like JSON object. You do not require to define which make it flexible.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics & e-
commerce applications. It should not use for complex transactions which require multiple operations or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak, Lotus Notes, MongoDB, are popular
Document originated DBMS systems.
Graph-Based
A graph type database stores entities as well the relations amongst those entities. The entity is stored as a node with the relationship as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
Compared to a relational database where tables are loosely connected, a Graph database is a
multi-relational in nature. Traversing relationship is fast as they are already captured into the DB, and there is no need to calculate them.
Graph base database mostly used for social networks, logistics, spatial data.
Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases.
Query Mechanism tools for NoSQL
The most common data retrieval mechanism is the REST-based retrieval of a value based on its
key/ID with GET resource
Document store Database offers more difficult queries as they understand the value in a key-value pair. For example, CouchDB allows defining views with MapReduce
What is the CAP Theorem?
CAP theorem is also called brewer's theorem. It states that is impossible for a distributed data
store to offer more than two out of three guarantees
1. Consistency 2. Availability
3. Partition Tolerance
Consistency:
The data should remain consistent even after the execution of an operation. This means once data is written, any future read request should contain that data. For example, after updating the order
status, all the clients should be able to see the same data.
Availability:
The database should always be available and responsive. It should not have any downtime.
Partition Tolerance:
Partition Tolerance means that the system should continue to function even if the communication among the servers is not stable. For example, the servers can be partitioned into multiple groups
which may not communicate with each other. Here, if part of the database is unavailable, other parts are always unaffected.
Eventual Consistency
The term "eventual consistency" means to have copies of data on multiple machines to get high
availability and scalability. Thus, changes made to any data item on one machine has to be propagated to other replicas.
Data replication may not be instantaneous as some copies will be updated immediately while
others in due course of time. These copies may be mutually, but in due course of time, they become consistent. Hence, the name eventual consistency.
BASE: Basically Available, Soft state, Eventual consistency
Basically, available means DB is available all the time as per CAP theorem Soft state means even without an input; the system state may change
Eventual consistency means that the system will become consistent over time
Advantages of NoSQL
Can be used as Primary or Analytic Data Source Big Data Capability
No Single Point of Failure Easy Replication No Need for Separate Caching Layer
It provides fast performance and horizontal scalability. Can handle structured, semi-structured, and unstructured data with equal effect
Object-oriented programming which is easy to use and flexible NoSQL databases don't need a dedicated high-performance server Support Key Developer Languages and Platforms
Simple to implement than using RDBMS It can serve as the primary data source for online applications.
Handles big data which manages data velocity, variety, volume, and complexity Excels at distributed database and multi-data center operations Eliminates the need for a specific caching layer to store data
Offers a flexible schema design which can easily be altered without downtime or service disruption
Disadvantages of NoSQL
No standardization rules Limited query capabilities
RDBMS databases and tools are comparatively mature It does not offer any traditional database capabilities, like consistency when multiple
transactions are performed simultaneously. When the volume of data increases it is difficult to maintain unique values as keys
become difficult
Doesn't work as well with relational data The learning curve is stiff for new developers
Open source options so not so popular for enterprises.
Summary
NoSQL is a non-relational DMS, that does not require a fixed schema, avoids joins, and is easy to scale
The concept of NoSQL databases beccame popular with Internet giants like Google, Facebook, Amazon, etc. who deal with huge volumes of data
In the year 1998- Carlo Strozzi use the term NoSQL for his lightweight, open-source relational database
NoSQL databases never follow the relational model it is either schema-free or has
relaxed schemas Four types of NoSQL Database are 1).Key-value Pair Based 2).Column-oriented Graph
3). Graphs based 4).Document-oriented NOSQL can handle structured, semi-structured, and unstructured data with equal effect CAP theorem consists of three words Consistency, Availability, and Partition Tolerance
BASE stands for Basically Available, Soft state, Eventual consistency The term "eventual consistency" means to have copies of data on multiple machines to
get high availability and scalability
NOSQL offer limited query capabilities
Introduction to NoSQL Database
NoSQL, known as Not only SQL database, provides a mechanism for storage and retrieval of
data and is the next generation database . It has a distributed architecture with MongoDB and is open source. Most of the NoSQL are open source and it has a capability of horizontal scalability
which means that commodity kind of machines could be added
The capacity of your clusters can be increased. It is schema free and there is no requirement to design the tables and pushing the data to it. NoSQL provides easy replication claiming there are very less manual interventions in this. Once the replication is done, the system will automatically
take care of fail overs.
The crucial factor about NoSQL is that it can handle huge amount of data and can achieve performance by adding more machines to your clusters and can be implemented on commodity
hardware. There are close to 150 NoSQL databases in the market which will make it difficult to choose to choose the right pick for your system.
Why NoSQL?
There are different kinds of data that are structured, unstructured and semi-structured and hence RDBMS systems are not designed to manage these types of data in an effecient way. NoSql databases comes in to the picture and are capable to manage it . People today are use different
kinds of methodology and if you talk about the code velocity and implementation level, people wish to execute various activities at the same time. Talking about the traditional way, it was quite
difficult in terms of taking care of the database and writing your application according to the database. So with the new NoSQL, things have become easier.
Benefits of NoSQL
NoSQL claims various benefits associated with its usage. From managing data to scalability, it
use has increased drastically. Huge amount of data can be managed and then streamed. Different kinds of data like structured and semi structured data can be analyzed and analytical activities can be performed.
If you’re not able to drive value out of any data, you can capture this kind of information. It’s object oriented programming is easy to use and flexible. It provides horizontal scaling instead of expensive hardware. If you want to scale vertically, then additional CPU’s and RAMs have to be
purchased. But here horizontal scaling can be used with the help of commodity hardware.
NoSQL Database Categories
Document Database– It pairs each key with a complex data structure known as document. It can contain many different key value pairs, or key array pairs or even nested documents
Key value stores– They are the simplest NoSQL databases. Every single item in the database is
stored as an attribute name or key together with its value.
Graph store– They are used to store information about networks, such as social connections.Graph stores include Neo4J and HyperGraphDB.
Wide column stores- Wide column stores such as Cassandra and HBase are optimized for
queries over large datasets, and store columns of data together, instead of rows.
NoSQL Vs SQL Comparison
ACID Property
ACID is a set of properties that guarantees that are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
Atomicity- Atomicity defines transaction as a logical unit of work which must be either
completed with all of its data modifications or none of them is performed.
Consistency- At the very end of the transaction, all data must be left in a consistent state.
Isolation- Modifications of data performed by a transaction must be independent of another transaction. Unless this happens, the outcome of a transaction may be erroneous.
Durability- When the transaction is completed, effects of the modifications performed by the transaction must be permanent in the system.
What is NoSQL?
Carl Strozzi introduced the term NoSQL in 1998 for his open source file-based databases.
Traditionally, SQL or relational databases were used to store or retrieve data for future insights while NoSQL database encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured, or polymorphic data together.
NoSQL or “Not SQL” is a new set of databases emerged in the recent past as an alternative to relational databases.
it is the non-relational data management system that does not require a fixed schema, it is easy to scale, and it avoids joins.
It is used for big data, distributed data stores, and real-time web apps. For example, Companies like Facebook, Twitter, or Google collect terabytes of data almost every day.
Why choose NoSQL Databases?
The increased use of social media has grown user-driven data rapidly that needs to be managed,
analyzed, and archived properly. Additionally, other data sources like GPS, sensors, automated trackers, and monitoring systems also produce a huge amount of data regularly.
The huge data set has introduced the challenges of data storage, data management, data analysis, etc. Moreover, it becomes semi-structured and sparse. In the case of RDBMS, there is a need for upfront schema and relational references.
To resolve these problems related to semi-structured or unstructured data, a range of new database products has emerged during the last few years. The new class of database products consists of column-based data stores, key-value pair databases, and document databases, etc.
When used together, these databases are called NoSQL consist of diverse products each having a unique set of features and propositions.
Other than this, NoSQL databases can be scaled out easily when compared to SQL databases.
The load is distributed among multiple hosts as shown below whenever load increases.
In the next section, there is a detailed idea of how SQL and NoSQL databases are different from each other.
RDBMS NOSQL
Data is stored in a relational model, with rows and columns.
A row contains information about an item while columns contain specific information, such as 'Model', 'Date of Manufacture', 'Color'.
Follows fixed schema. Meaning, the columns are defined and locked before data entry. In addition, each row contains data for each column.
Supports vertical scaling. Scaling an RDBMS across multiple servers is a challenging and time-consuming process
Atomicity, Consistency, Isolation & Durability(ACID) Compliant
Data is stored in a host of different databases, with different data storage models.
Follows dynamic schemas. Meaning, you can add columns anytime
Supports horizontal scaling. You can scale across multiple servers. Multiple servers are cheap commodity hardware or cloud instances, which make scaling cost-effective compared to vertical scaling
Not ACID Compliant
A quick history of NoSQL Database
1998 - Carl Strozzi introduced the term NoSQL in 1998 for his open source file-based databases. 2000 – Graph database was launched. 2004 – Google Big Table was launched. 2005 – CouchDB was launched. 2007 – Amazon Dynamo was launched. 2008 – Facebook open source Cassandra projects were proposed. 2009 – NoSQL databases were introduced again.
Features of NoSQL database
Non-Relational
NoSQL databases never follow the relational database models. It never provides a table with fixed column records. It works with self-contained aggregates. It does not require any data normalization or object-relational mapping. There are no complex features like relational databases.
Distributed Computing
It is possible to execute multiple databases in a distributed manner. It offers fail-over capabilities and auto-scaling features. It can be used with all programming language, or there is no standard query to use with NoSQL
databases. It provides eventual consistency. It provides no synchronous replication among distributed nodes. It enables maximum distribution and less coordination among data nodes.
Schema-Free
NoSQL databases either have relaxed schemas or it is schema-free. It does not require any definition for the schema of the data. It offers heterogeneous data structures for the same domains.
Simple API
It offers easy to use interface for data storage and data query. It allows low-level selection methods and low-level data manipulation. It uses text-based protocols with HTTP and JSON. It has no standard-based query language. It is a web-enabled database running as internet-facing services.
NoSQL Database Types you should know
There are four classes of NoSQL databases with their unique attributes and limitations. You should understand each of them in depth first and choose the best one that suits your
requirements the most. Let us see each of them one by one.
1). Key-value pair based
This database is designed to manage heavy loads and a lot of data gracefully. It stores data in
key-value pairs where each key is unique and value can be anything like object, string JSON, etc. Here is one quick example of the database given below.
It is the most basic type of database that can be used as collections, arrays, dictionaries, etc. It helps developers to
store the schema-less data. It works best for shopping cart content.
Read: How to Create Table in SQL Server by SQL Query?
2). Column-based NoSQL Database
This database is column-oriented where each column is treated separately, and values are stored contiguously. Here is the simple example of how column-based NoSQL database looks like:
It works best for aggregation queries like SUM, Count, MAX, MIN, AVG, etc. It helps to find data quickly in columns. This
database is majorly used for managing catalogs, data warehouse, BI projects, CRM, or library, etc.
3). Document-Oriented NoSQL Database
It stores and retrieves the data as key-value pair, but the value is stored in documents in XML or JSON formats. A database itself understands or queries the data. In the diagram, you can see a table where data is stored in row and column format. And the right-hand side is covered by
documents where data is stored in JSON format. Here, you don’t have to define columns which makes it more flexible as compared to relational databases.
It is mostly used for blogging platforms, CRM systems, or real-time analytics, etc. it is used for complex transactions that require multiple operations against varying aggregate functions.
4). Graph-based NoSQL Databases
This database stores the entity and defines the relationship among different entities. The stored entity is named as the node, and the relationship is defined as the edge. Each node and edge must have a unique
identifier. Here, tables are multi-relational in nature, not loosely connected. Traversing relationship is much faster in NoSQL databases when compared to relational databases. It is
mostly used for logistics, networks, and spatial data.
Query Mechanism tools for NoSQL Database
The data retrieval mechanism in NoSQL database is REST-based the value is retrieved based on key/ID with the GET resource. Document stores the most difficult queries as they use the key-
value pair to store the data. For example, Couch DB define views with the MapReduce.
What is the CAP Theorem?
This theorem is given by the Brewer which states that it is not possible for distributed data stores to give more than two out of total three guarantees. These are consistency, partition tolerance,
and availability.
1). Consistency:
The data should remain consistent even after the execution of an operation, it means once data is written, any future read request should be able to access the same data. For example, once you
update the status of an order, the client should be able to check the same data.
2). Partition Tolerance:
If communication among servers is not stable even then the system should be able to work properly, it is called the partition tolerance. For example, when the server is divided into multiple
partitions, they may or may not communicate together. If one part of the database is unavailable even then other parts should not be affected.
3). Availability:
The database should be highly responsive and available without any downtime.
Read: SQL Server Recovery Models-Simple, Full and Bulk Log
Eventual Consistency
The term eventual consistency means multiple data copies are available on different machines to get higher availability and scalability. If some changes are made to one file, it should automatically be reflected other replicas. Data replication is not instantaneous because a few
copies are updated frequently and a few over time. But you have to make sure content is the same for all replicas. Hence, the name of this phenomenon is given as eventual consistency.
BASE: Basically Available, Soft state, Eventual consistency
Here basically available means DB is available all the time as per the assumption of CAP theorem.
Soft state means the state of a system may change without an input. Eventual consistency means system become consistent over time.
Advantages of NoSQL Database
The desired technical characteristics for NoSQL database are given as below.
A). Primary and Analytic Data Source Capability
The first criteria for any NoSQL solution are that it must serve as the primary or active data source that receives data from different business apps. It should act as the secondary data source or analytical database to enhance the overall functionalities of business apps. Further, it should
be capable of integrating with different types of data like structured, semi-structured, or unstructured. Additionally, it can execute complex queries too.
B). Big Data Capability
NoSQL databases are good with Big data, and they can be scaled quickly to manage voluminous
data from terabytes to petabytes. Additionally, it delivers high performance for data velocity, data complexity, and the data variety.
C). Continuous Availability
NoSQL database is always available without any single point of failure. All nodes in the cluster can read request even if some machine is down. It can replicate data among different physical
machines within a data center. It avoids hardware outages too.
D). Multi-Data Center Capability
Business enterprises need highly distributed databases that are spread across multiple data centers or graphical locales without any performance issues. The solution includes the ability to
handle multiple data centers without concerning the overall occurrences of read and write operations. A good NoSQL database supports multiple data centers and provides configuration
options to maintain a proper balance between consistency and performance.
Read: Top 20 SSAS Interview Questions and Answers For Freshers, Experienced
E). Separate Cache layer is not required
A good NoSQL database uses and distributes data among different participating nodes. It does not have a separate cache layer to store the data. The memory cache of multiple participating
nodes stores data quickly for immediate I/O access. It eliminates the problems of synchronizing cache data with the persistent database. In this way, it supports higher scalability with fewer
management issues.
F). Cloud-Ready
The adoption of cloud platforms is increasing daily by leading enterprises worldwide. This is the reason why every robust platform must be cloud-ready. NoSQL databases like MongoDB are cloud-ready able to work in a cloud setting when necessary. It supports the hybrid solution when
one part of the database is hosted within the enterprise, and another part is hosted in the cloud.
G). High Performance and Scalability
NoSQL databases can enhance performance by adding multiple nodes to the cluster. Usually, the performance of a database system goes down with additional nodes to a cluster. However, a good
NoSQL database increases performance for both read and write operations when new nodes are added, and performance gains are linear in nature. Here, we have listed the major benefits of the
NoSQL database but there a few more as discussed by enterprises like easy to implement, easy to use, supports multiple languages & platforms, thriven open source community, etc.
Summary
The concept of NoSQL databases became popular with internet giants like Google, Amazon,
Facebook, etc. who produce voluminous data daily. It is schema-free, avoids joins, and easy to scale when required.
The different types of NoSQL database can handle structured, semi-structured and unstructured data properly with equal effect. It makes any database highly available, consistent without a
single point of failure.
Looking at multiple benefits and features of NoSQL databases, it is clear that they are certainly better than SQL or relational databases or more demanded by enterprises recently. To learn more
about NoSQL database, join our SQL certification program and become a database master now.