WELCOME TO NOSQL

Ahmed Abdel-Aziz [email protected]

2016 EMC Proven Professional Knowledge Sharing 2

Table of Contents

1. INTRODUCTION
2. NOSQL BACKGROUND
  2.1. NOSQL MAIN CATEGORIES
    2.1.1 Key-Value Stored NoSQL
    2.1.2 Document-Stored NoSQL
    2.1.3 Wide-Column Stored NoSQL
    2.1.4 Graph-Oriented NoSQL
    2.1.5 Choosing a NoSQL Category
  2.2. HADOOP AND NOSQL RELATIONSHIP
  2.3. BIG DATA ANALYTIC PLATFORM SUPPORT FOR NOSQL
    2.3.1. Datameer
    2.3.2. DataStax
    2.3.3. Karmasphere
    2.3.4. Solr
    2.3.5. RapidMiner
    2.3.6. R
    2.3.7. Pivotal HD
3. CONCLUSION
4. REFERENCES

Disclaimer: The views, processes or methodologies published in this article are those of the authors.

They do not necessarily reflect Dell EMC’s views, processes or methodologies.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.

1. Introduction

Welcome to NoSQL (Not Only SQL) technology! This research summarizes:

The four main categories of NoSQL databases and when to use each type

The relationship between Hadoop and NoSQL technologies

Big Data analytic platform support for NoSQL

The article starts with a brief background about NoSQL technology. This is followed by a discussion

on the four main types of NoSQL databases: Key-value Stored, Document Stored, Wide-column

Stored, and Graph Oriented. While there is a difference between NoSQL technology and Hadoop,

there is an important relationship binding the two and this relationship will be addressed. Finally,

the research will examine the current state of play with regard to Big Data analytic platforms on

NoSQL databases.

2. NoSQL Background

The term NoSQL was first used in 1998 as a name of a database. The term gained significant

awareness in 2009 according to Google Trends [7].

Figure 1: Google Trend Result for the term NoSQL

Many believe that NoRelational would have been a better name than NoSQL, since it better describes the non-relational, flexible schema that characterizes NoSQL databases. According to the list of NoSQL databases billed as the “ultimate guide to the non-relational universe” [2], there are currently about 150 NoSQL databases. The huge volume (petabyte scale) and increasing variety (structured and unstructured) of data posed significant challenges for classic relational database management systems (RDBMS). These challenges led to the creation of alternative database management systems (DBMS) such as NoSQL databases. NoSQL systems are distributed, non-relational databases designed for large-scale data storage and for massively parallel data processing across a large number of commodity servers [8].

According to the Couchbase survey conducted in 2012 [9], the two main drivers for adopting NoSQL databases relate to Variety and Volume. This supports the earlier statement that people turn to NoSQL databases rather than RDBMS to solve their Big Data challenges.

Figure 2: Couchbase Survey Results

2.1. NoSQL Main Categories

NoSQL databases can be classified into four basic categories, each appropriate to different kinds of

tasks. The four main categories are:

1) Key-value Stored NoSQL

2) Document Stored NoSQL

3) Wide-Column Stored NoSQL

4) Graph-Oriented NoSQL

2.1.1 Key-Value Stored NoSQL

These DBMS store items as keys and values. The key is an alpha-numeric identifier, while the value

may be a simple text string or more complex lists and sets. Data searches are usually performed

against keys only, and are limited to exact matches. An example key value store is as follows:

Key   Value
1     Mobile Device Type: Computer
      Model: Toshiba Laptop
      Location: Office
      Expires: 2015
2     Mobile Device Type: Tablet
      Model: Samsung S4
      Location: Home
      Expires: 2014
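
The exact-match lookup described above can be illustrated with a minimal in-memory sketch. It is not modeled on any particular product: the class name KeyValueStore and its put/get methods are hypothetical, and the sample records simply mirror the example table.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store treats the value as opaque; it never inspects its fields.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookups are exact matches on the key only.
        return self._data.get(key)


store = KeyValueStore()
store.put("1", {"Mobile Device Type": "Computer", "Model": "Toshiba Laptop",
                "Location": "Office", "Expires": 2015})
store.put("2", {"Mobile Device Type": "Tablet", "Model": "Samsung S4",
                "Location": "Home", "Expires": 2014})

laptop = store.get("1")    # exact key match returns the stored value
missing = store.get("3")   # an unknown key returns None
```

Note that there is no way to ask this store for "all devices located at Home" without scanning every value, which is the trade-off the key-only search model implies.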

When to Use: This NoSQL database type is ideal for extremely fast, highly scalable retrieval of

values. This can be found in use cases such as user profile management, shopping carts, or any other

use case where extremely low response time is critical.

Examples: Voldemort (LinkedIn), Dynamo (Amazon), Redis, Riak

While Amazon’s Dynamo has significant influence over a number of key-value NoSQL databases,

other players in the field have significant presence such as Voldemort and Redis. Dynamo is

proprietary, while Voldemort (Apache license) and Redis (BSD license) are both open source and

therefore gaining more popularity. Both Voldemort and Redis support compression, while Dynamo

does not [5]. Redis allows matching for key-ranges, such as matching of numeric ranges or regular

expressions, while Dynamo represents “Just a key-value store” [5]. On the other hand, revision

control is a capability that both Dynamo and Voldemort excel at over Redis. Voldemort is special in

that its data replication technique is symmetric (peer-to-peer) versus master-slave replication for

both Dynamo and Redis. Redis supports in-memory operations and is considered by some to be the

most popular key-value store in the cloud [12].

2.1.2 Document-Stored NoSQL

These DBMS are document databases and are inspired by Lotus Notes. They are designed to store

and manage documents which are encoded in a standard data exchange format such as XML, JSON

(JavaScript Object Notation), or BSON (Binary JSON). The value column of these databases is more complex than that of a simple Key-Value store. A single column can host hundreds of attribute/value pairs, and these attributes can change from one row to another. Both the values and the keys are fully searchable in document-stored NoSQL databases.

Figure 3: Document Store NoSQL Database [8]

When to Use: This NoSQL database type is ideal for storing and managing Big Data-size collections

of documents. Examples include text documents, XML documents, and emails. This database type

works well for storing semi-structured data that would require an extensive use of nulls in an RDBMS

for missing or nonexistent values.
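
A rough sketch of the idea, assuming JSON-like documents held in memory: each document carries only the attributes it needs, and a query can match on values as well as keys. The find helper and the sample documents are hypothetical and are not the API of CouchDB or MongoDB.

```python
from typing import Any, Dict, List

# Each document is a JSON-like dict; attributes may differ from one document to the next.
documents: List[Dict[str, Any]] = [
    {"_id": 1, "type": "email", "sender": "alice@example.com", "subject": "Quarterly report"},
    {"_id": 2, "type": "text", "title": "Meeting notes", "tags": ["nosql", "hadoop"]},
    {"_id": 3, "type": "email", "sender": "bob@example.com", "subject": "Lunch", "attachments": 2},
]


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the documents whose fields equal every given criterion.

    A document that lacks a field simply does not match; no null padding is needed.
    """
    return [doc for doc in collection
            if all(key in doc and doc[key] == value for key, value in criteria.items())]


emails = find(documents, type="email")
with_attachments = find(documents, type="email", attachments=2)
```

Only document 3 has an attachments attribute; the other documents do not need a placeholder for it, which is the contrast with the null-filled columns of a relational table.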

Examples: CouchDB (JSON), MongoDB (BSON)

The two main players in this NoSQL database type are CouchDB and MongoDB. Both are open source, with CouchDB following the Apache license and MongoDB following the AGPL license. MongoDB follows the BASE (Basically Available, Soft State, Eventual Consistency) data integrity model, while CouchDB can follow the ACID (Atomicity, Consistency, Isolation, Durability) data integrity model. MongoDB makes it easy to perform full-text search, while a similar search in CouchDB may require a MapReduce query. Replication in MongoDB is master-slave, while in CouchDB it is peer-to-peer, which may be valuable in some scenarios. Sharding in MongoDB is more advanced than in CouchDB, which has no built-in sharding mechanism yet, although several projects provide sharding support for CouchDB [7]. The ease of use and the documentation available for MongoDB are better than for CouchDB [11]. According to recent research on LinkedIn profiles, MongoDB is in higher demand than CouchDB. In fact, MongoDB is considered the most popular NoSQL database in terms of LinkedIn profile mentions (see Figure 4).

Figure 4: NoSQL LinkedIn Skills Index [13]

2.1.3 Wide-Column Stored NoSQL

These DBMS are referred to as Wide-Column or Column-Family (WC/CF) stores. This type of database management system is similar to document databases in that it uses a distributed, column-oriented data structure with multiple attributes per key. Some of these Wide-Column stores take the form of a Key-Value store, such as the popular Cassandra. The majority of these databases, however, follow Google’s Bigtable, which was developed by Google to be a petabyte-scale data storage system for its search index. Google not only developed this database, but also developed a distributed file system called GFS, as well as a MapReduce parallel processing framework. Similarly, Hadoop Core consists of the Hadoop file system (HDFS) and MapReduce. The Hadoop ecosystem expands to include HBase, which is a Bigtable-style database [8].

Row   Super Column Families: Electronics
100   Super Column: Device Type
        Model: Toshiba Laptop
        Weight: 2Kg
        Dimensions: 30cm x 20cm x 3cm
      Super Column: Manufacturer
        Name: Toshiba Corporation
        Country: Japan
        City: Tokyo
101   Super Column: Device Type
        Model: Television
        Size: 40 inch
        Type: Plasma
      Super Column: Manufacturer
        Name: Samsung
        Country: Korea
        Zip: 1135345

Figure 5: Wide-Column Store NoSQL Database

When to Use: This NoSQL database type is ideal for distributed data storage that is versioned due to

the availability of Wide-Column time-stamping functions. Also, large scale batch-oriented data

processing such as sorting and parsing works well for this database type.
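
The layout in Figure 5, combined with the time-stamping mentioned above, can be sketched as nested dictionaries in which every cell keeps a list of timestamped versions. This is only an illustration under simplifying assumptions: the put and get helpers are hypothetical, timestamps are plain integers, and the second Weight value is an invented update used to show versioning.

```python
from typing import Any, Dict, List, Optional, Tuple

# row key -> super column -> column -> list of (timestamp, value) versions
Table = Dict[str, Dict[str, Dict[str, List[Tuple[int, Any]]]]]

table: Table = {}


def put(row: str, super_col: str, col: str, value: Any, ts: int) -> None:
    """Add a timestamped version of a cell, keeping versions ordered by timestamp."""
    versions = table.setdefault(row, {}).setdefault(super_col, {}).setdefault(col, [])
    versions.append((ts, value))
    versions.sort(key=lambda version: version[0])


def get(row: str, super_col: str, col: str, as_of: Optional[int] = None) -> Optional[Any]:
    """Return the newest value, or the newest value written at or before `as_of`."""
    versions = table.get(row, {}).get(super_col, {}).get(col, [])
    candidates = [v for v in versions if as_of is None or v[0] <= as_of]
    return candidates[-1][1] if candidates else None


put("100", "Device Type", "Model", "Toshiba Laptop", ts=1)
put("100", "Device Type", "Weight", "2Kg", ts=1)
put("100", "Device Type", "Weight", "1.8Kg", ts=2)   # invented later update
put("101", "Device Type", "Size", "40 inch", ts=1)

latest_weight = get("100", "Device Type", "Weight")
original_weight = get("100", "Device Type", "Weight", as_of=1)
absent = get("101", "Device Type", "Weight")          # row 101 has no Weight column
```

Rows 100 and 101 hold different columns under the same super column, and an older version of a cell stays readable after it is overwritten.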

Examples: Bigtable, HBase, Cassandra

The three main players in this NoSQL database type are Bigtable, HBase, and Cassandra. Bigtable is proprietary to Google, while both HBase and Cassandra are open source under the Apache license. Bigtable uses the GFS distributed file system for data storage, HBase uses the Hadoop distributed file system for storage, and Cassandra has its own file system. Cassandra has a special query language, the Cassandra Query Language (CQL), and also supports API calls for queries. HBase

is queried through API calls or REST, and Bigtable is queried through APIs. All three support MapReduce. For the integrity model, Bigtable uses multi-version concurrency control (MVCC), HBase uses log replication, and Cassandra uses basically available, soft state, eventual consistency (BASE). Bigtable supports full-text search, while HBase and Cassandra do not. The maximum value size for HBase is much higher than for Cassandra (2TB vs 2GB). Bigtable is based on C/C++, while both HBase and Cassandra are based on Java. Since Bigtable is proprietary, Cassandra and HBase are in wider use. According to recent research on LinkedIn profiles, Cassandra is in higher demand than HBase [13]. This conclusion is also supported by another survey, shown below.

Figure 6: Wide-Column Store NoSQL Database Rankings [20]

2.1.4 Graph-Oriented NoSQL

These DBMS emerged to replace relational tables with structured relational graphs of interconnected key-value pairings. This database type is unique because it is the only one of the four types that represents relations visually. This visual representation of information makes graph databases more intuitive to people than any of the other NoSQL database types. This database type often seems to be overlooked by specialists in the field when analyzing NoSQL databases [20].

Figure 7: Graph-Oriented Store NoSQL Database [29]

When to Use: This NoSQL database type is ideal for exploring relationships between data, rather than exploring the data itself. Traversing and representing social networks is one example. This database type is optimized for relationship traversal, not for querying. If the use case is more about querying values, it may be better to use a search-based DBMS instead. Perhaps that is what led LinkedIn to use Voldemort, and Facebook to use Cassandra, as their database instead of a Graph-Oriented store.

Examples: Neo4j, AllegroGraph

The main player in this NoSQL database type is Neo4j. Its data storage is mainly volatile memory, and it does not support MapReduce. Neo4j is based on the ACID (Atomicity, Consistency, Isolation, Durability) integrity model. Full-text search is supported, as is graph search. Neo4j is based on Java [8].
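
To make "relationship traversal" concrete, the following sketch walks a small social graph breadth-first and reports who is reachable within a given number of hops. It is a plain Python illustration, not Neo4j’s query interface; the within_hops function and the names in the graph are invented for the example.

```python
from collections import deque
from typing import Dict, List

# Adjacency list: each person maps to the people they are connected to.
graph: Dict[str, List[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Eve"],
    "Eve": ["Dave"],
}


def within_hops(start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first traversal returning each reachable person and their hop distance."""
    distances: Dict[str, int] = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting person is not part of the result
    return distances


friends_of_friends = within_hops("Alice", 2)
```

The traversal follows edges outward from one node and never scans unrelated records, which is the access pattern this database category is built around.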

2.1.5 Choosing a NoSQL Category

The line between the different NoSQL databases is very thin; however, they have some small but

very significant differences. Providing a sorted view of a data set is a typical task for a database. The

document-oriented databases excel compared to other databases when ordering by multiple

attributes is required. All database types are more or less capable of ordering by a single attribute

[7]. Understanding the workload is key to selecting the right NoSQL category for it. In a vendor-

independent comparison of NoSQL databases [3], Cassandra, HBase, MongoDB, and Riak

performance was compared in various workloads. The results of the tests showed that Wide-Column store NoSQL databases (HBase and Cassandra) excelled in write workloads over Document stores (MongoDB). This is a logical result when one understands that Wide-Column stores such as HBase favor consistency over availability by committing writes after a particular number of in-memory HDFS replicas. Document stores, on the other hand, favor availability over consistency and therefore absorb write workloads more slowly than Wide-Column stores. Figure 8 below illustrates the two types of

databases positioned according to the CAP theorem (Consistency, Availability, Partition tolerance). Wide-Column stores prefer CP, while Document stores prefer AP.

Figure 8: Relative Position of the NoSQL Databases in the CAP Theorem [7]

On the other hand, the popular Document store (MongoDB) and Key-Value store databases excelled

in read workloads performance over the Wide-Column store (HBase). To improve the latency and

throughput of NoSQL databases, it is often the case that multiple NoSQL types work together to get

the best of both worlds. For example, a case study explained an architecture that had Redis (Key-

Value Store) using Cassandra (Wide-Column Store) in the backend [27]. The case study explained

how one organization scaled using this architecture to serve 4 billion videos.
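
One common way to combine a fast key-value tier with a slower backing store is to read through the cache and write through to the backend. The sketch below shows that pattern with plain Python objects. It is an assumption-laden illustration and not the architecture of the cited case study: BackingStore and CachedStore are hypothetical classes that merely stand in for the wide-column and key-value roles.

```python
from typing import Dict, Optional


class BackingStore:
    """Stands in for the slower, durable wide-column store."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self.reads = 0  # counts how often the slow path is taken

    def read(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)

    def write(self, key: str, value: str) -> None:
        self._rows[key] = value


class CachedStore:
    """Stands in for an in-memory key-value tier placed in front of the backend."""

    def __init__(self, backend: BackingStore) -> None:
        self._backend = backend
        self._cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:            # cache hit: served from memory
            return self._cache[key]
        value = self._backend.read(key)   # cache miss: fall back to the backend
        if value is not None:
            self._cache[key] = value      # populate the cache for later reads
        return value

    def put(self, key: str, value: str) -> None:
        self._backend.write(key, value)   # write through to the durable store
        self._cache[key] = value


backend = BackingStore()
store = CachedStore(backend)
store.put("video:1", "metadata for video 1")
backend.write("video:2", "metadata for video 2")  # written directly, not yet cached

first = store.get("video:2")    # miss: reads the backend once
second = store.get("video:2")   # hit: served from the cache
```

Repeated reads of the same key touch the backend only once, which is where the latency and throughput gains of a two-tier design come from.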

To help in choosing a specific NoSQL database after deciding which NoSQL category is appropriate,

the following popularity diagrams can help [20].

Figure 9: Graph-Oriented NoSQL Databases Popularity [20]

Figure 10: Document NoSQL Databases Popularity [20]

Figure 11: Key-Value NoSQL Databases Popularity [20]

Figure 12: Wide-Column NoSQL Databases Popularity [20]

2.2. Hadoop and NoSQL Relationship

Data platforms have evolved from traditional RDBMS, to Data Warehouses, to Big Data platforms

such as Hadoop. Hadoop Core consists of the Hadoop filesystem (HDFS), and an open-source

implementation of MapReduce. The Hadoop Filesystem serves as a distributed file system to store

huge amounts of unstructured data. Hadoop Core is an analytic platform that serves batch

processing well. In such batch processing, delays are tolerable and the workload is not real-time. This may be appropriate in some scenarios. Another form of processing is transactional processing, which is characterized by a low-latency requirement that is often near real-time. Hadoop Core alone cannot meet such requirements, which is why NoSQL databases are needed. NoSQL databases interface with the Hadoop platform, which serves as the data storage component of the database. Figure 13 shows a taxonomy of the different data platforms.

Figure 13: Taxonomy of Data Platforms – NoSQL in the Real World [31]

As is evident in Figure 13, Hadoop is located in the analytic section at the top and not in the operational section at the bottom. The Hadoop filesystem (HDFS) is where unstructured data is stored. Hadoop MapReduce is the data processing layer, which takes that unstructured data and derives some structure from it. That structure can be stored in a NoSQL database to support low-latency transactional processing. The most obvious example of Hadoop and NoSQL working together is found in the Hadoop ecosystem, which includes HBase. HBase is a Wide-Column NoSQL database that is integrated with Hadoop Core (HDFS and MapReduce).
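
The flow from unstructured input to a structured, keyed result can be sketched with the three MapReduce phases written as ordinary Python functions. This is a single-process illustration of the programming model, not Hadoop’s API; the log lines and the function names map_phase, shuffle, and reduce_phase are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Made-up unstructured input: free-text log lines.
raw_lines = [
    "2013-11-28 user=alice action=view",
    "2013-11-28 user=bob action=view",
    "2013-11-29 user=alice action=download",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (user, 1) for every line that mentions a user."""
    for token in line.split():
        if token.startswith("user="):
            yield token[len("user="):], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single structured record."""
    return key, sum(values)


mapped = [pair for line in raw_lines for pair in map_phase(line)]
# The reduced output is keyed and structured, so it could be loaded into a NoSQL store.
events_per_user = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

The result is a small key-to-value mapping, which is the kind of derived structure that a low-latency NoSQL database can then serve.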

Figure 14: HBase Architecture [6]

Even with HBase, Hadoop’s transactional database based on columnar storage, the latency is still

based on disk I/O, queries are based on API-level programming, and high availability isn’t mature

[26]. That is why there are other examples of NoSQL with Hadoop such as Hadoop with Redis – the

popular Key-Value Store NoSQL database [25]. Redis can be used as a front end to serve data out of

Hadoop, caching the hot pieces of data in-memory for fast access when they are needed again. This

is achieved by using a Java client called Jedis, which can ingest and retrieve data with Redis. Figure

15 below summarizes the relationship between Hadoop and in-memory NoSQL. A final example of

Hadoop with NoSQL is Hadoop with MongoDB – the popular document NoSQL database. A practical

example of such a scenario exists in the paper titled Performance Evaluation of a MongoDB and

Hadoop Platform for Scientific Data Analysis [4].
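
Keeping only the "hot" pieces of data in memory implies some eviction policy. A minimal sketch, assuming a least-recently-used policy (the source does not say which policy is used) and a hypothetical HotDataCache class rather than Redis or Jedis themselves:

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class HotDataCache:
    """Bounded in-memory cache that evicts the least recently used key."""

    def __init__(self, loader: Callable[[str], Optional[str]], capacity: int) -> None:
        self._loader = loader      # hypothetical function that fetches from the slow store
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key in self._items:
            self._items.move_to_end(key)          # mark as most recently used
            return self._items[key]
        value = self._loader(key)
        if value is not None:
            self._items[key] = value
            if len(self._items) > self._capacity:
                self._items.popitem(last=False)   # evict the least recently used entry
        return value

    def cached_keys(self) -> List[str]:
        return list(self._items)


slow_store = {"a": "1", "b": "2", "c": "3"}   # stands in for data held in Hadoop
cache = HotDataCache(slow_store.get, capacity=2)
cache.get("a")
cache.get("b")
cache.get("a")   # "a" becomes the most recently used key
cache.get("c")   # capacity exceeded, so "b" is evicted
```

With a capacity of two, the cache ends up holding "a" and "c": the key that was touched least recently is the one dropped from memory.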

Figure 15: Hadoop and In-Memory NoSQL Comparison [6]

2.3. Big Data Analytic Platform Support for NoSQL

There are multiple Big Data analytic platforms. This section will summarize the relationship between

some of the main Big Data analytic platforms and NoSQL databases. The following platforms have

been analyzed:

Datameer

DataStax

Karmasphere

Solr

RapidMiner

R

Pivotal HD

2.3.1. Datameer

Datameer is a company that attempts to unify data analytics into a single application. Its main value

proposition is the significant reduction in complexity in terms of data integration, data

transformation, and data visualization. According to Datameer, one typically goes through a three

step process for data analytics involving three different technologies. Datameer simplifies this

complex environment into a single application on top of the powerful Hadoop platform. The NoSQL

support available from Datameer is in integrating data from the Wide-Column Store NoSQL

category. The Datameer analytic platform supports HBase or Cassandra as sources for data

integration [14].

2.3.2. DataStax

DataStax is a company that announced what it describes as the world’s first NoSQL Big Data platform with comprehensive enterprise-grade security features. On February 25th, 2013, the company announced the general availability of its product DataStax Enterprise (DSE) 3, the newest version of DataStax’s Apache Cassandra-based big data platform. DSE 3 is a complete, integrated big data platform that combines a production-certified version of Cassandra with Apache Solr and Apache Hadoop to deliver continuous availability support and performance across multiple data centers. According to DataStax, the product is architected to securely manage real-time (through Cassandra), analytic (through Hadoop), and enterprise search (through Solr) data all in the same database cluster [15].

2.3.3. Karmasphere

Karmasphere is a company that created a product designed for teams of analysts to explore and

analyze Big Data on Hadoop, and to discover business insights about their customers that can be

applied to all points of customer engagement. According to Karmasphere, its product is natively

designed for Hadoop and provides a unified workspace for the Big Data Analytics workflow, making

it possible to transform vast amounts of raw data into business insight spanning data ingestion,

iterative analysis, as well as the visualization and publishing of new insights. Karmasphere itself uses a MySQL database, and its supported databases do not include NoSQL databases [16].

2.3.4. Solr

Solr is an Apache open source project for a search solution. It is unique in that it is similar to a NoSQL database without being one. Solr is most similar to the MongoDB architecture. It includes the following NoSQL features: Realtime-Get, Update Durability, Atomic Compare and Set, Versioning, and Optimistic Locking. Like some NoSQL databases in their implementation of the CAP theorem, it favors Consistency and Partition tolerance, rather than Availability and Partition tolerance [20]. There are several search projects associated with NoSQL databases. Examples are Lucandra/Solandra for Cassandra, HSearch for HBase, and Riak Search for Riak. Solr was built to be a search solution, and search capability is its sweet spot [18]. It is considered by far the most popular search engine based on the popularity survey below.

Figure 16: Solr Popularity Compared to Other Search Solutions [20]

Solr and NoSQL remain two inter-related but separate worlds, each with its own sweet spot. Thus far, neither has dominated the other [20]. One proof point is the recently released DataStax product DSE 3, which uses both Solr and the Cassandra NoSQL database.

2.3.5. RapidMiner

RapidMiner is a company that provides software, solutions, and services in the fields of predictive

analytics, data mining, and text mining. Its flagship product is user friendly, and used by many

beginners in the field of Data Analytics. The company helps to automatically and intelligently analyze

data – including databases and text. The company currently has limited support for NoSQL

technology [21].

2.3.6. R

R is an open source statistical package used in many data analytics projects. R has strong support for NoSQL technologies in the Key-Value, Wide-Column, and Document store categories. Specifically, R supports the most popular NoSQL databases in each category: Redis, Cassandra, and MongoDB [22] [23].

2.3.7. Pivotal HD

Pivotal is a company providing application and data infrastructure software, agile development services, and data science consulting. Its product, Pivotal HD, is a fully supported, enterprise-ready Hadoop distribution. Pivotal HD supports multiple NoSQL databases, such as GemFire (a proprietary NoSQL database) as well as the open source HBase [28]. The most popular Key-Value NoSQL database, Redis, is also supported by Pivotal [12].

3. Conclusion

Big Data is about variety, velocity, and volume. Traditional relational databases are not flexible

enough to deal with the new variety of data types. The implication is that a new generation of

databases is required with a more flexible schema. The velocity and volume of data pose a similar

challenge to traditional scale-up databases in terms of the performance requirement. The

implication is that a new generation of databases is required with a scale-out and distributed

architecture. Indeed these two challenges led to the development of a new generation of databases

referred to as NoSQL databases. Although vendors have developed their proprietary NoSQL

databases, open source remains king in this new space.

There are four main categories of NoSQL databases: Key-Value, Document, Wide-Column, and Graph. Each category has its strengths and weaknesses and is populated with multiple databases, each with its special implementation and characteristics. In each category, one database has gained more popularity than the others: Redis in the Key-Value category, MongoDB in the Document category, Cassandra in the Wide-Column category, and Neo4j in the Graph category. NoSQL databases and Hadoop, the popular big data platform, work together closely. NoSQL works in the operational space, while Hadoop works in the analytical space. NoSQL exists in the front-end, and Hadoop exists in the back-end.

The Big Data analytic platforms arena is continuously changing, with some platforms embracing NoSQL more than others. R, DataStax, and Pivotal HD are examples of platforms that have embraced NoSQL, while the RapidMiner and Karmasphere platforms are examples of those that are behind in embracing NoSQL. With billions of users spending billions of hours online, application usage can grow from zero to a million users overnight [1]. The application tier has long been accustomed to scale-out architecture to absorb such spikes. Now it is the database tier’s turn to scale out using NoSQL.

4. References

[1] Online article, Couchbase, "What is NoSQL Database & Why NoSQL", Accessible from:

http://www.couchbase.com/why-nosql/nosql-database, Date Accessed: November 28th 2013.

[2] Online article, NoSQL Databases, "Your Ultimate Guide to the Non-Relational Universe!", Accessible

from: http://nosql-database.org/, Date Accessed: November 28th 2013.

[3] Online article, Bushik, S., "A Vendor Independent Comparison of NoSQL Databases: Cassandra,

HBase, MondoDB, Riak", Accessible from:

http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html, Date Accessed: November

29th 2013

[4] University Research Paper, Dede, E., Govindaraju, M., Gunter, D., Canon, R., Ramakrishnan, L.,

"Performance Evaluation of MongoDB and Hadoop Platform for Scientific Data Analysis", Accessible

from: http://datasys.cs.iit.edu/events/ScienceCloud2013/p02.pdf, Date Accessed: November 23rd 2013

[5] University Research Report, Strauch, C., "NoSQL Databases – Selected Topics on Software Technology

Ultra-Large Scale Sites”, Accessible from: http://www.christof-strauch.de/nosqldbs.pdf, Date Accessed:

November 23rd 2013

[6] Research Paper, Sharma, S., "A Brief Review on Modern NoSQL Data Models, Handling Big Data",

Accessible from:

www.cs.iastate.edu/~sugamsha/articles/A Brief Review on Modern NoSQL Data Models_Handling Big

Data.pdf, Date Accessed: November 24th 2013

[7] Master’s Thesis, Orend, K., "Analysis and Classification of NoSQL Databases and Evaluation of their

Ability to Replace an Object-relational Persistence Layer", Accessible from:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.483, Date Accessed: November 24th 2013

[8] Research Paper, Moniruzzaman, A., Hossain, S., "NoSQL Database: New Era of Databases for Big Data

Analytics – Classification, Characteristics and Comparison", Accessible from: International Journal of

Database Theory and Application Vol. 6, No. 4, 2013, Date Accessed: November 24th 2013

[9] Research Survey, Couchbase, "Accelerated Adoption of NoSQL", Accessible from:

http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012,

Date Accessed: November 30th 2013

[10] Master’s Thesis, Feng, H., "Benchmarking the Suitability of Key-Value Stores for Distributed Scientific

Data", Accessible from:

http://www.epcc.ed.ac.uk/sites/default/files/Dissertations/2011-2012/Submission-1054977.pdf, Date

Accessed: November 30th 2013

[11] Online Blog, "The Comparison Wiki", Accessible from:

http://vschart.com/compare/dynamo-db/vs/project-voldemort/vs/redis-database, Date Accessed:

November 30th 2013

[12] Web Page, Pivotal, "Open Source Software", Accessible from: http://gopivotal.com/oss, Date

Accessed: November 30th 2013

[13] Online Research, 451 Group, "NoSQL LinkedIn Skills Index", Accessible from:

http://blogs.the451group.com, Date Accessed: November 30th 2013

[14] Web Page, Datameer, "Data Integration with Datameer", Accessible from:

http://www.datameer.com/product/data-integration.html, Date Accessed: December 1st 2013

[15] Web Page, DataStax, "DataStax Introduces World’s First NoSQL Big Data Platform with

Comprehensive Enterprise-Grade Security Features", Accessible from:

http://www.datastax.com/2013/02/datastax-introduces-worlds-first-nosql-big-data-platform-with-comprehensive-enterprise-grade-security-features, Date Accessed: December 1st 2013

[16] Web Page, Karmasphere, "Karmasphere Technical Specifications", Accessible from:

http://www.karmasphere.com/product-overview/technical-specifications/, Date Accessed: December 1st

2013

[17] Online Blog, Yonik, "SolrCloud, NoSQL and More", Accessible from:

http://searchhub.org/2012/05/21/solr-4-preview/, Date Accessed: December 1st 2013

[18] Online Presentation, Ingersoll, G., Johnson, R., "Solr Power FTW", Accessible from:

http://portal.sliderocket.com/ANYSX/SXSW-2011-Solr-Nosql, Date Accessed: December 1st 2013

[19] Online Presentation, Ingersoll, G., "Apache Lucene, Solr and NoSQL: A Comparison", Accessible from:

http://www.lucenerevolution.org/sites/default/files/LuceneRevPreso_Ingersoll_NoSQL.pdf, Date Accessed:

December 1st 2013

[20] Online Presentation, Miller, M., "Solr The Search First NoSQL Database", Accessible from:

http://www.slideshare.net/lucenerevolution/solr-cloud-the-search-first-nosql-database-extended-deep-dive, Date Accessed: December 1st 2013

[21] Online Blog, Rieger, A., "Large Scale Data Analysis and Predictive Modeling in Data Mining",

Accessible from: http://blog.bosch-si.com/large-scale-data-analysis-and-predictive-modeling-in-data-mining/, Date Accessed: December 1st 2013

[22] Online Documentation, Urbanek, S., "Package RCassandra", Accessible from: http://cran.r-project.org/web/packages/RCassandra/RCassandra.pdf, Date Accessed: December 1st 2013

[23] Online Documentation, Lindsly, G., "CRAN – Package MongoDB Driver", Accessible from: http://cran.r-project.org/web/packages/rmongodb/index.html, Date Accessed: December 1st 2013

[24] Online Blog, Apicella, P., "Adding Years to Your RDBMS by Scaling with Spring and NoSQL", Accessible

from: http://blog.gopivotal.com/products/adding-years-to-your-rdbms-by-scaling-with-spring-and-nosql,

Date Accessed: December 1st 2013

[25] Online Blog, Shook, A., "Making Hadoop MapReduce Work with a Redis Cluster", Accessible from: http://blog.gopivotal.com/products/making-hadoop-mapreduce-work-with-a-redis-cluster, Date Accessed:

December 1st 2013

[26] Online Blog, Melo, F., "Cultivating Hybrids: 4 Key Data Architectures for Scaling Infinitely", Accessible

from: http://blog.gopivotal.com/features/cultivating-hybrids-4-key-data-architectures-for-scaling-infinitely,

Date Accessed: December 1st 2013

[27] Online Blog, Bloom, A., "Case Study: How Hulu Scaled Serving 4 Billion Videos Using Redis",

Accessible from: http://blog.gopivotal.com/case-studies-2/case-study-how-hulu-scaled-serving-4-billion-videos-using-redis, Date Accessed: December 1st 2013

[28] Online Blog, Miner, D., "Introducing Pivotal HD", Accessible from: http://blog.gopivotal.com/features/introducing-pivotal-hd, Date Accessed: December 1st 2013

[29] Online Blog, neo4j, "Top 10 Ways to get to Know Neo4j", Accessible from:

http://blog.ne4j.org/2010/02/top-10-ways-to-get-to-know-neo4j.html, Date Accessed: December 6th 2013

[30] Online Article, Swoyer, S., "DataStax: Anything Hadoop Can Do Cassandra Can Do Better", Accessible

from: http://tdwi.org/Articles/2013/08/20/DataStax-Hadoop-Cassandra.aspx?Page=1, Date Accessed:

December 6th 2013

[31] Online Blog, Techielicous, "NoSQL in the Real World", Accessible from:

http://techielicous.com/2011/06/04/search-and-analytics/, Date Accessed: December 6th 2013

Dell EMC believes the information in this publication is accurate as of its publication date. The

information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO

REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS

PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR

FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an

applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.