Topic 10: Taxonomy of Data and Storage

Cloud Computing Workshop 2013, ITU

10: Taxonomy of Data and Storage

Zubair Nabi

[email protected]

April 20, 2013

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27

Outline

1 Datasets

2 Storage

3 Beyond RDBMS

4 NoSQL Taxonomy

5 NewSQL

Introduction

Data is everywhere and is the driving force behind our lives

The address book on your phone is data

So is the newspaper that you read every morning

Everything you see around you is a potential source of data which might be useful for a certain application

We use this data to share information and make a more informed decision about different events

Datasets can easily be classified on the basis of their structure:

1 Structured

2 Unstructured

3 Semi-structured

Structured Data

Formatted in a universally understandable and identifiable way

In most cases, structured data is formally specified by a schema

Your phone address book is structured because it has a schema consisting of name, phone number, address, email address, etc.

Most traditional databases contain structured data laid out across columns and rows

Each field also has an associated type, which makes it possible to search for items based on their data types
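
As a minimal sketch of what a schema provides, the address book could be modelled as below. The `Contact` class, the extra `birth_year` field, the helper names, and the sample entries are all assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    """Hypothetical address-book schema: every record has the same typed fields."""
    name: str
    phone: str
    address: str
    email: str
    birth_year: int  # assumed extra field, added only to have a non-string type


def fields_of_type(schema, wanted_type):
    """Return the names of all schema fields declared with the given type."""
    return [f.name for f in fields(schema) if f.type is wanted_type]


def find_by(book, field_name, value):
    """Return every contact whose named field equals the given value."""
    return [c for c in book if getattr(c, field_name) == value]


# Made-up sample records that all follow the schema above.
book = [
    Contact("Ayesha", "555-0100", "12 Canal Road", "ayesha@example.com", 1988),
    Contact("Bilal", "555-0101", "7 Mall Road", "bilal@example.com", 1991),
]

print(fields_of_type(Contact, int))
print([c.name for c in find_by(book, "birth_year", 1988)])
```

Because every record shares the same declared fields and types, both lookups work without inspecting the data first.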

Unstructured Data

Data without any conceptual definition or type

Can vary from raw text to binary data

Processing unstructured data requires parsing and tagging on the fly

In most cases, consists of simple log files
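
Parsing and tagging on the fly can be sketched as follows. The log format, the pattern, and the sample lines are assumptions chosen for illustration; real log files vary widely.

```python
import re

# Assumed line layout for this sketch: "<date> <time> <LEVEL> <free text>".
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def tag_line(line):
    """Tag one raw line with field names, or keep it as raw text if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return {"raw": line}
    return match.groupdict()


raw_log = """2013-04-20 10:15:02 INFO server started
2013-04-20 10:15:07 ERROR disk quota exceeded
free-form note with no recognisable fields"""

records = [tag_line(line) for line in raw_log.splitlines()]
for record in records:
    print(record)
```

The structure exists only in the parser, not in the data: a line that does not match the assumed pattern stays untyped raw text.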

Semi-structured Data

Occupies the space between the structured and unstructured ends of the data spectrum

For instance, while binary data has no structure, audio and video files have meta-data which has structure, such as author, time of creation, etc.

Can also be described as having a self-describing structure
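
A small sketch of the self-describing idea, assuming a made-up audio record serialised as JSON: the meta-data carries its own field names, while the payload stays opaque. The record layout and all values are hypothetical.

```python
import json

# Hypothetical audio file: structured meta-data wrapped around an opaque payload.
audio_file = {
    "metadata": {
        "author": "Unknown Artist",
        "created": "2013-04-20T09:30:00",
        "format": "wav",
    },
    # The payload is just bytes (hex-encoded here so it fits in JSON).
    "payload": bytes([0x52, 0x49, 0x46, 0x46]).hex(),
}


def describe(record):
    """List the meta-data field names the record itself carries."""
    return sorted(record["metadata"])


# Round-trip through JSON: field names travel with the data, no external schema needed.
decoded = json.loads(json.dumps(audio_file))
print(describe(decoded))
print(decoded["payload"])
```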

Database Management Systems (DBMS)

Used to store and manage data

Support for large amounts of data

Ensure concurrency, sharing, and locking

Security is useful too, to enable fine-grained access control

Ability to keep working in the face of failure

Relational Database Management Systems (RDBMS)

The most popular and predominant storage system in use

Data in different files is connected by using a key field

Data is laid out in different tables, with a key field that identifies each row

The same key field is used to connect one table to another

For instance, a relation might have customer ID as key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences

Examples include Oracle Database, MS SQL Server, MySQL, IBM DB2, and Teradata
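
The customer example can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; the table names, column names, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE purchases (customer_id INTEGER REFERENCES customers(customer_id),
                        item TEXT, amount REAL);
CREATE TABLE preferences (customer_id INTEGER REFERENCES customers(customer_id),
                          category TEXT);
INSERT INTO customers VALUES (1, 'Sara', 'Lahore');
INSERT INTO purchases VALUES (1, 'laptop', 900.0);
INSERT INTO purchases VALUES (1, 'mouse', 20.0);
INSERT INTO preferences VALUES (1, 'electronics');
""")

# The shared customer_id key connects the three tables.
rows = conn.execute("""
    SELECT c.name, p.item, pr.category
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.customer_id
    JOIN preferences AS pr ON pr.customer_id = c.customer_id
    ORDER BY p.item
""").fetchall()

for row in rows:
    print(row)
```

Each table stores a different aspect of the same customer, and the key field is the only thing they need to have in common.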

Structured Query Language (SQL)

Non-procedural language used for data retrieval and manipulation in RDBMS

Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc.

Due to its declarative nature, users operate in terms of their expected output while the underlying system decides the actual query execution plan

Instructions consist of a specific SQL statement and additional parameters and operands

For instance, the SELECT statement retrieves certain records, INSERT adds a record, and so on
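
A minimal sketch of these statements using Python's built-in `sqlite3` module. The `contacts` table and its rows are invented for illustration, and `EXPLAIN QUERY PLAN` is SQLite-specific; other systems expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")

# INSERT adds records; the "?" placeholders carry the statement's parameters.
conn.executemany(
    "INSERT INTO contacts (name, city) VALUES (?, ?)",
    [("Ali", "Lahore"), ("Sara", "Karachi"), ("Omar", "Lahore")],
)

# SELECT states what is wanted (names in Lahore), not how to find it.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM contacts WHERE city = ? ORDER BY name", ("Lahore",)
    )
]
print(names)

# The engine, not the user, picks the execution plan; SQLite can report its choice.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM contacts WHERE city = 'Lahore'"
).fetchall()
for step in plan:
    print(step)
```

The exact plan text depends on the SQLite version and on which indexes exist, which is the point: the same query can be executed in different ways without the user changing it.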

RDBMS and Structured Data

As structured data follows a predefined schema, which defines the type and structure of the data and its relations, it naturally maps on to a relational database system

Schema design is an arduous process and needs to be done before the database can be populated

Another consequence of a strict schema is that it is non-trivial to extend it

For instance, adding a new attribute to an existing row necessitates adding a new column to the entire table, which is extremely suboptimal in tables with millions of rows
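
The effect of extending a strict schema can be sketched with `sqlite3`. The table, the `nickname` attribute, and the rows are hypothetical. The sketch shows only the logical effect (every row gains the column); how expensive the change is depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ali", "555-0100"), ("Sara", "555-0101"), ("Omar", "555-0102")],
)

# Only one contact needs the new attribute, but the column is added to the whole table.
conn.execute("ALTER TABLE contacts ADD COLUMN nickname TEXT")
conn.execute("UPDATE contacts SET nickname = ? WHERE name = ?", ("Sari", "Sara"))

rows = conn.execute("SELECT name, nickname FROM contacts ORDER BY name").fetchall()
for row in rows:
    print(row)
```

Every row that does not use the attribute still carries an empty (NULL) value for it.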

RDBMS and Semi- and Un-structured Data

Unstructured data has no notion of schema while semi-structured data only has a weak one

Data within such datasets also has an associated type; in fact, types are application-centric: it might be possible to interpret a field as a float in one application and as a string in another

While it is possible, with human intervention, to glean structure from unstructured data, it is an extremely expensive task

Structureless data generated by real-time sources can change the number of attributes and their types on the fly; an RDBMS would require the creation of a new table each time such a change takes place

Therefore, unstructured and semi-structured data do not fit the relational model
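
A sketch of application-centric typing, assuming a made-up stream of sensor records whose attributes change from one record to the next. The field names and values are invented for illustration.

```python
# Hypothetical records from a real-time source; the attribute set changes on the fly.
raw_records = [
    {"sensor": "t1", "reading": "21.5"},
    {"sensor": "t2", "reading": "19.0", "unit": "C"},       # a new attribute appears
    {"sensor": "t3", "reading": "n/a", "battery": "low"},   # another one, non-numeric reading
]


def as_float(record, field):
    """One application reads the field as a number, tolerating missing or bad values."""
    try:
        return float(record[field])
    except (KeyError, ValueError):
        return None


def as_label(record, field):
    """Another application treats the very same field as plain text."""
    return str(record.get(field, ""))


numeric = [as_float(r, "reading") for r in raw_records]
labels = [as_label(r, "reading") for r in raw_records]
attributes = sorted({key for r in raw_records for key in r})

print(numeric)
print(labels)
print(attributes)
```

No single table definition written in advance would cover all three records, whereas each consumer here simply applies its own interpretation when it reads the data.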

Motivation

Different semantics:

RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails

2 Consistent: Data within the database remains consistent after each transaction

3 Isolated: Transactions are sandboxed from each other

4 Durable: Transactions are persistent across failures and restarts

ACID is overkill in the case of most user-facing applications

Most applications are more interested in availability and are willing to sacrifice consistency, leading to eventual consistency

This basically available, soft state, eventually consistent (BASE) model enables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency for availability, leading to higher throughput (in some cases by an order of magnitude)
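
The atomicity guarantee can be sketched with `sqlite3`. The `accounts` table, the `transfer` helper, and the balances are assumptions made for illustration; the sketch covers only the "A" in ACID, not the BASE model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, target),
            )
            # An overdraft violates the CHECK constraint and aborts the transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds as a whole
failed = transfer(conn, "alice", "bob", 500)  # fails as a whole
balances = dict(conn.execute("SELECT owner, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to the target account had already been applied when the debit was rejected, and the rollback undoes it, so no partial result is ever visible.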

Motivation

Different semantics:I RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails2 Consistent: Data within the database remains consistent after each

transaction3 Isolation: Transactions are sandboxed from each other4 Durable: Transactions are persistent across failures and restarts

I Overkill in case of most user-facing applicationsI Most applications are more interested in availability and willing to

sacrifice consistency leading to eventual consistencyI This basically available, soft state, eventually consistent (BASE) model

enables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency foravailability leading to higher throughput (in some cases an order ofmagnitude)

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 15 / 27

Page 59: Topic 10: Taxonomy of Data and Storage

Motivation

Different semantics:I RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails2 Consistent: Data within the database remains consistent after each

transaction3 Isolation: Transactions are sandboxed from each other4 Durable: Transactions are persistent across failures and restarts

I Overkill in case of most user-facing applications

I Most applications are more interested in availability and willing tosacrifice consistency leading to eventual consistency

I This basically available, soft state, eventually consistent (BASE) modelenables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency foravailability leading to higher throughput (in some cases an order ofmagnitude)

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 15 / 27

Page 60: Topic 10: Taxonomy of Data and Storage

Motivation

Different semantics:I RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails2 Consistent: Data within the database remains consistent after each

transaction3 Isolation: Transactions are sandboxed from each other4 Durable: Transactions are persistent across failures and restarts

I Overkill in case of most user-facing applicationsI Most applications are more interested in availability and willing to

sacrifice consistency leading to eventual consistency

I This basically available, soft state, eventually consistent (BASE) modelenables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency foravailability leading to higher throughput (in some cases an order ofmagnitude)

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 15 / 27

Page 61: Topic 10: Taxonomy of Data and Storage

Motivation

Different semantics:I RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails2 Consistent: Data within the database remains consistent after each

transaction3 Isolation: Transactions are sandboxed from each other4 Durable: Transactions are persistent across failures and restarts

I Overkill in case of most user-facing applicationsI Most applications are more interested in availability and willing to

sacrifice consistency leading to eventual consistencyI This basically available, soft state, eventually consistent (BASE) model

enables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency foravailability leading to higher throughput (in some cases an order ofmagnitude)

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 15 / 27

Page 62: Topic 10: Taxonomy of Data and Storage

Motivation

Different semantics:I RDBMS provide ACID semantics:

1 Atomic: The entire transaction either succeeds or fails2 Consistent: Data within the database remains consistent after each

transaction3 Isolation: Transactions are sandboxed from each other4 Durable: Transactions are persistent across failures and restarts

I Overkill in case of most user-facing applicationsI Most applications are more interested in availability and willing to

sacrifice consistency leading to eventual consistencyI This basically available, soft state, eventually consistent (BASE) model

enables applications to function even in the face of partial failure

High Throughput: Most NoSQL databases sacrifice consistency foravailability leading to higher throughput (in some cases an order ofmagnitude)

Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 15 / 27

Page 63: Topic 10: Taxonomy of Data and Storage

Motivation (2)

Horizontal Scalability: To cater for more data, NoSQL stores can be scaled out by just adding more machines, and the underlying system automatically redistributes the data

Commodity Hardware: A large number of RDBMS require specialized and proprietary hardware for operation. In contrast, NoSQL databases function over commodity off-the-shelf hardware

Programming Language Support: Over the years, programming languages have started providing abstractions for database support (LINQ, etc.) while bypassing SQL. NoSQL databases provide abstractions that map directly onto the language abstractions, leading to tighter coupling
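
One common way to redistribute data automatically when machines are added is consistent hashing. The sketch below is a simplified illustration, not a description of how any particular store works: the `HashRing` class, the node names, and the choice of 64 virtual nodes per machine are all assumptions.

```python
import bisect
import hashlib


def stable_hash(value):
    """Hash a string to an integer that is stable across runs."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each machine is placed on the ring at several points.
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, (stable_hash(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-1", "node-2", "node-3"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-4")  # scale out by adding one machine
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Only the keys that now belong to the new machine change owner. The rest stay where they were, which keeps scaling out cheap.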

Motivation (3)

The Rise of Cloud Computing: Cloud Computing applications require horizontal scalability and low administration overhead. Both requirements are naturally satisfied by NoSQL stores

Introduction

NoSQL databases can be classified on the basis of:

1 Data Model: How data is represented

2 Scalability: How scalable the system is

3 Query Model: What type of API it exposes

4 Persistence: How persistent the data is

Classification by Data Model

Based on the data model, NoSQL databases can roughly be grouped into three categories:

1 Key/value Stores: A map/dictionary allowing put/get semantics per key

2 Document Stores: Complex data structures to encapsulate document key/value pairs

3 Column-Oriented Stores: Data laid out by column

Key/value Stores

Data is stored within a large hash map

Simple get/put API

Favour scalability over consistency

Limit on the size of the key

Examples include Amazon’s Dynamo, LinkedIn’s Voldemort, Redis, and Memcached
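
The get/put interface and the key-size limit can be sketched in a few lines of Python. The `KeyValueStore` class and its `max_key_bytes` parameter are hypothetical. Real stores expose similar but not identical APIs and enforce their own limits.

```python
class KeyValueStore:
    """Toy in-memory key/value store backed by a hash map (a dict)."""

    def __init__(self, max_key_bytes=16):
        # The limit is an arbitrary value chosen for illustration.
        self._max_key_bytes = max_key_bytes
        self._data = {}

    def put(self, key, value):
        if len(key.encode("utf-8")) > self._max_key_bytes:
            raise ValueError("key exceeds the maximum key size")
        self._data[key] = value

    def get(self, key, default=None):
        # Values are opaque to the store; it never looks inside them.
        return self._data.get(key, default)


store = KeyValueStore(max_key_bytes=16)
store.put("user:42", b"Zubair")
value = store.get("user:42")
missing = store.get("user:99")

try:
    store.put("k" * 17, b"x")
    rejected = False
except ValueError:
    rejected = True

print(value, missing, rejected)
```

Because the store only ever looks up one key at a time, keys can be partitioned across machines independently, which is one reason this model scales easily.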

Document Stores

Key/value semantics but based on documents

A document encapsulates data in a standard format, such as JSON, XML, PDF, etc.

Documents themselves can be heterogeneous

Documents can also be retrieved based on their content

Examples include Apache CouchDB and MongoDB
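
A minimal Python sketch of these properties follows, assuming JSON-like documents held as dictionaries. The `DocumentStore` class and its `insert`, `get`, and `find` methods are hypothetical and are not the API of CouchDB or MongoDB.

```python
import uuid


class DocumentStore:
    """Toy document store: key/value access plus lookup by content."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # Each document gets a generated key, as in a key/value store.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        # Unlike a plain key/value store, the store inspects the content.
        return [
            doc
            for doc in self._docs.values()
            if all(k in doc and doc[k] == v for k, v in criteria.items())
        ]


store = DocumentStore()
# Heterogeneous documents: no two share exactly the same fields.
a = store.insert({"type": "contact", "name": "Amna", "city": "Lahore"})
b = store.insert({"type": "article", "title": "NoSQL", "tags": ["db", "cloud"]})
c = store.insert(
    {"type": "contact", "name": "Bilal", "city": "Karachi", "phone": "555-0100"}
)

contacts = store.find(type="contact")
lahore = store.find(type="contact", city="Lahore")
print(len(contacts), [d["name"] for d in lahore])
```

The scan in `find` is linear. Production document stores generally maintain indexes so that content-based retrieval does not have to touch every document.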

Column-Oriented Stores

Data is stored and processed by column

Useful for read-mostly and read-intensive data

Data within the same column is of the same type, enabling opportunities for efficient compression

Columns are stored separately so they can be loaded in parallel

Examples include Google’s BigTable (Apache HBase is its open source clone) and Apache Cassandra (originally developed at Facebook)
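
To see why a column layout helps compression and read-heavy queries, the sketch below pivots a few rows into columns and run-length encodes one of them. The sample rows and the helpers `to_columns` and `run_length_encode` are invented for illustration, and run-length encoding is only one of several compression schemes a column store might use.

```python
from itertools import groupby

# Row-oriented input; every row is assumed to have the same fields.
rows = [
    {"country": "PK", "year": 2013, "visits": 10},
    {"country": "PK", "year": 2013, "visits": 12},
    {"country": "PK", "year": 2013, "visits": 7},
    {"country": "US", "year": 2013, "visits": 30},
]


def to_columns(rows):
    """Pivot a list of rows into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Values in one column share a type and often repeat, so they compress well.
encoded_country = run_length_encode(columns["country"])

# An aggregate only needs to read the one column it touches.
total_visits = sum(columns["visits"])

print(encoded_country, total_visits)
```

The aggregate reads a single list and never touches `country` or `year`, which is the access pattern that makes this layout attractive for read-mostly workloads.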

Introduction

NewSQL is a hybrid of traditional RDBMS and NoSQL

  - Scalability and performance of NoSQL and ACID guarantees of RDBMS

Use SQL as the primary language

Ability to scale out and run over commodity hardware

Classified into:

  1 New Databases: Designed from scratch
  2 New MySQL Storage Engines: Keep MySQL as the interface but replace the storage engine
  3 Transparent Clustering: Add pluggable features to existing databases to ensure scalability

New Databases

1 Query Distribution:
  - Each node holds a subset of the data
  - Queries are split and shipped to the nodes that own the data
  - Examples include Google’s Spanner and NuoDB

2 Pull Data:
  - A central node (possibly replicated) holds all data
  - A set of processing nodes receives queries and pulls in the required data from the central node
  - Examples include VMware’s SQLFire
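
The query distribution approach can be sketched as a scatter-gather over partitions. The Python below is a single-process simulation under assumptions: the `shards` layout, node names, and the helpers `local_sum` and `distributed_sum` are hypothetical, and a real system would ship the sub-queries over the network.

```python
# Each node owns a subset of the rows (hypothetical sample data).
shards = {
    "node-a": [{"user": "amna", "amount": 40}, {"user": "bilal", "amount": 15}],
    "node-b": [{"user": "amna", "amount": 5}],
    "node-c": [{"user": "chen", "amount": 70}, {"user": "amna", "amount": 25}],
}


def local_sum(rows, user):
    """Runs on the node that owns `rows`: partial SUM(amount) for one user."""
    return sum(row["amount"] for row in rows if row["user"] == user)


def distributed_sum(shards, user):
    """Coordinator: ship the sub-query to every shard, then merge results."""
    partials = {node: local_sum(rows, user) for node, rows in shards.items()}
    return sum(partials.values()), partials


total, partials = distributed_sum(shards, "amna")
print(total, partials)
```

Only the small partial sums travel back to the coordinator. In the pull-data approach the rows themselves would move to the processing node.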
