NoSQL for the DBA Lynn Langit April 2013 – Big Data Tech Con

NoSQL for the SQL Server Pro

Page 1: NoSQL for the SQL Server Pro

NoSQL for the DBA

Lynn Langit

April 2013 – Big Data Tech Con

Page 2: NoSQL for the SQL Server Pro

Data Expertise / Lynn Langit

• Industry awards
  – Microsoft – MVP for SQL Server
  – Google – GDE for Cloud Platform
  – 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
  – Pluralsight – Google Cloud Series
  – DevelopMentor – SQL Server Series
  – 2 books on SQL Server BI
  – Cloudera trainer (certified)
• Former MSFT FTE
  – 4 years

Page 3: NoSQL for the SQL Server Pro

but first…

Business Intelligence to BigData

Page 4: NoSQL for the SQL Server Pro

What is the relationship?

Business Intelligence NoSQL ????

Page 5: NoSQL for the SQL Server Pro

“The Past” BI = Effective Reports

Data optimized for Static READING

Page 6: NoSQL for the SQL Server Pro

BI = Optimized RDBMS

SQL queries & Data Stored on disk

Page 7: NoSQL for the SQL Server Pro

BI = OLAP Cubes storage

Page 8: NoSQL for the SQL Server Pro

BI = OLAP Cubes clients

Page 9: NoSQL for the SQL Server Pro

BI = Transactional Data

• What happened?• Why did that happen?• Decision Support Systems

Collecting Transactional

data

Page 10: NoSQL for the SQL Server Pro

So Why Change?

Page 11: NoSQL for the SQL Server Pro

Enter Big Data

Q: What is it?

A: Your Data, plus more data….

Page 12: NoSQL for the SQL Server Pro

BigData Pipeline – STEP 1 – Acquire

Pipeline: Acquire, Process, Store, Query & Mine, Visualize

Page 13: NoSQL for the SQL Server Pro

Big Data – an example from weather

Page 14: NoSQL for the SQL Server Pro

Big Data – an example from weather

• Source Data
  – National weather data
  – Satellite data
  – Airplanes with sensors
  – Sensors on boats
  – Sensors in the ocean
  – Sensors on the ground
  – Historical Data
  – Social Media
• Results
  – More accurate predictions
    – Tsunami
    – Tornado

Page 15: NoSQL for the SQL Server Pro

Big Data – an example from health care

• Medical records
  – Regular
  – Emergency
• Genetic data – 23andMe
• Food data
  – SparkPeople
• Purchasing
  – Grocery card
  – Credit card
• Search – Google
• Social media
  – Twitter
  – Facebook
• Exercise
  – Nike Fuel Band
  – Kinect
• Location – phone

Page 16: NoSQL for the SQL Server Pro

BigData = ‘Next State’ Questions

• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?

Collecting Behavioral data

Page 17: NoSQL for the SQL Server Pro

[Chart: Key Monitoring, Sensor Readings, and Other Behavioral data plotted over time from 12:00 to 2:30; vertical axis from 0 to 2500]

What is the reality of personalized medicine?

Page 18: NoSQL for the SQL Server Pro

BigData and Verticals
• Retail
• Manufacturing
• Health Care
• Banking
• Education

Page 19: NoSQL for the SQL Server Pro

Collecting BigData
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
  – Freebase
  – Azure DataMarket
  – Hilary Mason’s list

Page 20: NoSQL for the SQL Server Pro

DEMO – Hilary Mason’s Datasets
• Who is Hilary Mason and why do you care about her datasets?
• How do you get her datasets?
• What do you do with her datasets?

Page 21: NoSQL for the SQL Server Pro

Collecting Data – a note about Faces

• Facial recognition
• Voice recognition
• Gesture capture and analysis

Page 22: NoSQL for the SQL Server Pro

Petabytes of Big Data

Page 23: NoSQL for the SQL Server Pro

Big Data at Apple

Page 24: NoSQL for the SQL Server Pro

Big Data in India

Update: “The total number of AADHAARs issued as of 24-Mar-2013 is over 304 million. This is more than 25% of the population of India.”

Page 25: NoSQL for the SQL Server Pro

BigData Pipeline – STEP 5 – Visualize

Pipeline: Acquire, Process, Store, Query & Mine, Visualize

Page 26: NoSQL for the SQL Server Pro

DEMO - Visualizing Big Data: Wind Map

Page 27: NoSQL for the SQL Server Pro

Demo - Visualizing Big Data – D3

Page 28: NoSQL for the SQL Server Pro

BigData Pipeline – STEP 2 – Process

Pipeline: Acquire, Process, Store, Query & Mine, Visualize

Page 29: NoSQL for the SQL Server Pro

How do you clean up the mess?

• Data Hygiene
• Data Scrubbing
• Data Sprawl
• The true cost of data
• …and what about data integrity?
• …and security?
• …should your data be in the cloud?
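As a minimal sketch of what data scrubbing can look like in practice, the hypothetical `scrub` function below trims whitespace, normalizes casing, drops records that have no id, and removes duplicates. The field names `id` and `city` are assumptions made for illustration; they do not come from the talk.

```python
def scrub(records):
    """Return cleaned records: trimmed, case-normalized, de-duplicated by id."""
    seen = set()
    clean = []
    for rec in records:
        raw_id = rec.get("id")
        rec_id = "" if raw_id is None else str(raw_id).strip()
        # Skip rows with no usable id and rows whose id was already kept.
        if not rec_id or rec_id in seen:
            continue
        seen.add(rec_id)
        city = str(rec.get("city", "")).strip().title()
        clean.append({"id": rec_id, "city": city})
    return clean


# Hypothetical messy input: stray spaces, a duplicate, a missing id.
messy = [
    {"id": " 7 ", "city": " seattle"},
    {"id": "7", "city": "Seattle"},
    {"id": None, "city": "x"},
    {"id": 8, "city": "NEW YORK"},
]
cleaned = scrub(messy)
```

Real pipelines usually also need rules for conflicting duplicates and invalid values, which this sketch leaves out.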

Page 30: NoSQL for the SQL Server Pro

Is NoSQL just Hadoop?

HUGE Hype factor since 2011

Apache Hadoop
• a software framework that supports data-intensive distributed applications
• under a free license, enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Page 31: NoSQL for the SQL Server Pro

What is the relationship?

NoSQL Hadoop ??? BigData

Page 32: NoSQL for the SQL Server Pro

Hadoop in the Enterprise

Page 33: NoSQL for the SQL Server Pro

How you ‘get’ Hadoop

Open source
• Roll your own

Commercial distribution
• Cloudera
• MapR
• Hortonworks
• More…

Rent it via the cloud
• AWS

Page 34: NoSQL for the SQL Server Pro

Demo – Get and Use Cloudera CDH4 VM

Page 35: NoSQL for the SQL Server Pro

Working with Hadoop

Page 36: NoSQL for the SQL Server Pro

About Hadoop MapReduce

Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
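The map, shuffle, and reduce stages can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the pattern only, not Hadoop's API; the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data pipeline"])
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network, which is where the batch latency comes from.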

Page 37: NoSQL for the SQL Server Pro

Demo - HDInsight – MapReduce w/Java

Page 38: NoSQL for the SQL Server Pro

Demo - HDInsight – MapReduce w/ Hive

Page 39: NoSQL for the SQL Server Pro

Example Comparison: RDBMS vs. Hadoop

Attribute | Traditional RDBMS | Hadoop / MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes and greater
Access | Interactive and Batch | Batch – NOT Interactive
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
Query Response Time | Can be near immediate | Has latency (due to batch processing)

Page 40: NoSQL for the SQL Server Pro

BigData Pipeline – STEP 3 – Store

Pipeline: Acquire, Process, Store, Query & Mine, Visualize

Page 41: NoSQL for the SQL Server Pro

“Small” BigData vs. “Big” BigData

Hadoop

NoSQL

RDBMS

Page 42: NoSQL for the SQL Server Pro

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• NoSQL or Hadoop

Storage Locations
• On premises
• Cloud-hosted

Page 43: NoSQL for the SQL Server Pro

Cloud-hosted NoSQL up to 50x CHEAPER

Page 44: NoSQL for the SQL Server Pro

So many NoSQL options
• More than just the Elephant in the room
• More than 120 types of NoSQL databases

Page 45: NoSQL for the SQL Server Pro

Flavors of NoSQL
• Key/Value – Volatile
• Key/Value – Persistent
• Wide-Column
• Document
• Graph

Page 46: NoSQL for the SQL Server Pro

Key / Value Database
• Just keys and values
  – No schema
• Persistent or Volatile
• Examples
  – AWS DynamoDB
  – Riak
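A minimal sketch of the key/value idea, assuming a hypothetical `KeyValueStore` class: values are opaque and are found only by key, there is no schema, and the store is volatile unless a file path is given. It does not model DynamoDB or Riak.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Toy key/value store: schema-free values looked up by key."""

    def __init__(self, path=None):
        # path=None means volatile (memory only); a path means persistent.
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        if self.path:
            # Persist the whole map on every write (fine for a toy).
            with open(self.path, "w") as f:
                json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)


# Volatile store: contents disappear with the process.
volatile = KeyValueStore()
volatile.put("session:42", {"user": "ada"})

# Persistent store: a second instance on the same file sees the data.
kv_path = os.path.join(tempfile.mkdtemp(), "kv.json")
KeyValueStore(kv_path).put("user:1", {"name": "Ada", "langs": ["sql"]})
reloaded = KeyValueStore(kv_path)
```

Note that the two stored values have different shapes; nothing in the store checks or enforces a structure.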

Page 47: NoSQL for the SQL Server Pro

DEMO - AWS DynamoDB

• Key/Value store on the AWS cloud

Page 48: NoSQL for the SQL Server Pro

NoSQL BLOB Storage Buckets in the Cloud

• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
• Others
  – Dropbox
  – Box
  – More…

Page 49: NoSQL for the SQL Server Pro

DEMO - Battle of the Buckets

• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 / Glacier

Page 50: NoSQL for the SQL Server Pro

Column Database

• Wide, sparse column sets
• Schema-light
• Examples:
  – Cassandra
  – HBase w/Hadoop
  – BigTable
  – GAE HR DS

Page 51: NoSQL for the SQL Server Pro

Types of Column Databases

• Column-families
  – Non-relational
  – Sparse
  – Examples:
    • HBase
    • Cassandra
    • xVelocity (SQL 2012 Tabular)
• Column-stores
  – Relational
  – Dense
  – Example:
    • SQL Server 2012 – Columnstore index
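The sparse versus dense distinction can be sketched with two plain Python structures. The row keys, column names, and helper functions here are hypothetical; this shows only the data layout and says nothing about how HBase, Cassandra, or SQL Server store data internally.

```python
# Column-family layout: each row key owns its own sparse set of columns.
column_family = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Grace", "twitter": "@grace"},  # different columns
}

# Column-store layout: each column is a dense array aligned by row position.
column_store = {
    "name": ["Ada", "Grace"],
    "sales": [120, 80],
}


def columns_used(rows):
    """All column names that appear in at least one sparse row."""
    return sorted({col for cols in rows.values() for col in cols})


def column_sum(store, column):
    """Aggregate one dense column without reading the others."""
    return sum(store[column])


all_columns = columns_used(column_family)
total_sales = column_sum(column_store, "sales")
```

The dense layout is what makes single-column aggregates cheap; the sparse layout is what lets rows carry different attributes.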

Page 52: NoSQL for the SQL Server Pro

DEMO – SQL Server ‘NoSQL’

• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)

Page 53: NoSQL for the SQL Server Pro

Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/ semi-structured data
  – Encodings include BSON, JSON, XML…
• binary forms
  – PDF, Microsoft Office documents (Word, Excel…)
• Examples:
  – MongoDB
  – Couchbase
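A minimal sketch of a document collection, assuming plain Python dictionaries as stand-ins for JSON documents. The `find` helper and the sample fields are hypothetical and are not the MongoDB API.

```python
import json

# A "collection" of semi-structured documents: fields vary per document.
collection = [
    {"_id": 1, "name": "Widget", "tags": ["tools"], "price": 9.5},
    {"_id": 2, "name": "Gadget", "specs": {"weight_g": 120}},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc
        for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


def to_json(doc):
    """Serialize one document to JSON text."""
    return json.dumps(doc, sort_keys=True)


gadgets = find(collection, name="Gadget")
round_trip = json.loads(to_json(collection[1]))
```

Nested fields such as `specs` travel with the document, so no join is needed to read them back.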

Page 54: NoSQL for the SQL Server Pro

Demo - MongoDB

Page 55: NoSQL for the SQL Server Pro

Graph Databases

• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
  – Neo4J
  – Google Freebase
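Finding connections is the kind of query that needs recursive self-joins in an RDBMS. A minimal sketch, assuming an adjacency-list graph and a hypothetical `shortest_path` helper that uses breadth-first search; it does not use Neo4J or its query language.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical many-to-many "knows" relationships.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
connection = shortest_path(friends, "ann", "eve")
```

Each extra hop here would be another self-join in SQL, which is why traversal-heavy workloads tend to favor a graph store.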

Page 56: NoSQL for the SQL Server Pro

DEMO – Neo4J

Page 57: NoSQL for the SQL Server Pro

CAP Theorem applied = ‘how big is it?’

• CA = RDBMS
  – Highly-available consistency
  – Ex. SQL Server
• CP = NoSQL
  – Enforced consistency
  – Ex. Hadoop
• AP = NoSQL
  – Eventual consistency
  – Ex. MongoDB
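Eventual consistency can be illustrated with a toy model: two replicas accept writes independently and converge only after a later synchronization pass. The `Replica` class and `sync` function are hypothetical, use a simple highest-version-wins rule, and do not describe how MongoDB or any other product replicates data.

```python
class Replica:
    """One copy of the data; each key stores a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the incoming write only if it is newer than what we hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries both ways so each replica keeps the newest version."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
stale_read = r2.read("cart")        # r2 has not heard about the write yet
sync(r1, r2)
converged_read = r2.read("cart")    # after sync both replicas agree
```

Between the write and the sync, both replicas stay available but can return different answers, which is the trade the AP bullet describes.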

Page 58: NoSQL for the SQL Server Pro

“Small” BigData vs. “Big” BigData

Hadoop

Key/Value or Column

Document or Graph

RDBMS

Page 59: NoSQL for the SQL Server Pro

Cloud-hosted RDBMS

• AWS RDS – SQL Server, MySQL, Oracle
  – Medium cost
  – Solid feature set, e.g. backup, snapshot
  – Use existing tooling
• Google – MySQL
  – Lowest cost
  – Most limited RDBMS functionality
• Microsoft – SQL Azure
  – Highest cost

Page 60: NoSQL for the SQL Server Pro

DEMO - AWS RDS

• SQL Server, MySQL or Oracle
• Essential to understand pricing models

Page 61: NoSQL for the SQL Server Pro
Page 62: NoSQL for the SQL Server Pro

Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png

Page 63: NoSQL for the SQL Server Pro

NoSQL Applied

Use cases:
• Social Games
• Product Catalogs
• Social aggregators
• Log Files
• Line-of-Business

Data stores:
• Columnstore – HBase
• Key/Value – DynamoDB
• Document – MongoDB
• Graph – Neo4j
• RDBMS – SQL Server

Page 64: NoSQL for the SQL Server Pro

Cloud Offerings– RDBMS AND NoSQL

Offering | AWS | Google | Microsoft
RDBMS | RDS – all major | MySQL | SQL Azure
NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs
NoSQL Key-Value | DynamoDB | H/R Data on GAE | Azure Tables
Streaming ML or (Mahout) | Custom EC2 | Prospective Search & Prediction API | StreamInsight
NoSQL Document or Graph | MongoDB on EC2 | Freebase | MongoDB on Windows Azure
NoSQL – Column / Hadoop (HBase) | Elastic MapReduce using S3 & EC2 | none | HDInsight
Dremel/Warehousing | RedShift | BigQuery | none

Page 65: NoSQL for the SQL Server Pro

BigData Pipeline – STEP 4 – Query

Pipeline: Acquire, Process, Store, Query & Mine, Visualize

Page 66: NoSQL for the SQL Server Pro

Always MapReduce?

Page 67: NoSQL for the SQL Server Pro

Data Scientists and Languages

Page 68: NoSQL for the SQL Server Pro

Karmasphere Studio for AWS

Page 69: NoSQL for the SQL Server Pro

Can Excel help?

• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion

Page 70: NoSQL for the SQL Server Pro

Demo - Hadoop Connector to Excel

Page 71: NoSQL for the SQL Server Pro

Google BigQuery w/Excel

• Hadoop-like (Dremel) based service
• For massive amounts of data
• SQL-like query language
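To give a feel for the kind of aggregate a SQL-like query language expresses, here is a minimal sketch that uses Python's built-in `sqlite3` module as a local stand-in. It is not BigQuery and does not use BigQuery's dialect or client library; the `hits` table, its columns, and the `top_pages` helper are hypothetical.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Load (page, visits) rows into an in-memory table and rank pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (page TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    query = """
        SELECT page, SUM(visits) AS total
        FROM hits
        GROUP BY page
        ORDER BY total DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


# Hypothetical log-style rows.
sample_rows = [("/home", 3), ("/docs", 5), ("/home", 4), ("/blog", 1)]
ranking = top_pages(sample_rows)
```

The appeal for a SQL Server professional is that the query shape is familiar even when the engine underneath is not a relational database.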

Page 72: NoSQL for the SQL Server Pro

DEMO - Google BigQuery
• Hadoop-like (Dremel) based service
• For massive amounts of data
• SQL-like query language

Page 73: NoSQL for the SQL Server Pro

Dremel Realized => Impala

• Interactive Hadoop?

Page 74: NoSQL for the SQL Server Pro

Other types of cloud data services

Hosting public datasets
• Pay to read
• Earn revenue by offering for read

Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com

Page 75: NoSQL for the SQL Server Pro

NoSQL To-Do List

Understand CAP & types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem

Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments

Learn NoSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.

Page 76: NoSQL for the SQL Server Pro

The Changing Data Landscape

NoSQL

RDBMS

Other Services

Page 77: NoSQL for the SQL Server Pro

www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe Teach a Kid (Ages 10+)
• Java or Microsoft SmallBasic TKP site
• C# via Pluralsight

Page 78: NoSQL for the SQL Server Pro

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions