noSQL for the DBA, Lynn Langit (Practitioner, Author, Instructor), March 2012, for SQL Saturday SoCal

NoSQL for the SQL Server DBA


DESCRIPTION

Slides from my talk at SQLSaturday 120 in Huntington Beach, CA in March 2012


Page 1: NoSQL for the SQL Server DBA

noSQL for the DBA

Lynn Langit
Practitioner, Author, Instructor

March 2012, for SQL Saturday SoCal

Page 2: NoSQL for the SQL Server DBA
Page 3: NoSQL for the SQL Server DBA

BigData = ‘Next State’ Questions

• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?

Collecting Behavioral data

Page 4: NoSQL for the SQL Server DBA

BigData = Exponentially More Data
• Retail Example -> ‘Feedback Economy’
– Number of transactions
– Number of behaviors (collected every minute)

[Chart: purchases, locations, and phone data from 12:00 to 2:30; vertical axis 0 to 2500]

Page 5: NoSQL for the SQL Server DBA

So Why Change?

Page 6: NoSQL for the SQL Server DBA

Hitting (Relational) Walls
• For Writes
– Scale (partition / shard)
– Speed (latency)
• For Reads
– Failures (availability)
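
One way to picture the "partition / shard" idea is hash-based routing: each row key is hashed to choose the server that stores it. The sketch below is only an illustration; the `shard_for` helper, the customer IDs, and the shard count of 4 are assumptions, not something shown in the talk.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Hypothetical customer IDs spread across 4 shards.
shards = {number: [] for number in range(4)}
for customer_id in ["cust-001", "cust-002", "cust-003", "cust-004", "cust-005"]:
    shards[shard_for(customer_id, 4)].append(customer_id)
```

The same key always lands on the same shard, so writes spread across machines. The trade-off is that changing the shard count moves most keys, which is why real systems often use consistent hashing instead.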

Page 7: NoSQL for the SQL Server DBA

Is NoSQL just Hadoop?

• HUGE hype factor in 2011 / 2012

• Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
– enables applications to work with thousands of nodes and petabytes of data
– was inspired by Google's MapReduce and Google File System (GFS) papers

Page 8: NoSQL for the SQL Server DBA

Working with Hadoop
Common Tools / Languages
• Java (JDK) / Eclipse
• MapReduce
– Map (query/format)
– Reduce (aggregate)
– plug-in for Eclipse (Java)
• Pig (ETL -- Java)
• Hive (HQL Query)
– HBase tables
• Others
– Mahout (analyze)
– Karmasphere (analyze)
– R (analyze)
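
The Map (query/format) and Reduce (aggregate) steps can be sketched in a few lines. This is a single-process toy in Python, not Hadoop's Java API; the function names and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big questions", "big clusters"])
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves pairs across the network to the reducers.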

Page 9: NoSQL for the SQL Server DBA

Oracle Loader for Hadoop

SQL Server Connector for Hadoop

Page 10: NoSQL for the SQL Server DBA

Demo - Hadoop on Azure – Cluster Allocation

Page 11: NoSQL for the SQL Server DBA

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• noSQL

Storage Locations
• On premises
• Cloud-hosted

Page 12: NoSQL for the SQL Server DBA

So many NoSQL options

• More than just the Elephant in the room
• More than 120 types of noSQL databases

Page 13: NoSQL for the SQL Server DBA

Flavors of noSQL

Page 14: NoSQL for the SQL Server DBA

Graph Database
Use for data with
– a lot of many-to-many relationships
– recursive self-joins
– when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
– Examples: Neo4j, Freebase (Google)
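
"Quickly finding connections" is, at its simplest, a graph traversal. The sketch below runs a breadth-first search over an in-memory adjacency list; the `follows` data and the `shortest_path` helper are made up for illustration and do not use any graph database's API.

```python
from collections import deque

# Hypothetical "who follows whom" graph stored as an adjacency list.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest chain from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(follows, "ann", "eve")
```

In an RDBMS the same question needs a recursive self-join, with one more join for each hop in the chain.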

Page 15: NoSQL for the SQL Server DBA

Column Database

• Wide, sparse column sets
• Examples:
– Cassandra
– HBase
– BigTable
– GAE HR DS (Google App Engine High Replication Datastore)
– Azure Tables
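
"Wide, sparse" means each row stores only the columns it actually has, and new columns can appear at any time. A rough in-memory model in Python follows; the row keys, column names, and helper functions are assumptions for illustration, not the API of any product listed above.

```python
# Each row key maps to only the columns that row actually stores.
rows = {
    "user:1": {"name": "Ann", "city": "Irvine"},
    "user:2": {"name": "Bob", "last_login": "2012-03-01", "plan": "pro"},
}


def read_column(table, column):
    """Return {row_key: value} for the rows that store this column."""
    return {key: cols[column] for key, cols in table.items() if column in cols}


def add_column_value(table, row_key, column, value):
    """Add a column to one row; no other row is touched and no schema changes."""
    table.setdefault(row_key, {})[column] = value


add_column_value(rows, "user:1", "twitter", "@ann")
cities = read_column(rows, "city")
```

Rows without a column take no space for it, unlike a relational table padded with NULLs.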

Page 16: NoSQL for the SQL Server DBA

Demo - Document Database (MongoDB)

• Use for data that is
– document-oriented (collection of JSON documents) with semi-structured data
• Encodings include XML, YAML, JSON & BSON
– binary forms
• PDF, Microsoft Office documents (Word, Excel…)
– Examples: MongoDB, CouchDB
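
A "collection of JSON documents with semi-structured data" can be mimicked with the standard library alone. The sketch below is not the MongoDB API; the sample documents and the `find` helper are assumptions for illustration.

```python
import json

# Two hypothetical documents in one collection; note the differing fields.
raw_documents = [
    '{"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "dba"]}',
    '{"_id": 2, "title": "Hadoop Notes", "author": {"name": "Pat"}, "pages": 12}',
]
collection = [json.loads(text) for text in raw_documents]


def find(docs, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(collection, title="Hadoop Notes")
```

Documents in one collection do not have to share a shape, so a query has to tolerate missing fields (here via `dict.get`).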

Page 17: NoSQL for the SQL Server DBA

Key / Value Database
• Schema-less
• State (Persistent or Volatile)
• Examples
– AWS DynamoDB
– Project Voldemort
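
The "Persistent or Volatile" distinction can be shown with a toy store: the same put/get interface, with or without a file behind it. The `KeyValueStore` class, the keys, and the JSON file layout below are assumptions for illustration, not how DynamoDB or Voldemort work internally.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Toy schema-less store: volatile by default, persistent if given a path."""

    def __init__(self, path=None):
        self.path = path
        self.data = {}
        if path and os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self.data = json.load(handle)

    def put(self, key, value):
        self.data[key] = value
        if self.path:
            # Persistent mode: rewrite the whole file on every put (simple, not fast).
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(self.data, handle)

    def get(self, key, default=None):
        return self.data.get(key, default)


volatile = KeyValueStore()
volatile.put("session:42", {"cart": ["book"]})

path = os.path.join(tempfile.mkdtemp(), "store.json")
KeyValueStore(path).put("user:7", "Lynn")
reloaded = KeyValueStore(path)
```

The value is opaque to the store: it can be a string, a list, or a nested structure, and lookups are by key only.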

Page 18: NoSQL for the SQL Server DBA

So which type of NoSQL? Back to CAP…

Consistency
Availability
Partitioning

CP = noSQL/column
• Hadoop
• BigTable
• HBase
• MemCacheDB
• (graph)?

CA = SQL/RDBMS
• SQL Server / SQL Azure
• Oracle
• MySQL

AP = noSQL/document or key/value
• DynamoDB
• CouchDB
• Cassandra
• Voldemort

Page 19: NoSQL for the SQL Server DBA

Example Comparison: RDBMS vs. Hadoop

                    | Traditional RDBMS      | Hadoop
Data Size           | Gigabytes (Terabytes)  | Petabytes (Exabytes)
Access              | Interactive and Batch  | Batch – NOT Interactive
Updates             | Read / Write many times | Write once, Read many times
Structure           | Static Schema          | Dynamic Schema
Integrity           | High (ACID)            | Low
Scaling             | Nonlinear              | Linear
Query Response Time | Can be near immediate  | Has latency (due to batch processing)

Page 20: NoSQL for the SQL Server DBA

Real-World Examples – not only SQL

• Facebook runs on Hadoop & MySQL
• Twitter runs on Hadoop (ran on FlockDB/graph)
• Yahoo runs on Hadoop
• LinkedIn runs on Hadoop & Voldemort
• Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes

Page 21: NoSQL for the SQL Server DBA

What about the cloud?

Page 22: NoSQL for the SQL Server DBA

Cloud-hosted NoSQL up to 50x CHEAPER

Page 23: NoSQL for the SQL Server DBA

NoSQL (Cloud) BLOB Storage Buckets
• Amazon – S3
– The gold standard
• Google – Cloud Storage
– Free for developers
• Microsoft Azure BLOBS
• DropBox, Box…

Page 24: NoSQL for the SQL Server DBA

Cloud-hosted RDBMS
• AWS RDS – MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality
• Microsoft – SQL Azure
– Best tooling integration
– Highest cost

Page 25: NoSQL for the SQL Server DBA

Other types of cloud data services

Hosting public datasets
• Pay to read
• Earn revenue by offering for read

Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com

Page 26: NoSQL for the SQL Server DBA

Cloud – RDBMS AND NoSQL

                             | AWS                           | Google                              | Microsoft                          | Others
Cloud RDBMS                  | Oracle / MySQL                | MySQL                               | SQL Azure                          | Hosted RDBMS on Rackspace
noSQL buckets                | S3                            | Cloud Storage                       | HDFS on Azure                      |
NoSQL databases              | DynamoDB                      | H/R Datastore on GAE                | Azure Tables                       | Heroku
Streaming / Machine Learning | Custom EC2                    | Prospective Search & Prediction API | StreamInsight & Mahout with Hadoop |
Document or Graph            | MongoDB on EC2                | Freebase (g)                        | MongoDB on Windows Azure           | Cassandra on Rackspace
Hadoop                       | Elastic MapReduce on S3 & EC2 | Big Query (HBase-like)              | Hadoop on Azure                    |
Data sets & other            | Karmasphere                   | Translation API, Full-text search   | Azure DataMarket                   | Database.com

Page 27: NoSQL for the SQL Server DBA

Pick your mix and then…

NoSQL
• Host locally
• Host in the Cloud

RDBMS
• Host locally
• Host in the Cloud

Other Services
• Use Cloud Data Markets
• Use Cloud ETL

Page 28: NoSQL for the SQL Server DBA

What about me?

Page 29: NoSQL for the SQL Server DBA

Common DBA Tasks in NoSQL

RDBMS                         | NoSQL
Import Data                   | Import Data
Setup Security                | Setup Security
Perform a Backup              | Make a copy of the data
Restore a Database            | Move a copy to a location
Create an Index               | Create an Index
Join Tables Together          | Run MapReduce
Schedule a Job                | Schedule a (Cron) Job
Run Database Maintenance      | Monitor space and resources used
Send an Email from SQL Server | Set up resource threshold alerts
Search BOL                    | Interpret Documentation
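
To make the "Join Tables Together" to "Run MapReduce" row concrete, here is a sketch of a reduce-side join: both inputs are mapped to the join key with a tag, and the reducer pairs them up. It is a single-process Python toy; the `customers` and `orders` data and all function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: (customer_id, name) and (order_id, customer_id, amount).
customers = [(1, "Ann"), (2, "Bob")]
orders = [(101, 1, 25.0), (102, 1, 10.0), (103, 2, 7.5)]


def map_customers(rows):
    """Map: key each customer by the join column and tag its source."""
    for customer_id, name in rows:
        yield customer_id, ("customer", name)


def map_orders(rows):
    """Map: key each order by the same join column and tag its source."""
    for order_id, customer_id, amount in rows:
        yield customer_id, ("order", (order_id, amount))


def reduce_join(pairs):
    """Reduce: for each key, combine every customer record with every order."""
    grouped = defaultdict(list)
    for key, tagged in pairs:
        grouped[key].append(tagged)
    joined = []
    for key in sorted(grouped):
        names = [value for tag, value in grouped[key] if tag == "customer"]
        order_rows = [value for tag, value in grouped[key] if tag == "order"]
        for name in names:
            for order_id, amount in order_rows:
                joined.append((key, name, order_id, amount))
    return joined


result = reduce_join(list(map_customers(customers)) + list(map_orders(orders)))
```

This behaves like an inner join on customer_id. What is one JOIN clause in SQL becomes a job you write, schedule, and wait for.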

Page 30: NoSQL for the SQL Server DBA

Demo - HadoopOnAzure – Part 2

• Show MapReduce Job
• Show JS / Hive consoles

Page 31: NoSQL for the SQL Server DBA

Making Sense – Asking Questions

Page 32: NoSQL for the SQL Server DBA

Data Scientists…

Page 33: NoSQL for the SQL Server DBA

Comparing…

Page 34: NoSQL for the SQL Server DBA

Karmasphere Studio for AWS

Page 35: NoSQL for the SQL Server DBA

Hadoop Connector to Excel - Demo

Page 36: NoSQL for the SQL Server DBA

NoSQL To-Do List

Understand CAP & types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem

Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments

Learn noSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.

Page 37: NoSQL for the SQL Server DBA

The Changing Data Landscape

NoSQL

RDBMS

Other Services

Page 38: NoSQL for the SQL Server DBA

www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe, Teach a Kid (Ages 10 ++)
• Java or Microsoft SmallBasic

Page 39: NoSQL for the SQL Server DBA

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions