November 6-9, Seattle, WA
NoSQL for the DBA
@LynnLangit
BigData = Exponentially More Data
Retail Example -> ‘Feedback Economy’
• Number of transactions
• Number of behaviors (collected every minute)
[Chart: purchases, locations, and phone data collected from 12:00 to 2:30; vertical axis 0 to 2500]
BigData = ‘Next State’ Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?
Collecting Behavioral Data
So Why Change?
Hitting (Relational) Walls
• CA: Highly-available consistency
• CP: Enforced consistency
• AP: Eventual consistency
Is NoSQL just Hadoop?
HUGE Hype factor in 2011 / 2012
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
Oracle Loader for Hadoop
SQL Server Connector for Hadoop
Working with Hadoop
Tools / Languages
• Java (JDK) / Eclipse
• MapReduce
  • Map (query/format)
  • Reduce (aggregate)
  • plug-in for Eclipse (Java)
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  • Mahout (analyze)
  • Karmasphere (analyze)
  • R (analyze)
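The Map (query/format) and Reduce (aggregate) steps listed above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop's Java API: the sample `purchases` data and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made up for this example, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict

# Hypothetical sample input: (store, amount) purchase records.
purchases = [("seattle", 20), ("portland", 5), ("seattle", 7), ("portland", 1)]


def map_phase(records):
    """Map: turn each input record into a (key, value) pair."""
    for store, amount in records:
        yield store, amount


def shuffle(pairs):
    """Shuffle: group values by key (a framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(purchases)))
print(totals)
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the framework handles the grouping step and the movement of data between them.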
Demo - Hadoop
The reality…two pivots
Storage Methods
• SQL (RDBMS)
• NoSQL
Storage Locations
• On premises
• Cloud-hosted
So many NoSQL options
More than just the Elephant in the room
120+ types of NoSQL databases
Flavors of NoSQL
Graph Database
Use for data with
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples: Neo4J, FreeBase (Google)
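To make "quickly finding connections" concrete, here is a minimal sketch that stores relationships as an adjacency list and walks them with a breadth-first search. It does not use Neo4J or any graph database API; the `graph` data and the `shortest_path` helper are hypothetical, chosen only to show the kind of query a graph database is built to answer.

```python
from collections import deque

# Hypothetical "who knows whom" data stored as an adjacency list.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of connections, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(graph, "cat", "eve")
print(path)
```

In a relational schema the same question would need one self-join per hop, which is the "recursive self-joins" case named above.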
Column Database
Wide, sparse column sets
Schema-light
Examples:
• Cassandra
• HBase
• BigTable
• GAE HR DS (Google App Engine High Replication Datastore)
More about Column Databases
Type A
• Column-families
• Non-relational
• Sparse
• Examples: HBase, Cassandra, xVelocity (SQL 2012 BISM)
Type B
• Column-stores
• Relational
• Dense
• Example: SQL Server 2012 Columnstore index
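The sparse versus dense distinction between the two types can be pictured with plain Python dictionaries. This is a rough sketch of the data shapes only, not of how HBase, Cassandra, or a columnstore index store data on disk; the sample rows and the `to_column_store` helper are assumptions for illustration.

```python
# Column-family style: each row key holds only the columns it actually has (sparse).
column_family = {
    "user:1": {"name": "Ann", "city": "Seattle"},
    "user:2": {"name": "Bob", "phone": "555-0100"},
}


def to_column_store(rows):
    """Pivot sparse rows into dense, position-aligned columns (None fills the gaps)."""
    columns = sorted({col for row in rows.values() for col in row})
    return {col: [row.get(col) for row in rows.values()] for col in columns}


# Column-store style: one dense list per column, aligned by row position.
column_store = to_column_store(column_family)
print(column_store)
```

The sparse form never stores a value for a column a row lacks, while the dense form keeps a slot for every row in every column, which suits scans and aggregates over a single column.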
Document Database (MongoDB)
Use for data that is
• document-oriented (collection of JSON documents) w/semi-structured data
• Encodings include XML, YAML, JSON & BSON
• binary forms (PDF, Microsoft Office documents such as Word, Excel…)
• Examples: MongoDB, CouchDB
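A collection of semi-structured JSON documents can be sketched without any database at all. The example below is not the MongoDB or CouchDB API: the `collection` contents and the `find` helper (with its dot-separated field paths) are hypothetical, and they only illustrate that documents in one collection need not share the same fields.

```python
import json

# Hypothetical collection: documents in one collection need not share fields.
collection = [
    {"_id": 1, "name": "Ann", "tags": ["dba", "sql"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Seattle"}},
    {"_id": 3, "name": "Cat", "tags": ["nosql"], "address": {"city": "Portland"}},
]


def find(docs, field, value):
    """Return documents whose (possibly nested, dot-separated) field equals value."""
    results = []
    for doc in docs:
        current = doc
        for part in field.split("."):
            # Walk one level down; a missing field or non-dict value yields None.
            current = current.get(part) if isinstance(current, dict) else None
        if current == value:
            results.append(doc)
    return results


matches = find(collection, "address.city", "Seattle")
print(json.dumps(matches))
```

Documents that lack the queried field are simply skipped, which is the practical meaning of "semi-structured" here.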
Demo - MongoDB
Key / Value Database
• Schema-less
• State (Persistent or Volatile)
• Examples
  • AWS DynamoDB
  • Project Voldemort
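The difference between volatile and persistent state can be shown with the Python standard library alone. This sketch is not DynamoDB or Voldemort: it uses a plain `dict` for the volatile case and the `shelve` module for the persistent case, and the key name `session:42` and the temporary file path are made up for the example.

```python
import os
import shelve
import tempfile

# Volatile state: a plain dict lives only as long as the process.
cache = {}
cache["session:42"] = {"user": "ann"}

# Persistent state: shelve (standard library) writes key/value pairs to disk.
path = os.path.join(tempfile.mkdtemp(), "kv_demo")
with shelve.open(path) as store:
    store["session:42"] = {"user": "ann"}

# Reopen the file to show the value survived the first handle being closed.
with shelve.open(path) as store:
    restored = store["session:42"]

print(restored)
```

Either way the store knows nothing about the shape of the value; it only maps a key to an opaque blob, which is what "schema-less" means for this family.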
So which type of NoSQL? Back to CAP…
CP = NoSQL/column
• Hadoop
• Big Table
• HBase
• MemCacheDB
AP = NoSQL/document or key/value
• DynamoDB
• CouchDB
• Cassandra
• Voldemort
CA = SQL/RDBMS
• SQL Server / SQL Azure
• Oracle
• MySQL
Faster! – Move it to memory
• Microsoft
  • xVelocity / PowerPivot / SQL Server 2012 BISM
  • Hekaton (SQL Server future)
• Cloudera
  • Hadoop with Impala
• Google
  • BigQuery / Dremel
• Others
  • MapR and…
  • Dremel -> Drill
  • Redis - NoSQL
What about the cloud?
Cloud-hosted NoSQL up to 50x CHEAPER
Consumer Storage Buckets
• Dropbox
• Box
• Windows SkyDrive
• Google Drive
• Amazon Cloud Drive
• Apple iCloud
Developer BLOB Storage Buckets
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
• Others
Demo - AWS S3 & Glacier
Cloud-hosted RDBMS
AWS RDS – SQL Server, MySQL, Oracle
• Medium cost
• Solid feature set, e.g. backup, snapshot
• Use existing tooling
Google – MySQL
• Lowest cost
• Most limited RDBMS functionality
Microsoft – Windows Azure SQL Database
• Highest cost
• Azure VMs w/MySQL
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
Cloud – RDBMS, NoSQL & Hadoop
Service type | AWS | Google | Microsoft
Cloud RDBMS | SQL Server, Oracle / MySQL | MySQL | SQL Azure
NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Storage
NoSQL databases | DynamoDB | H/R Datastore on GAE | Azure Tables
Streaming Machine Learning | Custom EC2 | Prospective Search & Prediction API | StreamInsight & Mahout with Hadoop
Document or Graph | MongoDB on EC2 | Freebase (g) | MongoDB on Windows Azure
Hadoop | Elastic MapReduce using S3 & EC2 | MapR & GCE | Windows Azure HDInsight
Data sets & other | Karmasphere | Translation API, Full-text search | Azure Marketplace
Pick your mix and then…
NoSQL
• Host locally
• Host in the Cloud
RDBMS
• Host locally
• Host in the Cloud
Other Services
• Use Cloud Data Markets
• Use Cloud ETL
What about me?
Common DBA Tasks in NoSQL
RDBMS | NoSQL
Import Data | Import Data
Set Up Security | Set Up Security
Perform a Backup | Make a copy of the data
Restore a Database | Move a copy to a location
Create an Index | Create an Index
Join Tables Together | Run MapReduce
Schedule a Job | Schedule a (Cron) Job
Run Database Maintenance | Monitor space and resources used
Send an Email from SQL Server | Set up resource threshold alerts
Search BOL | Interpret Documentation
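The "Join Tables Together" versus "Run MapReduce" pairing above can be illustrated with a reduce-side join, one common way to express a join as a MapReduce job. This is a single-process Python sketch rather than Hadoop code; the `customers` and `orders` data and the helper names `map_tagged` and `reduce_join` are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: two "tables" that share a customer id as the join key.
customers = [(1, "Ann"), (2, "Bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]


def map_tagged(customers, orders):
    """Map: emit (join_key, (source_tag, value)) for every record of both inputs."""
    for cid, name in customers:
        yield cid, ("customer", name)
    for cid, item in orders:
        yield cid, ("order", item)


def reduce_join(pairs):
    """Reduce: per key, pair each customer value with each order value."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "customer"]
        items = [value for tag, value in groups[key] if tag == "order"]
        joined.extend((key, name, item) for name in names for item in items)
    return joined


rows = reduce_join(map_tagged(customers, orders))
print(rows)
```

The result matches what an inner join on the customer id would return in SQL; the difference is that the developer writes the grouping and pairing logic instead of the query optimizer choosing it.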
Demo - Hadoop II
Making Sense – Asking Questions
Data Scientists…
Comparing…
Karmasphere Studio for AWS
Demo - Hadoop III
Hadoop Connector to Excel
Demo - Google BigQuery
Google BigQuery
Query as a service
• Uses Dremel – not Hadoop
• Pay to play (i.e. storage / query)
Hive (HQL-style) syntax
• Web interface
• Connector to Excel
Programmable
• Command-line tools
• APIs
NoSQL To-Do List
Understand CAP & types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
The Changing Data Landscape
NoSQL
RDBMS
Other Services
www.TeachingKidsProgramming.org
• Free Courseware
• Do a Recipe Teach a Kid (Ages 10 ++)
• Java or Microsoft SmallBasic
• recipes)
Toward Data Craftsmanship…
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions
PASS Resources
• Free SQL Server and BI training
• Free 1-day Training Events
• Regional Event
• Local and Virtual User Groups
• Free Online Technical Training
• Learning Center
This is Community
Thank you for attending this session and the 2012 PASS Summit in Seattle