What’s all the Buzz about Hadoop and Hive? Why it Matters for SQL Server Peeps
Cindy Gross, Microsoft SQLCAT PM
@SQLCindy | [email protected] | http://blogs.msdn.com/cindygross


Page 1: What’s  all the Buzz about  Hadoop  and Hive?

Global Sponsors:

What’s all the Buzz about Hadoop and Hive?
Cindy Gross, Microsoft SQLCAT PM
@SQLCindy | [email protected] | http://blogs.msdn.com/cindygross

Why it Matters for SQL Server Peeps

Page 2: What’s  all the Buzz about  Hadoop  and Hive?

The Plan

• Big Data and SQLCAT/CX at PASS Summit
• Overview of Big Data, Hadoop, Hive
• Why SQL Pros Care
• Next Steps

Page 3: What’s  all the Buzz about  Hadoop  and Hive?

PASS Summit - SQLCAT: Big Data – All Abuzz About Hive [BIA-305-A]

Speakers – Dipti Sangani and Cindy Gross

• Gain BI insights with HiveQL over HDFS/Hadoop
• How HiveQL generates MapReduce and outputs data
• Related, familiar tools
• How and when to use Hive

Page 5: What’s  all the Buzz about  Hadoop  and Hive?

SQLCAT – Customer Experience (CX)

• Implement leading edge features
• Share lessons learned with the community
• Change the product based on real customer experiences

Page 7: What’s  all the Buzz about  Hadoop  and Hive?

Microsoft SQLCAT at PASS Summit - Azure

• SQLCAT: How Do I Troubleshoot My Database Now that It Is in the Cloud? - Silvano Coriani, Ewan Fairweather
• SQLCAT: SQL Azure Design Patterns and Best Practices - Gus Apostol
• SQLCAT: How SQL Azure Supports Large-Scale Customer Deployments - Silvano Coriani, Ewan Fairweather, Mark Simms, Michael Thomassy, Nicholas Dritsas
• SQLCAT: What Are the Largest Azure Projects in the World? - Kevin Cox
• SQLCAT: Best Practices for SQL Server in Azure VMs: config & performance - Steven Howard

Page 8: What’s  all the Buzz about  Hadoop  and Hive?

Microsoft SQLCAT at PASS Summit – More!

• SQLCAT: Configuring Kerberos for SharePoint 2010 BI in 7 Steps - Chuck Heinzelman
• SQLCAT: What Are the Largest SQL Server Projects in the World? - Kevin Cox, Ewan Fairweather, Mark Souza
• SQLCAT: SQLOS Memory Manager Changes in SQL Server 2012 - Gus Apostol, Jerome Halmans
• SQLCAT: How Does Microsoft Run Its SAP Landscape on Windows and SQL Server? - Juergen Thomas
• SQLCAT: Many-Core Processors, SSDs, Large Memory: How to Benefit SQL Server - Juergen Thomas
• SQLCAT: Running Reporting Services in SharePoint Integrated Mode: How and Why – Chuck Heinzelman
• SQLCAT: Case Study of Big Data in the Real World – Lindsey Allen, Lou Sawyer, Robert Abbott, Shep Sheppard

Page 9: What’s  all the Buzz about  Hadoop  and Hive?

SQL Server Clinic

• Got a Burning SQL Server Architecture Question?
• Want to talk to someone about a problem you’re seeing on your servers?
• Stop by the SQL Server Clinic at the PASS Summit this fall!
• Members of SQLCAT and CSS will be on hand to talk to you about your issues!

NEW EXPANDED LOCATION
4th Floor
Across from the PASS Booth

Page 10: What’s  all the Buzz about  Hadoop  and Hive?

What is Big Data

• Find Insights - Explore, test, eliminate noise
• Schema on Read, not Schema on Write
• Structure may not be fully pre-defined
• Scale out on commodity hardware – pay as you go
• BASE instead of ACID
• More programmer, lone wolf focused
• MapReduce, streaming, machine learning, massively parallel processing
• Something too big or complex for your current environment and resources to handle in a cost effective manner
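To make "Schema on Read, not Schema on Write" concrete, here is a minimal Python sketch. It is an illustration only: the log lines, the field names, and the `read_with_schema` helper are all hypothetical and do not come from the talk. The raw text is stored untouched, and a structure is applied only when the data is read.

```python
import csv
import io

# Hypothetical raw log lines, stored exactly as they arrived.
# Nothing validated them on write, so some rows are messy.
RAW_LINES = """2012-09-10,home,120
2012-09-10,search,not_a_number
2012-09-11,home,95,extra_field
"""

def read_with_schema(raw_text, schema):
    """Apply (name, converter) pairs to each raw row at read time.

    Missing or unconvertible values become None instead of causing the
    row to be rejected; extra trailing fields are ignored.
    """
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for i, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows

# Two different structures projected over the same stored bytes.
clicks_schema = [("day", str), ("page", str), ("duration_ms", int)]
pages_only_schema = [("day", str), ("page", str)]

clicks = read_with_schema(RAW_LINES, clicks_schema)
pages = read_with_schema(RAW_LINES, pages_only_schema)
```

The same stored text serves both readers; a schema-on-write system would have had to reject or repair the second and third lines before accepting them.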

Page 11: What’s  all the Buzz about  Hadoop  and Hive?

Why Use Big Data – Use Cases

Telemetry Management
• Clickstream and Application Log Analysis
• Sensor Data

IT Management
• SLA Monitoring
• Cyber Security
• Forensic Analysis

Online Commerce
• Sentiment Analysis
• Recommendation Engines
• Search Indexing / Quality

Financial Services
• Risk Modeling
• Threat Analysis
• Fraud Detection
• Credit Scoring

Page 12: What’s  all the Buzz about  Hadoop  and Hive?

VVVVroom!

• Variability – Multiple interpretations
• Velocity – Need decisions fast
• Variety – Many formats
• Volume – Beyond what the environment can handle

Page 13: What’s  all the Buzz about  Hadoop  and Hive?

What is Hadoop

• Most common Big Data technology
• Powerful tool leading to insights
• Open Source
• Core – HDFS (storage) and MapReduce (send compute to data)
• Hadoop Ecosystem
• Trivia – Where did the name Hadoop come from?
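The MapReduce half of the core can be sketched in a few lines of plain Python. This is a single-process toy, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are assumptions made for illustration. In a real cluster the map tasks are sent to the nodes that hold the HDFS blocks, and the shuffle moves data across the network.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = run_job(["big data big insights", "hive on hadoop", "big hadoop"])
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle be spread over many machines, which is what lets the model scale out on commodity hardware.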

Page 14: What’s  all the Buzz about  Hadoop  and Hive?

Hadoop Ecosystem Snapshot

• MapReduce (Job Scheduling / Execution System)
• ETL Tools, BI Reporting, RDBMS
• Pig (Data Flow), Hive (SQL / DW), Sqoop (SSIS)
• HBase (Column DB)
• Mahout (ML), Lucene/Solr (search indexing), HCatalog
• Cassandra (Column DB)
• External Stores (S3, Azure Blobs, Azure Data Market, etc.)

Page 15: What’s  all the Buzz about  Hadoop  and Hive?

What is Hive

• Direct queries to Hadoop file system
• Data warehousing framework on top of Hadoop
• Structure without full relational modeling
• Familiar-looking HiveQL using metadata
• Generates/runs MapReduce code (not faster than MR!)
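As a rough picture of "Generates/runs MapReduce code", the sketch below pairs a HiveQL-style query with a hand-written Python equivalent. Everything here is assumed for illustration: the `weblogs` table, its columns, and the sample rows are hypothetical, and the plan Hive actually generates may look quite different. The point is only the general correspondence: the WHERE filter can run in the map step, the GROUP BY column becomes the shuffle key, and the aggregates are computed in the reduce step.

```python
from collections import defaultdict

# Hypothetical query over an assumed table weblogs(page, status, bytes_sent).
HIVEQL = """
SELECT page, COUNT(*) AS hits, SUM(bytes_sent) AS total_bytes
FROM weblogs
WHERE status = 200
GROUP BY page
"""

# Hypothetical rows standing in for files in HDFS.
ROWS = [
    ("/home", 200, 512),
    ("/home", 404, 0),
    ("/search", 200, 2048),
    ("/home", 200, 256),
]

def mapper(row):
    """Map: apply the WHERE filter and key each row by the GROUP BY column."""
    page, status, bytes_sent = row
    if status == 200:
        yield page, (1, bytes_sent)

def reducer(page, values):
    """Reduce: compute COUNT(*) and SUM(bytes_sent) for one page."""
    hits = sum(v[0] for v in values)
    total_bytes = sum(v[1] for v in values)
    return page, hits, total_bytes

def run(rows):
    """Group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return sorted(reducer(k, v) for k, v in groups.items())

result = run(ROWS)
```

Since the query ends up as a MapReduce job either way, Hive saves the effort of writing the job, not its run time.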

Page 16: What’s  all the Buzz about  Hadoop  and Hive?

Why Use Hive

• Easy to use if you know SQL!
• Makes Hadoop cross-correlations, joins, filters easier
• Allows storage of intermediate results for faster/easier querying
• Still slower than a relational database
• Limited indexing, basically no statistics, caching or query optimizer
• Append only
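To show why joins are one of the things Hive makes easier, here is a sketch of a reduce-side join written by hand in Python. The `USERS` and `CLICKS` data and all function names are hypothetical. In HiveQL the same result would be a single inner JOIN between two tables on `user_id`; without Hive, the developer has to tag each record with its source and pair the records up in the reducer.

```python
from collections import defaultdict

# Hypothetical inputs: (user_id, name) and (user_id, page).
USERS = [(1, "ana"), (2, "raj")]
CLICKS = [(1, "/home"), (2, "/search"), (1, "/cart"), (3, "/home")]

def tag_records(users, clicks):
    """Map: key every record by user_id and tag it with its source."""
    for user_id, name in users:
        yield user_id, ("U", name)
    for user_id, page in clicks:
        yield user_id, ("C", page)

def join_reducer(user_id, tagged_values):
    """Reduce: pair every user record with every click for the same key."""
    names = [value for tag, value in tagged_values if tag == "U"]
    pages = [value for tag, value in tagged_values if tag == "C"]
    for name in names:
        for page in pages:
            yield user_id, name, page

def reduce_side_join(users, clicks):
    """Group tagged records by key, then join within each group."""
    groups = defaultdict(list)
    for key, tagged in tag_records(users, clicks):
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(join_reducer(key, groups[key]))
    return joined

joined = reduce_side_join(USERS, CLICKS)
```

The click for user 3 has no matching user record and drops out, as it would in an inner join.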

Page 17: What’s  all the Buzz about  Hadoop  and Hive?

Who Plays with Big Data?

• Data Scientists, Data Teams (DBAs, Devs, End Users, Statistics Experts)
• Data Stewards, Data Curators (DBAs, specialists)
• Infrastructure Admins – Hardware, Network, Windows, Database
• Business/Data Analysts
• BI Developers and BI Solution Architects
• IT Pros

Page 18: What’s  all the Buzz about  Hadoop  and Hive?

Big Data Plus SQL Server

• Extract / Import between SQL, AS, Hadoop (especially Hive)
• Tools like PowerPivot, Power View, Excel can mashup data from many sources such as SQL + Hive + DB2
• Explore in Hadoop, Productionalize in SQL Server or AS
• Only put full structuring, cleansing effort into the most valuable data
• Refine algorithms
• Quick prototyping of data hypotheses
• Use AS as index into Hadoop data
• Archive SQL data into Hadoop (never lose data, store cheaply)

Page 19: What’s  all the Buzz about  Hadoop  and Hive?

SQL Server is a Great Fit If…

• Updates
• Filters, Joins, Subsets – Indexes and Optimizer!
• You’ve already put effort into structure
• You know what you need to know
• Fast responses to individual queries
• Not looking at entire data collection
• ACID matters
• Many, many, many existing and future applications

Page 20: What’s  all the Buzz about  Hadoop  and Hive?

A Day in the Life - Developer

• Write HQL, Pig, MapReduce
• Rapid development
• Lots of ad hoc code
• Data Cleansing

Page 21: What’s  all the Buzz about  Hadoop  and Hive?

A Day in the Life – DBA / Infrastructure

• Backups – probably none
• Data loads (in and out) / ETL – frequent, often changing
• Archive
• Data Curation, Cleansing
• Cloud / Elasticity management
• System management (installs, troubleshooting, performance, monitoring, trending, planning, hardware)
• Write HQL

Page 22: What’s  all the Buzz about  Hadoop  and Hive?

A Day in the Life – BI Expert

• Explore data, especially unknown unknowns
• Mashup data from many systems including Hive
• Visualize data for Insights that change the business
• Integrate with other systems
• Write HQL
• Bring Hive data into apps, reports
• Do statistical analysis, modeling with R, Mahout

Page 23: What’s  all the Buzz about  Hadoop  and Hive?

Why Get Involved Now

• It’s the cool kid on the block – don’t underestimate this!
• Help design the future
• Cutting edge
• Rare skill - few experts, you’ll stand out
• Shows initiative
• Understand when SQL or AS really is the better solution and/or a complementary solution
• PBs and EBs and ZBs and YBs!

Page 24: What’s  all the Buzz about  Hadoop  and Hive?

Microsoft Big Data Roadmap

• Hadoop, Hive, PowerPivot, Power View, Hive ODBC Driver, Analysis Services, PDW, StreamInsight, SQL Server, Excel, Sqoop, JavaScript
• CTP - HadoopOnAzure now
• Plans for Azure and on-premise Windows based Hadoop
• Adapt existing code, add to the ecosystem
• Look for exciting announcements soon!

Page 25: What’s  all the Buzz about  Hadoop  and Hive?


Demo

Page 26: What’s  all the Buzz about  Hadoop  and Hive?

Next Steps

• Read a bit
  http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx
  http://blogs.msdn.com/cindygross
• Play around
  http://HadoopOnAzure.com
• Think about how you can fit Big Data into your company data strategy, and when it’s not a good fit
• Get involved - Suggest uses, be prepared to combat misuses
• Sign up for PASS Summit 2012!
• Then sign up for SQLCAT: Big Data – All Abuzz About Hive [BIA-305-A]

Page 27: What’s  all the Buzz about  Hadoop  and Hive?

Summary

• Big Data and SQLCAT/CX at PASS Summit
• Overview of Big Data, Hadoop, Hive
• Why SQL Pros Care
• Next Steps

Page 28: What’s  all the Buzz about  Hadoop  and Hive?

Global Sponsors:

Questions?

What’s all the Buzz about Hadoop and Hive?
Cindy Gross, Microsoft SQLCAT PM
@SQLCindy | [email protected] | http://blogs.msdn.com/cindygross

Why it Matters for SQL Server Peeps

Page 29: What’s  all the Buzz about  Hadoop  and Hive?

Global Sponsors:

Thank You for Attending