What’s all the Buzz about Hadoop and Hive? Why it Matters for SQL Server Peeps. Cindy Gross, Microsoft SQLCAT PM. @SQLCindy | [email protected] | http://blogs.msdn.com/cindygross. The Plan: Big Data and SQLCAT/CX at PASS Summit; Overview of Big Data, Hadoop, Hive
What’s all the Buzz about Hadoop and Hive?
Why it Matters for SQL Server Peeps
Cindy Gross, Microsoft SQLCAT PM
@SQLCindy | [email protected] | http://blogs.msdn.com/cindygross
The Plan
• Big Data and SQLCAT/CX at PASS Summit
• Overview of Big Data, Hadoop, Hive
• Why SQL Pros Care
• Next Steps
PASS Summit - SQLCAT: Big Data – All Abuzz About Hive [BIA-305-A]
Speakers – Dipti Sangani and Cindy Gross
• Gain BI insights with HiveQL over HDFS/Hadoop
• How HiveQL generates MapReduce and outputs data
• Related, familiar tools
• How and when to use Hive
Microsoft Big Data at PASS Summit
• Harnessing Big Data with Hadoop – Mike Flasko
• SQLCAT: Big Data Warehousing – Len Wyatt, James Podgorski
• How Klout Changed the Landscape of Social Media with Hadoop and BI – Denny Lee, Dave Mariani
• MAD About Data: Solve Problems and Develop a “Data Driven Mindset” – Darwin Schweitzer
SQLCAT – Customer Experience (CX)
• Implement leading edge features
• Share lessons learned with the community
• Change the product based on real customer experiences
Microsoft SQLCAT at PASS Summit – HA/DR
• SQLCAT: Real-World Case Study of Mission-Critical Active/Active Remote DCs - Lindsey Allen, Prem Mehra
• SQLCAT: AlwaysOn Unplugged: Everything You Want to Know About AlwaysOn - Sanjay Mishra
• SQLCAT: AlwaysOn HA/DR Design Patterns, Architectures and Best Practices - Sanjay Mishra
• SQLCAT: SQL Server 2012 AlwaysOn HA/DR Customer Panel - Sanjay Mishra, Ayad Shammout, David Smith, Michael Steineke, Thomas Grohser, Wolfgang Kutschera
Microsoft SQLCAT at PASS Summit - Azure
• SQLCAT: How Do I Troubleshoot My Database Now that It Is in the Cloud? - Silvano Coriani, Ewan Fairweather
• SQLCAT: SQL Azure Design Patterns and Best Practices - Gus Apostol
• SQLCAT: How SQL Azure Supports Large-Scale Customer Deployments - Silvano Coriani, Ewan Fairweather, Mark Simms, Michael Thomassy, Nicholas Dritsas
• SQLCAT: What Are the Largest Azure Projects in the World? - Kevin Cox
• SQLCAT: Best Practices for SQL Server in Azure VMs: config & performance - Steven Howard
Microsoft SQLCAT at PASS Summit – More!
• SQLCAT: Configuring Kerberos for SharePoint 2010 BI in 7 Steps - Chuck Heinzelman
• SQLCAT: What Are the Largest SQL Server Projects in the World? - Kevin Cox, Ewan Fairweather, Mark Souza
• SQLCAT: SQLOS Memory Manager Changes in SQL Server 2012 - Gus Apostol, Jerome Halmans
• SQLCAT: How Does Microsoft Run Its SAP Landscape on Windows and SQL Server? - Juergen Thomas
• SQLCAT: Many-Core Processors, SSDs, Large Memory: How to Benefit SQL Server - Juergen Thomas
• SQLCAT: Running Reporting Services in SharePoint Integrated Mode: How and Why – Chuck Heinzelman
• SQLCAT: Case Study of Big Data in the Real World – Lindsey Allen, Lou Sawyer, Robert Abbott, Shep Sheppard
SQL Server Clinic
Got a Burning SQL Server Architecture Question?
Want to talk to someone about a problem you’re seeing on your servers?
Stop by the SQL Server Clinic at the PASS Summit this fall!
Members of SQLCAT and CSS will be on hand to talk to you about your issues!
NEW EXPANDED LOCATION
4th Floor, Across from the PASS Booth
What is Big Data
• Find Insights - Explore, test, eliminate noise
• Schema on Read, not Schema on Write
• Structure may not be fully pre-defined
• Scale out on commodity hardware – pay as you go
• BASE instead of ACID
• More programmer, lone wolf focused
• MapReduce, streaming, machine learning, massively parallel processing
• Something too big or complex for your current environment and resources to handle in a cost effective manner
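To make "Schema on Read, not Schema on Write" concrete, here is a minimal Python sketch. It is not Hadoop or Hive code. The log lines, field names, and the `read_with_schema` helper are all hypothetical. The raw text is stored untouched, and each reader projects its own structure onto it at read time, skipping rows that do not fit.

```python
import csv
import io

# Hypothetical raw clickstream lines, stored as-is with no schema enforced on write.
RAW_LOG = """2012-10-01T08:15:00,alice,/home,200
2012-10-01T08:15:02,bob,/search,200,extra-field
malformed line without commas
2012-10-01T08:15:09,alice,/cart,500
"""

def read_with_schema(raw_text, schema):
    """Apply a list of (name, converter) pairs while reading; skip rows that do not fit."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        if len(fields) < len(schema):
            continue  # noise: too few fields for this interpretation
        try:
            # zip() ignores any extra trailing fields the schema does not name.
            rows.append({name: convert(value)
                         for (name, convert), value in zip(schema, fields)})
        except ValueError:
            continue  # noise: a field could not be converted
    return rows

# Two different schemas projected onto the same stored bytes.
status_schema = [("ts", str), ("user", str), ("page", str), ("status", int)]
user_schema = [("ts", str), ("user", str)]

status_rows = read_with_schema(RAW_LOG, status_schema)
user_rows = read_with_schema(RAW_LOG, user_schema)
```

Under schema on write, the malformed line and the five-field line would have been rejected or fixed at load time. Here they stay in storage and each query decides what counts as noise.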
Why Use Big Data – Use Cases
Telemetry Management
• Clickstream and Application Log Analysis
• Sensor Data
IT Management
• SLA Monitoring
• Cyber Security
• Forensic Analysis
Online Commerce
• Sentiment Analysis
• Recommendation Engines
• Search Indexing / Quality
Financial Services
• Risk Modeling
• Threat Analysis
• Fraud Detection
• Credit Scoring
VVVVroom!
Variability – Multiple interpretations
Velocity – Need decisions fast
Variety – Many formats
Volume – beyond what environment can handle
What is Hadoop
• Most common Big Data technology
• Powerful tool leading to insights
• Open Source
• Core – HDFS (storage) and MapReduce (send compute to data)
• Hadoop Ecosystem
• Trivia – Where did the name Hadoop come from?
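The MapReduce half of the core can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not the Hadoop API. The function names and sample lines are made up for the example. In a real cluster the mappers run on the nodes that already hold the HDFS blocks, which is what "send compute to data" refers to.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = run_job([
    "big data big insights",
    "hive on hadoop",
    "hadoop stores big data",
])
```

Each mapper only sees its own line and each reducer only sees one key, so both phases can be spread across many machines without coordination beyond the shuffle.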
Hadoop Ecosystem Snapshot
• ETL Tools, BI Reporting, RDBMS
• Pig (Data Flow), Hive (SQL / DW), Sqoop (SSIS)
• MapReduce (Job Scheduling / Execution System)
• HBase (Column DB), Cassandra (Column DB)
• Mahout (ML), Lucene/Solr (search indexing), HCatalog
• External Stores (S3, Azure Blobs, Azure Data Market, etc.)
What is Hive
• Direct queries to Hadoop file system
• Data warehousing framework on top of Hadoop
• Structure without full relational modeling
• Familiar-looking HiveQL using metadata
• Generates/runs MapReduce code (not faster than MR!)
Why Use Hive
• Easy to use if you know SQL!
• Makes Hadoop cross-correlations, joins, filters easier
• Allows storage of intermediate results for faster/easier querying
• Still slower than a relational database
• Limited indexing, basically no statistics, caching or query optimizer
• Append only
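To show why "joins easier" matters, here is a rough Python sketch of a join written by hand in MapReduce style (a reduce-side join). The tables, column names, and functions are hypothetical, and everything runs in one process. With Hive, the same intent would be roughly one statement, shown in the leading comment for comparison only.

```python
from collections import defaultdict

# Roughly equivalent HiveQL (hypothetical table and column names):
#   SELECT u.name, SUM(o.amount)
#   FROM orders o JOIN users u ON (o.user_id = u.user_id)
#   GROUP BY u.name;

# Hypothetical inputs: two "tables" held as lists of tuples.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1, 25.0), (102, 2, 10.0), (103, 1, 5.5)]

def map_users(record):
    """Mapper for users: key by user_id and tag the value with its source table."""
    user_id, name = record
    yield user_id, ("U", name)

def map_orders(record):
    """Mapper for orders: key by user_id and tag the value with its source table."""
    _order_id, user_id, amount = record
    yield user_id, ("O", amount)

def reduce_join(user_id, tagged_values):
    """Reducer: pair the user row with its order rows for one user_id (inner join)."""
    names = [value for tag, value in tagged_values if tag == "U"]
    amounts = [value for tag, value in tagged_values if tag == "O"]
    for name in names:
        if amounts:  # users with no orders produce no output
            yield name, sum(amounts)

def run_join(user_rows, order_rows):
    """Map both inputs, group by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in user_rows:
        for key, value in map_users(record):
            groups[key].append(value)
    for record in order_rows:
        for key, value in map_orders(record):
            groups[key].append(value)
    result = {}
    for key, values in groups.items():
        for name, total in reduce_join(key, values):
            result[name] = result.get(name, 0.0) + total
    return result

totals = run_join(users, orders)
```

The tagging and regrouping is the boilerplate Hive generates for you. It does not make the underlying MapReduce work any faster.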
Who Plays with Big Data?
• Data Scientists, Data Teams (DBAs, Devs, End Users, Statistics Experts)
• Data Stewards, Data Curators (DBAs, specialists)
• Infrastructure Admins – Hardware, Network, Windows, Database
• Business/Data Analysts
• BI Developers and BI Solution Architects
• IT Pros
Big Data Plus SQL Server
• Extract / Import between SQL, AS, Hadoop (especially Hive)
• Tools like PowerPivot, Power View, Excel can mashup data from many sources such as SQL + Hive + DB2
• Explore in Hadoop, Productionalize in SQL Server or AS
• Only put full structuring, cleansing effort into the most valuable data
• Refine algorithms
• Quick prototyping of data hypotheses
• Use AS as index into Hadoop data
• Archive SQL data into Hadoop (never lose data, store cheaply)
SQL Server is a Great Fit If…
• Updates
• Filters, Joins, Subsets – Indexes and Optimizer!
• You’ve already put effort into structure
• You know what you need to know
• Fast responses to individual queries
• Not looking at entire data collection
• ACID matters
• Many, many, many existing and future applications
A Day in the Life - Developer
• Write HQL, Pig, MapReduce
• Rapid development
• Lots of ad hoc code
• Data Cleansing
A Day in the Life – DBA / Infrastructure
• Backups – probably none
• Data loads (in and out) / ETL – frequent, often changing
• Archive
• Data Curation, Cleansing
• Cloud / Elasticity management
• System management (installs, troubleshooting, performance, monitoring, trending, planning, hardware)
• Write HQL
A Day in the Life – BI Expert
• Explore data, especially unknown unknowns
• Mashup data from many systems including Hive
• Visualize data for Insights that change the business
• Integrate with other systems
• Write HQL
• Bring Hive data into apps, reports
• Do statistical analysis, modeling with R, Mahout
Why Get Involved Now
• It’s the cool kid on the block – don’t underestimate this!
• Help design the future
• Cutting edge
• Rare skill - few experts, you’ll stand out
• Shows initiative
• Understand when SQL or AS really is the better solution and/or a complementary solution
• PBs and EBs and ZBs and YBs!
Microsoft Big Data Roadmap
• Hadoop, Hive, PowerPivot, Power View, Hive ODBC Driver, Analysis Services, PDW, StreamInsight, SQL Server, Excel, Sqoop, JavaScript
• CTP - HadoopOnAzure now
• Plans for Azure and on-premise Windows based Hadoop
• Adapt existing code, add to the ecosystem
• Look for exciting announcements soon!
Demo
Next Steps
• Read a bit
http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx
http://blogs.msdn.com/cindygross
• Play around
http://HadoopOnAzure.com
• Think about how you can fit Big Data into your company data strategy, and when it’s not a good fit
• Get involved - Suggest uses, be prepared to combat misuses
• Sign up for PASS Summit 2012!
• Then sign up for SQLCAT: Big Data – All Abuzz About Hive [BIA-305-A]
Summary
• Big Data and SQLCAT/CX at PASS Summit
• Overview of Big Data, Hadoop, Hive
• Why SQL Pros Care
• Next Steps
Questions?
Thank You for Attending