Spark the future. May 4 – 8, 2015, Chicago, IL

Transform + analyze Visualize + decide Capture + manage Data


Page 1:

Spark the future.

May 4 – 8, 2015, Chicago, IL

Page 2:

Big Data for the SQL Ninja

Scott Klein, Senior Technical Evangelist

BRK2550

Page 3:

A little about Scott…

Page 4:

Why are you here?
• You want to advance your career
• You want and/or need to learn about big data technologies
• Where is the role of the DBA going?

Page 5:

The Microsoft data platform capabilities

Transform + analyze

Visualize + decide

Capture + manage

Data

Visualize + decide
• Mobile
• Reports
• Natural Language
• Dashboards
• Applications

Complex Event Processing

Transform + analyze
• Orchestration
• Prediction
• Query
• Information management
• Search
• Streaming

Capture + manage
• Relational
• Internal & external
• Non-relational

Page 7:

What is Big Data? It’s all about the V’s…
• Volume …
• Variety …
• Velocity …

Page 8:

Sizes
• Kilo - 1,000
• Mega - 1,000,000
• Giga - 1,000,000,000
• Tera - 1,000,000,000,000
• Peta - 1,000,000,000,000,000
• Exa - 1,000,000,000,000,000,000
• Zetta - 1,000,000,000,000,000,000,000
• Yotta - 1,000,000,000,000,000,000,000,000
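Each prefix in the list is a further factor of 1,000. A minimal Python sketch of that arithmetic follows; the function name `describe_bytes` is made up for this illustration and is not part of any tool shown in the session.

```python
# Decimal prefixes from the slide; each step up is a factor of 1,000.
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def describe_bytes(n: int) -> str:
    """Express n bytes using the largest prefix that keeps the value at 1 or more."""
    value = float(n)
    index = 0
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {PREFIXES[index]}bytes"


print(describe_bytes(2 * 10**9))     # two billion bytes
print(describe_bytes(500 * 10**12))  # five hundred trillion bytes
```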

Page 11:

Some interesting facts
• 72 hours of video are uploaded per minute on YouTube (1 terabyte every 4 minutes)
• 500 terabytes of new data per day are ingested in Facebook databases
• Sensors from a Boeing jet engine create 20 terabytes of data every hour
• The proposed Square Kilometer Array telescope will generate “a few Exabytes of data per day” (single beam)
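The figures above use different time units, so a quick conversion to a common unit makes them easier to compare. This is only a back-of-the-envelope sketch in Python: the inputs are the slide's numbers, and the round-the-clock operation assumed for the jet engine is an assumption of the sketch, not a claim from the session.

```python
TB = 10**12  # one terabyte in bytes, using decimal prefixes
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

# Each quoted rate converted to terabytes per day.
youtube_tb_per_day = MINUTES_PER_DAY // 4  # 1 TB every 4 minutes
facebook_tb_per_day = 500                  # already quoted per day
jet_engine_tb_per_day = 20 * 24            # 20 TB per hour, assuming 24 hours of operation

# Average ingest rate implied by the Facebook figure, in bytes per second.
facebook_bytes_per_second = facebook_tb_per_day * TB / SECONDS_PER_DAY

print(f"YouTube:    {youtube_tb_per_day} TB/day")
print(f"Facebook:   {facebook_tb_per_day} TB/day")
print(f"Jet engine: {jet_engine_tb_per_day} TB/day")
print(f"Facebook average: {facebook_bytes_per_second / 10**9:.2f} GB/s")
```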

Page 12:

Hadoop Ecosystem
• Distributed Storage (HDFS)
• Query (Hive)
• Distributed Processing (MapReduce)
• Scripting (Pig)
• NoSQL Database (HBase)
• Metadata (HCatalog)
• Data Integration (ODBC / Sqoop / REST)
• Relational (SQL Server)
• Machine Learning (Mahout)
• Graph (Pegasus)
• Stats processing (RHadoop)
• Event Pipeline (Flume)
• Active Directory (Security)
• Monitoring & Deployment (System Center)
• C#, F#, .NET
• PowerShell
• Pipeline / workflow (Oozie)
• Azure Storage Vault (ASV)
• APS | PolyBase
• Business Intelligence (Excel, Power View, SSAS)
• World's Data (Azure Data Marketplace)
• Event Driven Processing

Legend
• Red = Core Hadoop
• Blue = Data processing
• Purple = Microsoft integration points and value adds
• Orange = Data Movement
• Green = Packages
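The Distributed Processing (MapReduce) box is easier to picture with a toy example. The sketch below runs the three phases (map, shuffle, reduce) of a word count in a single Python process. It does not use Hadoop or any Hadoop API, and the function names are invented for illustration; a real cluster spreads these phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big ninja"]))
```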

Page 13:

The Hadoop Ecosystem
• ETL Tools
• BI Reporting
• RDBMS

Page 14:

HDInsight

Page 15:

HDInsight
• HDInsight is a Hadoop-based service that brings a 100% Apache Hadoop solution to the Microsoft Azure platform
• Based on the Hortonworks Data Platform (HDP)
• Scalable, on-demand service

Page 16:

RDBMS vs. Hadoop

             RDBMS                      Hadoop
Data size    Gigabytes (Terabytes)      Petabytes (Exabytes)
Access       Interactive and batch      Batch
Updates      Read / write many times    Write once, read many times
Structure    Static schema              Dynamic schema
Integrity    High (ACID)                Low
Scaling      Nonlinear                  Linear
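One common reading of the "Dynamic schema" and "Write once, read many times" rows is schema-on-read: the raw data is stored unchanged, and a structure is applied only when it is read. The Python sketch below illustrates that reading under stated assumptions. The sample data, the column names, and `read_with_schema` are all hypothetical, and nothing here is Hadoop code.

```python
import csv
import io

# Hypothetical raw file contents, written once and never modified.
RAW = "2015-05-04,chicago,42\n2015-05-05,chicago,17\n"


def read_with_schema(raw, schema):
    """Apply a schema (a list of (column name, converter) pairs) at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows


# The same bytes read twice with two different schemas.
as_text = read_with_schema(RAW, [("day", str), ("city", str), ("count", str)])
as_numbers = read_with_schema(RAW, [("day", str), ("city", str), ("count", int)])

print(as_text[0]["count"], type(as_text[0]["count"]).__name__)
print(sum(row["count"] for row in as_numbers))
```

A relational table, by contrast, fixes the column types when the table is created, before any row is written.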

Page 17:

Storage
Two choices
• Azure Storage (Blob)
• File System

Page 18:

Demo

Page 19:

Now What?
Working with your HDInsight cluster – running jobs, import/export data, viewing and consuming data…
• .NET
• Java
• Hive
• Sqoop
• Pig
• Storm / Stream Analytics
• Excel
• Etc.

Page 20:

Hive

Page 21:

What is Hive?
• A data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis
• Provides a SQL-like language called HiveQL to query data
• Integration between Hadoop and BI and visualization tools

http://hive.apache.org
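To give a feel for how close HiveQL is to SQL, the Python sketch below holds a small HiveQL query as a string and then computes the same summary in plain Python over a few in-memory rows. The `logs` table, its columns, and the sample rows are invented for this illustration; the query is not taken from the session demo, and the script never contacts a cluster.

```python
from collections import Counter

# Hypothetical HiveQL: the table and column names are assumptions for illustration.
HIVEQL = """
SELECT level, COUNT(*) AS events
FROM logs
GROUP BY level
"""

# Hypothetical rows standing in for the contents of the `logs` table.
LOGS = [
    {"level": "ERROR", "message": "disk full"},
    {"level": "INFO", "message": "job started"},
    {"level": "ERROR", "message": "node lost"},
]


def count_by_level(rows):
    """Compute in plain Python the summary the query above asks Hive for."""
    return dict(Counter(row["level"] for row in rows))


print(HIVEQL.strip())
print(count_by_level(LOGS))
```

On a cluster, Hive would turn the query into distributed jobs over files in storage instead of looping over a list in memory.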

Page 22:

Demo

Page 23:

Sqoop

Page 24:

What is Sqoop?
Command-line interface application to transfer bulk data between Hadoop and relational databases

http://sqoop.apache.org
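Because Sqoop is driven from the command line, an import is described entirely by its arguments. The Python sketch below only assembles and prints such a command; it does not run Sqoop. The server, database, table, and directory names are placeholders invented for this example, and password handling is left out on purpose.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir, mappers=4):
    """Build the argument list for a basic `sqoop import` of one table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source database
        "--username", username,
        "--table", table,               # relational table to copy
        "--target-dir", target_dir,     # destination directory in Hadoop storage
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


# Placeholder values; none of these names come from the session.
args = sqoop_import_args(
    "jdbc:sqlserver://example-server:1433;databaseName=SalesDb",
    "etl_user",
    "Orders",
    "/data/orders",
)
print(" ".join(shlex.quote(arg) for arg in args))
```

The direction can be reversed with `sqoop export`, which writes files from Hadoop back into a relational table.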

Page 25:

Demo

Page 26:

Storm

Page 27:

What is Storm?
• Apache Storm is a distributed, fault-tolerant, open-source real-time event processing solution for large, fast streams of data
• HDInsight provides a fully managed Apache Storm on Azure

http://storm.apache.org/
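The key difference from batch processing is that each event is handled as it arrives, and results are updated continuously. The Python sketch below shows that idea with a generator that emits an updated count after every event. It is a single-process illustration only: it does not use the Storm API, and `running_counts` is a name made up for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Process events one at a time, emitting the updated count after each one."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        yield event, counts[event]


# A short finite list stands in for an unbounded stream of sensor events.
for event, count in running_counts(["engine-1", "engine-2", "engine-1"]):
    print(event, count)
```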

Page 28:

Demo

Page 29:

HBase

Page 30:

NoSQL?

“No” SQL = Not Only SQL

Page 31:

What is HBase?
• Open-source, distributed, non-relational database
• Column-oriented key-value store built to run on top of Hadoop HDFS

http://hbase.apache.org
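For readers coming from relational tables, the data model is the main adjustment: values are addressed by a row key plus a column name written as family:qualifier, rows need not share the same columns, and rows are kept sorted by key so that key ranges can be scanned. The Python sketch below mimics that model in memory. `TinyTable` and the sample keys are hypothetical; this is not the HBase client API and it ignores cell versions and distribution.

```python
class TinyTable:
    """In-memory stand-in for a table addressed by row key and family:qualifier."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        """Store one cell; rows may have different sets of columns."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Return all cells of one row, or an empty dict if the row is absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield rows in key order for start <= key < stop."""
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, dict(self._rows[key])


table = TinyTable()
table.put("user#002", "info:name", "Bo")
table.put("user#001", "info:name", "Al")
table.put("user#001", "stats:visits", 3)

print(table.get("user#001"))
print(list(table.scan("user#001", "user#002")))
```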

Page 32:

Demo

Page 34:

Summary
• Big data isn’t scary
• You can use technologies and languages you are already familiar with
• The role of the DBA

Page 35:

HDInsight – Call to Action

Key Sessions at Ignite
• BRK3555 - Real-Time Analytics at Scale for Internet of Things
• BRK2550 - Big Data for the SQL Ninja
• BRK2576 - Planning your Big Data Architecture on Azure
• BRK3556 - Optimizing Hadoop using Microsoft Azure HDInsight
• BRK3559 - Build Hybrid Big Data Pipelines with Azure Data Factory and Azure HDInsight

Sign up for HDInsight Free Trial: http://azure.com/hdinsight

Sign up for Azure Data Lake Preview: http://azure.com/datalake

Page 36:

Ignite Azure Challenge Sweepstakes

Attend Azure sessions and activities, track your progress online, win raffle tickets for great prizes!

Aka.ms/MyAzureChallenge

Enter this session code online: “TZDL”

NO PURCHASE NECESSARY. Open only to event attendees. Winners must be present to win. Game ends May 9th, 2015. For Official Rules, see The Cloud and Enterprise Lounge or myignite.com/challenge

Page 37:

Questions?

Page 38:

Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.

Please evaluate this session
Your feedback is important to us!

Page 39:

© 2015 Microsoft Corporation. All rights reserved.