AWS Roadshow Fall 2013: Data Analysis and Business Intelligence

AWS Roadshow 2013: Above the Clouds, Free Your IT

Michael Hanisch, Mgr. Solutions Architecture

Matthias Jung, Solutions Architect

Constantin Gonzalez, Solutions Architect

Overview

1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

Introducing Big Data

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

The cost of data generation is falling

The volume of data is increasing

Generation: lower cost, higher throughput

Highly constrained

Chart: data volume, generated data vs. data available for analysis

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= Remove constraints

Generation

Highly constrained → Accelerated

Big Data

Technologies and techniques for working productively with data, at any scale.

From data to actionable information

“Who buys video games?”

Per day:

3.5 billion records

13 TB of click stream logs

71 million unique cookies

Results:

500% return on ad spend

From 2 months procurement time to a few minutes

“Who is using our service?”

Identified early mobile usage

Invested heavily in mobile development

Finding signal in the noise of logs

In January 2013:

9,432,061 unique mobile devices used the Yelp mobile app.

4 million+ calls. 5 million+ directions.

Analytics and Cloud Computing

Generation

Collection & storage: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase

Analytics & computation: EC2 & Elastic MapReduce

Collaboration & sharing: EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift

AWS Data Pipeline

Simple Storage Service

Elastic MapReduce

What is EMR?

Map-Reduce engine

Integrated with tools

Hadoop-as-a-service

Massively parallel

Cost-effective AWS wrapper

Integrated with AWS services

How does it work?

1. Put the data into S3 (or HDFS)

2. Launch your cluster. Choose:
   • Hadoop distribution
   • How many nodes
   • Node type (hi-CPU, hi-memory, etc.)
   • Hadoop apps (Hive, Pig, HBase)

3. Get the results
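The choices made when launching a cluster (distribution, node count, node type, Hadoop apps) can be sketched as a request for the `run_job_flow` call of the boto3 EMR client. This is only an illustration under assumptions: the release label, instance type, bucket name and the default role names are placeholders that do not come from the slides, and current EMR uses release labels where the slide says "Hadoop distribution".

```python
def build_emr_request(name, node_count, node_type, apps, log_uri):
    """Assemble keyword arguments for boto3's EMR run_job_flow call."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return {
        "Name": name,
        "LogUri": log_uri,
        # Assumed release label; it selects the bundled Hadoop version.
        "ReleaseLabel": "emr-5.36.0",
        # Hadoop apps to install, e.g. Hive, Pig, HBase.
        "Applications": [{"Name": app} for app in apps],
        "Instances": {
            "MasterInstanceType": node_type,
            "SlaveInstanceType": node_type,
            "InstanceCount": node_count,
            # Shut the cluster down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Assumed default role names; adjust to your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request):
    """Submit the request; needs boto3 and AWS credentials."""
    import boto3  # third-party, imported only when actually launching

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


# Hypothetical example values (bucket and instance type are placeholders).
request = build_emr_request(
    "clickstream-analysis", 4, "m5.xlarge", ["Hive", "Pig"],
    "s3://example-bucket/emr-logs/",
)
```

Building the request separately from submitting it keeps the configuration easy to inspect before any cluster is started.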

You can easily resize the cluster

Use Spot nodes to save time and money
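As a rough sketch of adding Spot capacity, the helper below builds an instance group definition in the shape accepted by the boto3 EMR call `add_instance_groups`. The group name, instance type, node count and maximum price are hypothetical values chosen for illustration, not figures from the slides.

```python
def spot_task_group(node_type, count, max_price_usd):
    """Describe a group of task nodes bought on the Spot market."""
    return {
        "Name": "spot-task-nodes",          # hypothetical group name
        "InstanceRole": "TASK",             # task nodes hold no HDFS data
        "Market": "SPOT",
        "BidPrice": f"{max_price_usd:.3f}", # maximum price in USD, as a string
        "InstanceType": node_type,
        "InstanceCount": count,
    }


def add_spot_nodes(job_flow_id, group):
    """Attach the group to a running cluster; needs boto3 and credentials."""
    import boto3  # third-party, imported only when actually calling AWS

    emr = boto3.client("emr")
    return emr.add_instance_groups(InstanceGroups=[group], JobFlowId=job_flow_id)


# Hypothetical example: ten extra task nodes at no more than $0.05 per hour.
group = spot_task_group("m5.xlarge", 10, 0.05)
```

Putting Spot capacity on task nodes rather than core nodes is a common choice, because losing a task node does not lose HDFS blocks.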

Launch parallel clusters against the same data source (tune for the workload)

When the work is complete, you can terminate the cluster (and stop paying)

You can store everything in HDFS (local disk)

High Storage nodes = 48 TB/node

Launch in a Virtual Private Cloud for extra security

Thousands of Customers, 5+ Million Clusters

Integrates with Hadoop Ecosystem

Give it a try: aws.amazon.com/elasticmapreduce

Cost to run a 100-node EMR cluster: EUR 6.15/hour ($8/h)
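The quoted figure implies a price of about $0.08 per node-hour. The sketch below derives that rate and estimates the cost of a run, assuming cost scales linearly with node count and running time; billing granularity and differences between node types are ignored.

```python
# Figures from the slide: a 100-node cluster costs $8 (EUR 6.15) per hour.
CLUSTER_NODES = 100
CLUSTER_USD_PER_HOUR = 8.0
CLUSTER_EUR_PER_HOUR = 6.15


def per_node_hour(cluster_rate, nodes=CLUSTER_NODES):
    """Price of one node for one hour, derived from the cluster price."""
    return cluster_rate / nodes


def run_cost(nodes, hours, cluster_rate=CLUSTER_USD_PER_HOUR):
    """Estimated cost of a run, assuming linear scaling (an assumption)."""
    return nodes * hours * per_node_hour(cluster_rate)


usd_rate = per_node_hour(CLUSTER_USD_PER_HOUR)   # about 0.08 USD
three_hour_job = run_cost(100, 3)                # about 24 USD
```

Because the cluster can be terminated when the work is complete, the running time in this estimate is the only period that is paid for.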

Photos:

renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/

Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/

What if all I want is a database?

Customers asked us for a data warehouse the AWS way:

• No upfront costs, pay as you go

• Really fast performance at a really low price

• Open and flexible with support for popular tools

• Easy to provision and scale up massively

Amazon Redshift Is:

A fast and powerful, petabyte-scale data warehouse that is

A Lot Faster

A Lot Cheaper

A Whole Lot Simpler

Amazon Redshift Dramatically Reduces IO

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

Id    Age   State
123   20    CA
345   25    WA
678   40    FL
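Two of the listed techniques, column storage and zone maps, can be illustrated with the three-row example table. This is a toy model in plain Python, not Redshift's actual storage engine; the block size of two values is an arbitrary assumption made to keep the example small.

```python
# The example table from the slide, stored row by row.
rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Column storage: each column is stored contiguously, so a query that
# touches only "age" never has to read "id" or "state".
columns = {
    "id": [r[0] for r in rows],
    "age": [r[1] for r in rows],
    "state": [r[2] for r in rows],
}


def average_age(cols):
    """Aggregate over a single column without touching the others."""
    ages = cols["age"]
    return sum(ages) / len(ages)


def build_zone_map(values, block_size):
    """Record (min, max, start, end) for each block of a column."""
    zones = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zones.append((min(block), max(block), start, start + len(block)))
    return zones


def scan_greater_than(values, zones, threshold):
    """Return matching row positions and the number of blocks read."""
    hits, blocks_read = [], 0
    for _low, high, start, end in zones:
        if high <= threshold:
            continue  # no value in this block can match, so skip it
        blocks_read += 1
        hits.extend(i for i in range(start, end) if values[i] > threshold)
    return hits, blocks_read


zones = build_zone_map(columns["age"], block_size=2)
matches, blocks_read = scan_greater_than(columns["age"], zones, 30)
```

In this example the query for ages above 30 reads one of the two blocks, since the zone map shows the first block tops out at 25.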

Amazon Redshift parallelizes and distributes everything

Backup

Restore

Resize

Amazon Redshift Runs on Optimized Hardware

HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB storage

Optimized for I/O intensive workloads

High disk density

Runs in HPC - fast network

HS1.8XL available on Amazon EC2

Redshift lets you start small and grow big

Extra Large Node (XL): 3 spindles, 2 TB, 15 GiB RAM, 2 virtual cores, 10GigE

• Single Node (2 TB)

• Cluster 2-32 Nodes (4 TB to 64 TB)

8 Extra Large Node (8XL): 24 spindles, 16 TB, 120 GiB RAM, 16 virtual cores, 10GigE

• Cluster 2-100 Nodes (32 TB to 1.6 PB)
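The capacity ranges follow directly from node count times storage per node. A small sketch that reproduces the figures, using the per-node sizes and node limits stated on the slide (the dictionary layout and function name are illustrative):

```python
# Storage per node and allowed node counts, as stated on the slide.
# XL may run as a single node or as a cluster of 2 to 32 nodes.
NODE_TYPES = {
    "XL": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, nodes):
    """Total storage in TB for a cluster of the given type and size."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= nodes <= spec["max_nodes"]:
        raise ValueError(f"{node_type} supports {spec['min_nodes']} to "
                         f"{spec['max_nodes']} nodes, got {nodes}")
    return nodes * spec["tb_per_node"]


largest_xl = cluster_capacity_tb("XL", 32)      # 64 TB
largest_8xl = cluster_capacity_tb("8XL", 100)   # 1600 TB, i.e. 1.6 PB
```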

Priced to Analyze All the Customer’s Data

                     Price Per Hour for       Effective Hourly   Effective Annual
                     HS1.XL Single Node       Price Per TB       Price per TB
On-Demand            $ 0.850                  $ 0.425            $ 3,723
1 Year Reservation   $ 0.500                  $ 0.250            $ 2,190
3 Year Reservation   $ 0.228                  $ 0.114            $ 999

Simple Pricing: Number of Nodes x Cost per Hour

No charge for Leader Node

Pay as you grow
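The effective per-TB prices can be reproduced from the hourly node price: an HS1.XL node holds 2 TB, and a year has 8,760 hours. The sketch below recomputes the table and applies the "number of nodes x cost per hour" rule; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760
HS1_XL_TB = 2               # storage per HS1.XL node

# Hourly prices for a single HS1.XL node, from the pricing table.
PRICES = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def per_tb_hour(node_price):
    """Effective hourly price per TB."""
    return node_price / HS1_XL_TB


def per_tb_year(node_price):
    """Effective annual price per TB."""
    return per_tb_hour(node_price) * HOURS_PER_YEAR


def cluster_cost_per_hour(nodes, node_price):
    """Simple pricing: number of nodes times cost per hour."""
    return nodes * node_price


annual = {plan: round(per_tb_year(price)) for plan, price in PRICES.items()}
```

Rounded to whole dollars, `annual` matches the last column of the table.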

Amazon Redshift Simplifies Provisioning

• Create a cluster in minutes

• Automatically patch your OS and data warehouse software

• Scale up to 1.6PB with a few clicks and no downtime


Amazon Redshift Simplifies Operations

• Built-in security in transit, at rest, when backed up*

• Backup to S3 is continuous, incremental, and automatic

• Disk failures are transparent; nodes recover automatically

• Streaming restore lets you resume querying faster

*SSL, Amazon VPC, AES-256 (Hardware Accelerated)

Diagram: clients connect to Amazon Redshift over (optional) SSL; continuous, automatic backup to Amazon S3; streaming restore

Initial Pilot Results

Current production environment: 32 nodes, 128 CPUs, 4.2 TB RAM, 1.6 PB disk

Tested a 2B row data set and 6 representative queries on a 2-node Amazon Redshift cluster

Queries ran > 10x faster

Amazon Redshift Integrates With All Data Sources

Amazon DynamoDB

Amazon Elastic MapReduce

Amazon Simple Storage Service (S3)

Amazon EC2

AWS Storage Gateway Service

Corporate Data Center

Amazon Relational Database Service (RDS)

Amazon Redshift

Integrates With Existing BI Tools

Connect your tools to Amazon Redshift using standard drivers from PostgreSQL.org

Amazon Redshift

JDBC/ODBC

Data Integration Partners*
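Because Redshift accepts standard PostgreSQL drivers, a client can connect much as it would to a PostgreSQL database. The sketch below uses the third-party `psycopg2` driver as one possible choice; the host name, database name and credentials are hypothetical placeholders, and 5439 is Redshift's default port.

```python
def redshift_conn_params(host, dbname, user, password, port=5439):
    """Connection parameters in the form psycopg2.connect accepts."""
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
        "sslmode": "require",   # encrypt the connection in transit
    }


def run_query(params, sql):
    """Run one query and return all rows; needs psycopg2 and a cluster."""
    import psycopg2  # third-party PostgreSQL driver

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Hypothetical endpoint and credentials, for illustration only.
params = redshift_conn_params(
    "example-cluster.example.eu-west-1.redshift.amazonaws.com",
    "dev", "analyst", "change-me",
)
```

BI tools that connect over JDBC or ODBC take the same host, port and database settings.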

On-Premises Integration

OLTP, ERP

Redshift

Reporting and BI

Cloud ETL for Big Data

• Maintain online SQL access to your historical data

• Transformation and enrichment with EMR

• Longer history ensures better insight

S3, Elastic MapReduce, Redshift

Reporting and BI

Thanks. glez@amazon.de

Learn More: aws.amazon.com/big-data


AWS Data Pipeline

Data-intensive orchestration and automation

Reliable and scheduled

Easy to use, drag and drop

Execution and retry logic

Map data dependencies

Create and manage temporary compute resources

Anatomy of a pipeline

Additional checks and notifications

Arbitrarily complex pipelines
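The ideas of mapped data dependencies and retry logic can be shown with a toy scheduler. This is a conceptual sketch in plain Python, not the AWS Data Pipeline API: the task names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, depends_on, max_attempts=3):
    """Run tasks in dependency order, retrying each one on failure.

    tasks:      task name -> callable taking no arguments
    depends_on: task name -> set of task names that must finish first
    Returns the order in which tasks completed.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    completed = []
    # static_order() yields every task after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
        completed.append(name)
    return completed


# Hypothetical three-stage pipeline: copy, then transform, then load.
order = run_pipeline(
    {"copy": lambda: None, "transform": lambda: None, "load": lambda: None},
    {"transform": {"copy"}, "load": {"transform"}},
)
```

A managed service adds scheduling, notifications and temporary compute resources on top of this basic ordering and retry behaviour.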
