© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Business Intelligence Applications on AWS
Steffen Krause, Amazon Web Services (@sk_bln)

B3 - Business intelligence apps on aws


DESCRIPTION

Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges that IT teams struggle with. This session shows how to build business intelligence applications on AWS, from raw data import, consumption, and storage through to the production of information. We also cover best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft and others.


Page 1: B3 - Business intelligence apps on aws


Business Intelligence Applications on AWS

Steffen Krause, Amazon Web Services (@sk_bln)

Page 2: B3 - Business intelligence apps on aws

Overview

•  Designing BI & big data solutions in the cloud
•  Not the only way to do it (but one that we have seen)

Page 3: B3 - Business intelligence apps on aws

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 4: B3 - Business intelligence apps on aws
Page 5: B3 - Business intelligence apps on aws

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 6: B3 - Business intelligence apps on aws

Data has gravity

[Diagram: Data, App, App, Compute, Storage, Big Data]

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

Page 7: B3 - Business intelligence apps on aws

…and inertia at volume…

[Diagram: Data, App, App, Compute, Storage, Big Data, with latency and throughput labels]

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

Page 8: B3 - Business intelligence apps on aws

…easier to move applications to the data

[Diagram: Data, Compute, Storage, Big Data]

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

Page 9: B3 - Business intelligence apps on aws

S3 as a “single source of truth”

[Diagram: S3]

Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html

Page 10: B3 - Business intelligence apps on aws

Getting your Data into AWS

[Diagram: Corporate Data Center → Amazon S3]

•  Console Upload

•  FTP

•  AWS Import/Export

•  S3 API

•  Direct Connect

•  Storage Gateway

•  3rd Party Commercial Apps

•  Tsunami UDP
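For the “S3 API” route, large files are normally sent as an S3 multipart upload, and SDK helpers such as boto3's `upload_file` do the splitting automatically. The sketch below shows only the planning step under S3's multipart limits (parts of 5 MiB to 5 GiB except the last one, at most 10,000 parts per upload). The helper name `plan_parts` and its 8 MiB default are illustrative assumptions, and nothing here calls AWS.

```python
import math
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART = 5 * MIB          # S3 minimum part size (every part except the last)
MAX_PART = 5 * 1024 * MIB   # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000          # S3 maximum number of parts per upload

def plan_parts(object_size: int, preferred_part: int = 8 * MIB) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges for a multipart upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size until the whole object fits into at most MAX_PARTS parts.
    part = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return [(offset, min(part, object_size - offset))
            for offset in range(0, object_size, part)]
```

Each range would then be read from the local file and sent as one numbered part; only the last part may be smaller than 5 MiB.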

Page 11: B3 - Business intelligence apps on aws

Write directly to a data source

[Diagram: your application on Amazon EC2 writing directly to Amazon S3, DynamoDB or any other data store]

Page 12: B3 - Business intelligence apps on aws

Queue, pre-process and then write

[Diagram: Amazon Simple Queue Service (SQS) → pre-processing → Amazon S3, DynamoDB or any other data store]
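The queue-based pattern decouples producers from the writer: messages wait in the queue until a worker pre-processes them and writes the result. The sketch below uses Python's in-memory `queue.Queue` as a stand-in for Amazon SQS and a plain `dict` as a stand-in for the target store; the message fields (`user_id`, `action`) are hypothetical. With real SQS, a worker would delete each message only after handling it, so that a message whose processing fails becomes visible again and is retried.

```python
import json
import queue
from typing import Dict, List, Optional

def preprocess(raw: str) -> Optional[dict]:
    """Parse and normalize one message; return None to drop it."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed message: drop it
    if not isinstance(event, dict):
        return None
    action = event.get("action")
    if "user_id" not in event or not isinstance(action, str):
        return None  # required fields missing
    return {"user_id": str(event["user_id"]), "action": action.lower()}

def drain(q: "queue.Queue[str]", store: Dict[str, List[str]]) -> int:
    """Consume all queued messages, pre-process them, and write the valid ones to the store."""
    written = 0
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            return written
        record = preprocess(raw)
        if record is not None:
            store.setdefault(record["user_id"], []).append(record["action"])
            written += 1
        q.task_done()  # mark as handled (with SQS: delete the message here)
```

Because the store only ever sees pre-processed records, malformed input is stopped at the queue boundary instead of reaching S3 or DynamoDB.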

Page 13: B3 - Business intelligence apps on aws

Choose depending upon design

[Diagram: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools]

Page 14: B3 - Business intelligence apps on aws

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 15: B3 - Business intelligence apps on aws

Hadoop based Analysis

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools) → Amazon S3 → Amazon EMR]

Page 16: B3 - Business intelligence apps on aws

What is Amazon Elastic MapReduce (EMR)?

EMR is Hadoop in the cloud.

Page 17: B3 - Business intelligence apps on aws

How does EMR work?

•  Put the data into S3
•  Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.
•  Launch the cluster using the EMR console, CLI, SDK, or APIs
•  Get the output from S3
•  You can also store everything in HDFS

[Diagram: S3 ↔ EMR Cluster]
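EMR runs ordinary Hadoop jobs. One way to write them without Java is Hadoop Streaming, where the mapper and reducer are scripts that read lines on stdin and write tab-separated key/value lines on stdout. The word-count sketch below keeps the two phases as plain functions and simulates Hadoop's shuffle/sort with `sorted()`, so it can be tested without a cluster; the function names are illustrative.

```python
from itertools import groupby
from typing import Iterable, Iterator, List

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, like a streaming mapper writing to stdout."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; Hadoop delivers reducer input sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in one process, for testing without a cluster."""
    return list(reducer(sorted(mapper(lines))))
```

On a cluster, each function would live in its own script that loops over `sys.stdin` and prints its output, with input and output locations in S3.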

Page 18: B3 - Business intelligence apps on aws

Resize Nodes

[Diagram: EMR Cluster]

You can easily add and remove nodes

Page 19: B3 - Business intelligence apps on aws

1 instance for 100 hours = 100 instances for 1 hour

Page 20: B3 - Business intelligence apps on aws

Small instance: $5.50 (including the EMR charge; $4.40 without it)

Page 21: B3 - Business intelligence apps on aws

1 instance for 1000 hours = 1000 instances for 1 hour

Page 22: B3 - Business intelligence apps on aws

Small instance: $55 (including the EMR charge; $44 without it)
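Both pricing examples are the same arithmetic: cost depends on instance-hours, not on how those hours are spread across instances. The per-hour rates below are derived from the slide figures ($5.50 or $4.40 per 100 small-instance hours, so the EMR charge works out to $0.011 per instance-hour); they are 2014 numbers, and whole-hour billing is assumed, as on the slides.

```python
from decimal import Decimal

# Hourly rates derived from the slides (2014 small-instance pricing):
# $5.50 per 100 h including the EMR charge, $4.40 per 100 h for EC2 alone.
RATE_WITH_EMR = Decimal("0.055")
RATE_EC2_ONLY = Decimal("0.044")
EMR_SURCHARGE = RATE_WITH_EMR - RATE_EC2_ONLY  # per instance-hour

def job_cost(instances: int, hours: int, rate: Decimal) -> Decimal:
    """Cost of a job billed in whole instance-hours at a flat hourly rate."""
    return instances * hours * rate
```

`Decimal` is used so that the dollar amounts come out exact rather than as binary floating-point approximations.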

 

Page 23: B3 - Business intelligence apps on aws

When you turn off your cloud resources, you actually stop paying for them

Page 24: B3 - Business intelligence apps on aws

SQL-based processing

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools) → Amazon S3 → Amazon EMR (pre-processing framework) → Amazon Redshift (petabyte-scale columnar data warehouse)]
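“Columnar” means the warehouse stores each column separately, so a query that aggregates one column reads only that column, and runs of similar values compress well. Run-length encoding is one of the column encodings Redshift supports. The sketch below illustrates the idea only: it is not Redshift's on-disk format, and the sample fields (`country`, `revenue`) are hypothetical.

```python
from typing import Any, Dict, List, Tuple

def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records (all with the same keys) into one list per column."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}

def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded: List[Tuple[Any, int]] = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded
```

Summing `revenue` touches only `columns["revenue"]`; the `country` column is never read, and when sorted it collapses into a few (value, count) pairs.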

Page 25: B3 - Business intelligence apps on aws

What is Amazon Redshift?

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud

•  Easy to provision and scale
•  No upfront costs, pay as you go
•  High performance at a low price
•  Open and flexible with support for popular BI tools

Page 26: B3 - Business intelligence apps on aws

Demo: Amazon Redshift

Page 27: B3 - Business intelligence apps on aws
Page 28: B3 - Business intelligence apps on aws

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 29: B3 - Business intelligence apps on aws

Your choice of BI Tools

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools) → Amazon S3 → Amazon EMR (pre-processing framework) → Amazon Redshift → BI tools]

Page 30: B3 - Business intelligence apps on aws
Page 31: B3 - Business intelligence apps on aws

Demo: Jaspersoft as a BI Frontend

Page 32: B3 - Business intelligence apps on aws
Page 33: B3 - Business intelligence apps on aws

Sharing results and visualizations

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools), Amazon S3, Amazon EMR, Amazon Redshift, web app server, visualization tools]

Page 34: B3 - Business intelligence apps on aws

Sharing results and visualizations

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools), Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools]

Page 35: B3 - Business intelligence apps on aws

Geospatial Visualizations

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools), Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, GIS tools on Hadoop, GIS tools, visualization tools]

Page 36: B3 - Business intelligence apps on aws

Rinse and Repeat

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools), Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, visualization tools, GIS tools on Hadoop, GIS tools, all orchestrated by AWS Data Pipeline]
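“Rinse and repeat” means the same pipeline runs again for every new batch of data, which AWS Data Pipeline can schedule. The sketch below only computes one input/output prefix pair per daily run; keeping each run tied to a single day's prefix makes it independent and safe to re-run on its own. The bucket name and the `raw/` and `aggregated/` layout are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

def daily_runs(start: date, end: date, bucket: str = "example-bucket") -> List[Tuple[str, str]]:
    """One (input prefix, output prefix) pair per day from start to end, inclusive."""
    runs = []
    day = start
    while day <= end:
        stamp = day.strftime("%Y/%m/%d")  # hypothetical date-partitioned layout
        runs.append((f"s3://{bucket}/raw/{stamp}/", f"s3://{bucket}/aggregated/{stamp}/"))
        day += timedelta(days=1)
    return runs
```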

Page 37: B3 - Business intelligence apps on aws

The complete architecture

[Diagram: collection (Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools), Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, visualization tools, GIS tools on Hadoop, GIS tools, all orchestrated by AWS Data Pipeline]

Page 38: B3 - Business intelligence apps on aws

Real Time

Page 39: B3 - Business intelligence apps on aws

Amazon Kinesis

•  Real-time processing
•  Massive scale
•  Integrated
•  Use cases:
   •  Real-time log analysis
   •  Real-time data analytics
   •  Social media monitoring
   •  Financial transactions
   •  Online machine learning
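Many of these real-time use cases reduce to sliding-window aggregates, such as “events in the last 60 seconds”. The sketch below keeps only the timestamps inside the window and evicts older ones as time advances. It assumes events arrive in timestamp order, and the class name is illustrative; in a real Kinesis consumer, the records read from each shard would be fed into `add`.

```python
from collections import deque
from typing import Deque

class SlidingWindowCounter:
    """Count events whose timestamps fall within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._timestamps: Deque[float] = deque()

    def add(self, ts: float) -> None:
        # Events are assumed to arrive in timestamp order.
        self._timestamps.append(ts)
        self._evict(ts)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events at or before (now - window); they have left the window.
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
```

Each event is appended and evicted at most once, so the cost per event is constant no matter how long the window is.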

Page 40: B3 - Business intelligence apps on aws

Amazon Kinesis Data Flow

[Diagram: Data Sources → AWS Endpoint → Kinesis stream of Shard 1, Shard 2, …, Shard N, spanning three Availability Zones → consuming applications, which write to S3, DynamoDB and Redshift]

•  App.1 [Aggregate & De-Duplicate]
•  App.2 [Metric Extraction]
•  App.3 [Sliding Window Analysis]
•  App.4 [Machine Learning]
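Records are spread over shards by partition key: Kinesis takes the MD5 hash of the key as a 128-bit integer, and each shard owns a contiguous range of that hash space. The sketch below assumes a stream whose shards split the space evenly and reads the digest as a big-endian integer. Treat the byte order and the helper names as assumptions; a real stream reports each shard's actual `HashKeyRange` when it is described.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # hash keys are 128-bit unsigned integers

def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash space into contiguous, inclusive [start, end] ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges

def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard")
```

Because the mapping is deterministic, all records with the same partition key (for example, one user ID) land on the same shard, which is what lets an application like App.1 de-duplicate per key.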

Page 41: B3 - Business intelligence apps on aws

Use cases

Page 42: B3 - Business intelligence apps on aws

SkillPages

Customer Use Case

Everyone Needs Skilled People

At Home, At Work, In Life

Repeatedly

Page 43: B3 - Business intelligence apps on aws

Who they are

What they can do

Your real life connections to them

Examples of what they can do

Page 44: B3 - Business intelligence apps on aws

Data Architecture

•  Web servers record user actions (Join via Facebook, Add a Skill Page, Invite Friends) as user action trace events, which are stored as raw events in Amazon S3.
•  EMR Hive scripts process the content:
   •  Process log files with regular expressions to parse out the info we need.
   •  Process cookies into useful searchable data such as Session, UserId and API Security token.
   •  Filter out surplus info such as internal Varnish logging.
•  The aggregated data is stored in Amazon S3 and Amazon Redshift.
•  Data analysts get the data through Excel, Tableau and an internal web tool.
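The three processing steps above can be prototyped locally before being ported to Hive (for example with Hive's `regexp_extract` or a `TRANSFORM` script). The sketch below parses one line with a regular expression, turns the cookie header into searchable fields with the standard library's `SimpleCookie`, and drops Varnish lines. The log format, the `varnish` source tag and the cookie names (`session`, `user_id`, `api_token`) are all hypothetical; SkillPages' actual format is not shown in the slides.

```python
import re
from http.cookies import CookieError, SimpleCookie
from typing import Optional

# Hypothetical log format, for illustration only:
# <timestamp> <source> <method> <path> <status> "<Cookie header>"
LOG_LINE = re.compile(
    r'^(?P<ts>\S+) (?P<source>\S+) (?P<method>[A-Z]+) (?P<path>\S+) '
    r'(?P<status>\d{3}) "(?P<cookies>[^"]*)"$'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the fields we need from one log line, or None if it should be dropped."""
    match = LOG_LINE.match(line.strip())
    if match is None or match.group("source") == "varnish":
        return None  # unparseable, or internal Varnish logging we do not need
    jar = SimpleCookie()
    try:
        jar.load(match.group("cookies"))
    except CookieError:
        jar = SimpleCookie()  # malformed cookie header: keep the request, drop the cookies

    def cookie(name: str) -> Optional[str]:
        return jar[name].value if name in jar else None

    return {
        "ts": match.group("ts"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "session": cookie("session"),
        "user_id": cookie("user_id"),
        "api_token": cookie("api_token"),
    }
```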

Page 45: B3 - Business intelligence apps on aws

“We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution. With Amazon Redshift and Tableau, anyone in the company can set up any queries they like: from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.”

Jon Hoffman, Software Engineer, Foursquare

[Charts: “When do people go to a place?” for Gorilla Coffee, Gray's Papaya and Amorino, with breakdowns by gender (Female, Male) and age]

Page 46: B3 - Business intelligence apps on aws

Stack – analysis and sharing

Application Stack
•  Scala/Liftweb: API Machines, WWW Machines, Batch Jobs
•  Scala Application code
•  Mongo/Postgres/Flat Files: Databases, Logs

Data Stack
•  Amazon S3: Database Dumps, Log Files
•  Hadoop: Elastic MapReduce
•  Hive/Ruby/Mahout: Analytics Dashboard, MapReduce Jobs
•  mongoexport, postgres dump, Flume

Page 47: B3 - Business intelligence apps on aws

Everything that was a limited resource

is now a programmable resource

Page 48: B3 - Business intelligence apps on aws

Resources

•  Hadoop Technology and Use Cases: http://www.powerof60.com/
•  http://aws.amazon.com/de
•  Start with the Free Tier: http://aws.amazon.com/de/free/
•  25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/
•  Twitter: @AWS_Aktuell
•  Facebook: http://www.facebook.com/awsaktuell
•  Webinars: http://aws.amazon.com/de/about-aws/events/