Introduction to Amazon Elastic MapReduce (English)

Amazon Elastic MapReduce

MY BACKGROUND

• Based in Seattle, WA

• Education:

– BS in Computer Science, The American University, 1985

– Graduate student in Digital Media, University of Washington, 2010

• Background:

– Microsoft Visual Studio team

– Consulting to startups and VCs

– Amazon employee since 2002

• Evangelist:

– Speak

– Write

– Tweet

• Author, “Host Your Web Site in the Cloud”

• Email: jbarr@amazon.com

• Twitter: @jeffbarr

• What is Big Data

• Elastic MapReduce Overview

• Example Use Cases

• Ecosystem and Tools

• Upcoming Features

• Discussion

AGENDA

• Doesn’t refer just to volume

– You can benefit from Big Data infrastructure without having a ton of data

– Many existing technologies have little problem physically handling large volumes

• Challenges result from the combination of data volume, data structure, and usage demands from that data, usually tied to timeliness

• Big Data Tools are needed to provide a holistic view of enterprise data and systematically harness it for insights and trends

WHAT IS BIG DATA?

• Enables customers to easily, securely and cost-effectively process vast amounts of data

– Spin-up hundreds of instances

– Process hundreds of terabytes of data

• Hosted Hadoop framework running on Amazon’s web-scale infrastructure

WHAT IS AMAZON ELASTIC MAPREDUCE

• Launch and monitor job flows

• AWS Management Console

• Command line interface

• REST API

WHY USE AMAZON ELASTIC MAPREDUCE

• Elastic MapReduce removes “MUCK” from Big Data processing

– Hard to manage compute clusters

– Hard to tune Hadoop

– Hard to monitor running Job Flows

– Hard to debug Hadoop jobs

– Hadoop issues prevent smooth operation in the cloud

PROBLEMS CUSTOMERS SOLVE WITH ELASTIC MAPREDUCE

• Targeted advertising / Clickstream analysis

• Data warehousing applications

• Bio-informatics (Genome analysis)

• Financial simulation (Monte Carlo simulation)

• File processing (resize jpegs)

• Web indexing

• Data mining and BI

• Data or I/O Intensive (m1/m2 instances)

– Data Warehouse

– Data Mining

• Click stream, logs, events, etc.

• Compute or I/O Intensive (c1, cc1/HPC instances)

– Credit Ratings

– Fraud Models

– Portfolio analysis

– VaR calculation

HARDWARE REQUIREMENTS FOR USE CASES

CLICKSTREAM ANALYSIS – RAZORFISH AND BEST BUY

• Best Buy came to Razorfish

– 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads required per day

Targeted Ad (1.7 Million per day): User recently purchased a home theater system and is searching for video games

• Leveraged AWS and Elastic MapReduce

– 100 node cluster on demand

– Processing time dropped from 2+ days to 8 hours

– Increased ROAS (Return on Advertising Spend) by 500%

CLICKSTREAM ANALYSIS - ARCHITECTURE

• Invented by Google

• New processing model

• Highly scalable

• Easy to understand

• Industry standard

• Something worth knowing

WHAT IS MAPREDUCE?

• Take input data

• Break into sub-problems

• Distribute to worker nodes

• Worker nodes process sub-problems in parallel

• Take output of worker nodes and reduce to answer

ELASTIC MAPREDUCE MODEL – OVERVIEW
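The five steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local thread pool stands in for the worker nodes, the sub-problem (a sum of squares) is made up for the example, and the names `split`, `process_chunk`, and `run` are hypothetical, not part of Hadoop or Elastic MapReduce.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(data, n_chunks):
    """Break the input into roughly equal sub-problems."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """Work done by one worker: here, the sum of squares of its chunk."""
    return sum(x * x for x in chunk)


def run(data, n_workers=4):
    # Take input data and break it into sub-problems.
    chunks = split(data, n_workers)
    # Distribute to workers (threads stand in for nodes) and process in parallel.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Take the workers' outputs and reduce them to a single answer.
    return reduce(lambda a, b: a + b, partials, 0)


total = run(list(range(10)))
print(total)  # 285
```

On a real cluster the framework does the splitting and distribution; the user supplies only the per-chunk work and the reduce step.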

MAPREDUCE EXAMPLE – WORD COUNT

Map Phase

Mapper

“This”, Doc1

“Word”, Doc1

“This”, Doc2

“This”, Doc3

“Word”, Doc3

Reduce Phase

Reducer

Output

“This”, 3

“Word”, 2
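A minimal Python sketch of the word count above. The document contents are an assumption, chosen so that the mapper emits exactly the (word, document) pairs shown on the slide; `mapper`, `shuffle`, and `reducer` are illustrative names, and the grouping step that Hadoop performs between the two phases is written out by hand here.

```python
from collections import defaultdict

# Hypothetical document contents (not given on the slide).
documents = {
    "Doc1": "This Word",
    "Doc2": "This",
    "Doc3": "This Word",
}


def mapper(doc_id, text):
    """Map phase: emit one (word, doc_id) pair per word in the document."""
    for word in text.split():
        yield word, doc_id


def shuffle(pairs):
    """Group the mapper output by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, doc_ids):
    """Reduce phase: collapse all values for one word into a count."""
    return word, len(doc_ids)


pairs = [pair for doc_id, text in documents.items() for pair in mapper(doc_id, text)]
counts = dict(reducer(word, ids) for word, ids in shuffle(pairs).items())
print(counts)  # {'This': 3, 'Word': 2}
```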

ELASTIC MAPREDUCE MODEL – DETAILED

ELASTIC MAPREDUCE IN ACTION – S3 LOG FILE

ELASTIC MAPREDUCE IN ACTION – STEP 1

ELASTIC MAPREDUCE IN ACTION - RESULTS

• Mapper and Reducer in Java JAR files

• Scale as large as needed

– Data

– Processing

– Add nodes (even while running) to speed up

• No need to manage intermediate data

• Suitable for certain types of problems

– Record-oriented input

– No dependencies between records

• No more MUCK – focus on your problem

NOTES / ATTRIBUTES
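Why "record-oriented input" with "no dependencies between records" matters can be shown with a small Python sketch: when each record is handled on its own, the answer does not depend on how the records are partitioned across nodes. The log format and the names `map_record` and `count_statuses` are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical log records in the form "<client> <path> <status>".
records = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "10.0.0.3 /index.html 200",
    "10.0.0.2 /private 403",
]


def map_record(record):
    """Look at one record only; no state is shared between records."""
    status = record.split()[-1]
    return Counter({status: 1})


def count_statuses(records, n_partitions):
    """Count status codes after spreading the records over n_partitions."""
    partitions = [records[i::n_partitions] for i in range(n_partitions)]
    partials = [
        sum((map_record(r) for r in part), Counter()) for part in partitions
    ]
    return sum(partials, Counter())


# One partition or three: the reduced result is the same.
assert count_statuses(records, 1) == count_statuses(records, 3)
```

A job whose records depend on one another (for example, a running total over an ordered stream) would not pass this check without extra coordination, which is why it is a poorer fit for the model.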

HADOOP + R

Thank You
