Media and Online Advertising on AWS Jan Borch AWS Solutions Architect

AWS Summit Benelux 2013 - Media and Online Advertising on AWS

Media and Online Advertising on AWS

Jan Borch – AWS Solutions Architect

Media Application on AWS

• Music streaming

• Video streaming

• Digital publishing

• Media sharing

Martijn Bakker, Chief Engineer, WeTransfer

AWS Summit Benelux 2013

WeTransfer

• Send big files anywhere

• Up to 2GB per transfer

• Files are stored for 7 days

• Beautiful backgrounds (50/50 split between ads & art)

WeTransfer

• WeTransfer Plus: € 120 per year

• Up to 5GB per transfer

• Store up to 50GB permanently

• Your own backgrounds (no ads)

WeTransfer

• 1.8 Million transfers per day

• Downloads: 25 Gigabit per second

• 1.5 Million requests per hour (site + API)

• Over a Petabyte of storage used on S3

(Peak measurements - July 2013)

Company

• 15 wonderful, dedicated people

• Founded & based in Amsterdam

• Originated from Oy Communications

Origins

• Purely a functional tool

• Design company needs to send big files to clients

• FTP & friends: “technical”, “confusing”

Early development

• OyTransfer + advertising = WeTransfer

• Goals:

• beautiful

• easy to use

• secure

Growth

• Transfers double every 3 months

• Previous hoster:

  • could not keep pace with the growth

  • hardware-based platform, which adds maintenance & development costs
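To put the doubling rate in perspective, here is a minimal arithmetic sketch. It assumes the 3-month doubling period stays constant, which the slides do not claim beyond the period they describe; the function name is illustrative only.

```python
def growth_factor(months: float, doubling_period_months: float = 3.0) -> float:
    """Multiplier on transfer volume after `months`, assuming a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


# Doubling every 3 months means four doublings per year.
one_year = growth_factor(12)
two_years = growth_factor(24)
print(one_year, two_years)
```

Under that assumption, capacity planned for today's load is outgrown sixteen times over within a year, which is hard to match with hardware that has to be ordered and installed.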

Growth

• Our 3 necessities:

• Storage

• Support

• Scalability

Growth

• AWS: on-demand, available right away

• Initial migration:

• development time: 1 month

• using S3 through EC2 instances

The new WeTransfer

• Built for & with AWS: uses RDS, EC2, S3, CloudWatch, DynamoDB, Route 53, ElastiCache

• Ruby + HTML5 + JavaScript (frontend)

• Backend tailored around S3

• Launched January of this year

WeTransfer and S3

• Virtually unlimited storage capacity

• Redundancy: always available

• Fast, and cheap compared to similar offers

• Dramatically lower costs

WeTransfer and S3

• Uses the multipart upload mechanism (where possible)

• Resumable uploads

• Uploads go directly to S3 thanks to CORS support

• Worker instances to process uploaded content
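The multipart and resumable upload points above can be sketched as a small planning routine. This is not WeTransfer's implementation: the function names and the 8 MiB default part size are assumptions for illustration. The 5 MiB minimum part size (for every part except the last) and the 10,000-part limit are S3's documented multipart constraints.

```python
from typing import List, Set, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts in one upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)


def plan_parts(file_size: int, part_size: int = 8 * MIB) -> List[Part]:
    """Split a file into numbered parts (numbering starts at 1, as in S3)."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the file would otherwise need too many parts.
    while (file_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    parts: List[Part] = []
    offset, number = 0, 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def remaining_parts(plan: List[Part], uploaded: Set[int]) -> List[Part]:
    """Parts still to send when resuming; `uploaded` holds finished part numbers."""
    return [part for part in plan if part[0] not in uploaded]


plan = plan_parts(20 * MIB)        # a 20 MiB file in 8 MiB parts
todo = remaining_parts(plan, {1})  # part 1 finished before the interruption
print(len(plan), [part[0] for part in todo])
```

Because each part is uploaded and acknowledged separately, an interrupted transfer only has to resend the parts that never completed.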

WeTransfer and S3

• Secure upload / download, and encryption

• Regionalized: storage facilities all over the world to ensure proper speeds to end users

• No maintenance

So why S3?

• Fast & flexible

• Almost no time spent on maintenance

• Virtually limitless capacity at your fingertips

https://wetransfer.com/jobs

Online Advertising on AWS

YOUR AD HERE

Common challenge for Advertising Platforms ...

... device and media fragmentation ...

... scaled to millions of users

503 Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Maintain availability from one server…

…to thousands

Let's take a journey ...

Store

Amazon S3: storage for the Internet

[Diagram: static files repository (Amazon S3); media ad served to user]

Let's take a journey ...

Transform

Amazon Elastic Transcoder: video transcoding in the cloud

[Diagram: video files repository (Amazon S3); Amazon Elastic Transcoder; media ad served to user]

Let's take a journey ...

Deliver

Amazon CloudFront: web service for content delivery

Reach a global audience

CloudFront edge locations:

• United States: Ashburn (2), Dallas (2), Jacksonville, Los Angeles (2), Miami, New York (2), Newark, Palo Alto, San Jose, Seattle, South Bend, St. Louis

• Europe: Amsterdam, Dublin, Frankfurt (2), London (2), Milan, Paris (2), Stockholm

• Asia: Hong Kong, Osaka, Singapore (2), Tokyo

• Australia: Sydney

• South America: São Paulo

[Diagram: content delivery network (Amazon CloudFront); impression logs]

Simple HLS video streaming architecture

• In-house content publication server

• Source video assets in S3

• Video transcoded into HLS (Elastic Transcoder)

• Edge delivery using CloudFront (Stockholm, NY)

AWS CLI

aws s3 cp video.avi s3://mybucket/video/video.avi

aws elastictranscoder create-job \
    --pipeline-id 1379510897399-mxjrif \
    --input '{"Key":"video/video.avi"}' \
    --outputs '[{"Key":"sample","PresetId":"1234-123", ...}]'
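The same job request can be assembled programmatically. The sketch below only builds the request body as a dictionary, using field names from the Elastic Transcoder job API (`PipelineId`, `Input`, `OutputKeyPrefix`, `Outputs`, `Playlists`). The helper name, output prefix, rendition names, and preset IDs are placeholders, not values from the talk.

```python
import json
from typing import Dict


def build_hls_job(pipeline_id: str, input_key: str, output_prefix: str,
                  renditions: Dict[str, str], segment_seconds: int = 10) -> dict:
    """Build a create-job request body for an HLS transcode.

    `renditions` maps an output key to a preset ID (placeholders in this sketch).
    """
    outputs = [
        {"Key": name, "PresetId": preset_id, "SegmentDuration": str(segment_seconds)}
        for name, preset_id in renditions.items()
    ]
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": input_key},
        "OutputKeyPrefix": output_prefix,
        "Outputs": outputs,
        # One master playlist that references every rendition.
        "Playlists": [
            {"Name": "master", "Format": "HLSv3",
             "OutputKeys": [output["Key"] for output in outputs]}
        ],
    }


job = build_hls_job(
    pipeline_id="1379510897399-mxjrif",  # pipeline ID from the CLI example
    input_key="video/video.avi",
    output_prefix="hls/video/",          # hypothetical prefix
    renditions={"hls_400k": "PRESET-ID-400K", "hls_2m": "PRESET-ID-2M"},  # placeholders
)
print(json.dumps(job, indent=2))
```

A dictionary of this shape could be passed as keyword arguments to an SDK client's create-job call. Real preset IDs would have to be looked up in the account first.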

Let's take a journey ...

Match

Amazon EC2: resizable compute capacity in the cloud

EC2 instance types

• m1.small: 1 virtual core, 1.7 GiB memory, moderate I/O performance

• cc2.8xlarge: 32 virtual cores (2 x Intel Xeon), 60.5 GiB memory, 10 Gbit I/O performance

• cr1.8xlarge: 32 virtual cores (2 x Intel Xeon), 240 GiB memory, 10 Gbit I/O performance, 240 GB SSD instance store

• hi1.4xlarge: 16 virtual cores, 60.5 GiB memory, 10 Gbit I/O performance, 2 x 1 TB SSD instance store

• hs1.8xlarge: 16 virtual cores, 117 GiB memory, 10 Gbit I/O performance, 24 x 2 TB instance store

Amazon Route 53: highly available and scalable Domain Name System

Extremely reliable and cost effective

Feature: Details

• Global: supported from AWS global edge locations for fast and reliable domain name resolution

• Scalable: automatically scales based upon query volumes

• Latency-based routing: supports resolution of endpoints based upon latency, enabling multi-region application delivery

• Integrated: integrates with other AWS services, allowing Route 53 to front load balancers, S3 and EC2
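The idea behind latency-based routing can be illustrated with a few lines of code. This is a conceptual sketch only: Route 53 makes this decision on the DNS side from its own latency measurements, and the region names, latency figures, and `healthy` filter below are assumptions for illustration.

```python
from typing import Dict, Optional, Set


def pick_endpoint(latencies_ms: Dict[str, float],
                  healthy: Optional[Set[str]] = None) -> str:
    """Return the region with the lowest measured latency among the healthy ones."""
    candidates = {
        region: ms for region, ms in latencies_ms.items()
        if healthy is None or region in healthy
    }
    if not candidates:
        raise LookupError("no healthy endpoint available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies as seen from a user in the Benelux.
measured = {"eu-west-1": 24.0, "us-east-1": 95.0, "ap-southeast-1": 210.0}

nearest = pick_endpoint(measured)
fallback = pick_endpoint(measured, healthy={"us-east-1", "ap-southeast-1"})
print(nearest, fallback)
```

The same application stack is deployed in several regions, and each user is sent to whichever copy answers fastest.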

Reach a global audience

[Diagram: link to ad resource; Amazon ELB; ad servers on Amazon EC2 + Auto Scaling]

Amazon DynamoDB: fast & fully managed NoSQL database service

[Diagram: profiles database (Amazon DynamoDB)]

Very general table structure

Ads (not many rows; frequent, near-realtime updates)

ad-id  advertiser  max-price  imps to deliver  imps delivered
1      AAA         100        50000            1200
2      BBB         150        30000            2500

Profiles (user-cookie) (so many rows; updated in batches)

user-id  attribute1  attribute2  attribute3  attribute4
A        XXX         XXX         XXX         XXX
B        YYY         YYY         YYY         YYY
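To show how an ad server might use the Ads table, here is a minimal in-memory sketch. The selection rule (highest max-price among ads that still have impressions left to deliver) is an assumption for illustration; the slides do not describe the actual matching logic. Profile-based targeting is left out for brevity.

```python
from typing import Dict, Optional

# In-memory stand-in for the Ads table, using the example rows from the slide.
ads: Dict[int, dict] = {
    1: {"advertiser": "AAA", "max_price": 100, "imps_to_deliver": 50000, "imps_delivered": 1200},
    2: {"advertiser": "BBB", "max_price": 150, "imps_to_deliver": 30000, "imps_delivered": 2500},
}


def remaining(ad: dict) -> int:
    """Impressions the campaign still has to deliver."""
    return ad["imps_to_deliver"] - ad["imps_delivered"]


def select_ad(table: Dict[int, dict]) -> Optional[int]:
    """Pick the highest-priced ad that has not yet met its impression goal."""
    open_ads = {ad_id: ad for ad_id, ad in table.items() if remaining(ad) > 0}
    if not open_ads:
        return None
    return max(open_ads, key=lambda ad_id: open_ads[ad_id]["max_price"])


def record_impression(table: Dict[int, dict], ad_id: int) -> None:
    """The frequent, near-realtime update: bump the delivered counter."""
    table[ad_id]["imps_delivered"] += 1


chosen = select_ad(ads)
record_impression(ads, chosen)
print(chosen, ads[chosen]["imps_delivered"])
```

In a real deployment, many ad servers would perform the counter update concurrently against the database. That is why the Ads table sees frequent writes even though it has few rows.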

Let's take a journey ...

Capture

[Diagram: click-through servers on Amazon EC2 + Auto Scaling]

AWS OpsWorks: integrated application management

Stack, Layer, Instances, Scale

An agent on each EC2 instance talks with OpsWorks

The heart of the service

Instance lifecycle and configuration hooks

Cookbooks

# Install the PHP dependencies of the deployed app with Composer.
script "install_composer" do
  interpreter "bash"
  user "root"
  cwd "#{node[:deploy][:myphotoapp][:deploy_to]}/current"
  code <<-EOH
    curl -s https://getcomposer.org/installer | php
    php composer.phar install
  EOH
end

Amazon S3

Git repository

Let's take a journey ...

Report

[Diagram: click-through log files (Amazon S3); Amazon Elastic MapReduce]

Data Growth: GB, TB, PB

• Server logs

• Click analysis

• Impression logs

Complexities of Big Data

• Sampling

• Time to process

• Inflexible

With Elastic MapReduce & Redshift

• “Queryable”

• All data

Data Insight: turning data into information

Amazon Elastic MapReduce: process vast amounts of data using Hadoop

Amazon Redshift: fast, fully managed, petabyte-scale data warehouse service
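The kind of job Elastic MapReduce would run over click-through logs can be sketched locally as a map step and a reduce step. The tab-separated log format (timestamp, user ID, ad ID) and the sample lines are assumptions for illustration; the slides do not specify a log layout.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_clicks(line: str) -> List[Tuple[str, int]]:
    """Map step: emit (ad_id, 1) for each well-formed click record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return []  # skip malformed lines instead of failing the whole job
    _timestamp, _user_id, ad_id = fields
    return [(ad_id, 1)]


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted counts per ad."""
    totals: Dict[str, int] = defaultdict(int)
    for ad_id, count in pairs:
        totals[ad_id] += count
    return dict(totals)


# Hypothetical click-through log lines.
log_lines = [
    "2013-09-26T10:00:01\tA\t1\n",
    "2013-09-26T10:00:02\tB\t2\n",
    "garbage line\n",
    "2013-09-26T10:00:05\tB\t1\n",
]

clicks_per_ad = reduce_counts(chain.from_iterable(map_clicks(line) for line in log_lines))
print(clicks_per_ad)
```

On a Hadoop cluster the map step runs in parallel over log files stored in S3, and the framework groups the emitted pairs by key before the reduce step. That is how the same logic scales from gigabytes to petabytes.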

Let's take a journey ...

Store

Transform

Deliver

Match

Capture

Report

Amazon Web Services at Improve Digital

Garry Turkington, CTO, Improve Digital

26 September 2013

@garryimprove

About us: Improve Digital

• Cloud-based real-time advertising technology

• Focus on the premium publisher / media owner

• Integrations with thousands of demand partners

• Decisions driven by real-time data

• Offices in UK, NL, DE & ES

• 100+ premium publishers

Use of AWS

We use AWS in conjunction with dedicated physical infrastructure

2 sides to the story

• Front end: serving of ads to end users

• Back end: data processing and dev/test

Serving ads

• Fleet of ad servers running mostly on EC2

• Ad serving process is computationally expensive and has strict time constraints

• Need ability to spin up additional instances based on demand: horizontally scalable system

• Place ad servers in different regions to reduce serving latency; big benefit of EC2 over physical kit

• Grow fleets in different regions separately
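The demand-driven, per-region scaling described above can be sketched as a sizing rule. The request rates, per-instance capacity, headroom, and minimum fleet size are all hypothetical numbers chosen for illustration, not figures from Improve Digital.

```python
import math
from typing import Dict


def desired_instances(requests_per_second: float, capacity_per_instance: float,
                      minimum: int = 2, headroom: float = 0.25) -> int:
    """Instances needed to serve the load with spare headroom, never below `minimum`."""
    needed = requests_per_second * (1 + headroom) / capacity_per_instance
    return max(minimum, math.ceil(needed))


# Hypothetical request rates per region; each region is sized independently.
regional_load: Dict[str, float] = {
    "eu-west-1": 4200,
    "us-east-1": 900,
    "ap-southeast-1": 150,
}

fleet = {
    region: desired_instances(rps, capacity_per_instance=500)
    for region, rps in regional_load.items()
}
print(fleet)
```

Keeping a minimum of two instances per region is a common way to survive the loss of a single instance. A rule like this would typically feed an auto-scaling policy rather than be applied by hand.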

Backend systems

• S3/Glacier used for policy-driven data retention

• S3 is the starting point for AWS and on-premises data processing jobs

• S3 used as a shared storage space between distributed components

• VPC used to integrate AWS and on-premises flows and systems

• Automated deployment of dev/test software into VPC EC2 has been great
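Policy-driven retention with S3 and Glacier is usually expressed as a bucket lifecycle configuration. The sketch below builds one as a plain dictionary in the shape the S3 lifecycle API expects (`Rules`, `Filter`, `Transitions`, `Expiration`). The prefix and day counts are hypothetical, not Improve Digital's actual policy.

```python
def retention_policy(prefix: str, archive_after_days: int, delete_after_days: int) -> dict:
    """Lifecycle configuration: move objects to Glacier, then expire them."""
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they are deleted")
    rule_id = "retain-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


# Hypothetical policy: archive impression logs after 30 days, delete after a year.
policy = retention_policy("logs/impressions/", archive_after_days=30, delete_after_days=365)
print(policy["Rules"][0]["ID"])
```

Once such a configuration is attached to a bucket, S3 applies it automatically, so no scheduled cleanup jobs are needed on the application side.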

Why do we use AWS?

As a startup it was almost a no-brainer

• Didn't want/need overhead of own physical infrastructure

• Pricing model with hugely reduced (or zero) up-front cost is an easy sell

• Coordinating the ability to quickly grow the ad server fleet is *hard* with a physical data centre

As a more mature company the above still apply

• In addition, our needs have matured beyond the lower-level "give us servers and storage"

Lessons learned: it works!

• Service integration is often ridiculously easy: pull S3 data into EMR, set up auto-scaling, etc.

• Geographic data locality helps with compliance

• Automatic cost reductions do wonders for corporate acceptance

• Continuous evolution of the services means that they can suddenly become a great fit

Lessons learned: it works in its own way

• Need to understand exactly what each service offers

• Need to design for fault tolerance; instances can fail at high scale

• Had to work hard to get our network to integrate with VPC

• Can't save you from yourself; poor design is poor design

• Would still love to see another region in EU

The future

• Growth means more of all the above

• Want to re-evaluate services that weren't a great fit for us in the past (RDS, DynamoDB)

• Believe we can use data processing services (Elastic MapReduce in particular) alongside on-premises systems

• Looking to CloudFormation, Elastic Beanstalk and OpsWorks to extend automation much further