How AOL Advertising Uses NoSQL to Make Millions of Smart Targeting Decisions Every Hour
Simple. Fast. Elastic.
NoSQL Now! 2011, Matt Ingenthron


Page 2

AD AND OFFER TARGETING

AOL serves billions of impressions per day from our ad serving platforms, and any incremental improvement in processing time translates to huge benefits in our ability to more effectively serve the ads needed to meet our contractual commitments. Traditional databases lack the scalability required to support our goal of five milliseconds per read/write. Creating user profiles with Hadoop, then serving them from Couchbase, reduces profile read and write access to under a millisecond, leaving the bulk of the processing time budget for improved targeting and customization.

Pero Subasic, Chief Architect, AOL

Page 3

Ad and offer targeting

[Diagram: numbered steps 1 to 3, labeled "events", "profiles, campaigns" and "profiles, real-time campaign statistics"]

40 milliseconds to respond with the decision.
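The 40-millisecond decision budget bounds everything, including profile access. A minimal Python sketch of a deadline-bound profile lookup follows. `fetch_profile` is a hypothetical stand-in for a Couchbase/memcached `get`, the 5 ms profile slice echoes the read/write goal quoted earlier and is otherwise an assumption, and the targeting rule is a placeholder. If the lookup overruns its slice, the decision falls back to an empty profile instead of missing the deadline.

```python
import concurrent.futures
import time

DECISION_BUDGET_MS = 40.0  # total time to respond with the decision (from the slide)
PROFILE_BUDGET_MS = 5.0    # assumed slice of the budget reserved for the profile read

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def profile_with_deadline(fetch_profile, user_id, budget_ms=PROFILE_BUDGET_MS):
    """Return (profile, used_fallback).

    fetch_profile is a hypothetical stand-in for a key-value get; if it does
    not finish within budget_ms, an empty default profile is returned.
    """
    future = _pool.submit(fetch_profile, user_id)
    try:
        return future.result(timeout=budget_ms / 1000.0), False
    except concurrent.futures.TimeoutError:
        future.cancel()  # best effort only: an already running call keeps running
        return {}, True


def decide(fetch_profile, user_id):
    start = time.perf_counter()
    profile, fallback = profile_with_deadline(fetch_profile, user_id)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Hypothetical targeting rule: use segments when present, else a default ad.
    ad = "segment-ad" if profile.get("segments") else "default-ad"
    return {"ad": ad, "fallback": fallback,
            "remaining_ms": DECISION_BUDGET_MS - elapsed_ms}
```

The fallback trades targeting quality for predictability: a slow store degrades one decision rather than blowing the whole response deadline.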

Page 4

Proven at small and extra-large scale

Heroku:
• Leading cloud service (PaaS) provider
• Over 150,000 hosted applications
• Couchbase Server serving over 6,200 Heroku customers

Zynga:
• Social game leader – FarmVille, Mafia Wars, Café World
• Over 230 million monthly active users
• Couchbase Server is the primary database behind key Zynga properties

Page 5

Modern interactive software architecture

Application scales out
Just add more commodity web servers

Database scales up
Get a bigger, more complex server
- Expensive and disruptive sharding
- Doesn't perform at large scale

Page 6

Couchbase data layer scales like the application logic tier
Data layer now scales with linear cost and constant performance.

Application scales out
Just add more commodity web servers

Database scales out
Just add more commodity data servers

Scaling out flattens the cost and performance curves.

Couchbase Servers
Horizontally scalable, schema-less, auto-sharding, high-performance at Web Scale

Page 7

Couchbase is a distributed database

[Diagram: Couchbase Servers in the data center, used by a web application server on behalf of application users, and managed from the Couchbase Web Console on the administrator console]

Page 8

Couchbase is Simple, Fast, Elastic NoSQL

• Simple to:
  – Deploy (Membase Server Template)
  – Develop (memcached)
  – Manage (UI and RESTful API)
• Fast:
  – Predictable low latency
  – Sub-ms response times
  – Built-in memcached technology
• Zero-downtime elasticity:
  – Spread I/O and data across instances
  – Consistent performance with linear cost
  – Dynamic rebalancing of a live cluster

Elastic Couchbase
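"Develop (memcached)" means applications can talk to the data layer with the memcached protocol. To make that concrete, here is a minimal Python sketch of the memcached text (ASCII) protocol framing for `set` and `get`. It only builds and parses bytes; the helper names are hypothetical, and a real application would use an existing memcached client library rather than hand-rolled framing.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol storage command:
    set <key> <flags> <exptime> <bytes>\\r\\n<data block>\\r\\n
    """
    # memcached keys: at most 250 characters, no whitespace or control characters
    if (not key or len(key) > 250 or not key.isascii()
            or any(ord(c) < 33 or ord(c) == 127 for c in key)):
        raise ValueError(f"invalid memcached key: {key!r}")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(resp: bytes):
    """Parse a single-key reply: VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\nEND\\r\\n.
    Returns (key, flags, data), or None on a miss (bare END)."""
    if resp == b"END\r\n":
        return None
    header, rest = resp.split(b"\r\n", 1)
    _, key, flags, nbytes = header.split(b" ")[:4]  # an optional CAS field is ignored
    n = int(nbytes)
    if rest[n:] != b"\r\nEND\r\n":
        raise ValueError("malformed get response")
    return key.decode("ascii"), int(flags), rest[:n]
```

The value is length-prefixed rather than delimited, which is why arbitrary binary payloads such as serialized user profiles can be stored without escaping.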

Page 9

COUCHBASE SERVER ARCHITECTURE OVERVIEW

Page 10

Couchbase "write" data flow – application view

1. User action results in the need to change the VALUE of KEY.
2. Application updates KEY's VALUE and performs a SET operation.
3. Couchbase client hashes KEY and identifies KEY's master server.
4. SET request is sent over the network to the master server.
5. Couchbase replicates the KEY-VALUE pair, caches it in memory and stores it to disk.
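Steps 3 and 4 can be made concrete. Couchbase clients do not hash keys straight to servers: a key is hashed to one of a fixed number of vBuckets, and a vBucket map names each vBucket's master server. The Python sketch below assumes CRC32 hashing and 1,024 vBuckets; the exact bit selection and the round-robin map are illustrative assumptions, and `vbucket_for_key`, `build_vbucket_map` and `master_server` are hypothetical helpers, not a client library API.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: a commonly used vBucket count


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a key to a vBucket id with CRC32 (bit selection is an assumption)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def build_vbucket_map(num_servers: int, num_vbuckets: int = NUM_VBUCKETS):
    """Illustrative map: vbucket_map[vb] = index of that vBucket's master server."""
    return [vb % num_servers for vb in range(num_vbuckets)]


def master_server(key, vbucket_map, servers, num_vbuckets=NUM_VBUCKETS):
    """Step 3: hash KEY, then look up KEY's master server in the map."""
    vb = vbucket_for_key(key, num_vbuckets)
    return vb, servers[vbucket_map[vb]]
```

Because the key-to-vBucket hash never changes, adding a node only changes entries in the vBucket map; keys never have to be rehashed, which is what makes live rebalancing practical.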

Page 11

Couchbase data flow – under the hood

[Diagram: the master server, Replica Server 1 and Replica Server 2 for KEY, each running the Couchbase storage engine (listener-sender, RAM, disk). Labeled steps: (1) SET request arrives at KEY's master server; steps 2 and 3 mark replication to the replica servers and persistence to disk; (5) SET acknowledgement returned to application.]
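The write path in the diagram can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the hypothetical `Node` class keeps a RAM dict and a disk-write queue, the master replicates to each replica, and disk persistence is queued rather than awaited before the acknowledgement. Whether the real server waits for replicas before acknowledging is not stated on the slide, so treat that ordering as an assumption.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for one server's storage engine."""

    def __init__(self, name):
        self.name = name
        self.ram = {}              # in-memory cache
        self.disk = {}             # stand-in for persisted storage
        self.disk_queue = deque()  # keys waiting to be written to disk

    def store(self, key, value):
        self.ram[key] = value
        self.disk_queue.append(key)

    def flush(self):
        """Drain the disk write queue (asynchronous in a real server)."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.ram[key]


def handle_set(master, replicas, key, value):
    master.store(key, value)   # SET arrives at the master and is cached in RAM
    for replica in replicas:   # replicate the KEY-VALUE pair
        replica.store(key, value)
    # persistence stays queued here; it is not awaited before acknowledging
    return "STORED"            # acknowledgement returned to the application
```

Queuing the disk write is what keeps acknowledgements fast; the flip side, noted in the lessons-learned slide, is that sustained writes faster than the disk can absorb eventually produce errors.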

Page 12

Couchbase Architecture

[Diagram]
Data Manager
• moxi: port 11211 (memcapable 1.0)
• memcached protocol listener/sender: port 11210 (memcapable 2.0)
• Engine interface
• CouchDB
Cluster Manager (Erlang/OTP)
• HTTP REST management API/Web UI: port 8080
• On each node: heartbeat, process monitor, global singleton supervisor, configuration manager
• One per cluster: rebalance orchestrator, node health monitor, vBucket state and replication manager
• Distributed Erlang: ports 21100-21199; Erlang port mapper: port 4369

Page 13

Couchbase Architecture

[Diagram: the same architecture as on the previous slide, with the HTTP REST management API/Web UI shown on port 8091]

Page 14

Data buckets are secure Couchbase "slices"

[Diagram: Bucket 1 and Bucket 2 as slices of the aggregate cluster memory and disk capacity of the Couchbase data servers in the data center; also shown: web application server, application user, administrator console]

Page 17

Elastic Rebalancing

[Diagram: vBuckets 1-12 start on Node 1 and Node 2 while Node 3 is in pending state; vBucket migrators move vBuckets to Node 3; afterwards vBuckets 1-12 are spread across Nodes 1, 2 and 3, and the client talks to all three.]

Before
• Adding Node 3
• Node 3 is in pending state
• Clients talk to Node 1, 2 only

During
• Rebalancing orchestrator recalculates the vBucket map (including replicas)
• Migrate vBuckets to the new server
• Finalize migration

After
• Node 3 is balanced
• Clients are reconfigured to talk to Node 3
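The "During" step, recalculating the vBucket map, can be illustrated with a small Python sketch. It assumes the layout drawn on the slide (vBuckets 1-6 on Node 1, 7-12 on Node 2, Node 3 added) and a simple greedy policy: each node keeps vBuckets up to its fair share and the excess moves to underloaded nodes. Replicas and the actual migration protocol are ignored, and the policy is an assumption, not Couchbase's orchestrator algorithm.

```python
from collections import Counter


def rebalance(vbucket_map, nodes):
    """Return (new_map, moves) for vbucket_map: {vBucket id: node name}.

    Greedy sketch: every node keeps vBuckets up to its target share, and the
    remaining vBuckets move to nodes below target. Replicas are ignored.
    """
    vbs = sorted(vbucket_map)
    base, extra = divmod(len(vbs), len(nodes))
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    new_map, kept, moving = {}, Counter(), []
    for vb in vbs:
        owner = vbucket_map[vb]
        if owner in target and kept[owner] < target[owner]:
            new_map[vb] = owner          # stays put: no data movement
            kept[owner] += 1
        else:
            moving.append(vb)            # owner is over its share (or removed)

    moves = []
    for vb in moving:
        dest = next(n for n in nodes if kept[n] < target[n])
        new_map[vb] = dest
        kept[dest] += 1
        moves.append((vb, vbucket_map[vb], dest))
    return new_map, moves
```

With 12 vBuckets and three nodes, only 4 vBuckets (5, 6, 11 and 12) migrate; every other vBucket, and every key hashed to it, stays where it is.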

Page 18

Aol and Couchbase: Ad Targeting

Page 19

Online Advertising

[Diagram: Aol Advertising as the "match maker" between advertisers, publishers and internet users]

Publisher constraints:
• Payment model – may charge per impression, click, or conversion
• Allowability – may prohibit certain types of ads from being displayed

Advertiser constraints:
• Payment model – may pay per impression, click, or conversion
• Allowability – may restrict the web sites on which the ad is served
• Targeting – may only want the ad shown to internet users in a certain geolocation, or from a specific demographic
• Frequency – may limit how often the same user is shown the ad
• Campaign delivery:
  – The total ad budget may have to be delivered according to a plan
  – The served impressions may have to generate no less than a prescribed click-through or conversion rate

Terminology:
• CPM = Cost Per Mille, e.g. $1.50 per 1000 impressions
• CPC = Cost Per Click, e.g. $2 per click
• CPA = Cost Per Acquisition, e.g. $15 per acquisition/conversion
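To match ads priced under different payment models, a common approach is to convert every bid to an effective CPM (expected revenue per 1000 impressions) and filter out ads that violate constraints. The Python sketch below is a simplified, hypothetical "match maker": predicted CTR and conversion rates are assumed inputs (for example from models), the eligibility checks map the allowability, targeting and frequency constraints above onto simple set and counter tests, and campaign delivery pacing is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    ad_id: str
    model: str                 # "CPM", "CPC" or "CPA"
    bid: float                 # dollars per mille, per click, or per acquisition
    ctr: float = 0.0           # assumed predicted click-through rate (CPC ads)
    cvr: float = 0.0           # assumed predicted conversions per impression (CPA ads)
    category: str = "general"
    allowed_sites: set = field(default_factory=set)  # empty = any site
    geos: set = field(default_factory=set)           # empty = any geolocation
    frequency_cap: int = 0                           # 0 = no cap


def ecpm(ad: Ad) -> float:
    """Expected revenue per 1000 impressions, putting all payment models on one scale."""
    if ad.model == "CPM":
        return ad.bid
    if ad.model == "CPC":
        return ad.bid * ad.ctr * 1000
    if ad.model == "CPA":
        return ad.bid * ad.cvr * 1000
    raise ValueError(f"unknown payment model {ad.model!r}")


def eligible(ad, site, blocked_categories, user_geo, times_seen):
    if ad.category in blocked_categories:                    # publisher allowability
        return False
    if ad.allowed_sites and site not in ad.allowed_sites:    # advertiser allowability
        return False
    if ad.geos and user_geo not in ad.geos:                  # geo targeting
        return False
    if ad.frequency_cap and times_seen >= ad.frequency_cap:  # frequency cap
        return False
    return True


def choose_ad(ads, site, blocked_categories, user_geo, seen_counts):
    """Pick the eligible ad with the highest eCPM, or None if nothing qualifies."""
    candidates = [a for a in ads
                  if eligible(a, site, blocked_categories, user_geo,
                              seen_counts.get(a.ad_id, 0))]
    return max(candidates, key=ecpm, default=None)
```

With the slide's example prices, a $2 CPC ad at a 0.1% click-through rate is worth about $2 eCPM, more than the $1.50 CPM ad, while a $15 CPA ad needs a conversion rate of 0.01% per impression just to match $1.50.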

Page 20

Large-Scale Analytics
• Mission
• Team
• Data
  – Ad serving logs, content, and 3rd party data to be processed
• Research
• Technologies
  – Cloudera: Hadoop, HDFS, Flume, Workflow Manager
  – Distributed operational store: Couchbase
  – Light DB: MySQL
  – Use MPI for model building
• Constantly experimenting...

Page 21

[Diagram: data feeds and data from the Internet flow through Flume ingestion into the large-scale analytics stack and a Couchbase DB cluster, producing actionable data (to ad serving)]

Large-scale analytics:
- Reporting and insights
- Predictive segments
- Contextual analysis and segmentation

Other components:
- CPU-intensive (MPI-based ML)
- Distributed search (Sphinx)

Couchbase DB cluster:
- Operational store highly cached in Couchbase
- DB support for Hadoop and MySQL

Page 22

Use Cases Today
• Data set enrichment: given a field in a data set stored on HDFS, enrich it by adding related fields; media -> campaign -> advertiser chain
• Blackboard for inter-process/job communication: contextual segmentation pipelines; predictive modeling can load per-campaign models to be used for large-scale scoring
• Larger map-side joins (where the Hadoop DistributedCache and in-memory process/task cache are insufficient)
• Aggregations with a large number of item lookups, e.g. user-level contextual profiles aggregated from visited-URL contextual profiles stored in memcache
• Flume integration for data flow reliability and recovery
• Segment generation currently carried out through Hadoop pipelines and uploaded into server-side Membase for targeting
• But: a strong tendency to move closer to ad serving motivates thinking about new architectures to reduce segment generation time
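Two of these use cases, enrichment along the media -> campaign -> advertiser chain and aggregation of user-level profiles from per-URL profiles, reduce to many small key-value lookups. The Python sketch below uses a plain dict as a stand-in for the Couchbase/memcache store; the key formats (`media:<id>`, `campaign:<id>`) and the averaging rule for profiles are hypothetical choices for illustration.

```python
from collections import defaultdict


def enrich(record, kv):
    """Map-side join: follow media -> campaign -> advertiser via key-value lookups.
    kv is any dict-like store; the key formats are hypothetical."""
    out = dict(record)
    campaign = kv.get(f"media:{record['media_id']}")
    if campaign is not None:
        out["campaign_id"] = campaign
        advertiser = kv.get(f"campaign:{campaign}")
        if advertiser is not None:
            out["advertiser_id"] = advertiser
    return out


def user_profile(visited_urls, url_profiles):
    """Aggregate per-URL contextual profiles (category -> weight) into a
    user-level profile by averaging over URLs that have a stored profile."""
    totals, n = defaultdict(float), 0
    for url in visited_urls:
        profile = url_profiles.get(url)
        if profile is None:
            continue  # URL not yet profiled: skip it
        n += 1
        for category, weight in profile.items():
            totals[category] += weight
    return {c: w / n for c, w in totals.items()} if n else {}
```

Keeping the lookup tables in the cluster rather than in each task's memory is what lets these joins outgrow the DistributedCache limit mentioned above.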

Page 23

RT Framework: Capture, Compute and Forward

[Diagram, "Big Data Loop": CAPTURE – data feeds pass through Flume ingestion into a back-end Couchbase; COMPUTE – Hadoop compute cluster; FORWARD – front-end Couchbase and ad-serving logic]

Page 24

RT Contextual Segmentation

[Diagram: data feeds pass through Flume ingestion into Membase, which holds the Active Event Frame; a User-ContentID Mapper builds the UC Map; a User-Segment Mapper combines it with the ContentID-Segment Map into the US Short-term Map, used by Couchbase + ad-serving logic]
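The two mappers on this slide amount to a grouping step and a join. A minimal Python sketch follows, with dicts standing in for the Membase/Couchbase maps. The event format (`(user_id, content_id)` pairs) is an assumption, and windowing of the active event frame is not modeled.

```python
from collections import defaultdict


def user_content_mapper(event_frame):
    """Build the UC map (user -> set of content IDs) from the active event frame.
    Events are assumed to be (user_id, content_id) pairs."""
    uc_map = defaultdict(set)
    for user_id, content_id in event_frame:
        uc_map[user_id].add(content_id)
    return dict(uc_map)


def user_segment_mapper(uc_map, content_segments):
    """Join the UC map with the ContentID-Segment map into the
    US short-term map (user -> set of segments)."""
    us_map = {}
    for user_id, content_ids in uc_map.items():
        segments = set()
        for cid in content_ids:
            segments.update(content_segments.get(cid, ()))
        if segments:  # users with no segmented content get no entry
            us_map[user_id] = segments
    return us_map
```

Because both steps only need point lookups and small per-user sets, they can run against the in-memory store instead of waiting for a full Hadoop pipeline, which is the motivation for moving segment generation closer to ad serving.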

Page 25

Rough Capacity Estimates
• Data volume calculation
  – 60,000 events per second -> 60,000 × 900 s = 54 million (roughly 55 million) events during a 15-minute burst
  – 1KB per event -> 55GB for the staging frame + 55GB for the loading frame = 110GB; the rest, ~800GB, is for data output from processes
  – 10 nodes at 128GB = 1.28TB (roughly 1TB) -> more than enough! (assuming one copy)
  – Exact calculations at http://wiki.membase.org/display/membase/Sizing+Guidelines
• Processing bandwidth
  – Assuming the cluster supports 200K ops per second (conservative)
  – 60,000 operations/sec reserved for loading the current 15-minute frame
  – Remaining 140K operations/sec for jobs
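The estimate is simple arithmetic, so it can be re-derived and re-run with different inputs. The Python sketch below uses the slide's parameters as defaults and decimal units (1 GB = 1,000,000 KB), which is an assumption; the function name is hypothetical.

```python
def capacity_estimate(events_per_sec=60_000, burst_sec=900, event_kb=1,
                      nodes=10, ram_gb_per_node=128, cluster_ops=200_000):
    """Re-derive the rough capacity estimate (one copy of the data, no replicas)."""
    events = events_per_sec * burst_sec             # events in one 15-minute burst
    frame_gb = events * event_kb / 1_000_000        # KB -> GB, decimal units
    frames_gb = 2 * frame_gb                        # staging frame + loading frame
    cluster_gb = nodes * ram_gb_per_node            # aggregate RAM across nodes
    return {
        "events": events,
        "frame_gb": frame_gb,
        "frames_gb": frames_gb,
        "cluster_gb": cluster_gb,
        "left_for_outputs_gb": cluster_gb - frames_gb,
        # loading the current frame consumes one operation per incoming event
        "ops_for_jobs": cluster_ops - events_per_sec,
    }
```

With the defaults this yields 54 million events, 108GB for the two frames, 1,280GB of RAM and 1,172GB left for process output, so the slide's rounded figures (55GB per frame, ~1TB, ~800GB) are on the conservative side.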

Page 26

AOL Targeting: Lessons Learned Using Couchbase

Page 27

Couchbase Architecture
• Requirements
  – Support initially up to 1.2 billion keys (1 key per user in the system)
  – Minimum of 10K writes per second
  – Two clusters, one on each coast, to reduce latency
  – Easily scalable: support an increasing number of keys and writes/reads per second, and seamlessly allow growth in the future
• Couchbase setup
  – Initially 1.1 billion keys, now 650 million keys
  – 250 to ~2K writes/second
  – 1K to 7K reads/second
  – 2 clusters, 10 nodes each
  – Dual writing, once to each cluster
  – 1.19 TB of RAM available, 124/128 GB allocated on each server; 200 GB in use at the moment
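The setup writes every update twice, once to each coast's cluster. A minimal Python sketch of such a dual write follows: both writes are issued concurrently and success is reported per cluster. `clients` is a hypothetical mapping from cluster name to any object with a `set(key, value)` method; it stands in for a memcached/Membase client such as Spy, whose real API is not reproduced here. Repairing a side that failed (retry queue, reconciliation job) is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def dual_write(clients, key, value, timeout_s=1.0):
    """Write the same key to every cluster and return {cluster: succeeded}.

    clients maps a cluster name (e.g. "east", "west") to any object with a
    set(key, value) method; these client objects are hypothetical stand-ins.
    """
    results = {}
    # Note: leaving the `with` block waits for every submitted call, so
    # timeout_s bounds how long we wait for each result, not total latency.
    with ThreadPoolExecutor(max_workers=len(clients)) as pool:
        futures = {name: pool.submit(c.set, key, value) for name, c in clients.items()}
        for name, fut in futures.items():
            try:
                fut.result(timeout=timeout_s)
                results[name] = True
            except Exception:  # timeout or client error: record as failed
                results[name] = False
    return results
```

Writing to both clusters from the application keeps each coast's reads local; the price is that the application, not the database, must notice and repair a write that landed on only one side.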

Page 28

Lessons Learned

Issue: Replication across data centers by writing to a local moxi dramatically reduces the throughput.
Resolution: Do not use the local moxi. The Membase client (Spy) should connect directly to an instance of a remote moxi to perform updates.

Issue: Correctly sizing the Membase cluster based on the expected number of keys and the size of the objects is critical to Membase operation.
Resolution: Membase needs 150 bytes per item for metadata. Ensure mem_high_wat is not exceeded to prevent spillover to disk. If incoming data arrives faster than it can be written to disk, the system returns errors.

Issue: Membase settings such as the memory high/low water marks modified by flushctl are reverted to defaults when the service restarts.
Resolution: Re-issue the flushctl command every time the Membase server restarts. Membase indicated that a better configuration system allowing permanent changes to settings is coming. Until then, they recommend sticking with the default settings.
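The sizing lesson can be turned into a quick calculator. The Python sketch below takes the 150 bytes of metadata per item from the slide; counting the key length as per-item overhead, the `high_water_fraction` of 0.75, and the function name are illustrative assumptions, so check the sizing guidelines before relying on the result.

```python
def bucket_ram_needed(num_items, avg_key_bytes, avg_value_bytes,
                      replicas=0, metadata_bytes=150, high_water_fraction=0.75):
    """Rough RAM (bytes) to keep every item resident below the high water mark.

    metadata_bytes=150 comes from the lessons-learned slide; treating the key
    as part of per-item overhead and high_water_fraction=0.75 are assumptions.
    """
    copies = 1 + replicas
    metadata = num_items * (metadata_bytes + avg_key_bytes) * copies
    data = num_items * avg_value_bytes * copies
    # Size so that total usage stays under mem_high_wat, avoiding spillover to disk
    return (metadata + data) / high_water_fraction
```

For scale: at the 1.2 billion keys in the requirements, the 150-byte metadata alone is 180 GB per copy before any keys or values are stored, which is why sizing by key count matters as much as sizing by data volume.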

Page 30

Customers and Partners

[Logos: customers (partial listing) and partners]