Criteo Infrastructure (Platform) Meetup, 22nd February 2017. Diarmuid Gill, VP R&D - Platforms: Introduction & welcome note

Page 1: Criteo Infrastructure (Platform) Meetup

Criteo Infrastructure (Platform) Meetup

22nd February 2017

Diarmuid Gill, VP R&D - Platforms

Introduction & welcome note

Page 2: Criteo Infrastructure (Platform) Meetup

About Criteo

1

Page 3: Criteo Infrastructure (Platform) Meetup

3 | Copyright © 2017 Criteo

Our mission

TARGET THE RIGHT USER

AT THE RIGHT TIME

WITH THE RIGHT MESSAGE

Page 4: Criteo Infrastructure (Platform) Meetup

Key Figures

• 18 000 publishers
• 90% retention rate (2)
• 130+ countries
• Listed on the NASDAQ since October 2013
• R&D represents 21% of the workforce
• 2 500 employees
• $21 billion (3)
• 14 000 advertisers
• $1,799 million (1)
• 31 offices

(1) Revenue in 2016
(2) Annual rate, 2015
(3) Turnover generated for our clients: worldwide post-click turnover from January to December 2015

Page 5: Criteo Infrastructure (Platform) Meetup

How does it work ?

2

Page 6: Criteo Infrastructure (Platform) Meetup

GENERAL CONCEPT

1. Users visit an advertiser’s website
2. Criteo identifies the users (via cookies)
3. Users leave the advertiser’s website & browse publisher sites on the Internet
4. Criteo identifies users on these pages (via cookie)
5. Criteo displays an advertising banner, personalized for each user
6. Users click through directly to the advertiser’s page

Retargeting principles

Page 7: Criteo Infrastructure (Platform) Meetup

Underlying infrastructure

3

Page 8: Criteo Infrastructure (Platform) Meetup

Scale @ Criteo

• 3.2B catalog items ingested/day, 6B items stored
• 3.6B cookies/device IDs seen per month
• 3.9B personalized banners/day
• 49 RTBs @ 120B bid requests/day
• 3M QPS at peak
• 90 Gbps bandwidth
• 20K servers
• 27PB of data stored
• 3.6PB of data read daily
• 500B log lines processed/day
• 363TB of RAM in memcached, 37M req/s
• 300K Hadoop jobs/day
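To put the daily totals on the "Scale @ Criteo" slide next to the per-second figures, the short sketch below converts three of them into average per-second rates. The decimal unit convention (1 PB = 10^15 bytes) is an assumption, and these are daily averages, not the peak values quoted on the slide.

```javascript
// Convert daily totals from the slide into average per-second rates.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Figures copied from the slide; decimal units (1 PB = 1e15 bytes) are an assumption.
const daily = {
  bidRequests: 120e9,
  logLines: 500e9,
  bytesRead: 3.6e15,
};

function perSecond(totalPerDay) {
  return totalPerDay / SECONDS_PER_DAY;
}

const avgBidRequestsPerSec = perSecond(daily.bidRequests); // about 1.39 million per second
const avgLogLinesPerSec = perSecond(daily.logLines); // about 5.79 million per second
const avgReadGBPerSec = perSecond(daily.bytesRead) / 1e9; // about 41.7 GB per second

console.log(
  Math.round(avgBidRequestsPerSec),
  Math.round(avgLogLinesPerSec),
  avgReadGBPerSec.toFixed(1)
);
```

The average bid-request rate comes out at a little under half of the 3M QPS peak, although the slide does not say that the peak figure counts bid requests only.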

Page 9: Criteo Infrastructure (Platform) Meetup

Hadoop @ Criteo

Batch processing:

• Hadoop as a Service:
  • 2 clusters: main + a backup one for degraded mode
  • Cloudera CDH5
  • 2300 servers total (1300 + 1000), 76K vcores
  • 50PiB storage capacity
• Own job scheduler for improved data flow and coordination
• 300k jobs per day

Page 10: Criteo Infrastructure (Platform) Meetup

Infrastructure Key Figures

Hosting Global Partners:

• Sunnyvale: 2 PoP, 500 kVA, 2 006 servers
• New York: 2 PoP, 930 kVA, 2 793 servers
• Hong Kong: 2 PoP, 472 kVA, 2 185 servers
• Paris: 3 PoP, 1 800 kVA, 5 003 servers
• Amsterdam: 2 PoP, +2 500 kVA, 3 874 servers
• Tokyo: 2 PoP, 455 kVA, 2 564 servers
• Shanghai: 1 PoP, 200 kVA, 907 servers
• Ashburn: 2 PoP, 1,1 MVA, 1 170 servers
• Worldwide: 16 PoP, ~8 MVA contracted, 20 526 servers, up to 90 Gbps, 3M QPS

Page 11: Criteo Infrastructure (Platform) Meetup

Some of the many technologies used at Criteo

Page 12: Criteo Infrastructure (Platform) Meetup

What does “Platforms” mean in Criteo?

4

Page 13: Criteo Infrastructure (Platform) Meetup

Top Level Applications

Platforms

Infrastructure

SRE

Advertiser Publisher

WebScale

Prediction
Dynamic Creative
Recommendation
Engine

• Catalog
• User Events
• Campaigns
• Reporting

• RTB
• Direct
• Campaigns
• Reporting

Systems

Platforms

Systems

Engine

Page 14: Criteo Infrastructure (Platform) Meetup

Analytics Platforms

Advertiser Publisher

Analytics

AX/BI

Reporting / Billing Reporting / Payments

Page 15: Criteo Infrastructure (Platform) Meetup

Tonight’s programme

4

Page 16: Criteo Infrastructure (Platform) Meetup

Tonight’s menu

Bill of Fare
***

1st talk: FastTrack: scaling customer integration - Nicolas Laveau, Léo-Paul Goffic & Camille Coueslant -

2nd talk: Evolution of data structures in Yandex.Metrica - Alexey Milovidov -

3rd talk: Don't take your software for granted - Cedrick Montout -

4th talk: Evolution of analytics at Criteo - Justin Coffey -

***
21:05 - 22:00 Networking

Page 17: Criteo Infrastructure (Platform) Meetup

Thank you!

Page 18: Criteo Infrastructure (Platform) Meetup

Camille Coueslant, Léo-Paul Goffic, Nicolas Laveau

2017/02/22

FastTrack: Scaling customer integration

Page 19: Criteo Infrastructure (Platform) Meetup

What do we do in Criteo?

Deliver the right message to the right user at the right time

Page 20: Criteo Infrastructure (Platform) Meetup

Integration: Creatives settings

• Banners need branding
  • Logo
  • Font
  • Color palette
• Banners come in many formats

Page 21: Criteo Infrastructure (Platform) Meetup

Integration: Tags

• Banners are based on user intent
• Tags on customer store
• Different types of intent
  • Home page view
  • Product view
  • Listing view
  • Basket
  • Sales
• Intent at product level

Home:

<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "viewHome" }
);
</script>

Sales:

<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "trackTransaction", id: "tr-56182-2123", item: [
    { id: "patronus", price: 12.54, quantity: 3 },
    { id: "avada-kedavra", price: 1099.99, quantity: 1 }
    /* add a line for each item in the user's basket */
  ]}
);
</script>

Page 22: Criteo Infrastructure (Platform) Meetup

Integration: Product Feed

• Banners contain products
• Characteristics of products are used for recommendation
• Name, description, image, price for display

XML:

<item>
  <g:id>0</g:id>
  <title>Abracadabra</title>
  <g:image_link>http://www.magic.com/assets/spells/abracadabra.png</g:image_link>
  <link>http://www.magic.com/spells/abracadabra</link>
  <description>Multi-purpose spell. Your companion for every occasion!</description>
  <g:price>625.99</g:price>
  <g:google_product_category>35</g:google_product_category>
</item>

CSV:

id;title;image_link;link;description;price;google_product_category
0;Abracadabra;http://www.magic.com/assets/spells/abracadabra.png;http://www.magic.com/spells/abracadabra;Multi-purpose spell. Your companion for every occasion!;625.99;Arts & Entertainment > Hobbies & Creative Arts > Magic & Novelties

Page 23: Criteo Infrastructure (Platform) Meetup

Back in 2014

When customers saw what they had to implement

Page 24: Criteo Infrastructure (Platform) Meetup

Back in 2014

When technical support saw the first implementation

Page 25: Criteo Infrastructure (Platform) Meetup

Back in 2014

When customers tried to debug their implementation

Page 26: Criteo Infrastructure (Platform) Meetup

Criteo grows… fast!

This does not scale!

« Performance is everything » BUT we need to onboard first

Clients

TS

Page 27: Criteo Infrastructure (Platform) Meetup

All is not lost!

Technology & UX to the rescue!

Page 28: Criteo Infrastructure (Platform) Meetup

Tags
Part 1: Tag Validation Dashboard

Page 29: Criteo Infrastructure (Platform) Meetup

Goal

• Show near real-time metrics on tracker format issues
• Detect mismatches between the trackers and the product feed
• Provide fine-grained data (max 24 hours)
• Available for each of our clients (= worldwide)

Page 30: Criteo Infrastructure (Platform) Meetup

How

Initial trackers architecture

Page 31: Criteo Infrastructure (Platform) Meetup

How

1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
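The audit step could look roughly like the sketch below. The slides do not show the actual audit schema, so the function name `auditTrackerEvent`, the record fields, and the error codes are all hypothetical; publishing the record to Kafka is left out because it depends on the client library in use.

```javascript
// Hypothetical audit step: check one tracker event against the product feed
// and build the record that would then be published to Kafka (not shown).
function auditTrackerEvent(event, feedProductIds) {
  const errors = [];
  if (typeof event.account !== "number") errors.push("missing_account");
  if (!event.type) errors.push("missing_event_type");
  const items = Array.isArray(event.items) ? event.items : [];
  for (const item of items) {
    if (!item.id) errors.push("missing_product_id");
    else if (!feedProductIds.has(item.id)) errors.push("product_not_in_feed");
  }
  return {
    timestamp: new Date().toISOString(),
    account: event.account,
    eventType: event.type,
    valid: errors.length === 0,
    errors,
  };
}

// Example: the feed only knows "patronus", so the second item is a mismatch.
const feedIds = new Set(["patronus"]);
const audit = auditTrackerEvent(
  {
    account: 666,
    type: "trackTransaction",
    items: [{ id: "patronus" }, { id: "avada-kedavra" }],
  },
  feedIds
);
console.log(JSON.stringify(audit));
```

Keeping the audit a pure function of the event and the feed makes it easy to reuse the same checks wherever tracker events need to be validated.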

Page 32: Criteo Infrastructure (Platform) Meetup

Why Druid

• Druid is an open-source column-oriented distributed data store
• Advantages:
  • Fast aggregation queries on huge amount of metrics
  • Real-time streaming ingestion
  • Scalable
  • Highly available

Page 33: Criteo Infrastructure (Platform) Meetup

How

1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
4. Query Druid from Integrate
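For the last step, querying Druid, a request body could be built along these lines. This is a minimal sketch using Druid's native `groupBy` query type; the datasource name `tracker_audit` and the dimensions `account` and `error` are assumptions, since the slides do not show the real schema. The 24-hour window mirrors the "max 24 hours" goal stated earlier.

```javascript
// Build a Druid native groupBy query counting audit rows per error type
// for one account over the last 24 hours.
function buildErrorCountQuery(account, now = new Date()) {
  const start = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return {
    queryType: "groupBy",
    dataSource: "tracker_audit", // hypothetical datasource name
    granularity: "hour",
    intervals: [`${start.toISOString()}/${now.toISOString()}`],
    dimensions: ["error"], // hypothetical dimension
    filter: { type: "selector", dimension: "account", value: String(account) },
    aggregations: [{ type: "count", name: "rows" }],
  };
}

const query = buildErrorCountQuery(666, new Date("2017-02-22T20:00:00Z"));
console.log(JSON.stringify(query, null, 2));
```

A body like this would typically be sent as JSON in a POST to the broker's `/druid/v2` endpoint. Note that the `count` aggregator counts stored rows, which can differ from the number of raw events when rollup is enabled at ingestion.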

Page 34: Criteo Infrastructure (Platform) Meetup

Result

Page 35: Criteo Infrastructure (Platform) Meetup

Tags
Part 2: Tag Debug Mode

Page 36: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

How do I make sure I send Criteo the right information from my website?

Fig 1: Criteo Hotline

Page 37: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

How do I make sure I send Criteo the right information from my website?

Fig 2: Happy customer

Page 38: Criteo Infrastructure (Platform) Meetup

How tags work

https://www.mvmtwatches.com/

Page 39: Criteo Infrastructure (Platform) Meetup

How tags work

https://www.mvmtwatches.com/

ld.js

Page 40: Criteo Infrastructure (Platform) Meetup

How tags work

https://www.mvmtwatches.com/

ld.js

GET /event?a=%5B30072%…

Page 41: Criteo Infrastructure (Platform) Meetup

How tags work

https://www.mvmtwatches.com/

ld.js

GET /event?a=%5B30072%…

200 OK

Page 42: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

Page 43: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

https://www.mvmtwatches.com/#enable-tag-debug-mode

Page 44: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js

if (document.location.hash == debugHash) loadLdDebug();

Page 45: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js

ld-debug.js

if (document.location.hash == debugHash) loadLdDebug();

addDebugIframe();

Page 46: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js

GET /event?a=%5B30072%…&debugMode=1

ld-debug.js

if (document.location.hash == debugHash) loadLdDebug();

addDebugIframe();

Page 47: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

https://www.mvmtwatches.com/#enable-tag-debug-mode

ld.js:
if (document.location.hash == debugHash) loadLdDebug();

ld-debug.js:
addDebugIframe();

Request:
GET /event?a=%5B30072%…&debugMode=1

Response:
200 OK
Content-Type: application/javascript

sendDebugInformationToIframe({
  audit: {
    product: { image: '…' },
    errors: […]
  }
});
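The debug-mode switch shown on these slides can be sketched as two small functions. The hash value comes from the URL on the slide; the helper `buildEventQuery` and the `[30072]` payload are hypothetical, because the real contents of the `a` parameter are elided in the deck.

```javascript
// Fragment that turns debug mode on, as shown in the slide's URL.
const DEBUG_HASH = "#enable-tag-debug-mode";

// In a browser this would be called with document.location.hash.
function isDebugModeEnabled(locationHash) {
  return locationHash === DEBUG_HASH;
}

// Hypothetical helper: serialize the tag events and add debugMode=1 on request.
function buildEventQuery(events, debug) {
  const params = new URLSearchParams();
  params.set("a", JSON.stringify(events));
  if (debug) params.set("debugMode", "1");
  return `/event?${params.toString()}`;
}

const debugUrl = buildEventQuery([30072], isDebugModeEnabled("#enable-tag-debug-mode"));
const normalUrl = buildEventQuery([30072], isDebugModeEnabled(""));
console.log(debugUrl);
console.log(normalUrl);
```

Because the only difference between the two requests is the extra query parameter, the server can run the same processing in both cases and simply return the audit details when `debugMode=1` is present, which is consistent with the claim that debug mode mirrors what is processed down the line.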

Page 48: Criteo Infrastructure (Platform) Meetup

Tag Debug Mode

• Gives you fine-grained insights into the quality of the information sent
• Requires no technical knowledge
• Mirrors exactly what will be processed down the line

Page 49: Criteo Infrastructure (Platform) Meetup

Feed

Page 50: Criteo Infrastructure (Platform) Meetup

Goal

• Provide feedback ASAP on a subset of products
• Provide feedback on the whole feed
• Automatic format detection (Google specs)
• User can validate the structure of the feed
• User can review some products
• As close as possible to the daily feed import

Page 51: Criteo Infrastructure (Platform) Meetup

Full import

Daily import architecture

Page 52: Criteo Infrastructure (Platform) Meetup

Full import

Update the feed-processing Hadoop job to compute error and attribute statistics

Page 53: Criteo Infrastructure (Platform) Meetup

Full import

Launch full import from Integrate, retrieve and display statistics

Page 54: Criteo Infrastructure (Platform) Meetup

Test import

Create a Marathon application that:
- Streams the incoming feed
- Detects the format
- Reuses part of the feed-processing Hadoop job's Java code
- Saves the import & statistics in a DB
- Provides an API to fetch statistics
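The "detect format" and "statistics" steps can be illustrated with a small sketch. It is not the Marathon application's actual code (which the slide says reuses Java from the Hadoop job); the function names, the list of required attributes, and the second sample product are made up for illustration, and the CSV handling is deliberately naive (no quoted fields).

```javascript
// Guess the feed format from the first non-blank characters of a sample.
function detectFeedFormat(sample) {
  const text = sample.trimStart();
  if (text.startsWith("<")) return "xml";
  const header = text.split(/\r?\n/)[0];
  for (const sep of [";", "\t", ","]) {
    if (header.split(sep).length > 1) return "csv";
  }
  return "unknown";
}

// Count, per required attribute, how many products are missing a value
// in a semicolon-separated feed.
function csvStats(csv, required = ["id", "title", "link", "image_link", "price"]) {
  const [headerLine, ...rows] = csv.trim().split(/\r?\n/);
  const columns = headerLine.split(";");
  const missing = Object.fromEntries(required.map((name) => [name, 0]));
  for (const row of rows) {
    const cells = row.split(";");
    for (const name of required) {
      const value = cells[columns.indexOf(name)];
      if (value === undefined || value.trim() === "") missing[name] += 1;
    }
  }
  return { products: rows.length, missing };
}

// First product taken from the deck's example feed; the second one is invented
// and intentionally lacks a title and a price.
const sampleFeed = [
  "id;title;image_link;link;price",
  "0;Abracadabra;http://www.magic.com/assets/spells/abracadabra.png;http://www.magic.com/spells/abracadabra;625.99",
  "1;;http://www.magic.com/assets/spells/x.png;http://www.magic.com/spells/x;",
].join("\n");

const feedFormat = detectFeedFormat(sampleFeed);
const feedStats = csvStats(sampleFeed);
console.log(feedFormat, JSON.stringify(feedStats));
```

Working on a streamed sample rather than the full file is what would allow feedback on a subset of products before the whole feed has been downloaded.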

Page 55: Criteo Infrastructure (Platform) Meetup

Result

Page 56: Criteo Infrastructure (Platform) Meetup

Result

Page 57: Criteo Infrastructure (Platform) Meetup

Creatives

Page 58: Criteo Infrastructure (Platform) Meetup

How banners work at Criteo

• Actual humans pick predefined layouts, colors, CTAs
• Then those are combined with product information and optimized on-the-fly

[Diagram: predefined elements multiplied together (× × × =) into a banner, with example CTAs “Je découvre !” (“Discover!”) and “J’achète !” (“Buy!”)]

Page 59: Criteo Infrastructure (Platform) Meetup

How banners work at Criteo

“Can I have drop shadows on my products?”

“I’m not sure about the pink”

“Could it autoplay loud music?”

As a result, clients worry

“What will my banners look like?”

Page 60: Criteo Infrastructure (Platform) Meetup

How banners work at Criteo

There is stuff we can’t do, and stuff we don’t necessarily want to do

“What will my banners look like?”

“Can I have drop shadows on my products?”

“I’m not sure about the pink”

“Could it autoplay loud music?”

Page 61: Criteo Infrastructure (Platform) Meetup

Creatives to the rescue

And it takes some back and forth.

Our goal:
• Give advertisers a preview of what it’ll look like
• Give advertisers customization options
• Give feedback on the performance impact

• 80% of advertisers validate their Creatives in < 2 minutes
• 80% of advertisers don’t ask for a change

Page 62: Criteo Infrastructure (Platform) Meetup

Creatives

Bring on UX, R&D, Product, Sales, Creatives & Technical Support

Page 63: Criteo Infrastructure (Platform) Meetup

Creatives

Bring on UX, R&D, Product, Sales, Creatives & Technical Support

Page 64: Criteo Infrastructure (Platform) Meetup

Creatives

1. Education
2. Preview
3. Performance
4. Customization

Page 65: Criteo Infrastructure (Platform) Meetup

Going further!
And mostly faster

Page 66: Criteo Infrastructure (Platform) Meetup

eCommerce Platforms

Lots of our clients run on ready-to-use platforms that have APIs

As a result, we can completely automate the integration workflow for them!

Page 67: Criteo Infrastructure (Platform) Meetup

Shopify integration

Only 2 clicks needed!

Reduced integration time from 14 days to 20 minutes

Page 68: Criteo Infrastructure (Platform) Meetup

Integration today

Page 69: Criteo Infrastructure (Platform) Meetup

How customers / technical support / we feel

Page 70: Criteo Infrastructure (Platform) Meetup

Integrate achievements

• 14d median integration time
  • 43 days in 2014
• 75% tags without help
  • Only 25% in 2014
• 66% complete feed in < 1h
• 92% validate Creatives in < 2 mn
  • 95% accept “as-is”
  • 4% accept with performance downgrade
  • Only 1% ask for modification
• 20 mn integration w/ Shopify App
• 2014: 600 integrations/quarter
• 2016: 1800 integrations/quarter
• 50% handled through Integrate

“I’m in love with the Tag Debug Mode” - Nassim Aissat, Global TS

Page 71: Criteo Infrastructure (Platform) Meetup

Questions?

Page 72: Criteo Infrastructure (Platform) Meetup

Page 73: Criteo Infrastructure (Platform) Meetup

What does Black Friday mean at Criteo?

Page 74: Criteo Infrastructure (Platform) Meetup

Release freeze: trying to guarantee the stability of the platform...

... with nasty side-effects

Getting ready for Black Friday

Page 75: Criteo Infrastructure (Platform) Meetup

How to evaluate the health of the datacenter at a glance?

Enter Grafana

Monitoring the datacenter

Page 76: Criteo Infrastructure (Platform) Meetup

With specific filters, deviant machines can be spotted easily

Monitoring the datacenter

Page 77: Criteo Infrastructure (Platform) Meetup

Drilling down...

Monitoring the datacenter

Page 78: Criteo Infrastructure (Platform) Meetup

Until finding a likely culprit

Monitoring the datacenter

Page 79: Criteo Infrastructure (Platform) Meetup

And switching to micro analysis to find the root cause
• Process Explorer
• Profiling
• WinDbg
• ClrMD

Monitoring the datacenter

Page 80: Criteo Infrastructure (Platform) Meetup

Load Balancing

HAProxy

Page 81: Criteo Infrastructure (Platform) Meetup

Basics of Client-Side Load Balancing

Page 82: Criteo Infrastructure (Platform) Meetup

Basics of Client-Side Load Balancing
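The slides do not spell out which balancing algorithm the clients use, so the following is only an illustrative sketch of one common client-side technique, smooth weighted round-robin, where the caller itself chooses the target server. The server names and weights are hypothetical.

```javascript
// Smooth weighted round-robin: on every pick each server's running score grows
// by its weight, the highest score wins, and the winner is penalized by the
// total weight so that picks are spread out instead of coming in bursts.
function createBalancer(servers) {
  const state = servers.map((s) => ({ ...s, current: 0 }));
  const total = state.reduce((sum, s) => sum + s.weight, 0);
  return function next() {
    let best = null;
    for (const s of state) {
      s.current += s.weight;
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= total;
    return best.name;
  };
}

const nextServer = createBalancer([
  { name: "fast-host", weight: 2 }, // hypothetical weights
  { name: "slow-host", weight: 1 },
]);
const picks = Array.from({ length: 6 }, () => nextServer());
console.log(picks.join(","));
```

With these weights the faster host receives two of every three requests. When the server fleet mixes hardware generations, weights like these are one way to keep slower machines from being overloaded.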

Page 83: Criteo Infrastructure (Platform) Meetup

Mixed technical specifications

Page 84: Criteo Infrastructure (Platform) Meetup

Gen8 Load test

Page 85: Criteo Infrastructure (Platform) Meetup

Gen8 vs Gen9 servers

Page 86: Criteo Infrastructure (Platform) Meetup

Observable result

2/3

1/3

Page 87: Criteo Infrastructure (Platform) Meetup

Conclusion

Do not take your software for granted
• Internal infrastructure will change
• External workload will change

… be prepared

Page 88: Criteo Infrastructure (Platform) Meetup

The Analytics Stack at Criteo

Yesterday, Today and Tomorrow with an assist from Bill Murray

Justin Coffey, Team Lead

Page 89: Criteo Infrastructure (Platform) Meetup

The Ghost of Christmas Present

What do we have now?

Page 90: Criteo Infrastructure (Platform) Meetup

Criteo: Scale of Data

• 4 Billion ads served each day

• 200+ Billion events logged each day

• 50TBs of data ingested each day

• 10 trillion records processed each day

Page 91: Criteo Infrastructure (Platform) Meetup

Criteo: Scale of the Analytics Stack

• 50+ TB ingested / day
• 2000+ jobs / day
• 7+ PB under management
• 200+ Analysts
• 400+ Engineers
• 1000+ Sales and Ops

Page 92: Criteo Infrastructure (Platform) Meetup

Criteo: Scaling Analysts

[Chart: Analysts Hired since 2010. x-axis: months from Sep 2010 to Mar 2016; y-axis: 0 to 180 analysts]

Page 93: Criteo Infrastructure (Platform) Meetup

Criteo: Scaling Data

[Chart: Growth of a Single Dataset Since July 2014. x-axis: dates from 7/13/14 to 9/18/16; y-axis: 0 to 140,000,000,000]

Page 94: Criteo Infrastructure (Platform) Meetup

Criteo: The Analytics Stack Today

Ad-Hoc Analysis

Hadoop for primary storage and point of ingestion

Data Transformation on top of Hadoop

Hive (7PB) and Vertica (100+ TB) Data Warehouses

Ad-Hoc SQL on Hive and Vertica, Reporting on Tableau and Vertica

Orchestration via Langoustine

Page 95: Criteo Infrastructure (Platform) Meetup

Our Stack is Simple

• Few moving parts

• Purposefully built with Shiny Thing blinders on

• It's okay to not have the "latest and greatest" tech

• Good enough is, actually, always good enough

Page 96: Criteo Infrastructure (Platform) Meetup

On Shiny Things: the universe is vast

so be selective, and master what you select

Page 97: Criteo Infrastructure (Platform) Meetup

The Ghost of Christmas Past

Before we continue, a quick history lesson of how we got here is in order...

Page 98: Criteo Infrastructure (Platform) Meetup

Everything starts somewhere

and it's not always pretty.

Page 99: Criteo Infrastructure (Platform) Meetup

In early 2013, you could use SQL Server…

AdServer_Db, Publisher_Db, LogStatus_Db, BlogWidgetStat_Db, BlogWidgetAdStat_db, Traffic_custom_db, Extranet_Db, Traffic_custom_db, CATEGORY_DB, Mail_MonitorDB, Inventory_Db, AdServerBo_Db, AdServerStat_Db, DashBoard_DB, Dashboard_Security_DB, WebServerStat_db, ABTesting_DB, AdvertiserFatigueStats_db, ADVERTISING_DB, StatPrediction_DB, CAST_DB, CriteoRefdb, ImportDB, RISK_DB, GalacticaStats_DB, MaxCpc_DB, UserProfilingDB, WorkflowPersistency_db, CAST_DB_HOURLY, StatEngine_Db, Crawler_Db, BICustom_DB, Lookalike_DB, Widget_db, AOC_DB, AOC_DB, Build_Deploy_Fake_db, publisher_stats_db, TestFwk_Db, LogMonitorDb, ADMINLOGS_DB, SqoopExport_db, FraudDetection_db, HPClink_DB, DW_DB

tsuissesbenl_stat_db, Heyokr_Stat_db, kiabiit_stat_db, Ultaus_Stat_db, Crutchfieldus_Stat_db, Forzierijp_Stat_db, Retailchoiceuk_Stat_db, Ryanairhotelses_Stat_db, Speakyplanetfr_Stat_db, Autowayjp_Stat_db, Sicilianobr_Stat_db, Jukenhousingjp_Stat_db, Cosyforyoufr_Stat_db, Tripadvisorru_Stat_db, Linasmatkassese_Stat_db, Ellepassionsfr_Stat_db, Skyde_Stat_db, Swimdoctormallkr_Stat_db, Sitescoutbr_Stat_db, Travelzoousnewusers_Stat_db, Platekompanietno_Stat_db, Testaoc110413frcom_Stat_db, Megapoolnl_Stat_db, Elektrototaalmarktnl_Stat_db, Intersportuk_Stat_db, Usineadesignfr_Stat_db, Lekmerno_Stat_db, Vuelingit_Stat_db

Valuedopinions_Stat_db, Forzierino_Stat_db, Artisantiuk_Stat_db, Idbusit_Stat_db, Cocostorykr_Stat_db, Artnaturejp_Stat_db, Byggmaxse_Stat_db, Corporatecriteopmit_Stat_db, Aramisauto_Stat_db, Migoaes_Stat_db, Degrotespeelgoedwinkelnl_Stat_db, Diorcouturit_Stat_db, Kaufuniquede_Stat_db, Codigallerykr_Stat_db, Mandarinaduckfr_Stat_db, Comarketingorangenokiafr_Stat_db, Sinbiangkr_Stat_db, Cheapflightsuk_Stat_db, Undergirlkr_Stat_db, Agradinl_Stat_db, Kofferprofide_Stat_db, Domodipl_Stat_db, Mandarinaduckat_Stat_db, Mobilegermany_Stat_db, Chlit_Stat_db, Spreadshirtuk_Stat_db, Casalrunningfr_Stat_db, Bloomfm_Stat_db

Hotelsbe_Stat_db, Strumentimusicaliit_Stat_db, Bathroomworlduk_Stat_db, Verivoxde_Stat_db, Mcmkr_Stat_db, Viaggiedreamsit_Stat_db, Brille24de_Stat_db, Yjgakuseikaikan_Stat_db, Stylepitnl_Stat_db, Cvlibraryrecruiter_Stat_db, Preis24de_Stat_db, Tigershedsuk_Stat_db, Duvetandpillowuk_Stat_db, Noths_Stat_db, Wizwidkr_Stat_db, Ticketonlinede_Stat_db, Lifestyleeuropeuk_Stat_db, Shopeccose_Stat_db, Swanhellenicuk_Stat_db, Deguisementdiscountfr_Stat_db, Freshcottonnl_Stat_db, Tikamoonfr_Stat_db, Testfp1_Stat_db, warehouse_stat_db, Hisjeans_Stat_db, Mountfieldlawnmowers_Stat_db, Sitescoutnl_Stat_db, Lancomeus_Stat_db

Brandelijp_Stat_db, Mesdessousfr_Stat_db, Beautyplanningjp_Stat_db, Lgcobrandingpriceminister_Stat_db, Stockngous_Stat_db, Kickzde_Stat_db, Rockymountaindecorus_Stat_db, Cellbesse_Stat_db, Yvesrocheres_Stat_db, Toshibadirectjp_Stat_db, Seneukr_Stat_db, Waterfeaturesuk_Stat_db, Cottagesforyouuk_Stat_db, Camif_Stat_db, Lojaskdbr_Stat_db, Hipmunkhotels_Stat_db, Sorteonline_Stat_db, Ediets_Stat_db, Bonsportru_Stat_db, Jobjsenjp_Stat_db, Redcoonit_Stat_db, Hmuk_Stat_db, Srtestcetelem2_Stat_db, Iamprettykr_Stat_db, Lebunnybleushopkr_Stat_db, Condenastit_Stat_db, Hotusaes_Stat_db, Chilitvit_Stat_db

Hellinefr_Stat_db, Cobrasonfr_Stat_db, madeindesign_stat_db, Megagadgetsnl_Stat_db, Todaofertabr_Stat_db, bulbus_Stat_db, Calcioshopit_Stat_db, Edenlyes_Stat_db, Recruiterucajp_Stat_db, Engelhornde_Stat_db, Spreadshirtno_Stat_db, Dusparstde_Stat_db, Tabletbr_Stat_db, Ventesecretfr_Stat_db, Venteunique_Stat_db, Dellchde_Stat_db, Dressforlessnl_Stat_db, Multipopkr_Stat_db, allheartus_Stat_db, Trovitdejobs_Stat_db, lesjeudisfr_stat_db, Expediaukcrosssell_Stat_db, Furniturebrituk_Stat_db, Yooxbe_Stat_db, Skyscannerno_Stat_db, Bluetomatoat_Stat_db, Mechakaitaijp_Stat_db, Destinationlightingus_Stat_db

and 10K+ more

Page 100: Criteo Infrastructure (Platform) Meetup

SQL Server was Production Infrastructure

• Analyst access to data was an afterthought

• Production databases were not designed for analytics

• Reports and queries were tightly coupled to production

• UX was low and Analysts occasionally broke production systems!

Page 101: Criteo Infrastructure (Platform) Meetup

Hive also made an early appearance…

2013-04-22 11:28:59,942 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:01,010 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:02,071 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:03,134 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:04,876 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:05,112 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,047 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,984 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec

ZZZZ…

Page 102: Criteo Infrastructure (Platform) Meetup

But Hive was also an afterthought

• Raw production data batch loaded with no transformations

• Query tools were nonexistent

• Queries were slow and only expert analysts could run them

• UX and productivity were extremely low

Page 103: Criteo Infrastructure (Platform) Meetup

This just wasn't working!

We needed a new approach

Page 104: Criteo Infrastructure (Platform) Meetup

First things first

We need a database!

Page 105: Criteo Infrastructure (Platform) Meetup

Requirements for an Analytic Database

• It must be extremely fast

• It must be able to store our most actionable data sets
  • Dozens (at the time!) of TBs, now hundreds

• It must be queryable with proper SQL

• It must be deployable on hardware we specify

Page 106: Criteo Infrastructure (Platform) Meetup

Defining a Proof of Concept Evaluation

• Work with Analysts to identify key data sets

• Analyze query patterns

• Define benchmark queries

• Work with vendors to test closed source solutions

• Test OSS in-house

Page 107: Criteo Infrastructure (Platform) Meetup

The results

• Vertica struck the right balance between cost, performance and deployment options

• PoC evaluation took ~3 months

• Initial deployment took another ~3 months

• Operations ramped up over the following ~6 months

Page 108: Criteo Infrastructure (Platform) Meetup

Working with Analysts during deployment

• Analysts in the team helped define and document the data model

• They also created training materials

• Training was done in concert with engineers

Page 109: Criteo Infrastructure (Platform) Meetup

But was it a success?

• Within a year of the rollout we were able to decommission SQL Server for analytics

• Today Vertica has over 100 unique ad-hoc users connected each day

• It executes hundreds of thousands of queries each day

• It is the most important piece of analytics infrastructure at Criteo

Page 110: Criteo Infrastructure (Platform) Meetup

A fresh deployment to mature infrastructure

• Vertica at Criteo has scaled from ~12TB to ~120TB (going PB soon)

• Ad-hoc users have grown from ~40 to ~200

• Reporting users have grown from ~300 to ~1500

• The number of tables has grown from ~50 to >500
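These before-and-after figures can be turned into rough yearly growth factors. The sketch below assumes the growth happened over about three years, the span the deck itself mentions for the table count, and that growth was steady over that period, which is a simplification.

```javascript
// Compound annual growth factor: how much a quantity multiplies each year
// if it grows steadily from `start` to `end` over `years` years.
function annualGrowthFactor(start, end, years) {
  return Math.pow(end / start, 1 / years);
}

const YEARS = 3; // assumption: roughly three years between the two measurements

const growth = {
  storageTB: annualGrowthFactor(12, 120, YEARS),
  adHocUsers: annualGrowthFactor(40, 200, YEARS),
  reportingUsers: annualGrowthFactor(300, 1500, YEARS),
  tables: annualGrowthFactor(50, 500, YEARS),
};

for (const [name, factor] of Object.entries(growth)) {
  console.log(name, factor.toFixed(2));
}
```

Under these assumptions, storage and table count slightly more than double each year (about 2.15×), while both user populations grow by about 1.71× per year.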

Page 111: Criteo Infrastructure (Platform) Meetup

Wait, 500 tables in 3 years?

That's a lot of data modelling!

Page 112: Criteo Infrastructure (Platform) Meetup

Analysts contribute to the data model

• Engineers know how the DB works and know how to optimize a data model, but they don't always know what to put in it

• With good tools Analysts contribute to the evolutions of the data model, including schema additions and modifications

• Engineers in the team can help guide them in the finer details

• Rinse and repeat

Page 113: Criteo Infrastructure (Platform) Meetup

Side bar: We also had dashboards with SSRS

But we were told it was ugly and complicated.

We traded ugly for slow, btw, and it's still complicated

Page 114: Criteo Infrastructure (Platform) Meetup

From SSRS to Tableau and SQL Server to Vertica

• Actually, "slow" is just our current perception—we had SSRS dashboards with timeouts on the order of hours.

• SSRS served as our de facto ETL between those 10K+ SQL Server DBs

• Those SQL Server DBs were also production databases.

Page 115: Criteo Infrastructure (Platform) Meetup

So to Summarize the Past

• Analysts had to query across thousands of DBs

• Dashboards were slow and complicated

• Analytics work was strongly coupled to production

Life was great back then, wasn't it?

Page 116: Criteo Infrastructure (Platform) Meetup

We're done then?

Not quite. Things can go awry!

Page 117: Criteo Infrastructure (Platform) Meetup

The Ghost of Christmas Future

...here's hoping it's a near future...

Page 118: Criteo Infrastructure (Platform) Meetup

Criteo is World Wide

We have hundreds of analysts spread across dozens of countries!

Page 119: Criteo Infrastructure (Platform) Meetup

Criteo has a Rich Product Offering

• Banner Ads, Mobile, In-App, Email, Search

• Tens of thousands of Advertisers and Publishers

• Some of them very big and very demanding

Page 120: Criteo Infrastructure (Platform) Meetup

And (reminder!) our Scale Never Seems to Stop Growing

[Chart: Growth of a Single Dataset Since July 2014. x-axis: dates from 7/13/14 to 9/18/16; y-axis: 0 to 140,000,000,000]

Page 121: Criteo Infrastructure (Platform) Meetup

(reminder #2) Number of analysts hired since 2010

[Chart: x-axis: months from Sep 2010 to Mar 2016; y-axis: 0 to 180 analysts]

Page 122: Criteo Infrastructure (Platform) Meetup

What could go wrong?

Page 123: Criteo Infrastructure (Platform) Meetup

New Challenges

• With so many hungry analysts to feed, and with so much volume and variety of data, Vertica's query planner is working overtime

• We need to instrument and monitor more

• We need to level-up analysts' SQL skills

• And yes, finally, we do need some data governance*

*oh how I've resisted this day!

Page 124: Criteo Infrastructure (Platform) Meetup

2 Analysts and 3 Engineers ain't gonna cut it

• We have scaled up our PM team

• We are moving from a proto-CoE team to an official CoE team

• We are scaling engineering operations

Page 125: Criteo Infrastructure (Platform) Meetup

What's on the TODO list?

• Documentation, and automating it as much as possible

• Non-invasive, but very intimate query monitoring

• Workload isolation

• Query suggestions and preëmptive query blocking

Page 126: Criteo Infrastructure (Platform) Meetup

More about query inspection

• No matter how wonderful a database may be, its performance comes down to how much IO it has and how much contention there is for it

• For the IO subsystem, the difference between a poorly optimized query and a well optimized one can be orders of magnitude

• Better queries mean more concurrent, happier users

Page 127: Criteo Infrastructure (Platform) Meetup

More about query inspection

• Vertica offers lots of ways to find out what is going on behind the scenes, but one of the best ways is to EXPLAIN your users' queries and identify those who need to be trained!
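The idea of explaining users' queries to spot who needs training can be sketched as follows. This is a minimal illustration, not the speakers' tooling: it uses Python's built-in `sqlite3` and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Vertica's `EXPLAIN` (whose output format is different), and the table, index, user names and the "full scan means trouble" rule are all assumptions made for the example.

```python
import sqlite3


def full_scans(conn, sql):
    """Return the plan steps that read a whole table or index."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is a readable detail string,
    # e.g. "SCAN facts" or "SEARCH facts USING INDEX ...".
    return [row[-1] for row in plan if row[-1].startswith("SCAN")]


# Hypothetical schema, loosely modelled on the facts table used later in the deck.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (time_id INTEGER, country_code TEXT, clicks INTEGER)")
conn.execute("CREATE INDEX facts_by_time ON facts (time_id)")

# Hypothetical users and the queries they submitted.
queries = {
    "alice": "SELECT SUM(clicks) FROM facts",
    "bob": "SELECT SUM(clicks) FROM facts WHERE time_id BETWEEN 10 AND 20",
}

# Users whose query plan contains at least one full scan.
needs_training = [user for user, sql in queries.items() if full_scans(conn, sql)]
print(needs_training)
```

In a real deployment the plan would come from the warehouse itself, and the rule for what counts as a bad plan would need to reflect its storage layout rather than a single index.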

Page 128: Criteo Infrastructure (Platform) Meetup

Recalling our Current Challenges

• Tableau Workbooks are Slow

• Vertica is Overloaded

• Reporting Data is Frequently Late

Page 129: Criteo Infrastructure (Platform) Meetup

Patches and the Arc of History

• Each of our current challenges can be addressed in the short term

• But we need long term solutions to avoid regressions

Page 130: Criteo Infrastructure (Platform) Meetup

Tableau Relief Program (TaRP)

Short Term:
• Double the cores on the production server
• Isolate critical workbooks

Medium Term:
• Require all production workbooks to go through gerrit/git review
• Score workbook complexity pre-release
• Monitor released workbooks for QoS

Not So Long Term:
• Work with Product and Central Ops to create a Tableau Center of Excellence and level up BI

Page 131: Criteo Infrastructure (Platform) Meetup

TaRP: reporting alchemy

[Diagram, before: Productive Analyst, push to production, no-SLA dataset, Angry Sales Person]

[Diagram, after: Knowledgeable Analyst, push to review (compliance checks passed, peer-reviewed), automated deploy, Productive Analyst, SLA dataset, Happy Sales Person]

Page 132: Criteo Infrastructure (Platform) Meetup

Why impose a dev cycle on report building?

not to be trite, but, well:

that's good money!

Page 133: Criteo Infrastructure (Platform) Meetup

More seriously

• Tableau workbooks consume data

• Data comes in all sorts of volumes and velocities (sorry)

• Data query complexity is linked to workbook complexity and features

• If you don't know what you're doing, your workbooks will be:
  • slow, because of internal workbook complexity
  • slow, because of complex database queries
  • out of date, if they don't query the proper data sources

Tableau workbook developers are developers, full stop. Treat them like they are.

Page 134: Criteo Infrastructure (Platform) Meetup

Vertica Roadmap

[Architecture diagram; component labels: Consul, RTIngester, HDFSIngester, HLL, JDBC, VProxy, Admin, VIcO, JVMIngester, DataDisco]

Page 135: Criteo Infrastructure (Platform) Meetup

Vertica as a Service

Short Term:
• Scale out as fast as reasonable
• Split reporting and ad hoc workloads
• Better hardware configuration
• More monitoring

Not So Long Term:
• Better monitoring
• Control Input: Trickle and Bulk Loading, Consistently, Durably and Efficiently
• Control Output: Query inspection/prioritization, Workload management

Page 136: Criteo Infrastructure (Platform) Meetup

Fixing Your Latent Data Problem

Short Term:
• Migrate critical data workflows to Langoustine
• Optimize DAG and long running queries

Medium Term:
• Migrate long-tail datasets to Langoustine
• Better metrics, capacity planning

Not So Long Term:
• Refactor data model to cull useless data sets
• Better complexity analysis of workflow modifications pre-release

Page 137: Criteo Infrastructure (Platform) Meetup

We're going to need better instrumentation

• Better Workflow Insights in Langoustine

• Better Hadoop Job Performance Metrics

Page 138: Criteo Infrastructure (Platform) Meetup

Let's spend less time making data workflows

Langoustine IDE makes building Hive workflows trivial

Page 139: Criteo Infrastructure (Platform) Meetup

Langoustine IDE promotes best practices

Workflows are source controlled:

Reviews are built-in:

Page 140: Criteo Infrastructure (Platform) Meetup

We'll need better dev tools (e.g. dev-cluster)

Build an AWS Hadoop cluster:

Connect to it via a local Docker container:

And load it with data saved in S3:

Page 141: Criteo Infrastructure (Platform) Meetup

SLAB: SLA Boards That Say A Lot

Page 142: Criteo Infrastructure (Platform) Meetup

Wait, what about Opera and Vizatra?

Didn't you guys do a lot of work on that?

Page 143: Criteo Infrastructure (Platform) Meetup

A Quick Opera Recap

Opera is the internal replacement for CPOP, built in two parts:

• A scalding-langoustine data pipeline

• A vizatra-OLAP web app

Page 144: Criteo Infrastructure (Platform) Meetup

We learned a lot from building Opera

• How to use SQL to describe a dashboard

• How to master SQL queries executed from an OLAP app

• How to build big, fast databases

• How to build optimal (or so we think) data processing pipelines

• How to make a decent UI with decent UX

Page 145: Criteo Infrastructure (Platform) Meetup

Let's focus on the SQL stuff

Page 146: Criteo Infrastructure (Platform) Meetup

Using SQL for dashboard meta-data

SELECT
  time_id as hour,
  country_code as country,
  network_id as network,
  SUM(clicks) as clicks,
  SUM(displays) as displays,
  SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id

Time dimensions

Dimensions

Metrics

Parameters
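The four labels on this slide (time dimensions, dimensions, metrics, parameters) can be read straight off the query text. The sketch below shows one way to do that; it is an illustration only, not Opera's or Vizatra's implementation. It assumes every select item has an alias, that aggregated expressions are metrics, that `time_id` is the only time column, and that parameters are written `?name` as on the slide. The function and variable names are hypothetical.

```python
import re

QUERY = """
SELECT time_id as hour, country_code as country, network_id as network,
       SUM(clicks) as clicks, SUM(displays) as displays,
       SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id
"""

# Assumption: any select item using one of these aggregates is a metric.
AGGREGATES = re.compile(r"\b(SUM|COUNT|AVG|MIN|MAX)\s*\(", re.IGNORECASE)


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def dashboard_metadata(sql, time_columns=("time_id",)):
    """Classify the select list into dashboard roles and collect parameters."""
    select_list = re.search(
        r"SELECT\s+(.*?)\s+FROM\s", sql, re.IGNORECASE | re.DOTALL
    ).group(1)
    meta = {"time_dimensions": [], "dimensions": [], "metrics": [], "parameters": []}
    for item in split_top_level(select_list):
        # Every item is assumed to look like "<expression> as <alias>".
        expr, alias = re.split(r"\s+AS\s+", item, flags=re.IGNORECASE)
        if AGGREGATES.search(expr):
            meta["metrics"].append(alias)
        elif expr in time_columns:
            meta["time_dimensions"].append(alias)
        else:
            meta["dimensions"].append(alias)
    meta["parameters"] = re.findall(r"\?(\w+)", sql)
    return meta


meta = dashboard_metadata(QUERY)
print(meta)
```

A regex-based reader like this only copes with simple queries; a production tool would use a real SQL parser.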

Page 147: Criteo Infrastructure (Platform) Meetup

Using SQL for dashboard meta-data

Time dimension

Dimensions

Metrics

Parameters

Page 148: Criteo Infrastructure (Platform) Meetup

Big-O(lap)

SELECT
  time_id as hour,
  country_code as country,
  network_id as network,
  SUM(clicks) as clicks,
  SUM(displays) as displays,
  SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id

PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR

Page 150: Criteo Infrastructure (Platform) Meetup

Big-O(lap)

SELECT
  country_code as country,
  SUM(clicks) as clicks,
  SUM(displays) as displays
FROM facts
WHERE time_id BETWEEN '2016-03-01' AND '2016-03-07'
GROUP BY country_code

PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR
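The step from the full dashboard query to this reduced one can be sketched as a small query builder: the user's projection picks which dimensions and metrics to keep, and the selection supplies the time range. This is a hedged illustration under assumptions, not Vizatra's code. The `DIMENSIONS` and `METRICS` maps, the `build_query` function and the `:start`/`:end` placeholder style are all hypothetical, and the dates are passed as bound parameters rather than inlined as on the slide.

```python
# Hypothetical dashboard metadata: alias -> SQL expression.
DIMENSIONS = {"hour": "time_id", "country": "country_code", "network": "network_id"}
METRICS = {
    "clicks": "SUM(clicks)",
    "displays": "SUM(displays)",
    "ctr": "SUM(clicks) / SUM(displays)",
}


def build_query(dimensions, metrics, table="facts"):
    """Generate SQL that reads only the requested dimensions and metrics."""
    columns = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    columns += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = (
        f"SELECT {', '.join(columns)} FROM {table} "
        "WHERE time_id BETWEEN :start AND :end"
    )
    # Group only by the dimensions that were actually projected.
    group_by = ", ".join(DIMENSIONS[d] for d in dimensions)
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Projection: by country. Selection: the seven days shown on the slide.
sql = build_query(["country"], ["clicks", "displays"])
params = {"start": "2016-03-01", "end": "2016-03-07"}
print(sql, params)
```

Dropping the unused dimensions from both the select list and the GROUP BY is what lets the database aggregate far fewer groups than the full dashboard query would.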

Page 151: Criteo Infrastructure (Platform) Meetup

Now that we've gotten intimate with SQL...

Let's see what else we can build...

Page 152: Criteo Infrastructure (Platform) Meetup

Vizatra Client: One DB Client to Rule Them All

Page 153: Criteo Infrastructure (Platform) Meetup

Vizatra Client: One DB Client to Rule Them All

• Parse every query and analyze complexity before executing it

• Enforce best practices (e.g. predicates on partitions)

• Degrade gracefully (e.g. don't submit queries to an overloaded DB)

• Score users and queries, share with other users

• Provide basic visualizations to increase analytic productivity

• Support non-SQL datasources

• And your feature?
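Two of the behaviours listed above, enforcing a predicate on the partition column and refusing to submit to an overloaded database, can be sketched as a gate in front of query execution. This is a minimal sketch under loud assumptions: the partition column is taken to be `time_id`, load is a number between 0 and 1 supplied by the caller, and a regular expression stands in for the real query parsing the slide describes. `QueryRejected`, `has_partition_predicate` and `submit` are hypothetical names.

```python
import re


class QueryRejected(Exception):
    """Raised when a query is blocked before reaching the database."""


def has_partition_predicate(sql, partition_column="time_id"):
    """Crude check: does the WHERE clause constrain the partition column?"""
    where = re.search(
        r"\bWHERE\b(.*?)(\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|$)",
        sql,
        re.IGNORECASE | re.DOTALL,
    )
    if where is None:
        return False
    predicate = rf"\b{re.escape(partition_column)}\b\s*(=|<|>|BETWEEN\b|IN\b)"
    return re.search(predicate, where.group(1), re.IGNORECASE) is not None


def submit(sql, current_load, run, max_load=0.8):
    """Run the query only if it follows best practice and the DB has headroom."""
    if not has_partition_predicate(sql):
        raise QueryRejected("add a predicate on the partition column")
    if current_load > max_load:
        raise QueryRejected("database is overloaded, retry later")
    return run(sql)


good = "SELECT country_code, SUM(clicks) FROM facts WHERE time_id BETWEEN 1 AND 7 GROUP BY country_code"
bad = "SELECT country_code, SUM(clicks) FROM facts GROUP BY country_code"

print(has_partition_predicate(good), has_partition_predicate(bad))
```

Here `run` is whatever callable actually executes the SQL; keeping it as a parameter means the gate can sit in front of any database driver.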

Page 154: Criteo Infrastructure (Platform) Meetup

The End.

Thanks for listening. If any of this sounds fun, we're hiring!