RealTime AdTech Reporting & Targeting with Apache Apex

Ashish Tadose, Senior Data Architect @ PubMatic, Apache Apex committer

Every ad. Every sales channel. Every screen. One platform.



Page 2: RealTime AdTech reporting & targeting with Apache Apex

Agenda

• About PubMatic
• Reporting & Targeting use cases
• Apache Apex overview
• PubMatic's streaming use cases with Apache Apex
• Comparing Apache Apex
• Roadmap

Proprietary and Confidential

Page 3: RealTime AdTech reporting & targeting with Apache Apex

About PubMatic

Page 4: RealTime AdTech reporting & targeting with Apache Apex

About PubMatic

• PubMatic is a leading marketing automation software company for publishers.
• Through real-time analytics, yield management, and workflow automation, PubMatic enables publishers to make smarter inventory decisions and improve revenue performance.
• Drives innovation in AdTech.

Page 5: RealTime AdTech reporting & targeting with Apache Apex

Scale That Drives Results

• 40B+ ad impressions served daily
• 350B+ bids processed daily
• 50TB data processed daily
• 10PB data under management
• 6 data centers across geographies

Page 6: RealTime AdTech reporting & targeting with Apache Apex

Use cases

Reporting & Analytics Platform
• General Analytics Dashboard
• Guaranteed Ad Server
  - Pacing engine
• Ad Serving
  - Impression capping report
  - Floor recommendation
• Inventory Discovery
• Machine Learning
• Ad-hoc Reports
• Audience Reports
• Brand Control Reporting

Page 7: RealTime AdTech reporting & targeting with Apache Apex

Lambda Architecture: Velocity & Volume

• Data
• Data Sink
• Batch, e.g. Hadoop: batch write, random read
• Real time, e.g. Storm: random read & write
• Query & Merge
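The "Query & Merge" step of a Lambda architecture can be illustrated with a minimal sketch. It assumes both layers expose per-key totals as plain dictionaries; the publisher names and counts are hypothetical and not taken from the slides.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine per-key totals from the batch layer and the real-time layer."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return dict(merged)


# Hypothetical impression counts per publisher.
batch_view = {"pub_a": 1000, "pub_b": 250}   # recomputed periodically by the batch layer
realtime_view = {"pub_a": 12, "pub_c": 3}    # events seen since the last batch run

merged = merge_views(batch_view, realtime_view)
print(merged)
```

The batch view stays authoritative for older data, while the real-time view only has to cover the window since the last batch run.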

Page 8: RealTime AdTech reporting & targeting with Apache Apex

Use cases for a real-time solution

• Real-time reporting
  - Reporting of critical metrics around campaign monetization
  - Revenue, impression & click info
  - Aggregate counters & reporting on top N metrics
  - Low-latency querying using Kafka in pub-sub model
• Real-time monitoring
  - Alerts on deal tracking & monetization
  - Campaign & deal health
• Real-time learning
  - Using the lost bid insights for price recommendations
• Allocation engine
  - Feedback to ad serving for guaranteed delivery & line item pacing
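Aggregate counters with top N reporting, as listed under real-time reporting, could look roughly like the sketch below. The class name, campaign ids and revenue values are hypothetical; a production system would keep these counters inside the streaming engine rather than in a single Python process.

```python
import heapq
from collections import Counter


class TopNCounter:
    """Running per-key totals with a top-N query."""

    def __init__(self):
        self.totals = Counter()

    def add(self, key, amount):
        self.totals[key] += amount

    def top(self, n):
        # nlargest avoids sorting the full counter when n is small.
        return heapq.nlargest(n, self.totals.items(), key=lambda kv: kv[1])


# Hypothetical (campaign, revenue in cents) events.
events = [("campaign_1", 5), ("campaign_2", 8), ("campaign_1", 4), ("campaign_3", 1)]

counter = TopNCounter()
for campaign, revenue_cents in events:
    counter.add(campaign, revenue_cents)

top2 = counter.top(2)
print(top2)
```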

Diagram labels: four AdServers, Kafka Cluster, RealTime reporting data processing, Processing for AdServer feedback
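The feedback sent to ad serving for line item pacing is not specified on the slide. As one possible illustration, the sketch below assumes simple even pacing: delivery so far is compared with the share of the goal that should have been served by now, and the ratio becomes a throttle factor. The function name, the `cap` parameter and the numbers are all hypothetical.

```python
def pacing_multiplier(goal, delivered, elapsed_fraction, cap=2.0):
    """Return a throttle factor: above 1 speeds delivery up, below 1 slows it down.

    Assumes even pacing, i.e. delivery should track elapsed flight time linearly.
    """
    if not 0 < elapsed_fraction <= 1:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    if delivered >= goal:
        return 0.0          # goal met, stop serving this line item
    if delivered == 0:
        return cap          # nothing served yet, push as hard as allowed
    expected = goal * elapsed_fraction
    return min(cap, expected / delivered)


# Halfway through the flight, a 1000-impression line item has served 625.
factor = pacing_multiplier(goal=1000, delivered=625, elapsed_fraction=0.5)
print(factor)
```

A factor below 1 tells the ad server the line item is ahead of schedule; the cap keeps a badly behind line item from flooding delivery.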

Page 9: RealTime AdTech reporting & targeting with Apache Apex

Apache Apex Overview


Page 18: RealTime AdTech reporting & targeting with Apache Apex

PubMatic's streaming use case with Apache Apex


Page 20: RealTime AdTech reporting & targeting with Apache Apex

Apache Apex @ PubMatic

Real-time architecture (diagram labels):
• Sources: User Browser, AdServer, REST proxy, CDN (caching of logs), Middleware
• Logs: auction logs and client logs
• Deployment: on prem and AWS, each with a Kafka Cluster
• Apex pipeline, one for auction logs and one for client logs: Kafka Input -> ETL operator -> Filter Operator -> Dimensions Aggregator -> Dimensions Store
• Streams between operators: Kafka messages, decompress & flatten, filtered events, aggregates
• Query path: query from Middleware (MW) to the Dimensions Store, query results returned
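The operator chain in the diagram (Kafka input, ETL with decompress and flatten, filter, dimensional aggregation, query) can be mimicked in plain Python to show what each stage does. This is only a sketch under assumptions: it does not use the Apex API, and the message layout, field names and the `valid` flag are hypothetical.

```python
import json
import zlib
from collections import defaultdict


def etl(raw_message):
    """Decompress one message and flatten its nested impressions into flat events."""
    record = json.loads(zlib.decompress(raw_message))
    for imp in record["impressions"]:
        yield {"publisher": record["publisher"], **imp}


def keep(event):
    """Filter stage: drop events flagged as invalid."""
    return event["valid"]


def aggregate(events, dimensions):
    """Dimensions aggregator: roll events up by the chosen dimension combination."""
    store = defaultdict(lambda: {"impressions": 0, "revenue_micros": 0})
    for event in events:
        key = tuple(event[d] for d in dimensions)
        store[key]["impressions"] += 1
        store[key]["revenue_micros"] += event["revenue_micros"]
    return dict(store)


def make_message(publisher, impressions):
    """Stand-in for a compressed log message read from Kafka."""
    payload = {"publisher": publisher, "impressions": impressions}
    return zlib.compress(json.dumps(payload).encode("utf-8"))


messages = [
    make_message("pub_a", [
        {"ad_size": "300x250", "revenue_micros": 1200, "valid": True},
        {"ad_size": "728x90", "revenue_micros": 800, "valid": False},
    ]),
    make_message("pub_a", [
        {"ad_size": "300x250", "revenue_micros": 300, "valid": True},
    ]),
]

flattened = (event for msg in messages for event in etl(msg))
filtered = (event for event in flattened if keep(event))
store = aggregate(filtered, dimensions=("publisher", "ad_size"))

# "Query" the store for one dimension combination.
result = store[("pub_a", "300x250")]
print(result)
```

In the real deployment each stage would be a separate operator that can be partitioned independently; here the generators simply chain the stages in one process.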

Page 21: RealTime AdTech reporting & targeting with Apache Apex

RealTime Dashboard @ PubMatic


Page 23: RealTime AdTech reporting & targeting with Apache Apex

Comparing with other streaming platforms


Page 26: RealTime AdTech reporting & targeting with Apache Apex

Apex vs Spark Streaming vs Flink

Release
• Apache Apex: recently graduated from Apache incubation
• Spark Streaming: Spark Streaming Feb 2014; Spark major 1.0 release since July 2014
• Apache Flink: graduated in 2015; major 1.0 release in March 2016

Commercial support
• Apache Apex: DataTorrent
• Spark Streaming: Databricks, Hortonworks, Cloudera, MapR
• Apache Flink: data Artisans

Companies using
• Apache Apex (http://apex.apache.org/powered-by-apex.html): GE, Capital One, Silver Spring Networks, PubMatic, ThreatMetrix, Facilities Supplies, Royal Bank of Canada, Infosys, Tech Mahindra, Mammoth Data, CloudWick, Synerzip, Trace3, LeadFerret, Target
• Spark Streaming (https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark): a large set of companies use Spark, but only a few of them use Spark Streaming, as listed here: AsiaInfo, Big Industries, Baidu, Faimdata, Kelkoo, Localytics, OpenTable
• Apache Flink (https://flink.apache.org/poweredby.html): Alibaba.com, Bouygues, Capital One, Ericsson, King, Otto Group, ResearchGate, Zalando

Page 27: RealTime AdTech reporting & targeting with Apache Apex

Apex vs Spark Streaming vs Flink

Streaming model
• Apache Apex: native, event-based stream processing
• Spark Streaming: micro-batching
• Apache Flink: native, event-driven stream processing

API
• Apache Apex: declarative APIs; lower-level compositional API
• Spark Streaming: declarative, higher-order functions; system optimizes the topology itself
• Apache Flink: declarative, higher-order functions; system optimizes the topology itself

Latency
• Apache Apex: very low
• Spark Streaming: high
• Apache Flink: very low

Throughput
• Apache Apex: high
• Spark Streaming: high
• Apache Flink: high

Queryable in-memory aggregate store
• Apache Apex: in-memory Dimension Store operator
• Spark Streaming: no native support; can be achieved through DataFrames, not efficient
• Apache Flink: framework-managed distributed in-memory store

Aggregate store snapshotting
• Apache Apex: HDHT, good
• Spark Streaming: no native support; can be achieved by saving DataFrames in Parquet format, not efficient
• Apache Flink: State Backend holds in-flight data in the TaskManager's memory

Apache community
• Apache Apex: OK
• Spark Streaming: good
• Apache Flink: good
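The latency gap between native event processing and micro-batching in the comparison above can be made concrete with a toy model. It assumes a fixed per-event processing cost and that a micro-batch is only processed once its interval closes; the arrival times, the 500 ms interval and the 5 ms cost are hypothetical, and real engines add scheduling and I/O overheads that this ignores.

```python
def event_at_a_time_latency(arrivals_ms, processing_ms):
    """Native streaming: each event is processed as soon as it arrives."""
    return [processing_ms for _ in arrivals_ms]


def micro_batch_latency(arrivals_ms, batch_interval_ms, processing_ms):
    """Micro-batching: an event waits until the end of its batch interval."""
    latencies = []
    for t in arrivals_ms:
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t + processing_ms)
    return latencies


arrivals_ms = [0, 100, 450, 900]  # hypothetical event arrival times

native = event_at_a_time_latency(arrivals_ms, processing_ms=5)
batched = micro_batch_latency(arrivals_ms, batch_interval_ms=500, processing_ms=5)

print(native)
print(batched)
```

In this model the added latency of micro-batching is bounded by the batch interval, which is why shrinking the interval is the usual lever for lowering it.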

Page 28: RealTime AdTech reporting & targeting with Apache Apex

Resources

• PubMatic - https://pubmatic.com/
• PubMatic blog - http://pubmaticblog.com/
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  • Apex - http://apex.apache.org/community.html
  • DataTorrent - https://groups.google.com/forum/#!forum/dt-users
