Expect the unexpected
Or how we adapted to get ourselves out of harm’s way (and lived)
Yaniv Ranen
AOL/Convertro
Named leader in report.
Founded in 2009
Acquired by AOL in 2014
Using a Big Data stack since 2009
75 people, 30 of them in R&D
5TB of daily data
400M daily data points
>100 data sources
Convertro
Programmer @ IDF’s Computer unit (Mamram)
Been in the BI field since 2000
CTO @ Ness Gilon
BI manager @ Kenshoo
Big data engineer @ Convertro (Spend, Insights, Spend recommendations, DS team)
Collecting the user touch points
Cleansing and enhancing the data
Running an attribution model to score each event’s contribution to the overall path to conversion
Allowing access to our data on the most granular level (Dashboard/data feeds)
Recommend media spend allocation
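The talk does not say which attribution model Convertro runs. As a stand-in, this sketch uses simple linear (equal-credit) attribution: every touch point on a converting path gets the same share of the conversion, and a channel that appears twice on the path gets two shares. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Linear (equal-credit) attribution. A generic stand-in, NOT Convertro's model.
 * Each touch point on a converting path receives an equal share of the
 * conversion value; channels seen more than once accumulate credit.
 */
public class LinearAttribution {

    /** Returns credit per channel for one converting path. */
    static Map<String, Double> attribute(List<String> path, double conversionValue) {
        Map<String, Double> credit = new LinkedHashMap<>();
        if (path.isEmpty()) {
            return credit; // no touch points, nothing to credit
        }
        double share = conversionValue / path.size();
        for (String channel : path) {
            credit.merge(channel, share, Double::sum);
        }
        return credit;
    }

    public static void main(String[] args) {
        List<String> path = List.of("display", "search", "email", "search");
        System.out.println(attribute(path, 100.0));
    }
}
```

Real attribution models weight touch points unequally (by position, recency, or a fitted model); the equal split here only shows where an event score plugs into the path.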
How do we do that?
Don’t ignore pain
Don’t fix something that isn’t painful enough
Don’t fix something just because it’s cool
Fix it right: scalable and monitored
Address scalability debt early
Pain-driven development
AZ#1
AZ#2
Collecting Processing Presenting
SFTP / SCP
Org Services: 4 static servers
Event deduping
Main Technologies
Pain
High demand for 1-800-Flowers drove a large number of requests per hour
Our processing cycles were approaching 24 hours
Data was lost
The Super Bowl was right around the corner
“Mother’s Day”
AZ#1
AZ#2
Collecting Processing Presenting
SFTP / SCP
Org Services: Amazon Auto Scaling, 12 to 100 servers
Stateless architecture
Simple algorithm
“Bees with Machine Guns” used for warm-up
Event deduping in MapReduce (MR)
Main Technologies
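One reason stateless collectors pair with batch deduping: a retried or double-fired event can land on two different servers, so no single server can spot the duplicate, and the check has to move to the processing stage. This sketch mimics the map/shuffle/reduce shape in plain Java rather than using the Hadoop API; the event fields and the (clientId, eventId) dedup key are assumptions. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class EventDedup {

    // Hypothetical event shape: the real Convertro schema is not described in the talk.
    record Event(String clientId, String eventId, long timestamp, String payload) {}

    /** "Map" phase: derive the dedup key for an event. */
    static String dedupKey(Event e) {
        return e.clientId() + "|" + e.eventId();
    }

    /** "Reduce" phase: of all events sharing a key, keep the earliest one. */
    static Event reduce(List<Event> group) {
        return group.stream()
                .min(Comparator.comparingLong(Event::timestamp))
                .orElseThrow();
    }

    /** Runs the map/shuffle/reduce shape in memory. */
    static List<Event> dedup(List<Event> events) {
        // Shuffle: group events by key (TreeMap gives a deterministic key order).
        Map<String, List<Event>> groups = events.stream()
                .collect(Collectors.groupingBy(EventDedup::dedupKey, TreeMap::new, Collectors.toList()));
        List<Event> out = new ArrayList<>();
        for (List<Event> group : groups.values()) {
            out.add(reduce(group));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> raw = List.of(
                new Event("c1", "e1", 1000L, "view"),
                new Event("c1", "e1", 1005L, "view"), // same event received twice
                new Event("c1", "e2", 1010L, "click"));
        dedup(raw).forEach(System.out::println);
    }
}
```

Because the collectors keep no shared state, any of them can accept any request, which is what lets an autoscaling group add and remove servers freely.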
Pain
The onboarding process can run into problems
The sooner problems are dealt with, the sooner we begin gathering data for our client
Fast indication of tagging problems
Implementation feedback
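A minimal sketch of what a fast tagging check could look like, assuming the expected tag list for a new client is known up front and per-tag event counts are available shortly after the tags go live. The class name, the threshold, and the report format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical onboarding check: compare the tags a new client is expected
 * to fire against the event counts actually collected so far.
 */
public class TaggingCheck {

    static List<String> findProblems(Set<String> expectedTags,
                                     Map<String, Long> receivedCounts,
                                     long minEvents) {
        List<String> problems = new ArrayList<>();
        // Sorted iteration keeps the report stable between runs.
        for (String tag : new TreeSet<>(expectedTags)) {
            long count = receivedCounts.getOrDefault(tag, 0L);
            if (count == 0) {
                problems.add(tag + ": no events received");
            } else if (count < minEvents) {
                problems.add(tag + ": only " + count + " events (expected at least " + minEvents + ")");
            }
        }
        // Events for tags nobody expected often point at a mis-tagged page.
        for (String tag : new TreeSet<>(receivedCounts.keySet())) {
            if (!expectedTags.contains(tag)) {
                problems.add(tag + ": unexpected tag");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("homepage", "product", "checkout");
        Map<String, Long> received = Map.of("homepage", 5200L, "product", 12L, "cart", 40L);
        findProblems(expected, received, 100).forEach(System.out::println);
    }
}
```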
Pain
Clients have different taxonomies
Lots of development adjustments
Parsing data without code changes
Development scaling
Customization needs
Parsing – CSL
Convertro Source Language
The implementation team writes a script in pseudo-English
The script gets pushed to our repository
On every build, Maven runs a parser that generates a Java class from the if statements in that file
A parameter in our settings points the client at this generated class
Parsing changes are made on the fly by the implementation team, which means fast onboarding
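The CSL grammar itself is not shown in the talk. This sketch assumes a hypothetical one-rule-per-line syntax (if <field> contains "<text>" then set <target> to "<value>") and emits Java source for a per-client parser class, the kind of step a Maven code-generation phase could run on each build. The grammar, class names, and generated method shape are all assumptions.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical CSL-like translator: turns pseudo-English rules into the
 * source of a Java class with one if statement per rule.
 */
public class CslTranslator {

    // Quoted values may not contain quotes or backslashes, so they are
    // safe to paste directly into Java string literals.
    private static final Pattern RULE = Pattern.compile(
            "if (\\w+) contains \"([^\"\\\\]*)\" then set (\\w+) to \"([^\"\\\\]*)\"");

    static String translate(String className, List<String> scriptLines) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static void parse(java.util.Map<String, String> event) {\n");
        int lineNo = 0;
        for (String raw : scriptLines) {
            lineNo++;
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines are allowed between rules
            }
            Matcher m = RULE.matcher(line);
            if (!m.matches()) {
                // Failing here fails the build instead of mis-parsing client data.
                throw new IllegalArgumentException("CSL line " + lineNo + " not understood: " + line);
            }
            src.append(String.format(
                    "        if (event.getOrDefault(\"%s\", \"\").contains(\"%s\")) { event.put(\"%s\", \"%s\"); }\n",
                    m.group(1), m.group(2), m.group(3), m.group(4)));
        }
        src.append("    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        List<String> script = List.of(
                "if url contains \"/checkout\" then set page_type to \"conversion\"",
                "if referrer contains \"google\" then set channel to \"search\"");
        System.out.print(translate("Client42Parser", script));
    }
}
```

Generating plain Java at build time, rather than interpreting the script at runtime, keeps the per-event cost identical to hand-written parsing code.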
Spend
Over 100 possible integrations
Each integration is a different snowflake: different login method, different data stored
Complex matching techniques with existing data
Pain
Lots of spend integrations
Needs to be customizable per client
Data matching problems
Development time spent on integration tweaks
Spend
Spend
Each integration has a scraper or API client that extracts a CSV file of daily spend and saves it to S3
That is the only integration-specific code, and it is relatively small
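A sketch of the split described above, assuming a common interface: each integration implements only the extraction, while CSV formatting and storage are shared. The interface, row fields, and file naming are hypothetical, and the S3 upload is replaced by a local file write to keep the example self-contained (a real version would call the AWS SDK). Requires Java 16+.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.List;

public class SpendExport {

    // Hypothetical normalized row; real integrations expose different fields.
    record SpendRow(LocalDate day, String campaign, double cost) {}

    /** The only per-integration code: log in and pull one day of spend. */
    interface SpendSource {
        String name();
        List<SpendRow> fetchDailySpend(LocalDate day) throws IOException;
    }

    /** Shared code: turn rows into CSV text, quoting campaign names (RFC 4180 style). */
    static String toCsv(List<SpendRow> rows) {
        StringBuilder sb = new StringBuilder("day,campaign,cost\n");
        for (SpendRow r : rows) {
            String campaign = "\"" + r.campaign().replace("\"", "\"\"") + "\"";
            sb.append(r.day()).append(',').append(campaign).append(',').append(r.cost()).append('\n');
        }
        return sb.toString();
    }

    /** Shared code: one file per source per day. Stands in for the S3 upload. */
    static Path export(SpendSource source, LocalDate day, Path outDir) throws IOException {
        Path file = outDir.resolve(source.name() + "_" + day + ".csv");
        Files.writeString(file, toCsv(source.fetchDailySpend(day)));
        return file;
    }

    public static void main(String[] args) throws IOException {
        SpendSource fake = new SpendSource() {
            public String name() { return "example_network"; }
            public List<SpendRow> fetchDailySpend(LocalDate day) {
                return List.of(new SpendRow(day, "Spring \"Sale\"", 120.5));
            }
        };
        Path dir = Files.createTempDirectory("spend");
        Path written = export(fake, LocalDate.of(2015, 5, 10), dir);
        System.out.print(Files.readString(written));
    }
}
```

Adding an integration then means writing one small SpendSource implementation; the matching against existing data can run on the uniform CSV files regardless of where they came from.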
Pain
Different SLAs for reporting and the dashboard
MPP is a brute-force solution
Dashboard: write once, read many (daily)
Reports: write many, read many (ongoing)
Reporting / Dashboard
Dashboard data is copied daily to a separate Vertica environment that serves the dashboard
Reports run on the faster-changing environment
Reporting / Dashboard
Operational Dash
ETL
Pain
Clients want to map IDs to descriptions
Pre-calculated tables must be rebuilt on every change
First-day inconsistency
Mapping
Summary
– Pain-driven development
– Begin with the naïve approach and study the pain areas
– Saving developer time is a major priority for us
– Our business drives us to automate and reduce costs