How to Feed a Data Hungry Organization – by Traveloka Data Team

Traveloka Data

Meetup v1.0.0

How to Feed a Data Hungry Organization

Part One

Traveloka Data Culture

Five Characteristics of a Data Hungry Organization

Data-Driven Decisions

Learn from Mistakes

Better Understanding

Uncertainty and Variation

High Quality Data

Data Hungry Organization

Our responsibility is to turn data into consumable insights

BETTER BUSINESS DECISION

We need the brightest people to fill our needs and create the future

Mathematics

Business

Programming

Skills

Some of the skills in mathematics

Mathematics

Optimization

Decision Theory

Statistics

Differential Equations

Time Series

Some of the skills in business

Business

Strategy

Finance

Economics

Some of the skills in programming

Programming

Data Wrangling

Modelling

Big Data

This is how we structure our team

Team

Data Governance

Machine Learning Engineering

Data Analysis

Data Science

Data Engineering

Houston, we have a problem.

Tens of Terabytes

Hundreds of ETLs

Hundreds of topics

Millions of Messages per Hour

Hundreds of Megabytes per Second

Hundreds of Terabytes

Redshift

Tens of Thousands of Queries Daily

Thousands of Cards

Hundreds of Users

PeriscopeData

Thousands of Dashboards

Hundreds of Users

We need state of the art technology to feed data hungry people

Stages: Ingestion, Processing, Storage, Presentation

Source DB: Mongo, PostgreSQL

App / Services: Java App

Ingestion: Gobblin

Data Lake: AWS S3

Batch Processing: Spark, Airflow, Hadoop2, Python, Java App

Data Warehouse: Redshift, MongoDB, PostgreSQL

Datahub: Pubsub, Kafka

Stream Processing: DataFlow, MemSQL Pipeline

Near Real Time DW: GCP BigQuery, MemSQL

Real Time DB: AWS DynamoDB

Analytics Tools: PeriscopeData, Spark, R, Domo, Dataiku, Holistics, Keboola

ML Tools, Library, and Services: Jupyter, Zeppelin, Caffe, DataDog, TensorFlow, Cloud Vision API

Query Engine: Qubole, Presto,

Part Two

Data Engineering

Fast Food,

Or…?

MINDSETS

Managed service for focus

So we could focus more on the use cases

Real Time Pipeline

5 min data delivery SLA. Real latency ~ 10s

100 ms query SLA. Real latency ~ 10ms (p95)

Key value data, query by service/app

Autoscale

Self service for each engineering team: we provide governance, guidance, building blocks, and consultation
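The query SLA above is quoted at the 95th percentile (p95). As a rough illustration of what such a check involves, here is a minimal Python sketch that computes a nearest-rank percentile over latency samples and compares it to an SLA threshold. The function names and the sample values are assumptions made for illustration; they are not taken from the deck or from any Traveloka monitoring code.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.
    pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_sla(samples_ms, sla_ms, pct=95):
    """True if the pct-th percentile latency is within the SLA."""
    return percentile(samples_ms, pct) <= sla_ms


# Hypothetical query latencies in milliseconds (illustrative only):
# nineteen fast queries and a single slow outlier.
latencies_ms = [9, 8, 10, 7, 11, 9, 8, 12, 10, 9,
                8, 9, 10, 11, 7, 9, 8, 10, 9, 250]

p95 = percentile(latencies_ms, 95)
within_sla = meets_sla(latencies_ms, sla_ms=100)
```

A percentile-based SLA tolerates a small share of slow outliers, which is why the single 250 ms query in the hypothetical sample does not by itself break a 100 ms p95 target.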

Near Real Time Pipeline

Raw data, query by BI Tools

5 min data delivery SLA. Real latency ~ 5s

Using YAML for schema definition (built and defined by ourselves)

Self service for data analysts, with guidance and governance!
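The deck does not show what the YAML schema format looks like. Purely as an illustration of the idea of a declarative schema that analysts can write themselves, the Python sketch below validates a record against a schema held as plain data. The table name, field names, type names, and the `validate` helper are all hypothetical; the schema is written as a Python dict so the example runs without a YAML parser.

```python
# Hypothetical schema, shown as the dict a YAML file like this might parse to:
#   table: booking_events
#   fields:
#     - {name: booking_id, type: string, required: true}
#     - {name: amount, type: float, required: true}
#     - {name: coupon, type: string, required: false}
SCHEMA = {
    "table": "booking_events",
    "fields": [
        {"name": "booking_id", "type": "string", "required": True},
        {"name": "amount", "type": "float", "required": True},
        {"name": "coupon", "type": "string", "required": False},
    ],
}

# Mapping from (assumed) schema type names to Python types.
TYPE_MAP = {"string": str, "float": (int, float), "integer": int}


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = set()
    for field in schema["fields"]:
        name = field["name"]
        known.add(name)
        if record.get(name) is None:
            if field.get("required", False):
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], TYPE_MAP[field["type"]]):
            problems.append(f"wrong type for {name}: expected {field['type']}")
    for name in record:
        if name not in known:
            problems.append(f"unexpected field: {name}")
    return problems
```

Keeping the schema as data rather than code is one way a platform team can offer self service while still enforcing governance: the same definition can drive validation, table creation, and documentation.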

Near Real Time Pipeline

But MemSQL is not a managed service; it runs on EC2.

It is easy to scale, but not autoscaled yet.

So we are moving to… v2!!

Currently in usability testing by analysts.

Self service, of course!

Analytical Pipeline

Heavy data processing, query by BI Tools

6 hour data delivery SLA

Analytical Pipeline

Interesting features:

• Custom dev/prod environment, for self service!

• Custom framework, on top of Spark

• Custom Airflow, with a separate queue for backfill

• EMR autoscale for backfill

• Redshift microbatch bulk load

• etc...
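The deck does not describe how the micro-batch bulk load works. In general, micro-batching means accumulating rows and loading them in small groups instead of one at a time. The Python sketch below shows only that grouping step; `bulk_load` is a hypothetical callback standing in for whatever actually writes a batch to the warehouse, and none of the names come from the deck.

```python
from itertools import islice


def micro_batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, batch_size, bulk_load):
    """Call bulk_load once per micro-batch; return the number of rows loaded.

    bulk_load is a hypothetical callable that takes a list of rows.
    """
    loaded = 0
    for batch in micro_batches(rows, batch_size):
        bulk_load(batch)
        loaded += len(batch)
    return loaded


# Usage with an in-memory stand-in for the warehouse.
received = []
total = load_in_batches(range(7), batch_size=3, bulk_load=received.append)
```

Because `micro_batches` consumes its input lazily, the same function works for a finite backfill and for a long-running stream of incoming rows.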

Summary

Part Three

Data Science in Traveloka

Things to Discuss

Data Science Purpose

Tools of the Trade

Model Evaluations and Applications

Novia is 25 years old. She is single, outspoken, and mathematically gifted. As a student, she was deeply interested in calculus and statistics, and also participated in the International Mathematical Olympiad.

a. Novia is a data scientist

b. Novia is a data scientist and is active as a Mathematical Olympiad tutor
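This puzzle appears to be a variant of the classic "Linda problem" used to demonstrate the conjunction fallacy. If the question is which statement is more probable, the conjunction rule settles it regardless of how well the description seems to fit. Writing $A$ for "Novia is a data scientist" and $B$ for "Novia is active as a Mathematical Olympiad tutor" (symbols introduced here for illustration):

```latex
P(A \cap B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

So statement (a) is always at least as probable as statement (b), even though (b) may feel like the better match.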

Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded. Choose one sequence from a set of three. Which one is the more likely outcome?

GRGRRR

GRRRRR
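To compare the two sequences shown, one can compute the probability that a given run of six consecutive rolls produces each of them exactly. The Python sketch below does this with exact fractions. Note the simplifying assumption: it scores one fixed window of six rolls, not the chance of the pattern appearing somewhere within all 20 rolls. The function name is chosen here for illustration.

```python
from fractions import Fraction

# Four of the six faces are green, two are red.
P_GREEN = Fraction(4, 6)
P_RED = Fraction(2, 6)


def sequence_probability(sequence):
    """Probability that independent rolls produce exactly this G/R sequence."""
    probability = Fraction(1)
    for face in sequence:
        if face == "G":
            probability *= P_GREEN
        elif face == "R":
            probability *= P_RED
        else:
            raise ValueError(f"unknown face: {face!r}")
    return probability


p_grgrrr = sequence_probability("GRGRRR")
p_grrrrr = sequence_probability("GRRRRR")
ratio = p_grgrrr / p_grrrrr
```

The two sequences differ only in their third roll, so under this assumption GRGRRR is exactly twice as likely as GRRRRR: swapping a red (probability 1/3) for a green (probability 2/3) doubles the product.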

Remember This:

The goal of a data science exercise is to help us make good business decisions

Alternatives

Information

Preferences

“if they learn nothing else about decision analysis from their studies, distinction between outcome and decisions will have been worth the price of admission”

Ron Howard, Professor at Stanford University, Father of Decision Analysis

                Good Decision                                 Bad Decision

Good Outcome    Took a taxi and arrived safely                Drove home and arrived safely

Bad Outcome     Took a taxi and was involved in an accident   Drove home and was involved in an accident

Things to Discuss

Tools of the Trade

Data Science Framework: CRISP-DM

Business

Data Prep

Evaluation

Deployment

Common

“Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world”

Atul Butte, Stanford

We use open source libraries for data science

Wrangling

• data.table

• dplyr

• SparkR

• sparklyr

• pandas

• pyspark

Visualization

• ggplot

• matplotlib

• seaborn

• shiny

Statistics

• JAGS

• Stan

• Python

• Julia

Machine Learning

• scikit-learn

• caret

• e1071

• fbprophet

Are we using the algorithm? Or being used by it?

Linear Models

Naïve Bayes Classifier

Support Vector Classifier

Vowpal Wabbit Classifier

Random Forest

Decision Trees

Neural Network

Extreme Gradient Boosted Trees

Many more algos!

Linear Models

Nystroem Regressor

Support Vector Regressor

Vowpal Wabbit Regressor

Random Forest

Decision Trees

Neural Network

Extreme Gradient Boosted Trees

More Algos!

• Scikit-learn

• Caret

• TensorFlow

• …

We need more than just off-the-shelf libraries to feed data hungry people

Bayesian Network

Markov Chain Monte Carlo

Model Evaluation: judging the usefulness of your model

Rule #1

Never ever peek at the test set during training/validation
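One common way to follow this rule is to set the test set aside before any modelling starts and use only the training and validation parts while tuning. The Python sketch below shows a seeded three-way split using only the standard library. The function name, the fractions, and the seed are illustrative assumptions, not something the deck prescribes.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint parts."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Tune on train/val only; evaluate on test once, at the very end.
train, val, test = train_val_test_split(range(100))
```

Fixing the seed keeps the split reproducible, so the same rows stay in the test set across experiments. For time-ordered data, a chronological cut is usually more appropriate than a random shuffle.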

Rule #2

You can never satisfy all the metrics; pick one or two metrics as your decision criteria beforehand
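A small example of why metrics pull against each other: for a binary classifier, moving the decision threshold typically trades precision for recall. The Python sketch below computes both at two thresholds. The labels and scores are made-up illustrative values, and the function name is an assumption.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and truth:
            tp += 1
        elif predicted_positive and not truth:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = positive) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]

strict = precision_recall(y_true, scores, threshold=0.75)
lenient = precision_recall(y_true, scores, threshold=0.35)
```

With these values the strict threshold gives perfect precision but misses half of the positives, while the lenient one finds every positive at the cost of false alarms. Which point is better depends on the business decision, which is why the criterion should be chosen beforehand.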

Rule #3

Always do comparative statics on the final model

Comparative Statics: commonly used as feature importance analysis
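In its usual sense, comparative statics asks how the output changes when one input is shifted and everything else is held fixed. The Python sketch below applies that idea to a model's prediction function. `toy_model`, its coefficients, and the feature names are hypothetical stand-ins for a fitted model; the deck does not specify how the analysis is carried out.

```python
def comparative_statics(predict, baseline, feature, delta):
    """Change in prediction when one feature moves by delta, all else fixed."""
    shifted = dict(baseline)
    shifted[feature] = baseline[feature] + delta
    return predict(shifted) - predict(baseline)


def toy_model(x):
    """Hypothetical stand-in for a fitted model's prediction function."""
    return 10.0 - 2.0 * x["price"] + 0.5 * x["lead_days"]


baseline = {"price": 1.0, "lead_days": 4.0}

# Effect of each feature, one at a time, around the same baseline.
effects = {
    "price": comparative_statics(toy_model, baseline, "price", 1.0),
    "lead_days": comparative_statics(toy_model, baseline, "lead_days", 2.0),
}
```

For a non-linear model the effect depends on the baseline chosen, so it is worth repeating the exercise at several representative baselines rather than reading a single number as the feature's importance.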

Remember the end goal: decisions

What should we do?

happen

“But in my view, obsessive customer focus is by far the most protective of Day 1 vitality”

Our data is telling us:

• What do they want?

• Do we serve their needs?

• Are they trying to leave us?

My name is Jeff

Thank you!
