H2O Workshop, Hassan Namarvar, Principal Data Scientist, Oct 8, 2014

H2O platform workshop

DESCRIPTION

H2O Platform Workshop with ShareThis data science team

Page 1: H2O platform workshop

H2O Workshop

Hassan Namarvar

Principal Data Scientist

Oct 8, 2014

Page 2: H2O platform workshop

WHY USE A NEW MACHINE LEARNING TOOL?

Existing large-scale ML tools such as Apache Mahout, Vowpal Wabbit, Hadoop RMR, and Spark's native MLlib each have their own limitations.

Critical features for a state-of-the-art ML package:

Ease of use

System reliability

In-memory (fast)

Distributed

Extensible (API/SDK)

Accurate algorithms

Visualization (data and results)

Page 3: H2O platform workshop

INTRODUCTION TO H2O PLATFORM

H2O is the world’s fastest in-memory open source machine learning library.

Important Features:

Open source, licensed under the Apache License 2.0

Scalable in-memory processing for big data (written in Java)

Runs on a single node or a multi-node cluster

High-quality implementations of state-of-the-art ML algorithms

H2O package for R

Spark+H2O = Sparkling Water

Page 4: H2O platform workshop

WORKSHOP AGENDA

Download the bleeding-edge version of the platform!

Tutorial on Web API

Upload a real dataset into the platform

Build a CPA model using the GLM algorithm

Validate the CPA model on a test set

Build more advanced models: GBMs (Gradient Boosting Machines), BigData Random Forest, Deep Learning neural networks

Model selection

Page 5: H2O platform workshop

LET’S DO SOME HACKING!

Download the bleeding-edge version of the platform from:

http://0xdata.com/download/

Run locally:

cd ~/Downloads
unzip h2o-2.7.0.1533.zip
cd h2o-2.7.0.1533
java -Xmx4g -jar h2o.jar

Point your browser to:

http://localhost:54321
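Before opening the browser, it can help to confirm that the server has finished starting. The following is a minimal sketch, not part of the original workshop: the helper names `http_probe` and `wait_for_server` are hypothetical, and it assumes only that the web UI answers plain HTTP GET requests at the address above.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if an HTTP GET to `url` succeeds with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_server(url, attempts=10, delay=1.0, probe=http_probe):
    """Poll `url` until `probe` reports success or the attempts run out.

    `probe` is injectable so the polling logic can be exercised
    without a running server.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_server("http://localhost:54321")` after launching the jar returns True once the UI responds, or False after roughly ten seconds with the default settings.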

Page 6: H2O platform workshop

BUILDING A CPA MODEL

RETARGETED VISITS AS A PROXY FOR CONVERSIONS

USER-CENTRIC

Focus on RT Users

Deliver Ads at the optimal times

BETTER PERFORMANCE

Leverage optimization opportunities

OPTIMAL TIME

Target Users Who Are Likely to Convert

DON’T WASTE IMP.

Page 7: H2O platform workshop

GLM MODEL

Screen shot for the CPA model using the GLM algorithm.
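As a rough illustration of what a binomial GLM does with such data, the sketch below fits a logistic regression by plain batch gradient descent. It is not H2O's implementation (H2O's solver and regularization options are not shown here), and the feature (visit count) and labels are made-up toy values, not the workshop dataset.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit intercept and weights by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Toy data (hypothetical): one feature, the number of recent site visits;
# label 1 means the user made a retargeted visit.
visits = [[0], [1], [2], [3], [4], [5]]
returned = [0, 0, 0, 1, 1, 1]
weights = fit_logistic(visits, returned)
```

After fitting, `predict_proba(weights, [5])` scores a frequent visitor above `predict_proba(weights, [0])`, which is the ranking behaviour a CPA model relies on when choosing which users to target.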

Page 8: H2O platform workshop

GBM MODEL

Screen shot for the CPA model using the GBM algorithm.

Page 9: H2O platform workshop

BigData Random Forest MODEL

Screen shot for the CPA model using the RF algorithm.

Page 10: H2O platform workshop

MODEL COMPARISON

Comparing AUC plots of GLM, GBM and RF models on test data:
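For readers who want to reproduce this kind of comparison outside the web UI, AUC can be computed directly from test-set labels and model scores. The sketch below uses the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The model names and scores are invented for illustration and are not results from the workshop.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and scores from two unnamed models.
test_labels = [0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
aucs = {name: auc(test_labels, s) for name, s in model_scores.items()}
best_model = max(aucs, key=aucs.get)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation would be the usual choice for large test sets.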

Page 11: H2O platform workshop

LIVE TEST ON A CAR INSURANCE CAMPAIGN

TESTED FOR TWO MONTHS, WITH PERFORMANCE MEASURED BY DFA.

The CPA test for a car insurance campaign showed a 58% improvement in eCPA and a 57% improvement in conversion rate (CVR).
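The slide does not define how these metrics were calculated. The sketch below uses common definitions as an assumption: eCPA as spend divided by conversions, CVR as conversions divided by impressions, and improvement as the relative change against a baseline (a drop counts as an improvement for eCPA, a rise for CVR). The function names are hypothetical and no campaign figures are reproduced.

```python
def ecpa(spend, conversions):
    """Effective cost per acquisition: spend per conversion."""
    if conversions <= 0:
        raise ValueError("eCPA is undefined without conversions")
    return spend / conversions


def cvr(conversions, impressions):
    """Conversion rate, assumed here to be per impression."""
    if impressions <= 0:
        raise ValueError("CVR is undefined without impressions")
    return conversions / impressions


def relative_improvement(baseline, test, lower_is_better=False):
    """Relative change versus the baseline, signed so that positive is better."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (baseline - test) if lower_is_better else (test - baseline)
    return change / baseline
```

For example, with made-up numbers, a baseline eCPA of 20.0 and a test eCPA of 15.0 give `relative_improvement(20.0, 15.0, lower_is_better=True)` equal to 0.25, that is, a 25% improvement.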

Page 12: H2O platform workshop

THANK YOU!