Agile Machine Learning for Real-time Recommender Systems. Johann Schleier-Smith, CTO, if(we). @jssmith, github.com/ifweco

Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF


DESCRIPTION

Abstract: Agile Machine Learning for Recommender Systems. What can data scientists and machine learning engineers learn from software developers? When it comes to process, tools, and managing complexity, the answer is: quite a bit. When we first started to deploy machine learning at if(we), it felt like we hit a speed bump in the middle of the highway. Accustomed to shipping software to millions of members multiple times a day and to constantly iterating toward better products, we were stunned at how long it took us to try new ideas using available machine learning tools. I will share what we’ve learned from applying agile software development principles to building recommender systems, describing the tools and platforms that allow us to go from new ideas to proven product improvements in just a few days.


Page 1:

Agile Machine Learning for Real-time Recommender Systems

@jssmith github.com/ifweco

Johann Schleier-Smith CTO, if(we)

Page 2:

what it should look like

Page 3:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Create training data

5. Train predictive models

6. Put models in production

7. See improvements

Page 5:

what it often looks like

Page 6:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Pull records from database to create interesting features (usually aggregates)

5. Train predictive models

6. Go implement models for production

7. See improvements

Page 7:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Pull records from database to create interesting features (usually aggregates)

5. Train predictive models

6. Go implement models for production

7. See improvements

3-6 months

Page 9:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Pull records from database to create interesting features (usually aggregates)

5. Train predictive models

6. Go implement models for production

7. See improvements

Cool!

Was it worth it?

Page 10:

• Profitable startup actively pursuing big opportunities in social apps

• Millions of users of existing brands

• Thousands of social contacts per second

Page 11:

real-time recommendations

challenges

Page 12:

• >10 million candidates to select from

• >1000 updates/sec

• Must be responsive to current activity

• Users expect instant query results

Tagged dating feature

Page 13:

implementation pain points

Page 14:

• Data scientist hands model description to software engineer

• May need to translate features from SQL to Java

• Aggregate features require batch processing

• May need to adjust features and model to achieve real-time updates

• Fast scoring requires high-performance in-memory data structures

Page 16:

time for new thinking

Page 17:

one way that works better

Page 19:

! 4. Pull records from database to create interesting features (usually aggregates)

! 5. Train predictive models

! 6. Go implement models for production

Page 20:

Pull records from database to create interesting features (usually aggregates)

Train predictive models

Go implement models for production

Page 21:

Create interesting features

Train predictive models

Put models in production

Page 23:

one right way to data

Page 24:

event history

one right way to data

Page 25:

History.
  filterTime(start, PLUS_INFINITY).
  foreach { e: Event => model.update(e) }
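The replay loop above can be sketched against an in-memory history. The names `Event`, `History`, `filterTime`, and `PLUS_INFINITY` come from the slide, but their definitions here are assumptions made for illustration, as is the toy `CountingModel`. This is not the actual if(we) API.

```scala
import scala.collection.mutable

object ReplaySketch {
  // Hypothetical event shape: a timestamp, a type, and free-form properties.
  final case class Event(timestamp: Long, eventType: String, properties: Map[String, String])

  // Assumed sentinel meaning "no upper bound on time".
  val PLUS_INFINITY: Long = Long.MaxValue

  // Hypothetical in-memory history, kept sorted by time.
  final class History(events: Seq[Event]) {
    private val ordered = events.sortBy(_.timestamp)

    // Events with start <= timestamp < end, in time order.
    def filterTime(start: Long, end: Long): Seq[Event] =
      ordered.filter(e => e.timestamp >= start && e.timestamp < end)
  }

  // Toy model: counts how many events of each type it has seen.
  final class CountingModel {
    private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    def update(e: Event): Unit = counts(e.eventType) += 1
    def count(eventType: String): Int = counts(eventType)
  }

  // Same shape as the slide: roll through history, updating the model per event.
  def replay(history: History, start: Long): CountingModel = {
    val model = new CountingModel
    history.filterTime(start, PLUS_INFINITY).foreach { (e: Event) => model.update(e) }
    model
  }
}
```

The model's state is a function of the events it has been shown, so the same loop can in principle rebuild it from history or keep it current from a live stream.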

Page 26:

everything is an event

Page 27:

Bob registers

Alice registers

Alice updates profile

Bob opens app

Bob sees Alice in recommendations

Bob swipes yes on Alice

Alice receives push notification

Alice sees Bob swiped yes

Alice swipes yes

Alice sends message to Bob
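One way to picture "everything is an event" is to model the story above as an append-only log and derive state from it. The type and field names below are hypothetical, chosen only to mirror the slide. The deck does not show the real schema.

```scala
object EventLogSketch {
  // Hypothetical event types mirroring the slide's story; all names are assumptions.
  sealed trait Event { def timestamp: Long }
  final case class Registered(timestamp: Long, userId: String) extends Event
  final case class ProfileUpdated(timestamp: Long, userId: String) extends Event
  final case class AppOpened(timestamp: Long, userId: String) extends Event
  final case class RecommendationShown(timestamp: Long, viewerId: String, candidateId: String) extends Event
  final case class Swiped(timestamp: Long, fromId: String, toId: String, yes: Boolean) extends Event
  final case class PushReceived(timestamp: Long, userId: String) extends Event
  final case class SwipeSeen(timestamp: Long, viewerId: String, swiperId: String) extends Event
  final case class MessageSent(timestamp: Long, fromId: String, toId: String) extends Event

  // The Bob and Alice story as a time-ordered log.
  val story: Vector[Event] = Vector(
    Registered(1, "bob"),
    Registered(2, "alice"),
    ProfileUpdated(3, "alice"),
    AppOpened(4, "bob"),
    RecommendationShown(5, "bob", "alice"),
    Swiped(6, "bob", "alice", yes = true),
    PushReceived(7, "alice"),
    SwipeSeen(8, "alice", "bob"),
    Swiped(9, "alice", "bob", yes = true),
    MessageSent(10, "alice", "bob")
  )

  // Derived state: mutual matches are computed from the log, not stored separately.
  // Each pair is reported once, with the lexicographically smaller id first.
  def mutualMatches(events: Seq[Event]): Set[(String, String)] = {
    val yes = events.collect { case Swiped(_, from, to, true) => (from, to) }.toSet
    yes.filter { case (a, b) => a < b && yes.contains((b, a)) }
  }
}
```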

Page 28:

writing the model

Page 29:

class MyModel {
  // fold one event into the model's state
  def update(e: Event) { … }
  // return the n best candidates for this context
  def topN(ctx: Context, n: Int) = { … }
}
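The two-method interface above could be filled in along these lines. This is a minimal sketch, assuming a model that scores candidates as a weighted sum of feature scores. The `Event` and `Context` fields, the `Feature` trait, and the `LikesReceived` feature are all hypothetical, and the slide's elided bodies may do something quite different.

```scala
import scala.collection.mutable

object ModelSketch {
  // Hypothetical event and context shapes.
  final case class Event(timestamp: Long, userId: Long, candidateId: Long, liked: Boolean)
  final case class Context(userId: Long)

  trait Feature {
    def update(e: Event): Unit
    def score(ctx: Context, candidateId: Long): Double
  }

  // Example feature: number of likes a candidate has received so far.
  final class LikesReceived extends Feature {
    private val likes = mutable.Map.empty[Long, Int].withDefaultValue(0)
    def update(e: Event): Unit = if (e.liked) likes(e.candidateId) += 1
    def score(ctx: Context, candidateId: Long): Double = likes(candidateId).toDouble
  }

  // Model: fixed weights applied to features that are updated event by event.
  final class MyModel(weighted: Seq[(Feature, Double)]) {
    private val candidates = mutable.Set.empty[Long]

    def update(e: Event): Unit = {
      candidates += e.candidateId
      weighted.foreach { case (f, _) => f.update(e) }
    }

    // Score every known candidate except the viewer; highest score first, ties by id.
    def topN(ctx: Context, n: Int): Seq[Long] =
      candidates.toSeq
        .filter(_ != ctx.userId)
        .map(id => id -> weighted.map { case (f, w) => w * f.score(ctx, id) }.sum)
        .sortWith { (a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1) }
        .take(n)
        .map(_._1)
  }
}
```

Scoring every candidate on each query is only workable for small examples. With more than 10 million candidates, as on the challenges slide, a real system would need indexing or candidate pre-selection, which this sketch omits.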

Page 30:

models are all about features

Page 31:

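class MyFeature {
  // fold one event into the feature's state
  def update(e: Event) { … }
  // value of this feature for one candidate in this context
  def score(ctx: Context, candidateId: Long): Double = { … }
}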
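As an illustration of a feature that is both live and personal, the sketch below scores a candidate 1.0 if that candidate has already swiped yes on the viewer. The feature itself, the `Event` fields, and `Context` are assumptions made up for this example, not features described in the talk.

```scala
import scala.collection.mutable

object FeatureSketch {
  // Hypothetical swipe event and query context.
  final case class Event(timestamp: Long, fromId: Long, toId: Long, yes: Boolean)
  final case class Context(userId: Long)

  // Hypothetical feature: has the candidate already swiped yes on the viewer?
  final class LikedByCandidate {
    // Set of (swiper, target) pairs with a yes swipe.
    private val likes = mutable.Set.empty[(Long, Long)]

    // State changes as soon as the event arrives, so the feature stays current.
    def update(e: Event): Unit =
      if (e.yes) likes += ((e.fromId, e.toId))

    // Personal: the value depends on who is asking (ctx.userId).
    def score(ctx: Context, candidateId: Long): Double =
      if (likes.contains((candidateId, ctx.userId))) 1.0 else 0.0
  }
}
```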

Page 32:

model training

Page 33:

History.
  filterTime(start, PLUS_INFINITY).
  foreach { e: Event => {
    // record features as they stood before this event, then apply the event
    writeTrainingData(outcome(e), model.features(context(e)))
    model.update(e)
  } }
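The ordering in the loop above matters: features are read before the event updates the model, so each training row only sees information that was available at decision time. Below is a minimal sketch of that idea, assuming a toy model with two count features. `Row`, `Model`, and the `Event` fields are hypothetical, and rows are collected in memory rather than written out.

```scala
import scala.collection.mutable

object TrainingSketch {
  // Hypothetical swipe event: the outcome is whether the viewer said yes.
  final case class Event(timestamp: Long, viewerId: Long, candidateId: Long, yes: Boolean)
  final case class Row(label: Double, features: Vector[Double])

  // Toy model state: per-candidate counts of yes swipes and total swipes.
  final class Model {
    private val yesCount = mutable.Map.empty[Long, Int].withDefaultValue(0)
    private val seenCount = mutable.Map.empty[Long, Int].withDefaultValue(0)

    def features(candidateId: Long): Vector[Double] =
      Vector(yesCount(candidateId).toDouble, seenCount(candidateId).toDouble)

    def update(e: Event): Unit = {
      seenCount(e.candidateId) += 1
      if (e.yes) yesCount(e.candidateId) += 1
    }
  }

  // Roll through history in time order, emitting one training row per event.
  def trainingData(history: Seq[Event]): Vector[Row] = {
    val model = new Model
    val rows = Vector.newBuilder[Row]
    history.sortBy(_.timestamp).foreach { e =>
      // Features first (state before the event), update second.
      rows += Row(if (e.yes) 1.0 else 0.0, model.features(e.candidateId))
      model.update(e)
    }
    rows.result()
  }
}
```

Swapping the two statements inside the loop would leak the outcome into the features, since the counts would already include the event being predicted.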

Page 34:

live demo

Page 35:

live demo

Kaggle competition with Best Buy data

https://www.kaggle.com/c/acm-sf-chapter-hackathon-small

Page 36:

product update events

{
  "timestamp" : "2012-05-03 6:43:15",
  "eventType" : "ProductUpdate",
  "eventProperties" : {
    "sku" : "1032361",
    "regularPrice" : "19.99",
    "name" : "Need for Speed: Hot Pursuit",
    "description" : "Fasten your seatbelt and get ready to drive like your life depends on it..."
    ...
  }
}

Page 37:

product view events

{
  "timestamp" : "2011-10-31 09:48:46",
  "eventType" : "ProductView",
  "eventProperties" : {
    "skuSelected" : "2670133",
    "query" : "Modern warfare"
  }
}
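To make the two event types concrete, here is a sketch of a feature that could be built from them: for each query term, count how often a given SKU was the one selected. The transcript does not show what the demo actually computed, so the feature, the class names, and the tokenization are all assumptions. Only the field names `sku`, `name`, `skuSelected`, and `query` come from the events above.

```scala
import scala.collection.mutable

object BestBuySketch {
  // Case classes carrying a subset of the fields shown in the JSON events.
  sealed trait Event
  final case class ProductUpdate(sku: String, name: String) extends Event
  final case class ProductView(skuSelected: String, query: String) extends Event

  // Hypothetical feature: how often each SKU was selected after a query containing a term.
  final class QueryTermPopularity {
    private val counts = mutable.Map.empty[(String, String), Int].withDefaultValue(0)

    // Lowercase, whitespace-separated terms; a deliberately naive tokenizer.
    private def terms(q: String): Seq[String] =
      q.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

    def update(e: Event): Unit = e match {
      case ProductView(sku, query) => terms(query).foreach(t => counts((t, sku)) += 1)
      case _                       => () // other event types do not affect this feature
    }

    // Sum of per-term selection counts for this SKU.
    def score(query: String, sku: String): Double =
      terms(query).map(t => counts((t, sku))).sum.toDouble
  }
}
```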

Page 38:

demo

Page 39:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Create training data

5. Train predictive models

6. Put models in production

7. See improvements

Page 41:

1. Gain understanding of machine learning

2. Gain understanding of the product usage

3. See opportunity to make the product better

4. Create training data

5. Train predictive models

6. Put models in production

7. See improvements

Fast cycles!!

Page 43:

• All data in form of events – no exceptions!

• Roll through history to generate training examples

• Sample training data carefully to avoid feedback

• Model is static while features are live and personal

• Use interesting features with boring algorithms

• Expressiveness > performance > scalability

github.com/ifweco/antelope @jssmith