Dating with Models: A How to Guide for Programmers and Architects
Ryan Barker


Video available at http://youtu.be/1N3YtjXmtNI. Ryan Barker, Software Architect at eHarmony.com, explains the evolution of the matching system behind the largest matching site for singles. The talk gives detailed explanations of past and present online and offline matching systems, including the latest SOA REST-based web services and the Hadoop backend hybrid system.


Page 1: Dating with Models

Dating with Models
A How to Guide for Programmers and Architects

Ryan Barker

Page 2: Dating with Models

The eHarmony Difference › How are we different?

• 30+ years as clinical psychologist and marriage counselor

• Many failing marriages due to fundamental incompatibility

Can we do better?

Page 3: Dating with Models

The fundamental idea

320 Questions

• Personality
• Values
• Attitudes
• Beliefs

Compatibility Matching

Page 4: Dating with Models

Compatibility Matching › Obstreperousness

Page 5: Dating with Models

Compatibility Matching › Romantic

Page 6: Dating with Models

Compatibility Matching › 29 Dimensions®

Page 7: Dating with Models

So let's build it! › Models as a stored procedure (~2001)

Page 8: Dating with Models

Problems › Stored procedures are awesome

• Problem #1 – Thousands of users, very few matches. Entire company is at stake
• Resolution – Line by line debugging of stored procedure finds an AND that should be an OR

• Problem #2 – Database load increasing
• Resolution – Optimize stored procedure? More hardware? Rewrite?

• Problem #3 – Order by compatibility does not work
• Resolution – Change stored procedure? Find a way to introduce models

Page 9: Dating with Models

The eHarmony Difference › Compatibility Matching System®

1. Compatibility Matching
2. Affinity Matching
3. Match Distribution

Layers on Top of Compatibility Matching

Page 10: Dating with Models

Affinity Matching

[Figure; surviving labels: 61, 21, 3000]

Page 11: Dating with Models

Affinity Matching

Page 12: Dating with Models

Affinity Matching › Distance

Prob( )

Page 13: Dating with Models

Affinity Matching › Distance

Page 14: Dating with Models

Affinity Matching › Height difference

[Figure: Prob( ) by height difference; surviving labels: 4 - 8 in, cm]

Page 15: Dating with Models

Affinity Matching › “Attractiveness”

Prob( )

Page 16: Dating with Models

Redesign › Event based matching with Java/Groovy models

Page 17: Dating with Models

Problems › Better but still suboptimal

• Problem #1 – Suboptimal distribution of matches
• Resolution – Shuffle loop order each day? Introduce an optimizer!

• Problem #2 – Nightly match run taking 27 hours, heavy database load
• Resolution – Move to an offline process

• Problem #3 – Java models require testing and new releases. Groovy models are too slow
• Resolution – Change to configuration based models

Page 18: Dating with Models

The eHarmony Difference › Compatibility Matching System®

1. Compatibility Matching
2. Affinity Matching
3. Match Distribution

Delivering the right matches at the right time to as many people as possible across the entire network.

Page 19: Dating with Models

Match Distribution › Graph optimization

Prob( | data)

[Graph figure; surviving labels: 2, 21]

Page 20: Dating with Models

Match Distribution › Graph optimization

Prob( | data)

[Graph figure; surviving labels: 2, 2]

Page 21: Dating with Models

Match Distribution › Graph optimization

Prob( | data)

[Graph figure; surviving labels: 2, 2]
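The slides frame match distribution as an optimization over a graph whose edges carry Prob( | data), with a limit on how many matches each person receives. The talk does not say which optimizer is used, so the following is only a greedy sketch under stated assumptions: the function name `distribute_matches`, the edge tuple layout, and the single shared `capacity` are all hypothetical.

```python
def distribute_matches(edges, capacity):
    """Greedy sketch: take the highest-probability pairs first, within per-person caps.

    edges: iterable of (user, candidate, probability) tuples (assumed layout).
    capacity: maximum number of matches any one person may receive (assumed).
    """
    remaining = {}  # person -> matches still allowed
    chosen = []
    for user, cand, prob in sorted(edges, key=lambda e: e[2], reverse=True):
        user_left = remaining.setdefault(user, capacity)
        cand_left = remaining.setdefault(cand, capacity)
        if user_left > 0 and cand_left > 0:
            remaining[user] -= 1
            remaining[cand] -= 1
            chosen.append((user, cand, prob))
    return chosen


edges = [("a", "x", 0.9), ("a", "y", 0.8), ("b", "x", 0.7), ("b", "y", 0.1)]
print(distribute_matches(edges, capacity=1))
```

With a cap of one match each, the greedy pass pairs a with x first and leaves b with only the weak b-y edge, although a-y plus b-x would have a higher total. A global optimizer over the whole graph is one way to avoid that kind of suboptimal distribution.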

Page 22: Dating with Models


Page 23: Dating with Models

Match Distribution › Does it work?

Page 24: Dating with Models

Problems › The design is never finished

• Problem #1 – More data required
• Resolution – Build services to collect data in real time

• Problem #2 – Bandwidth limitations
• Resolution – Switch to protocol buffers

• Problem #3 – Can’t reprocess people fast enough due to database load
• Resolution – Switch to key value store backed services

Page 25: Dating with Models

Rearchitecture › Services for everything

Page 26: Dating with Models

Rearchitecture › Service features

• RESTful data oriented design
• Single element
  • GET – Return single element
  • POST – Update single element
  • PUT – Create single element
  • DELETE – Delete single element
• Multiple element
  • GET – Return list of elements
• Produces/Consumes JSON or Protobuf
• JAX-RS providers transparently convert between formats
• Accept/ContentType: X-application-protobuf
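The verb mapping and format negotiation listed above can be sketched in a few lines. This is an in-memory illustration, not the JAX-RS services from the talk: `ElementResource`, `pick_format`, the status codes, and the JSON default are assumptions, and the media type string is copied as written on the slide.

```python
PROTOBUF_TYPE = "X-application-protobuf"  # media type string as written on the slide


class ElementResource:
    """In-memory stand-in for one RESTful resource (hypothetical)."""

    def __init__(self):
        self._store = {}

    def handle(self, method, element_id, body=None):
        """Map HTTP verbs to operations the way the slide lists them."""
        if method == "GET":
            if element_id in self._store:
                return 200, self._store[element_id]
            return 404, None
        if method == "PUT":  # create single element, per the slide
            self._store[element_id] = dict(body)
            return 201, self._store[element_id]
        if method == "POST":  # update single element, per the slide
            if element_id not in self._store:
                return 404, None
            self._store[element_id].update(body)
            return 200, self._store[element_id]
        if method == "DELETE":
            if self._store.pop(element_id, None) is not None:
                return 204, None
            return 404, None
        return 405, None

    def list_all(self):
        """Multiple element GET: return the list of stored elements."""
        return 200, list(self._store.values())


def pick_format(accept_header):
    """Choose the wire format from the Accept header; JSON is the assumed default."""
    accepted = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return "protobuf" if PROTOBUF_TYPE.lower() in accepted else "json"
```

Keeping the format choice in one function mirrors the idea that conversion happens in a provider layer, so the resource logic never needs to know which format the caller asked for.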

Page 27: Dating with Models

Rearchitecture › Service Client features

• Generic client customized for each service
• Single element
  • GET – Return single element
  • POST – Update single element
  • PUT – Create single element
  • DELETE – Delete single element
• Multiple element
  • GET – Return list of elements
  • BATCH – Scatter gather implementation
• Protocol buffer based by default, falls back to JSON for older services
• Configurable retries for GET/PUT/DELETE
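Two of the client features above, the scatter gather BATCH call and the configurable retries, lend themselves to a short sketch. The helper names `call_with_retries` and `batch_get`, the thread pool, and the use of `IOError` as the transient failure are assumptions for illustration. GET, PUT and DELETE are the idempotent HTTP methods, which is presumably why the slide limits retries to them.

```python
from concurrent.futures import ThreadPoolExecutor

RETRYABLE = {"GET", "PUT", "DELETE"}  # the methods the slide lists as retryable


def call_with_retries(method, send, max_attempts=3):
    """Call send(); retry on IOError only when the method is retryable."""
    attempts = max_attempts if method in RETRYABLE else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except IOError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller


def batch_get(ids, fetch_one, max_workers=8):
    """Scatter one GET per id across a thread pool, then gather results by id."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda i: call_with_retries("GET", lambda: fetch_one(i)), ids
        )
        return dict(zip(ids, results))
```

Here `fetch_one` stands in for whatever single-element GET the generic client exposes; a failure that survives all retries propagates out of `batch_get`.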

Page 28: Dating with Models

Current Day › Matching User Service

Matching User Service is a data aggregation service that gathers data from various sources and stores it in a key value store.

• REST + Protocol buffer based
• /user-service/<version>/users/<user-id>
• Supports full and partial updates
• Supports single and batch gets
• 1000+ data attributes
• ~4KB each uncompressed

• Key: Userid
• Value: UserProto
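The difference between full and partial updates against a key value store can be sketched as follows. A plain dict stands in for the store and for the UserProto value; `UserStore` and its method names are hypothetical and not the service's API.

```python
class UserStore:
    """Dict-backed stand-in for the key value store (key: user id, value: attributes)."""

    def __init__(self):
        self._data = {}

    def put_full(self, user_id, attributes):
        """Full update: replace the whole stored value."""
        self._data[user_id] = dict(attributes)

    def put_partial(self, user_id, changes):
        """Partial update: merge only the supplied attributes into the stored value."""
        self._data.setdefault(user_id, {}).update(changes)

    def get(self, user_id):
        """Single get: return the stored value, or None if the user is unknown."""
        return self._data.get(user_id)

    def batch_get(self, user_ids):
        """Batch get: return values for the ids that exist, keyed by id."""
        return {uid: self._data[uid] for uid in user_ids if uid in self._data}
```

With 1000+ attributes per user, a partial update lets a data source send only the fields it owns instead of rewriting the whole record.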

Page 29: Dating with Models

Current Day › Matching User Service

Page 30: Dating with Models

Current Day › Matching User Service

Page 31: Dating with Models

Current Day › Matching User Service

Page 32: Dating with Models

Current Day › Pairing Service

Pairing Service is a data service that supports a specialized set of operations.

• REST + Protocol buffer based
• GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>
• DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id>
• 4 data attributes per pairing
• 0 to tens of thousands of pairings per user

• Stores: 1 per type
• Key: Userid
• Value: PairingsProto
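To make the two URL shapes and the one-store-per-type layout concrete, here is a small routing sketch. The regular expression follows the paths on the slide; the class `PairingStores`, the status codes, and the nested dicts standing in for PairingsProto are assumptions, and the version and type values used in the usage lines are made up.

```python
import re

PAIRING_PATH = re.compile(
    r"^/pairings-service/(?P<version>[^/]+)/pairings/(?P<type>[^/]+)"
    r"/users/(?P<user_id>[^/]+)(?:/candidates/(?P<candidate_id>[^/]+))?$"
)


class PairingStores:
    """One in-memory store per pairing type: type -> {user_id -> {candidate_id -> attrs}}."""

    def __init__(self):
        self._stores = {}

    def handle(self, method, path, body=None):
        match = PAIRING_PATH.match(path)
        if match is None:
            return 404, None
        store = self._stores.setdefault(match["type"], {})
        user_id, candidate_id = match["user_id"], match["candidate_id"]
        if candidate_id is not None:  # candidate-level path supports DELETE only
            if method != "DELETE":
                return 405, None
            removed = store.get(user_id, {}).pop(candidate_id, None)
            return (204, None) if removed is not None else (404, None)
        if method == "GET":
            return 200, dict(store.get(user_id, {}))
        if method == "PUT":
            store[user_id] = dict(body)
            return 204, None
        if method == "DELETE":
            store.pop(user_id, None)
            return 204, None
        return 405, None


stores = PairingStores()
base = "/pairings-service/v1/pairings/example-type/users/42"  # hypothetical values
stores.handle("PUT", base, {"7": {"score": 0.9}, "8": {"score": 0.5}})
stores.handle("DELETE", base + "/candidates/7")
print(stores.handle("GET", base))
```

Deleting a single candidate without rewriting the whole value is the kind of specialized operation that matters when one user can hold tens of thousands of pairings.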

Page 33: Dating with Models

Current Day › Scoring Service

Scoring Service is a stateless calculation service that supports JSON based models.

• REST + Protocol buffer based
• GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score
• POST /scoring-service/<version>/models/<modelname>/score

• Knows how to fetch data from data sources for some models
• All models slowly being centralized in one place
• Underlying library supports any protobuf or map
• Possible candidate for redesign?

Page 34: Dating with Models

Current Day › Model Frameworks 3.0

Model Frameworks 3.0 is the core library behind all scoring.

• JSON based model definitions
• Scala DSL implementation with bytecode generation
• Supports Protobufs (Message), ResultSet, Maps

• Examples
  • "same_religion" : "{user.profile.religion} == {cand.profile.religion}"
  • "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
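To show how expressions like the two examples could be evaluated against nested user and candidate data, here is a toy interpreter. It is not the Scala DSL or its bytecode generation: the functions `resolve`, `bin_value` and `evaluate` are hypothetical, the semantics of `bin` (index of the first bin edge not below the value) are an assumption, and Python's `eval` with builtins removed is used only as a shortcut.

```python
import bisect
import re

PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def resolve(path, context):
    """Follow a dotted path such as user.profile.religion through nested dicts."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value


def bin_value(bins, x):
    """Assumed semantics for bin(): index of x among sorted upper bin edges."""
    return bisect.bisect_left(bins, x)


def evaluate(expression, context, bins=None):
    """Replace each {path} with a variable bound to its value, then evaluate."""
    values = {}

    def substitute(match):
        name = "v%d" % len(values)
        values[name] = resolve(match.group(1), context)
        return name

    python_expr = PLACEHOLDER.sub(substitute, expression)
    scope = {"__builtins__": {}, "bin": bin_value, "bins": bins}
    scope.update(values)
    return eval(python_expr, scope)  # toy shortcut; not safe for untrusted input


context = {
    "user": {"profile": {"religion": "A"}, "calculatedValues": {"age": 33}},
    "cand": {"profile": {"religion": "A"}, "calculatedValues": {"age": 30}},
}
print(evaluate("{user.profile.religion} == {cand.profile.religion}", context))
print(evaluate(
    "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})",
    context,
    bins=[-5, 0, 5],
))
```

Because a model is only a string in a JSON document, changing it needs no code release, which addresses the earlier complaint that Java models required testing and new releases.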

Page 35: Dating with Models

Current Day › Offline Matching – Spring Conductor

Page 36: Dating with Models

Current Day › Offline Matching – Hadoop flow

Page 37: Dating with Models


linkedin.com/in/rbarker1