Dating with Models


Video available at http://youtu.be/1N3YtjXmtNI. Ryan Barker, Software Architect for eHarmony.com, explains the evolution of the matching system behind the largest matching site for singles. The talk gives detailed explanations of past and present online and offline matching systems, including the latest hybrid system of SOA REST-based web services with a Hadoop backend.


Dating with Models: A How to Guide for Programmers and Architects

Ryan Barker

The eHarmony Difference › How are we different?

• 30+ years as clinical psychologist and marriage counselor

• Many failing marriages due to fundamental incompatibility

Can we do better?

The fundamental idea

320 Questions

• Personality

• Values

• Attitudes

• Beliefs

Compatibility Matching

Compatibility Matching › Obstreperousness

Compatibility Matching › Romantic

Compatibility Matching › 29 Dimensions®

So let's build it! › Models as a stored procedure (~2001)

Problems › Stored procedures are awesome

• Problem #1 – Thousands of users, very few matches. Entire company is at stake

• Resolution – Line by line debugging of stored procedure finds an AND that should be an OR

• Problem #2 – Database load increasing

• Resolution – Optimize stored procedure? More hardware? Rewrite?

• Problem #3 – Order by compatibility does not work

• Resolution – Change stored procedure? Find a way to introduce models

The eHarmony Difference › Compatibility Matching System®

1. Compatibility Matching

2. Affinity Matching

3. Match Distribution

Layers on Top of Compatibility Matching


Affinity Matching

Affinity Matching › Distance

[Figure: Prob( ) by distance]

Affinity Matching › Height difference

[Figure: Prob( ) by height difference; labels: 4 - 8 in, cm]

Affinity Matching › “Attractiveness”

[Figure: Prob( ) by “attractiveness”]

Redesign › Event based matching with Java/Groovy models

Problems › Better but still suboptimal

• Problem #1 – Suboptimal distribution of matches

• Resolution – Shuffle loop order each day? Introduce an optimizer!

• Problem #2 – Nightly match run taking 27 hours, heavy database load

• Resolution – Move to an offline process

• Problem #3 – Java models require testing and new releases. Groovy models are too slow

• Resolution – Change to configuration based models

The eHarmony Difference › Compatibility Matching System®

1. Compatibility Matching

2. Affinity Matching

3. Match Distribution

Delivering the right matches at the right time to as many people as possible across the entire network.


Match Distribution › Graph optimization

[Figure: graph optimization over Prob( | data)]
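The slides do not say which optimizer is used, so the following is only a sketch of one way to read "graph optimization": treat every candidate pair as an edge weighted by its estimated probability, cap how many matches any one user receives, and select edges to raise the total weight. The function name `distribute_matches`, the `capacity` parameter, and the sample data are all hypothetical, and the greedy rule is a stand-in for whatever optimizer the real system runs.

```python
from collections import defaultdict

def distribute_matches(scored_pairs, capacity):
    """Greedy capacity-constrained selection (an assumption, not eHarmony's optimizer).

    scored_pairs: iterable of (user_a, user_b, probability)
    capacity: maximum number of matches delivered to any one user
    """
    delivered = defaultdict(int)
    chosen = []
    # Highest-probability pairs first; ties broken by ids so the result is deterministic.
    for a, b, p in sorted(scored_pairs, key=lambda t: (-t[2], t[0], t[1])):
        if delivered[a] < capacity and delivered[b] < capacity:
            chosen.append((a, b, p))
            delivered[a] += 1
            delivered[b] += 1
    return chosen

# Hypothetical scores for two users and two candidates.
pairs = [("u1", "c1", 0.9), ("u1", "c2", 0.8), ("u2", "c1", 0.7), ("u2", "c2", 0.2)]
selected = distribute_matches(pairs, capacity=1)
```

With this data the greedy pass selects u1-c1 and u2-c2 (total 1.1), while pairing u1-c2 and u2-c1 would total 1.5. That gap is the kind of suboptimal distribution a global optimizer over the whole graph is meant to close.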

Match Distribution › Does it work?

Problems › The design is never finished

• Problem #1 – More data required

• Resolution – Build services to collect data in real time

• Problem #2 – Bandwidth limitations

• Resolution – Switch to protocol buffers

• Problem #3 – Can’t reprocess people fast enough due to database load

• Resolution – Switch to key value store backed services

Rearchitecture › Services for everything

Rearchitecture › Service features

• RESTful data oriented design
  • Single element
    • GET – Return single element
    • POST – Update single element
    • PUT – Create single element
    • DELETE – Delete single element
  • Multiple element
    • GET – Return list of elements
• Produces/Consumes JSON or Protobuf
  • JAX-RS providers transparently convert between formats
  • Accept/ContentType: X-application-protobuf
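The verb mapping above can be sketched with an in-memory stand-in for a single service. This is not the JAX-RS code from the talk: the class name `ElementResource`, the dict-backed store, and the status codes are assumptions chosen for illustration. Only the verb-to-operation mapping (PUT creates, POST updates) comes from the slide.

```python
class ElementResource:
    """In-memory stand-in for one data service (hypothetical)."""

    def __init__(self):
        self._elements = {}

    def handle(self, method, element_id=None, body=None):
        """Return (status_code, payload) following the slide's verb mapping."""
        if method == "GET" and element_id is None:
            # Multiple element: return the list of elements.
            return 200, list(self._elements.values())
        if method == "GET":
            if element_id not in self._elements:
                return 404, None
            return 200, self._elements[element_id]
        if method == "PUT":
            # Create single element.
            self._elements[element_id] = dict(body)
            return 201, self._elements[element_id]
        if method == "POST":
            # Update single element; it must already exist.
            if element_id not in self._elements:
                return 404, None
            self._elements[element_id].update(body)
            return 200, self._elements[element_id]
        if method == "DELETE":
            existed = self._elements.pop(element_id, None) is not None
            return (204 if existed else 404), None
        return 405, None

resource = ElementResource()
created = resource.handle("PUT", "42", {"age": 30})
updated = resource.handle("POST", "42", {"age": 31})
listing = resource.handle("GET")
```

In the real services the payload would be serialized as JSON or protocol buffers depending on the request headers. That conversion is left out here.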

Rearchitecture › Service Client features

• Generic client customized for each service
  • Single element
    • GET – Return single element
    • POST – Update single element
    • PUT – Create single element
    • DELETE – Delete single element
  • Multiple element
    • GET – Return list of elements
    • BATCH – Scatter gather implementation

• Protocol buffer based by default, falls back to JSON for older services

• Configurable retries for GET/PUT/DELETE
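A scatter-gather BATCH call with retries might look roughly like the sketch below. The helper names (`with_retries`, `batch_get`), the chunk size, and the simulated `flaky_fetch` backend are hypothetical. The talk does not show the client code or say why POST is left out of the retry list; one plausible reading is that only calls that are safe to repeat get retried.

```python
from concurrent.futures import ThreadPoolExecutor

def with_retries(call, attempts=3):
    """Retry a repeatable call (GET/PUT/DELETE) up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except ConnectionError as err:
            last_error = err
    raise last_error

def batch_get(fetch_chunk, ids, chunk_size=2, workers=4):
    """Scatter ids into chunks, fetch the chunks in parallel, gather one dict."""
    chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in chunk order, so merging is deterministic.
        for part in pool.map(lambda c: with_retries(lambda: fetch_chunk(c)), chunks):
            merged.update(part)
    return merged

# Simulated backend: every chunk fails once with a transient error, then succeeds.
seen_chunks = set()

def flaky_fetch(chunk):
    key = tuple(chunk)
    if key not in seen_chunks:
        seen_chunks.add(key)
        raise ConnectionError("transient failure")
    return {user_id: f"user-{user_id}" for user_id in chunk}

result = batch_get(flaky_fetch, [1, 2, 3, 4, 5])
```

The caller sees a single merged result even though the request was split into three chunks and each chunk needed one retry.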

Current Day › Matching User Service

Matching User Service is a data aggregation service that gathers data from various sources, and stores them in a key value store

• REST + Protocol buffer based

• /user-service/<version>/users/<user-id>

• Supports full and partial updates

• Supports single and batch gets

• 1000+ data attributes

• ~4KB each uncompressed

• Key: Userid

• Value: UserProto
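The difference between full and partial updates against a key value store can be sketched as follows. A plain dict stands in for both the store and the UserProto value, and the names `user_path` and `apply_update`, the version string "v1", and the sample fields are hypothetical. Only the path shape comes from the slide.

```python
def user_path(version, user_id):
    """Path shape from the slide; version and user_id are supplied by the caller."""
    return f"/user-service/{version}/users/{user_id}"

def apply_update(store, user_id, fields, partial=True):
    """A full update replaces the stored value; a partial update merges into it."""
    if partial:
        current = dict(store.get(user_id, {}))
        current.update(fields)
        store[user_id] = current
    else:
        store[user_id] = dict(fields)
    return store[user_id]

store = {}  # key: user id, value: dict standing in for UserProto
apply_update(store, "1001", {"age": 34, "city": "Pasadena"}, partial=False)
apply_update(store, "1001", {"city": "Santa Monica"})  # partial: age is kept
path = user_path("v1", "1001")  # "v1" is an illustrative version value
```

A partial update lets a data source send only the attributes it owns, which matters when each record aggregates 1000+ attributes from various sources.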


Current Day › Pairing Service

Pairing Service is a data service that supports a specialized set of operations

• REST + Protocol buffer based

• GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>

• DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id>

• 4 data attributes per pairing

• 0 to tens of thousands of pairings per user

• Stores: 1 per type

• Key: Userid

• Value: PairingsProto

Current Day › Scoring Service

Scoring Service is a stateless calculation service that supports JSON based models

• REST + Protocol buffer based

• GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score

• POST /scoring-service/<version>/models/<modelname>/score

• Knows how to fetch data from data sources for some models

• All models slowly being centralized in one place

• Underlying library supports any protobuf or map

• Possible candidate for redesign?

Current Day › Model Frameworks 3.0

Model Frameworks 3.0 is the core library behind all scoring

• JSON based model definitions

• Scala DSL implementation with bytecode generation

• Supports Protobufs (Message), ResultSet, Maps

• Examples

• "same_religion" : "{user.profile.religion} == {cand.profile.religion}"

• "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
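To make the two example definitions concrete, here is a small Python sketch of how such strings could be evaluated against nested maps. It is not the Scala DSL from the talk. The helpers `resolve`, `bin_index`, and `compile_model` are hypothetical, the meaning of `bin()` (index of the bucket a value falls into) is an assumption, and the sample users and bin edges are made up.

```python
import re
from bisect import bisect_right

PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

def resolve(path, context):
    """Walk a dotted path such as 'user.profile.religion' through nested maps."""
    value = context
    for key in path.split("."):
        value = value[key]
    return value

def bin_index(bins, x):
    """Assumed semantics for bin(): index of the bucket that x falls into."""
    return bisect_right(bins, x)

def compile_model(expression):
    """Compile a model string once into a callable(user, cand, **constants)."""
    paths = []

    def to_variable(match):
        paths.append(match.group(1))
        return f"_v{len(paths) - 1}"

    # Replace each {dotted.path} with a variable name, then compile the expression.
    code = compile(PLACEHOLDER.sub(to_variable, expression), "<model>", "eval")

    def score(user, cand, **constants):
        context = {"user": user, "cand": cand}
        names = {f"_v{i}": resolve(p, context) for i, p in enumerate(paths)}
        names.update(constants, bin=bin_index)
        # Emptying builtins narrows what the expression can reach; it is not a sandbox.
        return eval(code, {"__builtins__": {}}, names)

    return score

models = {
    "same_religion": "{user.profile.religion} == {cand.profile.religion}",
    "bin_age_diff": "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})",
}
user = {"profile": {"religion": "none"}, "calculatedValues": {"age": 34}}
cand = {"profile": {"religion": "none"}, "calculatedValues": {"age": 29}}

same = compile_model(models["same_religion"])(user, cand)
age_bin = compile_model(models["bin_age_diff"])(user, cand, bins=[-10, -5, 0, 5, 10])
```

Keeping models as data means a new model can ship as a configuration change, which is the resolution the earlier slide gives for Java models needing releases and Groovy models being too slow.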

Current Day › Offline Matching – Spring Conductor

Current Day › Offline Matching – Hadoop flow


linkedin.com/in/rbarker1
