DESCRIPTION
Video available at http://youtu.be/1N3YtjXmtNI. Ryan Barker, Software Architect at eHarmony.com, explains the evolution of the matching system behind the largest matching site for singles. The talk gives detailed explanations of past and present online and offline matching systems, including the latest hybrid system of SOA REST-based web services and a Hadoop backend.
Dating with Models
A How-To Guide for Programmers and Architects
Ryan Barker
The eHarmony Difference › How are we different?
• 30+ years as a clinical psychologist and marriage counselor
• Many marriages failing due to fundamental incompatibility
Can we do better?
The fundamental idea › 320 Questions
• Personality
• Values
• Attitudes
• Beliefs
Compatibility Matching
Compatibility Matching › Obstreperousness
Compatibility Matching › Romantic
Compatibility Matching › 29 Dimensions®
So let's build it! › Models as a stored procedure (~2001)
Problems › Stored procedures are awesome
• Problem #1 – Thousands of users, very few matches. The entire company is at stake
• Resolution – Line-by-line debugging of the stored procedure finds an AND that should be an OR
• Problem #2 – Database load increasing
• Resolution – Optimize the stored procedure? More hardware? Rewrite?
• Problem #3 – Order by compatibility does not work
• Resolution – Change the stored procedure? Find a way to introduce models
The eHarmony Difference › Compatibility Matching System®
[Diagram: 1 Compatibility Matching, 2 Affinity Matching, 3 Match Distribution]
Layers on Top of Compatibility Matching
Affinity Matching
Affinity Matching › Distance
[Chart: Prob( ) by distance]
Affinity Matching › Height difference
[Chart: Prob( ) by height difference in cm, annotated 4 to 8 in]
Affinity Matching › “Attractiveness”
[Chart: Prob( ) by “attractiveness”]
Redesign › Event-based matching with Java/Groovy models
Problems › Better but still suboptimal
• Problem #1 – Suboptimal distribution of matches
• Resolution – Shuffle loop order each day? Introduce an optimizer!
• Problem #2 – Nightly match run taking 27 hours, heavy database load
• Resolution – Move to an offline process
• Problem #3 – Java models require testing and new releases. Groovy models are too slow
• Resolution – Change to configuration-based models
The eHarmony Difference › Compatibility Matching System®
[Diagram: 1 Compatibility Matching, 2 Affinity Matching, 3 Match Distribution]
Delivering the right matches at the right time to as many people as possible across the entire network.
Match Distribution › Graph optimization
[Diagram: graph annotated with Prob( | data)]
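The slides do not say which optimizer is used for match distribution. As a hedged illustration of treating distribution as a graph optimization, the Java sketch below swaps in a simple greedy heuristic for capacity-constrained matching (b-matching). Each candidate edge carries a model probability, and each user can accept at most a fixed number of new matches per run. Edges are taken in descending probability as long as both endpoints still have capacity. The `Edge` record, the `GreedyMatchDistributor` class, and the single uniform capacity are hypothetical, and greedy selection is not guaranteed to be optimal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical candidate edge: a possible pairing of two users together with
// a model-predicted probability for that pairing.
record Edge(long userA, long userB, double prob) {}

// Greedy heuristic for capacity-constrained matching (b-matching).
// It accepts the strongest edges first and stops giving matches to a user
// once that user's capacity is used up. It is not an exact optimizer.
class GreedyMatchDistributor {
    private final int capacityPerUser; // hypothetical: same cap for every user

    GreedyMatchDistributor(int capacityPerUser) {
        this.capacityPerUser = capacityPerUser;
    }

    List<Edge> distribute(List<Edge> candidates) {
        List<Edge> sorted = new ArrayList<>(candidates);
        // Highest predicted probability first (stable sort keeps input order on ties).
        sorted.sort((x, y) -> Double.compare(y.prob(), x.prob()));

        Map<Long, Integer> used = new HashMap<>();
        List<Edge> accepted = new ArrayList<>();
        for (Edge e : sorted) {
            int a = used.getOrDefault(e.userA(), 0);
            int b = used.getOrDefault(e.userB(), 0);
            // Accept only if neither endpoint has reached its capacity.
            if (a < capacityPerUser && b < capacityPerUser) {
                accepted.add(e);
                used.put(e.userA(), a + 1);
                used.put(e.userB(), b + 1);
            }
        }
        return accepted;
    }
}
```

Take a capacity of 1 and the edges (1, 2, 0.9), (1, 3, 0.8), (3, 4, 0.7), and (2, 4, 0.6). The sketch keeps the 1–2 and 3–4 pairings and drops the other two, because the strongest edge saturates users 1 and 2. Ranking edges globally by score, rather than walking users in a fixed loop order, is one way to keep users early in the loop from absorbing the best candidates.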
Match Distribution › Does it work?
Problems › The design is never finished
• Problem #1 – More data required
• Resolution – Build services to collect data in real time
• Problem #2 – Bandwidth limitations
• Resolution – Switch to Protocol Buffers
• Problem #3 – Can’t reprocess people fast enough due to database load
• Resolution – Switch to services backed by a key-value store
Rearchitecture › Services for everything
Rearchitecture › Service features
• RESTful, data-oriented design
• Single element
  • GET – Return single element
  • POST – Update single element
  • PUT – Create single element
  • DELETE – Delete single element
• Multiple elements
  • GET – Return list of elements
• Produces/consumes JSON or Protobuf
  • JAX-RS providers transparently convert between formats
  • Accept/Content-Type: X-application-protobuf
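The verb semantics listed above, where POST updates, PUT creates, and a collection GET lists, can be sketched without any framework as an in-memory resource. This is a hypothetical illustration, not eHarmony's service code. The `ElementStore` class is an assumption. So is the use of `String` values in place of Protocol Buffer messages, and so is reporting success, a missing element, or a duplicate with HTTP-style status codes (201/200/204, 404, 409).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a data-oriented REST resource.
// Values are plain strings here; a real service would hold protobuf messages.
class ElementStore {
    private final Map<String, String> elements = new ConcurrentHashMap<>();

    // PUT /elements/{id}: create a single element (409 if it already exists).
    int put(String id, String value) {
        return elements.putIfAbsent(id, value) == null ? 201 : 409;
    }

    // POST /elements/{id}: update a single existing element (404 if absent).
    int post(String id, String value) {
        return elements.replace(id, value) != null ? 200 : 404;
    }

    // GET /elements/{id}: return a single element, if present.
    Optional<String> get(String id) {
        return Optional.ofNullable(elements.get(id));
    }

    // DELETE /elements/{id}: delete a single element (404 if absent).
    int delete(String id) {
        return elements.remove(id) != null ? 204 : 404;
    }

    // GET /elements: return the list of elements (order unspecified).
    List<String> list() {
        return new ArrayList<>(elements.values());
    }
}
```

The slides assign PUT to create and POST to update. Many APIs use POST to create instead, so a client should follow each service's documented mapping rather than assume one.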
Rearchitecture › Service Client features
• Generic client customized for each service
• Single element
  • GET – Return single element
  • POST – Update single element
  • PUT – Create single element
  • DELETE – Delete single element
• Multiple elements
  • GET – Return list of elements
  • BATCH – Scatter-gather implementation
• Protocol Buffers by default, falls back to JSON for older services
• Configurable retries for GET/PUT/DELETE
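The BATCH (scatter-gather) and configurable-retry features can be illustrated with the JDK's `CompletableFuture`. This is a hypothetical sketch. `fetchOne` stands in for a single-element GET against one service. The batch call splits the ids into fixed-size chunks, fetches the chunks in parallel on a thread pool, and merges the partial results. Retries wrap only the single GET, in line with the slide listing GET/PUT/DELETE (not POST) as retryable. The `BatchClient` name, the pool size, the chunk size, and the attempt count are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical generic client; fetchOne stands in for a single-element GET.
class BatchClient<V> {
    private final Function<String, V> fetchOne;
    private final int chunkSize;
    private final int maxAttempts;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    BatchClient(Function<String, V> fetchOne, int chunkSize, int maxAttempts) {
        if (chunkSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("chunkSize and maxAttempts must be >= 1");
        }
        this.fetchOne = fetchOne;
        this.chunkSize = chunkSize;
        this.maxAttempts = maxAttempts;
    }

    // Retry wrapper for a retryable call; rethrows the last failure.
    V getWithRetry(String id) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetchOne.apply(id);
            } catch (RuntimeException e) {
                last = e; // a production client would typically back off here
            }
        }
        throw last;
    }

    // BATCH: scatter chunks of ids across the pool, then gather the results.
    Map<String, V> batchGet(List<String> ids) {
        List<CompletableFuture<Map<String, V>>> futures = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            List<String> chunk = ids.subList(i, Math.min(i + chunkSize, ids.size()));
            futures.add(CompletableFuture.supplyAsync(() -> {
                Map<String, V> part = new HashMap<>();
                for (String id : chunk) {
                    part.put(id, getWithRetry(id));
                }
                return part;
            }, pool));
        }
        Map<String, V> result = new HashMap<>();
        for (CompletableFuture<Map<String, V>> f : futures) {
            result.putAll(f.join()); // gather: wait for each chunk and merge
        }
        return result;
    }

    void close() {
        pool.shutdown();
    }
}
```

Scattering by chunk keeps the number of concurrent requests bounded by the pool size. If any chunk still fails after its retries, `join()` surfaces that failure as a `CompletionException`.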
Current Day › Matching User Service
Matching User Service is a data aggregation service that gathers data from various sources and stores it in a key-value store.
• REST + Protocol Buffers based
• /user-service/<version>/users/<user-id>
• Supports full and partial updates
• Supports single and batch gets
• 1000+ data attributes, ~4 KB each uncompressed
• Key: Userid
• Value: UserProto
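Full versus partial updates against a store keyed by user id can be sketched as follows. This is a hypothetical illustration. A `Map<String, Object>` stands in for `UserProto`, and a partial update is modeled as a shallow, attribute-level merge into the stored value. That is only loosely analogous to calling `mergeFrom` on a Protocol Buffers message builder. The `UserStore` class and its method names are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user store: key = user id, value = attribute map
// (standing in for a UserProto message).
class UserStore {
    private final Map<Long, Map<String, Object>> store = new ConcurrentHashMap<>();

    // Full update: replace the whole value for this user.
    void fullUpdate(long userId, Map<String, Object> user) {
        store.put(userId, new HashMap<>(user));
    }

    // Partial update: merge only the supplied attributes into the stored value.
    void partialUpdate(long userId, Map<String, Object> changes) {
        store.compute(userId, (id, current) -> {
            Map<String, Object> merged = current == null ? new HashMap<>() : new HashMap<>(current);
            merged.putAll(changes);
            return merged;
        });
    }

    // Single get; returns null when the user is unknown.
    Map<String, Object> get(long userId) {
        return store.get(userId);
    }

    // Batch get: users that are not found are simply omitted.
    Map<Long, Map<String, Object>> batchGet(List<Long> userIds) {
        Map<Long, Map<String, Object>> result = new HashMap<>();
        for (Long id : userIds) {
            Map<String, Object> user = store.get(id);
            if (user != null) {
                result.put(id, user);
            }
        }
        return result;
    }
}
```

Partial updates let each upstream source write only the attributes it owns, without first reading and rewriting the whole record.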
Current Day › Pairing Service
Pairing Service is a data service that supports a specialized set of operations.
• REST + Protocol Buffers based
• GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>
• DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id>
• 4 data attributes per pairing
• 0 to tens of thousands of pairings per user
• Stores: 1 per type
• Key: Userid
• Value: PairingsProto
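The pairing layout can be sketched in memory: one store per pairing type, keyed by user id, with each value holding that user's pairings. The operations mirror the listed endpoints. They get, put, or delete all pairings of a type for a user, and delete a single candidate. This is a hypothetical sketch. The slides give only the count of pairing attributes, so the two-field `Pairing` record is invented for illustration. The `PairingStore` class and method names are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pairing entry; the real PairingsProto fields are not given here.
record Pairing(long candidateId, double score) {}

class PairingStore {
    // One store per pairing type: type -> (user id -> pairings).
    private final Map<String, Map<Long, List<Pairing>>> stores = new ConcurrentHashMap<>();

    private Map<Long, List<Pairing>> store(String type) {
        return stores.computeIfAbsent(type, t -> new ConcurrentHashMap<>());
    }

    // GET .../pairings/<type>/users/<user-id>
    List<Pairing> get(String type, long userId) {
        return store(type).getOrDefault(userId, List.of());
    }

    // PUT .../pairings/<type>/users/<user-id>: replace the user's pairings.
    void put(String type, long userId, List<Pairing> pairings) {
        store(type).put(userId, List.copyOf(pairings));
    }

    // DELETE .../pairings/<type>/users/<user-id>
    void delete(String type, long userId) {
        store(type).remove(userId);
    }

    // DELETE .../pairings/<type>/users/<user-id>/candidates/<candidate-id>
    void deleteCandidate(String type, long userId, long candidateId) {
        store(type).computeIfPresent(userId, (id, current) -> {
            List<Pairing> kept = new ArrayList<>();
            for (Pairing p : current) {
                if (p.candidateId() != candidateId) {
                    kept.add(p);
                }
            }
            return List.copyOf(kept);
        });
    }
}
```

Keeping all of a user's pairings in one value matches the "Key: Userid, Value: PairingsProto" layout. A single-candidate delete therefore rewrites that user's list rather than removing a separate row.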
Current Day › Scoring Service
Scoring Service is a stateless calculation service that supports JSON-based models.
• REST + Protocol Buffers based
• GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score
• POST /scoring-service/<version>/models/<modelname>/score
• Knows how to fetch data from data sources for some models
• All models slowly being centralized in one place
• Underlying library supports any protobuf or map
• Possible candidate for redesign?
Current Day › Model Frameworks 3.0
Model Frameworks 3.0 is the core library behind all scoring.
• JSON-based model definitions
• Scala DSL implementation with bytecode generation
• Supports Protocol Buffers (Message), ResultSet, Maps
• Examples:
    "same_religion" : "{user.profile.religion} == {cand.profile.religion}"
    "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
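The following deliberately tiny interpreter makes the two example definitions concrete. It handles only those expression shapes: `{a.b.c}` placeholders resolved against nested maps (standing in for protobuf messages), an `==` comparison, a binary `-`, and `bin(bins, expr)`. It is not Model Frameworks 3.0, which the slides describe as a Scala DSL with bytecode generation. Several details are hypothetical: the `ModelExpr` class name, the nested-map data layout, and the `bin` semantics. The sketch assumes `bin` returns the index of the first bin edge the value falls below, with values at or past the last edge going to the final bin.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical mini-interpreter for the two example model expressions only.
class ModelExpr {
    private final Map<String, double[]> binDefs; // named, sorted bin edges

    ModelExpr(Map<String, double[]> binDefs) {
        this.binDefs = binDefs;
    }

    // Evaluate one expression against a "user" record and a "cand" record.
    Object eval(String expr, Map<String, Object> user, Map<String, Object> cand) {
        String e = expr.trim();
        if (e.startsWith("bin(") && e.endsWith(")")) {
            String inner = e.substring(4, e.length() - 1);
            int comma = inner.indexOf(',');
            double[] edges = binDefs.get(inner.substring(0, comma).trim());
            double value = ((Number) eval(inner.substring(comma + 1), user, cand)).doubleValue();
            return bin(edges, value);
        }
        if (e.contains("==")) {
            String[] sides = e.split("==", 2);
            return Objects.equals(eval(sides[0], user, cand), eval(sides[1], user, cand));
        }
        if (e.contains(" - ")) {
            String[] sides = e.split(" - ", 2);
            double left = ((Number) eval(sides[0], user, cand)).doubleValue();
            double right = ((Number) eval(sides[1], user, cand)).doubleValue();
            return left - right;
        }
        if (e.startsWith("{") && e.endsWith("}")) {
            return resolve(e.substring(1, e.length() - 1), user, cand);
        }
        return Double.parseDouble(e); // numeric literal
    }

    // Resolve a path such as "user.profile.religion" through nested maps.
    @SuppressWarnings("unchecked")
    private static Object resolve(String path, Map<String, Object> user, Map<String, Object> cand) {
        String[] parts = path.split("\\.");
        Object node;
        if (parts[0].equals("user")) {
            node = user;
        } else if (parts[0].equals("cand")) {
            node = cand;
        } else {
            throw new IllegalArgumentException("unknown root: " + parts[0]);
        }
        for (int i = 1; i < parts.length; i++) {
            node = ((Map<String, Object>) node).get(parts[i]);
        }
        return node;
    }

    // Assumed semantics: index of the first edge the value is below;
    // values at or above the last edge fall into the final bin (edges.length).
    static int bin(double[] edges, double value) {
        for (int i = 0; i < edges.length; i++) {
            if (value < edges[i]) {
                return i;
            }
        }
        return edges.length;
    }
}
```

With edges `{-5, 0, 5}`, a user aged 30 and a candidate aged 27 give an age difference of 3.0, which this sketch places in bin 2. An interpreter like this re-parses the string on every call. That per-call parsing cost is the kind of overhead the slides' bytecode-generating DSL is presumably meant to avoid.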
Current Day › Offline Matching – Spring Conductor
Current Day › Offline Matching – Hadoop flow
linkedin.com/in/rbarker1