evan-estola
Presentation given for Tech Talks at Meetup event on 8/27/13
My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
  ○ Information Retrieval
  ○ Data Mining
  ○ Math
    ■ Linear Algebra
    ■ Graph Theory
You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?
What this talk is
● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations (model/features)
● Lessons learned/what’s next
What this talk isn’t
● What is a data scientist?
● What is big data?
● How do matrix factorization, gradient boosted decision trees, MapReduce, or this framework I hope you’ll use work?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh and 114 million RSVPs by >14 million members
● 2.7 million RSVPs in the last 30 days
  ○ ~1/second
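As a quick sanity check on the "~1/second" figure, the arithmetic can be sketched as below. The only inputs are the two numbers from the slide; the variable names are made up for illustration.

```python
# 2.7 million RSVPs over the last 30 days, converted to a per-second rate.
rsvps_last_30_days = 2_700_000
seconds_in_30_days = 30 * 24 * 60 * 60  # 2,592,000 seconds

rsvps_per_second = rsvps_last_30_days / seconds_in_30_days
print(f"{rsvps_per_second:.2f} RSVPs per second")
```

The result comes out slightly above one RSVP per second, which matches the rounded figure on the slide.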
Data at Meetup
● User data
● Site monitoring/performance
● A/B testing
● Recommendations*
“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it
Recommendation
Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
Algorithm:
  ○ Group local meetups by topic
  ○ Select topic with most groups
  ○ Remove those groups
  ○ Repeat
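The four steps above can be sketched as a small greedy loop. This is a minimal illustration, not Meetup's production code: the function name `recommend_topics`, the input shape (a dict from group id to a set of topics), the cutoff `k`, and the alphabetical tie-break are all assumptions.

```python
from collections import defaultdict

def recommend_topics(group_topics, k=3):
    """Greedy topic picker following the four steps on the slide.

    group_topics: dict mapping a local group id to the set of its topics
    (hypothetical input shape). Returns up to k topics.
    """
    remaining = {group: set(topics) for group, topics in group_topics.items()}
    picked = []
    while remaining and len(picked) < k:
        # Step 1: group the remaining local meetups by topic.
        by_topic = defaultdict(set)
        for group, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(group)
        if not by_topic:
            break  # only groups without topics are left
        # Step 2: select the topic with the most groups
        # (ties broken alphabetically, an assumption).
        best = min(by_topic, key=lambda t: (-len(by_topic[t]), t))
        picked.append(best)
        # Step 3: remove those groups; step 4: repeat on what is left.
        for group in by_topic[best]:
            del remaining[group]
    return picked

local_groups = {
    "g1": {"tech", "python"},
    "g2": {"tech"},
    "g3": {"hiking"},
    "g4": {"tech", "hiking"},
    "g5": {"yoga"},
}
print(recommend_topics(local_groups))
```

Removing the covered groups before the next pass is what keeps the list from being "most popular" over and over: the second pick has to be popular among groups the first pick did not already cover.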
Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
  ○ Member, location, topics, Facebook friends? demographics?
● Outputs:
  ○ Ranking
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
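"Users who like this also like this" can be illustrated with a simple co-occurrence count. This is a toy sketch of the general idea, not the method used at Meetup; the `also_liked` helper, the member names, and the membership data are invented for the example.

```python
from collections import Counter

def also_liked(memberships, group, top_n=3):
    """Rank other groups by how many members they share with `group`.

    memberships: dict mapping a member to the set of groups they joined
    (hypothetical input shape).
    """
    counts = Counter()
    for groups in memberships.values():
        if group in groups:
            counts.update(g for g in groups if g != group)
    # Sort by shared-member count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:top_n]]

memberships = {
    "ann": {"python", "hiking"},
    "bob": {"python", "hiking", "yoga"},
    "cat": {"python", "yoga"},
    "dan": {"hiking"},
}
print(also_liked(memberships, "python"))
```

A member with no memberships contributes nothing to any count, which is the cold-start problem in miniature.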
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
  ○ Location
  ○ Groups/person
  ○ Membership: 0.001%
  ○ Compare to Netflix: 1%
Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup/Didn’t join Meetup
● “Features”
Topic Match
State Match
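A minimal sketch of how a (member, group) pair might be turned into a labeled training example with these two features. The slides do not define the features precisely, so the binary definitions here, the `extract_features` helper, and the record fields (`topics`, `state`) are assumptions.

```python
def extract_features(member, group):
    """Build binary features for one (member, group) pair.

    Assumed definitions: TopicMatch is 1 when the member and the group
    share at least one topic; StateMatch is 1 when they are in the same state.
    """
    shared_topics = set(member["topics"]) & set(group["topics"])
    return {
        "TopicMatch": 1.0 if shared_topics else 0.0,
        "StateMatch": 1.0 if member["state"] == group["state"] else 0.0,
    }

member = {"topics": ["python", "hiking"], "state": "NY"}
group = {"topics": ["python", "data"], "state": "NJ"}

# One labeled example: the features plus whether the member joined (1) or not (0).
example = (extract_features(member, group), 1)
print(example)
```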
Logistic Regression
● Score
  ○ “Probability”
  ○ Ranking
● Fast + Easy
● Weights!
Group recommendation weights
● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
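To show how weights like these become a score, here is a hedged sketch of logistic regression scoring. The weights are copied from the slide; treating every feature as a 0/1 indicator, using a bias of 0 (the slide lists no intercept), and the `score` helper itself are assumptions.

```python
import math

# Weights as listed on the slide. "DistanceBucket >10" is shown as n/a,
# so it is left out here rather than given a guessed value.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
}

def score(active_features, bias=0.0):
    """Logistic score for the features that are "on" for a member/group pair.

    Assumes binary features and a hypothetical bias term of 0.
    """
    z = bias + sum(WEIGHTS[name] for name in active_features)
    return 1.0 / (1.0 + math.exp(-z))

near = score(["TopicMatch", "StateMatchFeature", "DistanceBucket <2"])
far = score(["TopicMatch", "StateMatchFeature", "DistanceBucket 5-10"])
print(near > far)
```

Groups are then ranked by this score. Because the sigmoid is monotonic, sorting by the raw weighted sum gives the same order; the sigmoid only matters when the score is read as a "probability".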
Making up features
● “Zipscore”
● All topics not created equal
● Facebook likes
Real data is gross
● Preprocessing is critical!
  ○ missing data
  ○ outliers
  ○ log scale
  ○ bucketing
  ○ selection/sampling (not introducing bias)
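A few of these preprocessing steps can be sketched with small helpers. The bucket edges follow the distance buckets on the weights slide; the distance unit is not stated in the slides, and the helper names, the "unknown" bucket for missing data, and the boundary handling (a distance of exactly 2 falls in "2-5") are assumptions.

```python
import math

def distance_bucket(distance):
    """Bucket a distance using the edges from the weights slide (unit unspecified)."""
    if distance is None:
        return "unknown"  # missing data gets its own bucket instead of a guess
    if distance < 2:
        return "<2"
    if distance < 5:
        return "2-5"
    if distance < 10:
        return "5-10"
    return ">10"

def log_scale(count):
    """Compress heavy-tailed counts; log1p keeps a count of 0 at 0."""
    return math.log1p(count)

def clip(value, low, high):
    """Limit the influence of outliers by clamping to [low, high]."""
    return max(low, min(high, value))

print(distance_bucket(3.2), log_scale(0), clip(100, 0, 10))
```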
Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People to people recommendations?
● Recommending people to groups?
Thanks!
Smart people come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang