Machine learning and data at Meetup

Evan Estola
Meetup.com

[email protected]
@estola

My Background

● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
  ○ Information Retrieval
  ○ Data Mining
  ○ Math
    ■ Linear Algebra
    ■ Graph Theory

You

● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?

What this talk is

● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations (model/features)
● Lessons learned/what’s next

What this talk isn’t

● What is a data scientist?
● What is big data?
● How do matrix factorization, gradient boosted decision trees, MapReduce, or this framework I hope you’ll use work?

Why Meetup data is cool

● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by >14 million members
● 2.7 million RSVPs in the last 30 days
  ○ ~1/second
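The "~1/second" figure is just the 30-day RSVP count spread over the 2,592,000 seconds in 30 days. A quick check (the function name is ours, for illustration only):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def rsvps_per_second(rsvps: int, days: int) -> float:
    """Average RSVP rate over a window of `days` days."""
    return rsvps / (days * SECONDS_PER_DAY)

print(f"{rsvps_per_second(2_700_000, 30):.2f} RSVPs/second")  # prints "1.04 RSVPs/second"
```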

Data at Meetup

● User data
● Site monitoring/performance
● A/B testing
● Recommendations*

“Everything is a recommendation”

● Not my phrase
● Not actually true yet
● Working on it

Recommendation

Topic Recommendations

● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive

Algorithm:
  ○ Group local meetups by topic
  ○ Select topic with most groups
  ○ Remove those groups
  ○ Repeat
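The topic algorithm is a greedy coverage loop. A minimal sketch, assuming each local group carries a set of topic tags and that the loop stops after k topics (the slide just says "Repeat"; the stopping rule, tie-breaking, and all names here are assumptions):

```python
from collections import defaultdict

def pick_topics(groups: dict[str, set[str]], k: int) -> list[str]:
    """Greedily pick up to k topics for a new registrant.

    `groups` maps a local group id to the set of topics it is tagged with.
    """
    remaining = dict(groups)  # local groups not yet covered by a chosen topic
    chosen: list[str] = []
    while remaining and len(chosen) < k:
        # Group the remaining local meetups by topic.
        by_topic: dict[str, set[str]] = defaultdict(set)
        for group_id, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(group_id)
        if not by_topic:  # only untagged groups are left
            break
        # Select the topic with the most groups (ties go to the alphabetically first topic).
        best = max(sorted(by_topic), key=lambda t: len(by_topic[t]))
        chosen.append(best)
        # Remove those groups so the next pick covers different meetups.
        for group_id in by_topic[best]:
            del remaining[group_id]
    return chosen

local_groups = {
    "g1": {"hiking", "outdoors"},
    "g2": {"hiking"},
    "g3": {"python", "tech"},
    "g4": {"tech"},
    "g5": {"outdoors", "camping"},
}
print(pick_topics(local_groups, 3))  # ['hiking', 'tech', 'camping']
```

Removing covered groups is what keeps the list from being three flavors of the same popular thing: once "hiking" is picked, "outdoors" only gets credit for groups hiking didn't already cover.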

Group/Event Recommendations

● Replaced a topic-only system
● Inputs:
  ○ Member, location, topics, Facebook friends? Demographics?
● Outputs:
  ○ Ranking

Collaborative Filtering

● Classic recommendations approach
● Users who like this also like this
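"Users who like this also like this" can be sketched as item-to-item co-occurrence counting over group memberships. This is a hypothetical illustration, not Meetup's system: it uses raw counts, which favor big popular groups, where a production recommender would normally normalize (for example with cosine similarity).

```python
from collections import Counter

def also_joined(memberships: dict[str, set[str]], group: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Groups most often joined by members of `group`, with co-membership counts."""
    co_counts: Counter = Counter()
    for groups in memberships.values():
        if group in groups:
            # Every other group this member belongs to gets one vote.
            co_counts.update(groups - {group})
    return co_counts.most_common(top_n)

memberships = {
    "ann": {"hiking", "camping"},
    "bob": {"hiking", "camping", "python"},
    "cy": {"hiking", "python"},
    "dee": {"python"},
    "eve": {"hiking", "camping"},
}
print(also_joined(memberships, "hiking"))  # [('camping', 3), ('python', 2)]
```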

Why Recs at Meetup are hard

● Incomplete data (topics)
● Cold start
● Asking users for data is hard
● Going to meetups is scary
● Sparsity
  ○ Location
  ○ Groups/person
  ○ Membership: 0.001%
  ○ Compare to Netflix: 1%
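"Membership: 0.001%" is the fill rate of the member × group matrix: memberships divided by every possible member/group pair. A small helper with made-up sizes (1,000 members averaging 2 groups each out of 200,000 groups; illustrative numbers, not Meetup's) shows how quickly that gets tiny:

```python
def density_percent(num_memberships: int, num_members: int, num_groups: int) -> float:
    """Percentage of member x group cells that hold a membership."""
    return 100.0 * num_memberships / (num_members * num_groups)

# Hypothetical: 1,000 members averaging 2 groups each, 200,000 groups to choose from.
print(density_percent(2_000, 1_000, 200_000))  # 0.001
```

Density works out to (memberships per member) / (number of groups), so the member count cancels out.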

Supervised Learning/Classification

● “Inferring a function from labeled training data”

● Joined Meetup/Didn’t join Meetup
● “Features”

Topic Match

State Match
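One hypothetical way to compute the Topic Match and State Match features for a (member, group) pair, assuming both are simple 0/1 indicators. The record types and field names are invented for illustration; the feature names follow the weights slide.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    state: str
    topics: set[str] = field(default_factory=set)

@dataclass
class Group:
    state: str
    topics: set[str] = field(default_factory=set)

def match_features(member: Member, group: Group) -> dict[str, float]:
    """0/1 indicator features for one member/group pair."""
    return {
        # 1.0 if the member and the group share at least one topic.
        "TopicMatch": 1.0 if member.topics & group.topics else 0.0,
        # 1.0 if the member lives in the same state as the group.
        "StateMatchFeature": 1.0 if member.state == group.state else 0.0,
    }

print(match_features(Member("NY", {"hiking"}), Group("NJ", {"hiking", "outdoors"})))
# {'TopicMatch': 1.0, 'StateMatchFeature': 0.0}
```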

Logistic Regression

● Score
  ○ “Probability”
  ○ Ranking
● Fast + Easy
● Weights!
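Logistic regression turns a weighted sum of features into a score between 0 and 1, which can be read as a "probability" and used for ranking. A minimal from-scratch sketch, trained with stochastic gradient descent on log loss; a real system would use a library, and the toy examples (label 1 = joined) are made up:

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs: list[list[float]], ys: list[int], epochs: int = 500, lr: float = 0.1):
    """Fit one weight per feature plus a bias by SGD on log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # derivative of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w: list[float], b: float, x: list[float]) -> float:
    """Score in (0, 1) for one feature vector."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical examples: [TopicMatch, StateMatch] -> joined (1) or not (0).
xs = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
ys = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(xs, ys)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
```

The learned weights are the "Weights!" bullet: each one says how much a feature pushes the score up or down, which makes the model easy to inspect.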

Group recommendation weights

● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
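To see how these weights combine, here is a hypothetical scorer that sums the weights of the features a member/group pair has. Assumptions: every feature is a 0/1 indicator, the distance buckets are half-open ranges in the slide's (unstated) units, ">10" is the baseline bucket with no weight, and the model's intercept (not shown on the slide) is dropped. Dropping it doesn't change the order, because the logistic function is monotonic and the intercept is the same for every candidate. The candidates are made up.

```python
from typing import Optional

# Weights from the slide; "DistanceBucket >10" has no weight (baseline bucket).
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
}

def distance_bucket(distance: float) -> Optional[str]:
    """Map a distance to its bucket feature (assumed half-open ranges)."""
    if distance < 2:
        return "DistanceBucket <2"
    if distance < 5:
        return "DistanceBucket 2-5"
    if distance < 10:
        return "DistanceBucket 5-10"
    return None  # >10: the baseline bucket, contributes nothing

def linear_score(features: dict[str, float], distance: float) -> float:
    """Weighted sum of the active features (intercept omitted)."""
    active = dict(features)
    bucket = distance_bucket(distance)
    if bucket is not None:
        active[bucket] = 1.0
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in active.items())

def rank(candidates: list[dict]) -> list[dict]:
    """Sort candidate groups from best to worst score."""
    return sorted(candidates, key=lambda c: linear_score(c["features"], c["distance"]), reverse=True)

candidates = [
    {"name": "a", "features": {"TopicMatch": 1.0, "StateMatchFeature": 1.0}, "distance": 12},
    {"name": "b", "features": {"TopicMatch": 1.0, "StateMatchFeature": 1.0, "CityMatch": 1.0}, "distance": 1},
    {"name": "c", "features": {"SecondDegreeFacebook": 1.0, "GenderUnmatch": 1.0}, "distance": 3},
]
print([c["name"] for c in rank(candidates)])  # ['b', 'a', 'c']
```

Even with made-up candidates, the weights tell a story: a demographic mismatch outweighs any single positive signal, and being close by matters far more than being in the same city.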

Making up features

● “Zipscore”
● All topics not created equal
● Facebook likes
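The slide doesn't say how topics were weighted. One standard way to capture "all topics not created equal" is inverse document frequency (IDF), so that sharing a rare topic counts for more than sharing one nearly every group has. A sketch of that swapped-in technique, with hypothetical names and data:

```python
import math

def topic_idf(group_topics: dict[str, set[str]]) -> dict[str, float]:
    """IDF weight per topic: log(number of groups / groups tagged with the topic)."""
    n_groups = len(group_topics)
    doc_freq: dict[str, int] = {}
    for topics in group_topics.values():
        for topic in topics:
            doc_freq[topic] = doc_freq.get(topic, 0) + 1
    return {topic: math.log(n_groups / df) for topic, df in doc_freq.items()}

def weighted_topic_match(member_topics: set[str], group: set[str], idf: dict[str, float]) -> float:
    """Sum of IDF weights over the topics the member and the group share."""
    return sum(idf.get(topic, 0.0) for topic in member_topics & group)

groups = {
    "g1": {"social", "hiking"},
    "g2": {"social", "python"},
    "g3": {"social", "books"},
    "g4": {"social", "hiking"},
}
idf = topic_idf(groups)
member = {"social", "python"}
for gid, topics in groups.items():
    # Only g2 scores above 0: "social" is on every group, so it carries no weight.
    print(gid, round(weighted_topic_match(member, topics, idf), 3))
```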

Real data is gross

● Preprocessing is critical!
  ○ Missing data
  ○ Outliers
  ○ Log scale
  ○ Bucketing
  ○ Selection/sampling (not introducing bias)
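A few of those steps in code: a hypothetical helper for a heavy-tailed count feature (say, RSVPs per member) that imputes missing values with the median, clips outliers at a cap, and log-scales, plus a bucketing helper. The median fill and the cap are illustrative choices, not the ones Meetup used. Selection bias can't be fixed by a helper like this; it comes from which rows get collected in the first place.

```python
import math
from bisect import bisect_right
from statistics import median
from typing import Optional

def preprocess_counts(values: list[Optional[float]], cap: float) -> list[float]:
    """Fill missing values, clip outliers, and move counts to a log scale."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # missing data: impute the median of observed values
    cleaned = []
    for v in values:
        v = fill if v is None else v
        v = min(v, cap)  # outliers: clip anything above the cap
        cleaned.append(math.log1p(v))  # log scale: log(1 + v) keeps zeros at 0
    return cleaned

def bucket_index(value: float, edges: list[float]) -> int:
    """Bucketing: index of the half-open bucket that `value` falls into."""
    return bisect_right(edges, value)

print(preprocess_counts([0.0, None, 3.0, 1000.0], cap=100.0))
print([bucket_index(d, [2, 5, 10]) for d in (1, 2, 7, 40)])  # [0, 1, 2, 3]
```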

Cleaning data

● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
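The slide just names the culprits. Assuming Schenectady and Beverly Hills refer to the ZIP codes 12345 and 90210, which people often type into forms as fake ZIP codes, a cleaning pass might look like this. The record type, the guest cap, and the hog threshold are hypothetical, and nothing here tries to handle the astronauts.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER_ZIPS = {"12345", "90210"}  # assumed: Schenectady, NY and Beverly Hills, CA
MAX_GUESTS = 5                          # hypothetical cap on guests per RSVP
MAX_RSVPS_PER_WEEK = 50                 # hypothetical "RSVP hog" threshold

@dataclass
class Rsvp:
    member_id: str
    zip_code: Optional[str]
    guests: int

def clean_rsvps(rsvps: list[Rsvp], rsvps_per_week: dict[str, int]) -> list[Rsvp]:
    """Return cleaned copies of the RSVPs worth keeping."""
    kept = []
    for r in rsvps:
        # RSVP hogs: drop members who RSVP to far more events than anyone could attend.
        if rsvps_per_week.get(r.member_id, 0) > MAX_RSVPS_PER_WEEK:
            continue
        # Placeholder ZIP codes: treat as missing rather than as a real location.
        zip_code = None if r.zip_code in PLACEHOLDER_ZIPS else r.zip_code
        # Fake RSVP boosts: cap the guest count.
        kept.append(Rsvp(r.member_id, zip_code, min(r.guests, MAX_GUESTS)))
    return kept

rsvps = [Rsvp("a", "12345", 0), Rsvp("b", "10001", 100), Rsvp("hog", "10001", 0)]
print(clean_rsvps(rsvps, {"a": 1, "b": 2, "hog": 200}))
# [Rsvp(member_id='a', zip_code=None, guests=0), Rsvp(member_id='b', zip_code='10001', guests=5)]
```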

TO THE FUTURE!

● Hadoop
● Clicks
● Impressions
● People-to-people recommendations?
● Recommending people to groups?

Thanks!

Smart people, come work with me.
http://www.meetup.com/jobs/

Special thanks:
● Chris Halpert
● Victor J Wang
