Likes and Locations - Adventure in Social Data Mining


Gene Chuang – Exec Dir of Social Eng, ATTi

Masahji Stewart – Founder, Synctree

Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA

Dedication

Background

Social Local Mobile Loco

Why Mine Social and Local Data?

• Signals to improve user experience

• Timely and “Placely”

• Engagement

• Provide value – save time, save money

• Opt In, Privacy

YP.com Infrastructure

• Ruby on Rails for Web, Login and API

• Solr/Lucene for Search

• Hadoop for Data pipeline

• Hive for Ad Hoc queries on Hadoop

• Ruby ETL scripts

OAuth 2

• OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens

• Think Valet Key
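A minimal Ruby sketch of the valet-key idea, assuming the standard OAuth 2 authorization-code flow: the app redirects the user to the provider, gets a code and then a token back, and sends only that token on later API calls. The endpoint, client id, redirect URI, and helper names below are hypothetical placeholders, not YP.com's or any provider's actual values.

```ruby
require 'uri'

# Hypothetical values for illustration only. A real provider documents its own
# endpoint, and the client id is issued when the application is registered.
AUTHORIZE_ENDPOINT = 'https://provider.example.com/oauth/authorize'
CLIENT_ID          = 'example-client-id'
REDIRECT_URI       = 'https://app.example.com/oauth/callback'

# Build the URL the user is sent to in order to grant access.
# The application never sees the user's password, only the code (and later
# the token) that the provider sends back to the redirect URI.
def authorization_url(scope:, state:)
  query = URI.encode_www_form(
    response_type: 'code',
    client_id:     CLIENT_ID,
    redirect_uri:  REDIRECT_URI,
    scope:         scope,
    state:         state
  )
  "#{AUTHORIZE_ENDPOINT}?#{query}"
end

# Header sent on API calls once a token has been obtained: the valet key.
def bearer_header(access_token)
  { 'Authorization' => "Bearer #{access_token}" }
end

puts authorization_url(scope: 'user_likes', state: 'abc123')
```

The `state` value is an opaque string the app checks on the way back to confirm the callback belongs to the request it started.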

YP.com Login/Registration

Login Layer


OAuth 2 Dance

Semi-Social Search

Social Mining - Extract

Extract scripts pull data out of a database (like Oracle), Hive, files, Facebook, or any other source and output JSON data to STDOUT.

For example, to get a count of the total users signed up by day:

$ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
{"day":"2011-02-14","count":891,"total":1328636}
{"day":"2011-02-15","count":1088,"total":1329724}
{"day":"2011-02-16","count":1016,"total":1330740}
{"day":"2011-02-17","count":1359,"total":1332099}
{"day":"2011-02-18","count":1143,"total":1333242}
{"day":"2011-02-19","count":660,"total":1333902}
{"day":"2011-02-20","count":597,"total":1334499}
{"day":"2011-02-21","count":874,"total":1335373}
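The slides do not show the internals of `sdm`, so the following is only a sketch of how an extract step could produce this shape of output: one JSON object per line, carrying the day, that day's count, and a running total. The method name and the in-memory hash standing in for a database query are assumptions. The counts come from the sample output above, and the starting total is derived from its first row.

```ruby
require 'json'

# Emit one JSON object per line: the day, that day's signups, and the running total.
# daily_counts maps ISO dates to signup counts; starting_total is the user count
# before the first day. Both stand in for a real database query.
def total_users_by_day(daily_counts, starting_total)
  total = starting_total
  daily_counts.sort.map do |day, count|
    total += count
    JSON.generate({ 'day' => day, 'count' => count, 'total' => total })
  end
end

counts = { '2011-02-14' => 891, '2011-02-15' => 1088, '2011-02-16' => 1016 }
puts total_users_by_day(counts, 1_327_745)
```

Writing one object per line keeps the output streamable, so a downstream step can start work before the extract finishes.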

Social Mining - Transform

Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT

For example, to add ypids to existing Facebook likes, then filter down to the location and ypid matching fields:

$ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
{"name":"Snuggle Bunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
{"name":"Associate Construction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
{"name":"PH Bistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
{"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
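A rough sketch of what a `filter-fields` style transform could look like, assuming it simply keeps the named top-level keys of each record and skips keys a record lacks (as the sample output suggests). The method names are hypothetical, and the `category` field in the sample record is invented to show a field being dropped.

```ruby
require 'json'
require 'stringio'

# Keep only the requested top-level fields; fields missing from a record are skipped.
def filter_fields(record, fields)
  record.select { |key, _value| fields.include?(key) }
end

# Read one JSON object per line from `input` and write the filtered object to `output`.
# Taking IO objects as arguments lets the same code run on STDIN/STDOUT or in a test.
def transform(input, output, fields)
  input.each_line do |line|
    next if line.strip.empty?
    output.puts JSON.generate(filter_fields(JSON.parse(line), fields))
  end
end

sample = StringIO.new(
  '{"name":"PH Bistro","category":"Restaurant","phone":"(937)-743-0069","id":"261032274490"}' + "\n"
)
transform(sample, $stdout, %w[name phone id])
```

In a pipeline the call would be `transform($stdin, $stdout, ARGV)`, which is what lets transforms chain with `|`.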

Social Mining - Load

Load scripts read data in from STDIN and load it into another system (an example of this would be a dashboard)

For example, loading total Facebook accounts by day into the web dashboard:

$ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
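How the dashboard loader works is not shown in the slides. As a sketch, assume the three arguments are a series name, the field for the x value, and the field for the y value, and that loading means appending (x, y) points to that series. The `load_series` method, the plain Hash standing in for the dashboard, and the sample totals are all made up for illustration.

```ruby
require 'json'
require 'stringio'

# Collect (x, y) points for one named series from JSON lines.
# `store` is a plain Hash standing in for the real dashboard backend.
def load_series(input, store, series, x_field, y_field)
  points = store[series] ||= []
  input.each_line do |line|
    next if line.strip.empty?
    record = JSON.parse(line)
    # fetch raises KeyError if a record lacks a field, so bad input fails loudly.
    points << [record.fetch(x_field), record.fetch(y_field)]
  end
  store
end

dashboard = {}
input = StringIO.new(<<~JSON)
  {"day":"2011-01-10","total":100}
  {"day":"2011-01-11","total":112}
JSON
load_series(input, dashboard, 'total_fb_accounts', 'day', 'total')
p dashboard
```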

Location Real-Time Fuzzy Matcher

FP0 (exact match)

• Append LISTING_NAME + ADDRESS + CITY + PHONE

• Tokenize, normalize, strip punctuation, and stem

• Append tokens

FP3 (fuzzy match)

• Append LISTING_NAME + ADDRESS + CITY + PHONE

• Tokenize, normalize, strip punctuation, and stem

• Remove tokens that are less than 2 chars long

• Remove upper-case short tokens (e.g., MD, CPA, DDS)

• Remove non-phone, short, numerical tokens

• Remove stopwords based on top 170 most occurring listing_name tokens

• Order tokens alphabetically

• Append tokens

Example:

Vijay K. Sammy CPA, LLC
153 Orchard St
Elmwood Park NJ - 07407
(201) 218-0710

FP Method   Value
FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
FP3         0710201218elmwoodorchardparksammistvijai
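The two fingerprints can be sketched in a few lines of Ruby. This is not the production matcher: the stemmer is a toy that only rewrites a trailing "y" to "i", the stopword list is a three-word placeholder for the real top-170 list, "short" is assumed to mean three characters or fewer, and the upper-case check runs before lowercasing. Under those assumptions it reproduces the two values from the example.

```ruby
SHORT_LEN = 3                      # assumed cutoff for "short" tokens
STOPWORDS = %w[llc inc the].freeze # placeholder for the top-170 listing_name tokens

# Split on anything that is not a letter or digit, which also strips punctuation.
def tokenize(text)
  text.scan(/[A-Za-z0-9]+/)
end

# Toy stemmer: only rewrites a trailing "y" to "i" when a vowel precedes it.
# A real implementation would use a full stemming algorithm.
def toy_stem(token)
  token.match?(/[aeiou].*y\z/) ? token.sub(/y\z/, 'i') : token
end

# FP0: every token, lowercased and stemmed, appended in original order.
def fp0(name, address, city, phone)
  tokenize([name, address, city, phone].join(' '))
    .map { |t| toy_stem(t.downcase) }
    .join
end

# FP3: drop noisy tokens, keep all phone tokens, then sort so word order stops mattering.
def fp3(name, address, city, phone)
  words = tokenize([name, address, city].join(' ')).reject do |t|
    t.length < 2 ||
      (t.length <= SHORT_LEN && t.match?(/\A[A-Z]+\z/)) || # MD, CPA, DDS
      (t.length <= SHORT_LEN && t.match?(/\A\d+\z/)) ||    # short non-phone numbers
      STOPWORDS.include?(t.downcase)
  end
  (words.map { |t| toy_stem(t.downcase) } + tokenize(phone)).sort.join
end

args = ['Vijay K. Sammy CPA, LLC', '153 Orchard St', 'Elmwood Park', '(201) 218-0710']
puts fp0(*args) # vijaiksammicpallc153orchardstelmwoodpark2012180710
puts fp3(*args) # 0710201218elmwoodorchardparksammistvijai
```

Because FP3 sorts its tokens and discards titles and street numbers, a listing entered as "Sammy Vijay" with the same street, city, and phone yields the same fingerprint, while FP0 would not match.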

Social Data

• Valid Facebook Access Tokens: 14K

• Total Unique Likes: 300K

• % Likes with Locations and/or Phones: 19%

• % Likes mapped to YPID: 38%

• Total Check-Ins: 530

Social Mining Mother Lode

• Social Search

• Local Recommendation Engine

• Discovery Wall

• Top 10 List

• Social e-Commerce

• Online Presence Management – Social CRM

Questions?

• genechuang@gmail.com

• http://www.twitter.com/genechuang

• http://www.quora.com/Gene-Chuang

• http://www.linkedin.com/in/genechuang
