PPPeople PPPowered

Pppeople 2020

DESCRIPTION

An attempt to create an "if you like this person, you may want to know about these people" interface. http://pppeople.collabtools.org.uk/

Page 1: Pppeople 2020

PPPeople PPPowered

Page 2: Pppeople 2020

If you like this person you may also like
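
One way such a suggestion could be computed is sketched below. The slides do not say how PPPeople ranks people, so this is only an illustration under stated assumptions: each person is assumed to carry a set of topic tags, and similarity is measured with the Jaccard index. The names `jaccard`, `also_like`, and the `PEOPLE` data are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def also_like(person, tags_by_person, limit=3):
    """Rank other people by tag-set similarity to `person`."""
    mine = tags_by_person[person]
    scored = [
        (jaccard(mine, tags), other)
        for other, tags in tags_by_person.items()
        if other != person
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop people with no tags in common at all.
    return [other for score, other in scored[:limit] if score > 0]


# Hypothetical data, invented for illustration only.
PEOPLE = {
    "Alice": {"semantic web", "graph databases", "python"},
    "Bob": {"graph databases", "python", "visualisation"},
    "Carol": {"medieval history"},
    "Dave": {"python"},
}
```

Calling `also_like("Alice", PEOPLE)` ranks Bob ahead of Dave and leaves Carol out, because she shares no tags with Alice.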

Page 3: Pppeople 2020

The Cunning “Plan”

• Crawler - to get the data

• Semantic Engine - to understand the data

• Database - to save the data

• Visualisation - to show the data

• Social Media Account Details - to extend the data

• Social Integration - to lure people into the data

Page 4: Pppeople 2020

Crawlers: AKA spiders, bots, scrapers, data mining

80legs can crawl over 5,000,000 web pages in 1 hour

Yahoo BOSS

http://www.ibm.com/developerworks/linux/library/l-spider/?ca=dgr-lnxw01WebSpiderLinux

Extractiv

ScraperWiki

But Yahoo already has!!!

Python crawlers

• Mechanize

• Harvestman

• Scrapy

• Spynner

99 on Google Code!

Page 5: Pppeople 2020

Database

http://neo4j.org/

The largest production cluster has over 100 TB of data in over 150 machines.

Page 6: Pppeople 2020

Semantic Engine

http://media.jesselegg.com/djangocalais/

Page 7: Pppeople 2020

Social Media Account Details

Page 8: Pppeople 2020

Visualisation

http://www.twitt3d.com

http://www.neuroproductions.be/twitter_friends_network_browser/

Neo4j + Gephi

http://thejit.org/
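
For the Neo4j + Gephi pairing, one simple hand-off is a plain edge list. The sketch below assumes the relationships have already been pulled out of the graph store as `(source, target, weight)` tuples, and assumes Gephi's spreadsheet import is used, which reads `Source`, `Target`, and `Weight` columns. The function name `edges_to_gephi_csv` is hypothetical and the deck does not say how the export was actually done.

```python
import csv
import io


def edges_to_gephi_csv(edges):
    """Return CSV text with a Source/Target/Weight header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Writing the returned text to a `.csv` file gives something that can be loaded as an edges table; the csv module takes care of quoting names that contain commas.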

Page 9: Pppeople 2020

Social Media Integration

Page 10: Pppeople 2020

The Result

http://pppeople.collabtools.org.uk

Page 11: Pppeople 2020

Lessons Learned

You’re on your own

Page 12: Pppeople 2020

“In theory”

Neo4j

Gephi

Treebeard

Freebase

Wikipedia

Twitter

Delicious

Betsy

Harvestman

Bug

Missing API

Page 13: Pppeople 2020

Data Cleansing

• People with one name

• Telephone numbers

• United Kingdom

• Lecturer
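
A minimal sketch of a filter for entries like these, assuming the bullets above are examples of strings that were wrongly extracted as people. The stop-list, the phone pattern, and the name `looks_like_person` are all assumptions made for illustration, not the project's actual rules.

```python
import re

# Assumed stop-list of non-person strings; both entries come from the slide.
NOT_PEOPLE = {"united kingdom", "lecturer"}

# Loose pattern for strings made only of digits and phone punctuation.
PHONE_RE = re.compile(r"^[\d\s()+\-.]{7,}$")


def looks_like_person(name):
    """Return True if `name` plausibly refers to a person's full name."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned.lower() in NOT_PEOPLE:
        return False
    if PHONE_RE.match(cleaned):
        return False
    # A single token is treated as too ambiguous to be a full name.
    return len(cleaned.split()) >= 2
```

With these rules, `looks_like_person("Ada Lovelace")` passes, while `"Madonna"`, `"United Kingdom"`, `"Lecturer"`, and `"+44 (0)20 7946 0000"` are rejected. Rejecting every one-word name is a blunt choice and would also drop genuine mononyms.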

Data Scrying &

Page 14: Pppeople 2020

Not working with people slows you down

Working with people slows you down

“It’s just one big matrix”

Page 15: Pppeople 2020

Bad Semantics

Jargon Buster

SIPIGWSGDPS

V/C/011

Zero Point Energy

Codex Alimentarius

Dept. Buster

Page 16: Pppeople 2020

Browse vs Search

Page 17: Pppeople 2020

No data creation

Cheap tricks: Pictures and Google

Page 18: Pppeople 2020

What Brings People Back?

Page 19: Pppeople 2020

The “jiggle” is everything

Page 20: Pppeople 2020

Conclusion

• I’m onto something