Social Networks, CompSci 49s, 11/16/2006 1
Social Networks as a Foundation for Computer Science
Jeffrey Forbes
http://www.cs.duke.edu/csed/socialnet
A Future for Computer Science?
Is there a Science of Networks?
• From Erdos numbers to random graphs to Internet
• From FOAF to Selfish Routing: apparent similarities between many human and technological systems & organization
• Modeling, simulation, and hypotheses
• Compelling concepts
    • Metaphor of viral spread
    • Properties of connectivity have qualitative and quantitative effects
Computer Science?
• From the facebook to tomogravity
• How do we model networks, measure them, and reason about them?
• What mathematics is necessary?
• Will the real world intrude?
Physical Networks
The Internet
• Vertices: Routers
• Edges: Physical connections
Another layer of abstraction
• Vertices: Autonomous systems
• Edges: peering agreements
• Both a physical and business network
Other examples
• US Power Grid
• Interdependence and August 2003 blackout
What does the Internet look like?
US Power Grid
Business & Economic Networks
Example: eBay bidding
• vertices: eBay users
• links: represent bidder-seller or buyer-seller
• fraud detection: bidding rings
Example: corporate boards
• vertices: corporations
• links: between companies that share a board member
Example: corporate partnerships
• vertices: corporations
• links: represent formal joint ventures
Example: goods exchange networks
• vertices: buyers and sellers of commodities
• links: represent “permissible” transactions
Content Networks
Example: Document similarity
• Vertices: documents on web
• Edges: Weights defined by similarity
• See TouchGraph GoogleBrowser
Conceptual network: thesaurus
• Vertices: words
• Edges: synonym relationships
Enron
Social networks
Example: Acquaintanceship networks
• vertices: people in the world
• links: have met in person and know last names
• hard to measure
Example: scientific collaboration
• vertices: math and computer science researchers
• links: between coauthors on a published paper
• Erdos numbers: distance to Paul Erdos
• Erdos was definitely a hub or connector; had 507 coauthors
• How do we navigate in such networks?
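An Erdos number is a shortest-path distance in the coauthorship graph, so one way to compute it is breadth-first search from Erdos. The following is a minimal sketch under stated assumptions: the graph is a small hypothetical adjacency dictionary, and every author name other than "Erdos" is a placeholder, not real collaboration data.

```python
from collections import deque

def collaboration_distances(coauthors, source):
    """Breadth-first search: number of coauthorship hops from `source`
    to every author reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in coauthors.get(author, ()):
            if other not in distance:
                distance[other] = distance[author] + 1
                queue.append(other)
    return distance

# Hypothetical toy graph (placeholder names, not real data).
coauthors = {
    "Erdos": ["A", "B"],
    "A": ["Erdos", "C"],
    "B": ["Erdos"],
    "C": ["A", "D"],
    "D": ["C"],
    "E": [],  # never collaborated with anyone in this graph
}

erdos_numbers = collaboration_distances(coauthors, "Erdos")
print(erdos_numbers)
```

Authors who cannot be reached from Erdos (here "E") simply do not appear in the result, which corresponds to an undefined (infinite) Erdos number.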
Acquaintanceship & more
Network Models (Barabasi)
Differences between Internet, Kazaa, Chord
• Building, modeling, predicting
Static networks, Dynamic networks
• Modeling and simulation
Random and Scale-free
• Implications?
Structure and Evolution
• Modeling via Touchgraph
Web-based social networks
http://trust.mindswap.org
Myspace        73,000,000
Passion.com    23,000,000
Friendster     21,000,000
Black Planet   17,000,000
Facebook        8,000,000
Who’s using these, what are they doing, how often are they doing it, why are they doing it?
Golbeck’s Criteria
Accessible over the web via a browser
Users explicitly state relationships
• Not mined or inferred
Relationships visible and browsable by others
• Reasons?
Support for users to make connections
• Simple HTML pages don’t suffice
CSE 112, Networked Life (UPenn)
Find the person in Facebook with the most friends
• Document your process
Find the person with the fewest friends
• What does this mean?
Search for profiles with some phrase that yields 30-100 matches
• Graph degrees/friends: what is the distribution?
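The degree question above can be sketched in a few lines: count each person's friends, then count how many people share each friend count. This is an illustrative sketch only; the friendship pairs are made-up placeholders, and it assumes friendships are symmetric and listed once each.

```python
from collections import Counter

def degree_distribution(friend_pairs):
    """Return (degree per person, number of people having each degree)."""
    degree = Counter()
    for a, b in friend_pairs:
        degree[a] += 1
        degree[b] += 1
    return degree, Counter(degree.values())

# Hypothetical friendships, one tuple per undirected friendship.
friend_pairs = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]

degree, histogram = degree_distribution(friend_pairs)
most_connected = max(degree, key=degree.get)

print(most_connected, dict(degree))
print(dict(histogram))  # maps a friend count to how many people have it
```

People with no friends at all never appear in an edge list like this, which is one reason "the person with the fewest friends" is harder to pin down than the person with the most.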
CompSci 1: Overview CS0
Audioscrobbler and last.fm
• Collaborative filtering
• What is a neighbor?
• What is the network?
What can we do with real data?
How do we find a graph’s diameter?
• This is the longest of the shortest paths between any pair of vertices
• Can we do this in big graphs?
What is the center of a graph?
• From rumor mills to DDOS attacks
• How is this related to diameter?
Demo GUESS (as augmented at Duke)
• IM data, Audioscrobbler data
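One straightforward way to get both the diameter and the center is to run a breadth-first search from every vertex and record each vertex's eccentricity (its distance to the farthest vertex). The sketch below assumes an unweighted, undirected, connected graph stored as a hypothetical adjacency dictionary; it is not the method used by GUESS.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest hop counts from `source` to every reachable vertex."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance

def eccentricities(graph):
    """Eccentricity of each vertex; assumes the graph is connected."""
    return {v: max(bfs_distances(graph, v).values()) for v in graph}

# Hypothetical path graph: a - b - c - d - e
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

ecc = eccentricities(graph)
diameter = max(ecc.values())            # largest eccentricity
radius = min(ecc.values())              # smallest eccentricity
center = [v for v in graph if ecc[v] == radius]

print(diameter, radius, center)
```

This runs one search per vertex, so the cost grows roughly with (number of vertices) × (vertices + edges). That is why the exact diameter is expensive on big graphs, and why sampling a subset of start vertices is a common shortcut.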
My recommendations at Amazon
And again…
How do search engines work?
Hotbot, Yahoo, Alta Vista, Excite, …
• Inverted index with buckets of words
Insight: use matrix to represent how many times a term appears in one page
• Columns: pages & Rows: terms
• Problems?
Return pages that have the keyword - in what order?
• Early solution: return those pages with most occurrences of term first
• Problems? Solution?
    • Use structure of the web to do the work for us
    • What did Google do?
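The inverted index and the "most occurrences first" ranking can be sketched as a sparse term-by-page count table. This is a minimal illustration, assuming whitespace tokenization and three made-up pages; it is not how any of the named search engines was actually implemented.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Inverted index: term -> Counter of {page: occurrences}.
    Equivalent to a sparse matrix with terms as rows and pages as columns."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] += 1
    return index

def search(index, term):
    """Early-style ranking: pages with the most occurrences come first
    (ties broken alphabetically by page name)."""
    counts = index.get(term.lower(), Counter())
    return sorted(counts, key=lambda url: (-counts[url], url))

# Hypothetical pages.
pages = {
    "p1": "social networks and social graphs",
    "p2": "computer networks",
    "p3": "social social social spam",
}

index = build_index(pages)
results = search(index, "social")
print(results)
```

Here the keyword-stuffed page "p3" outranks the more informative "p1", which illustrates the problem with pure term counts and motivates using link structure instead.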
Google’s PageRank
[Figure: diagram of example web sites (xxx, yyyy, a b c d e f g, pdq pdq ..)]
Inlinks are “good” (recommendations)
Inlinks from a “good” site are better than inlinks from a “bad” site
but inlinks from sites with many outlinks are not as “good”...
“Good” and “bad” are relative.
Google’s PageRank (Brin & Page, http://www-db.stanford.edu/~backrub/google.html)
[Figure: diagram of example web sites (xxx, yyyy, a b c d e f g, pdq pdq ..)]
Imagine a “pagehopper” that always either
• follows a random link, or
• jumps to a random page
PageRank ranks pages by the amount of time the pagehopper spends on a page
• or, if there were many pagehoppers, PageRank is the expected “crowd size”
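The pagehopper picture can be turned into a small power-iteration sketch: repeatedly redistribute each page's current share of the "crowd" along its outlinks, with some probability of jumping to a random page instead. Assumptions to note: the link graph is a made-up example, the damping value 0.85 is the commonly cited choice rather than anything stated on the slide, and pages with no outlinks are treated as always jumping to a random page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate the fraction of time a random pagehopper spends on each page.

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share: every page gets an equal slice.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Follow a random link: split this page's rank among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dead end (assumption): the hopper jumps to a random page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "c" collects the most inlinks and ends up with the largest share, while "d", which nobody links to, keeps only the random-jump share.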
Collaborative Filtering
Goal: predict the utility of an item to a particular user based on a database of user profiles
• User profiles contain user preference information
• Preference may be explicit or implicit
    • Explicit means that a user votes explicitly on some scale
    • Implicit means that the system interprets user behavior or selections to impute a vote
Problems
• Missing data: voting is neither complete nor uniform
• Preferences may change over time
• Interface issues
Memory-based methods
Store all user votes and generalize from them to predict vote for new item
Predicted vote of active user a for item j:
where there are n users with non-zero weights, v_{i,j} is the vote of user i on item j, κ is a normalizing factor, and w() is a weighting function between users
• Distance metric
• Correlation or similarity
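The formula itself did not survive on this slide. A standard form that matches the definitions given (the one in Breese, Heckerman and Kadie's 1998 survey, assumed here rather than confirmed by the slide) is p(a,j) = mean(v_a) + κ · Σ_i w(a,i) · (v_{i,j} − mean(v_i)), with κ chosen so the absolute weights sum to one. The sketch below implements that assumed form; the votes and the fixed weight table are hypothetical.

```python
def mean_vote(votes):
    """Average of one user's votes (votes is a dict of item -> vote)."""
    return sum(votes.values()) / len(votes)

def predict_vote(all_votes, weight, active, item):
    """Assumed memory-based prediction: the active user's mean vote plus a
    weighted average of other users' deviations from their own means."""
    mean_active = mean_vote(all_votes[active])
    weighted_sum = 0.0
    total_weight = 0.0
    for user, votes in all_votes.items():
        if user == active or item not in votes:
            continue
        w = weight(active, user)
        if w == 0:
            continue  # only users with non-zero weight contribute
        weighted_sum += w * (votes[item] - mean_vote(votes))
        total_weight += abs(w)
    if total_weight == 0:
        return mean_active  # nobody comparable has voted on this item
    kappa = 1.0 / total_weight  # normalizing factor
    return mean_active + kappa * weighted_sum

# Hypothetical votes and weights.
all_votes = {
    "a": {"x": 4, "y": 2},
    "b": {"x": 5, "z": 3},
    "c": {"y": 1, "z": 5},
}
fixed_weights = {("a", "b"): 1.0, ("a", "c"): -0.5}

def weight(u, v):
    return fixed_weights.get((u, v), 0.0)

predicted = predict_vote(all_votes, weight, "a", "z")
print(round(predicted, 3))
```

Any similarity measure, such as cosine or Pearson correlation, can be plugged in as `weight`; a negative weight means that user's above-average votes pull the prediction down.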
Computing weights - Cosine Correlation
In information retrieval, documents are represented as vectors of word frequencies
For CF, we treat preferences as vectors
• Documents -> users
• Word frequencies -> votes
Similarity is then the cosine between two vectors
• Dot product of the vectors divided by the product of their magnitudes
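As a small illustration of the cosine weight, each user's votes can be stored as a dictionary and treated as a sparse vector. The sketch assumes that an item a user has not voted on counts as zero, and the votes shown are invented.

```python
import math

def cosine_weight(votes_a, votes_i):
    """Cosine of the angle between two users' vote vectors.
    Items a user has not voted on are treated as 0 (an assumption)."""
    shared = set(votes_a) & set(votes_i)
    dot = sum(votes_a[j] * votes_i[j] for j in shared)
    norm_a = math.sqrt(sum(v * v for v in votes_a.values()))
    norm_i = math.sqrt(sum(v * v for v in votes_i.values()))
    if norm_a == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_a * norm_i)

# Hypothetical votes.
ann = {"x": 3, "y": 4}
bob = {"x": 4, "y": 3}
cat = {"z": 5}

print(cosine_weight(ann, ann))  # identical tastes
print(cosine_weight(ann, bob))  # similar tastes
print(cosine_weight(ann, cat))  # no items in common
```

Because votes are non-negative here, the weight falls between 0 (nothing in common) and 1 (same direction).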
Computing weights - Pearson & Spearman correlation
Pearson Correlation
• First used for CF in GroupLens project [Resnick et al., 1994]
• Relatively efficient to calculate incrementally
Spearman Correlation
• same as Pearson but calculations are done on rank of v_{a,j} and v_{i,j}
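The relationship between the two correlations can be sketched directly: Spearman is Pearson applied to ranks. The code below assumes the two lists hold two users' votes on the same co-rated items, in the same order, and gives tied votes the average of their rank positions; the numbers are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of votes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who votes the same on everything carries no signal
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = average
        pos = end + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical votes on five co-rated items; the last vote of user i is an outlier.
votes_a = [1, 2, 3, 4, 5]
votes_i = [1, 2, 3, 4, 50]

print(round(pearson(votes_a, votes_i), 3))
print(round(spearman(votes_a, votes_i), 3))
```

The two users order the items identically, so the rank-based measure reports perfect agreement while the outlier drags the Pearson value down.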
Model-based methods
Really what we want is the expected value of the user’s vote
Cluster Models
• Users belong to certain classes in C with common tastes
• Naive Bayes Formulation
    • Calculate Pr(v_i|C=c) from training set
Bayesian Network Models
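For reference, the two ideas on this slide can be written out as follows. This is the standard textbook form, assumed here rather than taken from the slide: votes are assumed to lie on an integer scale 0..m, I_a denotes the set of items the active user a has already voted on, and votes are assumed conditionally independent given the class.

```latex
% Expected value of the active user's vote on item j
E[v_{a,j}] = \sum_{k=0}^{m} k \,\Pr\bigl(v_{a,j} = k \mid v_{a,l},\ l \in I_a\bigr)

% Naive Bayes cluster model: votes are independent given the class C
\Pr(C = c, v_1, \ldots, v_n) = \Pr(C = c) \prod_{i=1}^{n} \Pr(v_i \mid C = c)
```

Under these assumptions, the class prior Pr(C=c) and the conditionals Pr(v_i|C=c) are the quantities estimated from the training set.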