Invited Talk at WebSci workshop on Building Web Observatories
Steffen Staab, staab@uni-koblenz.de
WeST
Vote for free Web Science MOOC!
Want more free Web Science education on the Web?
Vote for our course at https://moocfellowship.org/ now!
Web Science & Technologies
University of Koblenz ▪ Landau, Germany
The Challenges of Building Interoperable Web Observatories
http://wow.west.webobservatory.org/
Steffen Staab
What to observe?
[Figure. Labels: Produce, Consume; Cognition, Emotion, Behavior, Socialisation, Knowledge; Apps, Protocols, Data & Information, Governance; WWW. Captions: observable micro-interactions in the Web; observable macro-effects in the Web.]
Why to observe?
Understanding, Collecting, Describing, Analyzing, Modeling, Predicting, Repeating!
What to observe?
[Figure: same labels as on the earlier "What to observe?" slide, with two additions: Web Crawling, Usage Logging.]
Challenges – Data Collection Issues
Legal and/or Ethical
  Crawling
    • May be disallowed by provider
  Usage logging
    • Privacy of individuals
Even if it is allowed....
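One machine-readable place where a provider can state that crawling is disallowed is the site's robots.txt file. The sketch below checks a URL against such a file using Python's standard `urllib.robotparser`. The robots.txt content, the user agent name `observatory-bot`, and the example.org URLs are made up for illustration. Passing this check says nothing about the legal or ethical questions raised on this slide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page", "/private/page"):
        url = "http://example.org" + path
        print(url, may_crawl(ROBOTS_TXT, "observatory-bot", url))
```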
Challenges – Data Collection Issues
Crawling
  What does it mean to crawl a heavily interactive site?
  Incomplete data
    • Unreachability
    • Time-outs
Challenges – Data Collection Issues
Crawling
  What does it mean to crawl a heavily interactive site?
  Incomplete data
  Where to start?
    • We cannot observe everything!
      – Even just for data size!
      – Which starting points appear to be most fruitful?
Challenges – Data Collection Issues
Crawling
  What does it mean to crawl a heavily interactive site?
  Incomplete data
  Where to start?
  Where to stop?
    • Each crawl is a view
      – Twitter
        » Tweet
        » URL
        » Web Page
        » Subweb
        » Followers
        » Followers' Followers
        » ...
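The "where to start, where to stop" question can be sketched as a breadth-first traversal with an explicit seed and depth bound: each choice of seed and bound yields a different view of the same data. The link graph below is a hypothetical toy structure loosely following the Twitter chain above. It is not real Twitter data or a real API.

```python
from collections import deque

# Hypothetical toy link graph (assumption, for illustration only).
LINKS = {
    "tweet": ["url", "followers"],
    "url": ["web_page"],
    "web_page": ["subweb"],
    "followers": ["followers_followers"],
}


def crawl_view(seed, max_depth):
    """Breadth-first traversal from seed; returns {node: depth} up to max_depth hops."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # stopping rule: do not expand nodes at the depth bound
        for nxt in LINKS.get(node, []):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(crawl_view("tweet", 1))  # a shallow view
    print(crawl_view("tweet", 3))  # a deeper view from the same seed
```

Two crawls from the same seed with different bounds return different node sets, which is one concrete sense in which each crawl is only a view.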
Challenges – Data Collection Issues
Crawling
  What does it mean to crawl a heavily interactive site?
  Incomplete data
  Where to start?
  Where to stop?
  Synchronous vs asynchronous
    • Strictly speaking: only asynchronous crawling possible
      – But in [Dellschaft&Staab] we targeted the construction of models for streams of tags
Challenges – Data Publishing Issues
Legal and/or Ethical
  Example issues
    • AOL query log
    • Netflix challenge
    • Delicious: http://www.tagora-project.eu/data/
    • Twitter
      – Collecting, but no sharing
        » SocialSensor project
Challenges – Data Publishing Issues
Technical/Modelling issues
  Generic format, e.g. RDF
  Format ready for digestion by specific software, e.g. for Matlab processing
  Openness to other data
    • E.g. references to DBpedia/Wikipedia
  Accuracy of publishing
    • http://me.org showed "..."
    • http://me.org showed "..."@2013-05-01:0900CEST
    • http://me.org showed "..."@2013-05-01:0900CEST called from IP 193.99.144.85 using browser...version...history...
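The three accuracy levels above can be read as progressively richer observation records. The following is a minimal sketch of that idea. The class `Observation`, the field names, and the `LEVELS` table are illustrative assumptions, not a format proposed in the talk. The elided content and browser details stay elided, and the timestamp is written in ISO 8601 form (CEST is UTC+2).

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Observation:
    source: str                         # which page was observed
    content: str                        # what the page showed
    observed_at: Optional[str] = None   # when it was observed
    client_ip: Optional[str] = None     # from where it was requested
    browser: Optional[str] = None       # with which client it was requested


# Hypothetical publishing levels, from least to most detailed.
LEVELS = {
    1: ("source", "content"),
    2: ("source", "content", "observed_at"),
    3: ("source", "content", "observed_at", "client_ip", "browser"),
}

EXAMPLE = Observation(
    source="http://me.org",
    content="...",                      # elided in the slide, left elided here
    observed_at="2013-05-01T09:00:00+02:00",
    client_ip="193.99.144.85",
    browser="...",                      # elided in the slide, left elided here
)


def publish(obs, level):
    """Return only the fields that belong to the chosen accuracy level."""
    record = asdict(obs)
    return {key: record[key] for key in LEVELS[level]}


if __name__ == "__main__":
    for level in sorted(LEVELS):
        print(level, publish(EXAMPLE, level))
```

The higher levels make an observation easier to reproduce, but they also carry the personal data (IP address, browser history) that the legal and ethical slides warn about.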
Sharing Software
Software
  For crawling or usage logging
  Rather than sharing the data, share the code for observing
    • Example: code for crawling Twitter in a certain way
Issues
  Limited repeatability
  Disturbance liability („Störerhaftung“), at least in DE
    • If you provide source code for crawling, e.g., Facebook, even if you do not crawl FB, FB can sue you
Why to observe?
Understanding, Collecting, Describing, Analyzing, Modeling, Predicting, Repeating!
In spite of all this....
WEB OBSERVATORY WIKI
Ongoing discussion
What to do about sharing Web Science datasets?
Let's do simple things first
  • Collect pointers!
  • Publish whatever you can publish; others will reuse it
  • Make it more archival
In a way that makes it easy to expand to handle more complex issues
  • Semantic Wiki!
Web Observatory Wiki
• Main Goals:
  • Registry of Web Science datasets
  • Compiled by Web Observatory participants: YOU!
• Minor Goals:
  • Semantically store all information about datasets
  • Make it
    • Explorable
    • Queryable
    • Reusable
Quick Facts - 1
Semantic MediaWiki + Forms Extension
URL: http://wow.west.webobservatory.org/
Main classes:          Examples:
  Dataset_Repository     KONECT
  Dataset                Slashdot Zoo
  Organization           WeST
Quick Facts - 2
Semantic MediaWiki + Forms Extension
URL: http://wow.west.webobservatory.org/
Class hierarchy example:   Attributes:
  Dataset                    Dublin Core + size, license, URL, …
    Network                  Node count
      Social Network         …
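Semantic MediaWiki exposes an `ask` API module, so a registry like this one could in principle be queried programmatically. The sketch below only builds such a request URL and sends nothing. The `api.php` path under the wiki URL, the category name `Network`, and the property name `Node Count` are assumptions derived from the class and attribute names on this slide, not confirmed names in the wiki.

```python
from urllib.parse import urlencode

# Assumed endpoint: the wiki URL from the slide plus the usual MediaWiki api.php path.
WIKI_API = "http://wow.west.webobservatory.org/api.php"


def build_ask_url(category, printouts, limit=10):
    """Build a Semantic MediaWiki 'ask' API URL for pages in a category."""
    conditions = f"[[Category:{category}]]"
    prints = "".join(f"|?{name}" for name in printouts)
    query = f"{conditions}{prints}|limit={limit}"
    params = {"action": "ask", "query": query, "format": "json"}
    return WIKI_API + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query: networks together with their node counts.
    print(build_ask_url("Network", ["Node Count"], limit=5))
```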
Semantic Exploration by Views
Semantic Forms: Providing Data
Schema (Excerpt)
[Figure: RDF graph. Instances: ko:konect, ko:slashdot-zoo, ko:twitter. Classes: wow:dataset-repository, wow:dataset, wow:network, wow:social-network. Properties: wow:contains, wow:size, wow:network-volume. Edge labels: rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range. Literal values: 1944, 120000000.]
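To illustrate how such a schema supports queries, here is a minimal sketch with triples held as plain Python tuples. The exact edges of the figure are not recoverable from this transcript, so the triples below are an assumed reading based on the class hierarchy (Dataset, Network, Social Network) and the examples (KONECT, Slashdot Zoo) on the Quick Facts slides. The literal values and ko:twitter are left out because their attachment is unclear.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Assumed triples (an illustrative reading, not a copy of the figure).
TRIPLES = {
    ("wow:network", SUBCLASS, "wow:dataset"),
    ("wow:social-network", SUBCLASS, "wow:network"),
    ("ko:konect", RDF_TYPE, "wow:dataset-repository"),
    ("ko:slashdot-zoo", RDF_TYPE, "wow:social-network"),
    ("ko:konect", "wow:contains", "ko:slashdot-zoo"),
}


def superclasses(cls):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for subj, pred, obj in TRIPLES:
            if subj == current and pred == SUBCLASS and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def instances_of(cls):
    """Return resources typed as cls or as any subclass of cls."""
    result = set()
    for subj, pred, obj in TRIPLES:
        if pred == RDF_TYPE and (obj == cls or cls in superclasses(obj)):
            result.add(subj)
    return result


if __name__ == "__main__":
    print(instances_of("wow:dataset"))
```

Under these assumed triples, asking for all datasets also returns the Slashdot Zoo social network, because the subclass chain is followed transitively.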
Discussion & Q&A
Access to wiki
  Current model:
    • Edits allowed by IPs and users
    • Everyone can be blocked, including IPs
Contribute:
  Content
  Modeling requirements
  ...
Let us know!
Sanity Check
Understanding
Collecting (to some extent: commodity service)
Describing (WOW)
Analyzing
Modeling
Predicting
Repeating!
So far ad hoc, needs much more:
• Experience
• Guidelines
• Processing workflow
• Executable code shares (on big data!)
• ...
What else do we need?
Vote at: https://moocfellowship.org/