Zinayida Petrushyna, Ralf Klamma, RWTH Aachen University. Workshop “Digital Social Networks”, Munich, September 12, 2008
Lehrstuhl Informatik V (Informationssysteme)
Prof. Dr. M. Jarke
I5-RK-0808-1
CUELC
Zinayida Petrushyna, Ralf Klamma
RWTH Aachen University
Workshop “Digital social networks”, Munich
September 12, 2008
The Troll under the Bridge: Data Management for Huge Web Science Mediabases
Agenda
Motivation & Problem definition
Data Management for Web Science
– Crawling: Watchers
– Analysis: Patterns
– Visualization: Graphs
Conclusion
Outlook
Data Management issues in Web Science
Interoperable formats
– XML based – Wikis, RSS Feeds, Microformats
– SQL based – Deep Web
– Text based – Websites, Forums
Non-continuous analysis
– Crawling vs. Dumps
– Special purpose vs. General purpose
Aggregation level is not possible to achieve
– Data warehouses
– Theoretical considerations of agency – Actor network theory
Data Model for the Web 2.0
Latour: On Recalling ANT, 1999
Mediabase
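A Mediabase is a six-tuple graph $M = (A, R, \cdot, \cdot, \cdot, L)$.
The symbols of the three mappings were lost in extraction; their recoverable signatures are $A \to L$, $R \to [0, 1]$ and $R \to L$.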
Actors in the Mediabase
A: Medium, Artefact, Process, Agent, Network
Medium: Mailing lists, Newsletter, Newsgroup, Feed, Web-site, Blog, Podcast, Chat room, Wiki, Forum, Social bookmarking site, Folksonomy
Artefact: Message, E-mail, Index, Comment, RSS Entry, Transaction, Host, Feedback, Conversation, Burst, Blog entry, Thread, Executions, Tag, Trackback, Review, URL, Rating, Multimedia, Ranking, Reference
Process: Acquisition, Search, Monitoring, Retrieval, Transcription, Addressing
Agent: Administrator, Member, Lurker, Reviewer, Dead, Answering person, Questioner, Troll, Spammer, Conversationalist, Expert
Crawling Technologies
Mix of dumps (Wikis) and special purpose crawlers:
W: Media, Artefact
MW: Mailing list, Message, Thread, Index
I: Media, Artefact, Process, Agent
G: Media, Artefact, Process, Agent, Network
Trolls under the Bridge
What is a disturbance, e.g. a troll?
– Sensing an incompatibility between espoused theories and theories-in-use
Disturbances are starting points of learning processes
– Disturbances disturb, prevent … but they also create reflection
Disturbances are hard to detect or to forecast
Complex Troll Pattern in Basic Notation
[Formula slide: nine numbered declarations; operators and subscripts were lost in extraction. Recoverable terms per statement:]
1) Thread: Artefact
2) thread: Thread
3) Medium, Artefact: v(th, …) stored on
4) Message: Artefact
5) Process: P
6) Agent: Ag
7) Agent, Process, Artefact: Ag Member, P Authoring, v(th, …) performs
8) Agent, Process, Artefact: Ag Member, P Authoring, v(msg, …) performs
9) Thread, Process, Artefact: P, v(…) performs, v(…) performs, v(…) postedIn
Complex Troll Pattern in Basic Notation
[Formula slide, continued; operators and subscripts were lost in extraction. Recoverable terms:]
10) troll: Ag, expressed over the relations creator, author and messageThread and the threshold minPosts
Pattern Language
Variables
– simple variables (troll, thread), properties (thread.author) and set variables (v1,…,vn)
Operations
– Arithmetic (+, -, *, /)
– Aggregate (SUM, COUNT, AVERAGE)
– Logical (&, |, ~, FORALL and EXISTS)
– Comparison (=, !=, >, <)
Rules for variable binding
– Simple variables – pattern parameters, actors or set variables
– Properties – actor properties or relations
– Set variables – actors
Interpreted by a finite state automaton
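As a rough illustration of the operation classes listed above, the following Python sketch maps EXISTS, FORALL and COUNT onto built-in functions over a toy set of actors. The dictionary layout and the helper names (`exists`, `forall`, `count`) are assumptions made for this example and are not part of PALADIN; the slides only state that the pattern language is interpreted by a finite state automaton.

```python
from typing import Callable, Iterable

# Hypothetical actor representation: property name -> value.
Actor = dict


def exists(actors: Iterable[Actor], cond: Callable[[Actor], bool]) -> bool:
    """EXISTS [x | cond(x)]: at least one actor satisfies the condition."""
    return any(cond(a) for a in actors)


def forall(actors: Iterable[Actor], cond: Callable[[Actor], bool]) -> bool:
    """FORALL [x | cond(x)]: every actor satisfies the condition."""
    return all(cond(a) for a in actors)


def count(actors: Iterable[Actor], cond: Callable[[Actor], bool]) -> int:
    """COUNT [x | cond(x)]: number of actors satisfying the condition."""
    return sum(1 for a in actors if cond(a))


# Toy set variable: three messages with the properties author and posted.
messages = [
    {"author": "ann", "posted": "t1"},
    {"author": "bob", "posted": "t1"},
    {"author": "ann", "posted": "t2"},
]

ann_posts = count(messages, lambda m: m["author"] == "ann")
has_bob = exists(messages, lambda m: m["author"] == "bob")
all_in_t1 = forall(messages, lambda m: m["posted"] == "t1")
print(ann_posts, has_bob, all_in_t1)
```

Here `messages` plays the role of a set variable, `m["author"]` of a property, and the lambdas combine comparison and logical operations.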
Pattern Language for PALADIN: Example Troll
Troll Pattern: This pattern tries to discover the cases when a troll exists in a digital social network. A troll in the network is considered a disturbance.
Disturbance:
(EXISTS [medium | medium.affordance = threadArtefact]) &
(EXISTS [troll |
  (EXISTS [thread | (thread.author = troll) &
    (COUNT [message | (message.author = troll) & (message.posted = thread)]) > minPosts]) &
  (~EXISTS [thread1, message1 | (thread1.author != troll) &
    (message1.author = troll) & (message1.posted = thread1)])])
Forces: medium; troll; network; member; thread; message; url
Force Relations: neighbour(troll, member); own thread(troll, thread)
Solution: No attention must be paid to the discussions started by the troll.
Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
Pattern Relations: Associates Spammer pattern.
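The Disturbance expression of the troll pattern can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: threads are given as a mapping from thread id to author, messages as (author, thread id) pairs, every message refers to a known thread, and the medium condition (affordance = threadArtefact) is taken as already satisfied. The function name `find_trolls` and the data layout are hypothetical and do not come from PALADIN.

```python
from collections import Counter


def find_trolls(threads, messages, min_posts):
    """Return agents matching the troll disturbance.

    threads:  dict mapping thread id -> author (assumed layout)
    messages: list of (author, thread id) pairs (assumed layout)
    """
    # COUNT [message | message.author = a & message.posted = t]
    posts = Counter(messages)
    trolls = set()
    for candidate in set(threads.values()):
        # EXISTS thread authored by the candidate that holds more than
        # min_posts messages written by the candidate.
        floods_own_thread = any(
            author == candidate and posts[(candidate, tid)] > min_posts
            for tid, author in threads.items()
        )
        # ~EXISTS message by the candidate in a thread authored by someone else.
        posts_elsewhere = any(
            author == candidate and threads[tid] != candidate
            for author, tid in messages
        )
        if floods_own_thread and not posts_elsewhere:
            trolls.add(candidate)
    return trolls


threads = {"t1": "ann", "t2": "bob"}
messages = [("ann", "t1")] * 4 + [("bob", "t1"), ("bob", "t2"), ("bob", "t2")]
print(find_trolls(threads, messages, min_posts=3))
```

In the toy data, "ann" posts four times in her own thread and nowhere else, so she matches for min_posts=3; "bob" also posts in a thread he did not start and is therefore excluded by the negated EXISTS clause.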
Pattern Discovery Process
[Figure: process diagram with the elements Pattern, Pattern Template, Pattern Template Instance, Pattern Instance, Disturbance Instances and Digital Social Network. Attribute labels: Description, Disturbance, Variables, Pattern Parameters, Forces, Force Relations, Rationale, Solution, Pattern Relations, Dependencies.]
1. Set pattern parameters
2. Instantiate disturbances
3. Evaluate disturbances
4a. Change Pattern Parameters
4b. Apply Pattern Solution
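The numbered steps can be read as a loop over pattern parameters. The sketch below is one possible interpretation, not the PALADIN implementation: the function `discover`, the actor dictionaries, and the stopping rule (change the parameters until at least one disturbance instance is found or a round limit is hit) are all assumptions, since the slides do not say how the choice between 4a and 4b is made.

```python
def discover(network, disturbance, parameters, adjust, max_rounds=5):
    """Run steps 1 to 4 until disturbance instances are found.

    network:     iterable of actors (hypothetical dict layout)
    disturbance: function (actor, parameters) -> bool
    parameters:  dict of pattern parameters, e.g. {"minPosts": 20}
    adjust:      function returning changed parameters (step 4a)
    """
    for _ in range(max_rounds):
        # Steps 1 to 3: bind the current parameters, instantiate the
        # disturbance for every actor and evaluate it.
        instances = [a for a in network if disturbance(a, parameters)]
        if instances:
            # Step 4b: the caller applies the pattern solution to these.
            return instances, parameters
        # Step 4a: nothing matched, so change the pattern parameters.
        parameters = adjust(parameters)
    return [], parameters


network = [
    {"name": "ann", "own_thread_posts": 12},
    {"name": "bob", "own_thread_posts": 3},
]
found, final_params = discover(
    network,
    disturbance=lambda a, p: a["own_thread_posts"] > p["minPosts"],
    parameters={"minPosts": 20},
    adjust=lambda p: {**p, "minPosts": p["minPosts"] - 5},
)
print([a["name"] for a in found], final_params)
```

With the toy values, the threshold is lowered from 20 to 15 to 10 before "ann" is reported, which shows how step 4a feeds back into step 1.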
Visualization
Conclusions and Outlook
Homogeneous data management
Pattern language for disturbance analysis
Graph-based visualization

Data uncertainty and inconsistent data
Goals and intentions of analysts
Dynamic Mediabase visualization