Discourse on the Web currently cannot be appropriately represented, which hampers searching and querying. Based on insights from Web Science, DERI Galway has developed three different approaches for representing and mining discourse.
Chapter Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Representing discourse and argumentation as an application of Web Science
Benjamin Heitmann, Dr. Conor Hayes
Digital Resources for the Humanities and Arts Conference 2009
Introduction
The Web mirrors most areas of today’s society (e.g. entertainment, science and humanities)
Current Web does not capture structure of critique, argumentation, interpretation
Representing types and granularity of discourse and links is necessary
DERI has 3 approaches to discourse representation
Foundation: Web Science as an interdisciplinary approach to understanding and engineering the Web (started by Tim Berners-Lee)
Outline
Motivation:
Knowledge representation techniques to enable more sophisticated searching and querying of discourse on the Web
Introducing Web Science:
an interdisciplinary approach to understanding the Web and its evolution
Applying the Web Science method:
three approaches for discourse representation
[Figure: diagram of a research weblog and a research paper connected by reference links to a Primary Text, with the labels Argument, Counter-argument, Conclusion, Motivation and Evaluation; annotation: “publication frequency increases”]
Discourse and argumentation on the Web
The Web doesn't properly capture the dynamic argumentation structures in discourse
Current search only captures:
– Plain text
– General links
– Citations
No search for:
– Relations between concepts
– Negative relations
– Semantics of argumentation:
  – Argument, counter-argument
  – Condition, evidence, solution
Representing the structure of discourse
Knowledge on the Web is not sufficiently connected
No standard vocabularies for representation of discourse structure and link granularity
Queries are un-intuitive and imprecise, no negative queries
Links are un-typed, and only on document level
No semantics of relationships
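To make the contrast concrete, the sketch below models typed links as (source, relation, target) triples and runs a query over a negative relation, which un-typed, document-level hyperlinks cannot express. The relation names (`supports`, `disagrees_with`, `cites`) and the identifiers are hypothetical and chosen only for illustration; they are not taken from any DERI vocabulary.

```python
from typing import NamedTuple


class TypedLink(NamedTuple):
    """A link that records how the source relates to the target."""
    source: str
    relation: str
    target: str


# Hypothetical example data: identifiers and relation names are illustrative.
LINKS = [
    TypedLink("blog-post-1", "supports", "paper-A#claim-2"),
    TypedLink("blog-post-2", "disagrees_with", "paper-A#claim-2"),
    TypedLink("paper-B#evaluation", "disagrees_with", "paper-A#claim-2"),
    TypedLink("paper-B#motivation", "cites", "paper-A"),
]


def find_sources(links, relation, target):
    """Return the sources that stand in `relation` to `target`."""
    return [
        link.source
        for link in links
        if link.relation == relation and link.target == target
    ]


# A "negative query": which elements disagree with claim 2 of paper A?
critics = find_sources(LINKS, "disagrees_with", "paper-A#claim-2")
print(critics)
```

Because the targets point at parts of a document (a claim, an evaluation) rather than at whole documents, the same structure also illustrates finer link granularity.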
Source: “Clickstream Data Yields High-Resolution Maps of Science,” Bollen, Van de Sompel, et al. PLoS ONE (2009)
Insights from Web Science
The “Web Science” idea was started by Tim Berners-Lee and researchers from Southampton (see sources)
1. Understanding the current Web requires an interdisciplinary and holistic view of the Web as a whole
2. On the Web, engineering and social factors will influence each other and create a feedback loop
3. Properties of the Web are based on emergent behaviour, which can be empirically measured
A Systems-level view of the Web
Classical reductionist approach does not work
Understanding the current Web requires an interdisciplinary view
No delegation of research on one area to only one discipline
© Web Science Research Initiative
The Web Science Process Model
On the Web, engineering and social factors will influence each other.
Increase in complexity: result is transition from micro to macro effects
Example: Evolution of Blogs
– Independent blogs: Track-backs, Comments, Spam
– Twitter: Microblogging, HashTags, Location aware
– Facebook: Lifestreaming, Privacy
© Lawrence Lessig
Source: CACM Web Science Article
Emergent properties of the Web
Empirical properties:
– In- and out-degree distribution of links
– Power laws
– Growth: 7 million new pages a day in 2005
Emergent patterns:
– Popular tags (folksonomies) on Web 2.0 sites
– Emergence of an editorial elite on Wikipedia
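The in-degree distribution of links listed above can be measured directly from a list of links. The following sketch counts how many pages have each in-degree. The edge list is made-up toy data, far too small to exhibit a power law; it only illustrates the measurement itself.

```python
from collections import Counter

# Hypothetical toy link graph: (source_page, target_page) pairs.
EDGES = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
    ("a", "b"), ("c", "b"),
    ("hub", "c"),
]


def in_degree_distribution(edges):
    """Map each in-degree k to the number of pages with exactly k incoming links."""
    pages = {page for edge in edges for page in edge}
    in_degree = Counter(target for _, target in edges)
    # Pages that are never a link target have in-degree 0.
    return Counter(in_degree.get(page, 0) for page in pages)


distribution = in_degree_distribution(EDGES)
for k in sorted(distribution):
    print(f"in-degree {k}: {distribution[k]} page(s)")
```

On a real crawl, the same counts would typically be plotted on log-log axes to inspect the tail of the distribution.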
Source: “Graph structure in the Web”, Broder, Kumar et al.
© Clay Shirky
Approaches for discourse representation
The Web Science method and discourse representation:
– Interdisciplinary: theoretical foundation is based on speech act theory and language game theory
– Expect a feedback loop between Semantic Web solutions and usage patterns of the community
Empirical approach:
– CORAAL: use knowledge extraction and integration on large data collections
Normative (engineering) approaches:
– SIOC Argumentation vocabulary: light-weight and community-driven
– SALT: annotation of argumentation semantics
CORAAL: empirical discourse analysis
Knowledge extraction and integration
Pattern discovery:
– Use emergent patterns in large document collections
Go beyond text based search:
– Answer negative queries
– Detect relations between concepts
Uses Natural Language Processing
No mark-up required
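As a toy illustration of what detecting relations between concepts and answering negative queries could look like, the sketch below extracts positive and negative "causes" statements from plain sentences with a regular expression. This is an assumption made purely for illustration: the slides do not describe CORAAL's actual NLP pipeline, and the pattern, the sentences and the function names here are hypothetical.

```python
import re

# Matches "<subject> causes <object>" and "<subject> does not cause <object>".
PATTERN = re.compile(
    r"^(?P<subject>.+?) (?P<verb>causes|does not cause) (?P<object>.+?)\.?$",
    re.IGNORECASE,
)


def extract_statement(sentence):
    """Return (subject, object, is_positive), or None if the pattern does not match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    is_positive = match.group("verb").lower() == "causes"
    return (
        match.group("subject").lower(),
        match.group("object").lower(),
        is_positive,
    )


# Made-up sentences; not taken from any real corpus.
SENTENCES = [
    "Gene X causes condition Y.",
    "Treatment Z does not cause condition Y.",
    "This sentence has no relation in it.",
]

statements = [s for s in map(extract_statement, SENTENCES) if s is not None]

# Negative query: what is stated NOT to cause condition Y?
not_causes = [
    subject
    for subject, obj, is_positive in statements
    if obj == "condition y" and not is_positive
]
print(not_causes)
```

Plain text search would return both of the first two sentences for the terms "cause" and "condition Y"; keeping the polarity of each extracted statement is what makes the negative query answerable.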
[Figure: CORAAL screenshot of results for the search term “breast cancer”]
SIOC argumentation vocabulary
Light-weight and informal
Express structure of argumentation:
– Who is participating?
– Where are the elements of the discourse distributed?
– How are the elements connected?
Extensibility enables community involvement
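The three questions above can be answered once each contribution records its author, its location, and a typed link to the element it responds to. The sketch below uses plain Python data classes. The field names and the relation labels (`agrees_with`, `disagrees_with`) are hypothetical stand-ins and should not be read as the actual terms of the SIOC argumentation vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contribution:
    """One element of a discussion (hypothetical model, not the SIOC schema)."""
    ident: str
    author: str
    site: str
    relation: Optional[str] = None  # how this element relates to `target`
    target: Optional[str] = None    # identifier of the element it responds to


# Made-up discussion spread over two sites.
DISCUSSION = [
    Contribution("c1", "alice", "blog.example.org"),
    Contribution("c2", "bob", "forum.example.org", "disagrees_with", "c1"),
    Contribution("c3", "carol", "blog.example.org", "agrees_with", "c2"),
]

# Who is participating?
participants = sorted({c.author for c in DISCUSSION})

# Where are the elements of the discourse distributed?
sites = sorted({c.site for c in DISCUSSION})

# How are the elements connected?
connections = [(c.ident, c.relation, c.target) for c in DISCUSSION if c.target]

print(participants, sites, connections)
```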
[Figure: SIOC argumentation vocabulary]
SALT: Semantically Annotated LaTeX
Enables mark-up of documents for claim identification
Exposes the semantics of the argumentation. Examples:
– Claims, explanations
– Rhetorical structure (abstract, contribution, evaluation)
– Argument, counter-argument
Creates PDF with content and structure
[Figure: discourse representation in SALT]
Summary
Representing discourse allows intuitive querying and searching of the argumentation semantics
The Web Science method provides insights for representing discourse:
– Use interdisciplinary approach
– Expect feedback loop between technical and social factors
– Detect emergent properties and patterns
Three approaches at DERI for representing discourse:
– CORAAL: empirical, knowledge extraction and integration
– SIOC argumentation vocabulary: light-weight, bottom-up
– SALT: annotate argumentation semantics in publications
Questions? and Sources!
These slides: http://www.slideshare.net/metaman
Web Science: “Web science: an interdisciplinary approach to understanding the web”, Hendler, Shadbolt, Hall, Berners-Lee, Weitzner, Communications of the ACM (2008)
CORAAL: demo at http://coraal.deri.ie:8080/coraal; “CORAAL-Dive into publications, Bathe in the Knowledge”, Novacek, Groza, et al., Journal of Web Semantics, Elsevier (2009)
SIOC argumentation vocabulary: “Expressing Argumentative Discussions in Social Media Sites”, Lange, Bojars, et al., Workshop on Social Data on the Web at the International Semantic Web Conference (2008)
SALT: “SALT-Semantically Annotated LaTeX for Scientific Publications”, Groza, Handschuh, et al., European Semantic Web Conference (2007)