SiocLog: Providing IRC Discussion Logs as Linked Data

Tuukka Hastrup (1), Uldis Bojars (2) and John G. Breslin (2, 3)

1 University of Jyväskylä, Finland

2 DERI, NUI Galway, Ireland

3 School of Engineering and Informatics, NUI Galway, Ireland


Social Data on the Web Workshop at the International Semantic Web Conference / Washington, DC / 26th October 2009



Motivation

• IRC conversations are quite disconnected from the Web and even from other IRC channels and networks

• Often there is valuable and needed information in an IRC chat that cannot be linked to people, topics or events, or in general referenced from elsewhere

• Such links would be useful to people who do not use IRC, to those on other networks, or simply to people who leave and rejoin a channel


Motivation (2)

• SIOC provides a framework for linking social media contributions to other content and Linked Data resources, and IRC can become part of that framework

• We also need mechanisms to link the IRC contributions to the people who made them, hence the use of Web ID


Background

• We will begin by introducing the various areas relevant to this system:

– IRC

– Linked Data

– SIOC

– Web ID


Internet Relay Chat (IRC)

• Instant messaging / internet chat is a major form of social interaction online

• It is often disconnected from the Web:

– Due to the different protocols involved

– Due to its real-time nature / lack of persistent storage

• IRC was one of the earliest chat systems

• It has an important role amongst open-source communities, web communities, and even geeks!

– Hundreds of thousands of users online at any time


Linked Data

• Building a “Web of Data” to enhance the current Web

• Exposing, sharing and connecting data about things via dereferenceable URIs

• Linking datasets together that were not previously connected, for example:

– Music and people

– Real-world things and places

• The Linking Open Data (LOD) effort aims to link various open datasets together (DBpedia, GeoNames, etc.)


Semantically-Interlinked Online Communities (SIOC)

• An effort from DERI, NUI Galway to discover how we can create / establish ontologies on the Semantic Web

• Goal of the SIOC ontology is to address interoperability issues on the (Social) Web

• http://sioc-project.org/

• SIOC has been adopted in a framework of 50 applications or modules deployed on over 400 sites

• Various domains: Web 2.0, enterprise information integration, HCLS, e-government


Some of the SIOC core ontology classes and properties


Some examples of where SIOC is already in use (about 50 implementations / applications)


Web ID

• A Web ID is a web address that identifies a person as a Linked Data item

• A Web ID should also lead to a document with more information about that person (e.g. FOAF, other RDF)

• For more information, see the definition in this paper:

– Ching-Man Au Yeung, Ilaria Liccardi, Kanghao Lu, Oshani Seneviratne, Tim Berners-Lee, “Decentralization: The Future of Online Social Networking”, W3C Workshop on Future of Social Networking


Design


Mapping IRC identifiers to URIs on the Web

• IRC network: irc://freenode → http://irc.sioc-project.org/#freenode

• Channel: irc://freenode/%23channel → http://irc.sioc-project.org/channel#channel

• Message: no IRC identifier → http://irc.sioc-project.org/channel/0000-00-00#00:00:00.00

• Chat persona: irc://freenode/persona,isuser → http://irc.sioc-project.org/users/persona#user
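The mapping above can be sketched as a few small URI-building functions. This is a minimal illustration rather than the SiocLog source: it assumes that "channel" and "persona" in the slide's URIs are placeholders for the actual names, that "#channel" and "#user" are literal fragments, and that the message fragment is a timestamp with hundredths of a second. The function names are hypothetical.

```python
from datetime import datetime
from urllib.parse import quote

# Base URL as shown on the slide.
BASE = "http://irc.sioc-project.org/"


def network_uri(network: str) -> str:
    """URI for an IRC network, e.g. 'freenode'."""
    return f"{BASE}#{quote(network)}"


def channel_uri(channel: str) -> str:
    """URI for a channel; the leading '#' of the IRC name is dropped (assumption)."""
    return f"{BASE}{quote(channel.lstrip('#'))}#channel"


def message_uri(channel: str, when: datetime) -> str:
    """URI for a message: one document per channel and day, fragment = time of day."""
    hundredths = when.microsecond // 10000
    fragment = f"{when:%H:%M:%S}.{hundredths:02d}"
    return f"{BASE}{quote(channel.lstrip('#'))}/{when:%Y-%m-%d}#{fragment}"


def persona_uri(nick: str) -> str:
    """URI for a chat persona (the IRC user account)."""
    return f"{BASE}users/{quote(nick)}#user"


if __name__ == "__main__":
    print(network_uri("freenode"))
    print(channel_uri("#sioc"))
    print(message_uri("#sioc", datetime(2009, 10, 26, 14, 5, 9, 120000)))
    print(persona_uri("melvster"))
```

Messages have no identifier of their own on IRC, so under this reading the Web URI is derived from the channel and the time the message was logged.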


Some of the internal and external links


Browsing the Linked Data


Creating a link between a user account on IRC and a personal profile

• Claiming a Web ID creates a link [black] between a user account (a sioc:User that created a sioc:Post in a sioct:ChatChannel) and a person (foaf:Person)

• The person can manually verify this:

– By pointing back to the sioc:User from their foaf:Person definition [grey]
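The two-way link described above can be illustrated with plain triples. This is a sketch under assumptions: the back-link uses foaf:holdsAccount, which matches the SPARQL example in these slides, while the use of sioc:account_of for the claimed direction is an assumption, and the person URI is a hypothetical example.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


def claim_is_verified(triples, user, person):
    """Return True only if the link exists in both directions.

    claimed:   user   --sioc:account_of-->   person  (asserted when the Web ID is claimed; assumed property)
    confirmed: person --foaf:holdsAccount--> user    (asserted in the person's own profile)
    """
    claimed = (user, SIOC + "account_of", person) in triples
    confirmed = (person, FOAF + "holdsAccount", user) in triples
    return claimed and confirmed


if __name__ == "__main__":
    user = "http://irc.sioc-project.org/users/melvster#user"
    person = "http://example.org/profile#me"  # hypothetical Web ID

    # Only the claim has been made so far.
    graph = {(user, SIOC + "account_of", person)}
    print(claim_is_verified(graph, user, person))  # False

    # The person's profile now points back to the account.
    graph.add((person, FOAF + "holdsAccount", user))
    print(claim_is_verified(graph, user, person))  # True
```

A claim alone proves nothing, since anyone on IRC could assert someone else's Web ID; only the statement in the person's own profile document confirms it.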


Web IDs in SiocLog

• A Web ID can be claimed using mttlbot

• A Web ID can also be claimed using standard IRC services:

/msg nickserv set property webid SomeWebID


Implementation

• 2000 lines of Python source code

• 1000 lines of Zope/TAL HTML templates

• Twisted, SimpleTAL and Redland libraries

• Four major components:

– IRC interface, data analysis, data integration, Web


Implementation (2)

• IRC interface:

– Discussion logger / persona monitor on Twisted

• Data analysis:

– Log processing, a pipeline of filters, and sinks for statistics / output

• Data integration:

– Queries for external Linked Data (personal profiles)

• Web interface:

– Handles requests via CGI and publishes the data as HTML and RDF
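The data analysis component (log processing, filters, sinks) could be structured roughly as follows. This is a hypothetical sketch, not the SiocLog code: the log line format "HH:MM:SS <nick> text", the Event type, and the drop_bots filter are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    time: str
    nick: str
    text: str


def parse(lines: Iterable[str]) -> Iterator[Event]:
    """Turn raw log lines into events, skipping lines that do not match the assumed format."""
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) == 3:
            time, nick, text = parts
            yield Event(time, nick.strip("<>"), text)


def drop_bots(events: Iterable[Event], bots=frozenset({"mttlbot"})) -> Iterator[Event]:
    """Example filter (hypothetical): remove messages written by known bots."""
    return (e for e in events if e.nick not in bots)


def run(lines: Iterable[str],
        filters: Iterable[Callable[[Iterable[Event]], Iterable[Event]]],
        sinks: Iterable[Callable[[Event], None]]) -> None:
    """Chain the filters lazily, then hand every surviving event to each sink."""
    sinks = list(sinks)
    events: Iterable[Event] = parse(lines)
    for f in filters:
        events = f(events)
    for event in events:
        for sink in sinks:
            sink(event)


if __name__ == "__main__":
    log = [
        "14:05:09 <alice> hello",
        "14:05:10 <mttlbot> Web ID claimed",
        "14:05:12 <alice> bye",
    ]
    per_nick = Counter()   # statistics sink
    published = []         # output sink
    run(log, [drop_bots], [lambda e: per_nick.update([e.nick]), published.append])
    print(per_nick)
    print(published)
```

Because the filters are generators, a log is streamed through the pipeline one event at a time, and new statistics or output formats can be added as extra sinks without touching the parser.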


Finding the names of friends of an IRC persona with SPARQL

semwebquery –sparql "SELECT ?name WHERE {
  ?person foaf:holdsAccount <http://irc.sioc-project.org/users/melvster#user> .
  ?person foaf:knows ?friend .
  ?friend foaf:name ?name . }"
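To make the shape of the query concrete, the three triple patterns can be evaluated by hand over a small in-memory graph. This is only an illustration of the join, not how semwebquery works: a real client follows links on the Web to fetch the data. The example.org URIs and names are hypothetical, and the result is de-duplicated per friend, which a plain SPARQL SELECT would not do.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def friend_names(triples, account):
    """Names of friends of whoever holds the given account."""
    # ?person foaf:holdsAccount <account>
    people = {s for s, p, o in triples if p == FOAF + "holdsAccount" and o == account}
    # ?person foaf:knows ?friend
    friends = {o for s, p, o in triples if p == FOAF + "knows" and s in people}
    # ?friend foaf:name ?name
    return sorted(o for s, p, o in triples if p == FOAF + "name" and s in friends)


if __name__ == "__main__":
    account = "http://irc.sioc-project.org/users/melvster#user"
    me = "http://example.org/me#i"  # hypothetical profile URIs below
    graph = {
        (me, FOAF + "holdsAccount", account),
        (me, FOAF + "knows", "http://example.org/bob#i"),
        (me, FOAF + "knows", "http://example.org/carol#i"),
        ("http://example.org/bob#i", FOAF + "name", "Bob"),
        ("http://example.org/carol#i", FOAF + "name", "Carol"),
        ("http://example.org/dave#i", FOAF + "name", "Dave"),  # not a friend
    }
    print(friend_names(graph, account))
```

The query starts from the IRC persona URI published by SiocLog and reaches names that live only in external FOAF profiles, which is the point of linking the two datasets.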


Validation

• 291 chat personas on five channels

• 22,418 chat messages

• 51 chat personas have associated Web IDs claimed using mttlbot (2/3) or nickserv (1/3)

– 44 of those have a valid associated RDF document

• Scalable (projected 4 million triples in 10 years)

• SiocLog data being consumed by the “Towards linked sensor data for Hackystat” project

• SiocLog interfaces to FOAF Me for new profile creation


Future work

• Extend to instant messaging and private messaging

• Study of IRC communities where users and content are distributed across channels and networks


Acknowledgements

• We would like to thank Science Foundation Ireland for their support under grant SFI/08/CE/I1380 (Líon 2)

• Thanks also to Benja Fallenstein and Dan Brickley for their insights


Summary

• IRC conversations are quite disconnected from the Web and even from other IRC channels and networks

• Often there is valuable and needed information in an IRC chat that cannot be linked to people, topics or events, or in general referenced from elsewhere

• SIOC provides a framework for interlinking social media to other content and Linked Data, and IRC has been integrated as a part of that framework

• We also used mechanisms to link IRC contributions to the people who made them via Web ID and FOAF