The BlogForever Project, http://blogforever.eu. MTSR 2013, 22 Nov 2013, Thessaloniki. Vangelis Banos, BlogForever Project Manager.

BlogForever Project presentation at MTSR2013

DESCRIPTION

BlogForever, a collaborative European Commission funded project, developed an exciting new system to harvest, preserve, manage and reuse blog content.

Citation preview

Page 1: BlogForever Project presentation at MTSR2013

The BlogForever Project

http://blogforever.eu

MTSR 2013, 22 Nov 2013, Thessaloniki

Vangelis Banos, BlogForever Project Manager

Page 2: BlogForever Project presentation at MTSR2013

Contents

The Disappearing Web

Web Archiving

The BlogForever Project

BlogForever Applications

Page 3: BlogForever Project presentation at MTSR2013

Web content disappears

Page 4: BlogForever Project presentation at MTSR2013

Web content disappears

Page 5: BlogForever Project presentation at MTSR2013

Web content disappears

Page 6: BlogForever Project presentation at MTSR2013

Web Archiving

The Internet Archive comes to the rescue!

Page 7: BlogForever Project presentation at MTSR2013

Web Archiving

The process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public.

Page 8: BlogForever Project presentation at MTSR2013

The challenge of web archiving

[Diagram: Generic file archiving operation. Elements: File(s), Software, Hardware, Record.]

Page 9: BlogForever Project presentation at MTSR2013

The challenge of web archiving

[Diagram: Web archiving operation. Elements: Website, File(s) (many), Software (several), Hardware, Record(s), with the open questions marked "???".]

Page 10: BlogForever Project presentation at MTSR2013

We are focusing on blogs

Blogs have become fairly established as an online communication and web publishing tool.

Hundreds of millions of blogs are published about every conceivable subject.

[Chart: Number of blogs, in millions (blogpulse.com), July 2004 to Jan 2012, rising from about 3 million to 182 million.]

Examples 12/9/2013

70+ million sites in the world
369 million people viewing more than 11.8 billion pages each month
38 million new posts and 62.3 million new comments each month

136.5 million blogs
61 billion posts
83.7 million daily posts

Page 11: BlogForever Project presentation at MTSR2013

Blog Archiving: Objectives & Concerns

Blog characteristics:
Database driven, dynamic websites
High frequency of updates
Special structure, metadata, semantics & communication protocols
Highly interconnected
Quantity and range of resources
Ownership and DRM

Our aims: harvest, preserve, manage and reuse blogs and their resources.

Page 12: BlogForever Project presentation at MTSR2013

The BlogForever Project

Collaborative EC funded project
Duration: 1 Mar 2011 to 31 Aug 2013
Aims: Theoretical and applied research on blog archiving
Coordinated by AUTH
Partners: [partner logos on the original slide]

Page 13: BlogForever Project presentation at MTSR2013

BlogForever project achievements

BlogForever has created a novel blog archiving approach. It is not only about archiving pages. It is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.).

Blog modelling and semantics

Case studies and validation

Preservation strategies

Implementation of the BlogForever platform

Page 14: BlogForever Project presentation at MTSR2013

BlogForever project achievements

[Diagram: BlogForever platform architecture]

Harvesting: Blog crawlers
Sources: unstructured information; web services and blog APIs
Real-time monitoring
HTML data extraction engine
Spam filtering
Web services extraction engine

Original data and XML metadata

Preserving / Managing and reusing: Blog digital repository
Digital preservation
Quality assurance
Collections curation
Public access APIs
Personalised services
Information retrieval
Public web interface / Browse, search, export
Access: web services; web interface
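
As a rough illustration of the harvesting step, the sketch below pulls post records out of a blog feed. It assumes the blog exposes an RSS 2.0 feed, which the slides do not state; the function name `extract_posts`, the dictionary keys and the sample feed are hypothetical and are not taken from the BlogForever crawler.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real crawler would download this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/first</link>
      <pubDate>Fri, 22 Nov 2013 10:00:00 GMT</pubDate>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/blog/second</link>
      <description>No date on this one</description>
    </item>
  </channel>
</rss>"""


def extract_posts(feed_xml):
    """Return one dict per <item> element found in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            # RSS dates use the RFC 822 format; keep None when the date is missing.
            "published": parsedate_to_datetime(raw_date) if raw_date else None,
            "summary": item.findtext("description", default=""),
        })
    return posts


posts = extract_posts(SAMPLE_FEED)
for post in posts:
    print(post["title"], post["url"], post["published"])
```

Structured sources such as feeds give clean fields directly; for blogs without them, the HTML data extraction engine named on the slide would have to recover the same fields from page markup.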

Page 15: BlogForever Project presentation at MTSR2013

BlogForever Added Value

BlogForever structures the archived blog content. BlogForever is not only about archiving HTML pages. It is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a special data model.
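
To make the idea of archiving entities rather than pages concrete, here is a minimal sketch of what such a data model could look like. The class and field names are assumptions chosen for illustration; the actual BlogForever data model is defined in the project deliverables and is likely richer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Author:
    name: str
    profile_url: Optional[str] = None


@dataclass
class Comment:
    author: Author
    published: datetime
    text: str


@dataclass
class Post:
    url: str
    title: str
    author: Author
    published: datetime
    content: str
    comments: List[Comment] = field(default_factory=list)
    # URLs of other posts that link to this one (hypothetical representation).
    pingbacks: List[str] = field(default_factory=list)


@dataclass
class Blog:
    url: str
    title: str
    posts: List[Post] = field(default_factory=list)

    def posts_by(self, author_name: str) -> List[Post]:
        """Return the posts written by the given author."""
        return [p for p in self.posts if p.author.name == author_name]


alice = Author("Alice")
post = Post(
    url="http://example.org/blog/hello",
    title="Hello",
    author=alice,
    published=datetime(2013, 11, 22),
    content="First post.",
)
post.comments.append(Comment(Author("Bob"), datetime(2013, 11, 23), "Nice post."))
blog = Blog("http://example.org/blog", "Example blog", [post])
print([p.title for p in blog.posts_by("Alice")])
```

Because each entity is a separate object, queries such as "all posts by one author" become simple lookups, which is not possible when only rendered pages are stored.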

BlogForever is based on Invenio, an open source, state-of-the-art digital library management system developed by CERN.

Better metadata and higher information granularity.
Open standards and interoperability (MARCXML, web services).
Better management of archived information, increasing the utility of the web archive.
Easy to facilitate added-value services, e.g. analytics.
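
As an illustration of the MARCXML interoperability point, the sketch below serialises one blog post as a MARCXML record using only the Python standard library. The field choices (100 for author, 245 for title, 260$c for date, 856$u for URL) are standard MARC 21 tags picked for this example; they are an assumption, not the mapping BlogForever actually uses, and a complete record would also carry a leader and control fields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def add_datafield(record, tag, code, value):
    """Append a datafield with a single subfield (blank indicators for simplicity)."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
    subfield.text = value


def post_to_marcxml(title, author, date, url):
    """Build a simplified MARCXML record for one blog post (illustrative mapping)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    add_datafield(record, "100", "a", author)  # main entry, personal name
    add_datafield(record, "245", "a", title)   # title statement
    add_datafield(record, "260", "c", date)    # date of publication
    add_datafield(record, "856", "u", url)     # electronic location
    return ET.tostring(record, encoding="unicode")


marcxml = post_to_marcxml(
    "Hello", "Alice", "2013-11-22", "http://example.org/blog/hello"
)
print(marcxml)
```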

Page 16: BlogForever Project presentation at MTSR2013

BlogForever Impact

Blog archiving methods and policies which are reusable and generic.

A blog archiving solution that any institution could use to preserve their collections of blogs, ensuring authenticity, integrity, completeness, usability and long-term accessibility.

A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.
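
One common way to support the integrity goal listed above is a fixity check: store a checksum when content is archived and recompute it later. The sketch below shows the idea with SHA-256; the slides do not say which mechanism BlogForever uses, so the function names and the choice of hash are assumptions.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return the SHA-256 checksum of archived content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_checksum: str) -> bool:
    """Check that content still matches the checksum recorded at ingest time."""
    return fixity(data) == recorded_checksum


original = b"<html><body>First post.</body></html>"
recorded = fixity(original)           # stored alongside the archived item
tampered = original.replace(b"First", b"Last")

print(verify(original, recorded))
print(verify(tampered, recorded))
```

A checksum only detects change; authenticity and long-term accessibility need further measures such as provenance metadata and format migration.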

Page 17: BlogForever Project presentation at MTSR2013

BlogForever Applications

CERN is currently implementing a high energy physics blogs repository.

AUTH is designing an academic blogs repository.

The Linguistics Department of the University of Hannover is doing a diachronic analysis of certain linguistic and textual phenomena / features using German blogs.

The University of Warwick Computer Science Department is doing social web analytics using blog data.

Page 18: BlogForever Project presentation at MTSR2013

Thank you!

Visit http://blogforever.eu
Access all BlogForever Deliverables (Open Access).
Download the Open Source BlogForever Platform.

Contact us:
Project Manager: Vangelis Banos [email protected]
Exploitation Manager: Efstratios Arampatzis [email protected]
