Project group knowAAN Final presentation Computer Science Education Group University of Paderborn October 20th 2011

Project group knowAAN Final presentationbücker.name/pubs/pgknowaan.final.presentation.pdf · Final presentation Computer Science Education Group University of Paderborn October 20th



Project group knowAAN
Final presentation

Computer Science Education Group
University of Paderborn

October 20th 2011

Overview

• Introduction
• System components & Work flow
• Demonstration
• Development process
• Summary & Outlook
• Time for further questions of detail

PG knowAAN 2


Overview

Overview: First part

• Goals
• Extraction & Storage (of data)
• Exploration (of data)
• System components & Work flow
• Analysis & Visualization (of data)

Goals

• Explore research networks
• Based on: artifacts (scientific publications) and metadata
• Combination and analysis of data
• Computation of similarities of full texts
• Support for the conference management system Ginkgo
• Data visualization
• Recommendations

(Source: PG knowAAN project description)
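The slides do not state how the similarity of full texts was computed. A common approach, shown here only as a minimal sketch, weights the terms of each text by TF-IDF and compares two texts by the cosine of their term vectors. It assumes the texts are already tokenized (for instance into lemmatized nouns); the function names tf_idf_vectors and cosine are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Turn tokenized texts into sparse TF-IDF vectors (term -> weight)."""
    n_docs = len(docs)
    # Document frequency: number of texts in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this text
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical texts, already reduced to lemmatized nouns.
    texts = [
        ["learning", "student", "course", "learning"],
        ["student", "course", "exam"],
        ["graph", "network", "author"],
    ]
    vectors = tf_idf_vectors(texts)
    print(cosine(vectors[0], vectors[1]))  # > 0: shared terms
    print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms
```

Texts without shared terms score 0.0. A term that occurs in every text receives an IDF of log(1) = 0 and therefore does not contribute to any score.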


Goals

Imagine you are interested in a conference. You have downloaded its papers from 2 or 3 years.

Now you have nearly 100 publications. How do you explore them?

100 publications. Do you know any tools for that?

Extraction & Storage

First step: Extract data and store it.


Extraction & Storage

Exploration

Second step: Explore data.


Exploration

Exploring a conference

Exploration

Which extracted data is available for a publication?

→ Database schema


publication: id GUID, lucuid VARCHAR(512), title VARCHAR(512), booktitle VARCHAR(512), normtitle VARCHAR(512), date VARCHAR(512), editor VARCHAR(512), journal VARCHAR(512), note VARCHAR(512), pages VARCHAR(512), publisher VARCHAR(512), tech VARCHAR(512), volume VARCHAR(512), number VARCHAR(512), rawstring VARCHAR(4096), xmlfile VARCHAR(512), pdffile VARCHAR(512), topicfile VARCHAR(512), created BIGINT, modified BIGINT
author: id GUID, text VARCHAR(512), normtext VARCHAR(512), firstname VARCHAR(512), lastname VARCHAR(512), created BIGINT, modified BIGINT
pub_aut: publication_id GUID, author_id GUID
affiliation: id GUID, text VARCHAR(512), location_id GUID
address: id GUID, text VARCHAR(512), location_id GUID
pub_aff: publication_id GUID, affiliation_id GUID
pub_add: publication_id GUID, address_id GUID
citation: publication1_id GUID, publication2_id GUID
discipline: id GUID, text VARCHAR(512), parent_id GUID
location: id GUID, latitude DOUBLE, longitude DOUBLE, text VARCHAR(512)
keyword: id GUID, text VARCHAR(512)
pub_key: publication_id GUID, keyword_id GUID, score DOUBLE, source VARCHAR(512)
pub_evt: publication_id GUID, event_id GUID
pub_dis: publication_id GUID, discipline_id GUID
pub_con: publication_id GUID, concept_id GUID, score DOUBLE, source VARCHAR(512)
concept: id GUID, text VARCHAR(512)
event: id GUID, text VARCHAR(512), filepath VARCHAR(512), predecessor_id GUID, successor_id GUID
eventseries: id GUID, text VARCHAR(512), filepath VARCHAR(512)
evt_evs: event_id GUID, eventseries_id GUID
aut_add: author_id GUID, address_id GUID
aut_aff: author_id GUID, affiliation_id GUID
pub_cat: publication_id GUID, category_id GUID, score DOUBLE, source VARCHAR(512)
category: id GUID, text VARCHAR(512)

Further schema objects (no columns shown): bib_coupling, co_author, co_citation, keyword_count, discipline_count, category_count, concept_count, evt_pub_aut_count
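To find out which data is available for a publication, the link tables (pub_aut, pub_key, ...) are joined with the entity tables. The minimal sketch below uses an in-memory SQLite database with a small subset of the schema above (publication, author, pub_aut, keyword, pub_key). It assumes that GUIDs are stored as text and that pub_key.score ranks keywords by relevance; SQLite only stands in for the project's actual database, and the helper names and data are hypothetical.

```python
import sqlite3
import uuid

# Subset of the knowAAN schema. GUID columns are stored as TEXT here
# (assumption: SQLite has no GUID type).
SCHEMA = """
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author (id TEXT PRIMARY KEY,
                     firstname VARCHAR(512), lastname VARCHAR(512));
CREATE TABLE pub_aut (publication_id TEXT REFERENCES publication(id),
                      author_id TEXT REFERENCES author(id));
CREATE TABLE keyword (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE pub_key (publication_id TEXT REFERENCES publication(id),
                      keyword_id TEXT REFERENCES keyword(id),
                      score DOUBLE, source VARCHAR(512));
"""


def authors_of(conn, publication_id):
    """Resolve the authors of a publication via the pub_aut link table."""
    rows = conn.execute(
        "SELECT a.firstname, a.lastname FROM pub_aut pa "
        "JOIN author a ON a.id = pa.author_id "
        "WHERE pa.publication_id = ? ORDER BY a.lastname",
        (publication_id,),
    )
    return [f"{first} {last}" for first, last in rows]


def keywords_of(conn, publication_id):
    """Keywords via pub_key, highest score first (score as ranking is assumed)."""
    rows = conn.execute(
        "SELECT k.text, pk.score FROM pub_key pk "
        "JOIN keyword k ON k.id = pk.keyword_id "
        "WHERE pk.publication_id = ? ORDER BY pk.score DESC",
        (publication_id,),
    )
    return rows.fetchall()


def demo_db():
    """Build an in-memory database with one hypothetical publication."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    pub, a1, a2, k1, k2 = (str(uuid.uuid4()) for _ in range(5))
    conn.execute("INSERT INTO publication VALUES (?, ?)", (pub, "Example paper"))
    conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                     [(a1, "Jane", "Doe"), (a2, "John", "Roe")])
    conn.executemany("INSERT INTO pub_aut VALUES (?, ?)", [(pub, a1), (pub, a2)])
    conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                     [(k1, "ontology"), (k2, "clustering")])
    conn.executemany("INSERT INTO pub_key VALUES (?, ?, ?, ?)",
                     [(pub, k1, 0.4, "demo"), (pub, k2, 0.9, "demo")])
    return conn, pub


if __name__ == "__main__":
    conn, pub = demo_db()
    print(authors_of(conn, pub))   # ['Jane Doe', 'John Roe']
    print(keywords_of(conn, pub))  # [('clustering', 0.9), ('ontology', 0.4)]
```

The same join pattern applies to the other link tables, for example pub_con with concept or pub_cat with category.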

System components & Work flow

How is our system structured?

→ Some examples.


System components & Work flow

Components

UML component diagram with the components FileStorage, Backend, xmlBuilder, TopicExtraction, TF-Component, TrendDetection, Roundtrip, Recommendation, PDFToText, Clustering, DB, Parscit, DataBase, SolrWebServices, DocBrowser, FrontendReferenceExtraction and ParscitTrainer.

Connections labelled: JDBC, Model, WebServices, FileSystem


Sequence diagram with the lifelines :Languagedetection, :DB, :Solr, :NounExtraction, :Lemmatizer, :Parscit, :PDFToText, :RoundTripExecutor, :RoundTrip and :DocumentBrowser.

Messages (return values after →):

a/1) addPDF
a/2) writeToFS → Path
a/3) createThread
submitThread
b/1) run
b/2) getText → Text
b/3) ParseFullText → ParscitXML
b/4) extractBodyAndAstract → BodyAndAbstract
b/5) getLanguage → LanguageString
b/6) lemmatize → LemmatizedText
b/7) extractNouns → NounsList
b/8) lemmatizeNounslist → LemmatizedNouns
b/9) ReduceToTopNouns → TopNouns
b/10) writeToFiles → Paths
b/11) addTexts → Solrid
b/12) addPublication
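Step b/9 (ReduceToTopNouns) is only named in the diagram; its selection criterion is not given. One plausible reading, sketched here purely under that assumption, keeps the k most frequent lemmatized nouns. The function name reduce_to_top_nouns, the default k = 10 and the case folding are hypothetical choices.

```python
from collections import Counter


def reduce_to_top_nouns(lemmatized_nouns: list[str], k: int = 10) -> list[tuple[str, int]]:
    """Return the k most frequent nouns with their counts.

    Nouns are case-folded before counting (an assumption), and ties are
    broken alphabetically so that the result is deterministic.
    """
    counts = Counter(noun.casefold() for noun in lemmatized_nouns)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    nouns = ["network", "Author", "network", "graph", "author", "network"]
    print(reduce_to_top_nouns(nouns, k=2))  # [('network', 3), ('author', 2)]
```

Sorting explicitly instead of using Counter.most_common makes the order of tied nouns independent of the order in which they first appear.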


System components & Work flow

Work flow

Analysis & Visualization

Third step: Analyze and visualize data.


Analysis & Visualization

Analysis of authors
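The slides do not describe how authors were analyzed. As one hedged illustration, the sketch below derives a co-author graph from rows of the pub_aut link table (publication_id, author_id): two authors are linked once for every publication they share. This is similar in spirit to the co_author object in the schema, whose actual definition is not given; the function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_author_graph(pub_aut_rows):
    """Map each sorted author pair to the number of joint publications."""
    authors_by_pub = defaultdict(set)
    for publication_id, author_id in pub_aut_rows:
        authors_by_pub[publication_id].add(author_id)
    edges = defaultdict(int)
    for authors in authors_by_pub.values():
        # Every unordered pair of authors on the same paper is one co-authorship.
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return dict(edges)


def co_author_count(edges):
    """Number of distinct co-authors per author (the degree in the graph)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


if __name__ == "__main__":
    rows = [("p1", "alice"), ("p1", "bob"),
            ("p2", "alice"), ("p2", "bob"), ("p2", "carol"),
            ("p3", "dave")]
    edges = co_author_graph(rows)
    print(edges)
    print(co_author_count(edges))
```

Authors who only publish alone (dave in the example) have no edges and therefore do not appear in the degree map.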


Analysis & Visualization

Analysis of scientific publications
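The schema contains a citation table and entries named bib_coupling and co_citation, which correspond to two standard citation-based measures: two publications are bibliographically coupled when they cite the same works, and two works are co-cited when the same publication cites both. The sketch below computes both counts from citation rows. It assumes that a row (publication1_id, publication2_id) means "publication1 cites publication2"; this direction and the function names are assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict
from itertools import combinations


def references_by_paper(citations):
    """Group citation rows (citing, cited) into citing -> set of cited papers."""
    refs = defaultdict(set)
    for citing, cited in citations:
        refs[citing].add(cited)
    return refs


def bibliographic_coupling(citations):
    """Pairs of citing papers -> number of references they share."""
    refs = references_by_paper(citations)
    counts = Counter()
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            counts[(p, q)] = shared
    return counts


def co_citation(citations):
    """Pairs of cited papers -> number of papers that cite both."""
    counts = Counter()
    for cited in references_by_paper(citations).values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    rows = [("P1", "X"), ("P1", "Y"), ("P2", "X"), ("P2", "Y"), ("P3", "Y")]
    print(bibliographic_coupling(rows))
    print(co_citation(rows))
```

The pairwise loop in bibliographic_coupling is quadratic in the number of citing papers, which is unproblematic for the roughly 100 publications mentioned on the Goals slide.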

Demonstration

Now: Demo.

Image: http://www.flickr.com/photos/plaisanter/5525977163/


Development process

Technologies

Jersey


Development process

Methods of agile software development

FDD, XP, Scrum


Development process

Methods of agile software development

• Weekly meetings
• Sit together (as much as possible)
• Automated build system
• Continuous integration
• Issue tracking


Summary and Outlook

Summary and future work

Summary

• Integrated processing of scientific papers
• Aggregated visualization of authors, publications and events
• Computation of various analyses over the data
• Cleaning functionality for automatically processed data

Future work

• Parallelized clustering
• Additional graphical visualization
• Improved extraction of metadata from PDF files


Summary and Outlook

Thank you for your attention

Questions?
