A knowledge web-based eResearch environment for classical philology

Cristina Vertan, University of Hamburg

cristina.vertan@uni-hamburg.de

10.07.2009 © Vertan, Digital Classicist Seminar London

Arbeitsstelle Computerphilologie


1. eResearch environment: material types, data modelling, data encoding, data visualisation, functionality
2. Web-based: data access, user modelling, data interoperability, shared working space, data persistency
3. Knowledge: knowledge organisation, (semi-)automatic knowledge extraction, management of multilingual data, intelligent retrieval of heterogeneous materials
4. Software and hardware infrastructure
5. Current development and roadmap

TEUCHOS: Research infrastructure for classical philology

• Institute for Greek and Latin Philology (Prof. Dr. C. Brockmann, D. Deckers, M. Fricke, B. Pukhova, Dr. C. Vertan)
• Computing Center
• Aristoteles Archive, Free University Berlin (Prof. Dr. D. Harlfinger, L. Koch)
• Initial funding 2007-2010: Wissenschaftliche Literaturversorgungs- und Informationssysteme (LIS)

MLT-Cphil: Multilingual Language Technology for classical philology research

• Arbeitsstelle Computerphilologie (Dr. C. Vertan, Prof. Dr. Walther v. Hahn)
• Collaborative Projects with Social Sciences and Humanities
• 2009-2010

Manuscript Descriptions

• Focus on descriptions of transmissions of Aristotle Graecus
  – 2 volumes (1 not yet published) by D. Harlfinger
• Data is modelled by selecting, from the DFG guidelines for manuscript descriptions, the items relevant for palaeographical and codicological investigation
• Encoding is done in TEI-P5/msDescription, with minor extensions made in order to:
  – Include more detailed descriptions of watermarks
  – Annotate proper names and places in different parts of the manuscript description
  – Mark empty folios
• Visualisation: Print View & Screen View
• Functionality: links with other objects (watermarks, transcriptions, digitised folios, biographical data)

Manuscript Description: Modelling & Encoding

<msItem class="empty"><locus>ff.9v</locus><locus>81v-82v</locus><locus>152v.</locus></msItem>

Empty folios
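A minimal Python sketch of how the empty-folio marking could be read back out of a description. It assumes the msItem fragment is available as a string without the TEI namespace; the helper name empty_folios is hypothetical and not part of Teuchos.

```python
import xml.etree.ElementTree as ET

# Fragment as shown on the slide; a real TEI file would also carry the TEI namespace.
FRAGMENT = (
    '<msItem class="empty">'
    "<locus>ff.9v</locus><locus>81v-82v</locus><locus>152v.</locus>"
    "</msItem>"
)

def empty_folios(xml_text):
    """Return the locus strings of every msItem marked class="empty" (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    loci = []
    # iter() also visits the root element itself, so a bare msItem fragment works too.
    for item in root.iter("msItem"):
        if item.get("class") == "empty":
            loci.extend(locus.text.strip() for locus in item.findall("locus"))
    return loci

print(empty_folios(FRAGMENT))
```

Because the empty folios are ordinary msItem elements, a viewer or a consistency check can list them without any special-purpose parser.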

<watermarkDesc> <locus>ff.1-70</locus>: Schere, sehr ähnlich Br.3668 (Rom 1454, mit Varianten überwiegend Italien 1451-1462), sehr ähnlich <ref target="teuxx: wHCiseaux22"> Ha., ciseaux 22 </ref> (belegt J.1441 und 1443). </watermarkDesc>

Mentioning a Watermark in msDesc

31.03.2009 © Vertan, SCH Exploratory Workshop, Birmingham

Watermarks

• At the moment, several hundred watermarks collected by Prof. Harlfinger exist in printed form.
• The watermarks were collected as lists of twins (or more), and there is no clear evidence which element of a list was identified on which page.
• In addition to the watermark motifs, we also collect the countermarks.


Watermark Model


<teuwmo:teuwmObj>
  <teuwmo:wmIdent wmIsCountermark="false">
    <teuwmo:wmObjId>TEU_WMDesc_Fleur-134m2.xml</teuwmo:wmObjId>
    <teuwmm:wmIdentification>
      <teuwmm:wmIdnr>134</teuwmm:wmIdnr>
      <teuwmm:wmCollection>Harlfinger</teuwmm:wmCollection>
      <teuwmm:wmName>
        <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
        <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
      </teuwmm:wmName>
    </teuwmm:wmIdentification>
  </teuwmo:wmIdent>
  <teuwmo:wmManuscriptData>
    <teuwmo:msName>Berol. Ham. 512</teuwmo:msName>
    <teuwmo:msFolio>ff. 213/214; f. 215; auch f. 212</teuwmo:msFolio>
    <teuwmo:msDate></teuwmo:msDate>
  </teuwmo:wmManuscriptData>
  <teuwmo:wmLinks>
    <teuwmo:pictureLink>Fleur-134m2.tif</teuwmo:pictureLink>
    <teuwmo:msDescLink>Fleur-134m2.xml</teuwmo:msDescLink>
    <teuwmo:motifLink>Fleur.xml</teuwmo:motifLink>
  </teuwmo:wmLinks>
</teuwmo:teuwmObj>

Watermark XML-Encoding
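The following Python sketch shows one way such a watermark object could be read. The slide does not show the namespace declarations for the teuwmo and teuwmm prefixes, so the URIs below are placeholders, and the function name summarise is hypothetical. The sample is a shortened copy of the record above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the real declarations are not shown on the slide.
NS = {"teuwmo": "urn:example:teuwmo", "teuwmm": "urn:example:teuwmm"}

SAMPLE = """
<teuwmo:teuwmObj xmlns:teuwmo="urn:example:teuwmo" xmlns:teuwmm="urn:example:teuwmm">
  <teuwmo:wmIdent wmIsCountermark="false">
    <teuwmm:wmIdentification>
      <teuwmm:wmIdnr>134</teuwmm:wmIdnr>
      <teuwmm:wmName>
        <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
        <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
      </teuwmm:wmName>
    </teuwmm:wmIdentification>
  </teuwmo:wmIdent>
  <teuwmo:wmManuscriptData>
    <teuwmo:msName>Berol. Ham. 512</teuwmo:msName>
  </teuwmo:wmManuscriptData>
</teuwmo:teuwmObj>
"""

def summarise(xml_text):
    """Collect identifier, countermark flag, names per language and manuscript name."""
    root = ET.fromstring(xml_text)
    ident = root.find("teuwmo:wmIdent", NS)
    # wmNameLanguage carries no prefix in the sample, so it is matched by its bare name.
    names = {n.get("wmLang"): n.text.strip() for n in root.iter("wmNameLanguage")}
    return {
        "id": root.findtext(".//teuwmm:wmIdnr", namespaces=NS),
        "countermark": ident.get("wmIsCountermark") == "true",
        "names": names,
        "manuscript": root.findtext(".//teuwmo:msName", namespaces=NS),
    }

print(summarise(SAMPLE))
```

Keeping the motif name once per language in the record is what later allows the same watermark to be found under "Fleur" or "Blume".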

Digitised Manuscripts

• Not a critical mass; the emphasis is on manuscripts that are not available, or were even unknown until now (Lips. Gab. 19)
• We store:
  – Folio images (sometimes not complete)
  – Manuscript description
  – Watermarks (if any)
  – (Partial) transcriptions
  – Relations with other critical editions

Teuchos data model for manuscripts

[Figure: Teuchos data model for manuscripts. Labels recoverable from the diagram:
– Manuscript object (PID): DC data (author, date, etc.; XML-DC), XML list of structuring objects (TEI), manuscript description (TEI), structuring-object links, "Dig Links", "WZ-Links"
– Page-image objects (PID_1, PID_2, PID_3, ..., PID_n): DC data, metadata (XML/TEI), thumbnail, pyramid
– Structuring objects (PID_1, PID_2, PID_3): XML-TEI data, DC data
– Watermark object (PID_1): watermark object data, XML-TEI data, DC data]

Digitised Manuscripts: Encoding

• Encoding follows TEI-P5.
• Structural elements (such as chapters/subchapters, including those in critical editions) are marked with the element "milestone" in the transcription. This milestone points to a complex structure stored in a separate file.
• In this way we avoid concurrent annotation (in fact we realise a kind of stand-off annotation).
• Links are made not directly to images but to the objects describing the images. In this way, depending on user rights, we can offer different image quality (if any at all).
• Images are stored in TIFF format, allowing a pyramidal structure (for zooming).
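The stand-off idea described above can be sketched in Python as follows. The slides do not say which attribute links a milestone to the separate structure file, so the use of corresp here is an assumption, and the layout of the structure file (unit elements with id and label) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Transcription: milestones only mark positions and point outwards (assumed attribute: corresp).
TRANSCRIPTION = (
    "<text><p>"
    '<milestone unit="chapter" corresp="#s1"/>first words '
    '<milestone unit="chapter" corresp="#s2"/>later words'
    "</p></text>"
)

# Separate structure file; element and attribute names are hypothetical.
STRUCTURE = (
    "<structure>"
    '<unit id="s1" edition="A" label="Book 1, chapter 1"/>'
    '<unit id="s2" edition="A" label="Book 1, chapter 2"/>'
    "</structure>"
)

def resolve_milestones(transcription_xml, structure_xml):
    """Pair the text following each milestone with the structure unit it points to."""
    units = {u.get("id"): u.attrib for u in ET.fromstring(structure_xml).iter("unit")}
    resolved = []
    for ms in ET.fromstring(transcription_xml).iter("milestone"):
        target = ms.get("corresp", "").lstrip("#")
        label = units[target]["label"] if target in units else None
        # The text after an empty element is stored in its .tail.
        resolved.append((label, (ms.tail or "").strip()))
    return resolved

print(resolve_milestones(TRANSCRIPTION, STRUCTURE))
```

Since the chapter hierarchy lives outside the transcription, several competing structures (for example from different critical editions) can point into the same text without overlapping markup.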


Other Materials

• Palimpsests (materials acquired during the Rinascimento Virtuale EU project)
  – High-resolution images
  – Several layers on the same image
• Biographical data
• Bibliographical data
• Research articles

Challenges for the data modelling

• Existing sets of data are too small to model objects for all contingencies.
• A general model cannot be defined.
• Not all types of data (and the respective Fedora objects) are available for all manuscripts.

Web-based Environment

• Access:
  – Users with different rights (user groups)
• Quality assurance:
  – Reviewed uploading of new documents and comments
• Data persistency:
  – URI and persistent identifiers (DOI)
  – Storage realised on the infrastructure of the Computing Centre
• Data interoperability:
  – Encoding follows TEI-P5 as much as possible
  – Enrichments to TEI-P5 will be made publicly available
  – Export in Manuscripta Mediaevalia format
  – Import from / access to other digital libraries (ENRICH, Manuscripta Mediaevalia, Piccard (watermarks))
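As a rough illustration of access by user group, the Python sketch below picks the best image quality a user may see. The group names and the mapping are invented for this example; only the two image forms (thumbnail and pyramid) come from the data model shown earlier in the slides.

```python
# Group names and the mapping to image quality are assumptions for illustration.
QUALITY_BY_GROUP = {"public": None, "registered": "thumbnail", "editor": "pyramid"}

# Ordering of qualities from "no image" to full zoomable pyramid.
RANK = {None: 0, "thumbnail": 1, "pyramid": 2}

def best_quality(groups):
    """Return the highest image quality any of the user's groups allows, or None."""
    allowed = [QUALITY_BY_GROUP.get(group) for group in groups]
    return max(allowed, key=lambda quality: RANK[quality], default=None)

print(best_quality(["public", "editor"]))
print(best_quality(["public"]))
```

Resolving quality from the object that describes an image, rather than from a direct file link, keeps this decision in one place on the server.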

Web-based Environment

• Shared working space:
  – Forum functionality
  – Comments can be made on any of the materials
  – Synchronous access to data
  – Versioning

[Figure: Teuchos architecture. Labels recoverable from the diagram:
– Users: public, user groups, submitting, admin
– Web application: search, visual representation, editing, comments, web pages (CMS), forum
– Teuchos server with Fedora repository
– Textual transmission / works: descriptive text, minor text, edition data, commentary, translation
– Watermarks: watermark description, watermark image, metadata
– Manuscripts: transcription, manuscript description, digitised manuscripts, page images, metadata
– Forum entries, research papers and publications, biographical dictionaries, bibliography (links, bibliographic data)]

Management of Heterogeneous Data

• Different data types:
  – semi-structured documents (XML/TEI)
  – high-resolution RGB images (TIFF)
  – graphics (for the watermarks)
  – eMaterials (research papers in PDF or MS Office format)
  – unstructured documents (forum articles)
• Different annotation levels and depths among the semi-structured documents
• Multilingual data

Implementation challenges

• Navigation across different data collections (semantic linking)
• Document/object retrieval (beyond full-text/form-based retrieval)
• Multilinguality

Identified types of multilinguality

• Multilinguality inside one document
• Multilinguality across documents
• Multilingual terminology inside one data collection type

[Figure labels: Abbreviations, NE, Latin, Ancient Greek]

Terminological multilingual problems

• Watermark names, chapters in manuscript descriptions, and other keywords in different languages
• E.g.

<teuwmm:wmName>
  <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
  <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
</teuwmm:wmName>

Multilinguality across documents

• 5 languages (German, French, English, Italian, Spanish) are accepted as "official" languages inside the community:
  – Comments can be written in any of these languages
  – They have to be linked to the right (and the same) document, independent of the language
  – Multilingual document retrieval

Solutions from Semantic Web

• Semantic descriptions of stored objects (through RDF triples)
• "Collection" ontology for each type of data collection
• Mapping of (multilingual) lexical entries onto the ontology
• Ontological search
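The mapping of multilingual lexical entries onto an ontology can be sketched in Python as below. The concept identifiers (wm:Flower, wm:Scissors), the lexicon contents and the function names are illustrative assumptions, not the Teuchos lexicons; the object file name is taken from the watermark record shown earlier.

```python
# Per-language lexicons mapping surface terms to ontology concepts (illustrative entries).
LEXICON = {
    "de": {"blume": "wm:Flower", "schere": "wm:Scissors"},
    "fr": {"fleur": "wm:Flower", "ciseaux": "wm:Scissors"},
    "en": {"flower": "wm:Flower", "scissors": "wm:Scissors"},
}

# Stored objects indexed by concept, independent of the language of their description.
OBJECTS_BY_CONCEPT = {
    "wm:Flower": ["TEU_WMDesc_Fleur-134m2.xml"],
    "wm:Scissors": [],
}

def concept_for(term, lang):
    """Look up the ontology concept for a term in the given language, or None."""
    return LEXICON.get(lang, {}).get(term.lower())

def search(term, lang):
    """Return all objects attached to the concept the term maps to."""
    return OBJECTS_BY_CONCEPT.get(concept_for(term, lang), [])

print(search("Blume", "de"))
print(search("fleur", "fr"))
```

Both queries reach the same object because retrieval runs over the concept rather than over the words used in any one language.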

Ontological document management and retrieval

[Figure: a Fedora object (Fedora Object ID) and its datastreams:
– DC datastream (information about author, date of publication, encoding type)
– RDF datastream (information about relations with other objects as well as a semantic description of the current object)
– XML datastream (codicological information)
– XML datastream (linguistic information)
– XML datastream (layout information)
– TEXT datastream (transcription text)
– TIFF datastream (image)
Further labels: language tags DE and EN on objects, the collections "watermarks" and "manuscripts", Lexikon DE, Lexikon EN.]

The Watermark Example

<msDesc>
  <msIdentifier>
    <idno>1417</idno>
    <settlement>PARIS</settlement>
    <repository>BIBLIOTHÈQUE NATIONALE, ANCIEN FONDS GREC</repository>
  </msIdentifier>
  .....................
  <physDesc><objectDesc><supportDesc><support>
    <material>Papier</material>
    <watermarkDesc><locus>ff.1-7</locus>, III&apos;-X&apos;: Schere I, sehr ähnlich Br.3666 (Perignan 1448, mit Varianten Bayern 1445, Perignan 1447/50, Mailand 1448).</watermarkDesc>
  .......................

<titleStmt><title>TEI Annotation</title><author>Cristina Vertan</author></titleStmt>
<publicationStmt><availability status="free"><p>Free for academic purposes</p></availability></publicationStmt>

Question: "Which manuscript contains watermarks in the form of scissors very similar to the motif in Br. 3666 in the Milan library?" What sort of model enables the computer to answer correctly and precisely "This document, 1417"?

The Watermark Example: Graphic

[Figure: ontology graph. Node labels: Manuscript, watermark, Scissors, Scissor I, Scissor V, Hr.223, Br.366, Album, Milan, Paris, Library place. Edge labels: is_a, is_similar, is_in, is_in_album, has_a, type_of.]
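A small Python sketch of how a graph of this kind could answer the question from the watermark example. The triples are invented to mirror the node and edge labels of the graphic (the exact edges of the original diagram are not recoverable), the identifiers such as ms:1417 and wm:1417-a are hypothetical, and a plain set of tuples stands in for a real RDF store.

```python
# (subject, predicate, object) triples; contents are illustrative assumptions.
TRIPLES = {
    ("ms:1417", "has_a", "wm:1417-a"),
    ("wm:1417-a", "is_a", "ScissorI"),
    ("ScissorI", "is_a", "Scissors"),
    ("Scissors", "is_a", "Watermark"),
    ("wm:1417-a", "is_similar", "Br.3666"),
    ("Br.3666", "is_in", "Milan"),
    ("Milan", "is_a", "LibraryPlace"),
    ("ms:0001", "has_a", "wm:0001-a"),
    ("wm:0001-a", "is_a", "ScissorV"),
    ("ScissorV", "is_a", "Scissors"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def is_a(node, cls):
    """True if node reaches cls by following is_a edges transitively."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(objects(current, "is_a"))
    return False

def manuscripts_with(motif_class, similar_to, place):
    """Manuscripts having a watermark of motif_class similar to a reference found in place."""
    hits = []
    for subject, predicate, watermark in TRIPLES:
        if predicate != "has_a":
            continue
        if (is_a(watermark, motif_class)
                and similar_to in objects(watermark, "is_similar")
                and place in objects(similar_to, "is_in")):
            hits.append(subject)
    return sorted(hits)

print(manuscripts_with("Scissors", "Br.3666", "Milan"))
```

The transitive is_a step is what lets a query for "scissors" match a watermark recorded only as the more specific motif "Scissor I".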

Fedora content management system (http://www.fedora.info/)

• Content management system widely used in digital libraries
• Versioning mechanism
• User management
• One Fedora object can group several XML datastreams
• Index of objects can be maintained semantically through RDF relations
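To make the "one object, several datastreams" idea concrete, here is a plain Python model. It is not the Fedora API: the class names, field names and the example PIDs are assumptions chosen only to mirror the concepts listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Datastream:
    ds_id: str          # e.g. "DC", "RDF", "TEI" (labels assumed for illustration)
    mime_type: str
    content: bytes = b""

@dataclass
class RepositoryObject:
    pid: str                                          # persistent identifier of the object
    datastreams: dict = field(default_factory=dict)   # ds_id -> Datastream
    relations: list = field(default_factory=list)     # (predicate, target PID) pairs

    def add(self, datastream):
        """Attach a datastream, replacing any earlier one with the same id."""
        self.datastreams[datastream.ds_id] = datastream

    def relate(self, predicate, target_pid):
        """Record an RDF-like relation to another object."""
        self.relations.append((predicate, target_pid))

# Hypothetical PIDs for a manuscript and one of its watermarks.
manuscript = RepositoryObject(pid="teuchos:ms-1")
manuscript.add(Datastream("DC", "text/xml"))
manuscript.add(Datastream("TEI", "text/xml"))
manuscript.relate("has_a", "teuchos:wm-1")
print(sorted(manuscript.datastreams), manuscript.relations)
```

Keeping relations as data on the object is what allows an index over them to be queried semantically rather than by file layout.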

Implementation

• AJAX technologies for the client-server application
• Viewer: Ajax-based ultra-high-resolution image viewer
• For small client-based applications (editors for different components, annotations): Java applet
• Open source software

Viability of the solution

• Starting from a real problem in the humanities, the Teuchos platform can illustrate the potential of Semantic Web and knowledge representation methods.
• Multilinguality is a central problem at all levels of the platform.
• Multilingual problems are increased by the lack of training data and CL tools for ancient languages, especially Ancient Greek.
• Ontological document management and retrieval are a realistic solution, given that the domain is closed.

Current state of development

• Data model and encoding realised for:
  – manuscript descriptions (about 80 descriptions, mainly Aristotle)
  – watermarks (about 100 watermarks, Harlfinger collection)
  – structuring texts (3 manuscripts)
  – digitised images (3 manuscripts)
• First version of the system presented at the conference of the "Deutsche Arbeitsgemeinschaft zur Förderung Byzantinischer Studien"
• A new version will be presented in mid-August at the Fédération internationale des Associations d'études classiques
• Details at: http://beta.teuchos.uni-hamburg.de/
• Implementation regarding knowledge representation is starting now (prototype December 2009)

THANK YOU VERY MUCH FOR YOUR ATTENTION!

http://beta.teuchos.uni-hamburg.de/IT-Workshop