A knowledge web-based eResearch environment for classical philology
Cristina Vertan, University of Hamburg
cristina.vertan@uni-hamburg.de
10.07.2009 © Vertan, Digital Classicist Seminar, London
Arbeitsstelle Computerphilologie
Knowledge web-based eResearch environment
1. eResearch environment: material types (data modelling, data encoding, data visualisation, functionality)
2. Web-based: data access, user modelling, data interoperability, shared working space, data persistency
3. Knowledge: knowledge organisation, (semi-)automatic knowledge extraction, management of multilingual data, intelligent retrieval of heterogeneous materials
4. Software and hardware infrastructure
5. Current development and roadmap
TEUCHOS: Research infrastructure for classical philology
• Institute for Greek and Latin Philology (Prof. Dr. C. Brockmann, D. Deckers, M. Fricke, B. Pukhova, Dr. C. Vertan)
• Computing Center
• Aristoteles Archive, Free University Berlin (Prof. Dr. D. Harlfinger, L. Koch)
• Initial funding 2007-2010: Wissenschaftliche Literaturversorgungs- und Informationssysteme (LIS; Scientific Library Services and Information Systems)
Arbeitsstelle Computerphilologie (Dr. C. Vertan, Prof. Dr. Walther v. Hahn)
Collaborative projects with Social Sciences and Humanities:
MLT-Cphil: Multilingual Language Technology for classical philology research
• 2009-2010
Manuscript Descriptions
• Focus on descriptions of the transmission of Aristoteles Graecus
  – 2 volumes (1 not yet published) by D. Harlfinger
• Data is modelled by selecting, from the DFG guidelines for manuscript descriptions, the items relevant for palaeographical and codicological investigation
• Encoding is done in TEI-P5/msDescription, with minor extensions to:
  – include more detailed descriptions of watermarks
  – annotate proper names and places in different parts of the manuscript description
  – mark empty folios
• Visualisation: print view & screen view
• Functionality: links with other objects (watermarks, transcriptions, digitised folios, biographical data)
Manuscript Description: Modelling & Encoding
<msItem class="empty"><locus>ff.9v</locus><locus>81v-82v</locus><locus>152v.</locus></msItem>
Empty folios
<watermarkDesc><locus>ff.1-70</locus>: Schere, sehr ähnlich Br.3668 (Rom 1454, mit Varianten überwiegend Italien 1451-1462), sehr ähnlich <ref target="teuxx:wHCiseaux22">Ha., ciseaux 22</ref> (belegt J.1441 und 1443).</watermarkDesc>
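Mentioning a watermark in msDesc (German content: "scissors, very similar to Br. 3668 (Rome 1454, with variants mostly in Italy 1451-1462), very similar to Ha., ciseaux 22 (attested J. 1441 and 1443)")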
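Both fragments above can be processed with standard XML tooling, for instance to list the empty folios of a manuscript or to collect the links from a watermark description into the watermark collection. The following is a minimal Python sketch using only `xml.etree.ElementTree`. It assumes the fragments are well-formed (straight quotes in `target`), treats `teuxx:wHCiseaux22` as an opaque link string rather than a resolved namespace, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two fragments from the slides, with straight quotes so that they parse.
EMPTY_FOLIOS = (
    '<msItem class="empty"><locus>ff.9v</locus>'
    "<locus>81v-82v</locus><locus>152v.</locus></msItem>"
)
WATERMARK_DESC = (
    "<watermarkDesc><locus>ff.1-70</locus>: Schere, sehr ähnlich Br.3668 "
    "(Rom 1454, mit Varianten überwiegend Italien 1451-1462), sehr ähnlich "
    '<ref target="teuxx:wHCiseaux22">Ha., ciseaux 22</ref> '
    "(belegt J.1441 und 1443).</watermarkDesc>"
)


def empty_folios(xml_text: str) -> list[str]:
    """Return the loci of an <msItem> marked class="empty", or [] otherwise."""
    item = ET.fromstring(xml_text)
    if item.get("class") != "empty":
        return []
    return [(locus.text or "").strip() for locus in item.iter("locus")]


def watermark_refs(xml_text: str) -> list[tuple[str, str]]:
    """Return (target, label) for every <ref> inside a <watermarkDesc>."""
    desc = ET.fromstring(xml_text)
    # itertext() on <ref> yields only the text inside it, not the text that follows it.
    return [(ref.get("target"), "".join(ref.itertext()).strip()) for ref in desc.iter("ref")]


print(empty_folios(EMPTY_FOLIOS))      # ['ff.9v', '81v-82v', '152v.']
print(watermark_refs(WATERMARK_DESC))  # [('teuxx:wHCiseaux22', 'Ha., ciseaux 22')]
```

Following the `target` string of each reference is what lets a manuscript description point to a watermark object instead of repeating its data.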
31.03.2009 © Vertan, SCH Exploratory Workshop, Birmingham
Watermarks
• For the moment: several hundred watermarks collected by Prof. Harlfinger, existing in printed form
• The watermarks were collected as twin (or multiple) lists, and there is no clear evidence which element of a list was identified on which page.
• In addition to the watermark motifs, we also collect the countermarks.
Watermark Model
<teuwmo:teuwmObj>
  <teuwmo:wmIdent wmIsCountermark="false">
    <teuwmo:wmObjId>TEU_WMDesc_Fleur-134m2.xml</teuwmo:wmObjId>
    <teuwmm:wmIdentification>
      <teuwmm:wmIdnr>134</teuwmm:wmIdnr>
      <teuwmm:wmCollection>Harlfinger</teuwmm:wmCollection>
      <teuwmm:wmName>
        <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
        <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
      </teuwmm:wmName>
    </teuwmm:wmIdentification>
  </teuwmo:wmIdent>
  <teuwmo:wmManuscriptData>
    <teuwmo:msName>Berol. Ham. 512</teuwmo:msName>
    <teuwmo:msFolio>ff. 213/214; f. 215; auch f. 212</teuwmo:msFolio>
    <teuwmo:msDate></teuwmo:msDate>
  </teuwmo:wmManuscriptData>
  <teuwmo:wmLinks>
    <teuwmo:pictureLink>Fleur-134m2.tif</teuwmo:pictureLink>
    <teuwmo:msDescLink>Fleur-134m2.xml</teuwmo:msDescLink>
    <teuwmo:motifLink>Fleur.xml</teuwmo:motifLink>
  </teuwmo:wmLinks>
</teuwmo:teuwmObj>
Watermark XML-Encoding
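To show how such a watermark object could be consumed by the platform, here is a minimal Python sketch that reads the object into a record with language-tagged names, a countermark flag and its links. The slide does not show the namespace declarations for the `teuwmo`/`teuwmm` prefixes, so the URIs below (`urn:example:...`) are hypothetical placeholders, as are the `Watermark` record and the `parse_watermark` function.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical namespace URIs: the slide uses the prefixes without declaring them.
NS = {"teuwmo": "urn:example:teuwmo", "teuwmm": "urn:example:teuwmm"}

WATERMARK_XML = """<teuwmo:teuwmObj xmlns:teuwmo="urn:example:teuwmo" xmlns:teuwmm="urn:example:teuwmm">
  <teuwmo:wmIdent wmIsCountermark="false">
    <teuwmo:wmObjId>TEU_WMDesc_Fleur-134m2.xml</teuwmo:wmObjId>
    <teuwmm:wmIdentification>
      <teuwmm:wmIdnr>134</teuwmm:wmIdnr>
      <teuwmm:wmCollection> Harlfinger </teuwmm:wmCollection>
      <teuwmm:wmName>
        <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
        <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
      </teuwmm:wmName>
    </teuwmm:wmIdentification>
  </teuwmo:wmIdent>
  <teuwmo:wmManuscriptData>
    <teuwmo:msName>Berol. Ham. 512</teuwmo:msName>
    <teuwmo:msFolio>ff. 213/214; f. 215; auch f. 212</teuwmo:msFolio>
    <teuwmo:msDate></teuwmo:msDate>
  </teuwmo:wmManuscriptData>
  <teuwmo:wmLinks>
    <teuwmo:pictureLink>Fleur-134m2.tif</teuwmo:pictureLink>
    <teuwmo:msDescLink>Fleur-134m2.xml</teuwmo:msDescLink>
    <teuwmo:motifLink>Fleur.xml</teuwmo:motifLink>
  </teuwmo:wmLinks>
</teuwmo:teuwmObj>"""


@dataclass
class Watermark:
    obj_id: str
    number: int
    collection: str
    is_countermark: bool
    names: dict[str, str] = field(default_factory=dict)  # language code -> name
    manuscript: str = ""
    folios: str = ""
    links: dict[str, str] = field(default_factory=dict)  # link element -> target file


def _text(parent: ET.Element, path: str) -> str:
    node = parent.find(path, NS)
    return (node.text or "").strip() if node is not None else ""


def parse_watermark(xml_text: str) -> Watermark:
    root = ET.fromstring(xml_text)
    ident = root.find("teuwmo:wmIdent", NS)
    ident_path = "teuwmm:wmIdentification/"
    # <wmNameLanguage> has no prefix on the slide, so it is matched without a namespace.
    names = {
        n.get("wmLang"): (n.text or "").strip()
        for n in ident.iterfind(ident_path + "teuwmm:wmName/wmNameLanguage", NS)
    }
    # Drop the "{namespace}" part of each tag to get keys such as "pictureLink".
    links = {el.tag.split("}")[-1]: (el.text or "").strip() for el in root.find("teuwmo:wmLinks", NS)}
    return Watermark(
        obj_id=_text(ident, "teuwmo:wmObjId"),
        number=int(_text(ident, ident_path + "teuwmm:wmIdnr")),
        collection=_text(ident, ident_path + "teuwmm:wmCollection"),
        is_countermark=ident.get("wmIsCountermark") == "true",
        names=names,
        manuscript=_text(root, "teuwmo:wmManuscriptData/teuwmo:msName"),
        folios=_text(root, "teuwmo:wmManuscriptData/teuwmo:msFolio"),
        links=links,
    )


wm = parse_watermark(WATERMARK_XML)
print(wm.names)  # {'fr': 'Fleur', 'de': 'Blume'}
```

Keeping the names as a language-keyed dictionary is what later allows the same watermark to be found under "Fleur" or "Blume".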
Digitised Manuscripts
• No critical mass; the stress is on manuscripts that are not available, or even unknown up to now (Lips. Gab. 19)
• We store:
  – folio images (sometimes not complete)
  – the manuscript description
  – watermarks (if any)
  – (partial) transcriptions
  – relations with other critical editions
Teuchos data model for manuscripts (diagram)
• Manuscript object (PID):
  – DC data (author, date, etc.) (XML-DC)
  – manuscript description (TEI)
  – XML list of structuring objects (TEI)
  – structuring-object links, digitisation links ("Dig Links"), watermark links ("WZ-Links", from German Wasserzeichen)
• Structuring objects (PID_1, PID_2, PID_3, ...): XML-TEI data, DC data
• Digitised image objects (PID_1, PID_2, PID_3, ..., PID_n): DC data, metadata (XML/TEI), thumbnail, pyramid
• Watermark objects (PID_1, ...): watermark object data
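The data model can be pictured as a small graph of objects, each identified by a PID and holding its own datastreams, with the manuscript object pointing to structuring, image and watermark objects. Below is a minimal in-memory Python sketch under that reading of the diagram. The class names, relation keys (`structure`, `dig`, `wz`), datastream names and PID strings are hypothetical, and this is not the Fedora API. Following a relation with no targets simply yields an empty list, which matches the challenge that not all types of data are available for all manuscripts.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    pid: str
    kind: str  # e.g. "manuscript", "structure", "image", "watermark"
    datastreams: dict[str, str] = field(default_factory=dict)  # name -> content or file reference
    links: dict[str, list[str]] = field(default_factory=dict)  # relation -> target PIDs


class Repository:
    """In-memory stand-in for the object store (not the Fedora API)."""

    def __init__(self) -> None:
        self._objects: dict[str, RepoObject] = {}

    def add(self, obj: RepoObject) -> None:
        self._objects[obj.pid] = obj

    def linked(self, pid: str, relation: str) -> list[RepoObject]:
        """Follow one relation from an object, skipping PIDs not (yet) stored."""
        targets = self._objects[pid].links.get(relation, [])
        return [self._objects[t] for t in targets if t in self._objects]


def example_repository() -> Repository:
    repo = Repository()
    repo.add(RepoObject(
        pid="ms:1", kind="manuscript",
        datastreams={"DC": "dc.xml", "MSDESC": "msdesc.xml", "STRUCTURE": "structure.xml"},
        links={"structure": ["st:1"], "dig": ["img:1", "img:2"], "wz": []},  # no watermarks recorded
    ))
    repo.add(RepoObject(pid="st:1", kind="structure",
                        datastreams={"DC": "dc.xml", "TEI": "chapter1.xml"}))
    for n in (1, 2):
        repo.add(RepoObject(
            pid=f"img:{n}", kind="image",
            datastreams={"DC": "dc.xml", "METADATA": f"page{n}.xml",
                         "THUMBNAIL": f"page{n}_thumb.jpg", "PYRAMID": f"page{n}.tif"},
        ))
    return repo


repo = example_repository()
print([o.pid for o in repo.linked("ms:1", "dig")])  # ['img:1', 'img:2']
print(repo.linked("ms:1", "wz"))                     # []
```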
Digitised Manuscripts: Encoding
• Encoding follows TEI-P5.
• Structural elements (such as chapters/subchapters, including those in critical editions) are marked with the element "milestone" in the transcription. This milestone points to a complex structure stored in a separate file.
• In this way we avoid concurrent (overlapping) annotation; in fact, we realise a kind of stand-off annotation.
• Links are made not directly to images but to the objects describing the images. In this way, depending on user rights, we can offer different image quality (if any at all).
• Images are stored in TIFF format, allowing a pyramidal structure (for zooming).
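The milestone mechanism can be sketched as follows: the transcription carries only empty TEI `<milestone/>` elements, and the chapter hierarchy lives in a separate file that is looked up by the milestone's `n` value. This is a minimal Python sketch; the format of the separate structure file (`<div id=... label=...>`) and the use of `n` as the lookup key are assumptions, since the slides show neither.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
MILESTONE = f"{{{TEI_NS}}}milestone"

# Transcription: structure is marked only by empty milestones.
TRANSCRIPTION = (
    f'<p xmlns="{TEI_NS}">Prologue. <milestone unit="chapter" n="c1"/>First chapter. '
    '<milestone unit="subchapter" n="c1.1"/>A subchapter.</p>'
)
# Hypothetical stand-off structure file: the hierarchy is kept outside the transcription.
STRUCTURE = '<structure><div id="c1" label="Chapter 1"><div id="c1.1" label="Chapter 1.1"/></div></structure>'


def segments(transcription: str) -> list[tuple[Optional[str], str]]:
    """Split the text at each milestone into (milestone n, text) pairs."""
    root = ET.fromstring(transcription)
    current: Optional[str] = None
    parts: list[tuple[Optional[str], str]] = []
    buffer = [root.text or ""]
    for child in root:
        if child.tag == MILESTONE:
            parts.append((current, "".join(buffer).strip()))
            current, buffer = child.get("n"), []
        else:
            buffer.append("".join(child.itertext()))  # keep the text of other inline elements
        buffer.append(child.tail or "")
    parts.append((current, "".join(buffer).strip()))
    return parts


def structure_paths(structure: str) -> dict[str, list[str]]:
    """Map every division id to its labels from the outermost division down."""
    paths: dict[str, list[str]] = {}

    def walk(node: ET.Element, trail: list[str]) -> None:
        for div in node.findall("div"):
            here = trail + [div.get("label", "")]
            paths[div.get("id", "")] = here
            walk(div, here)

    walk(ET.fromstring(structure), [])
    return paths


def resolve(transcription: str, structure: str) -> list[tuple[list[str], str]]:
    """Attach the stand-off structure to each text segment."""
    paths = structure_paths(structure)
    return [(paths.get(n, []) if n else [], text) for n, text in segments(transcription)]


for path, text in resolve(TRANSCRIPTION, STRUCTURE):
    print(" > ".join(path) or "(front)", "|", text)
```

Because the transcription never nests chapter elements, a second, competing hierarchy (for example the division used by a critical edition) could be kept in another structure file without touching the text.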
Other Materials
• Palimpsests (materials acquired during the Rinascimento Virtuale EU project)
  – high-resolution images
  – several layers on the same image
• Biographical data
• Bibliographical data
• Research articles
Challenges for the data modelling
• Existing sets of data are too small to model objects for all contingencies
• A general model cannot be defined
• Not all types of data (or the respective Fedora objects) are available for all manuscripts
Web-based Environment
• Access:
  – users with different rights (user groups)
• Quality assurance:
  – reviewed uploading of new documents & comments
• Data persistency:
  – URIs and persistent identifiers (DOI)
  – storage realised on the infrastructure of the Computing Centre
• Data interoperability:
  – encoding follows TEI-P5 as closely as possible
  – enrichments to TEI-P5 will be made publicly available
  – export in Manuscripta Mediaevalia format
  – import from / access to other digital libraries (ENRICH, Manuscripta Mediaevalia, Piccard (watermarks))
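Access rights per user group and reviewed uploading can be combined in a simple workflow: members may submit, but a submission only becomes visible after a reviewer accepts it. This is a minimal Python sketch; the group names, the rights they carry, the three review states and the example PID are hypothetical, since the slides only state that users have different rights and that uploads and comments are reviewed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Hypothetical user groups and their rights.
GROUP_RIGHTS: dict[str, set[str]] = {
    "public": {"read"},
    "member": {"read", "submit"},
    "reviewer": {"read", "submit", "review"},
}


def require(group: str, right: str) -> None:
    if right not in GROUP_RIGHTS.get(group, set()):
        raise PermissionError(f"group {group!r} lacks the right {right!r}")


@dataclass
class Submission:
    author: str
    target_pid: str  # the document (or comment thread) the upload belongs to
    content: str
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    items: list[Submission] = field(default_factory=list)

    def submit(self, group: str, sub: Submission) -> None:
        require(group, "submit")
        self.items.append(sub)

    def review(self, group: str, sub: Submission, accept: bool) -> None:
        require(group, "review")
        if sub.status is not Status.PENDING:
            raise ValueError("submission was already reviewed")
        sub.status = Status.PUBLISHED if accept else Status.REJECTED

    def visible(self) -> list[Submission]:
        """Readers only see accepted submissions."""
        return [s for s in self.items if s.status is Status.PUBLISHED]


queue = ReviewQueue()
note = Submission("member-1", "ms:1417", "New comment on the watermark")
queue.submit("member", note)
print(len(queue.visible()))  # 0
queue.review("reviewer", note, accept=True)
print(len(queue.visible()))  # 1
```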
Web-based Environment
• Shared working space:
  – forum functionality
  – comments can be made on any of the materials
  – synchronous access to data
  – versioning
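Synchronous access and versioning interact: if two users edit the same material at once, one edit must not silently overwrite the other. One common way to handle this, shown here as an assumption rather than as the Teuchos or Fedora mechanism, is optimistic concurrency: every save states which version it was based on, a save based on an outdated version is refused, and all earlier versions stay in the history. A minimal Python sketch with hypothetical names:

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when an edit was based on an outdated version."""


@dataclass
class VersionedDatastream:
    versions: list[str] = field(default_factory=list)  # full history, oldest first

    @property
    def current(self) -> int:
        """Number of the newest version (-1 while the history is empty)."""
        return len(self.versions) - 1

    def read(self) -> tuple[int, str]:
        return self.current, self.versions[-1]

    def write(self, based_on: int, content: str) -> int:
        # Refuse the save if someone else saved after `based_on` was read.
        if based_on != self.current:
            raise ConflictError(f"edit based on v{based_on}, but v{self.current} is current")
        self.versions.append(content)
        return self.current


ds = VersionedDatastream(["original transcription"])
v_a, _ = ds.read()  # user A opens version 0
v_b, _ = ds.read()  # user B opens version 0 at the same time
ds.write(v_a, "transcription corrected by A")  # accepted, becomes version 1
try:
    ds.write(v_b, "transcription corrected by B")  # refused: B must reload version 1
except ConflictError as err:
    print(err)
```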
Teuchos architecture (diagram)
• Users: public, user groups
• Web application: search, visual representation, editing, comments, web pages; CMS
• Fedora repository:
  – textual transmission / works: descriptive text, minor text, edition data, commentary, translation
  – forum: forum entries
  – watermarks: watermark description; watermark image metadata
  – manuscripts: transcription, manuscript description, digitised manuscripts metadata, page images metadata
  – research papers and publications, biographical dictionaries, bibliography (each with links and bibliographical data)
• Teuchos server: submitting, admin
Management of Heterogeneous Data
• Different data types:
  – semi-structured documents (XML/TEI)
  – high-resolution RGB images (TIFF)
  – graphics (for the watermarks)
  – eMaterials (research papers in PDF or MS-Office format)
  – unstructured documents (forum articles)
• Different annotation levels and depths among the semi-structured documents
• Multilingual data
Implementation challenges
• Navigation across different data collections (semantic linking)
• Document/object retrieval (beyond full-text/form-based retrieval)
• Multilinguality
Identified types of multilinguality
• Multilinguality inside one document
• Multilinguality across documents
• Multilingual terminology inside one data collection type
(Figure with the labels: abbreviations, NE, Latin, Ancient Greek)
Terminological multilingual problems
• Watermark names, chapters in manuscript descriptions and other keywords appear in different languages, e.g.:
<teuwmm:wmName>
  <wmNameLanguage wmLang="fr">Fleur</wmNameLanguage>
  <wmNameLanguage wmLang="de">Blume</wmNameLanguage>
</teuwmm:wmName>
Multilinguality across documents
• 5 languages (German, French, English, Italian, Spanish) are accepted as "official" languages inside the community:
  – comments can be written in any of these languages
  – they have to be linked to the right (and the same) document, independent of the language
  – multilingual document retrieval
Solutions from the Semantic Web
• Semantic descriptions of stored objects (through RDF triples)
• A "collection" ontology for each type of data collection
• Mapping of (multilingual) lexical entries onto the ontology
• Ontological search
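The four points above can be combined in a very small model: lexical entries in the community's languages are mapped onto concepts of a collection ontology, stored objects are described by triples that point to those concepts, and a search term in any language is resolved to a concept (plus its narrower concepts) before objects are retrieved. This is a minimal plain-Python sketch; the concept identifiers, object identifiers, the `depicts` relation and the lexicon entries are hypothetical, and a real system would keep the triples in an RDF store and query them with SPARQL.

```python
# Hypothetical watermark-collection ontology: concept -> broader concept.
BROADER = {"wm:Lily": "wm:Flower", "wm:Flower": "wm:Motif", "wm:Scissors": "wm:Motif"}

# Lexical entries (language, lower-cased word) mapped onto ontology concepts.
LEXICON = {
    ("de", "blume"): "wm:Flower", ("fr", "fleur"): "wm:Flower", ("en", "flower"): "wm:Flower",
    ("it", "fiore"): "wm:Flower", ("es", "flor"): "wm:Flower",
    ("de", "schere"): "wm:Scissors", ("fr", "ciseaux"): "wm:Scissors",
    ("en", "scissors"): "wm:Scissors", ("it", "forbici"): "wm:Scissors",
    ("es", "tijeras"): "wm:Scissors",
    ("de", "lilie"): "wm:Lily", ("fr", "lys"): "wm:Lily", ("en", "lily"): "wm:Lily",
}

# Semantic descriptions of stored objects as (subject, predicate, object) triples.
TRIPLES = {
    ("obj:wm-134", "depicts", "wm:Flower"),
    ("obj:wm-lily-1", "depicts", "wm:Lily"),
    ("obj:wm-scissors-1", "depicts", "wm:Scissors"),
}


def narrower_or_equal(concept: str) -> set[str]:
    """The concept itself plus every concept whose chain of broader links reaches it."""
    result = {concept}
    for start in BROADER:
        node = start
        while node in BROADER:
            node = BROADER[node]
            if node == concept:
                result.add(start)
                break
    return result


def ontological_search(lang: str, word: str) -> set[str]:
    """Resolve a word in any supported language to a concept, then collect matching objects."""
    concept = LEXICON.get((lang, word.casefold()))
    if concept is None:
        return set()
    wanted = narrower_or_equal(concept)
    return {s for s, p, o in TRIPLES if p == "depicts" and o in wanted}


print(sorted(ontological_search("de", "Blume")))  # ['obj:wm-134', 'obj:wm-lily-1']
```

Since the query goes through the concept rather than the word, "Blume", "fleur" and "flor" return the same objects, which is the property needed for comments and searches in five languages.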
Ontological document management and retrieval (diagram)
Fedora object (Fedora object ID) with datastreams:
• DC datastream (information about author, date of publication, encoding type)
• RDF datastream (information about relations with other objects as well as the semantic description of the current object)
• XML datastream (codicological information)
• XML datastream (linguistic information)
• XML datastream (layout information)
• TEXT datastream (transcription text)
• TIFF datastream (image)
Further diagram labels: language tags DE/EN per datastream; collections: watermarks, manuscripts; Lexikon DE, Lexikon EN (German and English lexicons)
The Watermark Example
<msDesc>
  <msIdentifier>
    <idno>1417</idno>
    <settlement>PARIS</settlement>
    <repository>BIBLIOTHÈQUE NATIONALE, ANCIEN FONDS GREC</repository>
  </msIdentifier>
  ...
  <physDesc>
    <objectDesc>
      <supportDesc>
        <support>
          <material>Papier</material>
          <watermarkDesc><locus>ff.1-7</locus>, III'-X': Schere I, sehr ähnlich Br.3666 (Perignan 1448, mit Varianten Bayern 1445, Perignan 1447/50, Mailand 1448).</watermarkDesc>
          ...
<titleStmt>
  <title>TEI Annotation</title>
  <author>Cristina Vertan</author>
</titleStmt>
<publicationStmt>
  <availability status="free">
    <p>Free for academic purposes</p>
  </availability>
</publicationStmt>
(German content: "Papier" = paper; "Schere I, sehr ähnlich Br.3666 (Perignan 1448, mit Varianten Bayern 1445, Perignan 1447/50, Mailand 1448)" = "scissors I, very similar to Br. 3666 (Perignan 1448, with variants Bavaria 1445, Perignan 1447/50, Milan 1448)")
Question: "Which manuscript contains watermarks in the form of scissors very similar to the motif in Br. 3666 in the Milano library?" What sort of model can make the computer answer correctly and precisely: "This document: 1417"?
The Watermark Example (graphic): a graph with the nodes Manuscript, Watermark, Scissors, Scissors I, Scissors V, Hr. 223, Br. 366, Album, Library place, Milan and Paris, connected by the relations is_a, is_similar, is_in, is_in_album, has_a and type_of.
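One way to let the computer answer the question above is to store the facts from the manuscript description as triples, materialise the transitive `is_a` hierarchy, and evaluate the question as a conjunctive triple pattern, much like a SPARQL basic graph pattern. This is a minimal plain-Python sketch; the relation names follow the labels in the graphic, but the identifiers, the exact edges and the second manuscript (`ms:other`) are hypothetical modelling choices, not the Teuchos ontology.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Facts drawn from the msDesc of manuscript 1417 (Paris) plus one hypothetical distractor.
TRIPLES: set[Triple] = {
    ("ms:1417", "is_in", "place:Paris"),
    ("ms:1417", "has_a", "wm:1417-I"),
    ("wm:1417-I", "is_a", "motif:ScissorsI"),
    ("motif:ScissorsI", "is_a", "motif:Scissors"),
    ("wm:1417-I", "is_similar", "ref:Br.3666"),
    ("ref:Br.3666", "is_in", "place:Milan"),  # variant attested in Milan (Mailand 1448)
    ("ms:other", "has_a", "wm:other-V"),
    ("wm:other-V", "is_a", "motif:ScissorsV"),
    ("motif:ScissorsV", "is_a", "motif:Scissors"),
    ("wm:other-V", "is_similar", "ref:Hr.223"),
}

# "Which manuscript contains scissors watermarks very similar to Br. 3666 in Milan?"
QUERY: list[Triple] = [
    ("?ms", "has_a", "?wm"),
    ("?wm", "is_a", "motif:Scissors"),
    ("?wm", "is_similar", "ref:Br.3666"),
    ("ref:Br.3666", "is_in", "place:Milan"),
]


def isa_closure(triples: set[Triple]) -> set[Triple]:
    """Add every is_a triple implied by transitivity (a tiny inference step)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(closed):
            if p != "is_a":
                continue
            for b2, p2, c in list(closed):
                if p2 == "is_a" and b2 == b and (a, "is_a", c) not in closed:
                    closed.add((a, "is_a", c))
                    changed = True
    return closed


def match(patterns: list[Triple], triples: set[Triple],
          binding: Optional[dict[str, str]] = None) -> list[dict[str, str]]:
    """All variable bindings (terms starting with '?') that satisfy every pattern."""
    binding = binding or {}
    if not patterns:
        return [binding]
    results = []
    for triple in triples:
        new, ok = dict(binding), True
        for term, value in zip(patterns[0], triple):
            if term.startswith("?") and term not in new:
                new[term] = value  # bind a fresh variable
            elif new.get(term, term) != value:
                ok = False  # constant or already-bound variable does not match
                break
        if ok:
            results.extend(match(patterns[1:], triples, new))
    return results


def answer(triples: set[Triple]) -> set[str]:
    return {b["?ms"] for b in match(QUERY, isa_closure(triples))}


print(answer(TRIPLES))  # {'ms:1417'}
```

The inference step matters: no stored triple says directly that the watermark of 1417 is a "scissors" watermark; that follows only from Scissors I being a kind of Scissors.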
Fedora content management system (http://www.fedora.info/)
• Content management system widely used in digital libraries
• Versioning mechanism
• User management
• One Fedora object can group several XML datastreams
• The index of objects can be maintained semantically through RDF relations
Implementation
• AJAX technologies for the client-server application
• Viewer: Ajax-based Ultra High Resolution Image Viewer
• For small client-based applications (editors for different components, e.g. annotations): Java applets
• Open-source software
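The slides do not say how the Ajax viewer uses the pyramidal TIFF images. A common scheme, assumed here, stores the image at full resolution and at successively halved resolutions, cut into square tiles; the viewer picks the smallest level that is still detailed enough for the current zoom and requests only the tiles inside the viewport. This is a minimal Python sketch; the tile size of 256 pixels, the example scan size and the function names are hypothetical.

```python
import math

TILE = 256  # assumed tile edge in pixels


def pyramid_levels(width: int, height: int, tile: int = TILE) -> list[tuple[int, int]]:
    """Size of each level, full resolution first, halving until one tile covers the image."""
    levels = [(width, height)]
    while width > tile or height > tile:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        levels.append((width, height))
    return levels


def level_for_scale(levels: list[tuple[int, int]], scale: float) -> int:
    """Index of the smallest level that still has at least `scale` times the full width."""
    target = levels[0][0] * scale
    for index in range(len(levels) - 1, -1, -1):
        if levels[index][0] >= target:
            return index
    return 0


def tiles_for_view(size: tuple[int, int], x: int, y: int, w: int, h: int,
                   tile: int = TILE) -> list[tuple[int, int]]:
    """(column, row) of every tile overlapping the viewport (x, y, w, h) at one level."""
    lw, lh = size
    first_col, first_row = max(0, x) // tile, max(0, y) // tile
    last_col = (min(lw, x + w) - 1) // tile
    last_row = (min(lh, y + h) - 1) // tile
    return [(c, r) for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


levels = pyramid_levels(6000, 4000)    # a hypothetical folio scan
level = level_for_scale(levels, 0.25)  # quarter-size view
print(levels[level], tiles_for_view(levels[level], 0, 0, 512, 300))
```

Serving only the needed tiles of one level keeps the transferred data roughly proportional to the screen size rather than to the size of the scan.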
Viability of the solution
• Starting from a real problem in the humanities, the Teuchos platform can illustrate the potential of semantic web and knowledge representation methods.
• Multilinguality is a central problem at all levels of the platform.
• Multilingual problems are aggravated by the lack of training data and computational-linguistics (CL) tools for old languages, especially Ancient Greek.
• Ontological document management and retrieval is a realistic solution, given that the domain is closed.
Current state of development
• Data model and encoding realised for:
  – manuscript descriptions (about 80 descriptions, mainly Aristotle)
  – watermarks (about 100 watermarks, Harlfinger collection)
  – structuring texts (3 manuscripts)
  – digitised images (3 manuscripts)
• First version of the system presented at the conference of the "Deutsche Arbeitsgemeinschaft zur Förderung Byzantinischer Studien" (German Association for the Promotion of Byzantine Studies)
• A new version will be presented in mid-August at the Fédération internationale des Associations d'études classiques
• Details at: http://beta.teuchos.uni-hamburg.de/
• Implementation of the knowledge representation part is starting now (prototype December 2009)
THANK YOU VERY MUCH FOR YOUR ATTENTION!
http://beta.teuchos.uni-hamburg.de/IT-Workshop