CLARIN Metadata Infrastructure
Component Metadata and intermediate solutions
Daan Broeder
Claus Zinn
Dieter van Uytvanck
Max-Planck Institute for Psycholinguistics
CLARIN NL Info session 1-7-2009
Content
  Component metadata infrastructure
  Intermediate solutions
  CMD Toolkit
  Create CMD components now
  Virtual Language Observatory
  What can we do with metadata
What can we do with metadata
Context
Other metadata infrastructures in our domain: IMDI, OLAC/DC, TEI
Problems:
  Inflexible: too many (IMDI) or too few (OLAC) fields
  Limited interoperability
  Problematic (unfamiliar) terminology for some sub-communities
  etc.
CLARIN Project - CMDI
Metadata infrastructure based on a “Component Metadata Model”
Aims:
  Flexibility
    Researchers should decide for themselves what metadata fits their needs
    Offer ready-made metadata components
    Allow creation of new metadata components where needed
  Interoperability built in
  Complete infrastructure: software for editing, harvesting, exploitation
  Compatibility with existing frameworks: OLAC, IMDI
CMDI history
Berlin WP2 workshop, Oct. 2008
Oxford WP2 workshop, Feb. 2009
Documents:
  Metadata Infrastructure for Language Resources and Technology v3, Dec 2008
  Metadata Infra Work Document, Feb 2009
  Requirements for Virtual Collections, Mar 2009, limited circulation
  CMDI developers wiki
Nijmegen Developers Workshop, May 2009
Metadata Components
Let's describe a sound recording
Components added step by step:
  TechnicalMetadata: Sample frequency, Format, Size, …
  Language: Name, Id, …
  Actor: Sex, Language, Age, Name, …
  Location: Continent, Country, Address, …
  Project: Name, Contact, …
The selected components form a metadata profile, which yields a metadata schema
A metadata description is then written against that schema
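The build-up above (components, then a profile, then a schema, then a description) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CMDI toolkit's actual data model: the class names `Component` and `Profile`, the function `validate`, the profile name `SoundRecording`, and all sample values are hypothetical. Only the component and element names come from the slides, and the elided "…" entries are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A reusable metadata component: a name plus the elements it offers."""
    name: str
    elements: tuple[str, ...]


@dataclass
class Profile:
    """A profile is a selection of components for one kind of resource."""
    name: str
    components: list[Component] = field(default_factory=list)

    def schema(self) -> dict[str, set[str]]:
        # The "schema" here is just: which elements are allowed per component.
        return {c.name: set(c.elements) for c in self.components}


def validate(description: dict[str, dict[str, str]], profile: Profile) -> list[str]:
    """Return a list of problems; an empty list means the description fits."""
    schema = profile.schema()
    problems = []
    for comp_name, values in description.items():
        if comp_name not in schema:
            problems.append(f"unknown component: {comp_name}")
            continue
        for element in values:
            if element not in schema[comp_name]:
                problems.append(f"unknown element: {comp_name}.{element}")
    return problems


# Components as named on the slides (hypothetical profile name).
SOUND_RECORDING = Profile("SoundRecording", [
    Component("TechnicalMetadata", ("Sample frequency", "Format", "Size")),
    Component("Language", ("Name", "Id")),
    Component("Actor", ("Sex", "Language", "Age", "Name")),
    Component("Location", ("Continent", "Country", "Address")),
    Component("Project", ("Name", "Contact")),
])

# Illustrative description; the values are invented for the example.
description = {
    "TechnicalMetadata": {"Format": "wav", "Sample frequency": "44100"},
    "Actor": {"Name": "Speaker A", "Shoe size": "42"},
    "Weather": {"Sky": "clear"},
}
problems = validate(description, SOUND_RECORDING)
```

Here `problems` flags `Actor.Shoe size` and the `Weather` component, since neither is part of the selected profile.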
Metadata Components
Selecting metadata components from the registry
Component registry:
  Location: Country, Coordinates
  Actor: BirthDate, MotherTongue
  Text: Language, Title
  Recording: CreationDate, Type
  Dance: Name, Type
ISOcat concept registry: BirthDate dcr:1000, Country dcr:1001, Language dcr:1002
DCMI concept registry: Title: dc:title
User selects appropriate components to create a metadata description
Semantic interoperability partly solved via references to ISOcat concept registry
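How concept references help interoperability can be shown with a small Python sketch: elements with different names in different components are found by one query if they point to the same concept. The concept identifiers below are the ones on the slide; which element links to which concept is partly an assumption (in particular, linking `MotherTongue` to `dcr:1002` is a guess made for the example), and the function names and sample records are hypothetical.

```python
# Element -> concept reference. Identifiers are taken from the slide;
# the MotherTongue link is an ASSUMPTION made for illustration.
CONCEPT_LINKS = {
    "Location.Country": "dcr:1001",
    "Actor.BirthDate": "dcr:1000",
    "Actor.MotherTongue": "dcr:1002",  # assumption, not stated on the slide
    "Text.Language": "dcr:1002",
    "Text.Title": "dc:title",
}


def fields_for_concept(concept_id):
    """All element paths that reference the given concept."""
    return sorted(f for f, c in CONCEPT_LINKS.items() if c == concept_id)


def search(records, concept_id, value):
    """Find records where any element linked to the concept has this value."""
    fields = fields_for_concept(concept_id)
    return [r for r in records if any(r.get(f) == value for f in fields)]


# Invented sample records using different components.
records = [
    {"id": "rec-1", "Actor.MotherTongue": "Dutch"},
    {"id": "rec-2", "Text.Language": "Dutch", "Text.Title": "Example text"},
    {"id": "rec-3", "Text.Language": "German"},
]
hits = [r["id"] for r in search(records, "dcr:1002", "Dutch")]
```

The query is phrased in terms of the concept, not the element name, so `rec-1` and `rec-2` both match even though they use different components.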
CLARIN MD Life-cycle
Diagram elements: Search Service, Semantic Mapping, Joint Metadata Repository, Metadata Repositories, Relation Registry, ISOcat Concept Registry, DCMI Concept Registry, other Concept Registry, CLARIN Component Registry
Steps:
  Create metadata schema from selection of existing components. Allow creation of new components if they have references to ISOcat
  Metadata component profile was selected from metadata component registry
  Metadata descriptions created
  Metadata harvesting by OAI protocol
  Perform search/browsing on the metadata catalog using the ISO DCR and other concept registries and CLARIN relation registry
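The rule that new components are allowed only if they carry concept references could be checked along these lines. This is a minimal sketch, not the registry's real acceptance logic: the function names and the placeholder concept reference are hypothetical, and only the `Dance` component with its `Name` and `Type` elements comes from the slides.

```python
from typing import Optional


def missing_concept_links(elements: dict[str, Optional[str]]) -> list[str]:
    """Names of elements that have no concept reference attached."""
    return sorted(name for name, ref in elements.items() if not ref)


def can_register(elements: dict[str, Optional[str]]) -> bool:
    """A new component is acceptable only if every element is linked."""
    return not missing_concept_links(elements)


# Draft of a user-defined Dance component; the reference is a placeholder,
# not a real ISOcat identifier.
dance_draft = {"Name": "concept:placeholder-name", "Type": None}
unlinked = missing_concept_links(dance_draft)
```

In this draft `Type` has no reference yet, so `can_register(dance_draft)` is false until a link is supplied.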
Current solution
What if you want to contribute metadata now?
  The CLARIN ad-hoc registry (800+ resources, 130+ tools)
  Provide IMDI or OLAC metadata
  Harvesting (metadata transport) via:
    OAI protocol for OLAC records, or provide static records
    XML harvesting for IMDI
  Harvested metadata will be shown in a special CLARIN catalog
    Using the standard MPI/LAT catalog software and integrated in VLO specializations
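For the OAI route, a harvester asks a repository for records with an OAI-PMH `ListRecords` request. The Python sketch below only builds the request URLs and makes no network call. The base URL is a hypothetical placeholder, the helper name is invented, and `olac` as the metadata prefix is an assumption about how the provider exposes its OLAC records.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the data provider's real base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "olac",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token, not the metadataPrefix.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


first_request = list_records_url(BASE_URL)
next_request = list_records_url(BASE_URL, resumption_token="abc/123")
```

A harvester would fetch `first_request`, read the `resumptionToken` from the response if one is present, and repeat with follow-up requests until the repository stops returning a token.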
Use & Create CMD components now
What if you are adventurous?
  The CLARIN metadata toolkit allows you to start creating metadata components or to use existing ones
  We have an existing set of components derived from:
    IMDI metadata for sessions
    IMDI catalog metadata
  A small CLARIN NL project is planned to test and report on this
  But you can try it too!
THE END