CLARIN Metadata Infrastructure Component Metadata and intermediate solutions Daan Broeder Claus Zinn Dieter van Uytvanck - Max-Planck Institute for Psycholinguistics CLARIN NL Info session 1-7-2009


Page 1

CLARIN Metadata Infrastructure

Component Metadata and intermediate solutions

Daan Broeder, Claus Zinn, Dieter van Uytvanck

Max-Planck Institute for Psycholinguistics

CLARIN NL Info session 1-7-2009

Page 2

Content

Component metadata Infrastructure
Intermediate solutions
CMD Toolkit
  Create CMD components now
Virtual Language Observatory
  What can we do with metadata

Page 3

Context

Other metadata infrastructures in our domain: IMDI, OLAC/DC, TEI

Problems:
  Inflexible: too many (IMDI) or too few (OLAC) fields
  Limited interoperability
  Problematic (unfamiliar) terminology for some sub-communities
  etc.

Page 4

CLARIN Project - CMDI

Metadata infrastructure based on a “Component Metadata Model”

Aims:
  Flexibility
    Researchers should decide for themselves what metadata fits their needs
    Offer ready-made metadata components
    Allow creation of new metadata components where needed
  Interoperability built in
  Complete infrastructure: software for editing, harvesting, exploitation
  Compatibility with existing frameworks: OLAC, IMDI

Page 5

CMDI history

Berlin WP2 workshop, Oct. 2008
Oxford WP2 workshop, Feb. 2009
Documents:
  Metadata Infrastructure for Language Resources and Technology v3, Dec 2008
  Metadata Infra Work Document, Feb 2009
  Requirements for Virtual Collections, Mar 2009, limited circulation
  CMDI developers wiki
Nijmegen Developers Workshop, May 2009

Page 6

Metadata Components

Let's describe a sound recording

TechnicalMetadata: Sample frequency, Format, Size, …

Page 7

Metadata Components

Let's describe a sound recording

Language: Name, Id
TechnicalMetadata

Page 8

Metadata Components

Let's describe a sound recording

Language
TechnicalMetadata
Actor: Sex, Language, Age, Name

Page 9

Metadata Components

Let's describe a sound recording

Language
TechnicalMetadata
Actor
Location: Continent, Country, Address

Page 10

Metadata Components

Let's describe a sound recording

Language
TechnicalMetadata
Actor
Location
Project: …, Name, Contact

Page 11

Metadata Components

Let's describe a sound recording

Components: Language, TechnicalMetadata, Actor, Location, Project

Metadata schema

Metadata profile

Page 12

Metadata Components

Let's describe a sound recording

Components: Language, TechnicalMetadata, Actor, Location, Project

Metadata schema

Metadata description
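
The build-up on the preceding slides can be sketched in code. This is a minimal illustration, not the data model of the CMDI toolkit: the `Component` class, the `make_profile`, `field_paths` and `unknown_fields` helpers, the profile name `SoundRecording`, and all sample values are assumptions made for this example. Only the component and element names come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A reusable metadata building block (hypothetical model)."""
    name: str
    elements: list = field(default_factory=list)    # plain field names
    components: list = field(default_factory=list)  # nested components


# Components named on the slides.
technical = Component("TechnicalMetadata", ["SampleFrequency", "Format", "Size"])
language = Component("Language", ["Name", "Id"])
actor = Component("Actor", ["Sex", "Language", "Age", "Name"])
location = Component("Location", ["Continent", "Country", "Address"])
project = Component("Project", ["Name", "Contact"])


def make_profile(name, parts):
    """Model a profile as a top-level component that groups other components."""
    return Component(name, components=list(parts))


def field_paths(component, prefix=""):
    """List every field the component allows, as slash-separated paths."""
    here = f"{prefix}/{component.name}" if prefix else component.name
    paths = [f"{here}/{element}" for element in component.elements]
    for child in component.components:
        paths.extend(field_paths(child, here))
    return paths


def unknown_fields(profile, description):
    """Return the keys of a description that the profile does not define."""
    allowed = set(field_paths(profile))
    return sorted(key for key in description if key not in allowed)


profile = make_profile(
    "SoundRecording", [language, technical, actor, location, project]
)

# A metadata description for one recording (sample values are invented).
description = {
    "SoundRecording/Language/Name": "Dutch",
    "SoundRecording/TechnicalMetadata/Format": "audio/x-wav",
    "SoundRecording/Actor/Age": "42",
    "SoundRecording/Actor/ShoeSize": "43",  # not defined by any component
}

print(unknown_fields(profile, description))
```

Here the profile plays the role of the schema: a description is acceptable only if every field it uses is defined by one of the selected components.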

Page 13

Metadata Components

Selecting metadata components from the registry

Component registry:
  Location: Country, Coordinates
  Actor: BirthDate, MotherTongue
  Text: Language, Title
  Recording: CreationDate, Type
  Dance: Name, Type

ISOcat concept registry: BirthDate dcr:1000, Country dcr:1001, Language dcr:1002

DCMI concept registry: Title: dc:title

User selects appropriate components to create a metadata description

Semantic interoperability partly solved via references to ISOcat concept registry
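
One way to picture the concept links: each element in the component registry carries a reference to a concept registry entry, and two elements count as the same thing when those references match, whatever the elements are called. The following is a minimal sketch, assuming a dictionary-based registry. The `dcr:` and `dc:` identifiers are the illustrative ones from the slide, and pairing them with elements by matching names is an inference. The `Person` component, its element names, and both helper functions are invented for the example.

```python
# Component registry: component -> element -> concept reference (or None).
# Location, Actor, Text, Recording and Dance come from the slide; links the
# slide does not show are left as None. "Person" is a hypothetical component
# from another community, added to show two names for one concept.
COMPONENT_REGISTRY = {
    "Location": {"Country": "dcr:1001", "Coordinates": None},
    "Actor": {"BirthDate": "dcr:1000", "MotherTongue": None},
    "Text": {"Language": "dcr:1002", "Title": "dc:title"},
    "Recording": {"CreationDate": None, "Type": None},
    "Dance": {"Name": None, "Type": None},
    "Person": {"DateOfBirth": "dcr:1000", "HomeCountry": "dcr:1001"},
}


def elements_for_concept(concept):
    """Find every (component, element) pair that links to the given concept."""
    return sorted(
        (component, element)
        for component, elements in COMPONENT_REGISTRY.items()
        for element, ref in elements.items()
        if ref == concept
    )


def unlinked_elements(selection):
    """List elements in the user's selection that have no concept link yet."""
    return sorted(
        f"{component}.{element}"
        for component in selection
        for element, ref in COMPONENT_REGISTRY[component].items()
        if ref is None
    )


# Differently named elements that share one concept reference.
print(elements_for_concept("dcr:1000"))

# Elements a user would still have to link before they become interoperable.
print(unlinked_elements(["Actor", "Text"]))
```

Elements without a concept reference are the reason interoperability is only partly solved: nothing ties them to elements in other components.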

Page 14

CLARIN MD Life-cycle

[Diagram: Search Service, Joint Metadata Repository, two Metadata Repositories, Relation Registry, ISOcat Concept Registry, DCMI Concept Registry, other Concept Registry, CLARIN Component Registry, Semantic Mapping]

Create metadata schema from selection of existing components. Allow creation of new components if they have references to ISOcat

Perform search/browsing on the metadata catalog using the ISO DCR and other concept registries and CLARIN relation registry

Metadata component profile was selected from metadata component registry

Metadata harvesting by OAI protocol

Metadata descriptions created
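
The search step in this life cycle can be sketched as concept-based matching: a query names a concept, the relation registry expands it to related concepts, and records from different repositories match when any of their fields links to one of those concepts. This is a minimal sketch under stated assumptions: the record layout, the `RELATIONS` table, the record ids, and the equivalence between `dc:language` and `dcr:1002` are all hypothetical and only illustrate the idea.

```python
# Hypothetical relation registry: concept -> concepts treated as equivalent.
RELATIONS = {
    "dcr:1002": {"dc:language"},
    "dc:language": {"dcr:1002"},
}

# Hypothetical joint repository. Each field is (concept reference, value),
# so records created with different schemas stay comparable.
RECORDS = [
    {"id": "imdi-001", "fields": [("dcr:1002", "Dutch"), ("dc:title", "Interview 3")]},
    {"id": "olac-017", "fields": [("dc:language", "Dutch")]},
    {"id": "olac-018", "fields": [("dc:language", "Frisian")]},
]


def expand(concept):
    """Return the concept plus everything the relation registry maps it to."""
    return {concept} | RELATIONS.get(concept, set())


def search(concept, value):
    """Return ids of records with a field linked to the concept or an equivalent."""
    wanted = expand(concept)
    return [
        record["id"]
        for record in RECORDS
        if any(ref in wanted and val == value for ref, val in record["fields"])
    ]


# One query finds records regardless of which concept registry they reference.
print(search("dcr:1002", "Dutch"))
print(search("dc:title", "Interview 3"))
```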

Page 15

Current solution

What if you want to contribute metadata now?
  The CLARIN ad-hoc registry (800+ resources, 130+ tools)
  Provide IMDI or OLAC metadata
  Harvesting (metadata transport) via:
    OAI protocol for OLAC records, or provide static records
    XML harvesting for IMDI
  Harvested metadata will be shown in a special CLARIN catalog.
    Using the standard MPI/LAT catalog software and integrated in VLO specializations
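
For providers exposing OLAC records over OAI, the harvester's side of the exchange looks roughly as follows. The sketch builds OAI-PMH `ListRecords` request URLs and reads record identifiers and the resumption token from a response. It does not contact any server. The base URL and the abbreviated sample response are made up for illustration, and `olac` as the metadata prefix is an assumption about how the repository is configured.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    When continuing a partial list, OAI-PMH expects the resumption token
    as the only argument besides the verb.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        node.text
        for node in root.findall(
            "oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    token_node = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    has_token = token_node is not None and token_node.text
    return identifiers, (token_node.text if has_token else None)


# Hypothetical endpoint and a hand-written, abbreviated sample response
# (datestamps and metadata payloads are omitted).
BASE_URL = "https://repository.example.org/oai"
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, metadata_prefix="olac"))
ids, token = parse_list_records(SAMPLE)
print(ids, token)
print(list_records_url(BASE_URL, resumption_token=token))
```

A real harvester would fetch each URL, store the records, and repeat until the response carries no resumption token.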

Page 16

Use & Create CMD components now

What if you are adventurous?
  The CLARIN metadata toolkit lets you start creating metadata components or use existing ones.
  We have an existing set of components derived from:
    IMDI metadata for sessions
    IMDI catalog metadata
  A small CLARIN NL project is planned to test and report on this.
  But you can try it too!

Page 17

THE END