Rogelio Nazar & Maarten Janssen, IULA, Universitat Pompeu Fabra, Barcelona

Combining Resources: Taxonomy Extraction from Multiple Dictionaries

Page 2: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

Information from Dictionaries

Dictionaries are a good source of information
Long tradition of taxonomy extraction

Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)

Exploiting machine-readable dictionaries
Parsing definitional phrases
Pattern extraction, shallow parsing
Full treatment of a single dictionary

Page 3: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

There is a lot of information available
Hand-crafted, high-quality resources

Combining yields new data
Taxonomy from multiple dictionaries

Language-independent shallow method
Combining definitions of the same word
Various dictionaries, online versions: DRAE, DGLE, Clave, DEM
Frequency-based

Page 4: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

Dictionaries differ
  ◦ Different lexicon and definitions
  ◦ Even if only for legal reasons

Hyperonym should be the same
  ◦ A cat is an animal
  ◦ Unless there is uncertainty in the hyperonym

Most dictionaries should use the same genus
  ◦ Statistically relevant

Page 5: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

3x: ablandabrevas, persona
2x: com., inútil
1x: substantivo, común, fig.

Page 6: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

Directly from harvested text
  ◦ With begin/end tags

No textual analysis

More than definitions
  ◦ Examples, multiple senses, etc.

Sense matching impossible
  ◦ Entries unsystematic
  ◦ Dictionaries do not match in senses
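
The harvesting step described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `extract_entry`, the idea that each dictionary site has one fixed begin tag and one fixed end tag, and the whitespace clean-up are all hypothetical and not taken from the presentation.

```python
import re


def extract_entry(page: str, begin_tag: str, end_tag: str) -> str:
    """Return the text found between begin_tag and end_tag in a harvested page.

    Hypothetical helper: the begin/end tags are assumed to be fixed strings
    chosen once per dictionary site. No linguistic analysis is performed, so
    examples and all senses of the entry stay in the returned text.
    """
    start = page.find(begin_tag)
    if start == -1:
        return ""
    start += len(begin_tag)
    end = page.find(end_tag, start)
    if end == -1:
        return ""
    raw = page[start:end]
    # Drop any remaining markup and normalise whitespace.
    text = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(text.split())


page = '<html><div id="def"><b>cumbia</b> f. Danza popular.</div><p>menu</p></html>'
print(extract_entry(page, '<div id="def">', "</div>"))
```

Because everything between the two tags is kept, the output mixes definitions, examples and several senses, which is why the method cannot align senses across dictionaries.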

Page 7: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

Minimum number of dictionaries

Raw frequency count
  ◦ Hyperonym tends to be repeated

Candidates have to be words
  ◦ Of the same word-class

Use of a stop-list
  ◦ Dictionary generated
  ◦ Words that occur in more than 10% of entries
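
A minimal sketch of how these filters might fit together is given below. All names (`tokenize`, `build_stoplist`, `hypernym_candidates`) and the default values `min_dicts=3` and `min_count=2` are assumptions made for illustration; the slides only state that a minimum number of dictionaries is required, that raw frequencies are counted, and that the stop-list holds words occurring in more than 10% of entries. The word-class check is left as an optional caller-supplied predicate because the slides do not say how it is implemented.

```python
import re
from collections import Counter

# Runs of letters only (digits and underscores excluded), Unicode-aware.
TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    return [t.lower() for t in TOKEN.findall(text)]


def build_stoplist(all_entries, ratio=0.10):
    """Words that occur in more than `ratio` of all harvested entries."""
    entry_freq = Counter()
    for text in all_entries:
        entry_freq.update(set(tokenize(text)))  # count each word once per entry
    limit = ratio * len(all_entries)
    return {word for word, n in entry_freq.items() if n > limit}


def hypernym_candidates(headword, entries, stoplist,
                        min_dicts=3, min_count=2, top_n=5, same_class=None):
    """Rank candidate hyperonyms for `headword` by raw frequency.

    `entries` holds one harvested text per dictionary for the same headword.
    `same_class` is an optional predicate (hypothetical) for the word-class test.
    """
    if len(entries) < min_dicts:
        return []
    counts = Counter()
    for text in entries:
        counts.update(tokenize(text))  # raw token frequency over all dictionaries
    ranked = []
    for word, n in counts.most_common():
        if n < min_count or word == headword.lower() or word in stoplist:
            continue
        if same_class is not None and not same_class(word):
            continue
        ranked.append((word, n))
    return ranked[:top_n]


entries = [
    "cumbia f. Danza popular de Colombia.",
    "cumbia. Danza y música colombiana.",
    "cumbia: baile popular, danza típica.",
]
print(hypernym_candidates("cumbia", entries, stoplist={"de", "y", "f"}))
```

The three definitions in the usage example are invented for the sketch and do not come from DRAE, DGLE, Clave or DEM.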

Page 8: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;

Page 9: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

WordNet (still) best available taxonomy
  ◦ Not the best resource for evaluation

Automatic verification
  ◦ 100 random nouns
  ◦ Best 5 hyperonymy candidates
  ◦ Match when candidate in chain

Only about 50% accuracy
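
The verification procedure could be sketched as below. The function names and the exact-match criterion (a candidate must equal a whole term in the hypernym chain) are assumptions; the slides do not specify how a candidate is compared with multi-word chain entries. The sample data is abbreviated from the `descubrimiento` and `cumbia` examples shown earlier and is far too small to say anything about the reported figure.

```python
def chain_match(candidates, chain, top_n=5):
    """True if any of the best `top_n` candidates is a term in the hypernym chain.

    Assumption: exact, case-insensitive match against whole chain terms.
    """
    chain_terms = {term.lower() for term in chain}
    return any(c.lower() in chain_terms for c in candidates[:top_n])


def accuracy(results, chains, top_n=5):
    """Share of evaluated nouns with at least one candidate in their chain."""
    evaluated = [word for word in results if word in chains]
    if not evaluated:
        return 0.0
    hits = sum(chain_match(results[w], chains[w], top_n) for w in evaluated)
    return hits / len(evaluated)


# Ranked candidates per noun and (abbreviated) EuroWordNet chains.
results = {
    "descubrimiento": ["acción", "cosa", "efecto"],
    "cumbia": ["danza"],
}
chains = {
    "descubrimiento": ["logro", "revelación", "realización", "acción", "hecho"],
    "cumbia": ["baile regional", "danza popular", "baile social", "baile"],
}
print(accuracy(results, chains))
```

Under exact matching, `danza` does not match `danza popular`, which illustrates how a plausible candidate can still be counted as a miss.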

Page 11: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

WordNet
  ◦ Many intermediate/artificial levels
  ◦ Compulsory hyperonym
  ◦ Contains proper names

Dictionaries
  ◦ More word-senses
  ◦ Alternative definitions (synonymy, paraphrasis, …)

Differences
  ◦ Different choice of hyperonym
  ◦ Different lexicon

Page 14: Combining Resources:  Taxonomy Extraction from Multiple Dictionaries

Questions?