The NERD project

Embed Size (px)

DESCRIPTION

Seminar at Ecole Centrale, Paris, France

Text of The NERD project

  • 1.Giuseppe Rizzo

2. What is a Named Entity recognition task? A task that aims to locate and classify the name of aperson or an organization, a location, a brand, aproduct, a numeric expression including time, date,money and percent in a textual document12 March 2012 Seminar @ Ecole Centrale, Paris 2/21 3. History of NER benchmarks CoNLL 2003 and CoNLL 2005 schema (4 types): person, organization, location and miscellaneous language independent task ACE 2004, ACE 2005 and ACE 2007 schema (7 types): person, organization, location, facility, weapon,vehicle and geo-political entity entity recognition, not just name (e.g. description, pronoun) find relationships among entities extracted TAC 2009 (Knowledge Base Track) schema (3 types): person, organization and location create a knowledge base from the named entities extracted ETAPE 2012 (Named Entity Task) schema: Quaero (7 main types, 32 sub-types)12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale -3 4. NER Tools Standalone software GATE Stanford CoreNLP Temis Web APIs 12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale -4 5. Factual comparison of 10 Web NER tools Alchemy DBpe Evri Extr Lup Calais SaploWM YahooZemanta diaGranularityOEN OENOEDOENOEN OENOEDOENOENOEDLanguage ENEN EN EN ENEN EN EN EN EN FRGR*ITFRFR SW FR GRPT*ITSPSP ITSP* PT RU SP SWQuota30000 unl3000 3000 unl 500001333 unl5000 10000(calls/day)Sample C/C++ Java AS Java N/A Java Java Java JS C#ClientsC#JS Java JS Perl PHPJava JavaPHPPHPPHPJS PerlPython Perl PHP5 PHP Python Python Ruby RubyContent150KB 452KB8KB32KB 20KB8KB26KB 80KB 7769KB 970KBchunk12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale -5 6. AlchemyDBpedia EvriExtr Lup Calais Saplo WMYahooZemantaResponseJSON Factual comparison (II) HTML HTM HTML HTML JSON JSONJSONJSON XMLFormatMicroF JSONL JSON JSONMicroF XML XMLJSONXMLRDF JSO RDFRDFaRDFRDFXML N XMLXML RDFEntity324320 5 34 319 95 5 7 13 81typenumberEntityN/AcharN/A word range ofchar N/A POS rangeN/Aposition offsetoffset chars offset offsetof charsClassif.AlchemyDBpedia EvriDBpe DBpedia OpenCN/A ESTER YahooFreeBaseOntologies FreeBasediaLinkedM alais Scema.orgDBDefer.DBpeda DBpedia EvriDBpe DBpedia OpenCN/A DBpedia Wikipe WikipediaVocabulariFreeBase diaLinkedM alaisGeonamdiaIMDBesUSCensusDB es MusicBraiUMBELCIAFactnzOpenCycbook AmazonYAGO WikicomYouTubeMusicBrainzpanies TechCrunCrunchBasech... 12 March 2012Seminar @ Ecole Centrale, Paris6/21 7. Human made benchmarks We performed two evaluation experiments: WEKEX 2011 ISWC 2011 t = (entity, type, URI, relevant) Each field has been rated by a Boolean value: true if correct, false otherwiseRizzo G., Troncy R. (2011), NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data.In: International Semantic Web Conference 2011 (ISWC11), Bonn, Germany. 12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale -7 8. WEKEX 2011 Benchmark Controlled experiment 4 human raters 10 English news articles (5 from BBC and 5 from TheNew York Times) Each rater evaluated each article for 5 extractors 200 total evaluations Fleisss kappa score moderate agreement among ratersRizzo G., Troncy R. (2011), NERD: Evaluating Named Entity Recognition Tools in the Web of Data.In: (ISWC11) Workshop on Web Scale Knowledge Extraction (WEKEX11), Bonn, Germany.12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale -8 9. Resultsdifferent behaviorfor different sources12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale -9 10. ISWC 2011 Benchmark Controlled experiment 10 human raters 2 English news articles from The New York Times each rater evaluated each article for 6 extractors120 total evaluations Fleisss kappa score substantialagreement amongraters 12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 10 11. Results12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 11 12. What is NERD?ontology1REST API2UI3 The NERD ontology has been integrated in the NIF project,a EU FP7 in the context of theLOD2: Creating Knowledge out of Interlinked Data1 http://nerd.eurecom.fr/ontology2 http://nerd.eurecom.fr/api/application.wadl3 http://nerd.eurecom.fr/12 March 2012 Seminar @ Ecole Centrale, Paris 12/21 13. NERD Ontology Align the taxonomies used by the extractors 12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 13 14. Building the NERD ontology NERD typeOccurrence Person 10 Organization 10 Country6 Company6 Location 6 Continent5 City 5 RadioStation 5 Album5 Product5 ... ...12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 14 15. NERD REST API /document /userGET, /annotation/{extractor} POST, JSON/RDF* /extractionPUT, /evaluationDELETE entities : [{ ...entity: Tim Berners-Lee ,type: Person ,uri: "http://dbpedia.org/resource/Tim_berners_lee",nerdType: "http://nerd.eurecom.fr/ontology#Person",startChar: 30,endChar: 45,confidence: 1,relevance: 0.5 }]Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web ExtractionTools. In: European chapter of the Association for Computational Linguistics (EACL12), Avignon, France.12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 15 16. NIF: NLP Interchange Format Framework Different outputs for the NLP toolsOpenCalaisDBpedia Spotlight"_type": "Organization","@URI": "http://dbpedia.org/resource/DBpedia",name": "North Atlantic Treaty Organization", "@types": "DBpedia:Software,DBpedia:Work"organizationtype": "governmental civilian","@surfaceForm": "dbpedia","nationality": "N/A", "@offset": "0","_typeReference": "@support": "11",http://s.opencalais.com/1/type/em/e/Organization","@similarityScore": "0.2387271374464035",... Manual effort required for integration or reuse time consuming need to capture the definition of the attributes used in theresponse format NIF uses RDF for representing NER results asLinked Data 12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale- 16 17. Named Entities as textual annotations Lets consider the document: http://www.w3.org/DesignIssues/LinkedData.html The Semantic Web isnt just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.. All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ...entities: {[entity: W3C, startChar: 23107, endChar: 23110],}12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 17 18. NERD meets NIFModel documents through aset of strings deferencable onthe Web: offset_23107_ 23110 a str:String ; str:referenceContext :offset_0_26546 .Map string to entity: offset_23107_ 23110 sso:oen dbpedia:W3C.Classificationdbpedia:W3Crdf:type nerd:Organization .Rizzo G, Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the LinkedData Cloud. In: (LDOW12) Linked Data on the Web (WWW12), Lyon, France.12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 18 19. NERD Demo12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 19 20. NERD Timeline and Future Workbeginning Comparison of named entity extractorsNERD benchmarks NERD REST API and NERD ontologyLift NERD output results to the LOD cloudtodayNERD smart service: combining the best ofall NER toolsDashboard for improving the NERD userexperience12/03/2012 -Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 20 21. http://nerd.eurecom.fr@giusepperizzo @rtroncy #nerd http://www.slideshare.net/giusepperizzo12/03/2012 - Multimedia Semantics and Interaction - Sminaire Ecole Centrale - 21