CLTL: Description of web services and software. Nijmegen 2013
CLTL Software and Web Services
Rubén Izquierdo Beviá
About me
5-year degree in Computer Science (University of Alicante, Alicante, Spain)
National NLP projects and one European project (QALLME) (University of Alicante, Alicante, Spain)
Thesis on NLP and Word Sense Disambiguation (University of Alicante, Alicante, Spain, Sept 2010)
Postdoc position at the DutchSemCor project (Tilburg University, Tilburg, Sept 2011-Sept 2012)
Postdoc position at the OpeNER project (Vrije Universiteit Amsterdam, Sept 2012-)
CLTL software
In general, a common input/output format: KAF, and NAF as an extension of KAF
Single components performing single tasks
Integration of existing modules: adaptation of input/output formats
Development of new ones
KAF: Kyoto Annotation Format
Stand-off, layered, XML-based representation format
Different types of information are stored in different layers
Layers are linked by means of references
Suitable for creating pipelines based on this format
Layers:
Text: tokens
Term: lemmas, part-of-speech, term sentiment, word senses
Entities, chunks, opinions…
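The layering described above can be illustrated with a small sketch. The sample document is hand-written for illustration and is not the output of any real tool. The element and attribute names (wf, term, span, target, wid, tid) are assumptions and should be checked against the KAF specification. The point is only that the term layer holds no text of its own: it points back to the token layer by ID.

```python
import xml.etree.ElementTree as ET

# Hand-written toy document in the spirit of KAF (names are assumptions).
KAF_SAMPLE = """<KAF xml:lang="en">
  <text>
    <wf wid="w1">Hotels</wf>
    <wf wid="w2">opened</wf>
  </text>
  <terms>
    <term tid="t1" lemma="hotel" pos="N">
      <span><target id="w1"/></span>
    </term>
    <term tid="t2" lemma="open" pos="V">
      <span><target id="w2"/></span>
    </term>
  </terms>
</KAF>"""


def terms_with_tokens(kaf_xml):
    """Resolve each term's references back to the token layer."""
    root = ET.fromstring(kaf_xml)
    # Token layer: map token id -> surface form.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    resolved = []
    for term in root.iter("term"):
        # Term layer: follow the <target> references into the token layer.
        words = [tokens[t.get("id")] for t in term.iter("target")]
        resolved.append((term.get("tid"), term.get("lemma"), term.get("pos"), words))
    return resolved
```

Because a new layer only adds references to existing IDs, a later module can append its own layer without touching the layers written by earlier modules.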
NAF: NewsReader Annotation Format
Extension of KAF
Allows cross-document processing, e.g. event coreference
IDs are converted into valid URIs
Stores the same type of information provided by different tools, e.g. the results of two different POS taggers
How the software is provided I
All modules are publicly available on GitHub
CLTL GitHub: http://github.com/cltl
NewsReader GitHub: http://github.com/newsreader
OpeNER GitHub: http://github.com/opener-project/
How the software is provided II
Some are available as web services
Exposed as REST web services
Accept an input stream (KAF/NAF)
Generate an output stream (KAF/NAF)
Easy to call from the command line with curl
Easy to create module pipelines in the same way you create a pipeline of Linux commands
http://wordpress.let.vupr.nl/web-services/
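A minimal sketch of such a pipeline, assuming each service accepts a document in the body of an HTTP POST and returns the enriched document in the response. The endpoint URLs and the Content-Type header are hypothetical, since the slides do not list the actual service addresses or request details.

```python
import urllib.request

# Hypothetical endpoints; the real service URLs are not given in these slides.
STAGES = [
    "http://example.org/tokenizer",
    "http://example.org/pos-tagger",
]


def http_post(url, payload):
    """POST raw bytes to a service and return the response body as bytes."""
    request = urllib.request.Request(
        url,
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/xml"},  # assumed content type
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_pipeline(document, stages, post=http_post):
    """Feed the output of each stage into the next, like a shell pipe."""
    for url in stages:
        document = post(url, document)
    return document
```

Calling run_pipeline(b"Some raw text", STAGES) would then play the role of a chain of curl calls joined with pipes. The post parameter can be swapped for a stub when the pipeline logic has to be checked without network access.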
Our software I
General modules (integrated)
Tokenizers: whitespace based, OpenNLP trained...
Sentence splitters: based on rules, OpenNLP
POS taggers: TreeTagger, OpenNLP POS taggers
Chunker: trained on Alpino data with OpenNLP
Parsers: Alpino (nl), Stanford (en)
Our software II
General modules (developed by us)
WordNet tools: functions to use a WordNet in LMF format
Word Sense Disambiguation systems: UKB (unsupervised), SVM (supervised; for nl, derived from DutchSemCor)
Multiword tagger: tags multiword sequences of terms according to WordNet
OntoTagger: inserts (semantic) labels into the KAF representation on the basis of lemma or WordNet synset representations of the text
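One way a multiword tagger of this kind could work is a greedy longest match of lemma sequences against the multiword entries of a WordNet. This is a sketch of that general technique, not a description of the CLTL module. The function name, the max_len parameter, the underscore-joined lexicon entries and the toy lexicon are all assumptions made for illustration.

```python
def tag_multiwords(lemmas, lexicon, max_len=4):
    """Greedy left-to-right longest match of lemma sequences against a lexicon.

    Returns a list of (start, end, entry) spans, with end exclusive.
    """
    spans = []
    i = 0
    while i < len(lemmas):
        match = None
        # Try the longest candidate first so that longer entries win.
        for n in range(min(max_len, len(lemmas) - i), 1, -1):
            candidate = "_".join(lemmas[i:i + n])
            if candidate in lexicon:
                match = (i, i + n, candidate)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched sequence
        else:
            i += 1
    return spans


# Toy lexicon standing in for the multiword entries of a WordNet (assumption).
TOY_LEXICON = {"take_off", "ice_cream"}
```

In a KAF setting, the returned index spans would be translated into references to the IDs of the terms involved, so that the multiword becomes a unit in the term layer.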
Our software III
General modules (developed by us)
Named Entity Recognizer: detects dates and locations using specific resources + GeoNames
KyBot: extracts tuples and relations from a set of profiles formulated using semantic and structural properties
Our software IV
OpeNER related (developed by us)
Hotel property tagger: detects aspects related to cleanliness, staff, breakfast, rooms…
Term polarity tagger: positive/negative terms, intensifiers, negators…
Opinion miner: detects opinions (target + holder + expression); 2 rule-based versions and 1 machine learning version
Our software V
NewsReader related (developed by us)
Discourse Module: splits incoming texts into headers and paragraphs
Factuality Classifier: classifies whether a statement is factual/probable/possible or not
Event Coreference: compares descriptions of events within and across documents to decide if they refer to the same events