Towards improving automatic text summaries
Vers une amélioration des résumés automatiques de textes
Abdelkrime ARIES
Supervisors: Pr. Zegour & Pr. Hidouci
Research Group: D3 Team
École nationale Supérieure d’Informatique (ESI, ex. INI), Algeria
LCSI laboratory mid-term seminars: April 19th, 2016
Problematic / Extractive methods / Abstractive methods / Demo / Thank you
Plan
1 Problematic
2 Extractive methods
3 Abstractive methods
4 Demo
5 Thank you
Abdelkrime ARIES (ESI 2016) Towards improving automatic text summaries 2/24
Problematic
Motivation, Summarization classification, Extractive vs. Abstractive, Multi-Lingual systems, Objectives
Problematic: Motivation
Why should we summarize?
Saving reading time
Showing content on small devices
Facilitating document selection
Helping in search
Introduction: Summarization classification
Following [1, 2], summarization can be classified by:
Input document
  Source size: Single-document, Multi-document
  Specificity: Domain-specific, General
  Form: Scale, Genre
Purpose
  Audience: Generic, Query-oriented
  Usage: Indicative, Informative
  Expansiveness: Background, Just-the-news
Output document
  Derivation: Extract, Abstract
  Conventionality: Fixed, Floating
  Partiality: Neutral, Evaluative
Problematic: Extractive vs. Abstractive
Extractive:
+ Fast, with fewer resources (CPU + data)
+ Can be easily applied to many languages (statistical)
- Incoherent text
- Selects only pertinent sentences, which may have no relation between them
Abstractive:
+ Good text presentation
+ Redundancy can be dealt with
- Slow, with a lot of resources (CPU + data)
- Hard to implement (language dependent)
Problematic: Multi-Lingual systems
Process more than one language.
Language-independent application:
Fully independent
Partially independent
Also, there are cross-lingual systems.
Problematic: Objectives
Create a multi-lingual system.
Introduce abstractive methods.
Improve our method [3].
Improve readability and coherence.
Extractive methods
AllSummarizer, Improvements, Links
AllSummarizer as example
Extractive methods: AllSummarizer
Input: document(s). Output: summary.
Pre-processing: Normalizer, Segmenter, Stemmer, Stop-word eliminator
  -> list of sentences; list of pre-processed words for each sentence
Processing: Clustering -> list of clusters; Learning -> P(f|C); Scoring -> sentence scores
Extraction: Extraction (given the summary size) -> list of the first highest-scored sentences; ReOrdering -> reordered sentences
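The pre-processing stage above can be illustrated with a minimal Python sketch. It is not AllSummarizer's actual code: the stop-word list, the suffix-stripping `toy_stem` function and every function name are hypothetical placeholders chosen for illustration, and a real system would plug in language-specific tools at each step.

```python
import re
from typing import List, Tuple

# Hypothetical, tiny English stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "at"}


def normalize(text: str) -> str:
    """Normalizer: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> List[str]:
    """Segmenter: naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_stem(word: str) -> str:
    """Stemmer placeholder: strip one common suffix (not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> Tuple[List[str], List[List[str]]]:
    """Return the list of sentences and, for each one, its pre-processed words."""
    sentences = segment(normalize(text))
    words = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        # Stop-word eliminator followed by the stemmer placeholder.
        words.append([toy_stem(t) for t in tokens if t not in STOP_WORDS])
    return sentences, words


sentences, words = preprocess("Summarizing saves   time. Mother stayed at home.")
print(sentences)
print(words)
```

The two returned lists correspond to the two outputs of the pre-processing stage: the original sentences, kept for the final summary, and the word lists used by the processing stage.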
Extractive methods: Improvements
Several improvements have been made to the original AllSummarizer system [3]:
1 Adding more features to the unigram and bigram term frequencies:
  Sentence position
  Sentence length with stop words
  Sentence length without stop words
2 Adding more languages to the pre-processing task (27 languages): Arabic, Bulgarian, Catalan, Czech, German, Greek, English, Spanish, Basque, Persian, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Dutch, Nynorsk, Norwegian, Portuguese, Romanian, Russian, Swedish, Thai, Turkish and Chinese.
Extractive methods: Improvements
3 Testing the summarizer with more than 40 languages (we used default pre-processing for languages without a pre-processing task).
4 Fixing the problem of redundant sentences (especially in the case of multi-document summarization). This was done by calculating the similarity between the last added sentence and the sentence to be added, then judging whether they are similar using the clustering threshold.
5 Estimating the threshold and the features for each language (multi- and single-document summarization). For more information, see our participation in the MultiLing 2015 workshop (SIGDIAL conference) [4].
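The redundancy fix in item 4 can be sketched as follows. This is a minimal illustration, not the AllSummarizer implementation: cosine similarity over word counts is an assumed similarity measure (the slides do not name one), and the function names, the `threshold` value and the toy scores are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bags of words (assumed measure)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select(
    scored: List[Tuple[List[str], float]], threshold: float, max_sentences: int
) -> List[List[str]]:
    """Pick top-scored sentences, skipping a candidate that is too similar
    to the last sentence added to the summary."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    summary: List[List[str]] = []
    for words, _score in ranked:
        if len(summary) >= max_sentences:
            break
        if summary and cosine(summary[-1], words) >= threshold:
            continue  # judged redundant with the last added sentence
        summary.append(words)
    return summary


candidates = [
    (["cat", "sat", "mat"], 0.9),
    (["cat", "sat", "mat", "today"], 0.8),
    (["dog", "bark"], 0.7),
]
print(select(candidates, threshold=0.5, max_sentences=2))
```

Comparing only against the last added sentence, as the slide describes, keeps the check cheap; comparing against every selected sentence would catch more redundancy at a higher cost.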
Extractive methods: Links
Take a look: https://github.com/kariminf/AllSummarizer
Test it: allsummarizer-kariminf.rhcloud.com
Abstractive methods
Our vision, Format handler, Sentence Parsing, Reasoning, Text generation
Our vision
Abstractive methods: Our vision
Input: extracted pertinent sentences. Output: abstractive summary.
Sentence Parsing: Syntactic analysis, Internationalization, Structured format generation
Reasoning: Information processing, Response preparation
Text generation: Style decision & Realizer linking, Concepts transformation, Structured format generation, Realization
Also shown in the diagram: WordNet, Request, Format handler
Abstractive methods: Format handler
To communicate information between sentences, we proposed a new format called STON ("Sentence Object Notation").
It represents the morphological and syntactic characteristics of sentences in a multi-lingual way.
Take a look: https://github.com/kariminf/SentRep
Abstractive methods: Format handler
Format:
Roles: each entity has a role to play in the clause (or sentence); it can be a subject, object, place, time, etc.
Actions: actions are the dynamic part of a clause; they link roles.
Sentences: the Role-Action model cannot represent all information. For instance, successive actions have to be represented somewhere.
Abstractive methods: Format handler
Example: Mother stayed at home.
@r: [
  r:{ id: mother; syn: 10332385; r:}
  r:{ id: home; syn: 3259505; r:}
r:]
@act: [
  act:{ id: stay; syn: 117985; tense: PA; subj: [mother];
    @rel:[
      rel:{ type: P_PLACE; ref: [home]; rel:}
    rel:]
  act:}
act:]
@st: [
  st:{ type: AFF; act: [stay]; st:}
st:]
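To make the Role-Action-Sentence structure of this example easier to follow, here is a minimal Python sketch that mirrors it with dataclasses. It is not the SentRep API: the class names, the field layout and the `dangling_references` check are assumptions made for illustration, and the sketch assumes that `subj` and `ref` entries point to role ids, which holds for this example but may not cover the full format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    id: str
    syn: int  # synset number, copied from the example


@dataclass
class Relation:
    type: str
    ref: List[str]  # assumed to reference role ids


@dataclass
class Action:
    id: str
    syn: int
    tense: str
    subj: List[str]
    rel: List[Relation] = field(default_factory=list)


@dataclass
class Sentence:
    type: str
    act: List[str]  # references action ids


def dangling_references(
    roles: List[Role], actions: List[Action], sentences: List[Sentence]
) -> List[str]:
    """Return every referenced id that is not declared (hypothetical check)."""
    role_ids = {r.id for r in roles}
    action_ids = {a.id for a in actions}
    missing: List[str] = []
    for action in actions:
        refs = list(action.subj) + [r for rel in action.rel for r in rel.ref]
        missing += [r for r in refs if r not in role_ids]
    for sentence in sentences:
        missing += [a for a in sentence.act if a not in action_ids]
    return missing


# "Mother stayed at home." expressed with the classes above.
roles = [Role("mother", 10332385), Role("home", 3259505)]
actions = [
    Action("stay", 117985, "PA", ["mother"], [Relation("P_PLACE", ["home"])])
]
sentences = [Sentence("AFF", ["stay"])]
print(dangling_references(roles, actions, sentences))
```

The three lists map onto the `@r`, `@act` and `@st` blocks: roles are declared once and then referenced by id from actions, and sentences reference actions in the same way.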
Abstractive methods: Sentence Parsing
Work in progress.
For now, we are coding an English2Ston parser.
Syntactic analysis (English): Stanford parser.
So far, we can only parse sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".
Take a look: https://github.com/kariminf/NaLanPar
Abstractive methods: Reasoning
Our aim:
Thoughts are language-independent.
A mind contains many thoughts.
People have thoughts about what others think.
Thoughts have a truth level: belief, thinking, fact, etc.
So ... this will be presented next time.
Abstractive methods: Text generation
Work in progress.
For now, we are coding Ston2English and Ston2French generators.
Sentence realization (English, French): SimpleNLG-EnFr.
So far, we can only generate sentences of the form: "Subject{simple singular} Verb{past, present simple} Object{simple singular}".
Take a look: https://github.com/kariminf/NaLanGen
Demo
Demonstration
So ...
Little has been done; more remains to be done.
Always remember: summarizing saves time.
Bibliography I
[1] E. Hovy and C.-Y. Lin, “Automated text summarization and the SUMMARIST system,” in Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998. Association for Computational Linguistics, 1998, pp. 197–214.
[2] K. Sparck Jones, “Automatic summarising: factors and directions,” in Advances in automatic text summarisation. Cambridge MA: MIT Press, 1999.
[3] A. Aries, H. Oufaida, and O. Nouali, “Using clustering and a modified classification algorithm for automatic text summarization,” ser. Proc. SPIE, vol. 8658, 2013, pp. 865 811–865 811–9. [Online]. Available: http://dx.doi.org/10.1117/12.2004001
[4] A. Aries, D. E. Zegour, and K. W. Hidouci, “AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization,” in Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, Czech Republic: Association for Computational Linguistics, September 2015, pp. 237–244. [Online]. Available: http://aclweb.org/anthology/W15-4634