Overview
Linguine aims to let language science students explore automated language analyses, such as syntactic or semantic analyses.
This project built upon prior senior projects that created a user interface and web API. This year’s project focused on linguistic analysis functionalities as well as system stability and performance.
Analyses & Visualizations
Lessons Learned
Technologies
Term Frequency Analysis: Compute word frequencies in a text
Coreference Resolution: Locate expressions that refer to the same entity in a text
Parsing & Part of Speech Tagging: Construct a dependency parse tree and mark words by part of speech
Sentiment Analysis: Estimate the sentiment of a text along with its sentences and tokens
Relation Extraction: Find relationship triples between words
Named Entity Recognition: Identify words by classes such as organization, place, or time expression
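As a rough illustration of the simplest analysis listed above, term frequency, here is a minimal sketch that uses only the Python standard library. The function name `term_frequencies` and the regex tokenizer are assumptions made for this example; the poster does not describe how Linguine tokenizes or counts words.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Simple regex tokenizer (an assumption for this sketch; a real
    # pipeline might use a library tokenizer instead).
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


freqs = term_frequencies("The cat sat. The cat ran.")
# The two most frequent words and their counts.
print(freqs.most_common(2))
```

Counts of this kind are the sort of data a word cloud visualization can be drawn from.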
Interactive Text
Word Cloud
Parse Tree
Lessons Learned
• Consider a distributed model for expensive operations
• Flush out critical system bugs prior to opening up tool for use
• When inheriting a project, budget time for rework or bug fixing
• Sponsor time is valuable – e.g., helped us with domain concepts
Contributions
• Increased functionality with 5 new analysis types
• Added visualizations for all analysis types
• Made multiple usability improvements to UI
• Implemented concurrency on Python backend
• Integrated Stanford CoreNLP and Illinois Curator
• Enabled users to work while analysis is processing
NLTK
Linguine: Open source natural language processing & visualization
Architecture and Methodology
Python API: Performs analyses with Stanford CoreNLP and NLTK
NodeJS API: Handles front end interactions such as corpus upload and analysis creation
Analysis Thread Pool: Conducts analyses using a group of background server threads
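The thread pool idea can be sketched with Python's standard `concurrent.futures` module. This is only an illustration under stated assumptions: `run_analysis`, `submit_analysis`, and the pool size of 4 are hypothetical, and the poster does not say which concurrency primitives the Linguine backend actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_analysis(analysis_type, text):
    """Hypothetical stand-in for an expensive analysis of a corpus."""
    words = text.split()
    return {"analysis": analysis_type, "token_count": len(words)}


# A small pool of background worker threads (the size is an assumption).
pool = ThreadPoolExecutor(max_workers=4)


def submit_analysis(analysis_type, text):
    """Queue an analysis and return immediately with a Future."""
    return pool.submit(run_analysis, analysis_type, text)


future = submit_analysis("term_frequency", "a short example corpus")
# The caller is free to do other work here; result() blocks until done.
result = future.result()
print(result)
pool.shutdown()
```

Returning a future instead of blocking on the analysis is one way a server could let users keep working while an analysis is processing.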
Team Pastafarians: Danielle Gonzalez, Justin Peterson, Jeremy Vasta, Keegan Parrotte
We followed a Scrum methodology with two-week sprints. Several team members had prior Scrum experience, and we expected to do much of the development independently, so it was the most appropriate choice.
In Fall, we met twice a week and remotely on the weekends. In Spring, we met in person three times a week to increase productivity.
We began each sprint with a sprint planning meeting to assign story points to each of our user stories. We used burndown charts to track our velocity and estimation accuracy.
[Figure: Velocity. Story points (y-axis, 0 to 35) by sprint (x-axis, sprints 1 to 10).]
[Figure: Estimation Accuracy. Ratio of assigned points completed (y-axis, 0 to 1) by sprint (x-axis, sprints 1 to 10).]
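The two sprint metrics can be computed along the following lines. This is a sketch under assumptions: the poster does not give the team's exact formulas, the function names are hypothetical, and the numbers are invented for illustration and are not the team's data.

```python
def average_velocity(completed_points):
    """Mean number of story points completed per sprint."""
    return sum(completed_points) / len(completed_points)


def estimation_accuracy(assigned_points, completed_points):
    """Per-sprint ratio of assigned story points that were completed."""
    return [c / a for a, c in zip(assigned_points, completed_points)]


# Hypothetical numbers for illustration only, not the team's data.
assigned = [20, 25, 30]
completed = [10, 25, 24]

ratios = estimation_accuracy(assigned, completed)
avg = average_velocity(completed)
print(ratios, avg)
```

A ratio near 1 means the sprint's estimates matched what was delivered; lower values point to overcommitment in that sprint.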
Sponsor: Cecilia Ovesdotter Alm
Coach: Larry Kiser | 2015-2016
Visualization panels: Named Entity Recognition, Parse Tree, Coreference Resolution