Senior Project Poster-April19-revised-COAEdited2pastafarians/staticcontent/poster.pdfLessons Learned Technologies Term Frequency Analysis Compute word frequencies in a text ... Interactive

OverviewLinguine aims to allow language science students to explore automated language analyses such as syntactic or semantic analyses.

This project built upon prior senior projects that created a user interface and web API. This year’s project focused on linguistic analysis functionalities as well as system stability and performance.

Analyses & Visualizations

Lessons Learned

Technologies

Term Frequency AnalysisCompute word frequencies in a text

Coreference ResolutionLocate expressions that refer to the same entity in a text

Parsing & Part of Speech TaggingConstruct a dependency parse tree and mark words by part of speech

Sentiment AnalysisEstimate the sentiment of a textalong with its sentences and tokens

Relation ExtractionFind relationship triples between words

Named Entity RecognitionIdentify words by classes such asorganization, place, or time expression

Interactive Text

Interactive Text

Interactive Text

Interactive Text

non

consequatlobortis

vitaePelle

ntesquefelis tellus

Nullaenim

liberoipsum

ac

eu

tincidunt

auctor

condimentum

etsodales

faucibus

volutpat aliquam

amet

ultrices

consectetur

purus

congue

Curabitur

Donec

velit

molestie

Praesent

augue

gravida

ligula

tortor

tristique

idut

sed eros

lorem

tempor

dignissim

ornare

imperdiet

sit

Vestibulum

ante

lectus

elitbibendum

ex

turpis

varius

sem

arcu

venenatiseleifend

iaculis

rutrum

mi quam

mattis

scelerisque

Curaemetus

cubilia

Etiamullamcorper

fermentum

posuere

luctus

erat

orci

Phasellus

urna

maximus

massa

primis

nisi

nunc

Integer

Vivamus

sollicitudin suscipit

Nam

dui

viverra

semper

Proin

est

pharetra

quis

Maecenas

hendrerit

adipiscing

facilisis

vehicula

porta

placerat

odio

neque

eget

dolor

Fusce

convallis

justo

euismod

Nullam

egestas

lacus

diam

Word Cloud

Parse Tree

Contributions

• Consider a distributed model for expensive operations

• Flush out critical system bugs prior to opening up tool for use

• When inheriting a project, budget time for rework or bug fixing

• Sponsor time is valuable – e.g., helped us with domain concepts

Parse Tree

• Increased functionality with 5 new analysis types• Added visualizations for all analysis types• Made multiple usability improvements to UI• Implemented concurrency on Python backend• Integrated Stanford CoreNLP and Illinois Curator• Enabled users to work while analysis is processing

NLTK

LinguineOpen source natural language processing & visualization

Architecture Methodology

Python APIPerforms analyses with Stanford CoreNLP, NLTK

NodeJS API Handles front end interactions such as corpus upload and analysis creation

Analysis Thread Pool Conducts analyses using a group of background server threads

Team Pastafarians: Danielle Gonzalez, Justin Peterson, Jeremy Vasta, Keegan Parrotte

We followed a Scrum methodology, using two week sprints. Several of our team members had experience using Scrum and we were likely to do much of the development independently, so it was the most appropriate choice.

In Fall, we met twice a week and remotely on the weekends. In Spring, we met in person three times a week to increase productivity.

We began each sprint with a sprint planning meeting to assign story points to each of our user stories. We used burndown charts to track our velocity and estimation accuracy.

0

5

10

15

20

25

30

35

1 2 3 4 5 6 7 8 9 10

StoryPo

ints

Sprint

Velocity

00.10.20.30.40.50.60.70.80.91

1 2 3 4 5 6 7 8 9 10

Ratio

ofA

ssigne

dPo

ints

Completed

Sprint

EstimationAccuracy

Sponsor: Cecilia Ovesdotter AlmCoach: Larry Kiser | 2015-2016

NamedEntityRecognition

ParseTree

Coreferece Resolution

Documents

Senior Project Poster-April19-revised-COAEdited2pastafarians/staticcontent/poster.pdfLessons Learned Technologies Term Frequency Analysis Compute word frequencies in a text ... Interactive