1
Overview Linguine aims to allow language science students to explore automated language analyses such as syntactic or semantic analyses. This project built upon prior senior projects that created a user interface and web API. This year’s project focused on linguistic analysis functionalities as well as system stability and performance. Analyses & Visualizations Lessons Learned Technologies Term Frequency Analysis Compute word frequencies in a text Coreference Resolution Locate expressions that refer to the same entity in a text Parsing & Part of Speech Tagging Construct a dependency parse tree and mark words by part of speech Sentiment Analysis Estimate the sentiment of a text along with its sentences and tokens Relation Extraction Find relationship triples between words Named Entity Recognition Identify words by classes such as organization, place, or time expression Interactive Text Interactive Text Interactive Text Interactive Text non consequat lobortis vitae Pellentesque felis tellus Nulla enim libero ipsum ac eu tincidunt auctor condimentum et sodales faucibus volutpat aliquam amet ultrices consectetur purus congue Curabitur Donec velit molestie Praesent augue gravida ligula tortor tristique id ut sed eros lorem tempor dignissim ornare imperdiet sit Vestibulum ante lectus elit bibendum ex turpis varius sem arcu venenatis eleifend iaculis rutrum mi quam mattis scelerisque Curae metus cubilia Etiam ullamcorper fermentum posuere luctus erat orci Phasellus urna maximus massa primis nisi nunc Integer Vivamus sollicitudin suscipit Nam dui viverra semper Proin est pharetra quis Maecenas hendrerit adipiscing facilisis vehicula porta placerat odio neque eget dolor Fusce convallis justo euismod Nullam egestas lacus diam Word Cloud Parse Tree Contributions Consider a distributed model for expensive operations Flush out critical system bugs prior to opening up tool for use When inheriting a project, budget time for rework or bug fixing Sponsor time is valuable – e.g., helped us with domain concepts Parse Tree Increased functionality with 5 new analysis types Added visualizations for all analysis types Made multiple usability improvements to UI Implemented concurrency on Python backend Integrated Stanford CoreNLP and Illinois Curator Enabled users to work while analysis is processing NL TK Linguine Open source natural language processing & visualization Architecture Methodology Python API Performs analyses with Stanford CoreNLP, NLTK NodeJS API Handles front end interactions such as corpus upload and analysis creation Analysis Thread Pool Conducts analyses using a group of background server threads Team Pastafarians: Danielle Gonzalez, Justin Peterson, Jeremy Vasta, Keegan Parrotte We followed a Scrum methodology, using two week sprints. Several of our team members had experience using Scrum and we were likely to do much of the development independently, so it was the most appropriate choice. In Fall, we met twice a week and remotely on the weekends. In Spring, we met in person three times a week to increase productivity. We began each sprint with a sprint planning meeting to assign story points to each of our user stories. We used burndown charts to track our velocity and estimation accuracy. 0 5 10 15 20 25 30 35 1 2 3 4 5 6 7 8 9 10 Story Points Sprint Velocity 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 Ratio of Assigned Points Completed Sprint Estimation Accuracy Sponsor: Cecilia Ovesdotter Alm Coach: Larry Kiser | 2015-2016 Named Entity Recognition Parse Tree Coreferece Resolution

Senior Project Poster-April19-revised-COAEdited2pastafarians/staticcontent/poster.pdfLessons Learned Technologies Term Frequency Analysis Compute word frequencies in a text ... Interactive

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Senior Project Poster-April19-revised-COAEdited2pastafarians/staticcontent/poster.pdfLessons Learned Technologies Term Frequency Analysis Compute word frequencies in a text ... Interactive

OverviewLinguine aims to allow language science students to explore automated language analyses such as syntactic or semantic analyses.

This project built upon prior senior projects that created a user interface and web API. This year’s project focused on linguistic analysis functionalities as well as system stability and performance.

Analyses & Visualizations

Lessons Learned

Technologies

Term Frequency AnalysisCompute word frequencies in a text

Coreference ResolutionLocate expressions that refer to the same entity in a text

Parsing & Part of Speech TaggingConstruct a dependency parse tree and mark words by part of speech

Sentiment AnalysisEstimate the sentiment of a textalong with its sentences and tokens

Relation ExtractionFind relationship triples between words

Named Entity RecognitionIdentify words by classes such asorganization, place, or time expression

Interactive Text

Interactive Text

Interactive Text

Interactive Text

non

consequatlobortis

vitaePelle

ntesquefelis tellus

Nullaenim

liberoipsum

ac

eu

tincidunt

auctor

condimentum

etsodales

faucibus

volutpat aliquam

amet

ultrices

consectetur

purus

congue

Curabitur

Donec

velit

molestie

Praesent

augue

gravida

ligula

tortor

tristique

idut

sed eros

lorem

tempor

dignissim

ornare

imperdiet

sit

Vestibulum

ante

lectus

elitbibendum

ex

turpis

varius

sem

arcu

venenatiseleifend

iaculis

rutrum

mi quam

mattis

scelerisque

Curaemetus

cubilia

Etiamullamcorper

fermentum

posuere

luctus

erat

orci

Phasellus

urna

maximus

massa

primis

nisi

nunc

Integer

Vivamus

sollicitudin suscipit

Nam

dui

viverra

semper

Proin

est

pharetra

quis

Maecenas

hendrerit

adipiscing

facilisis

vehicula

porta

placerat

odio

neque

eget

dolor

Fusce

convallis

justo

euismod

Nullam

egestas

lacus

diam

Word Cloud

Parse Tree

Contributions

• Consider a distributed model for expensive operations

• Flush out critical system bugs prior to opening up tool for use

• When inheriting a project, budget time for rework or bug fixing

• Sponsor time is valuable – e.g., helped us with domain concepts

Parse Tree

• Increased functionality with 5 new analysis types• Added visualizations for all analysis types• Made multiple usability improvements to UI• Implemented concurrency on Python backend• Integrated Stanford CoreNLP and Illinois Curator• Enabled users to work while analysis is processing

NLTK

LinguineOpen source natural language processing & visualization

Architecture Methodology

Python APIPerforms analyses with Stanford CoreNLP, NLTK

NodeJS API Handles front end interactions such as corpus upload and analysis creation

Analysis Thread Pool Conducts analyses using a group of background server threads

Team Pastafarians: Danielle Gonzalez, Justin Peterson, Jeremy Vasta, Keegan Parrotte

We followed a Scrum methodology, using two week sprints. Several of our team members had experience using Scrum and we were likely to do much of the development independently, so it was the most appropriate choice.

In Fall, we met twice a week and remotely on the weekends. In Spring, we met in person three times a week to increase productivity.

We began each sprint with a sprint planning meeting to assign story points to each of our user stories. We used burndown charts to track our velocity and estimation accuracy.

0

5

10

15

20

25

30

35

1 2 3 4 5 6 7 8 9 10

StoryPo

ints

Sprint

Velocity

00.10.20.30.40.50.60.70.80.91

1 2 3 4 5 6 7 8 9 10

Ratio

ofA

ssigne

dPo

ints

Completed

Sprint

EstimationAccuracy

Sponsor: Cecilia Ovesdotter AlmCoach: Larry Kiser | 2015-2016

NamedEntityRecognition

ParseTree

Coreferece Resolution