LAB II – Product Specification Outline

CS 411W Lab II

Prototype Product Specification

For

LASI

Prepared by: Erik Rogers

Date: 03.April.2013

Version 1.0

Table of Contents

1 Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2 General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communications Protocols and Interfaces
3 Specific Requirements
3.1 Functional Requirements
3.1.1 User Interface
3.1.1.1 Start-up Screen
3.1.1.2 Create New Project Screen
3.1.1.3 Project Preview Screen
3.1.1.4 Processing Screen
3.1.1.5 Results Screen
3.1.2 File Manager
3.1.3 Tagged File Parser
3.1.4 Word Association
3.1.5 Weighting Algorithm
3.2 Assumptions and Constraints
3.2.1 Assumptions
3.2.2 Constraints
3.2.3 Dependencies
3.3 Non-Functional Requirements
3.3.1 Security
3.3.2 Maintainability

List of Figures

Figure 1 - Major Component Model
Figure 2 - LASI's 3-Sector Algorithm Overview
Figure 3 - Document Traversal
Figure 4 - Word Type Class Diagram
Figure 5 - Phrase Type Diagram
Figure 6 - Interface Types
Figure 7 - UI - Top Results Tab
Figure 8 - UI - Word Relationships Tab
Figure 9 - UI - Word Count and Weighting Tab

List of Tables

Table 1. Effects of Assumptions, Dependencies, and Constraints on Requirements

1 Introduction

1.1 Purpose

Linguistic Analysis for Subject Identification (LASI) is a computer application

currently being developed by the CS411 Red Team. Linguistic analysis is the

examination of language form, language meaning, and the ways in which these two

entities synergize to form language context. LASI, a linguistic analyzer that aids the user

as a decision support tool, extracts themes, or specific qualities and characteristics, of a

document or range of documents. Locating the themes of documents is necessary as it

allows the reader to comprehend what has been read; in comprehending what has been

read, the reader can summarize and share the material. LASI will take various document

types as input and return a weighted list of themes in each document individually and any

common themes found over the group of input documents. LASI will not make decisions

for the user directly; rather, it will allow the user to infer information based on the results.

The CS411 Red Team imagines that LASI will not only aid the persons who

presented us with the problem, Dr. Hester and Dr. Meyers (introduced in Lab 1, Section

1), but will also be beneficial to numerous professions. Students will be able to utilize

LASI in order to determine whether publications across the Internet relate to their areas

of study or research paper topics. Teachers could implement LASI in the classroom by

using it to grade student papers or provide examples as to how language is used and

interpreted. Research Analysts and Statisticians should be able to parse numerous

documents in order to quickly locate the topics to crunch data values. Contractors,

Consultants, and other related professions would be able to implement all of the previous

uses to suit their individual or clients’ needs.

1.2 Scope

LASI will implement a handful of features detailed later in this document—a few

of which are advanced in this field of linguistic analysis. LASI will be able to input

multiple documents, of any text file type, and deduce the individual themes of each

document as well as the themes and commonalities between all documents included. The

prototype will allow users to infer important themes without having to manually parse the

document.

1.3 Definitions, Acronyms, and Abbreviations

A.I.D. Process: A process that provides quantitative and qualitative basis to identify

problems and determine the feasibility of solutions.

Analysis: Detailed examination of the elements or structure of something, typically as a

basis for interpretation.

Document: A document herein refers to a formally written, expository paper which

expounds, via a declarative approach, on a relatively quantifiable issue, goal, or

area of research.

Head word: A locally distinct word within a phrase which, by its syntactic associations,

determines the category of the phrase itself.

LASI: Linguistic Analysis for Subject Identification

Linguistic Analysis: The scientific analysis of a language.

Parser: Takes in DOC and DOCX files and converts them to TXT files.

Part of Speech Tagger: Software utility that associates words with the parts-of-speech in

a sentence.

Phrase: An instance of the Phrase class.

Phrase: (Linguistically) A group of words standing together as a conceptual unit.

Phrase Class: The root of the taxonomy of class types which correspond to syntactic roles

at the phrase level and whose instances contain a collection of Words which

together represent a linguistic phrase.

Semantic Analysis: Relating the syntactical structure of words to their language

independent meanings.

Sharp NLP: Written in C#, natural language processing tool used to parse and tag parts-

of-speech.

Strategic Document: Document produced by a client that defines their Goals, Visions

and Missions.

Subject Identification: The process by which the subject matter and thematic content of

documents is determined.

Syntactic Analysis: Identifies key words based on their location in the sentence, rather

than their overall meaning throughout the document.

.TAGGED: The type of file that stores the output of the part-of-speech tagger containing

all of the text of the document with embedded syntactic annotations.

Theme: Subject-object-verb relationships that LASI is attempting to generate from the

input set.

Tag: A label, or the act of attaching a label, that specifies the syntactic role of a selected

element in a document.

Tagged Set: A group of words, whose part of speech and location in a sentence have

been identified by the parser.

WordNet: Compiler and provider of the data files that form the basis for the LASI

thesaurus.

Word Class: The root of the taxonomy of class types which correspond to parts-of-

speech at the word level and whose instances encapsulate each occurrence of a

textually identified word.

Word Weight: A numeric value, associated with each syntactically and lexically unique

word in a written work, indicating its significance.

1.4 References

"WordNet." About  - . Princeton University, 27 Dec. 2012. Web. Fall 2012.

<http://wordnet.princeton.edu/>.

"The Stanford NLP (Natural Language Processing) Group." The Stanford NLP (Natural

Language Processing) Group. Stanford University, n.d. Web. Fall 2012.

<http://www-nlp.stanford.edu/>.

Rogers, Erik P. CS411 Red Team - LASI - Lab 1. Paper. Old Dominion University, 2013.

Print.

1.5 Overview

This product specification provides the hardware and software configuration,

external interfaces, capabilities, and features of the LASI prototype. The information in

the remaining sections of this document includes a detailed description of the hardware,

software, and external interface architecture of the LASI prototype; the key features of

the prototype; the parameters that will be used to control, manage, and establish each

feature; and the performance characteristics of each feature in terms of outputs, displays,

and user interaction.

2 General Description

2.1 Prototype Architecture Description

The LASI prototype is composed of two main components—hardware and

software. It will not implement any hardware; rather, it will require the use of

pre-existing hardware, with required minimum technical specifications, in order to run

successfully and within optimal constraints. LASI’s software is branched into two

subcomponents—the algorithm and user interface. The algorithm consists of three

sectors (a primary analysis, secondary analysis, and tertiary analysis). The user interface

will allow users to interact with and use the algorithm while hiding information

that is irrelevant or likely to confuse the user.

Figure 1 - Major Component Model

The Major Component Model diagram in Figure 1 displays the minimal technical

requirements for hardware (whether it be a physical machine or virtual machine) to run

the software package as well as the separation of the two subcomponents provided within

the software package. LASI’s prototype will require the user to obtain a system with a

quad-core processor, 8 gigabytes of random access memory, and a solid-state storage

drive or better. LASI’s internal calculations and passes will be hidden from the user

through the use of a user interface that will display only the relevant, final results.

2.2 Prototype Functional Description

The entirety of LASI’s algorithm is conceptualized in three phases for effective

programming. Each of these phases is essential to a proper, expected output. The phase-

by-phase breakdown follows.

Figure 2 - LASI's 3-Sector Algorithm Overview

Figure 2 depicts LASI’s 3-sector algorithm overview. Each stage is executed

linearly. The “Primary Analysis” phase will parse input documents word-by-word, tag

each part of speech, and count the frequency of each word and part of speech found. The

“Secondary Analysis” phase will bind pronouns and adjectives to nouns and tag the

respective relationships as phrase types. The “Tertiary Analysis” phase will locate and

relate synonyms based on a set taxonomy, assess subject-object-verb relationships and

assign them a phrase type, parse through each word/phrase relationship and assign

weights, and finally output the final weights to the user interface.

Figure 3 - Document Traversal

The first function that the prototype will undergo is that of parsing the document.

The Document Traversal, presented in Figure 3, shows the structure of a document. This

organization scheme allows LASI to divide and conquer documents by splitting them into

sub types. When LASI first parses a document, it will separate the document into

paragraphs, sentences, and words. The CS411 Red Team has included both clause and

phrase types in this class structure for use in future phases.
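
The traversal described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the prototype's code. The class names, the blank-line paragraph rule, and the punctuation-based sentence rule are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    words: List[str] = field(default_factory=list)


@dataclass
class Paragraph:
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Document:
    paragraphs: List[Paragraph] = field(default_factory=list)


def traverse(text: str) -> Document:
    """Split raw text into paragraphs, then sentences, then words."""
    document = Document()
    # Assumption: paragraphs are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        paragraph = Paragraph()
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for raw in re.split(r"(?<=[.!?])\s+", block.strip()):
            words = re.findall(r"[A-Za-z0-9'-]+", raw)
            if words:
                paragraph.sentences.append(Sentence(words))
        if paragraph.sentences:
            document.paragraphs.append(paragraph)
    return document


sample = "LASI reads documents. It finds themes.\n\nUsers review the results."
doc = traverse(sample)
```

Clause and phrase levels are omitted here; they would sit between Sentence and the word list in the same nested fashion.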

Figure 4 - Word Type Class Diagram

The CS411 Red Team has implemented SharpNLP’s part of speech tagger to

accomplish part of the parsing phase. SharpNLP parses the documents and tags the parts

of speech in a language that LASI’s algorithm can import and make use of. The Word

type construct, built by the CS411 Red Team upon the class diagram in Figure 4, will

wrap textual tokens, imported from SharpNLP’s results, word-by-word and link them by

role. This functionality will allow the prototype’s algorithm to access and manipulate

tokens based on their part of speech and relative use in sentences. Once LASI has linked

each token by role, it will be able to begin the second stage of binding and relating parts

of speech to one another to construct phrases.
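
As a rough illustration of wrapping tagged tokens and linking them by role, the sketch below maps each (token, tag) pair to a small class hierarchy. It is Python for illustration only. The class names, the `wrap` and `link_by_role` helpers, and the use of Penn Treebank tag prefixes are assumptions; this document does not state the tag set or the actual Word subclasses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str


class Noun(Word):
    pass


class Verb(Word):
    pass


class Adjective(Word):
    pass


class Adverb(Word):
    pass


class Pronoun(Word):
    pass


# Assumption: Penn Treebank style tag prefixes; the real mapping is not given here.
TAG_PREFIXES = [("NN", Noun), ("VB", Verb), ("JJ", Adjective), ("RB", Adverb), ("PRP", Pronoun)]


def wrap(token: str, tag: str) -> Word:
    """Wrap one tagged token in the Word subclass matching its part of speech."""
    for prefix, cls in TAG_PREFIXES:
        if tag.startswith(prefix):
            return cls(token)
    return Word(token)  # fallback for tags with no dedicated class


def link_by_role(tagged: List[Tuple[str, str]]) -> Dict[str, List[Word]]:
    """Group wrapped tokens by role so later stages can look them up by type."""
    roles: Dict[str, List[Word]] = {}
    for token, tag in tagged:
        word = wrap(token, tag)
        roles.setdefault(type(word).__name__, []).append(word)
    return roles


tagged = [("The", "DT"), ("quick", "JJ"), ("analyst", "NN"), ("reads", "VBZ"), ("reports", "NNS")]
roles = link_by_role(tagged)
```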

Figure 5 - Phrase Type Diagram

Figure 5 presents the class structure for determining phrase types. Once a

document has passed through the “Primary Phase”, it is eligible for use in the “Secondary

Phase”. Here, each token in the document will be re-analyzed, syntactically, with the

tokens neighboring it. If there is any relevant relationship between tokens within the

same sentence or paragraph, the relationship will be stored and a phrase type will be

assigned.

Figure 6 - Interface Types

In order to aid the construction of phrases, LASI implements an interface

hierarchy, pictured in Figure 6. This interface allows for each independent entity that is

contained within a phrase to assume certain roles based on its characteristics. Said roles can

be used in the “Tertiary Phase” to strengthen the syntactic relationships between

phrases and infer levels of semantic relationships.

Once a document enters the “Tertiary Phase”, it will be parsed, again, token-by-

token in search for synonyms. A collection of database files, provided by WordNet, and

arranged by part of speech, are parsed and each token in the database file is compared to

each token in the document. When a match in the same type database file is found, the

token and its synonym will be inspected by their philosophical category--provided by

WordNet. If the token in the document and its matched synonym are within the same

category, they will be marked as synonyms.
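
The synonym check described above can be sketched as follows, in Python for illustration only. The `THESAURUS` dictionary is a hypothetical in-memory stand-in for the WordNet data files, and its entries and category labels are invented for this example; it does not reflect the real WordNet file format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the WordNet data files, arranged by part of speech.
# Each entry maps a word to (category, synonyms). The values are invented.
THESAURUS: Dict[str, Dict[str, Tuple[str, Set[str]]]] = {
    "noun": {
        "goal": ("cognition", {"objective", "aim", "target"}),
        "objective": ("cognition", {"goal", "aim"}),
        "target": ("artifact", {"mark"}),
    },
}


def find_synonyms(tokens: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return unordered pairs of document tokens that are marked as synonyms."""
    pairs: Set[Tuple[str, str]] = set()
    for word, pos in tokens:
        entries = THESAURUS.get(pos, {})
        if word not in entries:
            continue
        category, synonyms = entries[word]
        for other, other_pos in tokens:
            # Only compare against tokens of the same part of speech.
            if other == word or other_pos != pos or other not in synonyms:
                continue
            other_entry = entries.get(other)
            # Mark the pair only when both words share the same category.
            if other_entry and other_entry[0] == category:
                pairs.add(tuple(sorted((word, other))))
    return pairs


tokens = [("goal", "noun"), ("objective", "noun"), ("target", "noun"), ("aim", "verb")]
synonym_pairs = find_synonyms(tokens)
```

In this toy data, "target" is listed as a synonym of "goal" but sits in a different category, so the pair is rejected.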

LASI will then assign each token in the document a weight determined by its part

of speech. The weighting of each token individually will provide an objective standard

for further weighting modifiers. After each token possesses a weight, LASI will search

for token-to-token relationships, defined in the “Secondary Phase”, and phrase

relationships and assign weight modifiers determined by both syntactic and semantic

relationships. Weights of synonyms will be handled separately to ensure a stable

distribution of importance.

The final weights of each phrase will be leveled by the token standardization. Once

the weights are finalized, they will be sorted. The final component of the “Tertiary

Phase” will allow for the user interface to retrieve the results for display.
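
A minimal sketch of this weighting flow is given below, in Python for illustration only. The base weights, the relationship bonus, and the choice of normalizing weights so they sum to one are all assumptions; this document does not give numeric values, and normalization is only one possible reading of "token standardization".

```python
from typing import Dict, List, Tuple

# Assumption: illustrative base weights per part of speech.
BASE_WEIGHTS = {"noun": 1.0, "verb": 0.8, "adjective": 0.5, "adverb": 0.3}
# Assumption: flat bonus added for each relationship a word takes part in.
RELATIONSHIP_BONUS = 0.25


def weigh(tokens: List[Tuple[str, str]], relationships: Dict[str, int]) -> List[Tuple[str, float]]:
    """Assign base weights, apply relationship modifiers, normalize, and sort."""
    weights: Dict[str, float] = {}
    for word, pos in tokens:
        # Each occurrence adds the base weight for its part of speech.
        weights[word] = weights.get(word, 0.0) + BASE_WEIGHTS.get(pos, 0.1)
    for word, count in relationships.items():
        if word in weights:
            weights[word] += RELATIONSHIP_BONUS * count
    total = sum(weights.values())
    normalized = {w: v / total for w, v in weights.items()} if total else {}
    # Highest weight first; ties are broken alphabetically for stable output.
    return sorted(normalized.items(), key=lambda item: (-item[1], item[0]))


tokens = [("strategy", "noun"), ("defines", "verb"), ("clear", "adjective"), ("strategy", "noun")]
ranked = weigh(tokens, {"strategy": 2, "defines": 1})
```

The sorted list is the form a user interface could read directly for display.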

2.3 External Interfaces

LASI has been designed as a stand-alone and open-source software application that

is to be run on a local (or virtual) machine such as a laptop or desktop computer. In this

regard, external interfaces are limited to standard hardware, provided by the user, and all

third-party software has been incorporated internally. Users will not be required to

purchase hardware specific for the prototype, nor will they need to make use of outside

software.

2.3.1 Hardware Interfaces

No hardware interfaces will be constructed for this prototype. A virtual machine,

hosted by Old Dominion University, will be used in order to demonstrate LASI. LASI

can also be run on a physical machine within the lab.

2.3.2 Software Interfaces

No software interfaces will be required for this prototype to run. LASI contains

all of the functions for, and calls to, third-party software internally. Output results can be

exported to Microsoft Office applications if the user so chooses, but those applications are not

required for LASI to function.

2.3.3 User Interfaces

One of the key features of LASI is the ability to graphically display the parsed

results in a manageable format. The CS411 Red Team will provide three tabs: Top

Results, Word Relationships, and Word Count and Weighting. These three tabs allow the

user to view the results in three separate modes and provide sub tabs to view each

document individually in addition to viewing the collective results of all documents.

Figure 7 - UI - Top Results Tab

Figure 7 illustrates the Top Results tab. This tab allows the user to view the top

results, in chart form, that are the most likely themes for the document or documents

analyzed. Results are available for each document individually as well as the collection

of documents as a single entity. These results can be exported to Microsoft Office

applications.

Figure 8 - UI - Word Relationships Tab

The Word Relationships tab, displayed in Figure 8, gives users the option to view

the word relationships, including parts of speech, in each document individually. In this

view, users will be able to mouse over each word in order to see the relationships,

individual word statistics, and phrase statistics. A key will be added to this view to match

the color-coding.

Figure 9 - UI - Word Count and Weighting Tab

Figure 9 displays the Word Count and Weighting tab, which will show each

word’s count and weight for both the individual and collective documents. The word

count variables for each word will be represented by a whole number as well as a

percentage of frequency in comparison to other words. Weights will be represented as a

decimal.
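
The count and percentage figures described for this tab could be computed along the following lines. This is a Python sketch for illustration only; the `word_statistics` helper and the sample documents are hypothetical, and case-insensitive counting is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_statistics(words: List[str]) -> Dict[str, Tuple[int, float]]:
    """Return each word's whole-number count and its percentage of all words."""
    # Assumption: counting is case-insensitive.
    counts = Counter(word.lower() for word in words)
    total = sum(counts.values())
    return {word: (count, 100.0 * count / total) for word, count in counts.items()}


documents = {
    "a.txt": ["Plan", "plan", "goal", "Plan"],
    "b.txt": ["goal", "vision"],
}
# Individual scope: one table per document.
per_document = {name: word_statistics(words) for name, words in documents.items()}
# Collective scope: all documents treated as a single entity.
collective = word_statistics([w for words in documents.values() for w in words])
```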

LASI gives users the option to print or export their results. Said results will be

exportable to Microsoft Office applications (such as Excel, Word, and PowerPoint) as

well as graphical images (with .JPG, .PNG, etc. extensions) and in PDF format. The

CS411 Red Team hopes that this feature will allow for extended use of the results.

2.3.4 Communications Protocols and Interfaces

No communication protocols or interfaces will be required for the prototype to run.

LASI was built to run without connection to sources external to the machine it is being

executed on. In order to demonstrate LASI’s functionality, Transmission Control

Protocol/Internet Protocol (TCP/IP) over a standard Ethernet connection will be utilized

to access the designated virtual machine.

3 Specific Requirements

The following section describes the specific functional and non-functional

requirements along with the assumptions and constraints of the LASI prototype.

3.1 Functional Requirements

The functional requirements describe the capabilities of the LASI prototype. They

describe what the product must do in order to meet the previously discussed goals and

objectives of the project.

3.1.1 User Interface

The LASI GUI is the way the user interacts with LASI and views the results.

3.1.1.1 Start-up Screen

The Start-up Screen will provide two distinct paths that provide access to LASI’s

functionality. It will allow the user to create or load a project.

1. The user shall be able to create new projects

2. The user shall be able to load existing projects

3.1.1.2 Create New Project Screen

The Create New Project Screen guides the user through the new project creation

process. It will prompt the user to enter the necessary information. During this process

the screen will display a running list of the documents selected for analysis. The

following functional capabilities shall be provided:

1. The user shall be able to add files to project

a. DOC

b. DOCX

c. TXT

2. The user shall not be able to load any other file types

3. The program shall only allow a maximum of five documents to be loaded into a

single project.

4. Documents added to the project will be displayed in a document queue.

5. Documents can be removed from the document queue.

6. All fields must be correctly filled out to create a new project.
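
The requirements above can be sketched as a small validation class, shown in Python for illustration only. The `Project` class, its method names, and the error messages are hypothetical. The only "field" checked here is the project name, since this document does not list the required fields.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".doc", ".docx", ".txt"}  # requirement 1
MAX_DOCUMENTS = 5                               # requirement 3


class Project:
    def __init__(self, name: str) -> None:
        # Requirement 6, reduced to the one field this sketch models.
        if not name.strip():
            raise ValueError("project name is required")
        self.name = name
        self.queue: List[Path] = []  # requirement 4: the document queue

    def add_document(self, path: str) -> None:
        candidate = Path(path)
        # Requirement 2: reject every other file type.
        if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {candidate.suffix or '(none)'}")
        if len(self.queue) >= MAX_DOCUMENTS:
            raise ValueError("a project may hold at most five documents")
        self.queue.append(candidate)

    def remove_document(self, path: str) -> None:
        # Requirement 5: documents can be removed from the queue.
        self.queue.remove(Path(path))


project = Project("demo")
for index in range(5):
    project.add_document(f"report{index}.docx")
```

Checking the extension before the queue length means an unsupported file is always reported as such, even when the queue is already full.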

3.1.1.3 Project Preview Screen

The Project Preview Screen will provide a preview of the documents selected

during the Create New Project Screen. It allows the user to verify that the correct

documents have been selected, and remove or add additional documents. This screen

will allow the user to start the analysis process. The following functional capabilities

shall be provided:

1. LASI shall provide a preview of uploaded documents.

a. Each tab in the document preview shall display the title of its document

b. Each tab shall display the text of the document.

2. The user shall be able to add accepted files to a project.

a. The Program shall only allow a maximum of 5 files.

3. The user shall be able to remove documents from a project.

4. LASI must have at least 1 document to start analyzing.

3.1.1.4 Processing Screen

The Processing Screen will display a moving graphic to show that the system has

not frozen. The user will also be able to interrupt analysis. The following functional

capabilities shall be provided:

1. The user shall be able to interrupt analysis.

a. The user shall be returned to the Project Preview Screen.

b. All temporary data will be discarded.

2. LASI shall display a visual indication that analysis is still ongoing.

3.1.1.5 Results Screen

The Results Screen will allow the user to toggle between different scopes,

perspectives, and levels of detail. The following functional capabilities shall be provided:

1. LASI shall render results in multiple views.

a. The Top Results View shall provide a visualized summary of the analysis.

a.1. The user shall be able to toggle between different graphical views

of the top results.

a.2. This view shall allow the user to toggle between individual and

collective document scopes.

b. The Word Relationships View shall provide visuals to show all

relationships and bindings of words throughout each document.

c. Word Count & Weighting View

c.1. The user shall be presented with the quantitative data used in the

analysis.

c.2. This view shall allow the user to toggle between individual and

collective document scopes.

2. LASI shall be able to export results.

3.1.2 File Manager

The File Manager verifies that the documents loaded into a project are of the

types allowed. It provides file conversion routines to format documents into plain text. It

also manages the tagging process and the resulting TAGGED files. The following

functional capabilities shall be provided:

1. The file manager shall accept a path to a document.

2. The file manager shall verify that the document is in one of the following file

formats:

a. DOC

b. DOCX

c. TXT

3. The file manager must be able to convert a DOC file to DOCX.

4. The file manager must be able to convert a DOCX file to TXT.

5. The File Manager shall invoke the SharpNLP tagger to process each TXT file into

a new TAGGED file containing:

a. The original text of the document.

b. The part of speech of each word.

c. The type of every phrase.

6. The file manager shall provide functionality to back up the entire project

directory.
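
The conversion chain implied by requirements 2 through 5 (DOC to DOCX, DOCX to TXT, TXT to TAGGED) can be sketched as a planning function, shown in Python for illustration only. It computes the file names each step would produce and performs no real conversion or tagging. The `conversion_plan` helper and the lowercase ".tagged" extension are assumptions.

```python
from pathlib import Path
from typing import List

ACCEPTED = {".doc", ".docx", ".txt"}              # requirement 2
NEXT_FORMAT = {".doc": ".docx", ".docx": ".txt"}  # requirements 3 and 4


def conversion_plan(path: str) -> List[str]:
    """List the file names produced on the way from the input to a TAGGED file."""
    current = Path(path)
    if current.suffix.lower() not in ACCEPTED:
        raise ValueError(f"unsupported file type: {current.suffix or '(none)'}")
    steps = [current.name]
    # Walk the chain until the document is plain text.
    while current.suffix.lower() in NEXT_FORMAT:
        current = current.with_suffix(NEXT_FORMAT[current.suffix.lower()])
        steps.append(current.name)
    # Assumption: the tagger output is stored with a ".tagged" extension.
    steps.append(current.with_suffix(".tagged").name)
    return steps


plan = conversion_plan("plan.DOC")
```

Expressing the chain as a lookup table keeps the DOC case from needing its own direct-to-TXT converter, which matches the two-step conversion the requirements describe.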

3.1.3 Tagged File Parser

The Tagged File Parser loads the TAGGED files and creates a data structure in-

memory of the documents. The following functional capabilities shall be provided:

1. The Tagged File parser shall only accept a TAGGED file.

2. The Tagged File parser shall create an instance of the Word subclass

corresponding to the annotation embedded for that word in the TAGGED file.

3. The Tagged File parser shall create an instance of the Phrase subclass

corresponding to the annotation embedded for that phrase in the TAGGED file.
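
A minimal parsing sketch follows, in Python for illustration only. The layout of a TAGGED file is not specified in this document, so the bracketed "[NP word/TAG ... ]" format used here is an assumption. For brevity, the sketch records the annotation in a field instead of instantiating a distinct subclass per tag as the requirements call for.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    tag: str


@dataclass
class Phrase:
    kind: str
    words: List[Word] = field(default_factory=list)


# Assumed chunk layout: "[KIND token/TAG token/TAG ]".
CHUNK = re.compile(r"\[(\w+)\s+([^\]]*)\]")


def parse_tagged(line: str) -> List[Phrase]:
    """Build Phrase and Word objects from one line of assumed TAGGED text."""
    phrases: List[Phrase] = []
    for kind, body in CHUNK.findall(line):
        words: List[Word] = []
        for token in body.split():
            if "/" not in token:
                continue  # skip anything that is not a word/TAG pair
            text, _, tag = token.rpartition("/")
            words.append(Word(text, tag))
        phrases.append(Phrase(kind, words))
    # Tokens outside brackets (such as "./.") are ignored in this sketch.
    return phrases


line = "[NP The/DT analyst/NN ] [VP reads/VBZ ] [NP reports/NNS ] ./."
phrases = parse_tagged(line)
```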

3.1.4 Word Association

The Word Association algorithm will associate words and phrases to one another

based on their POS and their syntax within the document. The following functional

capabilities shall be provided:

1. The Subject binder determines which noun phrases are the subjects of verb

phrases.

2. The Object binder determines which noun phrases are the direct objects of verb

phrases.

3. The Object binder determines which noun phrases are the indirect objects of verb

phrases.

4. The Thesaurus correctly identifies synonyms.

5. An adjective or adjective phrase describing a noun or noun phrase is bound as a

describer to that noun or noun phrase.

6. A noun or noun phrase that is the subject of a verb or verb phrase is bound as a

subject to that verb or verb phrase.

7. A noun or noun phrase that is the direct object of a verb or verb phrase is bound as

a direct object to that verb or verb phrase.

8. A noun or noun phrase that is the indirect object of a verb or verb phrase is bound

as an indirect object to that verb or verb phrase.

9. Adverb phrases are associated with the adjective phrases or verb phrases that they

modify.
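
As a rough illustration of the subject and direct object binding described above, the sketch below applies a deliberately naive positional heuristic: the nearest noun phrase before a verb phrase is taken as its subject, and the nearest one after it as its direct object. This is not the LASI binding algorithm. The `Phrase` class, its field names, and the "NP"/"VP" labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    kind: str  # "NP" or "VP" (hypothetical labels)
    text: str
    subject: Optional["Phrase"] = None
    direct_object: Optional["Phrase"] = None


def bind_subjects_and_objects(phrases):
    """Bind each verb phrase to its nearest preceding and following noun phrase.

    Naive heuristic for illustration only; it ignores indirect objects,
    passive voice, and clause boundaries.
    """
    for i, phrase in enumerate(phrases):
        if phrase.kind != "VP":
            continue
        before = [p for p in phrases[:i] if p.kind == "NP"]
        after = [p for p in phrases[i + 1:] if p.kind == "NP"]
        phrase.subject = before[-1] if before else None
        phrase.direct_object = after[0] if after else None
    return phrases
```

For the phrase sequence "The dog" (NP), "chased" (VP), "the cat" (NP), this binds "The dog" as the subject and "the cat" as the direct object of "chased".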

3.1.5 Weighting Algorithm

The Weighting Algorithm will calculate numeric weights for each Word and

Phrase based on their syntactic associations. Based on this analysis, themes are

assembled. The following functional capabilities shall be provided:

1. Each Word instance shall start with an equivalent initial weight.

2. Each Phrase instance shall start with an equivalent initial weight.

3. The weighting algorithm shall update the previous weight of each Word when it is encountered again.

4. The weighting algorithm shall update the previous weight of each Word when the association count is

incremented.

5. The weighting algorithm shall increase the weight of a Word if a synonym is

encountered.

6. The weighting algorithm shall increase the weight of a Word if:

a. A Word is associated with it.

b. A Phrase is associated with it.

7. The weighting algorithm shall increase the weight of a Phrase if:

a. A Word is associated with it.

b. A Phrase is associated with it.

8. The algorithm shall exclude weights of commonly used words such as 'the', 'to', 'a',

etc. on an individual word-by-word basis.

9. The distance between words and phrases shall be used as a weight modifier.

10. Each Word instance shall store its weight with respect to:

a. The individual document containing it.

b. All of the documents in the project.

11. Each Phrase instance shall store its weight with respect to:

a. The individual document containing it.

b. All of the documents in the project.

12. There shall be an aggregate result computed across all documents.
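
A few of the requirements above (equal initial weights, an increase on repeated encounters, exclusion of common words, and per-document plus aggregate weights) can be sketched together as follows. The numeric constants, the stop-word list, and the function name are assumptions chosen for illustration. Synonyms, associations, and the distance modifier are omitted.

```python
from collections import Counter

# Assumed values; the specification does not fix any of these.
STOP_WORDS = {"the", "to", "a"}
INITIAL_WEIGHT = 1.0
REPEAT_BONUS = 1.0


def weigh_documents(documents):
    """Compute word weights per document and across all documents.

    documents: dict mapping a document name to its list of words.
    Returns (per_document, aggregate), where per_document maps each
    name to a Counter of weights and aggregate sums them.
    """
    per_document = {}
    aggregate = Counter()
    for name, words in documents.items():
        weights = Counter()
        for word in words:
            key = word.lower()
            if key in STOP_WORDS:
                continue  # common words are excluded from weighting
            if key in weights:
                weights[key] += REPEAT_BONUS  # encountered again
            else:
                weights[key] = INITIAL_WEIGHT  # first encounter
        per_document[name] = weights
        aggregate.update(weights)  # adds this document's weights to the total
    return per_document, aggregate
```

Keeping one Counter per document and a separate aggregate mirrors requirements 10 through 12, where each weight is stored with respect to both the individual document and the whole project.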

3.2 Assumptions and Constraints

The LASI prototype will operate on a set of assumptions and constraints that will

act as boundaries for the prototype functionality. Table 1 contains the full list of

assumptions, constraints, and dependencies for the prototype.

| Condition | Type | Effect on Requirements |
|---|---|---|
| Document types are limited to DOC, DOCX, and TXT. | Constraint | If a document cannot be converted into raw text, it cannot be accepted. |
| A project is limited to 5 documents. | Constraint | This restricts the number of documents to a testable set. |
| Documents submitted shall consist of entirely grammatically correct statements. | Assumption | Allows for minimal error checking for the purposes of developing and demonstrating the prototype. |
| The host machine will have a sufficient amount of RAM and at least one multicore processor. | Assumption | Allows for minimal error checking for the purposes of demonstrating the prototype. |
| The host machine must have .NET version 4.5 or greater. | Dependency | The prototype cannot be demonstrated without this. |
| The host machine must be using a 64-bit Windows operating system. | Dependency | The prototype is not designed to work on other operating systems. It lacks a GUI for other operating systems and is untested. |

Table 1. Effects of Assumptions, Dependencies, and Constraints on Requirements

3.2.1 Assumptions

Two assumptions are made about the LASI prototype. First, documents

that are used for analysis are out of LASI’s control. It is expected that the documents

submitted shall consist of entirely grammatically correct statements. This will allow for

minimal error checking for the purposes of developing and demonstrating the prototype.

Second, the host machine is assumed to have sufficient specifications to be able to run the

LASI prototype.

3.2.2 Constraints

A number of constraints will be used to limit the scope of the prototype to

simplify the development process. First, the types of documents that the LASI prototype

can accept have been limited to DOC, DOCX, and TXT. Images cannot be analyzed. This

also means that if a document cannot be converted into raw text, it cannot be accepted.

Second, the prototype will limit the number of documents that can be added to one project

to five. This is to ensure that the algorithm can function in a timely manner.
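
The five-document limit could be enforced at the point where documents are added to a project. The sketch below is one possible shape for that check, under assumed names: the `Project` class, its `add_document` method, and the error message are hypothetical.

```python
MAX_DOCUMENTS = 5  # project size limit stated in the constraints


class Project:
    """Hypothetical container that enforces the document limit."""

    def __init__(self):
        self.documents = []

    def add_document(self, path):
        # Reject the addition once the project already holds five documents.
        if len(self.documents) >= MAX_DOCUMENTS:
            raise ValueError(
                f"a project is limited to {MAX_DOCUMENTS} documents"
            )
        self.documents.append(path)
```

Rejecting the sixth document at insertion time keeps the limit in one place instead of re-checking it before each analysis run.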

3.2.3 Dependencies

There are two dependencies that have been identified for the LASI prototype. The

hardware used for the prototype demonstration must have .NET version 4.5 or greater

installed. The host machine must also be using a 64-bit Windows operating system since

the prototype is not designed to work on other operating systems. It lacks a GUI designed

for other operating systems and is untested in such an environment. In addition, ODU servers are

expected to be available to host the LASI components. If the ODU servers are

unavailable, personal hardware would need to be used.

3.3 Non-Functional Requirements

3.3.1 Security

No security requirements are specified for the prototype. It is advised that users

keep sensitive documents protected within their personal storage drives. Sensitive

documents that are handled by LASI are only subject to security risks if the user leaves

their personal machine unprotected.

3.3.2 Maintainability

The prototype’s entire functionality can be maintained. The CS411 Red Team

plans to release the software as open-source so that the community at large can continue

to improve upon our intended output or manipulate the current infrastructure in order to

accomplish tasks that LASI was not initially intended for.