27
Lori Pollock Professor, CIS Program Analysis, Software Development & Maintenance Tools, Optimizing Compilers ‘81 B.S. CS and Econ, Allegheny ’81-’86 PhD in CS, U of Pittsburgh Married Mark ’86-’90 Assistant Prof, Rice U Lauren ‘88; Lindsay ‘90 ’91- Assistant, Associate, Full Prof UD CIS Matt ’95 Today: Lauren & Lindsay here at UD, Matt 15, 3 PhD students, 3 masters students, and a few undergraduate researchers

Lori Pollock Professor, CIS Program Analysis, Software Development & Maintenance Tools, Optimizing Compilers ‘81 B.S. CS and Econ, Allegheny ’81-’86PhD

Embed Size (px)

Citation preview

Lori PollockProfessor, CIS

Program Analysis, Software Development & Maintenance Tools, Optimizing

Compilers ‘81 B.S. CS and Econ, Allegheny ’81-’86 PhD in CS, U of Pittsburgh

Married Mark’86-’90 Assistant Prof, Rice U

Lauren ‘88; Lindsay ‘90’91- Assistant, Associate, Full Prof UD CIS

Matt ’95Today: Lauren & Lindsay here at UD, Matt 15, 3 PhD

students, 3 masters students, and a few undergraduate researchers

What I do here at UD• Research

– Software Engineering and Compilation Lab (Hiperspace)• 213 Smith Hall

– Collaborations• Vijay Shanker (UD CIS), Terry Harvey (UD CIS), Lisa Marvel (Army

Research Lab), Martin Swany (UD CIS), Guang Gao (UD ECE)

– Funding• Primarily NSF grants; previously some Army funding

• Graduate Teaching – CISC 672 Compilers– CISC 673 Program Analysis and Transformations– CISC 879 Software Testing and Maintenance– CISC 879 Software Tools and Environments

• Undergraduates– Programming XO laptops for local middle school teachers– Study abroad programs

What I do outside UD• Associate Editor, Transactions on Software

Engineering and Methodology (TOSEM)

• Computing Research Association (CRA)’s Committee on the Status of Women in Computer Research (CRA-W)

• Mentoring – speaker at mentoring

workshops for undergrads, grads, assistant and associate profs, and industry lab researchers

• Program committees, conf org,NSF panels, paper reviews,… (typical of university

researchers)

Current Students in Training

Xiaoran Wang PhD

Antony Danalis PhD

Giri Sridhara PhD

Masters: Divya, Suparna, PoonamCurrent Undergraduates: Sana Malik, Michelle Allen

Recently Completed PhD 2007-10

Mike JochenAssistant ProfEast Stroudsburg U

Sara SprenkleAssistant ProfWashington & Lee U

David ShepherdPostdoc, Startup

Ben BreechArmy ResearchLab

Emily Gibson HillAssist Prof, Montclair State U

Overview: Research Projects

Program Analysis

Compiler Technology

Natural LanguageAnalysis of Programs Testing Web

Applications

Optimizing Cluster Parallel Programs

IdentifyingCode vulnerabilities To Malware

Software Tools…………..Testing…..........Compilers….……Parallel Computing

Giri, XiaoranShanker, Hill

Antony,Swany

Sprenkle

ClauseWinbladh Green Software

Engineering

Optimizing Cluster Parallel Programs

Research Problem - How can scientific codes be scaled to a cluster of many CPUs?

Major Challenge – Communication Costs

Approach and Contributions:

An integrated system to hide communication latency

-Surveyor: Collect “knowledge” of cluster

-Compiler: analyze dependencies and transform to create maximal communication/computation overlap

-Communication Library: Use a companion library to MPI

ASPhALT: Automatic System for Parallel AppLication

Transformations

Contribution: FIRST to cluster-optimize MPI codes

Testing Web Applications

• Combination of– Stand-alone applications– GUIs and Database applications– Distributed applications

• Numerous technologies and components

Web Application Structure

Front End Back End

Client(HTML)

Server(Java)

Database(MySql)

Browser

Traditional Software Testing Process

Oracle Pass/ Fail

Replay Tool

TestCases

TestCases

ActualResults

Hard to obtain when testing web applications!!

ExpectedResults

ApplicationSpecification

Test Case Generator

ApplicationRepresentation

ApplicationImplementation

User-session-based Testing

User-session-based Testing Process

Oracle Pass/ Fail

ActualResults

ExpectedResults

Log UserRequests

User 1:register.jsp?name=ss&pass=tstlogin.jsp?name=ss&pass=tstlogout.jsp

Beta Web Application (v.0.9) Deployment

Users

TestCases

Web Application Implementation (v.1.0)

UserSessions

Replay Tool

Test Cases

Create testcases

Create/ReduceTest Cases

User-session-based

Maintenance Testing for Web Applications

Research Problem: How can we exploit user session logging for testing of web applications after initial deployment, with minimal tester effort?

Contributions: Scalable, practical, automated structural testing framework for web applications

* Test case generation * Test suite reduction * Test oracles * Test coverage criteria in terms of URLs, parameters, values

Analyzing the Names in SoftwareResearch Problem

- 60-90% software costs are in reading and navigating large software systems to fix bugs and add new features. Can we help with automation of search, navigation, location of relevant code?

- Key: Programmers leave clues of their intent as they choose names.

Proposed Approach – Develop, extend, and apply natural language-based analysis to

the identifier names and comments

Contribution - Aid understanding, debugging, maintenance, development

Focus on actions -Correspond to verbs-Verbs need Direct Object- Phrases more useful

NLPA

Our Research Focus and Impact

Software Maintenance ToolsSoftware Maintenance Tools

SearchSearch UnderstandingUnderstanding ……

Natural Language Analysis

Word relations(synonyms, antonyms, …

Word relations(synonyms, antonyms, …

Part of speech tagging

Part of speech tagging Abbreviations…Abbreviations…Word splittingWord splitting

ExplorationExploration

Search MethodSearch MethodSearch MethodSearch Method

1. User formulates query

2. Query executed by search method

3. User views search results

4. Repeat as necessary

How do SE tool users search code now?

User faces 2 challenges:• Decide what query words to search for• Determine whether the results are relevant

Our focus: Natural language (NL) queries

SourceCode

SearchSearchResultsResultsSearchSearchResultsResults

Query

Determine Relevance of Results

(Re)formulate Query

User

“compile report” vs. “compil*report”

Our Contextual Search Process

Search MethodSearch MethodSearch MethodSearch Method

SourceCode

SearchSearchResultsResultsSearchSearchResultsResults

Query

Determine Relevance of Results(Re)formulate

Query

User

Search MethodSearch Method Search MethodSearch Method

SourceCode

Information Extraction Process

Information Extraction Process

Partial Phrase Matching

Partial Phrase Matching

NL Phrase Mapping

NL Phrase Mapping

Hierarchical

Information Extraction Process

Information Extraction Process

Another Tool - UD-Summarize: An Example

/* Update linear edge view. If width <= 1, draw line to given graphics2d,

else draw polyline to graphics2d */

Challenges in Summary Comment Generation for Methods

• Method name does not always suffice as summary Ex: main(), returnPressed()

• There is no notion of what constitutes an important statement for a method summary

• A selected statement may have too little or too much detail for a succinct summary

• Generated phrases should be understandable without reading the associated code and its context within the method body

UD-Summarize: The Process

UD-GenPhrase: Generate Phrases for Selected Statements and Combine

Phrases

Method M signature and body

Summary comment for M

Build required structural and linguistic program representations

NLPA

Developing Basic NL Analyses

Software Maintenance ToolsSoftware Maintenance Tools

SearchSearch UnderstandingUnderstanding ……

Natural Language Analysis

Word relations(synonyms, antonyms, …

Word relations(synonyms, antonyms, …

Part of speech tagging

Part of speech tagging Abbreviations…Abbreviations…Word splittingWord splitting

ExplorationExploration

public UserList getUserListFromFile( String path ) throws IOException {

try {

File tmpFile = new File( path );

return parseFile(tmpFile);

} catch( java.io.IOException e ) {

throw new IOrException( ”UserList format issue" + path + " file " + e );

}

}

Illustrating some issues: Extracting Clues from

Signatures1. Split Name into Words2. Part-of-speech tag method name3. Chunk method name4. Identify Verb and Direct-Object (DO)

get<verb> User<adj> List<noun> From <prep> File <noun>

get<verb phrase> User List<noun phrase> From File <prep phrase>

POS Tag

Chunk

Automatic Abbreviation Expansion

1.Split Identifiers:2.Identify non-

dictionary words3.Determine long

form

non-dictionary wordnon-dictionary word

no boundaryno boundary

• Don’t want to miss relevant code with abbreviations

• Given a code segment, identify character sequences that are short forms and determine long form

• Approach: Mine expansions from code [MSR 08]

Issues with a Simple Dictionary

Approach• Manually create a lookup table of common

abbreviations in code- Vocabulary evolves over time, must maintain

table- Same abbreviation can have different expansions

depending on domain AND context:

cfg?Control Flow Graph

Context-Free Grammar

configuration

configure

Overview: Research Projects

Program Analysis

Compiler Technology

Natural LanguageAnalysis of Programs Testing Web

Applications

Optimizing Cluster Parallel Programs

IdentifyingCode vulnerabilities To Malware

Software Tools…………..Testing…..........Compilers….……Parallel Computing

Giri, XiaoranShanker, Hill

Antony,Swany

Sprenkle

ClauseWinbladh Green Software

Engineering

What are we doing now?• Emily

– Developing a general software word usage model– Implementing SWUM– Evaluating for search

• Giri– Developing techniques for automatic comment generation

• Which statements to include in summary content?• How to generate phrases for that content?

– Providing feedback on SWUM refinement

• Divya– Analyzing specific Java statement structures – Developing templates for comment phrase generation

What have we learned overall?

• Evaluation studies indicateNatural language analysis has far more potential to improve software maintenance tools than we initially believed

• Existing technology falls shortSynonyms, collocations, morphology, word frequencies, part-of-speech tagging, AOIG

• Keys to further successImprove recall

Extract additional NL clues

Another of our Tools: Dora the Program

Explorer*

* Dora comes from exploradora, the Spanish word for a female explorer.

DoraDora

Natural Language Query• Maintenance request• Expert knowledge• Query expansion

Natural Language Query• Maintenance request• Expert knowledge• Query expansion

Relevant Neighborhood

Program Structure• Representation

• Current: call graph• Seed starting point

Relevant Neighborhood• Subgraph relevant to query

Query