3/1/2004 MSE Presentation I 1 ESTMD System -- A Web-based EST Model Database System Yinghua Dong


ESTMD System -- A Web-based EST Model Database System

Yinghua Dong


Outline

Project Overview

Requirements

Cost Estimation

Project Plan

Potential Risks

Demonstration

References

Acknowledgments


Project Overview -- Objective

Build a web-based, user-friendly Expressed Sequence Tags model database (ESTMD) system that helps biologists search expressed sequences and related information to support further decisions


Project Overview -- Background

ESTs (Expressed Sequence Tags) are partial sequences of randomly chosen cDNA clones, each obtained from a single DNA sequencing reaction. Typically, EST processing includes cleaning the raw sequences, assembling the cleaned sequences, and annotating the unique sequences and assigning them functions.

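Figure: EST processing pipeline (tool used at each step in parentheses)

Trace files -> (Phred) -> Raw (clone) sequences -> (Cross_match & PERL program) -> Cleaned (EST) sequences -> (Cap3) -> Assembled (unique) sequences -> (Blast) -> Unique sequences with hit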


Project Overview -- Background (cont’d)

Gene Ontology

A set of controlled vocabularies used to describe biological features within a specified domain of biological knowledge. Gene Ontology describes the molecular functions, biological processes, and cellular components of gene products.

Pathway

The sequence of enzyme-catalyzed reactions by which an energy-yielding substance is utilized by protoplasm.


Project Overview -- System Architecture

Three-tier Architecture

Client Tier: responsible for presenting data and receiving user inputs

Application-server Tier: responsible for recording and abstracting business processes

Data-server Tier: responsible for data storage
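The split of responsibilities across the three tiers can be illustrated with a minimal sketch. The slides plan Java Servlets, JSP, and MySQL for the real system; the Python classes below, their names, and the in-memory dictionary standing in for the database are all assumptions made for illustration only.

```python
class DataTier:
    """Data-server tier: stores records (a dict stands in for the database)."""

    def __init__(self):
        # Hypothetical content, taken from the sample outputs in the slides.
        self._genes = {"RpS26": "Ribosomal protein S26"}

    def find_gene_name(self, symbol):
        return self._genes.get(symbol)


class ApplicationTier:
    """Application-server tier: holds the lookup logic, no presentation."""

    def __init__(self, data):
        self._data = data

    def lookup(self, symbol):
        cleaned = symbol.strip()
        name = self._data.find_gene_name(cleaned)
        return {"symbol": cleaned, "gene_name": name, "found": name is not None}


class ClientTier:
    """Client tier: receives user input and presents the result as text."""

    def __init__(self, app):
        self._app = app

    def render(self, user_input):
        result = self._app.lookup(user_input)
        if not result["found"]:
            return f"No gene found for symbol {result['symbol']}"
        return f"{result['symbol']}: {result['gene_name']}"
```

Each tier only talks to the one directly below it, so the storage backend could be swapped without touching the presentation code.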


Project Overview -- Technologies and Tools

HTML with JavaScript will be used to build the client interfaces

Java Servlets, JSP (JavaServer Pages), and JDBC will be used on the server side

XML and XSLT will be used to describe and present the Gene Ontology tree structure

MySQL 4.0 is chosen as the database management system


Project Overview -- Technologies and Tools (cont’d)

JBuilder Enterprise 9 is used as the development tool

Rational Rose is used to create UML models

MS Project is used for project planning

Verification and validation software (such as Alloy, USE, or SPIN) will be used for formal requirements specification


Project Overview -- E-R Model


Requirements


Requirements (cont’d)

Search in Detail

Users search for detailed information by gene name or symbol, sequence ID, FlyBase ID, or GenBank ID

Users can choose which fields are shown in the result

The output format is HTML or plain text (a sample output is shown below)

unisequenceID: Contig1
uniSeq: CGCGGCCGCGTCGACGAGATTCGGAGGTTAGAAACATGACTCGCAAACGCCGTAATGGAGGACGGGCTAAGCACGGCCGTGGCCACGTTAAGGCGGTGAGATGCACCAACTGCGCGCGTTGCGTGCCTAAGGACAAAGCTATCAAAAAGTTCGTGATCAGGAATATTGTCGAAGCGGCTGCCGTCAGGGATATCAACGAAGCTTCCGTATATGCATCATTCCAGCTGCCGAAGCTGTATGCAAAGCTCCACTACTGCGTCTCCTGCGCCATCCACAGCAAAGTTGTGCGCAACAGGTCTAAGAAGGACAGGAGAATCCGCACACCACCCAAGAGCACCTTCCCCAGGGACATGCAGCGCCCACAGAATGTGCAAAGGAAGTGAAGTGATTTACAATAAATTTTAAGAAAACCC
flybaseID: FBgn0004413
evalue: 2.00E-49
hitLength: 114
bitScore: 190
identity: 93/115
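The requirement that users decide which fields appear in the result could be sketched as follows. This is only an illustration: the `RECORD` dictionary and the `format_detail` helper are hypothetical names, the values come from the sample output above (with `uniSeq` left out for brevity), and the real system is planned in Java rather than Python.

```python
# Hypothetical record mirroring the field names of the sample output.
RECORD = {
    "unisequenceID": "Contig1",
    "flybaseID": "FBgn0004413",
    "evalue": "2.00E-49",
    "hitLength": 114,
    "bitScore": 190,
    "identity": "93/115",
}


def format_detail(record, fields):
    """Return a text report with only the requested fields, in the given order."""
    unknown = [name for name in fields if name not in record]
    if unknown:
        raise KeyError(f"unknown field(s): {', '.join(unknown)}")
    return "\n".join(f"{name}: {record[name]}" for name in fields)
```

For example, `format_detail(RECORD, ["unisequenceID", "evalue"])` yields a two-line report with just those fields.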


Requirements (cont’d)

Search by Keyword

Users search the sequences at each stage by keyword

The output includes sequence ID, length (with a link to the sequence), gene name, symbol, and a link to the contig view image

A sample output

cloneID                      RawLength  CleanedLength  UnisequenceID  UnisequenceLength  GeneName               symbol  ContigView
pb42ad-1_001_a07.pb42primer  876        409            Contig1        413                Ribosomal protein S26  RpS26   View link
pb42ad-1_001_f07.pb42primer  886        205            Contig1        413                Ribosomal protein S26  RpS26   View link
pyes2-ct_012_c12.p1ca        291        286            Contig1        413                Ribosomal protein S26  RpS26   View link
pyes2-ct_034_h06.p1ca        803        398            Contig1        413                Ribosomal protein S26  RpS26   View link
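A keyword search of this kind might be expressed as a join between clone-level and unisequence-level data. The sketch below is an assumption throughout: the slides do not show the actual schema, so the `clone` and `unisequence` tables and their columns are hypothetical, and SQLite (from the Python standard library) stands in for the MySQL and JDBC stack named in the slides. The rows are the ones from the sample output.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clone (clone_id TEXT, raw_length INTEGER, "
        "cleaned_length INTEGER, unisequence_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE unisequence (unisequence_id TEXT, length INTEGER, "
        "gene_name TEXT, symbol TEXT)"
    )
    conn.executemany(
        "INSERT INTO clone VALUES (?, ?, ?, ?)",
        [
            ("pb42ad-1_001_a07.pb42primer", 876, 409, "Contig1"),
            ("pb42ad-1_001_f07.pb42primer", 886, 205, "Contig1"),
            ("pyes2-ct_012_c12.p1ca", 291, 286, "Contig1"),
            ("pyes2-ct_034_h06.p1ca", 803, 398, "Contig1"),
        ],
    )
    conn.execute(
        "INSERT INTO unisequence VALUES (?, ?, ?, ?)",
        ("Contig1", 413, "Ribosomal protein S26", "RpS26"),
    )
    return conn


def search_by_keyword(conn, keyword):
    """Return clone rows whose gene name or symbol contains the keyword."""
    pattern = f"%{keyword}%"
    # Parameter binding keeps user input out of the SQL text.
    return conn.execute(
        "SELECT c.clone_id, c.raw_length, c.cleaned_length, "
        "u.unisequence_id, u.length, u.gene_name, u.symbol "
        "FROM clone c JOIN unisequence u "
        "ON u.unisequence_id = c.unisequence_id "
        "WHERE u.gene_name LIKE ? OR u.symbol LIKE ? "
        "ORDER BY c.clone_id",
        (pattern, pattern),
    ).fetchall()
```

Calling `search_by_keyword(build_demo_db(), "ribosomal")` returns the four clone rows of the sample table; the contig view link would be generated by the presentation layer and is not modeled here.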


Requirements (cont’d)

Gene Ontology Search

Users search gene ontology information by gene names, symbols, IDs, or a text file.

The output is a table including GO ID, term, type, sequence ID, hit ID, and gene symbol.

The hyperlinks on terms show the gene ontology tree structure.

A sample output

GO ID       Term                                               Type                Sequence ID  Hit ID       Gene Symbol
GO:0006412  protein biosynthesis                               Biological_process  Contig1      FBgn0004413  RpS26
GO:0005843  cytosolic small ribosomal subunit (sensu Eukarya)  Cellular_component  Contig1      FBgn0004413  RpS26
GO:0005840  ribosome                                           Cellular_component  Contig1      FBgn0004413  RpS26
GO:0003735  structural constituent of ribosome                 Molecular_function  Contig1      FBgn0004413  RpS26


Requirements (cont’d)

Gene Ontology Classification

Users input a batch of gene names/symbols, or a local text file containing sequence IDs.

Users can choose the gene ontology types they want to classify.

The output is a table including gene ontology type, subtype, sequence count, and percentage of sequences.

A sample output

type                subtype                         sequence_count  %
Cellular_component  cell                            3               75%
Biological_process  Cell growth and/or maintenance  3               75%
Molecular_function  enzyme                          1               25%
Molecular_function  Protein tagging                 1               25%
Molecular_function  Structural molecule             3               75%
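The classification output can be read as a per-subtype count of sequences divided by the size of the input batch. The sketch below assumes exactly that interpretation (the sample percentages are consistent with a batch of four sequences, but the slides do not state the formula), and the `classify` function and the sequence IDs used with it are hypothetical.

```python
from collections import defaultdict


def classify(annotations, total_sequences):
    """Count distinct sequences per (type, subtype) and their share of the batch.

    annotations: iterable of (sequence_id, go_type, subtype) tuples.
    total_sequences: number of sequences in the submitted batch.
    """
    members = defaultdict(set)
    for seq_id, go_type, subtype in annotations:
        # A set avoids counting one sequence twice for the same subtype.
        members[(go_type, subtype)].add(seq_id)

    rows = []
    for (go_type, subtype), seqs in sorted(members.items()):
        count = len(seqs)
        rows.append((go_type, subtype, count, 100.0 * count / total_sequences))
    return rows
```

Because a sequence may carry several annotations, the percentages in one type do not have to add up to 100, which matches the sample table.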


Cost Estimation

The effort of the project is estimated by

Function Point Analysis (FPA)

COCOMO II Model


Cost Estimation -- Function Point Analysis

Unadjusted Function Points

Function Type  Simple          Average         Complex         Total UFP
               Amount  Weight  Amount  Weight  Amount  Weight
Inputs         7       3       0       4       0       6       21
Outputs        2       4       5       5       0       7       33
Inquiries      11      3       0       4       2       6       43
Files          0       7       3       10      0       15      30
Interfaces     1       5       1       7       0       10      12

Total UFP 138
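Each row total in the table is the sum of amount times weight over the three complexity levels. The following sketch recomputes the rows from the amounts and weights listed above; the dictionary layout and the function name are assumptions for illustration. Recomputed this way, Inquiries comes to 45 and the overall sum to 141, whereas the slide reports 43 and 138; the later slides continue with 138.

```python
# (simple, average, complex) weights and amounts as listed in the table.
WEIGHTS = {
    "Inputs": (3, 4, 6),
    "Outputs": (4, 5, 7),
    "Inquiries": (3, 4, 6),
    "Files": (7, 10, 15),
    "Interfaces": (5, 7, 10),
}
AMOUNTS = {
    "Inputs": (7, 0, 0),
    "Outputs": (2, 5, 0),
    "Inquiries": (11, 0, 2),
    "Files": (0, 3, 0),
    "Interfaces": (1, 1, 0),
}


def unadjusted_function_points(amounts, weights):
    """Return the UFP contribution of each function type."""
    return {
        name: sum(a * w for a, w in zip(amounts[name], weights[name]))
        for name in amounts
    }


ufp_by_type = unadjusted_function_points(AMOUNTS, WEIGHTS)
total_ufp = sum(ufp_by_type.values())
```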


Cost Estimation -- Function Point Analysis (cont’d)

Total Unadjusted Function Points (UFP) = 138

Product Complexity Adjustment (PC) = 0.65 + (0.01 × 40) = 1.05

Total Adjusted Function Points (FP) = UFP × PC = 144.9

Language Factor (LF) for Java assumed as 35

Source Lines of Code (SLOC) = FP × LF = 5071.5
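The chain from UFP to SLOC above is plain arithmetic and can be checked with a short sketch. The function and parameter names are hypothetical; the value 40 is taken from the slide and is presumably the sum of the complexity factor ratings, which the slides do not break down.

```python
def adjusted_function_points(ufp, complexity_sum):
    """FP = UFP × PC, with PC = 0.65 + 0.01 × (sum of complexity ratings)."""
    pc = 0.65 + 0.01 * complexity_sum
    return ufp * pc


def estimated_sloc(fp, language_factor):
    """SLOC = FP × LF, where LF is the assumed lines of code per function point."""
    return fp * language_factor


fp = adjusted_function_points(138, 40)  # slide values
sloc = estimated_sloc(fp, 35)           # LF for Java as assumed on the slide
```

With the slide's inputs this reproduces FP = 144.9 and SLOC = 5071.5, up to floating-point rounding.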


Cost Estimation -- COCOMO II

For application programs:

Thousands of Delivered Source Instructions (KDSI) = 5.0715

Programmer Effort (PM) = 2.4 × (KDSI)^1.05 = 13.2 person-months

Development Time in months (TDEV) = 2.5 × (PM)^0.38 = 6.66 months
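The effort and schedule figures follow from two power-law formulas, sketched below with the coefficients from the slide as default arguments. The function names are hypothetical. The coefficient pairs 2.4/1.05 and 2.5/0.38 are the ones used for organic-mode projects in the basic COCOMO equations.

```python
def effort_person_months(kdsi, a=2.4, b=1.05):
    """PM = a × KDSI^b, with KDSI in thousands of delivered source instructions."""
    return a * kdsi ** b


def development_time_months(pm, c=2.5, d=0.38):
    """TDEV = c × PM^d."""
    return c * pm ** d


pm = effort_person_months(5.0715)
tdev = development_time_months(pm)
```

Rounded to the precision used on the slide, `pm` is 13.2 person-months and `tdev` is 6.66 months.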


Project Plan

Phase I: Requirements (1/12/04 ~ 3/1/04)

Phase II: Design (2/23/04 ~ 4/23/04)

Phase III: Implementation and Test (4/26/04 ~ 7/30/04)


Project Plan (cont’d)


Potential Risks

The requirements may change continually

Some biology knowledge is needed

Some new technologies, such as XML and XSLT, need to be learned


Demonstration

http://129.130.115.72:8080/estmd/index.html


References

IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications, IEEE, 1998

IEEE Std 730-1998, IEEE Standard for Software Quality Assurance Plans, IEEE, 1998

Walker Royce, Software Project Management: A Unified Framework, 1998

Marty Hall, Core Servlets and JavaServer Pages, 2000

Roger S. Pressman, Software Engineering: A Practitioner’s Approach, 5th Edition

Dr. Gustafson, CIS 540 lecture

http://sunset.usc.edu/research/COCOMOII/index.html


Acknowledgments

Committee:

Dr. Mitchell L. Neilsen

Dr. Gurdip Singh

Dr. Daniel Andresen


Suggestions and Comments

Thank You!