DATA MINING PROJECT (cis-734) PROTEIN SEARCH ENGINE URL – http://web.njit.edu/~sm363 Submitted By: Asad Siddiqui ([email protected] ) Supriya Malhotra ([email protected] )

DOCUMENTATION OF PROJECT


Page 1: DOCUMENTATION OF PROJECT

DATA MINING PROJECT (cis-734)

PROTEIN SEARCH ENGINE

URL – http://web.njit.edu/~sm363

Submitted By:

Asad Siddiqui ([email protected])

Supriya Malhotra ([email protected])

Ojus Bathla ([email protected])


Table of Contents

Topics

1. Introduction

2. E-R Diagram

3. Database Schema

4. SOAP Implementation

5. Source Code (Database Tables)

6. Screenshots (Tables)

7. Source Code (HTML/JSP)

8. Screenshots (Project)


Introduction

This project implements a biological database using data mining techniques. The output should be similar to that produced by SYSTERS. There are two tables in the schema. The first table, systers_protein_table, holds the cluster information: for each protein it stores the description, protein name, gene name and the number of the cluster it belongs to. Protein sequences are stored in the second table, protein_sequences_table.

The project uses a three-tier architecture: the front end is HTML, the middle tier is JSP and the back end is an Oracle database. The JSP pages are connected to the Oracle tables, so when the user runs a query from the HTML page, the JSP code executes and fetches the results from those tables. The connection between the website and the database tables is made through Java Database Connectivity (JDBC).

The project works like a search engine. It retrieves data from the local database and also links out to the web. The user can search the biological data in four ways, using any of four attributes as the search criterion: protein name, database, raccno and cluster number. A user who is familiar with this biological data only needs to type a value into one of the fields and click Search to get the matching results.


E-R DIAGRAM and DATABASE SCHEMA

The following E-R diagram shows how the two tables are related to each other. The E-R diagram and the schema are the clearest way to explain this relationship. In protein_sequences_table, the attribute accno is a foreign key that references raccno, the primary key of systers_protein_table.
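The foreign key described above can be illustrated with a minimal Java sketch. It is not part of the project code: the class name `ForeignKeySketch` and the in-memory map and list are hypothetical stand-ins for the two Oracle tables. The sketch pairs each sequence row with its parent protein row in the way the condition s.raccno = p.accno does, and rejects a sequence row whose accno has no matching raccno.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of the accno -> raccno foreign key.
class ForeignKeySketch {
    // proteinNameByRaccno: raccno (primary key) -> pname.
    // sequenceRows: each entry is {accno, sequence}.
    // Returns one "accno,pname,sequence" string per sequence row.
    static List<String> join(Map<String, String> proteinNameByRaccno,
                             List<String[]> sequenceRows) {
        List<String> joined = new ArrayList<>();
        for (String[] row : sequenceRows) {
            String accno = row[0];
            // A row without a parent protein would violate the foreign key.
            if (!proteinNameByRaccno.containsKey(accno)) {
                throw new IllegalArgumentException(
                        "foreign key violation: no raccno " + accno);
            }
            joined.add(accno + "," + proteinNameByRaccno.get(accno) + "," + row[1]);
        }
        return joined;
    }
}
```

Keying the map by raccno mirrors the primary key constraint, since a map cannot hold two entries for the same key.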

Systers_Protein_Table: DB, RACCNO, PNAME, DESC, GNAME, IDENTICAL, FRAGMENT OF, CLUSTERNO

Relationship: Sequence Is

Protein_Sequence_Table: ACCNO, SEQUENCE, CLUSTER NO


SCHEMA

We created the protein tables on the NJIT Oracle server.

We used JSP pages to query the database by protein ID, database name, protein name, etc.


SOAP IMPLEMENTATION

What is SOAP?

• SOAP stands for Simple Object Access Protocol

• SOAP is a communication protocol

• SOAP is for communication between applications

• SOAP is a format for sending messages

• SOAP is designed to communicate via Internet

• SOAP is platform independent

• SOAP is language independent

• SOAP is based on XML

• SOAP is simple and extensible

• SOAP allows you to get around firewalls

Why do we use SOAP?

• It is important for application development to allow Internet communication between programs.

• Today's applications communicate using Remote Procedure Calls (RPC) between objects like DCOM and CORBA, but HTTP was not designed for this.

• A better way to communicate between applications is over HTTP, because HTTP is supported by all Internet browsers and servers. SOAP was created to accomplish this.

• SOAP provides a way to communicate between applications running on different operating systems, with different technologies and programming languages.


SYNTAX RULES

Here are some important syntax rules:

• A SOAP message MUST be encoded using XML

• A SOAP message MUST use the SOAP Envelope namespace

• A SOAP message MUST use the SOAP Encoding namespace

• A SOAP message must NOT contain a DTD reference

• A SOAP message must NOT contain XML Processing Instructions

Skeleton SOAP Message:

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="web.njit.edu/~aas44/soap-envelope"
               soap:encodingStyle="web.njit.edu/~aas44soap-encoding">
  <soap:Header>
    ...
  </soap:Header>
  <soap:Body>
    ...
    ...
    <soap:Fault>
      ...
      ...
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
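As an illustration of the syntax rules, the following Java sketch checks a message against three of them: the root element must be an Envelope in the expected namespace, there must be no DTD reference, and there must be no processing instructions. It is a sketch under assumptions and not part of the project: the class name `SoapSyntaxCheck` is hypothetical, and the constant uses the standard SOAP 1.2 envelope namespace, which differs from the URI written in the skeleton above, so the expected namespace is passed in as a parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical checker for a few of the SOAP syntax rules listed above.
class SoapSyntaxCheck {
    // Standard SOAP 1.2 envelope namespace (an assumption for this sketch).
    static final String SOAP12_ENVELOPE_NS = "http://www.w3.org/2003/05/soap-envelope";

    // Returns a list of rule violations; an empty list means no rule was broken.
    static List<String> violations(String xml, String envelopeNs) {
        List<String> problems = new ArrayList<>();
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Make the parser reject any DOCTYPE declaration (DTD reference).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXException e) {
            problems.add("not well-formed XML, or contains a DTD reference");
            return problems;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDocumentElement();
        if (!"Envelope".equals(root.getLocalName())
                || !envelopeNs.equals(root.getNamespaceURI())) {
            problems.add("root element is not Envelope in the expected namespace");
        }
        if (containsProcessingInstruction(doc)) {
            problems.add("contains a processing instruction");
        }
        return problems;
    }

    // Walks the DOM tree looking for processing instruction nodes.
    // The <?xml ...?> declaration is not reported as such a node.
    static boolean containsProcessingInstruction(Node node) {
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
            return true;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (containsProcessingInstruction(c)) {
                return true;
            }
        }
        return false;
    }
}
```

The sketch does not check the encoding namespace rule or the contents of the Header and Body elements.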

SOAP BODY

• The required SOAP Body element contains the actual SOAP message intended for the ultimate endpoint of the message.

• Immediate child elements of the SOAP Body element may be namespace-qualified. SOAP defines one element inside the Body element in the default namespace. This is the SOAP Fault element, which is used to indicate error messages.

Source Code – Database Tables

To create the tables in Oracle, we run the following statements:

1. TO CREATE THE TABLE SYSTERS PROTEIN

create table systers_protein_table(
  db varchar2(4),
  raccno varchar2(25),
  pname varchar2(25),
  Description varchar2(200),
  Gene_name varchar2(25),
  Identical_To varchar2(25),
  Fragment_Of varchar2(25),
  Cluster_No varchar2(25),
  primary key (raccno)
);

2. TO CREATE THE TABLE SYSTERS PROTEIN SEQUENCE TABLE

create table protein_sequences_table(
  accno varchar2(25),
  sequence varchar2(1000),
  ClusterNo varchar2(25),
  foreign key (accno) references systers_protein_table (raccno)
);

To insert rows into the tables, we run the following PL/SQL blocks:

1. TO INSERT ROWS IN FIRST TABLE

begin
insert into systers_protein_table values ('TRE','Q9PU83','Q9PU83','Vitamin D receptor','NULL','NULL','NULL',136821);
insert into systers_protein_table values
end;

2. TO INSERT ROWS IN SECOND TABLE

begin
insert into protein_sequences_table values ('Q9PU83','GeneTRE|Q9PU83|Q9PU83 (215 AA) Vitamin D receptor (Fragment) [Crocodylus niloticus (Nile crocodile) (African crocodile)]ILTDEEVQRKREMIMKRKEEEALKESMKPKLSEEQQNVIDILLEAHRKTYDPTYSDFTQFRPPVRSSEEQRLTRSSSVLTQGFSSEDSSEPFGSSPDSVEHGMFSNLMLSEPEESASMSINFSPLTMLPHLADLVxYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEVIMLRSNQSFTLEDMSWNCGSNDFKYKVSDVTQAGHNMELLEPLV',136821);
end;

Screenshot - Tables

A sample of the Oracle tables that were created:

ACCNO SEQUENCE CLUSTERNO

Q9PU83

GeneTRE|Q9PU83|Q9PU83 (215 AA) Vitamin D receptor (Fragment) [Crocodylus niloticus (Nile crocodile) (African crocodile)]ILTDEEVQRKREMIMKRKEEEALKESMKPK LSEEQQNVIDILLEAHRKTYDPTYSDFTQFRPPVRSSEEQRLTRSSSVLTQGFSSEDSSEPFGSSPDSVEHGMFSNLMLSEPEESASMSINFSPLTMLPHLADLVxYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEVIMLRSNQSF TLEDMSWNCGSNDFKYKVSDVTQAGHNMELLEPLV

136821

Q9PTN2

>TRE|Q9PTN2|Q9PTN2 (453 AA) Vitamin D receptor [Brachydanio rerio (Zebrafish) (Danio rerio)]MLTENSAVNSGGKSKCEAGACESTVNGDATSLMDLMAVSTSATGQDQFDRNAPPICGV CGMLTENSAVNSGGKSKCEAGACESTVNGDATSLMDLMAVSTSATGQDQFDRNAPPICGVCGMMKEFILTDEEVQRKKDLIMKRKEEEAAREARKPRLSDEQMQIINSLVEAHHKTYDDSYSDFVRFRPPVREGPVTRSASRAASLHSLS DASSDSFNHSPESVDTKLNFSNLLMMYQDSGSPDSSEEDQQSRLSMLPHLADLVSYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEIIMLRSNQSFSLEDMSWSCGGPDFKYCINDVTKAGHTLELLEPLVKFQVGLKKLKLHEEEH VL

136821

ENSANGP00000010943

>AG|ENSANGP00000010943 (236 AA) Gene:ENSANGG00000008454 Clone:AAAB01008839 Contig:AAAB01008839_60 Chr:3R Basepair:35325398 Status:novelNNKKPQKAPHHRCTM ASFDVYDRSSWYFGAMSRQDATDLLLNERESGVFLVRDSTTIVGDFVLCVREDSKVSHYIINKLPSGDECFVYRIGDQTFADLPDLLSFYKLHYLDTTPLRRPMVRRLEKVIGKFDFDGSDPDDLPFKKGEILHIISKDEEQWWTARNGA GQTGQIPVPYLPALARVKQERVPNAYDETALKLSVGDVIKVLKTNINGQWEGELKGKIGHFPFTHVEFIDE

136822

CG1587-PA

>DM|CG1587-PA (271 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:knownMDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCVREDTKVSN YIINKVQQQDQIVYRIGDQSFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPFQRGEVLTIVRKDEDQWWTARNSSGKIGQIP

VPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNRKLPAYARVKQS RVPNAYDKTALKLEIGDIIKVTKTNINGQWEGELNGKNGHFPFTHVEFVDDCDLSKNSTEIC

136822

Q8JIZ9

>TRE|Q8JIZ9|Q8JIZ9 (329 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] YAAYKSTGYHFNAMTCEGCKGFCRRAMKRPAQLCCPFQSACVITK SNRRQCQSCRLQKCL SIGMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLNAHKKTFDMTCAHF SQFRPLDRGQKSVSESSPVTNGSWIDHRPIAEDPVQWVFNSTSLSSSSSSYQSLDKEKKH FKSGSFTSLPHF TDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEIILIHFNMF FNEVTGIWECGPLQYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFS PDRPGVTDHKVIDRNQETLALTLKTYIEA

136821

Q8QGH6

>TRE|Q8QGH6|Q8QGH6 (322 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] GMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLN AHKKTFDMTCAHFSQ FRPLDRDQKSVSESSPLTNGSWIDHRPIAEDPMQWVFNPTSLSSSSSSYQSLDNKEKKHF KSGNFSSLPHFTDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEIILIHFNMFF NEVTGIWECGPL QYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFSP DRPGVTDHKVIDRNQETLALTLKTYIEAKRNGPEKHLLFPKIMGCLTEMRSMNEEYTKQV LKIQDMQPEVSPLWLEIISKDT

136821

Q90WS4

>TRE|Q90WS4|Q90WS4 (270 AA) Putative vitamin D receptor (Fragment) [Elaphe sp] RKAMFTCPFNGDCKITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMIMKRKEE EALKESLKPK LLEEQQRVIEILLEAHRKTYDPTYSDFSQFRPPVRQNEKEHTSRSSNMTP GFSFSDDSSDTSSFSSEPMMLSSLELNDDSTSMSIDFSHLSMLPHLADLVSYSIQKVIGF AKMIPGFRSLTAEDQIALLKSSAIEVIMLRSNQSFSLE DMSWFCGSNDFKYQVSDVTQAG HSLDLLEPLVKFQISLKKLNLHEEEHVLLM

136821

Q8JIZ9

>TRE|Q8JIZ9|Q8JIZ9 (329 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] YAAYKSTGYHFNAMTCEGCKGFCRRAMKRPAQLCCPFQSACVITK SNRRQCQSCRLQKCL SIGMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLNAHKKTFDMTCAHF SQFRPLDRGQKSVSESSPVTNGSWIDHRPIAEDPVQWVFNSTSLSSSSSSYQSLDKEKKH FKSGSFTSLPHF TDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEIILIHFNMF FNEVTGIWECGPLQYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFS PDRPGVTDHKVIDRNQETLALTLKTYIEA

136821

Q8QGH6

>TRE|Q8QGH6|Q8QGH6 (322 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] GMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLN AHKKTFDMTCAHFSQ FRPLDRDQKSVSESSPLTNGSWIDHRPIAEDPMQWVFNPTSLSSSSSSYQSLDNKEKKHF KSGNFSSLPHFTDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEI

ILIHFNMFF NEVTGIWECGPL QYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFSP DRPGVTDHKVIDRNQETLALTLKTYIEAKRNGPEKHLLFPKIMGCLTEMRSMNEEYTKQV LKIQDMQPEVSPLWLEIISKDT

136821

Q90WS4

>TRE|Q90WS4|Q90WS4 (270 AA) Putative vitamin D receptor (Fragment) [Elaphe sp] RKAMFTCPFNGDCKITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMIMKRKEE EALKESLKPK LLEEQQRVIEILLEAHRKTYDPTYSDFSQFRPPVRQNEKEHTSRSSNMTP GFSFSDDSSDTSSFSSEPMMLSSLELNDDSTSMSIDFSHLSMLPHLADLVSYSIQKVIGF AKMIPGFRSLTAEDQIALLKSSAIEVIMLRSNQSFSLE DMSWFCGSNDFKYQVSDVTQAG HSLDLLEPLVKFQISLKKLNLHEEEHVLLM

136821

ENSP00000325217

>HS|ENSP00000325217 (465 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known SILCTGLFKVDPRGEVGAK NLPPSSPRGPEANLEVRPKESWNHADFVHCEDTESVPGKPS VNADEEVGGPQICRVCGDKATGYHFNVMTCEGCKGFFRRAMKRNARLRCPFRKGACEITR KTRRQCQACRLRKCLESGMKKEMIMSDEAVEERRALIKRKKSERTGT QPLGVQGLTEEQR MMIRELMDAQMKTFDTTFSHFKNFRPGVLSSGCELPESLQAPSREEAAKWSQVRKDLCSL KVSLQLRGEDGSVWNYKPPADSGGKEIFSLLPHMADMSTYMFKGIISFAKVISYFRDLPI EDQISLLKGAAFEL CQLRFNTVFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYM LKKLQLHEEEYVLMQAISLFSPDRPGVLQHRVVDQLQEQFAITLKSYIECNRPQPAHRFL FLKIMAMLTELRSINAQHTQRLLRIQDIHPFATPLMQELFGI TGS

136821

ENSP00000273389

>HS|ENSP00000273389 (434 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known LEVRPKESWNHADFVHCED TESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFS HFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGG FQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS

136821

Q96AC7 >TRE|Q96AC7|Q96AC7 (378 AA) Nuclear receptor subfamily 1, group I, member 2 [Homo sapiens (Human)] MTCEGCKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESG MKKEMIMSD EAVEERRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRPG VLSSGCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEI FSLLPHMADMSTYMFKGI ISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAET

GTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGV LQHRVVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRS INAQHTQRLLRIQD IHPFATPLMQELFGITGS

136821

O75469

>SPR|O75469|PXR_HUMAN (434 AA) Orphan nuclear receptor PXR (Pregnane X receptor) (Orphan nuclear receptor PAR1) (Steroid and xenobiotic receptor) (SXR ) [Homo sapiens (Human)] MEVRPKESWNHADFVHCEDTESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRA LIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIED QISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHP F ATPLMQELFGITGS

136821

Q8SQ00

>TRE|Q8SQ00|Q8SQ00 (330 AA) Pregnane X receptor (Fragment) [Sus scrofa (Pig)] GMRKEMIMSDAAVEQRRALIRRKKREQIGAQPPGAKGLTEEQRTMISELMNAQMKTFDTT FTHFKNFRLPE VLSSSLEIPECLQTPSSREEAAKWSKLREDLCSVKLSLQLRGEDGSVWN YKPPADNSGKEIFSLLPHIADMSTYMFKGIINFAKVISYFRDLPIEDQISLLKGATFELC QLRFNTVFNAETGTWECGRLSYSLEDPSGGFQQLLLQPM LKFHYMLKKLQLHKEEYVLMQ AISLFSPDRPGVVQRQVVDQLQERFAITLKAYIECNRPQPAHRFLFLKIMAMLTELRSIN AQHTQRLLRIQDIHPFATPLMQELFSITES

136821

Q8SQ01

>TRE|Q8SQ01|Q8SQ01 (434 AA) Pregnane X receptor [Macaca mulatta (Rhesus macaque)] MEVRPKEGWNHADFVYCEDTEFAPGKPTVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRR AMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDAAVEE RRALIKRKKRERIGTQPPGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRLPGVLSS GCEMPESLQAPSREEAAKWNQVRKDLWSVKVSVQL RGEDGSVWNYKPPADNGGKEIFSLL PHMADMSTYMFKGIINFAKVISYFRDLPIEDQISLLKGATFELCQLRFNTVFNAETGTWE CGRLSYCLEDPAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVVQHR VV DQLQEQYAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS

136821

CG1587-PB

>DM|CG1587-PB (253 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:known MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCDQIVYRIG DQ SFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPF

QRGEVLTIVRK DEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNR KLPAYARVKQSRVPNAYDKTALKLE IGDIIKVTKTNINGQWEGELNGKNGHFPFTHVEFV DDCDLSKNSTEIC

136822

Q95RW2

>TRE|Q95RW2|Q95RW2 (253 AA) LD08427p (CG1587-PB) [Drosophila melanogaster (Fruit fly)] MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCDQIVYRIGDQ SF DNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPFQRGEVLTIVRK DEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNR KLPAYARVKQSRVPNAYDKTALKLEIGDII KVTKTNINGQWEGELNGKNGHFPFTHVEFV DDCDLSKNSTEIC

136822

ENSP00000273389

>HS|ENSP00000273389 (434 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known LEVRPKESWNHADFVHCED TESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFS HFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGG FQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS

136821

P47941

>SPR|P47941|CRKL_MOUSE (303 AA) Crk-like protein [Mus musculus (Mouse)] MSSARFDSSDRSAWYMGPVTRQEAQTRLQGQRHGMFLVRDSSTCPGDYVLSVSENSRVSH YIINSLPNRRFKIGDQE FDHLPALLEFYKIHYLDTTTLIEPAPRYPSPPVGSVSAPNLPT AEENLEYVRTLYDFPGNDAEDLPFKKGELLVIIEKPEEQWWSARTKDGRVGMIPVPYVEK LVRSSPHGKHGNRNSNSYGIPEPAHAYAQPQTTTPLPTVASTPGA AINPLPSTQNGPVFA KAIQKRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRKGLFPFTHVKIFDPQNPD DNE

136822

SINFRUP00000150362

>FR|SINFRUP00000150362 (325 AA) Gene:SINFRUG00000141661 Clone:scaffold_326 Contig:scaffold_326 Chr:Chr_scaffold_326 Basepair:104364 Status:known MAGNF DAEDRDSWYWGRLTRQEAVSLLQGQRHGVFLVRDxISIRGGYVLSVSENSKVSHY IINSVSDNRQCENDIAFPLSGLTPPYFRIGDQEFEALPALLEFYKIHYLDTTALIEPVSK AQHTGFISSSAGVPPPSQEEAEFVRALFDFSGN DEEDLPFRKGDILRVLEKPEEQWWNAA NQEGRAGMIPVPYVEKYRPASPTAAALGPTTSVPGQVPEGGRPTGGTDGMAGAQDNPLCD PGQYAQPVVNAQLPNLQNGPVYARVIQKRVPNAYDKTALALEVGEMVKVTKINVNGQWEG ECKGKRGHFPFTHVRLMEQQHPDGD

136822

CG1587-PC

>DM|CG1587-PC (271 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:known MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDY

VLCVREDTKVS NY IINKVQQQDQIVYRIGDQSFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGS DQDDLPFQRGEVLTIVRKDEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSI SGSSNVFESTLKRTDLNRKLPAYAR VKQSRVPNAYDKTALKLEIGDIIKVTKTNINGQWE GELNGKNGHFPFTHVEFVDDCDLSKNSTEIC

136822

SINFRUP00000144694

>FR|SINFRUP00000144694 (315 AA) Gene:SINFRUG00000136477 Clone:scaffold_306 Contig:scaffold_306 Chr:Chr_scaffold_306 Basepair:110782 Status:known MAGNF DAEDRNSWYWGRLSRQEAVSLLQGQRHGVFLVRDSSTIHGDYVLSVSENSKVSHY IINSISNNRQSGPGSAHPRFRIGDQEFVALPALLEFYKIHYLDTTTLIEPINKSRLTSFI NVGPGGGPPQRLEDEYVRALFDFPGNDEEDLPF KKGDILRVLEKPEEQWWNAQNSEGRAG MIPVPYVEKYRPASPSLVAGHGLPGGPPGGTGMQGNSDGSAAQTSAPLLGDPSQYAQPTP LPNLQNGPVFARAIQKRVPNAYDKTALALEVGDTVKVTKINVNGQWEGECKGKRGHFPFT HVKLLDQHSAEDELS

136822

SINFRUP00000164144

>FR|SINFRUP00000164144 (299 AA) Gene:SINFRUG00000154261 Clone:scaffold_3683 Contig:scaffold_3683 Chr:Chr_scaffold_3683 Basepair:4533 Status:known MSTS RFDSADRSAWYFGPVSRHEAQNRLQGQKHGIFLVRDSSTCHGDYVLSVSENSKVSH YIINSLPNKRFKIGDREFEHLPALLEFYKYHYLDTTTLIEPASRYPSTLSCPVQPAGPED NLEYVRTLYDFTGSDAEDLPFKKGEVLVILEK PEEQWWSARNKDGRVGMIPVPYVEKLAR PAPLPGQPGHGSRNSNSYGVPEPSHAVVHAYALPQTPSPLPAPGPVINPQNGPAMAKAIQ KRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRRGLFPFTHVKIIDAQNPDESD

136822

Q8R5B8

>TRE|Q8R5B8|Q8R5B8 (98 AA) Similar to v-crk avian sarcoma virus CT10 oncogene homolog-like [Mus musculus (Mouse)] MSSARFDSSDRSAWYMGPVTRQEAQTRLQGQRHGMF LVRDSSTCPGDYVLSVSENSRVSH YIINSLPNRRFKIGDQEFDHLPALLEFYKIHYLDTTTM

136822

Q8JIB3

>TRE|Q8JIB3|Q8JIB3 (367 AA) C-fos proto-oncogene [Coturnix coturnix (Common quail)] MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFCTDLAVSSANF VPTVT AISTSPDLQWLVQPTLISSVAPSQNRGHPYGVPPPAPPAAYSRPAVLKAPGGRGQ SIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEEEKSALQA EIANLLKEKEKLEFILAAHRPACKMPEELRFSE ELAAATALDLGAPSPAAAEETFALPxM TEAPPAVPPKEPSGSGLELKAEPFDELLFSTGPREASRSVPDMDLPGASSFYASDWEPLG AGSSGELEPLCTPVVTCTPCPSTYTSTFVFTYPEADAFPSCAAAHRKGSSSNEPSSDSLS SPTLLAL

136823

P79702 >SPR|P79702|FOS_CYPCA (347 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Cyprinus carpio (Common carp)] MMFTSLNADCDASSRCSTASAAAESVACYPLNQT QKFTELSVSSASFVPTVTAISSCPDL QWMVQPMVSSVAPSNGGARSYNPNPYPKMRVTGTKSPNSNKRARAE

QLSPEEEEKKRVRR ERNKMAAAKCRNRRRELTDTLQAETDELEDEKSALQNDIANLLKEKERLEFILAAHKPIC K IPSSSVSPIPAASVPEIHSITTSVVSTANAPVTTSSSSSLFSSTASTDSFGSTVEISDL EPTLEESLELLAKAELETARSVPDVDLSSSLYARDWESLYTPANNDLEPLCTPVVTRTPA CTTYTSSFTFTYPENDVFPSCGPVHRRGS SSNDQSSDSLNSPTLLTL

136823

Q8HZP6

>TRE|Q8HZP6|Q8HZP6 (381 AA) Immediate early protein [Felis silvestris catus (Cat)] MMFSGFNADYEASSSRCSSASPAGDNLSYYHSPADSFSSMGSPVNAQDFCTDLAVSSANF IPTVTA ISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGVPAPSAGAYSRAGVVKTVTAGGR AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFP EEMSVASLDLSGGLPEAATPESEEAF TLPLLNDPEPKPSVEPVKSISSMELKAEPFDDFLFPASSRPSGSETARSVPDMDLSGSFY AADWEPLHGGSLGMGPMATELEPLCTPVVTCTPSCTTYTSSFVFTYPEADSFPSCGAAHR K GSSSNEPSSDSLSSPTLLAL

136823

P01101

>SPR|P01101|FOS_MOUSE (380 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Mus musculus (Mouse)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSP VNTQDFCADLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQSAGAYARAGMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKEK EKLEFILAAHRPACKIPDDLGFPEEMSVASLDLTGGLPEASTPESEEAFT LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETSRSVPDVDLSGSFYA ADWEPLHSNSLGMGPMVTELEPLCTPVVTCTPGCTTYT SSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL

136823

O88479

MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNAQDFCTDLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGVPTPSTGAYSRAGMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNK MAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEEMFVASLDLTGGLPEATTPESEEAFS LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETTARSVPDMDLS GSFY AADWEPLHSSSLGMGPMVTELEPLCTPVVTCTPSCTTYTSSFVFTYPEADSFPSCAAAHR KGSSSNEPSSDSLSSPTLLAL

136823

O97930 >TRE|O97930|O97930 (380 AA) P55-C-FOS proto-oncogene protein (Cellular oncogene C-FOS) (C-FOS) [Sus scrofa (Pig)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADS FSSMGSPVNAQDFCTDLAVSSVNF IPTVTAISISPDLQWLVQPTLVSSVAPSQTRAPHPYGVPTPSAGAYSRAGAVKTMPGGRA

QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEI ANLLKEKEKLEFILAAHRPACKIPDDLGFPEEMSVASLDLSGGLPEAATPESEEAFT LPLLNDPEPKPSVEPVKKVSSMELKAEPFDDFLFPASSRPGGSETARSVPDMDLSGSFYA ADWEPLHGGSLGMGPMATELEPLCTPVVTCT PSCTAYTSSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL

136823

P01102

>SPR|P01102|FOS_MSVFB (381 AA) p55-v-fos transforming protein [FBJ murine osteosarcoma virus] MMFSGFNADYEASSFRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVS SANF IPTVTATSTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQSAGAYARAEMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDKKSALQ TEIANLLKEKEKLEFILAAHRPA CKIPDDLGFPEEMSVASLDLTGGLPEASTPESEEAFT LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETSRSVPNVDLSGSFYA ADWEPLHSNSLGMGPMVTELEPLCTPVVTCTPLLRLPELTHAAGPVSSQRR QGSRHPDVP LPELVHYREEKHVFPQRFPST

136823

P12841

>SPR|P12841|FOS_RAT (380 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Rattus norvegicus (Rat)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGS PVNTQDFCADLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPSTGAYARAGVVKTMSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKE KEKLEFILAAHRPACKIPNDLGFPEEMSVTSLDLTGGLPEATTPESEEAFT LPLLNDPEPKPSLEPVKNISNMELKAEPFDDFLFPASSRPSGSETARSVPDVDLSGSFYA ADWEPLHSSSLGMGPMVTELEPLCTPVVTCTPSCTTY TSSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL
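Most SEQUENCE values in the sample above begin with a header of the form >DB|ACCNO (N AA) or >DB|ACCNO|NAME (N AA), followed by a description and the residues. The Java sketch below shows one way such a header could be split into its parts. It is only an illustration: the class name `SequenceHeaderSketch` and the regular expression are assumptions derived from the sample rows, not part of the project, and values without a leading > are simply reported as not matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the header at the start of a SEQUENCE value.
class SequenceHeaderSketch {
    // Groups: 1 = database code, 2 = accession, 3 = optional entry name, 4 = length.
    private static final Pattern HEADER =
            Pattern.compile("^>([A-Z]+)\\|([^| ]+)(?:\\|(\\S+))? \\((\\d+) AA\\)");

    // Returns {db, accno, length} or null when the value has no such header.
    static String[] parse(String value) {
        Matcher m = HEADER.matcher(value);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(4)};
    }
}
```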


Source Code for the HTML pages:

The first page is the index.html page. The following is the source code for the page:

<html>
<body background="bck.jpg">
<font color="blue"><br>
<hr>
<center><b><i><font face="Times New Roman" size="5"><h1>CIS-734 DATA MINING</h1></font></i></b>
<hr>
<br><br>
<img src="W1.jpg" border="3">
<br><br>
<a href="ab.doc">DOCUMENTATION OF PROJECT</a><br>
<a href="link.html">IMPLEMENTATION OF PROJECT</a><br>
</center>
<br><br><br><br>
<p align="left"><font size="4">SUBMITTED BY:<br>
ASAD SIDDIQUI.<br>
E-Mail: <a href="mailto:[email protected]">[email protected]</a><br>
SUPRIYA MALHOTRA.<br>
E-Mail: <a href="mailto:[email protected]">[email protected]</a><br>
OJUS BATHLA<br>
E-Mail: <a href="mailto:[email protected]">[email protected]</a><br>
</font></p>
</font>
</body>
</html>

The second page is link.html, which is reached through the 'IMPLEMENTATION OF PROJECT' link:

<html>
<body background="bck.jpg">
<font color="blue"><br><br>
<b><i><center>
<font face="Times New Roman" size="5">SEARCH ENGINE</font>
<hr><br>
<form action="prots.jsp" method="POST">
Protein Name: <input type="text" name="txtpname"> <input type="submit" name="butpname" value="SEARCH">
<br><br>
Database: <input type="text" name="txtdb"> <input type="submit" name="butdb" value="SEARCH">
<br><br>
Raccno: <input type="text" name="txtraccno"> <input type="submit" name="butraccno" value="SEARCH">
<br><br>
Cluster: <input type="text" name="txtclusterno"> <input type="submit" name="butcluster" value="SEARCH">
<br><br><br>
<input type="reset" name="reset">
</form>
</center></i></b>
</font>
</body>
</html>

The third page is the aa.html page which displays the tables after search. The following is the source code:

<TD>
<table align=center border=1>
<tr>
<td align=right>Database Name: </td> <td> <%=dbname%> </td>
<td align=right> Raccno: </td> <td> <%=raccno%> </td>
<td align=right> Protein Name: </td> <td><%=pname%> </td>
<td align=right> Gene Name: </td> <td> <%=gname%> </td>
<td align=right>Cluster No: </td> <td><%=clusterno%> </td>
<td align=right>Description: </td> <td><%=description%> </td>
<td align=right>Sequence No:</td> <td> <a href="seq.jsp?seq=<%=seq%>&pname=<%=pname%>">Sequence Details</a> </td>
</tr>
</table>
</TD>

JSP SOURCE CODE

In our project the JSP page prots.jsp is used. The following is the source code of the JSP page:

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
<%@page import="java.io.*"%>
<%@page import="java.net.*"%>
<%@page import="java.sql.*"%>
<%@page import="javax.servlet.*"%>
<%@page import="javax.servlet.http.*"%>
<%@page import="java.util.*"%>
<%@page import="java.sql.Connection"%>
<%@page import="java.sql.DriverManager"%>
<%@page import="java.sql.SQLException"%>

<html>
<head><title>Search Result</title></head>
<body background="bck.jpg">

<%
try {
    String db = "";
    String raccno = null;
    String pname = null;
    String description = null;
    String gname = null;
    String identical_to = null;
    String tre = "TRE ";
    // String fragment_no = null;
    String cluster_no = null;
    String seq = null;

    // Values typed into the search form (link.html)
    String txtpname = request.getParameter("txtpname");
    String txtdb = request.getParameter("txtdb");
    String txtraccno = request.getParameter("txtraccno");
    String txtclusterno = request.getParameter("txtclusterno");

    // Load the Oracle JDBC driver
    DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

    String url = "jdbc:oracle:thin:@prophet.njit.edu:1521:course";

    try {
        String url1 = System.getProperty("JDBC_URL");
        if (url1 != null)
            url = url1;
    } catch (Exception e) {
        // If there is any security exception, ignore it
        // and use the default
    }

    // Connect to the database
    Connection conn = DriverManager.getConnection(url, "sm363", "hariom");

    // Create a Statement
    Statement stmt = conn.createStatement();
    Statement stmtdburl = conn.createStatement();
    ResultSet rs = stmt.executeQuery("select * from systers_protein_table s, protein_sequences_table p where (s.pname=upper('"+txtpname+"') or s.db=upper('"+txtdb+"') or s.raccno=upper('"+txtraccno+"') or s.cluster_no=upper('"+txtclusterno+"')) and (s.raccno = p.accno)");
    //ResultSet rs = stmt.executeQuery("select * from systers_protein_table s, protein_sequences_table p where s.raccno = p.accno");
%>
<br>
<table align=center border=1>
<tr>
<td align=right>Database Name: </td>

<td align=right> Raccno: </td>
<td align=right>Description: </td>
<td align=right>Cluster No: </td>
<td align=right> Gene Name: </td>
<td align=right>Identical_to: </td>
<td align=right>Cluster No: </td>
<td align=right>SEQUENCE: </td>
</tr>
<%
    boolean flag1 = true;
    while (rs.next()) {
        flag1 = false;
        db = rs.getString(1);
        raccno = rs.getString(2);
        pname = rs.getString(3);
        description = rs.getString(4);
        gname = rs.getString(5);
        identical_to = rs.getString(6);
        // fragment_of = rs.getString(7);
        cluster_no = rs.getString(8);
        seq = rs.getString(10);
        //I have to code here
%>

<tr>
<td> <%=db%> </td>
<%
        if (db.equals("TRE")) {
%>
<td> <a href="http://ca.expasy.org/uniprot/<%=raccno%>"><%=raccno%></a> </td>
<%
        }
        if (db.equals("DM")) {
%>
<td> <a href="http://www.ensembl.org/Drosophila_melanogaster/protview?peptide=<%=raccno%>"><%=raccno%></a> </td>
<%
        }
%>
<td> <a href="seq.jsp?raccno=<%=raccno%>"><%=raccno%></a> </td>
<td><%=description%> </td>
<td><%=cluster_no%> </td>
<td> <%=gname%> </td>
<td><%=identical_to%> </td>
<td><%=cluster_no%> </td>
<td><%=seq%> </td>
</tr>
<%
    }
%>
</table>
<%
    if (flag1) {
        RequestDispatcher dis = request.getRequestDispatcher("/errorprotein.jsp");
        if (dis != null)
            dis.forward(request, response);
    }
} catch (Exception e) {
    out.println("Got an exception! " + e);
}
%>
</body>
</html>
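The query in prots.jsp is built by concatenating the form values directly into the SQL text, so a quote character typed into a search field changes the statement itself. A possible alternative, sketched below under assumptions, is to keep the SQL text fixed and pass the values as ? parameters. The class name `SearchQuerySketch` and its fields are hypothetical and not part of the project; only the table and column names are taken from the listing above. Unlike the original query, this sketch only adds a condition for the fields that were actually filled in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for a parameterized version of the search query.
class SearchQuerySketch {
    final String sql;           // SQL text with ? placeholders
    final List<String> params;  // values to bind, in placeholder order

    private SearchQuerySketch(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    static SearchQuerySketch build(String pname, String db, String raccno, String clusterNo) {
        String[] columns = {"s.pname", "s.db", "s.raccno", "s.cluster_no"};
        String[] values = {pname, db, raccno, clusterNo};
        List<String> conditions = new ArrayList<>();
        List<String> params = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            // Skip search fields the user left empty.
            if (values[i] != null && !values[i].trim().isEmpty()) {
                conditions.add(columns[i] + " = upper(?)");
                params.add(values[i].trim());
            }
        }
        if (conditions.isEmpty()) {
            throw new IllegalArgumentException("at least one search field is required");
        }
        String sql = "select * from systers_protein_table s, protein_sequences_table p"
                + " where (" + String.join(" or ", conditions) + ") and s.raccno = p.accno";
        return new SearchQuerySketch(sql, params);
    }
}
```

In the JSP, the resulting sql string would be passed to Connection.prepareStatement, and each entry of params bound in order with PreparedStatement.setString before calling executeQuery.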

SCREENSHOTS – Project Execution

The screenshots below show, step by step, how the project is executed. They also show how it connects to the web when we click on a raccno link:


When we click on the raccno link on the page above, it takes us to the following web page. This shows that our project is also connected to the web.
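The choice of link target can be summarised in a small Java sketch. The URL prefixes are the ones used in prots.jsp; the class name `ExternalLinkSketch` and the method `linkFor` are hypothetical. Note one simplification: prots.jsp emits the external link and the local seq.jsp link side by side, whereas this sketch returns a single target and falls back to the local page for other database codes.

```java
// Hypothetical helper that maps a database code and raccno to a link target.
class ExternalLinkSketch {
    static String linkFor(String db, String raccno) {
        if ("TRE".equals(db)) {
            // TrEMBL entries link out to the UniProt page.
            return "http://ca.expasy.org/uniprot/" + raccno;
        }
        if ("DM".equals(db)) {
            // Drosophila melanogaster entries link out to Ensembl.
            return "http://www.ensembl.org/Drosophila_melanogaster/protview?peptide=" + raccno;
        }
        // Any other database code is served from the local sequence page.
        return "seq.jsp?raccno=" + raccno;
    }
}
```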


When we click on the description of a particular raccno, the data is retrieved from the local host, as shown below:


Summary

The project was implemented successfully. It can be tested at the following URL: http://web.njit.edu/~sm363

This project works like a search engine and retrieves relevant data from the local database as well as from the web. It is a user-friendly search engine. We learnt a lot from this project, including data mining techniques and technologies such as JSP, HTML, SOAP and SQL. We also learnt how to connect a database to a website using JDBC.