Computational Biology Dr. Jens Allmer Lecture Slides Week 5

Page 1: Computational Biology Dr. Jens Allmer Lecture Slides Week 5

Computational Biology

Dr. Jens Allmer

Lecture Slides Week 5

Page 2

MakeDB

• Example
– makeblastdb -in seq.fasta -dbtype prot -out seqBl -title seqBlastDB

• More information?
– Go to the doc folder of BLAST
– Documentation is there
– http://www.ncbi.nlm.nih.gov/books/NBK1763/

Page 3

BLAST

• Now that we have an indexed database try to run BLAST

• Read documentation and try to solve the simplest case
– You will need the indexed database and you will need a FASTA file as query
– You could create queries from the database and slightly change them

• Good luck
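One way to approach the exercise is sketched below in Python. It assumes the database prefix seqBl from the MakeDB slide; the file names query.fasta and hits.tsv, the demo sequence, and all helper names are hypothetical. The options -query, -db, -out and -outfmt are standard blastp options, but the sketch only builds the command and does not run it.

```python
def read_first_fasta(text):
    """Return (header, sequence) of the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # a second record begins; keep only the first
            header = line[1:]
        elif header is not None and line:
            chunks.append(line)
    return header, "".join(chunks)


def mutate(seq, position, residue):
    """Swap one residue (0-based position) to get a slightly changed query."""
    return seq[:position] + residue + seq[position + 1:]


def blastp_command(query, db, out):
    """Build a blastp call; -outfmt 6 requests tab-separated hits."""
    return ["blastp", "-query", query, "-db", db, "-out", out, "-outfmt", "6"]


# Made-up record standing in for an entry copied out of seq.fasta.
example = ">demo_protein\nMKTAYIAKQR\nQISFVKSHFS\n"
header, seq = read_first_fasta(example)
query_seq = mutate(seq, 2, "A")  # T -> A at the third residue
cmd = blastp_command("query.fasta", "seqBl", "hits.tsv")

print(f">{header}_mutated\n{query_seq}")
print(" ".join(cmd))
```

After saving the printed record as query.fasta, the command list can be passed to subprocess.run(cmd, check=True) on a machine where BLAST+ is installed.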

Page 4

OMSSA

• Unzip folder and check
– Alternatively, download from NCBI

• MS/MS mgf file
• Database file as FASTA
• makeblastdb.exe
• omssacl.exe
• usermods.xml

Page 5

OMSSA

Before running OMSSA, the database file must be converted to a BLAST-like format.

So let’s run makeblastdb.exe to create a hash-indexed database.

Page 6

OMSSA

Here, two different settings are used:
• The first one uses a product ion tolerance of 0.05
• The second one uses the default product ion tolerance
• For variable modifications (-mv), check usermods.xml
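The two runs could be assembled roughly as follows. This is a sketch, not the commands shown on the original slide: the option letters -fm (MGF input), -d (database), -to (product ion tolerance), -mv (variable modifications) and -oc (CSV output) are taken from memory of the OMSSA command line and should be checked against the program's built-in help. The file names and the helper function are hypothetical.

```python
def omssa_command(mgf, db, out_csv, product_tol=None, variable_mods=None):
    """Build an omssacl call; options left as None fall back to OMSSA defaults."""
    cmd = ["omssacl", "-fm", mgf, "-d", db, "-oc", out_csv]
    if product_tol is not None:
        cmd += ["-to", str(product_tol)]  # assumed flag for product ion tolerance
    if variable_mods:
        # Modification numbers as listed in usermods.xml / the OMSSA mod list.
        cmd += ["-mv", ",".join(str(m) for m in variable_mods)]
    return cmd


# First setting: product ion tolerance of 0.05.
run_tight = omssa_command("spectra.mgf", "seqBl", "run_tol_0.05.csv", product_tol=0.05)
# Second setting: default product ion tolerance (no -to given).
run_default = omssa_command("spectra.mgf", "seqBl", "run_default.csv")

print(" ".join(run_tight))
print(" ".join(run_default))
```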

Page 7

X!Tandem

• Unzip folder and check

• MGF-formatted spectra (file)
• Database file (FASTA)
• tandem-win32-10-12-01-1 folder
• Used .xml configuration files (default_input.xml, input.xml and taxonomy.xml)
• To get the same output given in the zip folder:
– Replace the configuration files in the «tandem-win\bin» folder with the ones in the «used» folder.
– Also copy the database file to the «fasta» folder and the .mgf file to «bin» in «tandem-win»

Page 8

X!Tandem Console Application

Page 9

MBG404 Overview

Data

Generation

Processing

Storage

Mining

Pipelining

Page 10

X!Tandem Default Input

Parameters such as mass tolerances, enzyme type, and the number of charges considered in the search can be reset in default_input.xml
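To see which values a default_input.xml currently holds, the file can be read with a few lines of Python. The sketch below parses a small inline excerpt rather than the real file; the labels are ones commonly found in X!Tandem parameter files, while the values shown and the helper name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Inline excerpt in the style of default_input.xml (values are illustrative).
SAMPLE = """
<bioml>
  <note type="input" label="spectrum, fragment monoisotopic mass error">0.4</note>
  <note type="input" label="spectrum, parent monoisotopic mass error plus">100</note>
  <note type="input" label="protein, cleavage site">[RK]|{P}</note>
  <note type="input" label="spectrum, maximum parent charge">4</note>
  <note>A free-text comment without a label is skipped.</note>
</bioml>
"""


def read_parameters(xml_text):
    """Map each input label to its value."""
    root = ET.fromstring(xml_text.strip())
    return {
        note.get("label"): (note.text or "").strip()
        for note in root.iter("note")
        if note.get("type") == "input"
    }


params = read_parameters(SAMPLE)
for label, value in params.items():
    print(f"{label} = {value}")
```

For the real file, read_parameters(open("default_input.xml").read()) would return the full set of parameters.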

Page 11

X!Tandem Input.xml

In the input.xml file, you should specify the paths of:
• taxonomy.xml
• default_input.xml
• Spectra filename
• Output filename
NOTE: Here input.xml and all files above are in the same folder (directory)
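A minimal input.xml with these four entries could be generated as sketched below. The note labels follow the usual X!Tandem parameter naming; the taxon name mydb, the file names spectra.mgf and output.xml, and the helper function are hypothetical, and the "protein, taxon" entry is added on the assumption that the search must name a taxon defined in taxonomy.xml.

```python
import xml.etree.ElementTree as ET


def build_input_xml(defaults, taxonomy, taxon, spectra, output):
    """Return the text of an input.xml that points X!Tandem at the given files."""
    root = ET.Element("bioml")
    entries = [
        ("list path, default parameters", defaults),
        ("list path, taxonomy information", taxonomy),
        ("protein, taxon", taxon),
        ("spectrum, path", spectra),
        ("output, path", output),
    ]
    for label, value in entries:
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")


# All files sit in the same folder, so bare file names are enough.
input_xml = build_input_xml(
    "default_input.xml", "taxonomy.xml", "mydb", "spectra.mgf", "output.xml"
)
print(input_xml)
```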

Page 12

X!Tandem Taxonomy

In the taxonomy file, you should specify the «database file path». In this example, the database file is in the «fasta» folder inside the «Xtandem\tandem-win32-10-12-01-1» folder.
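The taxonomy file itself is small. The sketch below writes one in the usual X!Tandem layout (a taxon element holding a file element with format and URL attributes); the taxon label mydb and the relative path ../fasta/database.fasta are hypothetical placeholders for the actual database file.

```python
import xml.etree.ElementTree as ET


def build_taxonomy_xml(taxon, fasta_path):
    """Return taxonomy.xml text that maps one taxon label to one FASTA file."""
    root = ET.Element("bioml", {"label": "x! taxon-to-file matching list"})
    taxon_el = ET.SubElement(root, "taxon", {"label": taxon})
    ET.SubElement(taxon_el, "file", {"format": "peptide", "URL": fasta_path})
    return ET.tostring(root, encoding="unicode")


# Path is relative to the bin folder, where tandem.exe is started.
taxonomy_xml = build_taxonomy_xml("mydb", "../fasta/database.fasta")
print(taxonomy_xml)
```

The taxon label chosen here has to match the taxon named in input.xml.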

Page 13

X!Tandem Output

Page 14

Console Applications

Why

Page 15

HTML

• What you need to know about hyper text markup language

• How to reach it
– Right click the document in your browser
– Make sure you do not click on an image, a link, or some other element with its own context menu
– Choose View Source or View Page Source.

• What’s in the source

• Sometimes things are not visible/ accessible on the web page but can be retrieved from the source

Page 16

HTML Structure

<HTML>

<HEAD>

<TITLE>Page title seen in the title bar</TITLE>

<!-- Some other links and scripts can be here -->

</HEAD>

<BODY>

Text and other visible elements go here

</BODY>

</HTML>

Page 17

HTML Input

<FORM action="destination" method="POST/GET">

<INPUT type="TYPES" name="" id="" value="" />

<TEXTAREA name="" id="">value</TEXTAREA>

<SELECT name="" id="">

<OPTION value="">display</OPTION>

</SELECT>

</FORM>

TYPES: { text, password, checkbox, radio, submit, reset, file, hidden, image, button}

Page 18

Why?

• Why do you need this information?

• Some information may be inaccessible on the website
– In the HTML code it will be accessible

• Sometimes you may be interested in all settings for the programs that you used online

• Often these settings are in hidden input fields (you need to check the source then)
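Instead of reading the page source by eye, the hidden fields can be collected with Python's standard html.parser module. The sketch below runs on a small made-up form; the field names matrix and word_size are hypothetical and are not taken from any real web service.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


PAGE = """
<HTML><BODY>
<FORM action="search" method="POST">
  <INPUT type="text" name="query" value="" />
  <INPUT type="hidden" name="matrix" value="BLOSUM62" />
  <INPUT type="hidden" name="word_size" value="3" />
  <INPUT type="submit" value="Search" />
</FORM>
</BODY></HTML>
"""

collector = HiddenInputCollector()
collector.feed(PAGE)
hidden = collector.hidden
print(hidden)
```

For a real page, the text copied from View Source (or downloaded with urllib.request) would be passed to feed() in place of PAGE.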

Page 19

NCBI Blast

• Contains many hidden variables; here are some:

Page 20

Theory I

Page 21

MBG404 Overview

Data

Generation

Processing

Storage

Mining

Pipelining

Page 22

Database Management Systems

Page 23

Database Management Systems

Company/Organization            Database Size (GB)  DBMS          System Arch.  DBMS Vendor  System Vendor  Storage Vendor
France Telecom                  29,232              Oracle        SMP           Oracle       HP             HP
AT&T                            26,269              Daytona       SMP           AT&T         Sun            Sun
SBC                             24,805              Teradata      MPP           Teradata     NCR            LSI
Anonymous                       16,191              DB2 for Unix  MPP/Cluster   IBM          IBM            IBM
Amazon.com                      13,001              Oracle        SMP           Oracle       HP             HP
Kmart                           12,592              Teradata      MPP           Teradata     NCR            LSI
Claria Corporation              12,100              Oracle        SMP           Oracle       Sun            Hitachi
Health Insurance Review Agency  11,942              Sybase IQ     Cluster       Sybase       HP             Hitachi
FedEx Services                  9,981               Teradata      MPP           Teradata     NCR            EMC
Vodafone D2 GmbH                9,108               Teradata      MPP           Teradata     NCR            LSI

Page 24

Database Management Systems

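[Figure: levels of abstraction in a DBMS. Users work with View 1, View 2, and View 3; the views are defined on the Conceptual Schema, which is mapped to the Physical Schema, which describes how the data is stored in the DB.]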

Page 25

Database Management Systems

Page 26

A Relation is a Table

Relation name: Beers

Attributes (column headers): name, manf

Tuples (rows):
name        manf
Winterbrew  Pete’s
Bud Lite    Anheuser-Busch

Contains data -> Instance

Domain: all possible values

Page 27

Schemas

• Relation schema = relation name and attribute list.
– Optionally: types of attributes.
– Example: Beers(name, manf) or Beers(name: string, manf: string)
• Database = collection of relations.
• Database schema = set of all relation schemas in the database.
• Instance of a relation = a table in a database with data
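The difference between a schema and an instance can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for whatever DBMS is used in class, and the two rows are the Beers tuples from the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Relation schema with attribute types: Beers(name: string, manf: string)
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")

# The instance is the data currently stored in the relation.
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Winterbrew", "Pete's"), ("Bud Lite", "Anheuser-Busch")],
)

# PRAGMA table_info lists one row per attribute; index 1 is the column name.
schema = [row[1] for row in conn.execute("PRAGMA table_info(Beers)")]
instance = conn.execute("SELECT name, manf FROM Beers ORDER BY name").fetchall()

print("schema:  ", schema)
print("instance:", instance)
```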

Page 28

Anomalies

• Goal of relational schema design is to avoid anomalies and redundancy.
– Update anomaly: one occurrence of a fact is changed, but not all occurrences.
– Deletion anomaly: valid fact is lost when a tuple is deleted.

Page 29

Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  ???         WickedAle   Pete’s  ???
Spock    Enterprise  Bud         ???     Bud

Data is redundant, because each of the ???’s can be easily figured out.

Page 30

This Bad Design Also Exhibits Anomalies

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
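Both anomalies can be reproduced on the Drinkers table with a few statements. The sketch below uses Python's sqlite3 module as a stand-in DBMS; the "careless" UPDATE that touches only one of Janeway's tuples is an illustrative assumption about how the mistake might happen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Drinkers (name TEXT, addr TEXT, beersLiked TEXT, "
    "manf TEXT, favBeer TEXT)"
)
conn.executemany(
    "INSERT INTO Drinkers VALUES (?, ?, ?, ?, ?)",
    [
        ("Janeway", "Voyager", "Bud", "A.B.", "WickedAle"),
        ("Janeway", "Voyager", "WickedAle", "Pete's", "WickedAle"),
        ("Spock", "Enterprise", "Bud", "A.B.", "Bud"),
    ],
)

# Update anomaly: only one of Janeway's two tuples gets the new address.
conn.execute(
    "UPDATE Drinkers SET addr = 'Intrepid' "
    "WHERE name = 'Janeway' AND beersLiked = 'Bud'"
)
addresses = {
    row[0] for row in conn.execute("SELECT addr FROM Drinkers WHERE name = 'Janeway'")
}
print("Janeway's addresses:", sorted(addresses))  # two conflicting answers

# Deletion anomaly: once nobody likes Bud, its manufacturer is gone as well.
conn.execute("DELETE FROM Drinkers WHERE beersLiked = 'Bud'")
bud_manf = conn.execute(
    "SELECT manf FROM Drinkers WHERE beersLiked = 'Bud'"
).fetchall()
print("Manufacturer of Bud:", bud_manf)
```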

Page 31

1st Normal Form

All attributes need to be atomic

Page 32

2nd Normal Form

Must be in 1st NF
A key must uniquely identify each tuple, and every attribute that is not part of the key must depend on the whole key

Page 33

3rd Normal Form

Must be in 2nd NF
Attributes not part of a key must directly depend on one of the keys
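Applied to the Drinkers example, the normal forms suggest splitting the single wide table into smaller ones. The decomposition below is one possible sketch, not necessarily the one intended in the lecture; the table name Likes and the choice of keys are assumptions, and SQLite is used only as a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- manf depends only on the beer, so it gets its own relation.
    CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT);
    -- addr and favBeer depend only on the drinker.
    CREATE TABLE Drinkers (
        name TEXT PRIMARY KEY,
        addr TEXT,
        favBeer TEXT REFERENCES Beers(name)
    );
    -- Which drinker likes which beer; the pair is the key.
    CREATE TABLE Likes (
        drinker TEXT REFERENCES Drinkers(name),
        beer TEXT REFERENCES Beers(name),
        PRIMARY KEY (drinker, beer)
    );
    INSERT INTO Beers VALUES ('Bud', 'A.B.'), ('WickedAle', 'Pete''s');
    INSERT INTO Drinkers VALUES ('Janeway', 'Voyager', 'WickedAle'),
                                ('Spock', 'Enterprise', 'Bud');
    INSERT INTO Likes VALUES ('Janeway', 'Bud'), ('Janeway', 'WickedAle'),
                             ('Spock', 'Bud');
    """
)

# The address is stored once, so one UPDATE changes every occurrence.
conn.execute("UPDATE Drinkers SET addr = 'Intrepid' WHERE name = 'Janeway'")
old_addr_rows = conn.execute(
    "SELECT COUNT(*) FROM Drinkers WHERE addr = 'Voyager'"
).fetchone()[0]

# Nobody likes Bud any more, yet its manufacturer is still on record.
conn.execute("DELETE FROM Likes WHERE beer = 'Bud'")
bud_manf = conn.execute("SELECT manf FROM Beers WHERE name = 'Bud'").fetchone()[0]

print(old_addr_rows, bud_manf)
```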

Page 34

One-One Relationships

• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.

• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
– A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).

Page 35

Many-One Relationships

• Some binary relationships are many-one from one entity set to another.

• Each entity of the first set is connected to at most one entity of the second set.

• But an entity of the second set can be connected to zero, one, or many entities of the first set.

Page 36

Many-Many Relationships

• Focus: binary relationships, such as Sells between Bars and Beers.

• In a many-many relationship, an entity of either set can be connected to many entities of the other set.
– E.g., a bar sells many beers; a beer is sold by many bars.
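In a relational database, a many-many relationship such as Sells is usually stored as a table of its own that holds one row per (bar, beer) pair. The sketch below uses sqlite3; the bar names Bar A and Bar B are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Bars (name TEXT PRIMARY KEY);
    CREATE TABLE Beers (name TEXT PRIMARY KEY);
    -- One row per pair; the composite key forbids duplicate pairs.
    CREATE TABLE Sells (
        bar TEXT REFERENCES Bars(name),
        beer TEXT REFERENCES Beers(name),
        PRIMARY KEY (bar, beer)
    );
    INSERT INTO Bars VALUES ('Bar A'), ('Bar B');
    INSERT INTO Beers VALUES ('Bud'), ('WickedAle');
    INSERT INTO Sells VALUES ('Bar A', 'Bud'), ('Bar A', 'WickedAle'),
                             ('Bar B', 'Bud');
    """
)

# A bar sells many beers ...
beers_at_a = [
    row[0]
    for row in conn.execute("SELECT beer FROM Sells WHERE bar = 'Bar A' ORDER BY beer")
]
# ... and a beer is sold by many bars.
bars_with_bud = [
    row[0]
    for row in conn.execute("SELECT bar FROM Sells WHERE beer = 'Bud' ORDER BY bar")
]
print(beers_at_a, bars_with_bud)
```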

Page 37

End Theory I

• 5 min mindmapping
• 10 min break

Page 38

Practice I

Page 39

MS Access

• Create new Tables:
– Plant
– Features
– FeatureTypes

Page 40

Create a Table

Page 41

Create a Table

Page 42

Edit a Table

Page 43

Create the Three Tables

• Plant
• Features
• FeatureTypes

Page 44

Add Attributes

• Plant
– ID
– Gender
– Species
– Strain
– Clone

Page 45

Add Attributes

• Features
– ID
– FeatureType
– Value

Page 46

Add Attributes

• FeatureTypes
– ID
– Type
– Unit
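For comparison with the Access table designer, the same three tables can be written down as SQL. The sketch below uses Python's sqlite3 module rather than Access, assigns the attributes ID, Type and Unit to FeatureTypes, and guesses the data types; the reference from Features.FeatureType to FeatureTypes.ID is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Plant (
        ID INTEGER PRIMARY KEY,
        Gender TEXT,
        Species TEXT,
        Strain TEXT,
        Clone TEXT
    );
    CREATE TABLE FeatureTypes (
        ID INTEGER PRIMARY KEY,
        Type TEXT,
        Unit TEXT
    );
    CREATE TABLE Features (
        ID INTEGER PRIMARY KEY,
        FeatureType INTEGER REFERENCES FeatureTypes(ID),  -- assumed link
        Value REAL                                        -- assumed numeric
    );
    """
)


def columns(table):
    """Return the attribute names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


for table in ("Plant", "Features", "FeatureTypes"):
    print(table, columns(table))
```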

Page 47

Table Space

Page 48

Notice

Page 49

More Editing

Page 50

More Editing

Page 51

Notice

Page 52

Fill with Data

• Import the data in the plants.csv file
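Outside of Access, the same import can be scripted. The sketch below loads CSV rows into a Plant table with Python's csv and sqlite3 modules. It assumes that the file has a header row with the column names ID, Gender, Species, Strain and Clone; the sample rows are placeholders, since the real plants.csv is not reproduced here.

```python
import csv
import io
import sqlite3


def import_plants(conn, csv_file):
    """Insert every CSV row into Plant; returns the number of rows added."""
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["ID"]), r["Gender"], r["Species"], r["Strain"], r["Clone"])
        for r in reader
    ]
    conn.executemany("INSERT INTO Plant VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Plant (ID INTEGER PRIMARY KEY, Gender TEXT, "
    "Species TEXT, Strain TEXT, Clone TEXT)"
)

# Placeholder content standing in for plants.csv.
sample = io.StringIO(
    "ID,Gender,Species,Strain,Clone\n"
    "1,female,species_a,strain_x,clone_1\n"
    "2,male,species_a,strain_y,clone_2\n"
)
imported = import_plants(conn, sample)
stored = conn.execute("SELECT COUNT(*) FROM Plant").fetchone()[0]
print(imported, stored)
```

For the real file, open("plants.csv", newline="") would be passed in place of the StringIO object.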

Page 53

Select Appropriate table

Page 54

Some adjustments are needed here

Page 55

Need to name the columns appropriately

Page 56

Insert Data

• Import Feature table
• Import features txt file

Page 57

Real Data

• Download GO Terms:
– http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz
• Decompress the .gz archive and change the file extension to .xml so that Access can import it
• Import file into Access
– May take a short while
– Errors will occur (we ignore them for now)
• Have a look at the tables
• Analyze the relationships (were they imported?)
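The downloaded file is gzip-compressed, so it has to be unpacked before Access can read it as XML. A minimal sketch with Python's standard library follows; the helper name is hypothetical, and the demo uses a tiny stand-in archive instead of the real download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_to_xml(gz_path):
    """Decompress a .gz file next to the original, using an .xml extension."""
    gz_path = Path(gz_path)
    xml_path = gz_path.with_suffix(".xml")  # replaces only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large files are fine
    return xml_path


# Self-contained demo: build a small archive, then unpack it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "go_daily-termdb.obo-xml.gz"
    with gzip.open(archive, "wb") as fh:
        fh.write(b"<obo></obo>")
    result = gunzip_to_xml(archive)
    result_name = result.name
    result_text = result.read_text()

print(result_name)
```

Calling gunzip_to_xml on the real download yields a file ending in .obo-xml.xml, which can then be selected in the Access XML import dialog.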

Page 58

End