Computational Biology
Dr. Jens Allmer
Lecture Slides Week 5
MakeDB
• Example
– makeblastdb -in seq.fasta -dbtype prot -out seqBl -title seqBlastDB
• More information?
– Go to the doc folder of BLAST; the documentation is there
– http://www.ncbi.nlm.nih.gov/books/NBK1763/
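The example command can also be launched from a script. The following is a minimal sketch in Python, assuming that makeblastdb is on the PATH and that seq.fasta sits in the working directory; the helper name build_makeblastdb_cmd is hypothetical and only wraps the flags shown on the slide.

```python
import os
import shutil
import subprocess

def build_makeblastdb_cmd(fasta, dbtype, out, title):
    """Assemble the makeblastdb call from the slide as an argument list."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype,
            "-out", out, "-title", title]

cmd = build_makeblastdb_cmd("seq.fasta", "prot", "seqBl", "seqBlastDB")
print(" ".join(cmd))

# Run the command only if BLAST+ is installed and the input file exists,
# so the sketch does not fail on a machine without either.
if shutil.which("makeblastdb") and os.path.exists("seq.fasta"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.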
BLAST
• Now that we have an indexed database, try to run BLAST
• Read the documentation and try to solve the simplest case
– You will need the indexed database and a FASTA file as the query
– You could create queries from the database and change them slightly
• Good luck
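One way to create a query from the database and change it slightly is sketched below in Python. The two-record FASTA text is made up and stands in for seq.fasta; the function names read_fasta and mutate are hypothetical. The resulting text could be saved to a file and passed to blastp through its -query option, with the indexed database given through -db.

```python
import random

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def mutate(seq, n_changes, rng):
    """Replace n_changes randomly chosen positions with a different amino acid."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_changes):
        choices = [aa for aa in alphabet if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)

# Made-up database content, standing in for seq.fasta.
db_text = """>prot1 example protein
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQ
>prot2 another example
MSDNELKRTA
"""

rng = random.Random(0)  # fixed seed so the query is reproducible
records = read_fasta(db_text)
header, seq = records[0]
query = mutate(seq, 3, rng)
query_fasta = f">query_from_{header.split()[0]}\n{query}\n"
print(query_fasta)
```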
OMSSA
• Unzip folder and check
– Alternatively, download from NCBI
• MS/MS mgf file
• Database file as FASTA
• makeblastdb.exe
• omssacl.exe
• usermods.xml
OMSSA
Before running OMSSA, the database file must be converted to a BLAST-like format.
So let’s run makeblastdb.exe to create a hash-indexed database.
OMSSA
Here 2 different settings are used.
• The first one uses a product ion tolerance of 0.05
• The second one uses the default product ion tolerance
• For variable modifications (-mv), check usermods.xml
X!Tandem
• Unzip folder and check
• Mgf-formatted spectra (file)
• Database file (FASTA)
• tandem-win32-10-12-01-1 folder
• Used .xml configuration files (default_input.xml, input.xml and taxonomy.xml)
• To get the same output as given in the zip folder:
– Replace the configuration files in the «tandem-win\bin» folder with the ones in the «used» folder.
– Also copy the database file to the «fasta» folder and the .mgf file to the «bin» folder in «tandem-win»
X!Tandem Console Application
MBG404 Overview
• Data: Generation, Processing, Storage, Mining, Pipelining
X!Tandem Default Input
Parameters such as mass tolerances, enzyme type, and the number of charges considered in the search can be changed in default_input.xml
X!Tandem Input.xml
In the input.xml file, you should specify the path of:
• taxonomy.xml
• default_input.xml
• Spectra filename
• Output filename
NOTE: Here input.xml and all files above are in the same folder (directory).
X!Tandem Taxonomy
In the taxonomy file, you should specify the «database file path». In this example, the database file is in the «fasta» folder inside the «Xtandem\tandem-win32-10-12-01-1» folder.
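The two small configuration files described above can be sketched as follows. This Python snippet builds them with the standard library; the parameter labels follow the usual X!Tandem naming as far as I recall it and should be checked against the files shipped in the zip folder. The taxon label mydb and the file names spectra.mgf, output.xml and database.fasta are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

def note(parent, label, text):
    """Append one <note type="input" label="..."> parameter element."""
    element = ET.SubElement(parent, "note", {"type": "input", "label": label})
    element.text = text
    return element

# input.xml: points X!Tandem at the other files (all in the same folder here).
input_root = ET.Element("bioml")
note(input_root, "list path, default parameters", "default_input.xml")
note(input_root, "list path, taxonomy information", "taxonomy.xml")
note(input_root, "protein, taxon", "mydb")          # hypothetical taxon label
note(input_root, "spectrum, path", "spectra.mgf")   # hypothetical file name
note(input_root, "output, path", "output.xml")      # hypothetical file name

# taxonomy.xml: maps the taxon label to the FASTA database file path.
tax_root = ET.Element("bioml", {"label": "x! taxon-to-file matching list"})
taxon = ET.SubElement(tax_root, "taxon", {"label": "mydb"})
ET.SubElement(taxon, "file",
              {"format": "peptide", "URL": "../fasta/database.fasta"})

input_xml = ET.tostring(input_root, encoding="unicode")
taxonomy_xml = ET.tostring(tax_root, encoding="unicode")
print(input_xml)
print(taxonomy_xml)
```

The taxon label in input.xml must match the label in taxonomy.xml, since that is how the search finds the database file.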
X!Tandem Output
Console Applications
Why
HTML
• What you need to know about HyperText Markup Language
• How to reach it
– Right-click the document in your browser
– Make sure you do not click on an image, link or some other non-HTML element
– Choose View Source or View Page Source.
• What’s in the source
• Sometimes things are not visible/accessible on the web page but can be retrieved from the source
HTML Structure
<HTML>
<HEAD>
<TITLE>Page title seen in the title bar</TITLE>
<!-- Some other links and scripts can be here -->
</HEAD>
<BODY>
Text and other visible elements go here
</BODY>
</HTML>
HTML Input
<FORM action="destination" method="POST/GET">
<INPUT type="TYPES" name="" id="" value="" />
<TEXTAREA name="" id="">value</TEXTAREA>
<SELECT name="" id="">
<OPTION value="">display</OPTION>
</SELECT>
</FORM>
TYPES: { text, password, checkbox, radio, submit, reset, file, hidden, image, button}
Why?
• Why do you need this information?
• Some information may be inaccessible on the website
• In the HTML code it will be accessible
• Sometimes you may be interested in all settings for the programs that you used online
• Often these settings are in hidden input fields (you need to check the source then)
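Hidden input fields can also be pulled out of a page source programmatically instead of reading it by eye. The sketch below uses Python's standard html.parser module; the form and its field names (word_size, matrix) are made up for illustration and are not the actual fields of any particular website.

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; values are kept as-is.
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if tag == "input" and input_type == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")

# Made-up page source with two hidden settings.
page = """
<html><body>
<form action="search" method="POST">
  <input type="text" name="query" value="" />
  <input type="hidden" name="word_size" value="3" />
  <input type="HIDDEN" name="matrix" value="BLOSUM62" />
  <input type="submit" value="Run" />
</form>
</body></html>
"""

collector = HiddenInputCollector()
collector.feed(page)
print(collector.hidden)
```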
NCBI Blast
• Contains many hidden variables; here are some:
Theory I
Database Management Systems
Database Management Systems
Company/Organization | Database Size (GB) | DBMS | System Arch. | DBMS Vendor | System Vendor | Storage Vendor
France Telecom | 29,232 | Oracle | SMP | Oracle | HP | HP
AT&T | 26,269 | Daytona | SMP | AT&T | Sun | Sun
SBC | 24,805 | Teradata | MPP | Teradata | NCR | LSI
Anonymous | 16,191 | DB2 for Unix | MPP/Cluster | IBM | IBM | IBM
Amazon.com | 13,001 | Oracle | SMP | Oracle | HP | HP
Kmart | 12,592 | Teradata | MPP | Teradata | NCR | LSI
Claria Corporation | 12,100 | Oracle | SMP | Oracle | Sun | Hitachi
Health Insurance Review Agency | 11,942 | Sybase IQ | Cluster | Sybase | HP | Hitachi
FedEx Services | 9,981 | Teradata | MPP | Teradata | NCR | EMC
Vodafone D2 GmbH | 9,108 | Teradata | MPP | Teradata | NCR | LSI
Database Management Systems
Diagram: Users → Views (View 1, View 2, View 3) → Conceptual Schema → Physical Schema → DB
Database Management Systems
A Relation is a Table
• Attributes (column headers)
• Tuples (rows)
• Contains data -> Instance
• Domain: all possible values

Beers
name | manf
Winterbrew | Pete’s
Bud Lite | Anheuser-Busch
Schemas
• Relation schema = relation name and attribute list.
– Optionally: types of attributes.
– Example: Beers(name, manf) or Beers(name: string, manf: string)
• Database = collection of relations.
• Database schema = set of all relation schemas in the database.
• Instance of a relation = a table in a database with data
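The difference between a schema and an instance can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than a particular DBMS from the lecture, and the TEXT column types are an assumption matching the "string" types in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: relation name plus attribute list (with optional types).
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")

# Instance: the tuples currently stored in the relation.
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Winterbrew", "Pete's"), ("Bud Lite", "Anheuser-Busch")],
)

cursor = conn.execute("SELECT * FROM Beers ORDER BY rowid")
attributes = [column[0] for column in cursor.description]  # from the schema
instance = cursor.fetchall()                                # from the data

print("schema:  Beers(" + ", ".join(attributes) + ")")
print("instance:", instance)
```

Inserting or deleting tuples changes the instance but leaves the schema untouched.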
Anomalies
• Goal of relational schema design is to avoid anomalies and redundancy.
– Update anomaly: one occurrence of a fact is changed, but not all occurrences.
– Deletion anomaly: valid fact is lost when a tuple is deleted.
Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)
name | addr | beersLiked | manf | favBeer
Janeway | Voyager | Bud | A.B. | WickedAle
Janeway | ??? | WickedAle | Pete’s | ???
Spock | Enterprise | Bud | ??? | Bud
Data is redundant, because each of the ???’s can be easily figured out.
This Bad Design Also Exhibits Anomalies
name | addr | beersLiked | manf | favBeer
Janeway | Voyager | Bud | A.B. | WickedAle
Janeway | Voyager | WickedAle | Pete’s | WickedAle
Spock | Enterprise | Bud | A.B. | Bud
• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
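Both anomalies can be reproduced on the example table. The sketch below loads the three tuples into an in-memory SQLite database via Python's sqlite3 module (the choice of SQLite is an assumption made only for illustration) and then performs a careless update and a deletion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Drinkers "
    "(name TEXT, addr TEXT, beersLiked TEXT, manf TEXT, favBeer TEXT)"
)
conn.executemany("INSERT INTO Drinkers VALUES (?, ?, ?, ?, ?)", [
    ("Janeway", "Voyager", "Bud", "A.B.", "WickedAle"),
    ("Janeway", "Voyager", "WickedAle", "Pete's", "WickedAle"),
    ("Spock", "Enterprise", "Bud", "A.B.", "Bud"),
])

# Update anomaly: the transfer is recorded in only one of Janeway's tuples.
conn.execute(
    "UPDATE Drinkers SET addr = 'Intrepid' "
    "WHERE name = 'Janeway' AND beersLiked = 'Bud'"
)
janeway_addrs = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT addr FROM Drinkers WHERE name = 'Janeway'"))
print(janeway_addrs)  # two conflicting addresses for one person

# Deletion anomaly: once nobody likes Bud, its manufacturer is gone as well.
conn.execute("DELETE FROM Drinkers WHERE beersLiked = 'Bud'")
bud_manf = conn.execute(
    "SELECT manf FROM Drinkers WHERE beersLiked = 'Bud'").fetchall()
print(bud_manf)  # empty: the fact "A.B. makes Bud" is lost
```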
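1st Normal Form
• All attributes need to be atomic
2nd Normal Form
• Must be in 1st NF
• A key must uniquely identify each tuple, and every non-key attribute must depend on the whole key, not just on part of it
3rd Normal Form
• Must be in 2nd NF
• Attributes that are not part of a key must depend directly on a key, not on other non-key attributes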
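To see what normalization buys, the Drinkers example can be split into smaller relations. The decomposition below (Drinkers, Beers, Likes) is one possible design and is not taken from the slides; it is sketched with Python's sqlite3 module. Each fact is now stored once, and a join recovers the original wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drinkers (name TEXT PRIMARY KEY, addr TEXT, favBeer TEXT);
CREATE TABLE Beers    (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Likes    (drinker TEXT, beer TEXT, PRIMARY KEY (drinker, beer));

INSERT INTO Drinkers VALUES ('Janeway', 'Voyager', 'WickedAle');
INSERT INTO Drinkers VALUES ('Spock', 'Enterprise', 'Bud');
INSERT INTO Beers VALUES ('Bud', 'A.B.');
INSERT INTO Beers VALUES ('WickedAle', 'Pete''s');
INSERT INTO Likes VALUES ('Janeway', 'Bud');
INSERT INTO Likes VALUES ('Janeway', 'WickedAle');
INSERT INTO Likes VALUES ('Spock', 'Bud');
""")

# A join rebuilds the rows of the original single-table design.
joined = conn.execute("""
    SELECT d.name, d.addr, l.beer, b.manf, d.favBeer
    FROM Drinkers d
    JOIN Likes l ON l.drinker = d.name
    JOIN Beers b ON b.name = l.beer
    ORDER BY d.name, l.beer
""").fetchall()

# The address is stored once, so a single-row update cannot leave stale copies.
conn.execute("UPDATE Drinkers SET addr = 'Intrepid' WHERE name = 'Janeway'")
janeway_addrs = conn.execute(
    "SELECT addr FROM Drinkers WHERE name = 'Janeway'").fetchall()

# Deleting every 'likes Bud' fact keeps the manufacturer of Bud on record.
conn.execute("DELETE FROM Likes WHERE beer = 'Bud'")
bud_manf = conn.execute(
    "SELECT manf FROM Beers WHERE name = 'Bud'").fetchone()[0]

print(joined)
print(janeway_addrs, bud_manf)
```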
One-One Relationships
• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.
• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
– A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).
Many-One Relationships
• Some binary relationships are many-one from one entity set to another.
• Each entity of the first set is connected to at most one entity of the second set.
• But an entity of the second set can be connected to zero, one, or many entities of the first set.
Many-Many Relationships
• Focus: binary relationships, such as Sells between Bars and Beers.
• In a many-many relationship, an entity of either set can be connected to many entities of the other set.
– E.g., a bar sells many beers; a beer is sold by many bars.
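The three kinds of relationship map onto tables in a fairly standard way, sketched here with Python's sqlite3 module. The column names manf and bestSellerOf, the bar names, and the specific constraints are assumptions chosen for illustration: a foreign key column gives many-one, a UNIQUE foreign key gives one-one, and a separate table with a composite key gives many-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Manfs (name TEXT PRIMARY KEY);
CREATE TABLE Beers (
    name TEXT PRIMARY KEY,
    manf TEXT REFERENCES Manfs(name),                -- many-one: one maker per beer
    bestSellerOf TEXT UNIQUE REFERENCES Manfs(name)  -- one-one: Best-seller
);
CREATE TABLE Bars (name TEXT PRIMARY KEY);
CREATE TABLE Sells (                                 -- many-many: Sells
    bar  TEXT REFERENCES Bars(name),
    beer TEXT REFERENCES Beers(name),
    PRIMARY KEY (bar, beer)
);
""")
conn.executemany("INSERT INTO Manfs VALUES (?)",
                 [("Anheuser-Busch",), ("Pete's",)])
conn.executemany("INSERT INTO Beers VALUES (?, ?, ?)", [
    ("Bud", "Anheuser-Busch", "Anheuser-Busch"),
    ("Bud Lite", "Anheuser-Busch", None),
    ("WickedAle", "Pete's", "Pete's"),
])
conn.executemany("INSERT INTO Bars VALUES (?)", [("Joe's",), ("Sue's",)])
conn.executemany("INSERT INTO Sells VALUES (?, ?)", [
    ("Joe's", "Bud"), ("Joe's", "WickedAle"), ("Sue's", "Bud"),
])

# A second best-seller for the same manufacturer violates the UNIQUE constraint.
try:
    conn.execute("UPDATE Beers SET bestSellerOf = 'Anheuser-Busch' "
                 "WHERE name = 'Bud Lite'")
    second_best_seller_allowed = True
except sqlite3.IntegrityError:
    second_best_seller_allowed = False

beers_at_joes = [row[0] for row in conn.execute(
    "SELECT beer FROM Sells WHERE bar = ? ORDER BY beer", ("Joe's",))]
bars_selling_bud = [row[0] for row in conn.execute(
    "SELECT bar FROM Sells WHERE beer = ? ORDER BY bar", ("Bud",))]
print(second_best_seller_allowed, beers_at_joes, bars_selling_bud)
```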
End Theory I
• 5 min mindmapping
• 10 min break
Practice I
MS Access
• Create new Tables:
– Plant
– Features
– FeatureTypes
Create a Table
Create a Table
Edit a Table
Create the Three Tables
• Plant
• Features
• FeatureTypes
Add Attributes
• Plant
– ID
– Gender
– Species
– Strain
– Clone
Add Attributes
• Features
– ID
– FeatureType
– Value
Add Attributes
• FeatureTypes
– ID
– Type
– Unit
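For comparison with the Access design view, the same three tables can be written down as SQL. The sketch below uses Python's sqlite3 module instead of MS Access; all column types are assumptions, as is the link from Features.FeatureType to FeatureTypes.ID. How a feature is tied to a plant is not given on the slides, so it is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Plant (
    ID      INTEGER PRIMARY KEY,
    Gender  TEXT,
    Species TEXT,
    Strain  TEXT,
    Clone   TEXT
);
CREATE TABLE FeatureTypes (
    ID   INTEGER PRIMARY KEY,
    Type TEXT,
    Unit TEXT
);
CREATE TABLE Features (
    ID          INTEGER PRIMARY KEY,
    FeatureType INTEGER REFERENCES FeatureTypes(ID),  -- assumed link
    Value       TEXT
);
""")

# List the tables and the attributes of Plant to check the result.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
plant_columns = [row[1] for row in conn.execute("PRAGMA table_info(Plant)")]
print(tables)
print(plant_columns)
```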
Table Space
Notice
More Editing
More Editing
Notice
Fill with Data
• Import the data in the plants.csv file
Select the appropriate table
Some adjustments are needed here
Need to name the columns appropriately
Insert Data
• Import Feature table
• Import features txt file
Real Data
• Download GO Terms:
– http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz
• Change the file extension to .xml so that Access can import it
• Import the file into Access
– May take a short while
– Errors will occur (we ignore them for now)
• Have a look at the tables
• Analyze the relationships (were they imported?)
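Preparing the downloaded file can be scripted. The following Python sketch assumes the .gz archive has to be decompressed before the extension is changed to .xml; the helper name gunzip_to_xml is hypothetical, and the demo works on a tiny stand-in file in a temporary folder instead of the real download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gunzip_to_xml(gz_path):
    """Decompress a .gz file and give the result an .xml extension."""
    gz_path = Path(gz_path)
    # Drop ".gz", then swap the remaining extension for ".xml".
    xml_path = gz_path.with_suffix("").with_suffix(".xml")
    with gzip.open(gz_path, "rb") as source, open(xml_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return xml_path

# Stand-in for the real download: a tiny gzip file with made-up content.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "go_daily-termdb.obo-xml.gz"
with gzip.open(sample, "wb") as handle:
    handle.write(b"<obo><term><id>GO:0000001</id></term></obo>")

xml_file = gunzip_to_xml(sample)
print(xml_file.name)
```

Copying in chunks with shutil.copyfileobj keeps memory use low, which matters because the real file is large.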
End