Applied Bioinformatics
Course Overview & Introduction to Linux
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
What is bioinformatics?
[Figure: "Bio" and "informatics" joined by data to form bioinformatics]
Bio: hypotheses, questions, samples, experiments
Data: DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction
Informatics: storage/retrieval, visualization, computational methods, statistical methods
Why now?
[Figure: the diagram from the previous slide, repeated with the label "informatics"]
Roles for different investigators in bioinformatics
Algorithm developer: statisticians, mathematicians, computer scientists
Tool developer: bioinformaticians
Data provider/consumer: biologists
Graph courtesy of http://www.incogen.com/
Comprehensive resource list
As of March 2015: 174 resources, 623 databases, 1548 tools
http://bioinformatics.ca/links_directory/
Sequence and structure databases
GenBank: http://www.ncbi.nlm.nih.gov/genbank/
Annotated collection of all publicly available DNA sequences
126,551,501,141 bases in 135,440,924 sequences as of April 2011
UniProt: http://www.uniprot.org/
Comprehensive resource for protein sequences and functional information
534,242 reviewed entries as of January 2012
PDB: http://www.rcsb.org/
3D structures of large biological molecules, including proteins and nucleic acids
79,180 structures as of February 2012
Pfam: http://pfam.sanger.ac.uk/
Collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
13,672 families as of November 2011
Genome browsers
UCSC genome browser http://genome.ucsc.edu/cgi-bin/hgGateway
Ensembl genome browser http://www.ensembl.org/index.html
Gene-centric databases
Entrez Gene http://www.ncbi.nlm.nih.gov/gene
NCBI/NIH
All completely sequenced genomes
One gene per page
Ensembl BioMart http://www.ensembl.org/biomart/martview
EMBL-EBI and Sanger Institute
Vertebrates and other selected eukaryotic species
Batch information retrieval
Gene expression data
Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/
ArrayExpress http://www.ebi.ac.uk/arrayexpress/
Pathway and network resources
Gene Ontology (GO): http://www.geneontology.org/
Pathway databases
KEGG: http://www.genome.jp/kegg/pathway.html
Reactome: http://www.reactome.org/
WikiPathways: http://www.wikipathways.org/
Protein-protein interaction databases
DIP: http://dip.doe-mbi.ucla.edu/
MINT: http://mint.bio.uniroma2.it/mint/
BioGRID: http://www.thebiogrid.org/
HPRD: http://www.hprd.org
Protein-DNA interaction database
Transfac: http://www.gene-regulation.com
Course materials and report submission
Lecture slides are available at https://sites.google.com/site/vanderbiltigp2014/bioregulation-ii/minimester-3/applied-bioinformatics
Project reports are due at 5pm on the due date (4/13, 4/22, 5/1). There will be a 10% per day deduction for late reports. Report 1 should be sent to Dr. Zhang, Reports 2 and 3 should be sent to Dr. Liu (see email addresses below).
Instructor contact information
Dr. Bing Zhang: [email protected]
Dr. Qi Liu: [email protected]
ACCRE
Advanced Computing Center for Research & Education http://www.accre.vanderbilt.edu/
The compute cluster currently consists of more than 500 Linux systems with quad- or hex-core processors
Linux system
An operating system (OS), like Windows or Mac OS
Portable, multi-tasking, multi-user OS
High-performance and free, making it ideal for high-performance computing clusters
Proper use of ACCRE
Information in the ACCRE cluster group igp300b_ab may not contain data, information, technology, images, or software that is controlled under the Federal Export Administration Regulations (EAR) or the International Traffic in Arms Regulations (ITAR), or that qualifies as Patient Health Information (PHI) or Research Health Information (RHI). Information in this group is not considered proprietary.
Get an ACCRE account
http://www.accre.vanderbilt.edu/?page_id=617
Registration form
Name, VUNetID, Department (VU), School (VU), Email, Phone, Position
Group: IGP300b_ab (igp300b_ab)
Primary research area: bioinformatics
Primary application: Existing Application
Primary application name: R
Primary application type: Serial
Expected typical number of processors: NA
Expected typical number of concurrent running jobs: 1
Linux experience:
Expected compilers/languages: C, C++, R, perl, python
Expected external libraries: NA
BlueArc User: No
Other useful information: NA
Logging onto the cluster and changing your password
Windows
Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Two steps: edit profile -> save profile
Host: vmplogin.accre.vanderbilt.edu
Username: your_user_name
Mac
Use Spotlight to find the application: Terminal
Command: ssh [email protected]
Change password
rsh auth
passwd
Exit
exit
Logging onto the cluster and changing your password (using Terminal on Mac)
You won’t see any response while typing your password; this is normal.
Hierarchical file system
/
    bin
        chmod, cp, date, grep, mv, rm, vi
    usr
        bin
            diff, find, gcc, id, make, perl, ssh
        lib
            libc.so, libgpfs.so, libjpeg.so, libstdc++.so
    home
        igptest
            bin
                myprog.sh, dothis.pl, dothat.py
            docs
            src
                prog1.c, prog2.f77, prog3.cpp
        annie
        cody
    scratch
    etc
    tmp
Example paths:
/home
/home/igptest
/home/igptest/src/prog3.cpp
Working with directories
pwd (print your present working directory)
ls (list directory contents)
mkdir (make a directory)
cd (change directory)
    .. (parent directory)
    . (current directory)
    ~ or no parameter (home directory)
rmdir (remove an empty directory)
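The directory commands above can be tried safely in a throwaway location. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the names workdir and project are placeholders chosen for illustration and do not come from the course materials.

```shell
#!/bin/sh
# Work inside a temporary directory so nothing in your real home is touched.
# "workdir" and "project" are placeholder names for this sketch.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up when the script ends

cd "$workdir"          # change into the temporary directory
mkdir project          # make a new directory
cd project             # move into it
pwd                    # print the present working directory
inside=$(pwd)          # keep it for later inspection

cd ..                  # go up to the parent directory
cd .                   # "." is the current directory, so nothing changes
ls                     # list contents: shows "project"
rmdir project          # succeeds because project is empty
after=$(ls)            # nothing is left to list

cd ~                   # "cd ~" and plain "cd" both return to your home directory
```

Note that rmdir refuses to delete a directory that still contains files, which makes it a safe way to tidy up.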
Absolute and relative paths
Absolute path: a file or directory location relative to the root of the file system; always begins with a /
Relative path: a file or directory location relative to where you currently are in the file system; never begins with a /
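To make the difference concrete, here is a small sketch that rebuilds the igptest/src layout inside a temporary directory and reaches the same file both ways. The temporary root and the variable names are assumptions made for this illustration; on the cluster the real path would start at /.

```shell
#!/bin/sh
# Build a small stand-in for /home/igptest/src under a temporary root.
# The temporary root is an assumption; it keeps the sketch self-contained.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/home/igptest/src"
echo 'int main() { return 0; }' > "$root/home/igptest/src/prog3.cpp"

# Absolute path: starts with / and works from any current directory.
cd /
abs_listing=$(ls "$root/home/igptest/src/prog3.cpp")

# Relative path: no leading /, resolved against the current directory.
cd "$root/home/igptest"
rel_listing=$(ls src/prog3.cpp)

# The same relative path fails from a different starting point.
cd "$root/home"
if ls src/prog3.cpp >/dev/null 2>&1; then
    moved="found"
else
    moved="not found"
fi

echo "absolute: $abs_listing"
echo "relative: $rel_listing"
echo "relative path tried from $root/home: $moved"
```

The relative path only resolves while the current directory is igptest, which is why scripts that may be run from anywhere usually use absolute paths.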
Working with files
more (display the contents of a file)
    space bar to show next page
    q to exit
cp (copy a file)
mv (rename/move a file)
rm (remove a file)
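A short practice run of the file commands, as a sketch under the assumption that mktemp is available. The file names notes.txt, backup.txt, draft.txt and the archive directory are placeholders invented for the example.

```shell
#!/bin/sh
# Practice in a temporary directory; all file names are placeholders.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "Hello, World" > notes.txt   # create a small file to work with

cp notes.txt backup.txt           # copy: notes.txt and backup.txt both exist
mv notes.txt draft.txt            # rename: notes.txt becomes draft.txt
mkdir archive
mv backup.txt archive/            # move: backup.txt now lives in archive/
rm draft.txt                      # remove: there is no trash to recover from

remaining=$(ls)                   # what is left at the top level
archived=$(ls archive)            # what ended up in archive/
echo "top level: $remaining"
echo "archive:   $archived"

# To page through a file interactively, run by hand:
#   more archive/backup.txt
```

Because rm deletes immediately, copying a file with cp before editing or removing it is a cheap safeguard.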
Getting help
man (display manual pages for a command)
    man ls (display the manual for the ls command)
    space bar to show next page
    q to exit
Variants of ls
    ls -a (do not ignore entries starting with .)
    ls -l (use a long listing format)
    ls -al (use a long listing format and do not ignore entries starting with .)
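The effect of the ls options is easiest to see with one ordinary file and one hidden file side by side. This is a minimal sketch; the names visible.txt and .hidden are made up for the demonstration.

```shell
#!/bin/sh
# Create one ordinary file and one hidden file (name starts with a dot).
# Both file names are placeholders for this sketch.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch visible.txt .hidden

plain=$(ls)          # hidden entries are skipped
all=$(ls -a)         # includes ., .. and .hidden
long=$(ls -l)        # one line per file: permissions, owner, size, date, name
long_all=$(ls -al)   # long format, hidden entries included

echo "--- ls ---";     echo "$plain"
echo "--- ls -a ---";  echo "$all"
echo "--- ls -l ---";  echo "$long"
echo "--- ls -al ---"; echo "$long_all"
```

Files such as .bashrc are hidden in exactly this way, so ls -a is the option to reach for when a configuration file seems to be missing.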
Editing files with nano
cd ~ (change to home directory)
nano .bashrc (use nano to edit the file .bashrc, which contains commands that are executed each time a new shell starts, for example when you log in).
Add “setpkgs -a R” to the end of the file, typing the option with a plain hyphen (this lets you use R, the statistical computing environment installed on the ACCRE system).
A quick tutorial http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
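The same line can also be appended from the command line instead of in nano. The sketch below is one possible way to do it, not the method prescribed by the course: it works on a temporary stand-in file rather than your real .bashrc, and append_once is a hypothetical helper written for this example. It relies on grep, which appears in the file system diagram but is not covered in these slides.

```shell
#!/bin/sh
# Stand-in for ~/.bashrc so the sketch never touches your real startup file.
rcfile=$(mktemp)
trap 'rm -f "$rcfile"' EXIT
echo '# existing settings' > "$rcfile"

line='setpkgs -a R'   # the line the slide asks for, typed with a plain hyphen

# append_once (hypothetical helper): add the line only when it is not
# already present as a whole line, so repeated runs leave a single copy.
append_once() {
    grep -qxF -e "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

append_once "$line" "$rcfile"
append_once "$line" "$rcfile"   # the second call changes nothing

count=$(grep -cxF -e "$line" "$rcfile")
echo "copies of the line: $count"
```

To apply this to the real file you would pass "$HOME/.bashrc" as the second argument; checking first avoids stacking duplicate lines if the step is repeated.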
Copying files to/from a local computer
Windows
Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Mac
Application: Cyberduck (https://it.vanderbilt.edu/software/downloads.php)
Connect to: vmplogin.accre.vanderbilt.edu
Username: your_user_name
Don’t change other items
Summary
Command                   Meaning
rsh <hostname>            Remote shell
passwd                    Modify a user’s password
exit                      Exit the shell
pwd                       Display the path of the current directory
ls                        List files and directories
ls -a                     List all files and directories
ls -al                    List all files and directories in a long listing format
mkdir <directory name>    Make a directory
cd <directory name>       Change to named directory
cd                        Change to home directory
cd ~                      Change to home directory
cd ..                     Change to parent directory
rmdir <directory name>    Remove an empty directory
more <file name>          View the contents of a file
cp <file1> <file2>        Copy file1 and name the copied file file2
mv <file1> <file2>        Move or rename file1 to file2
rm <file name>            Remove a file
man <command>             Display manual pages for a command
nano <file name>          Use the nano text editor to view and edit a file
Exercise
Create a test directory with the name “test” under your home directory
Copy the file sample_file.txt from the directory /home/igptest to your test directory
Make a copy of the file, named sample_file_1.txt
View and modify the file sample_file_1.txt using nano to correct the typo (Warld -> World)
Copy the file to your desktop
Copy a file from your desktop to your test directory
Add “setpkgs -a R” to the end of your .bashrc file
Go through the required sections of the following tutorial before next class: http://ryanstutorials.net/linuxtutorial/
Required sections: 1, 2, 3, 4, 5, 9, 11
Optional sections: 8, 12
Advanced sections: 6, 7, 10, 13