Applied Bioinformatics Course Overview & Introduction to Linux Bing Zhang Department of Biomedical Informatics Vanderbilt University [email protected]


Applied Bioinformatics

Course Overview & Introduction to Linux

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

[email protected]

What is bioinformatics?

[Figure: bioinformatics links biology (hypotheses, questions, samples, experiments) to informatics (storage/retrieval, visualization, computational methods, statistical methods) through data (DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction)]

Why now?

[Figure: the bio/data/informatics diagram (hypotheses, questions, samples, experiments; DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction; storage/retrieval, visualization, computational methods, statistical methods), with the informatics component labeled]

Roles for different investigators in bioinformatics

Algorithm developers: statisticians, mathematicians, computer scientists

Tool developers: bioinformaticians

Data providers/consumers: biologists

Graph courtesy of http://www.incogen.com/

Comprehensive resource list

http://bioinformatics.ca/links_directory/ (as of March 2015): 174 resources, 623 databases, 1548 tools

Sequence and structure databases

GenBank: http://www.ncbi.nlm.nih.gov/genbank/

Annotated collection of all publicly available DNA sequences

126,551,501,141 bases in 135,440,924 sequences as of April 2011

UniProt: http://www.uniprot.org/

Comprehensive resource for protein sequences and functional information

534,242 reviewed entries as of January 2012

PDB: http://www.rcsb.org/

3D structures of large biological molecules, including proteins and nucleic acids

79,180 structures as of February 2012

Pfam: http://pfam.sanger.ac.uk/

Collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)

13,672 families as of November 2011

Genome browsers

UCSC genome browser http://genome.ucsc.edu/cgi-bin/hgGateway

Ensembl genome browser http://www.ensembl.org/index.html

Gene-centric databases

Entrez Gene http://www.ncbi.nlm.nih.gov/gene

NCBI/NIH

All completely sequenced genomes

One gene per page

Ensembl BioMart http://www.ensembl.org/biomart/martview

EMBL-EBI and Sanger Institute

Vertebrates and other selected eukaryotic species

Batch information retrieval


Gene expression data

Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/

ArrayExpress http://www.ebi.ac.uk/arrayexpress/


Pathway and network resources

Gene Ontology (GO): http://www.geneontology.org/

Pathway databases

KEGG: http://www.genome.jp/kegg/pathway.html

Reactome: http://www.reactome.org/

WikiPathways: http://www.wikipathways.org/

Protein-protein interaction databases

DIP: http://dip.doe-mbi.ucla.edu/

MINT: http://mint.bio.uniroma2.it/mint/

BioGRID: http://www.thebiogrid.org/

HPRD: http://www.hprd.org

Protein-DNA interaction database

Transfac: http://www.gene-regulation.com

Course content and grades


Course materials and report submission

Lecture slides available at https://sites.google.com/site/vanderbiltigp2014/bioregulation-ii/minimester-3/applied-bioinformatics

Project reports are due at 5pm on the due date (4/13, 4/22, 5/1). There will be a 10% deduction per day for late reports. Report 1 should be sent to Dr. Zhang; Reports 2 and 3 should be sent to Dr. Liu (see email addresses below).

Instructor contact information

Dr. Bing Zhang: [email protected]

Dr. Qi Liu: [email protected]

ACCRE

Advanced Computing Center for Research & Education http://www.accre.vanderbilt.edu/

The compute cluster currently consists of more than 500 Linux systems with quad- or hex-core processors

Linux system

An operating system (OS), like Windows or Mac OS

Portable, multi-tasking, multi-user OS

High-performance and free, making it ideal for high-performance computing clusters

Proper use of ACCRE

Information in the ACCRE cluster group igp300b_ab must not contain data, information, technology, images, or software controlled under the Federal Export Administration Regulations (EAR) or the International Traffic in Arms Regulations (ITAR), Patient Health Information (PHI), or Research Health Information (RHI); information stored there is not considered proprietary.

Get an ACCRE account

http://www.accre.vanderbilt.edu/?page_id=617

Registration form Name, VUNetID, Department (VU), School (VU), Email, Phone, Position

Group: IGP300b_ab (igp300b_ab)

Primary research area: bioinformatics

Primary application: Existing Application

Primary application name: R

Primary application type: Serial

Expected typical number of processors: NA

Expected typical number of concurrent running jobs: 1

Linux experience:

Expected compilers/languages: C, C++, R, perl, python

Expected external libraries: NA

BlueArc User: No

Other useful information: NA

Logging on to the cluster and changing your password

Windows

Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)

Two steps: edit profile -> save profile

Host: vmplogin.accre.vanderbilt.edu

Username: your_user_name

Mac

Use Spotlight to find the application: Terminal

Command: ssh [email protected]

Change password

rsh auth

passwd

Exit

exit

Logging on to the cluster and changing your password (using Bitvise SSH in Windows)

Logging on to the cluster and changing your password (using Terminal in Mac)

You won’t see any response while typing your password, which is fine.

Hierarchical file system

[Figure: directory tree rooted at /, with top-level directories such as bin, usr, home, scratch, etc, and tmp; example paths: /home, /home/igptest, /home/igptest/src/prog3.cpp]

Working with directories

pwd (print your present working directory)

ls (list directory contents)

mkdir (make a directory)

cd (change directory)

cd .. (parent directory)

cd . (current directory)

cd ~ or cd with no parameter (home directory)

rmdir (remove an empty directory)
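The directory commands above can be tried safely in a short session like the one below. This is a sketch, not part of the course materials: it works inside a temporary directory created with mktemp so nothing in your real home directory changes, and the directory name projects is only an example.

```shell
#!/usr/bin/env bash
# Practice session for the directory commands, run in a throwaway directory.
workdir=$(mktemp -d)      # temporary scratch directory (assumption, not on the slide)
cd "$workdir" || exit 1

pwd                       # print the present working directory
mkdir projects            # make a directory named "projects"
ls                        # list directory contents: shows "projects"
cd projects               # change into the new directory
pwd                       # the path now ends in /projects
cd ..                     # ".." is the parent directory
cd .                      # "." is the current directory, so nothing changes
rmdir projects            # remove the directory (it must be empty)
ls                        # prints nothing: the directory is gone
cd                        # no parameter: return to the home directory
```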

Absolute and relative paths

Absolute path: a file or directory location relative to the root of the file system; always begins with a /

Relative path: a file or directory location relative to where you currently are in the file system; does not begin with a /

[Figure: examples of an absolute path and a relative path]
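The difference between the two kinds of path shows up as soon as you change directory. This sketch rebuilds the /home/igptest/src/prog3.cpp layout under a temporary directory ($root, an assumption made because the real igptest account may not exist on your machine) and reaches the same file by a relative and by an absolute path.

```shell
#!/usr/bin/env bash
# Rebuild a small copy of the example layout under a temporary root.
root=$(mktemp -d)
mkdir -p "$root/home/igptest/src"
touch "$root/home/igptest/src/prog3.cpp"

cd "$root/home/igptest" || exit 1

# Relative path: no leading /, interpreted from the current directory
ls src/prog3.cpp

# Absolute path: starts with /, works from any current directory
abs="$root/home/igptest/src/prog3.cpp"
ls "$abs"

# After moving elsewhere, the relative path no longer resolves,
# but the absolute path still does.
cd "$root" || exit 1
ls src/prog3.cpp 2>/dev/null || echo "src/prog3.cpp: not found relative to $PWD"
ls "$abs"
```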

Working with files

more (display the contents of a file): press the space bar to show the next page

q to exit

cp (copy a file)

mv (rename/move a file)

rm (remove a file)
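A minimal practice run of the file commands, assuming a throwaway directory and a made-up file name (notes.txt): it creates a file, copies it, renames the copy, removes it, and then displays the original with more.

```shell
#!/usr/bin/env bash
# File commands in a throwaway directory; notes.txt is a hypothetical name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line %s\n' 1 2 3 > notes.txt   # create a small three-line file
cp notes.txt notes_copy.txt            # copy: both files now exist
mv notes_copy.txt notes_old.txt        # rename (move) the copy
rm notes_old.txt                       # remove the renamed copy
ls                                     # only notes.txt is left
more notes.txt                         # display it; in a terminal, q exits
```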

Getting help

man (display manual pages for a command), e.g. man ls (display the manual for the ls command)

space bar to show next page

q to exit

Alternatives of ls

ls -a (do not ignore entries starting with .)

ls -l (use a long listing format)

ls -al (use a long listing format and do not ignore entries starting with .)
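Hidden files are the reason ls -a exists: any name beginning with . is left out of a plain ls listing. The sketch below shows this with two hypothetical files in a temporary directory; .bashrc in your home directory is hidden in the same way.

```shell
#!/usr/bin/env bash
# Compare ls and ls -a in a temporary directory with hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch visible.txt .hidden_config

plain=$(ls)        # lists only visible.txt
all=$(ls -a)       # also lists ., .. and .hidden_config
echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"
ls -al             # long listing: permissions, owner, size, date
```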

Editing files with nano

cd ~ (change to home directory)

nano .bashrc (use nano to edit the file .bashrc, which contains commands that are executed when a new shell starts).

Add “setpkgs -a R” to the end of the file (this will allow you to use the R environment, which has been installed on the ACCRE system for statistical computing).

A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
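If you prefer not to open an editor, the same line can be appended from the command line. The sketch below is an alternative to the nano step, not the course's prescribed method: add_line_once is a hypothetical helper that appends a line only if the file does not already contain it, so running it twice does not duplicate the entry. It is demonstrated on a scratch file so your real .bashrc is untouched.

```shell
#!/usr/bin/env bash
# add_line_once FILE LINE: append LINE to FILE unless FILE already contains
# exactly that line. (Hypothetical helper for this sketch, not an ACCRE tool.)
add_line_once() {
    local file="$1" line="$2"
    touch "$file" || return 1                  # create the file if missing
    # If the file does not end with a newline, add one first so that
    # LINE starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    if ! grep -qxF -- "$line" "$file"; then    # -x whole line, -F literal text
        printf '%s\n' "$line" >> "$file"
    fi
}

# Dry run on a scratch file; on the cluster you would instead run
#   add_line_once ~/.bashrc 'setpkgs -a R'
demo=$(mktemp)
add_line_once "$demo" 'setpkgs -a R'
add_line_once "$demo" 'setpkgs -a R'           # second call changes nothing
cat "$demo"
```

Changes to .bashrc take effect in shells started afterwards; running source ~/.bashrc applies them to the current shell.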

Copying files to/from a local computer

Windows Application: Bitvise SSH

(https://www.bitvise.com/ssh-client-download)

Mac Application: Cyberduck

(https://it.vanderbilt.edu/software/downloads.php)

Connect to: vmplogin.accre.vanderbilt.edu

Username: your_user_name

Don’t change other items


Copying files to/from a local computer (using Bitvise SFTP in Windows)


Copying files to/from a local computer (using Fugu in Mac)


Summary


Command Meaning

rsh <hostname> Remote shell

passwd Modify a user’s password

exit Exit the shell

pwd Display the path of the current directory

ls List files and directories

ls -a List all files and directories

ls -al List all files and directories in a long listing format

mkdir <directory name> Make a directory

cd <directory name> Change to named directory

cd Change to home directory

cd ~ Change to home directory

cd .. Change to parent directory

rmdir <directory name> Remove an empty directory

more <file name> View the contents of a file

cp <file1> <file2> Copy file1 and name the copied file file2

mv <file1> <file2> Move or rename file1 to file2

rm <file name> Remove a file

man <command> Display manual pages for a command

nano <file name> Use the nano text editor to view and edit a file

Exercise

Create a directory named “test” under your home directory

Copy the file sample_file.txt under directory /home/igptest to your test directory

Make a copy of the file, sample_file_1.txt

View and modify the file sample_file_1.txt using nano, correct the typo (Warld -> World)

Copy the file to your desktop

Copy a file from your desktop to your test directory

Add “setpkgs -a R” to the end of your .bashrc file

Go through the required sections of the following tutorial before the next class: http://ryanstutorials.net/linuxtutorial/

Required sections: 1, 2, 3, 4, 5, 9, 11

Optional sections: 8, 12

Advanced sections: 6, 7, 10, 13
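One possible command-line version of the first four exercise steps is sketched below, with sed standing in for nano to fix the typo. do_exercise is a hypothetical helper, and the dry run uses a fake source directory whose contents (Hello Warld) are invented for the demonstration; the real sample_file.txt may differ. The desktop copy steps still need Bitvise or Cyberduck, so they are not covered.

```shell
#!/usr/bin/env bash
# do_exercise SRC_DIR TEST_DIR: exercise steps 1-4 from the command line.
# Hypothetical helper; on the cluster you would call
#   do_exercise /home/igptest ~/test
do_exercise() {
    local src_dir="$1" test_dir="$2"
    mkdir -p "$test_dir" || return 1                               # step 1
    cp "$src_dir/sample_file.txt" "$test_dir/" || return 1         # step 2
    cp "$test_dir/sample_file.txt" "$test_dir/sample_file_1.txt"   # step 3
    # Step 4 asks for nano; sed applies the same Warld -> World fix
    # non-interactively, writing to a temporary file so it works with any sed.
    sed 's/Warld/World/g' "$test_dir/sample_file_1.txt" > "$test_dir/sample_file_1.tmp" &&
        mv "$test_dir/sample_file_1.tmp" "$test_dir/sample_file_1.txt"
}

# Dry run against a fake source directory (contents made up for the demo).
fake_src=$(mktemp -d)
printf 'Hello Warld\n' > "$fake_src/sample_file.txt"
fake_test="$(mktemp -d)/test"
do_exercise "$fake_src" "$fake_test"
cat "$fake_test/sample_file_1.txt"     # prints: Hello World
```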
