Bioinformatics Ch1. Introduction

阮雪芬 2002, Oct 17 NTUST

www.ntut.edu.tw/~yukijuan/lectures/bioinfo/Oct17.ppt

Outline

A scenario

Life in space and time

Dogmas: central and peripheral

Observable and data archives

Traditional and Current Biology

Traditionally, biology has been an observational science.

Now, biology has been converted into a deductive science.

The Data of Bioinformatics

A very large amount of data

Nucleotide sequence databanks contain 16 × 10^9 bases

The full three-dimensional coordinates of proteins of average length ~400 residues: 16000 entries

Not only are the individual databanks large, but their sizes are increasing at a very high rate.

GenBank

Goals

“Saw life clearly and saw it whole”

To interrelate sequence, three-dimensional structure, interactions, and function of individual proteins, nucleic acids and protein-nucleic acid complexes

Understand integrative aspects of the biology of organisms

Goals

To deduce events in evolutionary history.

To support application to medicine, agriculture and other scientific fields.

A Scenario

Imagine a Crisis (1)

A new biological virus creates an epidemic of fatal disease in humans or animals

Laboratory scientists will isolate its genetic material, a molecule of nucleic acid, and determine the sequence

Computer programs will then take over

Imagine a Crisis (2)

Screening this new genome against a data bank of all known genetic messages

Developing antiviral therapies: viruses contain protein molecules that are suitable targets for drugs that will interfere with viral structure or function

Imagine a Crisis (3)

From the viral DNA sequences

Protein sequence

Computer program
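The step from DNA sequence to protein sequence can be sketched in a few lines of Python. This is only an illustration, not the program the slide refers to: it assumes the standard genetic code, reads a single frame starting at the first base, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are made up for this example, and marking unknown codons with "X" is an assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 0 until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous or unknown bases are marked "X" (an assumption).
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

A real pipeline would also have to locate the genes and choose the reading frame and strand, which this sketch leaves out.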

Imagine a Crisis (4)

From amino acid sequences

Three-dimensional structure

Computer program

Homology Modelling

The data bank will be screened for related proteins of known structure

Structure will be predicted

Computer program
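The screening step can be illustrated with a toy template picker. It is a sketch under strong assumptions: the sequences are already aligned and of equal length, similarity is measured as simple percent identity, and the 30% cutoff is only a rule-of-thumb value chosen for this example. The function names and the protein names in the usage note are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pick_template(query: str, known_structures: dict, threshold: float = 30.0):
    """Return the name of the most similar protein of known structure, or None.

    known_structures maps a protein name to its (pre-aligned) sequence.
    None means no related protein was found above the threshold.
    """
    best_name, best_score = None, threshold
    for name, sequence in known_structures.items():
        if len(sequence) != len(query):
            continue  # this sketch skips entries that are not aligned to the query
        score = percent_identity(query, sequence)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

For example, `pick_template("MKTAYLAK", {"protA": "MKTAYIAK", "protB": "GGGGGGGG"})` returns `"protA"`. A result of `None` corresponds to the case where no related protein of known structure is found.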

Ab initio

No related protein of known structure is found

Ab initio

Predicting the structure

Design Therapeutic Agents

Knowing the viral protein structure

Design therapeutic agents

Life in space and time

In Space

Biosphere

Ecosystem

Darwinian selection or genetic drift

The generation of variants: natural mutation, the recombination of genes in sexual reproduction, direct gene transfer

In Space

Ecosystem

Species

Cell

Nuclei, organelles and cytoskeleton

Molecules

In Time

A history of life: 3.5 billion years

Dogmas: Central and Peripheral

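Central Dogma

In 1957, Crick proposed the central dogma: “DNA makes RNA, RNA makes protein”

The central dogma is largely correct, but some points need revision:

There are many RNA viruses: RNA → DNA → RNA → Protein

Jumping genes

Eukaryotic RNA must undergo splicing

Not only proteins have enzymatic activity; some RNAs do as well

Some genes can produce several RNAs through different transcription start sites or different splicing patterns, and these are translated into proteins with different functions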

Eukaryotic RNA must undergo splicing

Not only proteins have enzymatic activity; some RNAs do as well

First identified in plant virus

Purines and Pyrimidines

The Strands in the Double Helix are Antiparallel

5’ → 3’

3’ ← 5’
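Because the two strands pair base by base (A with T, C with G) and run in opposite directions, the partner strand read 5’ to 3’ is the reverse complement of the given strand. A minimal Python sketch follows; the names `COMPLEMENT` and `reverse_complement` are chosen for this example only.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G (case is preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    # Complement every base, then reverse to account for antiparallel strands.
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Applying the function twice gives back the original strand, which is a quick sanity check on any implementation.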

Paradigm

DNA sequence determines protein sequence

Protein sequence determines protein structure

Protein structure determines protein function

Most of the organized activity of bioinformatics has been focused on the analysis of the data related to these processes

Observable and Data Archives

A Databank

An archive of information

A logical organization, or structure, of that information

Tools to gain access to it
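The three ingredients of a databank can be made concrete with a toy in-memory example. This is a sketch, not the design of any real databank: the `Entry` and `Databank` classes, their fields, and the keyword index are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Logical organization: every record has the same named fields."""
    accession: str
    description: str
    sequence: str


@dataclass
class Databank:
    entries: dict = field(default_factory=dict)        # the archive itself
    keyword_index: dict = field(default_factory=dict)  # index used by the access tools

    def add(self, entry: Entry) -> None:
        """Store an entry and index each word of its description."""
        self.entries[entry.accession] = entry
        for word in entry.description.lower().split():
            self.keyword_index.setdefault(word, set()).add(entry.accession)

    def get(self, accession: str):
        """Access tool: look up one entry by accession, or None if absent."""
        return self.entries.get(accession)

    def search(self, keyword: str) -> list:
        """Access tool: all entries whose description contains the keyword."""
        accessions = self.keyword_index.get(keyword.lower(), set())
        return [self.entries[a] for a in sorted(accessions)]
```

With two hypothetical entries added, for instance `Entry("X0001", "viral coat protein", "MKT")` and `Entry("X0002", "viral polymerase", "MAA")`, the call `search("viral")` returns both, while `get("X0003")` returns `None`.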

A Databank in Molecular Biology

Archival databanks of biological information

DNA and protein sequence

Nucleic acid and protein structure

Databanks of protein expression

A Databank in Molecular Biology

Derived Databanks

Sequence motifs

Mutations and variants in DNA and protein sequences

Classification and relationships

Bibliographic Databanks

Databanks of web sites

Databanks of databanks containing biological information

Links between databanks

The Mechanism of Access to a Databank is the Set of Tools for Answering Questions Such as:

Does the databank contain the information I require?

How can I assemble information from the databank in a useful form?

Indices of databanks are useful in asking “Where can I find some specific piece of information?”

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (1)

Given a sequence or fragment of a sequence

Find sequences in the database that are similar to it

A central problem in bioinformatics
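Query (1) can be illustrated with a very simple similarity search. The sketch below ranks database sequences by the fraction of short subsequences (k-mers) they share with the query. This is a crude stand-in chosen for brevity and is not the alignment-based method used by real search tools. The function names, the choice of k = 3, and the sequence names in the usage note are assumptions.

```python
def kmers(seq: str, k: int = 3) -> set:
    """All overlapping subsequences of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets, from 0.0 to 1.0."""
    kmers_a, kmers_b = kmers(a, k), kmers(b, k)
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def search(query: str, database: dict, k: int = 3, top: int = 3) -> list:
    """Return up to `top` (name, score) pairs, most similar first."""
    scored = [(similarity(query, seq, k), name) for name, seq in database.items()]
    # Sort by descending score; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top]]
```

For a database such as `{"s1": "ACGTACGTGA", "s2": "TTTTTTTTTT"}`, searching with the query `"ACGTACGTGA"` ranks `s1` first with a score of 1.0.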

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (2)

Given a protein structure or fragment

Find protein structures in the database that are similar to it
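One common way to put a number on structural similarity is the root-mean-square deviation (RMSD) between corresponding atoms. The sketch below assumes the two structures have already been superimposed and that atoms are paired in order. A real comparison would first have to find the best superposition and the atom correspondence, which are not shown here. The function name `rmsd` is chosen for this example.

```python
import math


def rmsd(coords_a: list, coords_b: list) -> float:
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Identical coordinate sets give 0.0, and shifting every atom by one unit along a single axis gives 1.0. Smaller values indicate more similar structures.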

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (3)

Given a sequence of a protein of unknown structure

Find structures in the database that adopt similar three-dimensional structures

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (3)

For if two proteins have sufficiently similar sequences

They will have similar structures

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics (4)

Given a protein structure

Find sequences in the data bank that correspond to similar structures

A Variety of Possible Kinds of Database Queries Can Arise in Bioinformatics

(1) and (2) are solved problems

(3) and (4) are active fields of research

Curation, Annotation and Quality Control

Older data were limited by older techniques

Amino acid sequences of proteins used to be determined by peptide sequencing. Now, almost all are translated from DNA sequences.

Curation, Annotation and Quality Control

Distributed error-correction and annotation

Dynamic error-correction and annotation