How Bioinformatics can change your life
Basic Concepts of Bioinformatics

M. Alroy Mascrenghe
MBCS, MIEEE, MIT
mark_ai@yahoo.com

A lecture given for the BCS Wolverhampton Branch at the University of Wolverhampton

http://www.geocities.com/mark_ai/

TOC

Introduction
Basic concepts in Molecular biology
Bioinformatics techniques
Areas in bioinformatics
Applications
Related Computer Technology
Conference in Glasgow
Acknowledgements
Reference

Introduction

2000

A major event happened that was to change the course of human history
It was a joint British and American effort
  nothing to do with Iraq!
It was a race - who will complete first
  Race test - not whether they have taken drugs but whether they can produce them!
The human genome was sequenced

A Situ…somewhere in the near future

A virus (not the ‘I love you’ virus) creates an epidemic
Geneticists and bioinformaticians roll up their sleeves
The genetic material of the virus is compared with the existing base of known genetic material of other viruses
  As the characteristics of the other viruses are known
From the genetic material, computer programs will derive the proteins necessary for the survival of the virus
When the protein (sequence and structure) is known, medicines can be designed

What is

The marriage between computer science and molecular biology
  The algorithms and techniques of computer science are being used to solve the problems faced by molecular biologists
‘Information technology applied to the management and analysis of biological data’
  Storage and analysis are two of the important functions - bioinformaticians build tools for each

[Diagram relating Bioinformatics to Biology, Chemistry, Statistics and Computer Science]

What is..

This is the age of Information Technology
However, storing info is nothing new
  Information to the volume of the Britannica Encyclopedia is stored in each of our cells
‘Bioinformatics tries to determine what info is biologically important’

Basics of Molecular Biology

DNA & Genes

DNA is where the genetic information is stored
  Blonde hair and blue eyes are inherited through it
Gene - the basic unit of heredity
  There are genes for characteristics, e.g. a gene for blond hair
Genes contain the information as a sequence of nucleotides
Genes are abstract concepts - like longitude and latitude, in the sense that you cannot see them separately
Genes are made up of nucleotides

Nucleotide (nt)

Each nt is made up of
  Sugar
  Phosphate group
  Base
The base it contains makes the only difference between one nt and another
There are 4 different bases
  G(uanine), A(denine), T(hymine), C(ytosine)
The information is in the order of the nucleotides, and the order is the info
Genes can be many thousands of nt long
The complete set of genetic instructions is called the genome

Chromosomes

DNA strings make chromosomes
Analogy
  Letters - nt
  Sentences - genes
  Individual volumes of the Britannica Encyclopedia - chromosomes
  All volumes together - genome

Double Helix

DNA is a double helix
Each strand has complementary information
Each particular base in one strand is bonded with another particular base in the other strand
  G - C
  A - T
For example
  AATGC  one strand
  TTACG  other strand
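The pairing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the names `PAIRS` and `complement` are made up here, and the sketch pairs bases position by position exactly as the slide's example does, ignoring strand direction.

```python
# Base-pairing rule from the slide: G pairs with C, A pairs with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the bases that pair with each base of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


# The slide's example: AATGC on one strand pairs with TTACG on the other.
print(complement("AATGC"))
```

Because each base has exactly one partner, applying `complement` twice gives back the original strand, which is why either strand carries the full information.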

Proteins

Proteins are very important biological features
Amino acids make up the proteins
There are 20 different amino acids
The function of a protein is dependent on the order of the amino acids

Proteins…

The information required to assemble amino acids (aa) into proteins is stored in DNA
DNA sequence determines amino acid sequence
Amino acid sequence determines protein structure
Protein structure determines protein function
A substance called RNA carries the info stored in the DNA, which in turn is used to make proteins
  Storage - DNA
  Information transfer - RNA
  RNA is the message boy!

Central dogma

DNA -> (transcription) -> RNA -> (translation) -> Protein
Transcription is carried out by RNA polymerase; translation is carried out by ribosomes

Proteins…..

Since there are 20 amino acids to translate, one nt cannot correspond to one aa (only 4 possibilities), and neither can pairs of nt (only 16 possibilities)
So protein information is carried in triplet codes - codons (64 possibilities)
The codons that do not correspond to an amino acid are stop codons - UAA, UAG, UGA (RNA has U instead of T)
Some codons are used as start codons - AUG is a start codon as well as the code for methionine
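To make the codon idea concrete, here is a minimal Python sketch of transcription and translation. It assumes the input DNA is the coding strand (so transcription is just T to U) and uses the standard genetic code written in the compact one-letter form; the function names `transcribe` and `translate` are illustrative, not taken from the lecture.

```python
from itertools import product

# Why triplets: 4 bases give 4, 16 and 64 combinations for words of length 1, 2 and 3.
COMBINATIONS = {length: 4 ** length for length in (1, 2, 3)}

# Standard genetic code, one letter per amino acid, '*' for a stop codon.
# Codons are enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA (assumption: input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCTAA")))
```

The table also confirms the slide's point about stop codons: exactly three of the 64 codons map to `*`.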

Protein Structure

Shows a wide variety, as opposed to DNA, whose structure is uniform
X-ray crystallography or Nuclear Magnetic Resonance (NMR) is used to figure out the structure
Structure is related to function, or rather structure determines function
Although proteins are created as a linear chain of aa, they fold into a 3D structure
  If you stretch them and let go, they will go back to this structure - this is the native structure of a protein
  Only in the native structure does the protein function well
Even after translation is over, the protein goes through some changes to its structure

Gene Expression

Gene expression - the process of transcribing DNA and translating RNA to make protein
Where do the genes begin in a chromosome?
How does RNA polymerase identify the beginning of a gene that is to be made into a protein?
A single nt cannot be taken to mark the beginning of a gene, as each nt occurs frequently
But a particular combination of nucleotides can be
Promoter sequences - the order of nt which marks the beginning of a gene
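A short Python sketch shows why a combination of nucleotides works as a marker where a single nt does not. The motif `TATAAT` (the bacterial -10 consensus) and the test sequence are chosen here purely for illustration; the function name `find_motif` and the mismatch tolerance are assumptions, since real promoters rarely match a consensus exactly.

```python
def find_motif(sequence: str, motif: str, max_mismatches: int = 0) -> list[int]:
    """Return every start position where `motif` occurs in `sequence`,
    allowing up to `max_mismatches` differing characters."""
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


sequence = "GGCTATAATGCA"          # hypothetical stretch of DNA
print(find_motif(sequence, "A"))       # a single nt matches in many places
print(find_motif(sequence, "TATAAT"))  # a six-nt combination is far more specific
```

Raising `max_mismatches` trades specificity for sensitivity, which is the basic tension in any signal search of this kind.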

Bioinformatics Techniques

Prediction and Pattern Recognition

The two main areas of bioinformatics are
  Pattern recognition
    ‘A particular sequence or structure has been seen before’, and a particular characteristic can be associated with it
  Prediction
    From a sequence (what we know) we can predict the structure and function (what we don’t know)

Dot plots

A simple way of evaluating similarity between two sequences
In a graph, one sequence is placed along one axis and the other sequence along the other axis
Where there are matches between the two sequences, the graph is marked
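A dot plot is easy to sketch in Python as a grid of booleans. The names `dot_plot` and `render` are made up for this illustration, and the sketch marks every single-character match without the windowing or filtering that practical tools usually add.

```python
def dot_plot(seq_x: str, seq_y: str) -> list[list[bool]]:
    """grid[row][col] is True where seq_y[row] matches seq_x[col]."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(seq_x: str, seq_y: str) -> str:
    """Draw the plot as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + seq_x]
    for y, row in zip(seq_y, dot_plot(seq_x, seq_y)):
        lines.append(y + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TTACTATA", "TAGATA"))
```

Runs of marks along a diagonal indicate stretches where the two sequences agree, which is what the eye picks out in the plot.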

Alignments

A match for similarity between the characters of two or more sequences
E.g.
  TTACTATA
  TAGATA
There are many ways to align the above two sequences
  1. TTACTATA
     TAGATA
  2. TTACTATA
      TAGATA
  3. TTACTATA
       TAGATA
So which one do we choose, and on what basis?
The solution is to provide a match score and a mismatch score
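As a rough illustration of choosing between alignments by score, the Python sketch below slides the shorter sequence along the longer one without gaps and scores each placement. The values +1 for a match and -1 for a mismatch are assumptions picked for the example, as are the function names.

```python
def ungapped_score(long_seq: str, short_seq: str, offset: int,
                   match: int = 1, mismatch: int = -1) -> int:
    """Score short_seq placed at `offset` under long_seq, with no gaps."""
    window = long_seq[offset:offset + len(short_seq)]
    return sum(match if a == b else mismatch for a, b in zip(window, short_seq))


def score_all_offsets(long_seq: str, short_seq: str) -> list[int]:
    """Scores for every placement where short_seq lies fully under long_seq."""
    last_offset = len(long_seq) - len(short_seq)
    return [ungapped_score(long_seq, short_seq, o) for o in range(last_offset + 1)]


print(score_all_offsets("TTACTATA", "TAGATA"))
```

With these assumed scores, two of the three placements tie, which hints at why gaps and richer scoring schemes are needed to separate candidate alignments.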

Gaps

Introduce gaps and a penalty score for gaps
  TTACTATA
  T_A_GATA
In gap scores, a single indel which is two characters long is preferred to two indels which are each one character long
However, not all gaps are bad
  TTGCAATCT
  CAA
How do we align?
  ---CAA---
These gaps are not biologically significant
Semi-global alignments
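The two ideas on this slide, preferring one long indel over several short ones and ignoring gaps at the ends, can be sketched as a scoring function in Python. All penalty values below (match +1, mismatch -1, gap opening -2, gap extension -1 per gap character) are assumptions for illustration; the slide does not give numbers.

```python
MATCH, MISMATCH = 1, -1
GAP_OPEN, GAP_EXTEND = -2, -1  # assumed values: opening a gap costs more than extending it
GAP_CHARS = "_-"


def score_alignment(top: str, bottom: str, free_end_gaps: bool = False) -> int:
    """Score two already-aligned rows of equal length.

    With free_end_gaps=True, leading and trailing gap columns are not
    penalised (a semi-global alignment).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(top, bottom))
    if free_end_gaps:
        while columns and any(ch in GAP_CHARS for ch in columns[0]):
            columns.pop(0)
        while columns and any(ch in GAP_CHARS for ch in columns[-1]):
            columns.pop()
    score = 0
    previous_gap_row = None  # which row (0 or 1) had a gap in the previous column
    for a, b in columns:
        if a in GAP_CHARS or b in GAP_CHARS:
            gap_row = 0 if a in GAP_CHARS else 1
            if gap_row != previous_gap_row:
                score += GAP_OPEN  # a new indel starts here
            score += GAP_EXTEND
            previous_gap_row = gap_row
        else:
            score += MATCH if a == b else MISMATCH
            previous_gap_row = None
    return score


print(score_alignment("AAAAA", "A__AA"), score_alignment("AAAAA", "A_A_A"))
print(score_alignment("TTGCAATCT", "---CAA---", free_end_gaps=True))
```

Charging the opening penalty once per run is what makes a single two-character indel cheaper than two separate one-character indels.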

Scoring Matrix

For DNA/protein sequence alignment we create a matrix
  If A and A, score is 1
  If A and T, score is -5
  If A and C, score is -1
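In code, a scoring matrix is often just a lookup table keyed by a pair of characters. The Python sketch below stores only the three values given on the slide; the fallback rules (identical characters score 1, any other unlisted pair scores -1) and the names `SLIDE_SCORES` and `pair_score` are assumptions added so the example runs.

```python
# The three entries given on the slide.
SLIDE_SCORES = {("A", "A"): 1, ("A", "T"): -5, ("A", "C"): -1}


def pair_score(x: str, y: str, default_mismatch: int = -1) -> int:
    """Look up the score for aligning x with y; the matrix is treated as symmetric."""
    if (x, y) in SLIDE_SCORES:
        return SLIDE_SCORES[(x, y)]
    if (y, x) in SLIDE_SCORES:
        return SLIDE_SCORES[(y, x)]
    # Assumption: pairs the slide does not list fall back to 1 / default_mismatch.
    return 1 if x == y else default_mismatch


def score_ungapped(top: str, bottom: str) -> int:
    """Sum the matrix scores column by column for two equal-length rows."""
    return sum(pair_score(a, b) for a, b in zip(top, bottom))


print(score_ungapped("AAC", "ATA"))
```

Giving different mismatches different penalties lets the matrix encode that some substitutions are more plausible than others.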

Dynamic Programming

As the lengths of the query sequences increase, and the difference in length between the two sequences also increases, more gaps have to be inserted in various places
We cannot perform an exhaustive search
  Combinatorial explosion occurs - too many combinations to search through
Dynamic programming is a way of searching efficiently: the best alignment is built up from the best alignments of shorter pieces, so each sub-problem is solved only once
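One well-known dynamic programming method for this problem is Needleman-Wunsch global alignment, sketched below in Python. The lecture does not name a specific algorithm, so treat this as one possible instance; the scores (match +1, mismatch -1, a flat -2 per gap character) are assumed values for the example.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty.

    Returns (best score, aligned a, aligned b), using '_' for gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned with a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned with a gap
            score[i][j] = max(diagonal, up, left)

    # Trace back from the bottom-right cell to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("_"); i -= 1
        else:
            top.append("_"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = needleman_wunsch("TTACTATA", "TAGATA")
print(best)
print(row_a)
print(row_b)
```

The table has only (len(a)+1) x (len(b)+1) cells, so the work grows with the product of the two lengths rather than with the number of possible alignments.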

Databases

Sequence info is stored in databases
  So that it can be manipulated easily
The db (next slide) are located at different places
They exchange info on a daily basis so that they are up to date and in sync
Primary db - sequence data

Major Primary DB

Nucleic Acid      Protein
EMBL (Europe)     PIR - Protein Information Resource
GenBank (USA)     MIPS
DDBJ (Japan)      SWISS-PROT (University of Geneva, now with EBI)
                  TrEMBL (a supplement to SWISS-PROT)
                  NRL-3D

Composite DB

As there are many db, which one should we search? Some are good in some aspects and weak in others
A composite db is the answer - it has several db for its base data
Searches on these db are indexed and streamlined so that the same stored sequence is not searched twice in different db

Composite DB

OWL has these as its primary db
  SWISS-PROT (top priority)
  PIR
  GenBank
  NRL-3D
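The idea of a composite db, merging several sources by priority so that the same sequence is held only once, can be sketched in Python as follows. This is not how OWL is actually built; the function name `build_composite`, the entry identifiers and the sequences are all hypothetical.

```python
def build_composite(sources: list[tuple[str, dict[str, str]]]) -> dict[str, tuple[str, str]]:
    """Merge databases given in priority order.

    `sources` is a list of (db_name, {entry_id: sequence}). Each distinct
    sequence is kept once, credited to the highest-priority db that holds it.
    """
    composite: dict[str, tuple[str, str]] = {}
    for db_name, entries in sources:
        for entry_id, sequence in entries.items():
            if sequence not in composite:  # already supplied by a higher-priority db?
                composite[sequence] = (db_name, entry_id)
    return composite


# Hypothetical records; the db names follow the priority order on the slide.
sources = [
    ("SWISS-PROT", {"P1": "MKVLA"}),
    ("PIR", {"A1": "MKVLA", "A2": "MLLTG"}),
]
for sequence, origin in build_composite(sources).items():
    print(sequence, origin)
```

A search then runs over the merged set, so a sequence stored in two source db is compared against the query only once.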

Secondary db

Store secondary structure info or results of searches of the primary db

Compo DB      Primary Source
PROSITE       SWISS-PROT
PRINTS        OWL

Database Searches

We have sequenced and identified genes, so we know what they do
The sequences are stored in databases
So if we find a new gene in the human genome, we compare it with the already found genes which are stored in the databases
Since the databases hold a large number of sequences, we cannot do a full sequence alignment against each and every sequence
So heuristics must be used
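One common family of search heuristics, in the spirit of word-based tools such as BLAST, is to index short words (k-mers) and run a full alignment only against sequences that share a word with the query. The Python sketch below shows just that filtering step; the lecture does not specify a heuristic, and the function names, the word length and the tiny database are assumptions for illustration.

```python
from collections import defaultdict


def build_kmer_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-letter word to the names of the sequences that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, sequence in database.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(name)
    return index


def candidate_hits(query: str, index: dict[str, set[str]], k: int) -> dict[str, int]:
    """Count, per database sequence, how many query words it shares."""
    counts: dict[str, int] = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            counts[name] += 1
    return dict(counts)


# Hypothetical database of known sequences.
database = {"virus_a": "TTACTATA", "virus_b": "GGGCCCGG", "virus_c": "CCTAGATACC"}
index = build_kmer_index(database, k=4)
print(candidate_hits("TAGATA", index, k=4))
```

Only the sequences returned here would go on to a full alignment, which is where the time saving comes from; the price is that a distant relative sharing no exact word is missed.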

Areas in Bioinformatics

Genomics

Because of the multicellular structure, each cell type does gene expression in a different way, although each cell has the same content as far as the genetic information is concerned
  i.e. all the information for a liver cell to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them

Genomics - Finding Genes

A gene in sequence data - a needle in a haystack
However, whereas a needle is different from the haystack, genes are not different from the rest of the sequence data
In the whole array of nt, we try to find and mark the borders of a set of nt as a gene
This is one of the challenges of bioinformatics
Neural networks and dynamic programming are being employed
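A far simpler technique than the neural networks and dynamic programming mentioned on the slide is to scan for open reading frames (ORFs): a start codon followed, in the same reading frame, by a stop codon. The Python sketch below does only that, on the forward strand, and is meant as a first illustration of marking gene borders; the function name `find_orfs` and the minimum length are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is the
    index just past the stop codon. ORFs with fewer than `min_codons` codons
    before the stop are ignored.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATTTTAGGG"  # hypothetical sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real gene finders go well beyond this, since many ORFs are not genes and, in organisms with introns, genes are not single uninterrupted ORFs.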

Organism       Genome size (Mb = 1,000,000 bp)   Gene number   Web site
Yeast          13.5                              6,241         http://genome-www.stanford.edu/Saccharomyces
Fruit flies    180                               13,601        http://flybase.bio.indiana.edu
Homo sapiens   3,000                             45,000        http://www.ncbi.nlm.nih.gov/genome/guide

Proteomics

The proteome is the sum total of an organism's proteins
More difficult than genomics

  Genomics                 Proteomics
  4 (bases)                20 (amino acids)
  Simple chemical makeup   Complex
  Can duplicate            Can't

We are entering the ‘post-genome era’
  Meaning much has been done with the genes - not that it is all over

Proteomics…..

The relationship between an RNA and the protein it codes for is usually not a simple one
After translation, proteins do change
  So the aa sequence does not tell us anything about the post-translation changes
Proteins are not active until they are combined into a larger complex or moved to a relevant location inside or outside the cell
  So the aa sequence only hints at these things
Also, proteins must be handled more carefully in labs, as they tend to change when in touch with an inappropriate material

Protein Structure Prediction

Is one of the biggest challenges of bioinformatics, and esp. of biochemistry
There is no algorithm now that consistently predicts the structure of proteins

Structure Prediction methods

Comparative modeling
  The target protein's structure is compared with related proteins
  Proteins with similar sequences are searched for structures

Phylogenetics

The taxonomical system reflects evolutionary relationships
Phylogenetic trees reflect the evolutionary relationships through a picture/graph
  Rooted trees, where there is only one ancestor
  Unrooted trees, just showing the relationships
Phylogenetic tree reconstruction algorithms are also an area of research

Applications

Medical Implications

Pharmacogenomics
  Not all drugs work on all patients; some good drugs cause death in some patients
  So by doing a gene analysis before the treatment, the offending drugs can be avoided
  Also, drugs which cause death to most can be used on a minority to whose genes the drug is well suited - volunteers wanted!
  Customized treatment
Gene therapy
  Replace or supply the defective or missing gene
  E.g. insulin, and Factor VIII for haemophilia
BioWeapons (??)

Diagnosis of Disease

Identification of the genes which cause a disease will help detect the disease at an early stage, e.g. Huntington's disease:
  Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment
  Death in 10-15 years
  The gene responsible for the disease has been identified
  It contains excessively repeated sections of CAG
  So once the gene has been analyzed, the couple can be counseled
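
The idea of spotting excessively repeated CAG sections can be sketched in a few lines. This is only an illustration: the function name and the DNA fragment are made up, the reading frame is ignored, and no clinical threshold is implied.

```python
import re


def longest_cag_run(dna):
    """Return the largest number of consecutive CAG triplets found in dna."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Made-up fragment for illustration only, not a real gene sequence.
fragment = "ATGGCGCAGCAGCAGCAGCAGTCCTTC"
print(longest_cag_run(fragment))  # 5
```

A real analysis would work on the actual gene region and compare the count against clinically established ranges, which are outside the scope of this sketch.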

Drug Design

Can take up to 15 years and $700 million

One of the goals of bioinformatics is to reduce the time and cost involved

The process
  Discovery: computational methods can improve this
  Testing

Discovery

Target identification
  Identifying the molecule on which the germ relies for its survival
  Then we develop another molecule, i.e. a drug, which will bind to the target
  So the germ will not be able to interact with the target
  Proteins are the most common targets

Discovery…

For example, HIV produces HIV protease, which is a protein and which in turn eats other proteins

This HIV protease has an active site where it binds to other molecules

So an HIV drug will go and bind to that active site
  Easier said than done!

Discovery…

Lead compounds are the molecules that go and bind to the target protein's active site

Traditionally this has been a trial-and-error method

Now this is being moved into the realm of computers

Related Computer Technology….

PERL

Perl is commonly used for bioinformatics calculations because of its ability to manipulate character symbols

The default CGI language

It started out as a scripting language but has become a fully fledged language

It has everything now, even web service support

http://bio.perl.org
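
The kind of character manipulation that makes a scripting language useful for sequence work can be sketched as follows. The example is in Python rather than Perl purely for illustration, and the function names are hypothetical helpers, not part of any library.

```python
# Map each DNA base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_content(dna):
    """Return the fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


print(reverse_complement("ATGC"))  # GCAT
print(gc_content("ATGC"))          # 0.5
```

Since sequences are just strings over a small alphabet, most everyday tasks reduce to translating, slicing and counting characters like this.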

The place of XML & Web Services

Various markup languages are being created (Gene Markup Language etc.) to represent sequence/gene data

Web Services: program-to-program interaction, making the web application-centric as opposed to human-centric

So this has to be platform and language independent

Protocols like SOAP help in this regard

In bioinformatics various databases, platforms, languages etc. are being used

So web services help achieve platform independence and program interaction

Since sequence databases come in various formats and on various platforms, SOAP also helps in this regard
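
As a rough illustration of why a markup representation helps programs exchange sequence data, the sketch below parses a small XML record with Python's standard library. The element and attribute names are hypothetical and do not follow the schema of Gene Markup Language or any other real format; the record itself is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; NOT the schema of any real markup language.
record_xml = """
<sequence id="demo1" type="dna">
  <organism>Example organism</organism>
  <residues>ATGCAGCAGTTC</residues>
</sequence>
"""


def parse_record(xml_text):
    """Extract id, organism and residues from the hypothetical XML record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "organism": root.findtext("organism"),
        "residues": root.findtext("residues"),
    }


record = parse_record(record_xml)
print(record["id"], len(record["residues"]))  # demo1 12
```

Any program on any platform that can read XML can consume the same record, which is the platform and language independence the slide refers to.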

The place of GRID

GRID: new kid on the block

Using many computers to fulfill a single computational task

Bioinformatics is an ideal application for it, as it has to deal with a large amount of data in alignment and searches

E-science initiative in the UK

ORACLE 10g: the world's first GRID database
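
The split-search-merge pattern behind using many computers for one search task can be sketched on a single machine. In this sketch, threads stand in for grid nodes, which is an assumption made only to keep the example runnable; the function names and the sequences are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk, motif):
    """Return the names of sequences in one chunk that contain the motif."""
    return [name for name, seq in chunk if motif in seq]


def distributed_search(database, motif, workers=2):
    """Split the database into chunks, search them in parallel, merge hits."""
    chunks = [database[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(search_chunk, chunks, [motif] * workers))
    return sorted(hit for hits in results for hit in hits)


# Made-up sequences for illustration only.
database = [("seq1", "ATGCAGCAG"), ("seq2", "TTTTAAAA"),
            ("seq3", "GGCAGTT"), ("seq4", "ACGTACGT")]
print(distributed_search(database, "CAG"))  # ['seq1', 'seq3']
```

On a real grid each chunk would be sent to a different machine, but the structure of the task stays the same: the chunks are independent, so they can be searched at the same time and the hits combined at the end.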

Databases and Mining

A lot of the sequence databases are publicly available

As there is a DB involved, various data mining techniques are used to pull the data out

As there is a lot of literature (articles etc.) in this area, data mining on the literature, rather than on the sequence data, has also become a PhD topic for many

European Molecular Biology Network (EMBnet)

A central system for sharing, training and centralizing up-to-date bio info

Some of the EMBnet sites are:
  SEQNET
    http://www.seqnet.dl.ac.uk
  UCL
    http://www.biochem.ucl.ac.uk/bsm/dbbrowser/embnet/
  EBI (European Bioinformatics Institute)
    www.ebi.ac.uk

References

Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics

Arthur M. Lesk, Introduction to Bioinformatics

T.K. Attwood & D.J. Parry-Smith, Introduction to Bioinformatics

Dr Patrick Dixon, The Genetic Revolution

Prof David Gilbert's site, http://www.brc.dcs.gla.ac.uk/~drg/

Thank You!