How Bioinformatics can change your life
Basic Concepts of Bioinformatics
M. Alroy Mascrenghe, MBCS, MIEEE, MIT
A lecture given for the BCS Wolverhampton Branch at the University of Wolverhampton
http://www.geocities.com/mark_ai/
Table of Contents
Introduction
Basic concepts in molecular biology
Bioinformatics techniques
Areas in bioinformatics
Applications
Related computer technology
Conference in Glasgow
Acknowledgements
References
Introduction
2000
A major event happened that was to change the course of human history.
It was a joint British and American effort (nothing to do with Iraq!).
It was a race to see who would finish first.
The race test: not whether they had taken drugs, but whether they could produce them!
The human genome was sequenced.
A scenario somewhere in the near future
A virus (not the 'I love you' computer virus) creates an epidemic.
Geneticists and bioinformaticians roll up their sleeves.
The genetic material of the virus is compared with the existing base of known genetic material of other viruses, whose characteristics are already known.
From the genetic material, computer programs derive the proteins necessary for the survival of the virus.
When the protein (sequence and structure) is known, medicines can be designed.
What is Bioinformatics?
The marriage between computer science and molecular biology.
The algorithms and techniques of computer science are being used to solve the problems faced by molecular biologists.
'Information technology applied to the management and analysis of biological data'
Storage and analysis are two of the most important functions; bioinformaticians build tools for each.
[Figure: Bioinformatics draws on Biology, Chemistry, Statistics and Computer Science]
What is Bioinformatics? (continued)
This is the age of information technology.
However, storing information is nothing new: information equivalent to the volume of the Encyclopaedia Britannica is stored in each of our cells.
'Bioinformatics tries to determine what information is biologically important'
Basics of Molecular Biology
DNA & Genes
DNA is where genetic information is stored.
Traits such as blond hair and blue eyes are inherited through it.
Gene: the basic unit of heredity.
There are genes for characteristics, e.g. a gene for blond hair.
Genes contain their information as a sequence of nucleotides.
Genes are abstract concepts, like lines of longitude and latitude, in the sense that you cannot see them separately.
Genes are made up of nucleotides.
Nucleotide (nt)
Each nt is made up of a sugar, a phosphate group and a base.
The base it contains is the only difference between one nt and another.
There are 4 different bases: G (guanine), A (adenine), T (thymine) and C (cytosine).
The information is in the order of the nucleotides: the order is the information.
Genes can be many thousands of nt long.
The complete set of genetic instructions is called the genome.
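To make "the order is the information" concrete, here is a minimal Python sketch that treats a DNA sequence as a string over the four bases. The function names `is_dna` and `base_counts` are hypothetical, chosen for illustration. It shows that two sequences can have exactly the same base composition and still be different sequences.

```python
from collections import Counter

DNA_BASES = set("ACGT")

def is_dna(seq):
    """True if seq is non-empty and uses only the four (upper-case) DNA bases."""
    return len(seq) > 0 and set(seq) <= DNA_BASES

def base_counts(seq):
    """Count how many times each of the four bases occurs."""
    counts = Counter(seq)
    return {base: counts[base] for base in "ACGT"}

# Same composition, different order: different information.
s1, s2 = "GATTACA", "ATTGACA"
print(is_dna(s1), base_counts(s1) == base_counts(s2), s1 == s2)  # True True False
```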
Chromosomes
DNA strings make up chromosomes.
Analogy:
Letters: nucleotides
Sentences: genes
Individual volumes of the Encyclopaedia Britannica: chromosomes
All the volumes together: the genome
Double Helix
DNA is a double helix.
Each strand carries complementary information: each base on one strand is bonded to a particular base on the other strand (G with C, A with T).
For example:
AATGC (one strand)
TTACG (the other strand)
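The base-pairing rule above maps directly onto a character substitution. This Python sketch (function names are illustrative) builds the partner strand position by position, as in the slide's example. Because the two strands run in opposite directions, the partner strand read in its own 5' to 3' direction is the reverse of that complement; the sketch includes this as `reverse_complement`.

```python
# G pairs with C, A pairs with T.
PAIR = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base partner strand, written in the same left-to-right order."""
    return strand.translate(PAIR)

def reverse_complement(strand):
    """Partner strand read in its own 5'->3' direction (the strands are antiparallel)."""
    return complement(strand)[::-1]

print(complement("AATGC"))          # TTACG
print(reverse_complement("AATGC"))  # GCATT
```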
Proteins
Proteins are a very important biological feature.
Proteins are made up of amino acids; there are 20 different amino acids.
The function of a protein depends on the order of its amino acids.
Proteins (continued)
The information that specifies the amino acid (aa) sequence is stored in DNA.
DNA sequence determines amino acid sequence.
Amino acid sequence determines protein structure.
Protein structure determines protein function.
A substance called RNA carries the information stored in the DNA, and that information is in turn used to make proteins.
Storage: DNA
Information transfer: RNA
RNA is the messenger boy!
Central dogma
DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein
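The transcription step can be sketched in a simplified form. This assumes the input is the DNA coding (sense) strand: the mRNA then has the same sequence with U in place of T. In the cell, RNA polymerase actually reads the complementary template strand. The function name `transcribe` is illustrative.

```python
def transcribe(coding_strand):
    """Simplified transcription: copy the DNA coding strand, replacing T with U."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```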
Proteins (continued)
Since there are 20 amino acids to encode, one nt (4 possibilities) cannot correspond to one aa, and neither can pairs of nt (4 × 4 = 16 possibilities).
So protein information is carried in triplet codes called codons (4 × 4 × 4 = 64 possibilities).
The codons that do not correspond to an amino acid are stop codons: UAA, UAG and UGA (RNA has U instead of T).
Some codons are used as start codons: AUG is the start codon, and it also codes for methionine.
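The triplet code can be expressed as a 64-entry lookup table. The Python sketch below builds the standard genetic code from a compact string: codons are enumerated with the bases in the order U, C, A, G, and `*` marks the three stop codons. It then translates from the first AUG up to the first stop codon. The helper names are illustrative. Real mRNA processing (alternative start codons, other genetic codes) is ignored.

```python
BASES = "UCAG"
# Standard genetic code in the order UUU, UUC, UUA, UUG, UCU, ..., GGG; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGUUUGCAUAAGG"))  # MFA (AUG=Met, UUU=Phe, GCA=Ala, UAA=stop)
```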
Protein Structure
Protein structure shows a wide variety, as opposed to DNA, whose structure is uniform.
X-ray crystallography or nuclear magnetic resonance (NMR) is used to determine the structure.
Structure is related to function; or rather, structure determines function.
Although proteins are created as a linear chain of aa, they fold into a 3D structure.
If you stretch them and let go, they return to this structure: this is the native structure of the protein.
A protein functions well only in its native structure.
Even after translation is over, a protein goes through some changes to its structure.
Gene Expression
Gene expression: the process of transcribing DNA and translating RNA to make a protein.
Where do the genes begin in a chromosome?
How does RNA polymerase identify the beginning of a gene so that a protein can be made?
A single nt cannot mark the beginning of a gene, as each nt occurs too frequently.
But a particular combination of nucleotides can.
Promoter sequences: the order of nt that marks the beginning of a gene.
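Searching for a promoter-like combination of nucleotides is, in its simplest form, a motif search. The Python sketch below reports every exact (possibly overlapping) occurrence of a motif. `TATAAT`, the consensus of the bacterial -10 ("Pribnow") box, is used purely as an example. Real promoters vary around their consensus, so practical tools score approximate matches (for example with position weight matrices) rather than requiring exact ones.

```python
import re

def find_motif(seq, motif):
    """0-based start of every exact, possibly overlapping, occurrence of motif."""
    pattern = "(?=" + re.escape(motif) + ")"  # zero-width lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, seq.upper())]

print(find_motif("GCTATAATGCCTATAATA", "TATAAT"))  # [2, 11]
```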
Bioinformatics Techniques
Prediction and Pattern Recognition
The two main areas of bioinformatics are:
Pattern recognition: 'a particular sequence or structure has been seen before', and a particular characteristic can be associated with it.
Prediction: from a sequence (what we know) we can predict the structure and function (what we don't know).
Dot plots
A simple way of evaluating the similarity between two sequences.
One sequence is placed along one axis of a grid and the other sequence along the other axis.
Wherever the two sequences match, the grid is marked; runs of marks along a diagonal indicate similar regions.
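A dot plot is just a grid of match tests. This minimal Python sketch prints one text row per character of the first sequence and one column per character of the second, with `*` where the characters are identical. It uses no window or threshold, which practical dot-plot tools often add to suppress noise.

```python
def dot_plot(seq1, seq2):
    """Rows follow seq1, columns follow seq2; '*' marks identical characters."""
    return ["".join("*" if a == b else "." for b in seq2) for a in seq1]

for row in dot_plot("GATTACA", "GATCACA"):
    print(row)
```

Similar stretches of the two sequences show up as diagonal runs of `*`.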
Alignments
An alignment matches the characters of two or more sequences for similarity.
E.g.
TTACTATA
TAGATA
There are many ways to align these two sequences, for example by placing the shorter sequence at different offsets against the longer one.
So which one do we choose, and on what basis? The solution is to assign a match score and a mismatch score, and choose the alignment with the best total.
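The match/mismatch idea can be tried on the slide's example. This Python sketch slides the shorter sequence along the longer one without gaps and totals the score at each offset. It assumes match = +1 and mismatch = -1, values chosen only for illustration.

```python
def ungapped_scores(longer, shorter, match=1, mismatch=-1):
    """Score the shorter sequence at every offset along the longer one (no gaps)."""
    scores = {}
    for offset in range(len(longer) - len(shorter) + 1):
        window = longer[offset:offset + len(shorter)]
        scores[offset] = sum(match if x == y else mismatch
                             for x, y in zip(window, shorter))
    return scores

print(ungapped_scores("TTACTATA", "TAGATA"))  # {0: 0, 1: -2, 2: 0}
```

Two offsets tie here, which is one reason why gaps and finer scoring schemes are introduced.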
Gaps
Introduce gaps, and a penalty score for gaps:
TTACTATA
T-A-GATA
In gap scoring, a single indel two characters long is preferred to two indels that are each one character long.
However, not all gaps are bad. How do we align CAA against TTGCAATCT?
TTGCAATCT
---CAA---
These end gaps are not biologically significant, so they are not penalized: this is a semi-global alignment.
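Preferring one long indel to several short ones is usually implemented with an "affine" gap penalty: opening a gap costs more than extending it. The Python sketch below scores alignments written with `-` for gaps. It assumes match +1, mismatch -1, gap opening 4 and gap extension 1. These values are illustrative, not taken from the lecture. With them, the slide's two 1-long gaps score worse than a single 2-long gap.

```python
def score_gapped(top, bottom, match=1, mismatch=-1, gap_open=4, gap_extend=1):
    """Score an alignment written with '-' for gaps.

    A gap run of length L costs gap_open + gap_extend * (L - 1).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # which row ('top' or 'bottom') had a gap in the previous column
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            score -= gap_extend if row == gap_row else gap_open
            gap_row = row
        else:
            score += match if x == y else mismatch
            gap_row = None
    return score

print(score_gapped("TTACTATA", "T-A-GATA"))  # -4: two 1-long gaps
print(score_gapped("TTACTATA", "T--AGATA"))  # -3: one 2-long gap
```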
Scoring Matrix
For DNA/protein sequence alignment we create a matrix of scores for every pair of characters, e.g.:
A with A: score 1
A with T: score -5
A with C: score -1
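A scoring matrix can be stored as a symmetric lookup table. In this Python sketch, the three scores for A come from the slide; every other entry is a hypothetical placeholder, included only so that the table covers all ten base pairs. Only one order of each pair is stored, and the lookup tries both orders.

```python
# A-A, A-T and A-C come from the slide; all other values are hypothetical placeholders.
PAIR_SCORES = {
    ("A", "A"): 1, ("A", "T"): -5, ("A", "C"): -1, ("A", "G"): -1,
    ("C", "C"): 1, ("C", "G"): -5, ("C", "T"): -1,
    ("G", "G"): 1, ("G", "T"): -1,
    ("T", "T"): 1,
}

def pair_score(x, y):
    """Symmetric lookup: (x, y) and (y, x) give the same score."""
    key = (x, y) if (x, y) in PAIR_SCORES else (y, x)
    return PAIR_SCORES[key]

def score_ungapped(top, bottom):
    """Total matrix score of two equal-length, gap-free aligned sequences."""
    return sum(pair_score(x, y) for x, y in zip(top, bottom))

print(score_ungapped("AAC", "ATC"))  # 1 - 5 + 1 = -3
```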
Dynamic Programming
As the query sequences get longer, and the difference in length between the two sequences increases, more gaps have to be inserted in various places.
We cannot perform an exhaustive search: a combinatorial explosion occurs, with far too many combinations to search.
Dynamic programming avoids this by building the best alignment from the best alignments of shorter prefixes and reusing each partial result instead of recomputing it, so it finds an optimal alignment without trying every combination.
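The slide does not name a specific algorithm. A standard dynamic-programming method for global alignment is Needleman-Wunsch, sketched below in Python. It assumes match +1, mismatch -1 and a linear gap penalty of -2 per gap character (illustrative values). Each table cell is filled once from three neighbours, so the cost grows with the product of the sequence lengths rather than with the number of possible alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming with a linear gap penalty.

    F[i][j] is the best score for aligning a[:i] with b[:j].
    Returns (score, aligned_a, aligned_b).
    """
    def s(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match/mismatch
                          F[i - 1][j] + gap,                        # gap in b
                          F[i][j - 1] + gap)                        # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```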
Databases
Sequence information is stored in databases so that it can be manipulated easily.
The databases (next slide) are located at different places.
They exchange information on a daily basis so that they stay up to date and in sync.
Primary databases: sequence data
Major primary databases
Nucleic acid: EMBL (Europe), GenBank (USA), DDBJ (Japan)
Protein: PIR (Protein Information Resource); MIPS; SWISS-PROT (University of Geneva, now with EBI); TrEMBL (a supplement to SWISS-PROT); NRL-3D
Composite DB
With so many databases, which one should we search? Some are strong in some aspects and weak in others.
A composite database is the answer: it draws on several databases for its base data.
Searches on it are indexed and streamlined so that the same stored sequence is not searched twice in different databases.
Composite DB (continued)
OWL uses these as its primary databases:
SWISS-PROT (top priority)
PIR
GenBank
NRL-3D
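The idea of a composite database that stores each sequence only once, taking sources in priority order, can be sketched as below. This is a simplified illustration that removes exact duplicates only; it is not how OWL itself was built. The database records, accession labels and sequences are hypothetical toy values.

```python
def build_composite(sources):
    """Merge several databases into one non-redundant set.

    sources: list of (db_name, records) in priority order, where records is a
    list of (accession, sequence) pairs. The first copy of each distinct
    sequence wins, so identical sequences are stored, and searched, only once.
    """
    composite = {}
    for db_name, records in sources:
        for accession, sequence in records:
            if sequence not in composite:
                composite[sequence] = (db_name, accession)
    return composite

# Hypothetical toy records, not real database entries.
sources = [
    ("SWISS-PROT", [("sp_1", "MKVLA"), ("sp_2", "MLLQR")]),
    ("PIR", [("pir_1", "MKVLA"), ("pir_2", "MAAGT")]),
]
composite = build_composite(sources)
print(len(composite), composite["MKVLA"])  # 3 ('SWISS-PROT', 'sp_1')
```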
Secondary DB
Secondary databases store secondary information, such as structural information or the results of searches of the primary databases.
Secondary DB | Source (primary or composite DB)
PROSITE | SWISS-PROT
PRINTS | OWL
Database Searches
We have sequenced and identified genes, so we know what they do.
The sequences are stored in databases.
So if we find a new gene in the human genome, we compare it with the already known genes stored in the databases.
Since there is so much data across the databases, we cannot perform a full sequence alignment against each and every sequence.
So heuristics must be used.
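One common heuristic, used in the spirit of tools such as BLAST and FASTA, is to filter the database with short exact "words" before doing any full alignment. This Python sketch is a simplified illustration, not BLAST itself. It indexes every k-letter word (k = 3 is an assumption) in a hypothetical toy database, counts the words each entry shares with the query, and ranks the entries. Only the candidates that come out on top would then be aligned properly, for example with dynamic programming.

```python
from collections import defaultdict

def build_index(database, k=3):
    """Map every k-letter word to the set of database entries containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def candidate_hits(query, index, k=3):
    """Count, per entry, how many query words it shares; best candidates first."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: -item[1])

# Hypothetical toy database.
database = {"gene1": "ATGCGTACGT", "gene2": "TTTTTTTT", "gene3": "GGCGTTCCA"}
print(candidate_hits("CGTAC", build_index(database)))  # [('gene1', 3), ('gene3', 1)]
```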
Areas in Bioinformatics
Genomics
In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content.
E.g. all the information for a liver cell to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them.
Genomics: Finding Genes
A gene in sequence data is a needle in a haystack.
However, unlike a needle, which looks different from the haystack, genes are not obviously different from the rest of the sequence data.
In a whole array of nt, we try to find a set of nt and mark its borders as a gene.
This is one of the challenges of bioinformatics.
Neural networks and dynamic programming are being employed.
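The neural-network and dynamic-programming gene finders mentioned above are beyond a short example. A much simpler technique, swapped in here for illustration, is open reading frame (ORF) scanning: look for an ATG start codon followed, in the same reading frame, by a stop codon. This Python sketch scans the three forward frames only. The reverse strand could be scanned via its reverse complement. The `min_codons` threshold is an arbitrary assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """(start, end) of ATG...stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop, including the ATG.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)], i.e. ATG AAA TTT TAG
```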
Organism | Genome size (Mb; 1 Mb = 1,000,000 bp) | Gene number | Web site
Yeast | 13.5 | 6,241 | http://genome-www.stanford.edu/Saccharomyces
Fruit flies | 180 | 13,601 | http://flybase.bio.indiana.edu
Homo sapiens | 3,000 | 45,000 | http://www.ncbi.nlm.nih.gov/genome/guide
Proteomics
The proteome is the sum total of an organism's proteins.
Proteomics is more difficult than genomics:
Genes (DNA): 4 bases, simple chemical makeup, can be duplicated
Proteins: 20 amino acids, complex, cannot be duplicated
We are entering the 'post-genome era'.
Meaning that much has been done with the genes, not that the work is over.
Proteomics (continued)
The relationship between an RNA and the protein it codes for is often far from simple.
After translation, proteins change, and the aa sequence tells us nothing about these post-translational changes.
Proteins are not active until they are combined into a larger complex or moved to the relevant location inside or outside the cell, and the aa sequence only hints at these things.
Also, proteins must be handled more carefully in labs, as they tend to change when in contact with an inappropriate material.
Protein Structure Prediction
This is one of the biggest challenges of bioinformatics, and especially of biochemistry.
At present there is no algorithm that consistently predicts the structure of proteins.
Structure Prediction Methods
Comparative modeling: the target protein is compared with related proteins. Proteins with similar sequences are searched for, and their known structures are used.
Phylogenetics
The taxonomic system reflects evolutionary relationships.
Phylogenetic trees show these evolutionary relationships as a picture/graph:
Rooted trees, where there is a single common ancestor
Unrooted trees, which just show the relationships
Phylogenetic tree reconstruction algorithms are also an area of research.
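As one concrete example of tree reconstruction (the slide names no specific algorithm), here is a Python sketch of UPGMA. This simple distance-based method repeatedly merges the two closest clusters and produces a rooted tree. It returns the topology as a Newick string and omits branch lengths. The pairwise distances between taxa A to D are hypothetical. UPGMA assumes roughly constant evolutionary rates, so it is mainly useful as an illustration.

```python
def upgma(dist):
    """Cluster taxa with UPGMA; dist maps frozenset({x, y}) -> distance.

    Returns the rooted tree topology as a Newick string (no branch lengths).
    """
    sizes = {taxon: 1 for pair in dist for taxon in pair}  # each taxon starts alone
    d = dict(dist)
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance to each remaining cluster: size-weighted average of old distances.
        for c in sizes:
            if c not in pair:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(sizes)) + ";"

# Hypothetical distances between four taxa.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
print(upgma({frozenset(p): v for p, v in pairs.items()}))  # (((A,B),C),D);
```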
Applications
Medical Implications
Pharmacogenomics
Not all drugs work on all patients; some good drugs cause death in some patients.
So by doing a gene analysis before treatment, the harmful drugs can be avoided.
Also, drugs that cause death in most patients can be used on a minority whose genes the drug is well suited to (volunteers wanted!).
Customized treatment
Gene therapy
Replace or supply the defective or missing gene, e.g. insulin, and Factor VIII for haemophilia.
Bioweapons (??)
Diagnosis of Disease
Identifying the genes that cause a disease helps detect the disease at an early stage, e.g. Huntington's disease:
Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment.
Death within 10-15 years.
The gene responsible for the disease has been identified; it contains excessively repeated sections of CAG.
So once a couple's genes have been analyzed, they can be counseled.
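Detecting "excessively repeated sections of CAG" starts with measuring the longest uninterrupted run of CAG triplets. This Python sketch does that on a hypothetical sequence fragment. Interpreting the count clinically, including which repeat lengths matter, is outside the scope of this sketch.

```python
import re

def longest_cag_run(dna):
    """Length, in repeats, of the longest uninterrupted run of CAG triplets."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)

print(longest_cag_run("TTCAGCAGCAGAACAGCAG"))  # 3
```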
Drug Design
Developing a drug can take up to 15 years and $700 million.
One of the goals of bioinformatics is to reduce the time and cost involved.
The process:
Discovery (computational methods can improve this)
Testing
Discovery
Target identification: identifying the molecule on which the germ relies for its survival.
Then we develop another molecule, i.e. a drug, that will bind to the target, so that the germ can no longer interact with the target.
Proteins are the most common targets.
Discovery (continued)
For example, HIV produces HIV protease, a protein that in turn cuts up other proteins.
This HIV protease has an active site where it binds to other molecules.
So an HIV drug will go and bind to that active site.
Easier said than done!
Discovery (continued)
Lead compounds are the molecules that bind to the target protein's active site.
Traditionally, finding them has been a trial-and-error process.
Now this is moving into the realm of computers.
Related Computer Technology
Perl
Perl is commonly used for bioinformatics calculations because of its ability to manipulate strings of character symbols.
It is the default CGI language.
It started out as a scripting language but has become a fully fledged language.
It has everything now, even web service support.
http://bio.perl.org
The Place of XML & Web Services
Various markup languages (Gene Markup Language etc.) are being created to represent sequence/gene data.
Web services: program-to-program interaction, making the web application-centric as opposed to human-centric.
So this has to be platform- and language-independent; protocols such as SOAP help in this regard.
In bioinformatics, various databases are used, running on different platforms and in different languages.
So web services help achieve platform independence and program interaction; since sequence databases come in various formats and on various platforms, SOAP helps here too.
The Place of the GRID
GRID, the new kid on the block: many computers are used to fulfill a single computational task.
Bioinformatics is an ideal application, as it has to deal with large amounts of data in alignments and searches.
E-Science initiative in the UK
Oracle 10g: the world's first GRID database
Databases and Mining
Many of the sequence databases are publicly available.
As databases are involved, various data mining techniques are used to pull the data out.
As there is a lot of literature (articles etc.) in this area, data mining on the literature, rather than on the sequence data, has also become a PhD topic for many.
European Molecular Biology Network (EMBnet)
A central system for sharing, training and centralizing up-to-date biological information.
Some of the EMBnet sites are:
SEQNET: http://www.seqnet.dl.ac.uk
UCL: http://www.biochem.ucl.ac.uk/bsm/dbbrowser/embnet/
EBI (European Bioinformatics Institute): www.ebi.ac.uk
References
Dan E. Krane and Michael L. Raymer, Basic Concepts of Bioinformatics
Arthur M. Lesk, Introduction to Bioinformatics
T. K. Attwood and D. J. Parry-Smith, Introduction to Bioinformatics
Dr Patrick Dixon, The Genetic Revolution
Prof. David Gilbert's site: http://www.brc.dcs.gla.ac.uk/~drg/
Thank You!