Bioinformatics
Prof. Raffaele [email protected]
Tel. 011 6705417
Mobile 333 3827080
Office hours: any time
Suggested books: Introduzione alla genomica (Zanichelli); lecture notes
What is a microarray?
A powerful technology for biological exploration that makes it possible to measure the activity level of thousands of genes simultaneously. The amount of mRNA for each gene in a given sample (or a pair of samples) is measured.
Microarrays are:
• Parallel
• High-throughput
• Large-scale
• Genomic scale
Best-known commercial microarray platforms
Commercial microarrays generally give more reproducible results because of the high level of quality control (QC), which can be achieved only at industrial scale.

Platform             Probe length   Manufacturing technology
Illumina             50-mers        Bead-linked oligo library
Applied Biosystems   60-mers        Contact spotting
Affymetrix           25-mers        Photolithography
Agilent              60-mers        Inkjet synthesis
NimbleGen            24-mers        Mirror photolithography
Schedule
First part:
• Microarray structural design.
• Hybridization and detection.
• Experimental design.
• Annotation.
Second part:
• Data manipulation.
• Statistical inference of differential expression.
Third part:
• Assessing the biological meaning of the differential expression.
First part: Microarray structural design
Robot pin spotting
Agilent inkjet technology
Agilent uses inkjet technology to print oligos onto glass slides.
• The non-contact inkjet technology produces microarrays with more uniform and consistent features.
• The inkjet process does not introduce defects as a result of surface tension interactions with the microarray surface.
• The non-contact inkjet process provides a substantial improvement over pin spotting with respect to consistency and spot uniformity.

Probe design
• Proprietary algorithms are used to design the probes.
• The design of consensus target sequences is based on the integration of public data from different sources.
• Probes are experimentally validated.
Affymetrix microarrays, a.k.a. GeneChip
• Instead of putting entire genes on an array, sets of 25-mer DNA oligonucleotides are used.
• Produced using photolithography, a process similar to that used to make semiconductor chips.
• mRNA samples are processed separately instead of in pairs (single-channel technology).

Affymetrix GeneChips
• Various 25-mers are put on the chip to interrogate a single gene.
• Additionally, a slight variant (that differs only at the 13th base) is put next to each of them.
• The gene expression measure is obtained by combining the hybridization information from many (20, 16, 11, >3) separate measurements.
• New GeneChip releases are characterized by an increase in the number of interrogated transcripts. To keep the GeneChip format constant (a square of 1.28 x 1.28 cm):
  - the size of the feature is reduced (e.g. 11 µm → 5 µm);
  - the number of probes in each probe set decreases (e.g. 11 probes → 5 probes).
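The trade-off between feature size and chip content can be checked with a little arithmetic. The sketch below is only an upper-bound estimate: it assumes square features measured in micrometres (11 µm and 5 µm) tiling the whole 1.28 x 1.28 cm square, with no borders or control regions. The function name `max_features` is hypothetical.

```python
def max_features(side_cm: float, feature_um: int) -> int:
    """Upper bound on the number of square features that fit on a square chip."""
    side_um = round(side_cm * 10_000)   # 1 cm = 10,000 µm
    per_side = side_um // feature_um    # whole features along one side
    return per_side ** 2


print(max_features(1.28, 11))  # 1352569
print(max_features(1.28, 5))   # 6553600
```

Under these assumptions, shrinking the feature from 11 µm to 5 µm raises the ceiling from about 1.35 million to about 6.5 million features on the same chip format.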
Affymetrix GeneChips: probe array (photolithography)
There are 1.4 million features on each chip, and therefore the sections of the mask can be very tiny.
[Figure: 400-chip mask; photolithographic synthesis of the probes]
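Probe set (Affymetrix)
[Figure: a probe set is made of probe pairs; each probe pair consists of a perfect match (PM) cell and a mismatch (MM) cell. Feature sizes shown: 11 µm and 18 µm.]
Gene sequence example:
PM: ACCAGATCTGTAGTCCATGCGATGC
MM: ACCAGATCTGTAATCCATGCGATGC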
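The PM/MM relationship can be verified directly on the two 25-mers shown in the slide. This is a minimal sketch; the helper names `mismatch_positions` and `is_probe_pair` are hypothetical and are not part of any Affymetrix software.

```python
def mismatch_positions(pm: str, mm: str) -> list[int]:
    """Return the 1-based positions at which two equal-length probes differ."""
    if len(pm) != len(mm):
        raise ValueError("probes must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(pm, mm)) if a != b]


def is_probe_pair(pm: str, mm: str, length: int = 25, position: int = 13) -> bool:
    """True if both probes are `length`-mers differing only at `position`."""
    return len(pm) == length == len(mm) and mismatch_positions(pm, mm) == [position]


PM = "ACCAGATCTGTAGTCCATGCGATGC"
MM = "ACCAGATCTGTAATCCATGCGATGC"

print(mismatch_positions(PM, MM))  # [13]
print(is_probe_pair(PM, MM))       # True
```

The two sequences differ only at the central (13th) base, as the probe design requires.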
Limits & strengths of the platform
• Claimed linear dynamic range sensitivity: >1.5 pM.
• Probe pairs are scattered all over the chip (low impact of local artifacts).

Image artifacts
[Figure: three example images labelled "Nearly OK", "Bad, but still reasonable" and "Ugly"]
The multi-probe design gives good resistance to local construction/hybridization artifacts.
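Why several scattered probes help can be illustrated with a toy probe set in which a single probe is hit by a local artifact. The intensities below are invented for illustration, and the median is used here only as one possible robust summary; it is an assumption of this sketch, not the vendor's actual summarization algorithm.

```python
from statistics import mean, median

# Hypothetical intensities of the 11 probes of one probe set (arbitrary units).
clean = [210, 195, 205, 220, 190, 200, 215, 198, 207, 202, 211]

# Same probe set, with one probe corrupted by a local artifact (e.g. a scratch).
damaged = clean.copy()
damaged[3] = 5000


def relative_shift(before: float, after: float) -> float:
    """Relative change of a summary value caused by the artifact."""
    return abs(after - before) / before


print(relative_shift(median(clean), median(damaged)))  # 0.0
print(relative_shift(mean(clean), mean(damaged)) > 2)  # True
```

Because the probes of a set sit in different places on the chip, a local defect typically damages only one or a few of them, and a robust summary over the whole set is barely affected.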
Exon arrays
On the GeneChip® Human Exon 1.0 ST Array, 5,362,207 features are used to interrogate one million exon clusters (collections of overlapping exons) with over 1.4 million probe sets.
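These figures imply only a handful of features per probe set. The estimate below is a rough sketch: it assumes that every feature belongs to some probe set (control features would lower the ratio) and treats "over 1.4 million" and "one million" as exact values.

```python
features = 5_362_207
probe_sets = 1_400_000      # "over 1.4 million": treated as a lower bound
exon_clusters = 1_000_000   # "one million": round figure

features_per_probe_set = features / probe_sets
probe_sets_per_cluster = probe_sets / exon_clusters

print(f"{features_per_probe_set:.2f}")  # 3.83
print(f"{probe_sets_per_cluster:.1f}")  # 1.4
```

So, on average, there are at most about four features per probe set, far fewer than in older designs with 11 or more probe pairs per gene.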
Transcript clusters
• The core type was so named because its annotations were intended to be the foundation from which the gene annotations were built.
• The extended type derives its name from the sense that these annotations extend the boundaries of the core genes.
• The idea behind the name of the full type is that it signifies all possible content.
Core gene annotation sources
• RefSeq alignments
• GenBank alignments of 'complete CDS' transcripts

Extended gene annotation sources
• cDNA alignments
• Ensembl annotations (Hubbard, T. et al.)
• Mapped syntenic mRNA from rat and mouse
• microRNA annotations
• Mitomap annotations
• Vegagene (The HAVANA group, Hillier et al., Heilig et al.)
• VegaPseudogene (The HAVANA group, Hillier et al., Heilig et al.)

Full gene annotations
• Geneid (Grup de Recerca en Informàtica Biomèdica)
• Genscan (Burge, C. et al.)
• GENSCAN Suboptimal (Burge, C. et al.)
• Exoniphy (Siepel et al.)
• RNAgene (Sean Eddy Lab)
• SgpGene (Grup de Recerca en Informàtica Biomèdica)
• TWINSCAN (Korf, I. et al.)
NimbleGen photolithography
NimbleGen technology
Feature characteristics
The most consistent results are obtained using:
• 33-µm features, created by clustering four activated mirrors surrounded by a border of inactivated mirrors;
• 16-µm features, in which a single activated mirror is surrounded by a border of inactivated mirrors.
These formats produce microarrays with 85,000 or 195,000 features, respectively, when the entire printable area is used.
Probe design
• Completely custom.
• It is possible to redesign low-quality probes and update the custom array.
• Probes can be scattered all over the chip.

Applera arrays are based on 60-mer probes spotted on a 3D matrix
Chemiluminescence is used to:
• Measure gene expression.
• Perform quality control.
Fluorescence is used to:
• Locate features and auto-grid.
• Normalize every feature in a way independent of the gene expression signal.
Hybridization specificity
[Figure: hybridization specificity versus probe length (axis ticks: 20, 50, 60, 70, 80, 300). Platform labels on the plot: Agilent, Applera, Illumina; Affymetrix; home-made cDNA arrays.]
Probes of 50 to 80 nucleotides give the best hybridization specificity.
T. R. Hughes et al., Nature Biotechnology 19:342-347, 2001
Probe design
The signal is mainly derived as the average over the various isoforms.

Applera mouse microarray content
• Celera curated only: 18%
• Curated public only (RefSeq + RIKEN): 24%
• Curated public mapping to Celera: 55%
• GenBank mRNA only: 2.6%
Illumina technology
• Individual fibers conduct light to enable data acquisition and quantitation of the signal emitted from each bead.
• The scanner has < 1 µm resolution.
• 8 sub-arrays with 24K probes.

Illumina decoding
[Figure: bead decoding; example code 1 1 0 1 2 2 0 2]
Probe design
Each address and probe sequence combination has been selected bioinformatically and functionally screened in the laboratory to ensure the absence of cross-hybridization.
Gene-specific probes were designed using a multi-step algorithm scoring the following parameters:
• Similarity to other genes
• Absence of highly repeated sequence in the genome
• Sequence complexity
• EST coverage
• Self-complementarity for hairpin structure prediction
• Melting temperature for hybridization uniformity
• Distance from the 3' end of the transcript
The design also took exon structure into account: probe design incorporated splice isoforms that have been identified and documented in the RefSeq database.
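A few of the parameters listed above can be computed with very little code. The sketch below is purely illustrative and is not Illumina's algorithm, which is not described in these notes. The function names are hypothetical, GC fraction is used as a crude stand-in for melting temperature, and the longest stretch shared with the reverse complement is used as a crude stand-in for hairpin potential.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (rough proxy for melting temperature)."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_self_complementary_stem(seq: str) -> int:
    """Length of the longest substring whose reverse complement also occurs
    in seq (crude hairpin indicator; overlapping matches are not excluded)."""
    rc = reverse_complement(seq)
    best = 0
    for k in range(1, len(seq) + 1):
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if any(rc[i:i + k] in kmers for i in range(len(rc) - k + 1)):
            best = k
        else:
            break  # no shared k-mer means no longer shared substring either
    return best


def distance_from_3prime(transcript: str, probe: str) -> int:
    """Number of bases between the end of the probe target and the 3' end."""
    start = transcript.find(probe)
    if start < 0:
        raise ValueError("probe not found in transcript")
    return len(transcript) - (start + len(probe))


probe = "GGGAAACCC"                        # toy sequence, not a real probe
transcript = "TTTT" + probe + "AAAAA"      # toy transcript
print(gc_fraction(probe))
print(longest_self_complementary_stem(probe))    # 3
print(distance_from_3prime(transcript, probe))   # 5
```

A real design pipeline would combine such scores with genome-wide similarity searches and laboratory screening, as the slide states.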
First part: Hybridization and detection
Labeling for two-channel arrays
Direct labeling:
Cy3 and Cy5 are directly incorporated during cDNA synthesis.
Indirect labeling:
The process of indirect labeling allows the incorporation of modified nucleotides, commonly 5-(3-aminoallyl)-2′-deoxyuridine 5′-triphosphate (a reactive amine derivative of dUTP), into the reverse transcription reaction. The aminoallyl nucleotide is readily incorporated by both DNA and RNA polymerases. A reactive fluorescent dye is then chemically attached to the cDNA transcript in a second reaction.
Amplification labeling (IVT):
Reverse transcription (RT) is performed with an oligo d(T) primer linked to the T7 promoter sequence to make double-stranded (ds) cDNA. T7 RNA polymerase then produces cRNA, resulting in a 100x linear amplification of the ds cDNA.
[Figure: Cy3 and Cy5 spectra]
Critical parameters
Many factors have an impact on the reliability of the signals.
• The scanner used is important to consider, as it generates the actual signals to be quantified.
• The choice of image analysis software is clearly critical for the production of reliable intensity values.
Affymetrix labeling

Detection
Hybridization detection is performed in two steps:
• Streptavidin labeled with phycoerythrin is applied to the array.
• Signal amplification is performed using a biotin-labeled anti-streptavidin monoclonal antibody (moAb), followed by the addition of streptavidin labeled with FITC.
The fluorescence signal is read by a laser-based scanner.
[Figure: detection complex with biotin, streptavidin, anti-streptavidin moAb and FITC]
[Figure: RNA fragments with fluorescent tags from the sample to be tested hybridize with the DNA on the GeneChip® array; feature size 5÷11 µm]
GeneChip output
Limits & strengths of the platform
• Single-channel platform.
• A two-step amplification procedure is available.
• Short probes.
• The variance of a specific probe between arrays is smaller than the variance observed between probes of the same probe set (probe effects).
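The probe-effect statement can be made concrete by comparing two variances on a small matrix of intensities. The numbers below are invented solely to illustrate the computation; they are not measurements. The function names are hypothetical.

```python
from statistics import pvariance

# Hypothetical log2 intensities: rows are arrays, columns are the probes
# of one probe set. Values are made up for illustration only.
intensities = [
    [7.1, 9.8, 8.4, 11.2],   # array 1
    [7.3, 10.0, 8.5, 11.0],  # array 2
    [7.0, 9.9, 8.6, 11.3],   # array 3
]


def between_array_variance(matrix):
    """Variance of each probe across arrays (one value per probe)."""
    return [pvariance(column) for column in zip(*matrix)]


def between_probe_variance(matrix):
    """Variance across the probes of the set within each array (one per array)."""
    return [pvariance(row) for row in matrix]


print(max(between_array_variance(intensities)))
print(min(between_probe_variance(intensities)))
```

In data with strong probe effects, as in this toy matrix, each probe is consistent from array to array, while different probes of the same set give very different intensities.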
Applera RT/IVT labeling
[Figure: from poly(A) mRNA to purified DIG-labeled cRNA, using a primer carrying the T7 promoter. Steps shown:]
• Reverse transcription
• 2nd strand DNA synthesis
• RNA degradation
• cDNA purification
• cRNA synthesis
• cRNA purification
Detection
• Hybridization detection is performed using an anti-digoxigenin monoclonal antibody (moAb) labeled with alkaline phosphatase (AP).
• Chemiluminescence is acquired by a CCD camera.
[Figure: digoxigenin, anti-digoxigenin moAb with AP, reagent and enhancer, chemiluminescence signal]

Internal Control Probe (ICP)
[Figure labels: immobilized ICP oligo (co-spotted in every location), LIZ®, DIG, hybridization, 60-mer oligo, 24-mer oligo]
1,500 microarray controls
• Labelling controls are used to monitor enzyme activity and DIG incorporation during the labelling protocols:
  - RT (bacterial genes)
  - IVT (linearised plasmids)
• Hybridization controls are used to monitor mixing, stringency, and washing during the array hybridization protocol (pre-labelled DIG target).
• Chemiluminescent controls are used to demonstrate that the CL reaction chemistry is performing well during the assay (DIG-labelled oligo, co-spotted).
Illumina labeling
[Figure: panels A and B]

Detection
• Hybridization detection is performed using an anti-biotin monoclonal antibody (moAb) labeled with Cy3.
• The fluorescence signal is read by a laser-based scanner.
[Figure: biotin detected by the Cy3-labeled moAb]
Limits & strengths of the platform
• Single-channel technology.
• Applicable to RNA samples derived from paraffin-embedded tissues.
• No two-step amplification procedure available.
First part: Experimental design
Two-channel experimental designs
Two conditions:
• Dye swap.
• Balanced design.
Multiple conditions:
• Reference design.
• Loop design.
Microarray experimental design notation
[Figure: samples from treatments TRT 1 and TRT 2 are denoted 1 and 2; with replicates, TRT 1a, TRT 1b, TRT 2a and TRT 2b are denoted 1a, 1b, 2a and 2b]

Biological replicates vs. technical replicates
[Figure: three panels, "Biological Replication", "Technical Replication" and "Both Biological and Technical Replication", drawn with the pairs 1a 2a and 1b 2b]
Randomly assign pairs to slides
Balancing the two dye configurations
1a 2a
1b 2c
1d 2b
1c 2d
1b 2a
1d 2c
1c 2b
1a 2d
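One way to produce such an assignment is sketched below. It is an illustration only: the function name `balanced_random_design`, the dictionary layout (dye name mapped to sample) and the use of a seed are assumptions, not part of the lecture material.

```python
import random
from typing import Optional


def balanced_random_design(n_pairs: int, seed: Optional[int] = None):
    """Randomly pair TRT 1 and TRT 2 samples on slides, with TRT 1 labeled
    with Cy3 on half of the slides and with Cy5 on the other half."""
    if n_pairs % 2 or not 0 < n_pairs <= 26:
        raise ValueError("n_pairs must be an even number between 2 and 26")
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[:n_pairs]
    trt1 = [f"1{c}" for c in letters]
    trt2 = [f"2{c}" for c in letters]
    rng.shuffle(trt1)   # random pairing of biological replicates
    rng.shuffle(trt2)
    # Dye given to the TRT 1 sample on each slide: half Cy3, half Cy5.
    dyes = ["Cy3"] * (n_pairs // 2) + ["Cy5"] * (n_pairs // 2)
    rng.shuffle(dyes)
    design = []
    for s1, s2, dye1 in zip(trt1, trt2, dyes):
        dye2 = "Cy5" if dye1 == "Cy3" else "Cy3"
        design.append({dye1: s1, dye2: s2})
    return design


for slide in balanced_random_design(4, seed=0):
    print(slide)
```

Each sample is used exactly once, and each treatment is labeled with each dye on the same number of slides, so dye bias is not confounded with the treatment effect.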
Reference design
[Figure: each sample (1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b) is co-hybridized with a common reference (ref)]
• Easy analysis.
• Easy integration with experiments performed at different times.
• Loss in sensitivity.
Loop design
• More robust results.
• Complex statistical analysis.
• Difficult integration with experiments performed at different times.
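The difference between the two multi-condition layouts is easiest to see by listing the slides each one requires. This is a minimal sketch with hypothetical function names; each tuple stands for one slide, and the convention that the first element is labeled with Cy3 and the second with Cy5 is an assumption of the example.

```python
def reference_design(samples, reference="ref"):
    """One slide per sample, each co-hybridized with the common reference."""
    return [(sample, reference) for sample in samples]


def loop_design(conditions):
    """One slide per pair of consecutive conditions, closing the loop."""
    if len(conditions) < 2:
        raise ValueError("a loop needs at least two conditions")
    n = len(conditions)
    return [(conditions[i], conditions[(i + 1) % n]) for i in range(n)]


samples = ["1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b"]
print(reference_design(samples))
print(loop_design(["1", "2", "3"]))  # [('1', '2'), ('2', '3'), ('3', '1')]
```

In the reference layout half of every slide is spent on the reference sample, which is one way to see the loss in sensitivity. In the loop layout every condition is measured twice, once in each position, but adding a condition later changes the whole chain of comparisons, which is why integration with experiments performed at different times is difficult.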
Experimental design for single channel arrays