Microarray analysis: The CCBR’s perspective




1

Microarray analysis: The CCBR’s perspective

Manjula Kasoji, CCBR

09/29/2014

2

Common pitfalls

• Number of replicates

• Source, quantity and quality of RNA

• Batch effects

• Adequate expression signal

• Time series experiments

• Non-target tissue contamination

3

No Replicates, No Statistics

A project with no replicates may give you some information, but it is not possible to do statistics on it.

4

How many biological replicates are needed in a quantitative study?

>= 7 replicates

>= 3 replicates

More replicates if:
• High biological variability
• Contamination by non-target tissues
• Subtle treatment effect
• Multiple treatments
• Mechanism of action
• Network analysis
• And many more…

>= 4-5 replicates

5

Quality, quantity and source of RNA influence sample clustering

Depending on the source of RNA, sometimes even with the required number of replicates, samples do not cluster well

• Embryonic tissue

• Knock-out efficiency may also play a role in good sample clustering.

[Figure legend: Restoration of Knockout 1, Knockout 1, Knockout 2, Control]

6

Randomization and consistent processing will help avoid batch effects

[Figure: design diagram with the levels Treatment, Biological Replicate, Technical Replicate and Array Batch; array batches labeled A, A, B, B]
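The randomization idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `assign_batches`, the sample IDs and the 2 x 4 design are hypothetical and do not come from the presentation. It shuffles the samples within each treatment and then deals them out to array batches, so that no batch holds only one treatment.

```python
import random
from collections import Counter


def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to array batches, spreading each treatment
    as evenly as possible across batches.

    samples: list of (sample_id, treatment) tuples (hypothetical layout).
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_treatment = {}
    for sample_id, treatment in samples:
        by_treatment.setdefault(treatment, []).append(sample_id)

    assignment = {}
    offset = 0
    for treatment in sorted(by_treatment):
        ids = by_treatment[treatment]
        rng.shuffle(ids)  # random order within the treatment group
        for i, sample_id in enumerate(ids):
            # Deal the shuffled samples out to the batches in turn.
            assignment[sample_id] = (offset + i) % n_batches
        offset += len(ids)
    return assignment


# Hypothetical design: 2 treatments x 4 biological replicates, 2 array batches.
samples = [(f"{t}_{r}", t) for t in ("control", "treated") for r in range(1, 5)]
assignment = assign_batches(samples, n_batches=2, seed=42)

# Count how many samples of each treatment land in each batch.
layout = Counter((t, assignment[s]) for s, t in samples)
for (treatment, batch), count in sorted(layout.items()):
    print(f"{treatment:8s} batch {batch}: {count} samples")
```

With a balanced layout like this, a batch difference cannot be mistaken for a treatment difference, because every batch contains both groups.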

7

Example of batch effect

[Figure: samples from Cell lines A, B, C, D and E; groups Treated 1, Control 1, Treated 2, Control 2; Batch 1 – Scan Date 02/22/2011; Batch 2 – Scan Date 08/12/2011]

8

Batch effects can be visualized via clustering as well

Summary:
• Batch effects can be avoided by good experimental design and randomization.
• Batch effects can be visualized on a PCA plot and by clustering.

[Figure legend: Batch A, Batch B]
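To show what "visible on a PCA plot" means numerically, here is a minimal sketch that computes each sample's score on the first principal component using power iteration. The function name `pc1_scores` and the toy expression values are assumptions made for illustration; real analyses would normally use an established PCA routine (for example in R/Bioconductor) rather than this hand-rolled version.

```python
import math
import random


def pc1_scores(data, n_iter=100, seed=0):
    """Return each sample's score on the first principal component.

    data: list of samples, each a list of log-expression values
          (same probes in the same order for every sample).
    """
    n, p = len(data), len(data[0])
    # Center every probe across samples.
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]

    # Power iteration on X^T X to find the leading direction of variation.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0.0:  # no variation at all in the data
            break
        v = [w / norm for w in new_v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Toy data (made up): 6 probes, a treatment effect of +0.5 on the first probe
# and a batch shift of +2.0 on the last four probes.
base = [8.0, 9.0, 7.5, 10.0, 6.0, 8.5]
treat = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
batch = [0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
names = ["A_control", "A_treated", "B_control", "B_treated"]
data = [
    base,
    [b + t for b, t in zip(base, treat)],
    [b + s for b, s in zip(base, batch)],
    [b + t + s for b, t, s in zip(base, treat, batch)],
]

scores = pc1_scores(data)
for name, score in zip(names, scores):
    print(f"{name}: PC1 = {score:+.2f}")
```

In this toy case the two Batch A samples fall on one side of PC1 and the two Batch B samples on the other, regardless of treatment, which is the pattern a batch effect produces on a PCA plot.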

9

Weak signal expression across samples confounds analysis results

[Figure legend: Group 1, Group 2, Group 3, Group 4]

• Poor clustering of samples

• Genes regulated by gene A induced upon DNA damage

• 4 different conditions

10

Weak signal leads to very few or no significant differentially expressed genes

Contrasts                DEGs (FDR 0.05, FC 2)
Group 1 vs. Group 2      0
Group 3 vs. Group 4      0
Group 1 vs. Group 3      0
Group 2 vs. Group 4      0

What can we do in this situation?
• Relax the statistical parameters (use a less stringent p-value cutoff and a lower FC threshold).
• Caveat: this will increase the number of false positives and will negatively influence downstream analysis.

Summary:
• Sufficient number of replicates
• Randomization
• Validation
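The effect of the FDR and fold-change cutoffs can be illustrated with a small sketch. It assumes the FDR is controlled with the Benjamini-Hochberg procedure and that fold changes are on a log2 scale; the slides do not state either, and the gene names and numbers below are invented.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results, fdr=0.05, min_fold_change=2.0):
    """Select genes passing both the FDR and the fold-change cutoff.

    results: list of (gene, log2_fold_change, p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    cutoff = math.log2(min_fold_change)
    return [
        gene
        for (gene, lfc, _), q in zip(results, adjusted)
        if q <= fdr and abs(lfc) >= cutoff
    ]


# Invented results for five genes: (name, log2 fold change, raw p-value).
results = [
    ("geneA", 2.1, 0.0001),
    ("geneB", -1.4, 0.004),
    ("geneC", 0.6, 0.01),
    ("geneD", 1.2, 0.03),
    ("geneE", 0.1, 0.6),
]

strict = call_degs(results, fdr=0.05, min_fold_change=2.0)
relaxed = call_degs(results, fdr=0.10, min_fold_change=1.5)
print("FDR 0.05, FC 2:  ", strict)
print("FDR 0.10, FC 1.5:", relaxed)
```

Relaxing the cutoffs lets an extra gene through, here one with a small fold change; genes admitted this way are more likely to be false positives and should be validated.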

11

Adding time points to an experiment can be useful for finding biological relevance

• Comparing the immune system response in a knockout mouse model with that in a human model after treatment with endotoxin.

• Only one time point in the mouse data (24 hours); six time points in the human data (0, 2, 4, 6, 9 and 24 hours).

[Figure legend: WT-Mouse, KO-Mouse, Human 0 hr, Human 2 hr, Human 4 hr, Human 6 hr, Human 9 hr, Human 24 hr]

12

A successful project: Sufficient number of replicates and samples of a group cluster well

• Effect of cell density and drug treatment on cell survival and growth.
• Two conditions and 4 samples per group.

[Figure: Principal Component Analysis plot; legend: Treated, Time 1; Treated, Time 2; Not treated, Time 1; Not treated, Time 2]

13

Diagnosing outliers

PCA plots are a good way to flag outliers

14

Diagnosing outliers: Quality control

• arrayQualityMetrics() from R/Bioconductor
• Metrics measured:
  1. Between array comparison (distance between arrays, PCA)
  2. Array intensity distribution (box plots, density plots)
  3. Affymetrix specific plots on raw data (RLE – Relative Log Expression)
  4. Affymetrix specific plots on raw data (NUSE – Normalized Unscaled Standard Error)
  5. Individual array quality (MA plots)
  6. Spatial distribution of intensities

• If a sample outlier fails more than one QC metric:
  1. That sample should be re-run if possible.
  2. That sample should be removed from the analysis.
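As a rough illustration of one metric from the list above, the sketch below computes a simplified Relative Log Expression (RLE) summary: each probe's value minus that probe's median across arrays, summarized per array by its median and interquartile range. This is a teaching sketch, not the implementation used by arrayQualityMetrics(); the function name `rle_summary` and the expression values are made up.

```python
import statistics


def rle_summary(expr):
    """Summarize Relative Log Expression per sample.

    expr: dict mapping sample name -> list of log2 expression values
          (same probes in the same order for every sample).
    Returns a dict mapping sample name -> (median RLE, IQR of RLE).
    """
    names = list(expr)
    n_probes = len(expr[names[0]])
    # Reference value for each probe: its median across all arrays.
    probe_medians = [
        statistics.median(expr[s][j] for s in names) for j in range(n_probes)
    ]
    summary = {}
    for s in names:
        rle = [expr[s][j] - probe_medians[j] for j in range(n_probes)]
        q1, _, q3 = statistics.quantiles(rle, n=4)
        summary[s] = (statistics.median(rle), q3 - q1)
    return summary


# Made-up log2 intensities for 5 probes on 4 arrays; s4 is shifted upward.
expr = {
    "s1": [8.0, 9.1, 7.4, 10.2, 6.1],
    "s2": [8.1, 9.0, 7.5, 10.0, 6.0],
    "s3": [7.9, 9.2, 7.6, 10.1, 6.2],
    "s4": [9.5, 10.4, 8.9, 11.6, 7.3],
}

summary = rle_summary(expr)
for sample, (median_rle, iqr) in summary.items():
    print(f"{sample}: median RLE = {median_rle:+.2f}, IQR = {iqr:.2f}")
```

An array whose median RLE sits far from zero, or whose RLE spread is much wider than the others, is a candidate outlier to check against the other QC metrics before deciding to re-run or remove it.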

15

Diagnosing outliers: Quality control

[Figure panels: density plot, box plot, heat map]

16

Sufficient number of replicates and good quality lead to sufficient number of DEGs

Significant, differentially expressed genes (DEGs), p-value 0.05, FC 2

Contrast Groups                                DEGs
Treated-Day 1 vs. Not Treated Day 2            288
Treated-Day 2 vs. Not Treated Day 2            434
Treated-Day 2 vs. Treated-Day 1                105
Not Treated Day 2 vs. Not Treated Day 1        264

Summary:
• Sufficient replicates and good quality samples yield a successful project.
• Outliers can be diagnosed by visualization on a PCA plot and checking technical QC metrics to ensure that the outlier is not due to biological variability.

17

Downstream analysis: Functional enrichment using IPA

• Question: Which genes are associated with the growth-suppressive effect of low cell density on cell proliferation and survival?

• Time 1 = low cell density, Time 2 = high density

[Venn diagram: Time 2, Treated vs. Non-treated and Time 1, Treated vs. Non-treated; region counts 224, 210, 78]
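Region counts like those in the Venn diagram above come from comparing two DEG lists. A minimal sketch with Python sets follows; the function name `venn_counts` and the gene identifiers are placeholders, not data from the presentation.

```python
def venn_counts(list_a, list_b):
    """Count genes unique to each list and genes shared by both."""
    a, b = set(list_a), set(list_b)
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}


# Placeholder DEG lists for the two contrasts.
time2_degs = ["GENE1", "GENE2", "GENE3", "GENE4"]
time1_degs = ["GENE3", "GENE4", "GENE5"]

counts = venn_counts(time2_degs, time1_degs)
print(counts)
```

The genes found in only one contrast are the natural starting point when asking which changes are specific to one cell density.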

 

Subset of the 10 genes specifically involved in the Cellular Growth and Proliferation function that are also predicted to be growth suppressive.

Top 5 Bio-functions

19

Visualization of networks in IPA

[Figure panels: interaction network; interaction network expanded to include connections to upstream molecules]

20

After the analysis

• Submit data to public repository and provide required metadata

21

What you need to provide to CCBR

[Diagram: numbered workflow between the Microarray Facility, the Investigator and CCBR]

• Give us a visit before you begin your experiment
• Raw data (e.g. .CEL files)
• Metadata (type of array, platform, species, experimental design information, processing dates)
  – http://ccrifx.cancer.gov/apps/site/example_microarray
• Your goals and participation
• Submit your project request
  – https://ccrifx.cancer.gov/apps/project_request/request_project

22

If you want to perform the analysis on your own, you need to…

• Learn appropriate QC methods, different statistical tests, and experimental designs

• Know what is in your tool box
  – Command line
    • Affymetrix Power Tools (APT): for Macs, command line only; free
    • R/Bioconductor packages
  – GUI tools
    • Affymetrix Expression Console (PC only): free
    • Partek
    • Gene Set Enrichment Analysis (GSEA)
    • Ingenuity Pathway Analysis (IPA)

• To take this further
  – Know how to run command line programs
  – Learn how to script (R/Bioconductor)
  – Learn different R packages

23

Recap

• Appropriate experimental design

• Sufficient replicates to have statistical power

• Consistent processing to avoid batch effects

• Raw data and metadata

• Visualization

• Validation

* Continuous interaction with CCBR

24

Acknowledgements

CCRIFX
• Fathi Elloumi, PhD
• Parthav Jailwala, MS
• Li Jia, MS
• Manjula Kasoji, MS
• Anjan Purkayastha, PhD
• Anand S Merchant, MD, PhD
• Eric Stahlberg, PhD

CCR experts
• Maggie Cam, PhD
• Sean Davis, MD, PhD
• Max Lee, PhD
• Peter FitzGerald, PhD
• David Goldstein, PhD

Sequencing Facility
• Yongmei Zhao, MS
• Bao Tran, MS

ABCC
• Brian Luke, PhD
• Uma Mudunuri, MS
• Bob Stephens, PhD
• Ming Yi, PhD
• Jack Collins, PhD

25

Questions??

Contact

CCBR home page: http://ccrifx.cancer.gov/apps/site/default

CCBR email: ccrifx_support@mail.nih.gov

Building 37, room 1123
Building 41, room B620

Office hours: Fridays 9:30am - 11:30am
