The R genetics package: Tools for statistical genetics
Gregory R. Warnes
Associate Director
NonClinical Statistics
Pfizer Global R&D
Groton CT
Page 2 CT ASA Mini Conference: 2005-03-05
Outline
Project Goals
  • Simplify Population Genetic Analysis
Design Details
  • Extend R ‘Factor’ objects
Functions Included
  • Genetic data: Importing & Creation, Manipulation, Information, Annotation, Transformation, Export
  • Statistical Functions: Hardy-Weinberg (Dis-)Equilibrium, Linkage Disequilibrium, Haplotype Imputation, Sample-size tools
Simple Examples
  • Creating Genotype Objects
Example Session
Future Development:
  • Emulate BioConductor Project
  • Large scale SNP analysis
  • Formal Object Class
  • Multi-team collaboration
Problem
At each genetic position within a gene, diploid cells have two alleles.
This suggests storing each allele as a separate variable.
However, most laboratory methods cannot distinguish between A/B and B/A, yielding three observed genotypes at each position: (A/A), (A/B or B/A), (B/B). Consequently, the observed alleles are confounded.
This suggests the use of a single genotype variable.
This duality is not directly handled by standard statistical packages.
As a consequence, the need to handle both views creates complexity when manipulating genotype data or including it in statistical analysis.
Initial Project Goals
Simplify Statistical Analysis using Genetic Data by providing:
  • A genotype object class that appropriately captures the single variable / separate allele duality
  • Methods to import and manipulate genotype objects without string manipulation
  • Simple tools for including different ‘views’ of genotype variables in standard statistical models
      • Dominant (at least one copy of X)
      • Recessive (both alleles are X)
      • Additive (number of copies of X)
      • Heterozygote Effect (differing alleles)
      • Independent (separate effect for each allele combination: A/A, A/B=B/A, B/B)
  • Functions for computing and visualizing common genetic summaries and statistical tests
      • Allele Frequencies
      • Hardy-Weinberg Equilibrium
      • Linkage Disequilibrium
      • Other statistical methods
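The ‘views’ listed above can be illustrated with a few lines of base R. This is a minimal sketch that does not use the genetics package: the allele matrix `a`, the allele of interest `x`, and the variable names are assumptions made for illustration only.

```r
# Two alleles per individual as an 8 x 2 character matrix (NA = missing call).
# Illustrative data; 'a' and 'x' are hypothetical names.
a <- cbind(c("A", "A", "C", "C", NA, "A", "A", "A"),
           c("A", "C", "C", "A", NA, "A", "C", "C"))
x <- "A"  # allele of interest

additive     <- rowSums(a == x)    # number of copies of x: 0, 1 or 2
dominant     <- additive >= 1      # at least one copy of x
recessive    <- additive == 2      # both alleles are x
heterozygote <- a[, 1] != a[, 2]   # the two alleles differ

# One row per individual, one column per coding.
data.frame(additive, dominant, recessive, heterozygote)
```

Each vector can be used directly as a covariate in a standard model formula. The ‘Independent’ view corresponds to using the genotype itself as a factor with one level per unordered allele pair.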
Design Details
Design:
  • Genotypes are stored in ‘Factor’ objects, with factor levels formatted as ‘A/C’.
  • A translation table is constructed to quickly extract individual allele information:

      Genotype  Allele 1  Allele 2
      A/A       A         A
      A/B       A         B
      B/B       B         B

Consequences:
  • Can be stored in standard data frames
  • Can be efficiently manipulated (space & time)
  • Permits both biallelic (C/T) and multi-allelic genetic markers (SSLPs)
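The factor-plus-translation-table design can be sketched in base R as follows. This is an illustration of the idea under stated assumptions, not the package's actual internals: the helper `normalise()` and the names `calls`, `g`, `allele_table` and `alleles` are hypothetical.

```r
# Genotype calls; "C/A" and "A/C" denote the same unordered genotype.
calls <- c("A/A", "A/C", "C/C", "C/A", NA, "A/A", "A/C", "A/C")

# Hypothetical helper: put the two alleles of one call in sorted order.
normalise <- function(x) {
  if (is.na(x)) return(NA_character_)
  paste(sort(strsplit(x, "/", fixed = TRUE)[[1]]), collapse = "/")
}

# Store the genotypes as an ordinary factor with levels such as "A/C".
g <- factor(vapply(calls, normalise, character(1), USE.NAMES = FALSE))

# Translation table: one row per factor level, one column per allele.
allele_table <- do.call(rbind, strsplit(levels(g), "/", fixed = TRUE))
dimnames(allele_table) <- list(levels(g), c("Allele 1", "Allele 2"))

# The factor's integer codes index straight into the table,
# so no string parsing is needed per observation.
alleles <- allele_table[as.integer(g), , drop = FALSE]
```

Because the table has only one row per distinct genotype, the string splitting happens once per level rather than once per individual, which is the space and time saving noted above.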
Genotype Manipulation
Importing & Creation
  • genotype(), as.genotype(), makeGenotypes(), …
  • haplotype(), as.haplotype(), makeHaplotypes(), …
Manipulation
  • [] (subsetting), []<- (subset assignment), == (equality)
Information
  • summary() (allele and genotype counts and frequencies), allele.names(), allele() (extract individual alleles), nallele() (number of distinct allele values)
Annotation
  • locus(), gene(), marker(), …
Transformation
  • carrier(), homozygote(), heterozygote(), allele.count()
Export
  • write.marker.file(), write.pedigree.file(), write.pop.file()
Installation
Windows GUI: (screenshot not preserved)
Command Line:
> install.packages("genetics", dependencies=TRUE)
Statistical Functions
Hardy-Weinberg (Dis-)Equilibrium: D, D’, r, r², χ²
  • diseq(), diseq.ci() (Confidence Intervals!)
  • HWE.test(), HWE.chisq(), HWE.exact()
Linkage Disequilibrium: D, D’, r, r²
  • LD(), LDplot(), LDtable()
Haplotype Imputation:
  • hap(), hapambig(), hapmcmc(), hapenum(), hapshuffle()
Sample-size tools
  • gregorius() (Probability of observing a marker of given frequency with specified sample size)
  • power.casectrl()
Utilities
  • Bootstrap.ci
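To make the Hardy-Weinberg test concrete, the textbook Pearson chi-square calculation can be written in base R. This sketch shows the standard goodness-of-fit computation and is not necessarily how HWE.chisq() is implemented; the genotype counts and variable names are assumptions chosen for illustration.

```r
# Observed genotype counts for a biallelic marker (illustrative values).
obs <- c("A/A" = 2, "A/C" = 4, "C/C" = 1)
n   <- sum(obs)

# Frequency of allele A: each A/A carries two copies, each A/C one.
p <- unname((2 * obs["A/A"] + obs["A/C"]) / (2 * n))
q <- 1 - p

# Expected counts under Hardy-Weinberg proportions p^2, 2pq, q^2.
expected <- n * c("A/A" = p^2, "A/C" = 2 * p * q, "C/C" = q^2)

# Pearson statistic with 1 degree of freedom
# (3 genotype classes - 1 - 1 estimated allele frequency).
chisq   <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
```

With a sample this small the chi-square approximation is unreliable, which is one reason an exact test is usually preferred for small counts.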
Simple Examples : Creating Genotype Objects
A single vector with a character separator:
> g1 <- genotype( c('A/A','A/C','C/C','C/A',
+ NA,'A/A','A/C','A/C') )
> g3 <- genotype( c('A A','A C','C C','C A',
+ '','A A','A C','A C'),
+ sep=' ', remove.spaces=FALSE)
Simple Examples : Creating Genotype Objects
A single vector with a positional separator
> g2 <- genotype( c('AA','AC','CC','CA','',
+ 'AA','AC','AC'), sep=1 )
Two separate vectors
> g4 <- genotype(
+ c('A','A','C','C','','A','A','A'),
+ c('A','C','C','A','','A','C','C')
+ )
Simple Examples : Creating Genotype Objects
A dataframe or matrix with two columns
> gm <- cbind(
+ c('A','A','C','C','','A','A','A'),
+ c('A','C','C','A','','A','C','C') )
> gm
     [,1] [,2]
[1,] "A"  "A"
[2,] "A"  "C"
[3,] "C"  "C"
[4,] "C"  "A"
…
> g5 <- genotype( gm )
> g5
[1] "A/A" "A/C" "C/C" "A/C" NA "A/A" "A/C" "A/C"
Alleles: A C
Simple Examples : Creating Genotype Objects
Convert 1-column genotype variables read from a file:
> gm1 <- makeGenotypes(
+ read.csv("gm1.csv"))
> gm1
  Age Sex   G1  G2
1  31   M  A/A G/T
2  27   F  A/C G/G
3  35   M  C/C G/T
4  19   M  A/C G/T
5  55   M <NA> G/G
6  34   F  A/A G/G
7  45   F  A/C T/T
8  32   M  A/C G/T
> gm1$G1
[1] "A/A" "A/C" "C/C" "A/C" NA "A/A" "A/C" "A/C"
Alleles: A C
_ gm1.csv __
Age,Sex,G1,G2
31,M,A/A,G/T
27,F,A/C,G/G
35,M,C/C,G/T
19,M,A/C,G/T
55,M,,G/G
34,F,A/A,G/G
45,F,A/C,T/T
32,M,A/C,G/T
Simple Examples : Creating Genotype Objects
Convert 2-column genotype variables read from a file:
> gm2 <- makeGenotypes(
+ read.csv("gm2.csv"),
+ convert=list(3:4,5:6))
> gm2
  Age Sex G1.1/G1.2 G2.1/G2.2
1  31   M       A/A       G/T
2  27   F       A/C       G/G
3  35   M       C/C       G/T
4  19   M       A/C       G/T
5  55   M      <NA>       G/G
6  34   F       A/A       G/G
7  45   F       A/C       T/T
8  32   M       A/C       G/T
______ gm2.csv _____
Age,Sex,G1.1,G1.2,G2.1,G2.2
31,M,A,A,G,T
27,F,A,C,G,G
35,M,C,C,T,G
19,M,C,A,G,T
55,M,,,G,G
34,F,A,A,G,G
45,F,A,C,T,T
32,M,A,C,T,G
Simple Examples : Displaying Genotype Information
“Raw”
> g5
[1] "A/A" "A/C" "C/C"
[4] "A/C" NA    "A/A"
[7] "A/C" "A/C"
Alleles: A C
“Summary”
> summary(g5)
Allele Frequency:
   Count Proportion
A      8       0.57
C      6       0.43
NA     2         NA

Genotype Frequency:
    Count Proportion
A/A     2       0.29
A/C     4       0.57
C/C     1       0.14
NA      1         NA
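The counts and proportions in the summary above can be reproduced by hand in base R, which makes clear that proportions are computed over non-missing values only. This is a sketch for checking the arithmetic, not the package's summary() method; the matrix `a` and the other names are assumptions.

```r
# Alleles of the eight example genotypes, one row per individual.
a <- cbind(c("A", "A", "C", "A", NA, "A", "A", "A"),
           c("A", "C", "C", "C", NA, "A", "C", "C"))

# Allele counts (including missing) and proportions (excluding missing).
allele_counts <- table(a, useNA = "ifany")
allele_prop   <- round(prop.table(table(a)), 2)

# Rebuild the genotype labels, keeping missing calls as NA.
geno <- paste(a[, 1], a[, 2], sep = "/")
geno[is.na(a[, 1]) | is.na(a[, 2])] <- NA

genotype_counts <- table(geno, useNA = "ifany")
genotype_prop   <- round(prop.table(table(geno)), 2)
```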
Simple Examples: Extracting allele information
Genotypes (Independent factor levels):
> g5
[1] "A/A" "A/C" "C/C" "A/C"
[5] NA "A/A" "A/C" "A/C"
Alleles: A C
Allele Counts (Additive Effect):
> allele.count(g5, "A")
[1] 2 1 0 1 NA 2 1 1
attr(,"allele")
[1] "A"
Allele presence (Dominant Effect):
> carrier(g5,'A')
[1] TRUE TRUE FALSE TRUE
[5] NA TRUE TRUE TRUE
Allele Homozygote (Recessive Effect):
> homozygote(g5,'A')
[1] TRUE FALSE FALSE FALSE
[5] NA TRUE FALSE FALSE
Heterozygote (Heterozygote Advantage Effect):
> heterozygote(g5,'A')
[1] FALSE TRUE FALSE TRUE
[5] NA FALSE TRUE TRUE
Simple Examples: Extracting allele information
First allele:
> allele(g5, 1)
[1] "A" "A" "C" "A" NA "A"
[7] "A" "A"
attr(,"which")
[1] 1
attr(,"allele.names")
[1] "A" "C"
Both alleles:
> allele(g5)
[,1] [,2]
[1,] "A" "A"
[2,] "A" "C"
[3,] "C" "C"
[4,] "A" "C"
[5,] NA NA
[6,] "A" "A"
[7,] "A" "C"
[8,] "A" "C"
attr(,"which")
[1] 1 2
attr(,"allele.names")
[1] "A" "C"
Example Session
Future Development
R GeneticsNG
Mission:
  GeneticsNG is a collaborative project to develop a core set of data structures and analytic tools for the management, visualization, and analysis of genetic data. This core will provide sufficient ease of use, stability, features, documentation, and community support to inspire users and developers to utilize, contribute to, and extend the system.
Goals:
  • Scalable to Whole-Genome genetic analysis (>1e5 SNPs)
  • Read/Write common genetics data storage formats
  • Port existing open-source genetics codes
      • Current R genetics packages (genetics, haplo.score, gap, …)
      • Other open-source packages…
  • Provide good documentation, including tutorials and training
  • Engage the entire R genetics user/developer community
Future Development
R GeneticsNG
Current Team
• Pfizer: Gregory Warnes, Nitin Jain
• Channing Laboratory (Harvard): Ross Lazarus
• BMS: Scott D Chasalow, Giovanni Montana
• Insightful: Michael O'Connell
• Univ. Chicago: Junsheng Cheng
• Join us!
Project Page:
http://r-genetics.sf.net/
References
R Project: http://www.r-project.org
R genetics package: http://cran.r-project.org/contrib/main/Descriptions/genetics.html
R-News article: Warnes GR. "The Genetics Package," R News, Volume 3, Issue 1, June 2003.
R GeneticsNG project: http://r-genetics.sf.net/
Me: http://www.warnes.net [email protected]