Copy the folder Faculty/Sarah/Tues_merlin to the C drive:
C:/Tues_merlin
MERLIN (and other Abecasis products)
Sarah Medland & Kate Morley
Boulder 2009
MERLIN software
Programs:
- GRR
- MERLIN
- MinX
- MERLIN-regress
- Pedstats
- Pedwipe
- Pedmerge
We will be using Cygwin
- Unix emulator for Windows
- Open by double-clicking
- Move to this session's working directory: cd C:/tues_merlin
- Check the files in the directory: ls
Data Input Files
Getting your data into Merlin
Input File Types
- Pedigree file
  - Family relationships
  - Phenotype data
  - Genotype data
- Data file
  - Describes the contents of the pedigree file
- Map file
  - Records the location of genetic markers
Example Pedigree File
Data File Field Codes
Code    Description
M       Marker genotype
A       Affection status
T       Quantitative trait
C       Covariate
Z       Zygosity
S[n]    Skip n columns
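To make the field codes concrete, here is a minimal Python sketch that works out which .ped columns each .dat line describes. It assumes the usual layout: five leading columns (family, individual, father, mother, sex), then one column per A, T, C or Z entry, two allele columns per M entry, and n columns for S[n]. A bare S is assumed to skip one column. `ped_column_layout` is a hypothetical helper, not part of Merlin.

```python
# Hypothetical helper (not part of Merlin): map each .dat entry to the
# .ped column(s) it describes. Assumed layout: 5 leading columns
# (family, individual, father, mother, sex), then 1 column per A/T/C/Z
# entry, 2 allele columns per M entry, and n columns for "S<n>".

def ped_column_layout(dat_lines):
    """Return (code, name, first_column, n_columns) tuples, 0-based columns."""
    layout = []
    col = 5  # columns 0-4: family, individual, father, mother, sex
    for line in dat_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        code, name = fields[0], " ".join(fields[1:])
        if code.startswith("S"):
            # "S3" skips three columns; a bare "S" is assumed to skip one
            n = int(code[1:]) if len(code) > 1 else 1
            layout.append(("S", name, col, n))
            col += n
        elif code == "M":
            layout.append(("M", name, col, 2))  # two allele columns
            col += 2
        elif code in ("A", "T", "C", "Z"):
            layout.append((code, name, col, 1))
            col += 1
        else:
            raise ValueError(f"unknown field code: {code!r}")
    return layout


dat = ["A disease", "T ldl", "S2 unused", "M marker1", "C age"]
for entry in ped_column_layout(dat):
    print(entry)
```

The sketch shows why a wrong .dat file silently shifts every later column. An S entry lets unwanted .ped columns stay in place while being ignored.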
First step: check relationships
GRR
GRR - www.sph.umich.edu/csg/abecasis/GRR
- Graphs mean IBS (identity by state) against the SD of IBS
- Either within families or across everyone in the sample
- Ideally 200+ markers genotyped in common for each pair
If you want to try this later: Sample.ped
- 1300 individuals from 200 families
- Genotyped on 320 markers across the genome
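The two statistics GRR plots for each pair can be sketched in a few lines of Python. This assumes genotypes are stored as allele pairs with 0 meaning missing, and it uses the population SD. `ibs_count` and `ibs_summary` are hypothetical names, and GRR's exact computation may differ.

```python
import statistics

def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) shared identical by state."""
    remaining = list(g1)
    shared = 0
    for allele in g2:
        if allele in remaining:
            remaining.remove(allele)  # match each allele at most once
            shared += 1
    return shared

def ibs_summary(geno_i, geno_j):
    """Mean and SD of IBS over markers typed in both people (0 = missing)."""
    counts = [ibs_count(a, b) for a, b in zip(geno_i, geno_j)
              if 0 not in a and 0 not in b]
    return statistics.mean(counts), statistics.pstdev(counts)

# Four markers; the last is untyped in person i, so it is skipped.
person_i = [(1, 2), (1, 1), (2, 2), (0, 0)]
person_j = [(1, 2), (1, 2), (1, 1), (1, 2)]
print(ibs_summary(person_i, person_j))
```

Ignoring genotyping errors, a duplicate sample or an MZ pair shares both alleles IBS at every marker. This gives a mean IBS of 2 and an SD of 0, which is why such pairs stand out from sib pairs on the plot.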
Load grr.ped
Tick "All pairs"
GRR is good for finding:
- MZ pairs labelled as sib pairs
- Duplicates
- Dads who aren't dads
- Full sibs who are actually half-sibs
Manipulating Data Files
Pedmerge
Manipulating Data Files
Pedmerge
- Combine multiple data files
- Remove columns from a ped file
- Recode the dat file so unwanted columns are skipped
- Assumes the ped and dat files have the same prefix, e.g. example.ped and example.dat
Type pedmerge
Checking for genotype errors
Pedstats
Usage: pedstats.exe -p pedstats.ped -d pedstats.dat
Summarizes pedigree
Trait summary
Pedstats will crash if there are Mendelian errors
Draw a diagram for this family:
fam  id  dad  mum  sex  A1  A2
1    1   0    0    m    3   2
1    2   0    0    f    2   1
1    3   1    2    m    2   3
1    4   1    2    f    3   3
1    5   1    2    f    3   3
Genotypes in the diagram (individuals 1 to 5): 3/2, 2/1, 2/3, 3/3, 3/3
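The check for this family can be written as a short Python sketch. It looks at one marker at a time, codes founders with 0 in the dad/mum columns, and assumes every genotype is present. The helper names are hypothetical. A child is consistent with its parents if one allele can come from the dad and the other from the mum.

```python
# The example family above: (id, dad, mum, sex, allele1, allele2);
# dad/mum = 0 marks a founder. All genotypes are assumed non-missing.
FAMILY = [
    (1, 0, 0, "m", 3, 2),
    (2, 0, 0, "f", 2, 1),
    (3, 1, 2, "m", 2, 3),
    (4, 1, 2, "f", 3, 3),
    (5, 1, 2, "f", 3, 3),
]

def is_consistent(child, dad, mum):
    """True if one child allele can come from dad and the other from mum."""
    c1, c2 = child
    return (c1 in dad and c2 in mum) or (c2 in dad and c1 in mum)

def mendelian_errors(family):
    """Ids of children whose genotype conflicts with their parents'."""
    geno = {ind: (a1, a2) for ind, _, _, _, a1, a2 in family}
    return [ind for ind, dad, mum, _, a1, a2 in family
            if dad and mum and not is_consistent((a1, a2), geno[dad], geno[mum])]

print(mendelian_errors(FAMILY))  # [4, 5]
```

Individuals 4 and 5 are flagged because each is 3/3, but the mum (2/1) carries no 3 allele. The flag only shows that the trio is inconsistent. The wrong genotype could just as well be the mum's, which is why the error still has to be localized.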
Mendelian errors
- Try to localize the error
- Short-term solution: delete the bad genotypes
- Long-term solution: retype the family at this marker
After fixing the problems
Merlin
MERLIN
- Automates simple linkage tests (black box)
- Uses fast multipoint calculations to generate IBD and kinship matrices
- Key options are --vc (variance components analysis) and --useCovariates (user-specified covariates)
Means model
- Can incorporate user-specified covariates
Variance components model
Merlin's standard variance components model: AQE
- Environmental component (E): not shared; uses the identity matrix
- Additive polygenic component (A): shared among relatives according to the kinship matrix
- QTL component (Q): shared when individuals are IBD; uses the kinship matrix at the marker
What is a kinship coefficient?
Kinship coefficient (φ_ij): the probability that two alleles sampled at random, one from each individual, are identical by descent
2φ_ij = expected proportion of alleles shared IBD across the genome by individuals i and j (π_ij)
But the actual proportion will vary from locus to locus
For MZ twins φ = 0.5
For full sibs φ = 0.25
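Kinship coefficients for an ordinary pedigree can be computed with the standard recursion. For an individual with parents f and m, φ_ii = ½(1 + φ_fm). When i is not an ancestor of j, φ_ij = ½(φ_fj + φ_mj). The sketch assumes parents are listed before their children, 0 codes an unknown parent, and founders are unrelated and not inbred. `kinship_function` is a hypothetical name.

```python
from functools import lru_cache

def kinship_function(pedigree):
    """pedigree: (id, father, mother) tuples, parents listed before children,
    0 = unknown parent. Founders are assumed unrelated and non-inbred."""
    order = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    parents = {ind: (f, m) for ind, f, m in pedigree}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a == 0 or b == 0:
            return 0.0                      # unknown parent contributes nothing
        if a == b:
            f, m = parents[a]
            return 0.5 * (1.0 + phi(f, m))  # self-kinship; phi(f, m) > 0 means inbred
        if order[a] < order[b]:
            a, b = b, a                     # recurse through the later-listed person
        f, m = parents[a]
        return 0.5 * (phi(f, b) + phi(m, b))
    return phi

# The nuclear family from the slide, plus a half-sib (7) via a new mother (6).
phi = kinship_function([(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 2),
                        (5, 1, 2), (6, 0, 0), (7, 1, 6)])
print(phi(3, 4), phi(1, 3), phi(3, 7), phi(3, 3))  # 0.25 0.25 0.125 0.5
```

From parent links alone, MZ twins look like ordinary full sibs (0.25). Their kinship of 0.5 needs zygosity information, such as a Z field in the .dat file.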
General covariance model
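Below is a minimal sketch of the covariance model described above. It assumes the usual form Ω = σ²_Q Π̂ + σ²_A 2Φ + σ²_E I, where Π̂ holds the estimated IBD proportions at the test position and Φ holds the kinship coefficients. Plain lists keep the example dependency-free. `vc_covariance` is a hypothetical helper, not Merlin's code.

```python
def vc_covariance(pi_hat, phi, var_q, var_a, var_e):
    """Omega[i][j] = pi_hat[i][j]*var_q + 2*phi[i][j]*var_a (+ var_e if i == j).

    pi_hat: estimated IBD proportions at the locus (1 on the diagonal)
    phi:    kinship coefficients (0.5 on the diagonal if not inbred)
    """
    n = len(phi)
    return [[pi_hat[i][j] * var_q + 2 * phi[i][j] * var_a
             + (var_e if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]

# A sib pair sharing an estimated 75% of alleles IBD at the test position.
omega = vc_covariance(pi_hat=[[1.0, 0.75], [0.75, 1.0]],
                      phi=[[0.5, 0.25], [0.25, 0.5]],
                      var_q=0.2, var_a=0.4, var_e=0.4)
print(omega)
```

Each diagonal element is σ²_Q + σ²_A + σ²_E = 1.0. The off-diagonal covariance is 0.75 × 0.2 + 2 × 0.25 × 0.4 = 0.35, so the pair's covariance rises with IBD sharing at the marker only through the QTL term.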
Practical overview
Using the LDL data from chromosome 19 (yesterday afternoon's practical)
- Data cleaning
- Merging phenotype and genotype data
- Checking your data with pedstats
- VC analysis in MERLIN
- MERLIN-regress analysis
- Comparison of MERLIN vs Mx
Step #1: combining phenotypes and genotypes
Start with four files:
- pheno.ped + pheno.dat (phenotype data)
- geno.ped + geno.dat (genotype data)
Combine the .ped files and the .dat files using pedmerge to create one pedigree file and one .dat file
Practical #1: commands
Have a look at your files:
head followed by a file name, e.g. head pheno.ped
Combine your pedigree files and dat files:
pedmerge pheno geno linkage
Check your files using the head command.
- pedmerge: calls up the programme
- pheno geno: names of the two sets of files to be combined (N.B. the matching .ped and .dat files must have the same name)
- linkage: name of the newly created .ped and .dat files
linkage.ped
Step #2: checking your data with pedstats
Pedstats provides preliminary data checks:
- Initial check of input files
- Pedigree consistency
- Information on genetic marker data
  - Marker heterozygosity
  - Proportion of individuals genotyped
  - Tests of Hardy-Weinberg equilibrium
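The marker checks above can be illustrated with a small Python sketch. It computes observed heterozygosity, expected heterozygosity (1 − Σp²) and a simple chi-square goodness-of-fit statistic for Hardy-Weinberg proportions. The chi-square is only a stand-in for illustration. Pedstats' own HWE test may use a different (exact) method, and `marker_summary` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations_with_replacement

def marker_summary(genotypes):
    """genotypes: (allele1, allele2) pairs for typed individuals only.
    Returns (observed heterozygosity, expected heterozygosity, HWE chi-square)."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    observed_het = sum(a1 != a2 for a1, a2 in genotypes) / n
    expected_het = 1 - sum(p * p for p in freqs.values())

    # Compare genotype counts with n*p^2 (homozygotes) and 2n*p*q (heterozygotes)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        chi2 += (geno_counts[(a, b)] - expected) ** 2 / expected
    return observed_het, expected_het, chi2

in_hwe = [(1, 1)] * 25 + [(1, 2)] * 50 + [(2, 2)] * 25
print(marker_summary(in_hwe))
```

A sample made up entirely of heterozygotes has the same allele frequencies but gives a large chi-square (100 here), which is the kind of marker the HWE test is meant to flag.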
Practical #2: commands
./pedstats -x-9999.000 -d linkage.dat -p linkage.ped > prac2.out
- pedstats: calls up the programme
- -x-9999.000: specifies the missing value
- -d linkage.dat: identifies the .dat file
- -p linkage.ped: identifies the .ped file
- > prac2.out: sends the output to a text file
Step #3: running VC linkage
./merlin --vc -x -9999.000 -p linkage.ped -d linkage.dat -m linkage.map > linkage.out
- merlin: calls up the programme
- --vc -x -9999.000: specifies VC linkage and the missing value
- -p linkage.ped -d linkage.dat -m linkage.map: identify the .ped, .dat, and .map files
- > linkage.out: sends the output to a text file
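The VC results are reported both as a change in -2LL (a chi-square) and as a LOD score. The standard conversion is LOD = χ² / (2 ln 10). Because the QTL variance is tested on its boundary (it cannot be negative), the usual asymptotic p-value comes from a 50:50 mixture of χ²₀ and χ²₁. The sketch below is illustrative only. The function names are hypothetical, and Merlin's own reporting may differ in detail.

```python
import math

def lod_from_chisq(chisq):
    """LOD-score equivalent of a likelihood-ratio chi-square: chi2 / (2 ln 10)."""
    return chisq / (2 * math.log(10))

def vc_linkage_p(chisq):
    """Asymptotic p-value for one variance component tested at zero:
    half the upper tail of chi-square(1), using P(chi2_1 > x) = erfc(sqrt(x/2))."""
    if chisq <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(chisq / 2))

chisq = 2 * math.log(10) * 3   # the chi-square that corresponds to LOD 3
print(lod_from_chisq(chisq), vc_linkage_p(chisq))
```

With this mixture, a LOD of 3 corresponds to a pointwise p-value of about 0.0001.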
So why would we run Mx?
- MERLIN cannot analyse ordinal data
- Limited correction for ascertainment
- Limited multivariate linkage: repeated measures using the mean and TRT correlation
- Only runs an AE model (no C or D)
Comparison of MERLIN and Mx: change in -2LL (chi-square) and LOD by position

AE model (A 86%, E 14%)
Figure: change in -2LL (chi-square) against position (cM), MERLIN vs Mx
cM   MERLIN   Mx      MERLIN LOD   Mx LOD   Regress LOD
0    9.59     9.59    2.05         2.05     1.537
5    10.12    10.12   2.17         2.17     1.199
10   12.46    12.46   2.67         2.67     0.725
15   5.56     5.56    1.19         1.19     0.387
20   3.24     3.24    0.69         0.69     0.097
25   4.75     4.75    1.02         1.02     0.177
30   1.79     1.79    0.38         0.38     0.279
35   1.46     1.46    0.31         0.31     0.439
40   1.98     1.98    0.42         0.42     0.440
45   1.34     1.34    0.29         0.29     0.488
50   1.30     1.30    0.28         0.28     0.441

ACE model (A 60%, C 30%, E 10%)
Figure: change in -2LL (chi-square) against position (cM), MERLIN vs Mx
cM   MERLIN   Mx      MERLIN LOD   Mx LOD
0    20.20    22.91   4.33         4.91
5    16.60    20.95   3.55         4.49
10   10.72    14.21   2.30         3.04
15   6.14     7.95    1.31         1.70
20   2.44     4.06    0.52         0.87
25   2.33     3.30    0.50         0.71
30   2.66     3.24    0.57         0.69
35   3.45     4.05    0.74         0.87
40   3.36     3.46    0.72         0.74
45   3.44     3.43    0.74         0.73
50   3.16     3.23    0.68         0.69
Merlin Regress
Aim
To develop a regression-based method that:
- has the same power as maximum likelihood variance components for sib-pair data
- will generalise to general pedigrees
- is computationally efficient
Multivariate Regression Model
- Weighted least squares estimation
- Weight matrix based on IBD information
- Dependent variables = IBD
- Independent variables = trait
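General approach
Standard regression-based methods model the trait (squared difference D² or squared sum S² of the pair's trait values) in terms of estimated IBD status:
$Y = \alpha + \beta\hat{\pi} + \varepsilon$
Instead, the IBD estimate is regressed on the trait value:
$\hat{\pi} = \alpha + \beta Y + \varepsilon$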
Extend to general pedigrees:
$\hat{\boldsymbol{\pi}} = \boldsymbol{\alpha} + \boldsymbol{\beta}\mathbf{Y} + \boldsymbol{\varepsilon}$
Dependent Variables
Estimated IBD sharing of all pairs of relatives
Example:
Independent Variables
Squares and cross-products (equivalent to non-redundant squared sums and differences)
Example:
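The independent variables for one family can be sketched in Python as follows. The sketch assumes trait values are first standardized with the population mean and variance (the values passed to merlin-regress with --mean and --variance). The dependent vector then holds one IBD estimate per pair, in the same order as `pairs`. `regress_variables` is a hypothetical helper, and the weighting used inside merlin-regress is not reproduced.

```python
import math
from itertools import combinations

def regress_variables(trait, mean, variance):
    """Hypothetical helper: standardize a family's trait values with the
    population mean/variance and return (pairs, y) where y holds the
    squares followed by the cross-products for each pair in `pairs`."""
    sd = math.sqrt(variance)
    z = [(value - mean) / sd for value in trait]
    pairs = list(combinations(range(len(z)), 2))
    squares = [v * v for v in z]
    cross_products = [z[i] * z[j] for i, j in pairs]
    return pairs, squares + cross_products

pairs, y = regress_variables([12.0, 8.0], mean=10.0, variance=4.0)
print(pairs, y)  # [(0, 1)] [1.0, 1.0, -1.0]
```

For a pair, the squared sum and squared difference are linear combinations of these terms: (z₁ + z₂)² = z₁² + z₂² + 2z₁z₂ and (z₁ − z₂)² = z₁² + z₂² − 2z₁z₂.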
Estimation
For a family, the regression model is:
- Estimate Q by weighted least squares and obtain its sampling variance, family by family
- Combine the estimates across families, weighting each inversely by its variance, to give an overall estimate and its sampling variance
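The combination step is an inverse-variance weighted mean. A minimal Python sketch, assuming each family supplies an estimate of Q and its sampling variance, is below. `combine_families` is a hypothetical name.

```python
def combine_families(estimates, variances):
    """Inverse-variance weighted mean of per-family estimates and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * q for w, q in zip(weights, estimates)) / total
    return combined, 1.0 / total

print(combine_families([0.2, 0.4], [0.01, 0.04]))
```

Families with precise estimates (small variance) dominate the overall estimate. The combined variance, 1 / Σ(1/vᵢ), is smaller than any single family's.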
Why is that better?
Regression methods assume that the dependent variable (left-hand side) is normally distributed
Distribution of pi-hat
Why is that better?
But the central limit theorem works well when the data are symmetric with the mode in the centre
In a general pedigree, sib pairs provide the most information on linkage
IBD sharing under the null hypothesis (with complete inheritance information):
- 0: 25%
- 0.5: 50%
- 1: 25%
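The 25/50/25 split can be checked by exact enumeration. Each sib independently receives one of the father's two alleles and one of the mother's two, so there are 16 equally likely combinations. The Python sketch below counts how many alleles the sibs share IBD in each. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sib_ibd_null_distribution():
    """Exact null distribution of the sib-pair IBD proportion, enumerating
    which paternal (0/1) and maternal (0/1) allele each sib inherits."""
    counts = Counter()
    for p1, m1, p2, m2 in product((0, 1), repeat=4):
        shared = (p1 == p2) + (m1 == m2)    # alleles IBD: 0, 1 or 2
        counts[Fraction(shared, 2)] += 1    # proportion IBD: 0, 1/2 or 1
    total = sum(counts.values())
    return {pi: Fraction(c, total) for pi, c in sorted(counts.items())}

print(sib_ibd_null_distribution())
```

The distribution is symmetric with its mode in the centre, and its mean is ½, matching 2φ for full sibs.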
Selected Samples
Merlin-regress is particularly suited to the analysis of selected samples
Ordinary variance component analysis (e.g. using Merlin) gives biased QTL estimates in selected samples
Merlin-regress is designed to be robust to data selection
Example data: BMI, 10,000 pairs
Selected sample: 500 pairs
Results VC
Results Merlin-Regress
Practical #4: running regress
./merlin-regress -x -9999.000 -p linkage.ped -d linkage.dat -m linkage.map --mean ? --variance ? --heritability ? > linkage2.out
- merlin-regress -x -9999.000: calls up the programme and specifies the missing value
- -p linkage.ped -d linkage.dat -m linkage.map: identify the .ped, .dat, and .map files
- --mean ? --variance ? --heritability ?: specify the mean, variance, and heritability from the whole population (Pedstats)
- > linkage2.out: sends the output to a text file