Linkage analysis Jan Hellemans 6


Linkage analysis Jan Hellemans

6


Finding causal mutations

2 opposing strategies
  sequence then select
  select then sequence

Sequencing
  traditional Sanger sequencing: only possible after selection
  massively parallel sequencing: possible prior to or after selection
    RNA sequencing
    exome sequencing
    genome sequencing


Finding causal mutations

Selection
  positional (prior to sequencing)
    linkage analysis
    GWAS
    structural variations (e.g. microdeletions)
  functional (prior to & after sequencing)
    candidate genes selected based on known function or involvement in related disorders
    filtering of variants based on functional predictions
  overlap (after sequencing)
    looking for genes / variants that occur in multiple independent patients
  mostly a combination is used


exome sequencing


Aims

Interpret microsatellite results
Add genotypes to pedigrees
Create pedigree and genotype files
Calculate and interpret LOD-scores
Delineate linkage intervals

Basic principles of linkage analysis
Analyze other types of markers
Association studies
Learn how to work with specific pedigree programs


Starting linkage analysis


Preparations

Clearly define the phenotype
  If it is not specific enough, then you may analyze different disorders that can map to different genomic loci
  LOD scores are additive
Find suitable families
  larger is better
  more patients is better
Collect genomic DNA from as many family members as possible
Determine the type of inheritance
Calculate the power to prove linkage with the available material (SLink, not part of this course)


Linkage analysis types

Directed linkage analysis
  Evaluate linkage at a specific locus such as a candidate gene
  Common approach: evaluate an intragenic, a 5' and a 3' marker
    often microsatellites
Genome-wide linkage analysis
  Screen for linkage for markers spread across the entire genome
  Microsatellites: ~400 markers spaced at about 10 cM
  SNPs: 500k SNP array
Homozygosity mapping
  Screen only affected individuals in inbred families
  Select homozygous markers (typically SNP markers)
  Very efficient technology
Fine mapping
  Some linked markers are known, but the borders of the linkage interval still need to be defined


Exercise – Part 1

2 inbred families with a recessive disorder
With homozygosity mapping based on 500k SNP arrays, 2 candidate regions could be identified

[Figure: chart for patients 1 and 2 with an axis running from 0 to 40,000]

Chromosome 4
  Patient 1 homozygous for
    6.052 Mb - 14.488 Mb
    21.008 Mb - 37.477 Mb
  Patient 2 homozygous for
    11.186 Mb - 37.219 Mb

Task: find microsatellite markers to confirm linkage
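
The two candidate regions follow from overlapping the homozygous stretches of both patients. A minimal sketch of that overlap, using the coordinates from this slide (the function name `intersect` is only illustrative):

```python
def intersect(regions_a, regions_b):
    """Return the overlaps between two lists of (start, end) intervals in Mb."""
    overlaps = []
    for a_start, a_end in regions_a:
        for b_start, b_end in regions_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only real overlaps
                overlaps.append((start, end))
    return sorted(overlaps)

# Homozygous regions on chromosome 4, taken from the slide
patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]

shared = intersect(patient1, patient2)
print(shared)  # [(11.186, 14.488), (21.008, 37.219)]
```

Microsatellite markers inside or just flanking these shared intervals are the ones worth testing.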


Find additional flanking markers

Find physical position of marker in NCBI > UniSTS
NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/
  Go to Homo sapiens and to the right chromosome
  Maps & options: show
    DeCode, Généthon & Marshfield (genetic maps)
    Genes
  Set region: e.g. 2 Mb up- and downstream of your marker
  Click 'Data as table view'
  Click on STS behind a marker to see its details
Select markers that
  locate to only 1 genomic location
  have a PCR product with an extended size range
    a single size means the marker is not polymorphic


http://www.ncbi.nlm.nih.gov/projects/mapview



Exercise – Part 1 > possible solution

Markers in 1st candidate region
  D4S3017 (21.078 Mb)
  D4S3044 (25.189 Mb)
  D4S1618 (33.857 Mb)
  D4S3350 (33.857 Mb)
  D4S2988 (36.889 Mb)

Markers in 2nd candidate region
  D4S1582 (10.311 Mb)
  D4S2906 (12.321 Mb)
  D4S2944 (13.141 Mb)
  D4S1602 (14.059 Mb)
  D4S2960 (15.437 Mb)

Order primers & analyze them on all family members


Analyzing microsatellite data


Microsatellites > basics

Repeats of short sequences (e.g. 2 bp): NNNNAC(AC)nACNNNN

Number of repeats is variable (unstable sequence)
Number of repeats determines the allele
Number of repeats corresponds to a specific length of the PCR product:
  allele 1: NNNNACACACACACNNNN (5*AC, 18 bp)
  allele 2: NNNNACACACACACACNNNN (6*AC, 20 bp)
  allele 3: NNNNACACACACACACACNNNN (7*AC, 22 bp)
  ...

Determine length to know the allele (sequencer)
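
The relation between repeat number and product length in the example alleles can be written out as a small sketch. The 8 bp of flanking sequence (4 N on each side) and the 2 bp repeat unit come from the example above; real markers have their own flank lengths, so the defaults here are illustrative only.

```python
FLANK_BP = 8        # NNNN on each side of the repeat in the slide example
REPEAT_UNIT_BP = 2  # dinucleotide repeat (AC)

def product_length(n_repeats, flank_bp=FLANK_BP, unit_bp=REPEAT_UNIT_BP):
    """PCR product length for a given number of repeats."""
    return flank_bp + n_repeats * unit_bp

def repeats_from_length(length_bp, flank_bp=FLANK_BP, unit_bp=REPEAT_UNIT_BP):
    """Number of repeats implied by a measured product length."""
    repeat_bp = length_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("length does not match a whole number of repeats")
    return repeat_bp // unit_bp

for n in (5, 6, 7):
    print(n, "repeats ->", product_length(n), "bp")
```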


Microsatellites > basics


Microsatellites > determine size

[Figure: electropherogram with labels 220 bp, 225 bp and 230 bp]

Use internal size standard (other color)


Microsatellites > heterozygotes

[Figure: electropherogram of a heterozygote with labels 220 bp, 223 bp, 225 bp and 230 bp]


Microsatellites > stutter peaks

Repeats are difficult to copy: the polymerase slips
  Some amplicons have 1 repeat less
  a few even lose multiple repeats
Small repeats are more prone to slippage and show more pronounced stutter peaks
Largest product is the correct one
Distance between peaks = length of a repeat


Microsatellites > stutter peaks

[Figure: electropherogram showing the allelic peak with the 1st and 2nd stutter peaks]


Microsatellites > stutter peaks

Allelic peaks are the highest
Stutter peaks are lower

[Figure: electropherogram with allelic peaks A1 and A2]


Microsatellites > stutter peaks

[Figure: electropherogram with allelic peaks A1 and A2]


Microsatellites > +A peaks

Taq polymerase tends to add an extra A at the 3' end
Variable degree of products with or without this extra A
Do not confuse with stutter peaks (only 1 bp difference)

[Figure: electropherogram with the allelic peak and the 1st and 2nd stutter peaks, each accompanied by a +A peak]
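
To make the spacing rules concrete, the sketch below labels peaks relative to an allelic peak that has already been identified: stutter peaks sit whole repeat units below it, and +A peaks sit 1 bp above the allelic or stutter peak. The function name, the 2 bp repeat unit and the 0.4 bp tolerance are assumptions for illustration, not part of any genotyping software.

```python
def label_peak(size_bp, allele_bp, unit_bp=2, tol=0.4):
    """Label a peak relative to a known allelic peak (sizes in bp)."""
    for plus_a, suffix in ((0, ""), (1, " + A")):
        # how much shorter the peak is than the (possibly +A) allelic product
        offset = allele_bp + plus_a - size_bp
        n = round(offset / unit_bp)  # number of lost repeats
        if n >= 0 and abs(offset - n * unit_bp) <= tol:
            base = "allelic peak" if n == 0 else f"stutter peak {n}"
            return base + suffix
    return "unassigned"

# Hypothetical peak sizes around an allelic peak at 225 bp
for size in (221.0, 222.0, 223.0, 224.0, 225.0, 226.0):
    print(size, label_peak(size, 225.0))
```

With a 1 bp repeat unit, stutter and +A peaks could not be separated this way, which is one reason the sketch assumes a dinucleotide marker.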


Microsatellites > complex plots (stutter & +A)

[Figure: electropherogram with allelic peaks A1 and A2, including stutter and +A peaks]


Microsatellites > multiplex

Combine multiple markers in a single analysis ($$$)
  Different size range
  Multicolor
  Commercial kits: e.g. 16 markers / lane


Microsatellite plots examples


Genotyping pedigrees


Genotyping pedigrees

Screen one or multiple markers for some or all family members

For every marker:
  Make a list of all occurring allele sizes
  Due to technical variation in sizing, the same allele can have a slightly different size in different measurements (-0.4 bp to +0.4 bp). Give all alleles within this range the same allele number
  Add the allele numbers to the pedigree at the corresponding individual/marker combination
  Find the right phase

Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker

Advanced pedigree programs like Progeny can store genotype information for family members

Verify inheritance
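
Grouping measured sizes into allele numbers can be sketched as a simple binning step. This is one possible approach, assuming sizes within a 0.8 bp window (the -0.4 bp to +0.4 bp range) belong to the same allele; the measured sizes are invented for illustration.

```python
def assign_allele_numbers(sizes_bp, window_bp=0.8):
    """Map each measured size to an allele number (1 = shortest allele)."""
    numbers = {}
    allele = 0
    bin_start = None
    for size in sorted(set(sizes_bp)):
        # open a new allele when the size falls outside the current window
        if bin_start is None or size - bin_start > window_bp:
            allele += 1
            bin_start = size
        numbers[size] = allele
    return numbers

measured = [219.8, 220.1, 223.9, 224.2, 228.0]  # hypothetical sizes for one marker
alleles = assign_allele_numbers(measured)
print(alleles)  # {219.8: 1, 220.1: 1, 223.9: 2, 224.2: 2, 228.0: 3}
```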


Exercise – Part 2

Genotype 3 markers in all available individuals of 2 families

Pedigrees & microsatellite plots in ExercisePart2-GenotypingData.pdf

Add allele numbers for the 3 markers to the pedigree
Interpret the genotyped pedigrees: linked?


Family 1


Family 2


Exercise – Part 2 > Conclusions

D4S1582
  Mendelian error: cannot be interpreted

D4S2944
  Linked

D4S3017
  Not linked: unaffected individuals with the same genotype as a patient
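
A Mendelian error like the one at D4S1582 shows up when a child's two alleles cannot be split between father and mother. A minimal sketch of that check for a single trio, with 0 meaning unknown as in the genotype files (the function name and the example genotypes are illustrative; dedicated tools such as PedCheck do this across whole pedigrees):

```python
def mendelian_consistent(child, father, mother):
    """True if one child allele can come from the father and the other from the mother."""
    if 0 in child or 0 in father or 0 in mother:
        return True  # unknown genotypes cannot be checked
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

print(mendelian_consistent((1, 2), (1, 3), (2, 4)))  # True
print(mendelian_consistent((3, 3), (1, 3), (2, 4)))  # False: mother has no allele 3
```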


Calculate LOD scores


EasyLinkage

EasyLinkage = UI for linkage analysis
  http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
  Bioinformatics. 2005 Feb 1;21(3):405-7 PMID: 15347576
  Bioinformatics. 2005 Sep 1;21(17):3565-7 PMID: 16014370

Interface for many linkage analysis programs
Input
  Pedigree file (linkage format)
  Genotype file(s)
  Marker information (already provided for popular markers)
  Settings


Pedigree file

Naming requirements for EasyLinkage: p_xxx.pro, e.g. p_SMMD.pro

Format:
  Tab delimited text file
  1 individual per row

Columns:
  1 family ID
  2 person ID
  3 father ID
  4 mother ID
  5 sex (1=male, 2=female, 0=unknown)
  6 affection status (1=unaffected, 2=affected, 0=unknown)
  7 DNA availability (optional, relevant for power calculations)
  8 liability class (to be provided if multiple liability classes are used)
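
As an illustration of this layout, the sketch below builds the text of a small pedigree file with the six mandatory columns. The family and its IDs are invented, and coding the parents of founders as 0 is assumed here from the usual linkage-format convention; the result would be saved under a name such as p_SMMD.pro.

```python
# Columns: family ID, person ID, father ID, mother ID, sex, affection status
pedigree_rows = [
    ("1", "1", "0", "0", 1, 1),  # father, unaffected (founder: parents assumed coded 0)
    ("1", "2", "0", "0", 2, 1),  # mother, unaffected
    ("1", "3", "1", "2", 1, 2),  # son, affected
]

def pedigree_text(rows):
    """Return tab-delimited pedigree text, one individual per row."""
    return "".join("\t".join(str(value) for value in row) + "\n" for row in rows)

text = pedigree_text(pedigree_rows)
print(text, end="")
```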


Genotype files

Person IDs have to match exactly with those provided in the pedigree file

Naming requirements for EasyLinkage: MarkerName_xxx.abi, e.g. D1S1609_SMMD.abi

Format:
  Tab delimited text file
  1 individual per row

Columns (for microsatellite based analysis):
  1 marker (same as in file name and matching a marker in an available marker set)
  2 custom information (content doesn't matter, but column must be present)
  3 individual ID (match person ID in pedigree file)
  4 & 5 genotypes for 2 alleles (unknown=0)
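
A genotype file for one marker can be sketched the same way. The genotypes below are invented, the "-" placeholder in the custom-information column is an arbitrary choice (the slide only says the column must be present), and the helper names are hypothetical.

```python
def genotype_file_name(marker, project):
    """File name following the MarkerName_xxx.abi pattern from the slide."""
    return f"{marker}_{project}.abi"

def genotype_text(marker, genotypes, info="-"):
    """Tab-delimited rows: marker, custom info, individual ID, allele 1, allele 2."""
    rows = []
    for person_id, (allele1, allele2) in genotypes.items():
        rows.append("\t".join([marker, info, person_id, str(allele1), str(allele2)]))
    return "\n".join(rows) + "\n"

# Hypothetical genotypes; 0 marks an unknown allele
genotypes = {"1": (1, 2), "2": (2, 3), "3": (0, 0)}
name = genotype_file_name("D1S1609", "SMMD")
content = genotype_text("D1S1609", genotypes)
print(name)
print(content, end="")
```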


Marker information

Contains information on the chromosome and position of every marker

Already available for a number of commercial SNP-arrays and for the microsatellite markers from
  Genethon
  Marshfield
  DeCode

Custom marker sets can be created (see manual)


EasyLinkage settings

Choose a program:
  FastLink                Parametric, single-point
  SuperLink               Parametric, single-/multipoint
  SPLink                  Nonparametric, single-point
  Genehunter              Nonpara-/parametric, single-/multipoint
  Genehunter Plus         Nonpara-/parametric, single-/multipoint
  Genehunter MOD          Nonpara-/parametric, single-/multipoint
  Genehunter Imprinting   Nonpara-/parametric, single-/multipoint
  GeneHunter TwoLocus     Parametric, two-locus, single-/multipoint
  Merlin                  Nonpara-/parametric, single-/multipoint
  SimWalk                 Nonparametric, single-/multipoint
  Allegro                 Nonpara-/parametric, single-/multipoint & simulation, single-/multi-point
  PedCheck                Mendelian error check
  FastSLink               Simulation, single-/multi-point


EasyLinkage settings

Parametric <-> non-parametric
Single point <-> multipoint
Frequency of the disease allele
Penetrance vectors (wt/wt, wt/mt, mt/mt)
  Standard dominant: 0 1 1
  Standard recessive: 0 0 1
  Reduced penetrance: replace 1 by the penetrance (e.g. 0.9)
  Phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)
  Example: 0.01 0.9 0.99
    1% chance to show a similar phenotype despite a normal genotype
    90% chance to show the phenotype when 1 mutant allele is present (dominant with incomplete penetrance)
    99% likelihood to present with the phenotype if both alleles are mutant
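
One way to sanity-check a disease allele frequency together with a penetrance vector is to compute the population prevalence they imply. The sketch below assumes Hardy-Weinberg genotype proportions, which is an assumption added here and not something stated on the slide; the allele frequency of 0.001 is also just an example value.

```python
def genotype_frequencies(q):
    """Frequencies of wt/wt, wt/mt and mt/mt for disease allele frequency q (Hardy-Weinberg assumed)."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)

def prevalence(q, penetrance):
    """Expected fraction of affected individuals for a penetrance vector (wt/wt, wt/mt, mt/mt)."""
    return sum(f * g for f, g in zip(penetrance, genotype_frequencies(q)))

example = prevalence(0.001, (0.01, 0.9, 0.99))  # vector from the slide example
recessive = prevalence(0.01, (0, 0, 1))         # standard recessive
print(round(example, 6), recessive)
```

With the example vector, most of the implied prevalence comes from the 1% phenocopy rate, which shows how strongly that setting can dominate when the disease allele is rare.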


Evaluate calculated LOD-scores

Maximum LOD-scores can be seen in EasyLinkage
Details about LOD-scores at different recombination fractions can be found in text files generated by EasyLinkage
  process in Excel (generate graphs, ...)

Standard rules for LOD-scores
  LOD > 3: significant linkage
  2 < LOD < 3: suggestive linkage
  -2 < LOD < 2: uninformative
  LOD < -2: significant absence of linkage
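
For intuition about where these numbers come from, the sketch below computes a two-point LOD score in the simplest textbook situation: phase-known, fully informative meioses, where LOD(theta) = log10[theta^r (1-theta)^(n-r) / 0.5^n] for r recombinants among n meioses. Programs such as SuperLink evaluate likelihoods over whole pedigrees, so this is a simplification and not what EasyLinkage runs internally. The family counts are hypothetical, and how to treat values exactly on a threshold is a choice made here.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD at recombination fraction theta for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_linked = (meioses - recombinants) * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    return log_linked - meioses * math.log10(0.5)

def interpret(lod):
    """Apply the standard rules from the slide."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"
    if lod > -2:
        return "uninformative"
    return "significant absence of linkage"

# LOD scores are additive: sum over families at the same theta
families = [(0, 6), (1, 8)]  # hypothetical (recombinants, meioses) per family
total = sum(lod_score(r, n, 0.1) for r, n in families)
print(round(total, 2), interpret(total))
```

Neither hypothetical family reaches a LOD of 2 on its own at theta = 0.1, but their sum does, which is the practical meaning of additivity.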


Interpreting LOD plots

[Figure: four plots of LOD score (-5 to 5) against recombination fraction (0 to 0.5)]


Exercise – Part 3

Generate one pedigree file containing all family members of both families (use global IDs)
Generate a genotype file for each of the tested markers
Run SuperLink analysis with the right settings
Evaluate results


Exercise – Part 3 > Results


Strengthen the evidence

Analyze more family members
Analyze more families
Analyze flanking markers
  Look for more informative markers that result in higher LOD-scores
  A series of flanking markers allows for multipoint linkage analysis
  A series of linked markers gives more confidence (subjective)
  Flanking markers can also be used to fine-map the linkage interval


Determine the linkage interval

[Figure: markers along a chromosome labelled L, NL or ?, with the candidate region marked]
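
Delineating the interval can be sketched as follows, reading L as linked, NL as not linked and ? as uninformative (this reading of the labels is an assumption). The candidate region then runs from the closest not-linked marker below the linked markers to the closest one above them; uninformative markers cannot narrow it. Marker names, positions and statuses are hypothetical, and the sketch assumes a single run of linked markers.

```python
def linkage_interval(markers):
    """markers: (name, position_mb, status) tuples sorted by position.

    Returns (lower_mb, upper_mb); a bound is None when no NL marker limits that side.
    """
    linked = [i for i, marker in enumerate(markers) if marker[2] == "L"]
    if not linked:
        return None
    first, last = linked[0], linked[-1]
    # nearest not-linked marker on each side of the linked markers
    lower = next((markers[i][1] for i in range(first - 1, -1, -1)
                  if markers[i][2] == "NL"), None)
    upper = next((markers[i][1] for i in range(last + 1, len(markers))
                  if markers[i][2] == "NL"), None)
    return (lower, upper)

markers = [
    ("M1", 10.0, "NL"),
    ("M2", 12.0, "?"),   # uninformative: does not narrow the interval
    ("M3", 13.0, "L"),
    ("M4", 14.0, "L"),
    ("M5", 15.5, "NL"),
]
print(linkage_interval(markers))  # (10.0, 15.5)
```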


Exercise 2: find the linkage interval


Post linkage

Create a list of all the genes within the linkage interval
  NCBI map viewer
  UCSC (also for non-coding RNAs)

Evaluate known gene functions for relevance to the investigated phenotype

Sequence genes
  Start with those that seem the most relevant to the disorder
  Start with the coding regions
  Screen the entire region with capture sequencing

Finding a mutation and proving its causality is the ultimate proof