Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree
Authors:
Lan Liu & Tao Jiang, Univ. California, Riverside
Jing Xiao, Lirong Xia, Tsinghua Univ., China
Outline
Introduction and problem definition
A new system of linear equations for ZRHC
An O(mn^3) time algorithm for ZRHC
An improved algorithm for ZRHC
Conclusion
Pedigree
An example: British Royal Family
[Figure: pedigree diagram of the British Royal Family, from Elizabeth II and Prince Philip to their children and grandchildren.]
Biological Background
[Figure: allele pairs at several loci, labelled Genotype, Haplotype and Locus.]
Basic concepts
Mendelian Law: one haplotype comes from the father and the other comes from the mother.
Example: Mendelian experiment
Genotype 12: heterozygous; genotypes 11 and 22: homozygous.
[Figure: ordered allele pairs written paternal|maternal, for example 2|1 and 1|2.]
Notations and Recombinant
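[Figure: a four-locus genotype (rows 1122 and 2222) and a haplotype configuration consistent with it (haplotypes 1222 and 2122).]
[Figure: two parent-child trios. The parents carry haplotypes 1111 | 2222 and 2222 | 2222. A child with haplotypes 1111 | 2222 requires 0 recombinants; a child with haplotypes 1122 | 2222 requires 1 recombinant.]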
Haplotype Configuration Reconstruction
Haplotypes: useful, but expensive to obtain.
Genotypes: less informative, but cheaper to obtain.
In biological applications, genotypes rather than haplotypes are collected.
How can haplotypes be reconstructed from genotypes? Under the recombination-free assumption.
[Figure: panels (a) and (b), each showing a small pedigree annotated with two-locus allele pairs.]
The ZRHC problem
Problem definition (zero-recombinant haplotype configuration): Given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
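To make the problem definition concrete, here is a minimal Python sketch that checks a proposed haplotype configuration for one father-mother-child trio. The data layout (tuples of alleles, with each member stored as a pair of haplotypes) and the function names are assumptions made for illustration; they do not come from the slides. Which parent carries which haplotypes in the example is also assumed.

```python
def matches_genotype(hap_a, hap_b, genotype):
    """True if the unordered allele pair at every locus equals the genotype."""
    return all(sorted((a, b)) == sorted(g)
               for a, b, g in zip(hap_a, hap_b, genotype))


def zero_recombinant(child, father, mother):
    """Each member is a (paternal_haplotype, maternal_haplotype) pair.

    Without recombination, the child's paternal haplotype is a whole copy of
    one of the father's two haplotypes, and likewise for the mother.
    """
    return child[0] in father and child[1] in mother


# Four-locus example in the spirit of the recombinant slide (roles assumed).
father = ((1, 1, 1, 1), (2, 2, 2, 2))
mother = ((2, 2, 2, 2), (2, 2, 2, 2))
child_ok = ((1, 1, 1, 1), (2, 2, 2, 2))   # whole copies of parental haplotypes
child_rec = ((1, 1, 2, 2), (2, 2, 2, 2))  # paternal haplotype mixes two sources

print(zero_recombinant(child_ok, father, mother))   # True
print(zero_recombinant(child_rec, father, mother))  # False
```

A full ZRHC solver has to search for such a configuration over the whole pedigree; this sketch only verifies one.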
Previous Work
Li and Jiang introduced a system of linear equations over F[2] and presented an O(m^3 n^3) time algorithm for ZRHC [LJ03], where m is the number of loci and n is the number of members in the pedigree.
Several attempts have been made recently, but the authors failed to prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06].
Recently, Chan et al. proposed a linear-time algorithm [CCC+06], which only works for pedigrees without mating loops.
Related work
Methods based on fast matrix multiplication algorithms could achieve an asymptotic running time of O(k^2.376) on k equations with k unknowns.
The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
The Wiedemann algorithm has expected quadratic running time [W86].
Our Result
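We present a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
[Figure: the system Ax = b with O(mn) equations on O(mn) unknowns is transformed into a system with O(mn) equations on O(n) unknowns; redundancy elimination then leaves O(n log^2 n log log n) equations on O(n) unknowns.]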
The New Linear System
m: number of loci; n: number of members in the pedigree.
Unknowns:
p_j: the paternal haplotype vector of a member j.
h_{j_1,j}: the scalar describing the inheritance information between a parent j_1 and a child j.
The New Linear System
[Figure: a child j and its parents j_1 and j_2, with four loci. Each member x carries the haplotype pair p_x and p_x + w_x, shown entry by entry (p_{x,1}, ..., p_{x,4}); the parent-child edges are labelled h_{j_1,j} and h_{j_2,j}. In the example, p_{j_1,2} = 1 and p_{j_1,3} = 0.]
The Linear System
O(mn) equations on O(mn) unknowns.
Given a homozygous locus i on a member j (with a child j_1), p_j[i] and p_{j_1}[i] are pre-determined.
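A small Python sketch of the quantities involved, under stated assumptions: alleles 1 and 2 are encoded as bits 0 and 1, w_j[i] = 1 marks a heterozygous locus, and h = 0 means the parent passes on its paternal haplotype p while h = 1 means it passes on p + w. The encoding and the function names are illustrative choices, not taken from the slides.

```python
def heterozygosity_vector(genotype):
    """w[i] = 1 if locus i is heterozygous, else 0."""
    return [int(a != b) for a, b in genotype]


def predetermined_bits(genotype):
    """p[i] at homozygous loci (allele 1 -> 0, allele 2 -> 1), None elsewhere."""
    return [a - 1 if a == b else None for a, b in genotype]


def transmitted(p_parent, w_parent, h):
    """Haplotype passed to a child over F[2]: p if h == 0, p + w if h == 1."""
    return [(p + h * w) % 2 for p, w in zip(p_parent, w_parent)]


genotype = [(1, 1), (1, 2), (2, 2), (2, 1)]
print(heterozygosity_vector(genotype))  # [0, 1, 0, 1]
print(predetermined_bits(genotype))     # [0, None, 1, None]
print(transmitted([0, 1, 0, 0], [1, 1, 0, 1], 1))  # [1, 0, 0, 1]
```

At a homozygous locus w is 0, so the transmitted bit does not depend on h; this is why those entries are pre-determined.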
Pedigree Graph
[Figure: a pedigree with nine members, each annotated with its genotype at three loci, next to the corresponding pedigree graph G.]
#edges ≤ 2n
Locus Graph
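Locus graph G_i
G_i = (V, E_i), where E_i = {(k, j) | k is a parent of j, w_k[i] = 1}
Example: Locus graph for the 3rd locus
[Figure: (a) genotype information at the locus for the nine-member pedigree; (b) the locus graph, in which pre-determined vertices carry their value (0 or 1), undetermined vertices are marked "?", and edges are labelled with h-variables such as h_{1,4}, h_{4,9}, h_{8,9} and h_{6,8}. A legend marks zero-weight edges.]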
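The definition E_i = {(k, j) | k is a parent of j, w_k[i] = 1} can be sketched in a few lines of Python. The dictionary layout (`parents` mapping a child to its two parents, `w` mapping a member to its heterozygosity vector) and the tiny pedigree are hypothetical and chosen only for illustration.

```python
def locus_graph_edges(parents, w, i):
    """Edge set E_i: (k, j) with k a parent of j and k heterozygous at locus i.

    parents: dict child -> (father, mother)
    w:       dict member -> list of 0/1 heterozygosity flags, one per locus
    """
    return {(k, j)
            for j, pair in parents.items()
            for k in pair
            if w[k][i] == 1}


# Hypothetical trio: member 3 is the child of members 1 and 2.
parents = {3: (1, 2)}
w = {1: [1, 0], 2: [0, 0], 3: [1, 1]}

print(locus_graph_edges(parents, w, 0))  # {(1, 3)}
print(locus_graph_edges(parents, w, 1))  # set()
```

Founders never appear as keys of `parents`, so they contribute edges only as parents.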
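An Observation
For any cycle or any path in a locus graph connecting two pre-determined vertices, the summation of the h-variables along the path is a constant. We can use paths to denote constraints!
(Proof sketch) Assume the path in locus graph G_i connects two pre-determined vertices j_0 and j_k, and let d_{j,j'} denote the known constant attached to the edge between j and j'. Each edge gives one equation over F[2]:
p_{j_0}[i] + d_{j_0,j_1} + h_{j_0,j_1} = p_{j_1}[i]
p_{j_1}[i] + d_{j_1,j_2} + h_{j_1,j_2} = p_{j_2}[i]
p_{j_2}[i] + d_{j_2,j_3} + h_{j_2,j_3} = p_{j_3}[i]
...
p_{j_{k-1}}[i] + d_{j_{k-1},j_k} + h_{j_{k-1},j_k} = p_{j_k}[i]
Adding all k equations cancels the intermediate p-values, leaving
h_{j_0,j_1} + h_{j_1,j_2} + ... + h_{j_{k-1},j_k} = p_{j_0}[i] + p_{j_k}[i] + d_{j_0,j_1} + d_{j_1,j_2} + ... + d_{j_{k-1},j_k},
which is a constant because p_{j_0}[i] and p_{j_k}[i] are pre-determined.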
Examples of Linear Constraints
[Figure: the three locus graphs of the nine-member pedigree; pre-determined vertices carry 0 or 1, undetermined vertices are marked "?", and the edges on each constraint path are labelled with their h-variables.]
(a) 1st locus graph: h_{6,8} + h_{8,9} = 1
(b) 2nd locus graph: h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0
(c) 3rd locus graph: h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0
Linear Constraints
Obviously, the linear constraints are necessary. We can also show that these constraints are sufficient.
Moreover, we can bound the number of constraints in each locus graph by O(n), whereas the trivial analysis gives an upper bound of O(n^2).
Total #constraints = O(mn).
The ZRHC-PHASE algorithm
Algorithm ZRHC_PHASE
input: a pedigree G = (V, E) and genotypes {g_j}
output: a general solution of {p_j}
begin
Step 1. Preprocessing
Step 2. Linear constraint generation on h-variables
Step 3. Solve h-variables by Gaussian Elimination
Step 4. Solve the p-variables by propagation from pre-determined p-variables to others.
end
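Step 3 relies on Gaussian elimination over F[2]. The sketch below is one minimal way to do it in Python, assuming each constraint is stored as a pair (mask, rhs) where bit v of the integer `mask` is set when h-variable number v appears. This bitmask representation and the name `solve_gf2` are assumptions for illustration. The function returns one particular solution with free variables set to 0, not the general solution the algorithm outputs.

```python
def solve_gf2(equations, num_vars):
    """Gauss-Jordan elimination over F[2].

    equations: list of (mask, rhs); bit v of mask is set when variable v
               appears in the equation, rhs is 0 or 1.
    Returns one solution as a list of bits (free variables set to 0),
    or None if the system is inconsistent.
    """
    pivots = {}  # pivot variable -> (mask, rhs), rows kept fully reduced
    for mask, rhs in equations:
        # Remove every existing pivot variable from the new row.
        for v, (pm, pr) in pivots.items():
            if mask >> v & 1:
                mask ^= pm
                rhs ^= pr
        if mask == 0:
            if rhs:
                return None  # reduced to 0 = 1
            continue  # redundant equation, nothing new
        v = mask.bit_length() - 1  # highest remaining variable becomes pivot
        # Remove the new pivot variable from the older rows.
        for u, (pm, pr) in list(pivots.items()):
            if pm >> v & 1:
                pivots[u] = (pm ^ mask, pr ^ rhs)
        pivots[v] = (mask, rhs)
    solution = [0] * num_vars
    for v, (_, pr) in pivots.items():
        solution[v] = pr  # all non-pivot variables in the row are 0
    return solution


# x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (the third equation is redundant).
system = [(0b011, 1), (0b110, 0), (0b101, 1)]
print(solve_gf2(system, 3))
```

Redundant equations reduce to the zero row and are skipped, which is exactly the work the improved algorithm avoids by not generating them in the first place.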
Our method
Solve h-variables and p-variables separately.
O(mn) linear equations on O(n) h-variables.
Traditional method
Solve h-variables and p-variables together.
O(mn) equations on O(mn) unknowns: O(mn) p-variables and O(n) h-variables.
Redundant Equation Elimination
[Figure: a cycle through the vertices j_0, j_1, j_2, ..., j_{k-2}, j_{k-1}, j_k, with the pairwise constraints j_0 ~ j_2, j_2 ~ j_{k-1} and j_0 ~ j_{k-1} marked.]
An observation
Given a cycle j_0, j_1, ..., j_k, assume that there are constraints among each pair of vertices. Originally, there are O(k^2) constraints, but they are not independent: we can replace them by an equivalent set of O(k) constraints.
Remove the redundant equations without solving them!
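The dependence among the pairwise constraints can be checked numerically. The Python sketch below is an illustration under simplifying assumptions: it looks only at the left-hand sides, numbers the cycle vertices 0..k-1, lets edge e join vertices e and e+1, and takes the constraint for a pair (a, b) to be the sum of the h-variables on edges a..b-1. The function names are hypothetical.

```python
def gf2_rank(masks):
    """Rank over F[2] of row vectors given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row
    for row in masks:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)


def pairwise_path_constraints(k):
    """One bitmask per vertex pair a < b: bits a..b-1 mark the edges used."""
    return [(1 << b) - (1 << a) for a in range(k) for b in range(a + 1, k)]


k = 8
constraints = pairwise_path_constraints(k)
print(len(constraints))       # 28 pairwise constraints
print(gf2_rank(constraints))  # 7 independent ones
```

The quadratic number of constraints collapses to a linear number of independent ones, since every pairwise constraint is a sum of constraints between consecutive vertices.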
Key lemma
Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree.
Elkin, Emek, Spielman and Teng show that any graph can be embedded in a low-stretch spanning tree with average stretch O(log^2 n log log n).
The number of irredundant constraints can be bounded by the sum of cycle lengths, which is further bounded by the sum of stretches, O(n log^2 n log log n).
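The stretch definition can be illustrated with a short Python sketch that measures the average stretch of the edges of a graph with respect to a given spanning tree. It does not construct a low-stretch tree in the sense of Elkin et al.; the tree is supplied by the caller, and the function names and edge-list layout are assumptions for illustration.

```python
from collections import deque


def tree_distance(tree_adj, src, dst):
    """Length of the unique tree path between src and dst, found by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in tree_adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("dst is not connected to src in the tree")


def average_stretch(graph_edges, tree_edges):
    """Mean over all graph edges (u, v) of the tree distance between u and v."""
    tree_adj = {}
    for u, v in tree_edges:
        tree_adj.setdefault(u, []).append(v)
        tree_adj.setdefault(v, []).append(u)
    stretches = [tree_distance(tree_adj, u, v) for u, v in graph_edges]
    return sum(stretches) / len(stretches)


# A 4-cycle with the path 0-1-2-3 as spanning tree: stretches are 1, 1, 1, 3.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree = [(0, 1), (1, 2), (2, 3)]
print(average_stretch(cycle, tree))  # 1.5
```

Tree edges have stretch 1; each non-tree edge closes a cycle whose length is its stretch plus one, which is how the sum of stretches bounds the sum of cycle lengths.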
Conclusion
We present an efficient algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
It remains an interesting question whether the time complexity for ZRHC on general pedigrees can be improved to O(mn^2 + n^3) or lower.
Another open question is how to use the algorithm to find haplotype configurations on pedigrees that require only a small (constant) number of recombinants.
Thanks for your time and attention!