Page 1

Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree

Authors:

Lan Liu & Tao Jiang, Univ. California, Riverside

Jing Xiao, Lirong Xia, Tsinghua Univ., China

Page 2

Outline

  • Introduction and problem definition
  • A new system of linear equations for ZRHC
  • An $O(mn^3)$ time algorithm for ZRHC
  • An improved algorithm for ZRHC
  • Conclusion

Page 3

Pedigree

An example: British Royal Family

[Figure: pedigree chart whose members are Elizabeth II of the United Kingdom; Prince Philip, Duke of Edinburgh; Prince Charles, Prince of Wales; Diana, Princess of Wales; Camilla, Duchess of Cornwall; Princess Anne, Princess Royal; Captain Mark Phillips; Commander Timothy Laurence; Prince Andrew, Duke of York; Sarah Margaret Ferguson; Prince Edward, Earl of Wessex; Sophie Rhys-Jones; Prince William of Wales; Prince Henry of Wales; Peter Phillips; Zara Phillips; Princess Beatrice of York; Princess Eugenie of York; Lady Louise Windsor.]

Page 4

Biological Background

Basic concepts: locus, genotype, haplotype.

Mendelian law: one haplotype comes from the father and the other comes from the mother.

Genotype 12 is heterozygous; genotypes 11 and 22 are homozygous.

Example: Mendelian experiment

[Figure: genotypes at several loci, each split into a paternal and a maternal haplotype, e.g. 2|1 and 1|2.]

Page 5

Notations and Recombinant

[Figure: a genotype and a haplotype configuration for it, followed by two father-mother-child trios. The parents carry haplotypes 1111, 2222 and 2222, 2222. In the trio labeled "0 recombinant" the child carries 1111 and 2222; in the trio labeled "1 recombinant" the child carries 1122 and 2222, with the recombinant position marked.]

Page 6

Haplotype Configuration Reconstruction

  • Haplotypes: useful, but expensive to obtain.
  • Genotypes: not so informative, but cheaper to obtain.
  • In biological applications, genotypes instead of haplotypes are collected.
  • How to reconstruct haplotypes from genotypes? Use the recombination-free assumption.

[Figure: panels (a) and (b), example haplotype configurations on a small pedigree.]

Page 7

The ZRHC problem

Problem definition: Given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
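To make the recombination-free condition concrete, here is a minimal sketch of a check on a single father-mother-child trio. The function names, the string representation of haplotypes, and the sample trio are illustrative assumptions, not part of the slides.

```python
from typing import List, Tuple

# (paternal haplotype, maternal haplotype), one character per locus
Haplotypes = Tuple[str, str]

def genotype_of(haps: Haplotypes) -> List[str]:
    """Unordered allele pair at each locus, e.g. ('12', '21') -> ['12', '12']."""
    pat, mat = haps
    return ["".join(sorted(a + b)) for a, b in zip(pat, mat)]

def zero_recombinant(father: Haplotypes, mother: Haplotypes, child: Haplotypes) -> bool:
    """True if the child's paternal haplotype is an exact copy of one of the
    father's two haplotypes and its maternal haplotype is an exact copy of one
    of the mother's two haplotypes (no crossover inside either haplotype)."""
    return child[0] in father and child[1] in mother

father = ("1111", "2222")
mother = ("2222", "2222")
print(zero_recombinant(father, mother, ("1111", "2222")))  # True
print(zero_recombinant(father, mother, ("1122", "2222")))  # False
print(genotype_of(("1122", "2222")))                       # ['12', '12', '22', '22']
```

ZRHC asks for such a configuration on every trio of the pedigree at once, while also matching each member's observed genotype.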

Page 8

Previous Work

  • Li and Jiang introduced a system of linear equations over F[2] and presented an $O(m^3n^3)$ time algorithm for ZRHC [LJ03], where m is the number of loci and n is the number of members in the pedigree.
  • Several attempts have been made recently, but the authors failed to prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06].
  • Recently, Chan et al. proposed a linear-time algorithm [CCC+06], which only works for pedigrees without mating loops.

Page 9

Related work

  • Methods based on fast matrix multiplication algorithms could achieve an asymptotic speed of $O(k^{2.376})$ on k equations with k unknowns.
  • The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
  • The Wiedemann algorithm has expected quadratic running time [W86].

Page 10

Our Result

We present a much faster algorithm for ZRHC with running time $O(mn^2 + n^3 \log^2 n \log\log n)$.

[Figure: pipeline on the system Ax=b. A system of size $O(mn) \times O(mn)$ is turned by "transformation" into one of size $O(mn) \times O(n)$, and then by "redundancy elimination" into one of size $O(n \log^2 n \log\log n) \times O(n)$.]

Page 11

Outline

  • Introduction and problem definition
  • A new system of linear equations for ZRHC
  • An $O(mn^3)$ time algorithm for ZRHC
  • An improved algorithm for ZRHC
  • Conclusion

[Figure: the $O(mn) \times O(mn)$ system Ax=b.]

Page 12

The New Linear System

  • m: number of loci; n: number of members in the pedigree.
  • Unknowns:
    $p_j$: the paternal haplotype vector of a member $j$.
    $h_{j_1,j}$: the scalar indicating inheritance information between a parent $j_1$ and a child $j$.

Page 13

The New Linear System

[Figure: a trio with parents $j_1$, $j_2$ and child $j$ over four loci. Each member is drawn with the haplotype pair $p_j$ and $p_j + w_j$ (entries $p_{j,1}, \dots, p_{j,4}$), and the edges from the parents to the child are labeled $h_{j_1,j}$ and $h_{j_2,j}$. In the example, $p_{j_1,2}=1$ and $p_{j_1,3}=0$.]

Page 14

The Linear System

  • $O(mn)$ equations on $O(mn)$ unknowns.
  • Given a homozygous locus $i$ on a member $j$ (with a child $j_1$), $p_j[i]$ and $p_{j_1}[i]$ are pre-determined.
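As a rough illustration of what "pre-determined" means, the sketch below derives the known entries of a paternal haplotype vector from a genotype. It assumes alleles 1 and 2 are encoded as 0 and 1 over F[2] and that $w_j[i]=1$ marks a heterozygous locus; both the encoding and the helper names `encode` and `inherit_from_father` are assumptions made for this example.

```python
def encode(genotype):
    """genotype: list of 2-character strings over alleles '1'/'2', one per locus.
    Returns (w, p): w[i] = 1 if locus i is heterozygous, else 0;
    p[i] = encoded paternal allele where it is pre-determined (homozygous
    locus), else None."""
    w, p = [], []
    for g in genotype:
        a, b = sorted(g)
        het = a != b
        w.append(1 if het else 0)
        p.append(None if het else int(a) - 1)  # allele 1 -> 0, allele 2 -> 1
    return w, p

def inherit_from_father(father_p, child_p):
    """Where the father is homozygous, the child's paternal allele is fixed too.
    Only the father case is shown; the mother case is analogous."""
    return [c if f is None else f for f, c in zip(father_p, child_p)]

w, p = encode(["12", "22", "11", "12"])
print(w)  # [1, 0, 0, 1]
print(p)  # [None, 1, 0, None]
print(inherit_from_father(p, [None, None, None, None]))  # [None, 1, 0, None]
```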

Page 15

Pedigree Graph

[Figure: a pedigree on members 1 to 9 with their genotypes, and the corresponding pedigree graph G.]

#edges ≤ 2n

Page 16

Locus Graph

Locus graph $G_i = (V, E_i)$, where $E_i = \{(k,j) \mid k \text{ is a parent of } j,\ w_k[i]=1\}$.

Example: Locus graph for the 3rd locus

[Figure: (a) genotype info on the pedigree with members 1 to 9; (b) the locus graph, with edges such as $h_{1,4}$, $h_{4,9}$, $h_{8,9}$ and $h_{6,8}$ labeled and zero-weight edges marked.]
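The edge set $E_i$ can be built directly from the definition. The following is a small sketch under the assumption that $w_k[i]=1$ means parent $k$ is heterozygous at locus $i$; the function name, the dictionary layout, and the toy pedigree are hypothetical.

```python
def locus_graph_edges(parents, genotypes, i):
    """parents: dict child -> (father, mother); founders are simply absent.
    genotypes: dict member -> list of 2-character genotype strings.
    Returns E_i = {(k, j) : k is a parent of j and k is heterozygous at locus i}."""
    edges = set()
    for child, pair in parents.items():
        for k in pair:
            a, b = genotypes[k][i]
            if a != b:  # w_k[i] = 1 under the assumed encoding
                edges.add((k, child))
    return edges

# Toy pedigree: members 1 and 2 are the parents of member 3.
parents = {3: (1, 2)}
genotypes = {1: ["12", "11"], 2: ["22", "12"], 3: ["12", "12"]}
print(locus_graph_edges(parents, genotypes, 0))  # {(1, 3)}
print(locus_graph_edges(parents, genotypes, 1))  # {(2, 3)}
```

Since each member has at most two parents, every locus graph has at most 2n edges.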

Page 17

Outline

  • Introduction and problem definition
  • A new system of linear equations for ZRHC
  • An $O(mn^3)$ time algorithm for ZRHC
  • An improved algorithm for ZRHC
  • Conclusion

[Figure: the "transformation" step from the $O(mn) \times O(mn)$ system Ax=b to an $O(mn) \times O(n)$ system.]

Page 18

An Observation

For any cycle in a locus graph, or any path connecting two pre-determined vertices, the sum of the h-variables along it is a constant. We can use paths to denote constraints!

(Proof sketch) Assume the path in locus graph $G_i$ connects two pre-determined vertices $j_0$ and $j_k$. Along its edges,

$$
\begin{aligned}
p_{j_0}[i] + d_{j_0,j_1} + h_{j_0,j_1} &= p_{j_1}[i] \\
p_{j_1}[i] + d_{j_1,j_2} + h_{j_1,j_2} &= p_{j_2}[i] \\
&\ \ \vdots \\
p_{j_{k-1}}[i] + d_{j_{k-1},j_k} + h_{j_{k-1},j_k} &= p_{j_k}[i]
\end{aligned}
$$

Adding these equations over F[2] cancels the intermediate p-terms and leaves

$$
\sum_{t=0}^{k-1} h_{j_t,j_{t+1}} = p_{j_0}[i] + p_{j_k}[i] + \sum_{t=0}^{k-1} d_{j_t,j_{t+1}} = \text{a constant}.
$$

Page 19

Examples of Linear Constraints

  • (a) 1st locus graph: $h_{6,8} + h_{8,9} = 1$
  • (b) 2nd locus graph: $h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0$
  • (c) 3rd locus graph: $h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0$

[Figure: the three locus graphs on members 1 to 9, with pre-determined vertices labeled 0 or 1, undetermined vertices labeled "?", and the edges of each constraint highlighted.]
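The three example constraints are small enough to check by brute force. The sketch below enumerates all 0/1 assignments of the eight h-variables and keeps those satisfying every constraint modulo 2. It is an exhaustive check written only for illustration, not the elimination method of the talk, and the string names for the variables are an assumption.

```python
from itertools import product

# (variables on the left-hand side, right-hand side), all arithmetic mod 2
constraints = [
    ({"h6,8", "h8,9"}, 1),
    ({"h3,5", "h3,6", "h2,5", "h2,6"}, 0),
    ({"h4,9", "h2,4", "h2,5", "h3,5", "h3,6", "h6,8"}, 0),
]
variables = sorted(set().union(*(vs for vs, _ in constraints)))

def satisfies(assignment, constraints):
    """True if every constraint holds for the given 0/1 assignment."""
    return all(sum(assignment[v] for v in vs) % 2 == rhs for vs, rhs in constraints)

solutions = [
    dict(zip(variables, bits))
    for bits in product((0, 1), repeat=len(variables))
    if satisfies(dict(zip(variables, bits)), constraints)
]
print(len(variables), len(solutions))  # 8 32
```

The three constraints are independent, so they leave 8 - 3 = 5 free variables and 2^5 = 32 solutions.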

Page 20

Linear Constraints

  • Obviously, the linear constraints are necessary. We can also show that these constraints are sufficient.
  • Moreover, we can upper bound #constraints in each locus graph by $O(n)$, while the trivial analysis gives an upper bound of $O(n^2)$.
  • Total #constraints = $O(mn)$.

Page 21

The ZRHC-PHASE algorithm

Algorithm ZRHC_PHASE
input: a pedigree G=(V,E) and genotype {gj}
output: a general solution of {pj}
begin
  Step 1. Preprocessing
  Step 2. Linear constraint generation on h-variables
  Step 3. Solve h-variables by Gaussian Elimination
  Step 4. Solve the p-variables by propagation from pre-determined p-variables to others.
end

Our method: Solve h-variables and p-variables separately. $O(mn)$ linear equations on $O(n)$ h-variables.

Traditional method: Solve h-variables and p-variables together. $O(mn)$ equations on $O(mn)$ unknowns: $O(mn)$ p-variables and $O(n)$ h-variables.
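Step 3 is ordinary Gaussian elimination over the two-element field. A minimal sketch follows, assuming each equation is stored as an integer bitmask of its variables plus a right-hand-side bit; the function name `solve_gf2` and this representation are choices made for the example, not taken from the slides.

```python
def solve_gf2(rows, num_vars):
    """rows: list of (mask, rhs); bit v of mask is set if variable v occurs.
    Returns (solution, free_vars), where solution is one particular solution
    with all free variables set to 0, or None if the system is inconsistent."""
    pivots = {}  # pivot variable -> (mask, rhs); rows are kept fully reduced
    for mask, rhs in rows:
        # Reduce the incoming row by the pivot rows found so far.
        for v, (pm, pr) in pivots.items():
            if mask >> v & 1:
                mask ^= pm
                rhs ^= pr
        if mask == 0:
            if rhs:
                return None  # 0 = 1: inconsistent system
            continue         # 0 = 0: redundant equation
        v = mask.bit_length() - 1  # highest remaining variable becomes the pivot
        # Eliminate the new pivot variable from the existing pivot rows.
        for u, (pm, pr) in list(pivots.items()):
            if pm >> v & 1:
                pivots[u] = (pm ^ mask, pr ^ rhs)
        pivots[v] = (mask, rhs)
    solution = [0] * num_vars
    for v, (_, pr) in pivots.items():
        solution[v] = pr  # free variables are 0, so each pivot equals its rhs
    free_vars = [v for v in range(num_vars) if v not in pivots]
    return solution, free_vars

# x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (the third equation is redundant)
print(solve_gf2([(0b011, 1), (0b110, 0), (0b101, 1)], 3))  # ([0, 1, 1], [0])
print(solve_gf2([(0b011, 1), (0b011, 0)], 2))              # None
```

The free variables returned alongside the particular solution describe the general solution: flipping any free variable and re-solving the pivot rows gives another valid assignment.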

Page 22

Outline

  • Introduction and problem definition
  • A new system of linear equations for ZRHC
  • An $O(mn^3)$ time algorithm for ZRHC
  • An improved algorithm for ZRHC
  • Conclusion

[Figure: the full pipeline on Ax=b: "transformation" from an $O(mn) \times O(mn)$ system to an $O(mn) \times O(n)$ system, then "redundancy elimination" to an $O(n \log^2 n \log\log n) \times O(n)$ system.]

Page 23

Redundant Equation Elimination

An observation

Given a cycle $j_0, j_1, \dots, j_k$, assume that there are constraints among each pair of vertices. Originally, there are $O(k^2)$ constraints. Notice that they are not independent. We can replace the original constraints by an equivalent set of constraints of size $O(k)$.

[Figure: the cycle $j_0, j_1, j_2, \dots, j_{k-2}, j_{k-1}, j_k$ with pairwise constraints such as $j_0 \sim j_2$, $j_2 \sim j_{k-1}$ and $j_0 \sim j_{k-1}$.]

Remove the redundant equations without solving them!

Key lemma
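The dependence among pairwise constraints can be seen numerically. The sketch below takes k consecutive vertices, forms the left-hand side of the path constraint for every pair, and computes the rank over GF(2). It is a simplified model (one path per pair, left-hand sides only, right-hand sides assumed consistent), and the helper names are hypothetical.

```python
def gf2_rank(masks):
    """Rank over GF(2) of row vectors given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row with that leading bit
    for m in masks:
        while m:
            top = m.bit_length() - 1
            if top not in basis:
                basis[top] = m
                break
            m ^= basis[top]
    return len(basis)

def pairwise_path_constraints(k):
    """Left-hand sides of the constraints between every pair j_a, j_b (a < b)
    of k consecutive vertices: bit e is set if edge (j_e, j_{e+1}) lies on the
    path from j_a to j_b."""
    return [((1 << b) - 1) ^ ((1 << a) - 1) for a in range(k) for b in range(a + 1, k)]

k = 8
rows = pairwise_path_constraints(k)
print(len(rows), gf2_rank(rows))  # 28 7
```

With k = 8 there are 28 pairwise constraints but only 7 independent ones, one per edge, which matches the $O(k^2)$ versus $O(k)$ counts in the observation.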

Page 24

Redundant Equation Elimination

  • Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree.
  • Elkin, Emek, Spielman and Teng show that we can embed any graph in a low-stretch spanning tree with average stretch $O(\log^2 n \log\log n)$.
  • The number of irredundant constraints can be bounded by the sum of cycle lengths, which is further bounded by the sum of stretches, $O(n \log^2 n \log\log n)$.
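To illustrate the definition of stretch, the sketch below measures the tree distance between the endpoints of each non-tree edge using breadth-first search. The spanning tree here is an arbitrary one supplied by the caller; constructing a low-stretch tree in the sense of Elkin et al. is a separate algorithm that this example does not attempt. The function name and input layout are assumptions.

```python
from collections import deque

def tree_stretches(n, tree_edges, other_edges):
    """Stretch of each non-tree edge (u, v): the number of edges on the unique
    u-v path in the spanning tree. Vertices are numbered 0..n-1."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    def tree_distance(src, dst):
        # Breadth-first search restricted to tree edges.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            if x == dst:
                return dist[x]
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        raise ValueError("tree does not connect both endpoints")

    return {(u, v): tree_distance(u, v) for u, v in other_edges}

# Tree path 0-1-2-3-4 plus two non-tree edges.
stretches = tree_stretches(5, [(0, 1), (1, 2), (2, 3), (3, 4)], [(4, 0), (1, 3)])
print(stretches)                # {(4, 0): 4, (1, 3): 2}
print(sum(stretches.values()))  # 6
```

Each non-tree edge closes a cycle with its tree path, so the sum of stretches bounds the total length of these cycles up to one extra edge per cycle.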

Page 25

Conclusion

  • We present an efficient algorithm for ZRHC with running time $O(mn^2 + n^3 \log^2 n \log\log n)$.
  • It remains interesting whether the time complexity for ZRHC on general pedigrees can be improved to $O(mn^2 + n^3)$ or lower.
  • Another open question is how to use the algorithm to get haplotype configurations on pedigrees that require only a small (constant) number of recombinants.
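The $n^3 \log^2 n \log\log n$ term in the running time can be related to the equation counts given earlier. The accounting below is a rough sketch that assumes plain Gaussian elimination costs $O(E\,U^2)$ for $E$ equations in $U \le E$ unknowns; the symbols $E$ and $U$ are introduced here for illustration only.

```latex
\begin{align*}
\text{before redundancy elimination:}\quad
  & E = O(mn),\ U = O(n)
  && \Rightarrow\ O(E\,U^2) = O(mn^3), \\
\text{after redundancy elimination:}\quad
  & E = O(n \log^2 n \log\log n),\ U = O(n)
  && \Rightarrow\ O(E\,U^2) = O(n^3 \log^2 n \log\log n).
\end{align*}
```

The first line matches the $O(mn^3)$ algorithm named in the outline. The slides do not derive the $mn^2$ term, which presumably covers the remaining steps.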

Page 26

Thanks for your time and attention!