39
Deconvoluting BAC- Deconvoluting BAC- gene Relationships gene Relationships Using Using a Physical Map a Physical Map Y. Wu Y. Wu 1 1 , L. Liu , L. Liu 1 1 , T. Close , T. Close 2 2 , , S. Lonardi S. Lonardi 1 1 1 Department of Computer Science & Department of Computer Science & Engineering Engineering 2 Department of Botany & Plant Sciences Department of Botany & Plant Sciences

Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Deconvoluting BAC-gene Deconvoluting BAC-gene Relationships UsingRelationships Using

a Physical Mapa Physical MapY. WuY. Wu11, L. Liu, L. Liu11, T. Close, T. Close22, S. Lonardi, S. Lonardi11

11Department of Computer Science & EngineeringDepartment of Computer Science & Engineering22Department of Botany & Plant SciencesDepartment of Botany & Plant Sciences

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Selective sequencingSelective sequencing

• Many organisms are unlikely to be Many organisms are unlikely to be sequenced in the near future due to the sequenced in the near future due to the large size and highly repetitive content of large size and highly repetitive content of their genomestheir genomes

• Selective sequencing:Selective sequencing: obtain the sequence obtain the sequence of a small set of BAC clones that contain a of a small set of BAC clones that contain a specific set of genes of interestspecific set of genes of interest

• How do we identify these BAC clones?How do we identify these BAC clones?BAC-gene deconvolution problemBAC-gene deconvolution problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

An illustration of the problemAn illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

An illustration of the problemAn illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

An illustration of the problemAn illustration of the problem

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization with probesHybridization with probes

• The presence of a gene in a BAC can be The presence of a gene in a BAC can be determined by an hybridization experiment determined by an hybridization experiment (e.g., using a (e.g., using a uniqueunique probe designed from it) probe designed from it)

• Given that typically BAC clones and probes Given that typically BAC clones and probes could be in the order of tens of thousands, could be in the order of tens of thousands, carrying out an experiment for each pair carrying out an experiment for each pair (BAC,probe) is usually unfeasible(BAC,probe) is usually unfeasible

• Group testingGroup testing (or (or poolingpooling) has to be used) has to be used

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization with pools of probesHybridization with pools of probes

• Probes can be arranged into pools for Probes can be arranged into pools for group testing. However, in order to achieve group testing. However, in order to achieve exact deconvolution this strategy could be exact deconvolution this strategy could be still unfeasible due to the large number of still unfeasible due to the large number of poolspools

• QuestionQuestion: Can we use a small number of : Can we use a small number of pools (e.g., 1- or 2-decodable pool design) pools (e.g., 1- or 2-decodable pool design) and still achieve accurate deconvolution?and still achieve accurate deconvolution?

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Dealing with the limitations of poolingDealing with the limitations of pooling

• Answer:Answer: Yes, if one compensates for the Yes, if one compensates for the lack of information obtained by a weak lack of information obtained by a weak pooling design with the knowledge of the pooling design with the knowledge of the overlapping structure of the BACsoverlapping structure of the BACs

• In this way, the number of pools required In this way, the number of pools required is reduced is reduced less expensive/time- less expensive/time-consumingconsuming

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization dataHybridization data

h(b,p)=1 h(b,p)=1 (pool (pool pp hybridizes to BAC hybridizes to BAC bb))– b b mustmust contain at least one of the contain at least one of the

probes/genes represented by probes/genes represented by pp– positive informationpositive information

h(b,p)=0 h(b,p)=0 (pool (pool pp does not hybridize to BAC does not hybridize to BAC bb))– bb cannotcannot contain any of the probes/genes contain any of the probes/genes

represented by represented by pp– negative informationnegative information

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Deconvolution problemDeconvolution problem

• Given Given h(b,p)h(b,p) for all pairs for all pairs (b,p)(b,p) the the deconvolution problemdeconvolution problem is to establish a is to establish a one-to-many assignment between the one-to-many assignment between the probes probes pp and the clones and the clones bb in such a way in such a way that it satisfies the value of that it satisfies the value of hh

1.1. Basic deconvolution: uses only on Basic deconvolution: uses only on information obtained from group testinginformation obtained from group testing

2.2. Improved deconvolution: also uses the Improved deconvolution: also uses the physical mapphysical map

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Input to the basic deconvolutionInput to the basic deconvolution

hh pp11 pp22 pp33 pp44

bb11 11 00 00 00

bb22 11 11 00 00

bb33 00 11 11 00

bb44 00 00 11 11

bb55 00 00 00 11

Hybridization table

pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Input to the basic deconvolutionInput to the basic deconvolution

hh pp11 pp22 pp33 pp44

bb11 11

bb22 11 11

bb33 11 11

bb44 11 11

bb55 11

uu11 uu22 uu33 uu44 uu55 uu66 uu77 uu88 uu99

pp11 11 11 11

pp22 11 11 11

pp33 11 11 11

pp44 11 11 11

Pool content table

Hybridization table

pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Positive informationPositive information

uu11 uu22 uu33 uu44 uu55 uu66 uu77 uu88 uu99

bb11,p,p11 11 11 11

bb22,p,p11 11 11 11

bb22,p,p22 11 11 11

bb33,p,p22 11 11 11

bb33,p,p33 11 11 11

bb44,p,p33 11 11 11

bb44,p,p44 11 11 11

bb55,p,p44 11 11 11

pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Negative informationNegative information

uu11 uu22 uu33 uu44 uu55 uu66 uu77 uu88 uu99

bb11 00 00 00 00 00 00 00

bb22 00 00 00 00 00

bb33 00 00 00 00 00 00

bb44 00 00 00 00 00

bb55 00 00 00 00 00 00 00pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Combining positive & negativeCombining positive & negative

uu11 uu22 uu33 uu44 uu55 uu66 uu77 uu88 uu99

bb11,p,p11 11 11 11

bb22,p,p11 11 11 11

bb22,p,p22 11 11 11

bb33,p,p22 11 11 11

bb33,p,p33 11 11 11

bb44,p,p33 11 11 11

bb44,p,p44 11 11 11

bb55,p,p44 11 11 11

pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Combining positive & negativeCombining positive & negative

uu11 uu22 uu33 uu44 uu55 uu66 uu77 uu88 uu99

bb11,p,p11 11 11

bb22,p,p11 11 11 11

bb22,p,p22 11 11

bb33,p,p22 11 11

bb33,p,p33 11 11

bb44,p,p33 11 11

bb44,p,p44 11 11 11

bb55,p,p44 11 11

• Each row represents Each row represents a a constraintconstraint to be to be satisfiedsatisfied

• If a row contains only If a row contains only one “1”, then the one “1”, then the relationship between relationship between the BAC and probe the BAC and probe is resolved exactlyis resolved exactly

pi is a poolbj is a BACuk is a probe/gene

pi is a poolbj is a BACuk is a probe/gene

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Physical map-assisted deconvolutionPhysical map-assisted deconvolution

• Basic deconvolution is not sufficient Basic deconvolution is not sufficient • BACs are assembled into contigs by FPC (a BACs are assembled into contigs by FPC (a

contigcontig is a set of BAC clones) is a set of BAC clones)• We assume the probes are unique We assume the probes are unique each probe each probe

can belong to exactly one contigcan belong to exactly one contig

Contig 1 Contig 2

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Optimization problemOptimization problem

• We formulate the following optimization We formulate the following optimization problemproblem

• The problem is NP-complete (proof in the The problem is NP-complete (proof in the paper, reduction from 3SAT)paper, reduction from 3SAT)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Integer Linear ProgrammingInteger Linear Programming

• The optimization problem can be solved The optimization problem can be solved via integer linear programming (ILP)via integer linear programming (ILP)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

LP and randomized roundingLP and randomized rounding

• The ILP is relaxed to the corresponding The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the LP, then the LP is solved exactly (via the GLPK package)GLPK package)

• Optimal solution to the LP is mapped to a Optimal solution to the LP is mapped to a valid solution to the ILP via randomized valid solution to the ILP via randomized rounding rounding

• We prove that our method achieves We prove that our method achieves approximation ratio approximation ratio (1-e(1-e-1-1))

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental results on rice genomeExperimental results on rice genome

• Whole genome sequence for rice is availableWhole genome sequence for rice is available• BAC library and fingerprinting data are available BAC library and fingerprinting data are available

from AGIfrom AGI• BAC-end sequences are also available from BAC-end sequences are also available from

GenbankGenbank• Physical map was built using FPCPhysical map was built using FPC• Coordinates of the BAC on the genome were Coordinates of the BAC on the genome were

determined by BLASTing BAC-end sequences determined by BLASTing BAC-end sequences against the genomeagainst the genome

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental results on rice genomeExperimental results on rice genome

• Rice unigenes are available from NCBIRice unigenes are available from NCBI

• Unique probes for the unigenes were Unique probes for the unigenes were designed by the Oligospawn softwaredesigned by the Oligospawn software

• Experiments focused on chromosome IExperiments focused on chromosome I

• Probe pools were designed following the Probe pools were designed following the shifted transversal design (STD)shifted transversal design (STD)

• Dataset: 2,002 probes and 2,629 BACsDataset: 2,002 probes and 2,629 BACs

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental resultsExperimental results

1-decodable pooling design1-decodable pooling design

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental resultsExperimental results

2-decodable pooling design2-decodable pooling design

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental resultsExperimental results

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

FindingsFindings

• We proposed a new method to solve the We proposed a new method to solve the BAC-gene deconvolution problem based BAC-gene deconvolution problem based on integer linear programmingon integer linear programming

• Experimental results show that our method Experimental results show that our method is accurate and effectiveis accurate and effective

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Thank youThank you

• FundingFunding

• Serdar Bozdag (UC Riverside) for providing the Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)rice data (fingerprinting and hybridization)

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization with pools of probesHybridization with pools of probes

• Probes can be arranged into pools for Probes can be arranged into pools for group testinggroup testing

• In order to achieve exact deconvolution In order to achieve exact deconvolution this strategy can be still unfeasiblethis strategy can be still unfeasible

• The reason: a BAC may contain several, if The reason: a BAC may contain several, if not tens of genes not tens of genes the “decodability” of the “decodability” of the pool design has to be high to achieve the pool design has to be high to achieve exact deconvolution exact deconvolution … …

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Hybridization with pools of probesHybridization with pools of probes

• … … the pool size has to be small, which the pool size has to be small, which implies that the number of pools will be implies that the number of pools will be largelarge

• QuestionQuestion: Can we use a low decodability : Can we use a low decodability (1- or 2-decodable) pool design and still (1- or 2-decodable) pool design and still achieve good deconvolution?achieve good deconvolution?

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Physical map-assisted deconvolutionPhysical map-assisted deconvolution

• For example, if we knew that BAC For example, if we knew that BAC bi and and BAC BAC bbjj are 80% overlapping are 80% overlapping if a probe if a probe pp belongs to BAC belongs to BAC bi, it is very likely that , it is very likely that pp also belongs to also belongs to bbjj

• On the other hand, if we knew that BAC On the other hand, if we knew that BAC bi and BAC and BAC bbjj are not overlapping are not overlapping if a if a probe probe pp belongs to BAC belongs to BAC bi, then it is very , then it is very unlikely that probe unlikely that probe pp also belong to BAC also belong to BAC bbjj

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Physical map-assisted deconvolutionPhysical map-assisted deconvolution

• Basic deconvolution step is not sufficientBasic deconvolution step is not sufficient• The overlapping structure of the BACs is The overlapping structure of the BACs is

used to resolve additional relationships used to resolve additional relationships between BACs and probesbetween BACs and probes

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Sketch of the algorithmSketch of the algorithm

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Perfect physical mapPerfect physical map

• Cut the chromosome at the points where a BAC Cut the chromosome at the points where a BAC starts or endsstarts or ends

• Let’s call the resulting pieces Let’s call the resulting pieces fragmentsfragments• Each fragment is covered by a set of BACsEach fragment is covered by a set of BACs• Assume the probes are unique, therefore, each Assume the probes are unique, therefore, each

probe can only belong to one fragmentprobe can only belong to one fragment

f1 f2 f3 f4 f5

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Optimization problemOptimization problem

• Optimization problem is similarly formulatedOptimization problem is similarly formulated

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

ILPILP

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Solving the optimization problemSolving the optimization problem

• The above problem is The above problem is NP-completeNP-complete

• It is solved via ILP followed by It is solved via ILP followed by LP relaxation and randomized LP relaxation and randomized roundingrounding

• Similar performance guarantee Similar performance guarantee can be provedcan be proved

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Sketch of the algorithmSketch of the algorithm

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Experimental resultsExperimental results