CSE280 Stefano/Hossein
Project: Primer design for cancer genomics
Cancer genomics
• In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes
• In the early stages, only a few cells will show these changes
[Figure: deletion]
Polymerase Chain Reaction
• PCR is a technique for amplifying and detecting a specific portion of the genome
• Amplification takes place only if the primers are an appropriate distance apart (<2 kb)
Assaying for Rare Variants
• PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells
[Figure: extract genomic DNA, then PCR, then detection. Labels: "Distance too large for amplification", "Tumor cell"]
Primer Approximation Multiplex PCR (PAMP)*
• Multiple primers are optimally spaced, flanking a breakpoint of interest
– Upstream of breakpoint, forward primers
– Downstream of breakpoint, reverse primers
• The primers are run in a multiplex PCR reaction
– Any pair can form a viable product
[Figure: deletions in Patient B and Patient C]
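The PAMP idea can be sketched in a few lines: a deletion is detectable when it brings some forward/reverse pair within amplification range. This is a minimal illustration, not the authors' code. The function name `deletion_detected`, the treatment of each primer as a single coordinate, and the default limit of 2000 bp (taken from the "<2kb" figure earlier in the slides) are assumptions.

```python
from typing import Iterable


def deletion_detected(forward: Iterable[int], reverse: Iterable[int],
                      del_start: int, del_end: int,
                      max_product: int = 2000) -> bool:
    """Return True if some forward/reverse pair could amplify across the deletion.

    forward: positions of forward primers (only those upstream of del_start count)
    reverse: positions of reverse primers (only those downstream of del_end count)
    The segment [del_start, del_end) is assumed to be removed from the genome.
    """
    deleted = del_end - del_start
    reverse = list(reverse)  # allow a second pass for every forward primer
    for f in forward:
        if f >= del_start:
            continue  # primer site lies inside or past the deleted segment
        for r in reverse:
            if r <= del_end:
                continue  # primer site lies inside or before the deleted segment
            # Distance between the two primers once the segment is gone.
            if (r - f) - deleted < max_product:
                return True
    return False
```

With forward primers at 0, 1000, 2000 and reverse primers at 10000, 11000, 12000, no pair is close enough in a normal genome, while a deletion spanning 2500 to 9800 leaves the primers at 2000 and 10000 only 700 bp apart.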
Goal
• Input: a collection of primer locations and matrices of primer interactions
– Forward/Forward, Forward/Reverse, Reverse/Reverse
• Identify a subset of primers that do not interact and are unique, maximizing the covered region
Algorithms for Optimizing the Cost
• Preprocessing
– Determining the pairs of primers that dimerize (edges in the graph)
– Filtering the primers to ensure “uniqueness”
• Simulated annealing
1. Start from an initial candidate set P, generated randomly or greedily.
2. List the neighboring sets P’ and compute $\Delta_s = C(P') - C(P)$.
3. Select step s with a probability proportional to $e^{-\Delta_s / T}$.
4. Decrease the temperature T and go to step 2.
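The four annealing steps can be sketched generically as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cost and neighbor functions are passed in by the caller, and the geometric cooling schedule, the fixed step count, the default parameters, and the tracking of the best set seen so far are all assumptions that the slide does not specify.

```python
import math
import random
from typing import Callable, FrozenSet, Iterable, Optional

PrimerSet = FrozenSet[int]


def simulated_annealing(initial: PrimerSet,
                        cost: Callable[[PrimerSet], float],
                        neighbors: Callable[[PrimerSet], Iterable[PrimerSet]],
                        t_start: float = 100.0,
                        cooling: float = 0.95,
                        steps: int = 200,
                        rng: Optional[random.Random] = None) -> PrimerSet:
    """Minimize cost over primer sets; returns the best set encountered."""
    rng = rng or random.Random(0)
    current = best = initial            # step 1: initial candidate set P
    temperature = t_start
    for _ in range(steps):
        candidates = list(neighbors(current))   # step 2: neighboring sets P'
        if not candidates:
            break
        base = cost(current)
        deltas = [cost(p) - base for p in candidates]
        # Step 3: weights proportional to exp(-delta / T). Subtracting the
        # smallest delta rescales all weights by a common factor, which leaves
        # the probabilities unchanged and avoids overflow.
        shift = min(deltas)
        weights = [math.exp(-(d - shift) / temperature) for d in deltas]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if cost(current) < cost(best):
            best = current
        temperature *= cooling          # step 4: decrease the temperature
    return best
```

Because every neighbor is weighted by its own cost change, improving moves are favored at any temperature, and worsening moves become rarer as T falls.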
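Cost Function
• The cost function used takes coverage and dimerization into account
• Coverage term for adjacent primers u and v at locations $l_u$ and $l_v$: $C(u,v) = \max\{0,\; l_v - l_u - d\}$
[Figure labels: Dimerization, Coverage, Density]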
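A small sketch of how such a cost might be evaluated is given here. The coverage term follows the max{0, ...} form on this slide. Combining it with a dimerization count as a weighted sum with weight `w_p` is an assumption, as are the function names and the convention of identifying each primer by its location.

```python
from typing import Iterable, Set, Tuple


def coverage_gap(l_u: int, l_v: int, d: int) -> int:
    """Uncovered length between adjacent primers at l_u < l_v, given reach d."""
    return max(0, l_v - l_u - d)


def total_cost(locations: Iterable[int],
               dimer_edges: Set[Tuple[int, int]],
               d: int,
               w_p: float) -> float:
    """Assumed form: w_p * (number of selected dimerizing pairs) + uncovered length."""
    pos = sorted(locations)
    # Sum the coverage term over every pair of adjacent selected primers.
    uncovered = sum(coverage_gap(a, b, d) for a, b in zip(pos, pos[1:]))
    chosen = set(pos)
    # Count edges of the dimerization graph with both endpoints selected.
    dimers = sum(1 for u, v in dimer_edges if u in chosen and v in chosen)
    return w_p * dimers + uncovered
```

For primers at 0, 500 and 2000 with d = 1000, only the second gap contributes (500 bp); adding one dimerizing pair at weight 100 gives a total of 600.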
Simulated Annealing: Define Neighbors
• Approach 1:
– Set $w_p$ [value missing]
– E is the edge set corresponding to dimerizing pairs
– Neighbors of P are formed by adding a vertex u to P and removing all vertices dimerizing with u; i.e. $P' = (P \cup \{u\}) \setminus \{v : (u,v) \in E\}$ for some $u$
• Approach 2:
– $w_p$ [value missing]
– No hard constraint on dimerizing pairs.
– Neighbors of P are obtained by adding or removing one vertex from P.
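The two neighborhood definitions can be sketched as generators over candidate primers. The names `neighbors_hard` and `neighbors_soft`, the use of integer primer ids, and the representation of E as a set of unordered pairs without self-loops are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Iterator, Set, Tuple


def neighbors_hard(p: FrozenSet[int],
                   candidates: Iterable[int],
                   edges: Set[Tuple[int, int]]) -> Iterator[FrozenSet[int]]:
    """Approach 1: add a vertex u, then drop every vertex that dimerizes with u."""
    for u in candidates:
        if u in p:
            continue
        # Vertices joined to u by an edge of the dimerization graph.
        conflicts = {b if a == u else a for a, b in edges if u in (a, b)}
        yield frozenset((p | {u}) - conflicts)


def neighbors_soft(p: FrozenSet[int],
                   candidates: Iterable[int]) -> Iterator[FrozenSet[int]]:
    """Approach 2: add or remove a single vertex; dimers are left to the cost."""
    for u in candidates:
        yield frozenset(p ^ {u})  # symmetric difference toggles membership of u
```

Starting from a dimer-free set, Approach 1 keeps every visited set dimer-free, while Approach 2 can pass through sets that contain dimerizing pairs.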
ILP Formulation
$x_i$: indicator of primer i being selected.
$q_{ij}$: indicator of candidate primer i being immediately after primer j.
• Guaranteed optimality, but intractable for realistic problems
• Used here to assess the performance of simulated annealing
Bounds and Numerical Results
• A Weak Theoretical Upper Bound:
– Select all primers without dimerization constraints.
– For any two adjacent primers with distance $\delta > d$, reduce the covered region by $\delta - d$ bp.
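The bound can be computed directly from the primer locations, as in the sketch below. Measuring the covered region as the span from the first to the last primer minus the excess of each oversized gap is an assumption about how coverage is counted; the function name is hypothetical.

```python
from typing import Iterable


def coverage_upper_bound(locations: Iterable[int], d: int) -> int:
    """Upper bound on covered length when all primers are selected.

    Dimerization is ignored, so no feasible subset is expected to cover more.
    """
    pos = sorted(locations)
    if len(pos) < 2:
        return 0
    span = pos[-1] - pos[0]
    # Each adjacent gap larger than d loses its excess over d.
    excess = sum(max(0, b - a - d) for a, b in zip(pos, pos[1:]))
    return span - excess
```

For primers at 0, 500 and 2000 with d = 1000 the span is 2000 bp and the second gap exceeds d by 500 bp, so the bound is 1500 bp.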
Potential Improvements
• Improving the cost function formulation
– Incorporating multiplexing sets
• Find an efficient technique to solve the optimization problem.
• Improve on the analytical bound
– Consider the effect of dimerization within the forward/reverse primer set.
Pairwise cost function
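[Equation: a double sum over the forward primer set (index $i$, size $F$) and the reverse primer set (index $j$, size $R$) of a term in the primer locations $l$ and the distance $d$; the operators are missing.]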
• Measures total possible number of sites that are uncovered given all forward and reverse primer combinations
Multiobjective cost function
• Taking coverage and multiplexing sets into account
• Minimizing both objectives, and resolving the dimerization constraint, given a possible solution containing multiplexing sets S
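[Equations: objective $f_1$ (missed coverage), built from $\max\{0, \cdot\}$ terms in the primer locations $l$ and $d$; objective $f_2(S)$ (sets); and a constraint $g(x) = 0$ over dimerizing pairs $(p_i, p_j) \in E$ within a set $S_k$. The exact expressions are missing.]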
Using Fewer Integer Variables
• The formulation in the paper uses $n^2$ auxiliary variables, one for each pair of primers.
– $q_{ij} = 1$ if and only if primers i and j are selected as two consecutive primers in the candidate set.
• Complexity of ILP (or IQP) generally grows exponentially with the number of integer variables.
• In practice, the distance between two consecutive primers in the solution is not much larger than d, otherwise there would be a large gap in the covered region.
• Assume an upper bound g on the distance between consecutive primers
• Introduce a variable $q_{ij}$ only if $l_i - l_j < g$
• The average number of variables is reduced to $n(1+\rho g)$
– $\rho$ is the density of the primers in the initial set.
– The number of integer variables becomes O(n).
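To see the effect of the distance limit, the number of pair variables that survive the restriction can be counted and compared with the full quadratic count. This is a minimal sketch; the function name and the two-pointer scan over sorted locations are illustrative choices, and g is assumed to be positive.

```python
from typing import Iterable


def count_q_variables(locations: Iterable[int], g: int) -> int:
    """Count ordered pairs (i, j) with primer j before primer i and l_i - l_j < g."""
    pos = sorted(locations)
    count = 0
    start = 0  # index of the earliest primer still within distance g of pos[i]
    for i, l_i in enumerate(pos):
        while l_i - pos[start] >= g:
            start += 1
        count += i - start  # primers start..i-1 can immediately precede primer i
    return count
```

For locations 0, 100, 200, 300, 1000 and g = 250 this keeps 5 pair variables instead of the 25 needed when every pair of the 5 primers gets one.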