CSE280 Stefano/Hossein Project: Primer design for cancer genomics


Cancer genomics

• In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes

• In the early stages, only a few cells will show these changes

[Figure: a genomic deletion]

Polymerase Chain Reaction

• PCR is a technique for amplifying and detecting a specific portion of the genome

• Amplification takes place if the primers are an appropriate distance apart (< 2 kb)

Assaying for Rare Variants

• PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells

[Figure: workflow from extracting genomic DNA, through PCR, to detection. Labels: "Distance too large for amplification", "Tumor cell"]

Primer Approximation Multiplex PCR (PAMP)*

• Multiple primers are optimally spaced, flanking a breakpoint of interest
– Upstream of breakpoint, forward primers
– Downstream of breakpoint, reverse primers

• The primers are run in a multiplex PCR reaction
– Any pair can form a viable product

[Figure: deletions in Patient B and Patient C]
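
The idea that any forward/reverse pair can form a product once a deletion brings the two primers close together can be sketched in a few lines. This is a minimal illustration, not code from the project: the function name `amplifiable_pairs`, the coordinate convention, and the default `d = 2000` bp (taken from the "< 2 kb" limit on the PCR slide) are assumptions.

```python
def amplifiable_pairs(forward, reverse, deletion, d=2000):
    """Return (f, r) primer pairs closer than d once the deletion is removed.

    forward, reverse: primer coordinates on the reference genome (bp).
    deletion: (start, end) of the deleted interval, with start < end.
    d: assumed maximum amplifiable distance in bp.
    """
    start, end = deletion
    deleted_length = end - start
    pairs = []
    for f in forward:
        if f >= start:      # forward primer must lie upstream of the deletion
            continue
        for r in reverse:
            if r <= end:    # reverse primer must lie downstream of the deletion
                continue
            # Distance in the tumor genome: reference distance minus deleted length.
            if (r - f) - deleted_length < d:
                pairs.append((f, r))
    return pairs
```

With forward primers at 0, 1000, 2000 and reverse primers at 10000, 11000, 12000, a deletion spanning (2500, 9800) leaves three pairs within 2 kb of each other, while a 1 bp deletion leaves none.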

Goal

• Input: a collection of primer locations and matrices of primer interactions
– Forward/Forward, Forward/Reverse, Reverse/Reverse

• Identify a subset of primers that do not interact and are unique, maximizing the covered region

Algorithms for Optimizing the Cost

• Preprocessing
– Determining the pairs of primers that dimerize (edges in the graph)
– Filtering the primers to ensure “uniqueness”

• Simulated annealing
1. Start from an initial candidate set $P$, generated randomly or greedily.
2. List the neighboring sets $P'$ and compute $\Delta_s = C(P') - C(P)$.
3. Select step $s$ with a probability proportional to $e^{-\Delta_s / T}$.
4. Decrease the temperature $T$ and go to step 2.
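
The four annealing steps can be sketched as a short loop. This is a hedged illustration rather than the project's implementation: the name `anneal`, the geometric cooling schedule, the default parameters, and the bookkeeping of the best set seen so far are all assumptions, and `neighbors` and `cost` are caller-supplied functions.

```python
import math
import random


def anneal(initial, neighbors, cost, t_start=1000.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing over candidate primer sets.

    neighbors(P) returns a list of candidate sets; cost(P) returns a number.
    t_start, cooling and steps are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    current = frozenset(initial)
    best = current
    t = t_start
    for _ in range(steps):
        candidates = neighbors(current)
        if not candidates:
            break
        base = cost(current)
        deltas = [cost(c) - base for c in candidates]
        # Shift by the smallest delta so the exponentials cannot overflow;
        # a common factor leaves the relative probabilities unchanged.
        low = min(deltas)
        weights = [math.exp(-(delta - low) / t) for delta in deltas]
        # Step 3: pick a neighbor with probability proportional to exp(-delta / T).
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if cost(current) < cost(best):
            best = current
        t *= cooling  # Step 4: decrease the temperature.
    return best
```

Selecting among all neighbors by weight follows the wording of step 3; a Metropolis-style accept/reject of a single random neighbor is a common alternative that avoids evaluating every neighbor at each step.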

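Cost Function

• The cost function used takes coverage and dimerization into account

[Equation: cost function with terms labeled "Dimerization", "Coverage", and "Density"; only the following term is recoverable]

$C(u, v) = \max\{0,\ l_v - l_u - d\}$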
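
A cost of this kind might be computed as below. The sketch is an assumption-heavy illustration: only the per-pair gap term survives on the slide, so combining it with a dimerization count through a weight `w_p` is a guess at the overall shape, the "Density" term is omitted, and the names `cost`, `locations`, and `dimer_edges` are hypothetical.

```python
def cost(selected, locations, dimer_edges, d=2000, w_p=1000.0):
    """Illustrative cost: weighted dimer count plus uncovered gap length.

    selected: iterable of primer ids in the candidate set.
    locations: dict mapping primer id -> genomic coordinate (bp).
    dimer_edges: iterable of (u, v) pairs of primers that dimerize.
    d: assumed maximum amplifiable distance; w_p: assumed dimerization weight.
    """
    chosen = sorted(selected, key=lambda p: locations[p])
    chosen_set = set(chosen)
    # Dimerization term: every dimerizing pair with both primers selected.
    dimer_count = sum(1 for u, v in dimer_edges if u in chosen_set and v in chosen_set)
    # Coverage term: for adjacent selected primers, the part of the gap beyond d.
    gap_penalty = sum(
        max(0, locations[v] - locations[u] - d) for u, v in zip(chosen, chosen[1:])
    )
    return w_p * dimer_count + gap_penalty
```

For primers a, b, c at 0, 1500, 5000 with a and b dimerizing, selecting all three costs 1000 for the dimer plus 1500 for the b-to-c gap, while selecting only a and c costs 3000 for the single large gap.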

Simulated Annealing: Define Neighbors

• Approach 1:
– Set $w_p$ [value missing]
– $E$ is the edge set corresponding to dimerizing pairs
– Neighbors of $P$ are formed by adding a vertex $u$ to $P$ and removing all vertices dimerizing with $u$; i.e. $P' = (P \cup \{u\}) \setminus \{v : (u, v) \in E\}$ for some $u$

• Approach 2:
– $w_p$ [value missing]
– No hard constraint on dimerizing pairs.
– Neighbors of $P$ are obtained by adding or removing one vertex from $P$.
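
The two neighborhood definitions translate directly into set operations. This is a minimal sketch with hypothetical names (`neighbors_hard`, `neighbors_soft`); it assumes primers are hashable ids, that `E` is a list of undirected pairs, and that no primer dimerizes with itself.

```python
def neighbors_hard(P, vertices, E):
    """Approach 1: add a vertex u, then drop every vertex that dimerizes with u."""
    adjacency = {}
    for u, v in E:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    P = frozenset(P)
    return [
        frozenset((P | {u}) - adjacency.get(u, set()))
        for u in vertices
        if u not in P
    ]


def neighbors_soft(P, vertices):
    """Approach 2: add or remove exactly one vertex (symmetric difference)."""
    P = frozenset(P)
    return [P ^ {u} for u in vertices]
```

Starting from a dimer-free set, Approach 1 never produces a set containing a dimerizing pair, so the search stays feasible; Approach 2 can pass through infeasible sets and relies on the cost function to penalize them.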

ILP Formulation

$x_i$: indicator of primer $i$ being selected.

$q_{ij}$: indicator of candidate primer $i$ being immediately after primer $j$.

• Guaranteed optimality, but intractable for realistic problems

• Used here to assess the performance of simulated annealing

Bounds and Numerical Results

• A Weak Theoretical Upper Bound:
– Select all primers without dimerization constraints.
– For any two adjacent primers with distance $d' > d$, reduce the covered region by $d' - d$ bp.
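
The weak upper bound is cheap to compute. The sketch below assumes that the covered region is measured as the span from the first to the last primer minus the part of each adjacent gap that exceeds $d$; the slides do not define the covered region precisely, so that measure, the function name `weak_upper_bound`, and the default `d = 2000` are assumptions.

```python
def weak_upper_bound(locations, d=2000):
    """Upper bound on coverage when every primer is selected.

    locations: iterable of primer coordinates (bp).
    d: assumed maximum amplifiable distance between adjacent primers.
    """
    locs = sorted(locations)
    if len(locs) < 2:
        return 0
    span = locs[-1] - locs[0]
    # Each adjacent gap longer than d loses the excess (gap - d) bp of coverage.
    lost = sum(max(0, b - a - d) for a, b in zip(locs, locs[1:]))
    return span - lost
```

For primers at 0, 1000, 4000, 4500 the span is 4500 bp and the single 3000 bp gap loses 1000 bp, giving a bound of 3500 bp. Any dimer-free subset can only cover less, which is why the bound is weak but valid.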

Potential Improvements

• Improving the cost function formulation
– Incorporating multiplexing sets

• Find an efficient technique to solve the optimization problem.

• Improve on the analytical bound
– Consider the effect of dimerization within the forward/reverse primer set.

Pairwise cost function

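[Equation not recoverable: a sum indexed by $i$ over the forward primer set $F$ and by $j$ over the reverse primer set $R$, involving primer locations $l_i$, $l_j$ and the distance $d$]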

• Measures total possible number of sites that are uncovered given all forward and reverse primer combinations

Multiobjective cost function

• Taking coverage and multiplexing sets into account

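• Minimizing both objectives, and resolving the dimerization constraint, given a possible solution containing multiplexing sets $S$

$f_1 = \sum_i \max\{0,\ l_{i+1} - l_i - d\}$ (missed coverage)

$f_2 = |S|$ (sets)

$g(x) = 0$ [full expression not recoverable; the surviving symbols are $x$, $w$, pairs $(p_i, p_j) \in E$, and sets $S_k$]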

Using Fewer Integer Variables

• The formulation in the paper uses $n^2$ auxiliary variables, one for each pair of primers.
– $q_{ij} = 1$ if and only if primers $i$ and $j$ are selected as two consecutive primers in the candidate set.

• Complexity of ILP (or IQP) generally grows exponentially with the number of integer variables.

• In practice, the distance between two consecutive primers in the solution is not much larger than $d$, otherwise there would be a large gap in the covered region.

• Assume an upper bound $g$ on the distance between consecutive primers

• Introduce a variable $q_{ij}$ only if $l_i - l_j < g$

• The average number of variables is reduced to $n(1 + \rho g)$
– $\rho$ is the density of the primers in the initial set.
– The number of integer variables becomes $O(n)$.
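
The reduction can be checked on a concrete primer set by counting the pairs that would still receive a $q_{ij}$ variable. This is a small sketch under stated assumptions: the name `count_pair_variables` is hypothetical, primer locations are taken to be distinct integers, and primer $i$ is taken to lie downstream of primer $j$.

```python
from bisect import bisect_right


def count_pair_variables(locations, g):
    """Count ordered pairs (i, j) with 0 < l_i - l_j < g.

    locations: iterable of distinct integer primer coordinates (bp).
    g: assumed upper bound on the distance between consecutive primers.
    """
    locs = sorted(locations)
    total = 0
    for idx, l_i in enumerate(locs):
        # Index of the first primer with coordinate strictly greater than l_i - g.
        first = bisect_right(locs, l_i - g)
        # Primers from `first` up to idx - 1 lie upstream of i and within g of it.
        total += idx - first
    return total
```

For five primers at 0, 100, 200, 300, 1000 and $g = 250$, only 5 pair variables remain instead of the $n^2 = 25$ of the full formulation; adding the $n$ selection variables $x_i$ gives the $n(1 + \rho g)$ scaling quoted above.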