Analysis of Chromosome Aberration Frequencies using Algebraic Statistics
Serkan Hosten
Mathematics Department, San Francisco State University
May 19, 2014
Serkan Hosten Testing Proximity of Chromosome Territories
Motivation
Definition
A chromosome territory is a discrete region within the cell nucleus that is occupied by a chromosome.
Figure: Representation of Chromosome Territories. (Bolzer et al. 2005)
Motivation
Chromosome positioning within the cell nucleus has been shown to influence
gene expression,
DNA damage processing,
genetic diseases and cancer.
Where? and How?: poorly understood.
Motivation
Figure: Red indicates the CT of chromosome 18 (gene-poor) and green that of chromosome 19 (gene-rich) in the nucleus of a non-stimulated human lymphocyte. Image courtesy of M. Cremer and I. Solovei.
In general,
chromosomes {1, 19, 22, 17} are located close to the center of the nucleus;
chromosomes {18, 2, 4, 13} are located closer to the boundary of the nucleus.
Chromosome Aberrations
Application of ionizing radiation to the cell nucleus during the G0/G1 phase of the cell cycle causes double-strand breaks in DNA, which promote chromosome aberrations and rearrangements of pieces.
Definition
An interchange is an exchange-type chromosome aberration involving two different (i.e. non-homologous) chromosomes.
Chromosome Aberrations
To study chromosome interchanges, the multicolor fluorescence in situ hybridization (mFISH) method is used.
Figure: In this mFISH image each chromosome is labeled in a different color; the chromosome exchanges are marked by white arrows.
Proximity-Effect Hypothesis
Definition
The proximity-effect hypothesis states that chromosomes in close juxtaposition are more likely to have a higher yield of interchanges.
Data
Chr 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Sum
1 44 38 42 29 26 29 18 39 29 25 18 15 18 34 31 22 12 14 22 9 27 541
2 43 37 32 30 24 25 29 16 24 30 29 9 26 8 24 8 7 12 13 15 485
3 21 31 32 24 21 26 23 25 23 21 18 18 19 21 11 17 11 12 10 465
4 23 27 28 24 26 20 13 19 23 22 20 16 18 11 6 12 10 7 425
5 17 31 26 25 24 30 25 25 15 19 8 19 13 7 16 7 4 426
6 18 22 21 31 13 30 18 15 19 14 15 13 10 9 8 7 395
7 20 20 17 28 25 13 18 8 18 23 11 9 19 6 7 396
8 13 12 24 11 25 15 16 12 16 17 4 9 7 8 345
9 21 25 7 23 23 27 20 15 22 8 9 7 10 416
10 18 21 14 14 10 19 14 9 5 11 7 3 338
11 25 5 15 16 19 15 8 10 12 3 11 364
12 9 16 9 12 16 8 13 10 5 5 337
13 29 10 10 7 16 5 6 7 9 319
14 22 13 6 10 2 6 13 11 310
15 22 13 9 7 11 7 9 332
16 12 15 12 20 8 13 321
17 5 4 11 5 10 291
18 2 11 9 3 223
19 6 0 8 156
20 7 10 240
21 6 156
22 193
Table: Chromosome interchanges for 3585 lymphocyte cells. Each entry f(j, k) holds the number of cells in which at least one exchange between chromosomes j and k was recorded (Arsuaga et al. 2004). Row j lists the entries for k = j + 1, ..., 22; Sum is the total over all pairs involving chromosome j.
Log-Linear Model for Chromosome Interchange
Since 22 different colors were used to mark the chromosomes involved, we have $\binom{22}{2} = 231$ heterologous autosome pairs.
Based on the possible outcomes of this experiment, we can define a random variable X ∈ {(j, k) : 1 ≤ j < k ≤ 22} which takes 231 values corresponding to all possible colored pairs.
Log-Linear Model for Chromosome Interchange
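To model no proximity effect, consider $\varphi : \mathbb{R}^{22} \to \mathbb{R}^{231}$ defined by
$$\varphi(\vartheta_1, \dots, \vartheta_{22}) = (\vartheta_i \vartheta_j)_{1 \le i < j \le 22} \in \mathbb{R}^{231},$$
where $\vartheta_i \vartheta_j = p_{i,j}$ is the probability of chromosomes $i$ and $j$ interacting.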
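The map above can be sketched in a few lines of Python. This is only an illustration: the function name `no_proximity_map` and the toy parameter values are assumptions, not part of the slides, and the final rescaling step is added here so that the toy weights sum to one.

```python
from itertools import combinations
from math import comb

def no_proximity_map(theta):
    """Return {(i, j): theta_i * theta_j} for all pairs i < j (1-based labels)."""
    n = len(theta)
    return {(i, j): theta[i - 1] * theta[j - 1]
            for i, j in combinations(range(1, n + 1), 2)}

# Number of heterologous pairs for 22 autosomes.
n_pairs = comb(22, 2)

# Toy parameters (hypothetical values, not fitted to any data).
theta = [0.5, 0.4, 0.3, 0.2]
weights = no_proximity_map(theta)

# Rescale so that the pair probabilities sum to one.
total = sum(weights.values())
probs = {pair: w / total for pair, w in weights.items()}
```

With 22 parameters the same function returns one value per colored pair, i.e. 231 coordinates.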
Goodness of Fit Test
Consider the hypothesis testing problem
$$H_0 : p \in M \quad \text{versus} \quad H_1 : p \notin M$$
using
$$\chi^2(F) = \sum_{1 \le j < k \le 22} \frac{\left(F(j,k) - \hat{f}(j,k)\right)^2}{\hat{f}(j,k)},$$
where $\hat{f}(j,k)$ are the entries of the maximum likelihood estimate (MLE) table with respect to the no-proximity-effect model given the data table $f$.
To compute the p-value of the test, observe that
we cannot use the asymptotic test, as the data table has small counts of exchanges for some chromosome pairs;
we cannot use the Fisher exact test, as it is too computationally expensive: it requires enumeration of all 22 × 22 tables preserving the minimal sufficient statistics.
Use Markov chain Monte Carlo (MCMC).
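As a small worked example of the statistic, the sketch below evaluates the chi-square sum on three cells only, using the rounded observed and MLE values printed in the tables of these slides for the pairs (1, 2), (1, 3) and (2, 3). The function name `chi_square` and the dictionary layout are assumptions made for illustration; the full statistic sums over all 231 pairs.

```python
def chi_square(table, expected):
    """Pearson statistic: sum over pairs of (F - f_hat)^2 / f_hat."""
    return sum((table[pair] - expected[pair]) ** 2 / expected[pair]
               for pair in expected)

# Three cells only, with the rounded values shown in the slides.
observed = {(1, 2): 44, (1, 3): 38, (2, 3): 43}
fitted = {(1, 2): 47.0, (1, 3): 43.0, (2, 3): 37.0}

# Contribution of these three cells: 9/47 + 25/43 + 36/37.
partial_stat = chi_square(observed, fitted)
```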
MLE Table
Chr 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Sum
1 47 43 38 37 33 33 27 34 26 28 25 23 22 24 23 20 14 8.9 15 8.8 11 541
2 37 32 32 29 29 24 30 23 25 22 21 20 22 21 18 13 8.3 14 8.2 11 485
3 30 30 27 27 23 28 22 24 22 20 19 21 20 18 13 8.2 14 8.2 11 465
4 27 24 24 21 26 20 22 20 18 18 19 18 16 12 7.8 13 7.8 10 425
5 24 24 21 26 20 22 20 19 18 19 19 17 12 8 13 8 10 426
6 22 19 24 19 20 19 17 17 18 17 16 12 7.7 12 7.7 9.7 395
7 19 24 19 20 19 17 17 18 17 16 12 7.9 13 7.9 9.9 396
8 20 16 18 16 15 15 16 15 14 11 7.2 11 7.2 9 345
9 20 21 20 18 18 19 18 17 13 8.6 13 8.5 11 416
10 17 16 15 15 16 15 14 11 7.3 11 7.4 9.2 338
11 17 16 16 17 16 15 11 8 12 8 9.9 364
12 15 15 16 15 14 11 7.6 12 7.6 9.4 337
13 14 15 15 13 10 7.4 11 7.4 9.1 319
14 15 14 13 10 7.4 11 7.4 9 310
15 15 14 11 7.9 12 7.9 9.6 332
16 14 11 7.9 11 7.9 9.5 321
17 10 7.4 11 7.4 8.9 291
18 6 8.6 6 7.3 223
19 6.4 4.6 5.6 156
20 6.6 7.8 240
21 5.7 156
22 193
Table: MLE table under the no-proximity-effect model.
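The slides do not say how the MLE table was computed. One possible approach, sketched here as an assumption, uses the fact that for a log-linear model the fitted table must reproduce the sufficient statistics, here the per-chromosome totals in the Sum column. The cyclic update below solves those equations one parameter at a time; the function names and the toy margins are hypothetical.

```python
def fit_no_proximity(margins, sweeps=10000, tol=1e-12):
    """Find theta with sum over k != j of theta_j * theta_k equal to margins[j].

    Cyclic coordinate update: theta_j <- margins[j] / sum_{k != j} theta_k.
    """
    n = len(margins)
    scale = sum(margins) ** 0.5
    theta = [m / scale for m in margins]  # rough starting point
    for _ in range(sweeps):
        change = 0.0
        for j in range(n):
            new = margins[j] / (sum(theta) - theta[j])
            change = max(change, abs(new - theta[j]))
            theta[j] = new
        if change < tol:
            break
    return theta

def fitted_table(theta):
    """Fitted counts theta_j * theta_k for all pairs j < k (1-based labels)."""
    n = len(theta)
    return {(j + 1, k + 1): theta[j] * theta[k]
            for j in range(n) for k in range(j + 1, n)}

# Toy margins generated from theta = (1, 2, 3, 4); not data from the slides.
margins = [9.0, 16.0, 21.0, 24.0]
theta = fit_no_proximity(margins)
table = fitted_table(theta)
```

For the chromosome data, `margins` would be the 22 values of the Sum column of the data table.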
Markov Basis for the Second Hypersimplex
The model M is the toric model of the second hypersimplex ∆[2, 22], given by the matrix A[2, 22] whose columns are $e_j + e_k$ with 1 ≤ j < k ≤ 22.
A quadratic lexicographic Gröbner basis for the toric ideal of A[2, n] has been given by De Loera, Sturmfels, and Thomas (1995).
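To illustrate what a quadratic move looks like on a table, the sketch below applies one move of the form +{i, j} +{k, l} −{i, k} −{j, l} for four distinct labels and checks that the per-label totals are unchanged. That moves of this shape suffice is taken from the quadratic Gröbner basis statement above; the helper names and the random toy table are assumptions.

```python
import random
from itertools import combinations

def pair(a, b):
    """Store each unordered pair with the smaller label first."""
    return (a, b) if a < b else (b, a)

def margins(table, n):
    """Sufficient statistics: total count of pairs involving each label."""
    s = [0] * (n + 1)
    for (j, k), count in table.items():
        s[j] += count
        s[k] += count
    return s[1:]

def apply_quadratic_move(table, i, j, k, l):
    """Add one to cells {i,j} and {k,l}; subtract one from {i,k} and {j,l}."""
    moved = dict(table)
    moved[pair(i, j)] += 1
    moved[pair(k, l)] += 1
    moved[pair(i, k)] -= 1
    moved[pair(j, l)] -= 1
    return moved

n = 5
rng = random.Random(0)
# Toy table with entries between 1 and 9 (not data from the slides).
table = {p: rng.randint(1, 9) for p in combinations(range(1, n + 1), 2)}
i, j, k, l = rng.sample(range(1, n + 1), 4)
moved = apply_quadratic_move(table, i, j, k, l)
```

Each of the four labels gains one count and loses one count, so `margins(moved, n)` equals `margins(table, n)`.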
Results of goodness of fit test
Based on the results of the Metropolis-Hastings algorithm, which generated 1,000,000 tables (with 30,000 steps between each "pick"), we could not reject the no-proximity-effect model (p ≈ 1).
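The sampler itself is not shown in the slides. A minimal sketch of how such a test can be set up is given below, under several assumptions: quadratic swap moves on four distinct labels, a target distribution proportional to 1 / ∏ F(j, k)! on tables with the observed margins, and a small toy table in place of the real data. All function names and default values are hypothetical.

```python
import random
from itertools import combinations

def pair(a, b):
    return (a, b) if a < b else (b, a)

def margins(table, n):
    """Count of interchanges involving each label 1..n."""
    s = [0] * (n + 1)
    for (j, k), count in table.items():
        s[j] += count
        s[k] += count
    return s[1:]

def fit_expected(observed, n, sweeps=500):
    """Fitted counts theta_j * theta_k whose margins match the observed ones."""
    s = [0.0] + [float(m) for m in margins(observed, n)]
    scale = sum(s) ** 0.5
    theta = [m / scale for m in s]  # theta[0] stays 0 and is unused
    for _ in range(sweeps):
        for j in range(1, n + 1):
            theta[j] = s[j] / (sum(theta) - theta[j])
    return {(j, k): theta[j] * theta[k] for (j, k) in observed}

def chi_square(table, expected):
    return sum((table[p] - expected[p]) ** 2 / expected[p] for p in expected)

def mcmc_p_value(observed, n, samples=2000, thin=20, seed=0):
    """Estimate P(chi2(F) >= chi2(observed)) over tables with fixed margins."""
    rng = random.Random(seed)
    expected = fit_expected(observed, n)
    threshold = chi_square(observed, expected)
    current = dict(observed)
    hits = 0
    for _ in range(samples):
        for _ in range(thin):
            i, j, k, l = rng.sample(range(1, n + 1), 4)
            up = (pair(i, j), pair(k, l))
            down = (pair(i, k), pair(j, l))
            if current[down[0]] == 0 or current[down[1]] == 0:
                continue  # the move would create a negative entry
            # Ratio of target weights for the proposed and current tables.
            ratio = (current[down[0]] * current[down[1]]) / (
                (current[up[0]] + 1) * (current[up[1]] + 1))
            if rng.random() < min(1.0, ratio):
                for p in up:
                    current[p] += 1
                for p in down:
                    current[p] -= 1
        if chi_square(current, expected) >= threshold - 1e-9:
            hits += 1
    return hits / samples, current

# Toy 5-label table (not data from the slides).
counts = [4, 3, 5, 2, 6, 1, 3, 2, 4, 5]
observed = dict(zip(combinations(range(1, 6), 2), counts))
p_value, final = mcmc_p_value(observed, n=5)
```

The `thin` argument plays the role of the steps between picks; the values used in the slides (1,000,000 tables, 30,000 steps) are far larger than these toy defaults.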
MLE and Experiment Data
We noticed that the differences between the expected numbers of interchanges and the observed ones are larger for some chromosome pairs.
Chr 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
1 -3.2 -5.5 4.2 -8.2 -7.2 -3.8 -9 5 3.3 -2.9 -7 -8.1 -4 10 8.4 2.2 -2 5.1 6.8 0.2 16
2 6.2 4.6 -0.1 1.2 -4.6 1.1 -0.9 -6.9 -0.9 7.6 8.2 -11 4.4 -13 5.9 -4.9 -1.3 -2.1 4.8 4.3
3 -9.4 0.8 4.7 -3.2 -1.8 -2.4 1.1 1.2 1.4 0.9 -1.3 -2.8 -0.9 3.4 -1.7 8.8 -2.8 3.8 -0.52
4 -3.9 2.5 3.6 3.3 0.3 0.0 -8.7 -0.8 4.5 4.2 0.8 -2.4 1.7 -0.9 -1.8 -0.9 2.2 -3
5 -7.5 6.6 5.2 -0.6 3.9 8.2 5.1 6.4 -3 -0.4 -11 2.5 0.9 -1 2.9 -1.0 -6.2
6 -4.4 2.8 -2.5 12 -7.2 11 0.6 -1.8 0.9 -3.4 -0.5 1.5 2.3 -3.5 0.3 -2.7
7 0.7 -3.5 -1.7 7.7 6.4 -4.5 1.1 -10 0.5 7.3 -0.7 1.1 6.3 -1.9 -2.9
8 -7.2 -4.3 6.4 -5.2 9.7 0.1 0.0 -3.5 2 6.5 -3.2 -2.4 -0.2 -1
9 1.2 3.7 -13 4.5 5.1 7.9 1.6 -1.6 9.5 -0.6 -4.5 -1.5 -0.7
10 0.8 5 -1.1 -0.7 -5.8 3.7 0.1 -1.6 -2.3 -0.4 -0.4 -6.2
11 7.8 -11 -0.9 -1.0 2.6 0.1 -3.4 2 -0.3 -5 1.1
12 -6.2 1.2 -6.8 -3.3 2 -2.8 5.4 -1.6 -2.6 -4.4
13 15 -5 -4.6 -6.4 5.6 -2.4 -5.2 -0.4 -0.1
14 7.3 -1.3 -7.1 -0.2 -5.4 -5 5.6 2
15 6.8 -0.9 -2 -0.9 -0.7 -0.9 -0.6
16 -1.6 4.2 4.1 8.5 0.1 3.5
17 -5 -3.4 0.3 -2.4 1.1
18 -4 2.4 3 -4.3
19 -0.4 -4.6 2.4
20 0.4 2.2
21 0.4
Table: Deviation Between Observed and MLE Counts.
Modified Log-Linear Model
To examine the effect of pairwise interaction for only one pair of chromosomes {r, s}, consider the following map:
$$\varphi : \mathbb{R}^{22} \to \mathbb{R}^{231}$$
defined as
$$\varphi(\vartheta_1, \dots, \vartheta_{22}) = \vartheta_i \vartheta_j \alpha_{ij} = p_{ij} \tag{1}$$
where
$$\alpha_{ij} = \begin{cases} \alpha & \text{if } i = r \text{ and } j = s, \\ 1 & \text{otherwise.} \end{cases}$$
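A short sketch of the modified map, with hypothetical names and toy values, shows how the single factor α changes only the cell of the chosen pair and how α = 1 gives back the no-proximity values.

```python
from itertools import combinations

def modified_map(theta, r, s, alpha):
    """Return {(i, j): theta_i * theta_j * alpha_ij}, alpha_ij = alpha only at {r, s}."""
    n = len(theta)
    target = (min(r, s), max(r, s))
    return {(i, j): theta[i - 1] * theta[j - 1] * (alpha if (i, j) == target else 1.0)
            for i, j in combinations(range(1, n + 1), 2)}

# Toy parameters (hypothetical values, not fitted to any data).
theta = [0.5, 0.4, 0.3, 0.2]
boosted = modified_map(theta, r=1, s=4, alpha=2.0)
baseline = modified_map(theta, r=1, s=4, alpha=1.0)
```

Here `boosted` and `baseline` agree on every pair except (1, 4), where the boosted value is twice the baseline.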
Log-Ratio Test
Let $M_0$ be our original log-linear model and $M$ the modified model. Observe that, by construction, the original log-linear model is nested within the modified model.
We performed the following test:
$$H_0 : p \in M_0 \quad \text{versus} \quad H_A : p \in M \setminus M_0$$
The likelihood ratio test statistic is:
$$G^2 = 2 \sum_{i<j} \hat{f}(i,j) \log\left(\frac{\hat{f}(i,j)}{\hat{f}_0(i,j)}\right)$$
where $\hat{f}(i,j)$ is the MLE under the assumption that $M$ is valid and $\hat{f}_0(i,j)$ is the MLE under the assumption that $M_0$ is valid.
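The statistic is straightforward to evaluate once both fitted tables are available. The sketch below computes G² from two tables and converts it to a p-value with the chi-square distribution on one degree of freedom, since the two models differ by the single parameter α. The slides do not state how the p-values were obtained, so the chi-square reference distribution is an assumption here, and the two small tables are arbitrary numbers chosen only to show the arithmetic.

```python
import math

def g_squared(fit_full, fit_null):
    """G^2 = 2 * sum of f_hat * log(f_hat / f0_hat) over cells with f_hat > 0."""
    return 2.0 * sum(v * math.log(v / fit_null[p])
                     for p, v in fit_full.items() if v > 0)

def chi2_sf_1df(x):
    """Upper tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Arbitrary illustrative tables (hypothetical, not fitted to any data).
fit_null = {(1, 2): 10.0, (1, 3): 10.0, (2, 3): 10.0}
fit_full = {(1, 2): 14.0, (1, 3): 8.0, (2, 3): 8.0}

stat = g_squared(fit_full, fit_null)
p_value = chi2_sf_1df(stat)
```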
Results of Log-Ratio Test
p-value by data table
Chromosome pair   Cornforth et al. 2001   Lucas et al. 2002   Arsuaga et al. 2004
{1, 22}           0.001                   0.007               0.001
{1, 16}           1.000                   1.000               0.104
{2, 15}           0.800                   0.500               0.102
{3, 19}           1.000                   1.000               0.161
{6, 10}           1.000                   1.000               0.169
{6, 12}           1.000                   0.162               0.284
{8, 13}           0.589                   1.000               0.535
{9, 18}           1.000                   1.000               0.342
{9, 13}           0.120                   1.000               1.000
{11, 12}          1.000                   1.000               0.316
{13, 14}          0.879                   1.000               0.007
{16, 20}          0.918                   1.000               0.535
Table: p-values, adjusted with the Bonferroni correction, for the log-ratio test of the modified model against the original model, based on all three observed data sets.
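For reference, the Bonferroni correction multiplies each raw p-value by the number of tests and caps the result at 1, which is why many entries in the table equal 1.000. The sketch below uses made-up raw p-values; the number of tests is set to 12 only because twelve pairs are listed, and the count actually used for the table is not stated in the slides.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests and cap at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values, not taken from the slides.
raw = [0.0001, 0.01, 0.2]
adjusted = bonferroni(raw, n_tests=12)
```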
Conclusion
1 We were not able to reject the proposed log-linear model of no proximity effect (p-value of 1) for the data set published in Arsuaga et al. 2004.
2 We considered a modified log-linear model with a proximity factor for one chromosome pair of interest. For chromosome pairs {1, 22} and {13, 14}, the original model was rejected in favor of the modified model (adjusted p-values 0.001 and 0.007 on the Arsuaga et al. 2004 data).