
TOXICOLOGICAL SCIENCES 115(1), 41–50 (2010)

doi:10.1093/toxsci/kfq027

Advance Access publication January 27, 2010

A Comprehensive Haplotype Analysis of the XPC Genomic Sequence Reveals a Cluster of Genetic Variants Associated with Sensitivity to

Tobacco-Smoke Mutagens

Catherine M. Rondelli,* Randa A. El-Zein,† Jeffrey K. Wickliffe,‡ Carol J. Etzel,† and Sherif Z. Abdel-Rahman*,1

*Department of Obstetrics and Gynecology, The University of Texas Medical Branch, Galveston, Texas 77555; †Department of Epidemiology, MD Anderson

Cancer Center, Houston, Texas 77030; and ‡Department of Environmental Health Sciences, Tulane University Health Sciences Center, School of Public Health

and Tropical Medicine, New Orleans, Louisiana 70112

1To whom correspondence should be addressed at Department of Obstetrics and Gynecology, The University of Texas Medical Branch, 11.104 A, Medical

Research Building, Galveston, TX 77555-1062. Fax: (409) 772-2261. E-mail: [email protected].

Received December 16, 2009; accepted January 22, 2010

The impact of single-nucleotide polymorphisms (SNPs) of the

DNA repair gene XPC on DNA repair capacity (DRC) and

genotoxicity has not been comprehensively determined. We

constructed a comprehensive haplotype map encompassing all

common XPC SNPs and evaluated the effect of Bayesian-inferred

haplotypes on DNA damage associated with tobacco smoking,

using chromosome aberrations (CA) as a biomarker. We also used

the mutagen-sensitivity assay, in which mutagen-induced CA in

cultured lymphocytes are determined, to evaluate the haplotype

effects on DRC. We hypothesized that if certain XPC haplotypes

have functional effects, a correlation between these haplotypes

and baseline and/or mutagen-induced CA would exist. Using

HapMap and single nucleotide polymorphism (dbSNP) databases,

we identified 92 SNPs, of which 35 had minor allele frequencies

≥ 0.05. Bayesian inference and subsequent phylogenetic analysis

identified 21 unique haplotypes, which segregated into six distinct

phylogenetically grouped haplotypes (PGHs A–F). A SNP tagging

approach identified 11 tagSNPs representing these 35 SNPs

(r² = 0.80). We utilized these tagSNPs to genotype a population of

smokers matched to nonsmokers (n = 123). Haplotypes for each

individual were reconstituted and PGH designations were

assigned. Relationships between XPC haplotypes and baseline

and/or mutagen-induced CA were then evaluated. We observed

significant interaction between smoking and PGH-C (p = 0.046) for

baseline CA where baseline CA was 3.5 times higher in smokers

compared to nonsmokers. Significant interactions between smoking

and PGH-D (p = 0.023) and PGH-F (p = 0.007) for mutagen-induced

CA frequencies were also observed. These data indicate

that certain XPC haplotypes significantly alter CA and DRC in

smokers and, thus, can contribute to cancer risk.

Key Words: DNA nucleotide excision repair; XPC gene;

polymorphism; haplotypes; biomarkers; chromosome; smoking;

cancer.

Smoking is associated with a high risk of cancer in many

organs (IARC, 1986). Not all smokers, however, develop cancer,

which clearly indicates a significant interindividual variation in

metabolism of tobacco carcinogens and in repair of the resulting

genetic damage (Liu et al., 2005). In fact, studies have

consistently shown a significant association between reduced

DNA repair capacity (DRC) and increased risk of tobacco-related

cancers (Shen et al., 2003; Zhu et al., 2007). Nucleotide

excision repair (NER) is the major DNA repair pathway that

removes genetic damage resulting from exposure to many

tobacco carcinogens (Friedberg, 2001). An important protein in

this pathway is the xeroderma pigmentosum complementation

group C (XPC) protein, which plays a key role as a part of the

DNA damage–recognition complex (Araki et al., 2001). XPC is

the only protein in this complex that directly binds to the damaged

DNA (Park and Choi, 2006) to initiate the NER process through

the recruitment of other proteins, including xeroderma pigmentosum

complementation group A (XPA), transcription factor II H

(TFIIH), xeroderma pigmentosum complementation group G

(XPG), and replication protein A (RPA) (Bunick et al., 2006).

The XPC gene spans 33 kb and encodes a 940 amino acid

protein (Genbank accession No. AC090645). XPC is highly

polymorphic, with many single-nucleotide polymorphisms

(SNPs) in the exonic region and the intronic, 3′ and 5′ untranslated regions (UTRs), including the promoter region.

Only a few of these SNPs, namely the exon 16 variant K939Q

(rs2228001), exon 8 variant A499V (rs2228000), intron 11–5

splice site C/A (rs3729587), and intron 9 PolyAT insertion,

have been studied as potential modifiers of cancer risk in

humans. Many epidemiological studies have shown associations

between these SNPs and risk for human cancer for many

organs (e.g., An et al., 2007; Guo et al., 2008; Hansen et al., 2007; Zhu et al., 2007).

Over 90 SNPs in the XPC gene have been reported in the

International HapMap Project (www.hapmap.org) and

The authors certify that all research involving human subjects was done

under full compliance with all government policies and the Helsinki

Declaration.

© The Author 2010. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For permissions, please email: [email protected]


the National Center for Biotechnology Information (NCBI)

single nucleotide polymorphism (dbSNP) (www.ncbi.nlm.nih.gov/projects/SNP)

databases. The phenotypic and/or functional

effects of these SNPs have not yet been characterized,

including their impact on DNA damage response and DRC.

Analysis of the potential effect of each of these SNPs on

disease risk, or evaluation of their individual phenotypic

effects, is certainly impractical. However, it is well known that

genetic variation in human populations is not arrayed simply as

independent SNPs but, rather, as various combinations of SNPs

or “haplotypes”. This is because some of the individual SNPs,

often those located in close proximity to one another, are

correlated and exist in degrees of linkage disequilibrium (LD).

This creates identifiable haplotypes, comprising several SNPs

(Gabriel et al., 2002). Therefore, the phenotypic effects of

haplotypes, rather than that of individual SNPs, should be

examined in studies designed to determine the role of genetic

variability in relation to disease outcome. A practical approach to

achieve this goal would be to identify subsets of SNPs that

accurately identify haplotypes. Such SNPs could be identified

using a “tagging SNPs (tagSNPs)” strategy (Johnson et al., 2001). Thus, a subset of all SNPs (i.e., tagSNPs) in a given gene

region, highly correlated with other SNPs, could then be selected

for analysis, significantly reducing the volume of genotyping

needed. This approach is biologically more plausible, and more

comprehensive, since it involves the evaluation of effects of

multiple SNPs that could jointly influence disease outcome.

To our knowledge, a comprehensive haplotype analysis of

the entire XPC genomic sequence has not been conducted.

Furthermore, an evaluation of the functional effects of the XPC haplotypes, with regard to their effect on DRC, has not yet

been pursued. In the current study, we constructed a comprehensive

haplotype map encompassing SNPs of the XPC gene that are

reported to exist with a minor allele frequency (MAF) ≥ 0.05 in

the general population. We hypothesized that if certain XPC haplotypes have phenotypic or functional effects, there would be

a correlation between these haplotypes and genetic damage in

individuals exposed to environmental carcinogens, such as those

found in tobacco smoke. Genetic damage was evaluated in our

study population using chromosome aberrations (CA) as a biomarker

since increased frequency of CA in circulating peripheral

blood lymphocytes (PBLs) is considered an indication of

increased cancer risk (Bonassi et al., 2000; Hagmar et al., 1998). In addition, we used the mutagen-sensitivity assay, in

which CA frequency is determined following exposure of

cultured PBLs to a known mutagen. This is a biomarker that

serves as an indirect measure for DRC and as an intermediate

phenotype for cancer risk (Hsu et al., 1991; Spitz et al., 1995).

MATERIALS AND METHODS

Study subjects and blood collection. The study protocol was approved by

the University of Texas Medical Branch (UTMB) Institutional Review Board.

All study subjects signed a written consent form that described the purpose of the

study. A total of 123 White non-Hispanic subjects participated in this study.

They were a subset of a larger cohort recruited without regard

to age, sex, or ethnicity from the smoking and nonsmoking staff and student

population of UTMB in Galveston, TX. This cohort is composed of individuals

who had responded to posted notices and advertisements requesting volunteers

for studies aimed at understanding the functional and biological significance of

sequence variability in DNA repair genes. Participation in this study was open to

White non-Hispanics only to avoid potential problems with admixtures when

developing the tagSNPs analysis. TagSNPs are not applicable to all races/

ethnicities, and separate sets of tagSNPs would need to be developed for each

ethnic/racial group. White non-Hispanics are accurately represented in HapMap

by the CEPH population (Utah residents with ancestry from northern and western

Europe; abbreviated and thereafter referred to as CEU).

Individuals were defined as nonsmokers if they had smoked less than 100

cigarettes during their lifetime. Individuals were defined as current smokers if

they had smoked at least five cigarettes per day for at least 1 year prior to

enrollment in the study. Smokers (n = 62) were matched to nonsmokers

(n = 61) based on age (± 5 years) and sex. Participants were asked to fill out

a questionnaire that provided demographic, occupational, and medical

information. Also collected was information regarding smoking habits,

including number of cigarettes per day, preferred brand, duration of smoking,

former tobacco use, and use of other tobacco products. Exclusion criteria for all

volunteers included a recent acute viral or bacterial infection; a major chronic

illness, such as cancer or an autoimmune disorder; a recent blood transfusion;

treatment with mutagenic agents, such as chemotherapeutic drugs or radiation;

excessive alcohol consumption, defined as more than a 10 g serving per day (as

determined by nationwide standard practices); and employment involving

exposure to potentially mutagenic agents. Because of these criteria, only

apparently healthy volunteers were included in the study to control for potential

confounders. A blood sample (10 ml) was obtained from each volunteer for

genotype analysis and cytogenetic cultures.

Identification of tagSNPs. The HapMap Data Release 22 phase II

assembly (HapMap online database at www.hapmap.org data release 22 phase

II NCBI assembly B36 dbSNP b126) was used as the source of genotypes for

this study. Genotypes for all SNPs reported in the genomic region

encompassing XPC were obtained from the International HapMap Project

database representing the CEU population. The CEU population sample is the

one that is the most ethnically similar to our sample of self-reported White non-

Hispanic subjects from UTMB who were evaluated in this study. We examined

2 kb of the 5′ UTR of XPC since this region contains elements controlling XPC

gene expression, and we also examined the entire gene region and 2 kb of the

3′ UTR. Genotypes for the CEU population were screened using Haploview

ver. 4.1 to ensure that only SNPs with a MAF of 0.05 or greater were used

in the subsequent haplotype inference. Next, we used Tagger software

(www.broad.mit.edu/mpg/tagger) to identify tagSNPs for assay design and

subsequent haplotype determination. Specifically, we used an aggressive

multimarker approach (up to six markers) restricted to SNPs with a MAF ≥ 0.05. We conservatively set the r² threshold to ≥ 0.8 (mean value 0.971) and

used a logarithm of odds score for estimating a recombination frequency

heterogeneity threshold of 2.
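The filtering and tagging logic described above can be illustrated with a minimal sketch. It is not the Tagger multimarker algorithm: it assumes phased haplotypes given as equal-length strings, uses only pairwise r², and selects tags greedily. The function names and the toy data in the usage note are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the less common allele at a biallelic SNP."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # monomorphic site
    return min(counts.values()) / len(alleles)


def r_squared(site_a, site_b):
    """Pairwise r^2 between two biallelic SNPs observed on phased haplotypes."""
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(x == b_ref for x in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, maf_min=0.05, r2_min=0.8):
    """Greedy pairwise tagging (a simplification, not Tagger itself)."""
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    common = [i for i in range(n_sites)
              if minor_allele_frequency(columns[i]) >= maf_min]
    covers = {
        i: {j for j in common
            if i == j or r_squared(columns[i], columns[j]) >= r2_min}
        for i in common
    }
    untagged, tags = set(common), []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags
```

With the toy haplotypes `["AAC", "AAC", "GGC", "GGT", "AAT", "GGT"]`, the first two sites are perfectly correlated, so one of them tags the other and the third site must be genotyped on its own.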

Genotyping of tagSNPs. Custom-designed real-time PCR-based assay kits

using the TaqMan chemistry from Applied Biosystems (Foster City, CA) were

used for genotyping tagSNPs. Each kit was developed to our specifications using

fluorescent probes that were designed to anneal to the designated SNP,

dependent on its sequence as determined from the reference SNP (rs) number

designated for that SNP in the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/).

Allele-specific probes were labeled with either the FAM or the

VIC fluorophore and an appropriate quencher. The PCR consisted of TaqMan

universal master mix, template DNA, and target-assay mix in a total reaction

volume of 12 µl at concentrations recommended by Applied Biosystems.

Thermal cycling was carried out in our laboratory on an MJ Research DNA

Engine thermocycler (from a subsidiary of BioRad Labs) equipped with

a computerized BioRad Chromo4 real-time PCR detection system (Hercules,

CA), under recommended conditions (50°C, 2 min; 95°C, 10 min; and 40 cycles

at 95°C for 15 s and 58–61°C for 1 min). Designation of referent and

polymorphic forms was determined by the FAM to VIC ratio. For quality

control, all PCRs were run in duplicate, and, along with no-template negative

controls, positive controls for each possible genotypic combination were

included when possible. Samples were coded for case-control status so that the

operator interpreting the results was blinded to the smoking status of the subject.

Samples from smokers and nonsmokers were run together in mixed batches, and

10% of the samples were randomly selected and subjected to repeat analysis, as

another quality-control measure for verification of genotyping results.

Additionally, genotypes for all tagSNPs were analyzed for deviations from

Hardy-Weinberg equilibrium (HWE) on a locus-by-locus basis using two

methods implemented in LD Analyzer ver. 1.0. The first method is a standard

two-sided Pearson chi-squared test and is rapid and computationally simple. The

second method relies on a Monte Carlo permutation–based exact test to

estimate deviations from HWE. Any SNP failing the test was excluded from

the study as an added quality-control measure.
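As an illustration of the first of the two HWE checks, the following sketch computes a Pearson chi-squared statistic with one degree of freedom from genotype counts. It is not the LD Analyzer implementation, and the Monte Carlo exact test is not shown; the function name and argument order are assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-squared test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Genotype counts that match the expected proportions exactly (for example 25, 50, 25) give a statistic of zero, whereas a complete absence of heterozygotes produces a very small p value.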

Construction of XPC haplotypes and phylogenetic analysis. Our strategy

consisted of first using the HapMap data on the CEU population as a resource

to infer possible XPC haplotypes. These inferred haplotypes were then used to

develop a tagSNPs panel. These tagSNPs were subsequently used to genotype

our study participants. Based on the genotyping results, individuals were then

assigned to haplotypes corresponding to those inferred from the CEU

population. Haplotypes were inferred from the CEU population, using Bayesian

statistics implemented in PHASE ver. 2.1 software (www.stat.washington.edu/stephens/phase.html),

formatting the input file to account for the family trios

comprising the CEU sample. The number of iterations was increased to 10,000,

the thinning interval was increased to 10, and the burn-in was increased to 200

to improve the accuracy of the inferred haplotypes. The default setting was

selected with an output posterior probability threshold of 0.9. To ensure

accuracy of reported results, individuals lacking defined genotype data for more

than one SNP were excluded from the analysis. In addition, individuals lacking

identification of a single SNP that prevented the accurate assignment of full

haplotypes were also excluded from further analysis.
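The step of matching an individual's tagSNP genotypes to reference haplotypes can be sketched as a simple lookup. This is a deterministic simplification, not the Bayesian procedure in PHASE: it assumes complete, unphased genotypes and a fixed set of reference haplotypes restricted to the tagSNP positions. All names and the toy data are hypothetical.

```python
from itertools import combinations_with_replacement


def compatible_pairs(genotype, reference_haplotypes):
    """List reference-haplotype pairs consistent with an unphased genotype.

    genotype: one two-letter string per tag SNP, e.g. ["AG", "CC"].
    reference_haplotypes: dict mapping a name to a string with one allele
    per tag SNP, in the same SNP order as the genotype.
    """
    target = [sorted(g) for g in genotype]
    named = sorted(reference_haplotypes.items())
    pairs = []
    for (name1, hap1), (name2, hap2) in combinations_with_replacement(named, 2):
        # The two haplotypes must jointly reproduce every observed genotype.
        if all(sorted((a, b)) == t for a, b, t in zip(hap1, hap2, target)):
            pairs.append((name1, name2))
    return pairs
```

An empty result would correspond to a genotype that cannot be explained by the reference set, and more than one pair to an ambiguous assignment; both situations would need statistical phasing or exclusion rather than this lookup.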

Because a substantial number of inferred haplotypes were expected, making

the number of potential statistical comparisons problematic, a phylogenetic

grouping approach was used to group or cluster evolutionarily related

haplotypes from the CEU population. Genetic distances were computed among

haplotypes using the maximum likelihood composite model implemented in

MEGA 4 (http://www.megasoftware.net/). Distances among haplotypes were

then phylogenetically clustered using the neighbor-joining method in MEGA 4.

Phylogenetically related haplotypes were given group designations for further

statistical comparisons and analysis. The use of the tagSNPs panel derived from

the CEU population allowed us to assign haplotypes to our population

corresponding to CEU-inferred haplotypes and subsequently to groups based

on the phylogenetic analysis of the complete SNP panel from the CEU

population. Grouping of haplotypes, based on genealogical or phenotypic

relationships, previously has been used successfully by many other

investigators (Bardel et al., 2009; Rieder et al., 2005; Veenstra et al., 2005).

Phylogenetically grouped haplotypes (PGHs), which share strong genealogical

similarities, serve to substantially increase the statistical power of analyses by

reducing the number of groups to be evaluated.
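To make the idea of a genetic distance between haplotypes concrete, the sketch below computes the simple proportion of differing sites (p-distance). This is a stand-in chosen for clarity; it is not the maximum likelihood composite model used in MEGA 4, and no neighbor-joining step is included. The three sequences are haplotypes 1 to 3 as listed in Table 2.

```python
def p_distance(hap_a, hap_b):
    """Proportion of SNP positions at which two haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length")
    return sum(a != b for a, b in zip(hap_a, hap_b)) / len(hap_a)


def distance_matrix(haplotypes):
    """Symmetric matrix of pairwise p-distances."""
    return [[p_distance(a, b) for b in haplotypes] for a in haplotypes]


# Haplotypes 1, 2 and 3 from Table 2 (35 SNP positions each).
haps = [
    "TTCACACCGGGGGCATGTTCGCGAAGCACCGGCGC",
    "TTCACACCGGGGGCATGTTCGTGAGGCACCGGCGC",
    "TTCACACACGCGGTCGTTTCGCACAGTGGTATCAG",
]
matrix = distance_matrix(haps)
```

Haplotypes 1 and 2 differ at only two of 35 positions, so they sit much closer to each other than either does to haplotype 3, which is the kind of structure a clustering method would then group.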

Cytogenetic cultures and the mutagen-sensitivity assay. Cultures for

cytogenetic assays were established according to standard procedures (Evans

and O’Riordan, 1975), as routinely done in our laboratory (Abdel-Rahman and

El-Zein, 2000; Affatato et al., 2004). Briefly, aliquots of 1 ml of PBLs were

cultured with 9 ml of RPMI 1640 medium supplemented with 100 U/ml

penicillin, 100 µg/ml streptomycin, 10% fetal bovine serum, and 2 mM

L-glutamine (Invitrogen, Carlsbad, CA). Stimulation of PBLs was accomplished

by the addition of 0.18 mg/ml phytohemagglutinin (reagent grade;

Remel, Lenexa, KS). Two cultures were set up for each subject: one culture was

not treated to give a baseline in vivo CA frequency and the second culture was

used for the mutagen-sensitivity assay. After 46 h, the suspended cells in the

second culture were centrifuged and the growth medium reserved. The PBLs

were then resuspended in 5 ml serum-free RPMI 1640 supplemented with

0.24 mM of the mutagen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone

(NNK) (CAS#64091-91-4, National Cancer Institute, Midwest Carcinogen

Repository, Kansas City, MO) and incubated at 37°C in the presence of 5%

CO2 for 1 h. Following NNK treatment, the PBLs were washed twice with

serum-free RPMI 1640, transferred to clean tubes and resuspended in the

original growth medium until harvested. Harvesting was performed 72 h after

NNK treatment. The mutagen concentration and harvest times had been

established from our previous studies and have been shown to produce

measurable levels of genetic damage and low levels of toxicity over a period of

time that allows the effects of DNA repair to be manifest (Abdel-Rahman and

El-Zein, 2000; Affatato et al., 2004).

Cell culture harvest and cytogenetic analysis. Prior to harvest, cells from

all cultures were treated with 0.1 µg/ml colcemid (Gibco-Invitrogen) for 1 h to

arrest the cells in metaphase. The cultures of PBLs were centrifuged and the

cells resuspended in hypotonic solution (0.075 M potassium chloride), fixed

with Carnoy’s fixative (three parts methanol/one part acetic acid, vol/vol), and

stored at 4°C. Slides for cytogenetic analysis were then prepared in duplicate by

spreading the fixed cells on the slides and staining them with Giemsa. One

hundred metaphase cells on each slide were scored for CAs using a Nikon 400

light microscope, according to standard procedures (ISCN, 1985). Aberrations

were recorded as chromosome breaks or frank chromatid breaks. Chromatid

breaks were counted as one break and chromosome breaks as two breaks. Total

aberrant cells were recorded as a percentage of aberrant cells (breaks per 100

cells). For quality control, slides were coded before scoring to protect against

scorer bias. Cells from slides prepared from both smokers and nonsmokers

were scored blindly in mixed batches. To ensure quality control, 20% of the

slides were randomly selected for blind rescoring. Agreement between the

original data and rescored data was measured using the Cohen’s kappa

statistical test. A statistically significant value of p < 0.001 was obtained for

both baseline and mutagen-induced CA, indicating that the agreement between

the original and rescored data was not attributable to random chance.
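For readers unfamiliar with the agreement statistic, a minimal sketch of Cohen's kappa follows. It assumes that the original and rescored results are treated as categorical labels per slide; the paper does not state how the CA counts were categorized, so the input format here is an assumption.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of categorical ratings of the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

A kappa of 1 indicates perfect agreement, and a value near 0 indicates agreement no better than chance.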

Statistical analysis. Each individual was coded for the presence (+) or

absence (−) of each PGH. We used descriptive statistical analyses [mean (±

standard errors of the mean; SEM)] for continuous variables and frequencies for

categorical variables to characterize the study population. We compared mean

CA frequencies for each PGH (present vs. absent) using preliminary Student’s

two-sample t-tests. In order to account for the fact that we were performing

multiple tests in our comparison of baseline and mutagen-induced CA

frequencies within each PGH group separately, we completed a permutation

test with 1000 replicates (PGH present/absent status was randomly permuted

within each replicate) to calculate empirical p values, respectively, for each

PGH comparison. Permutation test corrections are known to be robust and

have the benefit that the empirical p value is constructed directly from the

experimental data at hand (Cheverud, 2001). We completed the same procedure

upon stratification by smoking status (nonsmokers and smokers). Guided by

these preliminary results, a general linear statistical model that included the

final parameters estimated from the exploratory analysis was then fit to evaluate

differences in CA frequency involving interactions between each PGH and

smoking, separately for each PGH, adjusted for age and gender. We

constructed error-bar plots (depicting mean and 95% confidence interval

limits) to graphically visualize statistically significant interactions.
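The permutation step can be sketched as follows. The test statistic (absolute difference in mean CA frequency between PGH-present and PGH-absent subjects) and the add-one correction are assumptions made for illustration; the paper specifies only that present/absent status was permuted over 1000 replicates.

```python
import random
from statistics import mean


def permutation_p_value(values, present, n_replicates=1000, seed=0):
    """Empirical two-sided p value for a difference in group means.

    values: CA frequency per subject.
    present: True/False PGH status per subject (both groups non-empty).
    """
    def mean_difference(labels):
        with_pgh = [v for v, flag in zip(values, labels) if flag]
        without = [v for v, flag in zip(values, labels) if not flag]
        return abs(mean(with_pgh) - mean(without))

    rng = random.Random(seed)
    observed = mean_difference(present)
    labels = list(present)
    hits = 0
    for _ in range(n_replicates):
        rng.shuffle(labels)  # randomly permute present/absent status
        if mean_difference(labels) >= observed:
            hits += 1
    # Add-one correction so the empirical p value is never exactly zero.
    return (hits + 1) / (n_replicates + 1)
```

Because shuffling preserves the number of PGH-positive subjects, each replicate compares groups of the same sizes as in the observed data.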

RESULTS

Characteristics of the Study Population

The study population consisted of 123 White non-Hispanic

subjects. We were able to obtain full haplotype data on 99 of

the 123 individuals, and therefore, only those 99 subjects were

included in all subsequent analyses. Of these individuals, 78

were females (78.8%) and 21 were males (21.2%). There were

50 smokers and 49 nonsmokers, who were matched with


respect to age (± 5 years) and sex. The smokers had smoked

between 5 and 50 cigarettes per day (mean ± SD: 17.4 ± 1.26)

for a minimum of 1 year (mean ± SD: 19.8 ± 1.62 years) before

participating in the study. The age of the participants ranged

from 20 to 72 years, with a median of 37 years and a mean

(± SD) of 39.0 (± 1.30) years. There was no significant

difference in the smoking habits (total number of smoking

years, number of cigarettes smoked per day, and pack years,

defined as packs smoked per day × the number of smoking

years) between males and females. The mean ± SEM

baseline CA frequency for the study population was

0.79 ± 0.10. After mutagen challenge, the mean ± SEM

frequency of mutagen-induced CA was 5.24 ± 0.29.

Identification of tagSNPs, Structures of XPC Haplotypes, and Phylogenetic Analysis

At the time of the analysis, using the information available

on the CEU population from HapMap, we identified 92 SNPs

encompassing the entire coding region, the introns, and 2 kb

upstream and 2 kb downstream of the coding region of the XPC gene. Of these, 35 SNPs were predicted to occur with a MAF ≥ 0.05. We identified 11 tagSNPs (the bolded and underlined rs

numbers in Table 1), which tagged the 35 SNPs with

a correlation coefficient of r² = 0.8. These 11 tagSNPs were

used for subsequent genotyping of the study population. This

linkage-based genotyping significantly reduced the volume of

unique genotyping assays and concurrently reduced the effort

and time required to evaluate the effect of all 35 SNPs on

genetic damage. The 35 SNPs tagged by these 11 tagSNPs and

their position on the XPC gene are presented in Table 1.

Using the information available on the CEU population,

we utilized the Bayesian-inference analysis implemented

in PHASE ver. 2.1 software (www.stat.washington.edu/stephens/software/html),

which revealed 21 unique haplotypes.

Table 2 shows the full 21 haplotypes, as generated by the

PHASE analysis used in this study. Phylogenetic analysis of

these haplotypes was used to assess genealogical relationships

among these 21 haplotypes, utilizing the maximum likelihood

model implemented in MEGA 4 software (http://www.megasoftware.net/).

Haplotypes were grouped based on clade

formation and percent sequence divergence. Six clades were

apparent in the midpoint-rooted cluster analysis corresponding

to the PGHs A–F. Percent sequence divergence within groups

ranged from 4.8% in PGH-F to 8.6% in PGH-D. Percent

sequence divergence between groups ranged from 18.6%

(PGH-A and PGH-B) to 57.3% (PGH-A and PGH-E). Since

there is no firm objective metric that exists for deciding what

should constitute acceptable levels of within- and between-

group or clade percent sequence divergence for this type of

analysis, the apparently “natural” groups and divisions in this

case were used, based on the phylogenetic structuring in the

tree. A bootstrap analysis (data not shown), using 10,000

replicates, strongly supported such groupings. There was ≥ 90% clade support for the selected PGHs. As shown in

Figure 1, PGH-A consisted of six haplotypes, PGH-B of one

haplotype, PGH-C of five haplotypes, PGH-D of two

haplotypes, PGH-E of three haplotypes, and PGH-F consisted

of four haplotypes.

Genotype Analysis of the Study Population

After all individuals were genotyped for the 11 tagSNPs, the

genotype data were analyzed for HWE. In this analysis, only

10 of the 11 SNPs passed. As a result of this analysis, we

subsequently excluded rs2470352, which was determined not

TABLE 1

SNPs Existing with a MAF ≥ 0.05 in the XPC Gene

rsᵃ        Alleles  Ancestral allele  Haplotype position  Variation site
8516       C/T      T                 9                   3′ UTR
10468      C/T      T                 9                   3′ UTR
1126547    C/G      G                 1                   3′ UTR
2470352    A/T      A                 2                   3′ UTR
2229090    C/G      C                 9                   3′ UTR
2228001    A/C      C                 3                   Exon 16ᵇ
2733532    C/T      T                 3                   Intron 15
2733533    A/C      C                 11                  Intron 15
2733534    C/G      G                 11                  Intron 15
2279017    G/T      T                 3                   Intron 12
2470353    C/G      G                 11                  Intron 12
2607734    A/G      A                 3                   Intron 11
2607736    A/G      A                 3                   Intron 11
2607737    C/T      C                 11                  Intron 11
3731149    A/C      A                 8                   Intron 10
3731146    G/T      T                 8                   Intron 10
9653966    G/T      T                 4                   Intron 10
1124303    G/T      T                 5                   Intron 10
3731143    C/T      T                 6                   Intron 10
2228000    C/T      C                 9                   Exon 9ᶜ
2227999    A/G      G                 6                   Exon 9ᵈ
3731127    C/T      C                 7                   Intron 8
3731125    A/G      A                 4                   Intron 7
3731124    A/C      A                 8                   Intron 7
13099160   A/G      A                 7                   Intron 7
1106087    G/T      G                 9                   Intron 5
3731108    C/T      C                 8                   Intron 5
3731106    A/G      A                 8                   Intron 5
3729587    C/G      C                 8                   Intron 5
3731093    C/T      T                 4                   Intron 3
2733537    A/G      A                 10                  Intron 3
3731081    G/T      G                 8                   Intron 3
3731068    A/C      C                 8                   Intron 2
1350344    A/G      G                 11                  Intron 1
2607775    C/G      C                 11                  5′ UTR

ᵃ Reference SNP (rs) numbers are those designated by the dbSNP database of

the NCBI (http://www.ncbi.nlm.nih.gov/SNP/). Bold and underlined rs

numbers correspond to the 11 tagSNPs used in genotyping analysis of the 35

SNPs identified with MAF > 0.05 in the XPC gene.

ᵇ The rs2228001 (A/C) SNP in exon 16 results in a lysine to glutamine amino

acid change in codon 939 (K939Q).

ᶜ The rs2228000 (C/T) SNP in exon 9 results in an alanine to valine amino

acid change in codon 499 (A499V).

ᵈ The rs2227999 (A/G) SNP in exon 9 results in an arginine to histidine amino

acid change at codon 492 (R492H).





to be in LD with any of the other SNPs under study. We then

reconstituted haplotypes for each individual in our study

population using genotype data generated with the remaining

10 SNPs, which were compared to the CEU haplotypes. All

SNP genotyping reactions were performed with more than 95%

success rate. We excluded 24 subjects from the study who

lacked genotype data for one (n = 18) or more (n = 6) SNPs

due to repeated PCR failure, since this prevented accurate

haplotype assignment for these individuals. Subsequently, we

reconstituted haplotypes for the remaining 99 individuals,

using the genotype data we generated and the CEU haplotypes

as a reference. The excluded subjects did not differ in any other respect from the rest of

the study population.

A PGH designation was assigned to each individual

evaluated. The descriptive statistical results indicated that the

most common PGH in the study population was PGH-F

(40.4%), while the least common PGH was PGH-B (3.0%).

The frequencies of each of the PGHs are presented in Table 3.

Relationship between XPC Haplotypes and the Background (Baseline) CA Frequency

The background (baseline) and mutagen-induced CA

frequencies observed in the presence (+) and absence (−) of

haplotypes from the different PGHs identified in this study are

presented in Table 4. The PGH groups were first coded (PGH-A

to PGH-F) and then analyzed based on “haplotype group

copy” (HGC) using a dominant genetic model (0 HGCs = 0, 1

or 2 HGCs = 1). When the general linear model, adjusted for

age and sex, was fit to investigate interactions between each

TABLE 2

Individual Haplotypes Determined by PHASE Analysis for the

CEU Population (Utah Residents with Ancestry from Northern

and Western Europe) of HapMapᵃ

1 TTCACACCGGGGGCATGTTCGCGAAGCACCGGCGC

2 TTCACACCGGGGGCATGTTCGTGAGGCACCGGCGC

3 TTCACACACGCGGTCGTTTCGCACAGTGGTATCAG

4 TTCACACACGCGGTCGTTTCGCACAGTGGTATAAG

5 TTCACACACGCGGTCGTTTTGCACAGTGGTATAAG

6 TTCACACACGCGGTCGTGTCGCACAGTGGTATAAG

7 TTCACCCACGCGGTCGTTTCGCACAGTGGTATCAG

8 TTCACCTCGTGAGCATTTTCGCAAAGCACTAGCGC

9 TTCACCTCGTGAACATTTTCGCAAAGCACTAGCGC

10 TTCAGACCGGGGGCATTTTTGCAAATCACTGGCGC

11 TTCTCACCGGGGGCATGTTCGTGAGGCACCGGCGC

12 TTCTCACACGCGGTCGTGTCGCACAGTGGTATAAG

13 TTGACCCCGTGAACATTTTCGCAAAGCACTAGCGC

14 TTGACCTCGTGAACATTTTCGCAAAGCACTAGCGC

15 TCCACACACGCGGTAGTTTCGCAAAGCGGTAGCAG

16 CCCACACACGCGGTATTTCTACAAATCACTGGCAG

17 CCCAGACACGCGGTATTTTTGCAAATCACTGGCAG

18 CCCAGACACGCGGTATTTCTACAAATCACTGGCAG

19 CCCTGACCGGGGGCATTTTTGCAAATCACTGGCGC

20 CCCTGACACGCGGTATTTTTGCAAATCACTGGCAG

21 CCCTGACACGCGGTATTTCTACAAATCACTGGCAG

ᵃ A total of 21 unique haplotypes were identified using Bayesian inference

implemented in PHASE v2.1.1. The 21 haplotypes presented in the table

represent the specific combinations of the 35 SNPs evaluated in the study.

FIG. 1. Haplotype structure of the XPC gene. A total of 21 unique haplotypes

were identified using Bayesian inference implemented in PHASE v2.1.1. A

maximum likelihood composite model of phylogenetic analysis in MEGA 4 was

conducted on these 21 haplotypes resulting in six PGH (PGH-A, PGH-B, PGH-C,

PGH-D, PGH-E, and PGH-F) based on genetic distances, as indicated by the

brackets. These six PGHs were used as individual units in further analyses.

TABLE 3

Frequencies of the PGH of the XPC Gene in the Study

Population

PGH   Status   n (%)
A     +        26 (26.3)
A     −        73 (73.7)
Bᵃ    +        3 (3.0)
Bᵃ    −        96 (97.0)
C     +        20 (20.2)
C     −        79 (79.8)
D     +        4 (4.0)
D     −        95 (96.0)
E     +        7 (7.0)
E     −        92 (92.9)
Fᵇ    +        40 (40.4)
Fᵇ    −        59 (59.6)

+, presence; −, absence.

ᵃ The least common PGH.

ᵇ The most common PGH.


PGH and smoking on baseline CA frequencies, we observed a significant interaction between smoking and PGH-C (p = 0.046) (Fig. 2). Nonsmokers who were positive for PGH-C had the lowest baseline CA frequency of the four PGH-C-by-smoking groups (mean ± SEM = 0.53 ± 0.192), while smokers who were positive for PGH-C had significantly higher baseline CA frequencies (mean ± SEM = 1.21 ± 0.29). Among those positive for PGH-C, the baseline CA frequency was 2.3 times higher in smokers than in nonsmokers. In contrast, we observed no significant interactions between smoking and PGH-A or PGH-F on baseline CA (data not shown). Because of the small sample sizes of PGHs B, D, and E, their interaction effects with smoking on baseline CA could not be evaluated in the current study.
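One way to read an interaction of this kind is as a difference of differences between cell means. The sketch below is an illustration, not the adjusted general linear model used in the study: it takes the four unadjusted baseline CA means reported for PGH-C in Table 4 and compares the smoker-minus-nonsmoker difference in carriers with that in noncarriers. The function names are hypothetical.

```python
# Unadjusted baseline CA means for PGH-C, copied from Table 4.
# Keys are (PGH-C status, smoking status).
BASELINE_CA_PGH_C = {
    ("+", "nonsmoker"): 0.53,
    ("+", "smoker"): 1.21,
    ("-", "nonsmoker"): 0.81,
    ("-", "smoker"): 0.65,
}


def smoking_difference(means, status):
    """Smoker mean minus nonsmoker mean within one PGH status group."""
    return means[(status, "smoker")] - means[(status, "nonsmoker")]


def interaction_contrast(means):
    """Difference of differences: smoking effect in carriers minus noncarriers."""
    return smoking_difference(means, "+") - smoking_difference(means, "-")


if __name__ == "__main__":
    print("carriers:   ", round(smoking_difference(BASELINE_CA_PGH_C, "+"), 2))
    print("noncarriers:", round(smoking_difference(BASELINE_CA_PGH_C, "-"), 2))
    print("contrast:   ", round(interaction_contrast(BASELINE_CA_PGH_C), 2))
```

With the Table 4 means, the smoking-associated difference is 0.68 in PGH-C carriers and −0.16 in noncarriers, giving a contrast of 0.84 in the units of Table 4. A contrast near zero would indicate that smoking shifts baseline CA similarly regardless of PGH-C status.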

Relationship between XPC Haplotypes and Mutagen Sensitivity

When the general linear statistical model, adjusted for age and sex, was used to investigate interactions between each PGH and smoking on mutagen-induced CA frequency, we observed no significant interactions between smoking and PGHs A, B, C, and E (data not shown). However, we observed significant interactions between smoking and PGH-D (p = 0.023) and PGH-F (p = 0.031) (Fig. 3). Nonsmokers who were positive for PGH-D had a significantly lower frequency of mutagen-induced CA (mean ± SEM = 3.75 ± 0.85) than smokers who were positive for PGH-D (8.75 ± 2.43). Among those positive for PGH-D, the mutagen-induced CA frequency was 2.3 times higher in smokers than in nonsmokers, whereas among those who were negative for PGH-D, no such difference between smokers and nonsmokers was observed. Likewise, nonsmokers who were positive for PGH-F had a lower frequency of mutagen-induced CA (4.63 ± 0.47) than smokers who were positive for PGH-F (6.03 ± 0.51). Among those positive for PGH-F, the mutagen-induced CA frequency was 1.3 times (30%) higher in smokers than in nonsmokers.
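The fold differences for mutagen-induced CA follow directly from the group means. As a small worked check, the Python sketch below recomputes the smoker-to-nonsmoker ratio and the corresponding percent increase from the mutagen-induced CA means of PGH-D and PGH-F carriers given in Table 4. The helper names are hypothetical.

```python
# Mutagen-induced CA means for PGH-positive subjects, copied from Table 4.
# Each entry is (nonsmoker mean, smoker mean).
MUTAGEN_INDUCED_CA = {
    "PGH-D": (3.75, 8.75),
    "PGH-F": (4.63, 6.03),
}


def fold_change(nonsmoker_mean, smoker_mean):
    """Ratio of the smoker mean to the nonsmoker mean."""
    return smoker_mean / nonsmoker_mean


def percent_increase(nonsmoker_mean, smoker_mean):
    """Relative increase in smokers, expressed as a percentage of the nonsmoker mean."""
    return 100.0 * (smoker_mean - nonsmoker_mean) / nonsmoker_mean


if __name__ == "__main__":
    for pgh, (nonsmoker, smoker) in MUTAGEN_INDUCED_CA.items():
        ratio = fold_change(nonsmoker, smoker)
        pct = percent_increase(nonsmoker, smoker)
        print(f"{pgh}: {ratio:.2f}-fold, {pct:.0f}% higher in smokers")
```

The ratios are about 2.33 for PGH-D and 1.30 for PGH-F, that is, increases of roughly 133% and 30% relative to nonsmokers. A fold change of f always corresponds to a percent increase of 100 × (f − 1).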

DISCUSSION

To our knowledge, this is the first study to provide a comprehensive evaluation of the relationship between XPC haplotypes and genetic damage associated with tobacco smoking. Rather than addressing the effect of a few individual SNPs, we determined the relationship between genetic damage and haplotypes that comprise the common SNPs in the entire genomic region of the XPC gene. This approach is comprehensive and biologically more plausible since it allows for the evaluation of the effect of multiple SNPs that could jointly influence outcome. The relationship between XPC haplotypes and genetic damage was evaluated using CA as a biomarker because of the well-established strong association between increased CA frequency and cancer risk. Of all biomarkers available for human studies, CA is the only biomarker that has been adequately validated in many independent prospective

TABLE 4

Effect of XPC Haplotype Groups on Background (Baseline) and Mutagen-Induced CA Frequencies

PGH^a   Status^b   Nonsmokers       Smokers

Baseline CA frequency
A       +          0.69 (0.17)^c    0.78 (0.21)
A       −          0.76 (0.21)      0.91 (0.21)
B       +          0.25 (0.25)      0.00 (0.00)
B       −          0.77 (0.14)      0.90 (0.16)
C       +          0.53 (0.19)      1.21 (0.29)
C       −          0.81 (0.17)      0.65 (0.16)
D       +          0.75 (0.48)      1.75 (0.63)
D       −          0.72 (0.14)      0.78 (0.15)
E       +          0.67 (0.33)      0.43 (0.20)
E       −          0.73 (0.14)      0.93 (0.17)
F       +          0.92 (0.21)      0.82 (0.18)
F       −          0.48 (0.12)      1.00 (0.25)

Mutagen-induced CA frequency
A       +          5.11 (0.51)      4.67 (0.72)
A       −          4.73 (0.51)      6.03 (0.57)
B       +          5.25 (1.65)      2.50 (1.50)
B       −          4.91 (0.37)      5.67 (0.46)
C       +          5.81 (0.48)      5.68 (0.69)
C       −          4.52 (0.47)      5.45 (0.60)
D       +          3.75 (0.85)      8.75 (2.43)
D       −          5.04 (0.38)      5.26 (0.43)
E       +          4.57 (1.23)      4.29 (1.25)
E       −          5.00 (0.37)      5.74 (0.48)
F       +          4.63 (0.47)      6.03 (0.51)
F       −          5.32 (0.56)      4.00 (0.86)

+, presence; −, absence.
^a PGH: phylogenetically grouped haplotype.
^b Status: presence (+) or absence (−) of the haplotype group.
^c Values in parentheses are SEM (standard error of the mean).

FIG. 2. Interaction between PGH-C and smoking, as related to CA frequency. A general linear model adjusted for age and gender was fit to investigate interactions between PGH-C and smoking on CA. A permutation test with 1000 replicates was used to calculate empirical p values to account for multiple testing. Error-bar plots depict mean and 95% confidence interval limits. The round symbols indicate nonsmokers and the triangular symbols indicate smokers. The interaction between smoking and PGH-C was significant (p = 0.046).
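The idea of an empirical p value from a permutation test with 1000 replicates can be sketched in a few lines of Python. The version below is a deliberately simplified illustration and not the authors' procedure: it uses an unadjusted difference-of-differences statistic instead of the age- and sex-adjusted general linear model, it permutes the raw outcomes (which strictly tests a global null of no group effects), and the data in `RECORDS` are hypothetical. All function names are assumptions made for this example.

```python
import random
from statistics import mean


def interaction_contrast(records):
    """Smoking effect in carriers minus smoking effect in noncarriers.

    records is a list of (carrier, smoker, ca_count) tuples.
    """
    def cell_mean(carrier, smoker):
        return mean(ca for c, s, ca in records if c == carrier and s == smoker)

    return ((cell_mean(True, True) - cell_mean(True, False))
            - (cell_mean(False, True) - cell_mean(False, False)))


def permutation_p_value(records, n_replicates=1000, seed=0):
    """Two-sided empirical p value for the interaction contrast."""
    rng = random.Random(seed)
    observed = abs(interaction_contrast(records))
    outcomes = [ca for _, _, ca in records]
    hits = 0
    for _ in range(n_replicates):
        shuffled = outcomes[:]
        rng.shuffle(shuffled)  # break any link between outcome and group labels
        permuted = [(c, s, ca) for (c, s, _), ca in zip(records, shuffled)]
        if abs(interaction_contrast(permuted)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_replicates + 1)


def make_cell(carrier, smoker, values):
    """Build the records for one carrier-by-smoking cell."""
    return [(carrier, smoker, v) for v in values]


# Hypothetical CA counts, for illustration only (not data from the study).
RECORDS = (
    make_cell(True, True, [2, 1, 2, 1, 2, 1])
    + make_cell(True, False, [0, 1, 0, 1, 0, 0])
    + make_cell(False, True, [1, 0, 1, 1, 0, 1])
    + make_cell(False, False, [1, 0, 1, 1, 0, 1])
)

if __name__ == "__main__":
    print("observed contrast:", round(interaction_contrast(RECORDS), 3))
    print("empirical p value:", permutation_p_value(RECORDS))
```

Because only the outcomes are shuffled, every permuted data set keeps the original cell sizes, so the statistic is always defined. Fixing the seed makes the result reproducible; a real analysis would also carry the covariates through the model in each replicate.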


studies as a risk factor for cancer (Bonassi et al., 1995, 2000;

Hagmar et al., 1994, 1998). Our data indicate a significant XPC haplotype–smoking interaction, which was observed between

smoking and PGH-C on frequencies of CA. Our data provide

support for results from previous association studies linking

certain XPC polymorphisms to smoking-associated cancer risk

(An et al., 2007; Guo et al., 2008; Hansen et al., 2007). Our

results suggest that certain XPC haplotypes could affect the

repair of genetic damage caused by tobacco-smoke carcinogens.

Our findings also suggest that certain smokers may be at greater

risk than others for the development of genomic instability,

a critical step in the carcinogenic process, as evidenced by the

increase in CA in PBLs from individuals with PGH-C.

Previous studies addressed associations between only four

XPC polymorphisms and cancer risk, and these studies

produced inconsistent results. For example, positive associations

between the rs2228000 SNP (A499V) and cancer risk

were reported in some studies (An et al., 2007; Sak et al., 2006;

Shen et al., 2005) but not in others (Guo et al., 2008; Weiss

et al., 2006). Similarly, an association between the rs2279017 SNP

in intron 12 of XPC and bladder cancer risk was reported (Sak

et al., 2006); however, this association remains to be

confirmed. The rs2228001 SNP in exon 16 (K939Q) was

associated with esophageal, colorectal, and lung cancers in

some studies (Guo et al., 2008; Hansen et al., 2007) but not in

others (An et al., 2007; Weiss et al., 2006; Zhu et al., 2008).

Inconsistencies between studies are not surprising and have

been reported before with polymorphisms of other genes.

Possible explanations for such inconsistencies were often

discussed and included differences in study design and

ethnicities of the studied populations (Au et al., 2004;

Manuguerra et al., 2006). Another possible explanation we

propose for such discrepancies is that the XPC polymorphisms

evaluated exist in variable degrees of LD with others that were

not evaluated in these investigations. Differences in sampling

procedures, coupled with incomplete LD in some cases, may

capture SNPs with functional effects that are not being directly

investigated, but in other cases, such SNPs may not be

captured. Such sampling inconsistencies, possibly influenced

by an inadequate number of studied subjects, may explain these

disparate results. It is also conceivable that the polymorphisms

previously studied have little or no biological effect independently,

but when present as part of a specific haplotype,

they exert a phenotypic effect. This hypothesis is supported by

recent findings from our laboratory indicating that the

ss74800505 SNP that we discovered in the NEIL2 gene had

no effect on expression levels when evaluated independently,

yet when evaluated as part of a haplotype, a significant

reduction (69%) in NEIL2 expression was observed (Kinslow

et al., 2008). Another possible reason for inconsistencies could

be that the phenotypic effect observed with a certain SNP was,

in fact, due to the effects evoked by other SNPs that exist in LD

with the studied SNP. Because of the variability in the degree

of LD existing in different populations, the effects observed for

a certain SNP in one study may not be the same in other studies

because of the population effect. Future research based on our

current study, addressing the effect of haplotypes rather than

the effects of individual SNPs, may clarify these issues and

may significantly reduce inconsistencies in the results currently

observed between different investigations.
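Because this argument turns on the degree of LD between a genotyped SNP and an unobserved functional SNP, a short numerical illustration may help. The Python sketch below computes the standard r² measure of LD from two-locus haplotype counts. The counts are hypothetical and do not come from this study, and `ld_r_squared` is an illustrative name.

```python
def ld_r_squared(counts):
    """Return r^2 between two biallelic SNPs from two-locus haplotype counts.

    counts maps two-character haplotypes to counts. The first character is
    the allele at SNP 1 ('A' or 'a'), the second the allele at SNP 2
    ('B' or 'b'). Missing haplotypes are treated as zero.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_ab = counts.get("AB", 0) / total                         # freq. of haplotype AB
    p_a = (counts.get("AB", 0) + counts.get("Ab", 0)) / total  # freq. of allele A
    p_b = (counts.get("AB", 0) + counts.get("aB", 0)) / total  # freq. of allele B
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denominator


# Hypothetical haplotype counts in three imagined populations.
COMPLETE_LD = {"AB": 60, "ab": 40}
PARTIAL_LD = {"AB": 50, "Ab": 10, "aB": 10, "ab": 30}
NO_LD = {"AB": 36, "Ab": 24, "aB": 24, "ab": 16}

if __name__ == "__main__":
    for name, counts in [("complete", COMPLETE_LD),
                         ("partial", PARTIAL_LD),
                         ("none", NO_LD)]:
        print(name, round(ld_r_squared(counts), 3))
```

In these toy populations the allele frequencies at both SNPs are identical (0.6 and 0.4), yet r² ranges from 1 through about 0.34 to 0. A genotyped SNP would therefore track a linked functional variant perfectly in the first population and not at all in the third, which is one way an association could appear in some studies and vanish in others.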

Our findings with PGH-C are consistent with reports

indicating that certain XPC SNPs belonging to this phylogenetic

group of haplotypes are associated with increased cancer

risk. For example, the rs2228000 (A499V) SNP, uniformly

present in PGH-C, was associated with increased risk of head

and neck, bladder, and lung cancers (An et al., 2007; Sak et al., 2006; Shen et al., 2005). Our data are also consistent with

a recent report indicating that the rs2228000 (A499V) SNP is

associated with decreased DRC (Zhu et al., 2008). Whether the effect observed is related to the particular SNP evaluated in these earlier investigations or to other SNPs in PGH-C remains to be determined.

FIG. 3. Interaction between PGH-D and PGH-F and smoking as related to mutagen-induced CA frequency. A general linear model adjusted for age and gender was fit to investigate interactions between PGH-D (A) and PGH-F (B) and smoking as related to mutagen-induced CA. A permutation test with 1000 replicates was used to calculate empirical p values for each outcome, to account for multiple testing. Error-bar plots depict mean and 95% confidence interval limits. The round symbols indicate nonsmokers and the triangular symbols indicate smokers. The interactions between smoking and PGH-D and between smoking and PGH-F were significant (p = 0.023 and 0.031, respectively).

We found a significant difference in mutagen sensitivity

between smokers who were positive compared to those who

were negative for PGH-D and PGH-F. Smokers with these

PGHs exhibited significantly higher mutagen sensitivity than

smokers who did not have one of these PGHs. This suggests

that smokers with PGH-D or PGH-F could be predisposed to

a greater risk for developing cancer, given the well-established

association between reduced DRC, as determined

by mutagen sensitivity, and cancer risk (An et al., 2007;

Cheng et al., 1998; Spitz et al., 1995; Wang et al., 2007). The

haplotype-smoking interaction is not surprising since reduced

repair would only be important in the presence of genotoxic

exposure. This gene-smoking interaction is consistent with

previous reports with other polymorphisms in other DNA

repair genes (e.g., Abdel-Rahman et al., 2000; Affatato et al., 2004). A plausible biological explanation for such an interaction

is that, in smokers, continuous exposure to tobacco smoke

mutagens could overwhelm the DNA repair machinery,

making the effect of the polymorphisms that reduce repair

capacity more pronounced. Thus, the inheritance of polymorphisms

that result in even a slight decrease in DNA repair

could lead to more noticeable genetic damage in such

individuals compared to nonsmokers. It is noteworthy that

while PGH-C was associated with differences in baseline CA,

it was not associated with mutagen-induced genetic damage.

This may be due to differences in the mechanism(s) by

which certain PGHs exert their effects with respect to chronic

and acute exposures. For example, in response to chronic

tobacco carcinogen exposure, haplotypes belonging to PGH-C

could possibly affect XPC binding to the DNA lesion, thus

reducing overall DNA repair over time, which would manifest

as an increase in CA in smokers. Conversely, PGH-D and

PGH-F may exert their effect primarily in the presence of an

acute exposure to a mutagen, suggesting that XPC haplotypes

belonging to these PGHs could affect protein stability and/or

turnover at the translational and/or transcriptional levels.

Additional studies are warranted to support or refute these

potential mechanisms. It should be noted, however, that while

the exact mechanisms by which SNPs belonging to PGH-C,

-D, and -F influence genetic damage are not fully understood,

some of the previously studied SNPs belonging to these PGHs

(e.g., rs2228000, rs2279017) might have potentially significant

effects on protein structure and/or function. For example,

the rs2228000 (A499V) SNP of PGH-C is located at the 5′ end

of the hHR23B-binding region of the gene and may, thus, alter

the function of XPC by altering its binding with the hHR23B

protein that is necessary for XPC function. However, other

SNPs in other regions of the gene, which exist in LD with

rs2228000, may also contribute to the observed phenotypic

effect. For example, an SNP in the 3′ UTR can affect

posttranscriptional activity, such as messenger RNA (mRNA)

folding–directed rates of translation or mRNA half-life

stability (George Priya Doss et al., 2008). Similarly, intronic

SNPs that are at, or near, exonic boundaries can affect mRNA

translation through exon skipping and/or aberrant mRNA

folding (Cheng et al., 2006; Duan et al., 2007; Kinslow et al., 2008; Law et al., 2007), and SNPs in the 5′ UTR can affect

XPC gene expression via promoter modulation (Cheng et al., 2006). Taken together, our findings suggest that SNPs, in

coding as well as noncoding regions of the XPC gene, that are

in LD with each other as part of a given haplotype may act in

a collective manner to influence the phenotype. Mechanistic

studies examining the effects of haplotypes, rather than the

effects of individual SNPs, on XPC function are warranted to

clarify the role of XPC polymorphisms.

In summary, despite the small sample size of the current

study, a limitation that we acknowledge and which limited our

ability to conclusively evaluate the effect of some PGHs, our

data indicate that haplotypes belonging to PGH-C, -D, and -F

appear to confer sensitivity to the mutagenic effects of tobacco

carcinogens. Larger studies are needed to confirm our initial

findings, and mechanistic research investigating the effect of

XPC haplotypes on NER capacity and on the risk of developing

diseases is clearly warranted. These studies are currently in

progress in our laboratory.

FUNDING

National Institute of Environmental Health Sciences (NIEHS)

Center award (ES06676), by a John Sealy Memorial

Endowment Foundation grant to S.A.-R.; a predoctoral fellow-

ship to C.M.R. from the NIEHS (T32-07454), a cancer

prevention fellowship funded by the National Cancer Institute

(K07CA093592) to C.J.E.; National Cancer Institute

(CA123208) to C.J.E.; CA129050 and CA098549 to R.E.-Z.

and by the National Institute of Neurological Disorders and

Stroke NS065392-01 to S.A.-R.; studies were conducted with

the assistance of the Institute for Translational Sciences—

Clinical Research Center at UTMB funded by a

1UL1RR029876-01 grant from the National Center for Research

Resources, National Institutes of Health.

ACKNOWLEDGMENTS

We thank Dr Marinel M. Ammenheuser for her critical

review of the manuscript.

REFERENCES

Abdel-Rahman, S. Z., and El-Zein, R. A. (2000). The 399Gln polymorphism in

the DNA repair gene XRCC1 modulates the genotoxic response induced in

human lymphocytes by the tobacco-specific nitrosamine NNK. Cancer Lett.

159, 63–71.


Abdel-Rahman, S. Z., Salama, S. A., Au, W. W., and Hamada, F. A. (2000).

Role of polymorphic CYP2E1 and CYP2D6 genes in NNK-induced

chromosome aberrations in cultured human lymphocytes. Pharmacogenetics

10, 239–249.

Affatato, A. A., Wolfe, K. J., Lopez, M. S., Hallberg, C.,

Ammenheuser, M. M., and Abdel-Rahman, S. Z. (2004). Effect of XPD/

ERCC2 polymorphisms on chromosome aberration frequencies in smokers

and on sensitivity to the mutagenic tobacco-specific nitrosamine NNK.

Environ. Mol. Mutagen. 44, 65–73.

An, J., Liu, Z., Hu, Z., Li, G., Wang, L. E., Sturgis, E. M., El-Naggar, A. K.,

Spitz, M. R., and Wei, Q. (2007). Potentially functional single nucleotide

polymorphisms in the core nucleotide excision repair genes and risk of

squamous cell carcinoma of the head and neck. Cancer Epidemiol.

Biomarkers Prev. 16, 1633–1638.

Araki, M., Masutani, C., Takemura, M., Uchida, A., Sugasawa, K., Kondoh, J.,

Ohkuma, Y., and Hanaoka, F. (2001). Centrosome protein centrin2/caltracin1

is part of the xeroderma pigmentosum group c complex that initiates global

genome nucleotide excision repair. J. Biol. Chem. 276, 18665–18672.

Au, W. W., Navasumrit, P., and Ruchirawat, M. (2004). Use of biomarkers to

characterize functions of polymorphic DNA repair genotypes. Int. J. Hyg.

Environ. Health 207, 301–313.

Bardel, C., Danjean, V., Morange, P., Genin, E., and Darlu, P. (2009). On the

use of phylogeny-based tests to detect association between quantitative traits

and haplotypes. Genet. Epidemiol. 33, 729–739.

Bonassi, S., Abbondandolo, A., Camurri, L., Dal Pra, L., De Ferrari, M.,

Degrassi, F., Forni, A., Lamberti, L., Lando, C., and Padovani, P. (1995).

Are chromosome aberrations in circulating lymphocytes predictive of future

cancer onset in humans? Preliminary results of an Italian cohort study.

Cancer Genet. Cytogenet. 79, 133–135.

Bonassi, S., Hagmar, L., Stromberg, U., Montagud, A. H., Tinnerberg, H.,

Forni, A., Heikkila, P., Wanders, S., Wilhardt, P., Hansteen, I. L., et al.

(2000). Chromosomal aberrations in lymphocytes predict human cancer

independently of exposure to carcinogens. European Study Group on

Cytogenetic Biomarkers and Health. Cancer Res. 60, 1619–1625.

Bunick, C. G., Miller, M. R., Fuller, B. E., Fanning, E., and Chazin, W. J.

(2006). Biochemical and structural domain analysis of xeroderma pigmentosum

complementation group C protein. Biochemistry 45, 14965–14979.

Cheng, A. J., Mao, Y. M., and Cui, R. Z. (2006). The effect of gene

polymorphism in promoter and intron 1 on human Apo A I expression.

Zhonghua Yi Xue Yi Chuan Xue Za Zhi 23, 610–613.

Cheng, L., Eicher, S. A., Guo, Z., Hong, W. K., Spitz, M. R., and Wei, Q.

(1998). Reduced DNA repair capacity in head and neck cancer patients.

Cancer Epidemiol. Biomarkers Prev. 7, 465–468.

Cheverud, J. M. (2001). A simple correction for multiple comparisons in

interval mapping genome scans. Heredity 87, 52–58.

Duan, Z. X., Zhu, P. F., Dong, H., Gu, W., Yang, C., Liu, Q., Wang, Z. G., and

Jiang, J. X. (2007). Functional significance of the TLR4/11367 polymorphism

identified in Chinese Han population. Shock 160, 160–164.

Evans, H. J., and O’Riordan, M. L. (1975). Human peripheral blood

lymphocytes for the analysis of chromosome aberrations in mutagen tests.

Mutat. Res. 31, 135–148.

Friedberg, E. C. (2001). How nucleotide excision repair protects against cancer.

Nat. Rev. Cancer 1, 22–33.

Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J.,

Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al.

(2002). The structure of haplotype blocks in the human genome. Science

296, 2225–2229.

George Priya Doss, C., Sundandiradoss, R., Rajasekaran, R., Choudhury, P.,

Sinha, P., Hota, P., Batra, U. P., and Rao, S. (2008). Applications of

computational algorithm tools to identify functional SNPs. Funct. Integr.

Genomics 9, 309–316.

Guo, W., Zhou, R. M., Wan, L. L., Wang, N., Li, Y., Zhang, X. J., and

Dong, X. J. (2008). Polymorphisms of the DNA repair gene xeroderma

pigmentosum groups A and C and risk of esophageal cell carcinoma in

a population of high incidence region of North China. J. Cancer Res. Clin.

Oncol. 134, 267–270.

Hagmar, L., Bonassi, S., Stromberg, U., Brogger, A., Knudsen, L. E.,

Norppa, H., and Reuterwall, C. (1998). Chromosomal aberrations in

lymphocytes predict human cancer: a report from the European Study

Group on Cytogenetic Biomarkers and Health (ESCH). Cancer Res. 58,

4117–4121.

Hagmar, L., Brogger, A., Hansteen, I. L., Heim, S., Hogstedt, B., Knudsen, L.,

Lambert, B., Linnainmaa, K., Mitelman, F., and Nordenson, I. (1994).

Cancer risk in humans predicted by increased levels of chromosomal

aberrations in lymphocytes: Nordic study group on the health risk of

chromosome damage. Cancer Res. 54, 2919–2922.

Hansen, R. D., Sorensen, M., Tjonneland, A., Overvad, K., Walling, H.,

Raaschou-Nielsen, O., and Vogel, U. (2007). XPA A23G, XPC Lys939Gln,

XPD Lys751Gln and XPD Asp312Asn polymorphisms, interactions with

smoking, alcohol and dietary factors, and risk of colorectal cancer. Mutat.

Res. 619, 68–80.

Hsu, T. C., Spitz, M. R., and Schantz, S. P. (1991). Mutagen sensitivity:

a biologic marker of cancer susceptibility. Cancer Epidemiol. Biomarkers

Prev. 1, 83–89.

IARC. (1986). Tobacco smoking. IARC Monographs for the Evaluation of the

Carcinogenic Risk of Chemicals to Humans (IARC), (World Health

Organization, Ed.), pp. 312–314. IARC, Lyon, France.

ISCN. (1985). An International System for Human Cytogenetic Nomenclature.

Report of the Standing Committee on Human Cytogenetic Nomenclature.

Birth Defects Orig. Artic. Ser. 21, 1–117.

Johnson, G. C., Esposito, L., Barratt, B. J., Smith, A. N., Heward, J., Di

Genova, G., Ueda, H., Cordell, H. J., Eaves, I. A., Dudbridge, F., et al.

(2001). Haplotype tagging for the identification of common disease genes.

Nat. Genet. 29, 233–237.

Kinslow, C. J., El-Zein, R. A., Hill, C. E., Wickliffe, J. K., and Abdel-Rahman, S. Z.

(2008). Single nucleotide polymorphisms 5’ upstream

the coding region of the NEIL2 gene influence gene transcription levels

and alter levels of genetic damage. Genes Chromosomes Cancer 47,

923–932.

Law, A. J., Kleinman, J. E., Weinberger, D. R., and Weickert, C. S. (2007).

Disease-associated intronic variants in the ErbB4 gene are related to altered

ErbB4 splice-variant expression in the brain in schizophrenia. Hum. Mol.

Genet. 16, 129–141.

Liu, G., Zhou, W., and Christiani, D. C. (2005). Molecular epidemiology of

non-small cell lung cancer. Semin. Respir. Crit. Care Med. 26, 265–272.

Manuguerra, M., Saletta, F., Karagas, M. R., Berwick, M., Veglia, F.,

Vineis, P., and Matullo, G. (2006). XRCC3 and XPD/ERCC2 single

nucleotide polymorphisms and the risk of cancer: a HuGE review. Am.

J. Epidemiol. 164, 297–302.

Park, C. J., and Choi, B. S. (2006). The protein shuffle: sequential interactions

among components of the human nucleotide excision repair pathway.

FEBS J. 273, 1600–1608.

Rieder, M. J., Reiner, A. P., Gage, B. F., Nickerson, D. A., Eby, C. S.,

McLeod, H. L., Blough, D. K., Thummel, K. E., Veenstra, D. L., and

Rettie, A. E. (2005). Effect of VKORC1 haplotypes on transcriptional

regulation and warfarin dose. N. Engl. J. Med. 352, 2285–2293.

Sak, S. C., Barrett, J. H., Paul, A. B., Bishop, T. D., and Kiltie, A. E. (2006).

Comprehensive analysis of 22 XPC polymorphisms and bladder cancer risk.

Cancer Epidemiol. Biomarkers Prev. 15, 2537–2541.

Shen, H., Spitz, M. R., Qiao, Y., Guo, Z., Wang, L. E., Bosken, C. H.,

Amos, C. I., and Wei, Q. (2003). Smoking, DNA repair capacity and risk of

nonsmall cell lung cancer. Int. J. Cancer 107, 84–88.


Shen, M., Berndt, S. I., Rothman, N., DeMarini, D. M., Mumford, J. L., He, X.,

Bonner, M. R., Tian, L., Yeager, M., Welch, R., et al. (2005). Polymorphisms

in the DNA nucleotide excision repair gene and lung cancer risk

in Xuan Wei, China. Int. J. Cancer 116, 768–773.

Spitz, M. R., Hsu, T. C., Wu, X. F., Fueger, J. J., Amos, C. I., and Roth, J. A.

(1995). Mutagen sensitivity as a biologic marker of lung cancer risk in

African Americans. Cancer Epidemiol. Biomarkers Prev. 4, 99–103.

Veenstra, D. L., You, J. H., Rieder, M. J., Farin, F. M., Wilkerson, H. W.,

Blough, D. K., Cheng, G., and Rettie, A. E. (2005). Association of Vitamin

K epoxide reductase complex 1 (VKORC1) variants with warfarin dose in

a Hong Kong Chinese patient population. Pharmacogenet. Genomics 15,

687–691.

Wang, Y., Spitz, M. R., Lee, J. J., Huang, M., Lippman, S. M., and Wu, X.

(2007). Nucleotide excision repair pathway genes and oral premalignant

lesions. Clin. Cancer Res. 13, 3753–3758.

Weiss, J. M., Weiss, N. S., Ulrich, C. M., Doherty, J. A., and Chen, C. (2006).

Nucleotide excision repair genotype and the incidence of endometrial cancer:

effect of other risk factors on the association. Gynecol. Oncol. 103, 891–896.

Zhu, Y., Lai, M., Yang, H., Lin, J., Huang, M., Grossman, H. B., Dinney, C. P.,

and Wu, X. (2007). Genotypes, haplotypes, and diplotypes of XPC and risk

of bladder cancer. Carcinogenesis 28, 698–703.

Zhu, Y., Yang, H., Chen, Q., Lin, J., Grossman, H. B., Dinney, C. P., Wu, X.,

and Gu, J. (2008). Modulation of DNA damage/DNA repair capacity by

XPC polymorphisms. DNA Repair 7, 141–148.
