CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-
positive hits for highly amplified genomic regions
Diana M. Munoz1, Pamela J. Cassiani1, Li Li1, Eric Billy2, Joshua M. Korn1, Michael D. Jones1, Javad Golji1,
David A. Ruddy1, Kristine Yu1, Gregory McAllister 3, Antoine DeWeck2, Dorothee Abramowski2, Jessica
Wan1, Matthew D. Shirley1, Sarah Y. Neshat1, Daniel Rakiec1, Rosalie de Beaumont1, Odile Weber2,
Audrey Kauffmann2, E Robert McDonald III1, Nicholas Keen1, Francesco Hofmann2, William R. Sellers1,
Tobias Schmelzle2, Frank Stegmeier1,4,5 and Michael R. Schlabach1,4,5*.
1. Oncology Disease Area, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, USA.
2. Oncology Disease Area, Novartis Institutes for Biomedical Research, Basel, Switzerland.
3. Developmental and Molecular Pathways, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, USA.
4. Present address: KSQ Therapeutics, Cambridge, Massachusetts, USA.
5. These authors contributed equally to this work.
Running title: CRISPR screens for the discovery of cancer vulnerabilities
Keywords: CRISPR, shRNA, drop out screens, cancer vulnerabilities and genetic amplifications
Manuscript type: Research article
*Corresponding author: Michael R Schlabach
Mailing address: KSQ Therapeutics, 790 Memorial Drive, Suite 200, Cambridge, MA 02139
Email: [email protected]
Cell: 617-444-9192
Disclose any potential conflict of interest: D.M.M, P.J.C, L.L, E.B, J.K, M.D.J, J.G, D.R, K.Y, G.M, A.D, D.A, J.W, M.D.S, S.Y.N, D.R, R.B, O.W, A.K, E.R.M, N.K, F.H, W.R.S, T.S, F.S and M.R.S are employees of Novartis.
Word count: 5,617
Total number of figures: 6
Downloaded from cancerdiscovery.aacrjournals.org on April 6, 2018. © 2016 American Association for Cancer Research.
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on June 3, 2016; DOI: 10.1158/2159-8290.CD-16-0178
Abstract
CRISPR/Cas9 has emerged as a powerful new tool to systematically probe gene function. We compared
the performance of CRISPR to RNAi-based loss-of-function screens for the identification of cancer
dependencies across multiple cancer cell lines. CRISPR dropout screens consistently identified more
lethal genes than RNAi, implying that the identification of many cellular dependencies may require full
gene inactivation. However, in two aneuploid cancer models we found that all genes within highly
amplified regions, including non-expressed genes, scored as lethal by CRISPR, revealing an unanticipated
class of false-positive hits. Additionally, using a CRISPR tiling screen, we found that sgRNAs targeting
essential domains generate the strongest lethality phenotypes and thus provide a strategy to rapidly
define the protein domains required for cancer dependence. Collectively, these findings demonstrate
the utility of CRISPR screens in the identification of cancer-essential genes, but also reveal the need to
carefully control for false-positive results in chromosomally unstable cancer lines.
Significance
We show in this study that CRISPR-based screens have a significantly lower false-negative rate
compared to RNAi-based screens, but have specific liabilities particularly in the interrogation of regions
of genome amplification. Therefore, this study provides critical insights for applying CRISPR-based
screens towards the systematic identification of new cancer targets.
Introduction
Genetic loss-of-function screens are an important approach enabling the systematic identification of
cancer selective vulnerabilities. In mammalian cells, RNAi has been the predominant method of
screening and has enabled systematic and genome-wide loss-of-function screens leading to the
identification of new cancer targets (1, 2). RNAi-based screens, however, are often confounded by off-
target effects (3). In addition, RNAi induces mRNA downregulation typically resulting in reduced gene
function (hypomorphic allele), rather than a complete loss of function (null allele). Thus, in addition to
the problem of false-positives, RNAi screens also likely suffer a certain rate of false-negative detection of
genes where near-complete loss of function would be required in order to elicit a phenotypic effect.
The frequency of false-negatives in RNAi-based screens has not yet been systematically assessed.
More recently, the prokaryotic type II CRISPR–Cas9 (clustered regularly interspaced short palindromic
repeats–CRISPR-associated 9) has emerged as an RNA-based genome-editing tool that can be used to
enact loss-of-function screens (4). In contrast to RNAi, the CRISPR system induces sequence-directed
DNA double stranded breaks resulting in frameshift insertion/deletion (indel) mutations that can induce
complete loss of protein function (5). Initial studies demonstrated the use of CRISPR for genetic screens
in mammalian cells (6, 7) and showed a high level of phenotypic agreement between reagents targeting
the same gene and a high rate of hit confirmation. Most of these screens were positive selection screens
which are technically less challenging than ‘drop out’ screens. Subsequent screens (8, 9) used improved
libraries and screening methods to discover essential genes in mammalian cells, but a systematic
comparison of CRISPR to RNAi in drop-out screens has not yet been described with sufficient reagent
depth to enable robust conclusions.
In this study, we systematically compared the performance of these two screening technologies for the
identification of new cancer vulnerabilities. We show that at equivalent screening depth, CRISPR
dropout screens identified a significantly higher number of essential genes and thus provide a more
comprehensive assessment of genetic dependencies compared to RNAi-based screens. Additionally, we
show that sgRNAs that target DNA sequences within conserved Pfam domains (10) tend to result in a
more robust drop out phenotype. These findings have important implications for future library designs
and suggest that the CRISPR tiling approach outlined herein might be used to elucidate which protein
domains are critical in driving biological effects. We surprisingly found that all genes within highly
amplified regions, even when not expressed, scored as strongly lethal, revealing an unanticipated class of
false-positive hits. Collectively, these findings demonstrate that while CRISPR has certain specific
limitations, CRISPR-mediated genetics screens can be used for robust and systematic discovery of cancer
cell vulnerabilities.
Results
CRISPR-based dropout screens provide a more complete assessment of cancer dependencies
compared to shRNA screens
In order to robustly compare RNAi- and CRISPR-based screening technologies, we constructed shRNA
and sgRNA libraries targeting 2722 human genes with an average coverage of 20 reagents per gene. The
sgRNAs were designed against the N-terminus of protein coding genes, as described previously, as
frameshift mutations in the N-terminus are thought to be more likely to result in ‘complete’ protein
inactivation (7, 11). Deep shRNA libraries were designed as previously described (2). These libraries were
used in proliferation-based screens in a set of 5 cancer cell lines including the colorectal cancer cell lines
DLD1 and RKO, fibrosarcoma cell line HT-1080, astrocytoma cell line SF-268, and gastric cancer cell line
MKN-45 (Fig. 1A). Following lentiviral transduction of the sgRNA libraries, the impact of gene depletion
on cellular viability or proliferation was assessed by quantifying the abundance of sgRNAs at day 0
(plasmid count) relative to day 14 using next-generation sequencing (see details in Methods). sgRNAs
targeting essential genes are expected to inhibit the growth of transduced cells and thus their relative
abundance will be reduced when comparing the relative counts on day 14 vs day 0 (Fig 1A, right graph).
We found that across all cell lines screened about 2-3% of genes scored as lethal genes by both RNAi and
CRISPR approaches (Fig. 1B and 1C, Supplementary Fig. S1, quadrant III). The gene list in quadrant III
included many known essential gene classes such as ribosomal, RNA processing, and DNA replication
factors (Supplementary Fig. S2). Notably, there were very few genes that scored as essential by RNAi but
not by CRISPR (Figure 1B, 1C, S1, IV). In contrast, in all of the five cancer models screened a large
number of genes scored as essential by CRISPR but not RNAi (Fig. 1B, 1C, S1, II). In fact, the number of
lethal genes identified by CRISPR was two-fold (HT1080 cells) to five-fold (DLD1 cells) higher compared to
RNAi. This suggested that CRISPR either had a significantly lower false-negative rate than shRNA, or a
much higher false-positive rate. One way to identify likely off-target hits is to examine the lethality
scores of non-expressed genes, as these are expected to not be required for cell viability. In DLD1, RKO,
and HT1080 cells, all of the genes required for cell viability (average Z-scores below -1) had an RNASeq
RPKM expression value greater than 2, indicating that the CRISPR screen at this depth showed virtually
no false-positive effects from sgRNAs directed against non-expressed genes (Fig 1D, Fig S3). However, in
SF268 (Fig 1E) and MKN-45 cells (Fig S3), a number of genes scored as essential in the CRISPR screen
despite not being expressed. As described in detail below, we found that these false positive hits were
associated with genes in regions of high copy number amplification. These false positive hits were only
observed in SF268 and MKN-45, as these are chromosomally aneuploid lines, whereas DLD1, RKO and
HT1080 are diploid cancer lines. After removing these false-positive hits due to amplified genes in SF268
and MKN-45 cells, we conducted further analysis on the essential genes identified by CRISPR and RNAi.
The category of genes that only scored by CRISPR but not RNAi included many genes known to be
essential for proliferation of most cells such as CDK9, PLK1, and MYC (12, 13) as well as many known
essential gene classes (RNA processing and DNA replication). We hypothesized that RNAi-based screens
failed to recover these genes either due to the absence of effective shRNAs in our library (despite a
coverage of 20 reagents per gene) and/or insufficient protein knockdown to reveal a full loss-of-function
phenotype. In support of this hypothesis, we found that only 1 of 6 CDK9 shRNAs tested achieved potent
CDK9 protein depletion that resulted in growth inhibition (Supplementary Fig. S4A-B), thus explaining
why CDK9 failed to score as an essential gene in the shRNA-based screen. Collectively, these findings
indicate that CRISPR-based screening enables a more complete assessment of genes required for cancer
cell growth.
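The dropout readout described above (sgRNA abundance at day 14 relative to day 0, summarized as an average Z-score per gene) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the counts-per-million scaling, the pseudocount, the choice to Z-score each sgRNA against all sgRNAs in the library, and the function names are assumptions, since the Methods are not reproduced here.

```python
import math
from statistics import mean, pstdev


def log2_fold_changes(day0, day14, pseudocount=1.0):
    """Per-sgRNA log2 fold change of normalized abundance, day 14 vs day 0.

    day0, day14 -- dicts mapping sgRNA id to raw read count.
    The normalization scheme is an assumption made for illustration.
    """
    total0 = sum(day0.values())
    total14 = sum(day14.values())
    lfc = {}
    for guide, count0 in day0.items():
        # Scale to counts per million so the two samples are comparable.
        cpm0 = count0 / total0 * 1e6
        cpm14 = day14.get(guide, 0) / total14 * 1e6
        lfc[guide] = math.log2((cpm14 + pseudocount) / (cpm0 + pseudocount))
    return lfc


def gene_z_scores(lfc, guide_to_gene):
    """Z-score each sgRNA against all sgRNAs, then average per gene."""
    mu = mean(lfc.values())
    sigma = pstdev(lfc.values())  # assumes the scores are not all identical
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append((value - mu) / sigma)
    return {gene: mean(zs) for gene, zs in per_gene.items()}
```

With such scores in hand, a gene would be called essential when its average Z-score falls below -1, the cutoff used in the text.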
We next sought to explore whether CRISPR screens can be used to identify cancer-selective
dependencies. The five cell lines screened were derived from various tumor lineages with distinct
genetic alterations. Using a cutoff of -1 average Z-score to delineate genes that are cell essential, we
found that a total of 409 genes scored as essential in at least one of the five cancer cell lines. Of these,
34% of essential genes were required for the proliferation of all cell lines, suggesting that these genes
serve core cellular functions that are likely required for the proliferation of most cells (Fig 2A, B); we
henceforth refer to this category of broadly essential genes as pan-lethals. A smaller number of genes
was selectively required for the growth of only one (25%) or two (12%) of the five screened cancer cell
models (Fig 2A, B); we refer to this class of genes as selective lethals. Of note, the class of selective
lethals included several known oncogene dependencies. For instance, Beta-catenin targeting sgRNAs
selectively impaired the proliferation of DLD1, an APC mutated cell line with constitutive activation of
WNT pathway signaling (14) (Fig 2C). The selective dependence on Beta-catenin was validated using
inducible sgRNAs and additional cell proliferation assays (Supplementary Fig. S5A-D). This cell line was
also dependent on TCF7L2, a gene that encodes the transcription factor Tcf4. Tcf4 interacts with Beta-
catenin to drive expression of WNT pathway target genes. Surprisingly, the gastric cancer cell line
MKN45 also exhibited dependence on Beta-catenin and Tcf4 despite lacking genetic alterations in WNT
pathway components (Fig 2C). Of note, a prior study reported high levels of nuclear Beta-catenin in
MKN45 cells (15), suggesting that WNT pathway activation in this cell line might be driven by
non-genetic mechanisms. When we looked at the pattern of KRAS dependence, we found that KRAS-targeting
sgRNAs selectively impaired the proliferation of DLD1 cells (Fig 2C); this dependency is likely explained by the
fact that DLD1 harbors the oncogenic KRAS G13D mutation. Unexpectedly, however, MKN45 cells were
also dependent on KRAS despite lacking genetic alterations in this oncogene. These cells harbor MET
amplification; thus, one possibility is that MET signaling requires KRAS. KRAS also had a low Z score in
MKN-45 cells by shRNA screening (avg. Zscore of -0.8), suggesting the phenotype is biologically relevant
and not a false positive. Both the Beta-catenin and KRAS sensitivities in MKN-45 would not have been
predicted by genetic alterations alone, but were discovered independently by both RNAi and CRISPR,
highlighting the importance of functional profiling. The selective pattern of NRAS and PIK3CA
dependency correlated well with the presence of oncogenic alterations in NRAS and PIK3CA,
respectively (Fig 2C, Supplementary Fig. S5). In addition, MDM2 sgRNAs selectively impaired the growth
of p53 wild-type but not p53 mutant cell lines (Fig 2C). Importantly, this genetic pattern of MDM2
dependence recapitulates the selective inhibition of p53 wild-type cell lines by pharmacological MDM2
inhibitors, such as Nutlin-3 (16). Together, these findings indicate that in addition to the identification of
broadly essential genes, CRISPR-based dropout screens can also robustly identify cancer-selective
vulnerabilities.
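The classification used in this section (essential at an average Z-score below -1; pan-lethal if essential in every line; selective lethal if essential in only one or two of the five lines) can be expressed as a short sketch. The function name, the input layout, and the "intermediate" label for genes essential in three or four lines are illustrative assumptions; the text does not name that last group.

```python
def classify_essential_genes(z_by_line, cutoff=-1.0):
    """Label genes by how many cell lines depend on them.

    z_by_line -- {cell_line: {gene: average_z_score}} (hypothetical layout).
    Returns {gene: label} for genes essential in at least one line.
    """
    n_lines = len(z_by_line)
    genes = set().union(*(scores.keys() for scores in z_by_line.values()))
    labels = {}
    for gene in genes:
        # A missing score is treated as "not essential" in that line.
        hits = sum(
            1 for scores in z_by_line.values() if scores.get(gene, 0.0) < cutoff
        )
        if hits == 0:
            continue
        if hits == n_lines:
            labels[gene] = "pan-lethal"
        elif hits <= 2:
            labels[gene] = "selective lethal"
        else:
            labels[gene] = "intermediate"  # label not used in the text
    return labels
```

Genes that never cross the cutoff are simply omitted from the result, which mirrors the restriction to genes that scored as essential in at least one line.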
sgRNAs targeting conserved PFAM domains show most robust dropout phenotypes
While the CRISPR-based screen identified CTNNB1 as a cancer-selective dependency in WNT-pathway
deregulated cancer models, CTNNB1 was one of the few genes that scored more robustly in the shRNA
screen compared to CRISPR (Fig 3A). Examination of the individual sgRNA scores indicated that the
efficacy of sgRNAs correlated with the targeting position in the CTNNB1 transcript (Fig. 3B); the first five
sgRNAs targeting the most 5’ regions of the CTNNB1 transcript showed very little to no dropout
phenotype. By contrast, 87% of the next 15 sgRNAs targeting the downstream exons 3, 4 and 5 exhibited
a stronger lethality score. Investigation of the genomic locus of CTNNB1 revealed that it harbors an
alternative translational initiation start site in exon 3 (transcript ID ENST00000405570) suggesting that
the isoform expressed from this alternative start site is likely sufficient for cancer cell growth, explaining
the lack of a dropout phenotype of the 5’ targeting sgRNAs.
We next set out to more systematically investigate the importance of sgRNA positioning on gene
inactivation. To this end, we designed a sgRNA library that contains all possible sgRNAs targeting a set of
139 genes with an average of 364 sgRNAs/gene, which we refer to as a CRISPR tiling array (Fig. 4A). The
genes included in the CRISPR tiling array were chosen to represent diverse biological functions, but were
enriched for genes that elicited growth phenotypes in the primary screen. In order to minimize potential
biases, we included all unique sgRNA sequences targeting these gene coding sequences only requiring
the presence of a PAM sequence and lack of perfect homology to other coding sequences. This CRISPR
tiling library was screened in the three cancer cell lines DLD1, RKO and NCI-H1299. Interestingly, as
observed for CTNNB1, for 63% (46 of 73) of the growth essential genes in DLD1, the sgRNA performance
was strongly influenced by the sgRNA position within the coding region. Similarly, 68% (52 of 76) of the
essential genes in RKO cells showed coding-region dependent activity. The growth effects of individual
sgRNAs were significantly correlated across cell lines (r² = 0.504), suggesting that these effects represent
consistent differences in the biological effectiveness of individual reagents (Fig 4B). We next performed
a systematic correlation analysis of sgRNA features to identify what features correlated most strongly
with sgRNA potency. Interestingly, the top predictive feature for sgRNA performance was its localization
within a conserved Pfam protein domain (Fig 4C, Supplementary Fig. S6). In addition, the extent of
sequence conservation across vertebrate species was also a good predictor (p<<0.001) of sgRNA
efficacy, regardless of whether or not the region was annotated as a conserved Pfam domain. While
prior studies (7) have suggested that there is value in targeting the most 5’ coding regions of proteins
with CRISPR reagents, this was not the case in our screens. In this dataset, the average phenotype of
sgRNAs targeting essential genes was slightly weaker for sgRNAs targeting the extreme N-terminal
coding region, and much weaker in the 3’ most coding regions (last 20%) of proteins. This effect,
however, appeared to be largely driven by the location of PFAM domains within coding regions, as the
N-terminal and C-terminal effects were no longer observed when only sgRNAs targeting annotated
domains were included in the analysis (Supplementary Fig. S7A-B).
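The domain comparison described above can be illustrated with a small sketch that splits the sgRNAs tiling one gene by whether the targeted residue falls inside an annotated domain and then compares the average scores of the two groups. The input layout (residue position plus dropout score) and the helper names are hypothetical; this is not the feature-correlation analysis used in the study.

```python
from statistics import mean


def split_by_domain(guides, domains):
    """Split sgRNA scores by whether the target lies in an annotated domain.

    guides  -- list of (residue_position, score) tuples (hypothetical layout)
    domains -- list of (start, end) residue intervals, both ends inclusive
    """
    inside, outside = [], []
    for position, score in guides:
        hit = any(start <= position <= end for start, end in domains)
        (inside if hit else outside).append(score)
    return inside, outside


def mean_or_none(values):
    """Average of a list, or None when the list is empty."""
    return mean(values) if values else None
```

A more negative average inside the domain intervals than outside would be consistent with the pattern reported for CTNNB1, PLK1 and AURKB.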
Based on the observation that sgRNAs targeting conserved protein domains scored more robustly in
CRISPR-based screens, we hypothesized that CRISPR-tiling data might be used to perform functional
annotation of critical protein domains. Indeed, sgRNAs targeting the highly conserved armadillo repeats
in Beta-catenin demonstrated more significant average lethality scores compared to sgRNAs targeting
less conserved regions (Fig 4D). The failure of some of the Beta-catenin sgRNAs to score despite
targeting the highly conserved armadillo repeats correlated with ineffective genome editing by these
reagents (Supplementary Fig. S8A-C). Similar to the case of Beta-catenin, sgRNAs targeting the highly
conserved kinase domain or polo-box regions in PLK1 showed the most robust dropout phenotypes (Fig
4E), and sgRNAs targeting the kinase domain of Aurora kinase B (AURKB), had significantly stronger
effects than those targeting the extreme N or C termini (Fig 4F). These findings are consistent with the
notion that the armadillo repeats in Beta-catenin, the kinase activity in Aurora kinase B, and both the
kinase activity and polo-boxes in PLK1 are essential in mediating their cellular functions (13, 17). A
recent study revealed that the helicase activity but not the bromo-domain of BRM is required to sustain
the growth of BRG1 deficient cancers (Fig. 4G) (18). Strikingly, the CRISPR tiling data for BRM indicated a
more robust dropout phenotype for sgRNAs targeting the ATPase/helicase activity compared to those
targeting the bromo-domain region. Together, these findings suggest that CRISPR tiling screens might be
useful, in some cases, to decipher which protein domains are required for cancer cell growth.
Amplified genomic loci score as false-positive in CRISPR based dropout screens.
As described earlier, we found that in the two aneuploid cancer cell lines SF268 and MKN-45 several
non-expressed genes scored as essential, suggesting that these genes represent false positive hits.
Strikingly, all of these false-positive hits mapped to regions of high-level copy number amplification (Fig
5A, B). We therefore wanted to explore more deeply the effect of amplified genomic regions on the
performance of CRISPR-based screens. MKN-45 is a gastric cancer cell line that harbors amplification of a
region of chromosome 7 (7q31) that contains the likely driver oncogene MET (Fig. 5A). While MET
scored as essential in MKN-45 cells, all other genes included in the library and located within the 7q31
amplicon also scored as lethal. Moreover, sgRNAs targeting ING3 and CAV1 exhibited the strongest
viability effect of the genes located within 7q31 amplicon, with MET ranking third (Fig. 5C). Similar
results were observed for SF268 cells, where all genes in the chromosome 11 amplicon (11q22) scored
as lethal (Fig. 5B, D). YAP has been hypothesized to be the most likely driver of this amplicon (19). While
YAP did score as the most strongly essential in this cell line, three genes within this amplicon, MMP7,
MMP20 and ANGPTL5, showed strong viability effects despite lacking detectable expression based on
RNAseq. Of note, all of the non-expressed genes (RNAseq<1) that scored as lethal in SF268 and MKN-45
cells were located in amplified genomic regions (Fig 5A, B). By contrast, shRNA-based screens identified
both MET and YAP as the sole driver oncogenes of their respective amplicons (Fig. 5E, F). We
hypothesized that sgRNAs targeting amplified loci may lead to excessive double-strand breaks and
activation of the DNA damage repair pathways. To test this, we examined the effects of sgRNAs
targeting the non-expressed and amplified genes MMP7, MMP20, and ANGPTL5 in SF268 cells that
harbor the 11q22 amplicon. All 3 sgRNAs led to a strong increase in phosphorylated histone H2AX, a
marker of DNA damage (Supplementary Fig. S9a) and resulted in a G2/M arrest and induction of
apoptosis (Supplementary Fig. S9b-d and S10a-b). As predicted, the induction of DNA damage response,
G2/M arrest, and apoptosis by these sgRNAs was specific to cells with 11q22 amplicon and not observed
in the diploid DLD-1 cells (Supplementary Fig. S9a, S10a-b).
We next explored the effect of relative gene copy number on CRISPR lethality score more globally across
the CRISPR screening dataset. When comparing the copy number status to the average lethality score
for all 2,700 genes screened in these two cell lines, we found a positive correlation between the degree
of amplification and CRISPR lethality score (Fig. 6A). By contrast, there was no correlation between copy
number and lethality score in the shRNA screen dataset (Fig. 6A). Even sgRNAs directed against loci with
only a modestly increased copy number, harboring as few as one or two additional copies, showed a
greater average growth inhibitory effect than non-amplified loci. In addition, we observed that sgRNAs
targeting regions harboring hemizygous or complete loss of the genomic region displayed on average a
less pronounced growth effect than diploid regions. This effect was highly significant even when the
analysis was restricted to only non-expressed genes (p = 10⁻³⁵), thus excluding the possibility that this
effect is due to disruption of gene function. Together, these findings further support the notion that
CRISPR reagents that induce multiple genomic cuts result in an anti-proliferative effect independent of the
target gene function and that this is directly proportional to the number of induced cuts. We next
wanted to investigate if this phenomenon may also help to explain some of the off-target lethality of
individual sgRNA reagents. To minimize any confounding effects due to on-target gene inactivation, we
restricted this analysis to 14,000 sgRNAs targeting non-lethal genes (as judged by lack of dropout of the
average sgRNA targeting that gene). Strikingly, the best predictor of off-target lethality was the number
of genomic sites with perfect complementarity to the target site (Figure 6B and Supplementary Table
S1). To investigate the mechanism of growth inhibition of these ‘multi-cutter’ sgRNAs, we examined the
cellular response to VEGFA site 2 sgRNA that was previously shown by GUIDE-seq to have more than 140
verified off-target sites (20), as well as another multiple cutter sgRNA observed in our screens (originally
designed against the olfactory receptor OR4F5). Similar to sgRNAs targeting amplified loci, we found
that both ‘multi-cutter’ sgRNAs led to strongly increased phosphorylation of H2AX, G2/M cell cycle
arrest, and apoptosis (Supplementary Fig. S9 and S10). It is important to note that the sgRNAs included
in the CRISPR tiling array were only filtered against perfect matches to other coding regions rather than
the entire genome. sgRNAs targeting multiple genomic loci frequently contained low complexity repeat
sequences (e.g. AGGAGGAGG…), but the off-target effects due to multiple genome matches were still
observed after the exclusion of low complexity repeats from the dataset. Collectively, these findings
indicate that loss-of-function proliferation based studies using sgRNA mediated gene-inactivation will be
subject to a set of off-target activities related to the number of times a guide strand sequence is found
in the genome. This will likely lead to false positives in genes found in areas of genomic amplification,
and false-positives due to multiple homologous sites for a given sgRNA. Hence, sgRNAs should be
selected to have no additional matches to genomic regions (even if not expressed) in order to minimize
off-target lethality due to excessive genome damage. Moreover, these findings indicate that RNAi or
CRISPRi-based screens (21) will be better suited to elucidate the driver oncogenes of amplified regions.
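The design recommendation above (select sgRNAs with no additional perfect matches anywhere in the genome) amounts to counting perfect target sites per guide. The sketch below does this by exact string search on both strands of a toy sequence. It assumes an NGG PAM immediately 3' of the protospacer, as for Streptococcus pyogenes Cas9; the excerpt does not state which PAM was used, and a genome-scale filter would use an indexed aligner rather than this naive scan.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_perfect_sites(protospacer, genome):
    """Count perfect protospacer matches followed by an NGG PAM, both strands.

    The NGG PAM is an assumption (SpCas9); overlapping matches are counted.
    """
    protospacer = protospacer.upper()
    forward = genome.upper()
    count = 0
    for strand in (forward, reverse_complement(forward)):
        start = strand.find(protospacer)
        while start != -1:
            pam_start = start + len(protospacer)
            pam = strand[pam_start:pam_start + 3]
            if len(pam) == 3 and pam[1:] == "GG":
                count += 1
            # Step by one base so overlapping occurrences are not skipped.
            start = strand.find(protospacer, start + 1)
    return count
```

A guide with a count above one on the reference genome would be a candidate 'multi-cutter' and could be excluded at the library design stage.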
Discussion
Genetic loss of function studies hold great promise for the discovery of novel therapeutic targets for
cancer and other diseases. In this study, we compared the deep coverage shRNA and CRISPR-based
screens for the systematic identification of cancer vulnerabilities. Our data indicate that CRISPR dropout
screens identified two to five times as many essential genes as RNAi-based loss-of-function
screens, even when the shRNA screens are powered at 20 shRNAs per gene. We speculate that this
high rate of false-negatives in RNAi-based screens can likely be attributed to the incomplete nature of
gene inactivation by RNAi, which in most cases generates hypomorphic rather than complete null alleles
(22). By contrast, CRISPR cutting of genomic DNA and error-prone NHEJ will result in indel mutations.
Indels are typically more catastrophic mutations to protein function and frequently lead to complete
gene disruption, especially in the case of frameshift mutations. These findings indicate that CRISPR
based dropout screens can provide a more comprehensive assessment of genetic dependencies
compared to RNAi-based screens.
As for any emerging technology, the specificity and optimal design parameters for CRISPR experiments
are not yet fully understood. Although CRISPR-based screens generally have a low false-positive rate (7,
11), likely owing to the increased targeting specificity of sgRNAs (22), we surprisingly found that CRISPR
can be prone to false-positive hits for genes at high copy number, especially at loci with more than
6 copies. While it will be important to control for this class of false-positive hits, it is
important to note that these artefactual hits comprise only a minor fraction of all essential genes
discovered by CRISPR screens in aneuploid lines (Supplementary Fig. S11) and can easily be removed
bioinformatically. The copy number effect on CRISPR lethality was likely missed in several earlier studies
because those screens were performed on cell lines with stable diploid genomes. A recent study has
observed a similar copy number effect on a single cell line harboring a high level amplicon (8). We
reasoned that the lethality of sgRNAs targeting amplified genomic regions might be explained by two
hypotheses. First, sgRNAs targeting genes within tandem amplicons could lead to the excision of the
entire locus including removal of the essential oncogenic driver genes. Alternatively, an excessive
number of DNA double strand breaks may lead to sustained activation of the DNA damage response
pathway and growth inhibition. In agreement with Wang et al., we found that sgRNAs targeting
amplified loci led to an increase of the DNA damage marker phospho-H2AX, a G2/M cell cycle arrest,
and induction of apoptosis (8). These findings suggest that activation of the DNA damage response
pathway due to excessive DNA double strand breaks is, at least in part, responsible for the observed
growth inhibitory effects, but it is quite possible that the deletion of oncogenic drivers in tandem
amplicons contributes as well. The CN effect of CRISPR appears to be independent of p53 status, as it
was observed with similar magnitude in both p53 mutated (SF-268) and wild-type (MKN-45) cell lines.
While the CN effect is most severe at highly amplified loci, we found that even subtle copy number
changes can have statistically significant effects on CRISPR dropout scores. It is important to note,
however, that one may be able to correct for these subtler copy number effects with bioinformatics
approaches.
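One possible form of such a bioinformatic correction is sketched below. It is not the procedure used in this study. It assumes that non-expressed genes (expression <1) have no on-target effect, so that any trend of their Z-scores with copy number reflects genome cutting alone; the function names, the log2 transform of copy number and the toy values are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_for_copy_number(genes, expression_cutoff=1.0, reference_cn=2.0):
    """Subtract the copy-number trend estimated on non-expressed genes.

    genes: list of dicts with keys 'name', 'z', 'copy_number', 'expression'.
    Only the fitted slope is used, so genes at the reference copy number
    keep their original Z-score.
    """
    silent = [g for g in genes if g["expression"] < expression_cutoff]
    xs = [math.log2(g["copy_number"]) for g in silent]
    ys = [g["z"] for g in silent]
    slope, _ = fit_line(xs, ys)
    ref = math.log2(reference_cn)
    return {
        g["name"]: g["z"] - slope * (math.log2(g["copy_number"]) - ref)
        for g in genes
    }


# Toy values, invented for illustration only.
toy = [
    {"name": "silent_cn2", "z": 0.0, "copy_number": 2, "expression": 0.0},
    {"name": "silent_cn8", "z": -1.0, "copy_number": 8, "expression": 0.0},
    {"name": "silent_cn32", "z": -2.0, "copy_number": 32, "expression": 0.0},
    {"name": "expressed_cn8", "z": -1.2, "copy_number": 8, "expression": 25.0},
    {"name": "expressed_cn2", "z": -2.5, "copy_number": 2, "expression": 40.0},
]
corrected = correct_for_copy_number(toy)
```

In this toy example the amplified but non-expressed genes return to a Z-score near zero, while an expressed gene at copy number 2 keeps its full dropout score. Genes above the copy number at which the effect becomes severe could alternatively be flagged and excluded rather than corrected.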
These findings have several important implications for the design of CRISPR screening strategies. First,
CRISPR-based screens will likely not be a good approach to determine drivers of amplified genomic
regions. The putative amplified driver oncogene MET, for instance, did not have the strongest viability
effect in MKN-45 cells compared to other genes in the amplicon. By contrast, MET was identified as the
driver oncogene of this amplicon using an shRNA-based screen, indicating that RNAi or CRISPRi-based
screens (21) are better suited to elucidate the driver oncogenes of amplified regions. Second, these
findings have important implications for future sgRNA library designs. In order to avoid lethality due to
excessive genome cuts, it will be critical to design CRISPR reagents that have no or at least minimal other
matches across the entire human genome. Our findings also imply that for pooled CRISPR screening
studies, it will be important to keep the multiplicity of infection during lentiviral transduction low, as
transduction with multiple sgRNAs targeting different genomic regions could lead to excessive genome
cuts and hence result in lethality. Interestingly, even diploid genes (CN=2) exhibited a slight but
statistically significant growth reduction compared to haploid (CN=1) gene loci. Due to this apparent
selection pressure against any genome cutting, it is possible that Cas9-expressing cells could be selected
against strongly during the course of screening. Third, the ease of multiplexing sgRNAs in single
experiments enables complex genome engineering and synthetic lethal screening. However,
based on our findings, one needs to carefully control for the effects of additional genomic cuts in dual or
even higher multiplexed screens, as synthetic lethality could be the result of passing a threshold of
‘excessive’ genomic cuts rather than genetic interactions. Fourth, the observed copy number effects
suggest that the use of a scrambled non-targeting CRISPR that does not cut the human genome is likely
not the best control for CRISPR lethality experiments, and should be replaced with reagents cutting non-
expressed or known non-essential genomic regions, such as the AAVS1 locus. Lastly, it will be important
to examine whether the copy number effects observed in our study also pertain to normal tissues. In
that case, caution should be exerted in both the experimental and therapeutic application of CRISPR to
the editing of polyploid tissues, such as liver (23), as it may result in extensive genome damage that leads
to impaired growth or apoptosis.
Most sgRNA libraries have been designed to direct CRISPR-Cas9-induced mutations to the 5’ exons of
coding regions (7, 11) with the goal of introducing frame-shift mutations early in the coding region of
the gene of interest, and initial sgRNA design rules (24, 25) have focused on thermodynamic and
sequence parameters of the guide RNA, much like the rules that were derived for RNAi reagents (26).
Our results, however, suggest that sgRNA performance is also strongly influenced by the
structure/function of the gene regions they target. This can likely be explained by the fact that CRISPR
can induce both frameshift (3n+/-1, 3n+/-2) and in-frame deletions (3n) of variable size. The
consequences of these indel events can be quite variable depending on the nature of the deletion event.
Frame-shift deletions are likely to destroy protein function due to the deletion of large regions of the
protein. However, small in-frame deletions in non-essential domains are likely to retain functionality (i.e.
deletion of one or a few amino acids does not alter protein function) and thereby significantly reduce
the signal-to-noise ratio in dropout screens. By contrast, deletions of even single amino acids in key functional
domains, such as the catalytic core, likely perturb protein function due to improper spacing of
functional groups required for catalysis (27). Therefore, in contrast to non-essential domains that can
tolerate small in-frame deletions, the deletion of even a single amino acid residue in highly conserved
catalytic regions will likely result in disruption of protein function, explaining why these conserved
regions show a much more robust dropout phenotype compared to non-essential regions. These
findings are consistent with recent findings by Vakoc and coworkers (6) and imply that for genes of
unknown function or with multiple known functions, the phenotypic strength of sgRNAs targeting
different regions could help pinpoint which domains are most essential for cancer cell growth.
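The distinction between frameshift (3n+/-1, 3n+/-2) and in-frame (3n) events can be illustrated with a short sketch. The function names and the example indel lengths below are hypothetical and are not taken from the sequencing data of this study.

```python
def classify_indel(length_bp):
    """Classify an indel by its net length in base pairs.

    Positive values are insertions, negative values are deletions.
    Lengths that are multiples of 3 keep the reading frame.
    """
    if length_bp == 0:
        return "no_indel"
    return "in_frame" if abs(length_bp) % 3 == 0 else "frameshift"


def frameshift_fraction(indel_lengths):
    """Fraction of indel events (length != 0) that shift the reading frame."""
    events = [n for n in indel_lengths if n != 0]
    if not events:
        return 0.0
    shifted = sum(1 for n in events if classify_indel(n) == "frameshift")
    return shifted / len(events)


# Hypothetical indel lengths observed at one sgRNA target site.
observed = [-1, -3, 2, -6, -7, 0, 1]
fraction = frameshift_fraction(observed)
```

Under this classification, an sgRNA whose edits are mostly in-frame would be expected to show a dropout phenotype only if the targeted residues are themselves required for function, which is the situation described above for conserved catalytic regions.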
Collectively, our study demonstrates the power of CRISPR-based dropout screens towards identifying
cancer-selective vulnerabilities, but also highlights important caveats for the interrogation of genes in
amplified regions. Moreover, our results suggest that the frequently-used sgRNA design strategies that
predominantly target the most 5’ coding regions of genes may be sub-optimal. Instead, our data
indicate that targeting the most highly conserved regions of a gene may yield a more robust dropout
phenotype and thus maximize screen performance. Together, the findings described in this study
provide a roadmap towards the systematic elucidation of cancer dependencies using CRISPR-based
screening approaches.
METHODS
Cell culture, RNA-seq and copy number variation
Cell lines were purchased from ATCC, the RIKEN cell bank or NCI/DCTC in June 2008 and were grown in
either DMEM or RPMI supplemented with 10% FBS (Thermo Scientific). Cell lines were authenticated by
SNP genotyping on the Fluidigm BioMark platform with a panel of 48 SNPs (Fluidigm) prior to the
screens. DNA copy number was measured using high-density single nucleotide polymorphism arrays
(Affymetrix SNP 6.0) (28). The RNA-seq data were acquired from the Cancer Cell Line Encyclopedia at the
Broad Institute, where large-insert, non-strand-specific RNA sequencing was performed using a
large-scale, automated variant of the Illumina TruSeq™ protocol. Oligo dT beads were used to select mRNA
from the total RNA sample (200 ng). The selected RNA was then heat fragmented and randomly primed before cDNA
synthesis from the RNA template. The resultant cDNA then went through Illumina library preparation
(end repair, base “A” addition, adapter ligation and enrichment) using Broad-designed indexed adapters
for multiplexing. After enrichment, the samples were quantified by qPCR and pooled in equimolar amounts
before sequencing on the Illumina HiSeq 2000 or HiSeq 2500, with sequence
coverage to 100M paired reads.
Vectors and Cas9 cell line generation
To construct the lentiviral Cas9 vector, a human-optimized 3FlagSPy-Cas9 was cloned into pLenti 6
(Thermo Scientific). Cell lines expressing Cas9 were generated by lentiviral transduction of the
pLenti6-3flagSPyCAS9 vector. Positive populations were selected using Blasticidin S (Thermo Scientific). Cas9
expression was measured by flow cytometry. 2 × 10^6 cells were fixed with 1% PFA (Electron Microscopy
Sciences) and ice-cold methanol (Fisher Scientific), permeabilized with 0.2% Triton-X (Sigma-Aldrich)
and stained using an antibody against Cas9 at a dilution of 1:200 (Cell Signaling).
The shRNA library was constructed by Cellecta Inc. and can be acquired using library ID numbers
27K-BGP2-MS-NOVA, 13K-hTF-GH-NOVA, 13K-hYAP-GH-NOVA and 13K-hEPI2-GH-NOVA. The sgRNA libraries
were designed as previously described (7). A modified tracrRNA scaffold (29) for Cas9 loading was cloned
into the sgRNA vectors before cloning of the guide RNAs. Each library targets ~2700 genes and
comprises 20 shRNAs or sgRNAs per gene (Supplementary Tables S2-3). For the tiling library, all possible
sgRNAs (based on the presence of a PAM motif) against 157 genes were identified (Supplementary Table
S4). Oligonucleotides were synthesized on a 92k array (CustomArray Inc.), amplified by PCR, and cloned
into the BbsI restriction sites of the lentiviral U6 sgRNA expression vector using Golden Gate assembly (30).
For all proliferation assays and next-generation sequencing, individual sgRNAs were cloned into an
inducible U6 shRNA- or sgRNA-expressing vector using the restriction enzyme BbsI or AarI.
CRISPR Guide Selection
RefSeq (downloaded on January 5, 2015) was used as the gene model for guide design. All potential
20-mer guides with a predicted cut site within an exon or within 10 base pairs from the exon-intron
boundary were included as potential guides. Guides were annotated with sequence properties (e.g. GC
percentage, sequence degeneracy, Doench-root), mapping properties (e.g. 20-mer sequence uniqueness
in the human genome, and whether there are known overlapping SNPs or variants observed in any cell lines
in the Novartis-Broad Cancer Cell Line Encyclopedia (CCLE)), and gene and expression properties (e.g.
overlapping protein domains).
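The enumeration of potential guides can be sketched as follows. This is an illustration rather than the pipeline used here: it assumes the canonical S. pyogenes Cas9 NGG PAM with a predicted cut 3 bp upstream of the PAM, and the function names and coordinate conventions are hypothetical. Annotation with GC percentage, genome uniqueness and SNP overlap is not shown.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_guides(seq, exons, flank=10, guide_len=20):
    """List (guide, strand, cut) for NGG-adjacent guides cutting in or near an exon.

    seq   : forward-strand DNA, uppercase.
    exons : list of (start, end) 0-based half-open intervals on seq.
    cut   : forward-strand index of the first base to the right of the predicted cut.
    """
    hits = []
    window = guide_len + 3  # protospacer plus 3-nt PAM
    for i in range(len(seq) - window + 1):
        site = seq[i:i + window]
        # Forward strand: 20-mer followed by NGG; cut 3 nt upstream of the PAM.
        if site[-2:] == "GG":
            hits.append((site[:guide_len], "+", i + guide_len - 3))
        # Reverse strand: CCN followed by the 20-mer (read as reverse complement).
        if site[:2] == "CC":
            hits.append((reverse_complement(site[3:]), "-", i + 6))

    def near_exon(cut):
        # Keep cuts inside an exon or within `flank` bp of its boundaries.
        return any(start - flank <= cut <= end + flank for start, end in exons)

    return [hit for hit in hits if near_exon(hit[2])]


# Toy sequences, invented for illustration only.
forward_example = "A" * 20 + "TGG" + "A" * 10
reverse_example = "CCA" + "T" * 20 + "A" * 5
```

Calling `candidate_guides(forward_example, [(0, 33)])` yields a single forward-strand guide, while moving the exon far enough from the cut site removes it from the candidate list.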
Rather than choosing guides based on transcript or gene, genetic features were first grouped. In
particular, transcript isoforms which shared at least 50% of potential guides were combined into a single
meta-transcript, for which guides were chosen to target all isoforms in that meta-transcript.
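A minimal sketch of this grouping step is given below. The text does not specify how the 50% threshold is computed, so the sketch assumes that the shared fraction is measured against the isoform with fewer potential guides and that grouping is transitive; both choices, and all names, are assumptions made for illustration.

```python
def group_isoforms(guides_by_transcript, min_shared=0.5):
    """Merge isoforms sharing at least `min_shared` of their potential guides.

    guides_by_transcript: dict mapping transcript ID to a set of guide IDs.
    Returns a sorted list of meta-transcripts, each a sorted list of IDs.
    """
    names = sorted(guides_by_transcript)
    parent = {t: t for t in names}

    def find(t):
        # Union-find root lookup with path compression.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ga, gb = guides_by_transcript[a], guides_by_transcript[b]
            smaller = min(len(ga), len(gb))
            if smaller and len(ga & gb) / smaller >= min_shared:
                parent[find(a)] = find(b)

    groups = {}
    for t in names:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in groups.values())


def guides_hitting_all(guides_by_transcript, group):
    """Guides that target every isoform in a meta-transcript."""
    return set.intersection(*(guides_by_transcript[t] for t in group))


# Hypothetical isoforms and guide IDs.
isoforms = {
    "tx1": {"g1", "g2", "g3", "g4"},
    "tx2": {"g3", "g4", "g5"},
    "tx3": {"g9"},
}
meta_transcripts = group_isoforms(isoforms)
```

Here `tx1` and `tx2` are merged into one meta-transcript because they share two of the three guides of the smaller isoform, and `guides_hitting_all` then restricts selection to the guides common to both.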
Pooled screening
For all screens, cells were infected with lentiviral shRNA or sgRNA pools at a representation of 1000
cells per shRNA/sgRNA at an MOI of 0.5. Cells were selected for four days in the presence of puromycin, and a
reference sample was collected 72 hours after selection to ensure adequate selection/representation.
Cells were propagated for a total of 14 days, with an average shRNA/sgRNA representation of 1000
maintained at each passage. 100 million cells were harvested for DNA extraction with the Qiagen QIAamp
Blood Maxi kit. shRNAs and sgRNAs were PCR amplified from 100 µg of genomic DNA, and PCR fragments
of 260-280 bp were purified using Agencourt AMPure XP beads (Beckman). The resulting fragments were
sequenced on a HiSeq 2500 (Illumina) with a single-end 50 bp run. Sequencing reads were aligned to the
shRNA or sgRNA library and the enrichment or loss of individual barcodes or sgRNAs was quantified.
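The cell numbers implied by these parameters can be estimated with a short calculation. The sketch assumes that viral integrations per cell follow a Poisson distribution and that the stated representation refers to infected cells; neither assumption is stated in the protocol, and the function names are illustrative.

```python
import math


def infected_fraction(moi):
    """Poisson probability that a cell receives at least one virus."""
    return 1.0 - math.exp(-moi)


def single_integrant_fraction(moi):
    """Among infected cells, the fraction carrying exactly one virus."""
    return moi * math.exp(-moi) / infected_fraction(moi)


def cells_to_infect(n_reagents, coverage, moi):
    """Cells needed at infection so that infected cells reach the coverage."""
    return math.ceil(n_reagents * coverage / infected_fraction(moi))


# ~2700 genes x 20 reagents per gene, 1000 cells per reagent, MOI 0.5.
n_reagents = 2700 * 20
needed = cells_to_infect(n_reagents, coverage=1000, moi=0.5)
share_infected = infected_fraction(0.5)
share_single = single_integrant_fraction(0.5)
```

Under the Poisson assumption, about 39% of cells are infected at an MOI of 0.5 and roughly 77% of those carry a single reagent, so on the order of 1.4 × 10^8 cells would be exposed to virus to retain 1000 infected cells per reagent.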
Data processing
For each sample, the total number of read counts was normalized to 50 × 10^6, with 50 additional
pseudo-counts added to each shRNA to minimize false positives in the low-abundance tail of the shRNA library
distribution, where counts are unreliable. For all samples, day 14 log2 ratios were calculated for each
sgRNA/shRNA relative to plasmid counts, and the deviation of each shRNA or sgRNA from the
mean was expressed as a Z score. The average Z score values integrate the
information from multiple shRNAs or sgRNAs targeting a single gene, thus reflecting the consistency of the effect of
these multiple reagents and minimizing the impact of possible off-target effects.
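The processing steps above can be sketched as follows. The sketch assumes that the Z score is computed against the mean and standard deviation of all reagents within a sample, which the text does not state explicitly; the function names and toy counts are illustrative.

```python
import math
from statistics import mean, pstdev


def normalize(counts, total=50e6, pseudo=50):
    """Scale read counts to a fixed total, then add pseudo-counts."""
    scale = total / sum(counts.values())
    return {reagent: n * scale + pseudo for reagent, n in counts.items()}


def log2_ratios(day14, plasmid):
    """log2(day 14 / plasmid) for every reagent present in the plasmid pool."""
    d, p = normalize(day14), normalize(plasmid)
    return {reagent: math.log2(d[reagent] / p[reagent]) for reagent in plasmid}


def z_scores(values):
    """Z score of each reagent relative to all reagents in the sample."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return {reagent: (v - mu) / sd for reagent, v in values.items()}


def gene_scores(z, reagent_to_gene):
    """Average the Z scores of all reagents targeting the same gene."""
    per_gene = {}
    for reagent, score in z.items():
        per_gene.setdefault(reagent_to_gene[reagent], []).append(score)
    return {gene: mean(scores) for gene, scores in per_gene.items()}


# Toy counts, invented for illustration only.
plasmid = {"a1": 100, "a2": 100, "b1": 100, "b2": 100}
day14 = {"a1": 100, "a2": 100, "b1": 10, "b2": 10}
genes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = gene_scores(z_scores(log2_ratios(day14, plasmid)), genes)
```

In the toy data, both reagents against gene B are depleted tenfold at day 14, so gene B receives a negative average Z score and gene A a positive one.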
Statistical analysis
All statistics were computed using Python with SciPy and pandas. P-values were calculated with the
Mann-Whitney-Wilcoxon rank-sum test using SciPy’s scipy.stats.mannwhitneyu function. Correlation
coefficients between Z-scores for DLD1 and RKO were calculated by Pearson correlation, while the
correlation between Z-scores and vertebrate sequence conservation was calculated using Spearman
correlation.
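The difference between the two correlation measures is illustrated by the dependency-free sketch below, in which Spearman correlation is computed as the Pearson correlation of average ranks. It is meant only to make the definitions concrete; the analyses in this study used SciPy, and the example values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relationship (invented values).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

For a monotonic but non-linear relationship, the rank-based coefficient reaches 1 while the Pearson coefficient does not, which is one reason a rank-based measure may be preferred when relating Z-scores to a quantity such as sequence conservation.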
Sequencing analysis
Cells were stably transfected with individual sgRNAs against CTNNB1. After selection, cells were cultured
in the presence or absence of doxycycline and collected for DNA extraction after 4 days in culture.
Target regions were amplified with the locus-specific primer pairs shown in Table 1. Amplicons were
pooled in equimolar amounts, and libraries were generated on the Illumina NeoPrep using the TruSeq
Nano protocol and sequenced on the MiSeq (Illumina). The insertion/deletion frequency was calculated
as previously described (8, 31).
Western Blot analysis
Protein extracts, separated by SDS-PAGE and transferred onto PVDF membranes, were probed with
antibodies against CDK9 (clone C12F7, Cell Signaling), actin (clone AC-74, Sigma), pH2AX (Ser139,
clone JBW301, Millipore) and tubulin (clone DM1A, Sigma). Proteins of interest were detected with
HRP-conjugated sheep anti-mouse and sheep anti-rabbit IgG antibodies (1:2500, Bio-Rad) and visualized
with the Pierce ECL Western blotting substrate (Thermo Scientific, Rockford, IL), according to the
provided protocol.
Proliferation assays
Cells stably expressing dox-inducible shRNAs or sgRNAs against Beta-catenin, NRAS and PLK1 (Supplementary
Table S3) were used for ATP-based measurements of cellular proliferation by plating 1,300 cells per well,
in three biological replicates, in 96-well plates. After 6 days, 100 μl of CellTiter-Glo reagent
(Promega) was added to each well and mixed for 30 minutes, after which the luminescence was measured
on the SpectraMax M5 Luminometer (Molecular Devices). P values were determined by one-tailed
Student’s t-test.
Proliferation was also measured using live-cell time-lapse imaging. Cells were harvested by trypsinization,
counted on a Countess automated cell counter (Invitrogen, Carlsbad, CA) and plated at 130 cells per well
in 96-well tissue culture plates in 3 replicates. Photomicrographs were taken every 6 hours using an Incucyte
live-cell imager (Essen Bioscience), and confluence of the cultures was measured using Incucyte software
(Essen Bioscience) over 160 hours in culture. P values were determined by one-tailed Student’s t-test.
Cell cycle and Annexin/fixable viability dye assays
Cells were analyzed for phosphatidylserine exposure by annexin-V PerCP-eFluor 710/Fixable Viability
Dye eFluor 780 (eBioscience) double staining according to the provided protocol. Cells stably transfected
with sgRNAs against MMP7, MMP20, ANGPTL5, VEGFA and OR4F5 (Supplementary Table S3) were
analyzed after 6 days in culture. A minimum of 10,000 cells were collected with a FACSCanto (BD
Pharmingen) and analyzed with FlowJo (Tree Star).
For cell cycle analysis, cells were plated in 6-well plates and analyzed 6 days after stable transfection of
the sgRNAs mentioned above. Cells were harvested, fixed with 70% ethanol and stained with a solution
containing 1% Triton X-100 (Sigma) and 1 µg/ml DAPI (Invitrogen) in PBS for 30 minutes. A minimum of
10,000 cells were collected with an LSRFortessa (BD Pharmingen) and analyzed with FlowJo (Tree Star). The
Watson (Pragmatic) model was used for the cell cycle and apoptotic peak modeling.
References
1. Cowley GS, Weir BA, Vazquez F, Tamayo P, Scott JA, Rusin S, et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Scientific data. 2014;1:140035.
2. Hoffman GR, Rahal R, Buxton F, Xiang K, McAllister G, Frias E, et al. Functional epigenetics approach identifies BRM/SMARCA2 as a critical synthetic lethal target in BRG1-deficient cancers. Proceedings of the National Academy of Sciences of the United States of America. 2014;111:3128-33.
3. Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, Quattrochi B, et al. A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nature methods. 2012;9:363-6.
4. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014;157:1262-78.
5. Jinek M, East A, Cheng A, Lin S, Ma E, Doudna J. RNA-programmed genome editing in human cells. eLife. 2013;2:e00471.
6. Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, Vakoc CR. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nature biotechnology. 2015;33:661-7.
7. Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80-4.
8. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350:1096-101.
9. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell. 2015;163:1515-26.
10. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, et al. The Pfam protein families database. Nucleic acids research. 2008;36:D281-8.
11. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339:819-23.
12. Schmidt EV. The role of c-myc in cellular growth control. Oncogene. 1999;18:2988-96.
13. Raab M, Kappel S, Kramer A, Sanhaji M, Matthess Y, Kurunci-Csacsko E, et al. Toxicity modelling of Plk1-targeted therapies in genetically engineered mice and cultured primary mammalian cells. Nature communications. 2011;2:395.
14. Segditsas S, Tomlinson I. Colorectal cancer and genetic alterations in the Wnt pathway. Oncogene. 2006;25:7531-7.
15. Nunez F, Bravo S, Cruzat F, Montecino M, De Ferrari GV. Wnt/beta-catenin signaling enhances cyclooxygenase-2 (COX2) transcriptional activity in gastric cancer cells. PloS one. 2011;6:e18562.
16. Arva NC, Talbott KE, Okoro DR, Brekman A, Qiu WG, Bargonetti J. Disruption of the p53-Mdm2 complex by Nutlin-3 reveals different cancer cell phenotypes. Ethnicity & disease. 2008;18:S2-1-8.
17. Hulsken J, Birchmeier W, Behrens J. E-cadherin and APC compete for the interaction with beta-catenin and the cytoskeleton. The Journal of cell biology. 1994;127:2061-9.
18. Vangamudi B, Paul TA, Shah PK, Kost-Alimova M, Nottebaum L, Shi X, et al. The SMARCA2/4 ATPase Domain Surpasses the Bromodomain as a Drug Target in SWI/SNF-Mutant Cancers: Insights from cDNA Rescue and PFI-3 Inhibitor Studies. Cancer research. 2015;75:3865-78.
19. Zender L, Xue W, Zuber J, Semighini CP, Krasnitz A, Ma B, et al. An oncogenomics-based in vivo RNAi screen identifies tumor suppressors in liver cancer. Cell. 2008;135:852-64.
20. Tsai SQ, Zheng Z, Nguyen NT, Liebers M, Topkar VV, Thapar V, et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature biotechnology. 2015;33:187-97.
21. Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell. 2014;159:647-61.
22. Fellmann C, Lowe SW. Stable RNA interference rules for silencing. Nature cell biology. 2014;16:10-8.
23. Gentric G, Desdouets C. Polyploidization in liver tissue. The American journal of pathology. 2014;184:322-31.
24. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature biotechnology. 2014;32:1262-7.
25. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339:823-6.
26. Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, et al. Design of a genome-wide siRNA library using an artificial neural network. Nature biotechnology. 2005;23:995-1001.
27. Arpino JA, Reddington SC, Halliwell LM, Rizkallah PJ, Jones DD. Random single amino acid deletion sampling unveils structural tolerance and the benefits of helical registry shift on GFP folding and structure. Structure. 2014;22:889-98.
28. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603-7.
29. Chen B, Gilbert LA, Cimini BA, Schnitzbauer J, Zhang W, Li GW, et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell. 2013;155:1479-91.
30. Weber E, Engler C, Gruetzner R, Werner S, Marillonnet S. A modular cloning system for standardized assembly of multigene constructs. PloS one. 2011;6:e16765.
31. Dow LE, Fisher J, O'Rourke KP, Muley A, Kastenhuber ER, Livshits G, et al. Inducible in vivo genome editing with CRISPR-Cas9. Nature biotechnology. 2015;33:390-4.
Figure legends
Figure 1. CRISPR-based dropout screens identify more essential genes compared to shRNA screens
A, Schematic representation of shRNA and CRISPR-based screens. B,C, Comparison of the dropout
phenotype of 2722 human genes in DLD1 and SF268, highlighting pan-lethal genes. The dropout
phenotypes are calculated as Z scores for 20 reagents per gene, and the shRNA Z-score is displayed on the
Y-axis and the CRISPR Z-score on the X-axis. Quadrant III contains genes that score as lethal by both CRISPR
and shRNA. A few known essential genes are marked by colored dots and indicated in the legend.
Quadrant II (red) contains genes that scored lethal only by CRISPR and quadrant IV (grey) contains genes
that scored lethal only by shRNA. D,E, The lethality Z score of each gene in the CRISPR screen (X-axis) is
graphed against its expression level as determined by RNAseq (Y-axis) for the indicated cell lines, DLD1
and SF268.
Figure 2. CRISPR screens can robustly identify cancer-selective dependencies.
A, Distribution of lethals amongst the five cancer lines tested. A Z score cutoff of -1 was used to
delineate genes that are cell essential. Pan-lethal genes will be essential in 4 or 5 cell lines, whereas
selective lethals only affect growth of a few cell lines. B, Venn diagram showing the distribution of
CRISPR dropout screen hits (Z score < -1) in HT1080, DLD1 and RKO. C, Heat map displaying the CRISPR
lethality Z-score of a few selected genes across the five cancer cell line models. The 4 genes on top are
pan lethal (dropout in every cell line), the 3 genes at the bottom are not expressed and hence show no
activity across all models. Several known genetic dependencies display a selective lethal pattern that
correlates with genetic alterations in the cell lines (genotype indicated on top).
Figure 3. sgRNAs targeting the 5’ coding region of Beta-catenin are ineffective due to an alternative
translation initiation site in exon 3.
A, Scatter plot of CRISPR Z score (X-axis) versus shRNA Z score (Y-axis) of 2722 genes in the colorectal
line DLD1. Beta-catenin is one of the few genes that shows a stronger dropout in the shRNA screen
compared to the CRISPR screen. B, (Top) Graphical depiction of the CTNNB1 genomic locus. The CTNNB1
gene extends over 40 kb and contains 16 exons (vertical hatches). (Bottom) Exon structure of the human
CTNNB1 cDNA, showing the position of the start codon (ATG) and of an alternative start site (*ATG; supported by
EMBL transcript ID ENST00000405570 and UniProtKB entries A0A024R2Q3 and P35222). Magnification of exons
2-5 shows the location of each sgRNA relative to the CTNNB1 cDNA. The color intensity represents the
Z-score for each sgRNA. None of the five sgRNAs targeting the 5’-most coding region (upstream of the
alternative start codon) score by CRISPR, whereas 13 of 15 sgRNAs downstream of that site score by
CRISPR.
Figure 4. sgRNAs targeting conserved Pfam domains display the most robust dropout phenotypes
A, Schematic representation of the CRISPR tiling screen. B, Hexagonal correlation plot showing that
individual tiling reagents have very high correlation with each other in the two cell lines DLD1 and RKO.
C, Violin plot showing that sgRNAs targeting conserved Pfam domains lead, on average, to significantly
stronger lethality effects. This analysis was restricted to genes required for cell growth
(averaged Z scores below -0.4). D-G, Individual examples of how sgRNA performance is influenced by
position within the gene. Each dot represents the score of an independent sgRNA, with grey dots indicating
sgRNAs targeting regions outside of a Pfam domain and orange dots indicating sgRNAs targeting Pfam
domains. The black line indicates the average dropout score for the neighboring 10 sgRNAs. The protein
domain structure of the respective genes is displayed on top with key Pfam domains labeled.
Figure 5. Highly amplified genes score as false positives in CRISPR-based screens.
A,B, Schematic of genomically amplified regions in chromosomes 7 and 11 of the gastric line MKN45 and
the brain line SF268, respectively (top). Line graph depicting the relative copy number of genes within
the amplicons in chromosome 7 (MKN45 cells) and chromosome 11 (SF268 cells) (bottom). Orange dots
represent non-expressed genes as assessed by RNA-seq (RNAseq <1).
C-F, Scatter plots display the average CRISPR (C,D) or shRNA (E,F) lethality Z-score for each gene (Y-axis)
relative to its copy number (X-axis) for MKN45 (left) and SF268 cells (right). Orange data points indicate
non-expressed genes (expression <1) as assessed by RNA-seq. The solid line represents a regression line
for non-expressed genes. The dotted red box (lower right) marks the genes showing the largest dropout
phenotype. The green dot represents the major dropout in the highlighted amplicon.
Figure 6. Excessive genome cuts lead to off-target lethality
A, Line graphs depicting the average lethality of all sgRNAs (red line) or all shRNAs (blue line) relative to
the copy number of the genes targeted. Lines represent the average Z score of all sgRNAs and shRNAs
in MKN45 and SF268. The dashed line indicates the Z-score observed for sgRNAs targeting genes with a
copy number of 2. The X-axis (copy number) was split into 4 bins. B, Violin plot depicting the lethality
Z-scores of sgRNAs relative to the number of perfect genome matches. To avoid any influence of
functional on-target effects, only non-expressed genes were included in this analysis.
Table 1
CTNNB1      Forward                   Reverse
sgRNA #1    TGCTCAAGGGGAGTAGTTTCA     CCACTGGTGAACTGGGAAGA
sgRNA #2    TGCTGAAACATGCAGTTGTAAA    CTCACGATGATGGGAAAGGT
sgRNA #3    GGACTTCACCTGACAGATCCA     TGGTCAGATGACGAAGAGCA
sgRNA #4    TTCCCAGTTCACCAGTGGAT      GCACGCTCCCTATGAGAATC
[Figure 1. CRISPR-based dropout screens identify more essential genes compared to shRNA screens. Panels A-E. A: screen schematic (shRNA or CRISPR viral pool, 20 reagents per gene targeting 2700 genes; Cas9 line; infected cells; 14 days; deep sequencing; comparison of plasmid counts and day 14 counts). B, C: shRNA avg. Z score versus CRISPR avg. Z score for DLD-1 and SF268, quadrants I-IV, highlighted genes RPS18, PRPF19, PSMA4, RPL7, CKAP5, RUVBL1 and RAN. D, E: RNAseq log2(0.5+FPKM) versus CRISPR avg. Z score for DLD-1 and SF268.]
[Figure 2. CRISPR screens can robustly identify cancer-selective dependencies. Panels A-C. A: number of lethal genes versus number of cell lines. B: Venn diagram for HT1080, DLD1 and RKO. C: heat map of avg. Z score across DLD1, HT1080, MKN-45, RKO and SF-268 for pan-lethal genes (PCNA, RPL7, PLK1, PSMA4), genetic dependencies (MDM2, NRAS, PIK3CA, KRAS, CTNNB1, TCF7L2) and non-expressed genes (TSSK2, CYP11B2, PROP1), with cell line genotypes indicated on top.]
Figure 3. sgRNAs targeting 5' coding region of Beta catenin are ineffective due to alternative translation initiation site in exon 3
[Figure omitted; panels A and B. Scatter plot: shRNA avg. Z score versus CRISPR avg. Z score, with Beta-catenin labeled. Gene diagram: Avg Z score of individual sgRNAs plotted along the genomic sequence and cDNA (exons 1 to 16), with the annotated ATG and the alternative start site (*ATG) marked.]
Figure 4. sgRNA targeting conserved PFAM domains display most robust dropout phenotypes
[Figure omitted; panels A to G. Schematic: a CRISPR tiling library of all possible sgRNAs from start to end of a gene, the Pfam domain annotation, and the resulting position-specific viability phenotypes. Tiling plots of Z-score (DLD1 Z-score, NCIH1299 Z-score or CRISPR avg Z-score) versus amino acid position for PLK1 (Kinase Domain, Polo-box Domains), CTNNB1 (Armadillo repeats), AURKB (Kinase Domain) and SMARCA2 (ATP-binding, Helicase_C, Bromo domain). Comparison of DLD1 Z-score for sgRNAs in a PFAM domain versus not in a PFAM domain, p<10^-180. Scatter plot: RKO Z-score versus DLD1 Z-score, r2=0.509.]
Figure 5. Highly amplified genes score as false-positives in CRISPR based screens
[Figure omitted; panels A to F, for MKN45 and SF268. Copy number (log scale, 2 to 64) plotted along chromosomal position: Chr 7 q31 (113,000,000 to 121,000,000) and Chr 11 q22 (100,000,000 to 103,000,000). Scatter plots of CRISPR Avg. Z-score versus copy number and of shRNA Avg. Z-score versus copy number, with a legend entry for RNAseq count <1. Labeled genes: CAV1, MET, ST7, ING3, PTPRZ1, ANGPTL5, MMP7, MMP20, BIRC3, TMEM123, BIRC2, C11orf70, YAP1.]
Figure 6. Excessive genome cutting leads to off target lethality
[Figure omitted; panels A and B. DLD1 Z-score versus number of perfect match sites in genome (1 to 5). Avg. Z-score versus copy number (0.5 to 32) for shRNA and CRISPR.]
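The x axis in Figure 6, the number of perfect match sites in the genome, can be illustrated with a minimal counting sketch. It is an assumption-laden toy: it scans a plain string on both strands for exact matches, ignores the PAM requirement, and uses hypothetical names (`count_perfect_matches`, `reverse_complement`); a real analysis would use a genome aligner, and this section does not say which tool the authors used.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_perfect_matches(guide, genome):
    """Count exact occurrences of a guide sequence on both strands.

    Overlapping occurrences are counted. A palindromic guide equals its
    own reverse complement, so the set below prevents double counting.
    PAM adjacency is deliberately not checked in this sketch.
    """
    guide = guide.upper()
    genome = genome.upper()
    total = 0
    for target in {guide, reverse_complement(guide)}:
        start = genome.find(target)
        while start != -1:
            total += 1
            start = genome.find(target, start + 1)
    return total


# Toy sequence: two forward-strand sites and one reverse-strand site.
toy_genome = "ACCTGAAACAGGTTTACCTG"
sites = count_perfect_matches("ACCTG", toy_genome)
```

Guides with a count above one would fall into the higher bins of the Figure 6 x axis, where the figure title attributes the stronger dropout to excessive genome cutting rather than to loss of the targeted gene.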
Diana M. Munoz, Pamela J. Cassiani, Li Li, et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. Published OnlineFirst June 3, 2016.
Updated version: Access the most recent version of this article at doi: 10.1158/2159-8290.CD-16-0178
Supplementary Material: Access the most recent supplemental material at http://cancerdiscovery.aacrjournals.org/content/suppl/2016/06/03/2159-8290.CD-16-0178.DC1