CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions

Diana M. Munoz1, Pamela J. Cassiani1, Li Li1, Eric Billy2, Joshua M. Korn1, Michael D. Jones1, Javad Golji1,

David A. Ruddy1, Kristine Yu1, Gregory McAllister 3, Antoine DeWeck2, Dorothee Abramowski2, Jessica

Wan1, Matthew D. Shirley1, Sarah Y. Neshat1, Daniel Rakiec1, Rosalie de Beaumont1, Odile Weber2,

Audrey Kauffmann2, E Robert McDonald III1, Nicholas Keen1, Francesco Hofmann2, William R. Sellers1,

Tobias Schmelzle2, Frank Stegmeier1,4,5 and Michael R. Schlabach1,4,5*.

1. Oncology Disease Area, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, USA.

2. Oncology Disease Area, Novartis Institutes for Biomedical Research, Basel, Switzerland.

3. Developmental and Molecular Pathways, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, USA.

4. Present address: KSQ Therapeutics, Cambridge, Massachusetts, USA.

5. These authors contributed equally to this work.

Running title: CRISPR screens for the discovery of cancer vulnerabilities

Keywords: CRISPR, shRNA, drop out screens, cancer vulnerabilities and genetic amplifications

Manuscript type: Research article

*Corresponding author: Michael R Schlabach

Mailing address: KSQ Therapeutics, 790 Memorial Drive, Suite 200, Cambridge, MA 02139

Email: [email protected]

Cell: 617-444-9192

Disclose any potential conflict of interest: D.M.M, P.J.C, L.L, E.B, J.K, M.D.J, J.G, D.R, K.Y, G.M, A.D, D.A, J.W, M.D.S, S.Y.N, D.R, R.B, O.W, A.K, E.R.M, N.K, F.H, W.R.S, T.S, F.S and M.R.S are employees of Novartis.

Word count: 5,617

Total number of figures: 6

Author Manuscript Published OnlineFirst on June 3, 2016; DOI: 10.1158/2159-8290.CD-16-0178. Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.

© 2016 American Association for Cancer Research. Downloaded from http://cancerdiscovery.aacrjournals.org/ on April 6, 2018.

Abstract

CRISPR/Cas9 has emerged as a powerful new tool to systematically probe gene function. We compared

the performance of CRISPR to RNAi-based loss-of-function screens for the identification of cancer

dependencies across multiple cancer cell lines. CRISPR dropout screens consistently identified more

lethal genes than RNAi, implying that the identification of many cellular dependencies may require full

gene inactivation. However, in two aneuploid cancer models we found that all genes within highly

amplified regions, including non-expressed genes, scored as lethal by CRISPR, revealing an unanticipated

class of false-positive hits. Additionally, using a CRISPR tiling screen, we found that sgRNAs targeting

essential domains generate the strongest lethality phenotypes and thus provide a strategy to rapidly

define the protein domains required for cancer dependence. Collectively, these findings demonstrate

the utility of CRISPR screens in the identification of cancer-essential genes, but also reveal the need to

carefully control for false-positive results in chromosomally unstable cancer lines.

Significance

We show in this study that CRISPR-based screens have a significantly lower false-negative rate

compared to RNAi-based screens, but have specific liabilities particularly in the interrogation of regions

of genome amplification. Therefore, this study provides critical insights for applying CRISPR-based

screens towards the systematic identification of new cancer targets.

Introduction

Genetic loss-of-function screens are an important approach enabling the systematic identification of

cancer selective vulnerabilities. In mammalian cells, RNAi has been the predominant method of

screening and has enabled systematic and genome-wide loss-of-function screens leading to the

identification of new cancer targets (1, 2). RNAi-based screens, however, are often confounded by

off-target effects (3). In addition, RNAi induces mRNA downregulation, typically resulting in reduced gene

function (hypomorphic allele), rather than a complete loss of function (null allele). Thus, in addition to

the problem of false-positives, RNAi screens also likely suffer a certain rate of false-negative detection of

genes where near-complete loss of function would be required in order to elicit a phenotypic effect.

The frequency of false-negatives in RNAi-based screens has not yet been systematically assessed.

More recently, the prokaryotic type II CRISPR–Cas9 (clustered regularly interspaced short palindromic

repeats–CRISPR-associated 9) has emerged as an RNA-based genome-editing tool that can be used to

enact loss-of-function screens (4). In contrast to RNAi, the CRISPR system induces sequence-directed

DNA double-strand breaks resulting in frameshift insertion/deletion (indel) mutations that can induce

complete loss of protein function (5). Initial studies demonstrated the use of CRISPR for genetic screens

in mammalian cells (6, 7) and showed a high level of phenotypic agreement between reagents targeting

the same gene and a high rate of hit confirmation. Most of these screens were positive selection screens

which are technically less challenging than ‘drop out’ screens. Subsequent screens (8, 9) used improved

libraries and screening methods to discover essential genes in mammalian cells, but a systematic

comparison of CRISPR to RNAi in drop-out screens has not yet been described with sufficient reagent

depth to enable robust conclusions.

In this study, we systematically compared the performance of these two screening technologies for the

identification of new cancer vulnerabilities. We show that at equivalent screening depth, CRISPR

dropout screens identified a significantly higher number of essential genes and thus provide a more

comprehensive assessment of genetic dependencies compared to RNAi-based screens. Additionally, we

show that sgRNAs that target DNA sequences within conserved Pfam domains (10) tend to result in a

more robust drop out phenotype. These findings have important implications for future library designs

and suggest that the CRISPR tiling approach outlined herein might be used to elucidate which protein

domains are critical in driving biological effects. Surprisingly, we found that all genes within highly

amplified regions, even when not expressed, scored as strongly lethal, revealing an unanticipated class of

false-positive hits. Collectively, these findings demonstrate that while CRISPR has certain specific

limitations, CRISPR-mediated genetic screens can be used for robust and systematic discovery of cancer

cell vulnerabilities.

Results

CRISPR-based dropout screens provide a more complete assessment of cancer dependencies compared to shRNA screens

In order to robustly compare RNAi- and CRISPR-based screening technologies, we constructed shRNA

and sgRNA libraries targeting 2722 human genes with an average coverage of 20 reagents per gene. The

sgRNAs were designed against the N-terminus of protein coding genes, as described previously, as

frameshift mutations in the N-terminus are thought to be more likely to result in ‘complete’ protein

inactivation (7, 11). Deep shRNA libraries were designed as previously described (2). These libraries were

used in proliferation-based screens in a set of 5 cancer cell lines, including the colorectal cancer cell lines

DLD1 and RKO, fibrosarcoma cell line HT-1080, astrocytoma cell line SF-268, and gastric cancer cell line

MKN-45 (Fig. 1A). Following lentiviral transduction of the sgRNA libraries, the impact of gene depletion

on cellular viability or proliferation was assessed by quantifying the abundance of sgRNAs at day 0

(plasmid count) relative to day 14 using next-generation sequencing (see details in Methods). sgRNAs

targeting essential genes are expected to inhibit the growth of transduced cells and thus their relative

abundance will be reduced when comparing the relative counts on day 14 vs day 0 (Fig 1A, right graph).

We found that across all cell lines screened about 2-3% of genes scored as lethal genes by both RNAi and

CRISPR approaches (Fig. 1B and 1C, Supplementary Fig. S1, quadrant III). The gene list in quadrant III

included many known essential gene classes such as ribosomal, RNA processing, and DNA replication




factors (Supplementary Fig. S2). Notably, there were very few genes that scored as essential by RNAi but

not by CRISPR (Figure 1B, 1C, S1, IV). In contrast, in all of the five cancer models screened a large

number of genes scored as essential by CRISPR but not RNAi (Fig. 1B, 1C, S1, II). In fact, the number of

lethal genes identified by CRISPR was two-fold (HT1080 cells) to five-fold (DLD1 cells) higher compared to

RNAi. This suggested that CRISPR either had a significantly lower false-negative rate than shRNA, or a

much higher false-positive rate. One way to identify likely off-target hits is to examine the lethality

scores of non-expressed genes, as these are expected to not be required for cell viability. In DLD1, RKO,

and HT1080 cells, all of the genes required for cell viability (average Z-scores below -1) had an RNASeq

RPKM expression value greater than 2, indicating that the CRISPR screen at this depth showed virtually

no false-positive effects from sgRNAs directed against non-expressed genes (Fig 1D, Fig S3). However, in

SF268 (Fig 1E) and MKN-45 cells (Fig S3), a number of genes scored as essential in the CRISPR screen

despite not being expressed. As described in detail below, we found that these false positive hits were

associated with genes in regions of high copy number amplification. These false positive hits were only

observed in SF268 and MKN-45, as these are chromosomally aneuploid lines, whereas DLD1, RKO and

HT1080 are diploid cancer lines. After removing these false-positive hits due to amplified genes in SF268

and MKN-45 cells, we conducted further analysis on the essential genes identified by CRISPR and RNAi.

The category of genes that only scored by CRISPR but not RNAi included many genes known to be

essential for proliferation of most cells such as CDK9, PLK1, and MYC (12, 13) as well as many known

essential gene classes (RNA processing and DNA replication). We hypothesized that RNAi-based screens

failed to recover these genes either due to the absence of effective shRNAs in our library (despite a

coverage of 20 reagents per gene) and/or insufficient protein knockdown to reveal a full loss-of-function

phenotype. In support of this hypothesis, we found that only 1 of 6 CDK9 shRNAs tested achieved potent

CDK9 protein depletion that resulted in growth inhibition (Supplementary Fig. S4A-B), thus explaining

why CDK9 failed to score as an essential gene in the shRNA-based screen. Collectively, these findings




indicate that CRISPR-based screening enables a more complete assessment of genes required for cancer

cell growth.
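
The scoring scheme described above (sgRNA abundance at day 14 relative to day 0, summarized as an average Z-score per gene and compared between CRISPR and RNAi) can be sketched as follows. This is a minimal illustration and not the analysis pipeline used in this study: the read-depth normalization, the pseudocount, the library-wide Z-scoring, and all function names are assumptions, and the label for quadrant I (lethal by neither method) is inferred from the quadrant assignments given in the text.

```python
import math
from statistics import mean, pstdev


def log2_fold_changes(day0, day14, pseudocount=1.0):
    """Per-sgRNA log2(day 14 / day 0), each sample scaled to its total read count.

    day0, day14: dicts mapping sgRNA id -> raw read count (assumed input format).
    """
    total0, total14 = sum(day0.values()), sum(day14.values())
    return {
        guide: math.log2(
            ((day14.get(guide, 0) + pseudocount) / total14)
            / ((day0[guide] + pseudocount) / total0)
        )
        for guide in day0
    }


def gene_z_scores(lfc, guide_to_gene):
    """Z-score every sgRNA against the whole library, then average per gene."""
    mu = mean(lfc.values())
    sigma = pstdev(lfc.values()) or 1.0  # guard against a zero spread
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append((value - mu) / sigma)
    return {gene: mean(zs) for gene, zs in per_gene.items()}


def quadrant(crispr_z, rnai_z, cutoff=-1.0):
    """Assign a gene to a quadrant of the CRISPR-vs-RNAi comparison.

    III = lethal by both, II = CRISPR only, IV = RNAi only (as in the text);
    I = lethal by neither (inferred, not stated in the text).
    """
    crispr_hit, rnai_hit = crispr_z < cutoff, rnai_z < cutoff
    if crispr_hit and rnai_hit:
        return "III"
    if crispr_hit:
        return "II"
    if rnai_hit:
        return "IV"
    return "I"
```

With roughly 20 reagents per gene, averaging the per-sgRNA Z-scores dampens the influence of any single inactive or off-target reagent, which is one reason a gene-level average is a convenient summary here.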

We next sought to explore whether CRISPR screens can be used to identify cancer-selective

dependencies. The five cell lines screened were derived from various tumor lineages with distinct

genetic alterations. Using a cutoff of -1 average Z-score to delineate genes that are cell essential, we

found that a total of 409 genes scored as essential in at least one of the five cancer cell lines. Of these,

34% of essential genes were required for the proliferation of all cell lines, suggesting that these genes

serve core cellular functions that are likely required for the proliferation of most cells (Fig 2A, B); we

henceforth refer to this category of broadly essential genes as pan-lethals. A smaller number of genes

was selectively required for the growth of only one (25%) or two (12%) of the five screened cancer cell

models (Fig 2A, B); we refer to this class of genes as selective lethals. Of note, the class of selective

lethals included several known oncogene dependencies. For instance, Beta-catenin targeting sgRNAs

selectively impaired the proliferation of DLD1, an APC mutated cell line with constitutive activation of

WNT pathway signaling (14) (Fig 2C). The selective dependence on Beta-catenin was validated using

inducible sgRNAs and additional cell proliferation assays (Supplementary Fig. S5A-D). This cell line was

also dependent on TCF7L2, a gene that encodes the transcription factor Tcf4. Tcf4 interacts with Beta-

catenin to drive expression of WNT pathway target genes. Surprisingly, the gastric cancer cell line

MKN45 also exhibited dependence on Beta-catenin and Tcf4 despite lacking genetic alterations in WNT

pathway components (Fig 2C). Of note, a prior study reported high levels of nuclear Beta-catenin in

MKN45 cells (15), suggesting that WNT pathway activation in this cell line might be driven by

non-genetic mechanisms. When we looked at the pattern of KRAS dependence, we found that KRAS-targeting

sgRNAs selectively impaired the proliferation of DLD1 cells (Fig 2C); this dependency is likely explained by the

fact that DLD1 harbors the oncogenic KRAS G13D mutation. Unexpectedly, however, MKN45 cells were

also dependent on KRAS despite lacking genetic alterations in this oncogene. These cells harbor MET




amplification; thus, one possibility is that MET signaling requires KRAS. KRAS also had a low Z score in

MKN-45 cells by shRNA screening (average Z-score of -0.8), suggesting the phenotype is biologically relevant

and not a false positive. Both the Beta-catenin and KRAS sensitivities in MKN-45 would not have been

predicted by genetic alterations alone, but were discovered independently by both RNAi and CRISPR,

highlighting the importance of functional profiling. The selective pattern of NRAS and PIK3CA

dependency correlated well with the presence of oncogenic alterations in NRAS and PIK3CA,

respectively (Fig 2C, Supplementary Fig. S5). In addition, MDM2 sgRNAs selectively impaired the growth

of p53 wild-type but not p53 mutant cell lines (Fig 2C). Importantly, this genetic pattern of MDM2

dependence recapitulates the selective inhibition of p53 wild-type cell lines by pharmacological MDM2

inhibitors, such as Nutlin-3 (16). Together, these findings indicate that in addition to the identification of

broadly essential genes, CRISPR-based dropout screens can also robustly identify cancer-selective

vulnerabilities.
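
The grouping of essential genes into pan-lethals and selective lethals can be expressed as a short sketch. It assumes per-cell-line average Z-scores are already available as nested dictionaries; the function name, the input layout, and the "intermediate" label for genes required in three or four lines are illustrative assumptions, while the -1 cutoff and the one-or-two-line definition of selective lethals follow the text.

```python
def classify_lethals(z_by_line, cutoff=-1.0, selective_max=2):
    """Label each essential gene by the number of cell lines in which it scores.

    z_by_line: {cell_line: {gene: average Z-score}} (assumed input format).
    Returns {gene: (label, [cell lines in which the gene scored])}.
    Genes missing from a cell line are treated as not essential in that line.
    """
    n_lines = len(z_by_line)
    genes = set().union(*(scores.keys() for scores in z_by_line.values()))
    result = {}
    for gene in sorted(genes):
        hits = [
            line
            for line, scores in z_by_line.items()
            if scores.get(gene, 0.0) < cutoff
        ]
        if not hits:
            continue  # not essential in any screened line
        if len(hits) == n_lines:
            label = "pan-lethal"
        elif len(hits) <= selective_max:
            label = "selective lethal"
        else:
            label = "intermediate"  # hypothetical label, not used in the text
        result[gene] = (label, hits)
    return result
```

Returning the list of scoring cell lines alongside the label makes it straightforward to line up each selective lethal with the genetic alterations of the lines in which it scores.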

sgRNAs targeting conserved Pfam domains show the most robust dropout phenotypes

While the CRISPR-based screen identified CTNNB1 as a cancer-selective dependency in WNT-pathway

deregulated cancer models, CTNNB1 was one of the few genes that scored more robustly in the shRNA

screen compared to CRISPR (Fig 3A). Examination of the individual sgRNA scores indicated that the

efficacy of sgRNAs correlated with the targeting position in the CTNNB1 transcript (Fig. 3B); the first five

sgRNAs targeting the most 5’ regions of the CTNNB1 transcript showed very little to no dropout

phenotype. By contrast, 87% of the next 15 sgRNAs targeting the downstream exons 3, 4 and 5 exhibited

a stronger lethality score. Investigation of the genomic locus of CTNNB1 revealed that it harbors an

alternative translation initiation site in exon 3 (transcript ID ENST00000405570), suggesting that

the isoform expressed from this alternative start site is likely sufficient for cancer cell growth, explaining

the lack of a dropout phenotype of the 5’ targeting sgRNAs.




We next set out to more systematically investigate the importance of sgRNA positioning on gene

inactivation. To this end, we designed a sgRNA library that contains all possible sgRNAs targeting a set of

139 genes with an average of 364 sgRNAs/gene, which we refer to as CRISPR tiling array (Fig. 4A). The

genes included in the CRISPR tiling array were chosen to represent diverse biological functions, but were

enriched for genes that elicited growth phenotypes in the primary screen. In order to minimize potential

biases, we included all unique sgRNA sequences targeting these gene coding sequences only requiring

the presence of a PAM sequence and lack of perfect homology to other coding sequences. This CRISPR

tiling library was screened in the three cancer cell lines DLD1, RKO and NCI-H1299. Interestingly, as

observed for CTNNB1, for 63% (46 of 73) of the growth essential genes in DLD1, the sgRNA performance

was strongly influenced by the sgRNA position within the coding region. Similarly, 68% (52 of 76) of the

essential genes in RKO cells showed coding-region dependent activity. The growth effects of individual

sgRNAs were significantly correlated across cell lines (r² = 0.504), suggesting that these effects represent

consistent differences in the biological effectiveness of individual reagents (Fig 4B). We next performed

a systematic correlation analysis of sgRNA features to identify what features correlated most strongly

with sgRNA potency. Interestingly, the top predictive feature for sgRNA performance was its localization

within a conserved Pfam protein domain (Fig 4C, Supplementary Fig. S6). In addition, the extent of

sequence conservation across vertebrate species was also a good predictor (p<<0.001) of sgRNA

efficacy, regardless of whether or not the region was annotated as a conserved Pfam domain. While

prior studies (7) have suggested that there is value in targeting the most 5’ coding regions of proteins

with CRISPR reagents, this was not the case in our screens. In this dataset, the average phenotype of

sgRNAs targeting essential genes was slightly weaker for sgRNAs targeting the extreme N-terminal

coding region, and much weaker in the 3’-most coding regions (last 20%) of proteins. This effect,

however, appeared to be largely driven by the location of Pfam domains within coding regions, as the




N-terminal and C-terminal effects were no longer observed when only sgRNAs targeting annotated

domains were included in the analysis (Supplementary Fig. S7A-B).
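
The tiling design rule described above (every sgRNA in a coding sequence that has a PAM and lacks perfect homology to other coding sequences) can be sketched as follows. The sketch assumes the NGG PAM of S. pyogenes Cas9 and 20-nt protospacers, neither of which is stated explicitly in this section, and the function names are illustrative rather than taken from the library design code.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sgrna_sites(seq, guide_len=20):
    """List (strand, start, protospacer) for every guide followed by an NGG PAM.

    For the "-" strand, start is the offset within the reverse complement.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The last valid start leaves room for the guide plus a 3-nt PAM.
        for i in range(len(s) - guide_len - 2):
            pam = s[i + guide_len : i + guide_len + 3]
            if pam[1:] == "GG":
                sites.append((strand, i, s[i : i + guide_len]))
    return sites


def drop_guides_matching(sites, other_sequences):
    """Remove guides with a perfect match (either strand) in other sequences."""
    others = [s.upper() for s in other_sequences]
    kept = []
    for strand, start, guide in sites:
        rc = reverse_complement(guide)
        if not any(guide in s or rc in s for s in others):
            kept.append((strand, start, guide))
    return kept
```

Passing whole-genome sequence instead of only other coding sequences as `other_sequences` would apply a stricter filter, at the cost of discarding more candidate guides.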

Based on the observation that sgRNAs targeting conserved protein domains scored more robustly in

CRISPR-based screens, we hypothesized that CRISPR-tiling data might be used to perform functional

annotation of critical protein domains. Indeed, sgRNAs targeting the highly conserved armadillo repeats

in Beta-catenin demonstrated more significant average lethality scores compared to sgRNAs targeting

less conserved regions (Fig 4D). The failure of some of the Beta-catenin sgRNAs to score despite

targeting the highly conserved armadillo repeats correlated with ineffective genome editing by these

reagents (Supplementary Fig. S8A-C). Similar to the case of Beta-catenin, sgRNAs targeting the highly

conserved kinase domain or polo-box regions in PLK1 showed the most robust dropout phenotypes (Fig

4E), and sgRNAs targeting the kinase domain of Aurora kinase B (AURKB) had significantly stronger

effects than those targeting the extreme N or C termini (Fig 4F). These findings are consistent with the

notion that the armadillo repeats in Beta-catenin, the kinase activity in Aurora kinase B, and both the

kinase activity and polo-boxes in PLK1 are essential in mediating their cellular functions (13, 17). A

recent study revealed that the helicase activity but not the bromo-domain of BRM is required to sustain

the growth of BRG1-deficient cancers (18). Strikingly, the CRISPR tiling data for BRM indicated a

more robust dropout phenotype for sgRNAs targeting the ATPase/helicase domain compared to those

targeting the bromo-domain region (Fig. 4G). Together, these findings suggest that CRISPR tiling screens might be

useful, in some cases, to decipher which protein domains are required for cancer cell growth.

Amplified genomic loci score as false-positives in CRISPR-based dropout screens

As described earlier, we found that in the two aneuploid cancer cell lines SF268 and MKN-45 several

non-expressed genes scored as essential, suggesting that these genes represent false positive hits.

Strikingly, all of these false-positive hits mapped to regions of high-level copy number amplification (Fig




5A, B). We therefore wanted to explore more deeply the effect of amplified genomic regions on the

performance of CRISPR-based screens. MKN-45 is a gastric cancer cell line that harbors amplification of a

region of chromosome 7 (7q31) that contains the likely driver oncogene MET (Fig. 5A). While MET

scored as essential in MKN-45 cells, all other genes included in the library and located within the 7q31

amplicon also scored as lethal. Moreover, sgRNAs targeting ING3 and CAV1 exhibited the strongest

viability effect of the genes located within 7q31 amplicon, with MET ranking third (Fig. 5C). Similar

results were observed for SF268 cells, where all genes in the chromosome 11 amplicon (11q22) scored

as lethal (Fig. 5B, D). YAP has been hypothesized to be the most likely driver of this amplicon (19). While

YAP did score as the most strongly essential gene in this cell line, three genes within this amplicon, MMP7,

MMP20 and ANGPTL5, showed strong viability effects despite lacking detectable expression based on

RNAseq. Of note, all of the non-expressed genes (RNAseq<1) that scored as lethal in SF268 and MKN-45

cells were located in amplified genomic regions (Fig 5A, B). By contrast, shRNA-based screens identified

both MET and YAP as the sole driver oncogenes of their respective amplicons (Fig. 5E, F). We

hypothesized that sgRNAs targeting amplified loci may lead to excessive double-strand breaks and

activation of the DNA damage repair pathways. To test this, we examined the effects of sgRNAs

targeting the non-expressed and amplified genes MMP7, MMP20, and ANGPTL5 in SF268 cells that

harbor the 11q22 amplicon. All 3 sgRNAs led to a strong increase in phosphorylated histone H2AX, a

marker of DNA damage (Supplementary Fig. S9a) and resulted in a G2/M arrest and induction of

apoptosis (Supplementary Fig. S9b-d and S10a-b). As predicted, the induction of DNA damage response,

G2/M arrest, and apoptosis by these sgRNAs was specific to cells with 11q22 amplicon and not observed

in the diploid DLD-1 cells (Supplementary Fig. S9a, S10a-b).
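
The observation that non-expressed genes scoring as lethal map to amplified regions suggests a simple check that could be run on any screen. The sketch below is an illustration under stated assumptions, not the procedure used in this study: the input layout and function name are hypothetical, the Z-score cutoff (-1) and expression cutoff (RNAseq < 1) are taken from the text, and the copy-number cutoff of 6 is borrowed from the threshold mentioned in the Discussion.

```python
def flag_nonexpressed_lethals(genes, z_cutoff=-1.0, rpkm_cutoff=1.0, cn_cutoff=6):
    """Split non-expressed genes that score as lethal by copy-number status.

    genes: iterable of dicts with keys "gene", "z", "rpkm", "copy_number"
    (assumed input format). Returns (amplified, not_amplified) gene-name lists.
    """
    amplified, not_amplified = [], []
    for g in genes:
        # A lethal score without detectable expression is a likely false positive.
        if g["z"] < z_cutoff and g["rpkm"] < rpkm_cutoff:
            target = amplified if g["copy_number"] > cn_cutoff else not_amplified
            target.append(g["gene"])
    return amplified, not_amplified
```

If the second list is not empty, the corresponding genes are candidates for other sources of off-target lethality, such as sgRNAs with multiple perfect genomic matches.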

We next explored the effect of relative gene copy number on CRISPR lethality score more globally across

the CRISPR screening dataset. When comparing the copy number status to the average lethality score

for all 2,700 genes screened in these two cell lines, we found a positive correlation between the degree




of amplification and CRISPR lethality score (Fig. 6A). By contrast, there was no correlation between copy

number and lethality score in the shRNA screen dataset (Fig. 6A). Even sgRNAs directed against loci with

only a modestly increased copy number, harboring as few as one or two additional copies, showed a

greater average growth inhibitory effect than non-amplified loci. In addition, we observed that sgRNAs

targeting regions harboring hemizygous or complete loss of the genomic region displayed on average a

less pronounced growth effect than diploid regions. This effect was highly significant even when the

analysis was restricted to only non-expressed genes (p = 10⁻³⁵), thus excluding the possibility that this

effect is due to disruption of gene function. Together, these findings further support the notion that

CRISPR reagents that induce multiple genomic cuts result in an anti-proliferative effect independent of the

target gene function and that this is directly proportional to the number of induced cuts. We next

wanted to investigate if this phenomenon may also help to explain some of the off-target lethality of

individual sgRNA reagents. To minimize any confounding effects due to on-target gene inactivation, we

restricted this analysis to 14,000 sgRNAs targeting non-lethal genes (as judged by lack of dropout of the

average sgRNA targeting that gene). Strikingly, the best predictor of off-target lethality was the number

of genomic sites with perfect complementarity to the target site (Figure 6B and Supplementary Table

S1). To investigate the mechanism of growth inhibition of these ‘multi-cutter’ sgRNAs, we examined the

cellular response to VEGFA site 2 sgRNA that was previously shown by GUIDE-seq to have more than 140

verified off-target sites (20), as well as another multiple cutter sgRNA observed in our screens (originally

designed against the olfactory receptor OR4F5). Similar to sgRNAs targeting amplified loci, we found

that both ‘multi-cutter’ sgRNAs led to strongly increased phosphorylation of H2AX, G2/M cell cycle

arrest, and apoptosis (Supplementary Fig. S9 and S10). It is important to note that the sgRNAs included

in the CRISPR tiling array were only filtered against perfect matches to other coding regions rather than

the entire genome. sgRNAs targeting multiple genomic loci frequently contained low complexity repeat

sequences (e.g. AGGAGGAGG…), but the off-target effects due to multiple genome matches were still




observed after the exclusion of low complexity repeats from the dataset. Collectively, these findings

indicate that loss-of-function proliferation based studies using sgRNA mediated gene-inactivation will be

subject to a set of off-target activities related to the number of times a guide strand sequence is found

in the genome. This will likely lead to false-positives for genes found in areas of genomic amplification,

and false-positives due to multiple homologous sites for a given sgRNA. Hence, sgRNAs should be

selected to have no additional matches to genomic regions (even if not expressed) in order to minimize

off-target lethality due to excessive genome damage. Moreover, these findings indicate that RNAi or

CRISPRi-based screens (21) will be better suited to elucidate the driver oncogenes of amplified regions.
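
The recommendation to select sgRNAs with no additional genomic matches can be checked with a direct count of perfect matches on both strands. The following is a minimal sketch that assumes the genome fits in memory as a dictionary of sequences; it ignores PAM context and mismatched sites, and a practical implementation would more likely rely on an indexed short-read aligner. The function names are illustrative.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_perfect_matches(guide, genome):
    """Count perfect matches of a guide on both strands, overlaps included.

    genome: {sequence name: DNA string} (assumed input format).
    """
    guide = guide.upper()
    # A set avoids double counting when the guide is its own reverse complement.
    targets = {guide, reverse_complement(guide)}
    total = 0
    for seq in genome.values():
        seq = seq.upper()
        for target in targets:
            start = seq.find(target)
            while start != -1:
                total += 1
                start = seq.find(target, start + 1)
    return total


def is_multi_cutter(guide, genome):
    """True when the guide matches more than one genomic site perfectly."""
    return count_perfect_matches(guide, genome) > 1
```

Low-complexity repeats such as the AGGAGGAGG example above produce overlapping matches, which is why the search advances by one position after each hit rather than by the guide length.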

Discussion

Genetic loss of function studies hold great promise for the discovery of novel therapeutic targets for

cancer and other diseases. In this study, we compared deep-coverage shRNA and CRISPR-based

screens for the systematic identification of cancer vulnerabilities. Our data indicate that CRISPR dropout

screens identified two to five times as many essential genes as RNAi-based loss-of-function

screens, even when the shRNA screens are powered at 20 shRNAs per gene. We speculate that this

high rate of false-negatives in RNAi-based screens can likely be attributed to the incomplete nature of

gene inactivation by RNAi, which in most cases generates hypomorphic rather than complete null alleles

(22). By contrast, CRISPR cutting of genomic DNA and error-prone NHEJ will result in indel mutations.

Indels are typically more catastrophic mutations to protein function and frequently lead to complete

gene disruption, especially in the case of frameshift mutations. These findings indicate that CRISPR

based dropout screens can provide a more comprehensive assessment of genetic dependencies

compared to RNAi-based screens.

As for any emerging technology, the specificity and optimal design parameters for CRISPR experiments

are not yet fully understood. Although CRISPR-based screens generally have a low false-positive rate (7,

11), likely owing to the increased targeting specificity of sgRNAs (22), we surprisingly found that CRISPR

can be prone to false-positive hits for genes with high copy number, especially at copy numbers

greater than 6. While it will be important to control for this class of false-positive hits, it is

important to note that these artefactual hits comprise only a minor fraction of all essential genes

discovered by CRISPR screens in aneuploid lines (Supplementary Fig. S11) and can easily be removed

bioinformatically. The copy number effect on CRISPR lethality was likely missed in several earlier studies

because those screens were performed on cell lines with stable diploid genomes. A recent study has

observed a similar copy number effect on a single cell line harboring a high-level amplicon (8). We

reasoned that the lethality of sgRNAs targeting amplified genomic regions might be explained by two

hypotheses. First, sgRNAs targeting genes within tandem amplicons could lead to the excision of the

entire locus including removal of the essential oncogenic driver genes. Alternatively, an excessive

number of DNA double strand breaks may lead to sustained activation of the DNA damage response

pathway and growth inhibition. In agreement with Wang et al., we found that sgRNAs targeting

amplified loci led to an increase of the DNA damage marker phospho-H2AX, a G2/M cell cycle arrest,

and induction of apoptosis (8). These findings suggest that activation of the DNA damage response

pathway due to excessive DNA double strand breaks is, at least in part, responsible for the observed

growth inhibitory effects, but it is quite possible that the deletion of oncogenic drivers in tandem

amplicons contributes as well. The copy number (CN) effect of CRISPR appears to be independent of p53 status, as it

was observed with similar magnitude in both p53 mutated (SF-268) and wild-type (MKN-45) cell lines.

While the CN effect is most severe at highly amplified loci, we found that even subtle copy number

changes can have statistically significant effects on CRISPR dropout scores. It is important to note,

however, that one may be able to correct for these subtler copy number effects with bioinformatics

approaches.
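
One simple form such a bioinformatic correction might take is sketched below. It is a hypothetical approach and not a method described or validated in this study: it assumes that non-expressed genes report only the cutting-related effect, estimates that effect as the median score of non-expressed genes at each copy number, and subtracts it from every gene at the same copy number. The input layout, function names, and expression cutoff are assumptions.

```python
from statistics import median


def copy_number_offsets(genes, rpkm_cutoff=1.0):
    """Median Z-score of non-expressed genes at each copy number.

    genes: iterable of dicts with keys "gene", "z", "rpkm", "copy_number"
    (assumed input format).
    """
    by_cn = {}
    for g in genes:
        if g["rpkm"] < rpkm_cutoff:
            by_cn.setdefault(g["copy_number"], []).append(g["z"])
    return {cn: median(zs) for cn, zs in by_cn.items()}


def correct_scores(genes, rpkm_cutoff=1.0):
    """Subtract the copy-number-specific offset from every gene's Z-score.

    Copy-number classes without any non-expressed gene are left uncorrected.
    """
    genes = list(genes)
    offsets = copy_number_offsets(genes, rpkm_cutoff)
    return {
        g["gene"]: g["z"] - offsets.get(g["copy_number"], 0.0) for g in genes
    }
```

A driver oncogene within an amplicon would be expected to retain a negative score after this subtraction, whereas passenger genes in the same amplicon would move toward zero; whether that holds in practice would need to be tested on real screening data.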




These findings have several important implications for the design of CRISPR screening strategies. First,

CRISPR-based screens will likely not be a good approach to determine drivers of amplified genomic

regions. The putative amplified driver oncogene MET, for instance, did not have the strongest viability

effect in MKN-45 cells compared to other genes in the amplicon. By contrast, MET was identified as the

driver oncogene of this amplicon using shRNA-based screen, indicating that RNAi or CRISPRi-based

screens (21) are better suited to elucidate the driver oncogenes of amplified regions. Second, these

findings have important implications for future sgRNA library designs. In order to avoid lethality due to

excessive genome cuts, it will be critical to design CRISPR reagents that have no or at least minimal other

matches across the entire human genome. Our findings also imply that for pooled CRISPR screening

studies, it will be important to keep the multiplicity of infection during lentiviral transduction low, as

transduction with multiple sgRNAs targeting different genomic regions could lead to excessive genome

cuts and hence result in lethality. Interestingly, even diploid genes (CN=2) exhibited a slight but

statistically significant growth reduction compared to haploid (CN=1) gene loci. Due to this apparent

selection pressure against any genome cutting, it is possible that Cas9 expressing cells could be selected

against strongly during the course of screening. Third, the ability to easily multiplex sgRNA in single

experiments affords the ability of complex genome engineering and synthetic lethal screening. However,

based on our findings, one needs to carefully control for the effects of additional genomic cuts in dual or

even higher multiplexed screens, as synthetic lethality could be the result of passing a threshold of

‘excessive’ genomic cuts rather than genetic interactions. Fourth, the observed copy number effects

suggest that the use of a scrambled non-targeting CRISPR that does not cut the human genome is likely

not the best control for CRISPR lethality experiments, and should be replaced with reagents cutting non-

expressed or known non-essential genomic regions, such as the AAVS1 locus. Lastly, it will be important

to examine whether the copy number effects observed in our study also pertain to normal tissues. In

that case, caution should be exercised in both the experimental and therapeutic application of CRISPR to

the editing of polyploid tissues, such as liver (23), as it may result in extensive genome damage that leads

to impaired growth or apoptosis.

Most sgRNA libraries have been designed to direct CRISPR-Cas9-induced mutations to the 5’ exons of

coding regions (7, 11) with the goal of introducing frame-shift mutations early in the coding region of

the gene of interest, and initial sgRNA design rules (24, 25) have focused on thermodynamic and

sequence parameters of the guide RNA, much like the rules that were derived for RNAi reagents (26).

Our results, however, suggest that sgRNA performance is also strongly influenced by the

structure/function of the gene regions they target. This can likely be explained by the fact that CRISPR

can induce both frameshift (3n+/-1, 3n+/-2) and in-frame deletions (3n) of variable size. The

consequences of these indel events can be quite variable depending on the nature of the deletion event.

Frame-shift deletions are likely to destroy protein function due to the deletion of large regions of the

protein. However, small in-frame deletions in non-essential domains are likely to retain functionality (i.e.

deletion of one or a few amino acids does not alter protein function) and thereby significantly reduce

the signal-to-noise in dropout screens. By contrast, deletions of even single amino acids in key functional

domains, such as the catalytic core, are likely to perturb protein function due to improper spacing of

functional groups required for catalysis (27). Therefore, in contrast to non-essential domains that can

tolerate small in-frame deletions, the deletion of even a single amino acid residue in highly conserved

catalytic regions will likely result in disruption of protein function, explaining why these conserved

regions show a much more robust dropout phenotype compared to non-essential regions. These

results are consistent with recent findings by Vakoc and coworkers (6) and imply that for genes of

unknown function or with multiple known functions, the phenotypic strength of sgRNAs targeting

different regions could help pinpoint which domains are most essential for cancer cell growth.
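
The frameshift versus in-frame distinction drawn above reduces to the indel length modulo 3. The following is a minimal sketch of that rule only; the function names and the example indel sizes are hypothetical and are not taken from the study's analysis.

```python
def classify_indel(length_bp: int) -> str:
    """Classify an indel by its length in base pairs.

    Lengths that are multiples of 3 (3n) keep the reading frame;
    all other lengths (3n+/-1, 3n+/-2) shift it.
    """
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def in_frame_fraction(indel_lengths):
    """Fraction of observed indels that preserve the reading frame."""
    labels = [classify_indel(n) for n in indel_lengths]
    return labels.count("in-frame") / len(labels)


# Hypothetical indel sizes (bp) observed at a single cut site.
example_lengths = [1, 2, 3, 4, 6, 7, 9, 10]
print([classify_indel(n) for n in example_lengths])
print(in_frame_fraction(example_lengths))
```

Whether an in-frame event is tolerated still depends on where it falls in the protein, which this length-only rule does not capture.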




Collectively, our study demonstrates the power of CRISPR-based dropout screens towards identifying

cancer-selective vulnerabilities, but also highlights important caveats for the interrogation of genes in

amplified regions. Moreover, our results suggest that the frequently used sgRNA design strategies that

predominantly target the most 5’ coding regions of genes may be sub-optimal. Instead, our data

indicate that targeting the most highly conserved regions of a gene may yield a more robust dropout

phenotype and thus maximize screen performance. Together, the findings described in this study

provide a roadmap towards the systematic elucidation of cancer dependencies using CRISPR-based

screening approaches.

METHODS

Cell culture, RNA-seq and copy number variation

Cell lines were purchased from ATCC, the RIKEN cell bank or NCI/DCTC in June 2008 and were grown in

either DMEM or RPMI supplemented with 10% FBS (Thermo Scientific). Cell lines were authenticated by

SNP genotyping on the Fluidigm BioMark platform, with a panel of 48 SNPs (Fluidigm), prior to the

screens. DNA copy number was measured using high-density single nucleotide polymorphism arrays

(Affymetrix SNP 6.0) (28). The RNA-seq data were acquired from the Cancer Cell Line Encyclopedia from the

Broad Institute, where large-insert, non-strand-specific RNA sequencing was performed using a large-scale,

automated variant of the Illumina TruSeq™ protocol. Oligo dT beads are used to select mRNA from the

total RNA sample (200 ng). The selected RNA is then heat fragmented and randomly primed before cDNA

synthesis from the RNA template. The resultant cDNA then goes through Illumina library preparation

(end repair, base “A” addition, adapter ligation and enrichment) using Broad-designed indexed adapters

for multiplexing. After enrichment, the samples are qPCR quantified and pooled in equimolar amounts before

proceeding to Illumina sequencing on the Illumina HiSeq 2000 or HiSeq 2500, with sequence

coverage to 100M paired reads.




Vectors and Cas9 cell line generation

To construct the lentiviral Cas9 vector, a human-optimized 3FlagSPy-Cas9 was cloned into pLenti6

(Thermo Scientific). Cell lines expressing Cas9 were generated by lentiviral transduction of the

pLenti6-3flagSPyCAS9 vector. Positive populations were selected using Blasticidin S (Thermo Scientific). Cas9

expression was measured by flow cytometry: 2 × 10^6 cells were fixed with 1% PFA (Electron Microscopy

Sciences) and ice-cold methanol (Fisher Scientific), permeabilized with 0.2% Triton-X (Sigma-Aldrich)

and stained using an antibody against Cas9 at a dilution of 1/200 (Cell Signaling).

The shRNA library was constructed by Cellecta Inc. and can be acquired using library ID numbers

27K-BGP2-MS-NOVA, 13K-hTF-GH-NOVA, 13K-hYAP-GH-NOVA and 13K-hEPI2-GH-NOVA. The sgRNA libraries

were designed as previously described (7). A modified tracrRNA scaffold (29) for Cas9 loading was cloned

into the sgRNA vectors before cloning of the guide RNAs. Each library targets ~2700 genes and

comprises 20 shRNAs or sgRNAs per gene (Supplementary tables S2-3). For the tiling library, all possible

sgRNAs (based on the presence of a PAM motif) against 157 genes were identified (Supplementary table

S4). Oligonucleotides were synthesized on a 92k array (CustomArray Inc.), amplified by PCR, and cloned

into the BbsI restriction sites of the lentiviral U6 sgRNA expression vector using Golden Gate assembly (30).

For all proliferation assays and next-generation sequencing, individual sgRNAs were cloned into an

inducible U6 shRNA- or sgRNA-expressing vector using the restriction enzyme BbsI or AarI.

CRISPR Guide Selection

RefSeq (downloaded on January 5, 2015) was used as the gene model for guide design. All potential

20-mer guides with a predicted cut site within an exon or within 10 base pairs from the exon-intron

boundary were included as potential guides. Guides were annotated with sequence properties (e.g. GC

percentage, sequence degeneracy, Doench-root), mapping properties (e.g. 20-mer sequence uniqueness

in the human genome, whether there are known overlapping SNPs or variants observed in any cell lines

in the Novartis-Broad Cancer Cell Line Encyclopedia (CCLE)), gene and expressed properties (e.g.

overlapping protein domains).
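
The enumeration step described above can be sketched as a scan for PAM sites on both strands, keeping guides whose predicted cut site lies within the exon or within 10 bp of its boundaries. This is an illustrative sketch, not the pipeline used in the study: it assumes an NGG PAM and a cut 3 bp upstream of the PAM (neither is stated in the text), and the function names and coordinate conventions are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq, exon_start, exon_end, flank=10, guide_len=20):
    """List (guide, strand, cut) for candidate guides near one exon.

    seq is the forward strand (0-based); the exon is [exon_start, exon_end).
    cut is the forward-strand index of the base immediately 3' of the cut.
    Assumptions (not from the source): NGG PAM, cut 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    lo, hi = exon_start - flank, exon_end + flank
    guides = []

    # Forward strand: protospacer at [i, i+20), PAM at [i+20, i+23).
    for i in range(n - guide_len - 3 + 1):
        if seq[i + guide_len + 1:i + guide_len + 3] == "GG":
            cut = i + guide_len - 3
            if lo <= cut <= hi:
                guides.append((seq[i:i + guide_len], "+", cut))

    # Reverse strand: scan the reverse complement, then map the cut back.
    rc = reverse_complement(seq)
    for j in range(n - guide_len - 3 + 1):
        if rc[j + guide_len + 1:j + guide_len + 3] == "GG":
            cut = n - (j + guide_len - 3)
            if lo <= cut <= hi:
                guides.append((rc[j:j + guide_len], "-", cut))
    return guides


# Toy sequence with a single forward-strand PAM (TGG) after a 20-mer.
toy = "A" * 20 + "TGG" + "A" * 10
print(find_guides(toy, exon_start=0, exon_end=len(toy)))
```

The annotation steps (GC percentage, genome uniqueness, overlapping variants, protein domains) would then be applied to each returned guide.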

Rather than choosing guides based on transcript or gene, genetic features were first grouped. In

particular, transcript isoforms that shared at least 50% of potential guides were combined into a single

meta-transcript, for which guides were chosen to target all isoforms in that meta-transcript.
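
One way to read the grouping rule is as a clustering of isoforms by guide overlap. The sketch below is only an interpretation: the text does not say how the 50% is measured or how chains of overlapping isoforms are handled, so it assumes the shared fraction is taken relative to the smaller guide set and that groups are merged transitively. All names are hypothetical.

```python
def group_isoforms(guides_by_isoform, min_shared=0.5):
    """Group isoforms into meta-transcripts by shared candidate guides.

    guides_by_isoform maps isoform id -> set of guide sequences.
    Assumption: overlap is measured against the smaller of the two sets,
    and merging is transitive (single linkage via union-find).
    """
    ids = sorted(guides_by_isoform)
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            ga, gb = guides_by_isoform[a], guides_by_isoform[b]
            smaller = min(len(ga), len(gb))
            if smaller and len(ga & gb) / smaller >= min_shared:
                parent[find(b)] = find(a)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def shared_guides(guides_by_isoform, group):
    """Guides that target every isoform in one meta-transcript."""
    return set.intersection(*(guides_by_isoform[i] for i in group))


# Hypothetical isoforms and guide identifiers.
isoforms = {
    "tx1": {"g1", "g2", "g3", "g4"},
    "tx2": {"g3", "g4", "g5"},
    "tx3": {"g9"},
}
print(group_isoforms(isoforms))
```

Under this reading, guides for a meta-transcript would be drawn preferentially from `shared_guides`, falling back to partially shared guides when the intersection is empty.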

Pooled screening

For all screens, cells were infected with lentiviral shRNA or sgRNA pools at a representation of 1000

cells per shRNA or sgRNA at an MOI of 0.5. Cells were selected for four days in the presence of puromycin; a

reference sample was collected 72 hours after selection to ensure adequate selection/representation.

Cells were propagated for a total of 14 days with an average shRNA/sgRNA representation of 1000

maintained at each passage. 100 million cells were harvested for DNA extraction with the Qiagen QIAamp

Blood Maxi kit. The shRNAs and sgRNAs were PCR amplified from 100 µg of genomic DNA, and PCR fragments

of 260-280 bp were purified using Agencourt AMPure XP beads (Beckman). The resulting fragments were

sequenced on a HiSeq 2500 (Illumina) with a single-end 50 bp run. Sequencing reads were aligned to the

shRNA or sgRNA library and the enrichment or loss of individual barcodes or sgRNAs was quantified.
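
The representation and MOI figures above imply a minimum number of cells at infection. As a rough sketch, and assuming that viral integrations per cell follow a Poisson distribution (an assumption not stated in the text), the required cell number and the share of infected cells carrying more than one construct can be estimated as follows. The function names are hypothetical.

```python
import math


def cells_to_infect(n_reagents, coverage=1000, moi=0.5):
    """Cells needed at infection to reach the target coverage per reagent.

    Assumes Poisson-distributed integrations, so the infected fraction
    is 1 - exp(-moi).
    """
    infected_fraction = 1.0 - math.exp(-moi)
    return math.ceil(n_reagents * coverage / infected_fraction)


def multi_integration_fraction(moi=0.5):
    """Among infected cells, the fraction carrying more than one construct."""
    p0 = math.exp(-moi)        # no integration
    p1 = moi * math.exp(-moi)  # exactly one integration
    return (1.0 - p0 - p1) / (1.0 - p0)


# Library size from the text: ~2700 genes x 20 reagents per gene.
library_size = 2700 * 20
print(cells_to_infect(library_size))
print(round(multi_integration_fraction(0.5), 3))
```

Lowering the MOI reduces the multi-integration fraction at the cost of more cells per screen, which is the trade-off behind keeping the multiplicity of infection low.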

Data processing

For each sample, the total number of read counts was normalized to 50 × 10^6, with 50 additional

pseudo-counts added to each shRNA to minimize false positives in the low-abundance tail of the shRNA library

distribution, where counts are unreliable. For all samples, day 14 log2 ratios for each sgRNA/shRNA were

calculated relative to plasmid counts, and shRNAs or sgRNAs whose abundance differed significantly

from the mean were identified using a Z score. The average Z score values integrate the

information from multiple shRNAs targeting a single gene, thus showing the similarity of the effect of

these multiple sgRNAs/shRNAs and minimizing the impact of possible off-target effects.
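
The processing steps above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes the pseudo-counts are added after depth normalization and that the Z score is computed against the mean and population standard deviation of all reagents in a sample, neither of which is specified in the text. The reagent names and counts are invented.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

TARGET_DEPTH = 50e6  # reads per sample after normalization
PSEUDOCOUNT = 50     # added to every reagent after normalization (assumed order)


def normalize(counts):
    """Scale raw counts to the target depth, then add the pseudo-count."""
    total = sum(counts.values())
    return {r: c * TARGET_DEPTH / total + PSEUDOCOUNT for r, c in counts.items()}


def reagent_z_scores(day14_counts, plasmid_counts):
    """Z score of each reagent's log2(day 14 / plasmid) ratio."""
    d14 = normalize({r: day14_counts.get(r, 0) for r in plasmid_counts})
    ref = normalize(plasmid_counts)
    log_ratio = {r: math.log2(d14[r] / ref[r]) for r in ref}
    mu = mean(log_ratio.values())
    sd = pstdev(log_ratio.values())
    return {r: (v - mu) / sd for r, v in log_ratio.items()}


def gene_z_scores(reagent_z, reagent_to_gene):
    """Average the reagent-level Z scores for each gene."""
    by_gene = defaultdict(list)
    for reagent, z in reagent_z.items():
        by_gene[reagent_to_gene[reagent]].append(z)
    return {gene: mean(zs) for gene, zs in by_gene.items()}


# Invented counts: two reagents per gene, GENE1 depleted at day 14.
plasmid = {"g1_a": 100, "g1_b": 100, "g2_a": 100, "g2_b": 100}
day14 = {"g1_a": 10, "g1_b": 20, "g2_a": 180, "g2_b": 190}
mapping = {"g1_a": "GENE1", "g1_b": "GENE1", "g2_a": "GENE2", "g2_b": "GENE2"}
z = reagent_z_scores(day14, plasmid)
print(gene_z_scores(z, mapping))
```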

Statistical analysis

All statistics were computed using Python/SciPy/pandas. P-values were calculated using the

Mann-Whitney-Wilcoxon rank-sum test using Python’s scipy.stats.mannwhitneyu function. Correlation

coefficients between Z-scores for DLD1 and RKO were calculated by Pearson correlation, while

correlation between Z-scores and vertebrate sequence conservation was calculated using Spearman

correlation.
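
The three tests named above map onto SciPy calls as sketched below. The wrapper names and example values are hypothetical, and the two-sided alternative is an assumption, since the text does not state which alternative was used.

```python
from scipy import stats


def compare_groups(z_group_a, z_group_b):
    """P-value of a Mann-Whitney-Wilcoxon rank-sum test (two-sided assumed)."""
    result = stats.mannwhitneyu(z_group_a, z_group_b, alternative="two-sided")
    return result.pvalue


def cell_line_correlation(z_line_a, z_line_b):
    """Pearson correlation between Z scores from two cell lines."""
    r, _ = stats.pearsonr(z_line_a, z_line_b)
    return r


def conservation_correlation(z_scores, conservation):
    """Spearman correlation between Z scores and sequence conservation."""
    rho, _ = stats.spearmanr(z_scores, conservation)
    return rho


# Invented Z scores for sgRNAs inside and outside a protein domain.
in_domain = [-3.0, -2.5, -2.8, -3.2, -2.9, -2.7, -3.1, -2.6]
outside = [-0.5, 0.1, -0.2, 0.3, -0.1, 0.2, 0.0, -0.3]
print(compare_groups(in_domain, outside))
```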

Sequencing analysis

Cells were stably transfected with individual sgRNAs against CTNNB1; after selection, cells were cultured

in the presence or absence of doxycycline and collected for DNA extraction after 4 days in culture.

Target regions were amplified by the locus-specific primer pairs shown in Table 1. Amplicons were

pooled in equimolar amounts, and libraries were generated on the Illumina NeoPrep using the TruSeq

Nano protocol and sequenced on the MiSeq (Illumina). The insertion/deletion frequency was calculated

as previously described (8, 31).

Western Blot analysis

Protein extracts, separated by SDS-PAGE and transferred onto PVDF membranes, were probed with

antibodies against CDK9 (clone C12F7, Cell Signaling), actin (clone AC-74, Sigma), pH2AX (Ser139,

clone JBW301, Millipore) and Tubulin (clone DM1A, Sigma). Proteins of interest were detected with

HRP-conjugated sheep anti-mouse and sheep anti-rabbit IgG antibodies (1:2500, Bio-Rad) and visualized

with the Pierce ECL Western blotting substrate (Thermo Scientific, Rockford, IL), according to the

provided protocol.




Proliferation assays

Cells stably expressing doxycycline-inducible shRNAs or sgRNAs against β-catenin, NRAS and PLK1 (Supplementary

table S3) were used for ATP-based measurements of cellular proliferation by plating 1,300 cells per well,

in three biological replicates, in 96-well plates. After 6 days, 100 μl of Cell Titer-Glo reagent

(Promega) were added to each well and mixed for 30 minutes, after which the luminescence was measured

on the SpectraMax M5 Luminometer (Molecular Devices). P values were determined by one-tailed

Student’s t-test.

Proliferation was also measured using live cell time-lapse imaging. Cells were harvested by trypsinization,

counted on a Countess automated cell counter (Invitrogen, Carlsbad, CA) and plated at 130 cells per well

in 96-well tissue culture plates in 3 replicates. Photomicrographs were taken every 6 hours using an Incucyte

live cell imager (Essen Bioscience), and confluence of the cultures was measured using Incucyte software

(Essen Bioscience) over 160 hours in culture. P values were determined by one-tailed Student’s t-test.

Cell cycle and Annexin/fixable viability dye assays

Cells were analyzed for phosphatidylserine exposure by annexin-V PerCP-eFluor 710/Fixable viability

dye eFluor 780 (eBioscience) double-staining according to the provided protocol. Cells stably transfected

with sgRNAs toward MMP7, MMP20, ANGPTL5, VEGFA and OR4F5 (Supplementary table S3) were

analyzed after 6 days in culture. A minimum of 10,000 cells were collected with a FACSCanto (BD

Pharmingen) and analyzed with FlowJo (Tree Star).

For cell cycle analysis, cells were plated in 6-well plates and analyzed 6 days after stable transfection of

the sgRNAs mentioned above. Cells were harvested, fixed with 70% ethanol and stained with a solution

containing 1% Triton X-100 (Sigma) and 1 µg/ml DAPI (Invitrogen) in PBS for 30 minutes. A minimum of

10,000 cells were collected with an LSRFortessa (BD Pharmingen) and analyzed with FlowJo (Tree Star). The

Watson (Pragmatic) model was used for the cell cycle and apoptotic peak modeling.

References

1. Cowley GS, Weir BA, Vazquez F, Tamayo P, Scott JA, Rusin S, et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Scientific data. 2014;1:140035.
2. Hoffman GR, Rahal R, Buxton F, Xiang K, McAllister G, Frias E, et al. Functional epigenetics approach identifies BRM/SMARCA2 as a critical synthetic lethal target in BRG1-deficient cancers. Proceedings of the National Academy of Sciences of the United States of America. 2014;111:3128-33.
3. Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, Quattrochi B, et al. A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nature methods. 2012;9:363-6.
4. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014;157:1262-78.
5. Jinek M, East A, Cheng A, Lin S, Ma E, Doudna J. RNA-programmed genome editing in human cells. eLife. 2013;2:e00471.
6. Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, Vakoc CR. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nature biotechnology. 2015;33:661-7.
7. Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80-4.
8. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350:1096-101.
9. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell. 2015;163:1515-26.
10. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, et al. The Pfam protein families database. Nucleic acids research. 2008;36:D281-8.
11. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339:819-23.
12. Schmidt EV. The role of c-myc in cellular growth control. Oncogene. 1999;18:2988-96.
13. Raab M, Kappel S, Kramer A, Sanhaji M, Matthess Y, Kurunci-Csacsko E, et al. Toxicity modelling of Plk1-targeted therapies in genetically engineered mice and cultured primary mammalian cells. Nature communications. 2011;2:395.
14. Segditsas S, Tomlinson I. Colorectal cancer and genetic alterations in the Wnt pathway. Oncogene. 2006;25:7531-7.
15. Nunez F, Bravo S, Cruzat F, Montecino M, De Ferrari GV. Wnt/beta-catenin signaling enhances cyclooxygenase-2 (COX2) transcriptional activity in gastric cancer cells. PloS one. 2011;6:e18562.
16. Arva NC, Talbott KE, Okoro DR, Brekman A, Qiu WG, Bargonetti J. Disruption of the p53-Mdm2 complex by Nutlin-3 reveals different cancer cell phenotypes. Ethnicity & disease. 2008;18:S2-1-8.
17. Hulsken J, Birchmeier W, Behrens J. E-cadherin and APC compete for the interaction with beta-catenin and the cytoskeleton. The Journal of cell biology. 1994;127:2061-9.
18. Vangamudi B, Paul TA, Shah PK, Kost-Alimova M, Nottebaum L, Shi X, et al. The SMARCA2/4 ATPase Domain Surpasses the Bromodomain as a Drug Target in SWI/SNF-Mutant Cancers: Insights from cDNA Rescue and PFI-3 Inhibitor Studies. Cancer research. 2015;75:3865-78.
19. Zender L, Xue W, Zuber J, Semighini CP, Krasnitz A, Ma B, et al. An oncogenomics-based in vivo RNAi screen identifies tumor suppressors in liver cancer. Cell. 2008;135:852-64.
20. Tsai SQ, Zheng Z, Nguyen NT, Liebers M, Topkar VV, Thapar V, et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature biotechnology. 2015;33:187-97.
21. Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell. 2014;159:647-61.
22. Fellmann C, Lowe SW. Stable RNA interference rules for silencing. Nature cell biology. 2014;16:10-8.
23. Gentric G, Desdouets C. Polyploidization in liver tissue. The American journal of pathology. 2014;184:322-31.
24. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature biotechnology. 2014;32:1262-7.
25. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, et al. RNA-guided human genome engineering via Cas9. Science. 2013;339:823-6.
26. Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, et al. Design of a genome-wide siRNA library using an artificial neural network. Nature biotechnology. 2005;23:995-1001.
27. Arpino JA, Reddington SC, Halliwell LM, Rizkallah PJ, Jones DD. Random single amino acid deletion sampling unveils structural tolerance and the benefits of helical registry shift on GFP folding and structure. Structure. 2014;22:889-98.
28. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603-7.
29. Chen B, Gilbert LA, Cimini BA, Schnitzbauer J, Zhang W, Li GW, et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell. 2013;155:1479-91.
30. Weber E, Engler C, Gruetzner R, Werner S, Marillonnet S. A modular cloning system for standardized assembly of multigene constructs. PloS one. 2011;6:e16765.
31. Dow LE, Fisher J, O'Rourke KP, Muley A, Kastenhuber ER, Livshits G, et al. Inducible in vivo genome editing with CRISPR-Cas9. Nature biotechnology. 2015;33:390-4.




Figure legends

Figure 1. CRISPR-based dropout screens identify more essential genes compared to shRNA screens

A, Schematic representation of shRNA and CRISPR-based screens. B,C, Comparison of the dropout

phenotype of 2722 human genes in DLD1 and SF268, highlighting pan-lethal genes. The dropout

phenotypes are calculated as Z scores for 20 reagents per gene; the shRNA Z-score is displayed on the

Y-axis and the CRISPR Z-score on the X-axis. Quadrant III contains genes that score as lethal by both CRISPR

and shRNA. A few known essential genes are marked by colored dots and indicated in the legend.

Quadrant II (red) contains genes that scored lethal only by CRISPR and quadrant IV (grey) contains genes

that scored lethal only by shRNA. D,E, The lethality Z score of each gene in the CRISPR screen (X-axis) is

graphed against its expression level as determined by RNAseq (Y-axis) for the indicated cell lines, DLD1

and SF268.

Figure 2. CRISPR screens can robustly identify cancer-selective dependencies.

A, Distribution of lethals amongst the five cancer lines tested. A Z score cutoff of -1 was used to

delineate genes that are cell essential. Pan-lethal genes will be essential in 4 or 5 cell lines, whereas

selective lethals only affect growth of a few cell lines. B, Venn diagram showing the distribution of

CRISPR dropout screen hits (Z score < -1) in HT1080, DLD1 and RKO. C, Heat map displaying the CRISPR

lethality Z-score of a few selected genes across the five cancer cell line models. The 4 genes on top are

pan lethal (dropout in every cell line), the 3 genes at the bottom are not expressed and hence show no

activity across all models. Several known genetic dependencies display a selective lethal pattern that

correlates with genetic alterations in the cell lines (genotype indicated on top).




Figure 3. sgRNAs targeting the 5’ coding region of Beta-catenin are ineffective due to an alternative

translation initiation site in exon 3.

A, Scatter plot of CRISPR Z score (X-axis) versus shRNA Z score (Y-axis) of 2722 genes in the colorectal

line DLD1. Beta-catenin is one of the few genes that shows a stronger dropout in the shRNA screen

compared to the CRISPR screen. B, (Top) Graphical depiction of the CTNNB1 genomic locus. The CTNNB1

gene extends over 40kb and contains 16 exons (vertical hatches). (Bottom) Exon structure of the human

CTNNB1 cDNA. The positions of the start codon (ATG) and of an alternative start site (*ATG; supported by

EMBL transcript ID ENST00000405570 and UniProtKB A0A024R2Q3 and P35222) are indicated. Magnification of exons

2-5 shows the location of each sgRNA relative to the CTNNB1 cDNA. The color intensity represents the

Z-score for each sgRNA. None of the five sgRNAs targeting the 5’ most coding region score by CRISPR

(upstream of alternative start codon), whereas 13 of 15 sgRNAs downstream of that site score by

CRISPR.

Figure 4. sgRNAs targeting conserved Pfam domains display the most robust dropout phenotypes

A, Schematic representation of the CRISPR tiling screen. B, Hexagonal correlation plot showing that

individual tiling reagents have very high correlation with each other in the two cell lines DLD1 and RKO.

C, Violin plot showing that sgRNAs targeting conserved Pfam domains lead

to significantly stronger lethality effects on average. This analysis was restricted to genes required for cell growth

(averaged Z scores below -0.4). D-G, Individual examples of how sgRNA performance is influenced by

position within the gene. Each dot represents the score of an independent sgRNA, with grey dots indicating

sgRNAs targeting regions outside of a Pfam domain, and orange dots indicating sgRNAs targeting Pfam

domains. The black line indicates the average dropout score for the neighboring 10 sgRNAs. The protein

domain structure of the respective genes is displayed on top with key Pfam domains labeled.




Figure 5. Highly amplified genes score as false positives in CRISPR-based screens.

A,B, Schematic of genomically amplified regions in chromosomes 7 and 11 of the gastric line MKN45 and

the brain line SF268, respectively (Top). Line graph depicting the relative copy number of genes within

the amplicons in chromosome 7 (MKN45 cells) and chromosome 11 (SF268 cells) (Bottom). Orange dots

represent non-expressed genes as assessed by RNA-seq (RNAseq <1).

C-F, Scatter plots display the average CRISPR (C,D) or shRNA (E,F) lethality Z-score for each gene (Y-axis)

relative to its copy number (X-axis) for MKN45 (left) and SF268 cells (right). Orange data points indicate

non-expressed genes (expression <1) as assessed by RNA-seq. The solid line represents a regression line

for non-expressed genes. The dotted red box (lower right) marks the genes showing the largest dropout

phenotype. The green dot represents the major dropout in the highlighted amplicon.

Figure 6. Excessive genome cutting leads to off-target lethality

A, Line graphs depicting the average lethality of all sgRNAs (red line) or all shRNAs (blue line) relative to

the copy number of the genes targeted. Lines represent the average Z score of all sgRNAs and shRNAs

in MKN45 and SF268. The dashed line indicates the Z-score observed for sgRNAs targeting genes with a

copy number of 2. The X-axis (copy number) was split into 4 bins. B, Violin plot depicting the lethality

Z-scores of sgRNAs relative to the number of perfect genome matches. To avoid any influence of

functional on-target effects, only non-expressed genes were included in this analysis.




Table 1

CTNNB1      Forward                   Reverse

sgRNA #1    TGCTCAAGGGGAGTAGTTTCA     CCACTGGTGAACTGGGAAGA

sgRNA #2    TGCTGAAACATGCAGTTGTAAA    CTCACGATGATGGGAAAGGT

sgRNA #3    GGACTTCACCTGACAGATCCA     TGGTCAGATGACGAAGAGCA

sgRNA #4    TTCCCAGTTCACCAGTGGAT      GCACGCTCCCTATGAGAATC




[Figure 1 image. CRISPR-based dropout screens identify more essential genes compared to shRNA screens. Panel A: schematic of the screen (shRNA or CRISPR viral pool, 20 reagents per gene targeting 2700 genes; Cas9 line; infected cells; 14 days; deep sequencing; comparison of plasmid counts and day 14 counts for reagents targeting a specific gene). Panels B,C: shRNA avg. Z score versus CRISPR avg. Z score for DLD-1 and SF268, quadrants I-IV, labeled genes RPS18, PRPF19, PSMA4, RPL7, CKAP5, RUVBL1 and RAN. Panels D,E: RNAseq Log2(0.5+FPKM) versus CRISPR avg. Z score for DLD-1 and SF268; legend entry "Copy number >2".]




[Figure 2 image. CRISPR screens can robustly identify cancer-selective dependencies. Panel A: bar graph of number of lethal genes versus number of cell lines (1 to 5). Panel B: Venn diagram of hits in HT1080, DLD1 and RKO. Panel C: heat map of avg. Z score across DLD1, HT1080, MKN-45, RKO and SF-268, with genotype labels (KRAS G13D, TP53 S241F, PIK3CA E545K, APC Mut, MET Amp, BRAF V600E, PIK3CA H1047R, TP53 R273H, NRAS Q61K) and rows grouped as pan lethal (PCNA, RPL7, PLK1, PSMA4), genetic dependencies (MDM2, NRAS, PIK3CA, KRAS, CTNNB1, TCF7L2) and non-expressed (TSSK2, CYP11B2, PROP1).]




[Figure 3 image. sgRNAs targeting the 5’ coding region of Beta-catenin are ineffective due to an alternative translation initiation site in exon 3. Panel A: shRNA avg. Z score versus CRISPR avg. Z score, with Beta-catenin highlighted. Panel B: CTNNB1 genomic sequence with exons 1-16, cDNA with the start codon (ATG) and the alternative start site (*ATG), and sgRNA positions colored by avg. Z score.]




[Figure 4 image. sgRNAs targeting conserved Pfam domains display the most robust dropout phenotypes. Panel A: schematic of the CRISPR tiling library (all possible sgRNAs, Pfam domain, position-specific viability phenotypes). Panel B: RKO Z-score versus DLD1 Z-score, r2=0.509. Panel C: DLD1 Z-score for sgRNAs in a Pfam domain versus not in a Pfam domain, p<10^-180. Panels D-G: Z-score versus amino acid position for PLK1 (kinase domain, polo-box domains), CTNNB1 (armadillo repeats), AURKB (kinase domain) and SMARCA2 (ATP-binding, Helicase_C, bromo domain); Y-axis labels include DLD1 Z-score, NCIH1299 Z-score and CRISPR avg Z-score.]




[Figure 5 image. Highly amplified genes score as false positives in CRISPR-based screens. Panels A,B: copy number versus genomic position for the amplicons on Chr 7 (MKN45) and Chr 11 (SF268); labeled genes include CAV1, MET, ST7, ING3, PTPRZ1, ANGPTL5, MMP20, C11orf70, YAP1, MMP7, TMEM123, BIRC3 and BIRC2. Panels C,D: CRISPR avg. Z-score versus copy number for MKN45 and SF268. Panels E,F: shRNA avg. Z-score versus copy number for MKN45 and SF268. Legend entry: "RNAseq count <1".]




[Figure 6 image. Excessive genome cutting leads to off-target lethality. Panel A: avg. Z-score versus copy number for shRNA and CRISPR. Panel B: DLD1 Z-score versus number of perfect match sites in the genome (1 to 5).]




Published OnlineFirst June 3, 2016. Cancer Discov. Diana M. Munoz, Pamela J. Cassiani, Li Li, et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions.

Updated version: Access the most recent version of this article at doi: 10.1158/2159-8290.CD-16-0178

Supplementary Material: Access the most recent supplemental material at http://cancerdiscovery.aacrjournals.org/content/suppl/2016/06/03/2159-8290.CD-16-0178.DC1
