1
Genome Coverage of MAD7 and Other Nucleases E. coli Figure 2. Percent of E. coli and S. cerevisiae genomes within 20 or 50 bases of TTCN, TTTN, NGG, and TTN PAM sequences of the associated CRISPR nuclease systems. MAD7 covers >90% of both genomes within 50 bases of TTTN. This close PAM proximity ensures that efficient editing should be possible at most genomic targets. Yeast Abstract RNA-guided DNA nuclease systems are widely used in genome editing. Low diversity of currently available nucleases in terms of guide RNA characteris- tics, nuclease size, and PAM target sequences, as well as commercial restric- tions on their use, highlight the need for novel systems. Inscripta has built the capability to identify and generate RNA-guided DNA endonucleases. Here we describe MAD7, a system that targets TTTN PAMs and shows both cutting and editing activity in E. coli and S. cerevisiae. MAD7 Structure Relative to AsCpf1 MAD7: a novel RNA-directed nuclease for precision gene editing MAD7 Cuts Diverse Loci in E. coli Figure 1. A. Overlay of MAD7 inferred structure (green) on the crystal structure of the related ortholog AsCpf1 (grey) in complex with guide RNA (gDNA; aqua) and DNA (red). B. Expanded view of the PAM recognition domain structure showing hydrogen bonds with DNA (indicated by dashed lines). C. Comparison of PAM recognition domain amino acid (AA) residues of MAD7 and AsCpf1 showing hydrogen bond contacts to DNA (red) and position relative to AsCpf1. The MAD7 nuclease used in these studies is 1263 AA in size and shows 31% overall sequence identity with AsCpf1. D. Secondary structure of crRNA, pseudoknot (blue) with base pairing indicated using Leontis-West- hoff nomenclature and a dinucleotide GG (grey). The box indicates variations to the loop region of the pseudoknot that have been validated in E. coli. Numbering indicates the nucleotide position relative to the transcription start site. Figure 4. Cleavage across 96 loci in E. coli. 71/96 loci show complete depletion, 5/96 loci were not present in either transformation, 30 loci were not cut and therefore overrepresented in the result- ing sequencing library. Figure 3. Cleavage efficiency of MAD7 across YTTN and NGGG PAMs. Data represents the average depletion observed for 6 different guides for each PAM, with each dot representing an independent measurement. MAD7-expressing E. coli (MAD7 + ) was transformed with a library of self-targeting plasmids with a NNNN-Target-NN sequence. Completeness of cut was determined by amplifying the target sequence and comparing the depletion of normalized read counts in MAD7+ cells to read counts observed in MAD7- cells. Notably, some loss in activity was observed with a T at the 3’ terminus of the PAM. MAD7 Cuts YTTN Conclusions We have identified MAD7, a novel gene-editing nuclease MAD7 is most similar to, but has low overall homology with, AsCpf1 Genome coverage can be >90% with MAD7 MAD7 cuts TTTN PAM sequences in 74% of loci tested MAD7 shows robust editing in both E. coli and S. cerevisiae MAD7 Induced Editing Efficiency TABLE 1. Edits in E. coli loci were selected to introduce non-synonymous changes to the genome. Edit status was determined by whole-genome shotgun sequencing. Edits in S. cerevisiae S288C were the introduction of an EcoRI recognition sequence into the genome. Edit status was determined by amplifying the locus by PCR and cutting with EcoRI followed by gel electrophoresis. Edit Detection Figure 6. Examples of edits. IGV screenshots showing alignments in the ruvA locus with edited bases highlighted. Edits in E. coli were detected by extracting genomic DNA from individual colo- nies, generating sequencing libraries with Nextera library prep kits and sequencing by NGS. Edits in S. cerevisiae were detected by amplifying the CAN1 locus from individual colonies and detecting the presence or absence of the edited sequence by sensitivity to digestion with EcoRI. Editing Designs Figure 5. Examples of genome target, guide RNA spacer, and homology arm sequences for editing designs for loci in E. coli and S. cerevisiae. PAM sequences are highlighted in yellow, and edited bases highlighted in aqua. ATCGCC GTGCA GGCC CTT GAATCGG ATCGCCATGCAAACCTTTAAATCGG Edited + EcoRI digest Not Edited + EcoRI digest S. cerevisiae CAN1 Genome < GCGCTC TTTCCCGACGAGAGTAAATGGCGAGGATACGTTCT > Spacer CCGACGAGAGUAAAUGGCGAG Homology arm < GCGCTC GAATTC -ACGAGAGTAAATGGCGAGGATACGTTCT > E. coli ruvA Genome < GACCGA TTTAAAGGTTTGCATGGCGATCTCTTTACGCCAGC > Spacer AAGGUUUGCAUGGCGAUCUCU Homology arm < GACCGATT CAA GGG CCTGCA CGGCGATCTCTTTACGCCAGC > Inactive Depleted Not transformed Edited: Wild Type: Andrew Garst, Eileen Spindler, Clint Davis, Chris Abraham, Charlie Johnson, Miles W. Gander, Juhan Kim, William G. Alexander, Jamie Kershner, Isaac Wagner, Andrea Edwards, Josh Shorenstein, Craig Struble, Paul Hardenbol, and Ryan T. Gill Inscripta, Inc. 5500 Central Ave #220 Boulder, CO 80301 Locus PAM Cell Homology arm length, bases Edited bases Colonies tested Edited colonies Edit efficiency alaC TTTC E. coli MG1655 132 6 13 7 54% cyaA TTTC E. coli MG1655 132 5 2 2 100% fucK TTTA E. coli MG1655 132 7 8 8 100% helD TTTA E. coli MG1655 132 5 7 7 100% lolD TTTC E. coli MG1655 132 8 8 8 100% lpxM TTTC E. coli MG1655 132 4 8 8 100% narQ TTTC E. coli MG1655 132 7 8 8 100% phoB TTTG E. coli MG1655 132 5 8 8 100% rhsB TTTC E. coli MG1655 132 5 5 5 100% ruvA TTTA E. coli MG1655 132 5 8 8 100% waaC TTTA E. coli MG1655 132 6 8 0 0% xylF TTTC E. coli MG1655 132 6 8 7 88% CAN1 TTTC S. Cerevisiae S288c 134 6 82 54 66% K607 -3T -2T -4T Y687 N631 K548 K134 T166/T167 - 133 - 134 - 135 - 165 - 166 - 167 - 168 - 169 AsCpf1_5B43 MAD7 - 547 - 548 - 549 - 550 - 551 - 606 - 607 - 608 - 609 - 610 MAD7 homology model AsCpf1 (5B43) DNA RNA - 686 - 687 - 688 PAM YKG SAK FTTYF FATSF NKEKN SKEYS PKCST PKVFL LSNNFIE ------- - 629 - 630 - 631 - 632 - 633 - 634 - 635 KYT IHP A A U U G U A G A U U C A U C U U C U GG 5’- 3’ 21 nt Spacer Pseudoknot - 21 a) d) b) c) 6 - 5’ 3’ 3’ 5’ A U G U U U A G 0 20 40 60 80 100 TTCN TTTN NGG TTN CasX Mad7 Cas9 C2c1 % accessible genome E. coli P20 E. coli P50 S. cerevisiae P20 S. cerevisiae P50 log 2 Depletion 2 0 -2 -4 -6 -8 -10 -12 TTTA TTTG TTTC CTTA CTTG CTTC TTTT CTTT TGGG CGGG GGGG AGGG 0% cut 99% cut Copyright © 2018 Inscripta, Inc. Rev 06/18

MAD7: a novel RNA-directed nuclease for precision gene editing · IGV screenshots showing alignments in the ruvA locus with edited bases highlighted. Edits in E. coli were detected

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MAD7: a novel RNA-directed nuclease for precision gene editing · IGV screenshots showing alignments in the ruvA locus with edited bases highlighted. Edits in E. coli were detected

Genome Coverage of MAD7 and Other Nucleases E. coli

Figure 2. Percent of E. coli and S. cerevisiae genomes within 20 or 50 bases of TTCN, TTTN, NGG, and TTN PAM sequences of the associated CRISPR nuclease systems. MAD7 covers >90% of both genomes within 50 bases of TTTN. This close PAM proximity ensures that efficient editing should be possible at most genomic targets.

Yeast

AbstractRNA-guided DNA nuclease systems are widely used in genome editing. Low diversity of currently available nucleases in terms of guide RNA characteris-tics, nuclease size, and PAM target sequences, as well as commercial restric-tions on their use, highlight the need for novel systems. Inscripta has built the capability to identify and generate RNA-guided DNA endonucleases. Here we describe MAD7, a system that targets TTTN PAMs and shows both cutting and editing activity in E. coli and S. cerevisiae.

MAD7 Structure Relative to AsCpf1

MAD7: a novel RNA-directed nuclease for precision gene editing

MAD7 Cuts Diverse Loci in E. coli

Figure 1. A. Overlay of MAD7 inferred structure (green) on the crystal structure of the related ortholog AsCpf1 (grey) in complex with guide RNA (gDNA; aqua) and DNA (red). B. Expanded view of the PAM recognition domain structure showing hydrogen bonds with DNA (indicated by dashed lines). C. Comparison of PAM recognition domain amino acid (AA) residues of MAD7 and AsCpf1 showing hydrogen bond contacts to DNA (red) and position relative to AsCpf1. The MAD7 nuclease used in these studies is 1263 AA in size and shows 31% overall sequence identity with AsCpf1. D. Secondary structure of crRNA, pseudoknot (blue) with base pairing indicated using Leontis-West-hoff nomenclature and a dinucleotide GG (grey). The box indicates variations to the loop region of the pseudoknot that have been validated in E. coli. Numbering indicates the nucleotide position relative to the transcription start site.

Figure 4. Cleavage across 96 loci in E. coli. 71/96 loci show complete depletion, 5/96 loci were not present in either transformation, 30 loci were not cut and therefore overrepresented in the result-ing sequencing library.

Figure 3. Cleavage efficiency of MAD7 across YTTN and NGGG PAMs. Data represents the average depletion observed for 6 different guides for each PAM, with each dot representing an independent measurement. MAD7-expressing E. coli (MAD7+) was transformed with a library of self-targeting plasmids with a NNNN-Target-NN sequence. Completeness of cut was determined by amplifying the target sequence and comparing the depletion of normalized read counts in MAD7+ cells to read counts observed in MAD7- cells. Notably, some loss in activity was observed with a T at the 3’ terminus of the PAM.

MAD7 Cuts YTTN

Conclusions• We have identified MAD7, a novel gene-editing nuclease • MAD7 is most similar to, but has low overall homology with, AsCpf1• Genome coverage can be >90% with MAD7• MAD7 cuts TTTN PAM sequences in 74% of loci tested• MAD7 shows robust editing in both E. coli and S. cerevisiae

MAD7 Induced Editing EfficiencyTABLE 1. Edits in E. coli loci were selected to introduce non-synonymous changes to the genome. Edit status was determined by whole-genome shotgun sequencing. Edits in S. cerevisiae S288C were the introduction of an EcoRI recognition sequence into the genome. Edit status was determined by amplifying the locus by PCR and cutting with EcoRI followed by gel electrophoresis.

Edit Detection

Figure 6. Examples of edits. IGV screenshots showing alignments in the ruvA locus with edited bases highlighted. Edits in E. coli were detected by extracting genomic DNA from individual colo-nies, generating sequencing libraries with Nextera library prep kits and sequencing by NGS. Edits in S. cerevisiae were detected by amplifying the CAN1 locus from individual colonies and detecting the presence or absence of the edited sequence by sensitivity to digestion with EcoRI.

Editing Designs

Figure 5. Examples of genome target, guide RNA spacer, and homology arm sequences for editing designs for loci in E. coli and S. cerevisiae. PAM sequences are highlighted in yellow, and edited bases highlighted in aqua.

ATCGCCGTGCAGGCCCTTGAATCGGATCGCCATGCAAACCTTTAAATCGG

Edit

ed+

EcoR

I dig

est

Not

Edi

ted

+ Ec

oRI d

iges

t

S. cerevisiae CAN1Genome < GCGCTCTTTCCCGACGAGAGTAAATGGCGAGGATACGTTCT >Spacer CCGACGAGAGUAAAUGGCGAGHomology arm < GCGCTCGAATTC-ACGAGAGTAAATGGCGAGGATACGTTCT >

E. coli ruvAGenome < GACCGATTTAAAGGTTTGCATGGCGATCTCTTTACGCCAGC >Spacer AAGGUUUGCAUGGCGAUCUCUHomology arm < GACCGATTCAAGGGCCTGCACGGCGATCTCTTTACGCCAGC >

Inactive DepletedNot

transformed

Edited:Wild Type:

Andrew Garst, Eileen Spindler, Clint Davis, Chris Abraham, Charlie Johnson, Miles W. Gander, Juhan Kim, William G. Alexander, Jamie Kershner, Isaac Wagner, Andrea Edwards, Josh Shorenstein, Craig Struble, Paul Hardenbol, and Ryan T. Gill Inscripta, Inc. 5500 Central Ave #220 Boulder, CO 80301

Locus PAM Cell Homology arm length, bases

Edited bases

Colonies tested

Edited colonies

Edit efficiency

alaC TTTC E. coli MG1655 132 6 13 7 54%

cyaA TTTC E. coli MG1655 132 5 2 2 100%

fucK TTTA E. coli MG1655 132 7 8 8 100%

helD TTTA E. coli MG1655 132 5 7 7 100%

lolD TTTC E. coli MG1655 132 8 8 8 100%

lpxM TTTC E. coli MG1655 132 4 8 8 100%

narQ TTTC E. coli MG1655 132 7 8 8 100%

phoB TTTG E. coli MG1655 132 5 8 8 100%

rhsB TTTC E. coli MG1655 132 5 5 5 100%

ruvA TTTA E. coli MG1655 132 5 8 8 100%

waaC TTTA E. coli MG1655 132 6 8 0 0%

xylF TTTC E. coli MG1655 132 6 8 7 88%

CAN1 TTTC S. Cerevisiae S288c 134 6 82 54 66%

K607

-3T

-2T

-4T

Y687

N631

K548

K134

T166/T167

-133

-134

-135

-165

-166

-167

-168

-169

AsCpf1_5B43MAD7

-547

-548

-549

-550

-551

-606

-607

-608

-609

-610

MAD7homology model

AsCpf1 (5B43)

DNA

RNA

-686

-687

-688

PAM

YKGSAK

FTTYFFATSF

NKEKNSKEYS

PKCSTPKVFL

LSNNFIE-------

-629

-630

-631

-632

-633

-634

-635

KYTIHP

AA

U

UGUAGAU

UCAUCUU

C U

GG5’-

3’21 ntSpacer

Pseudoknot

- 21

a) d)

b)

c)

6 -

5’3’

3’5’

A UG UU UA G

0

20

40

60

80

100

TTCN TTTN NGG TTNCasX Mad7 Cas9 C2c1

% a

cces

sibl

e ge

nom

e

E. coli P20 E. coli P50 S. cerevisiae P20 S. cerevisiae P50

log 2 D

eple

tion

2

0

-2

-4

-6

-8

-10

-12

TTTA

TTTG

TTTC

CTT

A

CTT

G

CTT

C

TTTT

CTT

T

TGG

G

CG

GG

GG

GG

AGG

G

0% cut

99% cut

Copyright © 2018 Inscripta, Inc. Rev 06/18