www.sciencemag.org/cgi/content/full/science.aac7629/DC1
Supplementary Materials for
Structure of a yeast spliceosome at 3.6-angstrom resolution
Chuangye Yan, Jing Hang, Ruixue Wan, Min Huang, Catherine C. L. Wong, Yigong Shi*
*Corresponding author. E-mail: [email protected]
Published 20 August 2015 on Science Express
DOI: 10.1126/science.aac7629
This PDF file includes:
Materials and Methods
Figs. S1 to S23
Tables S1 and S2
Materials and Methods
S. pombe strain
The S. pombe yeast strain used in this study carries a TAP tag (a calmodulin-binding
peptide and a protein A moiety separated by a TEV protease cleavage site) at the
C-terminus of the Cdc5 protein. This design allows two tandem affinity purification steps.
To introduce this tag at the C-terminus of the chromosomal cdc5 locus, the TAP tag and an
HphMX6 marker were amplified by PCR from the plasmid pFA6a-CTAP-HphMX6. The
resulting PCR fragments were transformed into haploid S. pombe cells by the lithium acetate
method (74), and transformants were selected on hygromycin B-YES sodium medium.
The two affinity tags were confirmed by PCR and Western blotting.
Purification of the yeast spliceosome
Purification of the yeast spliceosome was modified from a published protocol (75). The S.
pombe culture was grown in 5×YES medium for 24-30 hours at 30 °C to an OD600 of ~22.
Cell pellets from a 4-liter culture were collected by centrifugation at 3,800 rpm. The cell
pellets (~92 mL) were resuspended in 30 mL of buffer containing 10 mM HEPES-KOH, pH
7.9, 10 mM Tris-HCl, pH 8.0, 40 mM KCl, and 110 mM NaCl. The cell suspension, at a
final volume of about 120 mL, was dropped into liquid nitrogen to form yeast beads
3-6 mm in diameter, which were pulverized to powder in a SPEX 6870 Freezer Mill. Frozen
cell powder was resuspended in 30 mL of buffer containing 10 mM HEPES-KOH, pH 7.9, 10
mM Tris-HCl, pH 8.0, 40 mM KCl, 110 mM NaCl, 0.2 mM EDTA, 20% glycerol, and
protease inhibitor cocktail (1 mM phenylmethylsulphonyl fluoride (PMSF), 2 mM
benzamidine, 2.6 μg/ml aprotinin, 1.4 μg/ml pepstatin and 10 μg/ml leupeptin). The cell
suspension was first centrifuged at 18,000 × g for 1 hour, and the supernatant was
centrifuged again at 150,000 × g for 1 hour, yielding ~100 mL of cell extract and a pellet of
cellular debris. The supernatant was incubated with IgG Sepharose 6 Fast Flow resin (GE
Healthcare), and the bound material was released by TEV protease cleavage at 18 °C for
1 hour in a buffer containing 10 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.1% NP40, 1 mM
DTT, and 0.5 mM EDTA. The eluate was supplemented with 20 µl of 1 M CaCl2 and loaded
onto calmodulin affinity resin (Stratagene). Finally, the spliceosomal complex was eluted
with buffer CEB (10 mM Tris-HCl, pH 8.0, 75 mM NaCl, 1 mM Mg(OAc)2, 1 mM imidazole,
0.01% NP40, 1 mM TCEP, 2 mM EGTA), concentrated, and applied to a Superdex-200
column (GE Healthcare). The peak fraction was used to prepare samples for EM imaging.
RNA preparation and RT-PCR
Total RNA was extracted from cells with TRIzol® reagent (Life Technologies) following
the suggested protocol. RNA from the purified spliceosomal complex was extracted with
phenol:chloroform:isopentanol at a volume ratio of 25:24:1 (Beijing Dingguo
Changsheng Biotech). First-strand cDNA synthesis was performed with the
SuperScript® III First-Strand Synthesis System for RT-PCR (Invitrogen™) according to
the manufacturer’s instructions. Briefly, 200 ng of RNA was used in each reverse
transcription reaction. The RNA template was then digested with RNase H, and the
remaining cDNA was subjected to PCR amplification. Five specific reactions were
performed for the cut6 gene, using the following five primers:
P1: 5’-GAAATGTTTACGAGAAGCGCTTG-3’;
P2: 5’-CCTTTACAGGAACCAAGTCC-3’;
P3: 5’-CGAGTACAAATCAATCATGC-3’;
P4: 5’-TATCTCGGGGCAAGAAATTAGGAAGG-3’; and
P5: 5’-CTAAGTTCAAATTGCTCAAGCGCG-3’.
The primer pairs P1/P2, P1/P3, P1/P5, P3/P4, and P4/P5 give rise to RT-PCR products A,
B, C, D, and E, respectively. Four RT-PCR products are unique, representing the 5’-exon
(product A), a fusion of the 5’-exon and 5’-intron (product B), the intron lariat (product D),
and a fusion of the 3’-exon and 3’-intron (product E). Product C has two forms: a long
fragment representing the pre-splicing state and a short fragment representing the
post-splicing state. The RT-PCR products were resolved on a 1.5% (wt/vol) agarose gel.
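As a quick illustration of the primer design above, the sketch below tabulates the primer pairs and the products they give (as stated in the text) and computes each primer's length, GC content, and a rough melting temperature. The Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C) is an assumption of this sketch: it is a crude, commonly quoted estimate that is most reliable for short oligonucleotides, and it is not claimed to be how these primers were designed.

```python
# Primer sequences (5'->3') as listed in the RT-PCR section.
PRIMERS = {
    "P1": "GAAATGTTTACGAGAAGCGCTTG",
    "P2": "CCTTTACAGGAACCAAGTCC",
    "P3": "CGAGTACAAATCAATCATGC",
    "P4": "TATCTCGGGGCAAGAAATTAGGAAGG",
    "P5": "CTAAGTTCAAATTGCTCAAGCGCG",
}

# Primer pair -> RT-PCR product, as stated in the text.
PRODUCTS = {
    ("P1", "P2"): "A",  # 5'-exon
    ("P1", "P3"): "B",  # 5'-exon fused to 5'-intron
    ("P1", "P5"): "C",  # exon to exon: long = pre-splicing, short = post-splicing
    ("P3", "P4"): "D",  # intron lariat
    ("P4", "P5"): "E",  # 3'-intron fused to 3'-exon
}


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for (fwd, rev), product in PRODUCTS.items():
        print(f"product {product}: {fwd}/{rev}")
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, "
              f"Tm ~ {wallace_tm(seq)} C (Wallace rule)")
```

For 20-26 nt primers such as these, the Wallace estimate is only a rough guide; nearest-neighbor thermodynamic models give more realistic values.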
DSS Cross-linking
The purified yeast spliceosome from gel filtration was cross-linked with disuccinimidyl
suberate (DSS) at a 1:0.7 (wt/wt) ratio. After incubation at room temperature for 1 hour,
the reaction was terminated with 100 mM ammonium bicarbonate. Cold acetone
was used to precipitate the protein components for mass spectrometric (MS) analysis.
The spliceosome was cross-linked solely for MS analysis, not for EM studies.
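The 1:0.7 (wt/wt) ratio can be converted into a reagent amount with simple arithmetic. The sketch below assumes the ratio is protein:DSS by mass and uses the molecular weight of DSS (C16H20N2O8, about 368.34 g/mol); the function name and the 100 µg protein amount in the usage line are hypothetical.

```python
DSS_MW_G_PER_MOL = 368.34  # disuccinimidyl suberate, C16H20N2O8


def dss_for_sample(protein_ug: float, ratio: float = 0.7) -> tuple[float, float]:
    """Return (DSS mass in ug, DSS amount in nmol) for a protein:DSS
    mass ratio of 1:ratio (assumed reading of "1:0.7 (wt/wt)")."""
    dss_ug = protein_ug * ratio
    # ug / (g/mol) gives umol; multiply by 1000 to express in nmol.
    dss_nmol = dss_ug / DSS_MW_G_PER_MOL * 1000.0
    return dss_ug, dss_nmol


if __name__ == "__main__":
    mass, amount = dss_for_sample(100.0)  # hypothetical 100 ug of complex
    print(f"{mass:.1f} ug DSS = {amount:.1f} nmol")
```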
Mass spectrometric analysis
Total proteins were precipitated with 15% trichloroacetic acid (TCA) and lyophilized.
The pellet was dissolved in 8 M urea, 100 mM Tris, pH 8.5, followed by TCEP reduction,
iodoacetamide alkylation, and trypsin digestion. Trypsin (Promega) digestion was
quenched with 5% formic acid. Tryptic peptides were desalted with Pierce C18 spin
columns (Thermo Fisher) and separated on a Proxeon EASY-nLC liquid chromatography
system using a step-wise gradient of 0-85% acetonitrile (ACN) in 0.1% formic acid.
Peptides eluted from the LC column were directly electrosprayed into the mass
spectrometer with a distal 2 kV spray voltage. Data-dependent tandem mass spectrometry
(MS/MS) analysis was performed on an Orbitrap Fusion mass spectrometer
(Thermo Fisher, San Jose, CA). For cross-linked complexes, cold acetone precipitation
was used instead of TCA. Samples were digested and analyzed as described above,
except that the analysis was performed on a Thermo Q-Exactive instrument with a
60-minute gradient. Raw data were processed with pLink software.
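Cross-link search results typically contain several spectra supporting the same linkage, sometimes with the two peptides listed in either order. A minimal sketch of collapsing such redundant identifications into unique inter-protein residue pairs and protein pairs is shown below. The CrossLink record layout and the example entries (including all residue numbers) are hypothetical and do not reflect pLink's actual output format. Links between two copies of the same protein (for example within the Prp19 tetramer) cannot be told apart from intra-molecular links this way and are skipped.

```python
from typing import Iterable, NamedTuple


class CrossLink(NamedTuple):
    """One identified cross-linked peptide pair (hypothetical record layout)."""
    protein_a: str
    residue_a: int
    protein_b: str
    residue_b: int


def unique_inter_protein_links(links: Iterable[CrossLink]):
    """Return (residue_pairs, protein_pairs) as sets of frozensets, so the
    order in which the two sides were reported does not matter.
    Links within one protein name are skipped."""
    residue_pairs = set()
    protein_pairs = set()
    for link in links:
        if link.protein_a == link.protein_b:
            continue  # intra-protein (or homo-oligomeric) link
        site_a = (link.protein_a, link.residue_a)
        site_b = (link.protein_b, link.residue_b)
        residue_pairs.add(frozenset((site_a, site_b)))
        protein_pairs.add(frozenset((link.protein_a, link.protein_b)))
    return residue_pairs, protein_pairs


if __name__ == "__main__":
    # Hypothetical identifications, for illustration only.
    hits = [
        CrossLink("Prp45", 120, "Prp17", 40),
        CrossLink("Prp17", 40, "Prp45", 120),  # same link, sides swapped
        CrossLink("Prp45", 130, "Prp17", 45),
        CrossLink("Cdc5", 600, "Cdc5", 610),   # intra-protein, skipped
    ]
    residues, proteins = unique_inter_protein_links(hits)
    print(len(residues), "residue pairs;", len(proteins), "protein pair(s)")
```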
EM data acquisition and processing
Uranyl acetate (1% w/v) was used for negative staining. Briefly, copper grids
supported by a thin carbon film (Zhongjingkeyi Technology Co. Ltd) were glow-discharged.
4 µl of spliceosomal complex at a concentration of ~0.1 mg/ml was applied onto the grid
for 1 minute, and the grid was stored at room temperature. Images were taken on an FEI
Tecnai Spirit BioTWIN microscope operating at 120 kV to generate an initial
model. The same carbon-coated copper grids as those used for negative staining were
used for cryo-EM specimen preparation. Cryo-EM grids were prepared with a Vitrobot
Mark IV (FEI Company) at 8 °C and 100% humidity. Aliquots of 4 µl of the
spliceosomal complex at a concentration of ~0.3 mg/mL were applied to glow-discharged
grids, blotted for 2.5 seconds, and plunged into liquid ethane cooled by liquid nitrogen.
Images were taken on an FEI Titan Krios electron microscope operating at 300 kV at a
nominal magnification of 22,500×. Images were recorded by a Gatan K2 Summit detector
(Gatan Company) in super-resolution mode and binned to a pixel size of 1.32 Å.
Defocus values varied from 1.5 to 3.0 μm. Each image was dose-fractionated into 32
frames with a dose rate of ~8.2 counts/sec/physical-pixel (~4.7 e-/sec/Å²), a total exposure
time of 8 seconds, and 0.25 seconds per frame. UCSFImage4 (developed by Xueming Li)
was used for all data collection.
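The exposure parameters above are internally consistent, as a short calculation shows. This sketch assumes one detector count corresponds to one electron and that the 1.32 Å pixel equals the physical K2 pixel (the super-resolution pixel being half that size); under those assumptions the dose rate per Å², total dose, and dose per frame follow directly.

```python
# Exposure parameters from the text.
PIXEL_SIZE_A = 1.32        # assumed physical pixel size after 2x binning (angstrom)
DOSE_RATE_COUNTS = 8.2     # counts / s / physical pixel
N_FRAMES = 32
FRAME_TIME_S = 0.25

# Assumption: one count ~ one electron.
dose_rate = DOSE_RATE_COUNTS / PIXEL_SIZE_A ** 2   # e- / s / A^2
exposure_s = N_FRAMES * FRAME_TIME_S               # total exposure time (s)
total_dose = dose_rate * exposure_s                # e- / A^2
dose_per_frame = dose_rate * FRAME_TIME_S          # e- / A^2 per frame

print(f"dose rate      : {dose_rate:.2f} e-/s/A^2")
print(f"exposure time  : {exposure_s:.1f} s")
print(f"total dose     : {total_dose:.1f} e-/A^2")
print(f"dose per frame : {dose_per_frame:.2f} e-/A^2")
```

This prints a dose rate of 4.71 e-/s/Å² (matching the ~4.7 quoted above), an 8.0 s exposure, and a total dose of 37.6 e-/Å², consistent with the electron dose of 38 e-/Å² listed in Table S1.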
Image processing
A crude map was generated from 60 micrographs of the negative-stained
spliceosomal complex and served as the initial model for cryo-EM 3D refinement. Particle
picking was performed with the EMAN (76) subroutine e2boxer.py in interactive boxing
mode, yielding 12,558 particles. Reference-free classification was performed with
e2refine2d.py, generating 60 classes, of which 42 were selected to generate an
initial model using e2initialmodel.py. A total of 2,246 cryo-EM micrographs were
collected. All images were aligned and summed using whole-image motion correction
(77). The defocus value of each image was determined by CTFFIND3 (78). In total,
224,450 particles were picked using the reference-based particle picking subroutine in
RELION (79). The templates for particle semi-autopicking were obtained from the 2D
class averages calculated from ~3,000 manually picked particles.
Particle sorting and reference-free 2D classification were performed to remove
ice and contaminants using particles binned to a pixel size of 2.64 Å, yielding 118,841
good particles. The auto-refine procedure was performed on the binned particles with the
negative-stain-derived crude map as the initial model, resulting in a 5.3 Å map. The
handedness of the map was determined by attempted docking of known crystal structures
into the EM density map. Using the STAR file of refined particle parameters (data.star)
derived from the auto-refinement, all particles were mapped back to their original
micrographs based on their refined centers using our own script (available upon request),
generating a total of 2,246 new coordinate files (.star files), one for each micrograph. One
round of manual particle picking was performed with these improved coordinate files. This
strategy allowed convenient removal of the vast majority of ice spots and contaminants,
greatly reducing the labor of manual picking. During manual picking, 9,470 bad particles
were removed and 24,530 missed particles were added, yielding 133,901 particles for
further processing.
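The particle counts above can be checked with simple bookkeeping, and the binning helps explain the resolution of the first map. In this sketch the 2× binning factor is inferred from the two pixel sizes given (1.32 Å and 2.64 Å). The Nyquist limit of the binned data, 2 × 2.64 = 5.28 Å, essentially coincides with the 5.3 Å map obtained from the binned particles, which is presumably why later refinement used the unbinned particles.

```python
picked = 224_450           # semi-autopicked in RELION
good_after_2d = 118_841    # after particle sorting and 2D classification
removed_manual = 9_470     # bad particles removed during manual picking
added_manual = 24_530      # missed particles added during manual picking

after_manual = good_after_2d - removed_manual + added_manual
print(f"kept after 2D cleaning: {good_after_2d / picked:.1%}")
print(f"particles after manual picking: {after_manual:,}")

pixel_unbinned = 1.32                 # angstrom
pixel_binned = 2.64                   # angstrom
bin_factor = pixel_binned / pixel_unbinned
nyquist_binned = 2 * pixel_binned     # finest resolution representable when binned
nyquist_unbinned = 2 * pixel_unbinned
print(f"bin factor: {bin_factor:.0f}")
print(f"Nyquist limit, binned:   {nyquist_binned:.2f} A")
print(f"Nyquist limit, unbinned: {nyquist_unbinned:.2f} A")
```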
One round of 3D classification was performed with the new 5.3 Å map as
reference; the 112,795 selected particles, at a pixel size of 1.32 Å, produced a 3D
reconstruction with an average resolution of 3.9 Å. This map shows clear secondary
structural elements and amino acid side chains in the core region. To deal with the
flexibility of the spliceosomal complex, local masks for different parts of the complex
were applied to the 3.9 Å auto-refine procedure at iteration 10. Each local mask
invariably yielded a better local map than the overall density map. The density for the
target region was extracted from the overall map with CHIMERA (80), and the mask was
created with RELION (79).
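Local masks of this kind are typically binary regions given a soft, smoothly decaying edge, so that masking does not introduce sharp artifacts in Fourier space. The NumPy sketch below illustrates the idea with a spherical mask and a raised-cosine edge. The spherical shape, box size, and radii are hypothetical simplifications: the masks used here followed the density of each region extracted in CHIMERA and were generated in RELION, whose implementation this sketch does not reproduce.

```python
import numpy as np


def soft_spherical_mask(box: int, center, radius: float, edge: float) -> np.ndarray:
    """Return a box^3 mask: 1 within `radius` voxels of `center`, a raised-cosine
    fall-off to 0 over the next `edge` voxels, and 0 beyond."""
    z, y, x = np.indices((box, box, box), dtype=float)
    cz, cy, cx = center
    r = np.sqrt((z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2)
    mask = np.zeros_like(r)
    mask[r <= radius] = 1.0
    shell = (r > radius) & (r < radius + edge)
    mask[shell] = 0.5 * (1.0 + np.cos(np.pi * (r[shell] - radius) / edge))
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(64, 64, 64))      # stand-in for a density map
    mask = soft_spherical_mask(64, (32, 32, 40), radius=10, edge=6)
    masked = volume * mask                      # keep only the region of interest
    print(mask.min(), mask.max())
```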
After per-particle motion correction and radiation-damage weighting (known as
particle polishing) (81), the polished particles gave a reconstruction with an overall
resolution of 3.6 Å. Refinement with each local mask was then continued from the last
iteration using the polished particles instead of the original particles, leading to
improved maps in all cases. None of the efforts above produced a reasonable
map for the U2 snRNP, which appears to be exceptionally flexible. Two further rounds
of 3D classification were therefore performed, identifying 20,000 particles that
collectively yielded an 11-Å resolution map for this region after auto-refinement with a
small mask around the U2 snRNP.
Reported resolutions are based on the gold-standard FSC 0.143 criterion, and
FSC curves were corrected for the effect of the soft mask using high-resolution noise
substitution (82). Prior to visualization, all density maps were corrected for the
modulation transfer function (MTF) of the detector and then sharpened by applying a
negative B-factor estimated with automated procedures (83). Local resolution variations
were estimated using ResMap (84).
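The gold-standard criterion compares two independently refined half maps shell by shell in Fourier space and reports the resolution at which the Fourier shell correlation (FSC) falls below 0.143. The NumPy sketch below computes an FSC curve for two cubic half maps and reads off the 0.143 crossing (taking the last shell above the threshold, without interpolation). It also includes the noise-substitution correction FSC_true = (FSC_masked - FSC_noise) / (1 - FSC_noise), where FSC_noise is the FSC of masked half maps whose phases were randomized beyond a chosen frequency. Function names and the shell binning are illustrative assumptions, not RELION's code.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray, voxel_size: float):
    """FSC between two cubic half maps. Returns (resolution_A, fsc) for
    shells 0 .. n//2 - 1; shell 0 (the origin) gets infinite resolution."""
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    freq = np.fft.fftfreq(n)                         # cycles per voxel
    kz, ky, kx = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kz**2 + ky**2 + kx**2) * n).astype(int)
    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        sel = shell == s
        num = np.sum(f1[sel] * np.conj(f2[sel])).real
        den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        fsc[s] = num / den if den > 0 else 0.0
    resolution = np.full(n_shells, np.inf)
    resolution[1:] = n * voxel_size / np.arange(1, n_shells)  # d = box length / shell
    return resolution, fsc


def resolution_at_threshold(resolution, fsc, threshold=0.143):
    """Resolution of the last shell before the FSC first drops below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return resolution[-1]        # never drops: limited by the sampled shells
    first_below = below[0] + 1
    return resolution[first_below - 1]


def noise_substituted_fsc(fsc_masked, fsc_phase_randomized):
    """Mask-corrected FSC; meaningful beyond the phase-randomization frequency."""
    fsc_masked = np.asarray(fsc_masked, dtype=float)
    fsc_phase_randomized = np.asarray(fsc_phase_randomized, dtype=float)
    return (fsc_masked - fsc_phase_randomized) / (1.0 - fsc_phase_randomized)
```

With real data, half1 and half2 would be the two independently refined half maps, and the correction would be applied only to shells beyond the frequency at which phases were randomized.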
Model Building and refinement
Because resolution varies widely across regions of the spliceosome, we
combined de novo model building and docking of homologous structures to generate an
atomic model. Local maps generated by the masking strategies described above
were used to facilitate model building, and these maps were aligned precisely
to the 3.6 Å overall map using the CCP4 suite (85). A simplified diagram of the model
building procedure is presented in Fig. S18. Proteins with a characteristic shape, such as
the Sm ring and WD40 repeats, were first identified, and the atomic coordinates of the
corresponding homologues were docked into the density. This effort led to the
identification of Sm rings, Cwf1/Prp5, Cwf8/Prp19, and Cwf17. Large proteins with
available crystal structures for their homologues, such as Spp42 (Prp8 in S. cerevisiae),
Cwf10 (Snu114 in S. cerevisiae) and Cwf11 (Aquarius in human), were recognized and
docked into the density. U5 snRNA was unambiguously located. Proteins with extended
architectures were located, including Cwf3/Syf1, Cwf4/Syf3, and Cwf8/Prp19. The two
extended stretches of super-helical shaped density were assigned to Syf1 and Syf3, which
are HAT repeat containing proteins. The N-terminal structure of Syf3 was identified near
the center region. About 150 amino acids in the central region of Syf1 were also assigned.
The backbone of a large portion of Syf1 and Syf3 was traced as a poly-Ala model.
Cwf2/Prp3, Cwf14, and the Myb domain of Cdc5 were also identified in the density map.
Two copies of the Prp19 U-box dimer were found as two bulges on the extended
tube-shaped density, confirming the tetrameric assembly of Prp19. By following the
connection from the U-box domains, we pinpointed the four coiled-coil helices of Prp19,
and based on results of mass spectrometric analysis of the cross-linked spliceosome
sample, we assigned one of the two remaining long helices to Cdc5. Prp45, Cwf19, Prp17,
and Cwf5 were identified in the nearby region. The U2 Sm ring, Msl1, and Lea1 in the
flexible region were identified with an 11-Å local map generated from a subgroup of
particles from 3D classification. Cyp1, Cwf15, and Cwf7 were the last proteins to be
identified during model building.
Our model building effort was facilitated by published structures and mass
spectrometric analysis of the cross-linked spliceosome sample. The identified proteins
were further confirmed by atomic modeling. Atomic models of proteins with homologous
structures were generated with CHAINSAW (86). The proteins and the
corresponding PDB accession codes are summarized in Table S2: 4I43 for Spp42, 4WZJ
for the Sm ring, 3J7P for Snu114, 3U1L for Cwf2, 4YVD for Prp5, 2OOE for Syf1,
2OOE for Syf3, 2MY1 for Cwf14, 2XL2 for Cwf17, 1GV2 for the Cdc5 Myb domain, 1A9N for
Lea1, 1A9N for Msl1, 2BAY for the Prp19 U-box, 3LRV for the Prp19 WD40 domain, 3PJ3 for
Cwf11, and 2X7K for Cyp1. These structures were docked into the density map with COOT (86)
and fitted into the density with CHIMERA. The models were manually adjusted and built in
COOT. The chemical properties of proteins and amino acids were considered to facilitate
initial model building. Sequence assignment was guided mainly by bulky residues such
as Phe, Tyr, Trp, and Arg. Unique sequence patterns were exploited to validate
residue assignment.
The EM density map clearly shows the presence of several pieces of RNA. To
avoid potential mis-assignment, we initially focused on identifying the protein
components and building their atomic models. The U5 snRNA, which is intertwined with
the N-terminal portion of Spp42, was identified first. The density for the U2 and U6
snRNAs became unambiguous after most of the protein components had been assigned.
Assignment of the U2 and U6 snRNAs was facilitated by the location of proteins known to
interact with portions of these two RNA molecules. RNA sequence assignment was greatly
aided by reported secondary structures, published base-pairing specifics, the relative
sizes of purine and pyrimidine bases, and known RNA-binding partners.
The strong stacking interactions between adjacent base pairs of an RNA duplex
tend to blur the boundaries of the EM density between the bases. Applying a soft mask
around the Spp42 region during the RELION auto-refine procedure produced a local
resolution of 2.9-3.3 Å for the catalytic center, as calculated by ResMap. To assist
assignment of specific RNA bases, we skipped the automatic FSC weighting in the RELION
post-processing procedure, applied a 3 Å low-pass filter, and sharpened the map with a
negative B-factor of about -50 Å². This strategy yielded a local map with
distinguishable features between purine and pyrimidine bases in select regions.
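This post-processing step combines two operations on the Fourier coefficients: multiplication by exp(-B s²/4), where s = 1/d is the spatial frequency in Å⁻¹ (so a negative B boosts high-resolution terms), and a low-pass filter at 3 Å. The NumPy sketch below applies both to a cubic map. It uses a hard cutoff rather than the soft-edged filter RELION applies, and the function name and parameters are hypothetical.

```python
import numpy as np


def sharpen_and_lowpass(volume: np.ndarray, voxel_size: float,
                        b_factor: float, lowpass_a: float) -> np.ndarray:
    """Scale Fourier coefficients by exp(-B s^2 / 4) and zero all terms beyond
    1/lowpass_a (hard cutoff). A negative b_factor sharpens the map."""
    n = volume.shape[0]
    freq = np.fft.fftfreq(n, d=voxel_size)             # cycles per angstrom
    kz, ky, kx = np.meshgrid(freq, freq, freq, indexing="ij")
    s2 = kz**2 + ky**2 + kx**2                         # |s|^2 in 1/A^2
    ft = np.fft.fftn(volume)
    ft *= np.exp(-b_factor * s2 / 4.0)                 # B-factor weighting
    ft[s2 > (1.0 / lowpass_a) ** 2] = 0.0              # low-pass filter
    return np.fft.ifftn(ft).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_map = rng.normal(size=(48, 48, 48))          # stand-in for a local map
    out = sharpen_and_lowpass(local_map, voxel_size=1.32, b_factor=-50.0, lowpass_a=3.0)
    print(out.shape)
```

Skipping FSC weighting, as described above, means the amplitudes are not down-weighted by the FSC curve before this step, so noise near the cutoff is amplified too; the low-pass filter limits how far that goes.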
After assignment of the U2 and U6 snRNAs, the intron lariat RNA was found, with a
characteristic T-shaped structure between the branch point sequence (BPS) and the
5’-splice site (5’SS). The EM density for the intron lariat is weak. Assignment of the
specific RNA sequences in the intron lariat was guided by complementary sequences in
the U2 and U6 snRNAs. Two magnesium ions were located next to the phosphate group of
the uridine nucleotide at position 68 of U6 snRNA. The RNA sequences were manually
built in COOT.
Initial structure refinement of individual proteins was carried out in PHENIX in
real space (87) with secondary structure and geometry restraints to prevent overfitting.
The best local map for each protein was used during real-space refinement. The final
overall model was refined against the overall 3.6 Å map using REFMAC in reciprocal
space with stereochemical and homology restraints (88). Overfitting of the overall model
was monitored by refining the model against one of the two independent half maps from
the gold-standard refinement and testing the refined model against the other half map (89)
(Fig. S9D).
Fig. S1 Purification and characterization of the spliceosomal complex from
Schizosaccharomyces pombe (S. pombe). (A) A schematic diagram of the purification
protocol. (B) The purified complex was subjected to gel filtration analysis, which
revealed a single peak. (C) The peak fractions from gel filtration were visualized on
urea-PAGE with SYBR® Gold staining for RNA detection. (D) The same peak fractions
from gel filtration were visualized on SDS-PAGE with Coomassie blue staining for protein
detection. (E) The purified yeast spliceosome sample was subjected to mass spectrometric
(MS) analysis. Shown here is a list of core components of the U2 snRNP, U5 snRNP, NTC,
and NTC-related proteins. Two independently prepared samples, named batch 1 and batch 2,
were subjected to such exhaustive MS analysis, revealing very similar results and
confirming the consistency of our purification method. The peptide-spectrum match (PSM)
value represents the relative abundance of the target protein, with a higher PSM value
indicating a higher abundance. 34 of the 39 listed core protein components have been
identified in our final structure (indicated by check marks).
Fig. S2 Results of mass spectrometric (MS) analysis of the purified spliceosomal
complex from S. pombe. Two batches of sample were analyzed. Both samples were
used for the cryo-EM structure determination. (A) All proteins with PSM value higher
than 30 for at least one batch of sample are shown in the order of decreasing abundance.
There are 124 proteins altogether. About half of these proteins are spliceosomal
components. The top 15 entries with high PSM values are all spliceosomal proteins:
Spp42, Cwf17, Cwf8/Prp19, Cdc5, Prp17, Prp45, Cwf1/Prp5, Cwf10, Cwf11, Cwf2/Prp3,
Cwf15, Cwf5/Ecm2, Cwf3/Syf1, Cwf4/Syf3, and Brr2. As previously noted, most of the
contaminating proteins are ribosomal components. (B) Spliceosomal proteins with low
abundance in the purified sample. These proteins have PSM values lower than 30 and are
likely present in the sample with low abundance.
Fig. S3 Results of the RT-PCR reactions identify the purified yeast spliceosome as a
mixture of different spliceosomal complexes. (A) Schematic diagrams of the RT-PCR
reactions. Five pairs of RT-PCR primers were specifically designed for the cut6 gene.
Four of the five reactions each give rise to a unique PCR product. These four products
are: 5’-exon (product A), a fusion between the 5’-exon and 5’-end sequences of the intron
(product B), intron lariat (product D), and a fusion of the 3’-exon and 3’-end sequences of
the intron (product E). Product C (using a 5’-primer in the 5’-exon and a 3’-primer in the
3’-exon) has two forms: a long fragment representing the pre-splicing state and a short
fragment representing the post-splicing state. (B) The correspondence between the
spliceosomal complexes and the RT-PCR products. For example, the spliceosomal C
complex is predicted to yield RT-PCR products A, D, and E, but not B or C. (C) The
predicted lengths of the five RT-PCR products for the cut6 gene. (D) Results of the
RT-PCR reactions. The left panel shows results for the cryo-EM sample. The right panel
shows RT-PCR results for three samples that were subjected to extensive washes at
different ionic strengths (150, 500, and 800 mM NaCl). The clear presence of the intron
lariat suggests the presence of the C, P, or ILS complex. The two bands for RT-PCR
product C suggest the presence of both pre-splicing and post-splicing spliceosomal
complexes.
Fig. S4 Results of mass spectrometric (MS) analysis of the cross-linked spliceosomal
complex from S. pombe. The purified spliceosome was cross-linked with disuccinimidyl
suberate and analyzed by MS. This analysis identified 78 pairs of inter-molecular
interactions among the spliceosomal proteins. These data facilitated structural
identification of the spliceosomal components in the EM density.
Fig. S5 Preliminary electron microscopic (EM) analysis of the yeast spliceosome.
(A) A representative EM micrograph of the yeast spliceosome sample stained by uranyl
acetate. Scale bar, 100 nm. (B) Three-dimensional reconstruction of the yeast
spliceosome at 40 Å resolution. This structure was used as the initial model for auto-
refinement in the cryo-EM structure determination.
Fig. S6 A flow chart for the cryo-EM data processing of the yeast spliceosome.
See Materials and Methods for details.
Fig. S7 Procedures for the cryo-EM image processing and masking strategy. (A)
Procedures for particle selection. After semi-autopicking, particle sorting and
reference-free 2D classification were performed to remove ice and contaminants. Then,
auto-refinement was performed with the negative-stain-derived map as the initial model.
Based on the STAR file of refined particle parameters (data.star), all particles were mapped
back to their original images with their refined centers, and a new particle coordinate file
was generated for each image. Finally, one round of manual particle picking was
performed with the new coordinate files to discard remaining bad particles and pick
previously unpicked particles.
(B) To deal with the flexible nature of the spliceosome, local masks for different parts of
this complex were applied, which invariably led to a better local map. Masks of reduced
size were further applied to three local areas, resulting in improved map quality in
these peripheral regions.
Fig. S8 Application of local mask improves the resolution and local map quality.
The resolutions are calculated for the eight cases of local mask application described in
Fig. S7B. Much of the central region of the density map is resolved at resolutions better than
3.6 Å, ranging between 2.9 and 3.1 Å. The density was also improved for Arm II and the
Head region.
Fig. S9 Cryo-EM analysis of the yeast spliceosome. (A) Angular distribution for the
final reconstruction of the yeast spliceosome. Each cylinder represents one view and
the height of the cylinder is proportional to the number of particles for that
view. (B) FSC curves and the calculated resolutions after application of the local masks.
Five cases described in Fig. S7B are shown here. (C) FSC curves and the calculated
resolutions after application of reduced local masks. Three cases described in Fig. S7B
are shown here. (D) FSC curves of the final refined model versus the overall 3.6 Å map
it was refined against (black); of the model refined in the first of the two independent
maps used for the gold-standard FSC versus that same map (red); and of the model
refined in the first of the two independent maps versus the second independent map
(green). The small difference between the red and green curves indicates that the
refinement of the atomic coordinates did not suffer from severe overfitting.
Fig. S10 Representative EM density maps for the core regions of the yeast
spliceosome. Shown here are EM density maps for N-terminal regions of Spp42 (A, B),
the Cwf10-interacting loop of Spp42 (C), the RT Palm/Finger region of Spp42 (D),
Thumb/X of Spp42 (E), Linker of Spp42 (F), the endonuclease domain of Spp42 (G), RNaseH-like
domain of Spp42 (H), the overall region of Cwf19 (I), and two select secondary structural
elements of Cwf19 (J). The side chain features for many amino acids are clearly visible,
allowing assignment of specific amino acids.
Fig. S11 Representative EM density maps for select secondary structural elements
of Spp42 (Prp8 in S. cerevisiae) and Cwf10 (Snu114 in S. cerevisiae). (A)
Representative EM density maps for 11 α-helices and 2 β-strands of Spp42. Bulky
residues are labeled. (B) Representative EM density maps for 2 α-helices and 3 β-strands
of Cwf10. (C) The EM density clearly identifies the nucleotide bound to Cwf10 as GDP.
Fig. S12 The EM density maps for the spliceosomal proteins Cwf11, Cwf19, and the
Sm ring. (A) An overall EM density map is shown for Cwf11 in the left panel and local
EM density maps for select secondary structural elements are displayed in the other three
panels. (B) An overall EM density map is shown for the heptameric Sm ring in the top
left panel. The local EM density maps for all seven Sm proteins, along with their
recognized U5 snRNA bases, are shown in the other seven panels.
Fig. S13 The EM density maps for the core components of NTC. (A) An overall EM
density map is shown in two perpendicular views for the core components of NTC. These
components include Cdc5, Cwf7, and Prp19. (B) Representative EM density maps for the
long α-helix of Cdc5. Two bulky residues are labeled. (C) Representative EM density
maps for select secondary structural elements of Cwf7. (D) Representative EM density
maps for select secondary structural elements of the tetrameric Prp19 protein. The four
chain labels, A through D, are indicated. A few bulky residues are labeled.
Fig. S14 The EM density maps for the superhelical proteins Cwf3/Syf1 and
Cwf4/Syf3. (A) The overall EM density maps are shown for the two halves of
Cwf3/Syf1 in the top panels. Local EM density maps are shown for five representative α-
helices of Cwf3/Syf1 in the bottom panels. A number of bulky residues are labeled. (B)
The overall EM density maps are shown for the two halves of Cwf4/Syf3 in the top
panels. Local EM density maps are shown for five representative α-helices of Cwf4/Syf3
in the bottom panels. A number of bulky residues are labeled.
Fig. S15 Representative EM density maps for five spliceosomal proteins.
Representative EM density maps are shown for select structural elements of Cwf2/Prp3
(A), Cwf5/Ecm2 (B), Cwf14 (C), Cyp1 (D), and Cwf15 (E). A number of bulky amino
acids are labeled.
Fig. S16 Representative EM density maps for Prp17, Prp45, and U2 snRNP. (A)
Local EM density maps are shown for three representative secondary structural elements
of Prp17. Two bulky residues in each element are labeled. (B) Local EM density maps
are shown for five representative secondary structural elements of Prp45. Six bulky
residues are labeled. (C) The overall EM density maps are shown for U2 snRNP. Two
perpendicular views are displayed here.
Fig. S17 Representative EM density maps for Cwf17 and Cwf1/Prp5. (A) The
overall EM density maps are shown for Cwf17 in the top panels. Three views are shown
to highlight the interacting elements from SmB1 and Spp42. Local EM density maps are
shown for three structural elements in the bottom panels. A number of bulky residues are
labeled. (B) The overall EM density maps are shown for six mutually interacting
spliceosomal protein components in the top panels. Three perpendicular views are shown.
Local EM density maps are shown for three structural elements of Cwf1/Prp5 in the
bottom panels. Bulky residues are labeled.
Fig. S18 The model building procedure. First, the heptameric Sm ring and WD40 repeat-containing proteins, including Cwf1/Prp5, Cwf8/Prp19, and Cwf17, were identified, and the atomic coordinates of the corresponding homologues were docked into the density. Second, large proteins with available crystal structures of their homologues, namely Spp42 (Prp8 in S. cerevisiae), Cwf10 (Snu114 in S. cerevisiae), and Cwf11, were docked into the density, and U5 snRNA was unambiguously located. Third, proteins with extended architectures were located, including Cwf3/Syf1, Cwf4/Syf3, and Cwf8/Prp19. Cwf2/Prp3, Cwf14, and the Myb domain of Cdc5 were also identified in the density map. Fourth, based on results of mass spectrometric analysis of the cross-linked spliceosome sample, one of the two long helices was assigned to Cdc5, and Prp45, Cwf19, Prp17, and Cwf5 were identified in the nearby region. Fifth, an 11-Å map for the U2 snRNP was generated after 3D classification, leading to identification of Lea1, Msl1, a second Sm ring, and the 3’-end portion of U2 snRNA. Finally, U6 snRNA, U2 snRNA, and the intron lariat were found in the remaining EM density, and three additional proteins, Cwf7, Cwf15, and Cyp1, were located.
Fig. S19 Identification of a conserved catalytic cavity on Prp8. (A) The positively
charged amino acids in the catalytic cavity of Spp42 are invariant among S. cerevisiae, C.
elegans, M. musculus, and H. sapiens. A sequence alignment of the relevant regions from
Spp42 and Prp8 is shown here. Invariant residues are highlighted with a red background.
Positively charged amino acids that line the catalytic cavity of Spp42 are indicated by
arrows. (B) Identification of the catalytic cavity in Prp8 by electrostatic surface potential.
Spp42 is shown in the left panel as a reference.
Fig. S20 Prp45 interacts with at least 9 spliceosomal proteins and two snRNA
molecules. (A) Overall structure of Prp45 (colored red) bound to other protein and
snRNA components. The three boxed regions are shown in close-up views in panels B, C,
and D. (B) Residues from the C-terminal α-helix of Prp45 interact with Spp42 through
predominantly van der Waals contacts. The hydrophobic residues from Prp45 (Phe288,
Phe291, Leu295, and Val298) closely stack against hydrophobic residues from Spp42.
(C) Two anti-parallel β-strands at the N-terminal portion of Prp45 pair up with a β-strand
from Prp17 to form a β-sheet. This interface is dominated by hydrogen bonds. (D) Prp45
directly interacts with U2 snRNA. The side chain of Asn260 may donate a hydrogen
bond to the uracil of nucleotide 18 from U2 snRNA. Residues from the loop preceding
Asn260 interact with Spp42.
Fig. S21 Structures of six individual protein components in the yeast spliceosome.
(A) Structure of Cwf2/Prp3. (B) Structure of Cwf11. The homologous structure was
docked into the EM density, manually adjusted, and refined. (C) Structure of the β-
propeller protein Cwf17. (D) Structure of Cwf7. Note the extended appearance of Cwf7.
(E) Structure of the cyclophilin family peptidyl-prolyl cis-trans isomerase Cyp1. (F)
Structure of Prp17. Prp17, Prp45, Cdc5, and Cwf7 define a family of
intrinsically disordered proteins in isolation. These proteins adopt extended but defined
conformations upon binding to their interacting partners.
Fig. S22 Conformational flexibility of the yeast spliceosome. (A) Two conformations
of the yeast spliceosome. These two conformations exhibit large variations mainly in the
Head region and Arm I. (B) Superposition of these two conformations is displayed in
two perpendicular views.
Fig. S23 Three distinct zinc-binding motifs in the spliceosomal proteins. (A)
Cwf5/Ecm2 contains two Cys4-type zinc-binding motifs. The two zinc ions are
coordinated by Cys23/Cys26/Cys80/Cys83 and Cys44/Cys47/Cys70/Cys73, respectively.
(B) Cwf14 contains a three-zinc cluster. These three zinc ions are coordinated by 9
cysteine residues: Cys101, Cys102, Cys105, Cys117, Cys119, Cys134, Cys137, Cys139,
and Cys142. Each zinc ion is tetrahedrally coordinated, and three of the cysteine residues
each bridge two zinc ions. (C) Cwf19 contains a Cys2His2-type zinc-binding motif.
The zinc ion is bound by Cys412, Cys415, His455, and His502.
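The coordination count for the Cwf14 three-zinc cluster in panel B can be checked directly: three tetrahedral zinc ions need twelve cysteine contacts, and nine cysteines supply exactly twelve only if three of them bridge two zinc ions each.

```latex
\underbrace{3 \times 4}_{\text{Zn ligand sites}} = 12
  = \underbrace{6 \times 1}_{\text{terminal Cys}}
  + \underbrace{3 \times 2}_{\text{bridging Cys}},
\qquad 6 + 3 = 9~\text{Cys}
```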
Table S1. Cryo-EM data collection and refinement statistics.
Data collection
  EM equipment: FEI Titan Krios
  Voltage (kV): 300
  Detector: Gatan K2
  Pixel size (Å): 1.32
  Electron dose (e-/Å²): 38
  Defocus range (µm): 1.5-3.0
Reconstruction
  Software: RELION 1.3
  Number of particles used: 112,795
  Accuracy of rotation (°): 1.000
  Accuracy of translation (pixels): 0.888
  Final resolution (Å): 3.6
Refinement
  Software: Refmac 5.8
  Map sharpening B-factor (Å²): -86.9
  Average Fourier shell correlation: 0.766
  R-factor: 0.363
Model composition
  Protein residues: 10,574
  RNA nucleotides: 332
  Ions: 11
  GDP: 1
  ADP: 1
Validation
  R.m.s. deviations
    Bond lengths (Å): 0.006
    Bond angles (°): 1.002
  Ramachandran plot statistics (%)
    Preferred: 89.65
    Allowed: 6.73
    Outliers: 3.61
Table S2. Summary of model building for the yeast spliceosome.
Columns: Molecule, Length, Domain/Region, PDB code, Modeling, Resolution (Å)
U5 snRNP
U5 snRNA 120 7:111 - De novo building 2.9~3.6
Spp42/Prp8
2363
N-terminal Domain (47:825) RT finger/palm (826:1210)
Thumb/X (1211:1327) Linker (1328:1602)
Endonuclease (1603:1783) RNaseH-like (1784:2030)
Jab1/MPN
-
4I43 -
De novo building
Homology modeling
Not modelled
2.9~3.6
~4 -
Cwf10
984
N-terminal Domain (68:120) G domain (121:452) Domain II (453:595) Domain III (596:675) Domain IV (676:843) Domain V (844:919)
C-terminal Domain(920:971)
-
3J7P -
De novo building
Homology modeling
De novo building
2.9~3.8
Brr2 2176 - - Not modelled -
Cwf17 340 WD40 domain (42:340) 2XL2 Homology modeling 3.3~4.0
SmB1 SmD1 SmD2 SmD3 SmE1 SmF1 SmG1
147 117 115 97 84 78 77
Sm fold (2:87/94:118) Sm fold (1:82)
Sm fold (19:115) Sm fold (2:97) Sm fold (9:84) Sm fold (4:78) Sm fold (3:75)
4WZJ
Homology modeling
~3.3~4.0
U6 snRNP
U6 snRNA 99 nt 1:90 - De novo building 2.9~4.5
U2 snRNP
U2 RNA 186 nt 1:43 93:109/153:177
112:145
- - -
De novo building Homology modeling
Double helix
2.9~4.5 ~11 ~11
Lea1 237 LRR domain 1A9N Rigid docking
~11
Msl1 111 RRM domain 1A9N Rigid docking
SmB1 SmD1 SmD2 SmD3 SmE1 SmF1 SmG1
147 117 115 97 84 78 77
Sm fold Sm fold Sm fold Sm fold Sm fold Sm fold Sm fold
U5 Sm proteins
Rigid docking
NTC core
Cdc5 757 Myb Domain (8:109) 502:639
174:203/221:271/652:757
1GV2 - -
De novo building α-helices modelled
De novo building
~3.5 ~5
3.6~4.5
Prp19/Cwf8 (4 copy)
488 U-box (4 copy) Coil-Coil Region (4 copy)
WD40 (1 copy)
2BAY -
3LRV
Homology modeling De novo building
Rigid docking
3.6~6 3.6~5
~7
Cwf7 187 Coil-Coil (7:185) - De novo building 3.6~4.5
Cwf2/Prp3 388 50:235 3U1L Homology modeling 3.3~5
Cwf3/Syf1 790 ~100:495,655~730
498~653 - -
α-helices modelled De novo building
5~7 3.4~4
Cwf4/Syf3 674 41-290 ~290-660
- -
De novo building α-helices modelled
3.4~4 5~7
Cwf15 265 24-70; 223-265 - De novo building 3.0~4.0
Syf2 229 - - Not modelled -
Cwf12/Isy1 217 - - Not modelled -
NTC related
Prp5/Cwf1 473 N-terminal Domain WD40 domain (149:470)
- 4YVD
α-helices modelled Homology modeling
~6 ~3.4
Prp45 557 100:315 - De novo building 3.0-4.5
Cwf5/Ecm2 354 18:151
RRM domain (207:285) -
2YTC De novo building
Homology modeling 3.3-4.0
4~6
Cwf11 1284 - 3PJ3 Homology modeling 4.0~7
Cwf19 639 334:633 - De novo building 3.4~4
Cwf14 146 3-146 2MY1 Homology modeling ~3.4
Cwf16 270 - - Not modelled -
Cwf18 142 - - Not modelled -
Others
Prp17 558 N-terminal (14:161) WD40 domain
- -
De novo building Not modelled
3.3~4.5 -
Cyp1 155 Cyclosporine Like domain 2X7K Homology modeling 3.8~4.5