www.sciencemag.org/cgi/content/full/science.aac7629/DC1

Supplementary Materials for

Structure of a yeast spliceosome at 3.6-angstrom resolution

Chuangye Yan, Jing Hang, Ruixue Wan, Min Huang, Catherine C. L. Wong, Yigong Shi*

*Corresponding author. E-mail: [email protected]

Published 20 August 2015 on Science Express

DOI: 10.1126/science.aac7629

This PDF file includes:

Materials and Methods

Figs. S1 to S23

Tables S1 and S2

Materials and Methods

S. pombe strain

The S. pombe yeast strain used in this study carries a TAP tag (including a calmodulin

binding peptide and a protein A separated by a TEV protease cleavage site) at the C-

terminus of the Cdc5 protein. This design allows two tandem affinity purification steps.

To introduce this tag at the C-terminus of the Cdc5 chromosomal locus, the TAP tag and a

HphMX6 marker were amplified by PCR from the plasmid pF6Aa-CTAP-HphMX6. The

resulting PCR fragments were transformed into haploid S. pombe cells by the lithium acetate

method (74) and the transformants were selected on hygromycin B-YES sodium medium.

The two affinity tags were confirmed by PCR and Western blots.

Purification of the yeast spliceosome

Purification of the yeast spliceosome was modified from a published protocol (75). The S.

pombe culture was grown on 5xYES medium for 24-30 hours at 30 °C to an OD600 of ~22.

Cell pellets from a 4-liter culture were collected by centrifugation at 3,800 rpm. The cell

pellets (~92 mL) were resuspended in 30 mL buffer containing 10 mM HEPES-KOH, pH

7.9, 10 mM Tris-HCl, pH 8.0, 40 mM KCl, and 110 mM NaCl. The cell suspension, at a

final volume of about 120 mL, was dropped into liquid nitrogen to form yeast beads with

a diameter of 3-6 mm and pulverized to powder by SPEX 6870 Freezer Mill. Frozen cell

powder was resuspended in 30 mL buffer containing 10 mM HEPES-KOH, pH 7.9, 10

mM Tris-HCl, pH 8.0, 40 mM KCl, 110 mM NaCl, 0.2 mM EDTA, 20% glycerol, and

protease inhibitor cocktail (1 mM phenylmethylsulphonyl fluoride (PMSF), 2 mM

benzamidine, 2.6 μg/ml aprotinin, 1.4 μg/ml pepstatin and 10 μg/ml leupeptin). The cell

suspension was first centrifuged at 18,000g for 1 hour and the supernatant was

centrifuged again at 150,000g for 1 hour, yielding ~100 mL cell extract and a pellet of

cellular debris. The supernatant was incubated with IgG Sepharose-6 Fast Flow resin (GE

Healthcare) and cleaved by TEV protease at 18 °C for 1 hour in a buffer containing

10 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.1% NP40, 1 mM DTT, and 0.5 mM EDTA.

The eluate was supplemented with 20 µl 1 M CaCl2 and loaded onto calmodulin affinity

resin (Stratagene). Finally, the spliceosomal complex was eluted by the buffer CEB (10

mM Tris-HCl, pH 8.0, 75 mM NaCl, 1 mM Mg(OAc)2, 1 mM imidazole, 0.01% NP40, 1

mM TCEP, 2 mM EGTA), concentrated, and applied to a Superdex-200 column (GE

Healthcare). The peak fraction was used for sample preparation for EM imaging.

RNA preparation and RT-PCR

Total RNA was extracted from cells by TRIzol® reagent (Life Technologies) following

the suggested protocol. RNA from the purified spliceosomal complex was extracted by

phenol:chloroform:isopentanol at a volume ratio of 25:24:1 (Beijing Dingguo

Changsheng Biotech). First-strand cDNA synthesis was performed with the kit

SuperScript® III First-Strand Synthesis System for RT-PCR (Invitrogen™) according to

the manufacturer’s instructions. Briefly, 200 ng of RNA was used in each reaction for

reverse transcription. RNA was digested by RNaseH. The remaining DNA was subjected

to PCR amplification. Five specific reactions for the cut6 gene were performed. The five

primers used are: P1: 5’-GAAATGTTTACGAGAAGCGCTTG-3’; P2: 5’-

CCTTTACAGGAACCAAGTCC-3’; P3: 5’-CGAGTACAAATCAATCATGC-3’; P4:

5’-TATCTCGGGGCAAGAAATTAGGAAGG-3’; and P5: 5’-

CTAAGTTCAAATTGCTCAAGCGCG-3’. The primer pairs P1/P2, P1/P3, P1/P5, P3/P4,

and P4/P5 give rise to RT-PCR products A, B, C, D, and E, respectively. Four RT-PCR

products are unique, representing 5’-exon (product A), a fusion of 5’-exon and 5’-intron

(product B), intron lariat (product D), and a fusion of 3’-exon and 3’-intron (product E).

The product C has two forms: a long fragment representing the pre-splicing state and a

short fragment representing the post-splicing state. The RT-PCR products were resolved

on 1.5% (wt/vol) agarose gel.
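
The primer-to-product bookkeeping described above can be written down as a small lookup table. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the pairings are copied from the text rather than from any script used in the study.

```python
# Primer pairs and the RT-PCR product each pair yields, as listed in the text.
PRODUCT_BY_PRIMER_PAIR = {
    ("P1", "P2"): "A",
    ("P1", "P3"): "B",
    ("P1", "P5"): "C",
    ("P3", "P4"): "D",
    ("P4", "P5"): "E",
}

# What each product reports on, again taken from the text.
PRODUCT_MEANING = {
    "A": "5'-exon",
    "B": "fusion of 5'-exon and 5'-intron",
    "C": "pre-splicing (long) or post-splicing (short) fragment",
    "D": "intron lariat",
    "E": "fusion of 3'-exon and 3'-intron",
}


def describe_reaction(primer_1, primer_2):
    """Return the product letter and its meaning for one primer pair."""
    product = PRODUCT_BY_PRIMER_PAIR[(primer_1, primer_2)]
    return product, PRODUCT_MEANING[product]


if __name__ == "__main__":
    for pair in PRODUCT_BY_PRIMER_PAIR:
        product, meaning = describe_reaction(*pair)
        print(f"{pair[0]}/{pair[1]} -> product {product}: {meaning}")
```

Only product C is expected to appear in two sizes, which is why it is the one reaction that distinguishes pre-splicing from post-splicing RNA.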

DSS Cross-linking

The purified yeast spliceosome from gel filtration was cross-linked by disuccinimidyl

suberate (DSS) at a 1:0.7 (wt/wt) ratio. 100 mM ammonium bicarbonate was used to

terminate the reaction after incubation at room temperature for 1 hour. Cooled acetone

was applied to precipitate the protein components for mass spectrometric (MS) analysis.

The spliceosome was crosslinked solely for the purpose of MS analysis, not for EM

studies.

Mass spectrometric analysis

Total proteins were precipitated with 15% trichloroacetic acid (TCA) and lyophilized.

The pellet was dissolved in 8 M urea, 100 mM Tris, pH 8.5, followed by TCEP reduction,

iodoacetamide alkylation, and trypsin digestion. Trypsin (Promega) digestion was

quenched by 5% formic acid. Tryptic peptides were desalted with Pierce C18 spin

column (Thermo Fisher) and separated on a Proxeon EASY-nLC liquid chromatography

system by applying a step-wise gradient of 0-85% acetonitrile (ACN) in 0.1% formic acid.

Peptides eluted from the LC column were directly electrosprayed into the mass

spectrometer with a distal 2 kV spray voltage. Data-dependent tandem mass spectrometry

(MS/MS) analysis was performed with an Orbitrap Fusion mass spectrometer

(ThermoFisher, San Jose, CA). For cross-linked complexes, cooled acetone precipitation

was applied instead of TCA. The sample was digested and analyzed using the same method

described above, except that the analysis was performed on a Thermo Q-Exactive

instrument with a 60-minute gradient. Raw data were processed with pLink software.

EM data acquisition and processing

Uranyl acetate (1% w/v) was used for negative staining. Briefly, the copper grids

supported by a thin layer of carbon film (Zhongjingkeyi Technology Co. Ltd) were glow-

discharged. 4 µl of spliceosomal complex at a concentration of ~0.1 mg/ml were applied

onto the grid for 1 minute and stored at room temperature. Images were taken on an FEI

Tecnai Spirit Bio TWIN microscope operating at 120 kV for the generation of an initial

model. The same carbon-coated copper grids as those used for negative staining were

used for cryo-EM specimen preparation. Cryo-EM grids were prepared with Vitrobot

Mark IV (FEI Company) at 8 °C and 100% humidity. Aliquots of 4 µl of the

spliceosomal complex at a concentration of ~0.3 mg/mL were applied to glow-discharged

grids, blotted for 2.5 seconds and plunged into liquid ethane cooled by liquid nitrogen.

Images were taken by an FEI Titan Krios electron microscope operating at 300 kV with a

nominal magnification of 22,500x. Images were recorded by a Gatan K2 Summit detector

(Gatan Company) with the super-resolution mode, and binned to a pixel size of 1.32 Å.

Defocus values varied from 1.5 to 3.0 μm. Each image was dose-fractionated to 32

frames with a dose rate of ~8.2 counts/sec/physical-pixel (~4.7 e-/sec/Å2), total exposure

time of 8 seconds, and 0.25 second per frame. UCSFImage4 was used for all data

collection (developed by Xueming Li).
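
The exposure figures quoted above are mutually consistent, which a few lines of arithmetic make explicit. This is a minimal sketch under two assumptions that the text does not state outright: one detector count corresponds to one electron, and the physical pixel corresponds to 1.32 Å at the specimen. The function and variable names are hypothetical.

```python
PIXEL_SIZE_A = 1.32          # Å per physical pixel (assumed equal to the binned pixel size)
COUNT_RATE = 8.2             # counts per second per physical pixel
EXPOSURE_S = 8.0             # total exposure time in seconds
FRAME_TIME_S = 0.25          # exposure time per frame in seconds


def dose_summary(count_rate, pixel_size, exposure, frame_time):
    """Convert a per-pixel count rate into per-area dose figures."""
    rate_per_area = count_rate / pixel_size ** 2   # e-/sec/Å2, assuming 1 count = 1 e-
    n_frames = round(exposure / frame_time)
    total_dose = rate_per_area * exposure          # e-/Å2 over the whole exposure
    return {
        "rate_per_area": rate_per_area,
        "n_frames": n_frames,
        "total_dose": total_dose,
        "dose_per_frame": total_dose / n_frames,
    }


if __name__ == "__main__":
    for key, value in dose_summary(COUNT_RATE, PIXEL_SIZE_A, EXPOSURE_S, FRAME_TIME_S).items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the rate rounds to 4.7 e-/sec/Å2, the exposure splits into 32 frames, and the accumulated dose rounds to the 38 e-/Å2 listed in Table S1.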

Image processing

A crude map was generated based on the 60 micrographs of the negative-stained

spliceosomal complex and served as an initial model for cryo-EM 3D refinement. Particle

picking was performed with EMAN (76) subroutine e2boxer.py in an interactive boxing

mode, yielding 12,558 particles. Reference-free classification was performed with

e2refine2d.py, generating 60 classes. 42 classes were selected for the generation of an

initial model using e2initialmodel.py. A total of 2,246 cryo-EM micrographs were

collected. All images were aligned and summed using whole-image motion correction

(77). The defocus value of each image was determined by CTFFIND3 (78). In total,

224,450 particles were picked using the reference-based particle picking subroutine in

RELION (79). The templates for particle semi-autopicking were obtained from the 2D

class averages calculated from ~3,000 manually picked particles.

Particle sorting and reference-free 2D classification were performed to remove

ice and contaminants using particles binned to a pixel size of 2.64 Å, yielding 118,841

good particles. The auto-refine procedure was performed on the binned particles with the

negative-stain-derived crude map as the initial model, resulting in a 5.3 Å map. The

handedness of the map was determined by attempted docking of known crystal structures

onto the EM density map. Relying on the refined images STAR file (data.star) derived

from the auto-refinement, all particles were read back to their original images based on

their refined centers using our own script (available upon request), generating a total of

2,246 new coordinate files for all the images (.star files). One round of manual particle

picking was performed with these improved coordinate files. This strategy allowed

convenient removal of the vast majority of ice spots and contaminants, greatly reducing

the labor of manual picking. After manual picking, 9,470 bad particles were further

removed and 24,530 missing particles were picked, yielding 133,901 particles for further

processing.
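
The re-centering step and the particle counts in this paragraph can be sketched as follows. This is not the authors' script, which the text says is available upon request. It is a hypothetical illustration that assumes picked coordinates are stored in unbinned micrograph pixels, refined translations are stored in binned pixels, and the refined center equals the picked position minus the refined shift. That sign convention is an assumption and should be checked against the software documentation before use.

```python
# Particle counts reported in the text.
AFTER_2D_CLASSIFICATION = 118_841
REMOVED_MANUALLY = 9_470
ADDED_MANUALLY = 24_530

# Assumption: refinement used 2.64 Å pixels while micrographs have 1.32 Å pixels.
BINNING = 2


def particles_after_manual_curation(start, removed, added):
    """Particles left after discarding bad picks and adding missed ones."""
    return start - removed + added


def recentred_coordinate(coord, origin_shift, binning=BINNING):
    """Map one refined particle center back onto the unbinned micrograph.

    `coord` is the picked position in unbinned pixels and `origin_shift` the
    refined translation in binned pixels (both assumptions, see lead-in).
    """
    return coord - binning * origin_shift


if __name__ == "__main__":
    total = particles_after_manual_curation(
        AFTER_2D_CLASSIFICATION, REMOVED_MANUALLY, ADDED_MANUALLY
    )
    print("particles for further processing:", total)
    print("re-centered x:", recentred_coordinate(1000.0, 3.5))
```

The count check reproduces the 133,901 particles stated in the text.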

One round of 3D classification was performed with the new 5.3 Å map as

reference, and 112,795 particles were selected and produced a 3D reconstruction with an

average resolution of 3.9 Å using particles with a pixel size of 1.32 Å. This map shows

clear secondary structural elements and amino acid side chains in the core region. To deal

with the flexible nature of the spliceosomal complex, local masks for different parts of

this complex were applied to the 3.9 Å auto-refine procedure at the 10-iteration step.

Application of each local mask invariably led to a better local map than the overall

density map. The density for the target region was extracted from the overall map by

CHIMERA (80), and the mask was created by RELION (79).
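
One plausible reason for switching from binned to unbinned particles at this stage is the sampling limit: a map cannot be resolved beyond twice its pixel size. The sketch below only illustrates that relation with the pixel sizes given in the text. The function names are hypothetical, and the interpretation is the editor-neutral Nyquist argument rather than a statement made in the original.

```python
def binned_pixel_size(pixel_size_a, binning):
    """Pixel size in Å after binning by an integer factor."""
    return pixel_size_a * binning


def nyquist_limit(pixel_size_a):
    """Best resolution in Å that a given pixel size can represent."""
    return 2.0 * pixel_size_a


if __name__ == "__main__":
    for binning in (1, 2):
        pixel = binned_pixel_size(1.32, binning)
        print(f"pixel {pixel:.2f} Å -> Nyquist limit {nyquist_limit(pixel):.2f} Å")
```

With 2.64 Å pixels the limit is 5.28 Å, close to the 5.3 Å obtained from the binned particles, whereas 1.32 Å pixels allow resolutions down to 2.64 Å.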

After per-particle motion correction and radiation-damage weighting (known as

particle polishing) (81), these polished particles gave a reconstruction with an overall

resolution of 3.6 Å. Refinement for each local mask was performed continuously using

the polished particles instead of the original particles at the last iteration, leading to

improved maps for all cases. None of the efforts above was able to produce a reasonable

map for the U2 snRNP, which appears to be exceptionally flexible. Another two rounds

of 3D classification were performed. This effort identified 20,000 particles that

collectively yielded an 11-Å resolution map for this region after auto-refinement with a

small mask for U2 snRNP.

Reported resolutions are based on the gold-standard FSC 0.143 criterion, and

FSC curves were corrected for the effects of a soft mask on the FSC curve using high-

resolution noise substitution (82). Prior to visualization, all density maps were corrected

for the modulation transfer function (MTF) of the detector, and then sharpened by

applying a negative B-factor that was estimated using automated procedures (83). Local

resolution variations were estimated using ResMap (84).
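
For readers unfamiliar with the FSC 0.143 criterion, the following sketch shows the underlying calculation for a single resolution shell and how a resolution is read off a curve. It is a simplified illustration, not the procedure used by the refinement software: it omits masking and high-resolution noise substitution, takes the Fourier coefficients of each shell as given, and interpolates linearly between shells. All names and the example curve are hypothetical.

```python
import math


def shell_correlation(coeffs_1, coeffs_2):
    """Fourier shell correlation for one shell of complex coefficients."""
    cross = sum((a * b.conjugate()).real for a, b in zip(coeffs_1, coeffs_2))
    power_1 = sum(abs(a) ** 2 for a in coeffs_1)
    power_2 = sum(abs(b) ** 2 for b in coeffs_2)
    return cross / math.sqrt(power_1 * power_2)


def resolution_at_threshold(curve, threshold=0.143):
    """Resolution in Å where the FSC first drops below `threshold`.

    `curve` is a list of (spatial frequency in 1/Å, FSC) pairs in order of
    increasing frequency. Returns None if the curve never crosses.
    """
    for (s0, c0), (s1, c1) in zip(curve, curve[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation of the crossing point between two shells.
            s_cross = s0 + (c0 - threshold) * (s1 - s0) / (c0 - c1)
            return 1.0 / s_cross
    return None


if __name__ == "__main__":
    made_up_curve = [(0.10, 1.0), (0.20, 0.5), (0.30, 0.0)]  # illustrative values only
    print(resolution_at_threshold(made_up_curve))
```

Two identical half-maps give a correlation of 1 in every shell, so any drop toward 0.143 reflects disagreement between the independently refined halves.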

Model building and refinement

Due to a wide range of resolution limits for various regions of the spliceosome, we

combined de novo model building and homologous structure docking to generate an

atomic model. Local maps generated by different masking strategies described above

were used to facilitate the model building process, and these maps were translated exactly

to the 3.6 Å overall map using the CCP4 suite (85). A simplified diagram of the model

building procedure is presented in Fig. S18. Proteins with a characteristic shape, such as

the Sm ring and WD40 repeats, were first identified, and the atomic coordinates of the

corresponding homologues were docked into the density. This effort led to the

identification of Sm rings, Cwf1/Prp5, Cwf8/Prp19, and Cwf17. Large proteins with

available crystal structures for their homologues, such as Spp42 (Prp8 in S. cerevisiae),

Cwf10 (Snu114 in S. cerevisiae) and Cwf11 (Aquarius in human), were recognized and

docked into the density. U5 snRNA was unambiguously located. Proteins with extended

architectures were located, including Cwf3/Syf1, Cwf4/Syf3, and Cwf8/Prp19. The two

extended stretches of super-helical shaped density were assigned to Syf1 and Syf3, which

are HAT repeat containing proteins. The N-terminal structure of Syf3 was identified near

the center region. About 150 amino acids in the central region of Syf1 were also assigned.

The backbone of a large portion of Syf1 and Syf3 was traced as a poly-Ala model.

Cwf2/Prp3, Cwf14 and the Myb Domain of Cdc5 were also identified in the density map.

Two copies of the Prp19 U-box dimer were found as two bulges on the extended tube-

shaped density, confirming the tetrameric assembly of Prp19. Through the connection of

the U-box, we pinpointed the four coiled-coil helices of Prp19, and based on results of

mass spectroscopic analysis of the crosslinked spliceosome sample, we assigned one of

the two remaining long helices to Cdc5. Prp45, Cwf19, Prp17, and Cwf5 were identified

in the nearby region. The Sm ring for U2, Msl1, and Lea1 in the flexible region were

identified through an 11-Å local map generated from a subgroup of particles from 3D

classification. Cyp1, Cwf15 and Cwf7 were the last identified proteins in the model

building procedure.

Our model building effort was facilitated by published structures and mass

spectroscopic analysis of the crosslinked spliceosome sample. The identified proteins

were further confirmed by atomic modeling. The atomic model of a protein with a

homologue structure was generated by CHAINSAW (86). The proteins and the

corresponding PDB accession codes are summarized in Table S2: 4I43 for Spp42, 4WZJ

for the Sm ring, 3J7P for Snu114, 3U1L for Cwf2, 4YVD for Prp5, 2OOE for Syf1,

2OOE for Syf3, 2MY1 for Cwf14, 2XL2 for Cwf17, 1GV2 for Cdc5 Myb, 1A9N for

Lea1, 1A9N for Msl1, 2BAY for Prp19 Ubox, 3LRV for Prp19 WD40, 3PJ3 for Cwf11,

and 2X7K for Cyp1. These structures were docked into the density map by COOT (86),

and fitted into density by CHIMERA. The models were manually adjusted and built by

COOT. The chemical properties of proteins and amino acids were considered to facilitate

initial model building. Sequence assignment was guided mainly by bulky residues such

as Phe, Tyr, Trp and Arg. Unique patterns of sequences were exploited for validation of

residue assignment.

The EM density map clearly shows the presence of several pieces of RNA. To

avoid potential mis-assignment, we initially focused our effort on the identification of the

protein components and subsequent atomic model building. The U5 snRNA was first

identified, which is intertwined with the N-terminal portion of Spp42. The density maps

for U2 and U6 snRNAs became unambiguous after assignment for the majority of protein

components. The assignment of U2 and U6 snRNA was facilitated by the location of

proteins that were known to interact with portions of these two RNA molecules. The

RNA sequence assignment was greatly aided by reported secondary structures, published

base pairing specifics, relative sizes of the purine and pyrimidine bases, and known RNA

binding partners.

The strong stacking interactions between adjacent base pairs of the RNA duplex

tend to blur the boundaries of the EM density among the bases. The map after applying a

soft mask around the Spp42 region during auto-refine procedure in RELION produced a

local resolution of 2.9-3.3 Å for the catalytic center as calculated by ResMap. To assist

assignment of specific RNA bases, we skipped the automatic FSC-weighting in post-

processing procedure in RELION and applied a 3 Å low-pass filter and sharpened the

map by a B-factor of about -50 Å2. This strategy gave rise to a local map that exhibits

distinguishable features between purine and pyrimidine bases in select regions.
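
The effect of sharpening with a negative B-factor can be illustrated numerically. The sketch assumes the common convention in which Fourier amplitudes at resolution d are multiplied by exp(-B / (4 d^2)), so that a negative B boosts high-resolution terms. The function name is hypothetical, and the B value of -50 Å2 is the approximate figure quoted in the text.

```python
import math


def sharpening_weight(resolution_a, b_factor):
    """Amplitude multiplier at a given resolution (Å) for a B-factor (Å2)."""
    return math.exp(-b_factor / (4.0 * resolution_a ** 2))


if __name__ == "__main__":
    for d in (10.0, 5.0, 3.0):
        print(f"{d:4.1f} Å: amplitudes scaled by {sharpening_weight(d, -50.0):.2f}")
```

Under this convention the amplitudes at 3 Å are scaled up roughly fourfold relative to the unsharpened map, while those at 10 Å change only slightly, which is why base-level features become easier to distinguish.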

After assignment of U2 and U6 snRNAs, the RNA intron lariat was found with a

characteristic T-shaped structure between the branch point sequence (BPS) and the 5’-

splicing site (5’SS). The EM density for the intron lariat is weak. Assignment of the

specific RNA sequences in the intron lariat was guided by complementary sequences in

U2 and U6 snRNAs. Two magnesium ions were located next to the phosphate group of

the uridine nucleotide at position 68 of U6 snRNA. The RNA sequences were manually

built using COOT.

Initial structure refinement of individual proteins was carried out by PHENIX in

real space (87) with secondary structure and geometry restraints to prevent over-fitting.

The best map for each individual protein was used during real space refinement. The final

overall model was refined against the overall 3.6 Å map using REFMAC in reciprocal

space with stereo-chemical and homology restraints (88). Overfitting of the overall model

was monitored by refining the model in one of the two independent maps from the gold-

standard refinement approach, and testing the refined model against the other map (89)

(Fig. S9D).

Fig. S1 Purification and characterization of the spliceosomal complex from

Schizosaccharomyces pombe (S. pombe). (A) A schematic diagram of the purification

protocol. (B) The purified complex was subjected to gel filtration analysis, which

revealed a single peak. (C) The peak fractions from gel filtration were visualized on

Urea-PAGE by SYBR® gold staining for RNA detection. (D) The same peak fractions

from gel filtration were visualized on SDS-PAGE by Coomassie blue staining for protein

detection. (E) The purified yeast spliceosome sample was subjected to mass spectroscopic

(MS) analysis. Shown here is a list of core components of the U2 snRNP, the U5 snRNP, the NTC,

and NTC-related proteins. Two independently prepared samples, named batch 1 and batch 2, were

subjected to exhaustive MS analysis, which gave very similar results and confirmed

the consistency of our purification method. The peptide-spectrum match (PSM) value

represents the relative abundance of the target protein, with a higher PSM value

indicating a higher abundance. 34 of the 39 listed core protein components have been

identified in our final structure (indicated by the check signs).

Fig. S2 Results of mass spectrometric (MS) analysis of the purified spliceosomal

complex from S. pombe. Two batches of sample were analyzed. Both samples were

used for the cryo-EM structure determination. (A) All proteins with PSM value higher

than 30 for at least one batch of sample are shown in the order of decreasing abundance.

There are 124 proteins altogether. About half of these proteins are spliceosomal

components. The top 15 entries with high PSM values are all spliceosomal proteins:

Spp42, Cwf17, Cwf8/Prp19, Cdc5, Prp17, Prp45, Cwf1/Prp5, Cwf10, Cwf11, Cwf2/Prp3,

Cwf15, Cwf5/Ecm2, Cwf3/Syf1, Cwf4/Syf3, and Brr2. As previously noted, most of the

contaminating proteins are ribosomal components. (B) Spliceosomal proteins with low

abundance in the purified sample. These proteins have PSM values lower than 30 and are

likely present in the sample with low abundance.
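
The split between panels A and B follows a simple rule: a protein belongs in panel A if its PSM value exceeds 30 in at least one batch. The sketch below applies that rule to made-up numbers. The protein names and PSM values are hypothetical placeholders and are not taken from the figure, and ranking by the larger of the two batch values is an assumption about how "decreasing abundance" was ordered.

```python
PSM_CUTOFF = 30

# Hypothetical (batch 1, batch 2) PSM values; NOT data from Fig. S2.
example_psm = {
    "protein_high_in_both": (412, 388),
    "protein_high_in_one": (12, 45),
    "protein_low_in_both": (8, 21),
}


def split_by_abundance(psm_by_protein, cutoff=PSM_CUTOFF):
    """Return (high-abundance names ranked by PSM, low-abundance names)."""
    high = [name for name, psm in psm_by_protein.items() if max(psm) > cutoff]
    low = [name for name in psm_by_protein if name not in high]
    # Assumed ranking key: the larger PSM value across the two batches.
    high.sort(key=lambda name: max(psm_by_protein[name]), reverse=True)
    return high, sorted(low)


if __name__ == "__main__":
    panel_a, panel_b = split_by_abundance(example_psm)
    print("panel A:", panel_a)
    print("panel B:", panel_b)
```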

Fig. S3 Results of the RT-PCR reactions identify the purified yeast spliceosome as a

mixture of different spliceosomal complexes. (A) Schematic diagrams of the RT-PCR

reactions. Five pairs of RT-PCR primers were specifically designed for the cut6 gene.

Four of the reactions each give rise to a unique PCR product. These four products

are: 5’-exon (product A), a fusion between 5’-exon and 5’-end sequences of the intron

(product B), intron lariat (product D), and a fusion of 3’-exon and 3’-end sequences of the

intron (product E). The product C (using a 5’-primer in 5’-exon and a 3’-primer in 3’-

exon) has two forms: a long fragment representing the pre-splicing state and a short

fragment representing the post-splicing state. (B) The correspondence between the

spliceosomal complexes and the RT-PCR products. For example, the spliceosomal C

complex is predicted to have RT-PCR products A, D, and E, but not B or C. (C) The

predicted lengths of the five RT-PCR products for the cut6 gene. (D) Results of the RT-

PCR reactions. The left panel shows the results on the cryo-EM sample. The right panel

shows results of RT-PCR on three samples that were subjected to extensive washes

at different ionic strengths (150, 500, and 800 mM NaCl). The clear presence of the intron

lariat suggests the presence of the C, P, or ILS complex. The two bands for the RT-PCR

product C suggest the presence of both pre-splicing and post-splicing spliceosomal

complexes.

Fig. S4 Results of mass spectrometric (MS) analysis of the crosslinked spliceosomal

complex from S. pombe. The purified spliceosome was crosslinked by disuccinimidyl

suberate and analyzed by MS. This analysis identified 78 pairs of inter-molecular

interactions among the spliceosomal proteins. These data facilitate structural identification

of the spliceosomal components in the EM density.

Fig. S5 Preliminary electron microscopic (EM) analysis of the yeast spliceosome.

(A) A representative EM micrograph of the yeast spliceosome sample stained by uranyl

acetate. Scale bar, 100 nm. (B) Three-dimensional reconstruction of the yeast

spliceosome at 40 Å resolution. This structure was used as the initial model for auto-

refinement in the cryo-EM structure determination.

Fig. S6 A flow-chart for the cryo-EM data processing of the yeast spliceosome.

Please refer to the Materials and Methods for details.

Fig. S7 Procedures for the cryo-EM image processing and masking strategy. (A)

Procedures for particle selection. After semi-autopicking, particle sorting and reference-

free 2D classification were performed to remove ice and contaminants. Then, auto-

refinement was performed with the negative-stain-derived map as the initial model. Based

on the refined images STAR file (data.star), all particles were read back to their original

images with their refined centers, and a new particle coordinate file was generated for

each image. Finally, one round of manual particle picking was performed with the new

coordinate files to further discard bad particles and pick previously unpicked particles.

(B) To deal with the flexible nature of the spliceosome, local masks for different parts of

this complex were applied, which invariably led to a better local map. Reduced sizes of

the mask were further applied to three local areas, resulting in improved map quality in

these peripheral regions.

Fig. S8 Application of local mask improves the resolution and local map quality.

The resolutions are calculated for the eight cases of local mask application described in

Fig. S7B. Much of the central region of the density map is resolved at resolutions better than

3.6 Å, ranging between 2.9 and 3.1 Å. The density was also improved for Arm II and the

Head region.

Fig. S9 Cryo-EM analysis of the yeast spliceosome. (A) Angular distribution for the

final reconstruction of the yeast spliceosome. Each cylinder represents one view and

the height of the cylinder is proportional to the number of particles for that

view. (B) FSC curves and the calculated resolutions after application of the local masks.

Five cases described in Fig. S7B are shown here. (C) FSC curves and the calculated

resolutions after application of reduced local masks. Three cases described in Fig. S7B

are shown here. (D) FSC curves of the final refined model versus the overall 3.6 Å map

it was refined against (black); of the model refined in the first of the two independent

maps used for the gold-standard FSC versus that same map (red); and of the model

refined in the first of the two independent maps versus the second independent map

(green). The small difference between the red and green curves indicates that the

refinement of the atomic coordinates did not suffer from severe overfitting.

Fig. S10 Representative EM density maps for the core regions of the yeast

spliceosome. Shown here are EM density maps for N-terminal regions of Spp42 (A, B),

the Cwf10-interacting loop of Spp42 (C), the RT Palm/Finger region of Spp42 (D),

Thumb/X of Spp42 (E), Linker of Spp42 (F), endonuclease domain (G), RNaseH-like

domain of Spp42 (H), the overall region of Cwf19 (I), and two select secondary structural

elements of Cwf19 (J). The side chain features for many amino acids are clearly visible,

allowing assignment of specific amino acids.

Fig. S11 Representative EM density maps for select secondary structural elements

of Spp42 (Prp8 in S. cerevisiae) and Cwf10 (Snu114 in S. cerevisiae). (A)

Representative EM density maps for 11 α-helices and 2 β-strands of Spp42. Bulky

residues are labeled. (B) Representative EM density maps for 2 α-helices and 3 β-strands

of Cwf10. (C) The EM density clearly identifies the bound nucleotide as GDP in

Cwf10.

Fig. S12 The EM density maps for the spliceosomal proteins Cwf11, Cwf19, and the

Sm ring. (A) An overall EM density map is shown for Cwf11 in the left panel and local

EM density maps for select secondary structural elements are displayed in the other three

panels. (B) An overall EM density map is shown for the heptameric Sm ring in the top

left panel. The local EM density maps for all seven Sm proteins, along with their

recognized U5 snRNA bases, are shown in the other seven panels.

Fig. S13 The EM density maps for the core components of NTC. (A) An overall EM

density map is shown in two perpendicular views for the core components of NTC. These

components include Cdc5, Cwf7, and Prp19. (B) Representative EM density maps for the

long α-helix of Cdc5. Two bulky residues are labeled. (C) Representative EM density

maps for select secondary structural elements of Cwf7. (D) Representative EM density

maps for select secondary structural elements of the tetrameric Prp19 protein. The four

chain labels, A through D, are indicated. A few bulky residues are labeled.

Fig. S14 The EM density maps for the superhelical proteins Cwf3/Syf1 and

Cwf4/Syf3. (A) The overall EM density maps are shown for the two halves of

Cwf3/Syf1 in the top panels. Local EM density maps are shown for five representative α-

helices of Cwf3/Syf1 in the bottom panels. A number of bulky residues are labeled. (B)

The overall EM density maps are shown for the two halves of Cwf4/Syf3 in the top

panels. Local EM density maps are shown for five representative α-helices of Cwf4/Syf3

in the bottom panels. A number of bulky residues are labeled.

Fig. S15 Representative EM density maps for five spliceosomal proteins.

Representative EM density maps are shown for select structural elements of Cwf2/Prp3

(A), Cwf5/Ecm2 (B), Cwf14 (C), Cyp1 (D), and Cwf15 (E). A number of bulky amino

acids are labeled.

Fig. S16 Representative EM density maps for Prp17, Prp45, and U2 snRNP. (A)

Local EM density maps are shown for three representative secondary structural elements

of Prp17. Two bulky residues in each element are labeled. (B) Local EM density maps

are shown for five representative secondary structural elements of Prp45. Six bulky

residues are labeled. (C) The overall EM density maps are shown for U2 snRNP. Two

perpendicular views are displayed here.

Fig. S17 Representative EM density maps for Cwf17 and Cwf1/Prp5. (A) The

overall EM density maps are shown for Cwf17 in the top panels. Three views are shown

to highlight the interacting elements from SmB1 and Spp42. Local EM density maps are

shown for three structural elements in the bottom panels. A number of bulky residues are

labeled. (B) The overall EM density maps are shown for six mutually interacting

spliceosomal protein components in the top panels. Three perpendicular views are shown.

Local EM density maps are shown for three structural elements of Cwf1/Prp5 in the

bottom panels. Bulky residues are labeled.

Fig. S18 Procedure for model building. First, the heptameric Sm ring and WD40 repeat-containing proteins, including Cwf1/Prp5, Cwf8/Prp19, and Cwf17, were identified, and the atomic coordinates of the corresponding homologues were docked into the density. Second, large proteins with available crystal structures for their homologues, namely Spp42 (Prp8 in S. cerevisiae), Cwf10 (Snu114 in S. cerevisiae) and Cwf11, were docked into the density. U5 snRNA was unambiguously located. Third, proteins with extended architectures were located, including Cwf3/Syf1, Cwf4/Syf3, and Cwf8/Prp19. Cwf2/Prp3, Cwf14 and the Myb domain of Cdc5 were also identified in the density map. Fourth, based on results of mass spectroscopic analysis of the crosslinked spliceosome sample, we assigned one of the two long helices to Cdc5. Prp45, Cwf19, Prp17, and Cwf5 were identified in the nearby region. Fifth, an 11 Å map for U2 snRNP was generated after 3D classification, leading to identification of Lea1, Msl1, a second Sm ring, and the 3’-end portion of U2 snRNA. Finally, U6 snRNA, U2 snRNA, and the intron lariat were found in the remaining EM density, and three additional proteins, Cwf7, Cwf15, and Cyp1, were located.

Fig. S19 Identification of a conserved catalytic cavity on Prp8. (A) The positively

charged amino acids in the catalytic cavity of Spp42 are invariant among S. cerevisiae, C.

elegans, M. musculus, and H. sapiens. Sequence alignment of the relevant regions from

Spp42 and Prp8 is shown here. Invariant residues are highlighted with a red background.

Positively charged amino acids that line the catalytic cavity of Spp42 are indicated by

arrows. (B) Identification of the catalytic cavity in Prp8 by electrostatic surface potential.

Spp42 is shown in the left panel as a reference.

Fig. S20 Prp45 interacts with at least 9 spliceosomal proteins and two snRNA

molecules. (A) Overall structure of Prp45 (colored red) bound to other protein and

snRNA components. The three boxed regions are shown in close-up views in panels B, C,

and D. (B) Residues from the C-terminal α-helix of Prp45 interact with Spp42 through

predominantly van der Waals contacts. The hydrophobic residues from Prp45 (Phe288,

Phe291, Leu295, and Val298) closely stack against hydrophobic residues from Spp42.

(C) Two anti-parallel β-strands at the N-terminal portion of Prp45 pair up with a β-strand

from Prp17 to form a β-sheet. This interface is dominated by hydrogen bonds. (D) Prp45

directly interacts with U2 snRNA. The side chain of Asn260 may donate a hydrogen

bond to the uracil of nucleotide 18 from U2 snRNA. Residues from the loop preceding

Asn260 interact with Spp42.

Fig. S21 Structures of six individual protein components in the yeast spliceosome.

(A) Structure of Cwf2/Prp3. (B) Structure of Cwf11. The homologous structure was

docked into the EM density, manually adjusted, and refined. (C) Structure of the β-

propeller protein Cwf17. (D) Structure of Cwf7. Note the extended appearance of Cwf7.

(E) Structure of the cyclophilin family peptidyl-prolyl cis-trans isomerase Cyp1. (F)

Structure of Prp17. Prp17, together with Prp45, Cdc5, and Cwf7, defines a family of

proteins that are intrinsically disordered in isolation. These proteins adopt extended but defined

conformations upon binding to their interacting partners.

Fig. S22 Conformational flexibility of the yeast spliceosome. (A) Two conformations

of the yeast spliceosome. These two conformations exhibit large variations mainly in the

Head region and Arm I. (B) Superposition of these two conformations is displayed in

two perpendicular views.

Fig. S23 Three distinct zinc-binding motifs in the spliceosomal proteins. (A)

Cwf5/Ecm2 contains two Cys4-type zinc-binding motifs. The two zinc ions are

coordinated by Cys23/Cys26/Cys80/Cys83 and Cys44/Cys47/Cys70/Cys73, respectively.

(B) Cwf14 contains a three-zinc cluster. These three zinc ions are coordinated by 9

cysteine residues: Cys101, Cys102, Cys105, Cys117, Cys119, Cys134, Cys137, Cys139,

and Cys142. Each zinc ion is tetrahedrally coordinated, and three of the cysteine residues

each coordinate two zinc ions. (C) Cwf19 contains a Cys2His2-type zinc-binding motif.

The zinc ion is bound by Cys412, Cys415, His455, and His502.
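
The coordination count stated for the three-zinc cluster in panel B can be checked with simple bookkeeping. The following Python sketch is illustrative only: the variable names are hypothetical, and it assumes that three of the nine cysteines supply exactly two Zn-S bonds each while the remaining six supply one each. It does not attempt to say which cysteines are the shared ones.

```python
# Bookkeeping for the Cwf14 three-zinc cluster described in Fig. S23B.
# All names are illustrative; the numbers come from the figure legend.
N_ZINC = 3             # zinc ions in the cluster
LIGANDS_PER_ZINC = 4   # tetrahedral coordination
N_CYS = 9              # coordinating cysteine residues
N_SHARED_CYS = 3       # assumption: these contribute two Zn-S bonds each

# Bonds needed to give every zinc ion four ligands.
bonds_required = N_ZINC * LIGANDS_PER_ZINC

# Bonds available: six cysteines give one bond, three give two.
bonds_supplied = (N_CYS - N_SHARED_CYS) * 1 + N_SHARED_CYS * 2

print(f"required: {bonds_required}, supplied: {bonds_supplied}")
```

Under these assumptions the two counts agree, which is consistent with nine cysteines satisfying tetrahedral coordination of three zinc ions.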

Table S1. Cryo-EM data collection and refinement statistics.

Data collection
EM equipment | FEI Titan Krios
Voltage (kV) | 300
Detector | Gatan K2
Pixel size (Å) | 1.32
Electron dose (e⁻/Å²) | 38
Defocus range (µm) | 1.5~3.0

Reconstruction
Software | RELION 1.3
Number of particles used | 112,795
Accuracy of rotation (°) | 1.000
Accuracy of translation (pixels) | 0.888
Final resolution (Å) | 3.6

Refinement
Software | Refmac 5.8
Map sharpening B-factor (Å²) | -86.9
Average Fourier shell correlation | 0.766
R-factor | 0.363

Model composition
Protein residues | 10,574
RNA nucleotides | 332
Ion | 11
GDP | 1
ADP | 1

Validation
R.m.s. deviations, bond lengths (Å) | 0.006
R.m.s. deviations, bond angles (°) | 1.002
Ramachandran plot statistics (%), preferred | 89.65
Ramachandran plot statistics (%), allowed | 6.73
Ramachandran plot statistics (%), outlier | 3.61
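
Two of the values in Table S1 lend themselves to a quick consistency check. The Python sketch below, with hypothetical function and variable names, computes the Nyquist limit implied by the pixel size (twice the pixel size) and sums the three Ramachandran percentages. It is a reader-side check on the reported numbers, not part of the data-processing procedure.

```python
def nyquist_limit(pixel_size_angstrom: float) -> float:
    """Finest spacing that can be sampled at a given pixel size (2 x pixel size)."""
    return 2.0 * pixel_size_angstrom


# Values copied from Table S1.
pixel_size = 1.32          # Å per pixel
final_resolution = 3.6     # Å
ramachandran = {"preferred": 89.65, "allowed": 6.73, "outlier": 3.61}

limit = nyquist_limit(pixel_size)
ramachandran_total = sum(ramachandran.values())

# The reported resolution should be coarser (numerically larger) than the limit.
print(f"Nyquist limit: {limit:.2f} Å; reported resolution: {final_resolution} Å")
print(f"Ramachandran total: {ramachandran_total:.2f} %")
```

The Ramachandran categories sum to slightly less than 100% because each value is rounded to two decimal places.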

Table S2. Summary of model building for the yeast spliceosome. Where a molecule has several entries in a column, they are separated by semicolons and listed in the order given.

Molecule | Length | Domain/Region | PDB code | Modeling | Resolution (Å)

U5 snRNP
U5 snRNA | 120 | 7:111 | - | De novo building | 2.9~3.6
Spp42/Prp8 | 2363 | N-terminal Domain (47:825); RT finger/palm (826:1210); Thumb/X (1211:1327); Linker (1328:1602); Endonuclease (1603:1783); RNaseH-like (1784:2030); Jab1/MPN | -; 4I43; - | De novo building; Homology modeling; Not modelled | 2.9~3.6; ~4; -
Cwf10 | 984 | N-terminal Domain (68:120); G domain (121:452); Domain II (453:595); Domain III (596:675); Domain IV (676:843); Domain V (844:919); C-terminal Domain (920:971) | -; 3J7P; - | De novo building; Homology modeling; De novo building | 2.9~3.8
Brr2 | 2176 | - | - | Not modelled | -
Cwf17 | 340 | WD40 domain (42:340) | 2XL2 | Homology modeling | 3.3~4.0
SmB1 | 147 | Sm fold (2:87/94:118) | 4WZJ | Homology modeling | ~3.3~4.0
SmD1 | 117 | Sm fold (1:82) | 4WZJ | Homology modeling | ~3.3~4.0
SmD2 | 115 | Sm fold (19:115) | 4WZJ | Homology modeling | ~3.3~4.0
SmD3 | 97 | Sm fold (2:97) | 4WZJ | Homology modeling | ~3.3~4.0
SmE1 | 84 | Sm fold (9:84) | 4WZJ | Homology modeling | ~3.3~4.0
SmF1 | 78 | Sm fold (4:78) | 4WZJ | Homology modeling | ~3.3~4.0
SmG1 | 77 | Sm fold (3:75) | 4WZJ | Homology modeling | ~3.3~4.0

U6 snRNP
U6 snRNA | 99 nt | 1:90 | - | De novo building | 2.9~4.5

U2 snRNP
U2 snRNA | 186 nt | 1:43; 93:109/153:177; 112:145 | -; -; - | De novo building; Homology modeling; Double helix | 2.9~4.5; ~11; ~11
Lea1 | 237 | LRR domain | 1A9N | Rigid docking | ~11
Msl1 | 111 | RRM domain | 1A9N | Rigid docking |
SmB1 | 147 | Sm fold | U5 Sm proteins | Rigid docking |
SmD1 | 117 | Sm fold | U5 Sm proteins | Rigid docking |
SmD2 | 115 | Sm fold | U5 Sm proteins | Rigid docking |
SmD3 | 97 | Sm fold | U5 Sm proteins | Rigid docking |
SmE1 | 84 | Sm fold | U5 Sm proteins | Rigid docking |
SmF1 | 78 | Sm fold | U5 Sm proteins | Rigid docking |
SmG1 | 77 | Sm fold | U5 Sm proteins | Rigid docking |

NTC core
Cdc5 | 757 | Myb Domain (8:109); 502:639; 174:203/221:271/652:757 | 1GV2; -; - | De novo building; α-helices modelled; De novo building | ~3.5; ~5; 3.6~4.5
Prp19/Cwf8 (4 copies) | 488 | U-box (4 copies); Coiled-coil region (4 copies); WD40 (1 copy) | 2BAY; -; 3LRV | Homology modeling; De novo building; Rigid docking | 3.6~6; 3.6~5; ~7
Cwf7 | 187 | Coiled-coil (7:185) | - | De novo building | 3.6~4.5
Cwf2/Prp3 | 388 | 50:235 | 3U1L | Homology modeling | 3.3~5
Cwf3/Syf1 | 790 | ~100:495,655~730; 498~653 | -; - | α-helices modelled; De novo building | 5~7; 3.4~4
Cwf4/Syf3 | 674 | 41-290; ~290-660 | -; - | De novo building; α-helices modelled | 3.4~4; 5~7
Cwf15 | 265 | 24-70, 223-265 | - | De novo building | 3.0~4.0
Syf2 | 229 | - | - | Not modelled | -
Cwf12/Isy1 | 217 | - | - | Not modelled | -

NTC related
Prp5/Cwf1 | 473 | N-terminal Domain; WD40 domain (149:470) | -; 4YVD | α-helices modelled; Homology modeling | ~6; ~3.4
Prp45 | 557 | 100:315 | - | De novo building | 3.0-4.5
Cwf5/Ecm2 | 354 | 18:151; RRM domain (207:285) | -; 2YTC | De novo building; Homology modeling | 3.3-4.0; 4~6
Cwf11 | 1284 | - | 3PJ3 | Homology modeling | 4.0~7
Cwf19 | 639 | 334:633 | - | De novo building | 3.4~4
Cwf14 | 146 | 3-146 | 2MY1 | Homology modeling | ~3.4
Cwf16 | 270 | - | - | Not modelled | -
Cwf18 | 142 | - | - | Not modelled | -

Others
Prp17 | 558 | N-terminal (14:161); WD40 domain | -; - | De novo building; Not modelled | 3.3~4.5; -
Cyp1 | 155 | Cyclosporine Like domain | 2X7K | Homology modeling | 3.8~4.5
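
As an illustration of how the Domain/Region column can be read, the following Python sketch counts the residues or nucleotides covered by a region string written in the "start:end" form, with "/" separating segments. The helper names are hypothetical, and the sketch assumes that both ends of each range are inclusive. Entries written in other notations, such as "41-290" or "~290-660", are deliberately not handled.

```python
import re


def count_positions(region: str) -> int:
    """Count positions covered by 'start:end' segments, e.g. '2:87/94:118'.

    Assumes inclusive ranges; text that does not match 'digits:digits'
    is ignored.
    """
    total = 0
    for start, end in re.findall(r"(\d+):(\d+)", region):
        total += int(end) - int(start) + 1
    return total


def modelled_fraction(region: str, length: int) -> float:
    """Fraction of a molecule of the given length covered by the region string."""
    return count_positions(region) / length


# Two rows taken from Table S2: (molecule, length, region).
rows = [("U5 snRNA", 120, "7:111"), ("SmB1", 147, "2:87/94:118")]
for name, length, region in rows:
    covered = count_positions(region)
    print(f"{name}: {covered} of {length} positions modelled "
          f"({modelled_fraction(region, length):.1%})")
```

A regular expression is used here rather than splitting on "/" so that surrounding labels, such as "Sm fold (...)", do not need to be stripped first.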
