ASLI ERTEKIN In partial fulfillment of the requirements

ORDER AND DISORDER IN PROTEINS

by

ASLI ERTEKIN

A dissertation submitted to the

Graduate School-New Brunswick

Rutgers, The State University of New Jersey

In partial fulfillment of the requirements

For the degree of

Doctor of Philosophy

Graduate Program in

Computational Biology and Molecular Biophysics

Written under the direction of

Dr. Gaetano T. Montelione

And approved by

_____________________________

_____________________________

_____________________________

_____________________________

_____________________________

New Brunswick, New Jersey

OCTOBER, 2011

ii

ABSTRACT OF THE DISSERTATION

Order and Disorder in Proteins

by ASLI ERTEKIN

Dissertation Director:

Dr. Gaetano T. Montelione

In contrast to the general view that proteins should have a specific 3D structure in

solution for their activity, there are many proteins which do not have a folded “native”

structure for a big portion of their sequence. While these intrinsically disordered regions

are essential for protein function, they cause problems in efforts for determining the 3D

structures for the folded domains. It has been shown that the removal of the disordered

domains improved the structure determination success both by X-ray crystallography and

by NMR. As part of Northeast Structural Genomics (NESG) effort I worked on

identifying the disordered and flexible parts of the protein using Hydrogen/Deuterium

Exchange with Mass Spectroscopy (HDX-MS) analysis for construct optimization for

high-throughput structure determination. Using this method I also studied human Smad3,

which is an important part of the TGF-β-signaling pathway; and provided the first

experimental data on structural features of the linker domain. During my training, I also

studied human Deleted in Oral Cancer (DOC-1) protein, which was one of the proteins I

studied by HDX-MS for construct optimization. We determined the solution structure of

iii

the folded region of DOC-1, which was shown to be important in cell-cycle regulation

and cancer biology; and I also studied structure-function relations. Additionally, we

studied the solution structure of Methionine Sulfoxide Reductase B from Bacillus

subtilis, an important protein for reversing oxidative damage in cells, by NMR as a part

of methods development studies for NMR for large proteins.

iv

ACKNOWLEDGEMENTS

The work I described here was made possible with the help and support of many

people. Especially because with the beginning of my Ph. D. journey I have been learning

something new at every step I took.

I am grateful to all the people in the Montelione lab, who were very helpful for

every need I had for all my projects. Especially Haleema Janjua, who helped me

incredibly on the CDK2AP1 project, was there when the project and I needed help. I

learned a lot through our studies on the finicky protein, CDK2AP1. I would like to give

special thanks to my highly knowledgeable research professors Tom Acton and Rong

Xiao, who were always there when I was confused and I needed help with my projects.

I had three mentors throughout my studies, GVT Swapna, Jim Aramini and Paolo

Rossi. I started to learn about NMR with Swapna with many hours of sitting by the

instrument and scribbling pulse sequences on any paper available around. With great help

from Jim, I solved my first structure, CDK2AP1. He always seemed more than happy to

help me, and I abused it for sure. Paolo had the unfortunate luck to have his desk next to

mine. He had to be the first person I bounced my questions off, about anything remotely

related. And he worked with me on a very challenging protein structure, MsrB. Thanks to

all my mentors for their patience and help.

And my advisor, the curator of this great lab with wonderful people and

wonderful projects, Guy Montelione. I cannot find enough words to explain my gratitude

for everything he did for the past 6 years. He gave me the opportunity to work on really

v

exciting and challenging projects. With every project I grew up a little more as a scientist.

He has been a big inspiration, from every meeting we had, I would come out thinking

“Hell yeah! Let‟s do science!” I respect the way he does science, I respect the way he

shapes science, I respect him infinitely. I am just happy to be able to work with him and

witness how a great mind thinks.

I also want to thank to my dearest friend, Elif, being of one the biggest support

here, away from home, since the first day I arrived, and my roommate, Meric, for all the

fun times we had. Many special thanks to my boyfriend, Mehul, who was there with me,

from start to end.

Finally, I want to dedicate my work here to my family, my mom, Betul, and my

sister, Ege, for sacrificing the youngest family member for science. Because I know how

hard it is to be away from family. I love them both.

vi

TABLE OF CONTENTS

ABSTRACT ........................................................................................................................ ii

ACKNOWLEDGEMENTS ............................................................................................... iv

TABLE OF CONTENTS ................................................................................................... vi

LIST OF TABLES .............................................................................................................. x

LIST OF FIGURES ........................................................................................................... xi

LIST OF ABBREVIATIONS ........................................................................................... xv

INTRODUCTION .............................................................................................................. 1

1. HYDROGEN-DEUTERIUM EXCHANGE FOR PROTEIN CONSTRUCT

OPTIMIZATION ................................................................................................................ 4

1.1 Introduction ............................................................................................. 4

1.2 Methods and Materials .......................................................................... 10

1.2.1 Hyrogen/Deuterium Exchange with Mass Spectroscopy ................. 10

1.2.2 Experimental Setup and Standard Protocol ...................................... 11

1.2.3 HDX-MS data analysis ..................................................................... 13

vii

1.2.4 Protein Cloning, Expression and Purification ................................... 13

1.3 Results ................................................................................................... 14

1.3.1 Modifications to Existing Protocol ................................................... 14

1.3.2 Results of HDX-MS Analysis on NESG Target Proteins ................. 16

1.4 Discussion ............................................................................................. 18

2. THE INTERDOMAIN LINKER REGION OF HUMAN SMAD3 IS

INTRINSICALLY-DISORDERED ................................................................................. 20

2.1 Introduction ........................................................................................... 20

2.2 Materials and Methods .......................................................................... 23

2.3 Results ................................................................................................... 23

2.4 Discussion ............................................................................................. 28

3. SOLUTION NMR STRUCTURE OF THE FOLDED C-TERMINAL DOMAIN OF

HUMAN CYCLIN DEPENDENT KINASE 2 ASSOCIATED PROTEIN 1 (CKD2AP1)

…....................................................................................................................................... 31

3.1 Introduction ........................................................................................... 31

3.2 Materials and Methods .......................................................................... 36

3.2.1 Sample preparation ........................................................................... 36

viii

3.2.2 Structure Determination .................................................................... 36

3.2.3 Polycistronic Co-expression of CDK2 and CDK2AP1 .................... 38

3.2.4 Size Exclusion Chromatography for Binding Studies ...................... 39

3.2.5 IKKE Phosphorylation of full-length CDK2AP1 ............................. 40

3.3 Results ................................................................................................... 40

3.3.1 HDX-MS Analysis ............................................................................ 40

3.3.2 CDK2-AP1(61-115) is a dimer in solution ....................................... 42

3.3.3 Solution Structure of CDK2AP1 (61-115) ....................................... 45

3.3.4 C105A mutation does not disrupt DOC-1(61-115) dimerization ..... 53

3.3.5 CDK2AP1 is phosphorylated by IKKε ............................................. 54

3.3.6 CDK2:CDK2AP1 interaction ........................................................... 58

3.4 Discussion ............................................................................................. 62

4. SOLUTION NMR STRUCTURE OF PEPTIDE METHIONINE SULFOXIDE

REDUCTASE FROM BACILLUS SUBTILIS .................................................................. 65

4.1 Introduction ........................................................................................... 65

4.2 Methods and Materials .......................................................................... 67

ix

4.2.1 Protein Cloning, Expression and Purification ................................... 67

4.2.2 NMR data collection and structure calculation ................................. 68

4.3 Results ................................................................................................... 69

4.3.1 Samples Used for Data Collection .................................................... 69

4.3.2 Chemical Shift Assignments ............................................................. 71

4.3.3 The Solution structure of MsrB ........................................................ 75

4.3.4 MsrB Dynamics ................................................................................ 80

4.3.5 X-ray Crystal Structure ..................................................................... 83

4.4 Discussion and Conclusions ................................................................. 86

REFERENCES ................................................................................................................. 90

A. APPENDIX ............................................................................................................. 108

x

LIST OF TABLES

Table 1.1 List of NESG targets studied by HDX-MS by Seema Sharma........................... 8

Table 1.2 List of the proteins studied with HDX-MS for construct optimization ............ 16

Table 3.1 Summary of NMR and structural statistics for human CDK2AP1(61-115) ..... 46

Table 4.1 Structure calculation statistics and quality scores for MsrB ............................. 77

xi

LIST OF FIGURES

Figure 1.1 NESG Process of Optimizing Sequences of Partially Disordered Proteins for

NMR Structure Analysis and/or Crystallization. ................................................................ 7

Figure 1.2 15

N-HSQC spectra for NESG targets ER553, LkR15, SaR32 and VpR68 for

full-length (a), and optimized (b) constructs....................................................................... 9

Figure 2.1 The peptides used for HDX-MS analysis of Smad3.. ..................................... 24

Figure 2.2 HDX-MS results on the full-length human Smad3 protein.. ........................... 25

Figure 2.3 a) Sequence alignment of Smad3 and Smad2. b) DisMeta disorder consensus

prediction results for Smad3 and Smad2. ......................................................................... 27

Figure 3.1 Multiple sequence alignment of CDK2AP1 with homologues, by ClustalW. 32

Figure 3.2 HCPIN Interactome for (a) CDK2AP1 and (b) CDK2.. ................................. 34

Figure 3.3 HDX-MS data for CDK2AP1 (NESG ID: HR3057). ..................................... 41

Figure 3.4 Overlay of 600 MHz 1H-

15N HSQC spectra at 298 K of full-length (red) and

61-115 construct (blue) of CDK2AP1 .............................................................................. 42

Figure 3.5 (a) Static light scattering chromatogram for CDK2AP1(61-115) in the

presence of DTT. (b) Rotational correlation time vs MW ................................................ 44

Figure 3.6. Solution structure of residues 61-115 of CDK2AP1.. .................................... 49

xii

Figure 3.7 Disulfide mapping by MALDI-TOF of sample without (a) and with (b) DTT..

........................................................................................................................................... 51

Figure 3.8 Overlay of 600 MHz (a) 1H-

15N HSQC and (b)

1H-

13C HSQC spectra of

CDK2AP1(65-115) with and without DTT ..................................................................... 52


15N HSQC spectra of wild-type and C105A mutant 54

Figure 3.10 The LC/MS chromatograms of peptides after trypsin digestion of samples

before and after phosphorylation reaction.. ...................................................................... 56

Figure 3.11 MS/MS spectrum of QLLSDYGPPSpLGYTQGTGNSQVPQSK peptide.

S46 is identified as the only phosphorylated site after IKKε phosphorylation reaction. .. 56


15N HSQC spectra of wild-type and phosphorylated

full-length CDK2AP1 ....................................................................................................... 57

Figure 3.13 Size exclusion chromatography for CDK2, CDK2AP1(65-115) and mixture.

........................................................................................................................................... 59

Figure 3.14 Size exclusion chromatography for CDK2, CDK2AP1 and mixture ............ 60

Figure 3.15 SDS-GEL of the fractions from co-purification assay for CDK2: CDK2AP1

co-expression system. ....................................................................................................... 61

Figure 3.16 Size exclusion chromatography for CDK2, CDK2AP1(S46p) and mixture . 62

xiii

Figure 4.1 Projection of HNcaCO spectra for (a) double-labeled (13

C,15

N)-sample and (b)

perdeuterated triple-labeld (13

C,15

N2H) sample of B. subtilis MsrB ................................. 71

Figure 4.2 The 800MHz 15

N-HSQC spectrum for MsrB at 25 °C .................................... 73

Figure 4.3 Secondary chemical shift histograms for Cβ atoms in folded proteins for

Proline residues ................................................................................................................. 75

Figure 4.4 Solution NMR structure of major conformational state of fully-reduced MsrB

........................................................................................................................................... 78

Figure 4.5 Stereo image of the active site of MsrB. ......................................................... 79

Figure 4.6 Backbone 15

N relaxation measurements at 600 MHz, 25°C ........................... 80

Figure 4.7 The solution structure of MsrB, the residues with low hetNOE values are

colored red, the residues with high R2 rates are colored blue. .......................................... 83

Figure 4.8 The superimposition of 2.6 Å X-ray crystal structure (PDB ID: 3E0O) and

sparse-constraint NMR structures for MsrB from B. subtilis. .......................................... 84

Figure 4.9 Secondary shifts of Cα atoms for N-terminal residues .................................... 86

Figure A.1 HDX-MS results for NESG target BfR218. ................................................. 108

Figure A.2 HDX-MS results for NESG target ER554. ................................................... 109

Figure A.3 HDX-MS results for NESG target HR2891 ................................................. 110

xiv







Figure A.10 HDX-MS results for NESG target SmR84 ................................................. 117

Figure A.11 HDX-MS results for NESG target SpR36 .................................................. 118

Figure A.12 HDX-MS results for NESG target SvR375 ................................................ 119

xv

LIST OF ABBREVIATIONS

CDK2: Cyclin-dependent kinase 2

CDK2AP1: CDK2 associated protein 1

DOC-1: Deleted in oral cancer 1

DTT: Dithiothreitol

FA: Formic acid

HDX-MS: Hydrogen/deuterium exchange with Mass Spectrometry

HPLC: High performance liquid chromatography

HSQC: Heteronuclear single quantum correlation

IPTG: Isopropyl β-D-1-thiogalactopyranoside

MCS: Multiple cloning site

MES: 2-(N-morpholino)ethanesulfonic acid

MH1, MH2: Mad-homology domain 1,2

MOPS: 3-(N-morpholino)propanesulfonic acid

MsrB: Methionine sulfoxide reductase

NESGC: Northeast Structural Genomics Consortium

xvi

NMR: Nuclear magnetic resonance

NOE: Nuclear Overhauser Effect

NOESY: Nuclear Overhauser Effect Spectroscopy

PDB: Protein data bank

PSI: Protein Structure Initiative

PSVS: Protein structure validation suite

RDC: Residual dipolar coupling

RMSD: Root mean square deviation

TCEP: tris(2-carboxyethyl)phosphine

TGF-β: Transforming growth factor

1

INTRODUCTION

Proteins are dynamic entities. They fluctuate between similar, but distinct,

conformational states. This conformational plasticity is fundamental to their biological

function in the cell, and contributes to both the thermodynamic and kinetic aspects of

their functions. The strong link between enzyme dynamics and function has been studied

extensively on different systems [1-4]. Many of these studies demonstrate that

conformational states spanned during native state dynamics of the protein often

correspond to the predominant conformations they attain in their functionally active

states [4].

In contrast to the general view that a protein should have a specific 3D structure

in solution for its activity there are many proteins which do not have “native state”

structure. Rather, they exhibit “intrinsic disorder”, which is often fundamental to their

function. In contrast to the enzymes mentioned above, these proteins require higher level

plasticity and flexibility for their function in the cell. It is observed that these intrinsically

disordered proteins are especially enriched in cell signaling pathways and in

transcriptional and translational regulation [5]. The intrinsic disorder in proteins is

advantageous in protein signaling, providing readily available post-translational

modification sites, and conformational flexibility that is often required in order to provide

increased promiscuity in binding partners. Conformational flexibility of intrinsically-

disordered proteins may also provide important entropic contributions in determining the

affinities of protein-protein interactions [6-11].

2

In my thesis, I focus on understanding the structural and dynamic features of

proteins. For this I studied my favorite proteins with different experimental methods

including Hydrogen/Deuterium Exchange with Mass Spectrometry (HDX-MS) and

Nuclear Magnetic Resonance Spectroscopy (NMR).

As part of NIH Protein Structure Initiative (PSI) Northeast Structural Genomics

(NESG) effort, I worked on identifying the disordered and flexible parts of proteins using

HDX-MS analysis (Chapter 1). These data were obtained primarily for construct

optimization needed for high-throughput structure determination, but in some cases have

also provide important insights into structure-dynamics-function relationships. I studied

thirteen NESG protein targets under this project

Additional to these NESG targets I also studied human Smad3, a very important

protein in TGFβ-signaling, regulating many key cellular processes including

proliferation, differentiation, adhesion, apoptosis, and immune suppression. Smad3 is

composed of two well-folded domains, MH1 and MH2, which are connected by a linker

domain. The structural features of MH1 and MH2 domains were determined

experimentally but there were no experimental data revealing the structural features of

the linker domain. In Chapter 2, I describe my HDX-MS studies on full –length human

Smad3, where I present the first experimental data on structural characterization of the

linker region, which is largely intrinsically disordered.

I next focused on structure-function studies of the cyclin-dependent kinase 2

associated protein 1 (CDK2AP1, NESG ID: HR3057), which I had initially studied in

Chapter 1. This protein, corresponding to gene doc-1 (deleted in oral cancer-1), is a

3

cyclin-dependent kinase 2 inhibitor, one of the regulators of cell cycle and has important

role in cancer biology. HDX-MS studies reveled that N-terminal 60 residues are

intrinsically disordered. CDK2AP1 is also involved in the Mi-2/NUrD transcriptional-

regulation complex, an important complex regulating epigenetic gene expression. The

details of the structural features of the C-terminal ordered region and new findings are

explained in Chapter 3.

In the final part of my thesis work, I pursued structural and dynamics studies of a

biologically important enzyme, methionine sulfoxide reductase B (MsrB). MsrB has the

critical function of reducing oxidized methione (methionine sulfoxide) residues in

proteins back into methionine, and hence plays a role in the aging process. We solved the

16 kDa MsrB structure using “sparse constraint” approach, using limited NMR distance

constraints that could be obtained on a perdeuterated protein sample. The resulting

structure is very similar to an independently determined 2.6 Å crystal structure of the

same protein. The differences observed between our sparse-constraint NMR structure

and the crystal structure provide guidance on areas than require improvement in order to

make this sparse-constraint approach robust for accurate protein NMR structure

determination. The details of this structure are described in Chapter 4.

4

1. HYDROGEN-DEUTERIUM EXCHANGE FOR PROTEIN CONSTRUCT

OPTIMIZATION

1.1 Introduction

In contrast to general view of proteins exhibiting a certain equilibrium native

structure, it has been recently acknowledged that many proteins evolved to lack any kind

of ordered equilibrium structure for either the entire of the protein or for a significant

portion [12-14]. Based on disorder predictions on entire proteomes of more than 30

organisms suggested that 6-33% bacterial proteins contained regions <40 residues

predicted to be disordered. For eukaryotes this prediction increases to 35-51% of the

proteome [15]. The increase in the occurrences of disorder in higher organisms can be

linked to the complexity of the protein-protein interaction networks [16]. The plasticity of

these proteins is especially advantageous in providing different binding sites for multiple

binding partners, providing flexible linkers, reorganizing the connected functional

domains, providing different binding sites for different interactions, and providing

entropic contributions to protein-protein interactions. These regions often include the

post-translational modification sites, allowing solvent accessibility [12-16].

Throughout the last decade the structural genomics groups have been working on

determining protein structures as well as exploring ways of high-throughput structure

determination. In studying protein structures with X-ray crystallography, obtaining well-

diffracting protein crystals has been a serious bottleneck. Even though the unstructured

regions found in proteins may be functionally important, it has been observed that these

highly flexible regions may prevent formation of high-quality crystals by inhibiting

5

formation of stable crystal contacts, making the crystal formation thermodynamically

unfeasible and/or disrupting the homogeneity of the protein ensemble due to high

susceptibility to protease cleavage. Thus the problem of obtaining diffraction quality

crystals could be partially overcome by removing these unstructured regions of the

protein.

In earlier studies, crystallization success was shown to improve by in-situ limited

proteolysis by addition of minute amounts of proteases into crystallization media to

enable cleavage of disordered tails [17, 18]. In many cases limited proteolysis with Mass

Spectroscopy (LP-MS) was also used to identify the disordered regions and to allow

engineering of protein constructs lacking the problematic region(s) [19]. However,

cleavage at internal loops, rather than N- or C-terminal disordered “tails” may result in

the wrong interpretation of the data.

In the last decade it was demonstrated crystallization success for structure

determination for partially disordered proteins by X-ray crystallography could be

improved by identifying disordered regions of the protein by Hydrogen/Deuterium

Exchange Mass Spectroscopy (HDX-MS) [20-22]. This method proved to be very

effective, requiring very small quantities of protein (micrograms) and minimal data

analysis, thus making it perfect tool for high-throughput studies of partially disordered

proteins.

Nuclear Magnetic Resonance (NMR) spectroscopy is an alternative tool,

complementary to X-ray crystallography, for structure determination, providing high-

resolution solution structures as well as information on dynamic characteristics and

6

interactions of proteins and other biomolecules. Determination of 3D structures of

proteins with partially unfolded regions can be very complicated for NMR as well, since

these segments cause high-intensity, overlapping peaks in NMR spectra. Different NMR

experiments could be used to identify the flexible regions [23-25], however the amount of

protein and extensive labor required for data analysis makes these methods highly

unfavorable. In this study it is shown that the adaptation of the HDX-MS protocol for

high-throughput structural studies by NMR can be highly effective for providing sample

with better properties for rapid and accurate determination of resonance assignments and

structures. Where appropriate, these assignments and/or structures of the ordered “core”

regions of partially-disordered proteins provides a solid starting point for NMR studies of

the full-length protein or studies of complexes using either NMR or X-ray

crystallography.

As part of the technology development of the Northeast Structural Genomics

Consortium (NESGC) project, a process has been developed to identify and remove

disordered or flexible N- and C-terminal regions of proteins in order to provide samples

that are more amenable to high-throughput NMR analysis and crystallization [26]. First,

disordered regions of target proteins are identified using the Dis-Meta server, which is a

collection of different disorder prediction tools available [27-38]. If a consensus of these

multiple bioinformatics methods is observed, suggesting limited number of disordered

regions at one or both ends of the polypeptide chain, new constructs are made based on

this information. When the bioinformatics methods fail to provide a consensus result,

disordered regions are identified experimentally by HDX-MS experiments [26]. New

7

constructs are designed, excluding the highly flexible or disordered residues on either

terminal region, based on these experimental data and secondary structure predictions

(Figure 1.1).

Figure 1.1 NESG Process of Optimizing Sequences of Partially Disordered Proteins for

NMR Structure Analysis and/or Crystallization.

A set of NESG targets, which were studied by HDX-MS, analyzed earlier by

Seema Sharma [26], is given in Table 1.1 and Figure 1.2. Six proteins, for which the

disorder prediction results were inconclusive, have been studied by HDX-MS. For four of

these proteins the structures of the optimized constructs were determined by NMR and

deposited in the Protein Data Bank (PDB IS‟s: 2K1S, 2K3D, 2K5D, 2JZ5, respectively).

These results suggest that by incorporating HDX-MS into pipeline we have achieved a

high salvage-success rate of ~70 %.

8

Table 1.1 List of NESG targets studied by HDX-MS by Seema Sharma.

NESG ID Result Summary Status of New Constructs

ER553 Disordered N-term NMR In PDB

SaR32 Disordered C-term NMR In PDB

LkR15 Disordered N- and C-term NMR In PDB

VpR68 Disordered N-term NMR In PDB

HR2951 Disordered C-term Poor Exp/ Sol

DrR44 Disordered C-term Poor Exp/ Sol

9

ER553

SaR3

2

VpR68

LkR15 ER553 LkR15

1-114

Micro-probe

SaR

32 1

7-99

VpR68

LkR

15 10 – 90

Micro-probe

ER5

53

17-99

VpR68

LkR15

SaR32

ER553

59-199

a)

b)

Figure 1.1 15

N-HSQC spectra for NESG targets ER553, LkR15, SaR32 and VpR68 for

full-length (a), and optimized (b) constructs

10

1.2 Methods and Materials

During my dissertation I continued working on HDX-MS part of this project. We

carried out HDX-MS studies for thirteen NESG protein targets in order to experimentally

identify the intrinsically-disordered regions in these proteins. Results on these 13

proteins, along with a summary of the methodological improvements made relative to the

published experimental protocol [26] in order to make it more suitable for a broader set

of proteins are presented in this Thesis chapter.

1.2.1 Hyrogen/Deuterium Exchange with Mass Spectroscopy

Acidic hydrogen in proteins such as –OH, –NH2, –SH, peptide amide hydrogen

continuously and reversibly interchange with the hydrogen in water. The hydrogen of

–OH, –NH2, and –SH groups exchange very rapidly, which makes the real-time

observation of this reaction very difficult. However, the exchange rates for the amide

hydrogen were shown to be highly dependent on local environment, such as secondary

and tertiary structures and solvent accessibility. Based on the observations and

calculations, the amide hydrogen exchange rates for random coil were found to be

typically in 10-1000 sec range [39]. In a folded polypeptide chain the amide proton,

exchange half times as long as weeks could be observed, due to the stability of the

peptide and protection of the amide group from the solvent. Thus the measurement of

real-time amide hydrogen exchange is a very sensitive and unique probe for protein

structural studies.

11

The real-time 1H/

2H exchange reaction can be achieved by isotope labelling,

which can be monitored by mass spectrometry [40]. The isotope incorporation to

backbone amides can be achieved by diluting the protein solution with buffer with high

deuterium content (>99%), yielding high concentrations of 2H2O in the reaction medium,

forcing the exchange to proceed in one direction. At different preset time points the

reaction is quenched, sample is subjected to a proteolysis reaction, then the proteolysis

products are separated by high-performance liquid chromatography and analyzed by mass

spectrometry. The increase in the peptide mass, due to exchange of hydrogen with

deuterium, is measured by the comparison of the centroids of the mass envelopes of the

peptide of interest corresponding to before and after the exchange reaction.

1.2.2 Experimental Setup and Standard Protocol

Protein 1H/

2H exchange experiments were conducted following the methods

described by Spraggon et al [22]. A 5 l aliquot of protein sample (~ 25 - 50 g, in

preferred buffer for structural studies) was mixed with 15 l of deutarated buffer and

incubated on ice for set time points before being quenched by the addition of 30 l of

quench solution (Buffer A: 0.8% formic acid and 1.6 M Guanidine HCl) and was frozen

immediately on dry ice. For the 0 time point experiment, 5 l of protein sample was

mixed 15 l of buffer in H2O and quenched and analyzed in the same way as the 1H/

2H

exchanged samples.

For the correction of back exchange, a completely exchanged sample was

prepared as described by Hamuro et al [41]; 5 l of the protein sample was mixed with 15

12

l of 0.5% formic acid in 2H2O for complete unfolding reaction and incubated at room

temperature for 24 h. The sample was then quenched and analyzed using the same

conditions as other 1H/

2H exchange experiments.

High Performance Liquid Chromatography (HPLC) solvent bottles, connection

lines including the sample loop, the pepsin column and the analytical column were all

kept on ice. For analysis, frozen samples were thawed on ice and manually injected into a

pre-column (66 l bed-volume, Upchurch) packed in-house with immobilized pepsin

(PIERCE) at a 50 l / 30 sec flow-rate and followed by an injection of 100 l 0.1%

formic acid in 1 min into a 200 l sample loop. By valve switching, the sample loop was

brought online with the C18 HPLC column (Discovery, BioWide Pore C18-3, 5 cm X 2.1

mm 3um, Supelco). After a three minute wash step with 200 l/min of 0.1% formic acid,

the digested peptides were separated by a linear acetonitrile gradient of 2-50% in 17 min

at 200 l/min and analyzed downstream by an electrospray-linear ion-trap mass

spectrometer (LTQ, Thermo). For measurement of the mass shift in 1H/

2H exchange

experiments, MS was set to perform full-scan in the range of 300-2000 in profile mode

for the entire HPLC run. For peptide identifications, the same sample was processed the

same way as the 0 time samples. The mass spectrometer was set to perform one full-scan

MS in the m/z range 300-2000 followed by zoom scans of top 5 most intense ions and

MS/MS of multiply charged ions. Dynamic exclusion conditions were set to exclude

parent ions that were selected for MS/MS once in 30 sec, and the exclusion duration was

60 sec.

13

1.2.3 HDX-MS data analysis

Acquired MS/MS data was searched using Sequest software against a homemade

protein sequence database composed of 83095 entries. The search parameters were set to

use no enzyme and parent peptide tolerance of +/- 2 amu and fragment ion tolerance of

+/-1 amu. The search results yield the list of the identified proteolysis product peptides

and the assignment of each of these peptides to corresponding spectral peaks.

The percent deuterium incorporation for each peptide is calculated based on the

shift observed on the centroid of the mass peak based on the formula Zhang & Smith

[40]:

100)()(

)(),(

unlabeledmcontrolm

unlabeledmtlabeledmDt ,

where m(labeled,t) corresponds to mass peak centroid for the exchange reaction for

duration of t, m(control) and m(unlabeled) correspond to fully deuterated and non-

deuterated cases.

1.2.4 Protein Cloning, Expression and Purification

All proteins used for the 1H/

2H exchange mass spectrometry experiments and

NMR analysis were expressed, cloned and purified based on methodologies previously

published by our laboratory [42]. Briefly, the full-length gene constructs of proteins of

interest were cloned into modified pET15 expression vectors containing an N-terminal

purification tag (MGHHHHHHSH) or pET21 expression vectors containing a C-terminal

affinity tag (LEHHHHH) [43]. All vectors were transformed into codon enhanced BL21

14

(DE3) pMGK E. coli cells, which were cultured at 37 oC in MJ minimal medium [44].

Protein expression was induced at reduced temperature (17 oC) by IPTG (isopropyl--D-

thiogalactopyranoside). Expressed proteins were purified using an AKTAexpress (GE

Healthcare) two-step protocol consisting of HisTrap HP affinity and HiLoad 26/60

Superdex 75 gel filtration chromatography. Sample purity was confirmed using SDS-

PAGE and MALDI-TOF mass spectrometry.

1.3 Results

1.3.1 Modifications to Existing Protocol

Obtaining full coverage of all residues in protein sequence in HDX-MS

experiments by sequencing experiments is important to map the dynamic properties

throughout the protein. For the proteins, where the original protocol (see Section Methods

and Materials) did not yield good coverage, different approaches were explored for

completing the residue coverage.

The optimum peptide length for HDX-MS analysis, after pepsin digestion, is 7-15

residues. It is not possible to get exchange data for shorter peptides, due to high exchange

rates at the terminal regions, and the long peptides do not give good enough resolution to

accurately map the dynamics for a given residue along the protein sequence. The

digestion duration of the protein can be modified for the cases where the observed

peptide lengths are outside of the desired range. Additionally, different digestion times

can be explored for the cases where low residue coverage is observed by the original

15

protocol. Exposing the protein to pepsin for different durations may result in different set

of peptides, which may increase the overall residue coverage.

It was observed that the data collection mode of the spectrometer also affects the

peptides detected and identified peptides by the instrument and the software, respectively.

In the original protocol, the instrument is set to do a full-scan for the m/z range of 300-

2000 followed by zoom scan for the five most intense ions and the corresponding MS/MS

scans for these ions. In this original protocol, zoom-scans before each MS/MS scan take

additional time, which may cause low populated ions to be depleted until their MS/MS

scans are carried out, thus they will not be detected and identified by the receiver. Hence,

we observed improved sequence coverage for several cases when the zoom-scan is not

detected during data collection.

The unfolding conditions are also important in sequence coverage and quality of

the data. For several cases it was observed that some proteins were resilient to

denaturating conditions of the quench solution (Buffer A). This results in low sequence

coverage as well as non-accurate estimation corrected 1H/

2H exchange levels. Hence, we

suggested two additional denaturation conditions at quenching stage, where Buffer A

does not give satisfactory results:

Buffer A: 1.60 M GuHCl, 0.8%FA

Buffer B: 2 M Urea, 0.8% FA

Buffer C: 1 M Urea, 1 M TCEP

16

For the cases we studied it was concluded that buffer B offered more efficient

protein denaturation, thus for the rest of the study buffer B is selected as the first choice

for sequencing experiments.

1.3.2 Results of HDX-MS Analysis on NESG Target Proteins

For this project I analyzed HDX-MS data for thirteen NESG protein targets.

These results are being used to design new constructs, some of which have been brought

through the NESG sample production pipeline. These results are summarized in Table

1.2. Specific results for target HR3057, cyclin-dependent kinase 2 associated protein

(CDK2AP1), a cancer-associated human protein, are presented in Chapter 3 and results

for the rest of the targets are shown in Appendix A. The percent deuteration is calculated

as described in Section 1.2.3 and represented as heat map where color coding is given in

the inset of each figure. The residues are colored “white” where no data was available.

Table 1.2 List of the proteins studied with HDX-MS for construct optimization

NESG ID Number of res Result Summary

BfR218 91 Ordered

ER554 202 Flexible loops, tails

HR2891 96 Disordered at C-term

HR2921 134 Disordered N- and C- terms

HR3018 313 Flexible loops

HR3057 115 Disordered at N-term

HR3070 307 Disordered at N-term

HR3074 261 Ordered

HR3153 340 Disordered loop

HR3159 160 Disordered at N- and C-term

SmR84 158 Disordered at N- and C-term

SpR36 172 Ordered

SvR375 140 Disordered at C-term

17

The HDX-MS analysis of targets BfR218, HR3074, HR3018 and SpR36 revealed

that these proteins were highly ordered (Figure A.1, Figure A.7, Figure A.5, Figure

A.11). Low dispersion in amide chemical shifts, as seen for these cases, may be observed

for unstructured proteins or in helical proteins. The secondary structure predictions by

two different prediction softwares (PROFsec [45] and PSIPRED [46]) suggest strongly

that BfR218, HR3074 and HR3018 are highly helical structures. SpR36 is also predicted

to be mostly helical by PSIPRED. Hence the 15

N-HSQC profile observed for these three

targets in screening can be explained by the predicted helical content.

HDX-MS data for ER554 (Figure A.2), indicated that the structure is highly

ordered while slightly higher flexibility is observed for N-terminal 20 residues and for

residues ~125-145, which may correspond to a loop region. HR2891, on the other hand,

was found to be highly disordered at the C-terminus, where only ~30-40 residues on the

N-terminal region found to have an ordered structure. For HR2921 we also observe

highly flexible N- and C-terminal tails, where two predicted helices around residues ~30-

50 and ~60-80 make the ordered structure (Figure A.4). The HDX-MS data reveal that

HR3070 has a highly accessible N-terminal region composed of ~30 residues and

possibly a dynamic loop at residues ~60-90 (Figure A.6). For HR3159 we observe a

highly flexible internal loop for ~100 residues (~70-170), which makes up ~1/3 of the

protein (Figure A.9). This outcome is interesting and suggests a detailed analysis on the

sequence. The disordered region identified by HDX-MS analysis may correspond to an

internal loop within a single folded structure or it may correspond to a loop between two

domains of the protein, each composed of residues ~1-70 and ~175-340, respectively.

18

The data for HR3159 indicate that the protein is highly ordered, however higher level of

solvent accessibility is observed for ~30 residues on each terminal region of the protein

compared to the core (Figure A.9). For SmR84 no disordered region could be identified

by HDX-MS experiments (Figure A.10). Finally, for SvR375 it is observed that N-

terminal region of the protein shows high levels of protection from solvent exchange

while the C-terminal regions shows higher solvent accessibility, even though we do not

observe high exchange rates as one would observe for disordered proteins (Figure A.11).

1.4 Discussion

Once HDX-MS data for each protein is obtained they are studied carefully to

identify the possible disordered regions. For the cases, where increased flexibility is

observed at the terminal regions new constructs are designed to exclude these flexible

regions. Since the resolution of the data is at the level of 5-10 residues, multiple

constructs are proposed with different termini, excluding different sizes of peptides at

each end. Since the internal loops may contribute to the stability of the structure, those

residues, which were found to be highly flexible but not located at the terminal regions,

are kept intact during designing new constructs.

These proteins discussed here were studied by HDX-MS because the predictions

from the DisMeta server were not conclusive, mostly due to lack of consensus between

different prediction tools used within DisMeta. For these cases the HDX-MS studies were

necessary to identify the disordered regions accurately. For some cases, as in BfR218, the

consensus indicated that there was no disordered region in the protein, which was

confirmed by the HDX-MS studies. However, for some cases the consensus points to

19

interleaved disordered regions, as in HR3153, for which a very different disorder patterns

was identified based on the experimental data. This indicates one should use caution

when a clear consensus between different prediction tools is not achieved.

Here I described an important use of the HDX-MS tool in identifying the

disordered regions of the protein for planning a future construct for structure

determination and summarized the findings in Table 1.1. This low cost and relatively

non-invasive method can also be used in identifying the protein-protein interaction sites

and allosteric effects; it can provide insights on structural changes and dynamics

accompanying post-translational modifications and interactions with other molecules, and

it can also be used to understand equilibrium folding/unfolding mechanisms [41, 47-49].

20

2. THE INTERDOMAIN LINKER REGION OF HUMAN SMAD3 IS

INTRINSICALLY-DISORDERED

2.1 Introduction

Transforming growth factor beta (TGF-β) is an important cytokine, which along

with the other related proteins, regulates many key cellular processes including

proliferation, differentiation, adhesion, apoptosis, and immune suppression [50]. In the

TGF-β pathway, TGF-ß-activated membrane receptor protein kinases transfer the ligand-

binding signal to intracellular substrate proteins, Smads, which in turn propagate the

signal into the nucleus where they function as transcription factors [51-54]. There are

eight identified Smad proteins, which are classified, based on both sequence and

structure, into three groups: receptor-regulated Smads (R-Smad), common Smads (Co-

Smad) and inhibitory Smads (I-Smad). Smad2 and Smad3 from the R-Smad family are

activated by direct phosphorylation by the TGF-β receptor kinase at their SSXS

phosphorylation motif at the C-terminal tail. Activated Smad2 and Smad3 proteins homo-

oligomerize, form complexes with co-Smad Smad4, and then accumulate in the nucleus

where they regulate transcription of target genes [51-54].

Smad3 is composed of two highly conserved N- and C- terminal domains,

referred to as MAD Homology 1 (MH1) and MAD Homology 2 (MH2) domains,

respectively. The MH1 domain contains the DNA binding domain and transcription

factor binding sites as wells as phosphorylation sites for protein kinase C (PKC) and

GSK3 (glycogen synthase kinase 3) [51-57]. It may also be phosphorylated by the Ca++

-

and calmodulin-dependent kinase II (CamKII) [58]. The MH2 domain is responsible for

21

Smad-pathway activation by phosphorylation at the SSXS site by the TGF-β type I

receptor after its recruitment to the receptor complex through interaction with Smad

anchor for receptor activation (SARA) [51-54, 59]. It is also important for cross-talk

between different pathways, and mediates interactions with other Smad molecules to

form homo- and hetero-complexes and interactions with other transcription factors [51-

54]. The MH2 domain is also essential for Smad3 transcriptional activity. These two

well-folded domains are connected by a linker region with high proline content [51-54].

The linker region is important in oligomerization prior to nuclear transport [60], and it is

also essential for transcriptional activation [61, 62]. Furthermore, it contains

demonstrated as well as suspected phosphorylation sites for multiple kinases, such as

members of the cyclin-dependent kinase (CDK) family, members of the mitogen-

activated protein kinase (MAPK) superfamily including ERK (extracellular-signal

regulated kinase), JNK (c-Jun N-terminal kinase) and p38, GSK3, G protein-coupled

receptor kinase 2 (GRK2), and CamKII [58, 63-82]. Phosphorylation of the linker region

by the various kinases differentially affects Smad3 activity in a context-dependent

manner. The linker region also contains a recognition site for Nedd4L, an E3 ubiquitin

ligase, which can lead to the degradation of the Smad3 protein in response to TGF-β [66,

83].

Three-dimensional X-ray crystal structures are available for the MH1 (residues 1-

139) domain bound to DNA [55] and for the MH2 (residues 232-425) domain in the apo-

state [84], in complex with the Smad binding domain of SARA [84], and in a

phosphorylated state in complex with Smad4[85]. To date, however, there have not been

22

any reports on structure of full-length Smad2 or Smad3 proteins; in particular there is no

experimental information about the structure of the linker regions. The high proline

content of the linker region suggests lack of ordered structure, but this is not certain in the

absence of structural or biophysical studies on the full-length protein. Since multiple

phosphorylation sites within the linker region are targets for a number of kinases,

understanding the structural features of this part of the protein is critical for

understanding the functional role of the interdomain linker.

Amide hydrogen-deuterium exchange (HDX) studies provide an important

method for characterizing stable hydrogen-bonded structures in proteins [86]. The

exchange rates for amide hydrogens are highly dependent on solvent accessibility and

local structural environment. Slower exchange rates are observed for amide hydrogens

involved in hydrogen bonds within secondary and tertiary structures proteins compared to

the intrinsically disordered regions, due to the reduced solvent accessibility; and

continuous stretches of backbone amides with fast exchange rates are observed for

intrinsically-disordered regions of proteins, including interdomain flexible linker regions.

The analysis of amide HDX rates with Mass Spectrometry (HDX-MS) provides a reliable

method for assessing such features of a protein [40, 87, 88]. In this study, we report

results of HDX-MS analysis for full-length Smad3, providing the first experimental

evidence that the linker region is intrinsically-disordered and largely solvent accessible in

full-length human Smad3.

23

2.2 Materials and Methods

The sample preparation and HDX-MS analysis were carried out as explained in

Chapter 1.

2.3 Results

The results of HDX-MS analysis for human Smad3 are summarized in Figure 2.2.

The color coding is shown in the inset of the figure, and the interdomain linker region is

shown within the brackets. This HDX-MS analysis includes data from 81 multiple

charged peptide ions from 53 peptide fragments; out of these data 12 peptide ions from 9

fragments provide coverage for the interdomain linker region. For some parts of the

protein, data coverage was not achieved because the mass spectrometer could not detect

any peptides corresponding to these regions, or no reliable HDX-MS data could be

obtained for the detected peptides. These parts are color-coded in white.

24

Figure 2.1 The peptides used for HDX-MS analysis of Smad3. The secondary structures

for the MH1 and MH2 domains are also indicated, based on structural data obtained from

the Protein Data Bank (ID: 1OZJ for the MH1 domain and 1MJS for the MH2 domain).

The linker region is indicated with brackets.

25

Figure 2.2 HDX-MS results on the full-length human Smad3 protein. The color coding,

as indicated in the inset, describes the average percentage of deuteration levels, derived

by multiple overlapping peptide fragments, of each residue after either 10 or 100 second

exposure to deuterium; white color coding is used where no data are available. The linker

region is indicated with brackets, and proposed Proline-directed phosphorylation sites in

the linker are labeled with purple stars, and the non-proline-directed GRK2 site with a

green star. The secondary structures for the MH1 and MH2 domains are also indicated,

based on structural data obtained from the Protein Data Bank (ID: 1OZJ for the MH1

domain and 1MJS for the MH2 domain). Polypeptide segments exhibiting rapid amide

exchange are indicated in red and orange, while segments exhibiting slow amide

exchange rates are indicated in blue and green. Some green color-coded segments show

the same color when exposed to deuterated solvent for 10 seconds versus 100 seconds;

the deuteration levels for these segments were in fact increased when exposed for 100

seconds compared to 10 seconds, but they were still in the same color range.

26

As shown in Figure 2.2, high deuteration levels are observed even at the shortest

10-second exposure to solvent deuterium for the interdomain linker region,

demonstrating that the backbone amides in this region of full-length Smad3 are solvent

exposed. This continuous segment of amide protons with fast HDX rates, including

residues 140-185, indicates an intrinsically-disordered interdomain linker. The C-

terminal part of the linker exhibits somewhat lower exchange rates, suggesting lower

solvent accessibility, and potentially some transient local structure. The SSXS motif in

the C-tail also shows high deuteration levels at short HDX exchange times, consistent

with the notion that the C-tail is accessible for phosphorylation by the TGF-β receptor.

Although the sequence coverage of the data is not 100%, the available data are adequate

for qualitative assessment of the solvent accessibility and structural flexibility for the

entire protein. These results reveal that the overall structure of the Smad3 protein features

well-ordered structures in MH1 and MH2 domains, connected by an intrinsically-

disordered linker.

Disorder prediction methods are also reasonably reliable at identifying

intrinsically-disordered regions of proteins. The DisMeta server (http://www-

nmr.cabm.rutgers.edu/bioinformatics/disorder/) uses 10 different disorder prediction

methods to provide a consensus analysis of intrinsically-disordered regions of proteins.

DisMeta results for human Smad3 are shown in Figure 2.3b. The predicted results agree

well with the experimental HDX-MS data; i.e. the interdomain linker region and the C-

terminal tail segment containing the SSXS phosphorylation motif are intrinsically-

disordered, while the MH1 and MH2 domains are predicted to be well-ordered. Similar

27

Figure 2.3 a) Sequence alignment of Smad3 and Smad2. The secondary structure

elements are shown for Smad3, and the linker regions are shown in brackets. Proposed

phosphorylation sites within the linker are boxed. b) DisMeta disorder consensus

prediction results for Smad3 and Smad2. The value plotted for each residue represents the

number of disorder prediction servers that predict that residue to be intrinsically-

disordered. The corresponding secondary structure elements are indicated. The

secondary structures of Smad3 MH1 and MH2 domains and Smad2 MH2 domain are

based on published results in PDB (ID: 1OZJ, 1MJS, 1KHX, respectively). The Smad2

MH1 domain secondary structure elements are based on the Smad3 MH1 domain

structure.

a)

b)

28

results, including an intrinsically-disordered interdomain linker, are predicted for the

homologous human Smad2 protein (Figure 2.3b).

2.4 Discussion

Intrinsically-disordered interdomain linkers have been observed in several other

proteins [73, 89-93] and appear to be particularly common in multidomain proteins of

higher eukaryotic proteins [6]. Functional roles of such intrinsically-disordered regions

may include (i) providing structure plasticity between domain orientations allowing for

forming alternative complexes with different binding partners; (ii) allowing solvent

access for phosphorylation and other post-translational modifications, and (iii) providing

an entropic contribution that modulates the binding affinity in specific protein-protein

interactions [6-11]. Though it is not yet clear whether all of these potential roles are

important for the functionally-important interdomain linker region of Smad3, these data

provide the first experimental evidence for an intrinsically-disordered interdomain linker

in the R-Smad family of proteins.

Our findings are especially important to shed light on the structural features of the

linker region. There are four proline-directed phosphorylation sites in the linker region,

T179, S204, S208, S213, which are phosphorylated by a variety of kinases [63-74, 78-

81]. Our results show that the N-terminal part of the linker region from amino acid 140 to

amino acid 185 is highly solvent exposed. T179 is located in this region. T179 is

phosphorylated by G1 CDKs, CDK2 and CDK4, in a cell cycle-dependent manner, and

the phosphorylation of T179 and other sites inhibits the antiproliferative function of

Smad3 [63, 78, 79, 94-96]. T179 is also phosphorylated by ERK in response to EGF

29

treatment [68]. Furthermore, T179 is phosphorylated by members of the CDK

superfamily after TGF-β treatment [64-66]. T179 can also be phosphorylated by JNK and

p38 in the presence of TGF-β [69-72, 82]. Importantly, TGF-ß-induced phosphorylation

of T179 provides the major site for Smad3 binding to Pin1, a proline isomerase [97]. Pin1

promotes TGF-ß-induced migration and invasion of cancer cells [97]. The tumor

promoting function of Smad3 and Smad2 in later stages of cancer is linked with their

interaction with Pin1, which is overexpressed in many cancers [98, 99].

The S204 site is located in a part of the linker region that has slower amide

exchange rates than the N-terminal region of the linker, which may indicate some

transient structure in this part o the linker. Residue S204 is phosphorylated by ERK in

response to EGF, and by GSK3 in response to TGF-β [64, 68, 74]. A recent study shows

that GSK3 phosphorylation of S204 can increase Smad3 binding to the E3 ubiquitin

ligase Nedd4L [100], which can lead to Smad3 degradation [66, 83]. Residue S208 is

phosphorylated by ERK in response to EGF and by members of the CDK family, JNK,

and p38 in the presence of TGF-β [64-66, 68-73]. Residue S213 is phosphorylated by

CDK2 and CDK4 in a cell cycle-dependent manner [63]. While our study is not able to

reveal structural features for the S208 and S213 due to the peptides containing these two

sites were not detected by mass spec, we expect that the phosphorylation sites at S208

and S213 are also solvent exposed.

In addition to the proline-directed kinases, the Smad3 linker region is also

phosphorylated by GRK2 [76]. Phosphorylation by GRK2 inhibits the TGF-β receptor

kinase to phosphorylate the SSXS-motif in the C-terminal tail [76]. The GRK2

30

phosphorylation site has been mapped to S157 in the Smad3 linker region [77]. Residue

S157 is within the intrinsically-disordered segment 140-185 of the linker region (Figure

2.2).

In summary, the linker region of Smad3 is under dynamic regulation by various

kinases. Our study provides the first insights into the structural features of the linker

region in the context of the full-length Smad3 protein. We have shown that the linker

region, especially the N-terminal part of the linker region containing residues 140-185, is

intrinsically disordered, and has a high solvent accessibility. This feature allows T179,

which is a site for multiple proline-directed kinases, and S157, which is the site for

GRK2, to be readily phosphorylated. Perhaps more significantly, the intrinsically-

disordered linker region of Smad3 (and also of Smad2) provides structural plasticity,

allowing for multiple interdomain orientations of the MH1 and MH2 domains, which

may be essential to allow different interactions with alternative binding partners.

31

3. SOLUTION NMR STRUCTURE OF THE FOLDED C-TERMINAL

DOMAIN OF HUMAN CYCLIN DEPENDENT KINASE 2 ASSOCIATED

PROTEIN 1 (CKD2AP1)

3.1 Introduction

CDK2AP1 (p12DOC-1

) is a well known, highly conserved tumor suppressor

protein, corresponding to gene doc-1 (deleted in oral cancer) (Figure 3.1). It was first

identified by subtractive hybridization experiments using hamster oral cancer cells, where

it was observed that the gene product is absent or reduced in the malignant cells. When

transformed keratinocytes are transfected with doc-1, its expression inhibits cell growth

[101]. A ubiquitously expressed, highly conserved homolog of this gene, with 88%

cDNA identity and 98% protein sequence identity, has also been identified in different

human tissues [102, 103]. In three malignant human oral keratinocyte cell lines analyzed,

the expression of the doc-1 gene was absent or reduced beyond detection, similar to

previous results on hamster models [102]. These early findings suggested an important

role for CDK2AP1 protein in carcinogenesis.

32

Figure 3.1 Multiple sequence alignment of CDK2AP1 with homologues, by ClustalW.

In recent years, the importance of CDK2AP1 in various cancer cell lines has been

intensely investigated. Differential expression of CDK2AP1 was observed for two

phenotypes of human colorectal cancer cells, micro-satellite stable and unstable cell lines

[104]. In another study of 180 gastric cancer tissues, negative expression of CDK2AP1

was reported for 78% of the tissues, which was directly correlated with more advanced

tumor stages and invasion [105]. doc-1 gene therapy on mouse models of head and neck

squamous cell carcinoma (HNSCC) resulted in significant reduction in the weight, size

and growth of tumors compared to control systems, suggesting CDK2AP1 as a new

therapeutic target [106]. Ectopic expression of doc-1 in transfected malignant

keratinocyte hamster models was observed to elevate the number of apoptotic cells,

33

implying a possible role for CDK2AP1 in apoptotic pathways [107]. In addition to its

role in carcinogenesis, it was observed that doc-1 was one of the 216 genes enriched in

stem cells, making it an important marker defining “stemness” of the cell [108].

Furthermore, it was recently observed that CDK2AP1 is associated with embryonic

development and low levels of this protein lead to embryonic lethality [109].

Westernblot and immunoblot experiments on cell lysates using GST-tagged

CDK2AP1 reveal three putative binding partners (Figure 3.2), DNA polymerase

α/primase (DNApol α/primase), deleted in oral cancer one related protein (DOC-1R,

CDK2AP2), and cyclin-dependent kinase 2 (CDK2), where DNApol α/primase and

CDK2 are important in cell cycle regulation. In vitro analyses have shown that the first

six amino acids of CDK2AP1 interact with DNApol α/primase, the only protein that can

initiate de novo DNA replication [110]. It was observed that the CDK2AP1:DNApol

α/primase interaction negatively regulates DNA replication at the level of initiation,

suggesting a new mechanism for S-phase regulation [102, 111]. However, there have

been no subsequent reports in the literature to support this data or shed light on the details

of this interaction.

DOC-1R, which is a close homolog of CDK2AP1 with 57% sequence identity,

was reported as one of the binding partners of CDK2AP1. [112]. It was reported that

DOC-1R protein was a substrate of MAP kinase and was important in microtubule

organization during meiotic maturation [113]. However, there have not been any further

studies elucidating the function of this protein in the cell and the details of its interaction

with CDK2AP1.

34

Figure 3.2 HCPIN Interactome for (a) CDK2AP1 and (b) CDK2. Each node represents a

protein, which interacts with the protein of interest, and the circles around the nodes

indicate if the structure for that protein or a close homolog is available [114].

CDK2AP1 has also been observed to interact specifically with free, non-

phosphorylated CDK2, making it the only known specific inhibitor of CDK2 [115].

CDK2 regulates G1-to-S phase transition of the cell cycle through its complexes with

cyclin A and cyclin E [116]. It was observed that CDK2AP1 inhibits CDK2 kinase

activity by sequestering the inactive monomer, preventing formation of complexes with

cyclin A and cyclin E, and/or by directing it to the proteosome degradation pathway

[115]. Thus, DNA replication is inhibited by CDK2AP1 by inhibition of CDK2 and/or

inhibition of DNApol α/primase. This scenario is consistent with the previous

observation that transfected malignant keratinocytes experience significantly reduced cell

growth and reversion of phenotype [101]. Overexpression of CDK2AP1 consistently

results in alteration of the cell cycle profile, with increased G1 phase and reduced S phase

[115].

35

More recently, stable isotope labeling with aminoacids in cell culture (SILAC)

study on NUcleosome Remodeling and histone Deacetylase (Mi-2/NuRD) chromatin

remodeling complex revealed that CDK2AP1 is a subunit of Mi-2/NuRD complex [117].

Mi-2/NuRD is one of the ATP dependent chromatin remodeling complexes which plays a

critical role in epigenetic gene regulation [118]. This multi-domain complex, with

variable subunits, changes epigenetic markers by removing methyl and acetyl groups on

DNA, thus changes the positions of nucleosome, making the DNA available for

transcription factors and DNS repair proteins [118]. CDK2AP1 was discovered to be a

bona fide subunit of Mi-2/NuRD complex; however the molecular function of CDK2AP1

within this complex is still unknown.

Although the importance of CDK2AP1 in cell cycle regulation and tumor

formation has been clearly established, the structural features of this protein have not

been reported to date. Here, we present the first report of the three-dimensional solution

structure of CDK2AP1 (UniProtKB/Swiss-Prot ID, CDKA1_HUMAN; NESG ID,

HR3057; hereafter referred to as CDK2AP1(61-115)). The N-terminal 60 residues of the

dimeric full-length CDK2AP1 are „intrinsically-disordered‟, on the basis of

hydrogen/deuterium exchange with mass spectroscopy (HDX-MS) analysis [26]. The

solution structure of C-terminal region of CDK2AP1, including residues 61-115,

determined by NMR methods is a homodimeric four-helical bundle. Significantly,

CDK2AP1 also contains a predicted phosphorylation site for IκB kinase epsilon (IKKε),

Ser46 [119]. Here we show that IKKε phosphorylates CDK2AP1 in vitro at the predicted

36

Ser46 site, which is located in the intrinsically-disordered N-terminal region of the

protein structure.

3.2 Materials and Methods

3.2.1 Sample preparation

Full-length CDK2AP1, CDK2AP1 (61-115), CDK2AP1 (61-115)-C105A, and

CDK2 were cloned and purified based on methodologies previously published by our

laboratory [42]. Briefly, the full-length gene constructs of proteins of interest were cloned

into modified pET15 expression vectors containing an N-terminal purification tag

(MGHHHHHHSH) [43]. All vectors were transformed into codon enhanced BL21

(DE3) pMGK E. coli cells, which were cultured at 37 oC in MJ minimal medium [44]

containing (15

NH4)2SO4 as the sole nitrogen source for uniformly 15

N enrichment, and

100% or 5% 13

C-glucose as sole carbon source for uniform or 5% 13

C enrichment,

respectively. Protein expression was induced at reduced temperature (17 oC) by IPTG

(isopropyl--D-thiogalactopyranoside). Expressed proteins were purified using an

AKTAexpress (GE Healthcare) two-step protocol consisting of HisTrap HP affinity and

HiLoad 26/60 Superdex 75 gel filtration chromatography. Sample purity was confirmed

using SDS-PAGE and MALDI-TOF mass spectrometry.

3.2.2 Structure Determination

Samples of uniformly 13

C, 15

N- and 5%-13

C, U-15

N-enriched human CDK2AP1

(61-115) and CDK2AP1 (61-115) -C105A for NMR structure determination were

concentrated to 0.7 to 0.9 mM in 95% H2O/5% 2H2O solution containing 20 mM MES,

37

200 mM NaCl, 10 mM DTT, 5 mM CaCl2 at pH 6.5. Analytical gel filtration with static

scattering data demonstrates that both full-length and truncated CDK2AP1 are dimeric in

solution under the conditions used in the NMR studies. All NMR data were collected at

25oC on Varian INOVA 600 and Bruker AVANCE 800 NMR spectrometers equipped

with 5-mm cryoprobes, processed with NMRPipe [120], and visualized using SPARKY

[121]. The rotational correlation times for 5%-13

C, U-15

N-enriched CDK2AP1 (61-115),

and CDK2AP1 (61-115)-C105A are calculated from measurements of average 15

N T1 and

T2 relaxation times using 1D 15

N-edited relaxation experiments [122, 123]. Complete 1H,

13C, and

15N resonance assignments for CDK2AP1 (61-115) were determined using

conventional [124] triple resonance NMR methods, respectively, and deposited in the

BioMagResDB (BMRB accession number 16808). Stereospecific isopropyl methyl

assignments for all Val and Leu residues were deduced from characteristic cross-peak

fine structures in high resolution 2D 1H-

13C HSQC spectra of 5%-

13C,100%-

15N

CDK2AP1 (61-115) [125]. Resonance assignments were validated using the Assignment

Validation Suite (AVS) software package [126]. 1H-

15N heteronuclear NOEs were

measured with gradient sensitivity-enhanced 2D heteronuclear NOE approaches [123,

127]. The dimer interface in CDK2AP1 (61-115) was identified using X-filtered

13C-

NOESY experiments [128], on a 1:1 sample of 13

C,15

N- enriched and unlabeled (natural

abundance) protein. One-bond N-HN and N-C‟ residual dipolar couplings, DNH and DNC‟,

were obtained on 200 μL [U-5%-13

C,100%-15

N]- and [U-100%-13

C,100%-15

N]-[

CDK2AP1 (61-115)] aligned in 7% positively charged compressed gel using standard

protocols. The RDCs were determined from 1J(H

N-N

H) scalar couplings measured from

38

an interleaved pair of 2D 1H-

15N TROSY and HNCO-TROSY acquisitions on isotropic

and aligned samples.

For the truncated CDK2AP1 (61-115) structure determination, initial structure

calculations were performed by Cyana 3.0 [129, 130], using peak intensities from 3D

simultaneous CN-NOESY (τm = 120 ms) [131], dihedral angle constraints computed by

TALOS (φ ± 20°; ψ ± 20°) [132], and N-HN and N-C‟ RDC constraints. The 20

structures with lowest target function out of 100 in the final cycle were further refined by

restrained molecular dynamics in explicit water using CNS 1.2 [133, 134], using the final

NOE derived distance constraints, TALOS dihedral angle and RDC constraints. Final

refined ensemble of 20 structures was deposited into the Protein Data Bank (PDB ID,

2KW6). Structural statistics and global structure quality factors, including Verify3D

[135], ProsaII [136],

PROCHECK[137],

and MolProbity [138] raw and statistical Z-

scores, were computed using the PSVS 1.3 software package [139]. The global goodness-

of-fit of the final structure ensembles with the NOESY peak list data were determined

using the RPF analysis program [140].

3.2.3 Polycistronic Co-expression of CDK2 and CDK2AP1

To study the interaction between CDK2 and CDK2AP1 we created an artificial

polycistronic co-expression system utilizing our pET15_NESG T7-based expression

vector. Briefly, CDK2 was PCR amplified with a forward primer allowing in frame

fusion with the pET15_NESG N-terminal 6X-His tag (MGHHHHHHSH). The reverse

primer contained a 25 bp sequence (reverse complement) comprised of a T7 gene 10

Translational Enhancer and a canonical Shine-Dalgarno Ribosome Binding Site (5‟ -

39

ATGTATATCTCCTTCTTAAAGTTAA – 3‟). Full-length CDK2AP1 was PCR

amplified using a forward primer containing the 25 bp translational control elements and

a reverse primer containing a region of overlap with 3‟ end of the pET15_NESG multiple

cloning site (MCS), this open reading frame (ORF) does not impart a 6X-His Tag. Each

ORF retained their native stop codon. CDK2 and CDK2AP1 were PCR amplified using

the primers described above and the correct sized bands visualized and isolated by

agarose gel electrophoresis. Following gel extraction, the two PCR products were cloned

into pET15_NESG using Infusion-based (Clonetech) ligation independent cloning. More

specifically, this was achieved using the CDK2 region of overlap with the pET15_NESG

6X-His tag, the 25 bp region of overlap containing the translational control elements

added on to the 3‟ end of CDK2 and 5‟ end of CDK2AP1, and finally the region of

overlap with the vector‟s 3‟ end of the MCS added on to CDK2AP1. The resulting

expression vector construct contains open reading frames for both CDK2 and CDK2AP1

under the control of a single T7 promoter and termination sequence. Thus following

IPTG induction, T7 RNA polymerase will transcribe a single message containing the two

ORFS (CDK2 and CDK2AP1) each with their own translational control sequences,

allowing the proteins to be co-expressed from the same transcript. This co-expression

strategy (in close proximity) often allows for the isolation of protein complexes that have

proven recalcitrant in other co-expression or co-purification systems.

3.2.4 Size Exclusion Chromatography for Binding Studies

The binding between CDK2AP1 (truncated, full-length and phosphorylated full-

length) and CDK2 were studied by size exclusion chromatography. 50 µM of CDK2 and

40

100 µM of CDK2AP1 construct were mixed in buffer conditions of 50 mM MOPS, 100

mM NaCl, 10 mM MgCl2 at pH 7.4, and incubated for minimum of 1 hour. After

incubation the 50 µl of the mixture is injected onto a Superdex 75 10 mm/30 cm gel-

filtration column (GE Healthcare) with a 100-μL loop at 0.5 mL/min using an AKTA

Explorer system (GE Healthcare) at 4 °C. The eluting protein was detected by UV

absorbance at 280 nm. The absorbance profile of the mixture is compared with those of

the injections of CDK2 and CDK2AP1 constructs alone.

3.2.5 IKKE Phosphorylation of full-length CDK2AP1

For phosphorylation studies IκB kinase epsilon (IKKε) is bought from Invitrogen

(Catalog number: PV4875). The reaction solution is prepared as 0.1 mM of full-length

CDK2AP1 in 50 mM Tris (pH 7.5), 200 mM NaCl, 10 mM CaCl2, 5mM β-

glycerolphosphate, 2.5 mM ATP. For the reaction 4 µl of the IKKε is injected and it was

incubated at 17°C for 48 hrs. Extent of phosphorylation was confirmed by LC/MS/MS

experiments.

3.3 Results

3.3.1 HDX-MS Analysis

Residues comprising unstructured or intrinsically-disordered regions in a protein

can be readily identified by hydrogen/deuterium exchange mass spectrometry (HDX-

MS), since the exchange rates of these amide hydrogens are significantly higher than

those in structured regions due to surface accessibility and lack of hydrogen bonding

[86]. Studying the dynamics with HDX-MS provides a low cost and high-throughput way

41

of quantifying amide proton exchange rates in polypeptide segments of proteins [21].

Moreover, construct optimization involving the elimination of unstructured elements

identified by HDX-MS has been shown to yield significant improvement in the success

of both crystallization experiments for X-ray crystallography studies and protein structure

determination by solution NMR methods [21, 22, 88].

Full-length CDK2AP1, was analyzed by HDX-MS. The backbone amide groups

in the entire N-terminal 60 residues of the protein exhibit rapid HDX rates characteristic

of intrinsically-disordered polypeptide segments (Figure 3.3). Based on this information a

truncated construct corresponding to residues 61-115 was designed for NMR structural

studies. The overlapping highly dispersed peaks from 1H-

15N HSQC spectra for full-

length and truncated CDK2AP1 (Figure 3.4) verifies that the removal of the disordered

region does not disrupt the overall structure of ordered regions of the protein.

Figure 3.3 HDX-MS data for CDK2AP1 (NESG ID: HR3057). The secondary structures

are shown based on the NMR structure described here

42

Figure 3.4 Overlay of 600 MHz

1H-

15N HSQC spectra at 298 K of full-length (red) and

61-115 construct (blue) of CDK2AP1; both in 20 mM MES buffer at pH 6.5, containing

0.02% NaN3, 10 mM DTT, 5 mM CaCl2, 200 mM NaCL, 1x Protease Inhibitors, 10%

D2O, and 50 µM DSS.

3.3.2 CDK2-AP1(61-115) is a dimer in solution

Functional analysis of CDK2AP1 on contact inhibited diploid cells suggests that

it can occur in the cell in both in monomeric and dimeric forms [141]. However, in

contact inhibition phase, it was observed that the dimer content in the cell increases with

concomitant decrease in monomer content. In the same study, mutational analysis

indicated that replacing a single cysteine residue at position 105 by alanine residue

Red: Full-length

Blue: res 61-115

43

disrupts dimerization and CDK2 inhibition. This study also suggested that the dimer is

the active form of CDK2AP1, and is stabilized by formation of an inter-chain disulfide

between C105 residues of each chain of the dimer [141].

We have analyzed the oligomerization state of CDK2AP1(61-115) using

analytical gel filtration with static light scattering and 15

N NMR relaxation experiments.

The molecular weight measured by static light scattering experiments corresponds to the

dimer molecular weight (Figure 3.5a). The rotational correlation time (τc), corresponding

to the time required for a molecule to rotate 1 radian, is directly proportional to its

molecule size. The τc vs molecular weight plot of known monomeric proteins

demonstrates that τc is directly proportional to the molecular mass of the globular proteins

smaller than 25 kDa (Figure 3.5b). We observe that τc for CDK2AP1(61-115) highly

deviates from the general trend and it corresponds to molecular weight of dimeric form of

CDK2AP1 (61-115) (Figure 3.5b). Thus, both techniques firmly establish that CDK2AP1

(61-115) is a dimer in solution under the conditions used in this study. Based on these

results and X-filtered NOESY data, the solution NMR structure of CDK2AP1 (61-115)

was solved as a dimer (see below).

44

Figure 3.5 (a) Static light scattering chromatogram (blue) for CDK2AP1 (61-115) in the

presence of DTT. The sample was injected onto an analytical gel filtration column

(Protein KW-802.5, Shodex, Japan; flow rate, 0.5 ml/min, 4 °C) with the effluent

monitored by refractive index (black trace; Optilab rEX) and 90° static light scattering

(blue) detectors The magenta data points reflect the estimated molecular weight of the

molecule, and the dashed line indicates theoretical molecular weight for the dimer. (b)

Rotational correlation time vs MW plotted for known monomeric proteins (blue),

CDK2AP1 (61-115) (red) and CDK2AP1 (61-115)-C105A (green) constructs. CDK2AP1

samples were in 20 mM MES buffer at pH 6.5, containing 0.02% NaN3, 10 mM DTT, 5

mM CaCl2, 200 mM NaCL, 1x Protease Inhibitors, 10% D2O, and 50 µM DSS, at 25 °C.

a)

b)

wt

C105A

45

3.3.3 Solution Structure of CDK2AP1 (61-115)

Structural statistics for the dimeric solution NMR structure of CDK2AP1(61-115)

are summarized in Table 3.1. The structure was determined using 1827 NOE-based

distance constraints contacts, 72 RDC data and 182 dihedral constraints obtained based

on the backbone chemical shift values. These provided some 18.3 constraints per residue,

and 3.6 constraints per residue for long range interactions. The average backbone root

mean square deviation of ordered residues in the final ensemble is calculated as 0.6 Å.

Backbone dihedral angle analysis indicates that 99.3% of the residues fall in the most

favored regions of the Ramachandran plot. Overall, the structure quality scores, including

RPF scores comparing the structure with the unassigned NOESY peak list (summarized

in Table 3.1) indicate a high quality solution NMR structure for the dimeric C-terminal

region of CDK2AP1.

46

Table 3.1 Summary of NMR and structural statistics for human CDK2AP1(61-115)a

Completeness of resonance assignments

Backbone (%) 97.45

Side chain (%) 84.43

Aromatic (%) 100

Stereospecific methyl (%) 100

Conformationally-restricting constraints

Distance constraints

Total 1827

intra-residue (i = j) 560

sequential (|i- j| = 1) 433

medium range (1 < |i – j| ≤ 5) 434

long range (|i – j| > 5) 400

distance constraints per residue 16.6

Dihedral angle constraints 182

Number of constraints per residue 18.3

Number of long range constraints per residue 3.6

Residual constraint violations

Average number of distance violations per structure

0.1 – 0.2 Å 4.05

0.2 – 0.5 Å 0.4

> 0.5 Å 0

average RMS distance violation / constraint (Å) 0.01 Å

maximum distance violation (Å) 0.30 Å

Average number of dihedral angle violations per structure

1 – 10° 1.85

> 10° 0

average RMS dihedral angle

violation / constraint (degree) 0.22º

maximum dihedral angle violation (degree) 4.80º

RMSD from average coordinates (Å) orderedb

backbone atoms 0.6 Å

heavy atoms 1.2 Å

Ramachandran statistics for ordered residues (Richardson lab Molprobity)

most favored regions (%) 99.3%

additional allowed regions (%) 0.7%

disallowed regions (%) 0.0%

Global quality scoresc Raw / Z-score

Verify3D 0.41 / -0.80

ProsaII 1.07 / 1.74

Procheck(phi-psi) 0.54 / 2.44

Procheck(all) 0.42 / 2.48

MolProbity Clash 13.76 / -0.84

47

Table 3.1 continued

RPF Scores

Recall 0.961

Precision 0.868

F-measure 0.913

DP-score 0.738

RDC Statistics

Number of DNH constraints

36

R

Qrms

0.913 ± 0.011

0.133 ± 0.009

Number of DNC constraints 36

R 0.988 ± 0.001

Qrms 0.115 ± 0.006 aStructural statistics were computed for the ensemble of 20 deposited structures

bOrdered residue ranges [S(phi) + S(psi) > 1.8]: residues 62-112

cCalculated based on PSVS v1.4 program

Each protomer of CDK2-AP1(61-115) is composed of two helices, including

residues 61-82 (α1) and 84-112 (α2), connected by a type II β-turn to form a hairpin helix

motif. These hairpin helices dimerize to form an anti-parallel four helix bundle (Figure

3.6) that has twofold rotational symmetry about an axis nearly parallel to the helix axes

and vertical in Figure 3.6. The four-helical bundle structure is predominantly stabilized

by the hydrophobic interactions involving numerous hydrophobic residues at the

interface. While this fold is relatively rare, similar dimer protein folds have been reported

for a microtubule binding protein (PDBID: 1WU9) and a phycobilisome degradation

protein (PDBID: 1OJH), which are significantly different in sequence, and do not appear

to be homologous proteins.

The α1-helix exhibits a bend around residue K75 near the β-turn, deviating from

an ideal linear helical structure. This bend is supported by intra-chain interactions of side-

chains of residues A87 with T80 and P79 and T80 with L91; and inter-chain interactions

of I77 with I95. The electrostatic potential map (Figure 3.6d) indicates that the surface on

inter-chain α1- α2 interface has a net positive charge near the loop region, while, to a

48

lesser extent, a net negative charge is observed near N- and C- terminal regions of the

chains. This electrostatic distribution on overall protein provides a strong dipolar

character along the axis of the protein, which may be important for its interactions with

its binding partners.

49

Figure 3.6. Solution structure of residues 61-115 of CDK2AP1. Lowest energy ensemble of 20

structures , lowest energy model (b,c), and the electrostatic potential map (d) are shown. In a-c

the structures are colored based on the secondary structures, red for α-helix, green for loop

regions, in c cysteine residues are shown in stick representation, with carbon atoms shown in

green. The electrostatic potential map is calculated by APBS add-on in pymol. The N- and C-

terminal regions are labeled in (a) and (c); the orientation of the models in (a), (b) and (d) are the

same.

N N

C C

C

C

N N

50

Even though formation of a disulfide bridge between C105 thiol moieties of each

chain does not violate the structural constraints used in calculations, the Cβ chemical shift

value of C105 (26.198 ppm), which is diagonostic of the oxidation state of the associated

thiol group [142, 143] indicate that, under the sample conditions used in this study (with

10 mM DTT as reducing agent), the two cysteines at position 105 are predominantly in

the reduced state [143]. Disulfide mapping using MALDI-TOF analysis confirmed that

10 mM DTT is sufficient to fully reduce the Cys residues of to CDK2AP1(61-115)

cysteine (Figure 3.7). Moreover, both 1H-

15N HSQC (Figure 3.8a) and

1H-

13C HSQC

spectra (Figure 3.8) of CDK2AP1(61-115) in solution both with and without DTT are

almost identical. This implies that, even in the absence of reducing agent (i.e., no DTT),

the reduced form of C105 is the dominant state, and the protein forms a dimer in both

conditions. Hence, our structure and data do not support previously published results

concerning the requirement for disulfide bonding of C105 residues in dimer formation

[141].

51

Figure 3.7 Disulfide mapping by MALDI-TOF of sample without (a) and with (b) DTT.

The trypsin digestion product peptides with and without disulfide bond formation are

highlighted. No disulfide formation is observed when DTT is added.

a)

b)

52

Figure 3.8 Overlay of 600 MHz (a)

1H-

15N HSQC and (b)

1H-

13C HSQC spectra of

CDK2AP1(65-115) with (red) and without (green) 10 mM DTT in solution, at 25 °C.

The samples are in 20 mM MES buffer at pH 6.5, containing 0.02% NaN3, 5 mM CaCl2,

200 mM NaCL, 1x Protease Inhibitors, 10% D2O, and 50 µM DSS.

53

3.3.4 C105A mutation does not disrupt DOC-1(61-115) dimerization

To verify our findings the C105A of CDK2AP1(61-115) mutant was also studied

by NMR. Kim et al. have reported that the C105A mutation abolishes dimer formation

and the activity of the protein both in vivo and in vitro [141]. However, comparison of the

1H-

15N HSQC spectra of wild-type and the C105A mutant of CDK2AP1(61-115) (Figure

3.9), suggests otherwise. The fact that the chemical shift perturbations are located mostly

in the vicinity of the mutation site indicates that the mutation did not cause significant

changes in the 3D structure. The τc measurements also supports this finding (Figure

3.5b). This is expected from the results presented above, since the dimer structure is

stabilized mainly by hydrophobic interactions. These data further demonstrate that

disulfide bonding of C105 is not required for dimer formation.

54


1H-

15N HSQC spectra of wild-type (magenta) and

C105A mutant (blue) CDK2AP1(61-115) in 20 mM MES buffer at pH 6.5, containing

0.02% NaN3, 10 mM DTT, 5 mM CaCl2, 200 mM NaCL, 1x Protease Inhibitors, 10%

D2O, and 50 µM DSS, at 25 °C. The residue assignments for the wild-type protein are as

labeled.

3.3.5 CDK2AP1 is phosphorylated by IKKε

IκB kinase (IKK) complex is a part of the canonical cellular mechanism which

regulates propagation of cellular response to inflammation, through inactivation of

inhibitors of nuclear factor kappa-B (NF-κB) thus activating NF-κB pathway [144]. A

less studied IKK family protein, IKK epsilon (IKKε), which is a Ser/Thr kinase, is shown

to regulate the interferon response mechanism, as well the canonical NF-κB pathway

[145, 146]. IKKε was also identified as an oncogene, which was found to be over-

expressed in 30% of breast cancer cell lines [147-149]. The several binding targets of

55

IKKε in interferon signaling pathways were identified but no binding partners responsible

for cell transformation were identified.

Recently consensus phosphorylation peptide motif for IKKε was described

through peptide library scanning studies. A proteome-based bioinformatics search for the

identified motif revealed a set of possible phosphorylation targets for IKKε. On a search

against Swiss-prot data base the site S46 in CDK2AP1 scored in the top 0.05% sites

searched, predicting it as a candidate IKKε target [119].

Here, by in-vitro phosphorylation studies on full-length CDK2AP1 using GST-tag

IKKε (purchased from Invitrogen PV4875) we achieved complete phosphorylation at S46

verified by LC/MS-MS analysis. In Figure 3.10 the LC/MS/MS chromatograms for the

peptide including the putative phosphorylation site, S46, from samples before and after

phosphorylation reaction is shown for both the phosphorylated (m/z ~ 934.8, retention

time ~ 36.7 min) and non-phosphorylated (m/z ~ 908.1, retention time ~ 37.7 min). No

signal is observed at retention time 37.7 and m/z 934.8 for the control sample, suggesting

that the phosphorylation of the peptide only occurs after IKKε reaction. The

chromatogram for the non-phosphorylated peptide from reacted sample indicates that the

reaction almost reached completion, with very small signal coming from the non-

phosphorylated sample. Figure 3.11 shows the MS/MS spectrum for the phosphorylated

peptide, showing that the phosphorylation site is S46, but no other side within this

peptide.

56

Figure 3.10 The LC/MS chromatograms of peptides after trypsin digestion of samples

before and after phosphorylation reaction. Chromatograms of non-phosphorylated

peptide (a and b) and phosphorylated peptide (c and d) are shown. The elution times of

non-phosphorylated and phosphorylated peptides are 36.7 and 37.7 mins, respectively.

Figure 3.11 MS/MS spectrum of QLLSDYGPPSpLGYTQGTGNSQVPQSK peptide.

S46 is identified as the only phosphorylated site after IKKε phosphorylation reaction.

X10

X10

before rxn

before rxn

after rxn

after rxn

57

NMR experiments indicate that no significant structural changes are observed on

the 3D structure of the protein due to phosphorylation (Figure 3.12). For the cases where

the structures of other candidate IKKε targets are known, putative phosphorylation sites

were located at both structured and unstructured regions. For CDK2AP1 the

phosphorylation site is located on the disordered region of the protein, as determined by

HDX-MS analysis. This suggests that structural flexibility of the substrate may be a

structural requirement for kinase activity of IKKε.


1H-

15N HSQC spectra of wild-type (red) and

phosphorylated (green) full-length CDK2AP1 in 20 mM MES buffer at pH 6.5,

containing 0.02% NaN3, 10 mM DTT, 5 mM CaCl2, 200 mM NaCL, 1x Protease

Inhibitors, 10% D2O, and 50 µM DSS, at 25 °C.

58

3.3.6 CDK2:CDK2AP1 interaction

We showed that the first 60 residues of CDK2AP1 were highly flexible and only

the C-terminal 55 residues form a highly ordered tertiary structure and we determined the

solution structure of this region. Based on mutation studies it was reported that residues

109-111 were important in CDK2 interaction [115]. The suggested interaction site is

included in our CDK2AP1 (61-115) construct. Based on this information we expected to

observe an interaction between CDK2 and CDK2AP1 (65-115).

We studied the interaction between CDK2 and CDK2AP1 (65-115) by size

exclusion chromatography. In Figure 3.13 size exclusion chromatograms of CDK2,

CDK2AP1 (65-115) and stoichiometric mixture of CDK2 and CDK2AP1 (65-115) are

shown. Under the same conditions CDK2 elutes at 11.5 ml while CDK2AP1 (65-115)

elutes at 13 ml, due to the difference in their molecular weight. If there is an interaction

between CDK2 and CDK2AP1 (65-115), due to the increased molecular weight of the

complex, the elution volume is expected to be lower than what was observed when pure

CDK2 or CDK2AP1 (61-115) was injected to the column. However, when the mixture is

injected, two separate peaks are observed for CDK2 and CDK2AP1 (65-115) at the same

elution volumes observed for individual injection of each sample; and no new peak is

observed corresponding to complex formation. The separation of two proteins is also

confirmed by the SDS-gel ran on the elution fractions (data not shown).

59

Figure 3.13 Size exclusion chromatography for CDK2 (red), CDK2AP1(65-115) (blue)

and mixture (green), with injection of 50 µL of protein sample to Superdex 75 10 mm/30

cm column with a flow rate 0.5 mL/min at 4 °C. The expected elution volume (10.2 mL)

for the complex is shown with an arrow.

This observation suggests that the disordered N-terminal tail is required for CDK2

and CDK2AP1 interaction. To test this hypothesis we did a similar study for CDK2 and

full-length CDK2AP1 (Figure 3.14). In this case CDK2 and CDK2AP1 elution volumes

are similar, 11.5 and 12.5 ml, respectively. When injected in stoichiometric amounts, we

still do not observe elution of the complex at an earlier elution volume. This again

suggests that there is no interaction detected between CDK2 and CDK2AP1, at the

affinity level that can be captured by gel-shift assays. However, due very low absorbance

observed for full-length CDK2AP1 in this experiment because of its instability in

solution we cannot derive definite conclusions on binding solely based on this

experiment.

Elution volume (ml)

60

Figure 3.14 Size exclusion chromatography for CDK2 (red), CDK2AP1 (blue) and

mixture (green), with injection of 50 µL of protein sample to Superdex 75 10 mm/30 cm

column with a flow rate 0.5 mL/min at 4 °C. The expected elution volume (9.75 mL) for

the complex is shown with an arrow.

To test this hypothesis further we prepared a co-expression system where both

proteins, CDK2 and full-length CDK2AP1, are expressed in the same vector in E.coli; in

which only CDK2 had Ni-NTA affinity tag. The binding of the two proteins was studied

by co-purification with a Ni-NTA affinity column, where the soluble fraction, after

sonication and centrifugation of the cells, is loaded to the affinity column and eluted by

increasing concentrations of imidazole. Each fraction was collected and analyzed by

SDS-Gel (Figure 3.15). In the final gel we observe that CDK2AP1 starts eluting from the

column at no or low imidazole concentrations. CDK2 starts to elute at 100 mM, after all

CDK2AP1 is depleted in the column by 40mM imidazole wash. The fact that these

proteins elute separately from the column indicates that there is no significant interaction

between CDK2 and CDK2AP1 to keep CDK2AP1 on the column bound to CDK2.

Elution volume (ml)

61

Figure 3.15 SDS-GEL of the fractions from co-purification assay for CDK2: CDK2AP1

co-expression system. The summary of the co-expression setup is given at the top, the

bands corresponding to CDK2 and CDK2AP1 are marked on the right. The flow through

(FT) and the imidazole concentrations for each fraction are marked on each column.

Under the conditions we studied the CDK2 and CDK2AP1 proteins we did not

detect any interaction between these proteins. These proteins are shown to be important

in cell cycle regulation, and post-translational modification is highly utilized in the cell

for regulation of protein functions. In the same paper reporting CDK2 and CDK2AP1

interaction, unphosphorylated and monomeric form of CDK2 was reported to be

interacting with CDK2AP1 [115]. We have described above that new potential

phosphorylation site was identified for CDK2AP1 and we proved the phosphorylation in

vitro. It can be suggested that the phosphorylation of CDK2AP1 may provide a regulation

of its function in the cell, i.e. its interaction with CDK2. However, we could not observe

62

any interaction between CDK2 and CDK2AP1 (S46p) in our gel shift analysis (Figure

3.16).

Figure 3.16 Size exclusion chromatography for CDK2 (red), CDK2AP1 (S46p) (blue)

and mixture (green), with injection of 50 µL of protein sample to Superdex 75 10 mm/30

cm column with a flow rate 0.5 mL/min at 4 °C. The expected elution volume (9.75 mL)

for the complex is shown with an arrow.

3.4 Discussion

HDX-MS studies on full-length CDK2AP1 reveal that the N-terminal domain is

highly unstructured, and, hence, solution structural analyses were carried out on a

truncated construct, CDK2AP1 (61-115), excluding the residues in the disordered region.

Static light scattering and 15

N NMR relaxation data indicate that CDK2AP1 (61-115)

forms a stable dimer in solution. The solution NMR structure of CDK2AP1 (61-115) is a

symmetric homodimer, featuring an anti-parallel four-helical bundle motif. The structure

of CDK2AP1(61-115) is mainly stabilized by hydrophobic interactions at the interface.

The only cysteine residue in the protein, C105 located at the interface, is in a reduced

Elution volume (ml)

63

state under the conditions studied. Mutation of this cysteine to alanine does not disrupt

the structure or dimer formation. It was reported that the protein was observed in

monomer form in log-phase and dimerization of these proteins is observed when cell-

growth was inhibited by contact. However, under the conditions studied here the light

scattering data and size exclusion chromatography suggest that the full-length protein

also forms a dimer.

We have also showed that the CDK2AP1 S46 site gets phosphorylated by IKKε

by in-vitro studies, as was suggested by bioinformatics search. With this study, the

known interactome of CDK2AP1 has been extended by addition of its interaction with

IKKε, which is a very important protein for cell defense and an identified oncogene.

Moreover, this interaction may provide a means to regulate CDK2AP1 activity or a more

complex interaction pathway, which needs to be elucidated by further studies.

We also studied the CDK2 and CDK2AP1 interaction using gel shift assays and

co-expression/co-purification assay. The gel shift assays did not yield any data supporting

interaction between CDK2 and CDK2AP1 (61-115), CDK2AP1, and CDK2AP1 (S46p).

Further binding assay on E.coli co-expression and Ni-NTA co-purification of CDK2 and

CDK2AP1 also could not identify any interaction between these two proteins. The

suggested interaction was studied and published by a single lab and it was not supported

by any other work from other labs. A recent study revealed that CDK2AP1 was co-

purifying with Mi-2/NuRD, which is a chromatin remodeling complex [117]. In the same

study it was reported that CDK2AP1 shows an affinity for methylated CpGs DNA;

however the authors report that could not detect CDK2 as one of the binding partners of

64

CDK2AP1. There is an extensive literature about the phenotypes of under-expression or

deletion of doc-1 gene in different cancer cell types; however our data suggest that more

detailed study is required to understand the mechanisms of CDK2AP1 in the cell.

65

4. SOLUTION NMR STRUCTURE OF PEPTIDE METHIONINE

SULFOXIDE REDUCTASE FROM BACILLUS SUBTILIS

4.1 Introduction

Reactive oxidative species (ROS), consisting of singlet oxygen molecule (O2),

superoxide anion, hydroxyl radical, hydroxyl ion, hydrogen peroxide and hypochlorite

ion, can cause damage in the cell by reacting with the nucleic acids, proteins, amino

acids, fatty acids or co-factors. The accumulated oxidative damage leads to age related

degenerative diseases [150-152], cancer [153, 154], and aging due to Free-radical Theory

[155, 156]. Although ROS can occur due to environmental exposure, the main sources of

oxidative species are the by-products of energy production and other cellular activities.

As a result, organisms have developed auxilliary mechanisms to prevent or reverse the

damages caused by oxidation.

Methionine is highly susceptible to oxidation by ROS. The oxidation of free or

peptide-bound methionine results in a equal mixture of R- and S-epimers of methionine

sulfoxides (methionine-S,R-sulfoxides; Met-S,R-SO), due to the achiral sulfur atom

[157]. The oxidation of the methionine residues in proteins was shown to impair

enzymatic activity and protein function, and to be related to many pathologies [158-160].

As one of the defense mechanisms of the cell against oxidative damage, methionine

sulfoxide reductase (Msr) catalyzes the reduction of oxidized methionine (Met-SO) to

methionine in all three kingdoms of life. There are two classes of Msr proteins identified,

MsrA and MsrB, which specifically function to reduce S- or R- epimers, respectively.

Mice lacking msrA gene showed higher sensitivity to oxidative stress with 40% shorter

66

life span [161]. Also an increased resistance and viability is observed when MsrA and

MsrB are overexpressed in oxidative conditions [162-164].

Msr‟s are thought to originate from convergent evolutionary processes due to

their very similar cellular functions with highly divergent sequence and structure [165].

Extensive biochemical and biophysical studies revealed the catalytic mechanisms of both

MsrA [166] and MsrB [167], which are quite similar despite their divergent protein

sequences and folds. The proposed reaction mechanism involves the formation of the

sulfenic acid at the catalytic cysteine with concomitant release of reduced ligand. The

chemically modified enzyme is recycled by formation of intra-chain disulfide bond

between the catalytic cysteine (C115 in Bacillus subtilis MsrB) and the recycling cysteine

(C65 in B. subtilis MsrB), which is in turn reduced by thioredoxin (Trx) [166, 167].

Three dimensional (3D) structures of MsrB from different organisms have been

determined by X-ray crystallography (PDB ID: 3HCG, 3HCH, 3CEZ, 3HCI, 3HCJ,

1L1D, 3E0O and 3CXK). These structures provide structural information on reduced,

oxidized and ligand-bound states of the enzyme. The comparison of the crystal structures

among different organism indicates that the structure is highly conserved with less than

0.7 Å rmsd among structures from different organisms with an overall fold composed of

two β-sheets (seven β-strands) and two short helices at N- and C- terminal regions of the

proteins.

Here we present the solution NMR structure of MsrB from B. subtilis (Gene

name: msrB, Pfam id: PF01641, SWISS-PROT ID: MSRB_BACSU), as a part of North

East Structural Genomics Consortium (NESGC) project. Initial studies indicate that, due

67

to its molecular weight (16.6 kDa), which is on the larger end of the proteins regularly

studied by NMR, and dynamic character of the protein, standard protocols do not

provide sufficient data for high resolution structure determination. The solution structure

was finally calculated using recently developed techniques developed for solution

structure determination for higher molecular weight molecules using sparse constraints

from NMR data.

4.2 Methods and Materials

4.2.1 Protein Cloning, Expression and Purification

The msrB gene from B. subtilis was cloned into pET21 expression vector with C-

terminal hexa-His (LEHHHHHH) purification tag. The expression vector was then

transformed into codon enhanced BL21 (DE3) pMGK E. coli cells and cultured in MJ9

media. The [U-5%-13

C, 100%-15

N]-labeled samples were prepared using (15

NH4)2SO4

and [U-13

C]-D-glucose as the sole nitrogen and carbon sources in the culture medium

[168]. Perdeuterated triple-labeled (13

C,15

N,2H), with the methyl groups of Val, Leu, Ile

(1) selectively protonated, [U-2H,

13C,

15N;

1H-Ile-1,Leu-,Val-]-MsrB sample was

prepared using (5NH4)2SO4 and [U-

13C]-D-glucose with addition of [U-

13C4, 3,3-

2H2]-α-

ketobutyrate (50 mg/L), [U-13

C5, 3-2H]-α-ketoisovalerate (CIL Inc.) (100 mg/L) in D2O

medium [169]. Initial cell growth is carried out at 37°C and the protein expression was

induced by 1 mM isopropyl--D-thiogalactopyranoside (IPTG). Expressed proteins were

purified by two step purification consisting of HisTrap HP affinity chromatography

followed by HiLoad 26/60 Superdex 75 gel filtration chromatography using

68

AKTAexpress (GE Healthcare). Final samples for NMR experiments contain 0.8-1.0 mM

of protein in 20 mM MES buffer at pH 6.5, containing 0.02% NaN3, 10 mM DTT, 5 mM

CaCl2, 200 mM NaCL, 1x Protease Inhibitors, 10% D2O, and 50 µM DSS.

4.2.2 NMR data collection and structure calculation

All NMR data were collected at 25°C on Varian INOVA 600 and Bruker

AVANCE 800 NMR spectrometers, processed with NMRPipe [120], and visualized

using SPARKY [121]. The backbone N, HN, C‟, C

α and C

β, and ILV residues side chain

methyl assignments are used as it was reported earlier [170]. Stereospecific isopropyl

methyl assignments for all Val and Leu residues were deduced from characteristic cross-

peak fine structures in high resolution 2D 1H-

13C HSQC spectra of 5%-

13C, 100%-

15N

MsrB. Distance constraints for structure calculations were acquired from two 3D, 13

C-

and 15

N-edited NOESY spectra and four 3D HSQC-NOESY-HSQC type of spectra

(hCCH, hCNH, hNNH, hNCH) [171-173].

Measurement and Analysis of Residual Dipolar Coupling Data - MsrB was first

aligned in 13.3 mg/ml Pf1 phage medium (ASLA biotech) [174]. The one bond 1H-

15N

couplings for isotropic and aligned samples of MsrB were measured using an NH J-

modulation experiments [175].

Structure Calculations and Structure Quality Assessment - Initial structure

calculations were performed by Cyana 3.0 [129, 130], using peak intensities six NOESY

spectra, dihedral angle constraints computed by TALOS (ψ± 20°; φ ± 20°) [132], and

NH-HN RDC constraints. The 20 structures with lowest target function out of 100 in the

69

final cycle were further were refined by restrained molecular dynamics in explicit water

using CNS 1.2 [133, 134], using the final NOE derived distance constraints, TALOS

dihedral angle and RDC constraints. Final refined ensemble of 20 structures was

deposited into the Protein Data Bank (PDB ID: 2KZN). Structural statistics and global

structure quality factors, including Verify3D [135], ProsaII [136], PROCHECK[137], and

MolProbity [138] raw and statistical Z-scores, were computed using the PSVS 1.3

software package [139].

4.3 Results

4.3.1 Samples Used for Data Collection

Protein structure determination by NMR is mostly limited by the size of the

proteins. With increasing size of protein molecules the efficiency of magnetization

transfer reduces significantly, due to increased relaxation rate of magnetization in the

transverse plane [176], which also affects the resolution of the spectra. To reduce the

relaxation effects observed in larger proteins different deuteration schemes are

introduced, which expanded the size range of proteins for NMR studies [177-180]. Since

the gyromagnetic ratio of 2H is 6.5 fold less than that of

1H, the dipolar effects of

2H on

relaxation of the surrounding nuclei is much less than that of 1H. Hence, perdeuterated or

selectively deuterated samples provides substantial increase in the sensitivity and

resolution in NMR spectra.

MsrB from B. subtilis is a 141 residue protein with molecular weight of 16.6 kDa

is at the upper limit molecular weight for the proteins routinely studied by solution NMR

70

methods. Initial studies on structure determination using standard protocols on uniformly

13C,

15N labeled protein did not provide good quality data for assignment and structure

determination. Thus, 3D structural studies were carried on using triple-labeled with the

methyl groups of Val, Leu, Ile (1) selectively protonated, [U-2H,

13C,

15N;

1H-Ile-1,Leu-

,Val-]-MsrB (ILV) sample. Replacing majority of protons in a protein with deuterium

provides significant increase in the resolution and sensitivity of the data, as shown in

comparison of 2D projections of HNcaCO spectra collected for each sample, in Figure

4.1. The backbone and Ile, Lue and Val side-chain assignments and structure calculation

was made possible by use of a perdeuterated triple-labeled (13

C, 15

N, 2H) sample, with the

methyl groups of ILV labeled as 13

CH3.

71

Figure 4.1 Projection of HNcaCO spectra for (a) double-labeled (

13C,

15N)-sample and (b)

perdeuterated triple-labeld (13

C,15

N2H) sample of B. subtilis MsrB in 20 mM MES

buffer at pH 6.5, containing 0.02% NaN3, 10 mM DTT, 5 mM CaCl2, 100 mM NaCL, 1x

Protease Inhibitors, 10% D2O, and 50 µM DSS.

4.3.2 Chemical Shift Assignments

Although backbone 15

N and 13

C and Ile, Leu and Val 13

C and 1H methyl chemical

shift values have been determined previously (BMRB ID: 5619) [170], we first verified

and corrected these resonance assignments using standard triple-resonance and NOESY

72

NMR experiments, as described in Material and Methods section. The updated chemical

shift list, including the corrections and additional assignments to the existing list, was

used for the remainder of the study. These updated resonance assignments for B. subtilis

MsrB have been deposited in the BioMagResDataBase (BMRB ID: 17008).

In the final list 97.2% of C‟, 97.9% of Cα and C

β, 96.5% of H

N and N assignments

were complete. Complete side-chain assignments for methyl residues residues of Ile, Val

and Leu were determined using 13

C-edited NOESY data. Stereospecific assignments of

isopropyl methyl groups of Ile and Val residues (17 out of 17) determined based on high

resolution 13

C-HSQC spectra using a 100% 15

N-5%13

C labelled sample.

In the course of determining these resonance assignments, we observed two

distinct resonance frequencies for many of the residues in or near the active site of MsrB

(Figure 4.2). These multiple resonance frequencies for some atoms of MsrB indicate two

different conformational states of MsrB, in or near the enzymatic active site, which are in

equilibrium in solution. The peak intensities of the double peaks suggest that these states

reflect approximately 65% and 35% of the total population and will be referred to as

major and minor states in this text, respectively.

73

Figure 4.2 The 800 MHz

15N-HSQC spectrum for MsrB at 25 °C, the residues exhibiting

two different resonance frequencies are labeled in blue. Sample in 20 mM MES buffer at

pH 6.5, containing 0.02% NaN3, 10 mM DTT, 5 mM CaCl2, 100 mM NaCL, 1x Protease

Inhibitors, 10% D2O, and 50 µM DSS.

It was mentioned that the active site of MsrB includes two cysteine residues in

proximity to each other, which form a disulfide bond during the enzymatic recycling

process following catalysis. We therefore considered the possibility that the multiple

states observed for some resonances of the active site may be due to the different

oxidation states of these cysteine residues. However, no multiple resonance frequencies

are identified for any of the cysteines. Dilution with DTT, a disulfide reducing agent, did

not perturb the spectra significantly, or change the ratios of major to minor peaks,

suggesting that different oxidation states of these cysteine residues is not the basis of

74

these multiple states. Rather, the multiple states appear to be due to slow exchange

between two or more conformational states of the protein, which differ primarily in

structure in or near the active site.

The kind of slow exchange we observe for MsrB often is attributable to cis-trans

isomerization around the peptide bond preceding proline residues. We observe two

proline residues at the loop near the active site, at the center of the exchanging residues.

The greatest chemical shift difference between different states is observed around the

residue P109. The chemical shift values of Cβ of P109 in major and minor conformational

states are in agreement with the expected chemical shift values in trans and in cis

isomeric states, respectively [181] (Figure 4.3). Hence the observed multiple

conformational states observed near the active site are likely to be due to cis-trans

isomerization at residue P109.

75

Figure 4.3 Secondary chemical shift histograms for Cβ atoms in folded proteins for

Proline residues preceded by a trans (blue, left y-axis) and cis (red, right y-axis) peptide

bonds. The secondary chemical shift values observed for major and minor conformations

are indicated by stars (blue and red, respectively). (Figure adapted from [181])

4.3.3 The Solution structure of MsrB

Here we report the solution structure of the major conformational form of B.

subtilis MsrB determined on a perdeutereated (13

C,15

N,2H) sample with

13C-

1H labeled

methyl resonances of Ile(), Leu, and Val by „minimal constraint‟ approach. The

structure was studied using sparse constraints from backbone-backbone, backbone-

methyl and methyl-methyl distance constraints obtained from NOESY spectra. These

data basically provide distance constraints among backbone amides and side-chain

methyls from 5 Ile(), 10 Leu and 7 Val residues of MsrB.

76

The solution structure for this protein has been studied earlier in our lab using

sparse constraints under our automated protein fold determination studies [182]. That

study focused on rapid structure determination using sparse NMR constraints with

medium accuracy, which required minimum user interference. This study indicated that

highly automated structure calculation methods using sparse constraints would provide

sufficient information for determining the fold of the protein. However, specifically as

observed for MsrB, the quality scores for the final structures may be quite poor. Here, we

revisited the structure of MsrB, using recent structure calculation tools to improve the

quality of the final structure.

In structure calculations, a total of 622 NOE-derived constraints, 85 RDC

constraints for N-HN bond vectors, and 235 dihedral angle constraints derived from the

backbone chemical shift data using TALOS [132] were used. These correspond to 1.9

long-range restraining constraints per residue, and a total of 6 restricting constraints per

residue. Detailed structural statistics and quality scores are given in Table 4.1.

77

Table 4.1 Structure calculation statistics and quality scores for MsrBa

Conformationally-restricting constraints

Distance constraints

Total 622

intra-residue (i = j) 46

sequential (|i- j| = 1) 165

medium range (1 < |i – j| ≤ 5) 133

long range (|i – j| > 5) 278

distance constraints per residue 4.3

Dihedral angle constraints 235

Number of constraints per residue 6.0

Number of long range constraints per residue 1.9

Residual constraint violations

Average number of distance violations per structure

0.1 – 0.2 Å 30.3

0.2 – 0.5 Å 9.75

> 0.5 Å 0.35

average RMS distance violation / constraint (Å) 0.06 Å

maximum distance violation (Å) 1.85 Å

Average number of dihedral angle violations per structure

1 – 10° 28.65

> 10° 1.55

average RMS dihedral angle

violation / constraint (degree) 1.71º

maximum dihedral angle violation (degree) 20.70º

RMSD from average coordinates (Å) orderedb

backbone atoms 1.4 Å

heavy atoms 2.1 Å

Ramachandran statistics for ordered residues

(Richardson lab Molprobity)

most favored regions (%) 94.3%

additional allowed regions (%) 55.5%

disallowed regions (%) 0.2 %

Global quality scoresc Raw / Z-score

Verify3D 0.29 / -2.73

ProsaII 0.42 / -0.95

Procheck(phi-psi) -0.45 /-1.46

Procheck(all) -0.38 / -2.25

Molprobity clash 18.61 / 1.67

RDC statistics

85 Number of DNH contstraints

R 0.919±0.012

Qrms 0.251±019 aStructural statistics were computed for the ensemble of 20 deposited structures

bOrdered residue ranges [S(phi) + S(psi) > 1.8]: residues 4-27,31-33,37-59,64-80,84-

95,98-133,135-141 cCalculated based on PSVS v1.4 program

78

Figure 4.4 shows a ribbon representation of the solution structure bundle of MsrB,

where two anti-parallel β-sheets formed by β2-β1-β8 and β4-β5-β6-β7-β3, which were

flanked by one α-helix formed on N-terminal region and two α-helices on C-terminal

region of the protein. The root mean square deviation (rmsd) for all heavy atoms of the

protein is calculated as 2.1 Å. The described secondary structural elements are connected

by long loop regions, which make up approximately 50% of the protein. When only the

ordered residues are considered, the rmsd for the backbone reduces to 1.4 Å. The two β-

sheets form the stable core of the protein with 1.0 Å rmsd. This sparse-constraint solution

NMR structure of B. subtilis MsrB is quite similar to the X-ray crystal structures of

homologs that have been published previously [183, 184].

Figure 4.4 Solution NMR structure of major conformational state of fully-reduced MsrB;

the lowest energy ensemble of 20 structures (a) and the lowest energy structure (b) are

shown. The secondary structures are color coded, where the β-strands are shown in

yellow, α-helices are shown in red and loops and unstructured regions are shown in green

The catalytic cysteine (C115) is located in the β-sheet involving β3 through β7, on

strand β7, and the recycling cysteine is located in the β2-β3 loop, between strands β2 and

79

β3 and connecting the two sheets. The distance between the thiol groups of catalytic

cysteine (C115), located in strand β7, and the recycling cysteine (C62), located on the

loop β2 and β3 is 7.7±2.9 Å (Figure 4.5). As mentioned earlier, in order to recycle the

activity of MsrB following Met reduction, formation of a disulfide between C62 and

C115 is necessary. Since the average distance between the sulfur atoms engaged in a

disulfide bond is 2.05 Å, significantly shorter than the ~ 7.6 ± 3.0 Å distance observed in

this fully reduced form of the enzyme, the first step of recycling process requires a

conformational rearrangement in order to bring the β2-β3 loop closer to the active β-

sheet. The average distance between these groups in crystal structures were 3.8 Å, which

still requires conformational rearrangement for recycling.

Figure 4.5 Stereo image of the active site of MsrB, the catalytic and recycling cysteine

residues are shown in stick representation and labeled. The residues experiencing slow

exchange are colored in red.

80

4.3.4 MsrB Dynamics

We have also measured backbone 15

N nuclear relaxation data for MsrB, in order

to understand the internal dynamics of the protein. 1H-

15N-heteronuclear NOE (hetNOE)

and 15

N-relaxation relaxation rates were collected at 600 MHz at 25 °C and plotted in

Figure 4.6. Based on the 15

N nuclear relaxation data the rotational tumbling time for

MsrB is calculated as 13.8 ns, which is higher than expected (10.8 ns) for this size of

protein.

Figure 4.6 Backbone

15N relaxation measurements at 600 MHz, 25°C.

1H-

15N-

heteronuclear NOE and 15

N longitudinal (R1) and transverse (R2) relaxation rates are

shown. The secondary structures are shown at the top of the figure, location of catalytic

and recycling cysteines are shown by black arrows.

HetNOE measurements provide identification of backbone 15

N-1H amide sites

which have tumbling rates higher than the rate of tumbling of the overall molecule (c ~

13.8 ns at 25 °C). These constitute the highly flexible regions of the protein. In hetNOE

α1 α2 α3

β1 β2 β3 β4 β5 β6 β7 β6 α1 α2

81

measurements for some residues, especially in the loop regions, we observe a decreased

hetNOE values. Specifically backbone 15

N-1H amide sites of residues Q30, N31, H36 and

E38 between α1 and β1, residue S60 between β2 and β3, residues M85 and I86 between

β4 and β5, residues F103 and N110 near the catalytic site have improved flexibility on

the < 13.8 ns timescale. Additional to these residues, at the loop regions residue 15 at the

N-terminal helix experiences significantly reduced hetNOE value, indicating some

flexibility at this location in the helical structure.

The transverse relaxation rates (R2) provide information on dynamics on both the

< 13.8 ns and also on the µs-ms timescales. We observe high relaxation rates for residues

E72, at β4, and E74 and E76 at the loop connecting β3 and β4, indicating exchange

broadening due to intermediate timescale dynamics.

The enzyme plasticity due to thermal energy is the key to catalytic function of

enzymes. The link between the thermally driven dynamics at different timescales

observed in enzymes with the catalytic function has been established by NMR relaxation

studies. It has been observed for several cases that the conformational fluctuations

observed in substrate-free enzymes correlate with the fluctuations during catalysis and

often constitute the rate limiting step of the overall reaction [3, 4, 185-187].

The NMR relaxation studies and assignments suggest MsrB experiences

dynamics in multiple time-scales. Near the active site we observe slow conformational

fluctuations in NMR timescale, probably due to cis-trans isomerization at P109. These

fluctuations may be related to the reorientation of the loop carrying the recycling

cysteine, bringing two cycteines together, to start the recycling process. However, in the

82

crystal structure a significant difference between the sulfur groups in cis and trans

conformations is not observed. The role of cis-trans isomerization is subject to further

investigation.

There are two different forms of the MsrB enzyme, where a subset of MsrBs

include a C4 Zn-binding motif, corresponding to loops between β1-β2 and β5-β6, and

wider sequence variability for structural stability [188, 189]. The mutational studies

incorporating Zn-binding sites on an originally non-binding N. meningitides MsrB

increased its thermal stability but abolished its recycling by thioredoxin [189]. This

indicates that the general flexibility observed in MsrB, especially in the loop regions is

essential for continuous activity of the protein.

83

Figure 4.7 The solution structure of MsrB, the residues with low hetNOE values are

colored red, the residues with high R2 rates are colored blue.

4.3.5 X-ray Crystal Structure

As we were working on the solution structure of B. subtilis MsrB, an X-ray

crystal structure (2.6 Å resolution) was deposited in the Protein Data Bank (PDB ID:

3E0O) [190]. The backbone superimposition of this crystal structure and minimum

energy model from our sparse-constraint NMR studies is shown in ribbon representation

in Figure 4.8.

84

Figure 4.8 The superimposition of 2.6 Å X-ray crystal structure (PDB ID: 3E0O) (green)

and sparse-constraint NMR (blue) structures for MsrB from B. subtilis.

The general fold obtained is very similar (2.3 Å backbone rmsd), while the core

formed by two β sheets shows little deviation among two structures with an rmsd of 1.2

Å. The majority of the difference between the crystal structure and the lowest energy

model occurs at the loop regions, where the NMR structure was loosely defined due to

lack of methyl-methyl contacts; in some cases these loops appear to be flexible in

solution, based on 15

N relaxation data described above.

However a significant difference is observed between structures from different

methods in the N-terminal helix, consisting of residues 4-23. The X-ray crystal structure

exhibits a kink in the helix, around residue L12, which results in two short helices. A

85

kink is also observed in the corresponding helices of homologues with known structures

[183, 184]. In the sparse-constraint NMR structure, we observe a single continuous helix

for this region of the protein.

Chemical shifts are very sensitive to the local environment, including both

covalent and non-covalent structures, and provide valuable information on secondary

structures in proteins [191, 192]. The chemical shift values observed for residues within a

secondary structure deviates from the random coil values [193]. For example, the Cα

chemical shift values experience on the average 2.6 ppm downfield shift in an α-helical

structure, and 1.4 ppm upfield shift in β-sheet structure. The chemical shift index, the

variation of the chemical shifts with respect to the random coil values, for the Cα

atoms at

the N-terminal region is shown in Figure 4.9. The hetNOE data also suggest some

flexibility around the location where the kink was observed. While the chemical shift data

and hetNOE data are support discontinuity in the helix around residue L12, 15

N-1H RDC

data correlates with the NMR (correlation coefficient = 0.978) structure at this region

better than X-ray structure (correlation coefficient = 0.871). This discrepancy may be

due to ensemble averaging on the measured RDC values, which may not reflect the actual

orientation of the bond vectors.

86

-3

-2

-1

0

1

2

3

4

5

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Cα

CS

I

Residue number

Figure 4.9 Secondary shifts of Cα atoms for N-terminal residues. These data are

consistent with a kink in the helix at residues 10 – 11. The helical regions as observed in

NMR and X-ray structures are shown above.

4.4 Discussion and Conclusions

The solution NMR structure of MsrB from B. subtilis has been determined using a

“sparse constraint” approach; using a perdeuterated (13

C15

N2H) enriched protein and

distance constraints among backbone amide and sidechain methyls from residues Ile, Val,

and Leu only. The overall structure forms two antiparallel β-sheets with α-helices at both

terminal regions where the secondary structures are connected by long loop regions, very

similar to the structures observed for the homologs from different organisms.

During our studies, and subsequent to completing the NMR structure, an

independently-determined X-ray crystal structure was released. The comparison of these

structures indicates that while the sparse constraint structure generally agrees with the

high-resolution X-ray crystal structure, some differences are observed, mainly around the

loop regions, which are within the uncertainty of our NMR structure. The relaxation

NMR

X-ray X-ray

87

experiments also suggest that these loop regions are flexible. Additional to the loop

regions, a discrepancy was observed for the helix at the N-terminal region, where a kink

at residue L12 breaks the helix into two short helices in the crystal structure. Our analysis

indicated that one long helix, as observed in solution structure, is highly correlated with

the RDC data, however 13

C chemical shift data indicate some distortions from α-helical

structure near residues 12-15. The hetNOE data also indicate some backbone flexibility

in the middle of this helix. The comparison of X-ray and NMR structures with the

NOESY data used in the calculations indicated that the sparse distance constraints

obtained on this perdeuterated sample cannot distinguish between these two

conformations at the N-terminal helix. The RDC constraints included in the calculations

may be the solution ensemble averaged values, which may not reflect the actual state of

the NH-bond vectors. These results suggest that while the sparse constraint approach is

valuable for determining overall folds of proteins more work is needed to make this

method reliable for defining details of backbone structure and core sidechain packing.

For example, inclusion of sophisticated energy force fields, such as the Rosetta force

field, may be needed in order to obtain more accurate structures of proteins like MsrB in

the 15 – 30 kDa size range which can be addressed by sparse constraints methods.

The multiple resonances observed for a set of residues indicate that the protein

exists in two different states which differ in conformation around the enzyme active site.

The sparse constraint structure presented here corresponds to the major (trans isomeric

state; ~65% population) conformational state, determined based on the corresponding

assignments, NOE contacts and RDC data. Though it was possible to determine backbone

88

resonance assignments for the minor conformer, efforts to compute structures did not

yield a well-defined conformation around the active site due to lack of NOE contacts and

RDC data corresponding to the minor state peaks. Again, this kind of problem may be

best addressed by combining these sparse constraints, and particularly the chemical shift

data obtained for backbone resonances, with more sophisticated energy force fields, such

as those provided by programs like Rosetta [194-197].

Based on the reducing agent dilution studies it was concluded that the multiple

conformational states do not arise due to different oxidation states of the active cysteine

residues. Another explanation would be different isomeric states of the peptide bonds

preceding proline residues. At the loop region connecting β2 and β3, where most of the

double resonances are observed, two proline residues are located, P107 and P109. The

largest chemical shift differences are observed around residue P109 and the Cβ chemical

shifts measure at P109 at two different states support cis-trans isomerization at this site,

trans state being the major conformational state[181]. In the X-ray crystal structure of

this protein in four out of six chains in the asymmetric unit cis-isomeric state is observed

at P109. Also in solution NMR structure of a Zn-binding homolog is reported to exhibits

cis-isomeric state on a proline corresponding to P109 in structural comparison,

supporting our findings.

The relaxation studies also indicated that MsrB experiences structural flexibility

at different timescales. Increasing thermal stability by incorporation of Zn-binding sites

abolished the recycling by thioredoxin, indicating that the flexibility is essential in

continuous activity of the protein. The detailed analysis for understanding exactly what

89

kind of dynamics is related to the thiredoxin-recycling step is subject to further

investigation.

90

REFERENCES

1. Mandel, A. M., Akke, M., and Palmer, A. G., 3rd. (1995) Backbone dynamics of

Escherichia coli ribonuclease HI: correlations with structure and function in an

active enzyme. J Mol Biol. 246, 144-163.

2. Beach, H., Cole, R., Gill, M. L., and Loria, J. P. (2005) Conservation of mus-ms

enzyme motions in the apo- and substrate-mimicked state. J Am Chem Soc. 127,

9167-9176.

3. Volkman, B. F., Lipson, D., Wemmer, D. E., and Kern, D. (2001) Two-state

allosteric behavior in a single-domain signaling protein. Science. 291, 2429-2433.

4. Eisenmesser, E. Z., Millet, O., Labeikovsky, W., Korzhnev, D. M., Wolf-Watz,

M., Bosco, D. A., Skalicky, J. J., Kay, L. E., and Kern, D. (2005) Intrinsic

dynamics of an enzyme underlies catalysis. Nature. 438, 117-121.

5. Xie, H., Vucetic, S., Iakoucheva, L. M., Oldfield, C. J., Dunker, A. K., Uversky,

V. N., and Obradovic, Z. (2007) Functional anthology of intrinsic disorder. 1.

Biological processes and functions of proteins with long disordered regions. J

Proteome Res. 6, 1882-1898.

6. Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M., and Obradovic, Z.

(2002) Intrinsic disorder and protein function. Biochemistry. 41, 6573-6582.

7. Dunker, A. K., and Obradovic, Z. (2001) The protein trinity-linking function and

disorder. Nat Biotechnol. 19, 805-806.

8. Zhou, H.-X. (2001) The Affinity-Enhancing Roles of Flexible Linkers in Two-

Domain DNA-Binding Proteins. Biochemistry. 40, 15069-15073.

9. Gokhale, R. S., and Khosla, C. (2000) Role of linkers in communication between

protein modules. Curr Opin Chem Biol. 4, 22-27.

10. Ezeokonkwo, C., Zhelkovsky, A., Lee, R., Bohm, A., and Moore, C. L. (2011) A

flexible linker region in Fip1 is needed for efficient mRNA polyadenylation.

RNA. 17, 652-664.

11. Fong, J. H., and Panchenko, A. R. (2010) Intrinsic disorder and protein

multibinding in domain, terminal, and linker regions. Mol BioSyst. 6, 1821-1828.

12. Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P., Oh, J.

S., Oldfield, C. J., Campen, A. M., Ratliff, C. M., Hipps, K. W., Ausio, J., Nissen,

M. S., Reeves, R., Kang, C., Kissinger, C. R., Bailey, R. W., Griswold, M. D.,

Chiu, W., Garner, E. C., and Obradovic, Z. (2001) Intrinsically disordered

protein. J Molr Graph Model. 19, 26-59.

91

13. Dyson, H. J., and Wright, P. E. (2005) Intrinsically unstructured proteins and their

functions. Nat Rev Mol Cell Biol. 6, 197-208.

14. Wright, P. E., and Dyson, H. J. (1999) Intrinsically unstructured proteins: re-

assessing the protein structure-function paradigm. J Mol Biol. 293, 321-331.

15. Dunker, A. K., Obradovic, Z., Romero, P., Garner, E. C., and Brown, C. J. (2000)

Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop

Genome Inform. 11, 161-171.

16. Dunker, A. K., Cortese, M. S., Romero, P., Iakoucheva, L. M., and Uversky, V.

N. (2005) Flexible nets. The roles of intrinsic disorder in protein interaction

networks. FEBS J. 272, 5129-5148.

17. Gaur, R. K., Kupper, M. B., Fischer, R., and Hoffmann, K. M. V. (2004)

Preliminary X-ray analysis of a human VH fragment at 1.8 A resolution. Acta

Crystallographica Section D. 60, 965-967.

18. Hoedemaeker, F. J., Signorelli, T., Johns, K., Kuntz, D. A., and Rose, D. R.

(1997) A single chain Fv fragment of P-glycoprotein-specific monoclonal

antibody C219. Design, expression, and crystal structure at 2.4 A resolution. J

Biol Chem. 272, 29784-29789.

19. Cohen, S. L., Ferre-D'Amare, A. R., Burley, S. K., and Chait, B. T. (1995)

Probing the solution structure of the DNA-binding protein Max by a combination

of proteolysis and mass spectrometry. Protein Sci. 4, 1088-1099.

20. Hamuro, Y., Coales, S. J., Southern, M. R., Nemeth-Cawley, J. F., Stranz, D. D.,

and Griffin, P. R. (2003) Rapid analysis of protein structure and dynamics by

hydrogen/deuterium exchange mass spectrometry. J Biomol Tech. 14, 171-182.

21. Pantazatos, D., Kim, J. S., Klock, H. E., Stevens, R. C., Wilson, I. A., Lesley, S.

A., and Woods, V. L., Jr. (2004) Rapid refinement of crystallographic protein

construct definition employing enhanced hydrogen/deuterium exchange MS. Proc

Natl Acad Sci U S A. 101, 751-756.

22. Spraggon, G., Pantazatos, D., Klock, H. E., Wilson, I. A., Woods, V. L., Jr., and

Lesley, S. A. (2004) On the use of DXMS to produce more crystallizable proteins:

structures of the T. maritima proteins TM0160 and TM1171. Protein Sci. 13,

3187-3199.

23. Donne, D. G., Viles, J. H., Groth, D., Mehlhorn, I., James, T. L., Cohen, F. E.,

Prusiner, S. B., Wright, P. E., and Dyson, H. J. (1997) Structure of the

recombinant full-length hamster prion protein PrP(29-231): The N terminus is

highly flexible. P Natl Acad Sci USA. 94, 13452-13457.

92

24. Riek, R., Hornemann, S., Wider, G., Glockshuber, R., and Wuthrich, K. (1997)

NMR characterization of the full-length recombinant murine prion protein,

mPrP(23-231). FEBS Lett. 413, 282-288.

25. Yao, J., Dyson, H. J., and Wright, P. E. (1997) Chemical shift dispersion and

secondary structure prediction in unfolded and partly folded proteins. FEBS Lett.

419, 285-289.

26. Sharma, S., Zheng, H., Huang, Y. J., Ertekin, A., Hamuro, Y., Rossi, P., Tejero,

R., Acton, T. B., Xiao, R., Jiang, M., Zhao, L., Ma, L. C., Swapna, G. V.,

Aramini, J. M., and Montelione, G. T. (2009) Construct optimization for protein

NMR structure analysis using amide hydrogen/deuterium exchange mass

spectrometry. Proteins. 76, 882-894.

27. Huang, Y. http://www-nmr.cabm.rutgers.edu/bioinformatics/disorder/.

28. MacCallum. DRIP-PRED CASP 6 meeting. Online paper.

29. Prilusky, J., Felder, C. E., Zeev-Ben-Mordehai, T., Rydberg, E. H., Man, O.,

Beckmann, J. S., Silman, I., and Sussman, J. L. (2005) FoldIndex: a simple tool to

predict whether a given protein sequence is intrinsically unfolded. Bioinformatics.

21, 3435-3438.

30. Linding, R., Jensen, L. J., Diella, F., Bork, P., Gibson, T. J., and Russell, R. B.

(2003) Protein disorder prediction: implications for structural proteomics.

Structure. 11, 1453-1459.

31. Linding, R., Russell, R. B., Neduva, V., and Gibson, T. J. (2003) GlobPlot:

exploring protein sequences for globularity and disorder. Nucleic Acids Res. 31,

3701-3708.

32. Ward, J. J., Sodhi, J. S., McGuffin, L. J., Buxton, B. F., and Jones, D. T. (2004)

Prediction and functional analysis of native disorder in proteins from the three

kingdoms of life. J Mol Biol. 337, 635-645.

33. Cheng, J. (2007) DOMAC: an accurate, hybrid protein domain prediction server.

Nucleic Acids Res. 35, W354-356.

34. Cheng, J., Sweredoski, M. J., and Baldi, P. (2005) Accurate prediction of protein

disordered regions by mining protein structure data. Data Min Knowl Discov. 11,

213-222.

35. Dosztanyi, Z., Csizmok, V., Tompa, P., and Simon, I. (2005) IUPred: web server

for the prediction of intrinsically unstructured regions of proteins based on

estimated energy content. Bioinformatics. 21, 3433-3434.

93

36. Yang, Z. R., Thomson, R., McNeil, P., and Esnouf, R. M. (2005) RONN: the bio-

basis function neural network technique applied to the detection of natively

disordered regions in proteins. Bioinformatics. 21, 3369-3376.

37. Galzitskaya, O. V., Garbuzynskiy, S. O., and Lobanov, M. Y. (2006) FoldUnfold:

web server for the prediction of disordered regions in protein chain.

Bioinformatics. 22, 2948-2949.

38. Peng, K., Radivojac, P., Vucetic, S., Dunker, A. K., and Obradovic, Z. (2006)

Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics. 7,

208.

39. Bai, Y., Milne, J. S., Mayne, L., and Englander, S. W. (1993) Primary structure

effects on peptide group hydrogen exchange. Proteins. 17, 75-86.

40. Zhang, Z., and Smith, D. L. (1993) Determination of amide hydrogen exchange

by mass spectrometry: a new tool for protein structure elucidation. Protein Sci. 2,

522-531.

41. Hamuro, Y., Wong, L., Shaffer, J., Kim, J. S., Stranz, D. D., Jennings, P. A.,

Woods, V. L., Jr., and Adams, J. A. (2002) Phosphorylation driven motions in the

COOH-terminal Src kinase, CSK, revealed through enhanced hydrogen-

deuterium exchange and mass spectrometry (DXMS). J Mol Biol. 323, 871-881.

42. Acton, T. B., Gunsalus, K. C., Xiao, R., Ma, L. C., Aramini, J., Baran, M. C.,

Chiang, Y. W., Climent, T., Cooper, B., Denissova, N. G., Douglas, S. M.,

Everett, J. K., Ho, C. K., Macapagal, D., Rajan, P. K., Shastry, R., Shih, L. Y.,

Swapna, G. V., Wilson, M., Wu, M., Gerstein, M., Inouye, M., Hunt, J. F., and

Montelione, G. T. (2005) Robotic cloning and Protein Production Platform of the

Northeast Structural Genomics Consortium. Methods Enzymol. 394, 210-243.

43. Monleon, D., Chiang, Y., Aramini, J. M., Swapna, G. V., Macapagal, D.,

Gunsalus, K. C., Kim, S., Szyperski, T., and Montelione, G. T. (2004) Backbone

1H, 15N and 13C assignments for the 21 kDa Caenorhabditis elegans homologue

of "brain-specific" protein. J Biomol NMR. 28, 91-92.

44. Jansson, M., Li, Y. C., Jendeberg, L., Anderson, S., Montelione, B. T., and

Nilsson, B. (1996) High-level production of uniformly 15N- and 13C-enriched

fusion proteins in Escherichia coli. J Biomol NMR. 7, 131-141.

45. Rost, B., and Sander, C. (1993) Improved prediction of protein secondary

structure by use of sequence profiles and neural networks. Proc Natl Acad Sci U S

A. 90, 7558-7562.

46. Jones, D. T. (1999) Protein secondary structure prediction based on position-

specific scoring matrices. J Mol Biol. 292, 195-202.

94

47. Lanman, J., Lam, T. T., Barnes, S., Sakalian, M., Emmett, M. R., Marshall, A. G.,

and Prevelige, P. E., Jr. (2003) Identification of novel interactions in HIV-1

capsid protein assembly by high-resolution mass spectrometry. J Mol Biol. 325,

759-772.

48. Busenlehner, L. S., and Armstrong, R. N. (2005) Insights into enzyme structure

and dynamics elucidated by amide H/D exchange mass spectrometry. Arch

Biochem Biophys. 433, 34-46.

49. Hamuro, Y., Molnar, K. S., Coales, S. J., OuYang, B., Simorellis, A. K., and

Pochapsky, T. C. (2008) Hydrogen-deuterium exchange mass spectrometry for

investigation of backbone dynamics of oxidized and reduced cytochrome

P450cam. J Inorg Biochem. 102, 364-370.

50. Roberts, A. B., and Sporn, M. B., (Eds.) (1990) Peptide Growth Factors and

Their Receptors, Vol. 1, Springer, Heidelberg.

51. Shi, Y., and Massague, J. (2003) Mechanisms of TGF-beta signaling from cell

membrane to the nucleus. Cell. 113, 685-700.

52. Liu, F. (2003) Receptor-Regulated Smads in TGF-beta Signaling. Frontiers in

Bioscience. 8, 1280-1303.

53. Derynck, R., and Zhang, Y. E. (2003) Smad-dependent and Smad-independent

pathways in TGF-beta family signalling. Nature. 425, 577-584.

54. Heldin, C. H., and Moustakas, A. (2011) Role of Smads in TGFbeta signaling.

Cell Tissue Res.

55. Shi, Y., Wang, Y. F., Jayaraman, L., Yang, H., Massague, J., and Pavletich, N. P.

(1998) Crystal structure of a Smad MH1 domain bound to DNA: insights on DNA

binding in TGF-beta signaling. Cell. 94, 585-594.

56. Yakymovych, I., Ten Dijke, P., Heldin, C. H., and Souchelnytskyi, S. (2001)

Regulation of Smad signaling by protein kinase C. FASEB J. 15, 553-555.

57. Guo, X., Ramirez, A., Waddell, D. S., Li, Z., Liu, X., and Wang, X. F. (2008)

Axin and GSK3- control Smad3 protein stability and modulate TGF- signaling.

Genes Dev. 22, 106-120.

58. Wicks, S. J., Lui, S., Abdel-Wahab, N., Mason, R. M., and Chantry, A. (2000)

Inactivation of smad-transforming growth factor beta signaling by Ca(2+)-

calmodulin-dependent protein kinase II. Mol Cell Biol. 20, 8103-8111.

59. Tsukazaki, T., Chiang, T. A., Davison, A. F., Attisano, L., and Wrana, J. L.

(1998) SARA, a FYVE Domain Protein that Recruits Smad2 to the TGF²

Receptor. Cell. 95, 779-791.

95

60. Vasilaki, E., Siderakis, M., Papakosta, P., Skourti-Stathaki, K., Mavridou, S., and

Kardassis, D. (2009) Novel regulation of Smad3 oligomerization and DNA

binding by its linker domain. Biochemistry. 48, 8366-8378.

61. Wang, G., Long, J., Matsuura, I., He, D., and Liu, F. (2005) The Smad3 linker

region contains a transcriptional activation domain. Biochem J. 386, 29-34.

62. Prokova, V., Mavridou, S., Papakosta, P., and Kardassis, D. (2005)

Characterization of a novel transcriptionally active domain in the transforming

growth factor beta-regulated Smad3 protein. Nucleic Acids Res. 33, 3708-3721.

63. Matsuura, I., Denissova, N. G., Wang, G., He, D., Long, J., and Liu, F. (2004)

Cyclin-dependent kinases regulate the antiproliferative function of Smads.

Nature. 430, 226-231.

64. Wang, G., Matsuura, I., He, D., and Liu, F. (2009) Transforming growth factor-

{beta}-inducible phosphorylation of Smad3. J Biol Chem. 284, 9663-9673.

65. Matsuzaki, K., Kitano, C., Murata, M., Sekimoto, G., Yoshida, K., Uemura, Y.,

Seki, T., Taketani, S., Fujisawa, J., and Okazaki, K. (2009) Smad2 and Smad3

phosphorylated at both linker and COOH-terminal regions transmit malignant

TGF-beta signal in later stages of human colorectal cancer. Cancer Res. 69, 5321-

5330.

66. Alarcon, C., Zaromytidou, A. I., Xi, Q., Gao, S., Yu, J., Fujisawa, S., Barlas, A.,

Miller, A. N., Manova-Todorova, K., Macias, M. J., Sapkota, G., Pan, D., and

Massague, J. (2009) Nuclear CDKs drive Smad transcriptional activation and

turnover in BMP and TGF-beta pathways. Cell. 139, 757-769.

67. Kretzschmar, M., Doody, J., Timokhina, I., and Massague, J. (1999) A

mechanism of repression of TGFbeta/ Smad signaling by oncogenic Ras. Genes

Dev. 13, 804-816.

68. Matsuura, I., Wang, G., He, D., and Liu, F. (2005) Identification and

characterization of ERK MAP kinase phosphorylation sites in Smad3.

Biochemistry. 44, 12546-12553.

69. Mori, S., Matsuzaki, K., Yoshida, K., Furukawa, F., Tahashi, Y., Yamagata, H.,

Sekimoto, G., Seki, T., Matsui, H., Nishizawa, M., Fujisawa, J., and Okazaki, K.

(2004) TGF-beta and HGF transmit the signals through JNK-dependent Smad2/3

phosphorylation at the linker regions. Oncogene. 23, 7416-7429.

70. Yamagata, H., Matsuzaki, K., Mori, S., Yoshida, K., Tahashi, Y., Furukawa, F.,

Sekimoto, G., Watanabe, T., Uemura, Y., Sakaida, N., Yoshioka, K., Kamiyama,

Y., Seki, T., and Okazaki, K. (2005) Acceleration of Smad2 and Smad3

phosphorylation via c-Jun NH(2)-terminal kinase during human colorectal

carcinogenesis. Cancer Res. 65, 157-165.

96

71. Sekimoto, G., Matsuzaki, K., Yoshida, K., Mori, S., Murata, M., Seki, T., Matsui,

H., Fujisawa, J., and Okazaki, K. (2007) Reversible Smad-dependent signaling

between tumor suppression and oncogenesis. Cancer Res. 67, 5090-5096.

72. Furukawa, F., Matsuzaki, K., Mori, S., Tahashi, Y., Yoshida, K., Sugano, Y.,

Yamagata, H., Matsushita, M., Seki, T., Inagaki, Y., Nishizawa, M., Fujisawa, J.,

and Inoue, K. (2003) p38 MAPK mediates fibrogenic signal through Smad3

phosphorylation in rat myofibroblasts. Hepatology. 38, 879-889.

73. Kamaraju, A. K., and Roberts, A. B. (2005) Role of Rho/ROCK and p38 MAP

kinase pathways in transforming growth factor-beta-mediated Smad-dependent

growth inhibition of human breast carcinoma cells in vivo. J Biol Chem. 280,

1024-1036.

74. Millet, C., Yamashita, M., Heller, M., Yu, L. R., Veenstra, T. D., and Zhang, Y.

E. (2009) A negative feedback control of transforming growth factor-beta

signaling by glycogen synthase kinase 3-mediated Smad3 linker phosphorylation

at Ser-204. J Biol Chem. 284, 19808-19816.

75. Cohen-Solal, K. A., Merrigan, K. T., Chan, J. L., Goydos, J. S., Chen, W., Foran,

D. J., Liu, F., Lasfar, A., and Reiss, M. (2011) Constitutive Smad linker

phosphorylation in melanoma: a mechanism of resistance to transforming growth

factor-beta-mediated growth inhibition. Pigment Cell Melanoma Res. 24, 512-

524.

76. Ho, J., Cocolakis, E., Dumas, V. M., Posner, B. I., Laporte, S. A., and Lebrun, J.-

J. (2005) The G protein-coupled receptor kinase-2 is a TGF[beta]-inducible

antagonist of TGF-beta signal transduction. EMBO J. 24, 3247-3258.

77. Ho, J., Chen, H., and Lebrun, J.-J. (2007) Novel dominant negative Smad

antagonists to TGF-beta signaling. Cellular Signalling. 19, 1565-1574.

78. Liu, F., and Matsuura, I. (2005) Inhibition of Smad antiproliferative function by

CDK phosphorylation. Cell Cycle. 4, 63-66.

79. Liu, F. (2006) Smad3 phosphorylation by cyclin-dependent kinases. Cytokine

Growth Factor Rev. 17, 9-17.

80. Wrighton, K. H., and Feng, X. H. (2008) To TGF-beta or not to TGF-beta: fine-

tuning of Smad signaling via post-translational modifications. Cell Signal. 20,

1579-1591.

81. Wrighton, K. H., Lin, X., and Feng, X. H. (2009) Phospho-control of TGF-beta

superfamily signaling. Cell Res. 19, 8-20.

82. Matsuzaki, K. (2011) Smad phosphoisoform signaling specificity: the right place

at the right time. Carcinogenesis.

97

83. Gao, S., Alarcon, C., Sapkota, G., Rahman, S., Chen, P. Y., Goerner, N., Macias,

M. J., Erdjument-Bromage, H., Tempst, P., and Massague, J. (2009) Ubiquitin

ligase Nedd4L targets activated Smad2/3 to limit TGF-beta signaling. Mol Cell.

36, 457-468.

84. Qin, B. Y., Lam, S. S., Correia, J. J., and Lin, K. (2002) Smad3 allostery links

TGF-beta receptor kinase activation to transcriptional control. Genes &

Development. 16, 1950-1963.

85. Chacko, B. M., Qin, B. Y., Tiwari, A., Shi, G., Lam, S., Hayward, L. J., De

Caestecker, M., and Lin, K. (2004) Structural basis of heteromeric smad protein

assembly in TGF-beta signaling. Mol Cell. 15, 813-823.

86. Englander, S. W., Sosnick, T. R., Englander, J. J., and Mayne, L. (1996)

Mechanisms and uses of hydrogen exchange. Curr Opin Struct Biol. 6, 18-23.

87. Wales, T. E., and Engen, J. R. (2006) Hydrogen exchange mass spectrometry for

the analysis of protein dynamics. Mass Spectrom Rev. 25, 158-170.

88. Sharma, S., Zheng, H., Huang, Y. J., Ertekin, A., Hamuro, Y., Rossi, P., Tejero,

R., Acton, T. B., Xiao, R., Jiang, M., Zhao, L., Ma, L.-C., Swapna, G. V. T.,

Aramini, J. M., and Montelione, G. T. (2009) Construct optimization for protein

NMR structure analysis using amide hydrogen/deuterium exchange mass

spectrometry. Proteins: Structure, Function, and Bioinformatics. 9999, NA.

89. Raghunathan, S., Chandross, R. J., Cheng, B. P., Persechini, A., Sobottka, S. E.,

and Kretsinger, R. H. (1993) The linker of des-Glu84-calmodulin is bent. Proc

Natl Acad Sci U S A. 90, 6869-6873.

90. Srisodsuk, M., Reinikainen, T., Penttilä, M., and Teeri, T. T. (1993) Role of the

interdomain linker peptide of Trichoderma reesei cellobiohydrolase I in its

interaction with crystalline cellulose. J Biol Chem. 268, 20756-20761.

91. Yuzawa, S., Yokochi, M., Hatanaka, H., Ogura, K., Kataoka, M., Miura, K.-i.,

Mandiyan, V., Schlessinger, J., and Inagaki, F. (2001) Solution structure of Grb2

reveals extensive flexibility necessary for target recognition. J Mol Biol. 306, 527-

537.

92. Buckler, D. R., Zhou, Y., and Stock, A. M. (2002) Evidence of intradomain and

interdomain flexibility in an OmpR/PhoB homolog from Thermotoga maritima.

Structure. 10, 153-164.

93. Liu, J., and Nussinov, R. (2010) Rbx1 Flexible Linker Facilitates Cullin-RING

Ligase Function Before Neddylation and After Deneddylation. Biophys J. 99,

736-744.

98

94. Cooley, A., Zelivianski, S., and Jeruss, J. S. (2010) Impact of cyclin E

overexpression on Smad3 activity in breast cancer cell lines. Cell Cycle. 9, 4900-

4907.

95. Zelivianski, S., Cooley, A., Kall, R., and Jeruss, J. S. (2010) Cyclin-dependent

kinase 4-mediated phosphorylation inhibits Smad3 activity in cyclin d-

overexpressing breast cancer cells. Mol Cancer Res. 8, 1375-1387.

96. Liu, F. (2011) Inhibition of Smad3 activity by cyclin D-CDK4 and cyclin E-

CDK2 in breast cancer cells. Cell Cycle. 10, 186.

97. Matsuura, I., Chiang, K. N., Lai, C. Y., He, D., Wang, G., Ramkumar, R., Uchida,

T., Ryo, A., Lu, K., and Liu, F. (2010) Pin1 promotes transforming growth factor-

beta-induced migration and invasion. J Biol Chem. 285, 1754-1764.

98. Finn, G., and Lu, K. P. (2008) Phosphorylation-specific prolyl isomerase Pin1 as

a new diagnostic and therapeutic target for cancer. Curr Cancer Drug Targets. 8,

223-229.

99. Xu, G. G., and Etzkorn, F. A. (2009) Pin1 as an anticancer drug target. Drug

News Perspect. 22, 399-407.

100. Aragon, E., Goerner, N., Zaromytidou, A. I., Xi, Q., Escobedo, A., Massague, J.,

and Macias, M. J. (2011) A Smad action turnover switch operated by WW

domain readers of a phosphoserine code. Genes Dev. 25, 1275-1288.

101. Todd, R., McBride, J., Tsuji, T., Donoff, R. B., Nagai, M., Chou, M. Y., Chiang,

T., and Wong, D. T. (1995) Deleted in oral cancer-1 (doc-1), a novel oral tumor

suppressor gene. FASEB J. 9, 1362-1370.

102. Tsuji, T., Duh, F. M., Latif, F., Popescu, N. C., Zimonjic, D. B., McBride, J.,

Matsuo, K., Ohyama, H., Todd, R., Nagata, E., Terakado, N., Sasaki, A.,

Matsumura, T., Lerman, M. I., and Wong, D. T. (1998) Cloning, mapping,

expression, function, and mutation analyses of the human ortholog of the hamster

putative tumor suppressor gene Doc-1. J Biol Chem. 273, 6704-6709.

103. Daigo, Y., Suzuki, K., Maruyama, O., Miyoshi, Y., Yasuda, T., Kabuto, T.,

Imaoka, S., Fujiwara, T., Takahashi, E., Fujino, M. A., and Nakamura, Y. (1997)

Isolation, mapping and mutation analysis of a human cDNA homologous to the

doc-1 gene of the Chinese hamster, a candidate tumor suppressor for oral cancer.

Genes Chromosomes Cancer. 20, 204-207.

104. Yuan, Z., Sotsky Kent, T., and Weber, T. K. (2003) Differential expression of

DOC-1 in microsatellite-unstable human colorectal cancer. Oncogene. 22, 6304-

6310.

99

105. Choi, M. G., Sohn, T. S., Park, S. B., Paik, Y. H., Noh, J. H., Kim, K. M., Park,

C. K., and Kim, S. (2009) Decreased expression of p12 is associated with more

advanced tumor invasion in human gastric cancer tissues. Eur Surg Res. 42, 223-

229.

106. Figueiredo, M. L., Kim, Y., St John, M. A., and Wong, D. T. (2005) p12CDK2-

AP1 gene therapy strategy inhibits tumor growth in an in vivo mouse model of

head and neck cancer. Clin Cancer Res. 11, 3939-3948.

107. Cwikla, S. J., Tsuji, T., McBride, J., Wong, D. T., and Todd, R. (2000) doc-1--

mediated apoptosis in malignant hamster oral keratinocytes. J Oral Maxillofac

Surg. 58, 406-414.

108. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R. C., and Melton, D. A.

(2002) "Stemness": transcriptional profiling of embryonic and adult stem cells.

Science. 298, 597-600.

109. Kim, Y., McBride, J., Kimlin, L., Pae, E. K., Deshpande, A., and Wong, D. T.

(2009) Targeted inactivation of p12, CDK2 associating protein 1, leads to early

embryonic lethality. PLoS One. 4, e4518.

110. Wang, T. S.-F. (1996) DNA Replication in Eukaryotic Cells, Vol. 31, Cold Spring

Harbor Laboratory Press, Plainview, N.Y.

111. Matsuo, K., Shintani, S., Tsuji, T., Nagata, E., Lerman, M., McBride, J.,

Nakahara, Y., Ohyama, H., Todd, R., and Wong, D. T. (2000) p12(DOC-1), a

growth suppressor, associates with DNA polymerase alpha/primase. FASEB J. 14,

1318-1324.

112. Buajeeb, W., Zhang, X., Ohyama, H., Han, D., Surarit, R., Kim, Y., and Wong, D.

T. (2004) Interaction of the CDK2-associated protein-1, p12(DOC-1/CDK2AP1),

with its homolog, p14(DOC-1R). Biochem Biophys Res Commun. 315, 998-1003.

113. Terret, M. E., Lefebvre, C., Djiane, A., Rassinier, P., Moreau, J., Maro, B., and

Verlhac, M.-H. (2003) DOC1R: a MAP kinase substrate that control microtubule

organization of metaphase II mouse oocytes. Development. 130, 5169-5177.

114. Huang, Y. J., Hang, D., Lu, L. J., Tong, L., Gerstein, M. B., and Montelione, G.

T. (2008) Targeting the Human Cancer Pathway Protein Interaction Network by

Structural Genomics. Molecular & Cellular Proteomics. 7, 2048-2060.

115. Shintani, S., Ohyama, H., Zhang, X., McBride, J., Matsuo, K., Tsuji, T., Hu, M.

G., Hu, G., Kohno, Y., Lerman, M., Todd, R., and Wong, D. T. (2000) p12(DOC-

1) is a novel cyclin-dependent kinase 2-associated protein. Mol Cell Biol. 20,

6300-6307.

100

116. Satyanarayana, A., and Kaldis, P. (2009) Mammalian cell-cycle regulation:

several Cdks, numerous cyclins and diverse compensatory mechanisms.

Oncogene. 28, 2925-2939.

117. Spruijt, C. G., Bartels, S. J. J., Brinkman, A. B., Tjeertes, J. V., Poser, I.,

Stunnenberg, H. G., and Vermeulen, M. (2010) CDK2AP1/DOC-1 is a bona fide

subunit of the Mi-2/NuRD complex. Molecular BioSystems. 6, 1700-1706.

118. Ramirez, J., and Hagman, J. (2009) The Mi-2/NuRD complex: a critical

epigenetic regulator of hematopoietic development, differentiation and cancer.

Epigenetics. 4, 532-536.

119. Hutti, J. E., Shen, R. R., Abbott, D. W., Zhou, A. Y., Sprott, K. M., Asara, J. M.,

Hahn, W. C., and Cantley, L. C. (2009) Phosphorylation of the tumor suppressor

CYLD by the breast cancer oncogene IKKepsilon promotes cell transformation.

Mol Cell. 34, 461-472.

120. Delaglio, F., Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., and Bax, A. (1995)

NMRPipe: a multidimensional spectral processing system based on UNIX pipes.

Journal of biomolecular NMR. 6, 277-293.

121. T. D. Goddard, D. G. K. SPARKY3. University of California, San Francisco.

122. Kay, L. E., Torchia, D. A., and Bax, A. (1989) Backbone dynamics of proteins as

studied by 15N inverse detected heteronuclear NMR spectroscopy: application to

staphylococcal nuclease. Biochemistry. 28, 8972-8979.

123. Farrow, N. A., Muhandiram, R., Singer, A. U., Pascal, S. M., Kay, C. M., Gish,

G., Shoelson, S. E., Pawson, T., Forman-Kay, J. D., and Kay, L. E. (1994)

Backbone dynamics of a free and phosphopeptide-complexed Src homology 2

domain studied by 15N NMR relaxation. Biochemistry. 33, 5984-6003.

124. Aramini, J. M., Huang, Y. J., Swapna, G. V., Cort, J. R., Rajan, P. K., Xiao, R.,

Shastry, R., Acton, T. B., Liu, J., Rost, B., Kennedy, M. A., and Montelione, G.

T. (2007) Solution NMR structure of Escherichia coli ytfP expands the structural

coverage of the UPF0131 protein domain family. Proteins. 68, 789-795.

125. Neri, D., Szyperski, T., Otting, G., Senn, H., and Wuthrich, K. (1989)

Stereospecific nuclear magnetic resonance assignments of the methyl groups of

valine and leucine in the DNA-binding domain of the 434 repressor by

biosynthetically directed fractional 13C labeling. Biochemistry. 28, 7510-7516.

126. Moseley, H. N., Sahota, G., and Montelione, G. T. (2004) Assignment validation

software suite for the evaluation and presentation of protein resonance assignment

data. J Biomol NMR. 28, 341-355.

101

127. Li, Y. C., and Montelione, G. T. (1993) Solvent Saturation-Transfer Effects in

Pulsed-Field-Gradient Heteronuclear Single-Quantum-Coherence (PFG-HSQC)

Spectra of Polypeptides and Proteins. J Mag Res B. 101, 315-319.

128. Stuart, A. C., Borzilleri, K. A., Withka, J. M., and Palmer, A. G. (1999)

Compensating for Variations in 1Hâˆ‟13C Scalar Coupling Constants in Isotope-

Filtered NMR Experiments. J Am Chem Soc. 121, 5346-5347.

129. Guntert, P., Mumenthaler, C., and Wuthrich, K. (1997) Torsion angle dynamics

for NMR structure calculation with the new program DYANA. J Mol Biol. 273,

283-298.

130. Herrmann, T., Guntert, P., and Wuthrich, K. (2002) Protein NMR structure

determination with automated NOE assignment using the new software CANDID

and the torsion angle dynamics algorithm DYANA. J Mol Biol. 319, 209-227.

131. Pascal, S. M., Muhandiram, D. R., Yamazaki, T., Formankay, J. D., and Kay, L.

E. (1994) Simultaneous Acquisition of 15N- and 13C-Edited NOE Spectra of

Proteins Dissolved in H2O. J Mag Res B. 103, 197-201.

132. Cornilescu, G., Delaglio, F., and Bax, A. (1999) Protein backbone angle restraints

from searching a database for chemical shift and sequence homology. J Biomol

NMR. 13, 289-302.

133. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-

Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R.

J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Crystallography & NMR

system: A new software suite for macromolecular structure determination. Acta

Crystallogr D Biol Crystallogr. 54, 905-921.

134. Linge, J. P., Williams, M. A., Spronk, C. A., Bonvin, A. M., and Nilges, M.

(2003) Refinement of protein structures in explicit solvent. Proteins. 50, 496-506.

135. Luthy, R., Bowie, J. U., and Eisenberg, D. (1992) Assessment of protein models

with three-dimensional profiles. Nature. 356, 83-85.

136. Sippl, M. J. (1993) Recognition of errors in three-dimensional structures of

proteins. Proteins. 17, 355-362.

137. Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993)

PROCHECK: a program to check the stereochemical quality of protein structures.

J Appl Crystallogr. 26, 283-291.

138. Lovell, S. C., Davis, I. W., Arendall, W. B., 3rd, de Bakker, P. I., Word, J. M.,

Prisant, M. G., Richardson, J. S., and Richardson, D. C. (2003) Structure

validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins. 50, 437-

450.

102

139. Bhattacharya, A., Tejero, R., and Montelione, G. T. (2007) Evaluating protein

structures determined by structural genomics consortia. Proteins. 66, 778-795.

140. Huang, Y. J., Powers, R., and Montelione, G. T. (2005) Protein NMR recall,

precision, and F-measure scores (RPF scores): structure quality assessment

measures based on information retrieval statistics. J Am Chem Soc. 127, 1665-

1674.

141. Kim, Y., Ohyama, H., Patel, V., Figueiredo, M., and Wong, D. T. (2005)

Mutation of Cys105 inhibits dimerization of p12CDK2-AP1 and its growth

suppressor effect. J Biol Chem. 280, 23273-23279.

142. Kornhaber, G., Snyder, D., Moseley, H., and Montelione, G. (2006) Identification

of Zinc-ligated Cysteine Residues Based on 13Cα and 13Cβ Chemical Shift Data.

J Biomol NMR. 34, 259-269.

143. Sharma, D., and Rajarathnam, K. (2000) 13C NMR chemical shifts can predict

disulfide bond formation. Journal of biomolecular NMR. 18, 165-171.

144. Israel, A. (2010) The IKK complex, a central regulator of NF-kappaB activation.

Cold Spring Harb Perspect Biol. 2, a000158.

145. Sharma, S., tenOever, B. R., Grandvaux, N., Zhou, G.-P., Lin, R., and Hiscott, J.

(2003) Triggering the Interferon Antiviral Response Through an IKK-Related

Pathway. Science. 300, 1148-1151.

146. Hiscott, J. (2007) Convergence of the NF-kappaB and IRF pathways in the

regulation of the innate antiviral response. Cytokine Growth Factor Rev. 18, 483-

490.

147. Eddy, S. F., Guo, S., Demicco, E. G., Romieu-Mourez, R., Landesman-Bollag, E.,

Seldin, D. C., and Sonenshein, G. E. (2005) Inducible IkappaB kinase/IkappaB

kinase epsilon expression is induced by CK2 and promotes aberrant nuclear

factor-kappaB activation in breast cancer cells. Cancer Res. 65, 11375-11383.

148. Adli, M., and Baldwin, A. S. (2006) IKK-i/IKKepsilon controls constitutive,

cancer cell-associated NF-kappaB activity via regulation of Ser-536 p65/RelA

phosphorylation. J Biol Chem. 281, 26976-26984.

149. Boehm, J. S., Zhao, J. J., Yao, J., Kim, S. Y., Firestein, R., Dunn, I. F., Sjostrom,

S. K., Garraway, L. A., Weremowicz, S., Richardson, A. L., Greulich, H.,

Stewart, C. J., Mulvey, L. A., Shen, R. R., Ambrogio, L., Hirozane-Kishikawa, T.,

Hill, D. E., Vidal, M., Meyerson, M., Grenier, J. K., Hinkle, G., Root, D. E.,

Roberts, T. M., Lander, E. S., Polyak, K., and Hahn, W. C. (2007) Integrative

genomic approaches identify IKBKE as a breast cancer oncogene. Cell. 129,

1065-1079.

103

150. Harman, D. (1995) Free radical theory of aging: Alzheimer‟s disease

pathogenesis. AGE. 18, 97-119.

151. Jenner, P., and Olanow, C. W. (1996) Oxidative stress and the pathogenesis of

Parkinson's disease. Neurology. 47, 161S-170S.

152. Markesbery, W. R. (1997) Oxidative Stress Hypothesis in Alzheimer's Disease.

Free Radical Biol Med. 23, 134-147.

153. Klaunig, J. E., Kamendulis, L. M., and Hocevar, B. A. (2010) Oxidative Stress

and Oxidative Damage in Carcinogenesis. Toxicol Pathol. 38, 96-109.

154. Klaunig, J. E., Xu, Y., Isenberg, J. S., Bachowski, S., Kolaja, K. L., Jiang, J.,

Stevenson, D. E., and Walborg, E. F., Jr. (1998) The role of oxidative stress in

chemical carcinogenesis. Environ Health Perspect. 106 Suppl 1, 289-295.

155. Harman, D. (1956) Aging: a theory based on free radical and radiation chemistry.

J Gerontol. 11, 298-300.

156. Wickens, A. P. (2001) Ageing and the free radical theory. Respiration Physiology.

128, 379-391.

157. Lee, B. C., and Gladyshev, V. N. (2011) The biological significance of

methionine sulfoxide stereochemistry. Free Radical Bio Med. 50, 221-227.

158. Brot, N., Weissbach, L., Werth, J., and Weissbach, H. (1981) Enzymatic

reduction of protein-bound methionine sulfoxide. Proc Natl Acad Sci U S A. 78,

2155-2158.

159. Brot, N., and Weissbach, H. (1991) Biochemistry of methionine sulfoxide

residues in proteins. Biofactors. 3, 91-96.

160. Stadtman, E. R., Moskovitz, J., and Levine, R. L. (2003) Oxidation of Methionine

Residues of Proteins: Biological Consequences. Antioxidant Redox Sign. 5, 577-

582.

161. Moskovitz, J., Bar-Noy, S., Williams, W. M., Requena, J., Berlett, B. S., and

Stadtman, E. R. (2001) Methionine sulfoxide reductase (MsrA) is a regulator of

antioxidant defense and lifespan in mammals. Proc Natl Acad Sci U S A. 98,

12920-12925.

162. Cabreiro, F., Picot, C. R., Perichon, M., Castel, J., Friguet, B., and Petropoulos, I.

(2008) Overexpression of Mitochondrial Methionine Sulfoxide Reductase B2

Protects Leukemia Cells from Oxidative Stress-induced Cell Death and Protein

Damage. J Biol Chem. 283, 16673-16681.

104

163. Cabreiro, F., Picot, C. R., Perichon, M., Friguet, B., and Petropoulos, I. (2009)

Overexpression of Methionine Sulfoxide Reductases A and B2 Protects MOLT-4

Cells Against Zinc-Induced Oxidative Stress. Antioxid Redox Sign. 11, 215-226.

164. Moskovitz, J., Flescher, E., Berlett, B. S., Azare, J., Poston, J. M., and Stadtman,

E. R. (1998) Overexpression of peptide-methionine sulfoxide reductase in

Saccharomyces cerevisiae and human T cells provides them with high resistance

to oxidative stress. Proc Natl Acad Sci U S A. 95, 14071-14075.

165. Kauffmann, B., Aubry, A., and Favier, F. (2005) The three-dimensional structures

of peptide methionine sulfoxide reductases: current knowledge and open

questions. BBA-Proteins Proteom. 1703, 249-260.

166. Antoine, M., Boschi-Muller, S., and Branlant, G. (2003) Kinetic characterization

of the chemical steps involved in the catalytic mechanism of methionine sulfoxide

reductase A from Neisseria meningitidis. J Biol Chem. 278, 45352-45357.

167. Olry, A., Boschi-Muller, S., and Branlant, G. (2004) Kinetic characterization of

the catalytic mechanism of methionine sulfoxide reductase B from Neisseria

meningitidis. Biochemistry. 43, 11616-11622.

168. Jansson, M., Li, Y. C., Jendeberg, L., Anderson, S., Montelione, G. T., and

Nilsson, B. (1996) High-level production of uniformly 15N- and 13C-enriched

fusion proteins in Escherichia coli. Journal of biomolecular NMR. 7, 131-141.

169. Goto, N. K., Gardner, K. H., Mueller, G. A., Willis, R. C., and Kay, L. E. (1999)

A robust and cost-effective method for the production of Val, Leu, Ile (delta 1)

methyl-protonated 15N-, 13C-, 2H-labeled proteins. J Biomol NMR. 13, 369-374.

170. Zheng, D., Cort, J. R., Chiang, Y., Acton, T., Kennedy, M. A., and Montelione,

G. T. (2003) 1H, 13C and 15N resonance assignments for methionine sulfoxide

reductase B from Bacillus subtilis. J Biomol NMR. 27, 183-184.

171. Kay, L., Keifer, P., and Saarinen, T. (1992) Pure absorption gradient enhanced

heteronuclear single quantum correlation spectroscopy with improved sensitivity.

J Am Chem Soc. 114, 10663-10665.

172. Schleucher, J., Schwendinger, M., Sattler, M., Schmidt, P., Schedletzky, O.,

Glaser, S. J., Sorensen, O. W., and Griesinger, C. (1994) A general enhancement

scheme in heteronuclear multidimensional NMR employing pulsed field

gradients. J Biomol NMR. 4, 301-306.

173. Diercks, T., Coles, M., and Kessler, H. (1999) An efficient strategy for

assignment of cross-peaks in 3D heteronuclear NOESY experiments. J Biomol

NMR. 15, 177-180.

105

174. Hansen, M. R., Mueller, L., and Pardi, A. (1998) Tunable alignment of

macromolecules by filamentous phage yields dipolar coupling interactions. Nat

Struct Biol. 5, 1065-1074.

175. Tjandra, N., Grzesiek, S., and Bax, A. (1996) Magnetic Field Dependence of

Nitrogen−Proton J Splittings in 15N-Enriched Human Ubiquitin Resulting from

Relaxation Interference and Residual Dipolar Coupling. J Am Chem Soc. 118,

6264-6272.

176. Cavanagh, J. (2007) Protein NMR spectroscopy : principles and practice, 2nd ed.,

Academic Press, Amsterdam ; Boston.

177. Markus, M. A., Dayie, K. T., Matsudaira, P., and Wagner, G. (1994) Effect of

Deuteration on the Amide Proton Relaxation Rates in Proteins. Heteronuclear

NMR Experiments on Villin 14T. J Mag Res B. 105, 192-195.

178. Metzler, W. J., Wittekind, M., Goldfarb, V., Mueller, L., and Farmer, B. T. (1996)

Incorporation of 1H/13C/15N-{Ile, Leu, Val} into a Perdeuterated, 15N-Labeled

Protein: Potential in Structure Determination of Large Proteins by NMR. J Am

Chem Soc. 118, 6800-6801.

179. Goto, N. K., and Kay, L. E. (2000) New developments in isotope labeling

strategies for protein solution NMR spectroscopy. Curr Opin Struct Biol. 10, 585-

592.

180. Otten, R., Chu, B., Krewulak, K. D., Vogel, H. J., and Mulder, F. A. A. (2010)

Comprehensive and Cost-Effective NMR Spectroscopy of Methyl Groups in

Large Proteins. J Am Chem Soc. 132, 2952-2960.

181. Shen, Y., and Bax, A. (2010) Prediction of Xaa-Pro peptide bond conformation

from sequence and chemical shifts. J Biomol NMR. 46, 199-204.

182. Zheng, D., Huang, Y. J., Moseley, H. N., Xiao, R., Aramini, J., Swapna, G. V.,

and Montelione, G. T. (2003) Automated protein fold determination using a

minimal NMR constraint strategy. Protein Sci. 12, 1232-1246.

183. Ranaivoson, F. M., Neiers, F., Kauffmann, B., Boschi-Muller, S., Branlant, G.,

and Favier, F. (2009) Methionine sulfoxide reductase B displays a high level of

flexibility. J Mol Biol. 394, 83-93.

184. Lowther, W. T., Weissbach, H., Etienne, F., Brot, N., and Matthews, B. W.

(2002) The mirrored methionine sulfoxide reductases of Neisseria gonorrhoeae

pilB. Nat Struct Biol. 9, 348-352.

185. Eisenmesser, E. Z., Bosco, D. A., Akke, M., and Kern, D. (2002) Enzyme

dynamics during catalysis. Science. 295, 1520-1523.

106

186. Henzler-Wildman, K. A., Lei, M., Thai, V., Kerns, S. J., Karplus, M., and Kern,

D. (2007) A hierarchy of timescales in protein dynamics is linked to enzyme

catalysis. Nature. 450, 913-916.

187. Kern, D., Eisenmesser, E. Z., and Wolf-Watz, M. (2005) Enzyme dynamics

during catalysis measured by NMR spectroscopy. Methods in enzymology. 394,

507-524.

188. Kumar, R. A., Koc, A., Cerny, R. L., and Gladyshev, V. N. (2002) Reaction

mechanism, evolutionary analysis, and role of zinc in Drosophila methionine-R-

sulfoxide reductase. J Biol Chem. 277, 37527-37535.

189. Olry, A., Boschi-Muller, S., Yu, H., Burnel, D., and Branlant, G. (2005) Insights

into the role of the metal binding site in methionine-R-sulfoxide reductases B.

Protein Science. 14, 2828-2837.

190. Kim, Y. K., Shin, Y. J., Lee, W.-H., Kim, H.-Y., and Hwang, K. Y. (2009)

Structural and kinetic analysis of an MsrA–MsrB fusion protein from

Streptococcus pneumoniae. Mol Microbiol. 72, 699-709.

191. Williamson, M. P. (1990) Secondary-structure dependent chemical shifts in

proteins. Biopolymers. 29, 1423-1431.

192. Wishart, D. S., Sykes, B. D., and Richards, F. M. (1991) Relationship between

nuclear magnetic resonance chemical shift and protein secondary structure. J Mol

Biol. 222, 311-333.

193. Wishart, D. S., Sykes, B. D., and Richards, F. M. (1992) The chemical shift

index: a fast and simple method for the assignment of protein secondary structure

through NMR spectroscopy. Biochemistry. 31, 1647-1651.

194. Rohl, C. A., Strauss, C. E., Misura, K. M., and Baker, D. (2004) Protein structure

prediction using Rosetta. Methods in enzymology. 383, 66-93.

195. Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J. M., Liu, G., Eletsky, A.,

Wu, Y., Singarapu, K. K., Lemak, A., Ignatchenko, A., Arrowsmith, C. H.,

Szyperski, T., Montelione, G. T., Baker, D., and Bax, A. (2008) Consistent blind

protein structure generation from NMR chemical shift data. Proc Natl Acad Sci U

S A. 105, 4685-4690.

196. Ramelot, T. A., Raman, S., Kuzin, A. P., Xiao, R., Ma, L. C., Acton, T. B., Hunt,

J. F., Montelione, G. T., Baker, D., and Kennedy, M. A. (2009) Improving NMR

protein structure quality by Rosetta refinement: a molecular replacement study.

Proteins. 75, 147-167.

107

197. Raman, S., Huang, Y. J., Mao, B., Rossi, P., Aramini, J. M., Liu, G., Montelione,

G. T., and Baker, D. (2010) Accurate automated protein NMR structure

determination using unassigned NOESY data. J Am Chem Soc. 132, 202-207.

108

A. APPENDIX

Figure A.1 HDX-MS results for NESG target BfR218.

109

Figure A.2 HDX-MS results for NESG target ER554.

110

Figure A.3 HDX-MS results for NESG target HR2891

111


112


113


114


115


116


117

Figure A.10 HDX-MS results for NESG target SmR84

118

Figure A.11 HDX-MS results for NESG target SpR36

119

Figure A.12 HDX-MS results for NESG target SvR375

Documents

ASLI ERTEKIN In partial fulfillment of the requirements