Wide-Search Molecular Replacement
Ian Stokes-Rees
http://portal.nebiogrid.org/
When WS-MR is suitable
• You’ve got good data (resolution better than 4 Å)
• You’ve tried MR with lots of good candidates, chosen from
  • a priori knowledge
  • sequence similarity (PSI-BLAST search)
• Or:
  • the protein has not been sequenced
  • there is no a priori knowledge of the expected fold
• You haven’t found any good models to use for phasing
• Time to try a brute-force search: WS-MR
When WS-MR is not suitable
• Complexes containing significant DNA or RNA
  • at least for now, these will probably not work
• You haven’t tried MR and just want a “quick fix”
• Very large or very small structures
  • both are computationally difficult
• Low resolution (> 4.5 Å)
  • experience so far suggests these aren’t helped much
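The rules of thumb above can be written down as a quick pre-flight check. This is a minimal Python sketch and is not part of WS-MR or the portal. `Dataset` and `wsmr_concerns` are hypothetical names. The 4 Å and 4.5 Å cut-offs come from these slides, and treating 4-4.5 Å as "borderline" is our own reading of the gap between them. Structure size is left out because the slides give no numbers for "very large" or "very small".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical summary of a data set being considered for WS-MR."""
    d_min: float                    # high-resolution limit in Å (e.g. 1.992)
    tried_conventional_mr: bool     # candidates from prior knowledge / PSI-BLAST tried?
    significant_nucleic_acid: bool  # complex with significant DNA or RNA?


def wsmr_concerns(ds: Dataset) -> list[str]:
    """Return reasons WS-MR may be a poor fit; an empty list means 'worth trying'."""
    concerns: list[str] = []
    if ds.d_min > 4.5:
        concerns.append("low resolution (> 4.5 Å)")
    elif ds.d_min >= 4.0:
        concerns.append("borderline resolution (4-4.5 Å)")
    if not ds.tried_conventional_mr:
        concerns.append("try conventional MR first")
    if ds.significant_nucleic_acid:
        concerns.append("significant DNA/RNA content")
    return concerns


print(wsmr_concerns(Dataset(1.992, True, False)))  # []
```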
Requirements
• Reflection data in MTZ file format
  • Must have amplitude columns (e.g. FP, SIGFP)
  • Doesn’t work with intensities (I, SIGI)
• Time
  • to analyze results
  • to take next steps
• Managed expectations
  • WS-MR identifies good MR candidates in about 1 in 4 cases
  • We don’t produce a fully phased structure, only a list of good MR candidates and their best placements as returned by Phaser
• Experience with Phaser to interpret results and re-run candidate models
Background
• Uses Phaser for MR
• Uses the Open Science Grid for computing
• References
  • Stokes-Rees I., Sliz P. (2010). Protein structure determination by exhaustive search of Protein Data Bank derived databases. Proc. Natl. Acad. Sci. USA. doi:10.1073/pnas.1012095107
  • Stokes-Rees I., Sliz P. (2010). Compute and data management strategies for grid deployment of high throughput protein structure studies. IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS10), Seattle, November 2010.
  • McCoy A. J., Grosse-Kunstleve R. W., Adams P. D., Winn M. D., Storoni L. C., Read R. J. (2007). Phaser crystallographic software. J. Appl. Cryst. 40, 658-674.
  • Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
• Each run requires 20,000-50,000 hours of computing
• Produces 300,000 files
• Attempts 100,000 single-domain MR trials using all SCOP domains
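Dividing these figures gives a feel for the cost of each trial. The following is a back-of-the-envelope check in Python. The results are averages over a whole run, and we assume "hours of computing" means total compute hours summed across the grid.

```python
# Figures quoted above for one complete WS-MR run
TRIALS = 100_000                        # single-domain MR trials
FILES = 300_000                         # output files produced
HOURS_LOW, HOURS_HIGH = 20_000, 50_000  # total hours of computing


def minutes_per_trial(total_hours: float, trials: int = TRIALS) -> float:
    """Average compute time per MR trial, in minutes."""
    return total_hours * 60 / trials


print(minutes_per_trial(HOURS_LOW))   # 12.0
print(minutes_per_trial(HOURS_HIGH))  # 30.0
print(FILES / TRIALS)                 # 3.0 files per trial
```

On average, each trial takes roughly 12-30 minutes. Run one after another on a single core, even the low estimate of 20,000 hours would take more than two years.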
Step 1: Register to use the portal
https://portal.nebiogrid.org/d/accounts/create
Step 2: Submit a computational task
https://portal.nebiogrid.org/d/apps/wsmr/create
Side Note: MTZ columns
• If you’re not sure, use the CCP4 tool “mtzdmp” to check the column names and resolution
$ mtzdmp GAS.mtz | less
...
 * Column Labels : H K L FP SIGFP FreeRflag
...
 * Resolution Range : 0.00050 0.25197 ( 44.699 - 1.992 A )
...
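If you prefer to check a file from a script, you can pull the relevant lines out of the mtzdmp output. Below is a minimal Python sketch. `parse_mtzdmp` and `has_amplitudes` are hypothetical helpers, not CCP4 tools. The parser accepts labels either on the same line as the header (as above) or on the next non-blank line. If a "Column Types" section is present, the check uses the MTZ column types (F = amplitude, J = intensity, Q = standard deviation) instead of guessing from label names.

```python
from __future__ import annotations

import re


def _section_value(lines: list[str], header: str) -> str:
    """Text after 'header :', or the next non-blank line if that is empty."""
    for i, line in enumerate(lines):
        if header in line and ":" in line:
            rest = line.split(":", 1)[1].strip()
            if rest:
                return rest
            for following in lines[i + 1:]:
                if following.strip():
                    return following.strip()
    return ""


def parse_mtzdmp(text: str) -> dict:
    """Extract column labels, column types and resolution range (in Å)."""
    lines = text.splitlines()
    res = re.search(r"\(\s*([\d.]+)\s*-\s*([\d.]+)\s*A\s*\)",
                    _section_value(lines, "Resolution Range"))
    return {
        "labels": _section_value(lines, "Column Labels").split(),
        "types": _section_value(lines, "Column Types").split(),
        "low_res": float(res.group(1)) if res else None,
        "high_res": float(res.group(2)) if res else None,
    }


def has_amplitudes(info: dict) -> bool:
    """WS-MR needs amplitudes plus sigmas, not intensities."""
    if info["types"]:
        return "F" in info["types"] and "Q" in info["types"]
    # No type information: fall back to a label-name heuristic (an assumption).
    return any(label in ("F", "FP") for label in info["labels"])


sample = """ * Column Labels : H K L FP SIGFP FreeRflag
 * Resolution Range : 0.00050 0.25197 ( 44.699 - 1.992 A )"""
info = parse_mtzdmp(sample)
print(info["labels"], info["high_res"], has_amplitudes(info))
# ['H', 'K', 'L', 'FP', 'SIGFP', 'FreeRflag'] 1.992 True
```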
Step 3a: Review the active task list on the portal, and click the task to access it
Step 3b: Check your email for the task details, and click the link to access the task
Step 4: Log into the job page
Step 5a: Review the web page
Step 5b: Check status
Job states: R = Running, I = Idle, H = Held
Remember: Someone from SBGrid will manually review your job and release it. Until that happens your job won’t even be in the queue. Even after that, it could be in the queue for several days before it starts running. Do email us if you have questions or if it seems stuck or not running.
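When a task has many grid jobs, it can help to tally these state letters. This small Python sketch assumes you already have the one-letter states (R, I, H) as a string or list, for example copied from the status page. `summarize_states` and `awaiting_release` are hypothetical helpers. If every job in a task is held, the task is most likely still waiting for the manual review and release described above.

```python
from __future__ import annotations

from collections import Counter

# One-letter job states shown on the status page
STATE_NAMES = {"R": "Running", "I": "Idle", "H": "Held"}


def summarize_states(codes) -> dict[str, int]:
    """Count jobs per state, e.g. 'RRIH' -> {'Running': 2, 'Idle': 1, 'Held': 1}."""
    counts = Counter(codes)
    return {STATE_NAMES.get(code, f"Unknown ({code})"): n for code, n in counts.items()}


def awaiting_release(codes) -> bool:
    """True if every job is held, i.e. probably not yet released by SBGrid."""
    counts = Counter(codes)
    return bool(counts) and set(counts) == {"H"}
```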
Step 5c: Check status
The status page shows the outcomes to date and a summary of active jobs.
Step 6a: Review scatter graphs
Look for a cluster of results with high TFZ (translation function Z-score) and high LLG (log-likelihood gain) that stands apart from the rest
NOTE: This graph is a static image
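Once you have the tabular results, you can approximate the "cluster apart from the rest" judgement numerically. This Python sketch is not the method WS-MR itself uses. It flags trials that meet two conditions:

• The LLG lies more than `k` scaled median absolute deviations above the median of all trials. This is a standard robust outlier rule, chosen so that the few real hits do not distort the cut-off.
• The TFZ is at least 8, following the commonly quoted Phaser guideline that a TFZ above about 8 usually indicates a correct solution.

The `Trial` record and both thresholds are assumptions you may need to adjust.

```python
import statistics
from typing import NamedTuple


class Trial(NamedTuple):
    scop_id: str  # SCOP domain used as the search model
    llg: float    # log-likelihood gain
    tfz: float    # translation function Z-score


def standout_candidates(trials, tfz_min=8.0, k=5.0):
    """Trials whose LLG is a robust outlier and whose TFZ is high, best first."""
    llgs = [t.llg for t in trials]
    med = statistics.median(llgs)
    mad = statistics.median(abs(x - med) for x in llgs)
    cutoff = med + k * 1.4826 * mad  # 1.4826 * MAD estimates the SD for normal data
    hits = [t for t in trials if t.llg > cutoff and t.tfz >= tfz_min]
    return sorted(hits, key=lambda t: t.llg, reverse=True)
```

Using the median and MAD rather than the mean and standard deviation matters here, because a single strong solution can inflate the standard deviation enough to hide itself.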
Step 6b: Cases with no strong MR candidates*
* Unfortunately, remember that this is usually the case
Step 6c: Review scatter graphs
NOTE: This graph is a dynamic, clickable image. Because of memory constraints, only the first 5,000 results by LLG are currently available
Click the button to load the data and enable the clickable image
Step 6d: Review scatter graphs
Click a data point to view its PDB details
Click the large cartoon image to add it to the image basket
Step 7: Review tabular data
Live results (space-delimited)
Sorted results (tab-delimited), generated by “check status”
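You can load either file with a few lines of Python. This sketch makes a hypothetical assumption: the first line is a header naming the columns, and one of them is called `LLG`. Check the real header in your files and adjust. The delimiter is detected from the header. Tab-delimited input is split on tabs only, so a field containing spaces (a descriptive name, say) stays intact. Space-delimited input is split on any whitespace.

```python
from __future__ import annotations


def read_results(text: str, sort_key: str = "LLG") -> list[dict[str, str]]:
    """Parse a results table with a header row; return rows sorted by sort_key, best first."""
    lines = [line for line in text.splitlines() if line.strip()]
    delim = "\t" if "\t" in lines[0] else None  # None -> split on any whitespace
    header = [h.strip() for h in lines[0].split(delim)]
    rows = [dict(zip(header, (f.strip() for f in line.split(delim))))
            for line in lines[1:]]
    return sorted(rows, key=lambda row: float(row[sort_key]), reverse=True)
```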
Step 8: Wait for the job to finish
Results: approx. 100,000; errors: < 5,000
No running jobs (all done)
NOTE: This job is not yet finished!
Step 9: Download the finalized augmented results
“augmented” contains the static SCOP domain class and name (25 MB)
“final” contains a sorted, cleaned set of results (5 MB)
Step 10: Review and download specific SCOP PDB files
• Use the tabular results to identify specific SCOP codes that look promising
• PDBs can be fetched using one of these resources:
  http://portal.nebiogrid.org/biodb/scop/v1.75/clean/code2/
  http://abitibi.sbgrid.org/cgi/pdbview.py
  http://abitibi.sbgrid.org/cgi/tmalign.py
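To script the download, you need to build the file URL from the SCOP code. In the Phaser input of step 11, the domain `200la_` sits at `00/200la_.pdb`. That suggests, although we have not confirmed it, that the `code2/` directory above shards files by the second and third characters of the code (the middle two characters of the PDB ID). This Python sketch works under that assumption. `domain_pdb_url` and `fetch_domain_pdb` are hypothetical helpers.

```python
from __future__ import annotations

import urllib.request

SCOP_BASE = "http://portal.nebiogrid.org/biodb/scop/v1.75/clean/code2/"


def domain_pdb_url(code: str, base: str = SCOP_BASE) -> str:
    """URL of a SCOP domain PDB, ASSUMING a <chars 2-3>/<code>.pdb layout."""
    return f"{base}{code[1:3]}/{code}.pdb"


def fetch_domain_pdb(code: str, dest: str | None = None) -> str:
    """Download one domain file and return the local path."""
    path, _headers = urllib.request.urlretrieve(domain_pdb_url(code), dest or f"{code}.pdb")
    return path


print(domain_pdb_url("200la_"))
# http://portal.nebiogrid.org/biodb/scop/v1.75/clean/code2/00/200la_.pdb
```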
Step 11: Recreate Phaser output
Click on the “test” directory (bottom of the job page)
ROOT 2vlj-test
MODE MR_AUTO
HKLIn ../2vlj.mtz
LABIn F=FP SIGF=SIGFP
ENSEmble 200la_ PDB 00/200la_.pdb IDENtity 0.3
COMPosition SOLVENT 50.0
RESOlution 2.4
SEARch ENSEmble 200la_ NUM 1
This is the command input to Phaser
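To re-run Phaser on other candidates (for example with a different resolution cut-off), it can be convenient to generate this input from a template. `phaser_mr_input` is a hypothetical Python helper whose defaults reproduce the example above. It uses only the keywords shown there, so check the Phaser documentation before adding others.

```python
def phaser_mr_input(root: str, mtz: str, ensemble: str, pdb: str,
                    identity: float = 0.3, solvent: float = 50.0,
                    resolution: float = 2.4, copies: int = 1,
                    f_label: str = "FP", sigf_label: str = "SIGFP") -> str:
    """Build MR_AUTO keyword input in the same form as the WS-MR 'test' example."""
    keywords = [
        f"ROOT {root}",
        "MODE MR_AUTO",
        f"HKLIn {mtz}",
        f"LABIn F={f_label} SIGF={sigf_label}",
        f"ENSEmble {ensemble} PDB {pdb} IDENtity {identity}",
        f"COMPosition SOLVENT {solvent}",
        f"RESOlution {resolution}",
        f"SEARch ENSEmble {ensemble} NUM {copies}",
    ]
    return "\n".join(keywords) + "\n"


script = phaser_mr_input("2vlj-test", "../2vlj.mtz", "200la_", "00/200la_.pdb")
```

Phaser reads keyword input on standard input, so you can pipe the generated text to it. For example, assuming `phaser` is on your PATH, you could use `subprocess.run(["phaser"], input=script, text=True)`.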
Step 12: Over to you
• You now need to refine your structure
• WS-MR only gets you as far as identifying promising MR candidates when conventional model identification methods have not succeeded
• Some further MR options exist:
  • second domain search with the first domain fixed
  • homo-dimer/homo-trimer searches
  • custom PDB search library: you give us the PDBs, and we can run WS-MR over the set
Conclusion and Thanks
• We welcome ideas for improvements
• Special processing requirements?
  • We may be able to do this from the command line interface
• Please contact us if you have any questions
  • [email protected]
• Open Science Grid is a big enabler here!
  • http://opensciencegrid.org
• Thanks to the SBGrid team:
  • http://www.sbgrid.org
• Thanks to the Sliz Lab at Harvard Medical School:
  • http://hkl.hms.harvard.edu