Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis
Stephen Helms, Ph.D.
April 8, 2014, EYR Global
FOM Institute AMOLF, Amsterdam, Netherlands
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST), Tom Shimizu (AMOLF)
How Do We Understand Complex Systems With Many Parts?
(Also a general "big data" question!)
Outline
• A Model Complex System
• Traditional approaches for understanding complex biological systems
• Statistical approach for understanding biological systems
• Data and computation problems
• Proposed computational platform
• Outlook for the future
A Simple Model Nervous System: C. elegans
The Worm
• ~1000 total cells • 302 neurons • 95 muscles • ~20000 genes
Stimuli
• Smell (volatile odors) • Taste (soluble chemicals) • Feel (touch, heat)
Response
• Movement • Neural activity • Biochemical reactions
A Biologist's Toolbox
• Genetics: break individual parts, see what happens
• Biochemistry: look at how parts chemically interact
• Cell Biology: see where the parts are
End result:
• A long list of details about what individual genes and proteins are doing
• But no clear view of what the system as a whole does
Idea: Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see which phenotypes biology cares about
• Fill in the molecular and cellular details later using traditional approaches
• Model system: motile behavior. Behavior is the output of all the complicated systems of an organism
C. elegans Behavior
• Undulatory motion • Occasional reversals • Occasional sharp "omega" turns • Continuous turning
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54; Croll (1975) J. Zool. 176:159-176; Croll (1975) Adv. Parasitol. 13:71-122; Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69; Iino, Y. & Yoshida, K. (2009) J. Neurosci. 29:5370-80; Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155
Experimental Overview
Record video of freely moving worms (up to 30 minutes) → Extract behavioral data → Develop models
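As a minimal sketch of the "extract behavioral data" step, the code below turns grayscale frames into a centroid trajectory by thresholding. It assumes a dark worm on a bright background and a hand-picked threshold; the function names are hypothetical, and a real tracker (posture, head/tail identification) does considerably more.

```python
import numpy as np

def worm_centroid(frame: np.ndarray, threshold: float) -> tuple[float, float]:
    """Return the (row, col) centroid of pixels darker than `threshold`.

    Assumes a 2D grayscale frame with a dark worm on a bright background.
    """
    mask = frame < threshold              # worm pixels are darker than background
    if not mask.any():
        raise ValueError("no pixels below threshold")
    rows, cols = np.nonzero(mask)         # coordinates of the worm pixels
    return float(rows.mean()), float(cols.mean())

def centroid_track(frames, threshold: float) -> np.ndarray:
    """Stack per-frame centroids into an (n_frames, 2) trajectory."""
    return np.array([worm_centroid(f, threshold) for f in frames])
```

Differences between successive rows of the trajectory, multiplied by the frame rate, give a centroid velocity from which speed and reversals can be derived.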
Sampling Behavioral Variability: Individual, Intra- and Inter-Species
Holovachov, O. et al. (2009) Nematology 11(6):927-950; Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73; Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.
Up to 20 individuals per strain
Building Quantitative Models
Deterministic dynamics:
• Correlation functions • Phase spaces • Fitting linear models
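One way to make the deterministic-dynamics step concrete is to compute a correlation function of a behavioral time series and fit the simplest linear model to it. The sketch assumes a scalar series sampled at a fixed rate (for example, a turning rate) and uses a first-order autoregressive model, x[t+1] = a * x[t] + noise, as an illustrative choice of linear model; it is not necessarily the model used in this project.

```python
import numpy as np

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation C(tau) for tau = 0..max_lag (so C(0) = 1).

    Uses the biased estimator (divide by len(x)); x must not be constant.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / n / var for k in range(max_lag + 1)])

def fit_ar1(x: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of x[t+1] = a * x[t] + noise; returns (a, noise std)."""
    x = np.asarray(x, dtype=float)
    past, future = x[:-1], x[1:]
    a = np.dot(past, future) / np.dot(past, past)   # slope through the origin
    residuals = future - a * past
    return float(a), float(np.std(residuals))
```

For an AR(1) process the autocorrelation decays roughly as a^tau, so comparing the fitted a with the measured C(tau) is a quick consistency check.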
Stochastic components:
• Distributions
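For the stochastic components, a common first step is to look at the distribution of waiting times between discrete events such as reversals. The sketch below assumes, purely for illustration, that events follow a Poisson process, so intervals are exponentially distributed and the maximum-likelihood rate is one over the mean interval; the empirical survival curve lets you check that assumption against the data.

```python
import numpy as np

def inter_event_intervals(event_times: np.ndarray) -> np.ndarray:
    """Waiting times between successive events (e.g. reversal onsets, in seconds)."""
    return np.diff(np.sort(np.asarray(event_times, dtype=float)))

def exponential_rate_mle(intervals: np.ndarray) -> float:
    """Maximum-likelihood rate of an exponential distribution: 1 / mean interval."""
    return 1.0 / float(np.mean(intervals))

def empirical_survival(intervals: np.ndarray):
    """Return sorted intervals t and the fraction of intervals >= t, i.e. P(T >= t).

    Exact for distinct interval values; an exponential model predicts a
    straight line when log(P) is plotted against t.
    """
    t = np.sort(np.asarray(intervals, dtype=float))
    surv = 1.0 - np.arange(len(t)) / len(t)
    return t, surv
```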
Simulations:
• Monte Carlo simulations • Comparison with statistics of data
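A Monte Carlo comparison can be sketched by simulating a fitted model many times and asking whether a statistic measured on the real data falls inside the spread of simulated values. This assumes an illustrative AR(1) model (x[t+1] = a * x[t] + noise) and uses the lag-1 correlation as the statistic; both are hypothetical stand-ins for whatever models and statistics are actually compared.

```python
import numpy as np

def simulate_ar1(a: float, sigma: float, n_steps: int, rng) -> np.ndarray:
    """One realization of x[t+1] = a * x[t] + sigma * noise, starting at x[0] = 0."""
    x = np.zeros(n_steps)
    noise = rng.normal(0.0, sigma, size=n_steps - 1)
    for t in range(n_steps - 1):
        x[t + 1] = a * x[t] + noise[t]
    return x

def lag1_corr(x: np.ndarray) -> float:
    """Pearson correlation between x[t] and x[t+1]."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

def monte_carlo_envelope(statistic, a, sigma, n_steps, n_runs=200, seed=0):
    """2.5th and 97.5th percentiles of `statistic` over n_runs simulated trajectories."""
    rng = np.random.default_rng(seed)
    values = np.array([statistic(simulate_ar1(a, sigma, n_steps, rng))
                       for _ in range(n_runs)])
    return np.percentile(values, [2.5, 97.5])
```

For example, `lo, hi = monte_carlo_envelope(lag1_corr, 0.9, 1.0, 1000)` gives a 95% range for the lag-1 correlation under the model; a value measured on data that falls outside [lo, hi] suggests the model misses something.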
Comparing Quantitative Models
• Parameter correlation matrix • Patterns (modes) • Simulations
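To compare models across strains, one can stack each strain's fitted parameters into a matrix, compute the parameter correlation matrix, and diagonalize it; eigenvectors with large eigenvalues are patterns (modes) of parameters that co-vary across strains. This is a principal-component-style sketch assuming one row of fitted parameters per strain; it is not necessarily the exact procedure behind this slide.

```python
import numpy as np

def parameter_modes(params: np.ndarray):
    """Correlation matrix of model parameters across strains and its eigenmodes.

    params: (n_strains, n_params) array, one row of fitted parameters per strain.
    No parameter may be constant across strains (its correlation is undefined).
    Returns (corr, eigenvalues, modes) with eigenvalues sorted largest first;
    modes[:, i] is the i-th pattern of co-varying parameters.
    """
    corr = np.corrcoef(params, rowvar=False)      # n_params x n_params
    eigvals, eigvecs = np.linalg.eigh(corr)       # corr is symmetric, so use eigh
    order = np.argsort(eigvals)[::-1]             # eigh returns ascending order
    return corr, eigvals[order], eigvecs[:, order]
```

The eigenvalues sum to the number of parameters, so a first eigenvalue close to that number means most of the strain-to-strain variation lies along a single mode.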
Data Challenges
Storage
• Videos are large: 240 GB/h raw, 12 GB/h compressed
• Using ~1 TB of storage for a proof-of-concept project
• Want to scale up: number of individuals by 10-fold, sampling rate by 3-fold
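The storage figures above support a quick back-of-the-envelope estimate. Assuming storage grows linearly with both the number of individuals and the sampling rate (an assumption; container overhead and compression efficiency may change with frame rate), the planned 10-fold and 3-fold increases turn ~1 TB into roughly 30 TB.

```python
# Figures taken from the slide
RAW_GB_PER_HOUR = 240
COMPRESSED_GB_PER_HOUR = 12
CURRENT_STORAGE_TB = 1.0       # proof-of-concept project (~1 TB)
INDIVIDUALS_FACTOR = 10        # planned increase in number of individuals
SAMPLING_FACTOR = 3            # planned increase in sampling rate

compression_ratio = RAW_GB_PER_HOUR / COMPRESSED_GB_PER_HOUR

# Assumption: storage scales linearly with individuals and with frames per second
projected_storage_tb = CURRENT_STORAGE_TB * INDIVIDUALS_FACTOR * SAMPLING_FACTOR

print(f"compression ratio: {compression_ratio:.0f}x")          # compression ratio: 20x
print(f"projected storage: ~{projected_storage_tb:.0f} TB")    # projected storage: ~30 TB
```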
Processing
• >3-fold slower than data collection on a desktop computer
• Results in: a backlog of data to analyze, and a long delay before experiments can be interpreted
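The backlog follows directly from the slowdown factor. A minimal sketch, taking the slide's ">3-fold" as a slowdown of exactly 3 (so results are lower bounds) and assuming videos can be processed independently in parallel with no I/O or scheduling overhead: one desktop needs at least 3 hours of compute per hour of video, so at least 3 parallel workers are needed just to keep pace with recording.

```python
import math

SLOWDOWN = 3.0  # processing time / recording time on a desktop (slide: ">3-fold")

def processing_hours(recorded_hours: float, workers: int, slowdown: float = SLOWDOWN) -> float:
    """Wall-clock hours to process `recorded_hours` of video split across workers.

    Assumes independent (embarrassingly parallel) videos and ignores overhead.
    """
    return recorded_hours * slowdown / workers

def workers_to_keep_up(slowdown: float = SLOWDOWN) -> int:
    """Smallest number of parallel workers whose throughput matches recording speed."""
    return math.ceil(slowdown)
```

For example, `processing_hours(0.5, workers=1)` returns 1.5: even a single 30-minute recording takes at least 1.5 hours on one desktop.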
Sharing
• Videos are too big to transfer around regularly
• Extracted data is also big: 2 GB for the proof-of-concept project
• Limited ability for others to explore the data themselves
Need to record data on many individuals for a long time at high frequency
Proposal: Centrally located data processing and analysis services at SURFsara
• SURFsara: video storage, video processing, standard analyses
• Experimental users (AMOLF, VCU, etc.): generate videos, visualize data, develop analyses
• Theory users (VU, OIST, etc.): visualize data, develop analyses
Data flows:
• Upload videos; download datasets (hundreds of GBs, daily at peak)
• Download datasets (tens of GBs, weekly)
• Exchange datasets and analysis results (few GBs, weekly)
• Loading large (>10 GB) videos • Processing 10^4 to 10^6 frames per video
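Videos larger than 10 GB should not be loaded into memory at once. A minimal sketch of chunked processing with `numpy.memmap`, assuming a hypothetical headerless file of contiguous 8-bit grayscale frames with known height and width (recordings stored in a compressed container format would need a video decoder instead):

```python
import numpy as np

def iter_frame_chunks(path, height, width, chunk_frames=1000, dtype=np.uint8):
    """Yield successive (n, height, width) in-memory blocks of frames from a raw file.

    np.memmap maps the file lazily, so only the chunk being copied is read from disk.
    Trailing bytes that do not form a whole frame are ignored.
    """
    flat = np.memmap(path, dtype=dtype, mode="r")
    frame_size = height * width
    n_frames = flat.size // frame_size
    frames = flat[: n_frames * frame_size].reshape(n_frames, height, width)
    for start in range(0, n_frames, chunk_frames):
        yield np.array(frames[start:start + chunk_frames])   # copy one chunk into RAM

def mean_intensity_per_frame(path, height, width, chunk_frames=1000):
    """Example per-frame statistic computed chunk by chunk."""
    parts = [chunk.reshape(len(chunk), -1).mean(axis=1)
             for chunk in iter_frame_chunks(path, height, width, chunk_frames)]
    return np.concatenate(parts) if parts else np.empty(0)
```

Because each chunk is independent, the same loop can be split across workers by frame range, which is what makes 10^4 to 10^6 frames per video tractable on grid or cloud resources.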
How EYR Is Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources (cloud or grid)
• The eScience Center is helping migrate the analysis code to run on HPC infrastructure
Sharing
• Internet2 and SURFnet are connecting the involved institutes to SURFsara using high-speed lightpath connections: FOM Institute AMOLF, VU, Okinawa Institute of Science and Technology, Virginia Commonwealth University
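To see why link speed matters for sharing, the time to move a dataset is its size in bits divided by the link rate. The sketch uses decimal units (1 GB = 8×10^9 bits) and an optional efficiency factor for protocol and disk overhead; the link speeds in the usage note are illustrative assumptions, not the actual capacities of the lightpaths.

```python
def transfer_hours(size_gb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours to move `size_gb` gigabytes over a link of `link_gbit_per_s` gigabits/s.

    `efficiency` < 1 models protocol and disk overhead (an assumed factor).
    """
    bits = size_gb * 8e9                                   # decimal gigabytes to bits
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600
```

For a hypothetical 300 GB daily dataset, `transfer_hours(300, 0.1)` is about 6.7 h, `transfer_hours(300, 1.0)` about 0.67 h, and `transfer_hours(300, 10.0)` about 0.07 h (240 s).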
Growth Prospects
• Open-source aspects of the C. elegans community: WormBook (textbook), WormBase (genetics), WormAtlas (anatomy), etc.
• As an analysis service available to other researchers: motility is widely used as a simple phenotype by C. elegans researchers
• Collaborative development of new analysis methods: other researchers are developing statistical analysis approaches for worm behavior
• Integration of neuronal imaging data: ongoing experiments in the systems biology group at AMOLF
R. Doornekamp, FOM Institute AMOLF
These Are General Challenges
• Advances in imaging sensors: increasing temporal and spatial resolution → more data
• Advances in experimental techniques: increasing experimental throughput → more data, access to statistical approaches
• Lack of compression options: distortion of data due to compression artifacts is a major concern among experimentalists
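Concern about compression artifacts can be made quantitative by comparing decoded frames with the raw originals. The sketch below computes the peak signal-to-noise ratio (PSNR), a standard distortion metric, and uses coarse intensity quantization as a crude, hypothetical stand-in for a real lossy codec; actual codecs distort images differently (for example, block artifacts and blurring of fine detail).

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means less distortion, inf if identical."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(max_value ** 2 / mse))

def quantize(frame: np.ndarray, step: int) -> np.ndarray:
    """Crude lossy stand-in: round 8-bit intensities to the nearest multiple of `step`."""
    q = np.round(frame.astype(float) / step) * step
    return np.clip(q, 0, 255).astype(np.uint8)
```

Running a tracker on both raw and decoded frames and comparing the extracted behavioral measures is a more direct test than PSNR of whether compression distorts the data that matter.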
Acknowledgements • Enlighten Your Research 4 and Global Teams
– Nicole Gregoire (SURFnet) – Sylvia Kuijpers (SURFnet) – Jan Bot (SURFsara) – Frank Seinstra (eScience Center)
• eScience Center – Rob van Nieuwpoort – Elena Ranguelova
• Everyone else involved @ SURFnet, SURFsara, Internet2 • Local ICT members
– Carl Schulz (AMOLF)