“Decoding the Software Inside of You”
Invited Presentation IEEE Computer Science Board of Governors Caucus
Anaheim, CA, February 2, 2017
Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Abstract
The cells in our human bodies contain “the software of life” in their DNA. Amazingly, our bodies contain 10 times as many DNA-bearing microbial cells as human cells. Furthermore, the DNA in our human cells contains around 23,000 genes, each of which codes for a protein, whereas our microbes' DNA contains millions of genes. I will discuss how the joint exponential decline in the cost of computing and genetic sequencing is finally enabling humans to read their own internal software, coded in their human and microbiome DNA, and how machine learning will be required to decipher the differences between states of health and disease.
Is Biology Software Working Itself Out in Organic Chemistry?
"We suggest that life may be characterized by its distinctive and active use of information, thus providing a roadmap to identify rigorous criteria for the emergence of life.”
- Paul Davies and Sara Walker
Most of Evolutionary Time Was in the Microbial World – Billions of Years of Software Evolution
Tree of Life Derived from 16S rRNA Sequences (with a “You Are Here” Marker)
Source: Carl Woese, et al.
Reading the Software of Life Requires Genetic Sequencing: The Cost of Sequencing DNA Has Fallen by Over 100,000x in the Last Ten Years
This Has Enabled Sequencing of Both Human and Microbial Genomes
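As a rough check on what a 100,000x drop in ten years implies, the sketch below converts it into a per-year factor and a halving time. It assumes the decline was a smooth exponential, which the real cost curve was not; the numbers are only an average rate.

```python
import math

# From the slide: sequencing cost fell by ~100,000x over ten years.
# Assumption: treat the decline as a smooth exponential.
fold_drop = 100_000
years = 10

annual_factor = fold_drop ** (1 / years)                 # cost divides by this each year
halving_time_months = 12 * years / math.log2(fold_drop)  # months per halving of cost

print(f"cost falls ~{annual_factor:.2f}x per year")
print(f"cost halves every ~{halving_time_months:.1f} months")
```

Under that assumption the cost falls roughly 3.2x per year, halving about every 7 months.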
From One to a Trillion Data Points Defining Me in 15 Years: The Exponential Rise in Body Data
Data Types Over Time: Weight; Blood Biomarker Time Series; Human Genome SNPs; Human Genome; Microbial Genome Time Series
Phases: Improving Body; Discovering Disease
Genomics Big Data Tsunami
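The same kind of arithmetic applies to the growth from one to a trillion data points over 15 years. The sketch assumes steady exponential growth between the two endpoints, which is a simplification of how the data actually accumulated.

```python
import math

# From the slide: ~1 data point growing to ~1 trillion (1e12) over 15 years.
# Assumption: steady exponential growth between the two endpoints.
start, end, years = 1, 1e12, 15

annual_growth = (end / start) ** (1 / years)                 # growth factor per year
doubling_time_months = 12 * years / math.log2(end / start)   # months per doubling

print(f"data grows ~{annual_growth:.1f}x per year")
print(f"data doubles every ~{doubling_time_months:.1f} months")
```

That works out to roughly 6.3x growth per year, or a doubling about every 4.5 months.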
The Human Gut as a Super-Evolutionary Microbial Cauldron
• Enormous Density
  – 1000x Ocean Water
• Highly Dynamic Microbial Ecology
  – Hundreds to Thousands of Species
• Horizontal Gene Transfer
• Phages
• Adaptive Selection Pressures (Immune System)
  – Innate Immune System
  – Adaptive Immune System
  – Macrophages and Antimicrobial Proteins
• Constantly Changing Environmental Pressures
  – Diet
  – Antibiotics
  – Pharmaceuticals
A Year of Sequencing a Healthy Gut Microbiome Daily: Remarkable Stability with Abrupt Changes
(Figure Axis: Days)
Source: David, et al., Genome Biology (2014)
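The slide does not say how the David et al. study quantified stability. One common way to spot an abrupt change in a daily series is to measure how different each day's community is from the previous day's. The sketch below uses Bray-Curtis dissimilarity, a standard ecological measure swapped in here for illustration; the threshold and the abundance data are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = no overlap)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def flag_abrupt_changes(series, threshold=0.5):
    """Return the day indices whose community differs sharply from the previous day.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return [
        day for day in range(1, len(series))
        if bray_curtis(series[day - 1], series[day]) > threshold
    ]

# Hypothetical relative abundances of three taxa over six days:
# stable, an abrupt shift on day 3, then stable again at the new state.
daily = [
    [0.60, 0.30, 0.10],
    [0.58, 0.32, 0.10],
    [0.61, 0.29, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.19, 0.69],
    [0.11, 0.21, 0.68],
]
print(flag_abrupt_changes(daily))  # [3]
```

Consecutive-day dissimilarity stays near 0.02 to 0.03 during the stable stretches and jumps to 0.6 at the shift, which is the "stability with abrupt changes" pattern in miniature.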
We Are Genomically Analyzing My Stool Time Series in a Collaboration with the UCSD Knight Lab
Larry’s 40 Stool Samples Over 3.5 Years to Rob’s Lab on April 30, 2015
My Gut Microbiome Ecology Shifted After Drug Therapy Between Two Time-Stable Equilibria Correlated with Physical Symptoms
Drug Therapy (Lialda & Uceris): 12/1/13 to 1/1/14
Frequent IBD Symptoms, Weight Loss: 7/1/12 to 12/1/14 (Blue Balls on Diagram to the Right)
Few IBD Symptoms, Weight Gain: 1/1/14 to 8/1/15 (Red Balls on Diagram to the Right)
Weekly Weight (Weight Data from Larry Smarr, Calit2, UCSD)
Principal Coordinate Analysis of Microbiome Ecology (PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD)
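Principal Coordinate Analysis places samples in a few dimensions so that distances between points approximate a chosen dissimilarity between microbiomes. The slide does not name the tools used; the sketch below is a from-scratch classical PCoA in plain Python for small matrices only. In practice a library routine such as scikit-bio's `skbio.stats.ordination.pcoa` would be used. Power iteration here assumes the leading eigenvalues are positive, which holds for Euclidean distances but not always for ecological ones like Bray-Curtis.

```python
import math

def pcoa(dist, n_axes=2, iters=500):
    """Classical PCoA (Torgerson metric MDS) of a symmetric distance matrix.

    Squares and halves the distances, double-centers them into a Gram-like
    matrix B, then uses B's leading eigenvectors (scaled by the square roots
    of their eigenvalues) as sample coordinates. Eigenvectors come from power
    iteration with deflation, which is adequate only for small toy matrices.
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    # Double centering; A is symmetric, so column means equal row means.
    b = [[a[i][j] - row_means[i] - row_means[j] + grand_mean
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        # Non-constant unit start vector (B maps the constant vector to zero).
        v = [1.0 + 0.1 * i for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining matrix is numerically zero
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient, |v| = 1
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Toy example: four hypothetical samples whose pairwise distances are |i - j|.
toy_dist = [[abs(i - j) for j in range(4)] for i in range(4)]
for sample, (pc1, pc2) in enumerate(pcoa(toy_dist)):
    print(sample, round(pc1, 3), round(pc2, 3))
```

For this toy matrix the first axis recovers the original one-dimensional spacing and the second axis is essentially zero; with real microbiome distances, samples from different clinical states would tend to fall in separate regions, as in the colored clusters on the slide.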
An Initial Study of the Variation of the Human Gut Microbiome Across Populations and Within an Individual Over Time
• “Healthy” Individuals: 250 Subjects, 1 Point in Time
• Inflammatory Bowel Disease (IBD) Patients
  – 5 Ileal Crohn’s Patients, 3 Points in Time
  – 2 Ulcerative Colitis Patients, 6 Points in Time
• Larry Smarr (Colonic Crohn’s): 7 Points in Time
Each Sample Has 100-200 Million Illumina Short Reads (100 Bases)
Total of 27 Billion Reads, or 2.7 Trillion Bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
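The slide's totals can be cross-checked with simple arithmetic. The sample tally assumes 250 healthy subjects sampled once, 5 ileal Crohn's patients at 3 time points, 2 ulcerative colitis patients at 6 time points, and 7 samples from the author; that pairing of groups and time points is an interpretation of the slide layout.

```python
# Reads and read length from the slide.
total_reads = 27e9
read_length = 100                          # bases per Illumina short read
total_bases = total_reads * read_length    # 2.7e12, i.e. 2.7 trillion bases

# Assumed study design: (number of subjects, time points per subject).
design = {
    "healthy": (250, 1),
    "ileal_crohns": (5, 3),
    "ulcerative_colitis": (2, 6),
    "larry_smarr_colonic_crohns": (1, 7),
}
total_samples = sum(subjects * points for subjects, points in design.values())

print(f"{total_bases:.1e} bases across {total_samples} samples")
```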
Mapping Out the Dynamics of Autoimmune Microbiome Ecology Couples Next-Generation Genome Sequencers to Big Data Supercomputers
Source: Weizhong Li, UCSD
Our Team Used 25 CPU-Years to Compute Comparative Gut Microbiomes Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Subjects
Illumina HiSeq 2000 at JCVI
SDSC Gordon Data Supercomputer
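To put 25 CPU-years in perspective, the sketch below converts it to CPU-hours and an average throughput per CPU-second. This is a crude whole-job average: the pipeline does far more than one pass over each base (assembly, alignment, annotation), so it says nothing about the speed of any single step. A 365.25-day year is assumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600     # assumption: Julian year

cpu_years = 25                            # from the slide
bases = 2.7e12                            # from the slide

cpu_hours = cpu_years * 365.25 * 24
cpu_seconds = cpu_years * SECONDS_PER_YEAR
bases_per_cpu_second = bases / cpu_seconds

print(f"{cpu_hours:,.0f} CPU-hours")
print(f"~{bases_per_cpu_second:,.0f} bases per CPU-second on average")
```

That is about 219,000 CPU-hours, or roughly 3,400 bases processed per CPU-second averaged over the whole computation.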
Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function
PI: Weizhong Li, CRBS, UCSD; NIH R01HG005978 (2010-2013, $1.1M)
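The slide does not detail how reads are assigned to taxa. As a hedged illustration of the "sequence to taxonomy" step only, the sketch below shows a toy exact k-mer voting classifier: each short read goes to the reference taxon with which it shares the most k-mers. This is a generic technique, not the Li lab's pipeline; the reference sequences, taxon names, and the tiny k = 4 are all hypothetical.

```python
from collections import Counter

K = 4  # toy k-mer length; real classifiers use much longer k-mers

def kmers(seq, k=K):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, genome in references.items():
        for km in kmers(genome):
            index.setdefault(km, set()).add(taxon)
    return index

def classify(read, index):
    """Assign a read to the taxon sharing the most k-mers with it, or None."""
    votes = Counter()
    for km in kmers(read):
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    if not votes:
        return None  # no shared k-mers: unclassified
    (best, n_best), *rest = votes.most_common()
    if rest and rest[0][1] == n_best:
        return None  # tie: leave unassigned rather than guess
    return best

# Hypothetical, made-up reference fragments (not real genomes).
refs = {
    "taxon_A": "ACGTACGTTTGACCA",
    "taxon_B": "GGGCCCTTTAAAGGG",
}
index = build_index(refs)
print(classify("ACGTACGT", index), classify("CCCTTTAA", index))
```

Counting classified reads per taxon then yields the relative abundances shown on the following slides; the functional side of the pipeline (protein families) would need a separate annotation step not sketched here.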
Computing the Gut Microbiome Ecology: Results Include Relative Abundance of Hundreds of Microbial Species
Average Over 250 Healthy People From NIH Human Microbiome Project
Note: Log Scale
Clostridium difficile
Image from Calit2 VROOM Wall
Scalable Visualization Allows Comparison of the Relative Abundance of 200 Microbe Species
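Relative abundance is each species' share of the classified reads in a sample. Because shares span several orders of magnitude, plots like these use a log scale so rare species stay visible. A minimal sketch with hypothetical read counts:

```python
import math

def relative_abundance(counts):
    """Convert raw read counts per species into fractions that sum to 1."""
    total = sum(counts.values())
    return {species: c / total for species, c in counts.items()}

# Hypothetical read counts for three species in one sample.
counts = {"species_1": 900_000, "species_2": 99_000, "species_3": 1_000}
rel = relative_abundance(counts)

# log10 of each fraction: -0.05, -1.0 and -3.0, i.e. three orders of magnitude apart.
log_rel = {species: math.log10(f) for species, f in rel.items()}
for species in counts:
    print(species, f"{rel[species]:.3f}", f"{log_rel[species]:.2f}")
```

A species with zero reads has no defined logarithm; comparisons across many samples usually add a small pseudocount first.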
Calit2 VROOM-FuturePatient Expedition
Comparing 3 LS Time Snapshots (Left) with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom)
We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Three Forms of IBD
Most Common Microbial Phyla
Average HE (Healthy)
Average Ulcerative Colitis
Average LS (Colonic Crohn’s Disease)
Average Ileal Crohn’s Disease
Collapse of Bacteroidetes; Explosion of Actinobacteria
Explosion of Proteobacteria
Hybrid of UC and CD; High Level of Archaea
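A "collapse" or "explosion" of a phylum can be expressed as a log2 fold change between the healthy average and a disease average: negative values mean the phylum shrank, positive values mean it grew. The sketch below uses made-up abundances chosen only to mirror the direction of the shifts named above; they are not the study's data, and the pseudocount value is an assumption.

```python
import math

def log2_fold_change(healthy, disease, pseudocount=1e-6):
    """Per-phylum log2(disease / healthy); the pseudocount avoids log(0)."""
    return {
        phylum: math.log2((disease.get(phylum, 0.0) + pseudocount) /
                          (healthy.get(phylum, 0.0) + pseudocount))
        for phylum in set(healthy) | set(disease)
    }

# Hypothetical average relative abundances (made-up numbers for illustration).
healthy = {"Bacteroidetes": 0.50, "Firmicutes": 0.45, "Actinobacteria": 0.01}
disease = {"Bacteroidetes": 0.05, "Firmicutes": 0.55, "Actinobacteria": 0.20}

lfc = log2_fold_change(healthy, disease)
for phylum in sorted(lfc, key=lfc.get):
    print(f"{phylum:15s} {lfc[phylum]:+.2f}")
```

Here the tenfold drop in Bacteroidetes gives about -3.3 and the twentyfold rise in Actinobacteria about +4.3.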
Your Microbiome Is Your “Near-Body” Environment, and Its Cells Contain 200-2000x as Many DNA Genes as Your Human Cells
DNA-bearing Cells in Your Body: More Microbe Cells Than Human Cells
Inclusion of the “Dark Matter” of the Body Will Radically Alter Medicine
Each Microbe Contains a Few Thousand Genes on Its DNA
E. coli Contains ~5000 Genes on Its Circular Chromosome, Which Is 1000x the Length of the Cell!
Several Million Genes Can Occur in the Human Gut Microbiome
In a “Healthy” Gut Microbiome: Large Taxonomy Variation, Low Protein Family Variation
Source: Nature, 486, 207-212 (2012)
Over 200 People
Using Machine Learning to Determine Major Differences Between the Gut Microbiome in Health and Disease
IEEE International Conference on Big Data (December 5-8, 2016)
Using Kolmogorov-Smirnov Test and Random Forest Machine Learning to Discover the Protein Families That Differentiate Between Disease and Health
Selected from Top 100 KS Scores
Selected by Random Forest Classifier from Holdout Set
Note: Orders of Magnitude Increase or Decrease in Protein Families Between Health and Disease
Source: Computing by Weizhong Li, JCVI; ML by Mehrdad Yazdani, Calit2
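The first stage named above, ranking protein families by Kolmogorov-Smirnov score, can be sketched in plain Python. The two-sample KS statistic is the largest vertical gap between the empirical cumulative distributions of a feature's abundance in the two groups, so families whose abundance separates health from disease score near 1. Only the ranking step is shown; the random forest stage (which could use, e.g., scikit-learn's `RandomForestClassifier`) is omitted, the authors' exact setup is not given on the slide, and the data and family names are hypothetical. SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance past every value <= v so both CDFs are evaluated at v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def top_features_by_ks(healthy, disease, top_n=100):
    """Rank features (e.g. protein families) by KS statistic, largest first.

    `healthy` and `disease` map feature name -> list of per-sample abundances.
    """
    scores = {f: ks_statistic(healthy[f], disease[f]) for f in healthy}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical per-sample abundances for two made-up protein families.
healthy = {"family_1": [1, 2, 3, 4], "family_2": [1, 2, 3, 4]}
disease = {"family_1": [10, 11, 12, 13], "family_2": [1.5, 2.5, 3.5, 4.5]}
print(top_features_by_ks(healthy, disease, top_n=1))  # ['family_1']
```

`family_1` is completely separated between groups (KS = 1.0), while `family_2` barely shifts (KS = 0.25), so only the first would survive a top-N cut.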
To Expand the IBD Project, the Knight/Smarr Labs Were Awarded ~1 CPU-Century of Supercomputing Time
• Smarr Gut Microbiome Time Series
  – From 7 Samples Over 1.5 Years
  – To 85 Samples Over 5 Years
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 Patients
• New Software Suite from Knight Lab
  – Re-annotation of Reference Genomes, Functional / Taxonomic Variations
  – From 10,000 KEGGs to ~1 Million Genes
  – Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
8x Compute Resources Over Prior Study
Genome-Scale Metabolic Models Create Virtual Models of Living Microbes
Large Computational Models of Metabolism Account for:
• Thousands of Unique Metabolites
• > 2000 Unique Reactions
• Product of Thousands of Genes (>1/3 of Genome for Some Species)
January 2015
Source: Feist & Palsson, Nature Biotechnology 26, 659-667 (2008)
Genome-Scale Metabolic Models Are Used in Industry and Medicine
• Allows for Simulation of the Behavior of a Cell in a Computer
  – In Different Growth Conditions
  – Genetic Perturbations
• Applications in:
  – Industrial Biology
  – Metabolic Engineering
  – Medicine
  – Drug Target Discovery
Source: Bordbar A, Monk JM, King ZA, Palsson BO: Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet 2014, 15(2):107-120.
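Constraint-based models describe metabolism with a stoichiometric matrix S: a flux vector v is allowed only if metabolites neither pile up nor run out (S v = 0) and every flux stays within its bounds. A gene deletion is simulated by forcing the fluxes of the reactions it enables to zero. The sketch below shows just these constraints on a hypothetical four-reaction network; full flux balance analysis would additionally maximize an objective such as biomass over all feasible v with a linear-programming solver (e.g. via COBRApy), which is not shown.

```python
# Hypothetical toy network:
#   r_up : -> A      (nutrient uptake)
#   r1   : A -> B    (enzyme 1)
#   r2   : A -> B    (isozyme 2, an alternative route)
#   r_bio: B ->      (drain standing in for biomass production)
REACTIONS = ["r_up", "r1", "r2", "r_bio"]
S = {  # stoichiometric matrix: metabolite -> coefficient per reaction
    "A": [1, -1, -1, 0],
    "B": [0, 1, 1, -1],
}
BOUNDS = {"r_up": (0, 10), "r1": (0, 6), "r2": (0, 6), "r_bio": (0, 1000)}

def is_feasible(fluxes, bounds, tol=1e-9):
    """True if fluxes satisfy steady state (S v = 0) and every flux bound."""
    steady = all(
        abs(sum(c * v for c, v in zip(row, fluxes))) <= tol for row in S.values()
    )
    within = all(
        bounds[r][0] - tol <= v <= bounds[r][1] + tol
        for r, v in zip(REACTIONS, fluxes)
    )
    return steady and within

def knock_out(bounds, reaction):
    """Simulate a gene deletion by forcing one reaction's flux to zero."""
    new_bounds = dict(bounds)  # copy so the wild-type bounds are untouched
    new_bounds[reaction] = (0, 0)
    return new_bounds

wild_type_flux = [10, 5, 5, 10]
print(is_feasible(wild_type_flux, BOUNDS))                   # True
print(is_feasible(wild_type_flux, knock_out(BOUNDS, "r1")))  # False: r1 must now carry 0
print(is_feasible([6, 0, 6, 6], knock_out(BOUNDS, "r1")))    # True: rerouted through r2
```

By inspection, the maximum biomass flux in this toy network is 10 in the wild type (limited by uptake) and 6 after knocking out r1 (limited by r2's capacity): a miniature version of predicting how a genetic perturbation changes growth.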
“A Whole-Cell Computational Model Predicts Phenotype from Genotype”
A model of Mycoplasma genitalium:
• 525 genes
• Using 1,900 experimental observations
• From 900 studies,
• They created the software model,
• Which requires 128 computers to run
Massive Research is Underway to Discover A Wide Range of New Techniques for Manipulating Your Microbiome
www.huffingtonpost.com/entry/gut-bacteria-microbiome-disease_us_57068c55e4b053766188f383
www.synlogictx.com
The Future Foundation of Medicine is an Exponential Scaling-Up of the Number of Deeply Quantified Humans
Source: @EricTopol, Twitter, 9/27/2014