“Decoding the Software Inside of You” Invited Presentation IEEE Computer Science Board of Governors Caucus Anaheim, CA February 2, 2017 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net 1

Decoding the Software Inside of You


Page 1: Decoding the Software Inside of You

“Decoding the Software Inside of You”

Invited Presentation IEEE Computer Science Board of Governors Caucus

Anaheim, CA

February 2, 2017

Dr. Larry Smarr

Director, California Institute for Telecommunications and Information Technology

Harry E. Gruber Professor, Dept. of Computer Science and Engineering

Jacobs School of Engineering, UCSD

http://lsmarr.calit2.net


Page 2: Decoding the Software Inside of You

Abstract

The cells in our human bodies contain "the software of life" in their DNA. Amazingly, there are 10 times as many microbial cells as human cells containing DNA in our bodies. Furthermore, our human cells' DNA contains around 23,000 genes, each of which codes for a protein, whereas our microbes' DNA contains millions of genes. I will discuss how the joint exponential decline in the cost of computing and genetic sequencing is finally enabling humans to read their own internal software, coded in their human and microbiome DNA, and how machine learning will be required to decipher the differences between states of health and disease.

Page 3: Decoding the Software Inside of You

Is Biology Software Working Itself Out in Organic Chemistry?

“We suggest that life may be characterized by its distinctive and active use of information, thus providing a roadmap to identify rigorous criteria for the emergence of life.”

- Paul Davies and Sara Walker

Page 4: Decoding the Software Inside of You

Most of Evolutionary Time Was in the Microbial World – Billions of Years of Software Evolution

You Are Here

Source: Carl Woese, et al

Tree of Life Derived from 16S rRNA Sequences

Page 5: Decoding the Software Inside of You

Reading the Software of Life Requires Genetic Sequencing: The Cost of Sequencing DNA Has Fallen Over 100,000x in the Last Ten Years

This Has Enabled Sequencing of Both Human and Microbial Genomes
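As a rough worked example of what a 100,000x drop over ten years implies, the sketch below converts the slide's two numbers into a cost-halving time. The function name `halving_time_months` is hypothetical, and the calculation assumes a smooth exponential decline, which the real cost curve only approximates.

```python
import math

def halving_time_months(fold_drop: float, years: float) -> float:
    """Months per cost halving, assuming a smooth exponential decline."""
    halvings = math.log2(fold_drop)   # how many times the cost halved
    return years * 12 / halvings

# Figures taken from the slide: over 100,000x cheaper in ten years.
sequencing = halving_time_months(100_000, 10)
print(f"Sequencing cost halved roughly every {sequencing:.1f} months")
```

Under these assumptions the cost halves about every seven months, considerably faster than the roughly two-year doubling period usually quoted for Moore's law.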

Page 6: Decoding the Software Inside of You

From One to a Trillion Data Points Defining Me in 15 Years: The Exponential Rise in Body Data

Weight

Blood Biomarker Time Series

Human Genome SNPs

Microbial Genome Time Series

Improving Body

Discovering Disease

Human Genome

Genomics Big Data Tsunami

Page 7: Decoding the Software Inside of You

The Human Gut as a Super-Evolutionary Microbial Cauldron

• Enormous Density
  – 1000x Ocean Water
• Highly Dynamic Microbial Ecology
  – Hundreds to Thousands of Species
• Horizontal Gene Transfer
• Phages
• Adaptive Selection Pressures (Immune System)
  – Innate Immune System
  – Adaptive Immune System
  – Macrophages and Antimicrobial Proteins
• Constantly Changing Environmental Pressures
  – Diet
  – Antibiotics
  – Pharmaceuticals

Page 8: Decoding the Software Inside of You

A Year of Sequencing a Healthy Gut Microbiome Daily: Remarkable Stability with Abrupt Changes

Figure axis: Days

Source: David, et al., Genome Biology (2014)

Page 9: Decoding the Software Inside of You

We are Genomically Analyzing My Stool Time Series in a Collaboration with the UCSD Knight Lab

Larry’s 40 Stool Samples, Collected Over 3.5 Years, Went to Rob Knight’s Lab on April 30, 2015

Page 10: Decoding the Software Inside of You

My Gut Microbiome Ecology Shifted After Drug Therapy Between Two Time-Stable Equilibriums Correlated to Physical Symptoms

Drug Therapy: Lialda & Uceris, 12/1/13 to 1/1/14

Frequent IBD Symptoms, Weight Loss: 7/1/12 to 12/1/14 (Blue Balls on Diagram to the Right)

Few IBD Symptoms, Weight Gain: 1/1/14 to 8/1/15 (Red Balls on Diagram to the Right)

Weekly Weight: Weight Data from Larry Smarr, Calit2, UCSD

Principal Coordinate Analysis of Microbiome Ecology: PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD
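To make the Principal Coordinate Analysis step concrete, here is a minimal sketch of classical PCoA on a handful of samples. Several things are assumptions for illustration only: the slide does not say which distance metric the Knight Lab used, so Bray-Curtis is swapped in; the abundance numbers are made up; and the eigenvectors are found with a simple power iteration to keep the block dependency-free, where a real analysis would use an established library.

```python
import math

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def distance_matrix(samples):
    return [[bray_curtis(u, v) for v in samples] for u in samples]

def pcoa(dist, n_axes=2, iters=500):
    """Classical PCoA: double-center the squared distances, then take
    the leading eigenvectors scaled by the square root of the eigenvalue."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Gower-centered matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        vec = [math.sin(i + 1.0 + axis) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        eigval = sum(vec[i] * b[i][j] * vec[j]
                     for i in range(n) for j in range(n))
        if eigval <= 1e-12:
            break  # no further positive axis to extract
        for i in range(n):
            coords[i][axis] = vec[i] * math.sqrt(eigval)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * vec[i] * vec[j]
    return coords

# Hypothetical relative abundances (rows: samples, columns: taxa).
samples = [[90, 10, 0], [85, 15, 0],    # e.g. two samples from one state
           [10, 20, 70], [5, 25, 70]]   # e.g. two samples from another state
coords = pcoa(distance_matrix(samples))
for label, (x, y) in zip("ABCD", coords):
    print(f"sample {label}: PC1={x:+.3f} PC2={y:+.3f}")
```

Samples with similar ecology land close together on the first axis, which is how two time-stable states would show up as two separate clusters of points.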

Page 11: Decoding the Software Inside of You

An Initial Study of the Variation of the Human Gut Microbiome Across Populations and Within an Individual Over Time

“Healthy” Individuals: 250 Subjects, 1 Point in Time

Inflammatory Bowel Disease (IBD) Patients:

• 5 Ileal Crohn’s Patients, 3 Points in Time
• 2 Ulcerative Colitis Patients, 6 Points in Time
• Larry Smarr (Colonic Crohn’s), 7 Points in Time

Each Sample Has 100-200 Million Illumina Short Reads (100 bases)

Total of 27 Billion Reads, or 2.7 Trillion Bases

Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
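The two totals on this slide can be cross-checked with one multiplication. The short sketch below uses only the read count and read length stated on the slide; the constant names are hypothetical.

```python
READ_LENGTH_BASES = 100        # from the slide: Illumina short reads of 100 bases
TOTAL_READS = 27_000_000_000   # from the slide: 27 billion reads

total_bases = TOTAL_READS * READ_LENGTH_BASES
print(f"{total_bases / 1e12} trillion bases")
```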

Page 12: Decoding the Software Inside of You

Mapping Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers

Source: Weizhong Li, UCSD

Our Team Used 25 CPU-years to Compute Comparative Gut Microbiomes, Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Subjects

Illumina HiSeq 2000 at JCVI

SDSC Gordon Data Supercomputer

Page 13: Decoding the Software Inside of You

Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function

PI: Weizhong Li, CRBS, UCSD; NIH R01HG005978 (2010-2013, $1.1M)

Page 14: Decoding the Software Inside of You

Computing the Gut Microbiome Ecology: Results Include Relative Abundance of Hundreds of Microbial Species

Average Over 250 Healthy People From NIH Human Microbiome Project

Note Log Scale

Clostridium difficile

Image from Calit2 Vroom Wall

Page 15: Decoding the Software Inside of You

Using Scalable Visualization Allows Comparison of the Relative Abundance of 200 Microbe Species

Calit2 VROOM-FuturePatient Expedition

Comparing 3 LS Time Snapshots (Left) with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom)

Page 16: Decoding the Software Inside of You

We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Three Forms of IBD

Most Common Microbial Phyla

Average HE

Average Ulcerative Colitis

Average LS Colonic Crohn’s Disease

Average Ileal Crohn’s Disease

Collapse of Bacteroidetes

Explosion of Actinobacteria

Explosion of Proteobacteria

Hybrid of UC and CD

High Level of Archaea

Page 17: Decoding the Software Inside of You

Your Microbiome is Your “Near-Body” Environment and its Cells Contain 200-2000x as Many DNA Genes As Your Human Cells

DNA-bearing Cells in Your Body: More Microbe Cells Than Human Cells

Inclusion of the “Dark Matter” of the Body Will Radically Alter Medicine

Page 18: Decoding the Software Inside of You

Each Microbe Contains a Few Thousand Genes on Its DNA

E. coli Contains ~5000 Genes on its Circular Chromosome, Which is 1000x the Length of the Cell!

Several Million Genes Can Occur in the Human Gut Microbiome
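The "1000x the length of the cell" figure can be sanity-checked with a back-of-the-envelope calculation. None of the three inputs below appear on the slide; they are commonly quoted textbook values used here as assumptions (an E. coli genome of about 4.6 million base pairs, about 0.34 nm of length per base pair, and a cell about 2 micrometers long).

```python
GENOME_BP = 4.6e6        # assumed E. coli genome size, in base pairs
RISE_NM_PER_BP = 0.34    # assumed length per base pair of B-form DNA, in nm
CELL_LENGTH_UM = 2.0     # assumed typical cell length, in micrometers

chromosome_um = GENOME_BP * RISE_NM_PER_BP / 1000.0   # convert nm to micrometers
ratio = chromosome_um / CELL_LENGTH_UM

print(f"Stretched-out chromosome: about {chromosome_um:.0f} micrometers")
print(f"Roughly {ratio:.0f}x the length of the cell")
```

With these inputs the stretched-out chromosome comes to about 1.5 mm, several hundred times the cell length, which is the same order of magnitude as the slide's 1000x.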

Page 19: Decoding the Software Inside of You

In a “Healthy” Gut Microbiome: Large Taxonomy Variation, Low Protein Family Variation

Source: Nature, 486, 207-212 (2012)

Over 200 People

Page 20: Decoding the Software Inside of You

Using Machine Learning to Determine Major Differences Between Gut Microbiome in Health and Disease

IEEE International Conference on Big Data (December 5-8, 2016)

Page 21: Decoding the Software Inside of You

Using Kolmogorov-Smirnov Test and Random Forest Machine Learning to Discover the Protein Families That Differentiate Between Disease and Health

Selected from Top 100 KS Scores

Selected by Random Forest Classifier From Holdout Set

Note: Orders of Magnitude Increase or Decrease in Protein Families Between Health and Disease

Source: Computing by Weizhong Li, JCVI; ML by Mehrdad Yazdani, Calit2
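The first stage named on this slide, scoring each protein family with a two-sample Kolmogorov-Smirnov test and keeping the top 100, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the family names and abundance values are invented, `ks_statistic` and `rank_by_ks` are hypothetical helper names, and the Random Forest stage is not shown.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def rank_by_ks(healthy, disease, top_n=100):
    """Rank protein families by KS score, highest first, keeping top_n."""
    scores = {family: ks_statistic(healthy[family], disease[family])
              for family in healthy if family in disease}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical relative abundances per subject for two made-up families.
healthy = {"family_A": [0.9, 1.1, 1.0, 1.2],
           "family_B": [5.0, 5.5, 4.8, 5.2]}
disease = {"family_A": [1.0, 0.95, 1.15, 1.05],
           "family_B": [0.01, 0.02, 0.015, 0.03]}

ranked = rank_by_ks(healthy, disease)
for family, score in ranked:
    print(f"{family}: KS = {score:.2f}")
```

A family whose abundance differs by orders of magnitude between the two groups, like the made-up `family_B`, gets the maximum score of 1.0 because the two distributions do not overlap at all.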

Page 22: Decoding the Software Inside of You

To Expand the IBD Project, the Knight/Smarr Labs Were Awarded ~1 CPU-Century of Supercomputing Time

• Smarr Gut Microbiome Time Series
  – From 7 Samples Over 1.5 Years
  – To 85 Samples Over 5 Years
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 Patients
• New Software Suite from Knight Lab
  – Re-annotation of Reference Genomes, Functional / Taxonomic Variations
  – From 10,000 KEGGs to ~1 Million Genes
  – Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner

8x Compute Resources Over Prior Study

Page 23: Decoding the Software Inside of You

Genome-Scale Metabolic Models Create Virtual Models of Living Microbes

Large Computational Models of Metabolism Account for:

• Thousands of Unique Metabolites
• > 2000 Unique Reactions
• Product of Thousands of Genes (>1/3 of Genome for Some Species)

January 2015

Feist & Palsson, Nature Biotechnology 26, 659-667 (2008)

Page 24: Decoding the Software Inside of You


Genome-Scale Metabolic Models Are Used in Industry and Medicine

• Allows for Simulation of the Behavior of a Cell in a Computer
  – In Different Growth Conditions
  – Genetic Perturbations
• Applications in:
  – Industrial Biology
  – Metabolic Engineering
  – Medicine
  – Drug Target Discovery

Bordbar A, Monk JM, King ZA, Palsson BO: Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet 2014, 15(2):107-120.
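For readers unfamiliar with constraint-based modeling, one common way such simulations are posed is flux balance analysis, written as a linear program. The slide does not spell out the formulation, so the notation below is an assumption for illustration: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c weights the objective (often a biomass reaction), and the bounds encode what the cell can take up or which reactions are available.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

In this framing, a different growth condition corresponds to changing the uptake bounds, and a genetic perturbation such as a gene knockout corresponds to fixing the bounds of the affected reactions to zero.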

Page 25: Decoding the Software Inside of You

“A Whole-Cell Computational Model Predicts Phenotype from Genotype”

A model of Mycoplasma genitalium:

• 525 genes
• Using 1,900 experimental observations
• From 900 studies
• They created the software model
• Which requires 128 computers to run

Page 26: Decoding the Software Inside of You

Massive Research is Underway to Discover A Wide Range of New Techniques for Manipulating Your Microbiome

www.huffingtonpost.com/entry/gut-bacteria-microbiome-disease_us_57068c55e4b053766188f383

www.synlogictx.com

Page 27: Decoding the Software Inside of You

The Future Foundation of Medicine is an Exponential Scaling-Up of the Number of Deeply Quantified Humans

Source: @EricTopol, Twitter, 9/27/2014