8
Prediction of the Growth Rate Population Dynamic of Bacteria by Causal Jump Dynamic Mode Decomposition Shara Balakrishnan, Aqib Hasnain, Nibodh Boddupalli, Dennis M. Joshy, and Enoch Yeung Abstract—In this paper, we consider the problem of learning a predictive input-output model of cell growth rate from parametric conditions defining the growth medium. We first in- troduce a generic data-driven framework for training operator- theoretic models to predict cell growth rate. We then introduce the experimental design and data generated in this study, namely growth curves of Vibrio natriegens, as a function of titrated casein and glucose levels. We then evaluate the perfor- mance of a classical algorithm for training Koopman operators, the Hankel dynamic mode decomposition algorithm, and show it is unable to predict the biological growth curves. We introduce a modified version of the Hankel dynamic mode decomposition algorithm, that leverages causal-jump embeddings to predict growth rate. We show this algorithm is able to predict growth curves as a function of casein, but that glucose concentration is ultimately non-informative. Our work suggests a method for designing media to achieve optimal growth rates in organisms, whether the goal is to achieve a slow growing microbe or a fast growing one. I. I NTRODUCTION One of the most fundamental processes in life is the ability to replicate and pass on hereditary material [1]. From viral particles to bacteria to mammalian cells, cell division is fundamental to growth, maintenance of physiological health, and intrinsically tied to the notion of senescence [2]. The mechanisms for controlling growth in organisms are determined by metabolic networks [3], [4], namely their topological structure and parametric realization. Known metabolic networks in well studied model organisms such as E. coli [5] and S. cerevisiae [6], [7] have given rise to predictive models that translate environmental activity to metabolic network state, and ultimately to predictions of growth rate. For canonical biological model systems, these models have been highly accurate in predicting growth rate and found utility in industrial microbiology applications, e.g. in the design of bioreactors or informing best practices in food safety. For many biological life forms, relatively little is known about their metabolic network or structure. This is especially the case when developing bioengineering tools in novel host microbes [8], [9]. For new organisms, canonical metabolic networks are lacking and often obtained through a process of sequence alignment and comparative analysis with existing metabolic network models in relative species. However, many novel strains do not exhibit significant similarity, and even in the case of sequence similarity, small mutations can lead to dramatically different growth phenotypes, e.g. growth of non-pathogenic soil strains [10], [11] versus pathogenic counterparts [12]. The absence of predictive cross-species models, as well as the inability to predict growth phenotype wholly from sequence data, motivates the need for data- driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to advances in high-throughput experimental tech- niques, it is relatively easy to characterize growth rates as a function of exposure to environment. Liquid and acoustic- liquid handling robotics enables interrogation of thousands of growth conditions in a single microtiter plate, which in turn opens the door for using data-driven approaches [13] to predict growth rate as a function of environmental state. Is it possible to accurately predict the growth rate of a microbe, entirely from the chemical composition and environmental parameters of its growth condition? In this paper we explore a data-driven operator theoretic approach that utilizes microtiter plate reader data, and more generally multi-variate time-series data, to develop predictive models of growth rate in Vibrio natriegens, one of the fastest growing organisms in the world and a target workhorse [14]. A broadly successful class of data-driven modeling ap- proaches stem from the study of Koopman operators, a mathematical construct for representing the time-evolution of nonlinear dynamical systems. In Koopman operator theory, the time-evolution of a nonlinear system is defined on a function space, acting on the original state of the system. In this function space, known the observables space, the Koopman operator is a linear operator, enabling spectral analysis, the decomposition of eigenspaces, and study of nonlinear structure [15]. The Koopman operator framework has been developed for continuous [16] and discrete time sys- tems [17], [18], for open-loop [17] and input-controlled [19], [20] dynamic systems. Koopman operators generically can be divided into two categories, those with discrete (countable) spectra [17] and those with (uncountable) spectra [15]. Thus, Koopman operators present a powerful framework for analyzing the behavior of nonlinear systems, including predicting or forecasting behavior in a data-driven context. Many numerical methods for discovering Koopman op- erators directly from data have been developed in the last two decades [21]–[28]. The most classic approach is the use dynamic mode decomposition (DMD), which models non- linear dynamics via an approximate local linear expansion [25]. Next there is extended dynamic mode decomposition, which uses an extended dictionary of basis functions [17] or functions with universal function approximation properties to discover an approximation of the lifting map or observ- ables. These techniques suffer from combinatorial explosion, which generally has prohibited analysis of high-dimensional nonlinear systems [29].

Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

Prediction of the Growth Rate Population Dynamic of Bacteria by CausalJump Dynamic Mode Decomposition

Shara Balakrishnan, Aqib Hasnain, Nibodh Boddupalli, Dennis M. Joshy, and Enoch Yeung

Abstract— In this paper, we consider the problem of learninga predictive input-output model of cell growth rate fromparametric conditions defining the growth medium. We first in-troduce a generic data-driven framework for training operator-theoretic models to predict cell growth rate. We then introducethe experimental design and data generated in this study,namely growth curves of Vibrio natriegens, as a function oftitrated casein and glucose levels. We then evaluate the perfor-mance of a classical algorithm for training Koopman operators,the Hankel dynamic mode decomposition algorithm, and showit is unable to predict the biological growth curves. We introducea modified version of the Hankel dynamic mode decompositionalgorithm, that leverages causal-jump embeddings to predictgrowth rate. We show this algorithm is able to predict growthcurves as a function of casein, but that glucose concentrationis ultimately non-informative. Our work suggests a method fordesigning media to achieve optimal growth rates in organisms,whether the goal is to achieve a slow growing microbe or a fastgrowing one.

I. INTRODUCTION

One of the most fundamental processes in life is the abilityto replicate and pass on hereditary material [1]. From viralparticles to bacteria to mammalian cells, cell division isfundamental to growth, maintenance of physiological health,and intrinsically tied to the notion of senescence [2].

The mechanisms for controlling growth in organismsare determined by metabolic networks [3], [4], namelytheir topological structure and parametric realization. Knownmetabolic networks in well studied model organisms suchas E. coli [5] and S. cerevisiae [6], [7] have given riseto predictive models that translate environmental activity tometabolic network state, and ultimately to predictions ofgrowth rate. For canonical biological model systems, thesemodels have been highly accurate in predicting growth rateand found utility in industrial microbiology applications, e.g.in the design of bioreactors or informing best practices infood safety.

For many biological life forms, relatively little is knownabout their metabolic network or structure. This is especiallythe case when developing bioengineering tools in novel hostmicrobes [8], [9]. For new organisms, canonical metabolicnetworks are lacking and often obtained through a process ofsequence alignment and comparative analysis with existingmetabolic network models in relative species. However,many novel strains do not exhibit significant similarity, andeven in the case of sequence similarity, small mutations canlead to dramatically different growth phenotypes, e.g. growthof non-pathogenic soil strains [10], [11] versus pathogeniccounterparts [12]. The absence of predictive cross-speciesmodels, as well as the inability to predict growth phenotype

wholly from sequence data, motivates the need for data-driven methods to accelerate the discovery of metabolicmodels and growth rate prediction models.

Due to advances in high-throughput experimental tech-niques, it is relatively easy to characterize growth rates asa function of exposure to environment. Liquid and acoustic-liquid handling robotics enables interrogation of thousandsof growth conditions in a single microtiter plate, whichin turn opens the door for using data-driven approaches[13] to predict growth rate as a function of environmentalstate. Is it possible to accurately predict the growth rateof a microbe, entirely from the chemical composition andenvironmental parameters of its growth condition? In thispaper we explore a data-driven operator theoretic approachthat utilizes microtiter plate reader data, and more generallymulti-variate time-series data, to develop predictive modelsof growth rate in Vibrio natriegens, one of the fastest growingorganisms in the world and a target workhorse [14].

A broadly successful class of data-driven modeling ap-proaches stem from the study of Koopman operators, amathematical construct for representing the time-evolution ofnonlinear dynamical systems. In Koopman operator theory,the time-evolution of a nonlinear system is defined on afunction space, acting on the original state of the system.In this function space, known the observables space, theKoopman operator is a linear operator, enabling spectralanalysis, the decomposition of eigenspaces, and study ofnonlinear structure [15]. The Koopman operator frameworkhas been developed for continuous [16] and discrete time sys-tems [17], [18], for open-loop [17] and input-controlled [19],[20] dynamic systems. Koopman operators generically can bedivided into two categories, those with discrete (countable)spectra [17] and those with (uncountable) spectra [15].Thus, Koopman operators present a powerful frameworkfor analyzing the behavior of nonlinear systems, includingpredicting or forecasting behavior in a data-driven context.

Many numerical methods for discovering Koopman op-erators directly from data have been developed in the lasttwo decades [21]–[28]. The most classic approach is the usedynamic mode decomposition (DMD), which models non-linear dynamics via an approximate local linear expansion[25]. Next there is extended dynamic mode decomposition,which uses an extended dictionary of basis functions [17] orfunctions with universal function approximation propertiesto discover an approximation of the lifting map or observ-ables. These techniques suffer from combinatorial explosion,which generally has prohibited analysis of high-dimensionalnonlinear systems [29].

Page 2: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

The most recent developments in the field of dynamicmode decomposition integrate established advances in deeplearning with dynamic mode decomposition [27], [30]–[32].The promise of these deep learning techniques is that deepneural networks have phenomenal capacity to approximateexponentially many distinct observable functions, at a lin-ear increase in parameter complexity by increasing neuralnetwork depth [27]. An additional layer allows for a combi-natorial expansion in the number of possible functions thatcan be expressed. Thus, neural networks-based approachesto deep learning have demonstrated the ability to use low-dimensional dictionaries to discover nonlinear dynamics, onsystems that traditionally would require high-dimensionaldictionary functions (or were previously intractable) [27].Recent work has shown deep Koopman learning algorithmscan be extended to synthesize controllers for systems subjectto uncertainty [33], suggesting that deep Koopman learningcan be used broadly for robust controller synthesis.

In this paper, we propose a new dynamic mode decompo-sition algorithm to predict growth rate as a function of mediaconditions, resolving the acausality of the Hankel dynamicmode decomposition algorithm with a modified causal-jumpoperator. In Section II we formulate the Koopman operatorand introduce the method of extended dynamic mode de-composition. In Section III we introduce an experimentaldataset, the design of the experiment and visualize the data.Section IV reviews the Hankel dynamic mode decompositionalgorithm and evaluates its performance, showing it is unableto predict the growth curves with any degree of accuracyin extended forecasting. In Section V we introduce thecausal-jump dynamic mode decomposition algorithm, a newalgorithm that adapts the idea of delay embeddings andHankel dynamic mode decomposition, but resolves a longstanding issue with causality. We show that the algorithm isable to train a predictive Koopman operator, that predictswith 3.4% on the training data and 9% on the test dataon extended forecasting tasks approximately 500 time stepsahead.

II. KOOPMAN OPERATOR FORMULATION

Consider a discrete-time autonomous nonlinear dynamicalsystem

x[k + 1] = f(x[k]) (1)

with f : Rn → Rn is analytic. Then, there exists a Koopmanoperator [34] of (1), which acts on a function space F as K: F → F . This action can be given by

Kψ(x[k]) = ψ ◦ f(x[k]). (2)

where the function ψ : Rn → R is called an observable ofthe system and the set of all observables ψ , {ψi}pi=1, p ≤∞ on the system. Here F is invariant under the action of K.

The most important property of the Koopman operator thatwe utilize is the linearity of the operator, in other words,

K(αψ1 + βψ2) = αψ1 ◦ f + βψ2 ◦ f = αKψ1 + βKψ2

which follows from (2) since the composition operator islinear. Thus, we have that the Koopman operator of (1) is alinear operator that acts on observable functions ψ(xk) andpropagates them forward in time.

A. DMD and relevant variants

The practical identification of Koopman operator for anonlinear system from input-output data is commonly doneusing DMD [25] or extended DMD [17] which constructsan approximate Koopman operator K. Rowley et. al showedthat the finite-dimensional approximation to the Koopmanoperator obtained from DMD is closely related to a spectralanalysis of the linear but infinite-dimensional Koopman op-erator [18]. The approach taken to compute an approximationto the Koopman operator in both DMD and extended DMDis as follows

K = minK||Ψ(Xf )−KΨ(Xp)|| = Ψ(Xf )Ψ(Xp)

† (3)

where Xf ≡[x1 . . . xN−1

], Xp ≡

[x2 . . . xN

]are

snapshot matrices formed from the discrete-time dynamicalsystem (1), Ψ(X) ≡

[ψ1(x) . . . ψR(x)

]is the mapping

from physical space into the space of observables and †

denotes the Moore-Penrose pseudoinverse. Here N is thenumber of snapshots i.e. timepoints. We note that DMD is aspecial case of extended DMD where ψ(x) = x. Throughoutthe rest of the paper, when we refer to the Koopman operatorwe mean the finite dimensional approximation to the infinite-dimensional Koopman operator.

III. EXPERIMENTAL SETUP

To demonstrate the effectiveness of this novel modelingapproach, we chose Vibrio natriegens for the bacterial spec-imen for this experiment primarily due to its enormouslyhigh growth rate [35]. It also serves as a potential candidatefor hosting genetic circuits. The key advantages that thisoffers to our experiment is the ability to see the growth curvedynamics quickly thereby decreasing the sampling time andenabling rapid proof of concept testing. We formulate theproblem of describing V. nat.’s growth rate as an input-outputproblem. The inputs are specific pulse inputs of Casein andGlucose added to V. nat.’s culture medium (described below).The output dataset is the rate of growth calculated from thevariation of optical density obtained from periodic OD600measurements from a plate reader.

Incubating V. nat.: V. natriegens cryo-preserved at −70oCin 30%(vol/vol) glycerol stock is revived by suspending asmall portion into a polypropylene test tube containing 4mLLysogeny Broth (LB). This is cultured at 37oC at 200rpm for4 hours. A cloudy culture medium inside the tube indicatesthe successful formation of a V. nat. culture.

Solution Preparation: The medium chosen for the growthof V. nat. is the VN minimal medium (a modified CGXIImedium) which was used in [ref] to study the individualeffect of various inputs like sugars and intermediates inthe metabolic pathway. VN minimal medium includes acombination of salts (refer Table I). We prepared it by addingthese required salts (mass per liter shown in Table I) to a half

Page 3: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

filled 1L autoclaved bottle with Milli-Q water which is laterfilled up to 1L and mixed. After the dissolution of salts, themedium is sterilized by filtration. Due to the possibility forthese salts to dissociate at high temperatures, autoclaving ofthe solution was not done.

Glucose (Sugar) and Casein (Proteins) are the two inputsthat will be added in varying amounts to the culture to modeltheir effects on the growth curve of V. nat. and hence a veryconcentrated solution of each. 500g/L solution of Glucose iscreated by mixing 4g of Glucose in 8mL of Milli-Q waterand vortexed rigorously for 15-30mins and is taken as the1x concentration Glucose solution. Since Casein does notdissolve in water, Casein acid hydrolysate is used in its placeto bypass the solubility issue. The downside is that Caseinhydrolysate does not contain the amino acids Tryptophanand Cystine as they are destroyed during acid hydrolysis.250g/L solution of Casein acid hydrolysate is created bymixing 2g of it in 8mL of Milli-Q water and is taken asthe 1x concentration Casein solution.

Serial dilution setup for VNat culture: To measurethe patterns of growth of Vibrio Natriegens in the modifiedVN minimal media (VN minimal media with Casein andGlucose), we periodically measured OD600 values for thesemicrobes in a 96-well plate. Each well of this plate contained300µL of modified media - 150µL of VN minimal mixedwith a 150µL input solution containing both Casein andGlucose. The constituent volumes of Casein and Glucosein this latter 150µL were calculated based on a 2D serialdilution to study a variety of cases.

Using a fresh 96-well plate, we added 250µL of casein ata concentration 250g/L to well A1. Let this be defined as1x concentration for casein. By filling the remaining wellsin row A with casein at 0.5x concentration, one-dimensionalserial dilution can be first performed across the rows A-H.We added 150µL of MilliQ to each well in these rows priorto this. The serial dilution using 100µL from row A resultsin wells of the casein concentration diluted by 2.5× theprevious well in the same column. Further, we added 25µLof both the 1x Casein solution and Glucose of concentrationof 500g/L. This enables us to maintain the concentration ofCasein at 0.5x in column 2 and the resulting concentrationof Glucose is 67.5g/L (1x concentration for Glucose). Asecond serial dilution is performed across the columns (2-11) using 50µL from column 2. This results in wells whoseconcentration is 4× diluted compared to the preceding wellin the same row.

We retrieved cultures from the incubator, centrifuged themand discarded the supernatant. To remove residual media, wewashed the cell pellet in PBS solution twice using the vortexmachine and a centrifuge. To the cell pellet, the VN minimalmedia is added and vortexed. Finally, we re-suspended thecell pellet in VN Mininal media using the vortex machine.150µL of this culture is added to each well of the microplatefor a total volume of 300µL in each well and was used inthe plate reader experiment.

Data Collection: The microplate reader is set to 37oCand the shaker to 807cpm continuous double orbital shaking.

TABLE I: Salts required to prepare 1 litre of VN minimal media

Salt mass/L salt mass/LAmmonium Sulphate 5g Iron Sulphate 16.4mgSodium Chloride 15g Manganese Sulphate 10mgPotassium Phosphate 1g Copper Sulphate 0.3mgMonobasicPotassium Phosphate 1g Zinc Sulphate 1mgDibasicMagnesium Sulphate 250mg Nickel Chloride 0.02mgCalcium Chloride 10mg MOPS acid 21mg

Amount of Glucose

Amou

nt o

f Cas

ein

1 2 3 4 5 6 7 8 9 10 11

A

C

B

D

F

E

G

H

9.37mg<latexit sha1_base64="57dHkUag5GQXwP9BNtcLNhgMz08=">AAAB7nicbVBNSwMxEJ2tX7V+VT16CRbB07LbCtVb0YvHCvYD2qVk02wbmmSXJCuUpT/CiwdFvPp7vPlvTNs9aOuDgcd7M8zMCxPOtPG8b6ewsbm1vVPcLe3tHxwelY9P2jpOFaEtEvNYdUOsKWeStgwznHYTRbEIOe2Ek7u533miSrNYPpppQgOBR5JFjGBjpc6NW6sjMRqUK57rLYDWiZ+TCuRoDspf/WFMUkGlIRxr3fO9xAQZVoYRTmelfqppgskEj2jPUokF1UG2OHeGLqwyRFGsbEmDFurviQwLracitJ0Cm7Fe9ebif14vNdF1kDGZpIZKslwUpRyZGM1/R0OmKDF8agkmitlbERljhYmxCZVsCP7qy+ukXXX9mlt9uKo0bvM4inAG53AJPtShAffQhBYQmMAzvMKbkzgvzrvzsWwtOPnMKfyB8/kDwjiOiQ==</latexit>

2.34mg<latexit sha1_base64="GAeMtlf3sVh5za0kBjrD+dY46d4=">AAAB7nicbVBNSwMxEJ2tX7V+VT16CRbB07LbFuqx6MVjBfsB7VKyabYNTbJLkhXK0h/hxYMiXv093vw3pu0etPXBwOO9GWbmhQln2njet1PY2t7Z3Svulw4Oj45PyqdnHR2nitA2iXmseiHWlDNJ24YZTnuJoliEnHbD6d3C7z5RpVksH80soYHAY8kiRrCxUrfq1upIjIfliud6S6BN4uekAjlaw/LXYBSTVFBpCMda930vMUGGlWGE03lpkGqaYDLFY9q3VGJBdZAtz52jK6uMUBQrW9Kgpfp7IsNC65kIbafAZqLXvYX4n9dPTXQTZEwmqaGSrBZFKUcmRovf0YgpSgyfWYKJYvZWRCZYYWJsQiUbgr/+8ibpVF2/5lYf6pXmbR5HES7gEq7BhwY04R5a0AYCU3iGV3hzEufFeXc+Vq0FJ585hz9wPn8Ast2Ofw==</latexit>

585.93µg<latexit sha1_base64="QLHptLEtDFcjwR0Zs4AwGEkF/Gg=">AAAB83icbVDLSsNAFJ3UV62vqks3g0VwFZLWYt0V3bisYB/QhDKZTtqhM5MwD6GE/oYbF4q49Wfc+TdO2yy09cCFwzn3cu89Ucqo0p737RQ2Nre2d4q7pb39g8Oj8vFJRyVGYtLGCUtkL0KKMCpIW1PNSC+VBPGIkW40uZv73SciFU3Eo56mJORoJGhMMdJWCuqNuntTgwE3cDQoVzzXWwCuEz8nFZCjNSh/BcMEG06Exgwp1fe9VIcZkppiRmalwCiSIjxBI9K3VCBOVJgtbp7BC6sMYZxIW0LDhfp7IkNcqSmPbCdHeqxWvbn4n9c3Om6EGRWp0UTg5aLYMKgTOA8ADqkkWLOpJQhLam+FeIwkwtrGVLIh+Ksvr5NO1fVrbvXhqtK8zeMogjNwDi6BD65BE9yDFmgDDFLwDF7Bm2OcF+fd+Vi2Fpx85hT8gfP5A5UlkBc=</latexit>

146.48µg<latexit sha1_base64="1II4fFrNlKhHp3pbewmPu1Qjxvw=">AAAB83icbVDLSgNBEOyNrxhfUY9eBoPgKezGoDkGvXiMYB6QXcLsZDYZMrO7zEMIS37DiwdFvPoz3vwbJ8keNLGgoajqprsrTDlT2nW/ncLG5tb2TnG3tLd/cHhUPj7pqMRIQtsk4YnshVhRzmLa1kxz2kslxSLktBtO7uZ+94lKxZL4UU9TGgg8ilnECNZW8r36dbXeQL4waDQoV9yquwBaJ15OKpCjNSh/+cOEGEFjTThWqu+5qQ4yLDUjnM5KvlE0xWSCR7RvaYwFVUG2uHmGLqwyRFEibcUaLdTfExkWSk1FaDsF1mO16s3F/7y+0VEjyFicGk1jslwUGY50guYBoCGTlGg+tQQTyeytiIyxxETbmEo2BG/15XXSqVW9q2rtoV5p3uZxFOEMzuESPLiBJtxDC9pAIIVneIU3xzgvzrvzsWwtOPnMKfyB8/kDijmQEA==</latexit>

36.62µg<latexit sha1_base64="p4qW5FMzK99RxjL5mX+6p6Mk4Ec=">AAAB8nicbVBNSwMxEJ2tX7V+VT16CRbB07LbSvVY9OKxgv2A7VKyabYNTbJLkhVK6c/w4kERr/4ab/4b03YP2vpg4PHeDDPzopQzbTzv2ylsbG5t7xR3S3v7B4dH5eOTtk4yRWiLJDxR3QhrypmkLcMMp91UUSwiTjvR+G7ud56o0iyRj2aS0lDgoWQxI9hYKajV3XoV9USGhv1yxXO9BdA68XNSgRzNfvmrN0hIJqg0hGOtA99LTTjFyjDC6azUyzRNMRnjIQ0slVhQHU4XJ8/QhVUGKE6ULWnQQv09McVC64mIbKfAZqRXvbn4nxdkJr4Jp0ymmaGSLBfFGUcmQfP/0YApSgyfWIKJYvZWREZYYWJsSiUbgr/68jppV12/5lYfriqN2zyOIpzBOVyCD9fQgHtoQgsIJPAMr/DmGOfFeXc+lq0FJ585hT9wPn8AES6P0A==</latexit>

9.15µg<latexit sha1_base64="uejjL8jLD19B6BDu8l56HxjShNc=">AAAB8XicbVBNSwMxEJ2tX7V+VT16CRbB07JbFfVW9OKxgv3AdinZNNuGJtklyQpl6b/w4kERr/4bb/4b03YP2vpg4PHeDDPzwoQzbTzv2ymsrK6tbxQ3S1vbO7t75f2Dpo5TRWiDxDxW7RBrypmkDcMMp+1EUSxCTlvh6Hbqt56o0iyWD2ac0EDggWQRI9hY6fHa9S9QV6Ro0CtXPNebAS0TPycVyFHvlb+6/ZikgkpDONa643uJCTKsDCOcTkrdVNMEkxEe0I6lEguqg2x28QSdWKWPoljZkgbN1N8TGRZaj0VoOwU2Q73oTcX/vE5qoqsgYzJJDZVkvihKOTIxmr6P+kxRYvjYEkwUs7ciMsQKE2NDKtkQ/MWXl0mz6vpnbvX+vFK7yeMowhEcwyn4cAk1uIM6NICAhGd4hTdHOy/Ou/Mxby04+cwh/IHz+QOebY+U</latexit>

2.28µg<latexit sha1_base64="kT/lAHOk0XIzQrk9CzbF/4HEw6o=">AAAB8XicbVBNS8NAEJ34WetX1aOXxSJ4CkkU7LHoxWMF+4FtKJvtpl26uwm7G6GE/gsvHhTx6r/x5r9x2+agrQ8GHu/NMDMvSjnTxvO+nbX1jc2t7dJOeXdv/+CwcnTc0kmmCG2ShCeqE2FNOZO0aZjhtJMqikXEaTsa38789hNVmiXywUxSGgo8lCxmBBsrPQZuUEM9kaFhv1L1XG8OtEr8glShQKNf+eoNEpIJKg3hWOuu76UmzLEyjHA6LfcyTVNMxnhIu5ZKLKgO8/nFU3RulQGKE2VLGjRXf0/kWGg9EZHtFNiM9LI3E//zupmJa2HOZJoZKsliUZxxZBI0ex8NmKLE8IklmChmb0VkhBUmxoZUtiH4yy+vklbg+pducH9Vrd8UcZTgFM7gAny4hjrcQQOaQEDCM7zCm6OdF+fd+Vi0rjnFzAn8gfP5A5m7j5E=</latexit>

0.57µg<latexit sha1_base64="IOdms21duQph1lqn/nLE372ecLw=">AAAB8XicbVBNSwMxEJ2tX7V+VT16CRbB07JblXosevFYwX5gu5Rsmm1Dk+ySZIWy9F948aCIV/+NN/+NabsHbX0w8Hhvhpl5YcKZNp737RTW1jc2t4rbpZ3dvf2D8uFRS8epIrRJYh6rTog15UzSpmGG006iKBYhp+1wfDvz209UaRbLBzNJaCDwULKIEWys9Oi5VzXUEyka9ssVz/XmQKvEz0kFcjT65a/eICapoNIQjrXu+l5iggwrwwin01Iv1TTBZIyHtGupxILqIJtfPEVnVhmgKFa2pEFz9fdEhoXWExHaToHNSC97M/E/r5ua6DrImExSQyVZLIpSjkyMZu+jAVOUGD6xBBPF7K2IjLDCxNiQSjYEf/nlVdKquv6FW72/rNRv8jiKcAKncA4+1KAOd9CAJhCQ8Ayv8OZo58V5dz4WrQUnnzmGP3A+fwCZuI+R</latexit>

0.14µg<latexit sha1_base64="y06P6xM5hhnzv9PuAu0EYO9bKMA=">AAAB8XicbVBNSwMxEJ2tX7V+VT16CRbB07JbC/VY9OKxgv3AdinZNNuGJtklyQpl6b/w4kERr/4bb/4b03YP2vpg4PHeDDPzwoQzbTzv2ylsbG5t7xR3S3v7B4dH5eOTto5TRWiLxDxW3RBrypmkLcMMp91EUSxCTjvh5Hbud56o0iyWD2aa0EDgkWQRI9hY6dFz/RrqixSNBuWK53oLoHXi56QCOZqD8ld/GJNUUGkIx1r3fC8xQYaVYYTTWamfappgMsEj2rNUYkF1kC0unqELqwxRFCtb0qCF+nsiw0LrqQhtp8BmrFe9ufif10tNdB1kTCapoZIsF0UpRyZG8/fRkClKDJ9agoli9lZExlhhYmxIJRuCv/ryOmlXXf/Krd7XKo2bPI4inME5XIIPdWjAHTShBQQkPMMrvDnaeXHenY9la8HJZ07hD5zPH47uj4o=</latexit>

0.03µg<latexit sha1_base64="TkY4vEfbRsm4jgCXppba9szrmm8=">AAAB8XicbVDLTgIxFL2DL8QX6tJNIzFxRWbARJdENy4xETDChHRKBxrazqQPEzLhL9y40Bi3/o07/8YCs1DwJE1Oz7k3994TpZxp4/vfXmFtfWNzq7hd2tnd2z8oHx61dWIVoS2S8EQ9RFhTziRtGWY4fUgVxSLitBONb2Z+54kqzRJ5byYpDQUeShYzgo2THv2qX0c9YdGwX664zxxolQQ5qUCOZr/81RskxAoqDeFY627gpybMsDKMcDot9aymKSZjPKRdRyUWVIfZfOMpOnPKAMWJck8aNFd/d2RYaD0RkasU2Iz0sjcT//O61sRXYcZkag2VZDEothyZBM3ORwOmKDF84ggmirldERlhhYlxIZVcCMHyyaukXasG9Wrt7qLSuM7jKMIJnMI5BHAJDbiFJrSAgIRneIU3T3sv3rv3sSgteHnPMfyB9/kDi9mPiA==</latexit>

0.0

31m

g<latexit sha1_base64="+lxnn8telWZSiQ/W9lC4JZ0w3No=">AAAB73icbZDLSgMxFIbPeK31VnXpJlgEV2WmFXRZdOOygr1AO5RMmmlDcxmTjFCGvoQbF4q49XXc+Tam7Sy09YfAx3/OIef8UcKZsb7/7a2tb2xubRd2irt7+weHpaPjllGpJrRJFFe6E2FDOZO0aZnltJNoikXEaTsa387q7SeqDVPywU4SGgo8lCxmBFtndfyKXwuQGPZLZYdzoVUIcihDrka/9NUbKJIKKi3h2Jhu4Cc2zLC2jHA6LfZSQxNMxnhIuw4lFtSE2XzfKTp3zgDFSrsnLZq7vycyLIyZiMh1CmxHZrk2M/+rdVMbX4cZk0lqqSSLj+KUI6vQ7Hg0YJoSyycOMNHM7YrICGtMrIuo6EIIlk9ehVa1EtQq1fvLcv0mj6MAp3AGFxDAFdThDhrQBAIcnuEV3rxH78V79z4WrWtePnMCf+R9/gAaQY60</latexit>

0.0

77m

g<latexit sha1_base64="NdkEgrtfNEQEmEhQiEe0zKyA8Pg=">AAAB73icbZDLSgMxFIZPvNZ6q7p0EyyCqzJThbosunFZwV6gHUomzbShSWZMMkIZ+hJuXCji1tdx59uYtrPQ1h8CH/85h5zzh4ngxnreN1pb39jc2i7sFHf39g8OS0fHLROnmrImjUWsOyExTHDFmpZbwTqJZkSGgrXD8e2s3n5i2vBYPdhJwgJJhopHnBLrrI5X8Wo1LIf9UtnhXHgV/BzKkKvRL331BjFNJVOWCmJM1/cSG2REW04FmxZ7qWEJoWMyZF2Hikhmgmy+7xSfO2eAo1i7pyyeu78nMiKNmcjQdUpiR2a5NjP/q3VTG10HGVdJapmii4+iVGAb49nxeMA1o1ZMHBCqudsV0xHRhFoXUdGF4C+fvAqtasW/rFTvr8r1mzyOApzCGVyADzWowx00oAkUBDzDK7yhR/SC3tHHonUN5TMn8Efo8wcpi46+</latexit>

0.48

mg

<latexit sha1_base64="TI+rgMR4NylgF/hAUYZ3+LB3Akw=">AAAB7nicbVBNSwMxEJ2tX7V+VT16CRbB07JbC+2x6MVjBfsB7VKyabYNTbJLkhXK0h/hxYMiXv093vw3pu0etPXBwOO9GWbmhQln2njet1PY2t7Z3Svulw4Oj45PyqdnHR2nitA2iXmseiHWlDNJ24YZTnuJoliEnHbD6d3C7z5RpVksH80soYHAY8kiRrCxUtdzaw0kxsNyxXO9JdAm8XNSgRytYflrMIpJKqg0hGOt+76XmCDDyjDC6bw0SDVNMJniMe1bKrGgOsiW587RlVVGKIqVLWnQUv09kWGh9UyEtlNgM9Hr3kL8z+unJmoEGZNJaqgkq0VRypGJ0eJ3NGKKEsNnlmCimL0VkQlWmBibUMmG4K+/vEk6Vde/casPtUrzNo+jCBdwCdfgQx2acA8taAOBKTzDK7w5ifPivDsfq9aCk8+cwx84nz+3bY6C</latexit>

1.2m

g<latexit sha1_base64="9qmE/cdrZz+U+ObiJURzVlbgJb8=">AAAB7XicbVDLSgNBEOyNrxhfUY9eBoPgKexGQY9BLx4jmAckS5idzCZj5rHMzAphyT948aCIV//Hm3/jJNmDJhY0FFXddHdFCWfG+v63V1hb39jcKm6Xdnb39g/Kh0cto1JNaJMornQnwoZyJmnTMstpJ9EUi4jTdjS+nfntJ6oNU/LBThIaCjyULGYEWye1gmoNiWG/XPGr/hxolQQ5qUCORr/81RsokgoqLeHYmG7gJzbMsLaMcDot9VJDE0zGeEi7jkosqAmz+bVTdOaUAYqVdiUtmqu/JzIsjJmIyHUKbEdm2ZuJ/3nd1MbXYcZkkloqyWJRnHJkFZq9jgZMU2L5xBFMNHO3IjLCGhPrAiq5EILll1dJq1YNLqq1+8tK/SaPowgncArnEMAV1OEOGtAEAo/wDK/w5invxXv3PhatBS+fOYY/8D5/ADrgjj8=</latexit>

3mg

<latexit sha1_base64="9ls0wuk7/PIGMgcNP1cWXBirkE4=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0laQY9FLx4r2A9oQ9lsN+3S3U3Y3Qgl9C948aCIV/+QN/+NmzQHbX0w8Hhvhpl5QcyZNq777ZQ2Nre2d8q7lb39g8Oj6vFJV0eJIrRDIh6pfoA15UzSjmGG036sKBYBp71gdpf5vSeqNIvko5nH1Bd4IlnICDaZ1ERiMqrW3LqbA60TryA1KNAeVb+G44gkgkpDONZ64Lmx8VOsDCOcLirDRNMYkxme0IGlEguq/TS/dYEurDJGYaRsSYNy9fdEioXWcxHYToHNVK96mfifN0hMeOOnTMaJoZIsF4UJRyZC2eNozBQlhs8twUQxeysiU6wwMTaeig3BW315nXQbda9Zbzxc1Vq3RRxlOINzuAQPrqEF99CGDhCYwjO8wpsjnBfn3flYtpacYuYU/sD5/AFg2Y3N</latexit>

7.5m

g<latexit sha1_base64="pwOZYIFq5EH9shMgJ0m5ByXNp5Q=">AAAB7XicbVBNSwMxEJ2tX7V+VT16CRbB07JblXosevFYwX5Au5Rsmm1jk+ySZIWy9D948aCIV/+PN/+NabsHbX0w8Hhvhpl5YcKZNp737RTW1jc2t4rbpZ3dvf2D8uFRS8epIrRJYh6rTog15UzSpmGG006iKBYhp+1wfDvz209UaRbLBzNJaCDwULKIEWys1Kq5V0gM++WK53pzoFXi56QCORr98ldvEJNUUGkIx1p3fS8xQYaVYYTTaamXappgMsZD2rVUYkF1kM2vnaIzqwxQFCtb0qC5+nsiw0LriQhtp8BmpJe9mfif101NdB1kTCapoZIsFkUpRyZGs9fRgClKDJ9Ygoli9lZERlhhYmxAJRuCv/zyKmlVXf/Crd5fVuo3eRxFOIFTOAcfalCHO2hAEwg8wjO8wpsTOy/Ou/OxaC04+cwx/IHz+QNIq45I</latexit>

18.7

5m

g<latexit sha1_base64="LAqaMGgkhr1All6izDf2SgMlJeM=">AAAB73icbVBNSwMxEJ31s9avqkcvwSJ4Wnar0h6LXjxWsB/QLiWbZtvQJLsmWaEs/RNePCji1b/jzX9j2u5BWx8MPN6bYWZemHCmjed9O2vrG5tb24Wd4u7e/sFh6ei4peNUEdokMY9VJ8SaciZp0zDDaSdRFIuQ03Y4vp357SeqNIvlg5kkNBB4KFnECDZW6vg1t3qNxLBfKnuuNwdaJX5OypCj0S999QYxSQWVhnCsddf3EhNkWBlGOJ0We6mmCSZjPKRdSyUWVAfZ/N4pOrfKAEWxsiUNmqu/JzIstJ6I0HYKbEZ62ZuJ/3nd1ES1IGMySQ2VZLEoSjkyMZo9jwZMUWL4xBJMFLO3IjLCChNjIyraEPzll1dJq+L6l27l/qpcv8njKMApnMEF+FCFOtxBA5pAgMMzvMKb8+i8OO/Ox6J1zclnTuAPnM8fNFqOxQ==</latexit>

0.19

mg

<latexit sha1_base64="tdQDl0QoHECYzs8Cp4dxxGKjbmE=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBbBU0iqoN6KXjxWsB/QhrLZbtqlu5uwuxFK6I/w4kERr/4eb/4bt2kO2vpg4PHeDDPzwoQzbTzv2ymtrW9sbpW3Kzu7e/sH1cOjto5TRWiLxDxW3RBrypmkLcMMp91EUSxCTjvh5G7ud56o0iyWj2aa0EDgkWQRI9hYqeO5/g0So0G15rleDrRK/ILUoEBzUP3qD2OSCioN4Vjrnu8lJsiwMoxwOqv0U00TTCZ4RHuWSiyoDrL83Bk6s8oQRbGyJQ3K1d8TGRZaT0VoOwU2Y73szcX/vF5qousgYzJJDZVksShKOTIxmv+OhkxRYvjUEkwUs7ciMsYKE2MTqtgQ/OWXV0m77voXbv3hsta4LeIowwmcwjn4cAUNuIcmtIDABJ7hFd6cxHlx3p2PRWvJKWaO4Q+czx+0XI6A</latexit>

0mg<latexit sha1_base64="X+U0kyMuBnyKIa7riHIPR5Bxbgs=">AAAB63icbVBNSwMxEJ34WetX1aOXYBE8ld0q6LHoxWMF+wHtUrJptg1NskuSFcrSv+DFgyJe/UPe/Ddm2z1o64OBx3szzMwLE8GN9bxvtLa+sbm1Xdop7+7tHxxWjo7bJk41ZS0ai1h3Q2KY4Iq1LLeCdRPNiAwF64STu9zvPDFteKwe7TRhgSQjxSNOic0lD8vRoFL1at4ceJX4BalCgeag8tUfxjSVTFkqiDE930tskBFtORVsVu6nhiWETsiI9RxVRDITZPNbZ/jcKUMcxdqVsniu/p7IiDRmKkPXKYkdm2UvF//zeqmNboKMqyS1TNHFoigV2MY4fxwPuWbUiqkjhGrubsV0TDSh1sVTdiH4yy+vkna95l/W6g9X1cZtEUcJTuEMLsCHa2jAPTShBRTG8Ayv8IYkekHv6GPRuoaKmRP4A/T5A1xEjco=</latexit>

Fig. 1: Different initial conditions of substrates obtained by twodimensional serial dilution of Casein and Glucose and the corre-sponding growth curves are obtained for a period of 48 hours.

The absorbance at 600nm which is termed as the OpticalDensity (OD600) is measured as a function of time for 48hours. This serves as the indicator of the cell concentrationwhich when multiplied by the volume of solution gives thenumber of cells present in each well. Since the volume ismaintained constant, cell concentration can be used in theplace of cumber of cells. The obtained data along with theinputs are shown in Figure. 1.

IV. CAUSAL-JUMP EXTENDED DYNAMIC MODEDECOMPOSITION FOR GROWTH PREDICTION

A. The Growth Rate Modeling Problem

To formulate the problem of optimal input identificationfor maximal growth rate as a function of time, we need toidentify a model that maps the growth rate to the input. Thestates that represent this system are the bacterial cell count(Nb) and the amount of substrates - Casein (C) and Glucose(G) which change as a function of time (pictorially shown inFigure ??). The substrates are also the inputs to the system.Hence the general equation of the system isNb[k + 1]

C[k + 1]G[k + 1]

= f(Nb[k], C[k], G[k]) + g(C[k], G[k])

where f represents the dynamics and g represent the inputs.In the experimental setup (see section III), the substrates areadded only at the initial time and can be treated as an impulse

Page 4: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

input. To simplify the scenario, in this paper, the impulseinput is treated as an initial condition of the substrates. Thesystem is then simplified in the discrete time framework asNb[k + 1]

C[k + 1]G[k + 1]

= f(Nb[k], C[k], G[k]) (4)

given nonzero initial conditions Nb[0], C[0] and G[0]. OD600is the measured quantity in the experiment which has a one-to-one correspondence with Nb.

y[k] = OD600[k] = h(Nb) (5)

There are identified empirical nonlinear models like that ofMonod’s [36] which use a single substrate and [37] and [38]which use multiple substrates. Monod’s model is a two-statenonlinear dynamical system comprising the substrate(S) andthe number of bacteria(Nb):

Nb(t) = rmaxS(t)Nb(t)

Ks + S(t)

S = −γNb (6)

where rmax is the maximum growth rate and Ks is the halfvelocity constant. In our experimental setup(see section III),the only variable of measurement is OD600 To obtain amodel that is more conducive to the measurement framework,we differentiate Nb(t) and eliminate the substrate dynamicsto obtain

Nb(t) =1

rmaxKsNb(KsrmaxNb

2+

2γrmaxNbNb2 − γr2

maxN2b Nb − γNb

3) (7)

This could potentially serve as the input-output models,but the drawback is that it assumes the growth curve to takea certain shape which may not be necessarily true whenculturing new microbes such as Pseudomonas fluorescens(Pf-5) or Vibrio natriegens(see Figure 1). The consequenceis that if we identify the parameters of one model based ona set of initial conditions, it does not necessarily predict theresponse for other conditions as the behavior could changeentirely.

B. The Causal-Jump Extended Dynamic Mode Decomposi-tion Algorithm for Growth Modeling

We instead consider a data-driven approach, using modelsdriven by available measurements and substrate concentra-tions. In this paper we introduce the causal-jump extendeddynamic mode decomposition algorithm, as a variant of theHankel-DMD algorithm, with the important distinction thatthe proposed dictionary functions do not violate the propertyof causality. We briefly review the Hankel DMD algorithm.

1) Hankel Dynamic Mode Decomposition: The HankelDMD algorithm solves the optimization problem

minK,θ(Ψ)

||Ψ(Xf )−KΨ(Xp)|| (8)

where

Ψ(Xf ) =

ψ[t+ 1](x10) . . . ψ[t+ 1](xNT

0 )

ψ[t+ 2](x10) . . . ψ[t+ 2](xNT0 )

.... . .

...ψ[t+ n](x10) . . . ψ[t+ n](xNT

0 )

and

Ψ(Xp) =

ψ[t](x10) . . . ψ[t](xNT

0 )

ψ[t+ 2](x10) . . . ψ[t+ 2](xNT0 )

.... . .

...ψ[t+ n− 1](x10) . . . ψ[t+ n− 1](xNT

0 )

and n is the number of timepoints drawn from a giventime-series trace x[t] , NT is the number of separate traces,initialized from a distinct initial condition xj0, j = 1, .., NT ,and

ψ[t](x0) ≡ ψ[t, τ ](x0) ≡[x[t]T x[t+ 1]T . . . x[t+ τ ]T

]T.

Thus for any given lifted snapshot ψ[t](x0), the HankelDMD generates a model that is non-causal, as it predicts

ψ[t+ 1, x0] = Kψ[t, x0]

which implies that predictions for x[t + 1] are made usingstate information from x[t+ δ] where δ ≥ 1, which is non-causal.

2) The Causal-Jump Extended Dynamic Mode Decompo-sition Algorithm: We now propose a modification to theHankel dynamic mode decomposition algorithm and discussits implications in terms of closure and function approxima-tion theory an alternative view of snapshots of the dynamicalsystem (1).

To maintain causality, we consider windows of lengthτ ∈ Z>0 but apply a downsampling strategy that ensures theprediction task does not require utilizing future state informa-tion to predict the present state. Let x[0], x[t+1], ..., x[t+N ]be a state trajectory for the dynamical system (1). SupposeN + 1 is an integer multiple of τ . Then we divide the statetrajectory into jump partitions:

x[0], x[1], ..., x[τ − 1]

x[τ ], x[τ + 1], ..., x[2τ − 1]

...x[(R− 1)τ ], ..., x[Rτ − 1]

(9)

where Rτ − 1 = N . We can represent these windowsthrough function composition of the vector field f(x) viathe following relation,

(x[(k)τ)f(x[kτ ])

...f (τ)([x[kτ ])

= fτ

x[(k − 1)(τ))]f(x[(k − 1)(τ)]

...f (τ)(x[(k − 1)(τ)])

(10)

We refer to this new structured dynamical system as the τ -jump system of the original dynamical system (1). As long as

Page 5: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

f is locally Lipschitz with Lipschitz constant L, it is easy tosee via compositional arguments with the Lipschitz propertythat fτ will also be locally Lipschitz with Lipschitz constantLτ and therefore fτ retains a unique solution for the τ -jumpdynamical system. Furthermore, if f(x) is analytic, thenthe τ -jump system also admits the existence of a Koopmanoperator

Theorem 1: Let f(x) in system (1) be analytic. Then theτ -jump system also admists a Koopman operator.

Proof: Since f(x) is analytic, it admits a countable-dimension Koopman operator K, with a invariant subspaceisomorphic to either a finite or infinite Taylor polynomialbasis [34]. Moreover, isomorphism with a Taylor polynomialbasis ensures that the Koopman observable space contains thefull state observable, i.e. it is state inclusive [?]. This impliesthat the τ step jump from τ function compositions of f canbe modeled via the action of the Koopman operator, in thefollowing way:

ψ(x[kτ ]) = ψ(f(x(kτ − 1))) = Kψ(x(kτ − 1)) = K (11)

where ψ represents the 1-step Koopman observable function.

x(kτ) = fτ (x(k − 1)τ)

= f (τ−1) ◦ f((x(k − 1)τ + 1)

= f (τ−1) ◦Kxψ(x((k − 1)τ))

= f (τ−2) ◦Kxψ(Kxψ(x((k − 1)τ)))

= g(τ)(x(k − 1)τ)

(12)

where g(x) = Kxψ(x). . This provides an explicit form forthe recovery of the τ -jump governing equations from theone-step Koopman operator and its observable function.

There are two easy arguments to conclude the proof. First,note that since f is analytic, fτ is analytic and thus by thesame reasoning as in [34], fτ thus must admit a Koopmanoperator. The second argument is a constructive one, notingthat equation

ψ(x[(k)τ ]) = Kτψ(x[(k − 1)(τ)) (13)

must hold due to τ applications of the 1-step Koopmanequation. This means therefore that the following matrixequation must hold

ψ

x[(k)τ ]x[kτ + 1]

...x[(k + 1)τ − 1]

= KJψ

(x[(k − 1)(τ))](x[(k − 1)]τ + 1])

...(x[(k)τ − 1)]

(14)where KJ = diag (Kτ ,Kτ , . . .Kτ ) . This concludes theproof.The power of the jump partitioning is apparent when con-sidering multiple trajectories. Notice that an element of thestacked data matrix Ψ(Xp) has access to all τ entries of thestate vector to define any monomial, polynomial, or higher-order nonlinear observable, without violating the rule ofcausality, due to the partitioning of the state trajectory. Thisis precisely the approach we take with causal-jump extended

dynamic mode decomposition. The jump-Koopman learningproblem we solve is as follows

minKJ

||Ψ(Xf )−KJΨ(Xp)||

s.t.

ψ

x[(k)τ ]x[kτ + 1]

...x[(k + 1)τ − 1]

= KJψ

(x[(k − 1)(τ))](x[(k − 1)]τ + 1])

...(x[(k)τ − 1)]

where

Ψ(Xp) =

ψ[0, τ ](x10) . . . ψ[0, τ ](xNT

0 )

ψ[1, τ ](x10) . . . ψ[1, τ ](xNT0 )

.... . .

...ψ[R− 1, τ ](x10) . . . ψ[R− 1, τ ](xNT

0 )

and

Ψ(Xf ) =

ψ[1, τ ](x11) . . . ψ[1, τ ](xNT

0 )

ψ[2, τ ](x10) . . . ψ[2, τ ](xNT0 )

.... . .

...ψ[R, τ ](x10) . . . ψ[R, τ ](xNT

0 )

and with a slight abuse of notation (for brevity)

ψ[k, x0] = ψ (x(kτ), x(kτ + 1), ..., x(((k + 1)τ − 1)) ∈ RnL

where nL is the lifting dimension of the extended ba-sis defined on the set of nonlinear functions on x(kτ +1), ..., x((k+1)τ−1). The solution to the causal-jump Koop-man learning problem follows that of E-DMD, e.g. using thecompanion matrix method or singular value decompositionto compute robust estimates of the Moore-Penrose inverse onthe data matrices. Further, it can be shown that for analyticfunctions expressed by universal function approximators withalgebraic closure properties on function composition, e.g.logistic, RELU and infinite polynomial bases, these extendedcausal-jump lifting functions can generate a Koopman in-variant subspace. The proof is omitted here due to spaceconstraints.

3) Causal-Jump Extended Dynamic Mode Decompositionfor Learning Predictive Growth Models: We now applyour algorithm to generate predictive growth models forcultures of Vibrio natriegens, as a function of environmentalparameters such as casein and glucose. The key ideas to takefrom the Monod model are that the system dynamics can berepresented by the number of cells Nb[k] as a function oftime step k and that the substrate dynamics can be viewedas an initial condition of the population cell dynamics. Thisyields the model:

Nb[k + 1] = f(Nb[k]) (15)

starting with the initial conditions Nb[0], C[0] and G[0]. Thecaveat with this model is that the dynamics has no represen-tation of the input and the initial conditions are independent

Page 6: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

since the bacteria and substrates are independently addedinto each well. To bypass this issue, we exploit the fact thatwe witnessed in (7) that the substrate dynamics is containedin the bacterial cell growth dynamics. Since (7) also usessecond derivatives, it is clear that we need to include morepast information to estimate the current state. So, we breakthe whole response across into fragments and represent eachstate by a collection of time points

x[k] =[y[nk] y[nk + 1] ... y[nk + n− 1]

]T(16)

where n represents the number of data points in eachfragment. Using this representation of state, we can createa model to map the initial condition of the new state to theold one and also predict the cell growth dynamics with moreinformation. The model then translates to

x[k + 1] = F (x[k])Nb[0]C[0]G[0]

= Ax[0] (17)

With the model identification objective in mind, we geartowards system identification techniques of data-based timeseries models. Since we are dealing with a nonlinear model,we simplify the problem by identifying an approximationof the Koopman operator. The first step towards that isto construct a dictionary of observables by computing allpossible monomials of x at each time instant k.

ψ(x[k]) = [y[nk], ..., y[nk + n− 1],

y2[nk], y[nk]y[nk + 1], ..., y2[nk + n− 1]

y3[nk], y2[nk]y[nk + 1], ...]T

Assuming we have N time samples of x[k] and M data-sets, we collect all the observables as

Ψi(x−) =

[ψ(xi[0]) ψ(xi[1]) ... ψi(x[N − 2])

]Ψi(x) =

[ψi(x[1]) ψi(x[1]) ... ψi(x[N − 1])

]i = 1, 2, ...M

Ψ(x−) =[Ψ1(x−) Ψ2(x−) ... ΨM (x−)

]Ψ(x) =

[Ψ1(x) Ψ2(x) ... ΨM (x)

](18)

Then we can pose the Koopman learning problem as

Ψ(x) = KΨ(x−) (19)

which can be solved by DMD as highlighted in section II.To test the model on a new data set, we take estimate ψ(x[0])and then predict the response using

ψ(x[k]) = Kkψ(x[0]) (20)

and the required dynamics can be obtained by simply drop-ping the additional observables other than the base state. Theother model to be constructed is the mapping between theinitial states. For this purpose, across the M training sets, wecollect ψ(x[0]) and obtain

Ψ(x[0]) =[ψ1(x[0]) ψ2(x[0]) ... ψM (x[0])

].

Since yi[0] is the first entry of ψ(x[0]), it does not requireany mapping. So, we construct a matrix of initial substrateconditions.

Y0 =

[C1[0] C2[0] ... CM [0]G1[0] G2[0] ... GM [0]

].

We again formulate the similar structure

Y0 = K0Ψ(x[0]) (21)

which can again be solved by SVD formulation in SectionII.

V. RESULTS

From the data-sets obtained in the plate reader experimentsshown in Figure. 1, we identified approximate Koopmanoperators using Hankel DMD, Extended DMD and theCausal Jump DMD by using the training set of well A7. Toensure equal comparison of the three algorithms, we keep thenumber of points used to represent the initial condition samein Hankel DMD and CJDMD at 10 points and the highestorder of monomials same across EDMD and CJDMD as 10.Using the first 80% of the singular values, we see that EDMDdoes a good job but CJDMD fits almost perfectly as seen in2 where HankelDMD fails miserably.

0 20 40 60 80Time Index

0

0.5

1

1.5O

D60

0

N step Prediction Comparison

Actual DataHankelDMDExtendedDMDCausalJump DMD

50% Principal Comonents Training Data

0 20 40 60 80Time Index

0

0.5

1

1.5

2

OD

600

N step Prediction Comparison

Actual DataHankelDMDExtendedDMDCausalJump DMD

80% Principal Comonents Training Data

Fig. 2: Comparing the goodness of fit of various DMD algorithmswith a single training set with 80% of the singular values

The N step prediction capability of these models weretested in a fresh dataset of well A8 and the results are seenin Figure. 3. CJDMD fit with a absolute mean percent error

0 20 40 60 80Time Index

0

0.5

1

1.5

OD

600

N step Prediction Comparison

Actual DataHankelDMDExtendedDMDCausalJump DMD

50% Principal Comonents Training Data

0 20 40 60 80Time Index

0

0.5

1

1.5

2

OD

600

N step Prediction Comparison

Actual DataHankelDMDExtendedDMDCausalJump DMD

80% Principal Comonents Training Data

0 20 40 60 80Time Index

0

0.5

1

1.5

2

OD

600

N step Prediction Comparison

Actual DataHankelDMDExtendedDMDCausalJump DMD

80% Principal Comonents Test Data

Fig. 3: Comparing the goodness of fit of the various DMD algo-rithms based on a test set. It can be seen that Causal Jump DMDdoes a splendid job in both test set and training set with 80% ofthe singular values

of 1.5%. We then took the wells A5 , A7, A8, A9, A11,B1, B6, B8, B9, B10, C1, C6, C7, C8 and C9 as training

Page 7: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

sets and A6, A10, B7, B11 C10 and C11 as the test setsto build a better model and predict across many variations.The remaining data sets exhibit different trends and are notconsidered.

In the training process, the two parameters that we cantweak to optimize the model are n in equation (16) andmaximum order of monomials in (18). By choosing n = 10and keeping the maximum order or monomial to be 10, theKoopman operator has been identified and the prediction onthe training data is shown in Figure. 4 and on the test set isshown in Figure. 5.

0 10 20 300

1

OD600

ObservedEstimated

Well A5

0 10 20 300

1

ObservedEstimated

Well A7

0 10 20 300

1

2

ObservedEstimated

Well A8

0 10 20 300

1

2

OD600

ObservedEstimated

Well A9

0 10 20 300

1

2

ObservedEstimated

Well A11

0 10 20 300

1

ObservedEstimated

Well B1

0 10 20 30Time[Hrs]

0

1

OD600

ObservedEstimated

Well B6

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B8

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B9

0 10 20 300

1

OD600

ObservedEstimated

Well A5

0 10 20 300

1

ObservedEstimated

Well A7

0 10 20 300

1

2

ObservedEstimated

Well A8

0 10 20 300

1

2

OD600

ObservedEstimated

Well A9

0 10 20 300

1

2

ObservedEstimated

Well A11

0 10 20 300

1

ObservedEstimated

Well B1

0 10 20 30Time[Hrs]

0

1

OD600

ObservedEstimated

Well B6

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B8

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B9

Fig. 4: The identified Koopman operator is tested on the trainingsets with 10 point initial condition and up to 10th order monomialsto get a MSE of 3.4%

0 10 20 300

1

OD600

ObservedEstimated

Well A5

0 10 20 300

1

ObservedEstimated

Well A7

0 10 20 300

1

2

ObservedEstimated

Well A8

0 10 20 300

1

2

OD600

ObservedEstimated

Well A9

0 10 20 300

1

2

ObservedEstimated

Well A11

0 10 20 300

1

ObservedEstimated

Well B1

0 10 20 30Time[Hrs]

0

1

OD600

ObservedEstimated

Well B6

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B8

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B9

0 10 20 30Time[hrs]

0

0.5

1

1.5

2

OD600

ObservedEstimated

Well A6

0 10 20 30Time[hrs]

0

0.5

1

1.5

2

OD600

ObservedEstimated

Well A10

0 10 20 30Time[hrs]

0

0.5

1

1.5

OD600

ObservedEstimated

Well B7

0 10 20 30Time[hrs]

0

0.5

1

1.5

OD600

ObservedEstimated

Well B11

Fig. 5: The identified Koopman operator is tested on the test setsby using the initial observablesψ(x[0]) and the mean squared errorremains the same as that of the training set

Both show very similar Mean Square error of 3.4% on thetraining data and 9% on the test data ensuring the goodnessof the model. A model for mapping ψ(x[0]) to C[0] and G[0]has been identified using (21) and tested on both trainingand test data and shown in Figure. 6. The Casein modelperforms very well on the test data as well with less than5% on training data and less than 15% error on test data.But the glucose model fails miserably in the training data.

This indicates that Glucose does not have a major role toplay in the initial dynamics of the cell growth dynamic. This

0 10 20 300

1

OD600

ObservedEstimated

Well A5

0 10 20 300

1

ObservedEstimated

Well A7

0 10 20 300

1

2

ObservedEstimated

Well A8

0 10 20 300

1

2

OD600

ObservedEstimated

Well A9

0 10 20 300

1

2

ObservedEstimated

Well A11

0 10 20 300

1

ObservedEstimated

Well B1

0 10 20 30Time[Hrs]

0

1

OD600

ObservedEstimated

Well B6

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B8

0 10 20 30Time[Hrs]

0

1

ObservedEstimated

Well B9

0 10 20 30Time[hrs]

0

0.5

1

1.5

2

OD600

ObservedEstimated

Well A6

0 10 20 30Time[hrs]

0

0.5

1

1.5

2

OD600

ObservedEstimated

Well A10

0 10 20 30Time[hrs]

0

0.5

1

1.5

OD600

ObservedEstimated

Well B7

0 10 20 30Time[hrs]

0

0.5

1

1.5

OD600

ObservedEstimated

Well B11

0

0.05

0.1

0.15

0.2

Amt i

n So

lutio

n[m

g]

Training Data Fit to Initial Glucose

A5 A7 A8 A9Well Index

ObservedEstimated

0

5

10

15

20

Amt i

n So

lutio

n[m

g]

Training Data Fit to Initial Casein

A5 A7 A8 A9Well Index

ObservedEstimated

0

0.05

0.1

0.15

Amt i

n So

lutio

n[m

g]

Test Data Fit to Initial Glucose

A6 A10 B7 B11 C8Well Index

ObservedEstimated

0

5

10

15

20

Amt i

n So

lutio

n[m

g]

Test Data Fit to Initial Casein

A6 A10 B7 B11 C8Well Index

ObservedEstimated

00.10.2

Amt i

n So

lutio

n[m

ning Data Fit to Initial Glucose

A5 A7 A8Well Index

ObservedEstimated

01020

Amt i

n So

lutio

n[m

g]

ining Data Fit to Initial Casein

A5 A7 A8Well Index

ObservedEstimated

00.05

0.1

Amt i

n So

lutio

n[m

Test Data Fit to Initial G

A6 A10Well Index

ObservedEstimated

01020

Amt i

n So

lutio

n[m

g]

Test Data Fit to Initial C

A6 A10Well Index

ObservedEstimated

-0.05

0

0.05

0.1

0.15

0.2

Amt i

n So

lutio

n[m

g]

Training Data Fit to Initial Glucose

A5 A7 A8 A9Well Index

ObservedEstimated

0

5

10

15

20

Amt i

n So

lutio

n[m

g]

Training Data Fit to Initial Casein

A5 A7 A8 A9Well Index

ObservedEstimated

0

0.05

0.1

0.15

Amt i

n So

lutio

n[m

g]

Test Data Fit to Initial Glucose

A6 A10 B7 B11 C8Well Index

ObservedEstimated

0

5

10

15

20

Amt i

n So

lutio

n[m

g]

Test Data Fit to Initial Casein

A6 A10 B7 B11 C8Well Index

ObservedEstimated

Fig. 6: The initial substrate conditions are mapped to the initialdy-namics. The figure indicates that Casein can be predicted with ahighaccuracy reinforcing that the dynamics of Casein is embeddedin thecell growth dynamics. Glucose cannot be predicted properlyneitherin the training set nor the test set.

can also be witnessed in the data shown in Figure. 1 thatthe high growth rate is witnessed only in the region of highCasein and low Glucose and since our data set comprises ofonly the large growth rate region, the role of Glucose shouldbe minimal.

VI. CONCLUSION

We have obtained the data of bacterial cell population inresponse to various initial substrate conditions and developedmodels that utilize the initial response of the cell culture tomap the dynamics at any point in time and also map theinitial conditions to generate the response. In the modelsdeveloped above, the autocorrelation of the prediction erroris nonzero at nonzero lags indicating that the Koopmanoperator has developed the best linear model similar to anOutput-Error model in linear system identification. The fu-ture scope of the work involves modeling the noise dynamicsby stochastic models, remove the simplification of treatingimpulse input as an initial state and formulate the input-output dynamics, formulating an optimization problem whichidentifies the best initial conditions to maximize the growthrate. This will pave the way to study the bacteria in unknownenvironments, robustness to changes in environment andgenetic modification of other bacterial species to achievemaximum growth rate which is a boon to Industrial Biology.

REFERENCES

[1] D. Kornberg and D. TA, “Replication,” San Francisco: W H. Freeman,1980.

[2] N. F. Mathon and A. C. Lloyd, “Cell senescence and cancer,” NatureReviews Cancer, vol. 1, no. 3, p. 203, 2001.

[3] G. Wu, Q. Yan, J. A. Jones, Y. J. Tang, S. S. Fong, and M. A. Koffas,“Metabolic burden: cornerstones in synthetic biology and metabolicengineering applications,” Trends in biotechnology, vol. 34, no. 8, pp.652–664, 2016.

[4] D. S. Glazier, “Is metabolic rate a universal pacemakerfor biologicalprocesses?” Biological Reviews, vol. 90, no. 2, pp. 377–407, 2015.

Page 8: Prediction of the Growth Rate Population Dynamic of Bacteria by …€¦ · driven methods to accelerate the discovery of metabolic models and growth rate prediction models. Due to

[5] D. De Martino, F. Capuani, and A. De Martino, “Growth againstentropy in bacterial metabolism: the phenotypic trade-off behindempirical growth rate distributions in e. coli,” Physical biology, vol. 13,no. 3, p. 036005, 2016.

[6] B. J. Sanchez, C. Zhang, A. Nilsson, P.-J. Lahtvee, E. J. Kerkhoven,and J. Nielsen, “Improving the phenotype predictions of a yeastgenome-scale metabolic model by incorporating enzymatic con-straints,” Molecular systems biology, vol. 13, no. 8, 2017.

[7] M. Zwietering, I. Jongenburger, F. Rombouts, and K. Van’t Riet,“Modeling of the bacterial growth curve,” Appl. Environ. Microbiol.,vol. 56, no. 6, pp. 1875–1881, 1990.

[8] T. Tschirhart, V. Shukla, E. E. Kelly, Z. Schultzhaus, E. NewRingeisen,J. S. Erickson, Z. Wang, W. garcia, E. Curl, R. G. Egbert et al.,“Synthetic biology tools for the fast-growing marine bacterium vibrionatriegens,” ACS synthetic biology, 2019.

[9] N. Khan, E. Yeung, Y. Farris, S. J. Fansler, and H. C. Bernstein,“A broad-host-range event detector: expanding and quantifying per-formance across bacterial species,” bioRxiv, p. 369967, 2018.

[10] C. Gill and K. Tan, “Effect of carbon dioxide on growth of pseu-domonas fluorescens.” Appl. Environ. Microbiol., vol. 38, no. 2, pp.237–240, 1979.

[11] D. M. Gulliver, G. V. Lowry, and K. B. Gregory, “Comparative studyof effects of co2 concentration and ph on microbial communities froma saline aquifer, a depleted oil reservoir, and a freshwater aquifer,”Environmental Engineering Science, vol. 33, no. 10, pp. 806–816,2016.

[12] A. E. LaBauve and M. J. Wargo, “Growth and laboratory mainte-nance of pseudomonas aeruginosa,” Current protocols in microbiology,vol. 25, no. 1, pp. 6E–1, 2012.

[13] A. P. Palacios, J. M. Marın, E. J. Quinto, M. P. Wiper et al., “Bayesianmodeling of bacterial growth for multiple populations,” The Annals ofApplied Statistics, vol. 8, no. 3, pp. 1516–1537, 2014.

[14] H. H. Lee, N. Ostrov, B. G. Wong, M. A. Gold, A. Khalil, and G. M.Church, “Vibrio natriegens, a new genomic powerhouse,” bioRxiv, p.058487, 2016.

[15] I. Mezic, “Spectral properties of dynamical systems, model reductionand decompositions,” Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309–325, 2005.

[16] M. Budisic, R. Mohr, and I. Mezic, “Applied koopmanism,” Chaos:An Interdisciplinary Journal of Nonlinear Science, vol. 22, no. 4, p.047510, 2012.

[17] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data–drivenapproximation of the koopman operator: Extending dynamic modedecomposition,” Journal of Nonlinear Science, vol. 25, no. 6, pp.1307–1346, 2015.

[18] C. W. Rowley, I. Mezic, S. Bagheri, P. Schlatter, and D. S. Henningson,“Spectral analysis of nonlinear flows,” Journal of fluid mechanics, vol.641, pp. 115–127, 2009.

[19] J. L. Proctor, S. L. Brunton, and J. N. Kutz, “Dynamic modedecomposition with control,” SIAM Journal on Applied DynamicalSystems, vol. 15, no. 1, pp. 142–161, 2016.

[20] M. O. Williams, M. S. Hemati, S. T. Dawson, I. G. Kevrekidis, andC. W. Rowley, “Extending data-driven koopman analysis to actuatedsystems,” IFAC-PapersOnLine, vol. 49, no. 18, pp. 704–709, 2016.

[21] T. Askham and J. N. Kutz, “Variable projection methods for anoptimized dynamic mode decomposition,” SIAM Journal on AppliedDynamical Systems, vol. 17, no. 1, pp. 380–416, 2018.

[22] Y. Kaneko, S. Muramatsu, H. Yasuda, K. Hayasaka, Y. Otake, S. Ono,and M. Yukawa, “Convolutional-sparse-coded dynamic mode decom-position and its application to river state estimation,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and SignalProcessing (ICASSP). IEEE, 2019, pp. 1872–1876.

[23] O. Azencot, W. Yin, and A. Bertozzi, “Consistent dynamic modedecomposition,” arXiv preprint arXiv:1905.09736, 2019.

[24] K. Manohar, E. Kaiser, S. L. Brunton, and J. N. Kutz, “Optimizedsampling for multiscale dynamics,” Multiscale Modeling & Simula-tion, vol. 17, no. 1, pp. 117–136, 2019.

[25] P. J. Schmid, “Dynamic mode decomposition of numerical and exper-imental data,” Journal of fluid mechanics, vol. 656, pp. 5–28, 2010.

[26] S. Sinha and E. Yeung, “On computation of koopman operator fromsparse data,” arXiv:1901.03024, 2019.

[27] E. Yeung, S. Kundu, and N. Hodas, “Learning deep neural networkrepresentations for koopman operators of nonlinear dynamical sys-tems,” in 2019 American Control Conference (ACC). IEEE, 2019,pp. 4832–4839.

[28] A. Hasnain, S. Sinha, Y. Dorfan, A. E. Borujeni, Y. Park, P. Maschhoff,U. Saxena, J. Urrutia, N. Gaffney, D. Becker et al., “A data-drivenmethod for quantifying the impact of a genetic circuit on its host,”arXiv preprint arXiv:1909.06455, 2019.

[29] C. A. Johnson and E. Yeung, “A class of logistic functions forapproximating state-inclusive koopman operators,” in 2018 AnnualAmerican Control Conference (ACC). IEEE, 2018, pp. 4803–4810.

[30] S. E. Otto and C. W. Rowley, “Linearly recurrent autoencoder net-works for learning dynamics,” SIAM Journal on Applied DynamicalSystems, vol. 18, no. 1, pp. 558–593, 2019.

[31] N. Takeishi, Y. Kawahara, and T. Yairi, “Learning koopman invariantsubspaces for dynamic mode decomposition,” in Advances in NeuralInformation Processing Systems, 2017, pp. 1130–1140.

[32] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, “Extendeddynamic mode decomposition with dictionary learning: A data-drivenadaptive spectral decomposition of the koopman operator,” Chaos:An Interdisciplinary Journal of Nonlinear Science, vol. 27, no. 10,p. 103111, 2017.

[33] P. You, J. Pang, and E. Yeung, “Deep koopman controller syn-thesis for cyber-resilient market-based frequency regulation,” IFAC-PapersOnLine, vol. 51, no. 28, pp. 720–725, 2018.

[34] E. Yeung, Z. Liu, and N. O. Hodas, “A koopman operator approachfor computing and balancing gramians for discrete time nonlinear sys-tems,” in 2018 Annual American Control Conference (ACC). IEEE,2018, pp. 337–344.

[35] E. Hoffart, S. Grenz, J. Lange, R. Nitschel, F. Muller, A. Schwentner,A. Feith, M. Lenfers-Lucker, R. Takors, and B. Blombach, “Highsubstrate uptake rates empower vibrio natriegens as production host forindustrial biotechnology,” Appl. Environ. Microbiol., vol. 83, no. 22,pp. e01 614–17, 2017.

[36] J. Monod, “The growth of bacterial cultures,” Annual review ofmicrobiology, vol. 3, no. 1, pp. 371–394, 1949.

[37] B. W. Brandt, I. M. van Leeuwen, and S. A. Kooijman, “A gen-eral model for multiple substrate biodegradation. application to co-metabolism of structurally non-analogous compounds,” Water re-search, vol. 37, no. 20, pp. 4843–4854, 2003.

[38] D. S. Kompala, D. Ramkrishna, N. B. Jansen, and G. T. Tsao,“Investigation of bacterial growth on mixed substrates: experimentalevaluation of cybernetic models,” Biotechnology and Bioengineering,vol. 28, no. 7, pp. 1044–1055, 1986.