
Learning about learning by many-body systems

Weishun Zhong,¹,∗ Jacob M. Gold,²,† Sarah Marzen,¹,³,‡ Jeremy L. England,¹,⁴,§ and Nicole Yunger Halpern⁵,⁶,⁷,¶

¹ Physics of Living Systems, Department of Physics, Massachusetts Institute of Technology, 400 Tech Square, Cambridge, MA 02139, USA
² Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
³ W. M. Keck Science Department, Pitzer, Scripps, and Claremont McKenna Colleges, Claremont, CA 91711, USA
⁴ GlaxoSmithKline AI/ML, 200 Cambridgepark Drive, Cambridge, MA 02140, USA
⁵ ITAMP, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138, USA
⁶ Department of Physics, Harvard University, Cambridge, MA 02138, USA
⁷ Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

(Dated: April 9, 2020)

∗ The first two coauthors contributed equally.

Many-body systems from soap bubbles to suspensions to polymers learn the drives that push them far from equilibrium. This learning has been detected with thermodynamic properties, such as work absorption and strain. We progress beyond these macroscopic properties that were first defined for equilibrium contexts: We quantify statistical mechanical learning with representation learning, a machine-learning model in which information squeezes through a bottleneck. We identify a structural parallel between representation learning and far-from-equilibrium statistical mechanics. Applying this parallel, we measure four facets of many-body systems' learning: classification ability, memory capacity, discrimination ability, and novelty detection. Numerical simulations of a classical spin glass illustrate our technique. This toolkit exposes self-organization that eludes detection by thermodynamic measures. Our toolkit more reliably and more precisely detects and quantifies learning by matter.

Many-body systems can learn and remember patterns of drives that propel them far from equilibrium. Such behaviors have been predicted and observed in many settings, from charge-density waves [1, 2] to non-Brownian suspensions [3–5], polymer networks [6], soap-bubble rafts [7], and macromolecules [8]. Such learning holds promise for engineering materials capable of memory and computation. This potential for applications, together with experimental accessibility and ubiquity, has earned these classical nonequilibrium many-body systems much attention recently [9]. We present a machine-learning toolkit for measuring the learning of drive patterns by many-body systems. Our toolkit detects and quantifies many-body learning more thoroughly and precisely than the thermodynamic tools used to date.

A classical, randomly interacting spin glass exemplifies learning by driven matter. Consider sequentially applying fields from a set $\{\vec{A}, \vec{B}, \vec{C}\}$, which we call a drive. The spins flip, absorbing work. In a certain parameter regime, the power absorbed shrinks adaptively: The spins migrate toward a corner of configuration space where their configuration approximately withstands the drive's insults. Consider then imposing fields absent from the original drive. Subsequent spin flips absorb more work than if the field belonged to $\{\vec{A}, \vec{B}, \vec{C}\}$.

A simple, low-dimensional property of the material—absorbed power—distinguishes drive inputs that fit a pattern from drive inputs that do not. This property reflects a structural change in the spin glass's configuration. The change is long-lived and not easily erased by a new drive. For these reasons, we say that the material has learned the drive.

Many-body learning has been quantified with properties commonplace in thermodynamics. Examples include power, as explained above, and strain in polymers that learn stress amplitudes. Such thermodynamic diagnoses have provided insights but suffer from two shortcomings. First, the thermodynamic properties vary from system to system. For example, work absorption characterizes the spin glass's learning; strain characterizes non-Brownian suspensions'. A more general approach would facilitate comparisons and standardize analyses. Second, thermodynamic properties were defined for macroscopic equilibrium states. Such properties do not necessarily describe far-from-equilibrium systems' learning optimally.

Separately from many-body systems' learning, machine learning has flourished over the past decade [10, 11]. Machine learning has enhanced our understanding of how natural and artificial systems learn. We apply machine learning to measure learning by many-body systems.

We use the machine-learning model known as representation learning [12] [Fig. 1(a)]. A representation-learning neural network receives a high-dimensional variable X, such as a sentence missing a word, e.g., "The ___ is shining." The neural network compresses relevant information into a low-dimensional latent variable Z, e.g., word types and relationships. The neural network decompresses Z into a prediction Ŷ of a high-dimensional variable Y. Y can be the word missing from the sentence; Ŷ can be "sun." The size of the bottleneck Z controls a tradeoff between the memory consumed and the prediction's accuracy. We call the neural networks that perform representation learning bottleneck neural networks.

FIG. 1: Parallel between two structures: (a) Structure of a bottleneck neural network, which performs representation learning: an input X is compressed into a latent variable Z, from which a prediction Ŷ of Y is decoded. (b) Structure of a far-from-equilibrium-statistical-mechanics problem: a microstate is synopsized by a macrostate, from which the drive may be reconstructed.

Representation learning, we argue, shares its structure with problems in which a strong drive forces a many-body system [Fig. 1(b)]. The system's microstate, like X, occupies a high-dimensional space. A macrostate synopsizes the microstate in a few numbers, such as particle number and magnetization. This synopsis parallels Z. If the system has learned the drive, the macrostate encodes the drive. One may reconstruct the drive from the macrostate, as a bottleneck neural network reconstructs Ŷ from Z.¹

¹ See [13] for a formal parallel between representation learning and equilibrium thermodynamics.

Applying this analogy, we use representation learning to measure how effectively a far-from-equilibrium many-body system learns a drive. We illustrate with numerical simulations of the spin glass, whose learning has been detected with work absorption [14]. However, our methods generalize to other platforms. Our measurement scheme offers three advantages:

1. Bottleneck neural networks register learning behaviors more thoroughly and precisely than work absorption.

2. Our framework applies to a wide class of strongly driven many-body systems. The framework does not rely on any particular thermodynamic property tailored to, e.g., spins.

3. Our approach unites a machine-learning sense of learning with the statistical mechanical sense. This union is conceptually satisfying.

We apply representation learning to measure classification, memory capacity, discrimination, and novelty detection. Our techniques can be extended to other facets of learning.

Our measurement protocols share the following structure: The many-body system is trained with a drive (e.g., fields $\vec{A}$, $\vec{B}$, and $\vec{C}$). Then, the system is tested (e.g., with a field $\vec{D}$). Training and testing are repeated in many trials. Configurations realized by the many-body system are used to train a bottleneck neural network via unsupervised learning. The neural network may then receive configurations from the testing of the many-body system. Finally, we analyze the neural network's bottleneck.

The rest of this paper is organized as follows: We introduce our bottleneck neural network, then the spin glass with which we illustrate. We then prescribe how to quantify, using representation learning, the learning of a drive by a many-body system. Finally, we detail opportunities engendered by this study. The feasibility of applying our toolkit is supported in [15, Sec. III B].

Bottleneck neural network: The introduction identified a parallel between thermodynamic problems and bottleneck neural networks (Fig. 1). In the thermodynamic problem, Y ≠ X represents the drive. We could design a bottleneck neural network that predicts drives from configurations X. But the neural network would undergo supervised learning, by today's standards. Supervised learning gives the neural network tuples (configuration of the many-body system, label of the drive that generated the configuration). The drive labels are not directly available to the many-body system. The neural network's predictions would not necessarily reflect only learning by the many-body system. Hence we design a bottleneck neural network that performs unsupervised learning, receiving only configurations.

This neural network is a variational autoencoder [16–18], a generative model: It receives samples x from a distribution over the possible values of X, learns about the distribution, and generates new samples. The neural network approximates the distribution via Bayesian variational inference [15, App. A]. Network parameters are optimized during training via backpropagation.

Our variational autoencoder has five fully connected hidden layers, with neuron numbers 200-200-(number of Z neurons)-200-200. We usually restrict the latent variable Z to 2–4 neurons. This choice facilitates the visualization of the latent space and suffices to quantify our spin glass's learning. Growing the number of degrees of freedom, and the number of drives, may require more dimensions. But our study suggests that the number of dimensions needed ≪ the system size.
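To make the architecture concrete, the following is a minimal PyTorch sketch of a bottleneck variational autoencoder with the layer widths quoted above. The input size (N = 256 spins), the ReLU activations, the binary-cross-entropy reconstruction term, and all names are our assumptions for illustration; the authors' implementation is available at [19].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckVAE(nn.Module):
    """Variational autoencoder with hidden widths 200-200-(latent)-200-200."""

    def __init__(self, n_spins=256, latent_dim=2):
        super().__init__()
        # Encoder: spin configuration x -> parameters of q(z|x).
        self.encoder = nn.Sequential(
            nn.Linear(n_spins, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
        )
        self.to_mean = nn.Linear(200, latent_dim)
        self.to_logvar = nn.Linear(200, latent_dim)
        # Decoder: latent variable z -> logits for a reconstructed configuration.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, n_spins),
        )

    def forward(self, x):
        h = self.encoder(x)
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        # Reparameterization trick: z = mean + sigma * epsilon.
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        return self.decoder(z), mean, logvar

def vae_loss(logits, x, mean, logvar):
    """Reconstruction term (spins mapped from +/-1 to {0, 1}) plus KL divergence."""
    recon = F.binary_cross_entropy_with_logits(logits, (x + 1) / 2, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return recon + kl
```

Training would then proceed by stochastic gradient descent (backpropagation) on this loss over batches of spin configurations, consistent with the optimization described above.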

Figure 2 depicts the latent space Z. Each neuron corresponds to one axis and represents a continuous-valued real number. The neural network maps each inputted configuration to one latent-space dot. Close-together dots correspond to configurations produced by the same field, if the spin glass and neural network learn well. We illustrate this clustering in Fig. 2 by coloring each dot according to the field that produced it.

FIG. 2: Visualization of the latent space Z: Z consists of neurons Z1 and Z2 (the axes show the first and second latent dimensions). A variational autoencoder formed Z while training on configurations assumed by a 256-spin glass during repeated exposure to three fields, A, B, and C. The neural network mapped each configuration to a dot in latent space. We color each dot in accordance with the field that produced the configuration. Same-color dots cluster together: The neural network identified which configurations resulted from the same field.

Spin glass: A spin glass exemplifies the statistical mechanical learner [14]. Simulations are of N = 256 classical spins. The jth spin occupies one of two possible states: s_j = ±1.

The spins couple together and experience an external magnetic field: Spin j evolves under a Hamiltonian

$H_j(t) = \sum_{k \neq j} J_{jk} s_j s_k + A_j(t)\, s_j$,    (1)

and the spin glass evolves under $H(t) = \frac{1}{2} \sum_{j=1}^{N} H_j(t)$, at time t. We call the first term in Eq. (1) the interaction energy and the second term the field energy. The couplings $J_{jk} = J_{kj}$ are defined in terms of an Erdős–Rényi random network: Spins j and k have some probability p of interacting, for all j and k ≠ j. Each spin couples to eight other spins, on average. The nonzero couplings $J_{jk}$ are selected according to a normal distribution of standard deviation 1.

$A_j(t)$ denotes the magnitude and sign of the external field experienced by spin j at time t. The field always points along the same direction, the z-axis, so we omit the arrow from $\vec{A}_j(t)$. We will simplify the notation for the field from $\{A_j(t)\}_j$ to A (or B, etc.). Each $A_j$ is selected according to a normal distribution of standard deviation 3. The field changes every 100 seconds. To train the spin glass, we construct a drive by forming a set {A, B, . . .} of random fields. We randomly select a field from the set, then apply the field for 100 s. This selection-and-application process is performed 300 times.

The spin glass exchanges heat with a bath at a temperature T = 1/β. We set Boltzmann's constant to k_B = 1. Energies are measured in kelvins (K). To flip, a spin must overcome a height-B energy barrier. Spin j tends to flip at a rate $\omega_j = e^{\beta[H_j(t) - B]}/(1\ \text{second})$. This rate has the form of Arrhenius's law and obeys detailed balance. The average spin flips once per $10^7$ s. We model the evolution with discrete 100-s time intervals, using the Gillespie algorithm.
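As a concrete illustration of these definitions, here is a minimal NumPy sketch of the couplings, local energies, and flip rates. The coupling probability p = 8/(N − 1) is inferred from the stated average degree of eight; the variable names and the omission of the Gillespie bookkeeping are our simplifications, not the authors' code [19].

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256          # number of spins
BETA = 3.0       # inverse temperature beta (1/K)
BARRIER = 4.5    # energy-barrier height B (K)

# Erdős–Rényi couplings: each pair (j, k) interacts with probability
# p = 8 / (N - 1), so each spin couples to about eight others; nonzero
# J_jk = J_kj are drawn from a normal distribution of standard deviation 1.
p = 8.0 / (N - 1)
upper = np.triu(rng.random((N, N)) < p, k=1)
J = np.where(upper, rng.normal(0.0, 1.0, (N, N)), 0.0)
J = J + J.T      # symmetric couplings, zero diagonal

def random_field():
    """One field {A_j}: components drawn with standard deviation 3, along z."""
    return rng.normal(0.0, 3.0, N)

def local_energies(s, A):
    """H_j(t) = sum_{k != j} J_jk s_j s_k + A_j(t) s_j, for all spins at once."""
    return s * (J @ s) + A * s

def total_energy(s, A):
    """H(t) = (1/2) sum_j H_j(t), as defined in the text."""
    return 0.5 * np.sum(local_energies(s, A))

def flip_rates(s, A):
    """Arrhenius-form rates omega_j = exp(beta [H_j(t) - B]) per second."""
    return np.exp(BETA * (local_energies(s, A) - BARRIER))
```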

The spins absorb work when the field changes, as from $\{A_j(t)\}$ to $\{A'_j(t)\}$. The change in the spin glass's energy equals the work absorbed by the spin glass: $W := \sum_{j=1}^{N} \left[ A'_j(t) - A_j(t) \right] s_j$. Absorbed power is defined as $W/(100\ \text{s})$. The spin glass dissipates heat by losing energy as spins flip.
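This work bookkeeping translates directly into code; a short sketch continuing the NumPy snippet above (the function and variable names are ours):

```python
def absorbed_work(s, A_old, A_new):
    """W := sum_j [A'_j(t) - A_j(t)] s_j, evaluated when the field switches."""
    return np.sum((A_new - A_old) * s)

def absorbed_power(s, A_old, A_new, interval=100.0):
    """Absorbed power: the absorbed work divided by the 100-s field duration."""
    return absorbed_work(s, A_old, A_new) / interval
```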

The spin glass is initialized in a uniformly random configuration C. Then, the spins relax in the absence of any field for 100,000 seconds. The spin glass navigates to near a local energy minimum. If a protocol is repeated in multiple trials, all the trials begin with the same C.

In a certain parameter regime, the spin glass learns its drive effectively, even according to the absorbed power [14]. Consider training the spin glass on a drive {A, B, C}. The spin glass absorbs much work initially. If the spin glass learns the drive, the absorbed power declines. If a dissimilar field D is then applied, the absorbed power spikes. The spin glass learns effectively in the Goldilocks regime β = 3 K⁻¹ and B = 4.5 K [14]: The temperature is high enough, and the barriers are low enough, that the spin glass can explore phase space. But T is low enough, and the barriers are high enough, that the spin glass is not hopelessly peripatetic. We distinguish robust learning from superficially similar behaviors in [15, App. B].

How to detect and quantify a many-body system's learning of a drive, using representation learning: Learning has many facets; we detect and quantify four: classification ability, memory capacity, the discrimination of similar fields, and novelty detection. We illustrate with classification here and detail the rest in [15, Sec. II]. Other facets of learning may be quantified similarly. Our representation-learning approach detects and measures learning more reliably and precisely than absorbed power does. The code used is accessible at [19].

A system classifies a drive when identifying the drive as one of many possibilities. A variational autoencoder, we find, reflects more of a spin glass's classification ability than absorbed power does: We generated random fields A, B, C, D, and E. From 4 of the fields, we formed the drive D1 := {A, B, C, D}. On the drive, we trained the spin glass in each of 1,000 trials. In each of 1,000 other trials, we trained a fresh spin glass on a drive D2 := {A, B, C, E}. We repeated this process for each of the 5 possible 4-field drives. Ninety percent of the trials were randomly selected for training our neural network. The rest were used for testing.

Using the variational autoencoder, we measured the spin glass's ability to classify drives: We identified the configurations occupied by the spin glass at a time t in the training trials. On these configurations, we trained the neural network. The neural network populated the latent space with dots (as in Fig. 2) whose density formed a probability distribution.

We inputted into the neural network a time-t configuration from a test trial. The neural network compressed the configuration into a latent-space point. We calculated which drive most likely, according to the probability density, generated the latent-space point. The calculation was maximum-likelihood estimation (see [20] and [15, App. C]). We performed this testing and estimation for each trial in the test data. The fraction of trials in which the estimation succeeded constitutes the score. The score is plotted against t in Fig. 3 (blue, upper curve).
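A minimal sketch of this maximum-likelihood scoring, assuming the configurations have already been encoded into latent-space points. Here we fit one Gaussian kernel density estimate per drive; the authors' exact density estimator may differ (see [15, App. C] and [19]), and the drive labels are used only for fitting the per-drive densities and for scoring, never for training the autoencoder.

```python
import numpy as np
from scipy.stats import gaussian_kde

def classification_score(z_train, drives_train, z_test, drives_test):
    """Fraction of test trials whose drive is guessed correctly by
    maximum-likelihood estimation over latent-space densities.

    z_train, z_test: arrays of shape (n_trials, latent_dim), the latent-space
    points of encoded configurations; drives_train, drives_test: integer labels
    of the drive applied in each trial.
    """
    labels = np.unique(drives_train)
    # One density estimate per drive, fit to that drive's latent points.
    densities = {d: gaussian_kde(z_train[drives_train == d].T) for d in labels}
    # Guess, for each test point, the drive whose density is largest there.
    log_likes = np.stack([densities[d].logpdf(z_test.T) for d in labels])
    guesses = labels[np.argmax(log_likes, axis=0)]
    return np.mean(guesses == drives_test)
```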

FIG. 3: Quantification of a many-body system's classification ability (scores plotted against the number of changes of the field): A spin glass classified a drive as one of five possibilities. We define the system's classification ability as the score of maximum-likelihood estimation performed with a variational autoencoder (blue, upper curve). We compare with the score of maximum-likelihood estimation performed with absorbed power (orange, lower curve). The variational-autoencoder score rises to near the maximum, 1.00. The thermodynamic score exceeds the random-guessing score, 1/5, slightly. The neural network detects more of the spins' classification ability.

We compare with the classification ability attributed to the spin glass by the absorbed power: For each drive and each time t, we histogrammed the power absorbed while that drive was applied at t in a neural-network-training trial. Then, we took a trial from the test set and identified the power absorbed at t. We inferred which drive most likely, according to the histograms, produced that power. The guess's score appears as the orange, lower curve in Fig. 3.

A score maximizes at 1.00 if the drive is always guessed accurately. The score is lower-bounded by the random-guessing value 1/(number of drives) = 1/5. In Fig. 3, each score grows over tens of field switches. The absorbed-power score begins at² 0.20 and comes to fluctuate around 0.25. The neural network's score comes to fluctuate slightly below 1.00. Hence the neural network detects more of the spin glass's classification ability than the absorbed power does, in addition to suggesting a means of quantifying the classification ability rigorously.

² The neural network's score begins a short distance from 0.20. The distance, we surmise, comes from stochasticity of three types: the spin glass's initial configuration, the maximum-likelihood estimation, and stochastic gradient descent. Stochasticity of only the first two types affects the absorbed-power score.

Discussion: We have detected and quantified a many-body system's learning of its drive, using representation learning, with greater sensitivity than absorbed power affords. We illustrated by quantifying a many-body system's ability to classify drives, with the score of maximum-likelihood estimates calculated from a variational autoencoder's latent space. Our toolkit extends to quantifying memory capacity, discrimination, and novelty detection. The scheme relies on a parallel that we identified between statistical mechanical problems and neural networks. Uniting statistical mechanical learning with machine learning, the definition is conceptually satisfying. The definition also has wide applicability, not depending on whether the system exhibits magnetization or strain or another thermodynamic response. Furthermore, our representation-learning toolkit signals many-body learning more sensitively than does the seemingly best-suited thermodynamic tool. This work engenders several opportunities. We detail four below and others in [15, Sec. III].

(i) Decoding latent space: Thermodynamicists parameterize macrostates with volume, energy, magnetization, etc. Thermodynamic macrostates parallel latent space (Fig. 1). Which variables parameterize the neural network's latent space? Latent space could suggest definitions of new thermodynamic variables, or hidden relationships amongst known thermodynamic variables.

We illustrate with part of the protocol for quantifying classification: Train the spin glass with a drive {A, B, C} in each of many trials. On the end-of-trial configurations, train the neural network. Figure 4 reveals physical significances of two latent-space directions: The absorbed power grows along the diagonal from the bottom righthand corner to the upper lefthand corner (Fig. 4a). The magnetization grows radially (Fig. 4b). The directions are nonorthogonal, suggesting a nonlinear relationship between the thermodynamic variables. Convention biases thermodynamicists toward measuring volume, magnetization, heat, work, etc. The neural network might identify new macroscopic variables better-suited to far-from-equilibrium statistical mechanics, or hidden nonlinear relationships amongst thermodynamic variables. A bottleneck neural network could uncover new theoretical physics, as discussed in, e.g., [21–23].
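One can probe such correspondences by coloring the latent-space scatter plot with any thermodynamic property computed from the same configurations. A minimal matplotlib sketch follows; the function name, point size, and the (1/N)-normalized magnetization in the usage comment are our choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def color_latent_space(z, values, label):
    """Scatter latent-space points z (shape (n, 2)), colored by a per-point
    thermodynamic property such as absorbed power or magnetization."""
    plt.scatter(z[:, 0], z[:, 1], c=values, s=5)
    plt.colorbar(label=label)
    plt.xlabel("First latent dimension")
    plt.ylabel("Second latent dimension")
    plt.show()

# Example: color by magnetization, taken here as (1/N) sum_j s_j per configuration.
# color_latent_space(z_points, configs.mean(axis=1), "Magnetization")
```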

(ii) Resolving open problems in statistical mechanical learning: Our toolkit is well-suited to answering open problems about many-body learners; we expect to report an experimental application in a followup. An example problem concerns the soap-bubble raft in [7]. Experimentalists trained a raft of soap bubbles with an amplitude-$\gamma_t$ strain. The soap bubbles' positions were tracked, and variances in positions were calculated. No such measures distinguished trained rafts from untrained rafts; only stressing the raft and reading out the strain could [7, 24]. Bottleneck neural networks may reveal what microscopic properties distinguish trained from untrained rafts.

(iii) Extensions to quantum systems: Far-from-equilibrium many-body systems have been realized with many quantum platforms, including ultracold atoms [25], trapped ions [26, 27], and nitrogen-vacancy centers [28].

FIG. 4: Correspondence of latent-space directions to thermodynamic quantities: A variational autoencoder trained on the configurations assumed by a spin glass during its training with fields A, B, and C. We have color-coded each latent-space plot, highlighting how a thermodynamic property changes along some direction. (a) Correspondence of absorbed power to a diagonal: the absorbed power grows from the bottom righthand corner to the upper lefthand corner. (b) Correspondence of magnetization to the radial direction: the magnetization grows radially.

Applications to memories have been proposed [29, 30]. Yet the focus has been on quantum memories that remember particular coherent states. The learning of strong drives by quantum many-body systems calls for exploration, as the learning of strong drives by many-body systems has proved productive in classical statistical mechanics. Our framework can guide this exploration.

(iv) Learning about representation learning: We identified a parallel between representation learning and statistical mechanics. The parallel enabled us to use representation learning to gain insight into statistical mechanics. Recent developments in information-theoretic far-from-equilibrium statistical mechanics (e.g., [31–34]) might, in turn, shed new light on representation learning.

ACKNOWLEDGMENTS

The authors thank Alexander Alemi, Isaac Chuang, Emine Kucukbenli, Nick Litombe, Seth Lloyd, Julia Steinberg, Tailin Wu, and Susanne Yelin for useful discussions. WZ is supported by ARO Grant W911NF-18-1-0101; the Gordon and Betty Moore Foundation Grant, under No. GBMF4343; and the Henry W. Kendall (1955) Fellowship Fund. JMG is funded by the AFOSR, under Grant FA9950-17-1-0136. SM was supported partially by the Moore Foundation, via the Physics of Living Systems Fellowship. This material is based upon work supported by, or in part by, the Air Force Office of Scientific Research, under award number FA9550-19-1-0411. JLE has been funded by the Air Force Office of Scientific Research grant FA9550-17-1-0136 and by the James S. McDonnell Foundation Scholar Grant 220020476. NYH is grateful for an NSF grant for the Institute for Theoretical Atomic, Molecular, and Optical Physics at Harvard University and the Smithsonian Astrophysical Observatory. NYH also thanks CQIQC at the University of Toronto, the Fields Institute, and Caltech's Institute for Quantum Information and Matter (NSF Grant PHY-1733907) for their hospitality during the development of this paper.

REFERENCES

[1] S. N. Coppersmith et al., Phys. Rev. Lett. 78, 3983 (1997).
[2] M. L. Povinelli, S. N. Coppersmith, L. P. Kadanoff, S. R. Nagel, and S. C. Venkataramani, Phys. Rev. E 59, 4970 (1999).
[3] N. C. Keim and S. R. Nagel, Phys. Rev. Lett. 107, 010603 (2011).
[4] N. C. Keim, J. D. Paulsen, and S. R. Nagel, Phys. Rev. E 88, 032306 (2013).
[5] J. D. Paulsen, N. C. Keim, and S. R. Nagel, Phys. Rev. Lett. 113, 068301 (2014).
[6] S. Majumdar, L. C. Foucard, A. J. Levine, and M. L. Gardel, Soft Matter 14, 2052 (2018).
[7] S. Mukherji, N. Kandula, A. K. Sood, and R. Ganapathy, Phys. Rev. Lett. 122, 158001 (2019).
[8] W. Zhong, D. J. Schwab, and A. Murugan, Journal of Statistical Physics 167, 806 (2017).
[9] N. C. Keim, J. D. Paulsen, Z. Zeravcic, S. Sastry, and S. R. Nagel, Rev. Mod. Phys. 91, 035002 (2019).
[10] M. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015).
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), http://www.deeplearningbook.org.
[12] Y. Bengio, A. Courville, and P. Vincent, arXiv:1206.5538 (2012).
[13] A. A. Alemi and I. Fischer, arXiv:1807.04162 (2018).
[14] J. M. Gold and J. L. England, arXiv:1911.07216 (2019).
[15] W. Zhong, J. M. Gold, S. Marzen, J. L. England, and N. Y. Halpern, Quantifying many-body learning far from equilibrium with representation learning, arXiv:2001.03623 (2020).
[16] D. P. Kingma and M. Welling, arXiv:1312.6114 (2013).
[17] D. Jimenez Rezende, S. Mohamed, and D. Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, in Proc. 31st Int. Conf. on Machine Learning, 2014.
[18] C. Doersch, arXiv:1606.05908 (2016).
[19] Online code, https://github.com/smarzen/Statistical-Physics, 2020.
[20] C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
[21] G. Carleo et al., Rev. Mod. Phys. 91, 045002 (2019).
[22] T. Wu and M. Tegmark, Phys. Rev. E 100, 033311 (2019).
[23] R. Iten, T. Metger, H. Wilming, L. del Rio, and R. Renner, Phys. Rev. Lett. 124, 010508 (2020).
[24] J. Miller, Physics Today (2019).
[25] T. Langen, R. Geiger, and J. Schmiedmayer, Annual Review of Condensed Matter Physics 6, 201 (2015), https://doi.org/10.1146/annurev-conmatphys-031214-014548.
[26] N. Friis et al., Phys. Rev. X 8, 021012 (2018).
[27] J. Smith et al., Nature Physics 12, 907 (2016).
[28] G. Kucsko et al., Phys. Rev. Lett. 121, 023601 (2018).
[29] D. A. Abanin, E. Altman, I. Bloch, and M. Serbyn, Rev. Mod. Phys. 91, 021001 (2019).
[30] C. J. Turner, A. A. Michailidis, D. A. Abanin, M. Serbyn, and Z. Papic, Nature Physics 14, 745 (2018).
[31] S. Still, D. A. Sivak, A. J. Bell, and G. E. Crooks, Phys. Rev. Lett. 109, 120604 (2012).
[32] J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, Nature Physics 11, 131 (2015).
[33] J. P. Crutchfield, arXiv:1710.06832 (2017).
[34] A. Kolchinsky and D. H. Wolpert, Journal of Statistical Mechanics: Theory and Experiment 2017, 083202 (2017).