14
RESEARCH STATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly driven toward rigorous understanding of self-organization in discrete spatial processes, which arise in different fields including neural networks, physics, computer science, and epidemiology. In the context of modern data science, I am interested in convergence of online/distributed algorithms, MCMC sampling, online learning algorithms and inference. 1. OVERVIEW 1.1. Markovian theory of network data and machine learning. In the past few decades, a large number of mod- els, methods, and algorithmic frameworks have been developed for understanding large and complex datasets. However, as the size and complexity of data are rapidly increasing, understanding their explicit or hidden struc- tures poses further challenges to existing methods. One of my main research program is to develop systematic theory of network data and machine learning based on the Markov chain theory. My two main achievements in this program address the problems of 1) Sampling graphs from large networks and computing/testing stable net- work observables, and 2) convergence of Online Matrix Factorization for Markovian stream of data. Conventional sampling algorithms in statistical network data analysis such as independent and neighborhood sampling have limitations to dense or sparse networks, respectively. In a joint work with Memoli and Sivakoff [LMS19], we propose two complementary MCMC algorithms for sampling motifs from networks and establish bounds on their mixing times and concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks in conventional sampling methods and provides efficient ways to compute and test network observables that reveal hierarchical structures. When applied for analyzing Word Adjacency Networks of a 45 novels data set, our framework reveals distinct hierarchical patterns depending on the authors (see Figure 2). Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approxi- mate representation of complex data sets in terms of a reduced number of extracted features. Convergence guar- antees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of a dependent data stream remains largely unexplored. In a joint work with Needell and Balzano [LNB19], I have show that the well-known OMF algorithm for stream of independent and identically distributed data proposed in [MBPS10], in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. This enables learning features of the sample space directly from a MCMC trajectory (see Figure 4). Also, by combining OMF and the network sampling algorithm we proposed in [LMS19], we propose a novel framework of Network Dictionary Learning, which extracts ‘network dictionary patches’ from a given network in an online manner that encodes main features of the network. Currently I am working on extending the aforementioned Markovian theory of network-matrix data into the hypernetwork-tensor setting. Implications of this project include MCMC algorithms for sampling hypergraphs from hypernetworks, computing and testing various hypernetwork observables, online tensor factorization on Markovian data, learning temporal change of topics or structures in time-varying matrix data, and so on. 1.2. Solitons, box-ball systems, and integrable probability. The box-ball systems (BBS) are integrable cellular automata whose long-time behavior is characterized by the soliton solutions, and have rich connections to other integrable systems such as the celebrated Korteweg-de Veris (KdV) equation. The original form of BBS first in- vented by Takahashi and Satsuma [TS90] as a discrete counterpart of KdV, and later the explicit limiting procedure from KdV to BBS was discovered [TTMS96]. BBS is known to arise both from the quantum and classical integrable systems by the procedures called crystallization and ultradiscretization, respectively. This double origin of the in- tegrability of BBS lies behind its deep connections to quantum groups, crystal base theory, solvable lattice models, the Bethe ansatz, soliton equations, ultradiscretization of the Korteweg-de Vries equation, tropical geometry and so forth [FYO00, HHI + 01, KOS + 06, IKT12]. BBS with random initial configuration is an emerging topic in the probability literature and has gained consid- erable attention with a number of recent works [LLP17, CKST18, KL18, FG18, KL18, CS19a, CS19b, LLPS19]. My reseach on randomized BBS leads a body of this literature, aiming to answer the following question: If the first n boxes are randomly occupied with balls, what is the limiting shape of the invariant Young diagram corresponding to the soliton decomposition, as the system size n tends to infinity? In a joint work with Levine and Pike [LLP17], 1

RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

RESEARCH STATEMENTHanbaek Lyu

I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. Myresearch is broadly driven toward rigorous understanding of self-organization in discrete spatial processes, whicharise in different fields including neural networks, physics, computer science, and epidemiology. In the context ofmodern data science, I am interested in convergence of online/distributed algorithms, MCMC sampling, onlinelearning algorithms and inference.

1. OVERVIEW

1.1. Markovian theory of network data and machine learning. In the past few decades, a large number of mod-els, methods, and algorithmic frameworks have been developed for understanding large and complex datasets.However, as the size and complexity of data are rapidly increasing, understanding their explicit or hidden struc-tures poses further challenges to existing methods. One of my main research program is to develop systematictheory of network data and machine learning based on the Markov chain theory. My two main achievements inthis program address the problems of 1) Sampling graphs from large networks and computing/testing stable net-work observables, and 2) convergence of Online Matrix Factorization for Markovian stream of data.

Conventional sampling algorithms in statistical network data analysis such as independent and neighborhoodsampling have limitations to dense or sparse networks, respectively. In a joint work with Memoli and Sivakoff[LMS19], we propose two complementary MCMC algorithms for sampling motifs from networks and establishbounds on their mixing times and concentration of their time averages. Based on our sampling algorithms, wepropose a novel framework for network data analysis that circumvents some of the drawbacks in conventionalsampling methods and provides efficient ways to compute and test network observables that reveal hierarchicalstructures. When applied for analyzing Word Adjacency Networks of a 45 novels data set, our framework revealsdistinct hierarchical patterns depending on the authors (see Figure 2).

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approxi-mate representation of complex data sets in terms of a reduced number of extracted features. Convergence guar-antees for most of the OMF algorithms in the literature assume independence between data matrices, and the caseof a dependent data stream remains largely unexplored. In a joint work with Needell and Balzano [LNB19], I haveshow that the well-known OMF algorithm for stream of independent and identically distributed data proposedin [MBPS10], in fact converges almost surely to the set of critical points of the expected loss function, even whenthe data matrices form a Markov chain satisfying a mild mixing condition. This enables learning features of thesample space directly from a MCMC trajectory (see Figure 4). Also, by combining OMF and the network samplingalgorithm we proposed in [LMS19], we propose a novel framework of Network Dictionary Learning, which extracts‘network dictionary patches’ from a given network in an online manner that encodes main features of the network.

Currently I am working on extending the aforementioned Markovian theory of network-matrix data into thehypernetwork-tensor setting. Implications of this project include MCMC algorithms for sampling hypergraphsfrom hypernetworks, computing and testing various hypernetwork observables, online tensor factorization onMarkovian data, learning temporal change of topics or structures in time-varying matrix data, and so on.

1.2. Solitons, box-ball systems, and integrable probability. The box-ball systems (BBS) are integrable cellularautomata whose long-time behavior is characterized by the soliton solutions, and have rich connections to otherintegrable systems such as the celebrated Korteweg-de Veris (KdV) equation. The original form of BBS first in-vented by Takahashi and Satsuma [TS90] as a discrete counterpart of KdV, and later the explicit limiting procedurefrom KdV to BBS was discovered [TTMS96]. BBS is known to arise both from the quantum and classical integrablesystems by the procedures called crystallization and ultradiscretization, respectively. This double origin of the in-tegrability of BBS lies behind its deep connections to quantum groups, crystal base theory, solvable lattice models,the Bethe ansatz, soliton equations, ultradiscretization of the Korteweg-de Vries equation, tropical geometry andso forth [FYO00, HHI+01, KOS+06, IKT12].

BBS with random initial configuration is an emerging topic in the probability literature and has gained consid-erable attention with a number of recent works [LLP17, CKST18, KL18, FG18, KL18, CS19a, CS19b, LLPS19]. Myreseach on randomized BBS leads a body of this literature, aiming to answer the following question: If the first nboxes are randomly occupied with balls, what is the limiting shape of the invariant Young diagram correspondingto the soliton decomposition, as the system size n tends to infinity? In a joint work with Levine and Pike [LLP17],

1

Page 2: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

2

I have addressed this question for the basic 1-color BBS with i.i.d. initial configuration. In collaborations withKuniba and Okado [KLO18, KL18], I have investigated the row lengths in the multicolor BBS and obtained Schurpolynomials representations of their scaling limit as well as their large deviations principle. Furthermore, withLewis, Pylyavskyy, and Sen [LLPS19], I have obtained scaling limits of the columns lengths in multicolor BBS usinga modified version of Greene-Kleitman invariants for BBS and circular exclusion processes.

Topics for my current and future research on randomized BBS include two-sided scaling limit of the invariantYoung diagrams, scaling limit of the higher order Young diagrams in mulicolor BBS, generalization to the discreteKdV with random initial configuration, and identifying relation between Grothendieck polynomials, combinatorialR, and the BBS.

1.3. What does a random contingency table with non-uniform margins look like? Contingency tables are fun-damental objects in statistics for studying dependence structure between two or more variables, see e.g. [Eve92,FLL17, Kat14]. They also correspond to bipartite multi-graphs with given degrees and play an important role incombinatorics and graph theory, see e.g. [Bar09, DG95, DS98].

Hypergeometric (or Fisher-Yates) and uniform distributions are two widely used null models for the set of con-tingency tables with given row and column margins, in order to test independence and structural hypotheses, re-spectively [G+63, DE85, MH+13]. Once we reject the null hypothesis of independence using Pearson’s chi-squaredor Fisher’s exact tests, we may continue studying the structure of the contingency table by asking some patternsbetween rows and columns (e.g., in co-occurrence table in ecology [CS79]) or deviation from independence (e.g.,Conditional Volume Test [DE85]). However, unlike the case of hypergeometric distribution, sampling a uniformlydistributed contingency table is surprisingly hard [DE85, Sul18], and various MCMC algorithms and their mixingtimes have been proposed and analyzed [DG95, DKM97, BLSY10, BH12, PD19].

Sampling a random uniform contingency tables is closely related to the problem of counting the number of con-tingency tables with prescribed row and column margins. When the margins are constant or have bounded ratioclose to 1, the exact or asymptotically exact asymptotics for this count are known [CM10, GM08, BLSY10], and theuniform and hypergeometric tables behave very similarly. However, when the margins are far from being constant,Barvinok [Bar10] conjectured that the behavior of the random uniform contingency table becomes drastically dif-ferent from that of hypergeometric table. In a joint work with Dittmer and Pak [DLP19b], I have rigorously shown,for the first time, that there exists a sharp phase transition in the behavior of the uniform contingency table as thesize of the table grow, when the margins assume two values and their ratio exceeds an explicit critical ratio Bc . Thisgives a probabilistic answer to the statistical question on why sampling a uniformly distributed contingency tableis hard – the existence of phase transition.

Currently I am working with Pak to generalize our phase transition for two distinct margin values to the generalcase, aiming to provide a full atlas of phase transition structure in the set of uniform contingency tables.

1.4. Interacting particle systems and discrete spatial processes. Many important phenomena that we would liketo understand − formation of public opinion, trending topics on social networks, growth of stock market, devel-opment of cancer cells, outbreak of epidemics, and collective computation in distributed systems − are closelyrelated to predicting large-scale behavior of systems of locally interacting agents. Discrete spatial processes pro-vide a simple framework for modeling such systems: A vertex coloring X t : V → Zκ on a given graph G = (V ,E)updates in discrete or continuous time according to a fixed deterministic or random transition rule. In a typicalsetting in applied probability literature, one draws the initial coloring X0 from some probability measure and askshow the probability P(X t has property P ) behaves. The answer usually depends on details such as topology of theunderlying graph and parameters in the model.

In collaboration with many researchers in the field, I have addressed the above question for a number of mod-els arising from different contexts: the firefly cellular automata (coupled oscillators), the cyclic cellular automata(BZ checmial reaction), the Greenberg-Hastings model (neural network), the cyclic particle system (multicoloracyclic voter model) [FL17], and lastly, the parking process and ballistic annihilation (annihilating particle systems)[DGJ+17]. The first model were studied on the one-dimensional integer lattice Z by constructing integer-valuedcomparison processes on Z and using combinatorial correspondences [LS17b, LS17a]. I have also used one of themain techniques to answer an open question on persistence of correlated partial sums [LS17a]. A special case

Page 3: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

3

(κ= 3) of the following two models were studied on arbitrary underlying graphs by constructing a similar compar-ison process on the universal cover of the underlying graph [GLS16]. For the cyclic particle system, we proved aconjecture of Bramson and Griffeath in 1989 that the 3- and 4-color system clusters on Z.

My recent contribution to this field concerns multitype annihilating particle systems, namely, ballistic annihi-lation, the parking process, and diffusion-limited annihilating systems. With Junge [JL18], I have obtained a com-plete phase transition picture in the asymmetric ballistic annihilation, where only some heuristic was availablein the physics literature [DRFP95]. On the other hand, we proposed and studied the parking process on a classof lattice-like graphs (e.g., Cayley graphs) and obtained sharp phase transition in the long-time particle density[DGJ+17]. Extending some of our techniques we developed for the parking process as well as using new tech-niques, we have answered some open problems on particle densities in diffusion-limited annihilating systems[DJJ+19].

1.5. Global synchronization of pulse-coupled oscillators on trees. Systems of coupled oscillators (e.g., blinkingfireflies, circadian pacemakers, BZ chemical oscillators) have been a central subject in nonlinear dynamical sys-tems literature for decades [Str00], and have found numerous applications in many areas including robotic vehiclenetworks [NL07] and electric power networks [DB12]. Pulse-coupling is a class of distributed time evolution rulefor networked phase oscillators inspired by biological oscillators, which depends only upon event-triggered localpulse communication. Recently, theory of pulse-coupled oscillators (PCOs) are finding applications as a scalableand efficient clock synchronization algorithms for networks of resource-constrained devices such as wireless sen-sors [PS11].

A fundamental question in systems of coupled oscillators concerns deriving global convergence toward syn-chrony from all (or almost all under some measure) possible initial configurations. For some well-known mod-els such as Kuramoto’s or Peskin’s, global convergence has been established only on complete graphs and cycles[MS90, KB12, WND13]. In a series solo papers [Lyu15, Lyu16, Lyu17] I introduced a classes of discrete and contin-uous PCOs and derived their global convergence on finite trees. Unlike complete graphs or cycles, global synchro-nization on trees is more versatile since it can easily be extended to general graphs by composing with a spanningtree algorithm. This led to a fast universal clock synchronization algorithm with minimal memory and communi-cation requirement, which is especially suitable for synchronizing wireless sensor networks. More discussion onthis topic is given in Section 6.

2. MARKOVIAN THEORY FOR NETWORK DATA AND MACHINE LEARNING

My general principle for incorporating the well-established theory of Markov chains into the realm of data the-ory and machine learning is the following. 1) Construct versatile sampling algorithms by Markov Chain MonteCarlo (MCMC) and obtain bounds on their mixing times; 2) Extend online algorithms for i.i.d. stream of data intothe Markovian setting, so that we can learn directly from MCMC trajectories. Use Markov chain limit theorems toguarantee their convergence; 3) Obtain bounds for inference using concentration inequalities for Markov chainsand stability inequalities. Below we discuss more details on two of the recently completed projects along this line.

2.1. Motif sampling and network data analysis. Sampling is an indispensable tool in statistical analysis of largegraphs and networks, which provides means of computing some of the essential network observables such as av-erage degree, mean shortest path length, and clustering coefficient [WS98, KC14]. In [LMS19], we proposed a novelgraphical sampling method called motif sampling that circumvents some of the main drawbacks of the indepen-dent and neighborhood samplings. The key idea is first to fix ‘template graph’ (motif) F of k nodes, and then

(𝐻 , 𝐹 ) 0.672 0.000 0.038 0.537 0.072 0.063 0.259 0.136 0.468

(𝐻 , 𝐹 ) 0.745 0.121 0.001 0.536 0.287 0.127 0.430 0.438 0.011

(𝐻 , 𝐹 ) 0.750 0.071 0.000 0.434 0.098 0.137 0.300 0.227 0.115

Frobenius 0.364 1.000 0.093 1.000 0.012 0.417 0.196 0.742 0.023

𝐻 , , 𝐻 , 𝐹 , , 𝐹 , 𝐻 , , 𝐹 ,

𝐻 , , 𝐻 , 𝐹 , , 𝐹 , 𝐻 , , 𝐹 ,

Figure 1. Independent sampling (left), neighborhood

sampling (middle), and motif sampling (right).

to sample k nodes from a given network G so that the inducedsubnetwork always contains a copy of F . This is equivalent to con-ditioning the independent sampling to contain a ‘homomorphiccopy’ of F so that we always sample some meaningful portion ofthe network, where the prescribed graph F serves as a backbone.One can then study properties of subnetworks of G induced onthis random copy of F . An immediate advantage is that we havethe freedom to change the motif F to capture different aspects of the network.

Page 4: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

4

Motif sampling can be formulated as sampling a random vertex map x from a motif F = ([k], AF ) into a networkG = ([n], A) according to the following probability distribution

πF→G (x) = 1

Z

( ∏1≤i , j≤k

A(x(i ),x( j ))AF (i , j )

)1

nk, (1)

where Z denotes the partition function. Note that rejection and importance samplings based on independent sam-pling would be too costly for motif sampling, especially when the network G is sparse. In [LMS19], we provide twocomplementary Markov Chain Monte Carlo algorithms for motif sampling inspired by Glauber chain and randomwalk on networks, and obtain bounds on their mixing times. In practice, this tells us how long we should run theMCMC algorithms to obtain desired precision.

Given the MCMC algorithms, we can take various time averages along the MCMC trajectory to compute theexpectation of various functions of the network. For instance, consider the ‘wedge motif’ F = ([3],1{(1,2),(1,3)}),where node 1 has connections to nodes 2 and 3. Given a filtration parameter θ ≥ 0 and a MCMC trajectory (xt )t≥0

of embeddings F →G , Markov chain ergodic theorem implies

PπF→G(A(x(2),x(3)) > θ) = lim

N→∞1

N

N∑t=1

1(A(xt (2),xt (3) > θ). (2)

William Shakespeare Mark Twain

Romeo and Juliet Julius Cesar Advts. of Tom Sawyer Advts. of Huckleberry Finn

Cond

. pro

b.

Filtration level Filtration level Filtration levelFiltration level

Cond

. pro

b.

Wor

d Ad

jace

ncy

Net

wor

k

William Shakespeare Jane Austen Charles Dickens Herman Melville Pride and Prejudice Oliver Twist Moby Dick; or, The Whale

whale Hamlet

Figure 2. Heat map of the Function Word Adjacency Networks of four novels

and their CHD profiles corresponding to the self-loop motif.

When θ = 0, the left hand side asks the probabil-ity that individuals x(2) and x(3) with a commonfriend x(0) in network G know each other, whenthe three are chosen randomly according to themeasure (1). Notice that this is a probabilistic re-formulation of the celebrated network clusteringcoefficient [WS98]. Hence if we increase the fil-tration parameter θ from 0, we obtain a hierar-chical profile of clustering coefficient of G . Tak-ing finite partial sum in the right hand side of (2)gives an approximate algorithm to compute thisprofile, where mixing time bounds and concen-tration inequalities for Markov chains give con-trol on the statistical error.

Figure 2 shows example of such profiles corresponding to the Function Word Adjacency Networks associatedwith four novels, two by William Shakespeare and other two by Mark Twain. The profiles seem to indicate that thereis a clear distinction between the two authors (with the interpretation that Shakespeare tend to repeat functionwords within a sentence less frequently). But how much can we trust such distinction given by the profiles? Aquantitative answer to this question about reliability of inference is given by establishing what is called a ‘stabilityinequality’ for the observable. For the ‘wedge-profile’ in (2), one of the main result in [LMS19] asserts that for anytwo networks G1 and G2,∥∥PπF→G1

(A(x(2),x(3)) > ·)−PπF→G2(A(x(2),x(3)) > ·)∥∥1 ≤ d(G1,G2), (3)

where the left hand side denotes the L1 distance between the corresponding profiles and the right hand side is asuitable metric distance between the networks. This means that small change in the network results in a smallchange in the profile, so we know quantitatively how much we can infer about the networks from the profiles.Inequalities of this type is known for the persistence diagram Topological Data Analysis [CSEH07, CCSG+09] andfor homomorphism density graphon theory [Lov12], respectively.

2.2. Online Matrics Factorization for Markovian data. In modern data analysis, a central step is to find a low-dimensional representation to better understand, compress, or convey the key phenomena captured in the data.Matrix factorization provides a powerful setting for one to describe data in terms of a linear combination of factorsor atoms. In this setting, we have a data matrix X ∈Rd×n , and we seek a factorization of X into the product W H for

Page 5: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

5

𝑑

𝑛

𝑑

𝑟 𝑛

𝑟 ×

≅ 𝑋 𝑊 𝐻

Data Dictionary Code

≅ ×

Figure 3. Illustration of matrix factorization. Each column of the data

matrix is approximated by a linear combination of the columns of the

dictionary matrix.

W ∈ Rd×r and H ∈ Rr×n (see Figure 3). This problemhas gone by many names over the decades, each withdifferent constraints: dictionary learning, factor anal-ysis, topic modeling, component analysis. It has ap-plications in text analysis, image reconstruction, med-ical imaging, bioinformatics, and many other scientificfields more generally [SGH02, BB05, BBL+07, CWS+11,TN12, BMB+15, RPZ+18].

𝑎

𝑏

𝑁 𝑁 𝑁 𝑁 𝑁 𝑡

Dictionary patches of size 20

learned from MCMC trajectory

Reconstructed spin configuration

using learned dictionary patches

An Ising spin configuration

sampled at temperature 𝑇 = 0.5

Dictionary patches of size 20

learned from MCMC trajectory

Reconstructed spin configuration

using learned dictionary patches

Dictionary patches of size 20

learned from MCMC trajectory

Reconstructed spin configuration

using learned dictionary patches

An Ising spin configuration

sampled at temperature 𝑇 = 2.26

An Ising spin configuration

sampled at temperature 𝑇 = 5

Figure 4. (Left) 100 learned dictionary patches from a MCMC Gibbs sampler

for the Ising model on 200×200 square lattice at a subcritical temperature (T =0.5). (Middle) A sampled spin configuration at T = 0.5. (Right) Reconstruction

of the original spin configuration in the middle using the dictionary patches

on the left.

Online matrix factorization is a problem set-ting where data are accessed in a streaming man-ner and the matrix factors should be updatedeach time. That is, we get draws of X from somedistribution π and seek the best factorizationsuch that the expected loss EX∼π

[‖X −W H‖2F

]is small. This is a relevant setting in today’sdata world, where large companies, scientific in-struments, and healthcare systems are collectingmassive amounts of data every day. One can-not compute with the entire dataset, and so wemust develop online algorithms to perform thecomputation of interest while accessing them se-quentially. There are several algorithms for com-puting factorizations of various kinds in an online context. Many of them have algorithmic convergence guaran-tees, however, all these guarantees require that data are sampled at each iteration i.i.d. with respect to previousiterations. In all of the application examples mentioned above, one may make an argument for (nearly) identicaldistributions, but never for independence. This assumption is critical to the analysis of previous works (see., e.g.,[MBPS10, GTLY12, ZTX16]).

𝒂(𝟏)

𝒂(𝟐)

𝑊

𝑉

𝒉(𝟐) 𝑈

𝒂(𝟐)

𝒂(𝟑)

𝑊

𝑉

𝒉(𝟑) 𝑈

𝒉(𝟎)

𝑈

𝒂(𝟎)

𝒂(𝟏)

𝑊

𝑉

𝒉(𝟏)

𝒂(𝟑)

𝒃(𝟏)

𝑊

𝑉

𝒉(𝟒) 𝑈

𝒃(𝟏)

𝒃(𝟐)

𝑊

𝑉

𝒉(𝟓) 𝑈

𝒃(𝟐)

𝒃(𝟑)

𝑊

𝑉

𝒉(𝟔) 𝑈

𝒃(𝟑)

𝒃(𝟒)

𝑊

𝑉

𝒉(𝟕) 𝑈

Question

Answer

Reconstructed Original Network dictionary patches

Figure 5. (Left) 45 learned 5 by 5 network dictionary patches from Glauber

chain sampling the ‘depth-2 wedge motif’ F = ([5],1{(1,2),(2,3),(1,4),(4,4)}) from

the Word Adjacency Matrix of "Mark Twain - Adventures of Huckleberry Finn".

(Middle) Heat map of the original Word Adjacency Matrix of the 10 by 10 torus

where blue = 0 and yellow = 1. (Right) Heat map of the reconstructed Word Ad-

jacency Matrix with the same colormap. The Frobenius norms of the original,

reconstructed, and their difference are 2.8317, 3.4296, and 1.7241, respectively.

A natural way to relax the assumption of in-dependence in this online context is through theMarkovian assumption. In many cases one mayassume that the data are not independent, butindependent conditioned on the previous iter-ation. The central contribution of our work isto extend the analysis of online matrix factor-ization in [MBPS10] to the setting where the se-quential data form a Markov chain. This is natu-rally motivated by the fact that the Markov chainMonte Carlo (MCMC) method is one of the mostversatile sampling techniques across many disci-plines, where one designs a Markov chain explor-ing the sample space that converges to the targetdistribution.

In a joint work with Needell and Balzano [LNB19], I have shown that the well-known OMF algorithm for i.i.d.stream of data proposed in [MBPS10], in fact converges almost surely to the set of critical points of the expectedloss function, even when the data matrices form a Markov chain satisfying a mild mixing condition. One of the keyidea in our analysis is to condition the future state of the Markov chain on a distant past so that the conditionalexpectation of the future state is very close to its stationary expectation. This allows us to control the differencebetween the new and the average losses by concentration of Markov chains.

We discuss two application contexts for our result on Markovian OMF. First, we apply dictionary learning andreconstruction for images using online Non-negative Matrix Factorization for a sequence of Ising spin configura-tions generated by the Gibbs sampler (see Figure 4). This illustrates that we can learn dictionary image patches

Page 6: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

6

from an MCMC trajectory of images. Second, we propose a novel framework for network data analysis that we callNetwork Dictionary Learning (see Figure 5). This allows one to extract ‘network dictionary patches’ from a givennetwork to see its fundamental features and to reconstruct the network using them. Two fundamental buildingblocks are online NMF on Markovian data, which is the main subject in this paper, and the MCMC motif samplingalgorithms that I have developed in [LMS19] (see Subsection 2.1).

3. SOLITONS, BOX-BALL SYSTEMS, AND INTEGRABLE PROBABILITY

The κ-color BBS [Tak93] is an integrable cellular automaton on the half-integer lattice N, which we think ofas an array of capacity-one boxes that can fit at most one ball of any of the κ colors. Balls gradually move tothe right, and the particular time evolution of BBS makes the system to be eventually decomposed into ‘solitons’with various lengths. By arranging the solitons as columns, we can associate a time-invariant Young diagram to agiven BBS configuration. An example of a 5-color BBS trajectory (X t )t≥0 and its associated Young diagramΛ(X0) isshown:

t = 0 : 00312051300411252003211000000000000000000000000t = 1 : 00001320153000141522000321100000000000000000000t = 2 : 00000103021530010410522000032110000000000000000t = 3 : 00000010300215301004100522000003211000000000000t = 4 : 00000001030002150310041000522000000321100000000t = 5 : 00000000103000025103100410000522000000032110000t = 6 : 00000000010300002051031004100000522000000003211

7−→Λ(X0) =

BBS can be obtained through different limiting procedures both from the quantum and classical inegrable systems,which makes the theory of BBS very rich [IKT12].

Recently BBS with random initial configuration is getting a considerable attention in the probability community[LLP17, CKST18, KL18, FG18, KL18, CS19a, CS19b]. There are roughly two central questions that the reseaerchersare aming to answer: 1) If the random initial configuration is one-sided, what is the limiting shape of the invariantrandom Young diagram as the system size tends to infinity? 2) If one considers the two-sided BBS (where theinitial configuration is a bi-directional array of balls), what are the two-sided random initial configurations thatare invariant under the BBS dynamics? In collaboration with a number of researchers in the field, I have workedextensively on the second problem in the past years, combining a number of techniques from different areas ofmathematics and discovering interesting new phenomena.

𝑖 ≥ 1, 𝑗 ≥ 2 fixed 𝜌 (𝑛) 𝜆 (𝑛) 𝜆 (𝑛)

Subcritical phase (𝑝∗ < 𝑝 ) Θ(𝑛) Θ(log 𝑛 ) Θ(log 𝑛 )

Critical phase (𝑝∗ = 𝑝 ) Θ(𝑛) Θ √𝑛 Θ √𝑛

Supercritical phase (𝑝∗ > 𝑝 )

Simple (𝑝∗ = 𝑝ℓ for unique ℓ) Θ(𝑛) Θ(𝑛)

Θ(log 𝑛)

Non-simple (𝑝∗ = 𝑝ℓ for multiple ℓ) O √𝑛 ∩ Ω √𝑛/log 𝑛

𝑝

𝑝 𝑝

𝑝

𝑝

𝑝

𝑝 𝑝

𝑝

(0, 0)

(1, 0) (0, 1)

𝑚 (Γ ) 𝑚 (Γ )

ℎ = 8

Table 1. Asymptotic scaling of column λ j and row ρi lengths for the independence model with ball density p =(p0, p1, · · · , pκ) and p∗ = max(p1, · · · , pκ). The asymptotic soliton lengths undergo a similar ‘double-jump’ phase transi-

tion depending on p∗−p0, as in the κ= 1 case established in [LLP17]. The existence of non-simple supercritical phase is

unique to the multicolor (κ≥ 2) case, where subsequent soliton lengths scales asp

n instead of logn. Sharp asymptotics

for the row lenghts has been obtained in [KL18]. The constant factors depend p, i , and j .

With Levein and Pike [LLP17], I have studied various soliton statistics of the basic 1-color BBS when the systemis initialized according to a Bernoulli product measure with ball density p on the first n boxes. One of their mainresults is that the length of the longest soliton is of order logn for p < 1/2, order

pn for p = 1/2, and order n for

p > 1/2. Additionally, there is a condensation toward the longest soliton in the supercritical p > 1/2 regime in thesense that, for each fixed j ≥ 1, the top j soliton lengths have the same order as the longest for p ≤ 1/2, whereasall but the longest have order logn for p > 1/2. Their analysis is based on geometric mappings from the associatedsimple random walks to the invariant Young diagrams, which enable robust analysis of the scaling limit of the

Page 7: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

7

invariant Young diagram. However, this connection is not apparent in the general κ ≥ 1 case. In fact, one of themain difficulties in analyzing the soliton lengths in the multicolor BBS is that within a single regime, there is amixture of behaviors that we see from different regimes in the single-color case.

The row lengths in the multicolor BBS are well-understood due to my recent works with Kuniba and Okado[KLO18, KL18]. The central observation is that, when the initial configuration is given by a product measure, thenthe sum of row lengths can be computed via some additive functional (called ‘energy’) of carrier processes of vari-ous shapes, which are finite-state Markov chains whose time evolution is given by combinatorial R. In [KLO18], the‘stationary shape’ of the Young diagram for the most general type of BBS is identified by the logarithmic derivativeof a deformed character of the KR modules (or Schur polynomials in the basic case). In [KL18], I showed that therow lengths satisfy a large deviations principle and hence the Young diagram converges to the stationary shape atan exponential rate, in the sense of row scaling.

Recently, with Lewis, Pylyavskyy, and Sen [LLPS19], I have obtained scaling limits of the columns lengths inmulticolor BBS using a modified version of Greene-Kleitman invariants for BBS and circular exclusion processes.Especially, the latter can be regarded as a circular version of the well-known Totally Asymmetric Simple ExclusionProcess (TASEP) on a line (see, e.g., [F+18, BFPS07, BFS08]). For its rough description, consider the followingprocess on the unit circle S1. Starting from some finite number of points, at each time, a new point is added toS1 independently from a fixed distribution, which then deletes the nearest counterclockwise point already on thecircle. Equivalently, one can think of each point in the circle trying to jump to the clockwise direction. It turns outthat this process is crucial in analyzing the permutation model, whereas for the independence model, the relevantcircular exclusion process is defined on the integer ring Zκ+1 where points can stack up at the same location.Interestingly, a cylindric version of Schur functions has been used to study rigged configurations and BBS [LPS14].

4. PHASE TRANSITION IN RANDOM CONTINGENCY TABLES WITH NON-UNIFORM MARGINS

Denote by M (r,c) the set of all (n ×m) contingency tables with row sums ri and column sums c j given bytwo nonnegative integer vectors r = (r1, · · · ,rm) ∈ Nm ,c = (c1, · · · ,cn) ∈ Nn . Let X = (Xi j ) be the contingency ta-ble chosen uniformly at random from M (r,c). When the margins are uniform, i.e. r1 = . . . = rm and c1 = . . . = cn ,the exact asymptotics for

∣∣M (r,c)∣∣ are known [CM10, GM08]. In fact, the distribution of individual entries Xi j

is asymptotically geometric and the dependence between the entries vanish as the size of the table goes to in-finity [CDS10]. This agrees the asymptotic behavior of random contingency tables following the hypergeometricdistribution. However, when the margins are far from being constant, Barvinok [Bar10] conjectured that the be-havior of the random uniform contingency table becomes drastically different from that of hypergeometric table.

A natural model to test Barvinok’s conjecture on uniform contingency tables with non-uniform margins is toinvestigate properties of uniform contingency tables with only two distinct values for the margin, in relation totheir ratio. In particular, fix a parameter B ≥ 1, and take the margin vectors r and c so that their first entry is largewith bBnc and the last n entires are small with n (see Figure 6 left with δ = 0 and C = 1). In [DLP19a], we testedBavinok’s conjecture empirically using a new MCMC algorithm introduced in [PD19], and we found that the cornerentry X11 undergoes phase transition as B passes a critical value Bc = 1+p

2; it is uniformly bounded for B < Bc ,but grows linearly in n for B > Bc .

In a joint work with Dittmer and Pak [DLP19b], I was able to prove Barvinok’s conjecture in a more refined form(but for the ‘think bezel’ case δ ∈ (1/2,1)). The proof uses Barvinok’s typical table theory for uniform contingencytables [Bar09, Bar10] as well as ‘transference principle’ and concentration inequalities in probability literature.Below is a rough summary of the main results in [DLP19b].

Theorem 4.1 (Dittmer, L., and Pak ’19+). Fix constants B ,C > 0 and 1/2 < δ< 1. Let X = (Xi j ) be the uniform ran-dom contingency table with the first bnδc row and column margins bBC nc, and the last n row and column marginsbC nc. Let Bc = 1+p

1+1/C . Then

E[X11] =Θ(1) for B < Bc , and E[X11] =Θ(n1−δ)for B > Bc , (4)

Furthermore, X marginally converges to the matrices given in Figure 6 in total variation distance:

Page 8: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

8

𝐵 1 2 0 1 + √2

c ≡ √2 c c

1

√2

𝐶𝑛

𝐶𝑛

𝐶𝑛

𝐵𝐶𝑛

𝐵𝐶𝑛

𝐶𝑛 𝐶𝑛 𝐶𝑛 𝐵𝐶𝑛 𝐵𝐶𝑛 ⋯

𝑛 𝑛

𝑛

𝑛

Geom(𝐶)

Geom(𝐵 𝐶)

Geo

m( 𝐵

𝐶)

Geom 𝐶(𝐵 − 𝐵 )𝑛 𝑛 𝑛

𝑛

𝑛

𝑛 → ∞

𝐵 (1 + 𝐶)

(𝐵 − 𝐵)(𝐵 + 𝐵 − 2)

Geom(𝐶)

Geom(𝐵𝐶)

Geo

m( 𝐵

𝐶)

Geom

𝐵 < 𝐵 𝐵 > 𝐵

𝐵𝐶𝑛

𝐵𝐶𝑛

𝐶𝑛

𝐶𝑛 𝐶𝑛

𝐶𝑛 𝐶𝑛 𝐶𝑛 𝐵𝐶𝑛

Figure 6. (Left) Contingency table with parameters n,δ,B and C . First bnδc rows and columns have margins bC nc, the

last n rows and columns have margins bBC nc. (Right) Limiting distributions of the entries in the uniform contingency

table X in the subcritical B < Bc (left) and supercritical B > Bc (right) regimes for thick bezels 1/2 < δ < 1. Geom(λ)

denotes geometric distribution with mean λ.

5. MULTITYPE ANNIHILATING PARTICLE SYSTEMS

5.1. Two-type annihilating systems. A multitype particle system is a stochastic process defined on a d-dimensionalinteger lattice Zd with following set of information: particle types with corresponding random walk kernel andjump rate, and their collision rule. Each site is assigned with one type initially independently, and all particles per-form independent random walks and interact upon collision. The system is said to be annihilating if only particleswith different types interact upon collision and they do it by annihilating with each other. Physicists have long beeninterested in this type of process as a model for irreversible reactions with mobile particles since the seminal works[OZ78, TW83]. In probability literature, one-type annihilating system is well-understood in all dimensions due toArratia [Arr83], under the name of annihilating random walks.

Figure 7. Simulation of parking process with critical density p = 0.5.

Blue=cars and Red=spots. Time goes upwards.

Now consider two-type annihilating systems withparticle types A,B and λA ,λB for their correspond-ing jump rates. Let p A denote the density of initialA-particles. Bramson and Lebowitz gave a completeanalysis of this system for equal jump rates λA = λB intheir celebrated paper [BL88]. Most notably, for criticaldensity p A = 1/2, they showed that the particle densityρt at the origin has the following asymptotic

ρt ³{

t−d/4, d ≤ 3

t−1, d ≥ 4.(5)

This is in agreement with mean field approximation in the physics literature for d ≥ 4, but deviates from it for lowdimension d ≤ 3. The above asymptotics were conjectured to hold when λA 6=λB [LC95, Koz96, KR84]. The case ofasymmetric jump rate seems to be challenging, with the best known result due to Cabezas, Rolla, and Sidoravicius[CRS18]: ρt & 1/t in all dimensions for p A = 1/2, and that the system undergoes a phase transition from infinitevisits from A-particles to the root (recurrence) when p A ≥ 1/2 to only finitely many visits (transience) for p A < 1/2.

In joint work with Damron, Gravner, Junge, and Sivakoff [DGJ+17], I studied the two-type annihilating sys-tem with extremely asymmetric jump rate with λB = 0 < λA (a.k.a. parking process). It is helpful to visualize theA-particles as ‘cars’ and the B-particles as ‘parking spots’. Cars simultaneously perform independent simple sym-metric random walks, and when one or more car encounters an available parking spot, a uniformly chosen carparks there and the spot becomes unavailable (see Figure 7). Our main results show that for a large class of tran-sitive and unimodular graphs in place of Zd , there exists a sharp phase transition at p A = 1/2 some quantitativeestimates that was lacking in the literature:

Theorem 5.1. Let Vt be the number of visits of cars to the origin, and let V := limt→∞Vt .

(i) If p A ≥ 1/2, then V is infinite almost surely. Moreover, EpVt = (2p A −1)t +o(t ).

(ii) If p A < 1/2, then V is finite almost surely. Moreover, EpV <∞ if p A < (256d 6e2)−1.

Page 9: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

9

In [DGJ+17], we conjectured that particle densities in the critical (p A = 1/2) parking process satisfies the Bramson-Lebowitz asymptotics (5) especially for low dimensions d ≤ 3. This was recently addressed for d = 1 by Przykucki,Roberts, and Scott [PRS19]. Simultaneously, in a work in preparation with Johnson, Damron, Junge, and Sivakoff,we have obtained a matching lower bound for all d ≤ 3 and for general jump rates 0 ≤λB ≤λA as well as matchingupper bound and upper and lower bounds on the critical exponent when λB = 0:

Theorem 5.2. The following hold.

(i) Let p A = 1/2 and 0 ≤λB ≤ 1. On Zd with d ≤ 3, we have ρt & (t log t )−d/4.(ii) Let p A = 1/2 and λB = 0. On Z it holds that EVt . t 3/4.(iii) Let λB = 0. On Z, there exists C > 0 such that for all 1/4 < p A < 1/2 we have

EpV∞ ≤C(1−2p

)−3 .

(iv) Let λB = 0. On Zd for d ≤ 3, there exists C > 0 such that for all 1/4 < p A < 1/2 it holds that

EpV∞ ≥C(1−2p

)−(4/d)+1 /log(1−2p

)−1

.

Our proof for the original parking process in [DGJ+17] relies on a recursive distributional equation and mass-transport principle coming from the unimodularity. In our following work [DJJ+19], we were able to obtain sharperresults by using a path-swapping coupling that reconstructs the parallel process as a sequential abelian processthat is easier to analyze. In a work in progress with Ahn, Richey, Wang, Junge, and Sivakoff [ARW+19], I am investi-gating particle densities in two-type annihilating systems when same-type particles coalesce upon collision.

5.2. Ballistic annihilation. Ballistic annihilation (BA) is a particular instance of the multitype annihilating sys-tem we described in the previous subsection where the random walk kernels are degenerate so that each particletype maintain constant velocity until collision. This relatively simple to define system has formidable long termdependence that is both interesting and challenging to understand rigorously.

1/2

1/4

1/3 1/2 2/3

𝐹𝑙𝑢𝑐𝑡𝑢𝑎𝑡𝑖𝑜𝑛

𝐹𝑖𝑥𝑎𝑡𝑖𝑜𝑛

𝜆

1/3

0 1

1

𝑝 𝑝 (𝜆, 1) = max4𝜆 − 1

4𝜆 + 2,3 − 4𝜆

6 − 4𝜆

𝑓∗(𝜆 ) = max1 − 2𝜆

2 − 2𝜆,1

4,2𝜆 − 1

2𝜆

𝑓∗(𝜆 ) = max1 − 𝜆

2 − 𝜆,

𝜆

1 − 𝜆

Figure 8. BA with spacing distribution µ = Exp(1) and 2000 particles

for parameters (λ, v) = (1/4,2) and p = 4/9 (top), p = 3/7 (middle), and

p = 1/3 (bottom).

With only two velocities (types), a comparison tosimple random walk ensures that the phase transi-tion occurs when particle types are in balance. The 3-velocity case (velocities −v,0,1) becomes remarkablymore complicated (See Figure 9 for some simulations).Ininitially each particle is assigned with one of thethree velocities −v,0,1 independently with probabili-ties (1−p)(1−λ), p, (1−p)λ, respectively, for parame-ters p and λ. The central question is to characterizethe critical value pc (λ, v) of density p of inert particlesso that the origin is eventually visited (fluctuation) ornot (fixation), for each fixed λ and v . Due to lack ofmonotonicity, it is not even clear that there exists a single such critical value.

1/2

1/4

1/3 1/2 2/3

𝐹𝑙𝑢𝑐𝑡𝑢𝑎𝑡𝑖𝑜𝑛

𝐹𝑖𝑥𝑎𝑡𝑖𝑜𝑛

𝜆

1/3

0 1

1

𝑝 𝑝 (𝜆, 1) = max4𝜆 − 1

4𝜆 + 2,3 − 4𝜆

6 − 4𝜆

𝑓∗(𝜆 ) = max1 − 2𝜆

2 − 2𝜆,1

4,2𝜆 − 1

2𝜆

𝑓∗(𝜆 ) = max1 − 𝜆

2 − 𝜆,

𝜆

1 − 𝜆

Figure 9. Phase diagram for BA with two

parameters (p,λ).

In the early 1990s, physicists were interested in proving for the totally sym-metric case (v = 1 andλ= 1/2), confirming a simple heuristic that pc (1/2,1) =1/4) [BNRL93]. This was a well-known open problem in the mathematicscommunity for decades, and only recently it has been addressed by Hasle-grave, Sidoravicius, and Tournier [HST18]. However, their argument, whichrelies on the fact that inverting a configuration on a finite interval preservesboth the measure and collision event in the totally symmetric case, does notgeneralize to the general asymmetric case.

In a joint work with Junge [JL18], I have obtained a nearly complete phasestructure of BA in the most general setting, giving characterization of the fluc-tuation and fixation regimes that depending on certain conditional probabil-ities for collisions between three particles. We achieve this by invoking a masstransport principle in several places to relate key quantities in order to derive a master equation that describes the

Page 10: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

10

system. The existence of sharp phase transition depends on whether these conditional probabilities are mono-tonic in p, which we are working to address in order to obtain the ‘critical curve’ in the phase plane.

6. SYNCHRONIZATION OF PULSE-COUPLED OSCILLATORS

The fundamental difficulty in understanding collective behaviors of coupled oscillators is the lack of mono-tonicity due to cyclic hierarchy of phase space S1. Three commonly used techniques bypass this difficulty buthave their weaknesses. First, a total ordering on oscillator phases emerges if the initial phases are concentratedwithin a half of S1 (e.g., from the most lagging to the most advanced). A broad class of couplings respects this or-dering and contracts the phase configuration toward synchrony. Second, in order to achieve such concentration inthe first place, heavy randomization could be used to turn the system into a Markov chain; then the strong Markovproperty ensures that concentration is achieved after exponential mount of time. On the other hand, by using anunbounded memory per node, one can ‘lift’ the cyclic phase space S1 into totally ordered R, and let the nodes tunetoward local maxima. Then global maximum propagates and subsumes all the other nodes in O(d) time. Thisclassic idea in theoretical computer science dates back to Lamport [Lam78], which in fact also inspired our liftingtechnique for the 3-color CCA/GHM on arbitrary graphs. However, use of heavy randomization and unboundedmemory are too expensive in terms of running time and memory efficiency. Hence we are left with the followingquestion: how can we drive the system close to synchrony from an arbitrary initial configuration in polynomial timewithout using unbounded memory?

In a series of three solo papers, I addressed the above question for certain class of PCOs on finite trees. In thefirst two papers [Lyu15, Lyu16], I introduced the κ-color FCA as a discrete model for inhibitory PCOs and provedthe following theorem on finite trees.

Theorem 5.1. Let (X t )t≥0 be the random κ-color FCA trajectory on a finite tree T with maximum degree ∆.

(i) If ∆≥ κ, then X t does not synchronize with positive probability.(ii) For κ≤ 6, X t synchronizes with probability 1 iff ∆< κ.(iii) For κ≥ 7, X t does not synchronize with positive probability on some T with ∆≤ κ/2+1.

In fact, the maximum degree condition (i) in the above theorem is a fundamental issue in inhibitory PCOs on finitetrees. Namely, nodes with large degree may receive input pulses so often that its phase is constantly inhibited andit may never send pulses to its neighbors. This divides the tree into non-communicating components so globalsynchrony may not emerge.

0

32

16

48

0

1

Figure 10. Illustration of Theorem 5.2. The underlying graph is a spanning tree

of the 150×150 square grid plus diagonal edges, which have nodes of degree 8.

Oscillators on the nodes of the spanning tree interact according to the ‘adap-

tive’ version of the inhibitory pulse-coupling in [Lyu17]. Phase of each node is

indicated by a warm color between black and white. Initial phase configura-

tion is not confined in an open half-circle of S1, but the system reaches global

synchronization by time t = 563.

In the third paper [Lyu17], I extended the 4-color FCA to what I call the adaptive 4-coupling,which is a continuous-time and continuous-state pulse-coupling for period 1 phase oscilla-tors. The key innovation to surpass the degreeconstraint is that each oscillator has an auxil-iary state variable, which may throttle the inputpulse. This effectively breaks symmetries in localconfigurations on finite trees and reduces the rel-evant diameter to consider in constant time. Themain result is the following:

Theorem 5.2. Let (Σ•(t ))t≥0 be an adaptive 4-coupling trajectory on a finite tree T with diameter d. For arbitrary Σ•(0), the trajectory synchronizes in time 87d.

Moreover, in order to overcome the restriction on tree topology, we precompose the adaptive 4-coupling withdistance≤ 2 coloring and spanning tree algorithms (both of which use randomization). Denote the resulting multi-layer distributed algorithm by A . We have the following corollary.

Corollary 5.3. Consider a distributed communication network of anonymous processors on a connected simplegraph G = (V ,E) of maximum degree ∆ and diameter d. Suppose each processor has a local clock of identical fre-quency.

Page 11: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

11

(i) The adaptive 4-coupling can be implemented with O(log∆) memory per node and uses O(∆) bits of binary pulsecommunication per unit time.

(ii) Let τG be the worst case time until synchronizing a given configuration. Then

E(τG ) =O(|V |+ (d 5 +∆2) log |V |). (6)

To my knowledge, the derivation of global convergence of PCOs on trees, especially with an explicit boundwhich is optimal up to a constant factor, was for the first time. Moreover, the clock synchronization algorithm A

is scalable and can be applied for dynamic and growing networks, as long as the maximum degree is bounded.Hence it is especially suitable in modern wireless sensor networks.

The largest bottleneck for the upper bound (6) comes from the unknown diameter of random spanning treeconstructed by the algorithm of Itkis and Levin [IL94]. This is essentially due to the unknown output of a leader-election subroutine. We can omit the O(|V |) term in the bound (6) if we could answer the following question:

Question 5.4. Is there a (randomized) distributed spanning tree algorithm T , which computes a spanning tree Tof a given connected graph G = (V ,E) with maximum degree ∆ and diameter d , with the following properties?

(i) T can be implemented on G with O(log∆) memory per node.(ii) diam(T ) = diam(G)O(1) log |V |.(iii) E(worst case running time) = diam(G)O(1) log |V |.

Another open question that I am working on, which is important both for theory and application aspects, is toextend the above results for oscillators with non-identical frequencies.

Question 5.5. Find a coupling under which arbitrary (or almost all) initial phase configuration on every undirectedfinite tree synchronizes, where the oscillators may have different natural frequency or communication between os-cillators may have propagation delay. Furthermore, show that the worst-case time to convergence is a polynomialin the diameter of the underlying tree.

REFERENCES

[Arr83] Richard Arratia, Site recurrence for annihilating random walks on Zd , Ann. Probab. 11 (1983), no. 3, 706–713.

[ARW+19] Sungwon Ahn, Jacob Richey, Lily Wang, Matthew Junge, Hanbaek Lyu, and David Sivakoff, Phase transition in parking process with

coalescing cars, In preparation (2019).

[Bar09] Alexander Barvinok, Asymptotic estimates for the number of contingency tables, integer flows, and volumes of transportation poly-

topes, International Mathematics Research Notices 2009 (2009), no. 2, 348–385.

[Bar10] , What does a random contingency table look like?, Combinatorics, Probability and Computing 19 (2010), no. 4, 517–539.

[BB05] Michael W Berry and Murray Browne, Email surveillance using non-negative matrix factorization, Computational & Mathematical

Organization Theory 11 (2005), no. 3, 249–264.

[BBL+07] Michael W Berry, Murray Browne, Amy N Langville, V Paul Pauca, and Robert J Plemmons, Algorithms and applications for approx-

imate nonnegative matrix factorization, Computational statistics & data analysis 52 (2007), no. 1, 155–173.

[BFPS07] Alexei Borodin, Patrik L Ferrari, Michael Prähofer, and Tomohiro Sasamoto, Fluctuation properties of the TASEP with periodic

initial configuration, Journal of Statistical Physics 129 (2007), no. 5-6, 1055–1080.

[BFS08] Alexei Borodin, Patrik L Ferrari, and Tomohiro Sasamoto, Transition between Airy1 and Airy2 processes and TASEP fluctuations,

Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 61 (2008),

no. 11, 1603–1629.

[BH12] Alexander Barvinok and J Hartigan, An asymptotic formula for the number of non-negative integer matrices with prescribed row

and column sums, Transactions of the American Mathematical Society 364 (2012), no. 8, 4323–4368.

[BL88] Maury Bramson and Joel L. Lebowitz, Asymptotic behavior of densities in diffusion-dominated annihilation reactions, Phys. Rev.

Lett. 61 (1988), 2397–2400.

[BLSY10] Alexander Barvinok, Zur Luria, Alex Samorodnitsky, and Alexander Yong, An approximation algorithm for counting contingency

tables, Random Structures & Algorithms 37 (2010), no. 1, 25–66.

[BMB+15] Rostyslav Boutchko, Debasis Mitra, Suzanne L Baker, William J Jagust, and Grant T Gullberg, Clustering-initiated factor analysis

application for tissue classification in dynamic brain positron emission tomography, Journal of Cerebral Blood Flow & Metabolism

35 (2015), no. 7, 1104–1111.

[BNRL93] E Ben-Naim, S Redner, and F Leyvraz, Decay kinetics of ballistic annihilation, Physical review letters 70 (1993), no. 12, 1890.

[CCSG+09] Frédéric Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo Mémoli, and Steve Y Oudot, Gromov-Hausdorff stable signa-

tures for shapes using persistence, Computer Graphics Forum, vol. 28, Wiley Online Library, 2009, pp. 1393–1403.

[CDS10] Sourav Chatterjee, Persi Diaconis, and Allan Sly, Properties of uniform doubly stochastic matrices, arXiv preprint arXiv:1010.6136

(2010).

Page 12: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

12

[CKST18] David A Croydon, Tsuyoshi Kato, Makiko Sasada, and Satoshi Tsujimoto, Dynamics of the box-ball system with random initial

conditions via Pitman’s transformation, arXiv preprint arXiv:1806.02147 (2018).

[CM10] E Rodney Canfield and Brendan D McKay, Asymptotic enumeration of integer matrices with large equal row and column sums,

Combinatorica 30 (2010), no. 6, 655.

[CRS18] M. Cabezas, L. T. Rolla, and V. Sidoravicius, Recurrence and density decay for diffusion-limited annihilating systems, Probab. Theory

Related Fields 170 (2018), no. 3-4, 587–615. MR 3773795

[CS79] Edward F Connor and Daniel Simberloff, The assembly of species communities: chance or competition?, Ecology 60 (1979), no. 6,

1132–1140.

[CS19a] David A Croydon and Makiko Sasada, Duality between box-ball systems of finite box and/or carrier capacity, arXiv preprint

arXiv:1905.00189 (2019).

[CS19b] , Invariant measures for the box-ball system based on stationary Markov chains and periodic Gibbs measures, arXiv preprint

arXiv:1905.00186 (2019).

[CSEH07] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer, Stability of persistence diagrams, Discrete & Computational Geome-

try 37 (2007), no. 1, 103–120.

[CWS+11] Yang Chen, Xiao Wang, Cong Shi, Eng Keong Lua, Xiaoming Fu, Beixing Deng, and Xing Li, Phoenix: A weight-based network

coordinate system using matrix factorization, IEEE Transactions on Network and Service Management 8 (2011), no. 4, 334–347.

[DB12] Florian Dorfler and Francesco Bullo, Synchronization and transient stability in power networks and nonuniform kuramoto oscilla-

tors, SIAM Journal on Control and Optimization 50 (2012), no. 3, 1616–1642.

[DE85] Persi Diaconis and Bradley Efron, Testing for independence in a two-way table: new interpretations of the chi-square statistic, The

Annals of Statistics (1985), 845–874.

[DG95] Persi Diaconis and Anil Gangolli, Rectangular arrays with fixed margins, Discrete probability and algorithms, Springer, 1995,

pp. 15–41.

[DGJ+17] Micheal Damron, Janko Gravner, Matthew Junge, Hanbaek Lyu, and David Sivakoff, Parking on transitive unimodular graphs,

arXiv.org/1710.10529 (2017).

[DJJ+19] Michael Damron, Tobias Johnson, Matthew Junge, Hanbaek Lyu, and David Sivakoff, Particle density in diffusion-limited annihi-

lating systems, In preparation (2019).

[DKM97] Martin Dyer, Ravi Kannan, and John Mount, Sampling contingency tables, Random Structures & Algorithms 10 (1997), no. 4, 487–

506.

[DLP19a] Sam Dittmer, Hanbaek Lyu, and Igor Pak, Contingency tables have empirical bias, In preparation (2019).

[DLP19b] , Phase transition in random contingency tables with non-uniform margins, To appear in Transactions of the AMS.

arXiv:1903.08743 (2019).

[DRFP95] Michel Droz, Pierre-Antoine Rey, Laurent Frachebourg, and Jarosław Piasecki, Ballistic-annihilation kinetics for a multivelocity

one-dimensional ideal gas, Physical Review E 51 (1995), no. 6, 5541.

[DS98] Persi Diaconis and Bernd Sturmfels, Algebraic algorithms for sampling from conditional distributions, The Annals of statistics 26(1998), no. 1, 363–397.

[Eve92] Brian S Everitt, The analysis of contingency tables, Chapman and Hall/CRC, 1992.

[F+18] Pablo A Ferrari et al., TASEP hydrodynamics using microscopic characteristics, Probability Surveys 15 (2018), 1–27.

[FG18] Pablo A Ferrari and Davide Gabrielli, BBS invariant measures with independent soliton components, arXiv preprint

arXiv:1812.02437 (2018).

[FL17] Eric Foxall and Hanbaek Lyu, Clustering of three and four color cyclic particle systems in one dimension, arXiv.org/1711.04741

(2017).

[FLL17] Morten Fagerland, Stian Lydersen, and Petter Laake, Statistical analysis of contingency tables, Chapman and Hall/CRC, 2017.

[FYO00] Kaori Fukuda, Yasuhiko Yamada, and Masato Okado, Energy functions in box ball systems, International Journal of Modern Physics

A 15 (2000), no. 09, 1379–1392.

[G+63] Irving J Good et al., Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, The Annals

of Mathematical Statistics 34 (1963), no. 3, 911–934.

[GLS16] Janko Gravner, Hanbaek Lyu, and David Sivakoff, Limiting behavior of 3-color excitable media on arbitrary graphs, Annals of Ap-

plied Probability (to appear) (2016).

[GM08] Catherine Greenhill and Brendan D McKay, Asymptotic enumeration of sparse nonnegative integer matrices with specified row and

column sums, Advances in Applied Mathematics 41 (2008), no. 4, 459–481.

[GTLY12] Naiyang Guan, Dacheng Tao, Zhigang Luo, and Bo Yuan, Online nonnegative matrix factorization with robust stochastic approxi-

mation, IEEE Transactions on Neural Networks and Learning Systems 23 (2012), no. 7, 1087–1099.

[HHI+01] Goro Hatayama, Kazuhiro Hikami, Rei Inoue, Atsuo Kuniba, Taichiro Takagi, and Tetsuji Tokihiro, The A(1)M automata related to

crystals of symmetric tensors, Journal of Mathematical Physics 42 (2001), no. 1, 274–308.

[HST18] J. Haslegrave, V. Sidoravicius, and L. Tournier, The Three-speed ballistic annihilation threshold is 1/4, ArXiv e-prints: 1811.08709

(2018).

[IKT12] Rei Inoue, Atsuo Kuniba, and Taichiro Takagi, Integrable structure of box–ball systems: crystal, Bethe ansatz, ultradiscretization and

tropical geometry, Journal of Physics A: Mathematical and Theoretical 45 (2012), no. 7, 073001.

[IL94] Gene Itkis and Leonid Levin, Fast and lean self-stabilizing asynchronous protocols, Foundations of Computer Science, 1994 Pro-

ceedings., 35th Annual Symposium on, IEEE, 1994, pp. 226–239.

[JL18] Matthew Junge and Hanbaek Lyu, The phase structure of asymmetric ballistic annihilation, arXiv preprint arXiv:1811.08378 (2018).

Page 13: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

13

[Kat14] Maria Kateri, Contingency table analysis, Statistics for Industry and Technology 525 (2014).

[KB12] Johannes Klinglmayr and Christian Bettstetter, Self-organizing synchronization with inhibitory-coupled oscillators: Convergence

and robustness, ACM Transactions on Autonomous and Adaptive Systems (TAAS) 7 (2012), no. 3, 30.

[KC14] Eric D Kolaczyk and Gábor Csárdi, Statistical analysis of network data with r, vol. 65, Springer, 2014.

[KL18] Atsuo Kuniba and Hanbaek Lyu, Large deviations and one-sided scaling limit of multicolor box-ball system, arXiv preprint

arXiv:1808.08074 (2018).

[KLO18] Atsuo Kuniba, Hanbaek Lyu, and Masato Okado, Randomized box-ball systems, limit shape of rigged configurations and Thermo-

dynamic Bethe ansatz, arXiv preprint arXiv:1808.02626v4 (2018).

[KOS+06] Atsuo Kuniba, Masato Okado, Reiho Sakamoto, Taichiro Takagi, and Yasuhiko Yamada, Crystal interpretation of Kerov–Kirillov–

Reshetikhin bijection, Nuclear Physics B 740 (2006), no. 3, 299–327.

[Koz96] Zbigniew Koza, The long-time behavior of initially separated A+B → 0 reaction-diffusion systems with arbitrary diffusion con-

stants, Journal of Statistical Physics 85 (1996), no. 1, 179–191.

[KR84] K. Kang and S. Redner, Scaling approach for the kinetics of recombination processes, Phys. Rev. Lett. 52 (1984), 955–958.

[Lam78] Leslie Lamport, Time, clocks, and the ordering of events in a distributed system, Communications of the ACM 21 (1978), no. 7,

558–565.

[LC95] Benjamin P. Lee and John Cardy, Renormalization group study of the A+B → 0 diffusion-limited reaction, Journal of Statistical

Physics 80 (1995), no. 5, 971–1007.

[LLP17] Lionel Levine, Hanbaek Lyu, and John Pike, Double jump phase transition in a random soliton cellular automaton, arXiv preprint

arXiv:1706.05621 (2017).

[LLPS19] Joel Lewis, Hanbaek Lyu, Pasha Pylyavskyy, and Arnab Sen, arXiv preprint arXiv:1911.04458 (2019).

[LMS19] H Lyu, F Memoli, and D Sivakoff, Sampling random graph homomorphisms and applications to network data analysis, arXiv:???

(2019).

[LNB19] Hanbaek Lyu, Deana Needell, and Laura Balzano, Online matrix factorization for markovian data and applications to network

dictionary learning, arXiv preprint arXiv:1911.01931 (2019).

[Lov12] László Lovász, Large networks and graph limits, vol. 60, American Mathematical Soc., 2012.

[LPS14] Thomas Lam, Pavlo Pylyavskyy, and Reiho Sakamoto, Rigged configurations and cylindric loop schur functions, arXiv preprint

arXiv:1410.4455 (2014).

[LS17a] Hanbaek Lyu and David Sivakoff, Persistence of sums of correlated increments and clustering in cellular automata, Submitted.

Preprint available at arXiv.org/1706.08117 (2017).

[LS17b] , Synchronization of finite-state pulse-coupled oscillators on Z, arXiv.org:1701.00319 (2017).

[Lyu15] Hanbaek Lyu, Synchronization of finite-state pulse-coupled oscillators, Physica D: Nonlinear Phenomena 303 (2015), 28–38.

[Lyu16] , Phase transition in firefly cellular automata on finite trees, Submitted. Preprint available at arxiv:1610.00837 (2016).

[Lyu17] , Global synchronization of pulse-coupled oscillators on trees, To appear in SIAM Journal on Applied Dynamical Systems.

Preprint available at arXiv:1604.08381 (2017).

[MBPS10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, Online learning for matrix factorization and sparse coding, Journal

of Machine Learning Research 11 (2010), no. Jan, 19–60.

[MH+13] Jeffrey W Miller, Matthew T Harrison, et al., Exact sampling and counting for fixed-margin matrices, The Annals of Statistics 41(2013), no. 3, 1569–1592.

[MS90] Renato E Mirollo and Steven H Strogatz, Synchronization of pulse-coupled biological oscillators, SIAM Journal on Applied Mathe-

matics 50 (1990), no. 6, 1645–1662.

[NL07] Sujit Nair and Naomi Ehrich Leonard, Stable synchronization of rigid body networks, Networks and Heterogeneous Media 2 (2007),

no. 4, 597.

[OZ78] A. A. Ovchinnikov and Ya. B. Zeldovich, Role of density fluctuations in bimolecular reaction kinetics, Chem. Phys. 28 (1978), no. 1–2,

215–218.

[PD19] Igor Pak and Sam Dittmer, Sampling sparse contingency tables, In preparation (2019).

[PRS19] Michał Przykucki, Alexander Roberts, and Alex Scott, Parking on the integers, arXiv preprint arXiv:1907.09437 (2019).

[PS11] Roberto Pagliari and Anna Scaglione, Scalable network synchronization with pulse-coupled oscillators, IEEE Transactions on Mo-

bile Computing 10 (2011), no. 3, 392–405.

[RPZ+18] Bin Ren, Laurent Pueyo, Guangtun Ben Zhu, John Debes, and Gaspard Duchêne, Non-negative matrix factorization: robust extrac-

tion of extended structures, The Astrophysical Journal 852 (2018), no. 2, 104.

[SGH02] Arkadiusz Sitek, Grant T Gullberg, and Ronald H Huesman, Correction for ambiguous solutions in factor analysis using a penalized

least squares objective, IEEE transactions on medical imaging 21 (2002), no. 3, 216–225.

[Str00] Steven H Strogatz, From kuramoto to crawford: exploring the onset of synchronization in populations of coupled oscillators, Physica

D: Nonlinear Phenomena 143 (2000), no. 1, 1–20.

[Sul18] Seth Sullivant, Algebraic statistics, vol. 194, American Mathematical Soc., 2018.

[Tak93] D Takahashi, On some soliton systems defined by using boxes and balls, 1993 International Symposium on Nonlinear Theory and

Its Applications,(Hawaii; 1993), 1993, pp. 555–558.

[TN12] Leo Taslaman and Björn Nilsson, A framework for regularized non-negative matrix factorization, with application to the analysis

of gene expression data, PloS one 7 (2012), no. 11, e46331.

[TS90] Daisuke Takahashi and Junkichi Satsuma, A soliton cellular automaton, J. Phys. Soc. Japan 59 (1990), no. 10, 3514–3519.

Page 14: RESEARCHSTATEMENT · RESEARCHSTATEMENT Hanbaek Lyu I work in the field of probability theory, combinatorics, and theoretical data science and machine learning. My research is broadly

14

[TTMS96] Tetsuji Tokihiro, Daisuke Takahashi, Junta Matsukidaira, and Junkichi Satsuma, From soliton equations to integrable cellular au-

tomata through a limiting procedure, Physical Review Letters 76 (1996), no. 18, 3247.

[TW83] Doug Toussaint and Frank Wilczek, Particle–antiparticle annihilation in diffusive motion, J. Chem. Phys. 78 (1983), no. 5, 2642–

2647.

[WND13] Yongqiang Wang, Felipe Nunez, and Francis J Doyle, Increasing sync rate of pulse-coupled oscillators via phase response function

design: theory and application to wireless networks, Control Systems Technology, IEEE Transactions on 21 (2013), no. 4, 1455–1462.

[WS98] Duncan J Watts and Steven H Strogatz, Collective dynamics of "small-world" networks, nature 393 (1998), no. 6684, 440–442.

[ZTX16] Renbo Zhao, Vincent YF Tan, and Huan Xu, Online nonnegative matrix factorization with general divergences, arXiv preprint

arXiv:1608.00075 (2016).

HANBAEK LYU, DEPARTMENT OF MATHEMATICS, THE OHIO STATE UNIVERSITY, COLUMBUS, OH 43210.

Email address: [email protected]