
Methods of Information Theory and Algorithmic Complexity for Network Biology

Hector Zenil and Jesper Tegnér
Unit of Computational Medicine, Department of Medicine, Karolinska Institute & Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden.

Abstract

Network theory is today a central topic in computational systems biology as a framework to understand and reconstruct relations among biological components. For example, constructing networks from a gene expression dataset provides a set of possible hypotheses explaining connections among genes, vital knowledge to advancing our understanding of living organisms as systems. Here we introduce some aspects at the intersection of information theory and network biology. We show that Shannon's information entropy, Kolmogorov complexity and algorithmic probability quantify different aspects of biological networks at the interplay of local and global pattern detection. We provide approximations to the algorithmic probability and Kolmogorov complexity of motifs connected to the asymptotic topological properties of networks.

Keywords: information theory, complex networks, network motifs, Kolmogorov complexity, algorithmic randomness, algorithmic probability, biological networks.

1 Introduction

Over the last decade network theory has become a unifying language in biology, leading to whole new areas of research in computational systems biology. Graphs are an important tool for the mathematical analysis of many systems, from the interactions of chemical agents, cells and genes, to ecological networks. Yet we are only just beginning to understand the significance of network biology.

We have recently shown connections among graph theory, algorithmic complexity and empirical biological networks [32]. Here we study some of these information-content approaches to characterising biological networks.


2 Graph and network theory

A graph or network g consists of a pair (V(g), E(g)), where V(g) is a set of vertices (also called nodes) and E(g) is a set of edges (also called links). Two vertices i and j in V(g) are connected if (i, j) is in E(g). A useful representation of a graph is its adjacency matrix (see Fig. 1). The adjacency matrix of g, which we denote by Adj(g), is an n × n binary matrix with entries a_{i,j} = 1 if (i, j) ∈ E(g) and 0 otherwise. The adjacency matrix Adj(g) fully determines the links E(g) connecting all elements in V(g) and therefore constitutes a full description of the graph. The distance D(g) of a graph g (also known as its diameter) is the maximum distance between any two nodes in V(g). The sizes |V(g)| and |E(g)| denote respectively the number of nodes and edges of g. A subgraph h of a graph g is a graph such that V(h) ⊆ V(g) and E(h) ⊆ E(g).
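As a minimal illustration of these definitions (a Python sketch of ours, not code from the paper), the following builds the adjacency matrix of a small undirected graph and reads node degrees off its row sums:

import numpy as np

def adjacency_matrix(n, edges):
    # n x n binary matrix with a_ij = 1 iff (i, j) is an edge
    adj = np.zeros((n, n), dtype=int)
    for i, j in edges:
        adj[i, j] = 1
        adj[j, i] = 1  # undirected graph: the matrix is symmetric
    return adj

# Toy 4-node graph: the path 0-1-2-3 plus the edge (0, 2)
adj = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (0, 2)])
print(adj.sum(axis=1))  # node degrees: [2 2 3 1]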

Directed graphs are graphs whose links have a direction (incoming or outgoing) relative to a node; that is, if u and v are linked nodes, (u, v) is different from (v, u). When the graph is undirected, (u, v) is an unordered pair. For the most part we will cover undirected graphs, unless otherwise noted, because most of what we say applies to both cases, and the undirected graph is the simpler of the two.

Given that geometric distances and coordinates have no meaning in graph theory, it is topological measures that are of interest. One of the most basic topological properties of graphs and networks is the number of links per node. When all nodes have the same number of links the graph is said to be regular (as opposed to random or complex). The degree of a node v, denoted by d(v), is thus the number of (incoming and outgoing) links to other nodes. Another useful topological measure is the clustering coefficient, a measure of the degree to which nodes in a graph tend to cluster together (for example, friends in social networks).
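The clustering coefficient of a node can be computed locally as the fraction of pairs of its neighbours that are themselves linked (a sketch assuming the standard local definition; the helper is ours):

import numpy as np

def clustering_coefficient(adj, v):
    # Fraction of pairs of v's neighbours that are connected to each other
    neighbours = np.flatnonzero(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(adj[i, j] for a, i in enumerate(neighbours)
                for j in neighbours[a + 1:])
    return 2.0 * links / (k * (k - 1))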

Figure 1: An example of a regular graph with node degree 3 (the Petersen graph), followed by its adjacency matrix. Regular graphs are Kolmogorov simple (see Section 3) because every node requires the same amount of information to be specified.


Figure 2: Examples of regular graphs (top left and top right): a 2n-graph with 20 nodes and the complete graph on 20 nodes, both of whose descriptions are very short. In contrast, a random graph (right) with the same number of nodes and links requires more information to be specified, because there is no simple rule connecting the nodes.

Regular graphs (e.g. the Petersen graph in Fig. 1) were the first type of graph to be studied, and they are characterised as graphs whose nodes all have the same degree (the same number of edges per node). Regular graphs are objects that are simple in informational terms, because they can be briefly specified by the fact that every node has the same degree (incoming and outgoing links), and so only a small quantity of information is required to describe them fully. Examples of regular graphs include complete graphs, which are fully connected and whose nodes are therefore all of the same degree. So-called 2n-graphs are graphs whose nodes have a fixed degree of 2 and are hence regular (see Fig. 1 for a regular graph with node degree 3).

Another type of graph that has been studied is the random graph (also called the Erdős-Rényi graph [9, 10]), in which vertices are randomly and independently connected by links with a fixed probability, called the edge probability (see Fig. 2 for a comparison between a regular and a random graph of the same size). The main characteristic of random graphs is that all nodes have roughly the same number of links, equal to the average number of links per node. Random graphs have some interesting properties, but biological networks are not random graphs. They carry information: connections between certain elements in a biological graph are favoured or avoided, and not all vertices have the same probability of being connected to other vertices.
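As a minimal sketch (ours, not from the paper), the Erdős-Rényi construction links every pair of vertices independently with the fixed edge probability p:

import random

def erdos_renyi(n, p, seed=None):
    # Connect each pair of vertices independently with probability p
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

edges = erdos_renyi(20, 0.5, seed=1)
# Expected number of edges: p * n * (n - 1) / 2 = 95 for n = 20, p = 0.5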


2.1 Complex networks

The two most popular complex network models consist of two algorithms that reproduce certain characteristics found in empirical networks. Indeed, the field has been driven largely by the observation of properties that depart from those modelled by regular and random graphs. Specifically, two topological properties of many complex networks have been a focus of interest.

On the one hand, the so-called small-world property describes the phenomenon, observed in many empirical networks, whereby most vertices are separated by a relatively small number of edges. A graph is considered a small-world graph if its average clustering coefficient C is significantly higher than that of a random graph constructed on the same vertex set, and if the graph has approximately the same distance D(g) as its corresponding random graph.

On the other hand, many networks are scale-free, meaning that their degrees are size independent, in the sense that the empirical degree distribution is independent of the size of the graph up to a logarithmic term. That is, the proportion of vertices with degree k is proportional to c·k^(−τ) for some τ > 1 and constant c. In other words, many empirical networks display a power-law degree distribution.

Observation of these properties has motivated the development of many models to explain these features. Of these, the Barabási-Albert (BA) model [1] reproduces complex network features using a preferential attachment mechanism, while the Watts-Strogatz (WS) model [31] provides a mechanism for constructing small-world networks with a rewiring algorithm. Starting from a regular network (e.g. a ring lattice) W of node degree 2, the procedure consists in rewiring a few nodes with low probability. An example of the Watts-Strogatz rewiring algorithm for a 30-vertex network W and rewiring probabilities p = 0, 0.1 and 1, starting from a 2n-regular graph, is given in Fig. 3. A property of networks produced by this algorithm is that as soon as the rewiring probability differs from zero (e.g. p = 0.2), the average distance D(W) drops fast, owing to the few randomly rewired nodes, as compared to p = 0, while the network preserves most of its low information content, adding only the information needed to describe the few new links over the complexity of the original regular network.
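The sketch below (our rendering of the rewiring idea, not the authors' code) starts from a ring lattice of degree k and rewires each edge with probability p:

import random

def watts_strogatz(n, k, p, seed=None):
    # Ring lattice: each node linked to its k nearest neighbours
    rng = random.Random(seed)
    edges = [(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)]
    rewired = set()
    for i, j in edges:
        if rng.random() < p:
            j = rng.randrange(n)  # rewire the far endpoint at random
        if i != j:
            rewired.add((min(i, j), max(i, j)))
    return sorted(rewired)

# p = 0 keeps the regular lattice; p = 1 yields an essentially random graph
print(len(watts_strogatz(30, 2, 0.1, seed=1)))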

The Barabási-Albert model also reproduces networks with the small-world property by using a preferential attachment mechanism, that is, linking every new node to an existing node i with probability p_i = k_i / (Σ_j k_j), where k_i is the degree of node i and the sum is taken over all preexisting nodes j, resulting in a degree distribution that follows a power law, P(k) ∼ 1/k^3 (see Fig. 5).
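A preferential attachment sketch follows (our illustration; the repeated-endpoints list realises p_i = k_i / Σ_j k_j, since sampling uniformly from it picks node i with probability proportional to its degree):

import random

def barabasi_albert(n, m=1, seed=None):
    rng = random.Random(seed)
    edges = [(0, 1)]    # seed graph: a single edge
    endpoints = [0, 1]  # one entry per edge endpoint, so node i appears k_i times
    for new in range(2, n):
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        for old in targets:
            edges.append((old, new))
            endpoints += [old, new]
    return edges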

Another way to characterise the small-world property of a graph is to require that the average graph distance D grow no faster than the logarithm of the number of nodes: D ∼ log |V(g)|. Both the Barabási-Albert and Watts-Strogatz algorithms produce small-world graphs (Figs. 3 and 4). However, unlike the Barabási-Albert model, the Watts-Strogatz model does not produce scale-free networks.


Figure 3: For p = 0, W is a regular graph that requires little information to be described. For rewiring probability p = 1, W becomes a random graph that requires about as much information in bits as there are nodes in W. For p very close to zero (here p = 0.02), however, the average distance in the network drops dramatically, close to the random graph case, while the network remains of low complexity (little information is required to describe it), close to the regular W for p = 0.

Figure 4: Example of a Barabási-Albert network A (starting from a 10-vertex network and growing it up to 100 nodes). The Barabási-Albert model is a preferential attachment algorithm that favours new connections to nodes that are already highly connected. The results are complex networks that are scale-free: the distance D(A) between any two nodes barely grows despite the growth in graph size.


Figure 5: The degree distribution (normalised by maximum value) of a randomly generated Barabási-Albert network following a power-law distribution (logarithmic scale on the y-axis), meaning a few nodes always retain most of the connections while most nodes are left with only a few.

3 Information theory and network biology

Information theory is a field that specifies fundamental limits on signal processing such as communicating, storing and compressing data. Central to information theory is the concept of Shannon's information entropy, which quantifies the average number of bits needed to store or communicate one symbol in a message. Shannon's entropy determines that one cannot store (and therefore communicate) a symbol drawn from an alphabet of n different symbols in less than log(n) bits. In this sense Shannon's entropy determines a lower limit below which no message can be further compressed, not even in principle. Another application (or interpretation) of Shannon's information theory is as a measure of the uncertainty involved in predicting the value of a random variable. For example, specifying the outcome of a fair coin flip (two equally likely outcomes) requires one bit at a time, because the results are independent and each result therefore conveys maximum entropy. Things start to get interesting when the coin is not fair. If one considers a two-headed coin, the tossing experiment always results in heads, and the message will always be 1. When Shannon's entropy is measured from a finite sample, it can be written as H(X) = −∑_{i=1}^{n} P(x_i) log_2 P(x_i), where X is the random variable with n possible outcomes {x_1, ..., x_n} and P(x_i) is the probability of x_i occurring by an underlying process (which also implies that to know H one has to know or assume the probability mass distribution of the whole sample).

Thus, according to Shannon's entropy, the information content of a message encoded by a single symbol is zero; in other words, no uncertainty exists in the process of revealing the same outcome bit after bit. But if a coin is weighted so that the probability of tails is 25% and the probability of heads is 75%, then the Shannon entropy is 0.81. Data in the real world is like a biased coin: it is not completely random, although it may contain some noise. DNA sequences, for example, cannot be random, because they store the information required to produce the proteins needed by a living organism.
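A direct rendering of the entropy formula above (a small sketch in Python) reproduces the values just discussed:

from math import log2

def shannon_entropy(probs):
    # H(X) = -sum_i P(x_i) * log2 P(x_i), in bits per symbol
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit
print(shannon_entropy([0.25, 0.75]))  # weighted coin: ~0.811 bits
print(shannon_entropy([1.0]))         # two-headed coin: 0.0 bits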

Biological networks carry information, transfer information from one region to another and implement functions represented by the network interactions. Connections among elements in a biological network are therefore unlikely to be regular or random. In a biological network, nodes usually represent proteins, metabolites, genes or transcription factors. A link represents an interaction between nodes, which can correspond to a protein-protein binding interaction, metabolic coupling or regulation. The degree of a node in a biological network of protein interactions represents the number of proteins with which it interacts.

A more powerful (and finer-grained, see Fig. 10) measure of information and randomness than Shannon entropy (see Fig. 9) is the concept of Kolmogorov complexity [6, 14], denoted by K. This is because K has been proven to be a universal measure, theoretically guaranteed to asymptotically spot any possible computable regularity [29]. Formally, the Kolmogorov complexity of a string s is K(s) = min{|p| : U(p) = s}, that is, the length (in bits) of the shortest program p that, when running on a universal Turing machine U, outputs s upon halting.

A universal Turing machine U is an abstraction of a general-purpose computer that can be programmed to produce any computable object, such as a string or a network (e.g. the elements of an adjacency matrix). By the invariance theorem [5, 18], K_U depends on U only up to a constant, so, as is conventional, the subscript U can be dropped.

Due to its great power, K comes with a technical inconvenience (it is only semi-computable): it has been proven that no effective algorithm exists which takes a string s as input and produces the exact integer K(s) as output [6, 14]. This is related to a common problem in computer science known as the undecidability of the halting problem [25], that is, of knowing whether a computation will eventually stop.

Despite this inconvenience, K can be effectively approximated by using, for example, compression algorithms. Kolmogorov complexity can alternatively be understood in terms of uncompressibility. If an object, such as a biological network, is highly compressible, then K is small and the object is said to be non-random; if the object is uncompressible, it is considered algorithmically random.
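For instance, a lossless compressor such as Deflate (used below through Python's zlib; the example is ours) assigns a regular string a far shorter description than a random-looking one of the same length:

import random
import zlib

regular = b"01" * 500  # 1000 characters generated by a very short rule
rng = random.Random(0)
noisy = bytes(rng.choice(b"01") for _ in range(1000))  # 1000 random 0/1 characters

# Compressed length is an upper-bound-style estimate of K
print(len(zlib.compress(regular)))  # very small: the pattern is captured
print(len(zlib.compress(noisy)))    # several times larger: no pattern to exploit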

4 Kolmogorov complexity of biological networks

As shown in [32], estimations of Kolmogorov complexity are able to distinguish complex from random networks (of the same size), which are both in turn distinguished from regular graphs (also of the same size). K calculated by the BDM (see Section 5) assigns low Kolmogorov complexity to regular graphs, medium complexity to complex networks following the Watts-Strogatz or Barabási-Albert algorithms, and higher Kolmogorov complexity to random networks. That random graphs are the most algorithmically complex is clear from a theoretical point of view: nearly all long binary strings are algorithmically random, and so nearly all random unlabelled graphs are algorithmically random [32].

Figure 6: The extensively studied transcriptional regulation network [3] of Escherichia coli (here only the main connected component) contains about 400 nodes and 500 edges, 40% of which are self-loops. This is a significant departure from the probability of an edge connecting a node to itself by chance: in a random network we would expect only 1 to 2 nodes to be self-connected.

Figure 7: Streptococcus pyogenes metabolic network [12].

In a graph representation of a transcriptional regulation network that controls gene expression in cells, nodes (vertices) are operons, which are one or more genes transcribed on the same messenger ribonucleic acid (mRNA). Edges of the graph are directed from an operon that encodes a transcription factor to an operon that it directly regulates [20].

One way to study the complexity of biological networks is by applying information-theoretic measures globally, to detect global patterns and measure the information content of the entire network. In Table 1, Kolmogorov complexity approximations by compression ratios have been calculated for a sample of 20 metabolic networks from [12], including the Streptococcus pyogenes network.

The compression ratio of a network is defined by C(g) = Comp(g)/|g|, where Comp(g) is the compressed length of the adjacency matrix of a network g obtained using a lossless compression algorithm (in this case the universal algorithm Deflate, used in file formats such as png and zip), and |g| is the size of the adjacency matrix, measured by multiplying its dimensions (e.g. if the adjacency matrix is 10 × 10, then |g| = 100). It is worth mentioning that compressibility is a sufficient test for non-randomness: it does not matter which lossless compression algorithm is used, since if an object can be compressed by any of them, then the compressed length is a valid upper bound on its Kolmogorov complexity. This in turn means that the choice of compression algorithm is not very important, as long as it remains lossless. A lossless compression algorithm is an algorithm that includes a decompression algorithm retrieving the exact original object, without any loss of information. The closer C(g) is to 1 the less compressible the network, and the closer to 0 the more compressible.
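A sketch of this calculation (our code; the bit-packing convention is our choice and affects the scale of the ratio, not the ordering of networks):

import zlib
import numpy as np

def compression_ratio(adj):
    # C(g) = Comp(g) / |g|: Deflate-compressed length of the adjacency
    # matrix over its (bit-packed) size
    bits = np.packbits(np.asarray(adj, dtype=np.uint8).flatten())
    return len(zlib.compress(bits.tobytes(), 9)) / bits.size

n = 64
complete = np.ones((n, n), dtype=np.uint8) - np.eye(n, dtype=np.uint8)
rng = np.random.default_rng(0)
random_graph = (rng.random((n, n)) < 0.5).astype(np.uint8)

print(compression_ratio(complete))      # close to 0: highly compressible
print(compression_ratio(random_graph))  # about 1: essentially incompressible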

Table 1: Compression lengths (using Deflate) of a sample of 20 metabolic networks, where |V(g)| is the number of vertices of g, |E(g)| the number of edges of g, C(c_|V(g)|) the compressed size of the adjacency matrix of the complete graph with number of nodes equal to |V(g)|, C(g) the compressed size of Adj(g), and C(r_|V(g)|) the compressed size of the adjacency matrix of a randomised network r built with the same number of vertices as g and following the degree distribution of g. Networks are sorted by C(g).

Network (g)                |V(g)|  |E(g)|  C(c_|V(g)|)  C(g)   C(r_|V(g)|)
Chlamydia Pneumoniae         387     792    0.0020      0.032   0.90
Mycoplasma Pneumoniae        411     936    0.0018      0.034   0.89
Chlamydia Trachomatis        447     941    0.0015      0.029   0.88
Rickettsia Prowazekii        456    1014    0.0014      0.030   0.89
Mycoplasma Genitalium        473    1060    0.0013      0.029   0.89
Treponema Pallidum           485    1117    0.0013      0.028   0.89
Aeropyrum Pernix             490    1163    0.0012      0.029   0.89
Oryza Sativa                 665    1514    0.00068     0.022   0.92
Arabidopsis Thaliana         694    1593    0.00062     0.020   0.93
Pyrococcus Furiosus          751    1768    0.00058     0.019   0.95
Pyrococcus Horikoshii        767    1796    0.00055     0.018   0.95
Thermotoga Maritima          830    1980    0.00047     0.018   0.96
Emericella Nidulans          916    2176    0.00039     0.016   0.98
Chlorobium Tepidum           918    2159    0.00039     0.016   0.98
Helicobacter Pylori          949    2325    0.00036     0.016   0.98
Campylobacter Jejuni         946    2257    0.00036     0.016   0.97
Neisseria Meningitidis       981    2393    0.00034     0.015   0.99
Porphyromonas Gingivalis    1010    2348    0.00032     0.014   1.0
Enterococcus Faecalis       1004    2462    0.00032     0.016   1.0
Streptococcus Pyogenes      1051    2577    0.00030     0.014   1.0

We know that for a random (Erdős-Rényi) graph r, the Kolmogorov complexity is K(r) = |E(r)|; for example, for edge density 0.5 we have K(r) ∼ n(n − 1)/4, because edges placed at random have to be described one by one in order to reconstruct the network from its "compressed" description without loss of information. Likewise, a complete graph c of size |V(c)| will have Kolmogorov complexity K(c) = log_2 |V(c)|, because its description is trivial (everything is connected) and only the size of the complete graph is needed in order to reproduce it in full. This means that a statistically (Erdős-Rényi) random graph is indeed also algorithmically (Kolmogorov) random if it is produced not recursively (i.e. by an algorithm) but by a random process (most random computer-generated graphs are actually simulations of valid statistically random graphs), while a complete (or completely disconnected) graph is Kolmogorov simple. Any graph that is neither completely (dis)connected nor random will lie between these theoretical extremes.
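As a worked instance of these two extremes (our arithmetic, using the size of the Streptococcus pyogenes metabolic network from Table 1 and Fig. 8), take n = 1051. Then

K(r) ≈ n(n − 1)/4 = 1051 · 1050/4 ≈ 275 888 bits, while K(c) = log_2 1051 ≈ 10 bits,

so a random graph on as many nodes requires over four orders of magnitude more information to describe than the complete graph, with empirical networks lying in between.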

As shown in Table 1, the Kolmogorov approximation values by incompressibility lie right between the two extreme cases: fully connected or disconnected graphs (c), where the complete graph c has the same number of nodes as g, and random graphs (r), constructed by randomly rewiring the network in a way that preserves the original degree distribution of g. That biological networks are neither random nor simple captures the fact that they contain non-trivial information, as models of the processes of the living systems they represent. In other words, for a biological network g, 0 ≈ C(c_|V(g)|) < C(g) < C(r_|V(g)|) ≈ 1. This is illustrated by the experiment depicted in Fig. 8 for a fixed network size and varying number of edges. The Streptococcus pyogenes metabolic network, with only ∼2500 links, reaches greater complexity than the corresponding random graph with the same number of edges, and lies between the simplest cases (disconnected and complete graphs) and random graphs, which reach a compression ratio of 1.

5 Detection of local patterns with algorithmic probability

There is another seminal concept in the theory of algorithmic information, namely algorithmic probability. The algorithmic probability of a string s is a measure that describes the probability that a valid random program p produces the string s when run on a universal (prefix-free¹) Turing machine U. In equation form this can be rendered as

m(s) = ∑_{p : U(p) = s} 1/2^{|p|}     (Eq. 1)

That is, the sum is over all the programs p for which U outputs s and halts. The algorithmic probability measure m(s) is related to Kolmogorov complexity K(s) in that m(s) is at least the maximum term in the summation of programs, given that the shortest program carries the greatest weight in the sum. The Coding Theorem further establishes the connection between m(s) and K(s) as follows: |−log_2 m(s) − K(s)| < c, where c is a fixed constant, independent of s. The Coding Theorem implies [5, 7] that one can estimate the Kolmogorov complexity of a string from its frequency. Rewriting the relation as K(s) = −log_2 m(s) + O(1), one can see that it is possible to approximate K by approximating m, with the added advantage that m(s) is more sensitive to small objects [8] than the traditional approach to K using lossless compression algorithms, which typically perform poorly on small objects.

¹ The group of valid programs forms a prefix-free set (no element is a prefix of any other, a property necessary to keep 0 < m(s) < 1). For details see [5].
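As a toy sketch of this frequency-to-complexity conversion (our illustration; the actual Coding Theorem Method [8, 27] obtains frequencies by exhaustively running vast numbers of small Turing machines):

from collections import Counter
from math import log2

def k_from_frequencies(outputs):
    # K(s) ~ -log2 m(s): the observed production frequency of each string
    # stands in for its algorithmic probability
    counts = Counter(outputs)
    total = sum(counts.values())
    return {s: -log2(c / total) for s, c in counts.items()}

# Hypothetical production frequencies from a population of random programs
observed = ["0101"] * 80 + ["0111"] * 15 + ["0110"] * 5
print(k_from_frequencies(observed))  # most frequent string gets lowest estimate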

Figure 8: The curve of compression ratios of a network of 1051 nodes (the size of the Streptococcus pyogenes metabolic network) behaves as a classical Bernoulli process when the number of edges is varied from 0 to the maximum of 551 775 possible links for the complete graph of 1051 nodes. The maximum compression ratio is reached when the graph has about 275 888 links (half the maximum number). With only 2577 links, however, the complexity of the Streptococcus pyogenes network is far removed both from simplicity and from the randomness in the middle of the curve. The slight asymmetry of the curve corresponds to the zero diagonal of the adjacency matrix of a graph with no self-loops.

Using this technique one can estimate the Kolmogorov complexity of network motifs, as shown in Fig. 10. If the proposed association of network motifs with biological functions turns out to be correct, the estimation of algorithmic information content suggests that motifs of high Kolmogorov complexity perform more complicated information-flow functions than trivially connected small networks of the same size. Examples of the latter are the non-weighted small network number 1 in Fig. 10, which makes no distinction among the nodes and therefore transmits information in all directions, and networks 2 to 6, where only 2 nodes are connected out of a possible 3; less trivially connected networks start at network 9 (7 being an exception), where by trivial we mean that either all nodes are connected (network number 1) or not all nodes are connected.

The Coding Theorem Method [8, 27] is rooted in the relation provided by algorithmic probability between the frequency of production of a string by a random program and its Kolmogorov complexity (see Eq. (1); the relation is also called the algorithmic Coding Theorem, in contrast to the coding theorem of classical information theory). Essentially, it uses the fact that the more frequently a string (or object) is produced, the lower its Kolmogorov complexity, while strings of lower frequency have higher Kolmogorov complexity.

The approach to determining the algorithmic complexity of network motifs thus involves considering how often the adjacency matrix of a motif is generated by a random Turing machine on a 2-dimensional array, also called a termite or Langton's ant [15]. Fig. 11 shows estimations of K for a sample of motifs of length 4. We call this the Block Decomposition Method (BDM), as it requires the partition of the adjacency matrix of a graph into smaller matrices for which we can numerically calculate, with statistical significance, the algorithmic probability and therefore, by means of the Coding Theorem, the Kolmogorov complexity. The overall complexity of the original adjacency matrix is then the sum of the complexity of its parts, with a logarithmic penalisation for repetitions, given that repetitions of the same object do not add much complexity to the overall object: one can simply describe a repetition in terms of the multiplicity of the first occurrence. Details of the method and applications to graphs and images can be found in [32, 33].
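A minimal sketch of the decomposition step (our code; ctm is assumed to be a precomputed lookup table of Coding Theorem estimates for small blocks, as obtained in [8, 27], and is not reproduced here):

from collections import Counter
from math import log2

def bdm(adj, ctm, block=4):
    # Partition the (NumPy) adjacency matrix into block x block submatrices,
    # sum the CTM complexity of each distinct block, and add log2 of the
    # multiplicity of repeated blocks (repetitions are cheap to describe)
    n = adj.shape[0]
    parts = [adj[i:i + block, j:j + block].tobytes()
             for i in range(0, n, block) for j in range(0, n, block)]
    counts = Counter(parts)
    return sum(ctm[b] + log2(m) for b, m in counts.items())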

6 Patterns and information content of biological networks

One important development in network biology is the concept of network motifs [3], defined as recurrent and statistically significant subgraphs found in networks, as compared to a uniform distribution in a random network. As is to be expected, biological networks are not random networks, because biological networks carry the information necessary for an organism to develop. Motifs are believed to be of signal importance largely because they may reflect functional properties. Of the 13 possible 3-node subgraph architectures in a network, subgraph 7 in Fig. 10 is, for example, a motif in the E. coli transcription network (Fig. 6). The function of this motif has been identified as that of a Feed-Forward Loop (FFL), which, it has been speculated, is a more stable transmitter of information among genes than other possible small network arrangements. This motif consists of 3 genes: a regulator that regulates another regulator, and a gene regulated by both regulators (since each of the regulatory interactions can be either activation or repression, this motif can be divided into 8 more refined subtypes). This 3-node motif has been found in E. coli [19, 26], yeast [16, 20] and other organisms [4, 11, 21, 22, 28, 30]. Other motifs have been identified with varieties of memory switches to control and delay actions, among other possibly useful biological functions.

That the FFL is identified with a gene regulatory function and is classified as being of medium-range Kolmogorov complexity may be expected, given that, as we have seen in previous sections, structure lies between simplicity and randomness. This may suggest, for example, that a complete subgraph is necessarily part of a larger motif if it is to implement a meaningful biological function. It can also be an indication of an over-representation of connections in a network inferred from a biological dataset, hence helping to reduce false positives.

FFLs are only one class of network motif among the 4 identified [2]. The simplest kinds of motif are positive and negative autoregulation (whose motif is a self-loop), abbreviated PAR and NAR respectively. There are also single-input modules (SIMs) and dense overlapping regulons (DORs), or multi-input motifs (MIMs). A SIM is a regulator that regulates a group of genes with no other regulator in between (the sole regulator regulating itself); it has been identified with the coordinated expression of a group of genes with a shared function, being capable of generating a temporal expression programme and an activation order with a different threshold for each regulated gene. DORs have been identified with a gate array, carrying out a computation from multiple inputs to multiple outputs.

Figure 9: Shannon entropy values applied to the adjacency matrices of the 13 subgraphs of size 3 that can occur in a network (sorted from smaller to larger entropy), plotted in a circular embedding as often depicted [3].

Fig. 9 gives an idea of what Shannon's entropy (applied to the edge list of the motifs) is measuring, as it is counting the number of different possible configurations. Shannon's entropy clusters the motifs into five classes of values according only to the number of edges in each motif. Motifs 1 to 3 have the smallest possible value different from 0, ∼0.6365, followed by cases 5 to 13 with the next possible Shannon entropy value, log(3) ∼ 1.0986. Those with the lower value are motifs with only 2 edges, while those with log(3) values are motifs with 3 edges, and so on. Shannon's entropy is therefore useful here for counting the number of edges. To probe the networks further, it is advantageous to estimate their Kolmogorov complexity (see Figs. 10 and 11) rather than their variety. As a proof of principle, we have previously shown that topological properties of complex networks are indeed detected by approximations to K [32]. For Kolmogorov complexity, for example, the simplest possible motif is the one connecting all nodes bidirectionally, in accordance with the idea that such a motif would be biologically meaningless (unless weights indicating levels of regulation were assigned), unlike Shannon's entropy, which assigns it high entropy due to its greater number of edges compared to all the other cases. After the complete graph, the motifs with lowest Kolmogorov complexity are those connecting only 2 nodes. It is clear that the Kolmogorov approximations capture more structure, assigning 11 different values to the 13 cases, and are hence more fine-grained than Shannon's entropy.

Figure 10: Kolmogorov complexity estimations (cf. BDM) of the adjacency matrices of the 13 subgraphs of size 3 that can occur in a network, sorted from smaller to larger approximated Kolmogorov complexity values and plotted in a circular embedding as often depicted [3].

There is a debate as to whether motif analysis is living up to its promise of breaking complex biological networks down so as to understand them in terms of minimal functions carried by motifs, it having been suggested that the motif approach to networks has important limits [13]. In Fig. 12 we tested how much could be inferred if motifs were used to build up a large network, by adding one of two chosen motifs at a time, picking random vertices of each motif and randomly linking them to other motifs, with a greater probability of connecting to the new motif and a small probability of connecting to an old one. The result is a scale-free network. Motifs are by definition well-connected small networks, which can be thought of as part of a kn-regular network when completely linked, with k the length of the motifs and n the number of motifs. Some links will connect distant nodes, due to the labelling method that implemented the weighted probability.

Figure 11: Kolmogorov complexity estimations (cf. BDM) of a sample of possible motifs, sorted from smaller to larger values.

Figure 12: Constructing a network of motifs: after six recursive iterations starting from 2 different unlabelled low-complexity motifs whose nodes are labelled and randomly linked (e.g. nodes 1 and 13 after one iteration), the result is a scale-free complex network (the distribution of node links, i.e. the degree distribution, remains about the same).

Figure 13: Motifs found in the metabolic network of Streptococcus pyogenes (bottom) and the transcription network of E. coli (top) using the mfinder software [20], plotted in a style suggesting information flow implementing a function, each followed by its Kolmogorov complexity (cf. BDM), sorted from larger to smaller values.

The experiment depicted in Fig. 12 is an attempt to build a large network out of small networks. Without information about the connections among the smaller components, it turns out to yield a network that is merely the result of putting the small networks together, quite at odds with anything that could be said about the motifs. Even though there are some known facts about how classes of motifs connect to each other, such as FFLs and SIMs being integrated into DORs and DORs occurring in a single layer (there is no DOR at the output of another DOR) [2], we think the study of motifs and that of full networks are not alternatives but rather complementary, as one cannot reconstruct global patterns from local repetitions, nor local patterns from global properties.

The motifs in the metabolic network of Streptococcus pyogenes (Fig. 7), as described in [12], were calculated (see Fig. 13) using specialised open-source software called mfinder [20]. mfinder implements two kinds of motif-mining algorithms, full enumeration and a sampling method, both of which use brute force [20] and succeed in discovering small motifs (larger motifs are very difficult to find, but new tools have been proposed).

Streptococcus pyogenes is a bacterium that causes group A streptococcal infections [24], ranging from mild superficial skin infections to life-threatening systemic diseases. A metabolic network is a set of physical processes that determine the physiological and biochemical properties of a cell, such as the chemical reactions of metabolism, metabolic pathways and regulatory interactions. It is worth noting the occurrence of the same motifs (1 and 4) in the transcription network of E. coli (Fig. 6) and the metabolic network of Streptococcus pyogenes (Fig. 7).

7 Concluding remarks

We have briefly surveyed a novel area at the intersection of network biology, complex networks and information theory. We have applied and compared techniques, from Shannon entropy to Kolmogorov complexity (the latter approximated both by algorithmic probability and by lossless compressibility), that shed light on various aspects of biological networks, serviceable at different scales and for different purposes in the investigation of their properties.

We have numerically estimated the information content and algorithmic randomness of local and global patterns of biological networks, both at the smallest scale (network motifs) and at full scale.

We have argued for an encompassing information-theoretic study of biological networks at different scales, and for a potentially fruitful interaction between algorithmic information theory and systems biology. First results have already been delivered [34], and some deeper connections between algorithmic probability and biology have been introduced [35, 36], but we have only started to explore this direction at the intersection of information and biology, and tools and measures are waiting to be further explored and exploited.

Open problems and research directions include the characterisation of networks by information-theoretic properties, identifying possible information motifs, nodes and modules ranked by their contribution to the overall information content of the networks. One obvious research direction is the investigation of the change in information content of networks over time, beyond the study of static networks, and of the information flow between different regions of a network. In Fig. 12, for example, we proposed a bottom-up approach to building networks from constituent graph motifs; which kinds of motifs would construct a network that approximates the information content of a typical biological network? What kinds of conclusions could we draw from looking at local information regularities of a network that build up certain global properties (information patterns) typical of biological networks (e.g. degree distribution, hierarchical organisation)? Can information motifs be explained by classical graph motifs, or are they more fundamental? How do graph dynamics (hypothesised graph motif functions) connect to information dynamics? What does the information content of the genetic regulatory network of a cell look like during the process of differentiation? Does the network approach a state of algorithmic randomness, or does it depart from it? Can information-theoretic constraints help us understand, and even reprogramme, cells if these are taken as information-processing units?

We think the approach presented herein provides a starting point and a framework for formulating new kinds of questions from a very different perspective, effectively bridging various seminal areas of research in complexity science and computational biology.

References

[1] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics, 74, 47, 2002.

[2] U. Alon, Network motifs: theory and experimental approaches, Nature Reviews Genetics, 8, 450–461, June 2007.

[3] U. Alon, Collection of Complex Networks, Uri Alon Homepage, 2007 (accessed July 2013). http://www.weizmann.ac.il/mcb/UriAlon/groupNetworksData.html

[4] L.A. Boyer et al., Core transcriptional regulatory circuitry in human embryonic stem cells, Cell 122, 947–956, 2005.

[5] C.S. Calude, Information and Randomness: An Algorithmic Perspective, EATCS Series, 2nd edition, Springer, 2010.

[6] G.J. Chaitin, On the length of programs for computing finite binary sequences, Journal of the ACM, 13(4):547–569, 1966.

[7] T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd Edi-tion, Wiley-Blackwell, 2009.


[8] J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity ofShort Strings: A Glance Into the Innermost Structure of Algorithmic Ran-domness, Applied Mathematics and Computation 219, pp. 63–77, 2012.

[9] P. Erdős and A. Rényi, On Random Graphs I, Publ. Math. Debrecen 6, pp. 290–297, 1959.

[10] E.N. Gilbert, Random graphs, Annals of Mathematical Statistics 30: 1141–1144, 1959. doi:10.1214/aoms/1177706098.

[11] N. Iranfar, D. Fuller and W.F. Loomis, Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC, Dev. Biol. 290, 460–469, 2006.

[12] H. Jeong, B. Tombor, A.-L. Barabási, The large-scale organization of metabolic networks, Nature 407, 651, 2000.

[13] J.F. Knabe, Computational Genetic Regulatory Networks: Evolvable, Self-organizing Systems, Springer, 2013.

[14] A. N. Kolmogorov. Three approaches to the quantitative definition of in-formation, Problems of Information and Transmission, 1(1):1–7, 1965.

[15] C.G. Langton, Studying artificial life with cellular automata, Physica D:Nonlinear Phenomena 22 (1–3): 120–149, 1986.

[16] T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber,N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger,Transcriptional regulatory networks in Saccharomyces cerevisiae, Science,298, 799–804, 2002.

[17] L. A. Levin. Laws of information conservation (non-growth) and aspects ofthe foundation of probability theory, Problems of Information Transmis-sion, 10(3):206–210, 1974.

[18] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed., Springer, 2009.

[19] S. Mangan, A. Zaslaver and U. Alon, The coherent feedforward loop servesas a sign-sensitive delay element in transcription networks. J. Mol. Biol.,334, 197–204, 2003.

[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon.Network Motifs: Simple Building Blocks of Complex Networks, Science298, no. 5594: 824-827, 2002.

[21] R. Milo et al., Superfamilies of designed and evolved networks, Science 303, 1538–1542, 2004.

[22] D.T. Odom et al. Control of pancreas and liver gene expression by HNFtranscription factors, Science, 303, 1378–1381, 2004.


[23] F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondenceand Independence of Numerical Evaluations of Algorithmic InformationMeasures, Computability, vol. 2, no. 2, pp 125–140, 2013.

[24] K.J. Ryan, C.G. Ray (eds), Sherris Medical Microbiology (5th ed.), Mc-Graw Hill, 2011.

[25] A.M. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, Series 2, 42:230–265, 1936–1937.

[26] S.S. Shen-Orr, R. Milo, S. Mangan and U. Alon, Network motifs in thetranscriptional regulation network of Escherichia coli, Nature Genet. 31,64–68, 2002.

[27] F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, CalculatingKolmogorov Complexity from the Frequency Output Distributions of SmallTuring Machines, PLoS ONE 9(5): e96223, 2014.

[28] L.A. Saddic et al., The LEAFY target LMI1 is a meristem identity regulator and acts together with LEAFY to regulate expression of CAULIFLOWER, Development 133, 1673–1682, 2006.

[29] R.J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2.Information and Control, 7:1–22 and 224–254, 1964.

[30] G. Swiers, R. Patient and M. Loose, Genetic regulatory networks program-ming hematopoietic stem cells and erythroid lineage specification, Dev.Biol. 294, 525–540, 2006.

[31] D.J. Watts and S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393, 440–442, 1998.

[32] H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Graph Automorphismsand Topological Characterization of Complex Networks by Algorithmic In-formation Content, Physica A: Statistical Mechanics and its Applications,vol. 404, pp. 341–358, 2014.

[33] H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding The-orem Method by Compressibility, 2013.

[34] H. Zenil, N.A. Kiani and J. Tegnér, Algorithmic complexity of motifs, clusters, superfamilies of networks, Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, Shanghai, China, 2013.

[35] H. Zenil and J.A.R. Marshall, Some Aspects of Computation Essential toEvolution and Life, Ubiquity (ACM), vol. 2013, no. April (2013), pp 1–16.


[36] H. Zenil, C. Gershenson, J.A.R. Marshall and D. Rosenblueth, Life as Ther-modynamic Evidence of Algorithmic Structure in Natural Environments,Entropy, 14(11), 2173–2191, 2012.
