12
Received: 14 July 2017 Revised: 14 May 2018 Accepted: 25 May 2018 DOI: 10.1002/jgt.22370 ARTICLE A tanglegram Kuratowski theorem Éva Czabarka 1 László A. Székely 1 Stephan Wagner 2 1 Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA 2 Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa Correspondence László A. Székely, Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA. Email: [email protected] Funding information National Research Foundation of South Africa, Grant/Award Number: 96236; National Science Foundation, Grant/Award Numbers: 1300547, 1600811 Abstract A tanglegram consists of two rooted binary plane trees with the same number of leaves and a perfect matching between the two leaf sets. Tanglegrams are drawn with the leaves on two parallel lines, the trees on either side of the strip created by these lines, and the perfect matching inside the strip. If this can be done without any edges crossing, a tan- glegram is called planar. We show that every nonplanar tan- glegram contains one of two nonplanar 4-leaf tanglegrams as an induced subtanglegram, which parallels Kuratowski's Theorem. KEYWORDS crossing number, graph drawing, planarity, subtrees, tanglegram, trees 2010 MATHEMATICS SUBJECT CLASSIFICA- TION: Primary 05C10; secondary 05C05, 05C62, 92B10 1 INTRODUCTION Kuratowski's Theorem [15], a cornerstone of graph theory, asserts that a graph is nonplanar if and only if it contains a subdivision of 3,3 or 5 . This is not the only characterization of planarity: Wagner's Theorem [26] asserts that a graph is nonplanar if and only if 3,3 or 5 is a minor of the graph. We postpone the formal definition of tanglegrams to Section 2. Informally, tanglegrams are a special kind of graph, consisting of two rooted binary trees of the same size and a perfect matching joining their leaves, with restrictions on how they can be drawn. Namely, the leaves are to be placed on two parallel lines, the two trees are to be drawn on either side of the strip created by these lines as plane trees, and the perfect matching inside the strip is to be drawn in straight line segments. Such a drawing is called a layout of the tanglegram. The Tanglegram Layout Problem calls for the minimization of the number of crossings in the strip. The tanglegram is planar, if it has a layout without crossings. For illustration, Figure 1 shows the 13 tanglegrams of size 4, of which 11 are planar, while 2 are not. Tanglegrams play a major role in phylogenetics, especially in the theory of cospeciation [18]. The J Graph Theory. 2018;1–12. wileyonlinelibrary.com/journal/jgt © 2018 Wiley Periodicals, Inc. 1

A tanglegram Kuratowski theorem - scipaper.ir · Received: 14 July 2017 Revised: 14 May 2018 Accepted: 25 May 2018 DOI: 10.1002/jgt.22370 ARTICLE A tanglegram Kuratowski theorem Éva

Embed Size (px)

Citation preview

Received: 14 July 2017 Revised: 14 May 2018 Accepted: 25 May 2018

DOI: 10.1002/jgt.22370

A R T I C L E

A tanglegram Kuratowski theorem

Éva Czabarka1 László A. Székely1 Stephan Wagner2

1Department of Mathematics, University ofSouth Carolina, Columbia, SC 29208, USA2Department of Mathematical Sciences,Stellenbosch University, Private Bag X1,Matieland 7602, South Africa

CorrespondenceLászló A. Székely, Department ofMathematics, University of SouthCarolina, Columbia, SC 29208, USA.Email: [email protected]

Funding informationNational Research Foundation of South Africa,Grant/Award Number: 96236; National ScienceFoundation, Grant/Award Numbers: 1300547,1600811

AbstractA tanglegram consists of two rooted binary plane trees with

the same number of leaves and a perfect matching between

the two leaf sets. Tanglegrams are drawn with the leaves

on two parallel lines, the trees on either side of the strip

created by these lines, and the perfect matching inside the

strip. If this can be done without any edges crossing, a tan-

glegram is called planar. We show that every nonplanar tan-

glegram contains one of two nonplanar 4-leaf tanglegrams

as an induced subtanglegram, which parallels Kuratowski's

Theorem.

K E Y W O R D Scrossing number, graph drawing, planarity, subtrees, tanglegram, trees

2 0 1 0 M AT H E M AT I C S S U B J E C T C L A S S I F I C A -T I O N :Primary 05C10; secondary 05C05, 05C62, 92B10

1 INTRODUCTION

Kuratowski's Theorem [15], a cornerstone of graph theory, asserts that a graph is nonplanar if and onlyif it contains a subdivision of 𝐾3,3 or 𝐾5. This is not the only characterization of planarity: Wagner'sTheorem [26] asserts that a graph is nonplanar if and only if 𝐾3,3 or 𝐾5 is a minor of the graph.

We postpone the formal definition of tanglegrams to Section 2. Informally, tanglegrams are a specialkind of graph, consisting of two rooted binary trees of the same size and a perfect matching joiningtheir leaves, with restrictions on how they can be drawn. Namely, the leaves are to be placed on twoparallel lines, the two trees are to be drawn on either side of the strip created by these lines as planetrees, and the perfect matching inside the strip is to be drawn in straight line segments. Such a drawingis called a layout of the tanglegram. The Tanglegram Layout Problem calls for the minimization ofthe number of crossings in the strip. The tanglegram is planar, if it has a layout without crossings.For illustration, Figure 1 shows the 13 tanglegrams of size 4, of which 11 are planar, while 2 are not.Tanglegrams play a major role in phylogenetics, especially in the theory of cospeciation [18]. The

J Graph Theory. 2018;1–12. wileyonlinelibrary.com/journal/jgt © 2018 Wiley Periodicals, Inc. 1

2 CZABARKA ET AL.

No. 1 No. 2 No. 3 No. 4

No. 5 No. 6 No. 7 No. 8 No. 9

No. 10 No. 11 No. 12 No. 13

F I G U R E 1 The 13 tanglegrams of size 4 from [3,14]. They are drawn with minimum number of crossings

first binary tree is the phylogenetic tree of hosts, while the second binary tree is the phylogenetic treeof their parasites, e.g. gopher and louse [12]. The matching connects the host with its parasite. Thetanglegram crossing number has been related to the number of times parasites switched hosts [12], or,working with gene trees instead of phylogenetic trees, to the number of horizontal gene transfers ( [6],pp. 204–206). Tanglegrams are well-studied objects in phylogenetics and computer science.

Planarity of tanglegrams is directly characterized by Kuratowski's Theorem in terms of subgraphs,if the tanglegram is augmented by a single edge (Proposition 1). In this article, we provide a charac-terization of planarity of tanglegrams not in terms of its subgraphs, but in terms of excluded inducedsubtanglegrams (Theorem 4). The concept of induced subtanglegram is analogous to that of the topo-logical minor. As for subcubic graphs the concept of minor coincides with the concept of topologicalminor (see [9], Proposition 1.7.4), we do not expect a different analog to Wagner's Theorem amongtanglegrams. As tanglegrams are not widely known objects, it is time to turn to the technical definitions.

2 TANGLEGRAMS

A plane binary tree has a root vertex assumed to be a common ancestor of all other vertices, and eachvertex either has two children (left and right) or no children. A vertex with no children is a leaf, and avertex with two children is an internal vertex. Note that this definition allows a single-vertex tree thatis considered as both root and leaf to be a rooted binary tree. A plane binary tree is easy to draw onone side of a line, without edge crossings, such that only the leaves of the tree are on the line.

A tanglegram (�̃�, �̃�, 𝜎) is a graph that consists of a left binary tree �̃� with root 𝑟, a right binary tree �̃�

with root 𝜌, and a perfect matching 𝜎 between the leaves of �̃� and �̃�. Two tanglegrams are consideredidentical, if there is a graph isomorphism between them fixing 𝑟 and 𝜌. The size of a tanglegram is thenumber of leaves in 𝐿 (or 𝑅). An abstract tanglegram layout (𝐿, 𝑅, 𝜎) of the tanglegram (�̃�, �̃�, 𝜎) isgiven by turning the unordered trees �̃� and �̃� into ordered trees 𝐿 and 𝑅. Given an abstract tanglegramlayout (𝐿, 𝑅, 𝜎), an actual tanglegram layout of (𝐿, 𝑅, 𝜎) consists of a left plane binary tree isomorphicto 𝐿, with root 𝑟 drawn in the halfplane 𝑥 ≤ 0, having its leaves on the line 𝑥 = 0, a right plane binarytree isomorphic to 𝑅 with root 𝜌, drawn in the halfplane 𝑥 ≥ 1, having its leaves on the line 𝑥 = 1,

CZABARKA ET AL. 3

a a

b

b

c

c

d d

original layout

r ρ

a

b

c

d a

d

b

cafter switching at ρ

r ρ

d a

c

b

c

b

a d

after mirroring at r

r ρ

F I G U R E 2 Results of a switch and a mirror operation

and a perfect matching 𝜎 between their leaves drawn in straight line segments. (Isomorphism betweenordered trees and plane trees keeps the root and the order.)

Our main concern about tanglegram layouts is the number of crossings between the matching edges.As it is determined by the abstract tanglegram layout, it is usually sufficient to focus on the abstract tan-glegram layout. For counting crossings, we treat tanglegram layouts combinatorially by understandingthem as abstract tanglegram layouts.

A switch on the abstract tanglegram layout (𝐿, 𝑅, 𝜎) is the following operation: select an internalvertex 𝑣 of one of the two trees 𝐿 and 𝑅 and change the order of its two children.

It is easy to see that two abstract tanglegram layouts represent the same tanglegram if and only if asequence of switches moves one abstract layout into the other. The switch operation is also illustratedin Figure 2 on a tanglegram layout. Hence tanglegrams of a given size partition the set of all abstracttanglegram layouts of the same size, or equivalently a tanglegram can be seen as an equivalence classof abstract tanglegram layouts. Note that interchanging 𝐿 and 𝑅 is not allowed and may result in adifferent tanglegram.

For an internal vertex 𝑣 of 𝐿 or 𝑅, one may also take the mirror image of the subtree rooted at 𝑣.This is called the mirror operation at vertex 𝑣, which is illustrated in Figure 2 on a tanglegram layout.As the mirror operation can be obtained by doing a sequence of switch operations from 𝑣 down towardthe leaves, mirror operations do not change the tanglegram.

Billey et al. [3] considered the enumeration problem for tanglegrams: they obtained an explicit for-mula for the number 𝑡𝑛 of tanglegrams with 𝑛 leaves on each side. The counting sequence starts

1, 1, 2, 13, 114, 1509, 25595, 535753, 13305590, 382728552, .…

Figure 1 illustrates the fourth term in this sequence. The asymptotic formula

𝑡𝑛 ∼ 𝑛! ⋅ 𝑒1∕84𝑛−1

𝜋𝑛3

was derived in [3] as well, and a number of questions on the shape of random tanglegrams were asked.Those were answered in [14] by means of a strong structure theorem.

3 TANGLEGRAM CROSSING NUMBER AND PLANARITY OFTANGLEGRAMS

The crossing number of a tanglegram layout is the number of crossing pairs of matching edges, whichis determined by the abstract tanglegram layout.

It is desirable to draw a tanglegram with the least possible number of crossings, which is known asthe Tanglegram Layout Problem [11,25]. The (tanglegram) crossing number crt(𝑇 ) of a tanglegram 𝑇

4 CZABARKA ET AL.

is defined as the minimum number of crossings among its layouts. The Tanglegram Layout Problemproblem is NP-hard [5,11], but is fixed parameter tractable [4,5]. It does not allow constant factorapproximation under the Unique Game Conjecture [5]. Several heuristics for the problem are comparedin [17]. As the reality of biology is infinitely more complicated than this model, [2] extended the studyof the problem for nonbinary trees, [22] for phylogenetic networks, [1] for the Generalized TanglegramLayout Problem, where correspondence among hosts and parasites is no longer one-to-one. A morerealistic related reconciliation mechanism is studied in [7].

A tanglegram is planar if it has zero tanglegram crossing number; in other words, if it has a layoutwithout crossing matching edges. Otherwise it is called nonplanar. In an earlier article [8], we showedthat the tanglegram crossing number of a randomly and uniformly selected tanglegram of size 𝑛 isΘ(𝑛2) with high probability, i.e. as large as it can be within a constant multiplicative factor. As onewould therefore expect, the number of planar tanglegrams of size 𝑛 grows much more slowly than thetotal number of tanglegrams. The counting sequence 𝑝𝑛 starts

1, 1, 2, 11, 76, 649, 6173, 63429, 688898, 7808246,… ,

and an asymptotic formula of the form

𝑝𝑛 ∼ 𝐴 ⋅ 𝑛−3 ⋅ 𝐵𝑛,

where 𝐴 and 𝐵 are constants, holds [19].Recall [24] that a drawing of a graph 𝐺 in the plane places the vertices of 𝐺 on distinct points in the

plane and then, for every edge 𝑢𝑣 in 𝐺, draws a continuous simple curve in the plane connecting the twopoints corresponding to 𝑢 and 𝑣, in such a way that no curve has a vertex point as an internal point. Thecrossing number cr(𝐺) of a graph 𝐺 is the minimum number of intersection points among the interiorsof the curves representing edges, over all possible drawings of the graph, where no three edges mayhave a common interior point. For more on the state-of-the-art of graph drawing, see [20,21].

Note that the (graph) crossing number cr(𝑇 ) of a tanglegram 𝑇 is less or equal to the (tanglegram)crossing number crt(𝑇 ) of 𝑇 , since the tanglegram layout is more restrictive than the graph drawing.The following proposition that provides an almost obvious characterization of planarity of tanglegram,was found by [25] and later rediscovered [8]:

Proposition 1. Assume that a tanglegram 𝑇 is represented by the layout (𝐿, 𝑅, 𝜎), and let the rootsof 𝐿 and 𝑅 be 𝑟 and 𝜌. Let 𝑇 ∗ denote the graph in which the underlying graph of 𝑇 (consisting of thetwo binary trees and the matching edges) is augmented by the edge 𝑟𝜌. Then the following facts areequivalent:

(1) crt(𝑇 ) ≥ 1,(2) cr(𝑇 ∗) ≥ 1,(3) 𝑇 ∗ contains a subdivision of 𝐾3,3.

Proof.

(1)⇒(2). This is equivalent with cr(𝑇 ∗) = 0 ⇒ crt(𝑇 ) = 0. If cr(𝑇 ∗) = 0 then 𝑇 ∗ can be drawnin the plane with straight lines by Fáry's Theorem [10]. This means both the left and right trees(𝑇left and 𝑇right) of 𝑇 are drawn in straight lines. Draw simple closed curves, 𝐶left and 𝐶right, around𝑇left and 𝑇right, very close to them, such that the matching edges and the edge 𝑟𝜌 intersect each of 𝐶leftand 𝐶right exactly once. Deleting the 𝑟𝜌 edge and the intersection points of the 𝑟𝜌 curve with 𝐶left

CZABARKA ET AL. 5

1

2′

3′

2

3

1′1′3

2′13′

2

F I G U R E 3 Finding copies of 𝐾3,3 in tanglegrams No. 6 and No. 13 after adding edges between the roots [8]

�1 �2 �3 �4

r

r�1�2�3�4

�1 �2 �3 �4

F I G U R E 4 A rooted binary tree with root 𝑟, four leaves 𝓁1,𝓁2,𝓁3,𝓁4 selected, the vertex 𝑟𝓁1𝓁2𝓁3𝓁4and the tree

induced by the selected leaves

and 𝐶right, we can continuosly deform the remaining part of the drawing 𝑇 ∗ such that open curvesleft of 𝐶left and 𝐶right map to the lines 𝑥 = 0 and 𝑥 = 1 obtaining a planar tanglegram drawing of 𝑇 .

(2)⇒(3) follows from Kuratowski's Theorem, as 𝑇 ∗ cannot contain a subdivision of 𝐾5. This isbecause none of its vertices has degree greater than 3.

(3)⇒(1) is proved by the contrapositive: if 𝑇 was to admit a planar tanglegram layout, then we couldadd the additional edge between 𝑟 and 𝜌, creating a planar drawing of the graph 𝑇 ∗. ■

Note that a subdivision of a 𝐾3,3 in the tanglegram 𝑇 ∗ may be such that the six vertices of 𝐾3,3 areall in 𝐿 or all in 𝑅 (compare with Corollary 5).

Now consider the tanglegrams 𝑇1 and 𝑇2, No. 6 and 13 in Figure 1, each augmented with the extraedge connecting the roots. Figure 3 shows a layout with one crossing for the tanglegrams 𝑇1 and 𝑇2, anda subdivision of 𝐾3,3 in the graphs 𝑇 ∗

1 and 𝑇 ∗2 , showing crt(𝑇1) = crt(𝑇2) = 1 in view of Proposition 1.

These two are the only nonplanar tanglegrams of size 4, see Figure 1.In what follows, we will show that the two nonplanar tanglegrams of size 4 are in fact sufficient to

characterize nonplanarity of tanglegrams in the same way that 𝐾5 and 𝐾3,3 characterize nonplanarityof graphs: every nonplanar tanglegram has to contain at least one of the two.

4 INDUCED SUBTANGLEGRAMS

In a rooted plane binary tree 𝐵 with root 𝑟, a choice of a subset of the set of leaves of 𝐵 inducesanother rooted binary tree by taking the smallest subtree containing the leaves in , designating thevertex of this subtree closest to the old root as new root (which we will denote by 𝑟), and suppressingall vertices of degree 2 other than 𝑟, see Figure 4. The study of induced binary subtrees of (rooted or

6 CZABARKA ET AL.

e1

e2

e3

ρe1e2e3

re1e2e3 = r ρe1

e2

e3

F I G U R E 5 A tanglegram 𝑇 with matching edges 𝑒1, 𝑒2, 𝑒3 selected, the vertices 𝑟𝑒1𝑒2𝑒3and 𝜌𝑒1𝑒2𝑒3

, and the subtan-glegram induced by the selected edges

unrooted) leaf-labeled binary trees is topical in the phylogenetic literature [23]. It is immediate that forevery 𝑣 ∈ the vertex 𝑟 lies on the unique path between 𝑟 and 𝑣

For brevity, we use the notation 𝑟𝑥𝑦 or 𝑟𝑥𝑦𝑧 when = {𝑥, 𝑦} or = {𝑥, 𝑦, 𝑧}, respectively.Given a layout of the tanglegram 𝑇 with left binary tree 𝐿 rooted at 𝑟 and right binary tree 𝑅 rooted

at 𝜌 and a subset 𝐸 of the set of matching edges between the leaf sets of the left and right binary planetrees, 𝐸 identifies a subset of leaves on both sides. These leaf sets induce respectively a left and rightinduced binary plane tree, which define a layout of a tanglegram 𝑇 ′ when we put back the edges of𝐸 between the corresponding leaves. We say that 𝐸 induces this sublayout of the original layout oftanglegram 𝑇 , and we call 𝑇 ′ the subtanglegram of tanglegram 𝑇 induced by the matching edge set𝐸 (see Figure 5). As the sublayout and switch operators commute, this definition does not depend onthe particular layout of 𝑇 , it just depends on the tanglegram. We will use 𝑟𝐸 and 𝜌𝐸 for the vertices in𝑇 corresponding to the roots of the left and right subtrees of this induced subtanglegram. There is anatural partial order by inclusion on the set of induced subtanglegrams of a given tanglegram.

Sometimes we put scars on the edges of induced subtanglegrams to remember where the eliminatedmatching edges were connected to the surviving part. Let 𝑒 ∈ 𝜎 ⧵ 𝐸 be a matching edge in a layoutof the tanglegram 𝑇 . When we consider 𝐿𝐸 and 𝑅𝐸 , the smallest subtrees of 𝐿 and 𝑅 that containthe leaf set corresponding to 𝐸, the unique path connecting 𝑒 to 𝑟 in 𝐿 either enters 𝐿𝐸 at a vertex ofdegree 2 or does not enter 𝐿𝐸 at all, and similarly, the unique path connecting 𝑒 to 𝜌 in 𝑅 either enters𝑅𝐸 at a vertex of degree 2 or does not enter 𝑅𝐸 at all. We refer to these degree 2 vertices (when theyexist) as the hosts of 𝑒 in 𝐿𝐸 and 𝑅𝐸 . Host ℎ1 separates host ℎ2 from the root in 𝐿𝐸 (𝑅𝐸), if there is aroot-to-leaf path in 𝐿𝐸 (𝑅𝐸), in which root, ℎ1, ℎ2 is the order of these three vertices. A single vertexcan host several other matching edges not in 𝐸. Hosts in 𝐿𝐸 (respectively in 𝑅𝐸) are in a natural partialorder by separation from 𝑟 (respectively 𝜌).

Scars are markings on the edges of the induced subtanglegram, corresponding to the host verticesand following the natural partial order above, such that every scar marks the names of all edges hosted.Note that the partial order of the scars and the corresponding marks do not depend on the layout, theyonly depend on the tanglegram. Figure 6 illustrates a scar.

The following lemmata will be used to prove our main result:

Lemma 2. Let 𝐹 be a planar tanglegram with a distinguished marked non-matching edge. If in everyplanar layout of 𝐹 the marked edge does not lie on the boundary of the infinite face of the layout, then𝐹 has a set 𝐸 of three matching edges that induces the subtanglegram 𝑆 in Figure 7, where the markededge lies on one of the paths of 𝐹 corresponding to the bold edges.

Proof. Let the left and right tree of 𝐹 be 𝐿 and 𝑅 with roots 𝑟 and 𝜌, respectively, and let 𝜎 denote theset of matching edges. We denote the marked edge by 𝑚 and assume without loss of generality that 𝑚

is an edge of 𝑅 (the argument is the same otherwise with the roles of 𝐿 and 𝑅 exchanged). Consider

CZABARKA ET AL. 7

e1

e2

e3

f

e1

e2

e3sf

F I G U R E 6 The scar 𝑠𝑓 of edge 𝑓 in the example of Figure 5. Notice that the right tree does not have a scar for 𝑓

S

F I G U R E 7 The subtanglegram 𝑆

rre1e2 ρ∗

ρ

e3

e2

e1

r

re2e3

ρ∗

ρ

e3

e2

e1

F I G U R E 8 The possible subtanglegrams of 𝐹 induced by 𝑒1, 𝑒2, 𝑒3. The bold edge in the figure represents a pathin 𝐹 containing the edge 𝑚∗

the unique path 𝑃 in 𝑅 that starts from 𝜌 and whose last edge is 𝑚. Let 𝑚∗ be the edge of 𝑃 closest to𝜌 that does not lie on the boundary of any planar layout of 𝐹 (potentially 𝑚∗ = 𝑚).

Consider a planar layout of 𝐹 where one endpoint of 𝑚∗ (which we will denote by 𝜌∗) lies on theboundary of the infinite face; by the definition of 𝑚∗ such a layout exists. Without loss of generality itis the lower of the two 𝑟–𝜌 paths on the boundary. Let 𝐸∗ be the set of matching edges in the leavesof the subtree of 𝑅 rooted at 𝜌∗. By our assumptions 𝜌 ≠ 𝜌∗, |𝐸∗| ≥ 2, 𝐸∗ ≠ 𝜎, and all edges of 𝐸∗

lie below all edges of 𝜎 ⧵ 𝐸∗ in the layout. Let 𝑒1 ∈ 𝐸∗ and 𝑒3 ∈ 𝜎 ⧵ 𝐸∗ be the matching edges of 𝐹

that are on the boundary of the infinite face of the layout and let 𝑒2 ∈ 𝐸∗ be the edge that lies aboveall other edges of 𝐸∗ in the layout; so we have 𝜌 = 𝜌𝑒1𝑒3

, 𝑟 = 𝑟𝑒1𝑒3,and 𝜌∗ = 𝜌𝑒1𝑒2

(see Figure 8). Wemust have 𝑟 ∈ {𝑟𝑒1𝑒2

, 𝑟𝑒2𝑒3}. If 𝑟 = 𝑟𝑒2𝑒3

, then 𝑟𝑒1𝑒2lies on the unique 𝑟–𝑒1 path in 𝐿, and performing

a mirror operation on 𝑟𝑒1𝑒2and 𝜌∗ results in a planar layout of 𝐹 where 𝑚∗ lies on the boundary of the

infinite face, which is a contradiction. Therefore we must have 𝑟 = 𝑟𝑒1𝑒2, and 𝑟𝑒2𝑒3

lies on the unique𝑟–𝑒3 path in 𝐿.

If 𝑚 lies on the unique 𝜌∗–𝑒2 path in 𝑅 (including the case that 𝑚 = 𝑚∗), then the subtanglegraminduced by 𝑒1, 𝑒2, 𝑒3 satisfies the conclusion of our lemma and we are done. Otherwise let 𝑓 be anymatching edge such that 𝑚 lies on the unique 𝜌∗–𝑓 path in 𝑅. By our assumptions, 𝑓 ∉ {𝑒1, 𝑒2}, 𝑓 liesbetween 𝑒1 and 𝑒2 in our planar layout and 𝜌𝑓𝑒2

lies on the unique 𝜌∗–𝑒2 path in 𝑅 (Figure 9). We have𝑟 ∈ {𝑟𝑒1𝑓

, 𝑟𝑒2𝑓}. If 𝑟 = 𝑟𝑒1𝑓

, then the subtanglegram of 𝐹 induced by 𝑒1, 𝑒3, 𝑓 satisfies the conclusionof our lemma. If 𝑟 = 𝑟𝑒2𝑓

, then the subtanglegram induced by 𝑒1, 𝑒2, 𝑓 satisfies the conclusion. Eitherway, we are done. ■

8 CZABARKA ET AL.

rρ∗

ρ

f

e3

e2

e1

ρfe2

rρ∗

ρ

f

e3

e2

e1

ρfe2

when r = rfe1

rρ∗

ρ

f

e3

e2

e1

ρfe2

when r = rfe2

F I G U R E 9 The possible subtanglegrams of 𝐹 induced by 𝑒1, 𝑒2, 𝑒3, 𝑓 . The edge containing 𝑚 is bold

e1

f

e2

E2

E1

F I G U R E 1 0 Illustration of Lemma 3

Lemma 3. Let 𝐹 be a tanglegram with two sets of matching edges, 𝐸1, 𝐸2, such that 𝐸1 ∩ 𝐸2 = {𝑓}and 𝐸1 ∪ 𝐸2 contains all matching edges of 𝐹 . For 𝑖 ∈ {1, 2}, let 𝐹𝑖 be the subtanglegram induced by𝐸𝑖, and assume that the scars of the edges of 𝐸1 in 𝐹2 as well as the scars of the edges of 𝐸2 in 𝐹1are on a unique root-to-root path containing 𝑓 but no other matching edge. If 𝐹1 and 𝐹2 each haveplanar layouts in which the two matching edges on the boundary of the infinite face are 𝑓 and 𝑒1 andcorrespondingly 𝑓 and 𝑒2, then 𝐹 has a planar layout in which the matching edges on the boundaryof the infinite face are 𝑒1 and 𝑒2.

Proof. Let the left and right roots of 𝐹 be 𝑟 and 𝜌, respectively, and let 𝑃 be the unique 𝑟–𝜌 path in 𝐹

containing 𝑓 but no other matching edges, and let 𝑟1, 𝜌1 and 𝑟2, 𝜌2 be the left and right roots of 𝐹1 and𝐹2, respectively. From the assumptions on 𝑓 we get that 𝑟1, 𝑟2, 𝜌2, 𝜌2 lie on 𝑃 .

The conditions on 𝐹1 and 𝐹2 imply that 𝐹1 has a planar layout such that the 𝑟1–𝜌1 path 𝑃1 containing𝑓 lies on a straight line, all other edges of 𝐹1 lie above this line and 𝑒1 is on the boundary of the infiniteface; also, 𝐹2 has a planar layout such that the 𝑟2–𝜌2 path 𝑃2 containing 𝑓 lies on a straight line, allother edges of 𝐹2 lie below this line and 𝑒2 is on the boundary of the infinite face. Since the orderof vertices on 𝑃 is independent of the drawings, 𝑃1 and 𝑃2 can be obtained from subpaths of 𝑃 bysuppressing some vertices, so these two layouts can be merged into the required planar layout of 𝐹 ;see Figure 10 for an illustration of this lemma. ■

CZABARKA ET AL. 9

r ρ

Lu

Ld

Ru

RdL R

r ρ

Lu

Ld

Ru

Rd

e

h

g

fL R

r ρ

Lu

Ld

Ru

RdL R

A B C

F I G U R E 1 1 Case analysis on the qualitative distribution of matching edges between subtrees of 𝐿 and 𝑅. Dashedlines mark the existence of matching edges between the subtrees

5 CROSSING-CRITICAL TANGLEGRAMS

Another key concept in this article is that of a crossing-critical tanglegram. A tanglegram is crossing-critical if it is nonplanar, but every proper induced subtanglegram of it is planar. For example, tangle-grams No. 6 and No. 13 are crossing-critical. Clearly every nonplanar tanglegram contains a crossing-critical induced subtanglegram.

Theorem 4. The only crossing-critical tanglegrams are No. 6 and No. 13. Therefore, every nonplanartanglegram contains No. 6 or No. 13. as an induced subtanglegram.

Corollary 5. For every nonplanar tanglegram 𝑇 , the augmented graph 𝑇 ∗ contains a subdivision of𝐾3,3, where three of the original vertices of the 𝐾3,3 making a partite set are located in 𝐿, and theother three in 𝑅.

Proof of Theorem 4. Assume that 𝑇 is a crossing-critical tanglegram, with left subtree 𝐿 rooted at 𝑟

and right subtree 𝑅 rooted at 𝜌. Let 𝐿𝑢, 𝐿𝑑 be the rooted subtrees of 𝐿 rooted at the neighbors 𝑟𝑢 and 𝑟𝑑

of 𝑟 and 𝑅𝑢, 𝑅𝑑 be the rooted subtrees of 𝑅 rooted at the neighbors 𝜌𝑢 and 𝜌𝑑 of 𝜌. Since the leaves ofboth 𝐿𝑢 and 𝐿𝑑 are matched to the leaves of at least one of 𝑅𝑢 and 𝑅𝑑 , and vice versa, we may assumewithout loss of generality that there are matching edges between 𝐿𝑢 and 𝑅𝑢, and between 𝐿𝑑 and 𝑅𝑑 .(If this is not the case, we can achieve this situation using switch operations.)

Denote the nonempty set of matching edges between 𝐿𝑢 and 𝑅𝑢 by 𝐸𝑢, and between 𝐿𝑑 and 𝑅𝑑 by𝐸𝑑 , and let 𝐸𝑚 be the (potentially empty) set of matching edges not in 𝐸𝑢 ∪ 𝐸𝑑 .

Let 𝑇𝑢 and 𝑇𝑑 be the subtanglegrams of 𝑇 induced by the matching edges 𝐸𝑢 and 𝐸𝑑 , respectively.Since 𝑇 is crossing-critical, both 𝑇𝑢 and 𝑇𝑑 are planar tanglegrams.

If 𝐸𝑚 = ∅ (part (a) in Figure 11) then 𝑇 has a planar layout (just put the planar layouts of 𝑇𝑢 and 𝑇𝑑

above each other, and connect the vertex 𝑟 to the left roots of 𝑇𝑢 and 𝑇𝑑 , and 𝜌 to the right roots of 𝑇𝑢

and 𝑇𝑑), which is a contradiction. Therefore 𝐸𝑚 ≠ ∅.If 𝐸𝑚 contains a matching edge 𝑔 between 𝐿𝑢 and 𝑅𝑑 and a matching edge 𝑓 between 𝐿𝑑 and 𝑅𝑢

(part (b) in Figure 11), then let 𝑒 ∈ 𝐸𝑢 and ℎ ∈ 𝐸𝑑 . The subtanglegram induced by the edges 𝑒, 𝑓 , 𝑔, ℎ

in 𝑇 is No. 13, and we are done. So we are left to consider the case when only one of the pairs 𝐿𝑢, 𝑅𝑑

and 𝐿𝑑, 𝑅𝑢 has matching edges between them, in this case without loss of generality (using the mirrorimage operation at 𝑟 and 𝜌, if needed) 𝐸𝑚 is the nonempty set of matching edges between 𝐿𝑢 and 𝑅𝑑

(part (c) in Figure 11).We will consider two cases:

10 CZABARKA ET AL.

r ρrg,fu

ρg,fd

fu

fd

Pu

Pd

F I G U R E 1 2 Analysis of a planar drawing of the subtanglegram 𝑇 ′′. The white region between the dotted verticallines is where 𝑇 ′′ is drawn

Case (A): min(|𝐸𝑢|, |𝐸𝑑 |) ≥ 2. We are going to show that this does not happen in a crossing-criticaltanglegram 𝑇 .

Let 𝑒 ∈ 𝐸𝑑 and consider the subtanglegram 𝑇 ′ induced by all matching edges except 𝑒 with left tree𝐿′ and right tree 𝑅′. 𝑇 ′ is planar, contains 𝑇𝑢 as a subtanglegram, and contains the vertices 𝜌𝑢 and𝜌𝑑 . As the unique path from the root to a matching edge in 𝑅′ passes through 𝜌𝑢 for every matchingedge in 𝐸𝑢 and passes through 𝜌𝑑 for every matching edge in 𝐸𝑑 ∪ 𝐸𝑚, in every planar layout of 𝑇 ′ theedges of 𝐸𝑢 appear contiguously, and the edges of 𝐸𝑚 appear on only one side of them. Consequently,every planar layout of 𝑇 ′ gives a planar sublayout of 𝑇𝑢 where all scars from 𝐸𝑚 lie on the same root-to-root path bordering the infinite face; denote the matching edge which this path travels through by𝑓𝑢. Similar logic gives that 𝑇𝑑 has a planar layout in which all scars lie on the same root-to-root pathbordering the infinite face, denote the matching edge which this path travels through by 𝑓𝑑 . Let 𝑇 ′′ bethe tanglegram induced by 𝐸𝑚 ∪ {𝑓𝑢, 𝑓𝑑}. Consider a planar layout of 𝑇 ′′ (as 𝑇 is crossing-critical,such a layout exists), without loss of generality (up to a mirror operation) 𝑓𝑢 lies above 𝑓𝑑 in thislayout (see Figure 12). Let 𝑃𝑢 be the unique shortest path leading from 𝑟 to 𝑓𝑢 and 𝑃𝑑 be the uniqueshortest path leading from 𝜌 to 𝑓𝑑 , and let 𝑔 ∈ 𝐸𝑚 be arbitrary. Consider the vertical strip betweenthe two vertical lines going though 𝑟 and 𝜌 – this is the region where 𝑇 ′′ is drawn. The 𝑟–𝜌 pathscontaining 𝑓𝑢 and 𝑓𝑑 and no other matching edge cut this strip into three subregions, and 𝑔 must liein the unique subregion that borders both 𝑃𝑢 and 𝑃𝑑 . This means that 𝑔 lies between 𝑓𝑢 and 𝑓𝑑 in thisplanar layout of 𝑇 ′′. Consequently in every planar layout of 𝑇 ′′, 𝑓𝑢 and 𝑓𝑑 are on the boundary of theinfinite face. Two applications of Lemma 3 (first on 𝑇 ′′ and 𝑇𝑢 using the common edge 𝑓𝑢, then on theresulting tanglegram and 𝑇𝑑 using the common edge 𝑓𝑑) show that 𝑇 itself has a planar layout, whichis a contradiction.

Case (B): min(|𝐸𝑢|, |𝐸𝑑 |) = 1. We are going to exhibit No. 6 as an induced subtanglegram in 𝑇 .

Assume first that |𝐸𝑑 | = 1, and let 𝑒 be the single matching edge between 𝐿𝑑 and 𝑅𝑑 . This meansin particular that 𝐿𝑑 consists of a single leaf vertex 𝑟𝑑 that is matched by the edge 𝑒. Now, by thecrossing-criticality of 𝑇 , the subtanglegram induced by all matching edges but 𝑒, denoted by �̂� , isplanar, and a nonmatching edge of its right subtree, �̂�, is marked by the scar 𝑠𝑒 of 𝑒 in �̂� (this scarexists, as 𝑅𝑑 has leaves matched by 𝐸𝑚 and therefore 𝜌𝜎⧵{𝑒} = 𝜌). If �̂� has a planar layout in whichthe marked edge is on the boundary of the infinite face, then 𝑇 has a planar layout, contradicting thecrossing-criticality of 𝑇 . Therefore in all planar layouts of �̂� , the marked edge is not on the boundary

CZABARKA ET AL. 11

r ρabc

rabcm

a

b

c

e

F I G U R E 1 3 Finding subtanglegram No. 6 in 𝑇 starting from a copy of 𝑆 drawn in bold

of the infinite face. Lemma 2 shows that �̂� contains the subtanglegram 𝑆 with the marked edge 𝑚

positioned as in Figure 13, and, using the fact that 𝑟 is connected to one of the endpoints of 𝑒, we findthat the subtanglegram induced by the edges 𝑎, 𝑏, 𝑐, 𝑒 is No. 6, so we are done. If |𝐸𝑢| = 1, the argumentis essentially the same after exchanging the roles between 𝐿 and 𝑅 and between 𝐸𝑢 and 𝐸𝑑 . ■

It would be interesting to see if a more general theorem holds for tanglegrams exhibiting an evenhigher degree of nonplanarity:

Question 1. For an integer 𝑘 ≥ 3, is there a characterization of tanglegrams that have at least 𝑘 pairwisecrossing matching edges in every layout, in terms of a finite list of tanglegrams that they must have asinduced subtanglegrams, analogous to Theorem 4?

ORCIDLászló A. Székely http://orcid.org/0000-0002-9470-9111

REFERENCES[1] M. S. Bansal et al., Generalized binary tanglegrams: Algorithms and applications, Bioinformatics and Computa-

tional Biology, First International Conference, BICoB 2009, New Orleans, LA, USA, April 8–10, 2009, Proceed-ings, S. Rajasekaran (eds.), Lecture Notes in Bioinformatics vol. 5462, Springer-Verlag, Berlin, Heidelberg (2009),pp. 114–125.

[2] F. Baumann, C. Buchheim, and F. Liers, Exact bipartite crossing minimization under tree constraints, experimentalalgorithms, SEA 2010, (P. Festa, ed.), Lecture Notes in Computer Science vol. 6049, Springer, Berlin, Heidelberg,2010, pp. 118–128.

[3] S. C. Billey, M. Konvalinka, and F. A. Matsen, On the enumeration of tanglegrams and tangled chains, J. Combin.Theory Ser. A 146 (2017), 239–263.

[4] S. Böcker et al., A faster fixed-parameter approach to drawing binary tanglegrams, Parameterized and Exact Compu-tation 4th International Workshop, IWPEC 2009, Copenhagen, Denmark, September 10-11, 2009, Revised SelectedPapers, J. Chen and F. V. Fomin (eds.), Lecture Notes in Computer Science vol. 5917, Springer, Berlin, Heidelberg(2009), pp. 38–49.

[5] K. Buchin et al., Drawing (complete) binary tanglegrams, hardness, approximation, fixed-parameter tractability,Algorithmica 62 (2012), 309–332.

[6] A. Burt and R. Trivers, Genes in conflict, Belknap Harvard Press, Cambridge, MA, 2006.

[7] T. Calamoneri et al., Visualizing co-phylogenetic reconciliations, Graph Drawing and Network Visualization, 25thInternational Symposium, GD 2017, Boston, MA, USA, September 25–27, 2017, Revised Selected Papers, F. Fratiand K. L. Ma, (eds.), Lecture Notes in Computer Science vol. 10692, Springer-Verlag, Berlin, Heidelberg, 2018,pp. 334–347

[8] É. Czabarka, L. A. Székely, and S. Wagner, Inducibility in binary trees and crossings in tanglegrams, SIAM J.Discrete Math. 31 (2017), 1732–1750.

12 CZABARKA ET AL.

[9] R. Diestel, Graph theory (4th ed.), Graduate texts in mathematics 173, Springer-Verlag, Berlin, Heidelberg, 2010.

[10] I. Fáry, On straight line representations of graphs, Acta Univ. Szeged Sect. Sci. Math. 11 (1948), 229–233.

[11] H. Fernau, M. Kaufmann, and M. Poths, Comparing trees via crossing minimization, J. Comp. Syst. Sci. 76(7)(2010), 593–608.

[12] M. S. Hafner and S. A. Nadler, Phylogenetic trees support the coevolution of parasites and their hosts, Nature 332(1988), 258–259.

[13] M. R. Henzinger, V. King, and T. Warnow, Constructing a tree from homeomorphic subtrees, with applications tocomputational evolutionary biology, Algorithmica 24 (1999), 1–13.

[14] M. Konvalinka and S. Wagner, The shape of random tanglegrams, Adv. Appl. Math. 78 (2016), 76–93.

[15] K. Kuratowski, Sur le problème des courbes gauches en topologie, Fund. Math. 15 (1930), 271–283.

[16] F. A. Matsen et al., Tanglegrams: A reduction tool for mathematical phylogenetics, IEEE/ACM Trans. Comp. Biol.Bioinformatics 15 (2016), 343–349.

[17] M. Nöllenburg et al., Drawing binary tanglegrams: An experimental evaluation, 11th Workshop on AlgorithmEngineering and Experiments and 6th Workshop on Analytic Algorithmics and Combinatorics 2009 (ALENEX09/ANALCO 09), J. Hershberger and I. Finocchi, (eds.), SIAM, 2009, pp. 106–119.

[18] R. D. M. Page (eds.), Tangled trees. Phylogeny, Cospeciation and Coevolution, University of Chicago Press,Chicago, IL, 2002.

[19] D. Ralaivaosaona, J. B. Ravelomanana, and S. Wagner, Counting planar tanglegrams, to appear in: Proceedingsof the 29th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis ofAlgorithms (AofA 2018) Uppsala, Sweden.

[20] M. Schaefer, The graph crossing number and its variants: A survey, Electronic J. Combin. Dynamic Survey #DS21,(2013).

[21] M. Schaefer, Crossing numbers of graphs, CRC Press, Boca Raton, FL, 2018.

[22] C. Scornavacca, F. Zickmann, and D. H. Huson, Tanglegrams for rooted phylogenetic trees and networks, Bioin-formatics 27 (2011), 248–256.

[23] C. Semple, and M. A. Steel, Phylogenetics, Oxford University Press, Oxford, England, 2003.

[24] L. A. Székely, A successful concept for measuring non-planarity of graphs: the crossing number, Discrete Math.276 (1–3)(2004), 331–352.

[25] B. Venkatachalam et al., Untangling tanglegrams: Comparing trees by their drawings, IEEE/ACM Trans. Comp.Biol. Bioinformatics 7 (2010), 588–597.

[26] K. Wagner, Über eine Eigenschaft der ebenen Komplexe, Math. Ann. 114, 570–590.

How to cite this article: Czabarka Éva, Székely LA, Wagner S. A tanglegram Kuratowskitheorem. J Graph Theory. 2018;1–12. https://doi.org/10.1002/jgt.22370