
The Shannon Capacity of a Graph

Femke Bekius

July 22, 2011

Bachelor Thesis

Supervisor: prof. dr. Alexander Schrijver

KdV Instituut voor wiskunde

Faculteit der Natuurwetenschappen, Wiskunde en Informatica

Universiteit van Amsterdam


Abstract

This thesis focuses on the Shannon capacity of a graph. Suppose we want to send a message across a channel to a receiver. The general question concerns the effective size of an alphabet in such a model, that is, how much can be transmitted so that the receiver can recover the original message without errors. To answer this question Shannon defined the Shannon capacity of a graph, Θ(G), and stated Shannon's Theorem. We study an article of Lovasz [5] in which he determined the Shannon capacity of the cycle graph C5 and introduced the Lovasz Number, an upper bound for the Shannon capacity. We derive several formulas for the Lovasz Number and prove a number of theorems about it. In the last chapter we consider three problems Lovasz stated at the end of his article. The difficulty is that determining the Shannon capacity of a graph, even for very simple graphs, is very hard; for this reason determining Θ(C7) is still an open problem.

Data
Title: The Shannon Capacity of a Graph
Author: Femke Bekius, [email protected], 5823390
Supervisor: prof. dr. Alexander Schrijver
Second assessor: prof. dr. Monique Laurent
End date: July 22, 2011

Korteweg-de Vries Instituut voor Wiskunde
Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Contents

Introduction

1 Graph Theory, Linear Algebra and Shannon's Theorem
  1.1 Graph Theory
  1.2 Linear Algebra
  1.3 Shannon's Theorem
    1.3.1 Perfect Graphs
    1.3.2 Even and odd Cycle Graphs Cn

2 The capacity of the pentagon

3 Formulas for ϑ(G), an upper bound for Θ(G)

4 Further results on the Shannon capacity

5 Conclusion

Popular summary


Introduction

Suppose we want to send a message across a channel to a receiver. During the transmission it is possible that the message changes because of noise on the channel. An interesting question is: what is the maximum rate of transmission such that the receiver can recover the original message without errors? In 1956 Shannon asked whether it is possible to calculate the zero-error capacity of a certain communication model, so that he could say something about this maximum rate [1]. To answer this question he defined the Shannon capacity of a graph. It can be seen as an information-theoretical parameter which represents the effective size of an alphabet in a communication model represented by a graph G(V,E).

The Shannon capacity attracted interest in the field of Information Theory and in the scientific community because of its applications to communication problems. There are also connections with some central combinatorial and computational questions in graph theory, like computing the largest clique and finding the chromatic number of a graph [2]. Unfortunately, determining the Shannon capacity of an arbitrary graph is a very difficult problem. Even for the simple cycle graph C7, or more generally for the cycle graphs Cn with n odd and n ≥ 7, the Shannon capacity is still unknown.

In this thesis we study the article of Lovasz, published in 1979, in which he explained what is known about the Shannon capacity of graphs. He proved that the Shannon capacity of the cycle graph C5 equals √5 and introduced the Lovasz Number, about which he proved a number of formulas and theorems.

In the first chapter we give some important definitions and theorems which we will need for the rest of the thesis. We start with a short introduction to graph theory and then explain some concepts of linear algebra before we state Shannon's Theorem. Since it is not possible to determine the Shannon capacity of every graph exactly, Shannon's Theorem gives us an upper and a lower bound for the Shannon capacity. After that, using Shannon's Theorem, we determine the Shannon capacity of some simple cycle graphs. Related to this we say something about a special class of graphs, the so-called perfect graphs. In chapter 2 we use Lovasz's technique to determine the Shannon capacity of C5. For a long time this was an open problem, and therefore this is a very important result. Thereafter we proceed with the Lovasz Number, another upper bound for the Shannon capacity, about which we prove a number of formulas and properties. At the end we prove that the Lovasz Number is a smaller or equal upper bound for the Shannon capacity than the one Shannon found. In this chapter we use, among others, the techniques introduced in chapter 1. In the last chapter we consider the three problems Lovasz stated at the end of his article, for which Haemers found a counterexample.


Chapter 1

Graph Theory, Linear Algebra and Shannon's Theorem

In this chapter we give the definitions, theorems and lemmas needed for the proofs in the rest of the thesis. If there is no reference to an article or book, the definition, theorem, lemma, example or corollary is taken from Lovasz [5]. We first give some definitions from graph theory and describe some important concepts of linear algebra. In the last section we state Shannon's Theorem and define the perfect graphs, for which the Shannon capacity is easy to determine. At the end we consider the cycle graphs Cn.

1.1 Graph Theory

Definition 1.1. (Graph) An undirected graph is a pair G = (V,E), where V is a finite set and E ⊆ {{i, j} | i, j ∈ V, i ≠ j} is a family of unordered pairs from V. The elements of V are called the vertices and the elements of E are called the edges [6].

Let G be an undirected graph without loops whose vertices are the letters of an alphabet. Two vertices of G are called adjacent if they are either connected by an edge or equal. The letters on two vertices can be confused if the vertices are adjacent [5].

Definition 1.2. (Complementary graph) The complementary graph Ḡ of G has V(Ḡ) = V(G), and two vertices are connected in Ḡ if and only if they are not connected in G.

Definition 1.3. (Induced and Complete subgraph) Let G = (V,E) and H = (W,F) be graphs. Then H is a subgraph of G if W ⊆ V and F ⊆ E.
H is an induced subgraph of G if W ⊆ V and F = {e ∈ E | e ⊆ W}.
H is a complete subgraph of G if H is an induced subgraph of G with the additional property that every two vertices in H are connected by an edge. A complete subgraph is a clique, see Definition 1.20.

Definition 1.4. (Maximum independent set) An independent set in a graph is a set of pairwise nonadjacent vertices. A maximum independent set consists of the maximum number of pairwise nonadjacent vertices, and its size is denoted by α(G) [7].

The size α(G) of a maximum independent set is the maximum number of 1-letter messages which can be sent without danger of confusion [5]; in other words, the receiver always knows which message was sent.

Definition 1.5. (Strong product) The strong product G1 ⊠ G2 of two graphs G1(V1, E1) and G2(V2, E2) has vertex set V1 × V2 = {(u1, u2) : u1 ∈ V1, u2 ∈ V2}, where (u1, u2) ≠ (v1, v2) are adjacent if and only if, for i = 1, 2, either ui = vi or uivi ∈ Ei.
The k-fold strong product of a graph G, G^k = G ⊠ G ⊠ . . . ⊠ G, has vertex set V^k = {(u1, . . . , uk) : ui ∈ V}, where (u1, . . . , uk) ≠ (v1, . . . , vk) are adjacent if and only if ui = vi or uivi ∈ E for all i [1].

It follows that α(G^k) is the maximum number of k-letter messages which can be sent without danger of confusion; there are at least α(G)^k such words. Two k-letter messages are confoundable if for all 1 ≤ i ≤ k their i-th letters are adjacent.

Theorem 1.6. α(G^k) ≥ α(G)^k.

Proof. Let U ⊆ V(G) be a maximum independent set of vertices in G, so |U| = α(G). The α(G)^k vertices of G^k of the form (u1, . . . , uk) with ui ∈ U for all i clearly form an independent set in G^k. Indeed, if (u1, . . . , uk) and (u'1, . . . , u'k) are distinct elements of U × . . . × U, then there is an index i with ui ≠ u'i; since ui, u'i ∈ U they are not adjacent, and hence (u1, . . . , uk) and (u'1, . . . , u'k) are not adjacent. Therefore α(G^k) ≥ α(G)^k.

Example 1.7. For the cycle graph C5, also called the pentagon, α(C5^2) = 5. In fact, if v1, . . . , v5 are the vertices in cyclic order, then the 2-letter messages v1v1, v2v3, v3v5, v4v2 and v5v4 are pairwise nonconfoundable. In Figure 1.1 the red vertices form a maximum independent set of C5^2.


Figure 1.1: Strong product C5 ⊠ C5
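As a concrete check of the adjacency rule of Definition 1.5, the following minimal sketch (an illustration added here, using only the Python standard library; the labels 0, . . . , 4 stand for v1, . . . , v5) verifies that the five 2-letter words of Example 1.7 are pairwise nonconfoundable in C5 ⊠ C5.

```python
from itertools import combinations

def adjacent_or_equal(i, j):
    # letters of C5 can be confused iff they are equal or neighbours on the cycle
    return i == j or (i - j) % 5 in (1, 4)

def confoundable(p, q):
    # Definition 1.5: distinct 2-letter words are adjacent in C5 x C5 iff both
    # coordinates are pairwise equal or adjacent
    return p != q and all(adjacent_or_equal(a, b) for a, b in zip(p, q))

code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]   # v1v1, v2v3, v3v5, v4v2, v5v4
assert not any(confoundable(p, q) for p, q in combinations(code, 2))
print("five pairwise non-confoundable 2-letter words, so alpha(C5^2) >= 5")
```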

1.2 Linear Algebra

In this thesis all vectors are column vectors, I denotes the identity matrix, J denotes the square matrix with all entries equal to one, and j is the vector with all entries equal to one. The inner product of vectors v, w is denoted by v^T w.

Definition 1.8. (Tensor product) If v = (v1, . . . , vn) and w = (w1, . . . , wm), then v ◦ w is the vector (v1w1, . . . , v1wm, v2w1, . . . , vnwm)^T of length nm.

It follows from the computation below that the inner product and the tensor product are connected by

(x ◦ y)^T (v ◦ w) = (x^T v)(y^T w).   (1.1)

Indeed, let x = (x_i)_{i=1}^n, y = (y_j)_{j=1}^m, v = (v_i)_{i=1}^n, w = (w_j)_{j=1}^m. Then x ◦ y = (x_i y_j)_{i,j}, v ◦ w = (v_i w_j)_{i,j} and

(x ◦ y)^T (v ◦ w) = Σ_{i=1}^n Σ_{j=1}^m x_i y_j v_i w_j = (Σ_{i=1}^n x_i v_i)(Σ_{j=1}^m y_j w_j) = (x^T v)(y^T w).
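As a quick numerical illustration of identity (1.1), here is a small sketch (assuming numpy is available; np.kron on one-dimensional arrays produces exactly the ordering of Definition 1.8) that checks the identity on random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
x, v = rng.standard_normal(4), rng.standard_normal(4)
y, w = rng.standard_normal(3), rng.standard_normal(3)

lhs = np.kron(x, y) @ np.kron(v, w)   # (x o y)^T (v o w)
rhs = (x @ v) * (y @ w)               # (x^T v)(y^T w)
assert np.isclose(lhs, rhs)
```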

Definition 1.9. (Orthonormal representation) An orthonormal representation of G is a collection of unit vectors (u1, . . . , un) in a Euclidean space such that if i and j are nonadjacent vertices, then u_i and u_j are orthogonal. Every graph has an orthonormal representation; for instance, one can take n pairwise orthogonal unit vectors.

Lemma 1.10. Let (u1, . . . , un) and (v1, . . . , vm) be orthonormal representations of G and H respectively. Then the vectors (u_i ◦ v_j | i = 1, . . . , n, j = 1, . . . , m) form an orthonormal representation of G ⊠ H.


Proof. Let (i1, i2), (j1, j2) ∈ V(G) × V(H) be distinct and nonadjacent in G ⊠ H; that is, i1, j1 are distinct and nonadjacent in G, or i2, j2 are distinct and nonadjacent in H. Then u_{i1}^T u_{j1} = 0 or v_{i2}^T v_{j2} = 0, so by (1.1), (u_{i1} ◦ v_{i2})^T (u_{j1} ◦ v_{j2}) = 0. Moreover each u_i ◦ v_j is a unit vector, since by (1.1), (u_i ◦ v_j)^T (u_i ◦ v_j) = (u_i^T u_i)(v_j^T v_j) = 1. Hence (u_i ◦ v_j) is an orthonormal representation of G ⊠ H.

The value of an orthonormal representation (u1, . . . , un) is defined to be

min_c max_{1≤i≤n} 1/(c^T u_i)^2,

where c ranges over all unit vectors; a vector c attaining the minimum is called the handle of the representation [5].

Definition 1.11. (Lovasz Number) ϑ(G) is the minimum value over all orthonormal representations of G, i.e.,

ϑ(G) = min_{u1,...,un} min_c max_{1≤i≤n} 1/(c^T u_i)^2.

A representation is optimal if it achieves this minimum value.

Definition 1.12. (Positive semidefinite matrix) A matrix M is positive semidefinite (p.s.d.) if there is a matrix X such that M = X^T X or, equivalently, if M is symmetric and all its eigenvalues are non-negative.

Another characterization of ϑ(G), which we will prove in Theorem 3.4, is

ϑ(G) = max{ Σ_{i,j=1}^n x_ij | X = (x_ij)_{i,j=1}^n p.s.d., Tr(X) = 1, x_ij = 0 if {i, j} ∈ E(G) }.
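This maximization is a semidefinite program, so ϑ(G) can be computed numerically. The following is a minimal sketch of one possible formulation, assuming the cvxpy modelling library and a semidefinite solver are available (an assumption of this example, not something used in the thesis); for C5 it returns approximately √5 ≈ 2.236.

```python
import cvxpy as cp

def lovasz_theta(n, edges):
    # maximize the sum of all entries of a p.s.d. matrix X with trace 1
    # and zeros on the edges of G
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.trace(X) == 1]
    constraints += [X[i, j] == 0 for (i, j) in edges]
    return cp.Problem(cp.Maximize(cp.sum(X)), constraints).solve()

c5_edges = [(i, (i + 1) % 5) for i in range(5)]
print(lovasz_theta(5, c5_edges))   # ~2.236, i.e. sqrt(5)
```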

Lemma 1.13. ϑ(G ⊠ H) ≤ ϑ(G)ϑ(H).

Proof. Let (u1, . . . , un) and (v1, . . . , vm) be optimal orthonormal representations of G and H, with handles c and d respectively. Then c ◦ d is a unit vector by (1.1), and (u_i ◦ v_j) is an orthonormal representation of G ⊠ H by Lemma 1.10. Hence,

ϑ(G ⊠ H) ≤ max_{i,j} 1/((c ◦ d)^T (u_i ◦ v_j))^2 = max_{i,j} 1/(c^T u_i)^2 · 1/(d^T v_j)^2 = ϑ(G)ϑ(H).

Remark. In chapter 3 we will prove that equality holds in Lemma 1.13.


Lemma 1.14. α(G) ≤ ϑ(G)

Proof. Let (u1, . . . , un) be an optimal orthonormal representation of G with handle c, and let {1, . . . , k} be a maximum independent set in G. Then u1, . . . , uk are pairwise orthogonal unit vectors. Extend them to an orthonormal basis (u'1, . . . , u'd) of the ambient space, with u'_i = u_i for i = 1, . . . , k. Since the representation is optimal, (c^T u_i)^2 ≥ 1/ϑ(G) for every i, and therefore

1 = c^T c = Σ_{i=1}^d (c^T u'_i)^2 ≥ Σ_{i=1}^k (c^T u_i)^2 ≥ k/ϑ(G) = α(G)/ϑ(G).

Hence α(G) ≤ ϑ(G).

Definition 1.15. (Adjacency matrix) The adjacency matrix of a graph G with vertex set {1, . . . , k} is the k × k matrix whose entry (i, j) equals one if i is adjacent to j and equals 0 when i = j or i is nonadjacent to j. We write A_{u,v} := the number of edges connecting u and v, for u, v ∈ V(G) [6]. The adjacency matrix is symmetric if the graph is undirected.

In this thesis we consider undirected graphs G; therefore the adjacency matrix of G is always symmetric in the theorems and lemmas that follow.

Remark. Lovasz used another definition of adjacency: he also called a vertex adjacent to itself (the case i = j). In the proofs below we use Lovasz's definition, but when we consider an adjacency matrix we use the definition above. In other cases it will be clear from the context what is meant.

Example 1.16. The adjacency matrix of the 5-cycle C5 is

A =
( 0 1 0 0 1 )
( 1 0 1 0 0 )
( 0 1 0 1 0 )
( 0 0 1 0 1 )
( 1 0 0 1 0 )

Lemma 1.17. Let X and Y be matrices; then Tr(XY) = Tr(YX).

Proof.

Tr(XY) = Σ_{i=1}^n (XY)_{ii} = Σ_{i=1}^n Σ_{j=1}^n X_{ij} Y_{ji} = Σ_{j=1}^n Σ_{i=1}^n Y_{ji} X_{ij} = Σ_{j=1}^n (YX)_{jj} = Tr(YX).


1.3 Shannon’s Theorem

In 1956 Shannon introduced the Shannon capacity of a graph [5].

Definition 1.18. (Shannon capacity) The Shannon capacity of a graph is defined by

Θ(G) = sup_k α(G^k)^{1/k}.

Using Fekete's lemma we prove that Θ(G) = sup_k α(G^k)^{1/k} = lim_{k→∞} α(G^k)^{1/k}.

Lemma 1.19. [6] Let a1, a2, . . . be a sequence of reals such that a_{n+m} ≥ a_n + a_m for all positive integers n, m. Then

lim_{n→∞} a_n/n = sup_n a_n/n.

Proof. For all i, j, k ≥ 1 we have a_{jk+i} ≥ a_{jk} + a_i ≥ a_k + a_k + . . . + a_k (j times) + a_i = j a_k + a_i. So, for fixed i, k ≥ 1,

liminf_{j→∞} a_{jk+i}/(jk+i) ≥ liminf_{j→∞} (j a_k + a_i)/(jk+i) = liminf_{j→∞} ( (a_k/k)·(jk/(jk+i)) + a_i/(jk+i) ) = a_k/k.

Since every n > k can be written as n = jk + i with j ≥ 1 and 1 ≤ i ≤ k, this gives, for fixed k ≥ 1,

liminf_{n→∞} a_n/n = min_{i=1,...,k} liminf_{j→∞} a_{jk+i}/(jk+i) ≥ a_k/k.

Therefore liminf_{n→∞} a_n/n ≥ sup_k a_k/k. Since trivially a_n/n ≤ sup_k a_k/k for every n, it follows that the limit exists and

lim_{n→∞} a_n/n = sup_n a_n/n.


Using the multiplicative version, a_{n+m} ≥ a_n a_m for all positive integers n, m with a_n > 0 for all n, we can take logarithms:

log a_{n+m} ≥ log(a_n a_m) = log a_n + log a_m.

Writing b_n = log a_n we have b_{n+m} ≥ b_n + b_m, so by Lemma 1.19

lim_{k→∞} a_k^{1/k} = sup_k a_k^{1/k},

and since α(G^{p+l}) ≥ α(G^p) α(G^l),

lim_{k→∞} α(G^k)^{1/k} = sup_k α(G^k)^{1/k},

as in Shannon's definition.

From the previous definition and Theorem 1.6 we see that

Θ(G) = sup_k α(G^k)^{1/k} ≥ sup_k (α(G)^k)^{1/k} = sup_k α(G) = α(G).

Hence Θ(G) ≥ α(G), and we conclude that α(G) is a lower bound for the Shannon capacity of G.

Even for simple graphs it is difficult to determine the Shannon capacity. Shannon proved that Θ(G) = α(G) for graphs which can be covered by α(G) cliques. Examples of such graphs are the perfect graphs [5]; in section 1.3.1 we give an explanation.

Definition 1.20. (Clique) A clique in a graph G is a set of pairwise adjacent vertices [7].

Definition 1.21. (Fractional vertex packing) A fractional vertex packing of G is a function w : V(G) → R+ such that for every clique C,

Σ_{x∈C} w(x) ≤ 1.

A general upper bound on Θ(G) was given by Shannon. This upper bound is denoted by α*(G): the maximum of Σ_{x∈V} w(x) taken over all fractional vertex packings w [5].

The Duality Theorem of Linear Programming states, for A ∈ R^{m×n}, b ∈ R^m and c ∈ R^n,

max{c^T x | x ≥ 0, Ax ≤ b} = min{y^T b | y ≥ 0, y^T A ≥ c^T},


provided that there exist x ≥ 0 with Ax ≤ b and y ≥ 0 with y^T A ≥ c^T [6].

Consider the matrix A whose rows are indexed by the cliques of G and whose columns are indexed by the vertices, with (C, v)-entry equal to 1 if v ∈ C and 0 if v ∉ C, and take c the all-one vector indexed by V(G) and b the all-one vector indexed by P(G); here P(G) is the collection of cliques of G, C ∈ P(G) and v ∈ V(G).

With this theorem α*(G) can be defined dually as follows: assign nonnegative weights q(C) to the cliques C of G such that Σ_{C∋x} q(C) ≥ 1 for each vertex x of G, and minimize Σ_C q(C).

With this notation, Shannon's Theorem states

α(G) ≤ Θ(G) ≤ α*(G).
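Since α*(G) is the optimum of a linear program, it can be computed with any LP solver. The sketch below (an illustrative example assuming scipy is available) does this for C5, whose maximal cliques are its five edges, and returns α*(C5) = 5/2.

```python
import numpy as np
from scipy.optimize import linprog

# maximise the sum of vertex weights subject to w_i + w_{i+1} <= 1
# (one constraint per edge of C5, its maximal cliques) and w >= 0
n = 5
A_ub = np.zeros((n, n))
for i in range(n):
    A_ub[i, i] = A_ub[i, (i + 1) % n] = 1
res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=np.ones(n), bounds=[(0, None)] * n)
print(-res.fun)   # alpha*(C5) = 2.5
```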

1.3.1 Perfect Graphs

As was said before, it is difficult to determine the Shannon capacity of a graph, even for very simple graphs. In the next chapter we will see how Lovasz proved that the Shannon capacity of the cycle graph C5 equals √5. However, there is a class of graphs for which the Shannon capacity is easy to determine, the so-called perfect graphs.

Definition 1.22. (Vertex-coloring) A vertex-coloring, or coloring, is a partition of V into independent sets. Each of these independent sets is called a color of the coloring. The vertex-coloring number χ(G) is the minimum number of colors in a vertex-coloring [6].

Definition 1.23. (Perfect Graph) A graph G is perfect if χ(H) = ω(H) for every induced subgraph H of G, with χ(H) the minimum number of colors in a coloring of H and ω(H) the maximum size of a clique in H [7].

For perfect graphs we have α(G) = α*(G), and by Shannon's Theorem we conclude that α(G) = Θ(G). To prove this we need another theorem.

Theorem 1.24. G is perfect ⇐⇒ Ḡ is perfect.

Proof. See [4] for a proof.

Theorem 1.25. If G is perfect, then α(G) = α∗(G).


Proof. If G is perfect, then Ḡ is perfect. Then ω(Ḡ) = χ(Ḡ) by definition, and since ω(Ḡ) = α(G) it follows that α(G) = χ(Ḡ).
In general α(G) ≤ α*(G) ≤ χ(Ḡ), since, with A the clique-vertex incidence matrix and b, c all-one vectors as above,

α(G) = max{ |S| : S ⊆ V(G), S an independent set in G }
     = max{ c^T x | x a (0,1)-vector in R^{V(G)}, Ax ≤ b }
     ≤ max{ c^T x | x ≥ 0, Ax ≤ b }
     = min{ y^T b | y ≥ 0, y^T A ≥ c^T }
     ≤ min{ y^T b | y a (0,1)-vector indexed by P(G), y^T A ≥ c^T }
     = χ(Ḡ).

Hence, since α(G) = χ(Ḡ), also α(G) = α*(G).

We give an example of a perfect graph for which we can determine the Shannon capacity.

Example 1.26. Let G = C4. Then G is perfect because χ(G) = ω(G) = 2 and χ(H) = ω(H) for every induced subgraph H of G. Also α(G) = 2 and hence Θ(G) = 2.

Figure 1.2: Cycle graph C4

1.3.2 Even and odd Cycle Graphs Cn

From the example in the previous section the question arises what is known about the Shannon capacity of the cycles Cn. We will see that it depends on whether n is even or odd. First we give α(Cn) and α*(Cn) for the simplest cycles.

• C3: α(C3) = 1, α*(C3) = 1 ⇒ Θ(C3) = 1

• C4: α(C4) = 2, α*(C4) = 2 ⇒ Θ(C4) = 2

• C5: α(C5) = 2, α*(C5) = 5/2 ⇒ 2 ≤ Θ(C5) ≤ 5/2

• C6: α(C6) = 3, α*(C6) = 3 ⇒ Θ(C6) = 3

• C7: α(C7) = 3, α*(C7) = 7/2 ⇒ 3 ≤ Θ(C7) ≤ 7/2

• C8: α(C8) = 4, α*(C8) = 4 ⇒ Θ(C8) = 4

• C9: α(C9) = 4, α*(C9) = 9/2 ⇒ 4 ≤ Θ(C9) ≤ 9/2

Figure 1.3: Cycle graphs C3 to C9

We see that if n is even, then α(Cn) = α*(Cn) = Θ(Cn). This follows from the fact that Cn is a perfect graph for even n; alternatively, one can check directly that the vertices of Cn can be covered by n/2 cliques (edges), so that α(Cn) = α*(Cn) = n/2. If n is odd we see Θ(C3) = 1, but for n > 3 we can only give a lower and an upper bound for Θ(Cn), namely (n−1)/2 ≤ Θ(Cn) ≤ n/2.
We should remark here that in general the exact determination of maximum independent sets in C_n^d seems to be a very hard task [2].
In the next chapter we will see how Lovasz proved that the Shannon capacity of C5 equals √5, but for C7, and in general for Cn with n odd and n > 5, we cannot give the exact value of the Shannon capacity.


Chapter 2

The capacity of the pentagon

In this chapter we explain how to determine the Shannon capacity of the cycle graph C5, also known as the pentagon [5]. From Shannon's Theorem we know α(G) ≤ Θ(G) ≤ α*(G), and together with Θ(C5) = sup_k α(C5^k)^{1/k} this gives

√5 ≤ Θ(C5) ≤ 5/2.

Indeed, Θ(C5) = sup_k α(C5^k)^{1/k} = sup{2, √5, . . .}, so Θ(C5) ≥ √5.
Also, every clique C must satisfy Σ_{x∈C} w(x) ≤ 1. The maximal cliques of C5 are its edges, and since we want to maximize the sum of the weights of the vertices we give every vertex the maximum possible weight 1/2, so α*(C5) = 5 · 1/2 = 5/2 and hence Θ(C5) ≤ 5/2.

Figure 2.1: C5

Theorem 2.1. Θ(C5) = √5.

In the next proof we use the umbrella technique introduced by Lovasz, see also Figure 2.2 [5].


Proof. Consider an umbrella whose handle and five ribs have unit length. Open the umbrella to the point where the maximum angle between the ribs is π/2. Let u1, u2, u3, u4, u5 be the ribs and c the handle, as vectors oriented away from their common point. Then u1, . . . , u5 is an orthonormal representation of C5, and by the Spherical Cosine Theorem one computes c^T u_i = 5^{-1/4}.

To show: Θ(C5) ≤ √5.
Let S be a maximum independent set in C5^k, so S ⊆ V(C5^k) = {(i1, . . . , ik) | i1, . . . , ik ∈ V(C5)}. The set X := {u_{i1} ◦ . . . ◦ u_{ik} | (i1, . . . , ik) ∈ S} is orthonormal, because u_i and u_j are orthogonal whenever i, j are nonadjacent. Since c is a unit vector, Bessel's inequality gives

1 = 〈c ◦ . . . ◦ c, c ◦ . . . ◦ c〉 ≥ Σ_{x∈X} 〈c ◦ . . . ◦ c, x〉^2
  = Σ_{(i1,...,ik)∈S} 〈c ◦ . . . ◦ c, u_{i1} ◦ . . . ◦ u_{ik}〉^2
  = Σ_{(i1,...,ik)∈S} Π_{j=1}^k 〈c, u_{ij}〉^2 = Σ_{x∈X} 1/(√5)^k = |S|/(√5)^k,

where c ◦ . . . ◦ c has k factors. Hence |S| ≤ (√5)^k. Therefore,

Θ(C5) = sup_k |S|^{1/k} ≤ sup_k ((√5)^k)^{1/k} = √5.

Figure 2.2: The Lovasz umbrella
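The umbrella can also be written down explicitly. The sketch below (assuming numpy; the concrete coordinates are one possible choice of the construction, not taken from the thesis) places the handle along the z-axis and the five ribs symmetrically around it, and checks that the ribs form an orthonormal representation of C5 with c^T u_i = 5^{-1/4}.

```python
import numpy as np

t2 = np.cos(np.pi / 5) / (1 + np.cos(np.pi / 5))    # squared height of a rib tip
s = np.sqrt(1 - t2)
c = np.array([0.0, 0.0, 1.0])                        # the handle
ribs = [np.array([s * np.cos(2 * np.pi * i / 5),
                  s * np.sin(2 * np.pi * i / 5),
                  np.sqrt(t2)]) for i in range(5)]

for u in ribs:
    assert np.isclose(u @ u, 1.0)                    # unit length
    assert np.isclose(c @ u, 5 ** -0.25)             # c^T u_i = 5^(-1/4)
for i in range(5):
    # vertices i and i+2 are nonadjacent in C5, so their ribs are orthogonal
    assert np.isclose(ribs[i] @ ribs[(i + 2) % 5], 0.0)

print(1 / (c @ ribs[0]) ** 2)                        # sqrt(5)
```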


Chapter 3

Formulas for ϑ(G), an upper bound for Θ(G)

In this chapter we investigate the upper bound ϑ(G) in more detail and give several proofs of statements by Lovasz. We first prove that ϑ(G) is an upper bound for Θ(G). Then we prove a theorem about positive semidefinite matrices, so that we can relate ϑ(G) to eigenvalues of symmetric matrices. Next we prove a theorem about the value of ϑ(G). After introducing orthonormal representations of the complementary graph we draw some consequences. Then we give another way of representing the value of ϑ(G) and we prove equality in Lemma 1.13. We give some properties of ϑ(G) if G is vertex- or edge-transitive or if G is regular, and we end by proving that ϑ(G) is a smaller or equal upper bound for Θ(G) than the one Shannon found.

Theorem 3.1. Θ(G) ≤ ϑ(G)

Proof. By Lemmas 1.13 and 1.14 we have

Θ(G) = sup_k α(G^k)^{1/k} ≤ sup_k ϑ(G^k)^{1/k} ≤ sup_k (ϑ(G)^k)^{1/k} = ϑ(G).

Theorem 3.2. Let A be a symmetric matrix. Then λI − A is positive semidefinite if and only if the largest eigenvalue of A is at most λ.

Proof. Let λ1, . . . , λn be the eigenvalues of A. Then λ − λ1, . . . , λ − λn are the eigenvalues of λI − A. Therefore λI − A is positive semidefinite ⇐⇒ λ − λ1 ≥ 0, . . . , λ − λn ≥ 0 ⇐⇒ λ1 ≤ λ, . . . , λn ≤ λ ⇐⇒ the largest eigenvalue of A is at most λ.


Theorem 3.3. Let G be a graph on vertices {1, . . . , n}. Then ϑ(G) is the minimum of the largest eigenvalue over all symmetric matrices A = (a_ij)_{i,j=1}^n such that

a_ij = 1 if i = j or if i and j are nonadjacent.   (3.1)

Proof. Let U = (u1, . . . , un) be an optimal orthonormal representation of G with handle c. Define

a_ij = 1 − (u_i^T u_j)/((c^T u_i)(c^T u_j)) for i ≠ j, and a_ii = 1.

Then (3.1) is satisfied: for nonadjacent i ≠ j we have u_i^T u_j = 0, so a_ij = 1. Moreover, for i ≠ j,

−a_ij = ( c − u_i/(c^T u_i) )^T ( c − u_j/(c^T u_j) ),   (3.2)

because

( c − u_i/(c^T u_i) )^T ( c − u_j/(c^T u_j) ) = c^T c − (c^T u_j)/(c^T u_j) − (u_i^T c)/(c^T u_i) + (u_i^T u_j)/((c^T u_i)(c^T u_j)) = −1 + (u_i^T u_j)/((c^T u_i)(c^T u_j)) = −a_ij.

And

ϑ(G) − a_ii = ( c − u_i/(c^T u_i) )^2 + ( ϑ(G) − 1/(c^T u_i)^2 ),   (3.3)

because

( c − u_i/(c^T u_i) )^2 + ϑ(G) − 1/(c^T u_i)^2 = c^T c − 2(c^T u_i)/(c^T u_i) + (u_i^T u_i)/(c^T u_i)^2 + ϑ(G) − 1/(c^T u_i)^2 = 1 − 2 + ϑ(G) = ϑ(G) − 1 = ϑ(G) − a_ii.

Let D be the matrix with columns c − u_1/(c^T u_1), . . . , c − u_n/(c^T u_n) and let B = D^T D, so that B_ij equals the inner product appearing in (3.2). By the definition of ϑ(G) we have ϑ(G) ≥ 1/(c^T u_i)^2 for every i, so we can define w_i := ϑ(G) − 1/(c^T u_i)^2 ≥ 0; let Δ_w be the diagonal matrix with w_1, . . . , w_n on the diagonal. Then by (3.2) and (3.3),

ϑ(G)I − A = B + Δ_w,

which is positive semidefinite, because B = D^T D is symmetric with non-negative eigenvalues and Δ_w is symmetric with non-negative eigenvalues. By Theorem 3.2 the largest eigenvalue of A is at most ϑ(G).

On the other hand, let A = (a_ij) be any symmetric matrix satisfying (3.1) and let λ be its largest eigenvalue (note λ ≥ a_ii = 1). By Theorem 3.2, λI − A is positive semidefinite, hence there exist vectors x_1, . . . , x_n such that

(λI − A)_ij = λδ_ij − a_ij = x_i^T x_j,  where δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.   (3.4)

Let c be a unit vector perpendicular to x_1, . . . , x_n and set

u_i = (1/√λ)(c + x_i).

Then u_i^2 = (1/λ)(c^2 + 2c^T x_i + x_i^2) = (1/λ)(1 + x_i^2), and by (3.4) and (3.1), x_i^2 = λ − a_ii = λ − 1, so u_i^2 = 1: the u_i are unit vectors. For nonadjacent i ≠ j we have a_ij = 1 by (3.1), so x_i^T x_j = −1 by (3.4), and

u_i^T u_j = (1/λ)(c + x_i)^T (c + x_j) = (1/λ)(c^T c + c^T x_i + c^T x_j + x_i^T x_j) = (1/λ)(1 + x_i^T x_j) = 0.

Therefore (u_1, . . . , u_n) is an orthonormal representation of G. Moreover, c^T u_i = (1/√λ)(c^T c + c^T x_i) = 1/√λ, so

λ = 1/(c^T u_i)^2 for every i,

and therefore ϑ(G) ≤ λ by the definition of ϑ(G).

A new characterization of the Lovasz Number thus obtained is:

ϑ(G) = min{ λ | λ is the largest eigenvalue of a symmetric matrix A satisfying (3.1) }.

Remark. The above proof shows that among the optimal orthonormal representations there is one such that

ϑ(G) = 1/(c^T u_1)^2 = . . . = 1/(c^T u_n)^2.   (3.5)
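This eigenvalue characterization can also be evaluated numerically. The following sketch (again assuming cvxpy and a semidefinite solver, as an illustration only) minimizes the largest eigenvalue over all symmetric matrices satisfying (3.1); for C5 it returns approximately √5.

```python
import cvxpy as cp

n = 5
edges = {(i, (i + 1) % n) for i in range(n)}
A = cp.Variable((n, n), symmetric=True)
# condition (3.1): entry 1 on the diagonal and on all nonadjacent pairs;
# entries on edges are left free
constraints = [A[i, j] == 1
               for i in range(n) for j in range(i, n)
               if i == j or ((i, j) not in edges and (j, i) not in edges)]
prob = cp.Problem(cp.Minimize(cp.lambda_max(A)), constraints)
print(prob.solve())   # ~2.236 = sqrt(5) = theta(C5)
```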


The next theorem gives a good characterization of the value of ϑ(G).

Theorem 3.4. Let G be a graph on vertices {1, . . . , n} and let B = (b_ij)_{i,j=1}^n range over all positive semidefinite symmetric matrices such that

b_ij = 0   (3.6)

for every pair (i, j) of distinct adjacent vertices, and

Tr(B) = 1.   (3.7)

Then ϑ(G) = max_B Tr(BJ).

Remark. Tr(BJ) is the sum of all entries of matrix B.

Proof. First we prove ϑ(G) ≥ max_B Tr(BJ).
Let A = (a_ij)_{i,j=1}^n be the matrix of Theorem 3.3 with largest eigenvalue ϑ(G), and let B be any matrix as described above. Then

Tr(BJ) = Σ_{i=1}^n Σ_{j=1}^n b_ij = Σ_{i=1}^n Σ_{j=1}^n a_ij b_ij = Tr(AB),

because a_ij b_ij = b_ij if i = j or i, j nonadjacent, and a_ij b_ij = 0 if i, j are adjacent (then b_ij = 0). So

ϑ(G) − Tr(BJ) = ϑ(G) − Tr(AB) = ϑ(G)Tr(B) − Tr(AB) = Tr(ϑ(G)IB − AB) = Tr((ϑ(G)I − A)B).

Here both ϑ(G)I − A (by Theorem 3.2) and B are positive semidefinite. Let e_1, . . . , e_n be mutually orthogonal unit eigenvectors of B with corresponding eigenvalues λ_1, . . . , λ_n ≥ 0. Then

Tr((ϑ(G)I − A)B) = Σ_{i=1}^n e_i^T (ϑ(G)I − A)B e_i   (3.8)
                 = Σ_{i=1}^n λ_i e_i^T (ϑ(G)I − A) e_i ≥ 0.   (3.9)

Namely, let U = [e_1, . . . , e_n]; then U^T U = U U^T = I, and for any matrix C,

Tr(C) = Tr(CI) = Tr(C U U^T) = Tr(U^T C U) = Σ_{i=1}^n (U^T C U)_{ii} = Σ_{i=1}^n e_i^T C e_i.

Taking C = (ϑ(G)I − A)B gives (3.8), and (3.9) follows because Be_i = λ_i e_i. The final inequality holds because if a matrix C is positive semidefinite, say C = M^T M, then x^T C x = x^T M^T M x = ‖Mx‖^2 ≥ 0 for all x; apply this to ϑ(G)I − A with x = e_i. By the derivation above ϑ(G) − Tr(BJ) ≥ 0, and so ϑ(G) ≥ max_B Tr(BJ).

Second, we prove ϑ(G) ≤ max_B Tr(BJ) by constructing a matrix B for which the inequality above becomes an equality, i.e. Tr((ϑ(G)I − A)B) = 0.
Let (i_1, j_1), . . . , (i_m, j_m) (with i_k < j_k) be the edges of G. Consider the (m+1)-dimensional vectors

ĥ = ( h_{i_1}h_{j_1}, . . . , h_{i_m}h_{j_m}, (Σ_i h_i)^2 )^T,

where h = (h_1, . . . , h_n) ∈ R^n with ‖h‖ = 1 ranges through all unit vectors, and let z = (0, 0, . . . , 0, ϑ(G))^T.
Claim: z lies in the convex hull of the vectors ĥ. By definition [6],

conv{ĥ} = { Σ_{i=1}^t λ_i f_i | f_1, . . . , f_t ∈ {ĥ}, λ_1, . . . , λ_t ∈ R_+, Σ_{i=1}^t λ_i = 1 } ⊆ R^{m+1}.

Suppose the claim is not true. The vectors ĥ form a compact set, since K = {h ∈ R^n | ‖h‖ = 1} is compact and the image of a compact set under a continuous map is compact. Hence there exists a hyperplane separating z from all the ĥ; that is, there are a vector a and a real number α such that a^T ĥ ≤ α for all unit vectors h but a^T z > α. Write

a = (a_1, . . . , a_m, y)^T.

In particular, taking h = (1, 0, . . . , 0) gives ĥ = (0, 0, . . . , 0, 1) and hence y = a^T ĥ ≤ α. On the other hand a^T z > α means ϑ(G)y > α, so y ≤ α < ϑ(G)y. If y ≤ 0 then ϑ(G)y ≤ y (since ϑ(G) ≥ 1), a contradiction; hence y > 0 and therefore α ≥ y > 0. After rescaling a and α we may assume y = 1, and then α < ϑ(G).
Now define

a_ij = (1/2)a_k + 1 if {i, j} = {i_k, j_k}, and a_ij = 1 otherwise.

Then a^T ĥ ≤ α can be written as

Σ_{i=1}^n Σ_{j=1}^n a_ij h_i h_j ≤ α.

Since the largest eigenvalue of the symmetric matrix A = (a_ij) equals max{x^T A x : ‖x‖ = 1}, it follows that this eigenvalue is at most α. (We first show that the largest eigenvalue equals max{x^T A x : ‖x‖ = 1}. If Ax = λx with ‖x‖ = 1, then x^T A x = λ, so the maximum is at least the largest eigenvalue. Since A is symmetric, there are eigenvectors e_1, . . . , e_n with e_i^T e_j = δ_ij as in (3.4). Let ‖x‖ = 1 and write x = Σ_i α_i e_i. Then 1 = ‖x‖^2 = Σ_{i,j} α_i α_j e_i^T e_j = Σ_i α_i^2, and

x^T A x = Σ_{i,j} α_i α_j e_i^T A e_j = Σ_{i,j} α_i α_j λ_j e_i^T e_j = Σ_i λ_i α_i^2 ≤ (largest eigenvalue) · Σ_i α_i^2,

which proves the other inequality.)
Since (a_ij) satisfies (3.1) and its largest eigenvalue is at most α, Theorem 3.3 implies ϑ(G) ≤ α, contradicting α < ϑ(G). This proves the claim.
By the claim there exist finitely many unit vectors h_1, . . . , h_N and nonnegative reals α_1, . . . , α_N such that

α_1 + . . . + α_N = 1 and α_1 ĥ_1 + . . . + α_N ĥ_N = z.

Set

h_p = (h_{p,1}, . . . , h_{p,n})^T,  b_ij = Σ_{p=1}^N α_p h_{p,i} h_{p,j},  B = (b_ij).

The matrix B = Σ_p α_p h_p h_p^T is symmetric and positive semidefinite. Further, α_1 ĥ_1 + . . . + α_N ĥ_N = z implies that for k = 1, . . . , m,

b_{i_k j_k} = Σ_{p=1}^N α_p h_{p,i_k} h_{p,j_k} = z_k = 0,

so B satisfies (3.6). Also,

Tr(BJ) = Σ_{i,j} b_ij = Σ_{p=1}^N α_p (Σ_i h_{p,i})(Σ_j h_{p,j}) = Σ_{p=1}^N α_p (ĥ_p)_{m+1} = z_{m+1} = ϑ(G),

and Tr(B) = Σ_p α_p ‖h_p‖^2 = Σ_p α_p = 1, so B satisfies (3.7). Therefore ϑ(G) = max_B Tr(BJ), which completes the proof.

Lemma 3.5. Let (u1, . . . , un) be an orthonormal representation of G and (v1, . . . , vn) an orthonormal representation of Ḡ. Moreover, let c and d be any vectors. Then

Σ_{i=1}^n (u_i^T c)^2 (v_i^T d)^2 ≤ c^2 d^2.

Proof. For i ≠ j, either i, j are nonadjacent in G (then u_i^T u_j = 0) or i, j are nonadjacent in Ḡ (then v_i^T v_j = 0). Hence by (1.1), with δ_ij as in (3.4),

(u_i ◦ v_i)^T (u_j ◦ v_j) = (u_i^T u_j)(v_i^T v_j) = δ_ij,

so the vectors u_i ◦ v_i form an orthonormal system. In general, if b_1, . . . , b_k are orthonormal, i.e. b_i^T b_j = δ_ij, then ‖c‖^2 ≥ Σ_i (b_i^T c)^2 for every c. Indeed, we can extend b_1, . . . , b_k to an orthonormal basis b_1, . . . , b_k, b_{k+1}, . . . , b_n; with B = [b_1, . . . , b_n] we have B^T B = B B^T = I and

Σ_{i=1}^k (b_i^T c)^2 ≤ Σ_{i=1}^n (b_i^T c)^2 = Σ_{i=1}^n b_i^T c c^T b_i = Tr(B^T (cc^T) B) = Tr(B B^T (cc^T)) = Tr(cc^T) = Σ_i c_i^2 = ‖c‖^2.

Applying this to the orthonormal system (u_i ◦ v_i) and the vector c ◦ d, together with (1.1), gives

Σ_{i=1}^n (u_i^T c)^2 (v_i^T d)^2 = Σ_{i=1}^n ((u_i ◦ v_i)^T (c ◦ d))^2 ≤ ‖c ◦ d‖^2 = (c^T c)(d^T d) = c^2 d^2.

Corollary 3.6. If (v1, . . . , vn) is an orthonormal representation of Ḡ and d is any unit vector, then

ϑ(G) ≥ Σ_{i=1}^n (v_i^T d)^2.


Indeed, by (3.5) there are an optimal orthonormal representation (u1, . . . , un) of G and a handle c such that ϑ(G) = 1/(c^T u_i)^2 for all i. By Lemma 3.5,

1 = c^2 d^2 ≥ Σ_{i=1}^n (u_i^T c)^2 (v_i^T d)^2 = (1/ϑ(G)) Σ_{i=1}^n (v_i^T d)^2.

Hence Corollary 3.6 follows.

Corollary 3.7. ϑ(G)ϑ(Ḡ) ≥ n.

Indeed, by (3.5) applied to Ḡ there are an optimal orthonormal representation (v1, . . . , vn) of Ḡ and a unit vector d such that ϑ(Ḡ) = 1/(v_i^T d)^2 for all i. By Corollary 3.6,

ϑ(G) ≥ Σ_{i=1}^n (v_i^T d)^2 = Σ_{i=1}^n 1/ϑ(Ḡ) = n · 1/ϑ(Ḡ).

So Corollary 3.7 follows.

The next theorem gives another formula for the value of the upper bound ϑ(G). It uses orthonormal representations of the complementary graph Ḡ.

Theorem 3.8. Let (v1, . . . , vn) range over all orthonormal representations of Ḡ and d over all unit vectors. Then

ϑ(G) = max Σ_{i=1}^n (d^T v_i)^2.

Proof. By Corollary 3.6 we have the inequality ≥. We construct a representation of Ḡ and a unit vector d for which equality holds. Let B = (b_ij) be a positive semidefinite matrix satisfying (3.6) and (3.7) such that Tr(BJ) = ϑ(G). Since B is positive semidefinite, there are vectors w_1, . . . , w_n such that

b_ij = w_i^T w_j.   (3.10)

Note that Σ_{i=1}^n w_i^2 = Σ_{i=1}^n w_i^T w_i = Tr(B) = 1 and (Σ_{i=1}^n w_i)^2 = Tr(BJ) = ϑ(G). Set

v_i = w_i / |w_i|  and  d = (Σ_{i=1}^n w_i) / |Σ_{i=1}^n w_i|.


Then the vectors v_i form an orthonormal representation of Ḡ by (3.10) and (3.6). Moreover, using the Cauchy-Schwarz inequality |〈x, y〉| ≤ √〈x, x〉 √〈y, y〉, we have

Σ_{i=1}^n (d^T v_i)^2 = (Σ_{i=1}^n w_i^2)(Σ_{i=1}^n (d^T v_i)^2)
  ≥ (Σ_{i=1}^n |w_i| (d^T v_i))^2 = (Σ_{i=1}^n d^T w_i)^2 = (d^T Σ_{i=1}^n w_i)^2 = (Σ_{i=1}^n w_i)^2 = ϑ(G).

So the inequality ≤ also holds, and this proves the theorem.

Remark. Since equality must then hold in the Cauchy-Schwarz inequality, it also follows that

(d^T v_i)^2 = ϑ(G) w_i^2 = ϑ(G) b_ii  for all i.   (3.11)

Having derived several formulas for ϑ(G), we can now draw some consequences, for example when G is vertex-transitive or regular.

Theorem 3.9. ϑ(G ⊠ H) = ϑ(G)ϑ(H).

Proof. In Lemma 1.13 we proved ϑ(G ⊠ H) ≤ ϑ(G)ϑ(H). To prove the opposite inequality, let (v1, . . . , vn) be an orthonormal representation of Ḡ, let (w1, . . . , wm) be an orthonormal representation of H̄, and let c, d be unit vectors such that Σ_{i=1}^n (v_i^T c)^2 = ϑ(G) and Σ_{j=1}^m (w_j^T d)^2 = ϑ(H), as in Theorem 3.8. By Lemma 1.10, (v_i ◦ w_j) is an orthonormal representation of Ḡ ⊠ H̄; if we prove the next claim, it is also an orthonormal representation of the complement of G ⊠ H.

Claim: Ḡ ⊠ H̄ is a subgraph of the complement of G ⊠ H.

Let (u, v), (x, y) ∈ V(G) × V(H). Then (u, v) and (x, y) are adjacent or equal in Ḡ ⊠ H̄

⟺ ((u = x) ∨ ux ∈ E(Ḡ)) ∧ ((v = y) ∨ vy ∈ E(H̄))

⟺ (ux ∉ E(G)) ∧ (vy ∉ E(H))

⟹ ((u, v) = (x, y)) ∨ ((u ≠ x) ∧ ux ∉ E(G)) ∨ ((v ≠ y) ∧ vy ∉ E(H))

⟺ ((u, v) = (x, y)) ∨ {(u, v), (x, y)} ∉ E(G ⊠ H)

⟺ (u, v) and (x, y) are adjacent or equal in the complement of G ⊠ H.

This proves the claim, so (v_i ◦ w_j) is an orthonormal representation of the complement of G ⊠ H.


Moreover, c ◦ d is a unit vector. So, by Theorem 3.8 applied to G ⊠ H,

ϑ(G ⊠ H) ≥ Σ_{i=1}^n Σ_{j=1}^m ((v_i ◦ w_j)^T (c ◦ d))^2 = Σ_{i=1}^n Σ_{j=1}^m (v_i^T c)^2 (w_j^T d)^2 = (Σ_{i=1}^n (v_i^T c)^2)(Σ_{j=1}^m (w_j^T d)^2) = ϑ(G)ϑ(H).
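Theorem 3.9 can also be checked numerically on a small example. The sketch below (again a hedged illustration assuming cvxpy is available) computes ϑ(C5) and ϑ(C5 ⊠ C5) with the semidefinite program of Theorem 3.4 and confirms that the latter is, numerically, the square of the former, namely 5.

```python
import cvxpy as cp
from itertools import product

def lovasz_theta(n, edges):
    X = cp.Variable((n, n), symmetric=True)
    cons = [X >> 0, cp.trace(X) == 1] + [X[i, j] == 0 for (i, j) in edges]
    return cp.Problem(cp.Maximize(cp.sum(X)), cons).solve()

def adjacent_or_equal(i, j):      # in C5
    return i == j or (i - j) % 5 in (1, 4)

c5 = [(i, (i + 1) % 5) for i in range(5)]
verts = list(product(range(5), repeat=2))          # vertices of C5 x C5
idx = {v: k for k, v in enumerate(verts)}
prod_edges = [(idx[p], idx[q]) for p in verts for q in verts if idx[p] < idx[q]
              and adjacent_or_equal(p[0], q[0]) and adjacent_or_equal(p[1], q[1])]

print(lovasz_theta(5, c5) ** 2, lovasz_theta(25, prod_edges))   # both ~5
```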

For the next theorem we introduce a new definition.

Definition 3.10. (Vertex- and Edge-transitive) A graph G is vertex-transitive if for every pair i, j ∈ V(G) there is an automorphism that maps i to j. In other words, vertex-transitivity guarantees that the graph looks the same from each vertex. A graph is edge-transitive if for all e, f ∈ E(G) there is an automorphism of G that maps the endpoints of e to the endpoints of f [7].

Theorem 3.11. If G has a vertex-transitive automorphism group, then

ϑ(G)ϑ(Ḡ) = n.

Proof. By Corollary 3.7 we have ϑ(G)ϑ(Ḡ) ≥ n; we prove the opposite inequality.
Let Γ be the automorphism group of G, which acts transitively on the vertices, and regard its elements as n × n permutation matrices. Let B = (b_ij) be a matrix satisfying (3.6) and (3.7) such that Tr(BJ) = ϑ(G), and consider the average

B′ = (b′_ij) = (1/|Γ|) Σ_{P∈Γ} P^{-1} B P.

Then B′ satisfies (3.6): for adjacent i, j each P^{-1}BP has a zero in position (i, j), because automorphisms map edges to edges and b_kl = 0 on edges, so b′_ij = 0. Also Tr(B′) = (1/|Γ|) Σ_{P∈Γ} Tr(P^{-1}BP) = (1/|Γ|) Σ_{P∈Γ} Tr(B) = 1, by Lemma 1.17. Moreover,

Tr(B′J) = Tr( (1/|Γ|) Σ_{P∈Γ} P^{-1}BP J ) = Tr( (1/|Γ|) Σ_{P∈Γ} P^{-1}BJP ) = (1/|Γ|) Σ_{P∈Γ} Tr(BJ) = (1/|Γ|) |Γ| ϑ(G) = ϑ(G).

Here we used PJ = JP = J, which holds because P is a permutation matrix: each row and each column has exactly one entry equal to 1 and all other entries 0. Besides this, B′ is symmetric and positive semidefinite, since B is positive semidefinite and P^{-1}BP = P^T BP for a permutation matrix P.
For any fixed P_0 ∈ Γ we have {PP_0 | P ∈ Γ} = Γ, hence

P_0^{-1} B′ P_0 = P_0^{-1} ( (1/|Γ|) Σ_{P∈Γ} P^{-1}BP ) P_0 = (1/|Γ|) Σ_{P∈Γ} (PP_0)^{-1} B (PP_0) = B′.

Therefore P^{-1} B′ P = B′ for all P ∈ Γ. Since Γ is vertex-transitive and Tr(B′) = 1, it follows that b′_ii = 1/n for all i. Constructing the orthonormal representation (v1, . . . , vn) of Ḡ and the unit vector d as in the proof of Theorem 3.8, now starting from B′, we have

(d^T v_i)^2 = ϑ(G) b′_ii = ϑ(G)/n

by (3.11). From the definition of ϑ(Ḡ) we then get

ϑ(Ḡ) ≤ max_{1≤i≤n} 1/(v_i^T d)^2 = n/ϑ(G),

and hence ϑ(G)ϑ(Ḡ) ≤ n.

Corollary 3.12. If G has a vertex-transitive automorphism group, then

Θ(G)Θ(Ḡ) ≤ n.

This follows from Theorem 3.11 together with Theorem 3.1, where we proved that Θ(G) ≤ ϑ(G).

Remark. Theorem 3.11 and Corollary 3.12 do not hold for all graphs, because there are graphs with α(G)α(Ḡ) > n, for example a star. For such graphs the conclusions of Theorem 3.11 and Corollary 3.12 would contradict Lemma 1.14.


Definition 3.13. (Regular Graph) A graph G is regular if Δ(G) = δ(G), where Δ(G) is the maximum degree and δ(G) is the minimum degree. The degree of a vertex v in G is the number of edges incident to v [7].

Theorem 3.14. Let G be a regular graph and let λ1 ≥ λ2 ≥ . . . ≥ λn be the eigenvalues of the adjacency matrix A of G. Then

ϑ(G) ≤ −nλn / (λ1 − λn).

Equality holds if the automorphism group of G is transitive on the edges.

Proof. Consider the matrix J − xA, where x will be chosen later. This matrix satisfies condition (3.1): (J − xA)_ij = 1 if i = j or if i and j are nonadjacent, because a_ij = 0 in these cases. Hence by Theorem 3.3 its largest eigenvalue is at least ϑ(G). Let v_i be an eigenvector of A belonging to λ_i. Since G is regular, say k-regular, we have Aj = kj, so v_1 = j is an eigenvector with eigenvalue λ1 = k. Since A is symmetric we can choose the eigenvectors pairwise perpendicular: if λ_i ≠ λ_j, Av_i = λ_i v_i and Av_j = λ_j v_j, then v_j^T A v_i = λ_i v_j^T v_i and v_i^T A v_j = λ_j v_i^T v_j are equal, which is only possible if v_i^T v_j = 0; within an eigenspace we may choose an orthogonal basis. Therefore j, v_2, . . . , v_n are also eigenvectors of J:

Jv_i = nv_i if i = 1, and Jv_i = 0 if i ≠ 1.

So (J − xA)v_1 = nv_1 − xλ_1 v_1 = (n − xλ_1)v_1 and (J − xA)v_i = −xλ_i v_i for i ≠ 1. Hence the eigenvalues of J − xA are n − xλ_1, −xλ_2, . . . , −xλ_n. The largest of these is the first or the last one, and the optimal choice of x is

x = n/(λ_1 − λ_n).

Rewriting n − xλ_1 and −xλ_n for this choice gives

n − xλ_1 = n − nλ_1/(λ_1 − λ_n) = −nλ_n/(λ_1 − λ_n)  and  −xλ_n = −nλ_n/(λ_1 − λ_n).

Therefore we can conclude

ϑ(G) ≤ −nλ_n/(λ_1 − λ_n).

Now assume the automorphism group Γ of G is transitive on the edges. Let C = (c_ij) be a symmetric matrix with c_ij = 1 if i = j or if i and j are nonadjacent, and with largest eigenvalue ϑ(G). As in the proof of Theorem 3.11, consider

C′ = (c′_ij) = (1/|Γ|) Σ_{P∈Γ} P^{-1} C P.

Then C′ also satisfies the condition that c′_ij = 1 if i = j or if i and j are nonadjacent; its largest eigenvalue is at most ϑ(G) (each P^{-1}CP has largest eigenvalue ϑ(G)), and by Theorem 3.3 it is at least ϑ(G), hence equal to ϑ(G). Since Γ is transitive on the edges, all entries of C′ on edges are equal, say to β, so

c′_ij = 1 if i = j or i, j nonadjacent, and c′_ij = β if i ≠ j and i, j adjacent.

Thus C′ = J − xA with x = 1 − β, so ϑ(G) is the largest eigenvalue of J − xA for some x, which is at least min_x max(n − xλ_1, −xλ_n) = −nλ_n/(λ_1 − λ_n); together with the first part this proves the second assertion.

Corollary 3.15. For odd n,

ϑ(Cn) = n cos(π/n) / (1 + cos(π/n)).

We explain why this result is true. Let A be the adjacency matrix of Cn; then a_ij = 1 if and only if j − i ≡ ±1 (mod n). Define (f_k)_l := e^{2πikl/n}. Then

(Af_k)_j = Σ_{l=1}^n a_{jl}(f_k)_l = (f_k)_{j+1} + (f_k)_{j−1} = e^{2πik(j+1)/n} + e^{2πik(j−1)/n} = e^{2πikj/n}( e^{2πik/n} + e^{−2πik/n} ) = 2 cos(2πk/n) (f_k)_j.

Therefore the eigenvalues of A are {2 cos(2πk/n) | k = 1, . . . , n}. Hence λ1 = 2 and, for odd n, λn = 2 cos((n−1)π/n) = 2 cos(π − π/n) = −2 cos(π/n). Since Cn is transitive on the edges, Theorem 3.14 gives

ϑ(Cn) = (−n · (−2 cos(π/n))) / (2 + 2 cos(π/n)) = n cos(π/n) / (1 + cos(π/n)).

Example 3.16. For the cycle graph C7 we have ϑ(C7) = 7 cos(π/7)/(1 + cos(π/7)) ≈ 3.32, and we know α(C7) = 3. Thus

α(C7) = 3 ≤ Θ(C7) ≤ ϑ(C7) ≈ 3.32 ≤ α*(C7) = 7/2.

In this example we see that ϑ(C7) is a smaller upper bound for Θ(C7) than α*(C7).
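The two quantities in this example are easy to evaluate. The sketch below (assuming numpy) computes ϑ(Cn) for odd n both from the formula of Corollary 3.15 and from the eigenvalue bound of Theorem 3.14 applied to the adjacency matrix of Cn; the two agree, and for n = 7 give the value ≈ 3.32 used above.

```python
import numpy as np

def theta_formula(n):
    # Corollary 3.15 for odd n
    return n * np.cos(np.pi / n) / (1 + np.cos(np.pi / n))

def theta_eigenvalue_bound(n):
    # Theorem 3.14 applied to the adjacency matrix of C_n
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    lam = np.sort(np.linalg.eigvalsh(A))
    lam1, lamn = lam[-1], lam[0]
    return -n * lamn / (lam1 - lamn)

for n in (5, 7, 9):
    print(n, theta_formula(n), theta_eigenvalue_bound(n))
```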


We have arrived at one of the most important theorems. It says that ϑ(G) really is an upper bound for the Shannon capacity that is smaller than or equal to the one Shannon found.

Theorem 3.17. ϑ(G) ≤ α∗(G)

Proof. By Theorem 3.8 there are an orthonormal representation (u1, . . . , un) of Ḡ and a unit vector c such that

ϑ(G) = Σ_{i=1}^n (c^T u_i)^2.

Let C be any clique in G. Then {u_i | i ∈ C} is an orthonormal set of vectors, and hence

Σ_{i∈C} (c^T u_i)^2 ≤ c^T c = c^2 = 1.

Hence the weights (c^T u_i)^2 form a fractional vertex packing, and so by the definition of α*(G),

ϑ(G) = Σ_{i=1}^n (c^T u_i)^2 ≤ α*(G).

Another upper bound for Θ(G) is the dimension of an orthonormal representation of G, as we prove in the next theorem.

Theorem 3.18. Assume that G admits an orthonormal representation in dimension d. Then

ϑ(G) ≤ d.

Proof. Let (u1, . . . , un) be an orthonormal representation of G in d-dimensional space. Then (u1 ◦ u1, . . . , un ◦ un) is another orthonormal representation of G: each u_i ◦ u_i is a unit vector, and for nonadjacent i, j we have (u_i ◦ u_i)^T (u_j ◦ u_j) = (u_i^T u_j)^2 = 0. Let {e1, . . . , ed} be an orthonormal basis of the space and set

b = (1/√d)(e1 ◦ e1 + . . . + ed ◦ ed).

Then b^2 = b^T b = (1/d) Σ_{k,l} (e_k ◦ e_k)^T (e_l ◦ e_l) = (1/d) Σ_{k,l} δ_{kl} = (1/d) · d = 1, so b is a unit vector, and

(u_i ◦ u_i)^T b = (1/√d) Σ_{k=1}^d (e_k ◦ e_k)^T (u_i ◦ u_i) = (1/√d) Σ_{k=1}^d (e_k^T u_i)^2 = 1/√d,

where we used Σ_{k=1}^d (e_k^T u_i)^2 = ‖u_i‖^2 = 1, since the e_k form an orthonormal basis. Taking in the definition of ϑ(G) the representation (u_i ◦ u_i) with handle b therefore gives

ϑ(G) = min_{u1,...,un} min_c max_{1≤i≤n} 1/(c^T u_i)^2 ≤ max_{1≤i≤n} 1/((u_i ◦ u_i)^T b)^2 = d.


Chapter 4

Further results on the Shannon capacity

In this last chapter we discuss an article by Haemers [3], in which he solved the problems Lovasz stated at the end of his article. The problems are:

1. Is ϑ(G) = Θ(G)?

2. Is Θ(G ⊠ H) = Θ(G)Θ(H)?

3. Is it true that Θ(G)Θ(Ḡ) ≥ |V(G)|?

Haemers found a graph, the complement of the so-called Schlafli graph, which provides a counterexample to all three problems.

Figure 4.1: Schlafli graph


Let A be a symmetric n × n matrix over a field, with all diagonal entries equal to one. Let G(A) be the graph with vertex set {1, . . . , n}, two distinct vertices i and j being adjacent if and only if A_ij ≠ 0. Let A^⊗k denote the k-th tensor product of A with itself. All other notation is the same as in [5]. First we need a theorem.

Theorem 4.1. [3] Θ(G(A)) ≤ rank(A).

Proof. First, rank(A ⊗ B) = rank(A) rank(B).
Second, if B is an m × m matrix with B_ii = 1 for all i, then α(G(B)) equals the size of the biggest principal submatrix of B that is an identity matrix, so rank(B) ≥ α(G(B)).
Third, G(A^⊗k) = G(A)^k, because A^⊗k is an n^k × n^k matrix with (A^⊗k)_{(i1,...,ik),(j1,...,jk)} = A_{i1,j1} A_{i2,j2} · · · A_{ik,jk}. So

(i1, . . . , ik) = (j1, . . . , jk) or {(i1, . . . , ik), (j1, . . . , jk)} is an edge of G(A^⊗k)
⟺ (A^⊗k)_{(i1,...,ik),(j1,...,jk)} ≠ 0 ⟺ for all h = 1, . . . , k: A_{ih,jh} ≠ 0
⟺ for all h = 1, . . . , k: ih = jh or {ih, jh} is an edge of G(A)
⟺ (i1, . . . , ik) = (j1, . . . , jk) or {(i1, . . . , ik), (j1, . . . , jk)} is an edge of G(A)^k.

Hence G(A^⊗k) = G(A)^k. Combining the three facts,

rank(A)^k = rank(A^⊗k) ≥ α(G(A^⊗k)) = α(G(A)^k).

Therefore rank(A) ≥ sup_k α(G(A)^k)^{1/k} = Θ(G(A)).

We will construct a matrix B̃ which is the (0,1)-adjacency matrix of the complement of the Schlafli graph. Let B be the (0,1)-adjacency matrix of the Kneser graph K(8,2), which has C(8,2) = 28 vertices (the 2-element subsets of an 8-element set) and is regular of degree C(6,2) = 15. The entry of B indexed by two pairs equals one exactly when the pairs are disjoint, so every row and column has C(6,2) = 15 entries equal to one. Hence

(B − I)(B + 5I) = B^2 + 4B − 5I_28 = 10 J_28,   B J_28 = 15 J_28,

since

(B^2)_{(i,j),(k,l)} = number of pairs disjoint from both {i, j} and {k, l}
  = C(4,2) = 6 if {i, j} ∩ {k, l} = ∅,
  = C(5,2) = 10 if |{i, j} ∩ {k, l}| = 1,
  = C(6,2) = 15 if {i, j} = {k, l},

so B^2 = 6B + 10(J − I − B) + 15I = −4B + 5I + 10J.
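These combinatorial facts about K(8,2) are easy to verify by computer. The sketch below (assuming numpy) builds the adjacency matrix B and checks the relation (B − I)(B + 5I) = 10J, the regularity BJ = 15J, and the resulting spectrum {15, 1, −5}.

```python
import numpy as np
from itertools import combinations

pairs = list(combinations(range(8), 2))          # the 28 vertices of K(8,2)
n = len(pairs)
B = np.zeros((n, n))
for a in range(n):
    for b in range(n):
        if a != b and not set(pairs[a]) & set(pairs[b]):
            B[a, b] = 1                          # adjacent iff the pairs are disjoint

I, J = np.eye(n), np.ones((n, n))
assert np.allclose((B - I) @ (B + 5 * I), 10 * J)   # quadratic relation used above
assert np.allclose(B @ J, 15 * J)                   # the graph is 15-regular
print(sorted(set(np.round(np.linalg.eigvalsh(B)).astype(int))))   # [-5, 1, 15]
```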

Write

B = [ 0      j15^T   0^T  ]     [ 0   a^T ]
    [ j15    B1      N    ]  =  [ a   C   ],
    [ 0      N^T     B2   ]

where the first row and column belong to one fixed vertex of K(8,2), a = (j15^T, 0^T)^T ∈ R^27 is its adjacency vector (with 15 ones), and C is the 27 × 27 matrix obtained from B by deleting this row and column. Comparing blocks in B^2 = −4B + 5I + 10J and in Bj_28 = 15 j_28 gives

a^T a = 15,   Ca = −4a + 10 j27,   C j27 = 15 j27 − a,   C^2 = −4C + 5 I27 + 10 J27 − a a^T.

Put D = [ −I15  0 ; 0  I12 ], so that D^2 = I27, and note that

[ 0  J15,12 ; J12,15  0 ] = (j27 − a) a^T + a (j27 − a)^T = D [ 0  −J15,12 ; −J12,15  0 ] D.

Now define

B̃ = [ B1  J15,12 − N ; J12,15 − N^T  B2 ] = D ( C − (j27 − a) a^T − a (j27 − a)^T ) D,

and write M = C − (j27 − a)a^T − a(j27 − a)^T, so that B̃ = DMD and B̃^2 = DM^2D. Since a^T(j27 − a) = 0,

M^2 = C^2 − C(j27 − a)a^T − Ca(j27 − a)^T − (j27 − a)a^T C − a(j27 − a)^T C + [ 12 J15  0 ; 0  15 J12 ],

and substituting the block relations above and simplifying yields

M^2 = −4 ( C − [ 0  J15,12 ; J12,15  0 ] ) + 5 I27 + 5 [ J15  −J15,12 ; −J12,15  J12 ].

Conjugating with D therefore gives

B̃^2 = −4 B̃ + 5 I27 + 5 J27,

and it follows that

(B̃ − I)(B̃ + 5I) = B̃^2 + 4B̃ − 5I27 = 5J.

Similarly, using M j27 = Cj27 − 15(j27 − a) − 12a = 2a, Ma = Ca − 15(j27 − a) = 11a − 5 j27 and D j27 = j27 − 2a,

B̃ j27 = D M D j27 = D M (j27 − 2a) = D (M j27 − 2Ma) = D (10 j27 − 20a) = 10 j27,

from which follows B̃J = 10J.

This implies that B̃ has the all-one vector as an eigenvector with eigenvalue 10, and from (B̃ − I)(B̃ + 5I) = 5J it follows that the other eigenvalues are 1 and −5. Let a, b, c be the multiplicities of the eigenvalues 10, 1 and −5 respectively. Then a = 1 and 10a + b − 5c = Tr(B̃) = 0, which together with a + b + c = 27 implies b = 20 and c = 6.
Let A = I − B̃ and G = G(A). The complement of G is the Schlafli graph. The eigenvalues of A are 1 − 10 = −9, 1 − 1 = 0 and 1 − (−5) = 6, with multiplicities 1, 20 and 6. So A has eigenvalue 0 with multiplicity 20, hence rank(A) = 27 − 20 = 7, and by Theorem 4.1, Θ(G) ≤ 7.
The adjacency matrix of Ḡ is A(Ḡ) = J − I − B̃. Let v1, . . . , v27 be an orthogonal system of eigenvectors of B̃ with v1 = j; then j^T v_i = 0 for i ≥ 2. Hence (J − I − B̃)v_i = Jv_i − v_i − λ_i v_i = (−1 − λ_i)v_i for i ≥ 2, and (J − I − B̃)v1 = (27 − 1 − 10)v1 = 16 v1. Therefore A(Ḡ) has eigenvalues 16, −1 − 1 = −2 and −1 − (−5) = 4. Applying Theorem 3.14 to G and to its complement Ḡ gives

ϑ(G) ≤ (−27 · (−5))/(10 − (−5)) = 9   and   ϑ(Ḡ) ≤ (−27 · (−2))/(16 − (−2)) = 3.

By Corollary 3.7 we have ϑ(G)ϑ(Ḡ) ≥ 27, hence ϑ(G) = 9 and ϑ(Ḡ) = 3.
Therefore Θ(G) ≤ 7 < 9 = ϑ(G), which answers problem 1. Furthermore Θ(G)Θ(Ḡ) ≤ 7 · 3 = 21 < 27 = |V(G)|, which answers problem 3. Finally, for every graph, {(v, v) | v ∈ V(G)} is an independent set in G ⊠ Ḡ, so α(G ⊠ Ḡ) ≥ |V(G)|, and since α(G ⊠ Ḡ) ≤ Θ(G ⊠ Ḡ) we get Θ(G ⊠ Ḡ) ≥ 27 > 21 ≥ Θ(G)Θ(Ḡ), which answers problem 2. We conclude that all three problems are answered in the negative.


Chapter 5

Conclusion

We introduced a number of definitions and theorems from graph theory and linear algebra before we defined the Shannon capacity and stated Shannon's Theorem. With Shannon's Theorem we determined the Shannon capacity of perfect graphs, but for most other graphs we only have a lower and an upper bound. For example, for the cycle graphs Cn with n odd and n > 5 we only have (n−1)/2 ≤ Θ(Cn) ≤ n/2.
For the cycle graph C5 we proved that the Shannon capacity equals √5, using the umbrella technique introduced by Lovasz. He also introduced the Lovasz Number ϑ(G), an upper bound for the Shannon capacity, and we proved that it is smaller than or equal to the upper bound Shannon found; with the Lovasz Number we can therefore approximate the Shannon capacity of a graph more precisely. We proved a number of formulas and properties of ϑ(G), including different characterizations of its value and its multiplicativity with respect to the strong product. In the last chapter we considered three problems Lovasz stated at the end of his article and presented the counterexample found by Haemers, based on the Schlafli graph. For this graph Θ(G) ≠ ϑ(G), and all three problems are answered in the negative.
As mentioned in the introduction, it would be valuable to know the Shannon capacity of communication models. Unfortunately, determining the Shannon capacity of an arbitrary graph is very difficult; even for the simple cycle graph C7 the Shannon capacity is unknown. We therefore conclude that this theoretical parameter is not of great practical use at the moment. On the other hand, because of the connections with some central questions in graph theory, further research in this field may contribute to a better approach, resulting in more applications of the Shannon capacity.


Popular summary

Suppose we want to send a message to a receiver. During the transmission across a channel, noise may occur and the message may change. In 1956 Shannon posed the following interesting question: what is the maximum rate of transmission such that the receiver may recover the original message without errors? We model the channel as a graph G(V,E), a set of vertices V(G) together with a set of edges E(G), where two vertices are related if there is an edge between them. The vertices of the graph represent the letters of an alphabet, and two letters may be confused if they are connected by an edge. As an example we consider the graph C5, whose vertices correspond to the numbers 1 to 5 as in Figure 5.1. We see that number 1 can be confused with

Figure 5.1: Cycle graph C5

number 2 or number 5. The maximum number of 1-letter messages which can be sent without danger of confusion is denoted α(C5). This is the size of a maximum independent set in C5, i.e. the maximum number of vertices such that no two of them are connected by an edge. In this case α(C5) = 2, which says that there are 2 one-letter messages which can be sent without danger of confusion; in other words, the receiver always knows which message was sent. We define α(G^k) as the maximum number of k-letter messages which can be sent without danger of confusion.


Figure 5.2: C5 ⊠ C5

As an example we consider the graph C5^2 = C5 ⊠ C5 of Figure 5.2. In this graph two vertices (u1, u2) ≠ (v1, v2) are adjacent if and only if ui = vi or uivi ∈ E(C5) for i = 1, 2. In Figure 5.2 the red vertices form a maximum independent set of this graph. The 2-letter messages which cannot be confused are v1v1, v2v3, v3v5, v4v2 and v5v4, hence α(C5^2) = 5.

To answer Shannon's question we define the Shannon capacity of a graph, using the independence numbers α(G^k), as

Θ(G) = sup_k α(G^k)^{1/k}.

The Shannon capacity attracted interest in the field of Information Theory and in the scientific community because of its applications to communication problems. Unfortunately, determining the Shannon capacity is a very difficult problem, even for very simple small graphs. Shannon therefore gave an upper and a lower bound. He stated

α(G) ≤ Θ(G) ≤ α*(G),

which is known as Shannon's Theorem. Here α*(G) is defined as the maximum of the sum of the vertex weights, taken over all weightings in which the weights of the vertices in every clique sum to at most 1. A clique is a subgraph of G in which every two vertices are connected by an edge. To clarify this we take again the graph C5. In Figure 5.3 one clique is circled; in total there are five such (maximal) cliques, the edges. For every clique the sum of the weights of its vertices must be at most one. If we give each vertex weight 1/2, this condition is satisfied, and taking the sum over all vertices we get α*(C5) = 5 · 1/2 = 5/2.


Figure 5.3: Clique in C5

Therefore, by Shannon’s Theorem we have 2 ≤ Θ(C5) ≤ 52.

In 1979 Lovasz proved that Θ(C5) =√

5 using his umbrella technique. Forlong time this was an open problem and therefore this was an importantresult in the field of mathematics. Up to now the Shannon capacity of C7

is still an open problem which marks the difficulty of the problem. For thecycle graphs Cn with n even the Shannon capacity is known because theseare perfect graphs and for these hold α(G) = Θ(G).

Beside the determination of the Shannon capacity of C5 Lovasz also definedthe Lovasz Number ϑ(G), an upper bound for the Shannon capacity, andmost important it is a smaller upper bound than the one Shannon found.At the end of this article Lovasz stated three problems, one of them isΘ(G) = ϑ(G)? Haemers solved the three problems by giving a counterexam-ple by the so called Schlafli graph. Therefore the problems are answered inthe negative.

Figure 5.4: Schlafli graph


Concluding, it would be valuable to know the Shannon capacity of a given graph, but because determining it is a very difficult problem, this theoretical parameter is not very useful at the moment. On the other hand, because of the connections with some central questions in graph theory, further research in this field may contribute to a better approach, resulting in more applications of the Shannon capacity.


Bibliography

[1] Aigner, M. & Ziegler, G.M. (2010). Proofs from THE BOOK, 4th edition, Springer-Verlag, Berlin Heidelberg, 241-250.

[2] Codenotti, B., Gerace, I. & Resta, G. (2003). Some remarks on the Shannon capacity of odd cycles, Ars Combinatoria, 66, 243-257.

[3] Haemers, W. (1979). On some problems of Lovasz concerning the Shannon capacity of a graph, IEEE Transactions on Information Theory, 25, no. 2, 231-232.

[4] Lovasz, L. (1972). Normal hypergraphs and the perfect graph conjecture, Discrete Mathematics, 2, no. 3, 253-267.

[5] Lovasz, L. (1979). On the Shannon capacity of a graph, IEEE Transactions on Information Theory, 25, no. 1, 1-7.

[6] Schrijver, A. (2003). Combinatorial Optimization, vol. A, Springer-Verlag, Berlin.

[7] West, D.B. (2001). Introduction to Graph Theory, 2nd edition, Prentice Hall.
