
Finite-Dimensional Spaces: Algebra, Geometry, and Analysis



Carnegie Mellon University
Research Showcase @ CMU

Department of Mathematical Sciences, Mellon College of Science

2006

Finite-Dimensional Spaces: Algebra, Geometry, and Analysis, Volume I

Walter Noll
Carnegie Mellon University, [email protected]

Follow this and additional works at: http://repository.cmu.edu/math

This Book is brought to you for free and open access by the Mellon College of Science at Research Showcase @ CMU. It has been accepted for inclusion in Department of Mathematical Sciences by an authorized administrator of Research Showcase @ CMU. For more information, please contact [email protected].


Finite-Dimensional Spaces

Algebra, Geometry, and Analysis
Volume I

By

Walter Noll

Department of Mathematics, Carnegie Mellon University, Pittsburgh, PA 15213 USA

This book was published originally by Martinus Nijhoff Publishers in 1987. This is a corrected reprint, posted in 2006 on my website math.cmu.edu/~wn0g/noll.


Introduction

A. Audience. This treatise (consisting of the present Vol.I and of Vol.II, to be published) is primarily intended to be a textbook for a core course in mathematics at the advanced undergraduate or the beginning graduate level. The treatise should also be useful as a textbook for selected students in honors programs at the sophomore and junior level. Finally, it should be of use to theoretically inclined scientists and engineers who wish to gain a better understanding of those parts of mathematics that are most likely to help them gain insight into the conceptual foundations of the scientific discipline of their interest.

B. Prerequisites. Before studying this treatise, a student should be familiar with the material summarized in Chapters 0 and 1 of Vol.I. Three one-semester courses in serious mathematics should be sufficient to gain such familiarity. The first should be an introduction to contemporary mathematics and should cover sets, families, mappings, relations, number systems, and basic algebraic structures. The second should be an introduction to rigorous real analysis, dealing with real numbers and real sequences, and with limits, continuity, differentiation, and integration of real functions of one real variable. The third should be an introduction to linear algebra, with emphasis on concepts rather than on computational procedures.

C. Organization. There are ten chapters in Vol.I, numbered from 0 to 9. A chapter contains from 4 to 12 sections. The first digit of a section number indicates the chapter to which the section belongs; for example, Sect.611 is the 11th section of Chap.6. The appropriate section title and number are printed on the top of each odd-numbered page.

A descriptive name is used for each important theorem. Less important results are called Propositions and are enumerated in each section; for example, Prop.5 of Sect.83 refers to the 5th proposition of the 3rd section of Chap.8. Similar enumerations are used, if needed, for formal Definitions, Remarks, and Pitfalls. The term Pitfall is used for comments designed to prevent possible misconceptions.

At the end of most sections there are notes in small print. Their purpose is to relate the notations and terms used in the text to other notations and terms in the mathematical literature, and to comment on symbols, terms, and procedures that appear in print here for the first time (to the best of my knowledge).


A list of problems is presented at the end of each chapter. Some of the problems deal with examples that should help the students to better understand the concepts explained in the text. Some of the problems present additional easy results. A few problems present harder results and may require ingenuity to solve, although hints are often given.

Theorems, Propositions, and Definitions are printed in italic type. Important terms are printed in boldface type when they are defined, so that their definitions can be easily located. The end of a Proof, Remark, or Pitfall is indicated by a block: ■. Words enclosed in brackets indicate substitutes that can be used without invalidating the statement. For example, "The maximum [minimum] of the set S is denoted by max S [min S]" is a shorthand for two statements. The first is obtained by omitting the words in brackets and the second is "The minimum of the set S is denoted by min S".

D. Style. I do not believe that it is a virtue to present merely the logical skeleton of an argument and to hide carefully all motivations from students by refusing to draw pictures and analogies. Unfortunately, it is much easier to write down formal definitions, theorems, and proofs than to describe motivations. I wish I had been able to do more of the latter. In this regard, a large contribution must be made by the instructor who uses this treatise as a textbook.

I have tried to be careful and honest with wording. For example, the phrases "it is evident" and "clearly" mean that a student who has understood the terms used in the statement should be able to agree immediately. The phrases "it is easily seen" and "an easy calculation shows" mean that it should take a good student no more than fifteen minutes with pencil and paper to verify the result. The phrase "it can be shown" is for information only and is not intended to imply anything about the difficulty of the proof.

A mathematical symbol should be used only in one of three ways. It should denote a specific object (e.g. 0, 1, N, R), or it should denote a definite but unspecified object (e.g. n in "Let n ∈ N be given"), or it should be a dummy (e.g. n in "2n is even for all n ∈ N"). I have tried to avoid any other uses. Thus, there should be no "dangling dummies" or expressions such as dy/dx, to which I cannot assign a reasonable, precise meaning, although they appear very often in the literature.

E. Genesis. About 25 years ago I started to write notes for a course for seniors and beginning graduate students at Carnegie Institute of Technology


(renamed Carnegie-Mellon University in 1968). At first, the course was entitled "Tensor Analysis". I soon realized that what usually passes for "Tensor Analysis" is really an undigested mishmash of linear and multilinear algebra, differential calculus in finite-dimensional spaces, manipulation of curvilinear coordinates, and differential geometry on manifolds, all treated with mindless formalisms and without real insight. As a result, I omitted the abstract differential geometry, which is too difficult to be treated properly at this level, and renamed the course "Multidimensional Algebra, Geometry, and Analysis", and later "Finite-Dimensional Spaces". The notes were rewritten several times. They were widely distributed and they served as the basis for appendices to the books Viscometric Flows of Non-Newtonian Fluids by B. D. Coleman, H. Markovitz, and W. Noll (Springer-Verlag 1966) and A First Course in Rational Continuum Mechanics by C. Truesdell (Academic Press 1977).

Since 1973 my notes have also been used by J. J. Schaffer and me in an undergraduate honors program entitled "Mathematical Studies". One of the purposes of the program has been to present mathematics as an integrated whole and to avoid its traditional division into separate and seemingly unrelated courses. In this connection, Schaffer and I gradually developed a system of notation and terminology that we believe is useful for all branches of mathematics. My involvement in the Mathematical Studies Program has had a profound influence on my thinking; it has led to radical revisions of my notes and finally to this treatise.

Chapter 9 of Vol. I is an adaptation of notes entitled "On the Structure of Linear Transformations", which were written for a course in "Modern Algebra". (They were issued as Report 70–12 of the Department of Mathematics, Carnegie-Mellon University, in March 1970.)

F. Apologia. I wish to list certain features which make this treatise different from much, and in some cases most, of the existing literature.

Much of the substance of this treatise is covered in textbooks with titles such as "Linear Algebra", "Analytic Geometry", "Finite-Dimensional Vector Spaces", "Modern Algebra", "Vector and Tensor Analysis", "Advanced Calculus", "Functions of Several Variables", or "Elementary Differential Geometry". However, I believe this treatise to be the first that deals with finite-dimensional spaces in a unified way and that emphasizes the interplay between algebra, geometry, and analysis.


The approach of this treatise is conceptual, geometric, and uncompromisingly "coordinate-free". In some of the literature, "tensors" are still defined in terms of coordinates and their transformations. To me, this is like looking at shadows dancing on the wall rather than at reality itself. Coordinates have no place in the definition of concepts. Of course, when it comes to dealing with specific problems, coordinates are sometimes useful. For this reason, I have included a chapter in which I show how to handle coordinates efficiently.

The space R^n, with n ∈ N, is very rarely mentioned in this treatise. It is misused nearly every time it appears in the literature, because it is only a special model for the structure that is appropriate in most situations, and as a special model R^n contains extraneous features that impede geometric insight. Thus any textbook on finite-dimensional calculus with a title like "Functions of Several Variables" must be defective. I consider it a travesty to call R^n "the Euclidean n-space", as so many do. To quote N. D. Goodman: "Obviously, this is not what Euclid meant" (in "Mathematics as an objective science", Am. Math. Monthly, Vol. 86, p. 549, 1979).

In this treatise, I have tried to present every mathematical topic in a setting that fits the topic naturally and hence leads to a maximum of insight. For example, the structure of a flat (a.k.a. affine) space is the most natural setting for the differential and integral calculus. Most treatments use R^n, a linear space, a normed linear space, or a Euclidean space as the setting. Each of them has extraneous structure which conceals the true nature of the calculus. On the other hand, the structure of a differentiable manifold is too impoverished to be a setting for many aspects of calculus.

In this treatise, a very careful distinction is made between a set and a family (see Sect.02). Almost all the literature is very sloppy on this point. I have found it liberating to resist the compulsion to think of finite sets always in enumerated form and thus to confuse them with lists. Also, I have found it very useful to be able to use a single symbol for a family and to use the same symbol, with an index, for the terms of the family. For example, I use Mi,j for the (i, j)-term of the matrix M. It seems nonsensical to me to change from an upper case letter M to the lower case letter m when changing from the matrix to its terms. A notation such as (mi,j) for a matrix, often seen in textbooks, is poison to me because it contains the dangling dummies i and j (dangling dummies are like cigarettes: both are poison, but I used both when young.)

In this treatise, I have paid more careful attention than is usual to the specification of domains and codomains of mappings. In particular, the adjustment of domains and codomains is done explicitly (see Sect.03). Most authors are either ambiguous on this matter or talk around it clumsily.

The terminology and notation used in this treatise have been very carefully selected. They often do not conform with what a particular reader might consider "standard", especially since what is "standard" for an expert in category theory may differ from what is "standard" for an expert in engineering mechanics. When the more common terminology or notation is obscure, clumsy, or leads to clashes, I have introduced new terminology or notation. Perhaps the most conspicuous example is the term "lineon" for the too cumbersome "linear transformation". Some terms, such as the noun "tensor", have been used with so many different meanings in the past that I found it wise to avoid them altogether.

I have avoided naming concepts and theorems after their purported inventors or discoverers. For one thing, there is often much doubt about who the originators were and frequently credit is given to the wrong people. Secondly, the use of descriptive names makes it easier to learn the material. I have introduced such descriptive names in almost all cases. The experienced mathematician will rarely have any difficulty in understanding my meaning. For example, I have yet to find a mathematician who could not tell immediately that my "Inner-Product Inequality" is what is commonly called the "Cauchy-Schwarz Inequality". The notes at the end of each section list the names that have been used elsewhere for the concepts and theorems introduced.


Chapter 0

Basic Mathematics

In this chapter, we introduce the notation and terminology used throughout the book. Also, we give a brief explanation of the basic concepts of contemporary mathematics to the extent needed in this book. Finally, we give a summary of those topics of elementary algebra and analysis that are a prerequisite for the remainder of the book.

00 Notations

The equality sign = is used to express the assertion that on either side of = are symbolic names (possibly very complicated) for one and the same object. Thus, a = b means that a and b are names for the same object; a ≠ b means that a and b are names for distinct objects. The symbol := is used to mean that the left side is defined by the right side, that the left side is an abbreviation of the right side, or that the right side is to be substituted for the left side. The symbol =: has an analogous meaning.

The logical equivalence sign ⇔ is used to indicate logical equivalence of statements. The symbol :⇔ is used to define a phrase or property; it may be read as "means by definition that" or "is equivalent by definition to".

Given a set S and a property p that any given member of S may or may not have, we use the shorthand

? x ∈ S, x has the property p (00.1)

to describe the problem "Find all x ∈ S, if any, such that x has the property p". An element of S having the property p is called a solution of the problem. Often, the property p involves an equality; then the problem is called an equation.
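For a finite S, a problem of the form (00.1) can be solved by exhaustive search; a minimal Python sketch (the helper name solutions is ad hoc):

    # Finding all solutions of "? x in S, x has the property p" for a finite S.
    def solutions(S, p):
        """Return the set of all x in S such that p(x) holds."""
        return {x for x in S if p(x)}

    # The equation "? x in Z, x*x = 4", restricted to a finite window of Z:
    print(solutions(range(-10, 11), lambda x: x * x == 4))  # {2, -2} (order may vary)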


On occasion we use one and the same symbol for two different objects. Of course, this is permissible only if the two objects are related by some natural correspondence and if the context makes it clear which of the meanings of the symbol is appropriate in each instance. Suppose that S and T are two sets and that there is a natural one-to-one correspondence between them. We say that we identify S and T and we write S ≅ T if we wish to use the same symbol for an element of S and the corresponding element of T. Identification must be handled with great care to avoid notational clashes, i.e. instances in which the meaning of a symbol becomes truly ambiguous, even in context. On the other hand, total avoidance of identifications would lead to staggering notational complexity.

We consider 0 to be a real number that is both positive and negative. If we wish to exclude 0, we use the terms "strictly positive" and "strictly negative". The following is a list of notations for specific number sets.

Set  Description

N    set of all natural numbers 0, 1, 2, . . . , including 0
Z    set of all integers . . . , −2, −1, 0, 1, 2, . . .
Q    set of all rational numbers
R    set of all real numbers
P    set of all positive real numbers, including 0
C    set of all complex numbers
n[   given n ∈ N, the set of the first n natural numbers 0, 1, . . . , n − 1, starting with 0
n]   given n ∈ N, the set of the first n natural numbers 1, 2, . . . , n, starting with 1

It is useful to read n[ as "n out" and n] as "n in".

We use a superscript cross to indicate that 0 has been taken out of a set. For example, P× denotes the set of all strictly positive numbers and N× the set of all non-zero natural numbers. This cross notation may be used on any set that has a special "zero-element", not necessarily the number zero.

Notes 00

(1) The “quantifier” ? as in (00.1) was introduced by J. J. Schaffer in about 1973.

(2) Many people say "positive" when we say "strictly positive" and use the awkward "non-negative" when we say "positive".

(3) The use of the special letters N, . . . , C for the various number sets has become fairly standard in recent years. However, some people use N for what we denote by N×, the set of non-zero natural numbers; they do not consider zero to be a natural number. Sometimes P is used for what we call P×, the set of strictly positive real numbers. The notation R+ for what we denote by P is often used. Older textbooks often use boldface letters or script letters instead of the special letters now common.

(4) In most of the literature, S ≅ T is used to indicate that S is isomorphic to T. I prefer to use ≅ only when the isomorphism is natural and used for identification.

(5) The notations n[ and n] were invented by J. J. Schaffer in about 1973. I cannot understand any more how I ever got along without them. Some textbooks use the boldface n for what we call n]. I do not consider the change from lightface to boldface a legitimate notation for a functorial process, quite apart from the fact that it is impossible to produce on a blackboard.

(6) The use of the superscript × to indicate the removal of 0 was introduced by J. J. Schaffer and me in about 1973. It has turned out to be a very effective notation. It is consistent with the commonly found notation F× for the multiplicative group of a field F (see Sect.06).

01 Sets, Partitions

To specify a set S, one must have a criterion for deciding whether any given object x belongs to S. If it does, we write x ∈ S and say that x is a member or an element of S, that S contains x, or that x is in S or contained in S. If x does not belong to S we write x ∉ S. We use abbreviations such as "x, y ∈ S" for "x ∈ S and y ∈ S".

Let S and T be sets. If every member of S is also a member of T we write S ⊂ T or T ⊃ S and say that S is a subset of T, that S is included in T, or that T includes S. We have

S = T ⇐⇒ (S ⊂ T and T ⊂ S). (01.1)

If S ⊂ T but S ≠ T we write S ⊊ T and say that S is a proper subset of T or that S is properly included in T.

There is exactly one set having no members at all; it is denoted by ∅ and called the empty set. The empty set is a subset of every set. A set having exactly one member is called a singleton; it is denoted by {a} if a denotes its only member. If the set S is known to be a singleton, we write a :∈ S to indicate that we wish to denote the only member of S by a. A set having exactly two members is called a doubleton; it is denoted by {a, b} if a and b denote its two (distinct) members.

A set C whose members are themselves sets is often called a collection of sets. The collection of all subsets of a given set S is denoted by Sub S.


Hence, if T is a set, then

T ⊂ S ⇐⇒ T ∈ Sub S. (01.2)

Many sets in mathematics are specified by naming an encompassing set A and a property p that any given member of A may or may not have. The set S of all members of A that have this property p is denoted by

S := {x ∈ A | x has the property p}, (01.3)

which is read as "S is the set of all x in A such that x has the property p". Occasionally, one has no encompassing set and p is a property that any object may or may not have. In this case, (01.3) is replaced by

S := {x | x has the property p}. (01.4)

Remark: Definitions of sets of the type (01.4) must be treated with caution. Indiscriminate use of (01.4) can lead to difficulties known as "paradoxes".

Sometimes, a set with only a few members is specified by an explicit listing of its members and by enclosing the list in braces { }. Thus, {a, b, c, d} denotes the set whose members are a, b, c and d.

Given any sets S and T, one can form their union S ∪ T, consisting of all objects that belong either to S or to T (or to both), and one can form their intersection S ∩ T, consisting of all objects that belong to both S and T. We say that S and T are disjoint if they have no elements in common; i.e. if S ∩ T = ∅. The following rules (01.5)–(01.10) are valid for any sets S, T, U.

S ∪ S = S ∩ S = S ∪ ∅ = S, S ∩ ∅ = ∅, (01.5)

T ⊂ S ⇐⇒ T ∪ S = S ⇐⇒ T ∩ S = T. (01.6)

The following rules remain valid if ∩ and ∪ are interchanged.

S ∪ T = T ∪ S, (01.7)

(S ∪ T ) ∪ U = S ∪ (T ∪ U), (01.8)

(S ∪ T ) ∩ U = (S ∩ U) ∪ (T ∩ U), (01.9)

T ⊂ S =⇒ T ∪ U ⊂ S ∪ U. (01.10)

Given any sets S and T, the set of all members of S that do not belong to T is called the set-difference of S and T and is denoted by S \ T, so that

S \ T := {x ∈ S | x ∉ T }. (01.11)


We read S \ T as "S without T". The following rules (01.12)–(01.16) are valid for any sets S, T, U.

S \ S = ∅, S \ ∅ = S, (01.12)

S \ (T \ U) = (S \ T ) ∪ (S ∩ U), (01.13)

(S \ T ) \ U = (S \ T ) ∩ (S \ U). (01.14)

The following rules remain valid if ∩ and ∪ are interchanged.

(S ∪ T ) \ U = (S \ U) ∪ (T \ U), (01.15)

S \ (T ∪ U) = (S \ T ) ∩ (S \ U). (01.16)

If T is a subset of S, then S \ T is also called the complement of T in S. The complement S \ T is the largest (with respect to inclusion) among all the subsets of S that are disjoint from T.
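Rules such as (01.13)–(01.16) are easy to spot-check on small examples; a minimal Python sketch (the sets chosen are arbitrary):

    # Spot-checking rules (01.13)-(01.16) on small sets.
    S, T, U = {1, 2, 3, 4}, {3, 4, 5}, {4, 6}
    assert S - (T - U) == (S - T) | (S & U)          # (01.13)
    assert (S - T) - U == (S - T) & (S - U)          # (01.14)
    assert (S | T) - U == (S - U) | (T - U)          # (01.15)
    assert S - (T | U) == (S - T) & (S - U)          # (01.16)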

The union ⋃C of a collection C of sets is defined to be the set of all objects that belong to at least one member-set of the collection C. For any sets S and T we have

⋃∅ = ∅,  ⋃{S} = S,  ⋃{S, T } = S ∪ T. (01.17)

If C and D are collections of sets then

⋃(C ∪ D) = (⋃C) ∪ (⋃D). (01.18)

The intersection ⋂C of a non-empty collection C of sets is defined to be the set of all objects that belong to each of the member-sets of the collection C. For any sets S and T we have

⋂{S} = S,  ⋂{S, T } = S ∩ T. (01.19)

If C and D are non-empty collections, then

⋂(C ∪ D) = (⋂C) ∩ (⋂D). (01.20)

We say that a collection C of sets is disjoint if any two distinct member-sets of C are disjoint. We say that C covers a given set S if S ⊂ ⋃C.

Let a set S be given. A disjoint collection P of non-empty subsets of S that covers S is called a partition of S. The member-sets of P are called the pieces of the partition P. The empty collection is the only partition of the empty set. If S is any set then {E ∈ Sub S | E is a singleton} is a partition of S, called the singleton-partition of S. If S ≠ ∅, then {S} is also a partition of S, called the trivial partition. If T is a non-empty proper subset of S, then {T, S \ T } is a partition of S.

Let S be a set and let ∼ be a relation on S, i.e. a fragment that becomes a statement x ∼ y, true or not, when x, y ∈ S. We say that ∼ is an equivalence relation if for all x, y, z ∈ S we have

(i) x ∼ x (reflexivity),

(ii) x ∼ y =⇒ y ∼ x (symmetry), and

(iii) (x ∼ y and y ∼ z) =⇒ x ∼ z (transitivity).

If ∼ is an equivalence relation on S, then

P := {P ∈ Sub S | P = {x ∈ S | x ∼ y} for some y ∈ S}

is a partition of S; its pieces are called the equivalence classes of the relation ∼, and we have x ∼ y if and only if x and y belong to the same piece of P. Conversely, if P is a partition of S, then

x ∼ y :⇐⇒ (for some P ∈ P, x, y ∈ P )

defines an equivalence relation on S whose equivalence classes are the pieces of P.
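For a finite S the passage from an equivalence relation to its partition can be computed directly; a minimal Python sketch, taking congruence modulo 3 as the relation (the name partition is ad hoc):

    # The partition of S induced by an equivalence relation equiv.
    def partition(S, equiv):
        """Return the set of equivalence classes of equiv on S, as frozensets."""
        return {frozenset(x for x in S if equiv(x, y)) for y in S}

    pieces = partition(range(7), lambda x, y: (x - y) % 3 == 0)
    # pieces == { {0,3,6}, {1,4}, {2,5} }: disjoint non-empty sets covering S.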

Notes 01

(1) Some authors use ⊆ when we use ⊂, and ⊂ when we use ⊊. There is some confusion in the literature concerning the use of "contain" and "include". We carefully observe the distinction.

(2) The term "null set" is often used for what we call the "empty set". Also the phrase "S is void" instead of "S is empty" can often be found.

(3) The notation Sub S is used here for the first time. The notations P(S) and 2^S are common. The collection Sub S is often called the "power set" of S.

(4) The notation S – T instead of S \ T is used by some people. It clashes with the member-wise difference notation (06.16). If T is a subset of S, the notations C_S T or T^c are used by some people for the complement S \ T of T in S.


02 Families, Lists, Matrices

A family a is specified by a procedure by which one associates with each member i of a given set I an object ai. The given set I is called the index set of the family and the object ai is called the term of index i or simply the i-term of the family a. If a and b are families with the same index set I and if ai = bi for all i ∈ I, then a and b are considered to be the same, i.e. a = b. The notation (ai | i ∈ I) is often used to denote a family, especially if no name is available a priori.

The set of all terms of a family a is called the range of a and is denoted by Rng a or {ai | i ∈ I}, so that

Rng a = Rng (ai | i ∈ I) = {ai | i ∈ I}. (02.1)

Many sets in mathematics are specified by naming a family a and by letting the set be the range (02.1) of a. We say that a family a = (ai | i ∈ I) is injective if, for all i, j ∈ I,

ai = aj =⇒ i = j.

Roughly, a family is injective if there is no repetition of terms.

The concept of a family may be viewed as a generalization of the concept of a set. With each set S one can associate a family by letting the index set be S itself and by letting the term corresponding to any given x ∈ S be x itself. Thus, the family corresponding to S is (x | x ∈ S). We identify this family with S and refer to it as "S self-indexed." In this manner, every assertion involving arbitrary families includes, as a special case, an assertion involving sets. The empty set ∅ is identified with the empty family, which is the only family whose index set is empty.

If all the terms of a given family a belong to a given set S, i.e. if Rng a ⊂ S, we say that a is a family in S. The set of all families in S with a given index set I is denoted by S^I and called the I-set-power of S. If T ⊂ S, then T^I ⊂ S^I. We have S^∅ = {∅}.
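A family with a finite index set can be modeled as a dictionary whose keys form the index set; a minimal Python sketch (all names ad hoc):

    # A family (a_i | i in I) as a dict: keys are the index set, values the terms.
    a = {"red": 3, "green": 5, "blue": 3}   # I = {"red", "green", "blue"}
    Rng_a = set(a.values())                 # range {a_i | i in I} = {3, 5}

    def is_injective(fam):
        """A family is injective if distinct indices give distinct terms."""
        return len(set(fam.values())) == len(fam)

    print(is_injective(a))  # False: the terms of index "red" and "blue" coincide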

Let n ∈ N be given. A family whose index set is n] or n[ is called a list of length n. If n is small, a list a indexed on n] can often be specified by a bookkeeping scheme of the form

(a1, a2, . . . , an) := (ai | i ∈ n]). (02.2)

where a1, a2, . . . , an should be replaced by specific names of objects, to be filled in an actual use. For each i ∈ n], we call ai the i'th term of a. The only list of length 0 is the empty family ∅. A list of length 1 is called a singlet, a list of length 2 is called a pair, a list of length 3 is called a triple, and a list of length 4 a quadruple.

If S is a set and n ∈ N, we use the abbreviation S^n := S^(n]) for the set of all lists of length n in S. We call S^n the n'th set-power of S.

Let S and T be sets. The set of all pairs whose first term is in S and whose second term is in T is called the set-product of S and T and is denoted by S × T; i.e.

S × T := {(x, y) | x ∈ S, y ∈ T }. (02.3)

We have S × T = ∅ if and only if S = ∅ or T = ∅. We have S × S = S^(2]) = S^2; we call it the set-square of S.

A family whose index set is the set-product I × J of given sets I and J is called an (I × J)-matrix. The (i, j)-term of an (I × J)-matrix M is usually written Mi,j instead of M(i,j). For each i ∈ I, the family (Mi,j | j ∈ J) is called the i-row of M, and for each j ∈ J, the family (Mi,j | i ∈ I) is called the j-column of M. The (J × I)-matrix M⊤ defined by (M⊤)j,i := Mi,j for all (j, i) ∈ J × I is called the transpose of M. If m, n ∈ N, an (m] × n])-matrix M is called an m-by-n-matrix. Its rows are lists of length n and its columns are lists of length m. If m and n are small, an m-by-n-matrix can often be specified by a bookkeeping scheme of the form

⎛ M1,1  M1,2  · · ·  M1,n ⎞
⎜ M2,1  M2,2  · · ·  M2,n ⎟
⎜  ...    ...          ...  ⎟   := M. (02.4)
⎝ Mm,1  Mm,2  · · ·  Mm,n ⎠

A family M whose index set is the set-square I^2 of a given set I is called a square matrix. We then say that M is symmetric if M = M⊤, i.e. if Mi,j = Mj,i for all i, j ∈ I. The family (Mi,i | i ∈ I) is called the diagonal of M. We say that a square matrix M in R (or in any set containing a zero element 0) is a diagonal matrix if Mi,j = 0 for all i, j ∈ I with i ≠ j.
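A minimal Python sketch of an (I × J)-matrix as a family indexed on I × J, with the transpose as just defined (all names ad hoc):

    # An (I x J)-matrix as a dict indexed on the set-product I x J.
    I, J = {1, 2}, {"a", "b", "c"}
    M = {(i, j): f"{i}{j}" for i in I for j in J}   # M_{i,j} is the (i,j)-term

    def transpose(M):
        """(M^T)_{j,i} := M_{i,j} for all (j,i) in J x I."""
        return {(j, i): M[(i, j)] for (i, j) in M}

    row_1 = {j: M[(1, j)] for j in J}   # the 1-row (M_{1,j} | j in J)
    assert transpose(transpose(M)) == M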

Let a set S be given. For every U ∈ Sub S we define the characteristic family chU⊂S of U in S by

(chU⊂S)x :=  1 if x ∈ U,
             0 if x ∈ S \ U.    (02.5)

If the context makes it clear what S is, we abbreviate chU⊂S to chU.

A family whose index set is N or N× is called a sequence.


Notes 02

(1) A family is sometimes called an "indexed set". The trouble with this term is that an "indexed set" is not a set. The notation {ai}i∈I is used by some people for what we denote by (ai | i ∈ I). Some people even use just {ai}, which is poison because of the dangling dummy i and because it also denotes a singleton with member ai.

(2) The terms "member" or "entry of a family" are often used instead of "term of a family". The use of "member" can lead to confusion with "member of a set".

(3) One often finds the barbarism "n-tuple" for what we call "list of length n". The term "finite sequence" is sometimes used for what we call "list". A list of numbers is very often called a "vector". I prefer to use the term "vector" only when it has its original geometric meaning (see Def.1 of Sect.32).

(4) The terms "Cartesian product" and "direct product" are often used for what we call "set-product".

(5) In most of the literature, the use of the term "matrix" is confined to the case when the index set is of the form n] × m] and when the terms are numbers of some kind. The generalization used here turns out to be very useful.

03 Mappings

In order to specify a mapping f, one first has to prescribe two sets, say D and C, and then some kind of prescription, called the assignment rule of f, by which one can assign to every element x ∈ D an element f(x) ∈ C. We call f(x) the value of f at x. It is very important to distinguish very carefully between the mapping f and its values f(x), x ∈ D. The set D of objects to which the prescription embodied in f can be applied is called the domain of the mapping f and is denoted by Dom f := D. The set C to which the values of f must belong is called the codomain of f and is denoted by Cod f := C. In order to put C and D into full view, we often write

f : D → C   or   D --f--> C

instead of just f and we say that f maps D to C or that f is a mapping from D to C. The phrase "f is defined on D" expresses the assertion that D is the domain of f. If f and g are mappings with Dom f = Dom g, Cod f = Cod g, and f(x) = g(x) for all x ∈ Dom f, then f and g are considered to coincide, i.e. f = g.

Terms such as "function", "map", "functional", "transformation", and "operator" are often used to mean the same thing as "mapping". The term "function" is preferred when the codomain is the set of real or complex numbers or a subset thereof. A still greater variety of names is used for mappings having special properties. Also, in some contexts, the value of f at x is not written f(x) but fx, xf, f_x, or x^f.

In order to specify a mapping f explicitly without introducing unnecessary symbols, it is often useful to employ the notation (x ↦ f(x)) : Dom f → Cod f instead of just f. (Note the use of ↦ instead of →.) For example, (x ↦ 1/x) : R× → R denotes the function f with Dom f := R×, Cod f := R and evaluation rule

f(x) := 1/x for all x ∈ R×.

The graph of a mapping f : D → C is the subset Gr (f) of the set-product D × C defined by

Gr (f) := {(x, y) ∈ D × C | y = f(x)}. (03.1)

The mappings f and g coincide if and only if they have the same domain, codomain, and graph.

Remark: Very often, a mapping is specified by two sets D and C and a statement scheme F (x, y), which may become valid or not, depending on what elements of D and C are substituted for x and y, respectively. If, for every x ∈ D, there is exactly one y ∈ C such that F (x, y) is valid, then F defines a mapping f : D → C, namely by the prescription that assigns to x ∈ D the unique y ∈ C that makes F (x, y) valid. Then

Gr(f) = {(x, y) | F (x, y) is valid}.

In some cases, given x ∈ D, one can define or obtain f(x) by a formula, algorithm, or other procedure. Finding an efficient procedure to this end is often a difficult task.

With every mapping f we can associate the family (f(x) | x ∈ Dom f) of its values. Roughly, the family is obtained from the mapping by forgetting the codomain. Conversely, with every family a := (ai | i ∈ I) and every set C that includes Rng a we can associate the mapping (i ↦ ai) : I → C. Roughly, the mapping is obtained from the family by specifying the codomain C. For example, if U is a subset of a given set S, we can obtain from the characteristic family of U in S, defined by (02.5), the characteristic function of U in S, also denoted by chU⊂S or simply chU, by specifying a codomain, usually R.


The range of a mapping f is defined to be the range of the family of its values; i.e., with the notation (02.1), we have

Rng f := {f(x) | x ∈ Dom f }. (03.2)

We say that f is surjective if its codomain and range coincide, i.e. if Cod f = Rng f.

We say that a mapping f is injective if the family of its values is injective; i.e. if for all x, y ∈ D

f(x) = f(y) =⇒ x = y.

A mapping f : D → C is said to be invertible if it is both injective and surjective. This is the case if and only if there is a mapping g : C → D such that for all x ∈ D and y ∈ C

y = f(x) ⇐⇒ x = g(y).

The mapping g is then uniquely determined by f; it is called the inverse of f and is denoted by f←. A given mapping f : D → C is invertible if and only if, for each y ∈ C, the problem

? x ∈ D, f(x) = y

has exactly one solution. This solution is then given by f←(y). Invertible mappings are also called bijections and injective mappings injections.

Let U be a subset of a given set S. The inclusion mapping of U into S is the mapping 1U⊂S : U → S defined by the rule

1U⊂S(x) := x for all x ∈ U. (03.3)

The inclusion mapping of the set S into itself is called the identity mapping of S and is denoted by 1S := 1S⊂S. Inclusion mappings are injective. An identity mapping is invertible and equal to its own inverse.

A constant is a mapping whose range is a singleton. We sometimes use the symbol cD→C to denote the constant with domain D, codomain C, and range {c}. In most cases, we can identify cD→C with c itself, thus using the same symbol for the constant and its only value.

If f and g are mappings such that Dom g = Cod f, we define the composite g ∘ f : Dom f → Cod g of f and g by the evaluation rule

(g ∘ f)(x) := g(f(x)) for all x ∈ Dom f. (03.4)


If f, g and h are mappings with Dom g = Cod f and Cod g = Dom h, then

(h ∘ g) ∘ f = h ∘ (g ∘ f). (03.5)

Because of this rule, we may omit parentheses and write h ∘ g ∘ f.

Let f be a mapping from a set D to itself. For every n ∈ N, the n'th iterate f^∘n : D → D of f is defined recursively by f^∘0 := 1D and f^∘(k+1) := f ∘ f^∘k for all k ∈ N. We have f^∘1 = f, f^∘2 = f ∘ f, f^∘3 = f ∘ f ∘ f, etc. An element z ∈ D is called a fixed point of f if f(z) = z. We say that given mappings f and g, both from D to itself, commute if f ∘ g = g ∘ f.
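A minimal Python sketch of the n'th iterate, following the recursive definition (the name iterate is ad hoc):

    # The n'th iterate of a mapping f from a set D to itself.
    def iterate(f, n):
        """Return the n'th iterate of f: the identity for n = 0."""
        def fn(x):
            for _ in range(n):
                x = f(x)
            return x
        return fn

    half = lambda x: x // 2
    assert iterate(half, 0)(40) == 40     # the 0'th iterate is the identity
    assert iterate(half, 3)(40) == 5      # f(f(f(40))) = 5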

Let a mapping f : D → C be given. A mapping g : C → D is called a right-inverse of f if f ∘ g = 1C, a left-inverse of f if g ∘ f = 1D. If f has a right-inverse, it must be surjective; if f has a left-inverse, it must be injective. If g is a right-inverse of f and h a left-inverse of f, then f is invertible and g = h = f←. We always have

f ∘ 1Dom f = 1Cod f ∘ f = f. (03.6)

Again, let a mapping f : D → C be given. We define the image mapping f> : Sub D → Sub C of f by the evaluation rule

f>(U) := {f(x) | x ∈ U } for all U ∈ Sub D (03.7)

and the pre-image mapping f< : Sub C → Sub D of f by the rule

f<(V ) := {x ∈ D | f(x) ∈ V } for all V ∈ Sub C. (03.8)

The mappings f> and f< satisfy the following rules for all subsets U and U ′ of D and all subsets V and V ′ of C:

U ⊂ U ′ =⇒ f>(U) ⊂ f>(U ′), (03.9)

V ⊂ V ′ =⇒ f<(V ) ⊂ f<(V ′), (03.10)

U ⊂ f<(f>(U)), f>(f<(V )) = V ∩ Rng f, (03.11)

f>(U ∪ U ′) = f>(U) ∪ f>(U ′), f>(U ∩ U ′) ⊂ f>(U) ∩ f>(U ′), (03.12)

f<(V ∪ V ′) = f<(V ) ∪ f<(V ′), f<(V ∩ V ′) = f<(V ) ∩ f<(V ′), (03.13)

f<(C \ V ) = D \ f<(V ). (03.14)

The inclusions ⊂ in (03.11) and (03.12) become equalities if f is injective. If f is injective, so is f>, and f< is a left-inverse of f>. If f is surjective, so is f>, and f< is a right-inverse of f>. If f is invertible, then (f>)← = f<. If f and g are mappings such that Dom g = Cod f, then

(g ∘ f)> = g> ∘ f>,  (g ∘ f)< = f< ∘ g<, (03.15)


Rng (g ∘ f) = g>(Rng f). (03.16)

If chV⊂C is the characteristic function of a subset V of a given set C and if f : D → C is given, we have

chV⊂C ∘ f = chf<(V)⊂D. (03.17)
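For a mapping given by a finite table, the image and pre-image mappings can be computed directly; a minimal Python sketch checking the first part of (03.11) and the identity f>(f<(V )) = V ∩ Rng f (all names ad hoc):

    # Image and pre-image mappings of a mapping f stored as a dict from D to C.
    def image(f, U):
        """f>(U) := {f(x) | x in U}."""
        return {f[x] for x in U}

    def preimage(f, V):
        """f<(V) := {x in D | f(x) in V}."""
        return {x for x in f if f[x] in V}

    f = {1: "a", 2: "a", 3: "b"}            # D = {1, 2, 3}
    Rng_f = set(f.values())                 # {"a", "b"}
    U, V = {1, 2}, {"a", "c"}
    assert U <= preimage(f, image(f, U))            # U ⊂ f<(f>(U))
    assert image(f, preimage(f, V)) == V & Rng_f    # f>(f<(V)) = V ∩ Rng f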

Let a mapping f and sets A and B be given. We define the restriction f |A of f to A by Dom f |A := A ∩ Dom f, Cod f |A := Cod f, and the evaluation rule

f |A(x) := f(x) for all x ∈ A ∩ Dom f. (03.18)

We define the mapping

f|^B_A : A ∩ f<(B ∩ Cod f) → B (03.19)

by the rule

f|^B_A(x) := f(x) for all x ∈ A ∩ f<(B ∩ Cod f). (03.20)

We say that f|^B_A is an adjustment of f. We have f |A = f|^{Cod f}_A. We use the abbreviations

f|^B := f|^B_{Dom f},  f|Rng := f|^{Rng f}_{Dom f}. (03.21)

We have Dom (f|^B) = Dom f if and only if Rng f ⊂ B. We note that

f|^B_A = (f |A)|^B = (f|^B)|A. (03.22)

Let f be a mapping from a set D to itself. We say that a subset A of D is f-invariant if f>(A) ⊂ A. If this is the case, we define the A-adjustment f_{|A} : A → A of f by

f_{|A} := f|^A_A. (03.23)

Let a set A and a collection C of subsets of A be given such that A ∈ C. For every S ∈ Sub A, the subcollection {U ∈ C | S ⊂ U } of C then contains A and hence is not empty. We define Sp : Sub A → Sub A, the span-mapping corresponding to C, by the rule

Sp (S) := ⋂{U ∈ C | S ⊂ U } for all S ∈ Sub A. (03.24)

The following rules hold for all S, T ∈ Sub A:

S ⊂ Sp (S), (03.25)


Sp (Sp (S)) = Sp (S), (03.26)

S ⊂ T =⇒ Sp (S) ⊂ Sp (T ). (03.27)

If the collection C is intersection-stable, i.e. if for every non-empty subcollection D of C we have ⋂D ∈ C, then, for all U ∈ Sub A,

Sp (U) = U ⇐⇒ U ∈ C, (03.28)

and we have Rng Sp = C. Also, for every S ∈ Sub A, Sp (S) is the smallest (with respect to inclusion) member of C that includes S.

Many of the definitions and rules of this section can be applied if one or more of the mappings involved are replaced by a family. For example, if a = (ai | i ∈ I) is a family in a set S and if f is a mapping with Dom f = S, then f ∘ a denotes the family given by f ∘ a := (f(ai) | i ∈ I).

Notes 03

(1) Some of the literature still confuses a mapping f with its value f(x), and one still finds phrases such as "Consider the function f(x)", which contains the dangling dummy x and hence is poison.

(2) There is some confusion in the literature concerning the term "image of a mapping f". Some people use it for what we call "value of f" and others for what we call "range of f". Occasionally, the term "range" is used for what we call "codomain".

(3) Many people do not consider the specification of a codomain as part of the specification of a mapping. If we did likewise, we would have no formal distinction between a family and a mapping. I believe it is very useful to have such a distinction.

(4) The terms "onto" for "surjective" and "one-to-one" for "injective" are very often used. Also "one-to-one correspondence" is a common term for what we call "bijection" or "invertible mapping". We sometimes use "one-to-one correspondence" informally.

(5) The notation f← for the inverse of the mapping was introduced in about 1973 by J. J. Schaffer and me. The notation f^−1 is more common, but it clashes with notations for value-wise reciprocals. However, for linear mappings, it is useful to revert to the more traditional notation (see Sect.13).

(6) The notation idS is often used for the identity mapping 1S of the set S.

(7) The notations 1U⊂S for an inclusion mapping and cD→C for a constant were introduced by J. J. Schaffer and me in about 1973. Also, we started to write f^∘n instead of the more common f^n for the n'th iterate of f to avoid a clash with notations for value-wise powers.


(8) The notations f> and f< for the image mapping and pre-image mapping of a mapping f were introduced by J. J. Schaffer and me in about 1973. Many years before, I had introduced the notations f_∗ and f^∗, which were also introduced, independently, by S. MacLane and G. Birkhoff in their book "Algebra" (MacMillan, 1967). In most of the literature, the same symbol f is used for the image mapping as for the mapping f itself. The pre-image mapping of f is often denoted by ⁻¹f or f^−1, the latter leading to confusion with the inverse (if there is one). A distinctive notation for image and pre-image mappings avoids a lot of confusion and leads to great economy when expressing relations (see, for example, (56.3)).

(9) The definition of the adjustment f|^B_A of a mapping f, in the generality given here, is new. For the case when f>(A) ⊂ B it has been used by J. J. Schaffer and me since about 1973. I used the notation B|f|A instead of f|^B_A in a paper published in 1971. Most people "talk around" such adjustments and do not use an explicit notation. The notation f |A for the restriction is in fairly common use. The notation f_{|A} for f|^A_A was introduced recently by J. J. Schaffer.

04 Families of Sets; Families and Sets of Mappings

Let (Ai | i ∈ I) be a family of sets, i.e. a family whose terms Ai are all sets. The range of this family is the collection {Ai | i ∈ I} of sets; it is non-empty if and only if I is not empty. We define the union and, if I ≠ ∅, the intersection of the family to be, respectively, the union and intersection of the range. We use the notations

⋃_{i∈I} Ai = ⋃(Ai | i ∈ I) := ⋃{Ai | i ∈ I}, (04.1)

⋂_{i∈I} Ai = ⋂(Ai | i ∈ I) := ⋂{Ai | i ∈ I} if I ≠ ∅. (04.2)

Given any set S, we have the following generalizations of (01.9) and (01.16):

S ∩ ⋃_{i∈I} Ai = ⋃_{i∈I} (S ∩ Ai),  S ∪ ⋂_{i∈I} Ai = ⋂_{i∈I} (S ∪ Ai), (04.3)

S \ ⋃_{i∈I} Ai = ⋂_{i∈I} (S \ Ai),  S \ ⋂_{i∈I} Ai = ⋃_{i∈I} (S \ Ai). (04.4)

Given any mapping f with ⋃(Ai | i ∈ I) ⊂ Dom f we have the following generalizations of (03.12):

f>(⋃_{i∈I} Ai) = ⋃_{i∈I} f>(Ai),  f>(⋂_{i∈I} Ai) ⊂ ⋂_{i∈I} f>(Ai), (04.5)


and the inclusion becomes an equality if f is injective. If f is a mapping with ⋃(Ai | i ∈ I) ⊂ Cod f, we have the following generalizations of (03.13):

f<(⋃_{i∈I} Ai) = ⋃_{i∈I} f<(Ai),  f<(⋂_{i∈I} Ai) = ⋂_{i∈I} f<(Ai). (04.6)

The set-product of the family (Ai | i ∈ I) of sets is the set of all families a with index set I such that ai ∈ Ai for all i ∈ I. This set-product is denoted by

×_{i∈I} Ai = ×(Ai | i ∈ I).

It generalizes the set-product A1 × A2 of sets A1, A2 as defined in Sect.02, because A1 × A2 is the set-product of the pair (A1, A2) in the sense just defined. If the terms of a family (Ai | i ∈ I) are all the same, i.e. if there is a set S such that Ai = S for all i ∈ I, then the set-product reduces to the set-power defined in Sect.02, i.e. ×(Ai | i ∈ I) = S^I.

Let a family (Ai | i ∈ I) of sets and j ∈ I be given. Then the mapping

(a ↦ (aj , a|_{I\{j}})) : ×_{i∈I} Ai → Aj × ×_{i∈I\{j}} Ai (04.7)

is a natural bijection.

Let A1 and A2 be sets. We define the two evaluations

ev1 : A1 × A2 → A1 and ev2 : A1 × A2 → A2

by the rules

ev1(a) := a1, ev2(a) := a2 for all a ∈ A1 × A2. (04.8)

We have

a = (ev1(a), ev2(a)) for all a ∈ A1 × A2.

More generally, given a family (Ai | i ∈ I) of sets, we can define, for each j ∈ I, the evaluation

evj : ×_{i∈I} Ai → Aj

by the rule

evj(a) := aj for all a ∈ ×_{i∈I} Ai, (04.9)


and we have

a = (evi(a) | i ∈ I) for all a ∈ ×_{i∈I} Ai.

Given any sets S and T, we denote the set of all mappings f with Dom f = S and Cod f = T by Map (S, T ). The set of all injective mappings from S to T will be denoted by Inj (S, T ). The set of all invertible mappings from a set S to itself is denoted by Perm S and its members are called permutations of S.

Let S and T be sets. For each x ∈ S, we define the evaluation at x, evx : Map (S, T ) → T, by the rule

evx(f) := f(x) for all f ∈ Map (S, T ). (04.10)

Assume now that a subset F of Map (S, T ) is given. Then the restriction evx |F is also called the evaluation at x and is simply denoted by evx if the context makes clear what F is. We define the mapping evF : S → Map (F, T ) by

(evF (x))(f) := evx(f) = f(x) for all x ∈ S, f ∈ F. (04.11)

This mapping evF, or a suitable adjustment of it, is called an evaluation mapping; it is simply denoted by ev if the context makes clear what F and the adjustments are.

There is a natural bijection from the set Map (S, T ) of all mappings from S to T onto the set T^S of all families in T with index set S, as described at the end of Sect.03. If U is a subset of T we have U^S ⊂ T^S, but Map (S, U) is not a subset of Map (S, T ). However, we have the natural injection

(f ↦ f|^T ) : Map (S, U) → Map (S, T ).

Let f and g be mappings having the same domain D. The value-wise pair formation

(x ↦ (f(x), g(x))) : D → Cod f × Cod g (04.12)

will be identified with the pair (f, g), so that

(f, g)(x) := (f(x), g(x)) for all x ∈ D. (04.13)

Thus, given any sets D, C1, C2, we obtain the identification

Map (D, C1) × Map (D, C2) ≅ Map (D, C1 × C2). (04.14)


More generally, given any family (fi | i ∈ I) of mappings, all having the same domain D, we identify the term-wise evaluation

(x ↦ (fi(x) | i ∈ I)) : D → ×_{i∈I} Cod fi (04.15)

with the family itself, so that

(fi | i ∈ I)(x) := (fi(x) | i ∈ I) for all x ∈ D. (04.16)

Thus, given any set D and any family (Ci | i ∈ I) of sets, we obtain the identification

×_{i∈I} Map (D, Ci) ≅ Map (D, ×_{i∈I} Ci). (04.17)

Let f and g be any mappings. We call the mapping

f × g : Dom f × Dom g → Cod f × Cod g

defined by

(f × g)(x, y) := (f(x), g(y)) for all x ∈ Dom f, y ∈ Dom g (04.18)

the cross-product of f and g.

More generally, given any family (fi | i ∈ I) of mappings, we call the mapping

×_{i∈I} fi = ×(fi | i ∈ I) : ×_{i∈I} Dom fi → ×_{i∈I} Cod fi

defined by

(×_{i∈I} fi)(x) := (fi(xi) | i ∈ I) for all x ∈ ×_{i∈I} Dom fi (04.19)

the cross-product of the family (fi | i ∈ I). If the terms of the family are all the same, i.e., if there is a mapping g such that fi = g for all i ∈ I, we write g^{×I} := ×(fi | i ∈ I) and call it the I-cross-power of g. If no confusion can arise, we write simply g(c) := g^{×I}(c) = (g(ci) | i ∈ I) when c ∈ (Dom g)^I.

Given any families (Di | i ∈ I) and (Ci | i ∈ I) of sets, the mapping

((fi | i ∈ I) ↦ ×_{i∈I} fi) : ×_{i∈I} Map (Di, Ci) → Map (×_{i∈I} Di, ×_{i∈I} Ci) (04.20)


is a natural injection.

Let sets S and T be given. For every a ∈ S and every b ∈ T we define mappings (a, ·) : T → S × T and (·, b) : S → S × T by the rules

(a, ·)(y) := (a, y) and (·, b)(x) := (x, b) (04.21)

for all x ∈ S, y ∈ T. A mapping f whose domain Dom f is a subset of a set-product S × T is often called a "function of two variables". Its value at (x, y) ∈ S × T is written simply f(x, y) rather than f((x, y)). Given such a mapping and any a ∈ S, b ∈ T, we define

f(a, ·) := f ∘ (a, ·)|^{Dom f},  f(·, b) := f ∘ (·, b)|^{Dom f}. (04.22)

Hence we have

f(·, y)(x) = f(x, y) = f(x, ·)(y) for all (x, y) ∈ Dom f. (04.23)

More generally, if (Ai | i ∈ I) is a family of sets, we define, for each j ∈ I and each c ∈ ×(Ai | i ∈ I \ {j}), the mapping

(c.j) : Aj → ×_{i∈I} Ai

by the rule

((c.j)(z))i :=  ci if i ∈ I \ {j},
                z  if i = j.    (04.24)

If a ∈ ×(Ai | i ∈ I), we abbreviate (a.j) := (a|_{I\{j}}.j). If I = {1, 2}, we have (a.1) = (·, a2), (a.2) = (a1, ·). Given a mapping f whose domain is a subset of ×(Ai | i ∈ I) and given j ∈ I and c ∈ ×(Ai | i ∈ I \ {j}) or c ∈ ×(Ai | i ∈ I), we define

f(c.j) := f ∘ (c.j)|^{Dom f}. (04.25)

We have

f(x.j)(xj) = f(x) for all x ∈ ×_{i∈I} Ai. (04.26)

Let S, T and C be sets. Given any mapping f : S × T → C, we identify the mapping

(x ↦ f(x, ·)) : S → Map (T, C) (04.27)


with f itself, so that

(f(x, ·))(y) := f(x, y) for all x ∈ S, y ∈ T. (04.28)

Thus, we obtain the identification

Map (S × T, C) ≅ Map (S, Map (T, C)). (04.29)
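The identification (04.29) is what programmers call "currying"; a minimal Python sketch (the name curry is ad hoc):

    # Passing from f : S x T -> C to the mapping x |-> f(x, .).
    def curry(f):
        """Send f to the mapping that assigns to each x the mapping f(x, .)."""
        return lambda x: (lambda y: f(x, y))

    f = lambda x, y: x + 2 * y
    g = curry(f)
    assert g(3)(4) == f(3, 4) == 11   # (f(x, .))(y) = f(x, y), as in (04.28)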

Notes 04

(1) The notation ⋃(Ai | i ∈ I) for the union of the family (Ai | i ∈ I) is used when it occurs in the text rather than in a displayed formula. For typographical reasons, the notation on the left of (04.1) can be used only in displayed formulas. A similar remark applies to intersections, set-products, sums, products, and other operations on families (see (04.2) and (07.1)).

(2) The term "projection" is often used for what we call "evaluation", in particular when the domain is a set-product rather than a set of mappings.

(3) The notation ∏(Ai | i ∈ I) is sometimes used for the set-product ×(Ai | i ∈ I), which is often called "Cartesian product" or "direct product" (see Note (4) to Sect.02).

(4) The notations (a, ·) and (·, b) as defined by (04.21) and the notation (c.j) as defined by (04.24) were introduced by me a few years ago. The notations f(·, y) and f(x, ·), using (04.23) as definitions rather than propositions, are fairly common in the literature.

(5) Many people use the term "permutation" only if the domain is a finite set. I cannot see any advantage in such a limitation.

05 Finite Sets

The cardinal of a finite set S, i.e. the number of elements in S, will be denoted by ♯ S ∈ N. Given n ∈ N, the range S := {ai | i ∈ n]} of an injective list (ai | i ∈ n]) is a finite set with cardinal ♯ S = n. By an enumeration of a finite set S we mean a list (ai | i ∈ n]) of length n := ♯ S whose range is S, or the corresponding mapping (i ↦ ai) : n] → S. There are n! such enumerations. We have ♯ (n]) = ♯ (n[) = n for all n ∈ N.

Let S and T be finite sets. Then S ∩ T, S ∪ T, S \ T, and S × T are again finite and their cardinals satisfy the rules

♯ (S ∪ T ) + ♯ (S ∩ T ) = ♯ S + ♯ T, (05.1)

♯ (S \ T ) = ♯ S − ♯ (S ∩ T ) = ♯ (S ∪ T ) − ♯ T, (05.2)

♯ (S × T ) = (♯ S)(♯ T ). (05.3)


We say that a family is finite if its index set is finite. If (Ai | i ∈ I) is a finite family of finite sets, then ×(Ai | i ∈ I) is finite and

♯ (×_{i∈I} Ai) = ∏_{i∈I} (♯ Ai). (05.4)

(See Sect.07 concerning the product ∏(ni | i ∈ I) of a finite family (ni | i ∈ I) in N.)

Let P be a partition of the given finite set S. Then P and the members of P are all finite and

♯ S = ∑_{P∈P} (♯ P ). (05.5)

(See Sect.07 concerning the sum ∑(ni | i ∈ I) of a finite family (ni | i ∈ I) in N.)

Pigeonhole Principle: Let f be a mapping with finite domain and codomain. If f is injective, then ♯ Dom f ≤ ♯ Cod f. If f is surjective, then ♯ Dom f ≥ ♯ Cod f. If ♯ Dom f = ♯ Cod f then the following are equivalent (see the sketch following this list):

(i) f is injective;

(ii) f is surjective;

(iii) f is invertible.
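The equivalence of (i) and (ii) can be verified exhaustively for small sets; a minimal Python sketch over all mappings of a three-element set to itself:

    # Brute-force check of (i) <=> (ii) when domain and codomain have equal cardinal.
    from itertools import product

    D = [0, 1, 2]
    for values in product(D, repeat=len(D)):   # each tuple is a mapping D -> D
        injective = len(set(values)) == len(D)
        surjective = set(values) == set(D)
        assert injective == surjective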

If S and T are finite sets, then the set T^S of all families in T indexed on S and the set Map (S, T ) of all mappings from S to T are finite and

♯ (T^S) = ♯ Map (S, T ) = (♯ T )^{♯ S}. (05.6)

If S is finite, so is Sub S and

♯ Sub S = 2^{♯ S}. (05.7)

Let S be any set. We denote the set of all finite subsets of S by Fin S, and the set of all subsets of S having m elements by Fin_m S, so that

Fin_m S := {U ∈ Fin S | ♯ U = m}. (05.8)

If S is finite, so is Fin_m S. The notation

\binom{n}{m} := ♯ Fin_m(n])

(read "n choose m") is customary, and we have

♯ Fin_m(S) = \binom{♯ S}{m} for all m ∈ N. (05.9)

We have the following rules, valid for all n, m ∈ N:

\binom{n}{m} = 0 if m > n,  \binom{n}{0} = \binom{n}{n} = 1, (05.10)

\binom{n+m}{m} = \binom{n+m}{n}, (05.11)

\binom{n+1}{m+1} = \binom{n}{m} + \binom{n}{m+1}, (05.12)

\binom{m+n}{m} m! = ∏_{k∈m]} (n + k) = (m + n)!/n!, (05.13)

∑_{k∈(n+1)[} \binom{n}{k} = 2^n. (05.14)
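Rules (05.12) and (05.14) are easy to spot-check numerically; a minimal Python sketch using math.comb:

    # Spot-checking the binomial rules (05.12) and (05.14).
    from math import comb

    for n in range(8):
        for m in range(8):
            assert comb(n + 1, m + 1) == comb(n, m) + comb(n, m + 1)   # (05.12)
        assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n         # (05.14)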

If S is a finite set, then the set Perm S of all permutations of S is again finite and

♯ Perm S = (♯ S)!. (05.15)

If S and T are finite sets, then the set Inj (S, T ) of all injections from S to T is again finite and we have Inj (S, T ) = ∅ if ♯ S > ♯ T while

♯ Inj (S, T ) = (♯ T )!/(♯ T − ♯ S)! if ♯ S ≤ ♯ T. (05.16)
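Formula (05.16) can be checked by brute-force enumeration; a minimal Python sketch (the sets chosen are arbitrary):

    # Counting Inj(S, T) by enumeration against (05.16).
    from itertools import permutations
    from math import factorial

    S, T = {1, 2, 3}, {"a", "b", "c", "d", "e"}
    # Each injection from S to T corresponds to an arrangement of #S members of T:
    count = sum(1 for _ in permutations(T, len(S)))
    assert count == factorial(len(T)) // factorial(len(T) - len(S))   # 5!/2! = 60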

Notes 05

(1) Many people use |S| for the cardinal ♯S of a finite set.

(2) The notations Fin S and Fin_m S are used here for the first time. J. J. Schaffer and I have used F(S) and F_m(S), respectively, since about 1973.


06 Basic Algebra

A pre-monoid is a set M endowed with structure by the prescription of a mapping cmb : M × M → M called combination, which satisfies the associative law

cmb(cmb(a, b), c) = cmb(a, cmb(b, c)) for all a, b, c ∈ M. (06.1)

A monoid M is a pre-monoid, with combination cmb, endowed with additional structure by the prescription of a neutral n ∈ M which satisfies the neutrality law

cmb(a, n) = cmb(n, a) = a for all a ∈ M. (06.2)

A group G is a monoid, with combination cmb and neutral n, endowed with additional structure by the prescription of a mapping rev : G → G, called reversion, which satisfies the reversion law

cmb(a, rev(a)) = cmb(rev(a), a) = n for all a ∈ G. (06.3)

Let H be a subset of a pre-monoid M . We say that H is a sub-pre-monoid if it is stable under combination, i.e. if

cmb>(H × H) ⊂ H. (06.4)

If this is the case, then the adjustment cmb|^H_{H×H} endows H with the natural structure of a pre-monoid.

Let H be a subset of a monoid M. We say that H is a submonoid of M if it is a sub-pre-monoid of M and if it contains the neutral n of M. If this is the case, then the designation of n as the neutral of H endows H with the natural structure of a monoid.

Let H be a subset of a group G. We say that H is a subgroup of G if it is a submonoid of G and if H is reversion-invariant, i.e. if

rev>(H) ⊂ H.

If this is the case, then the adjustment rev_{|H} (see (03.23)) endows H with the natural structure of a group. The singleton {n} is a subgroup of G, called the neutral-subgroup of G.

Let M be a pre-monoid. Then M contains at most one element n which satisfies the neutrality law (06.2). If such an element exists, we say that M is monoidable, because we can use n to endow M with the natural structure of a monoid.


Let M be a monoid. Then there exists at most one mapping rev : M → M which satisfies the reversion law (06.3). If such a mapping exists, we say that M is groupable because we can use rev to endow M with the natural structure of a group. We say that a pre-monoid M is groupable if it is monoidable and if the resulting monoid is groupable. If G is a group, then every groupable sub-pre-monoid of G is in fact a subgroup.

Pitfall: A monoidable sub-pre-monoid of a monoid need not be a submonoid. For example, the set of natural numbers N with multiplication as combination and 1 as neutral is a monoid and the singleton {0} a sub-pre-monoid of N. Now, {0} is monoidable, in fact groupable, but it does not contain 1 and hence is not a submonoid of N.

Let E be a set. The set Map(E, E) of all mappings from E to itself becomes a monoid if we define its combination by composition, i.e. by cmb(f, g) := f ∘ g for all f, g ∈ Map(E, E), and if we designate the identity 1_E to be its neutral. We call this monoid the transformation-monoid of E. The set Perm E of all permutations of E is a groupable submonoid of Map(E, E). It becomes a group if we designate the reversion to be the inversion of mappings, i.e. if rev(f) := f← for all f ∈ Perm E. We call this group Perm E the permutation-group of E.

Let G and G′ be groups with combinations cmb, cmb′, neutrals n, n′, and reversions rev and rev′, respectively. We say that τ : G → G′ is a homomorphism if it preserves combinations, i.e. if

cmb′(τ(a), τ(b)) = τ(cmb(a, b)) for all a, b ∈ G. (06.5)

It then also preserves neutrals and reversions, i.e.

n′ = τ(n) and rev′(τ(a)) = τ(rev(a)) for all a ∈ G. (06.6)

If τ is invertible, then τ← : G′ → G is again a homomorphism; we then say that τ is a group-isomorphism.

Let τ : G → G′ be a homomorphism. Then the image τ>(H) under τ of every subgroup H of G is a subgroup of G′, and the pre-image τ<(H′) under τ of every subgroup H′ of G′ is a subgroup of G. The pre-image of the neutral-subgroup {n′} of G′ is called the kernel of τ and is denoted by

Ker τ := τ<({n′}). (06.7)

The homomorphism τ is injective if and only if its kernel is the neutral-subgroup of G, i.e. Ker τ = {n}.
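As an aside not in the original text, the kernel criterion can be watched in action on a familiar homomorphism. The sketch below (Python, with illustrative names) uses the additive homomorphism τ : Z → Z/6Z given by τ(a) = a mod 6, examined on a finite window of Z:

    def tau(a):
        return a % 6          # additive homomorphism from Z onto Z/6Z

    window = range(-30, 31)
    kernel = [a for a in window if tau(a) == 0]   # pre-image of the neutral 0
    print(kernel)             # the multiples of 6: the kernel is larger than {0} ...
    injective = len({tau(a) for a in window}) == len(window)
    print(injective)          # ... and, consistently with the criterion, tau is not injective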


We say that a pre-monoid or monoid M is cancellative if it satisfies the following cancellation laws for all b, c ∈ M:

(cmb(a, b) = cmb(a, c) for some a ∈ M) =⇒ b = c, (06.8)

(cmb(b, a) = cmb(c, a) for some a ∈ M) =⇒ b = c. (06.9)

A group is always cancellative.

One often uses multiplicative terminology and notation when dealing with a pre-monoid, monoid, or group. This means that one uses the term “multiplication” for the combination, one writes ab := cmb(a, b) and calls it the “product” of a and b, one calls the neutral “unity” and denotes it by 1, and one writes a^{−1} := rev(a) and calls it the “reciprocal” of a. The number sets N, Z, Q, R, P, and C (see Sect.00) are all multiplicative monoids and Q×, R×, P×, and C× are multiplicative groups. If M is a multiplicative pre-monoid and if S and T are subsets of M, we write

ST := cmb>(S × T) = {st | s ∈ S, t ∈ T} (06.10)

and call it the member-wise product of S and T. If t ∈ M, we abbreviate St := S{t} and tS := {t}S. If G is a multiplicative group and S a subset of G, we write

S^{−1} := rev>(S) = {s^{−1} | s ∈ S} (06.11)

and call it the member-wise reciprocal of S.

A pre-monoid, monoid, or group M is said to be commutative if

cmb(a, b) = cmb(b, a) for all a, b ∈ M. (06.12)

If this is the case, one often uses additive terminology and notation. This means that one uses the term “addition” for the combination, one writes a + b := cmb(a, b) and calls it the “sum” of a and b, one calls the neutral “zero” and denotes it by 0, and one writes −a := rev(a) and calls it the “opposite” of a. The abbreviation a − b := a + (−b) is customary. The number sets N and P are additive monoids while Z, Q, R, and C are additive groups. If M is an additive pre-monoid and if S and T are subsets of M, we write

S + T := cmb>(S × T) = {s + t | s ∈ S, t ∈ T} (06.13)

and call it the member-wise sum of S and T. If t ∈ M, we abbreviate

S + t := S + {t} = {t} + S =: t + S. (06.14)


If G is an additive group and if S is a subset of G, we write

−S := rev>(S) = {−s | s ∈ S} (06.15)

and call it the member-wise opposite of S. If S and T are subsets of M, we write

T − S := {t − s | t ∈ T, s ∈ S} (06.16)

and call it the member-wise difference of T and S.

A ring is a set R endowed with the structure of a commutative group, called the additive structure of R and described with additive terminology and notation, and the structure of a monoid, called the multiplicative structure of R and described with multiplicative terminology and notation, provided that the following distributive laws hold:

(a + b)c = ac + bc,  c(a + b) = ca + cb  for all a, b, c ∈ R. (06.17)

A ring then has a zero, denoted by 0, and a unity, denoted by 1. We have 0 ≠ 1 unless R is a singleton. The following rules hold for all a, b, c ∈ R:

(a − b)c = ac − bc, c(a − b) = ca − cb, (06.18)

(−a)b = a(−b) = −(ab), (−a)(−b) = ab, (06.19)

a0 = 0a = 0. (06.20)

A ring R is called a commutative ring if its multiplicative monoid-structure is commutative. A ring R is called an integral ring if R× is a submonoid of R with respect to its multiplicative structure. This multiplicative monoid R× is necessarily cancellative. A ring F is called a field if it is commutative and integral and if the multiplicative submonoid F× of F is groupable. Endowed with its natural group structure, F× is then called the multiplicative group of the field F.

A subset S of a ring R is called a subring if it is a subgroup of the additive group R and a submonoid of the multiplicative monoid R. If this is the case, then S has the natural structure of a ring. If a ring R is commutative [integral], so is every subring of it. A subring of a field is called a subfield if its natural ring-structure is that of a field.

The set C of complex numbers is a field, R is a subfield of C, and Q is a subfield of R. The set Z is an (integral) subring of Q. The set P is a submonoid of R both with respect to its additive structure and its multiplicative structure, but it is not a subring of R. A similar statement applies to N and Z instead of P and R.


Notes 06

(1) I am introducing the term “pre-monoid” here for the first time. In some textbooks, the term “semigroup” is used for it. However, “semigroup” is often used for what we call “monoid”, and “monoid” is sometimes used for what we call “pre-monoid”.

(2) In much of the literature, no clear distinction is made between a group and a groupable pre-monoid or between a monoid and a monoidable pre-monoid. Some textbooks refer to an element satisfying the neutrality law (06.2) with the definite article before its uniqueness has been proved; such sloppiness should be avoided.

(3) In most textbooks, monoids and groups are introduced with multiplicative terminology and notation. I believe it is useful to at least start with an impartial terminology and notation. It is for this purpose that I introduce the terms “combination”, “neutral”, and “reversion”.

(4) Some people use the term “unit” or “identity” for what we call “unity”. This can lead to confusion because “unit” and “identity” also have other meanings.

(5) Some people use the term “inverse” for what we call “reciprocal”, or the awkward “additive inverse” for what we call “opposite”. The use of “negative” for “opposite” and the reading of the minus-sign as “negative” are barbarisms that should be stamped out.

(6) The term “abelian” is often used instead of “commutative”. The latter term is more descriptive and hence preferable.

(7) Some people use the term “ring” for a slightly different structure: they assume that the multiplicative structure is not that of a monoid but only that of a pre-monoid. I call such a structure a pre-ring. Another author calls it “rng” (I assume as the result of a fit of humor).

(8) Most textbooks use the term “integral domain”, or even just “domain”, for what we call an “integral ring”. Often these terms are used only in the commutative case. I see good reason not to use the separate term “domain” for a ring, especially since “domain” has several other meanings.

07 Summations

Let M be a commutative monoid, described with additive notation. Also, let a family c = (c_i | i ∈ I) in M be given. One can define, by recursion, for every finite subset J of I, the sum

∑_J c = ∑(c_i | i ∈ J) = ∑_{i∈J} c_i (07.1)

of c over J. This summation in M over finite index sets is characterized by the requirement that

∑_∅ c = 0  and  ∑_J c = c_j + ∑_{J\{j}} c (07.2)

for all J ∈ Fin(I) and all j ∈ J. If I itself is finite, we write ∑c := ∑_I c. If the index set I is arbitrary, we then have ∑_J c = ∑ c|_J for all J ∈ Fin(I).

If K is a finite set and if φ : K → I is an injective mapping, we have

∑_K c ∘ φ = ∑_{k∈K} c_{φ(k)} = ∑_{i∈φ>(K)} c_i = ∑_{φ>(K)} c. (07.3)
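The recursion (07.2) translates directly into code. The following Python sketch (an added illustration, not from the text) computes ∑_J c for a family given as a dictionary; by the commutativity of M, the result does not depend on which j ∈ J the recursion picks:

    def finite_sum(c, J):
        # Sum of the family c over the finite index set J, per (07.2).
        J = set(J)
        if not J:
            return 0                  # the sum over the empty set is the neutral 0
        j = J.pop()                   # pick any j in J
        return c[j] + finite_sum(c, J)

    c = {'a': 2, 'b': 5, 'c': -1}
    print(finite_sum(c, set(c.keys())))   # 6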

If J and K are disjoint finite subsets of I, we have

∑_{J∪K} c = ∑_J c + ∑_K c. (07.4)

More generally, if J is a finite subset of I and if P is a partition of J, then

∑_J c = ∑_{P∈P} (∑_P c). (07.5)

If J := {j} is a singleton or if J := {j, k} is a doubleton we have

∑_{{j}} c = c_j,  ∑_{{j,k}} c = c_j + c_k. (07.6)

If c, d ∈ M^I, we have

∑_{i∈J} (c_i + d_i) = (∑_{i∈J} c_i) + (∑_{i∈J} d_i) (07.7)

for all J ∈ Fin I. More generally, if m ∈ M^{I×K} is a matrix in M we have

∑_{i∈J} ∑_{k∈L} m_{i,k} = ∑_{(i,k)∈J×L} m_{i,k} = ∑_{k∈L} ∑_{i∈J} m_{i,k} (07.8)

for all J ∈ Fin I and all L ∈ Fin K.

The support of a family c = (c_i | i ∈ I) in M is defined to be the set of all indices i ∈ I with c_i ≠ 0 and is denoted by

Supt c := {i ∈ I | c_i ≠ 0}. (07.9)
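The interchange rule (07.8) above is easy to spot-check numerically. A small Python illustration, not part of the original text:

    J, L = range(3), range(4)
    m = {(i, k): (i + 1) * (k + 2) for i in J for k in L}   # a matrix family on J x L
    lhs = sum(sum(m[i, k] for k in L) for i in J)           # sum rows first
    mid = sum(m[i, k] for i in J for k in L)                # sum over the product set
    rhs = sum(sum(m[i, k] for i in J) for k in L)           # sum columns first
    assert lhs == mid == rhs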


We use the notation

M^(I) := {c ∈ M^I | Supt c is finite} (07.10)

for the set of all families in M with index set I and finite support. For every c ∈ M^(I) we define

∑_J c := ∑_{J∩Supt c} c (07.11)

for all J ∈ Sub I. With this definition, the rules (07.1)–(07.5), (07.7), and (07.8) extend to the case when J and L are not necessarily finite, provided that c, d and m have finite support.

If n ∈ N and a ∈ M then (a | i ∈ n]) is the constant list of length n and range {a} if n ≠ 0, and the empty list if n = 0. We write

na := ∑(a | i ∈ n]) = ∑_{i∈n]} a (07.12)

and call it the nth multiple of a. If (a | i ∈ I) is any constant finite family with range {a} and if I ≠ ∅, we have

∑_{i∈I} a = (♯I)a. (07.13)

If (A_i | i ∈ I) is a finite family of subsets of M we write

∑_{i∈I} A_i := {∑_I a | a ∈ ×_{i∈I} A_i} (07.14)

and call it the member-wise sum of the family.

Let G be a commutative group and let c = (c_i | i ∈ I) be a family in G. We then have

∑_{i∈J} (−c_i) = −∑_{i∈J} c_i (07.15)

if J ∈ Sub I is finite or if c|_J has finite support.

If M is a commutative pre-monoid, described with additive notation, and if c = (c_i | i ∈ I) is a family in M, one still can define the sum ∑_J c provided that J is a non-empty finite set. This summation is characterized by (07.6)_1 and (07.2)_2, restricted by the condition that J \ {j} ≠ ∅. All rules stated before remain valid for summations over non-empty finite index sets.

If M is a commutative monoid described with multiplicative rather than additive notation, the symbol ∑ is replaced by ∏, and

∏_J c = ∏_{i∈J} c_i = ∏ c|_J (07.16)

is called the product of the family c = (c_i | i ∈ I) in M over J ∈ Fin I. Of course, 0 must be replaced by 1 wherever it occurs. Also, if n ∈ N and a ∈ M, the nth multiple na is replaced by the nth power a^n of a.

Let R be a commutative ring. If c = (c_i | i ∈ I) is a family in R and if J ∈ Fin I, it makes sense to form the product ∏_J c as well as the sum ∑_J c of c over J. We list some generalized versions of the distributive law. If c = (c_i | i ∈ I) and d = (d_k | k ∈ K) are families in R then

(∑_J c)(∑_L d) = ∑_{(i,k)∈J×L} c_i d_k (07.17)

for all J ∈ Fin I and L ∈ Fin K. If m ∈ R^{I×K} is a matrix in R then

∏_{i∈J} (∑_{k∈L} m_{i,k}) = ∑_{s∈L^J} (∏_{i∈J} m_{i,s_i}) (07.18)

for all J ∈ Fin I and L ∈ Fin K. If I is finite and if c, d ∈ R^I, then

∏_{i∈I} (c_i + d_i) = ∑_{J∈Sub I} (∏_{j∈J} c_j)(∏_{k∈I\J} d_k). (07.19)

If a, b ∈ R and n ∈ N, then

(a + b)^n = ∑_{k∈(n+1)[} \binom{n}{k} a^k b^{n−k}. (07.20)

If I is finite and c ∈ R^I, then

(♯I)! ∏_I c = (−1)^{♯I} ∑_{K∈Sub I} (−1)^{♯K} (∑_K c)^{♯I}. (07.21)
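The identity (07.21) is less familiar than the others; a brute-force check over all subsets K of a small index set I may reassure the reader. The following Python sketch is an added illustration, not from the text:

    from itertools import combinations
    from math import factorial, prod

    c = [2, 3, 5]                       # a family indexed on I = {0, 1, 2}
    n = len(c)
    I = range(n)
    rhs = (-1) ** n * sum(
        (-1) ** len(K) * sum(c[i] for i in K) ** n
        for r in range(n + 1)
        for K in combinations(I, r)
    )
    assert factorial(n) * prod(c) == rhs   # both sides equal 3! * 30 = 180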

Notes 07

(1) In much of the literature, summations are limited to those over sets of the form {i ∈ Z | n ≤ i ≤ m} with n, m ∈ Z, n ≤ m. The notation

∑_{i=n}^{m} c_i := ∑(c_i | i ∈ Z, n ≤ i ≤ m)

is often used. Much flexibility is gained by allowing summations over arbitrary finite index sets.


08 Real Analysis

As mentioned in Sect.06, the set R of real numbers is a field and the set P of positive real numbers (which contains 0) is a submonoid of R both with respect to addition and multiplication. The set P× = P \ {0} of strictly positive real numbers is a subgroup of R× = R \ {0} with respect to multiplication and a sub-pre-monoid of R with respect to addition. The set of all negative [strictly negative] real numbers is −P [−P×]. The collection {−P×, {0}, P×} is a partition of R. The relation < (read as “strictly less than”) on R is defined by

x < y :⇐⇒ y − x ∈ P× (08.1)

for all x, y ∈ R. For every x, y ∈ R, exactly one of the three statements x < y, y < x, x = y is valid. The relation ≤ (read as “less than”) on R is defined by

x ≤ y :⇐⇒ y − x ∈ P. (08.2)

We have x ≤ y ⇐⇒ (x < y or x = y) and

(x ≤ y and y ≤ x) ⇐⇒ x = y. (08.3)

The following rules are valid for all x, y, z ∈ R and remain valid if < is replaced by ≤:

(x < y and y < z) =⇒ x < z, (08.4)

x < y ⇐⇒ x + z < y + z, (08.5)

x < y ⇐⇒ −y < −x, (08.6)

(x < y ⇐⇒ ux < uy) for all u ∈ P×. (08.7)

The absolute value |x| ∈ P of a number x ∈ R is defined by

|x| :=  x    if x ∈ P×,
        0    if x = 0,
        −x   if x ∈ −P×.    (08.8)

The following rules are valid for all x, y ∈ R:

|x| = 0 ⇐⇒ x = 0, (08.9)

|−x| = |x|, (08.10)

||x| − |y|| ≤ |x + y| ≤ |x| + |y|, (08.11)

|xy| = |x||y|. (08.12)

Page 40: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

xxxiv CHAPTER 0. BASIC MATHEMATICS

The sign sgn x ∈ {−1, 0, 1} of a number x ∈ R is defined by

sgn x :=  1    if x ∈ P×,
          0    if x = 0,
          −1   if x ∈ −P×.    (08.13)

We have, for all x, y ∈ R,

sgn (xy) = (sgnx)(sgn y), (08.14)

x = (sgnx)|x|. (08.15)

Let a, b ∈ R be given. If a < b, we use the following notations:

]a, b[ := {t ∈ R | a < t < b}, (08.16)

[a, b] := {t ∈ R | a ≤ t ≤ b}, (08.17)

]a, b] := {t ∈ R | a < t ≤ b}, (08.18)

[a, b[ := {t ∈ R | a ≤ t < b}. (08.19)

If a > b, we sometimes write

]a, b[ := ]b, a[,  [a, b] := [b, a],  etc. (08.20)

A subset I of R is called an interval if for all a, b ∈ I with a < b we have [a, b] ⊂ I. The empty set ∅ and singleton subsets of R are intervals. All other intervals are infinite sets and will be called genuine intervals. The whole of R is a genuine interval. All other genuine intervals can be classified into eight types. The first four are described by (08.16)–(08.19). The term open interval is used if it is of the form (08.16), closed interval if it is of the form (08.17), and half-open interval if it is of the form (08.18) or (08.19). Intervals of the remaining four types have the form a + P, a + P×, a − P, or a − P× for some a ∈ R; they are called half-lines.

Let S be a subset of R. A number a ∈ R is called an upper bound of S if S ⊂ a − P and a lower bound of S if S ⊂ a + P. There is at most one a ∈ S which is also an upper bound [lower bound] of S. If it exists, it is called the maximum [minimum] of S and is denoted by max S [min S]. If S is non-empty and finite, then both max S and min S exist. We say that S is bounded above [bounded below] if the set of its upper bounds [lower bounds] is not empty. We say that S is bounded if it is both bounded above and below. This is the case if and only if there is a b ∈ P such that |s| ≤ b for all s ∈ S. Every finite subset of R is bounded. If S is bounded, then every subset of S is bounded. The union of a finite collection of bounded subsets of R is again bounded. A genuine interval is bounded if and only if it is of one of the types described by (08.16)–(08.19).

Let S be a non-empty subset of R. If S is bounded above [bounded below] then the set of all its upper bounds [lower bounds] has a minimum [maximum], which is called the supremum [infimum] of S and is denoted by sup S [inf S]. If S has a maximum [minimum] then sup S = max S [inf S = min S]. It is often useful to consider the extended-real-number set R̄ := R ∪ {∞, −∞} and then to express the assertion that S fails to be bounded above [bounded below] by writing sup S = ∞ [inf S = −∞]. In this sense sup S ∈ R̄ and inf S ∈ R̄ are well-defined if S is an arbitrary non-empty subset of R. We also use the notation P̄ := P ∪ {∞}. The relation < on R is extended to R̄ in such a way that −∞ < ∞ and that t ∈ R̄ satisfies −∞ < t < ∞ if and only if t ∈ R.

We say that a family in R, or a function whose codomain is included in R, is bounded [bounded above, bounded below] if its range is bounded [bounded above, bounded below]. Let I be a subset of R. We say that the family a ∈ R^I is isotone [antitone] if

i ≤ j =⇒ ai ≤ aj [aj ≤ ai] for all i, j ∈ I. (08.21)

We say that the family is strictly isotone [strictly antitone] if (08.21) still holds after ≤ has been replaced by <. This terminology applies, in particular, to lists and sequences (see Sect. 02). We say that a mapping f with Dom f, Cod f ∈ Sub R is isotone, antitone, strictly isotone, or strictly antitone if the family (f(t) | t ∈ Dom f) of its values has the corresponding property.

Let a be a sequence in R, i.e. a ∈ R^{N×} or a ∈ R^N. We say that a converges to t ∈ R if for every ε ∈ P× there is n ∈ N× such that a>(n + N) ⊂ ]t − ε, t + ε[. A sequence in R can converge to at most one t ∈ R. If it does, we call t the limit of a and write t = lim a to express the assertion that a converges to t. Every isotone [antitone] sequence that is bounded above [below] converges.

Let a be any sequence. We say that the sequence b is a subsequence of a if b = a ∘ σ for some strictly isotone mapping σ whose domain, and whose codomain, is N or N×, as appropriate. If a is a sequence in R that converges to t ∈ R, then every subsequence of a also converges to t.

Let a be a sequence in R. We say that t ∈ R is a cluster point of a, or that a clusters at t, if for every ε ∈ P× and every n ∈ N×, we have a>(n + N) ∩ ]t − ε, t + ε[ ≠ ∅. The sequence a clusters at t if and only if some subsequence of a converges to t.


Cluster Point Theorem: Every bounded sequence in R has at least one cluster point.

Let a sequence a ∈ R^N be given. The sum-sequence ssq a ∈ R^N of a is defined by

(ssq a)_n := ∑_{k∈n[} a_k for all n ∈ N. (08.22)

Let S be a subset of R. Given f, g ∈ Map(S, R), we define the value-wise sum f + g ∈ Map(S, R) and the value-wise product fg ∈ Map(S, R) by

(f + g)(t) := f(t) + g(t),  (fg)(t) := f(t)g(t)  for all t ∈ S. (08.23)

The value-wise quotient f/g ∈ Map(g<(R×), R) is defined by

(f/g)(t) := f(t)/g(t) for all t ∈ g<(R×) = S \ g<({0}). (08.24)

We write f ≤ g [f < g] to express the assertion that f(t) ≤ g(t) [f(t) < g(t)] for all t ∈ S. Given f ∈ Map(S, R) and n ∈ N, we define the value-wise opposite −f, the value-wise absolute value |f| and the value-wise nth power f^n of f by

(−f)(t) := −f(t),  |f|(t) := |f(t)|,  f^n(t) := (f(t))^n  for all t ∈ S. (08.25)

If n ∈ −N×, we define f^n ∈ Map(f<(R×), R) by f^n := 1/f^{|n|}.

The identity mapping of R will be abbreviated by

ι := 1_R. (08.26)

The symbol ι will also be used for various adjustments of 1_R if the context makes it clear which adjustments are appropriate.

For every t ∈ R, the sum-sequence ssq(t^k/k! | k ∈ N) converges; its limit is called the exponential of t and is denoted by e^t. The exponential function exp : R → R can be defined by the rule

exp(t) := e^t for all t ∈ R. (08.27)
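As an added illustration, not part of the original text, the partial sums of the sum-sequence just described converge quickly, as a few lines of Python show (the names are illustrative):

    import math

    def exp_partial(t, n):
        # n-th term of the sum-sequence, i.e. the sum of t^k/k! over k in n[.
        return sum(t ** k / math.factorial(k) for k in range(n))

    t = 1.5
    print(exp_partial(t, 20))   # already agrees with ...
    print(math.exp(t))          # ... the library value of e^t to machine precision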

Let f be a function with Dom f, Cod f ∈ Sub R. Let t ∈ R be an accumulation point of Dom f, which means that

(]t − σ, t + σ[ \ {t}) ∩ Dom f ≠ ∅ for all σ ∈ P×. (08.28)


We say that λ ∈ R is a limit of f at t if for every ε ∈ P× there is σ ∈ P× such that

| f|_{]t−σ,t+σ[\{t}} − λ | < ε. (08.29)

The function f can have at most one limit at t. If it has, we write lim_t f = λ to express the assertion that f has the limit λ at t. We say that the function f is continuous at t if t ∈ Dom f and lim_t f = f(t).

Let I be a genuine interval. We say that f : I → R is continuous if f is continuous at every t ∈ I. Let t ∈ I be given. We say that f is differentiable at t if the limit

∂_t f := lim_t ( (f − f(t)) / (ι − t) ) (08.30)

exists. If it does, then ∂_t f is called the derivative of f at t. We say that f is differentiable if it is differentiable at every t ∈ I. If this is the case, we define the derivative(-function) ∂f : I → R of f by

(∂f)(t) := ∂tf for all t ∈ I. (08.31)

If f and g are differentiable, so are f + g and fg, and we have

∂(f + g) = ∂f + ∂g, ∂(fg) = (∂f)g + f(∂g). (08.32)

Let n ∈ N be given. We say that f : I → R is n times differentiable if ∂^n f : I → R can be defined by the recursion

∂^0 f := f,  ∂^{k+1} f := ∂(∂^k f) for all k ∈ n[. (08.33)

We say that f is of class C^n if it is n times differentiable and if ∂^n f is continuous. We say that f is of class C^∞ if it is n times differentiable for all n ∈ N. We also use the notations

f^{(k)} := ∂^k f for all k ∈ n[ (08.34)

and

f• := ∂f,  f•• := ∂^2 f,  f••• := ∂^3 f. (08.35)

For each n ∈ Z, the function ι^n is of class C^∞ and we have

∂^k ι^n =  (∏_{j∈k[} (n − j)) ι^{n−k}   if n ∉ k[,
           0                            if n ∈ k[,
for all k ∈ N. (08.36)


The exponential function defined by (08.27) is of class C^∞ and we have ∂^n exp = exp for all n ∈ N.

Difference-Quotient Theorem: Let a genuine interval I, a, b ∈ I with a < b, and a function f : I → R be given. If f|_{[a,b]} is continuous and f|_{]a,b[} differentiable, then

(f(b) − f(a)) / (b − a) ∈ {∂_t f | t ∈ ]a, b[}. (08.37)

Corollary: If I is a genuine interval and if f : I → R is differentiable with ∂f = 0, then f must be a constant.

Let f be a function with Cod f ⊂ R. We say that f attains a maximum [minimum] (at x ∈ Dom f) if Rng f has a maximum [minimum] (and if f(x) = max Rng f [min Rng f]). The term extremum is used for “maximum or minimum”.

Extremum Theorem: Let I be an open interval. If f : I → R attains an extremum at t ∈ I and if f is differentiable at t, then ∂_t f = 0.

Theorem on Attainment of Extrema: If I is a closed and bounded interval and if f : I → R is continuous, then f attains a maximum and a minimum.

Let I be a genuine interval. Let f : I → R be a continuous function. One can define, for every a, b ∈ I, the integral ∫_a^b f ∈ R. This integration is characterized by the requirement that

∫_a^b f = ∫_a^c f + ∫_c^b f for all a, b, c ∈ I (08.38)

and

min f>([a, b]) ≤ (1/(b − a)) ∫_a^b f ≤ max f>([a, b]) if a < b. (08.39)

We have

| ∫_a^b f | ≤ | ∫_a^b |f| | for all a, b ∈ I. (08.40)

If λ ∈ R, if f, g ∈ Map(I, R) are both continuous, and if a, b ∈ I, then

λ ∫_a^b f = ∫_a^b (λf),  ∫_a^b (f + g) = ∫_a^b f + ∫_a^b g. (08.41)

If f, g ∈ Map(I, R) are both continuous and if a ≤ b, we have

f ≤ g =⇒ ∫_a^b f ≤ ∫_a^b g. (08.42)


Fundamental Theorem of Calculus: Let I be a genuine interval, let a ∈ I, and let f : I → R be a continuous function. Then the function g : I → R defined by

g(t) := ∫_a^t f for all t ∈ I (08.43)

is differentiable and ∂g = f.

Let f : I → R be a continuous function whose domain I is a genuine interval and let g : I → R be an antiderivative of f, i.e. a differentiable function such that ∂g = f. For every a, b ∈ I, we then have

∫_a^b f = g(b) − g(a). (08.44)
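A numerical illustration of (08.44), not from the original text: with f := ι^2 on [0, 2] and the antiderivative g := ι^3/3, a simple midpoint-rule quadrature in Python reproduces g(2) − g(0) = 8/3 (the quadrature routine is an assumption made for illustration, not the book's construction of the integral):

    def integral(f, a, b, n=100000):
        # midpoint-rule approximation of the integral of f from a to b
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))

    f = lambda t: t ** 2
    g = lambda t: t ** 3 / 3
    print(integral(f, 0.0, 2.0))   # approximately 8/3
    print(g(2.0) - g(0.0))         # exactly 8/3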

If no dummy-free symbol for the function f is available, it is customary to write

∫_a^b f(s) ds := ∫_a^b f. (08.45)

Notes 08

(1) Many people use parentheses in notations such as (08.16)–(08.19) and write, for example, (a, b] for what we denote by ]a, b]. This usage should be avoided because it leads to confusion between the pair (a, b) and an interval.

(2) Many people say “less than” when we say “strictly less than” and use the awkward “less than or equal to” when we say “less than”.

(3) There are some disparities in the literature about the usage of “interval”. Often the term is restricted to genuine intervals or even to bounded genuine intervals.

(4) Some people use the term “limit point” for “cluster point” and sometimes also for “accumulation point”. This usage should be avoided because it can too easily be confused with “limit”. There is no agreement in the literature about the terms “cluster point” and “accumulation point”. Sometimes “cluster point” is used for what we call “accumulation point” and sometimes “accumulation point” for what we call “cluster point”.

(5) In the literature, the term “infinite series” is often used in connection with sum-sequences. There is no agreement on what precisely a “series” should be. Some textbooks contain a “definition” that makes no sense. We avoid the problem by not using the term at all.

(6) The Cluster Point Theorem, or a variation thereof, is very often called the “Bolzano-Weierstrass Theorem”.


(7) Most of the literature on real analysis suffers from the lack of a one-symbol notation for 1_R, as has been noted by several authors in the past. I introduced the use of ι for 1_R in 1971 (class notes). Some of my colleagues and I cannot understand any more how we could ever have lived without it (as without a microwave oven).

(8) I introduced the notation ∂_t f for the derivative of f at t a few years ago. Most of the more traditional notations, such as df(t)/dt, turn out, upon careful examination, to make no sense. The notations f′ and ḟ for the derivative-function of f are very common. The former clashes with the use of the prime as a mere distinction symbol (as in Def.1 of Sect.13), the latter makes it difficult to write derivatives of functions with compound symbols, such as (exp ∘ f + ι(sin ∘ h))•.

(9) The Difference-Quotient Theorem is most often called the “Mean-Value Theorem of Differential Calculus”. I believe that “Difference-Quotient” requires no explanation while “Mean-Value” does.


Chapter 1

Linear Spaces

This chapter is a brief survey of basic linear algebra. It is assumed that the reader is already familiar with this subject, if not with the exact terminology and notation used here. Many elementary proofs are omitted, but the experienced reader will have no difficulty supplying these proofs for himself or herself.

In this chapter the letter F denotes either the field R of real numbers or the field C of complex numbers. (Actually, F could be an arbitrary field. To preserve the validity of certain remarks, F should be infinite.)

11 Basic Definitions

Definition 1: A linear space (over F) is a set V endowed with structure by the prescription of

(i) an operation add: V × V → V, called the addition in V,

(ii) an operation sm: F × V → V, called the scalar multiplication in V,

(iii) an element 0 ∈ V called the zero of V,

(iv) a mapping opp: V → V called the opposition in V, provided that the following axioms are satisfied:

(A1) add (u, add(v,w)) = add(add (u,v), w) for all u, v,w ∈ V,

(A2) add(u,v) = add(v,u) for all u,v ∈ V,

(A3) add(u, 0) = u for all u ∈ V,

(A4) add(u, opp(u)) = 0 for all u ∈ V,


(S1) sm(ξ, sm(η, u)) = sm(ξη, u) for all ξ, η ∈ F, u ∈ V,

(S2) sm(ξ + η, u) = add(sm(ξ, u),sm(η, u)) for all ξ, η ∈ F, u ∈ V,

(S3) sm(ξ, add(u,v)) = add(sm(ξ, u), sm(ξ, v)) for all ξ ∈ F,u,v ∈ V,

(S4) sm(1, u) = u for all u ∈ V.

The prescription of add, 0, and opp, subject to the axioms (A1)-(A4), endows V with the structure of a commutative group (see Sect. 06). Thus, one can say that a linear space is a commutative group endowed with additional structure by the prescription of a scalar multiplication sm: F × V → V subject to the conditions (S1)-(S4).

The zero 0 of V and the opposition opp of V are uniquely determined by the operation add (see the remark on groupable pre-monoids in Sect. 06). In other words, if add, sm, 0, and opp endow V with the structure of a linear space and if add, sm, 0′, and opp′ also endow V with such a structure, then 0′ = 0 and opp′ = opp and hence the structures coincide. This fact enables one to say that the prescription of two operations, add: V × V → V and sm: F × V → V, endows V with the structure of a linear space if there exist 0 ∈ V and opp: V → V such that the conditions (A1)-(S4) are satisfied.

The following facts are easy consequences of (A1)-(S4):

(I) For every u, v ∈ V there is exactly one w ∈ V such that add(u, w) = v; in fact, w is given by w := add(v, opp(u)).

(II) sm(ξ − η, u) = add(sm (ξ, u), opp(sm(η, u))) for all ξ, η ∈ F, u ∈ V.

(III) sm(ξ, add(u, opp(v))) = add(sm(ξ, u), opp(sm(ξ, v))) for all ξ ∈ F,u,v ∈ V.

(IV) sm(−ξ, u) = opp(sm(ξ,u)) for all ξ ∈ F, u ∈ V.

(V) If ξ ∈ F and u ∈ V, then sm(ξ, u) = 0 if and only if ξ = 0 or u = 0.

The field F acquires the structure of a linear space if we prescribe add, sm, 0 and opp for F by putting add(ξ, η) := ξ + η, sm(ξ, η) := ξη, 0 := 0, opp(ξ) := −ξ, so that addition, zero, and opposition in the linear space F have their ordinary meaning while scalar multiplication reduces to ordinary multiplication. The axioms (A1)-(S4) reduce to rules of elementary arithmetic in F.

It is customary to use the following simplified notations in an arbitrary linear space V:

u + v := add(u, v) when u, v ∈ V,
ξu := sm(ξ, u) when ξ ∈ F, u ∈ V,
−u := opp(u) when u ∈ V,
u − v := u + (−v) = add(u, opp(v)) when u, v ∈ V.

We will use these notations most of the time. If several linear spaces are considered at the same time, the context should make it clear which of the several addition, scalar multiplication, or opposition operations are meant when the symbols +, −, or juxtaposition are used. Also, when the symbols 0 or 0 are used, they denote the zero of whatever linear space the context requires. With the notations above, the axioms (A1)-(S4) and the consequences (I)-(V) translate into the following familiar rules, valid for all u, v, w ∈ V and all ξ, η ∈ F:

u + (v + w) = (u + v) + w, (11.1)

u + v = v + u, (11.2)

u + 0 = u, (11.3)

u− u = 0, (11.4)

ξ(ηu) = (ξη)u, (11.5)

(ξ + η)u = ξu + ηu, (11.6)

ξ(u + v) = ξu + ξv, (11.7)

1u = u, (11.8)

u + w = v ⇔ w = v − u, (11.9)

(ξ − η)u = ξu − ηu, (11.10)

ξ(u− v) = ξu − ξv, (11.11)

ξu = 0 ⇔ (ξ = 0 or u = 0). (11.12)
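A concrete model may help: R^2 with componentwise operations is a linear space over R, and the rules above can be spot-checked directly. The following Python lines are an illustration added here, not part of the original:

    def add(u, v): return (u[0] + v[0], u[1] + v[1])   # addition in R^2
    def sm(xi, u): return (xi * u[0], xi * u[1])       # scalar multiplication in R^2

    u, xi, eta = (2.0, -1.0), 3.0, 4.0
    assert sm(xi + eta, u) == add(sm(xi, u), sm(eta, u))   # rule (11.6)
    assert sm(0.0, u) == (0.0, 0.0)                        # one direction of (11.12)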

Since the linear space V is a commutative monoid described with additive notation, we can consider the addition of arbitrary families with finite support of elements of V as explained in Sect. 07.

Notes 11

(1) The term “vector space” is very often used for what we call a “linear space”. The trouble with “vector space” is that it leads one to assume that the elements are “vectors” in some sense, while in fact they very often are objects that could not be called “vectors” by any stretch of the imagination. I prefer to use “vector” only when it has its original geometric meaning (see Def. 1 of Sect. 32).

(2) Sometimes, one finds the term “origin” for what we call the “zero” of a linear space.


12 Subspaces

Definition 1: A non-empty subset U of a linear space V is called a subspace of V if it is stable under the addition add and scalar multiplication sm in V, i.e., if

add>(U × U) ⊂ U and sm>(F × U) ⊂ U.

It is easily proved that a subspace U of V must contain the zero 0 of V and must be invariant under opposition, so that

0 ∈ U,  opp>(U) ⊂ U.

Moreover, U acquires the natural structure of a linear space if the addition, scalar multiplication, and opposition in U are taken to be the adjustments add|^U_{U×U}, sm|^U_{F×U}, and opp|_U, respectively, and if the zero of U is taken to be the zero 0 of V.

Let a linear space V be given. Trivial subspaces of V are V itself and the zero-space {0}. The following facts are easily proved:

Proposition 1: The collection of all subspaces of V is intersection-stable; i.e., the intersection of any collection of subspaces of V is again a subspace of V.

We denote the span-mapping (see Sect. 03) associated with the collection of all subspaces of V by Lsp and call its value Lsp S at a given S ∈ Sub V the linear span of S. In view of Prop. 1, Lsp S is the smallest subspace of V that includes S (see Sect. 03). If Lsp S = V we say that S spans V or that V is spanned by S. A subset of V is a subspace if and only if it coincides with its own linear span. The linear span of the empty subset of V is the zero-space {0} of V, i.e., Lsp ∅ = {0}. The linear span of a singleton {v}, v ∈ V, is the set of all scalar multiples of v, which we denote by Fv:

Lsp{v} = Fv := {ξv | ξ ∈ F}. (12.1)

We note that the notations for member-wise sums of sets introduced inSects. 06 and 07 can be used, in particular, if the sets are subsets of a linearspace.

Proposition 2: If U1 and U2 are subspaces of V, so is their sum and

Lsp(U1 ∪ U2) = U1 + U2. (12.2)

More generally, if (U_i | i ∈ I) is a finite family of subspaces of V, so is its sum and

Lsp(∪_{i∈I} U_i) = ∑_{i∈I} U_i. (12.3)

Proposition 3: If (S_i | i ∈ I) is a finite family of arbitrary subsets of V, then

Lsp(∪_{i∈I} S_i) = ∑_{i∈I} Lsp S_i. (12.4)

Definition 2: We say that two subspaces U1 and U2 of V are disjunct, and that U2 is disjunct from U1, if U1 ∩ U2 = {0}.

We say that two subspaces U1 and U2 of V are supplementary in V and that U2 is a supplement of U1 in V if U1 ∩ U2 = {0} and U1 + U2 = V.

Proposition 4: Let U1, U2 be subspaces of V. Then the following are equivalent:

(i) U1 and U2 are supplementary in V.

(ii) To every v ∈ V corresponds exactly one pair (u1, u2) ∈ U1 × U2 such that v = u1 + u2.

(iii) U2 is maximal among the subspaces that are disjunct from U1, i.e., U2 is disjunct from U1 and not properly included in any other subspace disjunct from U1.

Pitfall: One should not confuse “disjunct” with “disjoint”, or “supplement” with “complement”. Two subspaces, even if disjunct, are never disjoint. The complement of a subspace is never a subspace, and there is no relation between this complement and any supplement. Moreover, supplements are not unique unless the given subspace is the whole space V or the zero-space {0}.

To say that two subspaces U1 and U2 are disjunct is equivalent to saying that they are supplementary in U1 + U2.


Notes 12

(1) The phrase “subspace generated by” is often used for what we call “linear span of”. The modifier “linear” is very often omitted and the linear span of a subset S of V is often simply denoted by Sp S. I think it is important to distinguish carefully between linear spans and other kinds of spans. (See the discussion of span-mappings in Sect. 03 and the definition of flat spans in Prop. 7 of Sect. 32.)

(2) In many textbooks the term “disjoint” is used when we say “disjunct” and the term “complementary” when we say “supplementary”. Such usage clashes with the set-theoretical meanings of “disjoint” and “complementary” and greatly increases the danger of becoming a victim of the Pitfall above. It is for this reason that I introduced the terms “disjunct” and “supplementary” after consulting a thesaurus. Later, I realized that “supplementary” had already been used by others. In particular, Bourbaki has used “supplémentaire” since 1947, which I read in 1950 but had forgotten.

(3) Some people write U1 ⊕ U2 instead of merely U1 + U2 when the two subspaces U1 and U2 are disjunct, and then call U1 ⊕ U2 the “direct sum” rather than merely the “sum” of U1 and U2. This is really absurd; it indicates confusion between a property (namely disjunctness) of a pair (U1, U2) of two subspaces and a property of their sum. The sum U1 + U2, as a subspace of V, has no special properties, because U1 and U2 cannot be recovered from U1 + U2 (except when they are all zero-spaces).

13 Linear Mappings

Definition 1: A mapping L : V → V′ from a linear space V to a linear space V′ is said to be linear if it preserves addition and scalar multiplication, i.e., if

add′(L(u), L(v)) = L(add(u, v)) for all u, v ∈ V

and

sm′(ξ, L(u)) = L(sm(ξ, u)) for all ξ ∈ F, u ∈ V,

where add, sm denote operations in V and add′, sm′ operations in V′.

For linear mappings, it is customary to omit parentheses and write Lu for the value of L at u. In simplified notation, the rules that define linearity read

Lu + Lv = L(u + v), (13.1)

L(ξu) = ξ(Lu), (13.2)


valid for all u, v ∈ V and all ξ ∈ F. It is easily seen that linear mappings L : V → V′ also preserve zero and opposition, i.e., that

L0 = 0′ (13.3)

when 0 and 0′ denote the zeros of V and V ′ respectively, and that

L(−u) = −(Lu) (13.4)

for all u ∈ V.

The constant mapping 0′_{V→V′}, whose value is the zero 0′ of V′, is the only constant mapping that is linear. This mapping is called a zero-mapping and is denoted simply by 0. The identity mapping 1_V : V → V of any linear space V is trivially linear.

The following facts are easily proved:

Proposition 1: The composite of two linear mappings is again linear. More precisely: If V, V′ and V′′ are linear spaces and if L : V → V′ and M : V′ → V′′ are linear mappings, so is M ∘ L : V → V′′.

Proposition 2: The inverse of an invertible linear mapping is again linear. More precisely: If L : V → V′ is linear and invertible, then L← : V′ → V is linear.

If L : V → V′ is linear, it is customary to write Lf := L ∘ f when f is any mapping with codomain V. In particular, we write ML instead of M ∘ L when both L and M are linear.

If L : V → V′ is linear and invertible, it is customary to write L^{−1} := L← for its inverse. By Prop. 2, both L and L^{−1} then preserve the linear-space structure. Invertible linear mappings are, therefore, linear-space isomorphisms, and we also call them linear isomorphisms. We say that two linear spaces V and V′ are linearly isomorphic if there exists a linear isomorphism from V to V′.

Proposition 3: If L : V → V′ is linear and if U and U′ are subspaces of V and V′, respectively, then the image L>(U) of U is a subspace of V′ and the pre-image L<(U′) of U′ is a subspace of V.

In particular, Rng L = L>(V) is a subspace of V′ and L<({0}) is a subspace of V.

Definition: The pre-image of the zero-subspace of the codomain of a linear mapping L is called the nullspace of L and is denoted by

Null L := L<({0}) = {u ∈ Dom L | Lu = 0}. (13.5)

Proposition 4: A linear mapping L : V → V′ is injective if and only if Null L = {0}.

If L : V → V′ is linear, if U is a subspace of V, and if U′ is a linear space of which L>(U) is a subspace, then the adjustment L|^{U′}_U : U → U′ is evidently again linear. In particular, if U is a subspace of V, the inclusion mapping 1_{U⊂V} is linear.

Proposition 5: If L : V → V′ is linear and U is a supplement of Null L in V, then L|^{Rng L}_U is invertible. If v′ ∈ Rng L, then v ∈ V is a solution of the linear equation

? v ∈ V,  Lv = v′ (13.6)

if and only if v ∈ (L|^{Rng L}_U)^{−1} v′ + Null L.

Notes 13

(1) The terms “linear transformation” or “linear operator” are sometimes used for what we call a “linear mapping”.

(2) Some people use the term “kernel” for what we call “nullspace” and they use the notation Ker L for Null L. Although the nullspace of L is a special kind of kernel (in the sense explained in Sect. 06), I believe it is useful to have a special term and a special notation to stress that one deals with linear mappings rather than homomorphisms in general. Notations such as 𝒩(L) and N(L) are often used for Null L.

14 Spaces of Mappings, Product Spaces

Let V be a linear space and let S be any set. The set Map(S, V) of all mappings from S to V can be endowed, in a natural manner, with the structure of a linear space by defining the operations in Map(S, V) by value-wise application of the operations in V. Thus, if f, g ∈ Map(S, V) and ξ ∈ F, then f + g, −f and ξf are defined by requiring

(f + g)(s) := f(s) + g(s), (14.1)

(−f)(s) := −(f(s)), (14.2)

(ξf)(s) := ξ(f(s)), (14.3)

to be valid for all s ∈ S. The zero-element of Map(S, V) is the constant mapping 0_{S→V} with value 0 ∈ V. We denote it simply by 0. It is immediate that the axioms (A1)-(S4) of Sect. 11 for a linear space are, in fact, satisfied for the structure of Map(S, V) just described. Thus, we can talk about the (linear) space of mappings Map(S, V).

Proposition 1: Let S and S′ be sets, let h : S′ → S be a mapping, and let V be a linear space. Then

(f ↦ f ∘ h) : Map(S, V) → Map(S′, V)

is a linear mapping, i.e.,

(f + g) ∘ h = (f ∘ h) + (g ∘ h), (14.4)

(ξf) ∘ h = ξ(f ∘ h) (14.5)

hold for all f, g ∈ Map(S, V) and all ξ ∈ F.

Let V, V′ be linear spaces. We denote the set of all linear mappings from V into V′ by

Lin(V, V′) := {L ∈ Map(V, V′) | L is linear}.

Proposition 2: Lin(V,V ′) is a subspace of Map(V,V ′).

Proof: Lin(V, V′) is not empty because the zero-mapping belongs to it. We must show that Lin(V, V′) is stable under addition and scalar multiplication. Let L, M ∈ Lin(V, V′) be given. Using first the definition (14.1) for L + M, then the axioms (A1) and (A2) in V′, then the linearity rule (13.1) for L and M, and finally the definition of L + M again, we obtain

(L + M)(u) + (L + M)(v) = (Lu + Mu) + (Lv + Mv)
                        = (Lu + Lv) + (Mu + Mv)
                        = L(u + v) + M(u + v)
                        = (L + M)(u + v)

for all u, v ∈ V. This shows that L + M satisfies the linearity rule (13.1). In a similar way, one proves that L + M satisfies the linearity rule (13.2) and hence that L + M ∈ Lin(V, V′). Since L, M ∈ Lin(V, V′) were arbitrary, it follows that Lin(V, V′) is stable under addition.

The proof that Lin(V, V′) is stable under scalar multiplication is left to the reader.

We call Lin(V, V′) the space of linear mappings from V to V′. We denote the set of all invertible linear mappings from V to V′, i.e., the set of all linear isomorphisms from V to V′, by Lis(V, V′). This set Lis(V, V′) is a subset of Lin(V, V′) but not a subspace (except when both V and V′ are zero-spaces).

Proposition 3: Let S be a set, let V and V′ be linear spaces, and let L ∈ Lin(V, V′) be given. Then

(f ↦ Lf) : Map(S, V) → Map(S, V′)

is a linear mapping, i.e.,

L(f + g) = Lf + Lg, (14.6)

L(ξf) = ξ(Lf) (14.7)

hold for all f,g ∈ Map(S,V) and all ξ ∈ F.

Let (V1, V2) be a pair of linear spaces. The set product V1 × V2 (see Sect. 02) has the natural structure of a linear space whose operations are defined by term-wise application of the operations in V1 and V2, i.e., by

(u1, u2) + (v1, v2) := (u1 + v1, u2 + v2), (14.8)

ξ(u1, u2) := (ξu1, ξu2), (14.9)

−(u1, u2) := (−u1, −u2) (14.10)

for all u1, v1 ∈ V1, u2, v2 ∈ V2, and ξ ∈ F. The zero of V1 × V2 is the pair (0_1, 0_2), where 0_1 is the zero of V1 and 0_2 the zero of V2. Thus, we may refer to V1 × V2 as the (linear) product-space of V1 and V2.

The evaluations ev_1 : V1 × V2 → V1 and ev_2 : V1 × V2 → V2 associated with the product space V1 × V2 (see Sect. 04) are obviously linear. So are the insertion mappings

ins_1 := (u1 ↦ (u1, 0)) : V1 → V1 × V2,
ins_2 := (u2 ↦ (0, u2)) : V2 → V1 × V2. (14.11)

If U1 is a subspace of V1 and U2 a subspace of V2, then U1 × U2 is a subspace of V1 × V2.

Pitfall: In general, the product-space V1 × V2 has many subspaces that are not of the form U1 × U2.

Let W be a third linear space. Recall the identification

Map(W, V1) × Map(W, V2) ≅ Map(W, V1 × V2)

(see Sect. 04). It turns out that this identification is such that the subspaces Lin(W, V1), Lin(W, V2), and Lin(W, V1 × V2) are matched in the sense that the pairs (L1, L2) ∈ Lin(W, V1) × Lin(W, V2) of linear mappings correspond to the linear mappings in Lin(W, V1 × V2) defined by term-wise evaluation, i.e., by

(L1, L2)w := (L1w, L2w) for all w ∈ W. (14.12)

Thus, (14.12) describes the identification

Lin(W, V1) × Lin(W, V2) ≅ Lin(W, V1 × V2).

There is also a natural linear isomorphism

((L1, L2) ↦ L1 ⊕ L2) : Lin(V1, W) × Lin(V2, W) → Lin(V1 × V2, W).

It is defined by

(L1 ⊕ L2)(v1, v2) := L1v1 + L2v2 (14.13)

for all v1 ∈ V1, v2 ∈ V2, which is equivalent to

L1 ⊕ L2 := L1 ev_1 + L2 ev_2, (14.14)

where ev_1 and ev_2 are the evaluation mappings associated with V1 × V2 (see Sect. 04).

What we said about a pair of linear spaces easily generalizes to an arbitrary family (V_i | i ∈ I) of linear spaces. The set product ×(V_i | i ∈ I) has the natural structure of a linear space whose operations are defined by term-wise application of the operations in the V_i, i ∈ I. Hence, we may refer to ×(V_i | i ∈ I) as the product-space of the family (V_i | i ∈ I). Given j ∈ I, the evaluation

ev_j : ×_{i∈I} V_i → V_j

(see (04.9)) is linear. So is the insertion mapping

ins_j : V_j → ×_{i∈I} V_i

defined by

(ins_j v)_i :=  0 ∈ V_i   if i ≠ j,
                v ∈ V_j   if i = j,
for all v ∈ V_j. (14.15)

It is evident that

ev_k ∘ ins_j =  0 ∈ Lin(V_j, V_k)        if j ≠ k,
                1_{V_j} ∈ Lin(V_j, V_j)  if j = k,
(14.16)

for all j, k ∈ I.

Let W be an additional linear space. For families (L_i | i ∈ I) of linear mappings L_i ∈ Lin(W, V_i) we use term-wise evaluation

(L_i | i ∈ I)w := (L_i w | i ∈ I) for all w ∈ W, (14.17)

which describes the identification

×_{i∈I} Lin(W, V_i) ≅ Lin(W, ×_{i∈I} V_i).

If the index set I is finite, we also have a natural isomorphism

((L_i | i ∈ I) ↦ ⊕_{i∈I} L_i) : ×_{i∈I} Lin(V_i, W) → Lin(×_{i∈I} V_i, W)

defined by

⊕_{i∈I} L_i := ∑_{i∈I} L_i ev_i. (14.18)

If the spaces in a family indexed on I all coincide with a given linear space V, then the product space reduces to the power-space V^I, which consists of all families in V indexed on I (see Sect. 02). The set V^(I) of all families contained in V^I that have finite support (see Sect. 07) is easily seen to be a subspace of V^I. Of particular interest is the space F^(I) of all families λ := (λ_i | i ∈ I) in F with finite support. Also, if I and J are finite sets, it is useful to consider the linear space F^{J×I} of J × I-matrices with terms in F (see Sect. 02). Cross products of linear mappings, as defined in Sect. 04, are again linear mappings.


Notes 14

(1) The notations L(V,V ′) and L(V,V ′) for our Lin(V,V ′) are very common.

(2) The notation Lis(V, V′) was apparently first introduced by S. Lang (Introduction to Differentiable Manifolds, Interscience 1966). In some previous work, I used Invlin(V, V′).

(3) The product-space V1 × V2 is sometimes called the “direct sum” of the linear spaces V1 and V2 and it is then denoted by V1 ⊕ V2. I believe such a notation is superfluous because the set-product V1 × V2 carries the natural structure of a linear space and a special notation to emphasize this fact is redundant. A similar remark applies to product-spaces of families of linear spaces.

15 Linear Combinations, Linear Independence, Bases

Definition 1: Let f := (f_i | i ∈ I) be a family of elements in a linear space V. The mapping

lnc^V_f : F^(I) → V

defined by

lnc^V_f λ := ∑_{i∈I} λ_i f_i (15.1)

for all λ := (λ_i | i ∈ I) ∈ F^(I) is then called the linear-combination mapping for f. The value lnc^V_f λ is called the linear combination of f with coefficient family λ. (See (07.10) and (07.11) for the notation used here.)
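To make the definition concrete, here is a small Python transcription (an illustration added here, not from the text) of lnc_f for a finite family f in R^2, with coefficient families represented as dictionaries, which automatically have finite support:

    f = {'a': (1.0, 0.0), 'b': (1.0, 1.0)}     # a family in V := R^2 indexed by I = {'a','b'}

    def lnc(f, lam):
        # value of the linear-combination mapping lnc_f at lam, per (15.1)
        x = y = 0.0
        for i, coeff in lam.items():
            x += coeff * f[i][0]
            y += coeff * f[i][1]
        return (x, y)

    print(lnc(f, {'a': 2.0, 'b': 3.0}))        # 2(1,0) + 3(1,1) = (5.0, 3.0)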

It is evident from the rules that govern sums (see Sect. 07) that linear-combination mappings are linear mappings.

In the special case when V := F and when f is the constant family whose terms are all 1 ∈ F, then lnc^F_f reduces to the summation mapping sum_I : F^(I) → F given by

sum_I λ := ∑_{i∈I} λ_i for all λ ∈ F^(I). (15.2)

Let U be a subspace of the given linear space V and let f be a family in U. Then Rng lnc^V_f ⊂ U and lnc^U_f is obtained from lnc^V_f by adjustment of codomain:

lnc^U_f = lnc^V_f |^U if Rng f ⊂ U. (15.3)


Definition 2: The family f in a linear space V is said to be linearly independent in V, spanning in V, or a basis of V depending on whether the linear-combination mapping lnc^V_f is injective, surjective, or invertible, respectively. We say that f is linearly dependent if it is not linearly independent.

If f is a family in a given linear space V and if there is no doubt what V is, we simply write lnc_f := lnc^V_f. Also we then omit “in V” and “of V” when we say that f is linearly independent, spanning, or a basis. Actually, if f is linearly independent in V, it is also linearly independent in any linear space that includes Rng f.

If b := (b_i | i ∈ I) is a basis of V and if v ∈ V is given, then lnc_b^{−1} v ∈ F^(I) is called the family of components of v relative to the basis b. Thus

v = ∑_{i∈I} λ_i b_i (15.4)

holds if and only if (λ_i | i ∈ I) ∈ F^(I) is the family of components of v relative to b.

An application of Prop. 4 of Sect. 13 gives:

Proposition 1: The family f is linearly independent if and only if Null(lnc_f) = {0}.

Let I′ be a subset of the given index set I. We then define the insertion mapping

ins_{I′⊂I} : F^(I′) → F^(I)

by

(ins_{I′⊂I}(λ))_i :=  λ_i   if i ∈ I′,
                      0     if i ∈ I\I′. (15.5)

It is clear that ins_{I′⊂I} is an injective linear mapping. If f is a family indexed on I and if f|_{I′} is its restriction to I′, we have

lnc_{f|_{I′}} = lnc_f ∘ ins_{I′⊂I}. (15.6)

From this formula and the injectivity of ins_{I′⊂I} we can read off the following:

Proposition 2: If the family f is linearly independent, so are all its restric-

tions. If any restriction of f is spanning, so is f.


Let S be a subset of V. In view of the identification of a set with the family obtained by self-indexing (see Sect. 02) we can consider the linear-combination mapping lnc_S : F^(S) → V, given by

lnc_S λ := ∑_{u∈S} λ_u u (15.7)

for all λ := (λ_u | u ∈ S) ∈ F^(S). Thus the definitions above apply not only to families in V in general, but also to subsets of V in particular.

The following facts are easily verified:

Proposition 3: A family in V is linearly independent if and only if it is injective and its range is linearly independent. No term of a linearly independent family can be zero.

Proposition 4: A family in V is spanning if and only if its range is spanning.

Proposition 5: A family in V is a basis if and only if it is injective and its range is a basis-set.

The following result shows that the definition of a spanning set as a special case of a spanning family is not in conflict with the definition of a set that spans the space as given in Sect. 12.

Proposition 6: The set of all linear combinations of a family f in V is the linear span of the range of f, i.e., Rng lnc_f = Lsp(Rng f). In particular, if S is a subset of V, then Rng lnc_S = Lsp S.

The following is an immediate consequence of Prop. 6:

Proposition 7: A family f is linearly independent if and only if it is a basis of Lsp Rng f.

The next two results are not hard to prove.

Proposition 8: A subset b of V is linearly dependent if and only if Lsp b = Lsp(b\{v}) for some v ∈ b.

Proposition 9: If b is a linearly independent subset of V and v ∈ V\b, then b ∪ {v} is linearly dependent if and only if v ∈ Lsp b.

By applying Props. 8 and 9, one easily proves the following important result:


Characterization of Bases: Let b be a subset of V. Then the following are equivalent:

(i) b is a basis of V.

(ii) b is both linearly independent and spanning.

(iii) b is a maximal linearly independent set, i.e., b is linearly independent and is not a proper subset of any other linearly independent set.

(iv) b is a minimal spanning set, i.e., b is spanning and has no proper subset that is also spanning.

We note that the zero-space {0} has exactly one basis-set, namely the empty set.

Pitfall: Every linear space other than a zero-space has infinitely many bases, if it has any at all. Unless the space has structure in addition to its structure as a linear space, all of these bases are of equal standing. The bases form a “democracy”. For example, every singleton {ξ} with ξ ∈ F× is a basis-set of the linear space F. The special role of the number 1 and hence of the basis {1} comes from the additional structure in F given by the multiplication.

Notes 15

(1) In most textbooks, the definition of “linear combination” is rather muddled. Much of the confusion comes from a failure to distinguish a process (the linear-combination mapping) from its result (the linear combination). Also, most authors fail to make a clear distinction between sets, lists, and families, a distinction that is crucial here. I believe that much precision, clarity, and insight are gained by the use of linear-combination mappings. Most textbooks only talk around these mappings without explicitly using them.

(2) The condition of Prop. 1 is very often used as the definition of linear independence.

(3) Many people say “coordinates” instead of “components” of v relative to the basis b. I prefer to use the term “coordinate” only when it has the meaning described in Chapter 7.

16 Matrices, Elimination of Unknowns

The following result states, roughly, that linear mappings preserve linear combinations.


Proposition 1: If V and W are linear spaces, f := (f_i | i ∈ I) a family in V, L : V → W a linear mapping, and Lf := (Lf_i | i ∈ I), then

lnc_{Lf} = L lnc_f. (16.1)

Applying (16.1) to the case when f is a basis, we obtain:

Proposition 2: Let b := (b_i | i ∈ I) be a basis of the linear space V and let g := (g_i | i ∈ I) be a family in the linear space W, indexed on the same set I as b. Then there is exactly one L ∈ Lin(V, W) such that Lb = g. This L is injective, surjective, or invertible depending on whether g is linearly independent, spanning, or a basis, respectively.

The first part of Prop. 2 states, roughly, that a linear mapping can be specified by prescribing its effect on a basis.

Let I be any index set. We define a family δ^I := (δ^I_i | i ∈ I) in F^(I) by

(δ^I_i)_k = δ_{i,k} :=  1  if k = i,
                        0  if k ≠ i,
for all i, k ∈ I. (16.2)

It is easily seen that lnc_{δ^I} = 1_{F^(I)}, which is, of course, invertible. Hence δ^I is a basis of F^(I). We call it the standard basis of F^(I). If f := (f_i | i ∈ I) is a family in a given linear space V, then

f_j = lnc_f δ^I_j for all j ∈ I. (16.3)

If we apply Prop. 2 to the case when I is finite, when b := δ^I is the standard basis of V := F^I, and when W := F^J for some finite index set J, we obtain:

Proposition 3: Let I and J be finite index sets. Then the mapping from Lin(F^I, F^J) to F^{J×I} which assigns to M ∈ Lin(F^I, F^J) the matrix ((Mδ^I_i)_j | (j, i) ∈ J × I) is a linear isomorphism.

We use the natural isomorphism described in Prop. 3 to identify M ∈ Lin(F^I, F^J) with the corresponding matrix in F^{J×I}, so that

M_{j,i} = (Mδ^I_i)_j for all (j, i) ∈ J × I. (16.4)

Thus, we obtain the identification

Lin(F^I, F^J) ≅ F^{J×I}. (16.5)

Proposition 4: Let I, J be finite index sets. If M ∈ Lin(F^I, F^J) ≅ F^{J×I} and λ ∈ F^I, then

(Mλ)_j = ∑_{i∈I} M_{j,i} λ_i for all j ∈ J. (16.6)

Proposition 5: Let I, J, K be finite index sets. If M ∈ Lin(F^I, F^J) ≅ F^{J×I} and N ∈ Lin(F^J, F^K) ≅ F^{K×J}, then NM ∈ Lin(F^I, F^K) ≅ F^{K×I} is given by

(NM)_{k,i} = ∑_{j∈J} N_{k,j} M_{j,i} for all (k, i) ∈ K × I. (16.7)

In the case when I := n], J := m], K := p] and when the bookkeeping scheme (02.4) is used, then (16.6) and (16.7) can be represented in the forms

\begin{pmatrix}
M_{11} & M_{12} & \cdots & M_{1n} \\
M_{21} & M_{22} & \cdots & M_{2n} \\
\vdots & \vdots &        & \vdots \\
M_{m1} & M_{m2} & \cdots & M_{mn}
\end{pmatrix}
\begin{pmatrix} λ_1 \\ λ_2 \\ \vdots \\ λ_n \end{pmatrix}
=
\begin{pmatrix}
\sum_{i∈n]} M_{1i} λ_i \\
\sum_{i∈n]} M_{2i} λ_i \\
\vdots \\
\sum_{i∈n]} M_{mi} λ_i
\end{pmatrix}    (16.8)

and

\begin{pmatrix}
N_{11} & \cdots & N_{1m} \\
\vdots &        & \vdots \\
N_{p1} & \cdots & N_{pm}
\end{pmatrix}
\begin{pmatrix}
M_{11} & \cdots & M_{1n} \\
\vdots &        & \vdots \\
M_{m1} & \cdots & M_{mn}
\end{pmatrix}
=
\begin{pmatrix}
\sum_{j∈m]} N_{1j} M_{j1} & \cdots & \sum_{j∈m]} N_{1j} M_{jn} \\
\vdots &        & \vdots \\
\sum_{j∈m]} N_{pj} M_{j1} & \cdots & \sum_{j∈m]} N_{pj} M_{jn}
\end{pmatrix}    (16.9)

In particular, (16.9) states that the composition matrices when regardedas linear mappings corresponds to the familiar “row-by-column” multiplica-tion of matrices.

Let V and W be linear spaces and let b := (bi | i ∈ I) andc := (cj | j ∈ J) be finite bases of V and W , respectively. Then the mapping

(L 7→ (lncc)−1Llncb) : Lin(V,W) → Lin(FI , FJ) (16.10)

is a linear isomorphism. Given L ∈ Lin(V,W) we say that M :=(lncc)

−1L lncb ∈ Lin(FI , FJ) ∼= FJ×I is the matrix of the linear map-ping L relative to the bases b and c. It is characterized by

Page 65: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

16. MATRICES, ELIMINATION OF UNKNOWNS 57

Lbi =∑

j∈J

Mj,i, cj for all i ∈ I. (16.11)

Let I and J be finite index sets. If M ∈ Lin(FI , FJ) ∼= FJ×I and µ ∈ FJ

are given and if we consider the problem ? λ ∈ FI , Mλ = µ, i.e.,

?(λi | i ∈ I) ∈ FI ,∑

i∈I

Mj,iλi = µj for all j ∈ J, (16.12)

we say that we have the problem of solving a system of #J linear equationswith #I unknowns. The following theorem describes a procedure, familiarfrom elementary algebra, called eliminations of unknowns, which enablesone to reduce a system of linear equations to one having one equation lessand one unknown less.

Theorem on Elimination of Unknowns: Let I and J be finite sets

and let M ∈ FJ×I be such that Mj0,i0 6= 0 for given j0 ∈ J, i0 ∈ I. Put

I ′ := I\i0, J ′ := J\j0 and define M ′ ∈ FJ ′×I′ by

M ′j,i := Mj,i −

Mj,i0Mj0,i

Mj0,i0

for all (j, i) ∈ J ′ × I ′. (16.13)

Let µ ∈ FJ and µ′ ∈ FJ ′be related by

µ′j = µj −Mj,i0

Mj0,i0

j0 for all j ∈ J ′. (16.14)

Then λ ∈ FI is a solution of the equation

? λ ∈ FI , Mλ = µ

if and only if the restriction λ |I′∈ FI′ of λ is a solution of the equation

? λ′ ∈ FI′ , M ′λ′ = µ′

and λi0 is given by

λi0 =1

Mj0,i0

(

µj0 −∑

i∈I′

Mj0,iλi

)

. (16.15)

Corollary: If Lin(FI , FJ) contains an injective mapping, then #J ≥ #I.

Proof: We proceed by induction over #I. If #I = 0 then the assertion istrivial. Assume, then, that #I > 0 and that the assertion is valid if I is

Page 66: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

58 CHAPTER 1. LINEAR SPACES

replaced by a subset I ′ of I having one element less. Assume that Lin(FI , FJ)contains an injective mapping M , do that Null M = 0. Since FI 6= 0we must have M 6= 0. Hence we can apply the Theorem with the choiceµ := 0 to conclude that Null M ′ = 0, i.e., that M ′ ∈ Lin(FI′ , FJ ′

) mustbe injective. By the induction hypothesis, we have #J ′ ≥ #I ′, which means(#J) − 1 ≥ (#I) − 1 and hence implies #J ≥ #I.

Notes 16

(1) The notation δi,k as defined by (16.2) is often attributed to Kronecker (“the Kro-necker deltas”).

(2) The standard basis δI is sometimes called the “natural basis” of F(I).

(3) Various algorithms, based on the Theorem on Elimination of Unknowns, for solvingsystems of linear equations are often called “Gaussian elimination procedures”.

(4) A matrix can be interpreted as a non-invertible linear mapping by the identification(16.5) is often called a “singular matrix”. In the same way, many people would usethe term “non-singular” when we speak of an “invertible” matrix.

17 Dimension

Definition: We say that a linear space F is finite-dimensional if it is

spanned by some finite subset. The least among the cardinal numbers of

finite spanning subsets of V is called the dimension of V and is denoted by

dim V.

The following fundamental result gives a much stronger characterizationof the dimension than is given by the definition.

Characterization of Dimension: Let V be a finite-dimensional linear

space and f := (fi | i ∈ I) a family of elements in V.

(a) If f is linearly independent then I is finite and #I ≤ dimV, with

equality if and only if f is a basis.

(b) If f is spanning, then #I ≥ dimV, with equality if and only if f is a

basis.

The proof of this theorem can easily be obtained from the Characteriza-tion of Bases (Sect. 15) and the following:

Lemma: If (bj | j ∈ J) is a finite basis of V and if (fi | i ∈ I) is any linearly

independent family in V, then I must be finite and #I ≤ #J .

Page 67: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

17. DIMENSION 59

Proof: Let I ′ be any finite subset of I. By Prop. 2 of Sect. 15, therestriction f|I′ is still linearly independent, which means thatlncf |I′ : FI′ → V is injective.

Since b is a basis, lncb is invertible and hence lnc−1b lncf |I′ : FI′ → FJ is

also injective. By the Corollary of the Theorem on Elimination of Unknowns,it follows that #I ′ ≦ #J . Since I ′ was an arbitrary finite subset of I, itfollows that I itself must be finite and #I ≦ #J.

The Theorem has the following immediate consequences:

Corollary 1: If V is a finite-dimensional linear space, then V has bases and

every basis of V has dim V terms.

Corollary 2: Two finite-dimensional spaces V and V ′ are linearly isomor-

phic if and only if they have the same dimension, i.e., Lis(V,V ′) 6= ∅ if and

only if dim V = dimV ′.Now let V be a finite-dimensional linear space. The following facts are

not hard to prove with the help of the Characterization of Dimension.

Proposition 1: If s is a linearly independent subset of V, then s must be

finite and

dim(Lsps) = #s. (17.1)

Proposition 2: Every subspace U of V is finite-dimensional and satisfies

dimU ≤ dimV, with equality only if U = V.

Proposition 3: Every subspace of V has a supplement in V. In fact, if U1

is a subspace of V , then U2 is a supplement of U1 in V if and only if U2 is

a space of greatest dimension among those that are disjunct from U1.

Proposition 4: Two subspaces U1 and U2 of V are disjunct if and only if

dimU1 + dimU2 = dim(U1 + U2). (17.2)

Proposition 5: Two subspaces U1 and U2 of V are supplementary in V if

and only if two of the following three conditions are satisfied:

(i) U1 ∩ U2 = 0,

(ii) U1 + U2 = V,

(iii) dimU1 + dimU2 = dimV.

Page 68: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

60 CHAPTER 1. LINEAR SPACES

Proposition 6: For any two subspaces U1,U2 of V, we have

dimU1 + dimU2 = dim(U1 + U2) + dim(U1 ∩ U2). (17.3)

The following Theorem is perhaps the single most useful fact about linearmappings between finite-dimensional spaces:

Theorem on Dimensions of Range and Nullspace: If L is a linear

mapping whose domain is finite-dimensional, then Rng L is finite dimen-

sional and

dim(Null L) + dim(Rng L) = dim(Dom L). (17.4)

Proof: Put V := Dom L. By Prop. 3 we may choose a supplement U of NullL in V. By Prop. 5 (iii), we have dim(Null L) + dimU = dimV. By Prop.5 of Sect. 13, L|RngL

U is invertible and hence a linear isomorphism. Sincelinear isomorphisms obviously preserve dimension, we infer that dimU =dim(Rng L) and hence that (17.4) holds.

The following result is an immediate consequence of the Theorem juststated and of Prop. 4 of Sect. 13. Its name comes from its analogy to thePigeonhole Principle stated in Sect. 05.

Pigeonhole Principle for Linear Mappings: Let L be a linear map-

ping with finite-dimensional domain and codomain. If L is injective, then

dim(Dom L) ≤ dim(Cod L). If L is surjective, then dim(Cod L) ≤dim(Dom L). If dim(Dom L) = dim(Cod L), then the following are equiva-

lent:

(i) L is invertible,

(ii) L is surjective,

(iii) L is injective,

(iv) Null L = 0.

Let I be a finite index set. Since the standard basis δI of FI (see Sect. 16)has #I terms, it follows from Cor. 1 to the Characterization of Dimensionthat

dim(FI) = #I. (17.5)

If I is replaced by a set product J × I of finite sets J and I, (17.5) yields

Page 69: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

18. LINEONS 61

dim(Lin(FI , FJ)) = dim(FJ×I) = (#I)(#J). (17.6)

Propositon 7: If V and W are finite-dimensional linear spaces, so is

Lin(V,W), and

dim(Lin(V,W)) = (dimV)(dimW). (17.7)

Proof: In view of Cor. 1, we may choose a basis (bi | i ∈ I) of V and abasis (cj | m ∈ J) of W , so that dim V = #I and dim W = #J . Thedesired result follows from (17.6) and the fact that Lin(FI , FJ) is linearlyisomorphic to Lin(V,W) by virtue of (16.10).

Notes 17

(1) The dimensions of the nullspace and of the range of a linear mapping L are oftencalled the “nullity” and “rank” of L, respectively. I believe that this terminologyburdens the memory unnecessarily.

18 Lineons

Let V be a linear space. A linear mapping from V to itself will be called alineon on V and the space of all lineons on V will be denoted by

Lin V := Lin(V,V).

The composite of two lineons on V is again a lineon on V (see Prop.1 ofSect. 13). Composition in Lin V plays a role analogous to multiplication inF. For this reason, the composite of two lineons is also called their product,and composition in Lin V is also referred to as multiplication. The lineon0 is the analogue of the number 0 in F and the identity-lineon 1V is theanalogue of the number 1 in F. It follows from Props. 1 and 3 of Sect. 14 thatthe multiplication and the addition in Lin V are related by distributive lawsand hence that Lin V has the structure of a ring (see Sect. 06). Moreover,the composition-multiplication and scalar multiplication in Lin V are relatedby the associative laws

(ξL)M = ξ(LM) = L(ξM), (18.1)

valid for all L,M ∈ Lin V, ξ ∈ F.Since Lin V has, in addition to its structure as a linear space, also a

structure given by the composition-multiplication, we refer to Lin V as the

Page 70: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

62 CHAPTER 1. LINEAR SPACES

algebra of lineons on V. Most of the rules of elementary algebra arealso valid in Lin V. The most notable exception is the commutative law ofmultiplication: In general, if L,M ∈ Lin V, then LM is not the same asML. If it happens that LM = ML, we say that L and M commute.

A subspace of Lin V that contains 1V and is stable under multiplicationiscalled a subalgebra of Lin V. For example

F1V := ξ1V | ξ ∈ F

is a subalgebra of Lin V. A subalgebra that contains a given L ∈ LinV isthe set

Comm L := M ∈ Lin V | ML = LM (18.2)

of all lineons that commute with L. It is called the commutant-algebraof L.

The set of all automorphisms of V, i.e., all linear isomorphisms from Vto itself, is denoted by

Lis V := Lis(V, V.

LisV is a subgroup of the group Perm V of all permutations of V. We callLisV the lineon-group of V.

Pitfall: If V is not a zero-space, then LisV does not contain the zero-lineonand hence cannot be a subspace of LinV. If dimV > 1, then LisV 6=(LinV)×, i.e., there are non-zero lineons that are not invertible.

Definition 1: Let L be a lineon on the given linear space V. We say that

a subspace U of V is an L-subspace of V, or simply an L-space, if it is

L-invariant, i.e., if L>(U) ⊂ U (see Sect. 03).The zero-space 0 and V itself are L-spaces for every L ∈ LinV. Also,

Null L and Rng L are easily seen to be L-spaces for every L ∈ LinV. Everysubspace of V is an (λ1V)-space for every λ ∈ F. If U is an L-space, it isalso an Lm-space for every m ∈ N, where Lm, the m-th lineonic power of L,is defined by Lm := Lm, the m-th iterate of L.

If U is an L-subspace, then the adjustments L|U := L |UU is linear andhence a lineon on U . We have (L|U )

m = (Lm)|U for all m ∈ N.Let V, V ′ be linear spaces and let A : V → V ′ be a linear isomorphism.

The mapping

(L 7→ ALA−1) : LinV → LinV ′

Page 71: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

18. LINEONS 63

is then an algebra-isomorphism. Also, we have

Rng(ALA−1) = A>(Rng L) (18.3)

and

Null (ALA−1) = A>(Null L) (18.4)

for all L ∈ Lin V.We assume now that V is finite-dimensional. By Prop. 7 of Sect. 17 we

then have

dimLinV = (dim V)2. (18.5)

The Linear Pigeonhole Principle of Sect. 17 has the following immediateconsequences:

Proposition 1: For a lineon L on V, the following are equivalent.

(i) L is invertible,

(ii) L is surjective,

(iii) L is injective,

(iv) Null L = 0.

Proposition 2: Let L ∈ LinV be given. Assume that L is left-invertible

or right-invertible, i.e., that ML = 1V or LM = 1V for some M ∈ Lin V.Then L is invertible, and M = L−1.

Let b := (bi | i ∈ I) be a basis of V. If L ∈ Lin V, we call

[L]b := lnc−1b L lncb (18.6)

the matrix of L relative to the basis b. (This is the matrix of L, asdefined in Sect. 16, when the bases b and c coincide.) This matrix is anelement of Lin FI ∼= FI×I and (L 7→ [L]b) : Lin V → Lin FI is an algebra-isomorphism, i.e., a linear isomorphism that also preserves multiplicationand the identity. By (16.11), the matrix [L]b is characterized by

Lbi =∑

j∈I

([L]b)j,ibj , for all i ∈ I. (18.7)

The matrix of the identity-lineon is the unit matrix given by

Page 72: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

64 CHAPTER 1. LINEAR SPACES

[1V ]b = 1FI = (δi,j | (i, j) ∈ I × I), (18.8)

where δi,j is 0 or 1 depending on whether i 6= j or i = j, respectively.

If I is a finite set, then the identification Lin FI ∼= FI×I shows that alineon on FI may be regarded as an I × I-matrix in F. The algebra Lin FI

is therefore also called a matrix-algebra.

Notes 18

(1) The commonly accepted term for a linear mapping of a linear space to itself is“linear transformation”, but “linear operator”, “operator”, or “tensor” are also usedin some contexts. I have felt for many years that there is a crying need for a shortterm with no other meanings. About three years ago my colleague Victor Mizelproposed to me the use of the contraction “lineon” for “linear transformation”, andhis wife Phyllis pointed out that this lends itself to the formation of the adjective“lineonic”, which turned out to be extremely useful.

I conjecture that the lack of a short term such as “lineon” is one of the reasonswhy so many mathematicians talk so often about matrices when they really mean,or should mean, lineons.

(2) Let n ∈ N be given. The group Lis Fn of lineons of Fn, is often called the “group ofinvertible n by n matrices of F” or the “general linear group Gl(n, F)”. Sometimes,the notations Gl(V) is used for the lineon-group Lis V.

(3) What we call the “commutant-algebra” of L is often called the “centralizer” of L.

19 Projections, Idempotents

Definition 1: Let V be a linear space. A linear mapping P : V → U to a

given subspace U of V is called a projection if P|U= 1U . A lineon

E ∈ Lin V is said to be idempotent (and is called an idempotent) if E2 =E.

Proposition 1: A lineon E ∈ Lin V is idempotent if and only if E|Rng is a

projection, i.e., if and only if E|RngE⊂V= 1RngE⊂V .

Proof: Put U := Rng E.

Assume that E is idempotent and let u ∈ U be given. We may choosev ∈ V such that u = Ev. Then

Eu = E(Ev) = E2v = Ev = u.

Since u ∈ U was arbitrary, it follows that E|U= 1U⊂V .

Page 73: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

19. PROJECTIONS, IDEMPOTENTS 65

Now assume that E|U= 1U⊂V and let v ∈ V be given. ThenEv ∈ Rng E = U and hence

E2v = E(Ev) = 1U⊂V(Ev) = Ev.

Since v ∈ V was arbitrary, it follows that E2 = E.

Proposition 2: A linear mapping P : V → U from a linear space V to a

subspace U of V is a projection if and only if it is surjective and P|V∈ Lin Vis idempotent.

Proof: Apply Prop. 1 to the case when E := P|V .

Proposition 3: Let E ∈ Lin V be given. Then the following are equivalent:

(i) E is idempotent.

(ii) 1V −E is idempotent.

(iii) Rng(1V − E) = Null E.

(iv) Rng E = Null (1V −E).

Proof: We observe that

Null L ⊂ Rng(1V − L) (19.1)

is valid for all L ∈ Lin V.

(i) ⇔ (iii): We have E = E2 if and only if (E−E2)v = E(1V −E)v = 0 forall v ∈ V, which is the case if and only if Rng(1V −E) ⊂ Null E. In view of(19.1), this is equivalent to (iii).

(ii) ⇔ (iv): This follows by applying (i) ⇔ (iii) with E replaced by 1V− E.

(i) ⇔ (ii): This follows from the identity (1V −E)2 = (1V −E) + (E2 −E),valid for all E ∈ Lin V.

The following proposition shows how projections and idempotents areassociated with pairs of supplementary subspaces.

Proposition 4: Let U1,U2 be subspaces of the linear space V. Then the

following are equivalent:

(i) U1 and U2 are supplementary.

Page 74: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

66 CHAPTER 1. LINEAR SPACES

(ii) There are projections P1 and P2 onto U1 and U2, respectively, such

that

v = P1v + P2v for all v ∈ V. (19.2)

(iii) There is a projection P1 : V → U1 such that U2 = Null P1.

(iv) There are idempotents E1,E2 ∈ Lin V such that U1 = Rng E1,U2 =Rng E2 and

E1 + E2 = 1V . (19.3)

(v) There is an idempotent E1 ∈ Lin V such that U1 = Rng E1 and U2 =Null E1.

The projections P1 and P2 and the idempotents E1 and E2 are uniquely

determined by the subspaces U1,U2.

Proof: (i) ⇔ (ii): If U1 and U2 are supplementary, it follows from Prop. 4,(ii) of Sect. 12 that every v ∈ V uniquely determines u1 ∈ U1 and u2 ∈ U2

such that v = u1+ u2. This means that there are mappings P1 : V → U1

and P2 : V → U2 such that (19.2) holds. It is easily seen that P1 and P2

are projections.

(ii) ⇒ (iii): It is clear from (19.2) that v ∈ Null P1, i.e., P1 v = 0, holds ifand only if v = P2v, which is the case if and only if v ∈ U2.

(iii) ⇒ (v): This is an immediate consequence of Prop. 2, with E1 := P1 |V .(v) ⇒ (iv): Put E2 := 1V− E1. Then E2 is idempotent and Rng E2 =Null E1 = U2 by Prop. 2.

(iv) ⇒ (i): We observe that Null L ∩ Null (1V − L) = 0 and V =Rng L + Rng(1V − L) hold for all L ∈ LinV. Using this observation whenL := E1 and hence E2 = 1V − L we conclude that V = U1 + U2 and, fromProp. 3, that 0 = U1 ∩ U2. By Def. 2 of Sect. 12 this means that (i)holds.

The proof of uniqueness of P1, P2, E1, E2 is left to the reader.

The next result, which is easily proved, shows how linear mappings withdomain V are determined by their restrictions to each of two supplementarysubspaces of V.

Proposition 5: Let U1 and U2 be supplementary subspaces of V. For every

linear space V ′ and every L1 ∈ Lin(U1,V′), L2 ∈ Lin(U2,V

′), there is exactly

one L ∈ Lin(V,V ′) such that

Page 75: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

110. PROBLEMS FOR CHAPTER 1 67

L1 = L |U1 and L2 = L |U2 .

It is given by

L := L1P1 + L2P2,

where P1 and P2 are the projections of Prop. 4, (ii).

Notes 19

(1) Some textbooks use the term “projection” in this context as a synonym for “idem-potent”. Although the two differ only in the choice of codomain, I believe that thedistinction is useful.

110 Problems for Chapter 1

1. Let V and V ′ be linear spaces. Show that a given mapping L : V → V ′

is linear if and only if its graph Gr(L) (defined by (03.1)) is a subspaceof V × V ′.

2. Let V be a linear space. For each L ∈ LinV, define the left-multiplication mapping

LeL : LinV → LinV

by

LeL(M) := LM for all M ∈ Lin V (P1.1)

(a) Show that LeL is linear for all L ∈ Lin V.

(b) Show that LeLLeK = LeLK for all L,K ∈ Lin V.

(c) Show that LeL is invertible if and only if L is invertible and that(Le−1

L ) = LeL−1 if this is the case.

3. Consider

L :=

[

2 11 0

]

∈ R2]×2] ∼= LinR2.

(a) Show that L is invertible and find its inverse.

Page 76: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

68 CHAPTER 1. LINEAR SPACES

(b) Determine the matrix of LeL, as defined by (P1.1), relative tothe list-basis

B :=

([

1 00 0

]

,

[

0 10 0

]

,

[

0 01 0

]

,

[

0 00 1

])

of Lin R2.

(c) Determine the inverse of the matrix you found in (b).

4. Let C∞(R) be the set of all functions f ∈ Map(R, R) such that f isk-times differentiable for all k ∈ N (see Sect. 08). It is clear that C∞Ris a subspace of the linear space Map (R, R) (see Sect. 13). DefineD ∈ Lin C∞(R) by

Df := ∂f for all f ∈ C∞(R) (P1.2)

(see (08.31)) and, for each n ∈ N, the subspace Pn of C∞(R) by

Pn := NullDn. (P1.3)

(a) Show that the sequence

p := (ιk | k ∈ N) (P1.4)

is linearly independent (see (08.26) for notation). Is it a basis ofC∞(R)?

(b) Show that, for each n ∈ N,

p |n[= (ιk | k ∈ n[) (P1.5)

is a basis of Pn and hence that dimPn = n.

5. Let C∞(R) and D ∈ Lin C∞(R) by defined as in Problem 4. DefineM ∈ Lin C∞(R) by Mf := ιf for all f ∈ C∞(R) (see (08.26)). Provethat

DM − MD = 1C∞(R). (P1.6)

6. Let C∞(R) be defined as in Problem 4 and define S ∈ Lin C∞(R) by

Sf := f (ι + 1) for all f ∈ C∞(R) (P1.7)

(a) Show that, for each n ∈ N, the subspace Pn of C∞(R) defined by(P1.3) is an S-space (see Sect. 18).

(b) Find the matrix of S |Pn∈ LinPn relative to the basis p |n[ of Pn

given by (P1.5).

7. Let V be a finite-dimensional linear space and let L ∈ LinV be given.

Page 77: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

110. PROBLEMS FOR CHAPTER 1 69

(i) Rng L ∩ Null L = 0,

(ii) Null (L2) ⊂ Null L,

(iii) Rng L + Null L = V.

8. Consider the subspaces U1 := R(1, 1) and U2 := R(1,−1) of R2.

(a) Show that U1 and U2 are supplementary in R2.

(b) Find the idempotent matrices E1, E2 ∈ R2]×2] ∼= LinR2 which aredetermined by U1,U2 according to Prop. 4 of Sect. 19.

(c) Determine a basis b = (b1, b2) of R2 such that the matrices of E1

and E2 relative to b are

[

1 00 0

]

and

[

0 00 1

]

, respectively.

9. Let a linear space V, a lineon L on V, and m ∈ N be given.

(a) Show that Rng(Lm) is an L-subspace of V.

(b) Show that Rng(Lm+1) ⊂ Rng(Lm), with equality if and only ifthe adjustment L|Rng(Lm) is surjective.

10. Let V be a finite-dimensional linear space and let L be a nilpotentlineon on V with nilpotency m, which means that Lm = 0 butLk 6= 0 for all k ∈ m[ (see also Sect. 93). Prove:

(a) If m > 0, then Null L 6= 0.

(b) The only L-subspace U for which the adjustment L|U is invertibleis the zero-space.

(c) The nilpotency cannot exceed dim V. (Hint: Use Problem 9.)

11. Let a linear space V and a lineon J ∈ Lin V be given.

(a) Show that the commutant-algebray of J (see (18.2)) is all of LinV, i.e., that CommJ = Lin V, if and only if J ∈ F1V .

(b) Prove: If dim V > 0, if J2 = −1V , and if the equation?ξ ∈ F, ξ2 + 1 = 0 has no solution, then CommJ 6= Lin V.

12. Let a linear space V and a lineon J on V satisfying J2 = −1V be given.Define E ∈ Lin(Lin V) by

EL := 12(L + JLJ) for all L ∈ Lin V. (P1.8)

Page 78: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

70 CHAPTER 1. LINEAR SPACES

(a) Show that E is idempotent.

(b) Show that Null E = CommJ, the commutant-algebra of J (see(18.2)).

(c) Show that

Rng E = L ∈ LinV | LJ = −JL, (P1.9)

the space of all lineons that “anticommute” with J.

(d) Prove: If dim V > 0 and if ξ ∈ F | ξ2 + 1 = 0 = ∅, then thespace (P1.9) is not the zero-space. (Hint: Use (b) of Problem 11and Prop. 4 of Sect. 19.)

Page 79: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

Chapter 2

Duality, Bilinearity

In this chapter, the phrase “let . . . be a linear space” will be used as ashorthand for “let . . . be a finite-dimensional linear space over R”. (Actually,many definitions remain meaningful and many results remain valid when thegiven spaces are infinite-dimenisonal or when R is replaced by an arbitraryfield. The interested reader will be able to decide for himself when this isthe case.)

21 Dual Spaces, Transposition, Annihilators

Let V be a linear space. We write

V∗ := Lin(V, R)

and call the linear space V∗ the dual space of the space V. The elementsof V∗ are often called linear forms or covectors, depending on context. Itis evident from Prop.7 of Sect. 17 that

dimV∗ = dimV. (21.1)

Let V and W be linear spaces and let L ∈ Lin(V,W) be given. It followsfrom Props.1 and 2 of Sect. 14 that the mapping (µ 7→ µL) : W∗ → V∗ islinear. We call this mapping the transpose of L and denote it by L⊤, sothat

L⊤ ∈ Lin(W∗,V∗) if L ∈ Lin(V,W) (21.2)

and

L⊤µ = µL for all µ ∈ W∗. (21.3)

71

Page 80: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

72 CHAPTER 2. DUALITY, BILINEARITY

It is an immediate consequence of Prop.3 of Sect. 14 that the mapping

(L 7→ L⊤) : Lin(V,W) → Lin(W∗,V∗) (21.4)

is linear. This mapping is called the transposition on Lin(V,W). Thefollowing rules follow directly from (21.3):

Proposition 1: For every linear space V, we have

(1V)⊤ = 1V∗ . (21.5)

Let V,V ′,V ′′ be linear spaces. For all L ∈ Lin(V,V ′) and all

M ∈ Lin(V ′,V ′′), we have

(ML)⊤ = L⊤M⊤. (21.6)

If L ∈ Lin(V,V ′) is invertible, so is L⊤ ∈ Lin(V′∗,V∗), and

(L⊤)−1 = (L−1)⊤. (21.7)

Definition: Let V be a linear space and let S be a subset of V. We say

that a linear form λ ∈ V∗ annihilates S if λ>(S) ⊂ 0, or, equivalently,

λ|S = 0, or, equivalently, S ⊂ Null λ. The set of all linear forms that

annihilate S is called the annihilator of S and is denoted by

S⊥ := λ ∈ V∗ | λ>(S) ⊂ 0. (21.8)

The following facts are immediate consequences of the definition.Proposition 2: ∅⊥ = 0⊥ = V∗ and V⊥ = 0. If S1 and S2 are

subsets of V then

S1 ⊂ S2 =⇒ S⊥2 ⊂ S⊥

1 .

Proposition 3: For every subset S of V, S⊥ is a subspace of V∗ and

(LspS)⊥ = S⊥.

Proposition 4: If (Si | i ∈ I) is a family of subsets of V, then

(⋃

i∈I

Si)⊥ =

i∈I

S⊥i . (21.9)

Combining Prop.3 and Prop.4, and using Prop.2 of Sect. 12, we obtainthe following:

Proposition 5: If (Ui | i ∈ I) is a family of subspaces of V, then

(∑

i∈I

Ui)⊥ =

i∈I

U⊥i . (21.10)

Page 81: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

21. DUAL SPACES, TRANSPOSITION, ANNIHILATORS 73

In particular, if U1, U2 are subspaces of V, then

(U1 + U2)⊥ = U⊥

1 ∩ U⊥2 . (21.11)

The following result relates the annihilator of a subspace to the annihi-lator of the image of this subspace under a linear mapping.

Theorem on Annihilators and Transposes: Let V,W be linear

spaces and let L ∈ Lin(V,W) be given. For every subspace U of V, we

then have

(L>(U))⊥ = (L⊤)<(U⊥). (21.12)

In particular, we have

(Rng L)⊥ = Null L⊤. (21.13)

Proof: Let µ ∈ W∗ be given. Then, by (21.8) and (21.3),

µ ∈ (L>(U))⊥ ⇐⇒ 0 = µ>(L>(U)) = (µL)>(U) = (L⊤µ)>(U)

⇐⇒ L⊤µ ∈ U⊥ ⇐⇒ µ ∈ (L⊤)<(U⊥).

Since µ ∈ W∗ was arbitrary, (21.12) follows. Putting U := V in (21.12)yields (21.13).

The following result states, among other things, that every linear formon a subspace of V can be extended to a linear form on all of V.

Proposition 6: Let U be a subspace of V. The mapping

(λ 7→ λ|U ) : V∗ → U∗ (21.14)

is linear and surjective, and its nullspace is U⊥.Proof: It is evident that the mapping (21.14) is linear and that its

nullspace is U⊥. By Prop.3 of Sect. 17, we may choose a supplement U ′ ofU in V. Now let µ ∈ U∗ be given. By Prop.5 of Sect. 19 there is a λ ∈ V∗

such that λ|U = µ (and λ|U ′ = 0, say). Since µ ∈ U∗ was arbitrary, itfollows that the mapping (21.14) is surjective.

Using (21.1) and the Theorem on Dimensions of Range and Nullspace(Sect. 17), we see that Prop.6 has the following consequence:

Formula for Dimension of Annihilators: For every subspace U of a

given linear space V we have

dimU⊥ = dimV − dimU . (21.15)

Page 82: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

74 CHAPTER 2. DUALITY, BILINEARITY

Notes 21

(1) The notations V ′ or V are sometimes used for the dual V∗ of a linear space V.

(2) Some people use the term “linear functional” instead of “linear form”. I prefer toreserve “linear functional” for the case when the domain is infinite-dimensional.

(3) The terms “adjoint” or “dual” are often used in place of the “transpose” of a linearmapping L. Other notations for our L

⊤ are L∗, L

t, tL, and L.

(4) The notation S0 instead of S⊥ is sometimes used for the annihilator of the set S.

22 The Second Dual Space

In view of (21.1) and Corollary 2 of the Characterization of Dimension (Sect.17), it follows that there exist linear isomorphisms from a given linear spaceV to its dual V∗. However, if no structure on V other than its structure asa linear space is given, none of these isomorphisms is natural (see the Re-mark at the end of Sect.23). The specification of any one such isomorphismrequires some capricious choice, such as the choice of a basis. By contrast,one can associate with each linear space V a natural isomorphism from V toits second dual, i.e. to the dual V∗∗ of the dual V∗ of V. This isomorphismis an evaluation mapping as described in Sect.04.

Proposition 1: Let V be a linear space. For each v ∈ V, the evaluation

ev(v) : V∗ → R, defined by

ev(v)(λ) := λv for all λ ∈ V∗, (22.1)

is linear and hence a member of V∗∗. The evaluation mapping

ev : V → V∗∗ defined in this way is a linear isomorphism.

Proof: The linearity of ev(v) : V∗ → R merely reflects the fact that thelinear-space operations in V∗ are defined by value-wise applications of theoperations in R. The linearity of ev : V → V∗∗ follows from the fact thateach λ ∈ V∗ is linear.

Put U := Null (ev) and let v ∈ V be given. By (22.1), we have

v ∈ U ⇐⇒ (λv = 0 for all λ ∈ V∗),

which means, by the definition of annihilator (see Sect.21), that V∗ coincideswith the annihilator U⊥ of U . Since dimV∗ = dimV, we conclude from theFormula (21.15) for Dimension of Annihilators that dimU = 0 and henceNull (ev) = U = 0. Since dimV∗∗ = dimV, we can use the Pigeonhole

Page 83: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

22. THE SECOND DUAL SPACE 75

Principle for Linear Mappings (Sect. 17) to conclude that ev is invertible.

We use the natural isomorphism described by Prop.1 to identify V∗∗ withV:

V∗∗ ∼= V,

and hence we use the same symbol for an element of V and the correspondingelement of V∗∗. Thus, (22.1) reduces to

vλ = λv for all λ ∈ V∗,v ∈ V, (22.2)

where the v on the left side is interpreted as an element of V∗∗.

The identification V∗∗ ∼= V induces identifications such as

Lin(V∗∗,W∗∗) ∼= Lin(V,W)

when V and W are given linear spaces. In particular, if H ∈ Lin(W∗,V∗),we will interpret H⊤ as an element of Lin(V,W). In view of (22.2) and(21.3), H⊤ is characterized by

µH⊤v = (Hµ)v for all v ∈ V, µ ∈ W∗. (22.3)

Using (21.3) and (22.3), we immediately obtain the following:

Proposition 2: Let V, W be linear spaces. Then the transposition

(H 7→ H⊤) : Lin(W∗,V∗) → Lin(V,W) is the inverse of the transposition

(L 7→ L⊤) : Lin(V,W) → Lin(W∗,V∗), so that

(L⊤)⊤ = L for all L ∈ Lin(V,W) (22.4)

The identification V∗∗ ∼= V also permits us to interpret the annihilatorH⊥ of a subset H of V∗ as a subset of V.

Proposition 3: For every subspace U of a given linear space V, we have

(U⊥)⊥ = U (22.5)

Proof: Let u ∈ U be given. By definition of U⊥, we have λu = 0 forall λ ∈ U⊥. If we interpret u as an element of V∗∗ and use (22.2), thismeans that uλ = 0 for all λ ∈ U⊥ and hence u>(U⊥) = 0. Therefore,we have u ∈ (U⊥)⊥. Since u ∈ U was arbitrary, it follows that U ⊂ (U⊥)⊥.Applying the Formula (21.15) for Dimension of Annihilators to U and U⊥

and recalling dimV = dimV∗, we find dimU = dim(U⊥)⊥. By Prop.2 ofSect.17, this is possible only when (22.5) holds.

Page 84: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

76 CHAPTER 2. DUALITY, BILINEARITY

Using (22.5) and (22.4) one obtains the following consequences ofProps.3, 2, and 5 of Sect.21.

Proposition 4: For every subset S of V, we have

LspS = (S⊥)⊥. (22.6)

If U1 and U2 are subspaces of V, then

U⊥1 ⊂ U⊥

2 =⇒ U2 ⊂ U1

andU⊥

1 + U⊥2 = (U1 ∩ U2)

⊥. (22.7)

Using (22.5) and (22.4) one also obtains the following consequence of theTheorem on Annihilators and Transposes (Sect.21).

Proposition 5: Let V and W be linear spaces and let L ∈ Lin(V,W) be

given. For every subspace H of W∗, we then have

L⊤>(H) = (L<(H⊥))⊥. (22.8)

In particular, we have

Rng L⊤ = (Null L)⊥. (22.9)

Notes 22

(1) Our notation λv for the value of λ ∈ V∗ := Lin(V, R) at v ∈ V, as in (22.1), isin accord with the general notation for the values of linear mappings (see Sect.13).Very often, a more complicated notation, such as 〈v, λ〉 or [v, λ], is used. I disagreewith the claim of one author that this complication clarifies matters later on; Ibelieve that it obscures them.

23 Dual Bases

We now consider the space RI of all families of real numbers indexed on agiven finite set I. Let ε := (evi | i ∈ I) be the evaluation family associatedwith RI (see Sect.04). As we already remarked in Sect.14, the evaluationsevi : RI → R, i ∈ I, are linear, i.e. they are members of (RI)∗. Thus, εis a family in (RI)∗. The linear combination mapping lncε : RI → (RI)∗ isdefined by

lncελ :=∑

i∈I

λievi for all λ ∈ RI (23.1)

Page 85: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

23. DUAL BASES 77

(see Def.1 in Sect.15). It follows from (23.1) that

(lncελ)µ =∑

i∈I

λievi(µ) =∑

i∈I

λiµi (23.2)

for all λ, µ ∈ RI .Let δI := (δI

k | k ∈ I) be the standard basis of RI (see Sect.16). Writing(23.2) with the choice µ := δI

k, k ∈ I, we obtain

(lncελ)δIk = λk = evkλ (23.3)

for all λ ∈ RI and all k ∈ I. Given α ∈ (RI)∗, it easily follows from (23.3)that λ := (αδI

i | i ∈ I) is the unique solution of the equation

?λ ∈ RI , lncελ = α.

Since α ∈ (RI)∗ was arbitrary, we can conclude that lncε is invertible andhence a linear isomorphism.

It is evident from (23.2) that

(lncελ)µ = (lncεµ)λ for all λ, µ ∈ RI . (23.4)

Comparing this result with (22.3) and using the identification (RI)∗∗ ∼=RI , we obtain lnc⊤ε = lncε, where lnc⊤ε ∈ Lin((RI)∗∗, (RI)∗) is identifiedwith the corresponding element of Lin(RI , (RI)∗). In other words, theisomorphism (lnc⊤ε )−1lncε : RI → (RI)∗∗ coincides with the identificationRI ∼= (RI)∗∗ obtained from Prop.1 of Sect.22. Therefore, there is no conflictif we use lncε to identify (RI)∗ with RI .

From now on we shall use lncε to identify (RI)∗ with RI :

(RI)∗ ∼= RI ,

except that, given λ ∈ RI , we shall write λ· := lncελ rather than merely λfor the corresponding element in (RI)∗. Thus, (23.2) and (23.4) reduce to

µ · λ = λ · µ =∑

i∈I

λiµi for all λ, µ ∈ RI . (23.5)

The equations (23.5) and (23.3) yield

δIk · λ = λ · δI

k = λk = evkλ (23.6)

for all λ ∈ RI and all k ∈ I. It follows that evk = δIk· for all k ∈ I, i.e. that

the standard basis δI becomes identified with the evaluation family ε. Since

evk(δIi ) = δk,i for all i, k ∈ I

Page 86: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

78 CHAPTER 2. DUALITY, BILINEARITY

we see that

δIk · δI

i = δk,i :=

1 if k = i0 if k 6= i

(23.7)

holds for all i, k ∈ I.The following result is an easy consequence of (22.3), (23.5), and (16.2).

It shows that the transpose of a matrix as defined in Sect.02 is the same asthe transpose of the linear mapping identified with this matrix, and hencethat there is no notational clash.

Proposition 1: Let I and J be finite index sets and let

M ∈ Lin(RI , RJ) ∼= RJ×I be given. Then

M⊤ ∈ Lin((RJ)∗, (RI)∗) ∼= Lin(RJ , RI) ∼= RI×J

satisfies

(M⊤µ) · λ = µ · Mλ for all µ ∈ RJ , λ ∈ RI (23.8)

and

(M⊤)i,j = Mj,i for all i ∈ I, j ∈ J. (23.9)

Let V be a linear space, let b := (bi | i ∈ I) be a basis of V, and letlncb : RI → V be the (invertible) linear combination mapping for b (seeSect.15). Using the identification (RI)∗ ∼= RI , we can regard (lnc−1

b )⊤ as amapping from RI to V∗. Using the standard basis δI of RI , we define

b∗i := (lnc−1

b )⊤δIi for all i ∈ I (23.10)

and call the family b∗ := (b∗i | i ∈ I) in V∗ the dual of the given basis b.

Since (lnc−1b )⊤ is invertible, it follows from Prop.2 of Sect.16 that the dual

b∗ is a basis of V∗.Using (23.10) and (21.7), we find that

lnc⊤bb∗i = δI

i = lnc−1b bi for all i ∈ I. (23.11)

The dual basis b∗ can be used to evaluate the family of components ofa given v ∈ V relative to the basis b:

Proposition 2: Let b := (bi | i ∈ I) be a basis of V and let b∗ be its

dual. For every v ∈ V, we then have

(lnc−1b v)i = b∗

i v for all i ∈ I (23.12)

and hence

v =∑

i∈I

(b∗i v)bi. (23.13)

Page 87: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

23. DUAL BASES 79

Proof: It follows from (23.10), (21.3) and (23.6) that

b∗i v = ((lnc−1

b )⊤δIi )v = (lncbe

−1v) · δIi = (lnc−1

b v)i

for all i ∈ I.Using Prop.2 and the formula (16.11) one easily obtains the following:Proposition 3: Let V and W be linear spaces and let b := (bi | i ∈ I)

and c := (cj | j ∈ J) be bases of V and W, respectively. Then the matrix

M ∈ RJ×I of a given L ∈ Lin(V,W) relative to b and c can be obtained by

the formula

Mj,i = c∗jLbi for all i ∈ I, j ∈ J. (23.14)

The following result gives the most useful characterization of the dual ofa basis.

Proposition 4: Let b := (bi | i ∈ I) be a basis of V and let β :=(βi | i ∈ I) be a family in V∗. Then

βkbi = δk,i (23.15)

holds for all i, k ∈ I if and only if β coincides with the dual b∗ of b.

Proof: In view of (23.7) and (21.7), the relation (23.15) is valid if andonly if

βkbi = δIk · δI

i = δIk · (lnc−1

b bi) = ((lnc−1b )⊤δI

k)bi.

Therefore, since b is a basis, it follows from the uniqueness assertion ofProp.2 of Sect.16 that (23.15) holds for all i ∈ I if and only if βk =(lnc−1

b )⊤δ⊤k . The assertion now follows from the definition (23.10) of thedual basis.

The following result furnishes a useful criterion for linear independence.Proposition 5: Let f := (fj | j ∈ J) be a family in V and

ϕ := (ϕj | j ∈ J) a family in V∗ such that

ϕkfj = δk,j for all j, k ∈ J. (23.16)

Then f and ϕ are both linearly independent.

Proof: By the definition (15.1) of lncf it follows from (23.16) that

ϕk(lncfλ) =∑

j∈J

λj(ϕkfj) = λk

for all λ ∈ R(J) and all k ∈ J . Hence we can have λ ∈ Null lncf i.e.lncfλ = 0, only if λk = 0 for all k ∈ J , i.e. only if λ = 0. It follows thatNull lncf = 0, which implies the linear independence of f by Prop.1 of

Page 88: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

80 CHAPTER 2. DUALITY, BILINEARITY

Sect.15. The linear independence of ϕ follows by using the identificationV∗∗ ∼= V and by interchanging the roles of f and ϕ.

If we apply Prop.4 to the case when V is replaced by V∗ and b by b∗

and use (22.2), we obtain:

Proposition 6: The dual b∗∗ of the dual b∗ of a basis b of V is identified

with b itself by the identification V∗∗ ∼= V, i.e. we have

b∗∗ = b. (23.17)

Using this result and Prop.3 we obtain:

Proposition 7: Let V, W, b and c be given as in Prop.3. If M ∈ RJ×I

is the matrix of a given L ∈ Lin(V,W) relative to b and c, then M⊤ ∈ RI×J

is the matrix of L⊤ ∈ Lin(W∗,V∗) relative to c∗ and b∗, i.e.

M⊤i,j = biL

⊤c∗j = cjLbi = Mj,i for all i ∈ I, j ∈ J (23.18)

Let β := (βi | i ∈ I) be a family in V∗, so that β ∈ V∗I . Using the iden-tification V∗I = (Lin(V, R))I ∼= Lin(V, RI) defined by termwise evaluation(see Sect.14), and the identification (RI)∗ ∼= RI characterized by (23.5), weeasily see that β⊤ ∈ Lin((RI)∗,V∗) ∼= Lin(RI ,V∗) is given by

β⊤ = lncβ. (23.19)

Remark: Let b be a basis of V. The mapping (lnc−1b )⊤lnc−1

b : V → V∗

is a linear isomorphism. In fact, by (23.10) and (23.11) it is the (unique)linear isomorphism that maps the basis b termwise onto the dual basis b∗.This isomorphism is not a natural isomorphism because it depends on thecapricious choice of the basis b. The natural isomorphism that is used forthe identification (RI)∗ ∼= RI exists because RI has a natural basis, namelythe standard basis. This basis gives RI a structure beyond the mere linear-space structure. To use the metaphor mentioned in the Pitfall at the end ofSect.15, the bases in RI do not form a “democracy”, as is the case in linearspaces without additional structure. Rather, they form a “monarchy” withthe standard basis as king.

24 Bilinear Mappings

Definition 1: Let V1, V2, and W be linear spaces. We say that the mapping

B : V1 × V2 → W is bilinear if B(v1, ·) := (v2 7→ B(v1,v2)) : V2 → Wis linear for all v1 ∈ V1 and B(·,v2) := (v1 7→ B(v1,v2)) : V1 → W is

Page 89: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

24. BILINEAR MAPPINGS 81

linear for all v2 ∈ V2. The set of all bilinear mappings from V1 × V2 to Wis denoted by Lin2(V1 × V2,W).

Briefly, to say that B is bilinear means that B(v1, ·) ∈ Lin(V2,W) forall v1 ∈ V1 and B(·,v2) ∈ Lin(V1,W) for all v2 ∈ V2.

Proposition 1: Lin2(V1 × V2,W) is a subspace of Map (V1 × V2,W).Proof: Lin2(V1×V2,W) is not empty because the zero-mapping belongs

to it. To show that Lin2(V1 ×V2,W) is stable under addition, consider twoof its members B and C. Using the definition of B + C (see Sect.14) wehave

(B + C)(v1, ·) = B(v1, ·) + C(v1, ·)

for all v1 ∈ V1. Since Lin(V2,W) is stable under addition, it follows that(B + C)(v1, ·) ∈ Lin(V2,W) for all v1 ∈ V1. A similar argument shows that(B + C)(·,v2) ∈ Lin(V1,W) for all v2 ∈ V2 and hence thatB + C ∈ Lin2(V1 × V2,W). It is even easier to show that Lin2(V1 × V2,W)is stable under scalar multiplication.

Pitfall: The space Lin2(V1 ×V2,W) of bilinear mappings has little con-nection with the space Lin(V1 × V2,W) of linear mappings from the prod-uct space V1 × V2 to W . In fact, it is easily seen that as subspaces ofMap (V1 ×V2,W) the two are disjunct, i.e. they have only the zero mappingin common.

Proposition 2: Let V1, V2, and W be linear spaces. For each

B ∈ Lin2(V1 × V2,W), the mapping

(v1 7→ B(v1, ·)) : V1 → Lin(V2,W)

is linear. Moreover, the mapping

(B 7→ (v1 7→ B(v1, ·))) : Lin2(V1 × V2,W) → Lin(V1, Lin(V2,W))

is a linear isomorphism.

Proof: The first assertion follows from the definition of the linear-spaceoperations in Lin(V2,W) and the linearity of B(·,v2) for all v2 ∈ V2. Thesecond assertion is an immediate consequence of the definitions of the spacesinvolved and of the definition of the linear-space operations in these spaces.

We use the natural isomorphism described in Prop.2 to identify:

Lin2(V1 × V2,W) ∼= Lin(V1, Lin(V2,W)). (24.1)

This identification is expressed by

(Bv1)v2 = B(v1,v2) for all v1 ∈ V1,v2 ∈ V2, (24.2)

Page 90: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

82 CHAPTER 2. DUALITY, BILINEARITY

where on the left side B is interpreted as an element of Lin(V1, Lin(V2,W))and on the right side B is interpreted as an element of Lin2(V1×V2,W). Theidentification given by (24.1) and (24.2) is consistent with the identificationdescribed by (04.28) and (04.29).

Using Prop.7 of Sect.17 and Prop.2 above, we obtain the following for-mula for the dimension of spaces of bilinear mappings:

dimLin2(V1 × V2,W) = (dimV1)(dimV2)(dimW). (24.3)

Examples:

1. The scalar multiplication sm : R×V → V of a linear space V is bilinear,i.e. an element of

Lin2(R × V,V) ∼= Lin(R, Lin(V,V)) = Lin(R, LinV).

The corresponding linear mapping sm ∈ Lin(R, LinV) is given by

sm = (ξ 7→ ξ1V) : R → LinV.

It is not only linear, but it also preserves products, i.e.sm(ξη) = (smξ)(smη) holds for all ξ, η ∈ R. In fact, sm is an in-jective algebra-homomorphism from R to the algebra of lineons on V(see Sect.18).

2. Let V and W be linear spaces. Then

((L,v) 7→ Lv) : Lin(V,W) × V → W

is bilinear. The corresponding linear mapping is simply 1Lin(V,W). Inthe special case W := R, we obtain the bilinear mapping

((λ,v) 7→ λv) : V∗ × V → R.

The corresponding linear mapping is 1V∗ .

3. Let S be a set and let V and V ′ be linear spaces. It then follows fromProps.1 and 3 of Sect.14 that

((L, f) 7→ Lf) : Lin(V,V ′) × Map (S,V) → Map (S,V ′)

is bilinear.

Page 91: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

24. BILINEAR MAPPINGS 83

4. Let V, V ′, V ′′ be linear spaces. Then

((M,L) 7→ ML) : Lin(V ′,V ′′) × Lin(V,V ′) → Lin(V,V ′′)

defines a bilinear mapping. This follows from Prop.1 of Sect.13 andthe result stated in the preceding example.

The following two results are immediate consequences of the definitionsand Prop.1 of Sect.13.

Proposition 3: The composite of a bilinear mapping with a linear

mapping is again bilinear. More precisely, if V1, V2, W, and W ′ are

linear spaces and if B ∈ Lin2(V1 × V2,W) and L ∈ Lin(W ,W ′), then

LB ∈ Lin2(V1 × V2,W′).

Proposition 4: The composite of the cross-product of a pair of linear

mappings with a bilinear mapping is again bilinear. More precisely, if V1, V2,

V ′1, V ′

2, and W are linear spaces and if L1 ∈ Lin(V1,V′1), L2 ∈ Lin(V2,V

′2)

and B ∈ Lin2(V′1 × V ′

2,W), then B (L1 × L2) ∈ Lin2(V1 × V2,W).With every B ∈ Lin2(V1 × V2,W) we can associate a bilinear mapping

B∼ ∈ Lin2(V2 × V1,W) defined by

B∼(v2,v1) := B(v1,v2) for all v1 ∈ V1, v2 ∈ V2 (24.4)

We call B∼ the switch of B. It is evident that

(B∼)∼ = B (24.5)

holds for all bilinear mappings B and that the switching, defined by

(B 7→ B∼) : Lin2(V1 × V2,W) → Lin2(V2 × V1,W)

is a linear isomorphism.Definition 2: Let V and W be linear spaces. We say that a bilinear

mapping B ∈ Lin2(V2,W) is symmetric if B∼ = B, i.e. if

B(u,v) = B(v,u) for all u,v ∈ V; (24.6)

we say that it is skew if B∼ = −B, i.e. if

B(u,v) = −B(v,u) for all u,v ∈ V. (24.7)

We use the notations

Sym2(V2,W) := S ∈ Lin2(V

2,W) | S∼ = S,

Skew2(V2,W) := A ∈ Lin2(V

2,W) | A∼ = −A.

Page 92: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

84 CHAPTER 2. DUALITY, BILINEARITY

Proposition 5: A bilinear mapping A ∈ Lin2(V2,W) is skew if and

only if

A(u,u) = 0 for all u ∈ V. (24.8)

Proof: If A is skew, then, by (24.7), we have A(u,u) = −A(u,u) andhence A(u,u) = 0 for all u ∈ V. If (24.8) holds, then

0 = A(u + v,u + v) = A(u,u) + A(u,v) + A(v,u) + A(v,v)= A(u,v) + A(v,u)

and hence A(u,v) = −A(v,u) for all u,v ∈ V. .

Proposition 6: To every B ∈ Lin2(V2,W) corresponds a unique pair

(S,A) with S ∈ Sym2(V2,W), A ∈ Skew2(V

2,W) such that B = S + A. In

fact, S and A are given by

S =1

2(B + B∼), A =

1

2(B− B∼). (24.9)

Proof: Assume that S ∈ Sym2(V2,W) and A ∈ Skew2(V

2,W) aregiven such that B = S + A. Since B 7→ B∼ is linear, it follows thatB∼ = S∼+A∼ = S−A. Therefore, we have B+B∼ = (S+A)+(S−A) = 2Sand B − B∼ = (S + A) − (S − A) = 2A, which shows that S and A mustbe given by (24.9) and hence are uniquely determined by B. On the otherhand, if we define S and A by (24.9), we can verify immediately that S issymmetric, that A is skew, and that B = S + A.

In view of Prop.4 of Sect.12, Prop.6 has the following immediate conse-quence:

Proposition 7: Sym2(V2,W) and Skew2(V

2,W) are supplementary

subspaces of Lin2(V2,W).

Let V and W be linear spaces. We consider the identifications

Lin2(V ×W∗, R) ∼= Lin(V,W∗∗) ∼= Lin(V,W)

and

Lin2(W∗ × V, R) ∼= Lin(W∗,V∗).

Hence if L ∈ Lin(V,W), we can not only form its transposeL⊤ ∈ Lin(W∗,V∗) but, by interpreting L as an element of Lin2(V ×W∗, R),we can also form its switch L∼ ∈ Lin2(W

∗ × V, R). It is easily verified thatL⊤ and L∼ correspond under the identification, i.e.

L∼ = L⊤ for all L ∈ Lin(V,W). (24.10)

Page 93: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

25. TENSOR PRODUCTS 85

Pitfall: For a bilinear mapping B whose codomain is not R, the linearmapping corresponding to the switch B∼ is not the same as the transposeB⊤ of the linear mapping corresponding to B.

Let V1, V2, W1, and W2 be linear spaces. Let L1 ∈ Lin(V1,W1),L2 ∈ Lin(V2,W2) and B ∈ Lin2(W1 ×W2, R) ∼= Lin(W1,W

∗2 ) be given. By

Prop.4, we then have

B (L1 × L2) ∈ Lin2(V1 × V2, R) ∼= Lin(V1,V∗2 ).

Using these identifications, it is easily seen that

B (L1 × L2) = L⊤2 BL1. (24.11)

Notes 24

(1) The terms “antisymmetric” and “skewsymmetric” are often used for what we call,simply, “skew”. A bilinear mapping that satisfies the condition (24.8) is oftensaid to be “alternating”. In the case considered here, this term is synonymouswith “skew”, but one obtains a different concept if one replaces R by a field ofcharacteristic 2.

(2) The pair (S,A) associated with the bilinear mapping B according to Prop.6 issometimes called the “Cartesian decomposition” of B.

25 Tensor Products

For any linear space V there is a natural isomorphism from Lin(R,V) ontoV, given by h 7→ h(1). The inverse isomorphism associates with v ∈ V themapping ξ 7→ ξv in Lin(R,V). We denote this mapping by v⊗ (read “veetensor”) so that

v ⊗ ξ := ξv for all ξ ∈ R. (25.1)

In particular, there is a natural isomorphism from R onto R∗ =Lin(R, R). It associates with every number η ∈ R the operation of mul-tiplication with that number, so that (25.1) reduces to η ⊗ ξ = ηξ. We usethis isomorphism to identify R∗ with R, i.e. we write η = η⊗. However, whenV 6= R, we do not identify Lin(R,V) with V because such an identificationwould conflict with the identification V ∼= V∗∗ and lead to confusion.

If λ ∈ V∗ = Lin(V, R), we can consider λ⊤ ∈ Lin(R∗,V∗) ∼= Lin(R,V∗).Using the identification R∗ ∼= R, it follows from (21.3) and (25.1) that λ⊤ξ =ξλ = λ ⊗ ξ for all ξ ∈ R, i.e. that λ⊤ = λ⊗.

Page 94: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

86 CHAPTER 2. DUALITY, BILINEARITY

Definition 1: Let V and W be linear spaces. For every w ∈ Wand every λ ∈ V∗ the tensor product of w and λ is defined to be

w ⊗ λ ∈ Lin(V,W), i.e. the composite of λ with w⊗ ∈ Lin(R,W), so that

(w ⊗ λ)v = (λv)w for all v ∈ V. (25.2)

In view of Example 4 of Sect. 24, it is clear that the mapping

((w, λ) 7→ w ⊗ λ) : W ×V∗ −→ Lin(V,W) (25.3)

is bilinear.

In the special case when V := RI ∼= (RI)∗ and W := RJ for fi-nite index sets I and J , (16.4) and (25.2) show that the tensor productµ ⊗ λ ∈ Lin(RI , RJ) ∼= RJ×I of µ ∈ RJ and λ ∈ RI has the components

(µ ⊗ λ)j,i = µjλi for all (j, i) ∈ J × I. (25.4)

Using the identifications V∗∗ ∼= V and W∗∗ ∼= W , we can form tensorproducts

w ⊗ v ∈ Lin(V∗,W), µ ⊗ v ∈ Lin(V∗,W∗),

w ⊗ λ ∈ Lin(V,W), µ ⊗ λ ∈ Lin(V,W∗)

for all v ∈ V, w ∈ W , λ ∈ V∗, µ ∈ W∗. Also, using identifications suchas

Lin(V,W) ∼= Lin(V,W∗∗) ∼= Lin2(V ×W∗, R)

we can interpret any tensor product as a bilinear mapping to R. For example,if w ∈ W and λ ∈ V∗ we have

(w ⊗ λ)(v, µ) = (λv)(wµ) for all v ∈ V, µ ∈ W∗ (25.5)

The following is an immediate consequence of (25.5), the definition (24.4)of a switch, and (24.10).

Proposition 1: For all w ∈ W and λ ∈ V∗, we have

(w ⊗ λ)∼ = (w ⊗ λ)⊤ = λ ⊗ w. (25.6)

The following facts follow immediately from the definition of a tensorproduct.

Proposition 2: If w 6= 0 then

Null (w ⊗ λ) = Null λ; (25.7)

Page 95: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

25. TENSOR PRODUCTS 87

if λ 6= 0 then

Rng (w ⊗ λ) = Rw. (25.8)

Proposition 3: Let V, V ′, W, W ′ be linear spaces. If λ ∈ V∗, w ∈ W,

L ∈ Lin(W ,W ′), and M ∈ Lin(V ′,V) then

L(w ⊗ λ) = (Lw) ⊗ λ (25.9)

and

(w ⊗ λ)M = w ⊗ (λM) = w ⊗ (M⊤λ). (25.10)

If v ∈ V and µ ∈ V′∗, then

(w ⊗ λ)(v ⊗ µ) = (λv)(w ⊗ µ). (25.11)

Tensor products can be used to construct bases of spaces of linear map-pings:

Proposition 4: Let V and W be linear spaces, let b := (bi | i ∈ I) be

a basis of V. Let b∗ := (b∗i | i ∈ I) be the basis of V∗ dual to b and let

c := (cj | j ∈ J) be a basis of W. Then (cj ⊗ b∗i | (j, i) ∈ J × I) is a basis

of Lin(V,W), and the matrix M ∈ RJ×I of the components of L relative to

this basis is the same as the matrix lnc−1c Llncb ∈ Lin(RI , RJ) ∼= RJ×I , i.e.

the matrix of L relative to the bases b and c (see Sect. 16).Proof: It is sufficient to prove that

L =∑

(j,i)∈J×I

Mj,i(cj ⊗ b∗i ) (25.12)

holds when M is the matrix of L relative to b and c. It follows from (16.11)and from (25.2) and Prop.4 of Sect. 23 that the left and right sides of(25.12) give the same value when applied to the terms bk of the basis b.Using Prop.2 of Sect. 16, we conclude that (25.12) must hold.

Using (18.6) and (18.8), we obtain the following special case of (25.12):Proposition 5: Let V be a linear space and let b := (bi | i ∈ I) be a

basis of V. For every lineon L ∈ LinV, we then have

L =∑

(j,i)∈I×I

([L]b)j,i(bj ⊗ b∗i ). (25.13)

In particular, we have

1V =∑

i∈I

bi ⊗ b∗i . (25.14)

Prop.4 has the following corollary:

Page 96: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

88 CHAPTER 2. DUALITY, BILINEARITY

Proposition 6: If V and W are linear spaces, then

Lin(V,W) = Lspw ⊗ λ | w ∈ W , λ ∈ V∗. (25.15)

Notes 25

(1) The term “dyadic product” and the notation wλ is often used in the older literaturefor our “tensor product” w⊗λ. We cannot use this older notation because it wouldlead to a clash with the evaluation notation described by (22.2).

(2) For other uses of the term “tensor product” see Note (1) to Sect. 26.

26 The Trace

Let V and W be linear spaces. Since the mapping (25.3) is bilinear, it followsfrom Prop.3 of Sect. 24 that for every linear form Ω on Lin(V,W)

((w, λ) 7→ Ω(w ⊗ λ)) : W ×V∗ −→ R

is a bilinear mapping Using the identifications Lin2(W × V∗, R) ∼=Lin(W ,V∗∗) ∼= Lin(W ,V) (see Sects.24 and 22) we see that there is a map-ping

τ : (Lin(V,W))∗ −→ Lin(W ,V) (26.1)

defined by

λ(τ(Ω)w) = Ω(w ⊗ λ) for all Ω ∈ (Lin(V,W))∗, w ∈ W , λ ∈ V∗.(26.2)

Lemma: The mapping (26.1) defined by (26.2) is a linear isomorphism.

Proof: The linearity of τ follows from the fact that every member ofLin(V,W), and in particular w ⊗ λ, can be identified with an element of(Lin(V,W))∗∗, i.e. a linear form on (Lin(V,W))∗.

If Ω ∈ Null τ then τ(Ω) = 0 and hence, by (26.2), Ω(w ⊗ λ) = 0 forall w ∈ W and all λ ∈ V∗. By Prop.6 of Sect. 25 this is possible onlywhen Ω = 0. We conclude that Null τ = 0. Since dim(Lin(V,W))∗ =(dimV)(dimW) = dim(Lin(W ,V)) (see (21.1) and (17.7) ) it follows fromthe Pigeonhole Principle for Linear Mappings (Sect. 17) that τ is invertible.

The following result shows that the algebra of lineons admits a naturallinear form.

Page 97: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

26. THE TRACE 89

Characterization of the Trace: Let V be a linear space. There is

exactly one linear form trV on LinV that satisfies

trV(v ⊗ λ) = λv for all v ∈ V, λ ∈ V∗. (26.3)

This linear form trV is called the trace for V.

Proof: Assume that trV ∈ (LinV)∗ satisfies (26.3). Using (26.2) withW := V and the choice Ω := trV we see that we must have τ(trV) = 1V . Bythe Lemma, it follows that trV is uniquely determined as trV = τ−1(1V). Onthe other hand, if we define trV := τ−1(1V) then (26.3) follows from (26.2).

If the context makes clear what V is, we often write tr for trV .Using (26.3), (25.9), and (25.10), it is easily seen that the definition

(26.2) of the mapping τ is equivalent to the statement that

ΩL = trV(τ(Ω)L) = trW(Lτ(Ω)) (26.4)

holds for all Ω ∈ (Lin(V,W))∗ and all L ∈ Lin(V,W) that are tensor prod-ucts, i.e. of the form L = w ⊗ λ for some w ∈ W , λ ∈ V∗. Since thesetensor products span Lin(V,W) (Prop.6 of Sect. 25), it follows that themapping τ can be characterized by the statement that (26.4) holds for allL ∈ Lin(V,W), whether L is a tensor product or not. Using this fact andthe Lemma we obtain the following two results.

Representation Theorem for Linear Forms on a Space of Lin-ear Mappings: Let V and W be linear spaces. Every linear form Ω on

Lin(V,W) is represented by a unique M ∈ Lin(W ,V) in the sense that

ΩL = trV(ML) for all L ∈ Lin(V,W). (26.5)

Proposition 1: For every L ∈ Lin(V,W) and every M ∈ Lin(W ,V) we

have

trV(ML) = trW(LM). (26.6)

Now let a linear space V be given.Proposition 2: For every L ∈ LinV we have

trV∗L⊤ = trVL. (26.7)

Proof: By the Theorem on Characterization of Trace it suffices to showthat the mapping L 7→ trV∗L⊤ from LinV into R is linear and has the valueλv when L = v ⊗ λ. The first is immediate and the second follows from(25.6) and (26.3), applied to the case when V is replaced by V∗.

Page 98: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

90 CHAPTER 2. DUALITY, BILINEARITY

Proposition 3: Let b := (bi | i ∈ I) be a basis of V. For every lineon

L ∈ LinV, we have

trVL =∑

i∈I

([L]b)i,i, (26.8)

where [L]b := lnc−1b Llncb is the matrix of L relative to b.

Proof: It follows from (26.3) and Prop.4 of Sect. 23 that

trV(bj ⊗ b∗i ) = b∗

i bj = δi,j for all i, j ∈ I.

Therefore, (26.8) is a consequence of (25.13) and the linearity of trV .If we apply (26.8) to the case when L := 1V and hence ([L])b)i,i = 1 for

all i ∈ I (see (18.8)), we obtain

trV1V = dimV. (26.9)

Proposition 4: Let U be a subspace of V and let P : V → U be a

projection (see Sect. 19). Then

trU (K) = trV(1U⊂VKP) (26.10)

for all K ∈ LinU .Proof: It follows from Prop.1 that

trV((1U⊂VK)P) = trU(P(1U⊂VK)) = trU (P|UK)

for all K ∈ LinU . By the definition of a projection (Sect. 19) we haveP|U = 1U and hence (26.10) holds.

Proposition 5: The trace of an idempotent lineon E on V is given by

trVE = dimRng E (26.11)

Proof: Put U := RngE and P := E|U . We then have E = 1U⊂VP and,by Prop.1 of Sect. 19, P : V → U is a projection. Applying (26.10) tothe case when K := 1U we obtain trU(1U) = trV(E). The desired formula(26.11) now follows from (26.9).

Proposition 6: Let V1, V2, and W be linear spaces. There is exactly

one mapping

Λ : Lin2(V1 × V2,W) −→ Lin(Lin(V∗2 ,V1),W)

such that

B(v1,v2) = Λ(B)(v1 ⊗ v2) for all v1 ∈ V1, v2 ∈ V2 (26.12)

Page 99: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

26. THE TRACE 91

and all B ∈ Lin2(V1 × V2,W). Moreover, Λ is a linear isomorphism.

Proof: For every B ∈ Lin2(V1 × V2,W) and µ ∈ W∗ we have µB ∈ Lin2(V1 × V2, R) ≅ Lin(V1,V2∗) (see Prop.3 of Sect. 24). Using the identification V2∗∗ ≅ V2, we see that (26.12) holds if and only if

v2((µB)v1) = (µB)(v1,v2) = (µΛ(B))(v1 ⊗ v2) (26.13)

for all v1 ∈ V1, v2 ∈ V2 and all µ ∈ W∗. Comparing (26.13) with (26.2), we conclude that (26.12) is equivalent to

τ(µΛ(B)) = µB for all µ ∈ W∗,

where τ is defined according to (26.1) and (26.2), in which V and W must be replaced by V1 and V2∗, respectively. Since τ is invertible by the Lemma, we conclude that (26.12) is equivalent to

µΛ(B) = τ−1(µB) for all µ ∈ W∗. (26.14)

Since W is isomorphic to W∗∗, the uniqueness and existence of Λ follow from the equivalence of (26.14) with (26.12). The linearity of Λ follows from the linearity of τ−1. Also, it is clear from (26.14) that Λ has an inverse Λ−1, which is characterized by

µΛ−1(Φ) = τ(µΦ) for all µ ∈ W∗ (26.15)

and all Φ ∈ Lin(Lin(V2∗,V1),W).

Remark: Prop.6 shows that every bilinear mapping B on V1 × V2 is the composite of the tensor-product mapping from V1 × V2 into Lin(V2∗,V1) with a linear mapping Λ(B). In a setting more abstract than the one used here, the term tensor product mapping is often employed for any bilinear mapping ⊗ on V1 × V2 with the following universal factorization property: every bilinear mapping on V1 × V2 is the composite of ⊗ with a unique linear mapping. The codomain of ⊗ is then called a tensor product space and is denoted by V1 ⊗ V2. For the specific tensor product used here, we have V1 ⊗ V2 := Lin(V2∗,V1).

Notes 26

(1) There are many ways of constructing a tensor-product space in the sense of the Remark above from given linear spaces V1 and V2. The notation V1 ⊗ V2 for such a space is therefore ambiguous and we will not use it. One can associate with the construction of tensor-product spaces a concept of “tensor product” of linear mappings which generalizes the concept of tensor product introduced in Sect. 25. Tensor-product spaces are of little practical value in the present context, even though they give important insights in abstract algebra.


27 Bilinear Forms and Quadratic Forms

We assume that a linear space V is given. The bilinear mappings from V2 to R, i.e. the members of Lin2(V2, R), are called bilinear forms on V. The identification Lin2(V2, R) ≅ Lin(V,V∗)—a special case of (24.1)—enables us to interpret bilinear forms as mappings from V to its dual V∗. We use the notation Sym(V,V∗) and Skew(V,V∗) for the subspaces of Lin(V,V∗) that correspond to the subspaces Sym2(V2, R) and Skew2(V2, R) of Lin2(V2, R) (see Def.2 in Sect. 24). In view of (24.10), we have

Sym(V,V∗) = {S ∈ Lin(V,V∗) | S⊤ = S}, (27.1)

Skew(V,V∗) = {A ∈ Lin(V,V∗) | A⊤ = −A}. (27.2)

By Prop.7 of Sect. 24, Sym(V,V∗) and Skew(V,V∗) are supplementary subspaces of Lin(V,V∗).

Let b := (bi | i ∈ I) be a basis of V and let B ∈ Lin2(V2, R) ≅ Lin(V,V∗) be a bilinear form. The matrix M ∈ RI×I defined by

Mj,i := B(bi,bj) for all i, j ∈ I (27.3)

is called the matrix of B relative to b. It is easily seen that M coincides with the matrix of B, when regarded as an element of Lin(V,V∗), relative to the bases b and b∗ in the sense of the definition in Sect. 16. Using the fact that b∗∗ = b (see (23.17)), it follows from Prop.4 of Sect. 25 that M is also the matrix of the components of B relative to the basis (b∗i ⊗ b∗j | (i, j) ∈ I × I) of Lin2(V2, R) ≅ Lin(V,V∗), i.e. that

B = ∑_{(i,j)∈I×I} Mi,j (b∗i ⊗ b∗j) (27.4)

holds if and only if M is given by (27.3). It is clear from (27.3) that B is symmetric, i.e. B⊤ = B∼ = B, if and only if

Mi,j = Mj,i for all i, j ∈ I, (27.5)

and that B is skew, i.e. B⊤ = B∼ = −B, if and only if

Mi,j = −Mj,i for all i, j ∈ I. (27.6)

Proposition 1: If n := dimV then

dim Sym2(V2, R) = dim Sym(V,V∗) = n(n + 1)/2 (27.7)

and


dim Skew2(V2, R) = dim Skew(V,V∗) = n(n − 1)/2. (27.8)

Proof: We choose a list basis b := (bi | i ∈ n]) of V. Let A ∈ Skew2(V2, R) be given. We claim that

A = ∑(Ki,j (b∗i ⊗ b∗j − b∗j ⊗ b∗i) | (i, j) ∈ n] × n], i < j) (27.9)

holds if and only if

Ki,j = A(bi,bj) for all (i, j) ∈ n] × n] with i < j. (27.10)

Indeed, since (b∗i ⊗ b∗j | (i, j) ∈ n] × n]) is a basis of Lin2(V2, R), we find, by comparing (27.9) with (27.4) and (27.10) with (27.3), that (27.9) can hold only if (27.10) holds. On the other hand, if we define (Ki,j | (i, j) ∈ n] × n], i < j) by (27.10), it follows from (27.6) that the matrix M of A is given by

Mi,j = { Ki,j if i < j; −Kj,i if j < i; 0 if i = j } for all i, j ∈ n].

Therefore, (27.4) reduces to (27.9).

It follows from (25.6) that the terms of the family

b∗ ∧ b∗ := (b∗i ⊗ b∗j − b∗j ⊗ b∗i | (i, j) ∈ n] × n], i < j) (27.11)

all belong to Skew2(V2, R). Therefore, the equivalence of (27.9) and (27.10) proves that the family (27.11) is a basis of Skew2(V2, R). This basis has ♯{(i, j) ∈ n] × n] | i < j} = n(n−1)/2 terms, and hence (27.8) holds.

Since Sym2(V2, R) is a supplement of Skew2(V2, R) in Lin2(V2, R) and since dim Lin2(V2, R) = n2 by (24.3), we can use Prop.5 of Sect. 17 and (27.8) to obtain

dim Sym2(V2, R) = dim Lin2(V2, R) − dim Skew2(V2, R) = n2 − n(n − 1)/2 = n(n + 1)/2.
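The dimension count of Prop.1 can be made concrete in coordinates: every matrix splits uniquely into a symmetric and a skew part. A brief sketch (Python with numpy; sizes chosen arbitrarily):

    import numpy as np

    n = 4
    rng = np.random.default_rng(3)
    M = rng.normal(size=(n, n))
    S, A = (M + M.T) / 2, (M - M.T) / 2      # the unique symmetric/skew decomposition
    assert np.allclose(S + A, M) and np.allclose(S, S.T) and np.allclose(A, -A.T)

    print(n * (n + 1) // 2, n * (n - 1) // 2)   # 10 and 6: the dimensions (27.7), (27.8)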

We note that if S ∈ Sym2(V2, R), then S ∘ (1V,1V) = (u ↦ S(u,u)) is a real-valued function on V (see Sect. 04). Thus, one can consider the mapping

(S ↦ S ∘ (1V,1V)) : Sym2(V2, R) → Map(V, R). (27.12)


Proposition 2: The mapping (27.12) is linear and injective.

Proof: The linearity follows from Prop.1 of Sect. 14. To show that (27.12) is injective, it suffices to show that its nullspace contains only the zero of Sym2(V2, R). If S belongs to this nullspace, then S ∘ (1V, 1V) = 0, i.e. S(u,u) = 0 for all u ∈ V. By Props.5 and 7 of Sect. 24 we conclude that S must also be skew and hence that S = 0.

Definition 1: The range space of the mapping (27.12) will be denoted by Qu(V). Its members are called quadratic forms on V. We write

S˜ := S ∘ (1V,1V) when S ∈ Sym2(V2, R), (27.13)

and the inverse of the linear isomorphism

(S ↦ S˜) : Sym2(V2, R) −→ Qu(V)

will be denoted by

(Q ↦ Q˜) : Qu(V) → Sym2(V2, R) ≅ Sym(V,V∗),

so that Q = Q˜ ∘ (1V,1V), i.e.,

Q(u) = Q˜(u,u) = (Q˜u)u for all u ∈ V. (27.14)

We say that Q ∈ Qu(V) is non-degenerate if Q˜ ∈ Sym(V,V∗) is injective, positive [negative] if Rng Q ⊂ P [Rng Q ⊂ −P], and strictly positive [strictly negative] if Q>(V×) ⊂ P× [Q>(V×) ⊂ −P×]. We say that Q is single-signed if it is either positive or negative, and double-signed otherwise. The same adjectives will be used for the corresponding bilinear form Q˜.

Example: Let V be a linear space. The mapping

(L ↦ trV(L2)) : LinV −→ R (27.15)

is a quadratic form on the algebra of lineons LinV, i.e. a member of Qu(LinV). The corresponding element of Sym2((LinV)2, R) is given by (L,M) ↦ trV(LM). It follows from the Representation Theorem for Linear Forms on LinV (Sect. 26) that the quadratic form (27.15) is non-degenerate.

Using the symmetry and bilinearity of Q˜, one obtains the formulas

Q(ξu) = ξ2Q(u) for all ξ ∈ R, u ∈ V, (27.16)

Q(u + v) = Q(u) + 2Q˜(u,v) + Q(v) for all u,v ∈ V. (27.17)


As a consequence of (27.17), we find

Q˜(u,v) = (1/2)(Q(u + v) − Q(u) − Q(v))
        = (1/2)(Q(u) + Q(v) − Q(u − v)) (27.18)
        = (1/4)(Q(u + v) − Q(u − v)),

which give various ways of expressing Q˜ explicitly in terms of Q.
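The identities (27.18) can be tested on the Example above. In the sketch below (Python with numpy; an illustration only), the first formula in (27.18) recovers the bilinear form (L,M) ↦ trV(LM) from the quadratic form (27.15):

    import numpy as np

    rng = np.random.default_rng(4)
    L, M = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

    Q = lambda A: np.trace(A @ A)            # the quadratic form (27.15)

    lhs = np.trace(L @ M)                    # the bilinear form evaluated at (L, M)
    rhs = (Q(L + M) - Q(L) - Q(M)) / 2       # first formula in (27.18)
    print(np.isclose(lhs, rhs))              # True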

The following result follows from Prop.4 of Sect. 24, from (24.11), and from Prop.1 of Sect. 14.

Proposition 3: Let V and W be linear spaces and let L ∈ Lin(V,W). For every Q ∈ Qu(W) we have Q ∘ L ∈ Qu(V) and

(Q ∘ L)˜ = Q˜ ∘ (L × L) = L⊤Q˜L. (27.19)

The mapping

(Q ↦ Q ∘ L) : Qu(W) → Qu(V) (27.20)

is linear; it is invertible if and only if L is invertible.

If U is a subspace of V and Q ∈ Qu(V), then Q|U ∈ Qu(U) and (Q|U)˜ = Q˜|U×U. If Q is positive, strictly positive, negative, or strictly negative, so is Q|U. However, Q|U need not be non-degenerate even if Q is.

Proposition 4: A positive [negative] quadratic form is strictly positive [strictly negative] if and only if it is non-degenerate.

Proof: Let Q ∈ Qu(V) and u ∈ Null Q˜, so that Q˜u = 0. Then

0 = (Q˜u)u = Q˜(u,u) = Q(u). (27.21)

Now, if Q is strictly positive then (27.21) can hold only if u = 0, which implies that Null Q˜ = {0} and hence that Q˜ is injective, i.e. that Q is positive and non-degenerate. Assume that Q is positive and non-degenerate and that Q(v) = 0 for a given v ∈ V. Using (27.18)2 we find that

ξQ˜(u,v) = Q˜(u, ξv) = (1/2)(Q(u) + Q(ξv) − Q(u − ξv)),

and hence, since Q(u − ξv) ≥ 0 and Q(ξv) = ξ2Q(v) = 0,

ξQ˜(u,v) ≤ (1/2)Q(u)

for all u ∈ V and all ξ ∈ R×. It is easily seen that this is possible only when 0 = Q˜(u,v) = (Q˜v)u for all u ∈ V, i.e. only if Q˜v = 0. Since Q˜ is injective,


it follows that v = 0. We have shown that Q(v) = 0 implies v = 0, and hence, since Q was assumed to be positive, that Q is strictly positive.

Notes 27

(1) Many people use the clumsy terms “positive semidefinite” or “non-negative” when we speak of a “positive” quadratic or bilinear form. They then use “positive definite” or “positive” when we use “strictly positive”.

(2) The terms “single-signed” and “double-signed” for quadratic and bilinear forms are used here for the first time. They are clearer than the terms “definite” and “indefinite” found in the literature, sometimes with somewhat different meanings.

28 Problems for Chapter 2

1. Let V be a linear space and let L ∈ LinV and λ, µ ∈ R with λ ≠ µ be given. Prove that

Null (L⊤ − µ1V∗) ⊂ (Null (L − λ1V))⊥. (P2.1)

2. Let n ∈ N be given and let the linear space Pn be defined as in Problem 4 in Chap.1. For each k ∈ n[, let βk : Pn → R be defined by

βk(f) := f (k)(0) for all f ∈ Pn, (P2.2)

where f (k) denotes the k’th derivative of f (see Sect. 08). Note that, for each k ∈ n[, βk is linear, so that βk ∈ Pn∗.

(a) Determine a list h := (hk | k ∈ n[) in Pn such that

βkhj = δk,j for all k, j ∈ n[. (P2.3)

(b) Show that the list h determined in (a) is a basis of Pn and that the dual of h is given by

h∗ = β := (βk | k ∈ n[). (P2.4)

(Hint: Apply Props.4 and 5 of Sect. 23).


For each t ∈ R, let the evaluation evt : Pn → R be defined by

evt(f) := f(t) for all f ∈ Pn (P2.5)

(see Sect. 04) and note that evt is linear, so that evt ∈ Pn∗.

(c) Let t ∈ R be given. Determine λ := lncβ−1 evt ∈ Rn[, where β is the basis of Pn∗ defined by (P2.4) and (P2.2), so that

evt = lncβλ = ∑_{k∈n[} λkβk. (P2.6)

3. Let n ∈ N be given and let the linear space Pn be defined as in Problem 4 of Chap.1. Also, let a subset F of R with n elements be given, so that n = ♯F.

(a) Determine a family g := (gs | s ∈ F ) in Pn such that

evt(gs) = δt,s for all t, s ∈ F, (P2.7)

where evt ∈ Pn∗ is the evaluation given by (P2.5).

(b) Show that the family g determined in (a) is a basis of Pn and that the dual of g is given by

g∗ = (evt | t ∈ F ). (P2.8)

(Hint: Apply Props.4 and 5 of Sect. 23).

4. Let V be a linear space. Let the mappings Λ and Υ from LinV × LinV to LinV be defined by

Λ(L,M) := LM −ML (P2.9)

and

Υ(L,M) := LM + ML (P2.10)

for all L,M ∈ LinV.

(a) Show that Λ is bilinear and skew, so that Λ ∈ Skew2((LinV)2, LinV). Also, show that


∑_{γ∈C3} Λ(Λ(Lγ(1),Lγ(2)),Lγ(3)) = 0 (P2.11)

for every triple (L1,L2,L3) in LinV, where C3 denotes the set of cyclic permutations of 3], i.e.

C3 := { 13], (1 2 3 ↦ 2 3 1), (1 2 3 ↦ 3 1 2) }.

(b) Show that Υ is bilinear and symmetric, so that Υ ∈ Sym2((LinV)2, LinV). Also, show that

Υ(L, Υ(Υ(L,L),M)) = Υ(Υ(L,L), Υ(L,M)) (P2.12)

for all L,M ∈ LinV.

Remark: The mapping Λ endows LinV with the structure of a “Lie-algebra” and the mapping Υ endows LinV with the structure of a “Jordan-algebra”. The theory of these is an important topic in abstract algebra.
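A numerical spot-check of the cyclic identity (P2.11)—a sketch, not the requested proof (Python with numpy; matrices chosen at random):

    import numpy as np

    rng = np.random.default_rng(5)
    L1, L2, L3 = (rng.normal(size=(3, 3)) for _ in range(3))

    lam = lambda L, M: L @ M - M @ L         # the commutator of (P2.9)

    J = lam(lam(L1, L2), L3) + lam(lam(L2, L3), L1) + lam(lam(L3, L1), L2)
    print(np.allclose(J, 0))                 # True: the cyclic sum vanishes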

5. Let D, M ∈ Lin C∞(R) and Pn, with n ∈ N, be defined as in Problems 4 and 5 of Chap.1.

(a) Show that Pn is a D-subspace, an (MD)-subspace, and a (DM)-subspace of C∞(R). Calculate trD|Pn, tr(MD)|Pn, and tr(DM)|Pn.

(b) Prove: If V is a linear space of finite and non-zero dimension, there do not exist L,K ∈ LinV such that LK − KL = 1V. (Compare this assertion with the one of Problem 5 of Chap.1.)

6. Let V be a linear space.

(a) Prove that Ω ∈ (LinV)∗ satisfies

Ω(LM) = Ω(ML) for all L,M ∈ LinV (P2.13)

if and only if Ω = ξtrV for some ξ ∈ R.

(Hint: Consider the case when L and M in (P2.13) are tensor products.)

(b) Consider the left-multiplication mapping LeL ∈ Lin(LinV) defined by (P1.1) for each L ∈ LinV. Prove that

trLinV(LeL) = (dimV)trVL for all L ∈ LinV (P2.14)


(Hint: Use part (a)).

7. Let U1 and U2 be supplementary subspaces of the given linear space V and let P1, P2 be the projections associated with U1, U2 according to Prop.4 of Sect. 19. Prove that, for every L ∈ LinV, we have

trVL = trU1(P1L|U1) + trU2(P2L|U2). (P2.15)

(Hint: Apply Prop.1 of Sect. 26 and Prop.5 of Sect. 19.)

8. Let V be a linear space and put n := dimV.

(a) Given a list basis b := (bi | i ∈ n]) of V, construct a basis of Sym2(V2, R) ≅ Sym(V,V∗) by a procedure analogous to the one described in the proof of Prop.1 of Sect. 27.

(b) Show that, for every λ ∈ V∗, the function λ2 : V → R obtained from λ by value-wise squaring (i.e. by λ2(v) := (λv)2 for all v ∈ V) is a quadratic form on V, i.e. that λ2 ∈ Qu(V). (See Def.1 in Sect. 27.)

(c) Prove that

Lsp{λ2 | λ ∈ V∗} = Qu(V). (P2.16)


Chapter 3

Flat Spaces

In this chapter, the term “linear space” will be used as a shorthand for “linear space over the real field R”. (Actually, many definitions remain meaningful and many results remain valid when R is replaced by an arbitrary field. The interested reader will be able to decide for himself when this is the case.)

31 Actions of Groups

Let E be a set and consider the permutation group Perm E, which consists of all invertible mappings from E onto itself. Perm E is a group under composition, and the identity mapping 1E is the neutral of this group (see Sect. 06).

Definition 1: Let E be a non-empty set and G a group. By an action of G on E we mean a group homomorphism τ : G → Perm E.

We write τg for the value of τ at g ∈ G. To say that τ is an action of G on E means that τcmb(g,h) = τg ∘ τh for all g, h ∈ G, where cmb denotes the combination of G (see Sect. 06). If n is the neutral of G, then τn = 1E, and if rev(g) is the group-reverse of g ∈ G, then τrev(g) = (τg)←.

Examples:

1. If G is any subgroup of Perm E, then the inclusion mapping 1G⊂Perm E is an action of G on E. For subgroups of Perm E it is always understood that the action is this inclusion.

2. If E is a set with a prescribed structure, then the automorphisms of E form a subgroup of Perm E which acts on E by inclusion in Perm E as explained in Example 1. For instance, if V is a linear space, then the lineon group LisV acts on V.

3. If V is a linear space, define τ : V → PermV by letting τv be the operation of adding v ∈ V, so that τv(u) := u + v for all u ∈ V. It is easily seen that τ is a homomorphism of the additive group of V into the permutation group PermV, i.e., τ is an action of V on itself.

4. If V is a linear space, define mult : P× → PermV by letting multλ := λ1V be the operation of taking the λ-multiple, so that multλ(v) = λv for all v ∈ V. It is evident that mult is a homomorphism of the multiplicative group P× into PermV and hence an action of P× on V. mult>(P×) is a subgroup of LisV that is isomorphic to P×; namely, mult>(P×) = P×1V.
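Example 3 can be made concrete in coordinates. A minimal sketch (Python with numpy; an illustration only) represents each translation τv as a function and checks the homomorphism property τv+w = τv ∘ τw:

    import numpy as np

    def tau(v):                              # the translation tau_v of Example 3
        return lambda u: u + v

    v, w, u = np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 4.0])
    # Homomorphism property: composing translations adds their vectors.
    print(np.allclose(tau(v + w)(u), tau(v)(tau(w)(u))))    # True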

Let τ be an action of G on E . We define a relation ∼τ on E by

x ∼τ y :⇐⇒ y = τg(x) for some g ∈ G. (31.1)

It is easily seen that ∼τ is an equivalence relation. The equivalence classes are called the orbits in E under the action τ of G. If x ∈ E, then {τg(x) | g ∈ G} is the orbit to which x belongs, also called the orbit of x.

Definition 2: An action τ of a group G on a non-empty set E is said to be transitive if for all x, y ∈ E there is g ∈ G such that τg(x) = y. The action is said to be free if τg, with g ∈ G, can have a fixed point (see Sect.03) only if g = n, the neutral of G.

To say that an action is transitive means that all of E is the only orbit under the action. Of course, since τn = 1E, all points in E are fixed points of τn.

If the action τ of G on E is free, one easily sees that τ : G → Perm E must be injective, and hence that τ>(G) is a subgroup of Perm E that is an isomorphic image of G.

Proposition 1: An action τ of a group G on a non-empty set E is both transitive and free if and only if for all x, y ∈ E there is exactly one g ∈ G such that τg(x) = y.

Proof: Assume the action is both transitive and free and that x, y ∈ E are given. If τg(x) = y and τh(x) = y, then x = (τg)←(y) = τrev(g)(y) = τrev(g)(τh(x)) = (τrev(g) ∘ τh)(x) = τcmb(rev(g),h)(x). Since the action is free, it follows that cmb(rev(g), h) = n and hence g = h. Thus there can be at most one g ∈ G such that τg(x) = y. The transitivity of the action assures the existence of such g.


Assume now that the condition is satisfied. The action is then transitive. Let τg(x) = x for some x ∈ E. Since τn(x) = 1E(x) = x, it follows from the uniqueness that g = n. Hence the action is free.

32 Flat Spaces and Flats

Definition 1: A flat space is a non-empty set E endowed with structure by the prescription of

(i) a commutative subgroup V of Perm E whose action is transitive,

(ii) a mapping sm : R × V → V which makes V a linear space when composition is taken as the addition and sm as the scalar multiplication in V.

The linear space V is then called the translation space of E.

The elements of E are called points and the elements of V translations or vectors. If ξ ∈ R and v ∈ V we write, as usual, ξv for sm(ξ,v). In V, we use additive notation for composition, i.e., we write u + v for v ∘ u, −v for v←, and 0 for 1E.

Proposition 1: The action of the translation space V on the flat space E is free.

Proof: Suppose that v ∈ V satisfies v(x) = x for some x ∈ E. Let y ∈ E be given. Since the action of V is transitive, we may choose u ∈ V such that y = u(x). Using the commutativity of the group V, we find that

v(y) = v(u(x)) = (v ∘ u)(x) = (u ∘ v)(x) = u(v(x)) = u(x) = y.

Since y ∈ E was arbitrary, it follows that v = 1E.

This Prop.1 and Prop.1 of Sect.31 have the following immediate consequence:

Proposition 2: There is exactly one mapping

diff : E × E −→ V

such that

(diff(x, y))(y) = x for all x, y ∈ E . (32.1)

We call diff(x, y) ∈ V the point-difference between x and y and use

the following simplified notations:

x − y := diff(x, y) when x, y ∈ E,
x + v := v(x) when x ∈ E, v ∈ V.

Page 111: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

104 CHAPTER 3. FLAT SPACES

The following rules are valid for all x, y, z ∈ E and for all u,v ∈ V.

x − x = 0,

x − y = −(y − x),

(x − y) + (y − z) = x − z,

(x + u) + v = x + (u + v),

x + v = y ⇐⇒ v = y − x.
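These rules can be encoded directly. The sketch below (Python with numpy; a toy model, not part of the formal development) keeps points and translations as distinct types, so that only the listed combinations are available:

    import numpy as np

    class Point:
        def __init__(self, c): self.c = np.asarray(c, dtype=float)
        def __sub__(self, other): return self.c - other.c   # point − point = vector
        def __add__(self, v): return Point(self.c + v)      # point + vector = point

    x, y, z = Point([0, 1]), Point([2, 3]), Point([5, 8])
    print(np.allclose((x - y) + (y - z), x - z))     # (x − y) + (y − z) = x − z
    print(np.allclose((x + (y - x)).c, y.c))         # x + v = y with v = y − x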

In short, an equation involving points and vectors is valid if it is valid according to the rules of ordinary algebra, provided that all expressions occurring in the equation make sense. Of course, an expression such as x + y does not make sense when x, y ∈ E.

The following notations, which are similar to notations introduced in Sect.06, are very suggestive and useful. They apply when x ∈ E, v ∈ V, H, Y ∈ Sub E, S ∈ SubV.

x + S := {x + s | s ∈ S},

H + v := v>(H) = {x + v | x ∈ H},

H + S := ⋃_{s∈S} s>(H) = {x + s | x ∈ H, s ∈ S},

H − Y := {x − y | x ∈ H, y ∈ Y}.

The first three of the above are subsets of E, the last is a subset of V. We have V = E − E and we sometimes use E − E as a notation for the translation space of E.

Flat spaces serve as mathematical models for physical planes and spaces. The vectors, i.e. the mappings in V (except the identity mapping 0 = 1E), are to be interpreted as parallel shifts (hence the term “translation”). If a vector v ∈ V is given, we can connect points x, y, . . . with their images x + v, y + v, . . . by drawing arrows as shown in Figure 1. In this sense, vectors can be represented pictorially by arrows. The commutativity of V is illustrated in Figure 2. The assumption that the action of V is transitive corresponds to the fact that, given any two points x and y, there exists a vector that carries x to y.


It often happens that a set E, a linear space V, and an action of the additive group of V on E are given. If this action is transitive and injective, then E acquires the structure of a flat space whose translation space is the isomorphic image of V in Perm E under the given action. Under such circumstances, we identify V with its isomorphic image and, using poetic license, call V itself the translation space of E. If we wish to emphasize the fact that V becomes the translation space only after such an identification, we say that V acts as an external translation space of E.

Let V be a linear space. The action τ : V → PermV described in Example 3 of the previous section is easily seen to be transitive and injective. Hence we have the following trivial but important result:

Proposition 3: Every linear space V has the natural structure of a flat space. The space V becomes its own external translation space by associating with each v ∈ V the mapping u ↦ u + v from V to itself.

The linear-space structure of V embodies more information than its flat-space structure. As a linear space, V has a distinguished element, 0, but as a flat space it is homogeneous in the sense that all of its elements are of equal standing. Roughly, the flat structure of V is obtained from its linear structure by forgetting where 0 is.

Note that the operations involving points and vectors introduced earlier, when applied to the flat structure of a linear space, reduce to ordinary linear-space operations. For example, point-differences reduce to ordinary differences.

Of course, the set R of reals has the natural structure of a flat space whose (external) translation space is R itself, regarded as a linear space.

We now assume that a flat space E with translation space V is given. A subset F of E will inherit from E the structure of a flat space if the translations of E that leave F invariant can serve, after adjustment (see Sect.03), as translations of F. The precise definition is this:


Definition 2: A non-empty subset F of E is called a flat subspace of E, or simply a flat in E, if the set

U := {u ∈ V | F + u ⊂ F} (32.2)

is a linear subspace of V whose additive group acts transitively on F under the action which associates with every u ∈ U the mapping u|F of F onto itself.

The action of U on F described in this definition endows F with the natural structure of a flat space whose (external) translation space is U. We say that U is the direction space of F. The following result is immediate from Def.2.

Proposition 4: Let U be a subspace of V. A non-empty subset F of E is a flat with direction space U if and only if

F − F ⊂ U and F + U ⊂ F . (32.3)

Let U be a subspace of V. The inclusion of the additive group of U in the group Perm E is an action of U on E. This action is transitive only when U = V. The orbits in E under the action of U are exactly the flats with direction space U. Since x + U is the orbit of x ∈ E we obtain:

Proposition 5: Let U be a subspace of V. A subset F of E is a flat with direction space U if and only if it is of the form

F = x + U (32.4)

for some x ∈ E.

We say that two flats are parallel if the direction space of one of them is included in the direction space of the other. For example, two flats with the same direction space are parallel; if they are distinct, then their intersection is empty.

The following result is an immediate consequence of Prop.4.

Proposition 6: The intersection of any collection of flats in E is either empty or a flat in E. If it is not empty, then the direction space of the intersection is the intersection of the direction spaces of the members of the collection.

In view of the remarks on span-mappings made in Sect.03, we have the following result.

Proposition 7: Given any non-empty subset S of E, there is a unique smallest flat that includes S. More precisely: There is a unique flat that includes S and is included in every flat that includes S. It is called the flat span of S and is denoted by FspS. We have FspS = S if and only if S is a flat.

If x and y are two distinct points in E, then

←→xy := Fsp{x, y} (32.5)

is called the line passing through x and y.

It is easily seen that, if S ⊂ E and if q ∈ S, then Lsp(S − q) is the direction space of FspS and hence, by Prop.5,

FspS = q + Lsp(S − q). (32.6)

Let E1, E2 be flat spaces with translation spaces V1, V2. The set-product E1 × E2 then has the natural structure of a flat space as follows: The (external) translation space of E1 × E2 is the product space V1 × V2. The action of V1 × V2 on E1 × E2 is defined by associating with each (v1,v2) ∈ V1 × V2 the mapping (x1, x2) ↦ (x1 + v1, x2 + v2) of E1 × E2 onto itself. More generally, if (Ei | i ∈ I) is any family of flat spaces, then ×(Ei | i ∈ I) has the natural structure of a flat space whose (external) translation space is the product-space ×(Vi | i ∈ I), where Vi is the translation space of Ei for each i ∈ I. The action of ×(Vi | i ∈ I) on ×(Ei | i ∈ I) is defined by

(xi | i ∈ I) + (vi | i ∈ I) := (xi + vi | i ∈ I). (32.7)

We say that a flat space E is finite-dimensional if its translation space is finite-dimensional. If this is the case, we define the dimension of E by

dim E := dim (E − E). (32.8)

Let E be any flat space. The only flat in E whose direction space is V := E − E is E itself. The singleton subsets of E are the zero-dimensional flats; their direction space is the zero-subspace {0} of V. The one-dimensional flats are called lines, the two-dimensional flats are called planes, and, if n := dim E ∈ N×, then the (n−1)-dimensional flats are called hyperplanes. The only n-dimensional flat is E itself.

Let V be a linear space, which is its own translation space when regarded as a flat space (see Prop.3). A subset U of V is then a subspace of V if and only if it is a flat in V and contains 0.


Notes 32

(1) The traditional term for our “flat space” is “affine space”. I believe that the term “flat space” is closer to the intuitive content of the concept.

(2) A fairly rigorous definition of the concept of a flat space, close to the one given here, was given 65 years ago by H. Weyl (Raum, Zeit, Materie; Springer, 1918). After all these years, the concept is still not used as much as it should be. One finds explanations (often bad ones) of the concept in some recent geometry and abstract algebra textbooks, but the concept is almost never introduced in analysis. One of my reasons for writing this book is to remedy this situation, for I believe that flat spaces furnish the most appropriate conceptual background for analysis.

(3) Strictly speaking, the term “point” for an element of a flat space and the term “vector” for an element of its translation space are appropriate only if the flat space is used as a primitive structure to describe some geometrical-physical reality. Sometimes a flat space may come up as a derived structure, and then the terms “point” and “vector” may not be appropriate to physical reality. Nevertheless, we will use these terms when dealing with the general theory.

(4) What we call a “flat” or a “flat subspace” is sometimes called a “linear manifold”, “translated subspace”, or “affine subset”, especially when it is a subset of a linear space.

(5) What we call “flat span” is sometimes called “affine hull” and the notation affS is then used instead of FspS.

33 Flat Mappings

Let E, E′ be flat spaces with translation spaces V, V′. We say that a mapping α : E → E′ is flat if, roughly, it preserves translations and scalar multiples of translations. This means that if any two points in E are related by a given translation v, their α-values must be related by a corresponding translation v′, and if v is replaced by ξv then v′ can be replaced by ξv′. The precise statement is this:

Definition 1: A mapping α : E → E′ is called a flat mapping if for every v ∈ V there is a v′ ∈ V′ such that

α ∘ (ξv) = (ξv′) ∘ α for all ξ ∈ R. (33.1)

If we evaluate (33.1) at x ∈ E and take ξ := 1, we see that v′ = α(x + v) − α(x) is uniquely determined by v. Thus, the following result is immediate from the definition.

Proposition 1: A mapping α : E → E′ is flat if and only if there is a mapping ∇α : V → V′ such that

(∇α)(v) = α(x + v) − α(x) for all x ∈ E and v ∈ V, (33.2)


and

(∇α)(ξv) = ξ(∇α(v)) for all v ∈ V and ξ ∈ R. (33.3)

The mapping ∇α is uniquely determined by α and is called the gradient of α.

Note that condition (33.2) is equivalent to

α(x) − α(y) = (∇α)(x − y) for all x, y ∈ E . (33.4)

Proposition 2: The gradient ∇α of a flat mapping α is linear, i.e., ∇α ∈ Lin(V,V′).

Proof: Choose x ∈ E. Using condition (33.2) three times, we see that

(∇α)(v + u) = α(x + (v + u)) − α(x)

= (α((x + v) + u) − α(x + v)) + (α(x + v) − α(x))

= (∇α)(u) + (∇α)(v)

is valid for all u, v ∈ V. Hence ∇α preserves addition. The condition (33.3) states that ∇α preserves scalar multiplication.

Theorem on Specification of Flat Mappings: Let q ∈ E, q′ ∈ E′ and L ∈ Lin(V,V′) be given. Then there is exactly one flat mapping α : E → E′ such that α(q) = q′ and ∇α = L. It is given by

α(x) := q′ + L(x − q) for all x ∈ E. (33.5)

Proof: Using (33.4) with y replaced by q we see that (33.5) must be valid when α satisfies the required conditions. Hence α is uniquely determined. Conversely, if α : E → E′ is defined by (33.5), one easily verifies that it is flat and satisfies the conditions.
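The recipe (33.5) is directly computable in coordinates. A minimal sketch (Python with numpy; the data q, q′, and L are chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(6)
    L = rng.normal(size=(2, 3))                 # the prescribed gradient
    q, qp = np.array([1.0, 2.0, 3.0]), np.array([0.0, 5.0])

    alpha = lambda x: qp + L @ (x - q)          # the flat mapping of (33.5)

    x, y = rng.normal(size=3), rng.normal(size=3)
    print(np.allclose(alpha(x) - alpha(y), L @ (x - y)))    # (33.4) holds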

Examples:

1. All constant mappings from one flat space into another are flat. A flat mapping is constant if and only if its gradient is zero.

2. If F is a flat in E, then the inclusion 1F⊂E is flat. Its gradient is the inclusion 1U⊂V of the direction space U of F in the translation space V of E.

3. A function a : R → R is flat if and only if it is of the form a = ξι + η with ξ, η ∈ R. The gradient of a is ξι ∈ LinR, which one identifies with the number ξ ∈ R (see Sect.25). The graph of a is a straight line with slope ξ. This explains our use of the term “gradient” (gradient means slope or inclination).


4. The translations v ∈ V of a flat space E are flat mappings of E into itself. They all have the same gradient, namely 1V. In fact, a flat mapping from E into itself is a translation if and only if its gradient is 1V.

5. Let V and V′ be linear spaces, regarded as flat spaces (see Prop.3 of Sect.32). A mapping L : V → V′ is linear if and only if it is flat and preserves zero, i.e. L0 = 0′. If this is the case, then L is its own gradient.

6. Let (Ei | i ∈ I) be a family of flat spaces and let (Vi | i ∈ I) be the family of their translation spaces. Given j ∈ I, the evaluation mapping evEj : ×(Ei | i ∈ I) → Ej (see Sect.04) is a flat mapping when ×(Ei | i ∈ I) is endowed with the natural flat-space structure described in the previous section. The gradient of evEj is the evaluation mapping evVj : ×(Vi | i ∈ I) → Vj (see Sect.14).

7. If E is a flat space and V := E − E, then ((x,v) ↦ x + v) : E × V → E is a flat mapping. Its gradient is the vector-addition V × V → V.

8. The point-difference mapping defined by (32.1) is flat. Its gradient is the vector-difference mapping

((u,v) ↦ (u − v)) : V × V −→ V.

Proposition 3: Let α : E → E′ be a flat mapping, F a flat in E with direction space U, and F′ a flat in E′ with direction space U′. Then:

(i) α>(F) is a flat in E ′ with direction space (∇α)>(U).

(ii) α<(F ′) is either empty or a flat in E with direction space (∇α)<(U ′).

Proof: Using (33.4) and (33.2) one easily verifies that the conditions of Prop.4 of Sect.32 are satisfied in the situations described in (i) and (ii).

If F′ := {x′} is a singleton in Prop.3, then U′ = {0}, and (ii) states that α<({x′}) is either empty or a flat in E whose direction space is Null ∇α. If F := E, then U = V and (i) states that Rng α is a flat in E′ whose direction space is Rng ∇α. Therefore, the following result is an immediate consequence of Prop.3.

Proposition 4: A flat mapping is injective [surjective] if and only if its gradient is injective [surjective].


The following two rules describe the behavior of flat mappings with respect to composition and inversion. They follow immediately from Prop.1.

Chain Rule for Flat Mappings: The composite of two flat mappings is again flat, and the gradient of their composite is the composite of their gradients. More precisely: If E, E′, E′′ are flat spaces and α : E → E′ and β : E′ → E′′ are flat mappings, then β ∘ α : E → E′′ is again a flat mapping and

∇(β ∘ α) = (∇β)(∇α). (33.6)

Inversion Rule for Flat Mappings: A flat mapping α : E → E′ is invertible if and only if its gradient ∇α ∈ Lin(V,V′) is invertible. If this is the case, then the inverse α← : E′ → E is also flat and

∇(α←) = (∇α)−1. (33.7)

Hence every invertible flat mapping is a flat isomorphism.

The set of all flat automorphisms of a flat space E, i.e. all flat isomorphisms from E to itself, is denoted by Fis E. It is a subgroup of Perm E. Let V be the translation space of E and let h : Fis E → LisV be defined by h(α) := ∇α for all α ∈ Fis E. It follows from the Chain Rule and Inversion Rule for Flat Mappings that h is a group-homomorphism. It is easily seen that h is surjective and that the kernel of h is the translation group V (see Sect.06).

Notes 33

(1) The traditional terms for our “flat mapping” are “affine mapping” or “affine transformation”.

(2) In one recent textbook, the notation α♯ and the unfortunate term “trace” are used for what we call the “gradient” ∇α of a flat mapping α.

34 Charge Distributions, Barycenters, Mass-Points

Let E be a flat space with translation space V. By a charge distribution γ on E we mean a family in R, indexed on E with finite support, i.e., γ ∈ R(E). We may think of the term γx of γ at x as an electric charge placed at the point x. The total charge of the distribution γ is its sum (see (15.2))

sumEγ = ∑_{x∈E} γx = ∑_{x∈Supt γ} γx. (34.1)


Definition 1: We say that the charge distributions γ, β ∈ R(E) are equivalent, and we write γ ∼ β, if γ and β have the same total charge, i.e. sumEγ = sumEβ, and if

∑_{x∈E} γx(x − q) = ∑_{y∈E} βy(y − q) for all q ∈ E. (34.2)

It is easily verified that the relation ∼ thus defined is indeed an equivalence relation on R(E). Moreover, we have γ ∼ β if and only if γ − β ∼ 0. Also, one easily proves the following result:

Proposition 1: If γ, β ∈ R(E) have the same total charge, i.e. sumEγ = sumEβ, and if (34.2) holds for some q ∈ E, then γ, β are equivalent, i.e., (34.2) holds for all q ∈ E.

Given any x ∈ E we define δx ∈ R(E) by

(δx)y := { 0 if x ≠ y; 1 if x = y }; (34.3)

in words, the distribution δx places a unit charge at x and no charge anywhere else. Actually, δx is the same as the x-term δEx of the standard basis δE of R(E) as defined in Sect.16.

The following result is an immediate consequence of Prop.1.

Proposition 2: Given γ ∈ R(E), b ∈ E and σ ∈ R, we have γ ∼ σδb if and only if

σ = sumEγ and ∑_{x∈E} γx(x − b) = 0.

The next result states that every charge distribution with a non-zero total charge is equivalent to a single charge.

Theorem on Unique Existence of Barycenters: Let γ ∈ R(E) be given so that σ := sumEγ ≠ 0. Then there is exactly one b ∈ E such that γ ∼ σδb. We have

b = q + (1/σ) ∑_{x∈E} γx(x − q) for all q ∈ E. (34.4)

The point b is called the barycenter of the distribution γ.

Proof: Assume that b ∈ E is such that γ ∼ σδb. Using (34.2) with β := σδb we get ∑(γx(x − q) | x ∈ E) = σ(b − q) for all q ∈ E. Since σ ≠ 0, it follows that (34.4) must hold for each q ∈ E. This proves the uniqueness of b ∈ E. To prove existence, we choose q ∈ E arbitrarily and define b ∈ E by (34.4). Using Prop.1, it follows immediately that γ ∼ σδb.
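Formula (34.4) is directly computable when E is a coordinate space. A minimal sketch (Python with numpy; the three charges below are a hypothetical example):

    import numpy as np

    pts     = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # the support
    charges = np.array([1.0, 2.0, 1.0])                        # the charges gamma_x

    sigma = charges.sum()                   # total charge, nonzero here
    q = np.zeros(2)                         # any reference point gives the same b
    b = q + (charges @ (pts - q)) / sigma   # formula (34.4)
    print(b)                                # [2. 1.]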


Proposition 3: The barycenter of a charge distribution with non-zero total charge belongs to the flat span of the support of the distribution. More precisely, if γ ∈ R(E), b ∈ E and σ ∈ R× are such that γ ∼ σδb, then b ∈ Fsp(Supt γ).

Proof: Let F be any flat such that Supt γ ⊂ F. Then γ|F ∈ R(F) is a charge distribution on F and has a barycenter in F. But this is also the barycenter b in E and hence we have b ∈ F. Since F, with Supt γ ⊂ F, was arbitrary, it follows that b ∈ Fsp(Supt γ).

We call a pair (µ, p) ∈ P× × E a mass-point. We may think of it as describing a particle of mass µ placed at p. By the barycenter b of a non-empty finite family ((µi, pi) | i ∈ I) of mass-points we mean the barycenter of the distribution ∑(µiδpi | i ∈ I). It is characterized by

∑_{i∈I} µi(pi − b) = 0 (34.5)

and satisfies

b = q + (1/sumIµ) ∑_{i∈I} µi(pi − q) (34.6)

for all q ∈ E.

Proposition 4: Let Π be a partition of a given non-empty finite set I and let ((µi, pi) | i ∈ I) be a family of mass-points. For every J ∈ Π, let λJ := sumJ(µ|J) and let qJ be the barycenter of the family ((µj, pj) | j ∈ J). Then ((µi, pi) | i ∈ I) and ((λJ, qJ) | J ∈ Π) have the same barycenter.

Proof: Using (34.6) with I replaced by J, b by qJ, and q by the barycenter b of ((µi, pi) | i ∈ I), we find that

λJ(qJ − b) = ∑_{j∈J} µj(pj − b)

holds for all J ∈ Π. Using the summation rule (07.5), we obtain

∑_{J∈Π} λJ(qJ − b) = ∑_{i∈I} µi(pi − b) = 0,

which proves that b is the barycenter of ((λJ, qJ) | J ∈ Π).

which proves that b is the barycenter of ((λJ , qJ) | J ∈ Π).By the centroid of a non-empty finite family (pi | i ∈ I) of points we

mean the barycenter of the family ((1, pi) | i ∈ I).Examples and Remarks:

1. If z is the barycenter of the pair ((µ, x), (λ, y)) ∈ (P× × E)2, i.e. of the distribution µδx + λδy, we say that z divides (x, y) in the ratio λ : µ.


By (34.5), z is characterized by µ(x − z) + λ(y − z) = 0 (see Figure 1). If λ = µ, we say that z is the midpoint of (x, y).

Thus, the midpoint of (x, y) is just the centroid of (x, y).

2. Consider a triple (x1, x2, x3) of points and let c be the centroid of this triple. If we apply Prop.4 to the partition {{1}, {2, 3}} of 3] and to the case when µi = 1 for all i ∈ 3], we find that c is also the barycenter of ((1, x1), (2, z)), where z is the midpoint of (x2, x3). In other words, c divides (x1, z) in the ratio 2 : 1 (see Figure 2). We thus obtain the following well known theorem of elementary geometry: The three medians of a triangle all meet at the centroid, which divides each in the ratio 2 : 1.
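The 2 : 1 division in Example 2 can be checked numerically (Python with numpy; a sketch with an arbitrarily chosen triangle):

    import numpy as np

    x1, x2, x3 = np.array([0.0, 0.0]), np.array([6.0, 0.0]), np.array([0.0, 6.0])

    c = (x1 + x2 + x3) / 3                  # centroid of the triple
    z = (x2 + x3) / 2                       # midpoint of (x2, x3)
    # c is the barycenter of ((1, x1), (2, z)): it divides (x1, z) in the ratio 2:1.
    print(np.allclose(c, x1 + (2 / 3) * (z - x1)))    # True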

3. Let c be the centroid of a quadruple (x1, x2, x3, x4) of points. If we apply Prop.4 to the partition {{1, 2}, {3, 4}} of 4] and to the case when µi = 1 for all i ∈ 4], we find that c is also the midpoint of (y, z), where y is the midpoint of (x1, x2) and z the midpoint of (x3, x4) (see Figure 3). We thus obtain the following geometrical theorem: The four line-segments that join the midpoints of opposite edges of a tetrahedron all meet at the centroid of the tetrahedron, which is the midpoint of all four.


If Prop.4 is applied to the partition {{1}, {2, 3, 4}} of 4] and to the case when µi = 1 for all i ∈ 4], one obtains the following geometric theorem: The four line segments that join the vertices of a tetrahedron to the centroids of the opposite faces all meet at the centroid of the tetrahedron, which divides each in the ratio 3 : 1.

4. If a non-empty finite family ((µi, pi) | i ∈ I) in P× × E is interpreted physically as a system of point-particles, then the barycenter of the family is the center of mass of the system. If the system moves, then the places pi will change in time, and so will the center of mass. Newtonian mechanics teaches that the center of mass moves on a straight line with constant speed if no external forces act on the system.

5. Let E, with dim E = 2, be a mathematical model for a rigid, plane, horizontal plate. A distribution γ ∈ R(E) can then be interpreted as a system of vertical forces. The term γx with x ∈ Supt γ gives the magnitude and direction of a force acting at x. We take γx as positive if the force acts downwards, negative if it acts upwards. If σ := sumEγ is not zero and if b is the barycenter of γ, then the force system is statically equivalent to a single resultant force σ applied at b; i.e. the plate can be held in equilibrium by applying an opposite force −σ at b.

6. We can define an “addition” add: (P× × E)2 → P× × E on P× × E by

add((µ, x), (λ, y)) := (µ + λ, z),

where z is the barycenter of the pair ((µ, x), (λ, y)), i.e., the point that divides (x, y) in the ratio λ : µ. It is obvious that this addition is


commutative. Application of Prop.4 to triples easily shows that this addition is also associative and hence endows P× × E with the structure of a commutative pre-monoid (see Sect.06). We use additive notation, i.e. we write

(µ, x) + (λ, y) := add((µ, x), (λ, y)).

For any non-empty finite family ((µi, pi) | i ∈ I) in P× × E we then have

∑_{i∈I} (µi, pi) = (sumIµ, b),

where b is the barycenter of the family. It is easily shown that the pre-monoid P× × E is cancellative.
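The addition of Example 6 can be implemented directly from (34.6); the sketch below (Python with numpy; data chosen arbitrarily) also spot-checks associativity:

    import numpy as np

    def add(mp1, mp2):                       # the addition of Example 6
        (mu, x), (lam, y) = mp1, mp2
        z = x + (lam / (mu + lam)) * (y - x) # barycenter of the pair, by (34.6)
        return (mu + lam, z)

    a = (1.0, np.array([0.0, 0.0]))
    b = (2.0, np.array([3.0, 0.0]))
    c = (3.0, np.array([0.0, 4.0]))
    m1, p1 = add(add(a, b), c)
    m2, p2 = add(a, add(b, c))
    print(np.isclose(m1, m2), np.allclose(p1, p2))    # True True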

Pitfall: The statements “z divides (x, y) in the ratio λ : µ” and “z is the midpoint of (x, y)” should not be interpreted as statements about the distances from z to x and y. There is no concept of distance in a flat space unless it is endowed with additional structure as in Chap.4.

Notes 34

(1) The terms “weight” or “mass” are often used instead of our “charge”. The trouble with “weight” and “mass” is that they lead one to assume that their values are positive.

(2) There is no agreement in the literature about the terms “barycenter” and “centroid”. Sometimes “centroid” is used for what we call “barycenter” and sometimes “barycenter” for what we call “centroid”.

35 Flat Combinations

We assume that a flat space E with translation space V and a non-empty index set I are given. Recall that the summation mapping sumI : R(I) → R defined by (15.2) is linear and hence flat. For each ν ∈ R, we write

(R(I))ν := sum<I({ν}) = {λ ∈ R(I) | sumIλ = ν}. (35.1)

It follows from Prop.3 of Sect.33 that (R(I))ν is a flat in R(I) whose direction space is (R(I))0.

Definition 1: Let p := (pi | i ∈ I) be a family of points in E. The mapping

flcp : (R(I))1 −→ E,

defined so that flcp(λ) is the barycenter of the distribution ∑(λiδpi | i ∈ I), is called the flat-combination mapping for p. The value flcp(λ) is called the flat combination of p with coefficient family λ.

By the Theorem on the Unique Existence of Barycenters of Sect.34, the mapping flcp is characterized by

∑_{i∈I} λi(pi − flcp(λ)) = 0 for all λ ∈ (R(I))1 (35.2)

and we have, for every q ∈ E,

flcp(λ) = q + ∑_{i∈I} λi(pi − q) for all λ ∈ (R(I))1. (35.3)
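Formula (35.3) is directly computable in coordinates. A minimal sketch (Python with numpy; points and coefficients chosen arbitrarily); the last line illustrates that in a linear space flat combinations reduce to linear combinations, as noted later in this section:

    import numpy as np

    p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])    # a family of points
    lam = np.array([0.2, 0.3, 0.5])                       # coefficients with sum 1

    q = np.array([7.0, -3.0])                             # arbitrary reference point
    flc = q + lam @ (p - q)                               # formula (35.3)
    print(flc)                                            # [0.3 0.5]
    print(np.allclose(flc, lam @ p))                      # True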

If F is a flat in E and if Rng p ⊂ F, one can apply Def.1 to F instead of E and obtain the flat-combination mapping flcFp for p relative to F. It is easily seen that flcFp coincides with the adjustment flcp|F.

Proposition 1: The flat-combination mapping flcp is a flat mapping. Its gradient is the restriction to (R(I))0 of the linear-combination mapping lnc(p−q) for the family p − q := (pi − q | i ∈ I) in V, no matter how q ∈ E is chosen.

Proof: It follows from (35.3) that

flcp(λ) − flcp(µ) = ∑_{i∈I} (λi − µi)(pi − q) = lncp−q(λ − µ)

holds for all λ, µ ∈ (R(I))1. Comparison of this result with (33.4) gives the desired conclusion.

If S is a non-empty subset of E, we identify S with the family (x | x ∈ S) indexed on S itself (see Sect.02). Thus, the flat-combination mapping flcS : (R(S))1 → E has the following interpretation: For every γ ∈ (R(S))1, which can be viewed as a charge distribution with total charge 1 and support included in S, flcS(γ) is the barycenter of γ.

Flat Span Theorem: The set of all flat combinations of a non-empty family p of points in E is the flat span of the range of p, i.e. Rng flcp = Fsp(Rng p). In particular, if S is a non-empty subset of E, then Rng flcS = FspS.

Proof: Noting that Rng flcp = Rng flcS if S = Rng p, we see that it is sufficient to prove that Rng flcS = FspS for every non-empty S ∈ Sub E.

Let x ∈ S. Then δx ∈ (R(S))1 and flcS(δx) = x. Hence x ∈ Rng flcS. Since x ∈ S was arbitrary, it follows that S ⊂ Rng flcS. Since flcS is a flat


mapping by Prop.1, it follows, by Prop.3 of Sect.33, that Rng flcS is a flat in E and hence that S ⊂ FspS ⊂ Rng flcS. On the other hand, by Prop.3 of Sect.34, we have flcS(γ) ∈ Fsp(Supt γ) ⊂ FspS for every γ ∈ (R(S))1 and hence Rng flcS ⊂ FspS.

Applying the Flat Span Theorem to the case when p := (x, y) is a pair of distinct points in E, we find that the line ←→xy passing through x and y (see (32.5)) is just the set of all flat combinations of (x, y):

←→xy = {flc(x,y)(λ, µ) | λ, µ ∈ R, λ + µ = 1}. (35.4)

If V is a linear space, regarded as a flat space that is its own translation space, and if f := (fi | i ∈ I) is a family of elements of V, then the flat-combination mapping flcf for f is nothing but the restriction to (R(I))1 of the linear-combination mapping for f, i.e. flcf = lncf |(R(I))1. In other words, we have

∑_{i∈I} λifi = flcf(λ) for all λ ∈ (R(I))1.

If E is a flat space that does not carry the structure of a linear space, and if p := (pi | i ∈ I) is a family in E, we often use the symbolic notation

∑_{i∈I} λipi := flcp(λ) for all λ ∈ (R(I))1, (35.5)

even though the terms λipi do not make sense by themselves. If p := (p1, p2) is a pair of points, we write

λ1p1 + λ2p2 := flc(p1,p2)(λ1, λ2) (35.6)

for all (λ1, λ2) ∈ (R2)1, i.e. for all λ1, λ2 ∈ R with λ1 + λ2 = 1. Similar notations are used for other lists with a small number of terms.

If p is a family of points in E and f a family of vectors in V, both with the same index set I, then

p + f := (pi + fi | i ∈ I) (35.7)

is also a family of points in E .

Definition 2: A non-empty family p of points in E is said to be flatly independent, flatly spanning, or a flat basis of E if the flat-combination mapping flcp is injective, surjective, or invertible, respectively.

By the Flat Span Theorem, the family p is flatly spanning if and only if the flat span of its range is all of E.


Proposition 2: Let p = (pi | i ∈ I) be a non-empty family of points in E and let k ∈ I. Then p is flatly independent, flatly spanning, or a flat basis of E according as the family f := (pi − pk | i ∈ I \ {k}) of vectors is linearly independent, spanning, or a basis of V, respectively.

Proof: Using (35.3) with the choice q := pk we see that

flcp(λ) = pk + lncf(λ|I\{k}) for all λ ∈ (R(I))1. (35.8)

Since the mapping (λ ↦ λ|I\{k}) : (R(I))1 → R(I\{k}) is easily seen to be invertible, it follows from (35.8) that flcp is injective, surjective, or invertible according as lncf is injective, surjective, or invertible, respectively. In view of Def.2 of Sect.15, this is the desired result.

The following result follows from Prop.2 and the Theorem on Characterization of Dimension (Sect.17).

Proposition 3: Let E be a finite-dimensional flat space and let p := (pi | i ∈ I) be a family of points in E.

(a) If p is flatly independent, then I is finite and ♯I ≤ dim E + 1, with equality if and only if p is a flat basis.

(b) If p is flatly spanning, then ♯I ≥ dim E + 1, with equality if and only if p is a flat basis.

Let p := (pi | i ∈ I) be a flat basis of E. Then, by Def.2, for every x ∈ E there is a unique λ ∈ (R(I))1 such that x = flcp(λ). The family λ is called the family of barycentric coordinates of x relative to the flat basis p.
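Barycentric coordinates can be computed by solving a small linear system: the conditions flcp(λ) = x and sumIλ = 1 together are linear in λ. A sketch (Python with numpy; the flat basis below is an arbitrary example):

    import numpy as np

    p = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])    # a flat basis of the plane
    x = np.array([0.5, 1.0])

    # Solve flcp(lam) = x together with sum(lam) = 1: a 3-by-3 linear system.
    A = np.vstack([p.T, np.ones(3)])
    lam = np.linalg.solve(A, np.append(x, 1.0))
    print(lam)                              # the barycentric coordinates of x
    print(np.allclose(lam @ p, x))          # True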

The following result states that the flat mappings are exactly those that “preserve” flat combinations. We leave its proof to the reader (see Problem 7 below).

Proposition 4: Let E, E′ be flat spaces. A mapping α : E → E′ is flat if and only if for every non-empty family p of points in E we have

α ∘ flcp = flcα∘p. (35.9)

The following result is an analogue, and a consequence, of Prop.2 of Sect.16.

Proposition 5: Let E, E′ be flat spaces. Let p := (pi | i ∈ I) be a flat basis of E and let p′ := (p′i | i ∈ I) be a family of points in E′. Then there is a unique flat mapping α : E → E′ such that α ∘ p = p′. This α is injective, surjective, or invertible depending on whether p′ is flatly independent, flatly spanning, or a flat basis, respectively.


Notes 35

(1) In the case when p is a list, the flat-combination flcp(λ) is sometimes called the “average” of p with “weights” λ.

(2) The term “frame” is sometimes used for what we call a “flat basis”.

36 Flat Functions

In this section, we assume that E is a finite-dimensional flat space with translation space V. We denote the set of all flat mappings from E into R by Flf E and call its members flat functions. Given any q ∈ E, we use the notation

FlfqE := {a ∈ Flf E | a(q) = 0} (36.1)

for the set of all flat functions that have the value 0 at the point q. The set of all real-valued constants with domain E will be denoted by RE. Note that the gradient ∇a of a flat function a belongs to the dual V∗ := Lin(V, R) of the translation space V.

The following two basic facts are easily verified using Prop.1 of Sect.33.

Proposition 1: Flf E is a subspace of Map(E, R). The set RE is a one-dimensional subspace of Flf E and, for each q ∈ E, FlfqE is a supplement of RE in Flf E, so that RE ∩ FlfqE = {0} and

Flf E = RE + FlfqE. (36.2)

Proposition 2: The mapping GE : Flf E → V∗ defined by GEa := ∇a is linear and surjective and has the nullspace Null GE = RE.

The next result follows from Prop.2 by applying the Theorem on Dimension of Range and Nullspace (Sect.14) and (21.1).

Proposition 3: We have

dim E = dim(FlfqE) = dim(Flf E) − 1 (36.3)

for all q ∈ E.

The following result states, among other things, that every flat function defined on a flat in E can be extended to a flat function on all of E.

Proposition 4: Let F be a flat with direction space U. The restriction mapping (a ↦ a|F) : Flf E → Flf F is linear and surjective. Its nullspace N := {a ∈ Flf E | a|F = 0} has the following properties:

(i) For every q ∈ F we have N = {a ∈ FlfqE | ∇a ∈ U⊥}.


(ii) For every q ∈ E \ F there is an a ∈ N such that a(q) = 1.

Proof: The linearity of a ↦ a|F is evident. Now let q ∈ F be given. If b ∈ Flf F is given we can choose, by Prop.6 of Sect.21, λ ∈ V∗ such that λ|U = ∇b. By the Theorem on Specification of Flat Mappings (Sect.33), we can define a ∈ Flf E by a(x) := b(q) + λ(x − q) for all x ∈ E. We then have a|F = b. Since b ∈ Flf F was arbitrary, this shows the surjectivity of a ↦ a|F.

The property (i) of N is an immediate consequence of Prop.1 of Sect.33.

Assume now that q ∈ E \ F is given and choose z ∈ F. Then v := q − z ∉ U and hence U ⫋ U + Rv. Using Prop.4 of Sect.22, it follows that (U + Rv)⊥ ⫋ U⊥. Hence we can choose λ ∈ U⊥ such that λv ≠ 0. By the Theorem on Specification of Flat Mappings, we can define a ∈ Flf E by

a(x) := (1/λv) λ(x − z) for all x ∈ E.

It is evident that a ∈ N and a(q) = 1.

Consider now a non-constant flat function a : E → R. As stated in Example 1 of Sect.33, we then have ∇a ≠ 0. It follows from Prop.3 of Sect.33 that the direction space of a<({0}) is {∇a}⊥ = (R∇a)⊥. Since dim(R∇a) = 1, it follows from the Formula for Dimension of Annihilators (21.15) that dim a<({0}) = dim {∇a}⊥ = n − 1, i.e. that a<({0}) is a hyperplane. We say that a<({0}) is the hyperplane determined by a. Given any hyperplane F in E, it follows from Prop.4, (ii), that there is a non-constant a ∈ Flf E such that F is the hyperplane determined by a.
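A concrete instance (Python with numpy; a sketch with an arbitrarily chosen flat function): the hyperplane determined by a is the set where a vanishes, and differences of its points are annihilated by ∇a:

    import numpy as np

    lam = np.array([1.0, -2.0, 0.5])        # the gradient, an element of V∗
    eta = 3.0
    a = lambda x: eta + lam @ x             # a non-constant flat function on R^3

    x1 = np.array([-3.0, 0.0, 0.0])         # lies on the hyperplane determined by a
    x2 = np.array([0.0, 1.0, -2.0])         # so does x2
    print(a(x1), a(x2), lam @ (x1 - x2))    # 0.0 0.0 0.0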

Also, Prop.4, (ii), and Prop.6 of Sect.32 have the following immediate consequence.

Proposition 5: Every flat in E other than E itself is the intersection of all hyperplanes that include it.

Proposition 6: The evaluation mapping ev : E → (Flf E)∗ defined by

ev(x)a := a(x) for all a ∈ Flf E, x ∈ E (36.4)

is flat and injective. Its gradient ∇ev ∈ Lin(V, (Flf E)∗) coincides with the transpose of the linear mapping GE ∈ Lin(Flf E, V∗) defined in Prop.2.

Proof: Using (33.2) and (36.4) we see that

(∇a)v = a(x + v) − a(x) = (ev(x + v) − ev(x))a

holds for all a ∈ Flf E, x ∈ E and v ∈ V. Hence, if we define L : V → (Flf E)∗

by

L(v)a := (∇a)v for all a ∈ Flf E ,v ∈ V, (36.5)


we obtain

L(v) = ev(x + v) − ev(x) for all x ∈ E , v ∈ V.

It is clear from (36.5) that L(ξv) = ξL(v) for all v ∈ V, ξ ∈ R. Thus, ev satisfies the two conditions of Prop.1 of Sect.33 and L is the gradient of ev, i.e. ∇ev = L.

The equation (36.5) states that

((∇ev)v)a = (GEa)v for all a ∈ Flf E, v ∈ V. (36.6)

In view of the characterization (22.3) of the transpose, this means that ∇ev = GE⊤. By the Theorem on Annihilators and Transposes, (21.13), and by Prop.2, it follows that Null ∇ev = (Rng GE)⊥ = V∗⊥ = {0}. Therefore ∇ev is injective and so is ev (see Prop.4 of Sect.33).

If we write

x̄ := ev(x), v̄ := (∇ev)v (36.7)

when x ∈ E, v ∈ V, we can, by Prop.6, regard x ↦ x̄ as a flat isomorphism from E onto the flat Ē := ev>(E) in (Flf E)∗ and v ↦ v̄ as a linear isomorphism from V onto the subspace V̄ := (∇ev)>(V) of (Flf E)∗. This subspace V̄ is the direction space of the flat Ē. For all x, y ∈ E, v ∈ V, we have

x − y = v ⇐⇒ x̄ − ȳ = v̄

and

x + v = y ⇐⇒ x̄ + v̄ = ȳ.

Hence, if points in E and translations in V are replaced by their images in Ē and V̄, a point-difference becomes an ordinary difference in (Flf E)∗ and the symbolic sum of a point and a translation becomes an ordinary sum in (Flf E)∗. If points in E are replaced by their images in Ē, then flat combinations become linear combinations. In other words, the image in Ē of a symbolic sum of the form (35.5) is the ordinary sum ∑_{i∈I} λi p̄i.

It is not hard to show that every element of (Flf E)∗ is either of the form ξx̄ with x ∈ E, ξ ∈ R×, or of the form v̄ with v ∈ V.

Notes 36

(1) In accord with Note 1 to Sect.32, the usual term for our “flat function” is “affine function”. However, in very elementary texts (especially those for high-school use), the term “linear function” is often found. Of course, such usage clashes with our (and the standard) use of “linear”.


37 Convex Sets

Let E be a flat space with translation space V. Given any points x, y ∈ E we define the segment joining x and y to be the set of all flat combinations of (x, y) with positive coefficients and denote it by

[x, y] := {λx + µy | λ, µ ∈ P, λ + µ = 1}, (37.1)

where the symbolic-sum notation (35.6) is used. Comparing this definition with (35.4), we see that, if x ≠ y, the segment joining x and y is a subset of the line passing through x and y, as one would expect.

Definition 1: A subset C of E is said to be convex if the segment joining any two points in C is included in C, i.e. if x, y ∈ C implies [x, y] ⊂ C.

It is evident that the empty set ∅ and all flats in E, and in particular E itself, are convex. Also, all segments are convex. If R is regarded as a one-dimensional flat space, then its convex sets are exactly the intervals (see Sect.08). If s, t ∈ R and s ≤ t, the notation (37.1) is consistent with the notation [s, t] := {r ∈ R | s ≤ r ≤ t} (see (08.4)).

The following result is an immediate consequence of Def.1.

Proposition 1: The intersection of any collection of convex sets is again

a convex set.

In view of the remarks on span-mappings made in Sect.03 we have the following result.

Proposition 2: Given any subset S of E, there is a unique smallest

convex set that includes S. More precisely, there is a unique convex set that

includes S and is included in every convex set that includes S. It is called

the convex hull of S and is denoted by CxhS. We have S = CxhS if and

only if S is convex.

It is clear that the convex hull of two points is just the segment joining the two points; more precisely Cxh{x, y} = [x, y] for all x, y ∈ E.

The following result follows from the fact (Prop.4 of Sect.35) that flat mappings preserve flat combinations.

Proposition 3: Images and pre-images of convex sets under flat map-

pings are again convex. In particular, if x, y ∈ E and if α is a flat mapping

with domain E, then

α>([x, y]) = [α(x), α(y)]. (37.2)

Using (35.7) for pairs p := (x, y) and f := (u,v) we immediately obtain the following result.

Proposition 4: Let C be a convex subset of E, and B a convex subset

of V. Then C +B is a convex subset of E. In particular, C + v is convex for

all v ∈ V and x + B is convex for all x ∈ E.


Given any non-empty set I, we use the notation

(P(I))1 := (R(I))1 ∩ P(I)

for the set of all families of positive numbers that have finite support and add up to 1. It is easily seen that (P(I))1 is a convex subset of the flat space (R(I))1.

Definition 3: Let p := (pi | i ∈ I) be a non-empty family of points in

E. The restriction of the flat-combination mapping for p to (P(I))1 is called

the convex-combination mapping for p and is denoted by

cxcp := flcp |(P(I))1. (37.3)

Convex Hull Theorem: For every non-empty family p of points in E,

we have

Rng cxcp = Cxh(Rng p). (37.4)

In particular, a subset C of E is convex if and only if Rng cxcC = C.

Proof: We have, for all j ∈ I, δj ∈ (P(I))1 and hence pj = flcp(δj) = cxcp(δj). Since j ∈ I was arbitrary, it follows that Rng p ⊂ Rng cxcp. Since (P(I))1 is convex and flcp flat, it follows from Prop.3 that Rng cxcp = (flcp)>((P(I))1) is a convex subset of E and so Cxh(Rng p) ⊂ Rng cxcp.

To prove the reverse inclusion Rng cxcp ⊂ Cxh(Rng p), we must show that cxcp(λ) ∈ Cxh(Rng p) for all λ ∈ (P(I))1. We do so by induction over ♯Suptλ. If ♯Suptλ = 1, then λ = δj for some j ∈ I and cxcp(λ) = pj ∈ Rng p ⊂ Cxh(Rng p).

Assume, then, that λ ∈ (P(I))1 with ♯Suptλ > 1 is given, and that cxcp(µ) ∈ Cxh(Rng p) holds for all µ ∈ (P(I))1 with ♯Suptµ < ♯Suptλ. We may and do choose j ∈ Suptλ and we put σ := ∑(λi | i ∈ I \ {j}) ∈ ]0, 1[. We define µ ∈ (P(I))1 by

µi := (1/σ)λi if i ∈ I \ {j}, and µi := 0 if i = j.

Clearly, ♯Suptµ = ♯Suptλ − 1 and λ = σµ + λjδj. Since σ + λj = sumIλ = 1, this states that λ ∈ [µ, δj] in (P(I))1. Since flcp is flat, it follows by (37.2) that

cxcp(λ) = flcp(λ) ∈ [flcp(µ), flcp(δj)] = [cxcp(µ), pj].

By the induction hypothesis, cxcp(µ) ∈ Cxh(Rng p). Since pj ∈ Rng p ⊂ Cxh(Rng p) and since Cxh(Rng p) is convex, it follows by Def.1 that cxcp(λ) ∈ Cxh(Rng p).


Since flat mappings preserve flat combinations (Prop.4 of Sect.35) and hence convex combinations, we immediately obtain the following consequence of the Convex Hull Theorem.

Proposition 5: If S is a subset of E and α a flat mapping with domain

E, then

α>(CxhS) = Cxh(α>(S)). (37.5)

The following is a refinement of the Convex Hull Theorem.

Strong Convex Hull Theorem: Let p := (pi | i ∈ I) be a non-empty family of points in E and let x ∈ Cxh(Rng p) be given. Then there is a λ ∈ (P(I))1 such that x = cxcp(λ) and such that p|Suptλ is flatly independent.

Proof: By the Convex Hull Theorem, cxc<p(x) is not empty. We choose a λ ∈ cxc<p(x) whose support has minimum cardinal. Put J := Suptλ.

Assume that p′ := p|J is flatly dependent, i.e. that flcp′ is not injective. Then, by Prop.4 of Sect.33, the gradient of flcp′ is not injective. Hence we can choose ν ∈ ((R(I))0)× such that Supt ν ⊂ J and

0 = (∇flcp′)ν|J = (∇flcp)ν. (37.6)

We now choose k ∈ J such that

νk/λk = max{νi/λi | i ∈ J}.

Since sumJ ν|J = 0 and ν|J ≠ 0 and λi > 0 for all i ∈ J, we must have νk > 0 and we conclude that

λi ≥ (νi/νk)λk for all i ∈ J.

Hence we have λ′ := λ − (λk/νk)ν ∈ (P(I))1, so that cxcpλ′ is meaningful. Using (37.6) we obtain

cxcpλ′ = flcp(λ − (λk/νk)ν) = flcpλ − (λk/νk)(∇flcp)ν = flcpλ = cxcpλ = x,

which means that λ′ ∈ cxc<p(x). On the other hand, we have Suptλ′ ⫋ Suptλ because λ′k = 0. It follows that ♯Suptλ′ < ♯Suptλ, which contradicts the assumption that Suptλ has minimum cardinal. Hence p|J cannot be flatly dependent.
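The proof above is constructive. The following minimal computational sketch (not part of the text's apparatus) carries out the same support-reduction step in Python with NumPy, identifying E with Rᵈ so that flat independence becomes affine independence of coordinate points; the function name and tolerances are illustrative assumptions.

    import numpy as np

    def caratheodory_reduce(p, lam, tol=1e-9):
        # p: (n, d) array, the family of points p_i in R^d.
        # lam: (n,) weights with lam >= 0 and sum(lam) = 1.
        # Returns weights with the same convex combination whose support J
        # makes p|J flatly (affinely) independent, so #J <= d + 1.
        p, lam = np.asarray(p, float), np.asarray(lam, float).copy()
        while True:
            J = np.flatnonzero(lam > tol)
            # p|J is flatly independent iff the columns (p_i, 1), i in J,
            # are linearly independent.
            A = np.vstack([p[J].T, np.ones(len(J))])
            if np.linalg.matrix_rank(A, tol) == len(J):
                return lam
            # Otherwise choose nu != 0 supported in J with sum(nu) = 0 and
            # sum(nu_i p_i) = 0, i.e. a null vector of A, as in (37.6).
            nu = np.zeros_like(lam)
            nu[J] = np.linalg.svd(A)[2][-1]
            # Take k maximizing nu_i / lam_i over J; then nu_k > 0 and
            # lam - (lam_k / nu_k) nu stays >= 0 with the k-term zeroed.
            k = J[np.argmax(nu[J] / lam[J])]
            lam = lam - (lam[k] / nu[k]) * nu
            lam[k] = 0.0   # force an exact zero despite rounding

    # Example: the centroid of a square is re-expressed with 3 points.
    p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(caratheodory_reduce(p, np.full(4, 0.25)))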

The following consequence of the Strong Convex Hull Theorem is obtained by using Prop.3, (a) of Sect.35.


Corollary: Let S be a subset of a finite-dimensional flat space E. Then,

for every x ∈ CxhS, there is a finite subset I of S such that x ∈ CxhI and

♯I ≤ (dim E) + 1.

Notes 37

(1) Many other notations, for example ConvS and S, can be found for our CxhS.

(2) The Corollary to the Strong Convex Hull Theorem is often called “Carathéodory’s Theorem”.

38 Half-Spaces

Let E be a flat space with translation space V. Consider a non-constant flat function a : E → R, i.e. a flat function such that Rng a is not a singleton. Since the only flats in R are the singletons and R itself, it follows by Prop.3 of Sect.33 that Rng a = R and hence that a<(0), a<(P), and a<(P×) are all non-empty. As we have seen in Sect.36, a<(0) is the hyperplane determined by a if E is finite-dimensional. By Prop.3 of Sect.37, a<(P) and a<(P×) are (non-empty) convex subsets of E, called the half-space and the open-half-space determined by a. The hyperplane a<(0) is called the boundary of these half-spaces. Of course, the half-space a<(P) is the union of its boundary a<(0) and the open-half-space a<(P×), and these two are disjoint. If ξ ∈ P×, then a and ξa determine the same hyperplane and half-spaces.

If z ∈ E and a ∈ (Flfz E)×, then a is not constant and z belongs to theboundary of the half-space a<(P).

Half-Space Inclusion Theorem: Let E be a finite-dimensional flat

space, C a non-empty convex subset of E and q ∈ E \ C. Then there is a

half-space that includes C and has q on its boundary. In other words, there

is an a ∈ (Flfq E)× such that a|C ≥ 0.

The proof will be based on the following preliminary result:

Lemma: Let C be a convex subset of E and let b ∈ Flf E be given. If both C ∩ b<(P×) and C ∩ b<(−P×) are non-empty, so is C ∩ b<(0). If c ∈ Flf E satisfies c|C∩b<(0) ≥ 0, then

c(x)/b(x) ≥ c(y)/b(y) for all x ∈ C ∩ b<(P×), y ∈ C ∩ b<(−P×). (38.1)

Proof: Let x ∈ C ∩ b<(P×), y ∈ C ∩ b<(−P×) be given, so that b(x) > 0, b(y) < 0, and hence b(x) − b(y) > 0. We define

λ := −b(y)/(b(x) − b(y)), µ := b(x)/(b(x) − b(y)). (38.2)

It is clear that λ, µ ∈ P×, λ + µ = 1. Hence z := λx + µy belongs to [x, y]. Since b preserves convex combinations, we have, by (38.2),

b(z) = λb(x) + µb(y) = 0

and hence z ∈ b<(0). Since C is convex, we also have z ∈ C and hence z ∈ C ∩ b<(0). The situation is illustrated in Fig.1.

If C ∩ b<(P×) and C ∩ b<(−P×) are both non-empty, we may choose a point in each and obtain a point in C ∩ b<(0) by the above method, showing that C ∩ b<(0) is not empty.

Now let c ∈ Flf E be such that c|C∩b<(0) ≥ 0. Then, if x and y are given as above and z constructed accordingly, we have

0 ≤ c(z) = λc(x) + µc(y).

Substituting (38.2) into this inequality and multiplying the result by b(x) − b(y) ∈ P× yields −b(y)c(x) + b(x)c(y) ≥ 0, which is equivalent to (38.1).
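A quick numerical check of the construction (38.2), assuming E = R² and sample flat functions b and c chosen purely for illustration (Python with NumPy):

    import numpy as np

    b = lambda x: x[0] - 0.5          # boundary b<(0) is the line x0 = 0.5
    c = lambda x: 2.0 * x[0] + x[1]   # sample flat function

    x = np.array([1.0, 0.3])          # b(x) > 0
    y = np.array([0.0, 0.7])          # b(y) < 0
    lam = -b(y) / (b(x) - b(y))       # (38.2)
    mu = b(x) / (b(x) - b(y))
    z = lam * x + mu * y              # z in [x, y]
    assert np.isclose(b(z), 0.0)      # z lies on the boundary b<(0)
    assert c(x) / b(x) >= c(y) / b(y) # (38.1), since here c(z) >= 0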

Proof of Theorem: We proceed by induction. The assertion is vacuously valid if dim E = 0. Assume, then, that dim E > 0, that a non-empty convex C ∈ Sub E and a q ∈ E \ C are given and that the assertion is valid when E is replaced by a hyperplane F in E.


Since, by Prop.3 of Sect.36, dim(Flfq E) = dim E > 0, we may and do choose b ∈ (Flfq E)× and put F := b<(0). If C ∩ b<(P×) or C ∩ b<(−P×) is empty then a := −b or a := b, respectively, fulfills the requirement of the theorem. Therefore, we may assume that both C ∩ b<(P×) and C ∩ b<(−P×) are not empty. By the Lemma, C ∩ F is then also non-empty. Since C ∩ F is a convex subset of F and q ∈ F \ (F ∩ C) we may use the induction hypothesis and choose d ∈ (Flfq F)× such that d|C∩F ≥ 0. In view of Prop.4 of Sect.36, we may and do choose a flat extension c of d to E, so that c ∈ (Flfq E)× and c|C∩F ≥ 0. We define subsets S and T of R by

S := {c(x)/b(x) | x ∈ C ∩ b<(P×)},

T := {c(y)/b(y) | y ∈ C ∩ b<(−P×)}.

Both S and T are non-empty. Applying the second statement of the Lemma, we see that every number in T is less than every number in S. It follows that −∞ < supT ≤ inf S < ∞. We choose ξ ∈ [supT, inf S] and put a := c − ξb. If x ∈ C ∩ F = C ∩ b<(0) we have a(x) = c(x) ≥ 0 since c|C∩F ≥ 0. If x ∈ C ∩ b<(P×) we have c(x)/b(x) ≥ inf S ≥ ξ, b(x) > 0, and hence a(x) = c(x) − ξb(x) ≥ 0. Finally, if x ∈ C ∩ b<(−P×), we have c(x)/b(x) ≤ supT ≤ ξ, b(x) < 0, and hence a(x) = c(x) − ξb(x) ≥ 0. Therefore, since E = b<(0) ∪ b<(P×) ∪ b<(−P×), we have a(x) ≥ 0 for all x ∈ C. Since F = b<(0) we have a|F = c|F = d ≠ 0 and hence a ≠ 0. Hence a fulfills the requirement of the theorem.

Remark 1: The requirement, in the Theorem, that C be non-empty can obviously be omitted when dim E > 0. This requirement is needed only to make the assertion vacuously valid when dim E = 0 and thus to simplify the induction proof.

Remark 2: In general, there are many half-spaces that meet the requirements of the Theorem. However, if C := a<(P×) ∪ {z}, where z ∈ E and a ∈ (Flfz E)×, and if q ∈ a<(0) \ {z}, then a<(P) is the only half-space that meets the requirements (see Fig.2).


Remark 3: The Half-Space Inclusion Theorem depends strongly on the assumption that the translation space of E is a linear space over the real field R. Its conclusion is no longer valid if R is replaced by Q (see Problem 10, below).

Notes 38

(1) What we call the “Half-Space Inclusion Theorem” is at the root of a number of results, often called “Separation Theorems”, in convex analysis. Some of these results are stated in Sect.54.

39 Problems for Chapter 3

(1) Let S be a subset of a linear space V. Show that

LspS = Fsp(S ∪ {0}). (P3.1)

(2) Let E, E′ be flat spaces. Show that a mapping α : E → E′ is flat if and only if its graph Gr(α) is a flat in E × E′, and, if this is the case, show that the direction space of Gr(α) is Gr(∇α). (Hint: Use Problem 1 of Chap.1)

(3) Let E be a flat space and let ε be a flat mapping from E to itself that is idempotent in the sense that

ε ∘ ε = ε. (P3.2)

Show:

(a) We have

ε|Rng ε = 1Rng ε⊂E . (P3.3)

(b) ∇ε is an idempotent lineon on V := E − E .

(c) We have

Rng ε + Null ∇ε = E . (P3.4)

(d) Let x ∈ Rng ε be given. Then ε<(x) is a flat with direction space Null ∇ε and we have

ε<(x) ∩ Rng ε = {x}, (P3.5)

ε<(x) + Rng (∇ε) = E. (P3.6)

Moreover, if E is finite-dimensional, we have

dim ε<(x) + dim Rng ε = dim E. (P3.7)

(4) Let E be a flat space and let γ be a charge distribution on E with total charge zero. Show: Given any p ∈ E and σ ∈ P× there is exactly one q ∈ E such that γ ∼ σδp − σδq.

Note: If p 6= q, the distribution σδp − σδq is called a dipole.

(5) (a) Let (x1, x2, x3, x4) be a quadruple of points in a flat space. Show that the following are equivalent:

(i) The midpoint of (x1, x3) coincides with the midpoint of (x2, x4),

(ii) x2 − x1 = x3 − x4,

(iii) x3 − x2 = x4 − x1.

Remark: If the quadruple is injective and if any and hence all of these conditions are satisfied, then the points are the vertices of a parallelogram. In fact, one may take this to be the definition of “parallelogram”.

(b) Show that the midpoints of the sides of a quadrilateral (not necessarily plane) in a flat space form a parallelogram.

(6) Consider a flatly independent triple (x1, x2, x3) in a flat space E. Assume that y1 divides (x2, x3) in the ratio λ1 : µ1 and that y2 divides (x1, x3) in the ratio λ2 : µ2. Determine the ratios in which the point c of intersection of the lines x1y1 and x2y2 divides (x1, y1) and (x2, y2) (see Figure).


(7) Let E, E′ be flat spaces and let α : E → E′ be a mapping which preserves flat combinations of pairs of points, i.e. which satisfies α ∘ flcp = flcα∘p for all pairs p := (p1, p2) ∈ E2. Prove that α must be a flat mapping.

(8) Let E be a flat space and consider the pre-monoid P× × E described in Example 6 of Sect.34.

(a) Prove that the pre-monoid P× × E is cancellative (see Sect.06).

(b) Show that P× × E does not contain an element that satisfies the neutrality law (06.2) and hence is not monoidable.

(9) Let E be a flat space and put V := E − E. Let λ ∈ V∗ and a, b ∈ V× be given such that λa = 1, λb = 0. Put

E := 1V − (a ⊗ λ), N := b ⊗ λ. (P3.8)

(a) Show that E + ξN is idempotent for each ξ ∈ R and determine Null (E + ξN) and Rng (E + ξN).

Now let q ∈ E be given and define, for each v ∈ V, ϕv : E → E by

ϕv(x) := x + (E + (λ(x − q))N)v for all x ∈ E . (P3.9)

(b) Show that, for each v ∈ V, ϕv is flat and determine ∇ϕv; also, show that ϕv is invertible, so that ϕv ∈ Fis E.

(c) Show that the mapping

ϕ := (v 7→ ϕv) : V → Perm E (P3.10)

is an injective homomorphism from the additive group of V to Perm E. (Hence, in view of (b), ϕ>(V) is a subgroup of Fis E that is isomorphic to—but different from—the additive group V.)

(d) Is the action of V on E defined by (P3.10) free? Is it transitive?

(10) Note that all the definitions of Chap.3 remain meaningful if the field R of real numbers is replaced by the field Q of rational numbers. Give an example of a finite-dimensional flat space E over Q, a non-empty convex subset C of E and a point q ∈ E \ C such that a>(C) ⊄ P for all a ∈ (Flfq E)×, where Flf E denotes the set of all flat functions on E with values in Q. (Thus, the Half-Space Inclusion Theorem does not extend to the case when R is replaced by Q.)


Chapter 4

Inner-Product Spaces,

Euclidean Spaces

As in Chap.2, the term “linear space” will be used as a shorthand for “finite-dimensional linear space over R”. However, the definitions of an inner-product space and a Euclidean space do not really require finite-dimensionality. Many of the results, for example the Inner-Product Inequality and the Theorem on Subadditivity of Magnitude, remain valid for infinite-dimensional spaces. Other results extend to infinite-dimensional spaces after suitable modification.

41 Inner-Product Spaces

Definition 1: An inner-product space is a linear space V endowed with

additional structure by the prescription of a non-degenerate quadratic form

sq ∈ Qu(V) (see Sect.27). The form sq is then called the inner square of

V and the corresponding symmetric bilinear form

ip := sq ∈ Sym2(V2, R) ∼= Sym(V,V∗)

the inner product of V.

We say that the inner-product space V is genuine if sq is strictly positive.

It is customary to use the following simplified notations:

v·2 := sq(v) when v ∈ V, (41.1)

u · v := ip (u,v) = (ip u)v when u,v ∈ V. (41.2)


The symmetry and bilinearity of ip are then reflected in the following rules, valid for all u, v, w ∈ V and ξ ∈ R:

u · v = v · u, (41.3)

w · (u + v) = w · u + w · v, (41.4)

u · (ξv) = ξ(u · v) = (ξu) · v (41.5)

We say that u ∈ V is orthogonal to v ∈ V if u · v = 0. The assumption that the inner product is non-degenerate is expressed by the statement that, given u ∈ V,

(u · v = 0 for all v ∈ V) =⇒ u = 0. (41.6)

In words, the zero of V is the only member of V that is orthogonal to every member of V.

Since ip ∈ Lin(V,V∗) is injective and since dimV = dimV∗, it follows from the Pigeonhole Principle for Linear Mappings that ip is a linear isomorphism. It induces the linear isomorphism ip⊤ : V∗∗ → V∗. Since ip is symmetric, we have ip⊤ = ip when V∗∗ is identified with V as explained in Sect.22. Thus, this identification is the same as the isomorphism (ip⊤)−1ip : V → V∗∗ induced by ip. Therefore, there is no conflict if we use ip to identify V with V∗.

From now on we shall identify V ∼= V∗ by means of ip except that, given v ∈ V, we shall write v· := ip v for the corresponding element in V∗ so as to be consistent with the notation (41.2).

Every space RI of families of numbers, indexed on a given finite set I, carries the natural structure of an inner-product space whose inner square is given by

sq(λ) = λ·2 := ∑λ² = ∑i∈I λi² (41.7)

for all λ ∈ RI. The corresponding inner product is given by

(ip λ)µ = λ · µ = ∑λµ = ∑i∈I λiµi (41.8)

for all λ, µ ∈ RI. The identification (RI)∗ ∼= RI resulting from this inner product is the same as the one described in Sect.23.

Let V and W be inner-product spaces. The identifications V∗ ∼= V and W∗ ∼= W give rise to further identifications, such as

Lin(V,W) ∼= Lin(V,W∗) ∼= Lin2(V ×W , R),


Lin(W ,V) ∼= Lin(W∗,V∗).

Thus L ∈ Lin(V,W) becomes identified with the bilinear form L ∈ Lin2(V × W, R) whose value at (v,w) ∈ V × W satisfies

L(v,w) = (Lv) · w, (41.9)

and L⊤ ∈ Lin(W∗,V∗) becomes identified with L⊤ ∈ Lin(W,V) ∼= Lin2(W × V, R) in such a way that

(L⊤w) · v = L⊤(w,v) = L(v,w) = (Lv) · w (41.10)

for all v ∈ V, w ∈ W.

We identify the supplementary subspaces Sym2(V2, R) and Skew2(V2, R) of Lin2(V2, R) ∼= LinV (see Prop.7 of Sect.24) with the supplementary subspaces

SymV := {S ∈ LinV | S = S⊤}, (41.11)

SkewV := {A ∈ LinV | A⊤ = −A} (41.12)

of LinV. The members of SymV are called symmetric lineons and the members of SkewV skew lineons.

Given L ∈ Lin(V,W) and hence L⊤ ∈ Lin(W,V) we have L⊤L ∈ SymV and LL⊤ ∈ SymW. Clearly, if L⊤L is invertible (and hence injective), then L must also be injective. Also, if L is invertible, so is L⊤L. Applying these observations to the linear-combination mapping of a family f := (fi | i ∈ I) in V and noting the identification

Gf := lncf⊤ lncf = (fi · fk | (i, k) ∈ I²) ∈ RI² ∼= Lin(RI) (41.13)

we obtain the following results.

Proposition 1: Let f := (fi | i ∈ I) be a family in V. Then f is linearly independent if the matrix Gf given by (41.13) is invertible.

Proposition 2: A family b := (bi | i ∈ I) in V is a basis if and only if

the matrix Gb is invertible and ♯I = dimV.
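As a concrete illustration, assuming V := R³ with its standard (genuine) inner product, so that the Gram matrix of (41.13) is an ordinary matrix product, Prop.1 can be checked numerically (a minimal Python/NumPy sketch):

    import numpy as np

    # Rows of f are the terms of the family (f_1, f_2) in R^3.
    f = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0]])
    G = f @ f.T                    # Gram matrix (f_i . f_k) of (41.13)
    # Invertibility of G certifies linear independence (Prop.1).
    print(np.linalg.matrix_rank(G) == len(f))   # True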

Let b := (bi | i ∈ I) be a basis of V. The identification V∗ ∼= V identifies the dual of the basis b with another basis b∗ of V in such a way that

b∗i · bk = δi,k for all i, k ∈ I, (41.14)

where δi,k is defined by (16.2). Using the notation (41.13) we have

bk = ∑i∈I (Gb)k,i b∗i (41.15)


and

Gb∗ = (Gb)−1. (41.16)

Moreover, for each v ∈ V, we have

(lncb)−1(v) = b∗ · v := (b∗i · v | i ∈ I) (41.17)

and hence

v = lncb(b∗ · v) = lncb∗(b · v). (41.18)

For all u,v ∈ V, we have

u · v = (b · u) · (b∗ · v) = ∑i∈I (bi · u)(b∗i · v). (41.19)
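The formulas (41.14)–(41.18) can be verified numerically; the following sketch assumes V := R² with the standard inner product, a basis stored as matrix rows, and illustrative sample numbers:

    import numpy as np

    b = np.array([[2.0, 0.0],
                  [1.0, 1.0]])                  # rows: the basis (b_1, b_2)
    G = b @ b.T                                 # Gram matrix G_b, (41.13)
    b_star = np.linalg.inv(G) @ b               # rows: the dual basis b*
    print(np.allclose(b_star @ b.T, np.eye(2))) # (41.14): b*_i . b_k = delta
    print(np.allclose(G @ b_star, b))           # (41.15)
    v = np.array([3.0, -1.0])
    print(np.allclose((b_star @ v) @ b, v))     # (41.18): v = lnc_b(b* . v)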

We say that a family e := (ei | i ∈ I) in V is orthonormal if

ei · ek = 1 or −1 if i = k, and ei · ek = 0 if i ≠ k, for all i, k ∈ I. (41.20)

We say that e is genuinely orthonormal if, in addition, ei·2 = 1 for all i ∈ I, which—using the notation (16.2)—means that

ei · ek = δi,k for all i, k ∈ I. (41.21)

To say that e is orthonormal means that the matrix Ge, as defined by (41.13), is diagonal and each of its diagonal terms is 1 or −1. It is clear from Prop.1 that every orthonormal family is linearly independent and from Prop.2 that a given orthonormal family is a basis if and only if ♯I = dimV. Comparing (41.21) with (41.14), we see that a basis b is genuinely orthonormal if and only if it coincides with its dual b∗.

The standard basis δI := (δIi | i ∈ I) (see Sect.16) is a genuinely orthonormal basis of RI.

Let v ∈ V, w ∈ W be given. Then w ⊗ v ∈ Lin(V∗,W) becomes identified with the element w ⊗ v of Lin(V,W) whose values are given by

(w ⊗ v)u = (v · u)w for all u ∈ V. (41.22)

We say that a subspace U of a given inner-product space V is regular if the restriction sq|U of the inner square to U is non-degenerate, i.e., if for each u ∈ U,

(u · v = 0 for all v ∈ U) =⇒ u = 0. (41.23)


If U is regular, then sq|U endows U with the structure of an inner-product space. If U is not regular, it does not have a natural structure of an inner-product space.

The identification V∗ ∼= V identifies the annihilator S⊥ of a given subset S of V with the subspace

S⊥ = {v ∈ V | v · u = 0 for all u ∈ S}

of V. Thus, S⊥ consists of all elements of V that are orthogonal to every element of S. The following result is very easy to prove with the use of the Formula for Dimension of Annihilators (Sect.21) and of Prop.5 of Sect.17.

Characterization of Regular Subspaces: Let U be a subspace of V.

Then the following are equivalent:

(i) U is regular,

(ii) U ∩ U⊥ = {0},

(iii) U + U⊥ = V,

(iv) U and U⊥ are supplementary.

Moreover, if U is regular, so is U⊥.

If U is regular, then its annihilator U⊥ is also called the orthogonal supplement of U. The following two results exhibit a natural one-to-one correspondence between the regular subspaces of V and the symmetric idempotents in LinV, i.e., the lineons E ∈ LinV that satisfy E = E⊤ and E² = E.

Proposition 3: Let E ∈ LinV be an idempotent. Then E is symmetric

if and only if Null E = (RngE)⊥.

Proof: If E = E⊤, then Null E = (RngE)⊥ follows from (21.13).

Assume that Null E = (Rng E)⊥. It follows from (21.13) that Null E = Null E⊤ and hence from (22.9) that Rng E⊤ = Rng E. Since, by (21.6), (E⊤)² = (E²)⊤ = E⊤, it follows that E⊤ is also an idempotent. The assertion of uniqueness in Prop.4 of Sect.19 shows that E = E⊤.

Proposition 4: If E ∈ LinV is symmetric and idempotent, then Rng E is a regular subspace of V and Null E is its orthogonal supplement. Conversely, if U is a regular subspace of V, then there is exactly one symmetric idempotent E ∈ LinV such that U = Rng E.

Proof: Assume that E ∈ LinV is symmetric and idempotent. It follows from Prop.3 that Null E = (Rng E)⊥ and hence by the implication (v) ⇒ (i) of Prop.4 of Sect.19 that Rng E and (Rng E)⊥ are supplementary. Hence,


by the implication (iv) ⇒ (i) of the Theorem on the Characterization of Regular Subspaces, applied to U := Rng E, it follows that Rng E is regular.

Assume now that U is a regular subspace of V. By the implication (i) ⇒ (iv) of the Theorem just mentioned, U⊥ is then a supplement of U. By Prop.3, an idempotent E with Rng E = U is symmetric if and only if Null E = U⊥. By Prop.4 of Sect.19 there is exactly one such idempotent.

Notes 41

(1) Most textbooks deal only with what we call “genuine” inner-product spaces and use the term “inner-product space” only in this restricted sense. The terms “Euclidean vector space” or even “Euclidean space” are also often used in this sense.

(2) The terms “scalar product” or “dot product” are sometimes used instead of “inner product”.

(3) Other notations for the value u · v of the inner product are 〈u,v〉, (u|v), 〈u|v〉, and (u,v). The last is very common, despite the fact that it clashes with the notation for the pair with terms u and v.

(4) Some people use simply v² instead of v·2 for an inner square. In fact, I have done this many times myself. We use v·2 because omitting the dot would lead to confusion of the compositional square with the inner square of a lineon (see Sect.44).

(5) The terms of b∗ · v are often called the “contravariant components” of v relative to the basis b and are denoted by vⁱ. The terms of b · v are called “covariant components” and are denoted by vᵢ.

(6) In most of the literature, the term “orthonormal” is used for what we call “genuinely orthonormal”.

(7) Some people use the term “non-isotropic” or “non-singular” when we speak of a “regular” subspace.

(8) Some authors use the term “perpendicular projection” or the term “orthogonal projection” to mean the same as our “symmetric idempotent”. See also Note 1 to Sect.19.

(9) The dual b∗ of a basis b of an inner-product space V, when regarded as a basis of V rather than V∗, is often called the “reciprocal” of b.

42 Genuine Inner-Product Spaces

In this section, a genuine inner-product space V is assumed to be given. We then have

v·2 > 0 for all v ∈ V×. (42.1)

Of course, if U is any subspace of V, then the restriction sq|U of the inner square of V to U is again strictly positive and hence endows U with the


natural structure of a genuine inner-product space. Thus every subspace U of V is regular and hence has an orthogonal supplement U⊥.

Definition 1: For every v ∈ V, the number |v| ∈ P defined by

|v| := √(v·2) (42.2)

is called the magnitude of v. We say that u ∈ V is a unit vector if |u| = 1.

In the case when V is R, the magnitude turns out to be the absolute value; hence the notation (42.2) is consistent with the usual notation for absolute values.

It follows from (42.1) that for all v ∈ V,

|v| = 0 ⇐⇒ v = 0. (42.3)

The following formula follows directly from (42.2) and (41.5):

|ξv| = |ξ||v| for all v ∈ V, ξ ∈ R. (42.4)

Inner-Product Inequality: Every pair (u,v) in a genuine inner-product space V satisfies

|u · v| ≤ |u||v|; (42.5)

equality holds if and only if (u,v) is linearly dependent.

Proof: Assume, first, that (u,v) is linearly independent. Then, by (42.3), |u| ≠ 0, |v| ≠ 0. Moreover, if we put w := |v|²u − (u · v)v, then w cannot be zero because it is a linear combination of (u,v) with at least one coefficient, namely |v|², not zero. Hence, by (42.1), we have

0 < w·2 = (|v|²)² u·2 − 2|v|²(u · v)(u · v) + (u · v)² v·2 = |v|²(|v|²|u|² − (u · v)²),

which is equivalent to (42.5) with equality excluded.

Assume, now, that (u,v) is linearly dependent. Then one of u and v is a scalar multiple of the other. Without loss we may assume, for example, that v = ξu for some ξ ∈ R. By (42.4), we then have

|u · v| = |u · (ξu)| = |ξ| |u·2| = |ξ||u||u| = |ξu||u| = |v||u|,

which shows that (42.5) holds with equality.

Subadditivity of Magnitude: For every u,v ∈ V we have

|u + v| ≤ |u| + |v|. (42.6)


Proof: Using (42.5) we find

|u + v|² = (u + v)·2 = u·2 + 2u · v + v·2 ≤ |u|² + 2|u||v| + |v|² = (|u| + |v|)²,

which is equivalent to (42.6).

Proposition 1: An inner-product space V is genuine if and only if it

has some genuinely orthonormal basis.

Proof: Assume that V has a genuinely orthonormal basis e := (ei | i ∈ I) and let v ∈ V× be given. Since e = e∗, we infer from (41.19) that

v·2 = v · v = ∑i∈I (ei · v)²,

which is strictly positive. Since v ∈ V× was arbitrary, it follows that V is genuine.

Assume, now, that V is genuine. Let e be an orthonormal set in V. We have seen in the previous section that e is linearly independent. Hence it is a basis if and only if Lspe = V. Now, if Lspe is a proper subset of V, then its orthogonal supplement (Lspe)⊥ is not the zero-space. Hence we may choose u ∈ ((Lspe)⊥)× and put f := (1/|u|)u, so that f·2 = 1. It is clear that e ∪ {f} is an orthonormal set that has e as a proper subset. It follows that every maximal orthonormal set in V is a basis. Such sets exist because the empty set ∅ is orthonormal and because no orthonormal set in V can have more than dimV members.

Definition 2: The set of all elements of V with magnitude strictly less than 1 is called the unit ball of V and is denoted by

UblV := {v ∈ V | |v| < 1} = sq<([0, 1[). (42.7)

The set of all elements of V with magnitude less than 1 is called the closed unit ball of V and is denoted by

ŪblV := {v ∈ V | |v| ≤ 1} = sq<([0, 1]). (42.8)

The set of all unit vectors in V is called the unit sphere of V and is denoted by

UsphV := {v ∈ V | |v| = 1} = sq<({1}) = ŪblV \ UblV. (42.9)


Notes 42

(1) The magnitude |v| is often called “Euclidean norm”, “norm”, or “length”.

(2) Many people use ||v|| instead of |v| for the magnitude. I consider this a waste of a good symbol that could be employed for something else. In fact, in this book we reserve the use of double bars for operator-norms only (see Sect.52) and thus have an easy notational distinction between the magnitude |L| and the operator-norm ||L|| of a lineon L (see Example 3 in Sect.52).

(3) The Inner-Product Inequality is often called “Cauchy’s inequality” (particularly in France), “Schwarz’s inequality” (particularly in Germany), or “Bunyakovsky’s inequality” (particularly in Russia). Various combinations of these three names are also sometimes used.

43 Orthogonal Mappings

Let V and V′ be inner-product spaces.

Definition 1: A mapping R : V → V′ is called an orthogonal mapping if it is linear and preserves inner squares, i.e. if R ∈ Lin(V,V′) and

(Rv)·2 = v·2 for all v ∈ V. (43.1)

The set of all orthogonal mappings from V to V′ will be denoted by Orth(V,V′). An orthogonal mapping that has an orthogonal inverse will be called an orthogonal isomorphism. We say that V and V′ are orthogonally isomorphic if there exists an orthogonal isomorphism from V to V′. Of course, if V and V′ are orthogonally isomorphic, they are also linearly isomorphic; but they may be linearly isomorphic without being orthogonally isomorphic (see Prop.1 of Sect.47 below). The following is evident from Def.1 and from Props.1, 2 of Sect.13.

Proposition 1: The identity mapping of an inner-product space is orthogonal. The composite of two orthogonal mappings is orthogonal. If an orthogonal mapping is invertible, its inverse is again orthogonal, and hence it is an orthogonal isomorphism.

The following is a direct consequence of Def.1 and of Prop.3 of Sect.27, applied to Q := sq.

Proposition 2: R ∈ Lin(V,V ′) is orthogonal if and only if

R⊤R = 1V , (43.2)

or, equivalently, R preserves inner products.
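For instance, (43.1) and (43.2) can be verified numerically for a plane rotation; the sketch below assumes V = V′ = R² with the standard inner product and an arbitrary sample angle (Python with NumPy):

    import numpy as np

    t = 0.7                                   # sample angle
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    print(np.allclose(R.T @ R, np.eye(2)))    # (43.2): R^T R = 1_V
    v = np.array([2.0, -1.0])
    print(np.isclose((R @ v) @ (R @ v), v @ v))  # (43.1): inner squares kept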

As an immediate consequence of Prop.2 and the Pigeonhole Principle for Linear Mappings we obtain the following result.


Proposition 3: For every orthogonal mapping R ∈ Orth(V,V ′) we

have: R is injective, RngR is a regular subspace of V ′, and R|Rng is an

orthogonal isomorphism from V to Rng R. Moreover, R is an orthogonal

isomorphism if and only if R is surjective or, equivalently,

RR⊤ = 1V′ . (43.3)

This is the case if and only if dimV = dimV ′.

We write OrthV := Orth(V,V) and call its members orthogonal lineons. In view of Prop.1 and Prop.3, OrthV is a subgroup of LisV and consists of all orthogonal automorphisms of V. The group OrthV is called the orthogonal group of the inner-product space V.

Proposition 4: Let b := (bi | i ∈ I) be a basis of the inner-product

space V, let V ′ be an inner-product space and R ∈ Lin(V,V ′). Then R is

orthogonal if and only if

Rbi · Rbk = bi · bk for all i, k ∈ I. (43.4)

In particular, if e := (ei | i ∈ I) is a genuinely orthonormal basis, then Ris orthogonal if and only if Re := (Rei | i ∈ I) is a genuinely orthonormal

family.

Proof: Since Rbi · Rbk = (R⊤Rbi) · bk for all i, k ∈ I, (43.4) states that for each i ∈ I, (R⊤Rbi)· ∈ V∗ and bi· ∈ V∗ agree on a basis. Hence, by Prop.2 of Sect.16, (43.4) holds if and only if R⊤Rbi = bi for all i ∈ I, which, in turn, is the case if and only if R⊤R = 1V. The assertion follows then from Prop.2.

Proposition 5: Let V,V ′ be genuine inner-product spaces. Then V and

V ′ are orthogonally isomorphic if and only if dimV = dimV ′.

Proof: The “only if” part follows from Cor.2 to the Theorem on Characterization of Dimension of Sect.17.

Assume that dimV = dimV′ =: n. By Prop.1 of Sect.42 and by Cor.1 to the Theorem on Characterization of Dimension, we may choose genuinely orthonormal bases e := (ei | i ∈ n]) and e′ := (e′i | i ∈ n]) of V and V′, respectively. By Prop.2 of Sect.16, there is an invertible linear mapping R : V → V′ such that Rei = e′i for all i ∈ n]. By Prop.4, R is an orthogonal isomorphism.

Proposition 6: Let V,V ′ be inner-product spaces. Assume that

R : V → V ′ preserves inner products, i.e., that

R(u) · R(v) = u · v for all u,v ∈ V, (43.5)


and that LspRngR is a regular subspace of V ′. Then R is linear and hence

orthogonal.

Proof: To show that R preserves sums, let u,v ∈ V be given. It follows from (43.5) that

R(u + v) · R(w) = (u + v) · w = u · w + v · w = R(u) · R(w) + R(v) · R(w)

and hence

(R(u + v) − R(u) − R(v)) · R(w) = 0

for all w ∈ V. This means that

R(u + v) − R(u) − R(v) ∈ (Rng R)⊥.

Since LspRng R is regular we have LspRng R ∩ (Rng R)⊥ = {0}. It follows that R(u + v) − R(u) − R(v) = 0. Since u,v ∈ V were arbitrary, we conclude that R preserves sums. A similar argument shows that R preserves scalar multiples and hence is linear.

Pitfall: If R : V → V′ preserves inner products but LspRng R is not regular, then R need not be linear. For example, if V = {0} and n ∈ V′ is such that n·2 = 0 but n ≠ 0, then the mapping R : V → V′ defined by R(0) = n preserves inner products but is not linear.

In view of Prop.3, Prop.6 has the following corollary.

Proposition 7: A surjective mapping that preserves inner products is

necessarily an orthogonal isomorphism.

Notes 43

(1) Orthogonal mappings are commonly called “orthogonal transformations”, but the definition is often restricted to the case in which the domain and codomain coincide (i.e. when we use “orthogonal lineon”) and the spaces involved are genuine inner-product spaces.

(2) The notations On, On(R), or O(n, R) are often used for the orthogonal group Orth Rn.

(3) If the inner-product is double-signed and if its index is 1 (see Sect.47), an orthogonal lineon is often called a “Lorentz transformation”, especially in the context of the theory of relativity. The orthogonal group is then called the “Lorentz group”.


44 Induced Inner Products

Let V1 and V2 be inner-product spaces. Then V1 × V2 carries a natural induced structure of a linear space, as explained in Sect.14. We endow V1 × V2 also with the structure of an inner-product space by prescribing its inner square by the rule

(v1,v2)·2 := v1·2 + v2·2 for all v1 ∈ V1, v2 ∈ V2. (44.1)

The inner product in V1 × V2 is given by

(u1,u2) · (v1,v2) = u1 · v1 + u2 · v2

for all (u1,u2), (v1,v2) ∈ V1 × V2. More generally, if (Vi | i ∈ I) is a finite family of inner-product spaces, we endow the product-space ×(Vi | i ∈ I) with the natural structure of an inner-product space by prescribing the inner square by the rule

v·2 := ∑i∈I vi·2 for all v ∈ ×i∈I Vi. (44.2)

It is clear that ×(Vi | i ∈ I) is genuine if all the Vi, i ∈ I, are genuine.

Remark: The inner products on V1 and V2 induce on V1 × V2, in a natural manner, non-degenerate quadratic forms other than the inner square given by (44.1). For example, one such quadratic form Q : V1 × V2 → R is given by

Q(v1,v2) := v1·2 − v2·2. (44.3)

This form is double-signed (see Sect.27) when V1 and V2 are genuine inner-product spaces of dimension greater than zero.

Theorem on the Induced Inner Product: Let V and W be inner-product spaces. Then Lin(V,W) has the natural structure of an inner-product space, obtained by prescribing its inner square by the rule

L·2 := trV(L⊤L) for all L ∈ Lin(V,W). (44.4)

The corresponding inner product is given by

L · M := trV(L⊤M) = trW(ML⊤) (44.5)

for all L,M ∈ Lin(V,W).

If V and W are genuine inner-product spaces, so is Lin(V,W).


Proof: The bilinearity of the mapping

((L,M) 7→ trV(L⊤M)) : (Lin(V,W))2 → R (44.6)

is a consequence of the linearity of transposition, the bilinearity of composition, and the linearity of the trace (see Chap.2). To prove the symmetry of the mapping (44.6) we use (26.7), (21.6), and (22.4) to obtain

M · L = trV(M⊤L) = trV((M⊤L)⊤) = trV(L⊤M⊤⊤) = trV(L⊤M) = L · M

for all L,M ∈ Lin(V,W).

The Representation Theorem for Linear Forms on Lin(V,W) of Sect.26 states that the mapping that associates with H ∈ Lin(W,V) the linear form (M 7→ trV(HM)) ∈ (Lin(V,W))∗ is a linear isomorphism. The transposition (L 7→ L⊤) : Lin(V,W) → Lin(W,V) is also a linear isomorphism. It follows that their composite, the linear mapping from Lin(V,W) into (Lin(V,W))∗

states that the mapping that associates with H ∈ Lin(W ,V) the linear form(M 7→ trV(HM)) ∈ (Lin(V,W))∗ is a linear isomorphism. The transposition(L 7→ L⊤) : Lin(V,W) → Lin(W ,V) is also a linear isomorphism. It followsthat their composite, the linear mapping from Lin(V,W) into (Lin(V,W))∗

associated with the bilinear mapping (44.6), is an isomorphism and henceinjective. This means that (44.4) indeed defines a non-degenerate quadraticform whose associated bilinear form is given by (44.6).

The last equality of (44.5) is an immediate consequence of (26.6).

The last statement of the Theorem is an immediate consequence of Prop.1 of Sect.42 and the following Lemma.

Lemma: If e := (ei | i ∈ I) is a genuinely orthonormal basis of V and L ∈ Lin(V,W), then

L·2 = trW(LL⊤) = ∑i∈I (Lei)·2. (44.7)

Proof: Since e coincides with its dual, we have, by (25.14),

1V = ∑i∈I ei ⊗ ei.

Using Prop.3 of Sect.25, we obtain

LL⊤ = L1VL⊤ = ∑i∈I L(ei ⊗ ei)L⊤ = ∑i∈I (Lei) ⊗ (Lei).

Since trW((Lei) ⊗ (Lei)) = (Lei) · (Lei) = (Lei)·2 by (26.3), and since the trace is linear, the last of the equalities (44.7) follows. The first is a consequence of (44.5).


Let I and J be finite sets. As we have seen in Sect.41, the spaces RI, RJ and RJ×I can all be regarded as inner-product spaces. Therefore, the Theorem on the Induced Inner Product shows that Lin(RI, RJ) carries the natural structure of an inner-product space. We claim that this structure is compatible with the identification Lin(RI, RJ) ∼= RJ×I (see Sect.16). Indeed, given M ∈ Lin(RI, RJ) ∼= RJ×I we have, by (16.7) and (23.9),

(M⊤M)i,k = ∑j∈J (M⊤)i,j Mj,k = ∑j∈J Mj,i Mj,k for all i, k ∈ I

and hence, by (26.8),

tr(M⊤M) = ∑i∈I (M⊤M)i,i = ∑(j,i)∈J×I (Mj,i)². (44.8)

We now assume that inner-product spaces V and W are given. The following formulas, valid for all L ∈ Lin(V,W), v ∈ V, and w ∈ W, follow easily from the definitions (44.4) and (44.5):

L · (w ⊗ v) = w · Lv = L(v,w), (44.9)

(w ⊗ v)·2 = (v·2)(w·2). (44.10)

In view of (26.9) we have

1V·2 = dimV. (44.11)

Proposition 1: The transposition

(L 7→ L⊤) : Lin(V,W) → Lin(W ,V) is an orthogonal isomorphism, i.e. we

have

L · M = L⊤ · M⊤ for all L,M ∈ Lin(V,W). (44.12)

Proof: Given L,M ∈ Lin(V,W), we may use (44.5) to obtain L⊤ · M⊤ = trW(L⊤⊤M⊤) = trW(LM⊤) = M · L and hence (44.12).

Proposition 2: The subspaces SymV and SkewV of LinV are orthogonal

supplements of each other (and hence both regular).

Proof: By (44.5) we have, for all S ∈ SymV and all A ∈ SkewV,

S · A = tr(S⊤A) = tr(SA) = −tr(SA⊤) = −A · S

and hence S · A = 0. It follows that SymV ⊂ (SkewV)⊥ and SkewV ⊂ (SymV)⊥. We already know (Prop.7 of Sect.24) that SymV and SkewV are supplementary.


From now on we assume that both V and W are genuine inner-product spaces. By the last assertion of the Theorem on the Induced Inner Product, Lin(V,W) is then genuine, too. Thus, according to the Definition of Sect.42, every L ∈ Lin(V,W) has a magnitude

|L| := √(L·2) = √(trV(L⊤L)) = √(trW(LL⊤)). (44.13)

In addition to the rules concerning magnitudes in general, the magnitude in Lin(V,W) satisfies the following rules.

Proposition 3: For all v ∈ V, w ∈ W, and L ∈ Lin(V,W) we have

|w ⊗ v| = |w||v|, (44.14)

|Lv| ≤ |L||v|. (44.15)

Proof: (44.14) is an immediate consequence of (44.10). To prove (44.15), we use (44.9), the Inner-Product Inequality of Sect.42 as applied to Lin(V,W), and (44.14) to show that

|w · Lv| = |L · (w ⊗ v)| ≤ |L||w ⊗ v| = |L||w||v|.

Putting w := Lv we get |Lv|² ≤ |Lv||L||v|, which is equivalent to (44.15).

In view of (44.11), we have

|1V| = √(dimV). (44.16)

Proposition 4: Let V, V ′ and V ′′ be genuine inner-product spaces and

let L ∈ Lin(V,V ′), M ∈ Lin(V ′,V ′′). Then

|ML| ≤ |M||L|. (44.17)

Proof: Choose a genuinely orthonormal basis e := (ei | i ∈ I) of V. By the Lemma, (44.7), we have

|ML|² = (ML)·2 = ∑i∈I (MLei)·2 = ∑i∈I |M(Lei)|².

Applying (44.15) to each term, we get

|ML|² ≤ ∑i∈I |M|²|Lei|² = |M|² ∑i∈I (Lei)·2.

Applying the Lemma, (44.7), again, we obtain |ML|² ≤ |M|²L·2, which is equivalent to (44.17).
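Both (44.15) and (44.17) are easy to test numerically; the following sketch uses random sample matrices and identifies the magnitude (44.13) with the Frobenius norm (Python with NumPy):

    import numpy as np

    rng = np.random.default_rng(1)               # arbitrary seed
    mag = lambda A: np.sqrt(np.trace(A.T @ A))   # |A| as in (44.13)
    L = rng.standard_normal((3, 4))              # L in Lin(R^4, R^3)
    M = rng.standard_normal((2, 3))              # M in Lin(R^3, R^2)
    v = rng.standard_normal(4)
    print(np.linalg.norm(L @ v) <= mag(L) * np.linalg.norm(v))  # (44.15)
    print(mag(M @ L) <= mag(M) * mag(L))                        # (44.17)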


45 Euclidean Spaces

Definition 1: A Euclidean space is a finite-dimensional flat space E endowed with additional structure by the prescription of a separation function

sep : E × E −→ R

such that

(a) sep is translation invariant, i.e.

sep ∘ (v × v) = sep (45.1)

for all v in the translation space V of E,

(b) For some x ∈ E, the mapping

(v 7→ sep(x, x + v)) : V −→ R (45.2)

is a non-degenerate quadratic form on V.

It follows from (a) that the mapping (45.2) from V to R does not depend on the choice of x. Hence the translation space V of a Euclidean space E carries the natural structure of an inner-product space whose inner square v 7→ v·2 satisfies

sep(x, y) = (y − x)·2 for all x, y ∈ E . (45.3)

Conversely, if E is a finite-dimensional flat space and if its translation space V is endowed with additional structure as in Def.1 of Sect.41 so as to make it an inner-product space, then E acquires the natural structure of a Euclidean space when we use (45.3) to define the separation function.

Every inner-product space V has the natural structure of a Euclidean space that is its own (external) translation space. Indeed, the natural structure of V as a flat space is the one described by Prop.3 of Sect.32, and the inner-product of V then gives V the natural structure of a Euclidean space as remarked above.

We say that a flat F in a Euclidean space E is regular if its direction space is a regular subspace of V. If F is regular, then sep|F×F endows F with the structure of a Euclidean space; but if F is not regular, it does not have a natural structure of a Euclidean space.

Definition 2: Let E , E ′ be Euclidean spaces. We say that the mapping

α : E → E ′ is Euclidean if it is flat and preserves separation, so that

sep′ ∘ (α × α) = sep, (45.4)


where sep and sep′ are the separation functions of E and E ′, respectively.

A Euclidean mapping that has a Euclidean inverse is called a Euclidean isomorphism.

The following is evident from Def.2, (45.3), (33.4), and Def.1 in Sect.43.

Proposition 1: The mapping α : E → E′ is Euclidean if and only if it is flat and has an orthogonal gradient.

Using this Proposition and Prop.1 of Sect.43, we obtain:

Proposition 2: The identity mapping of a Euclidean space is a Euclidean mapping. The composite of two Euclidean mappings is Euclidean. If a Euclidean mapping is invertible, then its inverse is again Euclidean, and hence it is a Euclidean isomorphism.

In view of Prop.3 of Sect.43, we have:

Proposition 3: Every Euclidean mapping α : E → E′ is injective, its range Rng α is a regular flat in E′, and α|Rng is a Euclidean isomorphism from E onto Rng α. Moreover, α is a Euclidean isomorphism if and only if α is surjective, and this is the case if and only if dim E = dim E′.

The set of all Euclidean automorphisms of a Euclidean space E will be denoted by Eis E. It is a subgroup of the group Fis E of all flat automorphisms of E. The mapping g : Eis E → OrthV defined by g(α) := ∇α is a surjective group-homomorphism whose kernel is the translation group V. In fact, g is obtained from the homomorphism h : Fis E → LisV described at the end of Sect.33 by the adjustment g := h|OrthV EisE.

Proposition 4: Let E, E′ be Euclidean spaces, let α : E → E′ be a flat mapping, and let p := (pi | i ∈ I) be a flat basis of E such that

sep(pi, pj) = sep′(α(pi), α(pj)) for all i, j ∈ I. (45.5)

Then α is a Euclidean mapping.

Proof: Since α is flat we have, by (33.4),

α(pi) − α(pj) = ∇α(pi − pj) for all i, j ∈ I. (45.6)

We now choose k ∈ I and put I′ := I \ {k} and ui := pi − pk for all i ∈ I′. Then (45.6) gives

α(pi) − α(pk) = (∇α)ui for all i ∈ I ′

and

α(pi) − α(pj) = (∇α)(ui − uj) for all i, j ∈ I′.

In view of (45.3), it follows from (45.5) that

((∇α)ui)·2 = ui·2 for all i ∈ I′ (45.7)

and

((∇α)(ui − uj))·2 = (ui − uj)·2 for all i, j ∈ I′,

which is equivalent to

((∇α)ui)·2 − 2((∇α)ui) · ((∇α)uj) + ((∇α)uj)·2 = ui·2 − 2ui · uj + uj·2.

Using (45.7) this gives

(∇α)ui · (∇α)uj = ui · uj for all i, j ∈ I′.

Since (ui | i ∈ I′) is a basis of V by Prop.2 of Sect.35, it follows from Prop.4 of Sect.43 that ∇α is orthogonal. Hence α is Euclidean by Prop.1.

The following result shows that, in Def.2, the requirement that α be flat can in many cases be omitted.

Proposition 5: Let E , E ′ be Euclidean spaces and let α : E → E ′ be a

mapping that preserves separation, i.e. satisfies (45.4). Then α is flat and

hence Euclidean provided that the flat span of Rng α is regular.

Proof: Let V and V′ be the translation spaces of E and E′, respectively. We choose q ∈ E and define R : V → V′ by

R(v) := α(q + v) − α(q) for all v ∈ V. (45.8)

We have, for all u,v ∈ V

R(u) − R(v) = α(q + u) − α(q + v).

Since α preserves separation, it follows that

(R(u) − R(v))·2 = (u− v)·2 for all u,v ∈ V.

By an argument similar to one given in the proof of Prop.4, we conclude that

R(u) · R(v) = u · v for all u,v ∈ V.

Now, (45.8) shows that Rng R = (Rng α) − α(q). Hence, by (32.6), LspRng R is the direction space of FspRng α. If FspRng α is regular, so is LspRng R, and we can apply Prop.6 of Sect.43 to conclude that R is orthogonal. It follows from (45.8) that α is flat and that R = ∇α.

The following corollary to Prop.5 shows that the prescription of the separation function alone is sufficient to determine the structure of a Euclidean space on a set.


Proposition 6: Let a set E and a function sep : E × E → R be given.

Then E can be endowed with at most one structure of a Euclidean space

whose separation function is sep.

Proof: Let two such structures be given with V and V′ as the corresponding translation spaces. The identity 1E preserves separation and hence Prop.5 may be applied to it when E as domain of 1E is considered as endowed with the first structure and as codomain of 1E with the second. Let R ∈ Orth(V,V′) be the gradient of 1E when interpreted in this manner. In view of (33.1), it follows that 1E ∘ v = (Rv) ∘ 1E and hence v = Rv for all v ∈ V, which means that V ⊂ V′ and R = 1V⊂V′. Reversing the roles of V and V′ we get V′ ⊂ V and hence V = V′ and R = 1V. Since R is an orthogonal isomorphism, it follows that V and V′ coincide not only as subgroups of Perm E but also as inner-product spaces.

Proposition 7: Let E be a Euclidean space and p := (pi | i ∈ I) a flatly

spanning family of points in E. Then for all x, y ∈ E

(sep(x, pi) = sep(y, pi) for all i ∈ I) =⇒ x = y.

Proof: Let x, y ∈ E be given and let z be the midpoint of (x, y) (see Example 1 of Sect.34). This means that x = z + u and y = z − u for a suitable u ∈ V. The family b := ((pi − z) | i ∈ I) spans V and we have

sep(x, pi) = (pi − x)·2 = (bi − u)·2

sep(y, pi) = (pi − y)·2 = (bi + u)·2

for all i ∈ I. Hence, if sep(x, pi) = sep(y, pi) for all i ∈ I we have

bi · u = (1/4)((bi + u)·2 − (bi − u)·2) = 0

for all i ∈ I. Since b is spanning, it follows that u ∈ (Rng b)⊥ = V⊥ = {0} and hence u = 0, which means that x = z = y.

Notes 45

(1) What we call a “genuine Euclidean space” (see Def.1 of Sect.46) is usually just called a “Euclidean space”, and non-genuine Euclidean spaces are often referred to as “pseudo-Euclidean spaces”. I believe it is more convenient to have a term that covers both cases.

The term “Euclidean space” is often misused to mean a genuine inner-product space or even Rn.

(2) In the past, the term “separation” has been used only in the special case when the space E is a model of the event-world in the theory of relativity.


46 Genuine Euclidean Spaces, Congruences

Definition 1: We say that a Euclidean space E is genuine if its separation function sep has only positive values, i.e. if Rng sep ⊂ P. The distance function

dst : E × E → P

is then defined by

dst(x, y) := √(sep(x, y)) for all x, y ∈ E. (46.1)

It is clear from (45.3) that a Euclidean space E is genuine if and only if its translation space V is a genuine inner-product space. Hence all flats in a genuine Euclidean space are regular and have natural structures of genuine Euclidean spaces. Comparing (46.1) with (42.2), we see that (45.3) is equivalent to

dst(x, y) = |y − x| for all x, y ∈ E . (46.2)

Using (42.4) with ξ := −1 and (42.6) and (42.3), we obtain:

Proposition 1: The distance function dst of a genuine Euclidean space E has the following properties, valid for all x, y, z ∈ E:

dst(x, y) = dst(y, x), (46.3)

dst(x, y) + dst(y, z) ≥ dst(x, z), (46.4)

dst(x, y) = 0 ⇐⇒ x = y. (46.5)

Let q ∈ E and ρ ∈ P× be given. Then

Ballq,ρ(E) := {x ∈ E | dst(x, q) < ρ} (46.6)

is called the ball of radius ρ centered at q,

B̄allq,ρ(E) := {x ∈ E | dst(x, q) ≤ ρ} (46.7)

is called the closed ball of radius ρ centered at q, and

Sphq,ρ(E) := {x ∈ E | dst(x, q) = ρ} = B̄allq,ρ(E) \ Ballq,ρ(E) (46.8)

is called the sphere of radius ρ centered at q. If dim E = 2, the term “disc” is often used instead of “ball”, and the term “circle” instead of “sphere”.


In view of Def.2 of Sect.42, we have

Ballq,ρ(E) = q + ρ UblV, (46.9)

B̄allq,ρ(E) = q + ρ ŪblV, (46.10)

Sphq,ρ(E) = q + ρ UsphV. (46.11)

We now assume that a genuine Euclidean space E is given.

Definition 2: We say that the families p := (pi | i ∈ I) and p′ := (p′i | i ∈ I) of points in E are congruent in E if there is a Euclidean automorphism α of E such that α(pi) = p′i for all i ∈ I.

Congruence Theorem: The families p := (pi | i ∈ I) and p′ := (p′i | i ∈ I) are congruent if and only if

dst(pi, pj) = dst(p′i, p′j) for all i, j ∈ I. (46.12)

Proof: The “only if” part follows from the definitions.

Assume that (46.12) holds. Put F := FspRng p and choose a subset K of I such that p|K = (pk | k ∈ K) is a flat basis of F. Put F′ := FspRng p′. By Prop.5 of Sect.35, there is a unique flat mapping β : F → F′ such that β(pk) = p′k for all k ∈ K. By Prop.4 of Sect.45, we can conclude from (46.12) that β must be Euclidean and hence injective (see Prop.3 of Sect.45). It follows that dimF ≤ dimF′. Reversing the roles of p and p′, we also conclude that dimF′ ≤ dimF. Therefore, we have dimF = dimF′ and β is a Euclidean isomorphism. Hence p′|K = β>(p|K) is a flat basis of F′. Since β preserves distance, we have for all x ∈ F

dst(x, pk) = dst(β(x), β(pk)) = dst(β(x), p′k)

for all k ∈ K. Thus, using (46.12) we get

dst(β(pi), p′k) = dst(pi, pk) = dst(p′i, p′k)

for all k ∈ K, i ∈ I. Using Prop.7 of Sect.45, we conclude that

β(pi) = p′i for all i ∈ I. (46.13)

Let U and U′ be the direction spaces of F and F′. Since

dimU = dimF = dimF′ = dimU′,

it follows that dimU⊥ = dimU′⊥. Therefore, in view of Prop.5 of Sect.43, we may choose an orthogonal isomorphism R : U⊥ → U′⊥. Since E = F + U⊥ = F′ + U′⊥, there is a unique Euclidean isomorphism α : E → E such that

α(x + w) = β(x) + Rw for all x ∈ F, w ∈ U⊥.

By (46.13) we have α(pi) = p′i for all i ∈ I, showing that p and p′ are congruent.

The Congruence Theorem shows that two families of points in E are congruent in E if and only if they are congruent in any flat F that includes the ranges of both families. It is for this reason that we may simply say “congruent” rather than “congruent in E”.

Definition 3: Let S and S ′ be two subsets of E. We say that a mapping

γ : S → S ′ is a congruence if (x | x ∈ S) and (γ(x) | x ∈ S) are congruent

families of points. The set of all congruences from a given set S to itself is

called the symmetry-group of S and is denoted by CongS.

The Congruence Theorem has the following immediate consequence:

Corollary: The mapping γ : S → S′ is a congruence if and only if it is surjective and preserves distance.
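By the Corollary, congruence of finite families in a genuine Euclidean space can thus be tested by comparing mutual distances alone; a minimal sketch, identifying E with R² and using illustrative point families (Python with NumPy):

    import numpy as np

    def congruent(p, q, tol=1e-9):
        # Compare the distance matrices dst(p_i, p_j) and dst(q_i, q_j).
        p, q = np.asarray(p, float), np.asarray(q, float)
        dp = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
        dq = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)
        return np.allclose(dp, dq, atol=tol)

    p = [[0, 0], [1, 0], [0, 1]]
    q = [[1, 1], [1, 2], [0, 1]]   # p rotated by a right angle and translated
    print(congruent(p, q))         # True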

All congruences are invertible mappings and CongS is a subgroup of the permutation group PermS of S. The group CongS describes the internal symmetry of S. For example, if ♯S = 3, then S can be viewed as the set of vertices of a triangle. If the triangle is equilateral, then CongS = PermS. If the triangle is isosceles but not equilateral, then CongS is a two-element subgroup of PermS that contains 1S and one switch. If the triangle is scalene, then CongS is the identity-subgroup {1S} of PermS.

Notes 46

(1) In view of Prop.5 of Sect.45, it turns out that a mapping between genuine Euclidean spaces is a Euclidean isomorphism if and only if it is an isometry, i.e. an invertible mapping that preserves distances. It is for this reason that many people say “isometry” in place of “Euclidean isomorphism”.

47 Double-Signed Inner-Product Spaces

We assume that an inner-product space V with inner square sq is given. If sq is strictly positive, i.e. if the space is a genuine inner-product space, then the results of Sect.42 apply. If sq is strictly negative, then a change from sq to −sq converts the space into a genuine inner-product space, and the results of Sect.42, with suitable adjustments of sign, still apply. In particular, two


inner-product spaces with strictly negative inner squares are isomorphic if they have the same dimension.

The results of this section are of interest only if the inner product is double-signed, i.e. if sq is neither strictly positive nor strictly negative. Recall that a subspace U of V is called regular if sq|U is non-degenerate (see Sect.41). We say that U is

singular if it is not regular,

totally singular if sq|U = 0,

positive-regular if sq|U is strictly positive,

negative-regular if sq|U is strictly negative.

Definition 1: Let V be an inner product space. The greatest among the

dimensions of all positive-regular [negative-regular] subspaces of V will be

denoted by sig +V [sig−V]. The pair (sig +V, sig −V) is called the signatureof V.

The structure of double-signed inner-product spaces is described by the following result.

Inner-Product Signature Theorem: Let V be an inner-product

space. If a positive-regular [negative-regular] subspace U of V satisfies one

of the following conditions, then it satisfies all three:

(a) dimU = sig +V [dimU = sig −V],

(b) U is maximal among the positive-regular [negative-regular] subspaces

of V,

(c) U⊥ is negative-regular [positive-regular].

Proof: It is sufficient to consider the case when U is positive-regular, because a change from sq to −sq will then take care of the other case. We choose a positive-regular subspace W such that dimW = sig +V; then dimU ≤ dimW.

(a) ⇒ (b): This is trivial.

(b) ⇒ (c): Suppose that U⊥ is not negative-regular. Since U⊥ is regular by the last assertion of the Theorem on the Characterization of Regular Subspaces of Sect.41, sq|U⊥ cannot be negative by Prop.4 of Sect.27. Hence we may choose w ∈ U⊥ such that w·2 > 0. Then U + Rw strictly includes U and is still positive-regular. Hence U is not maximal.

(c) ⇒ (a): Since U is regular, U⊥ is a supplement of U by the Theorem on the Characterization of Regular Subspaces. Hence, by Prop.4 of Sect.19, there is a projection P : V → U such that U⊥ = Null P. Let w ∈ W ∩ U⊥ be given. Since W is positive-regular and U⊥ is negative-regular, we would have w·2 > 0 and w·2 < 0 if w were not zero. Since this cannot be, it follows that

Null P|W = W ∩ U⊥ = {0}

and hence that P|W is injective. By the Pigeonhole Principle for Linear Mappings, it follows that dimW ≤ dimU and hence dimU = dimW = sig +V.

Using the equivalence (a) ⇔ (c) twice, once for U and once with U replaced by U⊥, we obtain the following:

Corollary: There exist positive-regular subspaces U such that U⊥ is negative-regular. If U is such a subspace, then

dimU = sig +V, dimU⊥ = sig −V. (47.1)

We have

sig +V + sig −V = dimV. (47.2)

Of course, spaces that are orthogonally isomorphic have the same signature. The converse is also true:

Proposition 1: Two inner-product spaces are orthogonally isomorphic if (and only if) they have the same signature.

Proof: Assume that V and V′ are inner-product spaces with the same signature. In view of the Corollary above, we may choose positive-regular subspaces U and U′ of V and V′, respectively, such that U⊥ and U′⊥ are negative-regular. Since V and V′ have the same signature, we have, by (47.1),

dimU = dimU′, dimU⊥ = dimU′⊥.

Therefore, we may apply Prop.5 of Sect.43 and choose orthogonal isomorphisms R1 : U → U′ and R2 : U⊥ → U′⊥. Since U and U⊥ are supplementary, there is a unique linear mapping R : V → V′ such that R|U = R1|V′ and R|U⊥ = R2|V′ (see Prop.5 of Sect.19). It is easily seen that R is an orthogonal isomorphism.

Proposition 2: Every inner-product space V has orthonormal bases. Moreover, if e := (ei | i ∈ I) is an orthonormal basis of V, then the signature of V is given by

sig +V = ♯{i ∈ I | ei·2 = 1}, (47.3)

sig −V = ♯{i ∈ I | ei·2 = −1}. (47.4)

Proof: In view of the Corollary above, we may choose a positive-regular subspace U of V such that U⊥ is negative-regular. By Prop.1 of Sect.42, we may choose a genuinely orthonormal set basis b of U and an orthonormal set basis c of U⊥ such that e·2 = −1 for all e ∈ c. Then b ∪ c is an orthonormal set basis of V.

Let an orthonormal basis e := (ei | i ∈ I) of V be given. We put

U1 := Lsp{ei | i ∈ I, ei·2 = 1}, U2 := Lsp{ei | i ∈ I, ei·2 = −1}.

It is clear that U1 is positive-regular and that U2 is negative-regular and that the right sides of (47.3) and (47.4) are the dimensions of U1 and U2, respectively. Hence, by Def.1, dimU1 ≤ sig +V and dimU2 ≤ sig −V. On the other hand, since e is a basis, U1 and U2 are supplementary in V and hence, by Prop.5 of Sect.17, dimU1 + dimU2 = dimV. This is compatible with (47.2) only if dimU1 = sig +V and dimU2 = sig −V.
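Prop.2 also gives a practical way to compute signatures. The following numerical sketch, in Python with the numpy package, realizes an inner-product space as R4 with a symmetric matrix G relative to the standard basis (an assumption made only for this illustration; the treatise itself is basis-free). An orthonormal eigenbasis of G, rescaled so that the inner squares become ±1, is an orthonormal basis in the sense of Prop.2, so counting eigenvalue signs yields the signature.

    import numpy as np

    # Matrix of a Minkowski-type inner square on R^4 relative to the standard
    # basis; this concrete realization is an assumption of the sketch.
    G = np.diag([1.0, 1.0, 1.0, -1.0])

    def signature(G, tol=1e-12):
        """Return (sig+, sig-) for the symmetric matrix G.

        Rescaling an orthonormal eigenbasis of G gives an orthonormal basis
        whose inner squares are the signs of the eigenvalues; by Prop.2
        (compare Note (3) below, Sylvester's Law of Inertia), counting those
        signs gives the signature."""
        ev = np.linalg.eigvalsh(G)
        return int(np.sum(ev > tol)), int(np.sum(ev < -tol))

    print(signature(G))  # (3, 1)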

The greatest among the dimensions of all totally singular subspaces of a given inner-product space V is called the index of V and is denoted by indV.

Inner-Product Index Theorem: The index of an inner-product space V is given by

indV = min{sig +V, sig −V} (47.5)

and all maximal totally singular subspaces of V have dimension indV.

Proof: By the Corollary above, we may choose a positive-regular subspace U of V such that U⊥ is negative-regular. Let P1 : V → U and P2 : V → U⊥ be the projections for which Null P1 = U⊥ and Null P2 = U (see Prop.4 of Sect.19).

Let a totally singular subspace N of V be given. Since U⊥ is negative-regular, every non-zero element of U⊥ has a strictly negative inner square and hence cannot belong to N. It follows that

Null P1|N = N ∩ U⊥ = {0}

and hence that P1|N is injective. By the Pigeonhole Principle for Linear Mappings, it follows that

dimN = dim(Rng P1|N) ≤ dimU = sig +V. (47.6)

A similar argument shows that

dimN = dim(Rng P2|N) ≤ dimU⊥ = sig −V. (47.7)

If the inequalities in (47.6) and (47.7) are both strict, then the orthogonal supplements of Rng P1|N and Rng P2|N relative to U and U⊥, respectively, are both non-zero. Hence we may choose u ∈ U and w ∈ U⊥ such that u·2 = 1, w·2 = −1, and u · P1n = w · P2n = 0 for all n ∈ N. It is easily verified that N + R(u + w) is a totally singular subspace of V that includes N as a proper subspace. We conclude that N is maximal among the totally singular subspaces of V if and only if at least one of the inequalities (47.6) and (47.7) reduces to an equality, i.e., if and only if dimN = min{sig +V, sig −V}.

Remark: The mathematical model for space-time in the Theory of Special Relativity has the structure of a (non-genuine) Euclidean space E whose translation space V has index 1. It is customary, by convention, to assume

1 = indV = sig −V ≤ sig +V.

(However, many physicists now use the only other possible convention: 1 = indV = sig +V ≤ sig −V.) The elements of V are called world-vectors. A non-zero world-vector v is said to be time-like, space-like, or signal-like, depending on whether v·2 < 0, v·2 > 0, or v·2 = 0, respectively.

Notes 47

(1) The terms “isotropic” and “totally isotropic” are sometimes used for our “singular” and “totally singular”.

(2) Some people use the term “signature” to mean an appropriate list of + signs and − signs rather than just a pair of numbers in N.

(3) The Inner-Product Signature Theorem, or the closely related Prop.2, is often referred to as “the Law of Inertia” or “Sylvester’s Law of Inertia”.

48 Problems for Chapter 4

(1) Let V be an inner-product space.

(a) Prove: If e := (ei | i ∈ I) is an orthonormal family in V and if v ∈ V, then

w := v − ∑_{i∈I} sgn(ei·2)(ei · v)ei (P4.1)

satisfies w · ej = 0 for all j ∈ I. (For the definition of sgn see (08.13).)

(b) Assume that V is genuine. Prove: If b := (bi | i ∈ n]) is a list-basis of V, then there is exactly one orthonormal list-basis e := (ei | i ∈ n]) of V such that bk · ek > 0 and

Lsp{ei | i ∈ k]} = Lsp{bi | i ∈ k]} (P4.2)

for all k ∈ n]. (Hint: Use induction and Part (a).)

Note: The procedure for obtaining the orthonormal basis e of (b) is often called “Gram-Schmidt orthogonalization”. I consider it of far less importance than some mathematicians do.
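For readers who wish to see the recursion of Part (b) carried out concretely, here is a small sketch in Python (numpy assumed), realizing the genuine inner-product space as R^n with the standard dot product; the function name gram_schmidt and the sample basis are illustrative assumptions only.

    import numpy as np

    def gram_schmidt(b):
        """Orthonormalize the list-basis b (rows), assuming the genuine
        inner product is the standard dot product on R^n.  The result e
        satisfies (P4.2) and b_k . e_k > 0, as in Problem (1)(b)."""
        e = []
        for bk in b:
            # Part (a) with sgn(e_i . e_i) = 1 in the genuine case:
            w = bk - sum((ei @ bk) * ei for ei in e)
            ek = w / np.linalg.norm(w)   # w != 0 since b is a basis
            e.append(ek)
        return np.array(e)

    b = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
    e = gram_schmidt(b)
    print(np.round(e @ e.T, 12))                       # identity matrix
    print(all(bk @ ek > 0 for bk, ek in zip(b, e)))    # True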

(2) Let V be a genuine inner-product space of dimension 2.

(a) Show that the set SkewV ∩ OrthV has exactly two members; moreover, if J is one of them then −J is the other, and we have J2 = −1V.

(b) Show that L − L⊤ = −tr(LJ)J for all L ∈ LinV.

(3) Let V be a genuine inner-product space.

(a) Prove: If A is a skew lineon on V, then 1V − A is invertible, so that we may define Φ : SkewV → LinV by

Φ(A) := (1V + A)(1V − A)−1 for all A ∈ SkewV. (P4.3)

(b) Show that Φ(A) is orthogonal when A ∈ SkewV.

(c) Show that Φ is injective and that

Rng Φ = {R ∈ OrthV | R + 1V is invertible}. (P4.4)

(4) Let V be a genuine inner-product space.

(a) Show that, for each a ∈ V×, there is exactly one orthogonal lineon R on V such that Ra = −a and Ru = u for all u ∈ a⊥. Show that, for all v ∈ V,

Rv = −v ⇐⇒ v ∈ Ra. (P4.5)

(b) Let L ∈ LinV be given. Show that L commutes with every orthogonal lineon on V if and only if L ∈ R1V. (Hint: Use Part (a).)


(5) Let E be a genuine Euclidean space. Prove that the sum of the squares of the lengths of the sides of a parallelogram (see Problem 5 of Chapt.3) in E equals the sum of the squares of the lengths of its diagonals. (Hint: Reduce the problem to an identity involving magnitudes of vectors in V := E − E.)

Note: This result is often called the “parallelogram law”.

(6) Let E be a genuine Euclidean space with dimE > 0 and let p be a flat basis of E. Show that there is exactly one sphere that includes Rng p, i.e. there is exactly one q ∈ E and one ρ ∈ P× such that Rng p ⊂ Sphq,ρE (see (46.8)). (Hint: Use induction over dimE.)

(7) Let p := (pi | i ∈ I) and p′ := (p′i | i ∈ I) be families in a genuine Euclidean space E and assume that there is a k ∈ I such that

(pi − pk) · (pj − pk) = (p′i − p′k) · (p′j − p′k) for all i, j ∈ I. (P4.6)

Show that p and p′ are congruent.

(8) Let V and V′ be inner-product spaces and consider Lin(V,V′) with its natural inner-product space structure characterized by (44.4) and (44.5). Let (p, n) and (p′, n′) be the signatures of V and V′, respectively (see Sect.47). Show that the signature of Lin(V,V′) is given by (pp′ + nn′, pn′ + np′). (Hint: Use Prop.4 of Sect.25 and Prop.2 of Sect.47.)

(9) Let V be an inner-product space with sig +V = 1 and dimV > 1. Put

V+ := {v ∈ V | v·2 > 0}. (P4.7)

(a) Prove: For all u,v ∈ V+ we have

(u · v)2 ≥ u·2 v·2; (P4.8)

equality holds if and only if (u,v) is linearly dependent. (Hint: Use a trick similar to the one used in the proof of the Inner-Product Inequality, Sect.42, and use the Inner-Product Signature Theorem.)

(b) Prove: For all u,v,w ∈ V+ we have

(u · v)(v · w)(w · u) > 0. (P4.9)

(Hint: Consider z := (v·w)u − (u·w)v and use the Inner-Product Signature Theorem.)

(c) Define the relation ∼ on V+ by

u ∼ v :⇐⇒ u · v > 0. (P4.10)

Show that ∼ is an equivalence relation on V+ and that the corresponding partition of V+ (see Sect.01) has exactly two members; moreover, if C is one of them then −C is the other.

Note: In the application to the theory of relativity, one of the two equivalence classes of Part (c), call it C, is singled out. The elements of C are then called “future-directed” world-vectors while the members of −C are called “past-directed” world-vectors.


Chapter 6

Differential Calculus

In this chapter, it is assumed that all linear spaces and flat spaces underconsideration are finite-dimensional.

61 Differentiation of Processes

Let E be a flat space with translation space V. A mapping p : I → E from some interval I ∈ Sub R to E will be called a process. It is useful to think of the value p(t) ∈ E as describing the state of some physical system at time t. In special cases, the mapping p describes the motion of a particle and p(t) is the place of the particle at time t. The concept of differentiability for real-valued functions (see Sect.08) extends without difficulty to processes as follows:

Definition 1: The process p : I → E is said to be differentiable at t ∈ I if the limit

∂tp := lim_{s→0} (1/s)(p(t + s) − p(t)) (61.1)

exists. Its value ∂tp ∈ V is then called the derivative of p at t. We say that p is differentiable if it is differentiable at all t ∈ I. In that case, the mapping ∂p : I → V defined by (∂p)(t) := ∂tp for all t ∈ I is called the derivative of p.

Given n ∈ N×, we say that p is n times differentiable if ∂np : I → V can be defined by the recursion

∂1p := ∂p, ∂k+1p := ∂(∂kp) for all k ∈ (n − 1)]. (61.2)

We say that p is of class Cn if it is n times differentiable and ∂np is

continuous. We say that p is of class C∞ if it is of class Cn for all n ∈ N×.
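As a concrete illustration of (61.1), the following Python sketch (numpy assumed) approximates the derivative of a process by its difference quotient; realizing E as R2, and the particular process p, are assumptions of the sketch only.

    import numpy as np

    # A process p : I -> E, realized here as a curve in R^2; its derivative
    # takes values in the translation space V = R^2.
    def p(t):
        return np.array([np.cos(t), np.sin(t)])

    def derivative(p, t, s=1e-6):
        # difference quotient (1/s)(p(t+s) - p(t)) from (61.1), s small
        return (p(t + s) - p(t)) / s

    t = 0.3
    print(derivative(p, t))                      # approx (-sin t, cos t)
    print(np.array([-np.sin(t), np.cos(t)]))     # exact derivative value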


As for a real-valued function, it is easily seen that a process p is continuous at t ∈ Dom p if it is differentiable at t. Hence p is continuous if it is differentiable, but it may also be continuous without being differentiable.

In analogy to (08.34) and (08.35), we also use the notation

p(k) := ∂kp for all k ∈ (n − 1)] (61.3)

when p is an n-times differentiable process, and we use

p• := p(1) = ∂p, p•• := p(2) = ∂2p, p••• := p(3) = ∂3p, (61.4)

if meaningful.

We use the term “process” also for a mapping p : I → D from some interval I into a subset D of the flat space E. In that case, we use poetic license and ascribe to p any of the properties defined above if p|E has that property. Also, we write ∂p instead of ∂(p|E) if p is differentiable, etc. (If D is included in some flat F, then one can take the direction space of F rather than all of V as the codomain of ∂p. This ambiguity will usually not cause any difficulty.)

The following facts are immediate consequences of Def.1, Prop.5 of Sect.56, and Prop.6 of Sect.57.

Proposition 1: The process p : I → E is differentiable at t ∈ I if and only if, for each λ in some basis of V∗, the function λ(p − p(t)) : I → R is differentiable at t.

The process p is differentiable if and only if, for every flat function a ∈ Flf E, the function a ∘ p is differentiable.

Proposition 2: Let E, E′ be flat spaces and α : E → E′ a flat mapping. If p : I → E is a process that is differentiable at t ∈ I, then α ∘ p : I → E′ is also differentiable at t ∈ I and

∂t(α ∘ p) = (∇α)(∂tp). (61.5)

If p is differentiable then (61.5) holds for all t ∈ I and we get

∂(α ∘ p) = (∇α)∂p. (61.6)

Let p : I → E and q : I → E′ be processes having the same domain I. Then (p, q) : I → E × E′, defined by value-wise pair formation (see (04.13)), is another process. It is easily seen that p and q are both differentiable at t ∈ I if and only if (p, q) is differentiable at t. If this is the case we have

∂t(p, q) = (∂tp, ∂tq). (61.7)


Both p and q are differentiable if and only if (p, q) is, and, in that case,

∂(p, q) = (∂p, ∂q). (61.8)

Let p and q be processes having the same domain I and the same codomain E. Since the point-difference (x, y) ↦ x − y is a flat mapping from E × E into V whose gradient is the vector-difference (u,v) ↦ u − v from V × V into V, we can apply Prop.2 to obtain

Proposition 3: If p : I → E and q : I → E are both differentiable at

t ∈ I, so is the value-wise difference p− q : I → V, and

∂t(p− q) = (∂tp) − (∂tq). (61.9)

If p and q are both differentiable, then (61.9) holds for all t ∈ I and we

get

∂(p− q) = ∂p− ∂q. (61.10)

The following result generalizes the Difference-Quotient Theorem stated in Sect.08.

Difference-Quotient Theorem: Let p : I → E be a process and let t1, t2 ∈ I with t1 < t2. If p|[t1,t2] is continuous and if p is differentiable at each t ∈ ]t1, t2[, then

(p(t2) − p(t1))/(t2 − t1) ∈ CloCxh{∂tp | t ∈ ]t1, t2[}. (61.11)

Proof: Let a ∈ Flf E be given. Then (a ∘ p)|[t1,t2] is continuous and, by Prop.1, a ∘ p is differentiable at each t ∈ ]t1, t2[. By the elementary Difference-Quotient Theorem (see Sect.08) we have

((a ∘ p)(t2) − (a ∘ p)(t1))/(t2 − t1) ∈ {∂t(a ∘ p) | t ∈ ]t1, t2[}.

Using (61.5) and (33.4), we obtain

∇a((p(t2) − p(t1))/(t2 − t1)) ∈ (∇a)>(S), (61.12)

where

S := {∂tp | t ∈ ]t1, t2[}.

Since (61.12) holds for all a ∈ Flf E we can conclude that b((p(t2) − p(t1))/(t2 − t1)) ≥ 0 holds for all those b ∈ Flf V that satisfy b>(S) ⊂ P. Using the Half-Space Intersection Theorem of Sect.54, we obtain the desired result (61.11).
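The membership (61.11) can be probed numerically. The sketch below (Python with numpy; all names and the particular process are illustrative assumptions) checks the half-space property used in the proof: for many linear functionals b, the value of b at the difference quotient lies between the infimum and supremum of b over sampled derivative values.

    import numpy as np

    rng = np.random.default_rng(0)

    def p(t):
        return np.array([np.cos(t), np.sin(t)])

    def dp(t):
        return np.array([-np.sin(t), np.cos(t)])

    t1, t2 = 0.0, np.pi / 2
    dq = (p(t2) - p(t1)) / (t2 - t1)
    ts = np.linspace(t1, t2, 2001)[1:-1]      # sample ]t1, t2[
    S = np.array([dp(t) for t in ts])         # derivative values

    # Half-space check: membership in CloCxh S manifests as b(dq) lying
    # between inf and sup of b over S, for every linear functional b.
    ok = True
    for _ in range(1000):
        b = rng.normal(size=2)
        vals = S @ b
        ok &= vals.min() - 1e-9 <= dq @ b <= vals.max() + 1e-9
    print(ok)  # True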

Notes 61

(1) See Note (8) to Sect.08 concerning notations such as ∂tp, ∂p, ∂np, p•, p(n), etc.


62 Small and Confined Mappings

Let V and V′ be linear spaces of strictly positive dimension. Consider a mapping n from a neighborhood of zero in V to a neighborhood of zero in V′. If n(0) = 0 and if n is continuous at 0, then we can say, intuitively, that n(v) approaches 0 in V′ as v approaches 0 in V. We wish to make precise the idea that n is small near 0 ∈ V in the sense that n(v) approaches 0 ∈ V′ faster than v approaches 0 ∈ V.

Definition 1: We say that a mapping n from a neighborhood of 0 in V to a neighborhood of 0 in V′ is small near 0 if n(0) = 0 and, for all norms ν and ν′ on V and V′, respectively, we have

lim_{u→0} ν′(n(u))/ν(u) = 0. (62.1)

The set of all such small mappings will be denoted by Small(V,V′).

Proposition 1: Let n be a mapping from a neighborhood of 0 in V to a neighborhood of 0 in V′. Then the following conditions are equivalent:

(i) n ∈ Small(V,V′).

(ii) n(0) = 0 and the limit-relation (62.1) holds for some norm ν on V and some norm ν′ on V′.

(iii) For every bounded subset S of V and every N ′ ∈ Nhd0(V′) there is a

δ ∈ P× such that

n(sv) ∈ sN ′ for all s ∈ ]−δ, δ[ (62.2)

and all v ∈ S such that sv ∈ Domn.

Proof: (i) ⇒ (ii): This implication is trivial.

(ii) ⇒ (iii): Assume that (ii) is valid. Let N′ ∈ Nhd0(V′) and a bounded subset S of V be given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52, we can choose b ∈ P× such that

ν(v) ≤ b for all v ∈ S. (62.3)

By Prop.3 of Sect.53 we can choose ε ∈ P× such that

εbCe(ν ′) ⊂ N ′. (62.4)

Applying Prop.4 of Sect.57 to the assumption (ii) we obtain δ ∈ P× such that, for all u ∈ Domn,

ν ′(n(u)) < εν(u) if 0 < ν(u) < δb. (62.5)


Now let v ∈ S be given. Then ν(sv) = |s|ν(v) ≤ |s|b < δb for all s ∈ ]−δ, δ[ such that sv ∈ Domn. Therefore, by (62.5), we have

ν′(n(sv)) < εν(sv) = ε|s|ν(v) ≤ |s|εb

if sv ≠ 0, and hence

n(sv) ∈ sεbCe(ν′)

for all s ∈ ]−δ, δ[ such that sv ∈ Domn. The desired conclusion (62.2) now follows from (62.4).

(iii) ⇒ (i): Assume that (iii) is valid. Let a norm ν on V, a norm ν′ on V′, and ε ∈ P× be given. We apply (iii) to the choices S := Ce(ν), N′ := εCe(ν′) and determine δ ∈ P× such that (62.2) holds. If we put s := 0 in (62.2) we obtain n(0) = 0. Now let u ∈ Domn be given such that 0 < ν(u) < δ. If we apply (62.2) with the choices s := ν(u) and v := (1/s)u, we see that n(u) ∈ ν(u)εCe(ν′), which yields

ν′(n(u))/ν(u) < ε.

The assertion follows by applying Prop.4 of Sect.57.

The condition (iii) of Prop.1 states that

lim_{s→0} (1/s) n(sv) = 0 (62.6)

for all v ∈ V and, roughly, that the limit is approached uniformly as v varies in an arbitrary bounded set.

We also wish to make precise the intuitive idea that a mapping h from a neighborhood of 0 in V to a neighborhood of 0 in V′ is confined near zero in the sense that h(v) approaches 0 ∈ V′ not more slowly than v approaches 0 ∈ V.

Definition 2: A mapping h from a neighborhood of 0 in V to a neighborhood of 0 in V′ is said to be confined near 0 if for every norm ν on V and every norm ν′ on V′ there is N ∈ Nhd0(V) and κ ∈ P× such that

ν′(h(u)) ≤ κν(u) for all u ∈ N ∩ Domh. (62.7)

The set of all such confined mappings will be denoted by Conf(V,V′).

Proposition 2: Let h be a mapping from a neighborhood of 0 in V to a neighborhood of 0 in V′. Then the following are equivalent:

(i) h ∈ Conf(V,V ′).


(ii) There exists a norm ν on V, a norm ν′ on V′, a neighborhood N of 0 in V, and κ ∈ P× such that (62.7) holds.

(iii) For every bounded subset S of V there is δ ∈ P× and a bounded subset

S ′ of V ′ such that

h(sv) ∈ sS ′ for all s ∈ ]−δ, δ[ (62.8)

and all v ∈ S such that sv ∈ Domh.

Proof: (i) ⇒ (ii): This is trivial.

(ii) ⇒ (iii): Assume that (ii) holds. Let a bounded subset S of V be given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52, we can choose b ∈ P× such that

ν(v) ≤ b for all v ∈ S.

By Prop.3 of Sect.53, we can determine δ ∈ P× such that δbCe(ν) ⊂ N. Hence, by (62.7), we have for all u ∈ Domh

ν′(h(u)) ≤ κν(u) if ν(u) < δb. (62.9)

Now let v ∈ S be given. Then ν(sv) = |s|ν(v) ≤ |s|b < δb for all s ∈ ]−δ, δ[ such that sv ∈ Domh. Therefore, by (62.9) we have

ν′(h(sv)) ≤ κν(sv) = κ|s|ν(v) ≤ |s|κb

and hence

h(sv) ∈ sκbCe(ν′)

for all s ∈ ]−δ, δ[ such that sv ∈ Domh. If we put S′ := κbCe(ν′), we obtain the desired conclusion (62.8).

(iii) ⇒ (i): Assume that (iii) is valid. Let a norm ν on V and a norm ν′ on V′ be given. We apply (iii) to the choice S := Ce(ν) and determine S′ and δ ∈ P× such that (62.8) holds. Since S′ is bounded, we can apply the Cell-Inclusion Theorem of Sect.52 and determine κ ∈ P× such that S′ ⊂ κCe(ν′). We put N := δCe(ν) ∩ Domh, which belongs to Nhd0(V). If we put s := 0 in (62.8) we obtain h(0) = 0, which shows that (62.7) holds for u := 0. Now let u ∈ N× be given, so that 0 < ν(u) < δ. If we apply (62.8) with the choices s := ν(u) and v := (1/s)u, we see that

h(u) ∈ ν(u)S′ ⊂ ν(u)κCe(ν′),

which yields the assertion (62.7).

The following results are immediate consequences of the definition and of the properties of linear mappings discussed in Sect.52:


(I) Value-wise sums and value-wise scalar multiples of mappings that are small [confined] near zero are again small [confined] near zero.

(II) Every mapping that is small near zero is also confined near zero, i.e.

Small(V,V′) ⊂ Conf(V,V′).

(III) If h ∈ Conf(V,V′), then h(0) = 0 and h is continuous at 0.

(IV) Every linear mapping is confined near zero, i.e.

Lin(V,V′) ⊂ Conf(V,V′).

(V) The only linear mapping that is small near zero is the zero-mapping, i.e.

Lin(V,V′) ∩ Small(V,V′) = {0}.

Proposition 3: Let V, V′, V′′ be linear spaces and let h ∈ Conf(V,V′) and k ∈ Conf(V′,V′′) be such that Domk = Codh. Then k ∘ h ∈ Conf(V,V′′). Moreover, if one of k or h is small near zero, so is k ∘ h.

Proof: Let norms ν, ν′, ν′′ on V, V′, V′′, respectively, be given. Since h and k are confined we can find κ, κ′ ∈ P× and N ∈ Nhd0(V), N′ ∈ Nhd0(V′) such that

ν′′((k ∘ h)(u)) ≤ κ′ν′(h(u)) ≤ κ′κν(u) (62.10)

for all u ∈ N ∩ Domh such that h(u) ∈ N′ ∩ Domk, i.e. for all u ∈ N ∩ h<(N′ ∩ Domk). Since h is continuous at 0 ∈ V, we have h<(N′ ∩ Domk) ∈ Nhd0(V) and hence N ∩ h<(N′ ∩ Domk) ∈ Nhd0(V). Thus, (62.7) remains satisfied when we replace h, κ and N by k ∘ h, κ′κ, and N ∩ h<(N′ ∩ Domk), respectively, which shows that k ∘ h ∈ Conf(V,V′′).

Assume now that one of k and h, say h, is small. Let ε ∈ P× be given. Then we can choose N ∈ Nhd0(V) such that ν′(h(u)) ≤ κν(u) holds for all u ∈ N ∩ Domh with κ := ε/κ′. Therefore, (62.10) gives ν′′((k ∘ h)(u)) ≤ εν(u) for all u ∈ N ∩ h<(N′ ∩ Domk). Since ε ∈ P× was arbitrary this proves that

lim_{u→0} ν′′((k ∘ h)(u))/ν(u) = 0,

i.e. that k ∘ h is small near zero.

Now let E and E′ be flat spaces with translation spaces V and V′, respectively.


Definition 3: Let x ∈ E be given. We say that a mapping σ from a neighborhood of x ∈ E to a neighborhood of 0 ∈ V′ is small near x if the mapping v ↦ σ(x + v) from (Domσ) − x to Codσ is small near 0. The set of all such small mappings will be denoted by Smallx(E,V′).

We say that a mapping ϕ from a neighborhood of x ∈ E to a neighborhood of ϕ(x) ∈ E′ is confined near x if the mapping v ↦ (ϕ(x + v) − ϕ(x)) from (Domϕ) − x to (Codϕ) − ϕ(x) is confined near zero.

The following characterization is immediate.

Proposition 4: The mapping ϕ is confined near x ∈ E if and only if for every norm ν on V and every norm ν′ on V′ there is N ∈ Nhdx(E) and κ ∈ P× such that

ν′(ϕ(y) − ϕ(x)) ≤ κν(y − x) for all y ∈ N. (62.11)

We now state a few facts that are direct consequences of the definitions, the results (I)–(V) stated above, and Prop.3:

(VI) Value-wise sums and differences of mappings that are small [confined] near x are again small [confined] near x. Here, “sum” can mean either the sum of two vectors or the sum of a point and a vector, while “difference” can mean either the difference of two vectors or the difference of two points.

(VII) Every σ ∈ Smallx(E ,V′) is confined near x.

(VIII) If a mapping is confined near x it is continuous at x.

(IX) A flat mapping α : E → E ′ is confined near every x ∈ E .

(X) The only flat mapping β : E → V′ that is small near some x ∈ E is the constant 0E→V′.

(XI) If ϕ is confined near x ∈ E and if ψ is a mapping with Domψ = Codϕ that is confined near ϕ(x), then ψ ∘ ϕ is confined near x.

(XII) If σ ∈ Smallx(E,V′) and h ∈ Conf(V′,V′′) with Codσ = Domh, then h ∘ σ ∈ Smallx(E,V′′).

(XIII) If ϕ is confined near x ∈ E and if σ is a mapping with Domσ = Codϕ that is small near ϕ(x), then σ ∘ ϕ ∈ Smallx(E,V′′), where V′′ is the linear space for which Codσ ∈ Nhd0(V′′).

(XIV) An adjustment of a mapping that is small [confined] near x is again small [confined] near x, provided only that the concept small [confined] near x remains meaningful after the adjustment.


Notes 62

(1) In the conventional treatments, the norms ν and ν′ in Defs.1 and 2 are assumed to be prescribed and fixed. The notation n = o(ν), and the phrase “n is small oh of ν”, are often used to express the assertion that n ∈ Small(V,V′). The notation h = O(ν), and the phrase “h is big oh of ν”, are often used to express the assertion that h ∈ Conf(V,V′). I am introducing the terms “small” and “confined” here for the first time because I believe that the conventional terminology is intolerably awkward and involves a misuse of the = sign.

63 Gradients, Chain Rule

Let I be an open interval in R. One learns in elementary calculus that if a function f : I → R is differentiable at a point t ∈ I, then the graph of f has a tangent at (t, f(t)). This tangent is the graph of a flat function a ∈ Flf(R). Using poetic license, we refer to this function itself as the tangent to f at t ∈ I. In this sense, the tangent a is given by

a(r) := f(t) + (∂tf)(r − t) for all r ∈ R.

If we put σ := f − a|I, then σ(r) = f(r) − f(t) − (∂tf)(r − t) for all r ∈ I. We have lim_{s→0} σ(t + s)/s = 0, from which it follows that σ ∈ Smallt(R,R). One can use the existence of a tangent to define differentiability at t. Such a definition generalizes directly to mappings involving flat spaces.

Let E, E′ be flat spaces with translation spaces V, V′, respectively. We consider a mapping ϕ : D → D′ from an open subset D of E into an open subset D′ of E′.

Proposition 1: Given x ∈ D, there can be at most one flat mapping α : E → E′ such that the value-wise difference ϕ − α|D : D → V′ is small near x.

Proof: If the flat mappings α1, α2 both have this property, then the value-wise difference (α2 − α1)|D = (ϕ − α1|D) − (ϕ − α2|D) is small near


x ∈ E. Since α2 − α1 is flat, it follows from (X) and (XIV) of Sect.62 that α2 − α1 is the zero mapping and hence that α1 = α2.

Definition 1: The mapping ϕ : D → D′ is said to be differentiable at x ∈ D if there is a flat mapping α : E → E′ such that

ϕ− α|D ∈ Smallx(E ,V′). (63.1)

This (unique) flat mapping α is then called the tangent to ϕ at x. The

gradient of ϕ at x is defined to be the gradient of α and is denoted by

∇xϕ := ∇α. (63.2)

We say that ϕ is differentiable if it is differentiable at all x ∈ D. If this

is the case, the mapping

∇ϕ : D → Lin(V,V ′) (63.3)

defined by

(∇ϕ)(x) := ∇xϕ for all x ∈ D (63.4)

is called the gradient of ϕ. We say that ϕ is of class C1 if it is differentiable

and if its gradient ∇ϕ is continuous. We say that ϕ is twice differentiable if it is differentiable and if its gradient ∇ϕ is also differentiable. The gradient

of ∇ϕ is then called the second gradient of ϕ and is denoted by

∇(2)ϕ := ∇(∇ϕ) : D → Lin(V,Lin(V,V ′)) ∼= Lin2(V2,V ′). (63.5)

We say that ϕ is of class C2 if it is twice differentiable and if ∇(2)ϕ is

continuous.

If the subsets D and D′ are arbitrary, not necessarily open, and if x ∈ IntD, we say that ϕ : D → D′ is differentiable at x if ϕ|E′IntD is differentiable at x and we write ∇xϕ for ∇x(ϕ|E′IntD).

The differentiability properties of a mapping ϕ remain unchanged if the codomain of ϕ is changed to any open subset of E′ that includes Rngϕ. The gradient of ϕ remains unaltered. If Rngϕ is included in some flat F′ in E′, one may change the codomain to a subset that is open in F′. In that case, the gradient of ϕ at a point x ∈ D must be replaced by the adjustment ∇xϕ|U′ of ∇xϕ, where U′ is the direction space of F′.

The differentiability and the gradient of a mapping at a point depend only on the values of the mapping near that point. To be more precise, let ϕ1 and ϕ2 be two mappings whose domains are neighborhoods of a given x ∈ E and whose codomains are open subsets of E′. Assume that ϕ1 and ϕ2 agree


on some neighborhood of x, i.e. that ϕ1|E′N = ϕ2|E′N for some N ∈ Nhdx(E). Then ϕ1 is differentiable at x if and only if ϕ2 is differentiable at x. If this is the case, we have ∇xϕ1 = ∇xϕ2.

Every flat mapping α : E → E′ is differentiable. The tangent of α at every point x ∈ E is α itself. The gradient of α as defined in this section is the constant (∇α)E→Lin(V,V′) whose value is the gradient ∇α of α in the sense of Sect.33. The gradient of a linear mapping is the constant whose value is this linear mapping itself.

In the case when E := V := R and when D := I is an open interval, differentiability at t ∈ I of a process p : I → D′ in the sense of Def.1 above reduces to differentiability of p at t in the sense of Def.1 of Sect.61. The gradient ∇tp ∈ Lin(R,V′) becomes associated with the derivative ∂tp ∈ V′ by the natural isomorphism from V′ to Lin(R,V′), so that ∇tp = (∂tp)⊗ and ∂tp = (∇tp)1 (see Sect.25). If I is an interval that is not open and if t is an endpoint of it, then the derivative of p : I → D′ at t, if it exists, cannot be associated with a gradient.

If ϕ is differentiable at x then it is confined and hence continuous at x. This follows from the fact that its tangent α, being flat, is confined near x and that the difference ϕ − α|D, being small near x, is confined near x. The converse is not true. For example, it is easily seen that the absolute-value function (t ↦ |t|) : R → R is confined near 0 ∈ R but not differentiable at 0.

The following criterion is immediate from the definition:

Characterization of Gradients: The mapping ϕ : D → D′ is differentiable at x ∈ D if and only if there is an L ∈ Lin(V,V′) such that n : (D − x) → V′, defined by

n(v) := (ϕ(x + v) − ϕ(x)) − Lv for all v ∈ D − x, (63.6)

is small near 0 in V. If this is the case, then ∇xϕ = L.
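The Characterization of Gradients suggests a simple numerical test: propose a candidate L and check that the ratio ν′(n(v))/ν(v) tends to 0. A sketch in Python (numpy assumed; the mapping ϕ and the candidate L are illustrative assumptions):

    import numpy as np

    # phi(x) = (x0*x1, x0 + x1^2) on R^2, with candidate gradient L at x;
    # by (63.6), n(v) := phi(x+v) - phi(x) - L v must be small near 0.
    def phi(x):
        return np.array([x[0] * x[1], x[0] + x[1] ** 2])

    x = np.array([1.0, 2.0])
    L = np.array([[x[1], x[0]],
                  [1.0, 2.0 * x[1]]])     # candidate for grad_x phi

    v0 = np.array([0.3, -0.7])
    for s in [1e-1, 1e-2, 1e-3, 1e-4]:
        v = s * v0
        n = phi(x + v) - phi(x) - L @ v
        print(s, np.linalg.norm(n) / np.linalg.norm(v))  # ratio -> 0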

Let D, D1, D2 be open subsets of flat spaces E, E1, E2 with translation spaces V, V1, V2, respectively. The following result follows immediately from the definitions if we use the term-wise evaluations (04.13) and (14.12).

Proposition 2: The mapping (ϕ1, ϕ2) : D → D1 × D2 is differentiable at x ∈ D if and only if both ϕ1 and ϕ2 are differentiable at x. If this is the case, then

∇x(ϕ1, ϕ2) = (∇xϕ1, ∇xϕ2) ∈ Lin(V,V1) × Lin(V,V2) ≅ Lin(V,V1 × V2). (63.7)

General Chain Rule: Let D, D′, D′′ be open subsets of flat spaces E, E′, E′′ with translation spaces V, V′, V′′, respectively. If ϕ : D → D′ is differentiable at x ∈ D and if ψ : D′ → D′′ is differentiable at ϕ(x), then the composite ψ ∘ ϕ : D → D′′ is differentiable at x. The tangent to the composite ψ ∘ ϕ at x is the composite of the tangent to ϕ at x and the tangent to ψ at ϕ(x), and we have

∇x(ψ ∘ ϕ) = (∇ϕ(x)ψ)(∇xϕ). (63.8)

If ϕ and ψ are both differentiable, so is ψ ∘ ϕ, and we have

∇(ψ ∘ ϕ) = (∇ψ ∘ ϕ)(∇ϕ), (63.9)

where the product on the right is understood as value-wise composition.

Proof: Let α be the tangent to ϕ at x and β the tangent to ψ at ϕ(x). Then

σ := ϕ − α|D ∈ Smallx(E,V′),
τ := ψ − β|D′ ∈ Smallϕ(x)(E′,V′′).

We have

ψ ∘ ϕ = (β + τ) ∘ (α + σ)
      = β ∘ (α + σ) + τ ∘ (α + σ)
      = β ∘ α + (∇β) ∘ σ + τ ∘ (α + σ),

where domain restriction symbols have been omitted to avoid clutter. It follows from (VI), (IX), (XII), and (XIII) of Sect.62 that (∇β) ∘ σ + τ ∘ (α + σ) ∈ Smallx(E,V′′), which means that

ψ ∘ ϕ − β ∘ α ∈ Smallx(E,V′′).

It follows that ψ ∘ ϕ is differentiable at x with tangent β ∘ α. The assertion (63.8) follows from the Chain Rule for Flat Mappings of Sect.33.
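Formula (63.8) can be checked numerically by comparing finite-difference Jacobians. In the following Python sketch (numpy assumed), flat spaces are realized as R^n, and the helper jacobian is an illustrative stand-in for the gradient of Def.1, not part of the conceptual apparatus here:

    import numpy as np

    def jacobian(f, x, h=1e-6):
        """Central-difference Jacobian of f at x (columns = coordinates)."""
        x = np.asarray(x, dtype=float)
        cols = []
        for i in range(x.size):
            e = np.zeros_like(x); e[i] = h
            cols.append((np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h))
        return np.column_stack(cols)

    phi = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
    psi = lambda y: np.array([y[0] + y[1] ** 2])

    x = np.array([0.5, -1.2])
    lhs = jacobian(lambda t: psi(phi(t)), x)
    rhs = jacobian(psi, phi(x)) @ jacobian(phi, x)   # (63.8), matrix form
    print(np.allclose(lhs, rhs, atol=1e-5))          # True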

Let ϕ : D → E′ and ψ : D → E′ both be differentiable at x ∈ D. Then the value-wise difference ϕ − ψ : D → V′ is differentiable at x and

∇x(ϕ − ψ) = ∇xϕ − ∇xψ. (63.10)

This follows from Prop.2, the fact that the point-difference (x′, y′) ↦ (x′ − y′) is a flat mapping from E′ × E′ into V′, and the General Chain Rule.

When the General Chain Rule is applied to the composite of a vector-valued mapping with a linear mapping it yields


Proposition 3: If h : D → V′ is differentiable at x ∈ D and if L ∈ Lin(V′,W), where W is some linear space, then Lh : D → W is differentiable at x ∈ D and

∇x(Lh) = L(∇xh). (63.11)

Using the fact that vector-addition, transpositions of linear mappings, and the trace operations are all linear operations, we obtain the following special cases of Prop.3:

(I) Let h : D → V′ and k : D → V′ both be differentiable at x ∈ D. Then the value-wise sum h + k : D → V′ is differentiable at x and

∇x(h + k) = ∇xh + ∇xk. (63.12)

(II) Let W and Z be linear spaces and let F : D → Lin(W,Z) be differentiable at x. If F⊤ : D → Lin(Z∗,W∗) is defined by value-wise transposition, then F⊤ is differentiable at x ∈ D and

∇x(F⊤)v = ((∇xF)v)⊤ for all v ∈ V. (63.13)

In particular, if I is an open interval and if F : I → Lin(W,Z) is differentiable, so is F⊤ : I → Lin(Z∗,W∗), and

(F⊤)• = (F•)⊤. (63.14)

(III) Let W be a linear space and let F : D → Lin(W) be differentiable at x. If trF : D → R is the value-wise trace of F, then trF is differentiable at x and

(∇x(trF))v = tr((∇xF)v) for all v ∈ V. (63.15)

In particular, if I is an open interval and if F : I → Lin(W) is differentiable, so is trF : I → R, and

(trF)• = tr(F•). (63.16)

We note three special cases of the General Chain Rule: Let I be an open interval and let D and D′ be open subsets of E and E′, respectively. If p : I → D and ϕ : D → D′ are differentiable, so is ϕ ∘ p : I → D′, and

(ϕ ∘ p)• = ((∇ϕ) ∘ p)p•. (63.17)

If ϕ : D → D′ and f : D′ → R are differentiable, so is f ∘ ϕ : D → R, and

∇(f ∘ ϕ) = (∇ϕ)⊤((∇f) ∘ ϕ) (63.18)

(see (21.3)). If f : D → I and p : I → D′ are differentiable, so is p ∘ f, and

∇(p ∘ f) = (p• ∘ f) ⊗ ∇f. (63.19)

Notes 63

(1) Other common terms for the concept of “gradient” of Def.1 are “differential”, “Fréchet differential”, “derivative”, and “Fréchet derivative”. Some authors make an artificial distinction between “gradient” and “differential”. We cannot use “derivative” because, for processes, “gradient” and “derivative” are distinct though related concepts.

(2) The conventional definitions of gradient depend, at first view, on the prescription of a norm. Many texts never even mention the fact that the gradient is a norm-invariant concept. In some contexts, as when one deals with genuine Euclidean spaces, this norm-invariance is perhaps not very important. However, when one deals with mathematical models for space-time in the theory of relativity, the norm-invariance is crucial because it shows that the concepts of differential calculus have a “Lorentz-invariant” meaning.

(3) I am introducing the notation ∇xϕ for the gradient of ϕ at x because the more conventional notation ∇ϕ(x) suggests, incorrectly, that ∇ϕ(x) is necessarily the value at x of a gradient-mapping ∇ϕ. In fact, one cannot define the gradient-mapping ∇ϕ without first having a notation for the gradient at a point (see (63.4)).

(4) Other notations for ∇xϕ in the literature are dϕ(x), Dϕ(x), and ϕ′(x).

(5) I conjecture that the “Chain” of “Chain Rule” comes from an old terminology that used “chaining” (in the sense of “concatenation”) for “composition”. The term “Composition Rule” would need less explanation, but I retained “Chain Rule” because it is more traditional and almost as good.

64 Constricted Mappings

In this section, D and D′ denote arbitrary subsets of flat spaces E and E′ with translation spaces V and V′, respectively.

Definition 1: We say that the mapping ϕ : D → D′ is constricted if for every norm ν on V and every norm ν′ on V′ there is κ ∈ P× such that

ν′(ϕ(y) − ϕ(x)) ≤ κν(y − x) (64.1)


holds for all x, y ∈ D. The infimum of the set of all κ ∈ P× for which (64.1) holds for all x, y ∈ D is called the striction of ϕ relative to ν and ν′; it is denoted by str(ϕ; ν, ν′).

It is clear that

str(ϕ; ν, ν′) = sup{ν′(ϕ(y) − ϕ(x))/ν(y − x) | x, y ∈ D, x ≠ y}. (64.2)
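Since (64.2) is a supremum, sampling difference quotients yields lower bounds for a striction. A rough Python sketch (numpy assumed; the mapping and the choice of Euclidean norms are assumptions of the illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # Estimate str(phi; nu, nu') from (64.2) by random sampling, with
    # nu = Euclidean norm on R^2 and nu' = absolute value on R.
    phi = lambda x: np.sin(x[0]) * np.cos(x[1])

    X = rng.uniform(-1, 1, size=(4000, 2))
    Y = rng.uniform(-1, 1, size=(4000, 2))
    q = [abs(phi(x) - phi(y)) / np.linalg.norm(x - y) for x, y in zip(X, Y)]
    print(max(q))   # a lower bound for the striction; here it stays below 1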

Proposition 1: For every mapping ϕ : D → D′, the following are

equivalent:

(i) ϕ is constricted.

(ii) There exist norms ν and ν ′ on V and V ′, respectively, and κ ∈ P× such

that (64.1) holds for all x, y ∈ D.

(iii) For every bounded subset C of V and every N ′ ∈ Nhd0(V′) there is

ρ ∈ P× such that

x− y ∈ sC =⇒ ϕ(x) − ϕ(y) ∈ sρN ′ (64.3)

for all x, y ∈ D and s ∈ P×.

Proof: (i) ⇒ (ii): This implication is trivial.

(ii) ⇒ (iii): Assume that (ii) holds, and let a bounded subset C of V and N′ ∈ Nhd0(V′) be given. By Cor.1 of the Cell-Inclusion Theorem of Sect.52, we can choose b ∈ P× such that

ν(u) ≤ b for all u ∈ C. (64.4)

By Prop.3 of Sect.53, we can choose σ ∈ P× such that σCe(ν′) ⊂ N′. Now let x, y ∈ D and s ∈ P× be given and assume that x − y ∈ sC. Then (1/s)(x − y) ∈ C and hence, by (64.4), we have ν((1/s)(x − y)) ≤ b, which gives ν(x − y) ≤ sb. Using (64.1) we obtain ν′(ϕ(y) − ϕ(x)) ≤ sκb. If we put ρ := κb/σ, this means that

ϕ(y) − ϕ(x) ∈ sρσCe(ν′) ⊂ sρN′,

i.e. that (64.3) holds.

(iii) ⇒ (i): Assume that (iii) holds and let a norm ν on V and a norm ν′ on V′ be given. We apply (iii) with the choices C := Bdy Ce(ν) and N′ := Ce(ν′). Let x, y ∈ D with x ≠ y be given. If we put s := ν(x − y),


then x − y ∈ s Bdy Ce(ν) and hence, by (64.3), ϕ(x) − ϕ(y) ∈ sρCe(ν′), which implies

ν′(ϕ(x) − ϕ(y)) < sρ = ρν(x − y).

Thus, if we put κ := ρ, then (64.1) holds for all x, y ∈ D.

If D and D′ are open sets and if ϕ : D → D′ is constricted, it is confined near every x ∈ D, as is evident from Prop.4 of Sect.62. The converse is not true. For example, one can show that the function f : ]−1, 1[ → R defined by

f(t) := t sin(1/t) if t ∈ ]0, 1[, f(t) := 0 if t ∈ ]−1, 0] (64.5)

is not constricted, but is confined near every t ∈ ]−1, 1[.

Every flat mapping α : E → E′ is constricted and

str(α; ν, ν′) = ||∇α||ν,ν′,

where || ||ν,ν′ is the operator norm on Lin(V,V′) corresponding to ν and ν′ (see Sect.52).

Constrictedness and strictions remain unaffected by a change of codomain. If the domain of a constricted mapping is restricted, then it remains constricted and the striction of the restriction is less than or equal to the striction of the original mapping. (Pardon the puns.)

Proposition 2: If ϕ : D → D′ is constricted then it is uniformly continuous.

Proof: We use condition (iii) of Prop.1. Let N′ ∈ Nhd0(V′) be given. We choose a bounded neighborhood C of 0 in V and determine ρ ∈ P× according to (iii). Putting N := (1/ρ)C ∈ Nhd0(V) and s := 1/ρ, we see that (64.3) gives

x − y ∈ N =⇒ ϕ(x) − ϕ(y) ∈ N′ for all x, y ∈ D.

Pitfall: The converse of Prop.2 is not valid. A counterexample is the square-root function √ : P → P, which is uniformly continuous but not constricted. Another counterexample is the function defined by (64.5).

The following result is the most useful criterion for showing that a given

mapping is constricted.

Striction Estimate for Differentiable Mappings: Assume that D is an open convex subset of E, that ϕ : D → D′ is differentiable, and that the gradient ∇ϕ : D → Lin(V,V′) has a bounded range. Then ϕ is constricted and

str(ϕ; ν, ν′) ≤ sup{||∇zϕ||ν,ν′ | z ∈ D} (64.6)


for all norms ν, ν ′ on V,V ′, respectively.

Proof: Let x, y ∈ D. Since D is convex, we have tx + (1 − t)y ∈ D for all t ∈ [0, 1] and hence we can define a process

p : [0, 1] → D′ by p(t) := ϕ(tx + (1 − t)y).

By the Chain Rule, p is differentiable at t when t ∈ ]0, 1[ and we have

∂tp = (∇zϕ)(x − y) with z := tx + (1 − t)y ∈ D.

Applying the Difference-Quotient Theorem of Sect.61 to p, we obtain

ϕ(x) − ϕ(y) = p(1) − p(0) ∈ CloCxh{(∇zϕ)(x − y) | z ∈ D}. (64.7)

If ν, ν ′ are norms on V,V ′ then, by (52.7),

ν ′((∇zϕ)(x− y)) ≤ ||∇zϕ||ν,ν′ν(x− y) (64.8)

for all z ∈ D. To say that ∇ϕ has a bounded range is equivalent, by Cor.1 to the Cell-Inclusion Theorem of Sect.52, to

κ := sup{||∇zϕ||ν,ν′ | z ∈ D} < ∞. (64.9)

It follows from (64.8) that ν′((∇zϕ)(x − y)) ≤ κν(x − y) for all z ∈ D, which can be expressed in the form

(∇zϕ)(x − y) ∈ κν(x − y)Ce(ν′) for all z ∈ D.

Since the set on the right is closed and convex we get

CloCxh{(∇zϕ)(x − y) | z ∈ D} ⊂ κν(x − y)Ce(ν′)

and hence, by (64.7), ϕ(x) − ϕ(y) ∈ κν(x − y)Ce(ν′). This may be expressed in the form

ν′(ϕ(x) − ϕ(y)) ≤ κν(x − y).

Since x, y ∈ D were arbitrary it follows that ϕ is constricted. The definition (64.9) shows that (64.6) holds.

Remark: It is not hard to prove that the inequality in (64.6) is actually an equality.

Proposition 3: If D is a non-empty open convex set and if ϕ : D → D′ is differentiable with gradient zero, then ϕ is constant.

Proof: Choose norms ν, ν′ on V, V′, respectively. The assumption ∇ϕ = 0 gives ||∇zϕ||ν,ν′ = 0 for all z ∈ D. Hence, by (64.6), we have str(ϕ; ν, ν′) =


0. Using (64.2), we conclude that ν′(ϕ(y) − ϕ(x)) = 0 and hence ϕ(x) = ϕ(y) for all x, y ∈ D.

Remark: In Prop.3, the condition that D be convex can be replaced by the weaker one that D be “connected”. This means, intuitively, that every two points in D can be connected by a continuous curve entirely within D.

Proposition 4: Assume that D and D′ are open subsets and that ϕ : D → D′ is of class C1. Let k be a compact subset of D. For every norm ν on V, every norm ν′ on V′, and every ε ∈ P× there exists δ ∈ P× such that k + δCe(ν) ⊂ D and such that, for every x ∈ k, the function nx : δCe(ν) → V′ defined by

nx(v) := ϕ(x + v) − ϕ(x) − (∇xϕ)v (64.10)

is constricted with

str(nx; ν, ν′) ≤ ε. (64.11)

Proof: Let a norm ν on V be given. By Prop.6 of Sect.58, we can obtain δ1 ∈ P× such that k + δ1Ce(ν) ⊂ D. For each x ∈ k, we define mx : δ1Ce(ν) → V′ by

mx(v) := ϕ(x + v) − ϕ(x) − (∇xϕ)v. (64.12)

Differentiation gives

∇vmx = ∇ϕ(x + v) − ∇ϕ(x) (64.13)

for all x ∈ k and all v ∈ δ1Ce(ν). Since k + δ1Ce(ν) is compact by Prop.6 of Sect.58 and since ∇ϕ is continuous, it follows by the Uniform Continuity Theorem of Sect.58 that ∇ϕ|k+δ1Ce(ν) is uniformly continuous. Now let a norm ν′ on V′ and ε ∈ P× be given. By Prop.4 of Sect.56, we can then determine δ2 ∈ P× such that

ν(y − x) < δ2 =⇒ ||∇ϕ(y) − ∇ϕ(x)||ν,ν′ < ε

for all x, y ∈ k + δ1Ce(ν). In view of (64.13), it follows that

v ∈ δ2Ce(ν) =⇒ ||∇vmx||ν,ν′ < ε

for all x ∈ k and all v ∈ δ1Ce(ν). If we put δ := min{δ1, δ2} and if we define nx := mx|δCe(ν) for every x ∈ k, we see that {||∇vnx||ν,ν′ | v ∈ δCe(ν)} is bounded by ε for all x ∈ k. By (64.12) and the Striction Estimate for Differentiable Mappings, the desired result follows.


Definition 2: Let ϕ : D → D be a constricted mapping from a set D into itself. Then

str(ϕ) := inf{str(ϕ; ν, ν) | ν a norm on V} (64.14)

is called the absolute striction of ϕ. If str(ϕ) < 1 we say that ϕ is a contraction.

Contraction Fixed Point Theorem: Every contraction has at most one fixed point and, if its domain is closed and not empty, it has exactly one fixed point.

Proof: Let ϕ : D → D be a contraction. We choose a norm ν on V such that κ := str(ϕ; ν, ν) < 1 and hence, by (64.1),

ν(ϕ(x) − ϕ(y)) ≤ κν(x − y) for all x, y ∈ D. (64.15)

If x and y are fixed points of ϕ, so that ϕ(x) = x and ϕ(y) = y, then (64.15) gives ν(x − y)(1 − κ) ≤ 0. Since 1 − κ > 0 this is possible only when ν(x − y) = 0 and hence x = y. Therefore, ϕ can have at most one fixed point.

We now assume D ≠ ∅, choose s0 ∈ D arbitrarily, and define

sn := ϕn(s0) for all n ∈ N×

(see Sect.03). It follows from (64.15) that

ν(sm+1 − sm) = ν(ϕ(sm) − ϕ(sm−1)) ≤ κν(sm − sm−1)

for all m ∈ N×. Using induction, one concludes that

ν(sm+1 − sm) ≤ κ^m ν(s1 − s0) for all m ∈ N.

Now, if n ∈ N and r ∈ N, then

sn+r − sn = ∑_{k∈r[} (sn+k+1 − sn+k)

and hence

ν(sn+r − sn) ≤ ∑_{k∈r[} κ^{n+k} ν(s1 − s0) ≤ (κ^n/(1 − κ)) ν(s1 − s0).

Since κ < 1, we have lim_{n→∞} κ^n = 0, and it follows that for every ε ∈ P× there is an m ∈ N such that ν(sn+r − sn) < ε whenever n ∈ m + N, r ∈ N. By


the Basic Convergence Criterion of Sect.55 it follows that s := (sn | n ∈ N) converges. Since ϕ(sn) = sn+1 for all n ∈ N, we have ϕ ∘ s = (sn+1 | n ∈ N), which converges to the same limit as s. We put

x := lim s = lim (ϕ ∘ s).

Now assume that D is closed. By Prop.6 of Sect.55 it follows that x ∈ D. Since ϕ is continuous, we can apply Prop.2 of Sect.56 to conclude that lim (ϕ ∘ s) = ϕ(lim s), i.e. that ϕ(x) = x.
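The iteration used in the existence part of the proof is directly computable. A small Python sketch (the particular contraction is an assumption of the illustration; since |ϕ′| ≤ 1/2 here, the Striction Estimate gives an absolute striction at most 1/2):

    import numpy as np

    # Iteration s_{n+1} = phi(s_n) from the proof: phi is a contraction on
    # the closed set R, so the sequence converges to the unique fixed point.
    phi = lambda x: 0.5 * np.cos(x)

    s = 0.0
    for _ in range(60):
        s = phi(s)
    print(s, abs(phi(s) - s))   # fixed point, residual essentially 0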

Notes 64

(1) The traditional terms for “constricted” and “striction” are “Lipschitzian” and “Lipschitz number (or constant)”, respectively. I am introducing the terms “constricted” and “striction” here because they are much more descriptive.

(2) It turns out that the absolute striction of a lineon coincides with what is often called its “spectral radius”.

(3) The Contraction Fixed Point Theorem is often called the “Contraction Mapping Theorem” or the “Banach Fixed Point Theorem”.

65 Partial Gradients, Directional Derivatives

Let E1, E2 be flat spaces with translation spaces V1, V2. As we have seen in Sect.33, the set-product E := E1 × E2 is then a flat space with translation space V := V1 × V2. We consider a mapping ϕ : D → D′ from an open subset D of E into an open subset D′ of a flat space E′ with translation space V′. Given any x2 ∈ E2, we define (·, x2) : E1 → E according to (04.21), put

D(·,x2) := (·, x2)<(D) = {z ∈ E1 | (z, x2) ∈ D}, (65.1)

which is an open subset of E1 because (·, x2) is flat and hence continuous, and define ϕ(·, x2) : D(·,x2) → D′ according to (04.22). If ϕ(·, x2) is differentiable at x1 for all (x1, x2) ∈ D, we define the partial 1-gradient ∇(1)ϕ : D → Lin(V1,V′) of ϕ by

∇(1)ϕ(x1, x2) := ∇x1ϕ(·, x2) for all (x1, x2) ∈ D. (65.2)

In the special case when E1 := R, we have V1 = R, and the partial 1-derivative ϕ,1 : D → V′, defined by

ϕ,1(t, x2) := (∂ϕ(·, x2))(t) for all (t, x2) ∈ D, (65.3)


is related to ∇(1)ϕ by ∇(1)ϕ = ϕ,1⊗ and ϕ,1 = (∇(1)ϕ)1 (value-wise).

Similar definitions are employed to define D(x1,·), the mapping ϕ(x1, ·) : D(x1,·) → D′, the partial 2-gradient ∇(2)ϕ : D → Lin(V2,V′), and the partial 2-derivative ϕ,2 : D → V′.

The following result follows immediately from the definitions.

Proposition 1: If ϕ : D → D′ is differentiable at x := (x1, x2) ∈ D,

then the gradients ∇x1ϕ(·, x2) and ∇x2ϕ(x1, ·) exist and

(∇xϕ)v = (∇x1ϕ(·, x2))v1 + (∇x2ϕ(x1, ·))v2 (65.4)

for all v := (v1,v2) ∈ V.

If ϕ is differentiable, then the partial gradients ∇(1)ϕ and ∇(2)ϕ exist

and we have

∇ϕ = ∇(1)ϕ⊕∇(2)ϕ (65.5)

where the operation ⊕ on the right is understood as value-wise application

of (14.13).Pitfall: The converse of Prop.1 is not true: A mapping can have

partial gradients without being differentiable. For example, the mappingϕ : R × R → R, defined by

ϕ(s, t) :=

sts2+t2

if (s, t) 6= (0, 0)

0 if (s, t) = (0, 0)

,

has partial derivatives at (0, 0) since both ϕ(·, 0) and ϕ(0, ·) are equal to theconstant 0. Since ϕ(t, t) = 1

2 for t ∈ R× but ϕ(0, 0) = 0, it is clear that ϕ isnot even continuous at (0, 0), let alone differentiable.
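One can watch this pathology numerically; the following Python sketch merely transcribes the example above and shows both partial functions vanishing while the diagonal values stay at 1/2:

    # The Pitfall mapping: both partial derivatives exist at (0,0) since
    # each partial function is constant 0, yet phi(t,t) = 1/2 for t != 0,
    # so phi is not even continuous at (0,0).
    def phi(s, t):
        return s * t / (s**2 + t**2) if (s, t) != (0.0, 0.0) else 0.0

    for t in [1e-1, 1e-3, 1e-6]:
        print(phi(t, 0.0), phi(0.0, t), phi(t, t))   # 0.0  0.0  0.5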

The second assertion of Prop.1 shows that if ϕ is of class C1, then ∇(1)ϕ and ∇(2)ϕ exist and are continuous. The converse of this statement is true, but the proof is highly nontrivial:

Proposition 2: Let E1, E2, E′ be flat spaces. Let D be an open subset of E1 × E2 and D′ an open subset of E′. A mapping ϕ : D → D′ is of class C1 if (and only if) the partial gradients ∇(1)ϕ and ∇(2)ϕ exist and are continuous.

Proof: Let (x1, x2) ∈ D. Since D − (x1, x2) is a neighborhood of (0,0) in V1 × V2, we may choose M1 ∈ Nhd0(V1) and M2 ∈ Nhd0(V2) such that

M := M1 ×M2 ⊂ (D − (x1, x2)).

We define m : M → V ′ by

m(v1,v2) := ϕ((x1, x2) + (v1,v2)) − ϕ(x1, x2)

− (∇(1)ϕ(x1, x2)v1 + ∇(2)ϕ(x1, x2)v2).


It suffices to show that m ∈ Small(V1 × V2,V′), for if this is the case, then the Characterization of Gradients of Sect.63 tells us that ϕ is differentiable at (x1, x2) and that its gradient at (x1, x2) is given by (65.4). In addition, since (x1, x2) ∈ D is arbitrary, we can then conclude that (65.5) holds and, since both ∇(1)ϕ and ∇(2)ϕ are continuous, that ∇ϕ is continuous.

We note that m = h + k, where h : M → V′ and k : M → V′ are defined by

h(v1,v2) := ϕ(x1, x2 + v2) − ϕ(x1, x2) − ∇(2)ϕ(x1, x2)v2,
k(v1,v2) := ϕ(x1 + v1, x2 + v2) − ϕ(x1, x2 + v2) − ∇(1)ϕ(x1, x2)v1.
(65.6)

The differentiability of ϕ(x1, ·) at x2 insures that h belongs to Small(V1 × V2,V′) and it only remains to be shown that k belongs to Small(V1 × V2,V′).

We choose norms ν1, ν2, and ν′ on V1, V2, and V′, respectively. Let ε ∈ P× be given. Since ∇(1)ϕ is continuous at (x1, x2), there are open convex neighborhoods N1 and N2 of 0 in V1 and V2, respectively, such that N1 ⊂ M1, N2 ⊂ M2, and

||∇(1)ϕ((x1, x2) + (v1,v2)) −∇(1)ϕ(x1, x2) ||ν1,ν′ < ε

for all (v1,v2) ∈ N1 ×N2. Noting (65.6), we see that this is equivalent to

||∇v1k(·,v2) ||ν1,ν′ < ε for all v1 ∈ N1, v2 ∈ N2.

Applying the Striction Estimate of Sect.64 to k(·,v2)|N1, we infer that str(k(·,v2)|N1; ν1, ν′) ≤ ε. It is clear from (65.6) that k(0,v2) = 0 for all v2 ∈ N2. Hence we can conclude, in view of (64.1), that

ν′(k(v1,v2)) ≤ εν1(v1) ≤ ε max{ν1(v1), ν2(v2)} for all v1 ∈ N1, v2 ∈ N2.

Since ε ∈ P× was arbitrary, it follows that

lim_{(v1,v2)→(0,0)} ν′(k(v1,v2))/max{ν1(v1), ν2(v2)} = 0,

which, in view of (62.1) and (51.23), shows that k ∈ Small(V1 × V2,V′) as required.

Remark: In examining the proof above one observes that one needs merely the existence of the partial gradients and the continuity of one of them in order to conclude that the mapping is differentiable.

We now generalize Prop.2 to the case when E := ×(Ei | i ∈ I) is the set-product of a finite family (Ei | i ∈ I) of flat spaces. This product E is a flat space whose translation space is the set-product V := ×(Vi | i ∈ I) of the translation spaces Vi of the Ei, i ∈ I. Given any x ∈ E and j ∈ I, we define the mapping (x.j) : Ej → E according to (04.24).

We consider a mapping ϕ : D → D′ from an open subset D of E into an open subset D′ of E′. Given any x ∈ D and j ∈ I, we put

D(x.j) := (x.j)<(D) ⊂ Ej (65.7)

and define ϕ(x.j) : D(x.j) → D′ according to (04.25). If ϕ(x.j) is differentiable at xj for all x ∈ D, we define the partial j-gradient ∇(j)ϕ : D → Lin(Vj,V′) of ϕ by ∇(j)ϕ(x) := ∇xjϕ(x.j) for all x ∈ D.

In the special case when Ej := R for some j ∈ I, the partial j-derivative ϕ,j : D → V ′, defined by

ϕ,j(x) := (∂ϕ(x.j))(xj) for all x ∈ D, (65.8)

is related to ∇(j)ϕ by ∇(j)ϕ = ϕ,j⊗ and ϕ,j = (∇(j)ϕ)1.

In the case when I := {1, 2}, these notations and concepts reduce to the ones explained in the beginning (see (04.21)).

The following result generalizes Prop.1.

Proposition 3: If ϕ : D → D′ is differentiable at x = (xi | i ∈ I) ∈ D, then the gradients ∇xjϕ(x.j) exist for all j ∈ I and

(∇xϕ)v = ∑_{j∈I} (∇xjϕ(x.j))vj (65.9)

for all v = (vi | i ∈ I) ∈ V.

If ϕ is differentiable, then the partial gradients ∇(j)ϕ exist for all j ∈ I and we have

∇ϕ = ⊕_{j∈I} (∇(j)ϕ), (65.10)

where the operation ⊕ on the right is understood as value-wise application of (14.18).

In the case when E := RI, we can put Ei := R for all i ∈ I and (65.10) reduces to

∇ϕ = lnc(ϕ,i | i∈I), (65.11)

where the value at x ∈ D of the right side is understood to be the linear-combination mapping of the family (ϕ,i(x) | i ∈ I) in V′. If, moreover, E′ is also a space of the form E′ := RK with some finite index set K, then


lnc(ϕ,i | i∈I) can be identified with the function which assigns to each x ∈ D the matrix

(ϕk,i(x) | (k, i) ∈ K × I) ∈ RK×I ≅ Lin(RI,RK).

Hence ∇ϕ can be identified with the matrix

∇ϕ = (ϕk,i | (k, i) ∈ K × I) (65.12)

of the partial derivatives ϕk,i : D → R of the component functions. As we have seen, the mere existence of these partial derivatives is not sufficient to insure the existence of ∇ϕ. Only if ∇ϕ is known a priori to exist is it possible to use the identification (65.12).

Using the natural isomorphism between ×(Ei | i ∈ I) and Ej × (×(Ei | i ∈ I \ {j})) and Prop.2, one easily proves, by induction, the following generalization.

Partial Gradient Theorem: Let I be a finite index set and let Ei, i ∈ I, and E′ be flat spaces. Let D be an open subset of E := ×(Ei | i ∈ I) and D′ an open subset of E′. A mapping ϕ : D → D′ is of class C1 if and only if the partial gradients ∇(i)ϕ exist and are continuous for all i ∈ I.

The following result is an immediate corollary:

Proposition 4: Let I and K be finite index sets and let D and D′ be open subsets of RI and RK, respectively. A mapping ϕ : D → D′ is of class C1 if and only if the partial derivatives ϕk,i : D → R exist and are continuous for all k ∈ K and all i ∈ I.

Let E, E′ be flat spaces with translation spaces V and V′, respectively. Let D, D′ be open subsets of E and E′, respectively, and consider a mapping ϕ : D → D′. Given any x ∈ D and v ∈ V×, the range of the mapping (s ↦ (x + sv)) : R → E is a line through x whose direction space is Rv. Let Sx,v := {s ∈ R | x + sv ∈ D} be the pre-image of D under this mapping and x + 1Sx,v v : Sx,v → D a corresponding adjustment of the mapping. Since D is open, Sx,v is an open neighborhood of zero in R and ϕ ∘ (x + 1Sx,v v) : Sx,v → D′ is a process. The derivative at 0 ∈ R of this process, if it exists, is called the directional derivative of ϕ at x and is denoted by

(ddvϕ)(x) := ∂0(ϕ ∘ (x + 1Sx,v v)) = lim_{s→0} (ϕ(x + sv) − ϕ(x))/s. (65.13)


If this directional derivative exists for all x ∈ D, it defines a function

ddvϕ : D → V ′

which is called the directional v-derivative of ϕ. The following result is immediate from the General Chain Rule:

Proposition 5: If ϕ : D → D′ is differentiable at x ∈ D then the

directional v-derivative of ϕ at x exists for all v ∈ V and is given by

(ddvϕ)(x) = (∇xϕ)v. (65.14)

Pitfall: The converse of this Proposition is false. In fact, a mapping can have directional derivatives in all directions at all points in its domain without being differentiable. For example, the mapping ϕ : R2 → R defined by

ϕ(s, t) := s^2 t/(s^4 + t^2) if (s, t) ≠ (0, 0), ϕ(s, t) := 0 if (s, t) = (0, 0),

has the directional derivatives

(dd(α,β)ϕ)(0, 0) = α^2/β if β ≠ 0, (dd(α,β)ϕ)(0, 0) = 0 if β = 0

at (0, 0). Using Prop.4 one easily shows that ϕ|R2\{(0,0)} is of class C1 and hence, by Prop.5, ϕ has directional derivatives in all directions at all (s, t) ∈ R2. Nevertheless, since ϕ(s, s^2) = 1/2 for all s ∈ R×, ϕ is not even continuous at (0, 0), let alone differentiable.
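Again the pathology is easy to observe numerically; the following Python sketch merely transcribes the example above: the difference quotients along a fixed direction converge to α^2/β, yet the values on the parabola t = s^2 remain 1/2.

    def phi(s, t):
        return s * s * t / (s**4 + t**2) if (s, t) != (0.0, 0.0) else 0.0

    a, b = 1.0, 2.0   # a direction (alpha, beta) with beta != 0
    for s in [1e-2, 1e-4, 1e-6]:
        print(phi(s * a, s * b) / s)   # -> alpha^2/beta = 0.5
    print(phi(1e-3, 1e-6))             # = 0.5 on the parabola t = s^2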

Proposition 6: Let b be a set basis of V. Then ϕ : D → D′ is of class C1 if and only if the directional b-derivatives ddbϕ : D → V′ exist and are continuous for all b ∈ b.

Proof: We choose q ∈ D and define α : Rb → E by α(λ) := q + ∑_{b∈b} λb b for all λ ∈ Rb. Since b is a basis, α is a flat isomorphism. If we define D̄ := α<(D) ⊂ Rb and ϕ̄ := ϕ ∘ α|D̄ : D̄ → D′, we see that the directional b-derivatives of ϕ correspond to the partial derivatives of ϕ̄. The assertion follows from the Partial Gradient Theorem, applied to the case when I := b and Ei := R for all i ∈ I.

Combining Prop.5 and Prop.6, we obtain

Proposition 7: Assume that S ∈ SubV spans V. The mapping ϕ : D → D′ is of class C1 if and only if the directional derivatives ddvϕ : D → V′ exist and are continuous for all v ∈ S.


Notes 65

(1) The notation ∇(i) for the partial i-gradient is introduced here for the first time. Iam not aware of any other notation in the literature.

(2) The notation ϕ,i for the partial i-derivative of ϕ is the only one that occurs frequently in the literature and is not objectionable. Frequently seen notations such as ∂ϕ/∂xi or ϕxi for ϕ,i are poison to me because they contain dangling dummies (see Part D of the Introduction).

(3) Some people use the notation ∇v for the directional derivative ddv. I am introducing ddv because (by Def.1 of Sect.63) ∇v means something else, namely the gradient at v.

66 The General Product Rule

The following result shows that bilinear mappings (see Sect.24) are of class C1 and hence continuous. (They are not uniformly continuous except when zero.)

Proposition 1: Let V1, V2, and W be linear spaces. Every bilinear mapping B : V1 × V2 → W is of class C1 and its gradient ∇B : V1 × V2 → Lin(V1 × V2, W) is the linear mapping given by

    ∇B(v1, v2) = (B∼v2) ⊕ (Bv1)   (66.1)

for all v1 ∈ V1, v2 ∈ V2.

Proof: Since B(v1, ·) = Bv1 : V2 → W (see (24.2)) is linear for each v1 ∈ V1, the partial 2-gradient of B exists and is given by ∇(2)B(v1, v2) = Bv1 for all (v1, v2) ∈ V1 × V2. It is evident that ∇(2)B : V1 × V2 → Lin(V2, W) is linear and hence continuous. A similar argument shows that ∇(1)B is given by ∇(1)B(v1, v2) = B∼v2 for all (v1, v2) ∈ V1 × V2 and hence is also linear and continuous. By the Partial Gradient Theorem of Sect.65, it follows that B is of class C1. The formula (66.1) is a consequence of (65.5).

Let D be an open set in a flat space E with translation space V. If h1 : D → V1, h2 : D → V2 and B ∈ Lin2(V1 × V2, W), we write

    B(h1, h2) := B ∘ (h1, h2) : D → W,

so that

    B(h1, h2)(x) = B(h1(x), h2(x))  for all x ∈ D.


The following theorem, which follows directly from Prop.1 and the General Chain Rule of Sect.63, is called “Product Rule” because the term “product” is often used for the bilinear forms to which the theorem is applied.

General Product Rule: Let V1, V2, and W be linear spaces and let B ∈ Lin2(V1 × V2, W) be given. Let D be an open subset of a flat space and let mappings h1 : D → V1 and h2 : D → V2 be given. If h1 and h2 are both differentiable at x ∈ D, so is B(h1, h2), and we have

    ∇xB(h1, h2) = (Bh1(x))∇xh2 + (B∼h2(x))∇xh1.   (66.2)

If h1 and h2 are both differentiable [of class C1], so is B(h1, h2), and we have

    ∇B(h1, h2) = (Bh1)∇h2 + (B∼h2)∇h1,   (66.3)

where the products on the right are understood as value-wise compositions.

We now apply the General Product Rule to special bilinear mappings, namely to the ordinary product in R and to the four examples given in Sect.24. In the following list of results, D denotes an open subset of a flat space having V as its translation space; W, W′, and W′′ denote linear spaces.

(I) If f : D → R and g : D → R are differentiable [of class C1], so is the value-wise product fg : D → R, and we have

    ∇(fg) = f∇g + g∇f.   (66.4)

(II) If f : D → R and h : D → W are differentiable [of class C1], so is the value-wise scalar multiple fh : D → W, and we have

    ∇(fh) = h ⊗ ∇f + f∇h.   (66.5)

(III) If h : D → W and η : D → W∗ are differentiable [of class C1], so is the function ηh : D → R defined by (ηh)(x) := η(x)h(x) for all x ∈ D, and we have

    ∇(ηh) = (∇h)⊤η + (∇η)⊤h.   (66.6)

(IV) If F : D → Lin(W, W′) and h : D → W are differentiable at x ∈ D, so is Fh (defined by (Fh)(y) := F(y)h(y) for all y ∈ D), and we have

    ∇x(Fh)v = ((∇xF)v)h(x) + (F(x)∇xh)v   (66.7)

for all v ∈ V. If F and h are differentiable [of class C1], so is Fh, and (66.7) holds for all x ∈ D, v ∈ V.


(V) If F : D → Lin(W, W′) and G : D → Lin(W′, W′′) are differentiable at x ∈ D, so is GF, defined by value-wise composition, and we have

    ∇x(GF)v = ((∇xG)v)F(x) + G(x)((∇xF)v)   (66.8)

for all v ∈ V. If F and G are differentiable [of class C1], so is GF, and (66.8) holds for all x ∈ D, v ∈ V.

If W is an inner-product space, then W∗ can be identified with W (see Sect.41) and (III) reduces to the following result.

(VI) If h, k : D → W are differentiable [of class C1], so is their value-wise inner product h · k, and

    ∇(h · k) = (∇h)⊤k + (∇k)⊤h.   (66.9)

In the case when D reduces to an interval in R, (66.4) becomes the Product Rule (08.32)₂ of elementary calculus. The following formulas apply to differentiable processes f, h, η, F, and G with values in R, W, W∗, Lin(W, W′), and Lin(W′, W′′), respectively:

(fh)• = f•h + fh•, (66.10)

(ηh)• = η•h + ηh•, (66.11)

(Fh)• = F•h + Fh•, (66.12)

(GF)• = G•F + GF•. (66.13)

If h and k are differentiable processes with values in an inner-product space, then

(h · k)• = h• · k + h · k•. (66.14)
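As a sketch of how such formulas can be verified numerically (assuming Python with numpy; the processes F and G below are our own illustrative choices), a finite-difference derivative of GF agrees with the right side of (66.13):

    import numpy as np

    rng = np.random.default_rng(4)
    F0, F1, G0, G1 = rng.standard_normal((4, 3, 3))

    def F(t): return F0 + np.sin(t)*F1
    def G(t): return G0 + t*t*G1
    def Fdot(t): return np.cos(t)*F1
    def Gdot(t): return 2*t*G1

    t, eps = 0.6, 1e-6
    lhs = (G(t+eps) @ F(t+eps) - G(t) @ F(t)) / eps   # (GF)• by finite differences
    rhs = Gdot(t) @ F(t) + G(t) @ Fdot(t)             # formula (66.13)
    print(np.max(np.abs(lhs - rhs)))                  # small, of order eps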

Proposition 2: Let W be an inner-product space and let R : D → LinW be given. If R is differentiable at x ∈ D and if Rng R ⊂ OrthW, then

    R(x)⊤((∇xR)v) ∈ SkewW  for all v ∈ V.   (66.15)

Conversely, if R is differentiable, if D is convex, if (66.15) holds for all x ∈ D, and if R(q) ∈ OrthW for some q ∈ D, then Rng R ⊂ OrthW.

Proof: Assume that R is differentiable at x ∈ D. Using (66.8) with the choice F := R and G := R⊤ we find that

    ∇x(R⊤R)v = ((∇xR⊤)v)R(x) + R⊤(x)((∇xR)v)


holds for all v ∈ V. By (63.13) we obtain

    ∇x(R⊤R)v = W⊤ + W  with  W := R⊤(x)((∇xR)v)   (66.16)

for all v ∈ V. Now, if Rng R ⊂ OrthW, then R⊤R = 1W is constant (see Prop.2 of Sect.43). Hence the left side of (66.16) is zero and W must be skew for all v ∈ V, i.e. (66.15) holds.

Conversely, if (66.15) holds for all x ∈ D, then, by (66.16), ∇x(R⊤R) = 0 for all x ∈ D. Since D is convex, we can apply Prop.3 of Sect.64 to conclude that R⊤R must be constant. Hence, if R(q) ∈ OrthW, i.e. if (R⊤R)(q) = 1W, then R⊤R is the constant 1W, i.e. Rng R ⊂ OrthW.

Remark: In the second part of Prop.2, the condition that D be convex can be replaced by the one that D be “connected”, as explained in the Remark after Prop.3 of Sect.64.

Corollary: Let I be an open interval and let R : I → LinW be a differentiable process. Then Rng R ⊂ OrthW if and only if R(t) ∈ OrthW for some t ∈ I and Rng(R⊤R•) ⊂ SkewW.
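A minimal numerical illustration of the Corollary (assuming numpy; the rotation process R below is our own example): for a process with values in OrthW, the lineon R(t)⊤R•(t) is skew.

    import numpy as np

    def R(t):
        # a differentiable rotation process with values in Orth(R^2)
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, -s], [s, c]])

    t, eps = 0.9, 1e-6
    Rdot = (R(t + eps) - R(t)) / eps   # finite-difference approximation of R•(t)
    W = R(t).T @ Rdot                  # should be skew by the Corollary
    print(np.round(W + W.T, 5))        # approximately the zero matrix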

The following result shows that quadratic forms (see Sect.27) are of class C1 and hence continuous.

Proposition 3: Let V be a linear space. Every quadratic form Q : V → R is of class C1 and its gradient ∇Q : V → V∗ is the linear mapping

    ∇Q = 2Q,   (66.17)

where Q is identified with the symmetric bilinear form Q ∈ Sym2(V², R) ≅ Sym(V, V∗) associated with Q (see (27.14)).

Proof: By (27.14), we have

    Q(u) = Q(u, u) = (Q(1V, 1V))(u)

for all u ∈ V. Hence, if we apply the General Product Rule with the choices B := Q and h1 := h2 := 1V, we obtain ∇Q = Q + Q∼. Since Q is symmetric, this reduces to (66.17).

Let V be a linear space. For each n ∈ N, the lineonic nth power pown : LinV → LinV on the algebra LinV of lineons (see Sect.18) is defined by

    pown(L) := L^n  for all L ∈ LinV.   (66.18)

The following result is a generalization of the familiar differentiation rule (ι^n)• = nι^(n−1).


Proposition 4: For every n ∈ N the lineonic nth power pown on LinV defined by (66.18) is of class C1 and, for each L ∈ LinV, its gradient ∇Lpown ∈ Lin(LinV) is given by

    (∇Lpown)M = ∑_(k∈n]) L^(k−1) M L^(n−k)  for all M ∈ LinV.   (66.19)

Proof: For n = 0 the assertion is trivial. For n = 1, we have pow1 = 1LinV and hence ∇Lpow1 = 1LinV for all L ∈ LinV, which is consistent with (66.19). Also, since ∇pow1 is constant, pow1 is of class C1. Assume, then, that the assertion is valid for a given n ∈ N. Since pown+1 = pow1pown holds in terms of value-wise composition, we can apply the result (V) above to the case when F := pown, G := pow1 in order to conclude that pown+1 is of class C1 and that

    (∇Lpown+1)M = ((∇Lpow1)M)pown(L) + pow1(L)((∇Lpown)M)

for all M ∈ LinV. Using (66.18) and (66.19) we get

    (∇Lpown+1)M = M L^n + L(∑_(k∈n]) L^(k−1) M L^(n−k)),

which shows that (66.19) remains valid when n is replaced by n + 1. The desired result follows by induction.

If M commutes with L, then (66.19) reduces to

    (∇Lpown)M = (nL^(n−1))M.
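Formula (66.19) lends itself to a direct numerical check (a sketch assuming numpy; L and M are random illustrative matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    L, M = rng.standard_normal((2, 3, 3))

    # directional derivative of pow_n at L in direction M, by finite differences
    eps = 1e-6
    fd = (np.linalg.matrix_power(L + eps*M, n) - np.linalg.matrix_power(L, n)) / eps

    # the sum on the right of (66.19): L^(k-1) M L^(n-k) for k = 1, ..., n
    grad = sum(np.linalg.matrix_power(L, k-1) @ M @ np.linalg.matrix_power(L, n-k)
               for k in range(1, n+1))
    print(np.max(np.abs(fd - grad)))   # small, of order eps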

Using Prop.4 and the form (63.17) of the Chain Rule, we obtain

Proposition 5: Let I be an interval and let F : I → LinV be a process. If F is differentiable [of class C1], so is its value-wise nth power F^n : I → LinV, and we have

    (F^n)• = ∑_(k∈n]) F^(k−1) F• F^(n−k)  for all n ∈ N.   (66.20)

67 Divergence, Laplacian

In this section, D denotes an open subset of a flat space E with translation space V, and W denotes a linear space.


If h : D → V is differentiable at x ∈ D, we can form the trace (see Sect.26) of the gradient ∇xh ∈ LinV. The result

    divxh := tr(∇xh)   (67.1)

is called the divergence of h at x. If h is differentiable, we can define the divergence div h : D → R of h by

    (div h)(x) := divxh  for all x ∈ D.   (67.2)

Using the product rule (66.5) and (26.3) we obtain

Proposition 1: Let h : D → V and f : D → R be differentiable. Then the divergence of fh : D → V is given by

    div(fh) = (∇f)h + f div h.   (67.3)
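A finite-difference sketch of (67.3) (assuming numpy; the linear fields f and h below are our own simple examples):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))     # h(x) := A x
    c = rng.standard_normal(3)          # f(x) := c . x
    x = rng.standard_normal(3)

    def h(y): return A @ y
    def f(y): return c @ y

    def div(field, y, eps=1e-6):
        # forward-difference trace of the gradient
        return sum((field(y + eps*np.eye(3)[i])[i] - field(y)[i]) / eps
                   for i in range(3))

    lhs = div(lambda y: f(y)*h(y), x)
    rhs = c @ h(x) + f(x)*div(h, x)     # (grad f)h + f div h
    print(lhs, rhs)                     # agree up to O(eps)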

Consider now a mapping H : D → Lin(V∗, W) that is differentiable at x ∈ D. For every ω ∈ W∗ we can form the value-wise composite ωH : D → Lin(V∗, R) = V∗∗ ≅ V (see Sect.22). Since ωH is differentiable at x for every ω ∈ W∗ (see Prop.3 of Sect.63), we may consider the mapping

    (ω ↦ tr(∇x(ωH))) : W∗ → R.

It is clear that this mapping is linear and hence an element of W∗∗ ≅ W.

Definition 1: Let H : D → Lin(V∗, W) be differentiable at x ∈ D. Then the divergence of H at x is defined to be the (unique) element divxH of W which satisfies

    ω(divxH) = tr(∇x(ωH))  for all ω ∈ W∗.   (67.4)

If H is differentiable, then its divergence div H : D → W is defined by

    (div H)(x) := divxH  for all x ∈ D.   (67.5)

In the case when W := R, we also have W∗ ≅ R and Lin(V∗, W) = V∗∗ ≅ V. Thus, using (67.4) with ω := 1 ∈ R, we see that the definition just given is consistent with (67.1) and (67.2).

If we replace h and L in Prop.3 of Sect.63 by H and

    (K ↦ ωK) ∈ Lin(Lin(V∗, W), V),

respectively, we obtain

    ∇x(ωH) = ω∇xH  for all ω ∈ W∗,   (67.6)


where the right side must be interpreted as the composite of ∇xH ∈ Lin2(V × V∗, W) with ω ∈ Lin(W, R). This composite, being an element of Lin2(V × V∗, R), must be reinterpreted as an element of LinV via the identifications Lin2(V × V∗, R) ≅ Lin(V, Lin(V∗, R)) = Lin(V, V∗∗) ≅ LinV to make (67.6) meaningful. Using (67.6), (67.4), and (67.1) we see that div H satisfies

    ω(divxH) = divx(ωH) = tr(ω∇xH)   (67.7)

for all ω ∈ W∗.

The following results generalize Prop.1.

Proposition 2: Let H : D → Lin(V∗, W) and ρ : D → W∗ be differentiable. Then the divergence of the value-wise composite ρH : D → Lin(V∗, R) ≅ V is given by

    div(ρH) = ρ div H + tr(H⊤∇ρ),   (67.8)

where value-wise evaluation and composition are understood.

Proof: Let x ∈ D and v ∈ V be given. Using (66.8) with the choices G := ρ and F := H we obtain

    ∇x(ρH)v = ((∇xρ)v)H(x) + ρ(x)((∇xH)v).   (67.9)

Since (∇xρ)v ∈ W∗, (21.3) gives

    ((∇xρ)v)H(x) = (H(x)⊤∇xρ)v.

On the other hand, we have

    ρ(x)((∇xH)v) = (ρ(x)(∇xH))v

if we interpret ∇xH on the right as an element of Lin2(V × V∗, W). Hence, since v ∈ V was arbitrary, (67.9) gives

    ∇x(ρH) = ρ(x)∇xH + H(x)⊤∇xρ.

Taking the trace, using (67.1), and using (67.7) with ω := ρ(x), we get

    divx(ρH) = ρ(x)divxH + tr(H(x)⊤∇xρ).

Since x ∈ D was arbitrary, the desired result (67.8) follows.

Proposition 3: Let h : D → V and k : D → W be differentiable. The divergence of k ⊗ h : D → Lin(V∗, W), defined by taking the value-wise tensor product (see Sect.25), is then given by

    div(k ⊗ h) = (∇k)h + (div h)k.   (67.10)


Proof: By (67.7) we have

    ω divx(k ⊗ h) = divx(ω(k ⊗ h)) = divx((ωk)h)

for all x ∈ D and all ω ∈ W∗. Hence, by Prop.1,

    ω div(k ⊗ h) = (∇(ωk))h + (ωk)div h = ω((∇k)h + k div h)

holds for all ω ∈ W∗, and (67.10) follows.

Proposition 4: Let H : D → Lin(V∗, W) and f : D → R be differentiable. The divergence of fH : D → Lin(V∗, W) is then given by

    div(fH) = f div H + H(∇f).   (67.11)

Proof: Let ω ∈ W∗ be given. If we apply Prop.2 to the case when ρ := fω we obtain

    div(ω(fH)) = ω(f div H) + tr(H⊤∇(fω)).   (67.12)

Using ∇(fω) = ω ⊗ ∇f and using (25.9), (26.3), and (22.3), we obtain tr(H⊤∇(fω)) = tr(H⊤(ω ⊗ ∇f)) = ∇f(H⊤ω) = ω(H∇f). Hence (67.12) and (67.7) yield ω div(fH) = ω(f div H + H∇f). Since ω ∈ W∗ was arbitrary, (67.11) follows.

From now on we assume that E has the structure of a Euclidean space, so that V becomes an inner-product space and we can use the identification V∗ ≅ V (see Sect.41). Thus, if k : D → W is twice differentiable, we can consider the divergence of ∇k : D → Lin(V, W) ≅ Lin(V∗, W).

Definition 2: Let k : D → W be twice differentiable. Then the Laplacian of k is defined to be

    ∆k := div(∇k).   (67.13)

If ∆k = 0, then k is called a harmonic function.

Remark: In the case when E is not a genuine Euclidean space, but one whose translation space has index 1 (see Sect.47), the term D’Alembertian or Wave-Operator and the symbol □ are often used instead of Laplacian and ∆.

The following result is a direct consequence of (66.5) and Props.3 and 4.

Proposition 5: Let k : D → W and f : D → R be twice differentiable. The Laplacian of fk : D → W is then given by

    ∆(fk) = (∆f)k + f∆k + 2(∇k)(∇f).   (67.14)


The following result follows directly from the form (63.19) of the Chain Rule and Prop.3.

Proposition 6: Let I be an open interval and let f : D → I and g : I → W both be twice differentiable. The Laplacian of g ∘ f : D → W is then given by

    ∆(g ∘ f) = (∇f)·²(g•• ∘ f) + (∆f)(g• ∘ f).   (67.15)

We now discuss an important application of Prop.6. We consider the case when f : D → I is given by

    f(x) := (x − q)·²  for all x ∈ D,   (67.16)

where q ∈ E is given. We wish to solve the following problem: How can D, I, and g : I → W be chosen such that g ∘ f is harmonic?

It follows directly from (67.16) and Prop.3 of Sect.66, applied to Q := sq, that ∇xf = 2(x − q) for all x ∈ D and that hence (∇f)·² = 4f. Also, ∇(∇f) is the constant with value 2·1V and hence, by (26.9),

    ∆f = div(∇f) = 2 tr 1V = 2 dim V = 2 dim E.

Substitution of these results into (67.15) gives

    ∆(g ∘ f) = 4f(g•• ∘ f) + 2(dim E)(g• ∘ f).   (67.17)

Hence, g ∘ f is harmonic if and only if

    (2ιg•• + ng•) ∘ f = 0  with  n := dim E,   (67.18)

where ι is the identity mapping of R, suitably adjusted (see Sect.08). Now, if we choose I := Rng f, then (67.18) is satisfied if and only if g satisfies the ordinary differential equation 2ιg•• + ng• = 0. It follows that g must be of the form

    g = a ι^(−(n/2−1)) + b   if n ≠ 2,
    g = a log + b            if n = 2,                  (67.19)

where a, b ∈ W, provided that I is an interval.

If E is a genuine Euclidean space, we may take D := E \ {q} and I := Rng f = P×. We then obtain the harmonic function

    h = a r^(−(n−2)) + b     if n ≠ 2,
    h = a(log r) + b         if n = 2,                  (67.20)

where a, b ∈ W and where r : E \ {q} → P× is defined by r(x) := |x − q|.
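In a genuine three-dimensional Euclidean space, (67.20) predicts that x ↦ 1/|x − q| is harmonic away from q; a central-difference Laplacian confirms this numerically (a sketch assuming numpy):

    import numpy as np

    q = np.zeros(3)
    def h(x):
        # candidate harmonic function for n = 3: r^-(n-2) = 1/|x - q|
        return 1.0 / np.linalg.norm(x - q)

    def laplacian(f, x, eps=1e-4):
        # sum of central second differences along the coordinate axes
        return sum((f(x + eps*e) - 2*f(x) + f(x - eps*e)) / eps**2
                   for e in np.eye(3))

    x = np.array([0.7, -0.4, 1.2])
    print(laplacian(h, x))   # close to 0 away from q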


If the Euclidean space E is not genuine and V is double-signed, we have two possibilities. For all n ∈ N× we may take D := {x ∈ E | (x − q)·² > 0} and I := P×. If n is even and n ≠ 2, we may instead take D := {x ∈ E | (x − q)·² < 0} and I := −P×.

Notes 67

(1) In some of the literature on “Vector Analysis”, the notation ∇ · h instead of div h is used for the divergence of a vector field h, and the notation ∇²f instead of ∆f for the Laplacian of a function f. These notations come from a formalistic and erroneous understanding of the meaning of the symbol ∇ and should be avoided.

68 Local Inversion, Implicit Mappings

In this section, D and D′ denote open subsets of flat spaces E and E′ with translation spaces V and V′, respectively.

To say that a mapping ϕ : D → D′ is differentiable at x ∈ D means, roughly, that ϕ may be approximated, near x, by a flat mapping α, the tangent of ϕ at x. One might expect that if the tangent α is invertible, then ϕ itself is, in some sense, “locally invertible near x”. To decide whether this expectation is justified, we must first give a precise meaning to “locally invertible near x”.

Definition: Let a mapping ϕ : D → D′ be given. We say that ψ is a local inverse of ϕ if ψ = (ϕ|_N^N′)← for suitable open subsets N = Codψ = Rngψ and N′ = Domψ of D and D′, respectively. We say that ψ is a local inverse of ϕ near x ∈ D if x ∈ Rngψ. We say that ϕ is locally invertible near x ∈ D if it has some local inverse near x.

To say that ψ is a local inverse of ϕ means that

    ψ ∘ ϕ|_N^N′ = 1N  and  ϕ|_N^N′ ∘ ψ = 1N′   (68.1)

for suitable open subsets N = Codψ and N′ = Domψ of D and D′, respectively.

If ψ1 and ψ2 are local inverses of ϕ and M := (Rngψ1) ∩ (Rngψ2), then ψ1 and ψ2 must agree on ϕ>(M) = ψ1<(M) = ψ2<(M), i.e.

    ψ1|_ϕ>(M)^M = ψ2|_ϕ>(M)^M  if  M := (Rngψ1) ∩ (Rngψ2).   (68.2)

Pitfall: A mapping ϕ : D → D′ may be locally invertible near every point in D without being invertible, even if it is surjective. In fact, if ψ is a local inverse of ϕ, ϕ<(Domψ) need not be included in Rngψ.


For example, the function f : ]−1, 1[× → ]0, 1[ defined by f(s) := |s| for all s ∈ ]−1, 1[× is easily seen to be locally invertible, surjective, and even continuous. The identity function 1]0,1[ of ]0, 1[ is a local inverse of f, and we have

    f<(Dom 1]0,1[) = f<( ]0, 1[ ) = ]−1, 1[× ⊄ ]0, 1[ = Rng 1]0,1[.

The function h : ]0, 1[ → ]−1, 0[ defined by h(s) := −s for all s ∈ ]0, 1[ is another local inverse of f.

If dim E ≥ 2, one can even give counter-examples of mappings ϕ : D → D′ of the type described above for which D is convex (see Sect.74).

The following two results are recorded for later use.

Proposition 1: Assume that ϕ : D → D′ is differentiable at x ∈ D and locally invertible near x. Then, if some local inverse near x is differentiable at ϕ(x), so are all others, and all have the same gradient, namely (∇xϕ)−1.

Proof: We choose a local inverse ψ1 of ϕ near x such that ψ1 is differentiable at ϕ(x). Applying the Chain Rule to (68.1) gives

    (∇ϕ(x)ψ1)(∇xϕ) = 1V  and  (∇xϕ)(∇ϕ(x)ψ1) = 1V′,

which shows that ∇xϕ is invertible and ∇ϕ(x)ψ1 = (∇xϕ)−1.

Let now a local inverse ψ2 of ϕ near x be given. Let M be defined as in (68.2). Since M is open and since ψ1, being differentiable at ϕ(x), is continuous at ϕ(x), it follows that ϕ>(M) = ψ1<(M) is a neighborhood of ϕ(x) in E′. By (68.2), ψ2 agrees with ψ1 on the neighborhood ϕ>(M) of ϕ(x). Hence ψ2 must also be differentiable at ϕ(x), and ∇ϕ(x)ψ2 = ∇ϕ(x)ψ1 = (∇xϕ)−1.

Proposition 2: Assume that ϕ : D → D′ is continuous and that ψ is a local inverse of ϕ near x.

(i) If M′ is an open neighborhood of ϕ(x) and M′ ⊂ Domψ, then ψ>(M′) = ϕ<(M′) ∩ Rngψ is an open neighborhood of x; hence the adjustment ψ|_M′^ψ>(M′) is again a local inverse of ϕ near x.

(ii) Let G be an open subset of E with x ∈ G. If ψ is continuous at ϕ(x), then there is an open neighborhood M of x with M ⊂ Rngψ ∩ G such that ψ<(M) = ϕ>(M) is open; hence the adjustment ψ|_ψ<(M)^M is again a local inverse of ϕ near x.

Proof: Part (i) is an immediate consequence of Prop.3 of Sect.56. To prove (ii), we observe first that ψ<(Rngψ ∩ G) must be a neighborhood of ϕ(x) = ψ←(x). We can choose an open subset M′ of ψ<(Rngψ ∩ G) with ϕ(x) ∈ M′ (see Sect.53). Application of (i) gives the desired result with M := ψ>(M′).

Remark: It is in fact true that if ϕ : D → D′ is continuous, then every local inverse of ϕ is also continuous, but the proof is highly non-trivial and goes beyond the scope of this presentation. Thus, in Part (ii) of Prop.2, the requirement that ψ be continuous at ϕ(x) is, in fact, redundant.

Pitfall: The expectation mentioned in the beginning is not justified. A continuous mapping ϕ : D → D′ can have an invertible tangent at x ∈ D without being locally invertible near x. An example is the function f : R → R defined by

    f(t) := 2t² sin(1/t) + t   if t ∈ R×,
    f(t) := 0                  if t = 0.                  (68.3)

It is differentiable and hence continuous. Since f•(0) = 1, the tangent to f at 0 is invertible. However, f is not monotone, let alone locally invertible, near 0, because one can find numbers s arbitrarily close to 0 such that f•(s) < 0.
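The failure of monotonicity can be exhibited numerically (a sketch assuming numpy; fprime below is the elementary derivative of (68.3) for t ≠ 0):

    import numpy as np

    def fprime(t):
        # derivative of f(t) = 2 t^2 sin(1/t) + t for t != 0
        return 4*t*np.sin(1/t) - 2*np.cos(1/t) + 1

    # at t_k = 1/(2 pi k) we have sin(1/t) = 0 and cos(1/t) = 1, so f'(t_k) = -1
    for k in [1, 10, 100, 1000]:
        t = 1/(2*np.pi*k)
        print(t, fprime(t))   # derivative is -1 at points approaching 0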

Let I be an open interval and let f : I → R be a function of class C1. If the tangent to f at a given t ∈ I is invertible, i.e. if f•(t) ≠ 0, one can easily prove not only that f is locally invertible near t, but also that it has a local inverse near t that is of class C1. This result generalizes to mappings ϕ : D → D′, but the proof is far from easy.

Local Inversion Theorem: Let D and D′ be open subsets of flat spaces, let ϕ : D → D′ be of class C1, and let x ∈ D be such that the tangent to ϕ at x is invertible. Then ϕ is locally invertible near x and every local inverse of ϕ near x is differentiable at ϕ(x).

Moreover, there exists a local inverse ψ of ϕ near x that is of class C1 and satisfies

    ∇yψ = (∇ψ(y)ϕ)−1  for all y ∈ Domψ.   (68.4)

Before proceeding with the proof, we state two important results that are closely related to the Local Inversion Theorem.

Implicit Mapping Theorem: Let E, E′, E′′ be flat spaces, let A be an open subset of E × E′, and let ω : A → E′′ be a mapping of class C1. Let (xo, yo) ∈ A, zo := ω(xo, yo), and assume that ∇yoω(xo, ·) is invertible. Then there exist an open neighborhood D of xo and a mapping ϕ : D → E′, differentiable at xo, such that ϕ(xo) = yo, Gr(ϕ) ⊂ A, and

    ω(x, ϕ(x)) = zo  for all x ∈ D.   (68.5)

Moreover, D and ϕ can be chosen such that ϕ is of class C1 and

    ∇ϕ(x) = −(∇(2)ω(x, ϕ(x)))−1 ∇(1)ω(x, ϕ(x))  for all x ∈ D.   (68.6)


Remark: The mapping ϕ is determined implicitly by an equation in this sense: for any given x ∈ D, ϕ(x) is a solution of the equation

    ? y ∈ E′,  ω(x, y) = zo.   (68.7)

In fact, one can find a neighborhood M of (xo, yo) in E × E′ such that ϕ(x) is the only solution of

    ? y ∈ M(x,·),  ω(x, y) = zo,   (68.8)

where M(x,·) := {y ∈ E′ | (x, y) ∈ M}.
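A concrete numerical sketch (assuming numpy; the equation x² + y² = 1 and the branch phi below are our own example): here (68.6) reduces to ϕ•(x) = −(∂ω/∂y)⁻¹(∂ω/∂x).

    import numpy as np

    # omega(x, y) := x^2 + y^2 - 1, solved for y near (xo, yo) = (0, 1)
    def phi(x):
        # the implicit mapping: upper branch of the unit circle
        return np.sqrt(1.0 - x**2)

    x = 0.3
    dx, dy = 2*x, 2*phi(x)               # partial gradients of omega at (x, phi(x))
    print(-dx/dy)                        # the gradient formula (68.6)
    eps = 1e-6
    print((phi(x+eps) - phi(x))/eps)     # finite-difference check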

Differentiation Theorem for Inversion Mappings: Let V and V′ be linear spaces of equal dimension. Then the set Lis(V, V′) of all linear isomorphisms from V onto V′ is a (non-empty) open subset of Lin(V, V′), the inversion mapping inv : Lis(V, V′) → Lin(V′, V) defined by inv(L) := L−1 is of class C1, and its gradient is given by

    (∇L inv)M = −L−1 M L−1  for all M ∈ Lin(V, V′).   (68.9)
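Formula (68.9) is easy to test numerically (a sketch assuming numpy; L and M are random illustrative matrices):

    import numpy as np

    rng = np.random.default_rng(2)
    L = rng.standard_normal((4, 4)) + 4*np.eye(4)   # safely invertible
    M = rng.standard_normal((4, 4))
    Linv = np.linalg.inv(L)

    eps = 1e-6
    fd = (np.linalg.inv(L + eps*M) - Linv) / eps    # finite-difference directional derivative
    print(np.max(np.abs(fd + Linv @ M @ Linv)))     # matches -L^-1 M L^-1, up to O(eps)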

The proof of these three theorems will be given in stages, which will be designated as lemmas. After a basic preliminary lemma, we will prove a weak version of the first theorem and then derive from it weak versions of the other two. The weak version of the last theorem will then be used, like a bootstrap, to prove the final version of the first theorem; the final versions of the other two theorems will follow.

Lemma 1: Let V be a linear space, let ν be a norm on V, and put B := Ce(ν). Let f : B → V be a mapping with f(0) = 0 such that f − 1B⊂V is constricted with

    κ := str(f − 1B⊂V; ν, ν) < 1.   (68.10)

Then the adjustment f|(1−κ)B (see Sect.03) of f has an inverse and this inverse is confined near 0.

Proof: We will show that for every w ∈ (1 − κ)B, the equation

    ? z ∈ B,  f(z) = w   (68.11)

has a unique solution.

To prove uniqueness, suppose that z1, z2 ∈ B are solutions of (68.11) for a given w ∈ V. We then have

    (f − 1B⊂V)(z1) − (f − 1B⊂V)(z2) = z2 − z1

and hence, by (68.10), ν(z2 − z1) ≤ κν(z2 − z1), which amounts to (1 − κ)ν(z2 − z1) ≤ 0. This is compatible with κ < 1 only if ν(z1 − z2) = 0 and hence z1 = z2.

To prove existence, we assume that w ∈ (1 − κ)B is given and we put α := ν(w)/(1 − κ), so that α ∈ [0, 1[. For every v ∈ αB, we have ν(v) ≤ α and hence, by (68.10),

    ν((f − 1B⊂V)(v)) ≤ κν(v) ≤ κα,

which implies that

    ν(w − (f(v) − v)) ≤ ν(w) + ν((f − 1B⊂V)(v)) ≤ (1 − κ)α + κα = α.

Therefore, it makes sense to define hw : αB → αB by

    hw(v) := w − (f(v) − v)  for all v ∈ αB.   (68.12)

Since hw is the difference of a constant with value w and a restriction of f − 1B⊂V, it is constricted and has an absolute striction no greater than κ. Therefore, it is a contraction, and since its domain αB is closed, the Contraction Fixed Point Theorem states that hw has a fixed point z ∈ αB. It is evident from (68.12) that a fixed point of hw is a solution of (68.11).

We proved that (68.11) has a unique solution z ∈ B and that this solution satisfies

    ν(z) ≤ α := ν(w)/(1 − κ).   (68.13)

If we define g : (1 − κ)B → f<((1 − κ)B) in such a way that g(w) is this solution of (68.11), then g is the inverse we are seeking. It follows from (68.13) that ν(g(w)) ≤ ν(w)/(1 − κ) for all w ∈ (1 − κ)B. In view of part (ii) of Prop.2 of Sect.62, this proves that g is confined.
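The construction in this proof can be imitated numerically (a sketch assuming numpy; the perturbation 0.3 sin is our own choice, with striction roughly 0.3): iterating the contraction hw of (68.12) solves f(z) = w.

    import numpy as np

    def f(z):
        # identity plus a perturbation whose striction is about 0.3
        return z + 0.3*np.sin(z)

    w = np.array([0.2, -0.1])
    v = np.zeros(2)
    for _ in range(50):
        v = w - (f(v) - v)    # the contraction h_w of (68.12)
    print(f(v), w)            # f(v) has converged to w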

Lemma 2: The assertion of the Local Inversion Theorem, except possibly the statement starting with “Moreover”, is valid.

Proof: Let α : E → E′ be the (invertible) tangent to ϕ at x. Since D is open, we may choose a norming cell B such that x + B ⊂ D. We define f : B → V by

    f := (α←|D′ ∘ ϕ|x+B ∘ (x + 1V)|_B^(x+B)) − x.   (68.14)

Since ϕ is of class C1, so is f. Clearly, we have

    f(0) = 0,  ∇0f = 1V.   (68.15)


Put ν := noB, so that B = Ce(ν). Choose ε ∈ ]0, 1[ (ε := 1/2 would do). If we apply Prop.4 of Sect.64 to the case when ϕ there is replaced by f and k by 0, we see that there is δ ∈ ]0, 1] such that

    κ := str(f|δB − 1δB⊂V; ν, ν) ≤ ε.   (68.16)

Now, it is clear from (64.2) that a striction relative to ν and ν′ remains unchanged if ν and ν′ are multiplied by the same strictly positive factor. Hence, (68.16) remains valid if ν there is replaced by (1/δ)ν. Since Ce((1/δ)ν) = δCe(ν) = δB (see Prop.6 of Sect.51), it follows from (68.16) that Lemma 1 can be applied when ν, B, and f there are replaced by (1/δ)ν, δB, and f|δB, respectively. We infer that if we put

    Mo := (1 − κ)δB,  No := f<(Mo),

then f|_No^Mo has an inverse g : Mo → No that is confined near zero. Note that Mo and No are both open neighborhoods of zero, the latter because it is the pre-image of an open set under a continuous mapping. We evidently have

    g|^V − 1Mo⊂V = (1No⊂V − f|No) ∘ g.

In view of (68.15), 1No⊂V − f|No is small near 0 ∈ V. Since g is confined near 0, it follows by Prop.3 of Sect.62 that g|^V − 1Mo⊂V is small near zero. By the Characterization of Gradients of Sect.63, we conclude that g is differentiable at 0 with ∇0g = 1V.

We now define

    N := x + No,  N′ := ϕ(x) + (∇α)>(Mo).

These are open neighborhoods of x and ϕ(x), respectively. A simple calculation, based on (68.14) and g = (f|_No^Mo)←, shows that

    ψ := (x + 1V)|_No^N ∘ g ∘ (α← − x)|_N′^Mo   (68.17)

is the inverse of ϕ|_N^N′. Using the Chain Rule and the fact that g is differentiable at 0, we conclude from (68.17) that ψ is differentiable at ϕ(x).

Lemma 3: The assertion of the Implicit Mapping Theorem, except possibly the statement starting with “Moreover”, is valid.

Proof: We define ω̄ : A → E × E′′ by

    ω̄(x, y) := (x, ω(x, y))  for all (x, y) ∈ A.   (68.18)

Of course, ω̄ is of class C1. Using Prop.2 of Sect.63 and (65.4), we see that

    (∇(xo,yo)ω̄)(u, v) = (u, (∇xoω(·, yo))u + (∇yoω(xo, ·))v)

for all (u, v) ∈ V × V′. Since ∇yoω(xo, ·) is invertible, so is ∇(xo,yo)ω̄. Indeed, we have

    (∇(xo,yo)ω̄)−1(u, w) = (u, (∇yoω(xo, ·))−1(w − (∇xoω(·, yo))u))

for all (u, w) ∈ V × V′′. Lemma 2 applies and we conclude that ω̄ has a local inverse ρ with (xo, yo) ∈ Rng ρ that is differentiable at ω̄(xo, yo) = (xo, zo). In view of Prop.2, (i), we may assume that the domain of ρ is of the form D × D′′, where D and D′′ are open neighborhoods of xo and zo, respectively. Let ψ : D × D′′ → E and ψ′ : D × D′′ → E′ be the value-wise terms of ρ, so that

    ρ(x, z) = (ψ(x, z), ψ′(x, z))  for all x ∈ D, z ∈ D′′.

Then, by (68.18),

    ω̄(ρ(x, z)) = (ψ(x, z), ω(ψ(x, z), ψ′(x, z))) = (x, z)

and hence

    ψ(x, z) = x,  ω(x, ψ′(x, z)) = z  for all x ∈ D, z ∈ D′′.   (68.19)

Since ρ(xo, zo) = (ψ(xo, zo), ψ′(xo, zo)) = (xo, yo), we have ψ′(xo, zo) = yo. Therefore, if we define ϕ : D → E′ by ϕ(x) := ψ′(x, zo), we see that ϕ(xo) = yo and (68.5) are satisfied. The differentiability of ϕ at xo follows from the differentiability of ρ at (xo, zo).

If we define M := Rng ρ, it is immediate that ϕ(x) is the only solution of (68.8).

Lemma 4: Let V and V′ be linear spaces of equal dimension. Then Lis(V, V′) is an open subset of Lin(V, V′) and the inversion mapping inv : Lis(V, V′) → Lin(V′, V) defined by inv(L) := L−1 is differentiable.

Proof: We apply Lemma 3 with E replaced by Lin(V, V′), E′ by Lin(V′, V), and E′′ by LinV. For ω of Lemma 3 we take the mapping (L, M) ↦ ML from Lin(V, V′) × Lin(V′, V) to LinV. Being bilinear, this mapping is of class C1 (see Prop.1 of Sect.66). Its partial 2-gradient at (L, M) does not depend on M. It is the right-multiplication RiL ∈ Lin(Lin(V′, V), LinV) defined by RiL(K) := KL for all K ∈ Lin(V′, V). It is clear that RiL is invertible if and only if L is invertible. (We have (RiL)−1 = RiL−1 if this is the case.) Thus, given Lo ∈ Lis(V, V′), we can apply Lemma 3 with Lo, Lo−1, and 1V playing the roles of xo, yo, and zo, respectively. We conclude that there is an open neighborhood D of Lo in Lin(V, V′) such that the equation

    ? M ∈ Lin(V′, V),  LM = 1V

has a solution for every L ∈ D. Since this solution can only be L−1 = inv(L), it follows that the role of the mapping ϕ of Lemma 3 is played by inv|D. Therefore, we must have D ⊂ Lis(V, V′) and inv must be differentiable at Lo. Since Lo ∈ Lis(V, V′) was arbitrary, it follows that Lis(V, V′) is open and that inv is differentiable.

Completion of Proofs: We assume that the hypotheses of the Local Inversion Theorem are satisfied. The assumption that ϕ has an invertible tangent at x ∈ D is equivalent to ∇xϕ ∈ Lis(V, V′). Since ∇ϕ is continuous and since Lis(V, V′) is an open subset of Lin(V, V′) by Lemma 4, it follows that G := (∇ϕ)<(Lis(V, V′)) is an open subset of E. By Lemma 2, we can choose a local inverse of ϕ near x which is differentiable and hence continuous at ϕ(x). By Prop.2, (ii), we can adjust this local inverse such that its range is included in G. Let ψ be the local inverse so adjusted. Then ϕ has an invertible tangent at every z ∈ Rngψ. Let z ∈ Rngψ be given. Applying Lemma 2 to z, we see that ϕ must have a local inverse near z that is differentiable at ϕ(z). By Prop.1, it follows that ψ must also be differentiable at ϕ(z). Since z ∈ Rngψ was arbitrary, it follows that ψ is differentiable. The formula (68.4) follows from Prop.1. Since inv : Lis(V, V′) → Lin(V′, V) is differentiable and hence continuous by Lemma 4, since ∇ϕ is continuous by assumption, and since ψ is continuous because it is differentiable, it follows that the composite inv ∘ (∇ϕ)|_Rngψ^Lis(V,V′) ∘ ψ is continuous. By (68.4), this composite is none other than ∇ψ, and hence the proof of the Local Inversion Theorem is complete.

Assume, now, that the hypotheses of the Implicit Mapping Theorem are satisfied. By the Local Inversion Theorem, we can then choose a local inverse ρ of ω̄, as defined by (68.18), that is of class C1. It then follows that the mapping ϕ : D → E′ defined in the proof of Lemma 3 is also of class C1. The formula (68.6) is obtained easily by differentiating the constant x ↦ ω(x, ϕ(x)), using Prop.1 of Sect.65 and the Chain Rule.

To prove the Differentiation Theorem for Inversion Mappings, one applies the Implicit Mapping Theorem in the same way as Lemma 3 was applied to obtain Lemma 4.

Pitfall: Let I and I′ be intervals and let f : I → I′ be of class C1 and surjective. If f has an invertible tangent at every t ∈ I, then f is not only locally invertible near every t ∈ I but (globally) invertible, as is easily seen. This conclusion does not generalize to the case when I and I′ are replaced by connected open subsets of flat spaces of higher dimension. The curvilinear coordinate systems discussed in Sect.74 give rise to counterexamples. One can even give counterexamples in which I and I′ are replaced by convex open subsets of flat spaces.

Notes 68

(1) The Local Inversion Theorem is often called the “Inverse Function Theorem”. Our term is more descriptive.

(2) The Implicit Mapping Theorem is most often called the “Implicit Function Theorem”.

(3) If the sets D and D′ of the Local Inversion Theorem are both subsets of R^n for some n ∈ N, then ∇xϕ can be identified with an n-by-n matrix (see (65.12)). This matrix is often called the “Jacobian matrix” and its determinant the “Jacobian” of ϕ at x. Some textbook authors replace the condition that ∇xϕ be invertible by the condition that the Jacobian be non-zero. I believe it is a red herring to drag in determinants here. The Local Inversion Theorem has an extension to infinite-dimensional spaces, where it makes no sense to talk about determinants.

69 Extreme Values, Constraints

In this section E and F denote flat spaces with translation spaces V and W, respectively, and D denotes a subset of E.

The following definition is a “local” variant of the definition of an extremum given in Sect.08.

Definition 1: We say that a function f : D → R attains a local maximum [local minimum] at x ∈ D if there is N ∈ Nhdx(D) (see (56.1)) such that f|N attains a maximum [minimum] at x. We say that f attains a local extremum at x if it attains a local maximum or local minimum at x.

The following result is a direct generalization of the Extremum Theorem of elementary calculus (see Sect.08).

Extremum Theorem: If f : D → R attains a local extremum at x ∈ Int D and if f is differentiable at x, then ∇xf = 0.

Proof: Let v ∈ V be given. It is clear that Sx,v := {s ∈ R | x + sv ∈ Int D} is an open subset of R and that the function (s ↦ f(x + sv)) : Sx,v → R attains a local extremum at 0 ∈ R and is differentiable at 0 ∈ R. Hence, by the Extremum Theorem of elementary calculus, and by (65.13) and (65.14), we have

    0 = ∂0(s ↦ f(x + sv)) = (ddvf)(x) = (∇xf)v.

Since v ∈ V was arbitrary, the desired result ∇xf = 0 follows.

From now on we assume that D is an open subset of E.

The next result deals with the case when a restriction of f : D → R to a suitable subset of D, but not necessarily f itself, has an extremum at a given point x ∈ D. This subset of D is assumed to be the set on which a given mapping ϕ : D → F has a constant value, i.e. the set ϕ<({ϕ(x)}). The assertion that f|ϕ<({ϕ(x)}) has an extremum at x is often expressed by saying that f attains an extremum at x subject to the constraint that ϕ be constant.

Constrained-Extremum Theorem: Assume that f : D → R and ϕ : D → F are both of class C1 and that ∇xϕ ∈ Lin(V, W) is surjective for a given x ∈ D. If f|ϕ<({ϕ(x)}) attains a local extremum at x, then

    ∇xf ∈ Rng(∇xϕ)⊤.   (69.1)

Proof: We put U := Null ∇xϕ and choose a supplement Z of U in V. Every neighborhood of 0 in V includes a set of the form N + M, where N is an open neighborhood of 0 in U and M is an open neighborhood of 0 in Z. Since D is open, we may select N and M such that x + N + M ⊂ D. We now define ω : N × M → F by

    ω(u, z) := ϕ(x + u + z)  for all u ∈ N, z ∈ M.   (69.2)

It is clear that ω is of class C1 and we have

    ∇0ω(0, ·) = ∇xϕ|Z.   (69.3)

Since ∇xϕ is surjective, it follows from Prop.5 of Sect.13 that ∇xϕ|Z is invertible. Hence we can apply the Implicit Mapping Theorem of Sect.68 to the case when A there is replaced by N × M, the points xo, yo by 0, and zo by ϕ(x). We obtain an open neighborhood G of 0 in U with G ⊂ N and a mapping h : G → Z, of class C1, such that h(0) = 0 and

    ω(u, h(u)) = ϕ(x)  for all u ∈ G.   (69.4)

Since ∇(1)ω(0, 0) = ∇0ω(·, 0) = ∇xϕ|U = 0 by the definition U := Null ∇xϕ, the formula (68.6) yields in our case that ∇0h = 0, i.e. we have

    h(0) = 0,  ∇0h = 0.   (69.5)


It follows from (69.2) and (69.4) that x + u + h(u) ∈ ϕ<({ϕ(x)}) for all u ∈ G and hence that

    (u ↦ f(x + u + h(u))) : G → R

has a local extremum at 0. By the Extremum Theorem and the Chain Rule, it follows that

    0 = ∇0(u ↦ f(x + u + h(u))) = ∇xf(1U⊂V + ∇0h|^V),

and therefore, by (69.5), that ∇xf|U = (∇xf)1U⊂V = 0, which means that ∇xf ∈ U⊥ = (Null ∇xϕ)⊥. The desired result (69.1) is obtained by applying (22.9).

In the case when F := R, the Constrained-Extremum Theorem reduces to

Corollary 1: Assume that f, g : D → R are both of class C1 and that ∇xg ≠ 0 for a given x ∈ D. If f|g<({g(x)}) attains a local extremum at x, then there is λ ∈ R such that

    ∇xf = λ∇xg.   (69.6)
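A numerical sketch of Cor.1 (assuming numpy; the choice f(x) := x1 + x2 and g(x) := x · x on R² is our own example): at a constrained maximizer the gradients are parallel, as (69.6) asserts.

    import numpy as np

    # maximize f(x) = x[0] + x[1] subject to g(x) = |x|^2 = 1;
    # the maximizer is x* = (1, 1)/sqrt(2)
    x_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
    grad_f = np.array([1.0, 1.0])    # gradient of f (constant)
    grad_g = 2.0 * x_star            # gradient of g at x*

    lam = grad_f[0] / grad_g[0]      # the Lagrange multiplier
    print(grad_f - lam * grad_g)     # zero vector: (69.6) holds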

In the case when F := R^I, we can use (23.19) and obtain

Corollary 2: Assume that f : D → R and all terms gi : D → R in a finite family g := (gi | i ∈ I) : D → R^I are of class C1 and that (∇xgi | i ∈ I) is linearly independent for a given x ∈ D. If f|g<({g(x)}) attains a local extremum at x, then there is λ ∈ R^I such that

    ∇xf = ∑(λi ∇xgi | i ∈ I).   (69.7)

Remark 1: If E is two-dimensional, then Cor.1 can be given a geometrical interpretation as follows: The sets Lg := g<({g(x)}) and Lf := f<({f(x)}) are the “level-lines” through x of g and f, respectively (see Figure).


If these lines cross at x, then the value f(y) of f at y ∈ Lg strictly increases or decreases as y moves along Lg from one side of the line Lf to the other, and hence f|Lg cannot have an extremum at x. Hence, if f|Lg has an extremum at x, the level-lines must be tangent. The assertion (69.6) expresses this tangency. The condition ∇xg ≠ 0 insures that the level-line Lg does not degenerate to a point.

Notes 69

(1) The number λ of (69.6) or the terms λi occurring in (69.7) are often called “Lagrange multipliers”.

610 Integral Representations

Let I ∈ Sub R be some genuine interval and let h : I → V be a continuous process with values in a given linear space V. For every λ ∈ V∗ the composite λh : I → R is then continuous. Given a, b ∈ I one can therefore form the integral ∫_a^b λh in the sense of elementary integral calculus (see Sect.08). It is clear that the mapping

    (λ ↦ ∫_a^b λh) : V∗ → R

is linear and hence an element of V∗∗ ≅ V.

Definition 1: Let h : I → V be a continuous process with values in the linear space V. Given any a, b ∈ I, the integral of h from a to b is defined to be the unique element ∫_a^b h of V which satisfies

    λ ∫_a^b h = ∫_a^b λh  for all λ ∈ V∗.   (610.1)

Proposition 1: Let h : I → V be a continuous process and let L be a linear mapping from V to a given linear space W. Then Lh : I → W is a continuous process and

    L ∫_a^b h = ∫_a^b Lh  for all a, b ∈ I.   (610.2)

Proof: Let ω ∈ W∗ and a, b ∈ I be given. Using Def.1 twice, we obtain

    ω(L ∫_a^b h) = (ωL) ∫_a^b h = ∫_a^b (ωL)h = ∫_a^b ω(Lh) = ω ∫_a^b Lh.


Since ω ∈ W∗ was arbitrary, (610.2) follows.

Proposition 2: Let ν be a norm on the linear space V and let h : I → V be a continuous process. Then ν ∘ h : I → R is continuous and, for every a, b ∈ I, we have

    ν(∫_a^b h) ≤ |∫_a^b ν ∘ h|.   (610.3)

Proof: The continuity of ν ∘ h follows from the continuity of ν (Prop.7 of Sect.56) and the Composition Theorem for Continuity of Sect.56.

It follows from (52.14) that

    |λh|(t) = |λh(t)| ≤ ν∗(λ)ν(h(t)) = ν∗(λ)(ν ∘ h)(t)

holds for all λ ∈ V∗ and all t ∈ I, and hence that

    |λh| ≤ ν ∘ h  for all λ ∈ Ce(ν∗).

Using this result and (610.1), (08.40), and (08.42), we obtain

    |λ ∫_a^b h| = |∫_a^b λh| ≤ |∫_a^b |λh|| ≤ |∫_a^b ν ∘ h|

for all λ ∈ Ce(ν∗) and all a, b ∈ I. The desired result now follows from the Norm-Duality Theorem, i.e. (52.15).

Most of the rules of elementary integral calculus extend directly to integrals of processes with values in a linear space. For example, if h : I → V is continuous and if a, b, c ∈ I, then

    ∫_a^b h = ∫_a^c h + ∫_c^b h.   (610.4)

The following result is another example.

Fundamental Theorem of Calculus: Let E be a flat space with translation space V. If h : I → V is a continuous process and if x ∈ E and a ∈ I are given, then the process p : I → E defined by

    p(t) := x + ∫_a^t h  for all t ∈ I   (610.5)

is of class C1 and p• = h.

Proof: Let λ ∈ V∗ be given. By (610.5) and (610.1) we have

    λ(p(t) − x) = λ ∫_a^t h = ∫_a^t λh  for all t ∈ I.

By Prop.1 of Sect.61, p is differentiable and we have (λ(p − x))• = λp•. Hence, the elementary Fundamental Theorem of Calculus (see Sect.08) gives λp• = λh. Since λ ∈ V∗ was arbitrary, we conclude that p• = h, which is continuous by assumption.

In the remainder of this section, we assume that the following items are given: (i) a flat space E with translation space V; (ii) an open subset D of E; (iii) an open subset D̄ of R × E such that, for each x ∈ D,

    D̄(·,x) := {s ∈ R | (s, x) ∈ D̄}

is a non-empty open interval; (iv) a linear space W.

Consider now a mapping h : D̄ → W such that h(·, x) : D̄(·,x) → W is continuous for all x ∈ D. Given a, b ∈ R such that a, b ∈ D̄(·,x) for all x ∈ D, we can then define k : D → W by

    k(x) := ∫_a^b h(·, x)  for all x ∈ D.   (610.6)

We say that k is defined by an integral representation.

Proposition 3: If h : D̄ → W is continuous, so is the mapping k : D → W defined by the integral representation (610.6).

Proof: Let x ∈ D be given. Since D is open, we can choose a norm ν on V such that x + Ce(ν) ⊂ D. We may assume, without loss of generality, that a < b. Then [a, b] is a compact interval and hence, in view of Prop.4 of Sect.58, [a, b] × (x + Ce(ν)) is a compact subset of D̄. By the Uniform Continuity Theorem of Sect.58, it follows that the restriction of h to [a, b] × (x + Ce(ν)) is uniformly continuous. Let µ be a norm on W and let ε ∈ P× be given. By Prop.4 of Sect.56, we can determine δ ∈ ]0, 1] such that

    µ(h(s, y) − h(s, x)) < ε/(b − a)

for all s ∈ [a, b] and all y ∈ x + δCe(ν). Hence, by (610.6) and Prop.2, we have

    µ(k(y) − k(x)) = µ(∫_a^b (h(·, y) − h(·, x))) ≤ ∫_a^b µ ∘ (h(·, y) − h(·, x)) ≤ (b − a)·ε/(b − a) = ε

whenever ν(y − x) < δ. Since ε ∈ P× was arbitrary, the continuity of k at x follows by Prop.1 of Sect.56. Since x ∈ D was arbitrary, the assertion follows.


The following is a stronger version of Prop.3.

Proposition 4: Let D̂ ⊂ R × R × E be defined by

    D̂ := {(a, b, x) | x ∈ D, (a, x) ∈ D̄, (b, x) ∈ D̄}.   (610.7)

Then, if h : D̄ → W is continuous, so is the mapping

    ((a, b, x) ↦ ∫_a^b h(·, x)) : D̂ → W.

Proof: Choose a norm µ on W. Let (a, b, x) ∈ D̂ be given. Since µ ∘ h is continuous at (a, x) and at (b, x) and since D̄ is open, we can find N ∈ Nhdx(D) and σ ∈ P× such that

    N̄ := ((a + ]−σ, σ[) ∪ (b + ]−σ, σ[)) × N

is a subset of D̄ and such that the restriction of µ ∘ h to N̄ is bounded by some β ∈ P×. By Prop.2, it follows that

    µ(∫_s^a h(·, y) + ∫_b^t h(·, y)) ≤ (|s − a| + |t − b|)β   (610.8)

for all y ∈ N, all s ∈ a + ]−σ, σ[, and all t ∈ b + ]−σ, σ[.

Now let ε ∈ P× be given. If we put δ := min{σ, ε/(4β)}, it follows from (610.8) that

    µ(∫_s^a h(·, y) + ∫_b^t h(·, y)) < ε/2   (610.9)

for all y ∈ N, all s ∈ a + ]−δ, δ[, and all t ∈ b + ]−δ, δ[. On the other hand, by Prop.3, we can determine M ∈ Nhdx(D) such that

    µ(∫_a^b h(·, y) − ∫_a^b h(·, x)) < ε/2   (610.10)

for all y ∈ M. By (610.4) we have

    ∫_s^t h(·, y) − ∫_a^b h(·, x) = ∫_a^b h(·, y) − ∫_a^b h(·, x) + ∫_s^a h(·, y) + ∫_b^t h(·, y).

Hence it follows from (610.9) and (610.10) that

    µ(∫_s^t h(·, y) − ∫_a^b h(·, x)) < ε

for all y ∈ M ∩ N ∈ Nhdx(D), all s ∈ a + ]−δ, δ[, and all t ∈ b + ]−δ, δ[. Since ε ∈ P× was arbitrary, it follows from Prop.1 of Sect.56 that the mapping under consideration is continuous at (a, b, x).

Differentiation Theorem for Integral Representations: Assume that h : D̄ → W satisfies the following conditions:

(i) h(·, x) is continuous for every x ∈ D.

(ii) h(s, ·) is differentiable at x for all (s, x) ∈ D̄ and the partial 2-gradient ∇(2)h : D̄ → Lin(V, W) is continuous.

Then the mapping k : D → W defined by the integral representation (610.6) is of class C1 and its gradient is given by the integral representation

    ∇xk = ∫_a^b ∇(2)h(·, x)  for all x ∈ D.   (610.11)

Roughly, this theorem states that if h satisfies the conditions (i) and (ii), one can differentiate (610.6) with respect to x by “differentiating under the integral sign”.
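A one-dimensional numerical sketch (assuming numpy and scipy; the integrand h(s, x) := sin(sx) is our own example): the representation (610.11) matches a finite-difference derivative of k.

    import numpy as np
    from scipy.integrate import quad

    # k(x) = integral over s in [0, 1] of sin(s x);
    # differentiating under the integral sign gives the integral of s cos(s x)
    def k(x):  return quad(lambda s: np.sin(s*x), 0.0, 1.0)[0]
    def dk(x): return quad(lambda s: s*np.cos(s*x), 0.0, 1.0)[0]

    x, eps = 1.3, 1e-6
    print((k(x+eps) - k(x))/eps, dk(x))   # the two values agree to O(eps)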

Proof: Let x ∈ D be given. As in the proof of Prop.3, we can choose a norm ν on V such that x + Ce(ν) ⊂ D, and we may assume that a < b. Then [a, b] × (x + Ce(ν)) is a compact subset of D̄. By the Uniform Continuity Theorem of Sect.58, the restriction of ∇(2)h to [a, b] × (x + Ce(ν)) is uniformly continuous. Hence, if a norm µ on W and ε ∈ P× are given, we can determine δ ∈ ]0, 1] such that

    ‖∇(2)h(s, x + u) − ∇(2)h(s, x)‖ν,µ < ε/(b − a)   (610.12)

holds for all s ∈ [a, b] and u ∈ δCe(ν).

We now define n : (D̄ − (0, x)) → W by

    n(s, u) := h(s, x + u) − h(s, x) − (∇(2)h(s, x))u.   (610.13)

It is clear that ∇(2)n exists and is given by

    ∇(2)n(s, u) = ∇(2)h(s, x + u) − ∇(2)h(s, x).

By (610.12), we hence have

    ‖∇(2)n(s, u)‖ν,µ < ε/(b − a)

for all s ∈ [a, b] and u ∈ δCe(ν). By the Striction Estimate for Differentiable Mappings of Sect.64, it follows that n(s, ·)|δCe(ν) is constricted for all s ∈ [a, b] and that

    str(n(s, ·)|δCe(ν); ν, µ) ≤ ε/(b − a)  for all s ∈ [a, b].

Since n(s, 0) = 0 for all s ∈ [a, b], the definition (64.1) shows that

    µ(n(s, u)) ≤ (ε/(b − a))ν(u)  for all s ∈ [a, b]

and all u ∈ δCe(ν). Using Prop.2, we conclude that

    µ(∫_a^b n(·, u)) ≤ ∫_a^b µ ∘ n(·, u) ≤ εν(u)

whenever u ∈ δCe(ν). Since ε ∈ P× was arbitrary, it follows that the mapping (u ↦ ∫_a^b n(·, u)) is small near 0 ∈ V (see Sect.62). Now, integrating (610.13) with respect to s ∈ [a, b] and observing the representation (610.6) of k, we obtain

    ∫_a^b n(·, u) = k(x + u) − k(x) − (∫_a^b ∇(2)h(·, x))u

for all u ∈ D − x. Therefore, by the Characterization of Gradients of Sect.63, k is differentiable at x and its gradient is given by (610.11). The continuity of ∇k follows from Prop.3.

The following corollary deals with generalizations of integral representations of the type (610.6).

Corollary: Assume that h : D̄ → W satisfies the conditions (i) and (ii) of the Theorem. Assume, further, that f : D → R and g : D → R are differentiable and satisfy

    f(x), g(x) ∈ D̄(·,x)  for all x ∈ D.

Then k : D → W, defined by

    k(x) := ∫_f(x)^g(x) h(·, x)  for all x ∈ D,   (610.14)

is of class C1 and its gradient is given by

    ∇xk = h(g(x), x) ⊗ ∇xg − h(f(x), x) ⊗ ∇xf + ∫_f(x)^g(x) ∇(2)h(·, x)  for all x ∈ D.   (610.15)
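A numerical sketch of (610.15) with variable limits (assuming numpy and scipy; f := 0, g := ι, and h(s, x) := e^(−sx) are our own choices):

    import numpy as np
    from scipy.integrate import quad

    # k(x) = integral from f(x) = 0 to g(x) = x of exp(-s x)
    def k(x): return quad(lambda s: np.exp(-s*x), 0.0, x)[0]

    x, eps = 0.8, 1e-6
    formula = (np.exp(-x*x)*1.0 - np.exp(0.0)*0.0              # boundary terms of (610.15)
               + quad(lambda s: -s*np.exp(-s*x), 0.0, x)[0])   # integral of the 2-gradient
    print((k(x+eps) - k(x))/eps, formula)                      # agree to O(eps)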


Proof: Consider the set D̂ ⊂ R × R × E defined by (610.7), so that a, b ∈ D̄(·,x) for all (a, b, x) ∈ D̂. Since D̄(·,x) is assumed to be an interval for each x ∈ D, we can define m : D̂ → W by

    m(a, b, x) := ∫_a^b h(·, x).   (610.16)

By the Theorem, ∇(3)m : D̂ → Lin(V, W) exists and is given by

    ∇(3)m(a, b, x) = ∫_a^b ∇(2)h(·, x).   (610.17)

It follows from Prop.4 that ∇(3)m is continuous. By the Fundamental Theorem of Calculus and (610.16), the partial derivatives m,1 and m,2 exist, are continuous, and are given by

    m,1(a, b, x) = −h(a, x),  m,2(a, b, x) = h(b, x).   (610.18)

By the Partial Gradient Theorem of Sect.65, it follows that m is of class C1. Since ∇(1)m = m,1 ⊗ and ∇(2)m = m,2 ⊗, it follows from (65.9), (610.17), and (610.18) that

    (∇(a,b,x)m)(α, β, u) = h(b, x) ⊗ β − h(a, x) ⊗ α + (∫_a^b ∇(2)h(·, x)) u   (610.19)

for all (a, b, x) ∈ D̂ and all (α, β, u) ∈ R × R × V.

Now, since f and g are differentiable, so is (f, g, 1D⊂E) : D → R × R × E and we have

    ∇x(f, g, 1D⊂E) = (∇xf, ∇xg, 1V)   (610.20)

for all x ∈ D (see Prop.2 of Sect.63). By (610.14) and (610.16) we have k = m ∘ (f, g, 1D⊂E)|^D̂. Therefore, by the General Chain Rule of Sect.63, k is of class C1 and we have

    ∇xk = (∇(f(x),g(x),x)m) ∇x(f, g, 1D⊂E)

for all x ∈ D. Using (610.19) and (610.20), we obtain (610.15).


611 Curl, Symmetry of Second Gradients

In this section, we assume that flat spaces E and E′, with translation spaces V and V′, and an open subset D of E are given.

If H : D → Lin(V, V′) is differentiable, we can apply the identification (24.1) to the codomain of the gradient

    ∇H : D → Lin(V, Lin(V, V′)) ≅ Lin2(V², V′),

and hence we can apply the switching defined by (24.4) to the values of ∇H.

Definition 1: The curl of a differentiable mapping H : D → Lin(V, V′) is the mapping Curl H : D → Lin2(V², V′) defined by

    (Curl H)(x) := ∇xH − (∇xH)∼  for all x ∈ D.   (611.1)

The values of Curl H are skew, i.e. Rng Curl H ⊂ Skew2(V², V′).

The following result deals with conditions that are sufficient or necessary for the curl to be zero.

Curl-Gradient Theorem: Assume that H : D → Lin(V, V′) is of class C1. If H = ∇ϕ for some ϕ : D → E′ then Curl H = 0. Conversely, if D is convex and Curl H = 0 then H = ∇ϕ for some ϕ : D → E′.

The proof will be based on the following

Lemma: Assume that D is convex. Let q ∈ D and q′ ∈ E′ be given and let ϕ : D → E′ be defined by

    ϕ(q + v) := q′ + (∫_0^1 H(q + sv) ds) v  for all v ∈ D − q.   (611.2)

Then ϕ is of class C1 and we have

    ∇ϕ(q + v) = H(q + v) − (∫_0^1 s(Curl H)(q + sv) ds) v   (611.3)

for all v ∈ D − q.

Proof: We define D̄ ⊂ R × V by

    D̄ := {(s, u) | u ∈ D − q, q + su ∈ D}.

Since D is open and convex, it follows that, for each u ∈ D − q, D̄(·,u) := {s ∈ R | q + su ∈ D} is an open interval that includes [0, 1]. Since H is of class C1, so is

    ((s, u) ↦ H(q + su)) : D̄ → Lin(V, V′);


hence we can apply the Differentiation Theorem for Integral Representations of Sect.610 to conclude that u ↦ ∫_0^1 H(q + su) ds is of class C1 and that its gradient is given by

    ∇v(u ↦ ∫_0^1 H(q + su) ds) = ∫_0^1 s∇H(q + sv) ds

for all v ∈ D − q. By the Product Rule (66.7), it follows that the mapping ϕ defined by (611.2) is of class C1 and that its gradient is given by

    (∇q+vϕ)u = (∫_0^1 s∇H(q + sv)u ds) v + (∫_0^1 H(q + sv) ds) u

for all v ∈ D − q and all u ∈ V. Applying the definitions (24.2), (24.4), and (611.1), and using the linearity of the switching and Prop.1 of Sect.610, we obtain

    (∇q+vϕ)u = (∫_0^1 s(Curl H)(q + sv) ds)(u, v) + (∫_0^1 (s(∇H(q + sv))v + H(q + sv)) ds) u   (611.4)

for all v ∈ D − q and all u ∈ V. By the Product Rule (66.10) and the Chain Rule we have, for all s ∈ [0, 1],

    ∂s(t ↦ tH(q + tv)) = H(q + sv) + s(∇H(q + sv))v.

Hence, by the Fundamental Theorem of Calculus, the second integral on the right side of (611.4) reduces to H(q + v). Therefore, since Curl H has skew values and since u ∈ V was arbitrary, (611.4) reduces to (611.3).

Proof of the Theorem: Assume, first, that D is convex and that Curl H = 0. Choose q ∈ D, q′ ∈ E′ and define ϕ : D → E′ by (611.2). Then H = ∇ϕ by (611.3).

Conversely, assume that H = ∇ϕ for some ϕ : D → E′. Let q ∈ D be given. Since D is open, we can choose a convex open neighborhood N of q with N ⊂ D. Let v ∈ N − q be given. Since N is convex, we have q + sv ∈ N for all s ∈ [0, 1], and hence we can apply the Fundamental Theorem of Calculus to the derivative of s ↦ ϕ(q + sv), with the result

    ϕ(q + v) = ϕ(q) + (∫_0^1 H(q + sv) ds) v.

Since v ∈ N − q was arbitrary, we see that the hypotheses of the Lemma are satisfied for N instead of D, q′ := ϕ(q), ϕ|N instead of ϕ, and H|N = ∇(ϕ|N) instead of H. Hence, by (611.3), we have

    (∫_0^1 s(Curl H)(q + sv) ds) v = 0   (611.5)

for all v ∈ N − q. Now let u ∈ V be given. Since N − q is a neighborhood of 0 ∈ V, there is a δ ∈ P× such that tu ∈ N − q for all t ∈ ]0, δ]. Hence, substituting tu for v in (611.5) and dividing by t gives

    (∫_0^1 s(Curl H)(q + stu) ds) u = 0  for all t ∈ ]0, δ].

Since Curl H is continuous, we can apply Prop.3 of Sect.610 and, in the limit t → 0, obtain ((Curl H)(q))u = 0. Since u ∈ V and q ∈ D were arbitrary, we conclude that Curl H = 0.

Remark 1: In the second part of the Theorem, the condition that D be convex can be replaced by the weaker one that D be “simply connected”. This means, intuitively, that every closed curve in D can be continuously shrunk entirely within D to a point.

Since Curl ∇ϕ = ∇(2)ϕ − (∇(2)ϕ)∼ by (611.1), we can restate the first part of the Curl-Gradient Theorem as follows:

Theorem on Symmetry of Second Gradients: If ϕ : D → E′ is of class C2, then its second gradient ∇(2)ϕ : D → Lin2(V², V′) has symmetric values, i.e. Rng ∇(2)ϕ ⊂ Sym2(V², V′).
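As a numerical sketch of this theorem (assuming numpy; the scalar field phi below is an arbitrary smooth example of our own choosing): the skew part of a central-difference Hessian, which represents Curl ∇ϕ, vanishes to within the discretization error.

    import numpy as np

    def phi(x):
        # a smooth scalar field on R^3
        return np.sin(x[0]*x[1]) + x[2]**2 * x[0]

    def grad(f, x, eps=1e-5):
        return np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])

    x = np.array([0.3, -0.7, 1.1])
    # H[i][j] approximates the (i, j)-entry of the second gradient of phi at x
    H = np.array([grad(lambda y: grad(phi, y)[i], x, eps=1e-4) for i in range(3)])
    print(np.round(H - H.T, 4))   # approximately the zero matrix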

Remark 2: The assertion that ∇x(∇ϕ) is symmetric for a given x ∈ D remains valid if one merely assumes that ϕ is differentiable and that ∇ϕ is differentiable at x. A direct proof of this fact, based on the results of Sect.64, is straightforward although somewhat tedious.

We assume now that E := E1 × E2 is the set-product of flat spaces E1, E2 with translation spaces V1, V2, respectively. Assume that ϕ : D → E′ is twice differentiable. Using Prop.1 of Sect.65 repeatedly, it is easily seen that

    ∇(2)ϕ(x) = (∇(1)∇(1)ϕ)(x)(ev1, ev1) + (∇(1)∇(2)ϕ)(x)(ev1, ev2)
              + (∇(2)∇(1)ϕ)(x)(ev2, ev1) + (∇(2)∇(2)ϕ)(x)(ev2, ev2)   (611.6)

for all x ∈ D, where ev1 and ev2 are the evaluation mappings from V := V1 × V2 to V1 and V2, respectively (see Sect.04). Evaluation of (611.6) at (u, v) = ((u1, u2), (v1, v2)) ∈ V² gives

    ∇(2)ϕ(x)(u, v) = (∇(1)∇(1)ϕ)(x)(u1, v1) + (∇(1)∇(2)ϕ)(x)(u1, v2)
                   + (∇(2)∇(1)ϕ)(x)(u2, v1) + (∇(2)∇(2)ϕ)(x)(u2, v2).   (611.7)

The following is a corollary to the preceding theorem.

Theorem on the Interchange of Partial Gradients: Assume that D is an open subset of a product space E := E1 × E2 and that ϕ : D → E′ is of class C2. Then

    ∇(1)∇(2)ϕ = (∇(2)∇(1)ϕ)∼,   (611.8)

where the operation ∼ is to be understood as value-wise switching.

Proof: Let x ∈ D and w ∈ V1 × V2 be given. If we apply (611.7) to the case when u := (w1, 0), v := (0, w2) and then with u and v interchanged, we obtain

    ∇(2)ϕ(x)(u, v) = (∇(1)∇(2)ϕ)(x)(w1, w2),
    ∇(2)ϕ(x)(v, u) = (∇(2)∇(1)ϕ)(x)(w2, w1).

Since w ∈ V1 × V2 was arbitrary, the symmetry of ∇(2)ϕ(x) gives (611.8).

Corollary: Let I be a finite index set and let D be an open subset of R^I. If ϕ : D → E′ is of class C2, then the second partial derivatives ϕ,i,k : D → V′ satisfy

    ϕ,i,k = ϕ,k,i  for all i, k ∈ I.   (611.9)

Remark 3: Assume D is an open subset of a Euclidean space E with translation space V. A mapping h : D → V ≅ V∗ is then called a vector-field (see Sect.71). If h is differentiable, then the range of Curl h is included in Skew2(V², R) ≅ Skew V. If V is 3-dimensional, there is a natural doubleton of orthogonal isomorphisms from Skew V to V, as will be explained in Vol.II. If one of these two isomorphisms, say V ∈ Orth(Skew V, V), is singled out, we may consider the vector-curl curl h := V(Curl h)|SkewV of the vector field h (note the lower-case “c”). This vector-curl, rather than the curl of Def.1, is used in much of the literature.

Notes 611

(1) In some of the literature on “Vector Analysis”, the notation ∇ × h instead of curl h is used for the vector-curl of h, which is explained in the Remark above. This notation should be avoided for the reason mentioned in Note (1) to Sect.67.


612 Lineonic Exponentials

We note the following generalization of a familiar result of elementary calculus.

Proposition 1: Let E be a flat space with translation space V and let I be an interval. Let g be a sequence of continuous processes in Map(I, V) that converges locally uniformly to h ∈ Map(I, V). Let a ∈ I and q ∈ E be given and define the sequence z in Map(I, E) by

    zn(t) := q + ∫_a^t gn  for all n ∈ N× and t ∈ I.   (612.1)

Then h is continuous and z converges locally uniformly to the process p : I → E defined by

    p(t) := q + ∫_a^t h  for all t ∈ I.   (612.2)

Proof: The continuity of h follows from the Theorem on Continuity ofUniform Limits of Sect.56. Hence (612.2) is meaningful. Let ν be a normon V. By Prop.2 of Sect.610 and (612.1) and (612.2) we have

ν(zn(t) − p(t)) = ν

(∫ t

a

(gn − h)

)

∫ t

a

ν (gn − h)

(612.3)

for all n ∈ N× and all t ∈ I. Now let s ∈ I be given. We can choosea compact interval [b, c] ⊂ I such that a ∈ [b, c] and such that [b, c] is aneightborhood of s relative to I (see Sect.56). By Prop.7 of Sect.58, g|[b,c]converges uniformly to h|[b,c]. Now let ǫ ∈ P× be given. By Prop.7 of Sect.55we can determine m ∈ N× such that

ν (gn − h)|[b,c] <ǫ

c− bfor all n ∈ m+ N.

Hence, by (612.3), we have

ν(zn(t) − p(t)) < |t− a|ǫ

c− b< ǫ for all n ∈ m+ N

and all t ∈ [b, c]. Since ε ∈ P× was arbitrary, we can use Prop.7 of Sect.55again to conclude that z|[b,c] converges uniformly to p|[b,c]. Since s ∈ I wasarbitrary and since [b, c] ∈ Nhds(I), the conclusion follows.

From now on we assume that a linear space V is given and we considerthe algebra LinV of lineons on V (see Sect.18).

Page 226: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

266 CHAPTER 6. DIFFERENTIAL CALCULUS

Proposition 2: The sequence S in Map (LinV,LinV) defined by

Sn(L) :=∑

k∈n[

1

k!Lk (612.4)

for all L ∈ LinV and all n ∈ N× converges locally uniformly.

Its limit is called the (lineonic) exponential for V and is denoted by

expV : LinV → LinV. The exponential expV is continuous.

Proof: We choose a norm ν on V. Using Prop.6 of Sect.52 and induction,we see that ‖Lk‖ν ≤ ‖L‖ν

k for all k ∈ N and all L ∈ LinV (see (52.9)).Hence, given σ ∈ P×, we have

‖1

k!Lk‖ν ≤

σk

k!

for all k ∈ N and all L ∈ σCe(‖•‖ν). Since the sum-sequence of (σk

k! | k ∈ N)converges (to eσ), we can use Prop.9 of Sect.55 to conclude that the restric-tion of the sum-sequence S of ( 1

k!Lk | k ∈ N) to σCe(‖ • ‖ν) converges

uniformly. Now, given L ∈ LinV, σCe(‖ • ‖ν) is a neighborhood of L ifσ > ‖L‖ν . Therefore S converges locally uniformly. The continuity of itslimit expV follows from the Continuity Theorem for Uniform Limits.

For later use we need the following:Proposition 3: Let L ∈ LinV be given. Then the constant zero is the

only differentiable process D : R → LinV that satisfies

D• = LD and D(0) = 0. (612.5)

Proof: By the Fundamental Theorem of Calculus, (612.5) is equivalentto

D(t) =

∫ t

0(LD) for all t ∈ R. (612.6)

Choose a norm ν on V. Using Prop.2 of Sect.610 and Prop.6 of Sect.52, weconclude from (612.6) that

‖D(t)‖ν ≤

∫ t

0‖LD(s)‖νds ≤ ‖L‖ν

∫ t

0‖D(s)‖νds,

i.e., with the abbreviations

σ(t) := ‖D(t)‖ν for all t ∈ P, κ := ‖L‖ν , (612.7)

that

0 ≤ σ(t) ≤ κ

∫ t

0σ for all t ∈ P. (612.8)

Page 227: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

612. LINEONIC EXPONENTIALS 267

We define ϕ : P → R by

ϕ(t) := e−κt∫ t

0σ for all t ∈ P. (612.9)

Using elementary calculus, we infer from (612.9) and (612.8) that

ϕ•(t) = −κe−κt∫ t

0σ + e−κtσ(t) ≤ 0

for all t ∈ P and hence that ϕ is antitone. On the other hand, it is evidentfrom (612.9) that ϕ(0) = 0 and ϕ ≥ 0. This can happen only if ϕ = 0,which, in turn, can happen only if σ(t) = 0 for all t ∈ P and hence, by(612.7), if D(t) = 0 for all t ∈ P. To show that D(t) = 0 for all t ∈ −P, oneneed only replace D by D (−ι) and L by −L in (612.5) and then use theresult just proved.

Proposition 4: Let L ∈ LinV be given. Then the only differentiable

process E : R → LinV that satisfies

E• = LE and E(0) = 1V (612.10)

is the one given by

E := expV (ιL). (612.11)

Proof: We consider the sequence G in Map (R,LinV) defined by

Gn(t) :=∑

k∈n[

tk

k!Lk+1 (612.12)

for all n ∈ N× and all t ∈ R. Since, by (612.4), Gn(t) = LSn(tL) forall n ∈ N× and all t ∈ R, it follows from Prop.2 that G converges locallyuniformly to L exp (ιL) : R → LinV. On the other hand, it follows from(612.12) that

1V +

∫ t

0Gn = 1V +

k∈n[

tk+1

(k + 1)!Lk+1 = Sn+1(tL)

for all n ∈ N× and all t ∈ R. Applying Prop.1 to the case when E and V arereplaced by LinV, I by R, g by G, a by 0, and q by 1V , we conclude that

expV(tL) = 1V +

∫ t

0(L expV (ιL)) for all t ∈ R, (612.13)

Page 228: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

268 CHAPTER 6. DIFFERENTIAL CALCULUS

which, by the Fundamental Theorem of Calculus, is equivalent to the asser-tion that (612.10) holds when E is defined by (612.11).

The uniqueness of E is an immediate consequence of Prop.3.Proposition 5: Let L ∈ LinV and a continuous process F : R → LinV

be given. Then the only differentiable process D : R → LinV that satisfies

D• = LD + F and D(0) = 0 (612.14)

is the one given by

D(t) =

∫ t

0E(t− s)F(s)ds for all t ∈ R, (612.15)

where E : R → LinV is given by (612.11).Proof: Let D be defined by (612.15). Using the Corollary to the Dif-

ferentiation Theorem for Integral Representations of Sect.610, we see thatD is of class C1 and that D• is given by

D•(t) = E(0)F(t) +

∫ t

0E•(t− s)F(s)ds for all t ∈ R.

Using (612.10) and Prop.1 of Sect.610, and then (612.15), we find that(612.14) is satisfied.

The uniqueness of D is again an immediate consequence of Prop.3.Differentiation Theorem for Lineonic Exponentials: Let V be a

linear space. The exponential expV : LinV → LinV is of class C1 and its

gradient is given by

(∇L expV)M =

∫ 1

0expV(sL)M expV((1 − s)L)ds (612.16)

for all L,M ∈ LinV.

Proof: Let L,M ∈ LinV be given. Also, let r ∈ R× be given and put

K := L + rM, E1 := expV (ιL), E2 := expV (ιK). (612.17)

By Prop.4 we have E1• = LE1, E2

• = KE2, and E1(0) = E2(0) = 1V .Taking the difference and observing (612.17)1 we see that D := E2 − E1

satisfiesD• = KE2 − LE1 = LD + rME2, D(0) = 0.

Hence, if we apply Prop.5 with the choice F := rME2 we obtain

(E2 − E1)(t) = r

∫ t

0E1(t− s)ME2(s)ds for all t ∈ R.

Page 229: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

612. LINEONIC EXPONENTIALS 269

For t := 1 we obtain, in view of (612.17),

1

r(expV(L + rM) − expV(L)) =

∫ 1

0(expV((1 − s)L))M expV(s(L + rM))ds.

Since expV is continuous by Prop.2, and since bilinear mappings are contin-uous (see Sect.66), we see that the integrand on the right depends contin-uously on (s, r) ∈ R2. Hence, by Prop.3 of Sect.610 and by (65.13), in thelimit r → 0 we find

(ddM expV)(L) =

∫ 1

0(expV((1 − s)L))M expV(sL)ds. (612.18)

Again, we see that the integrand on the right depends continuously on(s,L) ∈ R × LinV. Hence, using Prop.3 of Sect.610 again, we conclude thatthe mapping ddM expV : LinV → LinV is continuous. Since M ∈ LinVwas arbitrary, it follows from Prop.7 of Sect.65 that expV is of class C1. Theformula (612.16) is the result of combining (65.14) with (612.18).

Proposition 6: Assume that L,M ∈ LinV commute, i.e. that LM =ML. Then:

(i) M and expV(L) commute.

(ii) We have

expV(L + M) = (expV(L))(expV(M)). (612.19)

(iii) We have

(∇L expV)M = (expV(L))M. (612.20)

Proof: Put E := expv (ιL) and D := EM − ME. By Prop.4, we findD• = E•M − ME• = LEM − MLE. Hence, since LM = ML, we obtainD• = LD. Since D(0) = 1VM − M1V = 0, it follows from Prop.3 thatD = 0 and hence D(1) = 0, which proves (i).

Now put F := expv (ιM). By the Product Rule (66.13) and by Prop.4,we find

(EF)• = E•F + EF• = LEF + EMF.

Since EM = ME by part (i), we obtain (EF)• = (L + M)EF. Since(EF)(0) = E(0)F(0) = 1V , by Prop.4, EF = expv (ι(L + M)). Evaluationat 1 gives (612.19), which proves (ii).

Part (iii) is an immediate consequence of (612.16) and of (i) and (ii).

Page 230: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

270 CHAPTER 6. DIFFERENTIAL CALCULUS

Pitfalls: Of course, when V = R, then LinV = LinR ∼= R and thelineonic exponential reduces to the ordinary exponential exp. Prop.6 showshow the rule exp(s+t) = exp(s) exp(t) for all s, t ∈ R and the rule exp• = expgeneralize to the case when V 6= R. The assumption, in Prop.6, that L andM commute cannot be omitted. The formulas (612.19) and (612.20) neednot be valid when LM 6= ML.

If the codomain of the ordinary exponential is restricted to P× it becomesinvertible and its inverse log : P× → R is of class C1. If dimV > 1, then expVdoes not have differentiable local inverses near certain values of L ∈ LinVbecause for these values of L, the gradient ∇L expV fails to be injective (seeProblem 10). In fact, one can prove that expV is not locally invertible nearthe values of L in question. Therefore, there is no general lineonic analogueof the logarithm. See, however, Sect.85.

613 Problems for Chapter 6

(1) Let I be a genuine interval, let E be a flat space with translation spaceV, and let p : I → E be a differentiable process. Define h : I2 → V by

h(s, t) :=

p(s)−p(t)s−t if s 6= t

p•(s) if s = t

. (P6.1)

(a) Prove: If p• is continuous at t ∈ I, then h is continuous at(t, t) ∈ I2.

(b) Find a counterexample which shows that h need not be continu-ous if p is merely differentiable and not of class C1.

(2) Let f : D → R be a function whose domain D is an open convex subsetof a flat space E with translation space V.

(a) Show: If f is differentiable and ∇f is constant, then f = a|D forsome flat function a ∈ Flf(E) (see Sect.36).

(b) Show: If f is twice differentiable and ∇(2)f is constant, then fhas the form

f = a|D + Q (1D⊂E − qD→E), (P6.2)

where a ∈ Flf(E), q ∈ E , and Q ∈ Qu(V) (see Sect. 27).

Page 231: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

613. PROBLEMS FOR CHAPTER 6 271

(3) Let D be an open subset of a flat space E and let W be a genuine inner-product space. Let k : D → W× be a mapping that is differentiableat a given x ∈ D.

(a) Show that the value-wise magnitude |k| : D → P×, defined by|k|(y) := |k(y)| for all y ∈ D, is differentiable at x and that

∇x|k| = 1|k(x)|(∇xk)⊤k(x). (P6.3)

(b) Show that k/|k| : D → W× is differentiable at x and that

∇x

(

k|k|

)

= 1|k(x)|3

(

|k(x)|2∇xk − k(x) ⊗ (∇xk)⊤k(x))

. (P6.4)

(4) Let E be a genuine inner-product space with translation space V, letq ∈ E be given, and define r : E \ q → V× by

r(x) := x− q for all x ∈ E \ q (P6.5)

and put r := |r| (see Part (a) of Problem 3).

(a) Show that r/r : E \ q → V× is of class C1 and that

∇(

rr

)

= 1r3

(r21V − r⊗ r). (P6.6)

(Hint: Use Part (b) of Problem 3.)

(b) Let the function h : P× → R be twice differentiable. Show thath r : E \ q → R is twice differentiable and that

∇(2)(h r) = 1r

(

1r2

(r(h•• r) − (h• r))r⊗ r + (h• r)1V)

.(P6.7)

(c) Evaluate the Laplacian ∆(h r) and reconcile your result with(67.17).

(5) A Euclidean space E of dimension n with n ≥ 2, a point q ∈ E , and alinear space W are assumed given.

(a) Let I be an open interval, let D be an open subset of E , letf : D → I be given by (67.16), let a be a flat function on E , letg : I → W be twice differentiable, and put h := a|D(g f) : D →W . Show that

∆h = 2a|D ((2ιg•• + (n+ 2)g•) f) − 4a(q)(g• f). (P6.8)

Page 232: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

272 CHAPTER 6. DIFFERENTIAL CALCULUS

(b) Assuming that the Euclidean space E is genuine, show that thefunction h : E \ q → W given by

h(x) := ((e · (x− q))|x− q|−n)a + b (P6.9)

is harmonic for all e ∈ V := E − E and all a,b ∈ W . (Hint: UsePart (a) with a determined by ∇a = e, a(q) = 0.)

(6) Let E be a flat space, let q ∈ E , let Q be a non-degenerate quadraticform (see Sect.27) on V := E − E , and define f : E → R by

f(y) := Q(y − q) for all y ∈ E . (P6.10)

Let a be a non-constant flat function on E (see Sect.36).

(a) Prove: If the restriction of f to the hyperplane F := a<(0)attains an extremum at x ∈ F , then x must be given by

x = q − λ−1

Q (∇a), where λ := a(q)−1

Q (∇a,∇a)

. (P6.11)

(Hint: Use the Constrained-Extremum Theorem.)

(b) Under what condition does f |F actually attain a maximum or aminimum at the point x given by (P6.11)?

(7) Let E be a 2-dimensional Euclidean space with translation-space Vand let J ∈ OrthV ∩ SkewV be given (see Problem 2 of Chapt.4). LetD be an open subset of E and let h : D → V be a vector-field of classC1.

(a) Prove that

Curlh = −(div(Jh))J. (P6.12)

(b) Assuming that D is convex and that div h = 0, prove that h =J∇f for some function f : D → R of class C2.

Note: If h is interpreted as the velocity field of a volume-preserving flow, then

div h = 0 is valid and a function f as described in Part (b) is called a “stream-

function” of the flow.

Page 233: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

613. PROBLEMS FOR CHAPTER 6 273

(8) Let a linear space V, a lineon L ∈ LinV, and u ∈ V be given. Prove:The only differentiable process h : R → V that satisfies

h• = Lh and h(0) = u (P6.13)

is the one given by

h := (expV (ιL))u. (P6.14)

(Hint: To prove existence, use Prop. 4 of Sect. 612. To prove unique-ness, use Prop.3 of Sect.612 with the choice D := h ⊗ λ (value-wise),where λ ∈ V∗ is arbitrary.)

(9) Let a linear space V and a lineon J on V that satisfies J2 = −1V begiven. (There are such J if dimV is even; see Sect.89.)

(a) Show that there are functions c : R → R and d : R → R, of classC1, such that

expV (ιJ) = c1V + dJ. (P6.15)

(Hint: Apply the result of Problem 8 to the case when V is re-placed by C := Lsp(1V ,J) ⊂ LinV and when L is replaced byLeJ ∈ LinC, defined by LeJU = JU for all U ∈ C.)

(b) Show that the functions c and d of Part (a) satisfy

d• = c, c• = −d, c(0) = 1, d(0) = 0, (P6.16)

and

c(t+ s) = c(t)c(s) − d(t)d(s)d(t+ s) = c(t)d(s) + d(t)c(s)

for all s, t ∈ R. (P6.17)

(c) Show that c = cos and d = sin.

Remark: One could, in fact, use Part (a) to define sin and cos.

(10) Let a linear space V and J ∈ LinV satisfying J2 = −1V be given andput

A := L ∈ LinV | LJ = −JL. (P6.18)

Page 234: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

274 CHAPTER 6. DIFFERENTIAL CALCULUS

(a) Show that A = Null (∇πJ expV) and conclude that ∇πJ expV failsto be invertible when dimV > 0. (Hint: Use the DifferentiationTheorem for Lineonic Exponentials, Part (a) of Problem 9, andPart (d) of Problem 12 of Chap. 1).

(b) Prove that −1V ∈ Bdy Rng expV and hence that expV fails to belocally invertible near πJ.

(11) Let D be an open subset of a flat space E with translation space V andlet V ′ be a linear space.

(a) Let H : D → Lin(V,V ′) be of class C2. Let x ∈ D and v ∈ V begiven. Note that

∇xCurlH ∈ Lin(V,Lin(V,Lin(V,V ′))) ∼= Lin2(V2,Lin(V,V ′))

and hence

(∇xCurlH)∼ ∈ Lin2(V2,Lin(V,V ′)) ∼= Lin(V,Lin(V,Lin(V,V ′))).

Therefore we have

G := (∇xCurlH)∼v ∈ Lin(V,Lin(V,V ′)) ∼= Lin2(V2,V ′).

Show that

G− G∼ = (∇xCurlH)v. (P6.19)

(Hint: Use the Symmetry Theorem for Second Gradients.)

(b) Let η : D → V∗ be of class C2 and put

W := Curl η : D → Lin(V,V∗) ∼= Lin2(V2,R).

Prove: If h : D → V is of class C1, then

Curl (Wh) = (∇W)h + W∇h + (∇h)⊤W, (P6.20)

where value-wise evaluation and composition are understood.

(12) Let E and E ′ be flat spaces with dim E = dim E ′ ≥ 2, let ϕ : E → E ′ bea mapping of class C1 and put

C := x ∈ E | ∇xϕ is not invertible.

Page 235: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

613. PROBLEMS FOR CHAPTER 6 275

(a) Show that ϕ>(C) ⊃ Rngϕ ∩ Bdy (Rngϕ).

(b) Prove: If the pre-image under ϕ of every bounded subset of E ′ isbounded and if Accϕ>(C) = ∅ (see Def.1 of Sect.57), then ϕ issurjective. (Hint: Use Problem 13 of Chap.5.)

(13) By a complex polynomial p we mean an element of C(N), i.e. asequence in C indexed on N and with finite support (see (07.10)). Ifp 6= 0, we define the degree of p by

deg p := maxSupt p = maxk ∈ N | pk 6= 0.

By the derivative p′ of p we mean the complex polynomial p′ ∈ C(N)

defined by

(p′)k := (k + 1)pk+1 for all k ∈ N. (P6.21)

The polynomial function p : C → C of p is defined by

p(z) :=∑

k∈Npkz

k for all z ∈ C. (P6.22)

Let p ∈ (C(N))× be given.

(a) Show that p<(0) is finite and ♯p<(0) ≤ deg p.

(b) Regarding C as a two-dimensional linear space over R, show thatp is of class C2 and that

(∇zp)w = p′(z)w for all z, w ∈ C. (P6.23)

(c) Prove that p is surjective if deg p > 0. (Hint: Use Part (b) ofProblem 12.)

(d) Show: If deg p > 0, then the equation ? z ∈ C, p(z) = 0 has atleast one and no more than deg p solutions.

Note: The assertion of Part (d) is usually called the “Fundamental Theorem of

Algebra”. It is really a theorem of analysis, not algebra, and it is not all that

fundamental.

Page 236: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

Chapter 7

Coordinate Systems

In this chapter we assume again that all spaces (linear, flat, or Euclidean)under consideration are finite-dimensional.

71 Coordinates in Flat Spaces

We assume that a flat space E with translation space V is given. Roughlyspeaking, a coordinate system is a method for specifying points in E bymeans of families of numbers. The coordinates are the functions that as-sign the specifying numbers to the points. It is useful to use the coordinatefunctions themselves as indices when describing the family of numbers thatspecifies a given point. For most “curvilinear” coordinate systems, the co-ordinate functions can be defined unambiguously only on a subset D of Eobtained from E by removing a suitable set of “exceptional” points. Exam-ples will be considered in Sect.74.

Definition: A coordinate system on a given open subset D of E is

a finite set Γ ⊂ Map (D,R) of functions c : D → R, which are called

coordinates, subject to the following conditions:

(C1) Every coordinate c ∈ Γ is of class C2.

(C2) For every x ∈ D, the family (∇xc | c ∈ Γ) of the gradients of the

coordinates at x is a basis of V∗.

(C3) The mapping Γ : D → RΓ, identified with the set Γ by self-indexing

(see Sect.02) and term-wise evaluation (see Sect.04), i.e. by Γ(x) :=(c(x) | c ∈ Γ) for all x ∈ D, is injective.

277

Page 237: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

278 CHAPTER 7. COORDINATE SYSTEMS

For the remainder of this section we assume that an open subset Dof E and a coordinate system Γ on D are given. Using the identificationV∗Γ = Lin(V,R)Γ ∼= Lin(V,RΓ) (see Sect.14), we obtain

∇xΓ = (∇xc | c ∈ Γ). (71.1)

The condition (C2) expresses the requirement that the gradient ofΓ : D → RΓ have invertible values. Using the Local Inversion Theoremof Sect.68 and (C3) we obtain the following result.

Proposition 1: The range D := Rng Γ of Γ is an open subset of

RΓ, the mapping γ := Γ|D : D → D is invertible and has an inverse

ψ := γ← : D → D that is of class C2.

By (65.11), the family

(ψ,c | c ∈ Γ) ∈ Map (D,V)Γ ∼= Map (D,VΓ)

of partial derivatives of ψ with respect to the coordinates is related to thegradient ∇ψ : D → Lin(RΓ,V) of ψ by

∇ψ = lnc(ψ,c | c∈Γ), (71.2)

where the value at ξ ∈ D of the right side is understood to be the linearcombination mapping of the family (ψ,c (ξ) | c ∈ Γ). Since ψ = γ←, theChain Rule shows that ∇ξψ is invertible with inverse (∇ξψ)−1 = (∇ψ(ξ)Γ) =

∇ψ(ξ)Γ for all ξ ∈ D. In view of (71.2), it follows that (ψ,c (ξ) | c ∈ Γ) is abasis of V and, in view of (71.1), that

(∇ψ(ξ)c)ψ,d (ξ) = δcd :=

0 if c 6= d1 if c = d

(71.3)

for all c, d ∈ Γ and all ξ ∈ D. We use the notation

bc := ψ,c γ, βc := ∇c for all c ∈ Γ. (71.4)

By (C1) and Prop.1, bc : D → V and βc : D → V∗ are of class C1 for allc ∈ Γ. It follows from (71.3) that

βcbd = δcd for all c, d ∈ Γ. (71.5)

By Prop.4 of Sect.23, we conclude that for each x ∈ D, the family β(x) :=(βc(x) | c ∈ Γ) in V∗ is the dual basis of the basis b(x) := (bc(x) | c ∈ Γ) ofV. We call b : D → VΓ the basis field and β : D → V∗Γ the dual basisfield of the given coordinate system Γ.

Page 238: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

71. COORDINATES IN FLAT SPACES 279

Remark: The reason for using superscripts rather than subscripts asindices to denote the terms of certain families will be explained in Sect.73.The placing of the indices is designed in such a way that in most of thesummations that occur, the summation dummy is used exactly twice, onceas a superscript and once as a subscript.

We often use the word “field” for a mapping whose domain is D or anopen subset of D. If the codomain is R, we call it a scalar field; if thecodomain is V, we call it a vector field; if the codomain is V∗, we call it acovector field; and if the codomain is LinV, we call it a lineon field. If his a vector field, we define the component-family

[h] := ([h]c | c ∈ Γ) ∈ (Map (Domh,R))Γ

of h relative to the given coordinate system by [h](x) := lnc−1b(x)h(x) for all

x ∈ Domh, so that

h =∑

d∈Γ

[h]dbd, [h]c = βch for all c ∈ Γ. (71.6)

If η is a covector field, we define the component family [η] := ([η]c | c ∈ Γ)of η by [η](x) := lnc−1

β(x)η(x) for all x ∈ Dom η, so that

η =∑

d∈Γ

[η]dβd, [η]c = ηbc for all c ∈ Γ. (71.7)

If T is a lineon field, we define the component-matrix [T] := ([T]cd | (c, d) ∈Γ2) of T by

[T](x) := (lnc(bc⊗β

d| (c,d)∈Γ2)

)−1T(x)

for all x ∈ DomT, so that

T =∑

(c,d)∈Γ2

[T]cd(bc ⊗ βd). (71.8)

The matrix [T] is given by

[T]cd = βcTbd for all c, d ∈ Γ (71.9)

and characterized by

Tbc =∑

d∈Γ

[T]dcbd for all c ∈ Γ. (71.10)

Page 239: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

280 CHAPTER 7. COORDINATE SYSTEMS

In general, if F is a field whose codomain is a linear space W constructedfrom V by some natural construction, we define the component family [F]of F as follows: For each x ∈ DomF, [F](x) is the family of componentsof F relative to the basis of W induced by the basis b(x) of V by theconstruction of W . For example, if F is a field with codomain Lin(V∗,V)then the component-matrix [F] := ([F]cd | (c, d) ∈ Γ2) of F is determined by

F =∑

(c,d)∈Γ2

[F]cd(bc ⊗ bd). (71.11)

It is given by[F]cd = βcFβd for all c, d ∈ Γ (71.12)

and characterized by

Fβc =∑

d∈Γ

[F]dcbd for all c ∈ Γ. (71.13)

A field of any type is continuous, differentiable, or of class C1 if and onlyif all of its components have the corresponding property.

We say that Γ is a flat coordinate system if the members of Γ are allflat functions or restrictions thereof. If also D = E , then γ = Γ : E → RΓ

is a flat isomorphism. The point q := ψ(0) = γ←(0) ∈ E is then called theorigin of the given flat coordinate system. A flat coordinate system canbe specified by prescribing the origin q ∈ E and a set basis b of V. Theneach member λ of the dual basis b∗ can be used to specify a coordinatec : E → R by c(x) := λ(x− q) for all x ∈ E .

The basis field b (or its dual β) is constant if and only if the coordinatesystem is flat.

Let ξ ∈ D ⊂ RΓ and c ∈ Γ be given and let ψ(ξ.c) be defined accordingto (04.25). It is easily seen that Domψ(ξ.c) is an open subset of R. Wecall Rngψ(ξ.c) the coordinate curve through the point ψ(ξ) coorespondingto the coordinate c. If Γ is a flat coordinate system, then the coordinatecurves are straight lines.

Pitfall: It is easy to give examples of non-flat coordinate systems whichare “rectilinear” in the sense that all coordinate curves are straight lines.Thus, there is a distinction between non-flat coordinate systems and “curvi-linear coordinate systems”, i.e. coordinate systems having some coordinatecurves that are not straight.

Page 240: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

72. CONNECTION COMPONENTS 281

Notes 71

(1) In all discussions of the theory of coordinate systems I have seen in the lit-erature, the coordinates are assumed to be enumerated so as to form a list(ci | i ∈ n]), n := dimE . The numbers in n] are then used as indices in allformulas involving coordinates and components. However, when dealing with aspecific coordinate system, most authors do not follow their theory but instead usethe coordinates themselves as indices. I believe that one should do in theory whatone does in practice and always use the set of coordinates itself as the index set.Enumeration of the coordinates is artificial and serves no useful purpose.

(2) Many textbooks contain a lengthy discussion about the transformation from onecoordinate system to another. I believe that such “coordinate transformations”serve no useful purpose and may lead to enormous unnecessary calculations. In aspecific situation, one should choose from the start a coordinate system that fits thesituation and then stick with it. Even if a second coordinate system is considered,as is useful on rare occasions, one does not need “coordinate transformations”.

72 Connection ComponentsComponents of Gradients

Let a coordinate system Γ on an open subset D of a flat space E withtranslation space V be given. As we have seen in the previous section, thebasis field b and the dual basis field β of the system Γ are of class C1.

We use the notations

Ccde := βd(∇bc)be for all c, d, e ∈ Γ (72.1)

and

Dc := div bc for all c ∈ Γ. (72.2)

The functions Ccde : D → R and Dc : D → R are called the connection

components and the deviation components, respectively, of the systemΓ.

The connection components give the component matrices of the gradientsnot only of the terms of the basis field b but also of its dual β:

Proposition 1: For each coordinate c ∈ Γ, the component matrices of

∇bc and ∇βc are given by

[∇bc] = (Ccde | (d, e) ∈ Γ2) (72.3)

and

[∇βc] = (−Cdce | (d, e) ∈ Γ2). (72.4)

Page 241: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

282 CHAPTER 7. COORDINATE SYSTEMS

Proof: We obtain (72.3) simply by comparing (72.1) with (71.9). Toderive (72.4), we use the product rule (66.6) when taking the gradient of(71.5) and obtain

(∇βc)⊤bd + (∇bd)⊤βc = 0 for all d ∈ Γ.

Value-wise operation on be gives

bd(∇βc)be + βc(∇bd)be = 0 for all d, e ∈ Γ.

The result (72.4) follows now from (72.1) and the fact that the matrix [∇βc]is given by

[∇βc] = (bd(∇βc)be | (d, e) ∈ Γ2).

Proposition 2: The connection components satisfy the symmetry rela-

tions

Ccde = Ce

dc for all c, d, e ∈ Γ. (72.5)

Proof: In view of the definition βc := ∇c we have ∇βc = ∇∇c. Theassertion (72.5) follows from (72.4) and the Theorem on Symmetry of SecondGradients of Sect.611, applied to c.

Proposition 3: The deviation components are obtained from the con-

nection components by

Dc =∑

d∈Γ

Ccdd for all c ∈ Γ. (72.6)

Proof: By (72.2), (67.1), and (67.2), we have Dc(x) = tr(∇xbc) for allx ∈ D. The desired result (72.6) then follows from (72.3) and (26.8).

Let W be a linear space and let F be a differentiable field with domainD and with codomain W . We use the notation

F;c := (F ψ),c γ for all c ∈ Γ, (72.7)

which says that F;c is obtained from F by first looking at the dependence ofthe values of F on the coordinates, then taking the partial c-derivative, andthen looking at the result as a function of the point. It is easily seen that

(∇F)bc = F;c for all c ∈ Γ. (72.8)

If F := f is a differentiable scalar field, then (72.8) states that the componentfamily [∇f ] of the covector field ∇f : D → V∗ is given by

[∇f ] = (f;c | c ∈ Γ). (72.9)

Page 242: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

72. CONNECTION COMPONENTS 283

The connection components are all zero if the coordinate system is flat.If it is not flat, one must know what the connection components are in orderto calculate the components of gradients of vector and covector fields.

Proposition 4: Let h be a differentiable vector field and η a differen-

tiable covector field with component-families [h] and [η], respectively. Then

the component-matrices of ∇h and ∇η have the terms

[∇h]cd = [h]c;d +∑

e∈Γ

[h]eCecd, (72.10)

and

[∇η]cd = [η]c;d −∑

e∈Γ

[η]eCced (72.11)

for all c, d ∈ Γ.

Proof: To obtain (72.10), one takes the gradient of (71.6)1, uses theproduct rule (66.5), the formula (71.9) with T := ∇h, the formula (72.9)with f := [h]c and (72.3). The same procedure, starting with (71.7)1 andusing (72.4), yields (72.11).

It is easy to derive formulas analogous to (72.10) and (72.11) for thecomponents of gradients of other kinds of fields.

Proposition 5: Let η be a differentiable covector field. Then the compo-

nent-matrix of Curl η := ∇η−(∇η)⊤ : Dom η → Lin(V,V∗) (see Sect.611)is given by

[Curl η]cd = [η]c;d − [η]d;c for all c, d ∈ Γ. (72.12)

Proof: By Def.1 of Sect.611, the components of Curl η are obtainedfrom the components of ∇η by

[Curl η]cd = [∇η]cd − [∇η]dc for all c, d ∈ Γ.

After substituting (72.11), we see from (72.5) that the terms involving theconnection components cancel and hence that (72.12) holds.

Note that the connection components do not appear in (72.12).Proposition 6: Let h be a differentiable vector field. The divergence

of h is given, in terms of the component family [h] of h and the deviation

components, by

div h =∑

c∈Γ

([h]c;c + [h]cDc). (72.13)

Proof: Using (71.6)1 and Prop.1 of Sect.67, we obtain

div h =∑

c∈Γ

(∇( [h]c)bc + [h]cdiv bc).

Page 243: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

284 CHAPTER 7. COORDINATE SYSTEMS

Using (72.8) with F := [h]c and the definition (72.2) of Dc, we obtain thedesired result (72.13).

Proposition 7: Let F be a differentiable field with codomain Lin(V∗,V).Then the component family of div F : DomF → V (see Def.1 of Sect.67) is

given by

[div F]c =∑

d∈Γ

([F]cd;d + [F]cdDd) +∑

(e,d)∈Γ2

[F]edCecd (72.14)

for all c ∈ Γ.

Proof: Let c ∈ Γ be given. Then βcF is a differentiable vector field. Ifwe apply Prop.2 of Sect.67 with W := V, H := F, and ρ := βc, we obtain

div(βcF) = βcdiv F + tr(F⊤∇βc). (72.15)

It follows from (72.4), Prop.5 of Sect.16, (23.9), and (26.8) that

tr(F⊤∇βc) = −∑

(e,d)∈Γ2

[F]edCecd,

and from Prop.6 that

div(βcF) =∑

d∈Γ

([βcF]d;d + [βcF]dDd).

Substituting these two results into (72.15) and observing that [div F]c =βcdiv F and [βcF]d = [F]cd for all d ∈ Γ, we obtain the desired result(72.14).

Assume now that the coordinate system Γ is flat and hence that its basisfield b and the dual basis field β are constant. By (72.1) and (72.2), theconnection components and deviation components are zero. The formulas(72.10) and (72.11) for the components of the gradients of a differentiablevector field h and a differentiable covector field η reduce to

[∇h]cd = [h]c;d, [∇η]cd = [η]c;d, (72.16)

valid for all c, d ∈ Γ. The formula (72.13) for the divergence of a differen-tiable vector field h reduces to

div h =∑

c∈Γ

[h]c;c, (72.17)

and the formula (72.14) for the divergence of a differentiable field F withvalues in Lin(V∗,V) becomes

[div F]c =∑

d∈Γ

[F]cd;d for all c ∈ Γ. (72.18)

Page 244: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

73. COORDINATES IN EUCLIDEAN SPACES 285

Notes 72

(1) The term “connection components” is used here for the first time. More traditionalterms are “Christoffel components”, “Christoffel symbols”, “three-index symbols”,or “Gamma symbols”. The last three terms are absurd because it is not the symbolsthat matter but what they stand for. The notations i

jk or Γi

jk instead of our

Ccd

e are often found in the literature.

(2) The term “deviation components” is used here for the first time. I am not awareof any other term used in the literature.

(3) The components of gradients as described, for example, in Prop.4 are often called“covariant derivatives”.

73 Coordinates in Euclidean Spaces

We assume that a coordinate system Γ on an open subset of a (not necessarilygenuine) Euclidean space E with translation space V is given. Since V is aninner-product space, we identify V∗ ∼= V, so that the basis field b as well asthe dual basis field β of Γ have terms that are vector fields of class C1. Weuse the notations

Gcd := bc · bd for all c, d ∈ Γ (73.1)

and

Gcd

:= βc · βd for all c, d ∈ Γ, (73.2)

so that G := (Gcd | (c, d) ∈ Γ2) and G := (Gcd

| (c, d) ∈ Γ2) are symmetricmatrices whose terms are scalar fields of class C1. These scalar fields arecalled the inner-product components of the system Γ. In terms of thenotation (41.13), G and G are given by

G(x) = Gb(x), G(x) = Gβ(x) (73.3)

for all x ∈ D. Since β(x) is the dual basis of b(x) for each x ∈ D, we obtainthe following result from (41.16) and (41.15).

Proposition 1: For each x ∈ D, the matrix G(x) is the inverse of G(x)and hence we have

e∈Γ

GceGed

= δdc for all c, d ∈ Γ. (73.4)

Page 245: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

286 CHAPTER 7. COORDINATE SYSTEMS

The basis fields b and β are related by

bc =∑

d∈Γ

Gcdβd, βc =

d∈Γ

Gcd

bd (73.5)

for all c ∈ Γ.

The following result shows that the connection components can be ob-tained directly from G and G.

Theorem on Connection Components: The connection components

of a coordinate system on an open subset of a Euclidean space can be obtained

from the inner-product components of the system by

Ccde =

1

2

f∈Γ

Gdf

(Gfc;e +Gfe;c −Gce;f ) (73.6)

for all c, d, e ∈ Γ.

Proof: We use the abbreviation

Bcde := bd · (∇bc)be for all c, d, e ∈ Γ. (73.7)

By (72.1) and Prop.1 we have

Ccde =

f∈Γ

GdfBcfe, Bcde =

f∈Γ

GdfCcfe (73.8)

for all c, d, e ∈ Γ. We note that, in view of (73.8)2 and (72.5),

Bcde = Bedc for all c, d, e ∈ Γ. (73.9)

Let c, d, e ∈ Γ be given. Applying the Product Rule (66.9) to (73.1), wefind that

∇Gcd = (∇bc)⊤bd + (∇bd)

⊤bc.

Hence, by (72.8), (41.10), and (73.7) we obtain

Gcd;e = (∇Gcd) · be = Bcde +Bdce.

We can view this as a system of equations which can be solved for Bcdeas follows: Since c, d, e ∈ Γ were arbitrary, we can rewrite the system withc, d, e cyclically permuted and find that

Gcd;e = Bcde +Bdce,

Gde;c = Bdec +Bedc,

Page 246: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

73. COORDINATES IN EUCLIDEAN SPACES 287

Gec;d = Becd +Bced

are valid for all c, d, e ∈ Γ. If we subtract the last of these equations fromthe sum of the first two and observe (73.9), we obtain

Gcd;e +Gde;c −Gec;d = 2Bcde,

valid for all c, d, e ∈ Γ. Using (73.8)1, we conclude that (73.6) holds.For the following results, we need the concept of determinant, which

will be explained only in Vol.II of this treatise. However, most readersundoubtedly know how to compute determinants of square matrices of smallsize, and this is all that is needed for the application of the results to specialcoordinate systems. We are concerned here with the determinant for RΓ. InVol.II we will see that it is a mapping det : Lin RΓ → R of class C1 whosegradient satisfies

(∇M det)N = det(M)tr(M−1N) (73.10)

for all M ∈ Lis RΓ and all N ∈ Lin RΓ. If M ∈ Lin RΓ ∼= RΓ×Γ is a diagonalmatrix (see Sect.02), then det(M) is the product of the diagonal of M , i.e.we have

det(M) =∏

c∈Γ

Mcc. (73.11)

We have det(M) 6= 0 if and only if M ∈ Lin RΓ is invertible. Hence, sinceG has invertible values by Prop.1, det G : D → R is nowhere zero.

Theorem on Deviation Components: The deviation components of

a coordinate system on an open subset of a Euclidean space can be obtained

from the determinant of the inner-product matrix of the system by

Dc = 2(det G); c

det Gfor all c ∈ Γ. (73.12)

Proof: By (73.10), the Chain Rule, and Prop.1 we obtain

(det G);c = (det G)tr(GG;c). (73.13)

On the other hand, by (73.6) and (72.6), we have

Dc =1

2

(d,f)∈Γ2

Gdf

(Gfc;d +Gfd;c −Gcd;f ). (73.14)

Since the matrices G and G are symmetric we have

(d,f)∈Γ2

GdfGfc;d =

(f,d)∈Γ2

GfdGdc;f =

(d,f)∈Γ2

GdfGcd;f ,

Page 247: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

288 CHAPTER 7. COORDINATE SYSTEMS

and hence, by (73.14), Prop.5 of Sect.16, and (26.8),

Dc =1

2

(d,f)∈Γ2

GdfGfd;c =

1

2tr(GG;c).

Comparing this result with (73.13), we obtain (73.12).

In practice, it is useful to introduce the function

g :=√

| det G|, (73.15)

which is of class C1 because det G is nowhere zero. Since det G = ±g2,(73.12) is equivalent to

Dc =g;cg

for all c ∈ Γ. (73.16)

Remark: One can show that the sign of det G depends only on thesignature of the inner product space V (see Sect.47). In fact det G is strictlypositive if sig−V is even, strictly negative if sig−V is odd. In particular, ifE is a genuine Euclidean space, det G is strictly positive and the absolutevalue symbols can be omitted in (73.15).

The identification V ∼= V∗ makes the distinction between vector fieldsand covector fields disappear. This creates ambiguities because a vectorfield h now has two component families, one with respect to the basis fieldb and the other with respect to the dual basis field β. Thus, the symbol [h]becomes ambiguous. For the terms of [h], the ambiguity is avoided by carefulattention to the placing of indices as superscripts or subscripts. Thus, weuse ([h]c | c ∈ Γ) := β · h for the component family of h relative to b and([h]c | c ∈ Γ) := b · h for the component family of h relative to β (see also(41.17)). The symbol [h] by itself can no longer be used. It follows from(73.5) that the two types of components of h are related by the formulas

[h]c =∑

d∈Γ

Gcd

[h]d, [h]c =∑

d∈Γ

Gcd[h]d, (73.17)

valid for all c ∈ Γ. To avoid clutter, we often omit the brackets and writehc for [h]c and hc for [h]c if no confusion can arise.

A lineon field T has four component matrices because of the identifica-tions LinV ∼= LinV∗ ∼= Lin(V,V∗) ∼= Lin(V∗,V). The resulting ambiguity isavoided again by careful attention to the placing of indices as superscriptsand subscripts. The four types of components are given by the formulas

Page 248: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

73. COORDINATES IN EUCLIDEAN SPACES 289

[T]cd = βc ·Tbd, [T]cd = bc · Tβd,

[T]cd = bc · Tbd, [T]cd = βc · Tβd,(73.18)

valid for all c, d ∈ Γ. The various types of components are related to eachother by formulas such as

[T]cd =∑

e∈Γ

Gde

[T]ce =∑

(e,f)∈Γ2

GceGdf

[T]ef , (73.19)

valid for all c, d ∈ Γ. Again, we often omit the brackets to avoid clutter.Using (73.16) and the product rule, we see that Props.6 and 7 of Sect.72

have the following corollaries.Proposition 2: The divergence of a differentiable vector field h is given

by

div h =1

g

c∈Γ

(g[h]c);c. (73.20)

Proposition 3: The components of the divergence of a differentiable

lineon field T are given by

[div T]c =1

g

d∈Γ

(g[T]cd);d +∑

(e,d)∈Γ2

[T]edCecd (73.21)

for all c ∈ Γ.

Using Def.2 of Sect.67 and observing (72.9) and (73.17)1, we obtain thefollowing immediate consequence of Prop.2.

Proposition 4: The Laplacian of a twice differentiable scalar field f is

given by

∆f =1

g

(c,d)∈Γ2

(gGcdf;d);c. (73.22)

Notes 73

(1) If h is a vector field, the components [h]c of h relative to the basis field b are oftencalled the “contravariant components” of h, and the components [h]c of h relativeto the dual basis field β are then called the “covariant components” of h. (See alsoNote (5) to Sect.41.)

(2) If T is a lineon field, the components [T]cd and Tcd are often called the “covariant

components” and “contravariant components”, respectively, while [T]cd and [T]cd

are called “mixed components” of T.

Page 249: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

290 CHAPTER 7. COORDINATE SYSTEMS

74 Special Coordinate Systems

In this section a genuine Euclidean space E with translation space V isassumed given.

(A) Cartesian Coordinates: A flat coordinate system Γ is called aCartesian coordinate system if (∇c | c ∈ Γ) is a (genuine) orthonormal basisof V. To specify a Cartesian coordinate system, we may prescribe a pointq ∈ E and an orthonormal basis set e of V. For each e ∈ e, we define afunction c : E → R by

c(x) := e · (x− q) for all x ∈ E . (74.1)

The set Γ of all functions c defined in this way is then a Cartesian coordinatesystem on E with origin q. Since e is orthonormal, we have βc = bc for allc ∈ Γ. The mapping Γ : E → RΓ defined by Γ(x) := (c(x) | c ∈ Γ) for allx ∈ E is invertible and its inverse ψ : RΓ → E is given by

ψ(ξ) = q +∑

c∈Γ

ξcbc for all ξ ∈ RΓ. (74.2)

The matrices of the inner-product components of Γ are constant and givenby G = G = 1RΓ . If h is a vector field, then [h]c = [h]c for all c ∈ Γ and ifT is a lineon field, then [T]cd = [T]cd = [T]c

d = [T]cd for all c, d ∈ Γ. Thus,all indices can be written as subscripts without creating ambiguity. Sincedet G = 1 and hence g = 1, the formulas (73.20), (73.22), and (73.21)reduce to

div h =∑

c∈Γ

[h]c;c,

∆f =∑

c∈Γ

f;c;c, (74.3)

[div T]c =∑

d∈Γ

[T]cd;d,

respectively.

(B) Polar Coordinates: We assume that dim E = 2. We prescribe apoint q ∈ E , a lineon J ∈ SkewV ∩ OrthV (see Problem 2 of Chap.4) anda unit vector e ∈ V. (J is a perpendicular turn as defined in Sect.87.) Wedefine h : R → V by

h := expV (ιJ)e = cos e + sinJe (74.4)

Page 250: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

74. SPECIAL COORDINATE SYSTEMS 291

(see Problem 9 of Chap.6). Roughly, h(t) is obtained from e, by a rotationwith angle t. We have

h•(t) = Jh(t) for all t ∈ R, (74.5)

and hence

h · h• = 0, |h| = |h•| = 1. (74.6)

We now consider the mapping Ψ : P× × R → E defined by

Ψ(s, t) := q + sh(t) for all (s, t) ∈ P× × R. (74.7)

It is clear that Ψ is of class C1, and, in view of (65.11), its gradient is givenby

∇(s,t)Ψ = lnc(h(t),sh•(t)) for all (s, t) ∈ P× × R.

It is clear from (74.6) that (h(t), sh•(t)) is a basis of V for every (s, t) ∈P× × R and hence that ∇Ψ has only invertible values. It follows from theLocal Inversion Theorem of Sect.68 that Ψ is locally invertible. It is notinjective because Ψ(s, t) = Ψ(s, t+ 2π) for all (s, t) ∈ P××R. An invertibleadjustment of Ψ with open domain is ψ := Ψ|D

D, where

D := P×× ]0, 2π[, D := E \ (q + Pe). (74.8)

We put γ := ψ← and define r : D → R and θ : D → R by (r, θ) := γ|R2.

Then Γ := r, θ is called a polar coordinate system. The function r isgiven by

r(x) = |x− q| for all x ∈ D. (74.9)

and the function θ is characterized by Rng θ = ]0, 2π[ and

h(θ(x)) =x− q

r(x)for all x ∈ D. (74.10)

The values of the coordinates of a point x are indicated in Fig.1; the argu-ment x is omitted to avoid clutter.

Page 251: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

292 CHAPTER 7. COORDINATE SYSTEMS

If we interpret the first term in a pair as the r-term and the second asthe θ-term of a family indexed on Γ = r, θ, then the mappings γ and ψabove coincide with the mappings denoted by the same symbols in Sect.71.Since ψ is an adjustment of the mapping Ψ defined by (74.7), it follows from(71.4)1 that

br = h θ, bθ = r(h• θ). (74.11)

By (74.6) we have br ·bθ = 0, |br| = 1, and |bθ| = r (see Fig.1). Hence theinner-product components (73.1) are

Grθ = 0, Grr = 1, Gθθ = r2. (74.12)

The value-wise inverse G of the matrix G is given by

Grθ

= 0, Grr

= 1, Gθθ

=1

r2. (74.13)

By (73.11) we have det G = r2, and hence g :=√

| det G| becomes

g = r. (74.14)

Using the Theorem on Connection Components of Sect.73, we find that theonly non-zero connection components are

Cθθr = Cr

θθ =

1

r, Cθ

rθ = −r. (74.15)

The formula (73.20) for the divergence of a differentiable vector field hbecomes

div h =1

r(r[h]r);r + [h]θ;θ. (74.16)

Page 252: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

74. SPECIAL COORDINATE SYSTEMS 293

The formula (73.22) for Laplacian of a twice differentiable scalar field fbecomes

∆f =1

r(rf;r);r +

1

r2f;θ;θ. (74.17)

The formula (73.21) for the components of the divergence of a differentiablelineon field T yields

[div T]r = 1r(rTrr);r + Trθ

;θ − rTθθ,

[div T]θ = 1r(rTθr);r + Tθθ

;θ + 1r(Trθ + Tθr).

(74.18)

(On the right sides, brackets are omitted to avoid clutter.)Remark: The adjustment of Ψ described above is only one of several

that yield a suitable definition of a polar coordinate system. Another wouldbe obtained by replacing (74.8) by

D := P×× ]−π, π[, D := E \ (q − Pe). (74.19)

In this case θ would be characterized by (74.10) and Rng θ = ]−π, π[.(C) Cylindrical Coordinates: We assume that dim E = 3. We first

prescribe a point q ∈ E and a unit vector f ∈ V. We then put U := f⊥

and prescribe a lineon J ∈ SkewU ∩ OrthU and a unit vector e ∈ U . Wedefine h : R → V by (74.4) and consider the mapping Ψ : P× × R × R → Edefined by

Ψ(s, t, u) := q + sh(t) + uf . (74.20)

It is easily seen that Ψ is of class C1 and locally invertible but not injective.An invertible adjustment of Ψ with open domain is ψ := Ψ|D

D, where

D := P×× ]0, 2π[ × R, D := E \ (q + Pe + Rf). (74.21)

We put γ := ψ← and define the functions r, θ, z, all with domain D, by(r, θ, z) := γ|R

3. Then Γ := r, θ, z is called a cylindrical coordinate

system. let E be the symmetric idempotent for which RngE = U (seeProp.4 of Sect.41). Then the functions r and z are given by

r(x) = |E(x− q)|, z(x) = f · (x− q) for all x ∈ D, (74.22)

and θ is characterized by Rng θ = ]0, 2π[ and

h(θ(x)) =1

r(x)E(x− q) for all x ∈ D. (74.23)

The values of the coordinates of a point x are indicated in Fig.2; the argu-ment x is omitted to avoid clutter.

Page 253: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

294 CHAPTER 7. COORDINATE SYSTEMS

If we interpret the first term in a triple as the r-term, the second as theθ-term, and the third as the z-term of a family indexed on Γ = r, θ, z,then the notation γ and ψ above is in accord with the one used in Sect.1.By the same reasoning as used in Example (B), we infer from (74.20) thatthe basis field b of Γ is given by

br = h θ, bθ = r(h• θ), bz = f . (74.24)

The matrices G and G of the inner-product components of Γ are diagonalmatrices and their diagonals are given by

Grr = 1, Gθθ = r2, Gzz = 1, (74.25)

Grr

= 1, Gθθ

=1

r2, G

zz= 1. (74.26)

The relation (74.14) remains valid and the only non-zero connection com-ponents of Γ are again given by (74.15). The formulas (74.16), (74.17), and(74.18) must be replaced by

div h =1

r(rhr);r + hθ ;θ + hz ;z, (74.27)

∆f =1

r(rf;r);r +

1

r2f;θ;θ + f;z;z, (74.28)

Page 254: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

74. SPECIAL COORDINATE SYSTEMS 295

and

[div T]r =1

r(rTrr);r + Trθ

;θ + Trz;z − rTθθ,

[div T]θ =1

r(rTθr);r + Tθθ

;θ + Tθz;z +

1

r(Trθ + Tθr), (74.29)

[div T]z =1

r(rTzr);r + Tzθ

;θ + Tzz;z.

(Brackets are omitted on the right sides.)

(D) Spherical Coordinates: We assume dim E = 3 and prescribe q,f , and J as in Example (C). We replace the formula (74.20) by

Ψ(s, t, u) := q + s(cos(t)f + sin(t)h(u)). (74.30)

The definition of D in (74.21) must be replaced by

D := P× × ]0, π[ × ]0, 2π[, (74.31)

but D remains the same. Then ψ := Ψ|DD

is invertible, and the functions

r, θ, ϕ, all with domain D, are defined by γ := ψ← and (r, θ, ϕ) := γ|R3. Then

Γ := r, θ, ϕ is called a spherical coordinate system. The functions rand θ are given by

r(x) = |x− q|, θ(x) = arccos

(

f · (x− q)

r(x)

)

for all x ∈ D (74.32)

and ϕ is characterized by Rngϕ = ]0, 2π[ and

h(ϕ(x)) =1

r(x) sin(θ(x))E(x− q) for all x ∈ D (74.33)

where E is defined as in Example (C). The values of the coordinates of apoint x are indicated in Fig.3, again with arguments omitted.

Page 255: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

296 CHAPTER 7. COORDINATE SYSTEMS

By the same procedure as used in the previous examples, we find thatthe basis field b of Γ is given by

br = (cos θ)f + (sin θ)(h ϕ),

bθ = r(−(sin θ)f + (cos θ)(h ϕ)), (74.34)

bϕ = r(sin θ)(h• ϕ).

The matrices G and G of the inner-product components are again diagonalmatrices and their diagonals are given by

Grr = 1, Gθθ = r2, Gϕϕ = r2(sin θ)2, (74.35)

Grr

= 1, Gθθ

=1

r2, G

ϕϕ=

1

r2(sin θ)2. (74.36)

By (73.11) we have det G = r4(sin θ)2 and hence g :=√

| det G| becomes

g = r2(sin θ). (74.37)

Using (73.6), we find that the only non-zero connection components are

Cθθr = Cr

θθ = Cϕ

ϕr = Cr

ϕϕ =

1

r, Cθ

rθ = −r, (74.38)

Cϕϕθ = Cθ

ϕϕ =

1

tan θ, Cϕ

rϕ = −r(sin θ)2, Cϕ

θϕ = −(sin cos) θ.

Page 256: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

75. PROBLEMS FOR CHAPTER 7 297

The formulas (73.20) and (73.22) for the divergence and for the Laplacianspecialize to

div h =1

r2(r2hr);r +

1

sin θ((sin θ)hθ);θ + hϕ;ϕ (74.39)

and

∆f =1

r2(r2f;r);r +

1

r2(sin θ)((sin θ)f;θ);θ +

1

r2(sin θ)2f;ϕ;ϕ. (74.40)

the formula (73.21) for the divergence of a lineon field gives

[div T]r =1

r2(r2Trr);r +

1

sin θ((sin θ)Trθ);θ + Trϕ

− rTθθ − r(sin θ)2Tϕϕ,

[div T]θ =1

r2(r2Tθr);r +

1

sin θ((sin θ)Tθθ);θ + Tθϕ

+1

r(Trθ + Tθr) − ((sin cos) θ)Tϕϕ, (74.41)

[div T]ϕ =1

r2(r2Tϕr);r +

1

sin θ((sin θ)Tϕθ);θ + Tϕϕ

+1

r(Trϕ + Tϕr) +

1

tan θ(Tϕθ + Tθϕ).

(Brackets are omitted on the right sides to avoid clutter.)

Notes 74

(1) Unfortunately, there is no complete agreement in the literature on what letters touse for which coordinates. Often, the letter ϕ is used for our θ and vice versa. Incylindrical coordinates, the letter ρ is often used for our r. In spherical coordinates,one sometimes finds ω for our ϕ.

(2) Most of the literature is very vague about how one should choose the domain D foreach of the curvilinear coordinate systems discussed in this section.

75 Problems for Chapter 7

(1) Let Γ be a coordinate system as in Def.1 of Sect.71 and let β be thedual basis field of Γ. Let p : I → D be a twice differentiable processon some interval I ∈ Sub R. Define the component-functions of p• andp•• by

Page 257: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

298 CHAPTER 7. COORDINATE SYSTEMS

[p•]c := (βc p)p•, [p••]c := (βc p)p•• (P7.1)

for all c ∈ Γ.

(a) Show that

[p•]c = (c p)• (P7.2)

and

[p••]c = (c p)•• +∑

(d,e)∈Γ2(Cdce p)(d p)

•(e p)• (P7.3)

for all c ∈ Γ, where C denotes the family of connection compo-nents.

(b) Write out the formula (P7.3) for cylindrical and spherical coor-dinates.

(2) Let Γ be a coordinate system as in Def.1 of Sect.71 and let b bethe basis field and β the dual basis field of Γ. If F is a field whosecodomain is Lin(V,LinV) ∼= Lin2(V

2,V), then the component family[F] ∈ (Map (DomF,R))Γ

3of F is given by

[F]cde := βcF(be,bd) for all c, d, e ∈ Γ. (P7.4)

(a) Show: The components of the gradient ∇T of a differentiablelineon field T are given by

[∇T]cde = [T]cd;e +∑

f∈Γ

(

[T]f d Cfce − [T]cf Cd

fe

)

(P7.5)

for all c, d, e ∈ Γ.

(b) Show that if the connection components are of class C1, theysatisfy

Ccde;f − Cc

df ;e +

g∈Γ

(

Ccge Cg

df − Cc

gf Cg

de

)

= 0 (P7.6)

for all c, d, e, f ∈ Γ. (Hint: Apply the Theorem on Symmetry ofSecond Gradients to bc and use Part (a).)

(3) Let Γ be a coordinate system on an open subset D of a genuine Eu-clidean space E with translation space V and let b be the basis field of

Page 258: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

75. PROBLEMS FOR CHAPTER 7 299

Γ. Assume that the system is orthogonal in the sense that bc •bd = 0for all c, d ∈ Γ with c 6= d. Define

bc := |bc| for all c ∈ Γ (P7.7)

and

ec := bc

bcfor all c ∈ Γ, (P7.8)

so that, for each x ∈ D, e(x) := (ec(x) | c ∈ Γ) is an orthonormal basisof V. If h is a vector field with Domh ⊂ D, we define the family 〈h〉 :=(〈h〉c | c ∈ Γ) of physical components of h by 〈h〉(x) := lnc−1

e(x)h(x)for all x ∈ D, so that

〈h〉c = h • ec for all c ∈ Γ. (P7.9)

Physical components of fields of other types are defined analogously.

(a) Derive a set of formulas that express the connection componentsin terms of the functions bc, and bc;d, c, d ∈ Γ. (Hint: Use theTheorem on Connection Components of Sect.73.)

(b) Show that the physical components of a vector field h are relatedto the components [h]c and [h]c, c ∈ Γ, by

[h]c = bc〈h〉c, [h]c = 1bc〈h〉c for all c ∈ Γ (P7.10)

(c) Show that the physical components of the gradient ∇h of a dif-ferentiable vector field h are given by

〈∇h〉c,d = 1bd〈h〉c;d −

bd;c

bdbc〈h〉d if c 6= d,

〈∇h〉c,c = 1bc〈h〉c;c +

e∈Γ\c

bc;ebcbe

〈h〉e(P7.11)

for all c, d ∈ Γ.

(4) Let a 2-dimensional genuine inner-product space E be given. Assumethat a point q ∈ E , an orthonormal basis (e, f) and a number ε ∈ P×

have been prescribed. Consider the mapping Ψ : (ε+P×)× ]0, ε[ → Edefined by

Page 259: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

300 CHAPTER 7. COORDINATE SYSTEMS

Ψ(s, t) := q + 1ε(st e +

(s2 − ε2)(ε2 − t2)f). (P7.12)

(a) Compute the partial derivatives Ψ,1 and Ψ,2 and show that Ψ islocally invertible.

(b) Show that Ψ is injective and hence that D := Rng Ψ is open andthat ψ := Ψ|Rng is invertible.

(c) Put γ := ψ← and define the functions λ, µ from D to R by(λ, µ) := γ|R

2. Show that Γ := λ, µ is a coordinate-system

on D; it is called an elliptical coordinate system.

(d) Show that the coordinate curves corresponding to the coordinatesλ and µ are parts of ellipses and hyperbolas, respectively, whosefoci are q − εe and q + εe. Show that D = q + P×e + P×f .

(e) Using Part (a), write down the basis field (bi | i ∈ λ, µ) of thesystem λ, µ, and show that the inner-product components aregiven by

Gλ,µ = 0, Gλ,λ = λ2−µ2

λ2−ε2, Gµ,µ = λ2−µ2

ε2−µ2 . (P7.13)

(f) Show that the Laplacian of a twice differentiable scalar field fwith Dom f ⊂ D is given by

∆f =

√(λ2−ε2)(ε2−µ2)

λ2−µ2

(

(√

λ2−ε2

ε2−µ2 f;λ

)

;λ+

(

ε2−µ2

λ2−ε2f;µ

)

)

. (P7.14)

(g) Compute the connection components of the system λ, µ.

(5) Let a 3-dimensional genuine inner-product space E with translationspace V be given. Assume that q, f and J are prescribed as in Example(C) of Sect.74 and that h : R → V is defined by (74.4). Define Ψ :P× × P× × R → E by

Ψ(s, t, u) := q + 12(s2 − t2)f + sth(u). (P7.15)

(a) Compute the partial derivatives of Ψ and show that Ψ is locallyinvertible.

(b) Specify an open subset D of P× × P× × R and an open subset Dof E such that ψ := Ψ|D

Dis invertible.

Page 260: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

75. PROBLEMS FOR CHAPTER 7 301

(c) Show that γ := ψ← and (α, β, θ) := γ|R3

define a coordinate sys-tem Γ := α, β, θ on D; it is called a paraboloidal coordinatesystem.

(d) Show that the coordinate curves corresponding to the coordinatesα and β are parabolas with focus q.

(e) Using Part (a), write down the basis field (bi | i ∈ α, β, θ) ofthe system and compute the inner-product components.

(f) Find the formula for the Laplacian of a twice differentiable scalarfield in paraboloidal coordinates.

(g) Compute the connection components of the system α, β, θ.

(6) Let Γ be a coordinate system on an open subset D of a Euclidean

space E and let β be the dual basis field of Γ. Let Gcd

, c, d ∈ Γ, bedefined by (73.2) and g by (73.15).

(a) Show that the Laplacian of the coordinate c ∈ Γ is given by

∆c = 1g

d∈Γ

(gGdc

);d. (P7.16)

(b) Given c ∈ Γ, show that the components of the Laplacian of βc

are given by

[∆βc]d =

(

1g

e∈Γ

(

gGec)

;e

)

;d

. (P7.17)

(Hint: Use βc = ∇c and Part (a)).

(c) Let h be a twice differentiable vector field with Domh ⊂ D. Showthat the components of the Laplacian of h are given by

[h]c = [h]c+

(d,e)∈Γ2

(

1g(gG

ed);e

)

;c[h]d − 2

f∈Γ

Ccdf G

bfe[h]d;e

(P7.18)

for all c ∈ Γ, where Ccdf , c, d, f ∈ Γ, are the connection compo-

nents of Γ. (Hint: Use Prop.5 of Sect.67.)

(d) Write out the formula (P7.18) for cylindrical coordinates.

Page 261: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

Chapter 8

Spectral Theory

In this chapter, it is assumed that all linear spaces under consideration areover the real field R and that all linear and inner-product spaces are finite-dimensional. However, in Sects.89 and 810 we deal with linear spaces overR that are at the same time linear spaces over the complex field C. Mostof this chapter actually deals with a given finite-dimensional genuine inner-product space. Some of the definitions remain meaningful and some of theresults remain valid if the space is infinite-dimensional or if R is replaced byan arbitrary field.

81 Disjunct Families, Decompositions

We assume that a linear space V and a finite family (Ui | i ∈ I) of subspacesof V are given. There is a natural summing mapping from the product space

×(Ui | i ∈ I) (see Sect.14) to V, defined by

(u 7→ ΣIu) : ×i∈I

Ui → V. (81.1)

This summing mapping is evidently linear.Definition 1: We say that the finite family (Ui | i ∈ I) of subspaces of

V is disjunct if the summing mapping (81.1) of the family is injective; we

say that the family is a decomposition of V if the summing mapping is

invertible.

It is clear that every restriction of a disjunct family is again disjunct. In other words, if (Ui | i ∈ I) is disjunct and J ∈ Sub I, then (Uj | j ∈ J) is again disjunct. The union of a disjunct family cannot be the same as its sum unless all but at most one of the terms are zero-spaces. A disjunct family (Ui | i ∈ I) is a decomposition of V if and only if its member-wise sum is V, i.e. ∑(Ui | i ∈ I) = V (see (07.14)).

Since the summing mapping is injective if and only if its nullspace is zero, we have the following result:

Proposition 1: A finite family (Ui | i ∈ I) of subspaces is disjunct if and only if, for every family u ∈ ×(Ui | i ∈ I), ∑(ui | i ∈ I) = 0 is possible only when ui = 0 for all i ∈ I.

The following result gives various characterizations of disjunct families and decompositions.

Proposition 2: Let (Ui | i ∈ I) be a finite family of subspaces of V.

Then the following are equivalent:

(i) The family (Ui | i ∈ I) is disjunct [a decomposition of V].

(ii) For every j ∈ I, Uj and ∑(Ui | i ∈ I \ {j}) are disjunct [supplementary in V].

(iii) If I ≠ ∅ then for some j ∈ I, Uj and ∑(Ui | i ∈ I \ {j}) are disjunct [supplementary in V] and the family (Ui | i ∈ I \ {j}) is disjunct.

Proof: We prove only the assertions concerning disjunctness. The rest then follows from

∑(Ui | i ∈ I) = Uj + ∑(Ui | i ∈ I \ {j}),

valid for all j ∈ I.

(i) ⇒ (ii): Assume that (Ui | i ∈ I) is disjunct and that j ∈ I and w ∈ Uj ∩ ∑(Ui | i ∈ I \ {j}) are given. Then we may choose (ui | i ∈ I \ {j}) such that w = ∑(ui | i ∈ I \ {j}) and hence (−w) + ∑(ui | i ∈ I \ {j}) = 0. By Prop.1, it follows that w = 0. We conclude that Uj ∩ ∑(Ui | i ∈ I \ {j}) = {0}.

(ii) ⇒ (i): Assume that (Ui | i ∈ I) fails to be disjunct. By Prop.1 we can then find j ∈ I and w ∈ Uj× such that w + ∑(ui | i ∈ I \ {j}) = 0 for a suitable choice of u ∈ ×(Ui | i ∈ I \ {j}). Then w ∈ Uj ∩ ∑(Ui | i ∈ I \ {j}) and hence, since w ≠ 0, Uj and ∑(Ui | i ∈ I \ {j}) are not disjunct.

(i) ⇒ (iii): This follows from (i) ⇒ (ii) and the fact that a restriction of a disjunct family is disjunct.

(iii) ⇒ (i): Choose j ∈ I according to (iii) and let u ∈ ×(Ui | i ∈ I) such that ∑(ui | i ∈ I) = 0 be given. Then uj = −∑(ui | i ∈ I \ {j}) and hence uj = 0 because Uj and ∑(Ui | i ∈ I \ {j}) are disjunct. It follows that ∑(ui | i ∈ I \ {j}) = 0 and hence u|I\{j} = 0 because (Ui | i ∈ I \ {j}) is disjunct. We conclude that u = 0 and hence, by Prop.1, that (Ui | i ∈ I) is disjunct.

It follows from Prop.2 that a pair (U1,U2) of subspaces of V is disjunct [a decomposition of V] if and only if U1 and U2 are disjunct [supplementary in V] in the sense of Def.2 of Sect.12.

The following result is an easy consequence of Prop.2.

Proposition 3: Let (Ui | i ∈ I) be a disjunct family of subspaces of

V, let J be a subset of I, and let Wj be a subspace of Uj for each j ∈ J .

Then (Wj | j ∈ J) is again a disjunct family of subspaces of V. Moreover, if

(Wj | j ∈ J) is a decomposition of V, so is (Ui | i ∈ I) and we have Wj = Uj

for all j ∈ J and Ui = {0} for all i ∈ I \ J.

Using Prop.2 above and Prop.4 of Sect.17 one easily obtains the following result by induction.

Proposition 4: A finite family (Ui | i ∈ I) of subspaces of V is disjunct if and only if

dim(∑_{i∈I} Ui) = ∑_{i∈I} dim Ui. (81.2)

In particular, if (Ui | i ∈ I) is a decomposition of V, then

dimV = ∑(dim Ui | i ∈ I).

Let (fi | i ∈ I) be a finite family of non-zero elements in V. This family is linearly independent, or a basis of V, depending on whether the family (Lsp{fi} | i ∈ I) of one-dimensional subspaces of V is disjunct, or a decomposition of V, respectively.
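By Prop.4, disjunctness is a rank condition and can be tested numerically. The following minimal numpy sketch (an illustration with made-up subspaces of R⁴, not part of the text) checks (81.2) for subspaces given by spanning matrices:

```python
import numpy as np

def is_disjunct(spans):
    """spans: matrices whose columns span the terms Ui; checks (81.2)."""
    dims = [np.linalg.matrix_rank(S) for S in spans]
    return np.linalg.matrix_rank(np.hstack(spans)) == sum(dims)

U1 = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])  # span{e1, e2}
U2 = np.array([[0.], [0.], [1.], [0.]])                  # span{e3}
print(is_disjunct([U1, U2]))          # True: 2 + 1 = rank 3
print(is_disjunct([U1, U1[:, :1]]))   # False: the terms overlap
```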

The following result generalizes Prop.4 of Sect.19 and is easily derived from that proposition and Prop.2.

Proposition 5: Let (Ui | i ∈ I) be a decomposition of V. Then there is a unique family (Pi | i ∈ I) of projections Pi : V → Ui such that

v = ∑_{i∈I} Pi v for all v ∈ V, (81.3)

and we have

Ui ⊂ Null Pj for all i, j ∈ I with i ≠ j. (81.4)

Also, there is a unique family (Ei | i ∈ I) of idempotent lineons on V such that Ui = Rng Ei,

∑_{i∈I} Ei = 1V, (81.5)

and

EiEj = 0 for all i, j ∈ I with i ≠ j. (81.6)

We have

Null Ej = ∑_{i∈I\{j}} Ui for all j ∈ I (81.7)

and

Ei = Pi|V and Pi = Ei|Ui for all i ∈ I. (81.8)

The family (Pi | i ∈ I) is called the family of projections and the

family (Ei | i ∈ I) the family of idempotents associated with the given

decomposition.
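Concretely, the idempotents of a decomposition of R^n can be assembled from bases of its terms; a hedged numpy sketch (matrices stand for lineons relative to the standard basis):

```python
import numpy as np

def idempotents(spans):
    """spans: bases (as columns) of the terms Ui of a decomposition of R^n.
    Returns the family (Ei) with sum Ei = 1 and EiEj = 0 for i != j."""
    B = np.hstack(spans)                 # invertible iff the family decomposes R^n
    Binv = np.linalg.inv(B)
    Es, k = [], 0
    for S in spans:
        m = S.shape[1]
        Es.append(B[:, k:k+m] @ Binv[k:k+m, :])   # Rng Ei = Ui, cf. (81.5)-(81.7)
        k += m
    return Es

E1, E2 = idempotents([np.array([[1., 1.], [0., 1.], [0., 0.]]),
                      np.array([[0.], [0.], [1.]])])
assert np.allclose(E1 @ E1, E1) and np.allclose(E1 @ E2, 0)   # (81.6)
assert np.allclose(E1 + E2, np.eye(3))                        # (81.5)
```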

The following result, a generalization of Prop.5 of Sect.19 and Prop.2 of Sect.16, shows how linear mappings with domain V are determined by their restrictions to each term of a decomposition of V.

Proposition 6: Let (Ui | i ∈ I) be a decomposition of V. For every linear space V′ and every family (Li | i ∈ I) with Li ∈ Lin(Ui, V′) for all i ∈ I, there is exactly one L ∈ Lin(V, V′) such that

Li = L|Ui for all i ∈ I. (81.9)

It is given by

L := ∑_{i∈I} LiPi, (81.10)

where (Pi | i ∈ I) is the family of projections associated with the decomposition.

Notes 81

(1) I introduced the term "disjunct" in the sense of Def.1 for pairs of subspaces (see Note (2) to Sect.12); J. J. Schaffer suggested that the term be used also for arbitrary families of subspaces.

(2) In most of the literature, the phrase "V is the direct sum of the family (Ui | i ∈ I)" is used instead of "(Ui | i ∈ I) is a decomposition of V". The former phrase is actually absurd because it confuses a property of the family (Ui | i ∈ I) with a property of the sum of the family (see also Note (3) to Sect.12). Even Bourbaki falls into this trap.


82 Spectral Values and Spectral Spaces

We assume that a linear space V and a lineon L ∈ LinV are given. Recall that a subspace U of V is called an L-space if it is L-invariant, i.e. if L>(U) ⊂ U (Def.1 of Sect.18). Assume that U is an L-space. Then we can consider the adjustment L|U ∈ LinU. It is clear that a subspace W of U is an L-space if and only if it is an L|U-space.

Proposition 1: A subspace U of V is L-invariant if and only if its

annihilator U⊥ is L⊤-invariant.

Proof: Assume that U is L-invariant, i.e. that L>(U) ⊂ U. By the Theorem on Annihilators and Transposes of Sect.21 we then have

U⊥ ⊂ (L>(U))⊥ = (L⊤)<(U⊥),

which means that U⊥ is L⊤-invariant. Interchanging the roles of L = (L⊤)⊤ and L⊤ and of U⊥ and U = (U⊥)⊥, we see that the L⊤-invariance of U⊥ implies the L-invariance of U.

Definition 1: For every σ ∈ R, we write

SpsL(σ) := Null(L − σ1V). (82.1)

This is an L-space; if it is non-zero, we call it the spectral space of L for σ. The spectrum of L is defined to be

SpecL := {σ ∈ R | SpsL(σ) ≠ {0}}, (82.2)

and its elements are called the spectral values of L. If σ ∈ SpecL, then the non-zero members of SpsL(σ) are called spectral vectors of L for σ. The family (SpsL(σ) | σ ∈ SpecL) of subspaces of V is called the family of spectral spaces of L. The multiplicity function multL : R → N of L is defined by

multL(σ) := dim(SpsL(σ)); (82.3)

its value multL(σ) ∈ N× at σ ∈ SpecL is called the multiplicity of the spectral value σ.

We note that

SpecL = {σ ∈ R | multL(σ) ≠ 0}, (82.4)

i.e. that SpecL is the support of multL (see Sect.07).

It is evident that σ ∈ R is a spectral value of L if and only if

Lu = σu (82.5)


for some u ∈ V×. Every u ∈ V× for which (82.5) holds is a spectral vector of L for σ. In fact, SpsL(σ) is the largest among the subspaces U of V for which

L|U = σ1U⊂V. (82.6)
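Numerically, (82.1) makes a spectral space a null-space computation; an illustrative scipy sketch:

```python
import numpy as np
from scipy.linalg import null_space

L = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 3.]])
basis = null_space(L - 3.0 * np.eye(3))   # columns span SpsL(3)
print(basis.shape[1])                     # multL(3) = 1
```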

Proposition 2: If the lineons L and M commute, then every spectral

space of M is L-invariant.

Proof: Note that L and M commute if and only if L and M − σ1V commute, no matter what σ ∈ R is. Hence, by (82.1), it is sufficient to show that Null M is L-invariant. But for every u ∈ Null M, we have

0 = L(Mu) = (LM)u = (ML)u = M(Lu),

i.e. Lu ∈ Null M, which proves what was needed.

Let E ∈ LinV be an idempotent. By Prop.3, (iv) of Sect.19, we then have

Rng E = SpsE(1). (82.7)

It is easily seen that SpsE(1) and SpsE(0) = Null E are the only spaces of the form SpsE(σ), σ ∈ R, that can be different from zero and hence that SpecE ⊂ {0, 1}. In fact, we have SpecE = {0, 1} unless E = 0 or E = 1V. The multiplicities of the spectral values 0 and 1 of E are given, in view of (26.11), by

multE(1) = trE, multE(0) = dimV − trE. (82.8)

The following result, which is easily proved, shows how linear isomorphisms affect invariance, spectra, spectral spaces, and multiplicity:

Proposition 3: Let V,V ′ be linear spaces, let A : V → V ′ be a linear

isomorphism, and let L ∈ LinV be a lineon, so that ALA−1 ∈ LinV ′.

(a) If U is an L-space then A>(U) is an (ALA−1)-space.

(b) For every σ ∈ R, we have

A>(SpsL(σ)) = SpsALA−1(σ).

(c) SpecL = Spec (ALA−1) and multL = multALA−1.

Theorem on Spectral Spaces: The spectrum of a lineon on V has at

most dimV members and the family of its spectral spaces is disjunct.

The proof will be based on the following Lemma:

Lemma: Let L ∈ LinV, let S be a finite subset of SpecL and let f := (fσ | σ ∈ S) be such that fσ ∈ (SpsL(σ))× for all σ ∈ S. Then f is linearly independent.


Proof: We proceed by induction over ♯S. The assertion is trivial when ♯S = 0, because f must then be the empty family. Assume, then, that ♯S ≥ 1 and that the assertion is valid when S is replaced by S′ with ♯S′ = ♯S − 1. Let λ ∈ Null(lncf) ⊂ R^S, so that ∑(λσfσ | σ ∈ S) = 0. Since Lfσ = σfσ, it follows that

0 = L(∑_{σ∈S} λσfσ) = ∑_{σ∈S} λσσfσ.

Now choose τ ∈ S and put S′ := S \ {τ}. We then obtain

0 = τ ∑_{σ∈S} λσfσ − ∑_{σ∈S} λσσfσ = ∑_{σ∈S′} λσ(τ − σ)fσ.

Since τ − σ ≠ 0 for all σ ∈ S′ and since ♯S′ = ♯S − 1, we can apply the induction hypothesis to ((τ − σ)fσ | σ ∈ S′) and conclude that λσ = 0 for all σ ∈ S′, so that 0 = ∑_{σ∈S} λσfσ = λτfτ. Since fτ ≠ 0 it also follows that λτ = 0 and hence that λ = 0. Since λ ∈ Null(lncf) was arbitrary, we conclude that Null(lncf) = {0}.

Proof of the Theorem: The fact that ♯SpecL ≤ dimV follows from the Lemma and the Characterization of Dimension of Sect.17. To show that (SpsL(σ) | σ ∈ SpecL) is disjunct, we need only observe that ∑(uσ | σ ∈ SpecL) = 0 with u ∈ ×(SpsL(σ) | σ ∈ SpecL) implies that ∑(uσ | σ ∈ S) = 0, where S := Supt u. The Lemma states that this is possible only when S = ∅, i.e. when uσ = 0 for all σ ∈ SpecL. By Prop.1, this shows that the family (SpsL(σ) | σ ∈ SpecL) is disjunct.

Since a disjunct family with more than one non-zero term cannot have V as its union, the following is an immediate consequence of the Theorem just proved.

Proposition 4: If every v ∈ V× is a spectral vector of L ∈ LinV then

L = λ1V for some λ ∈ R.

Remark: If dimV = 0, i.e. if V is a zero space, then the spectrum of the only member 1V = 0 of LinV is empty.

Proposition 5: If the sum of the family (SpsL(σ) | σ ∈ SpecL) of spectral spaces of a lineon L ∈ LinV is V, and hence the family is a decomposition of V, and if (Eσ | σ ∈ SpecL) is the associated family of idempotents, then

L = ∑_{σ∈SpecL} σEσ (82.9)

and

trL = ∑_{σ∈SpecL} (multL(σ))σ. (82.10)


Proof: In view of (82.6), we have L|SpsL(σ) = σ1SpsL(σ)⊂V for all σ ∈ SpecL. It follows from Prop.6 of Sect.81 and (81.8) that (82.9) holds. The equation (82.10) follows from (82.9) and (82.3) using Prop.5 of Sect.26.
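For a lineon whose spectral spaces decompose V, as in Prop.5, the idempotents Eσ can be produced by Lagrange interpolation on the spectrum and (82.9), (82.10) checked directly; a hedged numpy sketch:

```python
import numpy as np
from functools import reduce

L = np.diag([2., 2., 5.])                 # SpecL = {2, 5}, multL(2) = 2
spec = np.unique(np.linalg.eigvals(L).real)
E = {s: reduce(np.matmul,
               [(L - t * np.eye(3)) / (s - t) for t in spec if t != s])
     for s in spec}
assert np.allclose(sum(s * E[s] for s in spec), L)                     # (82.9)
assert np.isclose(np.trace(L), sum(s * np.trace(E[s]) for s in spec))  # (82.10)
```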

Proposition 6: Let L ∈ LinV be given. Assume that Z is a finite subset of R and that (Wσ | σ ∈ Z) is a decomposition of V whose terms Wσ are non-zero L-spaces and satisfy

L|Wσ = σ1Wσ for all σ ∈ Z. (82.11)

Then Z = SpecL and Wσ = SpsL(σ) for all σ ∈ Z.

Proof: Since Wσ is non-zero for all σ ∈ Z, it follows from (82.11) that σ ∈ SpecL and Wσ ⊂ SpsL(σ) for all σ ∈ Z. The assertion is now an immediate consequence of Prop.3 of Sect.81.

Notes 82

(1) A large number of terms for our "spectral value" can be found in the literature. The most common is "eigenvalue". Others combine the adjectives "proper", "characteristic", "latent", or "secular" in various combinations with the nouns "value", "number", or "root". I believe it is economical to use the adjective "spectral", which fits with the commonly accepted term "spectrum".

(2) In infinite-dimensional situations, one must make a distinction between the sets {σ ∈ R | (L − σ1V) is not invertible} and {σ ∈ R | Null(L − σ1V) ≠ {0}}. It is the former that is usually called the spectrum. The latter is usually called the "point-spectrum".

(3) When considering spectral values or spectral spaces, most people replace "spectral" with "eigen", or another of the adjectives mentioned in Note (1).

(4) The notation SpsL(σ) for a spectral space is introduced here for the first time. I

could find only ad hoc notations in the literature.

83 Orthogonal Families of Subspaces

We assume that an inner-product space V is given. We say that two subsets S and T of V are orthogonal, and we write S⊥T, if every element of S is orthogonal to every element of T, i.e. if u · v = 0 for all u ∈ S, v ∈ T. This is the case if and only if S ⊂ T⊥. We say that a family (Ui | i ∈ I) of subspaces of V is orthogonal if its terms are pairwise orthogonal, i.e. if Ui⊥Uj for all i, j ∈ I with i ≠ j.


Proposition 1: A family (Ui | i ∈ I) of subspaces of V is orthogonal if and only if, for all j ∈ I,

∑_{i∈I\{j}} Ui ⊂ Uj⊥. (83.1)

Proof: If (83.1) holds for all j ∈ I, then

Uk ⊂ ∑_{i∈I\{j}} Ui ⊂ Uj⊥

and hence Uk⊥Uj for all j, k ∈ I with j ≠ k. If the family is orthogonal, then Ui⊥Uj and hence Ui ⊂ Uj⊥ for all j ∈ I and i ∈ I \ {j}. Since Uj⊥ is a subspace of V, (83.1) follows.

Using Prop.2 of Sect.81 and the Characterization of Regular Subspaces of Sect.41 one immediately obtains the next two results.

Proposition 2: If the terms of an orthogonal family are regular subspaces then it is a disjunct family.

Proposition 3: A family (Ui | i ∈ I) of non-zero subspaces of V is an orthogonal decomposition of V if and only if all the terms of the family are regular subspaces of V and

∑_{i∈I\{j}} Ui = Uj⊥ for all j ∈ I. (83.2)

Proposition 4: A decomposition of V is orthogonal if and only if all

the terms of the family of idempotents associated with the decomposition are

symmetric.

Proof: Let (Ui | i ∈ I) be a decomposition of V and let (Ei | i ∈ I) be the family of idempotents associated with it (see Prop.5 of Sect.81). From Prop.3 and (81.7) it is clear that the decomposition is orthogonal if and only if Null Ej = Uj⊥ = (Rng Ej)⊥ for all j ∈ I; and this is equivalent, by Prop.3 of Sect.41, to the statement that all the terms of (Ei | i ∈ I) are symmetric.

Proposition 5: The family of spectral spaces of a symmetric lineon is orthogonal.

Proof: Let S ∈ SymV and σ, τ ∈ SpecS with σ ≠ τ be given. For all u ∈ SpsS(σ) and w ∈ SpsS(τ) we then have Su = σu and Sw = τw. It follows that

σ(w · u) = w · (σu) = w · Su = (Sw) · u = (τw) · u = τ(w · u),

and hence (σ − τ)(w · u) = 0 for all u ∈ SpsS(σ) and w ∈ SpsS(τ). Since σ − τ ≠ 0, we conclude that SpsS(σ)⊥SpsS(τ).


84 The Structure of Symmetric Lineons

There are lineons with an empty spectrum, i.e. with no spectral values at all. The concepts of spectral value and spectral space are insufficient for the description of the structure of general lineons (see Chap.9). However, a symmetric lineon on a genuine inner-product space is completely determined by its family of spectral spaces. The following theorem, whose proof depends on the Theorem on Attainment of Extrema of Sect.58 and on the Constrained Extremum Theorem of Sect.69, is the key to the derivation of this result.
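For instance (an illustrative aside), a perpendicular turn of R² (see Sect.87) has empty spectrum, as a numerical eigenvalue computation confirms:

```python
import numpy as np

R = np.array([[0., -1.], [1., 0.]])   # turns every vector by a right angle
print(np.linalg.eigvals(R))           # [0.+1.j, 0.-1.j]: no real spectral values
```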

Theorem on the Extreme Spectral Values of a Symmetric Lineon: If V is a non-zero genuine inner-product space, then every symmetric lineon S ∈ SymV has a non-empty spectrum. In fact, the least and greatest members of SpecS are given by

min SpecS = min S̄|UsphV, (84.1)

max SpecS = max S̄|UsphV, (84.2)

where S̄ : V → R is the quadratic form associated with S (see (27.13)) and where UsphV is the unit sphere of V defined by (42.9).

Proof: UsphV is not empty because V is non-zero, and UsphV is closed and bounded because it is the boundary of the unit ball UblV (see Prop.3 of Sect.52 and Prop.12 of Sect.53). By Prop.3 of Sect.66, S̄ is of class C1 and hence continuous. By the Theorem on Attainment of Extrema, it follows that S̄|UsphV attains a maximum and a minimum. Assume that one of these extrema is attained at u ∈ UsphV. By Cor.1 to the Constrained Extremum Theorem, we must have ∇u S̄ = σ∇u sq for some σ ∈ R. Since (∇u sq)v = 2u · v and (∇u S̄)v = 2(Su) · v for all v ∈ V by (66.17), we conclude that 2(Su) · v = σ(2u · v) for all v ∈ V, i.e. Su = σu. Since |u| = 1, it follows that σ ∈ SpecS and that S̄(u) = u · Su = σ|u|² = σ. We conclude that the right sides of both (84.1) and (84.2) belong to SpecS.

Now let τ ∈ SpecS be given. We may then choose u ∈ UsphV such that Su = τu. Since u · u = 1, we have τ = (τu) · u = (Su) · u = S̄(u) ∈ Rng(S̄|UsphV). Since τ ∈ SpecS was arbitrary, the assertions (84.1) and (84.2) follow.

The next theorem, which completely describes the structure of symmetric lineons on genuine inner product spaces, is one of the most important theorems of all of mathematics. It has numerous applications not only in many branches of mathematics, but also in almost all branches of theoretical physics, both classical and modern. The reason is that symmetric lineons appear in many contexts, often unexpectedly.


Spectral Theorem: Let V be a genuine inner-product space. A lineon on V is symmetric if and only if the family of its spectral spaces is an orthogonal decomposition of V.

Proof: Let S ∈ SymV be given. Since all spectral spaces of S are S-invariant, so is their sum U := ∑(SpsS(σ) | σ ∈ SpecS). By Prop.1 of Sect.82, the orthogonal supplement U⊥ of U is also S-invariant, and S|U⊥ ∈ SymU⊥ is meaningful. Now, by the preceding theorem, if U⊥ is non-zero, S|U⊥ must have a spectral vector w ∈ (U⊥)×. Of course, w is then also a spectral vector of S and hence w ∈ U. Therefore, U⊥ ≠ {0} implies U ∩ U⊥ ≠ {0}, which contradicts the fact that, in a genuine inner-product space, all subspaces are regular. It follows that U⊥ = {0} and hence V = U = ∑(SpsS(σ) | σ ∈ SpecS). By Props.2 and 5 of Sect.83 it follows that (SpsS(σ) | σ ∈ SpecS) is an orthogonal decomposition of V.

Assume, now, that L ∈ LinV is such that (SpsL(σ) | σ ∈ SpecL) is an orthogonal decomposition of V. We can then apply Prop.5 of Sect.82 and conclude that (82.9) holds. Since the decomposition was assumed to be orthogonal, each of the idempotents Eσ in (82.9) is symmetric by Prop.4 of Sect.83. Therefore, by (82.9), L is symmetric.

We assume now that a genuine inner-product space V is given. We list two corollaries to the Spectral Theorem. The first follows by applying Prop.5 of Sect.81, Prop.5 of Sect.82, and Prop.4 of Sect.83.

Corollary 1: For every S ∈ SymV there is exactly one family (Eσ | σ ∈ SpecS) of non-zero symmetric idempotents such that

EσEτ = 0 for all σ, τ ∈ SpecS with σ ≠ τ, (84.3)

∑_{σ∈SpecS} Eσ = 1V, (84.4)

and

∑_{σ∈SpecS} σEσ = S. (84.5)

The family (Eσ | σ ∈ SpecS) is called the spectral resolution of S. Its terms are called the spectral idempotents of S.
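In matrix terms the spectral resolution can be computed from an orthonormal eigenbasis; a minimal numpy sketch (the rounding tolerance used to group equal spectral values is an ad hoc choice):

```python
import numpy as np

S = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 3.]])               # symmetric; SpecS = {1, 3}
w, Q = np.linalg.eigh(S)                   # spectral values, orthonormal eigenbasis
E = {s: Q[:, np.isclose(w, s)] @ Q[:, np.isclose(w, s)].T
     for s in np.unique(np.round(w, 10))}  # symmetric idempotent onto SpsS(s)
assert np.allclose(sum(E.values()), np.eye(3))             # (84.4)
assert np.allclose(sum(s * Es for s, Es in E.items()), S)  # (84.5)
```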

The second corollary is obtained by choosing orthonormal bases in each of the spectral spaces of S.

Corollary 2: A lineon S on V is symmetric if and only if there exists an orthonormal basis e := (ei | i ∈ I) of V whose terms are spectral vectors of S. If this is the case we have

Sei = λiei for all i ∈ I, (84.6)

where λ := (λi | i ∈ I) is a family of real numbers whose range is SpecS. In fact each σ ∈ SpecS occurs exactly multS(σ) times in the family λ.

The matrix of S relative to the basis e is diagonal.

Of course, the orthonormal basis e in Cor.2 is not uniquely determined by S if dimV > 0. For example, if we replace any or all the terms of e by their opposites, we get another orthonormal basis whose terms are also spectral vectors of S.

Proposition 1: Let V and V′ be genuine inner product spaces. Then S ∈ SymV and S′ ∈ SymV′ have the same multiplicity function, i.e. multS = multS′, if and only if there is an orthogonal isomorphism R : V → V′ such that

S′ = RSR−1 = RSR⊤. (84.7)

If (84.7) holds, then

R>(SpsS(σ)) = SpsS′(σ) for all σ ∈ SpecS. (84.8)

Proof: Assume that multS = multS′. By Cor.2 above, we may then choose orthonormal bases e := (ei | i ∈ I) and e′ := (e′i | i ∈ I) of V and V′, respectively, such that (84.6) and

S′e′i = λie′i for all i ∈ I (84.9)

hold with one and the same family λ := (λi | i ∈ I). Let R : V → V′ be the linear isomorphism for which Rei = e′i for all i ∈ I. By Prop.4 of Sect.43, R is an orthogonal isomorphism. By (84.9) and (84.6) we have

S′Rei = λi(Rei) = R(λiei) = RSei

for all i ∈ I. Since e is a basis, it follows that S′R = RS, which is equivalent to (84.7).

If (84.7) holds, then (84.8) and the equality multS = multS′ follow from Prop.3 of Sect.82.

Using Prop.1 above and Prop.2 of Sect.82, we immediately obtain

Proposition 2: S ∈ SymV commutes with R ∈ OrthV if and only if the spectral spaces of S are R-invariant.

In the special case when all spectral values of S have multiplicity 1, the condition of Prop.2 is equivalent to the following one: If e is an orthonormal basis such that the matrix of S relative to e is diagonal (see Cor.2 above), then the matrix of R relative to e is also diagonal and its diagonal terms are either 1 or −1.

We are now able to prove the result (52.20) already announced in Sect.52.


Proposition 3: If V and V′ are genuine inner-product spaces and if n := dimV > 0, then

|L| ≥ ||L|| ≥ (1/√n)|L| for all L ∈ Lin(V,V′). (84.10)

Proof: Let L ∈ Lin(V,V′) be given. The inequality |L| ≥ ||L|| is an immediate consequence of (52.4), with both ν and ν′ replaced by |·|, and Prop.3 of Sect.44.

Since S := L⊤L belongs to SymV, we can consider the spectral resolution (Eσ | σ ∈ SpecS) of S mentioned in Cor.1 above. By (84.5) and (26.11), we obtain

trS = ∑_{σ∈SpecS} σ trEσ = ∑_{σ∈SpecS} σ dim Rng Eσ. (84.11)

Since (Rng Eσ | σ ∈ SpecS) is a decomposition of V, it follows from (84.11) and Prop.4 of Sect.81 that

trS ≤ n max(SpecS).

Using the Theorem on the Extreme Spectral Values of a Symmetric Lineon, we conclude that

trS ≤ n max S̄|UsphV.

Since S̄(v) = v · Sv = v · L⊤Lv = |Lv|² for all v ∈ V, it follows that

trS ≤ n max{|Lv|² | v ∈ UsphV}.

Since trS = tr(L⊤L) = |L|² by (44.13), it follows from (52.3), with both ν and ν′ replaced by |·|, that |L|² ≤ n||L||².

Remark: If L is a tensor product, i.e. if L = w⊗v for some v ∈ V ≅ V∗ and w ∈ V′, then |L| = ||L||, as easily seen from (44.14) and Part (a) of Problem 7 in Chap.5. If L is orthogonal, we have ||L|| = (1/√n)|L| by Problem 5 in Chap.5. Hence either of the inequalities in (84.10) can become an equality.

85 Lineonic Extensions

Lineonic Square Roots and Logarithms

We assume that a genuine inner-product space V is given. In Sect.41, we noted the identification

SymV ≅ Sym2(V², R)

and in Sect.27, we discussed the isomorphism (S ↦ S̄) : Sym2(V², R) → QuV. We thus see that each symmetric lineon S on V corresponds to a quadratic form S̄ on V given by

S̄(u) := S(u,u) = (Su) · u for all u ∈ V. (85.1)

The identity lineon 1V ∈ SymV is identified with the inner product ip on V and hence 1̄V = (ip)‾ = sq. In view of Def.1 of Sect.27, it is meaningful to speak of positive [strictly positive] symmetric lineons. We use the notations

PosV := {S ∈ SymV | S̄(u) ≥ 0 for all u ∈ V} (85.2)

and

Pos+V := {S ∈ SymV | S̄(u) > 0 for all u ∈ V×} (85.3)

for the sets of all positive and strictly positive symmetric lineons, respectively. Since V is genuine, we have 1V ∈ Pos+V. By Prop.3 of Sect.27 we have

(L⊤SL)‾ = S̄ ∘ L (85.4)

for all S ∈ SymV and all L ∈ LinV.

The following result is an immediate consequence of (85.4):

Proposition 1: If S ∈ PosV and L ∈ LinV, then L⊤SL ∈ PosV.

Moreover, L⊤SL is strictly positive if and only if S is strictly positive and

L is invertible. In particular, we have L⊤L ∈ PosV, and L⊤L ∈ Pos+V if

and only if L is invertible.

The following result is a corollary to the Theorem on the Extreme Spectral Values of a Symmetric Lineon (see Sect.84).

Proposition 2: A symmetric lineon is positive [strictly positive] if and only if all its spectral values are positive [strictly positive]. In other words,

PosV = {S ∈ SymV | SpecS ⊂ P}, (85.5)

Pos+V = {S ∈ SymV | SpecS ⊂ P×}. (85.6)

Let a function f : R → R be given. We define the lineonic extension f(V) : SymV → SymV of f by

f(V)(S) := ∑_{σ∈SpecS} f(σ)Eσ (85.7)


when (Eσ | σ ∈ SpecS) is the spectral resolution of S ∈ SymV. If the domain of f is P or P× instead of R, we still use (85.7), but, in view of Prop.2, the domain of f(V) must be taken to be PosV or Pos+V, respectively. A similar observation applies to the codomains of f and f(V).
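Computationally, (85.7) says: apply f to the spectral values and keep the spectral idempotents. A hedged numpy sketch for symmetric matrices (standard basis assumed):

```python
import numpy as np

def lineonic_extension(f, S):
    """f(V)(S) per (85.7); S must be a symmetric matrix (a sketch)."""
    w, Q = np.linalg.eigh(S)
    return (Q * f(w)) @ Q.T       # sum over sigma of f(sigma) * E_sigma

S = np.array([[2., 1.], [1., 2.]])       # SpecS = {1, 3}
print(lineonic_extension(np.exp, S))     # the lineonic extension of exp at S
```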

Let f and g be functions whose domains and codomains are one of R, P, P× each. The following rules are evident from the definition (85.7):

(I) For all S ∈ SymV, S ∈ PosV, or S ∈ Pos+V, as appropriate, the spectrum of f(V)(S) is given by

Spec(f(V)(S)) = f>(SpecS). (85.8)

Moreover, if f|SpecS is injective, then f(V)(S) and S have the same spectral spaces and the same spectral idempotents.

(II) If Dom f = Cod g, then Dom f(V) = Cod g(V) and

(f ∘ g)(V) = f(V) ∘ g(V). (85.9)

(III) If f is invertible, so is f(V) and

(f(V))← = (f←)(V). (85.10)

As an example, we consider the lineonic extension ιⁿ(V) : SymV → SymV of the real nth power function ιⁿ : R → R. By (85.7), we have

ιⁿ(V)(S) = ∑_{σ∈SpecS} σⁿEσ for all n ∈ N (85.11)

when (Eσ | σ ∈ SpecS) is the spectral resolution of S ∈ SymV. Using the fact that Eσ² = Eσ for all σ ∈ SpecS and that EσEτ = 0 for all σ, τ ∈ SpecS with σ ≠ τ (see (81.6)), one easily infers from (85.11) by induction that

ιⁿ(V)(S) = Sⁿ for all S ∈ SymV. (85.12)

Hence, the lineonic extension of the real nth power is nothing but the adjustment pown|SymV of the lineonic nth power defined by (66.18).

Of particular importance is the lineonic extension of the real square function ι²|P, adjusted to P. This function is invertible and the inverse is the positive real square root function √ := (ι²|P)← : P → P. We call the lineonic extension of √ the lineonic square root and denote it by

sqrt := √(V) : PosV → PosV.


We also write √S := sqrt(S) when S ∈ PosV. By (85.7) we have

sqrt(S) = √S = ∑_{σ∈SpecS} √σ Eσ (85.13)

when (Eσ | σ ∈ SpecS) is the spectral resolution of S ∈ PosV.

The following result follows directly from the rules (I) and (III) above and from (85.12).

Lineonic Square Root Theorem: For every S ∈ PosV, the lineonic square root √S given by (85.13) is the only solution of the equation

? T ∈ PosV, T² = S. (85.14)

The spectrum of √S is given by

Spec √S = {√σ | σ ∈ SpecS}, (85.15)

and we have

Sps√S(√σ) = SpsS(σ) for all σ ∈ SpecS. (85.16)

It turns out that the domain PosV of the lineonic square root sqrt is not an open subset of the linear space SymV and hence it makes no sense to ask whether sqrt is differentiable. However, as we will see, Pos+V is an open subset of SymV and the restriction of sqrt to Pos+V is of class C1. Actually, Pos+V is sqrt-invariant because the lineonic extension of the strictly positive real square-root function (ι²|P×)← must be the adjustment

sqrt+ := sqrt|Pos+V (85.17)

of the square root. By (85.12) and the rule (III) above, sqrt+ is the inverse of the adjustment to Pos+V of the lineonic square function pow2:

sqrt+ = (pow2+)←, where pow2+ := pow2|Pos+V. (85.18)

We call sqrt+ the strict lineonic square root.

Theorem on the Smoothness of the Strict Lineonic Square Root: The set Pos+V of strictly positive lineons is an open subset of SymV and the strict lineonic square-root sqrt+ : Pos+V → Pos+V is of class C1.

Proof: Let S ∈ Pos+V be given and let σ be the least spectral value of S. By Prop.2, σ is strictly positive and by (84.1), we have

(Sv) · v ≥ σ|v|² for all v ∈ V. (85.19)


In view of (52.19), we have |Tv| ≤ ||T|| |v| for every T ∈ SymV and all v ∈ V, where ||T|| is the operator-norm of T. Therefore, the Inner-Product Inequality of Sect.42 yields

(Tv) · v ≥ −|Tv| |v| ≥ −||T|| |v|² for all v ∈ V. (85.20)

Combining (85.19) and (85.20), we see that

((S + T)v) · v ≥ (σ − ||T||)|v|² for all v ∈ V.

It follows that S + T is strictly positive if ||T|| < σ, i.e. if T ∈ σCe(||·||). Hence Pos+V includes the neighborhood S + σ(Ce(||·||) ∩ SymV) of S in SymV. Since S ∈ Pos+V was arbitrary, it follows that Pos+V is open in SymV.

Since Pos+V is open in SymV, it makes sense to say, and it is true by Prop.4 of Sect.66, that pow2+ is of class C1. By (66.19), the gradient of pow2+ is given by

(∇S pow2+)U = US + SU for all U ∈ SymV. (85.21)

Let U ∈ Null(∇S pow2+) be given, so that US + SU = 0. We then have

(S + γU)² = S² + γ(SU + US) + γ²U² = S² + γ²U²
= S² − γ(SU + US) + γ²U² = (S − γU)²

for all γ ∈ P×. Since Pos+V is open in SymV, we may choose γ ∈ P× such that both S + γU and S − γU belong to Pos+V. We then have

pow2+(S + γU) = pow2+(S − γU).

Since pow2+ is injective, it follows that S + γU = S − γU and hence that U = 0. It follows that Null(∇S pow2+) = {0}. By the Pigeonhole Principle for Linear Mappings, it follows that ∇S pow2+ is invertible. The Local Inversion Theorem yields that sqrt+ = (pow2+)← is of class C1.

Since, by (68.4), we have

∇S sqrt+ = (∇√S pow2+)−1 for all S ∈ Pos+V,

we can conclude from (85.21) that for every S ∈ Pos+V and every V ∈ SymV, the value (∇S sqrt+)V is the unique solution of the equation

? U ∈ SymV, V = U√S + √S U. (85.22)


If V commutes with S then the solution of (85.22) is given by

(∇S sqrt+)V = (1/2)(√S)−1V, (85.23)

which is consistent with the formula for the derivative of the square-root function of elementary calculus. If V does not commute with S, then there is no simple explicit formula for (∇S sqrt+)V.

As another example of a lineonic extension we consider (exp|P×)(V), where exp is the real exponential function (see Sect.08).

Proposition 3: The lineonic extension of exp|P× coincides with an adjustment of the lineonic exponential defined in Prop.2 of Sect.612. Specifically, we have

(exp|P×)(V) = expV|^{Pos+V}_{SymV}. (85.24)

Proof: Let S ∈ SymV be given and let (Eσ | σ ∈ SpecS) be its spectral resolution. We define F : R → LinV by

F := ∑_{σ∈SpecS} (exp ∘ (ισ))Eσ. (85.25)

Differentiation gives

F• = ∑_{σ∈SpecS} (exp ∘ (ισ))σEσ. (85.26)

Using the fact that Eσ² = Eσ for all σ ∈ SpecS and that EσEτ = 0 for all σ, τ ∈ SpecS with σ ≠ τ, we infer from (84.5), (85.25), and (85.26) that F• = SF. Since, by (85.25) and (84.4), F(0) = ∑(Eσ | σ ∈ SpecS) = 1V, we can use Prop.4 of Sect.612 to conclude that F = expV ∘ (ιS). Evaluation of (85.25) at 1 ∈ R hence gives

(exp|P×)(V)(S) = F(1) = expV(S).

Since S ∈ SymV was arbitrary, (85.24) follows.

Since exp|P× is invertible, and since its inverse is the logarithm log := (exp|P×)← : P× → R, it follows from Prop.3 and rule (III) above that expV|^{Pos+V}_{SymV} is invertible and that log(V) is its inverse. We call log(V) the lineonic logarithm and also write logS := log(V)(S) when S ∈ Pos+V. In sum, we have the following result.

Lineonic Logarithm Theorem: For every S ∈ Pos+V, the lineonic logarithm logS, given by

logS = ∑_{σ∈SpecS} (log σ)Eσ, (85.27)

where (Eσ | σ ∈ SpecS) is the spectral resolution of S, is the only solution of the equation

? T ∈ SymV, expV(T) = S. (85.28)

The spectrum of logS is given by

Spec(logS) = log>(SpecS) (85.29)

and we have

SpslogS(log σ) = SpsS(σ) for all σ ∈ SpecS. (85.30)

Remark: Since Dom log(V) = Pos+V is an open subset of SymV by the Theorem on Smoothness of the Strict Lineonic Square Root, it is meaningful to ask whether log(V) is of class C1. In fact it is, but the proof is far from trivial. (See Problem 6 at the end of this chapter.)
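A numerical sanity check of (85.27) and (85.28) (illustrative only):

```python
import numpy as np
from scipy.linalg import expm, logm

S = np.array([[5., 2.], [2., 5.]])   # strictly positive
T = logm(S).real                     # the lineonic logarithm of S
assert np.allclose(T, T.T)           # T is symmetric
assert np.allclose(expm(T), S)       # T solves (85.28)
```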

Notes 85

(1) The terms "positive semidefinite" instead of "positive" and "positive definite" instead of "strictly positive" are often used in the literature in connection with symmetric lineons. (See also Note (1) to Sect.27.)

(2) The Theorem on the Smoothness of the Lineonic Square Root, with a proof somewhat different from the one above, was contained in notes that led to this book. These notes were the basis of the corresponding theorem and proof in "An Introduction to Continuum Mechanics" by M. E. Gurtin (Academic Press, 1981). I am not aware of any other place in the literature where the Theorem is proved or even mentioned.

86 Polar Decomposition

Let V be a genuine inner-product space. In view of the identifications LinV ≅ Lin2(V², R), SymV ≅ Sym2(V², R), and SkewV ≅ Skew2(V², R) (see Sect.41), we can phrase Prop.6 of Sect.24 in the following way:

Additive Decomposition Theorem: To every lineon L ∈ LinV corresponds a unique pair (S,A) of lineons such that

L = S + A, S ∈ SymV, A ∈ SkewV. (86.1)

In fact, S and A are given by

S = (1/2)(L + L⊤), A = (1/2)(L − L⊤). (86.2)


The pair (S,A) is called the additive decomposition of L.

The following theorem asserts the existence and uniqueness of certain multiplicative decompositions for invertible lineons.

Polar Decomposition Theorem: To every invertible lineon L ∈ LisV corresponds a unique pair (R,S) such that

L = RS, R ∈ OrthV, S ∈ Pos+V. (86.3)

In fact, S and R are given by

S = √(L⊤L), R = LS−1. (86.4)

Also, there is a unique pair (R′,S′) such that

L = S′R′, R′ ∈ OrthV, S′ ∈ Pos+V. (86.5)

In fact, R′ coincides with R, and S′ is given by

S′ = RSR⊤. (86.6)

The pair (R,S) [(R′,S′)] for which (86.3) [(86.5)] holds is called the right [left] polar decomposition of L.

Proof: Assume that (86.3) holds for a given pair (R,S). Since S⊤ = S and R⊤R = 1V we then have

L⊤L = (RS)⊤(RS) = S⊤R⊤RS = S².

Since L⊤L as well as S belongs to Pos+V (see Prop.1 of Sect.85), it follows from the Lineonic Square Root Theorem that (86.4) must be valid and hence that S and R, if they exist, are uniquely determined by L. To prove existence, we define S and R by (86.4). We then have L = RS and S ∈ Pos+V. To prove that R ∈ OrthV we need only observe that

R⊤R = (LS−1)⊤(LS−1) = S−1L⊤LS−1 = S−1S²S−1 = 1V.

Assume now that (86.5) holds for a given pair (R′,S′). We then have

L = S′R′ = (R′R′⊤)S′R′ = R′(R′⊤S′R′).

In view of Prop.1 of Sect.85, R′⊤S′R′ belongs to Pos+V. It follows from the already proved uniqueness of the pair (R,S) that R′ = R and that S = R′⊤S′R′, and hence (86.6) holds. Therefore, R′ and S′, if they exist, are uniquely determined by S and R, and hence by L. To prove existence, we need only define R′ and S′ by R′ := R and S′ := RSR⊤. It is then evident that (86.5) holds.
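A concrete check of (86.3) and (86.4); scipy's polar routine computes the same factorization (numpy/scipy sketch, illustrative only):

```python
import numpy as np
from scipy.linalg import sqrtm, polar

L = np.array([[1., 2.], [0., 3.]])       # invertible
S = sqrtm(L.T @ L).real                  # (86.4): S = sqrt(L^T L)
R = L @ np.linalg.inv(S)                 # (86.4): R = L S^{-1}
assert np.allclose(R.T @ R, np.eye(2))   # R is orthogonal
assert np.allclose(R @ S, L)             # (86.3)
U, P = polar(L, side='right')            # scipy: L = U P
assert np.allclose(U, R) and np.allclose(P, S)
```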

If we apply the Theorem to the case when L is replaced by L⊤, we obtain

S′ = √(LL⊤), R′ = R = S′−1L. (86.7)

The polar decompositions can be illustrated geometrically. Assume that (86.3) holds, choose an orthonormal basis e := (ei | i ∈ I) of spectral vectors of S (see Cor.2 to the Spectral Theorem), and consider the "unit cube"

C := Box((1/2)e) := {v ∈ V | |v · ei| ≤ 1/2 for all i ∈ I}.

Since S is strictly positive, we have Sei = λiei for all i ∈ I, where λ := (λi | i ∈ I) is a family of strictly positive numbers. The lineon S will "stretch and/or compress" the cube C = Box((1/2)e) into the "rectangular box"

S>(C) = Box((λi/2)ei | i ∈ I),

whose sides have the lengths λi, i ∈ I. Finally, the orthogonal lineon R will "rotate" or "rotate and reflect" (see end of Sect.88) the rectangular box S>(C), the result being the rectangular box

L>(C) = R>(S>(C)) = Box((λi/2)Rei | i ∈ I)

congruent to S>(C). Thus, the right polar decomposition (86.3) shows that the transformation of the cube C into the rectangular box L>(C) can be obtained by first "stretching and/or compressing" and then "rotating" or "rotating and reflecting". The left polar decomposition (86.5) can be used to show that L>(C) can be obtained from C also by first "rotating" or "rotating and reflecting" and then "stretching and/or compressing". In the case when dimV = 2 these processes are illustrated in Figure 1.


The Polar Decomposition Theorem shows that there are mappings

or : LisV → LinV, rp : LisV → Pos+V, lp : LisV → Pos+V,

such that Rng or ⊂ OrthV and

L = or(L)rp(L) = lp(L)or(L) (86.8)

for all L ∈ LisV. For L := 1V, we get

or(1V) = rp(1V) = lp(1V) = 1V. (86.9)

In view of (86.4), we have

rp(L) = sqrt+(L⊤L), or(L) = L(rp(L))−1 (86.10)

for all L ∈ LisV, where sqrt+ is the strict lineonic square-root function defined in the previous section. It is easily seen, also, that

lp(L) = rp(L⊤) = sqrt+(LL⊤) for all L ∈ LisV. (86.11)

Proposition 1: The mappings or, rp, and lp characterized by the Polar Decomposition Theorem are of class C1. Their gradients at 1V are given by

(∇1V or)M = (1/2)(M − M⊤), (86.12)

(∇1V rp)M = (∇1V lp)M = (1/2)(M + M⊤) (86.13)

for all M ∈ LinV.

Proof: Since (L ↦ L⊤L) : LisV → Pos+V is of class C1 by the General Product Rule and since sqrt+ : Pos+V → Pos+V is of class C1 by the Theorem on the Smoothness of the Strict Lineonic Square Root, it follows from (86.10)₁ and the Chain Rule that rp is of class C1. It is an immediate consequence of (86.11)₁ that lp is also of class C1 and that

(∇L lp)M = (∇L⊤ rp)M⊤ (86.14)

for all L ∈ LisV, M ∈ LinV.

Using the Differentiation Theorem for Inversion Mappings of Sect.68 we conclude from (86.10)₂ that or is also of class C1.


If we differentiate (86.8)₁ with respect to L using the General Product Rule, we obtain

M = ((∇L or)M)rp(L) + or(L)((∇L rp)M)

for all L ∈ LisV, M ∈ LinV. In view of (86.9), for L := 1V this yields

M = (∇1V or)M + (∇1V rp)M for all M ∈ LinV. (86.15)

Since the codomain Pos+V of rp is an open subset of SymV, the codomain of ∇L rp is SymV for all L ∈ LisV. In particular,

(∇1V rp)M ∈ SymV for all M ∈ LinV. (86.16)

Since Rng or ⊂ OrthV, it follows from Prop.2 of Sect.66 that

(∇1V or)M ∈ SkewV for all M ∈ LinV. (86.17)

Comparing (86.15)–(86.17) with (86.1), we see that for every M ∈ LinV,

U := (∇1V rp)M, A := (∇1V or)M

gives the additive decomposition (U,A) of M. Hence (86.12) and (86.13) follow from the Additive Decomposition Theorem.

There are no simple explicit formulas for ∇L rp, ∇L or, and ∇L lp when L ≠ 1V.

The results (86.12) and (86.13) show, roughly, that for "infinitesimal" M ∈ LinV, the right and left polar decompositions of 1V + M coincide and are given by (1V + A, 1V + U) when (U,A) is the additive decomposition of M.

Notes 86

(1) In the literature on continuum mechanics (including some of my own papers, I must admit) Prop.1 is taken tacitly for granted. I know of no other place in the literature where it is proved.

87 The Structure of Skew Lineons

A genuine inner-product space V is assumed to be given. First, we deal with an important special type of skew lineons:


Definition 1: We say that a lineon on V is a perpendicular turn if

it is both skew and orthogonal.

If U is a perpendicular turn and u ∈ V, then |Uu| = |u| because U is orthogonal, and (Uu) · u = 0, i.e. Uu ⊥ u, because U is skew. Hence U "turns" every vector into one that is perpendicular and has the same magnitude.

The following two results are immediate consequences of the definition.

Proposition 1: A lineon U on V is a perpendicular turn if and only if it satisfies any two of the following three conditions: (a) U ∈ SkewV, (b) U ∈ OrthV, (c) U² = −1V.

Proposition 2: Let U be a perpendicular turn on V and let u ∈ V be a unit vector. Then (u, Uu) is an orthonormal pair and its span Lsp{u, Uu} is a two-dimensional U-space.

The structure of perpendicular turns is described by the following result.

Structure Theorem for Perpendicular Turns: If U is a perpendicular turn, then there exists an orthonormal subset c of V such that (Lsp{e, Ue} | e ∈ c) is an orthogonal decomposition of V whose terms are two-dimensional U-spaces. The set c can be chosen so as to contain any single prescribed unit vector. We have 2♯c = dimV.

Proof: Consider orthonormal subsets c of V such that (Lsp{e, Ue} | e ∈ c) is an orthogonal family of subspaces of V. It is a trivial consequence of Prop.2 that the empty subset of V and also singletons with unit vectors have this property. Choose a maximal set c with this property. It is clear that c can be chosen so as to contain any prescribed unit vector. Since all the spaces Lsp{u, Uu} are U-spaces by Prop.2, it follows that the sum

U := ∑_{e∈c} Lsp{e, Ue}

of the family (Lsp{e, Ue} | e ∈ c) is also a U-space. Since U is skew, it follows from Prop.1 of Sect.82 that the orthogonal supplement U⊥ of U is also a U-space. Now if U⊥ ≠ {0}, we may choose a unit vector f ∈ U⊥. Since U⊥ is a U-space, we have Lsp{f, Uf} ⊂ U⊥. It follows that c ∪ {f} is a subset of V such that (Lsp{e, Ue} | e ∈ c ∪ {f}) is an orthogonal family, which contradicts the maximality of c. We conclude that U⊥ = {0} and hence U = V, which means that (Lsp{e, Ue} | e ∈ c) is an orthogonal decomposition of V. By Prop.2, the terms of the decomposition are two-dimensional and by Prop.4 of Sect.81, we have 2(♯c) = dimV.

Corollary: Perpendicular turns on V can exist only when dimV is even. Let U be such a turn and let m := (1/2) dimV. Then there exists an orthonormal basis e := (ei | i ∈ (2m)]) such that

Uei = ei+1 when i is odd, Uei = −ei−1 when i is even, (87.1)

for all i ∈ (2m)].

Proof: The basis e is obtained by choosing an orthonormal set c as in the Theorem, so that ♯c = m, and enumerating it by odd numbers, so that c = {ei | i ∈ 2(m]) − 1}. The ei for i ∈ 2(m]) are then defined by ei := Uei−1. Since U² = −1V by (c) of Prop.1, we have Uei = U²ei−1 = −ei−1 when i ∈ 2(m]).

If m is small, then the (2m)-by-(2m) matrix [U]e of U relative to any basis for which (87.1) holds can be recorded explicitly in the form

[U]e =
[ 0 −1              ]
[ 1  0              ]
[       0 −1        ]
[       1  0        ]   , (87.2)
[            ·      ]
[              ·    ]
[               0 −1 ]
[               1  0 ]

where zeros are omitted.

where zeros are omitted.In order to deal with skew lineons, we need the following analogue of

Def.1 of Sect.82.Definition 2: Let L be a lineon on V. For every κ ∈ P×, we write

QspsL(κ) := Null (L2 + κ21V). (87.3)

This is an L-space; if it is non-zero, we call it the quasi-spectral spaceof L for κ. The quasi-spectrum of L is defined to be

Qspec L := κ ∈ P× | QspsL(κ) 6= 0. (87.4)

In view of Def.1 of Sect.82 we have

QspecL = {√σ | σ ∈ Spec(−L²) ∩ P×} (87.5)

and

QspsL(κ) = Sps−L²(κ²) for all κ ∈ P× (87.6)

for all L ∈ LinV.


The proof of the following result is based on the Spectral Theorem.

Structure Theorem for Skew Lineons: A lineon A on V is skew if and only if (i) RngA = (Null A)⊥, (ii) the family of quasi-spectral spaces of A is an orthogonal decomposition of RngA, and (iii) for each κ ∈ QspecA we have

A|Wκ = κUκ, (87.7)

where Wκ := QspsA(κ) and where Uκ ∈ LinWκ is a perpendicular turn.

Proof: Assume that A is skew. It follows from (22.9) that RngA = Rng(−A) = Rng(A⊤) = (Null A)⊥, i.e. that (i) is valid. Since A⊤A = (−A)A = −A² is positive by Prop.1 of Sect.85, it follows from Prop.2 of Sect.85 that Spec(−A²) ⊂ P. Hence it follows from (87.5) that

QspecA = {√σ | σ ∈ Spec(−A²) \ {0}}.

In view of (87.6) we conclude that the spectral spaces of −A² are QspsA(κ) with κ ∈ QspecA, and also Sps−A²(0) if this subspace is not zero. Since it is easily seen that Sps−A²(0) = Null A² = Null A, it follows from the Spectral Theorem, applied to −A², that (QspsA(κ) | κ ∈ QspecA) is an orthogonal decomposition of (Null A)⊥ = RngA, i.e. that (ii) is valid. Now let κ ∈ QspecA be given. By (87.3) we have (A² + κ²1V)|Wκ = 0 when Wκ := QspsA(κ) and hence, since Wκ is an A-space, ((1/κ)A|Wκ)² = −1Wκ. Since (1/κ)A|Wκ ∈ SkewWκ, it follows from Prop.1 that (iii) is valid.

Assume now that the conditions (i), (ii), and (iii) hold and put W0 := Null A and K := (QspecA) ∪ {0}. Then (Wκ | κ ∈ K) is an orthogonal decomposition of V. Let (Eκ | κ ∈ K) be the associated family of idempotents (see Prop.5 of Sect.81). Using Prop.6 of Sect.81, (81.8), and (87.7), we find that

A = ∑_{κ∈QspecA} κ Uκ|V Eκ|Wκ. (87.8)

Since Uκ ∈ SkewWκ and, by Prop.4 of Sect.83, Eκ ∈ SymV for each κ ∈ QspecA, it is easily seen that each term in the sum on the right side of (87.8) is skew and hence that A is skew.

Assume that A ∈ SkewV. It follows from (iii) of the Theorem just proved and the Corollary to the Structure Theorem for Perpendicular Turns that dimWκ is even for all κ ∈ QspecA. Hence, by (ii) and Prop.3 of Sect.81, we obtain the following

Proposition 3: If A is a skew lineon on V, then dim(RngA) is even.

In particular, if the dimension of V is odd, then there exist no invertible

skew lineons on V.


The following corollary to the Structure Theorem above is obtained by choosing, in each of the quasi-spectral spaces, an orthonormal basis as described in the Corollary to the Structure Theorem for Perpendicular Turns.

Corollary: Let A be a skew lineon on V and put n := dimV, m := (1/2) dim(RngA). Then there is an orthonormal basis e := (ei | i ∈ n]) of V and a list (κk | k ∈ m]) of strictly positive real numbers such that

Ae2k−1 = κk e2k and Ae2k = −κk e2k−1 for all k ∈ m], (87.9)
Aei = 0 for all i ∈ n] \ (2m)].

The only non-zero terms in the matrix [A]e of A relative to a basis e for which (87.9) holds are those in blocks of the form

[ 0  −κk ]
[ κk   0 ]

along the diagonal. For example, if dimV = 3 and A ≠ 0, then [A]e has the form

[A]e =
[ 0 −κ 0 ]
[ κ  0 0 ]   for some κ ∈ P×. (87.10)
[ 0  0 0 ]
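Numerically, a basis as in (87.9) can be obtained from the real Schur decomposition of a skew matrix (a sketch; the ordering and signs of the blocks may vary):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0., -2., 0.],
              [2.,  0., 0.],
              [0.,  0., 0.]])     # skew; (87.10) with kappa = 2
T, Q = schur(A, output='real')    # Q orthogonal, T = Q^T A Q
print(np.round(T, 10))            # one 2x2 block of the form [[0, -k], [k, 0]] and a zero row
assert np.allclose(Q @ T @ Q.T, A)
```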

Notes 87

(1) I am introducing the term "perpendicular turn" here for the first time; I believe it is very descriptive.

(2) The concepts of a quasi-spectrum, quasi-spectral space, and the notations of Def.2 are introduced here for the first time.

88 Structure of Normal and Orthogonal Lineons

Again, a genuine inner-product space V is assumed to be given.

Definition 1: We say that a lineon on V is normal if it commutes with

its transpose.

It is clear that symmetric, skew, and orthogonal lineons are all normal.

Proposition 1: A lineon N ∈ LinV is normal if and only if the two terms S := (1/2)(N + N⊤) and A := (1/2)(N − N⊤) of the additive decomposition (S,A) of N commute. If this is the case, then any two of N, N⊤, S and A commute.


Proof: We have N = S + A and, since S ∈ SymV, A ∈ SkewV, N⊤ = S − A. It follows that NN⊤ − N⊤N = 2(AS − SA). Therefore, N and N⊤ commute if and only if S and A commute.

Proposition 2: A lineon N ∈ LinV is normal if and only if

|Nv| = |N⊤v| for all v ∈ V. (88.1)

If this is the case, then Null N = Null N⊤.

Proof: We have |Nv|² = Nv · Nv = v · N⊤Nv and |N⊤v|² = v · NN⊤v for all v ∈ V. Hence (88.1) holds if and only if the quadratic forms corresponding to N⊤N and NN⊤ are the same (see (85.1)). This is the case if and only if N⊤N and NN⊤ are the same (see Sect.27).

Proposition 3: An invertible lineon N ∈ LisV is normal if and only if the two terms of its right [left] polar decomposition commute (see Sect.86).

Proof: It is clear from the Polar Decomposition Theorem of Sect.86 that the terms of the right or of the left polar decomposition commute if and only if the right and the left polar decompositions coincide, i.e. if and only if S = S′ in the notation of the statement of this theorem. But by (86.4)₁ and (86.7)₁, we have S = S′ if and only if NN⊤ = N⊤N.

In order to deal with the structure of normal lineons, we need the following analogue of Def.2 of Sect.87.

Definition 2: Let L be a lineon on V. For every ν ∈ R × P we write

PspsL(ν) := Null((L − ν₁1V)² + ν₂²1V). (88.2)

This is an L-space; if it is not zero, we call it the pair-spectral space of L for ν. The pair-spectrum of L is defined to be

PspecL := {ν ∈ R × P | PspsL(ν) ≠ {0}}. (88.3)

The proof of the following theorem is based on the Structure Theorem for Skew Lineons and the Spectral Theorem.

Structure Theorem for Normal Lineons: A lineon N on V is normal if and only if the family of its pair-spectral spaces is an orthogonal decomposition of V and, for each ν ∈ PspecN, we have

N|Wν = ν₁1Wν if ν₂ = 0, N|Wν = ν₁1Wν + ν₂Uν if ν₂ ≠ 0, (88.4)

where Wν := PspsN(ν), and where Uν ∈ LinWν is a perpendicular turn.

First we prove an auxiliary result:


Lemma: If N ∈ LinV is normal and if (S,A) is the additive decomposition of N, then

PspsN((σ, κ)) = SpsS(σ) ∩ QspsA(κ) if κ ≠ 0, PspsN((σ, κ)) = SpsS(σ) ∩ Null A if κ = 0, (88.5)

for all σ ∈ R and all κ ∈ P.

Proof: Let σ ∈ R and κ ∈ P be given and put M := N − σ1V and T := S − σ1V. Then M is normal and (T,A) is the additive decomposition of M. Using the fact that any two of M, M⊤, T and A commute (see Prop.1), we obtain

(M² + κ²1V)⊤(M² + κ²1V) = (M⊤M)² + κ²(M⊤² + M²) + κ⁴1V
= ((T − A)(T + A))² + κ²((T − A)² + (T + A)²) + κ⁴1V
= T⁴ + 2(AT)⊤(AT) + 2κ²T² + (A² + κ²1V)².

Since T, T² and (A² + κ²1V) are symmetric, it follows that

|(M² + κ²1V)v|² = |T²v|² + 2|ATv|² + 2κ²|Tv|² + |(A² + κ²1V)v|²

for all v ∈ V, from which we infer that v ∈ Null(M² + κ²1V) = PspsN((σ, κ)) if and only if v ∈ Null T ∩ Null(A² + κ²1V). Since Null T = SpsS(σ), since Null(A² + κ²1V) = QspsA(κ) when κ ≠ 0 by Def.2 of Sect.87, and since Null A² = Null A because A is skew, (88.5) follows.

Proof of Theorem: Assume that N is normal and that (S,A) is its additive decomposition. Let σ ∈ SpecS be given and put U := SpsS(σ). By Prop.2 of Sect.82, U is A-invariant and we may hence consider the adjustment A|U ∈ SkewU. It is clear that Null A|U = U ∩ Null A and

QspsA|U(κ) = U ∩ QspsA(κ) for all κ ∈ P×.

Hence, by the Lemma, we conclude that

PspsN((σ, κ)) = QspsA|U(κ) if κ ∈ P×, PspsN((σ, 0)) = Null A|U. (88.6)

Since S|U = σ1U, U is N-invariant and

N|U = σ1U + A|U. (88.7)

We now apply the Structure Theorem for Skew Lineons to A|U ∈ SkewU. In view of (88.6), and since QspecA|U ⊂ QspecA, we have


(PspsN(ν) | ν ∈ PspecN, ν₁ = σ) is an orthogonal decomposition of U, and for all ν ∈ PspecN with ν₁ = σ we have

A|Wν = 0 if ν₂ = 0 and A|Wν = ν₂Uν if ν₂ ≠ 0,

where Wν := PspsN(ν) and where Uν ∈ LinWν is a perpendicular turn. Using this result and (88.7), adjusted to the subspace Wν = W(σ,ν₂) of U, we infer that (88.4) is valid.

By the Spectral Theorem (SpsS(σ) | σ ∈ SpecS) is an orthogonal decomposition of V. Since, as we have just seen, (PspsN(ν) | ν ∈ PspecN, ν₁ = σ) is an orthogonal decomposition of SpsS(σ) for each σ ∈ SpecS, it follows that (PspsN(ν) | ν ∈ PspecN) is an orthogonal decomposition of V. This completes the proof of the "only-if"-part of the Theorem. The proof of the "if"-part goes the same way as that of the "if"-part of the Structure Theorem for Skew Lineons. We leave the details to the reader.

For a symmetric lineon S we have PspecS = SpecS × {0} and PspsS((σ, 0)) = SpsS(σ) for all σ ∈ SpecS. For a skew lineon A we have PspecA = {0} × (SpecA ∪ QspecA) and PspsA((0, κ)) = QspsA(κ) for all κ ∈ QspecA, while PspsA((0, 0)) = Null A when A is not invertible.

As in the previous section, we obtain a corollary to the Structure Theorem just given as follows.

Corollary: Let N be a normal lineon on V. Then there exists a number m ∈ N with 2m ≤ n := dimV, an orthonormal basis e := (ei | i ∈ n]) of V, a list (µk | k ∈ m]) in R, a list (κk | k ∈ m]) in P×, and a family (λi | i ∈ n] \ (2m)]) in R such that

Ne2k−1 = µk e2k−1 + κk e2k and Ne2k = µk e2k − κk e2k−1 for all k ∈ m], (88.8)
Nei = λi ei for all i ∈ n] \ (2m)].

The only non-zero terms in the matrix [N]e of N relative to a basis e for which (88.8) holds are either on the diagonal or in blocks of the form

[ µk −κk ]
[ κk  µk ]

along the diagonal. For example, if dimV = 3 and if N is not symmetric, then [N]e has the form

[N]e =
[ µ −κ 0 ]
[ κ  µ 0 ]   with κ ∈ P×, λ, µ ∈ R. (88.9)
[ 0  0 λ ]

Orthogonal lineons are special normal lineons. To describe their structure, the following definition is useful.


Definition 3: Let L be a lineon on V. For every θ ∈ ]0, π[ we write

AspsL(θ) := PspsL((cos θ, sin θ)) (88.10)

(see (88.2)). This is an L-space; if it is not zero, we call it the angle-spectral space of L for θ. The angle-spectrum of L is defined to be

AspecL := {θ ∈ ]0, π[ | AspsL(θ) ≠ {0}}. (88.11)

The following result is a corollary to the Structure Theorem for Normal Lineons.

Structure Theorem for Orthogonal Lineons: A lineon R on V is orthogonal if and only if (i) Null(R + 1V) ⊥ Null(R − 1V), (ii) the family of angle-spectral spaces is an orthogonal decomposition of (Null(R + 1V) + Null(R − 1V))⊥, and (iii) for each θ ∈ AspecR we have

R|Uθ = cos θ 1Uθ + sin θ Vθ, (88.12)

where Uθ := AspsR(θ) and where Vθ ∈ LinUθ is a perpendicular turn.

Proof: The conditions (i), (ii), (iii) are satisfied if and only if the conditions of the Structure Theorem for Normal Lineons are satisfied with N := R and

PspecR ⊂ {(1, 0), (−1, 0)} ∪ {(cos θ, sin θ) | θ ∈ AspecR}. (88.13)

Thus we may assume that R is normal. Since (PspsR(ν) | ν ∈ PspecR) is an orthogonal decomposition of V whose terms Wν := PspsR(ν) are R-invariant, it is easily seen that R is orthogonal if and only if R|Wν is orthogonal for each ν ∈ PspecR. It follows from (88.4), with N := R, that this is the case if and only if ν₁² + ν₂² = 1 for all ν ∈ PspecR, which is equivalent to (88.13). If ν₂ = 0 we have ν = (1, 0) or ν = (−1, 0) and PspsR((1, 0)) = Null(R − 1V) and PspsR((−1, 0)) = Null(R + 1V). If ν₂ ≠ 0 we have ν = (cos θ, sin θ) for some θ ∈ ]0, π[ and (88.4) reduces to (88.12) with Vθ := U(cos θ, sin θ).

As before we obtain a corollary as follows:

Corollary: Let R be an orthogonal lineon on V. Then there exist numbers m ∈ N and p ∈ N with 2m ≤ p ≤ n := dimV, an orthonormal basis e := (ei | i ∈ n]) of V, and a list (θk | k ∈ m]) in ]0, π[ such that

Re2k−1 = cos θk e2k−1 + sin θk e2k and Re2k = cos θk e2k − sin θk e2k−1 for all k ∈ m],
Rei = −ei for all i ∈ p] \ (2m)],
Rei = ei for all i ∈ n] \ p]. (88.14)

The only non-zero terms in the matrix [R]e of R relative to a basis e for which (88.14) holds are either 1 or −1 on the diagonal or in blocks of the form

[ cos θk  −sin θk ]
[ sin θk   cos θk ]

along the diagonal. For example, if dimV = 3 and if R is not symmetric, then [R]e has the form

[R]e =
[ cos θ −sin θ  0 ]
[ sin θ  cos θ  0 ]   with θ ∈ ]0, π[. (88.15)
[ 0      0     ±1 ]

If the + sign is appropriate, then R is a rotation by an angle θ about an axis in the direction of e3. If the − sign is appropriate, then R is obtained from such a rotation by composing it with a reflection in the plane e3⊥.

⊥.

Notes 88

(1) The concepts of a pair-spectrum, pair-spectral space, angle-spectrum, and angle-spectral space, and the notations of Defs.2 and 3 are introduced here for the first time.

89 Complex Spaces, Unitary Spaces

Definition 1: A complex space is a linear space V (over R) endowed with additional structure by the prescription of a lineon J on V that satisfies J² = −1V. We call J the complexor of V.

A complex space V acquires the natural structure of a linear space over the field C of complex numbers if we stipulate that the scalar multiplication smC : C × V → V of V as a space over C be given by

smC(ζ, u) = (Re ζ)u + (Im ζ)Ju for all ζ ∈ C, u ∈ V, (89.1)

where Re ζ and Im ζ denote the real and imaginary parts of ζ. Indeed, it is easily verified that smC satisfies the axioms (S1)–(S4) of Sect.11 if we take F := C. The scalar multiplication sm : R × V → V of V as a space over R is simply the restriction of smC to R × V:

sm = smC|R×V. (89.2)


We use the simplified notation

ζu := smC(ζ,u) for all ζ ∈ C, u ∈ V (89.3)

as described in Sect.11.

Conversely, every linear space V over the field C of complex numbers has the natural structure of a complex space in the sense of Def.1. The structure of V as a linear space over R is obtained simply by restricting the scalar multiplication of V to R × V, and the complexor of V is the mapping (u ↦ iu) : V → V.

Let V be a complex space. Most of the properties and constructions involving V discussed in Chaps.1 and 2 depend on whether V is regarded as a linear space over R or over C. To remove the resulting ambiguities, we use the prefix “C” or a superscript C if we refer to the structure of V as a linear space over C. For example, a non-empty subset c of V that is a C-basis is not a basis relative to the structure of V as a space over R. However, it is easily seen that c ∈ SubV is a C-basis set if and only if c ∪ J>(c) is a basis set of V. If this is the case, then c ∩ J>(c) = ∅ and hence, since J is injective, ♯(c ∪ J>(c)) = 2(♯c). Therefore, the dimension of V as a linear space over R is twice the dimension of V as a linear space over C:

dimV = 2 dimC V. (89.4)

It follows that dimV must be an even number.

Let V and V ′ be complex spaces, with complexors J and J′, respectively.

The set of all C-linear mappings from V to V ′ is easily seen to be given by

LinC(V,V ′) = {L ∈ Lin(V,V ′) | LJ = J′L}. (89.5)

In particular, we have LinC(V,V) =: LinCV = CommJ, the commutant-algebra of the complexor J (see 18.2). The set C1V := {ζ1V | ζ ∈ C} is a subalgebra of LinCV and we have

J = i1V . (89.6)
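Remark: The following Python sketch (assuming numpy; the block construction is ours, chosen only for illustration) realizes Def.1 with V := R2n, the complexor J acting as multiplication by i on consecutive coordinate pairs, and checks the characterization (89.5) of C-linearity for a lineon built from a complex n × n matrix.

    import numpy as np

    n = 3
    J2 = np.array([[0.0, -1.0], [1.0, 0.0]])
    J = np.kron(np.eye(n), J2)                  # complexor on R^(2n)
    assert np.allclose(J @ J, -np.eye(2*n))     # J^2 = -1_V

    def smC(zeta, u):                           # scalar multiplication (89.1)
        return zeta.real*u + zeta.imag*(J @ u)

    u = np.random.randn(2*n)
    assert np.allclose(smC(1j, u), J @ u)       # multiplication by i is J

    A = np.random.randn(n, n) + 1j*np.random.randn(n, n)
    L = np.kron(A.real, np.eye(2)) + np.kron(A.imag, J2)
    assert np.allclose(L @ J, J @ L)            # L is C-linear, as in (89.5)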

Let J be the complexor of a complex space V. Since (−J)2 = J2 = −1V, we can also consider V as a complex space with −J rather than J as the designated complexor. We call the resulting structure of V the conjugate-complex structure of V. In view of (89.1), the corresponding scalar multiplication, denoted by sm̄C : C × V → V, is given by

sm̄C(ζ,u) = (Re ζ)u − (Im ζ)Ju for all ζ ∈ C (89.7)

and we have

sm̄C(ζ,u) = ζ̄u for all ζ ∈ C, u ∈ V, (89.8)

where ζ̄ denotes the complex-conjugate of ζ.

Let V and V ′ be complex spaces, with complexors J and J′, respectively. We say that a mapping from V to V ′ is conjugate-linear if it is C-linear when one of V and V ′ is considered as a linear space over C relative to its conjugate-complex structure. The set of all conjugate-linear mappings from V to V ′ will be denoted by LinC̄(V,V ′), and we have

LinC̄(V,V ′) = {L ∈ Lin(V,V ′) | LJ = −J′L}. (89.9)

A mapping L ∈ Lin(V,V ′) is conjugate-linear if and only if

L(ζu) = ζ̄Lu for all ζ ∈ C, u ∈ V. (89.10)

A mapping L : V → V ′ is C-linear when both V and V ′ are considered as linear spaces over C relative to their conjugate-complex structures if and only if it is C-linear in the ordinary sense.

Let V be a complex space with complexor J. Its dual V∗ then acquires the natural structure of a complex space if we stipulate that the complexor of V∗ be J⊤ ∈ LinV∗. The dual V∗ := Lin(V, R) must be carefully distinguished from the “complex dual” LinC(V, C) of V. However, the following result shows that the two are naturally C-linearly isomorphic.

Proposition 1: The mapping

(ω ↦ Re ω) : LinC(V, C) → V∗ (89.11)

is a C-linear isomorphism. Its inverse Γ ∈ LisC(V∗, LinC(V, C)) is given by

(Γλ)u = λu − i λ(Ju) for all u ∈ V, λ ∈ V∗. (89.12)

Proof: We put V† := LinC(V, C) and denote the mapping (89.11) by Φ : V† → V∗, so that

(Φω)u = Re (ωu) for all ω ∈ V†, u ∈ V. (89.13)

It is clear that Φ is linear. By (22.3) and (89.6) we have

((J⊤Φ)ω)u = (Φω)(Ju) = (Φω)(iu) = Re (i(ωu)) = (Φ(iω))u = ((Φ(i1V†))ω)u

for all ω ∈ V† and all u ∈ V, and hence

J⊤Φ = Φ(i1V†).

Since J⊤ is the complexor of V∗ and i1V† the complexor of V†, it follows from (89.5) that Φ is C-linear.

We now define Γ : V∗ → V† by (89.12). Using (89.13), we then have

((ΓΦ)ω)u = (Γ(Φω))u = (Φω)u − i (Φω)(Ju)
         = Re (ωu) − i Re (ω(Ju)) = Re (ωu) − i Re (i (ωu))
         = Re (ωu) + i Im (ωu) = ωu

for all ω ∈ V† and all u ∈ V. It follows that ΓΦ = 1V†. In a similar way, one proves that ΦΓ = 1V∗ and hence that Γ = Φ−1.

Definition 2: A unitary space is an inner-product space V endowed with additional structure by the prescription of a perpendicular turn J on V (see Def.1 of Sect.87). We say that V is a genuine unitary space if its structure as an inner-product space is genuine.

Since J2 = −1V by Prop.1 of Sect.87, a unitary space has the structure of a complex space with complexor J in the sense of Def.1. The identification V∗ ≅ V has to be treated with some care. The natural complex-space structure of V∗ is determined by the complexor J⊤ but, since J⊤ = −J for a perpendicular turn, the complex structure of V∗, when identified with V, is the conjugate-complex structure of V, not the original complex structure. The identification mapping (v ↦ v·) : V → V∗ (see Sect.41) is not C-linear but conjugate-linear.

We define the unitary product up : V × V → C of a unitary space V by

up(u,v) := Γ(v·)u for all u,v ∈ V, (89.14)

where Γ ∈ LisC(V∗, LinC(V, C)) is the natural isomorphism given by (89.12). In view of the remarks above, up(·,v) : V → C is C-linear for all v ∈ V and up(u, ·) : V → C is conjugate-linear for all u ∈ V. We use the simplified notation

〈u |v〉 := up(u,v) when u,v ∈ V. (89.15)

It follows from (89.15), (89.14), and (89.12) that

〈u |v〉 = v · u − i (v · Ju) = u · v + i (u · Jv),
Re 〈u |v〉 = u · v,   〈u |u〉 = u·2    (89.16)

for all u,v ∈ V. The properties of up are reflected in the following rules, valid for all u,v,w ∈ V and all ζ ∈ C:

〈v |u〉 = 〈u |v〉̄ , (89.17)

〈u + v |w〉 = 〈u |w〉 + 〈v |w〉, (89.18)

〈ζu |v〉 = ζ〈u |v〉 = 〈u | ζ̄v〉. (89.19)

Let V be a unitary space. The following result, which is easily proved, describes the properties of transposition of C-lineons.

Proposition 2: If L ∈ LinCV then L⊤ ∈ LinCV and L⊤ is characterized by

〈Lu |v〉 = 〈u |L⊤v〉 for all u,v ∈ V. (89.20)

The mapping (L ↦ L⊤) : LinCV → LinCV is conjugate-linear, so that

(ζL)⊤ = ζ̄L⊤ for all ζ ∈ C, L ∈ LinCV. (89.21)

We use the notation

SymCV = SymV ∩ LinCV (89.22)

for the set of all symmetric C-lineons. This set is a subspace, but not a C-subspace, of LinCV. In fact, the following result shows how LinCV can be recovered from SymCV:

Theorem on Real and Imaginary Parts: Let V be a unitary space. We have

i SymCV = SkewV ∩ LinCV (89.23)

and i SymCV is a supplement of SymCV in LinCV, i.e. to every C-lineon L corresponds a unique pair (H,G) such that

L = H + iG, H,G ∈ SymCV. (89.24)

H is called the real part and G the imaginary part of L.

Proof: Let L ∈ LinCV be given. By (89.21) we have (iL)⊤ = −iL⊤ and hence L = −L⊤ if and only if (iL) = (iL)⊤. Since L ∈ LinCV was arbitrary, (89.23) follows.

Again let L ∈ LinCV be given and assume that (89.24) holds. Since iG ∈ SkewV by (89.23), it follows that (H, iG) must be the additive decomposition of L described in the Additive Decomposition Theorem of Sect.86 and hence that

H = (1/2)(L + L⊤),   G = (i/2)(L⊤ − L), (89.25)

which proves the uniqueness of (H,G). To prove existence, we define H and G by (89.25) and observe, using (89.21), that they have the required properties.
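Remark: In matrix terms, with V := Cn and the transpose realized as the conjugate transpose (the transpose relative to the unitary product), (89.25) is the familiar decomposition of a matrix into Hermitian parts. The following Python sketch (assuming numpy; an illustration, not part of the original text) checks it on a random matrix.

    import numpy as np

    n = 4
    L = np.random.randn(n, n) + 1j*np.random.randn(n, n)

    H = (L + L.conj().T) / 2        # real part, symmetric: H = H^T
    G = 1j*(L.conj().T - L) / 2     # imaginary part, also symmetric

    assert np.allclose(H, H.conj().T) and np.allclose(G, G.conj().T)
    assert np.allclose(L, H + 1j*G)  # L = H + iG, as in (89.24)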

Since SymCV and i SymCV are linearly isomorphic subspaces of LinCV we can use Props.5 and 7 of Sect.17 to obtain the following consequence of the Theorem.

Corollary: Let n := dimC V = (1/2) dimV. Then dim LinCV = 2 dimC LinCV = 2n2 and dim SymCV = dim(SkewV ∩ LinCV) = n2.

Let V, V ′ be unitary spaces. We say that a mapping U : V → V ′ is unitary if it is orthogonal and C-linear. It is easily seen that U : V → V ′ is unitary if and only if it is linear and preserves the unitary product defined by (89.14). The set of all unitary mappings from V to V ′ will be denoted by Unit(V,V ′), so that

Unit(V,V ′) := Orth(V,V ′) ∩ LinC(V,V ′). (89.26)

If U is unitary and invertible, then U−1 is again unitary (see Prop.1 of Sect.43). In this case, U is called a unitary isomorphism. We write UnitV := Unit(V,V). It is clear that UnitV is a subgroup of OrthV and hence of LisV. The group UnitV is called the unitary group of the unitary space V.

Remark: Let I be a finite index set. The linear space CI over C (see Sect.14) has the natural structure of a unitary space whose inner square is given by

λ·2 := ∑i∈I |λi|2 for all λ ∈ CI. (89.27)

Of course, the complexor is termwise multiplication by i. The unitary product is given by

〈λ |µ〉 = ∑i∈I λiµ̄i for all λ, µ ∈ CI. (89.28)
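The following Python sketch (assuming numpy; an illustration only, with I := 3]) implements (89.28) and verifies the rules (89.17)–(89.19).

    import numpy as np

    def up(lam, mu):                         # unitary product (89.28)
        return np.sum(lam * np.conj(mu))

    lam = np.random.randn(3) + 1j*np.random.randn(3)
    mu  = np.random.randn(3) + 1j*np.random.randn(3)
    zeta = 1.2 - 0.7j

    assert np.isclose(up(mu, lam), np.conj(up(lam, mu)))            # (89.17)
    assert np.isclose(up(zeta*lam, mu), zeta*up(lam, mu))           # (89.19)
    assert np.isclose(up(lam, zeta*mu), np.conj(zeta)*up(lam, mu))  # (89.19)
    assert np.isclose(up(lam, lam).imag, 0.0)   # <u|u> = u.2 is real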

Notes 89

(1) As far as I know, the treatment of complex spaces as real linear spaces with additional structure appears nowhere else in the literature. The textbooks just deal with linear spaces over C. I believe that the approach taken here provides better insight, particularly into conjugate-linear structures and mappings.

(2) The term “complexor” is introduced here for the first time.


(3) Some people use the term “anti-linear” or the term “semi-linear” instead of “conjugate-linear”.

(4) Most textbooks deal only with what we call “genuine” unitary spaces and use the term “unitary space” only in this restricted sense. The term “complex inner-product space” is also often used in this sense.

(5) What we call “unitary product” is very often called “complex inner-product” or “Hermitian”.

(6) The notations 〈u,v〉 and (u,v) are often used for the value 〈u |v〉 of the unitary product. (See also Note (3) to Sect.41.)

(7) In most of the literature on theoretical physics, the order of u,v in the notation 〈u |v〉 for the unitary product is reversed. Therefore, in textbooks on theoretical physics, (89.19) is replaced by

〈u | ζv〉 = ζ〈u |v〉 = 〈ζ̄u |v〉.

(8) Symmetric C-lineons are very often called “self-adjoint” or “Hermitian”. I do not think a special term is needed.

(9) Skew C-lineons are often called “skew-Hermitian” or “anti-Hermitian”.

(10) The notations Un or U(n) are often used for the unitary group UnitCn.

810 Complex Spectra

Let V be a complex space. As we have seen in Sect.89, V can be regarded both as a linear space over R and as a linear space over C. We modify Def.2 of Sect.82 as follows: Let L ∈ LinCV be given. For every ζ ∈ C, we write

SpsCL(ζ) := Null (L − ζ1V), (810.1)

and, if it is non-zero, call it the spectral C-space of L for ζ. Of course, if ζ is real, we have SpsCL(ζ) = SpsL(ζ).

The spectral C-spaces are C-subspaces of V. The C-spectrum of L is defined by

SpecCL := {ζ ∈ C | SpsCL(ζ) ≠ {0}}. (810.2)

It is clear that

SpecL = (SpecCL) ∩ R. (810.3)

The C-multiplicity function multCL : C → N of L is defined by

multCL(ζ) := dimC(SpsCL(ζ)) for all ζ ∈ C. (810.4)


In view of (89.4) we have

multL(σ) = 2 multCL(σ) for all σ ∈ R. (810.5)

SpecCL is the support of multCL.

The Theorem on Spectral Spaces of Sect.82 has the following analogue, which is proved in the same way.

Proposition 1: The C-spectrum of a C-lineon L on a complex space V has at most dimC V members and the family (SpsCL(ζ) | ζ ∈ SpecCL) of the spectral C-spaces of L is disjunct.

The following are analogues of Prop.5 and Prop.6 of Sect.82 and are proved in a similar manner.

Proposition 2: Let L ∈ LinCV be given and assume that the family (SpsCL(ζ) | ζ ∈ SpecCL) of spectral C-spaces is a decomposition of V. If (Eζ | ζ ∈ SpecCL) is the associated family of idempotents then

L = ∑ζ∈SpecCL ζEζ . (810.6)

Proposition 3: Let L ∈ LinCV be given. Assume that Z is a finite subset of C and that (Wζ | ζ ∈ Z) is a decomposition of V whose terms are non-zero L-invariant C-subspaces of V and satisfy

L|Wζ = ζ1Wζ for all ζ ∈ Z. (810.7)

Then Z = SpecCL and Wζ = SpsCL(ζ) for all ζ ∈ Z.
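Remark: The following Python sketch (assuming numpy; an illustration only, with the spectrum chosen pairwise distinct so that each spectral C-space is one-dimensional) builds the idempotents of Prop.2 and checks (810.6).

    import numpy as np

    n = 4
    V = np.random.randn(n, n) + 1j*np.random.randn(n, n)  # eigenvector basis
    zetas = np.array([1.0+2j, -0.5j, 3.0, 1.0-1j])
    L = V @ np.diag(zetas) @ np.linalg.inv(V)

    Vinv = np.linalg.inv(V)
    E = [np.outer(V[:, k], Vinv[k, :]) for k in range(n)]  # idempotents

    assert all(np.allclose(Ek @ Ek, Ek) for Ek in E)
    assert np.allclose(sum(E), np.eye(n))                  # sum E = 1_V
    assert np.allclose(sum(z*Ek for z, Ek in zip(zetas, E)), L)  # (810.6)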

From now on we assume that V is a genuine unitary space.

Proposition 4: The C-spectrum of a symmetric C-lineon S ∈ SymCV contains only real numbers, i.e. SpecCS = SpecS.

Proof: Let ζ ∈ SpecCS, so that Su = ζu for some u ∈ V×. Using S = S⊤, (89.20), (89.19), and (89.16)4 we find

0 = 〈Su |u〉 − 〈u |Su〉 = 〈ζu |u〉 − 〈u | ζu〉 = (ζ − ζ̄)〈u |u〉 = (ζ − ζ̄)u·2.

Since V is genuine, we must have u·2 ≠ 0 and hence ζ = ζ̄, i.e. ζ ∈ R.

Remark: A still shorter proof of Prop.4 can be obtained by applying the Spectral Theorem and Prop.2, but the proof above is more elementary in that it makes no use of the Spectral Theorem.

Let S ∈ SymCV be given. By Prop.4, the family of spectral C-spaces of S coincides with the family of spectral spaces in the sense of Def.1 of Sect.82. This family consists of C-subspaces of V and, by the Spectral Theorem, is an orthogonal decomposition of V. All the terms Eσ in the spectral resolution of S (see Cor.1 to the Spectral Theorem) are C-linear. There exists a C-basis e := (ei | i ∈ I) of V and a family λ = (λi | i ∈ I) in R such that (84.6) holds. Each σ ∈ SpecS occurs exactly multCS(σ) times in the family λ.

For C-linear normal lineons, the Structure Theorem of Sect.88 can be replaced by the following simpler version.

Spectral Theorem for Normal C-Lineons: Let V be a genuine unitary space. A C-lineon on V is normal if and only if the family of its spectral C-spaces is an orthogonal decomposition of V.

Proof: Let N ∈ LinCV be normal. Let H,G ∈ SymCV be the real and imaginary parts of N as described in the Theorem on Real and Imaginary Parts of Sect.89, so that N = H + iG. From this Theorem and Prop.1 of Sect.88 it follows that H and G commute. Let Z := SpecH = SpecCH. By Prop.2 of Sect.82, Uσ := SpsH(σ) is G-invariant for each σ ∈ Z. Hence we may consider

Dσ := G|Uσ for all σ ∈ Z.

It is clear that Dσ ∈ SymCUσ for all σ ∈ Z. Put Qσ := SpecDσ = SpecCDσ and W(σ,τ) := SpsDσ(τ) for all σ ∈ Z and all τ ∈ Qσ. It is easily seen that W(σ,τ) is N-invariant and that

N|W(σ,τ) = σ1W(σ,τ) + i τ1W(σ,τ) = (σ + i τ)1W(σ,τ)

for all σ ∈ Z and all τ ∈ Qσ. Also, the family (W(σ,τ) | σ ∈ Z, τ ∈ Qσ) is an orthogonal decomposition of V whose terms are C-subspaces of V. It follows from Prop.3 that SpecCN = {σ + i τ | σ ∈ Z, τ ∈ Qσ} and that SpsCN(ζ) = W(Re ζ, Im ζ) for all ζ ∈ SpecCN.

Assume now that L ∈ LinCV is such that (SpsCL(ζ) | ζ ∈ SpecCL) is an orthogonal decomposition of V. We can apply Prop.2 and conclude that (810.6) holds. By Prop.4 of Sect.83, each of the idempotents Eζ, ζ ∈ SpecCL, belongs to SymCV. Using this fact and (89.21) we derive from (810.6) that

L⊤ = ∑ζ∈SpecCL ζ̄Eζ

and hence, by (81.6), that

LL⊤ = ∑ζ∈SpecCL (ζζ̄)Eζ = L⊤L,

i.e. that L is normal.
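Remark: The following Python sketch (assuming numpy; an illustration, not part of the original text) builds a normal C-lineon on Cn from an orthonormal C-basis adapted to its spectral C-spaces, and checks both normality and the formula for L⊤ derived above.

    import numpy as np

    n = 4
    zetas = np.random.randn(n) + 1j*np.random.randn(n)
    Q, _ = np.linalg.qr(np.random.randn(n, n) + 1j*np.random.randn(n, n))
    N = Q @ np.diag(zetas) @ Q.conj().T        # columns of Q: orthonormal basis

    assert np.allclose(N @ N.conj().T, N.conj().T @ N)      # N is normal

    E = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(n)]  # symmetric idempotents
    assert np.allclose(sum(z*Ek for z, Ek in zip(zetas, E)), N)
    assert np.allclose(sum(np.conj(z)*Ek for z, Ek in zip(zetas, E)), N.conj().T)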


The C-spectrum of a normal C-lineon is related to its pair-spectrum (see Sect.88) by

PspecN = {(Re ζ, |Im ζ|) | ζ ∈ SpecCN}. (810.8)

A normal C-lineon is symmetric, skew, or unitary depending on whether its C-spectrum consists only of real numbers, imaginary numbers, or numbers of absolute value 1, respectively. The angle-spectrum of a unitary lineon U is related to its C-spectrum by

Aspec U = {ϕ ∈ ]0, π[ | eiϕ ∈ SpecCU or e−iϕ ∈ SpecCU}. (810.9)

The Spectral Theorem for Normal C-lineons has two corollaries that are analogues of Cors.1 and 2 to the Spectral Theorem in Sect.84. We leave the formulation of these corollaries to the reader.

811 Problems for Chapter 8

(1) Let V be a linear space, let (Ui | i ∈ I) be a finite family of subspaces of V, let Π be a partition of I (see Sect.01) and put

WJ := ∑j∈J Uj for all J ∈ Π (P8.1)

(see (07.14)). Prove that the family (Ui | i ∈ I) is disjunct [a decomposition of V] if and only if the family (WJ | J ∈ Π) is disjunct [a decomposition of V] and, for each J ∈ Π, the family (Uj | j ∈ J) is disjunct.

(2) Let (Ei | i ∈ I) be a finite family of lineons on a given linear space V.

(a) Show: If (81.5) and (81.6) hold, then (RngEi | i ∈ I) is a decomposition of V and (Ei | i ∈ I) is the family of idempotents associated with this decomposition.

(b) Prove: If (81.5) alone holds and if Ei is idempotent for every i ∈ I, then (RngEi | i ∈ I) is already a decomposition of V and hence (81.6) follows. (Hint: Apply Prop.4 of Sect.81.)

(3) Let V be a genuine inner-product space and let S and T be symmetric lineons on V.

(a) Show that ST is symmetric if and only if S and T commute.


(b) Prove: If S and T commute then

Spec (ST) ⊂ {στ | σ ∈ SpecS, τ ∈ SpecT}. (P8.2)

(4) Let V be a genuine inner-product space and let S be a symmetric lineon on V. Show that the operator norm of S, as defined by (52.19), is given by

||S|| = max{|σ| | σ ∈ SpecS}. (P8.3)
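Remark: A quick numerical check of (P8.3) (Python, assuming numpy; an illustration, not part of the original text): for a symmetric S, the operator norm, i.e. the largest singular value, equals the largest |σ| over SpecS.

    import numpy as np

    n = 5
    A = np.random.randn(n, n)
    S = (A + A.T) / 2
    op_norm = np.linalg.norm(S, 2)     # largest singular value
    assert np.isclose(op_norm, max(abs(s) for s in np.linalg.eigvalsh(S)))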

(5) Let a genuine Euclidean space E with translation space V, a point q ∈ E, and a strictly positive symmetric lineon S ∈ Pos+V be given. Then the set

S := {x ∈ E | (x − q) · S(x − q) = 1}

is called an ellipsoid centered at q, and the spectral spaces of S are called the principal directions of the ellipsoid. Put T := (√S)−1 (see Sect.85). The spectral values of T are called the semi-axes of S.

(a) Show that

S = q + T>(UsphV) (P8.4)

(see (42.9)).

(b) Show: If e is a spectral unit vector of S, then

S ∩ (q + Re) = {q − ae, q + ae}, (P8.5)

where a is a semi-axis of S.

(c) Let e be an orthonormal basis-set whose terms are spectral vectors of S (see Cor.2 to the Spectral Theorem). Let Γ be the Cartesian coordinate system with origin q which satisfies {∇c | c ∈ Γ} = e (see Sect.74). Show that

S = {x ∈ E | ∑c∈Γ c(x)2/ac2 = 1}, (P8.6)

where (ac | c ∈ Γ) is a family whose terms are the semi-axes of the ellipsoid S.


(d) Let p ∈ E and ρ ∈ P× be given and let α : E → E be an invertible flat mapping. Show that the image α>(Sphp,ρE) under α of the sphere Sphp,ρE (see (46.11)) is an ellipsoid centered at α(p) whose semi-axes form the set

{ρ√σ | σ ∈ Spec ((∇α)⊤∇α)}.

(6) Let S be a symmetric lineon on a genuine inner-product space V and let (Eσ | σ ∈ SpecS) be the spectral resolution of S.

(a) Show that, for all σ ∈ SpecS, we have

∫01 expV (ι(S − σ1V)) = Eσ + ∑τ∈SpecS\{σ} ((eτ−σ − 1)/(τ − σ)) Eτ . (P8.7)

(b) Show that, for every M ∈ LinV, every σ ∈ SpecS, and every v ∈ SpsS(σ),

((∇S expV)M)v = (eσEσ + ∑τ∈SpecS\{σ} ((eτ − eσ)/(τ − σ)) Eτ) Mv. (P8.8)

(c) Show that expV is locally invertible near S.

(d) Prove that the lineonic logarithm logV defined in Sect.85 is of class C1.

(7) Consider L ∈ Lin R3 and T ∈ Sym R3 as given by

            [ 10   0   5 ]          [ 5  0  4 ]
L := (1/5)  [  3   8   6 ] ,   T := [ 0  4  0 ] .
            [  4  −6   8 ]          [ 4  0  5 ]

(a) Determine the spectrum and the spectral idempotents of T.

(b) Determine the right and left polar decompositions of L (see Sect.86).

(c) Find the spectrum and angle-spectrum of R := or(L) ∈ Orth R3 (see Sect.88).
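Remark: A numerical companion to Problem (7) (Python, assuming numpy; an illustration, not part of the original text). One may check that the matrix T given there is exactly L⊤L, so the right polar decomposition L = RS can be computed from the spectral resolution of T.

    import numpy as np

    L = np.array([[10., 0., 5.], [3., 8., 6.], [4., -6., 8.]]) / 5.0
    T = L.T @ L
    assert np.allclose(T, np.array([[5., 0., 4.], [0., 4., 0.], [4., 0., 5.]]))

    w, U = np.linalg.eigh(T)                # SpecT and an adapted basis
    S = U @ np.diag(np.sqrt(w)) @ U.T       # S := sqrt(T), symmetric positive
    R = L @ np.linalg.inv(S)                # R := or(L) = L S^(-1)

    assert np.allclose(R.T @ R, np.eye(3))  # R is orthogonal
    print(w, np.trace(R))  # SpecT = {1, 4, 9}; angle of R from tr R = 1 + 2 cos(theta)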

(8) Let L be an invertible lineon on a genuine inner-product space V, and let (R,S) be the right polar decomposition and (R,T) the left polar decomposition of L. Show that the following are equivalent:

(i) L ∈ SkewV,

(ii) R is a perpendicular turn that commutes with S or with T,

(iii) S = T and R ∈ SkewV,

(iv) SpsS(κ) = QspsL(κ) for all κ ∈ SpecS.

(9) Let V be a genuine inner-product space, let I ∈ Sub R be a genuine interval, and let F : I → LisV be a lineonic process of class C1 with invertible values. Define L : I → LinV by

L(t) := F•(t)F(t)−1 for all t ∈ I, (P8.9)

and define D : I → SymV and W : I → SkewV by the requirements that (D(t),W(t)) be the additive decomposition of L(t) for all t ∈ I (see Sect.86). For each t ∈ I, define F(t) : I → LisV by

F(t)(s) := F(s)F(t)−1 for all s ∈ I (P8.10)

and put R(t) := or F(t), U(t) := rp F(t), V(t) := lp F(t) (see Sect.86). Prove that

R(t)•(t) = W(t),   U(t)•(t) = V(t)•(t) = D(t) for all t ∈ I. (P8.11)

(Hint: Use Prop.1 of Sect.86.)

(10) Let V be a genuine inner-product space. Prove: If N ∈ LinV is normal and satisfies N2 = −1V, then N is a perpendicular turn (see Def.1 of Sect.87 and Def.1 of Sect.88).

(11) Let S be a lineon on a genuine inner product space V. Prove that the following are equivalent:

(i) S ∈ SymV ∩ OrthV,

(ii) S ∈ SymV and SpecS ⊂ {1,−1},

(iii) S ∈ OrthV and S2 = 1V,

(iv) S = E − F for some symmetric idempotents E and F that satisfy EF = 0 and E + F = 1V,

(v) S ∈ OrthV and Aspec S = ∅.


(12) Let U be a skew lineon on a genuine inner product space. Show that U is a perpendicular turn if and only if Null U = {0} and Aspec U = {π/2}.

(13) Let A be a skew lineon on a genuine inner-product space and let R := expV(A).

(a) Show that R is orthogonal. (Hint: Use the Corollary to Prop.2 of Sect.66.)

(b) Show that the angle-spectrum of R is given, in terms of the quasi-spectrum of A, by

AspecR = {κ − π⌊κ/π⌋ | κ ∈ QspecA}, (P8.12)

where the floor ⌊a⌋ of a ∈ R is defined by ⌊a⌋ := max{n ∈ Z | n ≤ a}. (Hint: Use the Structure Theorems of Sects.87 and 88 and the results of Problem 9 of Chap.6.)

Remark: The assertion of Part (a) is valid even if the inner-product space V is not genuine.

(14) Let V be a linear space and define J ∈ LinV2 by

J(v1,v2) = (−v2,v1) for all v ∈ V2. (P8.13)

(a) Show that J2 = −1V2, and hence that V2 has the natural structure of a complex space with complexor J (see Def.1 of Sect.89). The space V2, with this complex-space structure, is called the complexification of V.

(b) Show that

(ξ + i η)v = (ξv1 − ηv2, ηv1 + ξv2) (P8.14)

for all ξ, η ∈ R and all v ∈ V2.

(c) Prove: If L,M ∈ LinV, then the cross-product L × M ∈ LinV2 (see (04.18)) is C-linear if and only if M = L and it is conjugate-linear if and only if M = −L (see Sect.89).

(d) Let L ∈ LinV and ζ ∈ C be given. Show that ζ ∈ SpecC(L × L) if and only if either Im ζ = 0 and ζ ∈ SpecL or else Im ζ ≠ 0 and (Re ζ, |Im ζ|) ∈ PspecL (see Def.2 of Sect.88). Conclude that

ζ ∈ SpecC(L × L) ⇐⇒ ζ̄ ∈ SpecC(L × L). (P8.15)

(e) Let L ∈ LinV and ζ ∈ SpecC(L × L) be given and put ξ := Re ζ, η := Im ζ, so that ζ = ξ + i η. Show that

SpsCL×L(ζ) = (SpsL(ξ))2 if η = 0 (P8.16)

while

SpsCL×L(ζ) = {(v, (1/η)(ξ1V − L)v) | v ∈ PspsL(ξ, |η|)} if η ≠ 0. (P8.17)

(15) Let V be an inner-product space and let V2 be the complexification of V as described in Problem 14. Recall that V2 carries the natural structure of an inner-product space (see Sect.44).

(a) Show that the complexor J ∈ LinV2 defined by (P8.13) is a perpendicular turn and hence endows V2 with the structure of a unitary space.

(b) Show that the unitary product of V2 is given by

〈u |v〉 = u1 · v1 + u2 · v2 + i (u2 · v1 − u1 · v2) (P8.18)

for all u,v ∈ V2.

(c) Show that H ∈ LinV2 is a symmetric C-lineon on V2 if and only if H = T × T + J (A × A) for some T ∈ SymV and some A ∈ SkewV.
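Remark: The following Python sketch (assuming numpy; an illustration, not part of the original text) realizes the complexification of Problem (14) with V := Rn and pairs stacked into R2n, and checks parts (a) and (c) of that problem numerically.

    import numpy as np

    n = 3
    Zero, I = np.zeros((n, n)), np.eye(n)
    J = np.block([[Zero, -I], [I, Zero]])      # J(v1, v2) = (-v2, v1)
    assert np.allclose(J @ J, -np.eye(2*n))    # part (a) of Problem 14

    L = np.random.randn(n, n)
    LxL = np.block([[L, Zero], [Zero, L]])     # the cross-product L x L
    assert np.allclose(LxL @ J, J @ LxL)       # L x L is C-linear

    LxmL = np.block([[L, Zero], [Zero, -L]])   # L x (-L)
    assert np.allclose(LxmL @ J, -J @ LxmL)    # L x (-L) is conjugate-linear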


Chapter 9

The Structure of General Lineons

In this chapter, it is assumed again that all linear spaces under consideration are finite-dimensional except when a statement to the contrary is made. However, they may be spaces over an arbitrary field F. If F is to be the field R of real numbers or the field C of complex numbers, it will be explicitly stated. We will make frequent use of the definitions and results of Sects.81 and 82, which remain valid for finite-dimensional spaces over F even if F is not R.

91 Elementary Decompositions

Let L be a lineon on a given linear space V.

Definition 1: We say that a subspace M of V is a minimal L-space [maximal L-space] if M is minimal [maximal] (with respect to inclusion) among all L-subspaces of V that are different from the zero space {0} [the whole space V].

If V is not itself a zero-space, then there always exist minimal and maximal L-spaces. In fact, if U is an L-space and U ≠ {0} [U ≠ V], then U includes [is included in] a minimal [maximal] L-space. The following result follows directly from Def.1.

Proposition 1: Let U be an L-subspace of V. If M is a minimal L-space, then either M ∩ U = {0} or else M ⊂ U. If W is a maximal L-space, then either W + U = V or else U ⊂ W.

The next result is an immediate consequence of Prop.2 of Sect.21 and Prop.1 of Sect.82.

Proposition 2: A subspace M of V is a minimal [maximal] L-space if and only if the annihilator M⊥ of M is a maximal [minimal] L⊤-space.

Definition 2: We say that the lineon L on V is an elementary lineon if there is exactly one minimal L-space. If L is an arbitrary lineon on V, an L-subspace U of V is called an elementary L-space if L|U is elementary. We say that a decomposition of V (see Sect.81) is an elementary L-decomposition of V if all of its terms are elementary L-spaces.

The following result is an immediate consequence of Def.2.

Proposition 3: Let L ∈ LinV be an elementary lineon and M its (only) minimal L-space. If U is a non-zero L-space, then M ⊂ U and L|U ∈ Lin U is again an elementary lineon and the (only) minimal L|U-space is again M.

We say that a lineon L ∈ LinV is simple if V ≠ {0} and if there are no L-subspaces other than {0} and V. A simple lineon is elementary; its only minimal L-space is V and its only maximal L-space is {0}. If L is elementary but not simple and if W is a maximal L-space, then W includes the only minimal L-space of V. A lineon of the form λ1V, λ ∈ F, is elementary (and then simple) if and only if dimV = 1.

The significance of Def.2 lies in the following theorem, which reduces the study of the structure of general lineons to that of elementary lineons.

Elementary Decomposition Theorem: For every lineon L there exists an elementary L-decomposition.

The proof is based on three lemmas. The proof of the first of these will be deferred to Sect.93; it is the same as Cor.2 to the Structure Theorem for Elementary Lineons.

Lemma 1: A lineon L is elementary if and only if there is exactly one maximal L-space.

Lemma 2: Let L ∈ LinV, let W be a maximal L-space and let E be an L-space that is minimal among all subspaces of V with the property W + E = V. Then E is an elementary L-space.

Proof: By Prop.1 we have E ⊄ W and hence E ∩ W ⊊ E. Now let U be an L-space that is properly included in E, i.e. an L|E-subspace of E other than E. In view of the assumed minimality of E, we have W + U ⊊ V. Using Prop.1 again, it follows that U ⊂ W and hence U ⊂ E ∩ W. We conclude that W ∩ E is the only maximal L|E-space and hence, by Lemma 1, that E is elementary.

Lemma 3: Let L ∈ LinV and let E be an elementary L-space with greatest possible dimension. Then there is an L-space that is a supplement of E in V.

Proof: Let M be the (only) minimal L-space included in E. By Prop.2, M⊥ is then a maximal L⊤-space. We choose a subspace Ê of V∗ that is minimal among all subspaces of V∗ with the property M⊥ + Ê = V∗. By Lemma 2, Ê is an elementary L⊤-space.

In view of (21.11) and (22.5), we have {0} = V∗⊥ = (M⊥ + Ê)⊥ = M⊥⊥ ∩ Ê⊥ = M ∩ Ê⊥ and hence M ⊄ Ê⊥ and also M ⊄ Ê⊥ ∩ E. Since Ê⊥ ∩ E is an L-subspace of V, it follows from Prop.1 that

Ê⊥ ∩ E = {0}. (91.1)

Using Prop.4 of Sect.17 and the Formula (21.15) for Dimension of Annihilators, we easily conclude from (91.1) that

dim Ê ≥ dim E. (91.2)

Since Ê is elementary, we can repeat the argument above with L replaced by L⊤ and E replaced by Ê. Since L⊤⊤ = L, we thus find an elementary L-subspace E ′ of V such that

dim E ′ ≥ dim Ê. (91.3)

Since E was assumed to have the greatest possible dimension, we must have dim E ′ ≤ dim E and hence, by (91.2) and (91.3), dim Ê = dim E, which gives

dimV = dim Ê⊥ + dim Ê = dim Ê⊥ + dim E.

In view of (91.1), it follows from Prop.5 of Sect.17 that Ê⊥ is a supplement of E.

Proof of the Theorem: We will show that there exists a collection of elementary L-spaces which, when interpreted as a self-indexed family, is a decomposition of V.

We proceed by induction over the dimension of DomL. If dim DomL = 0, then the empty collection is an elementary L-decomposition. Assume then, that L ∈ LinV with dimV > 0 is given and that the assertion is valid for every lineon whose domain has a dimension strictly less than dimV.

Since V is not the zero-space, there exist minimal L-spaces and hence elementary L-spaces. We choose an elementary L-space E with greatest possible dimension. By Lemma 3, we can choose an L-subspace U that is a supplement of E in V. Since dimU = dimV − dim E < dimV, we can apply the induction hypothesis to L|U and determine a collection F of elementary L|U-spaces which, self-indexed, is a decomposition of U. The elements of F are elementary L-spaces. Since E is a supplement of U = ∑(F | F ∈ F), it follows from Prop.2,(iii) of Sect.81 that F ∪ {E} is an elementary L-decomposition of V.

We say that a lineon L ∈ LinV is semi-simple if there is an L-decomposition (Ui | i ∈ I) of V such that L|Ui is simple for each i ∈ I. Of course, such a decomposition is an elementary decomposition.

If V is a genuine inner product space, then every normal lineon N (and hence every skew, symmetric, or orthogonal lineon) on V is semi-simple. Indeed, if we choose a basis e as in the Corollary to the Structure Theorem for Normal Lineons of Sect.88 and then define

Uk := Lsp{e2k−1, e2k} for all k ∈ m],
Uk := Lsp{em+k} for all k ∈ (n − m)] \ m],   (91.4)

then (Uk | k ∈ (n − m)]) is a decomposition of V such that N|Uk is simple for each k ∈ (n − m)].

Pitfall: The collection of subspaces that are the terms of an elementary L-decomposition is, in general, not uniquely determined by L. For example, if L = 1V, and if b := (bi | i ∈ I) is a basis of V, then (Lsp{bi} | i ∈ I) is an elementary 1V-decomposition of V. If dimV > 1, then the collection {Lsp{bi} | i ∈ I} depends on the choice of the basis b and is not uniquely determined by V.

Notes 91

(1) The concept of an elementary lineon and the term “elementary” in the sense of Def.2 were introduced by me in 1970 (see Part E of the Introduction). I chose this term to correspond to the commonly accepted term “elementary divisor”. In fact, the elementary divisors of a lineon can be matched, if counted with appropriate multiplicity, with the terms of any elementary decomposition of the lineon (see Sect.95).

(2) The approach to the structure of general lineons presented in this Chapter was developed by me before 1970 and is very different from any that I have seen in the literature. The conventional approaches usually arrive at an elementary decomposition only after an intermediate (often called “primary”) decomposition, which I was able to bypass. Also, the conventional treatments make heavy use of the properties of the ring of polynomials over F (unique factorization, principal ideal ring property, etc.). My approach uses only some of the most basic concepts and facts of linear algebra as presented in Chaps.1 and 2. Not even determinants are needed. The proofs are all based, in essence, on dimensional considerations.



92 Lineonic Polynomial Functions

Let F be any field. The elements of F(N), i.e. the sequences in F indexed on N and with finite support (see (07.10)), are called polynomials over F. As we noted in Sect.14, F(N) is a linear space over F. It is infinite-dimensional. We use the abbreviation

ι := δN1 , (92.1)

where δN is the standard basis of F(N) (see Sect.16). The space F(N) acquires the structure of a commutative ring (see Sect.06) if we define its multiplication (p, q) ↦ pq by

(pq)n := ∑k∈(n+1)[ pk qn−k for all n ∈ N. (92.2)

This multiplication is characterized by the condition that the n’th power ιn of ι be given by

ιn = δNn for n ∈ N. (92.3)

The unity of F(N) is ι0 = δN0. We identify F with the subring Fι0 of F(N) and hence write

ξ = ξι0 = ξδN0 for all ξ ∈ F, (92.4)

so that 1 = ι0 denotes the unity of both F and F(N). We have

p = ∑k∈N pk ιk for all p ∈ F(N). (92.5)

The degree of a non-zero polynomial p is defined by

deg p := max Supt p = max{k ∈ N | pk ≠ 0} (92.6)

(see (07.9)). If p ∈ F(N) is zero or if deg p ≤ n, then

p = ∑k∈(n+1)[ pk ιk. (92.7)

We have

deg (pq) = deg p + deg q (92.8)

for all p, q ∈ (F(N))×. We say that p ∈ (F(N))× is a monic polynomial if pdeg p = 1. If this is the case, then

p = ιdeg p + ∑k∈(deg p)[ pk ιk. (92.9)
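Remark: The following short Python sketch (an illustration only, not part of the original text; a polynomial is stored as a coefficient list p with p[k] = pk) models the ring multiplication (92.2) and checks the degree formula (92.8).

    def poly_mul(p, q):
        r = [0.0] * (len(p) + len(q) - 1)
        for i, pi in enumerate(p):
            for k, qk in enumerate(q):
                r[i + k] += pi * qk      # (pq)_n = sum over k of p_k q_(n-k)
        return r

    def deg(p):
        return max(k for k, pk in enumerate(p) if pk != 0)

    p = [1.0, 0.0, 2.0]     # 1 + 2*iota^2
    q = [3.0, 1.0]          # 3 + iota
    assert deg(poly_mul(p, q)) == deg(p) + deg(q)   # (92.8)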


The polynomial function (ξ ↦ p(ξ)) : F → F associated with a given polynomial p ∈ F(N) is defined by

p(ξ) := ∑k∈N pk ξk for all ξ ∈ F. (92.10)

We call p(ξ) the value of p at ξ. For every given ξ ∈ F, the mapping (p ↦ p(ξ)) : F(N) → F is linear and preserves products, i.e. we have

p(ξ)q(ξ) = (pq)(ξ) for all p, q ∈ F(N). (92.11)

Remark: If F := R, then there is a one-to-one correspondence between polynomials and the associated polynomial functions. In fact, using only basic facts of elementary calculus, one can show that f : R → R is a polynomial function if and only if, for some n ∈ N, f is n times differentiable and f(n) = 0 (see Problem 4 in Chap.1). If this is the case, then there is exactly one p ∈ R(N) such that f = (ξ ↦ p(ξ)) and, if f ≠ 0, then deg p < n. In the case when F := R, one often identifies the ring R(N) of polynomials over R with the ring of associated polynomial functions; then ι as defined by (92.1) becomes identified with the identity mapping of R as in the abbreviation (08.26).

If F is any infinite field, one can easily show that there is still a one-to-one correspondence between polynomials and polynomial functions. However, if F is a finite field, then every function from F to F is a polynomial function associated with infinitely many different polynomials.

Let V be a linear space over F. The lineonic polynomial function (L ↦ p(L)) : LinV → LinV associated with a given p ∈ F(N) is defined by

p(L) := ∑k∈N pk Lk for all L ∈ LinV. (92.12)

We call p(L) the value of p at L. Again, for every given L ∈ LinV, the mapping (p ↦ p(L)) : F(N) → LinV is linear and preserves products, i.e. we have

p(L)q(L) = (pq)(L) for all p, q ∈ F(N). (92.13)
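Remark: The following Python sketch (assuming numpy; an illustration, not part of the original text) evaluates (92.12) by Horner's scheme and checks the product rule (92.13) on random input.

    import numpy as np

    def poly_at_lineon(p, L):
        n = L.shape[0]
        value = np.zeros_like(L)
        for pk in reversed(p):          # Horner: ((p_m L + p_(m-1)) L + ...)
            value = value @ L + pk * np.eye(n)
        return value

    L = np.random.randn(4, 4)
    p, q = [1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]
    pq = np.polynomial.polynomial.polymul(p, q)   # coefficient convolution
    assert np.allclose(poly_at_lineon(p, L) @ poly_at_lineon(q, L),
                       poly_at_lineon(list(pq), L))   # (92.13)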

The following statements are easily seen to be valid for every lineon L ∈ LinV and every polynomial p ∈ F(N).

(I) L commutes with p(L).

(II) Every L-invariant subspace of V is also p(L)-invariant.

(III) Null p(L) and Rng p(L) are L-invariant.

(IV) If U is an L-subspace of V, then

p(L|U) = p(L)|U . (92.14)

(V) We have

p(L⊤) = (p(L))⊤. (92.15)

Pitfall: To call p(ξ), as defined by (92.10), and p(L), as defined by (92.12), the value of the polynomial p at ξ and L, respectively, is merely a figure of speech. One must often carefully distinguish between the polynomial p, the polynomial function associated with p, and the lineonic polynomial functions associated with p. For example, if p is a polynomial over C, the polynomial function associated with the termwise complex-conjugate p̄ of p is not the same as the value-wise complex-conjugate of the polynomial function associated with p. In some contexts it is useful to introduce explicit notations for the polynomial functions associated with p (see, e.g. Problem 13 in Chap.6).

Let a lineon L ∈ LinV be given. It is clear that the intersection of any collection of L-invariant subspaces of V is again L-invariant. We denote the span-mapping corresponding to the collection of all L-subspaces of V by LspL (see Sect.03). If S ∈ SubV we call LspLS the linear L-span of S; it is the smallest L-space that includes the set S. It is easily seen that

LspLS = Lsp{Lkv | v ∈ S, k ∈ N}, (92.16)

and, for each v ∈ V,

LspL{v} = {p(L)v | p ∈ F(N)}. (92.17)

Proposition 1: If p ∈ (F(N))× and v ∈ Null p(L), then

dim LspL{v} ≤ deg p. (92.18)

Proof: We put m := deg p. Since v ∈ Null p(L), we have

0 = p(L)v = pm(Lmv) + ∑k∈m[ pk(Lkv).

Since pm ≠ 0, it follows that Lmv ∈ Lsp{Lkv | k ∈ m[}. Hence, for every r ∈ N, we have

Lm+rv ∈ (Lr)>(Lsp{Lkv | k ∈ m[}) = Lsp{Lk+rv | k ∈ m[}.

Using induction over r ∈ N, we conclude that Lm+rv ∈ Lsp{Lkv | k ∈ m[} for all r ∈ N and hence, in view of (92.16), that

LspL{v} = Lsp{Lkv | k ∈ m[}.

Since the list (Lkv | k ∈ m[) has m = deg p terms, it follows from the Characterization of Dimension of Sect.17 that (92.18) holds.

Proposition 2: For every lineon L ∈ LinV, there is a unique monic polynomial p of least degree whose value at L is zero. This polynomial p is called the minimal polynomial of L.

Proof: Since LinV is finite-dimensional, the sequence (Lk | k ∈ N) must be linearly dependent. In view of (92.12) this means that there is a p ∈ (F(N))× such that p(L) = 0. It follows that there is a monic polynomial q of least degree such that q(L) = 0. If q′ were another such polynomial, then (q − q′)(L) = 0. If q ≠ q′, then deg (q − q′) < deg q = deg q′, and hence a suitable multiple of q − q′ would be a monic polynomial with a degree strictly less than deg q whose value at L is still zero.

If L ∈ LinV is given and if U is an L-subspace of V then the minimal polynomial of L|U is the monic polynomial q of least degree such that U ⊂ Null q(L). Using poetic license, we sometimes call this polynomial the minimal polynomial of U.
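Remark: The following Python sketch (assuming numpy; an illustration, with the tolerance tol a purely numerical device) locates the degree of the minimal polynomial of Prop.2 as the first m for which the powers 1V, L, ..., Lm become linearly dependent in LinV.

    import numpy as np

    def minimal_poly_degree(L, tol=1e-9):
        n = L.shape[0]
        powers, P = [np.eye(n).ravel()], np.eye(n)
        for m in range(1, n*n + 1):
            P = P @ L
            powers.append(P.ravel())
            if np.linalg.matrix_rank(np.stack(powers), tol=tol) < m + 1:
                return m    # 1, L, ..., L^m are linearly dependent
        return n*n          # unreachable for finite-dimensional V

    L = np.diag([2.0, 2.0, 5.0])   # minimal polynomial (iota-2)(iota-5)
    assert minimal_poly_degree(L) == 2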

Proposition 3: Let L ∈ LinV and v ∈ V be given, and let q be the minimal polynomial of L. If V = LspL{v}, then dimV = deg q.

Proof: Put n := dimV and consider the list (Lkv | k ∈ (n + 1)[). Since this list has n + 1 terms, it follows from the Characterization of Dimension that it must be a linearly dependent list. This means that

0 = ∑k∈(n+1)[ hk(Lkv) = h(L)v

for some non-zero polynomial h with deg h ≤ n. By (92.13), we have

h(L)(p(L)v) = (hp)(L)v = ((ph)(L))v = p(L)(h(L)v) = 0

for all p ∈ F(N). Since V = {p(L)v | p ∈ F(N)} by (92.17), it follows that h has the value zero at L. Therefore, since q is the minimal polynomial of L, we have deg q ≤ deg h ≤ n. On the other hand, application of Prop.1 to q yields n = dimV ≤ deg q.

We say that a polynomial p is prime if (i) p is monic, (ii) deg p > 0, (iii) p is not the product of two monic polynomials that are both different from p. We will see in the next section that in order to understand the structure of elementary lineons, one must know the structure of prime polynomials. Unfortunately, for certain fields F (for example for F := Q) it is very difficult to describe all possible prime polynomials over F. However, in the case when F is C or R, the following theorem gives a simple description.

Theorem on Prime Polynomials Over C and R: A polynomial p over C is prime if and only if it is of the form p = ι − ζ for some ζ ∈ C. A polynomial p over R is prime if and only if it is either of the form p = ι − λ for some λ ∈ R or else of the form p = (ι − µ)2 + κ2 for some (µ, κ) ∈ R × P×.

The proof of this theorem depends on the following Lemma.

Lemma 1: If p is a polynomial over C with deg p ≥ 1, then the equation ? z ∈ C, p(z) = 0 has at least one solution.

The assertion of this Lemma is included in Part (d) of Problem 13 in Chap.6. The proof is quite difficult, but Problem 13 in Chap.6 gives an outline from which the reader can construct a detailed proof.

Lemma 2: Let p be a monic polynomial over a field F and let ξ ∈ F be such that p(ξ) = 0. Then there is a unique monic polynomial q over F such that

p = (ι − ξ)q. (92.19)

The proof of this lemma, which is easy, is based on what is usually called “long division” of polynomials. We also leave the details to the reader.

Proof of the Theorem: Let p be a prime polynomial over C. By Lemma 1, we can find ζ ∈ C such that p(ζ) = 0. By Lemma 2, we then have p = (ι − ζ)q for some monic polynomial q and hence, since p is prime, p = ι − ζ and q = 1.

Let now p be a prime polynomial over R. If there exists a λ ∈ R such that p(λ) = 0 we have p = (ι − λ)q for some monic polynomial q and hence, since p is prime, p = (ι − λ) and q = 1. Assume, then, that p(λ) ≠ 0 for all λ ∈ R. Since R ⊂ C and hence R(N) ⊂ C(N), p is also a monic polynomial over C. Hence, by Lemma 1, we can find ζ ∈ C \ R such that p(ζ) = 0 and hence, by Lemma 2, a monic polynomial q ∈ C(N) such that p = (ι − ζ)q. Since p ∈ R(N), we have

0 = p̄(ζ̄) = p(ζ̄),

where p̄ denotes the termwise complex-conjugate of p, so that p̄ = p here. It follows that (ζ̄ − ζ)q(ζ̄) = 0. Since ζ ∈ C \ R we have ζ̄ − ζ ≠ 0 and hence q(ζ̄) = 0. Using Lemma 2 again, we can find a monic polynomial r ∈ C(N) such that q = (ι − ζ̄)r and hence

p = (ι − ζ)q = (ι − ζ)(ι − ζ̄)r.

If we put µ := Re ζ and κ := |Im ζ| then (ι − ζ)(ι − ζ̄) = (ι − µ)2 + κ2 and hence

p = ((ι − µ)2 + κ2)r. (92.20)

Now, since p = p̄, it follows from (92.20) that

0 = ((ι − µ)2 + κ2)(r − r̄).

Since r − r̄ ∈ R(N) and since the polynomial function associated with ((ι − µ)2 + κ2) has strictly positive values, the polynomial function associated with r − r̄ must be the zero function. In view of the Remark above, it follows that r = r̄ and hence r ∈ R(N). Since p is prime, it follows from (92.20) that p = (ι − µ)2 + κ2 and r = 1.

It is easy to see that polynomials of the forms described in the Theorem are in fact prime.

93 The Structure of Elementary Lineons

The following result is the basis for understanding the structure of elementary lineons.

Structure Theorem for Elementary Lineons: Let L ∈ LinV be an elementary lineon and let M be the (only) minimal L-space. Then:

(a) There is a unique monic polynomial q such that

M = Null q(L) and deg q = dimM. (93.1)

This polynomial q is prime.

(b) dimV is a multiple of dimM, i.e. there is a (unique) d ∈ N× such that

dimV = d dimM. (93.2)

(c) There are exactly d + 1 L-spaces; they are given by

Hk := Null qk(L) = Rng qd−k(L), k ∈ (d + 1)[, (93.3)

and their dimensions are

dimHk = k dimM, k ∈ (d + 1)[. (93.4)

In particular, we have

H0 = {0}, H1 = M, Hd = V. (93.5)

The polynomial q will be called the prime polynomial of L and the number d the depth of L.

Proof of Part (a): Let q be a polynomial that satisfies (93.1). We choose v ∈ M×. Since {v} ⊂ M and since LspL{v} is the smallest L-space that includes {v}, it follows that LspL{v} ⊂ M. On the other hand, since LspL{v} is a non-zero L-subspace, it follows from Prop.3 of Sect.91 that M ⊂ LspL{v}. We conclude that M = LspL{v} = LspL|M{v}. By Prop.3 of Sect.92, it follows that dimM is the degree of the minimal polynomial of L|M. Since q has the value zero at L|M by (93.1) and since deg q = dimM, it follows that q must be the minimal polynomial of L|M, and hence is uniquely determined by L.

On the other hand, if q is defined to be the minimal polynomial of L|M, it is not hard to verify, using Prop.3 of Sect.91 and Props.1 and 3 of Sect.92, that (93.1) holds.

To prove that q is prime, assume that q = q1q2, where q1 and q2 are monic polynomials. It cannot happen that both Null q1(L) = {0} and Null q2(L) = {0}, because this would imply M = Null q(L) = {0}. Assume, for example, that Null q1(L) ≠ {0} and choose v ∈ (Null q1(L))×. By Prop.3 of Sect.91 it follows that M ⊂ LspL{v} and hence, by Prop.1 of Sect.92, that deg q = dimM ≤ dim LspL{v} ≤ deg q1. In view of (92.8), this is possible only when deg q = deg q1 and deg q2 = 0 and hence q1 = q and q2 = 1.

The proof of the remaining assertions will be based on the following

Lemma: There is a d ∈ N× such that

qd(L) = 0, dimV = d dimM, (93.6)

and

dim Rng qk(L) = (d − k) dimM for all k ∈ d]. (93.7)

Proof: Let j ∈ N be given and assume that qj(L) ≠ 0, i.e. that U := Rng qj(L) is not the zero-space. By Prop.3 of Sect.91 we then have M ⊂ U. By Part (a), it follows that M = Null q(L) = Null (q(L)|U). Noting that Rng (q(L)|U) = Rng qj+1(L), we can apply the Theorem on Dimensions of Range and Nullspace of Sect.17 to q(L)|U and obtain

dim Rng qj+1(L) + dimM = dim Rng qj(L) (93.8)

for all j ∈ N such that qj(L) ≠ 0. For j = 0, (93.8) reduces to

dim Rng q(L) + dimM = dimV. (93.9)

Using induction, we conclude from (93.8) and (93.9) that

dim Rng qj+1(L) = dimV − (j + 1) dimM (93.10)

for all j ∈ N for which qj(L) ≠ 0. Since the right side of (93.10) becomes negative for large enough j, there must be a d ∈ N× such that qd(L) = 0 and 0 = dimV − d(dimM), which proves (93.6). We have qk−1(L) ≠ 0 for all k ∈ d]. Hence, if we substitute j := k − 1 and dimV = d(dimM) into (93.10), we obtain (93.7).

The part (93.6)2 of the Lemma asserts the validity of Part (b) of the Theorem.

Proof of Part (c): Let k ∈ (d + 1)[ be given and put

Hk := Null qk(L). (93.11)

We apply the Theorem on Dimensions of Range and Nullspace to qk(L) and use the Lemma to obtain

dimHk = dimV − dim Rng qk(L) = k dimM, (93.12)

which proves (93.4). On the other hand, by (93.6)1, we have

{0} = Rng qd(L) = qk(L)>(Rng qd−k(L))

and hence Rng qd−k(L) ⊂ Hk. But by (93.12) and (93.7) we have dimHk = dim Rng qd−k(L) and hence Hk = Rng qd−k(L), which proves (93.3).

Now let U be an L-subspace of V. If U = {0}, then U = H0. If U ≠ {0}, then Prop.3 of Sect.91 applies. Hence we can apply the Lemma to L|U instead of to L and conclude that there is a k ∈ N× such that dimU = k dimM and

0 = qk(L|U) = qk(L)|U , i.e. U ⊂ Null qk(L) = Hk.

Since dimU = k dimM = dimHk by (93.12), it follows that Hk = U. Since U was an arbitrary L-subspace of V, Part (c) follows.

We now list several corollaries. The first is an immediate consequence of Part (c) of the Theorem.

Corollary 1: A lineon L ∈ LinV is elementary if and only if the collection of L-spaces is totally ordered by inclusion, which means that, given any two L-spaces, one of them must be included in the other.


The next two corollaries will be proved together.

Corollary 2: A lineon L is elementary if and only if there is exactly one maximal L-space.

Corollary 3: If the lineon L is elementary, so is L⊤, and L⊤ has the same prime polynomial and the same depth as L.

Proof: If L is elementary, with prime polynomial q and depth d, then, by Part (c) of the Theorem, Hd−1 := Null qd−1(L) is the only maximal L-space.

If there is only one maximal L-space W, then W⊥ is the only minimal L⊤-space and hence L⊤ is elementary. In particular, if L is elementary, so is L⊤. Hence, if L has only one maximal L-space, we can apply this observation to L⊤ and conclude that (L⊤)⊤ = L must be elementary. Thus, Cor.2 and the first assertion of Cor.3 are proved.

If q is the prime polynomial of L, then W := Hd−1 = Rng q(L) is the only maximal L-space and hence

W⊥ = (Rng q(L))⊥ = Null (q(L))⊤ = Null q(L⊤)

(see (21.13) and (92.15)) is the only minimal L⊤-space. We have

dimW⊥ = dimV − dimW = dimV − dim Rng q(L) = dim Null q(L) = dimM = deg q

by the Formula for Dimension of Annihilators, the Theorem on Dimensions of Range and Nullspace and Part (a) of the Theorem. The uniqueness assertion of Part (a) of the Theorem, applied to L⊤ instead of to L, shows that q is also the prime polynomial of L⊤. Since dimV∗ = dimV and dimW⊥ = dimM, it follows from part (b) of the Theorem that d is also the depth of L⊤.

Corollary 4: If L ∈ LinV is elementary, if W is the (only) maximal L-space and if v ∈ V \ W, then V = LspL{v}.

Proof: We have LspL{v} ⊄ W. Since all L-spaces other than V must be included in W, it follows that LspL{v} = V.

Corollary 5: Let L ∈ LinV be an elementary lineon with prime polynomial q and depth d. Then dimV = deg qd and qd is the minimal polynomial of L.

Proof: Since V = Null qd(L) by Part (c) of the Theorem, qd has the value zero at L. By Cor.4 and Prop.3 of Sect.92, the degree of the minimal polynomial of L must be dimV. On the other hand, by (92.8), (93.1)2, and (93.2), we have deg qd = d deg q = d(dimM) = dimV. Hence qd must be the minimal polynomial of L.


Remark: Another proof of the Structure Theorem can be based on the fact, known to readers familiar with algebra, that the ring F(N) of polynomials is a principal-ideal ring. The proof goes as follows: First, one shows that if L ∈ LinV is cyclic in the sense that V = LspL{v} for some v ∈ V, then a subspace U of V is an L-space if and only if U = Rng p(L) for some monic divisor p of the minimal polynomial of L. One deduces from this that if there is only one maximal L-space, the minimal polynomial of L must be of the form qd, where q is a prime polynomial. Using an argument like the one used in the proofs of Cors.2 and 3, one proves that Cors.2 and 3 are valid. The remainder of the proof is then very easy.

As in the case when F := R (see Sect.82), the spectrum of a lineon L on a linear space V over F is defined by

SpecL := {σ ∈ F | Null (L − σ1V) ≠ {0}}. (93.13)

If L is elementary, then SpecL is non-empty only in the exceptional case described as follows.

Proposition 1: If L ∈ LinV is elementary and has a non-empty spectrum, then this spectrum is a singleton, the only minimal L-space is one-dimensional, the prime polynomial of L is ι − σ when σ :∈ SpecL, and the depth of L is n := dimV. Also, there are exactly n + 1 L-subspaces and they are given by

Hk := Null (L − σ1V)k = Rng (L − σ1V)n−k, k ∈ (n + 1)[. (93.14)

Proof: Let L ∈ LinV be given. If σ ∈ SpecL, then every one-dimensional subspace of Null (L − σ1V) is evidently a minimal L-space. Therefore, if SpecL has more than one element or if dim Null (L − σ1V) ≥ 2 for some σ ∈ SpecL, there are at least two distinct minimal L-spaces. Hence, if L is elementary, SpecL can have only one element. If σ is this element, then Null (L − σ1V) is one-dimensional and it is the only minimal L-space. The remaining statements follow immediately from the Structure Theorem for Elementary Lineons.

A lineon L is said to be nilpotent if Lm = 0 for some m ∈ N×. The least such m is then called the nilpotency of L.

Proposition 2: A lineon L ∈ LinV is both elementary and non-invertible if and only if it is nilpotent with a nilpotency equal to dimV.

Proof: By Prop.1 of Sect.18 and (93.13), L is non-invertible if and only if 0 ∈ SpecL. In view of this fact, the assertion follows immediately from Prop.1.


Notes 93

(1) The use of the terms “prime polynomial” and “depth” in the sense described in the Structure Theorem for Elementary Lineons was recently proposed by J. J. Schaffer.

(2) What we call simply the “nilpotency” of a nilpotent lineon is sometimes called the “index of nilpotence”.

94 Canonical Matrices

We assume that a linear space V over the field F is given. In this section, wesuggest methods for finding bases of V relative to which a given lineon onL has a matrix of a simple and illuminating form. Such a matrix is calleda canonical matrix. In view of the Elementary Decomposition Theoremof Sect.91, it is sufficient to consider only elementary lineons. Indeed, let(Ei | i ∈ I) be an elementary L-decomposition for a given L ∈ LinV and let,for each i ∈ I, a basis b(i) be determined such that the matrix Mi of L|Ei

relative to b(i) is canonical. Then one can “put together” (possibly withthe help of reindexing) the bases b(i), i ∈ I, to obtain a basis b of V suchthat the only non-zero terms of the matrix M of L relative to b are thosein blocks of the form Mi along the diagonal. An illustration will be given atthe end of this section.

We first deal with elementary lineons having a non-empty spectrum.

Proposition 1: If L ∈ LinV is elementary and if its spectrum is not empty, then there is a list-basis b := (bi | i ∈ n]), n := dimV, and σ ∈ F such that

Lbi = σbi if i = 1,
Lbi = σbi + bi−1 if i ∈ n] \ {1}.   (94.1)

Proof: By Prop.1 of Sect.93, there is σ ∈ F such that SpecL = {σ} and the only maximal L-space is Hn−1 = Rng (L − σ1V). We choose v ∈ V \ Hn−1 and define the list b by bi := (L − σ1V)n−iv for all i ∈ n]. We then have (L − σ1V)b1 = (L − σ1V)nv = 0 and (L − σ1V)bi = bi−1 for all i ∈ n] \ {1}, which proves (94.1).

The matrix [L]b of L relative to a basis b for which (94.1) holds is a canonical matrix. It is given by

([L]b)i,j = σ if j = i,
            1 if j = i + 1,
            0 otherwise.   (94.2)


If n is small, it can be recorded explicitly in the form

        [ σ 1         ]
        [   σ 1       ]
[L]b =  [     ·  ·    ]  , (94.3)
        [       ·  ·  ]
        [         σ 1 ]
        [           σ ]

where zeros are omitted.
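Remark: The following Python sketch (assuming numpy; an illustration, not part of the original text) carries out the basis construction of the proof of Prop.1 for a concrete L: starting from v outside the maximal L-space, the vectors bi := (L − σ1V)n−iv satisfy (94.1), so [L]b has the form (94.2).

    import numpy as np

    sigma, n = 2.0, 4
    L = sigma*np.eye(n) + np.diag(np.ones(n-1), -1)  # (L - sigma 1) nilpotent
    v = np.array([1.0, 0.0, 0.0, 0.0])   # v not in Null (L - sigma 1)^(n-1)

    D = L - sigma*np.eye(n)
    b = np.column_stack([np.linalg.matrix_power(D, n - i) @ v
                         for i in range(1, n+1)])
    Lb = np.linalg.inv(b) @ L @ b                    # the matrix [L]_b

    expected = sigma*np.eye(n) + np.diag(np.ones(n-1), 1)  # form (94.2)
    assert np.allclose(Lb, expected)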

From now on we confine ourselves to the case when F := C or F := R.

Let V be a linear space over C, let L ∈ LinV be an elementary lineon, and let q be its prime polynomial. By the Theorem on Prime Polynomials over C and R of Sect.92, q must be of the form q = ι − σ for some σ ∈ C. By (93.1) the minimal L-space is M = Null q(L) = Null (L − σ1V). Since M ≠ {0}, it follows that σ belongs to the C-spectrum of L. Hence this spectrum is not empty. Therefore, Prop.1 applies and one can find a basis b := (bi | i ∈ n]) such that the matrix [L]b is given by (94.2).

In sum, if F := C, the situation covered by Prop.1 of Sect.93 and Prop.1 above is general rather than exceptional.

We now assume that V is a linear space over R and that L ∈ LinV is an elementary lineon. Let q be its prime polynomial. By the Theorem on Prime Polynomials over C and R, we must have either q = ι − λ for some λ ∈ R or else q = (ι − µ)2 + κ2 for some (µ, κ) ∈ R × P×. In the former case, Prop.1 applies again. The latter case is covered by the following result.

Proposition 2: Let L ∈ LinV be an elementary lineon whose prime polynomial is q = (ι − µ)2 + κ2, (µ, κ) ∈ R × P×, and whose depth is d. Then there is a list-basis b := (bi | i ∈ n]), n := dimV = 2d, such that

Lb1 = µb1 + κb2,
Lb2 = −κb1 + µb2,   (94.4)

Lb2k+1 = µb2k+1 + κb2k+2 + b2k−1,
Lb2k+2 = −κb2k+1 + µb2k+2 + b2k   for all k ∈ (d − 1)]. (94.5)

For each k ∈ d], the space Lsp{b2k−1, b2k} is a supplement of the L-space Hk−1 in the L-space Hk (see (93.3)).

Proof: By the Structure Theorem for Elementary Lineons of Sect.93, the L-subspaces are given by (93.3), and Hd−1 = Null qd−1(L) is the only maximal L-space. We use the abbreviation D := L − µ1V, so that q(L) = D2 + κ21V. We choose v ∈ V \ Hd−1 and define the lists e := (ek | k ∈ d]) and f := (fk | k ∈ d]) recursively by

e1 := v,   f1 := (1/κ)De1, (94.6)

ek+1 := Dek + κfk,
fk+1 := Dfk − κek   for all k ∈ (d − 1)]. (94.7)

An easy calculation shows that

κe1 = (1/κ)q(L)e1 − Df1,
κek+1 = q(L)fk − Dfk+1 for all k ∈ (d − 1)],

and

κek+1 = q(L)(Dfk−1 − fk) + 2κ2fk for all k ∈ (d − 1)].

Since qd(L) = 0, it follows that

qd−1(L)ek = −(1/κ)D qd−1(L)fk for all k ∈ d],
qd−1(L)fk = (1/(2κ)) qd−1(L)ek+1 for all k ∈ (d − 1)].   (94.8)

Using the fact that v = e1 ∉ Hd−1 = Null qd−1(L) and hence qd−1(L)e1 ≠ 0, we conclude from (94.8), by induction, that

qd−1(L)ek ≠ 0 and qd−1(L)fk ≠ 0 for all k ∈ d]. (94.9)

We now define the list b := (bi | i ∈ n]) by

b2k−1 := qd−k(L)ek,
b2k := qd−k(L)fk   for all k ∈ d]. (94.10)

It follows easily from (94.6) and (94.7) that (94.4) and (94.5) hold and hence that Lsp{bi | i ∈ n]} is L-invariant. On the other hand, we have bn−1 = b2d−1 = ed by (94.10), and hence bn−1 ∉ Null qd−1(L) = Hd−1 by (94.9). Since Hd−1 is the only maximal L-space, it follows from Cor.4 of Sect.93 that LspL{bn−1} = V. We conclude that Lsp{bi | i ∈ n]} = V and hence, by the Theorem on Characterization of Dimension, that b is indeed a basis of V.

The fact that Lsp{b2k−1, b2k} is a supplement of Hk−1 in Hk is an immediate consequence of (94.9), (94.10), and (93.3).


The matrix [L]b of L relative to a basis b for which (94.4) and (94.5)hold is a canonical matrix. It is given by

([L]_b)_{i,j} = \begin{cases}
\mu & \text{if } j = i, \\
\kappa & \text{if } j = i+1 \text{ and } i \text{ is odd}, \\
-\kappa & \text{if } j = i-1 \text{ and } i \text{ is even}, \\
1 & \text{if } j = i+2, \\
0 & \text{otherwise}.
\end{cases}   (94.11)

If n is small, it can be recorded explicitly in the form

[L]_b = \begin{pmatrix}
\mu & \kappa & 1 & 0 & & & & \\
-\kappa & \mu & 0 & 1 & & & & \\
 & & \mu & \kappa & 1 & 0 & & \\
 & & -\kappa & \mu & 0 & 1 & & \\
 & & & & \ddots & & \ddots & \\
 & & & & & \ddots & & \ddots \\
 & & & & & & \mu & \kappa \\
 & & & & & & -\kappa & \mu
\end{pmatrix},   (94.12)

where zeros are omitted.
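As a quick sanity check (an editorial illustration in Python with sympy, not part of the text), one can verify symbolically that the matrix (94.12) with d = 2 does define an elementary lineon of depth 2: q(L) is nonzero but q(L)² = 0.

```python
import sympy as sp

mu, kappa = sp.symbols('mu kappa', real=True)
R = sp.Matrix([[mu, kappa], [-kappa, mu]])
# the canonical matrix (94.12) with d = 2, n = 4
L = sp.Matrix(sp.BlockMatrix([[R, sp.eye(2)], [sp.zeros(2, 2), R]]))

qL = (L - mu*sp.eye(4))**2 + kappa**2*sp.eye(4)   # q(L), q = (iota-mu)^2 + kappa^2
assert qL != sp.zeros(4, 4)        # q(L) != 0: the depth exceeds 1
assert qL**2 == sp.zeros(4, 4)     # q(L)^2 = 0: the depth is exactly 2
```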

We now illustrate our results by considering a linear space V over R with dim V = 4. If L is a lineon on V, we can then find a basis b := (b_i | i ∈ 4]) such that the matrix [L]_b has one of the following 9 forms (zeros are not written):

\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \lambda_3 & \\ & & & \lambda_4 \end{pmatrix}, \quad
\begin{pmatrix} \sigma & 1 & & \\ & \sigma & & \\ & & \lambda_1 & \\ & & & \lambda_2 \end{pmatrix}, \quad
\begin{pmatrix} \mu & \kappa & & \\ -\kappa & \mu & & \\ & & \lambda_1 & \\ & & & \lambda_2 \end{pmatrix},

\begin{pmatrix} \sigma & 1 & & \\ & \sigma & 1 & \\ & & \sigma & \\ & & & \lambda \end{pmatrix}, \quad
\begin{pmatrix} \sigma_1 & 1 & & \\ & \sigma_1 & & \\ & & \sigma_2 & 1 \\ & & & \sigma_2 \end{pmatrix}, \quad
\begin{pmatrix} \mu & \kappa & & \\ -\kappa & \mu & & \\ & & \sigma & 1 \\ & & & \sigma \end{pmatrix},

\begin{pmatrix} \mu_1 & \kappa_1 & & \\ -\kappa_1 & \mu_1 & & \\ & & \mu_2 & \kappa_2 \\ & & -\kappa_2 & \mu_2 \end{pmatrix}, \quad
\begin{pmatrix} \sigma & 1 & & \\ & \sigma & 1 & \\ & & \sigma & 1 \\ & & & \sigma \end{pmatrix}, \quad
\begin{pmatrix} \mu & \kappa & 1 & 0 \\ -\kappa & \mu & 0 & 1 \\ & & \mu & \kappa \\ & & -\kappa & \mu \end{pmatrix}.


The number of terms in an elementary decomposition of L is 4 in the first form, 3 in the second and third, 2 in the fourth through the seventh, and 1 in the last two. The first, third, and seventh forms apply when L is semi-simple.

Pitfall: If V is not the zero space, the basis relative to which the matrix of an elementary lineon has the form (94.2) or (94.11) is never uniquely determined by L. Indeed, the construction of these bases involved the choice of an element outside the maximal L-space, and different choices give rise to different bases.

Remark: Props.1 and 2 can be used to prove the following result: If V is a linear space over C or R, then every L ∈ Lin V has an additive decomposition (N, S), N, S ∈ Lin V, such that L = N + S, N is nilpotent, S is semi-simple, and N and S commute. Indeed, if L is elementary, one can define S to be the lineon whose matrix, relative to a basis for which (94.2) or (94.11) holds, is obtained by replacing 1 on the right side of (94.2) or (94.11) by 0. Then one can define N := L − S and prove that (N, S) is a decomposition with the desired properties. The case when L is not elementary can be reduced to the case when it is by applying the Elementary Decomposition Theorem.

Actually, the result just described remains valid for a large class of fields (often called perfect fields), and the decomposition can be proved to be unique.
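For lineons given by concrete matrices over C, the decomposition of the Remark can be computed from a Jordan matrix. The following sympy sketch (an editorial illustration with a hypothetical example matrix) keeps the diagonal of the Jordan matrix for S and assigns the rest to N.

```python
import sympy as sp

M = sp.Matrix([[3, 1, 0], [0, 3, 0], [0, 0, 1]])   # an example lineon, not semi-simple

P, J = M.jordan_form()                # M = P J P^{-1}, J a Jordan matrix
S = P * sp.diag(*[J[i, i] for i in range(J.rows)]) * P.inv()   # semi-simple part
N = M - S                                                      # nilpotent part

assert N**M.rows == sp.zeros(3, 3)                 # N is nilpotent
assert sp.simplify(N*S - S*N) == sp.zeros(3, 3)    # N and S commute
assert M == N + S
```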

Notes 94

(1) A matrix of the type described by (94.2) or (94.3) is often called an "elementary Jordan matrix". A matrix whose only non-zero terms are in blocks of the form described by (94.2) or (94.3) along the diagonal is often called a "Jordan matrix", "Jordan form", or "Jordan canonical form". Using this terminology, we may say, in consequence of the results of Sects.91-93 and of Prop.1, that for every lineon on a linear space over C, we can find a basis such that the matrix of the lineon relative to that basis is a Jordan matrix.

95 Similarity, Elementary Divisors

Definitions: We say that the lineon L ∈ Lin V is similar to the lineon L′ ∈ Lin V′ if there is a linear isomorphism A : V → V′ such that

L′ = A L A^{−1},   i.e.   L′A = AL.   (95.1)

Assume that L′ ∈ Lin V′ is similar to L ∈ Lin V. By Cor.2 to the Characterization of Dimension (Sect.17) we then must have dim V = dim V′. Also, the following results, all easily proved, are valid.

Page 326: Finite-Dimensional Spaces: Algebra, Geometry, and Analysis

368 CHAPTER 9. THE STRUCTURE OF GENERAL LINEONS

(I) A subspace U of V is L-invariant if and only if A>(U) is L′-invariant. (See Prop.3 of Sect.82.)

(II) L is elementary if and only if L′ is elementary. If this is the case, then L and L′ have the same prime polynomial and the same depth.

(III) A decomposition (E_i | i ∈ I) of V is an elementary L-decomposition if and only if (A>(E_i) | i ∈ I) is an elementary L′-decomposition of V′.
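These invariance statements are easy to observe computationally. The following sympy sketch (an editorial illustration with an arbitrarily chosen isomorphism A) conjugates a lineon as in (95.1) and confirms that the canonical (Jordan) matrix, and hence the prime polynomials and depths, are unchanged.

```python
import sympy as sp

L = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 5]])
A = sp.Matrix([[1, 2, 0], [0, 1, 3], [1, 0, 1]])   # invertible: det A = 7
Lp = A * L * A.inv()                               # L' := A L A^{-1}, as in (95.1)

assert L.charpoly() == Lp.charpoly()
assert L.jordan_form()[1] == Lp.jordan_form()[1]   # same canonical matrix
```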

Roughly, similar lineons have the same intrinsic structure.

The following result shows, roughly, that elementary decompositions of similar lineons can be made to correspond.

Similarity Theorem for Lineons: Let (E_i | i ∈ I) and (E′_i | i ∈ I′) be elementary decompositions of the given lineons L ∈ Lin V and L′ ∈ Lin V′, respectively. If L′ is similar to L, then there is an invertible mapping ϕ : I → I′ such that L|E_i is similar to L′|E′_{ϕ(i)} for all i ∈ I.

The proof will be based on the following:

Lemma: Assume that (95.1) holds for given L ∈ Lin V, L′ ∈ Lin V′, and A ∈ Lis(V, V′). Let E, U be supplementary L-subspaces of V and let E′, U′ be supplementary L′-subspaces of V′. Let E ∈ Lin V be the idempotent for which Null E = E and Rng E = U, and let F ∈ Lin V′ be the idempotent for which Null F = U′ and Rng F = E′ (see Prop.4 of Sect.19). Then:

(a) Null (FA) is an L-subspace of V and Null (EA^{−1}) is an L′-subspace of V′.

(b) A>(Null (FA|E)) = Null (EA^{−1}|U′).

(c) If dim E ≥ dim E′ and Null (FA|E) = {0}, then L|E is similar to L′|E′, and L|U is similar to L′|U′.

Proof: It follows immediately from the definitions of E and F that E commutes with L and that F commutes with L′.

Let v ∈ Null (FA) be given. Using (95.1) we obtain 0 = L′(FA)v = F(L′A)v = F(AL)v = FA(Lv) and hence Lv ∈ Null (FA). Since v ∈ Null (FA) was arbitrary, it follows that Null (FA) is L-invariant. Interchanging the roles of L and L′ we find that Null (EA^{−1}) is L′-invariant. Hence Part (a) is proved.

Let v ∈ V be given. Then

v ∈ Null (FA|E) = Null (FA) ∩ E
⇐⇒ v ∈ E = Null E and F(Av) = 0
⇐⇒ 0 = Ev = (EA^{−1})(Av) and Av ∈ Null F = U′
⇐⇒ Av ∈ Null (EA^{−1}) ∩ U′ = Null (EA^{−1}|U′).


Since v ∈ V was arbitrary, Part (b) follows.

If dim E ≥ dim E′ and Null (FA|E) = {0}, we can apply the Pigeonhole Principle for Linear Mappings of Sect.17 to (FA)|^{E′}_{E} and conclude that dim E = dim E′ and that (FA)|^{E′}_{E} is invertible. Using (95.1) we obtain

(FA)|^{E′}_{E} L|E = (FAL)|^{E′}_{E} = (FL′A)|^{E′}_{E} = (L′FA)|^{E′}_{E} = L′|E′ (FA)|^{E′}_{E}

and hence that L′|E′ is similar to L|E. By Part (b) we also have Null (EA^{−1}|U′) = {0}. Since

dim U′ = dim V′ − dim E′ = dim V − dim E = dim U,

we can apply the same argument as just given to the case when L and L′ are interchanged and conclude that L|U is similar to L′|U′.

Proof of the Theorem: We choose a space of greatest dimension from the collection {E_i | i ∈ I} ∪ {E′_i | i ∈ I′}. Without loss of generality we may assume that we have chosen E_j, j ∈ I. Let M be the (only) minimal L-space included in E_j. Let (E′_i | i ∈ I′) be the family of idempotents associated with the decomposition (E′_i | i ∈ I′) (see Prop.5 of Sect.81). By (81.5), we have

∑_{i∈I′} (E′_i A)|M = (∑_{i∈I′} E′_i) A|M = A|M.

Since A is invertible and M ≠ {0}, we may choose j′ ∈ I′ such that (E′_{j′} A)|M ≠ 0. We now abbreviate E := E_j, E′ := E′_{j′}, U := ∑(E_i | i ∈ I \ {j}), U′ := ∑(E′_i | i ∈ I′ \ {j′}), and F := E′_{j′}. The Lemma then applies. By Part (a), Null (FA) is an L-subspace of V and so is E ∩ Null (FA) = Null (FA|E). Since FA|M ≠ 0 and M ⊂ E, we cannot have M ⊂ Null (FA|E). Hence, since E is elementary, it follows from Prop.3 of Sect.91 that Null (FA|E) = {0}. In view of the maximality assumption on the dimension of E := E_j, we have dim E ≥ dim E′. Hence we can apply Part (c) of the Lemma to conclude that dim E_j = dim E′_{j′}, that L|E_j is similar to L′|E′_{j′}, and that L|U is similar to L′|U′. The desired result now follows immediately by induction, using Prop.6 of Sect.81.

If we apply the Similarity Theorem to the case when L′ = L we obtain

Corollary 1: All elementary L-decompositions for a given lineon L have the same number of terms. Moreover, if two such decompositions are given, then the terms of one decomposition can be matched with the terms of the other such that the adjustments of L to the matched L-spaces are similar.


In view of the statement (II) above, this Corollary implies that the prime polynomials and the depths of the adjustments of a lineon L to the terms in an elementary L-decomposition, if each is counted with an appropriate "multiplicity", do not depend on the decomposition but only on L. More precisely, denoting the set of all powers of prime polynomials over F by P, one can associate with each lineon L a unique elementary multiplicity function

emult_L : P → N

with the following property: In every elementary decomposition of L, there are exactly emult_L(q^d) terms whose minimal polynomial is q^d. The support of emult_L is finite and the members of this support are called the elementary divisors of L. They are all divisors of the characteristic polynomial chp_L of L, which is defined by

chp_L := ∏_{q^d ∈ P} (q^d)^{emult_L(q^d)}.   (95.2)

It follows from Cor.5 of Sect.93 and Prop.4 of Sect.81 that the degree of the characteristic polynomial (95.2) is dim V. It is evident that chp_L(L) = 0. Therefore the degree of the minimal polynomial of L cannot exceed the dimension of the domain V of L.
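The identity chp_L(L) = 0 can be checked directly on a concrete matrix. Here is a short sympy sketch (editorial, reusing the matrix of Problem (1) of Sect.96) that evaluates the characteristic polynomial at L by Horner's scheme.

```python
import sympy as sp

t = sp.Symbol('t')
L = sp.Matrix([[3, 0, 1], [1, 2, 1], [-1, 0, 1]])

p = L.charpoly(t)                    # characteristic polynomial of L
acc = sp.zeros(3, 3)
for c in p.all_coeffs():             # coefficients, highest degree first
    acc = acc * L + c * sp.eye(3)    # Horner step: acc -> acc L + c 1_V
assert acc == sp.zeros(3, 3)         # chp_L(L) = 0
```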

Remark 1: In Vol.II we will give another definition of the characteristic polynomial, a definition in terms of determinants. Then (95.2) and chp_L(L) = 0 will become theorems. The determinant of L will turn out to be given by det L = (−1)^{dim V} (chp_L)_0.

The following consequence of the Similarity Theorem is now evident.

Corollary 2: If the lineon L′ is similar to the lineon L, then L′ and L have the same elementary multiplicity functions, the same elementary divisors, and the same characteristic polynomial, i.e. we have emult_L = emult_{L′} and chp_L = chp_{L′}.

Remark 2: The converse of Cor.2 is also true: if emult_L = emult_{L′}, then L′ is similar to L (see Problem 2). However, if L and L′ merely have the same set of elementary divisors (without counting multiplicity), or if L and L′ merely have the same characteristic polynomial, then they need not be similar.
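The standard two-dimensional counterexample behind the last sentence of Remark 2 can be stated in two lines (an editorial sympy sketch): 2·1_V and the elementary lineon with minimal polynomial (ι − 2)² share the characteristic polynomial (ι − 2)², yet they are not similar.

```python
import sympy as sp

L1 = sp.Matrix([[2, 0], [0, 2]])   # elementary divisor iota - 2, multiplicity 2
L2 = sp.Matrix([[2, 1], [0, 2]])   # elementary divisor (iota - 2)^2, multiplicity 1

assert L1.charpoly() == L2.charpoly()              # same characteristic polynomial
assert L1.jordan_form()[1] != L2.jordan_form()[1]  # different canonical matrices,
                                                   # so L1 and L2 are not similar
```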


96 Problems for Chapter 9

(1) Consider

L := \begin{pmatrix} 3 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{pmatrix} ∈ Lin R³.

(a) Find an elementary L-decomposition of R³.

(b) Determine the elementary divisors, the characteristic polynomial, and the minimal polynomial of L.

(c) Find a basis b := (b_1, b_2, b_3) of R³ such that the matrix [L]_b is canonical in the sense described in Sect.94.
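Hand computations for this problem can be checked with sympy. The sketch below (editorial, not a solution) computes the characteristic polynomial and a canonical matrix; note that sympy's block ordering may differ from the convention of Sect.94.

```python
import sympy as sp

t = sp.Symbol('t')
L = sp.Matrix([[3, 0, 1], [1, 2, 1], [-1, 0, 1]])

print(sp.factor(L.charpoly(t).as_expr()))   # factored characteristic polynomial
P, J = L.jordan_form()                      # J: a canonical matrix; P: a basis for (c)
print(J)                                    # block sizes reveal the elementary divisors
```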

(2) Let V and V′ be linear spaces (over any field F). Let L ∈ Lin V, L′ ∈ Lin V′ be given and let q and q′ be their respective minimal polynomials.

(a) Assume that q = q′ and deg q = dim V = dim V′. Show that L′ is then similar to L.

(b) Assume that L and L′ are both elementary and have the same prime polynomial and depth. Show that L′ is then similar to L. (Hint: Use Cor.5 to the Structure Theorem for Elementary Lineons.)

(c) Assume that L and L′ have the same elementary multiplicity function, i.e. that emult_L = emult_{L′} (see Sect.95). Show that L′ is then similar to L. (Hint: Choose elementary decompositions for L and L′, use Part (b) above, and then Prop.6 of Sect.81.)

(3) Let (E_i | i ∈ I) be a decomposition of a given linear space V and let L ∈ Lin V.

(a) Put

E′_j := ⋂_{i ∈ I\{j}} E_i^⊥ for all j ∈ I.   (P9.1)

Show that (E′_j | j ∈ I) is a decomposition of V*.

(b) Prove: If (E_i | i ∈ I) is an elementary L-decomposition, then (E′_j | j ∈ I), as defined by (P9.1), is an elementary L⊤-decomposition and (L|E_i)⊤ is similar to (L⊤)|E′_i for each i ∈ I.


(c) Prove that L⊤ is similar to L. (Hint: Use Cor.3 to the Structure Theorem for Elementary Lineons and the results of Problem 2.)
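Part (c) can be sanity-checked on examples (an editorial sympy sketch, not a proof): a matrix and its transpose produce the same Jordan matrix, hence are similar.

```python
import sympy as sp

L = sp.Matrix([[3, 0, 1], [1, 2, 1], [-1, 0, 1]])
# L and its transpose share the canonical matrix, so L^T is similar to L
assert L.jordan_form()[1] == L.T.jordan_form()[1]
```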

(4) Let V be a linear space over R and let L be a lineon on V.

(a) Show: If L is elementary, then either Spec L is a singleton, in which case Pspec L (see Def.2 of Sect.88) is empty, or else Spec L is empty, in which case Pspec L is a singleton.

(b) Assume that L is elementary and that Spec L is a singleton. Put σ :∈ Spec L. Show that

L = σ1_V + N   (P9.2)

for some nilpotent lineon N with nilpotency n := dim V (see Sect.93).

(c) Assume that L is elementary and that Spec L is empty. Put (µ, κ) :∈ Pspec L. Show that dim V must be even and that

L = µ1_V + κJ + N   (P9.3)

for some J ∈ Lin V satisfying J² = −1_V and some nilpotent lineon N that commutes with L and has nilpotency d := (1/2) dim V.

(d) Prove: Spec L cannot be empty when dim V is odd and Pspec L cannot be empty when Spec L is empty.

(5) Let V be a linear space over R and let L be a lineon on V.

(a) Let (E_i | i ∈ I) be a decomposition of V all of whose terms are L-spaces. Prove that the terms are then also (exp_V(L))-spaces, that

(exp_V(L))|E_i = exp_{E_i}(L|E_i) for all i ∈ I,   (P9.4)

and that

exp_V(L) = ∑_{i ∈ I} (exp_{E_i}(L|E_i))|^V P_i,   (P9.5)

where (P_i | i ∈ I) is the family of projections associated with the decomposition (see Prop.5 of Sect.81). (Hint: Use a proof similar to the one of Prop.3 of Sect.85.)

Remark: Since this result applies, in particular, to elementary decompositions, we see that the problem of evaluating the exponential of an arbitrary lineon is reduced to the problem of evaluating the exponential of elementary lineons.

(b) Assume that L is elementary. Show that

exp_V(L) = e^σ ∑_{k ∈ n[} (1/k!) N^k   (P9.6)

if L is of the form (P9.2) of Problem 4 and that

exp_V(L) = e^µ (cos κ 1_V + sin κ J) ∑_{k ∈ d[} (1/k!) N^k   (P9.7)

if L is of the form (P9.3). (Hint: Use the results of Problem 9 in Chap.6.)
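Formula (P9.7) is easy to verify numerically for a concrete elementary lineon. In the following sketch (editorial, using numpy and scipy), L is the canonical matrix (94.12) with d = 2, so that J and N can be read off directly.

```python
import numpy as np
from scipy.linalg import expm

mu, kappa = 0.5, 1.2
R = np.array([[0.0, 1.0], [-1.0, 0.0]])
J = np.block([[R, np.zeros((2, 2))], [np.zeros((2, 2)), R]])    # J J = -1_V
N = np.block([[np.zeros((2, 2)), np.eye(2)],
              [np.zeros((2, 2)), np.zeros((2, 2))]])            # N N = 0, so d = 2
L = mu*np.eye(4) + kappa*J + N                                  # the form (P9.3)

# (P9.7) with d = 2: exp(L) = e^mu (cos(kappa) 1_V + sin(kappa) J)(1_V + N)
rhs = np.exp(mu) * (np.cos(kappa)*np.eye(4) + np.sin(kappa)*J) @ (np.eye(4) + N)
assert np.allclose(expm(L), rhs)
```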

Index of Theorem Titles

Section; Page

Additive Decomposition Theorem  86; 321
Annihilators and Transposes, Th. on  21; 73
Attainment of Extrema, Th. on  08, 58; 35, 203
Basic Convergence Criterion  55; 188
Cell-Inclusion Theorem  52; 173
Chain Rule for Flat Mappings  33; 111
Characterization of Bases  15; 54
Characterization of Dimension  17; 58
Characterization of Gradients  63; 219
Characterization of Regular Subspaces  41; 137
Characterization of the Trace  26; 89
Cluster Point Theorem  08, 55; 33, 187
Compact Image Theorem  58; 200
Compactness Theorem  58; 201
Composition Theorem for Continuity  56; 194
Composition Theorem for Uniform Continuity  56; 195
Connection Components, Th. on  73; 286
Congruence Theorem  46; 153
Constrained-Extremum Theorem  69; 252
Continuity of Uniform Limits, Th. on  56; 196
Contraction Fixed Point Theorem  64; 227
Convex Hull Theorem  37; 124
Curl-Gradient Theorem  611; 261
Deviation Components, Th. on  73; 287
Difference-Quotient Theorem  08, 61; 35, 211
Differentiation Theorem for Lineonic Exponentials  612; 268
Differentiation Theorem for Integral Representations  610; 258
Differentiation Theorem for Inversion Mappings  68; 246
Dimension of Annihilators, Formula for  21; 73

Dimensions of Range and Nullspace, Th. on  17; 60
Elementary Decomposition Theorem  91; 350
Elimination of Unknowns, Th. on  16; 57
Extreme Spectral Values of a Symmetric Lineon, Th. on  84; 312
Extremum Theorem  08, 69; 35, 251
Flat Span Theorem  35; 117
Fundamental Theorem of Calculus  08, 610; 36, 255
General Chain Rule  63; 219
General Product Rule  66; 235
Half-Space Inclusion Theorem  38; 126
Half-Space Intersection Theorem  54; 183
Induced Inner-Product, Th. on the  44; 144
Implicit Mapping Theorem  68; 245
Inner-Product Index Theorem  47; 157
Inner-Product Inequality  42; 139
Inner-Product Signature Theorem  47; 155
Interchange of Partial Gradients, Th. on  611; 264
Inversion Rule for Flat Mappings  33; 111
Lineonic Logarithm Theorem  85; 320
Lineonic Square-Root Theorem  85; 318
Local Inversion Theorem  68; 245
Norm-Duality Theorem  52; 175
Norm-Equivalence Theorem  51; 170
Partial Gradient Theorem  65; 232
Pigeonhole Principle  05; 21
Pigeonhole Principle for Linear Mappings  17; 60
Polar Decomposition Theorem  86; 322
Prime Polynomials over C and R, Th. on  92; 357
Real and Imaginary Parts, Th. on  89; 338

Representation Theorem for Linear Forms on a Space of Linear Mappings  26; 89
Similarity Theorem for Lineons  95; 368
Smoothness of the Strict Lineonic Square Root, Th. on  85; 318
Specification of Flat Mappings, Th. on  33; 109
Spectral Spaces, Th. on  82; 308
Spectral Theorem  84; 313
Spectral Theorem for Normal C-Lineons  810; 342
Striction Estimate for Differentiable Mappings  64; 224
Strong Convex Hull Theorem  37; 125
Structure Theorem for Elementary Lineons  93; 358
Structure Theorem for Normal Lineons  88; 330
Structure Theorem for Orthogonal Lineons  88; 333
Structure Theorem for Perpendicular Turns  87; 326
Structure Theorem for Skew Lineons  87; 328
Subadditivity of Magnitude  42; 139
Symmetry of Second Gradients, Th. on  611; 263
Uniform Continuity Theorem  58; 200
Unique Existence of Barycenters, Th. on  34; 112


Index of Special Notations

Section; Page

S ⊂ T (S is a subset of T)  01; 3
S ⊊ T (S is a proper subset of T)  01; 3
S\T (set-difference of S and T)  01; 4
(a_i | i ∈ I) (family with index set I)  02; 7
S^I (set of all families in S with index set I)  02; 7
M^(I) (set of all families in M with index set I and finite support)  07; 28
(R^(I))_ν (set of all families in R with index set I, finite support, and sum ν)  35, 37; 116, 124
×(A_i | i ∈ I) (set-product of the family (A_i | i ∈ I) of sets)  04; 15
f × g (cross-product of the mappings f and g)  04; 17
×(f_i | i ∈ I) (cross-product of the family (f_i | i ∈ I) of mappings)  04; 17
g^{×I} (I-cross-power of the mapping g)  04; 18
f> (image mapping of f)  03; 12
f< (pre-image mapping of f)  03; 12
f← (inverse of the mapping f)  03; 11
f^n (nth iterate of the mapping f)  03; 11
f|A (restriction of f to A)  03; 12
f|^B_A (adjustment of f)  03; 12
f|^Rng (adjustment of f to range)  03; 13
f_{|A} (A-adjustment of f when A is f-invariant)  03; 13
f(V) (lineonic extension of f)  85; 316
c_{D→C} (constant with domain D, codomain C, and range c)  03; 11
1_S (identity mapping of S)  03; 11
1_{U⊂S} (inclusion mapping of U into S)  03; 11
(a, ·), (·, b) ("insertion" into a product of two sets)  04; 18
(c.j) ("insertion" into a product of a family of sets)  04; 19
♯S (cardinal of S)  05; 20
[x, y] (closed interval; segment joining the points x and y)  08, 37; 32, 123
]x, y[ (open interval; open segment joining the points x and y)  08, 51; 32, 163

[a, b[, ]a, b] (half-open intervals)  08; 31
R̄ (extended-real-number set)  08; 32
P̄ (extended-positive-number set)  08; 32
ι (identity-mapping of R; "indeterminate")  08, 92; 34, 353
∂_t f (derivative of f at t)  08, 61; 34, 209
∂f (derivative-function of f)  08, 61; 34, 209
f• (derivative-function of f)  08, 61; 35, 210
∂^n f, f^(n) (derivative of order n)  08, 61; 35, 209
∇ (gradient)  33, 63; 108, 218
∇_(1), ∇_(2) (partial gradients)  65; 228, 229
ϕ,1, ϕ,2 (partial derivatives)  65; 228, 229
∇_(j), ϕ,j (partial gradients and derivatives)  65; 231
∆ (Laplacian)  67; 241
L_1 ⊕ L_2 ("evaluation-sum" of L_1 and L_2)  14; 49
⊕(L_i | i ∈ I) ("evaluation-sum" of a family of linear mappings)  14; 50
δ^I (standard basis of F^(I))  16; 55
V* (dual of the linear space V)  21; 71
b* (dual of the basis b)  23; 78
L⊤ (transpose of the linear mapping L)  21; 71
S⊥ (annihilator of the set S; orthogonal supplement)  21, 41; 72, 137
B∼ (switch of the bilinear mapping B)  24; 83
w ⊗ λ (tensor product of w and λ)  25; 86
S (quadratic form corresponding to the bilinear form S)  27; 94
Q (bilinear form corresponding to the quadratic form Q)  28; 94
←→xy (line passing through the points x and y)  32; 107
v^{·2} (inner square of v)  41; 133
u · v (inner product of u and v)  41; 133
〈u|v〉 (unitary product of u and v)  89; 337
|v| (magnitude of v)  42; 139
||L||_{ν,ν′} (operator norm of L relative to ν, ν′)  52; 174
||L||_ν (operator norm of the lineon L relative to ν)  52; 174
||L|| (operator norm of L relative to magnitude)  52; 176
[L]_b (matrix of the lineon L relative to the basis b)  18; 63
[h]_c, [h]^c, [T]^c_d (components relative to a coordinate system)  71, 73; 279, 289

Index of Multiple-Letter Symbols

Section; Page

Acc (set of accumulation points, of a set)  57; 197
add (addition mapping)  11; 39
Aspec (angle-spectrum, of a lineon)  88; 333
Asps (angle-spectral space, of a lineon)  88; 333
Ball (ball, in a genuine Euclidean space)  46; 153
Bdy (boundary, of a set)  53; 179
Box (norming box, determined by a basis)  51; 168
Ce (norming cell, of a norm)  51; 164
ch (characteristic family or function, of a set)  02, 03; 8, 10
chp (characteristic polynomial, of a lineon)  95; 370
Clo (closure, of a set)  53; 178
Cod (codomain, of a mapping)  03; 9
Comm (commutant algebra, of a lineon)  18; 62
Conf (set of confined mappings)  62; 213, 216
Curl (curl, of a mapping)  611; 261
cxc (convex-combination mapping, of a family in a flat space)  37; 124
Cxh (convex hull, of a subset of a flat space)  37; 123
dd (directional derivative)  65; 233
deg (degree, of a polynomial)  92; 353
det (determinant)  73; 287
diam (diameter)  52; 173
diff (point-difference mapping)  32; 103
dim (dimension, of a linear space or a flat space)  17, 32; 58, 107
div (divergence)  67; 239
Dmd (norming diamond, determined by a basis)  51; 168
Dom (domain, of a mapping)  03; 9
dst (distance function, of a genuine Euclidean space)  46; 152
Eis (group of Euclidean automorphisms)  45; 149
emult (elementary multiplicity function, of a lineon)  95; 370
ev (evaluation, on a set-product or a set of mappings)  04, 22; 16, 17, 74
exp (exponential, lineonic exponential)  08, 612; 34, 266
Fin (set of all finite subsets, of a set)  05; 21
Fis (group of flat automorphisms)  33; 111
flc (flat combination mapping, of a family in a flat space)  35; 116
Flf (space of flat functions)  36; 120
Fsp (flat span, of a subset of a flat space)  32; 107
Gr (graph, of a mapping)  03; 10
ind (index, of an inner-product space)  47; 157
Inj (set of all injective mappings from a given set to another)  04; 16
inf (infimum, of a set)  08; 32
ins (insertion mapping)  14, 15; 48, 52
Int (interior, of a set)  53; 177
inv (inversion mapping)  68; 246
ip (inner-product)  41; 133
Ker (kernel, of a homomorphism)  06; 24
lim (limit)  08, 55, 57; 34, 186, 198
Lin (space of linear mappings, from a given linear space to another; algebra of lineons)  14, 18; 47, 61
Lin2 (space of bilinear mappings)  24; 81
Lis (set of linear isomorphisms, from a given linear space to another; linear group)  14, 18; 48, 62
lnc (linear combination mapping, of a family in a linear space)  15; 51
log (lineonic logarithm)  85; 320
lp (polar decomposition, left positive part)  86; 324
Lsp (linear span, of a subset of a linear space)  12, 92; 42, 355
Map (set of all mappings, from a given set to another)  04; 16
max (maximum, of a set)  08; 32
min (minimum, of a set)  08; 32
mult (multiplicity function, of a lineon)  82, 810; 307, 340
Nhd (collection of neighborhoods, of a point)  53; 177
no (norm, of a norming cell)  51; 165
Null (nullspace, of a linear mapping)  13; 46
opp (opposition mapping)  11; 39
or (polar decomposition, orthogonal part)  86; 324
Orth (set of orthogonal mappings, from a given inner-product space to another; orthogonal group)  43; 141, 142
Perm (set of all permutations, of a given set)  04; 16
Pos (set of positive symmetric lineons)  85; 316
Pos+ (set of strictly positive symmetric lineons)  85; 316
pow (lineonic power)  66; 237
Pspec (pair-spectrum, of a lineon)  88; 330
Psps (pair-spectral space, of a lineon)  88; 330
Qu (space of quadratic forms)  27; 94
Qspec (quasi-spectrum, of a lineon)  87; 327
Qsps (quasi-spectral space, of a lineon)  87; 327
Rng (range, of a family or a mapping)  02, 03; 7, 10
rp (polar decomposition, right positive part)  86; 324
sep (separation function, of a Euclidean space)  45; 148
sgn (sign-function)  08; 32
sig (signature, of an inner-product space)  47; 155
Skew (space of skew linear mappings or lineons)  27, 41; 92, 135
Skew2 (space of skew bilinear mappings)  24; 83
sm (scalar-multiple mapping)  11, 89; 39, 335
Small (set of small mappings)  62; 212, 216
Spec (spectrum, of a lineon)  82, 810; 307, 340
Sph (sphere, in a genuine Euclidean space)  46; 153
Sps (spectral space, of a lineon)  82, 810; 307, 340
sq (inner square)  41; 133
sqrt (lineonic square root)  85; 317
sqrt+ (strict lineonic square root)  85; 318
ssq (sum-sequence, of a sequence)  08, 55; 33, 191
str (striction, of a mapping relative to given norms; absolute striction)  64; 223, 227
Sub (set of subsets, of a set)  01; 3
sum (summation mapping)  15; 51
sup (supremum, of a set)  08; 32
Supt (support, of a family)  07; 28
Sym (space of symmetric linear mappings or lineons)  27, 41; 92, 135
Sym2 (space of symmetric bilinear mappings)  24; 83
tr (trace, of a lineon)  26; 89
Ubl (unit ball, in a genuine inner-product space)  42; 140
Unit (set of unitary mappings, from a given unitary space to another; unitary group)  89; 339
Usph (unit sphere, in a genuine inner-product space)  42; 140


Index of Terminology

l1-norm 172
l∞-norm 172
abelian 26
absolute striction 227
absolute value 31
accumulation point 34, 197
action, of a group 101
addition 25, 39
additive decomposition 322
adjoint 74
adjustment, of a mapping 13
affine function 122
affine hull 108
affine mapping 112
affine space 108
affine subset 108
affine transformation 111
algebra of lineons 62
alternating 85
angle-spectrum 333
angle-spectrum space 333
annihilator 72
anti-Hermitian 340
anti-linear 340
anticommute 70
antiderivative 36
antisymmetric 85
antitone 32
average 120
ball 152, 169, 172
barycenter 112, 116
barycentric coordinates 119
basis 52
basis field 278
big oh 217
bijection 11
bilinear form 92
bilinear mapping 80
Borel-Lebesgue Theorem 204
boundary 126, 179
bounded 32
bounded above [bounded below] 32
bounded sequence 187
bounded set 172
Bourbaki 306
box 168
Bunyakovsky's inequality 141
cancellation law 24
cancellative 24
canonical matrix 363
Caratheodory's Theorem 126
cardinal 20
Cartesian Coordinates 290
Cartesian decomposition 85
Cartesian product 9
Cauchy Convergence Criterion 192
Cauchy sequence 192
Cauchy's inequality 141
cell 163
cell modelled on a norming cell 167
center of mass 116
centralizer 64
centroid 113, 116
characteristic family 8
characteristic function 10
characteristic polynomial 370
characteristic value 310
charge distribution 111
Christoffel symbols 288
class C^n 35, 209
class C^∞ 35, 209
class C^1 218
class C^2 218
closed ball 152
closed interval 32
closed set 179
closure 177
closure, of a cell 167
closure, of a norming cell 164
cluster point 33, 186
codomain, of a mapping 9
collection 3
column, of a matrix 8
combination, in a pre-monoid 22
commutant-algebra 62
commutative 25
commutative ring 26
commute 11
compact 199
complement 5
complementary 44
complex dual space 336
complex inner-product 340
complex space 334
complexification 347
complexor 334
component 52
component-family 279
composite 11
Composition Rule 222
confined 213, 216
congruence 154
congruent 153
conjugate-complex structure 335
conjugate-linear mapping 336
connected 226
connection components 281
constant 11
constraint 252
constricted mapping 222
contained, in a set 3
continuous 34, 192
contraction 230
contravariant components 138, 289
convergence, of a sequence 33, 186
convex hull 123
convex set 123
convex-combination 124
coordinate 54
coordinate curve 280
coordinate system 277
coordinate transformation 281
cos 273
covariant components 138, 289
covariant derivatives 285
covector 71
covector field 279
cover 5, 199
cross-polytope 172
cross-power 18
cross-product, of a family of mappings 18
curl 261
curvilinear coordinate system 280
cyclic lineon 362
cylindrical coordinates 293
D'Alembertian 241
decomposition, of a linear space 303
definite quadratic form 92
degree, of a polynomial 275, 353
depth, of an elementary lineon 359
derivative 34, 209
determinant 287, 370
deviation components 281
diagonal matrix 8
diagonal, of a matrix 8
diameter 173
diamond 168
differentiable 34, 209, 218
differential 222
dimension 58, 107
Dini's Theorem 208
direct product 9
direct sum 44, 51, 306
direction space 106
directional derivative 233
disjoint 4, 5
disjunct 43, 303
distance 196
distance function 152
distributive law 25
divergence 239
domain, of a mapping 9
dot product 138
double-signed 94
doubleton 3
dual 74
dual basis field 278
dual norm 175
dual space 71
dual, of a basis 78
dyadic product 88
eigenvalue 310
elementary L-decomposition 350
elementary L-space 350
elementary divisor 352, 370
elementary lineon 350
elementary multiplicity 370
ellipsoid 344
elliptical coordinates 300
empty set 3
enumeration 20
equivalence classes 6
equivalence relation 6
Euclidean isomorphism 149
Euclidean mapping 148
Euclidean norm 143
Euclidean space 148
Euclidean vector space 141
evaluation 16, 17
evaluation mapping 17
exponential 34
exponential, lineonic 266
extended-real-number 32
external translation space 105
extremum 35
family 7
Fréchet derivative 222
field 26
fixed point 11
flat 106
flat basis 118
flat coordinate system 280
flat function 120
flat isomorphism 111
flat mapping 108
flat space 101
flat span 106
flat-combination 117
flatly independent family 118
flatly spanning family 118
frame 120
free action 102
frontier 182
function 9
function of two variables 18
functional 9
fundamental sequence 192
Fundamental Theorem of Algebra 275
future-directed 161
Gamma symbols 285
Gaussian elimination 58
general linear group 64
genuine Euclidean space 152
genuine inner product 133
genuine interval 31
genuine unitary space 337
genuinely orthonormal 136
gradient 218
Gram-Schmidt orthogonalization 159
graph, of a mapping 10
group 22
group-isomorphism 24
groupable 23
half-line 32
half-open interval 32
half-space 126
harmonic function 241
Heine-Borel Theorem 204
Hermitian 340
homomorphism 24
hyperplane 107, 121
idempotent 64
idempotents, family of 306
identity 26
identity mapping 11
image mapping 12
imaginary part, of a C-lineon 343
included, in a set 3
inclusion mapping 11
indefinite quadratic form 96
identification 2
index of nilpotence 363
index set 7
index, of an inner-product space 157
Inertia, Law of 158
infimum 32
infinite series 37
injection 11
injective family 7
injective mapping 10
inner product 133
inner square 133
inner-product components 285
inner-product space 133
insertion mapping 49, 52
integral 35, 254
integral domain 27
integral representation 256
integral ring 26
interior 179
intersection 4, 5, 15
intersection-stable 13
interval 31
invariant 13
inverse 11, 26
inversion mapping 246
invertible mapping 10
isolated point 197
isometry 154
isotone 32
isotropic 158
iterate 11
Jacobian 251
Jordan matrix 367
Jordan-algebra 98
kernel 24, 46
Kronecker delta 58
ℓ∞-norm 172
ℓ1-norm 172
L-space 62, 307
Lagrange multipliers 254
Laplacian 241
latent root 310
Lebesgue number 204
Lebesgue's Covering Lemma 204
left-inverse 11
left-multiplication mapping 67
length, of a list 7
Lie-algebra 98
limit point 37
limit, of a function 34
limit, of a mapping 198
limit, of a sequence 33, 186
line 107
linear L-span 355
linear combination 51
linear form 71
linear function 122
linear functional 74
linear isomorphism 45
linear manifold 108
linear mapping 44
linear operator 46, 64
linear space 39
linear span 42
linear transformation 46, 64
linearly dependent family 52
linearly independent family 52
lineon 61
lineon field 279
lineon-group 62
lineonic extension 316
lineonic nth power 237, 317
lineonic polynomial function 354
Lipschitz number 228
Lipschitzian 228
list 7
local inverse 243
local maximum [local minimum] 251
locally invertible 243
locally uniform convergence 190
logarithm, lineonic 320
Lorentz group 143
lower bound 32
magnitude, of a vector 139
map 10
mapping 9
mass-point 113
matrix 8, 55, 63
matrix, of a bilinear form 93
matrix-algebra 64
maximal L-space 349
maximum [minimum] 32, 35
Mean-Value Theorem 37
median 114
member-wise difference 25
member-wise opposite 25
member-wise product 24
member-wise reciprocal 25
member-wise sum 25, 29
midpoint 114, 163
minimal L-space 349
minimal polynomial 356
mixed components 289
monic polynomial 353
monoid 22
monoidable 23
multiple 28
multiplication 24
multiplication, of lineons 61
multiplicative group 26
multiplicity, of a spectral value 307
negative 26
negative quadratic or bilinear form 94
negative-regular 155
neighborhood 177, 192
neutral, of a monoid 22
neutral-subgroup 23
neutrality law 22
nilpotency 69, 362
nilpotent lineon 69, 362
non-degenerate quadratic or bilinear form 94
non-isotropic subspace 138
non-singular subspace 138
norm 164
normal lineon 334
norming box 167
norming cell 163
norming diamond 168
null set 6
nullity 61
nullspace 45
one-to-one 14
one-to-one correspondence 14
onto 14
open interval 32
open segment 163
open set 179, 192
open-half-space 126
operator 9, 64
operator norm 174
opposite 25
opposition 39
orbit, under an action 102
origin 41, 280
orthogonal 134, 310
orthogonal family of subspaces 310
orthogonal group 142
orthogonal isomorphism 141
orthogonal lineon 142
orthogonal mapping 141
orthogonal projection 138
orthogonal supplement 137
orthonormal 136
pair 8
pair-spectral space 330
pair-spectrum 330
paraboloidal coordinates 301
parallel flats 106
parallelepiped 172
parallelogram 130, 172
parallelogram law 160
parallelotope 172
partial derivative 228, 231
partial-gradient 228, 231
partition 5
past-directed 161
perfect field 374
permutation 16
permutation-group 23
perpendicular projection 137
perpendicular turn 290, 326
physical components 299
pieces, of a partition 6
plane 107
point 103
point-difference 103
point-spectrum 310
Polar Coordinates 290
polar decomposition 322
polynomial 275, 353
polynomial function 275, 354
positive definite 96, 321
positive quadratic or bilinear form 94
positive semidefinite 96, 321
positive symmetric lineons 316
positive-regular 155
power set 6
power-space 50
pre-image mapping 12
pre-monoid 22
pre-ring 27
primary decomposition 352
prime polynomial 356
prime polynomial, of an elementary lineon 359
principal direction, of an ellipsoid 344
process 209
product 24
product, of a family 29
product, of lineons 61
product-space 48, 49
projection 19, 64
projections, family of 306
proper number 310
proper subset 3
pseudo-Euclidean space 151
quadratic form 94
quadruple 8
quasi-spectral space 327
quasi-spectrum 327
range, of a family 7
range, of a mapping 10
rank 61
real part, of a C-lineon 338
reciprocal 24
reflection 334
regular subspace 136, 148, 155
Relativity 158
restriction, of a mapping 12
resultant force 115
reversion law 22
reversion, in a group 22
right-inverse 11
ring 25
rng 27
rotation 334
row, of a matrix 8
scalar field 279
scalar product 138
scale 167
Schwarz's inequality 141
second dual 74
second gradient 218
secular value 310
segment 123
self-adjoint 340
self-indexed family 7
semi-axes, of an ellipsoid 344
semi-linear 340
semi-simple lineon 352
semigroup 26
separation function 148
Separation Theorem 129
sequence 8, 186
series 37
set-difference 4
set-power 7, 8
set-product 8, 15
set-square 8
sign 31
signal-like 158
signature 155
similar lineons 367
simple lineon 350
simply connected 263
sin 273
single-signed 94
singlet 7
singleton 3
singleton-partition 6
singular matrix 58
singular subspace 155
skew bilinear mapping 83
skew lineon 135
skew-Hermitian 340
skewsymmetric 85
small mapping 212, 216
space of linear mappings 47
space-like 158
span-mapping 13
spanned 42
spanning family 52
spectral idempotents 313
spectral radius 228
spectral resolution 313
spectral space 307, 340, 345
spectral value 307
spectral vector 307
spectrum 307, 340, 362
sphere 152
Spherical Coordinates 295
square matrix 8
square root, lineonic 317
square root, strict lineonic 318
standard basis 55
stream-function 272
striction 223
strictly antitone 32
strictly isotone 32
strictly negative 2
strictly negative quadratic or bilinear form 94
strictly positive 2
strictly positive quadratic or bilinear form 94
strictly positive symmetric lineons 316
strip 181
sub-pre-monoid 22
subgroup 23
submonoid 23
subring 26
subsequence 33, 187
subspace 42
subspace generated by 44
sum 25
sum-sequence 33, 191
summation mapping 51
supplement 43
supplementary 43
supremum 32
surjective mapping 10
Sylvester's Law 158
symmetric bilinear mapping 83
symmetric lineon 135
symmetric matrix 8
symmetry-group 154
tangent 218
tensor 64
tensor product 86, 91
tensor product space 91
term, of a family 7
term-wise evaluation 17
tetrahedron 114
three-index symbols 285
time-like 158
total charge 111
totally isotropic 158
totally singular 155
trace 89, 111
transformation 9
transformation-monoid 23
transitive action 102
translated subspace 108
translation 103
translation space 103
transpose 8
transpose, of a linear mapping 71
transposition 72
triple 8
trivial partition 6
uniformly continuous 194
union 4, 5, 15
unit 26
unit ball 140, 172
unit cube 323
unit matrix 63
unit sphere 140
unit vector 139
unitary group 339
unitary isomorphism 339
unitary mapping 339
unitary product 337
unitary space 337
unity 24
upper bound 32
value, of a mapping 9
value-wise absolute value 33
value-wise n'th power 33
value-wise opposite 33
value-wise pair formation 17
value-wise product 33
value-wise sum 33
vector 103
Vector Analysis 243, 264
vector field 279
vector space 41
vector-curl 264
void 6
Wave-Operator 241
weak boundedness 177
Weierstrass Comparison Test 19
weight 116
world-vector 158
zero 25, 39
zero-mapping 45
zero-space 42