
Lecture Notes:

MSc Maths and Statistics Refresher Course

Jidong Zhou∗

Department of Economics

University College London

September, 2008

Very Preliminary

∗Chapters 5-8 of the lecture notes benefit from the lecture notes prepared by Dr Monica Costa Dias for this course in the past few years. I am very grateful for her generosity. All errors are mine. Please do not circulate without the permission of the author. Contact information: [email protected].


Chapter 1:

Mathematical Preliminaries

1 Sets

1.1 Basic definitions

• A set is a collection of elements. If a set contains no elements, we call it an empty set and denote it by ∅.

— N ≡ {1, 2, 3, ...}: the set of natural numbers (or counting numbers);

— Z ≡ {..., −3, −2, −1, 0, 1, 2, 3, ...}: the set of integers;

— Z² ≡ {(m, n) : m, n ∈ Z}: the set of pairs of integers.

• A set B is a subset of A if each element of B is also an element of A:

B ⊂ A if x ∈ A whenever x ∈ B.

A set A equals another set B if

A ⊂ B and B ⊂ A.

• Operations with sets:

— A ∪ B (A union B) is the set of all elements that are either in A or in B (or in both):

A ∪ B = {x : x ∈ A or x ∈ B};

— A ∩ B (A intersect B) is the set of all elements that are in both A and B:

A ∩ B = {x : x ∈ A and x ∈ B};

— A\B (A minus B) is the set of all elements that are in A but not in B:

A\B = {x : x ∈ A and x ∉ B};

— A^c = X\A is the complement of A in X.

• Two sets A and B are disjoint or mutually exclusive if A ∩ B = ∅.




• Finite, countable, and uncountable sets:

— a set A is finite if the number of its elements is finite, i.e., A = {a₁, ..., aₙ} with n < ∞.

— a set A is countable (and so infinite) if the number of its elements is “equal” to that of N, i.e., A = {aᵢ : i ∈ N}.¹

— a set A is uncountable (and of course infinite) if it is neither finite nor countable, or roughly speaking, its elements cannot be listed completely.

— every infinite subset of a countable set is countable.²

— the union of a countable number of countable sets is still countable, i.e., the union of the sets Aᵢ over i ∈ N is countable if each Aᵢ is countable.

Example 1 Both Z and Z² are countable.

Exercise 1 Prove: (i) (De Morgan’s laws) (A ∪ B)^c = A^c ∩ B^c; (A ∩ B)^c = A^c ∪ B^c;

(ii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
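The set operations of this section can be tried out on small finite sets. The following is only an illustrative sketch: the universal set `X`, the subsets `A`, `B`, `C`, and the helper `complement` are assumptions made for the example, and a check on one example is of course not a proof of Exercise 1.

```python
# Universal set X and three subsets, chosen only for illustration.
X = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7}

def complement(S, universe=X):
    """Complement of S relative to the universal set."""
    return universe - S

union = A | B           # A ∪ B
intersection = A & B    # A ∩ B
difference = A - B      # A \ B

# De Morgan's laws on this particular example.
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)

# Distributive laws on this particular example.
distributive_1 = A & (B | C) == (A & B) | (A & C)
distributive_2 = A | (B & C) == (A | B) & (A | C)

print(union, intersection, difference)
print(de_morgan_1, de_morgan_2, distributive_1, distributive_2)
```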

1.2 The real number system

• Real numbers:

— the set of rational numbers (or all quotients of integers) is

$$\mathbb{Q} \equiv \left\{ \frac{p}{q} : p, q \in \mathbb{Z} \text{ and } q \neq 0 \right\}.$$

— the set of irrational numbers includes all numbers that cannot be written as ratios of integers. The decimal expansions of irrational numbers never end and have no repeating pattern. For example, √2 = 1.414213562···, e = 2.718281828459···, and π = 3.141592654···.

— the set of real numbers R contains all rational and irrational numbers.

— Q is countable; R\Q is uncountable; and R is uncountable.

• Upper bounds and lower bounds:

— b is called an upper bound for S ⊂ R if x ≤ b for all x ∈ S.

¹ More rigorously, A is countable if there is a one-to-one correspondence between A and N.

² Therefore, roughly speaking, countable sets represent the “smallest” infinity.




— b is called the least upper bound of S ⊂ R (or b = sup S) if b is an upper bound of S but any number smaller than b is not an upper bound of S. (Roughly speaking, b is the smallest upper bound.) That is,

x ≤ b for all x ∈ S, but for any b′ < b, ∃ y ∈ S such that y > b′.

— similarly, we can define the lower bound and the greatest lower bound (inf S) of S.

— for any nonempty S ⊂ R, if S has an upper bound, it has a least upper bound; if S has a lower bound, it has a greatest lower bound.

Example 2 Let

S = {1/n : n ∈ N}.

Then sup S = 1 and inf S = 0.

• Two results:

— (Archimedean property) if x, y ∈ R and x > 0, then there is an n ∈ N such that nx > y.

— (Q is dense in R) if x, y ∈ R and x < y, then there is an r ∈ Q such that x < r < y.

Exercise 2 Let S = {x ∈ Q : 1 < x < √2}. What are sup S and inf S?
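The two results above are constructive, which a short sketch can make concrete. The function names `archimedean_n` and `rational_between` are hypothetical, and the code works with floating-point inputs, so it illustrates the argument rather than proving anything: pick n with n(y − x) > 1, then the smallest integer m > nx gives x < m/n < y.

```python
import math
from fractions import Fraction

def archimedean_n(x, y):
    """A natural number n with n*x > y, assuming x > 0."""
    return max(1, math.floor(y / x) + 1)

def rational_between(x, y):
    """A rational r = m/n with x < r < y, assuming x < y."""
    n = archimedean_n(y - x, 1.0)   # n*(y - x) > 1, i.e. 1/n < y - x
    m = math.floor(n * x) + 1       # smallest integer strictly above n*x
    return Fraction(m, n)

n = archimedean_n(0.3, 10.0)
r = rational_between(math.sqrt(2), 1.5)
print(n, r)
```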

1.3 Open sets, closed sets, and compact sets

1.3.1 Metrics

We introduce a measurement of the distance between two elements in a set.

• We define a metric or a distance function in a given set X as follows: for any two

elements x and y in X, there is associated a real number d(x, y) such that

(i) d(x, y) > 0 if x ≠ y and d(x, y) = 0 if x = y;

(ii) d(x, y) = d(y, x);

(iii) (triangle inequality) d(x, y) ≤ d(x, z) + d(z, y) for any z ∈ X.

• A set equipped with a metric is called a metric space.

• In this course, we mainly deal with the space X = Rⁿ and the distance function (Euclidean metric)

$$d(x, y) = \|x - y\| \equiv \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}.$$

We call it the Euclidean space. In particular, if n = 1 (i.e., we are considering R), the distance function reduces to

d(x, y) = |x − y|.

• The ε-neighborhood or ε-ball of a point z in X is the set of all elements in X which lie

close to z with a distance less than ε > 0:

Nε(z) = {x ∈ X : d(x, z) < ε}.

For example, in R, Nε(0) = (−ε, ε).

Exercise 3 Check that the metric ‖·‖ satisfies the triangle inequality. (Hint: use the Cauchy-Schwarz inequality in the appendix.)
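As a numerical companion to the definitions in this subsection, the sketch below implements the Euclidean metric and the ε-ball test, and spot-checks the triangle inequality on random points. The names `euclidean`, `in_ball` and `triangle_holds`, the random sample, and the small tolerance for rounding error are assumptions of the example; a random check is not a substitute for the proof asked for in Exercise 3.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def in_ball(x, z, eps):
    """True if x lies in the open eps-neighborhood of z."""
    return euclidean(x, z) < eps

def triangle_holds(x, y, z, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(z, y), allowing for rounding error."""
    return euclidean(x, y) <= euclidean(x, z) + euclidean(z, y) + tol

random.seed(0)
points = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(30)]
all_ok = all(triangle_holds(x, y, z)
             for x in points for y in points for z in points)
print(all_ok)
```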

1.3.2 Open sets

In the following, we regard X as the universal set and consider a subset A ⊂ X.

• A set A in X is open if for each z ∈ A, there exists ε > 0 such that Nε(z) ⊂ A.

The word “open” has a connotation of “no boundary”: from any point one can always

move a little distance in any direction and still be in the set.

Example 3 (i) An interval (a, b) is open in R; (ii) {(x, y) ∈ R² : x² + y² < 1} is also an open set in R².

• Two properties:

— any union of open sets is open;

— the finite intersection of open sets is open.

• Interior points:

— a point z ∈ A is an interior point of A if there exists ε > 0 such that Nε(z) ⊂ A.

Therefore, all points in an open set are interior points.

— the interior of a set A is the set of all interior points of A. It is denoted by A°.

Exercise 4 Give an example in which the infinite intersection of open sets is not open.




1.3.3 Closed sets

• Definitions:

— 1: a set A ⊂ X is closed if Ac is open in X.

— 2: a point z is a limit point of A ⊂ X if any neighborhood of z (i.e., Nε(z) for any ε > 0) contains at least one point of A which is different from z. Then a set A is closed if each limit point of A (if any) is contained in A.

Example 4 (i) An interval [a, b] in R is closed; (ii) {(x, y) ∈ R² : x² + y² ≤ 1} is also a closed set in R².

• The closure of a set A is the union of A and its limit points. It is denoted by Ā.

• Two properties:

— any intersection of closed sets is closed;

— the finite union of closed sets is closed.

• There are many sets that are neither open nor closed, for example (a, b]. There are only two sets which are both open and closed in Rⁿ: Rⁿ itself and the empty set.

Exercise 5 (i) Prove definition 1 of closed sets by using definition 2.

(ii) Find the limit points of the following sets in R: {a} ∪ [c, d] and {1/n : n ∈ N}. Are they closed sets in R?

(iii) Find the limit points of the following sets in R²: {(x, y) ∈ R² : x + y = 1} and {(x, y) ∈ R² : x > 0}. Are they closed sets in R²?

(iv) Does a finite point set have any limit point? Is a finite point set closed?

(v) Give an example in which the infinite union of closed sets is not closed.

1.3.4 Bounded sets and compact sets

• A set A is bounded if there exists a real number M > 0 such that d(x, y) < M for any

x, y ∈ A.

• Compact sets:

— 1: a set A in Rⁿ is compact if it is both closed and bounded. For example, {(x, y) ∈ R² : x² + y² ≤ 1} is compact.

— 2: a set A in a general metric space is compact if every infinite subset of A has a limit point in A.

— several results:

∗ (Weierstrass theorem) every bounded infinite subset of Rⁿ has a limit point in Rⁿ.

∗ closed subsets of compact sets are compact.

∗ if A is closed and B is compact, then A ∩ B is compact.

Exercise 6 [a, ∞) is not compact according to the first definition. Show that it does not satisfy the second one either.

1.3.5 Connected sets and convex sets

• Connected sets

— two subsets A and B of a metric space X are said to be separated if both A ∩ B̄ and Ā ∩ B are empty, i.e., if no point of A lies in the closure of B and no point of B lies in the closure of A.³

— a set is said to be connected if it is not a union of two nonempty separated sets. For example, both [a, b] and {(x, y) ∈ R² : 1 < x² + y² < 2} are connected.

— in R, a set A is connected if and only if it has the following property: if x, y ∈ A,

and x < z < y, then z ∈ A.

• Convex sets

— a set A ⊂ Rⁿ is said to be convex if for any two elements x and y in A, any convex combination of them

λx + (1 − λ)y with λ ∈ [0, 1]

is also in A.

— for example, (a, b) and {(x, y) ∈ R² : x² + y² ≤ 1} are convex, but {(x, y) ∈ R² : 1/2 ≤ x² + y² ≤ 1} is not.

— the intersection of convex sets is still convex.
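The convexity examples above can be probed numerically. This sketch is only illustrative: the membership tests `in_disk` and `in_annulus`, the helper `segment_stays_inside`, and the sample points are assumptions of the example. It exhibits one segment that leaves the annulus (which does show non-convexity) and one segment that stays in the disk (which is merely consistent with convexity).

```python
def in_disk(p):
    """Membership in the closed unit disk x^2 + y^2 <= 1."""
    x, y = p
    return x * x + y * y <= 1.0

def in_annulus(p):
    """Membership in the annulus 1/2 <= x^2 + y^2 <= 1."""
    x, y = p
    return 0.5 <= x * x + y * y <= 1.0

def convex_combination(p, q, lam):
    """The point lam*p + (1 - lam)*q."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(p, q))

def segment_stays_inside(member, p, q, steps=100):
    """Check membership at evenly spaced points of the segment from q to p."""
    return all(member(convex_combination(p, q, k / steps))
               for k in range(steps + 1))

# Two points of the annulus whose midpoint (the origin) is not in it.
p, q = (1.0, 0.0), (-1.0, 0.0)
midpoint = convex_combination(p, q, 0.5)
annulus_counterexample = in_annulus(p) and in_annulus(q) and not in_annulus(midpoint)

# A segment between two interior points of the disk stays in the disk.
disk_segment_ok = segment_stays_inside(in_disk, (0.5, 0.5), (-0.7, 0.1))
print(annulus_counterexample, disk_segment_ok)
```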

2 Sequences

• A sequence {xₙ} (n = 1, 2, ...) in a metric space X is said to converge to x ∈ X if

∀ ε > 0, ∃ an integer N such that d(xₙ, x) < ε for any n > N.

That is, xₙ will be arbitrarily close to x when n is sufficiently large. We call x the limit point of {xₙ}, and we write xₙ → x or

lim_{n→∞} xₙ = x.

³ Separated sets are disjoint, but disjoint sets need not be separated. For example, A = [0, 1] and B = (1, 2) are disjoint but not separated.

Example 5 We show (−1)ⁿ/n → 0: ∀ ε > 0, take any integer N ≥ 1/ε; then |(−1)ⁿ/n| = 1/n < ε for n > N.

• Properties of convergence

— the limit point of a convergent sequence is unique.

— if {xₙ} converges, then {xₙ} is bounded.

— if x is a limit point of a set A in X, then there exists a sequence {xₙ} in A such that xₙ → x.

— suppose xₙ → x and yₙ → y, where xₙ, yₙ ∈ R. Then (a) cxₙ → cx; (b) xₙ + yₙ → x + y; (c) xₙyₙ → xy; and (d) 1/xₙ → 1/x, provided xₙ ≠ 0 for any n and x ≠ 0.

— suppose xₙ = (a₁ₙ, a₂ₙ, ..., aₘₙ) ∈ Rᵐ. Then xₙ → x = (a₁, a₂, ..., aₘ) if and only if aᵢₙ → aᵢ for each i = 1, ..., m.

• Using sequences to define closed sets: a set is closed if the limit point of any convergent sequence in it is also contained in this set.⁴

• Subsequences:

— given a sequence {xₙ}, consider a sequence of positive integers such that n₁ < n₂ < n₃ < ···. Then the new sequence $\{x_{n_i}\}$ is called a subsequence of {xₙ}. If $\{x_{n_i}\}$ converges, its limit point is called a subsequential limit point or an accumulation point of {xₙ}.

— a sequence can have multiple subsequential limit points. For example, {(−1)ⁿ(1 + 1/n)} has two subsequential limit points, −1 and 1. Moreover, the set of subsequential limit points is closed.

— {xₙ} converges to x if and only if every subsequence of {xₙ} converges to x.

— every bounded sequence in Rⁿ contains a convergent subsequence.

⁴ Then a set is open if its complement is closed. Hence, we have two ways to define open sets and closed sets: use the concept of ε-neighborhood or use the concept of sequence convergence.

• Cauchy sequence:

— a sequence {xₙ} in a metric space is said to be a Cauchy sequence if ∀ ε > 0, ∃ N such that d(xₘ, xₙ) < ε for any m, n > N.

— in Rⁿ, a sequence converges if and only if it is a Cauchy sequence.⁵

— this result enables us to decide whether a sequence converges without knowing the limit to which it may converge.

• Monotonic sequences:

— suppose a sequence {xₙ} of real numbers is monotonic (i.e., xₙ ≤ xₙ₊₁ for all n, or xₙ₊₁ ≤ xₙ for all n). Then it converges if and only if it is bounded.

• Some special sequences:

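— if $a > 0$, then $1/n^a \to 0$;

— if $a > 0$, then $\sqrt[n]{a} \to 1$;

— $\sqrt[n]{n} \to 1$;

— if $a > 0$ and $b \in \mathbb{R}$, then $\frac{n^b}{(1+a)^n} \to 0$;

— $\left(1 + \frac{1}{n}\right)^n \to e$, where $e$ is defined as $\sum_{n=0}^{\infty} \frac{1}{n!} \approx 2.7183$.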

• Series:

— a series $\sum_{n=1}^{\infty} a_n$ converges if the sequence of partial sums $\{x_k = \sum_{n=1}^{k} a_n\}$ converges. If $\sum a_n$ converges, then $a_n \to 0$.

— special series:

∗ if $|x| < 1$, then $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$; if $|x| \ge 1$, then the series diverges.

∗ if $a > 1$, then $\sum \frac{1}{n^a}$ converges; if $a \le 1$, then the series diverges. In particular, the harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \cdots$ diverges.
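Partial sums make the two special series above tangible. In this minimal sketch the function names and the chosen values of x and k are assumptions of the example. The geometric partial sums settle near 1/(1 − x), while the harmonic partial sums keep growing, slowly, which is consistent with divergence but does not by itself prove it.

```python
def geometric_partial_sum(x, k):
    """Partial sum x^0 + x^1 + ... + x^k."""
    return sum(x ** n for n in range(k + 1))

def harmonic_partial_sum(k):
    """Partial sum 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / n for n in range(1, k + 1))

x = 0.5
for k in (5, 20, 60):
    # Compare the partial sum with the closed form 1 / (1 - x).
    print(k, geometric_partial_sum(x, k), 1 / (1 - x))

for k in (10, 1000, 100000):
    print(k, harmonic_partial_sum(k))
```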

Exercise 7 (i) Show xₙyₙ → xy if xₙ → x ∈ R and yₙ → y ∈ R.

(ii) Show that {xₙ} converges to x if and only if every subsequence of {xₙ} converges to x.

(iii) Show that a monotonic sequence converges iff it is bounded.

(iv) Show $\left(1 + \frac{x}{n}\right)^n \to e^x$ for x ∈ R.⁶

⁵ A metric space in which every Cauchy sequence converges is said to be complete. Thus, at least the Euclidean space is complete.

⁶ This limit is used to derive the compound interest rate in the continuous time scenario. Suppose the annual interest rate is r, and the bank compounds interest n times a year. Then if a person deposits A pounds in the bank, after one year he will get $A\left(1 + \frac{r}{n}\right)^n$, and after t years he will get $A\left(1 + \frac{r}{n}\right)^{nt}$, which approaches $Ae^{rt}$ as n tends to infinity. Therefore, the present value of x pounds available t years in the future is $xe^{-rt}$.
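The compound-interest limit in the footnote can be observed numerically. The sketch below is a hedged illustration: the function names `compounded` and `continuous`, and the figures (a deposit of 100, a rate of 5%, three years), are made up for the example.

```python
import math

def compounded(A, r, n, t):
    """Value of deposit A after t years at annual rate r, compounded n times a year."""
    return A * (1 + r / n) ** (n * t)

def continuous(A, r, t):
    """Limiting value A * e^(r t) under continuous compounding."""
    return A * math.exp(r * t)

A, r, t = 100.0, 0.05, 3
for n in (1, 12, 365, 10 ** 6):
    # The compounded value increases with n toward the continuous limit.
    print(n, compounded(A, r, n, t), continuous(A, r, t))

# Present value of x pounds received t years from now.
present_value = 100.0 * math.exp(-r * t)
print(present_value)
```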




3 Functions

• A function (or a mapping) f is a relation which assigns to each element x of a set A a unique element f(x) of a set B. It is denoted by f : A → B.⁷

— the set A is called the domain of f , the elements f(x) are called the values of f ,

and the set of all values of f is called the range of f .

— for any a ∈ A, the corresponding element in B, b = f(a), is called the image of a

while a is the preimage of b. More generally, we can talk of images and preimages

of sets of elements.

— a function f is said to map A into B if f(A) ⊂ B. A function f is said to map A

onto B if f(A) = B.

— a function f of A onto B is said to be one-to-one or a one-to-one correspondence if each element b ∈ B has a unique preimage a ∈ A, and we write a = f⁻¹(b), where f⁻¹ is called the inverse of f.

— in this course, we mainly deal with the case in which A ⊂ Rⁿ and B ⊂ R. Then f is called a real-valued function.

• Examples (for some a, b, c ∈ R): $f(x) = ax + b$, $f(x) = ax^2 + bx + c$, $f(x) = x^n$, $f(x) = \exp(x)$, $f(x) = a^x$, $f(x) = \ln(x)$, $f(x) = |x|$, $f(x) = \sin x$, $f(x, y) = x^{\alpha}y^{1-\alpha}$, $f(x, y) = [x^{\rho} + y^{\rho}]^{1/\rho}$, $f(x, y) = \min\{x, y\}$.

• Monotonic functions: a function f : R → R is strictly increasing (decreasing) if

x₁ > x₂ ⇒ f(x₁) > (<) f(x₂).

• A composite function (f ◦ g)(x) defines the successive application of two functions: (f ◦ g)(x) = f(g(x)).

• Continuous functions:

— 1: consider a function f : A → B, and let d_A and d_B be the metrics (or the distance functions) associated with A and B, respectively. Then f is said to be continuous at x₀ ∈ A if ∀ ε > 0, ∃ δ > 0 such that

d_B(f(x₀), f(x)) < ε

for any x ∈ A satisfying d_A(x₀, x) < δ. If f is continuous at x₀, then we call $y = \lim_{x \to x_0} f(x)$ the limit of function f at x₀.

⁷ If f(x) is a non-singleton subset of B, then we call f a correspondence.

— 2: a function f : A → B is continuous at x₀ ∈ A if for any sequence {xₙ} which converges to x₀ in A, f(xₙ) converges to f(x₀).

— a function f is said to be continuous if it is continuous at any point in its domain. (Roughly speaking, a continuous function maps nearby points in its domain into nearby points in its range.)

— consider a continuous function f : A → R where A ⊂ Rⁿ is convex. Suppose f(a) < f(b) for a, b ∈ A. Then for any c ∈ (f(a), f(b)), there exists λ ∈ (0, 1) such that

f(λa + (1 − λ)b) = c.

— in particular, for a continuous function, if A = [a, b] ⊂ R, and f(a) < c < f(b), then there is x ∈ (a, b) such that f(x) = c (the intermediate value theorem).

• Examples of discontinuous functions:

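$$f(x) = \begin{cases} x, & \text{if } x < 0 \\ x + 1, & \text{if } x \ge 0 \end{cases}; \qquad f(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q} \\ 0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases}.$$

• A monotonic function f : R → R cannot be “too discontinuous” in the sense that it has at most a countable number of points of discontinuity.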
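The intermediate value theorem stated earlier in this section only asserts that a solution of f(x) = c exists. Bisection is one standard way to locate such a point numerically; it is offered here as an illustration, not as part of the notes. The function name `bisect_value`, the tolerance, and the test function x³ + x are assumptions of the example.

```python
def bisect_value(f, a, b, c, tol=1e-12):
    """Approximate x in [a, b] with f(x) = c.

    Assumes f is continuous on [a, b] and f(a) < c < f(b).
    The loop keeps the invariant f(lo) < c <= f(hi).
    """
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cubic(x):
    """A continuous test function with cubic(0) = 0 and cubic(2) = 10."""
    return x ** 3 + x

x_star = bisect_value(cubic, 0.0, 2.0, 5.0)
print(x_star, cubic(x_star))
```

Each step halves the interval, so the number of iterations grows only with the logarithm of the required precision.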

4 Matrices

4.1 Basic definitions

• A matrix of dimension m × n is a rectangular array of numbers consisting of m rows

and n columns:

$$A_{m \times n} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix} = [a_{ij}]_{i=1,\dots,m;\ j=1,\dots,n}$$

• Each entry aᵢⱼ is an element of the matrix $A_{m \times n}$, where the first index is for the row and the second is for the column.

• Two matrices of the same size, $A_{m \times n}$ and $B_{m \times n}$, are equal if all corresponding entries are equal, aᵢⱼ = bᵢⱼ for all i = 1, ..., m and j = 1, ..., n.



• A vector is a matrix in which one of the dimensions is one. If m = 1, the vector is a

row vector. If n = 1, the vector is a column vector.

• A matrix with m = n is a square matrix.

— A square matrix with aᵢⱼ = aⱼᵢ for all i, j = 1, ..., n is a symmetric matrix.

— A square matrix with aᵢⱼ = 0 whenever i ≠ j is a diagonal matrix.

— A square matrix with aᵢⱼ = 0 for i > j is an upper-triangular matrix.

— A square matrix with aᵢⱼ = 0 for i < j is a lower-triangular matrix.

— A diagonal matrix with aᵢᵢ = 1 for all i = 1, ..., n is the identity matrix and we write Iₙ.

• Elementary matrices:

— Eij is formed by interchanging the ith and the jth rows of the identity matrix I.

— Ei(α) is formed by multiplying the ith row of I by the scalar α.

— Eij(α) is formed by adding α times row i to row j in I.

4.1.1 Ranks

• Linear independence:

— Take a collection of vectors in Rᵐ: v₁, ..., vₙ. (Each of them is an m × 1 column vector.) We say that they are linearly dependent if there exist scalars α₁, ..., αₙ, not all zero, such that

α₁v₁ + ··· + αₙvₙ = 0. (1)

That is, at least one vector can be expressed as a linear combination of the other vectors.

— We say that v₁, ..., vₙ are linearly independent if there are no such scalars (with at least one nonzero) for which (1) holds. That is, no vector can be expressed as a linear combination of the others. Alternatively, v₁, ..., vₙ are linearly independent if

α₁v₁ + ··· + αₙvₙ = 0 =⇒ α₁ = ··· = αₙ = 0.

— If n > m, then v₁, ..., vₙ must be linearly dependent.

• The maximum number of linearly independent rows (or columns) of a matrix is the rank of the matrix. The maximum number of linearly independent rows must equal the maximum number of linearly independent columns.

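• If A is m × n, then rank(A) ≤ min(m, n).

• A square matrix $A_{n \times n}$ is of full rank if rank(A) = n.

Exercise 8 Let

$$A = \begin{pmatrix} 4 & 2 & 3 \\ 8 & 4 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & 0 & 3 \end{pmatrix}.$$

Find rank(A) and rank(B).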

4.1.2 Determinants

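We will assign a number called the determinant to each square matrix $A_{n \times n}$. Denote it by

$$\det A = \det \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \quad \text{or} \quad |A| = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$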

We calculate |A| by the following inductive procedure:

• For a 1 × 1 matrix, det(a) = a.

• Let $A_{ij}$ be the (n − 1) × (n − 1) submatrix of A obtained by deleting row i and column j. Call the scalar

$$M_{ij} = |A_{ij}|$$

the (i, j)th minor of A and the scalar

$$C_{ij} = (-1)^{i+j} M_{ij}$$

the (i, j)th cofactor of A. A cofactor is just a signed minor.

• We calculate |A| as follows: for any row index i ∈ {1, 2, ..., n},

$$|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} = a_{i1}(-1)^{i+1}M_{i1} + a_{i2}(-1)^{i+2}M_{i2} + \cdots + a_{in}(-1)^{i+n}M_{in}.$$

We can also pick any column index and do a similar calculation. We will get the same

result in either way. (It is usually more convenient to pick the row or column with the

most zero entries.)

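• For a 2 × 2 matrix, we have

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

• For a 3 × 3 matrix, we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$

Example 6 Calculate

$$\begin{vmatrix} 2 & 1 & 3 \\ 3 & 0 & 1 \\ 4 & 0 & 5 \end{vmatrix}.$$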
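The inductive procedure above translates directly into a short recursive function. The sketch below expands along the first row only, which is one of the admissible choices, and applies it to the matrix of Example 6. The function name `det` and the list-of-lists representation are assumptions of the example; cofactor expansion is far too slow for large matrices and is shown only because it mirrors the definition.

```python
def det(M):
    """Determinant of a square matrix (list of rows) by cofactor expansion
    along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Submatrix obtained by deleting the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # With 0-based indices, (-1)**j is the cofactor sign for row 1.
        total += (-1) ** j * M[0][j] * det(minor)
    return total

example6 = [[2, 1, 3],
            [3, 0, 1],
            [4, 0, 5]]
print(det(example6))  # -11
```

Expanding by hand along the second column, which contains two zeros, gives the same value with a single 2 × 2 minor: −1 · (3·5 − 1·4) = −11.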

• The determinant of a lower-triangular, upper-triangular, or diagonal matrix is simply the product of its diagonal entries. In particular, |I| = 1.

• A geometric interpretation of the determinant: consider a 2 × 2 matrix

A = [a₁ a₂]

where a₁ and a₂ are its column vectors. Then one can verify that the absolute value of |A| is just the area of the parallelogram formed by these two column vectors in a plane. Such an interpretation can be extended to higher-dimensional cases.

• Some properties of the determinant (A is an n × n matrix):

— |Aᵀ| = |A|.

— if B is formed from A by interchanging two rows (or two columns), then |B| = −|A|.

— |A| = 0 if two rows (or two columns) of A are equal.

— if B is formed by multiplying a row or column of A by a scalar α, then |B| = α|A|, and so |αA| = αⁿ|A|.

— transform A to B by adding r times row i of A to row j of A to form row j of B. Then |B| = |A|.

— for any two n × n matrices A and B, we have |AB| = |A||B|. Then |A⁻¹| = 1/|A| whenever A is invertible.

• A square matrix is of full rank if and only if its determinant is nonzero.

Exercise 9 (i) Show that |A| = 0 if two rows (or two columns) of A are equal.

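(ii) Suppose

$$A = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_1 & b_2 \\ b_3 & b_4 \end{pmatrix}.$$

Verify |AB| = |A||B|.

(iii) Give an example in which |A + B| ≠ |A| + |B|.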



4.2 Operations with matrices

• Sum: $A_{m \times n} + B_{m \times n} = [a_{ij} + b_{ij}]_{i=1,\dots,m;\ j=1,\dots,n}$

• Product by a scalar: $\alpha A_{m \times n} = [\alpha a_{ij}]_{i=1,\dots,m;\ j=1,\dots,n}$

• Matrix product: $A_{m \times n}B_{n \times l} = \left[\sum_{j=1}^{n} a_{ij}b_{jk}\right]_{i=1,\dots,m;\ k=1,\dots,l}$. More explicitly, to obtain the (i, k)th entry of AB, multiply the ith row of A and the kth column of B as follows:

$$\begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix} \cdot \begin{pmatrix} b_{1k} \\ b_{2k} \\ \vdots \\ b_{nk} \end{pmatrix} = \sum_{j=1}^{n} a_{ij}b_{jk}.$$

Notice that the number of columns of A must equal the number of rows of B, and the new matrix inherits the number of its rows from A and the number of its columns from B.

— for any m× n matrix A and n× n identity matrix I,

AI = A.

— all operation laws for numbers except the commutative law for multiplication apply to matrices. For example, the distributive laws are:

A(B + C) = AB + AC,

(A + B)C = AC + BC.

But AB ≠ BA in general even if both of them are well defined. For example, $A_{2 \times 3}B_{3 \times 2} \neq B_{3 \times 2}A_{2 \times 3}$. Even if A and B are both square matrices with the same size, AB ≠ BA in general.

— a square matrix with AA = A is an idempotent matrix. For example, Iₙ is an idempotent matrix.⁸

— suppose E is some elementary matrix defined previously. Then EA is the matrix

obtained by performing the same row operation on A as E does on I.

— rank(AB) ≤ min(rank(A),rank(B))

⁸ The only full rank, symmetric idempotent matrix is the identity matrix.

• Transpose: $A^T_{m \times n} = [a_{ij}]^T_{i=1,\dots,m;\ j=1,\dots,n} = [a_{ji}]_{j=1,\dots,n;\ i=1,\dots,m}$. So the first row of A becomes the first column of Aᵀ, the second row of A becomes the second column of Aᵀ, and so on. Notice that

(A + B)ᵀ = Aᵀ + Bᵀ,

(AB)ᵀ = BᵀAᵀ.

— rank(A) = rank(AAᵀ) = rank(AᵀA)
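The product and transpose rules can be checked on small matrices. This is a minimal sketch with matrices stored as lists of rows; the helper names `matmul` and `transpose` and the sample matrices are assumptions of the example. It shows a pair of square matrices with AB ≠ BA and confirms (AB)ᵀ = BᵀAᵀ for that pair.

```python
def matmul(A, B):
    """Matrix product AB; the number of columns of A must equal the rows of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][j] * B[j][k] for j in range(n))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose: row i of A becomes column i of the result."""
    return [list(col) for col in zip(*A)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)   # swaps the columns of A
BA = matmul(B, A)   # swaps the rows of A
print(AB, BA, AB == BA)
print(transpose(AB) == matmul(transpose(B), transpose(A)))
```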

• Inverse: Let A be a square matrix. If there exists a square matrix B such that AB = BA = I, then B is an inverse of A, and A is said to be invertible. We denote it by A⁻¹. (The inverse operation for matrices is analogous to the division operation for numbers.)

— a square matrix is invertible if and only if it is of full rank or, equivalently, its determinant is nonzero. We also call an invertible matrix a nonsingular matrix.

— decomposing invertible matrices: an invertible matrix can be written as the product of elementary matrices.⁹

— properties of inverse (α is a nonzero scalar):

∗ (A⁻¹)⁻¹ = A;

∗ (Aᵀ)⁻¹ = (A⁻¹)ᵀ;

∗ (AB)⁻¹ = B⁻¹A⁻¹;

∗ (αA)⁻¹ = (1/α)A⁻¹.

Notice that A + B need not be invertible even if both A and B are invertible. Moreover, even if A + B is invertible, (A + B)⁻¹ is generally not equal to A⁻¹ + B⁻¹.

— the inverse of a partitioned matrix: let A be a square matrix partitioned as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

where $A_{11}$ and $A_{22}$ are square submatrices. Then A is invertible if both $A_{22}$ and

$$D = A_{11} - A_{12}A_{22}^{-1}A_{21}$$

are invertible, and

$$A^{-1} = \begin{pmatrix} D^{-1} & -D^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}D^{-1} & A_{22}^{-1}(I + A_{21}D^{-1}A_{12}A_{22}^{-1}) \end{pmatrix}.$$

⁹ In general, any matrix can be written as the product of several elementary matrices and a reduced row echelon matrix. This decomposition theorem is useful in proving many matrix results.



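Exercise 10 (i) Give an example in which A and B are square matrices with the same size but AB ≠ BA.

(ii) Verify that

$$\begin{pmatrix} 5 & -5 \\ 4 & -4 \end{pmatrix}$$

is an idempotent matrix. For

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

to be an idempotent matrix, what conditions must (a, b, c, d) satisfy?

(iii) Suppose

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & 1 \\ 0 & -1 & 2 \end{pmatrix}.$$

Verify that (AB)ᵀ = BᵀAᵀ.

(iv) Show that the inverse of

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

whenever ad ≠ bc. Then the inverse of a 2 × 2 symmetric matrix is still symmetric.

(v) Give an example in which (A + B)⁻¹ ≠ A⁻¹ + B⁻¹.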

4.3 Systems of linear equations

• A system of m linear equations with n unknown variables (x₁, ..., xₙ) can be concisely represented in matrix format:

$$A_{m \times n} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \quad \text{or} \quad Ax = b.$$

We call A the coefficient matrix.

• The solution to Ax = b with n equations and n unknown variables:

— If A is invertible (or nonsingular), then the system has a unique solution x = A⁻¹b. It can be calculated by using Cramer’s rule:

$$x_i = \frac{|B_i|}{|A|} \quad \text{for } i = 1, \dots, n$$

where $B_i$ is the matrix A with b replacing the ith column of A.

— If A is not invertible (or singular), then the system either has no solution or has infinitely many solutions. The argument is very simple. Since A is singular, its row vectors $r_1, \dots, r_n$ must be linearly dependent, which means that at least one of them can be expressed as a linear combination of the others. Suppose $r_i = \sum_{j \neq i} \alpha_j r_j$. If $b_i \neq \sum_{j \neq i} \alpha_j b_j$, then we have no solution; and if $b_i = \sum_{j \neq i} \alpha_j b_j$, the ith equation is redundant (i.e., we have n − 1 equations and n unknown variables), and then we have no solution or infinitely many solutions.

• The solution to Ax = b with m equations and n unknown variables (m ≠ n). (See, for example, p. 147 in Simon & Blume (1994).)
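Cramer's rule for the square, nonsingular case can be written out in a few lines. The sketch below uses exact rational arithmetic from the standard library and assumes integer (or `Fraction`) entries; the names `det` and `cramer` and the 2 × 2 system (2x₁ + x₂ = 5, x₁ + 3x₂ = 10) are made up for the example. The rule is convenient for small systems by hand, but elimination methods are normally preferred in practice.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    """Solve Ax = b for square A (integer or Fraction entries) with det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("coefficient matrix is singular; Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # B_i: the matrix A with its ith column replaced by b.
        B_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(B_i), d))
    return solution

A = [[2, 1],
     [1, 3]]
b = [5, 10]
x = cramer(A, b)
print(x)
```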

Exercise 11 Use Cramer’s rule to solve the system of equations:

2x₁ − 3x₂ = 2

4x₁ − 6x₂ + x₃ = 7

x₁ + 10x₂ = 1

4.4 Eigenvectors and Eigenvalues

• Eigenvalues: an eigenvalue (or characteristic value) of a square matrix $A_{n \times n}$ is a number λ solving

|A − λI| = 0. (2)

— the left-hand side of (2) is an nth order polynomial in λ, and so it has n real or complex (maybe repeated) roots.¹⁰

— if we subtract one of the eigenvalues of A from each diagonal entry of A, the resulting matrix is singular.

— if a + bi is a complex eigenvalue of a real matrix A, then its conjugate a − bi must be another eigenvalue.

— let λ₁, ..., λₙ be the eigenvalues of $A_{n \times n}$; then

∗ $\sum \lambda_i = \sum a_{ii}$ (the trace of A);

∗ $\prod \lambda_i = |A|$.

— A is singular if at least one of its eigenvalues is zero; if all eigenvalues are nonzero, then A must be nonsingular.

¹⁰ The fundamental theorem of algebra confirms that an nth order polynomial equation has exactly n (real or complex) roots (maybe repeated).



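Example 7 Find the eigenvalues of

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -3 \\ 1 & 3 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & -1 \\ 1 & 3 \end{pmatrix}$$

respectively. From

$$|A - \lambda_A I| = \lambda_A^2 - 4\lambda_A + 2 = 0,$$

$$|B - \lambda_B I| = \lambda_B^2 - 4\lambda_B + 6 = 0,$$

$$|C - \lambda_C I| = \lambda_C^2 - 4\lambda_C + 4 = 0,$$

we solve

$$\lambda_A = 2 \pm \sqrt{2}, \quad \lambda_B = 2 \pm \sqrt{2}\,i, \quad \text{and} \quad \lambda_C = 2.$$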
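For a 2 × 2 matrix the characteristic equation is λ² − (trace)λ + (determinant) = 0, so the eigenvalues follow from the quadratic formula. The sketch below applies this to the three matrices of Example 7. The function name `eigenvalues_2x2` is an assumption of the example, and `cmath.sqrt` is used so that a negative discriminant yields the complex pair.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of lambda^2 - trace*lambda + det = 0 for a 2x2 matrix M."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

A = [[1, 1], [1, 3]]
B = [[1, -3], [1, 3]]
C = [[1, -1], [1, 3]]

for name, M in (("A", A), ("B", B), ("C", C)):
    lam1, lam2 = eigenvalues_2x2(M)
    # The sum equals the trace and the product equals the determinant.
    print(name, lam1, lam2, lam1 + lam2, lam1 * lam2)
```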

• Eigenvectors: an eigenvector v of A associated with its eigenvalue λ is a non-zero

solution to

(A− λI)v = 0 or Av = λv. (3)

— notice that, since A − λI is singular, (3) has infinitely many solutions. One can

pick any one different from the null vector to be the eigenvector associated with

eigenvalue λ.

— if we have complex eigenvalues, their associated eigenvectors are also complex.

— suppose λ₁, ..., λₙ are eigenvalues of A and v₁, ..., vₙ are the corresponding eigenvectors. Let the matrix P = [v₁ ··· vₙ]. Then

$$AP = [Av_1 \cdots Av_n] = [\lambda_1 v_1 \cdots \lambda_n v_n] = [v_1 \cdots v_n] \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}.$$

So

$$P^{-1}AP = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}$$

whenever P is invertible. This is the standard procedure to diagonalize a matrix, provided we can find such an invertible P (i.e., n linearly independent eigenvectors).¹¹

¹¹ When we do not have enough linearly independent eigenvectors (which may occur only when we have repeated eigenvalues), we need to introduce generalized eigenvectors v which satisfy (A − λI)v ≠ 0 but (A − λI)ᵐv = 0 for some m > 1. And A can only be “almost” diagonalized. See the detailed procedure, for example, on p. 601 in Simon & Blume (1994).



— conversely, if P−1AP is a diagonal matrix D, then the columns of P must be

eigenvectors of A and the diagonal entries of D must be eigenvalues of A.

— for k ≤ n distinct eigenvalues of A, their associated eigenvectors must be linearly

independent.

• Symmetric matrices:

— if A is symmetric, then all of its eigenvalues are real.

— eigenvectors corresponding to distinct eigenvalues are orthogonal (more than linearly independent) (i.e., for λᵢ ≠ λⱼ, vᵢ · vⱼ = 0).

— if A is symmetric, then there exists an orthogonal matrix P (meaning that P⁻¹ = Pᵀ, so more than invertible) such that

PᵀAP = D

where D is a diagonal matrix with A’s eigenvalues as its diagonal entries, and the columns of P are mutually orthogonal eigenvectors of A.

Exercise 12 (i) Prove $\sum \lambda_i = \sum a_{ii}$ and $\prod \lambda_i = |A|$ for 2 × 2 matrices.

(ii) Let

$$A = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}.$$

Verify all properties related to symmetric matrices.

(iii) Show that the eigenvalues of a symmetric matrix are real in the 2 × 2 case.

4.5 Quadratic forms

• A quadratic form is a particular function on Rⁿ defined as follows:

$$Q(x) = x^T A x = \sum_{i,j=1}^{n} a_{ij}x_i x_j$$

where A is a symmetric n × n matrix and x = (x₁, ..., xₙ)ᵀ. In the following, we also call A a quadratic form.

• A symmetric n × n matrix (or a quadratic form) A is said to be

— positive (negative) definite if xᵀAx > (<) 0 for any nonzero x ∈ Rⁿ.

— positive (negative) semidefinite if xᵀAx ≥ (≤) 0 for any nonzero x ∈ Rⁿ.

— indefinite if neither of the above holds.



• Practical way I to identify definiteness:

— A is positive (negative) definite iff all its eigenvalues are positive (negative);

— A is positive (negative) semidefinite iff all its eigenvalues are nonnegative (nonpositive).

The proof is very simple. For symmetric A, there exists an orthogonal matrix P such that PᵀAP = D, where D is a diagonal matrix with A’s eigenvalues as its diagonal entries, and so A = PDPᵀ. Then

$$x^T A x = (P^T x)^T D (P^T x) = \sum_{i=1}^{n} \lambda_i z_i^2$$

where $z_i = (P^T x)_i$. Our results follow immediately.

• Practical way II to identify definiteness:

— A kth order principal submatrix of A is formed by deleting n− k columns and the

same n− k rows from A. Its determinant is called a kth order principal minor of

A.

— The kth order leading principal submatrix of A is formed by deleting the last n − k columns and rows from A. Its determinant is called the kth order leading principal minor of A.

— Definiteness:

∗ A is positive definite iff all its n leading principal minors are strictly positive.

∗ A is negative definite iff its n leading principal minors alternate in sign as follows:

|A₁| < 0, |A₂| > 0, |A₃| < 0, etc.

That is, the kth order leading principal minor has the sign of (−1)ᵏ.

— Semidefiniteness:

∗ A is positive semidefinite iff every principal minor of A is ≥ 0.

∗ A is negative semidefinite iff every principal minor of odd order is ≤ 0 and every principal minor of even order is ≥ 0.

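Example 8 Examine the definiteness or semidefiniteness of the following matrices by using both methods:

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -3 & 1 \\ 0 & 1 & -2 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 5 \\ 0 & 5 & 6 \end{pmatrix}.$$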
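The leading-principal-minor test (practical way II) is mechanical enough to script. The sketch below applies it to the three matrices of Example 8. The names `det`, `leading_principal_minors` and `classify` are assumptions of the example, and the function only decides strict definiteness: as stated above, semidefiniteness requires checking every principal minor, not just the leading ones, so a matrix that fails both strict tests is simply reported as not definite.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_principal_minors(A):
    """Determinants of the top-left k x k submatrices, k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def classify(A):
    """Strict definiteness of a symmetric matrix via leading principal minors."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    # Negative definite: the kth minor has the sign of (-1)^k.
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"

A = [[1, 2], [2, 5]]
B = [[-1, 1, 0], [1, -3, 1], [0, 1, -2]]
C = [[1, 2, 0], [2, 4, 5], [0, 5, 6]]

for name, M in (("A", A), ("B", B), ("C", C)):
    print(name, leading_principal_minors(M), classify(M))
```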




• Definiteness on a subspace {x : Bx = 0}, where B is an m × n matrix and m < n.

Let us construct a bordered matrix

$$H = \begin{pmatrix} 0 & B \\ B^T & A \end{pmatrix}.$$

— if |H| has the same sign as (−1)ⁿ and if the last n − m leading principal minors alternate in sign,¹² then A is negative definite on the subspace {x : Bx = 0}.

— if the last n − m leading principal minors have the same sign as (−1)ᵐ, then A is positive definite on the subspace.

A Appendix

A.1 Basic logics

• Given a proposition “statement P implies statement Q” or simply

P =⇒ Q,

its converse is

Q =⇒ P.

Notice that if a proposition is true, its converse need not be. For example, “x > 0 =⇒ x² > 0” is true but “x² > 0 =⇒ x > 0” is not.

• Given a proposition P =⇒ Q, its contrapositive is

∼ Q =⇒ ∼ P,

where “∼” means the negation of a statement. It says, if Q does not hold, then P does not hold either. Notice that if a proposition is true, then its contrapositive must be true, and vice versa. That is, a proposition is logically equivalent to its contrapositive.

• If a statement involves a universal quantifier (e.g., “all swans are white”), then its negation involves an existential quantifier (e.g., “there exists one swan which is not white”): to deny the truth of a universal statement only requires us to find just one case where the statement fails.

• Similarly, the negation of an existential statement involves a universal quantifier: to deny that there is at least one case where the proposition holds requires us to show that the proposition fails in every case.

¹² Let Aₖ be the submatrix of A with the first k rows and k columns. Then the last n − m leading principal minors correspond to Aₘ₊₁, ..., Aₙ, respectively.



• Examples of negations:

The negation of

“∀ x ∈ A, f(x) > 0.”

is

“∃ x ∈ A such that f(x) ≤ 0. ”

The negation of

“∀ x ∈ A, ∃ y ∈ B such that f(x, y) > 0.”

is

“∃ x ∈ A such that for any y ∈ B, f(x, y) ≤ 0.”

• Sufficient and necessary conditions:

— suppose P =⇒ Q. Then P is a sufficient condition for Q (i.e., P is sufficient for Q to be true), and Q is a necessary condition for P (i.e., for P to be true, Q must necessarily be true).

— suppose P ⇐⇒ Q, or P holds if and only if Q holds. Then P is a sufficient and necessary condition for Q to be true, or the two statements P and Q are equivalent.

A.2 Proofs

• Direct proofs: A =⇒ · · · =⇒ B

• Contradiction (via the contrapositive): to prove A =⇒ B, show ∼ B =⇒ ∼ A

• Mathematical induction:

Step 1: show that the statement is true when n = 1.

Step 2: assuming the statement is true for n = k, show that it is also true for n = k + 1.

Then we can conclude that the statement is true for any n ≥ 1.

Notice that this method is only suitable for proving propositions about the integers or indexed by the integers.

Example 9 Show 1 + 2 + ··· + n = n(n + 1)/2.



A.3 Useful equalities and inequalities

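• Binomial equality:

$$(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^{n-i}b^i = a^n + \binom{n}{1}a^{n-1}b + \binom{n}{2}a^{n-2}b^2 + \cdots + \binom{n}{n-1}ab^{n-1} + b^n$$

• $$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

• For |x| < 1,

$$\sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}$$

• For x ≠ 0,

$$e^x > 1 + x,$$

and for x > 0 with x ≠ 1,

$$\ln x < x - 1.$$

• Arithmetic-Geometric mean inequality: for nonnegative $x_i$ and positive $\alpha_i$,

$$\sqrt[\alpha]{\prod_{i=1}^{n} x_i^{\alpha_i}} \le \frac{\sum_{i=1}^{n} \alpha_i x_i}{\alpha}$$

where $\alpha = \sum_{i=1}^{n} \alpha_i$. In particular, we have

$$\sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \cdots + x_n}{n}.$$

• Harmonic inequality: for positive $x_i$,

$$\frac{n}{\sum_{i=1}^{n} 1/x_i} \le \frac{\sum_{i=1}^{n} x_i}{n}$$

• Triangle inequality:

|x − y| ≤ |x − z| + |z − y|

• Bernoulli inequality:

(1 + x)ⁿ > 1 + nx for x > −1, x ≠ 0, and any integer n ≥ 2.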



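• Cauchy-Schwarz inequality:

$$\left( \sum_{i=1}^{n} x_i y_i \right)^2 \le \sum_{i=1}^{n} x_i^2 \cdot \sum_{i=1}^{n} y_i^2$$

• Young’s inequality: for positive a, b, p, and q,

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}$$

if $\frac{1}{p} + \frac{1}{q} = 1$.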

Exercise 13 (i) Show the harmonic inequality by using the A-G inequality.

(ii) Verify the Cauchy-Schwarz inequality when n = 2.

(iii) Prove Young’s inequality.
