
MATH10202 Linear Algebra A 2019-20

These notes accompany the part of the course lectured by Mark Kambites.

What is Linear Algebra and Why Do I Need It?

Linear means “to do with lines”. Linear algebra is the algebra of linear equations, which are equations whose solution sets (when drawn in space) are lines, and higher dimensional analogues of lines called linear subspaces. Linear equations can be concisely and elegantly expressed using algebraic objects called matrices, and the first part of the course is mostly concerned with these. The subject is important for both pure mathematics and applications:

• Linear algebra expresses some of the fundamental objects of geometry in a formal algebraic way. It allows us to use equational reasoning to understand geometry.

• Many real-world problems (both those of a geometric flavour and others) are modelled with linear algebra, which provides a powerful toolkit to solve these problems.

Practicalities

Lecturer (first half). Professor Mark Kambites (email [email protected]).

Notes and Lectures. Notes for Mark’s part of the course will be provided with gaps for you to complete in lectures, on Mark’s webpage at

personalpages.manchester.ac.uk/staff/Mark.Kambites/la.php

The notes form the definitive content of this part of the course, and you should expect to refer to them frequently. If you need the notes in a different format due to a disability, please just let me know. The lectures will explain the same material, but sometimes more informally.

Exercises. Exercise sheets will be handed out in lectures, and contain instructions on when to do the exercises and what to hand in. They are an essential part of learning, so if you want to do well in the course you need to schedule time to attempt all of them (not just the ones for handing in!). Solutions will be made available when you have had a chance to attempt the questions yourself.

Office Hour. My office is Alan Turing 2.137. My office hour will generally be 15:30-16:30 on Tuesdays during teaching weeks; it may sometimes be necessary to change this, in which case details will be posted on my website, so please check there before making a special journey. If you can’t make it to my office hour but need a personal meeting then please email or ask after a lecture for an appointment.

Supervisions and Homework. The exercise sheets tell you which exercises to hand in for which weeks; your supervision group leader will tell you exactly when and where to hand in. Attendance at supervisions, and handing in homework, is compulsory. Please make sure you arrive on time with the exercise sheets; group leaders may mark you absent if you come late or not properly prepared.

Assessment. The assessment for the course comprises homework and supervision attendance (10%) and a final exam (90%). The supervisions/homework for the first 6 weeks and Section A of the exam will cover my half of the course.

Feedback. Please let me know how you are finding the course, and especially if you think there are any problems with it. Feel free to speak to me after a lecture, email me, come along in my office hour, or even slip an anonymous note under my office door!

Books. The course is self-contained and full notes will be supplied, so you should not need to refer to any books. But if you would like an alternative viewpoint, the course webpage contains some suggested texts.


Prerequisite Material

This course builds directly upon MATH10101 Foundations of Pure Mathematics from last semester. You will need to understand many of the concepts from that course, including:

• proofs (including by contradiction and induction);

• sets (ways of defining them, cardinalities);

• functions (and their properties such as injectivity and surjectivity);

• modular arithmetic.

If you can’t remember what the above terms mean, you should go back to your 10101 notes and reread the definitions. Exercise Sheet 0 gives a chance to practice working with these.

Remark. You’ll discover this semester that mathematics at university level is not an “examine-and-forget” subject. Each stage builds upon what you have learnt before, and demands progressively deeper understanding of previous material. Your main aim in studying this course — and my aim in lecturing it — is therefore not the exam, but a deep understanding of the material which will serve you well next year and beyond. If together we can achieve that, the exam will take care of itself!

1 Matrices

Matrices are a useful and elegant way to study linear equations. For teaching purposes it is actually easier to introduce matrices first (which we do in this chapter) and linear equations afterwards (in the next chapter).

Definition. A matrix is a rectangular array of real numbers¹, for example:

A = \begin{pmatrix} 2 & 5 \\ \sqrt{3} & 0.8 \end{pmatrix} \quad or \quad B = \begin{pmatrix} 1 & 6 & 3 \\ 2 & 1 & 9 \end{pmatrix} \quad or \quad C = \begin{pmatrix} 1 \\ 2 \\ -3.7 \\ 4 \end{pmatrix}.

Size. A matrix has clearly defined numbers of rows and columns, which together make up its size. A matrix with p rows and q columns is called a p × q matrix. Of the above examples:

• A is a 2× 2 matrix (2 rows and 2 columns – we say that it is square);

• B is a 2× 3 matrix (2 rows and 3 columns);

• C is a 4× 1 matrix (4 rows and 1 column).

Matrix Entries. If a matrix is called A, we write A_{ij} to denote the entry in the ith row and jth column of A. For example, in the matrix A above, the entries are A_{11} = 2, A_{12} = 5, A_{21} = \sqrt{3} and A_{22} = 0.8. (Some people write a_{ij} instead of A_{ij}.)

¹ For now! Later we shall see that matrices (“matrices” is the plural of “matrix”) can also have other things in them, such as complex numbers or even more abstract mathematical objects, and much of what we are doing will still work. The objects which form the entries of a matrix are called the scalars (so for now “scalar” is just another word for “real number”).


Equality of Matrices. Two matrices A and B are equal if and only if

• they have the same number of rows and

• they have the same number of columns and

• corresponding entries are equal, that is A_{ij} = B_{ij} for all appropriate² i and j.

1.1 Addition of Matrices

Two matrices can be added if they have the same number of rows (as each other) and the same number of columns (as each other). If so, to get each entry of A + B we just add the corresponding entries of A and B. Formally, A + B is the matrix with entries given by

(A + B)_{ij} = A_{ij} + B_{ij}

for all i and j.

Example. The matrices A = \begin{pmatrix} 2 & 1 \\ 6 & 1 \\ 7 & 1 \end{pmatrix} and B = \begin{pmatrix} 7 & 2 \\ 1 & 4 \\ 3 & 9 \end{pmatrix} are both 3 × 2 matrices, so they can be added. The sum is

A + B = \begin{pmatrix} 2+7 & 1+2 \\ 6+1 & 1+4 \\ 7+3 & 1+9 \end{pmatrix} = \begin{pmatrix} 9 & 3 \\ 7 & 5 \\ 10 & 10 \end{pmatrix}.

1.2 Scaling of Matrices

Any matrix can be multiplied by any scalar (real number), to give another matrix of the same size. To do this, just multiply each entry of the matrix by the scalar. Formally, if A is a matrix and λ ∈ R then the matrix λA has entries given by

(λA)_{ij} = λ(A_{ij})

for all i and j.

Example. 7 \begin{pmatrix} -1 & 4 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 7 \times (-1) & 7 \times 4 \\ 7 \times 3 & 7 \times 2 \end{pmatrix} = \begin{pmatrix} -7 & 28 \\ 21 & 14 \end{pmatrix}.

Notation. We write −A as a shorthand for (−1)A.

1.3 Subtraction of Matrices

Two matrices of the same size (the same number of rows and same number of columns) can be subtracted. Again, the operation is performed by subtracting corresponding components. Formally,

(A − B)_{ij} = A_{ij} − B_{ij}.

Example. Let A = \begin{pmatrix} 2 & 7 \\ 7 & -1 \\ -1 & 2 \end{pmatrix} and B = \begin{pmatrix} -1 & 3 \\ 0 & 4 \\ 2 & 0 \end{pmatrix}.

Since both are 3 × 2 matrices, the subtraction A − B is meaningful and we have

A − B = \begin{pmatrix} 2-(-1) & 7-3 \\ 7-0 & -1-4 \\ -1-2 & 2-0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & -5 \\ -3 & 2 \end{pmatrix}

² By “appropriate” I mean, of course, that i and j have the right values to be “indices” (“indices” is the plural of “index”) into the matrix: in other words, i is an integer between 1 and the number of rows, and j is an integer between 1 and the number of columns. Once the size of a matrix is known it should be obvious what numbers are allowed to be row and column indices. For brevity I will often just write “for all i and j” to mean “for all integers i between 1 and the number of rows in the matrix and j between 1 and the number of columns in the matrix”.


Proposition 1.1. For any matrices A and B of the same size p × q, we have

A − B = A + (−B).

Proof. Notice first that (−B) = (−1)B is also a p × q matrix, so the sum A + (−B) makes sense. Now for each i and j we have

(A + (−B))_{ij} = A_{ij} + (−B)_{ij} = A_{ij} + (−1)B_{ij} = A_{ij} − B_{ij} = (A − B)_{ij}.

So A − B and A + (−B) are the same size, and agree in every entry, which means they are equal.
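The entrywise definitions above translate directly into code. Here is a minimal Python sketch (an illustration, not part of the original notes) of addition, scaling and subtraction for matrices represented as lists of rows, checked against Proposition 1.1 on the matrices from the subtraction example:

A = [[2, 7], [7, -1], [-1, 2]]
B = [[-1, 3], [0, 4], [2, 0]]

def add(X, Y):
    # (X + Y)_ij = X_ij + Y_ij; X and Y must have the same size.
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

def scale(lam, X):
    # (lam X)_ij = lam * X_ij.
    return [[lam * x for x in row] for row in X]

def sub(X, Y):
    # (X - Y)_ij = X_ij - Y_ij.
    return [[X[i][j] - Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

assert sub(A, B) == add(A, scale(-1, B))   # Proposition 1.1: A - B = A + (-B)
print(sub(A, B))                           # [[3, 4], [7, -5], [-3, 2]]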

1.4 Multiplication of Matrices

Two matrices A and B can, under certain circumstances, be multiplied together to give a product AB. The condition for this to be possible is that the number of columns in A should equal the number of rows in B.

If A is m× n and B is n× p then AB can be formed and will be m× p.

A (m × n)  ×  B (n × p)  =  AB (m × p)

The entry in the ith row and the jth column of AB depends on the entries in the ith row of A and the entries in the jth column of B. Formally,

(AB)_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + \cdots + A_{in}B_{nj} = \sum_{k=1}^{n} A_{ik}B_{kj}.

The way entries of the factors combine to form entries of the product can be visualised as follows:

[Diagram: the i-th row of A, with entries A_{i1}, A_{i2}, A_{i3}, ..., A_{in}, is paired with the j-th column of B, with entries B_{1j}, B_{2j}, B_{3j}, ..., B_{nj}, to produce the element (AB)_{ij} in the i-th row and j-th column of the product.]

Example. Consider the matrices

A = \begin{pmatrix} 2 & 1 \\ -3 & 7 \\ 1 & 5 \end{pmatrix} and B = \begin{pmatrix} 5 \\ -2 \end{pmatrix}.

A is 3 × 2 and B is 2 × 1. As the number of columns in A is the same as the number of rows in B, AB exists. AB will be 3 × 1. The three entries of AB are computed as follows:

• The top entry of AB will be computed from the top row of A and the single column of B: it is 2 × 5 + 1 × (−2) = 8.

• The middle entry of AB will be computed from the middle row of A and the single column of B: it is −3 × 5 + 7 × (−2) = −29.

• The bottom entry of AB will be computed from the bottom row of A and the single column of B: it is 1 × 5 + 5 × (−2) = −5.


So AB = \begin{pmatrix} 8 \\ -29 \\ -5 \end{pmatrix}.

Example. Consider the matrices

C = \begin{pmatrix} 2 & 1 \\ -5 & 3 \end{pmatrix} and D = \begin{pmatrix} 4 & -2 \\ 3 & 7 \end{pmatrix}.

Both C and D are 2 × 2. As the number of columns in C equals the number of rows in D, CD exists, and it will also be 2 × 2.

CD = \begin{pmatrix} 2 & 1 \\ -5 & 3 \end{pmatrix}\begin{pmatrix} 4 & -2 \\ 3 & 7 \end{pmatrix} = \begin{pmatrix} 2 \times 4 + 1 \times 3 & 2 \times (-2) + 1 \times 7 \\ -5 \times 4 + 3 \times 3 & (-5) \times (-2) + 3 \times 7 \end{pmatrix} = \begin{pmatrix} 11 & 3 \\ -11 & 31 \end{pmatrix}

Example. Consider the matrices

E = \begin{pmatrix} 2 & 6 & -4 & 3 \\ 1 & 2 & 8 & -5 \\ -3 & 7 & 2 & 9 \\ 1 & 2 & 2 & 4 \\ -2 & 0 & 7 & 4 \end{pmatrix} and F = \begin{pmatrix} 7 & 8 & -1 \\ -1 & 4 & 1 \\ 2 & -3 & 2 \\ 2 & 2 & -2 \end{pmatrix}.

Then EF will be a 5 × 3 matrix, and for example the entry (EF)_{42} can be calculated using the 4th row of E, which is \begin{pmatrix} 1 & 2 & 2 & 4 \end{pmatrix}, and the 2nd column of F, which is \begin{pmatrix} 8 \\ 4 \\ -3 \\ 2 \end{pmatrix}, so:

(EF)_{42} = 1 \times 8 + 2 \times 4 + 2 \times (-3) + 4 \times 2 = 18.

Tip. The only way to master matrix operations — especially multiplication — is to practice! When you have finished the examples on Exercise Sheet 1, try making up examples of your own and multiplying them. Get a friend to do the same ones and check you get the same answer.
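For checking your practice examples, here is a minimal Python sketch (an illustration, not part of the original notes) of the defining formula (AB)_{ij} = \sum_k A_{ik}B_{kj}, verified against the 2 × 2 example above:

def matmul(A, B):
    # Requires: the number of columns of A equals the number of rows of B.
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A)
    # (AB)_ij = sum over k of A_ik * B_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

C = [[2, 1], [-5, 3]]
D = [[4, -2], [3, 7]]
print(matmul(C, D))   # [[11, 3], [-11, 31]], as computed by hand above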

Exercise. Prove that if A and B are square matrices (same number of rows and columns) of the same size as each other, then AB is always defined. What size will AB be?

1.5 Properties of Matrix Operations

The operations on matrices share a lot of properties with the corresponding operations on numbers. The following theorem gives some of the most notable ones:

Theorem 1.2. Let A, B and C be matrices such that the given operations are defined, and λ and µ be numbers. Then

(i) A+B = B +A;

(ii) A+ (B + C) = (A+B) + C;

(iii) A(BC) = (AB)C;


(iv) A(B + C) = AB +AC;

(v) (B + C)A = BA+ CA;

(vi) A(B − C) = AB −AC;

(vii) (B − C)A = BA− CA;

(viii) λ(B + C) = λB + λC;

(ix) λ(B − C) = λB − λC;

(x) (λ+ µ)C = λC + µC;

(xi) (λ− µ)C = λC − µC;

(xii) λ(µC) = (λµ)C;

(xiii) λ(BC) = (λB)C = B(λC).

Proof. We prove (i) and (iv) to exemplify the methods. Some of the other proofs are on Exercise Sheet 1.

(i) For A + B (and B + A) to be defined, A and B must be the same shape, say m × n. Now A + B and B + A are both m × n, and for each i and j we have

(A + B)_{ij} = A_{ij} + B_{ij} = B_{ij} + A_{ij} = (B + A)_{ij}.

So A + B and B + A are the same shape with exactly the same entries, which means A + B = B + A.

(iv) For the given operations to be defined, A must be an m × r matrix, and B and C both r × n matrices. Now A(B + C) and AB + AC are both m × n matrices, and for each i and j we have

[A(B + C)]_{ij} = \sum_{l=1}^{r} A_{il}(B + C)_{lj} = \sum_{l=1}^{r} A_{il}(B_{lj} + C_{lj}) = \sum_{l=1}^{r} A_{il}B_{lj} + \sum_{l=1}^{r} A_{il}C_{lj} = (AB)_{ij} + (AC)_{ij} = (AB + AC)_{ij}.

Remark. Property (i) in Theorem 1.2 is called the commutativity law for matrix addition; notice that there isn’t a corresponding statement for matrix multiplication - see Exercise Sheet 1. Properties (ii) and (iii) are associativity laws (for matrix addition and matrix multiplication respectively). Properties (iv)-(xi) are distributivity laws.

1.6 Diagonal Matrices, Identities and Inverses

The Main Diagonal. If A is a square matrix (say n × n) then the main diagonal (sometimes just called the diagonal) is the diagonal from top left to bottom right of the matrix. In other words, it is the collection of entries whose row number is the same as their column number.

A matrix is called diagonal if all the entries not on the diagonal are 0. (Some, or even all, of the diagonal entries may also be 0.) For example,

\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

are diagonal matrices but


\begin{pmatrix} -2 & 0 \\ 5 & 2 \end{pmatrix}

is not diagonal because of the 5 which lies off the main diagonal.

Identity Matrices. A square matrix with 1’s on the main diagonal and 0’s everywhere else is called an identity matrix. The n × n identity matrix is denoted I_n. For example:

I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} while I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

It has the important property that

AI_n = A and/or I_nA = A

whenever A is the right size for these products to be defined. (Exercise: prove this.)

Inverses. Let A be an n × n square matrix. An inverse of A is an n × n matrix A^{-1} such that

AA^{-1} = A^{-1}A = I_n.

Not every matrix has an inverse. Matrices with an inverse are called invertible; those which are not invertible are called singular.

Exercise. What is the inverse of the identity matrix I_n?

Lemma 1.3. (i) Suppose A is an invertible n× n matrix. Then the inverse of A is unique.

(ii) If E and D are both invertible n× n matrices then ED is invertible.

Proof. (i) Suppose B and C are both inverses of A. Then AB = BA = I_n and AC = CA = I_n. Now using Theorem 1.2(iii):

B = I_nB = (CA)B = C(AB) = CI_n = C.

(ii) Let E^{-1} and D^{-1} be inverses of E and D respectively. Then again by Theorem 1.2(iii):

(ED)(D^{-1}E^{-1}) = E(DD^{-1})E^{-1} = EI_nE^{-1} = EE^{-1} = I_n.

A similar argument (exercise!) gives (D^{-1}E^{-1})(ED) = I_n. This shows that the matrix D^{-1}E^{-1} is an inverse of ED (and so by part (i), the inverse of ED).

Notation. If A is an invertible matrix, we write A^{-1} for its inverse.

There is a simple way to describe inverses of 2× 2 matrices:

Theorem 1.4. Let

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

be a 2 × 2 matrix. Then A is invertible if and only if ad − bc ≠ 0, and if it is, then

A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.


Remark. The value ad − bc is a very important number associated to the matrix A. It is called the determinant of A, and we will see later (Chapter 4) how to generalise the idea to bigger matrices.
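To illustrate the theorem, here is a quick worked instance (not in the original notes): taking

A = \begin{pmatrix} 2 & 5 \\ 1 & 3 \end{pmatrix}, \quad ad - bc = 2 \times 3 - 5 \times 1 = 1 \neq 0, \quad A^{-1} = \frac{1}{1}\begin{pmatrix} 3 & -5 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & -5 \\ -1 & 2 \end{pmatrix},

and multiplying out AA^{-1} and A^{-1}A confirms that both equal I_2.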

Exercise. Determine which of the following matrices are invertible, and for those which are find the inverse:

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \begin{pmatrix} -1 & 3 \\ -2 & 6 \end{pmatrix} \quad \begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix}.

The Zero Matrix. A matrix is called a zero matrix if all of its entries are 0. For example:

\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} or \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

Clearly there is exactly one zero matrix of each size: we write 0_{p×q} for the p × q matrix all of whose entries are 0, so those above are 0_{4×1} and 0_{2×2}.

1.7 Transpose and Symmetry

The transpose of a p × q matrix is the q × p matrix obtained by swapping over the rows and columns. The transpose of a matrix A is written A^T, and formally it is the q × p matrix with entries given by

(A^T)_{ij} = A_{ji}

for all i and j. For example:

\begin{pmatrix} 1 \\ 7 \\ 2 \\ 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 7 & 2 & 4 \end{pmatrix} while if A = \begin{pmatrix} 2 & 6 \\ -1 & 3 \\ 1 & 1 \end{pmatrix} then A^T = \begin{pmatrix} 2 & -1 & 1 \\ 6 & 3 & 1 \end{pmatrix}.

Theorem 1.5. Let A and B be matrices such that the given operations are defined. Then:

(i) (A^T)^T = A;

(ii) (A + B)^T = A^T + B^T;

(iii) (AB)^T = B^TA^T.

Proof.

(i) Clear.

(ii) For A + B to be defined we need A and B to have the same size, say m × n, in which case (A + B)^T and A^T + B^T are both defined and n × m. Now for each i and j,

((A + B)^T)_{ij} = (A + B)_{ji} = A_{ji} + B_{ji} = (A^T)_{ij} + (B^T)_{ij} = (A^T + B^T)_{ij}.

So (A + B)^T and (A^T + B^T) are the same size with the same entries, which means they are equal.

(iii) For AB to be defined it must be that A is m × p and B is p × n, in which case A^T is p × m and B^T is n × p, so that B^TA^T is defined. It is easy to check that (AB)^T and B^TA^T are both n × m. Now for 1 ≤ i ≤ n and 1 ≤ j ≤ m we have³

((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^{p} A_{jk}B_{ki} = \sum_{k=1}^{p} (A^T)_{kj}(B^T)_{ik} = \sum_{k=1}^{p} (B^T)_{ik}(A^T)_{kj} = (B^TA^T)_{ij}.

³ This is a case where just saying “appropriate i and j” might not be entirely clear, so we are careful to specify!


Theorem 1.6. If A is an invertible matrix then A^T is invertible and (A^T)^{-1} = (A^{-1})^T.

Proof. Using Theorem 1.5(iii),

A^T(A^{-1})^T = (A^{-1}A)^T = (I_n)^T = I_n

(A^{-1})^TA^T = (AA^{-1})^T = (I_n)^T = I_n

Definition. A matrix A is called symmetric if A^T = A, or skew symmetric if A^T = −A.

Example.

\begin{pmatrix} 1 & 9 & 4 \\ 9 & 3 & -2 \\ 4 & -2 & 7 \end{pmatrix} is symmetric. \begin{pmatrix} 0 & 9 & 4 \\ -9 & 0 & -2 \\ -4 & 2 & 0 \end{pmatrix} is skew symmetric.

Remark. Notice that a matrix which is symmetric or skew symmetric must be square, and a skew symmetric matrix must have 0s on the main diagonal.

Theorem 1.7. (i) If A is any matrix then AA^T and A^TA are symmetric matrices.

(ii) If A is a square matrix then A + A^T is symmetric and A − A^T is skew symmetric.

(iii) If A is invertible and symmetric then A^{-1} is symmetric.

Proof. (i) and (ii) follow easily from Theorem 1.5, and (iii) follows from Theorem 1.6. (Exercise: do this!)


2 Systems of Linear Equations

A system of equations of the form

x + 2y = 7
2x − y = 4

or

5p − 6q + r = 4
2p + 3q − 5r = 7
6p − q + 4r = −2

is called a system of linear equations.

Definition. An equation involving certain variables is linear if each side of the equation is a sum of constants and constant multiples of the variables.

Examples. The equations 2x = 3y + √3, 3 = 0 and x + y + z = 0 are linear.

Non-examples. The equations y = x² and 3xy + z + 7 = 2 are not linear since they involve products of variables.

2.1 Solutions to Systems of Equations

A solution to a system of equations is a way of assigning a value to each variable, so as to make all the equations true at once.

Example. The system of equations . . .

4x − 2y = 22
9x + 3y = 12

. . . has a solution x = 3 and y = −5. (In fact in this case this is the only solution.)

Remark. In general, a system of linear equations can have no solutions, a unique solution, or infinitely many solutions. It is not possible for the system to have, for example, exactly 2 solutions; this is a higher dimensional generalisation of the fact that 2 straight lines cannot meet in exactly 2 points (recall Exercise Sheet 0).

Definition. The set of all solutions is called the solution set or solution space of the system.

Warning. When we talk about solving a system of equations, we mean describing all the solutions (or saying that there are none, if the solution set is empty), not just finding one possible solution!

2.2 2-dimensional Geometry of Linear Equations

What can solution spaces look like? Consider a linear equation in two variables x and y, say:

3x+ 4y = 4.

Each solution consists of an x-value and a y-value. We can think of these as the coordinates of a point in 2-dimensional space. For example, a solution to the above equation is x = −4 and y = 4. This solution gives the point (−4, 4).

In general, the set of solutions to a linear equation with variables x and y forms a line in 2-dimensional space. So each equation corresponds to a line — we say that the equation defines the line.

Systems of Equations. Now suppose we have a system of two such equations. A solution to the system means an x-value and a y-value which solve all the equations at once. In other words, a solution


corresponds to a point which lies on all the lines. So in geometric terms, the solution set to a system of linear equations with variables x and y is an intersection of lines in the plane.

Question. Recall from Exercise Sheet 0 what an intersection of two lines in the plane can look like:

• Usually⁴ it is a single point. This is why 2 equations in 2 variables usually have a unique solution.

• Alternatively, the lines could be distinct but parallel. Then the intersection is the empty set. This is why 2 equations in 2 variables can sometimes have no solutions.

• Or then again, the two lines could actually be the same. In this case the intersection is the whole line. This is why 2 equations in 2 variables can sometimes have infinitely many solutions.

Similarly, if we have a system of more than 2 equations, we are looking for the intersection of all the lines (in other words, the set of points in the plane which lie on all the lines). The intersection of 3 lines is “usually” empty but could also be a single point or a line.

2.3 Higher Dimensional Geometry of Linear Equations

Now suppose we have an equation in the variables x, y and z, say

3x+ 4y + 7z = 4.

This time we can consider a solution as a point in 3-dimensional space. The set of all solutions (assuming there are some) forms a plane in space. The solution set for a system of k linear equations will be an intersection of k planes.

Exercise. What can the intersection of 2 planes in 3-space look like? Try to imagine all the possibilities, as we did for lines in the previous section. How do they correspond to the possible forms of solution sets of 2 equations with 3 variables?

Exercise. Now try to do the same for intersections of 3 planes. (Recall that the intersection of three sets is the set of points which lie in all three). How do the possible geometric things you get correspond to the possible forms of solution sets for 3 equations with 3 variables?

In yet higher dimensions, the solution set to a linear equation in n variables is a copy of (n − 1)-dimensional space inside n-dimensional space. Such a set is called a hyperplane. So geometrically, the solution set of k equations with n variables will be an intersection of k hyperplanes in n-dimensional space.

Remark. Actually it is not quite true to say that every equation defines a hyperplane. Equations like “0 = 0” and “2 = 2” define the whole space; if all equations in a system have this form then the solution space will be the whole space. On the other hand, equations like “0 = 1” have no solutions; if any equation in a system has this form then the solution space will be empty.

Remark. You might be wondering what happens if we drop the requirement that the equations be linear, and allow for example polynomial equations? If we do this then lines, planes and hyperplanes get replaced by more general objects called curves, surfaces and hypersurfaces respectively. For example, the solution set to x² + y² + z² = 1 is a sphere, which is a 2-dimensional surface in 3-space. The solution space to a system of polynomial equations (an intersection of hypersurfaces) is called an (affine) algebraic variety. This part of mathematics is called algebraic geometry; you can study it in your third year.

⁴ Of course “usually” here is a vague term, but I hope you can see intuitively what I mean! If you pick two lines at random it is somehow “infinitely unlikely” that they will be exactly parallel. It is possible to make this intuition precise; this is well beyond the scope of this course but as a (challenging!) exercise you might like to think how you would do so.


2.4 Linear Equations and Matrices

Any system of linear equations can be rearranged to put all the constant terms on the right, and all the terms involving variables on the left. This will yield a system something like . . .

4x − 2y = 3
2x + 2y = 5

. . . so from now on we will assume all systems have this form. The system can then be written in matrix form:

\begin{pmatrix} 4 & -2 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}.

You can check (by multiplying out the matrix equation using the definition of matrix multiplication) that it holds if and only if both of the original (scalar) equations hold. We can express the equations even more concisely as an augmented matrix:

\left(\begin{array}{cc|c} 4 & -2 & 3 \\ 2 & 2 & 5 \end{array}\right).

2.5 Elementary Row Operations and Solving Systems of Equations

At school you learnt to solve systems of linear equations by simple algebraic manipulation. This ad hoc approach is handy for small examples, but many applications involve bigger systems of equations, and for these it is helpful to have a systematic method.

We start with the augmented matrix of our system (see Section 2.4), and we allow ourselves to modify the matrix by certain kinds of steps. We can:

(i) multiply a row by a non-zero scalar (written r_i → λr_i);

(ii) add a multiple of one row to another (written r_i → r_i + λr_j);

(iii) swap two rows of the matrix (written r_i ↔ r_j).

These are called elementary row operations.⁵

Warning. Multiplying a row by 0 is definitely not an elementary row operation!

Definition. Two matrices A and B are called row equivalent if each can be obtained from the other by a sequence of elementary row operations.

Row operations and row equivalence are useful because of the following fact:

Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is row equivalent to N then the two systems have the same solution set.

We’ll prove this theorem later (Chapter 3), but for now let’s explore its consequences. It means that if we start with a system of equations, applying elementary row operations to the augmented matrix will give us another system of equations with the same solution set. The aim of the game is to obtain a system of equations which is easier to solve. For example, consider the system:

First we convert this into augmented matrix form (noting that the “missing” x terms are really 0x and become 0s in the matrix):

⁵ There is an obvious “dual” idea of elementary column operations, written c_i → λc_i, c_i → c_i + λc_j and c_i ↔ c_j.


Now we swap rows 1 and 2 (r_1 ↔ r_2), to get

Next let’s scale rows 1 and 2 to make the first entry in each 1 (r_1 → −r_1, r_2 → \frac{1}{10}r_2):

Next we subtract 10 times row 2 from row 3 (r_3 → r_3 − 10r_2), giving:

Now we convert the augmented matrix back to a system of equations:

It turns out that these equations are easier to solve than the original ones. Equation [3] tells us straight away that z = 3. Now substituting this into equation [2] we get y − \frac{3}{10} × 3 = −\frac{19}{10}, in other words, y = −1. Finally, substituting both of these into equation [1] gives x − 4 × (−1) + 3 = 9, that is, x = 2.

So we have found the solutions to equations [1], [2] and [3]. But these were obtained from our original system of equations by elementary row operations, so (by Theorem 2.1) they are also the solutions to the original system of equations. (Check this for yourself!)

This strategy forms the basis of a very general algorithm, called Gaussian⁶ Elimination, which we shall see in Section 2.7.

2.6 Row Echelon Matrices

Remark. Notice how we converted the matrix into a roughly “triangular” form, with all the entries towards the “bottom left” being 0. It was this property which made the new equations easy to solve. The following definitions will make precise what we mean by “triangular” here.

Pivots. The pivot (or leading entry) of a row in an augmented matrix is the position of the leftmost non-0 entry to the left of the bar. (If all entries left of the bar are 0, we call the row a zero row and it has no pivot.)

Example. In the matrix

A = \left(\begin{array}{ccc|c} 0 & 5 & 1 & 0 \\ 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right)

• the pivot of the first row is the 5;

• the pivot of the second row is the 3;

• the third row is a zero row (it has no pivot).

Row Echelon Matrices. An augmented matrix is called a row echelon matrix if

⁶ After Carl Friedrich Gauß (pronounced to rhyme with “house”), 1777–1855.


(1) any zero rows come at the bottom; and

(2) the pivot of each of the other rows is strictly to the right of the pivot of the row above; and

(3) all the pivots are 1.

A row echelon matrix is a reduced row echelon matrix if in addition:

(4) each pivot is the only non-zero entry in its column.

Example. The matrix A above is not a row echelon matrix, because the second row pivot is not to the right of the first row pivot. The matrix

B = \left(\begin{array}{cccc|c} 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 6 \end{array}\right)

is a row echelon matrix, but it is not a reduced row echelon matrix because the 2 in the first row is in the same column as the second row pivot.

2.7 Gaussian Elimination

Gaussian Elimination is a simple algorithm which uses elementary row operations to reduce the augmented matrix to row echelon form:

(1) By swapping rows if necessary, make sure that no non-0 entries (in any row) are to the left of the first row pivot.

(2) Scale the first row so that the pivot is 1.

(3) Add multiples of the first row to each of the rows below, so as to make all entries below the first row pivot become 0. (Since there were no non-0 entries left of the first row pivot, this ensures all pivots below the first row are strictly to the right of the pivot in the first row.)

(4) Now ignore the first row, and repeat the entire process with the second row. (This will ensure that all pivots below the second row are to the right of the pivot in the second row, which in turn is to the right of the pivot in the first row.)

(5) Keep going like this (with the third row, and so on) until either we have done all the rows, or all the remaining rows are zero.

(6) At this point the matrix is in row echelon form.
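The steps above translate directly into code. The following Python sketch (an illustration under simplifying assumptions, not part of the original notes) reduces an augmented matrix, given as a list of rows, to row echelon form using only the three elementary row operations; exact fractions are used to avoid rounding:

from fractions import Fraction

def row_echelon(M):
    # M is a list of rows of an augmented matrix; the last entry of each
    # row is the right-hand side. Returns a row echelon form of M.
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                          # next row in which to place a pivot
    for c in range(cols - 1):      # never pivot on the right-hand side
        # Step 1: find a row with a non-0 entry in column c and swap it up.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]       # r_r <-> r_pivot
        # Step 2: scale so the pivot is 1.
        M[r] = [x / M[r][c] for x in M[r]]    # r_r -> (1/pivot) r_r
        # Step 3: clear the entries below the pivot.
        for i in range(r + 1, rows):
            lam = M[i][c]
            M[i] = [a - lam * b for a, b in zip(M[i], M[r])]   # r_i -> r_i - lam r_r
        r += 1                     # Steps 4 and 5: move on to the next row
    return M

# The system from Section 2.8: 2x + y - z = -7, 6x - z = -10, -4x + y + 7z = 31.
print(row_echelon([[2, 1, -1, -7], [6, 0, -1, -10], [-4, 1, 7, 31]]))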

Exercise. Go back to the example in Section 2.5, and compare what we did with the procedure above.

2.8 Another Example

Example. Let’s solve the system of equations:

2x + y − z = −7
6x − z = −10
−4x + y + 7z = 31

Solution. First we express the system as an augmented matrix:

Notice (Step 1) that there are no non-0 entries to the left of the pivot in the first row (there can’t be, since the pivot is in the first column!). So we do not need to swap rows to ensure this is the case.


Next (Step 2) we scale to make the pivot in the first row 1, that is, r_1 → \frac{1}{2}r_1, which gives

Next (Step 3) we seek to remove the entry in row 2 which is below the row 1 pivot. We can do this with the operation r_2 → r_2 − 6r_1, giving the matrix:

Similarly, we remove the entry in row 3 below the row 1 pivot by r_3 → r_3 + 4r_1:

Now we ignore the first row, and repeat the whole process with the remaining rows. Notice that there are no pivots to the left of the row 2 leading entry. (The one in row 1 doesn’t count, since we are ignoring row 1!). Now we scale to make the second row pivot into 1, with r_2 → −\frac{1}{3}r_2, giving:

Then remove the entry in row 3 which is below the row 2 pivot, with r_3 → r_3 − 3r_2, giving:

Finally, we repeat the process again, this time ignoring rows 1 and 2. All we have to do now is scale row 3 to make the pivot 1, with r_3 → \frac{1}{7}r_3:

Our matrix is now in row echelon form, so we convert it back to a system of equations:

These equations can easily be solved by “backtracking” through them. Specifically:

• equation [3] says explicitly that z = 4;

• substituting into equation [2] gives y − \frac{2}{3} × 4 = −\frac{11}{3}, so y = −1;

• substituting into equation [1] gives x + \frac{1}{2} × (−1) − \frac{1}{2} × 4 = −\frac{7}{2}, so x = −1.


Thus, the solution is x = −1, y = −1 and z = 4. (You should check this by substituting the values back into the original equations!)

Remark. Notice how the row echelon form of the matrix facilitated the “backtracking” procedure for solving equations [1], [2] and [3].

Exercise. Use Gaussian elimination to solve the following systems of equations:

(i) 4x + y = 9
    2x − 3y = 1

(ii) 2x − 4y = 12
     −x + 2y = −5

2.9 Gauss-Jordan Elimination

Gaussian elimination lets us transform any matrix into a row echelon matrix. Sometimes it is helpful to go further and obtain a reduced row echelon matrix. To do this we use a slight variation called Gauss-Jordan⁷ Elimination:

• First use Gaussian elimination to compute a row echelon form.

• Add multiples of the 2nd row to the row above to remove any non-0 entries above the 2nd row pivot.

• Add multiples of the 3rd row to the rows above to remove any non-0 entries above the 3rd row pivot.

• Continue down the rows, adding multiples of row k to the rows above to eliminate any non-0 entries above the kth row pivot. (A sketch in code is given below.)
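Continuing the sketch from Section 2.7 (again an illustration, not part of the original notes), the backward pass just described clears the entries above each pivot:

def reduced_row_echelon(M):
    # First compute a row echelon form, then work back up the rows.
    M = row_echelon(M)
    for r in range(len(M) - 1, 0, -1):
        # Find the pivot of row r (leftmost non-0 entry left of the bar), if any.
        c = next((j for j, x in enumerate(M[r][:-1]) if x != 0), None)
        if c is None:
            continue
        for i in range(r):
            lam = M[i][c]
            M[i] = [a - lam * b for a, b in zip(M[i], M[r])]   # r_i -> r_i - lam r_r
    return M

# On the Section 2.8 system this goes all the way to x = -1, y = -1, z = 4:
print(reduced_row_echelon([[2, 1, -1, -7], [6, 0, -1, -10], [-4, 1, 7, 31]]))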

Theorem 2.2. Let A be a matrix. Then A is row equivalent to a unique reduced row echelon matrix (called the reduced row echelon form of the matrix).

Proof. It should be clear that applying the Gauss-Jordan elimination algorithm to A will give a reduced row echelon matrix row equivalent to A. The proof of uniqueness is more difficult, and we omit it.

Important Consequence. We can check if two matrices are row equivalent, by applying Gauss-Jordan elimination to compute their reduced row echelon forms, then checking if these are equal. (If the reduced row echelon forms are equal then clearly the original matrices are equivalent. Conversely, if the matrices are equivalent then by Theorem 2.2, they must have the same reduced row echelon form.)

Exercise. Check if

\begin{pmatrix} 1 & 3 & 6 \\ 0 & 1 & 2 \\ 3 & 7 & 14 \end{pmatrix} and \begin{pmatrix} 1 & 0 & 0 \\ 1 & 5 & 10 \\ 1 & 1 & 2 \end{pmatrix}

are row equivalent.

2.10 Elimination with More Variables than Equations

What if, after applying Gaussian elimination and converting back to equations, there are more variables than equations? This can happen either because the system we started with was like this, or because elimination gives us rows of zeros in the matrices: these correspond to the equation “0 = 0” which obviously can’t be used to find the value of any variables.

Suppose there are k equations and n variables, where n > k. We can still apply Gaussian elimination to find a row echelon form. But when we convert back to equations and “backtrack” to find solutions, we will sometimes still encounter an equation with more than one unknown variable.

If this happens, there are infinitely many solutions. We can describe all the solutions by introducing parameters to replace n − k of the variables.

⁷ After the German Wilhelm Jordan (1842–1899) and not the more famous French mathematician Camille Jordan (1838–1922). The correct pronunciation is therefore something like “YOR-dan” (not zhor-DAN!) but most people give up and go with the English pronunciation!


Example. Consider the equations:

−x + 2y + z = 2
3x − 5y − 2z = 10
x − y = 14

The augmented matrix is:

\left(\begin{array}{ccc|c} -1 & 2 & 1 & 2 \\ 3 & -5 & -2 & 10 \\ 1 & -1 & 0 & 14 \end{array}\right)

and reducing to row echelon form we get

\left(\begin{array}{ccc|c} 1 & -2 & -1 & -2 \\ 0 & 1 & 1 & 16 \\ 0 & 0 & 0 & 0 \end{array}\right).

which gives the equations

x − 2y − z = −2 [1]
y + z = 16 [2]
0 = 0 [3]

Equation [3] is always satisfied but clearly useless for finding the values of the variables, so we can ignore it. Equation [2] has two unknown variables (y and z) so we introduce a parameter λ to stand for one of them, say z = λ. Now solving we get y = 16 − λ. Substituting into [1] gives x − 2(16 − λ) − λ = −2, or x = 30 − λ. So the solutions are:

x = 30 − λ, y = 16 − λ, z = λ

for λ ∈ R.

Remark. When we say these are the solutions, we mean that substituting in different values of λ ∈ R will give all the solutions to the equations. For example, λ = 1 would give x = 29, y = 15 and z = 1, so this is one solution to the system of equations (check this!). Or then again, λ = 0 gives x = 30, y = 16 and z = 0, so this is another possible solution.

Exercise. Use Gaussian elimination to solve

2x − 4y = 10
−x + 2y = −5

2.11 Homogeneous Systems of Equations

A system of linear equations is called homogeneous if the constant terms are all 0. For example:

x + 7y − 4z = 0
2x + 4y − z = 0
3x + y + 2z = 0

An obvious observation about homogeneous systems is that they always have at least one solution, given by setting all the variables equal to 0. This is called the trivial solution. Geometrically, this means that the solution space to a homogeneous system always contains the origin.

Forward Pointer. Solution sets to homogeneous systems — that is, intersections of hyperplanes through the origin — are very special and important subsets in R^n, called vector subspaces. More on this later (Chapter 5).

When it comes to solving them, homogeneous systems of equations are treated just like non-homogeneous ones. The only difference is that nothing interesting ever happens on the RHS of the augmented matrix, as the following exercise will convince you!

Exercise. Find solutions to the system of equations above.

Remark. Notice how, in your solution, everything stays “homogeneous” throughout. All the augmented matrices in the elimination process, and all the resulting equations, have only 0’s on the RHS.


2.12 Computation in the Real World

In describing methods to solve equations (Gaussian and Gauss-Jordan elimination) we have implicitly assumed two things:

• that the equations are precisely known; and

• that we can perform the necessary arithmetic with perfect accuracy.

But in reality, these assumptions are often not true. If we want to solve a system of equations because it models a real-world problem, then the coefficients in the equations may have been obtained by measurement, in which case their values are subject to experimental error. And even if the coefficients are known precisely, real-world computers don’t actually store and process real numbers: they can’t, because there is an (uncountably) infinite set of real numbers but even the most powerful digital computer has a finite memory. Instead they work with approximations to real numbers: typical is floating point arithmetic in which numbers are rounded to a certain number of significant figures. At every stage of the computation numbers have to be rounded to the nearest available approximation, introducing slight inaccuracies even if the original data was perfect.
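As a tiny illustration of this rounding (not from the original notes), even one addition in standard double-precision floating point is inexact:

print(0.1 + 0.2)          # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)   # False
# Rounding like this happens at every arithmetic step of an elimination,
# and the small errors can accumulate as the computation progresses.

(This is why exact-fraction arithmetic was used in the earlier sketches.)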

If the starting data or the arithmetic are imperfect then it clearly isn’t realistic to expect perfect solutions. But some methods have the useful property that data which is almost right will produce solutions which are almost right⁸. Other methods do not have this property at all: they may work perfectly in theory, but the tiniest error in the starting data gets magnified as the computation progresses, and leads to a wildly inaccurate answer. Gaussian elimination, in the simple form we have described, is not very good in this respect, but it can be made better with a slightly more sophisticated approach (called partial pivoting) which takes account of the relative size of the matrix entries.

Further consideration of these practical issues is beyond the scope of this course, but you can learn about them in later years, starting with the second year option MATH20602 Numerical Analysis I.

⁸ Or the subtly different property that the solutions produced are exact solutions to almost the right problems. Of course this is an oversimplification; for an accurate explanation see the first few sections of Chapter 1 in the book Accuracy and Stability of Numerical Algorithms (2nd edition) by N. J. Higham, and in particular the distinction between forward error and backward error. (But this is not examinable in this course.)


3 Fields, Elementary Matrices and Calculating Inverses

3.1 Fields

So far we have worked with matrices whose entries are real numbers (and systems of equations whose coefficients and solutions are real numbers). If for a moment we call the set of real numbers K (it will become clear shortly why we’re not just calling it R!), here are a few basic properties of these operations:

(F1) for all a, b, c ∈ K, (a+ b) + c = a+ (b+ c);

(F2) for all a, b ∈ K, a+ b = b+ a;

(F3) there is an element of K (which we call “0”) with the property that a+ 0 = 0 + a = a for all a ∈ K;

(F4) for every a ∈ K there exists an element −a ∈ K such that a+ (−a) = 0;

(F5) for all a, b, c ∈ K, (ab)c = a(bc);

(F6) for all a, b ∈ K, ab = ba;

(F7) there is an element of K (which we call “1” and which is different from 0) with the property that 1a = a1 = a for all a ∈ K;

(F8) for every a ∈ K except 0, there exists an element a−1 ∈ K such that aa−1 = a−1a = 1;

(F9) for all a, b, c ∈ K, a(b+ c) = ab+ ac.

These nine properties are called the field axioms. Check for yourself that they are true for the real numbers. Of course lots of other things are true for the real numbers, so what’s so special about these ones? It turns out these are exactly the properties which we need to make linear algebra “work”. This means we can replace the real numbers with any other type of “scalar” having these properties and things will still work in the same way! For example, the complex numbers (check!) also satisfy the field axioms, so we can do linear algebra where the scalars are complex numbers.

Definition. Let K be a set equipped with operations of addition and multiplication⁹. Then K is called a field if it satisfies the nine field axioms.

Examples. The real numbers satisfy the axioms (that was the whole point!) so R is a field. We have already mentioned that the complex numbers satisfy the axioms, so C is a field as well. Exercise Sheet 3 gives some more examples.

Non-example. The set of 2 × 2 matrices (with real number entries) has operations of addition and multiplication, but is not a field. We saw on Exercise Sheet 1 that it doesn’t satisfy (F6). In fact it satisfies seven of the nine axioms: on Exercise Sheet 3 you can find the other one that it doesn’t satisfy!

Remark. For real numbers (and complex numbers) we have the extra operations of subtraction and division. The definition of a field doesn’t explicitly mention these, but in fact we can use them in any field by defining:

• a − b to be shorthand for a + (−b); and

• a/b and \frac{a}{b} to be shorthand for a(b^{-1}).

Remark. Many other elementary facts about the real numbers can be deduced from the field axioms, which means these facts must be true in all fields. For example, we know that in the real numbers we always have (b + c)a = ba + ca, which is not one of the field axioms. But in fact if a, b and c come from any field then:

(b + c)a = a(b + c) = ab + ac = ba + ca

⁹ Of course, the field axioms don’t even make sense unless we know how to add and multiply the objects!


where the first and last equalities follow from (F6) and the middle one from (F9). Exercise Sheet 3 gives some more examples of this.

Important Remark. In fact, almost everything we did in Chapters 1 and 2 works when the real numbers are replaced by elements of some other field (for example, by complex numbers). Matrix operations can be defined in exactly the same way, and everything we proved about them remains true. Systems of linear equations still make sense, and Gaussian and Gauss-Jordan Elimination still find the solution sets (there is an example to try on Exercise Sheet 3). Of course, wherever we referred to the real numbers 0 and 1 (for example, in defining identity and zero matrices, or during Gaussian elimination) we use the 0 and 1 elements of the field instead. The only thing which was really specific to the real numbers, and doesn’t quite work over other fields, is the geometric interpretation of solution sets in space.

Remark. It is worth taking a step back to consider the “big picture” of what we did in this section:

• We start with a particular “concrete” mathematical structure (in this case the real numbers).

• We identify the vital features which make it “work”, and abstract these features as axioms.

• We forget the original structure and study an abstract structure (a field) about which the only assumption we make is that it satisfies the axioms.

• Everything we learn in the abstract setting applies to every structure satisfying the axioms (for example, we can apply it back to the real numbers, but we can also apply it to the complex numbers).

This process of abstraction is one of the central pillars of mathematics. Instead of studying specific individual structures in isolation, it lets us simultaneously understand lots of different structures which work in the same way. For example, instead of learning that something is true for the real numbers and then having to check all over again whether it still works for the complex numbers, we can do it for both structures at once (and for a whole load of other structures besides). This idea is the basis of abstract algebra, which you will study in detail starting with MATH20201 Algebraic Structures I next semester.

3.2 Elementary Matrices

From now on, we will mostly work with matrices whose entries come from a field K. Thus, everything we show and do will work for matrices of real numbers, complex numbers, or even more obscure fields. If you find this confusing, remember that you can always “specialise” back to the case of R: in other words just imagine all the entries are real numbers.

Definition. An n × n matrix E is an elementary matrix if it can be obtained from the identity matrix I_n by a single elementary row operation. For an elementary row operation ρ we write E_ρ for the corresponding matrix.

There are three types of elementary matrices, corresponding to the three types of row operations. They look like the following (the general forms are gaps to complete in lectures; a concrete 3 × 3 illustration is given below):

E_{r_i ↔ r_j} =


E_{r_i → αr_i} =

E_{r_i → r_i + λr_j} =
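For instance, with n = 3 (a concrete illustration, not part of the original notes):

E_{r_1 ↔ r_2} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_{r_2 → αr_2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & α & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_{r_2 → r_2 + λr_3} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & λ \\ 0 & 0 & 1 \end{pmatrix}.

In each case the matrix is exactly I_3 with the named row operation applied to it.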

Theorem 3.1. Let A and B be m × n matrices over a field and suppose B is obtained from A by a row operation ρ. Then B = E_ρA.

Proof. Check by matrix multiplication.

Corollary 3.2. Suppose A can be transformed into B by applying a sequence of elementary row operations ρ_1, ρ_2, . . . , ρ_k. Then we have

B = E_{ρ_k} . . . E_{ρ_2}E_{ρ_1}A.

Proof. By induction on k. The base case k = 1 is exactly Theorem 3.1.

Now let k ≥ 2 and suppose for induction that the statement holds for smaller k. Let A′ be the matrix obtained from A by applying the transformation ρ_1. Then by Theorem 3.1, A′ = E_{ρ_1}A. Now B is obtained from A′ by the sequence of (k − 1) elementary row operations ρ_2, . . . , ρ_k, so by the inductive hypothesis,

B = E_{ρ_k} . . . E_{ρ_2}A′ = E_{ρ_k} . . . E_{ρ_2}(E_{ρ_1}A) = E_{ρ_k} . . . E_{ρ_2}E_{ρ_1}A.

Theorem 3.3. Let E be an elementary n × n matrix over a field. Then E is invertible and E^{-1} is also an elementary matrix.

Proof. It is easy to check (exercise!) using the definition of the inverse that:

• the inverse of E_{r_i ↔ r_j} is E_{r_i ↔ r_j};

• the inverse of E_{r_i → αr_i} is E_{r_i → \frac{1}{α}r_i};

• the inverse of E_{r_i → r_i + λr_j} is E_{r_i → r_i − λr_j}.

Elementary matrices allow us to provide the promised proof that elementary row operations on an augmented matrix preserve the solution set.

Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is row equivalent to N then the two systems have the same solution set.

Proof. Let M = (A | B) and N = (C | D). Then the two systems of equations can be written in matrix form (see Section 2.4) as AX = B and CX = D where X is the column matrix whose entries are the variables.


Since M and N are equivalent by a sequence of row operations, A and C are equivalent by the same sequence of row operations, and so are B and D. So by Corollary 3.2 there are elementary matrices E_1, . . . , E_k such that

C = E_k . . . E_1A and D = E_k . . . E_1B.

Now suppose Z is a column matrix which is a solution to AX = B, that is, AZ = B. Then we have

CZ = (E_k . . . E_1A)Z = (E_k . . . E_1)(AZ) = (E_k . . . E_1)B = D

so Z is also a solution to CX = D.

The converse (showing that a solution to CX = D is also a solution to AX = B) is very similar (exercise!).

3.3 Invertibility and Systems of Equations

Theorem 3.4. For A an n× n matrix over a field, the following are equivalent:

(i) A is invertible;

(ii) AX = 0_{n×1} has only the trivial solution X = 0_{n×1};

(iii) the reduced row echelon form of A is I_n;

(iv) A is row equivalent to I_n;

(v) A can be written as a product of elementary matrices.

Proof. We prove (i) =⇒ (ii) =⇒ (iii) =⇒ (iv) =⇒ (v) =⇒ (i).

(i) =⇒ (ii). Suppose A is invertible. If AX = 0_{n×1} then

X = I_nX = (A^{-1}A)X = A^{-1}(AX) = A^{-1}0_{n×1} = 0_{n×1}

so 0_{n×1} is the only solution to AX = 0_{n×1}.

(ii) =⇒ (iii). Suppose (ii) holds. Let M be the reduced row echelon form of A. Since M is row equivalent to A, it follows from Theorem 2.1 that the equation MX = 0_{n×1} has only one solution. Since M is square and in reduced row echelon form, it is either the identity matrix or has a zero row at the bottom. Having a zero row at the bottom would give lots of solutions to MX = 0_{n×1} (exercise: why? think about (M | 0_{n×1}) and backtracking), so it must be that M = I_n.

(iii) =⇒ (iv). By definition, A is row equivalent to its reduced row echelon form.

(iv) =⇒ (v). If A is row equivalent to I_n then by Corollary 3.2 we have

A = E_k . . . E_1I_n = E_k . . . E_1

for some elementary matrices E_1, . . . , E_k.

(v) =⇒ (i). By Theorem 3.3 elementary matrices are invertible, and by (an inductive argument using) Lemma 1.3(ii) a product of invertible matrices is invertible.

3.4 Calculating Inverses

Suppose A is an invertible n × n matrix over a field. By Theorem 3.4 there is a sequence of row operations ρ_1, . . . , ρ_k transforming A to I_n. By Corollary 3.2 this means

I_n = E_{ρ_k} . . . E_{ρ_1}A


Multiplying both sides on the right by A^{-1} we get

A^{-1} = E_{ρ_k} . . . E_{ρ_1}AA^{-1} = E_{ρ_k} . . . E_{ρ_1}I_n.

But this means (by Corollary 3.2 again) that applying the sequence of row operations ρ_1, . . . , ρ_k to I_n gives A^{-1}. This observation gives us an efficient way to find the inverse A^{-1} of an invertible matrix A:

(i) use Gauss-Jordan elimination to transform A into reduced row echelon form (which by Theorem 3.4 must turn out to be In — if it isn’t then A isn’t invertible!);

(ii) apply the same sequence of row operations to In to get A−1.

We can make the process even easier by writing the matrices A and In side-by-side in a single augmented n × 2n matrix (A | In), then just apply Gauss-Jordan elimination to convert the left-hand-side to In, and let the right-hand-side “follow along”.

Example. Let

A = ( 1  0   1 )
    ( 1  1  −1 )
    ( 0  1   0 ).

We have:

( 1  0   1 |  1  0  0 )
( 1  1  −1 |  0  1  0 )
( 0  1   0 |  0  0  1 )

r2 → r2 − r1:

( 1  0   1 |  1  0  0 )
( 0  1  −2 | −1  1  0 )
( 0  1   0 |  0  0  1 )

r3 → r3 − r2:

( 1  0   1 |  1   0  0 )
( 0  1  −2 | −1   1  0 )
( 0  0   2 |  1  −1  1 )

r3 → (1/2)r3:

( 1  0   1 |  1     0    0  )
( 0  1  −2 | −1     1    0  )
( 0  0   1 | 1/2  −1/2  1/2 )

r1 → r1 − r3, r2 → r2 + 2r3:

( 1  0  0 | 1/2   1/2  −1/2 )
( 0  1  0 |  0     0     1  )
( 0  0  1 | 1/2  −1/2   1/2 ).

We have now reduced the left-hand-side to its reduced row echelon form, which is In, confirming that A is invertible. The remaining right-hand-side must be A−1, so

A−1 = ( 1/2   1/2  −1/2 )
      (  0     0     1  )
      ( 1/2  −1/2   1/2 ).

Example. Let

B = (  1  6   4 )
    (  2  4  −1 )
    ( −1  2   5 ).

This time:

(  1  6   4 |  1  0  0 )
(  2  4  −1 |  0  1  0 )
( −1  2   5 |  0  0  1 )

r2 → r2 − 2r1, r3 → r3 + r1:

(  1   6   4 |  1  0  0 )
(  0  −8  −9 | −2  1  0 )
(  0   8   9 |  1  0  1 )

r3 → r3 + r2:

(  1   6   4 |  1  0  0 )
(  0  −8  −9 | −2  1  0 )
(  0   0   0 | −1  1  1 ).

We could continue with the elimination (exercise: try it!) but the zero row which has appeared in the bottom of the left-hand-side is clearly not going away again. This means the reduced row echelon form of B is not In, so B is not invertible!
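Aside (for the curious; not part of the course). The augmented-matrix method is purely mechanical, so it is easy to automate. Below is a minimal sketch in Python using exact fractions; the function invert and its layout are our own illustration rather than anything from the notes.

```python
from fractions import Fraction

def invert(A):
    """Gauss-Jordan on (A | I): returns A^-1, or None if A is not invertible."""
    n = len(A)
    # Build the augmented n x 2n matrix (A | I) with exact arithmetic.
    M = [[Fraction(A[i][j]) for j in range(n)]
         + [Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up (ri <-> rj).
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # the RREF cannot be I_n, so A is not invertible
        M[col], M[pivot] = M[pivot], M[col]
        a = M[col][col]
        M[col] = [x / a for x in M[col]]          # ri -> (1/a) ri
        for r in range(n):
            if r != col and M[r][col] != 0:
                m = M[r][col]
                M[r] = [x - m * y for x, y in zip(M[r], M[col])]  # ri -> ri - m rj
    return [row[n:] for row in M]  # the right-hand block now holds A^-1

# invert([[1, 0, 1], [1, 1, -1], [0, 1, 0]]) reproduces the inverse above;
# invert([[1, 6, 4], [2, 4, -1], [-1, 2, 5]]) returns None, as expected.
```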


4 Determinants

The determinant of a square matrix is a scalar (i.e. an element of the field from which the matrix entries are drawn) which can be associated to it, and which contains a surprisingly large amount of information about the matrix. To define it we need some extra notation:

Definition. If M is an n × n matrix over a field, then M^(ij) is the (n − 1) × (n − 1) matrix obtained from M by throwing away the ith row and jth column.

Example. If

A = ( 1  2  3 )
    ( 4  5  6 )
    ( 7  8  9 )

then A^(11) = ( 5 6 ; 8 9 ) and A^(23) = ( 1 2 ; 7 8 ), writing the rows of a small matrix side by side, separated by semicolons.

Definition. If M is a 1 × 1 matrix then detM = M11. If M is an n × n matrix with n ≥ 2 then

detM = ∑_{k=1}^{n} (−1)^{k+1} M1k detM^(1k).

Alternative Notation. Sometimes the determinant is written det(M) or |M |.

Remark/Exercise. If M = ( a b ; c d ) is a 2 × 2 matrix then the formula simplifies to

detM = ad − bc,

which we saw in Section 1.6.

Example. The determinant of the matrix A from the previous example is:

detA = A11 detA^(11) − A12 detA^(12) + A13 detA^(13)
= 1 det( 5 6 ; 8 9 ) − 2 det( 4 6 ; 7 9 ) + 3 det( 4 5 ; 7 8 )
= 1 × (5 × 9 − 6 × 8) − 2 × (4 × 9 − 6 × 7) + 3 × (4 × 8 − 5 × 7)
= −3 + 12 − 9
= 0
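Aside (for the curious; not examinable). The definition is directly recursive, so it transcribes into a few lines of Python; det and minor below are our own illustrative names.

```python
def minor(M, i, j):
    """The matrix M^(ij): delete row i and column j (here 0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    # (-1)**k gives the signs +, -, +, ... since k is 0-indexed here.
    return sum((-1) ** k * M[0][k] * det(minor(M, 0, k)) for k in range(len(M)))

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as computed above
```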

Warning. The determinant is only defined for square (n× n) matrices!

4.1 Expanding Along Rows and Columns

The formula in the definition of the determinant is, for fairly obvious reasons, called the expansion along the first row. In fact, we can expand in a similar way along any row, or indeed down any column:

Theorem 4.1. For any n× n matrix M over a field and any 1 ≤ i ≤ n and 1 ≤ j ≤ n:

(i) detM = ∑_{k=1}^{n} (−1)^{i+k} Mik detM^(ik);

(ii) detM = ∑_{k=1}^{n} (−1)^{k+j} Mkj detM^(kj).

Proof. Omitted.

Computing the determinant using (i) is called expanding along the ith row, while using (ii) is expanding down the jth column. Notice that expanding along the first row just means using the definition of the determinant. But depending on the form of the matrix, choosing the row or column to expand along/down carefully can make it much easier to compute the determinant.


Example. Suppose we wish to compute the determinant of

A = ( 1  0  3 )
    ( 2  0  9 )
    ( 1  4  6 ).

Noticing the two 0 entries in the second column, it makes sense to expand down this column, giving

detA = (−1)^{2+1} × 0 × det( 2 9 ; 1 6 ) + (−1)^{2+2} × 0 × det( 1 3 ; 1 6 ) + (−1)^{2+3} × 4 × det( 1 3 ; 2 9 )
= −4 × 3 = −12.

Using the definition (i.e. expanding along the first row) would have given the same answer (check for yourself!), but with more calculation.

Terminology. The value detM^(ij) is called the (i, j) minor of M, while the value (−1)^{i+j} detM^(ij) is the (i, j) cofactor of M. The method we have seen for computing the determinant is called cofactor expansion or Laplace10 expansion. (Note that some people who write mij for matrix entries use the notation Mij for the (i, j) minor of M.)

4.2 Upper Triangular Matrices

A square matrix M is called upper triangular if Mij = 0 for all i > j, that is, if all entries below the main diagonal are 0.

Examples. ( 1 3 ; 0 1 ) and

( 1  0   3 )
( 0  7  −2 )
( 0  0   0 )

are upper triangular.

Remark. A square (non-augmented) row echelon matrix is always upper triangular.

Theorem 4.2. Let M be an n × n upper triangular matrix over a field. Then detM is the product of the entries on the main diagonal:

detM = ∏_{i=1}^{n} Mii.

Proof. See Exercise Sheet 3.

Corollary 4.3. det In = 1 for all n.

Proof. In is clearly upper triangular, so det In is the product of the entries on the main diagonal, which is 1.
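Aside (not examinable). Theorem 4.2 makes the upper triangular case a one-liner; a small sketch:

```python
from math import prod

def det_upper_triangular(M):
    """Product of the diagonal entries (valid for upper triangular M only)."""
    return prod(M[i][i] for i in range(len(M)))

print(det_upper_triangular([[1, 0, 3], [0, 7, -2], [0, 0, 0]]))  # 0
```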

4.3 Elementary Row Operations and the Determinant

Cofactor expansion (even along a carefully chosen row or column) is not a very efficient way to compute determinants. If the matrix is large it quickly becomes impractical. Just as with solving systems of equations and computing inverses, row operations ride to the rescue!

Theorem 4.4. Let A and B be n × n matrices over a field. Then

(i) if B is obtained from A by the operation ri → αri then detB = α detA;

(ii) if B is obtained from A by the operation ri → ri + λrj then detB = detA;

(iii) if B is obtained from A by the operation ri ↔ rj then detB = −detA.

Corollary 4.5. If A and B are square matrices over a field and are row equivalent, then

detA = 0 ⇐⇒ detB = 0.

10After Pierre-Simon, Marquis de Laplace (pronounced “la-PLASS”), 1749–1827.


Proof. It follows from Theorem 4.4 that none of the types of elementary row operations ever change whether the determinant is 0. If A and B are row equivalent then there is a sequence of row operations transforming A into B, and none of these can ever change whether the determinant is 0.

We will prove Theorem 4.4 shortly, but first let’s see how it helps us calculate a determinant. The idea is to use row operations to transform the matrix into a form where the determinant is easy to calculate, and keep track of how the operations affect the determinant so that we can recover the determinant of the original matrix. Theorem 4.2 suggests it makes sense to aim for an upper triangular matrix; we know every square matrix is row equivalent to one of these because row echelon matrices are upper triangular, but usually it is not necessary to go as far as finding a row echelon matrix:

Example. Let’s find the determinant of the matrix

1

60 7

6

0 0 −61 −2 −4

.

Applying the operations r1 → 6r1, r3 → r3 − r1, r2 ↔ r3 gives an upper triangular matrix:

1 0 70 −2 −110 0 −6

.

By Theorem 4.2 the determinant of this matrix is 1 × (−2) × (−6) = 12. By Theorem 4.4 the operationr1 → 6r1 will have multiplied the determinant by 6, while the row swap will have changed the sign, so thedeterminant of our original matrix must be −1

6(12) = −2.
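Aside (for the curious; not examinable). The bookkeeping in this example can be automated: reduce to upper triangular form using only row additions and swaps, track the sign changes, and multiply up the diagonal at the end. A sketch, with our own names, using exact fractions:

```python
from fractions import Fraction

def det_by_row_ops(A):
    M = [[Fraction(x) for x in row] for row in A]
    n, sign = len(M), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # determinant is 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                    # ri <-> rj changes the sign
        for r in range(col + 1, n):
            m = M[r][col] / M[col][col]
            M[r] = [x - m * y for x, y in zip(M[r], M[col])]  # ri -> ri - m rj
    d = Fraction(sign)
    for i in range(n):                      # Theorem 4.2: multiply the diagonal
        d *= M[i][i]
    return d

print(det_by_row_ops([[Fraction(1, 6), 0, Fraction(7, 6)],
                      [0, 0, -6],
                      [1, -2, -4]]))        # -2, as in the example
```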

Another application of Theorem 4.4 is to obtain the determinants of elementary matrices; this will prove useful later.

Corollary 4.6. (i) detEri→αri = α;

(ii) detEri→ri+λrj = 1;

(iii) detEri↔rj = −1.

Proof. Each part follows by the corresponding part of Theorem 4.4 from the fact that Eρ is obtained from In by the corresponding row operation ρ. For (iii), for example, Eri↔rj is obtained from In by a row swap, so by Theorem 4.4(iii) and Corollary 4.3 we have

detEri↔rj = − det In = −1.

Exercise. Do the other two cases.

4.4 Proof of Theorem 4.4 (not directly examinable)

This section contains a proof of Theorem 4.4. It is not directly examinable and will not be covered in lectures. However, you are strongly encouraged to read it because (i) it will increase your understanding of things which are examinable, (ii) it will give you experience of learning mathematics by reading and (iii) it will begin to show you how complex proofs are put together in stages using lemmas.

Lemma 4.7. Let A be an n× n matrix. If A has two different rows the same, then detA = 0.

Proof. Again, we use induction on n. The case n = 1 is vacuously true: if there is only one row then there cannot be two different rows the same, so the statement cannot be false.11

If n = 2 then A = ( a b ; a b ) for some a, b ∈ R, and now detA = ab − ab = 0.

Now let n ≥ 3 and suppose for induction that the result holds for smaller matrices. Pick two rows in A which are the same, and let r be the index of some other row. Expanding along row r, we have

detA = ∑_{k=1}^{n} (−1)^{k+r} Ark detA^(rk).

11Recall the flying pigs from Exercise Sheet 0!


For each k, notice that A^(rk) is an (n − 1) × (n − 1) matrix with two rows the same. Hence, by the inductive hypothesis, all of these matrices have determinant 0, so detA = 0.

Question/Exercise. Why, in the proof of Lemma 4.7, do we have to start the induction from n = 3? In other words, why can’t we just use n = 1 as the base case?

Lemma 4.8. Let X, Y and Z be n × n matrices which are all the same except in row i, and suppose Zij = Xij + Yij for all j. Then detZ = detX + detY.

Proof. Expand detZ along the ith row (exercise: write out the details!).

We are now ready to prove Theorem 4.4.

Proof of Theorem 4.4. (i) Suppose B is obtained from A by ri → αri. Expanding along the ith row (and noting that B^(ik) = A^(ik), since deleting the ith row removes the only row where A and B differ) we have:

detB = ∑_{k=1}^{n} (−1)^{k+i} Bik detB^(ik)
= ∑_{k=1}^{n} (−1)^{k+i} (αAik) detA^(ik)
= α ∑_{k=1}^{n} (−1)^{k+i} Aik detA^(ik)
= α detA.

(ii) Suppose B is obtained from A by ri → ri + λrj. Let D be the matrix which is the same as A and B, except that the ith row is λ times the jth row of A. Then we are in the position of Lemma 4.8 with X = A, Y = D and Z = B, so we have

detB = detA + detD.

Now let C be the matrix which is the same as D, except that the ith row is exactly the jth row of A. Then D can be obtained from C by applying the transformation ri → λri, so by part (i), detD = λ detC. But C has two rows (rows i and j) the same, so by Lemma 4.7 detC = 0, giving detD = λ0 = 0. Thus, detB = detA + 0 = detA.

(iii) Suppose B is obtained from A by ri ↔ rj. We define matrices F, G, H, P and Q which are the same as A (and B) except in rows i and j, where their entries are as follows:

• row i and row j of F are both the sum of rows i and j in A;

• row i of G is the sum of rows i and j in A; row j of G is row j of A;

• row i of H is the sum of rows i and j in A; row j of H is row i of A;

• rows i and j of P are row i of A; and

• rows i and j of Q are row j of A.

Now we are in the position of Lemma 4.8 (applied in row j) with X = G, Y = H and Z = F, so we have detF = detG + detH. Similarly, the conditions of Lemma 4.8 are satisfied with X = A, Y = Q, Z = G and also with X = B, Y = P, Z = H so we obtain

detG = detA+ detQ and detH = detB + detP.

Notice also that each of F, P and Q has two identical rows, so by Lemma 4.7 they all have determinant 0. Hence

0 = detF = detG + detH = detA + detQ + detP + detB = detA + detB

so that detA = − detB as required.


4.5 Determinants and Inverses

We saw in Section 1.6 that a 2 × 2 matrix is invertible if and only if its determinant is non-zero. We now know nearly enough to prove the corresponding statement for n × n matrices: just one more lemma is needed.

Lemma 4.9. Let M be an n × n reduced row echelon matrix over a field. Then either M = In or M has a zero row and has determinant 0.

Proof. See Exercise Sheet 3.

Theorem 4.10. An n × n matrix A over a field is invertible if and only if detA ≠ 0.

Proof. By Theorem 2.2, A is row equivalent to a reduced row echelon matrix M. By Theorem 3.4, A is invertible if and only if M = In. By Lemma 4.9, M = In if and only if detM ≠ 0. But M is row equivalent to A, so by Corollary 4.5, detM ≠ 0 if and only if detA ≠ 0.

We also saw in Section 1.6 that for 2 × 2 matrices there is an explicit formula for the inverse in terms of determinants. The formula generalises to larger matrices, although it is often too complicated to be useful in practice:

Theorem 4.11 (Cramer’s Rule12). If A is invertible then

A−1 = (1/detA) B

where B is the matrix given by Bij = (−1)^{i+j} detA^(ji).

The matrix B in the statement of the theorem is called the adjugate of A.

Exercise. Check that if A is a 2× 2 matrix, Cramer’s Rule simplifies to the formula from Section 1.6.
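Aside (not examinable). Cramer’s Rule is easy to transcribe, though for large matrices it is far slower than Gauss-Jordan; note the transposed indices ji in the adjugate. The helper names below are ours.

```python
from fractions import Fraction

def minor(M, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det(minor(M, 0, k)) for k in range(len(M)))

def inverse_by_adjugate(A):
    n, d = len(A), det(A)
    if d == 0:
        return None                          # not invertible, by Theorem 4.10
    # entry (i, j) of the inverse is (-1)^(i+j) det A^(ji) / det A
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d)
             for j in range(n)] for i in range(n)]

# inverse_by_adjugate([[1, 0, 1], [1, 1, -1], [0, 1, 0]]) reproduces the
# inverse found by Gauss-Jordan in Section 3.4.
```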

4.6 Multiplicativity of the Determinant

In this section we will establish another important property of the determinant, namely that det(AB) = detA detB for all n × n matrices A and B. This property is called multiplicativity of the determinant. To prove it we will use elementary matrices.

Lemma 4.12. Let E1, . . . , Ek be n× n elementary matrices. Then for every n× n matrix M ,

det(Ek . . . E1M) = det(Ek . . . E1) detM.

Proof. By strong induction on k. Suppose first k = 1. Then E1 = Eρ for some elementary row operation ρ. Let M be an n × n matrix. Now EρM is obtained from M by the operation ρ. For each of the three types of elementary row operation, we can check using the appropriate parts of Theorem 4.4 and Corollary 4.6 that det(EρM) = detEρ detM. For example, if ρ is a row swap then by Theorem 4.4(iii), det(EρM) = − detM and by Corollary 4.6(iii) det(Eρ) = −1.

Now let k ≥ 2 and suppose the statement is true for smaller k. Then applying the inductive hypothesis three times, with the role of M played first by E1M, then by M itself, and finally by E1, we obtain:

det(EkEk−1 . . . E1M) = det(Ek . . . E2) det(E1M)

= det(Ek . . . E2) detE1 detM

= det(Ek . . . E1) detM.

Theorem 4.13. Let A and B be n× n matrices over a field. Then

det(AB) = (detA)(detB).

12Gabriel Cramer (1704–52).


Proof. Write A = Ek . . . E1M where M is the reduced row echelon form of A and the Ei are elementary matrices. Then by Lemma 4.12:

det(AB) = det(Ek . . . E1MB) = det(Ek . . . E1) det(MB).

We now consider two cases:

Case 1: M = In. In this case we have MB = B and Ek . . . E1 = A so

det(AB) = det(Ek . . . E1) det(MB) = detAdetB.

Case 2: M ≠ In. In this case A is not invertible and so by Theorem 4.10, detA = 0. Also, by Lemma 4.9 M has a row of zeros. It follows that MB has a row of zeros, and expanding along this row we see that det(MB) = 0. Now

det(AB) = det(Ek . . . E1) det(MB) = 0 = (detA)(detB).


5 Vectors, Eigenvalues and Subspaces

5.1 Vectors and Matrices

Recall that real n-space is the set

Rn = {(a1, . . . , an) | a1, . . . , an ∈ R}.

The elements of Rn are called (real) n-vectors, or just vectors when n is clear. More generally, if K is a field then

Kn = {(a1, . . . , an) | a1, . . . , an ∈ K}

is the space of n-vectors over K.

The Zero Vector. We write 0n for the zero vector (0, . . . , 0) ∈ Kn, or just 0 where n is clear.

Vectors as Matrices. There is an obvious way to think about an n-vector (a1, . . . , an) as a 1 × n row matrix (a1 a2 . . . an); a vector thought of in this way we call a row n-vector. Similarly, a vector may be thought of as an n × 1 column matrix, in which case we call it a column n-vector. Notice that each row n-vector is the transpose of the same vector thought of as a column n-vector.

Vector Operations. We know how to add matrices of the same size (Section 1.1), and how to scale a matrix by a scalar (Section 1.2). We can also define addition and scaling on vectors, by thinking of them as row (or column) vectors. The resulting operations are:

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, . . . , an + bn) and λ(a1, . . . , an) = (λa1, . . . , λan).

Since row and column vectors are just matrices, we can also multiply them by matrices which have the right size. For example, if v is a column n-vector and M is an n × n matrix13 then Mv is defined (since the number of columns in M equals the number of rows in v) and gives another column n-vector. On the other hand, vM is not defined14.

5.2 Eigenvalues and Eigenvectors

Definition. Let A be an n × n matrix over a field. We say that a scalar λ is an eigenvalue15 of A if there is a non-zero column n-vector (that is, an n × 1 matrix) X such that

AX = λX.

The vector X is called an eigenvector of A associated (or corresponding) to the eigenvalue λ.

Example. Let A = ( 3 1 ; 1 3 ), X = ( 1 ; 1 ) and Y = ( 2 ; 2 ). Then

AX = ( 4 ; 4 ) = 4X and AY = ( 8 ; 8 ) = 4Y

so 4 is an eigenvalue of A, and X and Y are both corresponding eigenvectors.

13Over the same field as v, of course. Although what we are doing makes sense over any field, it doesn’t make sense to “mix and match” different fields at the same time: we need to assume that, at any one time, all matrices and vectors considered are over the same fixed field.

14Unless n = 1. Although it would be defined for n > 1 if v were a row n-vector! For this reason we should technically distinguish between row vectors and column vectors — in other words, officially regard them as different objects, just as we do with scalars and 1 × 1 matrices (recall Exercise Sheet 1). In practice, it is often convenient to blur these distinctions where no confusion can arise; in particular, we will often think of Kn as the space of column n-vectors.

15From the German “eigen” (pronounced EYE-gn) meaning “own” or “private”, in the sense that they are the matrix’s own special values.


Definition. If A is an n× n matrix then the characteristic polynomial of A is defined by

χA(x) = det(A− xIn).

Remark. The symbol x here is a “formal variable”. What do we mean by A − xIn? It is a matrix whose entries are polynomials with variable x. Polynomials don’t actually form a field (exercise: which axioms fail?) but the determinant is calculated just as if they did, by expanding along a row or column, and using algebraic addition and multiplication of polynomials in place of the normal addition and multiplication of numbers. What comes out is a polynomial (of degree n, as it happens) in the variable x.

Example (continued). Let A be the matrix above. Then

A − xI2 = ( 3 1 ; 1 3 ) − ( x 0 ; 0 x ) = ( 3−x 1 ; 1 3−x )

and

χA(x) = det(A − xI2) = det( 3−x 1 ; 1 3−x ) = (3 − x) × (3 − x) − 1 × 1 = (3 − x)^2 − 1 = x^2 − 6x + 8 = (x − 4)(x − 2).

I have factorised the polynomial so we can see what the roots are: notice that one of them is 4, which we saw was an eigenvalue of A. In fact, this is a general phenomenon:

Theorem 5.1. Let A be an n× n matrix over a field K, and λ ∈ K. Then λ is an eigenvalue of A if andonly if χA(λ) = 0.

Proof. We have

λ is an eigenvalue of A ⇐⇒ AX = λX for some X ≠ 0n×1
⇐⇒ AX − λX = 0n×1 for some X ≠ 0n×1
⇐⇒ AX − λInX = 0n×1 for some X ≠ 0n×1
⇐⇒ (A − λIn)X = 0n×1 for some X ≠ 0n×1
⇐⇒ A − λIn is not invertible (by Theorem 3.4)
⇐⇒ det(A − λIn) = 0 (by Theorem 4.10)
⇐⇒ χA(λ) = 0 (by the definition of χA(x))

Example (continued). Keeping A as above, we know that χA(x) has roots 2 and 4. We already know that 4 is an eigenvalue, but Theorem 5.1 tells us that 2 is an eigenvalue as well. What are the corresponding eigenvectors? If X = ( x1 ; x2 ) is one of them then AX = 2X, or equivalently, (A − 2I2)X = 02×1, that is,

( 1 1 ; 1 1 ) ( x1 ; x2 ) = ( 0 ; 0 ).

The solutions to this are x1 = t, x2 = −t, so the set of all eigenvectors corresponding to the eigenvalue 2 is:

{ ( t ; −t ) | t ∈ R, t ≠ 0 }.

Notice how we exclude the case t = 0, since this gives the zero vector which by definition is not an eigenvector.

Remark. By the same line of reasoning, the collection of eigenvectors corresponding to a given eigenvalue will always be the solution space to some system of homogeneous equations (minus the trivial solution, which is the zero vector).


Example. What are the eigenvalues and eigenvectors of A = ( 0 1 ; −1 0 )? This time

χA(x) = det(A − xI2) = det( −x 1 ; −1 −x ) = (−x)(−x) − (1 × (−1)) = x^2 + 1

so the eigenvalues are the solutions to x^2 + 1 = 0. If we regard A as a matrix over K = R (or Q) there are no solutions in K, so A does not have any (real) eigenvalues. On the other hand, A is also a matrix over C (every real number is also a complex number!), and this equation does have solutions in C, namely x = i and x = −i. So A has (complex) eigenvalues i and −i. (Can you find the corresponding eigenvectors?)
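Aside (not examinable). Numerical libraries compute eigenvalues directly; a quick check of the two examples above using numpy (our illustration):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
print(np.roots([1, -6, 8]))       # roots of chi_A(x) = x^2 - 6x + 8: 4 and 2
values, vectors = np.linalg.eig(A)
print(values)                     # eigenvalues 4 and 2 (order may vary)
print(vectors)                    # columns: unit eigenvectors, proportional
                                  # to (1, 1) and (1, -1)

B = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.linalg.eigvals(B))       # the complex eigenvalues i and -i
```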

5.3 Linear Combinations and Spans

Definition. Let v ∈ Kn and S = {w1, . . . , wt} ⊆ Kn. Then

(i) v is called a linear combination of w1, . . . , wt if there exist λ1, . . . , λt ∈ K such that

v = λ1w1 + λ2w2 + · · · + λtwt.

(ii) span(S) is the set of all vectors which are linear combinations of w1, . . . , wt, that is

span(S) = {µ1w1 + µ2w2 + · · ·+ µtwt | µ1, . . . , µt ∈ K}

Example. Let w1 = (1, 2,−1), w2 = (6, 4, 2) and S = {w1, w2} ⊆ R3.

(i) Is v = (9, 2, 7) a linear combination of w1 and w2? In other words, can we find λ1, λ2 ∈ R such that v = λ1w1 + λ2w2? In other words, such that

(9, 2, 7) = λ1(1, 2, −1) + λ2(6, 4, 2)
= (λ1, 2λ1, −λ1) + (6λ2, 4λ2, 2λ2)
= (λ1 + 6λ2, 2λ1 + 4λ2, −λ1 + 2λ2)

In other words, such that

λ1 + 6λ2 = 9, 2λ1 + 4λ2 = 2 and −λ1 + 2λ2 = 7.

This is just a system of linear equations. Solving (e.g. by Gaussian elimination), we see that there is a solution (λ1 = −3 and λ2 = 2). So the answer is YES, v is a linear combination of w1 and w2.

(ii) Is v = (4, −1, 8) a linear combination of w1 and w2? This time v = λ1w1 + λ2w2 would mean

(4, −1, 8) = λ1(1, 2, −1) + λ2(6, 4, 2),

that is,

λ1 + 6λ2 = 4, 2λ1 + 4λ2 = −1 and −λ1 + 2λ2 = 8.

These equations have no solutions (check!), so this time v is not a linear combination of w1 and w2.

(iii) What is span(S)? By definition we have:

span(S) = {λ1w1 + λ2w2 | λ1, λ2 ∈ R}
        = {(λ1 + 6λ2, 2λ1 + 4λ2, −λ1 + 2λ2) | λ1, λ2 ∈ R}

By the previous parts we have (9, 2, 7) ∈ span(S) but (4, −1, 8) ∉ span(S).
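Aside (not examinable). Parts (i) and (ii) are just systems of linear equations, so they can be checked mechanically; a sketch using numpy’s least-squares solver (our own approach, not the method of the notes):

```python
import numpy as np

w1, w2 = np.array([1.0, 2.0, -1.0]), np.array([6.0, 4.0, 2.0])
W = np.column_stack([w1, w2])          # the columns are w1 and w2

for v in (np.array([9.0, 2.0, 7.0]), np.array([4.0, -1.0, 8.0])):
    coeffs, *_ = np.linalg.lstsq(W, v, rcond=None)
    if np.allclose(W @ coeffs, v):
        print(v, "=", coeffs[0], "* w1 +", coeffs[1], "* w2")  # -3, 2 for (9,2,7)
    else:
        print(v, "is not in span({w1, w2})")                   # (4,-1,8)
```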


Remark. Notice that the zero vector is always in span(S), since it can be written as

0 = 0w1 + 0w2 + · · · + 0wt.

Remark/Definition. What if S is the empty set ∅? You might expect span(∅) to be the empty set, but it turns out to be convenient to define

span(∅) = {0}

so that, even in this case, the zero vector is always in the span.

5.4 Subspaces

Definition. Let U be a non-empty subset of Kn. Then U is a subspace of Kn if

(i) ∀u1, u2 ∈ U, u1 + u2 ∈ U (U is “closed under addition”); and

(ii) ∀λ ∈ K,∀u ∈ U, λu ∈ U (U is “closed under scaling”).

Examples and Non-examples.

(i) Rn is a subspace of Rn.

(ii) {0n} is a subspace of Rn.

(iii) {(λ, 0) | λ ∈ R} is a subspace of R2.

(iv) {(µ, µ) | µ ∈ R} is a subspace of R2.

(v) {(λ, 1) | λ ∈ R} is not a subspace of R2.

(vi) ∅ is not a subspace of Rn for any n.

Proposition 5.2. If U is a subspace of Kn then 0n ∈ U .

Proof. Since U is non-empty we may choose some u ∈ U . Now (−1)u ∈ U so 0n = u+ (−1)u ∈ U .

Theorem 5.3. Let S = {w1, . . . , wt} ⊆ Kn. Then span(S) is a subspace of Kn.

Proof. We have seen that the zero vector is always in span(S), so span(S) is certainly not empty. Let u, v ∈ span(S). Then

u = λ1w1 + λ2w2 + · · ·+ λtwt and v = µ1w1 + µ2w2 + · · ·+ µtwt

for some λ1, . . . , λt, µ1, . . . , µt ∈ K. Now we have

u+ v = (λ1w1 + λ2w2 + · · ·+ λtwt) + (µ1w1 + µ2w2 + · · ·+ µtwt)

= (λ1 + µ1)w1 + (λ2 + µ2)w2 + · · ·+ (λt + µt)wt

which means that u + v ∈ span(S). Now let λ ∈ K and u ∈ span(S). Then again

u = λ1w1 + λ2w2 + · · ·+ λtwt

for some λ1, . . . , λt ∈ K, so

λu = λ(λ1w1 + λ2w2 + · · ·+ λtwt) = (λλ1)w1 + (λλ2)w2 + · · ·+ (λλt)wt

which means that λu ∈ span(S). Thus, span(S) is a subspace of Kn.


The following theorem gives many more examples of subspaces.

Theorem 5.4. Let M be an m × n matrix over a field K. Let U be the set of all n × 1 column vectors X such that

MX = 0m×1.

Then U is a subspace of Kn.

Proof. First notice that 0n×1 ∈ U, so U is non-empty. Now if u, v ∈ U then Mu = Mv = 0m×1 so using Theorem 1.2

M(u+ v) = Mu+Mv = 0m×1 + 0m×1 = 0m×1

which means that u + v ∈ U. Similarly, if λ ∈ K and u ∈ U then16

M(λu) = λ(Mu) = λ0m×1 = 0m×1

so that λu ∈ U .

Thus, U is a subspace of Kn.

Definition. The subspace U in Theorem 5.4 is called the kernel or nullspace of the matrix M. (More on this in the second part of the course.)

Corollary 5.5. For any homogeneous system of m linear equations with n variables, the solution space is a subspace of Kn.

Proof. Because the constant terms are all zero, when we write the equations in matrix form (see Section 2.4) they have the form

MX = 0m×1,

where M is the m × n matrix of coefficients and X is the column matrix of variables. So the solution set is by definition exactly the kernel of the matrix of coefficients M.
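Aside (not examinable). Numerically, a basis for the kernel can be read off from the singular value decomposition (a tool from later in your studies); kernel_basis below is our own sketch.

```python
import numpy as np

def kernel_basis(M, tol=1e-10):
    """Columns of the result form an (orthonormal) basis for {X : MX = 0}."""
    _, s, vt = np.linalg.svd(np.atleast_2d(M))
    rank = int(np.sum(s > tol))
    return vt[rank:].T        # rows of vt with zero singular value span the kernel

M = np.array([[1.0, 1.0], [1.0, 1.0]])   # the matrix from the eigenvector example
K = kernel_basis(M)
print(K)        # one column, proportional to (1, -1): the kernel is 1-dimensional
print(M @ K)    # numerically zero, confirming MK = 0
```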

5.5 Dependence and Independence

Definition. S = {w1, . . . , wt} ⊆ Kn is a linearly independent set if the only way to write 0 as a linear combination

0 = λ1w1 + λ2w2 + · · · + λtwt

is with λ1 = λ2 = · · · = λt = 0. Otherwise S is a linearly dependent set.

Example. In R2, the set {(1, 0), (0, 1)} is linearly independent. It should be obvious that

λ1(1, 0) + λ2(0, 1) = (λ1, λ2)

so the only way this can be 0 is if λ1 = λ2 = 0. Intuitively, if you start at the origin and travel some non-zero distance in the x-direction, you clearly can’t return to the origin by travelling only some distance (even “backwards”) in the y-direction. In this sense, the x-direction and y-direction are independent directions, which is why the vectors (1, 0) (in the x-direction) and (0, 1) (in the y-direction) are linearly independent.

Example. Let w1 = (1, −2, 3), w2 = (5, 6, −1), w3 = (3, 2, 1) and S = {w1, w2, w3} ⊆ R3. Is S linearly dependent or linearly independent? To answer this we need to consider solutions to

λ1w1 + λ2w2 + λ3w3 = (0, 0, 0), in other words,

16Again, this argument uses the basic properties of matrix operations from Theorem 1.2. From here on we will use these without explicit reference.


λ1 + 5λ2 + 3λ3 = 0, −2λ1 + 6λ2 + 2λ3 = 0, 3λ1 − λ2 + λ3 = 0

Gaussian elimination tells us that the solutions are of the form

λ1 = λ2 = −t/2, λ3 = t

for t ∈ R. Putting t = 2, for example, gives a solution with λ1, λ2, λ3 not all zero, so S is linearly dependent.17

Example. Let w1 = (4, −1, 2), w2 = (−4, 10, 2) and S = {w1, w2} ⊆ R3. Is S linearly dependent or independent? This time we consider the solutions to

λ1w1 + λ2w2 = (0, 0, 0), that is,

4λ1 − 4λ2 = 0, −λ1 + 10λ2 = 0, 2λ1 + 2λ2 = 0.

Here the only solution is λ1 = λ2 = 0 (check!), which means that S is linearly independent.
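Aside (not examinable). Such checks amount to asking whether a homogeneous system has a non-trivial solution, which numpy can answer via the rank of the matrix whose rows are the wi (rank is treated properly in the second part of the course):

```python
import numpy as np

dependent = np.array([[1, -2, 3], [5, 6, -1], [3, 2, 1]])   # the first example
independent = np.array([[4, -1, 2], [-4, 10, 2]])           # the second example
print(np.linalg.matrix_rank(dependent))     # 2 < 3 rows: linearly dependent
print(np.linalg.matrix_rank(independent))   # 2 = 2 rows: linearly independent
```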

5.6 Basis and Dimension

Definition. Let U be a subspace of Kn. A basis for U is a subset S ⊆ U such that

(i) S is linearly independent; and

(ii) span(S) = U.

Example. Let S = {(4, −1, 2), (−4, 10, 2)}. We saw in the example at the end of Section 5.5 that S is linearly independent, so S is a basis for span(S).

Example. The set

S = { (1, 0, 0, . . . , 0), (0, 1, 0 . . . , 0), . . . , (0, 0, . . . , 0, 1) }

is a basis for Rn (exercise: why?). It is called the standard basis for Rn.

Theorem 5.6. Suppose U is a subspace of Kn. Then U has a finite basis, and if B1 and B2 are two different bases18 for U then |B1| = |B2|.

Proof. This will be proved in the second part of the course.

Definition. If U is a subspace of Kn then the dimension of U is |B| where B is a basis for U. We write dimU for the dimension of U. (Notice how Theorem 5.6 ensures that this definition makes sense: without it, the dimension might depend on which basis B we choose!)

Lemma 5.7. Let U be a subspace of Kn. If S ⊆ U, S is linearly independent and |S| = dimU then S is a basis for U.

Proof. Again, to be proved in the second part of the course.

17Alternatively, we could observe that the presence of a parameter in the solution description immediately tells us that there are infinitely many solutions, so λ1 = λ2 = λ3 = 0 obviously can’t be the only solution!

18“Bases”, pronounced “bay-seez”, is the plural of “basis”.


5.7 Geometry of Subspaces in Rn

An m-dimensional subspace of Rn basically looks like a copy of Rm sitting inside Rn and containing the origin. In other words, it is a copy of m-space inside n-space, for example a line in space, a plane in space, a 3-space in 4-space and so on.

Example. A 2-dimensional subspace of R3 is simply a plane (through the origin, since a subspace always contains the zero vector). A 1-dimensional subspace is a line through the origin, while a 0-dimensional subspace is a point (which has to be the origin!).

Remark. Intuitively, you can think of a basis as giving a system of m coordinates for describing points in the subspace. For example, if U is a 2-dimensional subspace of R3 then it is a plane, so we should be able to describe points on it with only 2 coordinates. Choose a basis {u, v} for U. Then each point in U can be written as λu + µv in exactly one way, and we can think of (λ, µ) as its coordinates in the plane. Notice how the basis vectors themselves are u = 1u + 0v and v = 0u + 1v, so they get coordinates (1, 0) and (0, 1) respectively.


6 Orthogonality

In this chapter, we shall work only with vectors and matrices over R, and not over a general field. (The reason for this will become apparent.)

6.1 The Euclidean Inner Product and Norm

Definition. Let u = (a1, . . . , an), v = (b1, . . . , bn) ∈ Rn. The (Euclidean19) inner product on Rn is defined by

〈u | v〉 = a1b1 + a2b2 + · · · + anbn.

Remark. If u and v are column n-vectors, notice that 〈u | v〉 is just the entry of the 1 × 1 matrix uT v.

Proposition 6.1. Let u, v, w ∈ Rn and λ ∈ R. Then

(i) 〈u | v〉 = 〈v | u〉;

(ii) 〈u + v | w〉 = 〈u | w〉 + 〈v | w〉;

(iii) 〈λu | v〉 = λ〈u | v〉 = 〈u | λv〉;

(iv) 〈v | v〉 ≥ 0, and 〈v | v〉 = 0 ⇐⇒ v = 0n.

Proof. Let u = (a1, . . . , an), v = (b1, . . . , bn) and w = (c1, . . . , cn).

(i) Clear from the definition.

(ii) u + v = (a1 + b1, a2 + b2, . . . , an + bn), so

〈u + v | w〉 = (a1 + b1)c1 + (a2 + b2)c2 + · · · + (an + bn)cn = (a1c1 + · · · + ancn) + (b1c1 + · · · + bncn) = 〈u | w〉 + 〈v | w〉.

(iii) λu = (λa1, . . . , λan) so

〈λu | v〉 = λa1b1 + λa2b2 + · · · + λanbn = λ(a1b1 + · · · + anbn) = λ〈u | v〉

and similarly for 〈u | λv〉.

(iv) 〈v | v〉 = b1^2 + · · · + bn^2 ≥ 0 since all the bi are real20, and is clearly equal to 0 if and only if b1 = b2 = · · · = bn = 0, that is, v = 0n.

Definition. For v = (a1, a2, . . . , an) ∈ Rn, the (Euclidean) norm (also called the magnitude or length) of v is

||v|| = √〈v | v〉 = √(a1^2 + a2^2 + · · · + an^2).

Remark. The norm of v is just the distance from the origin to the point with coordinates given by v. (Notice that in 2 dimensions this is Pythagoras’ Theorem!) In particular, note that ||v|| ≥ 0 and ||v|| = 0 ⇐⇒ v = 0n. If n = 1, say v = (v1), notice that ||v|| = √(v1^2) is just the absolute value |v1| of the entry v1, that is, v1 if v1 ≥ 0 and −v1 if v1 < 0.

Remark. The norm gives us an algebraic way to talk about distance in Rn: the distance between the points with coordinate vectors a and b is ||a − b||. This is the “usual” notion of the distance between two points (there are others!) in Rn, as first studied in detail by Euclid.

19After Euclid of Alexandria, 3rd-4th century BCE. He didn’t invent inner products — vectors in the abstract sense we are studying are a 19th century innovation — but we’ll see shortly that this definition gives a notion of distance in Rn which makes it into the geometric space which Euclid studied. The inner product is sometimes called the scalar product (because it gives a scalar) or the dot product (because some people write it u · v instead of 〈u | v〉).

20For the first time in the whole course, we are using here a fact about the real numbers which isn’t true for fields in general. Actually property (iv) in the statement of the proposition does not really make sense over C (for example) since the order ≥ does not make sense on the complex numbers. It turns out that we can do something similar to the inner product for complex numbers (see Exercise Sheet 5) but not for fields in general.
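Aside (not examinable). The two definitions of this section transcribe directly into code; inner and norm are our own names.

```python
from math import sqrt

def inner(u, v):
    """The Euclidean inner product <u | v>."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """The Euclidean norm ||v|| = sqrt(<v | v>)."""
    return sqrt(inner(v, v))

print(inner((-1, 3, 2), (4, 2, -1)))  # 0: these vectors are orthogonal (Section 6.2)
print(norm((-1, 3, 2)))               # sqrt(14), approximately 3.742
```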


6.2 Orthogonality and Orthonormality

Definition. Two vectors u and v are called orthogonal if 〈u | v〉 = 0. A set S ⊆ Rn is called an orthogonal set if every pair of vectors from S is orthogonal.

Remark. Geometrically, vectors are orthogonal if the lines from the origin to the corresponding points are at right angles (or if at least one of them is the zero vector).

Example. Let u = (−1, 3, 2), v = (4, 2,−1) and w = (1, 1, 2). Then

〈u | v〉 = (−1× 4) + (3× 2) + (2×−1) = 0

so u and v are orthogonal. On the other hand,

〈u | w〉 = (−1× 1) + 3× 1 + 2× 2 = 6

so u and w are not orthogonal. Hence, the set {u, v} is orthogonal, but the set {u, v, w} is not. Also,

||u|| = √〈u | u〉 = √((−1)^2 + 3^2 + 2^2) = √14

Exercise. What are ||w|| and 〈v | w〉? Is {v,w} an orthogonal set?

Proposition 6.2. Let u, v ∈ Rn and λ, µ ∈ R. If u and v are orthogonal then λu and µv are orthogonal.

Proof. If u and v are orthogonal then 〈u | v〉 = 0, so using Proposition 6.1(iii) twice,

〈λu | µv〉 = λ〈u | µv〉 = λµ〈u | v〉 = 0,

so λu and µv are orthogonal.

Definition. A vector u ∈ Rn is called a unit vector if ||u|| = 1. A set S ⊆ Rn is called an orthonormal set if

(i) S is an orthogonal set; and

(ii) every vector in S is a unit vector.

Example. In R3 the set {(1, 0, 0), (0, 0, 1)} is orthonormal (exercise: check!).

Lemma 6.3. Let v ∈ Rn and λ ∈ R. Then ||λv|| = |λ| ||v||.

Proof. Using Proposition 6.1(iii) again,

||λv|| = √〈λv | λv〉 = √(λ^2〈v | v〉) = |λ|√〈v | v〉 = |λ| ||v||.

Corollary 6.4. If v ∈ Rn is a non-zero vector then (1/||v||)v is a unit vector.

Proof. Since ||v|| is positive, by Lemma 6.3 we have

||(1/||v||)v|| = |1/||v||| ||v|| = (1/||v||)||v|| = 1.

Example. Let u = (0, 1, 0), v = (1, 0, 1) and w = (1, 0, −1) and let S = {u, v, w}. Then 〈u | v〉 = 〈u | w〉 = 〈v | w〉 = 0 (exercise: check!) so S is an orthogonal set. Although ||u|| = 1 the set is not orthonormal because ||v|| = ||w|| = √2 (exercise: again, check!). However, by Corollary 6.4 and Proposition 6.2 the set

{ u, (1/||v||)v, (1/||w||)w } = { (0, 1, 0), (1/√2, 0, 1/√2), (1/√2, 0, −1/√2) }

is orthonormal.


6.3 Orthonormal Bases

Definition. Let U be a subspace of Rn and let S ⊆ U. We say that S is an orthonormal basis for U if S is an orthonormal set and S is a basis for U.

Recall (from Section 5.6) that if S is a basis for U then we can write every vector in U as a linear combination of vectors in S. In general, finding how to write a given vector in this way involves solving a system of linear equations. When S is an orthonormal basis, however, things are much easier:

Lemma 6.5. Let U be a subspace of Rn and S = {v1, v2, . . . , vk} an orthonormal basis for U. Then for every u ∈ U,

u = 〈u | v1〉v1 + 〈u | v2〉v2 + · · · + 〈u | vk〉vk.

Proof. Since S is a basis for U and u ∈ U we can write u = λ1v1 + · · · + λkvk for some λ1, . . . , λk ∈ R. Now for each i,

〈u | vi〉 = 〈λ1v1 + λ2v2 + · · · + λkvk | vi〉
= λ1〈v1 | vi〉 + λ2〈v2 | vi〉 + · · · + λk〈vk | vi〉 (by Proposition 6.1)
= λi〈vi | vi〉 (because 〈vj | vi〉 = 0 for j ≠ i)
= λi (because 〈vi | vi〉 = 1)

Theorem 6.6. If S = {v1, . . . , vk} is an orthogonal set of non-zero vectors in Rn then S is linearly independent.

Proof. Suppose

λ1v1 + λ2v2 + · · ·+ λkvk = 0n.

Recalling the definition of linear independence (see Section 5.5), we need to show that λ1 = · · · = λk = 0. For each i we have

0 = 〈0n | vi〉
= 〈λ1v1 + · · · + λkvk | vi〉
= λ1〈v1 | vi〉 + · · · + λk〈vk | vi〉 (by Proposition 6.1)
= λi〈vi | vi〉 (because 〈vj | vi〉 = 0 for j ≠ i)

Now vi is a non-zero vector so 〈vi | vi〉 ≠ 0 by Proposition 6.1(iv), so we must have λi = 0. Thus, S is linearly independent.

Example. Let S = {u, v, w} where u = (0, 1, 0), v = (−4/5, 0, 3/5) and w = (3/5, 0, 4/5). Then S is an orthogonal (in fact orthonormal) set (check!) so, by Theorem 6.6, S is linearly independent. Since dimR3 = 3 = |S|, Lemma 5.7 tells us that S is a basis (in fact, an orthonormal basis) for R3.

Hence every element of R3 can be written as a linear combination of u, v and w, and because S is an orthonormal basis, Lemma 6.5 gives us a practical way to do this. For example, consider x = (1, 1, 1) ∈ R3. We have:

(1, 1, 1) = x = 〈x | u〉u + 〈x | v〉v + 〈x | w〉w = u − (1/5)v + (7/5)w.

Remark. Recall from Section 5.7 that we can think of a basis for an m-dimensional subspace as giving a system of m coordinates for describing points in the subspace. In particular, viewing R3 as a subspace of itself, the basis S gives a different system of coordinates for R3. The point which has “proper” coordinates (1, 1, 1) has coordinates (1, −1/5, 7/5) in this new system of coordinates.
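Aside (not examinable). Lemma 6.5 in action: with an orthonormal basis, coordinates are just inner products, with no equations to solve. A sketch (ours):

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

basis = [(0, 1, 0), (-4/5, 0, 3/5), (3/5, 0, 4/5)]   # the orthonormal basis S
x = (1, 1, 1)
print([inner(x, b) for b in basis])   # [1, -0.2, 1.4], i.e. (1, -1/5, 7/5)
```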


6.4 Gram-Schmidt Orthogonalization

We have seen that for understanding a subspace, it is useful to have an orthonormal basis. This naturally leads to two questions: does every subspace have an orthonormal basis, and if so, how do we find it?

Theorem 6.7. Every subspace of Rn has an orthonormal basis.

By way of a proof, we will present an algorithm called the Gram-Schmidt21 Orthogonalization Process, which answers the second question of how we find an orthonormal basis. The algorithm starts with a basis for a subspace of Rn (recall that by Theorem 5.6, every subspace has one!) and produces from it an orthonormal basis for the same subspace.

Let U be a subspace of Rn, and {u1, . . . , uk} ⊆ Rn a (not necessarily orthonormal) basis for U .

Step 1. Set v1 = u1.

Step 2. Set v2 = u2 − (〈u2 | v1〉/〈v1 | v1〉)v1. Notice that v2 ∈ U, because it is a linear combination of things from U and U is a subspace. Notice also that22

〈v1 | v2〉 = 〈v1 | u2 − (〈u2 | v1〉/〈v1 | v1〉)v1〉 = 〈v1 | u2〉 − 〈u2 | v1〉 = 0.

Moreover, v2 ≠ 0n, because otherwise we would have:

0n = v2 = u2 − (〈u2 | v1〉/〈v1 | v1〉)v1 = 1u2 − (〈u2 | v1〉/〈v1 | v1〉)u1

which is impossible because the set {u1, . . . , uk} is linearly independent.

Step 3. Set v3 = u3 − (〈u3 | v2〉/〈v2 | v2〉)v2 − (〈u3 | v1〉/〈v1 | v1〉)v1. By similar arguments to step 2, v3 ∈ U and v3 ≠ 0n. Now because 〈v1 | v2〉 = 0 we have

〈v1 | v3〉 = 〈v1 | u3 − (〈u3 | v2〉/〈v2 | v2〉)v2 − (〈u3 | v1〉/〈v1 | v1〉)v1〉
= 〈v1 | u3〉 − (〈u3 | v2〉/〈v2 | v2〉)〈v1 | v2〉 − (〈u3 | v1〉/〈v1 | v1〉)〈v1 | v1〉
= 〈v1 | u3〉 − 〈u3 | v1〉 = 0

A similar argument (exercise!) shows that 〈v2 | v3〉 = 0.

Now continue in the same way until finally.....

Step k. Set

vk = uk − (〈uk | vk−1〉/〈vk−1 | vk−1〉)vk−1 − · · · − (〈uk | v2〉/〈v2 | v2〉)v2 − (〈uk | v1〉/〈v1 | v1〉)v1.

Again we find that vk ≠ 0, vk ∈ U and 〈vi | vk〉 = 0 for all i < k.

So we end up with non-zero vectors v1, . . . , vk ∈ U such that each vj is orthogonal to each vi with i < j. Thus, {v1, . . . , vk} is an orthogonal (but not yet orthonormal!) subset of U. Now set

S = { (1/||v1||)v1, (1/||v2||)v2, . . . , (1/||vk||)vk }.

Then S is orthogonal by Proposition 6.2 and the elements are unit vectors by Corollary 6.4, so S is orthonormal. Now by Theorem 6.6, S is linearly independent. Finally,

|S| = k = |{u1, . . . , uk}| = dimU

so by Lemma 5.7, S is a basis for U.
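Aside (not examinable). The whole process fits in a few lines; this sketch (ours, using numpy) subtracts the components along the earlier vi from each new vector and then normalises at the end.

```python
import numpy as np

def gram_schmidt(basis):
    """Turn a basis (a list of vectors) into an orthonormal basis for its span."""
    vs = []
    for u in basis:
        v = np.array(u, dtype=float)
        for w in vs:
            v -= (np.inner(v, w) / np.inner(w, w)) * w   # remove the component along w
        vs.append(v)
    return [v / np.linalg.norm(v) for v in vs]           # scale to unit vectors

for v in gram_schmidt([(1, 1, 1), (0, 3, 3), (0, 0, 1)]):
    print(v)   # the orthonormal basis computed by hand in the example below
```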

21After Jørgen Pedersen Gram (1850–1916) and Erhard Schmidt (1876–1959), although the idea was known rather earlier.

22From here on we will make frequent use of the basic properties of the inner product, given by Proposition 6.1, without explicit mention.


Example. Let u1 = (1, 1, 1), u2 = (0, 3, 3) and u3 = (0, 0, 1). Consider the subspace

U = span({u1, u2, u3}) ⊆ R3

of all linear combinations of these vectors. It is easy to check that these three vectors are linearly independent, so they form a basis for U. However, they are neither orthogonal (check!) nor (except for u3) unit vectors, so they are not an orthonormal basis. We apply the Gram-Schmidt process to compute an orthonormal basis. First we set:

v1 = u1 = (1, 1, 1)

Now 〈u2 | v1〉 = 6 and 〈v1 | v1〉 = 3 so we set

v2 = u2 − (〈u2 | v1〉/〈v1 | v1〉)v1 = (0, 3, 3) − (6/3)(1, 1, 1) = (−2, 1, 1).

Now we have 〈v2 | v2〉 = 6, 〈u3 | v1〉 = 1 and 〈u3 | v2〉 = 1 so we set

v3 = u3 − (〈u3 | v2〉/〈v2 | v2〉)v2 − (〈u3 | v1〉/〈v1 | v1〉)v1 = (0, 0, 1) − (1/6)(−2, 1, 1) − (1/3)(1, 1, 1) = (0, −1/2, 1/2).

Thus,

{v1, v2, v3} = { (1, 1, 1), (−2, 1, 1), (0, −1/2, 1/2) }

is a basis of orthogonal vectors for the space U. To turn it into an orthonormal basis we just have to scale to get unit vectors, so

{ (1/√3)(1, 1, 1), (1/√6)(−2, 1, 1), (1/√2)(0, −1, 1) }

is an orthonormal basis for U.

Remark. What is U in the previous example? In fact since {u1, u2, u3} is a linearly independent set of size 3 = dimR3 it must be that U = R3! So what we have found is an orthonormal basis for R3.

Exercise. Use the Gram-Schmidt process to convert {u1, u2, u3} into an orthonormal basis for R3 where

u1 = (1, 0, 0), u2 = (3, 2,−2) and u3 = (1, 4, 1).

6.5 Orthogonal Matrices

Definition. A square matrix A is called orthogonal if A is invertible and A−1 = AT .

Theorem 6.8. Let A be an n× n matrix. Then the following are equivalent:

(i) A is orthogonal;

(ii) ATA = In;

(iii) AAT = In;

(iv) the columns of A (viewed as column n-vectors) form an orthonormal basis for Rn;

(v) the rows of A (viewed as row n-vectors) form an orthonormal basis for Rn.

Proof. The equivalence of (i), (ii) and (iii) follows from Exercise 5.8. Now let c1, . . . , cn be the columns of A, viewed as column n-vectors. It follows from the definitions of matrix multiplication, the transpose and the inner product that

(ATA)ij = ∑_{k=1}^{n} (AT)ik Akj = ∑_{k=1}^{n} Aki Akj = 〈ci | cj〉


for all i and j. So ATA = In means exactly that

(ATA)ii = 〈ci | ci〉 = 1 for all i and (ATA)ij = 〈ci | cj〉 = 0 for i ≠ j,

which means exactly that {c1, . . . , cn} is an orthonormal set. Since dimRn = n, by Lemma 5.7 this is the same as {c1, . . . , cn} being an orthonormal basis. This shows that (ii) ⇐⇒ (iv).

A very similar argument with rows instead of columns (exercise!) shows (iii) ⇐⇒ (v).

Remark/Warning. The terminology here is very confusing: an orthogonal matrix is one whose rows (and columns) are orthonormal, not just orthogonal! (It might be better if orthogonal matrices were called orthonormal matrices instead, but unfortunately this is an example of where illogical names have become too ingrained to change.)

Remark. The equivalence of (iv) and (v) alone is quite remarkable. Given an orthonormal basis for Rn, write it down as the rows of an n × n matrix. Magically23, the columns of the matrix will be another orthonormal basis! For example, consider the orthonormal basis for R3 which we obtained in the example in Section 6.4. Writing the basis vectors as the rows of a matrix we get

A = (  1/√3    1/√3   1/√3 )
    ( −2/√6    1/√6   1/√6 )
    (   0     −1/√2   1/√2 ).

Theorem 6.8 tells us both that A will be an orthogonal matrix (exercise: check!), and that the vectors which make up the columns of A will form another orthonormal basis for R3, namely:

{ (1/√3, −2/√6, 0), (1/√3, 1/√6, −1/√2), (1/√3, 1/√6, 1/√2) }.
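Aside (not examinable). The “magic” is easy to check numerically (our sketch): ATA = In says the columns are orthonormal, and AAT = In says the rows are.

```python
import numpy as np

s3, s6, s2 = np.sqrt(3), np.sqrt(6), np.sqrt(2)
A = np.array([[1/s3, 1/s3, 1/s3],
              [-2/s6, 1/s6, 1/s6],
              [0, -1/s2, 1/s2]])
print(np.allclose(A.T @ A, np.eye(3)))   # True: the columns are orthonormal
print(np.allclose(A @ A.T, np.eye(3)))   # True: the rows are orthonormal too
```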

In fact, the rows and the columns of a matrix are often connected in deep and mysterious ways; this is an example of a very general mathematical phenomenon called duality.

23Disclaimer: the University of Manchester does not really believe in magic. ;-)
