Differential Calculus in Several Variables · 2018. 3. 24.

Chapter 3

Differential Calculus in Several Variables

In this chapter we introduce the concept of differentiability for functions of several variables and derive their fundamental properties. Included are the chain rule, Taylor's theorem, maxima and minima, the inverse and implicit function theorems, constrained extrema and the Lagrange multiplier rule, functional dependence, and Morse's lemma.

3.1 Differentiable functions

To motivate the definition of differentiability for functions on $\mathbb{R}^n$, where $n > 1$, we recall the definition of the derivative when $n = 1$. Let $\Omega \subseteq \mathbb{R}$ be an interval and $a \in \Omega$. A function $f : \Omega \to \mathbb{R}$ is said to be differentiable at $a$ if the limit

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = f'(a)$$

exists.

The number $f'(a)$ is called the derivative of $f$ at $a$ and geometrically is the slope of the tangent line to the graph of $f$ at the point $(a, f(a))$. As it stands, this definition cannot be used for functions defined on $\mathbb{R}^n$ with $n > 1$, since division by elements of $\mathbb{R}^n$ makes no sense. However, $f'(a)$ gives information concerning the local behavior of $f$ near the point $a$. In fact, we may write the above formula equivalently as

$$\lim_{x \to a} \frac{|f(x) - f(a) - f'(a)(x - a)|}{|x - a|} = 0,$$

which makes precise the sense in which we approximate the values $f(x)$, for $x$ sufficiently near $a$, by the values of the linear function $y = f(a) + f'(a)(x - a)$. The graph of this function is the tangent line to the graph of $f$ at the point $(a, f(a))$. In other words, the error $R(x, a)$ made by approximating a point on the graph of $f$ by a point on the tangent line with the same $x$-coordinate is $f(x) - (f(a) + f'(a)(x - a))$ (for $x \neq a$) and has the property that

$$\lim_{x \to a} \frac{|R(x, a)|}{|x - a|} = 0.$$

Roughly speaking, this says that $R(x, a)$ approaches $0$ faster than $|x - a|$, and $f(a) + f'(a)(x - a)$ is a good approximation to $f(x)$ for $|x - a|$ small. Because $T(x) = f'(a)x$ defines a linear transformation $T : \mathbb{R} \to \mathbb{R}$ and the good approximation to $f(x)$ is $f(a) + T(x - a)$, it is this idea that generalizes to $\mathbb{R}^n$ using the terminology of linear algebra.

Definition 3.1.1. Let $f : \Omega \to \mathbb{R}^m$ be a function defined on an open set $\Omega$ in $\mathbb{R}^n$ and $a \in \Omega$. We say that $f$ is differentiable at $a$ if there is a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{x \to a} \frac{\|f(x) - f(a) - T(x - a)\|_m}{\|x - a\|_n} = 0.$$

Here in the numerator we use the norm in $\mathbb{R}^m$ and in the denominator that of $\mathbb{R}^n$. In the future we shall suppress the subscripts. As we shall see, $T$ is unique and is denoted by $D_f(a)$, the derivative of $f$ at the point $a$. If $f$ is differentiable at every point of $\Omega$, we say that $f$ is differentiable on $\Omega$.

Remark 3.1.2. Setting $h = x - a \neq 0$, we see that $x \to a$ in $\mathbb{R}^n$ is equivalent to $h \to 0$, and so the limit in Definition 3.1.1 is equivalent to

$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - T(h)\|}{\|h\|} = 0. \tag{3.1}$$


The derivative¹ $T = D_f(a)$ depends on the point $a$ as well as the function $f$. We are not saying that there exists a $T$ which works for all $a$, but that for a fixed $a$ such a $T$ exists. Differentiability of $f$ at $a$ says that $f(x)$ is "well" approximated by the affine transformation $F(x) = f(a) + T(x - a)$ for $x$ near $a$. Important particular cases of the derivative are when $m = 1$ or $n = 1$. When $m = 1$, $f$ is a real-valued function and the derivative $T$ is a linear functional on $\mathbb{R}^n$, which will turn out to be the so-called gradient of $f$. When $n = 1$, for $t \in \Omega \subseteq \mathbb{R}$, $f(t)$ is a curve in $\mathbb{R}^m$ and the derivative $T$ is a vector in $\mathbb{R}^m$, which will turn out to be the tangent vector to the curve at $f(a)$.

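The following is called the Linear Approximation theorem. Since $a$ is fixed, it will be convenient in what follows to write $\varepsilon(x)$ rather than $\varepsilon(x, a)$. Here $\varepsilon$ is a function from $\Omega \subseteq \mathbb{R}^n$ to $\mathbb{R}^m$.

Theorem 3.1.3. A function $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a$ if and only if there are a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ and a function $\varepsilon(x)$ such that for $x \in \Omega$ we have

$$f(x) = f(a) + T(x - a) + \varepsilon(x)\|x - a\|$$

with $\varepsilon(x) \to 0$ as $x \to a$.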

Proof. Set

$$\varepsilon(x) = \frac{f(x) - f(a) - T(x - a)}{\|x - a\|}, \quad x \neq a.$$

Now, if $f$ is differentiable at $a$, then $\lim_{x \to a} \varepsilon(x) = 0$. Conversely, if

$$f(x) = f(a) + T(x - a) + \varepsilon(x)\|x - a\|$$

holds, then for $x \neq a$ we have

$$\frac{f(x) - f(a) - T(x - a)}{\|x - a\|} = \varepsilon(x) \to 0$$

as $x \to a$, and $f$ is differentiable at $a$.

¹ In more abstract settings where $f$ is a differentiable mapping between complete normed linear spaces (called Banach spaces) the derivative $D_f(a)$ is known as the Fréchet derivative.


Note that by Remark 3.1.2, differentiability of $f$ at $a$ is equivalent to

$$f(a + h) = f(a) + T(h) + \varepsilon(h)\|h\| \tag{3.2}$$

with $\varepsilon(h) \to 0$ as $h \to 0$.
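The condition (3.1) can be observed numerically. The following is a minimal sketch, not part of the text: the map `f`, the point `a`, and the direction of `h` are illustrative assumptions, and `T` is the candidate linear map obtained by differentiating the components of this particular `f` by hand.

```python
import math

def f(x, y):
    # Illustrative map f : R^2 -> R^2 (an assumption, not from the text).
    return (x * x + y, math.sin(x) * y)

def T(a, h):
    # Candidate derivative of f at a, applied to the increment h.
    a1, a2 = a
    h1, h2 = h
    return (2 * a1 * h1 + h2,
            a2 * math.cos(a1) * h1 + math.sin(a1) * h2)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(a, h):
    # The quotient ||f(a + h) - f(a) - T(h)|| / ||h|| appearing in (3.1).
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    th = T(a, h)
    diff = [fah[i] - fa[i] - th[i] for i in range(2)]
    return norm(diff) / norm(h)

a = (1.0, 2.0)
scales = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [remainder_ratio(a, (s, -0.5 * s)) for s in scales]
for s, r in zip(scales, ratios):
    print(s, r)
```

For this choice the printed ratios shrink roughly in proportion to the size of $h$, which is the behavior (3.1) demands. A numerical experiment of this kind only suggests differentiability; it does not prove it.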

Proposition 3.1.4. Let $f : \Omega \to \mathbb{R}^m$ be a function defined on an open set $\Omega$ in $\mathbb{R}^n$ and $a \in \Omega$. If $f$ is differentiable at $a$, then $D_f(a)$ is uniquely determined by $f$.

Proof. Suppose $T$, $S$ are linear transformations satisfying (3.2). We prove $T = S$. By Theorem 3.1.3, we have $f(a + h) = f(a) + T(h) + \varepsilon_T(h)\|h\|$ and $f(a + h) = f(a) + S(h) + \varepsilon_S(h)\|h\|$ with both $\varepsilon_T(h)$ and $\varepsilon_S(h)$ approaching zero as $h \to 0$. Subtracting, we get

$$T(h) - S(h) = (\varepsilon_S(h) - \varepsilon_T(h))\|h\|.$$

Setting $L = T - S$ (again a linear transformation), and dividing by $\|h\|$, it follows that

$$\frac{\|L(h)\|}{\|h\|} = \|\varepsilon_T(h) - \varepsilon_S(h)\| \to 0$$

as $h \to 0$. Now, let $x \in \mathbb{R}^n$ be any (but fixed) nonzero vector and for $t \in \mathbb{R}$, $t \neq 0$, take $h = tx$. Then $h \to 0$ is equivalent to $t \to 0$, and since $L$ is linear,

$$0 = \lim_{t \to 0} \frac{\|L(tx)\|}{\|tx\|} = \lim_{t \to 0} \frac{|t|\,\|L(x)\|}{|t|\,\|x\|} = \frac{\|L(x)\|}{\|x\|}.$$

Therefore, $\|L(x)\| = 0$. Thus, $L(x) = 0$ for all $x$ and so $L = 0$. Hence $T = S$.

Example 3.1.5. Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be an affine transformation of the form $F(x) = T(x) + v$, where $T$ is a fixed linear transformation and $v$ is a fixed vector. Then $F$ is everywhere differentiable and $D_F(a) = T$ for all $a \in \mathbb{R}^n$. Indeed,

$$\lim_{h \to 0} \frac{\|F(a + h) - F(a) - T(h)\|}{\|h\|} = \lim_{h \to 0} \frac{0}{\|h\|} = 0.$$

In particular, note that a linear transformation $T$ is its own derivative, that is, $D_T(a) = T$.


One simple but important consequence of differentiability is continuity.

Proposition 3.1.6. If $f$ is differentiable at $a$, then it is continuous at $a$.

Proof. Since $f$ is differentiable at $a$, we have

$$f(x) = f(a) + D_f(a)(x - a) + \varepsilon(x)\|x - a\|.$$

Let $x \to a$. Then the third term on the right tends to zero "in spades". As for the second term, by Proposition 2.4.3 there is $b > 0$ such that

$$\|D_f(a)(x - a)\| \leq b\|x - a\|,$$

and so it also tends to zero. Hence, $\lim_{x \to a} f(x) = f(a)$.

Next we state the differentiation rules. The proofs of the differentiation rules proceed almost exactly as in the one variable case, with slight modifications in notation, and we leave them to the reader as an exercise.

Theorem 3.1.7. Let $f$ and $g$ be functions from an open set $\Omega$ in $\mathbb{R}^n$ to $\mathbb{R}^m$, differentiable at $a \in \Omega$, and let $c \in \mathbb{R}$. Then

1. $f + g$ is differentiable at $a$ and

$$D_{(f+g)}(a) = D_f(a) + D_g(a).$$

2. $cf$ is differentiable at $a$ and

$$D_{cf}(a) = cD_f(a).$$

3. Suppose $m = 1$. Then $(fg)(x) = f(x)g(x)$ is differentiable at $a$ and

$$D_{fg}(a) = g(a)D_f(a) + f(a)D_g(a).$$

4. Suppose $m = 1$ and $g(a) \neq 0$. Then $\left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)}$ is differentiable at $a$ and

$$D_{\left(\frac{f}{g}\right)}(a) = \frac{g(a)D_f(a) - f(a)D_g(a)}{g(a)^2}.$$


We now come to the Chain Rule, which deals with the derivative of a composite function.

Theorem 3.1.8. (Chain Rule). Let $\Omega$ be open in $\mathbb{R}^n$ and $f : \Omega \to \mathbb{R}^m$ and $g : U \to \mathbb{R}^p$, where $U$ is open in $\mathbb{R}^m$ with $f(\Omega) \subseteq U$. If $f$ is differentiable at $a \in \Omega$ and $g$ is differentiable at $f(a)$, then $g \circ f$ is differentiable at $a$ and

$$D_{(g \circ f)}(a) = D_g(f(a))\,D_f(a).$$

Proof. We shall apply the Linear Approximation theorem. Let $b = f(a)$ and $y = f(x)$ for $x \in \Omega$. By the differentiability of $f$ and $g$, we have

$$f(x) = f(a) + D_f(a)(x - a) + \varepsilon_1(x)\|x - a\|$$

and

$$g(y) = g(b) + D_g(b)(y - b) + \varepsilon_2(y)\|y - b\|.$$

Substituting the first equation into the second yields

$$g(f(x)) = g(f(a)) + D_g(f(a))\big(D_f(a)(x - a) + \varepsilon_1(x)\|x - a\|\big) + \varepsilon_2(y)\,\big\|D_f(a)(x - a) + \varepsilon_1(x)\|x - a\|\big\|.$$

Since $D_g(f(a))$ is linear, it follows that for $x \neq a$

$$g(f(x)) = g(f(a)) + D_g(f(a))D_f(a)(x - a) + \|x - a\|\,D_g(f(a))\varepsilon_1(x) + \varepsilon_2(y)\left\|D_f(a)\frac{x - a}{\|x - a\|} + \varepsilon_1(x)\right\|\,\|x - a\|.$$

Factoring $\|x - a\|$ from the last two terms, we get

$$g(f(x)) = g(f(a)) + D_g(f(a))D_f(a)(x - a) + \varepsilon(x)\|x - a\|,$$

where

$$\varepsilon(x) = D_g(f(a))\varepsilon_1(x) + \varepsilon_2(y)\left\|D_f(a)\frac{x - a}{\|x - a\|} + \varepsilon_1(x)\right\|.$$

The proof will be complete if we show that $\varepsilon(x) \to 0$ as $x \to a$. Clearly, the first term tends to zero as $x \to a$ (since $\varepsilon_1(x) \to 0$ as $x \to a$ and $D_g(f(a))$ is continuous). For the second term, using the triangle inequality and the fact that $D_f(a)$ is bounded (i.e., $\|D_f(a)(v)\| \leq c\|v\|$ for some $c > 0$), we have

$$0 \leq \|\varepsilon_2(y)\|\left\|\frac{1}{\|x - a\|}D_f(a)(x - a) + \varepsilon_1(x)\right\| \leq \|\varepsilon_2(y)\|\,(c + \|\varepsilon_1(x)\|).$$

As $x \to a$, the continuity of $f$ at $a$ implies $y \to b$, and so both $\varepsilon_2(y) \to 0$ and $\varepsilon_1(x) \to 0$ as $x \to a$. Hence $\|\varepsilon_2(y)\|\,(c + \|\varepsilon_1(x)\|) \to 0$.

Proposition 3.1.9. Let $\Omega$ be an open set in $\mathbb{R}^n$ and $f : \Omega \to \mathbb{R}^m$, with component functions $f = (f_1, f_2, \ldots, f_m)$. Then $f$ is differentiable at $a$ if and only if $f_j$ is differentiable at $a$, for all $j = 1, 2, \ldots, m$. Moreover,

$$D_f(a) = (D_{f_1}(a), \ldots, D_{f_m}(a)).$$

Proof. Let $D_f(a) = T = (\lambda_1, \ldots, \lambda_m)$. If $f$ is differentiable at $a$, the vector equality $f(x) = f(a) + T(x - a) + \varepsilon(x)\|x - a\|$ written in terms of the components becomes $f_j(x) = f_j(a) + \lambda_j(x - a) + \varepsilon_j(x)\|x - a\|$, for $j = 1, 2, \ldots, m$. Since $\varepsilon(x)$ has $m$ components,

$$\max_{j = 1, \ldots, m} |\varepsilon_j(x)| \leq \|\varepsilon(x)\| \leq \sqrt{m}\,\max_{j = 1, \ldots, m} |\varepsilon_j(x)|,$$

we have $\varepsilon(x) \to 0$ as $x \to a$ if and only if $\varepsilon_j(x) \to 0$ as $x \to a$ for every $j$, and the result follows. At the same time, by the uniqueness of the derivative, $\lambda_j = D_{f_j}(a)$ for $j = 1, \ldots, m$.

By the above proposition, we see that to study the differentiability of functions $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}^m$, it suffices to study the differentiability of the component functions $f_j : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$, for $j = 1, \ldots, m$. Hence we turn to real-valued functions of several variables.

3.2 Partial and directional derivatives, tangent space

Here we consider a real-valued function $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ on an open set $\Omega$ which we assume is differentiable at $a \in \Omega$. Thus, there is a linear functional $\lambda : \mathbb{R}^n \to \mathbb{R}$ such that

$$f(x) = f(a) + \lambda(x - a) + \varepsilon(x)\|x - a\|$$

with $\varepsilon(x) \to 0$ as $x \to a$. By Theorem 2.4.12, a linear functional on $\mathbb{R}^n$ is given by the inner product with some fixed vector $w \in \mathbb{R}^n$, that is, $\lambda(x) = \langle w, x \rangle$. This vector, which depends on $a$, is called the gradient of $f$ at $a$ and is denoted by $\nabla f(a)$. Thus,

$$f(x) = f(a) + \langle \nabla f(a), x - a \rangle + \varepsilon(x)\|x - a\|.$$

We want to get a convenient and explicit form of the gradient. To do so for $f$ as above, we define the notion of directional derivative.

Definition 3.2.1. The directional derivative of $f$ at $a$ in the direction of a nonzero vector $u \in \mathbb{R}^n$, denoted by $D_u f(a)$, is defined by

$$D_u f(a) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t},$$

whenever the limit exists.

We remark that the function $g : \mathbb{R} \to \mathbb{R}$ given by $g(t) = f(a + tu)$ represents the function $f$ restricted to the line $X(t) = a + tu$ in $\Omega$, passing through $a$ in the direction of $u$. Since

$$g'(0) = \lim_{t \to 0} \frac{g(t) - g(0)}{t} = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = D_u f(a),$$

the directional derivative is the rate of change of $f$ in the direction $u$.

Theorem 3.2.2. If $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a \in \Omega$, then for any direction $u \neq 0$, $u \in \mathbb{R}^n$, $D_u f(a)$ exists and

$$D_u f(a) = \langle \nabla f(a), u \rangle.$$

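Proof. Let $t \in \mathbb{R}$, $t \neq 0$, and normalize $u \neq 0$ so that $\|u\| = 1$. Since $f$ is differentiable at $a$, $f(x) = f(a) + \langle \nabla f(a), x - a \rangle + \varepsilon(x)\|x - a\|$, with $\varepsilon(x) \to 0$ as $x \to a$. Setting $x = a + tu$, this yields $f(a + tu) = f(a) + \langle \nabla f(a), tu \rangle + \varepsilon(a + tu)\|tu\|$ and implies

$$\frac{f(a + tu) - f(a)}{t} - \langle \nabla f(a), u \rangle = \frac{|t|}{t}\,\varepsilon(a + tu).$$

Taking absolute values we get

$$\left|\frac{f(a + tu) - f(a)}{t} - \langle \nabla f(a), u \rangle\right| = |\varepsilon(a + tu)|.$$

Now, as $t \to 0$, $x \to a$ and so $\lim_{t \to 0} |\varepsilon(a + tu)| = 0$. Hence,

$$\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle.$$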

Corollary 3.2.3. For nonzero vectors $u, v \in \mathbb{R}^n$ and $c, d \in \mathbb{R}$ with $cu + dv \neq 0$, we have

$$D_{cu + dv} f(a) = c\,D_u f(a) + d\,D_v f(a).$$

We now address the question: for which direction $u$ is $|D_u f(a)|$ the largest?

Corollary 3.2.4. Among unit vectors $u$, $|D_u f(a)|$ is maximum in the direction of $\nabla f(a)$.

Proof. By the Cauchy-Schwarz inequality,

$$|D_u f(a)| = |\langle \nabla f(a), u \rangle| \leq \|\nabla f(a)\| \cdot \|u\|,$$

with equality only if $u$ is a scalar multiple of $\nabla f(a)$.

Certain directions are special, namely those of the standard basis elements $e_i$ for $i = 1, \ldots, n$, the directions of the coordinate axes.

Definition 3.2.5. Let $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$. The directional derivative of $f$ at $a$ in the direction of $e_i$ is denoted by

$$\frac{\partial f}{\partial x_i}(a) = D_{e_i} f(a) = \langle \nabla f(a), e_i \rangle$$

and is called the partial derivative of $f$ at $a$ with respect to $x_i$.

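Writing $f(a_1, \ldots, a_n)$ in place of $f(a)$, we see that

$$\frac{\partial f}{\partial x_i}(a) = \lim_{t \to 0} \frac{f(a + te_i) - f(a)}{t}$$

is simply the ordinary derivative of $f$ considered as a function of $x_i$ alone, keeping the other components fixed. Since $\frac{\partial f}{\partial x_i}(a) = \langle \nabla f(a), e_i \rangle$ is the $i$th component of $\nabla f(a)$ for each $i = 1, \ldots, n$,

$$\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_n}(a)\right),$$

and so

$$f(x) = f(a) + \sum_{i=1}^{n} (x_i - a_i)\frac{\partial f}{\partial x_i}(a) + \varepsilon(x)\|x - a\|.$$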


Example 3.2.6. Let $f(x, y, z) = 2x^2 + 3y^2 + z^2$. Find the directional derivative of $f$ at $a = (2, 1, 3)$ in the direction $v = (1, 0, -2)$. What is the largest of the directional derivatives of $f$ at $a$, and in what direction does it occur?

Solution. $\|v\| = \sqrt{5}$. The unit vector in the given direction is $u = \left(\frac{1}{\sqrt{5}}, 0, \frac{-2}{\sqrt{5}}\right)$. The gradient is $\nabla f(x, y, z) = (4x, 6y, 2z)$, so that $\nabla f(2, 1, 3) = (8, 6, 6)$. By Theorem 3.2.2,

$$D_u f(a) = \langle \nabla f(a), u \rangle = \left\langle (8, 6, 6), \left(\tfrac{1}{\sqrt{5}}, 0, \tfrac{-2}{\sqrt{5}}\right) \right\rangle = -\frac{4}{\sqrt{5}}.$$

The negative sign indicates that $f$ decreases in the given direction. The largest directional derivative at $a$ is $\|\nabla f(a)\| = 2\sqrt{34}$, and it occurs in the direction $\frac{1}{\sqrt{34}}(4, 3, 3)$.
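The computation in Example 3.2.6 can be cross-checked numerically. The sketch below is only an illustration: the helper `directional_derivative` and the step size `t` are assumptions, and it uses a symmetric difference quotient as a numerical stand-in for the limit in Definition 3.2.1.

```python
import math

def f(x, y, z):
    return 2 * x**2 + 3 * y**2 + z**2

def grad_f(x, y, z):
    # Gradient computed by hand in Example 3.2.6.
    return (4 * x, 6 * y, 2 * z)

def directional_derivative(func, a, u, t=1e-6):
    # Symmetric difference quotient approximating D_u func(a).
    plus = func(*(ai + t * ui for ai, ui in zip(a, u)))
    minus = func(*(ai - t * ui for ai, ui in zip(a, u)))
    return (plus - minus) / (2 * t)

a = (2.0, 1.0, 3.0)
v = (1.0, 0.0, -2.0)
length = math.sqrt(sum(c * c for c in v))
u = tuple(c / length for c in v)          # unit vector in the direction of v

numeric = directional_derivative(f, a, u)
exact = sum(g * c for g, c in zip(grad_f(*a), u))   # <grad f(a), u>
largest = math.sqrt(sum(g * g for g in grad_f(*a))) # ||grad f(a)||

print(numeric, exact, largest)
```

Both `numeric` and `exact` should agree with $-4/\sqrt{5}$ up to rounding, and `largest` with $2\sqrt{34}$.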

Next we show how matrices arise in connection with derivatives. We shall see that if $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in \Omega$, then the partial derivatives of its component functions $\frac{\partial f_j}{\partial x_i}(a)$ exist and determine the linear transformation $D_f(a)$ completely.

Theorem 3.2.7. Let $\Omega$ be an open set in $\mathbb{R}^n$ and $f : \Omega \to \mathbb{R}^m$ be differentiable at $a \in \Omega$. Then $\frac{\partial f_j}{\partial x_i}(a)$ exist and the standard matrix representation of $D_f(a)$ is the $m \times n$ matrix whose $ji$th entry is $\frac{\partial f_j}{\partial x_i}(a)$ for $j = 1, \ldots, m$ and $i = 1, \ldots, n$.

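Proof. Let $D_f(a) = T = (\lambda_1, \ldots, \lambda_m)$ and let $e_1, \ldots, e_n$ and $u_1, \ldots, u_m$ be the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively. By the definition of the matrix of a linear map, the $ji$th entry of the standard matrix of $T$, say $c_{ji}$, is given by the $j$th component of the vector $T(e_i) = \sum_{j=1}^{m} c_{ji}u_j$. Since $f$ is differentiable at $a$, by Proposition 3.1.9 each $f_j$ is differentiable at $a$. Hence, by Theorem 3.2.2 each $\frac{\partial f_j}{\partial x_i}(a)$ exists and $\frac{\partial f_j}{\partial x_i}(a) = \langle \nabla f_j(a), e_i \rangle = \lambda_j(e_i)$.

So we have

$$T(e_i) = (\lambda_1(e_i), \ldots, \lambda_m(e_i)) = \sum_{j=1}^{m} \lambda_j(e_i)u_j = \sum_{j=1}^{m} \frac{\partial f_j}{\partial x_i}(a)\,u_j.$$

Therefore,

$$\sum_{j=1}^{m} \left[c_{ji} - \frac{\partial f_j}{\partial x_i}(a)\right]u_j = 0.$$

Since $u_1, \ldots, u_m$ are linearly independent, it follows that $c_{ji} = \frac{\partial f_j}{\partial x_i}(a)$ for $j = 1, \ldots, m$ and $i = 1, \ldots, n$.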

As the following example shows, the converse of Theorem 3.2.7 is false. Namely, the existence of the partial derivatives of $f : \Omega \to \mathbb{R}$ at $a \in \Omega$ does not imply differentiability of $f$ at the same point $a$ (nor even continuity at $a$). However, see Theorem 3.2.16.

Example 3.2.8. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x, y) = \begin{cases} 0 & \text{if } x = 0 \text{ or } y = 0, \\ 1 & \text{otherwise.} \end{cases}$$

We have

$$\frac{\partial f}{\partial x}(0, 0) = \lim_{t \to 0} \frac{f(t, 0) - f(0, 0)}{t} = \lim_{t \to 0} \frac{0}{t} = 0.$$

Similarly, $\frac{\partial f}{\partial y}(0, 0) = 0$. But $f$ is not continuous at $(0, 0)$, since $\lim_{(x, y) \to (0, 0)} f(x, y)$ does not exist. Hence, $f$ is not differentiable at $(0, 0)$. It is quite simple to understand such behavior. The partial derivatives depend only on what happens in the direction of the coordinate axes, whereas the definition of the derivative $D_f$ involves the combined behavior of $f$ in a whole neighborhood of a given point. Note also that the vector of partial derivatives $\left(\frac{\partial f}{\partial x}(0, 0), \frac{\partial f}{\partial y}(0, 0)\right) = (0, 0)$ exists at $(0, 0)$, but still $f$ is not differentiable at $(0, 0)$.
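A short numerical illustration of Example 3.2.8 follows. It is a sketch under simple assumptions (the step sizes in `steps` are arbitrary choices): the difference quotients along the coordinate axes are identically zero, while the values of the function along the diagonal stay equal to 1 however close one gets to the origin.

```python
def f(x, y):
    # The function of Example 3.2.8.
    return 0.0 if x == 0 or y == 0 else 1.0

def quotient_x(t):
    # Difference quotient for the partial derivative in x at the origin.
    return (f(t, 0.0) - f(0.0, 0.0)) / t

def quotient_y(t):
    # Difference quotient for the partial derivative in y at the origin.
    return (f(0.0, t) - f(0.0, 0.0)) / t

steps = [10.0 ** (-k) for k in range(1, 8)]
quotients_x = [quotient_x(t) for t in steps]
quotients_y = [quotient_y(t) for t in steps]
diagonal_values = [f(t, t) for t in steps]

print(quotients_x)      # all zero: the partial derivative in x is 0
print(quotients_y)      # all zero: the partial derivative in y is 0
print(diagonal_values)  # all one: f does not tend to f(0, 0) = 0
```

The coordinate axes are the only directions the partial derivatives can see, which is why they miss the jump off the axes.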

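Definition 3.2.9. The standard matrix of $D_f(a)$, denoted again by $D_f(a)$, is called the Jacobian matrix of $f$ at $a$. That is, the Jacobian matrix of $f$ is the $m \times n$ matrix

$$D_f(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix} = \begin{pmatrix} D_{f_1}(a) \\ \vdots \\ D_{f_m}(a) \end{pmatrix}.$$

This reduces the problem of computing the derivative of a differentiable function $f$ to that of computing the partial derivatives of its component functions $f_1, \ldots, f_m$.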

When $m = n$, the Jacobian matrix of $f$ is a square $n \times n$ matrix and its determinant is then defined. This determinant is called the Jacobian² of $f$ at $a$ and is denoted by $J_f(a)$. Thus,

$$J_f(a) = \det(D_f(a)).$$

Other common notations for the Jacobian are

$$\left.\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}\right|_{x = a} \quad \text{or} \quad \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)} \quad \text{for } y = f(x).$$

Note that when $m = 1$, in which case $f$ is a real-valued function of $n$ variables, the derivative $D_f(a)$ is a $1 \times n$ (row) matrix which can be regarded as a vector in $\mathbb{R}^n$. This is the gradient vector

$$\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_n}(a)\right).$$

The case $n = 1$ of a vector-valued function in one variable is also important. As mentioned earlier, here it is customary to use $X(t)$ rather than $f(x)$. Thus,

$$X(t) = (x_1(t), \ldots, x_m(t))$$

and $X$ is called a curve in $\mathbb{R}^m$. If $X$ is differentiable at $t = a$, we write $X'(a)$ for the derivative, which is represented by the $m \times 1$ column vector

$$X'(a) = \begin{pmatrix} x_1'(a) \\ \vdots \\ x_m'(a) \end{pmatrix}. \tag{3.3}$$

² C. Jacobi (1804-1851). He initiated the theory of elliptic functions and made important contributions in differential equations, number theory, the theory of determinants and other fields of mathematics.


Of course, $X$ is differentiable at $a$ if and only if each $x_j$ is differentiable at $a$, for $j = 1, \ldots, m$. Since $X$ is differentiable at $a$ if and only if

$$X'(a) = \lim_{h \to 0} \frac{X(a + h) - X(a)}{h}$$

exists, using the fact that $\frac{1}{h}(X(a + h) - X(a))$ is a chord which approximates the tangent line to the curve at $X(a)$, we see that $X'(a)$ represents the tangent vector at $X(a)$. For this reason, we call $X'(a)$ the tangent vector to the curve $X(t)$ at $t = a$. If $X'(a) \neq 0$, then the tangent line to $X(t)$ at $a$ is given by $Y(t) = X(a) + tX'(a)$. The physical interpretation here is: if $X(t)$ is the position of a particle moving smoothly on a curve in space at time $t$, then $X'(t)$ is the velocity vector, which is of course tangent to the curve at that time, and $\|X'(t)\|$ is its speed.

If $X'(t)$ exists for all $t \in \mathbb{R}$ and is itself differentiable, then a glance at (3.3) shows

$$X''(t) = \begin{pmatrix} x_1''(t) \\ \vdots \\ x_m''(t) \end{pmatrix},$$

which physically represents the acceleration vector of the particle moving along the curve $X(t)$.

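Example 3.2.10. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be $f(x, y, z) = \left(\frac{x^2 - y^4 - 2z}{2}, yz\right) = (u, v)$. Find $D_f(x, y, z)$. What is $D_f(3, 2, -1)$? Find the Jacobians $\frac{\partial(u, v)}{\partial(x, y)}$, $\frac{\partial(u, v)}{\partial(y, z)}$ and $\frac{\partial(u, v)}{\partial(x, z)}$.

Solution. Here, $f$ has two component functions $u = f_1(x, y, z) = \frac{x^2 - y^4 - 2z}{2}$ and $v = f_2(x, y, z) = yz$, and the derivative of $f$ at any point $(x, y, z)$ is

$$D_f(x, y, z) = \begin{pmatrix} \frac{\partial f_1}{\partial x}(x, y, z) & \frac{\partial f_1}{\partial y}(x, y, z) & \frac{\partial f_1}{\partial z}(x, y, z) \\ \frac{\partial f_2}{\partial x}(x, y, z) & \frac{\partial f_2}{\partial y}(x, y, z) & \frac{\partial f_2}{\partial z}(x, y, z) \end{pmatrix} = \begin{pmatrix} x & -2y^3 & -1 \\ 0 & z & y \end{pmatrix}.$$

In particular,

$$D_f(3, 2, -1) = \begin{pmatrix} 3 & -16 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$

Finally,

$$\frac{\partial(u, v)}{\partial(x, y)} = \det\begin{pmatrix} x & -2y^3 \\ 0 & z \end{pmatrix} = xz.$$

Similarly, $\frac{\partial(u, v)}{\partial(y, z)} = -2y^4 + z$ and $\frac{\partial(u, v)}{\partial(x, z)} = xy$.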
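The Jacobian matrix of Example 3.2.10 and its three $2 \times 2$ minors can be checked with finite differences. This is a sketch only: `jacobian_numeric`, `minor`, and the step `h` are illustrative helpers introduced here, not notation from the text.

```python
def f(x, y, z):
    # f = (u, v) with u = (x^2 - y^4 - 2z) / 2 and v = y z.
    return ((x**2 - y**4 - 2 * z) / 2, y * z)

def jacobian_exact(x, y, z):
    # Rows are the gradients of u and v computed by hand.
    return [[x, -2 * y**3, -1.0],
            [0.0, z, y]]

def jacobian_numeric(func, point, h=1e-6):
    # Central differences, one coordinate direction at a time.
    m, n = len(func(*point)), len(point)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        forward, backward = list(point), list(point)
        forward[i] += h
        backward[i] -= h
        fp, fm = func(*forward), func(*backward)
        for j in range(m):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

def minor(J, cols):
    # Determinant of the 2 x 2 submatrix of J formed by the two given columns.
    c0, c1 = cols
    return J[0][c0] * J[1][c1] - J[0][c1] * J[1][c0]

point = (3.0, 2.0, -1.0)
exact = jacobian_exact(*point)
J = jacobian_numeric(f, point)

print(J)
print(minor(exact, (0, 1)), minor(exact, (1, 2)), minor(exact, (0, 2)))
```

At $(3, 2, -1)$ the three minors are the values of $xz$, $-2y^4 + z$ and $xy$ at that point, namely $-3$, $-33$ and $6$.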

Theorem 3.2.7 enables us to look at the Chain Rule (Theorem 3.1.8) in terms of partial derivatives. Since composition of linear transformations corresponds to matrix multiplication, the Chain Rule $D_{(g \circ f)}(a) = D_g(f(a))\,D_f(a)$ yields the following.

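Corollary 3.2.11. (Chain Rule)

$$D_{(g \circ f)}(a) = \begin{pmatrix} \frac{\partial g_1}{\partial y_1}(f(a)) & \cdots & \frac{\partial g_1}{\partial y_m}(f(a)) \\ \vdots & \ddots & \vdots \\ \frac{\partial g_p}{\partial y_1}(f(a)) & \cdots & \frac{\partial g_p}{\partial y_m}(f(a)) \end{pmatrix} \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}.$$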

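Two important special cases are:

Case 1. $p = 1$. Then

$$D_{(g \circ f)}(a) = \begin{pmatrix} \frac{\partial g}{\partial y_1}(f(a)) & \cdots & \frac{\partial g}{\partial y_m}(f(a)) \end{pmatrix} \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}.$$

Writing this out, we obtain, for $i = 1, \ldots, n$,

$$\frac{\partial(g \circ f)}{\partial x_i}(a) = \sum_{j=1}^{m} \frac{\partial g}{\partial y_j}(f(a)) \cdot \frac{\partial f_j}{\partial x_i}(a). \tag{3.4}$$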

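Case 2. Let $\Omega \subseteq \mathbb{R}$ and $X : \Omega \to \mathbb{R}^n$ and $g : U \to \mathbb{R}$, where $U$ is open in $\mathbb{R}^n$ such that $X(\Omega) \subseteq U$. If $X$ is differentiable at $a$ and $g$ is differentiable at $X(a) = b$, then $g \circ X : \Omega \to \mathbb{R}$ is a (one-variable) function differentiable at $a$ and

$$\frac{d(g \circ X)}{dt}(a) = \begin{pmatrix} \frac{\partial g}{\partial x_1}(X(a)) & \cdots & \frac{\partial g}{\partial x_n}(X(a)) \end{pmatrix} \begin{pmatrix} \frac{dx_1}{dt}(a) \\ \vdots \\ \frac{dx_n}{dt}(a) \end{pmatrix}.$$

That is,

$$\frac{d(g \circ X)}{dt}(a) = \sum_{i=1}^{n} \frac{\partial g}{\partial x_i}(X(a)) \cdot \frac{dx_i}{dt}(a). \tag{3.5}$$

Alternatively,

$$\frac{d(g \circ X)}{dt}(a) = \langle \nabla g(X(a)), X'(a) \rangle.$$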

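Example 3.2.12. Let $z = e^{xy^2}$ and suppose $x = t\cos t$, $y = t\sin t$. Compute $\frac{dz}{dt}$ at $t = \frac{\pi}{2}$.

Solution. By (3.5) we have

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = \left(y^2 e^{xy^2}\right)(\cos t - t\sin t) + \left(2xy\,e^{xy^2}\right)(\sin t + t\cos t).$$

At $t = \frac{\pi}{2}$, $x = 0$ and $y = \frac{\pi}{2}$. Hence

$$\left.\frac{dz}{dt}\right|_{t = \frac{\pi}{2}} = \frac{\pi^2}{4}\left(-\frac{\pi}{2}\right) = -\frac{\pi^3}{8}.$$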
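As a sanity check on Example 3.2.12, the chain-rule value can be compared with a direct numerical derivative of the composite function. The sketch below assumes a central difference with step `h = 1e-6`; the function names are illustrative.

```python
import math

def z_of_t(t):
    # The composite z(t) = exp(x y^2) with x = t cos t, y = t sin t.
    x, y = t * math.cos(t), t * math.sin(t)
    return math.exp(x * y**2)

def dz_dt_chain(t):
    # Formula (3.5): dz/dt = z_x * x'(t) + z_y * y'(t).
    x, y = t * math.cos(t), t * math.sin(t)
    e = math.exp(x * y**2)
    dz_dx = y**2 * e
    dz_dy = 2 * x * y * e
    dx_dt = math.cos(t) - t * math.sin(t)
    dy_dt = math.sin(t) + t * math.cos(t)
    return dz_dx * dx_dt + dz_dy * dy_dt

t0 = math.pi / 2
h = 1e-6
chain_value = dz_dt_chain(t0)
numeric_value = (z_of_t(t0 + h) - z_of_t(t0 - h)) / (2 * h)

print(chain_value, numeric_value, -math.pi**3 / 8)
```

All three printed numbers should agree to several decimal places.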

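Example 3.2.13. Suppose $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ is the polar coordinates mapping defined by

$$(x, y) = \varphi(r, \theta) = (r\cos\theta, r\sin\theta)$$

(see Figure 3.1). Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a differentiable function and $u = f \circ \varphi$. Find $\frac{\partial u}{\partial r}$ and $\frac{\partial u}{\partial \theta}$.

Solution. The composition is $u(r, \theta) = f(r\cos\theta, r\sin\theta)$. Then the Chain Rule (3.4) gives

$$\frac{\partial u}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta,$$

and

$$\frac{\partial u}{\partial \theta} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta} = -\frac{\partial f}{\partial x}\,r\sin\theta + \frac{\partial f}{\partial y}\,r\cos\theta.$$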


Figure 3.1: Polar coordinates

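Example 3.2.14. Let $f(x, y, z) = (x^2 y, y^2, e^{-xz})$ and $g(u, v, w) = u^2 - v^2 - w$. Find $g \circ f$ and compute the derivative of $g \circ f$

1. directly,

2. using the Chain Rule.

Solution. Set $F = g \circ f : \mathbb{R}^3 \to \mathbb{R}$. We have

$$F(x, y, z) = (g \circ f)(x, y, z) = g(f(x, y, z)) = g(x^2 y, y^2, e^{-xz}) = (x^2 y)^2 - (y^2)^2 - e^{-xz} = x^4 y^2 - y^4 - e^{-xz}.$$

1.

$$\nabla F(x, y, z) = \left(\frac{\partial F}{\partial x}, \frac{\partial F}{\partial y}, \frac{\partial F}{\partial z}\right) = \left(4x^3 y^2 + ze^{-xz},\; 2x^4 y - 4y^3,\; xe^{-xz}\right).$$

2. At any point $(x, y, z)$, the Chain Rule gives

$$D_{(g \circ f)} = D_F = \begin{pmatrix} \frac{\partial F}{\partial x} & \frac{\partial F}{\partial y} & \frac{\partial F}{\partial z} \end{pmatrix} = \begin{pmatrix} \frac{\partial g}{\partial u} & \frac{\partial g}{\partial v} & \frac{\partial g}{\partial w} \end{pmatrix} \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \\ \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} & \frac{\partial w}{\partial z} \end{pmatrix}$$

$$= \begin{pmatrix} 2u & -2v & -1 \end{pmatrix} \begin{pmatrix} 2xy & x^2 & 0 \\ 0 & 2y & 0 \\ -ze^{-xz} & 0 & -xe^{-xz} \end{pmatrix} = \begin{pmatrix} 2x^2 y & -2y^2 & -1 \end{pmatrix} \begin{pmatrix} 2xy & x^2 & 0 \\ 0 & 2y & 0 \\ -ze^{-xz} & 0 & -xe^{-xz} \end{pmatrix}$$

$$= \begin{pmatrix} 4x^3 y^2 + ze^{-xz} & 2x^4 y - 4y^3 & xe^{-xz} \end{pmatrix},$$

whose entries are the components of the gradient $\nabla F(x, y, z)$ found in part 1.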
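The agreement between the two computations in Example 3.2.14 can also be verified at a sample point. In the sketch below the point `(1.5, -0.7, 0.3)` is an arbitrary choice, and the row-times-matrix product is written out with plain lists rather than a matrix library.

```python
import math

def f(x, y, z):
    return (x**2 * y, y**2, math.exp(-x * z))

def Df(x, y, z):
    # Jacobian matrix of f, rows are the gradients of u, v, w.
    e = math.exp(-x * z)
    return [[2 * x * y, x**2, 0.0],
            [0.0, 2 * y, 0.0],
            [-z * e, 0.0, -x * e]]

def Dg(u, v, w):
    # Row matrix of partial derivatives of g(u, v, w) = u^2 - v^2 - w.
    return [2 * u, -2 * v, -1.0]

def grad_F_direct(x, y, z):
    # Gradient of F = x^4 y^2 - y^4 - exp(-x z), computed directly.
    e = math.exp(-x * z)
    return [4 * x**3 * y**2 + z * e, 2 * x**4 * y - 4 * y**3, x * e]

def grad_F_chain(x, y, z):
    # Chain Rule: the row Dg(f(x, y, z)) times the matrix Df(x, y, z).
    row = Dg(*f(x, y, z))
    J = Df(x, y, z)
    return [sum(row[j] * J[j][i] for j in range(3)) for i in range(3)]

point = (1.5, -0.7, 0.3)
direct = grad_F_direct(*point)
chain = grad_F_chain(*point)
print(direct)
print(chain)
```

The two printed lists should coincide up to rounding.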

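Example 3.2.15. Let $f(x, y) = (e^{x+y}, e^{x-y})$ and let $X : \mathbb{R} \to \mathbb{R}^2$ be a curve in $\mathbb{R}^2$ with $X(0) = (0, 0)$ and $X'(0) = (1, 1)$. Find the tangent vector to the image of the curve $X(t)$ under $f$ at $t = 0$.

Solution. Set $Y(t) = f(X(t))$. By the Chain Rule, $Y'(t) = D_f(X(t))X'(t)$. At $t = 0$ we get

$$Y'(0) = D_f(X(0))X'(0) = D_f(0, 0)\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

An easy calculation shows that

$$D_f(0, 0) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Hence

$$Y'(0) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$

That is, $Y'(0) = (2, 0)$.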

As we saw in Example 3.2.8, if the partial derivatives of a function exist, the function need not be differentiable. However, if they are also continuous, then the function is differentiable.

Theorem 3.2.16. (Differentiability Criterion). Let $f : \Omega \to \mathbb{R}^m$ where $\Omega$ is open in $\mathbb{R}^n$. If all partial derivatives $\frac{\partial f_j}{\partial x_i}$, $j = 1, \ldots, m$ and $i = 1, \ldots, n$, exist in a neighborhood of $a \in \Omega$ and are continuous at $a$, then $f$ is differentiable at $a$.

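Proof. In view of Proposition 3.1.9, it suffices to prove the result for a real-valued function $f$. We must show

$$\lim_{x \to a} \frac{|f(x) - f(a) - \langle \nabla f(a), x - a \rangle|}{\|x - a\|} = 0.$$

To do this, we write the change $f(x) - f(a)$ as a telescoping sum, by making the change one coordinate at a time. If $x = (x_1, \ldots, x_n)$ and $a = (a_1, \ldots, a_n)$, consider the vectors $v_0 = a$, $v_i = (x_1, \ldots, x_i, a_{i+1}, \ldots, a_n)$ for $i = 1, \ldots, n - 1$ and $v_n = x$. Note that if $x$ is in some ball around $a$, so are all the vectors $v_i$, for $i = 1, \ldots, n$. Then

$$f(x) - f(a) = \sum_{i=1}^{n} [f(v_i) - f(v_{i-1})].$$

Set $g_i(t) = f(x_1, \ldots, x_{i-1}, t, a_{i+1}, \ldots, a_n)$. Then $g_i$ maps the interval between $a_i$ and $x_i$ into $\mathbb{R}$, with $g_i(a_i) = f(v_{i-1})$, $g_i(x_i) = f(v_i)$, and the derivative of $g_i$ is $\frac{\partial f}{\partial x_i}(x_1, \ldots, x_{i-1}, t, a_{i+1}, \ldots, a_n)$. By the one-variable Mean Value theorem (Theorem B.1.2), there are $\xi_i$ strictly between $a_i$ and $x_i$ such that

$$f(v_i) - f(v_{i-1}) = (x_i - a_i)\frac{\partial f}{\partial x_i}(x_1, \ldots, x_{i-1}, \xi_i, a_{i+1}, \ldots, a_n).$$

Let $u_i = (x_1, \ldots, x_{i-1}, \xi_i, a_{i+1}, \ldots, a_n)$ and note that $\|u_i - a\| \leq \|x - a\|$, so that as $x \to a$ also $u_i \to a$. Hence

$$\lim_{x \to a} \frac{\left|f(x) - f(a) - \sum_{i=1}^{n} (x_i - a_i)\frac{\partial f}{\partial x_i}(a)\right|}{\|x - a\|} = \lim_{x \to a} \frac{\left|\sum_{i=1}^{n} \left[\frac{\partial f}{\partial x_i}(u_i) - \frac{\partial f}{\partial x_i}(a)\right](x_i - a_i)\right|}{\|x - a\|}.$$

Since $|x_i - a_i| \leq \|x - a\|$, we get

$$\leq \lim_{x \to a} \frac{\sum_{i=1}^{n} \left|\frac{\partial f}{\partial x_i}(u_i) - \frac{\partial f}{\partial x_i}(a)\right| |x_i - a_i|}{\|x - a\|} \leq \lim_{x \to a} \sum_{i=1}^{n} \left|\frac{\partial f}{\partial x_i}(u_i) - \frac{\partial f}{\partial x_i}(a)\right|.$$

The latter is zero since each $\frac{\partial f}{\partial x_i}$ is continuous at $a$.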

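Definition 3.2.17. Let $f : \Omega \to \mathbb{R}^m$ where $\Omega$ is open in $\mathbb{R}^n$. If all partial derivatives $\frac{\partial f_j}{\partial x_i}(x)$, $j = 1, \ldots, m$ and $i = 1, \ldots, n$, exist for every $x \in \Omega$ and are continuous on $\Omega$, we say $f$ is continuously differentiable on $\Omega$. We write $f \in C^1(\Omega)$ and say $f$ is of class $C^1$ on $\Omega$. We also say that a continuous function on $\Omega$ is of class $C^0$ on $\Omega$, and write $C^0(\Omega)$ or $C(\Omega)$ for this class.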

Theorem 3.2.16 and Proposition 3.1.6 yield the following.


Corollary 3.2.18. A function of class C1(Ω) is of class C0(Ω).

Identifying the set of $m \times n$ real matrices with $\mathbb{R}^{mn}$, we see that $f \in C^1(\Omega)$ if and only if the derivative function $D_f : \Omega \to \mathbb{R}^{mn}$ is continuous on $\Omega$. We have also seen that $f \in C^1(\Omega)$ implies $f$ is differentiable on $\Omega$ which, in turn, implies that $f$ has a directional derivative in every direction; in particular, all partial derivatives exist on $\Omega$. As the following examples show, if the partial derivatives of a function $f$ are not continuous at a point, the question of the differentiability of $f$ at the point remains open. One has to use the definition, or even better, the Linear Approximation theorem (Theorem 3.1.3) to decide on the differentiability of $f$ at the point. Hence, the converse of Theorem 3.2.16 is also not true.

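Example 3.2.19. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x, y) = \begin{cases} (x^2 + y^2)\sin\dfrac{1}{\sqrt{x^2 + y^2}} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

First we study the function on $\mathbb{R}^2 \setminus \{(0, 0)\}$. At any $(x, y) \neq (0, 0)$, by partial differentiation of $f$ we see that

$$\frac{\partial f}{\partial x}(x, y) = 2x\sin\frac{1}{\sqrt{x^2 + y^2}} - (x^2 + y^2)\frac{x}{(x^2 + y^2)^{\frac{3}{2}}}\cos\frac{1}{\sqrt{x^2 + y^2}},$$

and by symmetry

$$\frac{\partial f}{\partial y}(x, y) = 2y\sin\frac{1}{\sqrt{x^2 + y^2}} - (x^2 + y^2)\frac{y}{(x^2 + y^2)^{\frac{3}{2}}}\cos\frac{1}{\sqrt{x^2 + y^2}}.$$

Since both functions $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ are continuous on $\mathbb{R}^2 \setminus \{(0, 0)\}$, $f$ is $C^1$ on $\mathbb{R}^2 \setminus \{(0, 0)\}$ and so $f$ is differentiable at any $(x, y) \neq (0, 0)$.

Next we study $f$ at $(0, 0)$.

$$\frac{\partial f}{\partial x}(0, 0) = \lim_{t \to 0} \frac{f(t, 0) - f(0, 0)}{t} = \lim_{t \to 0} t\sin\frac{1}{|t|} = 0,$$

and similarly $\frac{\partial f}{\partial y}(0, 0) = 0$. Since

$$\frac{\partial f}{\partial x}(x, 0) = 2x\sin\frac{1}{|x|} - \frac{x}{|x|}\cos\frac{1}{|x|}$$

does not have a limit as $x \to 0$, it follows that the function $\frac{\partial f}{\partial x}(x, y)$ is not continuous at $(0, 0)$. Similarly, $\frac{\partial f}{\partial y}(x, y)$ is not continuous at $(0, 0)$.

However, $f$ is differentiable at $(0, 0)$. To see this we use the Linear Approximation theorem:

$$\varepsilon(x, y) = \frac{f(x, y) - f(0, 0) - \langle \nabla f(0, 0), (x, y) \rangle}{\|(x, y)\|} = \frac{f(x, y)}{\sqrt{x^2 + y^2}}.$$

For $(x, y) \neq (0, 0)$ we have $0 \leq \frac{|f(x, y)|}{\sqrt{x^2 + y^2}} \leq \sqrt{x^2 + y^2}$. Therefore,

$$\lim_{(x, y) \to (0, 0)} \varepsilon(x, y) = 0$$

and $f$ is differentiable at $(0, 0)$.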
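Both phenomena of Example 3.2.19 can be seen numerically. The sketch below is illustrative only: the sample points $x_k = 1/(2\pi k)$ and $x_k = 1/((2k+1)\pi)$ are chosen here because the cosine term takes the values $1$ and $-1$ there, so the partial derivative along the $x$-axis keeps jumping between values near $-1$ and $+1$, while the remainder quotient is squeezed to zero.

```python
import math

def f(x, y):
    r2 = x * x + y * y
    if r2 == 0.0:
        return 0.0
    return r2 * math.sin(1.0 / math.sqrt(r2))

def partial_x(x, y):
    # Formula for the partial derivative in x away from the origin.
    r = math.hypot(x, y)
    return 2 * x * math.sin(1.0 / r) - (x / r) * math.cos(1.0 / r)

def epsilon(x, y):
    # Remainder quotient at the origin; both partials vanish there.
    return f(x, y) / math.hypot(x, y)

ks = (1, 10, 100)
near_minus_one = [partial_x(1.0 / (2 * math.pi * k), 0.0) for k in ks]
near_plus_one = [partial_x(1.0 / ((2 * k + 1) * math.pi), 0.0) for k in ks]
eps_values = [epsilon(t, -2 * t) for t in (1e-1, 1e-3, 1e-5)]

print(near_minus_one)  # values close to -1
print(near_plus_one)   # values close to +1
print(eps_values)      # bounded by the distance to the origin
```

So the partial derivative has no limit at the origin, yet the quantity that decides differentiability does tend to zero.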

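Example 3.2.20. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x, y) = \begin{cases} \dfrac{xy}{\sqrt{x^2 + y^2}} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases}$$

It is easy to see as before that $\frac{\partial f}{\partial x}(0, 0) = 0$ and $\frac{\partial f}{\partial y}(0, 0) = 0$. If $(x, y) \neq (0, 0)$, differentiating $f$ with respect to $x$ we get

$$\frac{\partial f}{\partial x}(x, y) = \frac{y}{\sqrt{x^2 + y^2}} - \frac{x^2 y}{(x^2 + y^2)^{\frac{3}{2}}},$$

and by symmetry

$$\frac{\partial f}{\partial y}(x, y) = \frac{x}{\sqrt{x^2 + y^2}} - \frac{y^2 x}{(x^2 + y^2)^{\frac{3}{2}}}.$$

Since $\frac{\partial f}{\partial x}(0, y) = \frac{y}{|y|}$ and $\frac{\partial f}{\partial y}(x, 0) = \frac{x}{|x|}$, neither $\lim_{y \to 0} \frac{\partial f}{\partial x}(0, y)$ nor $\lim_{x \to 0} \frac{\partial f}{\partial y}(x, 0)$ exists. It follows that $\lim_{(x, y) \to (0, 0)} \frac{\partial f}{\partial x}(x, y)$ and $\lim_{(x, y) \to (0, 0)} \frac{\partial f}{\partial y}(x, y)$ do not exist, and so both partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are discontinuous at $(0, 0)$. However, the function $f$ itself is continuous at $(0, 0)$. This follows from the estimate

$$0 \leq |f(x, y)| = \frac{|xy|}{\sqrt{x^2 + y^2}} \leq \frac{1}{2}\sqrt{x^2 + y^2}.$$

We show that $f$ is not differentiable at $(0, 0)$. By the Linear Approximation theorem we must look at

$$\varepsilon(x, y) = \frac{f(x, y)}{\sqrt{x^2 + y^2}} = \frac{xy}{x^2 + y^2}.$$

But along the diagonal $y = x$,

$$\lim_{x \to 0} \varepsilon(x, x) = \frac{1}{2} \neq 0.$$

Therefore, $f$ is not differentiable at $(0, 0)$.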
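The failure of differentiability in Example 3.2.20 shows up clearly when the remainder quotient is evaluated along two different paths into the origin. This is a minimal sketch; the step sizes are arbitrary choices.

```python
import math

def f(x, y):
    r = math.hypot(x, y)
    return 0.0 if r == 0.0 else x * y / r

def epsilon(x, y):
    # Remainder quotient at the origin; both partial derivatives are 0 there.
    return f(x, y) / math.hypot(x, y)

steps = [10.0 ** (-k) for k in range(1, 8)]
along_axis = [epsilon(t, 0.0) for t in steps]      # path y = 0
along_diagonal = [epsilon(t, t) for t in steps]    # path y = x

print(along_axis)      # identically 0
print(along_diagonal)  # stays at 1/2 (up to rounding)
```

Since the quotient does not tend to zero along the diagonal, no linear map can satisfy the condition of Theorem 3.1.3 at the origin.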

To find the relationship between the gradient of a function $f$ and its level sets, recall that a level set of level $c \in \mathbb{R}$ for $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ is the set $S_c = \{x \in \Omega : f(x) = c\}$. The set $S_c$ (a hypersurface) in $\mathbb{R}^n$ has dimension $n - 1$.

Proposition 3.2.21. Let $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ be a differentiable function and let $a \in \Omega$ lie on $S_c$. Then $\nabla f(a)$ is orthogonal to $S_c$: if $v$ is the tangent vector at $t = 0$ of a differentiable curve $X(t)$ in $S_c$ with $X(0) = a$, then $\nabla f(a)$ is perpendicular to $v$.

Proof. Let $X(t)$ lie in $S_c$ with $X(0) = a$. Then $v = X'(0)$ and $f(X(t)) = c$. So $f(X(t))$ is constant in $t$, and the Chain Rule tells us

$$0 = \frac{d(f(X(t)))}{dt} = \langle \nabla f(X(t)), X'(t) \rangle.$$

For $t = 0$, this gives $\langle \nabla f(a), v \rangle = 0$.

Now consider all curves on the level set $S_c$ of $f$ passing through the point $a \in S_c$. As we just saw, the tangent vectors at $a$ of all these curves are perpendicular to $\nabla f(a)$. If $\nabla f(a) \neq 0$, then these tangent vectors determine a hyperplane and $\nabla f(a)$ is the normal vector to it. This hyperplane is called the tangent hyperplane to the surface $S_c$ at $a$, and we denote it by $T_a(S_c)$. We recall from Chapter 1 (Example 1.3.32) that the plane passing through a point $a \in \mathbb{R}^3$ with normal vector $n$ consists of all points $x$ satisfying

$$\langle n, x - a \rangle = 0.$$


Figure 3.2: Tangent plane and gradient

Hence, we arrive at the following.

Definition 3.2.22. Let $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function differentiable at $a \in \Omega$. The tangent hyperplane to the level set $S_c$ of $f$ at $a \in S_c$ is the set of points $x \in \mathbb{R}^n$ satisfying

$$\langle \nabla f(a), x - a \rangle = 0.$$

The tangent hyperplane translated so that it passes through the origin is called the tangent space at $a$, and we denote it again by $T_a(S_c)$.

Example 3.2.23. (Tangent hyperplane to a graph). An important special case arises when $S_c$ is the graph of a differentiable function $y = f(x_1, \ldots, x_n)$. As we saw in Chapter 2, the graph of $f$ may be regarded as the level set $S \subseteq \mathbb{R}^{n+1}$ (of level 0) of the function $F(x, y) = F(x_1, \ldots, x_n, y) = f(x_1, \ldots, x_n) - y$. Then $\nabla F(x, y) = \left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x), -1\right)$, so that

$$\langle \nabla F(a, f(a)), (x - a, y - f(a)) \rangle = 0$$

implies

$$y = f(a) + \langle \nabla f(a), x - a \rangle.$$

In coordinates the tangent hyperplane at $(a, f(a))$ is written

$$y = f(a) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i). \tag{3.6}$$


For a differentiable function in two variables $z = f(x, y)$ this yields the equation of the tangent plane to the graph of $f$ at $((x_0, y_0), f(x_0, y_0))$:

$$z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).$$

Example 3.2.24. Find the equation of the tangent plane to the graph of $z = f(x, y) = x^2 + y^4 + e^{xy}$ at the point $(1, 0, 2)$.

Solution. The partial derivatives are $\frac{\partial f}{\partial x}(x, y) = 2x + ye^{xy}$ and $\frac{\partial f}{\partial y}(x, y) = 4y^3 + xe^{xy}$, so that $\frac{\partial f}{\partial x}(1, 0) = 2$ and $\frac{\partial f}{\partial y}(1, 0) = 1$. Substituting in (3.6) yields

$$z = 2 + 2(x - 1) + 1(y - 0) \quad \text{or} \quad z = 2x + y.$$
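One way to see that $z = 2x + y$ really is the tangent plane in Example 3.2.24 is to check that the gap between the graph and the plane shrinks faster than the distance to the point of tangency. The sketch below assumes an arbitrary approach direction, $(x, y) = (1 + h, h)$; the helper names are illustrative.

```python
import math

def f(x, y):
    return x**2 + y**4 + math.exp(x * y)

def tangent_plane(x, y):
    # Plane found in Example 3.2.24: z = 2 + 2(x - 1) + (y - 0) = 2x + y.
    return 2 * x + y

def gap_ratio(h):
    # |f - plane| divided by the distance from (1 + h, h) to (1, 0).
    x, y = 1.0 + h, h
    return abs(f(x, y) - tangent_plane(x, y)) / math.hypot(h, h)

hs = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [gap_ratio(h) for h in hs]
for h, r in zip(hs, ratios):
    print(h, r)
```

The plane touches the graph at $(1, 0, 2)$, and the printed ratios decrease roughly linearly in $h$, which matches the linear approximation property behind (3.6).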

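Example 3.2.25. Let $y = f(x_1, \ldots, x_n) = \sqrt{x_1^2 + \cdots + x_n^2} = \|x\|$. Find the tangent hyperplane

1. at $a \in \mathbb{R}^n$ with $a \neq 0$,

2. at $e_j \in \mathbb{R}^n$,

3. at $0 \in \mathbb{R}^n$.

Solution.

1. At $a = (a_1, \ldots, a_n)$ we have $\frac{\partial f}{\partial x_i}(a) = \frac{a_i}{\sqrt{a_1^2 + \cdots + a_n^2}} = \frac{a_i}{\|a\|}$. Substitution in (3.6) yields

$$y = \|a\| + \sum_{i=1}^{n} \frac{a_i}{\|a\|}(x_i - a_i) = \|a\| + \frac{1}{\|a\|}\sum_{i=1}^{n} (a_i x_i - a_i^2) = \frac{\langle a, x \rangle}{\|a\|}.$$

2. At $e_j$ we get $y = x_j$.

3. At $0$,

$$\frac{\partial f}{\partial x_i}(0) = \lim_{t \to 0} \frac{f(0 + te_i) - f(0)}{t} = \lim_{t \to 0} \frac{|t|}{t}.$$

Since this latter limit does not exist, $f$ is not differentiable at $0$, and so there is no tangent hyperplane there.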


EXERCISES

1. For each of the following functions find the partial derivatives and $\nabla f$.

(a) $f(x, y) = e^{4x - y^2} + \log(x^2 + y)$.

(b) $f(x, y) = \cos(x^2 - 3y)$.

(c) $f(x, y) = \tan^{-1}(xy)$.

(d) $f(x, y, z) = x^2 e^{yz}$.

(e) $f(x, y, z) = z^{xy}$.

(f) $f(r, \theta, \phi) = r\cos\theta\sin\phi$.

(g) $f(x, y, z) = e^{3x + y}\sin(5z)$ at $(0, 0, \frac{\pi}{6})$.

(h) $f(x, y, z) = \log(z + \sin(y^2 - x))$ at $(1, -1, 1)$.

(i) $f(x) = e^{-\frac{\|x\|^2}{2}}$, $x \in \mathbb{R}^n$.

2. Find the directional derivative of $f$ at the given point in the given direction.

(a) $f(x, y) = 4xy + 3y^2$ at $(1, 1)$ in the direction $(2, -1)$.

(b) $f(x, y) = \sin(\pi xy) + x^2 y$ at $(1, -2)$ in the direction $(\frac{3}{5}, \frac{4}{5})$.

(c) $f(x, y) = 4x^2 + 9y^2$ at $(2, 1)$ in the direction of maximum directional derivative.

(d) $f(x, y, z) = x^2 e^{-yz}$ at $(1, 0, 0)$ in the direction $v = (1, 1, 1)$.

3. Prove Theorem 3.2.2, using the Chain Rule.

4. Find the angles made by the gradient of $f(x, y) = x\sqrt{3} + y$ at the point $(1, 1)$ with the coordinate axes.

5. Find the tangent plane to the surface in $\mathbb{R}^3$ described by the equation

(a) $z = x^2 - y^3$ at $(2, 1, -5)$.

(b) $x^2 + 2y^2 + 2z^2 = 6$ at $(1, 1, -1)$.

6. Show that the function $f(x, y) = |x| + |y|$ is continuous, but not differentiable at $(0, 0)$.


7. Let

f(x, y) =

xy x2−y2x2+y2

if (x, y) 6= (0, 0),

0 if (x, y) = (0, 0).

Show that ∂f∂x

(x, 0) = 0 = ∂f∂y(0, y), ∂f

∂x(0, y) = −y and ∂f

∂y(x, 0) =

x. Show that f is differentiable at (0, 0).

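8. Let

\[ f(x, y) = \begin{cases} \dfrac{x^3 - y^3}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} \]

Find $\frac{\partial f}{\partial x}(0, 0)$ and $\frac{\partial f}{\partial y}(0, 0)$. Show that $f$ is not differentiable at $(0, 0)$.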

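9. Let

\[ f(x, y, z) = \begin{cases} \dfrac{xyz}{(x^2 + y^2 + z^2)^\alpha} & \text{if } (x, y, z) \neq (0, 0, 0), \\ 0 & \text{if } (x, y, z) = (0, 0, 0), \end{cases} \]

where $\alpha \in \mathbb{R}$ is a constant. Show that $f$ is differentiable at $(0, 0, 0)$ if and only if $\alpha < 1$.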

10. Show that the function $f(x, y) = \sqrt{|xy|}$ is not differentiable at $(0, 0)$.

11. Let $\alpha > \frac{1}{2}$. Show that the function $f(x, y) = |xy|^\alpha$ is differentiable at $(0, 0)$.

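12. Let

\[ \varphi(t) = \begin{cases} \dfrac{\sin t}{t} & \text{if } t \neq 0, \\ 1 & \text{if } t = 0. \end{cases} \]

Show that $\varphi$ is differentiable on $\mathbb{R}$. Let

\[ f(x, y) = \begin{cases} \dfrac{\cos x - \cos y}{x - y} & \text{if } x \neq y, \\ -\sin x & \text{otherwise.} \end{cases} \]

Express $f$ in terms of $\varphi$ and show that $f$ is differentiable on $\mathbb{R}^2$.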

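13. For each of the following functions find the derivative $D_f$ and the indicated Jacobians.

(a) $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(x, y) = (x^2 + y, 2xy - y^2)$. Find the Jacobian $J_f(x, y)$.

(b) $f : \mathbb{R}^2 \to \mathbb{R}^3$, $f(x, y) = (xy, x^2 + xy^2, x^3 y) = (u, v, w)$. Find $\frac{\partial(u, v)}{\partial(x, y)}$, $\frac{\partial(u, w)}{\partial(x, y)}$ and $\frac{\partial(v, w)}{\partial(x, y)}$.

(c) $f : \mathbb{R}^3 \to \mathbb{R}^2$, $f(x, y, z) = (xe^y, x^3 + z^2\sin x) = (u, v)$. Find $\frac{\partial(u, v)}{\partial(x, y)}$, $\frac{\partial(u, v)}{\partial(x, z)}$ and $\frac{\partial(u, v)}{\partial(y, z)}$.

(d) $f : \mathbb{R}^3 \to \mathbb{R}^3$, $f(r, \theta, z) = (r\cos\theta, r\sin\theta, z)$. Find $J_f(r, \theta, z)$.

(e) $f : \mathbb{R}^3 \to \mathbb{R}^3$, $f(r, \theta, \phi) = (r\cos\theta\sin\phi, r\sin\theta\sin\phi, r\cos\phi)$. Find $J_f(r, \theta, \phi)$.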

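14. Let $u = xy\,f\left(\frac{x + y}{xy}\right)$, where $f : \mathbb{R} \to \mathbb{R}$ is differentiable. Show that $u$ satisfies the partial differential equation $x^2 \frac{\partial u}{\partial x} - y^2 \frac{\partial u}{\partial y} = g(x, y)u$ and find $g(x, y)$.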

15. Let $x \in \mathbb{R}^n$ and $u = f(r)$, where $r = \|x\|$ and $f$ is differentiable. Show that

\[ \sum_{i=1}^{n} \left(\frac{\partial u}{\partial x_i}\right)^2 = [f'(r)]^2. \]

16. Let $z = e^x \sin y$, $x = \log t$, $y = \tan^{-1}(3t)$. Compute $\frac{dz}{dt}$ in two ways:

(a) Using the Chain Rule.

(b) Finding the composition and differentiating.

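17. Let $f(x, y)$ be $C^1$ and let $x = s\cos\theta - t\sin\theta$, $y = s\sin\theta + t\cos\theta$. Compute $\left(\frac{\partial f}{\partial s}\right)^2 + \left(\frac{\partial f}{\partial t}\right)^2$.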

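18. Find $\frac{\partial u}{\partial s}$ and $\frac{\partial u}{\partial t}$ in terms of the partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$ for

(a) $u = f(e^{s - 3t}, \log(1 + s^2), \sqrt{1 + t^4})$.

(b) $u = \tan^{-1}[f(t^2, 2s - t, -4)]$.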


19. Let $f(x, y) = (2x + y, 3x + 2y)$ and $g(u, v) = (2u - v, -3u + v)$. Find $D_f$, $D_g$, and $D_{g \circ f}$.

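20. Let $g : \mathbb{R}^2 \to \mathbb{R}^3$ be $g(x, y) = (x^2 - 5y, ye^{2x}, 2x - \log(1 + y^2))$. Find $D_g(0, 0)$. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be of class $C^1$, $f(1, 2) = (0, 0)$ and $D_f(1, 2) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. Find $D_{g \circ f}(1, 2)$.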

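Answers to selected Exercises

1. (g) $(\frac{3}{2}, \frac{1}{2}, -\frac{5\sqrt{3}}{2})$. (h) $(-1, -2, 1)$.

2. (a) $-\frac{2}{\sqrt{5}}$. (b) $-\frac{2}{5}(4 + \pi)$. (c) $2\sqrt{145}$. (d) $\frac{2}{\sqrt{3}}$.

4. $x$-axis $\alpha = \frac{\pi}{6}$, $y$-axis $\beta = \frac{\pi}{3}$.

5. (a) $4x - 3y - z = 6$. (b) $2x + 4y - 6z = 12$.

14. $g(x, y) = x - y$.

20. $\begin{pmatrix} -15 & -20 \\ 3 & 4 \\ 2 & 4 \end{pmatrix}$.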

3.3 Homogeneous functions and Euler’s equation

Definition 3.3.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function. We say that $f$ is homogeneous of degree $\alpha \in \mathbb{R}$ if $f(tx) = t^\alpha f(x)$ for all $t \in \mathbb{R}$ and $x \in \mathbb{R}^n$.

Observe that for such a function $f(0) = 0$ whenever $\alpha \neq 0$. The simplest homogeneous functions that appear in analysis and its applications are the homogeneous polynomials in several variables, that is, polynomials consisting of monomials all of which have the same degree. For example, $f(x, y) = 2x^3 - 5x^2 y$ is homogeneous of degree 3 on $\mathbb{R}^2$. Of course, there are homogeneous functions which are not polynomials. For instance, $f(x, y) = (x^2 + 4xy)^{-\frac{1}{3}}$ is not a polynomial and is a homogeneous function of degree $-\frac{2}{3}$ on $\mathbb{R}^2 - (0, 0)$. In general for $x = (x_1, \dots, x_n)$ the function $f(x) = \|x\|^\alpha$ with $\alpha \in \mathbb{R}$ is homogeneous of degree $\alpha$ for $t \in \mathbb{R}^+$. For example $f(x, y) = (x^2 + y^2)^{\frac{1}{2}}$ is homogeneous of degree 1 and it is not a polynomial, for if it were, then the function $\varphi(t) = (1 + t^2)^{\frac{1}{2}}$ would also be a polynomial. However, since no derivative of $\varphi$ is identically zero, this is impossible.
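The defining identity f(tx) = t^α f(x) is easy to test numerically for the non-polynomial example above. This is only a sketch: the sample point and the helper name `homogeneity_defect` are assumptions introduced here, and only scaling factors t > 0 at a point with x² + 4xy > 0 are used, so that the fractional powers are real.

```python
def f(x, y):
    """The example f(x, y) = (x^2 + 4xy)^(-1/3), evaluated where x^2 + 4xy > 0."""
    return (x * x + 4.0 * x * y) ** (-1.0 / 3.0)

def homogeneity_defect(func, point, t, alpha):
    """Return |func(t*point) - t^alpha * func(point)| for a scaling factor t > 0."""
    scaled = tuple(t * c for c in point)
    return abs(func(*scaled) - t ** alpha * func(*point))

point = (1.5, 2.0)   # hypothetical sample point with x^2 + 4xy > 0
for t in (0.5, 2.0, 10.0):
    # The defect is at rounding-error level for alpha = -2/3.
    print(t, homogeneity_defect(f, point, t, -2.0 / 3.0))
```

Trying a different exponent, for instance α = −1, gives a defect that is clearly nonzero, which is a quick way to confirm a claimed degree.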

The following theorem characterizes homogeneous differentiable functions.

Theorem 3.3.2. (Euler³) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. Then $f$ is homogeneous of degree $\alpha$ if and only if for all $x \in \mathbb{R}^n$ it satisfies Euler’s partial differential equation

\[ \sum_{i=1}^{n} x_i \frac{\partial f}{\partial x_i}(x) = \alpha f(x). \]

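Proof. Suppose $f$ is homogeneous of degree $\alpha$. Consider the function $\varphi : \mathbb{R} \to \mathbb{R}$ given by $\varphi(t) = f(tx)$. On the one hand, since $f(tx) = t^\alpha f(x)$, we have $\varphi'(t) = \alpha t^{\alpha - 1} f(x)$. On the other, by the Chain Rule we have $\varphi'(t) = \langle \nabla f(tx), \frac{d}{dt}(tx)\rangle = \langle \nabla f(tx), x\rangle$. Taking $t = 1$ we get $\varphi'(1) = \alpha f(x) = \langle \nabla f(x), x\rangle$. That is, $\alpha f(x) = \sum_{i=1}^{n} x_i \frac{\partial f}{\partial x_i}(x)$.

Conversely, suppose $f$ satisfies Euler’s equation. Then for $t \neq 0$ and $x \in \mathbb{R}^n$, we have

\[ \varphi'(t) = \langle \nabla f(tx), x\rangle = \frac{1}{t}\langle \nabla f(tx), tx\rangle = \frac{1}{t}\alpha f(tx) = \frac{\alpha}{t}\varphi(t). \]

Letting $g(t) = t^{-\alpha}\varphi(t)$, it follows that

\[ g'(t) = -\alpha t^{-\alpha - 1}\varphi(t) + t^{-\alpha}\varphi'(t) = -\alpha t^{-\alpha - 1}\varphi(t) + t^{-\alpha}\frac{\alpha}{t}\varphi(t) = 0. \]

That is, $g'(t) = 0$. Therefore, $g(t) = c$ and $\varphi(t) = ct^\alpha$. For $t = 1$ this gives $c = \varphi(1) = f(x)$. Thus, $f(tx) = t^\alpha f(x)$ and $f$ is homogeneous of degree $\alpha$.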
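Euler's equation can be illustrated numerically on the homogeneous polynomial f(x, y) = 2x³ − 5x²y used earlier in this section. The sketch below approximates the partial derivatives by central differences. The helper names `partial` and `euler_sides`, the step size and the sample point are assumptions made for illustration only.

```python
def f(x, y):
    """Homogeneous polynomial of degree 3 from the text: f(x, y) = 2x^3 - 5x^2 y."""
    return 2.0 * x**3 - 5.0 * x**2 * y

def partial(func, point, i, h=1e-5):
    """Central-difference approximation of the i-th partial derivative at point."""
    up = list(point)
    up[i] += h
    down = list(point)
    down[i] -= h
    return (func(*up) - func(*down)) / (2.0 * h)

def euler_sides(func, point, alpha):
    """Return (sum_i x_i * df/dx_i, alpha * f) so the two sides can be compared."""
    lhs = sum(point[i] * partial(func, point, i) for i in range(len(point)))
    return lhs, alpha * func(*point)

lhs, rhs = euler_sides(f, (1.2, -0.7), 3.0)
print(lhs, rhs)   # the two values agree up to discretization error
```

Since the derivatives are only approximated, agreement is expected up to a small discretization error rather than exactly.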

³ L. Euler (1707-1783) was a pioneering mathematician and physicist, a student of J. Bernoulli and the thesis advisor of J. Lagrange. He made important discoveries in fields as diverse as infinitesimal calculus and graph theory. Euler’s identity $e^{i\pi} + 1 = 0$ was called “the most remarkable formula in mathematics” by R. Feynman. Euler spent most of his academic life between the Academies of Berlin and St. Petersburg. He is also known for his many contributions in mechanics, fluid dynamics, optics and astronomy.


3.4 The mean value theorem

In this section we generalize the Mean Value theorem to functions of several variables.

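Theorem 3.4.1. (Mean value theorem). Suppose $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ is a differentiable function on the open convex set $\Omega$. Let $a, b \in \Omega$ and let $\gamma(t) = a + t(b - a)$, $0 \le t \le 1$, be the line segment joining $a$ and $b$. Then there exists a point $c$ on this segment such that

\[ f(b) - f(a) = \langle \nabla f(c), b - a\rangle. \]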

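Proof. Consider the function $\varphi : [0, 1] \to \mathbb{R}$ defined by $\varphi(t) = f(\gamma(t))$. Then $\varphi$ is continuous on $[0, 1]$ and by the Chain Rule, $\varphi$ is differentiable on $(0, 1)$ with derivative $\varphi'(t) = \langle \nabla f(\gamma(t)), \gamma'(t)\rangle = \langle \nabla f(\gamma(t)), b - a\rangle$. By the one-variable Mean Value theorem applied to $\varphi$, there is a point $\xi \in (0, 1)$ such that $\varphi(1) - \varphi(0) = \varphi'(\xi)$. Take $c = \gamma(\xi)$. Then

\[ f(b) - f(a) = \varphi(1) - \varphi(0) = \varphi'(\xi) = \langle \nabla f(c), b - a\rangle. \]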
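The proof reduces the statement to a one-variable problem, and for a concrete function the point c can be located numerically. The following is a minimal sketch under stated assumptions: the sample function f(x, y) = x³ + y², the endpoints a = (0, 0) and b = (1, 1), and the helper `mean_value_point` are hypothetical, and bisection is used only because the relevant expression happens to change sign on [0, 1] for this data.

```python
def f(x, y):
    """Hypothetical sample function f(x, y) = x^3 + y^2."""
    return x**3 + y**2

def grad_f(x, y):
    """Exact gradient of the sample function."""
    return (3.0 * x**2, 2.0 * y)

def mean_value_point(a, b, tol=1e-12):
    """Bisection for xi in (0, 1) with <grad f(gamma(xi)), b - a> = f(b) - f(a).

    Assumes phi'(t) - (f(b) - f(a)) changes sign between t = 0 and t = 1,
    which holds for the sample data below but not for every f, a, b.
    """
    d = (b[0] - a[0], b[1] - a[1])
    target = f(*b) - f(*a)

    def g(t):
        gx, gy = grad_f(a[0] + t * d[0], a[1] + t * d[1])
        return gx * d[0] + gy * d[1] - target

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid      # a sign change lies in [lo, mid]
        else:
            lo = mid      # otherwise it lies in [mid, hi]
    xi = 0.5 * (lo + hi)
    return xi, (a[0] + xi * d[0], a[1] + xi * d[1])

xi, c = mean_value_point((0.0, 0.0), (1.0, 1.0))
print(xi, c)
```

For this data φ(t) = t³ + t², so ξ solves 3ξ² + 2ξ = 2, that is ξ = (√7 − 1)/3, and the computed value can be compared against it.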

The Mean Value theorem has some important corollaries.

Corollary 3.4.2. Let $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ be a differentiable function on a convex subset $K$ of $\Omega$. If $\|\nabla f(x)\| \le M$ for all $x \in K$, then

\[ |f(x) - f(y)| \le M\|x - y\| \]

for all $x, y \in K$. This is an example of what is called a Lipschitz condition.

Proof. Let $x, y \in K$. Since $K$ is convex, the line segment joining $x$ with $y$ lies in $K$. From the Mean Value theorem, we have $f(x) - f(y) = \langle \nabla f(c), x - y\rangle$ for some $c \in K$. The Cauchy-Schwarz inequality tells us

\[ |f(x) - f(y)| \le \|\nabla f(c)\| \cdot \|x - y\| \le M\|x - y\|. \]

Corollary 3.4.3. Let $f$ be a differentiable function on an open convex set $\Omega$ in $\mathbb{R}^n$. If $\nabla f(x) = 0$ for all $x \in \Omega$, then $f$ is constant on $\Omega$.


Proof. Let $x$ and $y$ be any two distinct points of $\Omega$. Since $\|\nabla f\| = 0$ on $\Omega$, Corollary 3.4.2 with $M = 0$ gives $|f(x) - f(y)| \le 0$. That is, $f(x) = f(y)$.

Clearly every convex set is pathwise connected (line segments are paths), but most connected sets are not convex. Corollary 3.4.3 extends to differentiable functions on an open connected set $\Omega$ in $\mathbb{R}^n$, as the next corollary shows.

Corollary 3.4.4. Let $f$ be differentiable on an open connected set $\Omega \subseteq \mathbb{R}^n$. If $\nabla f(x) = 0$ for all $x \in \Omega$, then $f$ is constant on $\Omega$.

Proof. Let $a, b \in \Omega$ be any two points in $\Omega$. Since $\Omega$ is open and connected it is pathwise connected (see Proposition 2.6.14). Let $\gamma(t)$ be a path in $\Omega$ with $\gamma(0) = a$ and $\gamma(1) = b$. Cover the path $\gamma(t)$ by open balls contained in $\Omega$. Since $\gamma([0, 1])$ is compact, as a continuous image of the compact interval $[0, 1]$, a finite number of these balls cover $\gamma([0, 1])$. As each ball is convex, by Corollary 3.4.3, we know that $f$ is constant on each ball. The balls intersect nontrivially, so $f$ is the same constant as we move from one ball to the next. After a finite number of steps we conclude that $f(a) = f(b)$. Since $a, b \in \Omega$ were arbitrary, $f$ is constant on $\Omega$.

Obviously, the hypothesis of connectedness is essential in Corollary 3.4.4, even in one variable. For example, let $\Omega = (-\infty, 0) \cup (1, \infty)$ and $f : \Omega \to \mathbb{R}$ be the function

\[ f(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x > 1. \end{cases} \]

Then $f'(x) = 0$ for all $x \in \Omega$, but $f$ is not constant on $\Omega$.

Exercise 3.4.5. Prove Corollary 3.4.4 using only connectedness. Hint. Let $a \in \Omega$ and set $f(a) = c$. Look at the set $A = \{x \in \Omega : f(x) \neq c\}$ and show that $A = \emptyset$.

Corollary 3.4.6. Let $f : \Omega \to \mathbb{R}^m$, where $\Omega$ is an open connected set in $\mathbb{R}^n$. If $D_f(x) = 0$ for all $x \in \Omega$, then $f$ is constant on $\Omega$.

Proof. If $D_f(x) = 0$, then $\nabla f_j(x) = 0$ for all $j = 1, 2, \dots, m$. By Corollary 3.4.4, each $f_j$ is constant. Hence so is $f$.


For vector-valued functions $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}^m$, even if $n = 1$, there can be no Mean Value theorem when $m > 1$.

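Example 3.4.7. Let $f : \mathbb{R} \to \mathbb{R}^2$ be given by $f(x) = (x^2, x^3)$. Let us try to find a $\xi$ such that $0 < \xi < 1$ and $f(1) - f(0) = f'(\xi)(1 - 0)$. This means $(1, 1) - (0, 0) = (2\xi, 3\xi^2)$, or $1 = 2\xi$ and $1 = 3\xi^2$, which is impossible: the first equation forces $\xi = \frac{1}{2}$, for which $3\xi^2 = \frac{3}{4} \neq 1$.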

However, there is a useful inequality which is called the Lipschitz⁴ condition. Before we prove it, we define $f \in C^1(\overline{\Omega})$ to mean that $f \in C^1(\Omega)$ and that $f$ and $\frac{\partial f}{\partial x_i}$, for all $i = 1, 2, \dots, n$, extend continuously to $\partial\Omega$.

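Corollary 3.4.8. (Lipschitz condition). Let $\Omega$ be an open bounded convex set in $\mathbb{R}^n$ and $f : \overline{\Omega} \to \mathbb{R}^m$ with $f \in C^1(\overline{\Omega})$. Then there exists a constant $M > 0$ such that for all $x, y \in \overline{\Omega}$

\[ \|f(x) - f(y)\| \le M\|x - y\|. \]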

Proof. Let $f = (f_1, f_2, \dots, f_m)$. Since $\Omega$ is convex, for $x, y \in \Omega$, an application of the Mean Value theorem to each component of $f$ and the Cauchy-Schwarz inequality give

\[ \|f(x) - f(y)\|^2 = \sum_{j=1}^{m} |f_j(x) - f_j(y)|^2 = \sum_{j=1}^{m} |\langle \nabla f_j(c_j), x - y\rangle|^2 \le \sum_{j=1}^{m} \|\nabla f_j(c_j)\|^2 \|x - y\|^2. \]

Now $\overline{\Omega}$ is compact and since the $\frac{\partial f_j}{\partial x_i}(x)$ are continuous on $\overline{\Omega}$, the mapping $x \mapsto \|\nabla f_j(x)\|^2$, being a composition of continuous functions, is itself continuous, and therefore bounded. Let $M_j = \max_{x \in \overline{\Omega}} \|\nabla f_j(x)\|^2$ and take $M = \sqrt{\sum_{j=1}^{m} M_j}$. Then, for $x, y \in \Omega$, we have $\|f(x) - f(y)\| \le M\|x - y\|$. Since our inequality is $\le$, it also holds on $\overline{\Omega}$ by continuity.
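The constant M built in the proof can be computed by hand for a simple map and then compared with sampled difference quotients. This is a sketch only: the map f(x, y) = (x² + y, xy) on the unit square, the bounds in the comments and the helper `lipschitz_ratio` are assumptions chosen for illustration, and random sampling illustrates the inequality without proving it.

```python
import math
import random

def f(x, y):
    """Hypothetical map on the unit square, f(x, y) = (x^2 + y, x*y)."""
    return (x * x + y, x * y)

# On the closed unit square: ||grad f_1||^2 = 4x^2 + 1 <= 5 and
# ||grad f_2||^2 = y^2 + x^2 <= 2, so M = sqrt(M_1 + M_2) = sqrt(7).
M = math.sqrt(5.0 + 2.0)

def lipschitz_ratio(p, q):
    """Return ||f(p) - f(q)|| / ||p - q|| for two distinct points p, q."""
    fp, fq = f(*p), f(*q)
    num = math.hypot(fp[0] - fq[0], fp[1] - fq[1])
    den = math.hypot(p[0] - q[0], p[1] - q[1])
    return num / den

rng = random.Random(0)          # fixed seed so the run is reproducible
worst = 0.0
for _ in range(10000):
    p = (rng.random(), rng.random())
    q = (rng.random(), rng.random())
    if p != q:
        worst = max(worst, lipschitz_ratio(p, q))
print(worst, M)                 # the largest sampled ratio stays below M
```

The bound from the proof is not sharp in general, so the largest sampled ratio may sit noticeably below M.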

⁴ R. Lipschitz (1832-1903). A student of Dirichlet and professor at the University of Bonn. He worked in a broad range of areas including number theory, mathematical analysis, algebras with involution, differential geometry and mechanics.


3.5 Higher order derivatives

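For a function $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$ the partial derivatives $\frac{\partial f}{\partial x_i}(x)$ for $i = 1, 2, \dots, n$, are functions of $x = (x_1, x_2, \dots, x_n)$ and there is no problem defining partial derivatives of higher order, whenever they exist; just iterate the process of partial differentiation for $i, j = 1, 2, \dots, n$:

\[ \partial_j \partial_i f = \frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right). \]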

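When $i = j$ we write $\frac{\partial^2 f}{\partial x_i^2}$. Repeating the process of partial differentiation we obtain third (and higher) order partial derivatives. For example,

\[ \frac{\partial^3 f}{\partial x_j \partial x_i^2} = \frac{\partial}{\partial x_j}\left(\frac{\partial^2 f}{\partial x_i^2}\right). \]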

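Of course, when $n = 2$, we denote the variables by $x, y$ rather than using subscripts. Thus, for a function $f : \Omega \subseteq \mathbb{R}^2 \to \mathbb{R}$, one obtains the four second partial derivatives

\[ \frac{\partial^2 f}{\partial x^2}, \quad \frac{\partial^2 f}{\partial y^2}, \quad \frac{\partial^2 f}{\partial x \partial y}, \quad \frac{\partial^2 f}{\partial y \partial x}. \]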

The last two are referred to as the mixed second order partial derivatives of $f$. In certain situations subscript notation for partial derivatives has advantages. We shall also denote by $f_x$, $f_y$ and $f_{xx}$, $f_{yy}$, $f_{xy}$, $f_{yx}$ the first and second order partial derivatives of $f$, respectively.
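Iterating partial differentiation translates directly into code: a routine that differentiates in one variable can be applied to its own output. The sketch below does this with central differences for the hypothetical sample function f(x, y) = x³y². The helper `partial` and the variable names are assumptions; each name records the order of differentiation explicitly, so `dx_dy_f` stands for ∂/∂x(∂f/∂y).

```python
def partial(func, i, h=1e-4):
    """Return a function approximating the i-th partial derivative of func
    by a central difference with step h (a numerical stand-in for d/dx_i)."""
    def d_func(*point):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        return (func(*up) - func(*down)) / (2.0 * h)
    return d_func

def f(x, y):
    """Hypothetical sample function f(x, y) = x^3 * y^2."""
    return x**3 * y**2

dxx_f = partial(partial(f, 0), 0)     # second derivative in x
dyy_f = partial(partial(f, 1), 1)     # second derivative in y
dx_dy_f = partial(partial(f, 1), 0)   # differentiate in y first, then in x
dy_dx_f = partial(partial(f, 0), 1)   # differentiate in x first, then in y

point = (1.0, 2.0)
# Exact values at (1, 2) are 24, 2, 12 and 12.
print(dxx_f(*point), dyy_f(*point), dx_dy_f(*point), dy_dx_f(*point))
```

For this polynomial the two mixed derivatives agree, both being 6x²y. Nested finite differences lose accuracy quickly, so the step h is a compromise between truncation and rounding error.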

Definition 3.5.1. Let $f : \Omega \subseteq \mathbb{R}^n \to \mathbb{R}$. If the second order partial derivatives $\frac{\partial^2 f}{\partial x_j \partial x_i}$, for $i, j = 1, 2, \dots, n$, all exist and are continuous on $\Omega$, then we say that $f$ is of class $C^2$ on $\Omega$ and we write $f \in C^2(\Omega)$. Likewise, for each positive integer $k$, we say $f$ is of class $C^k(\Omega)$ when all the $k$th order partial derivatives of $f$ exist and are continuous on $\Omega$. A function $f$ is said to be smooth on $\Omega$, or of class $C^\infty(\Omega)$, if $f$ has continuous partial derivatives of all orders, that is, $f \in C^\infty(\Omega)$ if $f \in C^k(\Omega)$ for all $k = 1, 2, \dots$.

The following inclusions are a consequence of Corollary 3.2.18:

\[ C^\infty(\Omega) \subseteq \dots \subseteq C^k(\Omega) \subseteq C^{k-1}(\Omega) \subseteq \dots \subseteq C^1(\Omega) \subseteq C^0(\Omega). \]
