Notes on convex functions
G.J.O. Jameson
Contents
1. The definition and examples.
2. Elementary results.
3. Jensen’s inequality; applications to means.
4. Inequalities for integrals; applications to discrete sums.
5. Algebra of convex functions.
6. Monotonic averages.
7. Majorisation.
1. The definition and examples.
Let I be a real interval, open or closed, bounded
or unbounded (so possibly the whole real line). A real-
valued function f is said to be convex on I if it lies below
(or on) the straight-line chords between pairs of points
of its graph. In other words, if x1, x2 are points of I with
x1 < x2, and ax+ b is the linear function agreeing with
f(x) at x1 and x2, then f(x) ≤ ax+ b for x1 ≤ x ≤ x2.
We say that f is strictly convex on I if for all x1, x2 as above, we have f(x) < ax + b
for x1 < x < x2. We also say that f is concave on I if −f is convex, so that f(x) ≥ ax + b for such x.
Of course, a and b could be written out explicitly, but this is not particularly helpful.
However, an important equivalent way of stating the definition is as follows. All points x of
the interval [x1, x2] are expressible in the form
xλ = x1 + λ(x2 − x1) = (1 − λ)x1 + λx2 (1)
for some λ in [0, 1]. We then have
g(xλ) = (1− λ)g(x1) + λg(x2)
for g(x) = 1 and for g(x) = x, hence for g(x) = ax+b as above. So the definition of convexity
equates to
f(xλ) ≤ (1− λ)f(x1) + λf(x2) (2)
for 0 ≤ λ ≤ 1, with xλ as above.
Another equivalent way to express (1) and (2) is: given that xλ − x1 = λ(x2 − x1), we have f(xλ) − f(x1) ≤ λ[f(x2) − f(x1)].
We list some immediate facts:
(E1) If c > 0 and f(x) is convex on I, then so is cf(x).
(E2) If f and g are convex on I, then so is f + g.
(E3) A linear function ax+ b is both convex and concave.
(E4) If f is convex on I, then f(x+ c) is convex on {x : x+ c ∈ I}.
(E5) If f is convex on I, then for any non-zero c (positive or negative), f(cx) is
convex on {x : cx ∈ I}.
(E6) If fn is convex on I for each n ≥ 1 and lim_{n→∞} fn(x) = f(x) for each x ∈ I, then f is convex on I.
For example, by (E2) and (E5), if f is convex on [−R,R], then so are f(−x) and f(x) + f(−x).
If a function has increasing derivative, then it is “curving upwards”, and it seems
almost obvious that it must be convex. With the help of the mean-value theorem, this is
easily proved:
1.1 PROPOSITION. If the derivative f ′ exists and is increasing on I, then f is convex
on I.
Proof. Let x1 < x2, and let ax+ b be the linear function agreeing with f at x1 and x2.
Let g(x) = f(x) − ax − b. Then g(x1) = g(x2) = 0, and g′(x) = f′(x) − a, so g′ is increasing on I. Suppose that g(x) > 0 for some x in (x1, x2). By the mean-value theorem, there exists ξ1 in (x1, x) such that g(x) − g(x1) = g(x) = (x − x1)g′(ξ1), hence g′(ξ1) > 0. Similarly, there exists ξ2 in (x, x2) such that g(x2) − g(x) = −g(x) = (x2 − x)g′(ξ2), hence g′(ξ2) < 0. Since ξ2 > ξ1, this contradicts the fact that g′ is increasing. So g(x) ≤ 0, hence f(x) ≤ ax + b, for all x in (x1, x2). □
1.2 COROLLARY. If f′′(x) ≥ 0 on I, then f is convex on I. □
Of course, if f ′ is strictly increasing (which occurs if f ′′(x) > 0 on I), then f is strictly
convex on I.
Note. For closed intervals, the following slight refinement of 1.1 applies: if f is con-
tinuous on [a, b] and f ′(x) exists and is increasing on (a, b), then f is convex on [a, b]. The
proof still applies, because the mean-value theorem is still valid under these assumptions.
In practice, we usually recognise convex functions by applying 1.1 or 1.2. In particular:
1.3. The function x^p is convex on (0,∞) if p ≥ 1 or p ≤ 0, and concave if 0 ≤ p ≤ 1.
Proof. We have f′′(x) = p(p − 1)x^{p−2}, so f′′(x) ≥ 0 for all x > 0 if p(p − 1) ≥ 0, which occurs when p ≥ 1 and when p ≤ 0. The opposite occurs when 0 ≤ p ≤ 1. □
In fact, x^p is strictly convex if p > 1 or p < 0, and strictly concave if 0 < p < 1. For p ≥ 0, the statements apply on [0,∞) (for 0 < p < 1, this uses the Note above). The reader is invited to sketch x², x^{1/2} and x^{−1} to illustrate the three cases. Note also that x² is convex on the whole real line, while x³ is concave on (−∞, 0].
We list some further examples.
Example. For any real a ≠ 0 (including a < 0), the function f(x) = e^{ax} is strictly convex on R, since f′′(x) = a²e^{ax} > 0.
Example. The function log is (strictly) concave on (0,∞), since the derivative 1/x is decreasing.
Example. The function x log x is convex on (0,∞), since f′(x) = 1 + log x, which is increasing.
Example. Both sin x and cos x are concave on [0, π/2], since the second derivatives, − sin x and − cos x, are non-positive. Hence sin x ≥ (2/π)x and cos x ≥ 1 − (2/π)x on [0, π/2].
Example. Any polynomial with non-negative coefficients is convex on [0,∞), by 1.3 and (E2).
Example. The function |x| is convex on R: this is not a consequence of 1.1, since the function is not differentiable at 0, but it is easily verified directly.
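Since the notes are text-only, a quick numerical sanity check may be useful; the following Python sketch (the helper `chord_ok` is my own, not from the notes) tests the chord inequality (2) on a grid for some of the functions above.

```python
import math

def chord_ok(f, x1, x2, steps=50):
    """Test the chord inequality f(x_lam) <= (1-lam)f(x1) + lam f(x2) on a grid."""
    for i in range(steps + 1):
        lam = i / steps
        x = (1 - lam) * x1 + lam * x2
        if f(x) > (1 - lam) * f(x1) + lam * f(x2) + 1e-12:
            return False
    return True

# Convex examples from the text: x^2, e^x, x log x, |x|; -log x is convex since log is concave
assert chord_ok(lambda x: x * x, -3.0, 2.0)
assert chord_ok(math.exp, -2.0, 2.0)
assert chord_ok(lambda x: x * math.log(x), 0.1, 5.0)
assert chord_ok(abs, -1.0, 2.0)
assert chord_ok(lambda x: -math.log(x), 0.5, 4.0)
```

For a strictly concave function such as log, the same test fails, as expected.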
The following sections outline a number of results about convex functions. Most of
them are very well established, and we do not give references. Sections 6 and 7 contain some
rather more recent results and techniques, and here some references are supplied.
Note. This provisional version contains blank spaces where certain diagrams are in-
tended. Owing to the Coronavirus pandemic, I am currently unable to access the files for
these diagrams.
2. Elementary results
First, we verify another fact that seems geometrically obvious: if a function has in-
creasing derivative, then it lies above its tangents.
2.1 PROPOSITION. Suppose that f ′(x) exists and is increasing on an interval I. Then
for any x and x0 in I,
f(x) ≥ f(x0) + (x− x0)f ′(x0) (3)
The opposite inequality holds if f ′(x) is decreasing.
Proof. First suppose that x > x0. By the mean-value theorem, there exists ξ in (x0, x)
such that f(x) − f(x0) = (x − x0)f′(ξ). Since ξ > x0, we have f′(ξ) ≥ f′(x0), hence (3). Now suppose that x < x0. Then there exists ξ in (x, x0) such that f(x0) − f(x) = (x0 − x)f′(ξ). We now have f′(ξ) ≤ f′(x0), hence f(x0) − f(x) ≤ (x0 − x)f′(x0), which equates to (3). □
Note. If f has a second derivative, then we have by Taylor’s theorem
f(x) = f(x0) + (x − x0)f′(x0) + (1/2)(x − x0)² f′′(ξ)
for some ξ between x0 and x. This implies (3), and also gives an error term.
Example. Let f(x) = (1 + x)^p. Then f(0) = 1 and f′(0) = p, so by (3), for all x > −1, we have (1 + x)^p ≥ 1 + px when p ≥ 1 or p < 0, and the reverse when 0 ≤ p ≤ 1.
Example. Let f(x) = log(1 + x). Then f is concave, f(0) = 0 and f ′(0) = 1, so
log(1 + x) ≤ x for all x > −1.
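The two tangent-line bounds just derived are easy to spot-check numerically; a small Python sketch (illustrative only):

```python
import math

def check_tangent_bounds(x):
    """Tangent-line bounds at 0 for (1+x)^p and log(1+x), valid for x > -1."""
    assert (1 + x) ** 2 >= 1 + 2 * x          # p = 2 (case p >= 1)
    assert (1 + x) ** (-1) >= 1 + (-1) * x    # p = -1 (case p < 0)
    assert (1 + x) ** 0.5 <= 1 + 0.5 * x      # p = 1/2 (case 0 <= p <= 1)
    assert math.log(1 + x) <= x               # concave case

for x in [-0.9, -0.5, 0.0, 0.5, 1.0, 10.0]:
    check_tangent_bounds(x)
```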
We will see below that convex functions always have one-sided derivatives, and satisfy
a correspondingly amended version of 2.1.
Next, we consider gradients of chords. For x ≠ y, write

mf (x, y) = [f(y) − f(x)]/(y − x).
The following result confirms a fact that seems obvious from the diagram. It is the key to
many further statements about convex functions.
2.2 PROPOSITION. If f is convex on I and x1,
x2, x3 are points of I with x1 < x2 < x3, then
mf (x1, x2) ≤ mf (x1, x3) ≤ mf (x2, x3).
Proof. Let x2 − x1 = λ(x3 − x1). Then, by (2),

f(x2) − f(x1) ≤ λ[f(x3) − f(x1)] = [(x2 − x1)/(x3 − x1)] [f(x3) − f(x1)],
which equates to mf (x1, x2) ≤ mf (x1, x3).
The proof that mf (x1, x3) ≤ mf (x2, x3) is similar, using x3 − x2 = (1 − λ)(x3 − x1) and f(x3) − f(x2) ≥ (1 − λ)[f(x3) − f(x1)]. □
This simple result has numerous applications.
2.3 COROLLARY. If f is convex on I and xj, yj are points of I with x1 < y1, x2 < y2,
also x1 ≤ x2 and y1 ≤ y2, then mf (x1, y1) ≤ mf (x2, y2).
Proof. By 2.2, applied twice, mf (x1, y1) ≤ mf (x1, y2) ≤ mf (x2, y2). □
(Note that we do not require y1 ≤ x2 in this result.)
2.4 COROLLARY. If f is convex on [0, R] and f(0) = 0, then f(x)/x is increasing on
(0, R].
Proof. Note that f(x)/x = mf (0, x). If 0 < x < y, then mf (0, x) ≤ mf (0, y). □
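Both 2.2 and 2.4 can be illustrated numerically; a Python sketch with f(x) = x² (the helper `slope` is mine):

```python
def slope(f, x, y):
    """Chord gradient m_f(x, y) = (f(y) - f(x)) / (y - x)."""
    return (f(y) - f(x)) / (y - x)

f = lambda x: x * x   # convex on the whole line, with f(0) = 0

# 2.2: for x1 < x2 < x3, m_f(x1, x2) <= m_f(x1, x3) <= m_f(x2, x3)
x1, x2, x3 = -1.0, 0.5, 2.0
assert slope(f, x1, x2) <= slope(f, x1, x3) <= slope(f, x2, x3)

# 2.4: f(x)/x = m_f(0, x) increases on (0, R]
vals = [slope(f, 0.0, x) for x in [0.5, 1.0, 2.0, 4.0]]
assert vals == sorted(vals)
```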
Another consequence is the following converse to 2.1:
2.5. If f is convex and differentiable on an open interval I, then f ′ is increasing on I.
Proof. Take x1, x2 in I with x1 < x2, and h > 0. By 2.3, mf (x1, x1 + h) ≤ mf (x2, x2 + h). Taking the limit as h → 0+, we obtain f′(x1) ≤ f′(x2). □
The next result is a useful complement to the original definition. Again, it is geomet-
rically obvious: picture a straight line intersecting twice with an upwardly curving line.
2.6 PROPOSITION. Suppose that f is convex on I and x1, x2 are points of I with
x1 < x2. Let ax + b be the linear function agreeing with f at x1 and x2. Then f(x) ≥ ax + b for points x of I such that x < x1 or x > x2.
Proof. Clearly, mf (x1, x2) = a. So if x > x2, then mf (x2, x) ≥ a, hence
f(x) ≥ f(x2) + a(x− x2) = (ax2 + b) + a(x− x2) = ax+ b.
Similarly for x < x1. □
Example. The function 2x is strictly convex, since it equals ex log 2. It agrees with 1 + x
at x = 0 and x = 1. Hence 2x < 1 +x for 0 < x < 1, and 2x > 1 +x for x > 1 and for x < 0.
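A direct numerical check of this example (illustrative Python):

```python
# 2^x agrees with 1 + x at x = 0 and x = 1; by strict convexity and 2.6,
# 2^x lies below this chord strictly between 0 and 1, and above it outside.
for x in [0.1, 0.5, 0.9]:
    assert 2 ** x < 1 + x
for x in [-2.0, -0.5, 1.5, 3.0]:
    assert 2 ** x > 1 + x
```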
As well as being useful in itself, the next result illustrates a style of proof that is often
effective.
2.7 PROPOSITION. Let a, b, c, d be real numbers with a < d and b, c in [a, d]. Let
α, β, γ, δ be non-negative numbers such that
β + γ = α + δ,
βb+ γc = αa+ δd.
Then for any convex function f on [a, d],
βf(b) + γf(c) ≤ αf(a) + δf(d). (4)
Proof. The assumptions say that βg(b) + γg(c) = αg(a) + δg(d) for g(x) = 1 and
g(x) = x, hence for any linear function g. Take g to be the linear function agreeing with f
at a and d: then f(b) ≤ g(b) and f(c) ≤ g(c). The statement follows. (Note that it does not matter whether c is greater or less than b.) □
Of course, the opposite inequality applies for concave f . Also, strict inequality applies
if f is strictly convex and b, c are in (a, d).
We record the case α = β = γ = δ = 1, which is already of interest:
2.8 COROLLARY. Suppose that b, c are in [a, d] and b+ c = a+ d. If f is convex on
[a, d], then f(b) + f(c) ≤ f(a) + f(d). □
This result can also be derived from 2.2: mf (a, b) ≤ mf (c, d). In turn, it implies several further statements.
2.9 COROLLARY. If f is convex on [a − R, a + R] for some a, R, then f(a + x) + f(a − x) increases with x for 0 ≤ x ≤ R. □
In particular, if f is convex on [−R,R], then f(x) + f(−x) increases with x on [0, R]. If f is also even, then f(x) increases on [0, R].
2.10 COROLLARY. If f is convex on [0,∞) and f(0) = 0, then f(x + y) ≥ f(x) + f(y) for x, y > 0 (f is “superadditive”). □
2.11 COROLLARY. If f is convex on [x0,∞) and c > 0, then f(x + c) − f(x) increases with x for x ≥ x0.
Proof. Let x < y. By 2.8, f(x + c) + f(y) ≤ f(x) + f(y + c), so f(x + c) − f(x) ≤ f(y + c) − f(y). □
Note that 2.11 is trivial if f has increasing derivative, since then f ′(x+ c)− f ′(x) ≥ 0.
Conversely, if f(x + c) − f(x) increases with x for all c > 0 and f is differentiable, then f′(x) is increasing (consider the limit as c → 0).
We now consider continuity and one-sided derivatives of convex functions.
2.12 PROPOSITION. If f is convex on an open interval I, then it is continuous there.
Proof. Choose x0 ∈ I. Since I is an open interval, there exist points x1 and x2 in I with x1 < x0 < x2. Let a1x + b1 and a2x + b2 be the linear functions agreeing with f at x1, x0 and at x0, x2 respectively. By 2.2, a1 ≤ a2. By the original definition and 2.6, we have a1x + b1 ≤ f(x) ≤ a2x + b2 for x0 < x < x2, and the reverse inequalities for x1 < x < x0. Hence f(x) → f(x0) both as x → x0+ and as x → x0−. □
However, f can be discontinuous at an end-point of a closed interval: for example, a
convex function on [0, 1] is defined by f(x) = 0 for 0 ≤ x < 1 and f(1) = 1.
Write (D+f)(x) and (D−f)(x) for the right and left derivatives of f at x.
2.13. If I is an open interval and f is convex on I, then f has finite right and left
derivatives at each point of I, and (D+f)(x) ≥ (D−f)(x) for x ∈ I.
Proof. Choose x ∈ I. For small enough h, k > 0, the points x − h and x + k are in I, and by 2.2, mf (x − h, x) ≤ mf (x, x + k). Also, mf (x, x + k) decreases as k decreases towards 0. Hence it tends to a limit L as k → 0+, and L ≥ mf (x − h, x). Similarly, mf (x − h, x) tends to a limit M as h → 0+, and M ≤ L. By definition, L = (D+f)(x) and M = (D−f)(x). □
We can now state a version of 2.1 that applies to all convex functions, without the
assumption of differentiability.
2.14 PROPOSITION. If f is convex on I and x0 is an interior point of I, then
f(x) ≥ f(x0) + (x− x0)(D+f)(x0) (5)
for all x ∈ I.
Proof. Let (D+f)(x0) = m. First, take x > x0, and let x0 < y < x. By 2.2, mf (x0, y) ≤ mf (x0, x). But mf (x0, y) → m as y → x0+, so mf (x0, x) ≥ m. This means that f(x) − f(x0) ≥ m(x − x0), as stated.
Now consider x < x0. In the same way, mf (x, x0) ≤ (D−f)(x0) ≤ m, so f(x0) − f(x) ≤ m(x0 − x), which is equivalent to (5). □
3. Jensen’s inequality; applications to means
The next result is the key to numerous applications of convex functions. It was formu-
lated by the Danish mathematician J. L. W. V. Jensen in 1906. We give two proofs.
3.1 THEOREM (Jensen’s inequality). Suppose that f is convex on I. Suppose that xj ∈ I and λj > 0 for 1 ≤ j ≤ n, and ∑_{j=1}^n λj = 1. Let x̄ = ∑_{j=1}^n λjxj. Then

f(x̄) ≤ ∑_{j=1}^n λjf(xj). (6)
The opposite inequality holds if f is concave.
Proof 1. Induction. The case n = 2 is the definition. Assume the statement true for n. Let x̄ = ∑_{j=1}^{n+1} λjxj, where xj ∈ I (1 ≤ j ≤ n + 1) and ∑_{j=1}^{n+1} λj = 1. Let μ = ∑_{j=1}^n λj = 1 − λ_{n+1}, and

y = ∑_{j=1}^n (λj/μ) xj,

so that x̄ = μy + λ_{n+1}x_{n+1}. By the induction hypothesis,

f(y) ≤ ∑_{j=1}^n (λj/μ) f(xj).

Hence, by (2),

f(x̄) ≤ μf(y) + λ_{n+1}f(x_{n+1}) ≤ ∑_{j=1}^{n+1} λjf(xj). □
Proof 2. The statement is trivial if the xj are all equal, so assume they are not. Then x̄ is an interior point of I. By 2.14 (or the more elementary 2.1 for differentiable f), for all x ∈ I, f(x) − f(x̄) ≥ m(x − x̄), where m = (D+f)(x̄). So

∑_{j=1}^n λjf(xj) − f(x̄) = ∑_{j=1}^n λj[f(xj) − f(x̄)] ≥ m ∑_{j=1}^n λj(xj − x̄) = m (∑_{j=1}^n λjxj − x̄) = 0. □
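Jensen’s inequality is easily exercised on random data; a Python sketch (the helper name `jensen_gap` is my own):

```python
import math, random

def jensen_gap(f, xs, lams):
    """sum_j lam_j f(x_j) - f(sum_j lam_j x_j): nonnegative when f is convex."""
    mean = sum(l * x for l, x in zip(lams, xs))
    return sum(l * f(x) for l, x in zip(lams, xs)) - f(mean)

random.seed(0)
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    ws = [random.uniform(0.1, 1.0) for _ in range(5)]
    s = sum(ws)
    lams = [w / s for w in ws]                      # normalise the weights
    assert jensen_gap(math.exp, xs, lams) >= -1e-9  # exp is convex
    assert jensen_gap(math.log, xs, lams) <= 1e-9   # log is concave: opposite inequality
```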
An immediate application is the well-known inequality of the means. Given positive numbers xj and wj with ∑_{j=1}^n wj = 1, the weighted arithmetic mean of the numbers xj is x̄ = ∑_{j=1}^n wjxj, while the weighted geometric mean is ∏_{j=1}^n xj^{wj}. The ordinary arithmetic and geometric means are obtained by taking wj = 1/n for each j.
3.2 PROPOSITION. Let xj, wj (1 ≤ j ≤ n) be positive numbers with ∑_{j=1}^n wj = 1. Then

∏_{j=1}^n xj^{wj} ≤ ∑_{j=1}^n wjxj.

Proof. By Jensen’s inequality applied to the concave function log x, we have ∑_{j=1}^n wj log xj ≤ log x̄. □
Example. Apply 3.2 (in the logarithmic form) with xj = 1/wj: then ∑_{j=1}^n wjxj = n, so ∑_{j=1}^n wj log(1/wj) ≤ log n. (This quantity is known as the “entropy” of (wj), viewed as a probability distribution.)
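The weighted AM-GM inequality and the entropy bound can both be verified directly; a Python sketch:

```python
import math

def weighted_means(xs, ws):
    """Weighted arithmetic and geometric means (weights assumed to sum to 1)."""
    arith = sum(w * x for w, x in zip(ws, xs))
    geom = math.exp(sum(w * math.log(x) for w, x in zip(ws, xs)))
    return arith, geom

xs = [1.0, 4.0, 9.0, 16.0]
ws = [0.1, 0.2, 0.3, 0.4]
a, g = weighted_means(xs, ws)
assert g <= a                        # 3.2: geometric mean <= arithmetic mean

entropy = sum(w * math.log(1 / w) for w in ws)
assert entropy <= math.log(len(ws))  # the entropy bound of the example
```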
Hölder’s inequality (in two equivalent forms) follows in elegant style. Note that for
p > 1, the conjugate index p∗ is defined by 1/p+ 1/p∗ = 1, so that p∗ = p/(p− 1).
3.3 PROPOSITION (Hölder’s inequality). (i) Suppose that aj, bj (1 ≤ j ≤ n) are non-negative numbers and 0 < r < 1. Let s = 1 − r. Then

∑_{j=1}^n aj^r bj^s ≤ (∑_{j=1}^n aj)^r (∑_{j=1}^n bj)^s. (7)

(ii) Suppose that xj, yj (1 ≤ j ≤ n) are non-negative numbers and p > 1. Then

∑_{j=1}^n xjyj ≤ (∑_{j=1}^n xj^p)^{1/p} (∑_{j=1}^n yj^{p∗})^{1/p∗}. (8)

Proof. (i) Let ∑_{j=1}^n aj = A and ∑_{j=1}^n bj = B, also cj = aj/A and dj = bj/B. By 3.2, cj^r dj^s ≤ rcj + sdj, so

∑_{j=1}^n cj^r dj^s ≤ r ∑_{j=1}^n cj + s ∑_{j=1}^n dj = r + s = 1.

But cj^r dj^s = (aj^r bj^s)/(A^r B^s), so this equates to (7).

(ii) Apply (7) with r = 1/p, s = 1/p∗ and aj = xj^p, bj = yj^{p∗}. □
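A random test of Hölder’s inequality in the form (8) (Python sketch; `holder_holds` is my own helper):

```python
import random

def holder_holds(xs, ys, p, tol=1e-9):
    """Check sum x_j y_j <= (sum x_j^p)^(1/p) (sum y_j^q)^(1/q), q = p/(p-1)."""
    q = p / (p - 1)
    lhs = sum(x * y for x, y in zip(xs, ys))
    rhs = sum(x ** p for x in xs) ** (1 / p) * sum(y ** q for y in ys) ** (1 / q)
    return lhs <= rhs + tol

random.seed(1)
for _ in range(100):
    xs = [random.uniform(0.0, 5.0) for _ in range(6)]
    ys = [random.uniform(0.0, 5.0) for _ in range(6)]
    assert holder_holds(xs, ys, 2.0)   # the Cauchy-Schwarz case
    assert holder_holds(xs, ys, 3.0)
```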
We now record what Jensen’s inequality says when applied to the function xp.
3.4 PROPOSITION. Suppose that xj ≥ 0 and wj > 0 for 1 ≤ j ≤ n, with ∑_{j=1}^n wj = 1. If p > 1 or p < 0, then

(∑_{j=1}^n wjxj)^p ≤ ∑_{j=1}^n wjxj^p. (9)

The reverse inequality holds if 0 < p ≤ 1. □
The weighted pth mean of the numbers xj is Mp(x,w) = (∑_{j=1}^n wjxj^p)^{1/p}. In this notation, (9) says that M1(x,w) ≤ Mp(x,w) for p > 1. By applying it to xj^p, with index q/p, one can deduce that Mp(x,w) ≤ Mq(x,w) for 0 < p < q.

If ∑_{j=1}^n wj = W, then replacing wj by wj/W in (9), we obtain

(∑_{j=1}^n wjxj)^p ≤ W^{p−1} ∑_{j=1}^n wjxj^p.

In particular, (∑_{j=1}^n xj)^p ≤ n^{p−1} ∑_{j=1}^n xj^p. A suitable substitution delivers another proof of Hölder’s inequality.
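The monotonicity of the power means Mp, and the special case with W = n, can be checked numerically (Python sketch; `power_mean` is my own name):

```python
def power_mean(xs, ws, p):
    """Weighted p-th mean M_p(x, w), weights summing to 1, p != 0."""
    return sum(w * x ** p for w, x in zip(ws, xs)) ** (1 / p)

xs = [1.0, 2.0, 5.0, 7.0]
ws = [0.4, 0.3, 0.2, 0.1]
means = [power_mean(xs, ws, p) for p in [0.5, 1.0, 2.0, 3.0, 10.0]]
assert means == sorted(means)   # M_p(x, w) <= M_q(x, w) for p < q

# (sum x_j)^p <= n^(p-1) sum x_j^p, here with p = 2
vals = [1.0, 2.0, 3.0]
assert sum(vals) ** 2 <= len(vals) * sum(v * v for v in vals)
```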
We return to the general study of Jensen’s inequality. With the help of Taylor’s the-
orem, we can modify the second proof to give a version with an error term in terms of the
second derivative.
3.5 PROPOSITION. Suppose that f has a continuous second derivative on I. Let λj, xj and x̄ be as in Theorem 3.1. Then there exists ξ ∈ I such that

∑_{j=1}^n λjf(xj) − f(x̄) = (1/2) f′′(ξ) ∑_{j=1}^n λj(xj − x̄)².

Proof. For each j, by Taylor’s theorem, there exists yj ∈ I such that

f(xj) − f(x̄) = (xj − x̄)f′(x̄) + (1/2)(xj − x̄)² f′′(yj).

Let min_{1≤j≤n} f′′(yj) = m and max_{1≤j≤n} f′′(yj) = M. By the cancellation seen in Proof 2 of Theorem 3.1, we see that

∑_{j=1}^n λjf(xj) − f(x̄) = (1/2) ∑_{j=1}^n λj(xj − x̄)² f′′(yj) = (1/2) μ ∑_{j=1}^n λj(xj − x̄)²,

where m ≤ μ ≤ M. By the intermediate value theorem, μ = f′′(ξ) for some ξ ∈ I. □
Simple reasoning along the same lines as 2.7 delivers a companion inequality in the reverse direction, giving an upper bound for ∑_{j=1}^n λjf(xj).

3.6. Let f be convex on I, and let xj, λj and x̄ be as in Theorem 3.1. Let m ≤ xj ≤ M for 1 ≤ j ≤ n, where m < M. Then

∑_{j=1}^n λjf(xj) ≤ [(M − x̄)/(M − m)] f(m) + [(x̄ − m)/(M − m)] f(M).

Proof. Observe that x̄ = αm + βM, where α = (M − x̄)/(M − m) and β = (x̄ − m)/(M − m). Then α + β = 1, so

∑_{j=1}^n λjg(xj) = αg(m) + βg(M)

for g(x) = 1 and for g(x) = x, hence for all linear g. Take g to be the linear function agreeing with f at m and M. Then f(xj) ≤ g(xj) for each j, so the statement follows. □
This looks simpler when m = 0 and M = 1: the bound becomes (1 − x̄)f(0) + x̄f(1). In turn, this implies the bound f(0) + f(1) − f(1 − x̄), known as the Jensen-Mercer inequality.
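Both the bound in 3.6 (with m = 0, M = 1) and the weaker Jensen-Mercer bound can be tested on random weights (Python sketch):

```python
import math, random

random.seed(2)
f = math.exp   # convex on the whole line
for _ in range(100):
    xs = [random.random() for _ in range(4)]            # points in [0, 1]
    ws = [random.uniform(0.1, 1.0) for _ in range(4)]
    s = sum(ws)
    lams = [w / s for w in ws]
    mean = sum(l * x for l, x in zip(lams, xs))
    avg_f = sum(l * f(x) for l, x in zip(lams, xs))
    assert avg_f <= (1 - mean) * f(0) + mean * f(1) + 1e-12   # 3.6 with m = 0, M = 1
    assert avg_f <= f(0) + f(1) - f(1 - mean) + 1e-12         # Jensen-Mercer (weaker)
```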
Finally, we state the continuous version of Jensen’s inequality, in which the vectors
(xj) and (λj) are replaced by functions and discrete sums are replaced by integrals. Proof 2
of Theorem 3.1 applies with minimal change.
3.7 THEOREM. Suppose that w is a non-negative function on an interval I with ∫_I w = 1, and let x be any integrable function on I. Let ∫_I wx = A(x,w). Let f be a convex function defined at least on x(I). Then

f[A(x,w)] ≤ ∫_I w(t)f[x(t)] dt.

Proof. Write A(x,w) = A. For t ∈ I, we have

f[x(t)] − f(A) ≥ [x(t) − A]f′(A),

in which f′(A) means the right-derivative if necessary. Multiplying by w(t) and integrating, we obtain

∫_I w(t)f[x(t)] dt − f(A) ≥ f′(A) ∫_I w(t)[x(t) − A] dt = f′(A) [∫_I w(t)x(t) dt − A] = 0. □
We can derive a continuous analogue of the inequality of the means. Given that x(t) > 0 on I, we define the weighted geometric mean G(x,w) by: log G(x,w) = ∫_I w(t) log x(t) dt. By Theorem 3.7, with f taken to be the concave function log, we have at once:

3.8 COROLLARY. Under these conditions, we have G(x,w) ≤ A(x,w). □
4. Inequalities for integrals; applications to discrete sums
Recall first the following elementary estimation for integrals, derived from upper and
lower bounds: if m ≤ f(x) ≤ M for a ≤ x ≤ b, then

m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a).

For a decreasing function on [p, q] (where p, q are integers), by combining these estimates on successive intervals [r − 1, r], we obtain

f(p + 1) + f(p + 2) + · · · + f(q) ≤ ∫_p^q f(x) dx ≤ f(p) + f(p + 1) + · · · + f(q − 1).
For convex functions, we can give much more accurate estimations, as follows. If f is convex on [a, b], then f(x) ≤ h(x) on [a, b], where (with a change of notation) h(x) is the linear function mx + d such that ma + d = f(a) and mb + d = f(b). Then

∫_a^b (mx + d) dx = (1/2)m(b² − a²) + d(b − a) = (b − a)[(1/2)m(b + a) + d] = (1/2)(b − a)[f(a) + f(b)].

Of course, this is just the area of the trapezium described; it is the “trapezium rule” estimate for the integral.
Meanwhile, if c = (1/2)(a + b) and μ = (D+f)(c), then by 2.14, f(x) ≥ g(x) on [a, b], where g(x) = f(c) + μ(x − c). Clearly,

∫_a^b g(x) dx = (b − a)f(c).

This is the “mid-point” estimate for the integral. Since ∫_a^b g ≤ ∫_a^b f ≤ ∫_a^b h, we conclude:
4.1 PROPOSITION. If f is convex on [a, b] and c = (1/2)(a + b), then

(b − a)f(c) ≤ ∫_a^b f(x) dx ≤ (1/2)(b − a)[f(a) + f(b)]. □
Note. Alternatively, to prove the left-hand inequality without using the one-sided derivative, observe that f(c) ≤ (1/2)f(c − x) + (1/2)f(c + x) and integrate on [0, (1/2)(b − a)].
Example. Since log(1 + x) = ∫_1^{1+x} (1/t) dt, we have, for x > 0,

log(1 + x) ≤ (x/2)(1 + 1/(1 + x)) = x(2 + x)/[2(1 + x)]

and

log(1 + x) ≥ x/(1 + x/2) = 2x/(2 + x).
For comparison, the bounds given by simple integral comparison are x and x/(1 + x).
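A numerical comparison of the four bounds for log(1 + x) (Python sketch):

```python
import math

# Convexity of 1/t gives 2x/(2+x) <= log(1+x) <= x(2+x)/(2(1+x)) for x > 0,
# and these sit inside the simple bounds x/(1+x) and x.
for x in [0.1, 0.5, 1.0, 3.0, 10.0]:
    lo = 2 * x / (2 + x)
    hi = x * (2 + x) / (2 * (1 + x))
    assert lo <= math.log(1 + x) <= hi
    assert x / (1 + x) <= lo and hi <= x
```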
By combining the estimations in 4.1 on successive intervals, we obtain:
4.2 PROPOSITION. Let q − p be a positive integer. If f is convex on [p, q], then

f(p + 1/2) + · · · + f(q − 1/2) ≤ ∫_p^q f(x) dx ≤ (1/2)f(p) + f(p + 1) + · · · + f(q − 1) + (1/2)f(q).

Proof. By 4.1, for r = p + 1, p + 2, . . . , q,

f(r − 1/2) ≤ ∫_{r−1}^r f ≤ (1/2)[f(r − 1) + f(r)].

Add these inequalities to obtain the statement. □
The application is usually to use the known value of the integral to give bounds for the
discrete sums on either side.
By taking limits, we can derive the following estimates for the tail of an infinite series.
4.3. Suppose that f(x) is decreasing, convex and non-negative for all x ≥ 1. Suppose also that ∫_1^∞ f(x) dx is convergent, and write rn = ∑_{j=n+1}^∞ f(j). Then

∫_n^∞ f(x) dx − (1/2)f(n) ≤ rn ≤ ∫_{n+1/2}^∞ f(x) dx.

Proof. In the right-hand inequality in 4.2, take p = n and let q → ∞ to obtain ∫_n^∞ f ≤ (1/2)f(n) + rn.

In the left-hand inequality in 4.2, take p = n + 1/2 and q = r + 1/2 to get

f(n + 1) + · · · + f(r) ≤ ∫_{n+1/2}^{r+1/2} f.

Taking the limit as r → ∞, we obtain rn ≤ ∫_{n+1/2}^∞ f. □
Example. Let rn = ∑_{j=n+1}^∞ 1/j². Since ∫_a^∞ (1/x²) dx = 1/a, we obtain

1/n − 1/(2n²) ≤ rn ≤ 1/(n + 1/2) = 1/n − 1/[n(2n + 1)].

(Compare the bounds 1/(n + 1) and 1/n given by simple estimation.)
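These tail bounds can be checked against a long partial sum (Python sketch; the truncation slightly under-estimates rn, which does not affect the comparison at this accuracy):

```python
def tail(n, terms=200_000):
    """Truncated tail sum_{j=n+1}^{n+terms-1} 1/j^2; short of r_n by about 1/(n+terms)."""
    return sum(1.0 / (j * j) for j in range(n + 1, n + terms))

for n in [1, 2, 5, 10]:
    rn = tail(n)
    assert 1.0 / n - 1.0 / (2 * n * n) <= rn <= 1.0 / (n + 0.5)
```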
Example. Let Hn = ∑_{r=1}^n 1/r. Simple integral estimation gives log n + 1/n ≤ Hn ≤ log n + 1. By 4.2, with p = 1 and q = n, we have

1/2 + 1/2 + 1/3 + · · · + 1/(n − 1) + 1/(2n) ≥ ∫_1^n (1/x) dx = log n,

so Hn ≥ log n + 1/2 + 1/(2n). Also, taking p = 1/2 and q = n + 1/2, the left-hand inequality gives

Hn = 1 + 1/2 + · · · + 1/n ≤ ∫_{1/2}^{n+1/2} (1/x) dx = log(2n + 1).

(With more care, this can be developed into more accurate estimates involving Euler’s constant. However, closer estimates are delivered by methods using the logarithmic series or Euler-Maclaurin summation.)
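A check of the simple and refined bounds for Hn; here the upper bound log(2n + 1) is the one obtained by taking p = 1/2, q = n + 1/2 in 4.2 (Python sketch):

```python
import math

def H(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))

for n in [2, 5, 10, 100, 1000]:
    hn = H(n)
    assert math.log(n) + 1.0 / n <= hn <= math.log(n) + 1   # simple estimates
    assert hn >= math.log(n) + 0.5 + 1.0 / (2 * n)          # refined lower bound
    assert hn <= math.log(2 * n + 1)                        # refined upper bound
```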
Example. Apply 4.2 to log x: this function is concave, so the inequality reverses, giving

(1/2) log 1 + log 2 + · · · + log(n − 1) + (1/2) log n ≤ ∫_1^n log x dx = n log n − n + 1,

so log n! = ∑_{r=1}^n log r ≤ (n + 1/2) log n − n + 1, hence n! ≤ n^{n+1/2} e^{1−n}. (This can be developed into a proof of Stirling’s formula, but again more accurate versions are given by other methods.)
We now present a rather different result on integrals of convex functions. It is essentially
an integrated form of 2.6.
4.4 PROPOSITION. Suppose that a1, a2, a3 and b1, b2, b3 are real numbers such that

a1 < b1 ≤ b2 < a2 ≤ a3 < b3 (10)

and that p, q, r are positive numbers such that

pa1 + qa2 + ra3 = pb1 + qb2 + rb3. (11)

Suppose that f is convex on [a1, b3] and that either

pa1² + qa2² + ra3² = pb1² + qb2² + rb3² (12)

or that f is also increasing and

pa1² + qa2² + ra3² ≤ pb1² + qb2² + rb3². (13)

Then

q ∫_{b2}^{a2} f ≤ p ∫_{a1}^{b1} f + r ∫_{a3}^{b3} f. (14)
Consequently, if g is a function such that g′ is convex on [a1, b3], and either (12) holds, or (13) holds and g is also convex, then

pg(a1) + qg(a2) + rg(a3) ≤ pg(b1) + qg(b2) + rg(b3). (15)

Proof. First, assume (12). Conditions (11) and (12) can be rewritten as

q(a2 − b2) = p(b1 − a1) + r(b3 − a3),
q(a2² − b2²) = p(b1² − a1²) + r(b3² − a3²).

These identities equate, respectively, to the statements that

q ∫_{b2}^{a2} h = p ∫_{a1}^{b1} h + r ∫_{a3}^{b3} h (16)

for h(x) = 1 and for h(x) = x, and hence for all linear h(x) = mx + n. Now take h to be the linear function agreeing with f at b2 and a2. By 2.6, we have f ≤ h on [b2, a2], while f ≥ h on [a1, b1] and [a3, b3]. Inequality (14) follows.

Now assume that f is increasing and (13) holds. Then equality is replaced by ≤ in (16) for h(x) = x, hence also for h(x) = mx + n with m ≥ 0. This condition is satisfied by the linear function agreeing with f at b2 and a2, since f(b2) ≤ f(a2). Inequality (14) follows as before.

We now apply this with f = g′. If g is convex, then g′ is increasing. So under either set of conditions, we deduce (14): it now says

q[g(a2) − g(b2)] ≤ p[g(b1) − g(a1)] + r[g(b3) − g(a3)],

which equates to (15). □
Of course, if f is strictly convex, then strict inequality holds in (14).
Note that if aj, bj satisfy (11) and (12) or (11) and (13), then so do the numbers aj + c, bj + c for any c.
The special case p = q = r = 1 is already of interest. It says: given (10), if ∑_{j=1}^3 aj = ∑_{j=1}^3 bj and ∑_{j=1}^3 aj² = ∑_{j=1}^3 bj², then ∑_{j=1}^3 g(aj) ≤ ∑_{j=1}^3 g(bj) when g′ is convex, and the opposite when g′ is concave. So, for instance, ∑_{j=1}^3 aj^p < ∑_{j=1}^3 bj^p (strict inequality) for all p > 2, and the reverse for 1 < p < 2. There are plenty of integer triples that satisfy these conditions, for example (aj) = (1, 4, 4), (bj) = (2, 2, 5) and (aj) = (1, 5, 6), (bj) = (2, 3, 7). (A systematic description of such pairs of triples would be interesting, but we will not embark upon it here.) A completely different route to results of this sort, but restricted to the functions x^p, is by a generalisation of Descartes’ rule of signs: see [Jam1, Example 3].
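The two integer triples can be checked directly, together with the predicted comparison of p-th powers (Python sketch):

```python
# Pairs (a, b) with equal sums and equal sums of squares, interlaced as in (10):
# then sum a_j^p < sum b_j^p for p > 2, and the reverse for 1 < p < 2.
pairs = [((1, 4, 4), (2, 2, 5)), ((1, 5, 6), (2, 3, 7))]
for a, b in pairs:
    assert sum(a) == sum(b)
    assert sum(x * x for x in a) == sum(x * x for x in b)
    for p in [2.5, 3.0, 4.0, 10.0]:
        assert sum(x ** p for x in a) < sum(x ** p for x in b)
    for p in [1.2, 1.5, 1.8]:
        assert sum(x ** p for x in a) > sum(x ** p for x in b)
```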
An application of Proposition 4.4 is monotonicity of the mid-point and trapezium
approximations to integrals for convex functions. This was originally proved in [BJ], by a
rather intricate method. A simpler proof is given in [Jam3].
5. Algebra of convex functions
The properties (E1), (E4), (E5), (E6) listed in section 1 gave some elementary facts
about the derivation of new convex functions from given ones. We now present some less
obvious results of this type.
5.1. Let f , g be functions that are non-negative and convex on an interval I, either
both increasing or both decreasing. Then fg is convex on I.
Proof. Let x1 < x2. Let ax + b and cx + d be the linear functions agreeing with f and g respectively at x1 and x2. If f and g are increasing, then a and c are non-negative, while if f and g are decreasing, then a and c are non-positive: in either case, ac ≥ 0. For x1 < x < x2, we have f(x)g(x) ≤ h(x), where

h(x) = (ax + b)(cx + d) = acx² + (bc + ad)x + bd.

Since ac ≥ 0, h is convex, so for x1 < x < x2, we have h(x) ≤ H(x), where H is the linear function agreeing with h (hence also with fg) at x1 and x2. □
Example. Let f(x) = 1/x and g(x) = x^{3/2}. Both are convex, but f is decreasing and g is increasing. Then f(x)g(x) = x^{1/2}, which is concave.
5.2. Suppose that f is convex on I, and g is convex and increasing on the interval
f(I). Then the composition g ◦ f is convex on I.
Proof. Let xλ = (1 − λ)x1 + λx2. Then f(xλ) ≤ (1 − λ)f(x1) + λf(x2). Since g is increasing and convex,

g[f(xλ)] ≤ g[(1 − λ)f(x1) + λf(x2)] ≤ (1 − λ)g[f(x1)] + λg[f(x2)]. □
To see that this result does not hold without the condition that g is increasing, we only
need to observe that if g(x) = −x, then (g ◦ f)(x) = −f(x).
5.3 COROLLARY. If f is convex and non-negative on I and p > 1, then f(x)^p is convex on I. □
5.4 COROLLARY. If g is convex and increasing on [a, b], where 0 < a < b, then g(1/x) is convex on [1/b, 1/a]. □
Example. Let g(x) = 1/(x + 1). Then g is convex, but decreasing, for x > 0, and g(1/x) = 1 − 1/(x + 1), which is concave.
Note that for positive, convex f, one can have 1/f(x) convex or concave, as shown by x^{−2} and x^{−1/2}. An increasing, convex function with 1/f(x) concave is x² + 1 on [0, 1/2].
The inverse of the convex function ex is the concave function log x. This is a special
case of the following result:
5.5. Let f be strictly increasing and convex on [a, b], with inverse function g. Then g is concave on [f(a), f(b)]. If f is strictly decreasing (and convex), then g is convex.

Proof. Take points x1, x2 in [a, b] with x1 < x2. Let 0 < λ < 1 and put xλ = (1 − λ)x1 + λx2. Let f(x1) = y1, f(x2) = y2 and yλ = (1 − λ)y1 + λy2. Since f is convex, f(xλ) ≤ yλ. Since g is increasing,

g(yλ) ≥ g[f(xλ)] = xλ = (1 − λ)g(y1) + λg(y2).

The opposite applies if f, hence also g, is decreasing. □
Log-convexity. We say that a strictly positive function f is log-convex on I if log f(x) is convex. Clearly, this is equivalent to

f(xλ) ≤ f(x1)^{1−λ} f(x2)^λ,

where xλ = (1 − λ)x1 + λx2. For a differentiable function, it is equivalent to f′(x)/f(x) being increasing. By 2.11, it implies that f(x + c)/f(x) increases with x for c > 0.
By 5.2, a log-convex function is convex: if h(x) is convex, then so is e^{h(x)}.
Of course, we say that f(x) is log-concave if log f(x) is concave: this is equivalent to 1/f(x) being log-convex.
Clearly, e^{ax} is both log-convex and log-concave for any a. Also, x^p is log-concave for p > 0 and log-convex for p < 0.
Obviously, if f and g are log-convex on I, then so is fg.
5.6. If f and g are log-convex on I, then so is f + g.
Proof. Let xλ = (1 − λ)x1 + λx2. Write f(xj) = fj and g(xj) = gj for j = 1, 2. Then

f(xλ) ≤ f1^{1−λ} f2^λ,   g(xλ) ≤ g1^{1−λ} g2^λ.

By Hölder’s inequality in the form (7),

f1^{1−λ} f2^λ + g1^{1−λ} g2^λ ≤ (f1 + g1)^{1−λ} (f2 + g2)^λ.

Hence f + g is log-convex. □
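A grid test of 5.6 on a concrete pair (Python sketch; `log_convex_ok` is my own helper):

```python
import math

def log_convex_ok(f, x1, x2, steps=20, tol=1e-9):
    """Check f((1-lam)x1 + lam x2) <= f(x1)^(1-lam) f(x2)^lam on a grid."""
    for i in range(steps + 1):
        lam = i / steps
        x = (1 - lam) * x1 + lam * x2
        if f(x) > f(x1) ** (1 - lam) * f(x2) ** lam + tol:
            return False
    return True

f = lambda x: math.exp(x * x)   # log f = x^2, convex
g = lambda x: math.exp(-x)      # log g = -x, linear
h = lambda x: f(x) + g(x)

for (x1, x2) in [(-2.0, 1.0), (0.0, 3.0), (-1.0, -0.5)]:
    assert log_convex_ok(f, x1, x2)
    assert log_convex_ok(g, x1, x2)
    assert log_convex_ok(h, x1, x2)   # 5.6: the sum is again log-convex
```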
There is no corresponding statement for log-concavity, as the following example shows.
Example. Let f(x) = e^x and g(x) = 1. Then f and g are log-concave (as well as log-convex). Let h(x) = e^x + 1. Then h′(x)/h(x) = e^x/(e^x + 1), which is strictly increasing for all x, so h is log-convex but not log-concave.
Example: the gamma function. The gamma function Γ(x) is log-convex. This is proved most easily from Euler’s limit definition: Γ(x) = lim_{n→∞} Γn(x), where

Γn(x) = n^x n! / [x(x + 1) · · · (x + n)],

so that

log Γn(x) = x log n + log(n!) − ∑_{r=0}^n log(x + r).

Each term on the right is convex in x (the first two are linear, and − log(x + r) has second derivative 1/(x + r)² > 0). Hence log Γn(x), hence also log Γ(x), is convex. A famous result, the Bohr-Mollerup theorem, states that conversely, Γ(x) is the unique function f(x) on R+ that is log-convex and satisfies f(x + 1) = xf(x) and f(1) = 1.
6. Monotonic averages
Here we outline a pair of theorems first presented in [BJ], with some applications. For a function f on the interval [0, 1], we define

An(f) = [1/(n − 1)] ∑_{r=1}^{n−1} f(r/n)   (n ≥ 2),

Bn(f) = [1/(n + 1)] ∑_{r=0}^n f(r/n)   (n ≥ 1).

These are, respectively, the averages of the values f(r/n) excluding and including the end points. For An(f), we do not need f to be defined at 0 and 1. If f is continuous on [0, 1], then both An(f) and Bn(f) tend to ∫_0^1 f as n → ∞. We show that for convex functions, they do so in a monotonic way.
6.1 PROPOSITION. If f is convex on (0, 1), then An(f) increases with n.

Proof. Let n ≥ 2. For 1 ≤ r ≤ n − 1, the point r/n lies between r/(n + 1) and (r + 1)/(n + 1). More exactly,

r/n = [(n − r)/n] · [r/(n + 1)] + (r/n) · [(r + 1)/(n + 1)].

Write f[r/(n + 1)] = fr. Since f is convex,

f(r/n) ≤ [(n − r)/n] fr + (r/n) f_{r+1}.

Hence

∑_{r=1}^{n−1} f(r/n) ≤ [(n − 1)/n] f1 + (1/n) f2 + [(n − 2)/n] f2 + (2/n) f3 + · · · + (1/n) f_{n−1} + [(n − 1)/n] fn = [(n − 1)/n] ∑_{r=1}^n fr,

which says that An(f) ≤ A_{n+1}(f). □
6.2 PROPOSITION. If f is convex on [0, 1], then Bn(f) decreases with n.

Proof. Let n ≥ 2. This time, we use the fact that for 1 ≤ r ≤ n − 1,

r/n = (r/n) · [(r − 1)/(n − 1)] + [(n − r)/n] · [r/(n − 1)].

Write f[r/(n − 1)] = gr. By convexity of f, for r as above,

f(r/n) ≤ (r/n) g_{r−1} + [(n − r)/n] gr,

also f(0/n) = g0 and f(n/n) = g_{n−1}. Hence

∑_{r=0}^n f(r/n) ≤ (n/n) g0 + (1/n) g0 + [(n − 1)/n] g1 + (2/n) g1 + · · · + (1/n) g_{n−1} + (n/n) g_{n−1} = [(n + 1)/n] ∑_{r=0}^{n−1} gr,

which says that Bn(f) ≤ B_{n−1}(f). □
Of course, the opposite statements hold if f is concave. If f is linear, it follows that An(f) and Bn(f) are constant. This is easily verified directly: if f(x) = x, then An(f) = Bn(f) = 1/2 for all n.
Applied to x^p, these results give the following (we confine ourselves to the case p > 0):

6.3. Let Sn(p) = ∑_{r=1}^n r^p and

cn(p) = Sn(p)/[n(n + 1)^p],   dn(p) = Sn(p)/[n^p(n + 1)].

Then cn(p) increases with n for p ≥ 1, and decreases for 0 < p ≤ 1. Meanwhile, dn(p) decreases with n for p ≥ 1, and increases for 0 < p ≤ 1.

(The reversals at p = 1 reflect the fact that cn(1) and dn(1) have the constant value 1/2.)

Proof. With f(x) = x^p, we have

A_{n+1}(f) = (1/n) ∑_{r=1}^n r^p/(n + 1)^p = cn(p),

Bn(f) = [1/(n + 1)] ∑_{r=0}^n r^p/n^p = dn(p). □
By simple integral estimation,

n^{p+1}/(p + 1) ≤ Sn(p) ≤ (n + 1)^{p+1}/(p + 1),

hence cn(p) and dn(p) tend to 1/(p + 1) as n → ∞. Since the terms of an increasing sequence are not greater than the limit (and those of a decreasing sequence not less), we can deduce the following stronger estimate:

6.4 COROLLARY. For p ≥ 1, we have

[1/(p + 1)] n^p(n + 1) ≤ Sn(p) ≤ [1/(p + 1)] n(n + 1)^p.

The reverse inequalities hold when 0 < p ≤ 1. □
6.5 COROLLARY. Let un(p) = Sn(p)/n^{p+1} and vn(p) = Sn(p)/(n + 1)^{p+1}. Then for all p > 0, un(p) decreases with n and vn(p) increases.

Proof. We have un(p) = (1 + 1/n) dn(p), hence is decreasing for p ≥ 1. Also, un(p) = (1 + 1/n)^p cn(p), hence is decreasing for 0 < p ≤ 1. Similarly, vn(p) = [n/(n + 1)] cn(p) = [n/(n + 1)]^p dn(p). □

Note that unlike cn(p) and dn(p), these ratios fail to be constant when p = 1.
We now apply Proposition 6.1 to the concave function log x.
6.6. The expression $\frac{1}{n+1}(n!)^{1/n}$ decreases with $n$.

Proof. Let $f(x) = \log x$. Since $f$ is concave, $A_{n+1}(f)$ decreases with $n$, by 6.1. But
$$A_{n+1}(f) = \frac{1}{n}\sum_{r=1}^{n}\bigl(\log r - \log(n+1)\bigr) = \frac{1}{n}\log(n!) - \log(n+1),$$
and the stated expression is obtained by applying the increasing function $\exp$. □
It was shown in [MS] that $\frac{1}{n}(n!)^{1/n}$ decreases; this statement is weaker than ours, because their expression equates to ours multiplied by the decreasing factor $1 + \frac{1}{n}$.
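Result 6.6 can be confirmed numerically; the sketch below (an illustration, with our own helper name `a`) works in log space via `math.lgamma`, since $\log(n!) = \operatorname{lgamma}(n+1)$, to avoid overflowing the factorial.

```python
import math

# Illustration of 6.6: a_n = (n!)^{1/n} / (n+1) decreases with n.
# Computed as exp(log(n!)/n) / (n+1), using lgamma(n+1) = log(n!).

def a(n):
    return math.exp(math.lgamma(n + 1) / n) / (n + 1)

vals = [a(n) for n in range(1, 200)]
assert all(s > t for s, t in zip(vals, vals[1:]))
```

The sequence starts at $a_1 = \frac{1}{2}$, then $a_2 = \frac{\sqrt{2}}{3} \approx 0.471$, and continues strictly downwards.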
Propositions 6.1 and 6.2 can be generalised in various ways. One way is to consider weighted averages of the form $B_n(W, f) = \sum_{r=0}^{n} w_{n,r} f(\frac{r}{n})$, where $w_{n,r} \ge 0$ and $\sum_{r=0}^{n} w_{n,r} = 1$ for all $n$. This is explored in [Jam2]. For example, it is shown that $B_n(W, f)$ decreases with $n$ for convex $f$ when $W$ is any Hausdorff mean matrix: in particular, this applies to the Euler matrix defined by $w_{n,r} = \binom{n}{r} x^r (1-x)^{n-r}$ for a chosen $x$ in $(0, 1)$. Another generalisation is Bennett's concept of "meaningful" sequences [Benn1].
7. Majorisation
The result outlined here, known as the “majorisation principle”, seems to have been
first formulated by Hardy, Littlewood and Pólya in 1929 [HLP1]. It was rediscovered by
Karamata in 1932 [Kar], and it has also been called “Karamata’s inequality”.
The result concerns sums of the form $\sum_{j=1}^{n} f(x_j)$. For this purpose, we may assume the $x_j$ arranged in decreasing order.

Given a sequence $x = (x_j)$ (finite or infinite), we write $X_k = \sum_{j=1}^{k} x_j$ (and similarly $Y_k$ for a second sequence $y$). For now, let $x$, $y$ be decreasing elements of $\mathbb{R}^n$. If $Y_k \le X_k$ for each $k$, we write $y \le_S x$. If also $Y_n = X_n$, we write $y \le_M x$, and say that $y$ is "majorised" by $x$. (This is not standard notation; in fact, no notation is firmly established for these relations.)

Examples: $(5, 4, 2) \le_M (7, 3, 1)$; $(6, 5, 5, 3) \le_M (9, 4, 4, 2)$.
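The partial-sum conditions are straightforward to encode; a small Python sketch (the helper names `partial_sums` and `majorised` are ours) confirming the two examples:

```python
# y <=_S x : each partial sum of y is at most the corresponding partial sum of x
#            (both sequences assumed decreasing);
# y <=_M x : additionally, the total sums are equal.

def partial_sums(seq):
    out, total = [], 0
    for v in seq:
        total += v
        out.append(total)
    return out

def majorised(y, x):
    """Test y <=_M x for decreasing sequences of equal length."""
    Y, X = partial_sums(y), partial_sums(x)
    return all(a <= b for a, b in zip(Y, X)) and Y[-1] == X[-1]

assert majorised([5, 4, 2], [7, 3, 1])
assert majorised([6, 5, 5, 3], [9, 4, 4, 2])
assert not majorised([7, 3, 1], [5, 4, 2])   # the relation is not symmetric
```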
Recall the well-known Abel summation formula for finite sums:
$$\sum_{j=1}^{n} a_j x_j = a_1 X_1 + \sum_{j=2}^{n} a_j(X_j - X_{j-1}) = \sum_{j=1}^{n-1} (a_j - a_{j+1})X_j + a_n X_n. \tag{17}$$
The following Lemma is an obvious consequence:

7.1 LEMMA. Suppose that $x$, $y$ are elements of $\mathbb{R}^n$ with $y \le_S x$. Suppose also that $(a_j)$ is a decreasing element of $\mathbb{R}^n$, and that either $Y_n = X_n$ or $a_n \ge 0$. Then
$$\sum_{j=1}^{n} a_j y_j \le \sum_{j=1}^{n} a_j x_j.$$

Proof. Apply (17) to $(x_j)$ and $(y_j)$ in turn and compare. □
The majorisation principle follows easily by combining this with 2.1:

7.2 PROPOSITION (the majorisation principle). Let $x$, $y$ be decreasing elements of $\mathbb{R}^n$ with $y \le_S x$. Suppose that the function $f$ is convex on an interval $I$ containing all $x_j$ and $y_j$, and that

either (i) $Y_n = X_n$ (so that $y \le_M x$),
or (ii) $f$ is increasing on $I$.

Then
$$\sum_{j=1}^{n} f(y_j) \le \sum_{j=1}^{n} f(x_j). \tag{18}$$

Proof. Assume first that $f$ is differentiable. Then by 2.1,
$$f(x_j) - f(y_j) \ge (x_j - y_j)f'(y_j)$$
for each $j$. Since $f$ is convex, $f'(t)$ increases with $t$, so $f'(y_j)$ decreases with $j$. Under either hypothesis, Lemma 7.1 shows that $\sum_{j=1}^{n} (x_j - y_j)f'(y_j) \ge 0$.

If $f$ is not differentiable at some $y_j$, we consider the right-derivative, using 2.14 instead of 2.1. The reasoning is the same. □
Of course, the reverse of (18) holds if $f$ is concave and either $Y_n = X_n$ or $f$ is decreasing. Strict inequality holds if $f$ is strictly convex and $y_j \ne x_j$ for some $j$. The case $n = 2$ is 2.8.
Note that under condition (ii), it follows further that $\sum_{j=1}^{k} f(y_j) \le \sum_{j=1}^{k} f(x_j)$ for each $k \le n$, more closely reflecting the hypothesis $Y_k \le X_k$ for each $k$.
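The principle is easy to observe numerically. This Python sketch (illustration only; the chosen vectors and convex test functions are ours) checks (18) for the earlier example $(5,4,2) \le_M (7,3,1)$:

```python
import math

# Illustration of Proposition 7.2 with y = (5, 4, 2) majorised by x = (7, 3, 1):
# for any convex f on (0, inf), sum f(y_j) <= sum f(x_j).

x, y = [7, 3, 1], [5, 4, 2]
convex_functions = (
    lambda t: t * t,              # t^2
    math.exp,                     # e^t
    lambda t: t * math.log(t),    # t log t, convex on (0, inf)
)
for f in convex_functions:
    assert sum(f(t) for t in y) <= sum(f(t) for t in x)
```

With $f(t) = t^2$, for instance, this reads $25 + 16 + 4 = 45 \le 49 + 9 + 1 = 59$.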
General sequences. For sequences that are not decreasing, 7.2 can be restated as follows (and often is in the literature). Let $x^*$ be the vector consisting of the terms $x_j$ arranged in decreasing order (the "decreasing rearrangement" of $x$). Of course, the sum $\sum_{j=1}^{n} f(x_j)$ is unchanged by rearrangement. So 7.2 says that (18) holds if $y^* \le_M x^*$, or if $y^* \le_S x^*$ and $f$ is increasing. The established terminology is that $y$ is "majorised" by $x$ if $y^* \le_M x^*$.
Example. Let $(x_j)$ be decreasing and $\overline{x} = X_n/n$. Let $y_j = \overline{x}$ for $1 \le j \le n$. It is elementary that $X_k/k$ (the sequence of averages) decreases with $k$, hence for $k \le n$, we have $X_k/k \ge \overline{x}$. So $Y_k = k\overline{x} \le X_k$, hence $y \le_S x$. By 7.2, $nf(\overline{x}) \le \sum_{j=1}^{n} f(x_j)$, or $f(\overline{x}) \le \frac{1}{n}\sum_{j=1}^{n} f(x_j)$. This is Jensen's inequality for equally weighted elements.
We record some applications. Applied to $f(x) = x^p$, the result becomes:

7.3. Let $x$, $y$ be decreasing, non-negative elements of $\mathbb{R}^n$ with $y \le_S x$. Then:

(i) if $p \ge 1$, then $\sum_{j=1}^{k} y_j^p \le \sum_{j=1}^{k} x_j^p$ for each $k \le n$;

(ii) if $0 < p < 1$ and also $Y_n = X_n$, then $\sum_{j=1}^{n} y_j^p \ge \sum_{j=1}^{n} x_j^p$. □
Clearly, (i) also extends to infinite sequences. There is no question of (ii) applying without the condition $Y_n = X_n$, since this would allow each $y_j$ to be arbitrarily small.
Using the logarithmic and exponential functions, we can relate sums to products.
7.4. Let $x$, $y$ be decreasing, positive elements of $\mathbb{R}^n$. If $y \le_M x$, then $y_1 y_2 \cdots y_n \ge x_1 x_2 \cdots x_n$.

Proof. This follows from 7.2 with $f(t) = \log t$: since $f$ is concave, (18) is reversed. □

With the Example above, this gives $\overline{x}^{\,n} \ge x_1 x_2 \cdots x_n$; in other words, the arithmetic mean $\overline{x}$ is not less than the geometric mean.
Note: It is quite possible to have $\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j$ and $x_1 x_2 \cdots x_n = y_1 y_2 \cdots y_n$, for example $(12, 5, 4)$ and $(10, 8, 3)$. What 7.4 tells us is that this cannot happen with either $y \le_S x$ or $x \le_S y$.
In the opposite direction, we can prove the following result.
7.5. Let $x$, $y$ be decreasing, positive elements of $\mathbb{R}^n$. If
$$y_1 y_2 \cdots y_k \le x_1 x_2 \cdots x_k$$
for each $k \le n$, then for any $p > 0$, we have $\sum_{j=1}^{k} y_j^p \le \sum_{j=1}^{k} x_j^p$ for each $k \le n$.

Proof. Then $\sum_{j=1}^{k} \log y_j \le \sum_{j=1}^{k} \log x_j$ for each $k \le n$. Let $f(t) = e^{pt}$. Then $f$ is convex and increasing, and $f(\log y_j) = y_j^p$. The conclusion follows, by (18). □
In particular, the hypothesis in 7.5 implies that $y \le_S x$. The stated inequality then follows from 7.3 for $p \ge 1$, but not for $0 < p < 1$.

Example. By 7.5, we have $4^p + 3^p \le 6^p + 2^p$ for all $p > 0$.
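The hypothesis here is that $x = (6, 2)$, $y = (4, 3)$ satisfy $4 \le 6$ and $4 \cdot 3 = 6 \cdot 2$, so 7.5 applies for every $p > 0$. A quick numerical confirmation on a grid of exponents (illustration only):

```python
# Check the 7.5 example: 4^p + 3^p <= 6^p + 2^p for all p > 0,
# sampled on p = 0.1, 0.2, ..., 10.0.

ps = [k / 10 for k in range(1, 101)]
assert all(4 ** p + 3 ** p <= 6 ** p + 2 ** p for p in ps)
```

Note that at $p = 1$ the inequality is $7 \le 8$, strict even though the partial products tie at $k = 2$.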
We describe a further application involving $x^p$, in which there is no longer any assumption about partial sums.
7.6. Let $x$, $y$ be non-negative elements of $\mathbb{R}^n$, both decreasing or both increasing. If $p > 1$, then
$$\left(\sum_{j=1}^{n} x_j^p y_j^p\right)\left(\sum_{j=1}^{n} x_j\right)^p \ge \left(\sum_{j=1}^{n} x_j y_j\right)^p \left(\sum_{j=1}^{n} x_j^p\right). \tag{19}$$
The reverse inequality holds if $0 < p < 1$.
Proof. We prove the statement for the case where $(x_j)$ and $(y_j)$ are decreasing. The case where they are increasing then follows by considering $(x_n, \ldots, x_2, x_1)$ and $(y_n, \ldots, y_2, y_1)$.

Write $z_j = x_j y_j$ and $Z_n = cX_n$. We show that $cX_k \le Z_k$ for $k \le n$. Applying 7.2 to the decreasing sequences $(cx_j)$ and $(z_j)$, it then follows, for $p > 1$, that $c^p \sum_{j=1}^{n} x_j^p \le \sum_{j=1}^{n} z_j^p$, which equates to (19).

Let $Z_k = c_k X_k$, so $c_n = c$. We have to show that $c_k \ge c$. This will follow if we show that $c_{k+1} \le c_k$ for each $k < n$. Now $Z_k = \sum_{j=1}^{k} x_j y_j \ge y_k X_k$, so $c_k \ge y_k$, hence $c_k \ge y_{k+1}$. Now
$$Z_{k+1} = c_k X_k + x_{k+1} y_{k+1} \le c_k(X_k + x_{k+1}) = c_k X_{k+1},$$
hence $c_{k+1} \le c_k$, as required. □
This result can be restated neatly in terms of $\ell_p$-norms. Define $\|x\|_p$ to be $\left(\sum_{j=1}^{n} |x_j|^p\right)^{1/p}$ (note that $\|x\|_2$ is the ordinary Euclidean norm), and write $xy$ for the coordinatewise product $(x_j y_j)$. Then (19) equates to
$$\frac{\|xy\|_p}{\|xy\|_1} \ge \frac{\|x\|_p}{\|x\|_1}.$$
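The norm form is convenient for a numerical spot-check. The sketch below is an illustration only: the helper `norm` and the sample decreasing vectors are our own choices, not taken from the text.

```python
# Illustration of the norm form of (19): for decreasing non-negative x, y
# and p > 1, the ratio ||xy||_p / ||xy||_1 is at least ||x||_p / ||x||_1,
# where xy is the coordinatewise product.

def norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1 / p)

x = [5.0, 3.0, 2.0, 1.0]                 # decreasing
y = [4.0, 2.5, 2.0, 0.5]                 # decreasing
xy = [a * b for a, b in zip(x, y)]

for p in (1.5, 2.0, 3.0, 5.0):
    assert norm(xy, p) / norm(xy, 1) >= norm(x, p) / norm(x, 1)
```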
We now establish a converse to 7.2, essentially showing that the property stated there characterises the majorisation relation.
7.7. Let $x$, $y$ be decreasing elements of $\mathbb{R}^n$. If (18) holds for all increasing, convex $f$, then $y \le_S x$. If (18) holds for all convex $f$, then $y \le_M x$.

Proof. Suppose first that (18) holds for all convex $f$. Then it holds, in particular, for $f(t) = \pm t$. This implies at once that $Y_n = X_n$.

Now suppose that (18) holds for increasing, convex $f$. Choose $k \le n$, and let
$$f(t) = (t - x_k)^+ = \begin{cases} t - x_k & \text{if } t \ge x_k, \\ 0 & \text{if } t < x_k. \end{cases}$$
Then $f$ is convex and increasing, and
$$\sum_{j=1}^{n} f(x_j) = \sum_{j=1}^{k} (x_j - x_k) = X_k - kx_k.$$
Also, since $f(t)$ is not less than both $t - x_k$ and $0$ for all $t$, we have
$$\sum_{j=1}^{n} f(y_j) \ge \sum_{j=1}^{k} (y_j - x_k) = Y_k - kx_k,$$
hence $Y_k \le X_k$. □
Finally, we formulate a continuous version of majorisation, in which integrals replace
discrete sums. The proof is analogous, but Abel summation is replaced by integration by
parts.
7.8. Let $x$, $y$ be decreasing, differentiable functions on $[a, b]$, with values in an interval $I$. Write $X(t) = \int_a^t x(s)\,ds$, similarly $Y(t)$. Suppose that $Y(t) \le X(t)$ for $a \le t \le b$. Let $f$ be a function that is convex and twice differentiable on $I$. Suppose further that either $Y(b) = X(b)$ or $f$ is increasing. Then
$$\int_a^b f[y(t)]\,dt \le \int_a^b f[x(t)]\,dt.$$

Proof. By 2.1,
$$f[x(t)] - f[y(t)] \ge [x(t) - y(t)]f'[y(t)].$$
Integrate by parts:
$$\int_a^b [x(t) - y(t)]f'[y(t)]\,dt = \Big[[X(t) - Y(t)]f'[y(t)]\Big]_a^b - \int_a^b [X(t) - Y(t)]f''[y(t)]y'(t)\,dt$$
$$= [X(b) - Y(b)]f'[y(b)] - \int_a^b [X(t) - Y(t)]f''[y(t)]y'(t)\,dt.$$
Under either of the alternative hypotheses, the first term is non-negative. Since $f''[y(t)] \ge 0$ and $y'(t) \le 0$, the second term is non-negative. □
One can derive applications analogous to 7.3 and 7.6.
There is a substantial body of further theory concerning majorisation. See, for example, [HLP2], [MO], [Benn2]. Here we just mention without proof a purely algebraic characterisation [HLP2, 46–49]. An $n \times n$ matrix $P = (p_{j,k})$ is doubly stochastic if the entries are non-negative and all row sums and all column sums equal 1. The statement $y^* \le_M x^*$ is equivalent to the existence of a doubly stochastic matrix $P$ (not necessarily unique) such that $y = Px$. It is easy to see that this condition implies (18), and hence $y \le_M x$, by 7.7. Since $y_j = \sum_{k=1}^{n} p_{j,k} x_k$, Jensen's inequality gives $f(y_j) \le \sum_{k=1}^{n} p_{j,k} f(x_k)$. Summation over $j$ then gives $\sum_{j=1}^{n} f(y_j) \le \sum_{k=1}^{n} f(x_k)$.
References

[Benn1] Grahame Bennett, Meaningful sequences, Houston J. Math. 33 (2007), 555–580.

[Benn2] Grahame Bennett, Some forms of majorization, Houston J. Math. 36 (2010), 1037–1066.

[BJ] Grahame Bennett and Graham Jameson, Monotonic averages of convex functions, J. Math. Anal. Appl. 252 (2000), 410–430.

[HLP1] G. H. Hardy, J. Littlewood and G. Pólya, Some simple inequalities satisfied by convex functions, Messenger Math. 58 (1929), 145–152.

[HLP2] G. H. Hardy, J. Littlewood and G. Pólya, Inequalities, 2nd ed., Cambridge Univ. Press (1967).

[Jam1] G. J. O. Jameson, Counting zeros of generalised polynomials, Math. Gazette 90 (2006), 223–234.

[Jam2] G. J. O. Jameson, Monotonicity of weighted averages of convex functions, Math. Ineq. Appl. 23 (2020), 425–432.

[Jam3] G. J. O. Jameson, Monotonicity of the mid-point and trapezium estimates for integrals, Math. Gazette 105 (2021), to appear.

[Kar] J. Karamata, Sur une inégalité relative aux fonctions convexes, Publ. Math. Univ. Belgrade 1 (1932), 145–148.

[MO] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, New York (1979).

[MS] H. Minc and L. Sathre, Some inequalities concerning $(r!)^{1/r}$, Proc. Edinburgh Math. Soc. 14 (1964), 41–46.
updated 3 November 2020