
Notes on convex functions

    G.J.O. Jameson

    Contents

    1. The definition and examples.

    2. Elementary results.

    3. Jensen’s inequality; application to means.

    4. Inequalities for integrals; applications to discrete sums.

    5. Algebra of convex functions.

    6. Monotonic averages.

    7. Majorisation.

    1. The definition and examples.

    Let I be a real interval, open or closed, bounded

    or unbounded (so possibly the whole real line). A real-

    valued function f is said to be convex on I if it lies below

    (or on) the straight-line chords between pairs of points

    of its graph. In other words, if x1, x2 are points of I with

    x1 < x2, and ax+ b is the linear function agreeing with

    f(x) at x1 and x2, then f(x) ≤ ax+ b for x1 ≤ x ≤ x2.

    We say that f is strictly convex on I if for all x1, x2 as above, we have f(x) < ax + b

for x1 < x < x2. We also say that f is concave on I if −f is convex, so that f(x) ≥ ax + b for such x.

    Of course, a and b could be written out explicitly, but this is not particularly helpful.

    However, an important equivalent way of stating the definition is as follows. All points x of

    the interval [x1, x2] are expressible in the form

    xλ = x1 + λ(x2 − x1) = (1− λ)x1 + λx2. (1)

    for some λ in [0, 1]. We then have

    g(xλ) = (1− λ)g(x1) + λg(x2)

    for g(x) = 1 and for g(x) = x, hence for g(x) = ax+b as above. So the definition of convexity

    equates to

    f(xλ) ≤ (1− λ)f(x1) + λf(x2) (2)


for 0 ≤ λ ≤ 1, with xλ as above.
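The chord condition (2) is easy to test numerically. The following sketch (ours, not part of the notes; the function names are our own) samples λ on a grid and checks (2) for x^2, which is convex, and for x^3 on an interval containing negative numbers, where it is not:

```python
# Illustrative sketch: test the chord condition (2) on a grid of lambda values.
def satisfies_chord_condition(f, x1, x2, steps=100):
    for k in range(steps + 1):
        lam = k / steps
        x_lam = (1 - lam) * x1 + lam * x2
        # condition (2): f(x_lam) <= (1 - lam) f(x1) + lam f(x2)
        if f(x_lam) > (1 - lam) * f(x1) + lam * f(x2) + 1e-12:
            return False
    return True

ok_square = satisfies_chord_condition(lambda x: x * x, -3.0, 2.0)
ok_cube = satisfies_chord_condition(lambda x: x ** 3, -3.0, 2.0)
```

On [−3, 2] the cubic lies above its chord at negative points, so the second check fails, as expected.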

Another equivalent way to express (1) and (2) is: given that xλ − x1 = λ(x2 − x1), we have f(xλ) − f(x1) ≤ λ[f(x2) − f(x1)].

    We list some immediate facts:

    (E1) If c > 0 and f(x) is convex on I, then so is cf(x).

    (E2) If f and g are convex on I, then so is f + g.

    (E3) A linear function ax+ b is both convex and concave.

    (E4) If f is convex on I, then f(x+ c) is convex on {x : x+ c ∈ I}.

    (E5) If f is convex on I, then for any non-zero c (positive or negative), f(cx) is

    convex on {x : cx ∈ I}.

(E6) If fn is convex on I for each n ≥ 1 and lim_{n→∞} fn(x) = f(x) for each x ∈ I, then f is convex on I.

For example, by (E2) and (E5), if f is convex on [−R,R], then so are f(−x) and f(x) + f(−x).

    If a function has increasing derivative, then it is “curving upwards”, and it seems

    almost obvious that it must be convex. With the help of the mean-value theorem, this is

    easily proved:

    1.1 PROPOSITION. If the derivative f ′ exists and is increasing on I, then f is convex

    on I.

    Proof. Let x1 < x2, and let ax+ b be the linear function agreeing with f at x1 and x2.

Let g(x) = f(x) − ax − b. Then g(x1) = g(x2) = 0, and g′(x) = f′(x) − a, so g′ is increasing on I. Suppose that g(x) > 0 for some x in (x1, x2). By the mean-value theorem, there exists ξ1 in (x1, x) such that g(x) − g(x1) = g(x) = (x − x1)g′(ξ1), hence g′(ξ1) > 0. Similarly, there exists ξ2 in (x, x2) such that g(x2) − g(x) = −g(x) = (x2 − x)g′(ξ2), hence g′(ξ2) < 0. Since ξ1 < ξ2, this contradicts the fact that g′ is increasing. So g(x) ≤ 0, hence f(x) ≤ ax + b, for all x in (x1, x2). �

    1.2 COROLLARY. If f ′′(x) ≥ 0 on I, then f is convex on I. �

    Of course, if f ′ is strictly increasing (which occurs if f ′′(x) > 0 on I), then f is strictly

    convex on I.

Note. For closed intervals, the following slight refinement of 1.1 applies: if f is continuous on [a, b] and f′(x) exists and is increasing on (a, b), then f is convex on [a, b]. The proof still applies, because the mean-value theorem is still valid under these assumptions.

    In practice, we usually recognise convex functions by applying 1.1 or 1.2. In particular:

1.3. The function x^p is convex on (0,∞) if p ≥ 1 or p ≤ 0, and concave if 0 ≤ p ≤ 1.

Proof. We have f′′(x) = p(p − 1)x^{p−2}, so f′′(x) ≥ 0 for all x > 0 if p(p − 1) ≥ 0, which occurs when p ≥ 1 and when p ≤ 0. The opposite occurs when 0 ≤ p ≤ 1. �

In fact, x^p is strictly convex if p > 1 or p < 0, and strictly concave if 0 < p < 1. For p ≥ 0, the statements apply on [0,∞) (for 0 < p < 1, this uses the Note above). The reader is invited to sketch x^2, x^{1/2} and x^{−1} to illustrate the three cases. Note also that x^2 is convex on the whole real line, while x^3 is concave on (−∞, 0].

    We list some further examples.

Example. For any non-zero real a (including a < 0), the function f(x) = e^{ax} is strictly convex on R, since f′′(x) = a^2 e^{ax} > 0.

Example. The function log is (strictly) concave on (0,∞), since the derivative 1/x is decreasing.

Example. The function f(x) = x log x is convex on (0,∞), since f′(x) = 1 + log x, which is increasing.

Example. Both sin x and cos x are concave on [0, π/2], since the second derivatives, −sin x and −cos x, are non-positive. Hence sin x ≥ (2/π)x and cos x ≥ 1 − (2/π)x on [0, π/2].
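A quick numerical confirmation of these chord bounds (our own sketch, not from the notes): sin lies above the chord (2/π)x, and cos lies above its chord joining (0, 1) to (π/2, 0), namely 1 − (2/π)x.

```python
import math

# Check the chord bounds for the concave functions sin and cos on [0, pi/2]:
# sin x >= (2/pi) x, and cos x >= 1 - (2/pi) x.
def chord_bounds_hold(steps=200):
    for k in range(steps + 1):
        x = (math.pi / 2) * k / steps
        if math.sin(x) < (2 / math.pi) * x - 1e-12:
            return False
        if math.cos(x) < 1 - (2 / math.pi) * x - 1e-12:
            return False
    return True

bounds_ok = chord_bounds_hold()
```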

Example. Any polynomial with non-negative coefficients is convex on [0,∞), by 1.3 and (E2).

Example. The function |x| is convex on R: this is not a consequence of 1.1, since the function is not differentiable at 0, but it is easily verified directly.

    The following sections outline a number of results about convex functions. Most of

    them are very well established, and we do not give references. Sections 6 and 7 contain some

    rather more recent results and techniques, and here some references are supplied.

Note. This provisional version contains blank spaces where certain diagrams are intended. Owing to the Coronavirus pandemic, I am currently unable to access the files for these diagrams.


2. Elementary results

First, we verify another fact that seems geometrically obvious: if a function has increasing derivative, then it lies above its tangents.

    2.1 PROPOSITION. Suppose that f ′(x) exists and is increasing on an interval I. Then

    for any x and x0 in I,

    f(x) ≥ f(x0) + (x− x0)f ′(x0) (3)

    The opposite inequality holds if f ′(x) is decreasing.

    Proof. First suppose that x > x0. By the mean-value theorem, there exists ξ in (x0, x)

such that f(x) − f(x0) = (x − x0)f′(ξ). Since ξ > x0, we have f′(ξ) ≥ f′(x0), hence (3). Now suppose that x < x0. Then there exists ξ in (x, x0) such that f(x0) − f(x) = (x0 − x)f′(ξ). We now have f′(ξ) ≤ f′(x0), hence f(x0) − f(x) ≤ (x0 − x)f′(x0), which equates to (3). �

Note. If f has a second derivative, then we have by Taylor's theorem

f(x) = f(x0) + (x − x0)f′(x0) + (1/2)(x − x0)^2 f′′(ξ)

for some ξ between x0 and x. This implies (3), and also gives an error term.

Example. Let f(x) = (1 + x)^p. Then f(0) = 1 and f′(0) = p, so by (3), for all x > −1, we have (1 + x)^p ≥ 1 + px when p ≥ 1 or p < 0, and the reverse when 0 ≤ p ≤ 1.

    Example. Let f(x) = log(1 + x). Then f is concave, f(0) = 0 and f ′(0) = 1, so

    log(1 + x) ≤ x for all x > −1.
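These two tangent-line examples can be spot-checked numerically. A sketch of ours (the choice p = 3 is our own, for illustration): (1 + x)^p ≥ 1 + px for p ≥ 1, and log(1 + x) ≤ x, both for x > −1.

```python
import math

# Spot-check the tangent-line inequalities from 2.1 on sample points in (-1, 4].
def tangent_bounds_hold(p=3.0, steps=200):
    for k in range(steps + 1):
        x = -0.99 + 5.0 * k / steps
        if (1 + x) ** p < 1 + p * x - 1e-9:
            return False
        if math.log(1 + x) > x + 1e-12:
            return False
    return True

tangent_ok = tangent_bounds_hold()
```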

We will see below that convex functions always have one-sided derivatives, and satisfy a correspondingly amended version of 2.1.

Next, we consider gradients of chords. For x ≠ y, write

mf(x, y) = [f(y) − f(x)]/(y − x).

    The following result confirms a fact that seems obvious from the diagram. It is the key to

    many further statements about convex functions.

    2.2 PROPOSITION. If f is convex on I and x1,

    x2, x3 are points of I with x1 < x2 < x3, then

    mf (x1, x2) ≤ mf (x1, x3) ≤ mf (x2, x3).

    Proof. Let x2 − x1 = λ(x3 − x1). Then, by (2),

f(x2) − f(x1) ≤ λ[f(x3) − f(x1)] = [(x2 − x1)/(x3 − x1)] [f(x3) − f(x1)],


which equates to mf(x2, x1) ≤ mf(x3, x1).

The proof that mf(x1, x3) ≤ mf(x2, x3) is similar, using x3 − x2 = (1 − λ)(x3 − x1) and f(x3) − f(x2) ≥ (1 − λ)[f(x3) − f(x1)]. �

    This simple result has numerous applications.

    2.3 COROLLARY. If f is convex on I and xj, yj are points of I with x1 < y1, x2 < y2,

    also x1 ≤ x2 and y1 ≤ y2, then mf (x1, y1) ≤ mf (x2, y2).

Proof. By 2.2, mf(x1, y1) ≤ mf(x1, y2) ≤ mf(x2, y2). �

    (Note that we do not require y1 ≤ x2 in this result.)

    2.4 COROLLARY. If f is convex on [0, R] and f(0) = 0, then f(x)/x is increasing on

    (0, R].

    Proof. Note that f(x)/x = mf (0, x). If 0 < x < y, then mf (0, x) ≤ mf (0, y). �

    Another consequence is the following converse to 2.1:

    2.5. If f is convex and differentiable on an open interval I, then f ′ is increasing on I.

Proof. Take x1, x2 in I with x1 < x2, and h > 0 small enough that x2 + h ∈ I. By 2.3, mf(x1, x1 + h) ≤ mf(x2, x2 + h). Taking the limit as h → 0, we obtain f′(x1) ≤ f′(x2). �

    The next result is a useful complement to the original definition. Again, it is geomet-

    rically obvious: picture a straight line intersecting twice with an upwardly curving line.

    2.6 PROPOSITION. Suppose that f is convex on I and x1, x2 are points of I with

x1 < x2. Let ax + b be the linear function agreeing with f at x1 and x2. Then f(x) ≥ ax + b for points x of I such that x < x1 or x > x2.

    Proof. Clearly, mf (x1, x2) = a. So if x > x2, then mf (x2, x) ≥ a, hence

    f(x) ≥ f(x2) + a(x− x2) = (ax2 + b) + a(x− x2) = ax+ b.

    Similarly for x < x1. �

Example. The function 2^x is strictly convex, since it equals e^{x log 2}. It agrees with 1 + x at x = 0 and x = 1. Hence 2^x < 1 + x for 0 < x < 1, and 2^x > 1 + x for x > 1 and for x < 0.
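A small numerical check of this example (our sketch, not from the notes), sampling a few points strictly inside (0, 1) and a few outside [0, 1]:

```python
# 2^x < 1 + x strictly inside (0, 1); 2^x > 1 + x for x > 1 and for x < 0.
inside_ok = all(2 ** x < 1 + x for x in [0.1, 0.25, 0.5, 0.75, 0.9])
outside_ok = all(2 ** x > 1 + x for x in [-2.0, -0.5, 1.5, 2.0, 5.0])
```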

    As well as being useful in itself, the next result illustrates a style of proof that is often

    effective.


2.7 PROPOSITION. Let a, b, c, d be real numbers with a < d and b, c in [a, d]. Let

    α, β, γ, δ be non-negative numbers such that

    β + γ = α + δ,

    βb+ γc = αa+ δd.

    Then for any convex function f on [a, d],

    βf(b) + γf(c) ≤ αf(a) + δf(d). (4)

    Proof. The assumptions say that βg(b) + γg(c) = αg(a) + δg(d) for g(x) = 1 and

    g(x) = x, hence for any linear function g. Take g to be the linear function agreeing with f

at a and d: then f(b) ≤ g(b) and f(c) ≤ g(c). The statement follows. (Note that it does not matter whether c is greater or less than b.) �

    Of course, the opposite inequality applies for concave f . Also, strict inequality applies

    if f is strictly convex and b, c are in (a, d).

    We record the case α = β = γ = δ = 1, which is already of interest:

    2.8 COROLLARY. Suppose that b, c are in [a, d] and b+ c = a+ d. If f is convex on

    [a, d], then f(b) + f(c) ≤ f(a) + f(d). �

This result can also be derived from 2.2: mf(a, b) ≤ mf(c, d). In turn, it implies several further statements.

2.9 COROLLARY. If f is convex on [a − R, a + R] for some a, R, then f(a + x) + f(a − x) increases with x for 0 ≤ x ≤ R. �

In particular, if f is convex on [−R,R], then f(x) + f(−x) increases with x on [0, R]. If f is also even, then f(x) increases on [0, R].

2.10 COROLLARY. If f is convex on [0,∞) and f(0) = 0, then f(x + y) ≥ f(x) + f(y) for x, y > 0 (f is "superadditive"). �

2.11 COROLLARY. If f is convex on [x0,∞) and c > 0, then f(x + c) − f(x) increases with x for x ≥ x0.

Proof. Let x < y. By 2.8, f(x + c) + f(y) ≤ f(x) + f(y + c), so f(x + c) − f(x) ≤ f(y + c) − f(y). �

    Note that 2.11 is trivial if f has increasing derivative, since then f ′(x+ c)− f ′(x) ≥ 0.


Conversely, if f(x + c) − f(x) increases with x for all c > 0 and f is differentiable, then f′(x) is increasing (consider the limit as c → 0).

    We now consider continuity and one-sided derivatives of convex functions.

    2.12 PROPOSITION. If f is convex on an open interval I, then it is continuous there.

Proof. Choose x0 ∈ I. Since I is an open interval, there exist points x1 and x2 in I with x1 < x0 < x2. Let a1x + b1 and a2x + b2 be the linear functions agreeing with f at x1, x0 and at x0, x2 respectively. By 2.2, a1 ≤ a2. By the original definition and 2.6, we have a1x + b1 ≤ f(x) ≤ a2x + b2 for x0 < x < x2, and the reverse inequalities for x1 < x < x0. Hence f(x) → f(x0) both as x → x0+ and as x → x0−. �

    However, f can be discontinuous at an end-point of a closed interval: for example, a

    convex function on [0, 1] is defined by f(x) = 0 for 0 ≤ x < 1 and f(1) = 1.

    Write (D+f)(x) and (D−f)(x) for the right and left derivatives of f at x.

    2.13. If I is an open interval and f is convex on I, then f has finite right and left

    derivatives at each point of I, and (D+f)(x) ≥ (D−f)(x) for x ∈ I.

Proof. Choose x ∈ I. For small enough h, k > 0, the points x − h and x + k are in I, and by 2.2, mf(x − h, x) ≤ mf(x, x + k). Also, mf(x, x + k) decreases as k decreases towards 0. Hence it tends to a limit L as k → 0+, and L ≥ mf(x − h, x). Similarly, mf(x − h, x) tends to a limit M as h → 0+, and M ≤ L. By definition, L = (D+f)(x) and M = (D−f)(x). �

    We can now state a version of 2.1 that applies to all convex functions, without the

    assumption of differentiability.

    2.14 PROPOSITION. If f is convex on I and x0 is an interior point of I, then

    f(x) ≥ f(x0) + (x− x0)(D+f)(x0) (5)

    for all x ∈ I.

Proof. Let (D+f)(x0) = m. First, take x > x0, and let x0 < y < x. By 2.2, mf(x0, y) ≤ mf(x0, x). But mf(x0, y) → m as y → x0+, so mf(x0, x) ≥ m. This means that f(x) − f(x0) ≥ m(x − x0), as stated.

Now consider x < x0. In the same way, mf(x, x0) ≤ (D−f)(x0) ≤ m, so f(x0) − f(x) ≤ m(x0 − x), which is equivalent to (5). �


  • 3. Jensen’s inequality; applications to means

The next result is the key to numerous applications of convex functions. It was formulated by the Danish mathematician J. L. W. V. Jensen in 1906. We give two proofs.

    3.1. THEOREM (Jensen’s inequality). Suppose that f is convex on I. Suppose that

    xj ∈ I and λj > 0 for 1 ≤ j ≤ n, and∑n

    j=1 λj = 1. Let x =∑n

    j=1 λjxj. Then

    f(x) ≤n∑j=1

    λjf(xj). (6)

    The opposite inequality holds if f is concave.

Proof 1. Induction. The case n = 2 is the definition. Assume the statement true for n. Let x̄ = ∑_{j=1}^{n+1} λjxj, where xj ∈ I (1 ≤ j ≤ n + 1) and ∑_{j=1}^{n+1} λj = 1. Let μ = ∑_{j=1}^n λj = 1 − λn+1, and

y = ∑_{j=1}^n (λj/μ)xj,

so that x̄ = μy + λn+1xn+1. By the induction hypothesis,

f(y) ≤ ∑_{j=1}^n (λj/μ)f(xj).

Hence, by (2),

f(x̄) ≤ μf(y) + λn+1f(xn+1) ≤ ∑_{j=1}^{n+1} λjf(xj). �

Proof 2. The statement is trivial if the xj are all equal, so assume they are not. Then x̄ is an interior point of I. By 2.14 (or the more elementary 2.1 for differentiable f), for all x ∈ I, f(x) − f(x̄) ≥ m(x − x̄), where m = (D+f)(x̄). So

∑_{j=1}^n λjf(xj) − f(x̄) = ∑_{j=1}^n λj[f(xj) − f(x̄)] ≥ m ∑_{j=1}^n λj(xj − x̄) = m (∑_{j=1}^n λjxj − x̄) = 0. �
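A numerical illustration of Jensen's inequality (a sketch of ours, with the strictly convex function e^x and random weights; the variable names are our own):

```python
import math
import random

# f(sum lam_j x_j) <= sum lam_j f(x_j) for f(x) = exp(x).
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(5)]
raw = [random.random() for _ in range(5)]
lams = [r / sum(raw) for r in raw]          # positive weights summing to 1

mean = sum(l * x for l, x in zip(lams, xs))
jensen_gap = sum(l * math.exp(x) for l, x in zip(lams, xs)) - math.exp(mean)
```

Since the points are not all equal and e^x is strictly convex, the gap is strictly positive.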

An immediate application is the well-known inequality of the means. Given positive numbers xj and wj with ∑_{j=1}^n wj = 1, the weighted arithmetic mean of the numbers xj is x̄ = ∑_{j=1}^n wjxj, while the weighted geometric mean is ∏_{j=1}^n xj^{wj}. The ordinary arithmetic and geometric means are obtained by taking wj = 1/n for each j.

3.2 PROPOSITION. Let xj, wj (1 ≤ j ≤ n) be positive numbers with ∑_{j=1}^n wj = 1. Then

∏_{j=1}^n xj^{wj} ≤ ∑_{j=1}^n wjxj.

Proof. By Jensen's inequality applied to the concave function log x, we have ∑_{j=1}^n wj log xj ≤ log x̄. �

Example. Apply 3.2 (in the logarithmic form) with xj = 1/wj: then ∑_{j=1}^n wjxj = n, so ∑_{j=1}^n wj log(1/wj) ≤ log n. (This quantity is known as the "entropy" of (wj), viewed as a probability distribution.)
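A quick check of the entropy bound (our sketch; the sample weights are our own choice):

```python
import math

# For positive weights w_j summing to 1: sum_j w_j log(1/w_j) <= log n.
ws = [0.5, 0.2, 0.2, 0.1]
entropy = sum(w * math.log(1 / w) for w in ws)
entropy_ok = entropy <= math.log(len(ws))
```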

    Hölder’s inequality (in two equivalent forms) follows in elegant style. Note that for

    p > 1, the conjugate index p∗ is defined by 1/p+ 1/p∗ = 1, so that p∗ = p/(p− 1).

    3.3 PROPOSITION (Hölder’s inequality). Suppose that aj, bj (1 ≤ j ≤ n) are non-negative numbers and 0 < r < 1. Let s = 1− r. Then

    n∑j=1

    arjbsj ≤

    (n∑j=1

    aj

    )r( n∑j=1

    bj

    )s. (7)

    (ii) Suppose that xj, yj (1 ≤ j ≤ n) are non-negative numbers and p > 1. Then

    n∑j=1

    xjyj ≤

    (n∑j=1

    xpj

    )1/p( n∑j=1

    yp∗

    j

    )1/p∗. (8)

    Proof. (i) Let∑n

    j=1 aj = A and∑n

    j=1 bj = B, also cj = aj/A and dj = bj/B. By 3.2,

    crjbsj ≤ rcj + sdj, so

    n∑j=1

    crjdsj ≤ r

    n∑j=1

    cj + sm∑j=1

    dj = r + s = 1.

    But crjdsj = (a

    rjbsj)/(A

    rBs), so this equates to (7).

    (ii) Apply (7) with r = 1/p, s = 1/p∗ and aj = xpj , bj = y

    p∗

    j . �
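A numerical check of Hölder's inequality in form (8) (a sketch of ours, with random non-negative data and the illustrative choice p = 3, p* = 3/2):

```python
import random

# sum x_j y_j <= (sum x_j^p)^(1/p) * (sum y_j^q)^(1/q) with 1/p + 1/q = 1.
random.seed(1)
xs = [random.random() for _ in range(6)]
ys = [random.random() for _ in range(6)]
p = 3.0
q = p / (p - 1)                      # the conjugate index p*
lhs = sum(x * y for x, y in zip(xs, ys))
rhs = sum(x ** p for x in xs) ** (1 / p) * sum(y ** q for y in ys) ** (1 / q)
holder_ok = lhs <= rhs + 1e-12
```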

We now record what Jensen's inequality says when applied to the function x^p.

3.4 PROPOSITION. Suppose that xj ≥ 0 and wj > 0 for 1 ≤ j ≤ n, with ∑_{j=1}^n wj = 1. If p > 1 or p < 0, then

(∑_{j=1}^n wjxj)^p ≤ ∑_{j=1}^n wjxj^p. (9)

The reverse inequality holds if 0 < p ≤ 1. �

The weighted pth mean of the numbers xj is Mp(x,w) = (∑_{j=1}^n wjxj^p)^{1/p}. In this notation, (9) says that M1(x,w) ≤ Mp(x,w) for p > 1. By applying it to xj^p, with index q/p, one can deduce that Mp(x,w) ≤ Mq(x,w) for 0 < p < q.

If ∑_{j=1}^n wj = W, then replacing wj by wj/W in (9), we obtain

(∑_{j=1}^n wjxj)^p ≤ W^{p−1} ∑_{j=1}^n wjxj^p.

In particular, (∑_{j=1}^n xj)^p ≤ n^{p−1} ∑_{j=1}^n xj^p. A suitable substitution delivers another proof of Hölder's inequality.

We return to the general study of Jensen's inequality. With the help of Taylor's theorem, we can modify the second proof to give a version with an error term in terms of the second derivative.

3.5 PROPOSITION. Suppose that f has continuous second derivative on I. Let λj, xj and x̄ be as in Theorem 3.1. Then there exists ξ ∈ I such that

∑_{j=1}^n λjf(xj) − f(x̄) = (1/2)f′′(ξ) ∑_{j=1}^n λj(xj − x̄)^2.

Proof. For each j, by Taylor's theorem, there exists yj ∈ I such that

f(xj) − f(x̄) = (xj − x̄)f′(x̄) + (1/2)(xj − x̄)^2 f′′(yj).

Let min_{1≤j≤n} f′′(yj) = m and max_{1≤j≤n} f′′(yj) = M. By the cancellation seen in Proof 2 of Theorem 3.1, we see that

∑_{j=1}^n λjf(xj) − f(x̄) = (1/2) ∑_{j=1}^n λj(xj − x̄)^2 f′′(yj) = (1/2)μ ∑_{j=1}^n λj(xj − x̄)^2,

where m ≤ μ ≤ M. By the intermediate value theorem, μ = f′′(ξ) for some ξ ∈ I. �

Simple reasoning along the same lines as 2.7 delivers a companion inequality in the reverse direction, giving an upper bound for ∑_{j=1}^n λjf(xj).

3.6. Let f be convex on I, and let xj, λj and x̄ be as in Theorem 3.1. Let m ≤ xj ≤ M for 1 ≤ j ≤ n, where m < M. Then

∑_{j=1}^n λjf(xj) ≤ [(M − x̄)/(M − m)] f(m) + [(x̄ − m)/(M − m)] f(M).

Proof. Observe that x̄ = αm + βM, where α = (M − x̄)/(M − m) and β = (x̄ − m)/(M − m). Then α + β = 1, so

∑_{j=1}^n λjg(xj) = αg(m) + βg(M)

for g(x) = 1 and for g(x) = x, hence for all linear g. Take g to be the linear function agreeing with f at m and M. Then f(xj) ≤ g(xj) for each j, so the statement follows. �

This looks simpler when m = 0 and M = 1: the bound becomes (1 − x̄)f(0) + x̄f(1). In turn, this implies the bound f(0) + f(1) − f(1 − x̄), known as the Jensen-Mercer inequality.

Finally, we state the continuous version of Jensen's inequality, in which the vectors (xj) and (λj) are replaced by functions and discrete sums are replaced by integrals. Proof 2 of Theorem 3.1 applies with minimal change.

3.7 THEOREM. Suppose that w is a non-negative function on an interval I with ∫_I w = 1, and let x be any integrable function on I. Let ∫_I wx = A(x,w). Let f be a convex function defined at least on x(I). Then

f[A(x,w)] ≤ ∫_I w(t)f[x(t)] dt.

Proof. Write A(x,w) = A. For t ∈ I, we have

f[x(t)] − f(A) ≥ [x(t) − A]f′(A),

in which f′(A) means the right-derivative if necessary. Multiplying by w(t) and integrating, we obtain

∫_I w(t)f[x(t)] dt − f(A) ≥ f′(A) ∫_I w(t)[x(t) − A] dt = f′(A) [∫_I w(t)x(t) dt − A] = 0. �

We can derive a continuous analogue of the inequality of the means. Given that x(t) > 0 on I, we define the weighted geometric mean G(x,w) by log G(x,w) = ∫_I w(t) log x(t) dt. By Theorem 3.7, with f taken to be the concave function log, we have at once:

    3.8 COROLLARY. Under these conditions, we have G(x,w) ≤ A(x,w). �


4. Inequalities for integrals; applications to discrete sums

Recall first the following elementary estimation for integrals, derived from upper and lower bounds: if m ≤ f(x) ≤ M for a ≤ x ≤ b, then

m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a).

For a decreasing function on [p, q] (where p, q are integers), by combining these estimates on successive intervals [r − 1, r], we obtain

f(p + 1) + f(p + 2) + · · · + f(q) ≤ ∫_p^q f(x) dx ≤ f(p) + f(p + 1) + · · · + f(q − 1).

For convex functions, we can give much more accurate estimations, as follows. If f is convex on [a, b], then f(x) ≤ h(x) on [a, b], where (with a change of notation) h(x) is the linear function mx + d such that ma + d = f(a) and mb + d = f(b). Then

∫_a^b (mx + d) dx = (1/2)m(b^2 − a^2) + d(b − a) = (b − a)[(1/2)m(b + a) + d] = (1/2)(b − a)[f(a) + f(b)].

Of course, this is just the area of the trapezium described; it is the "trapezium rule" estimate for the integral.

Meanwhile, if c = (1/2)(a + b) and μ = (D+f)(c), then by 2.14, f(x) ≥ g(x) on [a, b], where g(x) = f(c) + μ(x − c). Clearly,

∫_a^b g(x) dx = (b − a)f(c).

This is the "mid-point" estimate for the integral. Since ∫_a^b g ≤ ∫_a^b f ≤ ∫_a^b h, we conclude:

4.1 PROPOSITION. If f is convex on [a, b] and c = (1/2)(a + b), then

(b − a)f(c) ≤ ∫_a^b f(x) dx ≤ (1/2)(b − a)[f(a) + f(b)]. �
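A numerical check of 4.1 (our sketch) for the convex function f(x) = 1/x on [1, 2], where the integral is known exactly:

```python
import math

# midpoint estimate <= exact integral <= trapezium estimate for f(x) = 1/x.
a, b = 1.0, 2.0
midpoint = (b - a) / ((a + b) / 2)           # (b - a) * f((a + b)/2)
integral = math.log(b / a)                   # exact value: log 2
trapezium = 0.5 * (b - a) * (1 / a + 1 / b)
ordered = midpoint <= integral <= trapezium
```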

Note. Alternatively, to prove the left-hand inequality without using the one-sided derivative, observe that f(c) ≤ (1/2)f(c − x) + (1/2)f(c + x) and integrate on [0, (1/2)(b − a)].

Example. Since log(1 + x) = ∫_1^{1+x} (1/t) dt, we have

log(1 + x) ≤ (x/2)[1 + 1/(1 + x)] = x(2 + x)/[2(1 + x)]

and

log(1 + x) ≥ x/(1 + x/2) = 2x/(2 + x).

    For comparison, the bounds given by simple integral comparison are x and x/(1 + x).

    By combining the estimations in 4.1 on successive intervals, we obtain:

4.2 PROPOSITION. Let q − p be a positive integer. If f is convex on [p, q], then

f(p + 1/2) + · · · + f(q − 1/2) ≤ ∫_p^q f(x) dx ≤ (1/2)f(p) + f(p + 1) + · · · + f(q − 1) + (1/2)f(q).

Proof. By 4.1, for each r with p + 1 ≤ r ≤ q,

f(r − 1/2) ≤ ∫_{r−1}^r f ≤ (1/2)[f(r − 1) + f(r)].

Add for r = p + 1, . . . , q to obtain the statement. �

    The application is usually to use the known value of the integral to give bounds for the

    discrete sums on either side.

    By taking limits, we can derive the following estimates for the tail of an infinite series.

4.3. Suppose that f(x) is decreasing, convex and non-negative for all x ≥ 1. Suppose also that ∫_1^∞ f(x) dx is convergent, and write rn = ∑_{j=n+1}^∞ f(j). Then

∫_n^∞ f(x) dx − (1/2)f(n) ≤ rn ≤ ∫_{n+1/2}^∞ f(x) dx.

Proof. In the right-hand inequality in 4.2, take p = n and let q → ∞ to obtain ∫_n^∞ f ≤ (1/2)f(n) + rn.

In the left-hand inequality in 4.2, take p = n + 1/2, q = r + 1/2 to get

f(n + 1) + · · · + f(r) ≤ ∫_{n+1/2}^{r+1/2} f.

Taking the limit as r → ∞, we obtain rn ≤ ∫_{n+1/2}^∞ f. �

Example. Let rn = ∑_{j=n+1}^∞ 1/j^2. Since ∫_a^∞ (1/x^2) dx = 1/a, we obtain

1/n − 1/(2n^2) ≤ rn ≤ 1/(n + 1/2) = 1/n − 1/[n(2n + 1)].

(Compare the bounds 1/(n + 1) and 1/n given by simple estimation.)
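These tail bounds are easy to check numerically (our sketch), approximating the infinite tail by a long partial sum at, say, n = 10:

```python
# Tail bounds for r_n = sum_{j > n} 1/j^2 at n = 10.
n = 10
tail = sum(1 / (j * j) for j in range(n + 1, 200001))   # partial stand-in for r_n
lower = 1 / n - 1 / (2 * n * n)
upper = 1 / (n + 0.5)
tail_bounds_ok = lower - 1e-5 <= tail <= upper          # small slack for truncation
```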


Example. Let Hn = ∑_{r=1}^n 1/r. Simple integral estimation gives log n + 1/n ≤ Hn ≤ log n + 1. By 4.2, with p = 1 and q = n, we have

1/2 + 1/2 + 1/3 + · · · + 1/(n − 1) + 1/(2n) ≥ ∫_1^n (1/x) dx = log n,

so Hn ≥ log n + 1/2 + 1/(2n). Also, taking p = 1/2 and q = n + 1/2, we have

Hn ≤ ∫_{1/2}^{n+1/2} (1/x) dx = log(2n + 1).

(With more care, this can be developed into more accurate estimates involving Euler's constant. However, closer estimates are delivered by methods using the logarithmic series or Euler-Maclaurin summation.)

Example. Apply 4.2 to log x: this function is concave, so the inequality reverses, giving

(1/2) log 1 + ∑_{r=2}^{n−1} log r + (1/2) log n ≤ ∫_1^n log x dx = n log n − n + 1,

so log n! = ∑_{r=1}^n log r ≤ (n + 1/2) log n − n + 1, hence n! ≤ n^{n+1/2} e^{1−n}. (This can be developed into a proof of Stirling's formula, but again more accurate versions are given by other methods.)

    We now present a rather different result on integrals of convex functions. It is essentially

    an integrated form of 2.6.

4.4 PROPOSITION. Suppose that a1, a2, a3 and b1, b2, b3 are real numbers such that

a1 < b1 ≤ b2 < a2 ≤ a3 < b3 (10)

and that p, q, r are positive numbers such that

pa1 + qa2 + ra3 = pb1 + qb2 + rb3. (11)

Suppose that f is convex on [a1, b3] and that either

pa1^2 + qa2^2 + ra3^2 = pb1^2 + qb2^2 + rb3^2 (12)

or that f is also increasing and

pa1^2 + qa2^2 + ra3^2 ≤ pb1^2 + qb2^2 + rb3^2. (13)

Then

q ∫_{b2}^{a2} f ≤ p ∫_{a1}^{b1} f + r ∫_{a3}^{b3} f. (14)

Consequently, if g is a function such that g′ is convex on [a1, b3], and either (12) holds, or (13) holds and g is also convex, then

pg(a1) + qg(a2) + rg(a3) ≤ pg(b1) + qg(b2) + rg(b3). (15)

Proof. First, assume (12). Conditions (11) and (12) can be rewritten as

q(a2 − b2) = p(b1 − a1) + r(b3 − a3),
q(a2^2 − b2^2) = p(b1^2 − a1^2) + r(b3^2 − a3^2).

These identities equate, respectively, to the statements that

q ∫_{b2}^{a2} h = p ∫_{a1}^{b1} h + r ∫_{a3}^{b3} h (16)

for h(x) = 1 and for h(x) = x, and hence for all linear h(x) = mx + n. Now take h to be the linear function agreeing with f at b2 and a2. By 2.6, we have f ≤ h on [b2, a2], while f ≥ h on [a1, b1] and [a3, b3]. Inequality (14) follows.

Now assume that f is increasing and (13) holds. Then equality is replaced by ≤ in (16) for h(x) = x, hence also for h(x) = mx + n with m ≥ 0. This condition is satisfied by the linear function agreeing with f at b2 and a2, since f(b2) ≤ f(a2). Inequality (14) follows as before.

We now apply this with f = g′. If g is convex, then g′ is increasing. So under either set of conditions, we deduce (14): it now says

q[g(a2) − g(b2)] ≤ p[g(b1) − g(a1)] + r[g(b3) − g(a3)],

which equates to (15). �

    Of course, if f is strictly convex, then strict inequality holds in (14).

Note that if aj, bj satisfy (10) and (11), together with (12) or (13), then so do the numbers aj + c, bj + c for any c.

The special case p = q = r = 1 is already of interest. It says: given (10), if ∑_{j=1}^3 aj = ∑_{j=1}^3 bj and ∑_{j=1}^3 aj^2 = ∑_{j=1}^3 bj^2, then ∑_{j=1}^3 g(aj) ≤ ∑_{j=1}^3 g(bj) when g′ is convex, and the opposite when g′ is concave. So, for instance, ∑_{j=1}^3 aj^p < ∑_{j=1}^3 bj^p (strict inequality) for all p > 2, and the reverse for 1 < p < 2. There are plenty of integer triples that satisfy these conditions, for example (aj) = (1, 4, 4), (bj) = (2, 2, 5) and (aj) = (1, 5, 6),


(bj) = (2, 3, 7). (A systematic description of such pairs of triples would be interesting, but we will not embark upon it here.) A completely different route to results of this sort, but restricted to the functions x^p, is by a generalisation of Descartes' rule of signs: see [Jam1, Example 3].
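The first pair of triples can be checked directly (our sketch; the exponents 3 and 1.5 are our illustrative choices of p > 2 and 1 < p < 2):

```python
# (1, 4, 4) and (2, 2, 5): equal sums and equal sums of squares, with the
# power-sum comparison going one way for p > 2 and the other for 1 < p < 2.
a = (1, 4, 4)
b = (2, 2, 5)
sums_match = sum(a) == sum(b)
squares_match = sum(x * x for x in a) == sum(x * x for x in b)
cubes_increase = sum(x ** 3 for x in a) < sum(x ** 3 for x in b)
halfway_reverse = sum(x ** 1.5 for x in a) > sum(x ** 1.5 for x in b)
```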

An application of Proposition 4.4 is monotonicity of the mid-point and trapezium approximations to integrals for convex functions. This was originally proved in [BJ], by a rather intricate method. A simpler proof is given in [Jam3].

    5. Algebra of convex functions

    The properties (E1), (E4), (E5), (E6) listed in section 1 gave some elementary facts

    about the derivation of new convex functions from given ones. We now present some less

    obvious results of this type.

    5.1. Let f , g be functions that are non-negative and convex on an interval I, either

    both increasing or both decreasing. Then fg is convex on I.

    Proof. Let x1 < x2. Let ax+ b and cx+d be the linear functions agreeing with f and g

    respectively at x1 and x2. If f and g are increasing, then a and c are non-negative, while if f

and g are decreasing, then a and c are non-positive: in either case, ac ≥ 0. For x1 < x < x2, we have f(x)g(x) ≤ h(x), where

h(x) = (ax + b)(cx + d) = acx^2 + (ad + bc)x + bd.

Since ac ≥ 0, h is convex, so for x1 < x < x2, we have h(x) ≤ H(x), where H is the linear function agreeing with h (hence also with fg) at x1 and x2. �

Example. Let f(x) = 1/x and g(x) = x^{3/2}. Both are convex, but f is decreasing and g is increasing. Then f(x)g(x) = x^{1/2}, which is concave.

    5.2. Suppose that f is convex on I, and g is convex and increasing on the interval

    f(I). Then the composition g ◦ f is convex on I.

Proof. Let xλ = (1 − λ)x1 + λx2. Then f(xλ) ≤ (1 − λ)f(x1) + λf(x2). Since g is increasing and convex,

g[f(xλ)] ≤ g[(1 − λ)f(x1) + λf(x2)] ≤ (1 − λ)g[f(x1)] + λg[f(x2)]. �

    To see that this result does not hold without the condition that g is increasing, we only

    need to observe that if g(x) = −x, then (g ◦ f)(x) = −f(x).


5.3 COROLLARY. If f is convex and non-negative on I and p > 1, then f(x)^p is convex on I. �

5.4 COROLLARY. If g is convex and increasing on [a, b], where 0 < a < b, then g(1/x) is convex on [1/b, 1/a]. �

    Example. Let g(x) = 1/(x + 1). Then g is convex, but decreasing, for x > 0. Then

    g(1/x) = 1− 1/(x+ 1), which is concave.

Note that for positive, convex f, one can have 1/f(x) convex or concave, as shown by x^{−2} and x^{−1/2}. An increasing, convex function with 1/f(x) concave is x^2 + 1 on [0, 1/2].

    The inverse of the convex function ex is the concave function log x. This is a special

    case of the following result:

5.5. Let f be strictly increasing and convex on [a, b], with inverse function g. Then g is concave on [f(a), f(b)]. If f is strictly decreasing, then g is convex.

    Proof. Take points x1, x2 in [a, b] with x1 < x2. Let 0 < λ < 1 and put xλ =

(1 − λ)x1 + λx2. Let f(x1) = y1, f(x2) = y2 and yλ = (1 − λ)y1 + λy2. Since f is convex, f(xλ) ≤ yλ. Since g is increasing,

g(yλ) ≥ g[f(xλ)] = xλ = (1 − λ)g(y1) + λg(y2).

    The opposite applies if f , hence also g, is decreasing. �

Log-convexity. We say that a strictly positive function f is log-convex on I if log f(x) is convex. Clearly, this is equivalent to

f(xλ) ≤ f(x1)^{1−λ} f(x2)^λ,

where xλ = (1 − λ)x1 + λx2. For a differentiable function, it is equivalent to f′(x)/f(x) being increasing. By 2.11, it implies that f(x + c)/f(x) increases with x for c > 0.

By 5.2, a log-convex function is convex: if h(x) is convex, then so is e^{h(x)}.

Of course, we say that f(x) is log-concave if log f(x) is concave: this is equivalent to 1/f(x) being log-convex.

Clearly, e^{ax} is both log-convex and log-concave for any a. Also, x^p is log-concave for p > 0 and log-convex for p < 0.

Obviously, if f and g are log-convex on I, then so is fg.


5.6. If f and g are log-convex on I, then so is f + g.

Proof. Let xλ = (1 − λ)x1 + λx2. Write f(x1) = f1, f(x2) = f2, and similarly g1, g2. Then

f(xλ) ≤ f1^{1−λ} f2^λ,   g(xλ) ≤ g1^{1−λ} g2^λ.

By Hölder's inequality in the form (7),

f1^{1−λ} f2^λ + g1^{1−λ} g2^λ ≤ (f1 + g1)^{1−λ} (f2 + g2)^λ.

Hence f + g is log-convex. □
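A numerical sanity check of 5.6 (an illustration added here, not in the original notes): take two log-convex functions and verify the log-convexity inequality for their sum at a few sampled values of λ. The particular functions and sample points are arbitrary choices.

```python
import math

# Check of 5.6: if f and g are log-convex, so is f + g.
# f(x) = exp(x**2) is log-convex (log f = x**2 is convex);
# g(x) = 1/x is log-convex on x > 0 (x**p with p < 0).

f = lambda x: math.exp(x * x)
g = lambda x: 1 / x
h = lambda x: f(x) + g(x)

x1, x2 = 0.5, 2.0
for lam in (0.25, 0.5, 0.75):
    xl = (1 - lam) * x1 + lam * x2
    # the defining inequality h(x_lambda) <= h(x1)**(1-lam) * h(x2)**lam
    assert h(xl) <= h(x1) ** (1 - lam) * h(x2) ** lam
print("f + g passes the log-convexity inequality at the sampled points")
```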

There is no corresponding statement for log-concavity, as the following example shows.

Example. Let f(x) = e^x and g(x) = 1. Then f and g are log-concave (as well as log-convex). Let h(x) = e^x + 1. Then h′(x)/h(x) = e^x/(e^x + 1), which is strictly increasing for all x, so h is strictly log-convex, hence not log-concave.

Example: the gamma function. The gamma function Γ(x) is log-convex. This is proved most easily from Euler's limit definition: Γ(x) = lim_{n→∞} Γ_n(x), where

Γ_n(x) = n^x n! / [x(x + 1) · · · (x + n)],

so that

log Γ_n(x) = x log n + log(n!) − ∑_{r=0}^{n} log(x + r).

Since x log n is linear and each term −log(x + r) is convex, log Γ_n(x), hence also log Γ(x), is convex. A famous result, the Bohr-Mollerup theorem, states that conversely, Γ(x) is the unique function f(x) on R+ that is log-convex and satisfies f(x + 1) = xf(x) and f(1) = 1.
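This can be checked numerically (an illustration, not part of the original argument) using the standard-library function math.lgamma, which computes log Γ(x); the test points are arbitrary.

```python
import math

# log Gamma is convex (i.e. the gamma function is log-convex):
# check the midpoint inequality at a few sampled pairs.
def midpoint_convex(F, a, b):
    return F((a + b) / 2) <= (F(a) + F(b)) / 2

for a, b in [(0.5, 1.5), (1.0, 5.0), (2.3, 7.7)]:
    assert midpoint_convex(math.lgamma, a, b)

# The Bohr-Mollerup normalisations: Gamma(1) = 1 and Gamma(x+1) = x Gamma(x).
assert math.isclose(math.gamma(1.0), 1.0)
assert math.isclose(math.gamma(4.5), 3.5 * math.gamma(3.5))
print("log Gamma passes the midpoint-convexity checks")
```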

6. Monotonic averages

Here we outline a pair of theorems first presented in [BJ], with some applications. For a function f on the interval [0, 1], we define

A_n(f) = [1/(n − 1)] ∑_{r=1}^{n−1} f(r/n)   (n ≥ 2),

B_n(f) = [1/(n + 1)] ∑_{r=0}^{n} f(r/n)   (n ≥ 1).

These are, respectively, the averages of the values f(r/n) excluding and including the end points. For A_n(f), we do not need f to be defined at 0 and 1. If f is continuous on [0, 1], then both A_n(f) and B_n(f) tend to ∫_0^1 f as n → ∞. We show that for convex functions, they do so in a monotonic way.

6.1 PROPOSITION. If f is convex on (0, 1), then A_n(f) increases with n.

Proof. Let n ≥ 2. For 1 ≤ r ≤ n − 1, the point r/n lies between r/(n + 1) and (r + 1)/(n + 1). More exactly,

r/n = [(n − r)/n] · [r/(n + 1)] + (r/n) · [(r + 1)/(n + 1)].

Write f[r/(n + 1)] = f_r. Since f is convex,

f(r/n) ≤ [(n − r)/n] f_r + (r/n) f_{r+1}.

Hence

∑_{r=1}^{n−1} f(r/n) ≤ [(n − 1)/n] f_1 + (1/n) f_2 + [(n − 2)/n] f_2 + (2/n) f_3 + · · · + (1/n) f_{n−1} + [(n − 1)/n] f_n

= [(n − 1)/n] ∑_{r=1}^{n} f_r,

which says that A_n(f) ≤ A_{n+1}(f). □

6.2 PROPOSITION. If f is convex on [0, 1], then B_n(f) decreases with n.

Proof. Let n ≥ 2. This time, we use the fact that for 1 ≤ r ≤ n − 1,

r/n = (r/n) · [(r − 1)/(n − 1)] + [(n − r)/n] · [r/(n − 1)].

Write f[r/(n − 1)] = g_r. By convexity of f, for r as above,

f(r/n) ≤ (r/n) g_{r−1} + [(n − r)/n] g_r,

also f(0/n) = g_0 and f(n/n) = g_{n−1}. Hence

∑_{r=0}^{n} f(r/n) ≤ (n/n) g_0 + (1/n) g_0 + [(n − 1)/n] g_1 + (2/n) g_1 + · · · + (1/n) g_{n−1} + (n/n) g_{n−1}

= [(n + 1)/n] ∑_{r=0}^{n−1} g_r,

which says that B_n(f) ≤ B_{n−1}(f). □

Of course, the opposite statements hold if f is concave. If f is linear, it follows that A_n(f) and B_n(f) are constant. This is easily verified directly: if f(x) = x, then A_n(f) = B_n(f) = 1/2 for all n.


Applied to x^p, these results give the following (we confine ourselves to the case p > 0):

6.3. Let S_n(p) = ∑_{r=1}^{n} r^p and

c_n(p) = S_n(p) / [n(n + 1)^p],   d_n(p) = S_n(p) / [n^p(n + 1)].

Then c_n(p) increases with n for p ≥ 1, and decreases for 0 < p ≤ 1. Meanwhile, d_n(p) decreases with n for p ≥ 1, and increases for 0 < p ≤ 1.

(The reversals at p = 1 reflect the fact that c_n(1) and d_n(1) have the constant value 1/2.)

Proof. With f(x) = x^p, we have

A_{n+1}(f) = (1/n) ∑_{r=1}^{n} r^p/(n + 1)^p = c_n(p),

B_n(f) = [1/(n + 1)] ∑_{r=0}^{n} r^p/n^p = d_n(p). □

By simple integral estimation,

n^{p+1}/(p + 1) ≤ S_n(p) ≤ (n + 1)^{p+1}/(p + 1),

hence c_n(p) and d_n(p) tend to 1/(p + 1) as n → ∞. Since the terms of an increasing sequence are not greater than the limit, we can deduce the following stronger estimate:

6.4 COROLLARY. For p ≥ 1, we have

[1/(p + 1)] n^p(n + 1) ≤ S_n(p) ≤ [1/(p + 1)] n(n + 1)^p.

The reverse inequalities hold when 0 < p ≤ 1. □

6.5 COROLLARY. Let u_n(p) = S_n(p)/n^{p+1} and v_n(p) = S_n(p)/(n + 1)^{p+1}. Then for all p > 0, u_n(p) decreases with n and v_n(p) increases.

Proof. We have u_n(p) = (1 + 1/n) d_n(p), hence is decreasing for p ≥ 1. Also, u_n(p) = (1 + 1/n)^p c_n(p), hence is decreasing for 0 < p ≤ 1. Similarly, v_n(p) = [n/(n + 1)] c_n(p) = [n/(n + 1)]^p d_n(p). □

Note that unlike c_n(p) and d_n(p), these ratios fail to be constant when p = 1.
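The estimates 6.4 and the monotonicity in 6.5 can be checked directly for a sample exponent; the sketch below (an added illustration) takes p = 2, a choice made here for convenience.

```python
# Check of 6.4 and 6.5 with p = 2. S_n(p) = 1**p + ... + n**p; for p >= 1,
# n**p (n+1) / (p+1) <= S_n(p) <= n (n+1)**p / (p+1), while
# u_n(p) = S_n(p)/n**(p+1) decreases and v_n(p) = S_n(p)/(n+1)**(p+1) increases.

def S(n, p):
    return sum(r ** p for r in range(1, n + 1))

p = 2.0
for n in range(1, 50):
    s = S(n, p)
    assert n ** p * (n + 1) / (p + 1) <= s <= n * (n + 1) ** p / (p + 1)

u = [S(n, p) / n ** (p + 1) for n in range(1, 50)]
v = [S(n, p) / (n + 1) ** (p + 1) for n in range(1, 50)]
assert all(a >= b for a, b in zip(u, u[1:]))  # u_n decreasing
assert all(a <= b for a, b in zip(v, v[1:]))  # v_n increasing
print("6.4 and 6.5 verified for p = 2, n < 50")
```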

We now apply Proposition 6.1 to the concave function log x.

6.6. The expression [1/(n + 1)] (n!)^{1/n} decreases with n.

Proof. Let f(x) = log x. Since f is concave, A_{n+1}(f) decreases with n, by 6.1. But

A_{n+1}(f) = (1/n) ∑_{r=1}^{n} [log r − log(n + 1)] = (1/n) log(n!) − log(n + 1). □

It was shown in [MS] that (1/n)(n!)^{1/n} decreases; this statement is weaker than ours, because their expression equates to ours multiplied by the decreasing factor 1 + 1/n.
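Both monotonicity statements are easy to confirm numerically (an added illustration; the range n < 40 is an arbitrary choice):

```python
import math

# Check of 6.6: (n!)**(1/n) / (n+1) decreases with n,
# and hence so does the weaker Minc-Sathre expression (n!)**(1/n) / n.

def ours(n):
    return math.factorial(n) ** (1.0 / n) / (n + 1)

def minc_sathre(n):
    return math.factorial(n) ** (1.0 / n) / n

a = [ours(n) for n in range(1, 40)]
b = [minc_sathre(n) for n in range(1, 40)]
assert all(x >= y for x, y in zip(a, a[1:]))
assert all(x >= y for x, y in zip(b, b[1:]))
print("6.6 verified for n < 40")
```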

Propositions 6.1 and 6.2 can be generalised in various ways. One way is to consider weighted averages of the form B_n(W, f) = ∑_{r=0}^{n} w_{n,r} f(r/n), where w_{n,r} ≥ 0 and ∑_{r=0}^{n} w_{n,r} = 1 for all n. This is explored in [Jam2]. For example, it is shown that B_n(W, f) decreases with n for convex f when W is any Hausdorff mean matrix: in particular, this applies to the Euler matrix defined by w_{n,r} = C(n, r) x^r (1 − x)^{n−r} for a chosen x in (0, 1), where C(n, r) denotes the binomial coefficient. Another generalisation is Bennett's concept of "meaningful" sequences [Benn1].

    7. Majorisation

    The result outlined here, known as the “majorisation principle”, seems to have been

    first formulated by Hardy, Littlewood and Pólya in 1929 [HLP1]. It was rediscovered by

    Karamata in 1932 [Kar], and it has also been called “Karamata’s inequality”.

    The result concerns sums of the form∑n

    j=1 f(xj). For this purpose, we may assume

    the xj arranged in decreasing order.

    Given a sequence x = (xj) (finite or infinite), we write Xk =∑k

    j=1 xj (and similarly Yk

    for a second sequence y). For now, let x, y be decreasing elements of Rn. If Yk ≤ Xk for eachk, we write y ≤S x. If also Yn = Xn, we write y ≤M x, and say that y is “majorised” by x.(This is not standard notation; in fact, no notation is firmly established for these relations.)

    Examples: (5, 4, 2) ≤M (7, 3, 1); (6, 5, 5, 3) ≤M (9, 4, 4, 2).

Recall the well-known Abel summation formula for finite sums:

∑_{j=1}^{n} a_j x_j = a_1 X_1 + ∑_{j=2}^{n} a_j (X_j − X_{j−1}) = ∑_{j=1}^{n−1} (a_j − a_{j+1}) X_j + a_n X_n.   (17)

The following Lemma is an obvious consequence:

7.1 LEMMA. Suppose that x, y are elements of R^n with y ≤_S x. Suppose also that (a_j) is a decreasing element of R^n, and that either Y_n = X_n or a_n ≥ 0. Then

∑_{j=1}^{n} a_j y_j ≤ ∑_{j=1}^{n} a_j x_j.

Proof. Apply (17) to (x_j) and (y_j) in turn and compare. □

The majorisation principle follows easily by combining this with 2.1:

7.2 PROPOSITION (the majorisation principle). Let x, y be decreasing elements of R^n with y ≤_S x. Suppose that the function f is convex on an interval I containing all x_j and y_j, and that

either (i) Y_n = X_n (so that y ≤_M x),

or (ii) f is increasing on I.

Then

∑_{j=1}^{n} f(y_j) ≤ ∑_{j=1}^{n} f(x_j).   (18)

Proof. Assume first that f is differentiable. Then by 2.1,

f(x_j) − f(y_j) ≥ (x_j − y_j) f′(y_j)

for each j. Since f is convex, f′(t) increases with t, so f′(y_j) decreases with j. Under either hypothesis, Lemma 7.1 shows that ∑_{j=1}^{n} (x_j − y_j) f′(y_j) ≥ 0.

If f is not differentiable at some y_j, we consider the right-derivative, using 2.14 instead of 2.1. The reasoning is the same. □

Of course, the reverse of (18) holds if f is concave and either Y_n = X_n or f is decreasing. Strict inequality holds if f is strictly convex and y_j ≠ x_j for some j. The case n = 2 is 2.8.

Note that under condition (ii), it follows further that ∑_{j=1}^{k} f(y_j) ≤ ∑_{j=1}^{k} f(x_j) for each k ≤ n, more closely reflecting the hypothesis Y_k ≤ X_k for each k.
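The majorisation principle is easy to check numerically on the first example given above, (5, 4, 2) ≤_M (7, 3, 1); the sketch below (an added illustration) does so for three sample convex functions.

```python
from itertools import accumulate

# Check of 7.2 on the example (5, 4, 2) <=_M (7, 3, 1):
# for convex f, sum f(y_j) <= sum f(x_j).
x = [7, 3, 1]
y = [5, 4, 2]

# y is majorised by x: partial sums dominated, total sums equal.
assert all(a <= b for a, b in zip(accumulate(y), accumulate(x)))
assert sum(y) == sum(x)

for f in (lambda t: t * t, lambda t: t ** 4, lambda t: 2 ** t):
    assert sum(map(f, y)) <= sum(map(f, x))
print("majorisation inequality verified for three convex functions")
```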

General sequences. For sequences that are not decreasing, 7.2 can be restated as follows (and often is in the literature). Let x* be the vector consisting of the terms x_j arranged in decreasing order (the "decreasing rearrangement" of x). Of course, the sum ∑_{j=1}^{n} f(x_j) is unchanged by rearrangement. So 7.2 says that (18) holds if y* ≤_M x*, or if y* ≤_S x* and f is increasing. The established terminology is that y is "majorised" by x if y* ≤_M x*.

Example. Let (x_j) be decreasing and let x̄ = X_n/n. Let y_j = x̄ for 1 ≤ j ≤ n. It is elementary that X_k/k (the sequence of averages) decreases with k, hence for k ≤ n, we have X_k/k ≥ x̄. So Y_k = kx̄ ≤ X_k, hence y ≤_S x, and clearly Y_n = X_n. By 7.2, nf(x̄) ≤ ∑_{j=1}^{n} f(x_j), or f(x̄) ≤ (1/n) ∑_{j=1}^{n} f(x_j). This is Jensen's inequality for equally weighted elements.

We record some applications. Applied to f(x) = x^p, the result becomes:

7.3. Let x, y be decreasing, non-negative elements of R^n with y ≤_S x. Then:

(i) if p ≥ 1, then ∑_{j=1}^{k} y_j^p ≤ ∑_{j=1}^{k} x_j^p for each k ≤ n;

(ii) if 0 < p < 1 and also Y_n = X_n, then ∑_{j=1}^{n} y_j^p ≥ ∑_{j=1}^{n} x_j^p. □

Clearly, (i) also extends to infinite sequences. There is no question of (ii) applying without the condition Y_n = X_n, since this would allow each y_j to be arbitrarily small.

Using the logarithmic and exponential functions, we can relate sums to products.

7.4. Let x, y be decreasing, positive elements of R^n. If y ≤_M x, then y_1 y_2 · · · y_n ≥ x_1 x_2 · · · x_n.

Proof. This follows from 7.2 with f(t) = log t: since f is concave, (18) is reversed. □

With the Example above, this gives x̄^n ≥ x_1 x_2 · · · x_n; in other words, the arithmetic mean x̄ is not less than the geometric mean.

Note: It is quite possible to have ∑_{j=1}^{n} x_j = ∑_{j=1}^{n} y_j and x_1 x_2 · · · x_n = y_1 y_2 · · · y_n, for example (12, 5, 4) and (10, 8, 3). What 7.4 tells us is that this cannot happen with either y ≤_S x or x ≤_S y.

In the opposite direction, we can prove the following result.

7.5. Let x, y be decreasing, positive elements of R^n. If

y_1 y_2 · · · y_k ≤ x_1 x_2 · · · x_k

for each k ≤ n, then for any p > 0, we have ∑_{j=1}^{k} y_j^p ≤ ∑_{j=1}^{k} x_j^p for each k ≤ n.

Proof. The hypothesis says that ∑_{j=1}^{k} log y_j ≤ ∑_{j=1}^{k} log x_j for each k ≤ n. Let f(t) = e^{pt}. Then f is convex and increasing, and f(log y_j) = y_j^p. The conclusion follows by (18). □

In particular, the hypothesis in 7.5 implies that y ≤_S x (the case p = 1). The stated inequality then follows from 7.3 for p ≥ 1, but not for 0 < p < 1.

Example. By 7.5, we have 4^p + 3^p ≤ 6^p + 2^p for all p > 0.
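This example follows from 7.5 with x = (6, 2) and y = (4, 3), since 4 ≤ 6 and 4 · 3 = 6 · 2. A brief numerical confirmation over a sampled range of p (an added illustration; the grid of p values is arbitrary):

```python
# The example after 7.5: 4**p + 3**p <= 6**p + 2**p for all p > 0,
# since the partial products satisfy 4 <= 6 and 4*3 = 12 = 6*2.

for k in range(1, 200):
    p = k / 20.0   # p ranging over 0.05, 0.10, ..., 9.95
    assert 4 ** p + 3 ** p <= 6 ** p + 2 ** p
print("inequality verified for sampled p in (0, 10)")
```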

We describe a further application involving x^p, in which there is no longer any assumption about partial sums.

7.6. Let x, y be non-negative elements of R^n, both decreasing or both increasing. If p > 1, then

(∑_{j=1}^{n} x_j^p y_j^p) (∑_{j=1}^{n} x_j)^p ≥ (∑_{j=1}^{n} x_j y_j)^p (∑_{j=1}^{n} x_j^p).   (19)

The reverse inequality holds if 0 < p < 1.

Proof. We prove the statement for the case where (x_j) and (y_j) are decreasing. The case where they are increasing then follows by considering (x_n, . . . , x_2, x_1) and (y_n, . . . , y_2, y_1).

Write z_j = x_j y_j and Z_n = cX_n. We show that cX_k ≤ Z_k for k ≤ n. By 7.2, it then follows, for p > 1, that c^p ∑_{j=1}^{n} x_j^p ≤ ∑_{j=1}^{n} z_j^p, which equates to (19).

Let Z_k = c_k X_k, so c_n = c. We have to show that c_k ≥ c. This will follow if we show that c_{k+1} ≤ c_k for each k < n. Now Z_k = ∑_{j=1}^{k} x_j y_j ≥ y_k X_k, so c_k ≥ y_k, hence c_k ≥ y_{k+1}. Now

Z_{k+1} = c_k X_k + x_{k+1} y_{k+1} ≤ c_k (X_k + x_{k+1}) = c_k X_{k+1},

hence c_{k+1} ≤ c_k, as required. □

This result can be restated neatly in terms of ℓ_p-norms. Define ‖x‖_p to be (∑_{j=1}^{n} |x_j|^p)^{1/p} (note that ‖x‖_2 is the ordinary Euclidean norm). Then (19) equates to

‖xy‖_p / ‖xy‖_1 ≥ ‖x‖_p / ‖x‖_1.

We now establish a converse to 7.2, essentially showing that the property stated there characterises the majorisation relation.

7.7. Let x, y be decreasing elements of R^n. If (18) holds for all increasing, convex f, then y ≤_S x. If (18) holds for all convex f, then y ≤_M x.

Proof. Suppose first that (18) holds for all convex f. Then it holds, in particular, for f(t) = ±t. This implies at once that Y_n = X_n.

Now suppose that (18) holds for increasing, convex f. Choose k ≤ n, and let

f(t) = (t − x_k)^+ = max(t − x_k, 0).

Then f is convex and increasing, and

∑_{j=1}^{n} f(x_j) = ∑_{j=1}^{k} (x_j − x_k) = X_k − kx_k.

Also, since f(t) is not less than both t − x_k and 0 for all t, we have

∑_{j=1}^{n} f(y_j) ≥ ∑_{j=1}^{k} (y_j − x_k) = Y_k − kx_k,

hence Y_k ≤ X_k. □

Finally, we formulate a continuous version of majorisation, in which integrals replace discrete sums. The proof is analogous, but Abel summation is replaced by integration by parts.

7.8. Let x, y be decreasing, differentiable functions on [a, b], with values in an interval I. Write X(t) = ∫_a^t x(s) ds, and similarly Y(t). Suppose that Y(t) ≤ X(t) for a ≤ t ≤ b. Let f be a function that is convex and twice differentiable on I. Suppose further that either Y(b) = X(b) or f is increasing. Then

∫_a^b f[y(t)] dt ≤ ∫_a^b f[x(t)] dt.

Proof. By 2.1,

f[x(t)] − f[y(t)] ≥ [x(t) − y(t)] f′[y(t)].

Integrate by parts:

∫_a^b [x(t) − y(t)] f′[y(t)] dt = [(X(t) − Y(t)) f′[y(t)]]_a^b − ∫_a^b [X(t) − Y(t)] f′′[y(t)] y′(t) dt

= [X(b) − Y(b)] f′[y(b)] − ∫_a^b [X(t) − Y(t)] f′′[y(t)] y′(t) dt.

Under either of the alternative hypotheses, the first term is non-negative. Since f′′[y(t)] ≥ 0 and y′(t) ≤ 0, the second term is non-negative. □

One can derive applications analogous to 7.3 and 7.6.

There is a substantial body of further theory concerning majorisation. See, for example, [HLP2], [MO], [Benn2]. Here we just mention without proof a purely algebraic characterisation [HLP2, 46–49]. An n × n matrix P = (p_{j,k}) is doubly stochastic if the entries are non-negative and all row sums and all column sums equal 1. The statement y* ≤_M x* is equivalent to the existence of a doubly stochastic matrix P (not necessarily unique) such that y = Px. It is easy to see that this condition implies (18), and hence y ≤_M x, by 7.7: since y_j = ∑_{k=1}^{n} p_{j,k} x_k, Jensen's inequality gives f(y_j) ≤ ∑_{k=1}^{n} p_{j,k} f(x_k), and summation over j then gives ∑_{j=1}^{n} f(y_j) ≤ ∑_{k=1}^{n} f(x_k).


References

[Benn1] Grahame Bennett, Meaningful sequences, Houston J. Math. 33 (2007), 555–580.

[Benn2] Grahame Bennett, Some forms of majorization, Houston J. Math. 36 (2010), 1037–1066.

[BJ] Grahame Bennett and Graham Jameson, Monotonic averages of convex functions, J. Math. Anal. Appl. 252 (2000), 410–430.

[HLP1] G. H. Hardy, J. E. Littlewood and G. Pólya, Some simple inequalities satisfied by convex functions, Messenger Math. 58 (1929), 145–152.

[HLP2] G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, 2nd ed., Cambridge Univ. Press (1967).

[Jam1] G. J. O. Jameson, Counting zeros of generalised polynomials, Math. Gazette 90 (2006), 223–234.

[Jam2] G. J. O. Jameson, Monotonicity of weighted averages of convex functions, Math. Ineq. Appl. 23 (2020), 425–432.

[Jam3] G. J. O. Jameson, Monotonicity of the mid-point and trapezium estimates for integrals, Math. Gazette 105 (2021), to appear.

[Kar] J. Karamata, Sur une inégalité relative aux fonctions convexes, Publ. Math. Univ. Belgrade 1 (1932), 145–148.

[MO] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, New York (1979).

[MS] H. Minc and L. Sathre, Some inequalities concerning (r!)^{1/r}, Proc. Edinburgh Math. Soc. 14 (1964), 41–46.

    updated 3 November 2020
