29
LECTURE NOTES 6 FOR 247B TERENCE TAO 1. Paraproducts - introduction In the previous quarter we have focused primarily on linear (or sublinear) operators, which take one function f as input and return a function Tf as output. In this set of notes we shall consider some examples of multilinear (or more specifically bilinear ) and nonlinear operators. Of course there are an infinite number of such operators, but we shall focus on operators related to the two model examples of such operators, the pointwise product operator (f,g) fg and a pointwise nonlinear operator f F (f ) where F : C C is a specific function (e.g. a power-type function F (z ) := |z | p1 z ). These two operators and their variants, and their behaviour on Sobolev spaces, are particularly relevant in the theory of nonlinear PDE. Two model questions are as follows: For which values of s,p,s 1 ,p 1 ,s 2 ,p 2 ,d is it true that whenever f 1 W s1,p1 (R d ) and f 2 W s2,p2 (R d ), one has f 1 f 2 W s,p (R d )? Let F : C C be given. For which values of s,p,t,q is it true that whenever f W s,p (R d ), one has F (f ) W t,q (R d )? These types of questions turn out to be most easily answered by Littlewood-Paley decomposition, breaking up expressions such as the pointwise product into com- ponent pieces known as paraproducts. (The analogous decomposition for nonlinear functions u F (u) is Bony’s linearisation formula.) The theory of such decom- positions is known as the paradifferential calculus 1 . The term “paraproduct” is somewhat vaguely defined, but loosely speaking, paraproducts tend to be restricted versions of products in which only certain types of frequency interactions are per- mitted. 1 This is the theory of multilinear constant coefficient differential operators and their gener- alisations. In contrast, the pseudodifferential calculus is the theory of linear variable coefficient differential operators and their generalisations. One could formulate a “parapseudodifferential calculus” encompassing multilinear variable coefficient differential operators, but this gets messy and not particularly enlightening. 1

Paraproducts-introduction - UCLAtao/247b.1.07w/notes6.pdfLECTURE NOTES 6 FOR 247B TERENCE TAO 1. Paraproducts-introduction In the previous quarter we have focused primarily on linear

Embed Size (px)

Citation preview

LECTURE NOTES 6 FOR 247B

TERENCE TAO

1. Paraproducts - introduction

In the previous quarter we have focused primarily on linear (or sublinear) operators,which take one function f as input and return a function Tf as output. In thisset of notes we shall consider some examples of multilinear (or more specificallybilinear) and nonlinear operators. Of course there are an infinite number of suchoperators, but we shall focus on operators related to the two model examples ofsuch operators, the pointwise product operator

(f, g) 7→ fg

and a pointwise nonlinear operator

f 7→ F (f)

where F : C → C is a specific function (e.g. a power-type function F (z) := |z|p−1z).These two operators and their variants, and their behaviour on Sobolev spaces, areparticularly relevant in the theory of nonlinear PDE. Two model questions are asfollows:

• For which values of s, p, s1, p1, s2, p2, d is it true that whenever f1 ∈ W s1,p1(Rd)and f2 ∈W s2,p2(Rd), one has f1f2 ∈W s,p(Rd)?

• Let F : C → C be given. For which values of s, p, t, q is it true thatwhenever f ∈W s,p(Rd), one has F (f) ∈ W t,q(Rd)?

These types of questions turn out to be most easily answered by Littlewood-Paleydecomposition, breaking up expressions such as the pointwise product into com-ponent pieces known as paraproducts. (The analogous decomposition for nonlinearfunctions u 7→ F (u) is Bony’s linearisation formula.) The theory of such decom-positions is known as the paradifferential calculus1. The term “paraproduct” issomewhat vaguely defined, but loosely speaking, paraproducts tend to be restrictedversions of products in which only certain types of frequency interactions are per-mitted.

1This is the theory of multilinear constant coefficient differential operators and their gener-

alisations. In contrast, the pseudodifferential calculus is the theory of linear variable coefficientdifferential operators and their generalisations. One could formulate a “parapseudodifferentialcalculus” encompassing multilinear variable coefficient differential operators, but this gets messyand not particularly enlightening.

1

2 TERENCE TAO

Before we plunge into the details, let us give some informal discussion to motivatewhy some sort of decomposition is necessary. Let us prove the following specificassertion:

Proposition 1.1. If f, g ∈ S(R3), then

‖fg‖W 1,3/2(R3) . ‖f‖W 1,2(R3)‖g‖W 1,2(R3). (1)

This proposition is only stated for Schwartz functions, but it is not hard to usethis estimate to then show that the same estimate is also true for all f ∈W 1,2(R3)and g ∈W 1,2(R3) by density arguments; we leave the details as an exercise to thereader.

To prove this proposition, let us normalise

‖f‖W 1,2(R3) = ‖g‖W 1,2(R3).

From Sobolev embedding we conclude

‖f‖L2(R3), ‖∇f‖L2(R3), ‖f‖L6(R3), ‖g‖L2(R3), ‖∇g‖L2(R3), ‖g‖L6(R3) . 1.(2)

From Holder’s inequality this already gives

‖fg‖L3/2(R3) ≤ ‖f‖L2(R3)‖g‖L6(R3) . 1.

Since

‖fg‖W 1,3/2(R3) ∼ ‖fg‖L3(R3) + ‖∇(fg)‖L3(R3)

we see that it will now suffice to show that

‖∇(fg)‖L3/2(R3) . 1.

Comparing this estimate with (2) we observe that the derivative ∇ is on the outsideof the product fg here, whereas in (2) the derivative is applied to f and g separately.But of course we can relate one to the other by the Leibnitz rule (or product rule)

∇(fg) = (∇f)g + f(∇g). (3)

From Holder we have

‖(∇f)g‖L3/2(R3) ≤ ‖∇f‖L2(R3)‖g‖L6(R3) . 1

and

‖f(∇g)‖L3/2(R3) ≤ ‖f‖L6(R3)‖∇g‖L2(R3) . 1

and so the claim follows from the triangle inequality. Note how the two terms(∇f)g and f(∇g), while similar, had to be treated in slightly different ways, andso ∇(fg) could not be treated directly by a single application of Holder.

It is also instructive to work through (1) with a specific (but slightly informal)example. Let f and g be non-trivial bump functions of height 1 adapted to ballsof radius 1/N and 1/M respectively for some N,M ≥ 1 (the cases N ≤ 1 orM ≤ 1 require a slightly different treatment, in which the lower order terms in theSobolev norms become more important, but we leave this to the reader). Then‖f‖L2(R3) ∼ N−3/2 and ‖g‖L2(R3) ∼ M−3/2. The function f has “wavelength”∼ 1/N and thus “frequency” ∼ N ; more concretely, the Fourier transform of f(which is a modulated Schwartz function of heightN−3 adapted to the ball B(0, N))

LECTURE NOTES 6 3

is concentrated on frequencies of magnitude comparable to N . Because of this, weexpect ∇f to “behave like” Nf , and indeed ∇f is a bump function of heightN adapted to the same ball of radius 1/N . In particular it is not hard to seethat ‖∇f‖L2(R3) ∼ N−1/2, and so ‖f‖W 1,2(R3) ∼ N−1/2. Similarly ‖g‖W 1,2(R3) ∼

M−1/2. Observe that this is consistent with Sobolev embedding, for instance wehave ‖f‖L6(R3) ∼ N−1/2 and ‖g‖L6(R3) ∼M−1/2.

Now let us look at the product fg. At worst this will be a bump of height 1adapted to a ball of radius 1/max(N,M) (this is when the balls supporting f, gare nested; otherwise fg is either smaller than this, or vanishes entirely). Thus the“wavelength” here is 1/max(N,M) and the “frequency” is max(N,M). This is tiedto the fundamental observation that the frequency of a product is the sum of thefrequencies of the factors:

e2πiξ1·x × e2πiξ2·x = e2πi(ξ1+ξ2)·x

or in terms of Fourier transforms

f g(ξ) =

ξ1+ξ2=ξ

f(ξ1)g(ξ2) dξ1.

So we expect to have

‖fg‖L3/2(R3) . max(N,M)−2

and

‖∇(fg)‖L3/2(R3) . max(N,M)−1

and so the inequality (1) becomes

max(N,M)−1 . N−1/2M−1/2.

But this is easily verified by splitting into two cases N ≥ M and N ≤ M andrecalling thatN,M ≥ 1. This splitting of cases is the analogue of the decomposition(3); notice in our specific example that when N ≥ M the term (∇f)g dominates,whereas when N ≤M the term f(∇g) dominates. This example also shows that (1)is rarely sharp; even in the model case of bump functions, we only expect equalitywhen the bumps have coincident support and with wavelength ≪ 1.

Pretending for the moment that ∇ is invertible, the decomposition (3) can be recastas

fg = ∇−1((∇f)g) +∇−1(f(∇g)). (4)

The two expressions ∇−1((∇f)g) and ∇−1(f(∇g)) are thus two components ofthe product fg, and are thus model examples of paraproducts. Roughly speaking,∇−1((∇f)g) is the portion of fg which favours the “high-low” interactions when ahigh-frequency component of f is multiplied with a low-frequency component of g,whereas ∇−1(f(∇g)) is the portion which favours the “low-high” interaction whena low-frequency component of f is multiplied with a high-frequency component ofg. (The situation is more subtle with the “high-high” interactions in which highfrequency components of f and g multiply and cancel in phase to create a lowfrequency contribution to fg. In this case ∇−1((∇f)g) and ∇−1(f(∇g)) are quitelarge compared to fg but have opposite sign, and so largely cancel each other.Thus in this case, these two expressions are not really behaving like paraproducts,as they are worse than the initial product.)

4 TERENCE TAO

To deal with more general Sobolev spaces we need to find extensions of (3) to otherdifferential operators, for instance can we decompose 〈∇〉s(fg) in a similar fashion?For ∇2 we can use the Leibnitz rule twice to give

∇2(fg) = (∇2f)g + f(∇2g) + 2∇f∇g.

A useful heuristic from PDE is that the worst terms always are the highest or-der terms (ones involving the highest order of differentiation), so we arrive at theheuristic

∇2(fg) ≈ (∇2f)g + f(∇2g).

More generally we see

∇k(fg) ≈ (∇kf)g + f(∇kg)

for integer k ≥ 0. Extrapolating from this, we could conjecture some sort of frac-tional Leibnitz rule

D(fg) ≈ (Df)g + f(Dg)

for any positive order differential or pseudodifferential operator such as 〈∇〉s fors ≥ 0. Of course this conjecture is not well formulated at present because the ≈symbol is undefined. Nevertheless we shall see shortly that a version of this rulecan indeed be made rigorous.

2. Coifman-Meyer multipliers

Just as linear Fourier multipliers are a good framework with which to study constantco-efficient differential operators and related objects, multilinear Fourier multipliersare a good framework to study (translation-invariant) multilinear operators. Forsimplicity we discuss only the bilinear case, although the multilinear case is quitesimilar. The starting point is the product formula

fg(x) =

Rd

Rd

e2πix·(ξ1+ξ2)f(ξ1)g(ξ2) dξ1dξ2.

Inspired by this, we define the bilinear multiplier Tm for any (locally integrable,tempered) function m : Rd ×Rd → C and Schwartz f, g ∈ S(Rd) by the formula

Tm(f, g)(x) :=

Rd

Rd

m(ξ1, ξ2)e2πix·(ξ1+ξ2)f(ξ1)g(ξ2) dξ1dξ2

or equivalently

Tm(f, g)(ξ) =

ξ1+ξ2=ξ

m(ξ1, ξ2)f(ξ1)g(ξ2) dξ1.

Formally, Tm is the unique bilinear operator such that

Tm(eξ1 , eξ2) = m(ξ1, ξ2)eξ1+ξ2

for all frequencies ξ1, ξ2 ∈ Rd, where we use eξ to denote the character (or planewave) eξ(x) := e2πix·ξ. Thus bilinear Fourier multipliers multiply plane wavestogether (thus adding their frequencies), but also modulate their amplitude by asymbol m(ξ1, ξ2).

In analogy with the linear case, we refer to m as the symbol of Tm. For instance,we have fg = T1(f, g), (∇f)g = Tiξ1(f, g), f(∇g) = Tiξ2(f, g), and ∇(fg) =

LECTURE NOTES 6 5

Ti(ξ1+ξ2)(f, g), where we extend the notation to vector valued m in the obviousmanner. Note that as the map m 7→ Tm is linear in m, this gives a Fourier-analyticproof of the Leibnitz rule (3). The composition calculus is more complicated inthe bilinear case than in the linear one, since there is not really a good notion of acomposition of two bilinear operators (except perhaps to form a trilinear operator).Nevertheless, we have the useful identities

Tm(a(D)f, g) = Ta(ξ1)m(f, g)

Tm(f, a(D)g) = Ta(ξ2)m(f, g)

a(D)Tm(f, g) = Ta(ξ1+ξ2)m(f, g)

for all “reasonable” m, a (e.g. polynomial growth and smooth will suffice) andSchwartz f, g. Thus linear Fourier multipliers can easily be absorbed into bilinearFourier multipliers.

For linear Fourier multipliers a(D), we have the simple adjoint relationship a(D)∗ =a(D). The situation is slightly more complicated for bilinear operators, because itturns out there are two adjoint operators (or more precisely, transpose operators).Indeed, for any Schwartz f, g, h and reasonable m (e.g. polynomial growth), asimple application of Fubini and Parseval shows that

Rd

Tm(f, g)h =

Rd

Tm′(g, h)f

=

Rd

Tm′′(h, f)g

=

ξ1+ξ2+ξ3=0

m(ξ1, ξ2)f(ξ1)g(ξ2)h(ξ3) dξ1dξ2

where

m′(ξ2, ξ3) := m(−ξ2 − ξ3, ξ2); m′′(ξ3, ξ1) = m(ξ1,−ξ1 − ξ3).

Similarly we have Tm(f, g) = Tmt(g, f), where mt(ξ2, ξ1) := m(ξ1, ξ2).

Recall the Hormander-Mikhlin multiplier theorem, which established (among otherthings) the Lp boundedness properties of linear Fourier multipliers m(D) providedthat m obeyed the symbol estimates

|∇jm(ξ)| .j,d |ξ|−j

for all j ≥ 0 and ξ 6= 0 (actually we only needed this for finitely many j, namelyj = 0, 1, . . . , d + 2). There is an analogue for bilinear Fourier multipliers (andindeed for multilinear multipliers), called the Coifman-Meyer multiplier theorem.It is not quite as universally applicable as its linear counterpart - we will have to alsoestablish a more “dyadic” variant of this theorem in order to obtain satisfactoryapplications - but it is still highly useful, and captures the essential flavour ofparadifferential calculus, in particular the decomposition of interactions into high-high, high-low, and low-high frequency interactions.

Definition 2.1 (Coifman-Meyer multipliers). A Coifman-Meyer symbol is a func-tion m : Rd ×Rd → C obeying the estimates

|∇j1ξ1∇j2ξ2m(ξ1, ξ2)| .j1,j2,d (|ξ1|+ |ξ2|)

−j1−j2

6 TERENCE TAO

for all2 j1, j2 ≥ 0. The corresponding operator Tm is a Coifman-Meyer multiplier.

• If we have |ξ1| ∼ |ξ2| on the support of m, we say that Tm is a high-high Coifman-Meyer paraproduct and denote Tm also by πhh (thus differentoccurrences of πhh can refer to different multipliers, analogously to the O()notation).

• If we have |ξ1+ ξ2| ∼ |ξ2| on the support of m, we say that Tm is a low-highCoifman-Meyer paraproduct and denote Tm also by πlh.

• If we have |ξ1+ ξ2| ∼ |ξ1| on the support of m, we say that Tm is a high-lowCoifman-Meyer paraproduct and denote Tm also by πhl.

Observe from the Leibnitz rule that the product of two Coifman-Meyer symbols isstill a Coifman-Meyer symbol (and thus we also obtain analogous product proper-ties for Coifman-Meyer paraproduct symbols). This fact is not as fundamental tothe theory as the corresponding fact for linear symbols, because multiplication ofbilinear symbols is not automatically related to any composition operation, but isstill a handy fact to know nevertheless.

Let us give three model examples of paraproducts. All three involve bump functionsψj(ξ) adapted to an annulus |ξ| ∼ 2j , and ψ<j(ξ) adapted to an annulus |ξ| . 2j,and all sums are over the integers unless otherwise noted.

Example 2.2 (High-high product). The operator πhh(f, g) :=∑

j(ψj(D)f)(ψj(D)g)

is a high-high Coifman-Meyer paraproduct with symbolm(ξ1, ξ2) =∑j ψj(ξ1)ψj(ξ2).

Informally, this paraproduct multiplies high (∼ 2j) frequencies of f with high fre-quencies of g to produce comparable or lower frequencies (. 2j) in the output.

Example 2.3 (Low-high product). If the constant C is sufficiently large, the op-erator πlh(f, g) :=

∑j(ψ<j−C(D)f)(ψj(D)g) will be a low-high Coifman-Meyer

paraproduct with symbol m(ξ1, ξ2) =∑

j ψ<j−C(ξ1)ψj(ξ2) (in particular we have

|ξ1| . 2−C |ξ2|, which is what ensures |ξ1 + ξ2| ∼ |ξ2| if C is large enough. Infor-mally, this paraproduct multiplies low (≪ 2j) frequencies of f with high frequenciesof g to produce high frequencies (. 2j) in the output.

Example 2.4 (High-low product). If the constant C is sufficiently large, the op-erator πhl(f, g) :=

∑j(ψj(D)f)(ψ<j−C (D)g) will be a high-low Coifman-Meyer

paraproduct with symbol m(ξ1, ξ2) =∑

j ψj(ξ1)ψ<j−C(ξ2).

Example 2.5 (Non-Coifman-Meyer paraproducts). In general, expressions such as∑j

∑k cj,k(ψj(D)f)(ψk(D)f), where cj,k are bounded constants are not Coifman-

Meyer multipliers (in contrast ot the linear situation, in which∑

j cjψj(D) is a

Hormander-Mikhlin multiplier). Indeed, for certain extremely pathological choicesof cj,k, such expressions do not obey the expected Lp estimates (a result of Grafakosand Kalton). Related to this is the fact that the Coifman-Meyer multiplier classis not closed under composition with Hormander-Mikhlin multipliers; if Tm is aCoifman-Meyer multiplier and a(D) is a Hormander-Mikhlin multiplier then Tm(a(D)f, g),Tm(f, a(D)g), and a(D)Tm(f, g) are in general not Coifman-Meyer multipliers. Itis because of these facts that Coifman-Meyer multipliers do not, by themselves,

2Actually, in practice only a finite number of j1, j2 depending on d are needed.

LECTURE NOTES 6 7

form a fully satifactory class of multipliers. Nevertheless we shall see later that itis possible to redress this failing to a large extent by introducing a class of “residualparaproducts” to absorb certain error terms.

Example 2.6 (Leibnitz rule). Let’s work in one dimension for simplicity. The twoexpressions ∇−1((∇f)g and ∇−1(f(∇g)) are (formally) bilinear multipliers withsymbols ξ1/(ξ1+ξ2) and ξ2/(ξ1+ξ2) respectively. These are certainly not Coifman-Meyer multipliers - they are not even locally integrable - but they behave somewhatlike high-low and low-high paraproducts, in that the former is concentrated in theregion |ξ1| & |ξ2| and the latter in the region |ξ2| & |ξ1|.

We remark that the transposes of a Coifman-Meyer multiplier, as defined above, arestill Coifman-Meyer multipliers. These transpose operations can convert low-highparaproducts to high-high, etc.; we leave the precise description of the permutationsas an exercise to the reader.

One pleasant fact about Coifman-Meyer multipliers is that they can always bedecomposed into paraproducts:

Lemma 2.7 (Bony’s paraproduct decomposition). Let Tm be a Coifman-Meyerparaproduct. Then we have the decomposition

Tm(f, g) = πhh(f, g) + πhl(f, g) + πlh(f, g)

for some high-high, high-low, and low-high Coifman-Meyer paraproducts πhh, πhl, πlhrespectively (which of course will depend on m. In particular, the pointwise producthas such a decomposition:

fg = πhh(f, g) + πhl(f, g) + πlh(f, g).

Proof There are several ways to perform such a decomposition. One such wayinvolves a Littlewood-Paley decomposition 1 =

∑j ψj , where each ψj is a bump

function adapted to the annulus {ξ : 2j−1 ≤ |ξ| ≤ 2j+1}. The telescoping sumsψ<j :=

∑k<j ψk are then bump functions adapted to the ball {ξ : |ξ| ≤ 2j+1}. We

then have the partition of unity

1 =∑

j

k

ψj(ξ1)ψk(ξ2)

=∑

j

k<j−5

ψj(ξ1)ψk(ξ2) +∑

j

|k−j|≤5

ψj(ξ1)ψk(ξ2) +∑

j

k>j+5

ψj(ξ1)ψk(ξ2)

=∑

j

ψj(ξ1)ψ<j−5(ξ2) +∑

j

|k−j|≤5

ψj(ξ1)ψk(ξ2) +∑

k

ψ<k−5(ξ1)ψk(ξ2).

The three expressions on the right can be easily verified to be high-low, high-high, and low-high Coifman-Meyer symbols respectively. Multiplying both sides bym(ξ1, ξ2) we obtain the result.

By the triangle inequality, we thus see that to estimate Coifman-Meyer multipliersit suffices to treat each type of paraproduct separately. This we now do.

8 TERENCE TAO

3. Paraproduct estimates

The pointwise product (f, g) 7→ fg obeys the Holder inequality

‖fg‖Lr(Rd) ≤ ‖f‖Lp(Rd)‖g‖Lq(Rd)

whenever 0 < p, q, r ≤ ∞ and 1/p+ 1/q = 1/r. A useful and pleasant fact is thatparaproducts πhh(f, g), πhl(f, g), πlh(f, g) obey analogues of this Holder inequalityfor large ranges of p, q, r (though not quite as large as for the Holder inequalityitself). We illustrate this fact first with some model examples and then steadilybuild up to more general theorems, culminating in the Coifman-Meyer multipliertheorem.

We first observe a simple lemma which will be very useful in the computationswhich follow. It is a formalisation of the heuristic that band-limited functions (e.g.functions whose Fourier transform are localised to the ball B(0, 2k)) are “essen-tially” constant at scale 2−k. Here, we shall use the Hardy-Littlewood maximalinequality to formalise the modifier “essentially”.

Lemma 3.1 (Local constancy of band-limited functions). Let j ∈ Z, and let ψ≤j

be a bump function adapted to a ball {|ξ| . 2j}. Then we have

|ψ≤j(D)f(y)| .d 〈2j(y − x)〉dMf(x)

and more generally

|∇kψ≤j(D)f(y)| .k,d 2jk〈2j(y − x)〉dMf(x) (5)

for all x, y ∈ Rd and k ≥ 0, where M is the Hardy-Littlewood maximal function.In particular, if f itself has Fourier transform supported in B(0, 2j), then

|f(y)| .d 〈2j(y − x)〉dMf(x)

and more generally

|∇kf(y)| .k,d 2jk〈2j(y − x)〉dMf(x).

Proof It suffices to prove (5). We can translate so that x = 0, and then rescale sothat j = 0. Expressing ∇kψ≤0(D) in physical space, we reduce to showing that

|

Rd

∇kψ≤0(y − z)f(z) dz| .k,d 〈y〉dMf(0).

We use the pointwise bound

∇kψ≤0(y − z) = Ok,d(〈y − z〉−100d)

and reduce to showing that∫

Rd

〈y − z〉−100d|f(z)| dz .d 〈y〉dMf(0).

For the region 〈z〉 . 〈y〉 this follows by estimating 〈y − z〉−100d crudely by O(1).For the region 〈z〉 ≫ 〈y〉 we estimate 〈y − z〉−100d by 〈z〉−100d and use dyadicdecomposition in |z|.

LECTURE NOTES 6 9

Now let us first consider the high-high paraproduct

πhh(f, g) :=∑

j

(ψj(D)f)(ψj(D)g)

in Example 2.2. From Cauchy-Schwarz we see that

|πhh(f, g)| ≤ (Sf)(Sg)

where S is the square function

Sf := (∑

j

|ψj(D)f |2)1/2.

Thus for any 0 < p, q, r ≤ ∞ with 1/p+ 1/q = 1/r, the ordinary Holder inequalitygives

‖πhh(f, g)‖Lr(Rd) ≤ ‖Sf‖Lp(Rd)‖Sg‖Lq(Rd).

If we also have 1 < p, q < ∞, we thus see from the Littlewood-Paley inequality‖Sf‖Lp(Rd) ∼d,p ‖f‖Lp(Rd) (and similarly for ‖Sg‖Lq(Rd)) that

‖πhh(f, g)‖Lr(Rd) .p,q,d ‖f‖Lp(Rd)‖g‖Lq(Rd).

Thus this paraproduct enjoys a Holder type inequality, as long as we impose theadditional requirements 1 < p, q <∞.

Now let us consider the low-high paraprodyct

πlh(f, g) :=∑

j

(ψ<j−C(D)f)(ψj(D)g).

If C is large enough, the product (ψ<j−C(D)f)(ψj(D)g) has Fourier transformsupported in frequencies of magnitude ∼ 2j. So we can write

(ψ<j−C(D)f)(ψj(D)g) = ψj(D)[(ψ<j−C (D)f)(ψj(D)g)]

where ψj is a bump function adapted to an annulus {|ξ| ∼ 2j}. Now suppose thatwe have 1 < p, q, r <∞. Then by the Littlewood-Paley inequality

‖∑

j

ψj(D)fj‖Lr(Rd) .d,r ‖(∑

j

|fj|2)1/2‖Lr(Rd)

we see that

‖πlh(f, g)‖Lr(Rd) .d,r ‖(∑

j

|(ψ<j−C(D)f)(ψj(D)g)|2)1/2‖Lr(Rd).

To deal with this, we observe from Lemma 3.1 that we have the pointwise bound

ψ<j−C(D)f(x) .Mf(x)

for all j ∈ Z and x ∈ Rd. Using this bound, we see that

(∑

j

|(ψ<j−C(D)f)(ψj(D)g)|2)1/2 . (Mf)(Sg)

and so on taking Lr norms and using Holder, the Hardy-Littlewood inequality, andthe Littlewood-Paley inequality we obtain

‖πlh(f, g)‖Lr(Rd) . ‖f‖Lp(Rd)‖g‖Lq(Rd).

Note that in this case we also get the endpoint p = ∞, since the maximal functionM is trivially bounded here.

10 TERENCE TAO

A similar argument shows that the paraproduct πhl in Example 2.4 obeys theestimate

‖πhl(f, g)‖Lr(Rd) . ‖f‖Lp(Rd)‖g‖Lq(Rd).

whenever 1 < p, q, r <∞ (and one also gets the endpoint q = ∞).

We thus see that these types of paraproducts can be estimated by means of theHolder, Hardy-Littlewood, and Littlewood-Paley inequalities. Now we establish thesame estimates for more general multipliers.

Lemma 3.2 (High-high paraproducts). Let πhh be a high-high paraproduct. Thenwe have

‖πhh(f, g)‖Lr(Rd) .p,q,r,d ‖f‖Lp(Rd)‖g‖Lq(Rd)

whenever 1 < p, q <∞, f, g ∈ S(Rd), and 1/r = 1/p+ 1/q.

Proof We allow all implied constants to depend on p, q, r, d. The strategy is todecompose the paraproduct so that it resembles Example 2.2.

We use a Littlewood-Paley decomposition 1 =∑

j ψ2j to split

πhh(f, g) =∑

j

k

πhh(ψj(D)ψj(D)f, ψk(D)ψk(D)g).

The operator πhh(ψj(D)f, ψk(D)g) vanishes unless j = k + O(1), in which case itis bilinear multiplier whose symbol mjk is a bump function adapted to the domain{ξ1, ξ2 = O(2j)}. Thus by the triangle inequality

|πhh(f, g)| ≤∑

j,k:j=k+O(1)

|Tmjk(ψj(D)f, ψk(D)g)|.

To proceed further we need to do something about the symbol mjk. We shall useFourier decomposition on a cube in Rd×Rd of sidelength C2j for some sufficientlylarge C to write

mjk(ξ1, ξ2) =∑

n1,n2∈Zd

cn1,n2e2πi(n1·ξ1+n2·ξ2)/C2j

on the support of ψj(ξ1)ψk(ξ2), where the Fourier coefficients cn1,n2are rapidly

decreasing, for instance

cn1,n2. (1 + |n1|+ |n2|)

−100d.

Applying this we see that

Tmjk(ψj(D)f, ψk(D)g)(x) =

n1,n2∈Zd

cn1,n2ψj(D)f(x−n1/C2

j)ψk(D)g(x−n2/C2j)

and thus by the triangle inequality

|πhh(f, g)(x)| ≤∑

j,k:j=k+O(1)

n1,n2∈Zd

(1+|n1|+|n2|)−100d|ψj(D)f(x−n1/C2

j)||ψk(D)f(x−n2/C2j)|.

To deal with the shifts by n1/C2j and n2/C2

j we use the reproducing formula

ψj(D)f = ψj(D)ψj(D)f

LECTURE NOTES 6 11

for some suitable bump function ψj adapted to the ball B(0, 2j+3) (say). FromLemma 3.1 we have

|ψj(D)f(x − n1/C2j)| . (1 + |n1|)

dM(ψj(D)f)(x). (6)

Similarly (recalling that k = j +O(1))

|ψk(D)f(x− n2/C2j)| . (1 + |n2|)

dM(ψk(D)g)(x).

Thus we have

|πhh(f, g)| .∑

j,k:j=k+O(1)

n1,n2∈Zd

(1+|n1|+|n2|)−100d(1+|n1|)

d(1+|n2|)d|Mψj(D)f(x)||Mψk(D)f(x)|.

We can perform the n1, n2 sum to obtain

|πhh(f, g)| .∑

j,k:j=k+O(1)

|Mψj(D)f ||Mψk(D)f |.

By Schur’s test or Young’s inequality (or Cauchy-Schwartz) we conclude

|πhh(f, g)| . (∑

j

|Mψj(D)f |2)1/2(∑

k

|Mψk(D)f |2)1/2.

Taking Lr norms and using Holder we obtain

‖πhh(f, g)‖Lr(Rd) . ‖(∑

j

|Mψj(D)f |2)1/2‖Lp(Rd)‖(∑

k

|Mψk(D)f |2)1/2‖Lq(Rd).

By the Fefferman-Stein maximal inequality followed by the Littlewood-Paley in-equality we have

‖(∑

j

|Mψj(D)f |2)1/2‖Lp(Rd) . ‖(∑

j

|ψj(D)f |2)1/2‖Lp(Rd) . ‖f‖Lp(Rd)

and similarly for g. The claim follows.

Lemma 3.3 (Low-high paraproducts). Let πlh be a low-high paraproduct. Thenwe have

‖πlh(f, g)‖Lr(Rd) .p,q,r,d ‖f‖Lp(Rd)‖g‖Lq(Rd)

whenever 1 < p ≤ ∞, 1 < q, r <∞, f, g ∈ S(Rd), and 1/r = 1/p+ 1/q.

Proof We allow all implied constants to depend on p, q, r, d. We apply a similar(but not identical) strategy to the previous proof. Performing a Littlewood-Paleydecomposition to g alone, we obtain

πlh(f, g) =∑

j

πlh(f, ψj(D)ψj(D)g).

The low-high nature of the paraproduct ensures that we may replace f by ψ<j−C(D)ψ<j−C(D)ffor some sufficiently large C. Thus we can write

πlh(f, g) =∑

j

Tmj (ψ<j−C(D)f, ψj(D)g)

12 TERENCE TAO

where mj(ξ1, ξ2) := m(ξ1, ξ2)ψ<j−C(ξ1)ψj(ξ2). The Coifman-Meyer bounds on mimply that mj is a bump function adapted to the region {ξ1, ξ2 = O(2j)}. We canperform a Fourier decomposition as before and conclude

πlh(f, g)(x) =∑

j

n1,n2∈Zd

cn1,n2ψ<j−C(D)f(x− n1/C2

j)ψj(D)g(x− n2/C2j).

Taking absolute values and using Lemma 3.1 as before we obtain

|πlh(f, g)| .∑

j

n1,n2∈Zd

(1+|n1|+|n2|)−100d(1+|n1|)

d(1+|n2|)dM(ψ<j−C(D)f)M(ψj(D)g).

We can perform the n1, n2 summations, and also apply Lemma 3.1 to conclude

|πlh(f, g)| .∑

j

M(Mf)M(ψj(D)g).

If we then take Lr norms and use Holder, the Hardy-Littlewood inequality, theFefferman-Stein inequality, and the Littlewood-Paley inequality we obtain the claim.

There is of course an analogus result for high-low paraproducts with the roles ofp and q reversed. Combining all these results with Lemma 2.7 on the commondomain of p, q, r we obtain

Corollary 3.4 (Coifman-Meyer multiplier theorem, easy case). Let Tm be a Coifman-Meyer multiplier. Then

‖Tm(f, g)‖Lr(Rd) .p,q,r,d ‖f‖Lp(Rd)‖g‖Lq(Rd)

whenever 1 < p, q, r <∞ and 1r = 1

p + 1q .

4. The BMO theory

Corollary 3.4 is not fully satisfactory because it misses the endpoints when p, q, requal 1 or ∞. For individual paraproducts, we already saw that we could obtainsome of these endpoints. Now we extend the theory to these (more difficult) end-points. It turns out that for some L∞ endpoints we can in fact use the BMO norminstead. We begin with a model case.

Proposition 4.1. If πlh is the low-high paraproduct

πlh(f, g) =∑

j

(ψ<j−C(D)f)(ψj(D)g)

for C sufficiently small, then for all Schwartz f, g we have

‖πlh(f, g)‖L2(Rd) .d ‖f‖L2(Rd)‖g‖BMO(Rd).

Proof We normalise ‖g‖BMO(Rd) = ‖f‖L2(Rd) = 1, and allow all implicit constantsto depend on d. Observe that (ψ<j−C(D)f)(ψj(D)g) has Fourier support in the

LECTURE NOTES 6 13

annulus {ξ ∈ Rd : |ξ| ∼ 2j}. These annuli have only finite overlap, thus byPlancherel we have

‖πlh(f, g)‖2L2(Rd) .

j

‖(ψ<j−C(D)f)(ψj(D)g)‖2L2(Rd)

so it suffices to show that∑

j

Rd

|ψ<j−C(D)f |2|ψj(D)g|2 . 1.

We shall dyadically decompose the ψ<j−C(D)f factor. Since

|ψ<j−C(f)|2 .

k

22k1|ψ<j−C(D)f |≥2k

we have∑

j

Rd

|ψ<j−C(D)f |2|ψj(D)g|2 .∑

k

22k∫

Rd

j:|ψ<j−C (D)f |≥2k

|ψj(D)g|2.

From Lemma 3.1 we see that if |ψ<j−C(D)f | ≥ 2k then Mf > ε2k for somesufficiently small absolute constant ε > 0 (we will choose this constant later). Thuswe reduce to showing that

k

22k∫

Mf>ε2k

j:|ψ<j−C (D)f |≥2k

|ψj(D)g|2 . 1.

Now observe from the Hardy-Littlewood maximal inequality that∑

k

22k|{Mf ≥ ε22k}| .ε

Rd

Mf2 .

Rd

f2 . 1.

Thus it will suffice to show that∫

Mf>ε2k

j:|ψ<j−C (D)f |≥2k

|ψj(D)g|2 .ε |{Mf ≥ ε22k}|

for all k. By dividing f by 2k we can assume k = 0. (This destroys the L2

normalisation of f , but we will no longer need this normalisation), thus we wish toshow ∫

Mf>ε

j:|ψ<j−C (D)f |≥1

|ψj(D)g|2 .ε |{Mf ≥ ε2}|

for all f and for a sufficiently small absolute constant ε > 0.

By monotone convergence we can replace the set {Mf ≥ ε} by an arbitrary compactset E inside this set, as long as our bounds are of course uniform in E. By theusual Vitali covering lemma argument we can cover E by finitely many balls

E ⊂ 3B1 ∪ . . . ∪ 3BN

where the balls B1, . . . , BN are disjoint with∫−Bi

|f | = ε

and such that ∫−tBi

|f | ≤ ε for all t > 1.

14 TERENCE TAO

In particular we see (if ε is small enough) that

Mf > ε2 on Bi

and in particular ∑

i

|Bi| ≤ |{Mf > ε2}|.

Thus it will suffice to show that∫

3Bi

j:|ψ<j−C(D)f |≥1

|ψj(D)g|2 .ε |Bi|.

By translation we may centre Bi at the origin, thus Bi = B(0, ri). By replacingf(x), g(x) with f(rix), g(rix) we may then set ri = 1. We now wish to show that

B(0,3)

j:|ψ<j−C(D)f |≥1

|ψj(D)g|2 .ε 1.

Now by hypothesis we know that∫

B(0,t)

|f | . εtd

for all t ≥ 1. From this and the kernel bounds on ψ<j−C(D) and dyadic decompo-sition one can easily verify that

|ψ<j−C(D)f(x)| . ε

in the low frequency case j ≤ 0. Thus (if ε is small enough) we can restrict toj ≥ 0, and it suffices to show that

B(0,3)

j≥0

|ψj(D)g|2 .ε 1. (7)

We would like to say that this estimate is invariant under subtraction of a constantfrom g (since the BMO norm is insensitive to such changes), which will allow us tonormalise

∫−B(0,1)

g = 0. Unfortunately this interferes with the hypothesis that g is

Schwartz, which is necessary to define ψj(D)g properly. This is fixable by a numberof soft methods. Here is one: firstly we can restrict j to a finite range 0 ≤ j ≤ J , solong as our final bounds are uniform in J . Now we replace g by g−

∫−B(0,1)

gφ(x/R),

where R is a really large radius (larger than anything depending on g and J) andφ is a bump function adapted to B(0, 2) which equals 1 on B(0, 1). One can verifythat in the limit R → ∞ this does not affect either the BMO norm of g or theexpression (7), and that this function has mean zero on B(0, 1). Thus without lossof generality we can assume that

∫−B(0,1)

g = 0. From the John-Nirenberg inequality

and the normalisation ‖g‖BMO(Rd) = 1 we then have∫

B(0,1)

|g|2 . 1

but also more generally we have for all t ≥ 1∫−B(0,t)

g = O(td)

LECTURE NOTES 6 15

(actually, John-Nirenberg allows us to improve the right-hand side to Od(1+ log t),but we will not need this) and so

B(0,t)

|g|2 . t3d.

We split g into g1B(0,4) and g(1− 1B(0,4)). From the above estimate and the rapiddecrease of the kernel of ψj we readily see that

|ψj(D)(g(1 − 1B(0,4)))(x)| . 2−100j

for all j ≥ 0 and x ∈ B(0, 3). So this “global” portion is negligible and we onlyneed to control the local part:

B(0,3)

j≥0

|ψj(D)(g1B(0,4))|2 .ε 1.

But the left-hand side is bounded by

‖(∑

j

|ψj(D)(g1B(0,4))|2)1/2‖2L2(Rd) . ‖g1B(0,4)‖

2L2(Rd) . 1

and the claim follows.

Corollary 4.2. The same claim holds for more general low-high Coifman-Meyerparaproducts.

Proof We again normalise ‖g‖BMO(Rd) = ‖f‖L2(Rd) = 1, and allow all implicitconstants to depend on d. By arguing as in the preceding section, we can decompose

πlh(f, g) =∑

j

Tmj (ψ<j−C(D)f, ψj(D)g)

where mj is adapted to the region where {|ξ1 + ξ2| ∼ |ξ2| ∼ 2j}. By Plancherel asbefore we have

‖πlh(f, g)‖2L2(Rd) .

j

‖Tmj(ψ<j−C(D)f, ψj(D)g)‖2L2(Rd)

so it suffices to show that

(∑

j

‖Tmj(ψ<j−C(D)f, ψj(D)g)‖2L2(Rd))1/2 . 1.

On the other hand, by a Fourier decomposition we have

‖Tmj(ψ<j−C(D)f, ψj(D)g)‖L2(Rd)

.∑

n1,n2∈Zd

(1 + |n1|+ |n2|)−100d‖ψ<j−C(D)f(x− n1/C2

j)ψj(D)g(x− n2/C2j)‖L2(Rd)

and so by the triangle inequality it suffices to show that∑

n1,n2∈Zd

(1+|n1|+|n2|)−100d(

j

‖ψ<j−C(D)f(x−n1/C2j)ψj(D)g(x−n2/C2

j)‖2L2(Rd))1/2 . 1.

But by repeating the proof of the previous proposition we can show that

(∑

j

‖ψ<j−C(D)f(x−n1/C2j)ψj(D)g(x−n2/C2

j)‖2L2(Rd))1/2 . (1+|n1|+|n2|)

20d.

16 TERENCE TAO

The basic point is that the n1 and n2 shifts can be viewed as a phase modulationon the symbol of ψ<j−C and ψj . An inspection of the previous proof will revealthat one only needed to bound the first 10d derivatives of these symbols in orderto close the argument, which is what leads to the loss of (1 + |n1| + |n2|)

20d. Theclaim follows.

We now see that if g ∈ BMO, then the linear operator f 7→ πlh(f, g) is bounded onL2. In fact more is true:

Proposition 4.3. Let ‖g‖BMO(Rd) = O(1), and let πlh be a low-high Coifman-Meyer paraproduct. Then the operator f 7→ πlh(f, g) is a CZO. Similarly for anyhigh-high Coifman-Meyer paraproduct πhh, the operator f 7→ πhh(f, g) is a CZO.

Proof The second claim follows from the first by taking adjoints (note the adjointof a CZO is a CZO). Since we have already established L2 boundedness, the onlyremaining task is to establish singular integral bounds.

We will do this for the model example

πlh(f, g) =∑

j

(ψ<j−C(D)f)ψj(D)g;

the more general cases can be obtained by using the Fourier decompositions alreadyemployed previously. To avoid technicalities let us implicitly restrict j to a finiterange, e.g. −J ≤ j ≤ J ; our bounds will be independent of J and so standardlimiting arguments will allow us to remove the finite range restriction. The kernelK(x, y) of this operator is given by the formula

K(x, y) =∑

j

ψ<j−C(y − x)ψj(D)g(x).

Since ψ<j−C is a bump function adapted to the ball of radius O(2j), we have

|∇kψ<j−C(x− y)| . 2jk2dj〈2j(x− y)〉−100d

for k = 0, 1; since g is bounded in BMO one also easily verifies that

|∇kψj(D)g(x)| . 2jk

for k = 0, 1. Thus

|∇kx,yK(x, y)| .

j

2jk2dj〈2j(x− y)〉−100d . |x− y|d−k

for k = 0, 1, and the claim follows.

Corollary 4.4. Corollary 3.4 also holds in the boundary cases when exactly one ofp, q, r′ is equal to ∞.

Proof By duality we may assume that q is ∞. By Lemma 2.7 we may assume thatTm is either πhh, πlh, or πhl. In the first two cases we can use the above Corollaryand Calderon-Zygmund theory; in the last case we use Lemma 3.3 with the rolesof f and g reversed.

LECTURE NOTES 6 17

We remark that one can also show that Corollary 3.4 holds in the regime when1 < p, q < ∞ and r is allowed to be less than 1 (as in Lemma 3.2), but this is alittle trickier to show (though in much the same spirit as the earlier results) andwill not be done here.

5. Residual paraproducts

As mentioned earlier, Coifman-Meyer multipliers have the disadvantage that theyare not closed under composition with linear Fourier multipliers such as Hormander-Mikhlin multipliers. The following class of multipliers performs somewhat betterin this regard3.

Definition 5.1 (Residual paraproducts). A residual symbol is a function m : Γ →C on the space Γ := {(ξ1, ξ2, ξ3) ∈ Rd ×Rd ×Rd : ξ1 + ξ2 + ξ3 = 0} obeying theestimates

|∇j11 ∇j2

2 ∇j33 m(ξ1, ξ2, ξ3)| .j1,j2,j3,d,ε [

min(|ξ1|, |ξ2|, |ξ3|)

max(|ξ1|, |ξ2|, |ξ3|)]ε

abc=123,231,312

min(|ξb|, |ξc|)−ja

(8)

for all4 j1, j2, j3 ≥ 0 and all non-zero ξ1, ξ2, ξ3 ∈ Rd and some ε > 0, where ∇1

is the gradient in the directions5 ξ1 = const, etc. The corresponding multiplierTm (where we abuse notation and write m(ξ1, ξ2) := m(ξ1, ξ2,−ξ1 − ξ2)) will becalled a residual paraproduct and will be denoted πr. We can classify these residualparaproducts further as high-high, low-high, and high-low residual paraproductsπrhh, π

rlh, π

rhl respectively.

Examples 5.2. The symbol

[min(|ξ1|, |ξ2|, |ξ3|)

max(|ξ1|, |ξ2|, |ξ3|)]ε|ξ1|

it1 |ξ2|it2 |ξ3|

it3

is a residual symbol for any bounded real t1, t2, t3. The product of a residualsymbol with either a residual symbol or a Coifman-Meyer symbol is another residualsymbol. The operator

πr(f, g) :=∑

j1,j2,j3

cj1,j2,j3ψj3(D)[(ψj1 (D)f)(ψj2 (D)g)],

where each ψj is a bump function adapted to the annulus {|ξ| ∼ 2j}, is a residualoperator as long as the constants cj1,j2,j3 obey the decay condition

cj1,j2,j3 = O(2−ε(max(j1,j2,j3)−min(j1,j2,j3))).

Remark 5.3. To understand the estimates (8), let us work for instance in the “low-high” case when |ξ2| ∼ |ξ3| (and so |ξ1| . |ξ2|). Then (8) asserts that m is boundedby (|ξ1|/|ξ2|)

ε (so it decays a little away from the diagonal |ξ1| ∼ |ξ2|), and that onecan move each of ξ1, ξ2, ξ3 by a small multiple of |ξ1|, |ξ2|, |ξ3| respectively (stayingin Γ, of course) without encountering any significant fluctuations or irregularity inthe symbol.

3This notation is not standard in the literature.4Actually, in practice only a finite number of j1, j2 depending on d are needed.5If one dropped the ξ3 variable and viewed m as a function purely of ξ1 and ξ2, then (up to

irrelevant constants) ∇1 = ∇ξ2 , ∇2 = ∇ξ1 , and ∇3 = ∇ξ1 +∇ξ2 .

18 TERENCE TAO

The residual and Coifman-Meyer paraproducts are incomparable; residual para-products oscillate too much to be Coifman-Meyer, and Coifman-Meyer paraprod-

ucts do not have the decay factor [ min(|ξ1|,|ξ2|,|ξ3|)max(|ξ1|,|ξ2|,|ξ3|)

]ε which is crucial to residual

paraproducts. Unfortunately, it turns out that eliminating this factor can causethe multipliers to cease obeying good estimates (see Proposition 5.5 below). How-ever, with this decay we have good estimates:

Proposition 5.4 (Residual multiplier theorem). Let πr be a residual multiplier.Then

‖πr(f, g)‖Lr(Rd) .p,q,r,d ‖f‖Lp(Rd)‖g‖Lq(Rd)

whenever 1 ≤ p, q, r ≤ ∞ and 1r = 1

p + 1q , with (p, q) 6= (∞,∞), (∞, 1), (1,∞).

Proof We suppress all dependence of constants on p, q, r, d. By applying Lemma2.7 (which applies just as easily to residual multipliers as to Coifman-Meyer mul-tipliers) we may assume that πr is a high-high, low-high, or high-low paraproduct.By duality we may reduce to the low-high case. We use Littlewood-Paley decom-position to split

πr(f, g) =∑

j1,j2,j3

ψj3(D)Tmj1,j2,j3(ψj1(D)f, ψj2 (D)g)

where

mj1,j2,j3(ξ1, ξ2, ξ3) = m(ξ1, ξ2, ξ3)2∏

i=1

ψji(ξi).

Since we are in the low-high case we may take j2 = j3 + O(1) and j1 ≤ j2 + O(1).Let us write j1 = j2 − k, then we see that mj1,j2,j3 = 2−εkm′

j1,j2,j3where m′ is a

bump function adapted to the region {(ξ1, ξ2,−ξ1 − ξ2) : |ξ1| ∼ 2j1 ; |ξ2| ∼ 2j2}. Bythe triangle inequality it thus suffices to show that

‖∑

j1,j2,j3:j1=j2−k,j3=j2+l

ψj3(D)Tm′

j1,j2,j3(ψj1(D)f, ψj2 (D)g)‖Lr(Rd) . ‖f‖Lp(Rd)‖g‖Lq(Rd)

for all k ≥ −O(1) and l = O(1).

Fix k, l. By the Littlewood-Paley inequality the left-hand side is

. ‖(∑

j2

|Tm′

j2−k,j2,j2+l(ψj2−k(D)f, ψj2 (D)g)|2)1/2‖Lr(Rd).

By the usual Fourier decomposition we can estimate

|Tm′

j2−k,j2,j2+l(ψj2−k(D)f, ψj2(D)g)|(x)

.∑

n1,n2

(1 + |n1|+ |n2|)−100d|ψj2−k(D)f(x− n1/C2

j2−k)||ψj2 (D)g(x− n2/C2j2)|

and then by using (6) (or Lemma 3.1) we obtain

|Tm′

j2−k,j2,j2+l(ψj2−k(D)f, ψj2(D)g)|(x) .

n1,n2

(1+|n1|+|n2|)−100d(1+|n1|)

d(1+|n2|)d(Mψj2−k(D)f)(Mψj2 (D)g)

The n1, n2 sum is O(1) and can be discarded. One then uses Cauchy-Schwarz,Holder, Fefferman-Stein, and Littlewood-Paley as before.

LECTURE NOTES 6 19

Finally, we give an example which shows that multiplier estimates can sometimesfail.

Proposition 5.5 (Grafakos-Kalton example). Let 1 < p, q, r < ∞ be such that1/p+ 1/q = 1/r. The estimate

‖∑

j1,j2,j3

cj1,j2,j3ψj3(D)(ψj1 (D)f, ψj2(D)g)‖Lr(Rd) .p,q,r,d ‖f‖Lp(Rd)‖g‖Lq(Rd)

does not hold uniformly for all choices of uniformly bounded constants cj1,j2,j3 , andfor bump functions ψj uniformly adapted to annuli {|ξ| ∼ 2j}.

Proof To simplify the notation slightly we work in one dimension d = 1. Theidea is to assume the estimate is false, and then (by inspecting various frequencylimits of the estimate) derive increasingly ridiculous (and eventually patently false)estimates as a result.

If the estimate failed, then

‖∑

j2>0

(ψj2(D)g)(∑

j1<0

cj1,j2ψj1(D)f)‖Lr(R) . ‖f‖Lp(R)‖g‖Lq(R)

whenever cj1,j2 are bounded, where we shall suppress all dependence of implicitconstants on p, q, r, d. Thus if we let T1, . . . , TN be an arbitrary collection ofHormander-Mikhlin multipliers, each of the form

Tn =∑

j<0

cj,nψj(D)

then we have

‖N∑

n=1

(Tnf)ψjn(D)g‖Lr(R) . ‖f‖Lp(R)‖g‖Lq(R)

for arbitrary distinct positive j1, . . . , jN . By the Littlewood-Paley inequality thiswould imply

‖(

N∑

n=1

|Tnf |2|ψjn(D)g|2)1/2‖Lr(R) . ‖f‖Lp(R)‖g‖Lq(R).

Now we specialise to a function g of the form

g =N∑

n=1

e2πi2jnxgn(x)

where gn are fixed Schwartz functions with compactly supported Fourier transform.If the jn are sufficiently large and separated from each other (depending on the gn),we have

|ψjn(D)g| = |gn|

and (by the Littlewood-Paley inequality)

‖g‖Lq(R) ∼ ‖(

N∑

n=1

|gn|2)1/2‖Lq(R).

20 TERENCE TAO

We thus conclude that

‖(N∑

n=1

|Tnf |2|gn|

2)1/2‖Lr(R) . ‖f‖Lp(R)‖(N∑

n=1

|gn|2)1/2‖Lq(R).

By density this estimate is in fact true for arbitrary functions gn. If we set gn(x) :=g(x)1n=n(x), where n(x) is the index n which maximises |Tnf(x)| for each x, wehave

(

N∑

n=1

|Tnf |2|gn|

2)1/2 = |g| sup1≤n≤N

|Tnf |

and thus

‖|g| sup1≤n≤N

|Tnf |‖Lr(R) . ‖f‖Lp(R)‖g‖Lq(R).

This is true for all g, so by the converse Holder inequality we obtain a “granduniversal maximal inequality”

‖ sup1≤n≤N

|Tnf |‖Lp(R) . ‖f‖Lp(R).

Since N is arbitrary, we can use monotone convergence and conclude

‖ sup|cj|≤1

|∑

j<0

cjψj(D)f |‖Lp(R) . ‖f‖Lp(R).

But

sup|cj|≤1

|∑

j<0

cjψj(D)f | =∑

j<0

|ψj(D)f |

and so we have shown

‖∑

j<0

|ψj(D)f |‖Lp(R) . ‖f‖Lp(R).

By rescaling we may replace j < 0 by j < J for any J ; letting J → ∞ and usingmonotone convergence we conclude

‖∑

j

|ψj(D)f |‖Lp(R) . ‖f‖Lp(R).

Setting f(x) =∑N

n=1 e2πi2jnxfn(x) for some band-limited fn as before, and letting

the jn get widely spaced and go to infinity, we conclude

N∑

n=1

|fn|‖Lp(R) . ‖(

N∑

n=1

|fn|2)1/2‖Lp(R)

for all such fn, and hence (by limiting arguments) for arbitrary fn. But this isclearly false, as can be seen for instance by setting fn := 1[0,1].

6. The paradifferential calculus

Now we put all these paraproduct estimates to work. Let us recall some definitionsof linear symbols.

LECTURE NOTES 6 21

Definition 6.1. Let s ≥ 0. A homogeneous symbol of order s is a symbol m :Rd → C which obeys the estimates

|∇jm(ξ)| .j,s,d |ξ|s−j

for all j ≥ 0 and ξ 6= 0, whereas an inhomogeneous symbol of order s is a symbolm : Rd → C which obeys the estimates

|∇jm(ξ)| .j,s,d 〈ξ〉s−j .

The corresponding Fourier multipliers m(D) are referred to as homogeneous andinhomogeneous Fourier multipliers of order s respectively.

These multipliers almost commute with paraproducts in certain ways:

Lemma 6.2 (Kato-Ponce type commutator identities). Let s ∈ R, and let Ds bean inhomogeneous symbol of order s.

• If πlh is a low-high paraproduct, then

Dsπlh(f, g) = πlh(f,Dsg) + πlh(∇f, 〈∇〉s−1g) = π′

lh(f, 〈∇〉sg)

for some other (vector-valued) low-high paraproduct πlh and low-high para-product π′

lh.• If πhl is a high-low paraproduct, then

Dsπhl(f, g) = πhl(Dsf, g) + πhl(〈∇〉s−1g,∇f) = π′

hl(Ds, g)

for some high-low paraproducts πlh, π′lh.

• If πhh is a high-high paraproduct, then

πhh(Dsf, g) = πhh(f,D

sg) + πhh(∇f, 〈∇〉s−1g) = π′hh(f, 〈∇〉sg)

for some high-high paraproducts πhh, π′hh.

The implied constants in the symbol bounds for πlh, etc. may depend on s. Similarclaims hold for homogeneous symbols Ds of order s, but with 〈∇〉 replaced by |∇|.

Proof We just prove the low-high inhomogeneous case, as the other cases aresimilar. Writing the symbol of Ds as ms(ξ) and the symbol of πlh as mlh(ξ1, ξ2),we observe that the bilinear operator Dsπlh(f, g)− πlh(f,D

sg) has symbol

mlh(ξ1, ξ2)[ms(ξ1 + ξ2)−ms(ξ2)].

This is supported in the region |ξ1+ ξ2| ∼ |ξ2|. Using Littlewood-Paley multipliers,we can subdivide further into the regions |ξ1| ≤

12 |ξ2| and |ξ1| ≥

14 |ξ2|. Suppose we

are in the former region. Then by the fundamental theorem of calculus we have

mlh(ξ1, ξ2)[ms(ξ1 + ξ2)−ms(ξ2)] =

∫ 1

0

ξ1 ·mlh(ξ1, ξ2)∇ms(tξ1 + ξ2) dt.

One can easily verify thatmlh(ξ1, ξ2)∇ms(tξ1+ξ2)〈2πξ2〉

1−s is a low-high Coifman-Meyer paraproduct uniformly for t ∈ [0, 1], and the claim follows (using Minkowski’sinequality to deal with the integration in t).

Now suppose instead we are in the region |ξ1| ≥14 |ξ2|, which when combined with

the low-high nature of πlh implies that ξ1, ξ2, ξ1 + ξ2 are all comparable. In this

22 TERENCE TAO

case we cannot use the fundamental theorem of calculus as before because the linesegment {tξ1 + ξ2 : 0 ≤ t ≤ 1} can pass close to the origin. Instead, we write

mlh(ξ1, ξ2)[ms(ξ1 + ξ2)−ms(ξ2)] = ξ1 ·

ξ1|ξ1|2

(mlh(ξ1, ξ2)[ms(ξ1 + ξ2)−ms(ξ2)]).

One easily verifies that ξ1|ξ1|2

(mlh(ξ1, ξ2)[ms(ξ1+ξ2)−m

s(ξ2)])〈2πξ2〉1−s is a low-high

Coifman-Meyer multiplier (without attempting to exploit any cancellation betweenthe ms terms), and the claim follows.

To obtain the cruder representation of π′lh(f, 〈∇〉sg), one simply notes that ms(ξ1+

ξ2)mlh(ξ1, ξ2)〈ξ2〉−s is a Coifman-Meyer low-high symbol.

One can also move positive-order operators from the low frequency to the high,leaving a residual error:

Lemma 6.3 (Moving derivatives). Let s = s1 + s2 > 0 for some s1, s2 ≥ 0, andlet Ds be an inhomogeneous symbol of order s.

• If πlh is a low-high paraproduct, then

πlh(Dsf, g) = 〈∇〉s1πrlh(f, 〈∇〉s2g)

for some residual low-high paraproduct πrlh.• If πhl is a high-low paraproduct, then

πhl(f,Dsg) = 〈∇〉s1πrhl(〈∇〉s2f, g)

for some residual high-low paraproduct πrhl.• If πhh is a high-high paraproduct, then

Dsπhh(f, g) = πrhh(〈∇〉s1f, 〈∇〉s2g)

for some residual high-high paraproduct πrhh.

The implied constants in the symbol bounds for πlh, etc. may depend on s. Similarclaims hold for homogeneous symbols Ds of order s, but with 〈∇〉 replaced by |∇|.

Proof We again just prove the low-high case. If we letms(ξ) andmlh(ξ1, ξ2) denotethe symbols of Ds and πlh as before, then we see that πrlh will be a multiplier withsymbol

〈2π(ξ1 + ξ2)〉−s1ms(ξ1)〈2πξ2〉

−s2mlh(ξ1, ξ2).

But one easily verifies that this is a residual low-high symbol (with ε = s).

There are analogues of the above two lemmas for residual paraproducts, but weshall leave them to the reader as we do not need them here.

Let us now put all of these above estimates to work. Let Ds be an inhomogeneousFourier multiplier of order s. Applying Ds to the decomposition of the product inLemma 2.7 we have

Ds(fg) = Dsπhh(f, g) +Dsπlh(f, g) +Dsπhl(f, g)

LECTURE NOTES 6 23

and similarly

(Dsf)g = πhh(Dsf, g) + πlh(D

sf, g) + πhl(Dsf, g)

and

f(Dsg) = πhh(f,Dsg) + πlh(f,D

sg) + πhl(f,Dsg).

Subtracting and using Lemma 6.2, we have the fractional Leibnitz rule

Ds(fg) = (Dsf)g + f(Dsg) + T (f, g)

where T is a bilinear operator of the form

T (f, g) = Dsπhh(f, g)− πhh(Dsf, g)− πhh(f,D

sg)

+ πlh(∇f, 〈∇〉s−1g)− πlh(Dsf, g) + πhl(〈∇〉s−1f,∇g)

− πhl(f,Dsg).

This operator can be simplified a fair bit by using the above identities. For instanceone can express T in the form

T (f, g) = π(〈∇〉θf, l〈∇〉s−θg) + π′(〈∇〉s−θf, l〈∇〉θg)

for any 0 < θ < min(s, 1), where π, π′ are linear combinations of Coifman-Meyerand residual paraproducts; we leave this as an exercise to the reader. This shouldbe compared with the usual Leibnitz rule

∇k(fg) = (∇kf)g + f(∇kg) +O(|∇f ||∇k−1g|+ . . .+ |∇k−1f ||∇g|)

for integer k ≥ 1.

The fractional Leibnitz rule leads to a number of product estimates in Sobolevspaces. A complete list of such estimates would be impossible here, so let us justgive a sample:

Proposition 6.4. Let 0 < s < 1 and let 1 < p, q, r < ∞ be such that 1/p+ 1/q =1/r, and let Ds be a Fourier multiplier of order s. Let s1, s2 > 0 be such thats1 + s2 = s. Then

‖Ds(fg)− (Dsf)g − f(Dsg)‖Lr(Rd) .p,q,r,s,s1,s2,d ‖f‖W s1,p(Rd)‖g‖W s2,q(Rd)

and

‖fg‖W r,s(Rd) .p,q,s,d ‖f‖W s,p(Rd)‖g‖Lq(Rd) + ‖f‖Lp(Rd)‖g‖W s,q(Rd)

for all Schwartz f, g.

Proof For the first claim, we see from the previous discussion that

Ds(fg)− (Dsf)g − f(Dsg) = π(〈∇〉s1f, 〈∇〉s2g)

for some linear combination π of Coifman-Meyer and residual paraproducts, andso by applying Corollary 3.4 and Proposition 5.4

‖Ds(fg)− (Dsf)g − f(Dsg)‖Lr(Rd) . ‖〈∇〉s1f‖Lp(Rd)‖〈∇〉s2g‖Lq(Rd)

hence the claim.

24 TERENCE TAO

For the second claim, we need to estimate ‖〈∇〉s(fg)‖Lr(Rd). One could use thefull Leibnitz formula as before, but for this particular estimate we can instead usethe simpler paraproduct decomposition

〈∇〉s(fg) = 〈∇〉sπhh(f, g) + 〈∇〉sπlh(f, g) + 〈∇〉sπhl(f, g)

= πrhh(〈∇〉sf, g) + π′lh(f, 〈∇〉sg) + π′

hl(〈∇〉sf, g)

thanks to Lemma 2.7, Lemma 6.2, and Lemma 6.3. The claim then easily followsby using Corollary 4.4 and Proposition 5.4.

It is also instructive to establish such estimates by direct Littlewood-Paley decom-position, avoiding the Coifman-Meyer paradifferential calculus.

There is of course an extension of this bilinear calculus to trilinear operators, etc.but the theory becomes notationally messy, and in practice one can usually ob-tain whatever trilinear or multilinear estimates necessary by concatenating bilinearestimates together, or by working things out by hand using Littlewood-Paley mul-tipliers.

Finally, we remark that in the theory of nonlinear dispersive equations (such asnonlinear Schrodinger and wave equations) there has been significant interest inestablishing bilinear or trilinear multiplier estimates when the symbol does not obeyCoifman-Meyer or residual type bounds, but instead has singularities concentratednear larger dimensional sets such as paraboloids or cones. These estimates are mosteffective at the L2 level, in which case they go by the name of Xs,b estimates. Butthese are beyond the scope of this course.

7. Fractional chain rule

We now turn from bilinear estimates to nonlinear ones, in particular understandingthe relationship between the size of a function u : Rd → C and a compositionF (u) : Rd → C, where F : C → C is a known function6. For various technicalreasons we assume F (0) = 0. Model examples include the Lipschitz case, when|F (z)−F (w)| . |z−w| for all z, w ∈ C, and the power nonlinearity case, in which|F (z)| .p |z|p and (more generally) |F (z) − F (w)| .p |z − w|(|z|p−1 + |w|p−1) forsome p ≥ 1. (Thus the Lipschitz case corresponds to the p = 1 power nonlinearitycase.) A very typical example of a power nonlinearity is the function F (z) = |z|p−1z.If p is an odd integer, then F (u) is a multilinear combination of u and u and canthus (at least in principle) be treated by the theory of the previous section, butwe now allow F to be “non-algebraic” and “non-analytic” in the sense that F (u)cannot be expressed in terms of multilinear combinations (or convergent powerseries) of u and u.

6This is of course not the only nonlinear function one wishes to consider. Another commonproblem which arises in PDE is to estimate F (u)− F (v) in terms of u, v, and u− v. But one canoften reduce these more general problems to this basic one, for instance by writing F (u)−F (v) =∫ 10 (u−v)·F ′((1−t)u+tv) dt, or else expressing F (u)−F (v) = (u−v)G(u, v) where G : C2

→ C is

the function G(z, w) :=F (z)−F (w)

z−wand applying a vector-valued version of the previous analysis.

LECTURE NOTES 6 25

In Lebesgue spaces it is clear what the relationship between u and F (u) is. For in-stance in the Lipschitz case we have F (u) = O(|u|), and hence we have ‖F (u)‖Lq(Rd) .

‖u‖Lq(Rd) for all 0 < q ≤ ∞. In the power nonlinearity case (with exponent p) we

have F (u) = Op(|u|p) and hence ‖F (u)‖Lq/p(Rd) .p ‖u‖

pLq(Rd)

for all 0 < q ≤ ∞.

In Sobolev spaces with exactly one derivative of regularity we can also get the rightestimates quickly from the chain rule. For instance, in the Lipschitz case (and forSchwartz u and F , for simplicity) we have

∇(F (u)) = ∇u · F ′(u) = O(|∇u|) (9)

and so‖∇F (u)‖Lq(Rd) . ‖∇u‖Lq(Rd)

for all 0 < q <∞. In particular this implies that

‖F (u)‖W 1,q(Rd) .q,d ‖u‖W 1,q(Rd)

for 1 < q < ∞. A similar computation in the power nonlinearity case (which weleave to the reader) gives

‖F (u)‖W 1,q/p(Rd) .q,d ‖u‖W 1,q(Rd)

for all p < q <∞. Indeed, by use of Sobolev embedding one can even improve theq/p exponent somewhat; we leave this again to the reader.

One might then hope (perhaps by some sort of interpolation) to obtain some inter-mediate result for Sobolev spaces with regularity s between 0 and 1. (For s > 1 wedo not expect any estimates unless more regularity is also placed on F ; for instance,in order to get two degrees of regularity on F (u) it is reasonable to first demandtwo degrees of regularity on F .) For instance based on the above it is reasonableto conjecture that in the Lipschitz case we have

‖F (u)‖W s,q(Rd) .q,d,s ‖u‖W s,q(Rd)

for all 0 ≤ s ≤ 1. Unfortunately, it is difficult to apply standard interpolation the-orems here, because the operator u 7→ F (u) is not linear, multilinear, or sublinear.Nevertheless, one can still try to follow the spirit of the (real) interpolation method,by trying to decompose u into “low regularity” (or “high frequency”) pieces withwhich one uses the s = 0 theory, and a “high regularity” (or “low regularity”) piecewhich can be controlled by the s = 1 theory. One has to deal with the non-linearityof F , of course, but the plan would be to use formulae such as Taylor’s theoremwith remainder, e.g.

F (v) = F (u) +

∫ 1

0

(v − u) · F ′((1 − t)u+ tv) dt.

(Here, F ′(z) is viewed as a real-linear map from C to C.) There is another, closelyrelated, strategy, which is to try to generalise the chain rule (9) (similarly to how wegeneralised the Leibnitz rule in previous sections). For integer k, repeated iterationsof the chain rule give

∇kF (u) = (∇ku) · F ′(u) + . . .

where the remaining terms require fewer than k derivatives on u (and plenty on F ).We can extrapolate from this and guess a fractional chain rule

DsF (u) ≈ (Dsu) · F ′(u).

26 TERENCE TAO

This type of rule turns out to be a little tricky to formalise properly, but it servesas good intuition to guide the results which follow.

A good compromise between the above strategies is not to work on expressionssuch as DsF (u) directly, but instead to work with Littlewood-Paley componentsψj(D)F (u), hoping to reconstruct DsF (u) later. The analogue of the fractionalchain rule here is the heuristic

ψj(D)F (u) ≈ ψj(D)u · F ′(ψ<j(D)u). (10)

An informal motivation of this heuristic is as follows. Splitting u = ψ<j(D)u +ψ≥j(D)u and using Newton’s approximation we roughly have

F (u) ≈ F (ψ<j(D)u) + ψ≥j(D)u · F ′(ψ<j(D)u).

Now we apply ψj(D) to both sides. The function ψ<j(D)u is almost annihilated byψj(D), and we expect the same for multilinear combinations of this function; thusthe first term F (ψ<j(D)u) should disappear and the second term, being a kind ofhigh-low paraproduct between u and F ′(u), should then have the ψj(D) term fallsolely on the ψ≥j(D)u factor, giving (10).

One can eventually make the above heuristics precise, but they require more regu-larity on F than currently assumed (for instance, one needs something like a Holdercontinuity estimate on F ′). Here, we will present here a cruder estimate which doesnot demand as high regularity on F and suffices for many purposes.

Lemma 7.1. Let u be Schwartz and let 1 =∑

j ψj(D) be a Littlewood-Paley de-

composition. If F is a Lipschitz nonlinearity with F (0) = 0 (in order to ensure Fdecays at infinity), then we have the pointwise estimate

|ψj(D)F (u)| .d∑

k

min(2k, 1)M(ψj+k(D)u)

where M is the Hardy-Littlewood maximal inequality. More generally, if F is apower-type nonlinearity with exponent p ≥ 1, then

|ψj(D)F (u)| .p,d∑

k

min(2k, 1)[M(|u|p−1)M(ψj+k(D)u) +M(|u|p−1ψj+k(D)u)].

Proof We just prove the first estimate and leave the second as an exercise. Bytranslation invariance it suffices to estimate this at the origin. By rescaling (notingthat the rescaled version 2jF (2−jz) of F is still Lipschitz) we may take j = 0, thuswe need to show

|ψ0(D)(F (u))(0)| .d∑

k

min(2k, 1)M(ψk(D)u)(0).

We express ψ0(D) in Fourier space to obtain

ψ0(D)(F (u))(0) =

Rd

ψ0(−y)F (u(y)) dy.

We use the Lipschitz condition to write

F (u(y)) = F (ψ≤0(D)u(y)) +O(|ψ>0(D)u(y)|).

LECTURE NOTES 6 27

Since ψ0 vanishes at the origin, ψ0 has mean zero, so we can rewrite this as

ψ0(D)(F (u))(0) =

Rd

ψ0(−y)(F (ψ≤0(D)u(y))−F (ψ≤0(D)u(0))+O(|ψ>0(D)u(y)|)) dy.

Now we take advantage of the rapid decrease of ψ0 and the triangle inequality toobtain

|ψ0(D)(F (u))(0)| .d

Rd

〈y〉−100d(|F (ψ≤0(D)u(y))−F (ψ≤0(D)u(0))|+|ψ>0(D)u(y)|).

At this point we use the Lipschitz hypothesis to bound

|F (ψ≤0(D)u(y)) − F (ψ≤0(D)u(0))| . |ψ≤0(D)u(y)− ψ≤0(D)u(0)|.

Applying a Littlewood-Paley decomposition we then have

|ψ0(D)(F (u))(0)| .d

Rd

〈y〉−100d∑

k≤0

|ψk(D)u(y)− ψk(D)u(0)|+∑

k>0

|ψk(D)u(y)|.

The second term can be easily bounded by∑k>0Mψk(D)u(0), which is acceptable.

For the k < 0 contributions, we use the fundamental theorem of calculus to express

ψk(D)u(y)− ψk(D)u(0) =

∫ 1

0

y · ∇ψk(D)u(ty) dt.

By Lemma 3.1 we have

∇ψk(D)u(ty) = O(〈2ky〉dMψk(D)u(0)).

We thus obtain a contribution of∑

k≤0 2kMψk(D)u(0), which is acceptable.

This estimate is already enough to control F (u) adequately in many situations. Wegive just one example:

Corollary 7.2. Let u be Schwartz. If F is a Lipschitz nonlinearity then

‖F (u)‖W s,q(Rd) .s,q,d ‖u‖W s,q(Rd)

for all 1 < q < ∞ and 0 < s < 1. If instead F is a power-type nonlinearity withexponent p, then

‖F (u)‖W s,q(Rd) .s,q,d ‖u‖p−1Lr(Rd)

‖u‖W s,t(Rd)

whenever 1 < q, r, t <∞ and 0 < s < 1 are such that 1/q = (p− 1)/r + 1/t.

Proof Again we just handle the Lipschitz case and leave the power case as anexercise. Since F (u) = O(|u|), we have

‖F (u)‖Lq(Rd) . ‖u‖Lq(Rd) .s,q,d ‖u‖W s,q(Rd)

so it suffices to show that

‖F (u)‖W s,q(Rd) .s,q,d ‖u‖W s,q(Rd).

From the Littlewood-Paley characterisation of Sobolev spaces, we have

‖F (u)‖W s,q(Rd) ∼s,q,d ‖(∑

j

22sj |ψj(D)F (u)|2)1/2‖Lq(Rd).

28 TERENCE TAO

Applying Lemma 7.1 and the triangle inequality, we conclude

‖F (u)‖W s,q(Rd) ∼s,q,d∑

k

min(2k, 1)‖(∑

j

22sj |M(ψj+k(D)u)|2)1/2‖Lq(Rd).

Apploying Fefferman-Stein we then have

‖F (u)‖W s,q(Rd) ∼s,q,d∑

k

min(2k, 1)‖(∑

j

22sj|ψj+k(D)u|2)1/2‖Lq(Rd).

Shifting j by k and using the Littlewood-Paley inequality we conclude

‖F (u)‖W s,q(Rd) ∼s,q,d∑

k

min(2k, 1)2−sk‖u‖W s,q(Rd)

and the claim follows.

8. Exercises

• Q1. Show that Corollary 3.4 fails when p = q = r = ∞. (Hint: Considera high-high paraproduct

∑j ψj(D)fψj(D)g in one dimension applied to

f = g equal to the signum function sgn(x) (or some smoothed out versionthereof) and evaluate this at x = 0.) By duality obtain similar results when(p, q) = (1,∞) or (p, q) = (∞, 1).

• Q2. (Carleson embedding theorem) Let µ be a positive Radon measureon R+ ×Rd. Show that the following are equivalent up to changes in theimplied constant:(i) We have

µ([0, r]×B(x, r)) .d |B(x, r)|

for all balls B(x, r).(ii) For any 1 < p <∞, we have

R+×Rd

[

∫−B(x,r)

|f |]p dµ(r, x) .p,d ‖f‖pLp(Rd)

.

(Hints: the implication of (i) from (ii) is easy; in fact one only needs (ii) for asingle p, such as p = 2, in order to deduce (i). For the converse implication,dyadically decompose

∫−B(x,r)

|f | as in the proof of (4.1).) Measures µ which

obey either (i) or (ii) are known as Carleson measures.• Q3. (Moser’s inequality) Let s > 0 and 1 < p <∞. Show that

‖fg‖W s,p(Rd) .s,p,d ‖f‖W s,p(Rd)‖g‖L∞(Rd) + ‖f‖L∞(Rd)‖g‖W s,p(Rd)

for all Schwartz f, g. Conclude in in particular that if s > d/p, thenW s,p(Rd) is closed under multiplication.

• Q4. (Div-curl lemma) Let 1 < p < ∞, and let f : Rd → Rd and g :Rd → Rd be vector-valued Schwartz functions such that ∇ · f = 0 and∇ ∧ g = 0 (thus f is divergence-free and g is curl-free). Show that for anyHormander-Mikhlin multiplier a(D), that

‖a(D)(fg)‖L1(Rd) .d,p ‖f‖Lp(Rd)‖g‖Lp′(Rd).

This is despite a(D) not necessarily being bounded on L1. (Readers familiarwith Hardy spaces will thus be able to conclude that fg lies in H1(Rd).)

• Q5. Complete the proof of Lemma 7.1 for nonlinearities of power type.

LECTURE NOTES 6 29

• Q6. Complete the proof of Corollary 7.2 for nonlinearities of power type.• Q7. For 1 ≤ p ≤ ∞ and 0 < α < 1, define the Holder space Λαp (R

d) to bethe functions whose norm

‖u‖Λαp (R

d) := ‖u‖Lp(Rd) + sup0<|h|≤1

‖Transhu− u‖Lp(Rd)/|h|α

is finite, where Transhu(x) := u(x − h) is the shift of u by h. Thus forinstance Λα∞ is the class of Holder continuous functions of order α.

• (i) Show that

‖u‖Λαp (R

d) ∼p,α,d ‖u‖Lp(Rd) + supj≥0

2−jα‖ψj(D)u‖Lp(Rd)

where we use the usual Littlewood-Paley decomposition. Conclude that for1 < p <∞ we have

‖u‖Wα−ε,p(Rd) .p,α,d,ε ‖u‖Λαp (R

d) .p,α,d,ε ‖u‖Wα+ε,p(Rd).

This fact allows us to use Holder spaces as a “cheap” substitute for Sobolevspaces, provided we are willing to lose an epsilon.

• (ii) Using only elementary estimates (such as Holder’s inequality - no Fourieranalysis!) show that

‖F (u)‖Λαq (Rd) . ‖u‖Λα

q (Rd)

for any Lipschitz nonlinearity F and 1 ≤ q ≤ ∞, and more generally for apower nonlinearity of order p that

‖F (u)‖Λαq (Rd) . ‖u‖p−1

Lr(Rd)‖u‖Λα

t (Rd)

whenever 1 ≤ q, r, t ≤ ∞ is such that 1/q = (p− 1)/r + 1/t.• Q8. (Localisation of Sobolev spaces) Let s ≥ 0, 1 ≤ p ≤ ∞, and let φ be aSchwartz function of height 1 adapted to a ball B. If the radius of B is atleast 1, show that

‖φf‖W s,p(Rd) .s,p,d ‖f‖W s,p(Rd)

for all f ∈ S(Rd). If we also have s < d/p, show that the hypothesis thatthe radius of B is at least 1 can be omitted.

Department of Mathematics, UCLA, Los Angeles CA 90095-1555

E-mail address: [email protected]