Asset Pricing

John H. Cochrane

Acknowledgments

This book owes an enormous intellectual debt to Lars Hansen and Gene Fama. Most of the ideas in the book developed from shuttling back and forth between their offices, and trying to make sense of what each was saying in the language of the other. I am also grateful to all my colleagues in Finance and Economics at the University of Chicago, and to George Constantinides especially, for many long discussions about the ideas in this book. I thank George Constantinides, Andrea Eisfeldt, Gene Fama, Wayne Ferson, Owen Lamont, Anthony Lynch, Dan Nelson, Alberto Pozzolo, Michael Roberts, Mike Stutzer, and several generations of Ph.D. students at the University of Chicago for comments on earlier drafts. Pietro Veronesi kindly shared his class notes on term structure models. I thank the NSF and the Graduate School of Business for research support.

Until publication, ever-improving drafts of this book are available at

http://www-gsb.uchicago.edu/fac/john.cochrane/research/papers

Comments and suggestions are most welcome. This book draft is copyright © John H. Cochrane 1997, 1998, 1999, 2000.

John H. Cochrane
Graduate School of Business
University of Chicago
1101 E. 58th St.
Chicago IL 60637
773 702 [email protected] 1, 2000

Contents

Acknowledgments

1 Preface

Part I. Asset pricing theory

2 Consumption-based model and overview

2.1 Basic pricing equation

2.2 Marginal rate of substitution/stochastic discount factor

2.3 Prices, payoffs and notation

2.4 Classic issues in finance

2.5 Discount factors in continuous time

2.6 Problems

3 Applying the basic model

3.1 Assumptions and applicability

3.2 General Equilibrium

3.3 Consumption-based model in practice

3.4 Alternative asset pricing models: Overview

4 Contingent Claims Markets

4.1 Contingent claims

4.2 Risk neutral probabilities

4.3 Investors again

4.4 Risk sharing

4.5 State diagram and price function

5 The discount factor

5.1 Law of one price and existence of a discount factor

5.2 No-Arbitrage and positive discount factors

5.3 An alternative formula, and $x^*$ in continuous time??

6 Mean-variance frontier and beta representations

6.1 Expected return - Beta representations

6.2 Mean-variance frontier: Intuition and Lagrangian characterization

6.3 An orthogonal characterization of the mean-variance frontier

6.4 Spanning the mean-variance frontier

6.5 A compilation of properties of $R^*$, $R^{e*}$ and $x^*$

6.6 Problems

7 Relation between discount factors, betas, and mean-variance frontiers

7.1 From discount factors to beta representations

7.2 From mean-variance frontier to a discount factor and beta representation

7.3 Factor models and discount factors

7.4 Discount factors and beta models to mean-variance frontier

7.5 Three riskfree rate analogues

7.6 Mean-variance special cases with no riskfree rate

7.7 Problems

8 Implications of existence and equivalence theorems

9 Conditioning information

9.1 Scaled payoffs

9.2 Sufficiency of adding scaled returns

9.3 Conditional and unconditional models

9.4 Scaled factors: a partial solution

9.5 Summary

10 Factor pricing models

10.1 Capital Asset Pricing Model (CAPM)

10.2 Intertemporal Capital Asset Pricing Model (ICAPM)

10.3 Comments on the CAPM and ICAPM

10.4 Arbitrage Pricing Theory (APT)

10.5 APT vs. ICAPM

10.6 Problems

Part II. Estimating and evaluating asset pricing models

11 GMM in explicit discount factor models

11.1 The Recipe

11.2 Interpreting the GMM procedure

11.3 Applying GMM

12 GMM: general formulas and applications

12.1 General GMM formulas

12.2 Testing moments

12.3 Standard errors of anything by delta method

12.4 Using GMM for regressions

12.5 Prespecified weighting matrices and moment conditions

12.6 Estimating on one group of moments, testing on another

12.7 Estimating the spectral density matrix

12.8 Problems

13 Regression-based tests of linear factor models

13.1 Time-series regressions

13.2 Cross-sectional regressions

13.3 Fama-MacBeth Procedure

13.4 Problems

14 GMM for linear factor models in discount factor form

14.1 GMM on the pricing errors gives a cross-sectional regression

14.2 The case of excess returns

14.3 Horse Races

14.4 Testing for characteristics

14.5 Testing for priced factors: lambdas or b’s?

14.6 Problems

15 Maximum likelihood

15.1 Maximum likelihood

15.2 ML is GMM on the scores

15.3 When factors are returns, ML prescribes a time-series regression

15.4 When factors are not excess returns, ML prescribes a cross-sectional regression

15.5 Problems

16 Time-series vs. cross section; ML vs. GMM

16.1 Time series vs. cross-section

16.2 ML vs. GMM

Part III. Bonds and options

17 Option pricing

17.1 Background

17.2 Black-Scholes formula

17.3 Problems

18 Option pricing without perfect replication

18.1 On the edges of arbitrage

18.2 One-period good deal bounds

18.3 Multiple periods and continuous time

18.4 Extensions, other approaches, and bibliography

19 Term structure of interest rates

19.1 Definitions and notation

19.2 Yield curve and expectations hypothesis

19.3 Term structure models – a discrete-time introduction

19.4 Continuous time term structure models

19.5 Three famous linear term structure models

19.6 Bibliography

19.7 Problems

Part IV. A brief empirical survey

20 Equity premium puzzle??

21 Hansen-Jagannathan bounds

21.1 Results

21.2 Beyond mean and variance

21.3 What do we know about discount factors: a summary

21.4 Comments on the Hansen-Jagannathan bound

22 Bibliography

22.1 References

Part V. General equilibrium

Part VI. Appendix

23 Continuous time

23.1 Brownian Motion

23.2 Diffusion models

23.3 Ito’s lemma

23.4 Densities

23.5 Tricks

23.6 Problems

24 Notation

25 Utility functions

26 Probability and statistics

26.1 Probability

26.2 Statistics

26.3 Regressions

26.4 Partitioned matrix inverse formulas

27 Tricks

28 Bibliography

29 Answer sketches for the problems

29.1 Problems for Chapter 2

29.2 Problems for Chapter 6

29.3 Problems for Chapter 7

29.4 Problems for Chapter 13

29.5 Problems for Chapter 14

29.6 Problems for Chapter 15

29.7 Problems for Chapter 17

29.8 Other problems

30 Algebra for cross-sectional regressions

31 Notes

31.1 ML approach to a consumption-based model

31.2 General Equilibrium Recipe

31.3 Endowment economies

31.4 Production economies

31.5 Justification for the procedure

31.6 Risk sharing and security design

31.7 Incomplete markets economies

31.8 Outlook

31.9 Problems

32 Frictions

Part VII. Empirical survey

33 Return Predictability

33.1 Stocks

33.2 Term structure

33.3 Comparison to continuous-time models

33.4 Foreign exchange

33.5 Econometric issues

33.6 Conditional variance

33.7 Conditional Sharpe ratios and portfolio implications

34 Present value tests

35 Factor pricing models

35.1 CAPM

35.2 Chen Roll Ross model

35.3 Investment and macro factors

35.4 Book to market

35.5 Momentum and more

35.6 Digesting the tests

36 Consumption based model tests

36.1 CRRA utility

36.2 Durable goods

36.3 Habits

36.4 State-nonseparabilities

36.5 Problems

Chapter 1. Preface

Asset pricing theory tries to understand the prices or values of claims to uncertain payments. A low price implies a high rate of return, so one can also think of the theory as explaining why some assets pay higher average returns than others.

To value an asset, we have to account for the delay and for the risk of its payments. The effects of time are not too difficult to work out. However, corrections for risk are much more important determinants of many assets’ values. For example, over the last 50 years stocks have given a real return of about 9% on average. Of this, only about 1% is due to interest rates; the remaining 8% is a premium earned for holding risk. Uncertainty, or corrections for risk, make asset pricing interesting and challenging.

Asset pricing theory shares the positive vs. normative tension present in the rest of economics. Does it describe the way the world does work or the way the world should work? We observe the prices or returns of many assets. We can use the theory positively, to try to understand why prices or returns are what they are. If the world does not obey a model’s predictions, we can decide that the model needs improvement. However, we can also decide that the world is wrong, that some assets are “mis-priced” and present trading opportunities for the shrewd investor. This latter use of asset pricing theory accounts for much of its popularity and practical application. Also, and perhaps most importantly, the prices of many assets or claims to uncertain cash flows are not observed, such as potential public or private investment projects, new financial securities, buyout prospects, and complex derivatives. We can apply the theory to establish what the prices of these claims should be as well; the answers are important guides to public and private decisions.

Asset pricing theory all stems from one simple concept, derived in the first page of Chapter 2 of this book: price equals expected discounted payoff. The rest is elaboration, special cases, and a closet full of tricks that make the central equation useful for one or another application.

There are two polar approaches to this elaboration. I will call them absolute pricing and relative pricing. In absolute pricing, we price each asset by reference to its exposure to fundamental sources of macroeconomic risk. The consumption-based and general equilibrium models described below are the purest examples of this approach. The absolute approach is most common in academic settings, in which we use asset pricing theory positively to give an economic explanation for why prices are what they are, or in order to predict how prices might change if policy or economic structure changed.

In relative pricing, we ask a less ambitious question. We ask what we can learn about an asset’s value given the prices of some other assets. We do not ask where the price of the other set of assets came from, and we use as little information about fundamental risk factors as possible. Black-Scholes option pricing is the classic example of this approach. While limited in scope, this approach offers precision in many applications.

Asset pricing problems are solved by judiciously choosing how much absolute and how much relative pricing one will do, depending on the assets in question and the purpose of the calculation. Almost no problems are solved by the pure extremes. For example, the CAPM and its successor factor models are paradigms of the absolute approach. Yet in applications, they price assets “relative” to the market or other risk factors, without answering what determines the market or factor risk premia and betas. The latter are treated as free parameters. On the other end of the spectrum, most practical financial engineering questions involve assumptions beyond pure lack of arbitrage, assumptions about equilibrium “market prices of risk.”

The central and unfinished task of absolute asset pricing is to understand and measure the sources of aggregate or macroeconomic risk that drive asset prices. Of course, this is also the central question of macroeconomics, and this is a particularly exciting time for researchers who want to answer these fundamental questions in macroeconomics and finance. A lot of empirical work has documented tantalizing stylized facts and links between macroeconomics and finance. For example, expected returns vary across time and across assets in ways that are linked to macroeconomic variables, or variables that also forecast macroeconomic events; a wide class of models suggests that a “recession” or “financial distress” factor lies behind many asset prices. Yet theory lags behind; we do not yet have a well-described model that explains these interesting correlations.

This book advocates a discount factor / generalized method of moments view of asset pricing theory and associated empirical procedures. I summarize asset pricing by two equations:

$$p_t = E(m_{t+1} x_{t+1})$$

$$m_{t+1} = f(\text{data}, \text{parameters}).$$

where $p_t$ = asset price, $x_{t+1}$ = asset payoff, $m_{t+1}$ = stochastic discount factor.

The major advantages of the discount factor / moment condition approach are its simplicity and universality. Where once there were three apparently different theories for stocks, bonds, and options, now we see each as just special cases of the same theory. The common language also allows us to use insights from each field of application in other fields.

This approach also allows us to conveniently separate the step of specifying economic assumptions of the model (second equation) from the step of deciding which kind of empirical representation to pursue or understand. For a given model – choice of $f(\cdot)$ – we will see how the first equation can lead to predictions stated in terms of returns, price-dividend ratios, expected return-beta representations, moment conditions, continuous vs. discrete time implications and so forth. The ability to translate between such representations is also very helpful in digesting the results of empirical work, which uses a number of apparently distinct but fundamentally connected representations.

Thinking in terms of discount factors often turns out to be much simpler than thinking in terms of portfolios. For example, it is easier to insist that there exists a positive discount factor than to check that every possible portfolio that dominates every other portfolio has a larger price, and the long arguments over the APT stated in terms of portfolios are easy to digest when stated in terms of discount factors.

The discount factor approach is also associated with a state-space geometry in place of the usual mean-variance geometry, and this book emphasizes the state-space intuition behind many classic results.

For these reasons, the discount factor language and the associated state-space geometry is common in academic research and high-tech practice. It is not yet common in textbooks, and that is the niche that this book tries to fill.

I also diverge from the usual order of presentation. Most books are structured following the history of thought: portfolio theory, mean-variance frontiers, spanning theorems, CAPM, ICAPM, APT, option pricing, and finally consumption-based model. Contingent claims are an esoteric extension of option-pricing theory. I go the other way around: contingent claims and the consumption-based model are the basic and simplest models around; the others are specializations. Just because they were discovered in the opposite order is no reason to present them that way.

I also try to unify the treatment of empirical methods. A wide variety of methods are popular, including time-series and cross-sectional regressions, and methods based on generalized method of moments (GMM) and maximum likelihood. However, in the end all of these apparently different approaches do the same thing: they pick free parameters of the model to make it fit best, which usually means to minimize pricing errors; and they evaluate the model by examining how big those pricing errors are.

As with the theory, I do not attempt an encyclopedic compilation of empirical procedures. As with the theory, the literature on econometric methods contains lots of methods and special cases (likelihood ratio ways of doing common Wald tests; cases with and without riskfree assets and when factors do and don’t span the mean variance frontier, etc.) that are seldom used in practice. I try to focus on the basic ideas and on methods that are actually used in practice.

The accent in this book is on understanding statements of theory, and working with that theory to applications, rather than rigorous or general proofs. Also, I skip very lightly over many parts of asset pricing theory that have faded from current applications, although they occupied a large amount of attention in the past. Some examples are portfolio separation theorems, properties of various distributions, or asymptotic APT. I have also reorganized the traditional order of ideas, by putting portfolio theory after asset pricing. While portfolio theory is still interesting and useful, it is no longer a cornerstone of pricing. Rather than use portfolio theory to find a demand curve for assets, which intersected with a supply curve gives prices, we now go to prices directly. One can then find optimal portfolios, but it is a side issue.

Again, my organizing principle is that everything can be traced back to specializations of the basic pricing equation $p = E(mx)$. Therefore, after reading Chapter 2, one can pretty much skip around and read topics in as much depth or order as one likes. Each major subject always starts back at the same pricing equation.

The target audience for this book is economics and finance Ph.D. students, advanced MBA students or professionals with similar background. I hope the book will also be useful to fellow researchers and finance professionals, by clarifying, relating and simplifying the set of tools we have all learned in a hodgepodge manner. I presume some exposure to undergraduate economics and statistics. A reader should have seen a utility function, a random variable, a standard error and a time series, should have some basic linear algebra and calculus and should have solved a maximum problem by setting derivatives to zero. The hurdles in asset pricing are really conceptual rather than mathematical.

PART I. Asset pricing theory

Chapter 2. Consumption-based model and overview

2.1 Basic pricing equation

An investor’s first order conditions give the basic consumption-based model,

$$p_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)} x_{t+1}\right].$$

Our basic objective is to figure out the value of any stream of uncertain cash flows. I start with an apparently simple case, which turns out to capture very general situations.

Let us find the value at time $t$ of a payoff $x_{t+1}$. For example, if one buys a stock today, the payoff next period is the stock price plus dividend, $x_{t+1} = p_{t+1} + d_{t+1}$. $x_{t+1}$ is a random variable: an investor does not know exactly how much he will get from his investment, but he can assess the probability of various possible outcomes. Don’t confuse the payoff $x_{t+1}$ with the profit or return; $x_{t+1}$ is the value of the investment at time $t+1$, without subtracting or dividing by the cost of the investment.

We find the value of this payoff by asking what it is worth to a typical investor. To do this, we need a convenient mathematical formalism to capture what an investor wants. We model investors by a utility function defined over current and future values of consumption,

$$U(c_t, c_{t+1}) = u(c_t) + \beta E_t\left[u(c_{t+1})\right].$$

We will often use a convenient power utility form,

$$u(c_t) = \frac{1}{1-\gamma} c_t^{1-\gamma}.$$

The limit as $\gamma \rightarrow 1$ is

$$u(c) = \ln(c).$$
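The power utility form and its log limit can be checked numerically. The following Python sketch is only an illustration: the function names and the consumption value are assumptions, not part of the text. The raw form $c^{1-\gamma}/(1-\gamma)$ differs from $(c^{1-\gamma}-1)/(1-\gamma)$ only by a constant, which leaves marginal utility unchanged; the sketch uses the second form because that is the version whose value visibly approaches $\ln(c)$ as $\gamma \rightarrow 1$.

```python
import math

def power_utility(c, gamma):
    """Power utility u(c) = c**(1 - gamma) / (1 - gamma); log utility at gamma == 1."""
    if gamma == 1.0:
        return math.log(c)
    return c ** (1.0 - gamma) / (1.0 - gamma)

def marginal_utility(c, gamma):
    """u'(c) = c**(-gamma), which equals 1/c in the log case."""
    return c ** (-gamma)

def normalized_power_utility(c, gamma):
    """Power utility minus the constant 1/(1 - gamma); this version tends to ln(c)."""
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

c = 2.0  # illustrative consumption level
for gamma in (2.0, 1.1, 1.01, 1.0001):
    # The middle column approaches the last column as gamma approaches 1.
    print(gamma, normalized_power_utility(c, gamma), math.log(c))
```

Marginal utility $c^{-\gamma}$ is what enters the pricing formulas, so the additive constant plays no role in what follows.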

The utility function captures the fundamental desire for more consumption, rather than posit a desire for intermediate objectives such as means and variance of portfolio returns. Consumption $c_{t+1}$ is also random; the investor does not know his wealth tomorrow, and hence how much he will decide to consume. The period utility function $u(\cdot)$ is increasing, reflecting a desire for more consumption, and concave, reflecting the declining marginal value of additional consumption. The last bite is never as satisfying as the first.

This formalism captures investors’ impatience and their aversion to risk, so we can quantitatively correct for the risk and delay of cash flows. Discounting the future by $\beta$ captures impatience, and $\beta$ is called the subjective discount factor. The curvature of the utility function also generates aversion to risk and to intertemporal substitution: The consumer prefers a consumption stream that is steady over time and across states of nature.

Now, assume that the investor can freely buy or sell as much of the payoff $x_{t+1}$ as he wishes, at a price $p_t$. How much will he buy or sell? To find the answer, denote by $e$ the original consumption level (if the investor bought none of the asset), and denote by $\xi$ the amount of the asset he chooses to buy. Then, his problem is,

$$\max_{\{\xi\}} \; u(c_t) + E_t\left[\beta u(c_{t+1})\right] \quad \text{s.t.}$$

$$c_t = e_t - p_t \xi$$

$$c_{t+1} = e_{t+1} + x_{t+1} \xi$$

Substituting the constraints into the objective, and setting the derivative with respect to $\xi$ equal to zero, we obtain the first-order condition for an optimal consumption and portfolio choice,

$$p_t u'(c_t) = E_t\left[\beta u'(c_{t+1}) x_{t+1}\right] \tag{2.1}$$

or,

$$p_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)} x_{t+1}\right]. \tag{2.2}$$

The investor buys more or less of the asset until this first order condition holds.

Equation (2.1) expresses the standard marginal condition for an optimum: $p_t u'(c_t)$ is the loss in utility if the investor buys another unit of the asset; $E_t\left[\beta u'(c_{t+1}) x_{t+1}\right]$ is the increase in (discounted, expected) utility he obtains from the extra payoff at $t+1$. The investor continues to buy or sell the asset until the marginal loss equals the marginal gain.
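The marginal condition can be illustrated with a small numerical sketch. The Python code below assumes a two-state world, power utility, and made-up numbers for the endowments, payoff and price (none of these values come from the text). It finds the amount of the asset at which marginal loss equals marginal gain by bisection, then confirms that the pricing formula evaluated at the resulting consumption reproduces the price the investor faced.

```python
# Illustrative assumptions: two equally likely states, power utility with gamma = 2.
beta, gamma = 0.95, 2.0
probs = [0.5, 0.5]
e_today = 1.0            # consumption today if none of the asset is bought
e_next = [0.9, 1.1]      # consumption tomorrow in each state if none is bought
payoff = [0.8, 1.2]      # asset payoff in each state
price = 0.9              # price the investor faces today

def marginal_utility(c):
    return c ** (-gamma)

def foc(xi):
    """Marginal gain minus marginal loss from buying a little more at holding xi."""
    c_today = e_today - price * xi
    gain = sum(pr * beta * marginal_utility(e + x * xi) * x
               for pr, e, x in zip(probs, e_next, payoff))
    return gain - price * marginal_utility(c_today)

# foc is decreasing in xi because utility is concave, so bisection finds its root.
lo, hi = -0.9, 1.1       # bounds chosen to keep consumption positive in every state
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0:
        lo = mid
    else:
        hi = mid
xi_star = 0.5 * (lo + hi)

# Evaluate the pricing formula at the chosen consumption plan.
c_today = e_today - price * xi_star
implied_price = sum(pr * beta * marginal_utility(e + x * xi_star) / marginal_utility(c_today) * x
                    for pr, e, x in zip(probs, e_next, payoff))
print(xi_star, implied_price, price)
```

With these numbers the investor buys a small positive amount, since at zero holdings the marginal gain exceeds the marginal loss.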

Equation (2.2) is the central asset-pricing formula. Given the payoff $x_{t+1}$ and given the investor’s consumption choice $c_t, c_{t+1}$, it tells you what market price $p_t$ to expect. Its economic content is simply the first order conditions for optimal consumption and portfolio formation. Most of the theory of asset pricing just consists of specializations and manipulations of this formula.

Notice that we have stopped short of a complete solution to the model, i.e. an expression with exogenous items on the right hand side. We relate one endogenous variable, price, to two other endogenous variables, consumption and payoffs. One can continue to solve this model and derive the optimal consumption choice $c_t, c_{t+1}$ in terms of the givens of the model. In the model I have sketched so far, those givens are the income sequence $e_t, e_{t+1}$ and a specification of the full set of assets that the investor may buy and sell. We will in fact study such fuller solutions below. However, for many purposes one can stop short of specifying (possibly wrongly) all this extra structure, and obtain very useful predictions about asset prices from (2.2), even though consumption is an endogenous variable.

2.2 Marginal rate of substitution/stochastic discount factor

We break up the basic consumption-based pricing equation into

$$p = E(mx)$$

$$m = \beta \frac{u'(c_{t+1})}{u'(c_t)}$$

where $m_{t+1}$ is the stochastic discount factor.

A convenient way to break up the basic pricing equation (2.2) is to define the stochastic discount factor $m_{t+1}$

$$m_{t+1} \equiv \beta \frac{u'(c_{t+1})}{u'(c_t)} \tag{2.3}$$

Then, the basic pricing formula (2.2) can simply be expressed as

$$p_t = E_t(m_{t+1} x_{t+1}). \tag{2.4}$$

When it isn’t necessary to be explicit about time subscripts, I’ll suppress them and just write $p = E(mx)$. The price always comes at $t$, the payoff at $t+1$, and the expectation is conditional on time $t$ information.
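As a minimal sketch of how the two pieces fit together, the Python code below builds a discount factor from consumption growth and then prices payoffs with $p = E(mx)$. It assumes power utility, for which $u'(c_{t+1})/u'(c_t) = (c_{t+1}/c_t)^{-\gamma}$, and a three-state distribution with invented probabilities, growth rates and payoffs; all names and numbers are illustrative.

```python
# Illustrative assumptions: three states for next period, power utility.
beta, gamma = 0.95, 2.0
probs = [0.3, 0.4, 0.3]
cons_growth = [0.97, 1.02, 1.06]          # c_{t+1} / c_t in each state

# Discount factor state by state: m = beta * (c_{t+1}/c_t)**(-gamma)
m = [beta * g ** (-gamma) for g in cons_growth]

def price(payoff):
    """p = E(m x): probability-weighted sum of discount factor times payoff."""
    return sum(pr * mi * xi for pr, mi, xi in zip(probs, m, payoff))

sure_payoff = [1.0, 1.0, 1.0]             # pays one unit in every state
risky_payoff = [0.9, 1.05, 1.2]           # pays more when consumption growth is high

print("discount factor by state:", m)
print("price of sure unit payoff:", price(sure_payoff))
print("price of risky payoff:", price(risky_payoff))
```

The same list `m` prices both payoffs; changing the utility function would change how `m` is computed but not the `price` function.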

The term stochastic discount factor refers to the way $m$ generalizes standard discount factor ideas. If there is no uncertainty, we can express prices via the standard present value formula

$$p_t = \frac{1}{R^f} x_{t+1} \tag{2.5}$$

where $R^f$ is the gross risk-free rate. $1/R^f$ is the discount factor. Since gross interest rates are typically greater than one, the payoff $x_{t+1}$ sells “at a discount.” Riskier assets have lower prices than equivalent risk-free assets, so they are often valued by using risk-adjusted discount factors,

$$p^i_t = \frac{1}{R^i} E_t(x^i_{t+1}).$$

Here, I have added the $i$ superscript to emphasize that each risky asset $i$ must be discounted by an asset-specific risk-adjusted discount factor $1/R^i$.

In this context, equation (2.4) is obviously a generalization, and it says something deep: one can incorporate all risk-corrections by defining a single stochastic discount factor – the same one for each asset – and putting it inside the expectation. $m_{t+1}$ is stochastic or random because it is not known with certainty at time $t$. As we will see, the correlation between the random components of $m$ and $x^i$ generates asset-specific risk corrections.
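A small numerical sketch can make the idea of asset-specific risk corrections concrete. The Python code below prices two payoffs with the same mean using one discount factor, and splits each price with the covariance identity $E(mx) = E(m)E(x) + \mathrm{cov}(m, x)$, taking the gross risk-free rate to be $1/E(m)$ (the value that prices a sure unit payoff). The three-state distribution, power utility, and all numbers are assumptions made for illustration.

```python
# Illustrative assumptions: three states, power utility discount factor.
beta, gamma = 0.95, 2.0
probs = [0.3, 0.4, 0.3]
cons_growth = [0.97, 1.02, 1.06]
m = [beta * g ** (-gamma) for g in cons_growth]

def expect(values):
    return sum(pr * v for pr, v in zip(probs, values))

def price(payoff):
    return expect([mi * xi for mi, xi in zip(m, payoff)])

def covariance(a, b):
    mean_a, mean_b = expect(a), expect(b)
    return expect([(ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)])

gross_riskfree = 1.0 / expect(m)          # prices a sure unit payoff at 1/R^f

procyclical = [0.9, 1.05, 1.2]            # pays well when consumption growth is high
countercyclical = [1.2, 1.05, 0.9]        # same mean payoff, opposite pattern

for name, x in (("procyclical", procyclical), ("countercyclical", countercyclical)):
    p = price(x)
    riskfree_value = expect(x) / gross_riskfree   # value if discounted at R^f
    risk_correction = covariance(m, x)            # asset-specific adjustment
    asset_specific_rate = expect(x) / p           # R^i such that p = E(x) / R^i
    print(name, p, riskfree_value, risk_correction, asset_specific_rate)
```

Under these assumptions the payoff that is high when consumption is high gets a negative correction and a lower price, so its implied $R^i$ exceeds $R^f$, while the single `m` handles both assets.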

$m_{t+1}$ is also often called the marginal rate of substitution after (2.3). In that equation, $m_{t+1}$ is the rate at which the investor is willing to substitute consumption at time $t+1$ for consumption at time $t$. $m_{t+1}$ is sometimes also called the pricing kernel. If you know what a kernel is and express the expectation as an integral, you can see where the name comes from. It is sometimes called a change of measure or a state-price density for reasons that we will see below.

For the moment, introducing the discount factor $m$ and breaking the basic pricing equation (2.2) into (2.3) and (2.4) is just a notational convenience. As we will see, however, it represents a much deeper and more useful separation. For example, notice that $p = E(mx)$ would still be valid if we changed the utility function, but we would have a different function connecting $m$ to data. As we will see, all asset pricing models amount to alternative models connecting the stochastic discount factor to data, while $p = E(mx)$ is a convenient accounting identity with almost no content. At the same time, we will study lots of alternative expressions of $p = E(mx)$, and we can summarize many empirical approaches to $p = E(mx)$. By separating our models into these two components, we don’t have to redo all that elaboration for each asset pricing model.

2.3 Prices, payoffs and notation

The price $p_t$ gives rights to a payoff $x_{t+1}$. In practice, this notation covers a variety of cases, including the following:

                        Price $p_t$      Payoff $x_{t+1}$
Stock                   $p_t$            $p_{t+1} + d_{t+1}$
Return                  $1$              $R_{t+1}$
Price-dividend ratio    $p_t/d_t$        $\left(\frac{p_{t+1}}{d_{t+1}} + 1\right)\frac{d_{t+1}}{d_t}$
Excess return           $0$              $R^e_{t+1} = R^a_{t+1} - R^b_{t+1}$
Managed portfolio       $z_t$            $z_t R_{t+1}$
Moment condition        $E(p_t z_t)$     $x_{t+1} z_t$
One-period bond         $p_t$            $1$
Risk free rate          $1$              $R^f$
Option                  $C$              $\max(S_T - K, 0)$

The price $p_t$ and payoff $x_{t+1}$ seem like a very restrictive kind of security. In fact, this notation is quite general and allows us easily to accommodate many different asset pricing questions. In particular, we can cover stocks, bonds and options and make clear that there is one theory for all asset pricing.

For stocks, the one period payoff is of course the next price plus dividend, $x_{t+1} = p_{t+1} + d_{t+1}$. We frequently divide the payoff $x_{t+1}$ by the price $p_t$ to obtain a gross return

$$R_{t+1} \equiv \frac{x_{t+1}}{p_t}$$

We can think of a return as a payoff with price one. If you pay one dollar today, the return is how many dollars or units of consumption you get tomorrow. Thus, returns obey

$$1 = E(mR)$$

which is by far the most important special case of the basic formula $p = E(mx)$. I use capital letters to denote gross returns $R$, which have a numerical value like 1.05. I use lowercase letters to denote net returns $r = R - 1$ or log (continuously compounded) returns $\ln(R)$, both of which have numerical values like 0.05. One may also quote percent returns $100 \times r$.
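The return conventions are easy to mix up, so here is a brief Python sketch of the four quantities, followed by a check that dividing a payoff by its own price $p = E(mx)$ yields a return satisfying $1 = E(mR)$. The prices, the two-state discount factor and the payoffs are invented for illustration, and the variable names are assumptions.

```python
import math

# Return conventions for a single realized outcome (illustrative numbers).
price_today = 100.0
payoff_next = 105.0                           # next period's price plus dividend

gross_return = payoff_next / price_today      # R, a number like 1.05
net_return = gross_return - 1.0               # r = R - 1, a number like 0.05
log_return = math.log(gross_return)           # ln(R), continuously compounded
percent_return = 100.0 * net_return           # 100 x r

print(gross_return, net_return, log_return, percent_return)

# A return is a payoff with price one: check 1 = E(mR) in a two-state example.
probs = [0.5, 0.5]
m = [1.0, 0.9]                                # assumed discount factor by state
x = [1.0, 1.2]                                # assumed payoff by state
p = sum(pr * mi * xi for pr, mi, xi in zip(probs, m, x))
returns = [xi / p for xi in x]                # gross return in each state
e_mR = sum(pr * mi * Ri for pr, mi, Ri in zip(probs, m, returns))
print(p, returns, e_mR)
```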

Returns are often used in empirical work because they are stationary (in the statistical sense; “stationary” does not mean constant) over time. However, thinking in terms of returns takes us away from the central task of finding asset prices. Dividing by dividends and creating a payoff $x_{t+1} = (1 + p_{t+1}/d_{t+1})\, d_{t+1}/d_t$ corresponding to a price $p_t/d_t$ is a way to look at prices but still to examine stationary variables.

Not everything can be reduced to a return. If you borrow a dollar at the interest rate $R^f$ and invest it in an asset with return $R$, you pay no money out-of-pocket today, and get the payoff $R - R^f$. This is a payoff with a zero price, so you obviously can’t divide payoff by price to get a return. Zero price does not imply zero payoff. It is a bet in which the chance of losing exactly balances its chance of winning, so that it is not worth paying extra to take the bet. It is common to study equity strategies in which one short sells one stock or portfolio and invests the proceeds in another stock or portfolio, generating an excess return. I denote any such difference between returns as an excess return, $R^e$. It is also called a zero-cost portfolio or a self-financing portfolio.

In fact, much asset pricing focuses on excess returns. Our economic understanding of interest rate variation turns out to have little to do with our understanding of risk premia, so it is convenient to separate the two exercises by looking at interest rates and excess returns separately.

We also want to think about managed portfolios, in which one invests more or less in an asset according to some signal. The “price” of such a strategy is the amount invested at time $t$, say $z_t$, and the payoff is $z_t R_{t+1}$. For example, a market timing strategy might put a weight in stocks that declines with the price-dividend ratio, investing less when prices are higher. We could represent such a strategy as a payoff using $z_t = a - b(p_t/d_t)$.

When we think about conditioning information below, we will think of objects like $z_t$ as instruments. Then we take an unconditional expectation of $p_t z_t = E_t(m_{t+1} x_{t+1}) z_t$, yielding $E(p_t z_t) = E(m_{t+1} x_{t+1} z_t)$. We can think of this operation as creating a “security” with payoff $x_{t+1} z_t$ and “price” $E(p_t z_t)$ represented with unconditional expectations.

A one-period bond is of course a claim to a unit payoff. Bonds, options, and investment projects are all examples in which it is often more useful to think of prices and payoffs rather than returns.

Prices and returns can be real (denominated in goods) or nominal (denominated in dollars); $p = E(mx)$ can refer to either case. The only difference is whether we use a real or nominal discount factor. If prices, returns and payoffs are nominal, we should use a nominal discount factor. For example, if $p$ and $x$ denote nominal values, then we can create real prices and payoffs to write

$$\frac{p_t}{\Pi_t} = E_t\left[\left(\beta\frac{u'(c_{t+1})}{u'(c_t)}\right)\frac{x_{t+1}}{\Pi_{t+1}}\right]$$

where $\Pi$ denotes the price level (cpi). Obviously, this is the same as defining a nominal discount factor by

$$p_t = E_t\left[\left(\beta\frac{u'(c_{t+1})}{u'(c_t)}\frac{\Pi_t}{\Pi_{t+1}}\right)x_{t+1}\right].$$

To accommodate all these cases, I will simply use the notation price $p_t$ and payoff $x_{t+1}$. These symbols can denote $0$, $1$, or $z_t$ and $R^e_{t+1}$, $R_{t+1}$, or $z_t R_{t+1}$ respectively, according to the case. Lots of other definitions of $p$ and $x$ are useful as well.

2.4 Classic issues in finance

I use simple manipulations of the basic pricing equation to introduce classic issues in finance: the economics of interest rates, risk adjustments, systematic vs. idiosyncratic risk, expected return-beta representations, the mean-variance frontier, the slope of the mean-variance frontier, time-varying expected returns, and present value relations.

A few simple rearrangements and manipulations of the basic pricing equation $p = E(mx)$ give a lot of intuition and introduce some classic issues in finance, including determinants of the interest rate, risk corrections, idiosyncratic vs. systematic risk, beta pricing models, and mean-variance frontiers.

2.4.1 Risk free rate

$$R^f = 1/E(m).$$

With lognormal consumption growth,

$$r^f_t = \delta + \gamma E_t \Delta \ln c_{t+1} - \frac{\gamma^2}{2}\sigma^2_t(\Delta \ln c_{t+1}).$$

Interest rates are high when people are impatient ($\delta$), when expected consumption growth is high (intertemporal substitution), or when risk is low (precautionary saving). A more curved utility function ($\gamma$) or a lower elasticity of intertemporal substitution ($1/\gamma$) means that interest rates are more sensitive to changes in expected consumption growth.

The risk free rate is given by

$$R^f = 1/E(m). \tag{2.6}$$

The risk free rate is known ahead of time, so $p = E(mx)$ becomes $1 = E(mR^f) = E(m)R^f$.

If a risk free security is not traded, we can define $R^f = 1/E(m)$ as the “shadow” risk-free rate. (In some models it is called the “zero-beta” rate.) If one introduced a risk free security with return $R^f = 1/E(m)$, investors would be just indifferent to buying or selling it. I use $R^f$ to simplify formulas below with this understanding.

To think about the economics behind interest rates in a simple setup, use power utility $u'(c) = c^{-\gamma}$. Start by turning off uncertainty, in which case

$$R^f = \frac{1}{\beta}\left(\frac{c_{t+1}}{c_t}\right)^{\gamma}.$$


We can see three effects right away:

1. Interest rates are high when people are impatient, when $\beta$ is low. If everyone wants to consume now, it takes a high interest rate to convince them to save.

2. Interest rates are high when consumption growth is high. In times of high interest rates, it pays investors to consume less now, invest more, and consume more in the future. Thus, high interest rates lower the level of consumption today, while raising its growth rate from today to tomorrow.

3. Interest rates are more sensitive to consumption growth if the power parameter $\gamma$ is large. If utility is highly curved, the investor cares more about maintaining a consumption profile that is smooth over time, and is less willing to rearrange consumption over time in response to interest rate incentives. Thus it takes a larger interest rate change to induce him to a given consumption growth.
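The three effects can be checked numerically from the deterministic formula $R^f = (1/\beta)(c_{t+1}/c_t)^{\gamma}$. The sketch below is only an illustration: the parameter values are hypothetical, and `riskfree_rate` is a helper name introduced here.

```python
def riskfree_rate(beta: float, gamma: float, growth: float) -> float:
    """Deterministic gross risk-free rate R^f = (1/beta) * (c_{t+1}/c_t)^gamma."""
    return (1.0 / beta) * growth ** gamma

# Hypothetical baseline: beta = 0.98, gamma = 2, gross consumption growth 1.02.
base = riskfree_rate(0.98, 2.0, 1.02)

# Effect 1: more impatience (lower beta) raises the rate.
impatient = riskfree_rate(0.95, 2.0, 1.02)

# Effect 2: higher consumption growth raises the rate.
fast_growth = riskfree_rate(0.98, 2.0, 1.04)

# Effect 3: the response of the rate to the same change in growth is larger when gamma is larger.
response_low_gamma = riskfree_rate(0.98, 1.0, 1.04) - riskfree_rate(0.98, 1.0, 1.02)
response_high_gamma = riskfree_rate(0.98, 5.0, 1.04) - riskfree_rate(0.98, 5.0, 1.02)

print(base, impatient, fast_growth, response_low_gamma, response_high_gamma)
```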

To understand how interest rates behave when there is some uncertainty, I specify that consumption growth is lognormally distributed. In this case, the riskfree rate equation becomes

$$r^f_t = \delta + \gamma E_t \Delta \ln c_{t+1} - \frac{\gamma^2}{2}\sigma^2_t(\Delta \ln c_{t+1}) \tag{2.7}$$

where I have defined the log riskfree rate $r^f_t$ and subjective discount rate $\delta$ by

$$r^f_t = \ln R^f_t; \quad \beta = e^{-\delta},$$

and $\Delta$ denotes the first difference operator,

$$\Delta \ln c_{t+1} = \ln c_{t+1} - \ln c_t.$$

To derive expression (2.7) for the riskfree rate, start with

$$R^f_t = 1/E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right].$$

Using the fact that normal $z$ means

$$E(e^z) = e^{E(z) + \frac{1}{2}\sigma^2(z)}$$

(you can check this by writing out the integral that defines the expectation), we have

$$R^f_t = \left[e^{-\delta}\, e^{-\gamma E_t \Delta \ln c_{t+1} + \frac{\gamma^2}{2}\sigma^2_t(\Delta \ln c_{t+1})}\right]^{-1}.$$

Then take logarithms. The combination of lognormal distributions and power utility is one of the basic tricks to getting analytical solutions in this kind of model. Section 2.30 shows how to get the same result in continuous time.
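As a rough numerical check of (2.7), the sketch below compares the closed-form log risk-free rate with a direct numerical evaluation of $-\ln E_t[\beta (c_{t+1}/c_t)^{-\gamma}]$ under normally distributed log consumption growth. The parameter values, the function names, and the use of a simple trapezoid rule are all assumptions made for illustration.

```python
import math

def log_riskfree_closed_form(delta: float, gamma: float, mu: float, sigma: float) -> float:
    """Equation (2.7): r^f = delta + gamma*mu - (gamma^2 / 2) * sigma^2."""
    return delta + gamma * mu - 0.5 * gamma ** 2 * sigma ** 2

def log_riskfree_numeric(delta: float, gamma: float, mu: float, sigma: float,
                         n: int = 20001, zmax: float = 10.0) -> float:
    """-ln E[m] with m = exp(-delta) * exp(-gamma * dlnc) and dlnc ~ N(mu, sigma^2).

    The expectation is computed with a trapezoid rule over the standard normal density.
    """
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * math.exp(-gamma * (mu + sigma * z)) * density
    expected_m = math.exp(-delta) * total * h
    return -math.log(expected_m)

# Hypothetical parameters: 1% impatience, gamma = 2, 2% mean and 2% volatility of log growth.
r_closed = log_riskfree_closed_form(0.01, 2.0, 0.02, 0.02)
r_numeric = log_riskfree_numeric(0.01, 2.0, 0.02, 0.02)
print(r_closed, r_numeric)
```

The last term of the closed form is the precautionary savings effect: raising `sigma` lowers the rate, and more so when `gamma` is large.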


Looking at (2.7), we see the same results as we had with the deterministic case. Interest rates are high when impatience $\delta$ is high and when consumption growth is high; higher $\gamma$ makes interest rates more sensitive to consumption growth. The new $\sigma^2$ term captures precautionary savings. When consumption is more volatile, people with this utility function are more worried about the low consumption states than they are pleased by the high consumption states. Therefore, people want to save more, driving down interest rates.

We can also read the same terms backwards: consumption growth is high when interest rates are high, since people save more now and spend it in the future, and consumption is less sensitive to interest rates as the desire for a smooth consumption stream, captured by $\gamma$, rises. Section 3.2 below takes up the question of which way we should read this equation: as consumption determining interest rates, or as interest rates determining consumption.

For the power utility function, the curvature parameter $\gamma$ simultaneously controls intertemporal substitution (aversion to a consumption stream that varies over time), risk aversion (aversion to a consumption stream that varies across states of nature), and precautionary savings, which turns out to depend on the third derivative of the utility function. This link is particular to the power utility function. We will study utility functions below that loosen the links between these three quantities.

2.4.2 Risk corrections

$$p = \frac{E(x)}{R^f} + \operatorname{cov}(m, x)$$

$$E(R^i) - R^f = -R^f \operatorname{cov}(m, R^i).$$

Payoffs that are positively correlated with consumption growth have lower prices, to compensate investors for risk. Expected returns are proportional to the covariance of returns with discount factors.

Using the definition of covariance $\operatorname{cov}(m, x) = E(mx) - E(m)E(x)$, we can write equation (2.2) as

$$p = E(m)E(x) + \operatorname{cov}(m, x). \tag{2.8}$$

Substituting the riskfree rate equation (2.6), we obtain

$$p = \frac{E(x)}{R^f} + \operatorname{cov}(m, x). \tag{2.9}$$


The first term in (2.9) is the standard discounted present value formula. This is the asset’s price in a risk-neutral world, that is, if consumption is constant or if utility is linear. The second term is a risk adjustment. An asset whose payoff covaries positively with the discount factor has its price raised and vice-versa.

To understand the risk adjustment, substitute back for $m$ in terms of consumption, to obtain

$$p = \frac{E(x)}{R^f} + \frac{\operatorname{cov}[\beta u'(c_{t+1}), x_{t+1}]}{u'(c_t)}. \tag{2.10}$$

Marginal utility $u'(c)$ declines as $c$ rises. Thus, an asset’s price is lowered if its payoff covaries positively with consumption. Conversely, an asset’s price is raised if it covaries negatively with consumption.

Why? Investors do not like uncertainty about consumption. If you buy an asset whose payoff covaries positively with consumption, one that pays off well when you are already feeling wealthy, and pays off badly when you are already feeling poor, that asset will make your consumption stream more volatile. You will require a low price to induce you to buy such an asset. If you buy an asset whose payoff covaries negatively with consumption, it helps to smooth consumption and so is more valuable than its expected payoff might indicate. Insurance is an extreme example. Insurance pays off exactly when wealth and consumption would otherwise be low: you get a check when your house burns down. For this reason, you are happy to hold insurance, even though you expect to lose money, that is, even though the price of insurance is greater than its expected payoff discounted at the risk free rate.
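A small two-state example makes the risk adjustment in (2.9) concrete. This is a sketch under stated assumptions: the state probabilities, consumption growth rates, preference parameters, and payoffs are hypothetical, and the helper names (`expectation`, `covariance`, `price`, `price_decomposed`) are introduced here for illustration.

```python
# Two equally likely states: a bad state (consumption falls) and a good state (consumption rises).
probs = [0.5, 0.5]
growth = [0.97, 1.05]          # gross consumption growth c_{t+1}/c_t in each state
beta, gamma = 0.98, 3.0        # hypothetical power-utility parameters

# Discount factor m = beta * (c_{t+1}/c_t)^(-gamma), high in the bad state.
m = [beta * g ** (-gamma) for g in growth]

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def covariance(a, b):
    return expectation([ai * bi for ai, bi in zip(a, b)]) - expectation(a) * expectation(b)

Rf = 1.0 / expectation(m)

def price(x):
    """Basic pricing equation p = E(m x)."""
    return expectation([mi * xi for mi, xi in zip(m, x)])

def price_decomposed(x):
    """Equation (2.9): p = E(x)/R^f + cov(m, x)."""
    return expectation(x) / Rf + covariance(m, x)

procyclical = [0.8, 1.2]   # pays well in the good state
insurance = [1.2, 0.8]     # pays well in the bad state

for name, x in [("procyclical", procyclical), ("insurance", insurance)]:
    print(name, price(x), price_decomposed(x), expectation(x) / Rf)
```

Both payoffs have the same expected value, yet the procyclical payoff sells below its risk-neutral value $E(x)/R^f$ and the insurance-like payoff sells above it.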

To emphasize why the covariance of a payoff with the discount factor rather than its variance determines its riskiness, keep in mind that the investor cares about the volatility of consumption. He does not care about the volatility of his individual assets or of his portfolio, if he can keep a steady consumption. Consider what happens to the volatility of consumption if the investor buys a little more $\xi$ of payoff $x$:

$$\sigma^2(c) \text{ becomes } \sigma^2(c + \xi x) = \sigma^2(c) + 2\xi \operatorname{cov}(c, x) + \xi^2 \sigma^2(x).$$

For small (marginal) portfolio changes, the covariance between consumption and payoff determines the effect of adding a bit more of each payoff on the volatility of consumption.
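The variance expansion can be verified on a small discrete example. The four states, consumption levels, payoffs, and the size of the position `xi` below are hypothetical, and the helper functions are introduced only for this sketch.

```python
probs = [0.25, 0.25, 0.25, 0.25]
c = [0.9, 1.0, 1.0, 1.1]   # consumption in each state
x = [0.5, 1.5, 0.5, 1.5]   # payoff in each state
xi = 0.01                  # small additional holding of the payoff

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def covariance(a, b):
    return expectation([ai * bi for ai, bi in zip(a, b)]) - expectation(a) * expectation(b)

def variance(a):
    return covariance(a, a)

new_consumption = [ci + xi * xv for ci, xv in zip(c, x)]
exact = variance(new_consumption)
expansion = variance(c) + 2 * xi * covariance(c, x) + xi ** 2 * variance(x)
first_order = variance(c) + 2 * xi * covariance(c, x)

print(exact, expansion, first_order)
```

For small `xi` the term in `covariance(c, x)` dominates the change in consumption variance; the payoff's own variance enters only at second order.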

We use returns so often that it is worth restating the same intuition for the special case that the price is one and the payoff is a return. Start with the basic pricing equation for returns,

$$1 = E(mR^i).$$

I denote the return $R^i$ to emphasize that the point of the theory is to distinguish the behavior of one asset $R^i$ from another $R^j$. Apply the covariance decomposition,

$$1 = E(m)E(R^i) + \operatorname{cov}(m, R^i) \tag{2.11}$$

and, using $R^f = 1/E(m)$,

$$E(R^i) - R^f = -R^f \operatorname{cov}(m, R^i) \tag{2.12}$$

or

$$E(R^i) - R^f = -\frac{\operatorname{cov}[u'(c_{t+1}), R^i_{t+1}]}{E[u'(c_{t+1})]}. \tag{2.13}$$

All assets have an expected return equal to the risk-free rate, plus a risk adjustment. Assets whose returns covary positively with consumption make consumption more volatile, and so must promise higher expected returns to induce investors to hold them. Conversely, assets that covary negatively with consumption, such as insurance, can offer expected rates of return that are lower than the risk-free rate, or even negative (net) expected returns.

Much of finance focuses on expected returns. We think of expected returns increasing or decreasing to clear markets; we offer intuition that “riskier” securities must offer higher expected returns to get investors to hold them, rather than saying “riskier” securities trade for lower prices so that investors will hold them. Of course, a low initial price for a given payoff corresponds to a high expected return, so this is no more than a different language for the same phenomenon.

2.4.3 Idiosyncratic risk does not affect prices

Only the component of a payoff perfectly correlated with the discount factor generates an extra return. Idiosyncratic risk, uncorrelated with the discount factor, generates no premium.

You might think that an asset with a high payoff variance is “risky” and thus should have a large risk correction. However, if the payoff is uncorrelated with the discount factor $m$, the asset receives no risk-correction to its price, and pays an expected return equal to the risk-free rate! In equations, if

$$\operatorname{cov}(m, x) = 0$$

then

$$p = \frac{E(x)}{R^f}.$$

This prediction holds even if the payoff $x$ is highly volatile and investors are highly risk averse. The reason is simple: if you buy a little bit more of such an asset, it has no first-order effect on the variance of your consumption stream.

More generally, one gets no compensation or risk adjustment for holding idiosyncratic risk. Only systematic risk generates a risk correction. To give meaning to these words, we can decompose any payoff $x$ into a part correlated with the discount factor and an idiosyncratic part uncorrelated with the discount factor by running a regression,

$$x = \operatorname{proj}(x|m) + \varepsilon.$$

Then, the price of the residual or idiosyncratic risk $\varepsilon$ is zero, and the price of $x$ is the same as the price of its projection on $m$. The projection of $x$ on $m$ is of course that part of $x$ which is perfectly correlated with $m$. The idiosyncratic component of any payoff is that part uncorrelated with $m$. Thus only the systematic part of a payoff accounts for its price.

Projection means linear regression without a constant,

$$\operatorname{proj}(x|m) = \frac{E(mx)}{E(m^2)}\, m.$$

You can verify that regression residuals are orthogonal to right hand variables, $E(m\varepsilon) = 0$, from this definition. $E(m\varepsilon) = 0$ of course means that the price of $\varepsilon$ is zero.

$$p(\operatorname{proj}(x|m)) = p\left(\frac{E(mx)}{E(m^2)}\, m\right) = E\left(m^2 \frac{E(mx)}{E(m^2)}\right) = E(mx) = p(x).$$

The words “systematic” and “idiosyncratic” are defined differently in different contexts, which can lead to some confusion. In this decomposition, the residuals $\varepsilon$ can be correlated with each other, though they are not correlated with the discount factor. The APT starts with a factor-analytic decomposition of the covariance of payoffs, and the word “idiosyncratic” there is reserved for the component of payoffs uncorrelated with all of the other payoffs.
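The projection argument can be checked on a three-state example. The sketch assumes hypothetical state probabilities, discount factor values, and payoffs; the variable names are introduced here for illustration.

```python
probs = [0.3, 0.4, 0.3]
m = [1.10, 0.95, 0.85]   # hypothetical discount factor in each state
x = [0.5, 1.0, 2.0]      # hypothetical payoff in each state

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def price(payoff):
    """p = E(m * payoff)."""
    return expectation([mi * xi for mi, xi in zip(m, payoff)])

# Projection without a constant: proj(x|m) = [E(mx) / E(m^2)] * m.
coefficient = price(x) / expectation([mi * mi for mi in m])
projection = [coefficient * mi for mi in m]
residual = [xi - pi for xi, pi in zip(x, projection)]

price_x = price(x)
price_projection = price(projection)
price_residual = price(residual)   # this is E(m * epsilon)

print(price_x, price_projection, price_residual)
```

The residual is far from zero state by state, yet its price is zero, so the whole price of the payoff comes from its projection on the discount factor.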

2.4.4 Expected return-beta representation

We can write $p = E(mx)$ as

$$E(R^i) = R^f + \beta_{i,m}\lambda_m.$$

We can express the expected return equation (2.12), for a return $R^i$, as

$$E(R^i) = R^f + \left(\frac{\operatorname{cov}(R^i, m)}{\operatorname{var}(m)}\right)\left(-\frac{\operatorname{var}(m)}{E(m)}\right) \tag{2.14}$$

or

$$E(R^i) = R^f + \beta_{i,m}\lambda_m \tag{2.15}$$

where $\beta_{i,m}$ is the regression coefficient of the return $R^i$ on $m$. This is a beta pricing model. It says that expected returns on assets $i = 1, 2, \ldots, N$ should be proportional to their betas in a regression of returns on the discount factor. Notice that the coefficient $\lambda_m$ is the same for all assets $i$, while the $\beta_{i,m}$ varies from asset to asset. The $\lambda_m$ is often interpreted as the price of risk and the $\beta$ as the quantity of risk in each asset.

Obviously, there is nothing deep about saying that expected returns are proportional to betas rather than to covariances. There is a long historical tradition and some minor convenience in favor of betas. The betas refer to the projection of $R$ on $m$ that we studied above, so you see again a sense in which only the systematic component of risk matters.
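The beta representation (2.15) is an identity once returns are priced by $m$, which the following sketch confirms on a three-state example. The probabilities, discount factor values, and payoffs are hypothetical, and the helper names are introduced here.

```python
probs = [0.3, 0.4, 0.3]
m = [1.10, 0.95, 0.85]   # hypothetical discount factor in each state

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def covariance(a, b):
    return expectation([ai * bi for ai, bi in zip(a, b)]) - expectation(a) * expectation(b)

def to_return(payoff):
    """Turn a payoff into a gross return by dividing by its price p = E(m x)."""
    p = expectation([mi * xi for mi, xi in zip(m, payoff)])
    return [xi / p for xi in payoff]

Rf = 1.0 / expectation(m)
lambda_m = -covariance(m, m) / expectation(m)   # same for every asset

payoffs = {"procyclical": [0.5, 1.0, 2.0], "insurance": [2.0, 1.0, 0.5], "mixed": [1.0, 3.0, 1.5]}
results = {}
for name, payoff in payoffs.items():
    R = to_return(payoff)
    beta_im = covariance(R, m) / covariance(m, m)   # regression coefficient of R on m
    results[name] = (expectation(R), Rf + beta_im * lambda_m)
    print(name, results[name])
```

Only `beta_im` differs across assets; `lambda_m` is computed once from the discount factor alone.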

With $m = \beta(c_{t+1}/c_t)^{-\gamma}$, we can take a Taylor approximation of equation (2.14) to express betas in terms of a more concrete variable, consumption growth, rather than marginal utility. The result, which I derive more explicitly and conveniently in the continuous time limit below, is

$$E(R^i) = R^f + \beta_{i,\Delta c}\lambda_{\Delta c} \tag{2.16}$$

$$\lambda_{\Delta c} = \gamma \operatorname{var}(\Delta c).$$

Expected returns should increase linearly with their betas on consumption growth itself. In addition, though it is treated as a free parameter in many applications, the factor risk premium $\lambda_{\Delta c}$ is determined by risk aversion and the volatility of consumption. The more risk averse people are, or the riskier their environment, the larger an expected return premium one must pay to get investors to hold risky (high beta) assets.

2.4.5 Mean-variance frontier

All asset returns lie inside a mean-variance frontier. Assets on the frontier are perfectly correlated with each other and the discount factor. Returns on the frontier can be generated as portfolios of any two frontier returns. We can construct a discount factor from any frontier return (except $R^f$), and an expected return-beta representation holds using any frontier return (except $R^f$) as the factor.

Asset pricing theory has focused a lot on the means and variances of asset returns. Interestingly, the set of means and variances of returns is limited. All assets priced by the discount factor $m$ must obey

$$\left|E(R^i) - R^f\right| \le \frac{\sigma(m)}{E(m)}\sigma(R^i). \tag{2.17}$$

To derive (2.17) write for a given asset return $R^i$

$$1 = E(mR^i) = E(m)E(R^i) + \rho_{m,R^i}\sigma(R^i)\sigma(m)$$

and hence

$$E(R^i) = R^f - \rho_{m,R^i}\frac{\sigma(m)}{E(m)}\sigma(R^i). \tag{2.18}$$

Correlation coefficients can’t be greater than one in magnitude, leading to (2.17).
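The bound (2.17) can be illustrated on a three-state example: arbitrary priced returns satisfy the inequality, and a return that is perfectly correlated with $m$ attains it. All numbers and helper names below are hypothetical choices for this sketch; the return $m/E(m^2)$ is used simply as one convenient return perfectly correlated with the discount factor.

```python
import math

probs = [0.3, 0.4, 0.3]
m = [1.10, 0.95, 0.85]   # hypothetical discount factor in each state

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def std(values):
    mean = expectation(values)
    return math.sqrt(expectation([(v - mean) ** 2 for v in values]))

def to_return(payoff):
    p = expectation([mi * xi for mi, xi in zip(m, payoff)])
    return [xi / p for xi in payoff]

Rf = 1.0 / expectation(m)
max_slope = std(m) / expectation(m)   # sigma(m) / E(m)

def sharpe(R):
    return abs(expectation(R) - Rf) / std(R)

sharpe_ratios = [sharpe(to_return(x)) for x in ([0.5, 1.0, 2.0], [2.0, 1.0, 0.5], [1.0, 3.0, 1.5])]

# The payoff x = m has price E(m^2), so m / E(m^2) is a return perfectly correlated with m.
frontier_return = to_return(m)
frontier_sharpe = sharpe(frontier_return)

print(sharpe_ratios, frontier_sharpe, max_slope)
```

Because this frontier return is positively correlated with $m$, its expected return lies below $R^f$; it sits on the lower part of the frontier.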

This simple calculation has many interesting and classic implications.

1. Means and variances of asset returns must lie in the wedge-shaped region illustrated in Figure 1. The boundary of the mean-variance region in which assets can lie is called the mean-variance frontier. It answers a naturally interesting question, “how much mean return can you get for a given level of variance?”

[Figure 1: plot of $E(R)$ against $\sigma(R)$ showing the mean-variance frontier emanating from $R^f$ with slope $\sigma(m)/E(m)$, some asset returns inside it, and the idiosyncratic risk component of a return $R^i$.]

Figure 1. Mean-variance frontier. The mean and standard deviation of all assets priced by a discount factor $m$ must lie in the wedge-shaped region.

2. All returns on the frontier are perfectly correlated with the discount factor: the frontier is generated by $\left|\rho_{m,R^i}\right| = 1$. Returns on the upper part of the frontier are perfectly negatively correlated with the discount factor and hence positively correlated with consumption. They are “maximally risky” and thus get the highest expected returns. Returns on the lower part of the frontier are perfectly positively correlated with the discount factor and hence negatively correlated with consumption. They thus provide the best insurance against consumption fluctuations.

3. All frontier returns are also perfectly correlated with each other, since they are all perfectly correlated with the discount factor. This fact implies that we can span or synthesize any frontier return from two such returns. For example, if you pick any single frontier return $R^m$ then all frontier returns $R^{mv}$ must be expressible as

$$R^{mv} = R^f + a\left(R^m - R^f\right)$$

for some number $a$.

4. Since each point on the mean-variance frontier is perfectly correlated with the discount factor, we must be able to pick constants $a, b, d, e$ such that

$$m = a + bR^{mv}$$

$$R^{mv} = d + em.$$

Thus, any mean-variance efficient return carries all pricing information. Given a mean-variance efficient return and the risk free rate, we can find a discount factor that prices all assets and vice versa.

5. Given a discount factor, we can also construct a single-beta representation, so expected returns can be described in a single-beta representation using any mean-variance efficient return (except the riskfree rate),

$$E(R^i) = R^f + \beta_{i,mv}\left[E(R^{mv}) - R^f\right].$$

The essence of the $\beta$ pricing model is that, even though the means and standard deviations of returns fill out the space inside the mean-variance frontier, a graph of mean returns versus betas should yield a straight line. Since the beta model applies to every return including $R^{mv}$ itself, and $R^{mv}$ has a beta of one on itself, we can identify the factor risk premium as $\lambda = E(R^{mv} - R^f)$.

The last two points suggest an intimate relationship between discount factors, beta models and mean-variance frontiers. I explore this relation in detail in Chapter 7. A problem at the end of this chapter guides you through the algebra to demonstrate points 4 and 5 explicitly.

6. We can plot the decomposition of a return into a “priced” or “systematic” component and a “residual,” or “idiosyncratic” component as shown in Figure 1. The priced part is perfectly correlated with the discount factor, and hence perfectly correlated with any frontier asset. The residual or idiosyncratic part generates no expected return, so it lies flat as shown in the figure, and it is uncorrelated with the discount factor or any frontier asset.

2.4.6 Slope of the mean-standard deviation frontier and equity premium puzzle

The Sharpe ratio is limited by the volatility of the discount factor. The maximal risk-return tradeoff is steeper if there is more risk or more risk aversion

$$\left|\frac{E(R) - R^f}{\sigma(R)}\right| \le \frac{\sigma(m)}{E(m)} \approx \gamma\sigma(\Delta \ln c)$$

This formula captures the equity premium puzzle, which suggests that either people are very risk averse, or the stock returns of the last 50 years were good luck which will not continue.

The ratio of mean excess return to standard deviation

$$\frac{E(R^i) - R^f}{\sigma(R^i)} = \text{Sharpe ratio}$$

is known as the Sharpe ratio. It is a more interesting characterization of any security than the mean return alone. If you borrow and put more money into a security, you can increase the mean return of your position, but you do not increase the Sharpe ratio, since the standard deviation increases at the same rate as the mean. The slope of the mean-standard deviation frontier is the largest available Sharpe ratio, and thus is naturally interesting. It answers “how much more mean return can I get by shouldering a bit more volatility in my portfolio?”

Let $R^{mv}$ denote the return of a portfolio on the frontier. From equation (2.17), the slope of the frontier is

$$\left|\frac{E(R^{mv}) - R^f}{\sigma(R^{mv})}\right| = \frac{\sigma(m)}{E(m)} = \sigma(m)R^f.$$

Thus, the slope of the frontier is governed by the volatility of the discount factor.

For an economic interpretation, again consider the power utility function, $u'(c) = c^{-\gamma}$,

$$\left|\frac{E(R^{mv}) - R^f}{\sigma(R^{mv})}\right| = \frac{\sigma\left[(c_{t+1}/c_t)^{-\gamma}\right]}{E\left[(c_{t+1}/c_t)^{-\gamma}\right]}. \tag{2.19}$$

The standard deviation is large if consumption is volatile or if $\gamma$ is large. We can state this approximation again using the lognormal assumption. If consumption growth is lognormal,

$$\left|\frac{E(R^{mv}) - R^f}{\sigma(R^{mv})}\right| = \sqrt{e^{\gamma^2\sigma^2(\Delta \ln c_{t+1})} - 1} \approx \gamma\sigma(\Delta \ln c). \tag{2.20}$$

(A problem at the end of the chapter guides you through the algebra of the first equality. The relation is exact in continuous time, and thus the approximation is easiest to derive by reference to the continuous time result; see section 2.30.)

Reading the equation, the slope of the mean-standard deviation frontier is higher if the economy is riskier (if consumption is more volatile) or if investors are more risk averse. Both situations naturally make investors more reluctant to take on the extra risk of holding risky assets. This expression is also the slope of the expected return-beta line of the consumption beta model, (2.16).

In postwar US data, the slope of the mean-standard deviation frontier, or of expected return-beta lines, is much higher than reasonable risk aversion and consumption volatility estimates suggest. This is the “equity premium puzzle.” Over the last 50 years in the U.S., real stock returns have averaged 9% with a standard deviation of about 16%, while the real return on treasury bills has been about 1%. Thus, the historical annual market Sharpe ratio has been about 0.5. Aggregate consumption growth has had a standard deviation of about 1%. Thus, we can only reconcile these facts with (2.20) if investors have a risk aversion coefficient of 50!

Obvious ways of generalizing the calculation just make matters worse. Equation (2.20) relates consumption growth to the mean-variance frontier of all contingent claims. The market indices with 0.5 Sharpe ratios are if anything inside that frontier, so recognizing market incompleteness will only make matters worse. Aggregate consumption has about 0.2 correlation with the market return, while the equality (2.20) takes the worst possible case that consumption growth and asset returns are perfectly correlated. If you add this fact, you need risk aversion of 250 to explain the market Sharpe ratio in the face of 1% consumption volatility! Individuals have riskier consumption streams than aggregate, but as their risk goes up their correlation with any aggregate must decrease proportionally, so to first order recognizing individual risk will not help either.
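The arithmetic behind the risk aversion figures of 50 and 250 can be reproduced directly from the numbers quoted above, using the approximation in (2.20). This is a back-of-the-envelope sketch; the variable names are introduced here, and the inputs are the rounded figures from the text.

```python
import math

mean_stock_return = 0.09      # average real stock return
mean_bill_return = 0.01       # average real treasury bill return
std_stock_return = 0.16       # standard deviation of real stock returns
std_consumption = 0.01        # standard deviation of aggregate consumption growth
corr_with_market = 0.2        # correlation of consumption growth with the market return

sharpe_ratio = (mean_stock_return - mean_bill_return) / std_stock_return

# Approximation in (2.20): Sharpe ratio ~ gamma * sigma(dlnc), perfect correlation assumed.
gamma_perfect_correlation = sharpe_ratio / std_consumption

# With correlation rho, the market Sharpe ratio is roughly gamma * sigma(dlnc) * rho.
gamma_with_correlation = sharpe_ratio / (corr_with_market * std_consumption)

# Exact lognormal frontier slope from (2.20) at gamma = 50, for comparison with the approximation.
exact_slope = math.sqrt(math.exp(gamma_perfect_correlation ** 2 * std_consumption ** 2) - 1.0)

print(sharpe_ratio, gamma_perfect_correlation, gamma_with_correlation, exact_slope)
```

At risk aversion this high the linear approximation is only rough (the exact lognormal slope is somewhat above 0.5), but the order of magnitude of the puzzle is unchanged.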

Clearly, either 1) people are a lot more risk averse than we might have thought, 2) the stock returns of the last 50 years were largely good luck rather than an equilibrium compensation for risk, or 3) something is deeply wrong with the model, including the utility function and use of aggregate consumption data. This “equity premium puzzle” has attracted the attention of a lot of research in finance, especially on the last item. I return to the equity premium in more detail in Chapter ??.

2.4.7 Random walks and time-varying expected returns

If investors are risk neutral, returns are unpredictable, and prices follow martingales. In general, prices scaled by marginal utility are martingales, and returns can be predictable if investors are risk averse and if the conditional second moments of returns and discount factors vary over time. This is more plausible at long horizons.

So far, we have concentrated on the behavior of prices or expected returns across assets. We should also consider the behavior of the price or return of a given asset over time. Going back to the basic first order condition,

$$p_t u'(c_t) = E_t\left[\beta u'(c_{t+1})(p_{t+1} + d_{t+1})\right]. \tag{2.21}$$

If investors are risk neutral, i.e. if $u(c)$ is linear or there is no variation in consumption, if the security pays no dividends between $t$ and $t+1$, and for short time horizons where $\beta$ is close to one, this equation reduces to

$$p_t = E_t(p_{t+1}).$$

Equivalently, prices follow a time-series process of the form

$$p_{t+1} = p_t + \varepsilon_{t+1}.$$

If the variance $\sigma^2_t(\varepsilon_{t+1})$ is constant, prices follow a random walk. More generally, prices follow a martingale. Intuitively, if the price today is a lot lower than investors’ expectations of the price tomorrow, then people will try to buy the security. But this action will drive up the price of the security until the price today does equal the expected price tomorrow. Another way of saying the same thing is that returns should not be predictable; dividing by $p_t$, expected returns $E_t(p_{t+1}/p_t) = 1$ should be constant; returns should be like coin flips.

The more general equation (2.21) says that prices should follow a martingale after adjusting for dividends and scaling by marginal utility. Since martingales have useful mathematical properties, and since risk-neutrality is such a simple economic environment, many asset pricing results are easily derived by scaling prices and dividends by marginal utility first, and then using “risk-neutral” formulas and economic arguments.

Since consumption and risk aversion don’t change much day to day, we might expect the random walk view to hold pretty well on a day-to-day basis. This idea contradicts the still popular notion that there are “systems” or “technical analysis” by which one can predict where stock prices are going on any given day. The random walk view has been remarkably successful. Despite decades of dredging the data, and the popularity of television and radio reports that purport to explain where markets are going, trading rules that survive transactions costs and do not implicitly expose the investor to risk have not yet been reliably demonstrated.

However, more recently, evidence has accumulated that long-horizon excess returns are quite predictable, and to some this indicates that the whole enterprise of economic explanation of asset returns is flawed. To think about this issue, write our basic equation for expected returns as

$$E_t(R_{t+1}) - R^f_t = -\frac{\operatorname{cov}_t(m_{t+1}, R_{t+1})}{E_t(m_{t+1})} \tag{2.22}$$

$$= -\frac{\sigma_t(m_{t+1})}{E_t(m_{t+1})}\,\sigma_t(R_{t+1})\,\rho_t(m_{t+1}, R_{t+1})$$

$$\approx -\gamma_t\,\sigma_t(\Delta c_{t+1})\,\sigma_t(R_{t+1})\,\rho_t(m_{t+1}, R_{t+1}).$$

I include the $t$ subscripts to emphasize that the relation applies to conditional moments. Sometimes, the conditional mean or other moment of a random variable is different from its unconditional moment. Conditional on tonight’s weather forecast, you can better predict rain tomorrow than just knowing the average rain for that date. In the special case that random variables are i.i.d. (independent and identically distributed), like coin flips, the conditional and unconditional moments are the same, but that is a special case and not likely to be true of asset prices, returns, and macroeconomic variables. In the theory so far, we have thought of an investor, today, forming expectations of payoffs, consumption, and other variables tomorrow. Thus, the moments are really all conditional, and if we want to be precise we should include some notation to express this fact. I use subscripts $E_t(x_{t+1})$ to denote conditional expectation; the notation $E(x_{t+1}|I_t)$ where $I_t$ is the information set at time $t$ is more precise but a little more cumbersome.

Examining equation (2.22), we see that returns can be somewhat predictable. First, if the conditional variance of returns changes over time, so can and should the conditional mean: the return can just move in and out along a line of constant Sharpe ratio. This explanation does not seem to help much in the data; variables that forecast means do not seem to forecast variances and vice versa. Unless we want to probe the conditional correlation, predictable excess returns have to be explained by changing risk, $\sigma_t(\Delta c_{t+1})$, or changing risk aversion $\gamma$. It is not plausible that risk or risk aversion change at daily frequencies, but fortunately returns are not predictable at daily frequencies. It is much more plausible that risk and risk aversion change over the business cycle, and this is exactly the horizon at which we see predictable excess returns. Models that make this connection precise are a very active area of current research.

2.4.8 Present value statement

$$p_t = E_t \sum_{j=1}^{\infty} m_{t,t+j}\, d_{t+j}.$$

It is convenient to use only the two period valuation, thinking of a price $p_t$ and a payoff $x_{t+1}$. But there are times when we want to relate a price to the entire cash flow stream, rather than just to one dividend and next period's price.

The most straightforward way to do this is to write out a longer term objective,

$$E_t \sum_{j=0}^{\infty} \beta^j u(c_{t+j}).$$

Now suppose an investor can purchase a stream $\{d_{t+j}\}$ at price $p_t$. As with the two-period model, his first order condition gives us the pricing formula directly,

$$p_t = E_t \sum_{j=1}^{\infty} \beta^j \frac{u'(c_{t+j})}{u'(c_t)}\, d_{t+j} = E_t \sum_{j=1}^{\infty} m_{t,t+j}\, d_{t+j}. \tag{2.23}$$

You can see that if this equation holds at time $t$ and time $t+1$, then we can derive the two-period version

$$p_t = E_t\left[m_{t+1}\,(p_{t+1} + d_{t+1})\right]. \tag{2.24}$$

Thus, the infinite period and two period models are equivalent.

(Going in the other direction is a little tougher. If you chain together (2.24), you get (2.23) plus an extra term. To get (2.23) you also need the "transversality condition" $\lim_{j \to \infty} E_t\left[m_{t,t+j}\, p_{t+j}\right] = 0$. This is an extra first order condition of the infinite period investor, which is not present with overlapping generations of two-period investors. It rules out "bubbles" in which prices grow so fast that people will buy now just to resell at higher prices later, even if there are no dividends.)
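The equivalence can be checked numerically in a simple special case. The sketch below assumes power utility, i.i.d. consumption growth over a finite set of states, and a dividend equal to consumption. These are illustrative assumptions, and all function names are hypothetical. Under them $E_t[m_{t,t+j} d_{t+j}]/d_t = q^j$ with $q = E[\beta G^{1-\gamma}]$, so the present value sum and the two-period recursion can be compared directly, and the transversality term $q^j \cdot (p/d)$ vanishes when $q < 1$.

```python
def one_period_q(beta, gamma, growth_states, probs):
    """q = E[beta * G^(1 - gamma)], where G is gross consumption growth."""
    return beta * sum(p * g ** (1.0 - gamma) for g, p in zip(growth_states, probs))


def pd_present_value(beta, gamma, growth_states, probs, horizon):
    """Truncated present value: sum_{j=1}^{horizon} E_t[m_{t,t+j} d_{t+j}] / d_t."""
    q = one_period_q(beta, gamma, growth_states, probs)
    return sum(q ** j for j in range(1, horizon + 1))


def pd_two_period(beta, gamma, growth_states, probs):
    """Fixed point of the two-period relation p/d = E[m G (p/d + 1)] = q (p/d + 1)."""
    q = one_period_q(beta, gamma, growth_states, probs)
    if q >= 1.0:
        raise ValueError("q >= 1: the sum diverges and transversality fails")
    return q / (1.0 - q)


if __name__ == "__main__":
    args = (0.96, 2.0, (0.98, 1.04), (0.5, 0.5))  # illustrative numbers only
    print(pd_present_value(*args, horizon=3000), pd_two_period(*args))
```

The long-horizon sum and the recursive solution agree only because the leftover term shrinks with the horizon; with $q \geq 1$ the recursion still has formal solutions but the present value does not exist.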

From (2.23) we can write a risk-adjustment to prices, as we did with one period payoffs,

$$p_t = \sum_{j=1}^{\infty} \frac{E_t\, d_{t+j}}{R^f_{t,t+j}} + \sum_{j=1}^{\infty} cov_t\left(d_{t+j},\, m_{t,t+j}\right)$$

where $R^f_{t,t+j} \equiv E_t(m_{t,t+j})^{-1}$ is the $j$ period interest rate. Again, assets whose dividend streams covary negatively with marginal utility, and positively with consumption, have lower prices, since holding those assets gives the investor a more volatile consumption stream. (It is common instead to write prices as a discounted value using a risk adjusted discount factor, e.g. $p_t = \sum_{j=1}^{\infty} E_t\, d_{t+j} / R_{t,t+j}$, but this approach is difficult to use correctly for multiperiod problems, especially when expected returns can vary over time.)
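Each term of the risk-adjusted sum rests on the identity $E(md) = E(m)E(d) + cov(m,d)$ together with $R^f = 1/E(m)$. A small sketch for a single payoff date with a finite number of states checks this; the state values and function names are illustrative assumptions.

```python
def expectation(probs, values):
    """Probability-weighted mean over a finite set of states."""
    return sum(p * v for p, v in zip(probs, values))


def price_direct(probs, m, d):
    """p = E(m d)."""
    return expectation(probs, [mi * di for mi, di in zip(m, d)])


def price_risk_adjusted(probs, m, d):
    """p = E(d) / R^f + cov(m, d), with R^f = 1 / E(m)."""
    e_m, e_d = expectation(probs, m), expectation(probs, d)
    cov = expectation(probs, [(mi - e_m) * (di - e_d) for mi, di in zip(m, d)])
    return e_d * e_m + cov


if __name__ == "__main__":
    probs = (0.5, 0.5)
    m = (1.1, 0.8)   # high marginal utility in the first (bad) state
    d = (0.9, 1.2)   # dividend is low exactly when marginal utility is high
    print(price_direct(probs, m, d), price_risk_adjusted(probs, m, d))
```

In this example the dividend covaries negatively with the discount factor, so its price falls below the riskfree present value $E(d)/R^f$.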

2.5 Discount factors in continuous time

Continuous time versions of the basic pricing equations.

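$$
\begin{array}{ll}
\text{Discrete} & \text{Continuous} \\[4pt]
p_t = E_t \sum_{j=1}^{\infty} \beta^j \dfrac{u'(c_{t+j})}{u'(c_t)}\, D_{t+j} & p_t\, u'(c_t) = E_t \int_{s=0}^{\infty} e^{-\delta s}\, u'(c_{t+s})\, D_{t+s}\, ds \\[8pt]
m_{t+1} = \beta \dfrac{u'(c_{t+1})}{u'(c_t)} & \Lambda_t = e^{-\delta t}\, u'(c_t) \\[8pt]
p = E(mx) & 0 = \Lambda D\, dt + E_t\left[d(\Lambda p)\right] \\[8pt]
E(R) = R^f - R^f cov(m, R) & E_t\left(\dfrac{dp}{p}\right) + \dfrac{D}{p}\, dt = r^f_t\, dt - E_t\left[\dfrac{d\Lambda}{\Lambda} \dfrac{dp}{p}\right]
\end{array}
$$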


It is often convenient to express asset pricing ideas in the language of continuous time stochastic differential equations rather than discrete time stochastic difference equations as I have done so far. The appendix contains a brief introduction to continuous time processes that covers what you need to know for this book. Even if you want to end up with a discrete time representation, manipulations are often easier in continuous time. For example, relating interest rates and Sharpe ratios to consumption growth in the last section required a clumsy lognormal approximation; you'll see the same sort of thing done much more cleanly in this section.

The choice of discrete vs. continuous time is one of modeling convenience. The richness of the theory of continuous time processes often allows one to obtain analytical results that would be unavailable in discrete time. On the other hand, in the complexity of most practical situations, one often ends up resorting to numerical simulation of a discretized model anyway. In those cases, it might be clearer to start with a discrete model. But I emphasize this is all a choice of language. One should become familiar enough with discrete as well as continuous time representations of the same ideas to pick the representation that is most convenient for a particular application.

First, we need to think about how to model securities, in place of price $p_t$ and one-period payoff $x_{t+1}$. Let a generic security have price $p_t$ at any moment in time, and let it pay dividends at the rate $D_t\, dt$. (I will continue to denote functions of time as $p_t$ rather than $p(t)$ to maintain continuity with the discrete-time treatment, and I will drop the time subscripts where they are obvious, e.g. $dp$ in place of $dp_t$. In an interval $dt$, the security pays dividends $D_t\, dt$. I use capital $D$ for dividends to distinguish them from the differential operator $d$.)

The instantaneous total return is

$$\frac{dp_t}{p_t} + \frac{D_t}{p_t}\, dt.$$

We model the price of risky assets as diffusions, for example

$$\frac{dp_t}{p_t} = \mu(\cdot)\, dt + \sigma(\cdot)\, dz.$$

(I will reserve the notation $dz$ for increments to a standard Brownian motion, e.g. $z_{t+\Delta} - z_t \sim N(0, \Delta)$. I use the notation $(\cdot)$ to indicate that the drift and diffusions can be functions of state variables. I limit the discussion to diffusion processes, with no jumps.) What's nice about this diffusion model is that the increments $dz$ are normal; the dependence of $\mu$ and $\sigma$ on state variables means that the finite time distribution of prices $f(p_{t+\Delta} \mid I_t)$ need not be normal.
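To make the diffusion concrete, here is a minimal simulation sketch. It assumes an Euler discretization with step $\Delta$, which is a standard numerical device but not something this section describes, and it lets the drift and diffusion depend on a single state variable, the price itself. The function name and arguments are hypothetical.

```python
import math
import random


def simulate_price(p0, mu, sigma, horizon, n_steps, seed=0):
    """Euler approximation of dp/p = mu(p) dt + sigma(p) dz on [0, horizon].

    mu and sigma are callables of the current price (the only state variable
    in this sketch). Returns the list of n_steps + 1 simulated prices.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [p0]
    for _ in range(n_steps):
        p = path[-1]
        dz = rng.gauss(0.0, math.sqrt(dt))  # z_{t+dt} - z_t ~ N(0, dt)
        path.append(p * (1.0 + mu(p) * dt + sigma(p) * dz))
    return path


if __name__ == "__main__":
    # Constant drift, volatility that falls as the price rises (illustrative).
    path = simulate_price(1.0, lambda p: 0.05, lambda p: 0.2 / math.sqrt(p), 1.0, 250)
    print(path[-1])
```

Each increment is normal, but because `sigma` varies with the state, the distribution of the price after many steps is generally not normal.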

We can think of a riskfree security as one that has a constant price equal to one and pays the riskfree rate as a dividend,

$$p = 1; \quad D_t = r^f_t, \tag{2.25}$$

or as a security that pays no dividend but whose price climbs deterministically at a rate

$$\frac{dp_t}{p_t} = r^f_t\, dt. \tag{2.26}$$

Next, we need to express the first order conditions in continuous time. The utility function is

$$U(\{c_t\}) = E \int_{t=0}^{\infty} e^{-\delta t}\, u(c_t)\, dt.$$

Suppose the investor can buy a security whose price is $p_t$ and that pays a dividend stream $D_t$. As we did in deriving the present value price relation in discrete time, the first order condition for this problem gives us the infinite period version of the basic pricing equation right away (see footnote 1),

$$p_t\, u'(c_t) = E_t \int_{s=0}^{\infty} e^{-\delta s}\, u'(c_{t+s})\, D_{t+s}\, ds. \tag{2.27}$$

This equation is an obvious continuous time analogue to

$$p_t = E_t \sum_{j=1}^{\infty} \beta^j \frac{u'(c_{t+j})}{u'(c_t)}\, D_{t+j}.$$

It turns out that dividing by $u'(c_t)$ is not a good idea in continuous time, since the ratio $u'(c_{t+\Delta})/u'(c_t)$ isn't well behaved for small time intervals. Instead, we keep track of the level of marginal utility. Therefore, define the "discount factor" in continuous time as

$$\Lambda_t \equiv e^{-\delta t}\, u'(c_t).$$

Then we can write the pricing equation as

$$p_t \Lambda_t = E_t \int_{s=0}^{\infty} \Lambda_{t+s}\, D_{t+s}\, ds. \tag{2.28}$$

(Some people like to define $\Lambda_t = u'(c_t)$, in which case you keep the $e^{-\delta t}$ in the equation, or to scale $\Lambda_t$ by the riskfree rate, in which case you get an extra $e^{-\int_{\tau=0}^{s} r^f_{t+\tau}\, d\tau}$ in the equation. The latter procedure makes it look like a risk-neutral or present-value formula valuation.)

The analogue to the one period pricing equation $p = E(mx)$ is

$$0 = \Lambda D\, dt + E_t\left[d(\Lambda p)\right]. \tag{2.29}$$

Footnote 1. One unit of the security pays the dividend stream $D_t$, i.e. $D_t\, dt$ units of the numeraire consumption good in a time interval $dt$. The security costs $p_t$ units of the consumption good. The investor can finance the purchase of $\xi$ units of the security by reducing consumption from $e_t$ to $c_t = e_t - \xi p_t / dt$ during time interval $dt$. The loss in utility from doing so is $u'(c_t)(e_t - c_t)\, dt = u'(c_t)\, \xi p_t$. The gain is the right hand side of (2.27) multiplied by $\xi$.


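To derive this fundamental equation, take the difference of equation (2.28) at $t$ and $t + \Delta$. (Or, start directly with the first order condition for buying the security at $t$ and selling it at $t + \Delta$.)

$$p_t \Lambda_t = E_t \int_{s=0}^{\Delta} \Lambda_{t+s}\, D_{t+s}\, ds + E_t\left[\Lambda_{t+\Delta}\, p_{t+\Delta}\right]$$

For $\Delta$ small the term in the integral can be approximated

$$p_t \Lambda_t \approx \Lambda_t D_t \Delta + E_t\left[\Lambda_{t+\Delta}\, p_{t+\Delta}\right]. \tag{2.30}$$

We want to get to $d$ something, so introduce differences by writing

$$p_t \Lambda_t \approx \Lambda_t D_t \Delta + E_t\left[\Lambda_t p_t + \left(\Lambda_{t+\Delta}\, p_{t+\Delta} - \Lambda_t p_t\right)\right]. \tag{2.31}$$

Canceling $p_t \Lambda_t$,

$$0 \approx \Lambda_t D_t \Delta + E_t\left(\Lambda_{t+\Delta}\, p_{t+\Delta} - \Lambda_t p_t\right).$$

Taking the limit as $\Delta \to 0$,

$$0 = \Lambda_t D_t\, dt + E_t\left[d(\Lambda_t p_t)\right]$$

or, dropping time subscripts, equation (2.29).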

Equation (2.29) looks different than $p = E(mx)$ because there is no price on the left hand side; we are used to thinking of the one period pricing equation as determining price at $t$ given other things, including price at $t+1$. But price at $t$ is really here, of course, as you can see from equation (2.30) or (2.31). It is just easier to express the difference in price over time rather than price today on the left and payoff (including price tomorrow) on the right.

With no dividends and constant $\Lambda$, $0 = E_t(dp_t) = E_t(p_{t+\Delta} - p_t)$ says that price should follow a martingale. Thus, $E_t\left[d(\Lambda p)\right] = 0$ means that marginal utility-weighted price should follow a martingale, and (2.29) adjusts for dividends. Thus, it's the same as the equation (2.21), $p_t\, u'(c_t) = E_t\left[\beta u'(c_{t+1})\,(p_{t+1} + d_{t+1})\right]$, that we derived in discrete time.

Since we will write down price processes for $dp$ and discount factor processes for $d\Lambda$, and to interpret (2.29) in terms of expected returns, it is often convenient to break up the $d(\Lambda_t p_t)$ term using Ito's lemma:

$$d(\Lambda p) = p\, d\Lambda + \Lambda\, dp + dp\, d\Lambda. \tag{2.32}$$
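The three terms in the product rule have an exact finite-difference counterpart, which may help in seeing where the cross term comes from. The sketch below is only that discrete identity, with hypothetical names; the continuous time statement additionally uses the fact that for diffusions $d\Lambda$ and $dp$ are each of order $\sqrt{dt}$, so their product is of order $dt$ and does not vanish in the limit.

```python
def product_increment_terms(lam0, p0, lam1, p1):
    """Split (lam1 * p1 - lam0 * p0) into p dLam, Lam dp, and dLam dp."""
    d_lam = lam1 - lam0
    d_p = p1 - p0
    return p0 * d_lam, lam0 * d_p, d_lam * d_p


if __name__ == "__main__":
    terms = product_increment_terms(1.00, 10.0, 0.97, 10.4)
    print(terms, sum(terms), 0.97 * 10.4 - 1.00 * 10.0)
```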

Using the expanded version (2.32) in the basic equation (2.29), and dividing by $p\Lambda$ to make it pretty, we obtain an equivalent, slightly less compact but slightly more intuitive version,

$$0 = \frac{D}{p}\, dt + E_t\left[\frac{d\Lambda}{\Lambda} + \frac{dp}{p} + \frac{d\Lambda}{\Lambda} \frac{dp}{p}\right]. \tag{2.33}$$

(This formula only works when both $\Lambda$ and $p$ can never be zero. That is often enough the case that this formula is useful. If not, multiply through by $\Lambda$ and $p$ and keep them in numerators.)

Applying the basic pricing equations (2.29) or (2.33) to a riskfree rate, defined as (2.25) or (2.26), we obtain

$$r^f_t\, dt = -E_t\left(\frac{d\Lambda_t}{\Lambda_t}\right). \tag{2.34}$$

This equation is the obvious continuous time equivalent to

$$R^f_t = \frac{1}{E_t(m_{t+1})}.$$

If a riskfree rate is not traded, we can use (2.34) to define a shadow riskfree rate or zero-beta rate.

With this interpretation, we can rearrange equation (2.33) as

$$E_t\left(\frac{dp_t}{p_t}\right) + \frac{D_t}{p_t}\, dt = r^f_t\, dt - E_t\left[\frac{d\Lambda_t}{\Lambda_t} \frac{dp_t}{p_t}\right]. \tag{2.35}$$

This is the obvious continuous-time analogue to

$$E(R) = R^f - R^f cov(m, R). \tag{2.36}$$

The last term in (2.35) is the covariance of the return with the discount factor or marginal utility. Since means are order $dt$, there is no difference between covariance and second moment in the last term of (2.35). The interest rate component of the last term of (2.36) naturally vanishes as the time interval gets short.

Ito's lemma makes many transformations simple in continuous time. For example, the nonlinear transformation between consumption and the discount factor led us to some tricky approximations in discrete time. This transformation is easy in continuous time (diffusions are locally normal, so it's really the same trick). With $\Lambda_t = e^{-\delta t} u'(c_t)$ we have

$$d\Lambda_t = -\delta e^{-\delta t} u'(c_t)\, dt + e^{-\delta t} u''(c_t)\, dc_t + \frac{1}{2} e^{-\delta t} u'''(c_t)\, dc_t^2$$

$$\frac{d\Lambda_t}{\Lambda_t} = -\delta\, dt + \frac{c_t u''(c_t)}{u'(c_t)} \frac{dc_t}{c_t} + \frac{1}{2} \frac{c_t^2 u'''(c_t)}{u'(c_t)} \frac{dc_t^2}{c_t^2}. \tag{2.37}$$

Denote the local curvature and third derivative of the utility function as

$$\gamma_t = -\frac{c_t u''(c_t)}{u'(c_t)}, \qquad \eta_t = \frac{c_t^2 u'''(c_t)}{u'(c_t)}.$$

(For power utility, the former is the power coefficient $\gamma$ and the latter is $\eta_t = \gamma(\gamma + 1)$.)

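Using this formula we can quickly redo the relationship between interest rates and consumption growth, equation (2.7),

$$r^f_t = -\frac{1}{dt} E_t\left(\frac{d\Lambda_t}{\Lambda_t}\right) = \delta + \gamma_t \frac{1}{dt} E_t\left(\frac{dc_t}{c_t}\right) - \frac{1}{2} \eta_t \frac{1}{dt} E_t\left(\frac{dc_t^2}{c_t^2}\right).$$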

We can also easily express asset prices in terms of consumption risk rather than discount factor risk, as in equation (2.16). Using (2.37) in (2.35),

$$E_t\left(\frac{dp_t}{p_t}\right) + \frac{D_t}{p_t}\, dt - r^f_t\, dt = \gamma\, E_t\left[\frac{dc_t}{c_t} \frac{dp_t}{p_t}\right]. \tag{2.38}$$

Thus, assets whose returns covary more strongly with consumption get higher mean returns, and the constant relating covariance to mean return is the utility curvature coefficient $\gamma$.
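The interest rate formula and (2.38) are simple enough to evaluate directly. The sketch below assumes power utility, so that $\eta = \gamma(\gamma+1)$, and takes all moments per unit time (that is, already divided by $dt$). The function names and the numerical inputs are illustrative assumptions, not values from the text.

```python
def riskfree_rate(delta, gamma, mean_growth, var_growth):
    """r^f = delta + gamma * E(dc/c)/dt - 0.5 * gamma * (gamma + 1) * E(dc^2/c^2)/dt."""
    eta = gamma * (gamma + 1.0)  # power utility assumption
    return delta + gamma * mean_growth - 0.5 * eta * var_growth


def expected_excess_return(gamma, cov_consumption_return):
    """Equation (2.38): expected excess return = gamma * E[(dc/c)(dp/p)]/dt."""
    return gamma * cov_consumption_return


if __name__ == "__main__":
    # Illustrative annual numbers: 2% mean growth, 2% growth volatility.
    print(riskfree_rate(delta=0.01, gamma=2.0, mean_growth=0.02, var_growth=0.02 ** 2))
    # Return with 16% volatility and 0.2 correlation with consumption growth.
    print(expected_excess_return(gamma=2.0, cov_consumption_return=0.2 * 0.02 * 0.16))
```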

Since correlations are less than one, equation (2.38) implies that Sharpe ratios are related to utility curvature and consumption volatility directly; we don't need the ugly lognormal facts and an approximation that we needed in (2.20). Using $\mu_p \equiv E_t(dp_t/p_t)$; $\sigma_p^2 = E_t\left[(dp_t/p_t)^2\right]$; $\sigma_c^2 = E_t\left[(dc_t/c_t)^2\right]$,

$$\frac{\mu_p + \frac{D_t}{p_t}\, dt - r^f_t\, dt}{\sigma_p} \leq \gamma \sigma_c.$$
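A short sketch of how this bound can be read in both directions, assuming moments per unit time and writing the covariance in (2.38) as correlation times the two volatilities. The function names and the sample numbers are hypothetical illustrations.

```python
def sharpe_ratio(gamma, corr, sigma_c):
    """Sharpe ratio implied by (2.38): gamma * corr(dc/c, dp/p) * sigma_c."""
    return gamma * corr * sigma_c


def sharpe_bound(gamma, sigma_c):
    """Largest attainable Sharpe ratio, reached when the correlation is one."""
    return gamma * sigma_c


def min_curvature_for_sharpe(sharpe, sigma_c):
    """Smallest gamma consistent with an observed Sharpe ratio."""
    return sharpe / sigma_c


if __name__ == "__main__":
    for corr in (-1.0, 0.0, 0.5, 1.0):
        print(corr, sharpe_ratio(5.0, corr, 0.02), sharpe_bound(5.0, 0.02))
    print(min_curvature_for_sharpe(0.5, 0.02))
```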

2.6 Problems

1.

(a) The absolute risk aversion coefficient is

$$-\frac{u''(c)}{u'(c)}.$$

We scale by $u'(c)$ because expected utility is only defined up to linear transformations ($a + b\,u(c)$ gives the same predictions as $u(c)$) and this measure of the second derivative is invariant to linear transformations. Show that the utility function with constant absolute risk aversion is

$$u(c) = -e^{-\alpha c}.$$

(b) The coefficient of relative risk aversion in a one-period model (i.e. when consumption equals wealth) is defined as

$$rra = -\frac{c\, u''(c)}{u'(c)}.$$

For power utility $u'(c) = c^{-\gamma}$, show that the risk aversion coefficient equals the power.

(c) The elasticity of intertemporal substitution is defined as

$$\xi^I \equiv -\frac{(c_2/c_1)\, d(c_1/c_2)}{dR/R}.$$

Show that with power utility $u'(c) = c^{-\gamma}$, the intertemporal substitution elasticity is equal to $1/\gamma$.

2. Show that the "idiosyncratic risk" line in Figure 1 is horizontal.

(a) Suppose you have a mean-variance efficient return $R^{mv}$ and the risk free rate. Using the fact that $R^{mv}$ is perfectly correlated with the discount factor, construct a discount factor $m$ in terms of $R^f$ and $R^{mv}$, with no free parameters. (The constants in $m = a + bR^{mv}$ will be functions of things like $E(R^{mv})$.)

(b) Using this result, and the beta model in terms of $m$, show that expected returns can be described in a single-beta representation using any mean-variance efficient return (except the riskfree rate).

$$E(R^i) = R^f + \beta_{i,mv}\left[E(R^{mv}) - R^f\right].$$

3. Can the "Sharpe ratio" between two risky assets exceed the slope of the mean-variance frontier? I.e. if $R^{mv}$ is on the frontier, is it possible that

$$\frac{E(R^i) - E(R^j)}{\sigma(R^i - R^j)} > \frac{E(R^{mv}) - R^f}{\sigma(R^{mv})}?$$

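4. Show that if consumption growth is lognormal, then

$$\left|\frac{E(R^{mv}) - R^f}{\sigma(R^{mv})}\right| = \frac{\sigma\left[(c_{t+1}/c_t)^{-\gamma}\right]}{E\left[(c_{t+1}/c_t)^{-\gamma}\right]} = \sqrt{e^{\gamma^2 \sigma^2(\Delta \ln c_{t+1})} - 1} \approx \gamma\, \sigma(\Delta \ln c).$$

(Start with $\sigma^2(x) = E(x^2) - E(x)^2$ and the lognormal property $E(e^z) = e^{E z + \frac{1}{2}\sigma^2(z)}$.)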

5. There are assets with mean return equal to the riskfree rate, but substantial standard deviation of returns. Long term bonds are pretty close examples. Why would anyone hold such an asset?

6. The first order conditions for an infinitely lived consumer who can buy an asset with dividend stream $\{d_t\}$ are

$$p_t = E_t \sum_{j=1}^{\infty} \beta^j \frac{u'(c_{t+j})}{u'(c_t)}\, d_{t+j}. \tag{2.39}$$

The first order conditions for buying a security with price $p_t$ and payoff $x_{t+1} = d_{t+1} + p_{t+1}$ are

$$p_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)}\,(p_{t+1} + d_{t+1})\right]. \tag{2.40}$$

(a) Derive (??) from (??)

(b) Derive (??) from (??). You need an extra condition. Show that this extra condition is a first order condition for maximization. To do this, think about what strategy the consumer could follow to improve utility if the condition did not hold.

7. The representative consumer maximizes a CRRA utility function,

$$E_t \sum_j \beta^j c_{t+j}^{1-\gamma}.$$

(a) Show that with log utility, the price/consumption ratio of the consumption stream is constant, no matter what the distribution of consumption growth.

(b) Suppose there is news at time $t$ that future consumption will be higher. For $\gamma < 1$, $\gamma = 1$, and $\gamma > 1$, evaluate the effect of this news on the price. Make sense of your results. (Note: there is a real-world interpretation here. It's often regarded as a puzzle that the market declines on good economic news. This is attributed to an expectation by the market that the Fed will respond to such news by raising interest rates. Note that $\gamma > 1$ in this problem gives a completely real and frictionless interpretation to this phenomenon! I thank Pete Hecht for this nice problem.)

8. Suppose a consumer has a utility function that includes leisure. (This could also be a second good, or a good produced in another country.) Using the continuous time setup, show that expected returns will now depend on two covariances, the covariance of returns with leisure and the covariance of returns with consumption, so long as leisure enters non-separably, i.e. $u(c, l)$ cannot be written $v(c) + w(l)$. (This is a three line problem, but you need to apply Ito's lemma to $\Lambda$.)

9. From

$$1 = E(mR)$$

show that the negative of the mean log discount factor must be larger than any mean return,

$$-E(\ln m) > E(\ln R).$$

How is it possible that $E(\ln R)$ is bounded? What about returns of the form $R = (1 - \alpha)R^f + \alpha R^m$ for arbitrarily large $\alpha$? (Hint: start by assuming $m$ and $R$ are lognormal. Then see if you can generalize the results using Jensen's inequality, $E(f(x)) > f(E(x))$ for $f$ convex. The return that solves $\max_R E(\ln R)$ is known as the growth optimal portfolio.)


Chapter 3. Applying the basic model

3.1 Assumptions and applicability

Writing $p = E(mx)$, we do not assume

1. Markets are complete, or there is a representative investor.
2. Asset returns or payoffs are normally distributed (no options), or independent over time.
3. Two period investors, quadratic utility, or separable utility.
4. Investors have no human capital or labor income.
5. The market has reached equilibrium, or individuals have bought all the securities they want to.

All of these assumptions come later, in various special cases, but we haven't made them yet. We do assume that the investor can consider a small marginal investment or disinvestment.

The theory of asset pricing contains lots of assumptions to derive analytically convenient special cases and empirically useful representations. In writing $p = E(mx)$ or $p\, u'(c_t) = E_t\left[\beta u'(c_{t+1})\, x_{t+1}\right]$ we have not made most of these assumptions.

We have not assumed complete markets or a representative investor. These equations apply to each individual investor, for each asset to which he has access, independently of the presence or absence of other investors or other assets. Complete markets/representative agent assumptions are used if one wants to use aggregate consumption data in $u'(c_t)$, or other specializations and simplifications of the model.

We have not said anything about payoff or return distributions. In particular, we have not assumed that returns are normally distributed or that utility is quadratic. The basic pricing equation should hold for any asset, stock, bond, option, real investment opportunity, etc., and any monotone and concave utility function. In particular, it is often thought that mean-variance analysis and beta pricing models require these kinds of limiting assumptions or quadratic utility, but that is not the case. A mean-variance efficient return carries all pricing information no matter what the distribution of payoffs, utility function, etc.

This is not a "two-period model." The fundamental pricing equation holds for any two periods of a multi-period model, as we have seen. Really, everything involves conditional moments, so we have not assumed i.i.d. returns over time.

I have written things down in terms of a time- and state-separable utility function and I have extensively used the convenient power utility example. Nothing important lies in either choice. Just interpret $u'(c_t)$ as the partial derivative of a general utility function with respect to consumption at time $t$. State- or time-nonseparable utility (habit persistence, durability) complicates the relation between the discount factor and real variables, but does not change $p = E(mx)$ or any of the basic structure. We will look at several examples below.

We do not assume that investors have no non-marketable human capital, or no outside sources of income. The first order conditions for purchase of an asset relative to consumption hold no matter what else is in the budget constraint. By contrast, the portfolio approach to asset pricing as in the CAPM and ICAPM relies heavily on the assumption that the investor has no non-asset income, and we will study these special cases below. For example, leisure in the utility function just means that $u'(c, l)$ may depend on $l$ as well as $c$.

We don't even really need the assumption (yet) that the market is "in equilibrium," that the investor has bought all of the asset that he wants to, or even that he can buy the asset at all. We can interpret $p = E(mx)$ as giving us the value, or willingness to pay for, a small amount of a payoff $x_{t+1}$ that the investor does not yet have. Here's why: If the investor had a little $\xi$ more of the payoff $x_{t+1}$ at time $t+1$, his utility $u(c_t) + \beta E_t u(c_{t+1})$ would increase by

$$\beta E_t\left[u(c_{t+1} + \xi x_{t+1}) - u(c_{t+1})\right] = \beta E_t\left[u'(c_{t+1})\, x_{t+1} \xi + \frac{1}{2} u''(c_{t+1})\,(x_{t+1}\xi)^2 + \ldots\right].$$

If $\xi$ is small, only the first term on the right matters. If the investor has to give up a small amount of money $v_t \xi$ at time $t$, that loss lowers his utility by

$$u(c_t) - u(c_t - v_t \xi) = u'(c_t)\, v_t \xi - \frac{1}{2} u''(c_t)\,(v_t \xi)^2 + \ldots.$$

Again, for small $\xi$, only the first term matters. Therefore, in order to receive the small extra payoff $\xi x_{t+1}$, the investor is willing to pay the small amount $v_t \xi$ where

$$v_t = E_t\left[\beta \frac{u'(c_{t+1})}{u'(c_t)}\, x_{t+1}\right].$$

If this private valuation is higher than the market value $p_t$, and if the investor can buy some more of the asset, he will. As he buys more, his consumption will change; it will be higher in states where $x_{t+1}$ is higher, driving down $u'(c_{t+1})$ in those states, until the value to the investor has declined to equal the market value. Thus, after an investor has reached his optimal portfolio, the market value should obey the basic pricing equation as well, using post-trade or equilibrium consumption. But the formula can also be applied to generate the marginal private valuation, using pre-trade consumption, or to value a potential, not yet traded security.
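A small numerical sketch of the private valuation and of how it falls as the investor buys more. It assumes power utility, $u'(c) = c^{-\gamma}$, and a finite set of states for time $t+1$; the function name, the optional `xi` and `price` arguments, and the numbers are all illustrative assumptions.

```python
def private_value(beta, gamma, c_today, c_next, payoff, probs, xi=0.0, price=0.0):
    """Marginal private valuation E_t[beta u'(c_{t+1}) / u'(c_t) x_{t+1}].

    With xi > 0 the valuation is computed at post-trade consumption: the
    investor has already paid xi * price today and receives xi * payoff
    in each state tomorrow.
    """
    c0 = c_today - xi * price
    total = 0.0
    for c1, x, p in zip(c_next, payoff, probs):
        c1_post = c1 + xi * x
        total += p * beta * (c1_post / c0) ** (-gamma) * x
    return total


if __name__ == "__main__":
    args = (0.95, 2.0, 1.0, (0.9, 1.1), (0.8, 1.2), (0.5, 0.5))
    v0 = private_value(*args)                      # pre-trade valuation
    v1 = private_value(*args, xi=0.1, price=v0)    # after buying 0.1 units at v0
    print(v0, v1)
```

In this example the post-trade valuation is lower than the pre-trade one, which is the mechanism that stops the investor from buying without limit.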

We have calculated the value of a "small" or marginal portfolio change for the investor. For some investment projects, an investor cannot take a small ("diversified") position. For example, a venture capitalist or entrepreneur must usually take all or nothing of a project with payoff stream $\{x_t\}$. Then the value of a project not already taken, $E \sum_j \beta^j u(c_{t+j} + x_{t+j})$, might be substantially different from its marginal counterpart, $E \sum_j \beta^j u'(c_{t+j})\, x_{t+j}$. Once the project is taken of course, $c_{t+j} + x_{t+j}$ becomes $c_{t+j}$, so the marginal valuation still applies to the ex-post consumption stream. Analysts often forget this point and apply marginal (diversified) valuation models such as the CAPM to projects that must be bought in discrete chunks. Also, we have abstracted from short sales and bid/ask spreads; this modification changes $p = E(mx)$ from an equality to a set of inequalities.

3.2 General Equilibrium

Asset returns and consumption: which is the chicken and which is the egg? The exogenous return model, the endowment economy model, and the argument that it doesn't matter for studying $p = E(mx)$.

So far, we have not said where the joint statistical properties of the payoff $x_{t+1}$ and marginal utility $m_{t+1}$ or consumption $c_{t+1}$ come from. We have also not said anything about the fundamental exogenous shocks that drive the economy. The basic pricing equation $p = E(mx)$ tells us only what the price should be, given the joint distribution of consumption (marginal utility, discount factor) and the asset payoff.

There is nothing that stops us from writing the basic pricing equation as

$$u'(c_t) = E_t\left[\beta u'(c_{t+1})\, x_{t+1} / p_t\right].$$

We can think of this equation as determining today's consumption given asset prices and payoffs, rather than determining today's asset price in terms of consumption and payoffs. Thinking about the basic first order condition in this way gives the permanent income model of consumption.
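To see the first order condition read in this reverse direction, the sketch below solves it for today's consumption under power utility, taking the gross return $R_{t+1} = x_{t+1}/p_t$ and next period's consumption in each state as given. This is only a partial illustration: in a full permanent income solution $c_{t+1}$ is itself chosen subject to a budget constraint, which is omitted here. The function name and inputs are hypothetical.

```python
def consumption_today(beta, gamma, c_next, gross_returns, probs):
    """Solve u'(c_t) = E_t[beta u'(c_{t+1}) R_{t+1}] for c_t, with u'(c) = c^(-gamma)."""
    rhs = sum(p * beta * c1 ** (-gamma) * r
              for c1, r, p in zip(c_next, gross_returns, probs))
    return rhs ** (-1.0 / gamma)


if __name__ == "__main__":
    probs = (0.5, 0.5)
    # When beta * R = 1 and future consumption is constant, c_t equals c_{t+1}.
    print(consumption_today(0.95, 2.0, (2.0, 2.0), (1 / 0.95, 1 / 0.95), probs))
    # A higher return, holding c_{t+1} fixed, lowers consumption today.
    print(consumption_today(0.95, 2.0, (2.0, 2.0), (1.2, 1.2), probs))
```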

Which is the chicken and which is the egg? Which variable is exogenous and which is endogenous? The answer is, neither, and for many purposes, it doesn't matter. The first order conditions characterize any equilibrium; if you happen to know $E(mx)$, you can use them to determine $p$; if you happen to know $p$, you can use them to determine consumption and savings decisions. For most asset pricing applications we are interested in understanding a wide cross-section of assets. Thus, it is interesting to contrast the cross-sectional variation in their prices (expected returns) with cross-sectional variation in their second moments (betas) with a single discount factor. In most applications, the discount factor is a function of aggregate variables (market return, aggregate consumption), so it is plausible to hold the properties of the discount factor constant as we study one individual asset after another. Permanent income studies typically dramatically restrict the number of assets under consideration, often to just an interest rate, and study the time-series evolution of aggregate or individual consumption.

Nonetheless, it is an obvious next step to complete the solution of our model economy; to find $c$ and $p$ in terms of truly exogenous forces. The results will of course depend on what the rest of the economy looks like, in particular the production or intertemporal transformation technology and the set of markets.

Figure 2 shows one possibility for a general equilibrium. Suppose that the production technologies are linear: the real, physical rate of return (the rate of intertemporal transformation) is not affected by how much is invested. Now consumption must adjust to these technologically given rates of return. If the rates of return on the intertemporal technologies were to change, the consumption process would have to change as well. This is, implicitly, how the permanent income model works. This is how many finance theories such as the CAPM and ICAPM and the Cox, Ingersoll and Ross (1986) model of the term structure work as well. These models specify the return process, and then solve the consumer's portfolio and consumption rules.

Figure 2. Consumption adjusts when the rate of return is determined by a linear technology. (Axes: $C_t$ and $C_{t+1}$; the technology line is labeled $R$.)

Figure 3 shows another extreme possibility for the production technology. This is an "endowment economy." Nondurable consumption appears (or is produced by labor) every period. There is nothing anyone can do to save, store, invest or otherwise transform consumption goods this period to consumption goods next period. Hence, asset prices must adjust until people are just happy consuming the endowment process. In this case consumption is exogenous and asset prices adjust. Lucas (1978) and Mehra and Prescott (1985) are two very famous applications of this sort of "endowment economy."

Which of these possibilities is correct? Well, neither, of course. The real economy and all serious general equilibrium models look something like figure 4: one can save or transform consumption from one date to the next, but at a decreasing rate. As investment increases, rates of return decline.

Figure 3. Asset prices adjust to consumption in an endowment economy. (Axes: $C_t$ and $C_{t+1}$; the rate of return line is labeled $R$.)

Figure 4. General equilibrium. The solid lines represent the indifference curve and production possibility set. The dashed straight line represents the equilibrium rate of return. The dashed box represents an endowment economy that predicts the same consumption-asset return process. (Axes: $C_t$ and $C_{t+1}$; the rate of return line is labeled $R$.)


Does this observation invalidate any modeling we do with the linear technology (CAPM, CIR, permanent income) model, or the endowment economy model? No. Start at the equilibrium in figure 4. Suppose we model this economy as a linear technology, but we happen to choose for the rate of return on the linear technologies exactly the same stochastic process for returns that emerges from the general equilibrium. The resulting joint consumption, asset return process is exactly the same as in the original general equilibrium! Similarly, suppose we model this economy as an endowment economy, but we happen to choose for the endowment process exactly the stochastic process for consumption that emerges from the equilibrium with a concave technology. Again, the joint consumption-asset return process is exactly the same.

Therefore, there is nothing wrong in adopting one of the following strategies for empirical work:

1. Form a statistical model of bond and stock returns, solve the optimal consumption-portfolio decision. Use the equilibrium consumption values in $p = E(mx)$.

2. Form a statistical model of the consumption process, calculate asset prices and returns directly from the basic pricing equation $p = E(mx)$.

3. Form a completely correct general equilibrium model, including the production technology, utility function and specification of the market structure. Derive the equilibrium consumption and asset price process, including $p = E(mx)$ as one of the equilibrium conditions.

If the statistical models for consumption and/or asset returns are right, i.e. if they coincide with the equilibrium consumption or return process generated by the true economy, either of the first two approaches will give correct predictions for the joint consumption-asset return process.

As we will see, most finance models, developed from the 1950s through the early 1970s, take the return process as given, implicitly assuming linear technologies. The endowment economy approach, introduced by Lucas (1978), is a breakthrough because it turns out to be much easier. It is much easier to evaluate $p = E(mx)$ for fixed $m$ than it is to solve joint consumption-portfolio problems for given asset returns, all to derive the equilibrium consumption process. To solve a consumption-portfolio problem we have to model the investor's entire environment: we have to specify all the assets to which he has access, what his labor income process looks like (or wage rate process, and include a labor supply decision). Once we model the consumption stream directly, we can look at each asset in isolation, and the actual computation is almost trivial. This breakthrough accounts for the unusual structure of the presentation in this book. It is traditional to start with an extensive study of consumption-portfolio problems. But by modeling consumption directly, we have been able to study pricing directly, and portfolio problems are an interesting side trip which we can defer.

Most uses of $p = E(mx)$ do not require us to take any stand on exogeneity or endogeneity, or general equilibrium. This is a condition that must hold for any asset, for any production technology. Having a taste of the extra assumptions required for a general equilibrium model, you can now appreciate why people stop short of full solutions when they can address an application using only the first order conditions, using knowledge of $E(mx)$ to make a prediction about $p$.

It is enormously tempting to slide into an interpretation that $E(mx)$ determines $p$. We routinely think of betas and factor risk prices – components of $E(mx)$ – as determining expected returns. For example, we routinely say things like “the expected return of a stock increased because the firm took on riskier projects, thereby increasing its $\beta$.” But the whole consumption process, discount factor, and factor risk premia change when the production technology changes. Similarly, we are on thin ice if we say anything about the effects of policy interventions, new markets and so on. The equilibrium consumption or asset return process one has modeled statistically may change in response to such changes in structure. For such questions one really needs to start thinking in general equilibrium terms. It may help to remember that there is an army of permanent-income macroeconomists who make precisely the opposite assumption, taking our asset return processes as exogenous and studying (endogenous) consumption and savings decisions.

3.3 Consumption-based model in practice

The consumption-based model is, in principle, a complete answer to all asset pricing questions, but works poorly in practice. This observation motivates other asset pricing models.

The model I have sketched so far can, in principle, give a complete answer to all the questions of the theory of valuation. It can be applied to any security (bonds, stocks, options, futures, etc.) or to any uncertain cash flow. All we need is a functional form for utility, numerical values for the parameters, and a statistical model for the conditional distribution of consumption and payoffs.

To be specific, consider the standard power utility function

$$u'(c) = c^{-\gamma}. \tag{1}$$

Then, excess returns should obey

$$0 = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R^e_{t+1}\right]. \tag{2}$$

Taking unconditional expectations and applying the covariance decomposition, expected excess returns should follow

$$E(R^e_{t+1}) = -R^f \,\mathrm{cov}\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma},\; R^e_{t+1}\right]. \tag{3}$$

Given a value for $\gamma$, and data on consumption and returns, one can easily estimate the mean and covariance on the right hand side, and check whether actual expected returns are, in fact, in accordance with the formula.
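As a rough illustration of this check, the following Python sketch computes both sides of the expected excess return formula from sample moments. The data are simulated, and the function name, the parameter values and the return process are my assumptions for illustration only; they are not estimates from the text.

```python
import numpy as np

def predicted_excess_return(cons_growth, excess_ret, beta, gamma):
    """Right hand side of the formula, -R^f cov(m, R^e), from sample moments."""
    m = beta * cons_growth ** (-gamma)            # power utility discount factor
    rf = 1.0 / m.mean()                           # R^f = 1 / E(m)
    cov = np.mean((m - m.mean()) * (excess_ret - excess_ret.mean()))
    return -rf * cov

# Hypothetical simulated data (an assumption, not actual consumption or returns).
rng = np.random.default_rng(0)
T = 5000
log_growth = rng.normal(0.02, 0.02, T)
cons_growth = np.exp(log_growth)                  # c_{t+1} / c_t
excess_ret = 0.06 + 4.0 * (log_growth - 0.02) + rng.normal(0.0, 0.15, T)

beta, gamma = 0.98, 2.0                           # assumed parameter values
actual = excess_ret.mean()                        # left hand side: sample mean excess return
predicted = predicted_excess_return(cons_growth, excess_ret, beta, gamma)
pricing_error = actual - predicted
print(f"actual {actual:.4f}  predicted {predicted:.4f}  error {pricing_error:.4f}")
```

With sample moments, the pricing error computed this way equals the sample mean of $m R^e$ divided by the sample mean of $m$, so checking the formula is the same as checking $0 = E(m R^e)$.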

Similarly, the present value formula is

$$p_t = E_t \sum_{j=1}^{\infty} \beta^j \left(\frac{c_{t+j}}{c_t}\right)^{-\gamma} d_{t+j}. \tag{4}$$

Given data on consumption and dividends or another stream of payoffs, we can estimate the right hand side and check it against prices on the left.

Bonds and options do not require separate valuation theories. For example, an $N$-period default-free nominal discount bond (a U.S. Treasury strip) is a claim to one dollar at time $t+N$. Its price should be

$$p_t = E_t\left[\beta^N \left(\frac{c_{t+N}}{c_t}\right)^{-\gamma} \frac{\Pi_t}{\Pi_{t+N}} \, 1\right]$$

where $\Pi$ = price level (dollars per good). A European option is a claim to the payoff $\max(S_{t+T} - K, 0)$, where $S_{t+T}$ = stock price at time $t+T$ and $K$ = strike price. The option price should be

$$p_t = E_t\left[\beta^T \left(\frac{c_{t+T}}{c_t}\right)^{-\gamma} \max(S_{t+T} - K, 0)\right].$$

Again, we can use data on consumption, prices and payoffs to check these predictions.
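To make the "one formula for every security" point concrete, here is a minimal Monte Carlo sketch that prices a discount bond and a call option with the same function. Everything numerical is an assumption for illustration: the lognormal processes, the 0.3 correlation, the parameter values, and the function name `consumption_price`. For simplicity the bond is a real bond, so the price level term is left out.

```python
import numpy as np

def consumption_price(payoff, cons_growth, beta, gamma, horizon):
    """Sample analogue of p_t = E_t[beta^T (c_{t+T}/c_t)^(-gamma) x_{t+T}]."""
    m = beta ** horizon * cons_growth ** (-gamma)
    return np.mean(m * payoff)

rng = np.random.default_rng(1)
n = 200_000
T = 1.0                                    # horizon (assumed)
beta, gamma = 0.98, 2.0                    # assumed preference parameters

# Hypothetical joint lognormal draws for consumption growth and the stock price.
z_c = rng.standard_normal(n)
z_s = 0.3 * z_c + np.sqrt(1.0 - 0.3 ** 2) * rng.standard_normal(n)
cons_growth = np.exp(0.02 * T + 0.02 * np.sqrt(T) * z_c)      # c_{t+T} / c_t
stock = 100.0 * np.exp(0.06 * T + 0.18 * np.sqrt(T) * z_s)    # S_{t+T}

# Same pricing function, different payoffs.
bond_price = consumption_price(np.ones(n), cons_growth, beta, gamma, T)
strikes = (90.0, 100.0, 110.0)
call_prices = [
    consumption_price(np.maximum(stock - K, 0.0), cons_growth, beta, gamma, T)
    for K in strikes
]
print(bond_price, call_prices)
```

The only thing that changes from one security to the next is the payoff vector handed to the function.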

Unfortunately, the above specification of the consumption-based model does not work very well. To give a flavor of some of the problems, Figure 5 presents the mean excess returns on the ten size-ranked portfolios of NYSE stocks vs. the predictions – the right hand side of (3.3) – of the consumption-based model. I picked the utility curvature parameter $\gamma = 241$ to make the picture look as good as possible. (The section on GMM estimation below goes into detail on how to do this.) As you can see, the model isn’t hopeless: there is some correlation between sample average returns and the consumption-based model predictions. But the model does not do very well. The pricing error (actual expected return minus predicted expected return) for each portfolio is of the same order of magnitude as the spread in expected returns across the portfolios.

Figure 5. Mean excess returns of 10 CRSP size portfolios vs. predictions of the power utility consumption-based model. The predictions are generated by $-R^f \mathrm{cov}(m, R^i)$ with $m = \beta (c_{t+1}/c_t)^{-\gamma}$. $\beta = 0.98$ and $\gamma = 241$ are picked by first-stage GMM to minimize the sum of squared pricing errors (deviation from the 45° line). Source: Cochrane (1996).

3.4 Alternative asset pricing models: Overview

I motivate exploration of different utility functions, general equilibrium models, and linear factor models such as the CAPM, APT and ICAPM as approaches to circumvent the empirical difficulties of the consumption-based model.

The poor empirical performance of the consumption-based model motivates a search for alternative asset pricing models – alternative functions $m = f(\text{data})$. All asset pricing models amount to different functions for $m$. I give here a bare sketch of some of the different approaches; we study each in detail in later chapters.

1) Different utility functions. Perhaps the problem with the consumption-based model is simply the functional form we chose for utility. The natural response is to try different utility functions. Which variables determine marginal utility is a far more important question than the functional form. Perhaps the stock of durable goods influences the marginal utility of nondurable goods; perhaps leisure or yesterday’s consumption affect today’s marginal utility. These possibilities are all instances of nonseparabilities. One can also try to use micro data on individual consumption of stockholders rather than aggregate consumption. Aggregation of heterogeneous investors can make variables such as the cross-sectional variance of income appear in aggregate marginal utility.

2) General equilibrium models. Perhaps the problem is simply with the consumption data. General equilibrium models deliver equilibrium decision rules linking consumption to other variables, such as income, investment, etc. Substituting the decision rules $c_t = f(y_t, i_t, \ldots)$ in the consumption-based model, we can link asset prices to other, hopefully better-measured macroeconomic aggregates.

In addition, true general equilibrium models completely describe the economy, including the stochastic process followed by all variables. They can answer questions such as why is the covariance (beta) of an asset payoff $x$ with the discount factor $m$ the value that it is, rather than take this covariance as a primitive. They can in principle answer structural questions, such as how asset prices might be affected by different government policies. Neither kind of question can be answered by just manipulating investor first order conditions.

3) Factor pricing models. Another sensible response to bad consumption data is to model marginal utility in terms of other variables directly. Factor pricing models follow this approach. They just specify that the discount factor is a linear function of a set of proxies,

$$m_{t+1} = a + b_A f^A_{t+1} + b_B f^B_{t+1} + \ldots \tag{5}$$

where $f^i$ are factors and $a, b_i$ are parameters. (This is a different sense of the use of the word “factor” than “discount factor.” I didn’t invent the confusing terminology.) By and large, the factors are just selected as plausible proxies for marginal utility: events that describe whether typical investors are happy or unhappy. Among others, the Capital Asset Pricing Model (CAPM) is the model

$$m_{t+1} = a + b R^W_{t+1}$$

where $R^W$ is the rate of return on a claim to total wealth, often proxied by a broad-based portfolio such as the value-weighted NYSE portfolio. The Arbitrage Pricing Theory (APT) uses returns on broad-based portfolios derived from a factor analysis of the return covariance matrix. The Intertemporal Capital Asset Pricing Model (ICAPM) suggests macroeconomic variables such as GNP and inflation and variables that forecast macroeconomic variables or asset returns as factors. Term structure models such as the Cox-Ingersoll-Ross model specify that the discount factor is a function of a few term structure variables, for example the short rate of interest and a few interest rate spreads.
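To see what a linear discount factor looks like in numbers, the sketch below builds a CAPM-style $m = a + bR^W$ from simulated data. The way I pin down $a$ and $b$ here is an assumption made for illustration: I require $m$ to price a riskfree rate, $E(m) = 1/R^f$, and the wealth return itself, $E(mR^W) = 1$. The return distribution and the value of $R^f$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
rf = 1.01                                        # assumed gross riskfree rate
rw = 1.08 + 0.16 * rng.standard_normal(T)        # hypothetical gross wealth returns

# Two moment conditions in the two unknowns (a, b):
#   E(m)     = a          + b E(R^W)    = 1 / R^f
#   E(m R^W) = a E(R^W)   + b E(R^W^2)  = 1
A = np.array([[1.0,       rw.mean()],
              [rw.mean(), np.mean(rw ** 2)]])
a, b = np.linalg.solve(A, np.array([1.0 / rf, 1.0]))

m = a + b * rw                                   # discount factor realizations
print(f"a = {a:.3f}, b = {b:.3f}")
```

When the mean wealth return exceeds the riskfree rate, $b$ comes out negative: the discount factor is high when the wealth portfolio does badly, which fits the reading of factors as proxies for marginal utility.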

Many factor pricing models are derived as general equilibrium models with linear technologies and no labor income; thus they also fall into the general idea of using general equilibrium relations (from, admittedly, very stylized general equilibrium models) to substitute out for consumption.

4) Arbitrage or near-arbitrage pricing. The mere existence of a representation $p = E(mx)$ and the fact that marginal utility is positive, $m \geq 0$ (these facts are discussed in the next chapter), can often be used to deduce prices of one payoff in terms of the prices of other payoffs. The Black-Scholes option pricing model is the paradigm of this approach: Since the option payoff can be replicated by a portfolio of stock and bond, any $m$ that prices the stock and bond gives the price for the option. Recently, there have been several suggestions on how to use this idea in more general circumstances by using very weak further restrictions on $m$, and we will study these suggestions in Chapter 17.

We return to a more detailed derivation and discussion of these alternative models of the discount factor $m$ below. First, and with this brief overview in mind, we look at $p = E(mx)$ and what the discount factor $m$ represents in a little more detail.

Chapter 4. Contingent Claims Markets

Our first task is to understand the $p = E(mx)$ representation a little more deeply. In this chapter I introduce a very simple market structure, contingent claims. This leads us to an inner product interpretation of $p = E(mx)$ which allows an intuitive visual representation of most of the theorems. We see that discount factors exist, are positive, and the pricing function is linear, just starting from prices and payoffs in a complete market, without any utility functions. The next chapter shows that these properties can be built up in incomplete markets as well.

4.1 Contingent claims

I describe contingent claims. I interpret the stochastic discount factor $m$ as contingent claims prices divided by probabilities, and $p = E(mx)$ as a bundling of contingent claims.

Suppose that one of $S$ possible states of nature can occur tomorrow, i.e. specialize to a finite-dimensional state space. Denote the individual states by $s$. For example, we might have $S = 2$ and $s$ = rain or $s$ = shine.

A contingent claim is a security that pays one dollar (or one unit of the consumption good) in one state $s$ only tomorrow. $pc(s)$ is the price today of the contingent claim. I write $pc$ to specify that it is the price of a contingent claim and $(s)$ to denote in which state $s$ the claim pays off.

In a complete market investors can buy any contingent claim. They don’t necessarily have to be faced with explicit contingent claims; they just need enough other securities to span or synthesize all contingent claims. For example, if the possible states of nature are (rain, shine), one can span or synthesize any contingent claim, or any portfolio achieved by combining contingent claims, by forming portfolios of a security that pays 2 dollars if it rains and one if it shines, or $x_1 = (2, 1)$, and a riskfree security whose payoff pattern is $x_2 = (1, 1)$.
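The rain/shine example can be worked out directly. The sketch below solves for the portfolios of the two securities that replicate each contingent claim; the variable names are mine.

```python
import numpy as np

# Rows are states (rain, shine); columns are the securities
# x1 = (2, 1) and the riskfree x2 = (1, 1).
payoffs = np.array([[2.0, 1.0],
                    [1.0, 1.0]])

# Portfolio weights w on (x1, x2) solve payoffs @ w = target payoff.
rain_claim = np.linalg.solve(payoffs, np.array([1.0, 0.0]))    # pays 1 if rain only
shine_claim = np.linalg.solve(payoffs, np.array([0.0, 1.0]))   # pays 1 if shine only

print(rain_claim)    # buy one x1, short one x2
print(shine_claim)   # short one x1, buy two x2
```

Because the payoff matrix is invertible, any payoff pattern across the two states can be synthesized the same way, which is what "spanning" means here.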

Now, we are on a hunt for discount factors, and the central point is:

If there are complete contingent claims, a discount factor exists, and it is equal to the contingent claim price divided by probabilities.

Let $x(s)$ denote an asset’s payoff in state of nature $s$. We can think of the asset as a bundle of contingent claims: $x(1)$ contingent claims to state 1, $x(2)$ claims to state 2, etc. The asset’s price must then equal the value of the contingent claims of which it is a bundle,

$$p(x) = \sum_s pc(s)\, x(s). \tag{1}$$

I denote the price $p(x)$ to emphasize that it is the price of the payoff $x$. Where the payoff in question is clear, I suppress the $(x)$. I like to think of equation (4.1) as a happy-meal theorem: the price of a happy meal (in a frictionless market) should be the same as the price of one hamburger, one small fries, one small drink and the toy.

It is easier to take expectations rather than sum over states. To this end, multiply and divide the bundling equation (4.1) by probabilities,

$$p(x) = \sum_s \pi(s) \left(\frac{pc(s)}{\pi(s)}\right) x(s)$$

where $\pi(s)$ is the probability that state $s$ occurs. Then define $m$ as the ratio of contingent claim price to probability

$$m(s) = \frac{pc(s)}{\pi(s)}.$$

Now we can write the bundling equation as an expectation,

$$p = \sum_s \pi(s)\, m(s)\, x(s) = E(mx).$$

Thus, in a complete market, the stochastic discount factor $m$ in $p = E(mx)$ exists, and it is just a set of contingent claims prices, scaled by probabilities. As a result of this interpretation, the combination of discount factor and probability is sometimes called a state-price density.
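A two-state numerical example may help fix ideas. The probabilities, contingent claim prices and payoff below are hypothetical numbers chosen for illustration.

```python
import numpy as np

prob = np.array([0.3, 0.7])     # pi(s): assumed probabilities of (rain, shine)
pc = np.array([0.4, 0.5])       # pc(s): assumed contingent claim prices
x = np.array([2.0, 1.0])        # x(s): the asset's payoff in each state

price_bundle = np.sum(pc * x)               # p(x) = sum_s pc(s) x(s)
m = pc / prob                               # m(s) = pc(s) / pi(s)
price_expectation = np.sum(prob * m * x)    # p = E(mx)

print(m)                                    # discount factor, state by state
print(price_bundle, price_expectation)      # both equal 1.3
```

The two prices agree by construction: multiplying and dividing by the probabilities changes the bookkeeping, not the answer.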

The multiplication and division by probabilities seems very artificial in this finite-state context. In general, we posit states of nature $\omega$ that can take continuous (uncountably infinite) values in a space $\Omega$. In this case, the sums become integrals, and we have to use some measure to integrate over $\Omega$. Thus, scaling contingent claims prices by some probability-like object is unavoidable.

4.2 Risk neutral probabilities

I interpret the discount factor $m$ as a transformation to risk-neutral probabilities such that $p = E^*(x)/R^f$.

Another common transformation of $p = E(mx)$ results in “risk-neutral” probabilities. Define

$$\pi^*(s) \equiv R^f m(s)\, \pi(s) = R^f pc(s)$$

where

$$R^f \equiv 1 \Big/ \sum_s pc(s) = 1/E(m).$$

The $\pi^*(s)$ are positive, less than or equal to one and sum to one, so they are a legitimate set of probabilities. Then we can rewrite the asset pricing formula as

$$p(x) = \sum_s pc(s)\, x(s) = \frac{1}{R^f} \sum_s \pi^*(s)\, x(s) = \frac{E^*(x)}{R^f}.$$

I use the notation $E^*$ to remind us that the expectation uses the risk neutral probabilities $\pi^*$ instead of the real probabilities $\pi$.

Thus, we can think of asset pricing as if agents are all risk neutral, but with probabilities $\pi^*$ in the place of the true probabilities $\pi$. The probabilities $\pi^*$ give greater weight to states with higher than average marginal utility $m$.

There is something very deep in this idea: risk aversion is equivalent to paying more attention to unpleasant states, relative to their actual probability of occurrence. People who report high subjective probabilities of unpleasant events like plane crashes may not have irrational expectations; they may simply be reporting the risk neutral probabilities or the product $m \times \pi$. This product is after all the most important piece of information for many decisions: pay a lot of attention to contingencies that are either highly probable or that are improbable but have disastrous consequences.

The transformation from actual to risk-neutral probabilities is given by

$$\pi^*(s) = \frac{m(s)}{E(m)}\, \pi(s).$$

We can also think of the discount factor $m$ as the derivative or change of measure from the real probabilities $\pi$ to the subjective probabilities $\pi^*$. The risk-neutral probability representation of asset pricing is quite common, especially in derivative pricing where the results are independent of risk adjustments.
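Here is the same kind of two-state example run through the risk-neutral transformation. The probabilities, contingent claim prices and payoff are hypothetical numbers chosen for illustration.

```python
import numpy as np

prob = np.array([0.3, 0.7])     # pi(s): assumed true probabilities
pc = np.array([0.4, 0.5])       # pc(s): assumed contingent claim prices
x = np.array([2.0, 1.0])        # x(s): payoff

rf = 1.0 / pc.sum()             # R^f = 1 / sum_s pc(s)
prob_star = rf * pc             # pi*(s) = R^f pc(s)

# Equivalent route through the discount factor: pi*(s) = m(s) / E(m) * pi(s)
m = pc / prob
prob_star_alt = m / np.sum(prob * m) * prob

price = np.sum(prob_star * x) / rf     # p = E*(x) / R^f

print(prob_star)   # sums to one; state 1 gets more weight than its true probability
print(price)
```

In this example state 1 has the higher $m$, so its risk-neutral probability (about 0.44) exceeds its true probability (0.3), while the price matches the direct sum $\sum_s pc(s)x(s) = 1.3$.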

4.3 Investors again

We look at the investor’s first order conditions in a contingent claims market. The marginal rate of substitution equals the discount factor and the contingent claim price ratio.

Though the focus of this chapter is on how to do without utility functions, it’s worth looking at the investor’s first order conditions again in the contingent claim context. The investor starts with a pile of initial wealth $y$ and a state-contingent income $y(s)$. He purchases contingent claims to each possible state in the second period. His problem is then

$$\max_{\{c,\, c(s)\}} \; u(c) + \sum_s \beta \pi(s)\, u[c(s)] \quad \text{s.t.} \quad c + \sum_s pc(s)\, c(s) = y + \sum_s pc(s)\, y(s).$$

Introducing a Lagrange multiplier $\lambda$ on the budget constraint, the first order conditions are

$$u'(c) = \lambda$$

$$\beta \pi(s)\, u'[c(s)] = \lambda\, pc(s).$$

Eliminating the Lagrange multiplier $\lambda$,

$$pc(s) = \beta \pi(s)\, \frac{u'[c(s)]}{u'(c)}$$

or

$$m(s) = \frac{pc(s)}{\pi(s)} = \beta\, \frac{u'[c(s)]}{u'(c)}.$$

Coupled with $p = E(mx)$, we obtain the consumption-based model again.
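A quick numerical check of these first order conditions: take an assumed consumption plan and power utility, back out the contingent claim prices from the formula above, and confirm that a small trade in any contingent claim at those prices leaves lifetime utility unchanged to first order. All numbers and function names here are hypothetical.

```python
import numpy as np

beta, gamma = 0.95, 2.0               # assumed preference parameters
prob = np.array([0.5, 0.5])           # pi(s)
c0 = 1.0                              # consumption today
cs = np.array([0.9, 1.1])             # c(s): consumption in each state tomorrow

def u(c):
    return c ** (1.0 - gamma) / (1.0 - gamma)

def u_prime(c):
    return c ** (-gamma)

def lifetime_utility(c_today, c_tomorrow):
    return u(c_today) + beta * np.sum(prob * u(c_tomorrow))

# Prices implied by the first order conditions.
m = beta * u_prime(cs) / u_prime(c0)  # m(s) = beta u'[c(s)] / u'(c)
pc = prob * m                         # pc(s) = pi(s) m(s)

def marginal_gain(state, eps=1e-6):
    """Central-difference utility gain from buying claims to `state`, paid for today."""
    bump = np.zeros_like(cs)
    bump[state] = eps
    up = lifetime_utility(c0 - pc[state] * eps, cs + bump)
    down = lifetime_utility(c0 + pc[state] * eps, cs - bump)
    return (up - down) / (2.0 * eps)

gains = [marginal_gain(s) for s in range(len(cs))]
print(m, pc, gains)                   # gains are zero up to numerical error
```

Note that $m$ is larger in the low-consumption state: claims that pay off when consumption is scarce are expensive relative to their probability.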

The investor’s first order conditions say that the marginal rate of substitution between states tomorrow equals the relevant price ratio,

$$\frac{m(s_1)}{m(s_2)} = \frac{u'[c(s_1)]}{u'[c(s_2)]}.$$

$m(s_1)/m(s_2)$ gives the rate at which the investor can give up consumption in state 2 in return for consumption in state 1 through purchase and sales of contingent claims. $u'[c(s_1)]/u'[c(s_2)]$ gives the rate at which the investor is willing to make this substitution. At an optimum, the marginal rate of substitution should equal the price ratio, as usual in economics.

We learn that the discount factor $m$ is the marginal rate of substitution between date and state contingent commodities. That’s why it, like $c(s)$, is a random variable. Also, scaling contingent claims prices by probabilities gives marginal utility, and so is not so artificial as it may have seemed above.

Figure 6 gives the economics behind this approach to asset pricing. We observe the investor’s choice of date or state-contingent consumption. Once we know his utility function, we can calculate the contingent claim prices that must have led to the observed consumption choice, from the derivatives of the utility function.

The relevant probabilities are the investor’s subjective probabilities over the various states. Asset prices are set, after all, by investors’ demands for assets, and those demands are set by investors’ subjective evaluations of the probabilities of various events. We often assume rational expectations, namely that subjective probabilities are equal to objective frequencies. But this is an additional assumption that we may not always want to make.

Figure 6. Indifference curve and contingent claim prices. [Axes: state 1 (or date 1) and state 2 (or date 2). Labeled elements: the indifference curve and the point (c1, c2).]

4.4 Risk sharing

Risk sharing: In complete markets, consumption moves together. Only aggregate risk matters for security markets.

We deduced that the marginal rate of substitution for any individual investor equals the contingent claim price ratio. But the prices are the same for all investors. Therefore, marginal utility growth should be the same for all investors

$$\beta^i\, \frac{u'(c^i_{t+1})}{u'(c^i_t)} = \beta^j\, \frac{u'(c^j_{t+1})}{u'(c^j_t)} \tag{2}$$

where $i$ and $j$ refer to different investors. If investors have the same homothetic utility function (for example, power utility), then consumption itself should move in lockstep,

$$\frac{c^i_{t+1}}{c^i_t} = \frac{c^j_{t+1}}{c^j_t}.$$

More generally, shocks to consumption are perfectly correlated across individuals.

This is so radical, it’s easy to misread it at first glance. It doesn’t say that expected consumption growth should be equal; it says that consumption growth should be equal ex-post. If my consumption goes up 10%, yours goes up exactly 10% as well, and so does everyone else’s. In a complete contingent claims market, all investors share all risks, so when any shock hits, it hits us all equally (after insurance payments). It doesn’t say the consumption level is the same – this is risk-sharing, not socialism. The rich have higher levels of consumption, but rich and poor share the shocks equally.

This risk sharing is Pareto-optimal. Suppose a social planner wished to maximize everyone’s utility given the available resources. For example, with two investors $i$ and $j$, he would maximize

$$\max \; \lambda_i \sum_t \beta^t u(c^i_t) + \lambda_j \sum_t \beta^t u(c^j_t) \quad \text{s.t.} \quad c^i_t + c^j_t = c^a_t$$

where $c^a$ is the total amount available and $\lambda_i$ and $\lambda_j$ are $i$ and $j$’s relative weights in the planner’s objective. The first order condition to this problem is

$$\lambda_i\, u'(c^i_t) = \lambda_j\, u'(c^j_t)$$

and hence the same risk sharing that we see in a complete market, equation (4.2).
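The lockstep result is easy to see numerically under power utility. In the sketch below, the planner weights, the curvature parameter and the aggregate consumption path are all assumptions for illustration. With $u'(c) = c^{-\gamma}$, the planner's first order condition implies $c^i_t / c^j_t = (\lambda_i/\lambda_j)^{1/\gamma}$, so each investor gets a constant share of the aggregate.

```python
import numpy as np

gamma = 3.0                                # assumed curvature
lam = np.array([1.0, 8.0])                 # hypothetical planner weights (lambda_i, lambda_j)

# Constant consumption shares implied by lambda_i c_i^(-gamma) = lambda_j c_j^(-gamma).
shares = lam ** (1.0 / gamma) / np.sum(lam ** (1.0 / gamma))

# Hypothetical aggregate consumption path c^a_t.
rng = np.random.default_rng(3)
aggregate = 100.0 * np.cumprod(np.exp(rng.normal(0.02, 0.03, 20)))
consumption = np.outer(aggregate, shares)  # rows: dates, columns: investors i and j

growth = consumption[1:] / consumption[:-1]          # gross consumption growth
weighted_mu = lam * consumption ** (-gamma)          # lambda times marginal utility

print(shares)        # one third and two thirds
print(growth[:3])    # the two columns are identical, date by date
```

Levels differ (investor $j$ always consumes twice as much as $i$ in this example), but every aggregate shock moves both consumptions by the same percentage.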

This simple fact has profound implications. First, it shows you why only aggregate shocks should matter for risk prices. Any idiosyncratic income risk will be equally shared, and so $1/N$ of it becomes an aggregate shock. Then the stochastic discount factors $m$ that determine asset prices are no longer affected by truly idiosyncratic risks. Much of this sense that only aggregate shocks matter stays with us in incomplete markets as well.

Obviously, the real economy does not yet have complete markets or full risk sharing – individual consumptions do not move in lockstep. However, this observation tells us much about the function of securities markets. Security markets – state-contingent claims – bring individual consumptions closer together by allowing people to share some risks. In addition, better risk sharing is much of the force behind financial innovation. Many successful new securities can be understood as devices to more widely share risks.

4.5 State diagram and price function

I introduce the state space diagram and inner product representation for prices, $p(x) = E(mx) = m \cdot x$.

$p(x) = E(mx)$ implies $p(x)$ is a linear function.

Think of the contingent claims price $pc$ and asset payoffs $x$ as vectors in $\mathbb{R}^S$, where each element gives the price or payoff to the corresponding state,

$$pc = \begin{bmatrix} pc(1) & pc(2) & \cdots & pc(S) \end{bmatrix}',$$

$$x = \begin{bmatrix} x(1) & x(2) & \cdots & x(S) \end{bmatrix}'.$$

Figure ?? is a graph of these vectors in $\mathbb{R}^S$. Next, I deduce the geometry of Figure ??.

Figure 7. Contingent claims prices (pc) and payoffs. [Axes: state 1 payoff and state 2 payoff. Labeled elements: the vector pc; the planes price = 0 (excess returns), price = 1 (returns) and price = 2; the state 1 contingent claim; the riskfree rate.]

The contingent claims price vector $pc$ points into the positive orthant. We saw in section 4.3 that $m(s) = \beta u'[c(s)]/u'(c)$. Now, marginal utility should always be positive (people always want more), so the marginal rate of substitution and discount factor are always positive, $m > 0$ and $pc > 0$. Don’t forget, $m$ and $pc$ are vectors, or random variables. Thus, $m > 0$ means the realization of $m$ is positive in every state of nature, or, equivalently, every element of the vector $m$ is positive.

The set of payoffs with any given price lie on a (hyper)plane perpendicular to the contingent claim price vector. We reasoned above that the price of the payoff $x$ must be given by its contingent claim value (4.1),

$$p(x) = \sum_s pc(s)\, x(s). \tag{3}$$

Interpreting $pc$ and $x$ as vectors, this means that the price is given by the inner product of the contingent claim price and the payoff.

If two vectors are orthogonal – if they point out from the origin at right angles to each other – then their inner product is zero. Therefore, the set of all zero price payoffs must lie on a plane orthogonal to the contingent claims price vector, as shown in figure ??.

More generally, the inner product of two vectors $x$ and $pc$ equals the product of the magnitude of the projection of $x$ onto $pc$ times the magnitude of $pc$. Using a dot to denote inner product,

$$p(x) = \sum_s pc(s)\, x(s) = pc \cdot x = |pc| \times |\mathrm{proj}(x \mid pc)| = |pc| \times |x| \times \cos(\theta)$$

where $|x|$ means the length of the vector $x$ and $\theta$ is the angle between the vectors $pc$ and $x$. Since all payoffs on planes (such as the price planes in figure ??) that are perpendicular to $pc$ have the same projection onto $pc$, they must have the same price. (Only the price = 0 plane is, strictly speaking, orthogonal to $pc$. Lacking a better term, I’ve called the nonzero price planes “perpendicular” to $pc$.) When vectors are finite-dimensional, the prime notation is commonly used for inner products, $pc'x$. This notation does not extend well to infinite-dimensional spaces. The notation $\langle pc \mid x \rangle$ is also often used for inner products.

Planes of constant price move out linearly, and the origin $x = 0$ must have a price of zero. If payoff $y = 2x$, then its price is twice the price of $x$,

$$p(y) = \sum_s pc(s)\, y(s) = \sum_s pc(s)\, 2x(s) = 2\, p(x).$$

Similarly, a payoff of zero must have a price of zero.

We can think of $p(x)$ as a pricing function, a map from the state space or payoff space in which $x$ lies ($\mathbb{R}^S$ in this case) to the real line. We have just deduced from the definition (4.3) that $p(x)$ is a linear function, i.e. that

$$p(ax + by) = a\, p(x) + b\, p(y).$$

The constant price lines in Figure ?? are of course exactly what one expects from a linear function from $\mathbb{R}^S$ to $\mathbb{R}$. (One might draw the price on the z axis coming out of the page. Then the price function would be a plane going through the origin and sloping up with iso-price lines as given in Figure ??.)

Figure ?? also includes the payoffs to a contingent claim to the first state. This payoff is one in the first state and zero in other states and thus located on the axis. The plane of price = 1 payoffs is the plane of asset returns; the plane of price = 0 payoffs is the plane of excess returns. A riskfree unit payoff (the payoff to a risk-free pure discount bond) lies on the $(1, 1)$ point in Figure ??; the riskfree return lies on the intersection of the 45° line (same payoff in both states) and the price = 1 plane (the set of all returns).
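The geometry can be checked with a few lines of arithmetic in the two-state case. The contingent claim prices and payoffs below are hypothetical; the sketch verifies linearity of the pricing function, the cosine formula, and the location of a zero-price payoff, the riskfree return and the state 1 contingent claim.

```python
import numpy as np

pc = np.array([0.4, 0.5])                 # assumed contingent claims price vector

def price(x):
    """p(x) = pc . x, the inner product of prices and payoffs."""
    return pc @ x

x = np.array([2.0, 1.0])
y = np.array([1.0, 3.0])
a, b = 2.0, -0.5

# Linearity: p(ax + by) = a p(x) + b p(y)
lhs = price(a * x + b * y)
rhs = a * price(x) + b * price(y)

# Cosine formula: p(x) = |pc| |x| cos(theta), with theta measured from the axes
theta = np.arctan2(x[1], x[0]) - np.arctan2(pc[1], pc[0])
price_geom = np.linalg.norm(pc) * np.linalg.norm(x) * np.cos(theta)

excess = np.array([pc[1], -pc[0]])        # orthogonal to pc, so price = 0
riskfree_return = np.ones(2) / pc.sum()   # on the 45 degree line, price = 1
state1_claim = np.array([1.0, 0.0])       # on the state 1 axis, price = pc(1)

print(lhs, rhs, price_geom, price(x))
print(price(excess), price(riskfree_return), price(state1_claim))
```

In this example the riskfree return is $1/0.9 \approx 1.11$ in both states, which is the point where the 45° line crosses the price = 1 plane.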

Geometry with m in place of pc.

The geometric interpretation of Figure ?? goes through with the discount factor $m$ in the place of $pc$. We can define an inner product between the random variables $x$ and $y$ by

$$x \cdot y \equiv E(xy),$$

and retain all the mathematical properties of an inner product. For this reason, random variables for which $E(xy) = 0$ are often called “orthogonal.”

This language may be familiar from linear regressions. When we run a regression of $y$ on $x$,

$$y = b'x + \varepsilon$$

we find the linear combination of $x$ that is “closest” to $y$, by minimizing the variance or “size” of the residual $\varepsilon$. We do this by forcing the residual to be “orthogonal” to the right hand variable, $E(x\varepsilon) = 0$. The projection of $y$ on $x$ is defined as the fitted value, $\mathrm{proj}(y \mid x) = b'x = E(yx')E(xx')^{-1}x$. This idea is often illustrated by a residual vector $\varepsilon$ that is perpendicular to a plane defined by the right hand variables $x$. Thus, when the inner product is defined by a second moment, the operation “project $y$ onto $x$” is a regression. (If $x$ does not include a constant, you don’t add one.)
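The claim that projection under the second-moment inner product is a regression can be checked directly. The sketch below uses simulated data (the data generating process is an assumption) and sample second moments in place of expectations. Note that no constant is added to the right hand variables.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
x = rng.normal(1.0, 1.0, size=(T, 2))                    # two right hand variables, no constant
y = x @ np.array([0.5, -1.0]) + 0.3 + rng.normal(0.0, 1.0, T)

# b = E(xx')^{-1} E(xy), using sample second moments
Exx = x.T @ x / T
Exy = x.T @ y / T
b = np.linalg.solve(Exx, Exy)

fitted = x @ b                 # proj(y | x) = b'x
resid = y - fitted             # epsilon
orth = x.T @ resid / T         # sample E(x epsilon): zero up to rounding

# The same coefficients from a least squares routine
b_lstsq = np.linalg.lstsq(x, y, rcond=None)[0]
print(b, b_lstsq, orth)
```

The residual is orthogonal to each right hand variable in the $E(xy)$ sense, even though, without a constant in the regression, its sample mean need not be zero.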

The geometric interpretation of Figure ?? also is valid if we generalize the setup to an infinite-dimensional state space, i.e. if we think of continuously-valued random variables. Instead of vectors, which are functions from the $S$ states to $\mathbb{R}$, random variables are (measurable) functions from $\Omega$ to $\mathbb{R}$. Nonetheless, we can still think of them as vectors. The equivalent of $\mathbb{R}^S$ is now a Hilbert space $L^2$, which denotes spaces generated by linear combinations of square integrable functions from $\Omega$ to the real line, or the space of random variables with finite second moments. We can still define an “inner product” between two such elements by $x \cdot y = E(xy)$, and $p(x) = E(mx)$ can still be interpreted as “$m$ is perpendicular to (hyper)planes of constant price.” Proving theorems in this context is a bit harder. You can’t just say things like “we can take a line perpendicular to any plane”; such things have to be proved. Sometimes, finite-dimensional thinking can lead you to errors, so it’s important to prove things the right way, keeping the finite dimensional pictures in mind for interpretation. Hansen and Richard (198x) is a very good reference for the Hilbert space machinery.

Chapter 5. The discount factor

Now we look more closely at the discount factor. Rather than derive a specific discount factor as with the consumption-based model in the last chapter, I work backwards. A discount factor is just some random variable that generates prices from payoffs, $p = E(mx)$. What does this expression mean? Can one always find such a discount factor? Can we use this convenient representation without implicitly assuming all the structure of the investors, utility functions, complete markets, and so forth?

The chapter focuses on two famous theorems. The law of one price states that if two portfolios have the same payoffs (in every state of nature), then they must have the same price. A violation of this law would give rise to an immediate kind of arbitrage profit, as you could sell the expensive version and buy the cheap version of the same portfolio. The first theorem states that there is a discount factor that prices all the payoffs by $p = E(mx)$ if and only if this law of one price holds.

In finance, we reserve the term absence of arbitrage for a stronger idea, that if payoff A is always at least as good as payoff B, and sometimes A is better, then the price of A must be greater than the price of B. The second theorem is that there is a positive discount factor that prices all the payoffs by $p = E(mx)$ if and only if there are no arbitrage opportunities, so defined.

These theorems are useful to show that we can use stochastic discount factors without implicitly assuming anything about utility functions, aggregation, complete markets and so on. All we need to know about investors in order to represent prices and payoffs via a discount factor is that they won’t leave law of one price violations or arbitrage opportunities on the table. These theorems can be used to describe aspects of a payoff space (such as law of one price, absence of arbitrage) by restrictions on the discount factor (such as it exists and it is positive). Chapter 31 shows how it can be more convenient to impose and check restrictions on a single discount factor than it is to check the corresponding restrictions on all possible portfolios. Chapter 8 discusses these and other implications of the existence theorems.

5.1 Law of one price and existence of a discount factor

Definition of law of one price; price is a linear function.

$p = E(mx)$ implies the law of one price.

The law of one price implies that a discount factor exists: there exists a unique $x^*$ in $X$ such that $p = E(x^* x)$ for all $x \in X$, the space of all available payoffs.

Furthermore, for any valid discount factor $m$,

$$x^* = \mathrm{proj}(m \mid X).$$


So far we have derived the basic pricing relation $p = E(mx)$ from environments with a lot of structure: either the consumption-based model or complete markets.

Suppose we observe a set of prices $p$ and payoffs $x$, and that markets (either the markets faced by investors or the markets under study in a particular application) are incomplete, meaning they do not span the entire set of contingencies. In what minimal set of circumstances does some discount factor exist which represents the observed prices by $p = E(mx)$? This section and the following answer this important question. This treatment is a simplified version of Hansen and Richard (1987), which contains rigorous proofs and some technical assumptions.

Payoff space

The payoff space $X$ is the set (or a subset) of all the payoffs that investors can purchase, or it is a subset of the tradeable payoffs that is used in a particular study. For example, if there are complete contingent claims to $S$ states of nature, then $X = R^S$. But the whole point is that markets are (as in real life) incomplete, so we will generally think of $X$ as a proper subset of complete markets $R^S$.

The payoff space will include some set of primitive assets, but investors can also form new payoffs by forming portfolios. I assume that investors can form any portfolio of traded assets:

A1: (Portfolio formation) $x_1, x_2 \in X \Rightarrow a x_1 + b x_2 \in X$ for any real $a, b$.

Of course, $X = R^S$ for complete markets satisfies the portfolio formation assumption. If there is a single basic payoff $x$, then the payoff space must be at least the ray from the origin through $x$. If there are two basic payoffs in $R^3$, then the payoff space $X$ must include the plane defined by these two payoffs and the origin. Figure 8 illustrates these possibilities.

The payoff space is not the space of returns. The return space is a subset of the payoff space; if a return $R$ is in the payoff space, then you can pay a price \$2 to get a payoff $2R$, so the payoff $2R$ with price 2 is also in the payoff space. Also, $-R$ is in the payoff space.

Free portfolio formation is in fact an important and restrictive simplifying assumption. It rules out short sales constraints, bid/ask spreads, leverage limitations and so on. The theory can be modified to incorporate these frictions, but it is a substantial modification.

If investors can form portfolios of a vector of basic payoffs $x$ (say, the returns on the NYSE stocks), then the payoff space consists of all portfolios or linear combinations of these original payoffs, $X = \{c'x\}$, where $c$ is a vector of portfolio weights. We also can allow truly infinite-dimensional payoff spaces. For example, investors might be able to trade nonlinear functions of a basis payoff $x$, such as call options on $x$ with strike price $K$, which have payoff $\max[x(s) - K, 0]$.


[Figure 8. Payoff spaces $X$ generated by one (left) and two (right) basis payoffs. Left panel: a single payoff $x$ in $R^2$, axes State 1 and State 2. Right panel: two payoffs $x_1$ and $x_2$ in $R^3$, axes State 1, State 2, and State 3 (into page).]

The law of one price.

A2: (Law of one price, linearity) $p(a x_1 + b x_2) = a\,p(x_1) + b\,p(x_2)$

It doesn’t matter how one forms the payoff $x$. The price of a burger, shake and fries must be the same as the price of a happy meal. Graphically, if the iso-price curves were not planes, then one could buy two payoffs on the same iso-price curve, form a portfolio whose payoff is on the straight line connecting the two original payoffs, and sell the portfolio for a higher price than it cost to assemble it.

The law of one price basically says that investors can’t make instantaneous profits by repackaging portfolios. If investors can sell securities, this is a very weak characterization of preferences. It says there is at least one investor for whom marketing doesn’t matter, who values a package by its contents. The law is meant to describe a market that has already reached equilibrium. If there are any violations of the law of one price, traders will quickly eliminate them so they can’t survive in equilibrium.

A1 and A2 also mean that the 0 payoff must be available, and must have price 0.

The Theorem

The existence of a discount factor implies the law of one price. This is obvious to the point of triviality: if $x = y + z$ then $E(mx) = E[m(y + z)]$. The hard, and interesting, part of the theorem reverses this logic. We show that the law of one price implies the existence of a discount factor.

Theorem: Given free portfolio formation A1, and the law of one price A2, there exists a unique payoff $x^* \in X$ such that $p(x) = E(x^* x)$ for all $x \in X$.

$x^*$ is a discount factor. A1 and A2 imply that the price function on $X$ looks like Figure ??: parallel hyperplanes marching out from the origin. The only difference is that $X$ may be a subspace of the original state space, as shown in Figure 8. The essence of the proof, then, is that any linear function on a space $X$ can be represented by inner products with a vector that lies in $X$.

Proof 1: (Geometric.) We have established that the price is a linear function as shown in Figure 9. (Figure 9 can be interpreted as the plane $X$ of a larger dimensional space as in the right hand panel of Figure 8, laid flat on the page for clarity.) Now we can draw a line from the origin perpendicular to the price planes. Choose a vector $x^*$ on this line. Since the line is orthogonal to the price zero plane we have $0 = p(x) = E(x^* x)$ for price zero payoffs $x$ immediately. The inner product between any payoff $x$ on the price = 1 plane and $x^*$ is $|\mathrm{proj}(x \mid x^*)| \times |x^*|$. Thus, every payoff on the price = 1 plane has the same inner product with $x^*$. All we have to do is pick $x^*$ to have the right length, and we obtain $p(x) = 1 = E(x^* x)$ for every $x$ on the price = 1 plane. Then, of course we have $p(x) = E(x^* x)$ for payoffs $x$ on the other planes as well. Thus, the linear pricing function implied by the Law of One Price can be represented by inner products with $x^*$. ∎

[Figure 9. Existence of a discount factor $x^*$. The figure shows iso-price lines labeled Price = 0 (excess returns), Price = 1 (returns), and Price = 2, together with the vector $x^*$.]

The basic mathematical point is just that any linear function can be represented by an inner product. The Riesz representation theorem extends the proof to infinite-dimensional payoff spaces. See Hansen and Richard (1987).

Proof 2: (Algebraic.) We can prove the theorem by construction when the payoff space $X$ is generated by portfolios of $N$ basis payoffs (for example, $N$ stocks). This is a common situation, so the formulas are also useful in practice. Organize the basis payoffs into a vector $x = [x_1 \; x_2 \; \ldots \; x_N]'$ and similarly their prices $p$. The payoff space is then $X = \{c'x\}$.

We want a discount factor that is in the payoff space, as the theorem requires. Thus, it must be of the form $x^* = c'x$. Construct $c$ so that $x^*$ prices the basis assets. We want $p = E(x^* x) = E(xx'c)$. Thus we need $c = E(xx')^{-1} p$. If $E(xx')$ is nonsingular, this $c$ exists and is unique. A2 implies that $E(xx')$ is nonsingular (after pruning redundant rows of $x$). Thus,

$$x^* = p' E(xx')^{-1} x \qquad (5.1)$$

is our discount factor. It is a linear combination of $x$ so it is in $X$. It prices the basis assets $x$ by construction. It prices every $x \in X$: $E[x^* (x'c)] = E[p' E(xx')^{-1} x x' c] = p'c$. By linearity, $p(c'x) = c'p$.
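The construction in Proof 2 can be tried out numerically. The following is a minimal sketch, assuming a made-up three-state economy with two basis payoffs; the probabilities, payoffs, prices, and the helper names `expect`, `price`, and `solve_2x2` are all illustrative assumptions, not part of the text. It uses exact fractions so that the pricing identities hold without rounding.

```python
from fractions import Fraction as F

# Hypothetical three-state economy with two basis payoffs (incomplete markets).
probs = [F(1, 2), F(1, 4), F(1, 4)]      # state probabilities pi(s)
payoffs = [[F(1), F(1), F(1)],           # x_1: riskless payoff
           [F(2), F(1), F(0)]]           # x_2: risky payoff
prices = [F(19, 20), F(1)]               # assumed prices p of x_1 and x_2


def expect(values):
    """E(v) = sum over states of pi(s) * v(s)."""
    return sum(pi * v for pi, v in zip(probs, values))


def price(m, payoff):
    """Price implied by a discount factor m: E(m x)."""
    return expect([ms * xs for ms, xs in zip(m, payoff)])


def solve_2x2(A, b):
    """Solve the 2x2 linear system A y = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


# Second moment matrix E(xx'), with entries E(x_i x_j).
second_moment = [[expect([a * b for a, b in zip(xi, xj)]) for xj in payoffs]
                 for xi in payoffs]

# c = E(xx')^{-1} p, then x* = c'x state by state.
c = solve_2x2(second_moment, prices)
x_star = [c[0] * x1 + c[1] * x2 for x1, x2 in zip(*payoffs)]

# x* prices the basis payoffs and, by linearity, any portfolio w'x.
weights = [F(3), F(-2)]
portfolio = [weights[0] * x1 + weights[1] * x2 for x1, x2 in zip(*payoffs)]

print("x* by state:", x_star)
print("prices implied by x*:", [price(x_star, x) for x in payoffs])
print("portfolio price:", price(x_star, portfolio))
```

Because $x^*$ is built as $c'x$, it lies in the payoff space by construction; the prices it implies for the basis payoffs and for the portfolio can be compared directly with `prices` and with the weighted sum of `prices`.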

What the theorem does and does not say

The theorem says there is a unique $x^*$ in $X$. There may be many other discount factors $m$ not in $X$. In fact, unless markets are complete, there are an infinite number of random variables that satisfy $p = E(mx)$. If $p = E(mx)$ then $p = E[(m + \varepsilon)x]$ for any $\varepsilon$ orthogonal to $x$, $E(\varepsilon x) = 0$.

Not only does this construction generate some additional discount factors, it generates all of them: any discount factor $m$ (any random variable that satisfies $p = E(mx)$) can be represented as $m = x^* + \varepsilon$ with $E(\varepsilon x) = 0$. Figure 10 gives an example of a one-dimensional $X$ in a two-dimensional state space, in which case there is a whole line of possible discount factors $m$. If markets are complete, there is nowhere to go orthogonal to the payoff space $X$, so $x^*$ is the only possible discount factor.

Reversing the argument, $x^*$ is the projection of any stochastic discount factor $m$ on the space $X$ of payoffs. This is a very important fact: the pricing implications of any discount factor $m$ for a set of payoffs $X$ are the same as those of the projection of $m$ on $X$. This discount factor is known as the mimicking portfolio for $m$. Algebraically,

$$p = E(mx) = E[(\mathrm{proj}(m \mid X) + \varepsilon)x] = E[\mathrm{proj}(m \mid X)\,x].$$
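To see why the projection of any discount factor on $X$ must be $x^*$, it may help to write the projection as a regression of $m$ on the basis payoffs (with no constant). This is a sketch for the finite-basis case $X = \{c'x\}$ used in the algebraic proof, and it uses only $p = E(mx)$:

$$\mathrm{proj}(m \mid X) = E(mx')\,E(xx')^{-1}x = p'\,E(xx')^{-1}x = x^*.$$

The residual $\varepsilon = m - \mathrm{proj}(m \mid X)$ satisfies $E(\varepsilon x) = 0$ by the usual property of regression residuals, which is the decomposition $m = x^* + \varepsilon$ used above.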

Let me repeat and emphasize the logic. Above, we started with investors or a contingent claim market, and derived a discount factor. $p = E(mx)$ implies the linearity of the pricing function and hence the law of one price, a pretty obvious statement in those contexts. Here we work backwards. Markets are incomplete in that contingent claims to lots of states of nature are not available. We found that the law of one price implies a linear pricing function, and a linear pricing function implies that there exists at least one and usually many discount factors.


[Figure 10. Many discount factors $m$ can price a given set of assets in incomplete markets. The figure shows the payoff space $X$, the discount factor $x^*$, and the space of discount factors $m = x^* + \varepsilon$.]


We do allow arbitrary portfolio formation, and that sort of “completeness” is important to the result. If investors cannot form a portfolio $ax + by$, they cannot force the price of this portfolio to equal the price of its constituents. The law of one price is not innocuous; it is an assumption about preferences, albeit a weak one. The point of the theorem is that this is just enough information about preferences to deduce the existence of a discount factor.

5.2 No-Arbitrage and positive discount factors

The definition of arbitrage: positive payoff implies positive price.

There is a strictly positive discount factor $m$ such that $p = E(mx)$ if and only if there are no arbitrage opportunities.

No arbitrage is another, slightly stronger, implication of marginal utility, that can be reversed to show that there is a positive discount factor. We need to start with the definition of arbitrage:

Definition (Absence of arbitrage): A payoff space $X$ and pricing function $p(x)$ leave no arbitrage opportunities if every payoff $x$ that is always non-negative, $x \geq 0$ (almost surely), and positive, $x > 0$, with some positive probability, has positive price, $p(x) > 0$.

No-arbitrage says that you can’t get for free a portfolio that might pay off positively, but will certainly never cost you anything. This definition is different from the colloquial use of the word “arbitrage.” Most people use “arbitrage” to mean a violation of the law of one price, that is, a riskless way of buying something cheap and selling it for a higher price. “Arbitrages” here might pay off, but then again they might not. The word “arbitrage” is also widely abused. “Risk arbitrage” is a Wall Street oxymoron that means making specific kinds of bets.

An equivalent statement is that if one payoff dominates another, then its price must be higher: if $x \geq y$, then $p(x) \geq p(y)$. (Or, a bit more carefully but more long-windedly, if $x \geq y$ almost surely and $x > y$ with positive probability, then $p(x) > p(y)$. You can’t forget that $x$ and $y$ are random variables.)

$m > 0 \Rightarrow$ No-arbitrage

The absence of arbitrage opportunities is clearly a consequence of a positive discount factor, and a positive discount factor naturally results from any sort of utility maximization. Recall,

$$m(s) = \beta \frac{u'[c(s)]}{u'(c)} > 0.$$


It is a sensible characterization of preferences that marginal utility is always positive. Few people are so satiated that they will throw away money. Therefore, the marginal rate of substitution is positive. The marginal rate of substitution is a random variable, so “positive” means “positive in every state of nature” or “in every possible realization.”

Now, if contingent claims prices are all positive, a bundle of positive amounts of contingent claims must also have a positive price, even in incomplete markets. A little more formally,

Theorem: $p = E(mx)$ and $m(s) > 0$ imply no-arbitrage.

Proof: $m > 0$; $x \geq 0$ and there are some states where $x > 0$. Thus, in some states $mx > 0$ and in other states $mx = 0$. Therefore $E(mx) > 0$. ∎

No arbitrage $\Rightarrow m > 0$

Now we turn the observation around, which is again the hard and interesting part. As the law of one price property guaranteed the existence of a discount factor $m$, no-arbitrage guarantees the existence of a positive $m$.

The basic idea is pretty simple. No-arbitrage means that the prices of any payoff in the positive orthant (except zero, but including the axes) must be strictly positive. The price = 0 plane divides the region of positive prices from the region of negative prices. Thus, if the region of negative prices is not to intersect the positive orthant, the iso-price lines must march up and to the right, and the discount factor $m$ must point up and to the right. This is how we have graphed it all along, most recently in Figure 9. Figure 11 illustrates the case that is ruled out: a whole region of negative price payoffs lies in the positive orthant. For example, the payoff $x$ is strictly positive, but has a negative price. As a result, the (unique, since this market is complete) discount factor $m$ is negative in the y-axis state.

The theorem is easy to prove in complete markets. There is only one $m$, $x^*$. If it isn’t positive in some state, then the contingent claim in that state has a positive payoff and a negative price, which violates no arbitrage. More formally,

Theorem: In complete markets, no-arbitrage implies that there exists a unique $m > 0$ such that $p = E(mx)$.

Proof: No-arbitrage implies the law of one price, so there is an $x^*$ such that $p = E(x^* x)$, and in a complete market this is the unique discount factor. Suppose that $x^* \leq 0$ for some states. Then, form a payoff $x$ that is 1 in those states, and zero elsewhere. This payoff is strictly positive, but its price, $\sum_{s:\,x^*(s) < 0} \pi(s)\,x^*(s)$, is negative, negating the assumption of no-arbitrage. ∎

The tough part comes if markets are incomplete. There are now many $m$’s that price assets. Any $m$ of the form $m = x^* + \varepsilon$, with $E(\varepsilon x) = 0$, will do. We want to show that at least one of these is positive. But that one may not be $x^*$. Since the discount factors other than $x^*$ are not in the payoff space $X$, we can’t use the construction of the last argument, since that construction may yield a payoff that is not in $X$, and hence to which we don’t know how to assign a price. To handle this case, I adopt a different strategy of proof. (This strategy is due to Ross 1978. Duffie 1992 has a more formal textbook treatment.) The basic idea is another “to every plane there is a perpendicular line” argument, but applied to a space that includes prices and payoffs together. As you can see, the price = 0 plane is a separating hyperplane between the positive orthant and the negative payoffs, and the proof builds on this idea.

[Figure 11. Counter-example for the no-arbitrage $\Rightarrow m > 0$ theorem. The payoff $x$ is positive, but has negative price. The discount factor is not strictly positive. The figure shows price lines $p = -1$, $p = 0$, and $p = +1$, the payoff $x$, and the discount factor $x^*, m$.]

Theorem: No arbitrage implies the existence of a strictly positive discount factor, $m > 0$, $p = E(mx)$ $\forall x \in X$.

Proof: Join $(-p(x), x)$ together to form vectors in $R^{S+1}$. Call $M$ the set of all $(-p(x), x)$ pairs,

$$M = \{(-p(x), x);\; x \in X\}.$$

$M$ is still a linear space: $m_1 \in M$, $m_2 \in M \Rightarrow a m_1 + b m_2 \in M$. No-arbitrage means that elements of $M$ can’t have all positive elements. If $x$ is positive, $-p(x)$ must be negative. Thus, $M$ is a hyperplane that only intersects the positive orthant $R^{S+1}_+$ at the point 0. We can then create a linear function $F: R^{S+1} \to R$ such that $F(-p, x) = 0$ for $(-p, x) \in M$, and $F(-p, x) > 0$ for $(-p, x) \in R^{S+1}_+$ except the origin. Since we can represent any linear function by a perpendicular vector, there is a vector $(1, m)$ such that $F(-p, x) = (1, m) \cdot (-p, x) = -p + m \cdot x$, or $-p + E(mx)$ using the second moment inner product. Finally, since $F(-p, x)$ is positive for $(-p, x) > 0$, $m$ must be positive. ∎

In a larger space than $R^{S+1}_+$, as generated by continuously valued random variables, the separating hyperplane theorem assures us that there is a linear function that separates the two convex sets $M$ and the equivalent of $R^{S+1}_+$, and the Riesz representation theorem tells us that we can represent $F$ as an inner product with some vector by $F(-p, x) = -p + m \cdot x$.

What the theorem does and does not say

The theorem says that a discount factor $m > 0$ exists, but it does not say that $m > 0$ is unique. The left hand panel of Figure 12 illustrates the situation. Any $m$ on the line through $x^*$ perpendicular to $X$ also prices assets. Again, $p = E[(m + \varepsilon)x]$ if $E(\varepsilon x) = 0$. All of these discount factors that lie in the positive orthant are positive, and thus satisfy the theorem. There are lots of them! In a complete market, $m$ is unique, but not otherwise.

The theorem says that a positive $m$ exists, but it also does not say that every discount factor $m$ must be positive. The discount factors in the left hand panel of Figure 12 outside the positive orthant are perfectly valid: they satisfy $p = E(mx)$, and the prices they generate on $X$ are arbitrage free, but they are not positive in every state of nature. In particular, the discount factor $x^*$ in the payoff space is still perfectly valid, $p(x) = E(x^* x)$, but it need not be positive, again as illustrated in the left hand panel of Figure 12.
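A small numerical case may make these two points concrete. The sketch below assumes a made-up two-state economy with a single traded payoff $x = (1, -1)$ at an assumed price of $1/5$; the numbers and the helper names are illustrative only. No payoff in $X = \{cx\}$ other than zero is non-negative in both states, so there are no arbitrage opportunities on $X$. The payoff-space discount factor $x^*$ is negative in one state, yet some $m = x^* + \varepsilon$ with $E(\varepsilon x) = 0$ are strictly positive.

```python
from fractions import Fraction as F

# Hypothetical two-state economy with one traded payoff (incomplete markets).
probs = [F(1, 2), F(1, 2)]     # state probabilities
x = [F(1), F(-1)]              # the single basis payoff; X = {c * x}
p = F(1, 5)                    # assumed price of x


def expect(values):
    """E(v) = sum over states of pi(s) * v(s)."""
    return sum(pi * v for pi, v in zip(probs, values))


# Scalar case of x* = p' E(xx')^{-1} x.
c = p / expect([xs * xs for xs in x])
x_star = [c * xs for xs in x]


def discount_factor(k):
    """m = x* + eps with eps = (k, k); E(eps x) = 0 for every k in this example."""
    return [ms + k for ms in x_star]


def prices_x(m):
    """True if m reproduces the observed price, p = E(m x)."""
    return expect([ms * xs for ms, xs in zip(m, x)]) == p


def strictly_positive(m):
    return all(ms > 0 for ms in m)


for k in [F(0), F(1, 10), F(1, 2), F(1)]:
    m = discount_factor(k)
    print("k =", k, " m =", m,
          " prices x:", prices_x(m), " positive:", strictly_positive(m))
```

Every value of `k` gives a valid discount factor, but only the larger shifts produce one that is positive in both states, which is the situation the left hand panel of Figure 12 describes.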

This theorem shows that we can extend the pricing function defined on $X$ to all possible payoffs $R^S$, and not imply any arbitrage opportunities on that larger space of payoffs. It says that there is a pricing function $p(x)$ defined over all of $R^S$, that assigns the same (correct, or observed) prices on $X$ and that displays no arbitrage on all of $R^S$. Graphically, it says that we can draw parallel planes to represent prices on all of $R^S$ in such a way that the planes intersect $X$ in the right places, and the price planes march up and to the right so the positive orthant always has positive prices. Any positive $m$ generates such a no-arbitrage extension, as illustrated in the right hand panel of Figure 12. In fact, there are many ways to do this. Each different choice of $m > 0$ generates a different extension of the pricing function.

We can think of strictly positive discount factors as possible contingent claims prices. We can think of the theorem as answering the question: is it possible that an observed and incomplete set of prices and payoffs is generated by some complete markets, contingent claim economy? The answer is, yes, if there is no arbitrage on the observed prices and payoffs. In fact, since there are typically many positive $m$’s consistent with a $\{X, p(x)\}$, there exist many contingent claims economies consistent with our observations.


[Figure 12. Existence of a discount factor and extensions. The left graph shows that the positive discount factor is not unique, and that discount factors may also exist that are not strictly positive. In particular, $x^*$ need not be positive. The right hand graph shows that each particular choice of $m > 0$ induces an arbitrage free extension of the prices on $X$ to all contingent claims. Both panels show the payoff space $X$, price lines $p = 1$ and $p = 2$, and $x^*$; the left panel marks the region $m > 0$ and the right panel a particular $m$.]

Finally, the absence of arbitrage is another very weak characterization of preferences. The theorem tells us that this is enough to allow us to use the $p = E(mx)$ formalism with $m > 0$.

As usual, this theorem and proof do not require that the state space is $R^S$. State spaces generated by continuous random variables work just as well.

5.3 An alternative formula, and $x^*$ in continuous time??

In terms of the covariance matrix of payoffs,

$$x^* = E(x^*) + [p - E(x^*)E(x)]'\,\Sigma^{-1}\,(x - E(x)).$$

Just like $x^*$ in discrete time,

$$\frac{d\Lambda^*}{\Lambda^*} = -r^f\,dt - \left(\mu + \frac{D}{p} - r^f\right)' \Sigma^{-1} \sigma\,dz$$

prices assets by construction in continuous time.

Being able to compute $x^*$ is useful in many circumstances. This section gives an alternative formula in discrete time, and the continuous time counterpart.

A formula that uses covariance matrices

$E(xx')$ in our previous formula (5.1) is a second moment matrix. We typically summarize data in terms of covariance matrices instead. Therefore, a convenient alternative formula is

$$x^* = E(x^*) + [p - E(x^*)E(x)]'\,\Sigma^{-1}\,(x - E(x)) \qquad (5.2)$$

where

$$\Sigma \equiv E\left([x - E(x)][x - E(x)]'\right)$$

denotes the covariance matrix of the $x$ payoffs. (We could just substitute $E(xx') = \Sigma + E(x)E(x')$, but the inverse of the sum is not very useful.) We can derive this formula by postulating a discount factor that is a linear function of the shocks to the payoffs,

$$x^* = E(x^*) + (x - E(x))'b,$$

and then finding $b$ to ensure that $x^*$ prices the assets $x$:

$$p = E(x^*)E(x) + E\left[(x - E(x))\,x'\right] b$$

so

$$b = \Sigma^{-1}\,[p - E(x^*)E(x)].$$

If a riskfree rate is traded, then we know $E(x^*) = 1/R^f$. If a riskfree rate is not traded (if 1 is not in $X$) then this formula does not necessarily produce a discount factor $x^*$ that is in $X$. In many applications, however, all that matters is producing some discount factor, and the arbitrariness of the risk-free or zero beta rate is not a problem.

This formula is particularly useful when the payoff space consists solely of excess returns or price-zero payoffs. In that case, $x^* = p'E(xx')^{-1}x$ gives $x^* = 0$. $x^* = 0$ is in fact the only discount factor in $X$ that prices all the assets, but in this case it’s more interesting (and avoids $1/0$ difficulties when we want to transform to expected return/beta or other representations) to pick a discount factor not in $X$ by picking a zero-beta rate or price of the riskfree payoff. In the case of excess returns, for arbitrarily chosen $R^f$, then, (5.2) gives us

$$x^* = \frac{1}{R^f} - \frac{1}{R^f}\,E(R^e)'\,\Sigma^{-1}\,(R^e - E(R^e)); \quad \Sigma \equiv \mathrm{cov}(R^e).$$
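The covariance-matrix formula can be checked on a small example. The sketch below assumes a made-up three-state economy with two risky payoffs and no traded riskfree payoff, so the mean $E(x^*)$ is an arbitrary choice (here $1/1.05$); all numbers and helper names are illustrative assumptions. It builds $b = \Sigma^{-1}[p - E(x^*)E(x)]$ and confirms that $x^* = E(x^*) + (x - E(x))'b$ reproduces the assumed prices.

```python
from fractions import Fraction as F

# Hypothetical three-state economy; the constant payoff 1 is not in X here.
probs = [F(1, 2), F(1, 4), F(1, 4)]      # state probabilities
payoffs = [[F(2), F(1), F(0)],           # x_1
           [F(0), F(1), F(3)]]           # x_2
prices = [F(1), F(1)]                    # assumed prices
mean_x_star = F(20, 21)                  # arbitrary E(x*) = 1/R^f with R^f = 1.05


def expect(values):
    """E(v) = sum over states of pi(s) * v(s)."""
    return sum(pi * v for pi, v in zip(probs, values))


def solve_2x2(A, b):
    """Solve the 2x2 linear system A y = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


means = [expect(xi) for xi in payoffs]
# Shocks x - E(x), one list per payoff.
shocks = [[xs - mu for xs in xi] for xi, mu in zip(payoffs, means)]
# Covariance matrix Sigma = E[(x - Ex)(x - Ex)'].
cov = [[expect([a * b for a, b in zip(si, sj)]) for sj in shocks]
       for si in shocks]

# b = Sigma^{-1} [p - E(x*) E(x)]
b = solve_2x2(cov, [pr - mean_x_star * mu for pr, mu in zip(prices, means)])

# x* = E(x*) + (x - E(x))'b, state by state.
x_star = [mean_x_star + b[0] * s1 + b[1] * s2 for s1, s2 in zip(*shocks)]

implied_prices = [expect([ms * xs for ms, xs in zip(x_star, xi)])
                  for xi in payoffs]
print("x* by state:", x_star)
print("E(x*):", expect(x_star))
print("implied prices:", implied_prices)
```

Changing `mean_x_star` gives a different discount factor that prices the same two payoffs, which illustrates the arbitrariness of the zero-beta rate when no riskfree payoff is traded.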

Continuous time

The law of one price implies the existence of a discount factor process, and absence of arbitrage a positive discount factor process in continuous time as well as discrete time. At one level, this statement requires no new mathematics. If we reinvest dividends for simplicity, then a discount factor must satisfy

$$p_t \Lambda_t = E_t\left(\Lambda_{t+s}\,p_{t+s}\right).$$

Calling $p_{t+s} = x_{t+s}$, this is precisely the discrete time $p = E(mx)$ that we have studied all along. Thus, the law of one price and the absence of arbitrage are equivalent to the existence of a $\Lambda_{t+s}$ and of a positive $\Lambda_{t+s}$, respectively. The same conditions at all horizons $s$ are thus equivalent to the existence of a discount factor process, or a positive discount factor process $\Lambda_t$ for all time $t$.

For calculations it is useful to find explicit formulas for discount factors. Suppose a set of securities pays dividends

$$D_t\,dt$$

and their prices follow

$$\frac{dp_t}{p_t} = \mu_t\,dt + \sigma_t\,dz_t$$

where $p$ and $z$ are $N \times 1$ vectors, $\mu_t$ and $\sigma_t$ may vary over time, $\mu(p_t, t, \text{other variables})$, $E(dz_t\,dz_t') = I\,dt$, and the division on the left hand side is element-by-element. (As usual, I’ll drop the $t$ subscripts when not necessary for clarity, but everything can vary over time.)

We can form a discount factor that prices these assets from a linear combination of the shocks that drive the original assets,

$$\frac{d\Lambda^*}{\Lambda^*} = -r^f\,dt - \left(\mu + \frac{D}{p} - r^f\right)' \Sigma^{-1} \sigma\,dz, \qquad (5.3)$$

where $\Sigma \equiv \sigma\sigma'$ again is the covariance matrix of returns. You can easily check that this equation solves

$$E_t\left(\frac{dp}{p}\right) + \frac{D}{p}\,dt - r^f\,dt = -E_t\left(\frac{d\Lambda^*}{\Lambda^*}\frac{dp}{p}\right) \qquad (5.4)$$

and

$$E_t\left(\frac{d\Lambda^*}{\Lambda^*}\right) = -r^f\,dt,$$

or you can show that this is the only diffusion driven by $dz$, $dt$ with these properties. If there is a risk free rate $r^f_t$ (also potentially time-varying), then that rate determines $r^f_t$. If there is no risk free rate, (5.3) will price the risky assets for any arbitrary (or convenient) choice of $r^f_t$. As usual, this discount factor is not unique; $\Lambda^*$ plus orthogonal noise will also act as a discount factor:

$$\frac{d\Lambda}{\Lambda} = \frac{d\Lambda^*}{\Lambda^*} + dw; \quad E(dw) = 0; \quad E(dz\,dw) = 0.$$


You can see that (5.3) is exactly analogous to the discrete time formula (5.2). (If you don’t like answers popping out of hats like this, guess a solution of the form

$$\frac{d\Lambda}{\Lambda} = \mu_\Lambda\,dt + \sigma_\Lambda\,dz.$$

Then find $\mu_\Lambda$ and $\sigma_\Lambda$ to satisfy (5.4) for the riskfree and risky assets.)
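The guess-and-verify step can be written out in a few lines. This is a sketch that takes $E_t(dz\,dz') = I\,dt$, keeps only terms of order $dt$, and introduces the shorthand $a \equiv \mu + D/p - r^f$ (a label used here for convenience, not in the original). Starting from the guess $d\Lambda/\Lambda = \mu_\Lambda\,dt + \sigma_\Lambda\,dz$ with $\sigma_\Lambda$ a $1 \times N$ row vector, a riskfree asset has no $dz$ term and no dividend, so the pricing condition for it gives

$$\mu_\Lambda = -r^f.$$

For the risky assets, $dp/p = \mu\,dt + \sigma\,dz$, so

$$E_t\left(\frac{d\Lambda}{\Lambda}\frac{dp}{p}\right) = \sigma\,E_t(dz\,dz')\,\sigma_\Lambda' = \sigma\sigma_\Lambda'\,dt,$$

and the pricing condition requires $a\,dt = -\sigma\sigma_\Lambda'\,dt$. Choosing $\sigma_\Lambda' = -\sigma'\Sigma^{-1}a$ with $\Sigma = \sigma\sigma'$ satisfies it, since

$$-\sigma\sigma_\Lambda' = \sigma\sigma'\Sigma^{-1}a = a.$$

Transposing gives $\sigma_\Lambda = -a'\Sigma^{-1}\sigma$, which is the loading on $dz$ in the formula for $d\Lambda^*/\Lambda^*$.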


Chapter 6. Mean-variance frontier and beta representations

Much empirical work in asset pricing is couched in terms of expected return - beta representations and mean-variance frontiers. This chapter introduces both.

I discuss here the beta representation, most commonly applied to factor pricing models. Chapter 7 shows how an expected return/beta model is equivalent to a linear model for the discount factor, i.e. $m = b'f$ where $f$ are the right hand variables in the time-series regressions that define betas. Chapter 10 then discusses the derivation of popular factor models such as the CAPM, ICAPM and APT, i.e. under what assumptions the discount factor is a linear function of other variables $f$ such as the market return.

I summarize the classic Lagrangian approach to the mean-variance frontier. I then introduce a powerful and useful representation of the mean-variance frontier due to Hansen and Richard (1987). This representation uses the state-space geometry familiar from the existence theorems. It is also valid in infinite-dimensional payoff spaces, which we shall soon encounter when we add conditioning information, dynamic trading or options.

6.1 Expected return - Beta representations

The expected return-beta expression of a factor pricing model is

$$E(R^i) = \alpha + \beta_{i,a}\lambda_a + \beta_{i,b}\lambda_b + \ldots$$

The model is equivalent to a restriction that the intercept is the same for all assets in time-series regressions.

When the factors are excess returns, then $\lambda_a = E(f^a)$. If the test assets are also excess returns, then the intercept should be zero, $\alpha = 0$.

Much empirical work in finance is cast in terms of expected return - beta representations of linear factor pricing models, of the form

$$E(R^i) = \alpha + \beta_{i,a}\lambda_a + \beta_{i,b}\lambda_b + \ldots, \quad i = 1, 2, \ldots N. \qquad (6.1)$$


The $\beta$ terms are defined as the coefficients in a multiple regression of returns on factors,

$$R^i_t = a_i + \beta_{i,a} f^a_t + \beta_{i,b} f^b_t + \ldots + \varepsilon^i_t, \quad t = 1, 2, \ldots T. \qquad (6.2)$$

This is often called a time-series regression, since one runs a regression across time for each security $i$. The “factors” $f$ are proxies for marginal utility growth. I discuss the stories used to select factors at some length in chapter 10. For the moment keep in mind the canonical examples, $f$ = consumption growth, or $f$ = the return on the market portfolio (CAPM). Notice that we run returns $R^i_t$ on contemporaneous factors $f^j_t$. This regression is not about predicting returns from variables seen ahead of time. Its objective is to measure contemporaneous relations or risk exposure: whether returns are typically high in “good times” or “bad times” as measured by the factors.

The point of the beta model (6.1) is to explain the variation in average returns across assets. I write $i = 1, 2, \ldots N$ in (6.1) to emphasize this fact. The model says that assets with higher betas should get higher average returns. Thus the betas in (6.1) are the explanatory ($x$) variables, which vary asset by asset. The $\alpha$ and $\lambda$, common for all assets, are the intercept and slope in this cross-sectional relation. For example, equation (6.1) says that if we plot expected returns versus betas in a one-factor model, we should expect all $(E(R^i), \beta_{i,a})$ pairs to line up on a straight line with slope $\lambda_a$ and intercept $\alpha$.

$\beta_{i,a}$ is interpreted as the amount of exposure of asset $i$ to factor $a$ risks, and $\lambda_a$ is interpreted as the price of such risk-exposure. Read the beta pricing model to say: “for each unit of exposure $\beta$ to risk factor $a$, you must provide investors with an expected return premium $\lambda_a$.” Assets must give investors higher average returns (low prices) if they pay off well in times that are already good, and pay off poorly in times that are already bad, as measured by the factors.

One way to estimate the free parameters ($\alpha, \lambda$) and to test the model (6.1) is to run a cross sectional regression of average returns on betas,

$$E(R^i) = \alpha + \beta_{i,a}\lambda_a + \beta_{i,b}\lambda_b + \ldots + \alpha_i, \quad i = 1, 2, \ldots N. \qquad (6.3)$$

Again, the $\beta_i$ are the right hand variables, and the $\alpha$ and $\lambda$ are the intercept and slope coefficients that we estimate in this cross-sectional regression. The errors $\alpha_i$ are pricing errors. The model predicts $\alpha_i = 0$, and they should be statistically insignificant in a test. (I intentionally use the same symbol for the intercept, or mean of the pricing errors, and the individual pricing errors $\alpha_i$.) In the chapters on empirical technique, we will see test statistics based on the sum of squared pricing errors.
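The two regressions can be illustrated with a toy one-factor data set. The sketch below is an assumption-laden example rather than an empirical procedure: the factor series, the noise series, the intercept `alpha`, the risk premium `lam`, and the "true" betas are all made up, and the noise is chosen to be exactly orthogonal to the factor in sample so that the arithmetic is exact. Step 1 runs a time-series regression for each asset to get its beta; step 2 runs the cross-sectional regression of average returns on those betas.

```python
from fractions import Fraction as F

# Made-up sample: one factor f_t and a noise series that has mean zero
# and is orthogonal to the factor in this sample.
factor = [F(1), F(-1), F(2), F(0)]
noise = [F(0), F(2), F(1), F(-3)]
alpha, lam = F(1, 100), F(3, 100)        # hypothetical intercept and risk premium
true_betas = [F(1, 2), F(1), F(3, 2)]    # hypothetical exposures of three assets


def mean(values):
    return sum(values) / len(values)


def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Build returns R^i_t = a_i + beta_i f_t + eps^i_t, with a_i chosen so that
# the sample mean of R^i equals alpha + beta_i * lam.
returns = []
for k, beta in enumerate(true_betas):
    a_i = alpha + beta * (lam - mean(factor))
    returns.append([a_i + beta * f + F(k + 1, 100) * e
                    for f, e in zip(factor, noise)])

# Step 1: time-series regression of each asset's returns on the factor.
betas = [ols(r, factor)[1] for r in returns]

# Step 2: cross-sectional regression of average returns on the betas.
avg_returns = [mean(r) for r in returns]
alpha_hat, lambda_hat = ols(avg_returns, betas)
pricing_errors = [er - alpha_hat - b * lambda_hat
                  for er, b in zip(avg_returns, betas)]

print("betas:", betas)
print("intercept and slope:", alpha_hat, lambda_hat)
print("pricing errors:", pricing_errors)
```

In real data the betas are estimated with error and the pricing errors are not exactly zero; the test statistics mentioned above address whether they are jointly small.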

The fact that the betas are regression coefficients is crucially important. If the betas are also free parameters then there is no content to the equation. More importantly (and this is an easier mistake to make), the betas cannot be asset-specific or firm-specific characteristics, such as the size of the firm, book to market ratio, or (to take an extreme example) the letter of the alphabet of its ticker symbol. It is true that expected returns are associated with or correlated with many such characteristics. Stocks of small companies or of companies with high book/market ratios do have higher average returns. But this correlation must be explained by some beta. The proper betas should drive out any characteristics in cross-sectional regressions. If, for example, expected returns were truly related to size, one could buy many small companies to form a large holding company. It would be a “large” company, and hence pay low average returns to the shareholders, while earning a large average return on its holdings. The managers could enjoy the difference. What ruins this promising idea? The “large” holding company will still behave like a portfolio of small stocks. Thus, only if asset returns depend on how you behave, not who you are (on betas rather than characteristics) can a market equilibrium survive such simple repackaging schemes.

Some common special cases

If there is a risk free rate, its betas are all zero (see footnote 2), so the intercept is equal to the risk free rate,

$$R^f = \alpha.$$

We can impose this condition rather than estimate $\alpha$ in the cross-sectional regression (6.3). If there is no risk-free rate, then $\alpha$ must be estimated in the cross-sectional regression. Since it is the expected return of a portfolio with zero betas on all factors, $\alpha$ is called the (expected) zero-beta rate in this circumstance.

We often examine factor pricing models using excess returns directly. (There is an implicit, though not necessarily justified, division of labor between models of interest rates and models of equity risk premia.) Differencing (6.1) between any two returns $R^{ei} = R^i - R^j$ ($R^j$ does not have to be risk free), we obtain

$$E(R^{ei}) = \beta_{i,a}\lambda_a + \beta_{i,b}\lambda_b + \cdots, \quad i = 1, 2, \ldots, N. \tag{6.4}$$

Here, $\beta_{i,a}$ represents the regression coefficient of the excess return $R^{ei}$ on the factors.

It is often the case that the factors are also returns or excess returns. For example, the CAPM uses the return on the market portfolio as the single factor. In this case, the model should apply to the factors as well, and this fact allows us to directly measure the $\lambda$ coefficients. Each factor has beta of one on itself and zero on all the other factors, of course. Therefore, if the factors are excess returns, we have $E(f^a) = \lambda_a$, and so forth. We can then write the factor model as

$$E(R^{ei}) = \beta_{i,a}E(f^a) + \beta_{i,b}E(f^b) + \cdots, \quad i = 1, 2, \ldots, N.$$

The cross-sectional beta pricing model (6.1)-(6.4) and the time-series regression definition of the betas in (6.2) look very similar. It seems that one can take expectations of the time-series regression (6.2) and arrive at the beta model (6.1), in which case the latter would be vacuous since one can always run a regression of anything on anything. The difference is subtle but crucial: the time-series regressions (6.2) will in general have a different intercept $a_i$ for each return $i$, while the intercept $\alpha$ is the same for all assets in the beta pricing equation (6.1). The beta pricing equation is a restriction on expected returns, and thus imposes a restriction on intercepts in the time-series regression.

2 The betas are zero because the risk free rate is known ahead of time. When we consider the effects of conditioning information, i.e. that the interest rate could vary over time, we have to interpret the means and betas as conditional moments. Thus, if you are worried about time-varying risk free rates, betas, and so forth, either assume all variables are i.i.d. (and thus that the risk free rate is constant), or interpret all moments as conditional on time $t$ information.

In the special case that the factors are themselves excess returns, the restriction is particularly simple: the time-series regression intercepts should all be zero. In this case, we can avoid the cross-sectional regression entirely, since there are no free parameters left.
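The step behind this claim can be written out for a single factor. This is a sketch under the stated assumptions that both the test asset and the factor are excess returns; $a_i$ and $\varepsilon^i_t$ denote the time-series regression intercept and error, notation chosen here for illustration.

$$\begin{aligned}
R^{ei}_t &= a_i + \beta_{i,a} f^a_t + \varepsilon^i_t && \text{(time-series regression)} \\
E(R^{ei}) &= a_i + \beta_{i,a} E(f^a) && \text{(take expectations, } E(\varepsilon^i_t) = 0\text{)} \\
E(R^{ei}) &= \beta_{i,a} E(f^a) && \text{(beta model with } \lambda_a = E(f^a)\text{)} \\
\Rightarrow \quad a_i &= 0 \quad \text{for every } i.
\end{aligned}$$

Comparing the second and third lines, the only way both can hold is for every intercept to vanish, which is why the time-series intercepts can be read directly as pricing errors in this case.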

6.2 Mean-variance frontier: Intuition and Lagrangian characterization

The mean-variance frontier of a given set of assets is the boundary of the set of means and variances of the returns on all portfolios of the given assets. One can find or define this boundary by minimizing return variance for a given mean return. Many asset pricing propositions and test statistics have interpretations in terms of the mean-variance frontier.

Figure 13 displays a typical mean-variance frontier. As displayed in Figure 13, it is common to distinguish the mean-variance frontier of all risky assets, graphed as the hyperbolic region, and the mean-variance frontier of all assets, i.e. including a risk free rate if there is one, which is the larger wedge-shaped region. Some authors reserve the terminology “mean-variance frontier” for the upper portion, calling the whole thing the minimum variance frontier. The risky asset frontier is a hyperbola, which means it lies between two asymptotes, shown as dotted lines. The risk free rate is typically drawn below the intersection of the asymptotes and the vertical axis, or the point of minimum variance on the risky frontier. If it were above this point, investors with a mean-variance objective would try to short the risky assets, which cannot represent an equilibrium.

In general, portfolios of two assets or portfolios fill out a hyperbolic curve through the two assets. The curve is sharper the less correlated are the two assets, because the portfolio variance benefits from increasing diversification. Portfolios of a risky asset and risk free rate give rise to straight lines in mean-standard deviation space.

In Chapter 1, we derived a similar wedge-shaped region as the set of means and variances of all assets that are priced by a given discount factor. This chapter is about incomplete markets, so we think of a mean-variance frontier generated by a given set of assets, typically less than complete.

When does the mean-variance frontier exist? I.e., when is the set of portfolio means and variances less than the whole $\{E, \sigma\}$ space? We basically have to rule out a special case: two returns are perfectly correlated but yield different means. In that case one could short one, long the other, and achieve infinite expected returns with no risk. More formally, eliminate purely redundant securities from consideration, then


Theorem: So long as the variance-covariance matrix of returns is non-singular, there is a mean-variance frontier.

To prove this theorem, just follow the construction below. This theorem should sound very familiar: Two perfectly correlated returns with different means are a violation of the law of one price. Thus, the law of one price implies that there is a mean-variance frontier as well as a discount factor.

[Figure 13. Mean-variance frontier. Axes: $\sigma(R)$ (horizontal), $E(R)$ (vertical). Labels: mean-variance frontier, $R^f$, original assets, risky asset frontier, tangency portfolio of risky assets.]

6.2.1 Lagrangian approach to mean-variance frontier

The standard definition and the computation of the mean-variance frontier follows a brute force approach.

Problem: Start with a vector of asset returns $R$. Denote by $E$ the vector of mean returns, $E \equiv E(R)$, and denote by $\Sigma$ the variance-covariance matrix $\Sigma = E\left[(R - E)(R - E)'\right]$. A portfolio is defined by its weights $w$ on the initial securities. The portfolio return is $w'R$ where the weights sum to one, $w'1 = 1$. The problem “choose a portfolio to minimize variance for a given mean” is then

$$\min_{\{w\}} \; w'\Sigma w \quad \text{s.t.} \quad w'E = \mu; \; w'1 = 1. \tag{6.5}$$

Solution: Let

$$A = E'\Sigma^{-1}E; \quad B = E'\Sigma^{-1}1; \quad C = 1'\Sigma^{-1}1.$$


Then, for a given mean portfolio return $\mu$, the minimum variance portfolio has variance

$$\mathrm{var}(R^p) = \frac{C\mu^2 - 2B\mu + A}{AC - B^2} \tag{6.6}$$

and is formed by portfolio weights

$$w = \Sigma^{-1}\,\frac{E\,(C\mu - B) + 1\,(A - B\mu)}{AC - B^2}.$$

Equation (6.6) shows that the variance is a quadratic function of the mean. The square root of a parabola is a hyperbola, which is why we draw hyperbolic regions in mean-standard deviation space.

The minimum-variance portfolio is interesting in its own right. It appears as a special case in many theorems and it appears in several test statistics. We can find it by minimizing (6.6) over $\mu$, giving $\mu^{\text{min var}} = B/C$. The weights of the minimum variance portfolio are thus

$$w = \Sigma^{-1}1 / (1'\Sigma^{-1}1).$$

We can get to any point on the mean-variance frontier by starting with two returns on the frontier and forming portfolios. The frontier is spanned by any two frontier returns. To see this fact, notice that $w$ is a linear function of $\mu$. Thus, if you take the portfolios corresponding to any two distinct mean returns $\mu_1$ and $\mu_2$, the weights on a third portfolio with mean $\mu_3 = \lambda\mu_1 + (1 - \lambda)\mu_2$ are given by $w_3 = \lambda w_1 + (1 - \lambda)w_2$.
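The solution and the two-fund property can be checked numerically. The sketch below is illustrative only: the helper name `frontier_portfolio` and the three-asset means and covariances are made-up assumptions, not data from the text. It computes $A$, $B$, $C$, the frontier weights, and the variance from (6.6).

```python
import numpy as np

def frontier_portfolio(E, Sigma, mu):
    """Minimum-variance weights and variance for target mean mu, following (6.5)-(6.6)."""
    ones = np.ones(len(E))
    Si_E = np.linalg.solve(Sigma, E)          # Sigma^{-1} E
    Si_1 = np.linalg.solve(Sigma, ones)       # Sigma^{-1} 1
    A, B, C = E @ Si_E, E @ Si_1, ones @ Si_1
    D = A * C - B**2
    w = (Si_E * (C * mu - B) + Si_1 * (A - B * mu)) / D
    var = (C * mu**2 - 2 * B * mu + A) / D
    return w, var

# Hypothetical three-asset example (numbers are made up for illustration).
E = np.array([1.05, 1.08, 1.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

w1, v1 = frontier_portfolio(E, Sigma, 1.06)
w2, v2 = frontier_portfolio(E, Sigma, 1.10)
# A frontier portfolio whose target mean is halfway between the first two.
w3, v3 = frontier_portfolio(E, Sigma, 0.5 * 1.06 + 0.5 * 1.10)
```

The weights `w3` should coincide with `0.5 * w1 + 0.5 * w2`, which is the spanning statement above, and `w1 @ Sigma @ w1` should reproduce the variance given by (6.6).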

Derivation: To derive the solution, introduce Lagrange multipliers $2\lambda$ and $2\delta$ on the constraints. The first order conditions to (6.5) are then

$$\Sigma w - \lambda E - \delta 1 = 0$$

$$w = \Sigma^{-1}(\lambda E + \delta 1). \tag{6.7}$$

We find the Lagrange multipliers from the constraints,

$$E'w = E'\Sigma^{-1}(\lambda E + \delta 1) = \mu$$

$$1'w = 1'\Sigma^{-1}(\lambda E + \delta 1) = 1$$

or

$$\begin{bmatrix} E'\Sigma^{-1}E & E'\Sigma^{-1}1 \\ 1'\Sigma^{-1}E & 1'\Sigma^{-1}1 \end{bmatrix} \begin{bmatrix} \lambda \\ \delta \end{bmatrix} = \begin{bmatrix} \mu \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} \lambda \\ \delta \end{bmatrix} = \begin{bmatrix} \mu \\ 1 \end{bmatrix}.$$

Hence,

$$\lambda = \frac{C\mu - B}{AC - B^2}$$

$$\delta = \frac{A - B\mu}{AC - B^2}.$$

Plugging in to (6.7), we get the portfolio weights and variance.

6.3 An orthogonal characterization of the mean-variance frontier

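Every return can be expressed as $R^i = R^* + w^i R^{e*} + n^i$.

The mean-variance frontier is $R^{mv} = R^* + wR^{e*}$.

$R^{e*}$ is defined as $R^{e*} = \mathrm{proj}(1 \,|\, \underline{R}^e)$. It represents mean excess returns, $E(R^e) = E(R^{e*}R^e) \;\; \forall R^e \in \underline{R}^e$.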

The Lagrangian approach to the mean-variance frontier is straightforward but cumbersome. Our further manipulations will be easier if we follow an alternative approach due to Hansen and Richard (1987). Technically, Hansen and Richard’s approach is also valid when we can’t generate the payoff space by portfolios of a finite set of basis payoffs $c'x$. This happens, for example, when we think about conditioning information in Chapter 9. Also, it is the natural geometric way to think about the mean-variance frontier given that we have started to think of payoffs, discount factors and other random variables as vectors in the space of payoffs. Rather than write portfolios as combinations of basis assets, and pose and solve a minimization problem, we first describe any return by a three-way orthogonal decomposition. The mean-variance frontier then pops out easily without any algebra.

6.3.1 Definitions of $R^*$, $R^{e*}$

I start by defining two special returns. $R^*$ is the return corresponding to the payoff $x^*$ that can act as the discount factor. The price of $x^*$ is, like any other price, $p(x^*) = E(x^*x^*)$. Thus,

The definition of $R^*$ is

$$R^* \equiv \frac{x^*}{p(x^*)} = \frac{x^*}{E(x^{*2})}. \tag{6.8}$$

The definition of $R^{e*}$ is

$$R^{e*} \equiv \mathrm{proj}(1 \,|\, \underline{R}^e) \tag{6.9}$$

$$\underline{R}^e \equiv \text{space of excess returns} = \{x \in \underline{X} \;\text{s.t.}\; p(x) = 0\}.$$

Why $R^{e*}$? We are heading towards a mean-variance frontier, so it is natural to seek a special return that changes means. $R^{e*}$ is an excess return that represents means on $\underline{R}^e$ with an inner product in the same way that $x^*$ is a payoff in $\underline{X}$ that represents prices with an inner product. As

$$p(x) = E(mx) = E[\mathrm{proj}(m \,|\, \underline{X})\,x] = E(x^*x),$$

so

$$E(R^e) = E(1 \times R^e) = E[\mathrm{proj}(1 \,|\, \underline{R}^e) \times R^e] = E(R^{e*}R^e).$$

If $R^*$ and $R^{e*}$ are still a bit mysterious at this point, they will make more sense as we use them, and discover their many interesting properties.

Now we can state a beautiful orthogonal decomposition.

Theorem: Every return $R^i$ can be expressed as

$$R^i = R^* + w^i R^{e*} + n^i$$

where $w^i$ is a number, and $n^i$ is an excess return with the property

$$E(n^i) = 0.$$

The three components are orthogonal,

$$E(R^*R^{e*}) = E(R^*n^i) = E(R^{e*}n^i) = 0.$$

This theorem quickly implies the characterization of the mean-variance frontier which we are after:

Theorem: $R^{mv}$ is on the mean-variance frontier if and only if

$$R^{mv} = R^* + wR^{e*} \tag{6.10}$$

for some real number $w$.

As you vary the number $w$, you sweep out the mean-variance frontier. $E(R^{e*}) \neq 0$, so adding more $w$ changes the mean and variance of $R^{mv}$. You can interpret (6.10) as a “two-fund” theorem for the mean-variance frontier. It expresses every frontier return as a portfolio of $R^*$ and $R^{e*}$, with varying weights on the latter.

As usual, first I’ll argue why the theorems are sensible, then I’ll offer a simple algebraic proof. Hansen and Richard (1987) give a much more careful algebraic proof.

6.3.2 Graphical construction

Figure 14 illustrates the decomposition. Start at the origin (0). Recall that the $x^*$ vector is perpendicular to planes of constant price; thus the $R^*$ vector lies perpendicular to the plane of returns as shown. Go to $R^*$.

$R^{e*}$ is the excess return that is closest to the vector $1$; it lies at right angles to planes (in $\underline{R}^e$) of constant mean return, shown in the $E = 1$, $E = 2$ lines, just as the return $R^*$ lies at right angles to planes of constant price. Since $R^{e*}$ is an excess return, it is orthogonal to $R^*$. Proceed an amount $w^i$ in the direction of $R^{e*}$, getting as close to $R^i$ as possible.

Now move, again in an orthogonal direction, by an amount $n^i$ to get to the return $R^i$. We have thus expressed $R^i = R^* + w^iR^{e*} + n^i$ in a way that all three components are orthogonal.

Returns with $n = 0$, $R^* + wR^{e*}$, are the mean-variance frontier. Here’s why. Since $E(R^2) = \sigma^2(R) + E(R)^2$, we can define the mean-variance frontier by minimizing second moment for a given mean. The length of each vector in Figure 14 is its second moment, so we want the shortest vector that is on the return plane for a given mean. The shortest vectors in the return plane with given mean are on the $R^* + wR^{e*}$ line.

The graph also shows how $R^{e*}$ represents means in the space of excess returns. Expectation is the inner product with 1. Planes of constant expected value in Figure 14 are perpendicular to the $1$ vector, just as planes of constant price are perpendicular to the $x^*$ or $R^*$ vectors. I don’t show the full extent of the constant expected payoff planes for clarity; I do show lines of constant expected excess return in $\underline{R}^e$, which are the intersection of constant expected payoff planes with the $\underline{R}^e$ plane. Therefore, just as we found an $x^*$ in $\underline{X}$ to represent prices in $\underline{X}$ by projecting $m$ onto $\underline{X}$, we find $R^{e*}$ in $\underline{R}^e$ by projecting $1$ onto $\underline{R}^e$. Yes, a regression with one on the left hand side. Planes perpendicular to $R^{e*}$ in $\underline{R}^e$ are payoffs with constant mean, just as planes perpendicular to $x^*$ in $\underline{X}$ are payoffs with the same price.

6.3.3 Algebraic argument

Now, an algebraic proof of the decomposition and characterization of the mean-variance frontier. The algebra just represents statements about what is at right angles to what with second moments.

[Figure 14. Orthogonal decomposition and mean-variance frontier. Labels: origin 0; vectors $R^*$, $R^{e*}$, $1$, $n^i$; the return $R^i = R^* + w^iR^{e*} + n^i$; the frontier point $R^* + w^iR^{e*}$; lines $E = 0$, $E = 1$; $\underline{R}^e$ = space of excess returns ($p = 0$); $\underline{R}$ = space of returns ($p = 1$).]

Proof: Straight from their definitions, (6.8) and (6.9), we know that $R^{e*}$ is an excess return (price zero), and hence that $R^*$ and $R^{e*}$ are orthogonal,

$$E(R^*R^{e*}) = \frac{E(x^*R^{e*})}{E(x^{*2})} = 0.$$

We define $n^i$ so that the decomposition adds up to $R^i$ as claimed, and we define $w^i$ to make sure that $n^i$ is orthogonal to the other two components. Then we prove that $E(n^i) = 0$. Pick any $w^i$ and then define

$$n^i \equiv R^i - R^* - w^iR^{e*}.$$

$n^i$ is an excess return so already orthogonal to $R^*$,

$$E(R^*n^i) = 0.$$

To show $E(n^i) = 0$ and $n^i$ orthogonal to $R^{e*}$, we exploit the fact that since $n^i$ is an excess return,

$$E(n^i) = E(R^{e*}n^i).$$

Therefore, $R^{e*}$ is orthogonal to $n^i$ if and only if we pick $w^i$ so that $E(n^i) = 0$. We don’t have to explicitly calculate $w^i$ for the proof (see footnote 3).

Once we have constructed the decomposition, the frontier drops out. Since $E(n^i) = 0$ and the three components are orthogonal,

$$E(R^i) = E(R^*) + w^iE(R^{e*})$$

$$\sigma^2(R^i) = \sigma^2(R^* + w^iR^{e*}) + \sigma^2(n^i).$$

Thus, for each desired value of the mean return, there is a unique $w^i$. Returns with $n^i = 0$ minimize variance for each mean. $\square$
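The decomposition can be verified in a small finite-state example. The sketch below is an illustration under assumptions that are not in the text: five equally likely states, three made-up basis payoffs and prices, and a helper `E` that takes expectations under the state probabilities. It builds $x^*$, $R^*$ and $R^{e*}$ and then decomposes one of the basis returns.

```python
import numpy as np

# Hypothetical economy: 5 equally likely states, 3 basis payoffs (incomplete markets).
prob = np.full(5, 0.2)
x = np.array([[1.0, 1.1, 0.9, 1.2, 0.8],
              [0.5, 1.5, 1.0, 2.0, 0.7],
              [1.3, 0.6, 1.1, 0.9, 1.4]])      # payoffs, one row per asset
p = np.array([0.95, 1.00, 1.02])               # made-up prices

def E(a, b=None):
    """Expectation E(a), or second moment E(a b') when b is given."""
    return a @ prob if b is None else (a * prob) @ b.T

R = x / p[:, None]                             # returns on the basis assets
x_star = p @ np.linalg.solve(E(x, x), x)       # x* = p' E(xx')^{-1} x
R_star = x_star / E(x_star, x_star)            # definition (6.8)

Re = R[1:] - R[0]                              # basis for the space of excess returns
Re_star = E(Re) @ np.linalg.solve(E(Re, Re), Re)   # proj(1 | Re), definition (6.9)

# Decompose the third basis return: choose w_i so that E(n_i) = 0.
i = 2
w_i = (E(R[i]) - E(R_star)) / E(Re_star)
n_i = R[i] - R_star - w_i * Re_star
```

Up to rounding error, `E(n_i)`, `E(R_star, n_i)`, `E(Re_star, n_i)` and `E(R_star, Re_star)` should all be zero, which is the orthogonality claimed in the theorem.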

6.3.4 Decomposition in mean-variance space

Figure 15 illustrates the decomposition in mean-variance space rather than in state-space.

First, let’s locate $R^*$. $R^*$ is the minimum second moment return. One can see this fact from the geometry of Figure 14: $R^*$ is the return closest to the origin, and thus the return with the smallest “length,” which is second moment. As with OLS regression, minimizing the length of $R^*$ and creating an $R^*$ orthogonal to all excess returns is the same thing. One can also verify this property algebraically. Since any return can be expressed as $R = R^* + wR^{e*} + n$, $E(R^2) = E(R^{*2}) + w^2E(R^{e*2}) + E(n^2)$. $n = 0$ and $w = 0$ thus give the minimum second moment return.

In mean-standard deviation space, lines of constant second moment are circles. Thus, the minimum second-moment return $R^*$ is on the smallest circle that intersects the set of all assets, which lie in the mean-variance frontier in the right hand panel of Figure 17. Notice that $R^*$ is on the lower, or “inefficient” segment of the mean-variance frontier. It is initially surprising that this is the location of the most interesting return on the frontier! $R^*$ is not the “market portfolio” or “wealth portfolio,” which typically lie on the upper portion of the frontier.

Adding more $R^{e*}$ moves one along the frontier. Adding $n$ does not change mean but does change variance, so it is an idiosyncratic return that just moves an asset off the frontier as graphed. $\alpha$ is the “zero-beta rate” corresponding to $R^*$. It is the expected return of any return that is uncorrelated with $R^*$. I demonstrate these properties in section 7.5.

3 Its value $w^i = \dfrac{E(R^i) - E(R^*)}{E(R^{e*})}$ is not particularly enlightening.

[Figure 15. Orthogonal decomposition of a return $R^i$ in mean-standard deviation space. Axes: $\sigma(R)$ (horizontal), $E(R)$ (vertical). Labels: $R^*$, $R^i$, $R^* + w^iR^{e*}$, $n^i$.]

6.4 Spanning the mean-variance frontier

The characterization of the mean-variance frontier in terms of $R^*$ and $R^{e*}$ is most natural in our setup. However, you can equivalently span the mean-variance frontier with any two portfolios that are on the frontier, that is, any two distinct linear combinations of $R^*$ and $R^{e*}$. In particular, take any return

$$R^\alpha = R^* + \gamma R^{e*}, \quad \gamma \neq 0. \tag{6.11}$$

Using this return in place of $R^{e*}$,

$$R^{e*} = \frac{R^\alpha - R^*}{\gamma},$$

you can express the mean-variance frontier in terms of $R^*$ and $R^\alpha$:

$$R^* + wR^{e*} = R^* + y\,(R^\alpha - R^*) = (1 - y)R^* + yR^\alpha \tag{6.12}$$

where I have defined a new weight $y = w/\gamma$.

The most common alternative approach is to use a risk free rate or a risky rate that somehow behaves like the risk free rate in place of $R^{e*}$ to span the frontier. When there is a risk free rate, it is on the frontier with representation

$$R^f = R^* + R^fR^{e*}.$$

I derive this expression in equation (6.18) below. Therefore, we can use (6.12) with $R^\alpha = R^f$. When there is no risk free rate, several risky returns that retain some properties of the risk free rate are often used. In section 5.3 below I present a “zero beta” return, which is uncorrelated with $R^*$, a “constant-mimicking portfolio” return, which is the return on the traded payoff closest to unity, $\hat{R} = \mathrm{proj}(1 \,|\, \underline{X}) / p[\mathrm{proj}(1 \,|\, \underline{X})]$, and the minimum variance return. Each of these returns is on the mean-variance frontier, with the form (6.11), though with different values of $\gamma$. Therefore, we can span the mean-variance frontier with $R^*$ and any of these risk-free rate proxies.

6.5 A compilation of properties of $R^*$, $R^{e*}$ and $x^*$

The special returns $R^*$, $R^{e*}$ that generate the mean-variance frontier have lots of interesting and useful properties. Some I derived above, some I will derive and discuss below in more detail, and some will be useful tricks later on. Most properties and derivations are extremely obscure if you don’t look at the picture!


1)

$$E(R^{*2}) = \frac{1}{E(x^{*2})}. \tag{6.13}$$

To derive this fact, multiply both sides of (6.8) by $R^*$, take expectations, and remember $R^*$ is a return so $1 = E(x^*R^*)$.

2) We can reverse the definition and recover $x^*$ from $R^*$ via

$$x^* = \frac{R^*}{E(R^{*2})}. \tag{6.14}$$

To derive this formula, start with the definition $R^* = x^*/E(x^{*2})$ and substitute from (6.13) for $E(x^{*2})$.

3) $R^*$ can be used to represent prices just like $x^*$. This is not surprising, since they both point in the same direction, orthogonal to planes of constant price. Most obviously, from (6.14)

$$p(x) = E(x^*x) = \frac{E(R^*x)}{E(R^{*2})} \quad \forall x \in \underline{X}.$$

For returns, we can nicely express this result as

$$E(R^{*2}) = E(R^*R) \quad \forall R \in \underline{R}. \tag{6.15}$$

This fact can also serve as an alternative defining property of $R^*$.

4) $R^{e*}$ represents means on $\underline{R}^e$ via an inner product in the same way that $x^*$ represents prices on $\underline{X}$ via an inner product. $R^{e*}$ is orthogonal to planes of constant mean in $\underline{R}^e$ as $x^*$ is orthogonal to planes of constant price. Algebraically, in analogy to $p(x) = E(x^*x)$ we have

$$E(R^e) = E(R^{e*}R^e) \quad \forall R^e \in \underline{R}^e. \tag{6.16}$$

This fact can serve as an alternative defining property of $R^{e*}$.

5) $R^{e*}$ and $R^*$ are orthogonal,

$$E(R^*R^{e*}) = 0.$$

More generally, $R^*$ is orthogonal to any excess return.

6) The mean-variance frontier is given by

$$R^{mv} = R^* + wR^{e*}.$$

To prove this, $E(R^2) = E\left[(R^* + wR^{e*} + n)^2\right] = E(R^{*2}) + w^2E(R^{e*2}) + E(n^2)$, and $E(n) = 0$, so set $n$ to zero. The conditional mean-variance frontier allows $w$ in the conditioning information set. The unconditional mean-variance frontier requires $w$ to equal a constant.


7) $R^*$ is the minimum second moment return. Graphically, $R^*$ is the return closest to the origin. To see this, use the decomposition in #6, and set $w^2$ and $n$ to zero to minimize second moment.

8) $R^{e*}$ has the same first and second moment,

$$E(R^{e*}) = E(R^{e*2}).$$

Just apply fact (6.16) to $R^{e*}$ itself. Therefore

$$\mathrm{var}(R^{e*}) = E(R^{e*2}) - E(R^{e*})^2 = E(R^{e*})\left[1 - E(R^{e*})\right].$$

9) If there is a riskfree rate, then $R^{e*}$ can also be defined as the residual in the projection of $1$ on $R^*$:

$$R^{e*} = 1 - \mathrm{proj}(1 \,|\, R^*) = 1 - \frac{E(R^*)}{E(R^{*2})}R^* = 1 - \frac{1}{R^f}R^*. \tag{6.17}$$

You’d never have thought of this without looking at Figure 14! Since $R^*$ and $\underline{R}^e$ are orthogonal and together span $\underline{X}$, $1 = \mathrm{proj}(1 \,|\, \underline{R}^e) + \mathrm{proj}(1 \,|\, R^*)$. You can also verify this statement analytically. Check that $R^{e*}$ so defined is an excess return in $\underline{X}$ (its price is zero), and $E(R^{e*}R^e) = E(R^e)$; $E(R^*R^{e*}) = 0$.

As a result, $R^f$ has the decomposition

$$R^f = R^* + R^fR^{e*}. \tag{6.18}$$

Since $R^f > 1$ typically, this means that $R^* + R^{e*}$ is located on the lower portion of the mean-variance frontier in mean-variance space, just a bit to the right of $R^f$. If the risk free rate were one, then the unit vector would lie in the return space, and $R^f = R^* + R^{e*}$. Typically, the space of returns is a little bit above the unit vector. As you stretch the unit vector by the amount $R^f$ to arrive at the return $R^f$, so you stretch the amount $R^{e*}$ that you add to $R^*$ to get to $R^f$.

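If there is no riskfree rate, then we can use

$$\mathrm{proj}(1 \,|\, \underline{X}) = \mathrm{proj}(\mathrm{proj}(1 \,|\, \underline{X}) \,|\, \underline{R}^e) + \mathrm{proj}(\mathrm{proj}(1 \,|\, \underline{X}) \,|\, R^*) = \mathrm{proj}(1 \,|\, \underline{R}^e) + \mathrm{proj}(1 \,|\, R^*)$$

to deduce an analogue to equation (6.17),

$$R^{e*} = \mathrm{proj}(1 \,|\, \underline{X}) - \mathrm{proj}(1 \,|\, R^*) = \mathrm{proj}(1 \,|\, \underline{X}) - \frac{E(R^*)}{E(R^{*2})}R^*. \tag{6.19}$$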

10) If a riskfree rate is traded, we can construct $R^f$ from $R^*$ via

$$R^f = \frac{1}{E(x^*)} = \frac{E(R^{*2})}{E(R^*)}. \tag{6.20}$$

If not, this gives a “zero beta rate” interpretation of the right hand expression.

11) Since we have a formula $x^* = p'E(xx')^{-1}x$ for constructing $x^*$ from basis assets (see section 5.1), we can construct $R^*$ in this case from

$$R^* = \frac{x^*}{p(x^*)} = \frac{p'E(xx')^{-1}x}{p'E(xx')^{-1}p}.$$

($p(x^*) = E(x^*x^*)$ leading to the denominator.)

12) We can construct $R^{e*}$ from a set of basis assets as well. Following the definition to project one on the space of excess returns,

$$R^{e*} = E(R^e)'E(R^eR^{e\prime})^{-1}R^e$$

where $R^e$ is the basis set of excess returns. (You can always use $R^e = R - R^*$ if you want.) This construction obviously mirrors the way we constructed $x^*$ in section 5.1, and you can see the similarity in the result, with $E$ in place of $p$, since $R^{e*}$ represents means rather than prices.

If there is a riskfree rate, we can also use (6.17),

$$R^{e*} = 1 - \frac{1}{R^f}R^* = 1 - \frac{1}{R^f}\,\frac{p'E(xx')^{-1}x}{p'E(xx')^{-1}p}. \tag{6.21}$$

If there is no riskfree rate, we can use (6.19) to construct $R^{e*}$. The central ingredient is

$$\mathrm{proj}(1 \,|\, \underline{X}) = E(x)'E(xx')^{-1}x.$$
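Several of the properties in this section can be checked in a finite-state example with a traded risk-free payoff. As before, this is a sketch under assumptions: the state probabilities, payoffs, prices, the value `Rf = 1.03`, and the helper `mean` are all made up for illustration. It uses the basis-asset formulas in properties 11 and 12.

```python
import numpy as np

# Hypothetical economy with a traded risk-free payoff (first row); numbers are made up.
prob = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
Rf = 1.03
x = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
              [0.6, 0.9, 1.1, 1.3, 1.6],
              [1.4, 1.2, 1.0, 1.1, 0.7]])
p = np.array([1.0 / Rf, 1.00, 0.98])

def mean(z):
    """Expectation over states (works for one payoff or a stack of payoffs)."""
    return z @ prob

second = (x * prob) @ x.T                      # E(xx')
x_star = p @ np.linalg.solve(second, x)        # property 11: x* = p' E(xx')^{-1} x
R_star = x_star / (p @ np.linalg.solve(second, p))

R = x / p[:, None]
Re = R[1:] - R[0]                              # excess returns over the risk-free rate
Re_star = mean(Re) @ np.linalg.solve((Re * prob) @ Re.T, Re)   # property 12

checks = {
    "(6.13)": np.isclose(mean(R_star**2), 1.0 / mean(x_star**2)),
    "(6.15)": np.allclose(mean(R * R_star), mean(R_star**2)),
    "property 8": np.isclose(mean(Re_star), mean(Re_star**2)),
    "(6.17)": np.allclose(Re_star, 1.0 - R_star / Rf),
    "(6.20)": np.isclose(mean(R_star**2) / mean(R_star), Rf),
}
```

Every entry of `checks` should be true; the (6.17) and (6.20) checks rely on the risk-free payoff being among the basis assets.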

6.6 Problems

1. Prove that $R^{e*}$ lies at right angles to planes (in $\underline{R}^e$) of constant mean return, as shown in Figure 14.

2. Should we typically draw $x^*$ above, below or on the plane of returns? Must $x^*$ always lie in this position? Can you construct $R^{e*}$ from knowledge of $m$, $x^*$, or $R^*$?

3. What happens to $R^*$, $R^{e*}$ and the mean-variance frontier if investors are risk neutral? a) If a riskfree rate is traded. b) If no riskfree rate is traded. (Hint: make a drawing or think about the case that payoffs are generated by an $N$ dimensional vector of basis assets $x$.)

4. $x^* = \mathrm{proj}(m \,|\, \underline{X})$. Is $R^* = \mathrm{proj}(m \,|\, \underline{R})$?

Chapter 7. Relation between discount factors, betas, and mean-variance frontiers

In this chapter, I draw the connection between discount factors, mean-variance frontiers, and beta representations. In the first chapter, I showed how mean-variance and a beta representation follow from $p = E(mx)$ and (in the mean-variance case) complete markets. Here, I discuss the connections in both directions and in incomplete markets, drawing on the representations studied in the last chapter.

The central theme of the chapter is that all three representations are equivalent. Figure 16 summarizes the ways one can go from one representation to another. A discount factor, a reference variable for betas (the thing you put on the right hand side in the regressions that give betas) and a return on the mean-variance frontier all carry the same information, and given any one of them, you can find the others. More specifically,

1. $p = E(mx) \Rightarrow \beta$: Given $m$ such that $p = E(mx)$, $m$, $x^*$, $R^*$, or $R^* + wR^{e*}$ all can serve as reference variables for betas.

2. $p = E(mx) \Rightarrow$ mean-variance frontier. You can construct $R^*$ from $x^* = \mathrm{proj}(m \,|\, \underline{X})$, $R^* = x^*/E(x^{*2})$, and then $R^*$, $R^* + wR^{e*}$ are on the mean-variance frontier.

3. Mean-variance frontier $\Rightarrow p = E(mx)$. If $R^{mv}$ is on the mean-variance frontier, then $m = a + bR^{mv}$ linear in that return is a discount factor; it satisfies $p = E(mx)$.

4. $\beta \Rightarrow p = E(mx)$. If we have an expected return/beta model with factors $f$, then $m = b'f$ linear in the factors satisfies $p = E(mx)$ (and vice-versa).

5. If a return is on the mean-variance frontier, then there is an expected return/beta model with that return as reference variable.

The following subsections discuss the mechanics of going from one representation to the other in detail. The last section of the chapter collects some special cases when there is no riskfree rate. The next chapter discusses some of the implications of these equivalence theorems, and why they are important.

7.1 From discount factors to beta representations

$m$, $x^*$, and $R^*$ can all be the single factor in a single beta representation.

[Figure 16. Relation between three views of asset pricing: $p = E(mx)$; $E(R^i) = \alpha + \beta_i'\lambda$; $R^{mv}$ on the mean-variance frontier (m.v.f.). Arrow labels: LOOP $\Rightarrow$ $m$ exists; $f = m, x^*, R^*$; $m = b'f$; $R^*$ is on m.v.f.; $m = a + bR^{mv}$; $f = R^{mv}$; $\mathrm{proj}(f \,|\, \underline{R})$ on m.v.f.; $E(RR')$ nonsingular $\Rightarrow$ $R^{mv}$ exists.]

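7.1.1 Beta representation using $m$

$p = E(mx)$ implies $E(R^i) = \alpha + \beta_{i,m}\lambda_m$. Start with

$$1 = E(mR^i) = E(m)E(R^i) + \mathrm{cov}(m, R^i).$$

Thus,

$$E(R^i) = \frac{1}{E(m)} - \frac{\mathrm{cov}(m, R^i)}{E(m)}.$$

Multiply and divide by $\mathrm{var}(m)$, define $\alpha \equiv 1/E(m)$ to get

$$E(R^i) = \alpha + \left(\frac{\mathrm{cov}(m, R^i)}{\mathrm{var}(m)}\right)\left(-\frac{\mathrm{var}(m)}{E(m)}\right) = \alpha + \beta_{i,m}\lambda_m.$$

As advertised, we have a single beta representation.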

For example, we can equivalently state the consumption-based model as: mean asset returns should be linear in the regression betas of asset returns on $(c_{t+1}/c_t)^{-\gamma}$. Furthermore, the slope of this cross-sectional relationship $\lambda_m$ is not a free parameter, though it is usually treated as such in empirical evaluation of factor pricing models. From the derivation above, $\lambda_m = -\mathrm{var}(m)/E(m)$, minus the ratio of variance to mean of the discount factor.

The factor risk premium $\lambda_m$ for marginal utility growth is negative. Positive expected returns are associated with positive correlation with consumption growth, and hence negative correlation with marginal utility growth and $m$. Thus, we expect $\lambda_m < 0$.
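The single-beta representation with $m$ as the factor can be confirmed in a small example. The sketch below assumes four equally likely states, a made-up positive discount factor `m`, and made-up payoffs; prices are generated from $p = E(mx)$ so that $1 = E(mR^i)$ holds by construction. The helper names `mean` and `cov` are illustrative.

```python
import numpy as np

prob = np.array([0.25, 0.25, 0.25, 0.25])
m = np.array([1.20, 1.00, 0.90, 0.75])         # hypothetical discount factor, high in bad states
x = np.array([[0.7, 1.0, 1.1, 1.4],
              [1.0, 1.0, 1.0, 1.0],
              [1.2, 1.0, 0.9, 0.8]])           # made-up payoffs; the second is risk free

def mean(z):
    """Expectation over states."""
    return z @ prob

def cov(a, b):
    """Covariance of two state-contingent variables."""
    return mean(a * b) - mean(a) * mean(b)

p = mean(m * x)                                # prices from p = E(mx)
R = x / p[:, None]                             # returns, so 1 = E(m R^i) for every asset

alpha = 1.0 / mean(m)                          # intercept, here equal to the risk-free rate
lam_m = -cov(m, m) / mean(m)                   # factor risk premium, negative
beta = np.array([cov(m, Ri) / cov(m, m) for Ri in R])
expected = alpha + beta * lam_m                # right hand side of the beta model
```

`expected` should match `mean(R)` asset by asset. The first asset pays off well when `m` is low, so its beta on `m` is negative and its expected return exceeds `alpha`.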

7.1.2 $\beta$ representation using $x^*$ and $R^*$

It is often useful to express a pricing model in a way that the factor is a payoff rather than a real factor such as consumption growth. In applications, we can then avoid measurement difficulties of real data. We have already seen the idea of “factor mimicking portfolios” formed by projection: project $m$ on to $\underline{X}$, and the resulting payoff $x^*$ also serves as a discount factor. Unsurprisingly, $x^*$ can also serve as the factor in an expected return-beta representation. It’s even more useful if the reference payoff is a return. Unsurprisingly, the return $R^* = x^*/E(x^{*2})$ can also serve as the factor in a beta pricing model. When the factor is also a return, the model is particularly simple, since the factor risk premium is also the expected excess return.

Theorem. 4 @ H+pUl, implies an expected return - beta model with{� @surm+pm[, or U� � {�@H+{�5, as factors, e.g.H+Ul, @ � . �l>{��{� andH+Ul, @ �. �l>U� ^H+U�,� �`=Proof: Recall thats @ H+p{, impliess @ H ^surm+p m [, {`, or s @ H+{�{,.


Then

$$1 = E(mR^i) = E(x^*R^i) = E(x^*)E(R^i) + \mathrm{cov}(x^*, R^i).$$

Solving for the expected return,

$$E(R^i) = \frac{1}{E(x^*)} - \frac{\mathrm{cov}(x^*, R^i)}{E(x^*)} = \frac{1}{E(x^*)} - \frac{\mathrm{cov}(x^*, R^i)}{\mathrm{var}(x^*)}\,\frac{\mathrm{var}(x^*)}{E(x^*)} \tag{7.1}$$

which we can write as the desired single-beta model,

$$E(R^i) = \alpha + \beta_{i,x^*}\lambda_{x^*}.$$

Notice that the zero-beta rate $1/E(x^*)$ appears when there is no riskfree rate.

To derive a single beta representation with $R^*$, recall the definition,

$$R^* = \frac{x^*}{E(x^{*2})}.$$

Substituting $R^*$ for $x^*$, equation (7.1) implies that we can in fact construct a return $R^*$ from $m$ that acts as the single factor in a beta model,

$$E(R^i) = \frac{E(R^{*2})}{E(R^*)} - \frac{\mathrm{cov}(R^*, R^i)}{E(R^*)} = \frac{E(R^{*2})}{E(R^*)} + \left(\frac{\mathrm{cov}(R^*, R^i)}{\mathrm{var}(R^*)}\right)\left(-\frac{\mathrm{var}(R^*)}{E(R^*)}\right)$$

or, defining Greek letters in the obvious way,

$$E(R^i) = \alpha + \beta_{R^i,R^*}\lambda_{R^*}. \tag{7.2}$$

Since the factor $R^*$ is also a return, its expected excess return over the zero beta rate gives the factor risk premium $\lambda_{R^*}$. Applying equation (7.2) to $R^*$ itself,

$$E(R^*) = \alpha - \frac{\mathrm{var}(R^*)}{E(R^*)}. \tag{7.3}$$

So we can write the beta model in an even more traditional form

$$E(R^i) = \alpha + \beta_{R^i,R^*}[E(R^*) - \alpha]. \tag{7.4}$$

Recall that $R^*$ is the minimum second moment return, on the lower portion of the mean-variance frontier. This is why $R^*$ has an unusual negative expected excess return or factor risk premium, $\lambda_{R^*} = -\mathrm{var}(R^*)/E(R^*) < 0$. $\alpha$ is the zero-beta rate on $R^*$ shown in Figure 15.
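
Equations (7.2) to (7.4) can be verified in a small example. This is only a sketch under stated assumptions: a hypothetical three-state complete market, so that $x^* = m$, with made-up probabilities and discount factor values. It builds $R^* = x^*/E(x^{*2})$, computes $\alpha = E(R^{*2})/E(R^*)$, and checks the beta model against a directly computed expected return.

```python
# Hypothetical complete-market economy with three states (illustrative numbers).
probs = [0.25, 0.5, 0.25]
x_star = [1.3, 0.9, 0.6]    # with complete markets, x* equals the discount factor m

def E(x):
    return sum(p * v for p, v in zip(probs, x))

def cov(x, y):
    return E([a * b for a, b in zip(x, y)]) - E(x) * E(y)

# R* = x* / E(x*^2)
second_moment = E([v * v for v in x_star])
R_star = [v / second_moment for v in x_star]

alpha = E([r * r for r in R_star]) / E(R_star)   # zero-beta rate E(R*^2)/E(R*)
lam_R = E(R_star) - alpha                         # factor risk premium, as in (7.3)

def beta_model(payoff):
    """Return (actual E(R^i), value predicted by alpha + beta * [E(R*) - alpha])."""
    price = E([s * x for s, x in zip(x_star, payoff)])
    R = [x / price for x in payoff]
    beta = cov(R_star, R) / cov(R_star, R_star)
    return E(R), alpha + beta * lam_R

print(alpha, 1.0 / E(x_star))   # the two should agree
print(lam_R)                     # negative, since R* is on the lower part of the frontier
print(beta_model([0.5, 1.0, 2.0]))
```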

Special cases

A footnote to these constructions is that $E(m)$, $E(x^*)$, or $E(R^*)$ cannot be zero, or you couldn’t divide by them. This is a pathological case: $E(m) = 0$ implies a zero price for the riskfree asset, and an infinite riskfree rate. If a riskfree rate is traded, we can simply observe that it is not infinite and verify the fact. Also, in a complete market, $E(m)$ cannot be zero since, by absence of arbitrage, $m > 0$. We will see similar special cases in the remaining theorems: the manipulations only work for discount factor choices that do not imply zero or infinite riskfree rates. I discuss the issue in section 7.6.

The manipulation from expected return-covariance to expected return-beta breaks down if $\mathrm{var}(m)$, $\mathrm{var}(x^*)$ or $\mathrm{var}(R^*)$ are zero. This is the case of pure risk neutrality. In this case, the covariances go to zero faster than the variances, so all betas go to zero and all expected returns become the same as the risk free rate.

7.2 From mean-variance frontier to a discount factor and beta representation

$R^{mv}$ is on mean-variance frontier $\Rightarrow$ $m = a + bR^{mv}$; $E(R^i) - \alpha = \beta_i[E(R^{mv}) - \alpha]$

We have seen that $p = E(mx)$ implies a single-$\beta$ model with a mean-variance efficient reference return, namely $R^*$. The converse is also true: for (almost) any return on the mean-variance frontier, we can define a discount factor $m$ that prices assets as a linear function of the mean-variance efficient return. Also, expected returns mechanically follow a single-$\beta$ representation using the mean-variance efficient return as reference.

I start with the discount factor.

Theorem: There is a discount factor of the form $m = a + bR^{mv}$ if and only if $R^{mv}$ is on the mean-variance frontier, and $R^{mv}$ is not the riskfree rate. (If there is no riskfree rate, if $R^{mv}$ is not the constant-mimicking portfolio return.)

Graphical argument

The basic idea is very simple, and Figure 17 shows the geometry for the complete markets case. The discount factor $m = x^*$ is proportional to $R^*$. The mean-variance frontier is $R^* + wR^{e*}$. Pick a vector $R^{mv}$ on the mean-variance frontier as shown in Figure 17. Then stretch it ($bR^{mv}$) and then subtract some of the 1 vector ($a$). Since $R^{e*}$ is generated by the unit vector, we can get rid of the $R^{e*}$ component and get back to the discount factor $x^*$ if we pick the right $a$ and $b$.

If the original return vector were not on the mean-variance frontier, then any linear combination $a + bR^{mv}$ with $b \neq 0$ would point in some of the $n$ direction, which $R^*$ and $x^*$ do not. If $b = 0$, though, just stretching up and down the 1 vector will not get us to $x^*$. Thus, we can only get a discount factor of the form $a + bR^{mv}$ if $R^{mv}$ is on the frontier.


You may remember that $x^*$ is not the only discount factor: all discount factors are of the form $m = x^* + \varepsilon$ with $E(\varepsilon x) = 0$. Perhaps $a + bR$ gives one of these discount factors, when $R$ is not on the mean-variance frontier? This doesn’t work, however; $n$ is still in the payoff space $X$ while, by definition, $\varepsilon$ is orthogonal to this space.

If the mean-variance efficient return $R^{mv}$ that we start with happens to lie right on the intersection of the stretched unit vector and the frontier, then stretching the $R^{mv}$ vector and adding some unit vector are the same thing, so we again can’t get back to $x^*$ by stretching and adding some unit vector. The stretched unit payoff is the riskfree rate, so the theorem rules out the riskfree rate. When there is no riskfree rate, we have to rule out the “constant-mimicking portfolio return.” I treat this case in section 6.1.

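[Figure 17. Labeled elements: the origin 0, the unit vector 1, $R^*$, $R^{e*}$, $R^{mv}$, $bR^{mv}$, $x^* = a + bR^{mv}$, $R^f$, and the $E$-$\sigma$ frontier.]

Figure 17. There is a discount factor $m = a + bR^{mv}$ if and only if $R^{mv}$ is on the mean-variance frontier and not the risk free rate.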

Algebraic proof

Now, an algebraic proof that captures the same ideas.


Proof. For an arbitrary $R$, try the discount factor model

$$m = a + bR = a + b(R^* + wR^{e*} + n). \tag{7.5}$$

We show that this discount factor prices an arbitrary payoff if and only if $n = 0$, and except for the $w$ choice that makes $R$ the riskfree rate, or the constant-mimicking portfolio return if there is no riskfree rate.

We can determine $a$ and $b$ by forcing $m$ to price any two assets. I find $a$ and $b$ to make the model price $R^*$ and $R^{e*}$.

$$1 = E(mR^*) = aE(R^*) + bE(R^{*2})$$

$$0 = E(mR^{e*}) = aE(R^{e*}) + bwE(R^{e*2}) = (a + bw)E(R^{e*}).$$

Solving for $a$ and $b$,

$$a = \frac{w}{wE(R^*) - E(R^{*2})}$$

$$b = -\frac{1}{wE(R^*) - E(R^{*2})}.$$

Thus, if it is to price $R^*$ and $R^{e*}$, the discount factor must be

$$m = \frac{w - (R^* + wR^{e*} + n)}{wE(R^*) - E(R^{*2})}. \tag{7.6}$$

Now, let’s see if $m$ prices an arbitrary payoff $x^i$. Any $x^i \in X$ can also be decomposed as

$$x^i = y^iR^* + w^iR^{e*} + n^i.$$

(See Figure 14 if this isn’t obvious.) The price of $x^i$ is $y^i$, since both $R^{e*}$ and $n^i$ are zero-price (excess return) payoffs. Therefore, we want $E(mx^i) = y^i$. Does it?

$$E(mx^i) = E\left[\frac{(w - R^* - wR^{e*} - n)(y^iR^* + w^iR^{e*} + n^i)}{wE(R^*) - E(R^{*2})}\right]$$

Using the orthogonality of $R^*$, $R^{e*}$, $n$; $E(n) = 0$ and $E(R^{e*2}) = E(R^{e*})$ to simplify the product,

$$E(mx^i) = \frac{wy^iE(R^*) - y^iE(R^{*2}) - E(nn^i)}{wE(R^*) - E(R^{*2})} = y^i - \frac{E(nn^i)}{wE(R^*) - E(R^{*2})}.$$

To get $p(x^i) = y^i = E(mx^i)$, we need $E(nn^i) = 0$. The only way to guarantee this condition for every payoff $x^i \in X$ is to insist that $n = 0$.

Obviously, this construction can’t work if the denominator of (7.6) is zero, i.e. if $w = E(R^{*2})/E(R^*) = 1/E(x^*)$. If there is a riskfree rate, then $R^f = 1/E(x^*)$, so we are ruling out the case $R^{mv} = R^* + R^fR^{e*}$, which is the risk free rate. If there is no riskfree rate, I interpret $\hat{R} = R^* + [E(R^{*2})/E(R^*)]R^{e*}$ as a “constant mimicking portfolio return” in section 5.3, and I give a graphical interpretation of this special case in section 6.1. $\blacksquare$

We can generalize the theorem somewhat. Nothing is special about returns; any payoff of the form $yR^* + wR^{e*}$ or $yx^* + wR^{e*}$ can be used to price assets; such payoffs have minimum variance among all payoffs with given mean and price. Of course, we proved existence not uniqueness: $m = a + bR^{mv} + \varepsilon$, $E(\varepsilon x) = 0$ also price assets as always.
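
The algebraic construction can be illustrated numerically. The sketch below assumes a hypothetical three-state complete market with a riskfree rate (illustrative numbers), so that $x^* = m$ and $\mathrm{proj}(1 \mid X) = 1$, which gives $R^{e*} = 1 - [E(R^*)/E(R^{*2})]R^*$. For several frontier weights $w$ it builds $R^{mv} = R^* + wR^{e*}$, computes $a$ and $b$ from the formulas in the proof, and recovers the discount factor.

```python
# Hypothetical complete-market economy with three states (illustrative numbers).
probs = [0.25, 0.5, 0.25]
x_star = [1.3, 0.9, 0.6]    # complete markets: x* = m

def E(x):
    return sum(p * v for p, v in zip(probs, x))

second_moment = E([v * v for v in x_star])
R_star = [v / second_moment for v in x_star]
ER = E(R_star)
ER2 = E([r * r for r in R_star])

# With proj(1|X) = 1, R^{e*} = 1 - [E(R*)/E(R*^2)] R*.
Re_star = [1.0 - (ER / ER2) * r for r in R_star]

def discount_factor(w):
    """Build m = a + b * R_mv from the frontier return R_mv = R* + w * R^{e*}."""
    denom = w * ER - ER2          # zero when w equals the riskfree rate 1/E(x*)
    a = w / denom
    b = -1.0 / denom
    R_mv = [r + w * re for r, re in zip(R_star, Re_star)]
    return [a + b * r for r in R_mv]

for w in (0.0, 0.5, 2.0):
    # Each line should reproduce x* up to rounding error.
    print(w, [round(v, 6) for v in discount_factor(w)])
```

Choosing $w$ equal to the riskfree rate $1/E(x^*)$ would make the denominator zero, which is the special case the theorem excludes.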

To get from the mean-variance frontier to a beta pricing model, we can just chain this theorem and the theorem of the last section together. There is a slight subtlety about special cases when there is no riskfree rate, but since it is not important for the basic points I relegate the direct connection and the special cases to section 6.2.

7.3 Factor models and discount factors

Beta-pricing models are equivalent to linear models for the discount factor m.

$E(R^i) = \alpha + \lambda'\beta_i \Leftrightarrow m = a + b'f$

We have shown that $p = E(mx)$ implies a single beta representation using $m$, $x^*$ or $R^*$ as factors. Let’s ask the converse question: suppose we have an expected return-beta model such as CAPM, APT, ICAPM, etc. What discount factor model does this imply? I show that an expected return-beta model is equivalent to a model for the discount factor that is a linear function of the factors in the beta model. This is an important and central result. It gives the connection between the discount factor formulation emphasized in this book and the expected return/beta, factor model formulation common in empirical work.

You can write a linear factor model most compactly as $m = b'f$, letting one of the factors be a constant. However, since we want a connection to the beta representation based on covariances rather than second moments, it is easiest to fold means of the factors in to the constant, and write $m = a + b'f$ with $E(f) = 0$ and hence $E(m) = a$.

The connection is easiest to see in the special case that all the test assets are excess returns. Then $0 = E(mR^e)$ does not identify the mean of $m$, and we can normalize $a$ arbitrarily. I find it convenient to normalize to $E(m) = 1$, or $m = 1 + b'[f - E(f)]$. Then,

Theorem: Given the model

$$m = 1 + b'[f - E(f)], \quad 0 = E(mR^e) \tag{7.7}$$

one can find $\lambda$ such that

$$E(R^e) = \beta'\lambda \tag{7.8}$$

where $\beta$ are the multiple regression coefficients of excess returns $R^e$ on the factors. Conversely, given $\lambda$ in (7.8), we can find $b$ such that (7.7) holds.

Proof: From (7.7)

$$0 = E(mR^e) = E(R^e) + b'\,\mathrm{cov}(f, R^e)$$

$$E(R^e) = -b'\,\mathrm{cov}(f, R^e).$$

From covariance to beta is quick,

$$E(R^e) = -b'\,\mathrm{var}(f)\,\mathrm{var}(f)^{-1}\mathrm{cov}(f, R^e) = \lambda'\beta.$$

Thus, $\lambda$ and $b$ are related by

$$\lambda = -\mathrm{var}(f)\,b.$$
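
As a check on the relation $\lambda = -\mathrm{var}(f)\,b$, here is a minimal single-factor sketch. The three-state economy, the factor values, the coefficient `b` and the test payoffs are all hypothetical choices made for illustration. Because $E(m) = 1$ under the normalization in (7.7), an excess return can be formed as a payoff minus its price.

```python
# Hypothetical single-factor economy with three states (illustrative numbers).
probs = [0.2, 0.5, 0.3]
f = [-0.10, 0.02, 0.15]     # factor realizations
b = -2.0                    # coefficient in m = 1 + b [f - E(f)]

def E(x):
    return sum(p * v for p, v in zip(probs, x))

def cov(x, y):
    return E([u * v for u, v in zip(x, y)]) - E(x) * E(y)

mean_f = E(f)
m = [1.0 + b * (fi - mean_f) for fi in f]   # normalized so that E(m) = 1
lam = -cov(f, f) * b                        # lambda = -var(f) b

def excess_return(payoff):
    """Payoff minus its price; since E(m) = 1 this has zero price, 0 = E(m R^e)."""
    price = E([mi * xi for mi, xi in zip(m, payoff)])
    return [xi - price for xi in payoff]

def beta_model(payoff):
    """Return (actual E(R^e), value predicted by beta * lambda)."""
    Re = excess_return(payoff)
    beta = cov(f, Re) / cov(f, f)
    return E(Re), beta * lam

print(beta_model([0.5, 1.0, 2.0]))
print(beta_model([2.0, 1.0, 0.2]))
```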

When the test assets are returns, the same idea works just as well, but gets a little more drowned in algebra since we have to keep track of the constant in $m$ and the zero-beta rate in the beta model.

Theorem: Given the model

$$m = a + b'f, \quad 1 = E(mR^i), \tag{7.9}$$

one can find $\alpha$ and $\lambda$ such that

$$E(R^i) = \alpha + \lambda'\beta_i, \tag{7.10}$$

where $\beta_i$ are the multiple regression coefficients of $R^i$ on $f$ with a constant. Conversely, given $\alpha$ and $\lambda$ in a factor model of the form (7.10), one can find $a$, $b$ such that (7.9) holds.

Proof: We just have to construct the relation between $(\alpha, \lambda)$ and $(a, b)$ and show that it works. Start with $m = a + b'f$, $1 = E(mR)$, and hence

$$E(R) = \frac{1}{E(m)} - \frac{\mathrm{cov}(m, R)}{E(m)} = \frac{1}{a} - \frac{E(Rf')b}{a}. \tag{7.11}$$

$\beta_i$ is the vector of the appropriate regression coefficients,

$$\beta_i \equiv E(ff')^{-1}E(fR^i),$$


so to get $\beta$ in the formula, continue with

$$E(R) = \frac{1}{a} - \frac{E(Rf')E(ff')^{-1}E(ff')b}{a} = \frac{1}{a} - \beta'\,\frac{E(ff')b}{a}.$$

Now, define $\alpha$ and $\lambda$ to make it work,

$$\alpha \equiv \frac{1}{E(m)} = \frac{1}{a}$$

$$\lambda \equiv -\frac{1}{a}E(ff')b = -\alpha E[mf]. \tag{7.12}$$

Using (7.12) we can just as easily go backwards from the expected return-beta representation to $m = a + b'f$.

As always, we have to worry about a special case of zero or infinite riskfree rates. We rule out $E(m) = E(a + b'f) = 0$ to keep (7.11) from exploding, and we rule out $\alpha = 0$ and $E(ff')$ singular to go from $\alpha$, $\beta$, $\lambda$ in (7.12) back to $m$. $\blacksquare$
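
The mapping in (7.12) can be traced in both directions with a single de-meaned factor. The following is a sketch under assumed, purely illustrative inputs (state probabilities, factor values, and the coefficients `a` and `b` are hypothetical). It computes $\alpha$ and $\lambda$ from $(a, b)$, checks (7.10) for a test return, and then inverts the mapping.

```python
# Hypothetical single-factor economy with three states (illustrative numbers).
probs = [0.2, 0.5, 0.3]
raw_factor = [-0.10, 0.02, 0.15]
a, b = 0.95, -2.0            # coefficients of m = a + b f

def E(x):
    return sum(p * v for p, v in zip(probs, x))

mean_f = E(raw_factor)
f = [v - mean_f for v in raw_factor]      # de-meaned factor, so E(m) = a
m = [a + b * fi for fi in f]
Eff = E([fi * fi for fi in f])            # E(ff') is a scalar with one factor

alpha = 1.0 / a                           # alpha = 1/E(m) = 1/a
lam = -(1.0 / a) * Eff * b                # lambda = -(1/a) E(ff') b

def beta_model(payoff):
    """Return (actual E(R), value predicted by alpha + beta * lambda)."""
    price = E([mi * xi for mi, xi in zip(m, payoff)])
    R = [xi / price for xi in payoff]
    beta = E([fi * ri for fi, ri in zip(f, R)]) / Eff
    return E(R), alpha + beta * lam

# Going backwards from (alpha, lambda) to (a, b).
a_back = 1.0 / alpha
b_back = -a_back * lam / Eff

print(beta_model([0.5, 1.0, 2.0]))
print(a_back, b_back)
```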

Given either model there is a model of the other form. They are not unique. We can add to $m$ any random variable orthogonal to returns, and we can add spurious risk factors with zero $\beta$ and/or $\lambda$, leaving pricing implications unchanged. We can also express the multiple beta model as a single beta model with $m = a + b'f$ as the single factor, or use its corresponding $R^*$.

Equation (7.12) shows that the factor risk premium $\lambda$ can be interpreted as the price of the factor. A test of $\lambda \neq 0$ is often called a test of whether the “factor is priced.” More precisely, $\lambda$ captures the price $E(mf)$ of the (de-meaned) factors brought forward at the risk free rate. If we start with underlying factors $\tilde{f}$ such that the demeaned factors are $f = \tilde{f} - E(\tilde{f})$,

$$\lambda \equiv -\alpha\, p\left[\tilde{f} - E(\tilde{f})\right] = -\alpha\left[p(\tilde{f}) - \frac{E(\tilde{f})}{\alpha}\right].$$

$\lambda$ represents the price of the factors less their risk-neutral valuation, i.e. the factor risk premium. If the factors are not traded, $\lambda$ is the model’s predicted price rather than a market price. Low prices are high risk premia, resulting in the negative sign. If the factors are returns with price one, then the factor risk premium is the expected return of the factor, less $\alpha$: $\lambda = E(f) - \alpha$.

Note that the “factors” need not be returns (though they may be); they need not be orthogonal, and they need not be serially uncorrelated or conditionally or unconditionally mean-zero. Such properties may occur as natural special cases, or as part of the economic derivation of specific factor models, but they are not required for the existence of a factor pricing representation. For example, if the riskfree rate is constant then $E_t(m_{t+1})$ is constant and at least the sum $b'f$ should be uncorrelated over time. But if the riskfree rate is not constant, then $E_t(m_{t+1}) = E_t(b'f_{t+1})$ should vary over time.


Factor-mimicking portfolios

It is often convenient to use factor-mimicking payoffs

$$f^* = \mathrm{proj}(f \mid X)$$

factor-mimicking returns

$$f^* = \frac{\mathrm{proj}(f \mid X)}{p[\mathrm{proj}(f \mid X)]}$$

or factor-mimicking excess returns

$$f^* = \mathrm{proj}(f \mid R^e)$$

in place of true factors. These payoffs carry the same pricing information as the original factors, and can serve as reference variables in expected return-beta representations.

When the factors are not already returns or excess returns, it is convenient to express a beta pricing model in terms of its factor mimicking portfolios rather than the factors themselves. Recall that $x^* = \mathrm{proj}(m \mid X)$ carries all of $m$’s pricing implications on $X$; $p(x) = E(mx) = E(x^*x)$. The factor-mimicking portfolios are just the same idea using the individual factors.

Define the payoffs $f^*$ by

$$f^* = \mathrm{proj}(f \mid X).$$

Then, $m = b'f^*$ carries the same pricing implications on $X$ as does $m = b'f$:

$$p = E(mx) = E(b'f\,x) = E\left[b'\,\mathrm{proj}(f \mid X)\,x\right] = E\left[b'f^*\,x\right]. \tag{7.13}$$

(I include the constant as one of the factors.)
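
The key step in (7.13), $E(fx) = E(f^*x)$ for every $x \in X$, can be seen in a small example. This sketch assumes a hypothetical three-state economy in which the payoff space is spanned by two payoffs `x1` and `x2`, and `f` is a factor outside that space; all numbers are illustrative. The projection is computed from the normal equations of a two-regressor regression without a constant.

```python
# Hypothetical economy: three states, payoff space X spanned by x1 and x2.
probs = [0.3, 0.4, 0.3]
x1 = [0.9, 1.05, 1.2]
x2 = [0.7, 1.20, 1.4]
f = [1.0, -0.5, 0.3]        # nontraded factor, not in the span of x1 and x2

def E(x):
    return sum(p * v for p, v in zip(probs, x))

def prod(x, y):
    return [u * v for u, v in zip(x, y)]

# Normal equations for proj(f|X): E(xx') c = E(xf), solved by hand for 2 payoffs.
S11, S12, S22 = E(prod(x1, x1)), E(prod(x1, x2)), E(prod(x2, x2))
det = S11 * S22 - S12 ** 2
t1, t2 = E(prod(f, x1)), E(prod(f, x2))
c1 = (S22 * t1 - S12 * t2) / det
c2 = (S11 * t2 - S12 * t1) / det
f_star = [c1 * u + c2 * v for u, v in zip(x1, x2)]   # factor-mimicking payoff

def second_moment_with(factor, payoff):
    """E(factor * payoff), the factor's contribution to the price E(m x)."""
    return E(prod(factor, payoff))

portfolio = [2.0 * u - 3.0 * v for u, v in zip(x1, x2)]   # an arbitrary payoff in X
print(second_moment_with(f, portfolio), second_moment_with(f_star, portfolio))
```

The two printed numbers should coincide even though `f_star` differs from `f` state by state, because the projection residual is orthogonal to every payoff in $X$.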

The factor-mimicking portfolios also form a beta representation. Just go from (7.13) back to an expected return-beta representation

$$E(R^i) = \alpha^* + \beta^{*\prime}\lambda^*, \tag{7.14}$$

and find $\lambda^*$, $\alpha^*$ using (7.12). The $\beta^*$ are the regression coefficients of the returns $R^i$ on the factor-mimicking portfolios, not on the factors, as they should be.

It is more traditional to use the returns or excess returns on the factor-mimicking portfolios rather than payoffs as I have done so far. To generate returns, divide the payoff by its price,

$$f^* = \frac{\mathrm{proj}(f \mid X)}{p[\mathrm{proj}(f \mid X)]}.$$


The resulting $b$ will be rescaled by the price of the factor-mimicking payoff, and the model is the same. Note you project on the space of payoffs, not of returns. Returns $R$ are not a space, since they don’t contain zero.

If the test assets are all excess returns, you can even more easily project the factors on the set of excess returns, which are a space since they do include zero. If we define

$$f^* = \mathrm{proj}(f \mid R^e)$$

then of course the excess returns $f^*$ carry the same pricing implications as the factors $f$ for a set of excess returns; $m = b'f^*$ satisfies $0 = E(mR^{ei})$ and

$$E(R^{ei}) = \beta_{i,f^*}\lambda = \beta_{i,f^*}E(f^*).$$

7.4 Discount factors and beta models to mean-variance frontier

From $m$, we can construct $R^*$ which is on the mean-variance frontier.

If a beta pricing model holds, then the return $R^*$ on the mean-variance frontier is a linear combination of the factor-mimicking portfolio returns.

Any frontier return is a combination of $R^*$ and one other return, a risk free rate or a risk free rate proxy. Thus, any frontier return is a linear function of the factor-mimicking returns plus a risk free rate proxy.

It’s easy to show that, given $m$, we can find a return on the mean-variance frontier. Given $m$ construct $x^* = \mathrm{proj}(m \mid X)$ and $R^* = x^*/E(x^{*2})$. $R^*$ is the minimum second moment return, and hence on the mean-variance frontier.

Similarly, if you have a set of factors $f$ for a beta model, then a linear combination of the factor-mimicking portfolios is on the mean-variance frontier. A beta model is the same as $m = b'f$. Since $m$ is linear in $f$, $x^*$ is linear in $f^* = \mathrm{proj}(f \mid X)$, so $R^*$ is linear in the factor mimicking payoffs $f^*$ or their returns $f^*/p(f^*)$.

Section 6.4 showed how we can span the mean-variance frontier with $R^*$ and a risk free rate, if there is one, or the zero-beta, minimum variance, or constant-mimicking portfolio return $\hat{R} = \mathrm{proj}(1 \mid X)/p[\mathrm{proj}(1 \mid X)]$ if there is no risk free rate. The latter is particularly nice in the case of a linear factor model, since we may consider the constant as a factor, so the frontier is entirely generated by factor-mimicking portfolio returns.


7.5 Three riskfree rate analogues

I introduce three counterparts to the risk free rate that show up in asset pricing formulas when there is no risk free rate. The three returns are the zero-beta return, the minimum-variance return and the constant-mimicking portfolio return.

Three different generalizations of the riskfree rate are useful when a risk free rate or unit payoff is not in the set of payoffs. These are the zero-beta return, the minimum-variance return and the constant-mimicking portfolio return. I introduce the returns in this section, and I use them in the next section to state some special cases involving the mean-variance frontier. Each of these returns maintains one property of the risk free rate in a market in which there is no risk free rate. The zero-beta return is a mean-variance efficient return that is uncorrelated with another given mean-variance efficient return. The minimum-variance return is just that. The constant-mimicking portfolio return is the return on the payoff “closest” to the unit payoff. Each of these returns has a representation in the standard form $R^* + wR^{e*}$ with slightly different $w$. In addition, the expected returns of these risky assets are used in some asset pricing representations. For example, the zero-beta rate is often used to refer to the expected value of the zero-beta return.

Each of these riskfree rate analogues is mean-variance efficient. Thus, I characterize each one by finding its weight $w$ in a representation of the form $R^* + wR^{e*}$. We derived such a representation above for the riskfree rate as equation (6.18),

$$R^f = R^* + R^fR^{e*}. \tag{7.15}$$

In the last subsection, I show how each riskfree rate analogue reduces to the riskfree rate when there is one.

7.5.1 Zero-beta return for $R^*$

The zero beta return for $R^*$, denoted $R^\alpha$, is the mean-variance efficient return uncorrelated with $R^*$. Its expected return is the zero beta rate $\alpha = E(R^\alpha)$. This zero beta return has representation

$$R^\alpha = R^* + \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}R^{e*},$$

and the corresponding zero beta rate is

$$\alpha = E(R^\alpha) = \frac{E(R^{*2})}{E(R^*)} = \frac{1}{E(x^*)}.$$


The zero beta rate is found graphically in mean-standard deviation space by extending the tangency at $R^*$ to the vertical axis. It is also the inverse of the price that $x^*$ and $R^*$ assign to the unit payoff.

The riskfree rate $R^f$ is of course uncorrelated with $R^*$. Risky returns uncorrelated with $R^*$ earn the same average return as the risk free rate if there is one, so they might take the place of $R^f$ when the latter does not exist. For any return $R^\alpha$ that is uncorrelated with $R^*$ we have $E(R^*R^\alpha) = E(R^*)E(R^\alpha)$, and since $E(R^*R) = E(R^{*2})$ for every return $R$,

$$\alpha = E(R^\alpha) = \frac{E(R^{*2})}{E(R^*)} = \frac{1}{E(x^*)}.$$

The first equality introduces a popular notation $\alpha$ for this rate. I call $\alpha$ the zero beta rate, and $R^\alpha$ the zero beta return. There is no riskfree rate, so there is no security that just pays $\alpha$.

As you can see from the formula, the zero-beta rate is the inverse of the price that $R^*$ and $x^*$ assign to the unit payoff, which is another natural generalization of the riskfree rate. It is called the zero beta rate because $\mathrm{cov}(R^*, R^\alpha) = 0$ implies that the regression beta of $R^\alpha$ on $R^*$ is zero. More precisely, one might call it the zero beta rate on $R^*$, since one can calculate zero-beta rates for returns other than $R^*$ and they are not the same as the zero-beta rate for $R^*$. In particular, the zero-beta rate on the “market portfolio” will generally be different from the zero beta rate on $R^*$.

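[Figure 18. Axes: $\sigma(R)$ (horizontal), $E(R)$ (vertical). Labeled elements: $R^*$ and the intercept $\alpha = E(R^{*2})/E(R^*) = 1/E(x^*)$.]

Figure 18. Zero-beta rate $\alpha$ and zero-beta return $R^\alpha$ for $R^*$.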

I draw $\alpha$ in Figure 18 as the intersection of the tangency and the vertical axis. This is a property of any return on the mean variance frontier: The expected return on an asset uncorrelated with the mean-variance efficient asset (a zero-beta asset) lies at the point so constructed. To check this geometry, use similar triangles: The length of $R^*$ in Figure 18 is $\sqrt{E(R^{*2})}$, and its vertical extent is $E(R^*)$. Therefore, $\alpha/\sqrt{E(R^{*2})} = \sqrt{E(R^{*2})}/E(R^*)$, or $\alpha = E(R^{*2})/E(R^*)$. Since $R^*$ is on the lower portion of the mean-variance frontier, this zero beta rate $\alpha$ is above the minimum variance return.

Note that in general $\alpha \neq 1/E(m)$. Projecting $m$ on $X$ preserves asset pricing implications on $X$ but not for payoffs not in $X$. Thus if a risk free rate is not traded, $x^*$ and $m$ may differ in their predictions for the riskfree rate as for other nontraded assets.

The zero beta return is the rate of return on the mean-variance frontier with mean equal to the zero beta rate, as shown in Figure 18. We want to characterize this return in $R^* + wR^{e*}$ form. To do this, we want to find $w$ such that

$$E(R^\alpha) = \frac{E(R^{*2})}{E(R^*)} = E(R^*) + wE(R^{e*}).$$

Solving, the answer is

$$w = \frac{E(R^{*2}) - E(R^*)^2}{E(R^*)E(R^{e*})} = \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}.$$

Thus, the zero beta return is

$$R^\alpha = R^* + \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}R^{e*},$$

expression (7.23). Note that the weight is not $E(R^\alpha) = E(R^{*2})/E(R^*)$. When there is no risk free rate, the weight and the mean return are different.

7.5.2 Minimum variance return

The minimum variance return has the representation

$$R^{\text{min. var.}} = R^* + \frac{E(R^*)}{1 - E(R^{e*})}R^{e*}.$$

The riskfree rate obviously is the minimum variance return when it exists. When there is no risk free rate, the minimum variance return is

$$R^{\text{min. var.}} = R^* + \frac{E(R^*)}{1 - E(R^{e*})}R^{e*}. \tag{7.16}$$


Taking expectations,

$$E(R^{\text{min. var.}}) = E(R^*) + \frac{E(R^*)}{1 - E(R^{e*})}E(R^{e*}) = \frac{E(R^*)}{1 - E(R^{e*})}.$$

The minimum variance return retains the nice property of the risk free rate, that its weight on $R^{e*}$ is the same as its mean,

$$R^{\text{min. var.}} = R^* + E(R^{\text{min. var.}})R^{e*}$$

just as $R^f = R^* + R^fR^{e*}$. When there is no risk free rate, the zero-beta and minimum variance returns are not the same. You can see this fact clearly in Figure 18.

We can derive expression (7.16) for the minimum variance return by brute force: choose $w$ in $R^* + wR^{e*}$ to minimize variance.

$$\min_w \mathrm{var}(R^* + wR^{e*}) = E[(R^* + wR^{e*})^2] - E(R^* + wR^{e*})^2$$

$$= E(R^{*2}) + w^2E(R^{e*}) - E(R^*)^2 - 2wE(R^*)E(R^{e*}) - w^2E(R^{e*})^2.$$

The first order condition is

$$0 = wE(R^{e*})[1 - E(R^{e*})] - E(R^*)E(R^{e*})$$

$$w = \frac{E(R^*)}{1 - E(R^{e*})}.$$

7.5.3 Constant-mimicking portfolio return

The constant-mimicking portfolio return is defined as the return on the projection of the unit vector on the payoff space,

$$\hat{R} = \frac{\mathrm{proj}(1 \mid X)}{p[\mathrm{proj}(1 \mid X)]}.$$

It has the representation

$$\hat{R} = R^* + \frac{E(R^{*2})}{E(R^*)}R^{e*}.$$

When there is a risk free rate, it is the rate of return on a unit payoff, $R^f = 1/p(1)$. When there is no risk free rate, we might define the rate of return on the mimicking portfolio for a unit payoff,

$$\hat{R} = \frac{\mathrm{proj}(1 \mid X)}{p[\mathrm{proj}(1 \mid X)]}.$$

I call this object the constant-mimicking portfolio return.

The mean-variance representation of the constant-mimicking portfolio return is

$$\hat{R} = R^* + \alpha R^{e*} = R^* + \frac{E(R^{*2})}{E(R^*)}R^{e*}. \tag{7.17}$$

Note that the weight $\alpha$ equal to the zero beta rate creates the constant-mimicking return, not the zero beta return. To show (7.17), start with property (6.19),

$$R^{e*} = \mathrm{proj}(1 \mid X) - \frac{E(R^*)}{E(R^{*2})}R^*. \tag{7.18}$$

Take the price of both sides. Since the price of $R^{e*}$ is zero and the price of $R^*$ is one, we establish

$$p[\mathrm{proj}(1 \mid X)] = \frac{E(R^*)}{E(R^{*2})}. \tag{7.19a}$$

Solving (7.18) for $\mathrm{proj}(1 \mid X)$, dividing by (7.19a) we obtain the right hand side of (7.17).
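
To see the three analogues side by side, here is a sketch of an incomplete market with no riskfree asset. The setup is a hypothetical assumption: three states and two traded returns `R1` and `R2` with illustrative values, so that excess returns are multiples of `R1 - R2`. The code builds $R^*$ and $R^{e*}$ by projection, forms the three weights, and constructs $\hat{R}$ directly from $\mathrm{proj}(1 \mid X)$ for comparison with (7.17).

```python
# Hypothetical incomplete market: three states, two traded returns, no riskfree asset.
probs = [0.3, 0.4, 0.3]
R1 = [0.9, 1.05, 1.2]
R2 = [0.7, 1.20, 1.4]

def E(x):
    return sum(p * v for p, v in zip(probs, x))

def prod(x, y):
    return [a * b for a, b in zip(x, y)]

def comb(c1, x, c2, y):
    return [c1 * a + c2 * b for a, b in zip(x, y)]

def var(x):
    return E(prod(x, x)) - E(x) ** 2

# Second-moment matrix of the two basis returns and its determinant.
S11, S12, S22 = E(prod(R1, R1)), E(prod(R1, R2)), E(prod(R2, R2))
det = S11 * S22 - S12 ** 2

def solve(t1, t2):
    """Weights (c1, c2) such that y = c1*R1 + c2*R2 has E(y*R1) = t1, E(y*R2) = t2."""
    return (S22 * t1 - S12 * t2) / det, (S11 * t2 - S12 * t1) / det

# x* prices both returns at one: E(x* R1) = E(x* R2) = 1.  R* = x* / E(x*^2).
c1, c2 = solve(1.0, 1.0)
x_star = comb(c1, R1, c2, R2)
R_star = [v / E(prod(x_star, x_star)) for v in x_star]

# Excess returns are multiples of z = R1 - R2; R^{e*} is the projection of 1 on them.
z = comb(1.0, R1, -1.0, R2)
Re_star = [E(z) / E(prod(z, z)) * v for v in z]

ER, ER2, ERe = E(R_star), E(prod(R_star, R_star)), E(Re_star)
w_zero_beta = var(R_star) / (ER * ERe)     # weight in (7.23)
w_min_var = ER / (1.0 - ERe)               # weight in (7.16)
w_const = ER2 / ER                         # weight in (7.17)

def frontier(w):
    """Frontier return R* + w R^{e*}."""
    return comb(1.0, R_star, w, Re_star)

# Constant-mimicking return computed directly as proj(1|X) / p[proj(1|X)].
d1, d2 = solve(E(R1), E(R2))
R_hat = [v / (d1 + d2) for v in comb(d1, R1, d2, R2)]

print(w_zero_beta, w_min_var, w_const)
```

With no riskfree asset the three weights differ, although only slightly for these particular numbers.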

7.5.4 Risk free rate

The risk free rate has the mean-variance representation

$$R^f = R^* + R^fR^{e*}.$$

The zero-beta, minimum variance and constant-mimicking portfolio returns reduce to this formula when there is a risk free rate.

Again, we derived in equation (6.18) that the riskfree rate has the representation,

$$R^f = R^* + R^fR^{e*}. \tag{7.20}$$

Obviously, we should expect that the zero-beta return, minimum-variance return, and constant-mimicking portfolio return reduce to the riskfree rate when there is one. These other rates are

$$\text{constant-mimicking:}\quad \hat{R} = R^* + \frac{E(R^{*2})}{E(R^*)}R^{e*} \tag{7.21}$$

$$\text{minimum-variance:}\quad R^{\text{min. var.}} = R^* + \frac{E(R^*)}{1 - E(R^{e*})}R^{e*} \tag{7.22}$$

$$\text{zero-beta:}\quad R^\alpha = R^* + \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}R^{e*}. \tag{7.23}$$

To establish that these are all the same when there is a riskfree rate, we need to show that

$$R^f = \frac{E(R^{*2})}{E(R^*)} = \frac{E(R^*)}{1 - E(R^{e*})} = \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}. \tag{7.24}$$

We derived the first equality above as equation (??). To derive the second equality, take expectations of (7.15),

$$R^f = E(R^*) + R^fE(R^{e*}) \tag{7.25}$$

and solve for $R^f$. To derive the third equality, use the first equality from (7.24) in (7.25),

$$\frac{E(R^{*2})}{E(R^*)} = E(R^*) + R^fE(R^{e*}).$$

Solving for $R^f$,

$$R^f = \frac{E(R^{*2}) - E(R^*)^2}{E(R^*)E(R^{e*})} = \frac{\mathrm{var}(R^*)}{E(R^*)E(R^{e*})}.$$
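
Equation (7.24) can be confirmed in a case where a riskfree rate exists. This is a minimal sketch, assuming a hypothetical three-state complete market with illustrative numbers, in which $x^* = m$, $R^f = 1/E(m)$ and, from (7.18) with $\mathrm{proj}(1 \mid X) = 1$, $R^{e*} = 1 - [E(R^*)/E(R^{*2})]R^*$.

```python
# Hypothetical complete-market economy with three states (illustrative numbers).
probs = [0.25, 0.5, 0.25]
m = [1.3, 0.9, 0.6]          # discount factor; complete markets, so x* = m

def E(x):
    return sum(p * v for p, v in zip(probs, x))

Rf = 1.0 / E(m)              # riskfree rate

Em2 = E([v * v for v in m])
R_star = [v / Em2 for v in m]
ER = E(R_star)
ER2 = E([r * r for r in R_star])

# From (7.18) with proj(1|X) = 1.
Re_star = [1.0 - (ER / ER2) * r for r in R_star]
ERe = E(Re_star)
var_R = ER2 - ER ** 2

w_const = ER2 / ER                      # constant-mimicking weight (7.21)
w_min_var = ER / (1.0 - ERe)            # minimum-variance weight (7.22)
w_zero_beta = var_R / (ER * ERe)        # zero-beta weight (7.23)

# The riskfree rate itself, rebuilt state by state as R* + Rf R^{e*}.
Rf_rebuilt = [r + Rf * re for r, re in zip(R_star, Re_star)]

print(Rf, w_const, w_min_var, w_zero_beta)   # all four should agree
print(Rf_rebuilt)                            # constant across states
```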

7.6 Mean-variance special cases with no riskfree rate

We can find a discount factor from any mean-variance efficient return except the constant-mimicking return.

We can find a beta representation from any mean-variance efficient return except the minimum-variance return.

I collect in this section the special cases for the equivalence theorems of this chapter. The special cases all revolve around the problem that the expected discount factor, price of a unit payoff or riskfree rate must not be zero or infinity. This is typically an issue of theoretical rather than practical importance. In a complete, arbitrage free market, $m > 0$ so we know $E(m) > 0$. If a riskfree rate is traded you can observe $\infty > E(m) = 1/R^f > 0$. However, in an incomplete market in which no riskfree rate is traded, there are many discount factors with the same asset pricing implications, and you might have happened to choose one with $E(m) = 0$ in your manipulations. By and large, this is easy to avoid: choose another of the many discount factors with the same pricing implications that does not have $E(m) = 0$. More generally, when you choose a particular discount factor you are choosing an extension of the current set of prices and payoffs; you are viewing the current prices and payoffs as a subset of a particular contingent-claim economy. Make sure you pick a sensible one. Therefore, we could simply state the special cases as “when a riskfree rate is not traded, make sure you use discount factors with $0 < E(m) < \infty$.” However, it is potentially useful and it certainly is traditional to specify the special return on the mean-variance frontier that leads to the infinite or zero implied riskfree rate, and to rule it out directly. This section works out what those returns are and shows why they must be avoided.

7.6.1 The special case for mean variance frontier to discount factor

When there is no riskfree rate, we can find a discount factor that is a linear function of any mean-variance efficient return except the constant-mimicking portfolio return.

In section 7.2, we saw that we can form a discount factor $a + bR^{mv}$ from any mean-variance efficient return $R^{mv}$ except one particular return, of the form $R^* + [E(R^{*2})/E(R^*)]R^{e*}$. This return led to an infinite $m$. We now recognize this return as the risk-free rate, when there is one, or the constant-mimicking portfolio return, if there is no riskfree rate.

Figure 19 shows the geometry of this case. To use no more than three dimensions I had to reduce the return and excess return spaces to lines. The payoff space $X$ is the plane joining the return and excess return sets as shown. The set of all discount factors is $m = x^* + \varepsilon$, $E(\varepsilon x) = 0$, the line through $x^*$ orthogonal to the payoff space $X$ in the figure. I draw the unit payoff (the dot marked “1” in Figure 19) closer to the viewer than the plane $X$, and I draw a vector through the unit payoff coming out of the page.

Take any return on the mean-variance frontier, $R^{mv}$. (Since the return space only has two dimensions, all returns are on the frontier.) For a given $R^{mv}$, the space $a + bR^{mv}$ is the plane spanned by $R^{mv}$ and the unit payoff. This plane lies sideways in the figure. As the figure shows, there is a vector $a + bR^{mv}$ in this plane that lies on the line of discount factors.

Next, the special case. This construction would go awry if the plane spanning the unit payoff and the return $R^{mv}$ were parallel to the plane containing the discount factor. Thus, the construction would not work for the return marked $\hat{R}$ in the Figure. This is a return corresponding to a payoff that is the projection of the unit payoff on to $X$, so that the residual will be orthogonal to $X$, as is the line of discount factors.

[Figure 19. One can construct a discount factor $m = a + bR^{mv}$ from any mean-variance-efficient return except the constant-mimicking return $\hat{R}$. Labels in the figure: the payoff space $X$, the return set $R$ and excess return set $R^e$, the points $0$, $1$, $R^*$ and $R^{mv}$, the constant-mimicking return $\hat{R}$, and the line of discount factors through $x^*$ containing $a + bR^{mv}$.]

With Figure 19 in front of us, we can also see why the constant-mimicking portfolio return is not the same thing as the minimum-variance return. Variance is the size or second moment of the residual in a projection (regression) on 1.

$$\text{var}(x) = E\left[(x - E(x))^2\right] = E\left[(x - \text{proj}(x|1))^2\right] = \|x - \text{proj}(x|1)\|^2$$

Thus, the minimum variance return is the return closest to extensions of the unit vector. It is formed by projecting returns on the unit vector. The constant-mimicking portfolio return is the return on the payoff closest to 1. It is formed by projecting the unit vector on the set of payoffs.

7.6.2 The special case for mean-variance frontier to a beta model

We can use any return on the mean-variance frontier as the reference return for a single beta representation, except the minimum-variance return.

We already know mean-variance frontiers $\Leftrightarrow$ discount factor and discount factor $\Leftrightarrow$ single beta representation, so at a superficial level we can string the two theorems together to go from a mean-variance efficient return to a beta representation. However, it is more elegant to go directly, and the special cases are also a bit simpler this way.

Theorem: There is a single beta representation with a return $R^{mv}$ as factor,

$$E(R^i) = \gamma_{R^{mv}} + \beta_{i,R^{mv}}\left[E(R^{mv}) - \gamma_{R^{mv}}\right],$$

if and only if $R^{mv}$ is mean-variance efficient and not the minimum variance return.

This famous theorem is given by Roll (1976) and Hansen and Richard (1987). We rule out minimum variance to rule out the special case $E(m) = 0$. Graphically, the zero-beta rate is formed from the tangency to the mean-variance frontier as in Figure 18. I use the notation $\gamma_{R^{mv}}$ to emphasize that we use the zero-beta rate corresponding to the particular mean-variance return $R^{mv}$ that we use as the reference return. If we used the minimum-variance return, that would lead to an infinite zero-beta rate.

Proof: The mean-variance frontier is $R^{mv} = R^* + wR^{e*}$. Any return is $R^i = R^* + w^i R^{e*} + n^i$. Thus,

$$E(R^i) = E(R^*) + w^i E(R^{e*}) \qquad (26)$$

Now,

$$\begin{aligned}
\text{cov}(R^i, R^{mv}) &= \text{cov}\left[(R^* + wR^{e*}),\ (R^* + w^i R^{e*})\right] \\
&= \text{var}(R^*) + w w^i\, \text{var}(R^{e*}) - (w + w^i)E(R^*)E(R^{e*}) \\
&= \text{var}(R^*) - wE(R^*)E(R^{e*}) + w^i\left[w\, \text{var}(R^{e*}) - E(R^*)E(R^{e*})\right]
\end{aligned}$$

Thus, $\text{cov}(R^i, R^{mv})$ and $E(R^i)$ are both linear functions of $w^i$. We can solve $\text{cov}(R^i, R^{mv})$ for $w^i$, plug into the expression for $E(R^i)$ and we’re done. To do this, of course, we must be able to solve $\text{cov}(R^i, R^{mv})$ for $w^i$. This requires

$$w \neq \frac{E(R^*)E(R^{e*})}{\text{var}(R^{e*})} = \frac{E(R^*)E(R^{e*})}{E(R^{e*2}) - E(R^{e*})^2} = \frac{E(R^*)}{1 - E(R^{e*})} \qquad (27)$$

which is the condition for the minimum variance return. $\blacksquare$
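As a numerical check on the two special cases in this section, here is a minimal sketch. It assumes a made-up economy with three equally likely states and two assets of price 1, chosen so that no risk-free asset exists; the state payoffs and the helper names (`E`, `combo`, `cov_slope`) are illustrative assumptions, not from the text. It builds $R^*$ and $R^{e*}$, forms the minimum-variance return from the weight ruled out in (27), and forms the constant-mimicking return $R^* + \frac{E(R^{*2})}{E(R^*)}R^{e*}$.

```python
from fractions import Fraction as F

# Hypothetical economy: three equally likely states, two assets with price 1,
# and no risk-free asset (the unit payoff is not a combination of R1 and R2).
R1 = [F(9, 10), F(1), F(13, 10)]
R2 = [F(1), F(11, 10), F(1)]

def E(x):
    """Expectation under equal state probabilities."""
    return sum(x) / len(x)

def times(x, y):
    """State-by-state product."""
    return [a * b for a, b in zip(x, y)]

def combo(x, c, y):
    """State-by-state x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

def var(x):
    return E(times(x, x)) - E(x) ** 2

def cov(x, y):
    return E(times(x, y)) - E(x) * E(y)

# With two assets, every excess return is a multiple of e = R1 - R2.
e = combo(R1, F(-1), R2)
# R* is the return that is orthogonal to the excess-return space.
R_star = combo(R2, -E(times(R2, e)) / E(times(e, e)), e)
# Re* is the projection of the unit payoff on the excess-return space.
Re_star = [E(e) / E(times(e, e)) * a for a in e]

# Weight of the minimum-variance return: the value of w excluded in (27).
w_minvar = E(R_star) * E(Re_star) / var(Re_star)
R_minvar = combo(R_star, w_minvar, Re_star)

# Weight of the constant-mimicking return R* + [E(R*^2)/E(R*)] Re*.
w_hat = E(times(R_star, R_star)) / E(R_star)
R_hat = combo(R_star, w_hat, Re_star)

def cov_slope(w):
    """Coefficient on w^i in cov(R^i, R^mv) when R^mv = R* + w Re*."""
    return w * var(Re_star) - E(R_star) * E(Re_star)
```

In this sketch `cov_slope(w_minvar)` is zero, so every return has the same covariance with `R_minvar` and the covariance cannot be solved for $w^i$. The weight `w_hat` differs from `w_minvar`, which is the sense in which the constant-mimicking and minimum-variance returns are distinct when there is no risk-free rate.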

7.7 Problems

1. In the argument that $R^{mv}$ on the mean variance frontier, $R^{mv} = R^* + wR^{e*}$, implies a discount factor $m = a + bR^{mv}$, do we have to rule out the case of risk neutrality? (Hint: What is $R^{e*}$ when the economy is risk-neutral?) If you use factor mimicking portfolios as in (7.13), you know that the predictions for expected returns are the same as they are if you use the factors themselves. Are the $\gamma^*$, $\lambda^*$, and $\beta^*$ for the factor mimicking portfolio representation the same as the original $\gamma$, $\lambda$, and $\beta$ of the factor pricing model?

2. Suppose the CAPM is true, $m = a - bR^m$ prices a set of assets, and there is a risk-free rate $R^f$. Find $R^*$ in terms of the moments of $R^m$, $R^f$.

3. If you express the mean-variance frontier as a linear combination of factor-mimicking portfolios from a factor model, do the relative weights of the various factors change as you sweep out the frontier, or do they stay the same? (Start with the riskfree rate case.)

4. For an arbitrary mean-variance efficient return of the form $R^* + wR^{e*}$, find its zero-beta return and zero-beta rate. Show that your rate reduces to the riskfree rate when there is one.

5. When the economy is risk neutral, and if there is no risk-free rate, show that the zero-beta, minimum-variance, and constant-mimicking portfolio returns are again all equivalent, though not equal to the risk-free rate. (In this case, the mean-variance frontier is just the minimum-variance point.)


Chapter 8. Implications of existence and equivalence theorems

Existence of a discount factor means $p = E(mx)$ is innocuous, and all content flows from the discount factor model.

The theorems apply to sample moments too; the dangers of fishing up ex-post or sample mean-variance efficient portfolios.

Sources of discipline in factor fishing expeditions.

The joint hypothesis problem. How efficiency tests are the same as tests of economic discount factor models.

Factors vs. their mimicking portfolios.

Testing the number of factors.

Plotting contingent claims on the axis vs. mean and variance.

The theorems on the existence of a discount factor, and the equivalence between the $p = E(mx)$, expected return-beta, and mean-variance views of asset pricing have important implications for how we approach and evaluate empirical work.

The equivalence theorems are obviously important, especially to the theme of this book, to show that the choice of discount factor language versus expected return-beta language or mean-variance frontier is entirely one of convenience. Nothing in the more traditional statements is lost.

$p = E(mx)$ is innocuous

Before Roll (1976), expected return-beta representations had been derived in the context of special and explicit economic models, especially the CAPM. In empirical work, the success of any expected return-beta model seemed like a vindication of the whole structure. The fact that, for example, one might use the NYSE value-weighted index portfolio in place of the return on total wealth predicted by the CAPM seemed like a minor issue of empirical implementation.

When Roll showed that mean-variance efficiency implies a single beta representation, all that changed. Some single beta representation always exists, since there is some mean-variance efficient return. The asset pricing model only serves to predict that a particular return (say, the “market return”) will be mean-variance efficient. Thus, if one wants to “test the CAPM” it becomes much more important to be choosy about the reference portfolio, to guard against stumbling on something that happens to be mean-variance efficient and hence prices assets by construction.


This insight led naturally to the use of broader wealth indices (Stambaugh 198x) in the reference portfolio to provide a more grounded test of the CAPM. However, this approach has not caught on. Stocks are priced with stock factors, bonds with bond factors, and so on. More recently, stocks sorted on size, book/market, and past performance characteristics are priced by portfolios sorted on those characteristics. Part of the reason for this is that the betas are small; stocks and bonds are not highly correlated, so risk premia from one source of betas have small impacts on another set of average returns. Larger measures of wealth including human capital and real estate do not come with high frequency price data, so adding them to a wealth portfolio has little effect on betas.

The good news in this existence theorem is that you can always start by writing an expected return-beta model, knowing that you have imposed almost no structure in doing so. The bad news is that you haven’t gotten very far. All the economic, statistical and predictive content comes in picking the factors.

The theorem that, from the law of one price, there exists some discount factor $m$ such that $p = E(mx)$ is just an updated restatement of Roll’s theorem. The content is all in $m = f(\text{data})$, not in $p = E(mx)$. Again, an asset pricing framework that initially seemed to require a lot of completely unbelievable structure (the representative consumer consumption-based model in complete frictionless markets) turns out to require (almost) no structure at all. Again, the good news is that you can always start by writing $p = E(mx)$, and need not suffer criticism about hidden contingent claim or representative consumer assumptions in so doing. The bad news is that you haven’t gotten very far by writing $p = E(mx)$, as all the economic, statistical and predictive content comes in picking the discount factor model $m = f(\text{data})$.

Ex-ante and ex-post.

I have been deliberately vague about the probabilities underlying expectations and other moments in the theorems. The fact is, the theorems hold for any set of probabilities.^4 Thus, the theorems work equally well ex-ante as ex-post: $E(mx)$, $\beta$, $E(R)$ and so forth can refer to agents’ subjective probability distributions, objective population probabilities, or to the moments realized in a given sample.

Thus, if the law of one price holds in a sample, one may form an $x^*$ from sample moments that satisfies $p(x) = E(x^*x)$, exactly, in that sample, where $p(x)$ refers to observed prices and $E(x^*x)$ refers to the sample average. Equivalently, if the sample covariance matrix of a set of returns is nonsingular, there exists an ex-post mean-variance efficient portfolio for which sample average returns line up exactly with sample regression betas.
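The sample construction can be sketched in a few lines. This is an illustration only: the two three-observation "samples" of two returns are made-up numbers, and the helper names (`second_moment`, `solve2`, `xstar_weights`, `sample_price`) are hypothetical. The sketch forms the weights of $x^* = p'E(xx')^{-1}x$ from sample second moments, so that $p = E(x^*x)$ holds exactly in that sample.

```python
from fractions import Fraction as F

def second_moment(sample):
    """Sample second-moment matrix E_T(x x') for two-element payoff vectors."""
    T = len(sample)
    return [[sum(x[i] * x[j] for x in sample) / T for j in range(2)]
            for i in range(2)]

def solve2(A, b):
    """Solve the 2x2 system A w = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def xstar_weights(sample, prices):
    """Weights w such that x* = w'x, i.e. w = E_T(x x')^{-1} p."""
    return solve2(second_moment(sample), prices)

def sample_price(sample, weights, i):
    """Sample average E_T(x* x_i), with x* = weights'x."""
    T = len(sample)
    return sum((weights[0] * x[0] + weights[1] * x[1]) * x[i]
               for x in sample) / T

# Made-up gross returns on two assets (price 1 each) in two separate samples.
prices = [F(1), F(1)]
sample_a = [(F(11, 10), F(9, 10)), (F(1), F(6, 5)), (F(21, 20), F(1))]
sample_b = [(F(9, 10), F(11, 10)), (F(6, 5), F(1)), (F(1), F(21, 20))]

w_a = xstar_weights(sample_a, prices)
w_b = xstar_weights(sample_b, prices)

# In-sample pricing is exact by construction; out of sample it is not.
in_sample = [sample_price(sample_a, w_a, i) for i in range(2)]
out_of_sample = [sample_price(sample_b, w_a, i) for i in range(2)]
```

Here `in_sample` reproduces `prices` exactly, while `out_of_sample` does not, because the weights fitted to one sample (`w_a`) differ from those fitted to the other (`w_b`).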

This observation points to a great danger in the widespread exercise of searching for and statistically evaluating ad-hoc asset pricing models. Such models are guaranteed empirical success in a sample if one places little enough structure on what is included in the discount factor function. The only reason the model doesn’t work perfectly is the restrictions the researcher has imposed on the number or identity of the factors included in $m$, or the parameters of the function relating the factors to $m$. Since these restrictions are the entire content of the model, they had better be interesting, carefully described and well motivated!

^4 Precisely, any set of probabilities that agree on impossible (zero-probability) events.

Obviously, this is typically not the case or I wouldn’t be making such a fuss about it. Most empirical asset pricing research posits an ad-hoc pond of factors, fishes around a bit in that set, and reports statistical measures that show “success,” in that the model is not statistically rejected in pricing an ad-hoc set of portfolios. The set of discount factors is usually not large enough to give the zero pricing errors we know are possible, yet the boundaries are not clearly defined.

Discipline

What is wrong, you might ask, with finding an ex-post efficient portfolio or $x^*$ that prices assets by construction? Perhaps the lesson we should learn from the existence theorems is to forget about economics, the CAPM, marginal utility and all that, and simply price assets with ex-post mean-variance efficient portfolios that we know set pricing errors to zero!

The mistake is that a portfolio that is ex-post efficient in one sample, and hence prices all assets in that sample, is unlikely to be mean-variance efficient, ex-ante or ex-post, in the next sample, and hence is likely to do a poor job of pricing assets in the future. Similarly, the portfolio $x^* = p'E(xx')^{-1}x$ (using the sample second moment matrix) that is a discount factor by construction in one sample is unlikely to be a discount factor in the next sample; the required portfolio weights $p'E(xx')^{-1}$ change, often drastically, from sample to sample.

For example, suppose the CAPM is true, the market portfolio is ex-ante mean-variance efficient, and sets pricing errors to zero if you use true or subjective probabilities. Nonetheless, the market portfolio is unlikely to be ex-post mean-variance efficient in any given sample. In any sample, there will be lucky winners and unlucky losers. An ex-post mean-variance efficient portfolio will be a Monday-morning quarterback; it will tell you to put large weights on assets that happened to be lucky in a given sample, but are no more likely than indicated by their betas to generate high returns in the future. “Oh, if I had only bought Microsoft in 1982...” is not a useful guide to forming a mean-variance efficient portfolio today. (In fact, mean-reversion in the market and book/market effects in individual stocks suggest that if anything, assets with unusually good returns in the past are likely to do poorly in the future!)

The only solution is to impose some kind of discipline in order to avoid dredging up spuriously good in-sample pricing.

The situation is the same as in traditional regression analysis. Regressions are used to forecast or to explain a variable $y$ by other variables $x$ in a regression $y = x'\beta + \varepsilon$. By blindly including right hand variables, one can produce models with arbitrarily good statistical measures of fit. But this kind of model is typically unstable out of sample or otherwise useless for explanation or forecasting. One has to carefully and thoughtfully limit the search for right hand variables $x$ to produce good models.

What makes for an interesting set of restrictions? Econometricians wrestling with $y = x'\beta + \varepsilon$ have been thinking about this question for about 50 years, and the best answers are 1) use economic theory to carefully specify the right hand side and 2) use a battery of cross-sample and out-of-sample stability checks.

Alas, this advice is hard to follow. Economic theory is usually either silent on what variables to put on the right hand side of a regression, or allows a huge range of variables. The same is true in finance. “What are the fundamental risk factors?” is still an unanswered question. At the same time one can appeal to the APT and ICAPM to justify the inclusion of just about any desirable factor (Fama 1991 calls these theories a “fishing license.”) Thus, you will grow old waiting for theorists to provide useful answers to this kind of question.

Following the purely statistical advice, the battery of cross-sample and out-of-sample tests often reveals the model is unstable, and needs to be changed. Once it is changed, there is no more out-of-sample left to check it. Furthermore, even if one researcher is pure enough to follow the methodology of classical statistics, and wait 50 years for another fresh sample to be available before contemplating another model, his competitors and journal editors are unlikely to be so patient. In practice, then, out of sample validation is not a strong guard against fishing.

Nonetheless, these are the only standards we have to guard against fishing. In my opinion, the best hope for finding pricing factors that are robust out of sample and across different markets is to try to understand the fundamental macroeconomic sources of risk. By this I mean tying asset prices to macroeconomic events, in the way the ill-fated consumption-based model does via $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$. The difficulties of the consumption-based model have made this approach lose favor in recent years. However, the alternative approach is also running into trouble, in that the number and identity of empirically-determined risk factors do not seem stable. Every time a new anomaly or data set pops up, a new set of ad-hoc factors gets created to explain them!

In any case, one should always ask of a factor model, “what is the compelling economic story that restricts the range of factors used?” and / or “what statistical restraints are used” to keep from discovering ex-post mean-variance efficient portfolios, or to ensure that the results will be robust across samples. The existence theorems tell us that the answers to these questions are the only content of the exercise. If the purpose of the model is not just to predict asset prices but also to explain them, this puts an additional burden on economic motivation of the risk factors.

There is a natural resistance to such discipline built in to our current statistical methodology for evaluating models (and papers). When the last author fished around and produced an ad-hoc factor pricing model that generates 1% average pricing errors, it is awfully hard to persuade readers, referees, journal editors, and clients that your economically motivated factor pricing model is better despite 2% average pricing errors. Your model may really be better and will therefore continue to do well out of sample when the fished model falls by the wayside of financial fashion, but it is hard to get past statistical measures of in-sample fit. One hungers for a formal measurement of the number of hurdles imposed on a factor fishing expedition, like the degrees of freedom correction in $\bar{R}^2$. Absent a numerical correction, we have to use judgment to scale back apparent statistical successes by the amount of economic and statistical fishing that produced them.


Mimicking portfolios

The theorem $x^* = \text{proj}(m|X)$ also has interesting implications for empirical work. The pricing implications of any model can be equivalently represented by its factor-mimicking portfolio. If there is any measurement error in a set of economic variables driving $m$, the factor-mimicking portfolios for the true $m$ will price assets better than an estimate of $m$ that uses the measured macroeconomic variables.

Thus, it is probably not a good idea to evaluate economically interesting models with statistical horse races against models that use portfolio returns as factors. Economically interesting models, even if true and perfectly measured, will just equal the performance of their own factor-mimicking portfolios, even in large samples. They will always lose in sample against ad-hoc factor models that find nearly ex-post efficient portfolios.

This said, there is an important place for models that use returns as factors. After we have found the underlying true macro factors, practitioners will be well advised to look at the factor-mimicking portfolio on a day-by-day basis. Good data on the factor-mimicking portfolios will be available on a minute-by-minute basis. For many purposes, one does not have to understand the economic content of a model.

But this fact does not tell us to circumvent the process of understanding the true macroeconomic factors by simply fishing for factor-mimicking portfolios. The experience of practitioners who use factor models seems to bear out this advice. Large commercial factor models resulting from extensive statistical analysis (otherwise known as fishing) perform poorly out of sample, as revealed by the fact that the factors and loadings ($\beta$) change all the time.

Also, models specified with economic fundamentals will always seem to do poorly in a given sample against ad-hoc variables (especially if one fishes an ex-post mean-variance efficient portfolio out of the latter!). But what other source of discipline do we have?

Irrationality and Joint Hypothesis

Finance contains a long history of fighting about “rationality” vs. “irrationality” and “efficiency” vs. “inefficiency” of asset markets. The results of many empirical asset pricing papers are sold as evidence that markets are “inefficient” or that investors are “irrational.” For example, the crash of October 1987, and various puzzles such as the small-firm, book/market, seasonal effects or long-term predictability have all been sold this way.

However, none of these puzzles documents an arbitrage opportunity.^5 Therefore, we know that there is a “rational model” (a stochastic discount factor, an efficient portfolio to use in a single-beta representation) that rationalizes them all. And we can confidently predict this situation to continue; real arbitrage opportunities do not last long! Fama (1970) contains a famous statement of the same point. Fama emphasized that any test of “efficiency” is a joint test of efficiency and a “model of market equilibrium.” Translated, an asset pricing model, or a model of $m$.

^5 The closed-end fund puzzle comes closest since it documents an apparent violation of the law of one price. However, you can’t costlessly short closed-end funds, and we have ignored short sales constraints so far.


But surely markets can be “irrational” or “inefficient” without requiring arbitrage opportunities? Yes, they can, if (and only if) the discount factors that generate asset prices are disconnected from marginal rates of substitution or transformation in the real economy. But now we are right back to specifying and testing economic models of the discount factor! At best, an asset pricing puzzle might be so severe that we can show that the required discount factors are completely “unreasonable” (by some standard) measures of real marginal rates of substitution and/or transformation, but we still have to say something about what a reasonable marginal rate looks like.

In sum, the existence theorems mean that there are no quick proofs of “rationality” or “irrationality.” The only game in town for the purpose of explaining asset prices is thinking about economic models of the discount factor.

The number of factors.

Many asset pricing tests focus on the number of factors required to price a cross-section of assets. The equivalence theorems imply that this is a silly question. A linear factor model $m = b'f$, or its equivalent expected return / beta model $E(R^i) = \gamma + \beta'_{if}\lambda_f$, is not a unique representation. In particular, given any multiple-factor or multiple-beta representation we can easily find a single-beta representation. The single factor $m = b'f$ will price assets just as well as the original factors $f$, as will $x^* = \text{proj}(b'f \,|\, X)$, or the corresponding $R^*$. All three options give rise to single-beta models with exactly the same pricing ability as the multiple factor model. We can also easily find equivalent representations with different numbers (greater than one) of factors. For example, write

$$m = a + b_1 f_1 + b_2 f_2 + b_3 f_3 = a + b_1 f_1 + b_2\left(f_2 + \frac{b_3}{b_2} f_3\right) = a + b_1 f_1 + b_2 \hat{f}_2$$

to reduce a “three factor” model to a “two factor” model. In the ICAPM language, consumption itself could serve as a single state variable, in place of the $S$ state variables presumed to drive it.
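The factor-folding step can be checked on a toy example. The sketch below assumes a hypothetical four-state economy; the probabilities, factor values, coefficients and the test payoff are made-up numbers used only for illustration.

```python
from fractions import Fraction as F

# Hypothetical four-state economy; all numbers are illustrative.
prob = [F(1, 4)] * 4
f1 = [F(1), F(2), F(0), F(-1)]
f2 = [F(1, 2), F(-1), F(1), F(0)]
f3 = [F(0), F(1), F(1), F(2)]
a, b1, b2, b3 = F(1), F(-1, 10), F(1, 5), F(-1, 20)

# "Three factor" discount factor m = a + b1 f1 + b2 f2 + b3 f3, state by state.
m_three = [a + b1 * x + b2 * y + b3 * z for x, y, z in zip(f1, f2, f3)]

# Fold f3 into f2: f2_hat = f2 + (b3 / b2) f3 gives a "two factor" model.
f2_hat = [y + (b3 / b2) * z for y, z in zip(f2, f3)]
m_two = [a + b1 * x + b2 * y for x, y in zip(f1, f2_hat)]

def price(m, payoff):
    """p = E(m x) as a probability-weighted sum over states."""
    return sum(q * mi * xi for q, mi, xi in zip(prob, m, payoff))

payoff = [F(3), F(1), F(2), F(5)]  # an arbitrary test payoff
p_three = price(m_three, payoff)
p_two = price(m_two, payoff)
```

The two discount factors agree state by state, so `p_three` and `p_two` are equal for any payoff; the count of factors carries no pricing content of its own.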

There are times when one is interested in a multiple factor representation. Sometimes the factors have an economic interpretation that is lost on taking a linear combination. But the pure number of pricing factors is not a meaningful question.

Discount factors vs. mean, variance and beta.

The point of the previous chapter was to show how the discount factor, mean-variance, and expected return-beta models are all equivalent representations of asset pricing. It seems a good moment to contrast them as well; to understand why the mean-variance and beta language developed first, and to think about why the discount factor language seems to be taking over.

Asset pricing started by putting mean and variance of returns on the axes, rather than payoff in state 1, payoff in state 2, etc. as we do now. The early asset pricing theorists posed the question just right: they wanted to treat assets in the apples-and-oranges, indifference curve and budget set framework of microeconomics. The problem was, what labels to put on the axis? Clearly, “IBM stock” and “GM stock” is not a good idea; investors do not value securities per se, but value some aspects of the stream of random cash flows that those securities give rise to.

Their brilliant insight was to put the mean and variance of the portfolio return on the axis; to treat these as “hedonics” by which investors valued their portfolios. Investors plausibly want more mean and less variance. They gave investors “utility functions” defined over this mean and variance, just as standard utility functions are defined over apples and oranges. The mean-variance frontier is the “budget set.”

With this focus on portfolio mean and variance, the next step was to realize that each security’s mean return measures its contribution to the portfolio mean, and that regression betas on the overall portfolio give each security’s contribution to the portfolio variance. The mean-return vs. beta description for each security followed naturally.

In a deep sense, the transition from mean-variance frontiers and beta models to discount factors represents the realization that putting consumption in state 1 and consumption in state 2 on the axes (specifying preferences and budget constraints over state-contingent consumption) is a much more natural mapping of standard microeconomics into finance than putting mean, variance, etc. on the axes. If for no other reason, the contingent claim budget constraints are linear, while the mean-variance frontier is not. Thus, I think, the focus on means and variance, the mean-variance frontier and expected return/beta models is all due to an accident of history, that the early asset pricing theorists happened to put mean and variance on the axes rather than state-contingent consumption. If Arrow or Debreu (195x), who invented state-contingent claims, had taken on asset pricing, we might never have heard of these constructs.

Well, here we are, why prefer one language over another? The discount factor language has an advantage for its simplicity, generality, mathematical convenience, and elegance. These virtues are to some extent in the eye of the beholder, but to this beholder, it is inspiring to be able to start every asset pricing calculation with one equation, $p = E(mx)$. This equation covers all assets, including bonds, options, and real investment opportunities, while the expected return/beta formulation is not useful or very cumbersome in the latter applications. Thus, it has seemed that there are several different asset pricing theories: expected return/beta for stocks, yield-curve models for bonds, arbitrage models for options. In fact all three are just cases of $p = E(mx)$. As a particular example, arbitrage, in the precise sense of positive payoffs with negative prices, has not entered the equivalence discussion at all. I don’t know of any way to cleanly graft absence of arbitrage on to expected return/beta models. You have to tack it on after the fact: “by the way, make sure that every portfolio with positive payoffs has a positive price.” It is trivially easy to graft it on to a discount factor model: just add $m > 0$.

The discount factor and state space language also makes it easier to think about different horizons and the present value statement of models. $p = E(mx)$ generalizes quickly to $p_t = E_t \sum_j m_{t,t+j} x_{t+j}$, while returns have to be chained together to think about multiperiod models. Papers are still written arguing about geometric vs. arithmetic average returns for multiperiod discounting.

The choice of language is not about normality or return distributions. There is a lot of confusion about where return distribution assumptions show up in finance. I have made no distributional assumptions in any of the discussion so far. Second moments as in betas and the variance of the mean-variance frontier show up because $p = E(mx)$ involves a second moment. One does not need to assume normality to talk about the mean-variance frontier. Returns on the mean-variance frontier price other assets even when returns are not normally distributed.


Chapter 9. Conditioning information

The asset pricing theory I have sketched so far really describes prices at time $t$ in terms of conditional moments. The investor’s first order conditions are

$$p_t u'(c_t) = \beta E_t\left[u'(c_{t+1}) x_{t+1}\right]$$

where $E_t$ means expectation conditional on the investor’s time $t$ information. Sensibly, the price at time $t$ should be higher if there is information at time $t$ that the discounted payoff is likely to be higher than usual at time $t+1$. The basic asset pricing equation should be

$$p_t = E_t(m_{t+1}x_{t+1}).$$

(Conditional expectation can also be written

$$p_t = E\left[m_{t+1}x_{t+1} \,|\, I_t\right]$$

when it is important to specify the information set $I_t$.)

If payoffs and discount factors were independent and identically distributed (i.i.d.) over time, then conditional expectations would be the same as unconditional expectations and we would not have to worry about the distinction between the two concepts. But stock price/dividend ratios, bond and option prices all change over time, which must reflect changing conditional moments of something on the right hand side.

One approach is to specify and estimate explicit statistical models of conditional distributions of asset payoffs and discount factor variables (e.g. consumption growth). This approach is sometimes used, and is useful in some applications, but it is usually cumbersome. As we make the conditional mean, variance, covariance and other parameters of the distribution of (say) $N$ returns depend flexibly on $M$ information variables, the number of required parameters can quickly exceed the number of observations.

More importantly, this explicit approach typically requires us to assume that investors use the same model of conditioning information that we do. We obviously don’t even observe all the conditioning information used by economic agents, and we can’t include even a fraction of observed conditioning information in our models. The basic feature and beauty of asset prices (like all prices) is that they summarize an enormous amount of information that only individuals see. The events that make the price of IBM stock change by a dollar, like the events that make the price of tomatoes change by 10 cents, are inherently unobservable to economists or would-be social planners (Hayek 194x). Whenever possible, our treatment of conditioning information should allow agents to see more than we do.

If we don’t want to model conditional distributions explicitly, and if we want to avoid assuming that investors only see the variables that we include in an empirical investigation, we eventually have to think about unconditional moments, or at least moments conditioned on less information than agents see. Unconditional implications are also interesting in and of themselves. For example, we may be interested in finding out why the unconditional mean returns on some stock portfolios are higher than others, even if every agent fundamentally seeks high conditional mean returns. Most statistical estimation essentially amounts to characterizing unconditional means, as we will see in the chapter on GMM. Thus, rather than model conditional distributions, this chapter focuses on what implications for unconditional moments we can derive from the conditional theory.

9.1 Scaled payoffs

$$p_t = E_t(m_{t+1}x_{t+1}) \;\Rightarrow\; E(p_t z_t) = E(m_{t+1}x_{t+1}z_t)$$

One can incorporate conditioning information by adding scaled payoffs and doing everything unconditionally. I interpret scaled returns as payoffs to managed portfolios.

9.1.1 Conditioning down

The unconditional implications of any pricing model are pretty easy to state. From

$$p_t = E_t(m_{t+1}x_{t+1})$$

we can take unconditional expectations to obtain⁶

$$E(p_t) = E(m_{t+1}x_{t+1}). \tag{9.1}$$

Thus, if we just interpret $p$ to stand for $E(p_t)$, everything we have done above applies to unconditional moments. In the same way, we can also condition down from agents’ fine information sets $\Omega_t$ to coarser sets $I_t$ that we observe,

$$p_t = E(m_{t+1}x_{t+1} \mid \Omega) \;\Rightarrow\; E(p_t \mid I \subset \Omega) = E(m_{t+1}x_{t+1} \mid I \subset \Omega)$$

$$\Rightarrow\; p_t = E(m_{t+1}x_{t+1} \mid I_t \subset \Omega_t) \quad \text{if } p_t \in I_t.$$

⁶ We need a small technical assumption that the unconditional moment or moment conditioned on a coarser information set exists. For example, if $X$ and $Y$ are independent normal $(0,1)$, then $E(X/Y \mid Y) = 0$ but $E(X/Y)$ is infinite.

In making the above statements I used the law of iterated expectations, which is important enough to highlight. This law states that if you take an expected value using less information of an expected value that is formed on more information, you get back the expected value using less information. Your best forecast today of your best forecast tomorrow is the same as your best forecast today. In various useful guises,

$$E(E_t(x)) = E(x),$$

$$E_{t-1}(E_t(x_{t+1})) = E_{t-1}(x_{t+1}),$$

$$E\left[E(x \mid \Omega) \mid I \subset \Omega\right] = E\left[x \mid I\right].$$
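The law of iterated expectations can be checked by direct enumeration. The following is a minimal sketch on a small made-up discrete distribution; the state labels, values, and probabilities in `OUTCOMES` are purely illustrative assumptions, and the helper `conditional_expectation` is a hypothetical name, not something from the text. The coarse label plays the role of $I$ and the fine label the role of $\Omega$.

```python
from fractions import Fraction

# Hypothetical joint distribution: (coarse state, fine state, value of x, probability).
# Each fine state belongs to exactly one coarse state, so the coarse
# information set is contained in the fine one.
OUTCOMES = [
    ("low", "a", 1, Fraction(1, 8)),
    ("low", "a", 3, Fraction(1, 8)),
    ("low", "b", 2, Fraction(1, 4)),
    ("high", "c", 5, Fraction(1, 4)),
    ("high", "c", 9, Fraction(1, 8)),
    ("high", "d", 4, Fraction(1, 8)),
]


def conditional_expectation(value_of, label_of):
    """Return {label: E[value | label]} for the distribution in OUTCOMES."""
    mass, weighted = {}, {}
    for outcome in OUTCOMES:
        label, prob = label_of(outcome), outcome[3]
        mass[label] = mass.get(label, 0) + prob
        weighted[label] = weighted.get(label, 0) + prob * value_of(outcome)
    return {label: weighted[label] / mass[label] for label in mass}


def coarse(outcome):
    return outcome[0]


def fine(outcome):
    return outcome[1]


def x(outcome):
    return outcome[2]


e_x_fine = conditional_expectation(x, fine)      # E(x | Omega)
e_x_coarse = conditional_expectation(x, coarse)  # E(x | I)

# Coarse-information forecast of the fine-information forecast: E[E(x | Omega) | I]
e_of_e = conditional_expectation(lambda o: e_x_fine[fine(o)], coarse)

print(e_of_e == e_x_coarse)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than a tolerance.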

9.1.2 Instruments and managed portfolios

We can do more than just condition down. Suppose we multiply the payoff and price by an instrument $z_t$ observed at time $t$. Then,

$$z_t p_t = E_t(m_{t+1}x_{t+1}z_t)$$

and, taking unconditional expectations,

$$E(p_t z_t) = E(m_{t+1}x_{t+1}z_t). \tag{9.2}$$

This is an additional implication of the conditional model, not captured by just conditioning down as in (9.1). This trick originates from the GMM method of estimating asset pricing models, discussed below. The word instruments for the $z$ variables comes from the instrumental variables estimation heritage of GMM.

To think about equation (9.2), group $(x_{t+1}z_t)$. Call this product a payoff $x = x_{t+1}z_t$, with price $p = E(p_t z_t)$. Then (9.2) reads

$$p = E(mx)$$

once again. Rather than thinking about (9.2) as an instrumental variables estimate of a conditional model, we can think of it as a price and a payoff, and apply all the asset pricing theory directly.

This interpretation is not as artificial as it sounds. $z_t x_{t+1}$ are the payoffs to managed portfolios. An investor who observes $z_t$ can, rather than “buy and hold,” invest in an asset according to the value of $z_t$. For example, if a high value of $z_t$ forecasts that asset returns are likely to be high the next period, the investor might buy more of the asset when $z_t$ is high and vice versa. If the investor follows a linear rule, he puts $z_t p_t$ dollars into the asset each period and receives $z_t x_{t+1}$ dollars the next period.

This all sounds new and different, but practically every test uses managed portfolios. For example, the size, beta, industry, book/market and so forth portfolios of stocks are all managed portfolios, since their composition changes every year in response to conditioning information: the size, beta, etc. of the individual stocks. This idea is also closely related to the deep idea of dynamic spanning. Markets that are apparently very incomplete can in reality provide many more state-contingencies through dynamic (conditioned on information) trading strategies.

Equation (9.2) offers a very simple view of how to incorporate the extra information in conditioning information: Add managed portfolio payoffs, and proceed with unconditional moments as if conditioning information didn’t exist!

Linearity is not important. If the investor wanted to place, say, $2 + 3z^2$ dollars in the asset, we could capture this desire with an instrument $z_2 = 2 + 3z^2$. Nonlinear (measurable) transformations of time-$t$ random variables are again random variables.

We can thus incorporate conditioning information while still looking at unconditional moments instead of conditional moments, without any of the statistical machinery of explicit models with time-varying moments. The only subtleties are: 1) The set of asset payoffs expands dramatically, since we can consider all managed portfolios as well as basic assets, potentially multiplying every asset return by every information variable. 2) Expected prices of managed portfolios show up for $p$ instead of just $p = 0$ and $p = 1$ if we started with basic asset returns and excess returns.
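To see what the scaled moment condition (9.2) adds, here is a minimal sketch on a made-up two-by-two economy. All numbers (`Z_STATES`, `U_SHOCKS`, `THETA`, the return process) and function names are illustrative assumptions. The discount factor `true_m` is constructed so that $E_t(m_{t+1}R_{t+1}) = 1$ in every state; `constant_m` prices the return only on average. Both pass the unscaled test, but only the first prices the managed portfolio $z_t R_{t+1}$.

```python
import itertools

Z_STATES = [0.5, 1.5]    # time-t instrument values, equally likely (assumed)
U_SHOCKS = [-1.0, 1.0]   # time-(t+1) shock values, equally likely (assumed)
THETA = 0.3              # hypothetical risk-price parameter


def gross_return(z, u):
    # The conditional mean return rises with z, so z forecasts returns.
    return 1.0 + 0.05 * z + 0.10 * u


def true_m(z, u):
    # Scaled so that E_t(m R) = 1 in each z-state:
    # E_t[(1 - THETA*u) * R] = (1 + 0.05*z) - 0.10*THETA.
    return (1.0 - THETA * u) / (1.0 + 0.05 * z - 0.10 * THETA)


def constant_m(z, u):
    # Fixed discount factor c = 1 / E(R); here E(R) = 1.05 because E(z) = 1.
    return 1.0 / 1.05


def expectation(g):
    """Unconditional expectation over the four equally likely (z, u) outcomes."""
    outcomes = list(itertools.product(Z_STATES, U_SHOCKS))
    return sum(g(z, u) for z, u in outcomes) / len(outcomes)


def scaled_pricing_error(m, instrument):
    """E(m_{t+1} R_{t+1} z_t) - E(z_t) for a return (price 1) scaled by z_t."""
    lhs = expectation(lambda z, u: m(z, u) * gross_return(z, u) * instrument(z))
    rhs = expectation(lambda z, u: instrument(z))
    return lhs - rhs


def unscaled(z):
    return 1.0


def scaled(z):
    return z


for name, m in [("true_m", true_m), ("constant_m", constant_m)]:
    print(name,
          round(scaled_pricing_error(m, unscaled), 6),
          round(scaled_pricing_error(m, scaled), 6))
```

The design choice is deliberate: a model can look fine on unscaled returns and still fail once a forecasting variable is used as an instrument, which is the extra content of (9.2).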

9.2 Sufficiency of adding scaled returns

Checking the expected price of all managed portfolios is, in principle, sufficient to check all the implications of conditioning information.

$$E(z_t) = E(m_{t+1}R_{t+1}z_t)\ \ \forall z_t \in I_t \;\Rightarrow\; 1 = E(m_{t+1}R_{t+1} \mid I_t)$$

$$E(p_t) = E(m_{t+1}x_{t+1})\ \ \forall x_{t+1} \in \underline{X}_{t+1} \;\Rightarrow\; p_t = E(m_{t+1}x_{t+1} \mid I_t)$$

We have shown that we can derive some extra implications from the presence of conditioning information by adding scaled returns. But does this exhaust the implications of conditioning information? Are we missing something important by relying on this trick? The answer is, in principle, no.

I rely on the following mathematical fact: The conditional expectation of a variable $y_{t+1}$ given an information set $I_t$, $E(y_{t+1} \mid I_t)$, is equal to a regression forecast of $y_{t+1}$ using every variable $z_t \in I_t$. Now, “every variable” means every variable and every nonlinear (measurable) transformation of every variable, so there are a lot of variables in this regression! (The word projection and the notation $\operatorname{proj}(y_{t+1} \mid z_t)$ are used to distinguish the best forecast of $y_{t+1}$ using only linear combinations of $z_t$ from the conditional expectation.) Applying this fact to our case, let $y_{t+1} = m_{t+1}R_{t+1} - 1$. Then $E[(m_{t+1}R_{t+1} - 1)z_t] = 0$ for every $z_t \in I_t$ implies $1 = E(m_{t+1}R_{t+1} \mid I_t)$. Thus, no implications are lost in principle by looking at scaled returns.
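One way to see why orthogonality to every time-$t$ variable forces the conditional mean to zero is the short argument below. It is a sketch that assumes $y_{t+1}$ has a finite second moment (an assumption added here so that all the expectations exist); the second equality uses the law of iterated expectations.

```latex
\begin{align*}
&\text{Suppose } E(y_{t+1} z_t) = 0 \text{ for every } z_t \in I_t. \\
&\text{Choose the particular instrument } z_t = E(y_{t+1} \mid I_t),
  \text{ which is itself a time-}t\text{ variable.} \\
&0 = E\big[\, y_{t+1}\, E(y_{t+1} \mid I_t) \,\big]
   = E\big[\, E(y_{t+1} \mid I_t)^2 \,\big]
   \;\Rightarrow\; E(y_{t+1} \mid I_t) = 0 .
\end{align*}
```

With $y_{t+1} = m_{t+1}R_{t+1} - 1$ this gives $1 = E(m_{t+1}R_{t+1} \mid I_t)$.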


Another way of looking at the same idea is that $R_{t+1}z_t$ is a payoff available at time $t+1$. Thus, the space of all payoffs $\underline{X}_{t+1}$ should be understood to include the time-$(t+1)$ payoffs you can generate with a basis set of assets $R_{t+1}$ and all dynamic strategies that use information in the set $I_t$. With that definition of the space $\underline{X}_{t+1}$ we can write the sufficiency of scaled returns with the more general second equality above.

“All linear and nonlinear transformations of all variables observed at time $t$” sounds like a lot of instruments, and it is. But there is a practical limit to the number of instruments $z_t$ one needs to scale by, since only variables that forecast returns or $m$ (or their higher moments and co-moments) add any information.

Since adding instruments is the same thing as including potential managed portfolios, thoughtfully choosing a few instruments is the same thing as the thoughtful choice of a few assets or portfolios that one makes in any test of an asset pricing model. Even when evaluating completely unconditional asset pricing models, one always forms portfolios and omits many possible assets from analysis. Few studies, in fact, go beyond checking whether a model correctly prices 10-25 stock portfolios and a few bond portfolios. Implicitly, one feels that the chosen payoffs do a pretty good job of spanning the set of available risk-loadings (mean returns) and hence that adding additional assets will not affect the results. Nonetheless, since data are easily available on all 2000 or so NYSE stocks, plus AMEX and NASDAQ stocks, to say nothing of government and corporate bonds, returns of mutual funds, foreign exchange, foreign equities, real investment opportunities, etc., the use of a few portfolios means that a tremendous number of potential asset payoffs are left out in an ad-hoc manner.

In a similar manner, if one had a small set of instruments that capture all the predictability of discounted returns $m_{t+1}R_{t+1}$, then there would be no need to add more instruments. Thus, we carefully but arbitrarily select a few instruments that we think do a good job of characterizing the conditional distribution of returns. Exclusion of potential instruments is exactly the same thing as exclusion of assets. It is no better founded, but the fact that it is a common sin may lead one to worry less about it.

There is nothing special about unscaled returns, and no economic reason to place them above scaled returns. A mutual fund might come into being that follows the managed portfolio strategy, and then its unscaled returns would be the same as an original scaled return. Models that cannot price scaled returns are no more interesting than models that can only price (say) stocks with first letter A through L. (There may be econometric reasons to trust results for nonscaled returns a bit more, but we haven’t gotten to statistical issues yet.)

Of course, the other way to incorporate conditioning information is by constructing explicit parametric models of conditional distributions. With this procedure one can in fact check all of a model’s implications about conditional moments. However, the parametric model may be incorrect, or may not reflect some variable used by investors. Including instruments may not be as efficient, but it is still consistent if the parametric model is incorrect. The wrong parametric model of conditional distributions may lead to inconsistent estimates. In addition, one avoids estimating nuisance parameters of the parametric distribution model.


9.3 Conditional and unconditional models

A conditional factor model does not imply a fixed-weight or unconditional factor model:

$m_{t+1} = b_t' f_{t+1},\ p_t = E_t(m_{t+1}x_{t+1})$ does not imply that $\exists\, b$ s.t. $m_{t+1} = b' f_{t+1},\ E(p_t) = E(m_{t+1}x_{t+1})$.

$E_t(R_{t+1}) = \beta_t'\lambda_t$ does not imply $E(R_{t+1}) = \beta'\lambda$.

Conditional mean-variance efficiency does not imply unconditional mean-variance efficiency.

The converse statements are true, if managed portfolios are included.

For explicit discount factor models (models whose parameters are constant over time), the fact that one looks at conditional vs. unconditional implications makes no difference to the statement of the model:

$$p_t = E_t(m_{t+1}x_{t+1}) \;\Rightarrow\; E(p_t) = E(m_{t+1}x_{t+1})$$

and that’s it. Examples include the consumption-based model with power utility, $m_{t+1} = \beta(c_{t+1}/c_t)^{-\gamma}$, and the log utility CAPM, $m_{t+1} = 1/R^W_{t+1}$.

However, linear factor models include parameters that may vary over time and as functions of conditioning information. In these cases the transition from conditional to unconditional moments is much more subtle. We cannot easily condition down the model at the same time as the prices and payoffs.

9.3.1 Conditional vs. unconditional factor models in discount factor language

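As an example, consider the CAPM

$$m = a - bR^W$$

where $R^W$ is the return on the market or wealth portfolio. We can find $a$ and $b$ from the condition that this model correctly price any two returns, for example $R^W$ itself and a risk-free rate:

$$\left\{\begin{array}{l} 1 = E_t(m_{t+1}R^W_{t+1}) \\[1ex] 1 = E_t(m_{t+1})R^f_t \end{array}\right. \;\Rightarrow\; \left\{\begin{array}{l} a = \dfrac{1}{R^f_t} + bE_t(R^W_{t+1}) \\[2ex] b = \dfrac{E_t(R^W_{t+1}) - R^f_t}{R^f_t\,\sigma_t^2(R^W_{t+1})} \end{array}\right. \tag{9.3}$$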
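As a quick check on (9.3), the sketch below computes $a$ and $b$ from a set of conditional moments and verifies both pricing conditions, using $E_t[(R^W)^2] = \sigma_t^2(R^W) + E_t(R^W)^2$. The function name `capm_weights` and the numerical inputs are illustrative assumptions, not values from the text.

```python
def capm_weights(mean_rw, var_rw, rf):
    """Solve 1 = E_t(m R^W) and 1 = E_t(m) R^f for the weights in m = a - b R^W."""
    b = (mean_rw - rf) / (rf * var_rw)
    a = 1.0 / rf + b * mean_rw
    return a, b


# Hypothetical conditional moments at one date.
mean_rw, var_rw, rf = 1.08, 0.04, 1.02
a, b = capm_weights(mean_rw, var_rw, rf)

# E_t(m R^W) = a E_t(R^W) - b E_t[(R^W)^2]
price_of_market = a * mean_rw - b * (var_rw + mean_rw ** 2)
# E_t(m) R^f = (a - b E_t(R^W)) R^f
price_of_riskfree = (a - b * mean_rw) * rf

print(a > 0, b > 0, round(price_of_market, 10), round(price_of_riskfree, 10))
```

Both weights come out positive here because the assumed expected market return exceeds the risk-free rate.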

As you can see, $b > 0$ and $a > 0$: to make a payoff proportional to the minimum second-moment return (on the inefficient part of the mean-variance frontier) we need a portfolio long the risk free rate and short the market $R^W$.

More importantly for our current purposes, $a$ and $b$ vary over time, as $E_t(R^W_{t+1})$, $\sigma_t^2(R^W_{t+1})$, and $R^f_t$ vary over time. If it is to price assets conditionally, the CAPM must be a linear factor model with time-varying weights, of the form

$$m_{t+1} = a_t - b_t R^W_{t+1}.$$

This fact means that we can no longer transparently condition down. The statement that

$$1 = E_t\left[(a_t - b_t R^W_{t+1})R_{t+1}\right]$$

does not imply that we can find constants $a$ and $b$ so that

$$1 = E\left[(a - bR^W_{t+1})R_{t+1}\right].$$

Just try it. Taking unconditional expectations,

$$1 = E\left[(a_t - b_t R^W_{t+1})R_{t+1}\right] = E\left[a_t R_{t+1} - b_t R^W_{t+1}R_{t+1}\right]$$

$$= E(a_t)E(R_{t+1}) - E(b_t)E(R^W_{t+1}R_{t+1}) + \operatorname{cov}(a_t, R_{t+1}) - \operatorname{cov}(b_t, R^W_{t+1}R_{t+1}).$$

Thus, the unconditional model

$$1 = E\left[\left(E(a_t) - E(b_t)R^W_{t+1}\right)R_{t+1}\right]$$

only holds if the covariance terms above happen to be zero. Since $a_t$ and $b_t$ are formed from conditional moments of returns, the covariances will not, in general, be zero.
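A small numerical sketch makes the failure concrete. It assumes two equally likely information states with made-up conditional moments, takes the test asset to be the market return itself, and uses the convention $m_{t+1} = a_t - b_t R^W_{t+1}$ with weights solved from the two conditional pricing conditions. The names `STATES`, `capm_weights`, and `second_moment` are hypothetical.

```python
def capm_weights(mean_rw, var_rw, rf):
    """Conditional weights for m = a - b R^W, from 1 = E_t(m R^W) and 1 = E_t(m) R^f."""
    b = (mean_rw - rf) / (rf * var_rw)
    a = 1.0 / rf + b * mean_rw
    return a, b


def average(values):
    return sum(values) / len(values)


def second_moment(state):
    # E_t[(R^W)^2] = var_t(R^W) + E_t(R^W)^2
    return state["var_rw"] + state["mean_rw"] ** 2


# Two equally likely information states; all numbers are hypothetical.
STATES = [
    {"mean_rw": 1.04, "var_rw": 0.02, "rf": 1.01},
    {"mean_rw": 1.12, "var_rw": 0.06, "rf": 1.03},
]

weights = [capm_weights(**state) for state in STATES]

# Conditional pricing errors for the market return: E_t[(a_t - b_t R^W) R^W] - 1
conditional_errors = [
    a_t * state["mean_rw"] - b_t * second_moment(state) - 1.0
    for (a_t, b_t), state in zip(weights, STATES)
]

# Fixed-weight model that uses the average weights E(a_t) and E(b_t).
a_bar = average([a_t for a_t, _ in weights])
b_bar = average([b_t for _, b_t in weights])
unconditional_error = (
    a_bar * average([state["mean_rw"] for state in STATES])
    - b_bar * average([second_moment(state) for state in STATES])
    - 1.0
)

print(conditional_errors, unconditional_error)
```

The conditional errors are zero by construction, while the fixed-weight error is not, because the weights covary with the conditional moments of the return being priced.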

On the other hand, suppose it is true that $a_t$ and $b_t$ are constant over time. Then

$$1 = E_t\left[(a - bR^W_{t+1})R_{t+1}\right]$$

does imply

$$1 = E\left[(a - bR^W_{t+1})R_{t+1}\right],$$

just like any other constant-parameter factor pricing model. Furthermore, the latter unconditional model implies the former conditional model, if the latter holds for all managed portfolios.

9.3.2 Conditional vs. unconditional in an expected return / beta model

To put the same observation in beta-pricing language,

$$E_t(R^i) = R^f_t + \beta_t\lambda_t \tag{9.4}$$

does not imply that

$$E(R^i) = \gamma + \beta\lambda. \tag{9.5}$$

The reason is that $\beta_t$ and $\beta$ represent conditional and unconditional regression coefficients respectively.

Again, if returns and factors are i.i.d., the unconditional model can go through. In that case, $\operatorname{cov}(\cdot) = \operatorname{cov}_t(\cdot)$, $\operatorname{var}(\cdot) = \operatorname{var}_t(\cdot)$, so the unconditional regression beta is the same as the conditional regression beta, $\beta = \beta_t$. Then, we can take expectations of (9.4) to get (9.5), with $\lambda = E(\lambda_t)$. But to condition down in this way, the covariance and variance must each be constant over time. It is not enough that their ratio, or conditional betas, are constant. If $\operatorname{cov}_t$ and $\operatorname{var}_t$ change over time, then the unconditional regression beta, $\beta = \operatorname{cov}/\operatorname{var}$, is not equal to the average conditional regression beta, $E(\beta_t)$ or $E(\operatorname{cov}_t/\operatorname{var}_t)$. Some models specify that $\operatorname{cov}_t$ and $\operatorname{var}_t$ vary over time, but $\operatorname{cov}_t/\operatorname{var}_t$ is a constant. This specification still does not imply that the unconditional regression beta $\beta \equiv \operatorname{cov}/\operatorname{var}$ is equal to the constant $\operatorname{cov}_t/\operatorname{var}_t$. Similarly, it is not enough that $\lambda$ be constant, since $E(\beta_t) \neq \beta$. The betas must be regression coefficients, not just numbers.

If the betas do not vary over time, the $\lambda_t$ may still vary and $\lambda = E(\lambda_t)$.
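The point that a constant ratio $\operatorname{cov}_t/\operatorname{var}_t$ does not pin down the unconditional regression beta can be illustrated with the laws of total variance and covariance. In the sketch below, everything is a made-up assumption: two equally likely information states, a factor with state-dependent conditional mean `mu` and variance `v`, and a return $R = \alpha_s + \beta f + \text{noise}$ with the same conditional slope `BETA` in both states.

```python
BETA = 1.0            # conditional regression beta, identical in both states (assumed)
mu = [0.00, 0.10]     # conditional mean of the factor in each state (assumed)
v = [0.01, 0.04]      # conditional variance of the factor in each state (assumed)
alpha = [0.02, 0.00]  # state-dependent intercept of the return (assumed)


def average(values):
    return sum(values) / len(values)


# Conditional moments: cov_t(R, f) = BETA * var_t(f), so cov_t / var_t = BETA.
cond_cov = [BETA * v_s for v_s in v]
cond_beta = [c_s / v_s for c_s, v_s in zip(cond_cov, v)]
cond_mean_r = [a_s + BETA * m_s for a_s, m_s in zip(alpha, mu)]

# Law of total covariance: cov(R, f) = E[cov_t(R, f)] + cov(E_t R, E_t f)
cov_of_means = (
    average([r_s * m_s for r_s, m_s in zip(cond_mean_r, mu)])
    - average(cond_mean_r) * average(mu)
)
uncond_cov = average(cond_cov) + cov_of_means

# Law of total variance: var(f) = E[var_t(f)] + var(E_t f)
var_of_means = average([m_s ** 2 for m_s in mu]) - average(mu) ** 2
uncond_var = average(v) + var_of_means

uncond_beta = uncond_cov / uncond_var
print(cond_beta, uncond_beta)
```

The conditional beta equals `BETA` in each state, yet the unconditional beta differs from it, because the conditional means of the return and the factor also move across states.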

9.3.3 A precise statement.

Let’s formalize these observations somewhat. Let $\underline{X}$ denote the space of all portfolios of the primitive assets, including managed portfolios in which the weights may depend on conditioning information, i.e. scaled returns.

A conditional factor pricing model is a model $m_{t+1} = a_t + b_t' f_{t+1}$ that satisfies $p_t = E_t(m_{t+1}x_{t+1})$ for all $x_{t+1} \in \underline{X}$.

An unconditional factor pricing model is a model $m_{t+1} = a + b' f_{t+1}$ that satisfies $E(p_t) = E(m_{t+1}x_{t+1})$ for all $x_{t+1} \in \underline{X}$. It might be more appropriately called a fixed-weight factor pricing model.

Given these definitions it’s almost trivial that the unconditional model is just a special case of the conditional model, one that happens to have fixed weights. Thus, a conditional factor model does not imply an unconditional factor model (because the weights may vary) but an unconditional factor model does imply a conditional factor model.

There is one important subtlety. The payoff space $\underline{X}$ is common, and contains all managed portfolios in both cases. The payoff space for the unconditional factor pricing model is not just fixed combinations of a set of basis assets. For example, we might simply check that the static (constant $a$, $b$) CAPM captures the unconditional mean returns of a set of assets. If this model does not also price those assets scaled by instruments, then it is not a conditional model, or, as I argued above, really a valid factor pricing model at all.

Of course, everything applies for the relation between a conditional factor pricing model using a fine information set (like investors’ information sets) and conditional factor pricing models using coarser information sets (like ours). If you think a set of factors prices assets with respect to investors’ information, that does not mean the same set of factors prices assets with respect to our, coarser, information sets.

9.3.4 Mean-variance frontiers

Define the conditional mean-variance frontier as the set of returns that minimize $\operatorname{var}_t(R_{t+1})$ given $E_t(R_{t+1})$ (including the “inefficient” lower segment as usual). Define the unconditional mean-variance frontier as the set of returns, including managed portfolio returns, that minimize $\operatorname{var}(R_{t+1})$ given $E(R_{t+1})$. These two frontiers are related by:

If a return is on the unconditional mean-variance frontier, it is on the conditional mean-variance frontier.

However,

If a return is on the conditional mean-variance frontier, it need not be on the unconditional mean-variance frontier.

These statements are exactly the opposite of what you first expect from the language. The law of iterated expectations $E(E_t(x)) = E(x)$ leads you to expect that “conditional” should imply “unconditional.” But we are studying the conditional vs. unconditional mean-variance frontier, not raw conditional and unconditional expectations, and it turns out that exactly the opposite words apply. Of course “unconditional” can also mean “conditional on a coarser information set.”

Again, keep in mind that the unconditional mean-variance frontier includes returns on managed portfolios. This definition is eminently reasonable. If you’re trying to minimize variance for given mean, why tie your hands to fixed-weight portfolios? Equivalently, why not allow yourself to include in your portfolio the returns of mutual funds whose advisers promise the ability to adjust portfolios based on conditioning information?

You could form a mean-variance frontier of fixed-weight portfolios of a basis set of assets, and this is what many people often mean by “unconditional mean-variance frontier.” The return on the true unconditional mean-variance frontier will, in general, include some managed portfolio returns, and so will lie outside this mean-variance frontier of fixed-weight portfolios. Conversely, a return on the fixed-weight portfolio MVF is, in general, not on the unconditional or conditional mean-variance frontier. All we know is that the fixed-weight frontier lies inside the other two. It may touch, but it need not. This is not to say the fixed-weight unconditional frontier is uninteresting. For example, returns on this frontier will price fixed-weight portfolios of the basis assets. The point is that this frontier has no connection to the other two frontiers. In particular, a conditionally mean-variance efficient return (conditional CAPM) need not unconditionally price the fixed-weight portfolios.


I offer several ways to see this important statement.

Using the connection to factor models

We have seen that the conditional CAPM $m_{t+1} = a_t - b_t R^W_{t+1}$ does not imply an unconditional CAPM $m_{t+1} = a - bR^W_{t+1}$. We have seen that the existence of such a conditional factor model is equivalent to the statement that the return $R^W_{t+1}$ lies on the conditional mean-variance frontier, and the existence of an unconditional factor model $m_{t+1} = a - bR^W_{t+1}$ is equivalent to the statement that $R^W$ is on the unconditional mean-variance frontier. Then, from the “trivial” fact that an unconditional factor model is a special case of a conditional one, we know that $R^W$ on the unconditional frontier implies $R^W$ on the conditional frontier but not vice versa.

Using the orthogonal decomposition

We can see the relation between conditional and unconditional mean-variance frontiers using the orthogonal decomposition characterization of mean-variance efficiency given above. This beautiful proof is the main point of Hansen and Richard (1987).

By the law of iterated expectations, $x^*$ and $R^*$ generate expected prices and $R^{e*}$ generates unconditional means as well as conditional means:

$$E\left[p = E_t(x^* x)\right] \;\Rightarrow\; E(p) = E(x^* x)$$

$$E\left[E_t(R^{*2}) = E_t(R^* R)\right] \;\Rightarrow\; E(R^{*2}) = E(R^* R)$$

$$E\left[E_t(R^{e*}R^e) = E_t(R^e)\right] \;\Rightarrow\; E(R^{e*}R^e) = E(R^e)$$

This fact is subtle and important. For example, starting with $x^* = p_t' E_t(x_{t+1}x_{t+1}')^{-1}x_{t+1}$, you might think we need a different $x^*$, $R^*$, $R^{e*}$ to represent expected prices and unconditional means, using unconditional probabilities to define inner products. The three lines above show that this is not the case. The same old $x^*$, $R^*$, $R^{e*}$ represent conditional as well as unconditional prices and means.

Recall that a return is mean-variance efficient if and only if it is of the form

$$R^{mv} = R^* + wR^{e*}.$$

Thus, $R^{mv}$ is conditionally mean-variance efficient if $w$ is any number in the time-$t$ information set,

conditional frontier: $R^{mv}_{t+1} = R^*_{t+1} + w_t R^{e*}_{t+1}$,

and $R^{mv}$ is unconditionally mean-variance efficient if $w$ is any constant,

unconditional frontier: $R^{mv}_{t+1} = R^*_{t+1} + wR^{e*}_{t+1}$.

Constants are in the time-$t$ information set; time-$t$ random variables are not necessarily constant. Thus unconditional efficiency (including managed portfolios) implies conditional efficiency but not vice versa. As with the factor models, once you see the decomposition, it is a trivial argument about whether a weight is constant or time-varying.

Brute force and examples.

If you’re still puzzled, an additional argument by brute force may be helpful.

If a return is on the unconditional MVF it must be on the conditional MVF at each date. If not, you could improve the unconditional mean-variance trade-off by moving to the conditional MVF at each date. Minimizing unconditional variance given mean is the same as minimizing unconditional second moment given mean,

$$\min E(R^2) \quad \text{s.t. } E(R) = \mu.$$

Writing the unconditional moment in terms of conditional moments, the problem is

$$\min E\left[E_t(R^2)\right] \quad \text{s.t. } E\left[E_t(R)\right] = \mu.$$

Now, suppose you could lower $E_t(R^2)$ at one date $t$ without affecting $E_t(R)$ at that date. This change would lower the objective, without changing the constraint. Thus, you should have done it: you should have picked returns on the conditional mean-variance frontiers.

It almost seems that by reversing the argument we can show that conditional efficiency implies unconditional efficiency, but it doesn’t. Just because you have minimized $E_t(R^2)$ for a given value of $E_t(R)$ at each date $t$ does not imply that you have minimized $E(R^2)$ for a given value of $E(R)$. In showing that unconditional efficiency implies conditional efficiency we held fixed $E_t(R)$ at each date at $\mu$, and showed it is a good idea to minimize $\sigma_t(R)$. In trying to go backwards, the problem is that a given value of $E(R)$ does not specify what $E_t(R)$ should be at each date. We can increase $E_t(R)$ in one conditioning information set and decrease it in another, leaving the return on the conditional MVF.

Figure 20 presents an example. Return B is conditionally mean-variance efficient. It also has zero unconditional variance, so it is the unconditionally mean-variance efficient return at the expected return shown. Return A is on the conditional mean-variance frontiers, and has the same unconditional expected return as B. But return A has some unconditional variance, and so is inside the unconditional mean-variance frontier.
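A numerical version of this example can be sketched as follows. The numbers are assumptions chosen only for illustration: two equally likely information sets, a constant risk-free return `RF` playing the role of B, and conditional frontiers of the form $\sigma_t = |E_t(R) - R^f| / S$ with a common slope `SHARPE`. Return A is placed on the conditional frontier in each information set, on the lower segment in the first and the upper segment in the second.

```python
RF = 1.02                   # return B: constant, so zero unconditional variance (assumed)
SHARPE = 0.5                # slope of each conditional frontier (assumed)
cond_mean_a = [1.00, 1.04]  # E_t(R) of return A in each information set (assumed)


def average(values):
    return sum(values) / len(values)


# On a frontier with a risk-free asset, sigma_t(R) = |E_t(R) - RF| / SHARPE.
cond_sd_a = [abs(mean_s - RF) / SHARPE for mean_s in cond_mean_a]

uncond_mean_a = average(cond_mean_a)

# Law of total variance: var(R) = E[var_t(R)] + var(E_t(R))
uncond_var_a = (
    average([sd_s ** 2 for sd_s in cond_sd_a])
    + average([(mean_s - uncond_mean_a) ** 2 for mean_s in cond_mean_a])
)

uncond_mean_b, uncond_var_b = RF, 0.0

print(uncond_mean_a, uncond_var_a, uncond_mean_b, uncond_var_b)
```

A and B share the same unconditional mean, but A carries positive unconditional variance, so A lies inside the unconditional frontier even though it is conditionally efficient in every information set.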

As a second example, the riskfree rate is only on the unconditional mean-variance frontier if it is a constant. Remember the expression (7.15) for the risk free rate,

$$R^f = R^* + R^f R^{e*}.$$

The unconditional mean-variance frontier is $R^* + wR^{e*}$ with $w$ a constant. Thus, the riskfree rate is only unconditionally mean-variance efficient if it is a constant.


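[Figure 20: conditional mean-variance frontiers for Info. set 1 and Info. set 2, with axes $\sigma_t(R)$ and $E_t(R)$; return A is marked on each frontier and return B is marked once.]

Figure 20. Return A is on the conditional mean-variance frontiers but not on the unconditional mean-variance frontier.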


9.3.5 Implications: Hansen-Richard Critique.

Many models, such as the CAPM, imply a conditional linear factor model $m_{t+1} = a_t + b_t' f_{t+1}$. These theorems show that such a model does not imply an unconditional model. Equivalently, if the model predicts that the market portfolio is conditionally mean-variance efficient, this does not imply that the market is unconditionally mean-variance efficient. We often test the CAPM by seeing if it explains the average returns of some portfolios or (equivalently) if the market is on the unconditional mean-variance frontier. The CAPM may quite well be true (conditionally) and fail these tests; many assets may do better in terms of unconditional mean vs. unconditional variance.

The situation is even worse than these comments suggest, and is not repaired by simple inclusion of some conditioning information. Models such as the CAPM imply a conditional linear factor model with respect to investors’ information sets. However, the best we can hope to do is to test implications conditioned down on variables that we can observe and include in a test. Thus, a conditional linear factor model is not testable!

I like to call this observation the “Hansen-Richard critique” by analogy to the “Roll Critique.” Roll pointed out, among other things, that the wealth portfolio might not be observable, making tests of the CAPM impossible. Hansen and Richard point out that the conditioning information of agents might not be observable, and that one cannot omit it in testing a conditional model. Thus, even if the wealth portfolio were observable, the fact that we cannot observe agents’ information sets dooms tests of the CAPM.

9.4 Scaled factors: a partial solution

You can expand the set of factors to test conditional factor pricing models

$$\text{factors} = f_{t+1} \otimes z_t$$

The problem is that the parameters of the factor pricing model $m_{t+1} = a_t + b_t f_{t+1}$ may vary over time. A partial solution is to model the dependence of parameters $a_t$ and $b_t$ on variables in the time-$t$ information set; let $a_t = a(z_t)$, $b_t = b(z_t)$, where $z_t$ is a vector of variables observed at time $t$ (including a constant). In particular, why not try linear models

$$a_t = a'z_t, \quad b_t = b'z_t.$$

Linearity is not restrictive: $z_t^2$ is just another instrument. The only criticism one can make is that some instrument $z_{jt}$ is important for capturing the variation in $a_t$ and $b_t$, and was omitted. For instruments on which we have data, we can meet this objection by trying $z_{jt}$ and seeing whether it does, in fact, enter significantly. However, for instruments $z_t$ that are observed by agents but not by us, this criticism remains valid.

Linear discount factor models lead to a nice interpretation as scaled factors, in the same way that linearly managed portfolios are scaled returns. With a single factor and instrument, write

$$m_{t+1} = a(z_t) + b(z_t)f_{t+1} \tag{9.6}$$

$$= a_0 + a_1 z_t + (b_0 + b_1 z_t)f_{t+1}$$

$$= a_0 + a_1 z_t + b_0 f_{t+1} + b_1(z_t f_{t+1}). \tag{9.7}$$

Thus, in place of the one-factor model with time-varying coefficients (9.6), we have a three-factor model $(z_t, f_{t+1}, z_t f_{t+1})$ with fixed coefficients, (9.7).

Since the coefficients are now fixed, we can use the scaled-factor model with unconditional moments:

$$p_t = E_t\left[\left(a_0 + a_1 z_t + b_0 f_{t+1} + b_1(z_t f_{t+1})\right)x_{t+1}\right] \;\Rightarrow$$

$$E(p_t) = E\left[\left(a_0 + a_1 z_t + b_0 f_{t+1} + b_1(z_t f_{t+1})\right)x_{t+1}\right]$$

For example, in standard derivations of the CAPM, the market (wealth portfolio) return is conditionally mean-variance efficient; investors want to hold portfolios on the conditional mean-variance frontier; conditionally expected returns follow a conditional single-beta representation, or the discount factor $m$ follows a conditional linear factor model

$$m_{t+1} = a_t - b_t R^W_{t+1}$$

as we saw above.

But none of these statements mean that we can use the CAPM unconditionally. Rather than throw up our hands, we can add some scaled factors. Thus, if, say, the dividend/price ratio and term premium do a pretty good job of summarizing variation in conditional moments, the conditional CAPM implies an unconditional, five-factor (plus constant) model. The factors are a constant, the market return, the dividend/price ratio, the term premium, and the market return times the dividend/price ratio and the term premium.

The unconditional pricing implications of such a five-factor model could, of course, be summarized by a single-$\beta$ representation. (See the caustic comments in the section on implications and equivalence.) The reference portfolio would not be the market portfolio, of course, but a mimicking portfolio of the five factors. However, the single mimicking portfolio would not be easily interpretable in terms of a single-factor conditional model and two instruments. In this case, it might be more interesting to look at a multiple-$\beta$ or multiple-factor representation.


If we have many factors $f$ and many instruments $z$, we should in principle multiply every factor by every instrument,

$$m = b_1 f_1 + b_2 f_1 z_1 + b_3 f_1 z_2 + \dots + b_{N+1} f_2 + b_{N+2} f_2 z_1 + b_{N+3} f_2 z_2 + \dots$$

This operation can be compactly summarized with the Kronecker product notation, $a \otimes b$, which means “multiply every element in vector $a$ by every element in vector $b$,” or

$$m_{t+1} = b'(f_{t+1} \otimes z_t).$$
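The scaled-factor construction is easy to express in code. The following is a minimal sketch in plain Python; the helper names `kron` and `scaled_factor_m` and all the numbers are illustrative assumptions. It treats the constant as the first element of both the factor vector and the instrument vector, so that $f_{t+1} \otimes z_t$ reproduces the four terms $(1, z_t, f_{t+1}, z_t f_{t+1})$ of a one-factor, one-instrument model with fixed coefficients.

```python
def kron(a, b):
    """Kronecker product of two vectors: every element of a times every element of b."""
    return [a_i * b_j for a_i in a for b_j in b]


def scaled_factor_m(coefs, f_next, z):
    """m_{t+1} = coefs'(f_{t+1} kron z_t) with fixed coefficients."""
    scaled = kron(f_next, z)
    if len(coefs) != len(scaled):
        raise ValueError("coefs must have len(f_next) * len(z) elements")
    return sum(c_i * s_i for c_i, s_i in zip(coefs, scaled))


# Hypothetical values: one factor and one instrument, each augmented with a constant.
f_value, z_value = 0.03, 0.5
f_next = [1.0, f_value]   # (1, f_{t+1})
z = [1.0, z_value]        # (1, z_t)

# Fixed coefficients ordered to match kron(f_next, z) = (1, z, f, f*z).
a0, a1, b0, b1 = 0.98, -0.02, -2.0, 1.5
m_fixed = scaled_factor_m([a0, a1, b0, b1], f_next, z)

# The same discount factor written with time-varying weights a(z_t) and b(z_t).
m_time_varying = (a0 + a1 * z_value) + (b0 + b1 * z_value) * f_value

print(kron(f_next, z), m_fixed, m_time_varying)
```

For vectors stored as NumPy arrays, `numpy.kron` performs the same element-by-element multiplication; the plain-Python version is used here only to keep the sketch dependency-free.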

9.5 Summary

When you¿rst think about it, conditioning information sounds scary – how do we account fortime-varying expected returns, betas, factor risk premia, variances, covariances, etc. How-ever, the methods outlined in this chapter allow a very simple and beautiful solution to theproblems raised by conditioning information. To express the conditional implications of agiven model, all you have to do is include some scaled or managed portfolio returns, and thenpretend you never heard about conditioning information.

Some factor models are conditional models, and have coef¿cients that are functions ofinvestors’ information sets. In general, there is no way to test such models, but if you arewilling to assume that the relevant conditioning information is well summarized by a fewvariables, then you can just add new factors, equal to the old factors scaled by the conditioningvariables, and again forget that you ever heard about conditioning information.

You may want to remember conditioning information as a diagnostic and in economic interpretation of the results. It may be interesting to take estimates of a many factor model, $m_{t+1} = a_0 + a_1 z_t + b_0 f_{t+1} + b_1 z_t f_{t+1}$, and see what they say about the implied conditional model, $m_{t+1} = (a_0 + a_1 z_t) + (b_0 + b_1 z_t) f_{t+1}$. You may want to make plots of conditional $b$s, betas, factor risk premia, expected returns, etc. But you don’t have to worry about it in estimation and testing.
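As a minimal sketch of this diagnostic, the following Python fragment maps estimates of a scaled-factor model into the implied conditional coefficients, for a scalar factor and a scalar instrument. The function names and the numerical "estimates" are hypothetical.

```python
def conditional_coefficients(a0, a1, b0, b1, z):
    """Implied conditional intercept and slope: (a0 + a1*z, b0 + b1*z)."""
    return a0 + a1 * z, b0 + b1 * z


def scaled_factor_m(a0, a1, b0, b1, z, f_next):
    """Unconditional scaled-factor form: a0 + a1*z + b0*f + b1*z*f."""
    return a0 + a1 * z + b0 * f_next + b1 * z * f_next


# Made-up estimates, used only to show the mapping.
a0, a1, b0, b1 = 1.2, -0.1, -0.5, 0.3

# Tabulate the conditional coefficients over a few instrument values.
table = [(z,) + conditional_coefficients(a0, a1, b0, b1, z) for z in (-1.0, 0.0, 1.0)]
for z, a_t, b_t in table:
    print(f"z = {z:+.1f}: a_t = {a_t:.3f}, b_t = {b_t:.3f}")
```

A plot of `b_t` against the instrument is the kind of picture the paragraph above has in mind; estimation itself only ever sees the four constant coefficients.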


Chapter 10. Factor pricing models

In Chapter 3, I noted that the consumption-based model, while a complete answer to most asset pricing questions in principle, does not (yet) work well in practice. This observation motivates efforts to tie the discount factor $m$ to other data. Linear factor pricing models are the most popular models of this sort in finance. They dominate discrete time empirical work.

Factor pricing models replace the consumption-based expression for marginal utility growth with a linear model of the form

$$m_{t+1} = a + b' f_{t+1}.$$

$a$ and $b$ are free parameters. As we have seen above, this specification is equivalent to a multiple-beta model

$$E(R_{t+1}) = \gamma + \beta' \lambda$$

where $\beta$ are multiple regression coefficients of returns $R$ on the factors $f$. Here, $\gamma$ and $\lambda$ are the free parameters.
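The equivalence can be checked numerically in a small example. The sketch below assumes a single factor and four equally likely states (all numbers hypothetical). It uses the standard mapping $\gamma = 1/E(m)$ and $\lambda = -\gamma\, b\, \mathrm{var}(f)$, which follows from writing $1 = E(mR)$ as $1 = E(m)E(R) + \mathrm{cov}(m,R)$.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance across equally likely states."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical discount factor m = a + b*f over four equally likely states.
a, b = 1.5, -0.5
f = [0.8, 1.0, 1.1, 1.3]
m = [a + b * f_s for f_s in f]


def priced_return(payoff):
    """Turn a payoff into a return R = x / p, with price p = E(m x)."""
    price = mean([m_s * x_s for m_s, x_s in zip(m, payoff)])
    return [x_s / price for x_s in payoff]


R = priced_return([0.5, 1.0, 1.2, 2.0])  # an arbitrary payoff

gamma = 1.0 / mean(m)               # zero-beta rate
lam = -gamma * b * cov(f, f)        # factor risk premium
beta = cov(R, f) / cov(f, f)        # regression coefficient of R on f

expected_return = mean(R)
beta_model = gamma + beta * lam     # should equal E(R)
```

Any other payoff priced by the same `m` gives a return that satisfies the same relation with the same `gamma` and `lam`; only `beta` changes.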

The big question is, what should one use for factors $f_{t+1}$? Factor pricing models look for variables that are good proxies for aggregate marginal utility growth, i.e., variables for which

$$\beta \frac{u'(c_{t+1})}{u'(c_t)} \approx a + b' f_{t+1} \qquad (10.1)$$

is a sensible and economically interpretable approximation.

More directly and interpretably, the essence of asset pricing is that there are special states of the world in which investors are especially concerned that their portfolios not do badly. They are willing to trade off some overall performance – average return – to make sure that portfolios do not do badly in these particular states of nature. The factors are variables that indicate that these “bad states” have occurred.

The factors that result from this search are and should be intuitively sensible. In any sensible economic model, as well as in the data, consumption is related to returns on broad-based portfolios, to interest rates, to growth in GNP, investment, or other macroeconomic variables, and to returns on production processes. All of these variables measure “wealth” or the state of the economy. Consumption is and should be high in “good times” and low in “bad times.”

Furthermore, consumption and marginal utility respond to news: if a change in some variable today signals high income in the future, then consumption rises now, by permanent income logic. This fact opens the door to forecasting variables: any variable that forecasts asset returns (“changes in the investment opportunity set”) or macroeconomic variables is a candidate factor. Variables such as the term premium, dividend/price ratio, stock returns, etc. can be defended as pricing factors on this logic. Though they themselves are not measures of aggregate good or bad times, they forecast such times.


Should factors be independent over time? The answer is, sort of. If there is a constant real interest rate, then marginal utility growth should be unpredictable. (“Consumption is a random walk” in the quadratic utility permanent income model.) To see this, just look at the first order condition with a constant interest rate,

$$u'(c_t) = \beta R^f E_t\left[u'(c_{t+1})\right]$$

or in a more time-series notation,

$$\frac{u'(c_{t+1})}{u'(c_t)} = \frac{1}{\beta R^f} + \varepsilon_{t+1}; \quad E_t(\varepsilon_{t+1}) = 0.$$

The real risk free rate is not constant, but it does not vary a lot, especially compared to asset returns. Measured consumption growth is not exactly unpredictable but it is the least predictable macroeconomic time series, especially if one accounts properly for temporal aggregation (consumption data are quarterly averages). Thus, factors that proxy for marginal utility growth, though they don’t have to be totally unpredictable, should not be highly predictable. If one chooses highly predictable factors, the model will counterfactually predict large interest rate variation.

In practice, this consideration means that one should choose the right units: Use GNP growth rather than level, portfolio returns rather than prices or price/dividend ratios, etc. However, unless one wants to impose an exactly constant risk free rate, one does not have to filter or prewhiten factors to make them exactly unpredictable.

This view of factors as intuitively motivated proxies for marginal utility growth is sufficient to carry the reader through current empirical tests of factor models. The extra constraints of a formal exposition of theory in this part have not yet constrained the factor-fishing expedition.

The precise derivations all proceed in the way I have motivated factor models: One writes down a general equilibrium model, in particular a specification of the production technology by which real investment today results in real output tomorrow. This general equilibrium produces relations that express the determinants of consumption from exogenous variables, and relations linking consumption and other endogenous variables; equations of the form $c_t = g(f_t)$. One then uses this kind of equation to substitute out for consumption in the basic first order conditions. (You don’t have to know more than this about general equilibrium to follow the derivations in this chapter. I discuss the economics and philosophy of general equilibrium models in some depth later, in Chapter V.)

The formal derivations accomplish two things: they determine one particular list of factors that can proxy for marginal utility growth, and they prove that the relation should be linear. Some assumptions can often be substituted for others in the quest for these two features of a factor pricing model.

This is a point worth remembering: all factor models are derived as specializations of the consumption-based model. Many authors of factor model papers disparage the consumption-based model, forgetting that their factor model is the consumption-based model plus extra assumptions that allow one to proxy for marginal utility growth from some other variables.

Above, I argued that clear economic foundation was important for factor models, since it is the only guard against fishing. Alas, we discover here that the current state of factor pricing models is not a particularly good guard against fishing. One can call for better theories or derivations, more carefully aimed at limiting the list of potential factors and describing the fundamental macroeconomic sources of risk, and thus providing more discipline for empirical work. The best minds in finance have been working on this problem for 40 years though, so a ready solution is not immediately in sight. On the other hand, we will see that even current theory can provide much more discipline than is commonly imposed in empirical work. For example, the derivations of the CAPM and ICAPM do leave predictions for the risk free rate and for factor risk premia that are often ignored. The ICAPM gives tighter restrictions on state variables than are commonly checked: “State variables” do have to forecast something! We also see how special and unrealistic are the general equilibrium setups necessary to derive popular specifications such as CAPM and ICAPM. This observation motivates a more serious look at real general equilibrium models below.

10.1 Capital Asset Pricing Model (CAPM)

The CAPM is the model $m = a + bR^W$; $R^W$ = wealth portfolio return. I derive it from the consumption-based model by 1) two-period quadratic utility; 2) two periods, exponential utility, and normal returns; 3) infinite horizon, quadratic utility, and i.i.d. returns; 4) log utility and normal distributions.

The CAPM is the first, most famous and (so far) most widely used model in asset pricing, as the related consumption-based model is in macroeconomics. It ties the discount factor $m$ to the return on the “wealth portfolio.” The function is linear,

$$m_{t+1} = a + bR^W_{t+1}.$$

$a$ and $b$ are free parameters. One can find theoretical values for the parameters $a$ and $b$ by requiring the discount factor $m$ to price any two assets, such as the wealth portfolio return and risk-free rate, $1 = E(mR^W)$ and $1 = E(m)R^f$. (As an example, we did this in equation (9.3) above.) In empirical applications, we can also pick $a$ and $b$ to “best” price larger cross-sections of assets. We do not have good data on, or even a good empirical definition for, the return on total wealth. It is conventional to proxy $R^W$ by the return on a broad-based stock portfolio such as the value- or equally-weighted NYSE, S&P500, etc.
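As an illustration of pinning down $a$ and $b$ from two assets, the sketch below solves the two pricing conditions $1 = E(mR^W)$ and $1 = E(m)R^f$ in closed form. Subtracting $E(R^W)$ times the second condition from the first gives $b = -\left[E(R^W) - R^f\right] / \left[R^f \sigma^2(R^W)\right]$ and then $a = 1/R^f - bE(R^W)$. The function name and the moments used are hypothetical.

```python
def capm_discount_factor_params(mean_rw, var_rw, rf):
    """Solve 1 = E(m R^W) and 1 = E(m) R^f for (a, b) in m = a + b R^W."""
    b = -(mean_rw - rf) / (rf * var_rw)
    a = 1.0 / rf - b * mean_rw
    return a, b


# Hypothetical gross-return moments: 8% mean, 20% standard deviation, 1% risk-free rate.
mean_rw, var_rw, rf = 1.08, 0.04, 1.01
a, b = capm_discount_factor_params(mean_rw, var_rw, rf)

# Check both pricing conditions using E(R^W^2) = var + mean^2.
e_m = a + b * mean_rw
e_m_rw = a * mean_rw + b * (var_rw + mean_rw ** 2)
print(f"a = {a:.4f}, b = {b:.4f}, E(m)R^f = {e_m * rf:.6f}, E(mR^W) = {e_m_rw:.6f}")
```

Note that $b$ is negative whenever the wealth portfolio earns a positive premium: the discount factor is low in states where the wealth return is high.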

The CAPM is of course most frequently stated in equivalent expected return / beta language,

$$E(R^i) = \gamma + \beta_{i,R^W}\left[E(R^W) - \gamma\right].$$

This section briefly describes some classic derivations of the CAPM. Again, we need to find assumptions that defend which factors proxy for marginal utility ($R^W$ here), and assumptions to defend the linearity between $m$ and the factor.

I present several derivations of the same model. Many of these derivations use classic modeling assumptions which are important in their own right. This is also an interesting place in which to see that various sets of assumptions can often be used to get to the same place. The CAPM is often criticized for one or another assumption. By seeing several derivations, we can see how one assumption can be traded for another. For example, the CAPM does not in fact require normal distributions, if one is willing to swallow quadratic utility instead.

10.1.1 Two-period quadratic utility

Two period investors with no labor income and quadratic utility imply the CAPM.

Investors have quadratic preferences and only live two periods,

$$U(c_t, c_{t+1}) = -\frac{1}{2}(c_t - c^*)^2 - \frac{1}{2}\beta E\left[(c_{t+1} - c^*)^2\right]. \qquad (10.2)$$

Their marginal rate of substitution is thus

$$m_{t+1} = \beta\frac{u'(c_{t+1})}{u'(c_t)} = \beta\frac{(c_{t+1} - c^*)}{(c_t - c^*)}.$$

The quadratic utility assumption means marginal utility is linear in consumption. Thus, we have achieved the first target of the derivation, linearity.

Investors are born with wealth $W_t$ in the first period and earn no labor income. They can invest in lots of assets with prices $p^i_t$ and payoffs $x^i_{t+1}$, or, to keep the notation simple, returns $R^i_{t+1}$. They choose how much to consume at the two dates, $c_t$ and $c_{t+1}$, and the portfolio weights $\alpha_i$ for their investment portfolio. Thus, the budget constraint is

$$c_{t+1} = W_{t+1} \qquad (10.3)$$

$$W_{t+1} = R^W_{t+1}(W_t - c_t)$$

$$R^W = \sum_{i=1}^{N} \alpha_i R^i; \quad \sum_{i=1}^{N} \alpha_i = 1.$$

$R^W$ is the rate of return on total wealth.

The two-period assumption means that investors consume everything in the second period, by constraint (10.3). This fact allows us to substitute wealth and the return on wealth for consumption, achieving the second goal of the derivation, naming the factor that proxies for consumption or marginal utility:

$$m_{t+1} = \beta\frac{R^W_{t+1}(W_t - c_t) - c^*}{c_t - c^*} = \frac{-\beta c^*}{c_t - c^*} + \frac{\beta(W_t - c_t)}{c_t - c^*}R^W_{t+1}$$

i.e.

$$m_{t+1} = a_t + b_t R^W_{t+1}.$$
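The substitution can be verified numerically. The sketch below computes $a_t$ and $b_t$ from the expressions above and checks that $a_t + b_t R^W_{t+1}$ reproduces the marginal rate of substitution once $c_{t+1} = R^W_{t+1}(W_t - c_t)$ is imposed. Function names and parameter values are hypothetical; the bliss point $c^*$ is set above consumption so that marginal utility is positive.

```python
def quadratic_capm_coefficients(beta, c_star, wealth, consumption):
    """a_t and b_t in m_{t+1} = a_t + b_t R^W_{t+1} for two-period quadratic utility."""
    denom = consumption - c_star
    a_t = -beta * c_star / denom
    b_t = beta * (wealth - consumption) / denom
    return a_t, b_t


def marginal_rate_of_substitution(beta, c_star, c_now, c_next):
    """m_{t+1} = beta (c_{t+1} - c*) / (c_t - c*)."""
    return beta * (c_next - c_star) / (c_now - c_star)


# Hypothetical values: bliss point 10, initial wealth 5, first-period consumption 2.
beta, c_star, wealth, c_now = 0.95, 10.0, 5.0, 2.0
a_t, b_t = quadratic_capm_coefficients(beta, c_star, wealth, c_now)

for r_w in (0.8, 1.0, 1.1, 1.4):
    c_next = r_w * (wealth - c_now)  # constraint (10.3): consume all wealth
    m_direct = marginal_rate_of_substitution(beta, c_star, c_now, c_next)
    print(f"R^W = {r_w:.1f}: m = {m_direct:.6f}, a_t + b_t R^W = {a_t + b_t * r_w:.6f}")
```

Below the bliss point $b_t$ is negative: a high wealth return means high second-period consumption and low marginal utility.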

10.1.2 Exponential utility, normal distributions

$u(c) = -e^{-\alpha c}$ and a normally distributed set of returns also produce the CAPM.

The combination of exponential utility and normal distributions is another set of assumptions that deliver the CAPM in a one or two period model. This structure has a particularly convenient analytical form. Since it gives rise to linear demand curves, it is very widely used in models that complicate the trading structure, by introducing incomplete markets or asymmetric information.

I present a model with consumption only in the last period. (You can do the quadratic utility model of the last section this way as well.) Utility is

$$E\left[u(c)\right] = E\left[-e^{-\alpha c}\right].$$

$\alpha$ is known as the coefficient of absolute risk aversion. If consumption is normally distributed, we have

$$E u(c) = -e^{-\alpha E(c) + \frac{\alpha^2}{2}\sigma^2(c)}.$$

Suppose this investor has initial wealth $W$ which can be split between a risk-free asset paying $R^f$ and a set of risky assets paying return $R$. Let $y$ denote the amount of this wealth $W$ (amount, not fraction) invested in each security. Then, the budget constraint is

$$c = y^f R^f + y'R$$

$$W = y^f + y'1$$

Plugging the first constraint into the utility function we obtain

$$E u(c) = -e^{-\alpha\left[y^f R^f + y'E(R)\right] + \frac{\alpha^2}{2} y'\Sigma y}, \qquad (10.4)$$

where $\Sigma$ denotes the covariance matrix of the risky returns.

As with quadratic utility, the two-period model is what allows us to set consumption to wealth and then substitute the return on the wealth portfolio for consumption growth in the discount factor.

Maximizing (10.4) with respect to $y$, $y^f$, we obtain the first order condition describing the optimal amount to be invested in the risky asset,

$$y = \Sigma^{-1}\frac{E(R) - R^f}{\alpha}.$$

Sensibly, the investor invests more in risky assets if their expected return is higher, less if his risk aversion coefficient is higher, and less if the assets are riskier. Notice that total wealth does not appear in this expression. With this setup, the amount invested in risky assets is independent of the level of wealth. This is why we say that this investor has absolute rather than relative (to wealth) risk aversion. Note also that these “demands” for the risky assets are linear in expected returns, which is a very convenient property.

Inverting the first order conditions, we obtain

$$E(R) - R^f = \alpha\Sigma y = \alpha\,\mathrm{cov}(R, R^m). \qquad (10.5)$$

The investor’s total risky portfolio is $y'R$. Hence, $\Sigma y$ gives the covariance of each return with $y'R$, and also with the investor’s overall portfolio $y^f R^f + y'R$. If all investors are identical, then the market portfolio is the same as the individual’s portfolio so $\Sigma y$ also gives the covariance of each return with $R^m = y^f R^f + y'R$. (If investors differ in risk aversion $\alpha$, the same thing goes through but with an aggregate risk aversion coefficient.)

Thus, we have the CAPM. This version is especially interesting because it ties the market price of risk to the risk aversion coefficient. Applying (10.5) to the market return itself, we have

$$\frac{E(R^m) - R^f}{\sigma^2(R^m)} = \alpha.$$
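The demand formula and the market-price-of-risk relation can be checked in a two-asset example. The sketch below is a hypothetical illustration: the means, covariance matrix, and risk aversion are made up, and initial wealth is normalized to one so that $R^m = y^f R^f + y'R$ is a return rather than a dollar payoff.

```python
def solve_2x2(matrix, vector):
    """Solve matrix @ x = vector for a 2x2 matrix by Cramer's rule."""
    (s11, s12), (s21, s22) = matrix
    det = s11 * s22 - s12 * s21
    return [(s22 * vector[0] - s12 * vector[1]) / det,
            (s11 * vector[1] - s21 * vector[0]) / det]


def risky_demand(alpha, mean_returns, rf, cov_matrix):
    """Amounts y = Sigma^{-1} (E(R) - R^f) / alpha held in each risky asset."""
    excess = [mu - rf for mu in mean_returns]
    return [x / alpha for x in solve_2x2(cov_matrix, excess)]


# Hypothetical inputs.
alpha, rf = 2.0, 1.02
mean_returns = [1.08, 1.05]
cov_matrix = [[0.04, 0.01], [0.01, 0.02]]

y = risky_demand(alpha, mean_returns, rf, cov_matrix)

# Sigma y: covariance of each return with the risky portfolio y'R.
sigma_y = [sum(cov_matrix[i][j] * y[j] for j in range(2)) for i in range(2)]

# With wealth normalized to one, the remainder goes into the risk-free asset.
wealth = 1.0
y_f = wealth - sum(y)
mean_rm = y_f * rf + sum(y_i * mu for y_i, mu in zip(y, mean_returns))
var_rm = sum(y_i * s_i for y_i, s_i in zip(y, sigma_y))
implied_alpha = (mean_rm - rf) / var_rm  # recovers the risk aversion coefficient
```

Because wealth does not enter `risky_demand`, changing `wealth` only changes `y_f`; the risky holdings stay put, which is the absolute risk aversion property described above.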

10.1.3 Quadratic value function, dynamic programming.

We can let investors live forever in the quadratic utility CAPM so long as we assume that the environment is independent over time. Then the value function is quadratic, taking the place of the quadratic second-period utility function. This case is a nice first introduction to dynamic programming.


The two-period structure given above is unpalatable, since (most) investors do in fact live longer than two periods. It is natural to try to make the same basic ideas work with less restrictive and more palatable assumptions.

We can derive the CAPM in a multi-period context by replacing the second-period quadratic utility function with a quadratic value function. However, the quadratic value function requires the additional assumption that returns are i.i.d. (no “shifts in the investment opportunity set”). This observation, due to Fama (1970), is also a nice introduction to dynamic programming, which is a powerful way to handle multiperiod problems by expressing them as two period problems. Finally, I think this derivation makes the CAPM more realistic, transparent and intuitively compelling. Buying stocks amounts to taking bets over wealth; really the fundamental assumption driving the CAPM is that marginal utility of wealth is linear in wealth and does not depend on other state variables.

Let’s start in a simple ad-hoc manner by just writing down a “utility function” defined over this period’s consumption and next period’s wealth,

$$U = u(c_t) + \beta E_t V(W_{t+1}).$$

This is a reasonable objective for an investor, and does not require us to make the very artificial assumption that he will die tomorrow. If an investor with this “utility function” can buy an asset at price $p_t$ with payoff $x_{t+1}$, his first order condition (buy a little more, then $x$ contributes to wealth next period) is

$$p_t u'(c_t) = \beta E_t\left[V'(W_{t+1})x_{t+1}\right].$$

Thus, the discount factor uses next period’s marginal value of wealth in place of the more familiar marginal utility of consumption

$$m_{t+1} = \beta\frac{V'(W_{t+1})}{u'(c_t)}.$$

(The envelope condition states that, at the optimum, a penny saved has the same value as a penny consumed: $u'(c_t) = V'(W_t)$. We could use this condition to express the denominator in terms of wealth also.)

Now, suppose the value function were quadratic,

$$V(W_{t+1}) = -\frac{\eta}{2}(W_{t+1} - W^*)^2.$$

Then, we would have

$$m_{t+1} = -\beta\eta\frac{W_{t+1} - W^*}{u'(c_t)} = -\beta\eta\frac{R^W_{t+1}(W_t - c_t) - W^*}{u'(c_t)} = \left[\frac{\beta\eta W^*}{u'(c_t)}\right] + \left[-\frac{\beta\eta(W_t - c_t)}{u'(c_t)}\right]R^W_{t+1},$$

or, once again,

$$m_{t+1} = a_t + b_t R^W_{t+1},$$

the CAPM!

Let’s be clear about the assumptions and what they do.

1) The value function only depends on wealth. If other variables entered the value function, then $\partial V/\partial W$ would depend on those other variables, and so would $m$. This assumption bought us the first objective of any derivation: the identity of the factors. The ICAPM, below, allows other variables in the value function, and obtains more factors. (Actually, other variables could enter so long as they don’t affect the marginal value of wealth. The weather is an example: You, like me, might be happier on sunny days, but you do not value additional wealth more on sunny than on rainy days. Hence, covariance with weather does not affect how you value stocks.)

2) The value function is quadratic. We wanted the marginal value function $V'(W)$ to be linear, to buy us the second objective, showing $m$ is linear in the factor. Quadratic utility and value functions deliver a globally linear marginal value function $V'(W)$. By the usual Taylor series logic, linearity of $V'(W)$ is probably not a bad assumption for small perturbations, and not a good one for large perturbations.

Why is the value function quadratic?

You might think we are done. But economists are unhappy about a utility function that has wealth in it. Few of us are like Disney’s Uncle Scrooge, who got pure enjoyment out of a daily swim in the coins in his vault. Wealth is valuable because it gives us access to more consumption. Utility functions should always be written over consumption. One of the few real rules in economics that keep our theories from being vacuous is that ad-hoc “utility functions” over other objects like wealth (or means and variances of portfolio returns, or “status” or “political power”) should be defended as arising from a more fundamental desire for consumption.

More practically, being careful about the derivation makes clear that the superficially plausible assumption that the value function is only a function of wealth derives from the much less plausible, in fact certainly false, assumptions that interest rates are constant, that the distribution of returns is i.i.d., and that the investor has no risky labor income. So, let us see what it takes to defend the quadratic value function in terms of some utility function.

Suppose investors last forever, and have the standard sort of utility function

$$U = E_t\sum_{j=0}^{\infty} \beta^j u(c_{t+j}).$$

Again, investors start with wealth $W_0$ which earns a random return $R^W$ and they have no other source of income. In addition, suppose that interest rates are constant, and stock returns are i.i.d. over time.

Define the value function as the maximized value of the utility function in this environment. Thus, define $V(W)$ as$^7$

$$V(W_t) \equiv \max_{\{c_t, c_{t+1}, c_{t+2}, \ldots, \alpha_t, \alpha_{t+1}, \ldots\}} E_t\sum_{j=0}^{\infty} \beta^j u(c_{t+j}) \qquad (10.6)$$

$$\text{s.t. } W_{t+1} = R^W_{t+1}(W_t - c_t); \quad R^W_t = \alpha_t' R_t; \quad \alpha_t' 1 = 1$$

(I used vector notation to simplify the statement of the portfolio problem; $R \equiv \left[R^1\ R^2\ \ldots\ R^N\right]'$, etc.) The value function is the total level of utility the investor can achieve, given how much wealth he has, and any other variables constraining him. This is where the assumptions of no labor income, a constant interest rate and i.i.d. returns come in. Without these assumptions, the value function as defined above might depend on these other characteristics of the investor’s environment. For example, if there were some variable, say, “D/P” that indicated returns would be high or low for a while, then the investor would be happier, and have a high value, when D/P is high, for a given level of wealth. Thus, we would have to write $V(W_t, D/P_t)$.

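Value functions allow you to express an infinite period problem as a two period problem. Break up the maximization into the first period and all the remaining periods, as follows

$$V(W_t) = \max_{\{c_t, \alpha_t\}}\left\{u(c_t) + \beta E_t\left[\max_{\{c_{t+1}, c_{t+2}, \ldots, \alpha_{t+1}, \alpha_{t+2}, \ldots\}} E_{t+1}\sum_{j=0}^{\infty} \beta^j u(c_{t+1+j})\right]\right\} \quad \text{s.t. } \ldots$$

or

$$V(W_t) = \max_{\{c_t, \alpha_t\}}\left\{u(c_t) + \beta E_t V(W_{t+1})\right\} \quad \text{s.t. } \ldots \qquad (10.7)$$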

Thus, we have defended the existence of a value function. Writing down a two period “utility function” over this period’s consumption and next period’s wealth is not as crazy as it might seem.

The value function is also an attractive view of how people actually make decisions. You don’t think “If I buy a sandwich today, I won’t be able to go out to dinner one night 20 years from now” – trading off goods directly as expressed by the utility function. You think “I can’t afford a new car” meaning that the decline in the value of wealth is not worth the increase in the marginal utility of consumption. Thus, the maximization in (10.7) describes your psychological approach to utility maximization.

$^7$ There is also a transversality condition or a lower limit on wealth in the budget constraints. This keeps the consumer from consuming a bit more and rolling over more and more debt, and it means we can write the budget constraint in present value form.


The remaining question is, can the value function be quadratic? What utility function assumption leads to a quadratic value function? Here is the fun fact: A quadratic utility function leads to a quadratic value function in this environment. This is not a law of nature; it is not true that for any $u(c)$, $V(W)$ has the same functional form. But it is true here and a few other special cases. The “in this environment” clause is not innocuous. The value function – the achieved level of expected utility – is a result of the utility function and the constraints.

How could we show this fact? One way would be to try to calculate the value function by brute force from its definition, equation (10.6). This approach is not fun, and it does not exploit the beauty of dynamic programming, which is the reduction of an infinite period problem to a two period problem.

Instead solve (10.7) as a functional equation. Guess that the value function $V(W_{t+1})$ is quadratic, with some unknown parameters. Then use the recursive definition of $V(W_t)$ in (10.7), and solve a two period problem: find the optimal consumption choice, plug it into (10.7) and calculate the value function $V(W_t)$. If the guess was right, you obtain a quadratic function for $V(W_t)$, and determine any free parameters.

Let’s do it. Specify

$$u(c_t) = -\frac{1}{2}(c_t - c^*)^2.$$

Guess

$$V(W_{t+1}) = -\frac{\eta}{2}(W_{t+1} - W^*)^2$$

with $\eta$ and $W^*$ parameters to be determined later. Then the problem (10.7) is (I don’t write the portfolio choice $\alpha$ part for simplicity; it doesn’t change anything)

$$V(W_t) = \max_{\{c_t\}}\left[-\frac{1}{2}(c_t - c^*)^2 - \beta\frac{\eta}{2}E(W_{t+1} - W^*)^2\right] \quad \text{s.t. } W_{t+1} = R^W_{t+1}(W_t - c_t).$$

($E_t$ is now $E$ since I assumed i.i.d.) Substituting the constraint into the objective,

$$V(W_t) = \max_{\{c_t\}}\left[-\frac{1}{2}(c_t - c^*)^2 - \beta\frac{\eta}{2}E\left[R^W_{t+1}(W_t - c_t) - W^*\right]^2\right]. \qquad (10.8)$$

The first order condition with respect to $c_t$, using $\hat{c}$ to denote the optimal value, is

$$\hat{c}_t - c^* = \beta\eta E\left\{\left[R^W_{t+1}(W_t - \hat{c}_t) - W^*\right]R^W_{t+1}\right\}.$$

Solving for $\hat{c}_t$,

$$\hat{c}_t = c^* + \beta\eta E\left\{R^{W2}_{t+1}W_t - \hat{c}_t R^{W2}_{t+1} - W^* R^W_{t+1}\right\}$$

$$\hat{c}_t\left[1 + \beta\eta E(R^{W2}_{t+1})\right] = c^* + \beta\eta E(R^{W2}_{t+1})W_t - \beta\eta W^* E(R^W_{t+1})$$

$$\hat{c}_t = \frac{c^* - \beta\eta E(R^W_{t+1})W^* + \beta\eta E(R^{W2}_{t+1})W_t}{1 + \beta\eta E(R^{W2}_{t+1})}. \qquad (10.9)$$

This is a linear function of $W_t$. Writing (10.8) in terms of the optimal value of $c$, we get

$$V(W_t) = -\frac{1}{2}(\hat{c}_t - c^*)^2 - \beta\frac{\eta}{2}E\left[R^W_{t+1}(W_t - \hat{c}_t) - W^*\right]^2. \qquad (10.10)$$

This is a quadratic function of $W_t$ and $\hat{c}$. A quadratic function of a linear function is a quadratic function, so the value function is a quadratic function of $W_t$. If you want to spend a pleasant few hours doing algebra, plug (10.9) into (10.10), check that the result really is quadratic in $W_t$, and determine the coefficients $\eta$, $W^*$ in terms of fundamental parameters $\beta$, $c^*$, $E(R^W)$, $E(R^{W2})$ (or $\sigma^2(R^W)$). The expressions for $\eta$, $W^*$ do not give much insight, so I don’t do the algebra here.
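Short of doing the algebra, one can at least check the claim numerically. The sketch below is a hypothetical illustration, not the fixed-point solution: it takes arbitrary trial values for $\eta$ and $W^*$ on the right-hand side, a made-up two-point i.i.d. return distribution, and confirms that (10.9) is the maximizer and that the resulting $V(W_t)$ has zero third differences, as a quadratic must.

```python
RETURNS = [0.9, 1.3]  # hypothetical: two equally likely i.i.d. gross wealth returns


def expect(fn):
    """Expectation of fn(R^W) over the equally likely return states."""
    return sum(fn(r) for r in RETURNS) / len(RETURNS)


def optimal_consumption(w, beta, eta, c_star, w_star):
    """Equation (10.9): optimal consumption, linear in wealth w."""
    e_r = expect(lambda r: r)
    e_r2 = expect(lambda r: r * r)
    numerator = c_star - beta * eta * e_r * w_star + beta * eta * e_r2 * w
    return numerator / (1.0 + beta * eta * e_r2)


def objective(c, w, beta, eta, c_star, w_star):
    """Right-hand side of (10.8) for an arbitrary consumption choice c."""
    next_period = expect(lambda r: (r * (w - c) - w_star) ** 2)
    return -0.5 * (c - c_star) ** 2 - beta * eta / 2.0 * next_period


def value(w, beta, eta, c_star, w_star):
    """Equation (10.10): the objective evaluated at the optimal consumption."""
    c_hat = optimal_consumption(w, beta, eta, c_star, w_star)
    return objective(c_hat, w, beta, eta, c_star, w_star)


def third_difference(fn, w, h=1.0):
    """Third finite difference; identically zero for any quadratic function."""
    return fn(w + 3 * h) - 3 * fn(w + 2 * h) + 3 * fn(w + h) - fn(w)


# Trial parameter values (assumptions, not the solved-for eta and W*).
PARAMS = dict(beta=0.95, eta=0.5, c_star=10.0, w_star=20.0)


def v(w):
    return value(w, **PARAMS)


print("third difference of V:", third_difference(v, 5.0))
```

Finding the $\eta$ and $W^*$ that make the left- and right-hand value functions coincide is the remaining algebra that the text leaves to the reader.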

10.1.4 Log utility

Log utility rather than quadratic utility also implies a CAPM. Log utility implies that consumption is proportional to wealth, allowing us to substitute the wealth return for consumption data.

The point of the CAPM is to avoid the use of consumption data, and so to use wealth or the rate of return on wealth instead. Log utility is another special case that allows this substitution. Log utility is much more plausible than quadratic utility.

Suppose that the investor has log utility

$$u(c) = \ln(c).$$

Define the wealth portfolio as a claim to all future consumption. Then, with log utility, the price of the wealth portfolio is proportional to consumption itself.

$$p^W_t = E_t\sum_{j=1}^{\infty} \beta^j\frac{u'(c_{t+j})}{u'(c_t)}c_{t+j} = E_t\sum_{j=1}^{\infty} \beta^j\frac{c_t}{c_{t+j}}c_{t+j} = \frac{\beta}{1 - \beta}c_t$$

The return on the wealth portfolio is proportional to consumption growth,

$$R^W_{t+1} = \frac{p^W_{t+1} + c_{t+1}}{p^W_t} = \frac{\frac{\beta}{1-\beta} + 1}{\frac{\beta}{1-\beta}}\frac{c_{t+1}}{c_t} = \frac{1}{\beta}\frac{c_{t+1}}{c_t} = \frac{1}{\beta}\frac{u'(c_t)}{u'(c_{t+1})}.$$

Thus, the log utility discount factor equals the inverse of the wealth portfolio return,

$$m_{t+1} = \frac{1}{R^W_{t+1}}. \qquad (10.11)$$
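A quick numerical check of these three steps, as a sketch with a made-up consumption path and a hypothetical value of $\beta$: the truncated sum of $\beta^j c_t$ approaches $\frac{\beta}{1-\beta}c_t$, and the product $m_{t+1}R^W_{t+1}$ equals one at every date regardless of how consumption moves.

```python
BETA = 0.96  # hypothetical subjective discount factor


def wealth_price(c, beta=BETA):
    """Price of the claim to all future consumption under log utility."""
    return beta / (1.0 - beta) * c


def truncated_wealth_price(c, beta=BETA, horizon=2000):
    """Finite-horizon sum: each term beta^j (c_t / c_{t+j}) c_{t+j} collapses to beta^j c_t."""
    return sum(beta ** j * c for j in range(1, horizon + 1))


def wealth_return(c_now, c_next, beta=BETA):
    """R^W_{t+1} = (p^W_{t+1} + c_{t+1}) / p^W_t."""
    return (wealth_price(c_next, beta) + c_next) / wealth_price(c_now, beta)


def log_discount_factor(c_now, c_next, beta=BETA):
    """m_{t+1} = beta u'(c_{t+1}) / u'(c_t) = beta c_t / c_{t+1} for u = ln(c)."""
    return beta * c_now / c_next


consumption = [1.00, 1.03, 0.99, 1.05, 1.10]  # arbitrary illustrative path
products = [
    log_discount_factor(c0, c1) * wealth_return(c0, c1)
    for c0, c1 in zip(consumption, consumption[1:])
]
print(products)  # each entry is m_{t+1} * R^W_{t+1}
```

Nothing in the check depends on the consumption path being i.i.d. or even stochastic, which mirrors the point that log utility is the only assumption so far.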

Equation (10.11) could be used by itself: it attains the goal of replacing consumption data by some other variable. (Brown and Gibbons 1982 test a CAPM in this form.) Note that log utility is the only assumption so far. We do not assume constant interest rates, i.i.d. returns or the absence of labor income.

Log utility has a special property that “income effects offset substitution effects,” or in an asset pricing context that “discount rate effects offset cash flow effects.” News of higher consumption = dividend should make the claim to consumption more valuable. However, through $u'(c)$ it also raises the discount rate, lowering the value of the claim to consumption. For log utility, these two effects exactly offset.

10.1.5 Linearizing any model: Taylor approximations and normal distributions.

Any nonlinear model $m = f(z)$ can be turned into a linear model $m = a + bz$ in discrete time by assuming normal returns.

It is traditional in the CAPM literature to try to derive a linear relation between $m$ and the wealth portfolio return. We could always do this by a Taylor approximation,

$$m_{t+1} \approx a_t + b_t R^W_{t+1}.$$

We can make this approximation exact in a special case, that the factors and all asset returns are normally distributed. (We can also take the continuous time limit, which is really the same thing. However, this discrete-time trick is common and useful.) First, I quote without proof the central mathematical trick as a lemma.

Lemma 1 (Stein’s lemma) If $f$, $R$ are bivariate normal, $g(f)$ is differentiable and $E|g'(f)| < \infty$, then

$$\mathrm{cov}\left[g(f), R\right] = E\left[g'(f)\right]\mathrm{cov}(f, R). \qquad (10.12)$$
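Stein’s lemma is easy to check by simulation. The following is a rough Monte Carlo sketch, with an assumed nonlinear function $g(f) = e^{-\gamma f}$ and made-up means, volatilities, and correlation; the two sides agree only up to sampling error, so the comparison is approximate.

```python
import math
import random


def sample_cov(xs, ys):
    """Sample covariance (dividing by n) of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def stein_check(n_draws=100_000, gamma=3.0, seed=0):
    """Return (cov[g(f), R], E[g'(f)] cov(f, R)) for g(f) = exp(-gamma f)."""
    rng = random.Random(seed)
    mu_f, sd_f = 1.02, 0.10            # hypothetical factor moments
    mu_r, sd_r, rho = 1.08, 0.20, 0.5  # hypothetical return moments and correlation
    f, r = [], []
    for _ in range(n_draws):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        f.append(mu_f + sd_f * e1)
        # Build R with correlation rho to f, so (f, R) is bivariate normal.
        r.append(mu_r + sd_r * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2))
    g = [math.exp(-gamma * x) for x in f]
    g_prime = [-gamma * math.exp(-gamma * x) for x in f]
    lhs = sample_cov(g, r)
    rhs = (sum(g_prime) / n_draws) * sample_cov(f, r)
    return lhs, rhs


lhs, rhs = stein_check()
print(f"cov[g(f), R] = {lhs:.6f},  E[g'(f)] cov(f, R) = {rhs:.6f}")
```

The exponential form is chosen because it satisfies the lemma’s integrability condition under a normal distribution; a power function such as $f^{-\gamma}$ would not, strictly speaking, since it blows up at zero.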

Now we can use the lemma to state the theorem.

Theorem 2 If $m = g(f)$, if $f$ and a set of the payoffs priced by $m$ are normally distributed returns, and if $|E[g'(f)]| < \infty$, then there is a linear model $m = a + bf$ that prices the normally distributed returns.


Proof: First, the definition of covariance means that the pricing equation can be rewritten as a restriction between mean returns and the covariance of returns with $m$:

$$1 = E(mR) \Leftrightarrow 1 = E(m)E(R) + \mathrm{cov}(m, R). \qquad (10.13)$$

Now, given $m = g(f)$, $f$ and $R$ jointly normal, apply Stein’s lemma (10.12) and (10.13),

$$1 = E[g(f)]E(R) + E[g'(f)]\,\mathrm{cov}(f, R)$$

$$1 = E[g(f)]E(R) + \mathrm{cov}\left(E[g'(f)]f, R\right).$$

Exploiting the $\Leftarrow$ part of (10.13), we know that an $m$ with mean $E(g(f))$ and that depends on $f$ via $E(g'(f))f$ will price assets,

$$m = E[g(f)] + E[g'(f)]\left[f - E(f)\right].$$

Using this trick, and recalling that we have not assumed i.i.d. so all these moments are conditional, the log utility CAPM implies the linear model

$$m_{t+1} = E_t\left(\frac{1}{R^W_{t+1}}\right) - E_t\left[\left(\frac{1}{R^W_{t+1}}\right)^2\right]\left[R^W_{t+1} - E_t(R^W_{t+1})\right] \qquad (10.14)$$

if $R^W_{t+1}$ and all asset returns to be priced are normally distributed. From here it is a short step to an expected return-beta representation using the wealth portfolio return as the factor.

In the same way, we can trade the quadratic utility function for normal distributions in the dynamic programming derivation of the CAPM. Starting from

$$m_{t+1} = \beta\frac{V'(W_{t+1})}{u'(c_t)} = \beta\frac{V'\left[R^W_{t+1}(W_t - c_t)\right]}{u'(c_t)}$$

we can derive an expression that links $m$ linearly to $R^W_{t+1}$ by assuming normality.

Using the same trick, the consumption-based model can be written in linear fashion, i.e. expected returns can be expressed as a linear function of betas on consumption growth rather than betas on consumption growth raised to a power. However, for large risk aversion coefficients (more than about 10 in postwar consumption data) or other transformations, the inaccuracies due to the normal or lognormal approximation can be very significant in discrete data.

The normal distribution assumption seems rather restrictive, and it is. However, the most popular class of continuous-time models specify instantaneously normal distributions even for things like options that have very non-normal distributions for discrete time intervals.


Therefore, one can think of the Stein’s lemma tricks as a way to get to continuous time approximations without doing it in continuous time. I demonstrate the explicit continuous time approach with the ICAPM, discussed next.

10.2 Intertemporal Capital Asset Pricing Model (ICAPM)

Any “state variable” $z_t$ can be a factor. The ICAPM is a linear factor model with wealth and state variables that forecast changes in the distribution of future returns or income.

The ICAPM generates linear discount factor models

$$m_{t+1} = a + b'f_{t+1}$$

in which the factors are “state variables” for the investor’s consumption-portfolio decision.

The “state variables” are the variables that determine how well the investor can do in his maximization. State variables include current wealth, and also variables that describe the conditional distribution of income and asset returns the agent will face in the future or “shifts in the investment opportunity set.” Therefore, optimal consumption decisions are functions of the state variables, $c_t = g(z_t)$. We can use this fact once again to substitute out consumption, and write

$$m_{t+1} = \beta\frac{u'[g(z_{t+1})]}{u'[g(z_t)]}.$$

From here, it is a simple linearization to deduce that the state variables $z_{t+1}$ will be factors.

Alternatively, the value function depends on the state variables

$$V(W_{t+1}, z_{t+1}),$$

so we can write

$$m_{t+1} = \beta\frac{V_W(W_{t+1}, z_{t+1})}{V_W(W_t, z_t)}$$

(The marginal value of a dollar must be the same in any use, so I made the denominator pretty by writing $u'(c_t) = V_W(W_t, z_t)$. This fact is known as the envelope condition.)

This completes the first step, naming the proxies. To obtain a linear relation, we can take a Taylor approximation, assume normality and use Stein’s lemma, or, most conveniently, move to continuous time (which is really just a more convenient way of making the normal approximation). We saw above that we can write the basic pricing equation in continuous time as

$$E\left(\frac{dp}{p}\right) - r^f\,dt = -E\left(\frac{d\Lambda}{\Lambda}\frac{dp}{p}\right)$$

(for simplicity of the formulas, I’m folding any dividends into the price process). The discount factor is marginal utility, which is the same as the marginal value of wealth,

$$\frac{d\Lambda_t}{\Lambda_t} = \frac{du'(c_t)}{u'(c_t)} = \frac{dV_W(W_t, z_t)}{V_W}$$

Our objective is to express the model in terms of factors $z$ rather than marginal utility or value, and Ito’s lemma makes this easy

$$\frac{dV_W}{V_W} = \frac{W V_{WW}}{V_W}\frac{dW}{W} + \frac{V_{Wz}}{V_W}\,dz + \frac{1}{2}(\text{second derivative terms})$$

(We don’t have to grind out the second derivative terms if we are going to take $r^f\,dt = -E_t(d\Lambda/\Lambda)$, though this approach removes a potentially interesting and testable implication of the model.) The elasticity of marginal value with respect to wealth is often called the coefficient of relative risk aversion,

$$rra \equiv -\frac{W V_{WW}}{V_W}.$$

Substituting, we obtain the ICAPM, which relates expected returns to the covariance of returns with wealth, and also with the other state variables,

$$E\left(\frac{dp}{p}\right) - r^f\,dt = rra\,E\left(\frac{dW}{W}\frac{dp}{p}\right) - \frac{V_{Wz}}{V_W}E\left(dz\,\frac{dp}{p}\right).$$

From here, it is fairly straightforward to express the ICAPM in terms of betas rather than covariances, or as a linear discount factor model. Most empirical work occurs in discrete time; we often simply approximate the continuous time result as

$$E(R) - R^f \approx rra\,\mathrm{cov}(R, \Delta W) + \lambda_z\,\mathrm{cov}(R, \Delta z).$$

One often substitutes covariance with the wealth portfolio for covariance with wealth, and one uses factor-mimicking portfolios for the other factors $dz$ as well. The factor-mimicking portfolios are interesting for portfolio advice as well, as they give the purest way of hedging against or profiting from state variable risk exposure.
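As a rough illustration of the discrete-time approximation, the sketch below adds up the wealth term and the state-variable terms. The function name and all numbers are hypothetical; `lambda_z` stands in for the state-variable coefficients (the $-V_{Wz}/V_W$ terms of the continuous-time formula), which the theory does not pin down numerically here.

```python
def icapm_expected_excess_return(rra, cov_r_dw, lambda_z, cov_r_dz):
    """E(R) - R^f ~= rra * cov(R, dW) + sum_k lambda_z[k] * cov(R, dz[k])."""
    if len(lambda_z) != len(cov_r_dz):
        raise ValueError("one price of risk per state variable is required")
    hedging_terms = sum(lam * c for lam, c in zip(lambda_z, cov_r_dz))
    return rra * cov_r_dw + hedging_terms


# Hypothetical inputs: risk aversion of 3 and a single state variable.
premium = icapm_expected_excess_return(
    rra=3.0, cov_r_dw=0.02, lambda_z=[-1.5], cov_r_dz=[-0.004]
)
print(f"approximate expected excess return: {premium:.4f}")
```

With no state variables (empty lists) the expression collapses to the wealth term alone, which is the CAPM-like special case.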

This short derivation does not do justice to the beauty of Merton’s portfolio theory and ICAPM. What remains is to actually state the consumer’s problem and prove that the value function depends on $W$ and $z$, the state variables for future investment opportunities, and that the optimal portfolio holds the market and hedge portfolios for the investment opportunity variables.


10.3 Comments on the CAPM and ICAPM

Conditional vs. unconditional models.

Do they price options?

Why bother linearizing?

The wealth portfolio.

Ex-post returns.

The implicit consumption-based model.

What are the ICAPM state variables?

CAPM and ICAPM as general equilibrium models

Is the CAPM conditional or unconditional?

Is the CAPM a conditional or an unconditional factor model? I.e., are the parameters $a$ and $b$ in $m = a - bR^W$ constants, or do they change at each time period, as conditioning information changes? We saw above that a conditional CAPM does not imply an unconditional CAPM, so additional steps must be taken to say anything about observed average returns.

The two period quadratic utility based derivation results in a conditional CAPM, since the parameters $a_t$ and $b_t$ depend on consumption which changes over time. Also we know that $a$ and $b$ must vary over time if the conditional moments of $R^W$, $R^f$ vary over time. This two-period investor chooses a portfolio on the conditional mean variance frontier, which is not on the unconditional frontier. The multiperiod quadratic utility CAPM only holds if returns are i.i.d. so it only holds if there is no difference between conditional and unconditional models.

The log utility CAPM expressed with the inverse market return is a beautiful model, since it holds both conditionally and unconditionally. There are no free parameters that can change with conditioning information:

$$1 = E_t\left(\frac{1}{R^W_{t+1}}R_{t+1}\right) \Leftrightarrow 1 = E\left(\frac{1}{R^W_{t+1}}R_{t+1}\right).$$

In fact there are no free parameters at all! Furthermore, the model makes no distributional assumptions, so it can apply to any asset, including options. Finally it requires no specification of the investment opportunity set, or (macro language) no specification of technology.
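To see that nothing in $m = 1/R^W$ relies on normality, here is a minimal sketch in a made-up three-state economy. The state probabilities, the wealth returns and the option strike are all assumptions chosen for illustration; the point is only that $p = E(mx)$ can be evaluated for an option payoff as easily as for the wealth portfolio itself.

```python
# Hypothetical three-state economy with equal probabilities.
probs = [1 / 3, 1 / 3, 1 / 3]
wealth_return = [0.8, 1.05, 1.3]           # gross wealth portfolio return R^W
m = [1.0 / rw for rw in wealth_return]     # log utility discount factor m = 1/R^W


def price(payoff):
    """p = E(m x) over the finite state space."""
    return sum(pi * mi * xi for pi, mi, xi in zip(probs, m, payoff))


# A call on the wealth portfolio return with strike 1.0: a very non-normal payoff.
call_payoff = [max(rw - 1.0, 0.0) for rw in wealth_return]
call_price = price(call_payoff)
call_return = [x / call_price for x in call_payoff]

# 1 = E(R / R^W) holds for the option return and for the wealth return itself.
check_call = price(call_return)
check_wealth = price(wealth_return)
print(call_price, check_call, check_wealth)
```

The option return satisfies the pricing equation by construction, since its price was computed from the same $m$; what the example shows is that no distributional assumption was needed to do so.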

Linearizing the log utility CAPM comes at enormous price. The expectations in the linearized log utility CAPM (10.14) are conditional. Thus, the apparent simplification of linearity destroys the nice unconditional feature of the log utility CAPM.

Should the CAPM price options?


As I have derived them, the quadratic utility CAPM and the nonlinear log utility CAPM should apply to all payoffs: stocks, bonds, options, contingent claims, etc. However, if we assume normal return distributions to obtain a linear CAPM from log utility, we can no longer hope to price options, since option returns are non-normally distributed (that’s the point of options!). Even the normal distribution for regular returns is a questionable assumption. You may hear the statement “the CAPM is not designed to price derivative securities”; the statement refers to the log utility plus normal-distribution derivation of the linear CAPM.

Why linearize?

Why bother linearizing a model? Why take the log utility model $m = 1/R^W$ which should price any asset, and turn it into $m_{t+1} = a_t + b_t R^W_{t+1}$ that loses the clean conditioning-down property and cannot price non-normally distributed payoffs? These tricks were developed before the $p = E(mx)$ expression of asset pricing models, when (linear) expected return-beta models were the only thing around. You need a linear model of $m$ to get an expected return-beta model. More importantly, the tricks were developed when it was hard to estimate nonlinear models. It’s clear how to estimate a $\beta$ and a $\lambda$ by regressions, but estimating nonlinear models used to be a big headache. Now, GMM has made it easy to estimate and evaluate nonlinear models. Thus, in my opinion, linearization is mostly intellectual baggage.

The desire for linear representations and this normality trick is one of the central reasons why many asset pricing models are written in continuous time. In most continuous time models, everything is locally normal. Unfortunately for empiricists, this approach adds time-aggregation and another layer of unobservable conditioning information into the predictions of the model. For this reason, most empirical work is still based on discrete-time models. However, the fact that distributions are locally normal in continuous time, even for option returns, is a good reminder that normal approximations probably aren’t that bad, so long as the time interval is kept reasonably short.

What about the wealth portfolio?

The log utility derivation makes clear just how expansive is the concept of the wealth portfolio. To own a (share of) the consumption stream, you have to own not only all stocks, but all bonds, real estate, privately held capital, publicly held capital (roads, parks, etc.), and human capital, a nice word for “people.” Clearly, the CAPM is a poor defense of common proxies such as the value-weighted NYSE portfolio. And keep in mind that since it is easy to find ex-post mean-variance efficient portfolios of any subset of assets (like stocks) out there, taking the theory seriously is our only guard against fishing.

Implicit consumption-based models

Many users of alternative models clearly are motivated by a belief that the consumption-based model doesn’t work, no matter how well measured consumption might be. This view is not totally unreasonable; as above, perhaps transactions costs de-link consumption and asset returns at high frequencies, and some diagnostic evidence suggests that the consumption behavior necessary to save the consumption model is too wild to be believed.

However, the derivations make clear that the CAPM and ICAPM are not alternatives to the consumption-based model, they are special cases of that model. In each case $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$ still operates. We just added assumptions that allowed us to substitute $c_t$ in favor of other variables. One cannot adopt the CAPM on the belief that the consumption based model is wrong. If you think the consumption-based model is wrong, the economic justification for the alternative factor models evaporates.

The only plausible excuse for factor models is a belief that consumption data are unsatisfactory. However, while asset return data are well measured, it is not obvious that the S&P500 or other portfolio returns are terrific measures of the return to total wealth. “Macro factors” used by Chen, Roll and Ross (1986) and others are distant proxies for the quantities they want to measure, and macro factors based on other NIPA aggregates (investment, output, etc.) suffer from the same measurement problems as aggregate consumption.

In large part, the “better performance” of the CAPM and ICAPM relative to consumption-based models comes from throwing away content. Again $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$ is there in any CAPM or ICAPM. The CAPM and ICAPM make predictions concerning consumption data that are wildly implausible, not only of admittedly poorly measured consumption data but any imaginable perfectly measured consumption data as well. For example, equation (10.15) says that the standard deviation of the wealth portfolio return equals the standard deviation of consumption growth. The latter is about 1% per year. All the miserable failures of the log-utility consumption-based model apply equally to the log utility CAPM. Finally, most models take the market price of risk as a free parameter. Of course it isn’t; it is related to risk aversion and consumption volatility and is very hard to justify as such.

Ex-post returns.

The log utility model also allows us for the first time to look at what moves returns ex-post as well as ex-ante. Recall that, in the log utility model, we have

$$R^W_{t+1} = \frac{1}{\beta}\frac{c_{t+1}}{c_t}. \tag{10.15}$$

Thus, the wealth portfolio return is high, ex-post, when consumption is high. This holds at every frequency: If stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch. This seems silly. Aggregate consumption and asset returns are likely to be de-linked at high frequencies, but how high (quarterly?) and by what mechanism are important questions to be answered. In any case, this is another implication of the log utility CAPM that is just thrown out.

In sum, the poor performance of the consumption-based model is an important nut to chew on, not just a blind alley or failed attempt that we can safely disregard and go on about our business.

Identity of state variables

The ICAPM does not tell us the identity of the state variables $z_t$, and many authors use the ICAPM as an obligatory citation to theory on the way to using factors composed of ad-hoc portfolios, leading Fama (1991) to characterize the ICAPM as a “fishing license.” The ICAPM really isn’t quite such an expansive license. One could do a lot to insist that the factor-mimicking portfolios actually are the projections of some identifiable state variables on to the space of returns, and one could do a lot to make sure the candidate state variables really are plausible state variables for an explicitly stated optimization problem. For example, one could check that they actually do forecast something. The fishing license comes as much from habits of applying the theory as from the theory itself.

General equilibrium models

The CAPM and other models are really general equilibrium models. Looking at the derivation through general-equilibrium glasses, we have specified a set of linear technologies with returns $R^i$ that do not depend on the amount invested. Some derivations make further assumptions, such as an initial capital stock, and no labor or labor income.

10.4 Arbitrage Pricing Theory (APT)

The APT: If a set of asset returns are generated by a linear factor model

$$R^i = E(R^i) + \sum_{j=1}^{M}\beta_{ij}\tilde f_j + \epsilon^i$$

$$E(\epsilon^i) = E(\epsilon^i\tilde f_j) = 0.$$

Then (with additional assumptions) there is a discount factor $m$ linear in the factors, $m = a + b'f$, that prices the returns.

The APT starts from a statistical characterization. There is a big common component to stock returns: when the market goes up, most individual stocks also go up. Beyond the market, groups of stocks move together such as computer stocks, utilities, small stocks, value stocks and so forth. Finally, each stock’s return has some completely idiosyncratic movement. This is a characterization of realized returns, outcomes or payoffs. The point of the APT is to start with this statistical characterization of outcomes, and derive something about expected returns or prices.

The intuition behind the APT is that the completely idiosyncratic movements in asset returns should not carry any risk prices, since investors can diversify them away by holding portfolios. Therefore, risk prices or expected returns on a security should be related to the security’s covariance with the common components or “factors” only.

The job of this section is then 1) to describe a mathematical model of the tendency for stocks to move together, and thus to define the “factors” and residual idiosyncratic components, and 2) to think carefully about what it takes for the idiosyncratic components to have zero (or small) risk prices, so that only the common components matter to asset pricing.

There are two lines of attack for the second item. 1) If there were no residual, then we could price securities from the factors by arbitrage (really, by the law of one price, but the current distinction between law of one price and arbitrage came after the APT was named). Perhaps we can extend this logic and show that if the residuals are small, they must have small risk prices. 2) If investors all hold well-diversified portfolios, then only variations in the factors drive consumption and hence marginal utility.

Much of the original appeal and marketing of the APT came from the first line of attack, the idea that we could derive pricing implications without the economic structure required of the CAPM, ICAPM, or any other model derived as a specialization of the consumption-based model. In this section, I will first try to see how far we can in fact get with purely law of one price arguments. I will conclude that the answer is, “not very far,” and that the most satisfactory argument for the APT is in fact just another specialization of the consumption-based model.

10.4.1 Factor structure in covariance matrices

I define and examine the factor decomposition

$$x^i = \alpha_i + \beta_i'f + \epsilon^i; \quad E(\epsilon^i) = 0; \quad E(f\epsilon^i) = 0$$

The factor decomposition is equivalent to a restriction on the payoff covariance matrix.

The APT models the tendency of asset payoffs (returns) to move together via a statistical factor decomposition

$$x^i = \alpha_i + \sum_{j=1}^{M}\beta_{ij}f_j + \epsilon^i = \alpha_i + \beta_i'f + \epsilon^i. \tag{10.16}$$

The $f_j$ are the factors, the $\beta_{ij}$ are the betas or factor loadings and the $\epsilon^i$ are residuals. As usual, I use the same letter without subscripts to denote a vector, for example $f = [f_1\ f_2\ \ldots\ f_M]'$. A discount factor $m$, pricing factors $f$ in $m = b'f$ and this factor decomposition (or factor structure) for returns are totally unrelated uses of the word “factor.” I didn’t invent the terminology! The APT is conventionally written with $x^i$ = returns, but it ends up being much less confusing to use prices and payoffs.

It is a convenient and conventional simplification to fold the factor means into the first, constant, factor and write the factor decomposition with zero-mean factors $\tilde f \equiv f - E(f)$.

$$x^i = E(x^i) + \sum_{j=1}^{M}\beta_{ij}\tilde f_j + \epsilon^i. \tag{10.17}$$

Remember that $E(x^i)$ is still just a statistical characterization, not a prediction of a model.

We can construct the factor decomposition as a regression equation. Define the $\beta_{ij}$ as regression coefficients, and then the $\epsilon^i$ are uncorrelated with the factors by construction,

$$E(\epsilon^i\tilde f_j) = 0.$$

The content, the assumption that keeps (10.17) from describing any arbitrary set of returns, is an assumption that the $\epsilon^i$ are uncorrelated with each other,

$$E(\epsilon^i\epsilon^j) = 0.$$

(More general versions of the model allow some limited correlation across the residuals but the basic story is the same.)

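The factor structure is thus a restriction on the covariance matrix of payoffs. For example, if there is only one factor, then

$$\mathrm{cov}(x^i, x^j) = E\left[(\beta_i\tilde f + \epsilon^i)(\beta_j\tilde f + \epsilon^j)\right] = \beta_i\beta_j\sigma^2(f) + \begin{cases}\sigma^2_{\epsilon^i} & \text{if } i = j \\ 0 & \text{if } i \neq j\end{cases}.$$

Thus, with $N$ = number of securities, the $N(N-1)/2$ elements of a variance-covariance matrix are described by $N$ betas, and $N + 1$ variances. A vector version of the same thing is

$$\mathrm{cov}(x, x') = \beta\beta'\sigma^2(f) + \begin{bmatrix}\sigma^2_{\epsilon^1} & 0 & 0 \\ 0 & \sigma^2_{\epsilon^2} & 0 \\ 0 & 0 & \ddots\end{bmatrix}.$$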

With multiple (orthogonalized) factors, we obtain

$$\mathrm{cov}(x, x') = \beta_1\beta_1'\sigma^2(f_1) + \beta_2\beta_2'\sigma^2(f_2) + \ldots + (\text{diagonal matrix})$$

In all these cases, we describe the covariance matrix as a singular matrix $\beta\beta'$ (or a sum of a few such singular matrices) plus a diagonal matrix.
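The one-factor restriction is easy to write down concretely. The sketch below builds $\beta\beta'\sigma^2(f)$ plus a diagonal matrix from hypothetical loadings and residual variances; the function name and the numbers are illustrative only.

```python
def one_factor_covariance(betas, factor_var, resid_vars):
    """Return cov(x, x') = beta beta' sigma^2(f) + diag(resid_vars) as nested lists."""
    n = len(betas)
    if len(resid_vars) != n:
        raise ValueError("need one residual variance per asset")
    # Common component: every entry is beta_i * beta_j * sigma^2(f).
    cov = [[betas[i] * betas[j] * factor_var for j in range(n)] for i in range(n)]
    # Idiosyncratic component: residual variances on the diagonal only.
    for i in range(n):
        cov[i][i] += resid_vars[i]
    return cov


betas = [0.8, 1.0, 1.3]            # hypothetical factor loadings
resid_vars = [0.02, 0.03, 0.05]    # hypothetical residual variances
cov = one_factor_covariance(betas, factor_var=0.04, resid_vars=resid_vars)
for row in cov:
    print(["%.4f" % v for v in row])
```

Every off-diagonal entry is pinned down by the betas and the single factor variance, which is the sense in which the factor structure restricts the covariance matrix.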

If we know the factors we want to use ahead of time, say the market (value-weighted portfolio) and industry portfolios, or size and book to market portfolios, we can estimate a factor structure by running regressions. Often, however, we don’t know the identities of the factor portfolios ahead of time. In this case we have to use one of several statistical techniques under the broad heading of factor analysis (that’s where the word “factor” came from in this context) to estimate the factor model. One can estimate a factor structure quickly by simply taking an eigenvalue decomposition of the covariance matrix, and then setting small eigenvalues to zero.
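A minimal sketch of the eigenvalue idea, using only the standard library: power iteration recovers the largest eigenvalue and its eigenvector, which is enough for a one-factor approximation. Power iteration is a stand-in here for a full eigenvalue decomposition, and the covariance matrix is a hypothetical one-factor example with equal residual variances, so that the top eigenvector is exactly proportional to the betas.

```python
import math


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))   # Rayleigh quotient
    return eigenvalue, v


# Hypothetical one-factor covariance: beta beta' sigma^2(f) + s^2 I.
betas, factor_var, resid_var = [0.8, 1.0, 1.3], 0.04, 0.01
cov = [[betas[i] * betas[j] * factor_var + (resid_var if i == j else 0.0)
        for j in range(3)] for i in range(3)]

eigenvalue, vector = top_eigenpair(cov)
# Keep the top eigenvalue and set the others to zero: a rank-one approximation.
rank_one = [[eigenvalue * vector[i] * vector[j] for j in range(3)] for i in range(3)]
beta_norm = math.sqrt(sum(b * b for b in betas))
print(eigenvalue, [x * beta_norm for x in vector])
```

With unequal residual variances the top eigenvector is no longer exactly proportional to the betas, which is one reason the text describes this as a quick estimate.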

10.4.2 Exact factor pricing

With no error term,

$$x^i = E(x^i)1 + \beta_i'\tilde f$$

implies

$$p(x^i) = E(x^i)p(1) + \beta_i'p(\tilde f)$$

and thus

$$m = a + b'f; \quad p(x^i) = E(mx^i)$$

$$E(R^i) = R^f + \beta_i'\lambda$$

using only the law of one price.

Suppose that there are no idiosyncratic terms $\epsilon^i$. This is called an exact factor model. Now look again at the factor decomposition,

$$x^i = E(x^i)1 + \beta_i'\tilde f. \tag{10.18}$$

It started as a statistical decomposition. But it also says that the payoff $x^i$ can be synthesized as a portfolio of the factors and a constant (risk-free payoff). Thus, the price of $x^i$ can only depend on the prices of the factors $f$,

$$p(x^i) = E(x^i)p(1) + \beta_i'p(\tilde f). \tag{10.19}$$

The law of one price assumption lets you take prices of right and left sides.

If the factors are returns, their prices are 1. If the factors are not returns, their prices are free parameters which can be picked to make the model fit as well as possible. Since there are fewer factors than payoffs, this procedure is not vacuous. (Recall that the prices of the factors are related to the $\lambda$ in expected return beta representations. $\lambda$ is determined by the expected return of a return factor, and is a free parameter for non-return factor models.)

We are really done, but the APT is usually stated as “there is a discount factor linear in $f$ that prices returns $R^i$,” or “there is an expected return-beta representation with $f$ as factors.” Therefore, we should take a minute to show that the rather obvious relationship (10.19) between prices is equivalent to discount factor and expected return statements.


Assuming only the law of one price, we know there is a discount factor $m$ linear in factors that prices the factors. We usually call it $x^*$, but call it $f^*$ here to remind us that it prices the factors. Denote $\hat f = [1\ \ \tilde f]'$ the factors including the constant. As with $x^*$, $f^* = p(\hat f)'E(\hat f\hat f')^{-1}\hat f = a + b'f$ satisfies $p(\hat f) = E(f^*\hat f)$ and $p(1) = E(f^*)$. If the discount factor prices the factors, it must price any portfolio of the factors; hence $f^*$ prices all payoffs $x^i$ that follow the factor structure (10.18).

We could now go from $m$ linear in the factors to an expected return-beta model using the above theorems that connect the two representations. But there is a more direct and elegant connection. Start with (10.19), specialized to returns $x^i = R^i$ and of course $p(R^i) = 1$. Use $p(1) = 1/R^f$ and solve for expected return as

$$E(R^i) = R^f + \beta_i'\left[-R^f p(\tilde f)\right] = R^f + \beta_i'\lambda.$$

The last equality defines $\lambda$. Expected returns are linear in the betas, and the constants ($\lambda$) are related to the prices of the factors. In fact, this is the same definition of $\lambda$ that we arrived at above connecting $m = b'f$ to expected return-beta models.
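As a worked special case, suppose the single factor is itself a return, written here as $R^m$ (a symbol introduced only for this illustration). Its price is 1, so the definition $\lambda = -R^f p(\tilde f)$ reduces to the familiar expected excess return of the factor:

```latex
\begin{aligned}
\tilde f &= R^m - E(R^m), \\
p(\tilde f) &= p(R^m) - E(R^m)\,p(1) = 1 - \frac{E(R^m)}{R^f}, \\
\lambda &= -R^f\,p(\tilde f) = E(R^m) - R^f .
\end{aligned}
```

This is the sense in which $\lambda$ is determined by the expected return of a return factor, while it remains a free parameter when the factor is not a return.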

10.4.3 Approximate APT using the law of one price

Attempts to extend the exact factor model to an approximate factor pricing model when errors are “small,” or markets are “large,” still only using law of one price.

For fixed $m$, the APT gets better and better as $R^2$ or the number of assets increases.

However, for any fixed $R^2$ or size of market, the APT can be arbitrarily bad.

These observations mean that we must go beyond the law of one price to derive factor pricing models.

Actual returns do not display an exact factor structure. There is some idiosyncratic or residual risk; we cannot exactly replicate the return of a given stock with a portfolio of a few large factor portfolios. However, the idiosyncratic risks are often small. For example, factor model regressions of the form (10.16) often have very high $R^2$, especially when portfolios rather than individual securities are on the left hand side. And the residual risks are still idiosyncratic: Even if they are a large part of an individual security’s variance, they should be a small contributor to the variance of well diversified portfolios. Thus, there is reason to hope that the APT holds approximately, especially for reasonably large portfolios. Surely, if the residuals are “small” and/or “idiosyncratic,” the price of an asset can’t be “too different” from the price predicted from its factor content?

To think about these issues, start again from a factor structure, but this time put in a residual,

$$x^i = E(x^i)1 + \beta_i'\tilde f + \epsilon^i$$

Again take prices of both sides,

$$p(x^i) = E(x^i)p(1) + \beta_i'p(\tilde f) + E(m\epsilon^i)$$

Now, what can we say about the price of the residual $p(\epsilon^i) = E(m\epsilon^i)$?

Figure 21 illustrates the situation. Portfolios of the factors span a payoff space, the line from the origin through $\beta_i'f$ in the Figure. The payoff we want to price, $x^i$, is not in that space, since the residual $\epsilon^i$ is not zero. A discount factor $f^*$ that is in the $f$ payoff space prices the factors. The set of all discount factors that price the factors is the line $m$ perpendicular to $f^*$. The residual $\epsilon^i$ is orthogonal to the factor space, since it is a regression residual, and to $f^*$ in particular, $E(f^*\epsilon^i) = 0$. This means that $f^*$ assigns zero price to the residual. But the other discount factors on the $m$ line are not orthogonal to $\epsilon^i$, so generate non-zero price for the residual $\epsilon^i$. As we sweep along the line of discount factors $m$ that price the $f$, in fact, we generate every price from $-\infty$ to $\infty$ for the residual. Thus, the law of one price does not nail down the price of the residual $\epsilon^i$ and hence the price or expected return of $x^i$.

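[Figure 21. Approximate arbitrage pricing. Labels in the plot: the factor payoff $\beta_i'f$, the payoff $x^i$, the residual $\epsilon^i$, the discount factor $f^*$, and the line of discount factors $m$, marked for all $m$, for $m > 0$, and for $m$: $\sigma^2(m) < A$.]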

Limiting arguments

We would like to show that the price of $x^i$ has to be “close to” the price of $\beta_i'f$. One notion of “close to” is that in some appropriate limit the price of $x^i$ converges to the price of $\beta_i'f$. “Limit” means, of course, that you can get arbitrarily good accuracy by going far enough in the direction of the limit (for every $\epsilon > 0$ there is a $\delta$....). Thus, establishing a limit result is a way to argue for an approximation.

Here is one theorem that seems to imply that the APT should be a good approximation for portfolios that have high $R^2$ on the factors. I state the argument for the case that there is a constant factor, so the constant is in the $f$ space and $E(\epsilon^i) = 0$. The same ideas work in the less usual case that there is no constant factor, using second moments in place of variance.

Theorem: Fix a discount factor $m$ that prices the factors. Then, as $\mathrm{var}(\epsilon^i) \to 0$, $p(x^i) \to p(\beta_i'f)$.

This is easiest to see by just looking at the graph. $E(\epsilon^i) = 0$ so $\mathrm{var}(\epsilon^i) = E(\epsilon^{i2}) = \|\epsilon^i\|^2$. Thus, as the size of the $\epsilon^i$ vector in Figure 21 gets smaller, $x^i$ gets closer and closer to $\beta_i'f$. For any fixed $m$, the induced pricing function (lines perpendicular to the chosen $m$) is continuous. Thus, as $x^i$ gets closer and closer to $\beta_i'f$, its price gets closer and closer to $p(\beta_i'f)$.

The factor model is defined as a regression, so

$$\mathrm{var}(x^i) = \mathrm{var}(\beta_i'f) + \mathrm{var}(\epsilon^i)$$

Thus, the variance of the residual is related to the regression $R^2$.

$$\frac{\mathrm{var}(\epsilon^i)}{\mathrm{var}(x^i)} = 1 - R^2$$

The theorem says that as $R^2 \to 1$, the price of the residual goes to zero.
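One way to make the limit explicit, for a fixed discount factor $m$, is the Cauchy-Schwarz inequality. The bound below is a sketch under the same assumptions as the theorem (a constant factor, so $E(\epsilon^i) = 0$); it is offered as an illustration of why the result holds:

```latex
\left| p(\epsilon^i) \right| = \left| E(m\,\epsilon^i) \right|
  \le \sqrt{E(m^2)}\,\sqrt{E\!\left(\epsilon^{i2}\right)}
  = \|m\|\,\sigma(\epsilon^i)
  = \|m\|\,\sqrt{1 - R^2}\;\sigma(x^i).
```

For fixed $\|m\|$ the right-hand side goes to zero as $R^2 \to 1$, while for fixed $R^2 < 1$ it grows without limit as $\|m\|$ grows.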

We were hoping for some connection between the fact that the risks are idiosyncratic and factor pricing. Even if the idiosyncratic risks are a large part of the payoff at hand, they are a small part of a well-diversified portfolio. The next theorem shows that portfolios with high $R^2$ don’t have to happen by chance; well-diversified portfolios will always have this characteristic.

Theorem: As the number of primitive assets increases, the $R^2$ of well-diversified portfolios increases to 1.

Proof: Start with an equally weighted portfolio

$$x^p = \frac{1}{N}\sum_{i=1}^{N}x^i.$$

Going back to the factor decomposition (10.16) for each individual asset $x^i$, the factor decomposition of $x^p$ is

$$x^p = \frac{1}{N}\sum_{i=1}^{N}\left(\alpha_i + \beta_i'f + \epsilon^i\right) = \frac{1}{N}\sum_{i=1}^{N}\alpha_i + \frac{1}{N}\sum_{i=1}^{N}\beta_i'f + \frac{1}{N}\sum_{i=1}^{N}\epsilon^i = \alpha_p + \beta_p'f + \epsilon^p.$$

The last equality defines notation $\alpha_p$, $\beta_p$, $\epsilon^p$. But

$$\mathrm{var}(\epsilon^p) = \mathrm{var}\left(\frac{1}{N}\sum_{i=1}^{N}\epsilon^i\right)$$

So long as the variances of the $\epsilon^i$ are bounded, and given the factor assumption $E(\epsilon^i\epsilon^j) = 0$,

$$\lim_{N\to\infty}\mathrm{var}(\epsilon^p) = 0.$$

Obviously, the same idea goes through so long as the portfolio spreads some weight on all the new assets, i.e. so long as it is “well-diversified.” $\blacksquare$
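The limit in the proof can be seen numerically. With uncorrelated residuals, the variance of the equally weighted residual is the sum of the individual variances divided by $N^2$. The sketch below assumes, for illustration only, that every residual variance equals 0.04.

```python
def equal_weight_residual_variance(resid_vars):
    """var(eps^p) for an equally weighted portfolio of uncorrelated residuals."""
    n = len(resid_vars)
    return sum(resid_vars) / n ** 2


# Hypothetical residual variances, all equal to 0.04 and hence bounded.
for n in (10, 100, 1000, 10000):
    print(n, equal_weight_residual_variance([0.04] * n))
```

With equal variances the result is $0.04/N$, so the residual variance of the portfolio shrinks at rate $1/N$ even though each individual residual stays large.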

These two theorems can be interpreted to say that the APT holds approximately (in the usual limiting sense) either for portfolios that naturally have high $R^2$, or for well-diversified portfolios in large enough markets. We have only used the law of one price.

Law of one price arguments fail

Now, let me pour some cold water on these results. I fixed $m$ and then let other things take limits. The flip side is that for any nonzero residual $\epsilon^i$, no matter how small, we can pick a discount factor $m$ that prices the factors and assigns any price to $x^i$! As often in mathematics, the order of “for all” and “there exists” matters a lot.

Theorem: For any nonzero residual $\epsilon^i$ there is a discount factor that prices the factors $f$ (consistent with the law of one price) and that assigns any desired price in $(-\infty, \infty)$ to the payoff $x^i$.

So long as $\|\epsilon^i\| > 0$, as we sweep the choice of $m$ along the dashed line, the inner product of $m$ with $\epsilon^i$ and hence $x^i$ varies from $-\infty$ to $\infty$. Thus, for a given size $R^2 < 1$, or a given finite market, the law of one price says absolutely nothing about the prices of payoffs that do not exactly follow the factor structure. The law of one price says that two ways of constructing the same portfolio must give the same price. If the residual is not exactly zero, there is no way of replicating the payoff $x^i$ from the factors and no way to infer anything about the price of $x^i$ from the price of the factors.
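The theorem can be illustrated in a small finite-state example. The sketch below assumes four equally likely states, one zero-mean factor, a residual orthogonal to the constant and the factor, and made-up factor prices. It constructs $m = f^* + w\,\epsilon$ and picks $w$ so that the payoff gets whatever price is asked for, while the factors are still priced correctly.

```python
probs = [0.25] * 4
f_tilde = [0.1, -0.1, 0.1, -0.1]       # zero-mean factor (hypothetical)
eps = [0.05, 0.05, -0.05, -0.05]       # residual, orthogonal to 1 and f_tilde
x = [1.0 + 1.2 * f + e for f, e in zip(f_tilde, eps)]   # payoff: E(x) = 1, beta = 1.2


def E(values):
    """Expectation over the finite state space."""
    return sum(p * v for p, v in zip(probs, values))


def mul(a, b):
    """State-by-state product of two payoffs."""
    return [ai * bi for ai, bi in zip(a, b)]


p_one, p_f = 1 / 1.02, -0.005          # assumed prices of the constant and the factor
b = p_f / E(mul(f_tilde, f_tilde))
f_star = [p_one + b * f for f in f_tilde]   # discount factor inside the factor space


def discount_factor_for(target_price):
    """Pick m = f* + w * eps so that E(m x) equals target_price."""
    w = (target_price - E(mul(f_star, x))) / E(mul(eps, eps))
    return [fs + w * e for fs, e in zip(f_star, eps)]


for target in (-5.0, 0.97, 100.0):
    m = discount_factor_for(target)
    print(target, E(m), E(mul(m, f_tilde)), E(mul(m, x)))
```

Each row prints the same two factor prices and a different price for $x^i$. The discount factors needed for extreme targets are themselves extreme, with large values of $w$ and hence large variance.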

I think the contrast between this theorem and those of the last subsection accounts for most of the huge theoretical controversy over the APT. If you fix $m$ and take limits of $N$ or $\epsilon$, the APT gets arbitrarily good. But if you fix $N$ or $\epsilon$, as one does in any application, the APT can get arbitrarily bad as you search over possible $m$.

The lesson I learn is that the effort to extend prices from an original set of securities ($f$ in this case) to new payoffs that are not exactly spanned by the original set of securities, using only the law of one price, is fundamentally doomed. To extend a pricing function, you need to add some restrictions beyond the law of one price.

10.4.4 Beyond the law of one price: arbitrage and Sharpe ratios

We can find a well-behaved approximate APT if we impose the law of one price and a restriction on the volatility of discount factors, or, equivalently, a bound on the Sharpe ratio achievable by portfolios of the factors and test assets.

The approximate APT based on the law of one price fell apart because we could always choose a discount factor sufficiently “far out” to generate an arbitrarily large price for an arbitrarily small residual. But those discount factors are surely “unreasonable.” Surely, we can rule them out, reestablishing an approximate APT, without jumping all the way to fully specified discount factor models such as the CAPM or consumption-based model.

A natural first idea is to impose the no-arbitrage restriction that $m$ must be positive. Graphically, we are now restricted to the solid $m$ line in Figure 21. Since that line only extends a finite amount, restricting us to strictly positive $m$’s gives rise to finite upper and lower arbitrage bounds on the price of $\varepsilon^i$ and hence $x^i$. (The term “arbitrage bounds” comes from option pricing, and we will see these ideas again in that context. If this idea worked, it would restore the APT to “arbitrage pricing” rather than “law of one-pricing.”)

Alas, in applications of the APT (as often in option pricing), the arbitrage bounds are too wide to be of much use. The positive discount factor restriction is equivalent to saying “if portfolio A gives a higher payoff than portfolio B in every state of nature, then the price of A must be higher than the price of B.” Since stock returns and factors are continuously distributed, not two-state distributions as I have graphed for Figure 21, there typically are no strictly dominating portfolios, so adding $m > 0$ does not help.

A second restriction does let us derive an approximate APT that is useful in finite markets with $R^2 < 1$. We can restrict the variance and hence the size ($\|m\|^2 = E(m^2) = \sigma^2(m) + E(m)^2 = \sigma^2(m) + 1/(R^f)^2$) of the discount factor. Figure 21 includes a plot of the discount factors with limited variance, size, or length in the geometry of that figure. The restricted range of discount factors produces a restricted range of prices for $x^i$, and so gives us upper and lower price bounds for the price of $x^i$ in terms of the factor prices. Precisely, the upper and lower bounds solve the problem

$$\min_{\{m\}} \ (\text{or } \max_{\{m\}}) \ \ p(x^i) = E(m x^i) \quad \text{s.t.} \quad E(mf) = p(f), \ \ m \geq 0, \ \ \sigma^2(m) \leq A.$$

Limiting the variance of the discount factor is of course the same as limiting the maximum Sharpe ratio (mean / standard deviation of excess return) available from portfolios of the factors and $x^i$. Recall that

$$\frac{E(R^e)}{\sigma(R^e)} \leq \frac{\sigma(m)}{E(m)}.$$
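As a quick numerical check of the bound just recalled, the following sketch builds a zero-cost payoff priced by a given discount factor in a small finite-state example and compares its Sharpe ratio with $\sigma(m)/E(m)$. The state probabilities, the discount factor values, and the payoff `z` are made-up illustrative numbers, not taken from the text.

```python
import numpy as np

# Hypothetical four-state economy with equal probabilities (illustrative numbers only).
prob = np.full(4, 0.25)
m = np.array([1.30, 1.05, 0.90, 0.70])   # a strictly positive discount factor
z = np.array([0.80, 1.00, 1.10, 1.40])   # an arbitrary payoff

def expect(x):
    """Expectation under the state probabilities."""
    return float(prob @ x)

def std(x):
    """Standard deviation under the state probabilities."""
    return float(np.sqrt(expect((x - expect(x)) ** 2)))

# Turn z into an excess return: subtract the constant that makes E(m Re) = 0.
Re = z - expect(m * z) / expect(m)

sharpe = abs(expect(Re)) / std(Re)   # Sharpe ratio of the zero-cost payoff
bound = std(m) / expect(m)           # sigma(m) / E(m)
```

Any other payoff `z` priced by the same `m` should give a Sharpe ratio no larger than `bound`, which is the sense in which limiting $\sigma(m)$ limits the available Sharpe ratios.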

Though a bound on Sharpe ratios or discount factor volatility is not a totally preference-free concept, it clearly imposes a great deal less structure than the CAPM or ICAPM, which are essentially full general equilibrium models. Ross (1976) included this suggestion in his original APT paper, though it seems to have disappeared from the literature since then in the failed effort to derive an APT from the law of one price alone. Ross pointed out that deviations from factor pricing could provide very high Sharpe ratio opportunities, which seem implausible though not violations of the law of one price. Saá-Requejo and I (2000) dub this idea “good-deal” pricing, as an extension of “arbitrage pricing.” Limiting $\sigma(m)$ rules out “good deals” as well as pure arbitrage opportunities.

Once we impose a limit $A$ on discount factor volatility or the Sharpe ratio, the APT limit does work, and does not depend on the order of “for all” and “there exists.”

Theorem: As $\varepsilon^i \rightarrow 0$ and $R^2 \rightarrow 1$, the price $p(x^i)$ assigned by any discount factor $m$ that satisfies $E(mf) = p(f)$, $m \geq 0$, $\sigma^2(m) \leq A$ approaches $p(\beta_i' f)$.

10.5 APT vs. ICAPM

A factor structure in the covariance of returns or high $R^2$ in regressions of returns on factors are sufficient (APT) but not necessary (ICAPM) for factor pricing.

Differing inspiration for factors.

The disappearance of absolute pricing.

The APT and ICAPM stories are often confused. Factor structure can imply factor pricing (APT), but factor pricing does not require a factor structure. In the ICAPM there is no presumption that factors $f$ in a pricing model $m = b'f$ describe the covariance matrix of returns. The factors don’t have to be orthogonal or i.i.d. either. High $R^2$ in time-series regressions of the returns on the factors may imply factor pricing (APT), but again is not necessary (ICAPM). The regressions of returns on factors can have as low an $R^2$ as one wishes in the ICAPM.

The biggest difference between APT and ICAPM for empirical work is in the inspiration for factors. The APT suggests that one start with a statistical analysis of the covariance matrix of returns and find portfolios that characterize common movement. The ICAPM suggests that one start by thinking about state variables that describe the conditional distribution of future asset returns and non-asset income. More generally, the idea of proxying for marginal utility growth suggests macroeconomic indicators, and indicators of shocks to non-asset income in particular.

The difference between the derivations of factor pricing models, and in particular an approximate law-of-one-price basis vs. a proxy for marginal utility basis, seems not to have had much impact on practice. In practice, we just test models $m = b'f$ and rarely worry about derivations. The best evidence for this view is the introductions of famous papers. Chen, Roll and Ross (1986) describe one of the earliest popular multifactor models, using industrial production and inflation as some of the main factors. They do not even present a factor decomposition of test asset returns, or the time-series regressions. A reader might well categorize the paper as a macroeconomic factor model or perhaps an ICAPM. Fama and French (1993) describe the currently most popular multifactor model, and their introduction describes it as an ICAPM in which the factors are state variables. But the factors are sorted on size and book/market just like the test assets, the time-series $R^2$ are all above 90%, and much of the explanation involves “common movement” in test assets captured by the factors. A reader might well categorize the model as much closer to an APT.

In the first chapter, I made a distinction between relative pricing and absolute pricing. In the former, we price one security given the prices of others, while in the latter, we price each security by reference to fundamental sources of risk. The factor pricing stories are interesting in that they start with a nice absolute pricing model, the consumption-based model, and throw out enough information to end up with relative models. The CAPM prices $R^i$ given the market, but throws out the consumption-based model’s description of where the market return came from.

10.6 Problems

1. Suppose the investor only has a one-period horizon. He invests wealth $W$ at date zero, and only consumes with expected utility $E\,u(c) = E\,u(W)$ in period one. Derive the quadratic utility CAPM in this case. (This is an even simpler derivation. The Lagrange multiplier on initial wealth $W$ now becomes the denominator of $m$ in place of $u'(c_0)$.)

2. Express the log utility CAPM in continuous time to derive a discount factor linear inwealth.

3. Figure 21 suggests that $m > 0$ is enough to establish a well-behaved approximate APT. The text claims this is not true. Which is right?

Part II. Estimating and evaluating asset pricing models

Our first task in bringing an asset pricing model to data is to estimate the free parameters: the $\beta$ and $\gamma$ in $m = \beta(c_{t+1}/c_t)^{-\gamma}$, or the $b$ in $m = b'f$. Then we want to evaluate the model. Is it a good model or not? Is another model better?

Statistical analysis helps us to evaluate a model by providing a distribution theory for numbers such as parameter estimates that we create from the data. A distribution theory pursues the following idea: Suppose that we generate artificial data over and over again from a statistical model. For example, we could specify that the market return is an i.i.d. normal random variable, and a set of stock returns is generated by $R^{ei}_t = \alpha_i + \beta_i R^{em}_t + \varepsilon^i_t$. After picking values for the mean and variance of the market return and the $\alpha_i$, $\beta_i$, $\sigma^2(\varepsilon^i)$, we could ask a computer to simulate many artificial data sets. We can repeat our statistical procedure in each of these artificial data sets, and graph the distribution of any statistic which we have estimated from the real data, i.e. the frequency that it takes on any particular value in our artificial data sets.
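The thought experiment above can be sketched in a few lines. This is only an illustration under assumed values: the function name `simulate_alpha_hats`, the monthly mean and standard deviation of the market return, the residual volatility, and the sample size are all hypothetical choices, not numbers from the text.

```python
import numpy as np

def simulate_alpha_hats(n_sim=500, T=120, alpha=0.0, beta=1.0,
                        mkt_mean=0.006, mkt_std=0.045, eps_std=0.05, seed=0):
    """Simulate artificial samples and return the OLS alpha estimate from each."""
    rng = np.random.default_rng(seed)
    alpha_hats = np.empty(n_sim)
    for s in range(n_sim):
        Rem = rng.normal(mkt_mean, mkt_std, size=T)      # i.i.d. normal market excess return
        eps = rng.normal(0.0, eps_std, size=T)           # idiosyncratic residual
        Rei = alpha + beta * Rem + eps                   # artificial stock excess return
        X = np.column_stack([np.ones(T), Rem])           # regressors: constant and market
        coef, *_ = np.linalg.lstsq(X, Rei, rcond=None)   # OLS estimates [alpha, beta]
        alpha_hats[s] = coef[0]
    return alpha_hats

alpha_hats = simulate_alpha_hats()
# Tabulate how often the estimate falls in each bin: a simulated sampling distribution.
counts, edges = np.histogram(alpha_hats, bins=20)
```

Comparing an alpha estimated from real data with the spread of `alpha_hats` gives a sense of whether it could plausibly be sample luck under the assumed data-generating model.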

In particular, we are interested in a distribution theory for the estimated parameters, to give us some sense of how much the data really has to say about their values; and for the pricing errors, which helps us to judge whether pricing errors are just bad luck of one particular historical accident or if they indicate a failure of the model. We also will want to generate distributions for statistics that compare one model to another, or provide other interesting evidence, to judge how much sample luck affects those calculations.

All of the statistical methods I discuss in this part achieve these ends. They give methods for estimating free parameters; they provide a distribution theory for those parameters, and they provide distributions for statistics that we can use to evaluate models, most often a quadratic form of pricing errors in the form $\hat{\alpha}' V^{-1} \hat{\alpha}$.

I start by focusing on the GMM approach. The GMM approach is a natural fit for a discount factor formulation of asset pricing theories, since we just use sample moments in the place of population moments. As you will see, there is no singular “GMM estimate and test.” GMM is a large canvas and a big set of paints and brushes; a flexible tool for doing all kinds of sensible (and, unless you’re careful, not-so-sensible) things to the data. Then I consider traditional regression tests (naturally paired with expected return-beta statements of factor models) and their maximum likelihood formalization. I emphasize the fundamental similarities between these three methods, as I emphasized the similarity between $p = E(mx)$, expected return-beta models, and mean-variance frontiers. A concluding chapter highlights some of the differences between the methods, as I contrasted $p = E(mx)$ and beta or mean-variance representations of the models.

Chapter 11. GMM in explicit discount factor models

The basic idea in the GMM approach is very straightforward. The asset pricing model predicts

$$E(p_t) = E\left[m(\text{data}_{t+1}, \text{parameters})\, x_{t+1}\right]. \tag{11.1}$$

The most natural way to check this prediction is to examine sample averages, i.e. to calculate

$$\frac{1}{T}\sum_{t=1}^{T} p_t \quad \text{and} \quad \frac{1}{T}\sum_{t=1}^{T}\left[m(\text{data}_{t+1}, \text{parameters})\, x_{t+1}\right]. \tag{11.2}$$

GMM estimates the parameters by making the sample averages as close to each other as possible. It seems natural, before evaluating a model, to pick parameters that give it its best chance. GMM then works out a distribution theory for the estimates. This distribution theory is a generalization of the simplest exercise in statistics: the distribution of the sample mean. Then, it suggests that we evaluate the model by looking at how close the sample averages of price and discounted payoff are to each other, or equivalently by looking at the pricing errors. It gives a statistical test of the hypothesis that the underlying population means are in fact zero.

11.1 The Recipe

Definitions

$$u_{t+1}(b) \equiv m_{t+1}(b)\, x_{t+1} - p_t$$

$$g_T(b) \equiv E_T\left[u_t(b)\right]$$

$$S \equiv \sum_{j=-\infty}^{\infty} E\left[u_t(b)\, u_{t-j}(b)'\right]$$

GMM estimate

$$\hat{b}_2 = \arg\min_b \ g_T(b)'\, \hat{S}^{-1} g_T(b).$$

Standard errors

$$\mathrm{var}(\hat{b}_2) = \frac{1}{T}\left(d' S^{-1} d\right)^{-1}; \quad d \equiv \frac{\partial g_T(b)}{\partial b}$$

Test of the model (“overidentifying restrictions”)

$$T J_T = T \min\left[g_T(b)'\, S^{-1} g_T(b)\right] \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$

It’s easiest to start our discussion of GMM in the context of an explicit discount factor model, such as the consumption-based model. I treat the special structure of linear factor models later. I start with the basic classic recipe as given by Hansen and Singleton (1982).

Discount factor models involve some unknown parameters as well as data, so I write $m_{t+1}(b)$ when it’s important to remind ourselves of this dependence. For example, if $m_{t+1} = \beta(c_{t+1}/c_t)^{-\gamma}$, then $b \equiv [\beta \ \ \gamma]'$. I write $\hat{b}$ to denote an estimate when it is important to distinguish estimated from other values.

Any asset pricing model implies

$$E(p_t) = E\left[m_{t+1}(b)\, x_{t+1}\right]. \tag{11.3}$$

It’s easiest to write this equation in the form $E(\cdot) = 0$:

$$E\left[m_{t+1}(b)\, x_{t+1} - p_t\right] = 0. \tag{11.4}$$

$x$ and $p$ are typically vectors; we typically check whether a model for $m$ can price a number of assets simultaneously. Equations (11.4) are often called the moment conditions.

It’s convenient to define the errors $u_t(b)$ as the object whose mean should be zero,

$$u_{t+1}(b) = m_{t+1}(b)\, x_{t+1} - p_t.$$

Given values for the parameters $b$, we could construct a time series on $u_t$ and look at its mean.

Define $g_T(b)$ as the sample mean of the $u_t$ errors, when the parameter vector is $b$ in a sample of size $T$:

$$g_T(b) \equiv \frac{1}{T}\sum_{t=1}^{T} u_t(b) = E_T\left[u_t(b)\right] = E_T\left[m_{t+1}(b)\, x_{t+1} - p_t\right].$$

The second equality introduces the handy notation $E_T$ for sample means,

$$E_T(\cdot) = \frac{1}{T}\sum_{t=1}^{T}(\cdot).$$

(It might make more sense to denote these estimates $\hat{E}$ and $\hat{g}$. However, Hansen’s $T$ subscript notation is so widespread that doing so would cause more confusion than it solves.)


The first stage estimate of $b$ minimizes a quadratic form of the sample mean of the errors,

$$\hat{b}_1 = \arg\min_{\{\hat{b}\}} \ g_T(\hat{b})'\, W g_T(\hat{b})$$

for some arbitrary matrix $W$ (often, $W = I$). This estimate is consistent and asymptotically normal. You can and often should stop here, as I explain below.

Using $\hat{b}_1$, form an estimate $\hat{S}$ of

$$S \equiv \sum_{j=-\infty}^{\infty} E\left[u_t(b)\, u_{t-j}(b)'\right]. \tag{11.5}$$

(Below I discuss various interpretations of and ways to construct this estimate.) Form a second stage estimate $\hat{b}_2$ using the matrix $\hat{S}$ in the quadratic form,

$$\hat{b}_2 = \arg\min_b \ g_T(b)'\, \hat{S}^{-1} g_T(b).$$

$\hat{b}_2$ is a consistent, asymptotically normal, and asymptotically efficient estimate of the parameter vector $b$. “Efficient” means that it has the smallest variance-covariance matrix among all estimators that set different linear combinations of $g_T(b)$ to zero.

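The variance-covariance matrix of $\hat{b}_2$ is

$$\mathrm{var}(\hat{b}_2) = \frac{1}{T}\left(d' S^{-1} d\right)^{-1}$$

where

$$d \equiv \frac{\partial g_T(b)}{\partial b}$$

or, more explicitly,

$$d = E_T\left(\left.\frac{\partial}{\partial b}\left[m_{t+1}(b)\, x_{t+1} - p_t\right]\right|_{b=\hat{b}}\right).$$

(More precisely, $d$ should be written as the object to which $\partial g_T/\partial b$ converges, and then $\partial g_T/\partial b$ is an estimate of that object used to form a consistent estimate of the asymptotic variance-covariance matrix.)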

This variance-covariance matrix can be used to test whether a parameter or group of parameters is equal to zero, via

$$\frac{\hat{b}_i}{\sqrt{\mathrm{var}(\hat{b})_{ii}}} \sim N(0, 1)$$

and

$$\hat{b}_j' \left[\mathrm{var}(\hat{b})_{jj}\right]^{-1} \hat{b}_j \sim \chi^2(\#\text{included } b\text{’s})$$

where $b_j$ = subvector, $\mathrm{var}(b)_{jj}$ = submatrix.
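The two test statistics above are simple to compute once an estimate and its covariance matrix are in hand. The sketch below uses hypothetical helper names (`t_stats`, `wald_stat`) and made-up numbers for the estimate and its covariance matrix; it only illustrates the arithmetic.

```python
import numpy as np

def t_stats(b_hat, cov_b):
    """b_i / sqrt(var(b)_ii) for each parameter."""
    return b_hat / np.sqrt(np.diag(cov_b))

def wald_stat(b_hat, cov_b, idx):
    """b_j' [var(b)_jj]^{-1} b_j for the subvector of parameters listed in idx."""
    idx = np.asarray(idx)
    b_j = b_hat[idx]                     # subvector
    V_jj = cov_b[np.ix_(idx, idx)]       # matching submatrix
    return float(b_j @ np.linalg.solve(V_jj, b_j))

# Illustrative values only.
b_hat = np.array([2.0, 1.0])
cov_b = np.array([[1.0, 0.0],
                  [0.0, 0.25]])

t = t_stats(b_hat, cov_b)                 # compare with N(0, 1)
w_joint = wald_stat(b_hat, cov_b, [0, 1]) # compare with chi-squared, 2 d.o.f.
w_single = wald_stat(b_hat, cov_b, [1])   # equals the squared t-statistic
```

With a single included parameter the quadratic form is just the square of the corresponding t-statistic, which is a useful sanity check on an implementation.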

Finally, the test of overidentifying restrictions is a test of the overall fit of the model. It states that $T$ times the minimized value of the second-stage objective is distributed $\chi^2$ with degrees of freedom equal to the number of moments less the number of estimated parameters.

$$T J_T = T \min_{\{b\}}\left[g_T(b)'\, S^{-1} g_T(b)\right] \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$
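The recipe of this section can be sketched end to end on artificial data. The following is a minimal illustration, not the procedure of any particular paper: it assumes power utility with $\beta$ held fixed so that $\gamma$ is the only parameter, lognormal consumption growth, two test assets whose prices are computed from the lognormal formula, a grid search in place of a numerical optimizer, and an estimate of $S$ that ignores serial correlation. All names and parameter values are hypothetical.

```python
import numpy as np

BETA = 0.98  # subjective discount factor, held fixed here (assumption)

def errors(gamma, dc, R):
    """u_{t+1}(b) = m_{t+1}(b) R_{t+1} - 1 as a T x N array, with m = BETA * exp(-gamma * dc)."""
    m = BETA * np.exp(-gamma * dc)
    return m[:, None] * R - 1.0

def g_T(gamma, dc, R):
    """Sample mean of the errors."""
    return errors(gamma, dc, R).mean(axis=0)

def two_stage_gmm(dc, R, grid):
    T, N = R.shape
    # First stage: W = I, minimize the sum of squared pricing errors.
    obj1 = np.array([g_T(g, dc, R) @ g_T(g, dc, R) for g in grid])
    gamma1 = grid[int(np.argmin(obj1))]
    # Estimate S at the first-stage estimate (no lags, imposing E(u) = 0).
    u = errors(gamma1, dc, R)
    S_inv = np.linalg.inv(u.T @ u / T)
    # Second stage: W = S^{-1}.
    obj2 = np.array([g_T(g, dc, R) @ S_inv @ g_T(g, dc, R) for g in grid])
    k = int(np.argmin(obj2))
    gamma2 = grid[k]
    # d = dg_T/db by central difference; var = (1/T) (d' S^{-1} d)^{-1}.
    h = 1e-5
    d = (g_T(gamma2 + h, dc, R) - g_T(gamma2 - h, dc, R)) / (2 * h)
    se = np.sqrt(1.0 / (T * (d @ S_inv @ d)))
    J = T * obj2[k]          # T J_T, one overidentifying restriction here
    return gamma1, gamma2, se, J

# Artificial data (illustrative parameter values).
rng = np.random.default_rng(0)
T, gamma_true, mu, sigma, s = 2000, 2.0, 0.02, 0.10, 0.15
dc = rng.normal(mu, sigma, size=T)            # log consumption growth
noise = rng.normal(-0.5 * s**2, s, size=T)    # independent shock with E[exp(noise)] = 1
p_risky = BETA * np.exp((1 - gamma_true) * mu + 0.5 * (1 - gamma_true)**2 * sigma**2)
p_bond = BETA * np.exp(-gamma_true * mu + 0.5 * gamma_true**2 * sigma**2)
R = np.column_stack([np.exp(dc + noise) / p_risky,   # risky gross return
                     np.full(T, 1.0 / p_bond)])      # risk-free gross return

grid = np.linspace(0.0, 10.0, 1001)
gamma1, gamma2, se, J = two_stage_gmm(dc, R, grid)
```

With two moments and one parameter, `J` would be compared with a $\chi^2$ distribution with one degree of freedom. A grid search is used only to keep the sketch dependency-free; with more parameters one would use a numerical minimizer.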

11.2 Interpreting the GMM procedure

$g_T(b)$ is a pricing error. It is proportional to $\alpha$.

GMM picks parameters to minimize a weighted sum of squared pricing errors.

The second-stage picks the linear combination of pricing errors that are best measured, by having smallest sampling variation. First and second stage are like OLS and GLS regressions.

The standard error formula is a simple application of the delta method.

The $J_T$ test evaluates the model by looking at the sum of squared pricing errors.

Pricing errors

The moment conditions are

$$g_T(b) = E_T\left[m_{t+1}(b)\, x_{t+1}\right] - E_T\left[p_t\right].$$

Thus, each moment is the difference between actual ($E_T(p)$) and predicted ($E_T(mx)$) price, or pricing error. What could be more natural than to pick parameters so that the model’s predicted prices are as close as possible to the actual prices, and then to evaluate the model by how large these pricing errors are?

In the language of expected returns, the moments $g_T(b)$ are proportional to the difference between actual and predicted returns, that is, Jensen’s alphas, or the vertical distance between the points and the line in Figure 5. To see this fact, recall that $0 = E(mR^e)$ can be translated to a predicted expected return,

$$E(R^e) = -\frac{\mathrm{cov}(m, R^e)}{E(m)}.$$

Therefore, we can write the pricing error as

$$g(b) = E(mR^e) = E(m)\left(E(R^e) - \left(-\frac{\mathrm{cov}(m, R^e)}{E(m)}\right)\right)$$

$$g(b) = \frac{1}{R^f}\left(\text{actual mean return} - \text{predicted mean return}\right).$$

If we express the model in expected return-beta language,

$$E(R^{ei}) = \alpha_i + \beta_i' \lambda$$

then the GMM objective is proportional to the Jensen’s alpha measure of mis-pricing,

$$g(b) = \frac{1}{R^f}\alpha_i.$$

First-stage estimates

If we could, we’d pick $b$ to make every element of $g_T(b) = 0$, that is, to have the model price assets perfectly in sample. However, there are usually more moment conditions (returns times instruments) than there are parameters. There should be, because theories with as many free parameters as facts (moments) are vacuous. Thus, we choose $b$ to make $g_T(b)$ as small as possible, by minimizing a quadratic form,

$$\min_{\{b\}} \ g_T(b)'\, W g_T(b). \tag{11.6}$$

$W$ is a weighting matrix that tells us how much attention to pay to each moment, or how to trade off doing well in pricing one asset or linear combination of assets vs. doing well in pricing another. In the common case $W = I$, GMM treats all assets symmetrically, and the objective is to minimize the sum of squared pricing errors.

The sample pricing error $g_T(b)$ may be a nonlinear function of $b$. Thus, you may have to use a numerical search to find the value of $b$ that minimizes the objective in (11.6). However, since the objective is locally quadratic, the search is usually straightforward.

Second-stage estimates: Why $S^{-1}$?

What weighting matrix should you use? The weighting matrix directs GMM to emphasize some moments or linear combinations of moments at the expense of others. You might start with $W = I$, i.e., try to price all assets equally well. A $W$ that is not the identity matrix can be used to offset differences in units between the moments. You might also start with different elements on the diagonal of $W$ if you think some assets are more interesting, more informative, or better measured than others.

The second-stage estimate picks a weighting matrix based on statistical considerations. Some asset returns may have much more variance than other assets. For those assets, the sample mean $g_T = E_T(m_t R_t - 1)$ will be a much less accurate measurement of the population mean $E(mR - 1)$, since the sample mean will vary more from sample to sample. Hence, it seems like a good idea to pay less attention to pricing errors from assets with high variance of $m_t R_t - 1$. One could implement this idea by using a $W$ matrix composed of inverse variances of $E_T(m_t R_t - 1)$ on the diagonal. More generally, since asset returns are correlated, one might think of using the covariance matrix of $E_T(m_t R_t - 1)$. This weighting matrix pays most attention to linear combinations of moments about which the data set at hand has the most information. This idea is exactly the same as heteroskedasticity and cross-correlation corrections that lead you from OLS to GLS in linear regressions.

The covariance matrix of $g_T = E_T(u_{t+1})$ is the variance of a sample mean. Exploiting the assumption that $E(u_t) = 0$, and that $u_t$ is stationary so $E(u_1 u_2) = E(u_t u_{t+1})$ depends only on the time interval between the two $u$s, we have

$$\mathrm{var}(g_T) = \mathrm{var}\left(\frac{1}{T}\sum_{t=1}^{T} u_{t+1}\right) = \frac{1}{T^2}\left[T\, E(u_t u_t') + (T-1)\left(E(u_t u_{t-1}') + E(u_t u_{t+1}')\right) + \cdots\right].$$

As $T \rightarrow \infty$, $(T - j)/T \rightarrow 1$, so

$$\mathrm{var}(g_T) \rightarrow \frac{1}{T}\sum_{j=-\infty}^{\infty} E(u_t u_{t-j}') = \frac{1}{T} S.$$

The last equality denotes $S$, known for other reasons as the spectral density matrix at frequency zero of $u_t$. (Precisely, $S$ so defined is the variance-covariance matrix of the $g_T$ for fixed $b$. The actual variance-covariance matrix of $g_T$ must take into account the fact that we chose $b$ to set a linear combination of the $g_T$ to zero in each sample. I give that formula below. The point here is heuristic.)
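A sample counterpart to $S$ replaces each population moment with a sample average and truncates the infinite sum. The sketch below is one simple way to do this, with hypothetical function names and a toy series; it uses an unweighted truncated sum, which is an assumption of this illustration and is not guaranteed to be positive definite in finite samples.

```python
import numpy as np

def autocov(u, j):
    """E_T[u_t u_{t-j}'] for a T x N array of errors u (mean zero under the model), j >= 0."""
    T = u.shape[0]
    return u[j:].T @ u[:T - j] / T

def long_run_cov(u, lags):
    """Truncated estimate of S = sum_j E[u_t u_{t-j}'], using j = -lags, ..., lags."""
    S = autocov(u, 0)
    for j in range(1, lags + 1):
        G = autocov(u, j)
        S = S + G + G.T          # the +j and -j terms
    return S

u = np.array([[1.0], [1.0], [-1.0], [-1.0]])   # toy persistent series
S0 = long_run_cov(u, 0)     # ignores serial correlation
S1 = long_run_cov(u, 1)     # adds first-order autocovariances
var_mean = S1 / len(u)      # approximate variance of the sample mean g_T
```

In this toy series the errors are positively autocorrelated, so including the first lag raises the estimated variance of the sample mean relative to the uncorrelated formula.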

This fact suggests that a good weighting matrix might be the inverse of $S$. In fact, Hansen (1982) shows formally that the choice

$$W = S^{-1}, \quad S \equiv \sum_{j=-\infty}^{\infty} E(u_t u_{t-j}')$$

is the statistically optimal weighting matrix, meaning that it produces estimates with lowest asymptotic variance.

You may be more used to the formula $\sigma(u)/\sqrt{T}$ for the standard deviation of a sample mean. This formula is a special case that holds when the $u_t$’s are uncorrelated over time. If $E(u_t u_{t-j}') = 0$, $j \neq 0$, then the previous equation reduces to

$$\mathrm{var}\left(\frac{1}{T}\sum_{t=1}^{T} u_{t+1}\right) = \frac{1}{T} E(uu') = \frac{\mathrm{var}(u)}{T}.$$

This is probably the first statistical formula you ever saw: the variance of the sample mean. In GMM, it is the last statistical formula you’ll ever see as well. GMM amounts to just generalizing the simple ideas behind the distribution of the sample mean to parameter estimation and general statistical contexts.

The first and second stage estimates should remind you of standard linear regression models. You start with an OLS regression. If the errors are not i.i.d., the OLS estimates are consistent, but not efficient. If you want efficient estimates, you can use the OLS estimates to obtain a series of residuals, estimate a variance-covariance matrix of residuals, and then do GLS. GLS is also consistent and more efficient, meaning that the sampling variation in the estimated parameters is lower.

Standard errors

The formula for the standard error of the estimate,

$$\mathrm{var}(\hat{b}_2) = \frac{1}{T}\left(d' S^{-1} d\right)^{-1} \tag{11.7}$$

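can be understood most simply as an instance of the “delta method” that the asymptotic variance of $f(x)$ is $f'(x)^2\, \mathrm{var}(x)$. Suppose there is only one parameter and one moment. $S/T$ is the variance matrix of the moment $g_T$. $d^{-1}$ is $[\partial g_T/\partial b]^{-1} = \partial b/\partial g_T$. Then the delta method formula gives

$$\mathrm{var}(\hat{b}_2) = \frac{\partial b}{\partial g_T}\, \mathrm{var}(g_T)\, \frac{\partial b}{\partial g_T} = \frac{1}{T}\, \frac{\partial b}{\partial g_T}\, S\, \frac{\partial b}{\partial g_T}.$$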

The actual formula (11.7) just generalizes this idea to vectors.
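To see the vector formula collapse to the scalar delta-method expression, here is a small numerical sketch. The function name `gmm_cov` and the values of $d$, $S$ and $T$ are illustrative assumptions only.

```python
import numpy as np

def gmm_cov(d, S, T):
    """(1/T) (d' S^{-1} d)^{-1} for a 2-D (N moments x K parameters) matrix d."""
    return np.linalg.inv(d.T @ np.linalg.solve(S, d)) / T

# One moment, one parameter (made-up numbers).
d = np.array([[2.0]])     # dg_T/db
S = np.array([[8.0]])     # long-run variance of u
T = 100

var_b = gmm_cov(d, S, T)
# Scalar delta method: (db/dg) * var(g_T) * (db/dg), with var(g_T) = S/T.
delta_method = (1.0 / d[0, 0]) * (S[0, 0] / T) * (1.0 / d[0, 0])
```

Both expressions give the same number in the one-moment, one-parameter case; with more moments than parameters only the matrix formula applies.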

11.2.1 $J_T$ Test

Once you’ve estimated the parameters that make a model “fit best,” the natural question is, how well does it fit? It’s natural to look at the pricing errors and see if they are “big.” The $J_T$ test asks whether they are “big” by statistical standards: if the model is true, how often should we see a (weighted) sum of squared pricing errors this big? If not often, the model is “rejected.” The test is

$$T J_T = T\left[g_T(\hat{b})'\, S^{-1} g_T(\hat{b})\right] \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$

Since $S$ is the variance-covariance matrix of $g_T$, this statistic is the minimized pricing errors divided by their variance-covariance matrix. Sample means converge to a normal distribution, so sample means squared divided by variance converges to the square of a normal, or $\chi^2$.

The reduction in degrees of freedom corrects for the fact that $S$ is really the covariance matrix of $g_T$ for fixed $b$. We set a linear combination of the $g_T$ to zero in each sample, so the actual covariance matrix of $g_T$ is singular, with rank #moments - #parameters. More details below.

11.3 Applying GMM

Notation.

Forecast errors and instruments.

Stationarity and choice of units.

Notation; instruments and returns

Most of the effort involved with GMM is simply mapping a given problem into the very general notation. The equation

$$E\left[m_{t+1}(b)\, x_{t+1} - p_t\right] = 0$$

can capture a lot. We often test asset pricing models using returns, in which case the moment conditions are

$$E\left[m_{t+1}(b)\, R_{t+1} - 1\right] = 0.$$

It is common to add instruments as well. Mechanically, you can multiply both sides of

$$1 = E_t\left[m_{t+1}(b)\, R_{t+1}\right]$$

by any variable $z_t$ observed at time $t$ before taking unconditional expectations, resulting in

$$E(z_t) = E\left[m_{t+1}(b)\, R_{t+1} z_t\right].$$

Expressing the result in $E(\cdot) = 0$ form,

$$0 = E\left\{\left[m_{t+1}(b)\, R_{t+1} - 1\right] z_t\right\}. \tag{11.8}$$

We can do this for a whole vector of returns and instruments, multiplying each return by each instrument. For example, if we start with two returns $R = [R^a \ R^b]'$ and one instrument $z$, equation (11.8) looks like

$$E\left\{\begin{bmatrix} m_{t+1}(b)\, R^a_{t+1} \\ m_{t+1}(b)\, R^b_{t+1} \\ m_{t+1}(b)\, R^a_{t+1} z_t \\ m_{t+1}(b)\, R^b_{t+1} z_t \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ z_t \\ z_t \end{bmatrix}\right\} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

Using the Kronecker product $\otimes$ meaning “multiply every element by every other element” we can denote the same relation compactly by

$$E\left\{\left[m_{t+1}(b)\, R_{t+1} - 1\right] \otimes z_t\right\} = 0, \tag{11.9}$$

or, emphasizing the managed-portfolio interpretation and $p = E(mx)$ notation,

$$E\left[m_{t+1}(b)\,(R_{t+1} \otimes z_t) - (1 \otimes z_t)\right] = 0.$$
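In code, stacking returns times instruments amounts to a row-by-row Kronecker product. The following sketch assumes the instrument vector consists of a constant and one variable $z_t$, as in the two-return example above; the function name `instrumented_errors` and the numbers are hypothetical.

```python
import numpy as np

def instrumented_errors(m, R, z):
    """Stack [m R - 1] kron [1, z_t] for each date into a T x 2N array."""
    u = m[:, None] * R - 1.0                       # T x N unscaled errors
    Z = np.column_stack([np.ones(len(z)), z])      # T x 2: constant and instrument
    # Row-by-row Kronecker product; column order is [u_a, u_b, z*u_a, z*u_b].
    return (Z[:, :, None] * u[:, None, :]).reshape(len(z), -1)

# Two dates, two returns, one instrument (made-up numbers).
m = np.array([1.0, 0.5])
R = np.array([[1.1, 0.8],
              [2.0, 2.4]])
z = np.array([2.0, 3.0])

U = instrumented_errors(m, R, z)
g = U.mean(axis=0)        # the four sample moments
```

The column ordering matches the stacked vector in the text: the two unscaled errors first, then the same errors multiplied by the instrument.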

Forecast errors and instruments

The asset pricing model says that, although expected returns can vary across time and assets, expected discounted returns should always be the same, 1. The error $u_{t+1} = m_{t+1}R_{t+1} - 1$ is the ex-post discounted return less its predicted value of one, and so represents a forecast error. Like any forecast error, $u_{t+1}$ should be conditionally and unconditionally mean zero.

In an econometric context, $z$ is an instrument because it is uncorrelated with the error $u_{t+1}$. $E(z_t u_{t+1})$ is the numerator of a regression coefficient of $u_{t+1}$ on $z_t$; thus adding instruments basically checks that the ex-post discounted return is unforecastable by linear regressions.

If an asset’s return is higher than predicted when $z_t$ is unusually high, but not on average, scaling by $z_t$ will pick up this feature of the data. Then, the moment condition checks that the discount rate is unusually low at such times, or that the conditional covariance of the discount rate and asset return moves sufficiently to justify the high conditionally expected return. As I explained in Section 9.1, the addition of instruments is equivalent to adding the returns of managed portfolios to the analysis, and is in principle able to capture all of the model’s predictions.

Stationarity and distributions

The GMM distribution theory does require some statistical assumptions. Hansen (1982) and Ogaki (1993) cover them in depth. The most important assumption is that $m$, $p$, and $x$ must be stationary random variables. (“Stationary” is often misused to mean constant, or i.i.d. The statistical definition of stationarity is that the joint distribution of $x_t$, $x_{t-j}$ depends only on $j$ and not on $t$.) Sample averages must converge to population means as the sample size grows, and stationarity implies this result.

Assuring stationarity usually amounts to a choice of sensible units. For example, though we could express the pricing of a stock as

$$p_t = E_t\left[m_{t+1}(d_{t+1} + p_{t+1})\right]$$

it would not be wise to do so. For stocks, $p$ and $d$ rise over time and so are typically not stationary; their unconditional means are not defined. It is better to divide by $p_t$ and express the model as

$$1 = E_t\left[m_{t+1}\frac{d_{t+1} + p_{t+1}}{p_t}\right] = E_t\left(m_{t+1}R_{t+1}\right).$$

The stock return is plausibly stationary.

Dividing by dividends is an alternative and I think underutilized way to achieve stationarity:

$$\frac{p_t}{d_t} = E_t\left[m_{t+1}\left(1 + \frac{p_{t+1}}{d_{t+1}}\right)\frac{d_{t+1}}{d_t}\right].$$

Now we map $\left(1 + \frac{p_{t+1}}{d_{t+1}}\right)\frac{d_{t+1}}{d_t}$ into $x_{t+1}$ and $\frac{p_t}{d_t}$ into $p_t$. This formulation allows us to focus on prices rather than one-period returns.
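Both rescalings are easy to construct from raw price and dividend series. The sketch below uses a hypothetical function name and made-up prices and dividends; it also checks that the dividend-scaled payoff divided by the price-dividend ratio reproduces the gross return, so the two formulations carry the same information.

```python
import numpy as np

def returns_and_pd_payoffs(p, d):
    """From price and dividend series, build gross returns and the dividend-scaled payoff/price pair."""
    p, d = np.asarray(p, float), np.asarray(d, float)
    R = (d[1:] + p[1:]) / p[:-1]                   # R_{t+1} = (d_{t+1} + p_{t+1}) / p_t
    x = (1.0 + p[1:] / d[1:]) * d[1:] / d[:-1]     # payoff mapped into x_{t+1}
    pd_ratio = p[:-1] / d[:-1]                     # price-dividend ratio mapped into p_t
    return R, x, pd_ratio

# Illustrative numbers only.
p = np.array([100.0, 104.0, 110.0])
d = np.array([4.0, 4.2, 4.3])
R, x, pd_ratio = returns_and_pd_payoffs(p, d)
```

Either pair, `(R, 1)` or `(x, pd_ratio)`, can then be fed into the moment condition $E[m_{t+1} x_{t+1} - p_t] = 0$.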

Bonds are a claim to a dollar, so bond prices and yields do not grow over time. Hence, it might be all right to examine

$$p^b_t = E_t\left(m_{t+1} \cdot 1\right)$$

with no transformations.

Stationarity is not always a clear-cut question in practice. As variables become “less stationary,” as they experience longer swings in a sample, the asymptotic distribution can become a less reliable guide to a finite-sample distribution. For example, the level of nominal interest rates is surely a stationary variable in a fundamental sense: it was 6% in ancient Babylon, about 6% in 14th century Italy, and about 6% again today. Yet it takes very long swings away from this unconditional mean, moving slowly up or down for even 20 years at a time. Therefore, in an estimate and test that uses the level of interest rates, the asymptotic distribution theory might be a bad approximation to the correct finite sample distribution theory. This is true even if the number of data points is large. 10,000 data points measured every minute are a “smaller” data set than 100 data points measured every year. In such a case, it is particularly important to develop a finite-sample distribution by simulation or bootstrap, which is easy to do given today’s computing power.

It is also important to choose test assets in a way that is stationary. For example, individual stocks change character over time, increasing or decreasing size, exposure to risk factors, leverage, and even nature of the business. For this reason, it is common to sort stocks into portfolios based on characteristics such as betas, size, book/market ratios, industry and so forth. The statistical characteristics of the portfolio returns may be much more constant than the characteristics of individual securities, which float in and out of the various portfolios. (One can alternatively include the characteristics as instruments.)

Many econometric techniques require assumptions about distributions. As you can see, the variance formulas used in GMM do not include the usual assumptions that variables are i.i.d., normally distributed, homoskedastic, etc. You can put such assumptions in if you want to (we’ll see how below, and adding such assumptions simplifies the formulas and can improve the small-sample performance when the assumptions are justified), but you don’t have to add these assumptions.

181

Page 182: up.iranblog.comup.iranblog.com/uploads/John-Cochrane-Asset-Pricing.pdf · Contents Acknowledgments 3 1Preface 5 Part I. Asset pricing theory 9 2 Consumption-based model and overview

Chapter 12. GMM: general formulas and applications

Lots of calculations beyond formal parameter estimation and overall model testing are useful in the process of evaluating a model and comparing it to other models. But you want to understand sampling variation in such calculations, and mapping the questions into the GMM framework allows you to do this easily. In addition, alternative estimation and evaluation procedures may be more intuitive or robust to model misspecification than the two (or multi) stage procedure described above.

In this chapter I lay out the general GMM framework, and I discuss five applications and variations on the basic GMM method. 1) I show how to derive standard errors of nonlinear functions of sample moments, such as correlation coefficients. 2) I apply GMM to OLS regressions, easily deriving standard error formulas that correct for autocorrelation and conditional heteroskedasticity. 3) I show how to use prespecified weighting matrices $W$ in asset pricing tests in order to overcome the tendency of efficient GMM to focus on spuriously low-variance portfolios. 4) As a good parable for prespecified linear combinations of moments $a$, I show how to mimic “calibration” and “evaluation” phases of real business cycle models. 5) I show how to use the distribution theory for the $g_T$ beyond just forming the $J_T$ test in order to evaluate the importance of individual pricing errors. The next chapter continues, and collects GMM variations useful for evaluating linear factor models and related mean-variance frontier questions.

Many of these calculations amount to creative choices of the $a_T$ matrix that selects which linear combinations of moments are set to zero, and reading off the resulting formulas for the variance-covariance matrix of the estimated coefficients, equation (12.4), and the variance-covariance matrix of the moments $g_T$, equation (12.5).

12.1 General GMM formulas

The general GMM estimate

$$a_T g_T(\hat{b}) = 0$$

Distribution of $\hat{b}$:

$$T \, \mathrm{cov}(\hat{b}) = (ad)^{-1} a S a' (ad)^{-1\prime}$$

Distribution of $g_T(\hat{b})$:

$$T \, \mathrm{cov}\left[ g_T(\hat{b}) \right] = \left( I - d(ad)^{-1}a \right) S \left( I - d(ad)^{-1}a \right)'$$

The “optimal” estimate uses $a = d'S^{-1}$. In this case,

$$T \, \mathrm{cov}(\hat{b}) = (d'S^{-1}d)^{-1}$$

$$T \, \mathrm{cov}\left[ g_T(\hat{b}) \right] = S - d(d'S^{-1}d)^{-1}d'$$

and

$$T J_T = T g_T(\hat{b})' S^{-1} g_T(\hat{b}) \rightarrow \chi^2(\#\text{moments} - \#\text{parameters}).$$

An analogue to the likelihood ratio test,

$$T J_T(\text{restricted}) - T J_T(\text{unrestricted}) \sim \chi^2_{\text{Number of restrictions}}$$

GMM procedures can be used to implement a host of estimation and testing exercises. Just about anything you might want to estimate can be written as a special case of GMM. To do so, you just have to remember (or look up) a few very general formulas, and then map them into your case.

Express a model as

$$E[f(x_t, b)] = 0$$

Everything is a vector: $f$ can represent a vector of $L$ sample moments, $x_t$ can be $M$ data series, $b$ can be $N$ parameters. $f(x_t, b)$ is a slightly more explicit statement of the errors $u_t(b)$ in the last chapter.

Definition of the GMM estimate.

We estimate parameters $\hat{b}$ to set some linear combination of sample means of $f$ to zero,

$$\hat{b} : \text{set } a_T g_T(\hat{b}) = 0 \tag{12.1}$$

where

$$g_T(b) \equiv \frac{1}{T} \sum_{t=1}^{T} f(x_t, b)$$

and $a_T$ is a matrix that defines which linear combination of $g_T(b)$ will be set to zero. This defines the GMM estimate.

If there are as many moments as parameters, you will set each moment to zero; when there are fewer parameters than moments, (12.1) just captures the natural idea that you will set some moments, or some linear combination of moments, to zero in order to estimate the parameters. The minimization of the last chapter is a special case. If you estimate $b$ by $\min \, g_T(b)' W g_T(b)$, the first order conditions are

$$\frac{\partial g_T'}{\partial b} W g_T(b) = 0,$$

which is of the form (12.1) with $a_T = \partial g_T' / \partial b \, W$. The general GMM procedure allows you to pick arbitrary linear combinations of the moments to set to zero in parameter estimation.
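To make the role of $a_T$ concrete, here is a small sketch with two moments and one parameter. The setup is an assumption chosen for illustration, not an example from the text: two series are treated as noisy measurements of a common mean $b$, so $g_T(b) = [\bar{x}_1 - b, \; \bar{x}_2 - b]'$. The function name and the data are hypothetical.

```python
def gmm_common_mean(x1, x2, a):
    """Solve a_T g_T(b) = 0 for g_T(b) = [mean(x1) - b, mean(x2) - b]'.

    `a` is the 1x2 row (a1, a2) that selects which linear combination of the
    two sample moments is set to zero. The moments are linear in b, so the
    solution is available in closed form.
    """
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    a1, a2 = a
    if a1 + a2 == 0:
        raise ValueError("a_T d is singular: this combination does not identify b")
    b_hat = (a1 * m1 + a2 * m2) / (a1 + a2)
    g = (m1 - b_hat, m2 - b_hat)  # sample moments evaluated at the estimate
    return b_hat, g


# Made-up data: two noisy measurements of the same mean.
x1 = [1.0, 2.0, 3.0, 2.0]
x2 = [2.5, 3.5, 2.0, 4.0]

# W = I: the first order condition gives a_T = d'W, proportional to (1, 1).
b_identity, g_identity = gmm_common_mean(x1, x2, (1.0, 1.0))

# a_T = (1, 0): set only the first moment to zero and ignore the second.
b_first, g_first = gmm_common_mean(x1, x2, (1.0, 0.0))

print(b_identity, g_identity)
print(b_first, g_first)
```

With one parameter and two moments, neither choice can set both moments to zero; each choice of $a_T$ decides which combination is fit exactly and which error is left over.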

Standard errors of the estimate.

Hansen (1982), Theorem 3.1 tells us that the asymptotic distribution of the GMM estimate is

$$\sqrt{T}(\hat{b} - b) \rightarrow N\left[ 0, \; (ad)^{-1} a S a' (ad)^{-1\prime} \right] \tag{12.2}$$

where

$$d \equiv E\left[ \frac{\partial f}{\partial b'}(x_t, b) \right] = \frac{\partial g_T(b)}{\partial b'}$$

(i.e., $d$ is defined as the population moment in the first equality, which we estimate in sample by the second equality), where

$$a \equiv \operatorname{plim} a_T,$$

and where

$$S \equiv \sum_{j=-\infty}^{\infty} E\left[ f(x_t, b) \, f(x_{t-j}, b)' \right]. \tag{12.3}$$

Don’t forget the $\sqrt{T}$ in (12.2)! In practical terms, this means to use

$$\mathrm{var}(\hat{b}) = \frac{1}{T} (ad)^{-1} a S a' (ad)^{-1\prime} \tag{12.4}$$

as the covariance matrix for standard errors and tests. As in the last chapter, you can understand this formula as an application of the delta method.

Distribution of the moments.

Hansen’s Lemma 4.1 gives the sampling distribution of the moments $g_T(b)$:

$$\sqrt{T} g_T(\hat{b}) \rightarrow N\left[ 0, \; \left( I - d(ad)^{-1}a \right) S \left( I - d(ad)^{-1}a \right)' \right]. \tag{12.5}$$

As we have seen, $S$ would be the asymptotic variance-covariance matrix of sample means if we did not estimate any parameters; estimating parameters sets some linear combinations of the $g_T$ to zero. The $I - d(ad)^{-1}a$ terms account for the fact that in each sample some linear combinations of $g_T$ are set to zero. Thus, this variance-covariance matrix is singular.

$\chi^2$ tests.

A sum of squared standard normals is distributed $\chi^2$. Therefore, it is natural to use the distribution theory for $g_T$ to see if the $g_T$ are jointly “too big.” Equation (12.5) suggests that we form the statistic

$$T g_T(\hat{b})' \left[ \left( I - d(ad)^{-1}a \right) S \left( I - d(ad)^{-1}a \right)' \right]^{-1} g_T(\hat{b}) \tag{12.6}$$

and that it should have a $\chi^2$ distribution. It does, but with a hitch: The variance-covariance matrix is singular, so you have to pseudo-invert it. For example, you can perform an eigenvalue decomposition $\Sigma = Q \Lambda Q'$ and then invert only the non-zero eigenvalues. Also, the $\chi^2$ distribution has degrees of freedom given by the number of non-zero linear combinations of $g_T$, the number of moments less the number of estimated parameters. You can similarly use (12.5) to construct tests of individual moments (“are the small stocks mispriced?”) or groups of moments.

Efficient estimates

The theory so far allows us to estimate parameters by setting any linear combination of moments to zero. Hansen shows that one particular choice is statistically optimal,

$$a = d'S^{-1}. \tag{12.7}$$

This choice is the first order condition to $\min_{\{b\}} g_T(b)' S^{-1} g_T(b)$ that we studied in the last chapter. With this weighting matrix, the standard error formula (12.4) reduces to

$$\sqrt{T}(\hat{b} - b) \rightarrow N\left[ 0, \; (d'S^{-1}d)^{-1} \right]. \tag{12.8}$$

This is Hansen’s Theorem 3.2. The sense in which (12.7) is “efficient” is that the sampling variation of the parameters for arbitrary $a$ matrix, (12.4), equals the sampling variation of the “efficient” estimate in (12.8) plus a positive semidefinite matrix.

With the optimal weights (12.7), the variance of the moments (12.5) simplifies to

$$\mathrm{cov}(g_T) = \frac{1}{T} \left[ S - d(d'S^{-1}d)^{-1}d' \right]. \tag{12.9}$$

We can use this matrix in a test of the form (12.6). However, Hansen’s Lemma 4.2 tells us that there is an equivalent and simpler way to construct this test,

$$T g_T(\hat{b})' S^{-1} g_T(\hat{b}) \rightarrow \chi^2(\#\text{moments} - \#\text{parameters}). \tag{12.10}$$

This result is nice since we get to use the already-calculated and non-singular $S^{-1}$.
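The efficient formulas (12.7), (12.8) and (12.10) can be sketched for the smallest overidentified case, two moments and one parameter. The example is an illustration under stated assumptions, not the author's code: the moments are $g_T(b) = [\bar{x}_1 - b, \; \bar{x}_2 - b]'$, the data are made up, and $S$ is estimated with the $j = 0$ term of (12.3) only, which presumes no serial correlation. All function names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists or tuples."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))


def efficient_gmm_common_mean(x1, x2):
    """Efficient GMM for g_T(b) = [mean(x1) - b, mean(x2) - b]', one parameter.

    Assumption: no serial correlation, so S is estimated by the j = 0 term
    of (12.3) only, i.e. the contemporaneous covariance matrix of (x1, x2).
    """
    T = len(x1)
    m1, m2 = mean(x1), mean(x2)
    s11 = mean([(u - m1) ** 2 for u in x1])
    s22 = mean([(v - m2) ** 2 for v in x2])
    s12 = mean([(u - m1) * (v - m2) for u, v in zip(x1, x2)])
    Si = inv2(((s11, s12), (s12, s22)))

    # d = dg_T/db' = (-1, -1)', so a = d'S^{-1} is minus the column sums of S^{-1}.
    a = (-(Si[0][0] + Si[1][0]), -(Si[0][1] + Si[1][1]))
    dSd = -(a[0] + a[1])                             # d'S^{-1}d, a scalar here
    b_hat = (a[0] * m1 + a[1] * m2) / (a[0] + a[1])  # solves a g_T(b) = 0
    g = (m1 - b_hat, m2 - b_hat)

    var_b = 1.0 / (T * dSd)                          # (12.8), divided by T
    TJ = T * sum(g[i] * Si[i][j] * g[j]
                 for i in range(2) for j in range(2))  # statistic in (12.10)
    return b_hat, var_b, TJ, a, g


# Made-up data: two noisy measurements of the same mean.
x1 = [1.0, 2.0, 3.0, 2.0]
x2 = [2.5, 3.5, 2.0, 4.0]
b_hat, var_b, TJ, a, g = efficient_gmm_common_mean(x1, x2)
print(f"b_hat = {b_hat:.4f}, s.e. = {var_b ** 0.5:.4f}")
print(f"T*J_T = {TJ:.3f} on 2 - 1 = 1 degree of freedom")
```

In this toy case the efficient estimate is a weighted average of the two sample means, and the statistic would be compared with a $\chi^2$ distribution with one degree of freedom; four observations are of course far too few for the asymptotic theory to mean much.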

To derive (12.10) from (12.5), factor $S = CC'$ and then find the asymptotic covariance matrix of $C^{-1} g_T(\hat{b})$ using (12.5). The result is

$$\mathrm{var}\left[ \sqrt{T} C^{-1} g_T(\hat{b}) \right] = I - C^{-1} d (d'S^{-1}d)^{-1} d' C^{-1\prime}.$$

This is an idempotent matrix of rank #moments − #parameters, so (12.10) follows.

Alternatively, note that $S^{-1}$ is a pseudo-inverse of the second stage $\mathrm{cov}(g_T)$. (A pseudo-inverse times $\mathrm{cov}(g_T)$ should result in an idempotent matrix of the same rank as $\mathrm{cov}(g_T)$.)

$$S^{-1} \mathrm{cov}(g_T) = S^{-1} \left[ S - d(d'S^{-1}d)^{-1}d' \right] = I - S^{-1}d(d'S^{-1}d)^{-1}d'$$

Then, check that the result is idempotent.

$$\left[ I - S^{-1}d(d'S^{-1}d)^{-1}d' \right] \left[ I - S^{-1}d(d'S^{-1}d)^{-1}d' \right] = I - S^{-1}d(d'S^{-1}d)^{-1}d'.$$

This derivation not only verifies that $J_T$ has the same distribution as $g_T' \, \mathrm{cov}(g_T)^{-1} g_T$, but that they are numerically the same in every sample.

I emphasize that (12.8) and (12.10) only apply to the “optimal” choice of weights, (12.7). If you use another set of weights, as in a first-stage estimate, you must use the general formulas (12.4) and (12.5).

Model comparisons

You often want to compare one model to another. If one model can be expressed as a special or “restricted” case of the other or “unrestricted” model, we can perform a statistical comparison that looks very much like a likelihood ratio test. If we use the same $S$ matrix (usually that of the unrestricted model), the restricted $J_T$ must rise. But if the restricted model is really true, it shouldn’t rise “much.” How much?

$$T J_T(\text{restricted}) - T J_T(\text{unrestricted}) \sim \chi^2(\#\text{ of restrictions})$$

This is a “$\chi^2$ difference” test, due to Newey and West (1987a), who call it the “D-test.”

12.2 Testing moments

How to test one or a group of pricing errors. 1) Use the formula for $\mathrm{var}(g_T)$. 2) A $\chi^2$ difference test.

You may want to see how well a model does on particular moments or particular pricing errors. For example, the celebrated “small firm effect” states that an unconditional CAPM ($m = a + bR^W$, no scaled factors) does badly in pricing the returns on a portfolio that always holds the smallest 1/10th or 1/20th of firms in the NYSE. You might want to see whether a new model prices the small returns well. The standard error of pricing errors also allows you to add error bars to a plot of predicted vs. actual mean returns such as Figure 5 or other diagnostics based on pricing errors.

We have already seen that individual elements of $g_T$ measure the pricing errors or expected return errors. Thus, the sampling variation of $g_T$ given by (12.5) provides exactly the standard error we are looking for. You can use the sampling distribution of $g_T$ to evaluate the significance of individual pricing errors, to construct a $t$-test (for a single $g_T$, such as small firms) or $\chi^2$ test (for groups of $g_T$, such as small firms $\otimes$ instruments). As usual this is the Wald test.

Alternatively, you can use the $\chi^2$ difference approach. Start with a general model that includes all the moments, and form an estimate of the spectral density matrix $S$. Now set to zero the moments you want to test, and denote $g_{sT}(b)$ the vector of moments, including the zeros ($s$ for “smaller”). Choose $b_s$ to minimize $g_{sT}(b_s)' S^{-1} g_{sT}(b_s)$ using the same weighting matrix $S$. The criterion will be lower than the original criterion $g_T(b)' S^{-1} g_T(b)$, since there are the same number of parameters and fewer moments. But, if the moments we want to test truly are zero, the criterion shouldn’t be that much lower. The $\chi^2$ difference test applies,

$$T g_T(\hat{b})' S^{-1} g_T(\hat{b}) - T g_{sT}(\hat{b}_s)' S^{-1} g_{sT}(\hat{b}_s) \sim \chi^2(\#\text{eliminated moments}).$$

Of course, don’t fall into the obvious trap of picking the largest of 10 pricing errors and noting it’s more than two standard deviations from zero. The distribution of the largest of 10 pricing errors is much wider than the distribution of a single one. To use this distribution, you have to pick which pricing error you’re going to test before you look at the data.

12.3 Standard errors of anything by delta method

One quick application illustrates the usefulness of the GMM formulas. Often, we want to estimate a quantity that is a nonlinear function of sample means,

$$b = \phi[E(x_t)] = \phi(\mu).$$

In this case, the formula (12.2) reduces to

$$\mathrm{var}(b_T) = \frac{1}{T} \left[ \frac{d\phi}{d\mu} \right]' \sum_{j=-\infty}^{\infty} \mathrm{cov}(x_t, x_{t-j}') \left[ \frac{d\phi}{d\mu} \right]. \tag{12.11}$$

The formula is very intuitive. The variance of the sample mean is the covariance term inside. The derivatives just linearize the function $\phi$ near the true $b$.

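For example, a correlation coefficient can be written as a function of sample means as

$$\mathrm{corr}(x_t, y_t) = \frac{E(x_t y_t) - E(x_t)E(y_t)}{\sqrt{E(x_t^2) - E(x_t)^2} \, \sqrt{E(y_t^2) - E(y_t)^2}}$$

Thus, take

$$\mu = \left[ \; E(x_t) \quad E(x_t^2) \quad E(y_t) \quad E(y_t^2) \quad E(x_t y_t) \; \right]'.$$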

A problem at the end of the chapter asks you to take derivatives and derive the standard error of the correlation coefficient. One can derive standard errors for impulse-response functions, variance decompositions, and many other statistics in this way.
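The delta-method formula (12.11) for the correlation coefficient can be sketched numerically as follows. Two simplifications are assumptions of this sketch rather than part of the text: observations are taken to be serially uncorrelated, so only the $j = 0$ covariance term is kept, and the derivatives $d\phi/d\mu$ are approximated by central finite differences instead of being derived analytically. The function names and data are hypothetical.

```python
import math


def corr_from_moments(mu):
    """phi(mu): correlation written as a function of the five means in the text."""
    ex, ex2, ey, ey2, exy = mu
    return (exy - ex * ey) / (math.sqrt(ex2 - ex ** 2) * math.sqrt(ey2 - ey ** 2))


def corr_with_delta_se(x, y, h=1e-6):
    """Correlation and its delta-method standard error, following (12.11).

    Assumptions of this sketch: no serial correlation (j = 0 term only) and
    finite-difference derivatives in place of analytic ones.
    """
    T = len(x)
    # Each row holds the five series whose sample means form mu.
    rows = [(a, a * a, b, b * b, a * b) for a, b in zip(x, y)]
    mu = [sum(col) / T for col in zip(*rows)]

    grad = []
    for i in range(5):
        up, down = list(mu), list(mu)
        up[i] += h
        down[i] -= h
        grad.append((corr_from_moments(up) - corr_from_moments(down)) / (2 * h))

    # grad' cov grad, with cov the j = 0 covariance matrix of the five series,
    # computed as the sample variance of the linearized series grad'(z_t - mu).
    lin = [sum(gi * (zi - mi) for gi, zi, mi in zip(grad, r, mu)) for r in rows]
    var = sum(v * v for v in lin) / T / T
    return corr_from_moments(mu), math.sqrt(var)


# Made-up data, for illustration only.
x = [0.5, 1.2, -0.3, 2.1, 0.9, -1.0, 1.7, 0.2]
y = [0.7, 0.9, 0.1, 1.5, 1.4, -0.6, 1.1, -0.2]
rho_hat, se = corr_with_delta_se(x, y)
print(f"correlation = {rho_hat:.3f}, delta-method s.e. = {se:.3f}")
```

For serially correlated data, the same linearized series could be fed into a sum over leads and lags, as in the full formula.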

12.4 Using GMM for regressions

By mapping OLS regressions into the GMM framework, we derive formulas for OLS standard errors that correct for autocorrelation and conditional heteroskedasticity of the errors. The general formula is

$$\mathrm{var}(\hat{\beta}) = \frac{1}{T} E(x_t x_t')^{-1} \left[ \sum_{j=-\infty}^{\infty} E(e_t x_t x_{t-j}' e_{t-j}) \right] E(x_t x_t')^{-1},$$

where $e_t$ denotes the regression residual, and it simplifies in special cases.

Mapping any statistical procedure into GMM makes it easy to develop an asymptotic distribution that corrects for statistical problems such as non-normality, serial correlation and conditional heteroskedasticity. To illustrate, as well as to develop the very useful formulas, I map OLS regressions into GMM.

Correcting OLS standard errors for econometric problems is not the same thing as GLS. When errors do not obey the OLS assumptions, OLS is consistent, and often more robust than GLS, but its standard errors need to be corrected.

OLS picks parameters $\beta$ to minimize the variance of the residual:

$$\min_{\{\beta\}} E_T\left[ (y_t - \beta' x_t)^2 \right].$$

We find $\hat{\beta}$ from the first order condition, which states that the residual is orthogonal to the right hand variable:

$$g_T(\hat{\beta}) = E_T\left[ x_t (y_t - x_t' \hat{\beta}) \right] = 0. \tag{12.12}$$

This condition is exactly identified: the number of moments equals the number of parameters. Thus, we set the sample moments exactly to zero and there is no weighting matrix ($a = I$). We can solve for the estimate analytically,

$$\hat{\beta} = \left[ E_T(x_t x_t') \right]^{-1} E_T(x_t y_t).$$

This is the familiar OLS formula. The rest of the ingredients to equation (12.2) are

$$d = -E(x_t x_t')$$

$$f(x_t, \beta) = x_t (y_t - x_t' \beta) = x_t e_t$$

where $e_t$ is the regression residual. (The sign of $d$ does not matter, since $d$ enters the variance formula twice.) Equation (12.2) gives a formula for OLS standard errors,

$$\mathrm{var}(\hat{\beta}) = \frac{1}{T} E(x_t x_t')^{-1} \left[ \sum_{j=-\infty}^{\infty} E(e_t x_t x_{t-j}' e_{t-j}) \right] E(x_t x_t')^{-1}. \tag{12.13}$$

This formula reduces to some interesting special cases.

Serially uncorrelated, homoskedastic errors

These are the usual OLS assumptions, and it’s good that the usual formulas emerge. Formally, the OLS assumptions are

$$E(e_t \mid x_t, x_{t-1}, \ldots, e_{t-1}, e_{t-2}, \ldots) = 0 \tag{12.14}$$

$$E(e_t^2 \mid x_t, x_{t-1}, \ldots, e_{t-1}, e_{t-2}, \ldots) = \text{constant} = \sigma_e^2. \tag{12.15}$$

To use these assumptions, I use the fact that

$$E(ab) = E(E(a \mid b) b).$$

The first assumption means that only the $j = 0$ term enters the sum

$$\sum_{j=-\infty}^{\infty} E(e_t x_t x_{t-j}' e_{t-j}) = E(e_t^2 x_t x_t').$$

The second assumption means that

$$E(e_t^2 x_t x_t') = E(e_t^2) E(x_t x_t') = \sigma_e^2 E(x_t x_t').$$

Hence equation (12.13) reduces to our old friend,

$$\mathrm{var}(\hat{\beta}) = \frac{1}{T} \sigma_e^2 E(x_t x_t')^{-1} = \sigma_e^2 (X'X)^{-1}.$$

The last notation is typical of econometrics texts, in which $X = \left[ \; x_1 \quad x_2 \quad \ldots \quad x_T \; \right]'$ represents the data matrix.

Heteroskedastic errors

If we delete the conditional homoskedasticity assumption (12.15), we can’t pull the $e$ out of the expectation, so the standard errors are

$$\mathrm{var}(\hat{\beta}) = \frac{1}{T} E(x_t x_t')^{-1} E(e_t^2 x_t x_t') E(x_t x_t')^{-1}.$$

These are known as “Heteroskedasticity consistent standard errors” or “White standard errors” after White (1980).

Hansen-Hodrick errors

Hansen and Hodrick (1982) run forecasting regressions of (say) six month returns, using monthly data. We can write this situation in regression notation as

$$y_{t+k} = \beta' x_t + \varepsilon_{t+k}, \quad t = 1, 2, \ldots, T.$$

Fama and French (1988) also use regressions of overlapping long horizon returns on variables such as dividend/price ratio and term premium. Such regressions are an important part of the evidence for predictability in asset returns.

Under the null that one-period returns are unforecastable, we will still see correlation in the $\varepsilon_t$ due to overlapping data. Unforecastable returns imply

$$E(\varepsilon_t \varepsilon_{t-j}) = 0 \text{ for } |j| \geq k$$

but not for $|j| < k$. Therefore, we can only rule out terms in $S$ with $|j| \geq k$. Since we might as well correct for potential heteroskedasticity while we’re at it, the standard errors are

$$\mathrm{var}(\hat{\beta}) = \frac{1}{T} E(x_t x_t')^{-1} \left[ \sum_{j=-k}^{k} E(\varepsilon_t x_t x_{t-j}' \varepsilon_{t-j}) \right] E(x_t x_t')^{-1}.$$
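The heteroskedasticity-consistent and Hansen-Hodrick formulas differ only in how many lags of the sum are kept, which the sketch below makes explicit. To keep every matrix a plain number, it assumes a single right hand variable and no intercept; that restriction, the function name, and the data are all assumptions made for illustration.

```python
def ols_se(x, y, lags=0):
    """OLS slope and GMM standard error for y_t = beta * x_t + e_t (scalar x_t).

    lags = 0 keeps only the j = 0 term of the sum, which gives the
    heteroskedasticity-consistent (White) standard error. lags = k keeps
    |j| <= k with equal weights, as in the Hansen-Hodrick formula.
    """
    T = len(x)
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    f = [a * (b - beta * a) for a, b in zip(x, y)]   # f_t = x_t * e_t
    exx = sum(a * a for a in x) / T                  # E_T(x_t x_t')

    s = sum(v * v for v in f) / T                    # j = 0 term
    for j in range(1, lags + 1):
        # Terms j and -j are equal for scalar x_t, hence the factor 2.
        s += 2.0 * sum(f[t] * f[t - j] for t in range(j, T)) / T

    var = s / (T * exx * exx)
    if var <= 0:
        raise ValueError("estimated S is not positive; try fewer lags")
    return beta, var ** 0.5


# Made-up data whose residuals come in runs of the same sign.
x = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
y = [1.2, 2.1, 1.2, 1.9, 0.7, 1.95]
beta, se_white = ols_se(x, y, lags=0)
_, se_hh = ols_se(x, y, lags=1)
print(f"beta = {beta:.3f}, White s.e. = {se_white:.4f}, one-lag s.e. = {se_hh:.4f}")
```

An equally weighted, truncated sum is not guaranteed to be positive in a finite sample, which is why the sketch raises an error rather than returning the square root of a negative number.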

12.5 Prespecified weighting matrices and moment conditions

Prespecified rather than “optimal” weighting matrices can emphasize economically interesting results, they can avoid the trap of blowing up standard errors rather than improving pricing errors, and they can lead to estimates that are more robust to small model misspecifications. This is analogous to the fact that OLS is often preferable to GLS in a regression context. The GMM formulas for a fixed weighting matrix $W$ are

$$\mathrm{var}(\hat{b}) = \frac{1}{T} (d'Wd)^{-1} d'WSWd (d'Wd)^{-1}$$

$$\mathrm{var}(g_T) = \frac{1}{T} \left( I - d(d'Wd)^{-1}d'W \right) S \left( I - Wd(d'Wd)^{-1}d' \right).$$

In the basic approach outlined in Chapter 18.9, our final estimates were based on the “efficient” $S^{-1}$ weighting matrix. This objective maximizes the asymptotic statistical information in the sample about a model, given the choice of moments $g_T$. However, you may want to use a prespecified weighting matrix $W \neq S^{-1}$ instead, or at least as a diagnostic accompanying more formal statistical tests. A prespecified weighting matrix lets you, rather than the $S$ matrix, specify which moments or linear combination of moments GMM will value in the minimization $\min_{\{b\}} g_T(b)' W g_T(b)$. A higher value of $W_{ii}$ forces GMM to pay more attention to getting the $i$th moment right in the parameter estimation. For example, you might feel that some assets suffer from measurement error, are small and illiquid and hence should be deemphasized, or you may want to keep GMM from looking at portfolios with strong long and short positions. I give some additional motivations below.

You can also go one step further and impose which linear combinations $a_T$ of moment conditions will be set to zero in estimation rather than use the choice resulting from a minimization, $a_T = d'S^{-1}$ or $a_T = d'W$. The fixed $W$ estimate still trades off the accuracy of individual moments according to the sensitivity of each moment with respect to the parameter. For example, if $g_T = \left[ \; g_T^1 \quad g_T^2 \; \right]'$, $W = I$, but $\partial g_T / \partial b = \left[ \; 1 \quad 10 \; \right]'$, so that the second moment is 10 times more sensitive to the parameter value than the first moment, then GMM with fixed weighting matrix sets

$$1 \times g_T^1 + 10 \times g_T^2 = 0.$$

The second moment condition will be 10 times closer to zero than the first. If you really want GMM to pay equal attention to the two moments, then you can fix the $a_T$ matrix directly, for example $a_T = \left[ \; 1 \quad 1 \; \right]$ or $a_T = \left[ \; 1 \quad -1 \; \right]$.
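The two-moment example can be checked with a few lines of arithmetic. The sketch assumes linear moments $g_T(b) = [m_1 - b, \; m_2 - 10b]'$, where $m_1$ and $m_2$ stand for hypothetical sample means that no single $b$ fits exactly; the derivatives are then $-1$ and $-10$, and only their ratio matters. The function name and numbers are illustrative.

```python
def fixed_w_two_moments(m1, m2, a):
    """Solve a_T g_T(b) = 0 for g_T(b) = [m1 - b, m2 - 10 b]'.

    The second moment is ten times as sensitive to b as the first
    (the derivatives are -1 and -10; only their ratio matters here).
    """
    a1, a2 = a
    b_hat = (a1 * m1 + a2 * m2) / (a1 + 10.0 * a2)
    return b_hat, (m1 - b_hat, m2 - 10.0 * b_hat)


m1, m2 = 1.0, 10.5   # hypothetical sample means; no single b fits both exactly

# W = I, so a_T = d'W is proportional to (1, 10).
b_w, g_w = fixed_w_two_moments(m1, m2, (1.0, 10.0))

# a_T fixed directly at (1, 1) to treat the two moments equally.
b_eq, g_eq = fixed_w_two_moments(m1, m2, (1.0, 1.0))

print("identity W:   b =", b_w, " moments =", g_w)
print("a_T = [1 1]:  b =", b_eq, " moments =", g_eq)
```

Under the identity weighting matrix the first pricing error ends up ten times the size of the second, while fixing $a_T = [1 \;\; 1]$ leaves two errors of equal size and opposite sign.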

Using a prespecified weighting matrix or using a prespecified set of moments is not the same thing as ignoring correlation of the errors $u_t$ in the distribution theory. The $S$ matrix will still show up in all the standard errors and test statistics.

12.5.1 How to use prespecified weighting matrices

Once you have decided to use a prespecified weighting matrix $W$ or a prespecified set of moments $a_T g_T(b) = 0$, the general distribution theory outlined in section 12.1 quickly gives standard errors of the estimates and moments, and therefore a $\chi^2$ statistic that can be used to test whether all the moments are jointly zero. Section 12.1 gives the formulas for the case that $a_T$ is prespecified. If we use weighting matrix $W$, the first order conditions to $\min_{\{b\}} g_T(b)' W g_T(b)$ are

$$\frac{\partial g_T(b)'}{\partial b} W g_T(b) = d'W g_T(b) = 0,$$

so we map into the general case with $a_T = d'W$. Plugging this value into (12.4), the variance-covariance matrix of the estimated coefficients is

$$\mathrm{var}(\hat{b}) = \frac{1}{T} (d'Wd)^{-1} d'WSWd (d'Wd)^{-1}. \tag{12.16}$$

(You can check that this formula reduces to $\frac{1}{T}(d'S^{-1}d)^{-1}$ with $W = S^{-1}$.)

Plugging $a = d'W$ into equation (12.5), we find the variance-covariance matrix of the moments $g_T$,

$$\mathrm{var}(g_T) = \frac{1}{T} \left( I - d(d'Wd)^{-1}d'W \right) S \left( I - Wd(d'Wd)^{-1}d' \right). \tag{12.17}$$

As in the general formula, the terms to the left and right of $S$ account for the fact that some linear combinations of moments are set to zero in each sample.

Equation (12.17) can be the basis of $\chi^2$ tests for the overidentifying restrictions. If we interpret $()^{-1}$ to be a generalized inverse, then

$$g_T' \, \mathrm{var}(g_T)^{-1} g_T \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$

As in the general case, you have to pseudo-invert the singular $\mathrm{var}(g_T)$, for example by inverting only the non-zero eigenvalues.

The major danger in using prespecified weighting matrices or moments $a_T$ is that the choice of moments, units, and (of course) the prespecified $a_T$ or $W$ must be made carefully. For example, if you multiply the second moment by 10 times its original value, the $S$ matrix will undo this transformation and weight them in their original proportions. The identity weighting matrix will not undo such transformations, so the units should be picked right initially.
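The point about units can be verified in a two-moment, one-parameter sketch. Everything here is an assumption made for illustration: the moments are linear, $g_T(b) = m + d\,b$, and the values of $m$, $d$, and $S$ are made up. Rescaling the second moment by 10 rescales the corresponding entries of $m$, $d$, and $S$.

```python
def estimate(m, d, W):
    """Solve d'W g_T(b) = 0 for linear moments g_T(b) = m + d*b (2 moments, 1 parameter)."""
    a = [d[0] * W[0][0] + d[1] * W[1][0],
         d[0] * W[0][1] + d[1] * W[1][1]]           # a_T = d'W
    return -(a[0] * m[0] + a[1] * m[1]) / (a[0] * d[0] + a[1] * d[1])


def inv2(S):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]


# Hypothetical inputs: sample means, derivatives, and spectral density matrix.
m, d = [2.0, 3.0], [-1.0, -1.0]
S = [[0.5, -0.125], [-0.125, 0.625]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

# Rescale the second moment by 10: m, d and S all change with the units.
c = 10.0
m10, d10 = [m[0], c * m[1]], [d[0], c * d[1]]
S10 = [[S[0][0], c * S[0][1]], [c * S[1][0], c * c * S[1][1]]]

eff, eff10 = estimate(m, d, inv2(S)), estimate(m10, d10, inv2(S10))
idt, idt10 = estimate(m, d, I2), estimate(m10, d10, I2)
print("S^-1 weights:    ", eff, eff10)
print("identity weights:", idt, idt10)
```

The estimate based on $S^{-1}$ is unchanged by the rescaling, while the identity-weighted estimate moves toward fitting the rescaled moment almost exactly.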

12.5.2 Motivations for prespecified weighting matrices

Robustness, as with OLS vs. GLS.

When errors are autocorrelated or heteroskedastic, every econometrics textbook shows you how to “improve” on OLS by making appropriate GLS corrections. If you correctly model the error covariance matrix and if the regression is perfectly specified, the GLS procedure can improve efficiency, i.e. give estimates with lower asymptotic standard errors. However, GLS is less robust. If you model the error covariance matrix incorrectly, the GLS estimates can be much worse than OLS. Also, the GLS transformations can zero in on slightly misspecified areas of the model, producing garbage. GLS is “best,” but OLS is “pretty darn good.” One often has enough data that wringing every last ounce of statistical precision (low standard errors) from the data is less important than producing estimates that do not depend on questionable statistical assumptions, and that transparently focus on the interesting features of the data. In these cases, it is often a good idea to use OLS estimates. The OLS standard error formulas are wrong, though, so you must correct the standard errors of the OLS estimates for these features of the error covariance matrices, using the formulas we developed in section 12.4.

For example, the GLS transformation for highly serially correlated errors essentially turns a regression in levels into a regression in first differences. But relationships that are robust in levels can disappear in first differences, especially of high frequency data, because of small measurement or specification errors. Lucas (198x) followed a generation of money demand estimates $M_t = a + bY_t + \varepsilon_t$ that had been run in quasi-first differences following GLS advice, and that had found small and unstable income elasticities, since month-to-month variation in measured money demand has little to do with month-to-month variation in income. Lucas ran the regression by OLS in levels, correcting standard errors for serial correlation, and found the pattern evident in any graph that the level of money and income track very well over years and decades.

GMM works the same way. First-stage or otherwise fixed weighting matrix estimates may give up something in asymptotic efficiency, but they are still consistent, and they can be more robust to statistical and economic problems. You still want to use the $S$ matrix in computing standard errors, though, as you want to correct OLS standard errors, and the GMM formulas show you how to do this.

Even if in the end you want to produce “efficient” estimates and tests, it is a good idea to calculate standard errors and model fit tests for the first-stage estimates. Ideally, the parameter estimates should not change by much, and the second stage standard errors should be tighter. If the “efficient” parameter estimates do change a great deal, it is a good idea to diagnose why this is so. It must come down to the “efficient” parameter estimates strongly weighting moments or linear combinations of moments that were not important in the first stage, and that the former linear combination of moments disagrees strongly with the latter about which parameters fit well. Then, you can decide whether the difference in results is truly due to efficiency gain, or whether it signals a model misspecification.

Near-singular S.

The spectral density matrix is often nearly singular, since asset returns are highly correlated with each other, and since we often include many assets relative to the number of data points. As a result, second stage GMM (and, as we will see below, maximum likelihood or any other efficient technique) tries to minimize differences and differences of differences of asset returns in order to extract statistically orthogonal components with lowest variance. One may feel that this feature leads GMM to place a lot of weight on poorly estimated, economically uninteresting, or otherwise non-robust aspects of the data. In particular, portfolios of the form $100R^1 - 99R^2$ assume that investors can in fact purchase such heavily leveraged portfolios. Short-sale costs often rule out such portfolios or significantly alter their returns, so one may not want to emphasize pricing them correctly in the estimation and evaluation.

For example, suppose that $S$ is given by

$$S = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},$$

so

$$S^{-1} = \frac{1}{1 - \rho^2} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}.$$

We can factor $S^{-1}$ into a “square root” by the Choleski decomposition. This produces a triangular matrix $C$ such that $C'C = S^{-1}$. You can check that the matrix

$$C = \begin{bmatrix} \dfrac{1}{\sqrt{1 - \rho^2}} & \dfrac{-\rho}{\sqrt{1 - \rho^2}} \\ 0 & 1 \end{bmatrix} \tag{12.18}$$

works. Then, the GMM criterion

$$\min \, g_T' S^{-1} g_T$$

is equivalent to

$$\min \, (g_T' C')(C g_T).$$

$C g_T$ gives the linear combination of moments that efficient GMM is trying to minimize. Looking at (12.18), as $\rho \rightarrow 1$, the (2,2) element stays at 1, but the (1,1) and (1,2) elements get very large and of opposite signs. For example, if $\rho = 0.95$, then

$$C = \begin{bmatrix} 3.20 & -3.04 \\ 0 & 1 \end{bmatrix}.$$

In this example, GMM pays a little attention to the second moment, but places three times as much weight on the difference between the first and second moments. Larger matrices produce even more extreme weights. At a minimum, it is a good idea to look at $S^{-1}$ and its Choleski decomposition to see what moments GMM is prizing.
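The factorization in (12.18) is easy to check numerically. The short sketch below builds $C$ for $\rho = 0.95$ and compares $C'C$ with $S^{-1}$ entry by entry; the helper names are hypothetical.

```python
import math


def choleski_factor(rho):
    """Upper-triangular C with C'C = S^{-1} for S = [[1, rho], [rho, 1]], as in (12.18)."""
    s = math.sqrt(1.0 - rho ** 2)
    return [[1.0 / s, -rho / s],
            [0.0, 1.0]]


def transpose_times(C):
    """Return C'C for a 2x2 matrix C."""
    return [[sum(C[k][i] * C[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


rho = 0.95
C = choleski_factor(rho)
S_inv = [[1.0 / (1.0 - rho ** 2), -rho / (1.0 - rho ** 2)],
         [-rho / (1.0 - rho ** 2), 1.0 / (1.0 - rho ** 2)]]
CtC = transpose_times(C)

for row in C:
    print([round(v, 2) for v in row])   # rows of C, rounded as in the text

# Largest entrywise gap between C'C and S^{-1}; it should be rounding error only.
gap = max(abs(CtC[i][j] - S_inv[i][j]) for i in range(2) for j in range(2))
print(gap)
```

Trying values of `rho` closer to 1 shows how quickly the first row of $C$, and hence the weight on the difference between the two moments, grows.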

The same point has a classic interpretation, and is a well-known danger with classic regression-based tests. Efficient GMM wants to focus on well-measured moments. In asset pricing applications, the errors are typically close to uncorrelated over time, so GMM is looking for portfolios with small values of $\mathrm{var}(m_{t+1} R^e_{t+1})$. Roughly speaking, those will be assets with small return variance. Thus, GMM will pay most attention to correctly pricing the sample minimum-variance portfolio, and GMM’s evaluation of the model by $J_T$ test will focus on its ability to price this portfolio.

Now, consider what happens in a sample, as illustrated in Figure 22. The sample mean-variance frontier is typically a good deal wider than the true, or ex-ante mean-variance frontier. In particular, the sample minimum-variance portfolio may have little to do with the true minimum-variance portfolio. Like any portfolio on the sample frontier, its composition largely reflects luck, which is why we have asset pricing models in the first place rather than just price assets with portfolios on the sample frontier. The sample minimum variance return is also likely to be composed of strong long-short positions.

In sum, you may want to force GMM not to pay quite so much attention to correctly pricing the sample minimum variance portfolio, and you may want to give less importance to a statistical measure of model evaluation that almost entirely prizes GMM’s ability to price that portfolio.

[Figure 22: plot with axes $E(R)$ and $\sigma(R)$, showing the “True, ex-ante frontier,” the “Sample, ex-post frontier,” and the “Sample minimum-variance portfolio.”]

Figure 22. True or ex-ante and sample or ex-post mean-variance frontier. The sample often shows a spurious minimum-variance portfolio.

Economically interesting moments.

The optimal weighting matrix makes GMM pay close attention to linear combinations of moments with small sampling error in both estimation and evaluation. One may want to force the estimation and evaluation to pay attention to economically interesting moments instead. The initial portfolios are usually formed on an economically interesting characteristic such as size, beta, book/market or industry. One typically wants in the end to see how well the model prices these initial portfolios, not how well the model prices potentially strange portfolios of those portfolios. If a model fails, one may want to characterize that failure as “the model doesn’t price small stocks” not “the model doesn’t price a portfolio of $900 \times$ small firm returns $- \, 600 \times$ large firm returns $- \, 299 \times$ medium firm returns.”

Level playing field.

The $S$ matrix changes as the model and as its parameters change. (See the definition, (11.5) or (12.3).) As the $S$ matrix changes, which assets the GMM estimate tries hard to price well changes as well. For example, the $S$ matrix from one model may value strongly pricing the T bill well, while that of another model may value pricing a stock excess return well. Comparing the results of such estimations is like comparing apples and oranges. By fixing the weighting matrix, you can force GMM to pay attention to the various assets in the same proportion while you vary the model.

The fact that $S$ matrices change with the model leads to another subtle trap. One model may “improve” a $J_T = g_T' S^{-1} g_T$ statistic because it blows up the estimates of $S$, rather than making any progress on lowering the pricing errors $g_T$. No one would formally use a comparison of $J_T$ tests across models to compare them, of course. But it has proved nearly irresistible for authors to claim success for a new model over previous ones by noting improved $J_T$ statistics, despite different weighting matrices, different moments, and sometimes much larger pricing errors. For example, if you take a model $m_t$ and create a new model by simply adding noise, unrelated to asset returns (in sample), $m_t' = m_t + \varepsilon_t$, then the moment condition $g_T = E_T(m_t' R^e_t) = E_T((m_t + \varepsilon_t) R^e_t)$ is unchanged. However, the spectral density matrix $S = E\left[(m_t + \varepsilon_t)^2 R^e_t R^{e\prime}_t\right]$ can rise dramatically. This can reduce the $J_T$, leading to a false sense of “improvement.”
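The noise example above can be checked numerically. The following is a minimal sketch, assuming simulated excess returns, a deliberately poor model $m_t = 1$, no lags in $S$, and a $J_T$ scaled by $T$; the function name `j_stat` and all numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

def j_stat(m, Re):
    """Return (g_T, S, J_T) for moments u_t = m_t * Re_t, with no lags in S."""
    T = Re.shape[0]
    u = m[:, None] * Re                 # T x N array of u_t = m_t R^e_t
    g = u.mean(axis=0)                  # g_T = E_T(m R^e)
    S = u.T @ u / T                     # S = E_T(m^2 R^e R^e')
    J = T * g @ np.linalg.solve(S, g)   # J_T = T g_T' S^{-1} g_T
    return g, S, J

rng = np.random.default_rng(0)
T, N = 2000, 3
Re = 0.01 + 0.05 * rng.standard_normal((T, N))   # simulated excess returns
m = np.ones(T)                                   # hypothetical, poor model

# Noise made exactly orthogonal to the returns in this sample
noise = 5.0 * rng.standard_normal(T)
noise -= Re @ np.linalg.lstsq(Re, noise, rcond=None)[0]

g0, S0, J0 = j_stat(m, Re)
g1, S1, J1 = j_stat(m + noise, Re)

print("pricing errors unchanged:", np.allclose(g0, g1))
print("trace of S before and after:", np.trace(S0), np.trace(S1))
print("J_T before and after:", J0, J1)
```

The pricing errors are identical by construction, so any fall in the statistic here comes only from the larger $S$.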

Conversely, if the sample contains a nearly riskfree portfolio of the test assets, or a portfolio with apparently small variance of $m_{t+1} R^e_{t+1}$, then the $J_T$ test essentially evaluates the model by how well it can price this one portfolio. This can lead to a false rejection: even a very small $g_T$ will produce a large $g_T' S^{-1} g_T$ if there is an eigenvalue of $S$ that is (spuriously) too small.

If you use a common weighting matrix $W$ for all models, and evaluate the models by $g_T' W g_T$, then you can avoid this trap. Beware that the individual $\chi^2$ statistics are based on $g_T' \operatorname{var}(g_T)^{-1} g_T$, and $\operatorname{var}(g_T)$ contains $S$, even with a prespecified weighting matrix $W$. You should look at the pricing errors, or at some statistic such as the sum of absolute or squared pricing errors to see if they are bigger or smaller, leaving the distribution aside. The question “are the pricing errors small?” is as interesting as the question “if we drew artificial data over and over again from a null statistical model, how often would we estimate a ratio of pricing errors to their estimated variance $g_T' S^{-1} g_T$ this big or larger?”

12.5.3 Some prespecified weighting matrices

Two examples of economically interesting weighting matrices are the second-moment matrix of returns, advocated by Hansen and Jagannathan (1997), and the simple identity matrix, which is used implicitly in much empirical asset pricing.

Second moment matrix.

Hansen and Jagannathan (1997) advocate the use of the second moment matrix of payoffs $W = E(xx')^{-1}$ in place of $S$. They motivate this weighting matrix as an interesting distance measure between a model for $m$, say $y$, and the space of true $m$’s. Precisely, the minimum distance (second moment) between a candidate discount factor $y$ and the space of true discount factors is the same as the minimum value of the GMM criterion with $W = E(xx')^{-1}$ as weighting matrix.

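[Figure 23. Distance between $y$ and nearest $m$ = distance between $\operatorname{proj}(y|X)$ and $x^*$. The figure shows the payoff space $X$, the candidate $y$, its projection $\operatorname{proj}(y|X)$, $x^*$, and the nearest $m$.]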

To see why this is true, refer to Figure 23. The distance between $y$ and the nearest valid $m$ is the same as the distance between $\operatorname{proj}(y|X)$ and $x^*$. As usual, consider the case that $X$ is generated from a vector of payoffs $x$ with price $p$. From the OLS formula,

$$\operatorname{proj}(y|X) = E(yx')E(xx')^{-1}x.$$

$x^*$ is the portfolio of $x$ that prices $x$ by construction,

$$x^* = p'E(xx')^{-1}x.$$

Then, the distance between $y$ and the nearest valid $m$ is:

$$
\begin{aligned}
\|y - \text{nearest } m\| &= \|\operatorname{proj}(y|X) - x^*\| \\
&= \left\|E(yx')E(xx')^{-1}x - p'E(xx')^{-1}x\right\| \\
&= \left\|\left(E(yx') - p'\right)E(xx')^{-1}x\right\| \\
&= [E(yx) - p]'E(xx')^{-1}[E(yx) - p] \\
&= g_T'E(xx')^{-1}g_T
\end{aligned}
$$

You might want to choose parameters of the model to minimize this “economic” measure of model fit, or this economically motivated linear combination of pricing errors, rather than the statistical measure of fit $S^{-1}$. You might also use the minimized value of this criterion to compare two models. In that way, you are sure the better model is better because it improves on the pricing errors rather than just blowing up the weighting matrix.

Identity matrix.

Using the identity matrix weights the initial choice of assets or portfolios equally in estimation and evaluation. This choice has a particular advantage with large systems in which $S$ is nearly singular, as it avoids most of the problems associated with inverting a near-singular $S$ matrix. Many empirical asset pricing studies use OLS cross-sectional regressions, which are the same thing as a first stage GMM estimate with an identity weighting matrix.

Comparing the second moment and identity matrices.

The second moment matrix gives an objective that is invariant to the initial choice of assets or portfolios. If we form a portfolio $Ax$ of the initial payoffs $x$, with nonsingular $A$ (i.e. a transformation that doesn’t throw away information) then

$$[E(yAx) - Ap]'E(Axx'A')^{-1}[E(yAx) - Ap] = [E(yx) - p]'E(xx')^{-1}[E(yx) - p].$$

The optimal weighting matrix $S$ shares this property. It is not true of the identity or other fixed matrices. In those cases, the results will depend on the initial choice of portfolios.
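The invariance claim can be verified on sample moments. The sketch below assumes simulated payoffs, made-up prices, and an arbitrary nonsingular repackaging matrix `A`; the helper `criterion` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def criterion(y, x, p, W=None):
    """g' W g with g = E_T(y x) - p; W defaults to E_T(x x')^{-1}."""
    T = x.shape[0]
    g = (y[:, None] * x).mean(axis=0) - p
    if W is None:
        W = np.linalg.inv(x.T @ x / T)
    return g @ W @ g

rng = np.random.default_rng(1)
T, N = 500, 3
x = 1.0 + 0.2 * rng.standard_normal((T, N))   # simulated payoffs
p = np.array([0.95, 1.00, 1.05])              # hypothetical prices
y = 0.9 + 0.1 * rng.standard_normal(T)        # candidate discount factor
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, -1.0, 2.0]])              # nonsingular repackaging

# Second moment weighting: same value for x and for the portfolios Ax
hj_x = criterion(y, x, p)
hj_Ax = criterion(y, x @ A.T, A @ p)

# Identity weighting: the value depends on the choice of portfolios
id_x = criterion(y, x, p, W=np.eye(N))
id_Ax = criterion(y, x @ A.T, A @ p, W=np.eye(N))

print(hj_x, hj_Ax)
print(id_x, id_Ax)
```

The first pair agrees up to rounding because $A$ cancels inside the quadratic form; the second pair does not, which is the portfolio dependence described in the text.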

Kandel and Stambaugh (19xx) have suggested that the results of several important asset pricing model tests are highly sensitive to the choice of portfolio; i.e. that authors inadvertently selected a set of portfolios on which the CAPM does unusually badly in a particular sample. Insisting that weighting matrices have this kind of invariance to portfolio selection might be a good device to guard against this problem.


On the other hand, if you want to focus on the model’s predictions for economically interesting portfolios, then it wouldn’t make much sense for the weighting matrix to undo the specification of economically interesting portfolios! For example, many studies want to focus on the ability of a model to describe expected returns that seem to depend on a characteristic such as size, book/market, industry, momentum, etc. Also, the second moment matrix is often even more nearly singular than the spectral density matrix, since $E(xx') = \operatorname{cov}(x) + E(x)E(x)'$. Therefore, it often emphasizes portfolios with even more extreme short and long positions, and is no help in overcoming the near singularity of the $S$ matrix.

12.6 Estimating on one group of moments, testing on another.

You may want to force the system to use one set of moments for estimation and another for testing. The real business cycle literature in macroeconomics does this extensively, typically using “first moments” for estimation (“calibration”) and “second moments” (i.e. first moments of squares) for evaluation. A statistically minded macroeconomist might like to know whether the departures of model from data “second moments” are large compared to sampling variation, and would like to include sampling uncertainty about the parameter estimates in this evaluation.

You might want to choose parameters using one set of asset returns (stocks; domestic assets; size portfolios, first 9 size deciles, well-measured assets) and then see how the model does “out of sample” on another set of assets (bonds; foreign assets; book/market portfolios, small firm portfolio, questionably measured assets, mutual funds). However, you want the distribution theory for evaluation on the second set of moments to incorporate sampling uncertainty about the parameters in their estimation on the first set of moments.

You can do all this very simply by using an appropriate weighting matrix or a prespecified moment matrix $a_T$. For example, if the first $N$ moments will be used to estimate $N$ parameters, and the remaining $M$ moments will be used to test the model “out of sample,” use $a_T = \left[\, I_N \;\; 0_{N \times M} \,\right]$. If there are more moments $N$ than parameters in the “estimation” block, you can construct a weighting matrix $W$ which is an identity matrix in the $N \times N$ estimation block and zero elsewhere. Then $a_T = \partial g_T'/\partial b \; W$ will simply contain the first $N$ columns of $\partial g_T'/\partial b$ followed by zeros. The test moments will not be used in estimation. You could even use the inverse of the upper $N \times N$ block of $S$ (not the upper block of the inverse of $S$!) to make the estimation a bit more efficient.
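As an illustration of the selection matrix $a_T = [\, I_N \;\; 0_{N\times M} \,]$, here is a minimal sketch for moments that are linear in the parameters, $g_T(b) = c - Db$. The linear form, the function name `estimate_and_test`, and the numbers in `c` and `D` are assumptions made for the example; the distribution theory for the test moments is not computed.

```python
import numpy as np

def estimate_and_test(c, D):
    """Linear moments g_T(b) = c - D b; the first N rows identify the N parameters."""
    n_mom, n_par = D.shape
    # a_T = [I_N  0_{N x M}] picks out the estimation moments
    a_T = np.hstack([np.eye(n_par), np.zeros((n_par, n_mom - n_par))])
    b_hat = np.linalg.solve(a_T @ D, a_T @ c)   # solves a_T g_T(b) = 0
    g = c - D @ b_hat                           # all moments at the estimate
    return a_T, b_hat, g

# Made-up sample moments: two estimation moments, two test moments
c = np.array([0.08, 0.05, 0.06, 0.02])
D = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.8, 0.5],
              [0.1, 0.4]])

a_T, b_hat, g = estimate_and_test(c, D)
print("estimation moments:", g[:2])   # zero by construction
print("test moments:", g[2:])         # "out of sample" pricing errors
```

Only the first block of moments is set to zero; the remaining entries of `g` are the errors one would then evaluate with the general GMM formulas.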

12.7 Estimating the spectral density matrix

Hints on estimating the spectral density or long run covariance matrix. 1) Use a sensible first stage estimate. 2) Remove means. 3) Downweight higher order correlations. 4) Consider parametric structures for autocorrelation and heteroskedasticity. 5) Use the null to limit the number of correlations or to impose other structure on $S$? 6) Size problems; consider a factor or other parametric cross-sectional structure for $S$. 7) Iteration and simultaneous $b$, $S$ estimation.

The optimal weighting matrix $S$ depends on population moments, and depends on the parameters $b$. Work back through the definitions,

$$S = \sum_{j=-\infty}^{\infty} E(u_t u_{t-j}'); \qquad u_t \equiv \left(m_t(b)\,x_t - p_{t-1}\right).$$

How do we estimate this matrix? The big picture is simple: following the usual philosophy, estimate population moments by their sample counterparts. Thus, use the first stage $b$ estimates and the data to construct sample versions of the definition of $S$. This produces a consistent estimate of the true spectral density matrix, which is all the asymptotic distribution theory requires.

The details are important however, and this section gives some hints. Also, you may want a different, and less restrictive, estimate of $S$ for use in standard errors than you do when you are estimating $S$ for use in a weighting matrix.

1) Use a sensible first stage W, or transform the data.

In the asymptotic theory, you can use consistent first stage $b$ estimates formed by any nontrivial weighting matrix. In practice, of course, you should use a sensible weighting matrix so that the first stage estimates are not ridiculously inefficient. $W = I$ is often a good choice.

Sometimes, some moments will have different units than other moments. For example, the dividend/price ratio is a number like 0.04. Therefore, the moment formed by $R_{t+1} \times d/p_t$ will be about 0.04 as large as the moment formed by $R_{t+1} \times 1$. If you use $W = I$, GMM will pay much less attention to the $R_{t+1} \times d/p_t$ moment. It is wise, then, to either use an initial weighting matrix that overweights the $R_{t+1} \times d/p_t$ moment, or to transform the data so the two moments have about the same mean and variance. For example, you could use $R_{t+1} \times (1 + d/p_t)$. It is also useful to start with moments that are not horrendously correlated with each other, or to remove such correlation with a clever $W$. For example, you might consider $R^a$ and $R^b - R^a$ rather than $R^a$ and $R^b$. You can accomplish this directly, or by starting with

$$W = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$
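To see that this $W$ is the same as transforming the moments directly, the short check below forms $W = C'C$, where $C$ maps $(g_a, g_b)$ into $(g_a, g_b - g_a)$. The moment values in `g` are made-up numbers used only for illustration.

```python
import numpy as np

# C maps the moments (g_a, g_b) into (g_a, g_b - g_a)
C = np.array([[1.0, 0.0],
              [-1.0, 1.0]])
W = C.T @ C                           # the weighting matrix in the text

g = np.array([0.03, 0.05])            # hypothetical sample moments
direct = np.sum((C @ g) ** 2)         # identity weighting on transformed moments
weighted = g @ W @ g                  # W weighting on the original moments

print(W)
print(direct, weighted)
```

Both routes give the same objective, so choosing between them is a matter of convenience.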

2) Remove means.

Under the null, $E(u_t) = 0$, so it does not matter to the asymptotic distribution theory whether you estimate the covariance matrix by removing means, using

$$\frac{1}{T}\sum_{t=1}^{T}\left[(u_t - \bar{u})(u_t - \bar{u})'\right]; \qquad \bar{u} \equiv \frac{1}{T}\sum_{t=1}^{T} u_t$$

or whether you estimate the second moment matrix by not removing means. However, Hansen and Singleton (1982) advocate removing the means in sample, and this is generally a good idea.

It is already a major obstacle to second-stage estimation that estimated $S$ matrices (and even simple variance-covariance matrices) are often nearly singular, providing an unreliable weighting matrix when inverted. Since second moment matrices $E(uu') = \operatorname{cov}(u, u') + E(u)E(u')$ add a singular matrix $E(u)E(u')$, they are often even worse.

3) Downweight higher order correlations.

You obviously cannot use a direct sample counterpart to the spectral density matrix. In a sample of size 100, there is no way to estimate $E(u_t u_{t+101}')$. Your estimate of $E(u_t u_{t+99}')$ is based on one data point, $u_1 u_{100}'$. Hence, it will be a pretty unreliable estimate. For this reason, the estimator using all possible autocorrelations in a given sample is inconsistent. (Consistency means that as the sample grows, the probability distribution of the estimator converges to the true value. Inconsistent estimates typically have very large sample variation.)

Furthermore, even $S$ estimates that use few autocorrelations are not always positive definite in sample. This is embarrassing when one tries to invert the estimated spectral density matrix, which you have to do if you use it as a weighting matrix. Therefore, it is a good idea to construct consistent estimates that are automatically positive definite in every sample. One such estimate is the Bartlett estimate, used in this application by Newey and West (1987b). It is

$$\hat{S} = \sum_{j=-k}^{k}\left(\frac{k - |j|}{k}\right)\frac{1}{T}\sum_{t=1}^{T}\left(u_t u_{t-j}'\right). \qquad (19)$$

As you can see, only autocorrelations up to $k$th ($k < T$) order are included, and higher order autocorrelations are downweighted. (It’s important to use $1/T$ not $1/(T-k)$; this is a further downweighting.) The Newey-West estimator is basically the variance of $k$th sums, which is why it is positive definite in sample:

$$
\begin{aligned}
\operatorname{Var}\left(\sum_{j=1}^{k} u_{t-j}\right) &= kE(u_t u_t') + (k-1)\left[E(u_t u_{t-1}') + E(u_{t-1}u_t')\right] + \cdots \\
&\quad + \left[E(u_t u_{t-k+1}') + E(u_{t-k+1}u_t')\right] = k\sum_{j=-k}^{k}\frac{k - |j|}{k}E(u_t u_{t-j}').
\end{aligned}
$$

This calculation also gives some intuition for the $S$ matrix. We’re looking for the variance across samples of the sample mean, $\operatorname{var}\left(\frac{1}{T}\sum_{t=1}^{T} u_t\right)$. We only have one sample mean to look at, so we estimate the variance of the sample mean by looking at the variance in a single sample of shorter sums, $\operatorname{var}\left(\frac{1}{k}\sum_{j=1}^{k} u_j\right)$. The $S$ matrix is sometimes called the long-run covariance matrix for this reason. In fact, one could estimate $S$ directly as a variance of $k$th sums and obtain almost the same estimator, which would also be positive definite in any sample,

$$v_t = \sum_{j=1}^{k} u_{t-j}; \qquad \bar{v} = \frac{1}{T-k}\sum_{t=k+1}^{T} v_t$$

$$\hat{S} = \frac{1}{k}\,\frac{1}{T-k}\sum_{t=k+1}^{T}\left(v_t - \bar{v}\right)\left(v_t - \bar{v}\right)'.$$

This estimator has been used when measurement of $S$ is directly interesting (Cochrane 1998, Lo and MacKinlay 1988). A variety of other weighting schemes have been advocated.
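The Bartlett estimate (12.19) is short to code. The sketch below uses the weights $(k-|j|)/k$ exactly as written in the text (many software packages instead parameterize the weights as $1 - |j|/(L+1)$ with $L$ lags, which matches $k = L+1$). The function name `bartlett_S`, the optional demeaning, and the simulated AR(1) data are assumptions for illustration.

```python
import numpy as np

def bartlett_S(u, k, demean=True):
    """Bartlett-weighted estimate of S from a T x N array of moments u_t."""
    u = np.asarray(u, dtype=float)
    T = u.shape[0]
    if demean:
        u = u - u.mean(axis=0)
    S = u.T @ u / T                       # j = 0 term, weight 1
    for j in range(1, k):                 # the weight (k - j)/k is zero at j = k
        gamma_j = u[j:].T @ u[:-j] / T    # (1/T) sum_t u_t u_{t-j}', note 1/T
        S += (k - j) / k * (gamma_j + gamma_j.T)   # j and -j terms together
    return S

# Simulated, positively autocorrelated moments
rng = np.random.default_rng(2)
T, N = 400, 2
shocks = rng.standard_normal((T, N))
u = np.zeros((T, N))
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + shocks[t]

S_hat = bartlett_S(u, k=6)
print(S_hat)
print("smallest eigenvalue:", np.linalg.eigvalsh(S_hat).min())
```

With `k=1` the function collapses to the ordinary sample covariance matrix, which is a convenient sanity check.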

What value of $k$, or how wide a window if of another shape, should you use? Here again, you have to use some judgment. With too short a value of $k$ and a $u_t$ that is significantly autocorrelated, you don’t correct for correlation that might be there in the errors. With too long a value of $k$ and a series that does not have much autocorrelation, the performance of the estimate and test deteriorates. If $k = T/2$ for example, you are really using only two data points to estimate the variance of the mean. The optimum value then depends on how much persistence or low-frequency movement there is in a particular application, vs. accuracy of the estimate.

There is an extensive statistical literature about optimal window width, or size of $k$. Alas, this literature mostly characterizes the rate at which $k$ should increase with sample size. You must promise to increase $k$ as sample size increases, but not as quickly as the sample size increases ($\lim_{T\to\infty} k = \infty$, $\lim_{T\to\infty} k/T = 0$) in order to obtain consistent estimates. In practice, promises about what you’d do with more data are pretty meaningless, and usually broken once more data arrives.

4) Consider parametric structures for autocorrelation and heteroskedasticity.

“Nonparametric” corrections such as (12.19) often don’t perform very well in typical samples. The problem is that “nonparametric” techniques are really very highly parametric; you have to estimate many correlations in the data. Therefore, the nonparametric estimate varies a good deal from sample to sample, while the asymptotic distribution theory ignores sampling variation in covariance matrix estimates. The asymptotic distribution can therefore be a poor approximation to the finite-sample distribution of statistics like the $J_T$. The $S^{-1}$ weighting matrix will also be unreliable.

One answer is to use a Monte-Carlo or bootstrap to estimate the finite-sample distribution of parameters and test statistics rather than to rely on asymptotic theory.

Alternatively, you can impose a parametric structure on the $S$ matrix. Just because the formulas are expressed in terms of a sum of covariances does not mean you have to estimate them that way; GMM is not inherently tied to “nonparametric” covariance matrix estimates. For example, if you model a scalar $u$ as an AR(1) with parameter $\rho$, then you can estimate two numbers $\rho$ and $\sigma_u^2$ rather than a whole list of autocorrelations, and calculate

$$S = \sum_{j=-\infty}^{\infty} E(u_t u_{t-j}) = \sigma_u^2 \sum_{j=-\infty}^{\infty} \rho^{|j|} = \sigma_u^2\,\frac{1+\rho}{1-\rho}.$$

If this structure is not a bad approximation, imposing it can result in more reliable estimates and test statistics since one has to estimate many fewer coefficients. You could transform the data in such a way that there is less correlation to correct for in the first place.

(This is a very useful formula, by the way. You are probably used to calculating the standard error of the mean as

$$\sigma(\bar{x}) = \frac{\sigma(x)}{\sqrt{T}}.$$

This formula assumes that the $x$ are uncorrelated over time. If an AR(1) is not a bad model for their correlation, you can quickly adjust for correlation by using

$$\sigma(\bar{x}) = \frac{\sigma(x)}{\sqrt{T}}\sqrt{\frac{1+\rho}{1-\rho}}$$

instead.)
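The AR(1) adjustment is a one-line calculation. The helper below, with the hypothetical name `se_mean_ar1`, is a minimal sketch that simply evaluates the formula; the inputs are illustrative numbers.

```python
import math

def se_mean_ar1(sigma_x, T, rho=0.0):
    """Standard error of a sample mean when x follows an AR(1) with coefficient rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return sigma_x / math.sqrt(T) * math.sqrt((1.0 + rho) / (1.0 - rho))

iid = se_mean_ar1(0.2, 100)               # sigma / sqrt(T)
persistent = se_mean_ar1(0.2, 100, 0.5)   # larger by the factor sqrt(3)

print(iid, persistent, persistent / iid)
```

With $\rho = 0.5$ the standard error is larger by $\sqrt{(1+0.5)/(1-0.5)} = \sqrt{3}$, so ignoring the correlation would noticeably overstate precision.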

This sort of parametric correction is very familiar from OLS regression analysis. The textbooks commonly advocate the AR(1) model for serial correlation as well as parametric models for heteroskedasticity corrections. There is no reason not to follow a similar approach for GMM statistics.

5) Use the null to limit correlations?

In the typical asset pricing setup, the null hypothesis specifies that $E_t(u_{t+1}) = E_t(m_{t+1}R_{t+1} - 1) = 0$, as well as $E(u_{t+1}) = 0$. This implies that all the autocorrelation terms of $S$ drop out; $E(u_t u_{t-j}') = 0$ for $j \neq 0$. The lagged $u$ could be an instrument $z$; the discounted return should be unforecastable, using past discounted returns as well as any other variable. In this situation, one could exploit the null to only include one term, and estimate

$$\hat{S} = \frac{1}{T}\sum_{t=1}^{T} u_t u_t'.$$

Similarly, if one runs a regression forecasting returns from some variable $z_t$,

$$R_{t+1} = a + b z_t + \varepsilon_{t+1},$$

the null hypothesis that returns are not forecastable by any variable at time $t$ means that the errors should not be autocorrelated. One can then simplify the standard errors in the OLS regression formulas given in section 12.4, eliminating all the leads and lags.

In other situations, the null hypothesis can suggest a functional form for $E(u_t u_{t-j}')$ or that some but not all are zero. For example, as we saw in section 12.4, regressions of long horizon returns on overlapping data lead to a correlated error term, even under the null hypothesis of no return forecastability. We can impose this null, ruling out terms past the overlap, as suggested by Hansen and Hodrick,

$$\operatorname{var}(b_T) = \frac{1}{T}E(x_t x_t')^{-1}\left[\sum_{j=-k}^{k} E\left(e_t x_t x_{t-j}' e_{t-j}\right)\right]E(x_t x_t')^{-1}. \qquad (20)$$

However, the null might not be correct, and the errors might be correlated. If so, you might make a mistake by leaving them out. If the null is correct, the extra terms will converge to zero and you will only have lost a few (finite-sample) degrees of freedom needlessly estimating them. If the null is not correct, you have an inconsistent estimate. With this in mind, you might want to include at least a few extra autocorrelations, even when the null says they don’t belong.

Furthermore, there is no guarantee that the unweighted sum in (12.20) is positive definite in sample. If the sum in the middle is not positive definite, you could add a weighting to the sum, possibly increasing the number of lags so that the lags near $k$ are not unusually underweighted. Again, estimating extra lags that should be zero under the null only loses a little bit of power.
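A sample version of (12.20) can be sketched as follows. The function name `hansen_hodrick_cov` and the simulated regression are assumptions for illustration; the lags enter with equal weights, as in the formula, so the result is not guaranteed to be positive definite.

```python
import numpy as np

def hansen_hodrick_cov(X, e, k):
    """Covariance of OLS coefficients with unweighted lags up to k, as in (12.20)."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(e, dtype=float)
    T = X.shape[0]
    xe = X * e[:, None]                       # rows are x_t' e_t
    middle = xe.T @ xe / T                    # j = 0 term
    for j in range(1, k + 1):
        gamma_j = xe[j:].T @ xe[:-j] / T      # (1/T) sum_t e_t x_t x_{t-j}' e_{t-j}
        middle += gamma_j + gamma_j.T         # j and -j terms, equal weights
    bread = np.linalg.inv(X.T @ X / T)        # E_T(x x')^{-1}
    return bread @ middle @ bread / T

# Simulated forecasting regression with a constant and one regressor
rng = np.random.default_rng(3)
T = 300
z = rng.standard_normal(T)
X = np.column_stack([np.ones(T), z])
y = 0.1 + 0.5 * z + rng.standard_normal(T)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

V0 = hansen_hodrick_cov(X, e, k=0)   # no lags: heteroskedasticity-consistent form
V3 = hansen_hodrick_cov(X, e, k=3)   # lags up to an overlap of 3
print(np.sqrt(np.diag(V0)), np.sqrt(np.diag(V3)))
```

Setting `k=0` imposes the null of no autocorrelation and reproduces the usual heteroskedasticity-consistent covariance matrix; replacing the equal weights with declining ones is the repair for a non-positive-definite sum that the text mentions.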

Monte Carlo evidence (Hodrick 199x, Campbell 1994) suggests that imposing the null hypothesis to simplify the spectral density matrix helps to get the finite-sample size of test statistics right, that is, the probability of rejection given the null is true. One should not be surprised that if the null is true, imposing as much of it as possible makes estimates and tests work better. On the other hand, adding extra correlations can help with the power of test statistics, the probability of rejection given that an alternative is true, since they converge to the correct spectral density matrix.

This trade-off requires some thought. For measurement rather than pure testing, using a spectral density matrix that can accommodate alternatives may be the right choice. For example, in the return forecasting regressions, one is really focused on measuring return forecastability rather than just formally testing the hypothesis that it is zero. On the other hand, the small-sample performance of the nonparametric estimators with many lags is not very good.

If you are testing an asset pricing model that predicts $u$ should not be autocorrelated, and there is a lot of correlation (if this issue makes a big difference), then this is an indication that something is wrong with the model; that including $u$ as one of your instruments $z$ would result in a rejection or at least substantially change the results. If the $u$ are close to uncorrelated, then it really doesn’t matter if you add a few extra terms or not.

6) Size problems; consider a factor or other parametric cross-sectional structure.


If you try to estimate a covariance matrix that is larger than the number of data points (say 2000 NYSE stocks and 800 monthly observations), the estimate of $S$, like any other covariance matrix, is singular by construction. This fact leads to obvious problems when you try to invert $S$! More generally, when the number of moments is more than around 1/10 the number of data points, $S$ estimates tend to become unstable and near-singular. Used as a weighting matrix, such an $S$ matrix tells you to pay lots of attention to strange and probably spurious linear combinations of the moments, as I emphasized in section 12.5. For this reason, most second-stage GMM estimations are limited to a few assets and a few instruments.

A good, but as yet untried alternative might be to impose a factor structure or other well-behaved structure on the covariance matrix. The near-universal practice of grouping assets into portfolios before analysis already implies an assumption that the true $S$ of the underlying assets has a factor structure. Grouping in portfolios means that the individual assets have no information not contained in the portfolio, so that a weighting matrix $S^{-1}$ would treat all assets in the portfolio identically. It might be better to estimate an $S$ imposing a factor structure on all the primitive assets.

Another response to the difficulty of estimating $S$ is to stop at first stage estimates, and only use $S$ for standard errors. One might also use a highly structured estimate of $S$ as weighting matrix, while using a less constrained estimate for the standard errors.

This problem is of course not unique to GMM. Any estimation technique requires us to calculate a covariance matrix. Many traditional estimates simply assume that $u_t$ errors are cross-sectionally independent. This false assumption leads to understatements of the standard errors far worse than the small sample performance of any GMM estimate.

Our econometric techniques all are designed for large time series and small cross-sections. Our data has a large cross section and short time series. A large unsolved problem in finance is the development of appropriate large-N small-T tools for evaluating asset pricing models.

7) Alternatives to the two-stage procedure: iteration and one-step.

Hansen and Singleton (1982) describe the above two-step procedure, and it has become popular for that reason. Two alternative procedures may perform better in practice, i.e. may result in asymptotically equivalent estimates with better small-sample properties. They can also be simpler to implement, and require less manual adjustment or care in specifying the setup (moments, weighting matrices), which is often just as important.

a) Iterate. The second stage estimate $\hat{b}_2$ will not imply the same spectral density as the first stage. It might seem appropriate that the estimate of $b$ and of the spectral density should be consistent, i.e. to find a fixed point of $\hat{b} = \min_{\{b\}}\left[g_T(b)'S(\hat{b})^{-1}g_T(b)\right]$. One way to search for such a fixed point is to iterate: find $b_2$ from

$$\hat{b}_2 = \min_{\{b\}}\; g_T(b)'S^{-1}(b_1)\,g_T(b) \qquad (21)$$

where $b_1$ is a first stage estimate, held fixed in the minimization over $b_2$. Then use $\hat{b}_2$ to find $S(\hat{b}_2)$, find

$$\hat{b}_3 = \min_{\{b\}}\left[g_T(b)'S(\hat{b}_2)^{-1}g_T(b)\right],$$

and so on. There is no fixed point theorem that such iterations will converge, but they often do, especially with a little massaging. (I once used $S\left[(b_j + b_{j-1})/2\right]$ in the beginning part of an iteration to keep it from oscillating between two values of $b$.) Ferson and Foerster (1994) find that iteration gives better small sample performance than two-stage GMM in Monte Carlo experiments. This procedure is also likely to produce estimates that do not depend on the initial weighting matrix.
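The iteration can be sketched for a linear discount factor model, where each minimization has a closed form. This sketch assumes $m_t = 1 - b'f_t$, excess returns as test assets so that $g_T(b) = E_T(R^e) - E_T(R^e f')b$, an identity first-stage weighting matrix, and an $S$ with no lags; the function name `iterated_gmm` and the simulated data are illustrative, and convergence is not guaranteed in general.

```python
import numpy as np

def iterated_gmm(Re, f, max_iter=50, tol=1e-10):
    """Iterated GMM for m_t = 1 - b'f_t with moments g_T(b) = E_T(m_t R^e_t)."""
    T, N = Re.shape
    c = Re.mean(axis=0)                  # E_T(R^e)
    D = Re.T @ f / T                     # E_T(R^e f'), so g_T(b) = c - D b
    W = np.eye(N)                        # first-stage weighting matrix
    b = np.linalg.solve(D.T @ W @ D, D.T @ W @ c)
    for _ in range(max_iter):
        u = Re * (1.0 - f @ b)[:, None]  # u_t(b) = m_t(b) R^e_t
        u = u - u.mean(axis=0)           # remove means in sample
        S = u.T @ u / T                  # S(b) with no lags
        W = np.linalg.inv(S)
        b_new = np.linalg.solve(D.T @ W @ D, D.T @ W @ c)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b, W, c - D @ b               # estimate, last weights, pricing errors

# Simulated single-factor data
rng = np.random.default_rng(5)
T = 600
f = 0.05 + 0.15 * rng.standard_normal((T, 1))
betas = np.array([0.5, 0.8, 1.0, 1.3])
Re = f * betas + 0.05 * rng.standard_normal((T, 4))

b_hat, W_last, g_hat = iterated_gmm(Re, f)
print("b:", b_hat)
print("pricing errors:", g_hat)
```

Because the model is linear, no numerical search is needed; for a nonlinear model each pass of the loop would call a numerical minimizer instead of `np.linalg.solve`.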

b) Pick $b$ and $S$ simultaneously. It is not true that $S$ must be held fixed as one searches for $b$. Instead, one can use a new $S(b)$ for each value of $b$. Explicitly, one can estimate $b$ by

$$\min_{\{b\}}\left[g_T(b)'S^{-1}(b)\,g_T(b)\right] \qquad (22)$$

The estimates produced by this simultaneous search will not be numerically the same in a finite sample as the two-step or iterated estimates. The first order conditions to (12.21) are

$$\left(\frac{\partial g_T(b)}{\partial b}\right)'S^{-1}(b_1)\,g_T(b) = 0 \qquad (23)$$

while the first order conditions in (12.22) add a term involving the derivatives of $S(b)$ with respect to $b$. However, the latter terms vanish asymptotically, so the asymptotic distribution theory is not affected. Hansen, Heaton and Yaron (1996) conduct some Monte Carlo experiments and find that this estimate may have small-sample advantages in certain problems. A problem is that the one-step minimization may find regions of the parameter space that blow up the spectral density matrix $S(b)$ rather than lower the pricing errors $g_T$.

Often, one choice will be much more convenient than another. For linear models, one can find the minimizing value of $b$ from the first order conditions (12.23) analytically. This fact eliminates the need to search, so even an iterated estimate is much faster. For nonlinear models, each step involves a numerical search over $g_T(b)'S^{-1}g_T(b)$. Rather than perform this search many times, it may be much quicker to minimize once over $g_T(b)'S^{-1}(b)\,g_T(b)$. On the other hand, the latter is not a locally quadratic form, so the search may run into greater numerical difficulties.

12.8 Problems

1. Use the delta method to derive the sampling variance of an autocorrelation coefficient.
2. Write a formula for the standard error of OLS regression coefficients that corrects for autocorrelation but not heteroskedasticity.
3. Write a formula for the standard error of OLS regression coefficients if $E(e_t e_{t-j}) = \rho^j \sigma^2$.
4. If the GMM errors come from an asset pricing model, $u_t = m_t R_t - 1$, can you ignore lags in the spectral density matrix? What if you know that returns are predictable? What if the error is formed from an instrument/managed portfolio $u_t z_{t-1}$?


Chapter 13. Regression-based tests of linear factor models

This and the next three chapters study the question, how should we estimate and evaluate linear factor models; models of the form $p = E(mx)$, $m = b'f$ or equivalently $E(R^e) = \beta'\lambda$? These models are by far the most common in empirical asset pricing, and there is a large literature on econometric techniques to estimate and evaluate them. Each technique focuses on the same questions: how to estimate parameters, how to calculate standard errors of the estimated parameters, how to calculate standard errors of the pricing errors, and how to test the model, usually with a test statistic of the form $\hat{\alpha}'V^{-1}\hat{\alpha}$.

I start with simple and longstanding time-series and cross-sectional regression tests. Then, I pursue the GMM approach to the model expressed in $p = E(mx)$, $m = b'f$ form. The following chapter summarizes the principle of maximum likelihood estimation and derives maximum likelihood estimates and tests. Finally, a chapter compares the different approaches.

As always, the theme is the underlying unity. All of the techniques come down to one of two basic ideas: time-series regression or cross-sectional regression. The GMM, $p = E(mx)$ approach turns out to be almost identical to cross-sectional regressions. Maximum likelihood (with appropriate statistical assumptions) justifies the time-series and cross-sectional regression approaches. The formulas for parameter estimates, standard errors, and test statistics are all strikingly similar.

13.1 Time-series regressions

When the factor is also a return, we can evaluate the model

$$E(R^{ei}) = \beta_i E(f)$$

by running OLS time series regressions

$$R^{ei}_t = \alpha_i + \beta_i f_t + \varepsilon^i_t, \qquad t = 1, 2, \ldots, T$$

for each asset. The OLS distribution formulas (with corrected standard errors) provide standard errors of $\alpha$ and $\beta$.

With errors that are i.i.d. over time, homoskedastic and independent of the factors, the asymptotic joint distribution of the intercepts gives the model test statistic,

$$T\left[1 + \left(\frac{E_T(f)}{\hat{\sigma}(f)}\right)^2\right]^{-1}\hat{\alpha}'\hat{\Sigma}^{-1}\hat{\alpha} \sim \chi^2_N$$

The Gibbons-Ross-Shanken test is a multivariate, finite sample counterpart to this statistic, when the errors are also normally distributed,

$$\frac{T - N - K}{N}\left(1 + E_T(f)'\hat{\Omega}^{-1}E_T(f)\right)^{-1}\hat{\alpha}'\hat{\Sigma}^{-1}\hat{\alpha} \sim F_{N,\,T-N-K}.$$

I show how to construct the same test statistics with heteroskedastic and autocorrelated errors via GMM.

I start with the simplest case. We have a factor pricing model with a single factor. The factor is an excess return (for example, the CAPM, with $R^{em} = R^m - R^f$), and the test assets are all excess returns. We express the model in expected return - beta form. The betas are defined by regression coefficients

$$R^{ei}_t = \alpha_i + \beta_i f_t + \varepsilon^i_t \qquad (1)$$

and the model states that expected returns are linear in the betas:

$$E(R^{ei}) = \beta_i E(f). \qquad (2)$$

Since the factor is also an excess return, the model applies to the factor as well, so $E(f) = 1 \times \lambda$.

Comparing the model (13.2) and the expectation of the time series regression (13.1) we see that the model has one and only one implication for the data: all the regression intercepts $\alpha_i$ should be zero. The regression intercepts are equal to the pricing errors.

Given this fact, Black, Jensen and Scholes (1972) suggested a natural strategy for estimation and evaluation: Run time-series regressions (13.1) for each test asset. The estimate of the factor risk premium is just the sample mean of the factor,

$$\hat{\lambda} = E_T(f).$$

Then, use standard OLS formulas for a distribution theory of the parameters. In particular you can use t-tests to check whether the pricing errors $\alpha$ are in fact zero. These distributions are usually presented for the case that the regression errors in (13.1) are uncorrelated and homoskedastic, but the formulas in section 12.4 show easily how to calculate standard errors for arbitrary error covariance structures.

We also want to know whether all the pricing errors are jointly equal to zero. This requires us to go beyond standard formulas for the regression (13.1) taken alone, as we want to know the joint distribution of $\alpha$ estimates from separate regressions running side by side but with errors correlated across assets ($E(\varepsilon^i_t\varepsilon^j_t) \neq 0$). (We can think of (13.1) as a panel regression, and then it’s a test whether the firm dummies are jointly zero.) The classic forms of these tests assume no autocorrelation or heteroskedasticity, but allow the errors to be correlated across assets. Dividing the $\hat\alpha$ regression coefficients by their variance-covariance matrix leads to a $\chi^2$ test,

$$T\left[1+\left(\frac{E_T(f)}{\hat\sigma(f)}\right)^2\right]^{-1}\hat\alpha'\hat\Sigma^{-1}\hat\alpha \sim \chi^2_N \tag{13.3}$$

where $E_T(f)$ denotes sample mean, $\hat\sigma^2(f)$ denotes sample variance, $\hat\alpha$ is a vector of the estimated intercepts,

$$\hat\alpha = \begin{bmatrix}\hat\alpha_1 & \hat\alpha_2 & \cdots & \hat\alpha_N\end{bmatrix}',$$

and $\hat\Sigma$ is the residual covariance matrix, i.e. the sample estimate of $E(\varepsilon_t\varepsilon_t') = \Sigma$, where

$$\varepsilon_t = \begin{bmatrix}\varepsilon^1_t & \varepsilon^2_t & \cdots & \varepsilon^N_t\end{bmatrix}'.$$

As usual when testing hypotheses about regression coefficients, this test is valid asymptotically. The asymptotic distribution theory assumes that $\sigma^2(f)$ (i.e. $X'X$) and $\Sigma$ have converged to their probability limits; therefore it is asymptotically valid even though the factor is stochastic and $\Sigma$ is estimated, but it ignores those sources of variation in a finite sample. It does not require that the errors are normal, relying on the central limit theorem so that $\hat\alpha$ is normal. I derive (13.3) below.
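The statistic (13.3) is easy to compute once the time-series regressions are run. The following is a minimal sketch, not a reference implementation: the function name `time_series_chi2`, the input names `Re` and `f`, and the simulated data are all assumptions made for illustration, and sample moments use the $1/T$ divisor throughout.

```python
import numpy as np

def time_series_chi2(Re, f):
    """Asymptotic chi-square statistic (13.3), single excess-return factor.

    Re: (T, N) test-asset excess returns; f: (T,) factor excess returns.
    Both names are hypothetical inputs, not taken from the text.
    """
    T, N = Re.shape
    X = np.column_stack([np.ones(T), f])           # regressors [1, f_t]
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)  # row 0: alpha_hat, row 1: beta_hat
    alpha, beta = coef[0], coef[1]
    resid = Re - X @ coef
    Sigma = resid.T @ resid / T                    # residual covariance, 1/T divisor
    sharpe2 = f.mean() ** 2 / f.var()              # (E_T(f) / sigma_hat(f))^2
    stat = T / (1.0 + sharpe2) * (alpha @ np.linalg.solve(Sigma, alpha))
    return alpha, beta, stat

# Simulated data in which the model holds (all true alphas are zero).
rng = np.random.default_rng(0)
T, N = 600, 5
f = 0.005 + 0.04 * rng.standard_normal(T)
true_beta = np.linspace(0.5, 1.5, N)
Re = np.outer(f, true_beta) + 0.02 * rng.standard_normal((T, N))
alpha_hat, beta_hat, chi2_stat = time_series_chi2(Re, f)
```

Under the null, `chi2_stat` would be compared with a $\chi^2_N$ critical value; the choice of simulated parameters here is arbitrary.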

Also as usual in a regression context, we can derive a finite-sample $F$ distribution for the hypothesis that a set of parameters are jointly zero, for fixed values of the right hand variable $f_t$,

$$\frac{T-N-1}{N}\left[1+\left(\frac{E_T(f)}{\hat\sigma(f)}\right)^2\right]^{-1}\hat\alpha'\hat\Sigma^{-1}\hat\alpha \sim F_{N,T-N-1} \tag{13.4}$$

This is the Gibbons, Ross and Shanken (1989) or “GRS” test statistic. The $F$ distribution recognizes sampling variation in $\hat\Sigma$, which is not included in (13.3). This distribution requires that the errors $\varepsilon$ are normal as well as uncorrelated and homoskedastic. With normal errors, the $\hat\alpha$ are normal and $\hat\Sigma$ is an independent Wishart (the multivariate version of a $\chi^2$), so the ratio is $F$. This distribution is exact in a finite sample.

Tests (13.3) and (13.4) have a very intuitive form. The basic part of the test is a quadratic form in the pricing errors, $\hat\alpha'\hat\Sigma^{-1}\hat\alpha$. If there were no $\beta f$ in the model, then the $\hat\alpha$ would simply be the sample mean of the regression errors $\varepsilon_t$. Assuming i.i.d. $\varepsilon_t$, the variance of their sample mean is just $\frac{1}{T}\Sigma$. Thus, if we knew $\Sigma$ then $T\hat\alpha'\Sigma^{-1}\hat\alpha$ would be a sum of squared sample means divided by their variance-covariance matrix, which would have an asymptotic $\chi^2_N$ distribution, or a finite sample $\chi^2_N$ distribution if the $\varepsilon_t$ are normal. But we have to estimate $\Sigma$, which is why the finite-sample distribution is $F$ rather than $\chi^2$. We also estimate the $\beta$, and the second term in (13.3) and (13.4) accounts for that fact.

Recall that a single beta representation exists if and only if the reference return is on the mean-variance frontier. Thus, the test can also be interpreted as a test whether $f$ is ex-ante mean-variance efficient (whether it is on the mean-variance frontier using population moments) after accounting for sampling error. Even if $f$ is on the true or ex-ante mean-variance frontier, other returns will outperform it in sample due to luck, so the return $f$ will usually be inside the ex-post mean-variance frontier, i.e. the frontier drawn using sample moments. Still, it should not be too far inside the sample frontier. Gibbons, Ross and Shanken show that the test statistic can be expressed in terms of how far inside the ex-post frontier the return $f$ is,

$$\frac{T-N-1}{N}\;\frac{\left(\dfrac{\mu_q}{\sigma_q}\right)^2-\left(\dfrac{E_T(f)}{\hat\sigma(f)}\right)^2}{1+\left(\dfrac{E_T(f)}{\hat\sigma(f)}\right)^2}. \tag{13.5}$$

$\left(\mu_q/\sigma_q\right)^2$ is the squared Sharpe ratio of the ex-post tangency portfolio (maximum ex-post Sharpe ratio) formed from the test assets plus the factor $f$.
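The equivalence between the intercept form (13.4) and the Sharpe-ratio form (13.5) can be checked numerically. The sketch below is illustrative only: `grs_two_ways`, `Re`, `f` and the simulated inputs are hypothetical names and data, and the identity is exact in sample only when every sample moment uses the same $1/T$ divisor, which is what the code assumes.

```python
import numpy as np

def grs_two_ways(Re, f):
    """Return the GRS statistic computed from (13.4) and from (13.5)."""
    T, N = Re.shape
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)
    alpha = coef[0]
    resid = Re - X @ coef
    Sigma = resid.T @ resid / T                    # 1/T divisor
    sr_f2 = f.mean() ** 2 / f.var()                # squared Sharpe ratio of the factor
    scale = (T - N - 1) / N
    # (13.4): quadratic form in the estimated intercepts
    stat_alpha = scale * (alpha @ np.linalg.solve(Sigma, alpha)) / (1.0 + sr_f2)
    # (13.5): squared Sharpe ratio of the ex-post tangency portfolio
    Z = np.column_stack([Re, f])                   # test assets plus the factor
    mu = Z.mean(axis=0)
    V = np.cov(Z, rowvar=False, bias=True)         # 1/T divisor, matching Sigma
    sr_q2 = mu @ np.linalg.solve(V, mu)
    stat_sharpe = scale * (sr_q2 - sr_f2) / (1.0 + sr_f2)
    return stat_alpha, stat_sharpe

rng = np.random.default_rng(1)
T, N = 240, 4
f = 0.006 + 0.045 * rng.standard_normal(T)
Re = np.outer(f, np.linspace(0.7, 1.3, N)) + 0.03 * rng.standard_normal((T, N))
stat_alpha, stat_sharpe = grs_two_ways(Re, f)
```

The two numbers agree up to floating-point error, which illustrates that a large GRS statistic is the same thing as a factor that sits far inside the sample frontier.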

If there are many factors that are excess returns, the same ideas work, with some cost of algebraic complexity. The regression equation is

$$R^{ei}_t = \alpha_i + \beta_i'f_t + \varepsilon^i_t.$$

The asset pricing model

$$E(R^{ei}) = \beta_i'E(f)$$

again predicts that the intercepts should be zero. We can estimate $\alpha$ and $\beta$ with OLS time-series regressions. Assuming normal i.i.d. errors, the quadratic form $\hat\alpha'\hat\Sigma^{-1}\hat\alpha$ has the distribution,

$$\frac{T-N-K}{N}\left[1+E_T(f)'\hat\Omega^{-1}E_T(f)\right]^{-1}\hat\alpha'\hat\Sigma^{-1}\hat\alpha \sim F_{N,T-N-K} \tag{13.6}$$

where

$$\begin{aligned}
N &= \text{number of assets},\\
K &= \text{number of factors},\\
\hat\Omega &= \frac{1}{T}\sum_{t=1}^{T}\left[f_t - E_T(f)\right]\left[f_t - E_T(f)\right]'.
\end{aligned}$$

The main difference is that the Sharpe ratio of the single factor is replaced by the natural generalization $E_T(f)'\hat\Omega^{-1}E_T(f)$.
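A sketch of the multifactor statistic (13.6) follows. The names `grs_multifactor`, `Re`, `F` and the simulated data are assumptions for illustration; with $K = 1$ the function should reproduce (13.4).

```python
import numpy as np

def grs_multifactor(Re, F):
    """GRS statistic (13.6). Re: (T, N) excess returns; F: (T, K) factor excess returns."""
    T, N = Re.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)
    alpha = coef[0]                                # estimated intercepts
    resid = Re - X @ coef
    Sigma = resid.T @ resid / T                    # residual covariance
    mu_f = F.mean(axis=0)
    Omega = np.atleast_2d(np.cov(F, rowvar=False, bias=True))  # factor covariance, 1/T
    quad_f = mu_f @ np.linalg.solve(Omega, mu_f)   # E_T(f)' Omega^{-1} E_T(f)
    quad_a = alpha @ np.linalg.solve(Sigma, alpha) # alpha' Sigma^{-1} alpha
    return (T - N - K) / N * quad_a / (1.0 + quad_f)

rng = np.random.default_rng(2)
T, N, K = 360, 6, 3
F = 0.004 + 0.03 * rng.standard_normal((T, K))
B = rng.uniform(0.2, 1.2, size=(K, N))             # made-up factor loadings
Re = F @ B + 0.02 * rng.standard_normal((T, N))
grs_three = grs_multifactor(Re, F)
grs_one = grs_multifactor(Re, F[:, :1])            # single-factor special case
```

To turn the statistic into a p-value one would compare it with the $F_{N,T-N-K}$ distribution; that step is left out here to keep the sketch dependency-free.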

13.1.1 Derivation of (13.3) and distributions with general errors.

I derive (13.3) as an instance of GMM. This approach allows us to generate straightforwardly the required corrections for autocorrelated and heteroskedastic disturbances. (MacKinlay and Richardson (1991) advocate GMM approaches to regression tests in this way.) It also serves to remind us that GMM and $p = E(mx)$ are not necessarily paired; one can do a GMM estimate of an expected return - beta model too. The mechanics are only slightly different than what we did to generate distributions for OLS regression coefficients in section 12.4, since we keep track of $N$ OLS regressions simultaneously.

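Write the equations for all $N$ assets together in vector form,

$$R^e_t = \alpha + \beta f_t + \varepsilon_t.$$

We use the usual OLS moments to estimate the coefficients,

$$g_T(b) = \begin{bmatrix} E_T\left(R^e_t - \alpha - \beta f_t\right) \\ E_T\left[\left(R^e_t - \alpha - \beta f_t\right) f_t\right] \end{bmatrix} = E_T\left(\begin{bmatrix}\varepsilon_t \\ f_t\varepsilon_t\end{bmatrix}\right) = 0.$$

These moments exactly identify the parameters $\alpha, \beta$, so the $a$ matrix in $a\,g_T(\hat b) = 0$ is the identity matrix. Solving, the GMM estimates are of course the OLS estimates,

$$\hat\alpha = E_T(R^e_t) - \hat\beta E_T(f_t)$$

$$\hat\beta = \frac{E_T\left[\left(R^e_t - E_T(R^e_t)\right) f_t\right]}{E_T\left[\left(f_t - E_T(f_t)\right) f_t\right]} = \frac{\mathrm{cov}_T(R^e_t, f_t)}{\mathrm{var}_T(f_t)}.$$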

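The $d$ matrix in the general GMM formula is

$$d \equiv \frac{\partial g_T(b)}{\partial b'} = -\begin{bmatrix} I_N & I_N E(f_t) \\ I_N E(f_t) & I_N E(f_t^2)\end{bmatrix} = -\begin{bmatrix} 1 & E(f_t) \\ E(f_t) & E(f_t^2)\end{bmatrix}\otimes I_N$$

where $I_N$ is an $N \times N$ identity matrix. The $S$ matrix is

$$S = \sum_{j=-\infty}^{\infty}\begin{bmatrix} E(\varepsilon_t\varepsilon_{t-j}') & E(\varepsilon_t\varepsilon_{t-j}'f_{t-j}) \\ E(f_t\varepsilon_t\varepsilon_{t-j}') & E(f_t\varepsilon_t\varepsilon_{t-j}'f_{t-j})\end{bmatrix}.$$

Using the GMM variance formula (12.4) with $a = I$ we have

$$\mathrm{var}\left(\begin{bmatrix}\hat\alpha\\ \hat\beta\end{bmatrix}\right) = \frac{1}{T}\,d^{-1}S\,d^{-1\prime}. \tag{13.7}$$

At this point, we’re done. The upper left hand corner of $\mathrm{var}(\hat\alpha\ \hat\beta)$ gives us $\mathrm{var}(\hat\alpha)$ and the test we’re looking for is $\hat\alpha'\,\mathrm{var}(\hat\alpha)^{-1}\hat\alpha \sim \chi^2_N$.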

The standard formulas make this expression prettier by assuming that the errors are uncorrelated over time and not heteroskedastic to simplify the $S$ matrix, as we derived the standard OLS formulas in section 12.4. If we assume that $f$ and $\varepsilon$ are independent as well as orthogonal, $E(f\varepsilon\varepsilon') = E(f)E(\varepsilon\varepsilon')$ and $E(f^2\varepsilon\varepsilon') = E(f^2)E(\varepsilon\varepsilon')$. If we assume that the errors are independent over time as well, we lose all the lead and lag terms. Then, the $S$ matrix simplifies to

$$S = \begin{bmatrix} E(\varepsilon_t\varepsilon_t') & E(\varepsilon_t\varepsilon_t')E(f_t)\\ E(f_t)E(\varepsilon_t\varepsilon_t') & E(\varepsilon_t\varepsilon_t')E(f_t^2)\end{bmatrix} = \begin{bmatrix}1 & E(f_t)\\ E(f_t) & E(f_t^2)\end{bmatrix}\otimes\Sigma \tag{13.8}$$

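Now we can plug into (13.7). Using $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$ and $(A\otimes B)(C\otimes D) = AC\otimes BD$, we obtain

$$\mathrm{var}\left(\begin{bmatrix}\hat\alpha\\ \hat\beta\end{bmatrix}\right) = \frac{1}{T}\left(\begin{bmatrix}1 & E(f_t)\\ E(f_t) & E(f_t^2)\end{bmatrix}^{-1}\otimes\Sigma\right).$$

Evaluating the inverse,

$$\mathrm{var}\left(\begin{bmatrix}\hat\alpha\\ \hat\beta\end{bmatrix}\right) = \frac{1}{T}\,\frac{1}{\mathrm{var}(f)}\begin{bmatrix}E(f_t^2) & -E(f_t)\\ -E(f_t) & 1\end{bmatrix}\otimes\Sigma.$$

We’re interested in the top left corner. Using $E(f^2) = E(f)^2 + \mathrm{var}(f)$,

$$\mathrm{var}(\hat\alpha) = \frac{1}{T}\left(1 + \frac{E(f)^2}{\mathrm{var}(f)}\right)\Sigma.$$

This is the traditional formula (13.3), but there is now no real reason to assume that the errors are i.i.d. or independent of the factors. By simply calculating (13.7), we can easily construct standard errors and test statistics that do not require these assumptions.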
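As an illustration of "simply calculating (13.7)", the sketch below estimates $S$ from the sample moments without imposing independence of $f$ and $\varepsilon$, and compares the result with the traditional formula. Several things here are assumptions: the function name `gmm_var_alpha`, the inputs `Re` and `f`, and the choice to keep only the $j = 0$ term of $S$ (so the estimate is robust to heteroskedasticity but not to autocorrelation).

```python
import numpy as np

def gmm_var_alpha(Re, f):
    """var(alpha_hat) from (13.7) with S estimated from the j = 0 term only."""
    T, N = Re.shape
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)
    eps = Re - X @ coef                            # (T, N) residuals
    u = np.hstack([eps, eps * f[:, None]])         # moments [eps_t, f_t eps_t]
    S = u.T @ u / T                                # no lead or lag terms
    M = np.array([[1.0, f.mean()], [f.mean(), np.mean(f ** 2)]])
    d = -np.kron(M, np.eye(N))                     # d matrix as in the text
    d_inv = np.linalg.inv(d)
    V = d_inv @ S @ d_inv.T / T                    # covariance of [alpha; beta]
    Sigma = eps.T @ eps / T
    classic = (1.0 + f.mean() ** 2 / f.var()) * Sigma / T  # traditional formula
    return V[:N, :N], classic

# i.i.d. simulated data, so the two estimates should be close in a long sample.
rng = np.random.default_rng(3)
T, N = 20000, 3
f = 0.005 + 0.04 * rng.standard_normal(T)
Re = np.outer(f, np.array([0.8, 1.0, 1.2])) + 0.02 * rng.standard_normal((T, N))
var_alpha_gmm, var_alpha_classic = gmm_var_alpha(Re, f)
```

With heteroskedastic data the two matrices would diverge, and the GMM version is the one that does not depend on the i.i.d. assumption. Adding lagged terms to `S` would be the natural next step for autocorrelated errors.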

13.2 Cross-sectional regressions

We can fit

$$E(R^{ei}) = \beta_i'\lambda + \alpha_i$$

by running a cross-sectional regression of average returns on the betas. This technique can be used whether the factor is a return or not.

I discuss OLS and GLS cross-sectional regressions, I find formulas for the standard errors of $\lambda$, and a $\chi^2$ test whether the $\alpha$ are jointly zero. I derive the distributions as an instance of GMM, and I show how to implement the same approach for autocorrelated and heteroskedastic errors. I show that the GLS cross-sectional regression is the same as the time-series regression when the factor is also an excess return, and is included in the set of test assets.

[Figure 24. Cross-sectional regression. Labels recoverable from the plot: $E(R^{ei})$, “Assets $i$”, “Slope $\lambda$”.]

Start again with the $K$ factor model, written as

$$E(R^{ei}) = \beta_i'\lambda, \quad i = 1, 2, \ldots, N.$$

The central economic question is why average returns vary across assets; expected returns of an asset should be high if that asset has high betas or risk exposure to factors that carry high risk premia.

Figure 13.10 graphs the case of a single factor such as the CAPM. Each dot represents one asset $i$. The model says that average returns should be proportional to betas, so plot the sample average returns against the betas. Even if the model is true, this plot will not work out perfectly in each sample, so there will be some spread as shown.

Given these facts, a natural idea is to run a cross-sectional regression to fit a line through the scatterplot of Figure 13.10. First find estimates of the betas from a time series regression,

$$R^{ei}_t = a_i + \beta_i'f_t + \varepsilon^i_t, \quad t = 1, 2, \ldots, T \text{ for each } i. \tag{13.9}$$

Then estimate the factor risk premia $\lambda$ from a regression across assets of average returns on the betas,

$$E_T(R^{ei}) = \beta_i'\lambda + \alpha_i, \quad i = 1, 2, \ldots, N. \tag{13.10}$$


As in the figure, $\beta$ are the right hand variables, $\lambda$ are the regression coefficients, and the cross-sectional regression residuals $\alpha_i$ are the pricing errors. This is also known as a two-pass regression estimate, because one estimates first time-series and then cross-sectional regressions.

You can run the cross-sectional regression with or without a constant. The theory says that the constant or zero-beta excess return should be zero. You can impose this restriction or estimate a constant and see if it turns out to be small. The usual tradeoff between efficiency (impose the null as much as possible to get efficient estimates) and robustness applies.

13.2.1 OLS cross-sectional regression

It will simplify notation to consider a single factor; the case of multiple factors looks the same with vectors in place of scalars. I denote vectors from 1 to $N$ with missing sub or superscripts, i.e. $\varepsilon_t = \begin{bmatrix}\varepsilon^1_t & \varepsilon^2_t & \cdots & \varepsilon^N_t\end{bmatrix}'$, $\beta = \begin{bmatrix}\beta_1 & \beta_2 & \cdots & \beta_N\end{bmatrix}'$, and similarly for $R^e_t$ and $\alpha$. For simplicity take the case of no intercept in the cross-sectional regression.

With this notation OLS cross-sectional estimates are

$$\hat\lambda = (\beta'\beta)^{-1}\beta'E_T(R^e) \tag{13.11}$$

$$\hat\alpha = E_T(R^e) - \hat\lambda\beta.$$

Next, we need a distribution theory for the estimated parameters. The most natural place to start is with the standard OLS distribution formulas. I start with the traditional assumption that the true errors are i.i.d. over time, and independent of the factors. This will give us some easily interpretable formulas, and we will see most of these terms remain when we do the distribution theory right later on.

In an OLS regression $Y = X\beta + u$ with $E(uu') = \Omega$, the covariance matrix of the $\beta$ estimate is $(X'X)^{-1}X'\Omega X(X'X)^{-1}$. The residual covariance matrix is $\left(I - X(X'X)^{-1}X'\right)\Omega\left(I - X(X'X)^{-1}X'\right)'$.

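Denote $\Sigma = E(\varepsilon_t\varepsilon_t')$. Since the $\alpha_i$ are just time series averages of the true $\varepsilon^i_t$ shocks (the average of the sample residuals is always zero), the errors in the cross-sectional regression have covariance matrix $E(\alpha\alpha') = \frac{1}{T}\Sigma$. Thus the conventional OLS formulas for the covariance matrix of OLS estimates and residual with correlated errors give

$$\sigma^2(\hat\lambda) = \frac{1}{T}(\beta'\beta)^{-1}\beta'\Sigma\beta(\beta'\beta)^{-1} \tag{13.12}$$

$$\mathrm{cov}(\hat\alpha) = \frac{1}{T}\left(I - \beta(\beta'\beta)^{-1}\beta'\right)\Sigma\left(I - \beta(\beta'\beta)^{-1}\beta'\right) \tag{13.13}$$

We could test whether all pricing errors are zero with the statistic

$$\hat\alpha'\,\mathrm{cov}(\hat\alpha)^{-1}\hat\alpha \sim \chi^2_{N-1}. \tag{13.14}$$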


The distribution is $\chi^2_{N-1}$ not $\chi^2_N$ because the covariance matrix is singular. The singularity and the extra terms in (13.13) result from the fact that the $\lambda$ coefficient was estimated along the way, and means that we have to use a generalized inverse. (If there are $K$ factors, we obviously end up with $\chi^2_{N-K}$.)

A test of the residuals is unusual in OLS regressions. We do not usually test whether the residuals are “too large,” since we have no information other than the residuals themselves about how large they should be. In this case, however, the first stage time-series regression gives us some independent information about the size of $\mathrm{cov}(\alpha\alpha')$, information that we could not get from looking at the cross-sectional residual $\alpha$ itself.
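The two-pass OLS procedure and formulas (13.11) to (13.14) can be sketched as follows for a single factor and no cross-sectional intercept. The names `ols_cross_section`, `Re`, `f` are hypothetical, the data are simulated, and the Moore-Penrose pseudo-inverse is used here as one possible choice of generalized inverse. These are the uncorrected formulas, which treat the betas as fixed.

```python
import numpy as np

def ols_cross_section(Re, f):
    """Two-pass OLS estimate with the traditional (uncorrected) distribution theory."""
    T, N = Re.shape
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)  # first pass: time-series regressions
    beta = coef[1]
    eps = Re - X @ coef
    Sigma = eps.T @ eps / T
    mean_Re = Re.mean(axis=0)
    bb = beta @ beta
    lam = beta @ mean_Re / bb                      # (13.11)
    alpha = mean_Re - lam * beta                   # pricing errors
    var_lam = beta @ Sigma @ beta / bb ** 2 / T    # (13.12)
    P = np.eye(N) - np.outer(beta, beta) / bb      # I - beta (beta'beta)^{-1} beta'
    cov_alpha = P @ Sigma @ P / T                  # (13.13), singular by construction
    stat = alpha @ np.linalg.pinv(cov_alpha, rcond=1e-10) @ alpha  # (13.14)
    return beta, lam, alpha, var_lam, cov_alpha, stat

rng = np.random.default_rng(4)
T, N = 600, 5
f = 0.005 + 0.04 * rng.standard_normal(T)
Re = np.outer(f, np.linspace(0.5, 1.5, N)) + 0.02 * rng.standard_normal((T, N))
beta_hat, lam_hat, alpha_hat, var_lam, cov_alpha, stat = ols_cross_section(Re, f)
```

The pricing errors are orthogonal to the betas by construction, and `cov_alpha @ beta_hat` is numerically zero, which is the singularity the text describes.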

13.2.2 GLS cross-sectional regression

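Since the residuals in the cross-sectional regression (13.10) are correlated with each other, standard textbook advice is to run a GLS cross-sectional regression rather than OLS, using $E(\alpha\alpha') = \frac{1}{T}\Sigma$ as the error covariance matrix:

$$\hat\lambda = \left(\beta'\Sigma^{-1}\beta\right)^{-1}\beta'\Sigma^{-1}E_T(R^e) \tag{13.15}$$

$$\hat\alpha = E_T(R^e) - \hat\lambda\beta.$$

The standard regression formulas give the variance of these estimates as

$$\sigma^2(\hat\lambda) = \frac{1}{T}\left(\beta'\Sigma^{-1}\beta\right)^{-1} \tag{13.16}$$

$$\mathrm{cov}(\hat\alpha) = \frac{1}{T}\left(\Sigma - \beta\left(\beta'\Sigma^{-1}\beta\right)^{-1}\beta'\right) \tag{13.17}$$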

The comments of section 12.5 warning that OLS is sometimes much more robust than GLS apply in this case. The GLS regression should improve efficiency, i.e. give more precise estimates. However, $\Sigma$ may be hard to estimate and to invert, especially if the cross-section $N$ is large. One may well choose the robustness of OLS over the asymptotic statistical advantages of GLS.

A GLS regression can be understood as a transformation of the space of returns, to focus attention on the statistically most informative portfolios. Finding (say, by Choleski decomposition) a matrix $C$ such that $CC' = \Sigma^{-1}$, the GLS regression is the same as an OLS regression of $C'E_T(R^e)$ on $C'\beta$, i.e. of testing the model on the portfolios $C'R^e$. The statistically most informative portfolios are those with the lowest residual variance. But this asymptotic statistical theory assumes that the covariance matrix has converged to its true value. In most samples, the ex-post or sample mean-variance frontier still seems to indicate lots of luck, and this is especially true if the cross section is large, anything more than 1/10 of the time series. The portfolios $C'R^e$ are likely to contain many extreme long-short positions.

Again, we could test the hypothesis that all the $\alpha$ are equal to zero with (13.14). Though the appearance of the statistic is the same, the covariance matrix is smaller, reflecting the greater power of the GLS test. As with the $J_T$ test (12.10), we can develop an equivalent test that does not require a generalized inverse:

$$T\hat\alpha'\Sigma^{-1}\hat\alpha \sim \chi^2_{N-1}. \tag{13.18}$$

To derive (13.18), I proceed exactly as in the derivation of the $J_T$ test (12.10). Define, say by Choleski decomposition, a matrix $C$ such that $CC' = \Sigma^{-1}$. Now, find the covariance matrix of $\sqrt{T}C'\hat\alpha$:

$$\mathrm{cov}\left(\sqrt{T}C'\hat\alpha\right) = C'\left((CC')^{-1} - \beta\left(\beta'CC'\beta\right)^{-1}\beta'\right)C = I - \delta\left(\delta'\delta\right)^{-1}\delta'$$

where

$$\delta = C'\beta.$$

In sum, $\hat\alpha$ is asymptotically normal so $\sqrt{T}C'\hat\alpha$ is asymptotically normal, $\mathrm{cov}(\sqrt{T}C'\hat\alpha)$ is an idempotent matrix with rank $N-1$; therefore $T\hat\alpha'CC'\hat\alpha = T\hat\alpha'\Sigma^{-1}\hat\alpha$ is $\chi^2_{N-1}$.
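The Choleski argument can be verified numerically. This is a sketch under assumptions: `gls_cross_section` and all inputs are made-up names and numbers, a single factor is used, and $\Sigma$ is taken as known rather than estimated.

```python
import numpy as np

def gls_cross_section(mean_Re, beta, Sigma, T):
    """GLS cross-sectional estimate (13.15) and test statistic (13.18)."""
    Sinv_beta = np.linalg.solve(Sigma, beta)
    lam = Sinv_beta @ mean_Re / (Sinv_beta @ beta)        # (13.15)
    alpha = mean_Re - lam * beta
    stat = T * (alpha @ np.linalg.solve(Sigma, alpha))    # (13.18)
    return lam, alpha, stat

# Made-up inputs: betas, a positive definite Sigma, and average returns.
rng = np.random.default_rng(5)
N, T = 6, 240
beta = np.linspace(0.6, 1.4, N)
A = rng.standard_normal((N, N))
Sigma = A @ A.T / N + 0.5 * np.eye(N)
mean_Re = 0.006 * beta + 0.001 * rng.standard_normal(N)
lam_gls, alpha_gls, stat = gls_cross_section(mean_Re, beta, Sigma, T)

# C C' = Sigma^{-1}; numpy returns the lower-triangular factor.
C = np.linalg.cholesky(np.linalg.inv(Sigma))
y, delta = C.T @ mean_Re, C.T @ beta
lam_ols_transformed = delta @ y / (delta @ delta)         # OLS on transformed data

# Covariance of sqrt(T) C' alpha_hat, computed two ways.
cov_left = C.T @ (Sigma - np.outer(beta, beta) / (beta @ np.linalg.solve(Sigma, beta))) @ C
cov_right = np.eye(N) - np.outer(delta, delta) / (delta @ delta)
```

The transformed OLS slope matches the GLS slope, and `cov_right` is idempotent with trace $N-1$, which is where the $\chi^2_{N-1}$ degrees of freedom come from.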

13.2.3 Correction for the fact that $\beta$ are estimated, and GMM formulas that don’t need i.i.d. errors.

In applying standard OLS formulas to a cross-sectional regression, we assume that the right hand variables $\beta$ are fixed. The $\beta$ in the cross-sectional regression are not fixed, of course, but are estimated in the time series regression. This turns out to matter, even as $T \to \infty$.

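In this section, I derive the correct asymptotic standard errors. With the simplifying assumption that the errors $\varepsilon$ are i.i.d. over time and independent of the factors, the result is

$$\sigma^2(\hat\lambda_{OLS}) = \frac{1}{T}\left[(\beta'\beta)^{-1}\beta'\Sigma\beta(\beta'\beta)^{-1}\left(1+\lambda'\Sigma_f^{-1}\lambda\right) + \Sigma_f\right] \tag{13.19}$$

$$\sigma^2(\hat\lambda_{GLS}) = \frac{1}{T}\left[\left(\beta'\Sigma^{-1}\beta\right)^{-1}\left(1+\lambda'\Sigma_f^{-1}\lambda\right) + \Sigma_f\right]$$

where $\Sigma_f$ is the variance-covariance matrix of the factors. This correction is due to Shanken (1992). Comparing these standard errors to (13.12) and (13.16), we see that there is a multiplicative correction $\left(1+\lambda'\Sigma_f^{-1}\lambda\right)$ and an additive correction $\Sigma_f$.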

The asymptotic variance-covariance matrix of the pricing errors is

$$\mathrm{cov}(\hat\alpha_{OLS}) = \frac{1}{T}\left(I_N - \beta(\beta'\beta)^{-1}\beta'\right)\Sigma\left(I_N - \beta(\beta'\beta)^{-1}\beta'\right)\left(1+\lambda'\Sigma_f^{-1}\lambda\right) \tag{13.20}$$

$$\mathrm{cov}(\hat\alpha_{GLS}) = \frac{1}{T}\left(\Sigma - \beta\left(\beta'\Sigma^{-1}\beta\right)^{-1}\beta'\right)\left(1+\lambda'\Sigma_f^{-1}\lambda\right) \tag{13.21}$$

Comparing these results to (13.13) and (13.17) we see the same multiplicative correction applies.

We can form the asymptotic $\chi^2$ test of the pricing errors by dividing pricing errors by their variance-covariance matrix, $\hat\alpha'\,\mathrm{cov}(\hat\alpha)^{-1}\hat\alpha$. Following (13.18), we can simplify this result for the GLS pricing errors. Since the multiplicative correction scales $\mathrm{cov}(\hat\alpha_{GLS})$ up, it enters the statistic inverted:

$$T\left(1+\lambda'\Sigma_f^{-1}\lambda\right)^{-1}\hat\alpha_{GLS}'\Sigma^{-1}\hat\alpha_{GLS} \sim \chi^2_{N-K}. \tag{13.22}$$

Are the corrections important relative to the simple OLS formulas given above? In the CAPM $\lambda = E(R^{em})$ so $\lambda^2/\sigma^2(R^{em}) \approx (0.08/0.16)^2 = 0.25$ in annual data. In annual data, then, the multiplicative term is too large to ignore. However, the mean and variance both scale with horizon, so the Sharpe ratio scales with the square root of horizon. Therefore, for a monthly interval $\lambda^2/\sigma^2(R^{em}) \approx 0.25/12 \approx 0.02$ which is quite small and ignoring the multiplicative term makes little difference.

The additive term in the standard error of $\hat\lambda$ can be very important. Consider a one factor model, suppose all the $\beta$ are 1.0, all the residuals are uncorrelated so $\Sigma$ is diagonal, suppose all assets have the same residual variance $\sigma^2(\varepsilon)$, and ignore the multiplicative term. Now we can write either covariance matrix in (13.19) as

$$\sigma^2(\hat\lambda) = \frac{1}{T}\left[\frac{1}{N}\sigma^2(\varepsilon) + \sigma^2(f)\right].$$

Even with $N = 1$, most factor models have fairly high $R^2$, so $\sigma^2(\varepsilon) < \sigma^2(f)$. Typical CAPM values of $R^2 = 1 - \sigma^2(\varepsilon)/\sigma^2(f)$ for large portfolios are 0.6-0.7; and multifactor models such as the Fama French 3 factor model have $R^2$ often over 0.9. Typical numbers of assets $N = 10$ to $50$ make the first term vanish compared to the second term.
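The size of the two Shanken corrections can be checked with a few lines of arithmetic. The sketch below implements the single-factor versions of (13.19); the function name `shanken_var_lambda` and every numerical input other than the 0.08 and 0.16 quoted in the text are assumptions chosen for illustration.

```python
import numpy as np

def shanken_var_lambda(beta, Sigma, sigma2_f, lam, T, gls=False):
    """Single-factor sigma^2(lambda_hat) from (13.19), OLS or GLS version."""
    if gls:
        base = 1.0 / (beta @ np.linalg.solve(Sigma, beta))   # (beta' Sigma^-1 beta)^-1
    else:
        bb = beta @ beta
        base = beta @ Sigma @ beta / bb ** 2                 # OLS sandwich term
    mult = 1.0 + lam ** 2 / sigma2_f                         # multiplicative correction
    return (base * mult + sigma2_f) / T                      # plus additive correction

# Multiplicative term with the CAPM numbers quoted in the text.
annual_term = (0.08 / 0.16) ** 2
monthly_term = annual_term / 12

# The text's special case: all betas 1, Sigma = sigma^2(eps) I, multiplicative term ignored.
# Setting lam = 0 switches the multiplicative term off.
N, T = 25, 600
sigma2_eps, sigma2_f = 0.02 ** 2, 0.04 ** 2                  # made-up monthly variances
beta = np.ones(N)
Sigma = sigma2_eps * np.eye(N)
v_ols = shanken_var_lambda(beta, Sigma, sigma2_f, lam=0.0, T=T)
v_gls = shanken_var_lambda(beta, Sigma, sigma2_f, lam=0.0, T=T, gls=True)
simple = (sigma2_eps / N + sigma2_f) / T
additive_share = (sigma2_f / T) / simple
```

In this made-up example `additive_share` is close to one, consistent with the claim that the $\sigma^2(f)$ term dominates once $N$ is moderately large.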

More generally, suppose the factor were in fact a return. Then the factor risk premium is $\lambda = E(f)$, and we’d use $\Sigma_f/T$ as the squared standard error of $\lambda$. This is the “correction” term in (13.19), so we expect it to be, in fact, the most important term.

Note that $\Sigma_f/T$ is the squared standard error of the mean of $f$. Thus, in the case that the factor is a return, so $E(f) = \lambda$, this is the only term you would use.

This example suggests that $\Sigma_f$ is not just an important correction, it is likely to be the dominant consideration in the sampling error of the $\hat\lambda$.

Comparing (13.22) to the GRS tests for a time-series regression, (13.3), (13.4), (13.6) we see the same statistic. The only difference is that by estimating $\lambda$ from the cross-section rather than imposing $\lambda = E(f)$, the cross-sectional regression loses degrees of freedom equal to the number of factors.

Though these formulas are standard classics, I emphasize that we don’t have to make the severe assumptions on the error terms that are used to derive them. As with the time-series case, I derive a general formula for the distribution of $\hat\lambda$ and $\hat\alpha$, and only at the last moment make classic error term assumptions to make the spectral density matrix pretty.

Derivation and formulas that don’t require i.i.d. errors.


The easy and elegant way to account for the effects of “generated regressors” such as the $\beta$ in the cross-sectional regression is to map the whole thing into GMM. Then, we treat the moments that generate the regressors $\beta$ at the same time as the moments that generate the cross-sectional regression coefficient $\lambda$, and the covariance matrix $S$ between the two sets of moments captures the effects of generating the regressors on the standard error of the cross-sectional regression coefficients. Comparing this straightforward derivation with the difficulty of Shanken’s (1992) paper that originally derived the corrections for $\hat\lambda$, and noting that Shanken did not go on to find the formulas (13.20) that allow a test of the pricing errors, is a nice argument for the simplicity and power of the GMM framework.

To keep the algebra manageable, I treat the case of a single factor. The moments are

$$g_T(b) = \begin{bmatrix} E\left(R^e_t - a - \beta f_t\right)\\ E\left[\left(R^e_t - a - \beta f_t\right)f_t\right]\\ E\left(R^e - \beta\lambda\right)\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix} \tag{13.23}$$

The top two moment conditions exactly identify $a$ and $\beta$ as the time-series OLS estimates. (Note $a$ not $\alpha$. The time-series intercept is not necessarily equal to the pricing error in a cross-sectional regression.) The bottom moment condition is the asset pricing model. It is in general overidentified in a sample, since there is only one extra parameter ($\lambda$) and $N$ extra moment conditions. If we use a weighting vector $\beta'$ on this condition, we obtain the OLS cross-sectional estimate of $\lambda$. If we use a weighting vector $\beta'\Sigma^{-1}$, we obtain the GLS cross-sectional estimate of $\lambda$. To accommodate both cases, use a weighting vector $\gamma'$, and then substitute $\gamma' = \beta'$, $\gamma' = \beta'\Sigma^{-1}$, etc. at the end.

The standard errors for $\hat\lambda$ come straight from the general GMM standard error formula (12.4). The $\hat\alpha$ are not parameters, but are the last $N$ moments. Their covariance matrix is thus given by the GMM formula (12.5) for the sample variation of the $g_T$.

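All we have to do is map the problem into the GMM notation. The parameter vector is

$$b' = \begin{bmatrix} a' & \beta' & \lambda\end{bmatrix}.$$

The $a$ matrix chooses which moment conditions are set to zero in estimation,

$$a = \begin{bmatrix} I_{2N} & 0\\ 0 & \gamma'\end{bmatrix}.$$

The $d$ matrix is the sensitivity of the moment conditions to the parameters,

$$d = \frac{\partial g_T}{\partial b'} = \begin{bmatrix} -I_N & -I_N E(f) & 0\\ -I_N E(f) & -I_N E(f^2) & 0\\ 0 & -\lambda I_N & -\beta\end{bmatrix}$$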


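The $S$ matrix is the long-run covariance matrix of the moments.

$$\begin{aligned}
S &= \sum_{j=-\infty}^{\infty} E\left(\begin{bmatrix} R^e_t - a - \beta f_t\\ \left(R^e_t - a - \beta f_t\right)f_t\\ R^e_t - \beta\lambda\end{bmatrix}\begin{bmatrix} R^e_{t-j} - a - \beta f_{t-j}\\ \left(R^e_{t-j} - a - \beta f_{t-j}\right)f_{t-j}\\ R^e_{t-j} - \beta\lambda\end{bmatrix}'\right)\\
&= \sum_{j=-\infty}^{\infty} E\left(\begin{bmatrix}\varepsilon_t\\ \varepsilon_t f_t\\ \beta\left(f_t - Ef\right) + \varepsilon_t\end{bmatrix}\begin{bmatrix}\varepsilon_{t-j}\\ \varepsilon_{t-j} f_{t-j}\\ \beta\left(f_{t-j} - Ef\right) + \varepsilon_{t-j}\end{bmatrix}'\right)
\end{aligned}$$

In the second expression, I have used the regression model and the restriction under the null that $E(R^e_t) = \beta\lambda$. In calculations, of course, you could simply estimate the first expression.

We are done. We have the ingredients to calculate the GMM standard error formula (12.4) and formula for the covariance of moments (12.5).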

We can recover the classic formulas (13.19), (13.20), (13.21) by adding the assumption that the errors are i.i.d. and independent of the factors, and that the factors are uncorrelated over time as well. The assumption that the errors and factors are uncorrelated over time means we can ignore the lead and lag terms. Thus, the top left corner is $E(\varepsilon_t\varepsilon_t') = \Sigma$. The assumption that the errors are independent from the factors $f_t$ simplifies the terms in which $\varepsilon_t$ and $f_t$ are multiplied: $E(\varepsilon_t(\varepsilon_t'f_t)) = E(f)\Sigma$ for example. The result is

$$S = \begin{bmatrix}\Sigma & E(f)\Sigma & \Sigma\\ E(f)\Sigma & E(f^2)\Sigma & E(f)\Sigma\\ \Sigma & E(f)\Sigma & \beta\beta'\sigma^2(f) + \Sigma\end{bmatrix}$$

Multiplying $a, d, S$ together as specified by the GMM formula for the covariance matrix of parameters (12.4) we obtain the covariance matrix of all the parameters, and its (3,3) element gives the variance of $\hat\lambda$. Multiplying the terms together as specified by (12.5), we obtain the sampling distribution of the $\hat\alpha$, (13.20). The formulas (13.19) reported above are derived the same way with a vector of factors $f_t$ rather than a scalar; the second moment condition in (13.23) then reads $E\left[\left(R^e_t - a - \beta f_t\right)\otimes f_t\right]$. The matrix multiplication is not particularly enlightening.
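The matrix multiplication can be delegated to the computer. The sketch below builds $a$, $d$ and the i.i.d. version of $S$ for a single factor and checks that the result reproduces (13.19) and (13.20). It rests on an assumption about formulas that are not shown in this section: I take (12.4) to be the standard GMM result $\frac{1}{T}(ad)^{-1}aSa'(ad)^{-1\prime}$ and (12.5) to be $\frac{1}{T}\left(I - d(ad)^{-1}a\right)S\left(I - d(ad)^{-1}a\right)'$. The function name and all numerical inputs are hypothetical.

```python
import numpy as np

def gmm_cross_section_var(beta, Sigma, Ef, var_f, lam, T, gamma):
    """var(lambda_hat) and cov(alpha_hat) from a, d, S with classic i.i.d. S."""
    N = beta.size
    Ef2 = var_f + Ef ** 2
    I, Z, col0 = np.eye(N), np.zeros((N, N)), np.zeros((N, 1))
    a = np.block([[np.eye(2 * N), np.zeros((2 * N, N))],
                  [np.zeros((1, 2 * N)), gamma[None, :]]])      # (2N+1) x 3N
    d = np.block([[-I, -Ef * I, col0],
                  [-Ef * I, -Ef2 * I, col0],
                  [Z, -lam * I, -beta[:, None]]])               # 3N x (2N+1)
    S = np.block([[Sigma, Ef * Sigma, Sigma],
                  [Ef * Sigma, Ef2 * Sigma, Ef * Sigma],
                  [Sigma, Ef * Sigma, var_f * np.outer(beta, beta) + Sigma]])
    ad_inv = np.linalg.inv(a @ d)
    var_b = ad_inv @ a @ S @ a.T @ ad_inv.T / T                 # assumed form of (12.4)
    R = np.eye(3 * N) - d @ ad_inv @ a
    cov_g = R @ S @ R.T / T                                     # assumed form of (12.5)
    return var_b[-1, -1], cov_g[2 * N:, 2 * N:]

rng = np.random.default_rng(6)
N, T = 4, 600
beta = np.linspace(0.5, 1.5, N)
A = rng.standard_normal((N, N))
Sigma = 1e-4 * (A @ A.T + N * np.eye(N))
Ef, var_f, lam = 0.006, 0.04 ** 2, 0.006
var_lam, cov_alpha = gmm_cross_section_var(beta, Sigma, Ef, var_f, lam, T, gamma=beta)

# Closed forms for the OLS weighting (gamma = beta).
bb = beta @ beta
mult = 1.0 + lam ** 2 / var_f
shanken_ols = (beta @ Sigma @ beta / bb ** 2 * mult + var_f) / T   # (13.19)
P = np.eye(N) - np.outer(beta, beta) / bb
cov_alpha_1320 = P @ Sigma @ P * mult / T                          # (13.20)
```

Passing `gamma = np.linalg.solve(Sigma, beta)` instead would give the GLS versions. Replacing the analytic `S` with a sample estimate is what removes the i.i.d. assumption.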

Once again, there is really no need to make the assumption that the errors are i.i.d. and especially that they are conditionally homoskedastic (that the factor $f$ and errors $\varepsilon$ are independent). It is quite easy to estimate an $S$ matrix that does not impose these conditions and calculate standard errors. They will not have the pretty analytic form given above, but they will more closely report the true sampling uncertainty of the estimate. Furthermore, if one is really interested in efficiency, the GLS cross-sectional estimate should use the spectral density matrix as weighting matrix rather than $\Sigma^{-1}$.


13.2.4 Time series vs. cross-section

How are the time-series and cross-sectional approaches different?

Most importantly, you can run the cross-sectional regression when the factor is not a return. The time-series test requires factors that are also returns, so that you can estimate factor risk premia by $\hat\lambda = E_T(f)$. The asset pricing model does predict a restriction on the intercepts in the time-series regression. Why not just test these? If you impose the restriction $E(R^{ei}) = \beta_i'\lambda$, you can write the time-series regression (13.9) as

$$R^{ei}_t = \beta_i'\lambda + \beta_i'\left(f_t - E(f)\right) + \varepsilon^i_t, \quad t = 1, 2, \ldots, T \text{ for each } i.$$

Comparing this with (13.9), you see that the intercept restriction is

$$a_i = \beta_i'\left(\lambda - E(f)\right).$$

This restriction makes sense. The model says that mean returns should be proportional to betas, and the intercept in the time-series regression controls the mean return. You can also see how $\lambda = E(f)$ results in a zero intercept. Finally, however, you see that without an estimate of $\lambda$, you can’t check this intercept restriction. If the factor is not a return, you will be forced to do something like a cross-sectional regression.

When the factor is a return, so that we can compare the two methods, they are not necessarily the same. The time-series regression estimates the factor risk premium as the sample mean of the factor. Hence, the factor receives a zero pricing error. The predicted zero-beta excess return is also zero. Thus, the time-series regression describes the cross-section of expected returns by drawing a line as in Figure 13.10 that runs through the origin and through the factor, ignoring all of the other points. The OLS cross-sectional regression picks the slope and intercept, if you include one, to best fit all the points; to minimize the sum of squares of all the pricing errors.

If the factor is a return, the GLS cross-sectional regression, including the factor as a test asset, is identical to the time-series regression. The time-series regression for the factor is, of course,

$$f_t = 0 + 1 f_t + 0$$

so it has a zero intercept, beta equal to one, and zero residual in every sample. The residual variance covariance matrix of the returns, including the factor, is

$$E\left(\begin{bmatrix} R^e - a - \beta f\\ f - 0 - 1f\end{bmatrix}\begin{bmatrix}\cdot\end{bmatrix}'\right) = \begin{bmatrix}\Sigma & 0\\ 0 & 0\end{bmatrix}$$

Since the factor has zero residual variance, a GLS regression puts all its weight on that asset. Therefore, $\hat\lambda = E_T(f)$ just as for the time-series regression. The pricing errors are the same, as is their distribution and the $\chi^2$ test. (You gain a degree of freedom by adding the factor to the cross sectional regression, so the test is a $\chi^2_N$.)


Why does the “efficient” technique ignore the pricing errors of all of the other assets in estimating the factor risk premium, and focus only on the mean return? The answer is simple, though subtle. In the regression model

$$R^e_t = a + \beta f_t + \varepsilon_t,$$

the average return of each asset in a sample is equal to beta times the average return of the factor in the sample, plus the average residual in the sample. An average return carries no additional information about the mean of the factor. A signal plus noise carries no additional information beyond that in the same signal. Thus, an “efficient” cross-sectional regression wisely ignores all the information in the other asset returns and uses only the information in the factor return to estimate the factor risk premium.

13.3 Fama-MacBeth Procedure

I introduce the Fama-MacBeth procedure for running cross sectional regression and calculating standard errors that correct for cross-sectional correlation in a panel. I show that, when the right hand variables do not vary over time, Fama-MacBeth is numerically equivalent to pooled time-series, cross-section OLS with standard errors corrected for cross-sectional correlation, and also to a single cross-sectional regression on time-series averages with standard errors corrected for cross-sectional correlation. Fama-MacBeth standard errors do not include corrections for the fact that the betas are also estimated.

Fama and MacBeth (1973) suggest an alternative procedure for running cross-sectional regressions, and for producing standard errors and test statistics. This is a historically important procedure, it is computationally simple to implement, and is still widely used, so it is important to understand it and relate it to other procedures.

First, you find beta estimates with a time-series regression. Fama and MacBeth use rolling 5 year regressions, but one can also use the technique with full-sample betas, and I will consider that simpler case. Second, instead of estimating a single cross-sectional regression with the sample averages, we now run a cross-sectional regression at each time period, i.e.

$$R^{ei}_t = \beta_i'\lambda_t + \alpha_{it}, \quad i = 1, 2, \ldots, N \text{ for each } t.$$

I write the case of a single factor for simplicity, but it’s easy to extend the model to multiple factors. Then, Fama and MacBeth suggest that we estimate $\lambda$ and $\alpha_i$ as the average of the cross sectional regression estimates,

$$\hat\lambda = \frac{1}{T}\sum_{t=1}^{T}\hat\lambda_t, \quad \hat\alpha_i = \frac{1}{T}\sum_{t=1}^{T}\hat\alpha_{it}.$$


Most importantly, they suggest that we use the standard deviations of the cross-sectional regression estimates to generate the sampling errors for these estimates,

$$\sigma^2(\hat\lambda) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\lambda_t - \hat\lambda\right)^2, \qquad \sigma^2(\hat\alpha_i) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\alpha_{it} - \hat\alpha_i\right)^2.$$

It’s $1/T^2$ because we’re finding standard errors of sample means, $\sigma^2/T$.
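The two steps and the standard error formula above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are simulated, the function name `fama_macbeth` and all parameter values are hypothetical, full-sample betas are used (the simpler case considered in the text), and the cross-sectional regressions are run without an intercept.

```python
import numpy as np

def fama_macbeth(excess_returns, betas):
    """Fama-MacBeth estimates from a (T, N) return panel and (N, K) betas.

    Runs a no-intercept cross-sectional OLS at each date, then averages.
    """
    T = excess_returns.shape[0]
    # (B'B)^{-1} B', applied to every date's cross section at once
    proj = np.linalg.solve(betas.T @ betas, betas.T)       # (K, N)
    lambdas_t = excess_returns @ proj.T                     # (T, K)
    alphas_t = excess_returns - lambdas_t @ betas.T         # (T, N)
    lam_hat = lambdas_t.mean(axis=0)
    alpha_hat = alphas_t.mean(axis=0)
    # sigma^2 = (1/T^2) * sum_t (estimate_t - average)^2
    se_lam = np.sqrt(((lambdas_t - lam_hat) ** 2).sum(axis=0)) / T
    se_alpha = np.sqrt(((alphas_t - alpha_hat) ** 2).sum(axis=0)) / T
    return lam_hat, se_lam, alpha_hat, se_alpha, lambdas_t

# Hypothetical simulated data: one factor, N test assets (all numbers are made up).
rng = np.random.default_rng(0)
T, N = 600, 10
true_betas = rng.uniform(0.5, 1.5, size=N)
factor = 0.5 + 4.0 * rng.normal(size=T)
returns = np.outer(factor, true_betas) + 2.0 * rng.normal(size=(T, N))

# Step 1: full-sample betas from time-series regressions with a constant.
ts_X = np.column_stack([np.ones(T), factor])
betas_hat = np.linalg.lstsq(ts_X, returns, rcond=None)[0][1:].T   # (N, 1)

# Step 2: cross-sectional regressions at each date, then averages.
lam_hat, se_lam, alpha_hat, se_alpha, lambdas_t = fama_macbeth(returns, betas_hat)
print("lambda:", lam_hat, "s.e.:", se_lam)
```

Note that the standard error is just the time-series standard deviation of $\hat\lambda_t$ divided by $\sqrt{T}$, which is what the $1/T^2$ in the formula delivers.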

This is an intuitively appealing procedure once you stop to think about it. Sampling error is, after all, about how a statistic would vary from one sample to the next if we repeated the observations. We can’t do that with only one sample, but why not cut the sample in half, and deduce how a statistic would vary from one full sample to the next from how it varies from the first half of the sample to the next half? Proceeding, why not cut the sample in fourths, eighths and so on? The Fama-MacBeth procedure carries this idea to its logical conclusion, using the variation in the statistic $\hat\lambda_t$ over time to deduce its sampling variation.

We are used to deducing the sampling variance of the sample mean of a series $x_t$ by looking at the variation of $x_t$ through time in the sample, using $\sigma^2(\bar x) = \sigma^2(x)/T = \frac{1}{T^2}\sum_t (x_t - \bar x)^2$. The Fama-MacBeth technique just applies this idea to the slope and pricing error estimates. The formula assumes that the time series is not autocorrelated, but one could easily extend the idea to estimates $\hat\lambda_t$ that are correlated over time by using a long run variance matrix, i.e. estimate

$$\sigma^2(\hat\lambda) = \frac{1}{T}\sum_{j=-\infty}^{\infty} \mathrm{cov}_T\left(\hat\lambda_t, \hat\lambda_{t-j}\right).$$

One should of course use some sort of weighting matrix or a parametric description of the autocorrelations of $\hat\lambda$, as explained in section 12.7. Asset return data are usually not highly correlated, but accounting for such correlation could have a big effect on the application of the Fama-MacBeth technique to corporate finance data or other regressions in which the cross-sectional estimates are highly correlated over time.
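As one concrete way to implement the long run variance, the sketch below applies Bartlett (Newey-West style) weights to the sample autocovariances of a scalar $\hat\lambda_t$ series. The Bartlett weighting, the lag length of 12, the function name `variance_of_mean` and the simulated AR(1) series are all assumptions made for illustration; they are one choice among the weighting schemes the text alludes to.

```python
import numpy as np

def variance_of_mean(series, lags=0):
    """Estimate var(sample mean) of a scalar series.

    lags=0 gives sigma^2(x)/T; lags>0 adds Bartlett-weighted autocovariances.
    """
    x = np.asarray(series, dtype=float)
    T = x.size
    dev = x - x.mean()
    total = dev @ dev / T                       # j = 0 term
    for j in range(1, lags + 1):
        gamma_j = dev[j:] @ dev[:-j] / T        # sample autocovariance at lag j
        weight = 1.0 - j / (lags + 1.0)         # Bartlett weight
        total += 2.0 * weight * gamma_j         # counts both +j and -j
    return total / T

# Hypothetical, strongly autocorrelated cross-sectional estimates (AR(1), rho = 0.8).
rng = np.random.default_rng(3)
T = 2000
shocks = rng.normal(size=T)
lambdas_t = np.empty(T)
lambdas_t[0] = shocks[0]
for t in range(1, T):
    lambdas_t[t] = 0.8 * lambdas_t[t - 1] + shocks[t]

se_iid = np.sqrt(variance_of_mean(lambdas_t, lags=0))
se_hac = np.sqrt(variance_of_mean(lambdas_t, lags=12))
print("no correction:", se_iid, "with autocorrelation correction:", se_hac)
```

With positively autocorrelated estimates, the corrected standard error is larger than the uncorrected one, which is the effect the text warns about for corporate finance applications.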

It is natural to use this sampling theory to test whether all the pricing errors are jointly zero as we have before. Denote by $\alpha$ the vector of pricing errors across assets. We could estimate the covariance matrix of the sample pricing errors by

$$\hat\alpha = \frac{1}{T}\sum_{t=1}^{T}\hat\alpha_t$$

$$\mathrm{cov}(\hat\alpha) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\alpha_t - \hat\alpha\right)\left(\hat\alpha_t - \hat\alpha\right)'$$

(or a general version that accounts for correlation over time) and then use the test

$$\hat\alpha'\,\mathrm{cov}(\hat\alpha)^{-1}\hat\alpha \sim \chi^2_{N-1}.$$


13.3.1 Fama MacBeth in depth

The GRS procedure and the formulas given above for a single cross-sectional regression are familiar from any course in regression. We will see them justified by maximum likelihood below. The Fama-MacBeth procedure seems unlike anything you’ve seen in any econometrics course, and it is obviously a useful and simple technique that can be widely used in panel data in economics and corporate finance as well as asset pricing. Is it truly different? Is there something different about asset pricing data that requires a fundamentally new technique not taught in standard regression courses? Or is it similar to standard techniques? To answer these questions it is worth looking in a little more detail at what it accomplishes and why.

It’s easier to do this in a more standard setup, with left hand variable $y$ and right hand variable $x$. Consider a regression

$$y_{it} = \beta'x_{it} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T.$$

The data in this regression have a cross-sectional element as well as a time-series element. In corporate finance, for example, one might be interested in the relationship between investment and financial variables, and the data set has many firms ($N$) as well as time series observations for each firm ($T$). In an expected return-beta asset pricing model, the $x_{it}$ stand for the $\beta_i$ and $\beta$ stands for $\lambda$.

An obvious thing to do in this context is simply to stack the $i$ and $t$ observations together and estimate $\beta$ by OLS. I will call this the pooled time-series cross-section estimate. However, the error terms are not likely to be uncorrelated with each other. In particular, the error terms are likely to be cross-sectionally correlated at a given time. If one stock’s return is unusually high this month, another stock’s return is also likely to be high; if one firm invests an unusually great amount this year, another firm is also likely to do so. When errors are correlated, OLS is still consistent, but the OLS distribution theory is wrong, and typically suggests standard errors that are much too small. In the extreme case that the $N$ errors are perfectly correlated at each time period, there is really only one observation for each time period, so one really has $T$ rather than $NT$ observations. Therefore, a real pooled time-series cross-section estimate must include corrected standard errors. People often ignore this fact and report OLS standard errors.

Another thing we could do is first take time series averages and then run a pure cross-sectional regression of

$$E_T(y_{it}) = \beta'E_T(x_{it}) + u_i, \quad i = 1, 2, \ldots, N.$$

This procedure would lose any information due to variation of the $x_{it}$ over time, but at least it might be easier to figure out a variance-covariance matrix for $u_i$ and correct the standard errors for residual correlation. (You could also average cross-sectionally and then run a single time-series regression. We’ll get to that option later.)

In either case, the standard error corrections are just applications of the standard formula for OLS regressions with correlated error terms.

Finally, we could run the Fama-MacBeth procedure: run a cross-sectional regression at each point in time; average the cross-sectional $\hat\beta_t$ estimates to get an estimate $\hat\beta$, and use the time-series standard deviation of $\hat\beta_t$ to estimate the standard error of $\hat\beta$.

It turns out that the Fama-MacBeth procedure is just another way of calculating the standard errors, corrected for cross-sectional correlation:

Proposition: If the $x_{it}$ variables do not vary over time, and if the errors are cross-sectionally correlated but not correlated over time, then the Fama-MacBeth estimate, the pure cross-sectional OLS estimate and the pooled time-series cross-sectional OLS estimates are identical. Also, the Fama-MacBeth standard errors are identical to the cross-sectional regression or stacked OLS standard errors, corrected for residual correlation. None of these relations hold if the $x$ vary through time.

Since they are identical procedures, whether one calculates estimates and standard errors in one way or the other is a matter of taste.

I emphasize one procedure that is incorrect: pooled time series and cross section OLS with no correction of the standard errors. The errors are so highly cross-sectionally correlated in most finance applications that the standard errors so computed are often off by a factor of 10.

The assumption that the errors are not correlated over time is probably not so bad for asset pricing applications, since returns are close to independent. However, when pooled time-series cross-section regressions are used in corporate finance applications, errors are likely to be as severely correlated over time as across firms, if not more so. The “other factors” ($\varepsilon$) that cause, say, company $i$ to invest more at time $t$ than predicted by a set of right hand variables are surely correlated with the other factors that cause company $j$ to invest more. But such factors are especially likely to cause company $i$ to invest more tomorrow as well. In this case, any standard errors must also correct for serial correlation in the errors; the GMM based formulas in section 12.4 can do this easily.

The Fama-MacBeth standard errors also do not correct for the fact that $\hat\beta$ are generated regressors. If one is going to use them, it is a good idea to at least calculate the Shanken correction factors outlined above, and check that the corrections are not large.

Proof: We just have to write out the three approaches and compare them. Having assumed that the $x$ variables do not vary over time, the regression is

$$y_{it} = x_i'\beta + \varepsilon_{it}.$$

We can stack up the cross-sections $i = 1 \ldots N$ and write the regression as

$$y_t = x\beta + \varepsilon_t.$$

$x$ is now a matrix with the $x_i'$ as rows. The error assumptions mean $E(\varepsilon_t\varepsilon_t') = \Sigma$.


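Pooled OLS: To run pooled OLS, we stack the time series and cross sections by writing

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \quad X = \begin{bmatrix} x \\ x \\ \vdots \\ x \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

and then

$$Y = X\beta + \epsilon$$

with

$$E(\epsilon\epsilon') = \Omega = \begin{bmatrix} \Sigma & & \\ & \ddots & \\ & & \Sigma \end{bmatrix}.$$

The estimate and its standard error are then

$$\hat\beta_{OLS} = (X'X)^{-1}X'Y$$

$$\mathrm{cov}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}.$$

Writing this out from the definitions of the stacked matrices, with $X'X = Tx'x$,

$$\hat\beta_{OLS} = (x'x)^{-1}x'E_T(y_t)$$

$$\mathrm{cov}(\hat\beta_{OLS}) = \frac{1}{T}(x'x)^{-1}\left(x'\Sigma x\right)(x'x)^{-1}.$$

We can estimate this sampling variance with

$$\hat\Sigma = E_T\left(\hat\varepsilon_t\hat\varepsilon_t'\right), \quad \hat\varepsilon_t \equiv y_t - x\hat\beta_{OLS}.$$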

Pure cross-section: The pure cross-sectional estimator runs one cross-sectional regression of the time-series averages. So, take those averages,

$$E_T(y_t) = x\beta + E_T(\varepsilon_t)$$

where $x = E_T(x)$, since $x$ is constant. Having assumed i.i.d. errors over time, the error covariance matrix is

$$E\left(E_T(\varepsilon_t)E_T(\varepsilon_t')\right) = \frac{1}{T}\Sigma.$$

The cross sectional estimate and corrected standard errors are then

$$\hat\beta_{XS} = (x'x)^{-1}x'E_T(y_t)$$

$$\sigma^2(\hat\beta_{XS}) = \frac{1}{T}(x'x)^{-1}x'\Sigma x(x'x)^{-1}.$$


Thus, the cross-sectional and pooled OLS estimates and standard errors are exactly the same, in each sample.

Fama-MacBeth: The Fama-MacBeth estimator is formed by first running the cross-sectional regression at each moment in time,

$$\hat\beta_t = (x'x)^{-1}x'y_t.$$

Then the estimate is the average of the cross-sectional regression estimates,

$$\hat\beta_{FM} = E_T\left(\hat\beta_t\right) = (x'x)^{-1}x'E_T(y_t).$$

Thus, the Fama-MacBeth estimator is also the same as the OLS estimator, in each sample. The Fama-MacBeth standard error is based on the time-series standard deviation of the $\hat\beta_t$. Using $\mathrm{cov}_T$ to denote sample covariance,

$$\mathrm{cov}\left(\hat\beta_{FM}\right) = \frac{1}{T}\mathrm{cov}_T\left(\hat\beta_t\right) = \frac{1}{T}(x'x)^{-1}x'\,\mathrm{cov}_T(y_t)\,x(x'x)^{-1}.$$

With

$$y_t = x\hat\beta_{FM} + \hat\varepsilon_t$$

we have

$$\mathrm{cov}_T(y_t) = E_T\left(\hat\varepsilon_t\hat\varepsilon_t'\right) = \hat\Sigma$$

and finally

$$\mathrm{cov}\left(\hat\beta_{FM}\right) = \frac{1}{T}(x'x)^{-1}x'\hat\Sigma x(x'x)^{-1}.$$

Thus, the FM estimator of the standard error is also numerically equivalent to the OLS corrected standard error.
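The equivalence can be checked numerically. The sketch below uses simulated data with a time-constant $x$ and a common shock that makes errors cross-sectionally correlated; all sizes and parameter values are made up, and sample covariances are normalized by $T$ (not $T-1$) to match the formulas in the proof. One detail worth noting: $\mathrm{cov}_T(y_t)$ is demeaned while $E_T(\hat\varepsilon_t\hat\varepsilon_t')$ is not, but the two give the same standard errors because $x'E_T(\hat\varepsilon_t) = 0$ by the OLS normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 100, 8, 2
x = rng.normal(size=(N, K))                        # right hand variables, constant over time
beta = np.array([1.0, -0.5])
common = rng.normal(size=(T, 1))                   # shock shared by all i at date t
y = x @ beta + common + rng.normal(size=(T, N))    # row t holds y_t'

A = np.linalg.solve(x.T @ x, x.T)                  # (x'x)^{-1} x', shape (K, N)

# Fama-MacBeth: cross-sectional regression at each date, then time-series moments
beta_t = y @ A.T                                   # (T, K)
beta_fm = beta_t.mean(axis=0)
cov_fm = np.cov(beta_t, rowvar=False, bias=True) / T

# Pure cross-section on time-series averages, with corrected standard errors
beta_xs = A @ y.mean(axis=0)
resid = y - x @ beta_xs                            # row t holds eps_hat_t'
sigma_hat = resid.T @ resid / T                    # E_T(eps_hat eps_hat')
cov_xs = A @ sigma_hat @ A.T / T

# Pooled OLS on the stacked system, with the block-diagonal Omega
X = np.tile(x, (T, 1))                             # (T*N, K)
Y = y.reshape(-1)                                  # stacks y_1, y_2, ..., y_T
beta_pooled = np.linalg.lstsq(X, Y, rcond=None)[0]
omega = np.kron(np.eye(T), sigma_hat)
XtX_inv = np.linalg.inv(X.T @ X)
cov_pooled = XtX_inv @ X.T @ omega @ X @ XtX_inv

print(beta_fm, beta_xs, beta_pooled)
```

Building the full $NT \times NT$ matrix $\Omega$ is wasteful and is done here only to mirror the stacked formula; in practice one would use the $\frac{1}{T}(x'x)^{-1}x'\hat\Sigma x(x'x)^{-1}$ form directly.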

Varying x: If the $x_{it}$ vary through time, none of the three procedures are equal anymore, since the cross-sectional regressions ignore time-series variation in the $x_{it}$. As an extreme example, suppose a scalar $x_{it}$ varies over time but not cross-sectionally,

$$y_{it} = \alpha + x_t\beta + \varepsilon_{it}, \quad i = 1, 2, \ldots, N; \ t = 1, 2, \ldots, T.$$

The grand OLS regression is

$$\hat\beta_{OLS} = \frac{\sum_{it}\tilde x_t y_{it}}{\sum_{it}\tilde x_t^2} = \frac{\sum_t \tilde x_t \frac{1}{N}\sum_i y_{it}}{\sum_t \tilde x_t^2}$$

where $\tilde x = x - E_T(x)$ denotes the demeaned variables. The estimate is driven by the covariance over time of $x_t$ with the cross-sectional average of the $y_{it}$, which is sensible because all of the information in the sample lies in time variation. It is identical to a regression over time of cross-sectional averages. However, you can’t even run a cross-sectional estimate, since the right hand variable is constant across $i$. As a practical example, you might be interested in a CAPM specification in which the betas vary over time ($\beta_t$) but not across test assets. This sample still contains information about the CAPM: the time-variation in betas should be matched by time variation in expected returns. But any method based on cross-sectional regressions will completely miss it. ∎
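A small numerical check of the extreme example, using made-up simulated data: the grand pooled OLS slope coincides with the slope from a time-series regression of the cross-sectional average of $y_{it}$ on $x_t$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 120, 15
x_t = rng.normal(size=T)                                   # varies over time only
y = 0.3 + 1.7 * x_t[:, None] + rng.normal(size=(T, N))     # y_it = alpha + x_t*beta + eps_it

# Grand pooled OLS of y_it on a constant and x_t
X_pooled = np.column_stack([np.ones(T * N), np.repeat(x_t, N)])
beta_pooled = np.linalg.lstsq(X_pooled, y.reshape(-1), rcond=None)[0][1]

# Time-series regression of the cross-sectional average of y on x_t
x_dm = x_t - x_t.mean()
beta_ts = (x_dm @ y.mean(axis=1)) / (x_dm @ x_dm)

print(beta_pooled, beta_ts)
```

A cross-sectional regression at any single date is not feasible here: the regressor takes the same value for every $i$, so it is perfectly collinear with the constant.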

In historical context, the Fama-MacBeth procedure was also important because it allowed changing betas, which a single cross-sectional regression or a time-series regression test cannot easily handle.

13.4 Problems

1. When we express the CAPM in excess return form, can the test assets be differences between risky assets, $R^i - R^j$? Can the market excess return also use a risky asset, or must it be relative to a risk free rate? (Hint: start with $E(R^i) - R^f = \beta_{i,m}\left(E(R^m) - R^f\right)$ and see if you can get to the other forms. Betas must be regression coefficients.)

2. Can you run the GRS test on a model that uses industrial production growth as a factor, $E(R^i) - R^f = \beta_{i,\Delta ip}\lambda_{ip}$?

3. Fama and French (19xx) report that pricing errors are correlated with betas in a test of a factor pricing model on industry portfolios. How is this possible?

4. We saw that a GLS cross-sectional regression of the CAPM passes through the market and riskfree rate by construction. Show that if the market return is an equally weighted portfolio of the test assets, then an OLS cross-sectional regression with an estimated intercept passes through the market return by construction. Does it also pass through the riskfree rate or origin?


Chapter 14. GMM for linear factor models in discount factor form

14.1 GMM on the pricing errors gives a cross-sectional regression

The first stage estimate is an OLS cross-sectional regression, and the second stage is a GLS regression,

$$\text{First stage: } \hat b_1 = (d'd)^{-1}d'E_T(p)$$

$$\text{Second stage: } \hat b_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(p).$$

Standard errors are the corresponding regression formulas, and the variance of the pricing errors is the standard regression formula for the variance of a residual.

Treating the constant $a \times 1$ as a constant factor, the model is

$$m = b'f$$

$$E(p) = E(mx),$$

or simply

$$E(p) = E(xf')b. \qquad (14.1)$$

Keep in mind that $p$ and $x$ are $N \times 1$ vectors of asset prices and payoffs respectively; $f$ is a $K \times 1$ vector of factors, and $b$ is a $K \times 1$ vector of parameters. I suppress the time indices $m_{t+1}, f_{t+1}, x_{t+1}, p_t$. The payoffs are typically returns or excess returns, including returns scaled by instruments. The prices are typically one (returns), zero (excess returns), or instruments.

To implement GMM, we need to choose a set of moments. The obvious set of moments to use are the pricing errors,

$$g_T(b) = E_T(xf'b - p).$$

This choice is natural but not necessary. You don’t have to use $p = E(mx)$ with GMM, and you don’t have to use GMM with $p = E(mx)$. You can (we will) use GMM on expected return-beta models, and you can use maximum likelihood on $p = E(mx)$. It is a choice, and the results will depend on this choice of moments as well as the specification of the model.


The GMM estimate is formed from

$$\min_b \; g_T(b)'Wg_T(b)$$

with first order condition

$$d'Wg_T(b) = d'W E_T(xf'b - p) = 0$$

where

$$d' = \frac{\partial g_T'(b)}{\partial b} = E_T(fx').$$

This is the second moment matrix of payoffs and factors. The first stage has $W = I$, the second stage has $W = S^{-1}$. Since this is a linear model, we can solve analytically for the GMM estimate, and it is

$$\text{First stage: } \hat b_1 = (d'd)^{-1}d'E_T(p)$$

$$\text{Second stage: } \hat b_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(p).$$

The first stage estimate is an OLS cross-sectional regression of average prices on the second moment of payoff with factors, and the second stage estimate is a GLS cross-sectional regression. What could be more sensible? The model (14.1) says that average prices should be a linear function of the second moment of payoff with factors, so the estimate runs a linear regression. These are cross-sectional regressions since they operate across assets on sample averages. The “data points” in the regression are sample average prices ($y$) and second moments of payoffs with factors ($x$) across test assets. We are picking the parameter $b$ to make the model fit the cross-section of asset prices as well as possible.

We find the distribution theory from the standard GMM standard error formulas (12.2) and (12.8). In the first stage, $a = d'$.

$$\text{First stage: } \mathrm{cov}(\hat b_1) = \frac{1}{T}(d'd)^{-1}d'Sd(d'd)^{-1} \qquad (14.2)$$

$$\text{Second stage: } \mathrm{cov}(\hat b_2) = \frac{1}{T}(d'S^{-1}d)^{-1}.$$

Unsurprisingly, these are exactly the formulas for OLS and GLS regression errors with error covariance $S$. The pricing errors are correlated across assets, since the payoffs are correlated. Therefore the OLS cross-sectional regression standard errors need to be corrected for correlation, as they are in (14.2), and one can pursue an efficient estimate as in GLS. The analogy to GLS is close, since $S$ is the covariance matrix of $E(p) - E(xf')b$; $S$ is the covariance matrix of the “errors” in the cross-sectional regression.
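The first and second stage formulas translate almost line for line into NumPy. The sketch below is a hedged illustration: the function name `gmm_linear_sdf` is hypothetical, the data are simulated gross returns (so prices are one), the constant is included as a factor, and $S$ is estimated from the first-stage moment realizations assuming no serial correlation.

```python
import numpy as np

def gmm_linear_sdf(payoffs, prices, factors):
    """Two-stage GMM for E(p) = E(x f') b with (T, N) payoffs/prices and (T, K) factors."""
    T = payoffs.shape[0]
    d = payoffs.T @ factors / T                        # E_T(x f'), shape (N, K)
    p_bar = prices.mean(axis=0)                        # E_T(p)
    ols = np.linalg.solve(d.T @ d, d.T)                # (d'd)^{-1} d'
    b1 = ols @ p_bar                                   # first stage, W = I
    # Moment realizations x_t (f_t'b1) - p_t, demeaned; assumes no autocorrelation
    u = payoffs * (factors @ b1)[:, None] - prices
    u = u - u.mean(axis=0)
    S = u.T @ u / T
    S_inv = np.linalg.inv(S)
    dSd_inv = np.linalg.inv(d.T @ S_inv @ d)
    b2 = dSd_inv @ d.T @ S_inv @ p_bar                 # second stage, W = S^{-1}
    cov_b1 = ols @ S @ ols.T / T                       # formula (14.2)
    cov_b2 = dSd_inv / T
    return b1, b2, cov_b1, cov_b2, d, S

# Hypothetical simulated gross returns on N assets driven by one factor.
rng = np.random.default_rng(6)
T, N = 500, 6
factor = 0.01 + 0.05 * rng.normal(size=T)
loadings = rng.uniform(0.5, 1.5, size=N)
gross_returns = 1.005 + np.outer(factor, loadings) + 0.03 * rng.normal(size=(T, N))
factors = np.column_stack([np.ones(T), factor])        # constant treated as a factor
prices = np.ones((T, N))                               # price of a gross return is one

b1, b2, cov_b1, cov_b2, d, S = gmm_linear_sdf(gross_returns, prices, factors)
p_bar = prices.mean(axis=0)
print("first stage b:", b1, "second stage b:", b2)
```

For a given $S$, the second-stage variances are no larger than the first-stage ones, which is the usual GLS efficiency result carried over to this cross-sectional regression.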


The covariance matrix of the pricing errors is, from (12.5), (12.9) and (12.10)

$$\text{First stage: } T\,\mathrm{cov}\left[g_T(\hat b)\right] = \left(I - d(d'd)^{-1}d'\right)S\left(I - d(d'd)^{-1}d'\right) \qquad (14.3)$$

$$\text{Second stage: } T\,\mathrm{cov}\left[g_T(\hat b)\right] = S - d(d'S^{-1}d)^{-1}d'.$$

These are obvious analogues to the standard regression formulas for the covariance matrix of regression residuals.

The model test is

$$g_T(\hat b)'\,\mathrm{cov}(g_T)^{-1}g_T(\hat b) \sim \chi^2(\#\text{moments} - \#\text{parameters})$$

which specializes for the second-stage estimate as

$$T g_T(\hat b)'S^{-1}g_T(\hat b) \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$

There is not much point in writing these out, other than to point out that the test is a quadratic form in the vector of pricing errors. It turns out that the $\chi^2$ test has the same value for first and second stage for this model, even though the parameter estimates, pricing errors and covariance matrix are not the same.
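The claim that the first and second stage $\chi^2$ statistics coincide can be checked numerically for arbitrary inputs. In the sketch below, $d$, $S$ and $E_T(p)$ are random made-up numbers rather than estimates from data. One assumption needs flagging: the first-stage covariance matrix in (14.3) is singular (rank $N - K$), so the inverse is taken here as a generalized inverse on the space orthogonal to the columns of $d$; the text does not spell out which inverse is meant.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, T = 6, 2, 500
d = rng.normal(size=(N, K))                      # stand-in for E_T(x f')
root = rng.normal(size=(N, N))
S = root @ root.T + N * np.eye(N)                # an arbitrary positive definite S
p_bar = rng.normal(size=N)                       # stand-in for E_T(p)
S_inv = np.linalg.inv(S)

b1 = np.linalg.solve(d.T @ d, d.T @ p_bar)                       # first stage
b2 = np.linalg.solve(d.T @ S_inv @ d, d.T @ S_inv @ p_bar)       # second stage
g1 = d @ b1 - p_bar                              # first-stage pricing errors
g2 = d @ b2 - p_bar                              # second-stage pricing errors

# Second stage: T g' S^{-1} g
J2 = T * g2 @ S_inv @ g2

# First stage: T cov(g) = (I-P) S (I-P) with P = d (d'd)^{-1} d'.
# Q is an orthonormal basis of the space orthogonal to d, so I - P = Q Q'
# and the generalized inverse of (I-P) S (I-P) is Q (Q'SQ)^{-1} Q'.
U = np.linalg.svd(d, full_matrices=True)[0]
Q = U[:, K:]                                     # shape (N, N-K)
J1 = T * (Q.T @ g1) @ np.linalg.solve(Q.T @ S @ Q, Q.T @ g1)

print(J1, J2)
```

The parameter estimates `b1` and `b2` and the pricing errors `g1` and `g2` differ, yet the two quadratic forms agree.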

14.2 The case of excess returns

When $m_{t+1} = a - b'f_{t+1}$ and the test assets are excess returns, the GMM estimate is a GLS cross-sectional regression of average returns on the second moments of returns with factors,

$$\text{First stage: } \hat b_1 = (d'd)^{-1}d'E_T(R^e)$$

$$\text{Second stage: } \hat b_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e),$$

where $d$ is the covariance matrix between returns and factors. The other formulas are the same.

The analysis of the last section requires that at least one asset has a nonzero price. If all assets are excess returns then $\hat b_1 = (d'd)^{-1}d'E_T(p) = 0$. Linear factor models are most often applied to excess returns, so this case is important. The trouble is that in this case the mean discount factor is not identified. If $E(mR^e) = 0$ then $E((2 \times m)R^e) = 0$. Analogously in expected return-beta models, if all test assets are excess returns, then we have no information on the level of the zero-beta rate.

Writing out the model as $m = a - b'f$, we cannot separately identify $a$ and $b$ so we have to choose some normalization. The choice is entirely one of convenience; lack of identification means precisely that the pricing errors do not depend on the choice of normalization.

The easiest choice is $a = 1$. Then

$$g_T(b) = E_T(mR^e) = E_T(R^e) - E_T(R^ef')b.$$

We have

$$d = -\frac{\partial g_T(b)}{\partial b'} = E_T(R^ef'),$$

the second moment matrix of returns and factors. The first order condition to $\min g_T'Wg_T$ is

$$d'W\left[d\,b - E_T(R^e)\right] = 0.$$

Then, the GMM estimates of $b$ are

$$\text{First stage: } \hat b_1 = (d'd)^{-1}d'E_T(R^e)$$

$$\text{Second stage: } \hat b_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e).$$

The GMM estimate is a cross-sectional regression of mean excess returns on the second moments of returns with factors. From here on in, the distribution theory is unchanged from the last section.

Mean returns on covariances

We can obtain a cross-sectional regression of mean excess returns on covariances, which are just a heartbeat away from betas, by choosing the normalization $a = 1 + b'E(f)$ rather than $a = 1$. Then, the model is $m = 1 - b'(f - E(f))$, with mean $E(m) = 1$. The pricing errors are

$$g_T(b) = E_T(mR^e) = E_T(R^e) - E_T(R^e\tilde f')b$$

where I denote $\tilde f \equiv f - E(f)$. We have

$$d = -\frac{\partial g_T(b)}{\partial b'} = E_T(R^e\tilde f'),$$

which now denotes the covariance matrix of returns and factors. The first order condition to $\min g_T'Wg_T$ is now

$$d'W\left[d\,b - E_T(R^e)\right] = 0.$$

Then, the GMM estimates of $b$ are

$$\text{First stage: } \hat b_1 = (d'd)^{-1}d'E_T(R^e)$$

$$\text{Second stage: } \hat b_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e).$$

The GMM estimate is a cross-sectional regression of expected excess returns on the covariance between returns and factors. Naturally, the model says that expected excess returns should be proportional to the covariance between returns and factors, and the estimate estimates that relation by a linear regression. The standard errors and variance of the pricing errors are the same as in (14.2) and (14.3), with $d$ now representing the covariance matrix. The formulas are almost exactly identical to those of the cross-sectional regressions in section 13.2. The $p = E(mx)$ formulation of the model for excess returns, with $m = 1 - b'\tilde f$, is equivalent to $E(R^e) = \mathrm{cov}(R^e, f')b$; thus covariances enter in place of betas $\beta$.

There is one fly in the ointment; the mean of the factor $E(f)$ is estimated, and the distribution theory should recognize sampling variation induced by this fact, as we did for the fact that betas are generated regressors in the cross-sectional regressions of section 2.3. The distribution theory is straightforward, and a problem at the end of the chapter guides you through it. However, I think it is better to avoid the complication and just use the second moment approach, or some other non sample dependent normalization for $a$. The pricing errors are identical: the whole point is that the normalization of $a$ does not matter to the pricing errors. Therefore, the $\chi^2$ statistics are also identical. As you change the normalization for $a$, you change the estimate of $b$. Therefore, the only effect is to add a term in the sampling variance of the estimated parameter $b$.

14.3 Horse Races

How to test whether one set of factors drives out another. Test $b_2 = 0$ in $m = b_1'f_1 + b_2'f_2$ using the standard error of $\hat b_2$, or the $\chi^2$ difference test.

It’s often interesting to test whether one set of factors drives out another. For example, Chen, Roll and Ross (1986) test whether their five macroeconomic factors price assets so well that one can ignore even the market return. Given the large number of factors that have been proposed, a statistical procedure for testing which factors survive in the presence of the others is desirable.

In this framework, such a test is very easy. Start by estimating a general model

$$m = b_1'f_1 + b_2'f_2. \qquad (14.4)$$

We want to know, given factors $f_1$, do we need the $f_2$ to price assets, i.e. is $b_2 = 0$? There are two ways to do this.

First and most obviously, we have an asymptotic covariance matrix for $[b_1 \; b_2]$, so we can form a $t$ test (if $b_2$ is scalar) or $\chi^2$ test for $b_2 = 0$ by forming the statistic

$$\hat b_2'\,\mathrm{var}(\hat b_2)^{-1}\hat b_2 \sim \chi^2_{\#b_2}$$

where $\#b_2$ is the number of elements in the $b_2$ vector. This is a Wald test.

Second, we can estimate a restricted system $m = b_1'f_1$. Since there are fewer free parameters than in (14.4) and the same number of moments, we expect the criterion $J_T$ to rise. If we use the same weighting matrix (usually the one estimated from the unrestricted model (14.4)), then the $J_T$ cannot in fact decline. But if $b_2$ really is zero, it shouldn’t rise “much.” How much? The $\chi^2$ difference test answers that question:

$$TJ_T(\text{restricted}) - TJ_T(\text{unrestricted}) \sim \chi^2(\#\text{ of restrictions}).$$

This is very much like a likelihood ratio test.
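Both tests can be sketched on simulated excess returns. Everything here is illustrative: the two correlated factors, the loadings, and the helper name `efficient_gmm` are made up; the normalization is $m = 1 - b'f$; and the weighting matrix $S^{-1}$ is estimated once from the unrestricted first stage and then reused for the restricted model, as the text recommends.

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 600, 10
f = 0.05 * rng.normal(size=(T, 2))
f[:, 1] = 0.6 * f[:, 0] + 0.8 * f[:, 1]            # make the two factors correlated
load = rng.uniform(0.5, 1.5, size=(N, 2))
Re = 0.004 * load[:, 0] + f @ load.T + 0.03 * rng.normal(size=(T, N))

def efficient_gmm(d, mean_re, S_inv, T):
    """Second-stage estimate, its covariance, and T*J_T for g_T = E_T(Re) - d b."""
    cov_b = np.linalg.inv(d.T @ S_inv @ d) / T
    b = T * cov_b @ d.T @ S_inv @ mean_re
    g = mean_re - d @ b
    return b, cov_b, T * g @ S_inv @ g

mean_re = Re.mean(axis=0)
d = Re.T @ f / T                                   # E_T(Re f'), shape (N, 2)

# Weighting matrix from the unrestricted first stage (no serial correlation assumed)
b_first = np.linalg.lstsq(d, mean_re, rcond=None)[0]
u = Re * (1.0 - f @ b_first)[:, None]              # m_t * Re_t
u = u - u.mean(axis=0)
S_inv = np.linalg.inv(u.T @ u / T)

b_u, cov_u, TJ_u = efficient_gmm(d, mean_re, S_inv, T)          # m = 1 - b1 f1 - b2 f2
b_r, cov_r, TJ_r = efficient_gmm(d[:, :1], mean_re, S_inv, T)   # m = 1 - b1 f1

wald = b_u[1] ** 2 / cov_u[1, 1]                   # Wald test of b2 = 0, chi2(1)
chi2_diff = TJ_r - TJ_u                            # chi2 difference test, chi2(1)
print("Wald:", wald, "chi2 difference:", chi2_diff)
```

Because the model is linear and the same $S$ is used in both estimates, the two statistics come out numerically equal in this sketch; with different weighting matrices in the restricted and unrestricted models they would generally differ.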

14.4 Testing for characteristics

How to check whether an asset pricing model drives out a characteristic such as size, book/market or volatility. Run cross sectional regressions of pricing errors on characteristics; use the formulas for covariance matrix of the pricing errors to create standard errors.

It’s often interesting to characterize a model by checking whether the model drives out a characteristic. For example, portfolios organized by size or market capitalization show a wide dispersion in average returns (at least up to 1979). Small stocks gave higher average returns than large stocks. The size of the portfolio is a characteristic. A good asset pricing model should account for average returns by betas. It’s ok if a characteristic is associated with average returns, but in the end betas should drive out the characteristic; the alphas or pricing errors should not be associated with the characteristic. The original tests of the CAPM similarly checked whether the variance of the individual portfolio had anything to do with average returns once betas were included.

Denote the characteristic of portfolio $i$ by $y_i$. An obvious idea is to include both betas and the characteristic in a multiple, cross-sectional regression,

$$E(R^{ei}) = (\alpha_0) + \beta_i'\lambda + \gamma y_i + \varepsilon_i, \quad i = 1, 2, \ldots, N.$$

Alternatively, subtract $\beta\lambda$ from both sides and consider a cross-sectional regression of alphas on the characteristic,

$$\alpha_i = (\alpha_0) + \gamma y_i + \varepsilon_i, \quad i = 1, 2, \ldots, N.$$

(The difference is whether you allow the presence of the size characteristic to affect the $\lambda$ estimate or not.)

We can always run such a regression, but we don’t want to use the OLS formulas for the sampling error of the estimates, since the errors $\varepsilon_i$ are correlated across assets. Under the null that $\gamma = 0$, $\varepsilon = \alpha$, so we can simply use the covariance matrix of the alphas to generate standard errors of the $\gamma$. Let $X$ denote the vector of characteristics, then the estimate is

$$\hat\gamma = (X'X)^{-1}X'\hat\alpha$$

with standard error

$$\sigma(\hat\gamma) = \left[(X'X)^{-1}X'\,\mathrm{cov}(\hat\alpha)\,X(X'X)^{-1}\right]^{1/2}.$$

At this point, simply use the formula for $\mathrm{cov}(\hat\alpha)$ or $\mathrm{cov}(g_T)$ as appropriate for the model that you tested.
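A minimal sketch of this regression follows. The function name `characteristic_regression`, the size ranks, the alphas and the equicorrelated `cov_alpha` are all hypothetical placeholders; in an application `alpha_hat` and `cov_alpha` would come from whichever model was tested. The sketch includes the optional intercept $(\alpha_0)$.

```python
import numpy as np

def characteristic_regression(alpha_hat, cov_alpha, characteristic):
    """Regress pricing errors on a constant and a characteristic.

    Standard errors use cov(alpha_hat) rather than the OLS i.i.d. formula.
    """
    X = np.column_stack([np.ones_like(characteristic), characteristic])
    A = np.linalg.solve(X.T @ X, X.T)              # (X'X)^{-1} X'
    gamma = A @ alpha_hat
    cov_gamma = A @ cov_alpha @ A.T                # (X'X)^{-1} X' cov(alpha) X (X'X)^{-1}
    return gamma, np.sqrt(np.diag(cov_gamma))

# Hypothetical inputs: ten size-sorted portfolios with made-up pricing errors.
size_rank = np.arange(1.0, 11.0)
alpha_hat = 0.1 + 0.5 * size_rank                  # exactly linear, for illustration only
cov_alpha = 0.04 * (0.5 * np.eye(10) + 0.5 * np.ones((10, 10)))  # correlated errors

gamma, se_gamma = characteristic_regression(alpha_hat, cov_alpha, size_rank)
print("intercept and slope:", gamma, "standard errors:", se_gamma)
```

A $t$ statistic for the characteristic is then `gamma[1] / se_gamma[1]`.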

Sometimes, the characteristic is also estimated rather than being a fixed number such as the size rank of a size portfolio, and you’d like to include the sampling uncertainty of its estimation in the standard errors of $\hat\gamma$. Let $y_{it}$ denote the time series whose mean $E(y_{it})$ determines the characteristic. Now, write the moment condition for the $i$th asset as

$$g_T = E_T\left(m_{t+1}(b)x_{t+1} - p_t - \gamma y_{it}\right).$$

The estimate of $\gamma$ tells you how the characteristic $E(y_i)$ is associated with model pricing errors $E(m_{t+1}(b)x_{t+1} - p_t)$. The GMM estimate of $\gamma$ solves the first order condition

$$E_T(y)'W\left(E_T(mx) - p - \gamma E_T(y)\right) = 0,$$

$$\hat\gamma = \left(E_T(y)'WE_T(y)\right)^{-1}E_T(y)'Wg_T,$$

where $g_T$ in the last expression denotes the pricing errors $E_T(m_{t+1}(b)x_{t+1} - p_t)$. This is an OLS or GLS regression of the pricing errors on the estimated characteristics. The standard GMM formulas for the standard deviation of $\gamma$ or the $\chi^2$ difference test for $\gamma = 0$ tell you whether the $\gamma$ estimate is statistically significant, including the fact that $E(y)$ must be estimated.

14.5 Testing for priced factors: lambdas or b’s?

$b_j$ asks whether factor $j$ helps to price assets given the other factors. $b_j$ gives the multiple regression coefficient of $m$ on $f_j$ given the other factors.

$\lambda_j$ asks whether factor $j$ is priced, or whether its factor-mimicking portfolio carries a positive risk premium. $\lambda_j$ gives the single regression coefficient of $m$ on $f_j$.

Therefore, when factors are correlated, one should test $b_j = 0$ to see whether to include factor $j$ given the other factors rather than test $\lambda_j = 0$.

Expected return-beta models defined with single regression betas give rise to $\lambda$ with multiple regression interpretation that one can use to test factor pricing.


In the context of expected return-beta models, it has been more traditional to evaluate the relative strengths of models by testing the factor risk premia $\lambda$ of additional factors, rather than test whether their $b$ is zero. (The $b$’s are not the same as the $\beta$’s. $b$ are the regression coefficients of $m$ on $f$; $\beta$ are the regression coefficients of $R^i$ on $f$.)

To keep the equations simple, I’ll use mean-zero factors, excess returns, and normalize to $E(m) = 1$, since the mean of $m$ is not identified with excess returns.

The parameters $b$ and $\lambda$ are related by

$$\lambda = E(ff')b.$$

See section 7.3. Briefly,

$$0 = E(mR^e) = E\left[R^e(1 - f'b)\right]$$

$$E(R^e) = \mathrm{cov}(R^e, f')b = \mathrm{cov}(R^e, f')E(ff')^{-1}E(ff')b = \beta'\lambda.$$

Thus, when the factors are orthogonal, $E(ff')$ is diagonal, and each $\lambda_j = 0$ if and only if the corresponding $b_j = 0$. The distinction between $b$ and $\lambda$ only matters when the factors are correlated. Factors are often correlated, however.

$\lambda_j$ captures whether factor $f_j$ is priced. We can write $\lambda = E[f(f'b)] = -E(mf)$ to see that $\lambda$ is (the negative of) the price that the discount factor $m$ assigns to $f$. $b$ captures whether factor $f_j$ is marginally useful in pricing assets, given the presence of other factors. If $b_j = 0$, we can price assets just as well without factor $f_j$ as with it.

$\lambda_j$ is proportional to the single regression coefficient of $m$ on $f_j$: with mean-zero factors, $\lambda_j = -\mathrm{cov}(m, f_j)$. $\lambda_j = 0$ asks the corresponding single regression coefficient question: “is factor $j$ correlated with the true discount factor?”

$b_j$ is the multiple regression coefficient of $m$ on $f_j$ given all the other factors. This just follows from $m = b'f$. (Regressions don’t have to have error terms!) A multiple regression coefficient $\beta_j$ in $y = x\beta + \varepsilon$ is the way to answer “does $x_j$ help to explain variation in $y$ given the presence of the other $x$’s?” When you want to ask the question, “should I include factor $j$ given the other factors?” you want to ask the multiple regression question.

For example, suppose the CAPM is true, which is the single factor model

$$m = a - bR^{em}$$

where $R^{em}$ is the market excess return. Consider any other excess return $R^{ex}$, positively correlated with $R^{em}$ (x for extra). If we try a factor model with the spurious factor $R^{ex}$, the answer is

$$m = a - bR^{em} + 0 \times R^{ex}.$$

$b_x$ is obviously zero, indicating that adding this factor does not help to price assets.

However, since the correlation of $R^{ex}$ with $R^{em}$ is positive, the beta of $R^{ex}$ on $R^{em}$ is positive, $R^{ex}$ earns a positive expected excess return, and $\lambda_x = E(R^{ex}) > 0$. In the expected return-beta model

$$E(R^{ei}) = \beta_{im}\lambda_m + \beta_{ix}\lambda_x$$

$\lambda_m = E(R^{em})$ is unchanged by the addition of the spurious factor. However, since the factors $R^{em}, R^{ex}$ are correlated, the multiple regression betas of $R^{ei}$ on the factors change when we add the extra factor $R^{ex}$. If $\beta_{ix}$ is positive, $\beta_{im}$ will decline from its single-regression value, so the new model explains the same expected return $E(R^{ei})$. The expected return-beta model will indicate a risk premium for $\beta_x$ exposure, and many assets will have $\beta_x$ exposure ($R^x$ for example!) even though factor $R^x$ is spurious. In particular, $R^{ex}$ will of course have multiple regression coefficients $\beta_{x,m} = 0$ and $\beta_{x,x} = 1$, and its expected return will be entirely explained by the new factor $x$.

So, as usual, the answer depends on the question. If you want to know whether factor $i$ is priced, look at $\lambda_i$ (or $E(m f_i)$). If you want to know whether factor $i$ helps to price other assets, look at $b_i$. This is not an issue about sampling error or testing. All moments above are population values.

Of course, testing $b = 0$ is particularly easy in the GMM, $p = E(mx)$ setup. But you can always test the same ideas in any expression of the model. In an expected return-beta model, estimate $b$ by $E(ff')^{-1}\lambda$ and test the elements of that vector rather than $\lambda$ itself.
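The spurious-factor example can be checked numerically with population moments. The sketch below is only illustrative: the numbers are hypothetical, and it assumes the demeaned normalization $m = 1 - b'(f - E(f))$, under which $b = \mathrm{cov}(f, f')^{-1}\lambda$ (the covariance matrix replaces the second-moment matrix that appears in the text's formula).

```python
import numpy as np

# Hypothetical population moments (annual, in decimals); chosen for illustration only.
E_mkt, sd_mkt = 0.08, 0.16      # market excess return: mean and volatility
beta_x, sd_noise = 0.5, 0.10    # R^ex = beta_x * R^em + noise, so the CAPM prices R^ex

# Factors f = (R^em, R^ex). Both are excess returns, so lambda = E(f).
lam = np.array([E_mkt, beta_x * E_mkt])
cov_f = np.array([
    [sd_mkt**2,          beta_x * sd_mkt**2],
    [beta_x * sd_mkt**2, beta_x**2 * sd_mkt**2 + sd_noise**2],
])

# Assumed normalization m = 1 - b'(f - E f), which gives b = cov(f, f')^{-1} lambda.
b = np.linalg.solve(cov_f, lam)

print("lambda:", lam)   # both premia are positive: the extra factor looks "priced"
print("b:     ", b)     # the second element is zero up to rounding
```

The extra factor earns a positive $\lambda_x$ yet has $b_x = 0$, which is the distinction the section draws.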

14.5.1 Mean-variance frontier and performance evaluation

A GMM, $p = E(mx)$ approach to testing whether a return expands the mean-variance frontier. Just test whether $m = a + bR$ prices all returns. If there is no risk free rate, use two values of $a$.

We often summarize asset return data by mean-variance frontiers. For example, a large literature has examined the desirability of international diversification in a mean-variance context. Stock returns from many countries are not perfectly correlated, so it looks like one can reduce portfolio variance a great deal for the same mean return by holding an internationally diversified portfolio. But is this real or just sampling error? Even if the value-weighted portfolio were ex-ante mean-variance efficient, an ex-post mean-variance frontier constructed from historical returns on NYSE stocks would leave the value-weighted portfolio well inside the ex-post frontier. So is "I should have bought Japanese stocks in 1960" (and sold them in 1990!) a signal that broad-based international diversification is a good idea now, or is it simply 20/20 hindsight regret like "I should have bought Microsoft in 1982?" Similarly, when evaluating fund managers, we want to know whether the manager is truly able to form a portfolio that beats mean-variance efficient passive portfolios, or whether better performance in sample is just due to luck.

Since a factor model is true if and only if a linear combination of the factors (or factor-mimicking portfolios if the factors are not returns) is mean-variance efficient, one can interpret a test of any factor pricing model as a test whether a given return is on the mean-variance frontier. Section 13.1 showed how the Gibbons, Ross and Shanken pricing error statistic can be interpreted as a test whether a given portfolio is on the mean-variance frontier, when returns and factors are i.i.d., and the GMM distribution theory of that test statistic allows us to extend the test to non-i.i.d. errors. A GMM, $p = E(mx)$, $m = a - bR^p$ test analogously tests whether $R^p$ is on the mean-variance frontier of the test assets.

Figure 25. Mean variance frontiers might intersect rather than coincide. (Axes: $E(R)$ against $\sigma(R)$; labels: $1/E(m)$, "Frontiers intersect".)

We may want to go one step further, and not just test whether a combination of a set of assets $R^d$ (say, domestic assets) is on the mean-variance frontier, but whether the $R^d$ assets span the mean-variance frontier of $R^d$ and $R^i$ (say, foreign or international) assets. The trouble is that if there is no riskfree rate, the frontier generated by $R^d$ might just intersect the frontier generated by $R^d$ and $R^i$ together, rather than span or coincide with the latter frontier, as shown in Figure 25. Testing that $m = a - b'R^d$ prices both $R^d$ and $R^i$ only checks for intersection.

DeSantis (1992) and Chen and Knez (1992, 1993) show how to test for spanning as opposed to intersection. For intersection, $m = a - b_d'R^d$ will price both $R^d$ and $R^i$ only for one value of $a$, or equivalently $E(m)$ or choice of the intercept, as shown. If the frontiers coincide or span, then $m = a - b_d'R^d$ prices both $R^d$ and $R^i$ for any value of $a$. Thus, we can test for coincident frontiers by testing whether $m = a - b_d'R^d$ prices both $R^d$ and $R^i$ for two prespecified values of $a$ simultaneously.

To see how this works, start by noting that there must be at least two assets in $R^d$. If not, there is no mean-variance frontier of $R^d$ assets; it is simply a point. If there are two assets in $R^d$, $R^{d1}$ and $R^{d2}$, then the mean-variance frontier of domestic assets connects them; they are each on the frontier. If they are both on the frontier, then there must be discount factors

$$m^1 = a^1 - \tilde b^1 R^{d1}$$

and

$$m^2 = a^2 - \tilde b^2 R^{d2}$$

and, of course, any linear combination,

$$m = \left[\lambda a^1 + (1-\lambda) a^2\right] - \left[\lambda \tilde b^1 R^{d1} + (1-\lambda)\tilde b^2 R^{d2}\right].$$

Equivalently, for any value of $a$, there is a discount factor of the form

$$m = a - \left(b^1 R^{d1} + b^2 R^{d2}\right).$$

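Thus, you can test for spanning with a $J_T$ test on the moments

$$E\left[(a^1 - b^{1\prime}R^d)R^d\right] = 0$$

$$E\left[(a^1 - b^{1\prime}R^d)R^i\right] = 0$$

$$E\left[(a^2 - b^{2\prime}R^d)R^d\right] = 0$$

$$E\left[(a^2 - b^{2\prime}R^d)R^i\right] = 0$$

for any two fixed values of $a^1, a^2$.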
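The difference between intersection and spanning can be illustrated with population moments. The following is a minimal sketch under assumptions that are not in the text: returns are gross returns priced by $1 = E(mR)$, and the extra asset is a hypothetical $R^i = c + w'R^d + \varepsilon$ with $\varepsilon$ mean zero and uncorrelated with $R^d$. All numbers are made up.

```python
import numpy as np

def pricing_error(a, mu_d, cov_d, c, w):
    """Return E[m R^i] - 1 for m = a - b'R^d, with b set so that m prices R^d."""
    ones = np.ones(len(mu_d))
    second_moment = cov_d + np.outer(mu_d, mu_d)        # E[R^d R^d']
    # 1 = E[m R^d] = a E[R^d] - E[R^d R^d'] b pins down b for the chosen a
    b = np.linalg.solve(second_moment, a * mu_d - ones)
    mean_m = a - b @ mu_d                               # E[m]
    # Hypothetical extra asset R^i = c + w'R^d + eps, so E[m R^i] = c E[m] + w'1
    return c * mean_m + w @ ones - 1.0

mu_d = np.array([1.05, 1.10])                           # mean gross returns (made up)
cov_d = np.array([[0.020, 0.005], [0.005, 0.040]])
a_values = (0.5, 2.0)                                   # two prespecified values of a

# Spanned case: R^i is a portfolio of R^d (weights sum to one) plus pure noise.
spanned = [pricing_error(a, mu_d, cov_d, 0.0, np.array([0.25, 0.75])) for a in a_values]
# Intersection case: R^i is priced only at the one a for which E[m] = (1 - w'1) / c.
intersect = [pricing_error(a, mu_d, cov_d, 0.2, np.array([0.3, 0.5])) for a in a_values]

print("spanned:  ", spanned)
print("intersect:", intersect)
```

In the spanned case the pricing error vanishes for both values of $a$; in the other case it is zero for at most one value of $a$, so checking two prespecified values separates the two situations.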

14.6 Problems

1. Work out the GMM distribution theory for the model $m = 1 - b'(f - E(f))$ when the test assets are excess returns. The distribution should recognize the fact that $E(f)$ is estimated in sample. To do this, set up

$$g_T = \begin{bmatrix} E_T\left(R^e - R^e (f' - Ef')\, b\right) \\ E_T(f - Ef) \end{bmatrix}, \qquad a_T = \begin{bmatrix} E_T(\tilde f R^{e\prime}) & 0 \\ 0 & I_K \end{bmatrix}.$$

The estimated parameters are $b, E(f)$. You should end up with a formula for the standard error of $b$ that resembles the Shanken correction (13.19), and an unchanged $J_T$ test.

Chapter 15. Maximum likelihood

Maximum likelihood is, like GMM, a general organizing principle that is a useful place to start when thinking about how to choose parameters and evaluate a model. It comes with an asymptotic distribution theory, which, like GMM, is a good place to start when you are unsure about how to treat various problems such as the fact that betas must be estimated in a cross-sectional regression.

As we will see, maximum likelihood is a special case of GMM. Given a statistical description of the data, it prescribes which moments are statistically most informative. Given those moments, ML and GMM are the same. Thus, ML can be used to defend why one picks a certain set of moments, or for advice on which moments to pick if one is unsure. In this sense, maximum likelihood (paired with carefully chosen statistical models) justifies the regression tests above, as it justifies standard regressions. On the other hand, ML does not easily allow you to use other non-"efficient" moments, if you suspect that ML's choices are not robust to misspecifications of the economic or statistical model. For example, ML will tell you how to do GLS, but it will not tell you how to adjust OLS standard errors for non-standard error terms.

15.1 Maximum likelihood

The maximum likelihood principle says to pick the parameters that make the observed data most likely. Maximum likelihood estimates are asymptotically efficient. The information matrix gives the asymptotic standard errors of ML estimates.

The maximum likelihood principle says to pick that set of parameters that makes the observed data most likely. This is not "the set of parameters that are most likely given the data": in classical (as opposed to Bayesian) statistics, parameters are numbers, not random variables.

To implement this idea, you first have to figure out what the probability of seeing a data set $\{x_t\}$ is, given the free parameters $\theta$ of a model. This probability distribution is called the likelihood function $f(\{x_t\}; \theta)$. Then, the maximum likelihood principle says to pick

$$\hat\theta = \arg\max_{\{\theta\}} f(\{x_t\}; \theta).$$

For reasons that will soon be obvious, it's much easier to work with the log of this probability distribution

$$L(\{x_t\}; \theta) = \ln f(\{x_t\}; \theta).$$

Maximizing the log likelihood is the same thing as maximizing the likelihood.

Finding the likelihood function isn't always easy. In a time-series context, the best way to do it is often to first find the conditional likelihood function $f(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta)$, the chance of seeing $x_t$ given $x_{t-1}, x_{t-2}, \ldots$ and given values for the parameters. Since joint probability is the product of conditional probabilities, the log likelihood function is just the sum of the conditional log likelihood functions,

$$L(\{x_t\}; \theta) = \sum_{t=1}^{T} \ln f(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta). \tag{15.1}$$

More concretely, we usually assume normal errors, so the likelihood function is

$$L = -\frac{T}{2}\ln\left(2\pi|\Sigma|\right) - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t'\Sigma^{-1}\varepsilon_t \tag{15.2}$$

where $\varepsilon_t$ denotes a vector of shocks; $\varepsilon_t = x_t - E(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta)$.

This expression gives a simple recipe for constructing a likelihood function. You usually start with a model that generates $x_t$ from errors, e.g. $x_t = \rho x_{t-1} + \varepsilon_t$. Invert that model to express the errors $\varepsilon_t$ in terms of the data $\{x_t\}$ and plug in to (15.2).
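The recipe can be sketched for the AR(1) example. This is an illustration under assumptions of my own: a scalar series, known error variance `sigma2 = 1`, simulated data, and conditioning on the first observation. The function name `ar1_cond_loglik` is hypothetical.

```python
import numpy as np

def ar1_cond_loglik(rho, x, sigma2=1.0):
    """Conditional log likelihood of x_t = rho * x_{t-1} + eps_t with normal errors."""
    eps = x[1:] - rho * x[:-1]          # invert the model to recover the shocks
    T = eps.size
    return -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(eps**2) / sigma2

# Simulate a sample (seed and parameter values are arbitrary)
rng = np.random.default_rng(0)
rho_true, T = 0.8, 500
x = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = rho_true * x[t - 1] + rng.standard_normal()

# Maximize by brute force on a grid with step 0.001, then compare with OLS
grid = np.linspace(-0.99, 0.99, 1981)
rho_grid = grid[np.argmax([ar1_cond_loglik(r, x) for r in grid])]
rho_ols = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

print(rho_grid, rho_ols)
```

The grid maximizer agrees with the no-constant OLS coefficient up to the grid spacing, since the log likelihood is quadratic in $\rho$.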

There is a small issue about how to start off a model such as (15.1). Ideally, the first observation should be the unconditional density, i.e.

$$L(\{x_t\}; \theta) = \ln f(x_1; \theta) + \ln f(x_2 | x_1; \theta) + \ln f(x_3 | x_2, x_1; \theta) + \ldots$$

However, it is usually hard to evaluate the unconditional density or the first terms with only a few lagged $x$s. Therefore, if as usual the conditional density can be expressed in terms of a finite number $k$ of lags of $x_t$, one often maximizes the conditional likelihood function (conditional on the first $k$ observations), treating the first $k$ observations as fixed rather than random variables.

$$L(\{x_t\}; \theta) = \ln f(x_{k+1} | x_k, x_{k-1}, \ldots, x_1; \theta) + \ln f(x_{k+2} | x_{k+1}, x_k, \ldots, x_2; \theta) + \ldots$$

Alternatively, one can treat $k$ pre-sample values $\{x_0, x_{-1}, \ldots, x_{-k+1}\}$ as additional parameters over which to maximize the likelihood function.

Maximum likelihood estimators come with a useful asymptotic (i.e. approximate) distribution theory. First, the distribution of the estimates is

$$\hat\theta \sim N\left(\theta, \left[-\frac{\partial^2 L}{\partial\theta\,\partial\theta'}\right]^{-1}\right). \tag{15.3}$$

If the likelihood $L$ has a sharp peak at $\hat\theta$, then we know a lot about the parameters, while if the peak is flat, other parameters are just as plausible. The maximum likelihood estimator is asymptotically efficient, meaning that no other estimator can produce a smaller covariance matrix.

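The second derivative in (15.3) is known as the information matrix,

$$I = -\frac{1}{T}\frac{\partial^2 L}{\partial\theta\,\partial\theta'} = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2 \ln f(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta)}{\partial\theta\,\partial\theta'}. \tag{15.4}$$

(More precisely, the information matrix is defined as the expected value of the second partial, which is estimated with the sample value.) The information matrix can also be estimated as a product of first derivatives. The expression

$$I = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta)}{\partial\theta}\right)\left(\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots, x_0; \theta)}{\partial\theta}\right)'$$

converges to the same value as (15.4). (Hamilton 1994 p.429 gives a proof.)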
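A quick numerical check of the two estimates, again using an AR(1) with known unit error variance as a stand-in model. The data are simulated, and the sample size, seed and variable names are illustrative choices, not anything from the text.

```python
import numpy as np

# Simulate x_t = rho * x_{t-1} + eps_t with sigma^2 = 1 (treated as known)
rng = np.random.default_rng(1)
rho_true, T = 0.5, 20000
x = np.zeros(T + 1)
for t in range(1, T + 1):
    x[t] = rho_true * x[t - 1] + rng.standard_normal()

x_lag, x_cur = x[:-1], x[1:]
rho_hat = (x_lag @ x_cur) / (x_lag @ x_lag)     # ML estimate (same as OLS)

# For this model ln f = const - (x_t - rho x_{t-1})^2 / 2, so
#   score_t   = (x_t - rho x_{t-1}) x_{t-1}
#   d2 ln f_t = -x_{t-1}^2
score = (x_cur - rho_hat * x_lag) * x_lag
info_hessian = np.mean(x_lag**2)                # minus the average second derivative
info_outer = np.mean(score**2)                  # average squared score (outer product)

se_rho = np.sqrt(1.0 / (T * info_hessian))      # asymptotic standard error of rho_hat
print(info_hessian, info_outer, se_rho)
```

With a long sample the two numbers are close; in short samples they can differ noticeably.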

If we estimate a model restricting the parameters, the maximum value of the likelihood function will necessarily be lower. However, if the restriction is true, it shouldn't be that much lower. This intuition is captured in the likelihood ratio test

$$2\left(L_{\text{unrestricted}} - L_{\text{restricted}}\right) \sim \chi^2_{\text{number of restrictions}} \tag{15.5}$$

The form and idea of this test is much like the $\chi^2$ difference test for GMM objectives that we met in section ??.

15.2 ML is GMM on the scores

ML is a special case of GMM. ML uses the information in the auxiliary statistical model to derive statistically most informative moment conditions. To see this fact, start with the first order conditions for maximizing a likelihood function

$$\frac{\partial L(\{x_t\}; \theta)}{\partial\theta} = \sum_{t=1}^{T}\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta} = 0. \tag{15.6}$$

This is a GMM estimate. It is the sample counterpart to a population moment condition

$$g(\theta) = E\left[\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}\right] = 0. \tag{15.7}$$

The term $\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)/\partial\theta$ is known as the "score." It is a random variable, formed as a combination of current and past data $(x_t, x_{t-1}, \ldots)$. Thus, maximum likelihood is a special case of GMM, a special choice of which moments to examine.


For example, suppose that $x$ follows an AR(1) with known variance,

$$x_t = \rho x_{t-1} + \varepsilon_t,$$

and suppose the error terms are i.i.d. normal random variables. Then,

$$\ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \rho) = \text{const.} - \frac{\varepsilon_t^2}{2\sigma^2} = \text{const.} - \frac{(x_t - \rho x_{t-1})^2}{2\sigma^2}$$

and the score is

$$\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \rho)}{\partial\rho} = \frac{(x_t - \rho x_{t-1})\,x_{t-1}}{\sigma^2}.$$

The first order condition for maximizing likelihood is

$$\frac{1}{T}\sum_{t=1}^{T}(x_t - \rho x_{t-1})\,x_{t-1} = 0.$$

This expression is a moment condition, and you'll recognize it as the OLS estimator of $\rho$, which we have already regarded as a case of GMM.

The example shows another property of scores: The scores should be unforecastable. In the example,

$$E_{t-1}\left[\frac{(x_t - \rho x_{t-1})\,x_{t-1}}{\sigma^2}\right] = E_{t-1}\left[\frac{\varepsilon_t x_{t-1}}{\sigma^2}\right] = 0. \tag{15.8}$$

Intuitively, if we used a combination of the $x$ variables $E(h(x_t, x_{t-1}, \ldots)) = 0$ that was predictable, we could form another moment (an instrument) that described the predictability of the $h$ variable and use that moment to get more information about the parameters. To prove this property more generally, start with the fact that $f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)$ is a conditional density and therefore must integrate to one,

$$1 = \int f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)\,dx_t$$

$$0 = \int \frac{\partial f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}\,dx_t$$

$$0 = \int \frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}\,f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)\,dx_t$$

$$0 = E_{t-1}\left[\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}\right].$$

Furthermore, as you might expect, the GMM distribution theory formulas give the same result as the ML distribution, i.e., the information matrix is the asymptotic variance-covariance matrix. To show this fact, apply the GMM distribution theory (12.2) to (15.6). The derivative matrix is

$$d = \frac{\partial g_T(\theta)}{\partial\theta'} = \frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2 \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta\,\partial\theta'} = -I.$$

This is the second derivative expression of the information matrix. The $S$ matrix is

$$E\left[\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}\,\frac{\partial \ln f(x_t | x_{t-1}, x_{t-2}, \ldots; \theta)}{\partial\theta}'\right] = I.$$

The lead and lag terms in $S$ are all zero since we showed above that scores should be unforecastable. This is the outer product definition of the information matrix. There is no $a$ matrix, since the moments themselves are set to zero. The GMM asymptotic distribution of $\hat\theta$ is therefore

$$\sqrt{T}\,(\hat\theta - \theta) \to N\left[0, d^{-1} S d^{-1\prime}\right] = N\left[0, I^{-1}\right].$$

We recover the inverse information matrix, as specified by the ML asymptotic distribution theory.

15.3 When factors are returns, ML prescribes a time-series regression.

I add to the economic model $E(R^e) = \beta E(f)$ a statistical assumption that the regression errors are independent over time and independent of the factors. ML then prescribes a time-series regression with no constant. To prescribe a time series regression with a constant, we drop the model prediction $\alpha = 0$. I show how the information matrix gives the same result as the OLS standard errors.

Given a linear factor model whose factors are also returns, as with the CAPM, ML prescribes a time-series regression test. To keep notation simple, I again treat a single factor $f$. The economic model is

$$E(R^e) = \beta E(f) \tag{15.9}$$

$R^e$ is an $N \times 1$ vector of test assets, and $\beta$ is an $N \times 1$ vector of regression coefficients of these assets on the factor (the market return $R^{em}$ in the case of the CAPM).

To apply maximum likelihood, we need to add an explicit statistical model that fully describes the joint distribution of the data. I assume that the market return and regression errors are i.i.d. normal, i.e.

$$R^e_t = \alpha + \beta f_t + \varepsilon_t \tag{15.10}$$

$$f_t = E(f) + u_t$$

$$\begin{bmatrix}\varepsilon_t \\ u_t\end{bmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\Sigma & 0 \\ 0 & \sigma_u^2\end{bmatrix}\right)$$

(We can get by with non-normal factors, but it is easier not to present the general case.) Equation (15.10) has no content other than normality. The zero correlation between $u_t$ and $\varepsilon_t$ identifies $\beta$ as a regression coefficient. You can just write $R^e, R^{em}$ as a general bivariate normal, and you will get the same results.

The economic model (15.9) implies restrictions on this statistical model. Taking expectations of (15.10), the CAPM implies that the intercepts $\alpha$ should all be zero. Again, this is also the only restriction that the CAPM places on the statistical model (15.10).

The most principled way to apply maximum likelihood is to impose the null hypothesis throughout. Thus, we write the likelihood function imposing $\alpha = 0$. To construct the likelihood function, we reduce the statistical model to independent error terms, and then add their log probability densities to get the likelihood function.

$$L = (\text{const.}) - \frac{1}{2}\sum_{t=1}^{T}(R^e_t - \beta f_t)'\Sigma^{-1}(R^e_t - \beta f_t) - \frac{1}{2}\sum_{t=1}^{T}\frac{(f_t - E(f))^2}{\sigma_u^2}$$

The estimates follow from the first order conditions,

$$\frac{\partial L}{\partial\beta} = \Sigma^{-1}\sum_{t=1}^{T}(R^e_t - \beta f_t)\,f_t = 0 \;\Rightarrow\; \hat\beta = \left(\sum_{t=1}^{T} f_t^2\right)^{-1}\sum_{t=1}^{T} R^e_t f_t$$

$$\frac{\partial L}{\partial E(f)} = \frac{1}{\sigma_u^2}\sum_{t=1}^{T}(f_t - E(f)) = 0 \;\Rightarrow\; \hat E(f) = \hat\lambda = \frac{1}{T}\sum_{t=1}^{T} f_t$$

($\partial L/\partial\Sigma$ and $\partial L/\partial\sigma^2$ also produce ML estimates of the covariance matrices, which turn out to be the standard averages of squared residuals.)

The ML estimate of $\beta$ is the OLS regression without a constant. The null hypothesis says to leave out the constant, and the ML estimator uses that fact to avoid estimating a constant. Since the factor risk premium is equal to the market return, it's not too surprising that the $\lambda$ estimate is the same as that of the average market return.

We know that the ML distribution theory must give the same result as the GMM distribution theory which we already derived in section 13.1, but it's worth seeing it explicitly. The asymptotic standard errors follow from either estimate of the information matrix, for example

$$\frac{\partial^2 L}{\partial\beta\,\partial\beta'} = -\Sigma^{-1}\sum_{t=1}^{T} f_t^2.$$

Thus,

$$\mathrm{cov}(\hat\beta) = \frac{1}{T}\frac{1}{E(f^2)}\Sigma = \frac{1}{T}\frac{1}{E(f)^2 + \sigma^2(f)}\Sigma. \tag{15.11}$$

This is the standard OLS formula.

We also want pricing error measurements, standard errors and tests. We can apply maximum likelihood to estimate an unconstrained model, containing intercepts, and then use Wald tests (estimate/standard error) to test the restriction that the intercepts are zero. We can also use the unconstrained model to run the likelihood ratio test. The unconstrained likelihood function is

$$L = (\text{const.}) - \frac{1}{2}\sum_{t=1}^{T}(R^e_t - \alpha - \beta f_t)'\Sigma^{-1}(R^e_t - \alpha - \beta f_t) + \ldots$$

(I ignore the term in the factor, since it will again just tell us to use the sample mean to estimate the factor risk premium.)

The estimates are now

$$\frac{\partial L}{\partial\alpha} = \Sigma^{-1}\sum_{t=1}^{T}(R^e_t - \alpha - \beta f_t) = 0 \;\Rightarrow\; \hat\alpha = E_T(R^e_t) - \hat\beta E_T(f_t)$$

$$\frac{\partial L}{\partial\beta} = \Sigma^{-1}\sum_{t=1}^{T}(R^e_t - \alpha - \beta f_t)\,f_t = 0 \;\Rightarrow\; \hat\beta = \frac{\mathrm{cov}_T(R^e_t, f_t)}{\sigma_T^2(f_t)}$$

Unsurprisingly, the maximum likelihood estimates of $\alpha$ and $\beta$ are the OLS estimates, with a constant.

The inverse of the information matrix gives the asymptotic distribution of these estimates. Since they are just OLS estimates, we're going to get the OLS standard errors, but it's worth seeing it come out of ML.

$$\left[-\frac{1}{T}\frac{\partial^2 L}{\partial\begin{bmatrix}\alpha \\ \beta\end{bmatrix}\partial\begin{bmatrix}\alpha' & \beta'\end{bmatrix}}\right]^{-1} = \begin{bmatrix}\Sigma^{-1} & \Sigma^{-1}E(f) \\ \Sigma^{-1}E(f) & \Sigma^{-1}E(f^2)\end{bmatrix}^{-1} = \frac{1}{\sigma^2(f)}\begin{bmatrix}E(f^2) & -E(f) \\ -E(f) & 1\end{bmatrix}\otimes\Sigma$$

The covariance matrices of $\hat\alpha$ and $\hat\beta$ are thus

$$\mathrm{cov}(\hat\alpha) = \frac{1}{T}\left[1 + \left(\frac{E(f)}{\sigma(f)}\right)^2\right]\Sigma$$

$$\mathrm{cov}(\hat\beta) = \frac{1}{T}\frac{1}{\sigma^2(f)}\Sigma. \tag{15.12}$$

These are just the usual OLS standard errors, which we derived in section 13.1 as a special case of GMM standard errors for the OLS time-series regressions when errors are uncorrelated over time and independent of the factors, or by specializing $\sigma^2(X'X)^{-1}$.

You cannot just invert $\partial^2 L/\partial\alpha\,\partial\alpha'$ to find the covariance of $\hat\alpha$. That attempt would give just $\Sigma$ as the covariance matrix of $\hat\alpha$, which would be wrong. You have to invert the entire information matrix to get the standard error of any parameter. Otherwise, you are ignoring the effect that estimating $\beta$ has on the distribution of $\hat\alpha$. In fact, what I presented is really wrong, since we also must estimate $\Sigma$. However, it turns out that $\hat\Sigma$ is independent of $\hat\alpha$ and $\hat\beta$ (the information matrix is block-diagonal), so the top left two elements of the true inverse information matrix are the same as I have written here.

The variance of $\hat\beta$ in (15.12) is larger than it is in (15.11), where we impose the null of no constant. ML uses all the information it can to produce efficient estimates, that is, estimates with the smallest possible covariance matrix. The ratio of the two formulas is equal to the familiar term $1 + E(f)^2/\sigma^2(f)$. In annual data for the CAPM, $\sigma(R^{em}) = 16\%$, $E(R^{em}) = 8\%$ means that the unrestricted estimate (15.12) has a variance 25% larger than the restricted estimate (15.11), so the gain in efficiency can be important. In monthly data, however, the gain is smaller since variance and mean both scale with the horizon.
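The arithmetic behind the 25% figure, and the smaller monthly gain, can be spelled out as follows. The monthly scaling assumes i.i.d. returns, so that the mean and the variance are both divided by 12; that assumption is mine, made to illustrate the scaling remark.

```python
# Annual CAPM numbers from the text
mean_annual, sd_annual = 0.08, 0.16

# Ratio of the unrestricted variance (15.12) to the restricted variance (15.11)
ratio_annual = 1 + (mean_annual / sd_annual) ** 2
print(ratio_annual)          # 1.25

# Monthly horizon, assuming i.i.d. returns: mean and variance both scale by 1/12
months = 12
mean_monthly = mean_annual / months
var_monthly = sd_annual ** 2 / months
ratio_monthly = 1 + mean_monthly ** 2 / var_monthly
print(ratio_monthly)         # roughly 1.02, a much smaller efficiency gain
```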

We can also view this fact as a warning: ML will ruthlessly exploit the null hypothesis and do things like running regressions without a constant in order to get any small improvement in efficiency.

We can use these covariance matrices to construct a Wald (estimate/standard error) test of the restriction of the model that the alphas are all zero,

$$T\left[1 + \left(\frac{E(f)}{\sigma(f)}\right)^2\right]^{-1}\hat\alpha'\Sigma^{-1}\hat\alpha \sim \chi^2_N. \tag{15.13}$$

Again, we already derived this $\chi^2$ test in (13.3), and its finite sample $F$ counterpart, the GRS $F$ test (13.4).

The other test of the restrictions is the likelihood ratio test (15.5). Quite generally, likelihood ratio tests are asymptotically equivalent to Wald tests, and so give the same result. Showing it in this case is not worth the algebra.
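As a rough sketch, the Wald statistic (15.13) can be computed from a time-series regression with a constant. Everything about the data here is an assumption for illustration: the returns are simulated, the function name `wald_alpha_stat` is hypothetical, and the 5% critical value 7.815 for $\chi^2_3$ is hard-coded rather than taken from a statistics library.

```python
import numpy as np

def wald_alpha_stat(Re, f):
    """Wald statistic (15.13) for alpha = 0. Re is T x N, f is a length-T factor."""
    T, N = Re.shape
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, Re, rcond=None)     # OLS with a constant, asset by asset
    alpha = coef[0]
    resid = Re - X @ coef
    Sigma = resid.T @ resid / T                       # ML estimate of residual covariance
    scale = 1.0 + f.mean() ** 2 / f.var()             # 1 + (E(f) / sigma(f))^2
    return T * alpha @ np.linalg.solve(Sigma, alpha) / scale

# Simulated data (all parameter values are made up)
rng = np.random.default_rng(0)
T, N = 240, 3
f = 0.5 + 4.0 * rng.standard_normal(T)
beta = np.array([0.8, 1.0, 1.2])
Re_null = np.outer(f, beta) + rng.standard_normal((T, N))   # alphas are zero
Re_alt = Re_null + 1.0                                      # every alpha equals one

stat_null = wald_alpha_stat(Re_null, f)
stat_alt = wald_alpha_stat(Re_alt, f)
CHI2_3_CRIT_5PCT = 7.815                                    # 5% critical value, chi-square(3)
print(stat_null, stat_alt, stat_alt > CHI2_3_CRIT_5PCT)
```

Under the null the statistic is a draw from roughly a $\chi^2_3$ distribution; with large alphas it is far above the critical value.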

15.4 When factors are not excess returns, ML prescribes a cross-sectional regression

If the factors are not returns, we don't have a choice between time-series and cross-sectional regression, since the intercepts are not zero. As you might suspect, ML prescribes a cross-sectional regression in this case.

The factor model, expressed in expected return beta form, is

$$E(R^{ei}) = \alpha_i + \beta_i'\lambda, \quad i = 1, 2, \ldots N \tag{15.14}$$

The betas are defined from time-series regressions

$$R^{ei}_t = a_i + \beta_i' f_t + \varepsilon^i_t \tag{15.15}$$

The intercepts $a_i$ in the time-series regressions need not be zero, since the model does not apply to the factors. They are not unrestricted, however. Taking expectations of the time-series regression (15.15) and comparing it to (15.14) (as we did to derive the restriction $\alpha = 0$ for the time-series regression), the restriction $\alpha = 0$ implies

$$a_i = \beta_i'\left(\lambda - E(f_t)\right) \tag{15.16}$$

Plugging into (15.15), the time series regressions must be of the restricted form

$$R^{ei}_t = \beta_i'\lambda + \beta_i'\left[f_t - E(f_t)\right] + \varepsilon^i_t. \tag{15.17}$$

In this form, you can see that $\beta_i'\lambda$ determines the mean return. Since there are fewer factors than returns, this is a restriction on the regression (15.17).

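Stack assets $i = 1, 2, \ldots N$ to a vector and introduce the auxiliary statistical model that the errors and factors are i.i.d. normal and uncorrelated with each other. Then, the restricted model is

$$R^e_t = B\lambda + B\left[f_t - E(f_t)\right] + \varepsilon_t$$

$$f_t = E(f) + u_t$$

$$\begin{bmatrix}\varepsilon_t \\ u_t\end{bmatrix} \sim N\left(0, \begin{bmatrix}\Sigma & 0 \\ 0 & V\end{bmatrix}\right)$$

where $B$ denotes a $N \times K$ matrix of regression coefficients of the $N$ assets on the $K$ factors. The likelihood function is

$$L = (\text{const.}) - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t'\Sigma^{-1}\varepsilon_t - \frac{1}{2}\sum_{t=1}^{T}u_t'V^{-1}u_t$$

$$\varepsilon_t = R^e_t - B\left[\lambda + f_t - E(f)\right], \quad u_t = f_t - E(f).$$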


Maximizing the likelihood function,

$$\frac{\partial L}{\partial E(f)}: \quad 0 = \sum_{t=1}^{T}B'\Sigma^{-1}\left(R^e_t - B\left[\lambda + f_t - E(f)\right]\right) + \sum_{t=1}^{T}V^{-1}\left(f_t - E(f)\right)$$

$$\frac{\partial L}{\partial\lambda}: \quad 0 = B'\sum_{t=1}^{T}\Sigma^{-1}\left(R^e_t - B\left[\lambda + f_t - E(f)\right]\right)$$

The solution to this pair of equations is

$$\hat E(f) = E_T(f_t) \tag{15.18}$$

$$\hat\lambda = \left(B'\Sigma^{-1}B\right)^{-1}B'\Sigma^{-1}E_T(R^e_t). \tag{15.19}$$

The maximum likelihood estimate of the factor risk premium is a GLS cross-sectional regression of average returns on betas.

The maximum likelihood estimates of the regression coefficients $B$ are again not the same as the standard OLS formulas. Again, ML imposes the null to improve efficiency.

$$\frac{\partial L}{\partial B}: \quad \sum_{t=1}^{T}\Sigma^{-1}\left(R^e_t - B\left[\lambda + f_t - E(f)\right]\right)\left[\lambda + f_t - E(f)\right]' = 0 \tag{15.20}$$

$$\hat B = \sum_{t=1}^{T}R^e_t\left[f_t + \lambda - E(f)\right]'\left(\sum_{t=1}^{T}\left[f_t + \lambda - E(f)\right]\left[f_t + \lambda - E(f)\right]'\right)^{-1}$$

This is true, even though the $B$ are defined in the theory as population regression coefficients. (The matrix notation hides a lot here! If you want to rederive these formulas, it's helpful to start with scalar parameters, e.g. $B_{ij}$, and to think of it as $\partial L/\partial\theta = \sum_{t=1}^{T}(\partial L/\partial\varepsilon_t)'\,\partial\varepsilon_t/\partial\theta$.) Therefore, to really implement ML, you have to solve (15.19) and (15.20) simultaneously for $\hat\lambda$, $\hat B$, along with $\hat\Sigma$ whose ML estimate is the usual second moment matrix of the residuals. This can usually be done iteratively: Start with OLS $\hat B$, run an OLS cross-sectional regression for $\hat\lambda$, form $\hat\Sigma$, and iterate.
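The iteration described above might be sketched as follows. This is one possible reading of the procedure, not a definitive implementation: the function name `ml_cross_section`, the stopping rule, and the simulated one-factor data are all assumptions made for the example.

```python
import numpy as np

def ml_cross_section(Re, f, max_iter=500, tol=1e-12):
    """Iterate on (15.19) and (15.20). Re is T x N excess returns, f is T x K factors."""
    T = Re.shape[0]
    mean_Re = Re.mean(axis=0)
    fd = f - f.mean(axis=0)                           # f_t - E_T(f), using (15.18)
    # Starting values: OLS time-series betas, then an OLS cross-sectional lambda
    B = np.linalg.lstsq(fd, Re - mean_Re, rcond=None)[0].T       # N x K
    lam = np.linalg.lstsq(B, mean_Re, rcond=None)[0]             # length K
    for _ in range(max_iter):
        resid = Re - (fd + lam) @ B.T                 # eps_t = R_t - B[lam + f_t - E(f)]
        Sigma = resid.T @ resid / T                   # second moment matrix of residuals
        SiB = np.linalg.solve(Sigma, B)               # Sigma^{-1} B
        lam_new = np.linalg.solve(B.T @ SiB, SiB.T @ mean_Re)    # GLS cross-section (15.19)
        z = fd + lam_new                              # regressors lam + f_t - E(f)
        B_new = np.linalg.solve(z.T @ z, z.T @ Re).T  # restricted betas (15.20)
        step = max(np.abs(lam_new - lam).max(), np.abs(B_new - B).max())
        lam, B = lam_new, B_new
        if step < tol:
            break
    return lam, B, Sigma

# Simulated example: one non-return factor with E(f) = 1 but lambda = 0.5 (made-up values)
rng = np.random.default_rng(0)
T, N = 600, 5
lam_true = 0.5
f = 1.0 + 2.0 * rng.standard_normal((T, 1))
B_true = np.linspace(0.5, 1.5, N).reshape(N, 1)
Re = (lam_true + f - 1.0) @ B_true.T + rng.standard_normal((T, N))

lam_hat, B_hat, Sigma_hat = ml_cross_section(Re, f)
print(lam_hat, B_hat.ravel())
```

Each pass updates one block of parameters given the others, so the loop stops when neither $\hat\lambda$ nor $\hat B$ moves by more than the tolerance.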

15.5 Problems

1. Why do we use restricted ML when the factor is a return, but unrestricted ML when the factor is not a return? To see why, try to formulate a ML estimator based on an unrestricted regression when factors are not returns, equation (13.1). Add pricing errors $\alpha_i$ to the regression as we did for the unrestricted regression in the case that factors are returns, and then find ML estimators for $B, \lambda, \alpha, E(f)$. (Treat $V$ and $\Sigma$ as known to make the problem easier.)

Answer: Adding pricing errors to (15.17), we obtain

$$R^{ei}_t = \alpha_i + \beta_i'\lambda + \beta_i'\left[f_t - E(f_t)\right] + \varepsilon^i_t.$$

Stacking assets $i = 1, 2, \ldots N$ to a vector

$$R^e_t = \alpha + B\lambda + B\left[f_t - E(f_t)\right] + \varepsilon_t$$

where $B$ denotes a $N \times K$ matrix of regression coefficients of the $N$ assets on the $K$ factors.

If we fit this model, maximum likelihood will give asset-by-asset OLS estimates of the intercept $a = \alpha + B(\lambda - E(f_t))$ and slope coefficients $B$. It will not give separate estimates of $\alpha$ and $\lambda$. The most that the regression can hope to estimate is one intercept; if one chooses a higher value of $\lambda$, we can obtain the same error term with a lower value of $\alpha$. The likelihood surface is flat over such choices of $\alpha$ and $\lambda$. One could do an ad-hoc second stage, minimizing (say) the sum of squared $\alpha$ to choose $\lambda$ given $B$, $E(f_t)$ and $a$. This intuitively appealing procedure is exactly a cross-sectional regression. But it would be ad-hoc, not ML.

2. Instead of writing a regression, build up the ML for the CAPM a little more formally. Write the statistical model as just the assumption that individual returns and the market return are jointly normal,

$$\begin{bmatrix}R^e \\ R^{em}\end{bmatrix} \sim N\left(\begin{bmatrix}E(R^e) \\ E(R^{em})\end{bmatrix}, \begin{bmatrix}\Sigma & \mathrm{cov}(R^{em}, R^{e\prime}) \\ \mathrm{cov}(R^{em}, R^e) & \sigma_m^2\end{bmatrix}\right)$$

The model's restriction is

$$E(R^e) = \gamma\,\mathrm{cov}(R^{em}, R^e).$$

Estimate $\gamma$ and show that this is the same time-series estimator as we derived by presupposing a regression.

Chapter 16. Time-series vs. cross-section; ML vs. GMM

As you probably have already noticed, GMM, regression and ML approaches to asset pricing models all look very similar. In each case one estimates a set of parameters, such as the $a, b$ in $m_t = a + b f_t$, the $\beta, \gamma$ in $m_t = \beta(c_t/c_{t-1})^{-\gamma}$ or the $\beta_i$ in $R^{ei} = a + \beta_i f_t + \varepsilon^i_t$. Then, we calculate pricing errors and evaluate the model by a quadratic form in the pricing errors. Here I draw some additional connections and highlight the distinctions between the three approaches. I start with two facts that help to anchor the discussion: 1) ML is a special case of GMM, 2) one can approach either $p = E(mx)$ or expected return/beta expressions of asset pricing models with either ML or GMM approaches to estimation.

16.1 Time series vs. cross-section

I track down why ML prescribes a time-series regression when factors are returns and a cross-sectional regression when factors are not returns. I argue that the cross-sectional regression may be more robust to model misspecification. I show that the time-series / cross-sectional regression issue is the same as the OLS / GLS cross-sectional regression issue, and similar to the issue whether one runs time-series regressions with no intercept. In all cases, one may want to trade efficiency for robustness.

The issue

When factors are not returns, ML prescribes a cross-sectional regression. When the factors are returns, ML prescribes a time-series regression. Figure 26 illustrates the difference between the two approaches. The time-series regression estimates the factor risk premium from the average of the factors alone, ignoring any information in the other assets. For example in the CAPM, $\hat\lambda = E_T(R^{em})$. Thus, a time-series regression draws the expected return-beta line across assets by making it fit precisely on two points, the market return and the risk-free rate, and the market and riskfree rate have zero estimated pricing error in every sample. The cross-sectional regression draws the expected return-beta line by minimizing the squared pricing error across all assets. It allows some pricing error for the market return and (if the intercept is free) the riskfree rate, if by doing so the pricing errors on other assets can be reduced.

Figure 26. Time-series vs. cross-sectional regression estimates. (Vertical axis: $E(R^{ei})$; labels: market return, riskfree rate, cross-sectional regression line, time-series regression line.)

Of course, if the model is correct, the two approaches should converge as the sample gets larger. However, the difference between cross-sectional and time-series approaches can often be large and economically important in samples and models typical of current empirical work. The figure shows a stylized version of CAPM estimates on the size portfolios. High beta assets (the small firm portfolios) have higher average returns than predicted by the CAPM. However, the estimated pricing error for these assets is much smaller if one allows a cross-sectional regression to determine the market price of risk. Classical tests of the CAPM based on beta-sorted portfolios often turn out the other way: the cross-sectional regression market line is flatter than the time-series regression, with an intercept that is higher than the sample riskfree rate. As another example, Fama and French (1997) report important correlations between betas and pricing errors in a time-series test of a three-factor model on industry portfolios. This correlation cannot happen with an OLS cross-sectional estimate, as the cross-sectional estimate sets the cross-sectional correlation between right hand variables (betas) and error terms (pricing errors) to zero by construction.
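A stylized numerical version of this comparison, with made-up betas and average excess returns in which high-beta assets earn more than the CAPM predicts. The cross-sectional regression here is OLS with no intercept, which is one of several variants the text discusses.

```python
import numpy as np

# Hypothetical portfolios; the asset with beta 1.0 stands in for the market itself
betas = np.array([0.8, 1.0, 1.2, 1.4])
mean_excess = np.array([6.0, 8.0, 11.0, 14.0])     # percent per year, illustrative

# Time-series approach: lambda is the sample mean of the factor
lam_ts = mean_excess[1]

# Cross-sectional approach: OLS of average returns on betas, no intercept
lam_cs = betas @ mean_excess / (betas @ betas)

alpha_ts = mean_excess - betas * lam_ts            # market has zero pricing error
alpha_cs = mean_excess - betas * lam_cs            # market now has a pricing error

print(lam_ts, lam_cs)
print(alpha_ts, alpha_cs)
print(np.sum(alpha_ts**2), np.sum(alpha_cs**2))
```

The cross-sectional line is steeper, it misprices the market slightly, and it lowers the sum of squared pricing errors across all assets.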

When there is a choice – when one is testing a linear factor model in which the factors are also portfolio returns – should one use a time-series or a cross-sectional regression? Since the final evaluation of any model depends on the size of pricing errors, it would seem to make sense to estimate the model by choosing free parameters to make the pricing errors as small as possible. That is exactly what the cross-sectional regression accomplishes. However, when the GRS assumptions are satisfied, the time-series regression is the maximum likelihood estimator, and thus asymptotically efficient. This seems like a strong argument for the pure time-series approach.

Why ML does what it does

To resolve this issue, we have to understand why ML prescribes such different procedures when factors are and aren’t returns. Why does ML ignore all the information in the test asset average returns, and estimate the factor premium from the average factor return only? The answer lies in the structure that we told ML to assume when looking at the data. First, when we write $R^e_t = a + \beta f_t + \varepsilon_t$ and $\varepsilon$ independent of $f$, we tell ML that a sample of returns already includes the same sample of the factor, plus extra noise. Thus, the sample of test asset returns cannot possibly tell ML anything more than the sample of the factor alone about the mean of the factor. Second, we tell ML that the factor risk premium equals the mean of the factor, so it may not consider the possibility that the two are different in trying to match the data. When the factor is not also a return, ML still ignores anything but the factor data in estimating the mean of the factor, but now it is allowed to change a different parameter to fit the returns, which it does by cross-sectional regression.

The GLS cross-sectional regression will often be very close if not identical to the time-series regression, so we can view the issue as the choice between OLS and GLS. ML prescribes a GLS cross-sectional regression,

$$\hat{\lambda} = \left(\beta'\Sigma^{-1}\beta\right)^{-1}\beta'\Sigma^{-1}E_T(R^e).$$

The assets are weighted by the inverse of the residual covariance matrix $\Sigma$, not the return covariance matrix. If we include as a test asset a portfolio that is nearly equal to the factor but with a very small residual variance, then the elements of $\Sigma$ in that row and column will be very small and $\Sigma^{-1}$ will overwhelmingly weight that asset in determining the risk premium. In the limit that we actually include the factor as a test asset, $\left(\beta'\Sigma^{-1}\beta\right)^{-1}\beta'\Sigma^{-1}$ becomes a vector of zeros with a unit element in the position of the factor and we return to $\hat{\lambda} = E_T(f)$. The same thing happens if the test assets span the factors. For example, if the test assets are value-weighted portfolios, a value-weighted portfolio of the test assets is often almost exactly the same as the value-weighted market portfolio. In a test of the CAPM, the GLS cross-sectional regression will give almost exactly the same result as the time series regression.
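The limiting argument can be checked numerically. The following is a minimal sketch rather than a definitive implementation: it simulates a one-factor economy with made-up parameters, adds a test asset that is almost identical to the factor, and compares OLS and GLS cross-sectional estimates of the risk premium with the sample mean of the factor. The function names (`time_series_betas`, `ols_lambda`, `gls_lambda`) and every numerical value are assumptions chosen for illustration.

```python
import numpy as np

def time_series_betas(R, f):
    """Time-series OLS (with intercept) of each column of R on the factor f.

    Returns the betas (N,) and the residual covariance matrix Sigma (N, N).
    """
    fd = f - f.mean()
    Rd = R - R.mean(axis=0)
    betas = fd @ Rd / (fd @ fd)
    resid = Rd - np.outer(fd, betas)
    return betas, resid.T @ resid / len(f)

def ols_lambda(betas, mean_R):
    """OLS cross-sectional regression of average returns on betas, no intercept."""
    return betas @ mean_R / (betas @ betas)

def gls_lambda(betas, Sigma, mean_R):
    """GLS cross-sectional estimate (b' S^-1 b)^-1 b' S^-1 E_T(R^e)."""
    w = np.linalg.solve(Sigma, betas)  # S^-1 b, using the symmetry of Sigma
    return w @ mean_R / (w @ betas)

# Hypothetical monthly one-factor economy; all numbers are made up.
rng = np.random.default_rng(0)
T, N = 600, 10
f = 0.006 + 0.045 * rng.standard_normal(T)        # factor excess return
R = np.outer(f, np.linspace(0.5, 1.5, N)) + 0.05 * rng.standard_normal((T, N))
near_factor = f + 1e-4 * rng.standard_normal(T)   # almost the factor itself
R_all = np.column_stack([near_factor, R])

betas, Sigma = time_series_betas(R_all, f)
mean_R = R_all.mean(axis=0)
lam_ols = ols_lambda(betas, mean_R)
lam_gls = gls_lambda(betas, Sigma, mean_R)

print(f"sample mean of factor: {f.mean():.5f}")
print(f"OLS cross-sectional  : {lam_ols:.5f}")
print(f"GLS cross-sectional  : {lam_gls:.5f}")
```

Because the near-factor asset has a tiny residual variance, the GLS weights concentrate on it and the GLS estimate should sit essentially on top of the sample mean of the factor, while the OLS estimate is free to move away from it.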

Model misspecification

As I argued in section 12.5 and in section 13.2, it may well be a good idea to use OLS cross-sectional regressions or first-stage GMM rather than the more efficient GLS, because the OLS regression can be more robust to small misspecifications of the economic or statistical model, and OLS may be better behaved in small samples in which $\Sigma^{-1}$ is hard to estimate.

Similarly, time-series regressions are almost universally run with a constant, though ML prescribes a regression with no constant. The reason must be that researchers feel that, by omitting some of the information in the null hypothesis, the estimation and test are more robust, though some efficiency is lost if the null economic and statistical models are exactly correct. Since ML prescribes a cross-sectional regression if we drop the restriction $\lambda = E(f)$, running a cross-sectional regression may also be a way to gain robustness at the expense of one degree of freedom.

Here is an example of a common small misspecification that justifies a cross-sectional rather than a time-series approach. Suppose that the test portfolios do follow a single factor model precisely, with an excess return $f_t$ as the single factor. However, we have an imprecise proxy for the true factor,

$$f^p_t = f_t + \delta_t. \qquad (16.1)$$

Most obviously, the market return is an imperfect proxy for the wealth portfolio that the CAPM specifies as the single factor. Multifactor models also use return factors that are imperfect proxies for underlying state variables. For example Fama and French (1993) use portfolios of stocks sorted by book/market value as a return factor, with the explicit idea that it is a proxy for a more rigorously-derivable state variable.

The true model is

$$R^e_t = \beta f_t + \varepsilon_t. \qquad (16.2)$$

There is no intercept since $f$ is a return. For the statistical part of the model, I again assume that $\varepsilon_t$ and $f_t$ are jointly normal i.i.d. and independent of each other.

If we had data on $f_t$, the ML estimate of this model would be, like the CAPM, a pure time series regression. We have to work out what a model using the proxy $f^p$ as reference portfolio looks like. This is an example, and to keep it simple I assume that $\delta_t$ is mean zero, and uncorrelated with $f_t$ and $\varepsilon_t$,

$$E(\delta) = 0, \quad E(\delta f) = E(\delta\varepsilon) = 0.$$


(This is a poor assumption for the CAPM. Since the market portfolio is a linear combination of the test assets, the error in $R^m$ is the sum of the errors $\varepsilon$ and thus unlikely to be uncorrelated with them. This is a more plausible assumption for non-market factors in multifactor models. Similar examples with $E(f\varepsilon) \neq 0$ generate the same sorts of misspecification, but also introduce pricing errors in the test assets.)

Now, if we use the proxy $f^p$ rather than $f$ as the factor, the test assets still follow the factor model exactly, but the factor portfolio does not, and the risk premium is no longer equal to the mean of the factor portfolio:

$$E(R^e) = \beta^p\lambda^p \qquad (16.3)$$

$$E(f^p) = \lambda^p - \alpha^p$$

and $\beta^p$ is the regression coefficient of $R^e$ on the proxy $f^p$. Therefore, if you spell out the misspecification, the ML estimate of the factor model is now a cross-sectional regression, not a time-series regression! A similar misspecification occurs when we suspect that the riskfree rate is “too low” and again leads to cross-sectional estimates.

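The algebra behind (16.3) is straightforward,

$$\mathrm{cov}(R^e, f^p) = \mathrm{cov}(R^e, f + \delta) = \mathrm{cov}(R^e, f)$$

$$E(R^e) = \frac{\mathrm{cov}(R^e, f)}{\sigma^2(f)}\lambda = \frac{\mathrm{cov}(R^e, f^p)}{\sigma^2(f^p)}\,\frac{\sigma^2(f^p)}{\sigma^2(f)}\lambda = \beta^p\lambda^p$$

$$E(f^p) = E(f) = \lambda = \lambda^p\frac{\sigma^2(f)}{\sigma^2(f^p)} = \lambda^p - \lambda\left(\frac{\sigma^2(\delta)}{\sigma^2(f)}\right) = \lambda^p - \alpha^p$$

where I have introduced the notation

$$\lambda^p \equiv \left(1 + \frac{\sigma^2(\delta)}{\sigma^2(f)}\right)\lambda$$

$$\alpha^p \equiv \left(\frac{\sigma^2(\delta)}{\sigma^2(f)}\right)\lambda.$$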
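A small simulation can make the proxy-error algebra concrete. This is only a sketch under stated assumptions: the true factor, the proxy error, and the residuals are drawn as independent normals, the proxy error is given the same variance as the factor (so the proxy risk premium should be about twice the factor mean), and all parameter values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 400_000, 5
lam = 0.02                                   # true risk premium, equal to E(f)
sig_f, sig_delta, sig_eps = 0.05, 0.05, 0.02

f = lam + sig_f * rng.standard_normal(T)     # true factor (an excess return)
delta = sig_delta * rng.standard_normal(T)   # proxy error: mean zero, uncorrelated
f_p = f + delta                              # observed proxy
beta = np.linspace(0.5, 1.5, N)
R = np.outer(f, beta) + sig_eps * rng.standard_normal((T, N))  # R^e = beta f + eps

# Time-series regressions on the proxy (with intercept): betas and alphas.
fd = f_p - f_p.mean()
mean_R = R.mean(axis=0)
beta_p = fd @ (R - mean_R) / (fd @ fd)
alpha_ts = mean_R - beta_p * f_p.mean()

# Cross-sectional regression of average returns on the proxy betas (no intercept).
lam_p_cs = beta_p @ mean_R / (beta_p @ beta_p)
alpha_cs = mean_R - beta_p * lam_p_cs

# Risk premium on the proxy implied by the algebra: (1 + var(delta)/var(f)) * lam.
lam_p_theory = (1 + sig_delta**2 / sig_f**2) * lam

print("mean of proxy factor     :", round(f_p.mean(), 4))
print("cross-sectional lambda^p :", round(lam_p_cs, 4))
print("theoretical lambda^p     :", round(lam_p_theory, 4))
print("time-series alphas       :", np.round(alpha_ts, 4))
print("cross-sectional alphas   :", np.round(alpha_cs, 4))
```

In this setup the time-series regression forces the premium to equal the proxy mean and therefore reports spurious positive alphas, while the cross-sectional slope recovers the larger premium and leaves pricing errors near zero.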

16.2 ML vs. GMM

I have emphasized the many similarities between ML, GMM and regressions. In the classic environments we have examined – a linear factor model, excess return, and i.i.d. normal returns and factors – all methods basically run some sort of cross-sectional regression of average returns on a second moment, and evaluate the model by a quadratic form of the pricing errors. In Monte Carlo evaluations they produce almost identical estimates and test statistics (Cochrane 2000 runs a Monte Carlo showing how close the discount factor/GMM method and regressions are in the i.i.d. environment; Jagannathan and Wang 2000 show some additional asymptotic similarities between the methods). Before we use them in more complex environments, it is important to see that the GMM / discount factor procedures are not radically different than the well-understood regression procedures in the classic environments.

However, ML, GMM and regressions can suggest quite different procedures, and produce quite different results, in other situations. In particular, they can differ when the model is highly nonlinear, or there is important conditioning information such as changing mean returns and changing volatility of returns, changing betas and so forth, or when return distributions are not normal. This happens in more sophisticated consumption-based models, explicit term structure and option pricing models, and factor models with richer conditioning information.

“ML” vs. “GMM” is really a bad way to put the issue. ML is a special case of GMM: it suggests a particular choice of moments that are statistically optimal in a well-defined sense. Given the set of moments, the distribution theories are identical. Also, there is no such thing as “the” GMM estimate. GMM is a flexible tool; you can use any $a_T$ matrix and $g_T$ moments that you want to use. For example, I used GMM to derive the asymptotic distribution of the standard time-series regression estimator with autocorrelated returns; the moments in this case were not the pricing errors. It’s all GMM; the issue is the choice of moments. Both ML and GMM are best thought of as tools that a thoughtful researcher can use in learning what the data says about a given asset pricing model, rather than as stone tablets giving precise directions that lead to truth if followed literally. If followed literally and thoughtlessly, both ML and GMM can lead to horrendous results.

As we have seen, there are situations in which it’s better to trade some small efficiency gains for the robustness of simpler procedures or more easily interpretable moments; OLS can be better than GLS. The central reason is specification errors; the fact that our statistical and economic models are at best quantitative parables. There are no theorems about the optimal tradeoff of efficiency for robustness, but we can think about the lessons of past experiences with this question. That is the point of this section.

16.2.1 Specification errors

ML is often ignored

As we have seen, ML plus the assumption of normal i.i.d. disturbances leads to easily interpretable time-series or cross-sectional regressions, empirical procedures that are close to the economic content of the model. However, asset returns are not normally distributed or i.i.d. They have fatter tails than a normal, they are heteroskedastic (times of high and times of low volatility), they are autocorrelated, and predictable from a variety of variables. If one were to take seriously the ML philosophy and its quest for efficiency, one should model these features of returns. The result would be a different likelihood function, and its scores would prescribe different moment conditions than the familiar and intuitive time-series or cross-sectional regressions.

Interestingly, few empirical workers do this. (The exceptions tend to be papers whose primary point is illustration of econometric technique rather than empirical findings.) ML seems to be fine when it suggests easily interpretable regressions; when it suggests something else, people use the regressions anyway. For example, as we have seen, ML prescribes that one estimate $\beta$s without a constant. $\beta$s are almost universally estimated with a constant. ML specifies a GLS cross-sectional regression, but many empirical workers use OLS cross-sectional regressions. And of course, many of the regressions continue to be run, with ML justifications, despite the fact that returns do not conform to the i.i.d. assumptions and despite the technical feasibility of addressing non-normal and non-i.i.d. returns. The regressions came first, and the maximum likelihood formalization came later. If we had to assume that returns had a gamma distribution to justify the regressions, it’s a sure bet that we would make that “assumption” behind ML instead of the normal i.i.d. assumption!

This fact tells us something interesting about the nature of empirical work: researchers don’t really believe that their null hypotheses, statistical and economic, are exactly correct. They want to produce estimates and tests that are robust to reasonable model mis-specifications. They also want to produce estimates and tests that are easily interpretable, that capture intuitively clear stylized facts in the data, and that relate directly to the economic concepts of the model. Such estimates are persuasive in large part because the reader can see that they are robust. (And following this train of thought, one might want to pursue estimation strategies that are even more robust than OLS, since OLS places a lot of weight on outliers. For example, Knez and Ready (1997) claim that Fama and French’s (1993) size and value effects depend crucially on a few outliers.)

ML does not necessarily produce robust or easily interpretable estimates. It wasn’t designed to do so. The point and advertisement of ML is that it provides efficient estimates; it uses every scrap of information in the statistical and economic model in the quest for efficiency. It does the “right” efficient thing if the model is true. It does not necessarily do the “reasonable” thing for “approximate” models.

Examples

We have seen that ML specifies a time-series regression when the factor is a return, but a cross-sectional regression when the factor is not a return. The time-series regression gains one degree of freedom, but we have also seen that even a slight proxy error in the factor leads to the more intuitive cross-sectional regression. We have also discussed reasons why researchers use OLS cross-sectional regressions rather than more “efficient” GLS. GLS requires modeling and inverting an $N \times N$ covariance matrix, and then focuses attention on portfolios with strong positive and negative weights that seem to have lowest variance in a sample. But such portfolios may be quite sensitive to small transactions costs, so the estimate and test focus on pricing a portfolio that investors cannot hold. Also, the sampling error in large covariance matrix estimates may ruin the asymptotic advantages of GLS in a finite sample. Similarly, if one asked a researcher why he included a constant in estimating a beta while applying the CAPM, he might well respond that he doesn’t believe the theory that much.

Here are some additional examples:


Time-series models. In estimating time-series models such as the AR(1), maximum likelihood minimizes one-step ahead forecast error variance, $\sum \varepsilon_t^2$. But any time-series model is only an approximation, and the researcher’s objective may not be one-step ahead forecasting. For example, in making sense of the yield on long term bonds, one is interested in the long-run behavior of the short rate of interest. The approximate model that generates the smallest one-step ahead forecast error variance may be quite different from the model that best matches long-run autocorrelations. ML can pick the wrong model and make very bad predictions for long-run responses. (Cochrane 1986 contains a more detailed analysis of this point.)
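The gap between one-step-ahead fit and long-run behavior is easy to reproduce. The sketch below is a stylized illustration with an assumed data-generating process (a very persistent component plus white noise), not an analysis from the text: an AR(1) fitted to the one-step-ahead relation implies that autocorrelations die out almost immediately, while the data remain noticeably correlated at long lags. The helper name `autocorr` and all parameters are hypothetical.

```python
import numpy as np

def autocorr(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    xd = x - x.mean()
    return (xd[k:] @ xd[:-k]) / (xd @ xd)

rng = np.random.default_rng(2)
T = 20_000

# Assumed "true" process: persistent AR(1) component (unit variance) plus white noise.
persistent = np.zeros(T)
shocks = np.sqrt(1 - 0.98**2) * rng.standard_normal(T)
for t in range(1, T):
    persistent[t] = 0.98 * persistent[t - 1] + shocks[t]
x = persistent + rng.standard_normal(T)

# The AR(1) coefficient that minimizes one-step-ahead squared forecast errors is
# (up to end effects) the first-order autocorrelation.
phi = autocorr(x, 1)

horizon = 20
implied = phi**horizon          # what the fitted AR(1) predicts at lag 20
actual = autocorr(x, horizon)   # what the data show at lag 20

print(f"fitted AR(1) coefficient      : {phi:.3f}")
print(f"AR(1)-implied lag-{horizon} autocorr : {implied:.6f}")
print(f"sample lag-{horizon} autocorr        : {actual:.3f}")
```

The fitted AR(1) is the best one-step forecaster within its class, yet it badly understates long-horizon persistence, which is the quantity that matters for questions such as long bond yields.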

Lucas’ money demand estimate. Lucas (198x) contains a gem of an example. Lucas was interested in estimating the income elasticity of money demand. Money and income trend upwards over time and over business cycles, but also have some high-frequency movement that looks like noise. If you run a regression in log-levels,

$$m_t = a + b y_t + \varepsilon_t$$

you get a sensible coefficient of about $b = 1$, but you find that the error term is strongly serially correlated. Following standard advice, most researchers run GLS, which amounts pretty much to first-differencing the data,

$$m_t - m_{t-1} = b(y_t - y_{t-1}) + \eta_t.$$

This error term passes its Durbin-Watson statistic, but the $b$ estimate is much lower, which doesn’t make much economic sense, and, worse, is unstable, depending a lot on time period and data definitions. Lucas realized that the regression in differences threw out all of the information in the data, which was in the trend, and focused on the high-frequency noise. Therefore, the regression in levels, with standard errors corrected for correlation of the error term, is the right one to look at. Of course, GLS and ML didn’t know there was any “noise” in the data, which is why they threw out the baby and kept the bathwater. Again, ML ruthlessly exploits the null for efficiency, and has no way of knowing what is “reasonable” or “intuitive.”
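The levels-versus-differences contrast can be illustrated with artificial data. This is a stylized sketch, not Lucas' data or procedure: money and income share a common stochastic trend with a true elasticity of one, income carries high-frequency noise, and the money-demand disturbance is serially correlated. The function `ols_slope` and every parameter value are assumptions made for the illustration.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    xd = x - x.mean()
    return xd @ (y - y.mean()) / (xd @ xd)

rng = np.random.default_rng(3)
T = 400

# Common trend: random walk with drift (log units).
trend = np.cumsum(0.01 + 0.01 * rng.standard_normal(T))

# Income = trend + high-frequency noise.
income = trend + 0.02 * rng.standard_normal(T)

# Money = 1.0 * trend + serially correlated disturbance (AR(1), coefficient 0.9).
v = np.zeros(T)
e = 0.01 * rng.standard_normal(T)
for t in range(1, T):
    v[t] = 0.9 * v[t - 1] + e[t]
money = 1.0 * trend + v

b_levels = ols_slope(money, income)                   # regression in log-levels
b_diff = ols_slope(np.diff(money), np.diff(income))   # regression in first differences

print(f"elasticity from levels     : {b_levels:.3f}")
print(f"elasticity from differences: {b_diff:.3f}")
```

The levels regression recovers an elasticity near one because the trend dominates the variation in income; first-differencing removes the trend, leaves mostly noise in the regressor, and drives the estimate toward zero.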

Stochastic singularities and calibration. Models of the term structure of interest rates and real business cycle models in macroeconomics give even more stark examples. These models are stochastically singular. They generate predictions for many time series from a few shocks, so the models predict that there are combinations of the time series that leave no error term. Even though the models have rich and interesting implications, ML will seize on this economically uninteresting singularity, refuse to estimate parameters, and reject any model of this form.

The simplest example of the situation is the linear-quadratic permanent income model paired with an AR(1) specification for income. The model is

$$y_t = \rho y_{t-1} + \varepsilon_t$$

$$c_t - c_{t-1} = (E_t - E_{t-1})\, r\beta \sum_{j=0}^{\infty}\beta^j y_{t+j} = \frac{r\beta}{1-\beta\rho}\,\varepsilon_t$$


This model generates all sorts of important and economically interesting predictions for the joint process of consumption and income (and asset prices). Consumption should be roughly a random walk, and should respond only to permanent income changes; investment should be more volatile than income and income more volatile than consumption. Since there is only one shock and two series, however, the model taken literally predicts a deterministic relation between consumption and income; it predicts

$$c_t - c_{t-1} = \frac{r\beta}{1-\beta\rho}\left(y_t - \rho y_{t-1}\right).$$

ML will notice that this is the statistically most informative prediction of the model. There is no error term! In any real data set there is no configuration of the parameters $r, \beta, \rho$ that makes this restriction hold, data point for data point. The probability of observing a data set $\{c_t, y_t\}$ is exactly zero, and the log likelihood function is $-\infty$ for any set of parameters. ML says to throw the model out.

The popular affine models of the term structure of interest rates act the same way. They specify that all yields at any moment in time are deterministic functions of a few state variables. Such models can capture much of the important qualitative behavior of the term structure, including rising, falling and humped shapes, and the information in the term structure for future movements in yields and the volatility of yields. They are very useful for derivative pricing. But it is never the case in actual yield data that yields of all maturities are exact functions of $K$ yields. Actual data on $N$ yields always require $N$ shocks. Again, a ML approach reports a $-\infty$ log likelihood function for any set of parameters.

Addressing model mis-specification

The ML philosophy offers an answer to model mis-specification: specify the right model, and then do ML. If regression errors are correlated, model and estimate the covariance matrix and do GLS. If you are worried about proxy errors in the pricing factor, short sales costs or other transactions costs so that model predictions for extreme long-short positions should not be relied on, time-aggregation or mismeasurement of consumption data, non-normal or non-i.i.d. returns, time-varying betas and factor risk premia, additional pricing factors and so on – don’t chat about them, write them down, and then do ML.

Following this lead, researchers have added “measurement errors” to real business cycle models (Sargent 199x is a classic example) and affine yield models in order to break the stochastic singularity. The trouble is, of course, that the assumed structure of the measurement errors now drives what moments ML pays attention to. And modeling and estimating the measurement errors takes us further away from the economically interesting parts of the model.

More generally, authors tend not to follow this advice, in part because it is ultimately infeasible. Economics necessarily studies quantitative parables rather than completely specified models. It would be nice if we could write down completely specified models, if we could quantitatively describe all the possible economic and statistical model and specification errors, but we can’t.


The GMM framework, used judiciously, allows us to evaluate misspecified models. It allows us to direct that the statistical effort focus on the “interesting” predictions while ignoring the fact that the world does not match the “uninteresting” simplifications. For example, where ML only gives us a choice of OLS, whose standard errors are wrong, or GLS, GMM allows us to keep an OLS estimate, but correct the standard errors for non-i.i.d. distributions. More generally, GMM allows one to specify an economically interesting set of moments, or a set of moments that one feels will be robust to misspecifications of the economic or statistical model, without having to spell out exactly what is the source of model mis-specification that makes those moments “optimal” or even “interesting” and “robust.” It allows one to accept the lower “efficiency” of the estimates under some sets of statistical assumptions, in return for such robustness.

At the same time, the GMM framework allows us to flexibly incorporate statistical model misspecifications in the distribution theory. For example, knowing that returns are not i.i.d. normal, one may want to use the time series regression technique to estimate betas anyway. This estimate is not inconsistent, but the standard errors that ML formulas pump out under this assumption are inconsistent. GMM gives a flexible way to derive at least an asymptotic set of corrections for statistical model misspecifications of the time-series regression coefficient. Similarly, a pooled time-series cross-sectional OLS regression is not inconsistent, but standard errors that ignore cross-correlation of error terms are far too small.
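Keeping the OLS point estimate while correcting its standard error can be sketched in a few lines. The example below is an illustration under assumptions, not a prescription from the text: it uses the OLS moment conditions with a Bartlett-weighted (Newey-West style) estimate of the long-run covariance of the moments, on simulated data in which both the regressor and the error are autocorrelated. The function names, the lag length of 20, and the simulated parameters are all hypothetical choices.

```python
import numpy as np

def ar1(T, rho, rng):
    """Simulate a zero-mean AR(1) series with unit-variance shocks."""
    x = np.zeros(T)
    e = rng.standard_normal(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + e[t]
    return x

def ols_with_hac(y, x, lags):
    """OLS of y on a constant and x.

    Returns the slope, its classical (i.i.d.) standard error, and a standard
    error based on a Bartlett-weighted long-run covariance of x_t * u_t.
    """
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u = y - X @ b

    V_iid = (u @ u / (T - 2)) * XtX_inv      # classical formula

    g = X * u[:, None]                       # moment contributions x_t * u_t
    S = g.T @ g / T
    for j in range(1, lags + 1):
        w = 1 - j / (lags + 1)               # Bartlett weight
        Gamma = g[j:].T @ g[:-j] / T
        S += w * (Gamma + Gamma.T)
    V_hac = T * XtX_inv @ S @ XtX_inv

    return b[1], np.sqrt(V_iid[1, 1]), np.sqrt(V_hac[1, 1])

rng = np.random.default_rng(4)
T = 5_000
x = ar1(T, 0.8, rng)
y = 1.0 + 0.5 * x + ar1(T, 0.8, rng)        # autocorrelated errors

slope, se_iid, se_hac = ols_with_hac(y, x, lags=20)
print(f"slope {slope:.3f}, classical s.e. {se_iid:.4f}, corrected s.e. {se_hac:.4f}")
```

The slope estimate is the same in both cases; only the measure of its sampling uncertainty changes, and with positively autocorrelated regressors and errors the corrected standard error is substantially larger.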

The “calibration” of real business cycle models is often really nothing more than a GMM parameter estimate, using economically sensible moments such as average output growth, consumption/output ratios etc. to avoid the stochastic singularity that would doom a ML approach. (Kydland and Prescott’s 198x idea that empirical microeconomics would provide accurate parameter estimates for macroeconomic and financial models has pretty much vanished.) Calibration exercises usually do not compute standard errors, nor do they report any distribution theory associated with the “evaluation” stage when one compares the model’s predicted second moments with those in the data. Following Christiano and Eichenbaum (19xx) however, it’s easy enough to calculate such a distribution theory – to evaluate whether the difference between predicted “second moments” and actual moments is large compared to sampling variation, including the variation induced by parameter estimation in the same sample – by listing the first and second moments together in the $g_T$ vector.

“Used judiciously” is an important qualification. Many GMM estimations and tests suffer from lack of thought in the choice of moments, test assets and instruments. For example, early GMM papers tended to pick assets and especially instruments pretty much at random. Industry portfolios have almost no variation in average returns to explain. Authors often included many lags of returns and consumption growth as instruments to test a consumption-based model. However, the 7th lag of returns really doesn’t predict much about future returns given lags 1-12, and the first-order serial correlation in seasonally adjusted, ex-post revised consumption growth may be economically uninteresting. More recent work tends to emphasize a few well-chosen assets and instruments that capture important and economically interesting features of the data.


16.2.2 Other considerations for statistical methods

Auxiliary model

ML requires an auxiliary statistical model. For example, in the classic ML formalization of regression tests, we had to stop to assume that returns and factors are jointly i.i.d. normal. As the auxiliary statistical model becomes more and more complex and hence realistic, more and more effort is devoted to estimating the auxiliary statistical model. ML has no way of knowing that some parameters – $a, b, \beta, \lambda$, risk aversion $\gamma$ – are more “important” than others – $\Sigma$, and parameters describing time-varying conditional moments of returns.

A very convenient feature of GMM is that it does not require such an auxiliary statistical model. For example, in studying GMM we went straight from $p = E(mx)$ to moment conditions, estimates, and distribution theory. This is an important saving of the researcher’s and the reader’s time, effort and attention.

Finite sample distributions

Many authors say they prefer regression tests and the GRS statistic in particular because it has a finite sample distribution theory, and they distrust the finite-sample performance of the GMM asymptotic distribution theory.

This argument does not have much force. The finite sample distribution only holds if returns really are normal and i.i.d., and if the factor is perfectly measured. Since these assumptions do not hold, it is not obvious that a finite-sample distribution that ignores non-i.i.d. returns will be a better approximation than an asymptotic distribution that corrects for them.

In addition, all approaches give essentially the same answers in the classic setup of i.i.d. returns. The finite sample size and power of GMM distributions for regressions and discount factor estimates are almost identical to those of the ML, regression and GRS formulas (Cochrane 2000). The issue is how the various techniques perform in more complex setups, especially with conditioning information, and here there are no analytic finite-sample distributions.

In addition, once you have picked the estimation method – how you will generate a number from the data, or which moments you will use – finding its finite sample distribution, given an auxiliary statistical model, is simple. Just run a Monte Carlo or bootstrap. Thus, picking an estimation method because it delivers analytic formulas for a finite sample distribution (under false assumptions) should be a thing of the past. Analytic formulas for finite sample distributions are useful for comparing estimation methods and arguing about statistical properties of estimators, but they are not necessary for the empiricists’ main task.
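As an indication of how little work "just run a bootstrap" involves, here is a minimal sketch. It resamples dates with replacement to approximate the finite-sample distribution of a time-series regression intercept when residuals are fat-tailed. It assumes returns are independent over time (a block bootstrap would be needed otherwise); the function names `ts_alpha` and `bootstrap_distribution`, the Student-t residuals, and all parameter values are illustrative assumptions.

```python
import numpy as np

def ts_alpha(R, f):
    """Intercept from a time-series regression of excess return R on factor f."""
    fd = f - f.mean()
    beta = fd @ (R - R.mean()) / (fd @ fd)
    return R.mean() - beta * f.mean()

def bootstrap_distribution(R, f, estimator, n_boot, rng):
    """Resample dates with replacement and re-estimate n_boot times."""
    T = len(f)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, T, size=T)   # same dates for return and factor
        draws[i] = estimator(R[idx], f[idx])
    return draws

# Hypothetical sample: 20 years of monthly data with fat-tailed residuals.
rng = np.random.default_rng(5)
T = 240
f = 0.006 + 0.045 * rng.standard_normal(T)
R = 1.0 * f + 0.03 * rng.standard_t(5, size=T)

alpha_hat = ts_alpha(R, f)
draws = bootstrap_distribution(R, f, ts_alpha, n_boot=2000, rng=rng)
boot_se = draws.std()
lo, hi = np.percentile(draws, [2.5, 97.5])

print(f"alpha estimate       : {alpha_hat:.4f}")
print(f"bootstrap s.e.       : {boot_se:.4f}")
print(f"95% percentile range : [{lo:.4f}, {hi:.4f}]")
```

The same loop works for any estimator that maps a sample into a number, which is the sense in which the choice of estimation method need not be tied to the availability of analytic finite-sample formulas.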

Finite sample quality of asymptotic distributions, and “nonparametric” estimates

Several investigations (citation 19xx) have found cases in which the GMM asymptotic distribution theory is a poor approximation to a finite-sample distribution theory. This is especially true when one asks “non-parametric” corrections for autocorrelation or heteroskedasticity to provide large corrections and when the number of moments is large compared to the sample size.

The ML distribution is the same as GMM, conditional on the choice of moments, but typical implementations of ML also use the parametric time-series model to simplify estimates of the terms in the distribution theory as well as to derive the likelihood function.

If this is the case – if the “nonparametric” estimates of the GMM distribution theory perform poorly in a finite sample, while the “parametric” ML distribution works well – there is no reason not to use a parametric time series model to estimate the terms in the GMM distribution as well. For example, rather than calculate $\sum_{j=-\infty}^{\infty} E(u_t u_{t-j})$ from a large sum of autocorrelations, you can model $u_t = \rho u_{t-1} + \varepsilon_t$, estimate $\rho$, and then calculate $\sigma^2(u)\sum_{j=-\infty}^{\infty}\rho^{|j|} = \sigma^2(u)\frac{1+\rho}{1-\rho}$. Section 12.7 discussed this idea in more detail.
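The parametric shortcut amounts to two sample moments. The sketch below compares it with a truncated, unweighted sum of sample autocovariances on a simulated AR(1) series for which the long-run variance is known; the function names, the truncation at 30 lags, and the simulation parameters are assumptions made for illustration only.

```python
import numpy as np

def long_run_variance_ar1(u):
    """Parametric estimate sigma^2(u) * (1 + rho) / (1 - rho) from a fitted AR(1)."""
    ud = u - u.mean()
    rho = (ud[1:] @ ud[:-1]) / (ud @ ud)
    return ud.var() * (1 + rho) / (1 - rho), rho

def long_run_variance_sum(u, lags):
    """Truncated, unweighted sum of sample autocovariances from -lags to +lags."""
    ud = u - u.mean()
    T = len(ud)
    total = ud @ ud / T
    for j in range(1, lags + 1):
        total += 2 * (ud[j:] @ ud[:-j]) / T
    return total

# Simulated AR(1) with rho = 0.5 and unit shock variance:
# true long-run variance is 1 / (1 - 0.5)^2 = 4.
rng = np.random.default_rng(6)
T = 50_000
u = np.zeros(T)
e = rng.standard_normal(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]

lrv_param, rho_hat = long_run_variance_ar1(u)
lrv_sum = long_run_variance_sum(u, lags=30)

print(f"estimated rho            : {rho_hat:.3f}")
print(f"parametric long-run var. : {lrv_param:.3f}")
print(f"truncated-sum estimate   : {lrv_sum:.3f}")
```

The parametric estimate depends on only two estimated quantities, which is why it can behave better in short samples, at the cost of being wrong if the AR(1) description of the moments is a poor one.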

The case for ML

In the classic setup, the efficiency gain of ML over GMM on the pricing errors is tiny. However, several studies have found cases in which the statistically motivated choice of moments suggested by ML has important efficiency advantages.

For example, Jacquier, Polson and Rossi (1994) study the estimation of a time-series model with stochastic volatility. This is a model of the form

$$dS_t/S_t = \mu\,dt + V_t\,dZ_{1t} \qquad (16.4)$$

$$dV_t = \mu_V(V_t)\,dt + \sigma(V_t)\,dZ_{2t},$$

and $S$ is observed but $V$ is not. The obvious and easily interpretable moments include the autocorrelation of squared returns, or the autocorrelation of the absolute value of returns. However, Jacquier, Polson and Rossi find that the resulting estimates are far less efficient than those resulting from the ML scores.

Of course, this study presumes that the model (16.4) really is exactly true. Whether the uninterpretable scores or the interpretable moments really perform better to give an approximate model of the form (16.4), given some other data-generating mechanism, is open to discussion.

Even in the canonical OLS vs. GLS case, a wildly heteroskedastic error covariance matrix can mean that OLS spends all its effort fitting unimportant data points. A “judicious” application of GMM (OLS) in this case would require at least some transformation of units so that OLS is not wildly inefficient.

Statistical arguments

The history of empirical work that has been persuasive – that has changed people’s understanding of the facts in the data and which economic models understand those facts – looks a lot different than the statistical theory preached in econometrics textbooks.

The CAPM was taught and believed in and used for years despite formal statistical rejections. It only fell by the wayside when other, coherent views of the world were offered in the multifactor models. And the multifactor models are also rejected! It seems that “it takes a model to beat a model,” not a rejection.

Even when evaluating a specific model, most of the interesting calculations come from examining specific alternatives rather than overall pricing error tests. The original CAPM tests focused on whether the intercept in a cross-sectional regression was higher or lower than the risk free rate, and whether individual variance entered into cross-sectional regressions. The CAPM fell when it was found that characteristics such as size and book/market do enter cross-sectional regressions, not when generic pricing error tests rejected.

Influential empirical work tells a story. The most efficient procedure does not seem to convince people if they cannot transparently see what stylized facts in the data drive the result. A test of a model that focuses on its ability to account for the cross section of average returns of interesting portfolios will in the end be much more persuasive than one that (say) focuses on the model’s ability to explain the fifth moment of the second portfolio, even if ML finds the latter moment much more statistically informative.

Most recently, Fama and French (199x) and (199x) are good examples of empirical work that changed many people’s minds, in this case that long-horizon returns really are predictable, and that we need a multifactor model rather than the CAPM to understand the cross-section of average returns. These papers are not stunning statistically: long horizon predictability is on the edge of statistical significance, and the multifactor model is rejected by the GRS test. But these papers made clear what stylized and robust facts in the data drive the results, and why those facts are economically sensible. For example, the (1993) paper focused on tables of average returns and betas. Those tables showed strong variation in average returns that was not matched by variation in market betas, yet was matched by variation in betas on new factors. There is no place in statistical theory for such a table, but it is much more persuasive than a table of $\chi^2$ values for pricing error tests. On the other hand, I can think of no case in which the application of a clever statistical model to wring the last ounce of efficiency out of a dataset, changing t statistics from 1.5 to 2.5, substantially changed the way people think about an issue.

Statistical testing is one of many questions we ask in evaluating theories, and usually not the most important one. This is not a philosophical or normative statement; it is a positive or empirical description of the process by which the profession has moved from theory to theory. Think of the kind of questions people ask when presented with a theory and accompanying empirical work. They usually start by thinking hard about the theory itself. What is the central part of the economic model or explanation? Is it internally consistent? Do the assumptions make sense? Then, when we get to the empirical work, how were the numbers produced? Are the data definitions sensible? Are the concepts in the data decent proxies for the concepts in the model? (There’s not much room in statistical theory for that question!) Are the model predictions robust to the inevitable simplifications? Does the result hinge on power utility vs. another functional form? What happens if you add a little measurement error, or if agents have an information advantage, etc.? What are the identification assumptions, and do they make any sense – why is $y$ on the left and $x$ on the right rather than the other way around? Finally, someone in the back of the room might raise his hand and ask, “if the data were generated by a draw of i.i.d. normal random variables over and over again, how often would you come up with a number this big or bigger?” That’s an interesting and important check on the overall believability of the results. But it is not necessarily the first check, and certainly not the last and decisive check. Many models are kept that have economically interesting but statistically rejectable results, and many more models are quickly forgotten that have strong statistics but just do not tell as clean a story.

The classical theory of hypothesis testing, its Bayesian alternative, or the underlying hypothesis-testing view of the philosophy of science are miserable descriptions of the way science in general and economics in particular proceed from theory to theory. And this is probably a good thing too. Given the non-experimental nature of our data, the inevitable fishing biases of many researchers examining the same data, and the unavoidable fact that our theories are really quantitative parables more than literal descriptions of the way the data are generated, the way the profession settles on new theories makes a good deal of sense. Classical statistics requires that nobody ever looked at the data before specifying the model. Yet more regressions have been run than there are data points in the CRSP database. Bayesian econometrics can in principle incorporate the information of previous researchers, yet it is never applied in this way – each study starts anew with an “uninformative” prior. Statistical theory draws a sharp distinction between the model – which we know is right; utility is exactly power – and the parameters, which we estimate. But this distinction isn’t true; we are just as uncertain about functional forms as we are about parameters. A distribution theory at bottom tries to ask an unknowable question: If we turned the clock back to 1947 and reran the postwar period 1000 times, in how many of those alternative histories would (say) the average S&P500 return be greater than 9%? It’s pretty amazing in fact that a statistician can purport to give any answer at all to such a question, having observed only one history.

These paragraphs do not contain original ideas, and they mirror changes in the philosophy of science more broadly. 50 years ago, the reigning philosophy of science focused on the idea that scientists provide rejectable hypotheses. This idea runs through philosophical writings exemplified by Popper (19xx), the statistical decision theory of Fisher (19xx), and mirrored in economics by Friedman (19xx). However, this methodology contains an important inconsistency. Though researchers are supposed to let the data decide, writers on methodology do not look at how actual theories evolved. It was, as in Friedman’s title, a “Methodology of positive economics,” not a “positive methodology of economics.” Why should methodology be normative, a result of philosophical speculation, and not an empirical discipline like everything else? In a very famous book, Kuhn (19xx) looked at the actual history of scientific revolutions, and found that the actual process had very little to do with the formal methodology. McCloskey (198x, 198x) has gone even further, examining the “rhetoric” of economics: the kinds of arguments that persuaded people to change their minds about economic theories. Needless to say, the largest t-statistic did not win!

Kuhn’s and especially McCloskey’s ideas are not popular in the finance and economics professions. Most people in the fields cling to the normative, rejectable-hypothesis view of methodology. But we need not suppose that they would be popular. The ideas of economics and finance are not popular among the agents in the models. How many stock market investors even know what a random walk or the CAPM is, let alone believing those models have even a grain of truth? Why should the agents in the models of how scientific ideas evolve have an intuitive understanding of the models? “As if” rationality can apply to us as well!

Philosophical debates aside, a researcher who wants his ideas to be convincing, as well as right, would do well to study how ideas have in the past convinced people, rather than just study a statistical decision theorist’s ideas about how ideas should convince people. Kuhn, and, in economics, McCloskey have done that, and their histories are worth reading. In the end, statistical properties may be a poor way to choose statistical methods.

PART III. Bonds and options

The term structure of interest rates and derivative pricing use closely related techniques. As you might have expected, I present both issues in a discount factor context. All models come down to a specification of the discount factor. The discount factor specifications in term structure and option pricing models are quite simple.

So far, we have focused on returns, which reduces the pricing problem to a one-period or instantaneous problem. Pricing bonds and options forces us to start thinking about chaining together the one-period or instantaneous representations to get a prediction for prices of long-lived securities. Taking this step is very important, and I forecast that we will see much more multiperiod analysis in stocks as well, studying price and stream of payoffs rather than returns. This step rather than the discount factor accounts for the mathematical complexity of some term structure and option pricing models.

There are two standard ways to go from instantaneous or return representations to prices. First, we can chain the discount factors together, finding from a one period discount factor $m_{t,t+1}$ a long-term discount factor $m_{t,t+j} = m_{t,t+1}\, m_{t+1,t+2} \cdots m_{t+j-1,t+j}$ that can price a $j$ period payoff. In continuous time, we will find the discount factor increments $d\Lambda$ satisfy the instantaneous pricing equation $0 = E_t[d(\Lambda P)]$, and then solve its stochastic differential equation to find its level $\Lambda_{t+j}$ in order to price a $j$ period payoff as $P_t = E_t[\Lambda_{t+j}/\Lambda_t\, x_{t+j}]$. Second, we can chain the prices together. Conceptually, this is the same as chaining returns $R_{t,t+j} = R_{t,t+1}\, R_{t+1,t+2} \cdots R_{t+j-1,t+j}$ instead of chaining together the discount factors. From $0 = E_t[d(\Lambda P)]$, we find a differential equation for the prices, and solve that back. We’ll use both methods to solve interest rate and option pricing models.

Chapter 17. Option pricing

Options are a very interesting and useful set of instruments, as you will see in the background section. In thinking about their value, we will adopt an extremely relative pricing approach. Our objective will be to find out a value for the option, taking as given the values of other securities, and in particular the price of the stock on which the option is written and an interest rate.

17.1 Background

17.1.1 Definitions and payoffs

A call option gives you the right to buy a stock for a specified strike price on a specified expiration date.

The call option payoff is $C_T = \max(S_T - X, 0)$.

Portfolios of options are called strategies. A straddle – a put and a call at the same strike price – is a bet on volatility.

Options allow you to buy and sell pieces of the return distribution.

Before studying option prices, we need to start by understanding option payoffs.

A call option gives you the right, but not the obligation, to buy a stock (or other “underlying” asset) for a specified strike price ($X$) on (or before) the expiration date ($T$). European options can only be exercised on the expiration date. American options can be exercised anytime before as well as on the expiration date. I will only treat European options. A put option gives the right to sell a stock at a specified strike price on (or before) the expiration date. I’ll use the standard notation,

$C = C_t$ = call price today
$C_T$ = call payoff = value at expiration ($T$)
$S = S_t$ = stock price today
$S_T$ = stock price at expiration
$X$ = strike price

Our objective is to find the price $C$. The general framework is (of course) $C = E(mx)$, where $x$ denotes the option’s payoff. The option’s payoff is the same thing as its value at expiration. If the stock has risen above the strike price, then the option is worth the difference between stock and strike. If the stock has fallen below the strike price, it expires worthless.

Thus, the option payoff is

$$\text{Call payoff} = \begin{cases} S_T - X & \text{if } S_T \geq X \\ 0 & \text{if } S_T \leq X \end{cases}$$

$$C_T = \max(S_T - X, 0).$$

A put works the opposite way: It gains value as the stock falls below the strike price, since the right to sell it at a high price is more and more valuable.

$$\text{Put payoff} = P_T = \max(X - S_T, 0).$$

It’s easiest to keep track of options by a graph of their value as a function of stock price. Figure 27 graphs the payoffs from buying calls and puts, and the corresponding short positions, which are called writing call and put options. One of the easiest mistakes to make is to confuse the payoff with the profit, which is the value at expiration less the cost of buying the option. I drew in profit lines, payoff - cost, to emphasize this difference.

[Figure 27 not reproduced. Four panels (Call, Put, Write Call, Write Put), each plotting the payoff and profit lines against $S_T$.]

Figure 27. Payoff diagrams for simple option strategies.
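The payoff and profit lines in the four panels can be tabulated with a few lines of code. This is a minimal sketch: the function names and the option costs (5 for the call, 4 for the put) are hypothetical values chosen for illustration, and profit is taken, as in the text, to be the value at expiration less the cost of the option.

```python
def call_payoff(s_t, strike):
    """Value at expiration of a long call: max(S_T - X, 0)."""
    return max(s_t - strike, 0.0)


def put_payoff(s_t, strike):
    """Value at expiration of a long put: max(X - S_T, 0)."""
    return max(strike - s_t, 0.0)


def positions(s_t, strike, call_cost, put_cost):
    """Return (payoff, profit) for the four simple positions.

    Writing an option is the short side: the writer receives the cost
    up front and owes the payoff at expiration.
    """
    c = call_payoff(s_t, strike)
    p = put_payoff(s_t, strike)
    return {
        "call": (c, c - call_cost),
        "put": (p, p - put_cost),
        "write call": (-c, call_cost - c),
        "write put": (-p, put_cost - p),
    }


# Hypothetical example: strike 100, call costs 5, put costs 4.
for s_t in (80.0, 100.0, 120.0):
    print(s_t, positions(s_t, strike=100.0, call_cost=5.0, put_cost=4.0))
```

The payoff and profit columns differ by a constant in every position, which is the vertical gap between the two lines in each panel.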

Right away, you can see some of the interesting features of options. A call option allows you a huge positive beta. Typical at-the-money options (strike price = current stock price) give a beta of about 10, meaning that the option is equivalent to borrowing $10 to invest $11 in the stock. However, your losses are limited to the cost of the option, which is paid upfront. Options are obviously very useful for trading. Imagine how difficult it would be to buy stock on such huge margin, and how difficult it would be to make sure people paid if the bet went bad. Options solve this problem. No wonder that active options trading started only a year or two after the first stocks started trading.

The huge beta also means that options are very useful for hedging. If you have a large illiquid portfolio, you can offset the risks very cheaply with options.

Finally, options allow you to shape the distribution of returns in interesting and sometimes dangerous ways. For example, if you buy a 20% out of the money put option as well as a stock, you have bought “catastrophe insurance” for your stock portfolio, at what turns out to be a remarkably small price. You cut off the left tail of the return distribution, at a small cost to the mean of the overall distribution.

On the other side, by writing out of the money put options, you can earn a small fee year in and year out, only once in a while experiencing a huge loss. You have a large probability of a small gain and a small probability of a large loss. You are providing catastrophe insurance to the market, and it works much like, say, writing earthquake insurance. The distribution of returns from this strategy is extremely non-normal, and thus statistical evaluation of its properties will be difficult. This strategy is tempting to a portfolio manager who is being evaluated only by the statistics of his achieved return. If he writes far out of the money options in addition to investing in an index, the chance of beating the index for one or even five years is extremely high. If the catastrophe does happen and he loses a billion dollars or so, the worst you can do is fire him. (Your contract with the manager is always a call option.) This is why portfolio management contracts are not purely statistical, but also write down what kind of investments can and cannot be made.

Portfolios of put and call options are called strategies, and have additional interesting properties. Figure 28 graphs the payoff of a straddle, which combines a put and call at the same strike price. This strategy pays off if the stock goes up or goes down. It loses money if the stock does not move. Thus the straddle is a bet on volatility. Of course, everyone else understands this, and will bid the put and call prices up until the straddle earns only an equilibrium rate of return. Thus, you invest in a straddle if you think that stock volatility is higher than everyone else thinks it will be. Options allow efficient markets and random walks to operate on the second and higher moments of stocks as well as their overall direction! You can also see quickly that volatility will be a central parameter in option prices. The higher the volatility, the higher both put and call prices.

More generally, by combining options of various strikes, you can buy and sell any piece of the return distribution. A complete set of options – call options on every strike price – is equivalent to complete markets, i.e., it allows you to form payoffs that depend on the terminal stock price in any way; you can form any payoff of the form $f(S_T)$.
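To make the claim concrete, here is a minimal sketch of how a payoff $f(S_T)$ can be assembled from calls on a finite grid of strikes. It is an illustration under stated assumptions, not a construction from the text: the grid is finite, so the portfolio matches the piecewise-linear interpolation of $f$ on the grid; the first strike is taken to be zero, so that call is just the stock; and a riskless bond supplies the constant $f(0)$. The function names and the example strikes are hypothetical.

```python
def call_payoff(s_t, strike):
    """Value at expiration of a long call: max(S_T - X, 0)."""
    return max(s_t - strike, 0.0)


def call_weights(f, strikes):
    """Bond face value and call holdings that replicate f on a strike grid.

    Assumes strikes are increasing and strikes[0] == 0. The holding at each
    strike is the change in slope of f there; the last slope is extrapolated
    beyond the final strike.
    """
    values = [f(k) for k in strikes]
    slopes = [
        (values[i + 1] - values[i]) / (strikes[i + 1] - strikes[i])
        for i in range(len(strikes) - 1)
    ]
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]
    return values[0], list(zip(strikes[:-1], weights))


def portfolio_payoff(s_t, bond_face, calls):
    """Payoff of the bond plus the weighted calls at terminal price s_t."""
    return bond_face + sum(w * call_payoff(s_t, k) for k, w in calls)


# Example: a straddle struck at 100, i.e. f(S_T) = |S_T - 100|.
bond_face, calls = call_weights(lambda s: abs(s - 100.0), [0.0, 50.0, 100.0, 150.0])
print(bond_face, calls)
print(portfolio_payoff(80.0, bond_face, calls), portfolio_payoff(130.0, bond_face, calls))
```

In this example the weights come out as a bond with face value 100, short one stock (the strike-zero call), and long two calls struck at 100, which is a straddle with the put replaced by calls, stock and bond.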

[Figure 28 not reproduced. The plot shows the call payoff, the put payoff, and the straddle profit against the stock price; the straddle costs $P + C$ and makes money if the stock ends up far enough from the strike in either direction.]

Figure 28. Payoff diagram for a straddle.

17.1.2 Prices: one period analysis

I use the law of one price – existence of a discount factor – and no-arbitrage – existence of a positive discount factor – to characterize option prices. The results are: 1) Put-call parity: $P = C - S + X/R^f$. 2) Arbitrage bounds, best summarized by Figure 30. 3) The proposition that you should never exercise an American call option early on a stock that pays no dividends. The arbitrage bounds are a linear program, and this procedure can be used to find them in more complex situations where clever identification of arbitrage portfolios may fail.

We have a set of interesting payoffs. Now what can we say about their prices – their values at dates before expiration? Obviously, $p = E(mx)$, as always. We have learned about $x$, now we have to think about $m$.

We can start by imposing little structure – the law of one price and the absence of arbitrage, or, equivalently, the existence of some discount factor or a positive discount factor. In the case of options, these two principles actually do tell you a good deal about the option price.

Put-Call parity

The law of one price, or the existence of some discount factor that prices stock, bond, and a call option, allows us to deduce the value of a put in terms of the price of the stock, bond, and call. Consider the following two strategies: 1) Hold a call, write a put, same strike price. 2) Hold stock, promise to pay $X$. The payoffs of these two strategies are the same, as shown in Figure 29.

Equivalently, the payoffs are related by

$$P_T = C_T - S_T + X.$$

Thus, so long as the law of one price holds, the prices of left and right hand sides must be equal. Applying $E(m\,\cdot)$ to both sides for any $m$,

$$P = C - S + X/R^f.$$

(The price of $S_T$ is $S$. The price of the payoff $X$ is $X/R^f$.)
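A short numerical check of both statements may help. This is a sketch with hypothetical inputs (a call price of 10, stock price 100, strike 100 and gross riskfree rate 1.05); the function names are introduced here for illustration only.

```python
def parity_payoffs_match(strike, terminal_prices):
    """Check P_T = C_T - S_T + X state by state."""
    return all(
        max(strike - s_t, 0.0) == max(s_t - strike, 0.0) - s_t + strike
        for s_t in terminal_prices
    )


def put_price_from_parity(call_price, stock_price, strike, gross_rf):
    """Put price implied by put-call parity: P = C - S + X / R^f."""
    return call_price - stock_price + strike / gross_rf


print(parity_payoffs_match(100.0, [0.0, 50.0, 100.0, 150.0, 200.0]))
print(put_price_from_parity(10.0, 100.0, 100.0, 1.05))
```

Because the payoffs agree in every state, the price relation needs only the law of one price; no assumption about the distribution of $S_T$ enters.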

Arbitrage bounds

[Figure 29 not reproduced. Payoff diagrams against $S_T$ showing that buy call + write put gives the same payoff as buy stock + borrow strike; the axes are marked at $-X$ and $X$.]

Figure 29. Put call parity.

If we add the absence of arbitrage, or equivalently the restriction that the discount factor must be positive, we can deduce bounds on the call option price without needing to know the put price. In this case, it is easiest to cleverly notice arbitrage portfolios – situations in which portfolio A dominates portfolio B. Then, either directly from the definition of no-arbitrage or from $A > B,\ m > 0 \Rightarrow E(mA) > E(mB)$, you can deduce that the price of A must be greater than the price of B. The arbitrage portfolios are

1. $C_T > 0 \Rightarrow C > 0$. The call payoff is positive so the call price must be positive.
2. $C_T \geq S_T - X \Rightarrow C \geq S - X/R^f$. The call payoff is better than hold stock - pay strike for sure, so the call price is greater than holding the stock and borrowing the strike, i.e. promising to pay it for sure.
3. $C_T \leq S_T \Rightarrow C \leq S$. The call payoff is worse than the stock payoff (because you have to pay the strike price). Thus, the call price is less than the stock price.

Figure 30 summarizes these arbitrage bounds on the call option value. We have gotten somewhere – we have restricted the range of the option prices. However, the arbitrage bounds are too large to be of much use. Obviously, we need to learn more about the discount factor than pure arbitrage or $m > 0$ will allow. We could retreat to economic models, e.g. use the CAPM or other explicit discount factor model. Option pricing is famous because we don’t have to do that. Instead, if we open up dynamic trading – the requirement that the discount factor price the stock and bond at every date to expiration – it turns out that we can sometimes determine the discount factor and hence the option value precisely.

This presentation is unsettling for two reasons. First, you may worry that you will not be clever enough to dream up dominating portfolios in more complex circumstances. Second, you may worry that we have not dreamed up all of the arbitrage portfolios in this circumstance. Perhaps there is another one lurking out there, which would reduce the unsettlingly large size of the bounds. It leaves us hungry for a constructive technique for finding arbitrage bounds that would be guaranteed to work in general situations, and to find the tightest arbitrage bound.

[Figure 30 not reproduced. Call value today $C$ plotted against stock value today $S$, with $X/R^f$ marked on the horizontal axis; the call value lies in the region between the arbitrage bounds.]

Figure 30. Arbitrage bounds for a call option

We want to know $C_t = E(m_{t,T}\, x^C_T)$, where $x^C_T = \max(S_T - X, 0)$ denotes the call payoff, we want to use information in the observed stock and bond prices to learn about the option price, and we want to impose the absence of arbitrage. We can capture this search with the following problem:

$$\max_{m}\ C_t = E_t(m\, x^C_T) \quad \text{s.t.} \quad m > 0, \quad S_t = E_t(m S_T), \quad 1 = E_t(m R^f) \tag{17.1}$$

and the corresponding minimization. The first constraint implements absence of arbitrage. The second and third use the information in the stock and bond price to learn what we can about the option price.

Write (17.1) out in state notation,

$$\max_{\{m(s)\}}\ C_t = \sum_s \pi(s)\, m(s)\, x^C_T(s) \quad \text{s.t.} \quad m(s) > 0, \quad S_t = \sum_s \pi(s)\, m(s)\, S_T(s), \quad 1 = \sum_s \pi(s)\, m(s)\, R^f$$

This is a linear program – a linear objective and linear constraints. In situations where you do not know the answer, you can calculate arbitrage bounds – and know you have them all – by solving this linear program. I don’t know how you would begin to check that for every portfolio $A$ whose payoff dominates $B$, the price of $A$ is greater than the price of $B$. The discount factor method lets you construct the arbitrage bounds.
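As a rough illustration of the linear program, the sketch below computes the bounds on a finite grid of terminal stock prices. Several things here are assumptions made for the example rather than anything in the text: the grid itself, the notation $q(s) = \pi(s) m(s)$ for state prices, the relaxation of $m > 0$ to $q \geq 0$ (so the bounds are limits that strictly positive discount factors approach), and the solution method. Instead of calling a general LP solver, the code uses the fact that a bounded linear program with two equality constraints attains its optimum at a solution with at most two nonzero variables, and simply enumerates pairs of states.

```python
def call_arbitrage_bounds(stock_price, strike, gross_rf, terminal_prices, tol=1e-12):
    """Lower and upper bounds on the call price over all state prices q >= 0
    that price the stock (sum q*S_T = S) and the bond (sum q = 1/R^f).

    Enumerates pairs of states; terminal_prices must be distinct.
    """
    bond_price = 1.0 / gross_rf
    values = []
    n = len(terminal_prices)
    for i in range(n):
        for j in range(i + 1, n):
            s_i, s_j = terminal_prices[i], terminal_prices[j]
            # Solve q_i + q_j = 1/R^f and q_i*s_i + q_j*s_j = S.
            q_j = (stock_price - s_i * bond_price) / (s_j - s_i)
            q_i = bond_price - q_j
            if q_i >= -tol and q_j >= -tol:
                values.append(
                    q_i * max(s_i - strike, 0.0) + q_j * max(s_j - strike, 0.0)
                )
    return min(values), max(values)


# Hypothetical example: S = 100, X = 100, R^f = 1.05, states 0, 5, ..., 1000.
grid = [5.0 * k for k in range(201)]
lower, upper = call_arbitrage_bounds(100.0, 100.0, 1.05, grid)
print(lower, upper)
```

On this grid the lower bound equals $S - X/R^f$ and the upper bound sits somewhat below $S$; widening the grid pushes the upper bound toward $S$, in line with the bounds found above by inspection.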

Early exercise

By applying the absence of arbitrage, we can show quickly that you should never exercise an American call option on a stock that pays no dividends before the expiration date. This is a lovely illustration because such a simple principle leads to a result that isn’t initially obvious. Follow the table:

Payoffs: $C_T = \max(S_T - X, 0) \geq S_T - X$
Price: $C \geq S - X/R^f$
$R^f > 1$: $C \geq S - X$

$S - X$ is what you get if you exercise now. The value of the call is greater than this value, because you can delay paying the strike, and because exercising early loses the option value. Put-call parity let us concentrate on call options; this fact lets us concentrate on European options.

17.2 Black-Scholes formula

Write a process for stock and bond, then use $\Lambda^*$ to price the option. The Black-Scholes formula (17.7) results. You can either solve for the finite-horizon discount factor $\Lambda_T/\Lambda_0$ and find the call option price by taking the expectation $C_0 = E_0(\Lambda_T/\Lambda_0\, x^C_T)$, or you can find a differential equation for the call option price and solve it backward.

Our objective, again, is to learn as much as we can about the value of an option, given the value of the underlying stock and bond. The one-period analysis led only to arbitrage bounds, at which point we had to start thinking about discount factor models. Now, we allow intermediate trading, which means we really are thinking about dynamic multiperiod asset pricing.

The standard approach to the Black-Scholes formula rests on explicitly constructing portfolios: at each date we cleverly construct a portfolio of stock and bond that replicates the instantaneous payoff of the option; we reason that the price of the option must equal the price of the replicating portfolio. Instead, I follow the discount factor approach. The law of one price is the same thing as the existence of a discount factor. Thus, rather than construct law-of-one-price replicating portfolios, construct at each date a discount factor that prices the stock and bond, and use that discount factor to price the option. The discount factor approach shows how thinking of the world in terms of a discount factor is equivalent in the result and as easy in the calculation as other approaches.

This case shows some of the interest and engineering complexity of continuous time models. Though at each instant the analysis is trivial law of one price, chaining it together over time is not trivial.

The call option payoff is

$$C_T = \max(S_T - X, 0)$$

where $X$ denotes the strike price and $S_T$ denotes the stock price on the expiration date $T$. The underlying stock follows

$$\frac{dS}{S} = \mu\,dt + \sigma\,dz.$$

There is also a money market security that pays the real interest rate $r\,dt$.

We want a discount factor that prices the stock and bond. All such discount factors are of the form $m = x^* + w$, $E(xw) = 0$. In continuous time, all such discount factors are of the form

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \frac{\mu - r}{\sigma}\,dz - \sigma_w\,dw; \quad E(dw\,dz) = 0.$$

(You can check that this set of discount factors does in fact price the stock and interest rate, or take a quick look back at section ??.)
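One way to carry out that check, as a sketch: apply Ito's lemma to the product $\Lambda S$, using $dz^2 = dt$ and $E(dw\,dz) = 0$. The symbol $B$ for the value of the money market security, with $dB/B = r\,dt$, is notation introduced here for illustration.

$$\frac{d(\Lambda S)}{\Lambda S} = \frac{d\Lambda}{\Lambda} + \frac{dS}{S} + \frac{d\Lambda}{\Lambda}\frac{dS}{S}$$

$$E_t\left[\frac{d(\Lambda S)}{\Lambda S}\right] = -r\,dt + \mu\,dt - \frac{\mu - r}{\sigma}\,\sigma\,dt = 0$$

$$E_t\left[\frac{d(\Lambda B)}{\Lambda B}\right] = -r\,dt + r\,dt = 0$$

The $\sigma_w\,dw$ term drops out of both conditions, since $dw$ has mean zero and is uncorrelated with $dz$, so any choice of $\sigma_w$ gives a discount factor that prices the stock and the money market security.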

Now we price the call option with this discount factor, and show that the Black-Scholes equation results. Importantly, the choice of discount factor via choice of $\sigma_w\,dw$ turns out to have no effect on the resulting option price. Every discount factor that prices the stock and interest rate gives the same value for the option price. The option is therefore priced using the law of one price alone.

There are two paths to follow. Either we solve the discount factor forward, and then find the call value by $C = E(m x^C)$, or we characterize the price path and solve it backwards from expiration.

17.2.1 Method 1: Price using discount factor??

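Let us use the discount factor to price the option directly:

$$C_0 = E_0\left[\frac{\Lambda_T}{\Lambda_0}\max(S_T - X, 0)\right] = \int \frac{\Lambda_T}{\Lambda_0}\max(S_T - X, 0)\, df(\Lambda_T, S_T)$$

where $\Lambda_T$ and $S_T$ are solutions to

$$\frac{dS}{S} = \mu\,dt + \sigma\,dz \tag{17.2}$$

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \frac{\mu - r}{\sigma}\,dz - \sigma_w\,dw.$$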

I simplify the algebra by setting $\sigma_w\,dw$ to zero, anticipating that it does not matter. You can reason that since $S$ does not depend on $dw$, $C_T$ depends only on $S_T$, so $C$ will only depend on $S$, and $dw$ will have no effect on the answer. If this isn’t good enough, a problem asks you to include the $dw$, trace through the remaining steps and verify that the answer does not in fact depend on $dw$.

“Solving” a stochastic differential equation such as (17.2) means finding the distribution of the random variables $S_T$ and $\Lambda_T$, using information as of date 0. This is just what we do with difference equations. For example, if we solve $x_{t+1} = \rho x_t + \varepsilon_{t+1}$ with $\varepsilon$ normal forward to $x_T = \rho^T x_0 + \sum_{j=1}^{T} \rho^{T-j}\varepsilon_j$, we have expressed $x_T$ as a normally distributed random variable with mean $\rho^T x_0$ and, for unit-variance shocks, variance $\sum_{j=1}^{T} \rho^{2(T-j)}$. In the continuous time case, it turns out that we can solve some nonlinear specifications as well. Integrals of $dz$ give us shocks, as integrals of $dt$ give us deterministic functions of time.
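The difference-equation example can be checked directly. The sketch below iterates $x_{t+1} = \rho x_t + \varepsilon_{t+1}$ forward for a given list of shocks and compares the result with the solved-forward expression; it also computes $\sum_{j=1}^{T}\rho^{2(T-j)}$, which is the variance of $x_T$ when the shocks are independent with unit variance. The function names and the numbers are hypothetical.

```python
def iterate_forward(rho, x0, shocks):
    """Apply x_{t+1} = rho * x_t + eps_{t+1} once per shock."""
    x = x0
    for eps in shocks:
        x = rho * x + eps
    return x


def solved_forward(rho, x0, shocks):
    """x_T = rho^T x_0 + sum_{j=1}^T rho^(T-j) eps_j."""
    T = len(shocks)
    return rho ** T * x0 + sum(rho ** (T - j) * shocks[j - 1] for j in range(1, T + 1))


def variance_of_x_T(rho, T):
    """Variance of x_T for independent unit-variance shocks."""
    return sum(rho ** (2 * (T - j)) for j in range(1, T + 1))


shocks = [1.0, -0.5, 0.25]
print(iterate_forward(0.5, 2.0, shocks), solved_forward(0.5, 2.0, shocks))
print(variance_of_x_T(0.5, 3))
```

The two functions agree shock by shock, which is all that "solving forward" means here: the terminal value is written as a known function of the initial condition and the shocks, so its distribution follows from theirs.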

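We can find analytical expressions for the solutions of equations of the form (17.2). Start with the stochastic differential equation

$$\frac{dY}{Y} = \mu_Y\,dt + \sigma_Y\,dz. \tag{17.3}$$

Write

$$d\ln Y = \frac{dY}{Y} - \frac{1}{2}\frac{1}{Y^2}\,dY^2 = \left(\mu_Y - \frac{1}{2}\sigma_Y^2\right)dt + \sigma_Y\,dz.$$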

Integrating from $0$ to $T$, (17.3) has solution

$$\ln Y_T = \ln Y_0 + \left(\mu_Y - \frac{\sigma_Y^2}{2}\right)T + \sigma_Y (z_T - z_0). \tag{17.4}$$

$z_T - z_0$ is a normally distributed random variable with mean zero and variance $T$. Thus, $\ln Y$ is conditionally normal with mean $\ln Y_0 + \left(\mu_Y - \frac{\sigma_Y^2}{2}\right)T$ and variance $\sigma_Y^2 T$. You can check this solution by differentiating it – don’t forget the second derivative terms.

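Applying the solution (17.4) to (17.2), we have

$$\ln S_T = \ln S_0 + \left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,\varepsilon \tag{17.5}$$

$$\ln \Lambda_T = \ln \Lambda_0 - \left(r + \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2\right)T - \frac{\mu - r}{\sigma}\sqrt{T}\,\varepsilon$$

where the random variable $\varepsilon$ is

$$\varepsilon = \frac{z_T - z_0}{\sqrt{T}} \sim N(0, 1).$$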

Having found the joint distribution of stock and discount factor, we evaluate the call option value by doing the integral corresponding to the expectation,

$$\begin{aligned} C_0 &= \int_{S_T = X}^{\infty} \frac{\Lambda_T}{\Lambda_0}\,(S_T - X)\, df(\Lambda_T, S_T) \\ &= \int_{S_T = X}^{\infty} \frac{\Lambda_T(\varepsilon)}{\Lambda_0}\,(S_T(\varepsilon) - X)\, df(\varepsilon) \end{aligned} \tag{17.6}$$

We know the joint distribution of the terminal stock price $S_T$ and discount factor $\Lambda_T$ on the right hand side, so we have all the information we need to calculate this integral. This example has enough structure that we can find an analytical formula. In more general circumstances, you may have to resort to numerical methods. At the most basic level, you can simulate the $\Lambda, S$ process forward and then take the integral by summing over many such simulations.
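The simulation approach can be sketched as follows. This is a minimal illustration that draws $\varepsilon \sim N(0,1)$, builds $S_T$ and $\Lambda_T/\Lambda_0$ from the lognormal solutions (17.5) with $\sigma_w\,dw$ set to zero, and averages the discounted payoff. The parameter values, the number of draws, the fixed seed and the function name are assumptions made for the example.

```python
import math
import random


def mc_call_price(s0, strike, r, mu, sigma, T, n_draws=100_000, seed=0):
    """Monte Carlo estimate of C_0 = E[(Lambda_T / Lambda_0) * max(S_T - X, 0)]."""
    rng = random.Random(seed)
    lam = (mu - r) / sigma  # loading of the discount factor on the stock shock
    root_T = math.sqrt(T)
    total = 0.0
    for _ in range(n_draws):
        eps = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * root_T * eps)
        discount = math.exp(-(r + 0.5 * lam ** 2) * T - lam * root_T * eps)
        total += discount * max(s_T - strike, 0.0)
    return total / n_draws


# Two hypothetical mean returns, everything else equal.
price_high_mu = mc_call_price(100.0, 100.0, r=0.05, mu=0.10, sigma=0.2, T=1.0)
price_low_mu = mc_call_price(100.0, 100.0, r=0.05, mu=0.02, sigma=0.2, T=1.0)
print(price_high_mu, price_low_mu)
```

Up to simulation error the two estimates coincide: the stock's mean return $\mu$ moves both $S_T$ and the discount factor, and the two effects offset in the product.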

Doing the integral

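Start by breaking up the integral (17.6) into two terms,

$$C_0 = \int_{S_T = X}^{\infty} \frac{\Lambda_T(\varepsilon)}{\Lambda_0}\, S_T(\varepsilon)\, df(\varepsilon) - \int_{S_T = X}^{\infty} \frac{\Lambda_T(\varepsilon)}{\Lambda_0}\, X\, df(\varepsilon).$$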

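$S_T$ and $\Lambda_T$ are both exponential functions of $\varepsilon$. The normal distribution is also an exponential function of $\varepsilon$. Thus, we can approach this integral exactly as we approach the expectation of a lognormal; we can merge the two exponentials in $\varepsilon$ into one term, and express the result as integrals against a normal distribution. Here we go. Plug in (17.5) for $S_T$, $\Lambda_T$, and simplify the exponentials in terms of $\varepsilon$,

$$\begin{aligned} C_0 &= \int_{S_T = X}^{\infty} e^{-\left(r + \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2\right)T - \frac{\mu - r}{\sigma}\sqrt{T}\,\varepsilon}\; S_0\, e^{\left(\mu - \frac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}\,\varepsilon} f(\varepsilon)\,d\varepsilon \\ &\quad - X \int_{S_T = X}^{\infty} e^{-\left(r + \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2\right)T - \frac{\mu - r}{\sigma}\sqrt{T}\,\varepsilon} f(\varepsilon)\,d\varepsilon \\ &= S_0 \int_{S_T = X}^{\infty} e^{\left[\mu - r - \frac{1}{2}\left(\sigma^2 + \left(\frac{\mu - r}{\sigma}\right)^2\right)\right]T + \left(\sigma - \frac{\mu - r}{\sigma}\right)\sqrt{T}\,\varepsilon} f(\varepsilon)\,d\varepsilon \\ &\quad - X \int_{S_T = X}^{\infty} e^{-\left(r + \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2\right)T - \frac{\mu - r}{\sigma}\sqrt{T}\,\varepsilon} f(\varepsilon)\,d\varepsilon. \end{aligned}$$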

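Now add the normal distribution formula for $f(\varepsilon)$,

$$f(\varepsilon) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\varepsilon^2}.$$

The result is

$$\begin{aligned} C_0 &= \frac{1}{\sqrt{2\pi}}\, S_0 \int_{S_T = X}^{\infty} e^{\left[\mu - r - \frac{1}{2}\left(\sigma^2 + \left(\frac{\mu - r}{\sigma}\right)^2\right)\right]T + \left(\sigma - \frac{\mu - r}{\sigma}\right)\sqrt{T}\,\varepsilon - \frac{1}{2}\varepsilon^2}\,d\varepsilon \\ &\quad - \frac{1}{\sqrt{2\pi}}\, X \int_{S_T = X}^{\infty} e^{-\left[r + \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2\right]T - \frac{\mu - r}{\sigma}\sqrt{T}\,\varepsilon - \frac{1}{2}\varepsilon^2}\,d\varepsilon \\ &= \frac{1}{\sqrt{2\pi}}\, S_0 \int_{S_T = X}^{\infty} e^{-\frac{1}{2}\left[\varepsilon - \left(\sigma - \frac{\mu - r}{\sigma}\right)\sqrt{T}\right]^2}\,d\varepsilon \\ &\quad - \frac{1}{\sqrt{2\pi}}\, X e^{-rT} \int_{S_T = X}^{\infty} e^{-\frac{1}{2}\left(\varepsilon + \frac{\mu - r}{\sigma}\sqrt{T}\right)^2}\,d\varepsilon. \end{aligned}$$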

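Notice that the integrals have the form of a normal distribution with nonzero mean. The lower bound $S_T = X$ is, in terms of $\varepsilon$,

$$\ln X = \ln S_T = \ln S_0 + \left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,\varepsilon$$

$$\varepsilon = \frac{\ln X - \ln S_0 - \left(\mu - \frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}}.$$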

Finally, we can express definite integrals against a normal distribution by the cumulative normal,

$$\frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{-\frac{1}{2}(\varepsilon - \mu)^2}\,d\varepsilon = \Phi(\mu - a)$$

i.e., $\Phi(\cdot)$ is the area under the left tail of the normal distribution.

F3 @ V3�

3C�

oq[ � oqV3 ���� �5

5

�W

�sW � w

.

�� � �� u

�sW

4D

�[h�u+W�w,�

3C�

oq[ � oqV3 ���� �5

5

�W

�sW

� �� u

sW

4D

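Simplifying, we get the Black-Scholes formula

$$C_0 = S_0\,\Phi\!\left(\frac{\ln S_0/X + \left(r + \frac12\sigma^2\right)T}{\sigma\sqrt{T}}\right) - X e^{-rT}\,\Phi\!\left(\frac{\ln S_0/X + \left(r - \frac12\sigma^2\right)T}{\sigma\sqrt{T}}\right). \tag{17.7}$$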
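A quick numerical check of the simplification: the expression before simplifying contains the drift $\mu$ in both arguments of $\Phi$, yet the value should not depend on it. The sketch below, with hypothetical parameter values and made-up function names, evaluates the unsimplified expression for several values of $\mu$ and compares each with the Black-Scholes formula.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution (left-tail area)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def call_unsimplified(S0, X, r, mu, sigma, T):
    """Call value before simplification; mu appears in both arguments."""
    # Lower limit of integration: the epsilon at which S_T = X
    a = (math.log(X) - math.log(S0) - (mu - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    lam = (mu - r) / sigma
    first = S0 * Phi(-a + (sigma - lam) * math.sqrt(T))
    second = X * math.exp(-r * T) * Phi(-a - lam * math.sqrt(T))
    return first - second

def call_black_scholes(S0, X, r, sigma, T):
    """The simplified Black-Scholes formula."""
    d1 = (math.log(S0 / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = (math.log(S0 / X) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return S0 * Phi(d1) - X * math.exp(-r * T) * Phi(d2)

# Hypothetical inputs; only mu varies across the three evaluations
values = [call_unsimplified(100.0, 95.0, 0.05, mu, 0.25, 0.5) for mu in (0.02, 0.10, 0.30)]
benchmark = call_black_scholes(100.0, 95.0, 0.05, 0.25, 0.5)
print(values, benchmark)
```

All three values should coincide with the benchmark up to floating-point error, which is the sense in which the mean stock return drops out of the option price.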


17.2.2 Method 2: Derive Black-Scholes differential equation

Rather than solve the discount factor forward and then integrate, we can solve the price backwards from expiration. The instantaneous or expected return formulation of a pricing model amounts to a differential equation for prices.

Guess that the solution for the call price is a function of stock price and time to expiration, $C_t = C(S,t)$. Use Ito's lemma to find derivatives of $C(S,t)$,

$$dC = C_t\,dt + C_S\,dS + \frac12 C_{SS}\,dS^2$$

$$dC = \left[C_t + C_S S\mu + \frac12 C_{SS} S^2\sigma^2\right]dt + C_S S\sigma\,dz.$$

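Plugging into the basic asset pricing equation

$$0 = E_t\left(d\Lambda C\right) = C\,E_t\,d\Lambda + \Lambda\,E_t\,dC + E_t\,d\Lambda\,dC,$$

using $E_t(d\Lambda/\Lambda) = -r\,dt$, noting that the $dz$ loading of $d\Lambda/\Lambda$ is $-(\mu-r)/\sigma$ so that $E_t\,d\Lambda\,dC = -\Lambda S(\mu-r)C_S\,dt$, and canceling $\Lambda\,dt$, we get

$$0 = -rC + C_t + C_S S\mu + \frac12 C_{SS} S^2\sigma^2 - S(\mu - r)C_S$$

or,

$$0 = -rC + C_t + S r C_S + \frac12 C_{SS} S^2\sigma^2. \tag{17.8}$$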

This is the Black-Scholes differential equation for the option price.

We now know a differential equation for the price function $C(S,t)$. We know the value of this function at expiration, $C(S_T,T) = \max(S_T - X, 0)$. The remaining task is to solve this differential equation backwards through time. Conceptually, and numerically, this is easy. Express the differential equation as

$$-\frac{\partial C(S,t)}{\partial t} = -rC(S,t) + S r\,\frac{\partial C(S,t)}{\partial S} + \frac12\frac{\partial^2 C(S,t)}{\partial S^2} S^2\sigma^2.$$

At any point in time, you know the values of $C(S,t)$ for all $S$ – for example, you can store them on a grid for $S$. Then, you can take the first and second derivatives with respect to $S$ and form the quantity on the right hand side at each value of $S$. Now, you can find the option price at any value of $S$, one instant earlier in time.
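The grid procedure just described can be sketched as an explicit finite-difference scheme. Everything specific here is an assumption made for illustration: the parameter values, the grid sizes, the upper boundary treatment (a deep in-the-money call is set to $S - Xe^{-r(T-t)}$), and the function name `fd_call`. Explicit schemes are only stable when the time step is small relative to the squared grid spacing, which the default arguments respect.

```python
import math
import numpy as np

def fd_call(X=100.0, r=0.05, sigma=0.20, T=1.0, S_max=300.0, n_S=150, n_t=2000):
    """Step the Black-Scholes differential equation back from expiration."""
    S = np.linspace(0.0, S_max, n_S + 1)
    dS = S[1] - S[0]
    dt = T / n_t
    C = np.maximum(S - X, 0.0)  # known values at expiration
    for k in range(1, n_t + 1):
        # First and second derivatives with respect to S on interior grid points
        C_S = (C[2:] - C[:-2]) / (2.0 * dS)
        C_SS = (C[2:] - 2.0 * C[1:-1] + C[:-2]) / dS**2
        rhs = -r * C[1:-1] + r * S[1:-1] * C_S + 0.5 * sigma**2 * S[1:-1]**2 * C_SS
        new = np.empty_like(C)
        new[1:-1] = C[1:-1] + dt * rhs   # C(t - dt) = C(t) + dt * (-dC/dt)
        new[0] = 0.0                      # a call on a worthless stock is worthless
        new[-1] = S_max - X * math.exp(-r * k * dt)  # assumed deep in-the-money value
        C = new
    return S, C

S, C = fd_call()
price = float(np.interp(100.0, S, C))
print(round(price, 3))
```

With these inputs the result should be close to the analytic Black-Scholes value of about 10.45; a finer grid or an implicit scheme would be the usual refinements.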

This differential equation, solved with boundary condition

$$C = \max\{S_T - X, 0\}$$

has an analytic solution – the familiar formula (17.7). One standard way to solve differential equations is to guess and check; and by taking derivatives you can check that (17.7) does satisfy (17.8). Black and Scholes solved the differential equation with a fairly complicated Fourier transform method. The more elegant Feynman-Kac solution amounts to showing that solutions of the partial differential equation (17.8) can be represented as integrals of the form that we already derived independently as in (17.6). (See Duffie 1992 p.87)

17.3 Problems

1. We showed that you should never exercise an American call early if there are no dividends. Is the same true for American puts, or are there circumstances in which it is optimal to exercise American puts early?

2. Retrace the steps in the integral derivation of the Black-Scholes formula and show that the $dw$ does not affect the final result.


Chapter 18. Option pricing without perfect replication

18.1 On the edges of arbitrage

The Black-Scholes formula is justly famous and launched a thousand techniques for option pricing. The principle of no-arbitrage pricing is obvious, but its application leads to many subtle and unanticipated pricing relationships.

However, in many practical situations, the law of one price arguments that we used in the Black-Scholes formula break down. If options really were redundant, it is unlikely that they would be traded as separate assets. It really is easy to synthesize forward rates from zero-coupon bonds, and forward rates are not separately traded or quoted.

We really cannot trade continuously, and trying to do so would drown a strategy in transactions costs. As a practical example, at the time of the 1987 stock market crash, several prominent funds were trying to follow “portfolio insurance” strategies, essentially synthesizing put options by systematically selling stocks as prices declined. During the time of the crash, however, they found that the markets just dried up – they were unable to sell as prices plummeted. We model this situation mathematically as a Poisson jump, a discontinuous movement in prices. In the face of such jumps the call option payoff is not perfectly hedged by a portfolio of stock and bond, and cannot be priced as such.

Generalizations of the stochastic setup lead to the same result. If the interest rate or stock volatility is stochastic, we do not have securities that allow us to perfectly hedge the corresponding shocks, so the law of one price again breaks down.

In addition, many options are written on underlying securities that are not traded, or not traded continually and with sufficient liquidity. Real options in particular – the option to build a factory in a particular location – are not based on a tradeable underlying security, so the logic behind Black-Scholes pricing does not apply. Executives are specifically forbidden to short stock in order to hedge executive options.

Furthermore, applications of option pricing formulas to trading activities seem to suffer a strange inconsistency. We imagine that the stock and bond are perfectly priced and perfectly liquid – available for perfect hedging. Then, we search for options that are priced incorrectly as trading opportunities. If the options can be priced wrong, why can’t the stock and bond be priced wrong? We should treat all assets symmetrically in evaluating trading opportunities. Trading opportunities also involve risk, and a theory that pretends they are arbitrage opportunities does not help much to quantify that risk.

In all of these situations, an unavoidable “basis risk” creeps in between the option payoff and the best possible hedge portfolio. Holding the option entails some risk, and the value of the option depends on the “market price” of that risk – the covariance of the risk with an appropriate discount factor.

Nonetheless, we would like not to give up and go back to the consumption-based model, factor models, or other “absolute” methods that try to price all assets. We are still willing to take as given the prices of lots of assets in determining the price of an option, and in particular assets that will be used to hedge the option. We can form an “approximate hedge” or portfolio of basis assets “closest to” the focus payoff, and we can hedge most of the option’s risk with that approximate hedge. Then, the uncertainty about the option price is reduced to figuring out only the price of the residual.

In addition, since the residuals are small, we might be able to say a lot about option prices with much weaker restrictions on the discount factor than those suggested by absolute models.

In this chapter, I survey “good deal” option price bounds, a technique that Jesus Saá-Requejo and I (1999) advocated for this situation. The good deal bounds amount to systematically searching over all possible assignments of the “market price of risk” of the residual, constraining the total market price of risk to a reasonable value, and imposing no arbitrage opportunities, to find upper and lower bounds on the option price. It is not equivalent to pricing options with pure Sharpe ratio arguments. The concluding section of this chapter surveys some alternative and additional techniques.

18.2 One-period good deal bounds

We want to price the payoff $x^c$, for example, $x^c = \max(S_T - K, 0)$ for a call option. We have in hand an $N$-dimensional vector of basis payoffs $x$, whose prices $p$ we can observe, for example the stock and bond. The good deal bound finds the minimum and maximum value of $x^c$ by searching over all positive discount factors that price the basis assets and have limited volatility:

$$\bar{C} = \max_{\{m\}} E(m x^c) \quad \text{s.t. } p = E(mx),\ m \geq 0,\ \sigma(m) \leq h/R^f. \tag{18.1}$$

The corresponding minimization yields the lower bound $\underline{C}$. This is a one-period discrete-time problem. The Black-Scholes formula does not apply because you can’t trade between the price and payoff periods.

The first constraint on the discount factor imposes the price of the basis assets. We want to do as much relative pricing as possible; we want to extend what we know about the prices of $x$ to price $x^c$, without worrying about where the prices of $x$ come from. The second constraint imposes the absence of arbitrage. This problem without the last constraint yields the arbitrage bounds that we studied in section 30. In most situations, the arbitrage bounds are too wide to be of much use.

The last is an additional constraint on discount factors, and the extra content of good-deal vs. arbitrage bounds. It is a relatively weak restriction. We could obtain closer bounds on prices with more information about the discount factor. In particular, if we know the correlation of the discount factor with the payoff $x^c$ we could price the option a lot better!

Just as $m > 0$ means that no portfolio priced by $m$ may display an arbitrage opportunity, $\sigma(m) \leq h/R^f$ means that no portfolio priced by $m$ may have a Sharpe ratio greater than $h$. Recall that $E(mR^e) = 0$ implies $E(m)E(R^e) = -\rho\,\sigma(m)\sigma(R^e)$ and $|\rho| \leq 1$, so that $|E(R^e)|/\sigma(R^e) \leq \sigma(m)/E(m) = R^f\sigma(m)$.

It is a central advantage of a discount factor approach that we can easily impose both the discount factor volatility constraint and positivity, merging the lessons of factor models and option pricing models. The prices and payoffs generated by discount factors that satisfy both $m \geq 0$ and $\sigma(m) \leq h/R^f$ do more than rule out arbitrage opportunities and high Sharpe ratios.

I’ll treat the case that there is a riskfree rate, so we can write $E(m) = 1/R^f$. In this case, it is more convenient to express the volatility constraint as a second moment, so the bound (18.1) becomes

$$\underline{C} = \min_{\{m\}} E(m x^c) \quad \text{s.t. } p = E(mx),\ E(m^2) \leq A^2,\ m \geq 0, \tag{18.2}$$

where $A^2 \equiv (1 + h^2)/R^{f2}$, since $E(m^2) = \sigma^2(m) + E(m)^2 \leq h^2/R^{f2} + 1/R^{f2}$. The problem is a standard minimization with two inequality constraints. Hence we find a solution by trying all the combinations of binding and nonbinding constraints, in order of their ease of calculation. 1) Assume the volatility constraint binds and the positivity constraint is slack. This one is very easy to calculate, since we will find analytic formulas for the solution. If the resulting discount factor $m$ is nonnegative, this is the solution. If not, 2) assume that the volatility constraint is slack and the positivity constraint binds. This is the classic arbitrage bound. Find the minimum variance discount factor that generates the arbitrage bound. If this discount factor satisfies the volatility constraint, this is the solution. If not, 3) solve the problem with both constraints binding.

18.2.1 Volatility constraint binds, positivity constraint is slack

If the positivity constraint is slack, the problem reduces to

$$\underline{C} = \min_{\{m\}} E(m x^c) \quad \text{s.t. } p = E(mx),\ E(m^2) \leq A^2. \tag{18.3}$$

We could solve this problem directly, choosing $m$ in each state with Lagrange multipliers on the constraints. But as with the mean-variance frontier it is much more elegant to set up orthogonal decompositions and then let the solution pop out.

Figure 31 describes the idea. $X$ denotes the space of payoffs of portfolios of the basis assets $x$, a stock and a bond in the classic Black-Scholes setup. Though graphed as a line, $X$ is typically a larger space. We know all prices in $X$, but the payoff $x^c$ that we wish to value does not lie in $X$.

Start by decomposing the focus payoff $x^c$ into an approximate hedge $\hat{x}^c$ and a residual $w$,

$$
\begin{aligned}
x^c &= \hat{x}^c + w, \\
\hat{x}^c &\equiv \text{proj}(x^c|X) = E(x^c x')E(xx')^{-1}x, \\
w &\equiv x^c - \hat{x}^c.
\end{aligned}
\tag{18.4}
$$

We know the price of $\hat{x}^c$. We want to bound the price of the residual $w$ to learn as much as we can about the price of $x^c$.

All discount factors that price $x$ – that satisfy $p = E(mx)$ – lie in the plane through $x^*$. As we sweep through these discount factors, we generate any price from $-\infty$ to $\infty$ for the residual $w$ and hence payoff $x^c$. All positive discount factors $m > 0$ lie in the intersection of the $m$ plane and the positive orthant – the triangular region. Discount factors $m$ in this range generate a limited range of prices for the focus payoff – the arbitrage bounds. Since second moment defines distance in Figure 31, the set of discount factors that satisfies the volatility constraint $E(m^2) \leq A^2$ lies inside a sphere around the origin. The circle in Figure 31 shows the intersection of this sphere with the set of discount factors. This restricted range of discount factors will produce a restricted range of values for the residual $w$ and hence a restricted range of values for the focus payoff $x^c$. In the situation I have drawn, the positivity constraint is slack, since the $E(m^2) \leq A^2$ circle lies entirely in the positive orthant.

We want to find the discount factors in the circle that minimize or maximize the price of the residual $w$. The more a discount factor points in the $w$ direction, the larger a price $E(mw)$ it assigns to the residual. Obviously, the discount factors that maximize or minimize the price of $w$ point as much as possible towards and away from $w$. If you add any movement $\varepsilon$ orthogonal to $w$, this increases discount factor volatility without changing the price of $w$.

Hence, the discount factor that generates the lower bound is

$$\underline{m} = x^* - \underline{v}\,w \tag{18.5}$$

where

$$\underline{v} = \sqrt{\frac{A^2 - E(x^{*2})}{E(w^2)}} \tag{18.6}$$

is picked to just satisfy the volatility constraint. The bound is

$$\underline{C} = E(\underline{m}\,x^c) = E(x^* x^c) - \underline{v}\,E(w^2). \tag{18.7}$$

The upper bound is given by $\bar{v} = -\underline{v}$.

The first term in equation (18.7) is the value of the approximate hedge portfolio, and can be written several ways, including

$$E(x^* x^c) = E(x^* \hat{x}^c) = E(m\hat{x}^c) \tag{18.8}$$

for any discount factor $m$ that prices basis assets. (Don’t forget, $E(xy) = E[x\,\text{proj}(y|X)]$.) The second term in equation (18.7) is the lowest possible price of the residual $w$ consistent with the discount factor volatility bound:

$$-\underline{v}\,E(w^2) = E(-\underline{v}\,w\,w) = E[(x^* - \underline{v}\,w)w] = E(\underline{m}\,w).$$

Figure 31. Construction of a discount factor to solve the one-period good deal bound when the positivity constraint is slack. (The figure shows the payoff space $X$, the focus payoff $x^c$, its projection $\hat{x}^c$, the residual $w$, $x^*$, the region $m > 0$, the circle $E(m^2) \leq A^2$, and the discount factors $x^* - vw$ and $x^* + vw$.)

For calculations you can substitute the definitions of $x^*$ and $w$ in equation (18.7) to obtain an explicit, if not very pretty, formula:

$$\underline{C} = p'E(xx')^{-1}E(x x^c) - \sqrt{A^2 - p'E(xx')^{-1}p}\;\sqrt{E(x^{c2}) - E(x^c x')E(xx')^{-1}E(x x^c)}. \tag{18.9}$$

The upper bound $\bar{C}$ is the same formula with a $+$ sign in front of the square root.

Using (18.5), check whether the discount factor is positive in every state of nature. If so, this is the good-deal bound, and the positivity constraint is slack. If not, proceed to the next step.
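As a rough illustration of this first step, the sketch below computes the bounds on a discrete set of states and then checks positivity. The setup is entirely hypothetical: a grid of equally spaced normal shocks stands in for the stock distribution, the basis payoffs are a bond and a stock, and the Sharpe ratio limit `h` is an arbitrary per-period number. The function name `good_deal_bounds_slack` is made up for this sketch.

```python
import numpy as np

def good_deal_bounds_slack(prob, x, p, xc, A2):
    """Bounds when only the volatility constraint binds.

    prob: state probabilities (n,), x: basis payoffs (n, N),
    p: basis prices (N,), xc: focus payoff (n,), A2: bound on E(m^2).
    """
    Exx = x.T @ (prob[:, None] * x)                      # E(xx')
    x_star = x @ np.linalg.solve(Exx, p)                 # x* = p'E(xx')^{-1}x
    xc_hat = x @ np.linalg.solve(Exx, x.T @ (prob * xc)) # proj(x^c | X)
    w = xc - xc_hat                                      # residual
    Ew2, Exs2 = prob @ w**2, prob @ x_star**2
    if A2 < Exs2:
        raise ValueError("A2 is below E(x*^2); no discount factor is admissible")
    v = np.sqrt((A2 - Exs2) / Ew2)
    m_lo, m_hi = x_star - v * w, x_star + v * w
    return prob @ (m_lo * xc), prob @ (m_hi * xc), m_lo, m_hi

# Hypothetical one-period example: quarterly horizon, bond and stock as basis
eps = np.linspace(-4.0, 4.0, 401)
prob = np.exp(-0.5 * eps**2)
prob /= prob.sum()
S0, K, T = 100.0, 100.0, 0.25
Rf = 1.05**T
S_T = S0 * np.exp((0.12 - 0.5 * 0.16**2) * T + 0.16 * np.sqrt(T) * eps)
x = np.column_stack([np.ones_like(S_T), S_T])
p = np.array([1.0 / Rf, S0])
xc = np.maximum(S_T - K, 0.0)
h = 0.5                                # assumed per-period Sharpe ratio limit
A2 = (1.0 + h**2) / Rf**2

lower, upper, m_lo, m_hi = good_deal_bounds_slack(prob, x, p, xc, A2)
print(lower, upper)
print("positivity slack at lower bound:", bool((m_lo >= 0).all()))
print("positivity slack at upper bound:", bool((m_hi >= 0).all()))
```

If either positivity check prints `False`, the corresponding number is not the good-deal bound and the remaining cases have to be worked through.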

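If you prefer an algebraic and slightly more formal argument, start by noticing that any discount factor that satisfies $p = E(mx)$ can be decomposed as

$$m = x^* + vw + \varepsilon$$

where $E(x^* w) = E(x^* \varepsilon) = E(w\varepsilon) = 0$. Check these properties from the definition of $w$ and $\varepsilon$; this is just like $R = R^* + wR^{e*} + n$. Our minimization problem is then

$$
\begin{aligned}
&\min_{\{v,\varepsilon\}} E(m x^c) \quad \text{s.t. } E(m^2) \leq A^2 \\
&\min_{\{v,\varepsilon\}} E\left[(x^* + vw + \varepsilon)(\hat{x}^c + w)\right] \quad \text{s.t. } E(x^{*2}) + v^2E(w^2) + E(\varepsilon^2) \leq A^2 \\
&\min_{\{v,\varepsilon\}} E(x^*\hat{x}^c) + vE(w^2) \quad \text{s.t. } E(x^{*2}) + v^2E(w^2) + E(\varepsilon^2) \leq A^2.
\end{aligned}
$$

The solution is $\varepsilon = 0$ and $v = \pm\sqrt{\dfrac{A^2 - E(x^{*2})}{E(w^2)}}$, with the negative root giving the lower bound and the positive root the upper bound.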

18.2.2 Both constraints bind

Next, I find the bounds when both constraints bind. Though this is the third step in the procedure, it is easiest to describe this case first. Introducing Lagrange multipliers, the problem is

$$\underline{C} = \min_{\{m>0\}}\ \max_{\{\lambda,\,\delta>0\}}\ E(m x^c) + \lambda'\left[E(mx) - p\right] + \frac{\delta}{2}\left[E(m^2) - A^2\right].$$

The first order conditions yield a discount factor that is a truncated linear combination of the payoffs,

$$m = \max\left(-\frac{x^c + \lambda'x}{\delta},\ 0\right) = \left[-\frac{x^c + \lambda'x}{\delta}\right]^+. \tag{18.10}$$

The last equality defines the $[\,\cdot\,]^+$ notation for truncation. In finance terms, this is a call option with zero strike price. To derive this expression, take partial derivatives with respect to $m$ in each state.
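The state-by-state derivation can be sketched as follows. The notation $\pi_s$ for the probability of state $s$ and the subscript $s$ on payoffs are introduced here only for illustration. For given multipliers, the Lagrangian is a probability-weighted sum over states,

$$\sum_s \pi_s\left[m_s x^c_s + \lambda' x_s\, m_s + \frac{\delta}{2}m_s^2\right] - \lambda'p - \frac{\delta}{2}A^2.$$

Differentiating with respect to $m_s$ and setting the result to zero gives

$$\pi_s\left[x^c_s + \lambda' x_s + \delta m_s\right] = 0 \quad\Longrightarrow\quad m_s = -\frac{x^c_s + \lambda' x_s}{\delta}.$$

With $\delta > 0$ the term in brackets is convex in $m_s$, so in any state where this unconstrained value is negative the best admissible choice is $m_s = 0$. Putting the two cases together gives the truncated form $m_s = \left[-(x^c_s + \lambda'x_s)/\delta\right]^+$.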

We could plug expression (18.10) into the constraints, and solve numerically for Lagrange multipliers $\lambda$ and $\delta$ that enforce the constraints. Alas, this procedure requires the solution of a system of nonlinear equations in $(\lambda, \delta)$, which is often a numerically difficult or unstable problem.

Hansen, Heaton and Luttmer (1995) show how to recast the problem as a maximization, which is numerically much easier. Interchanging min and max,

$$\underline{C} = \max_{\{\lambda,\,\delta>0\}}\ \min_{\{m>0\}}\ E(m x^c) + \lambda'\left[E(mx) - p\right] + \frac{\delta}{2}\left[E(m^2) - A^2\right]. \tag{18.11}$$

The inner minimization yields the same first order conditions (18.10). Plugging those first-order conditions into the outer maximization of (18.11) and simplifying, we obtain

$$\underline{C} = \max_{\{\lambda,\,\delta>0\}}\ E\left\{-\frac{\delta}{2}\left[-\frac{x^c + \lambda'x}{\delta}\right]^{+2}\right\} - \lambda'p - \frac{\delta}{2}A^2. \tag{18.12}$$

You can search numerically over $(\lambda, \delta)$ to find the solution to this problem. The upper bound is found by replacing $\max$ with $\min$ and replacing $\delta > 0$ with $\delta < 0$.

18.2.3 Positivity binds, volatility is slack

If the volatility constraint is slack and the positivity constraint binds, the problem reduces to

$$\underline{C} = \min_{\{m\}} E(m x^c) \quad \text{s.t. } p = E(mx),\ m > 0. \tag{18.13}$$

These are the arbitrage bounds. We found these bounds in section 30 for a call option by just being clever. If you can’t be clever, (18.13) is a linear program.

We still have to check that the discount factor volatility constraint can be satisfied at the arbitrage bound. Denote the lower arbitrage bound by $C_l$. The minimum variance (second moment) discount factor that generates the arbitrage bound $C_l$ solves

$$E(m^2)_{\min} = \min_{\{m\}} E(m^2) \quad \text{s.t. } \begin{bmatrix} p \\ C_l \end{bmatrix} = E\left(m\begin{bmatrix} x \\ x^c \end{bmatrix}\right),\ m > 0.$$

Using the same conjugate method, this problem is equivalent to

$$E(m^2)_{\min} = \max_{\{v,\,\mu\}}\ -E\left\{\left[-(\mu x^c + v'x)\right]^{+2}\right\} - 2v'p - 2\mu C_l.$$

Again, search numerically for $(v, \mu)$ to solve this problem. If $E(m^2)_{\min} \leq A^2$, $C_l$ is the solution to the good-deal bound; if not we proceed with the case that both constraints are binding described above.

18.2.4 Application to Black-Scholes

The natural first exercise with this technique is to see how it applies in the Black-Scholes world. Keep in mind, this is the Black-Scholes world with no intermediate trading; compare the results to the arbitrage bounds, not to the Black-Scholes formula. Figure 32, taken from Cochrane and Saá-Requejo (1999), presents the upper and lower good-deal bounds for a call option on the S&P500 index with strike price $K = \$100$, and three months to expiration. We used parameter values $E(R) = 13\%$, $\sigma(R) = 16\%$ for the stock index return and a riskfree rate $R^f = 5\%$. The discount factor volatility constraint is twice the historical market Sharpe ratio, $h = 2 \times E(R - R^f)/\sigma(R) = 1.0$. To take the expectations required in the formula, we evaluated integrals against the lognormal stock distribution.

Figure 32. Good deal option price bounds as a function of stock price. Options have three months to expiration and strike price $K = \$100$. The bounds assume no trading until expiration, and a discount factor volatility bound $h = 1.0$ corresponding to twice the market Sharpe ratio. The stock is lognormally distributed with parameters calibrated to an index option.

The figure includes the lower arbitrage bounds $C \geq 0$, $C \geq S - K/R^f$. The upper arbitrage bound states that $C \leq S$, but this $45^\circ$ line is too far up to fit on the vertical scale and still see anything else. As in many practical situations, the arbitrage bounds are so wide that they are of little use. The upper good-deal bound is much tighter than the upper arbitrage bound. For example, if the stock price is \$95, the entire range of option prices between the upper good-deal bound of \$2 and the upper arbitrage bound of \$95 is ruled out.

The lower good-deal bound is the same as the lower arbitrage bound for stock prices less than about \$90 and greater than about \$110. In this range, the positivity constraint binds and the volatility constraint is slack. This range shows that it is important to impose both volatility and positivity constraints. Good deal bounds are not just the imposition of low Sharpe ratios on options. (I emphasize it because this point causes a lot of confusion.) The volatility bound alone admits negative prices. A free out of the money call option is like a lottery ticket: it is an arbitrage opportunity, but its expected return/standard deviation ratio is terrible, because the standard deviation is so high. A Sharpe ratio criterion alone will not rule it out.

In between \$90 and \$110, the good-deal bound improves on the lower arbitrage bound. It also improves on a bound that imposes only the volatility constraint. In this region, both positivity and volatility constraints bind. This fact has an interesting implication: Not all values outside the good-deal bounds imply high Sharpe ratios or arbitrage opportunities. Such values might be generated by a positive but highly volatile discount factor, and generated by another less volatile but sometimes negative discount factor, but no discount factor generates these values that is simultaneously nonnegative and respects the volatility constraint.

It makes sense to rule out these values. If we know that an investor will invest in any arbitrage opportunity or take any Sharpe ratio greater than $h$, then we know that his unique marginal utility satisfies both restrictions. He would find a utility-improving trade for values outside the good-deal bounds, even though those values may not imply a high Sharpe ratio, an arbitrage opportunity, or any other simple portfolio interpretation.

The right thing to do is to intersect restrictions on the discount factor. Simple portfolio interpretations, while historically important, are likely to fall by the wayside as we add more discount factor restrictions or intersect simple ones.

18.3 Multiple periods and continuous time

Now, on to the interesting case. Option pricing is all about dynamic hedging, even if imperfect dynamic hedging. Good deal bounds would be of little use if we could only apply them to one-period environments.

18.3.1 The bounds are recursive

The central fact that makes good deal bounds tractable in dynamic environments is that the bounds are recursive. Today’s bound can be calculated as the minimum price of tomorrow’s bound, just as today’s option price can be calculated as the value of tomorrow’s option price.

To see that the bounds are recursive, consider a two-period version of the problem,

$$\underline{C}_0 = \min_{\{m_1,\,m_2\}} E_0(m_1 m_2\, x^c_2) \quad \text{s.t.}$$

$$p_t = E_t(m_{t+1}p_{t+1}),\quad E_t(m^2_{t+1}) \leq A^2_t,\quad m_{t+1} > 0,\quad t = 0, 1. \tag{18.14}$$

This two period problem is equivalent to a series of one period problems, in which the $\underline{C}_0$ problem finds the lowest price of the $\underline{C}_1$ lower bound,

$$\underline{C}_1 = \min_{\{m_2\}} E_1(m_2 x^c_2); \qquad \underline{C}_0 = \min_{\{m_1\}} E_0(m_1\underline{C}_1)$$

subject to (18.14). Why? The solution to the two-period problem $\min E_0(m_1E_1(m_2x^c))$ must minimize $E_1(m_2x^c)$ in each state of nature at time 1. If not, you could lower $E_1(m_2x^c)$ without affecting the constraints, and lower the objective. Note that this recursive property only holds if we impose $m > 0$. If $m_1 < 0$ were possible we would want to maximize $E_1(m_2x^c)$ in some states of nature.

18.3.2 Basis risk and real options

The general case leads to some dense formulas, so a simple example is the easiest way to understand the idea. Let’s value a European call option on an event $V$ that is not a traded asset, but is correlated with a traded asset that can be used as an approximate hedge. This situation is common with real options and nonfinancial options and describes some financial options on illiquid assets.

The terminal payoff is

$$x^c_T = \max(V_T - K, 0).$$

Model the joint evolution of the traded asset $S$ and the event $V$ on which the option is written as

$$\frac{dS}{S} = \mu_S\,dt + \sigma_S\,dz,$$

$$\frac{dV}{V} = \mu_V\,dt + \sigma_{Vz}\,dz + \sigma_{Vw}\,dw.$$

The $dw$ risk cannot be hedged by the $S$ asset, so the market price of $dw$ risk – its correlation with the discount factor – will matter to the option price.

We are looking for a discount factor that prices $S$ and $r^f$, has instantaneous volatility $A$, and generates the largest or smallest price for the option. Hence, it will have the largest loading on $dw$ possible. By analogy with the one period case (18.5), you can quickly see that the discount factor will have the form

$$\frac{d\Lambda}{\Lambda} = \frac{d\Lambda^*}{\Lambda^*} \pm \sqrt{A^2 - h_S^2}\;dw$$

$$\frac{d\Lambda^*}{\Lambda^*} = -r\,dt - h_S\,dz$$

$$h_S = \frac{\mu_S - r}{\sigma_S}.$$

$d\Lambda^*/\Lambda^*$ is the familiar analogue to $x^*$ that prices stock and bond. We add a loading on the orthogonal shock $dw$ just sufficient to satisfy the constraint $\frac{1}{dt}E_t(d\Lambda^2/\Lambda^2) = A^2$. One of $\pm$ will generate the upper bound, and one will generate the lower bound.

Now that we have the discount factor, the good deal bound is given by

$$\underline{C}_t = E_t\left[\frac{\Lambda_T}{\Lambda_t}\max(V_T - K, 0)\right].$$

$S_t$, $V_t$, and $\Lambda_t$ are all diffusions with constant coefficients. Therefore, $S_T$, $V_T$ and $\Lambda_T$ are jointly lognormally distributed, so the double integral defining the expectation is straightforward to perform, and works very similarly to the integral we evaluated to solve the Black-Scholes formula in section ??. (If you get stuck, see Cochrane and Saá-Requejo 199x for the algebra.)

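The result is

$$\underline{C} \text{ or } \bar{C} = V_0\, e^{\eta T}\phi\!\left(d + \frac12\sigma_V\sqrt{T}\right) - K e^{-rT}\phi\!\left(d - \frac12\sigma_V\sqrt{T}\right) \tag{18.15}$$

where $\phi(\cdot)$ denotes the left tail of the normal distribution and

$$
\begin{aligned}
\sigma_V^2 &\equiv E_t\frac{dV^2}{V^2} = \sigma_{Vz}^2 + \sigma_{Vw}^2 \\
d &\equiv \frac{\ln(V_0/K) + (\eta + r)T}{\sigma_V\sqrt{T}} \\
\eta &\equiv \left[h_V - h_S\left(\rho - a\sqrt{\frac{A^2}{h_S^2} - 1}\,\sqrt{1 - \rho^2}\right)\right]\sigma_V \\
h_S &\equiv \frac{\mu_S - r}{\sigma_S}; \qquad h_V \equiv \frac{\mu_V - r}{\sigma_V} \\
\rho &\equiv \text{corr}\left(\frac{dV}{V}, \frac{dS}{S}\right) = \frac{\sigma_{Vz}}{\sigma_V} \\
a &= \begin{cases} +1 & \text{upper bound} \\ -1 & \text{lower bound.} \end{cases}
\end{aligned}
$$

This expression is exactly the Black-Scholes formula with the addition of the $\eta$ term.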
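For calculations, formula (18.15) translates directly into code. The sketch below is illustrative only: the function name `good_deal_call` and the parameter values are hypothetical, and it assumes $\mu_S \neq r$ (so that $h_S \neq 0$) and $A \geq |h_S|$ (so that the volatility constraint can be met at all).

```python
import math

def Phi(z):
    """Left-tail area of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def good_deal_call(V0, K, r, T, mu_S, sigma_S, mu_V, sigma_Vz, sigma_Vw, A, a):
    """Good-deal bound on a call written on V; a=+1 upper bound, a=-1 lower bound."""
    sigma_V = math.sqrt(sigma_Vz**2 + sigma_Vw**2)
    h_S = (mu_S - r) / sigma_S
    h_V = (mu_V - r) / sigma_V
    rho = sigma_Vz / sigma_V
    if A < abs(h_S):
        raise ValueError("A must be at least the Sharpe ratio of the traded asset")
    # max(0, .) guards against tiny negative values from floating-point rounding
    extra = math.sqrt(max(0.0, A**2 / h_S**2 - 1.0)) * math.sqrt(max(0.0, 1.0 - rho**2))
    eta = (h_V - h_S * (rho - a * extra)) * sigma_V
    d = (math.log(V0 / K) + (eta + r) * T) / (sigma_V * math.sqrt(T))
    half = 0.5 * sigma_V * math.sqrt(T)
    return V0 * math.exp(eta * T) * Phi(d + half) - K * math.exp(-r * T) * Phi(d - half)

# Hypothetical inputs: an event V partly correlated with a traded asset S
args = dict(V0=100.0, K=100.0, r=0.05, T=0.5, mu_S=0.13, sigma_S=0.16,
            mu_V=0.13, sigma_Vz=0.12, sigma_Vw=0.10, A=1.0)
lower = good_deal_call(a=-1, **args)
upper = good_deal_call(a=+1, **args)
print(lower, upper)
```

Setting `sigma_Vw=0` with `mu_V=mu_S` and `sigma_Vz=sigma_S` makes $V$ behave like the traded asset, so $\eta = 0$ and both bounds collapse to the Black-Scholes value; raising `A` widens the interval.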

293

Page 294: up.iranblog.comup.iranblog.com/uploads/John-Cochrane-Asset-Pricing.pdf · Contents Acknowledgments 3 1Preface 5 Part I. Asset pricing theory 9 2 Consumption-based model and overview

CHAPTER 18 OPTION PRICING WITHOUT PERFECT REPLICATION

�Y enters the formula because the event Y may not grow at the same rate as the assetV. Obviously, the correlation � between Y shocks and asset shocks enters the formula, and asthis correlation declines, the bounds widen. The bounds also widen as the volatility constraintD becomes larger relative to the asset Sharpe ratios kV .

Market prices of risk

Continuous-time pricing problems are often specified in terms of “market prices of risk” rather than discount factors. This is the instantaneous Sharpe ratio that an asset must earn if it loads on a specific shock. If an asset has a price process $P$ that loads on a shock $\sigma\,dw$, then its expected return must be

$$E_t\frac{dP}{P} - r^f dt = -\sigma E_t\left(\frac{d\Lambda}{\Lambda}dw\right)$$

with Sharpe ratio

$$\lambda = \frac{E_t\frac{dP}{P} - r^f dt}{\sigma} = -E_t\left(\frac{d\Lambda}{\Lambda}dw\right).$$

I have introduced the common notation $\lambda$ for the market price of risk. Thus, problems are often attacked by making assumptions about $\lambda$ directly and then proceeding from

$$E_t\frac{dP}{P} - r^f dt = \lambda\sigma.$$

In this language, the market price of stock risk is $h_S$ and can be measured by observing the stock, and does not matter when you can price by arbitrage (notice it is missing from the Black-Scholes formula). Our problem comes down to choosing the market price of $dw$ risk, which cannot be measured by observing a traded asset, in such a way as to minimize or maximize the option price, subject to a constraint that the total price of risk $\sqrt{h_S^2 + \lambda^2} \leq A$.

18.3.3 Continuous time

Now, a more systematic expression of the same ideas in continuous time. As in the option pricing case in the last chapter and the term structure case in the next chapter, we will obtain a differential characterization. To actually get prices, we have either to solve the discount factor forward, or to find a differential equation for prices which we solve backward.

Basis assets

In place of $E(x)$, $E(xx')$ etc., model the price processes of an $n_S$-dimensional vector of basis assets by a diffusion,

$$\frac{dS}{S} = \mu_S(S,V,t)\,dt + \sigma_S(S,V,t)\,dz; \qquad E(dz\,dz') = I. \tag{18.16}$$

Rather than complicate the notation, understand division to operate element by element on vectors, e.g., $dS/S = \begin{bmatrix} dS_1/S_1 & dS_2/S_2 & \cdots \end{bmatrix}'$. The basis assets may pay dividends at rate $D(S,V,t)\,dt$.

$V$ represents an $n_V$-dimensional vector of additional state variables that follow

$$dV = \mu_V(S,V,t)\,dt + \sigma_{Vz}(S,V,t)\,dz + \sigma_{Vw}(S,V,t)\,dw; \qquad E(dw\,dw') = I;\ E(dw\,dz') = 0. \tag{18.17}$$

This could include a stochastic stock volatility or stochastic interest rate – classic cases in which the Black-Scholes replication breaks down. Again, I keep it simple by assuming there is a risk free rate $r(S,V,t)\,dt$.

The problem

We want to value an asset that pays continuous dividends at rate $x^c(S,V,t)\,dt$ and with a terminal payment $x^c_T(S,V,T)$. Now we must choose a discount factor process to minimize the asset’s value

$$\underline{C}_t = \min_{\{\Lambda_s,\ t<s\leq T\}} E_t\int_{s=t}^{T}\frac{\Lambda_s}{\Lambda_t}x^c_s\,ds + E_t\left(\frac{\Lambda_T}{\Lambda_t}x^c_T\right) \tag{18.18}$$

subject to the constraints that 1) the discount factor prices the basis assets $S$, $r$ at each moment in time, 2) the instantaneous volatility of the discount factor process is less than a prespecified value $A^2$ and 3) the discount factor is positive, $\Lambda_s > 0$, $t \leq s \leq T$.

One period at a time; differential statement

Since the problem is recursive, we can study how to move one step back in time,

$$\underline{C}_t\Lambda_t = \min_{\{\Lambda_s\}} E_t\int_{s=t}^{t+\Delta t}\Lambda_s x^c_s\,ds + E_t\left(\Lambda_{t+\Delta t}\underline{C}_{t+\Delta t}\right)$$

or, for small time intervals,

$$\underline{C}_t\Lambda_t = \min_{\{\Delta\Lambda\}} E_t\left\{x^c_t\Lambda_t\Delta t + (\underline{C}_t + \Delta\underline{C})(\Lambda_t + \Delta\Lambda)\right\}.$$

Letting $\Delta t \rightarrow 0$, we can write the objective in differential form,

$$0 = \frac{x^c_t}{\underline{C}}\,dt + \min_{\{d\Lambda\}}\frac{E_t\left[d(\Lambda\underline{C})\right]}{\Lambda\underline{C}}, \tag{18.19}$$

subject to the constraints. We can also write (18.19) as

$$E_t\frac{d\underline{C}}{\underline{C}} + \frac{x^c_t}{\underline{C}}\,dt - r^f dt = -\min_{\{d\Lambda\}} E_t\left(\frac{d\Lambda}{\Lambda}\frac{d\underline{C}}{\underline{C}}\right). \tag{18.20}$$

Since the second and third terms on the left hand side are fixed, the condition sensibly tells us to find the lowest value $\underline{C}$ by maximizing the drift of the bound at each date. You should recognize the form of (??) and (18.20) as the basic pricing equations in continuous time, relating expected returns to covariance with discount factors.

Constraints

Now we express the constraints. As in the discrete time case, we orthogonalize the discount factor in $m = x^* + \varepsilon$ form, and then the solution pops out. Any discount factor that prices the basis assets is of the form

$$\frac{d\Lambda}{\Lambda} = \frac{d\Lambda^*}{\Lambda^*} - v\,dw$$

where

$$\frac{d\Lambda^*}{\Lambda^*} \equiv -r\,dt - \tilde{\mu}_S'\,\Sigma_S^{-1}\sigma_S\,dz$$

$$\tilde{\mu}_S \equiv \mu_S + \frac{D}{S} - r, \qquad \Sigma_S = \sigma_S\sigma_S',$$

and $v$ is a $1 \times n_V$ matrix. We can add shocks orthogonal to $dw$ if we like, but they will have no effect on the answer; the minimization will say to set such loadings to zero.

The volatility constraint is

$$\frac{1}{dt}E_t\frac{d\Lambda^2}{\Lambda^2} \le A^2$$

and hence, using (??),

$$vv' \le A^2 - \frac{1}{dt}E_t\frac{d\Lambda^{*2}}{\Lambda^{*2}} = A^2 - \tilde{\mu}_S'\,\Sigma_S^{-1}\tilde{\mu}_S. \tag{21}$$

By expressing the constraints via (??) and (18.21), we have again reduced the problem of choosing the stochastic process for $\Lambda$ to the choice of loadings $v$ on the noises $dw$ with unknown values, subject to a quadratic constraint on $vv'$. Since we are picking differentials and have ruled out jumps, the positivity constraint is slack so long as $\Lambda > 0$.

Market prices of risk

Using equation (??), $v$ is the vector of market prices of risk of the $dw$ shocks – the expected return that any asset must offer if its shocks are $dw$:

$$-\frac{1}{dt}E\left(\frac{d\Lambda}{\Lambda}\,dw\right) = v.$$

Thus, the problem is equivalent to: find at each date the assignment of market prices of risk to the $dw$ shocks that minimizes (maximizes) the focus payoff value, subject to the constraint that the total (sum of squared) market price of risk is bounded by $A^2$.

Now, we're ready to follow the usual steps. We can characterize a differential equation for the option price that must be solved back from expiration, or we can try to solve the discount factor forward and take an expectation.

Solutions: the discount factor and bound drift at each instant

We can start by characterizing the bound's process, just as the basis assets follow (18.16). This step is exactly the instantaneous analogue of the one-period bound without a positivity constraint, so remember that logic if the equations start to get a bit forbidding.

Guess that the lower bound $\underline{C}$ follows a diffusion process, and figure out what the coefficients must look like. Write

$$\frac{d\underline{C}}{\underline{C}} = \mu_C(S,V,t)\,dt + \sigma_{Cz}(S,V,t)\,dz + \sigma_{Cw}(S,V,t)\,dw. \tag{22}$$

$\sigma_{Cz}$ and $\sigma_{Cw}$ capture the stochastic evolution of the bound over the next instant – the analogues to $E(x\,x^c)$, etc. that were inputs to the one period problem. Therefore, a differential or moment-to-moment characterization of the bound will tell us $\mu_C$ and $d\Lambda$ in terms of $\sigma_{Cz}$ and $\sigma_{Cw}$.

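Theorem: The lower bound discount factor $\underline{\Lambda}_t$ follows

$$\frac{d\underline{\Lambda}}{\underline{\Lambda}} = \frac{d\Lambda^*}{\Lambda^*} - \underline{v}\,dw \tag{23}$$

and $\mu_C$, $\sigma_{Cz}$ and $\sigma_{Cw}$ satisfy the restriction

$$\mu_C + \frac{x^c}{\underline{C}} - r = -\frac{1}{dt}E_t\left(\frac{d\Lambda^*}{\Lambda^*}\,\sigma_{Cz}\,dz\right) + \underline{v}\,\sigma_{Cw}' \tag{24}$$

where

$$\underline{v} = \sqrt{A^2 - \frac{1}{dt}E_t\frac{d\Lambda^{*2}}{\Lambda^{*2}}}\;\frac{\sigma_{Cw}}{\sqrt{\sigma_{Cw}\sigma_{Cw}'}}. \tag{25}$$

The upper bound process $\overline{C}_t$ and discount factor $\overline{\Lambda}_t$ have the same representation with $\overline{v} = -\underline{v}$.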

This theorem has the same geometric interpretation as shown in Figure 31. $d\Lambda^*/\Lambda^*$ is the combination of basis asset shocks that prices the basis assets by construction, in analogy to $x^*$. The term $\sigma_{Cw}\,dw$ corresponds to the error $w$, and $\sigma_{Cw}\sigma_{Cw}'$ corresponds to $E(w^2)$. The proposition looks a little different because now we choose a vector $v$ rather than a number. We could define a residual $\sigma_{Cw}\,dw$ and then the problem would reduce to choosing a number, the loading of $d\Lambda$ on this residual. It is not convenient to do so in this case since $\sigma_{Cw}$ potentially changes over time. In the geometry of Figure 31, the $w$ direction may change over time. The algebraic proof just follows this logic.

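Proof: Substituting equation (??) into the problem (18.19) in order to impose the pricing constraint, the problem is

$$0 = \frac{x^c}{\underline{C}}\,dt + E_t\left[\frac{d(\Lambda^*\underline{C})}{\Lambda^*\underline{C}}\right] + \min_{\{v\}}\left[-v\,E_t\left(dw\,\frac{d\underline{C}}{\underline{C}}\right)\right] \quad \text{s.t.} \quad vv' \le A^2 - \frac{1}{dt}E_t\left(\frac{d\Lambda^{*2}}{\Lambda^{*2}}\right).$$

The $v$ term enters with a minus sign because $d\Lambda/\Lambda$ loads on $dw$ with coefficient $-v$. Using equation (18.22) for $d\underline{C}/\underline{C}$ in the last term, the problem is

$$0 = \frac{x^c}{\underline{C}} + \frac{1}{dt}E_t\left[\frac{d(\Lambda^*\underline{C})}{\Lambda^*\underline{C}}\right] + \min_{\{v\}}\left[-v\,\sigma_{Cw}'\right] \quad \text{s.t.} \quad vv' \le A^2 - \frac{1}{dt}E_t\left(\frac{d\Lambda^{*2}}{\Lambda^{*2}}\right).$$

This is a linear objective in $v$ with a quadratic constraint. Therefore, as long as $\sigma_{Cw} \ne 0$, the constraint binds and the optimal $v$ is given by (18.25). $\overline{v} = -\underline{v}$ gives the maximum since $\sigma_{Cw}\sigma_{Cw}' > 0$. Plugging the optimal value for $v$ in (??) gives

$$0 = \frac{x^c}{\underline{C}} + \frac{1}{dt}E_t\left[\frac{d(\Lambda^*\underline{C})}{\Lambda^*\underline{C}}\right] - \underline{v}\,\sigma_{Cw}'.$$

For clarity, and exploiting the fact that $d\Lambda^*$ does not load on $dw$, write the middle term as

$$\frac{1}{dt}E_t\left[\frac{d(\Lambda^*\underline{C})}{\Lambda^*\underline{C}}\right] = \mu_C - r + \frac{1}{dt}E_t\left(\frac{d\Lambda^*}{\Lambda^*}\,\sigma_{Cz}\,dz\right).$$

If $\sigma_{Cw} = 0$, any $v$ leads to the same price bound. In this case we can most simply take $v = 0$. $\blacksquare$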
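The optimization in the proof is small enough to check numerically. The following is a minimal sketch, not part of the original derivation: the function name `optimal_loading` and the numbers for $A^2$, $\tilde{\mu}_S'\Sigma_S^{-1}\tilde{\mu}_S$ and $\sigma_{Cw}$ are hypothetical. It computes the loading in (18.25) and the resulting risk premium term $\underline{v}\sigma_{Cw}'$.

```python
import numpy as np

def optimal_loading(A2, h_star, sigma_Cw):
    """Lower-bound loading v on the dw shocks, following (18.25).

    A2       : volatility bound A^2 (hypothetical input)
    h_star   : (1/dt) E_t[(dLambda*/Lambda*)^2] = mu~' Sigma^{-1} mu~
    sigma_Cw : loadings of dC/C on the dw shocks (1-D array)
    """
    sigma_Cw = np.asarray(sigma_Cw, dtype=float)
    slack = A2 - h_star
    if slack < 0:
        raise ValueError("volatility bound too tight to price the basis assets")
    norm = np.sqrt(sigma_Cw @ sigma_Cw)
    if norm == 0.0:
        # Any v gives the same bound in this case; take v = 0 as in the proof.
        return np.zeros_like(sigma_Cw)
    return np.sqrt(slack) * sigma_Cw / norm

# Hypothetical numbers, for illustration only.
A2, h_star = 1.0, 0.25
sigma_Cw = np.array([0.3, -0.4])

v = optimal_loading(A2, h_star, sigma_Cw)
premium = v @ sigma_Cw   # the term v sigma_Cw' in restriction (18.24)
print(v, premium)
```

The constraint binds ($vv'$ equals the slack), and no other feasible $v$ produces a larger $v\sigma_{Cw}'$, which is the Cauchy-Schwarz logic behind the theorem.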

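As in the discrete-time case, we can plug in the definition of $\Lambda^*$ to obtain explicit, if less intuitive, expressions for the optimal discount factor and the resulting lower bound,

$$\frac{d\underline{\Lambda}}{\underline{\Lambda}} = -r\,dt - \tilde{\mu}_S'\,\Sigma_S^{-1}\sigma_S\,dz - \sqrt{A^2 - \tilde{\mu}_S'\,\Sigma_S^{-1}\tilde{\mu}_S}\;\frac{\sigma_{Cw}}{\sqrt{\sigma_{Cw}\sigma_{Cw}'}}\,dw \tag{26}$$

$$\mu_C + \frac{x^c}{\underline{C}} - r = \tilde{\mu}_S'\,\Sigma_S^{-1}\sigma_S\,\sigma_{Cz}' + \sqrt{A^2 - \tilde{\mu}_S'\,\Sigma_S^{-1}\tilde{\mu}_S}\;\sqrt{\sigma_{Cw}\sigma_{Cw}'}. \tag{27}$$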

A partial differential equation

Now we are ready to apply the standard method: find a partial differential equation and solve it backwards to find the price at any date. The method proceeds exactly as for the Black-Scholes formula: Guess a solution $\underline{C}(S,V,t)$. Use Ito's lemma to derive expressions for $\mu_C$ and $\sigma_{Cz}$, $\sigma_{Cw}$ in terms of the partial derivatives of $\underline{C}(S,V,t)$. Substitute these expressions into restriction (18.27). The result is ugly, but straightforward to evaluate numerically. Just like the Black-Scholes partial differential equation, it expresses the time derivative $\partial\underline{C}/\partial t$ in terms of derivatives with respect to state variables, and thus can be used to work back from a terminal period.

Theorem. The lower bound $\underline{C}(S,V,t)$ is the solution to the partial differential equation

$$x^c - r\underline{C} + \frac{\partial\underline{C}}{\partial t} + \frac{1}{2}\sum_{i,j}\frac{\partial^2\underline{C}}{\partial S_i\,\partial S_j}S_iS_j\,\sigma_{S_i}\sigma_{S_j}' + \frac{1}{2}\sum_{i,j}\frac{\partial^2\underline{C}}{\partial V_i\,\partial V_j}\left(\sigma_{Vz_i}\sigma_{Vz_j}' + \sigma_{Vw_i}\sigma_{Vw_j}'\right) + \sum_{i,j}\frac{\partial^2\underline{C}}{\partial S_i\,\partial V_j}S_i\,\sigma_{S_i}\sigma_{Vz_j}'$$

$$= \left(\frac{D}{S} - r\right)'(S\underline{C}_S) + \left(\tilde{\mu}_S'\,\Sigma_S^{-1}\sigma_S\,\sigma_{Vz}' - \mu_V'\right)\underline{C}_V + \sqrt{A^2 - \tilde{\mu}_S'\,\Sigma_S^{-1}\tilde{\mu}_S}\;\sqrt{\underline{C}_V'\,\sigma_{Vw}\sigma_{Vw}'\,\underline{C}_V}$$

subject to the boundary conditions provided by the focus asset payoff $x^c_T$. $\underline{C}_V$ denotes the vector with typical element $\partial\underline{C}/\partial V_j$ and $(S\underline{C}_S)$ denotes the vector with typical element $S_i\,\partial\underline{C}/\partial S_i$. Replacing $+$ with $-$ before the square root gives the partial differential equation satisfied by the upper bound.

The discount factor

In general, the $\Lambda$ process (18.23) or (18.26) depends on the parameters $\sigma_{Cw}$. Hence, without solving the above partial differential equation we do not know how to spread the loading of $d\Lambda$ across the multiple sources of risk $dw$ whose risk prices we do not observe. Equivalently, we do not know how to optimally spread the total market price of risk across the elements of $dw$. Thus, in general we cannot use the integration approach – solve the discount factor forward – to find the bound by

$$\underline{C}_t = E_t\int_{s=t}^{T}\frac{\Lambda_s}{\Lambda_t}\,x^c_s\,ds + E_t\left(\frac{\Lambda_T}{\Lambda_t}\,x^c_T\right).$$

However, if there is only one shock $dw$, then we don't have to worry about how the loading of $d\Lambda$ spreads across multiple sources of risk. $v$ can be determined simply by the volatility constraint. In this special case, $dw$ and $\sigma_{Cw}$ are scalars. Hence equation (18.23) simplifies as follows:

Theorem: In the special case that there is only one extra noise $dw$ driving the $V$ process, we can find the lower bound discount factor $\underline{\Lambda}$ directly from

$$\frac{d\underline{\Lambda}}{\underline{\Lambda}} = -r\,dt - \tilde{\mu}_S'\,\Sigma_S^{-1}\sigma_S\,dz - \sqrt{A^2 - \tilde{\mu}_S'\,\Sigma_S^{-1}\tilde{\mu}_S}\;dw. \tag{28}$$

I used this characterization to solve for the case of a non-traded underlying in the last section. In some applications, the loading of $d\Lambda$ on multiple shocks $dw$ may be constant over time. In such cases, one can again construct the discount factor and solve for bounds by (possibly numerical) integration, avoiding the solution of a partial differential equation.

18.4 Extensions, other approaches, and bibliography

The roots of the good deal idea go a long way back. Ross (1976) bounded APT residuals by assuming that no portfolio can have more than twice the market Sharpe ratio, and I used the corresponding idea that discount factor volatility should be bounded to generate a robust approximate APT in Chapter 10.4. Good deal bounds apply the same idea to option payoffs. However, the good deal bounds also impose positive discount factors, and this constraint is important in an option pricing context. We also study dynamic models that chain discount factors together as in the option pricing literature.

The one-period good-deal bound is the dual to the Hansen-Jagannathan (1991) bound with positivity – Hansen and Jagannathan study the minimum variance of positive discount factors that correctly price a given set of assets. The good deal bound interchanges the position of the option pricing equation and the variance of the discount factor. The techniques for solving the bound, therefore, are exactly those of the Hansen-Jagannathan bound in this one-period setup.

He and Pearson (1992) derive the $d\Lambda^*/\Lambda^*$ discount factor representation in continuous time.

There is nothing magic about discount factor volatility. This kind of problem needs weak but credible discount factor restrictions that lead to tractable and usefully tight bounds. Several other similar restrictions have been proposed in the literature. 1) Levy (1985) and Constantinides (1998) assume that the discount factor declines monotonically with a state variable; marginal utility should decline with wealth. 2) The good deal bounds allow the worst case that marginal utility growth is perfectly correlated with a portfolio of basis and focus assets. In many cases one could credibly impose a sharper limit than $-1 \le \rho \le 1$ on this correlation to obtain tighter bounds. 3) Bernardo and Ledoit (1999) use the restriction $a \le m \le b$ to sharpen the no-arbitrage restriction $\infty \ge m > 0$, and cleverly relate this bound taken alone to portfolio “gain/loss ratios,” as the volatility bound $\sigma^2(m) \le h/R^f$ taken alone rules out high Sharpe ratios. Since $m \ge a$, the call option price generated by this restriction in a one-period model is strictly greater than the lower arbitrage bound generated by $m = 0$; as in this case, the gain-loss bound can improve on the good-deal bound. 4) Bernardo and Ledoit also suggest $a \le m/y \le b$ where $y$ is an explicit discount factor model such as the consumption-based model or CAPM, as a way of imposing a “weak implication” of that particular model.

These alternatives are really not competitors. Add all the discount factor restrictions that are appropriate and useful for a given problem.

This exercise seems to me a strong case for discount factor methods as opposed to portfolio methods. The combination of positivity and volatility constraints on the discount factor leads to a sharper bound than the intersection of no-arbitrage and limited Sharpe ratios. I don't know of a simple portfolio characterization of the set of prices that are ruled out by the good deal bound when both constraints bind. The same will be true as we add, say, gain-loss restrictions, monotonicity restrictions, etc.

In continuous time, option pricing and term structure problems increasingly feature assumptions about the “market price of risk” of the non-traded shocks. The good-deal bounds treat these rather formally; they choose the market prices of risk at each instant to minimize or maximize the option price subject to a constraint that the total market price of risk is less than a reasonable value, compared to the Sharpe ratios of other trading opportunities. One needn't be this formal. Many empirical implementations of option pricing and term structure models feature unbelievable sizes and time-variation in market prices of risk. Just imposing sensible values for the market prices of risk, and trying on a range of sensible values, may be good enough for many practical situations.

The continuous-time treatment has not yet been extended to the important case of jumps rather than diffusion processes. With jumps, both the positivity and volatility constraints will bind.


Chapter 19. Term structure of interest rates

Term structure models are particularly simple, since bond prices are just the expected value of the discount factor. In equations, the price at time $t$ of a zero coupon bond that comes due at time $t+j$ is $P^{(j)}_t = E_t(m_{t,t+j})$. Thus, once you specify a time-series process for a one-period discount factor $m_{t,t+1}$, you can in principle find the price of any bond by chaining together the discount factors and finding $P^{(j)}_t = E_t(m_{t,t+1}\,m_{t+1,t+2}\cdots m_{t+j-1,t+j})$. As with option pricing models, this chaining together can be hard to do, and much of the analytical machinery in term structure models centers on this technical question. As with option pricing models, there are two equivalent ways to do the chaining together: Solve the discount factor forward and take an integral, or find a partial differential equation for prices and solve it backwards from the maturity date.

19.1 Definitions and notation

A quick introduction to bonds, yields, holding period returns, forward rates, and swaps.

$p^{(N)}_t$ = log price of $N$ period zero-coupon bond at time $t$.

$y^{(N)} = -\frac{1}{N}p^{(N)}$ = log yield.

$hpr^{(N)}_{t+1} = p^{(N-1)}_{t+1} - p^{(N)}_t$ = log holding period return.

$hpr = \frac{dP(N,t)}{P} - \frac{1}{P}\frac{\partial P(N,t)}{\partial N}\,dt$ = instantaneous return.

$f^{(N\to N+1)}_t = p^{(N)}_t - p^{(N+1)}_t$ = forward rate.

$f(N,t) = -\frac{1}{P}\frac{\partial P(N,t)}{\partial N}$ = instantaneous forward rate.

19.1.1 Bonds

The simplest fixed-income instrument is a zero-coupon bond. A zero-coupon bond is a promise to pay one dollar (a nominal bond) or one unit of the consumption good (a real bond) on a specified date. I use a superscript in parentheses to denote maturity: $P^{(3)}_t$ is the price of a three year zero-coupon bond. I will suppress the $t$ subscript when it isn't necessary.

I denote logs by lowercase symbols, $p^{(N)}_t = \ln P^{(N)}_t$. The log price has a nice interpretation. If the price of a one-year zero coupon bond is 0.95, i.e. 95 cents per dollar face value, the log price is $\ln(0.95) = -0.051$. This means that the bond sells at a 5% discount. Logs also give the continuously compounded rate. If we write $e^{rN} = 1/P^{(N)}$ then the continuously compounded rate is $rN = -\ln P^{(N)}$.
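The arithmetic in this example takes two lines to reproduce. This is a small illustrative sketch; the variable names are my own.

```python
import math

P = 0.95             # price of a one-year zero, per dollar of face value
N = 1                # maturity in years

p = math.log(P)      # log price, roughly a 5% discount
r = -p / N           # continuously compounded rate, from e^{rN} = 1/P
print(round(p, 3), round(r, 3))
```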

Coupon bonds are common in practice. For example, a $100 face value 10 year coupon bond may pay $5 every year for 10 years and $100 at 10 years. (Coupon bonds are often issued with semiannual or more frequent payments, $2.50 every six months for example.) We price coupon bonds by considering them as a portfolio of zeros.

Yield. The yield of a bond is the fictional, constant, known, annual interest rate that justifies the quoted price of a bond, assuming that the bond does not default. It is not the rate of return of the bond. From this definition, the yield of a zero coupon bond is the number $Y^{(N)}$ that satisfies

$$P^{(N)} = \frac{1}{\left[Y^{(N)}\right]^N}.$$

Hence

$$Y^{(N)} = \frac{1}{\left[P^{(N)}\right]^{1/N}}, \qquad y^{(N)} = -\frac{1}{N}\,p^{(N)}.$$

The latter expression nicely connects yields and prices. If the log price of a 4 year bond is $-0.20$, or a 20% discount, that is a 5% discount per year, or a yield of 5%. The yield of any stream of cash flows is the number $Y$ that satisfies

$$P = \sum_{j=1}^{N}\frac{CF_j}{Y^j}.$$

In general, you have to search for the value $Y$ that solves this equation, given the cash flows and the price. So long as all cash flows are positive, this is fairly easy to do.
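One simple way to do the search is bisection. It works because the present value falls monotonically in $Y$ when all cash flows are positive. The sketch below is only an illustration: the function names, the bracket `[lo, hi]` and the tolerance are my own choices, and $Y$ is a gross yield, as in the formula above.

```python
def present_value(cash_flows, Y):
    """Sum of CF_j / Y**j for j = 1..N, with Y a gross yield (e.g. 1.05)."""
    return sum(cf / Y ** j for j, cf in enumerate(cash_flows, start=1))

def gross_yield(price, cash_flows, lo=1e-6, hi=10.0, tol=1e-12):
    """Bisection search for Y; assumes positive cash flows, so PV falls as Y rises."""
    if not (present_value(cash_flows, hi) <= price <= present_value(cash_flows, lo)):
        raise ValueError("yield lies outside the bracket [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if present_value(cash_flows, mid) > price:
            lo = mid   # PV still too high, so the yield must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The 10 year bond from the text: $5 each year plus $100 at maturity.
coupon_bond = [5.0] * 9 + [105.0]
Y = gross_yield(100.0, coupon_bond)   # priced at par
print(Y)
```

Priced at par, this bond has a gross yield equal to one plus its coupon rate, which is a handy sanity check on the solver.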

As you can see, the yield is just a convenient way to quote the price. In using yields we make no assumptions. We do not assume that actual interest rates are known or constant; we do not assume the actual bond is default-free. Bonds that may default trade at lower prices or higher yields than bonds that are less likely to default. This only means a higher return if the bond happens not to default.

19.1.2 Holding Period Returns

If you buy an $N$ period bond and then sell it (it has now become an $N-1$ period bond) you achieve a return of

$$HPR^{(N)}_{t+1} = \frac{\text{\$back}}{\text{\$paid}} = \frac{P^{(N-1)}_{t+1}}{P^{(N)}_t} \tag{1}$$

or, of course,

$$hpr^{(N)}_{t+1} = p^{(N-1)}_{t+1} - p^{(N)}_t.$$

We date this return (from $t$ to $t+1$) as $t+1$ because that is when you find out its value. If this is confusing, take the time to write returns as $HPR_{t\to t+1}$ and then you'll never get lost.

In continuous time, we can easily find the instantaneous holding period return of bonds with fixed maturity date, $P(T,t)$:

$$hpr = \frac{P(T,t+\Delta) - P(T,t)}{P(T,t)}$$

and, taking the limit,

$$hpr = \frac{dP(T,t)}{P}.$$

However, it's nicer to look for a bond pricing function $P(N,t)$ that fixes the maturity rather than the date. As in (19.1), we have to account for the fact that you sell bonds that have shorter maturity than you buy.

$$hpr = \frac{P(N-\Delta,t+\Delta) - P(N,t)}{P(N,t)} = \frac{P(N-\Delta,t+\Delta) - P(N,t+\Delta) + P(N,t+\Delta) - P(N,t)}{P(N,t)}$$

and, taking the limit,

$$hpr = \frac{dP(N,t)}{P} - \frac{1}{P}\frac{\partial P(N,t)}{\partial N}\,dt. \tag{2}$$
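The limit can be spelled out in one more step. The following is a sketch that assumes $P(N,t)$ is differentiable in $N$; the two pieces of the split numerator behave as

```latex
\frac{P(N-\Delta,\,t+\Delta) - P(N,\,t+\Delta)}{P(N,t)}
  \;\approx\; -\frac{1}{P}\frac{\partial P(N,t)}{\partial N}\,\Delta ,
\qquad
\frac{P(N,\,t+\Delta) - P(N,t)}{P(N,t)}
  \;\longrightarrow\; \frac{dP(N,t)}{P} .
```

The first piece is the gain from the bond's maturity shortening by $\Delta$, and the second is the change in price of a bond of constant maturity $N$.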

19.1.3 Forward rate

The forward rate is defined as the rate at which you can contract today to borrow or lend money starting at period $N$, to be paid back at period $N+1$.

You can synthesize a forward contract from a spectrum of zero coupon bonds, so the forward rate can be derived from the prices of zero-coupon bonds. Here's how. Suppose you buy one $N$ period zero and simultaneously sell $x$ $N+1$ period zero coupon bonds. Let's track your cash flow at every date.

                Buy N-period zero    Sell x N+1-period zeros    Net cash flow
Today 0:        $-P^{(N)}$           $+xP^{(N+1)}$              $xP^{(N+1)} - P^{(N)}$
Time N:         $1$                                             $1$
Time N+1:                            $-x$                       $-x$

Now, choose $x$ so that today's cash flow is zero:

$$x = \frac{P^{(N)}}{P^{(N+1)}}.$$

You pay or get nothing today, you get one dollar at $N$, and you pay $P^{(N)}/P^{(N+1)}$ at $N+1$. You have synthesized a contract signed today for a loan from $N$ to $N+1$: a forward rate! Thus,

$$F^{(N\to N+1)}_t = \text{Forward rate at } t \text{ for } N \to N+1 = \frac{P^{(N)}_t}{P^{(N+1)}_t}$$

and of course

$$f^{(N\to N+1)}_t = p^{(N)}_t - p^{(N+1)}_t.$$

People sometimes identify forward rates by the initial date, $f^{(N)}_t$, and sometimes by the ending date, $f^{(N+1)}_t$. I use the arrow notation when I want to be really clear about dating a return.

Forward rates have the lovely property that you can always express a bond price as its discounted present value using forward rates,

$$p^{(N)}_t = p^{(N)}_t - p^{(N-1)}_t + p^{(N-1)}_t - p^{(N-2)}_t + \cdots + p^{(2)}_t - p^{(1)}_t + p^{(1)}_t$$

$$= -f^{(N-1\to N)}_t - f^{(N-2\to N-1)}_t - \cdots - f^{(1\to 2)}_t - y^{(1)}_t$$

($y^{(1)}_t = f^{(0\to 1)}_t$ of course), so

$$P^{(N)}_t = e^{p^{(N)}_t} = e^{-\sum_{j=0}^{N-1} f^{(j\to j+1)}_t} = \left(\prod_{j=0}^{N-1} F^{(j\to j+1)}_t\right)^{-1}.$$

Intuitively, the price today must be equal to the present value of the payoff at rates you can lock in today.
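A short numerical check of this telescoping property may help. The zero-coupon prices below are hypothetical, chosen only for illustration; everything else follows the definitions above.

```python
import math

# Hypothetical zero-coupon prices P^(N) for N = 0..4, with P^(0) = 1.
P = [1.0, 0.95, 0.90, 0.85, 0.80]
p = [math.log(x) for x in P]                       # log prices

# Log forward rates f^(N -> N+1) = p^(N) - p^(N+1)
f = [p[n] - p[n + 1] for n in range(len(p) - 1)]
# Gross forward rates F^(N -> N+1) = P^(N) / P^(N+1)
F = [P[n] / P[n + 1] for n in range(len(P) - 1)]

# Rebuild the 4-period price from the forward rates, in logs and in levels.
price_from_logs = math.exp(-sum(f))
price_from_gross = 1.0 / math.prod(F)
print(price_from_logs, price_from_gross)
```

Both reconstructions return the 4-period price, and the first forward rate `f[0]` is the one-period log yield.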

In continuous time, we can define the instantaneous forward rate

$$f(N,t) = -\frac{1}{P}\frac{\partial P(N,t)}{\partial N}.$$

Then, forward rates have the same property

$$P(N,t) = e^{-\int_{x=0}^{N} f(x,t)\,dx}.$$

19.1.4 Swaps and options

Swaps are an increasingly popular fixed income instrument. The simplest example is a fixed-for-floating swap. Party A may have issued a 10 year fixed coupon bond. Party B may have issued a 10 year variable rate bond – a bond that promises to pay the current one year rate. (For example, if the current rate is 5%, the variable rate issuer would pay $5 for every $100 of face value. A long-term variable rate bond is the same thing as rolling over one-period debt.) They may be unhappy with these choices. For example, the fixed-rate payer may not want to be exposed to interest rate risk that the present value of his promised payments rises if interest rates decline. The variable-rate issuer may want to take on this interest rate risk, betting that rates will rise or to hedge other commitments. If they are unhappy with these choices, they can swap the payments. The fixed rate issuer pays off the variable rate coupons, and the variable rate issuer pays off the fixed rate coupons. Obviously, only the difference between fixed and variable rate actually changes hands.

Swapping the payments is much safer than swapping the bonds. If one party defaults, the other can drop out of the contract, losing the difference in price resulting from intermediate interest rate changes, but not losing the principal. For this reason, and because they match the patterns of cash flows that companies usually want to hedge, swaps have become very popular tools for managing interest rate risk. Foreign exchange swaps are also popular: Party A may swap dollar payments for party B's yen payments. Obviously, you don't need to have issued the underlying bonds to enter into a swap contract – you simply pay or receive the difference between the variable rate and the fixed rate each period.

The value of a pure floating rate bond is always exactly one. The value of a fixed rate bond varies. Swaps are set up so no money changes hands initially, and the fixed rate is calibrated so that the present value of the fixed payments is exactly one. Thus, the “swap rate” is the same thing as the yield on a comparable coupon bond.
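The calibration described here can be sketched in a few lines. The setup is my own simplification: annual payments, a notional of one, and a fixed leg valued as a coupon bond (coupons plus the notional at maturity) so that it can be compared with the floating rate bond worth one. The zero-coupon prices are hypothetical.

```python
def par_swap_rate(zero_prices):
    """Fixed rate c such that c * sum_j P^(j) + P^(N) = 1.

    zero_prices[j-1] is P^(j), today's price of one unit paid at date j.
    """
    annuity = sum(zero_prices)              # value of one unit paid at every date
    return (1.0 - zero_prices[-1]) / annuity

zeros = [0.95, 0.90, 0.85, 0.80]            # hypothetical P^(1)..P^(4)
c = par_swap_rate(zeros)
fixed_leg_value = c * sum(zeros) + zeros[-1]
print(c, fixed_leg_value)
```

By construction the fixed leg is worth exactly one, so `c` is also the coupon rate of a four-period coupon bond that trades at par.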

Many fixed income securities contain options, and explicit options on fixed income securities are also popular. The simplest example is a call option. The issuer may have the right to buy the bonds back at a specified price. Typically, he will do this if interest rates fall a great deal, making a bond without this option more valuable. Home mortgages contain an interesting prepayment option: if interest rates decline, the homeowner can pay off the loan at face value, and refinance. Options on swaps also exist; you can buy the right to enter into a swap contract at a future date. Pricing all of these securities is one of the tasks of term structure modeling.

19.2 Yield curve and expectations hypothesis

The expectations hypothesis is three equivalent statements about the pattern of yields across maturity,

1. The $N$ period yield is the average of expected future one-period yields.
2. The forward rate equals the expected future spot rate.
3. The expected holding period returns are equal on bonds of all maturities.

The expectations hypothesis is not quite the same thing as risk neutrality, since it ignores $\frac{1}{2}\sigma^2$ terms that arise when you move from logs to levels.

The yield curve is a plot of yields of zero coupon bonds as a function of their maturity. Usually, long-term bond yields are higher than short term bond yields – a rising yield curve – but sometimes short yields are higher than long yields – an inverted yield curve. The yield curve sometimes has humps or other shapes as well. The expectations hypothesis is the classic theory for understanding the shape of the yield curve.

More generally, we want to think about the evolution of yields – the expected value and conditional variance of next period's yields. These are obviously the central ingredients for portfolio theory, hedging, derivative pricing, and economic explanation. The expectations hypothesis is the traditional benchmark for thinking about the expected value of future yields.

We can state the expectations hypothesis in three mathematically equivalent forms:

1. The $N$ period yield is the average of expected future one-period yields

$$y^{(N)}_t = \frac{1}{N}E_t\left(y^{(1)}_t + y^{(1)}_{t+1} + y^{(1)}_{t+2} + \cdots + y^{(1)}_{t+N-1}\right) \ (+\ \text{risk premium}) \tag{3}$$

2. The forward rate equals the expected future spot rate

$$f^{N\to N+1}_t = E_t\left(y^{(1)}_{t+N}\right) \ (+\ \text{risk premium}) \tag{4}$$

3. The expected holding period returns are equal on bonds of all maturities

$$E_t\left(hpr^{(N)}_{t+1}\right) = y^{(1)}_t \ (+\ \text{risk premium}). \tag{5}$$

You can see how the expectations hypothesis explains the shape of the yield curve. If the yield curve is upward sloping – long term bond yields are higher than short term bond yields – the expectations hypothesis says this is because short term rates are expected to rise in the future. You can view the expectations hypothesis as a response to a classic misconception. If long term yields are 10% but short term yields are 5%, an unsophisticated investor might think that long-term bonds are a better investment. The expectations hypothesis shows how this may not be true. Future short rates are expected to rise: this means that you will roll over the short term bonds at a really high rate, say 20%, giving the same long-term return. Conversely, when short term interest rates rise, long-term bond prices decline. Thus, the long-term bonds will only give a 5% rate of return for the first year.

You can see from the third statement that the expectations hypothesis is roughly the same as risk-neutrality. If we had said that the expected level of returns was equal across maturities, that would be the same as risk-neutrality. The expectations hypothesis specifies that the expected log return is equal across maturities. This is typically a close approximation to risk neutrality, but not the same thing. If returns are log-normal, then $E(R) = e^{E(r) + \frac{1}{2}\sigma^2(r)}$. If mean returns are about 10% or 0.1 and the standard deviation of returns is about 0.1, then $\frac{1}{2}\sigma^2$ is about 0.005, which is very small but not zero. We could easily specify risk-neutrality in the third expression of the expectations hypothesis, but then it would not imply the other two – $\frac{1}{2}\sigma^2$ terms would crop up.
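The size of the $\frac{1}{2}\sigma^2$ adjustment is easy to check with the numbers in the text. This is a minimal sketch that relies on the log-normal formula quoted above; the variable names are my own.

```python
import math

mean_log_return = 0.10    # E(r)
sd_log_return = 0.10      # sigma(r)

jensen_term = 0.5 * sd_log_return ** 2
expected_gross_return = math.exp(mean_log_return + jensen_term)   # E(R) if r is normal
gross_return_without_term = math.exp(mean_log_return)             # drops the 1/2 sigma^2 term
print(jensen_term, expected_gross_return - gross_return_without_term)
```

With these inputs the gap between the two gross returns is about half a percentage point.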

The intuition of the third form is clear: risk-neutral investors will adjust positions until the expected one-period returns are equal on all securities. Any two ways of getting money from $t$ to $t+1$ must offer the same expected return. The second form adapts the same idea to the choice of locking in a forward contract vs. waiting and borrowing and lending at the spot rate. Risk-neutral investors will load up on one or the other contract until the expected returns are the same. Any two ways of getting money from $t+N$ to $t+N+1$ must give the same expected return.

The first form reflects a choice between two ways of getting money from $t$ to $t+N$. You can buy an $N$ period bond, or roll over $N$ one-period bonds. Risk neutral investors will choose one over the other strategy until the expected $N$-period return is the same.

The three forms are mathematically equivalent. If every way of getting money from $t$ to $t+1$ gives the same expected return, then so must every way of getting money from $t+1$ to $t+2$, and, chaining these together, every way of getting money from $t$ to $t+2$.

For example, let's show that forward rate = expected future spot rate implies the yield curve. Start by writing

$$f^{N-1\to N}_t = E_t\left(y^{(1)}_{t+N-1}\right).$$

Add these up over $N$,

$$f^{0\to 1}_t + f^{1\to 2}_t + \cdots + f^{N-2\to N-1}_t + f^{N-1\to N}_t = E_t\left(y^{(1)}_t + y^{(1)}_{t+1} + y^{(1)}_{t+2} + \cdots + y^{(1)}_{t+N-1}\right).$$

The right hand side is already what we're looking for. Write the left hand side in terms of the definition of forward rates, remembering $P^{(0)} = 1$ so $p^{(0)} = 0$,

$$f^{0\to 1}_t + f^{1\to 2}_t + \cdots + f^{N-2\to N-1}_t + f^{N-1\to N}_t = \left(p^{(0)}_t - p^{(1)}_t\right) + \left(p^{(1)}_t - p^{(2)}_t\right) + \cdots + \left(p^{(N-1)}_t - p^{(N)}_t\right) = -p^{(N)}_t = N y^{(N)}_t.$$

You can show all three forms are equivalent by following similar arguments. (This is a great problem.)

It is common to add a constant risk premium and still refer to the resulting model as the expectations hypothesis, and I include a risk premium in parentheses to remind you of this idea. One end of each of the three statements does imply more risk than the other. A forward rate is known while the future spot rate is not. Long-term bond returns are more volatile than short term bond returns. Rolling over short term real bonds is a riskier long-term investment than buying a long term real bond. If real rates are constant, and the bonds are nominal, then the converse can hold: short term real rates can adapt to inflation, so rolling over short nominal bonds can be a safer long-term real investment than long-term nominal bonds. These risks will generate expected return premia if they covary with the discount factor, and our theory should reflect this fact.

If you allow an arbitrary, time-varying risk premium, the model is a tautology, of course. Thus, the entire content of the “expectations hypothesis” augmented with risk premia is in the restriction that the risk premium is constant over time. We will see that the constant risk premium model does not do that well empirically. One of the main points of term structure models is to quantify the size and movement over time in the risk premium.

19.3 Term structure models – a discrete-time introduction

Term structure models specify the evolution of the short rate and potentially other statevariables, and the prices of bonds of various maturities at any given time as a function ofthe short rate and other state variables. I examine a very simple example based on an AR(1)for the short rate and the expectations hypothesis, which gives a geometric pattern for theyield curve. A good way to generate term structure models is to write down a process forthe discount factor, and then price bonds as the conditional mean of the discount factor. Thisprocedure guarantees the absence of arbitrage. I give a very simple example of an AR(1)model for the log discount factor, which also results in geometric yield curves. .

A natural place to start in modeling the term structure is to model yields statistically. Youmight run regressions of changes in yields on the levels of lagged yields, and derive a modelof the mean and volatility of yield changes. You would likely start with a factor analysisof yield changes and express the covariance matrix of yields in terms of a few large factorsthat describe their common movement. The trouble with this approach is that you can quiteeasily reach a statistical representation of yields that admits an arbitrage opportunity, andyou would not want to use such a statistical characterization for economic understanding ofyields, for portfolio formation, or for derivative pricing. For example, a statistical analysiswould strongly suggest that a ¿rst factor should be a “level” factor, in which all yields moveup and down together. It turns out that this assumption violates arbitrage: the long-maturityyield must converge to a constant.

How do you model yields without arbitrage? An obvious solution is to use the discount factor existence theorem: Write a statistical model for a positive discount factor, and find bond prices as the expectation of this discount factor. Such a model will be, by construction, arbitrage free. Conversely, any arbitrage-free distribution of yields can be captured by some positive discount factor, so you don’t lose any generality with this approach.

19.3.1 A term structure model based on the expectations hypothesis

We can use the expectations hypothesis to give the easiest example of a term structure model. (This one does not start from a discount factor and so may not be arbitrage-free.) Suppose the one-period yield follows an AR(1),

$$y^{(1)}_{t+1} - \delta = \rho\left(y^{(1)}_t - \delta\right) + \varepsilon_{t+1}.$$

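Now, we can use the expectations hypothesis (19.3) to calculate yields on bonds of all maturities as a function of today’s one period yield,

$$y^{(2)}_t = \frac{1}{2}E_t\left[y^{(1)}_t + y^{(1)}_{t+1}\right] = \frac{1}{2}\left[y^{(1)}_t + \delta + \rho\left(y^{(1)}_t - \delta\right)\right] = \delta + \frac{1+\rho}{2}\left(y^{(1)}_t - \delta\right).$$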

Continuing in this way,

$$y^{(N)}_t - \delta = \frac{1}{N}\,\frac{1-\rho^{N}}{1-\rho}\left(y^{(1)}_t - \delta\right). \tag{19.6}$$

You can see some features that will recur throughout the term structure models. First, the model (19.6) can describe some movements in the yield curve over time. If the short rate is below its mean, then there is a smoothly upward sloping yield curve. Long term bond yields are higher as short rates are expected to increase in the future. If the short rate is above its mean, we get a smoothly inverted yield curve. This particular model cannot produce humps or other interesting shapes that we sometimes see in the term structure. Second, this model predicts no average slope of the term structure – $E(y^{(N)}_t) = E(y^{(1)}_t) = \delta$. In fact, the average term structure seems to slope up slightly. Third, all bond yields move together in the model. If we were to stack the yields up in a VAR representation, it would be

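$$y^{(1)}_{t+1} - \delta = \rho\left(y^{(1)}_t - \delta\right) + \varepsilon_{t+1}$$

$$y^{(2)}_{t+1} - \delta = \rho\left(y^{(2)}_t - \delta\right) + \frac{1+\rho}{2}\,\varepsilon_{t+1}$$

$$\vdots$$

$$y^{(N)}_{t+1} - \delta = \rho\left(y^{(N)}_t - \delta\right) + \frac{1}{N}\,\frac{1-\rho^{N}}{1-\rho}\,\varepsilon_{t+1}.$$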

(You can write the right hand variable in terms of $y^{(1)}_t$ if you want – any one yield carries the same information as any other.) The error terms are all the same shock, differently scaled. We can add more factors to the short rate process, to improve on this prediction, but most tractable term structure models maintain fewer factors than there are bonds, so some perfect factor structure is a common prediction of term structure models. Finally, this model has a problem in that the short rate, following an AR(1), can be negative. Since people can always hold cash, nominal short rates are never negative, so we want to start with a short rate process that does not have this feature.

With this simple model in hand, you can see some obvious directions for generalization. First, we will want more complex driving processes than an AR(1). For example, a hump-shape in the conditionally expected short rate will result in a hump-shaped yield curve. If there are multiple state variables driving the short rate, then we will have multiple factors driving the yield curve which will also result in more interesting shapes. We also want processes that keep the short rate positive in all states of nature. Second, we will want to add some “market prices of risk” – some risk premia. This will allow us to get average yield curves to not be flat, and time-varying risk premia seem to be part of the yield data.

The yield curve literature proceeds in exactly this way: specify a short rate process and the risk premia, and find the prices of long term bonds. The trick is to specify sufficiently complex assumptions to be interesting, but preserve our ability to solve the models.
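The expectations-hypothesis model of this section is easy to check numerically. The following is a minimal sketch, not part of the original text: it averages the expected future short rates implied by the AR(1) and compares the result with the geometric-sum closed form $\delta + \frac{1}{N}\frac{1-\rho^N}{1-\rho}(y^{(1)}_t - \delta)$. The function names and the parameter values ($\delta = 0.05$, $\rho = 0.9$) are illustrative assumptions.

```python
import numpy as np

def eh_yield_curve(y1, delta, rho, max_maturity):
    """Yields y^(N), N = 1..max_maturity, under the expectations hypothesis
    when the short rate is an AR(1) with mean delta and persistence rho."""
    yields = []
    for n in range(1, max_maturity + 1):
        # E_t y^(1)_{t+j} = delta + rho**j * (y1 - delta), for j = 0..n-1
        expected_short = delta + rho ** np.arange(n) * (y1 - delta)
        # The N-period yield is the average of expected future short rates.
        yields.append(expected_short.mean())
    return np.array(yields)

def eh_yield_closed_form(y1, delta, rho, n):
    """Geometric-sum closed form for the same yield."""
    return delta + (1.0 - rho ** n) / (n * (1.0 - rho)) * (y1 - delta)

# Hypothetical parameter values, chosen only for illustration.
low = eh_yield_curve(y1=0.02, delta=0.05, rho=0.9, max_maturity=10)
high = eh_yield_curve(y1=0.08, delta=0.05, rho=0.9, max_maturity=10)
```

With the short rate below its mean (`low`) the curve slopes up smoothly; with the short rate above its mean (`high`) it is smoothly inverted, as described above.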

19.3.2 The simplest discrete time model

The simplest nontrivial model I can think of is to let the log of the discount factor follow an AR(1) with normally distributed shocks. I write the AR(1) for the log rather than the level in order to make sure the discount factor is positive, precluding arbitrage. Log discount factors are typically slightly negative, so I denote the unconditional mean $E(\ln m) = -\delta$,

$$(\ln m_{t+1} + \delta) = \rho\,(\ln m_t + \delta) + \varepsilon_{t+1}.$$

In turn, you can think of this discount factor model as arising from a consumption based power utility model with normal errors.

$$m_{t+1} = e^{-\delta}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}$$

$$c_{t+1} - c_t = \rho\,(c_t - c_{t-1}) + \eta_{t+1}.$$

The term structure literature has only started to think about whether the empirically successful discount factor processes can be connected empirically back to macroeconomic events in this way.

From this discount factor, we can find bond prices and yields. This is easy because the conditional mean and variance of an AR(1) are easy to find. (I am following the strategy of solving the discount factor forward rather than solving the price backward.) We need

$$y^{(1)}_t = -p^{(1)}_t = -\ln E_t\left(e^{\ln m_{t+1}}\right)$$

$$y^{(2)}_t = -\frac{1}{2}\,p^{(2)}_t = -\frac{1}{2}\ln E_t\left(e^{\ln m_{t+1} + \ln m_{t+2}}\right)$$

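and so on. Iterating the AR(1) forward,

$$(\ln m_{t+2} + \delta) = \rho^2(\ln m_t + \delta) + \rho\,\varepsilon_{t+1} + \varepsilon_{t+2}$$

$$(\ln m_{t+3} + \delta) = \rho^3(\ln m_t + \delta) + \rho^2\varepsilon_{t+1} + \rho\,\varepsilon_{t+2} + \varepsilon_{t+3}$$

so

$$(\ln m_{t+1} + \delta) + (\ln m_{t+2} + \delta) = (\rho + \rho^2)(\ln m_t + \delta) + (1+\rho)\,\varepsilon_{t+1} + \varepsilon_{t+2}$$

$$(\ln m_{t+1} + \delta) + (\ln m_{t+2} + \delta) + (\ln m_{t+3} + \delta) = (\rho + \rho^2 + \rho^3)(\ln m_t + \delta) + (1 + \rho + \rho^2)\,\varepsilon_{t+1} + (1+\rho)\,\varepsilon_{t+2} + \varepsilon_{t+3}.$$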

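Using the rule for a lognormal $E(e^x) = e^{E(x) + \frac{1}{2}\sigma_x^2}$, we have finally

$$y^{(1)}_t = \delta - \rho\,(\ln m_t + \delta) - \frac{1}{2}\sigma_\varepsilon^2$$

$$y^{(2)}_t = \delta - \frac{\rho + \rho^2}{2}(\ln m_t + \delta) - \frac{1 + (1+\rho)^2}{4}\sigma_\varepsilon^2$$

$$y^{(3)}_t = \delta - \frac{\rho + \rho^2 + \rho^3}{3}(\ln m_t + \delta) - \frac{1 + (1+\rho)^2 + (1+\rho+\rho^2)^2}{6}\sigma_\varepsilon^2.$$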

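Notice all yields move as linear functions of a single state variable, $\ln m_t + \delta$. Therefore, we can substitute out the discount factor and express the yields on bonds of any maturity as functions of the yields on bonds of one maturity. Which one we choose is arbitrary, but it’s conventional to use the shortest interest rate as the state variable. With $E(y^{(1)}) = \delta - \frac{1}{2}\sigma_\varepsilon^2$, the one-period yield equation gives $\ln m_t + \delta = -\left[y^{(1)}_t - E(y^{(1)})\right]/\rho$, and we can write our term structure model as

$$y^{(1)}_t - E(y^{(1)}) = \rho\left[y^{(1)}_{t-1} - E(y^{(1)})\right] - \rho\,\varepsilon_t \tag{19.7}$$

$$y^{(2)}_t = \delta + \frac{1+\rho}{2}\left[y^{(1)}_t - E(y^{(1)})\right] - \frac{1 + (1+\rho)^2}{4}\sigma_\varepsilon^2$$

$$y^{(3)}_t = \delta + \frac{1+\rho+\rho^2}{3}\left[y^{(1)}_t - E(y^{(1)})\right] - \frac{1 + (1+\rho)^2 + (1+\rho+\rho^2)^2}{6}\sigma_\varepsilon^2$$

$$y^{(N)}_t = \delta + \frac{1}{N}\,\frac{1-\rho^N}{1-\rho}\left[y^{(1)}_t - E(y^{(1)})\right] - \frac{\sigma_\varepsilon^2}{2N}\sum_{j=1}^{N}\left(\sum_{k=0}^{j-1}\rho^k\right)^2.$$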

This is the form in which term structure models are usually written – an evolution equation for the short rate process (together, in general, with other factors or other yields used to identify those factors), and then longer rates written as functions of the short rate, or the other factors.

This is not a very realistic term structure model. In the data, the average yield curve – the plot of $\{E[y^{(N)}_t]\}$ versus $N$ – is slightly upward sloping. The average yield curve from this model is slightly downward sloping as the $\sigma_\varepsilon^2$ terms pile up. The effect is not large; with $\rho = 0.9$ and $\sigma_\varepsilon = 0.02$, I find $E(y^{(2)}_t) = E(y^{(1)}_t) - 0.03\%$ and $E(y^{(3)}_t) = E(y^{(1)}_t) - 0.06\%$. Still, it does not slope up. More importantly, this model only produces smoothly upward sloping or downward sloping term structures. For example, with $\rho = 0.9$, the coefficients $(1-\rho^N)/[N(1-\rho)]$ multiplying the one period rate in (19.7) for $N = 2, 3, 4$ are 0.95, 0.90, 0.86. Two, three and four period bonds move exactly with one-period bonds using these coefficients. This model shows no conditional heteroskedasticity – the conditional variance of yield changes is always the same. The term structure data show times of high and low volatility, and times of high yields and high yield spreads seem to track these changes in volatility. Finally, this model shows a weakness of almost all term structure models – all the yields move together; they follow an exact one-factor structure, they are perfectly conditionally correlated.

The solution, of course, is to specify more complex discount rate processes that give rise to more interesting term structures.
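The yields of this AR(1) log discount factor model can be computed for any maturity by applying the lognormal rule to the sum of future log discount factors. The following is a minimal sketch under the assumptions of this section (normal shocks, state variable $x = \ln m_t + \delta$); the function name and the parameter values are illustrative, not from the text.

```python
import numpy as np

def ar1_discount_factor_yields(x, delta, rho, sigma, max_maturity):
    """Yields y^(N), N = 1..max_maturity, when
    (ln m_{t+1} + delta) = rho (ln m_t + delta) + eps_{t+1}, eps ~ N(0, sigma^2),
    and x = ln m_t + delta is the current value of the state variable."""
    yields = []
    for n in range(1, max_maturity + 1):
        j = np.arange(1, n + 1)
        # Conditional mean of ln m_{t+1} + ... + ln m_{t+n}
        mean = -n * delta + np.sum(rho ** j) * x
        # The n shocks enter the sum with weights 1, 1+rho, ..., 1+rho+...+rho^(n-1)
        weights = (1.0 - rho ** j) / (1.0 - rho)
        var = sigma ** 2 * np.sum(weights ** 2)
        # ln P = ln E_t exp(sum) = mean + var / 2, and y = -ln P / n
        yields.append(-(mean + 0.5 * var) / n)
    return np.array(yields)

# Average yield curve: evaluate at the unconditional mean of the state, x = 0.
avg = ar1_discount_factor_yields(x=0.0, delta=0.05, rho=0.9, sigma=0.02, max_maturity=4)
# A small move in the state, to see how longer yields move with the short yield.
shifted = ar1_discount_factor_yields(x=0.01, delta=0.05, rho=0.9, sigma=0.02, max_maturity=4)
loading_2 = (shifted[1] - avg[1]) / (shifted[0] - avg[0])
```

Here `avg` declines slightly with maturity as the variance terms accumulate, and `loading_2` is the amount by which the two-period yield moves per unit move in the one-period yield.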

19.4 Continuous time term structure models

The basic steps.

1. Write a time-series model for the discount factor, typically in the form

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda(\cdot)\,dz$$

$$dr = \mu_r(\cdot)\,dt + \sigma_r(\cdot)\,dz$$

2. Solve the discount factor model forward and take expectations, to find bond prices

$$P^{(N)}_t = E_t\left(\frac{\Lambda_{t+N}}{\Lambda_t}\right).$$

3. Alternatively, from the basic pricing equation $0 = E\left[d(\Lambda P)\right]$ we can find a differential equation that the price must follow,

$$\frac{\partial P}{\partial r}\mu_r + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\sigma_\Lambda.$$

You can solve this back from $P^{(0)}_N = 1$.

I contrast the discount factor approach to the market price of risk and arbitrage pricing approaches.

Term structure models are usually more convenient in continuous time. As with the last model, I specify a discount factor process and then find bond prices. A wide and popular class of term structure models is based on a discount factor process of the form

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda(\cdot)\,dz \tag{19.8}$$

$$dr = \mu_r(\cdot)\,dt + \sigma_r(\cdot)\,dz.$$

This specification is analogous to a discrete time model of the form

$$m_{t+1} = x_t + \delta\,\varepsilon_{t+1}$$

$$x_{t+1} = \rho\,x_t + \varepsilon_{t+1}.$$

This is a convenient representation since the state variable $x$ carries the mean discount factor information.

The $r$ variable starts out as a state variable for the drift of the discount factor. However, you can see quickly that it will become the short rate process since $E_t(d\Lambda/\Lambda) = -r^f_t\,dt$. The dots $(\cdot)$ remind you that these terms can be functions of state variables. Of course, we can add orthogonal components to the discount factor with no effect on the bond prices. Thus, the perfect correlation between interest rate and discount factor shocks is not essential.

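Term structure models differ in the specification of the functional forms for $\mu_r$, $\sigma_r$, $\sigma_\Lambda$. We will study three famous examples, the Vasicek model, the Cox-Ingersoll-Ross model and the general affine specification. The first two are

Vasicek:

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\,dz \tag{19.9}$$

$$dr = \phi\,(\bar{r} - r)\,dt + \sigma_r\,dz$$

CIR:

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\sqrt{r}\,dz \tag{19.10}$$

$$dr = \phi\,(\bar{r} - r)\,dt + \sigma_r\sqrt{r}\,dz$$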

The Vasicek model is quite similar to the AR(1) we studied in the last section. The CIR model adds the square root terms in the volatility. This specification captures the fact in US data that higher interest rates seem to be more volatile. In the other direction, it keeps the level of the interest rate from falling below zero. (We need $2\phi\bar{r} \geq \sigma_r^2$ to guarantee that the square root process does not get stuck at zero.)

Having specified a discount factor process, it’s a simple matter to find bond prices once again,

$$P^{(N)}_t = E_t\left(\frac{\Lambda_{t+N}}{\Lambda_t}\right).$$

We can simply solve the discount factor forward and take the expectation. We can also use the instantaneous pricing condition $0 = E\left(d(\Lambda P)\right)$ to find a partial differential equation for prices, and solve that backward.

Both methods naturally adapt to pricing term structure derivatives – call options on bonds, interest rate floors or caps, “swaptions” that give you the right to enter a swap, and so forth. We simply put any payoff $x^C$ that depends on interest rates or interest rate state variables inside the expectation

$$P^{(N)}_t = E_t\int_{s=t}^{\infty}\frac{\Lambda_s}{\Lambda_t}\,x^C(s)\,ds.$$

Alternatively, the price of such options will also be a function of the state variables that drive the term structure, so we can solve the bond pricing differential equation backwards using the option payoff rather than one as the boundary condition.

19.4.1 Expectation approach

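As with the Black-Scholes option pricing model, we can think of solving the discount factor forward, and then taking the expectation. We can write the solution (see footnote 8) to (19.8)

$$\frac{\Lambda_T}{\Lambda_0} = e^{-\int_{s=0}^{T}\left(r_s + \frac{1}{2}\sigma_{\Lambda s}^2\right)ds - \int_{s=0}^{T}\sigma_{\Lambda s}\,dz_s}$$

and thus,

$$P^{(T)}_0 = E_0\left(e^{-\int_{s=0}^{T}\left(r_s + \frac{1}{2}\sigma_{\Lambda s}^2\right)ds - \int_{s=0}^{T}\sigma_{\Lambda s}\,dz_s}\right). \tag{19.11}$$

For example, in a riskless economy $\sigma_\Lambda = 0$, we obtain the continuous-time present value formula,

$$P^{(T)}_0 = e^{-\int_{s=0}^{T} r_s\,ds}.$$

With a constant interest rate $r$,

$$P_0 = e^{-rT}.$$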

8. If this is mysterious, write first

$$d\ln\Lambda = \frac{d\Lambda}{\Lambda} - \frac{1}{2}\frac{d\Lambda^2}{\Lambda^2} = -\left(r + \frac{1}{2}\sigma_\Lambda^2\right)dt - \sigma_\Lambda\,dz$$

and then integrate both sides from zero to $T$.

In more interesting situations, solving the $\Lambda$ equation forward and taking the expectation analytically is not so easy. Conceptually and numerically, it is easy, of course. Just simulate the system (19.8) forward a few thousand times, and take the average.
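The “simulate and average” recipe can be sketched as follows. This is a minimal sketch, assuming the Vasicek forms of (19.8) (constant $\sigma_\Lambda$ and $\sigma_r$, drift $\phi(\bar{r} - r)$), a simple Euler discretization, and illustrative parameter values that are not taken from the text; the function name `simulate_bond_price` is hypothetical.

```python
import numpy as np

def simulate_bond_price(r0, rbar, phi, sigma_r, sigma_lam, maturity,
                        n_steps=200, n_paths=100_000, seed=0):
    """Monte Carlo estimate of P = E_0[Lambda_T / Lambda_0] with
    dLambda/Lambda = -r dt - sigma_lam dz and dr = phi (rbar - r) dt + sigma_r dz.
    Returns the estimate and its Monte Carlo standard error."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    r = np.full(n_paths, r0, dtype=float)
    log_lam = np.zeros(n_paths)  # ln(Lambda_t / Lambda_0) along each path
    for _ in range(n_steps):
        dz = rng.standard_normal(n_paths) * np.sqrt(dt)
        # d ln Lambda = -(r + sigma_lam^2 / 2) dt - sigma_lam dz
        log_lam += -(r + 0.5 * sigma_lam ** 2) * dt - sigma_lam * dz
        # The same shock dz drives the short rate.
        r += phi * (rbar - r) * dt + sigma_r * dz
    ratio = np.exp(log_lam)
    return ratio.mean(), ratio.std(ddof=1) / np.sqrt(n_paths)

price, std_err = simulate_bond_price(r0=0.05, rbar=0.05, phi=0.2,
                                     sigma_r=0.02, sigma_lam=0.2, maturity=2.0)
```

The standard error gives a rough sense of how many paths are needed; the discount factor itself is quite volatile when $\sigma_\Lambda$ is large, so the average converges slowly.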

19.4.2 Differential equation approach

Recall the basic pricing equation for a security with price $S$ and no dividends is

$$E_t\left(\frac{dS}{S}\right) - r\,dt = -E_t\left(\frac{dS}{S}\frac{d\Lambda}{\Lambda}\right). \tag{19.12}$$

The left hand side is the expected return. As we guessed an option price $C(S, t)$ and used (19.12) to derive a differential equation for the call option price, so we will guess a bond price $P(N, t)$ and use this equation to derive a differential equation for the bond price.

If we specified bonds by their maturity date $T$, $P(t, T)$, we could apply (19.12) directly. However, it’s nicer to look for a bond pricing function $P(N, t)$ that fixes the maturity rather than the date. Equation (19.2) gives the holding period return for this case, which adds an extra term to correct for the fact that you sell younger bonds than you buy,

$$\text{return} = \frac{dP(N, t)}{P} - \frac{1}{P}\frac{\partial P(N, t)}{\partial N}\,dt.$$

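Thus, the fundamental pricing equation, applied to the price of bonds of given maturity $P(N, t)$ is

$$E_t\left(\frac{dP}{P}\right) - \left(\frac{1}{P}\frac{\partial P(N, t)}{\partial N} + r\right)dt = -E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right). \tag{19.13}$$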

Now, we’re ready to find a differential equation for the bond price, just as we did for the option price to derive the Black-Scholes formula. Guess that all the time dependence comes through the state variable $r$, so $P(N, r)$. Using Ito’s lemma

$$dP = \left(\frac{\partial P}{\partial r}\mu_r + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2\right)dt + \frac{\partial P}{\partial r}\sigma_r\,dz.$$

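(There is no $\partial P/\partial N$ because we defined the $P$ function as the price of bonds with constant maturity $N$. If we had defined $P(T, r)$ as the price of bonds that come due on date $T$, then there would be a $-\partial P/\partial T$ term. This would take the place of the $\partial P/\partial N$ term in (19.13).) Plugging in to (19.13) and canceling $dt$, we obtain the fundamental differential equation for bonds,

$$\frac{\partial P}{\partial r}\mu_r + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\sigma_\Lambda. \tag{19.14}$$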

All you have to do is specify the functions $\mu_r(\cdot)$, $\sigma_r(\cdot)$, $\sigma_\Lambda(\cdot)$ and solve the differential equation.


19.4.3 Market price of risk and risk-neutral dynamic approaches

The bond pricing differential equation (19.14) is conventionally derived without discount factors.

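One conventional approach is to write the short rate process $dr = \mu_r(\cdot)\,dt + \sigma_r(\cdot)\,dz$, and then specify that any asset whose payoffs have shocks $\sigma_r\,dz$ must offer a Sharpe ratio of $\lambda(\cdot)$. We would then write

$$\frac{\partial P}{\partial r}\mu_r + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\lambda.$$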

With $\lambda = \sigma_\Lambda$, this is just (19.14) of course. (If the discount factor and shock are imperfectly correlated, then $\lambda = \rho\,\sigma_\Lambda$.) Different authors use “market price of risk” in different ways. Cox Ingersoll and Ross (1985, p.398) warn against modeling the right hand side as $\partial P/\partial r\;\psi(\cdot)$ directly; this specification could lead to a positive expected return when $\sigma_r = 0$ and hence an infinite Sharpe ratio or arbitrage opportunity. By generating expected returns as the covariance of payoff shocks and discount factor shocks, we naturally avoid this mistake and other subtle ways to introduce arbitrage opportunities without realizing that you have done so.

A second conventional approach is to use an alternative interest rate and discount factor process,

$$\frac{d\Lambda}{\Lambda} = -r\,dt \tag{19.15}$$

$$dr = (\mu_r - \sigma_r\lambda)\,dt + \sigma_r\,dz_r.$$

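If we use this alternative process, we obtain

$$\frac{\partial P}{\partial r}(\mu_r - \sigma_r\lambda) + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - \frac{\partial P}{\partial N} - rP = 0$$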

which is of course the same thing. This is the “risk neutral probability” approach, since the drift term in (19.15) is not the true drift. Since (19.15) gives the same prices, we can find and represent the bond price via the integral

$$P^{(N)}_t = E^*_t\left[e^{-\int_{s=t}^{t+N} r_s\,ds}\right]$$

where $E^*$ represents expectation with respect to the risk-neutral process defined in (19.15) rather than the true probabilities defined by the process (19.8).

When we derive the model from a discount factor, the single discount factor carries two pieces of information. The drift or conditional mean of the discount factor gives the short rate of interest, while the covariance of the discount factor shocks with asset payoff shocks generates expected returns or “market prices of risk.” I find it useful to write the discount factor model to keep the term structure connected with the rest of asset pricing, and to remind myself where “market prices of risk” come from, and reasonable values for their magnitude. Of course, the result is the same no matter which method you follow.


The fact that there are fewer factors than bonds means that once you have one bond price you can derive all the others by “no arbitrage” arguments and make this look like option pricing. Some derivations of term structure models follow this approach, setting up arbitrage portfolios.

19.4.4 Solving the bond price differential equation

Now we have to solve the partial differential equation (19.14) subject to the boundary condition $P(N = 0, r) = 1$. Solving this equation is straightforward conceptually and numerically. Express (19.14) as

$$\frac{\partial P}{\partial N} = \frac{\partial P}{\partial r}(\mu_r - \sigma_r\sigma_\Lambda) + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - rP.$$

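We can start at $N = 0$ on a grid of $r$, with $P(0, r) = 1$. For fixed $N$, we can work to one step larger $N$ by evaluating the derivatives on the right hand side. The first step is

$$P(\Delta N, r) = P(0, r) + \frac{\partial P}{\partial N}\Delta N = 1 - r\,\Delta N.$$

At the second step, $\partial P/\partial r = -\Delta N$, $\partial^2 P/\partial r^2 = 0$, so

$$P(2\Delta N, r) = 1 - 2r\,\Delta N - \Delta N^2(\mu_r - \sigma_r\sigma_\Lambda) + r^2\Delta N^2.$$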

Now the derivatives of $\mu_r$ and $\sigma_r$ with respect to $r$ will start to enter, and we let the computer take it from here. Analytic solutions only exist in special cases, which we study next.
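“Letting the computer take it from here” might look like the following. This is a minimal sketch of an explicit finite-difference march in maturity, not a production solver: the grid, step sizes, boundary treatment (linear extrapolation at the edges of the $r$ grid), the function name `solve_bond_pde`, and the Vasicek-type parameter values are all assumptions made for illustration.

```python
import numpy as np

def solve_bond_pde(mu_r, sigma_r, sigma_lam, r_grid, maturity, n_steps):
    """March dP/dN = P_r (mu_r - sigma_r sigma_lam) + 0.5 P_rr sigma_r^2 - r P
    forward in maturity N, starting from P(0, r) = 1, with an explicit scheme.
    mu_r, sigma_r and sigma_lam are functions of r (array in, array out)."""
    dr = r_grid[1] - r_grid[0]
    dN = maturity / n_steps
    drift = mu_r(r_grid) - sigma_r(r_grid) * sigma_lam(r_grid)
    diffusion = 0.5 * sigma_r(r_grid) ** 2
    P = np.ones_like(r_grid)
    for _ in range(n_steps):
        P_r = np.zeros_like(P)
        P_rr = np.zeros_like(P)
        # Central differences on the interior of the grid.
        P_r[1:-1] = (P[2:] - P[:-2]) / (2.0 * dr)
        P_rr[1:-1] = (P[2:] - 2.0 * P[1:-1] + P[:-2]) / dr ** 2
        P = P + dN * (P_r * drift + P_rr * diffusion - r_grid * P)
        # Crude boundary treatment: linear extrapolation at the grid edges.
        P[0] = 2.0 * P[1] - P[2]
        P[-1] = 2.0 * P[-2] - P[-3]
    return P

# Hypothetical Vasicek-type example: constant volatilities, mean-reverting drift.
phi, rbar, s_r, s_lam = 0.2, 0.05, 0.02, 0.2
r_grid = np.linspace(-0.25, 0.35, 121)  # grid step 0.005
prices = solve_bond_pde(
    mu_r=lambda r: phi * (rbar - r),
    sigma_r=lambda r: np.full_like(r, s_r),
    sigma_lam=lambda r: np.full_like(r, s_lam),
    r_grid=r_grid, maturity=1.0, n_steps=2000)
price_at_5pct = prices[np.argmin(np.abs(r_grid - 0.05))]
```

An explicit scheme like this is only stable when the maturity step is small relative to the squared grid step divided by $\sigma_r^2$; the values above are chosen with that in mind.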

19.5 Three famous linear term structure models

I solve the Vasicek, Cox Ingersoll Ross, and affine models. Each model gives a linear function for log bond prices and yields, for example,

$$\ln P(N, r) = A(N) - B(N)\,r.$$

As we have seen, term structure models are easy in principle and numerically: specify a discount factor process and find its conditional expectation or solve the bond pricing partial differential equation back from maturity. In practice, the computations are hard. I present next three famous special cases of term structure models – specifications for the discount factor process – that allow analytical or quickly calculable solutions.


Analytical or close to analytical solutions are still important, because we have not yet found good techniques for reverse-engineering the term structure. We know how to start with a discount factor process and find bond prices. We don’t know how to start with the characteristics of bond prices that we want to model and construct an appropriate discount factor process. (At least one whose parameters do not change every day, as in the “arbitrage free” literature.) Thus, in evaluating term structure models, we will have to do lots of the “forward” calculations – from assumed discount factor model to bond prices – and it is important that we should be able to do them quickly.

Obviously, this field would be dramatically changed if we could find a way to reverse-engineer the term structure to directly estimate the discount factor process. Also, the ad-hoc time-series models of the discount factor obviously need to be connected to macroeconomics; if not to consumption, then at least to inflation, marginal products of capital, and macroeconomic variables used by the Federal Reserve in its interventions in the interest rate markets.

19.5.1 Vasicek model by pde

The Vasicek (19xx) model is a special case that allows a fairly easy analytic solution. The method is the same as the more complex analytic solution in the CIR and affine classes, but the algebra is easier, so this is a good place to start.

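The Vasicek discount factor process is

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\,dz$$

$$dr = \phi\,(\bar{r} - r)\,dt + \sigma_r\,dz.$$

Using this process in the basic bond differential equation (19.14), we obtain

$$\frac{\partial P}{\partial r}\phi\,(\bar{r} - r) + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2 - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\sigma_\Lambda. \tag{19.16}$$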

I’ll solve this equation with the usual unsatisfying non-constructive technique – guess the functional form of the answer and show it’s right. I guess that log yields and hence log prices are a linear function of the short rate,

$$P(N, r) = e^{A(N) - B(N)\,r}. \tag{19.17}$$

I take the partial derivatives required in (19.16) and see if I can find $A(N)$ and $B(N)$ to make (19.16) work. The result is a set of ordinary differential equations for $A(N)$ and $B(N)$, and these are of a particularly simple form that can be solved by integration. I solve them, subject to the boundary condition imposed by $P(0, r) = 1$. The result is

$$B(N) = \frac{1}{\phi}\left(1 - e^{-\phi N}\right) \tag{19.18}$$

$$A(N) = \left(\frac{1}{2}\frac{\sigma_r^2}{\phi^2} + \frac{\sigma_r\sigma_\Lambda}{\phi} - \bar{r}\right)\left(N - B(N)\right) - \frac{\sigma_r^2}{4\phi}B(N)^2. \tag{19.19}$$

The exponential form of (19.17) means that log prices and log yields are linear functions of the interest rate,

$$p(N, r) = A(N) - B(N)\,r$$

$$y(N, r) = -\frac{A(N)}{N} + \frac{B(N)}{N}\,r.$$

Solving the pde: details.

The boundary condition $P(0, r) = 1$ will be satisfied if

$$A(0) - B(0)\,r = 0.$$

Since this must hold for every $r$, we will need

$$A(0) = 0, \quad B(0) = 0.$$

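Given the guess (19.17), the derivatives that appear in (19.16) are

$$\frac{1}{P}\frac{\partial P}{\partial r} = -B(N)$$

$$\frac{1}{P}\frac{\partial^2 P}{\partial r^2} = B(N)^2$$

$$\frac{1}{P}\frac{\partial P}{\partial N} = A'(N) - B'(N)\,r.$$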

Substituting these derivatives in (19.16),

$$-B(N)\,\phi\,(\bar{r} - r) + \frac{1}{2}B(N)^2\sigma_r^2 - A'(N) + B'(N)\,r - r = -B(N)\,\sigma_r\sigma_\Lambda.$$

This equation has to hold for every $r$, so the terms multiplying $r$ and the constant terms must separately be zero.

$$A'(N) = \frac{1}{2}B(N)^2\sigma_r^2 - (\phi\bar{r} - \sigma_r\sigma_\Lambda)\,B(N) \tag{19.20}$$

$$B'(N) = 1 - B(N)\,\phi.$$

We can solve this pair of ordinary differential equations by simple integration. The second one is

$$\frac{dB}{dN} = 1 - \phi B$$

$$\int\frac{dB}{1 - \phi B} = \int dN$$

$$-\frac{1}{\phi}\ln(1 - \phi B) = N$$

and hence

$$B(N) = \frac{1}{\phi}\left(1 - e^{-\phi N}\right). \tag{19.21}$$

Note $B(0) = 0$ so we did not need a constant in the integration.

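We solve the first equation in (19.20) by simply integrating it, and choosing the constant to set $A(0) = 0$. Here we go.

$$A'(N) = \frac{1}{2}B(N)^2\sigma_r^2 - (\phi\bar{r} - \sigma_r\sigma_\Lambda)\,B(N)$$

$$A(N) = \frac{\sigma_r^2}{2}\int B(N)^2\,dN - (\phi\bar{r} - \sigma_r\sigma_\Lambda)\int B(N)\,dN + C$$

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\int\left(1 - 2e^{-\phi N} + e^{-2\phi N}\right)dN - \left(\bar{r} - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\int\left(1 - e^{-\phi N}\right)dN + C$$

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\left(N + \frac{2e^{-\phi N}}{\phi} - \frac{e^{-2\phi N}}{2\phi}\right) - \left(\bar{r} - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N + \frac{e^{-\phi N}}{\phi}\right) + C$$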

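We pick the constant of integration to give $A(0) = 0$. You can do this explicitly, or figure out directly that the result is achieved by subtracting one from the $e^{-\phi N}$ terms,

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\left(N + \frac{2\left(e^{-\phi N} - 1\right)}{\phi} - \frac{e^{-2\phi N} - 1}{2\phi}\right) - \left(\bar{r} - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N + \frac{e^{-\phi N} - 1}{\phi}\right).$$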

Now, we just have to make it pretty. I’m aiming for the form given in (19.19). Note

$$B(N)^2 = \frac{1}{\phi^2}\left(1 - 2e^{-\phi N} + e^{-2\phi N}\right)$$

$$\phi B(N)^2 = 2\,\frac{1 - e^{-\phi N}}{\phi} + \frac{e^{-2\phi N} - 1}{\phi}$$

$$\frac{e^{-2\phi N} - 1}{\phi} = \phi B(N)^2 - 2B(N).$$

Then

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\left(N - 2B(N) - \frac{\phi}{2}B(N)^2 + B(N)\right) - \left(\bar{r} - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N - B(N)\right)$$

$$A(N) = -\frac{\sigma_r^2}{4\phi}B(N)^2 - \left(\bar{r} - \frac{\sigma_r\sigma_\Lambda}{\phi} - \frac{\sigma_r^2}{2\phi^2}\right)\left(N - B(N)\right).$$

We’re done.
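As a sanity check on the algebra, the closed forms (19.18)-(19.19) can be coded directly and compared with the ordinary differential equations (19.20) by numerical differentiation. The following is a minimal sketch; the function names and parameter values are illustrative assumptions, not from the text.

```python
import numpy as np

def vasicek_B(N, phi):
    """B(N) from (19.18)."""
    return (1.0 - np.exp(-phi * N)) / phi

def vasicek_A(N, phi, rbar, sigma_r, sigma_lam):
    """A(N) from (19.19)."""
    B = vasicek_B(N, phi)
    slope = 0.5 * sigma_r ** 2 / phi ** 2 + sigma_r * sigma_lam / phi - rbar
    return slope * (N - B) - sigma_r ** 2 / (4.0 * phi) * B ** 2

def vasicek_yield(N, r, phi, rbar, sigma_r, sigma_lam):
    """Yield y(N, r) = -A(N)/N + B(N) r / N."""
    A = vasicek_A(N, phi, rbar, sigma_r, sigma_lam)
    return (-A + vasicek_B(N, phi) * r) / N

# Hypothetical parameter values, chosen only for illustration.
phi, rbar, sigma_r, sigma_lam = 0.2, 0.05, 0.02, 0.2
maturities = np.array([1.0, 5.0, 10.0, 30.0])
yields = vasicek_yield(maturities, 0.05, phi, rbar, sigma_r, sigma_lam)

# Check the ODEs (19.20) at one maturity with central differences.
N0, h = 3.0, 1e-5
dB = (vasicek_B(N0 + h, phi) - vasicek_B(N0 - h, phi)) / (2 * h)
dA = (vasicek_A(N0 + h, phi, rbar, sigma_r, sigma_lam)
      - vasicek_A(N0 - h, phi, rbar, sigma_r, sigma_lam)) / (2 * h)
B0 = vasicek_B(N0, phi)
residual_B = dB - (1.0 - phi * B0)
residual_A = dA - (0.5 * B0 ** 2 * sigma_r ** 2 - (phi * rbar - sigma_r * sigma_lam) * B0)
```

Both residuals should be zero up to the error of the numerical derivative.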

19.5.2 Vasicek model by expectation

What if we solve the discount factor forward and take an expectation instead? The Vasicek model is simple enough that we can follow this approach as well, and get the same analytic solution. The same methods work for the other models, but the algebra gets steadily worse.

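The model is

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\,dz \tag{19.22}$$

$$dr = \phi\,(\bar{r} - r)\,dt + \sigma_r\,dz. \tag{19.23}$$

The bond price is

$$P^{(N)}_0 = E_0\left(\frac{\Lambda_N}{\Lambda_0}\right). \tag{19.24}$$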

I use $0$ and $N$ rather than $t$ and $t+N$ to save a little bit on notation.

To find the expectation in (19.24), we have to solve the system (19.22)-(19.23) forward. The steps are simple, though the algebra is a bit daunting. First, we solve $r$ forward. Then, we solve $\Lambda$ forward. $\ln\Lambda_t$ turns out to be conditionally normal, so the expectation in (19.24) is the expectation of a lognormal. Collecting terms in the resulting expectation that depend on $r_0$ as the $B(N)$ term, and the constant term as the $A(N)$ term, we find the same solution as (19.18)-(19.19).

The interest rate is just an AR(1). By analogy with a discrete time AR(1) you can guess that its solution is

$$r_t = \int_{s=0}^{t} e^{-\phi(t-s)}\sigma_r\,dz_s + e^{-\phi t}r_0 + \left(1 - e^{-\phi t}\right)\bar{r}. \tag{19.25}$$

To derive this solution, define $\tilde{r}$ by

$$\tilde{r}_t = e^{\phi t}(r_t - \bar{r}).$$

Then,

$$d\tilde{r}_t = \phi\,\tilde{r}_t\,dt + e^{\phi t}\,dr_t$$

$$d\tilde{r}_t = \phi\,\tilde{r}_t\,dt + e^{\phi t}\phi\,(\bar{r} - r_t)\,dt + e^{\phi t}\sigma_r\,dz_t$$

$$d\tilde{r}_t = \phi\,\tilde{r}_t\,dt - e^{\phi t}\phi\,e^{-\phi t}\tilde{r}_t\,dt + e^{\phi t}\sigma_r\,dz_t$$

$$d\tilde{r}_t = e^{\phi t}\sigma_r\,dz_t.$$

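This equation is easy to solve,

$$\tilde{r}_t - \tilde{r}_0 = \sigma_r\int_{s=0}^{t} e^{\phi s}\,dz_s$$

$$e^{\phi t}(r_t - \bar{r}) - (r_0 - \bar{r}) = \sigma_r\int_{s=0}^{t} e^{\phi s}\,dz_s$$

$$r_t - \bar{r} = e^{-\phi t}(r_0 - \bar{r}) + \sigma_r\int_{s=0}^{t} e^{-\phi(t-s)}\,dz_s.$$

And we have (19.25).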

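Now, we solve the discount factor process forward. It isn’t pretty, but it is straightforward.

$$d\ln\Lambda_t = \frac{d\Lambda}{\Lambda} - \frac{1}{2}\frac{d\Lambda^2}{\Lambda^2} = -\left(r_t + \frac{1}{2}\sigma_\Lambda^2\right)dt - \sigma_\Lambda\,dz_t$$

$$\ln\Lambda_t - \ln\Lambda_0 = -\int_{s=0}^{t}\left(r_s + \frac{1}{2}\sigma_\Lambda^2\right)ds - \sigma_\Lambda\int_{s=0}^{t}dz_s.$$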

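Plugging in the interest rate solution (19.25),

$$\ln\Lambda_t - \ln\Lambda_0 = -\int_{s=0}^{t}\left[\left(\int_{u=0}^{s} e^{-\phi(s-u)}\sigma_r\,dz_u\right) + e^{-\phi s}(r_0 - \bar{r}) + \bar{r} + \frac{1}{2}\sigma_\Lambda^2\right]ds - \sigma_\Lambda\int_{s=0}^{t}dz_s.$$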

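Interchanging the order of the first integral, evaluating the easy $ds$ integrals and rearranging,

$$= -\sigma_\Lambda\int_{s=0}^{t}dz_s - \sigma_r\int_{u=0}^{t}\left[\int_{s=u}^{t} e^{-\phi(s-u)}\,ds\right]dz_u - \left[\left(\bar{r} + \frac{1}{2}\sigma_\Lambda^2\right)t + (r_0 - \bar{r})\int_{s=0}^{t} e^{-\phi s}\,ds\right]$$

$$= -\int_{u=0}^{t}\left[\sigma_\Lambda + \frac{\sigma_r}{\phi}\left(1 - e^{-\phi(t-u)}\right)\right]dz_u - \left(\bar{r} + \frac{1}{2}\sigma_\Lambda^2\right)t - (r_0 - \bar{r})\,\frac{1 - e^{-\phi t}}{\phi}. \tag{19.26}$$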

The first integral has a deterministic function of time $u$. This gives rise to a normally distributed random variable – it’s just a weighted sum of independent normals $dz_u$:

$$\int_{u=0}^{t} f(u)\,dz_u \sim N\left(0, \int_{u=0}^{t} f^2(u)\,du\right).$$

Thus, $\ln\Lambda_t - \ln\Lambda_0$ is normally distributed with mean given by the second set of terms in (19.26) and variance

$$\operatorname{var}_0\left(\ln\Lambda_t - \ln\Lambda_0\right) = \int_{u=0}^{t}\left[\sigma_\Lambda + \frac{\sigma_r}{\phi}\left(1 - e^{-\phi(t-u)}\right)\right]^2 du$$

$$= \int_{u=0}^{t}\left[\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)^2 - 2\frac{\sigma_r}{\phi}\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)e^{-\phi(t-u)} + \frac{\sigma_r^2}{\phi^2}e^{-2\phi(t-u)}\right]du$$

$$= \left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)^2 t - \frac{2\sigma_r}{\phi^2}\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)\left(1 - e^{-\phi t}\right) + \frac{\sigma_r^2}{2\phi^3}\left(1 - e^{-2\phi t}\right). \tag{19.27}$$

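Since we have the distribution of $\Lambda_N$ we are ready to take the expectation.

$$\ln P(N, 0) = \ln E_0\left(e^{\ln\Lambda_N - \ln\Lambda_0}\right) = E_0\left(\ln\Lambda_N - \ln\Lambda_0\right) + \frac{1}{2}\sigma_0^2\left(\ln\Lambda_N - \ln\Lambda_0\right).$$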

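Plugging in the mean from (19.26) and the variance from (19.27)

$$\ln P^{(N)}_0 = -\left[\left(\bar{r} + \frac{1}{2}\sigma_\Lambda^2\right)N + (r_0 - \bar{r})\,\frac{1 - e^{-\phi N}}{\phi}\right] \tag{19.28}$$

$$+ \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2 N - \frac{\sigma_r}{\phi^2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)\left(1 - e^{-\phi N}\right) + \frac{\sigma_r^2}{4\phi^3}\left(1 - e^{-2\phi N}\right). \tag{19.29}$$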

All that remains is to make it pretty. To compare it with our previous result, we want to express it in the form $\ln P(N, r_0) = A(N) - B(N)\,r_0$. The coefficient on $-r_0$ in (19.28) is

$$B(N) = \frac{1 - e^{-\phi N}}{\phi}, \tag{19.30}$$

the same expression we derived from the partial differential equation.

To simplify the constant term, recall that (19.30) implies

$$\frac{1 - e^{-2\phi N}}{\phi} = -\phi B(N)^2 + 2B(N).$$

Thus, the constant term (the terms that do not multiply $r_0$) in (19.28) is

$$A(N) = -\left[\left(\bar{r} + \frac{1}{2}\sigma_\Lambda^2\right)N - \bar{r}\,\frac{1 - e^{-\phi N}}{\phi}\right] + \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2 N - \frac{\sigma_r}{\phi^2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)\left(1 - e^{-\phi N}\right) + \frac{\sigma_r^2}{4\phi^3}\left(1 - e^{-2\phi N}\right)$$

$$A(N) = -\left[\left(\bar{r} + \frac{1}{2}\sigma_\Lambda^2\right)N - \bar{r}\,B(N)\right] + \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2 N - \frac{\sigma_r}{\phi}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)B(N) - \frac{\sigma_r^2}{4\phi^2}\left(\phi B(N)^2 - 2B(N)\right)$$

$$A(N) = \left(\frac{1}{2}\frac{\sigma_r^2}{\phi^2} + \frac{\sigma_\Lambda\sigma_r}{\phi} - \bar{r}\right)\left(N - B(N)\right) - \frac{\sigma_r^2}{4\phi}B(N)^2.$$

Again, this is the same expression we derived from the partial differential equation.
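The agreement between the two routes can also be confirmed numerically. The following minimal sketch evaluates the log bond price once as the lognormal mean plus half the variance, using (19.26)-(19.27), and once as $A(N) - B(N)\,r_0$ from (19.18)-(19.19). Function names and parameter values are illustrative assumptions.

```python
import numpy as np

def log_price_by_expectation(N, r0, phi, rbar, sigma_r, sigma_lam):
    """ln P = E_0[ln(Lambda_N / Lambda_0)] + 0.5 * var_0[ln(Lambda_N / Lambda_0)]."""
    decay = 1.0 - np.exp(-phi * N)
    # Mean: the non-stochastic terms of (19.26), evaluated at t = N.
    mean = -((rbar + 0.5 * sigma_lam ** 2) * N + (r0 - rbar) * decay / phi)
    # Variance: (19.27), evaluated at t = N.
    k = sigma_lam + sigma_r / phi
    var = (k ** 2 * N
           - 2.0 * sigma_r / phi ** 2 * k * decay
           + sigma_r ** 2 / (2.0 * phi ** 3) * (1.0 - np.exp(-2.0 * phi * N)))
    return mean + 0.5 * var

def log_price_by_pde(N, r0, phi, rbar, sigma_r, sigma_lam):
    """ln P = A(N) - B(N) r0 with A and B from (19.18)-(19.19)."""
    B = (1.0 - np.exp(-phi * N)) / phi
    A = ((0.5 * sigma_r ** 2 / phi ** 2 + sigma_r * sigma_lam / phi - rbar) * (N - B)
         - sigma_r ** 2 / (4.0 * phi) * B ** 2)
    return A - B * r0

# Hypothetical parameter values, chosen only for illustration.
params = dict(r0=0.03, phi=0.2, rbar=0.05, sigma_r=0.02, sigma_lam=0.2)
gaps = [log_price_by_expectation(N, **params) - log_price_by_pde(N, **params)
        for N in (0.5, 2.0, 10.0)]
```

Each entry of `gaps` should be zero up to floating point rounding.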

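This integration is usually expressed under the risk-neutral measure. If we write the risk-neutral process

$$\frac{d\Lambda}{\Lambda} = -r\,dt$$

$$dr = \left[\phi\,(\bar{r} - r) - \sigma_r\sigma_\Lambda\right]dt + \sigma_r\,dz,$$

then the bond price is

$$P^{(N)}_0 = E\,e^{-\int_{s=0}^{N} r_s\,ds}.$$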

The result is the same, of course.

19.5.3 Cox Ingersoll Ross Model

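For the Cox-Ingersoll-Ross (1985) model

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\sqrt{r}\,dz$$

$$dr = \phi\,(\bar{r} - r)\,dt + \sigma_r\sqrt{r}\,dz$$

our differential equation (19.14) becomes

$$\frac{\partial P}{\partial r}\phi\,(\bar{r} - r) + \frac{1}{2}\frac{\partial^2 P}{\partial r^2}\sigma_r^2\,r - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\sigma_\Lambda\,r. \tag{19.31}$$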

Guess again that log prices are a linear function of the short rate,

$$P(N, r) = e^{A(N) - B(N)\,r}. \tag{19.32}$$

Substituting the derivatives of (19.32) into (19.31),

$$-B(N)\,\phi\,(\bar{r} - r) + \frac{1}{2}B(N)^2\sigma_r^2\,r - A'(N) + B'(N)\,r - r = -B(N)\,\sigma_r\sigma_\Lambda\,r.$$

Again, the coefficients on the constant and on the terms in $r$ must separately be zero,

$$B'(N) = 1 - \frac{1}{2}\sigma_r^2B(N)^2 - (\sigma_r\sigma_\Lambda + \phi)\,B(N) \tag{19.33}$$

$$A'(N) = -B(N)\,\phi\,\bar{r}.$$

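The ordinary differential equations (19.33) are quite similar to the Vasicek case, (19.20). However, now the variance terms multiply an $r$, so the $B(N)$ differential equation has the extra $B(N)^2$ term. We can still solve both differential equations by simple integration, though the algebra is a little bit more complicated. The result is

$$B(N) = \frac{2\left(e^{\gamma N} - 1\right)}{(\gamma + \phi + \sigma_r\sigma_\Lambda)\left(e^{\gamma N} - 1\right) + 2\gamma}$$

$$A(N) = \frac{\phi\,\bar{r}}{\sigma_r^2}\left(2\ln\left(\frac{2\gamma}{\psi\left(e^{\gamma N} - 1\right) + 2\gamma}\right) + \psi N\right)$$

where

$$\gamma = \sqrt{(\phi + \sigma_r\sigma_\Lambda)^2 + 2\sigma_r^2}$$

$$\psi = \phi + \sigma_\Lambda\sigma_r + \gamma.$$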

The CIR model can also be solved by expectation. In fact, this is how Cox Ingersoll and Ross (1985) actually solve it – their marginal value of wealth $J_W$ is the same thing as the discount factor. However, where the interest rate in the Vasicek model was a simple conditional normal, the interest rate now has a non-central $\chi^2$ distribution, so taking the integral is a little messier.
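The CIR closed forms can be checked against the ordinary differential equations (19.33) by integrating those equations numerically from $A(0) = B(0) = 0$. The following is a minimal sketch using a hand-written classical Runge-Kutta step; the function names and parameter values are illustrative assumptions, not from the text.

```python
import math

def cir_coefficients(N, phi, rbar, sigma_r, sigma_lam):
    """Closed-form A(N), B(N) for the CIR model, so that ln P = A - B r."""
    kappa = phi + sigma_r * sigma_lam
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma_r ** 2)
    psi = kappa + gamma
    growth = math.exp(gamma * N) - 1.0
    denom = psi * growth + 2.0 * gamma
    B = 2.0 * growth / denom
    A = phi * rbar / sigma_r ** 2 * (2.0 * math.log(2.0 * gamma / denom) + psi * N)
    return A, B

def cir_coefficients_ode(N, phi, rbar, sigma_r, sigma_lam, n_steps=2000):
    """Integrate (19.33) from A(0) = B(0) = 0 with classical Runge-Kutta."""
    def rhs(A, B):
        dB = 1.0 - 0.5 * sigma_r ** 2 * B ** 2 - (sigma_r * sigma_lam + phi) * B
        dA = -B * phi * rbar
        return dA, dB
    h = N / n_steps
    A = B = 0.0
    for _ in range(n_steps):
        k1 = rhs(A, B)
        k2 = rhs(A + 0.5 * h * k1[0], B + 0.5 * h * k1[1])
        k3 = rhs(A + 0.5 * h * k2[0], B + 0.5 * h * k2[1])
        k4 = rhs(A + h * k3[0], B + h * k3[1])
        A += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        B += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return A, B

# Hypothetical parameter values, chosen only for illustration.
args = dict(phi=0.2, rbar=0.05, sigma_r=0.08, sigma_lam=0.5)
A_closed, B_closed = cir_coefficients(5.0, **args)
A_ode, B_ode = cir_coefficients_ode(5.0, **args)
```

The same numerical integration works for the affine class, where closed forms are generally not available.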

19.5.4 Multifactor affine models

The Vasicek and CIR models are special cases of the affine class of term structure models (Duffie and Kan 1996, Dai and Singleton 1999). These models allow multiple factors, meaning that bond yields are not all just functions of the short rate. Affine models maintain the convenient form that log bond prices are linear functions of the state variables. This means that we can take $K$ bond yields themselves as the state variables, and the yields will reveal anything of interest in the hidden state variables. The short rate and its volatility will be forecast by lagged short rates but also by lagged long rates or interest rate spreads. My presentation and notation are similar to Dai and Singleton’s, but as usual I add the discount factor explicitly.


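Here is the affine model setup:

$$dy = \phi(\bar y - y)\,dt + \Sigma\,dw \tag{19.34}$$

$$r = \delta_0 + \delta' y \tag{19.35}$$

$$\frac{d\Lambda}{\Lambda} = -r\,dt - b_\Lambda'\,dw \tag{19.36}$$

$$dw_i = \sqrt{\alpha_i + \beta_i' y}\;dz_i, \qquad E(dz_i\,dz_j) = 0 \text{ for } i \neq j. \tag{19.37}$$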

Equation (19.34) describes the evolution of the state variables. In the end, yields will be linear functions of the state variables, so we can take the state variables to be yields; thus we use the letter $y$. $y$ denotes a $K$-dimensional vector of state variables. $\phi$ is now a $K \times K$ matrix, $\bar y$ is a $K$-dimensional vector, and $\Sigma$ is a $K \times K$ matrix. Equation (19.35) describes the mean of the discount factor or short rate as a linear function of the state variables. Equation (19.36) is the discount factor. $b_\Lambda$ is a $K$-dimensional vector that describes how the discount factor responds to the $K$ shocks. The more $\Lambda$ responds to a shock, the higher the market price of risk of that shock. Equation (19.37) describes the shocks $dw$. The functional form nests the CIR square root type models if $\alpha_i = 0$ and the Vasicek type Gaussian process if $\beta_i = 0$. You can’t pick $\alpha_i$ and $\beta_i$ arbitrarily, as you have to make sure that $\alpha_i + \beta_i' y > 0$ for all values of $y$ that the process can attain. Dai and Singleton characterize this “admissibility” criterion.
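
To make the nesting concrete, here is a small Python sketch that evaluates the shock variances $\alpha_i + \beta_i' y$ from (19.37) at a given state. It only checks positivity at a single point, which is much weaker than the admissibility criterion for all attainable $y$. The function names and the two-factor example are hypothetical.

```python
def shock_variances(alpha, beta, y):
    """Instantaneous variances alpha_i + beta_i'y of the shocks dw_i in (19.37)."""
    return [a + sum(b_k * y_k for b_k, y_k in zip(b, y))
            for a, b in zip(alpha, beta)]

def admissible_at(alpha, beta, y):
    """True if every shock variance is strictly positive at the state y."""
    return all(v > 0.0 for v in shock_variances(alpha, beta, y))

# Hypothetical two-factor example:
# factor 1 is Vasicek-type (beta_1 = 0, constant variance alpha_1),
# factor 2 is CIR-type (alpha_2 = 0, variance proportional to y_2).
alpha = [0.0001, 0.0]
beta = [[0.0, 0.0], [0.0, 1.0]]

print(shock_variances(alpha, beta, [0.02, 0.04]))
print(admissible_at(alpha, beta, [0.02, 0.04]))
print(admissible_at(alpha, beta, [0.02, -0.01]))
```

The last call shows why the parameters cannot be chosen freely: if the second state variable could become negative, the square root in (19.37) would not be defined.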

We find bond prices in the affine setup following exactly the same steps as for the Vasicek and CIR models. Again, we guess that log prices are linear functions of the state variables $y$,

$$P(N, y) = e^{A(N) - B(N)'y}.$$

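We apply Ito’s lemma to this guess, and substitute in the basic bond pricing equation (19.13). We obtain ordinary differential equations that $A(N)$ and $B(N)$ must satisfy,

$$\frac{\partial B(N)}{\partial N} = -\phi' B(N) - \sum_i \left([\Sigma' B(N)]_i\, b_{\Lambda i} + \frac{1}{2}[\Sigma' B(N)]_i^2\right)\beta_i + \delta \tag{19.38}$$

$$\frac{\partial A(N)}{\partial N} = \sum_i \left([\Sigma' B(N)]_i\, b_{\Lambda i} + \frac{1}{2}[\Sigma' B(N)]_i^2\right)\alpha_i - B(N)'\phi\bar y - \delta_0. \tag{19.39}$$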

I use the notation $[x]_i$ to denote the $i$th element of a vector $x$. As with the CIR and Vasicek models, these are ordinary differential equations that can be solved by integration starting with $A(0) = 0$, $B(0) = 0$. While they do not always have analytical solutions, they are quick to solve numerically, much quicker than solving a partial differential equation.
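
A minimal sketch of such a numerical solution follows, in plain Python with a classical fourth-order Runge-Kutta step. It integrates (19.38) and (19.39), with the market price of risk term read as $[\Sigma' B(N)]_i\, b_{\Lambda i}$, from $A(0) = 0$, $B(0) = 0$. The function names, the step count, and the one-factor calibration are assumptions for illustration. With $K = 1$, $\alpha = 0$, $\beta = 1$, $\delta_0 = 0$, $\delta = 1$, the equations reduce to the CIR equations (19.33), which gives a convenient check.

```python
import math

def affine_rhs(B, phi, Sigma, b_lam, alpha, beta, delta0, delta, ybar):
    """Right-hand sides of (19.38) and (19.39), returned as (dB/dN, dA/dN)."""
    K = len(B)
    # s_i = [Sigma' B]_i
    s = [sum(Sigma[k][i] * B[k] for k in range(K)) for i in range(K)]
    # c_i = [Sigma' B]_i * b_lam_i + 0.5 * [Sigma' B]_i ** 2
    c = [s[i] * b_lam[i] + 0.5 * s[i] ** 2 for i in range(K)]
    dB = [-sum(phi[k][j] * B[k] for k in range(K))      # -(phi' B)_j
          - sum(c[i] * beta[i][j] for i in range(K))    # -(sum_i c_i beta_i)_j
          + delta[j]
          for j in range(K)]
    phi_ybar = [sum(phi[k][j] * ybar[j] for j in range(K)) for k in range(K)]
    dA = (sum(c[i] * alpha[i] for i in range(K))
          - sum(B[k] * phi_ybar[k] for k in range(K))   # B' phi ybar
          - delta0)
    return dB, dA

def solve_affine(N, steps, **model):
    """Integrate the ODEs from A(0) = 0, B(0) = 0 with classical Runge-Kutta."""
    K = len(model["delta"])
    h = N / steps
    A, B = 0.0, [0.0] * K
    for _ in range(steps):
        k1B, k1A = affine_rhs(B, **model)
        k2B, k2A = affine_rhs([B[j] + 0.5 * h * k1B[j] for j in range(K)], **model)
        k3B, k3A = affine_rhs([B[j] + 0.5 * h * k2B[j] for j in range(K)], **model)
        k4B, k4A = affine_rhs([B[j] + h * k3B[j] for j in range(K)], **model)
        B = [B[j] + h / 6.0 * (k1B[j] + 2.0 * k2B[j] + 2.0 * k3B[j] + k4B[j])
             for j in range(K)]
        A += h / 6.0 * (k1A + 2.0 * k2A + 2.0 * k3A + k4A)
    return A, B

def bond_price(N, y, steps=200, **model):
    """P(N, y) = exp(A(N) - B(N)'y)."""
    A, B = solve_affine(N, steps, **model)
    return math.exp(A - sum(Bj * yj for Bj, yj in zip(B, y)))

def bond_yield(N, y, steps=200, **model):
    """Continuously compounded yield, -ln P(N, y) / N."""
    return -math.log(bond_price(N, y, steps, **model)) / N

# Hypothetical one-factor (CIR-type) calibration, for illustration only.
cir = dict(phi=[[0.2]], Sigma=[[0.08]], b_lam=[-0.5], alpha=[0.0],
           beta=[[1.0]], delta0=0.0, delta=[1.0], ybar=[0.06])
A5, B5 = solve_affine(5.0, 500, **cir)
print(A5, B5[0], bond_yield(5.0, [0.05], **cir))
```

The same functions accept larger $K$ by passing $K \times K$ lists for `phi` and `Sigma` and a list of $K$ vectors for `beta`.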

Derivation

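To derive (19.39) and (19.38), we start with the basic bond pricing equation (19.13), which I repeat here,

$$E_t\left(\frac{dP}{P}\right) - \left(\frac{1}{P}\frac{\partial P}{\partial N} + r\right)dt = -E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right). \tag{19.40}$$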


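We need $dP/P$,

$$\frac{dP}{P} = \frac{1}{P}\left(\frac{\partial P}{\partial y}\right)'dy + \frac{1}{2}\frac{1}{P}\,dy'\,\frac{\partial^2 P}{\partial y\,\partial y'}\,dy.$$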

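The derivatives are

$$\frac{1}{P}\frac{\partial P}{\partial y} = -B(N)$$

$$\frac{1}{P}\frac{\partial^2 P}{\partial y\,\partial y'} = B(N)B(N)'$$

$$\frac{1}{P}\frac{\partial P}{\partial N} = \frac{\partial A(N)}{\partial N} - \left(\frac{\partial B(N)}{\partial N}\right)'y.$$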

Thus, the first term of (19.40) is

$$E_t\left(\frac{dP}{P}\right) = -B(N)'\phi(\bar y - y)\,dt + \frac{1}{2}E_t\left(dw'\,\Sigma' B(N)B(N)'\Sigma\,dw\right).$$

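$E_t(dw_i\,dw_j) = 0$ for $i \neq j$, which allows us to simplify the last term. If $w_1 w_2 = 0$, then,

$$(w'bb'w) = \begin{bmatrix} w_1 & w_2 \end{bmatrix}\begin{bmatrix} b_1b_1 & b_1b_2 \\ b_2b_1 & b_2b_2 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = b_1^2w_1^2 + b_2^2w_2^2 = \sum_i b_i^2w_i^2.$$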

Applying the same algebra to our case,

$$E_t\left(dw'\,\Sigma' B(N)B(N)'\Sigma\,dw\right) = \sum_i [\Sigma' B(N)]_i^2\,dw_i^2 = \sum_i [\Sigma' B(N)]_i^2\left(\alpha_i + \beta_i' y\right)dt.$$

I use the notation $[x]_i$ to denote the $i$th element of the $K$-dimensional vector $x$. In sum, we have

$$E_t\left(\frac{dP}{P}\right) = -B(N)'\phi(\bar y - y)\,dt + \frac{1}{2}\sum_i [\Sigma' B(N)]_i^2\left(\alpha_i + \beta_i' y\right)dt. \tag{19.41}$$

The right hand side term in (19.40) is

$$-E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right) = -B(N)'\Sigma\,dw\,dw'\,b_\Lambda.$$

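$dw\,dw'$ is a diagonal matrix with elements $\left(\alpha_i + \beta_i' y\right)dt$. Thus,

$$-E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right) = -\sum_i [\Sigma' B(N)]_i\, b_{\Lambda i}\left(\alpha_i + \beta_i' y\right)dt. \tag{19.42}$$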


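Now, substituting (19.41) and (19.42) in (19.40), along with the easier $\partial P/\partial N$ central term, and canceling $dt$, we get

$$-B(N)'\phi(\bar y - y) + \frac{1}{2}\sum_i [\Sigma' B(N)]_i^2\left(\alpha_i + \beta_i' y\right) - \left(\frac{\partial A(N)}{\partial N} - \left(\frac{\partial B(N)}{\partial N}\right)'y + \delta_0 + \delta' y\right) = -\sum_i [\Sigma' B(N)]_i\, b_{\Lambda i}\left(\alpha_i + \beta_i' y\right).$$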

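Once again, the terms on the constant and each $y_i$ must separately be zero. The constant term:

$$-B(N)'\phi\bar y + \frac{1}{2}\sum_i [\Sigma' B(N)]_i^2\,\alpha_i - \frac{\partial A(N)}{\partial N} - \delta_0 = -\sum_i [\Sigma' B(N)]_i\, b_{\Lambda i}\,\alpha_i$$

$$\frac{\partial A(N)}{\partial N} = \sum_i \left([\Sigma' B(N)]_i\, b_{\Lambda i} + \frac{1}{2}[\Sigma' B(N)]_i^2\right)\alpha_i - B(N)'\phi\bar y - \delta_0.$$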

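The terms multiplying $y$:

$$B(N)'\phi y + \frac{1}{2}\sum_i [\Sigma' B(N)]_i^2\,\beta_i' y + \left(\frac{\partial B(N)}{\partial N}\right)'y - \delta' y = -\sum_i [\Sigma' B(N)]_i\, b_{\Lambda i}\,\beta_i' y.$$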

Taking the transpose and solving,

$$\frac{\partial B(N)}{\partial N} = -\phi' B(N) - \sum_i \left([\Sigma' B(N)]_i\, b_{\Lambda i} + \frac{1}{2}[\Sigma' B(N)]_i^2\right)\beta_i + \delta.$$

19.6 Bibliography

Campbell, Lo, and MacKinlay show how to do term structure models in discrete time. Constantinides gives a nice discrete-time model.

So far most of the term structure literature has emphasized the risk-neutral probabilities, rarely making any reference to the separation between drifts and market prices of risk. This was not a serious shortcoming for option pricing uses, for which modeling the volatilities is much more important than modeling the drifts, or for drawing smooth yield curves across maturities. However, it makes the models unsuitable for bond portfolio analysis and other uses. Many models specify very high and time-varying market prices of risk or conditional Sharpe ratios. Recently, Duffee (1999) and Duarte (2000) have started the important task of specifying term structure models that fit the empirical facts about expected returns, and in particular the Fama-Bliss (1986) and Campbell and Shiller (199x) regressions that relate expected returns to the slope of the term structure (see Chapter ??), while maintaining the tractability of affine models.

Term structure models used in finance amount to regressions of interest rates on lagged interest rates. Macroeconomists also run regressions of interest rates on a wide variety of variables, including lagged interest rates, but also lagged inflation, output, unemployment, exchange rates, and so forth. They often interpret these equations as the Federal Reserve’s policy-making rule for setting short rates as a function of macroeconomic conditions. This is particularly clear in the monetary VAR literature, for example Christiano, Eichenbaum, and Evans (199x), Cochrane (199x). Someone, it would seem, is missing important right hand variables.

The criticism of finance models is stinging when we only use the short rate as a state variable. Multifactor models are more subtle. If any variable forecasts future interest rates, then it becomes a state variable, and it should be reflected in bond yields. In the affine models, bond yields will be a linear function of this state variable, and thus perfectly reveal it. Thus, bond yields should completely drive out any other macroeconomic state variables! They don’t, which is an interesting observation. Not all of the discrepancy can be chalked up to nonlinearities.

In addition, there is an extensive literature that studies yields from a purely statistical point of view, Gallant and Tauchen (199x) for example, and a literature that studies high frequency behavior in the federal funds market, for example Hamilton (199x). (Overnight interest rates behave unusually, so term structure models often use the 1 month or 3 month rate as the “short rate.”) Obviously, these three literatures need to become integrated. Piazzesi (2000) recently took an important step in this direction. She integrated a careful specification of high-frequency moves in the federal funds rate due to Fed policy, following Hamilton’s (19xx) careful modeling of federal funds, into a term structure model.

The models studied here are all based on diffusion models with rather slow-moving state variables. These models generate one-day ahead densities that are almost exactly normal. In fact, as Johannes (2000) points out, actual one-day ahead densities have much fatter tails than normal distributions predict. This behavior could be modeled by fast-moving state variables. However, it is more natural to think of this behavior as generated by a jump process, and Johannes nicely fits a combined jump diffusion for yields. This specification changes pricing and hedging characteristics of term structure models significantly.

All of the term structure models in this chapter describe many bond yields as a function of a few state variables. This is a reasonable approximation to the data; almost all of the variance of yields can be described in terms of a few factors, typically a “level,” “slope,” and “hump” factor, and the information in the yield curve about future changes in yields (the conditional mean or drift and conditional variance or diffusion) can usually be well summarized by one level and a few spreads. However, it is an approximation. Actual bonds trade a few tenths of a percent away from the predictions of any smooth yield curve, and the covariance matrix of actual bond yields does not have an exact $K$ factor structure. Hence you cannot apply maximum likelihood; you either have to estimate the models by GMM, forcing the estimate to ignore the stochastic singularity, or you have to add distasteful measurement errors.

As always, the importance of an approximation depends on how you use the model. If you take the model literally, a bond slightly off is an arbitrage opportunity. In fact, it is at best a good Sharpe ratio, but a $K$ factor model will not tell you how good. A hedging strategy for an option may fall apart if you cannot in fact buy a bond at exactly the price predicted by the model.

An interesting response to this problem is models with potentially infinite numbers of state variables (Kennedy 1994, Santa Clara 19xx, Kimmel 2000). If you have as many state variables as bonds, Longstaff (2000) shows that at last the much maligned expectations hypothesis can be consistent with lack of arbitrage, as no combination of long term bonds can exactly synthesize the risk free rate.

19.7 Problems

1. Complete the proof that each of the three statements of the expectations hypothesis implies the others. Is this also true if we add a constant risk premium? Are the risk premia of the same sign?

2. Under the expectations hypothesis, if long-term yields are higher than short term yields, does this mean that future long term rates should go up, down, or stay the same? (Hint: a plot of the expected log bond prices over time will really help here.)

3. Are long-term nominal bonds or rolling over short term nominal bonds a safer long-term real investment? (Hint: It can go either way, depending on whether interest rate variation is due to changing inflation at constant real rates or changing real rates at constant inflation.)

4. Start by assuming risk neutrality, $E\left(HPR_{t+1}^{(N)}\right) = Y_t^{(1)}$ for all maturities $N$. Try to derive the other representations of the expectations hypothesis. Now you see why we specify that the expected log returns are equal.

5. Look at (19.11) and show that adding an orthogonal $dw$ to the discount factor has no effect on bond pricing formulas.

6. Look at (19.11) and show that $P = e^{-rT}$ if interest rates are constant, i.e. if $d\Lambda/\Lambda = -r\,dt + \sigma_\Lambda\,dz$.

7. If the interest rate and $\sigma_\Lambda$ are constants, the discount factor is lognormal so it’s easy to take the expectation, and we get the same result,

$$P_0^{(N)} = E_0\left(e^{-rT - \frac{1}{2}\sigma_\Lambda^2 T + \sigma_\Lambda(z_T - z_0)}\right) = e^{-rT}.$$
