

Introduction

Contents of the course, logic, sets, functions

SB: A1, 13.5

General info

Lecturer: Mitri Kitti ([email protected])

Web-page: www.mitrikitti.fi/mathcamp.html

Material

handouts by Juuso Välimäki on the course webpage
Book: Simon and Blume, Mathematics for Economists; relevant chapters are indicated in the slides (SB: XX)
exercises and slides can be found on the course website

Contents

Basics: logic, sets, functions

Elementary analysis

topology: metrics, open and closed sets
normed vector spaces
continuity of functions, Weierstrass theorem

Linear algebra

linear functions
rank of a matrix, determinant
eigenvalues and eigenvectors
application: linear dynamic systems

Multivariate calculus

differential and partial derivatives
Taylor series, linearizations and log-linearizations
implicit function theorem
application: stability of nonlinear dynamic systems

Contents and schedule

Unconstrained optimization

first and second order optimality conditions
envelope theorem

Convex analysis

convex sets and concave functions, quasiconcavity
concave and quasiconcave optimization

Nonlinear programming

first order necessary conditions
comparative statics
sufficient optimality conditions

Contents and schedule

Introduction to Programming

Programming with Matlab

basics, linear algebra
conditional statements and loops
writing functions

Basics on Dynare

Logic

A proposition is a statement which can be declared true or false without ambiguity

A propositional (or sentential) function is a statement which becomes a proposition when we replace the variable with a specified value

“2 + 2 = 4” is a proposition while “x < 2” is a propositional function

Propositional functions lead to propositions when the variables are quantified

“for all x” (in the domain of discourse) is denoted as ∀x
“there is an x” is denoted as ∃x; if the x is unique we write ∃!x

Logic

Connectives (for constructing new propositions)

and (∧), or (∨), negation (¬)
Implication “⇒”

we write A ⇒ B if proposition B is true whenever proposition A is true; we also say that A is a sufficient condition for B and B is a necessary condition for A
if A ⇒ B and B ⇒ A we write A ⇔ B

Note: ¬(∀x, P(x)) ⇔ ∃x, ¬P(x)
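The negation rule for quantifiers can be checked on a finite domain of discourse. The sketch below is only an illustration: the domain and the propositional function P(x) = "x < 2" are assumptions chosen for the example, and a finite check is not a proof.

```python
# Hypothetical finite domain of discourse and propositional function P(x): "x < 2".
domain = range(-3, 4)


def P(x):
    return x < 2


# Left side: not (for all x, P(x)).
lhs = not all(P(x) for x in domain)
# Right side: there exists x such that not P(x).
rhs = any(not P(x) for x in domain)

print(lhs, rhs)
```

Both sides agree because the domain contains elements (2 and 3) for which P fails.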

Methods of proof

Direct proof

showing that conclusion (C ) follows from hypothesis (H )

Proof by contrapositive

to show H ⇒ C it is shown that ¬C ⇒ ¬H

Proof by contradiction

assume ¬(H ⇒ C) and show that ¬(H ⇒ C) ∧ H is a contradiction (note that ¬(H ⇒ C) is equivalent to H ∧ ¬C)

Proof by construction

Proof by decomposition


Sets

Naive set theory: set is a collection of objects

Membership relation x ∈ A (element x belongs to set A)

set inclusion B ⊆ A if x ∈ B ⇒ x ∈ A; proper inclusion B ⊂ A: B ⊆ A and B ≠ A

The usual binary operations

union A ∪ B

intersection A ∩ B

complement A^c; the set difference of B and A (complement of A relative to B) is B \ A (elements of B that are not elements of A)
Cartesian product A × B

De Morgan laws: (A ∩ B)^c = A^c ∪ B^c and (A ∪ B)^c = A^c ∩ B^c
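The De Morgan laws can be tried out with Python's built-in sets. This is a small sketch, assuming a hypothetical finite universe U so that complements are well defined; the particular sets A and B are arbitrary.

```python
# Hypothetical finite universe U and two subsets A and B.
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}


def complement(S):
    """Complement of S relative to the universe U."""
    return U - S


# (A ∩ B)^c = A^c ∪ B^c
law1 = complement(A & B) == complement(A) | complement(B)
# (A ∪ B)^c = A^c ∩ B^c
law2 = complement(A | B) == complement(A) & complement(B)

print(law1, law2)
print(B - A)  # set difference B \ A
```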

Sets

Examples

real numbers R, integers Z, positive integers N, rational numbers Q
real coordinate sets R^n: n-fold Cartesian product of R

Power set P(A) or 2^A: the set of all subsets of A

Arbitrary unions and intersections, index set I

union over I: ⋃_{i∈I} A_i

intersection over I: ⋂_{i∈I} A_i

Real coordinate spaces

The sets R^n, n ≥ 1, are called real coordinate spaces

we write an element x = (x_1, ..., x_n) ∈ R^n with x_i ∈ R, i = 1, ..., n, also as a column array (vector) $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$

Inner product of two elements of R^n: $x \cdot y = x^T y = \sum_i x_i y_i$ (here x^T denotes the transpose of x)

x is orthogonal to y if x · y = 0
note: $\cos(\angle(x, y)) = x \cdot y / (\|x\| \|y\|)$, where $\|x\| = \sqrt{x \cdot x}$

Least upper bound of a set

an element u ∈ R is a least upper bound of S if x ≤ u for all x ∈ S and for any v ∈ R such that x ≤ v for all x ∈ S it holds that u ≤ v; we denote u = sup S

Real coordinate spaces

Infima and suprema

the greatest lower bound inf S is defined analogously to sup S
axiom: any ∅ ≠ S ⊂ R with an upper bound has a least upper bound

Notations

zero vector 0 = (0, ..., 0)
positive vectors: x ≥ 0 ⇔ x_i ≥ 0 ∀i = 1, ..., n
strictly positive vectors: x ≫ 0 ⇔ x_i > 0 ∀i = 1, ..., n
positive and strictly positive orthants: R^n_+ = {x ∈ R^n : x ≥ 0} and R^n_{++} = {x ∈ R^n : x ≫ 0}

Functions

Function f : X ↦ Y is a rule that assigns an element f(x) ∈ Y to each element x ∈ X

formally f is a subset of X × Y such that ∀x ∈ X ∃y ∈ Y : (x, y) ∈ f, and (x, y) ∈ f ⇔ ∀z ≠ y : (x, z) ∉ f

graph of a function is {(x, f(x)) ∈ X × Y : x ∈ X}
note: for each x ∈ X there is a unique y ∈ Y; if there were several y's corresponding to x ∈ X, f would be a correspondence (set-valued mapping), i.e., f : X ↦ P(Y)
the terms map and mapping are often used as synonyms of function

Functions: domain and range

X is the domain of the function and Y is the range

note: in these notes the domain is often assumed to be R^n, although we could in most cases assume the domain to be an (open) subset of R^n

For A ⊂ X the image of A is f(A) = {f(x) : x ∈ A}

Pre-image (or inverse image) of B ⊆ Y is f^{-1}(B) = {x ∈ X : ∃y ∈ B, f(x) = y}

If f(X) = Y then f is a surjection (onto)

If for each y ∈ Y, f^{-1}(y) consists of at most one element of X, then f is an injective (one-to-one) mapping of X to Y

Bijections

If f is both surjective and injective, then it is a bijection

f is a bijection if and only if f^{-1} is a function

Examples:

identity function id(x) = x is a bijection
the exponential function is not a bijection between R and itself, but it is a bijection between R and R_{++}

If X and Y are finite sets, then there is a bijection between X and Y if and only if the number of elements in the two sets is the same

this can be taken as the definition of having the same number of elements (cardinality of sets)

Economic models

Purposes of economic modeling

explaining how economies work and how economic agents interact
forming an abstraction of observed data
means of selection of data
other uses: forecasting, policy analysis

Simplification in modeling

choice of variables and relationships relevant for the analysis

Endogenous and exogenous variables

when an economic model involves functions in which some of the variables are not affected by the choice of economic agents, these variables are called exogenous variables
exogenous variables are fixed parameters in the model
respectively, endogenous variables are those that are affected by economic agents; they are determined by the model


Economic models

Example: model of a consumer

elements: choice of a consumption bundle x ∈ R^n_+ over the budget set {x ∈ R^n : p · x ≤ I, x ≥ 0}, where p ≫ 0 and I > 0
utility function u : R^n_+ ↦ R (more generally a preference relation)
variables p and I are exogenous and x is endogenous


Elements of Analysis

Some topology, continuity, Weierstrass theorem
SB: 10–12, 29, 30.1

Metric spaces

Definition

A set X is said to be a metric space if with any two points p and q of X there is associated a real number d(p, q), called the distance from p to q, such that:

(i) d(p, q) > 0 if p ≠ q; d(p, p) = 0,

(ii) d(p, q) = d(q, p),

(iii) d(p, q) ≤ d(p, r) + d(r, q), for any r ∈ X.

any function with these three properties is called a distance function or a metric

Examples of metrics

Example 1: any set X can be made into a metric space by defining d(x, x) = 0 and d(x, y) = 1 when x ≠ y

Example 2: X = R^n, $d(p, q) = \sqrt{\sum_{k=1}^{n} |p_k - q_k|^2}$, p = (p_1, ..., p_n), q = (q_1, ..., q_n)

Example 3: X = the set of real-valued bounded functions defined on a set S, d(f, g) = sup_{x∈S} |f(x) − g(x)|.

Example 4: X = the set of bounded infinite sequences of real numbers (bounded so that the supremum is finite), d(p, q) = sup_k |p_k − q_k|, p = (p_1, p_2, ...), q = (q_1, q_2, ...)
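The metric axioms (i)–(iii) can be spot-checked numerically for Examples 1 and 2. The following is a minimal sketch: the helper name `is_metric_on` and the sample points are assumptions made for illustration, and checking finitely many points only tests the axioms, it does not prove them.

```python
import itertools
import math


def euclidean(p, q):
    """Example 2: Euclidean distance in R^n."""
    return math.sqrt(sum(abs(pk - qk) ** 2 for pk, qk in zip(p, q)))


def discrete(p, q):
    """Example 1: distance 0 for equal points and 1 otherwise."""
    return 0 if p == q else 1


def is_metric_on(d, points, tol=1e-12):
    """Check axioms (i)-(iii) on every triple of the given sample points."""
    for p, q, r in itertools.product(points, repeat=3):
        if p != q and not d(p, q) > 0:
            return False
        if d(p, p) != 0:
            return False
        if d(p, q) != d(q, p):
            return False
        # Small tolerance guards against floating-point rounding.
        if d(p, q) > d(p, r) + d(r, q) + tol:
            return False
    return True


# Hypothetical sample points in R^2.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0), (-1.5, 2.5)]
print(is_metric_on(euclidean, points), is_metric_on(discrete, points))
```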

Examples of metrics

Why would an economist be interested in the metric spaces in examples 3 and 4?

In some settings the choice variable of a decision maker can be a function or an infinite sequence

Application for example 3: S = [0, 1] with q ∈ S representing the quality of a product, function f(q) = price of a product of quality q ∈ S

Application for example 4: infinitely many periods, in each period there is a price for a product, X = sequences of prices

Vector spaces

Definition

A set X is a vector space if the sum of two vectors and the product of a scalar and a vector can be defined

the sum “+” should satisfy the usual properties of a sum: associativity ((x + y) + z = x + (y + z)), commutativity (x + y = y + x), and there are identity (zero vector) and inverse (negative of a vector) elements

the scalar product satisfies distributivity (a(x + y) = ax + ay, (a + b)x = ax + bx) and compatibility (a(bx) = (ab)x) conditions and there is an identity element (1x = x)

Example 1: real coordinate space R^n

Example 2: X = functions on S, with sum and scalar product defined pointwise

Vector spaces

Definition

Normed vector space X , norm ‖ · ‖ (function that measures thelength)

‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0

‖ax‖ = |a|‖x‖ for all scalars a

‖x + y‖ ≤ ‖x‖+ ‖y‖ (triangle inequality)

Example: X = R^n with $\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2}$ (Euclidean norm),

note d(x, y) = ‖x − y‖ is a distance

Open sets

X metric space, r-neighborhood of x: N_r(x) = {y ∈ X : d(x, y) < r}

Interior point: x is an interior point of S ⊆ X if there is r > 0 such that N_r(x) ⊆ S

we denote the interior of a set S as int(S)

Definition

S ⊆ X is an open set if every point of S is an interior point, i.e., S = int(S)

Open sets

1. X is open as well as ∅

2. any union of open sets is open

3. intersection of finitely many open sets is open

⋆ these three properties are the definition of a topology!

Open sets in R^n

Norm equivalence theorem: all norms in R^n define the same topology (the same collection of open sets)
example 1: strictly positive orthant R^n_{++}
example 2: open half spaces {x ∈ R^n : p · x < a}


Closed sets

Definition

A set S is closed if S^c (complement) is open

Equivalent definition
closure of S: x ∈ cl(S) if for all ε > 0 we have N_ε(x) ∩ S ≠ ∅
S is closed if (and only if) S = cl(S)

Examples

positive orthant R^n_+

hyperplanes {x ∈ R^n : p · x = a} and closed half-spaces {x ∈ R^n : p · x ≤ a} (or ≥ a)

level curves of a continuous function: {x ∈ R^n : f(x) = α}

(lower and upper) level sets of a continuous function: {x ∈ R^n : f(x) ≤ α} (or ≥ α)

Boundary of a set S is denoted ∂S; x ∈ ∂S if N_ε(x) ∩ S ≠ ∅ and N_ε(x) ∩ S^c ≠ ∅ for all ε > 0

remark: int(S) = S \ ∂S

Sequences and convergence

A sequence x^1, x^2, ..., where x^k ∈ X, k = 1, 2, ... (X is a metric space), is said to converge to x ∈ X if for every ε > 0 there is an integer N such that k ≥ N implies d(x^k, x) < ε

notation for a sequence: {x^k}
formally a sequence in S is a function on the set N = {1, 2, ...}; if X : N ↦ S is a sequence then X(n) = x^n

if {x^k} converges to x we write x^k → x (as k → ∞) or lim_{k→∞} x^k = x

x is called the limit point of the sequence

A set S is closed if and only if {x^k} ⊂ S, x^k → x implies that x ∈ S

Compact sets

Definition

A subset S of a metric space X is said to be compact if every sequence {x^k} ⊂ S has a convergent subsequence with a limit point in S

given an infinite sequence {k_j} of integers with k_1 < k_2 < k_3 < ... we say that {x^{k_j}} is a subsequence of {x^k}

compactness can be defined for topological spaces (without a metric): S is compact if for an arbitrary collection of open sets {U_α}_{α∈A}, U_α ⊆ X, such that S ⊆ ⋃_{α∈A} U_α there is a finite set J ⊆ A such that S ⊆ ⋃_{α∈J} U_α

Compact sets in Rn

Theorem

Every compact set of a normed vector space is bounded and closed

S ⊂ X is bounded if there is M > 0 such that ‖x‖ ≤ M for all x ∈ S

⋆ For R^n the converse of the previous theorem is also true, i.e., S ⊂ R^n is compact if and only if it is bounded and closed

⋆ this is equivalent to the Bolzano-Weierstrass theorem: every bounded sequence in R^n has a convergent subsequence

Continuous functions

Definition

Let X and Y be metric spaces and f : X ↦ Y; f is continuous at p ∈ X if for every ε > 0 there is a δ > 0 such that d_X(x, p) < δ ⟹ d_Y(f(x), f(p)) < ε

If f is continuous at every x ∈ X then f is continuous on X

Equivalent definitions of continuity

f^{-1}(V) is open (closed) for every open (closed) set V ⊆ Y

lim_{k→∞} f(x^k) = f(p) for any sequence {x^k} such that p = lim_{k→∞} x^k

Lipschitz continuity

X and Y metric spaces, f : X ↦ Y is Lipschitz continuous if there is L ≥ 0 (Lipschitz constant) such that d_Y(f(x^1), f(x^2)) ≤ L d_X(x^1, x^2) for all x^1, x^2 ∈ X

If L ∈ (0, 1) the function is called a contraction

f is locally Lipschitz on X if any x ∈ X has an open neighborhood in which f is Lipschitz

Weierstrass theorem

Theorem

A continuous function over a compact set attains its extrema (minimum and maximum)

S compact, f continuous on S, then there is x* ∈ S such that f(x*) = inf_{x∈S} f(x) (and the same for the supremum)


Linear algebra

Matrices, determinant, inversions
SB: 6–9, 26, 27

Introduction

Some economic models have a linear structure

Nonlinear models can be approximated by linear models

System of linear equations, general form

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\quad\Leftrightarrow\quad Ax = b
$$

a_ij, b_i, i = 1, ..., m, j = 1, ..., n, are parameters (exogenous variables) and x_1, ..., x_n are (endogenous) variables

Example: two products

Demand $Q_i^d = K_i P_1^{\alpha_{i1}} P_2^{\alpha_{i2}} Y^{\beta_i}$, i = 1, 2

P_i price of product i, Y income
α_ij elasticity of demand of product i with respect to price j

β_i income elasticity of product i

Supply $Q_i^s = M_i P_i^{\gamma_i}$, i = 1, 2

γ_i is the price elasticity of supply

Equilibrium $Q_i^d = Q_i^s$, i = 1, 2

Notations $q_i^d = \ln Q_i^d$, $q_i^s = \ln Q_i^s$, $p_i = \ln P_i$, $y = \ln Y$, $m_i = \ln M_i$, $k_i = \ln K_i$

Example: two products

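By taking logarithms of supply, demand, and the equilibrium condition we get a linear system

$$
\begin{aligned}
q_i^d &= k_i + \alpha_{ii} p_i + \alpha_{ij} p_j + \beta_i y, \\
q_i^s &= m_i + \gamma_i p_i, \\
q_i^s &= q_i^d.
\end{aligned}
$$

after some algebraic manipulation this results in a system of the form

$$
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
$$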

Vector spaces

Definition

W ⊆ V is a subspace of V (vector space) if (i) v, w ∈ W ⇒ v + w ∈ W and (ii) αv ∈ W for all scalars α and v ∈ W

Let V be a vector space, v^1, ..., v^n ∈ V, and α_1, ..., α_n scalars. The expression α_1 v^1 + ··· + α_n v^n is called a linear combination of v^1, ..., v^n

the set of all linear combinations of v^1, ..., v^n is denoted by span(v^1, ..., v^n); this set is a subspace of V

Linearly dependent vectors

Definition

Vectors v^1, ..., v^n ∈ V are linearly dependent if there are scalars α_1, ..., α_n, not all equal to zero, such that α_1 v^1 + ··· + α_n v^n = 0

If vectors are not linearly dependent they are linearly independent

If vectors v^1, ..., v^n are linearly independent and span(v^1, ..., v^n) = V then they form a basis of V

If v ∈ V is written as v = α_1 v^1 + ··· + α_n v^n, then (α_1, ..., α_n) are the coordinates of v w.r.t. the given basis

Vector and affine spaces

If V has a basis consisting of n elements we say that the dimension of V is n

Any two bases of a vector space have the same number of elements

Example: coordinate vectors e^i, i = 1, ..., n, of R^n

Definition

Translation x^0 + V of a vector space V is called an affine space (set)

dimension of an affine space x^0 + V is the dimension of the vector space V

Matrices

m × n (real) matrix A, we denote A ∈ R^{m×n}, is an array

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
$$

where a_ij ∈ R, i = 1, ..., m, j = 1, ..., n

notation: A_i = (a_i1, ..., a_in) (ith row), A^j = (a_1j, ..., a_mj) (jth column)

Transpose of a matrix A^T is defined by (A^T)_ij = A_ji

Special matrices: square, identity matrix (I), symmetric, diagonal, upper (lower) triangular


Invertible matrices

Multiplication of matrices: for A ∈ R^{n×m}, B ∈ R^{m×s}, the product AB ∈ R^{n×s} is defined as

$$
(AB)_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj} = A_i \cdot B^j
$$

Definition

A ∈ R^{n×n} is invertible (nonsingular) if there is A^{-1} ∈ R^{n×n} such that AA^{-1} = A^{-1}A = I

Some results
Vectors v^1, ..., v^k ∈ R^n with A^j = v^j, j = 1, ..., k, are linearly dependent if and only if the system Ax = 0 has a nonzero solution x ∈ R^k

A ∈ R^{n×n} is nonsingular if and only if its columns (or rows) are linearly independent

Positive and negative definite matrices

Definition

A symmetric matrix A ∈ R^{n×n} is positive semidefinite if x^T A x ≥ 0 for all x ∈ R^n, and positive definite if x^T A x > 0 for all x ≠ 0

matrix A is negative (semi)definite if −A is positive (semi)definite

if a matrix is neither negative nor positive semidefinite, then it is indefinite

note: x · Ax is a quadratic form and we also talk about definiteness of quadratic forms

Linear equations

Linear system of equations Ax = b

here A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m

if there are more equations than unknowns, m > n, the system is said to be over-determined
note: Ax = b ⇔ x_1 A^1 + ··· + x_n A^n = b

system is said to be homogeneous if b = 0

Solutions of linear systems

system is solvable if and only if b can be expressed as a linear combination of the columns of A
in other words, system is solvable if and only if b ∈ span(A^1, ..., A^n) = R(A) (range of A)
the set of solutions of Ax = 0 is called the null space of A and it is denoted by N(A)

Linear equations

rank(A) = dimension of the image space {y ∈ R^m : y = Ax, x ∈ R^n}, where A ∈ R^{m×n}

column rank = dimension of span(A^1, ..., A^n)

row rank = dimension of span(A_1, ..., A_m)

rank = column rank = row rank

rank is at most min(m, n); if rank(A) = min(m, n) we say that A has full rank

if rank(A) = m it has full row rank, if rank(A) = n it has full column rank

Linear mappings

Definition

A function F : V ↦ W is linear if F(αx + βy) = αF(x) + βF(y) for all x, y ∈ V and α, β ∈ R

If a function is of the form F(x) = L(x) + b, where L : V ↦ W is linear and b ∈ W, then F is an affine function

Example: rotation and scaling

Kernel of F: ker(F) = {v ∈ V : F(v) = 0}

Image of F: Im(F) = {w ∈ W : ∃v ∈ V such that w = F(v)}

when a matrix is considered as a linear function, the kernel coincides with the null space and the image with the range

Linear mappings

Theorem

If F : R^n ↦ R^m is a linear function, then there is a matrix A ∈ R^{m×n} such that F(x) = Ax for all x ∈ R^n

Theorem (Fundamental theorem of linear algebra)

dim(V) = dim(ker(F)) + dim(Im(F))

for A ∈ R^{m×n} we have n = dim(N(A)) + rank(A)

Theorem

If x^0 is a solution of Ax = b then any other solution x can be written as x = x^0 + w where w ∈ N(A)

in other words the set of solutions is the affine set x^0 + N(A)

Linear equations: number of solutions

Determining the size of the solution set for Ax = b with A ∈ R^{m×n}

if rank(A) = m, then Ax = b has a solution for all b ∈ R^m
⇔ A (or f(x) = Ax) is surjective if and only if A has full row rank
if n = m, A is surjective if and only if it is injective
if rank(A) < m, then Ax = b has a solution only for b ∈ R(A)
if rank(A) = n, then N(A) = {0} and Ax = b has at most one solution for all b ∈ R^m
⇔ A is injective if and only if N(A) = {0}
⇔ A is injective if and only if it has full column rank
if rank(A) < n, then if Ax = b has any solution at all, the set of solutions is an affine subspace of dimension n − rank(A)
a positive (negative) definite matrix is nonsingular

Determinants

Geometric idea: scale factor for measure

n = 2: the determinant tells how the area is scaled under the linear transformation, det(A) = a_11 a_22 − a_12 a_21

Definition (Laplace’s formula)

let A_ij denote the matrix obtained from A ∈ R^{n×n} by deleting the ith row and jth column

minor M_ij = det(A_ij)

cofactor C_ij = (−1)^{i+j} M_ij

for any row i: $\det(A) = \sum_{j=1}^{n} a_{ij} C_{ij}$

for any column j: $\det(A) = \sum_{i=1}^{n} a_{ij} C_{ij}$

Theorem

A is nonsingular if and only if det(A) ≠ 0
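Laplace's formula translates directly into a recursive function. The sketch below expands along the first row only; the function names are chosen for this illustration, and the test matrix is an arbitrary 3 × 3 example. The recursion takes on the order of n! operations, so it is meant for small matrices.

```python
def minor_matrix(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    # With 0-based column index j the cofactor sign (-1)^(i+j) of the
    # first row reduces to (-1)^j.
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j))
               for j in range(len(a)))


A = [[1, 0, 2],
     [0, 5, 0],
     [3, 0, 2]]
print(det(A))
```

A nonzero result indicates that the matrix is nonsingular.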


Cramer’s rule

Method for solving Ax = b

let B_i denote the matrix that is obtained from A by replacing the ith column with b

assuming det(A) ≠ 0 the system has a unique solution that is given by x_i = det(B_i)/det(A)

Example: IS-LM analysis

national income model, net national product Y, interest rate r, marginal propensity to save s, marginal efficiency of capital a, investment I = I^0 − ar, money balances needed m, government spending G, money supply M_s

$$
\begin{aligned}
sY + ar &= I^0 + G \\
mY - hr &= M_s - M^0
\end{aligned}
$$

Example: IS-LM analysis

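Endogenous variables Y and r can be solved by using Cramer’s rule

$$
Y = \frac{\begin{vmatrix} I^0 + G & a \\ M_s - M^0 & -h \end{vmatrix}}{\begin{vmatrix} s & a \\ m & -h \end{vmatrix}}
  = \frac{(I^0 + G)h + a(M_s - M^0)}{sh + am}
$$

$$
r = \frac{\begin{vmatrix} s & I^0 + G \\ m & M_s - M^0 \end{vmatrix}}{\begin{vmatrix} s & a \\ m & -h \end{vmatrix}}
  = \frac{(I^0 + G)m - s(M_s - M^0)}{sh + am}
$$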
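Cramer's rule for the IS-LM system can be checked numerically. The sketch below is an illustration only: all parameter values are hypothetical (they do not come from the slides), and exact fractions are used so that the solution can be compared with the two equations without rounding error.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def cramer_2x2(a, b):
    """Solve a x = b for a 2x2 matrix a by Cramer's rule."""
    d = det2(a)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    b1 = [[b[0], a[0][1]], [b[1], a[1][1]]]  # first column replaced by b
    b2 = [[a[0][0], b[0]], [a[1][0], b[1]]]  # second column replaced by b
    return det2(b1) / d, det2(b2) / d


# Hypothetical parameter values, chosen only for illustration.
s, a, m, h = Fraction(1, 5), Fraction(2), Fraction(1, 4), Fraction(3)
I0, G, Ms, M0 = 60, 40, 50, 30

# System: s*Y + a*r = I0 + G and m*Y - h*r = Ms - M0.
Y, r = cramer_2x2([[s, a], [m, -h]], [I0 + G, Ms - M0])
print(Y, r)
```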

Determinants and definiteness of a matrix

kth order leading principal minor of A ∈ R^{n×n} is the determinant of the k × k matrix obtained from A by deleting the last n − k columns and n − k rows

Symmetric matrix A ∈ R^{n×n} is positive definite if and only if all its n leading principal minors are strictly positive

Symmetric matrix A ∈ R^{n×n} is negative definite if and only if its n leading principal minors alternate in sign such that the kth order leading principal minor has the same sign as (−1)^k
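The two minor tests above can be sketched in a few lines. The function names and the example matrix are assumptions made for this illustration; the determinant is computed by cofactor expansion, which is adequate for small symmetric matrices with integer entries.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def leading_principal_minors(a):
    """Determinants of the top-left k x k blocks, k = 1, ..., n."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def is_positive_definite(a):
    """Minor test; the matrix a is assumed to be symmetric."""
    return all(m > 0 for m in leading_principal_minors(a))


def is_negative_definite(a):
    """kth minor must have the sign of (-1)^k; a is assumed symmetric."""
    return all((-1) ** k * m > 0
               for k, m in enumerate(leading_principal_minors(a), start=1))


# Hypothetical symmetric example matrix and its negative.
A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
neg_A = [[-v for v in row] for row in A]

print(leading_principal_minors(A), is_positive_definite(A))
print(leading_principal_minors(neg_A), is_negative_definite(neg_A))
```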

Matrix inversion

Main approaches

direct elimination of variables
row reduction (Gaussian elimination and Gauss-Jordan elimination)
Cramer’s rule
more efficient methods for numerical purposes (LU decomposition, iterative methods etc.)

Number of operations with Gaussian elimination: n^3/3 + n^2 − n/3, i.e. O(n^3)

theoretical record (thus far) O(n^2.376)

Note: for solving Ax = b it is not necessary to find A^{-1}

Gaussian elimination

Idea: let e^i denote the ith coordinate vector

finding solutions x^i to equations Ax = e^i, i = 1, ..., n ⇒ A^{-1} = (x^1 x^2 ··· x^n)

Elementary row operations

interchange two rows of a matrix
change a row by adding to it a multiple of another row
multiply each element in a row by the same nonzero number

Form an augmented matrix [A|I]

when solving Ax = b we use [A|b] instead
terminology: the matrix is in row echelon form if in place of A we have a triangular matrix; if we have an identity matrix we have a reduced row echelon form

Gaussian elimination

Forward elimination

using elementary row operations the augmented matrix is reduced to echelon form (this is Gaussian elimination)
if the result is degenerate (n zeros in the beginning of some row) there is no inverse (or solution to the equation)
note: rank of a matrix = number of non-zero rows in echelon form

Back substitution

bring the echelon form matrix to reduced row echelon form by elementary row operations (if this step is included the algorithm is called Gauss-Jordan elimination)

→ in place of A we have I and in place of I we have A^{-1}
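The procedure on the augmented matrix [A|I] can be sketched as follows. This is a compact variant, not necessarily the exact sequence of steps used in the lectures: it clears entries above and below each pivot in a single pass instead of separating forward elimination from back substitution. The function name and the 2 × 2 test matrix are assumptions for the illustration, and exact fractions avoid rounding.

```python
from fractions import Fraction


def gauss_jordan_inverse(a):
    """Invert a square matrix by row-reducing the augmented matrix [A|I]."""
    n = len(a)
    # Build [A|I] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]   # interchange two rows
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # scale the pivot row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                # Add a multiple of the pivot row to row r.
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # In place of A we now have I, and in place of I we have the inverse.
    return [row[n:] for row in aug]


P = [[4, -2],
     [1, 1]]
P_inv = gauss_jordan_inverse(P)
print(P_inv)
```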

Example

For what values of b ∈ R^4 is there a solution for Ax = b, where

$$
A = \begin{pmatrix}
1 & 2 & 1 & 4 \\
0 & 0 & 0 & 2 \\
2 & 4 & 2 & 2 \\
0 & 2 & 0 & 3
\end{pmatrix}
$$

the rank of A is three (transform A into echelon form by Gaussian elimination)
there is a solution when b ∈ R(A)

What about the number of solutions?
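The rank claimed in the example can be reproduced by reducing A to echelon form and counting the nonzero rows. The helper below is a minimal sketch written for this illustration; it uses exact fractions so that zero tests are reliable.

```python
from fractions import Fraction


def rank(a):
    """Rank = number of pivot (nonzero) rows after reduction to echelon form."""
    rows = [[Fraction(v) for v in row] for row in a]
    r = 0  # index of the next pivot row
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


A = [[1, 2, 1, 4],
     [0, 0, 0, 2],
     [2, 4, 2, 2],
     [0, 2, 0, 3]]
print(rank(A))
```

Since rank(A) = 3 < 4, the null space has dimension 4 − 3 = 1, so whenever a solution exists the solution set is an affine set of dimension one.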

Cramer’s rule for inversion

The matrix whose (i, j)-entry is the cofactor C_ji (the transpose of the matrix of cofactors) is called the adjoint of A, adj(A)

Cramer’s rule:

$$
A^{-1} = \frac{1}{\det(A)} \cdot \operatorname{adj}(A)
$$

Note: in terms of the number of operations Cramer’s rule is more demanding than Gaussian elimination


Linear algebra: Eigenvalues

SB 23.1, 23.3, 23.7–23.9

Eigenvalues: The idea

Idea for a square matrix A

can we make a change of variables so that instead of A we could consider some other matrix that would be “easier” to handle?

change of variables: invertible mapping (matrix) P, new variables P^{-1}x

how does A behave when we introduce the new variables: take y (new variables) and find the corresponding x, i.e., x = Py, map this with A, i.e., APy, and return to the new variables, i.e., P^{-1}APy (image of y in new variables)

transformed mapping D = P^{-1}AP

previous question: can we find P such that D is a diagonal matrix?

let λ_i denote the diagonal element in the ith row of D and v^i the ith column vector of P → we can deduce that Av^i = λ_i v^i (because PD = AP)

Eigenvalues

Definition

λ is an eigenvalue of A and v ≠ 0 is the corresponding eigenvector if Av = λv

Note: it is assumed that v ≠ 0 because 0 would always satisfy Av = λv (and is therefore not particularly interesting)

Other formulation: λ is an eigenvalue if and only if A − λI is singular ⇔ det(A − λI) = 0, which is called the characteristic equation of A (det(A − λI) is the characteristic polynomial)

Eigenvalues: Example 1

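$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$

by subtracting $2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ from A we get $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, which is singular; hence 2 is an eigenvalue and the corresponding eigenvector v = (v_1, v_2) is obtained by solving

$$
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
$$

i.e., vectors with v_1 ≠ 0 and v_2 = 0 are all eigenvectors corresponding to eigenvalue 2

3 is also an eigenvalue because $A - 3I = \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix}$ is singular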

Facts about eigenvalue and eigenvectors

Every non-zero element of the space spanned by eigenvectors corresponding to an eigenvalue is an eigenvector (each eigenvector defines a subspace/eigenspace of eigenvectors)

⇒ if v is an eigenvector, so is αv for all α ≠ 0

The eigenvectors corresponding to different eigenvalues are linearly independent

Diagonal elements of a diagonal matrix are its eigenvalues

An eigenvector determines a direction that is left invariant under A

A matrix is singular if and only if 0 is its eigenvalue

If a matrix is symmetric, it has n real eigenvalues

If λ_1, ..., λ_n are the eigenvalues of A ∈ R^{n×n}, then λ_1 + λ_2 + ··· + λ_n = trace(A) and λ_1 · λ_2 ··· λ_n = det(A).

trace(A) is the sum of the diagonal elements of A

Eigenvalues: Example 2

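$A = \begin{pmatrix} -1 & 3 \\ 2 & 0 \end{pmatrix}$; because this is not a diagonal matrix we need the characteristic equation for finding the eigenvalues

$$
\det(A - \lambda I) = \det \begin{pmatrix} -1-\lambda & 3 \\ 2 & 0-\lambda \end{pmatrix}
= -(1+\lambda)(-\lambda) - 6 = \lambda^2 + \lambda - 6 = (\lambda+3)(\lambda-2)
$$

the eigenvalues are λ_1 = −3 and λ_2 = 2. The eigenvector corresponding to λ_1 can be obtained from

$$
(A - (-3)I)v = \begin{pmatrix} 2 & 3 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
$$

e.g., v = (−3, 2) is an eigenvector (like all vectors αv, α ≠ 0); what is the other eigenvector?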
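For a 2 × 2 matrix the characteristic equation is the quadratic λ² − trace(A)·λ + det(A) = 0, so the eigenvalues follow from the quadratic formula. The sketch below handles only the real case; the function name is an assumption for this illustration.

```python
import math


def eigenvalues_2x2(a):
    """Real roots of lambda^2 - trace(A)*lambda + det(A) = 0 for a 2x2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("the eigenvalues are complex")
    root = math.sqrt(disc)
    return (tr - root) / 2, (tr + root) / 2


A = [[-1, 3],
     [2, 0]]
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)

# Check A v = lambda_1 v for the eigenvector v = (-3, 2).
v = (-3, 2)
Av = (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
print(Av)
```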

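Eigenvalues: Example 3

$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 5 & 0 \\ 3 & 0 & 2 \end{pmatrix}$, characteristic equation

$$
\det(A - \lambda I) = \det \begin{pmatrix} 1-\lambda & 0 & 2 \\ 0 & 5-\lambda & 0 \\ 3 & 0 & 2-\lambda \end{pmatrix}
= (5-\lambda)(\lambda-4)(\lambda+1)
$$

there are three eigenvalues λ_{1,2,3} = 5, 4, −1, with corresponding eigenvectors

$$
v^{1,2,3} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\ \begin{pmatrix} 2 \\ 0 \\ 3 \end{pmatrix},\ \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}
$$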

Eigenvalues: More facts

Characteristic equation is a polynomial of degree n

→ there are n eigenvalues (counted with multiplicity), some of which may be complex numbers

in general, finding the roots of an nth degree polynomial is hard


Diagonalization

Definition

A ∈ R^{n×n} is diagonalizable if there is an invertible matrix P such that D = P^{-1}AP is a diagonal matrix

P can be seen as a change of the basis

Theorem

A ∈ R^{n×n} is diagonalizable if it has n distinct nonzero eigenvalues

when eigenvalues are distinct and ≠ 0, then eigenvectors are linearly independent

eigenvectors are the columns of P and eigenvalues are the diagonal elements of D

note: a matrix can be diagonalizable even when the eigenvalues are not distinct

Diagonalization: Example

Example 3 (cont’d)

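transformation $P = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 0 \\ 0 & 3 & -1 \end{pmatrix}$ and $D = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -1 \end{pmatrix}$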
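The relation D = P^{-1}AP is equivalent to AP = PD, which can be verified without computing an inverse. The check below is a small sketch with a hand-written matrix product; the helper name `matmul` is chosen for this illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 0, 2],
     [0, 5, 0],
     [3, 0, 2]]
P = [[0, 2, 1],    # columns of P are the eigenvectors
     [1, 0, 0],
     [0, 3, -1]]
D = [[5, 0, 0],    # eigenvalues on the diagonal, in the same order
     [0, 4, 0],
     [0, 0, -1]]

AP = matmul(A, P)
PD = matmul(P, D)
print(AP == PD)
```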

Information of eigenvalues

The nature of the linear function defined by A can besketched by using eigenvalues

if an eigenvalue of A is positive, then A scales the vectors in the direction of the corresponding eigenvector
if an eigenvalue is negative, A reverses the direction of the corresponding eigenvector and scales it

example: how does $A = \begin{pmatrix} k_1 & 0 \\ 0 & k_2 \end{pmatrix}$ behave?

if there are no real eigenvectors, there are no directions in which A behaves as described above

→ the matrix turns all the directions (and possibly scales them)

e.g., $A = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$ rotates vectors by angle θ (what are the eigenvalues?)

Information of eigenvalues

Investigating the definiteness of a matrix by using eigenvalues

symmetric A ∈ R^{n×n} is positive semidefinite (negative semidefinite) if and only if the eigenvalues of A are ≥ 0 (≤ 0)
respectively, positive definite (negative definite) if the eigenvalues are > 0 (< 0)
the matrix is indefinite if and only if it has both positive and negative eigenvalues


Solving linear difference equations

SB 23.2, 23.6

Linear difference equations

Dynamical system given by z^{k+1} = Az^k, k = 0, 1, ..., where A ∈ R^{n×n} and z^0 ∈ R^n is given

model for a discrete time process
eigenvalues can be utilized in solving this kind of system
later we shall obtain this kind of system by linearizing a nonlinear difference equation

Example 1: calculating compound interest

capital y_k, interest rate ρ, y_{k+1} = (1 + ρ)y_k
beginning from period 0, y_1 = (1 + ρ)y_0, y_2 = (1 + ρ)^2 y_0, ..., we notice that y_k = (1 + ρ)^k y_0 (solution of the difference equation)
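The closed-form solution can be compared with direct iteration of the difference equation. In this sketch the interest rate, initial capital and horizon are hypothetical values, and exact fractions are used so that the two results can be compared for equality.

```python
from fractions import Fraction


def iterate(y0, rho, k):
    """Apply y_{k+1} = (1 + rho) * y_k for k periods, starting from y0."""
    y = y0
    for _ in range(k):
        y = (1 + rho) * y
    return y


rho = Fraction(1, 20)   # hypothetical 5 % interest rate
y0 = Fraction(100)      # hypothetical initial capital
k = 10

iterated = iterate(y0, rho, k)
closed_form = (1 + rho) ** k * y0
print(float(iterated), iterated == closed_form)
```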

Linear difference equations

Example 2: model for unemployment

employed (on the average) x , unemployed y

probability of getting a job p, probability of staying in a job q

x_{k+1} = q x_k + p y_k

y_{k+1} = (1 − q) x_k + (1 − p) y_k

in matrix form

$$
\begin{pmatrix} x_{k+1} \\ y_{k+1} \end{pmatrix} = \begin{pmatrix} q & p \\ 1-q & 1-p \end{pmatrix} \begin{pmatrix} x_k \\ y_k \end{pmatrix}
$$

first idea (brute force): let z^{k+1} = Az^k denote the system, find z^N corresponding to z^0 by iterating the system, i.e., z^N = A^N z^0, which means that we need A^N
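The brute-force idea amounts to applying A repeatedly. The following sketch iterates the unemployment model; the probabilities p and q and the initial state are hypothetical values chosen for illustration. Because each column of A sums to one, the total x + y stays constant along the iteration.

```python
from fractions import Fraction


def step(A, z):
    """One period of the system: returns A z."""
    return [sum(a * v for a, v in zip(row, z)) for row in A]


def iterate(A, z0, n):
    """State after n periods, i.e. A^n z0, computed by repeated multiplication."""
    z = list(z0)
    for _ in range(n):
        z = step(A, z)
    return z


# Hypothetical probabilities: p = getting a job, q = staying in a job.
p, q = Fraction(1, 2), Fraction(9, 10)
A = [[q, p],
     [1 - q, 1 - p]]
z0 = [Fraction(80), Fraction(20)]  # hypothetical (employed, unemployed)

z1 = iterate(A, z0, 1)
z50 = iterate(A, z0, 50)
print(z1, [float(v) for v in z50])
```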

Linear difference equations

Observations from the 2d model z^{k+1} = Az^k, $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$

if A were a diagonal matrix, i.e., b = c = 0, then the variables (components of z) would be uncoupled and we could solve for the components separately as in the compound interest example

→ idea: make a transformation of variables which leads to a diagonal system → use eigenvalues

Solution by diagonalization

Let us consider a difference equation $z^{k+1} = Az^k$ with diagonalizable A

1. transform of variables, $Z = P^{-1}z$ and $z = PZ$

old variables $z \in \mathbb{R}^n$ and new variables $Z \in \mathbb{R}^n$

find the eigenvalues of A and the corresponding eigenvectors, form P from the eigenvectors (as columns)

2. form a difference equation for Z: $Z^{k+1} = P^{-1}z^{k+1} = P^{-1}Az^k = P^{-1}APZ^k$

note: $P^{-1}AP$ is a diagonal matrix with the eigenvalues on the diagonal

3. solve the resulting difference equation for Z

4. return back to the original variables

Example

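Let $A = \begin{pmatrix} 1 & 4 \\ 1/2 & 0 \end{pmatrix}$ and the aim is to solve $z^{k+1} = Az^k$

Finding the eigenvalues and eigenvectors

characteristic equation: det(A − λI) = 0 ⇔ (1 − λ)(0 − λ) − 4 · 1/2 = 0

→ we get the eigenvalues $\lambda_1 = 2$, $\lambda_2 = -1$

eigenvectors, (i) $(A - \lambda_1 I)v = \begin{pmatrix} -1 & 4 \\ 1/2 & -2 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0$, e.g., $v^1 = (4, 1)$

(ii) $(A - \lambda_2 I)v = \begin{pmatrix} 2 & 4 \\ 1/2 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0$, e.g., $v^2 = (-2, 1)$

transformation $P = \begin{pmatrix} 4 & -2 \\ 1 & 1 \end{pmatrix}$ and $P^{-1} = \begin{pmatrix} 1/6 & 1/3 \\ -1/6 & 2/3 \end{pmatrix}$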

Example

Transform of variables (change of basis)

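let us denote z = (x, y), new variables X and Y, Z = (X, Y):

$$\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} 1/6 & 1/3 \\ -1/6 & 2/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$$

Deriving the difference equation for Z

$Z^{k+1} = P^{-1}APZ^k$ and $P^{-1}AP$ is a diagonal matrix with the eigenvalues on the diagonal; $P^{-1}AP = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}$

in this difference equation X and Y are uncoupled!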

Example

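Solve for $Z^k$: $X_k = 2^k X_0$ and $Y_k = (-1)^k Y_0$

Going back to original variables

$$z^k = PZ^k = \begin{pmatrix} 4 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2^k X_0 \\ (-1)^k Y_0 \end{pmatrix} = \begin{pmatrix} 4\cdot 2^k X_0 - 2\cdot(-1)^k Y_0 \\ 2^k X_0 + (-1)^k Y_0 \end{pmatrix} = (2^k X_0)\begin{pmatrix} 4 \\ 1 \end{pmatrix} + ((-1)^k Y_0)\begin{pmatrix} -2 \\ 1 \end{pmatrix}$$

Initial values $X_0$ and $Y_0$ can be obtained from $Z^0 = P^{-1}z^0$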
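The steps of the example can be checked numerically. The following is a minimal sketch in plain Python with exact fractions; the helper names and the initial value z0 = (1, 1) are illustrative assumptions, not part of the slides. It compares the diagonalization formula with brute-force iteration of the system.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[F(1), F(4)], [F(1, 2), F(0)]]
P = [[F(4), F(-2)], [F(1), F(1)]]            # columns: eigenvectors (4, 1) and (-2, 1)
P_inv = [[F(1, 6), F(1, 3)], [F(-1, 6), F(2, 3)]]

def solve_by_diagonalization(z0, k):
    """Return z^k = P D^k P^{-1} z^0 with D = diag(2, -1)."""
    X0, Y0 = matvec(P_inv, z0)               # new variables Z^0 = P^{-1} z^0
    return matvec(P, [F(2) ** k * X0, F(-1) ** k * Y0])

def solve_by_iteration(z0, k):
    """Return z^k by applying z -> A z exactly k times."""
    z = list(z0)
    for _ in range(k):
        z = matvec(A, z)
    return z

z0 = [F(1), F(1)]                            # hypothetical initial value
for k in range(5):
    print(k, solve_by_diagonalization(z0, k), solve_by_iteration(z0, k))
```

Both columns of the printout should agree for every k, which is the content of $A^k = PD^kP^{-1}$.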


Results for linear systems

Assume that A is diagonalizable with real eigenvalues

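eigenvalues $\lambda_1, \ldots, \lambda_n$ and eigenvectors $v^1, \ldots, v^n$

$$D = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}, \text{ in which case } D^k = \begin{pmatrix} \lambda_1^k & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n^k \end{pmatrix}$$

Theorem

When the initial value is $z^0$, the solution of the difference equation is $z^k = PD^kP^{-1}z^0$

The result follows by observing that $A^k = PD^kP^{-1}$ and plugging this into $z^k = A^k z^0$

note: with the formula for $A^k$ we can define, e.g., $\sqrt{A}$ as a series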

Results for linear systems

Theorem

The general solution of the difference equation $z^{k+1} = Az^k$ ($z^k \in \mathbb{R}^n$) is $z^k = c_1\lambda_1^k v^1 + c_2\lambda_2^k v^2 + \ldots + c_n\lambda_n^k v^n$, where $c_i$, i = 1, . . . , n, are constants

In the general solution $z^0$ is unspecified

Regardless of the initial value $z^0$ the solution is of the previous form

constants $c_i$, i = 1, . . . , n, can be solved when $z^0$ is given; denote $c = (c_1, \ldots, c_n)$, then $c = P^{-1}z^0$

Complex eigenvalues

2d intuition

complex eigenvalues mean that A rotates all directions, i.e., no direction is mapped into the subspace spanned by that direction

As in the case of real eigenvalues, we obtain exactly the same solution for the difference equation but now the matrices P, D will be complex matrices

for n = 2 we get (by using the rules of addition and multiplication of complex numbers) a general solution in the form $z^k = 2r^k[(c_1\cos k\theta - c_2\sin k\theta)u - (c_2\cos k\theta + c_1\sin k\theta)v]$, where $c_1$ and $c_2$ are real numbers determined by $z^0$, the eigenvalues are $\alpha \pm \beta i = r(\cos\theta \pm i\sin\theta)$, and the eigenvectors are $u \pm iv$

complex eigenvalues imply oscillations

Stability of linear systems

Definition

Origin is a (globally asymptotically) stable equilibrium of a system $z^{k+1} = Az^k$ if $z^k \to 0$ for all $z^0$

note: there may be other equilibria than the origin (when?)

Theorem

Origin is stable when the eigenvalues are inside the unit circle in the complex plane, i.e., $r = \sqrt{a^2 + b^2} < 1$ if $a \pm bi$ are eigenvalues

in the case of real eigenvalues, stability is obtained when the eigenvalues are less than 1 in absolute value

Markov processes

Definition

let us assume that there is a finite number of states I = {1, . . . , n}. A stochastic process is a rule that gives the probability that the system will be in state i ∈ I at time k + 1 given the probabilities of its being in the various states in previous periods

when (for all i ∈ I) the probability depends only on what state the system was in at time k, the process is called a Markov process

Example

let us classify households as urban (1), suburban (2), or rural (3)

the different classes can be considered as states 1, 2, 3

Markov processes: state transitions

Let $x_i(k)$ be the probability that in period k the system is at state i

State transition probabilities $m_{ij}$ = probability that in period k + 1 the state is i when the state was j in period k

collecting the probabilities in a Markov matrix

$$M = \begin{pmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn} \end{pmatrix}$$

Example (cont’d.)

we may think of the probability $x_i(k)$ as the share of the population in area i in period k

e.g., $m_{1j}$ = probability of staying urban (j = 1) or becoming urban (j = 2 or j = 3)

set $M = \begin{pmatrix} 0.75 & 0.02 & 0.1 \\ 0.2 & 0.9 & 0.2 \\ 0.05 & 0.08 & 0.7 \end{pmatrix}$

Markov process as a difference equation

Probability that in period k + 1 the state is i: $x_i(k+1)$ = Prob(transition from state 1 to state i) × Prob(initial state is 1) + . . . + Prob(transition from state n to state i) × Prob(initial state is n) = $\sum_j m_{ij}x_j(k)$

using the state transition matrix we get x(k + 1) = Mx(k), i.e.,

$$\begin{pmatrix} x_1(k+1) \\ \vdots \\ x_n(k+1) \end{pmatrix} = \begin{pmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn} \end{pmatrix}\begin{pmatrix} x_1(k) \\ \vdots \\ x_n(k) \end{pmatrix}$$

Example

Markov matrix $M = \begin{pmatrix} 0.9 & 0.4 \\ 0.1 & 0.6 \end{pmatrix}$

Eigenvalues, $\det(M - \lambda I) = (0.9-\lambda)(0.6-\lambda) - 0.1\cdot 0.4 = \lambda^2 - (0.9+0.6)\lambda + 0.9\cdot 0.6 - 0.04 = 0$, we get $\lambda_{1,2} = (1.5 \pm \sqrt{1.5^2 - 4\cdot 0.5})/2 = (1.5 \pm 0.5)/2 = 1, 0.5$

Eigenvectors $v^1 = (4, 1)$ and $v^2 = (1, -1)$

General solution

$x_1(k) = c_1\cdot 4\cdot 1^k + c_2\cdot 1\cdot 0.5^k$

$x_2(k) = c_1\cdot 1\cdot 1^k + c_2\cdot(-1)\cdot 0.5^k$


Example

Remarks

one eigenvalue is one and the other is less than one

when k → ∞, the limit is $c_1(4, 1)$ (direction corresponding to the eigenvector for eigenvalue 1); because the limit should be a probability distribution, $4c_1 + c_1 = 1$, i.e., $c_1 = 1/5$

general result: when M is a Markov matrix with all components positive (or $M^k$ has positive components for some k), then one of the eigenvalues is one and the rest are less than one in absolute value (on the interval (−1, 1) when they are real), and the process converges to the limit distribution determined by the eigenvector corresponding to the eigenvalue 1

→ the limit distribution is a stable equilibrium of the difference equation
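The convergence claim for the 2 × 2 example can be illustrated by simply iterating x(k + 1) = Mx(k). This is a minimal sketch; the initial distribution (0.3, 0.7) and the number of periods are arbitrary assumptions chosen for illustration.

```python
M = [[0.9, 0.4],
     [0.1, 0.6]]      # column j holds the transition probabilities out of state j

def step(M, x):
    """One period of the Markov process: returns M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def iterate(M, x0, periods):
    """Apply the transition matrix repeatedly, starting from x0."""
    x = list(x0)
    for _ in range(periods):
        x = step(M, x)
    return x

x0 = [0.3, 0.7]       # hypothetical initial distribution
x_limit = iterate(M, x0, 200)
print(x_limit)        # should be close to (4/5, 1/5), the normalized eigenvector for eigenvalue 1
```

Starting from any other probability vector should give the same limit, since the component along the second eigenvector is multiplied by 0.5 in every period.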


Linear differential equations

SB: 25.2, 25.4

Scalar equations

kth order differential equation is an equation of the form $F(y^{[k]}, y^{[k-1]}, \ldots, y^{[1]}, y, t) = 0$, where $y^{[j]} = d^j y(t)/dt^j$ (and $y(t) \in \mathbb{R}$)

model for a continuous time process or dynamical system (time t)

when $F : \mathbb{R}^{k+2} \to \mathbb{R}$ is linear in its first k + 1 arguments $(y^{[k]}, \ldots, y^{[1]}, y)$ the differential equation is linear

First order scalar equation $dy(t)/dt = a(t)y$

notation $\dot y = dy(t)/dt$

if a(t) = a ∀t the equation is autonomous

general solution $y(t) = Ce^{\int^t a(s)\,ds}$, where C is an integration constant

Scalar equations

Second order linear scalar equation $a\ddot y + b\dot y + cy = 0$

example: what is the class of utility functions for which $-u''(x)/u'(x) = a$ (the left hand side is the Arrow-Pratt measure of absolute risk aversion) ⇔ $u''(x) + au'(x) = 0$

the solution can be found by using an educated guess (ansatz) $y = e^{\lambda t}$

Linear systems

Linear system of differential equations is of the form $\dot x = A(t)x + b(t)$, where $t \in \mathbb{R}$ (time), $x(t), b(t) \in \mathbb{R}^n$, and $A(t) \in \mathbb{R}^{n\times n}$

Homogeneous autonomous system: A(t) = A and b(t) = 0 for all t:

$$\begin{aligned} \dot x_1 &= a_{11}x_1 + \ldots + a_{1n}x_n \\ &\;\;\vdots \\ \dot x_n &= a_{n1}x_1 + \ldots + a_{nn}x_n \end{aligned}$$

if the question is of finding the solution for a given initial value $x(t_0) = x^0$ the problem is an initial value problem

Linear systems

From second order scalar equation to a linear system

take a new variable $z = \dot y$, so that the differential equation becomes $a\dot z + bz + cy = 0$, i.e., we have a pair $\dot z = -bz/a - cy/a$, $\dot y = z$

more generally, by introducing new variables we can transform higher order scalar equations into linear systems

Solution by diagonalization

Theorem

When A is diagonalizable and its eigenvalues $\lambda_i$, i = 1, . . . , n are distinct and non-zero, the general solution of $\dot x = Ax$ is $x(t) = C_1e^{\lambda_1 t}v^1 + C_2e^{\lambda_2 t}v^2 + \ldots + C_ne^{\lambda_n t}v^n$, where $C_i$ are constants, and $v^i$ are the eigenvectors corresponding to the eigenvalues (i = 1, . . . , n)

The solution of the initial value problem is $x(t) = Pe^{Dt}P^{-1}x(0)$

note: $Pe^{Dt}P^{-1} = e^{At}$, i.e., the solution is of the same form as for scalar equations ($x(t) \in \mathbb{R}$)

Solution by diagonalization

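Example: $\dot x = Ax$ with $A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}$

finding the eigenvalues: $\det(A - \lambda I) = (1-\lambda)(1-\lambda) - 4 = 0$, i.e., $\lambda^2 - 2\lambda - 3 = 0$, we get $\lambda_{1,2} = (2 \pm \sqrt{2^2 - 4(-3)})/2 = 1 \pm 2 = 3, -1$

eigenvectors: (1, 0.5) and (1, −0.5)

the general solution:

$$\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = c_1e^{3t}\begin{pmatrix} 1 \\ 0.5 \end{pmatrix} + c_2e^{-t}\begin{pmatrix} 1 \\ -0.5 \end{pmatrix}$$

what if the initial value was x(0) = (1, 2)?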
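One way to approach the closing question is to solve $c_1v^1 + c_2v^2 = x(0)$ for the constants and then check the resulting path against the differential equation. The sketch below does this in plain Python; the function names and the check time t = 0.3 are illustrative assumptions.

```python
import math

A = [[1.0, 4.0], [1.0, 1.0]]
v1, v2 = (1.0, 0.5), (1.0, -0.5)      # eigenvectors for the eigenvalues 3 and -1

def constants(x0):
    """Solve c1*v1 + c2*v2 = x0 for (c1, c2) by Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
    c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det
    return c1, c2

def x(t, c1, c2):
    """General solution c1*exp(3t)*v1 + c2*exp(-t)*v2 evaluated at time t."""
    e1, e2 = math.exp(3 * t), math.exp(-t)
    return (c1 * e1 * v1[0] + c2 * e2 * v2[0],
            c1 * e1 * v1[1] + c2 * e2 * v2[1])

c1, c2 = constants((1.0, 2.0))
print(c1, c2)          # constants matching the initial value (1, 2)
print(x(0.0, c1, c2))  # reproduces the initial value
```

With x(0) = (1, 2) the two equations are $c_1 + c_2 = 1$ and $0.5c_1 - 0.5c_2 = 2$, so $c_1 = 2.5$ and $c_2 = -1.5$.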

Stability

Origin x = 0 is the steady state of $\dot x = Ax$

Definition

x = 0 is a globally asymptotically stable steady state of $\dot x = Ax$, if $\lim_{t\to\infty} x(t; x^0) = 0$ for all $x^0$

here $x(t; x^0)$ is the solution of the system for initial state $x^0$

Stability by using eigenvalues

if the real parts of the eigenvalues are negative, then the origin is as. stable

note: some of the eigenvalues may be complex


Multivariate calculus

Differentiation, Taylor series, linearization

SB: 14–15, 30

Motivation

Need for nonlinear models

need to model nonlinear phenomena, e.g., decreasing marginal profits

linear models may lead to optimization problems with no (bounded) solutions; even when there are solutions, the solutions may behave against economic intuition (e.g., discontinuities)

Nonlinear models are often hard to solve analytically

numerical solution

local approach: linearize the model around an equilibrium, optimum or other interesting point → differentiation

Differentiation in R

Definition

A function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}$ if there is a number $f'(x)$ such that

$$\lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = f'(x)$$

rewriting we get $f(x+h) - f(x) = f'(x)h + o(h)$, where o(h) is a function such that o(h)/h → 0 as h → 0

around x, i.e., at x + h, the change of f behaves as the linear function $f'(x)h$ (plus the error o(h))

$f'(x)$ is the derivative of f at x and $f'(x)h$ is the differential

note: if f is differentiable for all $x \in \mathbb{R}$ then $f'(x)$ is a function

if $f'(x)$ is differentiable at x we obtain the second derivative $f''(x)$ (higher order derivatives respectively)

Differentiation in Rn

Definition

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in \mathbb{R}^n$ if there is a linear function $Df(x) : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{h\to 0} \frac{\|f(x+h) - f(x) - [Df(x)]h\|}{\|h\|} = 0$$

Df(x) is the derivative (or total derivative or differential) of f at x

note: in the definition of the above limit, Df(x) is independent of how h goes to zero, e.g., for f(x) = |x| we have f(0 + h) − f(0) = h for all h > 0 but f is not differentiable at x = 0 because for h < 0 we have f(0 + h) − f(0) = −h, i.e., there is no Df(x) as required in the definition

Differentiation in Rn

Theorem

Derivative of a differentiable function is unique

Chain rule: suppose f maps an open set $S \subseteq \mathbb{R}^n$ into $\mathbb{R}^m$ and is differentiable at $x^0 \in S$, g maps an open set containing f(S) into $\mathbb{R}^k$ and is differentiable at $f(x^0)$, then F(x) = g(f(x)) is differentiable at $x^0$ and $DF(x^0) = Dg(f(x^0))Df(x^0)$

Partial derivatives

Partial derivatives: assume that $f = (f_1, \ldots, f_m)$, for $x \in \mathbb{R}^n$, i = 1, . . . , m, j = 1, . . . , n we define (provided the limit exists)

$$D_j f_i(x) = \lim_{h\to 0} \frac{f_i(x + he^j) - f_i(x)}{h},$$

$D_j f_i$ is the derivative of $f_i$ w.r.t. $x_j$, we call it a partial derivative of f w.r.t. $x_j$ and write

$$D_j f_i(x) = \frac{\partial f_i(x)}{\partial x_j}$$

Partial derivatives

If f is differentiable at x, then all partial derivatives exist at x and the (i, j)-component of Df(x) is $D_j f_i(x)$, we write $Df(x) = [D_j f_i(x)]$ and call this matrix the Jacobian matrix

If the partial derivatives are continuous then the function is differentiable

When there are several argument vectors with respect to which we can compute the Jacobian we use a subscript to denote the variables, e.g., for f(x, a) the Jacobian w.r.t. x is $D_x f(x, a)$

Continuous differentiability

A function that has continuous partial derivatives in an open set containing x is said to be continuously differentiable at x

a k-times continuously differentiable function has continuous kth order partial derivatives

if f is k-times continuously differentiable on S we denote $f \in C^k(S)$ (note: k can be infinite)


Gradient of a function

Gradient of f at x is $\nabla f(x) = Df(x)^T$

geometric interpretation: $y - f(\bar x) = \nabla f(\bar x)\cdot(x - \bar x)$ is a hyperplane in $\mathbb{R}^{n+1}$

$\nabla f(x)$ is perpendicular (orthogonal) to the tangent space of $\{y \in \mathbb{R}^n : f(y) = f(x)\}$ (level set)

Directional derivative of f at x to direction v is (if the limit exists)

$$f'(x; v) = \lim_{h\to 0} \frac{f(x + hv) - f(x)}{h}$$

for differentiable $f : \mathbb{R}^n \to \mathbb{R}$ we have $f'(x; v) = \nabla f(x)\cdot v = \nabla f(x)^T v$

if we let v with ‖v‖ = 1 vary we notice that the maximum of $f'(x; v)$ is attained at $v = \nabla f(x)/\|\nabla f(x)\|$, i.e., $\nabla f(x)$ is the direction of steepest ascent at x

Taylor’s theorem

Motivation: sometimes it is not enough to linearize a nonlinear system but more information is needed → higher order approximation is needed

Purpose: approximate a function of a single variable around $a \in \mathbb{R}$

Notations

k! is the factorial of k (0! = 1)

$f^{(k)}$ is the kth derivative of f

$R_n$ is the remainder (function) such that there is ξ on the interval joining x and a, such that

$$R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}$$

Taylor’s theorem

Theorem

If n ≥ 0 is an integer and f a function which is n times continuously differentiable on the closed interval [a, x] and n + 1 times differentiable on the open interval (a, x), then

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x),$$

for n = 0 the result is known as the mean value theorem

note: there are also other expressions for Rn(x )
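As an illustration of the theorem, the following sketch approximates the exponential function around a = 0 and compares the actual error with a bound derived from the Lagrange form of the remainder. The choice of exp, the evaluation point x = 0.5 and the function names are assumptions made only for this example.

```python
import math

def taylor_exp(x, n, a=0.0):
    """n-th order Taylor polynomial of exp around a, evaluated at x."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n, a=0.0):
    """Upper bound on |R_n(x)|: every derivative of exp is exp, which on the
    interval between a and x is at most exp(max(a, x))."""
    return math.exp(max(a, x)) * abs(x - a) ** (n + 1) / math.factorial(n + 1)

x = 0.5
errors = [abs(math.exp(x) - taylor_exp(x, n)) for n in range(6)]
bounds = [lagrange_bound(x, n) for n in range(6)]
for n in range(6):
    print(n, errors[n], bounds[n])   # the error should stay below the bound
```

The error shrinks quickly with n because the remainder contains the factor $(x - a)^{n+1}/(n+1)!$.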

Taylor’s theorem: multivariate case

Theorem

Let $f : \mathbb{R}^n \to \mathbb{R}$ be (m + 1)-times continuously differentiable on $N_r(a) \subset \mathbb{R}^n$, for any $x \in N_r(a)$

$$f(x) = \sum_{k=0}^{m} \frac{f^{(k)}(a; x - a)}{k!} + R_m(x),$$

where $f^{(1)}(a; t) = \sum_i D_i f(a)t_i$, $f^{(2)}(a; t) = \sum_{i,j} D_{ij} f(a)t_i t_j$, $f^{(3)}(a; t) = \sum_{i,j,k} D_{ijk} f(a)t_i t_j t_k$, . . ., and there is z on the line between x and a such that

$$R_m(x) = \frac{f^{(m+1)}(z; x - a)}{(m+1)!}$$

note: $D_{ij}$ is the second (partial) derivative of f with respect to $x_i$ and $x_j$, $D_{ijk}$ and higher orders respectively

Linearizations

According to Taylor’s theorem the linear function $\hat f(x) = f(a) + \nabla f(a)\cdot(x - a)$ approximates f around a with the error term $R_1(x)$

$\hat f(x)$ is the linearization of f at a, or a first order approximation

if we have a relation y = f(x), then around a the change of y is given by the linear term, i.e., $y - f(a) \approx \nabla f(a)\cdot(x - a)$

In many economic models log-linearization provides more attractive interpretations than the standard linearization above

reason: expressions are obtained in terms of elasticities

Example: Arrow-Pratt

Deriving the Arrow-Pratt measure of absolute risk aversion (ARA)

Finding an approximation for the certain loss x that a decision maker with utility function u is willing to take instead of a lottery

ε is a random variable with zero mean and variance $\sigma^2$

condition for x at w: $E[u(w + \varepsilon)] = u(w - x)$

by Taylor’s theorem $u(w + \varepsilon) \approx u(w) + \varepsilon u'(w) + \varepsilon^2 u''(w)/2$ (second order approximation) and $u(w - x) \approx u(w) - xu'(w)$ (first order approximation)

observing that $E[\varepsilon u'(w) + \varepsilon^2 u''(w)/2] = 0 + \sigma^2 u''(w)/2$ (note $E[\varepsilon^2] = \sigma^2$) and manipulating gives $x \approx -(\sigma^2/2)[u''(w)/u'(w)]$

the factor $-u''(w)/u'(w)$ is the ARA
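The quality of this approximation can be examined in a case where the exact certain loss is available in closed form. The sketch below assumes, purely for illustration, the utility function u(w) = −exp(−aw) (for which −u''/u' = a) and a lottery ε that equals +σ or −σ with probability 1/2 each; then the condition E[u(w + ε)] = u(w − x) gives x = log(cosh(aσ))/a.

```python
import math

def exact_certain_loss(a, sigma):
    """x solving E[u(w + eps)] = u(w - x) for u(w) = -exp(-a*w) and
    eps = +sigma or -sigma with probability 1/2 each (x does not depend on w)."""
    return math.log(math.cosh(a * sigma)) / a

def arrow_pratt_approx(a, sigma):
    """x ~ (sigma^2 / 2) * ARA, where ARA = -u''/u' = a for this utility."""
    return 0.5 * sigma ** 2 * a

a = 2.0   # hypothetical risk aversion parameter
for sigma in (0.2, 0.1, 0.05):
    print(sigma, exact_certain_loss(a, sigma), arrow_pratt_approx(a, sigma))
```

The two columns should move closer together as σ shrinks, in line with the local nature of the Taylor approximation.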

Log-linearizations

The usual way: make a log-transform and linearize

1. if the original variable is X take the log-transform x = log X, then $X = e^x$ (note X > 0)

2. plug $e^x$ in place of X (if there are several variables do the same for each of them)

3. linearize $f(e^x)$ around $X^* = e^{x^*}$

the linearization gives $\hat f(x) = f(X^*) + X^*f'(X^*)(x - x^*)$

the change of X is now relative, i.e., $x - x^* = \log(X) - \log(X^*) = \log(X/X^*)$ and $X/X^* = 1 + \%\text{change}$

the relative change of y = f(X) has an approximation $[f(X) - f(X^*)]/f(X^*) \approx [X^*f'(X^*)/f(X^*)](x - x^*) = \varepsilon(x - x^*)$, where ε is the elasticity
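The last approximation can be tried out numerically. The sketch below uses a hypothetical function f(X) = 3X^0.3, whose elasticity is 0.3 everywhere, and compares the exact relative change of f with the log-linear approximation for a 1 % increase in X; the function, the point X* = 2 and the helper names are assumptions for illustration.

```python
import math

def elasticity(f, X_star, h=1e-6):
    """X* f'(X*) / f(X*), with f' computed from a central finite difference."""
    df = (f(X_star + h) - f(X_star - h)) / (2 * h)
    return X_star * df / f(X_star)

def loglinear_relative_change(f, X_star, X):
    """Approximate [f(X) - f(X*)] / f(X*) by elasticity * (log X - log X*)."""
    return elasticity(f, X_star) * (math.log(X) - math.log(X_star))

f = lambda X: 3.0 * X ** 0.3       # hypothetical example function
X_star, X = 2.0, 2.02              # a 1 % increase in X
approx = loglinear_relative_change(f, X_star, X)
exact = (f(X) - f(X_star)) / f(X_star)
print(approx, exact)
```

For small relative changes the two numbers should be close, and both are roughly the elasticity times the percentage change in X.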

Log-linearizations

Multivariate case: $X \in \mathbb{R}^n_{++}$, and $f : \mathbb{R}^n_{++} \to \mathbb{R}$, then $\hat f(x) = f(X^*) + \sum_i X_i^*[\partial f(X^*)/\partial X_i](x_i - x_i^*)$ (turn this into relative changes form!)

Examples

$X \approx X^*(1 + x - x^*)$

resource constraint of the economy Y = C + I, $(y - y^*) \approx [C^*/Y^*](c - c^*) + [I^*/Y^*](i - i^*)$ (log-linearize both sides and manipulate)

consumption Euler equation: $1 = R_{t+1}\beta[C_{t+1}/C_t]^{-\gamma}$, log-linearization around the steady state $R^*$, $C^*$, i.e., $R_t = R^*$ and $C_{t+1} = C_t = C^*$, leads to $(c_t - c^*) \approx -[1/\gamma](r_{t+1} - r^*) + (c_{t+1} - c^*)$ (linear relation)


Hessian matrix

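Hessian matrix of a twice differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ at a is the matrix

$$D^2 f(a) = [D_{ij} f(a)] = \begin{pmatrix} \partial^2 f(a)/\partial x_1\partial x_1 & \cdots & \partial^2 f(a)/\partial x_1\partial x_n \\ \partial^2 f(a)/\partial x_2\partial x_1 & \cdots & \partial^2 f(a)/\partial x_2\partial x_n \\ \vdots & \ddots & \vdots \\ \partial^2 f(a)/\partial x_n\partial x_1 & \cdots & \partial^2 f(a)/\partial x_n\partial x_n \end{pmatrix}$$

Taylor’s theorem

$f(x) = f(a) + \nabla f(a)\cdot(x - a) + (1/2)(x - a)\cdot D^2 f(a)(x - a) + R_2(x)$

this gives a second order (quadratic) approximation of f at a

when $f \in C^2(N_\varepsilon(a))$ then $D^2 f(a)$ is symmetric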

Implicit function theorem

When the endogenous (y) and exogenous (x) variables cannot be separated as in f(x, y) = c the following questions arise

does f define y as a continuous function of x (we denote y(x)) around a given point $x^0$ such that $y(x^0) = y^0$ and f(x, y(x)) = c?

how does the change in x affect the corresponding y’s (comparative statics)?

note: there have to be as many equations as there are endogenous variables

The case $f : \mathbb{R}^2 \to \mathbb{R}$ (and $f \in C^1$)

assume $f(x^0, y^0) = c$, a sufficient condition for the existence of an implicit function is that $\partial f(x^0, y^0)/\partial y \neq 0$

If y(x) is the implicit function then its derivative at $x^0$ can be obtained by applying the chain rule

$$y'(x^0) = -\frac{\partial f(x^0, y^0)/\partial x}{\partial f(x^0, y^0)/\partial y}$$

Linear implicit function theorem

Theorem

Assume that $A \in \mathbb{R}^{m\times(n+m)}$, and $A = (A_x, A_y)$, with $A_x \in \mathbb{R}^{m\times n}$, $A_y \in \mathbb{R}^{m\times m}$ such that $A(x, y) = A_x x + A_y y$ (here $(x, y) = \begin{pmatrix} x \\ y \end{pmatrix}$)

If $A_y$ is invertible we obtain from A(x, y) = 0 the function $y(x) = -A_y^{-1}A_x x$ (i.e., $Dy(x) = -A_y^{-1}A_x$)

If we write $A_x dx + A_y dy = 0$ we obtain $dy = -A_y^{-1}A_x dx$,

when only one exogenous variable is changed, then we can use Cramer’s rule to find dy

Implicit function theorem

Theorem

$f : \mathbb{R}^{n+m} \to \mathbb{R}^m$ continuously differentiable on $N_\varepsilon(x^0, y^0)$ for some ε > 0 and $f(x^0, y^0) = 0$, $\det(D_y f(x^0, y^0)) \neq 0$, then there is δ > 0 and a function $y(x) \in C^1(N_\delta(x^0))$ such that

1. $f(x, y(x)) = 0$

2. $y(x^0) = y^0$

3. $D_x y(x^0) = -(D_y f(x^0, y^0))^{-1}D_x f(x^0, y^0)$

IFT: Example 1

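$f(x, y) = \begin{pmatrix} f_1(x, y) \\ f_2(x, y) \end{pmatrix}$, $f_1(x, y) = y_1y_2^2 - x_1x_2 + x_2 - 7$, $f_2(x, y) = y_1 - x_1/y_2 + x_2 - 5$

Let us take $(x^0, y^0) = (-2, 2, 1, 1)$ ($x_1^0 = -2$, $x_2^0 = 2$, $y_1^0 = 1$, $y_2^0 = 1$)

$$D_y f(x^0, y^0) = \begin{pmatrix} \frac{\partial f_1(x^0, y^0)}{\partial y_1} & \frac{\partial f_1(x^0, y^0)}{\partial y_2} \\ \frac{\partial f_2(x^0, y^0)}{\partial y_1} & \frac{\partial f_2(x^0, y^0)}{\partial y_2} \end{pmatrix} = \begin{pmatrix} (y_2^0)^2 & 2y_1^0y_2^0 \\ 1 & x_1^0/(y_2^0)^2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 1 & -2 \end{pmatrix}$$

$$D_x f(x^0, y^0) = \begin{pmatrix} \frac{\partial f_1(x^0, y^0)}{\partial x_1} & \frac{\partial f_1(x^0, y^0)}{\partial x_2} \\ \frac{\partial f_2(x^0, y^0)}{\partial x_1} & \frac{\partial f_2(x^0, y^0)}{\partial x_2} \end{pmatrix} = \begin{pmatrix} -x_2^0 & 1 - x_1^0 \\ -1/y_2^0 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 3 \\ -1 & 1 \end{pmatrix}$$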

IFT: Example 1

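How does a change in $x_1$ affect the endogenous variables y?

$$D_y f(x^0, y^0)\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} + D_{x_1} f(x^0, y^0)\,dx_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

we get

$$\begin{pmatrix} 1 & 2 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} dy_1 \\ dy_2 \end{pmatrix} + \begin{pmatrix} -2 \\ -1 \end{pmatrix}dx_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

by Cramer’s rule

$$dy_1 = \frac{\det\begin{pmatrix} 2 & 2 \\ 1 & -2 \end{pmatrix}}{\det\begin{pmatrix} 1 & 2 \\ 1 & -2 \end{pmatrix}}dx_1 = \frac{-6}{-4}dx_1 = \frac{3}{2}dx_1$$

$$dy_2 = \frac{\det\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}}{\det\begin{pmatrix} 1 & 2 \\ 1 & -2 \end{pmatrix}}dx_1 = \frac{-1}{-4}dx_1 = \frac{1}{4}dx_1$$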
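The comparative statics of Example 1 can be cross-checked without the implicit function theorem by solving f(x, y) = 0 for y at slightly perturbed values of $x_1$ and forming difference quotients. The sketch below uses Newton's method in plain Python; the solver, its tolerances and the step size h are assumptions chosen for illustration.

```python
def f(x1, x2, y1, y2):
    """The two equations of Example 1."""
    return (y1 * y2 ** 2 - x1 * x2 + x2 - 7,
            y1 - x1 / y2 + x2 - 5)

def jac_y(x1, x2, y1, y2):
    """Jacobian of f with respect to (y1, y2)."""
    return ((y2 ** 2, 2 * y1 * y2),
            (1.0, x1 / y2 ** 2))

def solve_y(x1, x2, y_start=(1.0, 1.0), tol=1e-13, max_iter=50):
    """Newton's method for f(x, y) = 0 in y, started at y0 = (1, 1)."""
    y1, y2 = y_start
    for _ in range(max_iter):
        f1, f2 = f(x1, x2, y1, y2)
        if abs(f1) + abs(f2) < tol:
            break
        (a, b), (c, d) = jac_y(x1, x2, y1, y2)
        det = a * d - b * c
        # Newton step: solve J * step = -f by Cramer's rule
        y1 += (-f1 * d + b * f2) / det
        y2 += (-a * f2 + c * f1) / det
    return y1, y2

h = 1e-6
y_plus = solve_y(-2.0 + h, 2.0)
y_minus = solve_y(-2.0 - h, 2.0)
dy1_dx1 = (y_plus[0] - y_minus[0]) / (2 * h)
dy2_dx1 = (y_plus[1] - y_minus[1]) / (2 * h)
print(dy1_dx1, dy2_dx1)
```

The difference quotients come out near 1.5 and 0.25, which is what solving the linear system $D_y f\,dy = -D_{x_1} f\,dx_1$ at $(x^0, y^0)$ gives.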

Example 2

Nonlinear IS-LM

$Y - C(Y - T) - I(r) - G = 0$,

$M(Y, r) - M^s = 0$,

$C'(\cdot) \in (0, 1)$, $I'(r) < 0$, $M_r = \partial M/\partial r < 0$, $M_Y = \partial M/\partial Y > 0$ (why?)

write this as $f(G, T, M^s, Y, r) = 0$

assume that G, T, $M^s$ are exogenous and there is an equilibrium around which

$$D_{(Y,r)} f(\cdot) = \begin{pmatrix} 1 - C'(\cdot) & -I'(r) \\ M_Y(\cdot) & M_r(\cdot) \end{pmatrix}$$

we get $\det(D_{(Y,r)} f(\cdot)) = [1 - C'(\cdot)]M_r(\cdot) + I'(r)M_Y(\cdot) < 0$, i.e., (Y, r) can be defined as functions of the exogenous variables around the equilibrium

Example 2

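What happens to the equilibrium when $M^s$ is kept fixed and G and T are raised equally much (“dT = dG”)?

$$\begin{pmatrix} 1 - C'(\cdot) & -I'(r) \\ M_Y(\cdot) & M_r(\cdot) \end{pmatrix}\begin{pmatrix} dY \\ dr \end{pmatrix} = \begin{pmatrix} dG - C'(\cdot)dT \\ 0 \end{pmatrix}$$

Cramer’s rule:

$$dY/dG = \frac{\det\begin{pmatrix} 1 - C'(\cdot) & -I'(r) \\ 0 & M_r(\cdot) \end{pmatrix}}{\det(D_{(Y,r)} f(\cdot))} > 0$$

$$dr/dG = \frac{\det\begin{pmatrix} 1 - C'(\cdot) & 1 - C'(\cdot) \\ M_Y(\cdot) & 0 \end{pmatrix}}{\det(D_{(Y,r)} f(\cdot))} > 0$$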


Total derivative

Let $f : \mathbb{R}^n \to \mathbb{R}$. What does the following expression mean?

$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}dx_i \qquad (1)$$

heuristically, we think about relations between infinitesimals

the benefit is that we can easily account for relations like $x_1^2 = x_2x_3$, i.e., $2x_1dx_1 = x_3dx_2 + x_2dx_3$

the expression is called the total derivative or differential form

Formally (1) can be defined at $p \in \mathbb{R}^n$ by thinking of the $x_i$’s as functions on a neighborhood of p and

$$f'(p; v) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}x_i'(p; v)$$

Total derivative

The result obtained in example 2 using total derivatives can be obtained by assuming that T(G) satisfies T′(G) = 1 or T = G + c

$$\begin{pmatrix} dY/dG \\ dr/dG \end{pmatrix} = -[D_{(Y,r)} f(G, T(G), M^s, Y, r)]^{-1}D_G f(\cdot) = -\frac{1}{\det(D_{(Y,r)} f(G, T(G), M^s, Y, r))}\begin{pmatrix} M_r(\cdot) & I'(r) \\ -M_Y(\cdot) & 1 - C'(\cdot) \end{pmatrix}\begin{pmatrix} C'(\cdot) - 1 \\ 0 \end{pmatrix}$$

Inverse function theorem

Theorem

$f : \mathbb{R}^n \to \mathbb{R}^n$ continuously differentiable on an open set S. If $\det(Df(x^0)) \neq 0$ then the inverse function $f^{-1}$ exists on an open neighborhood of $f(x^0)$ (f is a bijection around $x^0$) and $Df^{-1}(f(x^0)) = (Df(x^0))^{-1}$

checking directly that a given nonlinear function is a bijection is extremely difficult

for local bijectivity we only need to show the non-singularity of the derivative, which is an easy task


Local stability of steady-states

non-linear dynamic systems, stability by linearization

SB: 23.2, 25.4.

Dynamical systems

time (discrete or continuous) and a state (in state space)

rule describing the evolution of the state: how the future state follows from past states

Nonlinear systems

→ it may be impossible to find an analytic solution

→ instead of finding solutions the main interest is in steady states and their stability

Difference equations

A sequence $\{x^k\}_{k=0}^{\infty} \subset \mathbb{R}^n$ satisfies a difference equation of order T if $G(x^k, x^{k-1}, \ldots, x^{k-T}) = 0$ (for all k ≥ T), where $G : \mathbb{R}^{(T+1)n} \to \mathbb{R}^n$

when a difference equation is given in a first order form we have a recursive presentation, i.e., $x^{k+1} = F(x^k)$

in this case the solution of a difference equation is the sequence that is obtained from the recursion when the initial value $x^0$ is given

the recursion is also called the state transition equation, law of motion or dynamic system

Example: National income

period k national income $Y_k$ satisfies $Y_k = C_k + I_k + G_k$, where $C_k$ is consumer expenditure, $I_k$ is investment, and $G_k$ is public expenditure

assume $C_k = \alpha Y_{k-1}$, $G_k = G_0$ for all k, $I_k = \beta(C_k - C_{k-1})$

we obtain a difference equation $Y_k = \alpha Y_{k-1} + \beta(C_k - C_{k-1}) + G_0 = \alpha(1 + \beta)Y_{k-1} - \alpha\beta Y_{k-2} + G_0$

this is a second order equation

by choosing $x_k = Y_k$ and $z_k = Y_{k-1}$ as variables we get the difference equation in the recursive form

$x_k = \alpha(1 + \beta)x_{k-1} - \alpha\beta z_{k-1} + G_0$

$z_k = x_{k-1}$
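The recursive form is convenient for simulation. The following sketch iterates the two-variable system for hypothetical parameter values (α = 0.8, β = 0.5, G0 = 10 and the starting incomes are assumptions, not taken from the slides) and compares the end of the path with the steady state $Y^* = G_0/(1 - \alpha)$, which follows from setting $Y_k = Y_{k-1} = Y_{k-2} = Y^*$.

```python
def simulate_income(alpha, beta, G0, Y0, Y1, periods):
    """Iterate x_k = alpha*(1+beta)*x_{k-1} - alpha*beta*z_{k-1} + G0, z_k = x_{k-1}."""
    x, z = Y1, Y0                      # x = Y_k, z = Y_{k-1}
    path = [Y0, Y1]
    for _ in range(periods):
        x, z = alpha * (1 + beta) * x - alpha * beta * z + G0, x
        path.append(x)
    return path

alpha, beta, G0 = 0.8, 0.5, 10.0       # hypothetical parameter values
Y_star = G0 / (1 - alpha)              # steady state: Y* = alpha*Y* + G0
path = simulate_income(alpha, beta, G0, Y0=40.0, Y1=42.0, periods=200)
print(path[:6], path[-1], Y_star)
```

For these parameter values the eigenvalues of the linear system are complex with modulus below one, so the simulated income should oscillate around the steady state while converging to it.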

Other examples

Economic growth

production function $f(K_k, L_k)$, $K_k$ capital, $L_k$ labor, $C_k$ consumption, δ decay of capital

$K_{k+1} = f(K_k, L_k) + (1 - \delta)K_k - C_k$

Natural resources

resource stock $s_k$, harvest of the resource $x_k$, growth function f(s), dynamics $s_{k+1} = f(s_k) - x_k$

First order difference equations

Definition

$x^* \in \mathbb{R}^n$ is a steady state (equilibrium) of a difference equation $x^{k+1} = F(x^k)$, $F : \mathbb{R}^n \to \mathbb{R}^n$ if $x^* = F(x^*)$

a non-linear difference equation can be linearized around a steady state: $F(x) \approx F(x^*) + DF(x^*)(x - x^*)$, with the change of variables $z = x - x^*$ we obtain a linear difference equation $z^{k+1} = Az^k$ with $A = DF(x^*)$

→ it is possible to analyze the local stability of an equilibrium by analyzing the resulting linear difference equation

Stability

Definition

$x^*$ is locally asymptotically stable if there is ε > 0 such that $x^0 \in N_\varepsilon(x^*)$ implies that $x^k \to x^*$ as k → ∞, where $x^k = F(x^{k-1})$ for k ≥ 1

$x^*$ is globally (as.) stable if $x^k \to x^*$ for all $x^0$

for a linear difference equation $z^{k+1} = Az^k$ we are usually interested in the stability of the origin

Theorem

If $I - DF(x^*)$ is non-singular and the eigenvalues of $DF(x^*)$ are inside the unit circle then $x^*$ is locally asymptotically stable

Nonlinear differential equations

Consider a first order autonomous nonlinear system $\dot x = F(x)$, where $F : \mathbb{R}^n \to \mathbb{R}^n$

the general solution and the solution to an initial value problem are defined similarly as for linear systems and difference equations

Definition

$x^*$ is a steady state (equilibrium) of the system $\dot x = F(x)$ if $F(x^*) = 0$

the equilibrium need not be unique


Stability

Definition

$x^*$ is an asymptotically stable equilibrium of $\dot x = F(x)$ on $S \subseteq \mathbb{R}^n$, if $x(t; x^0) \to x^*$ when t → ∞ for all $x^0 \in S$

Here $x(t; x^0)$ is the solution of the initial value problem for $x^0$

If there is ε > 0 such that $x^*$ is stable on $N_\varepsilon(x^*)$ then $x^*$ is locally stable

If $x^*$ is stable for all possible initial states $x^0$, then $x^*$ is globally stable

Example: growth model $\dot k = sf(k) - (1 + \delta)k$, where k is wealth per capita and f is a production function

questions: when is there an equilibrium and when is it stable?

the number of equilibria may also be of interest

Analyzing stability

Theorem

the equilibrium $x^*$ of a system $\dot x = F(x)$ is locally asymptotically stable if the real parts of the eigenvalues of $DF(x^*)$ are negative

if there is one eigenvalue with positive real part, then the equilibrium is unstable

the qualitative behavior of the solution is characterized by the linearized system, note: $F(x) \approx F(x^*) + DF(x^*)(x - x^*) = DF(x^*)(x - x^*)$, with the translation of origin $y = x - x^*$ we get a linear system $\dot y = DF(x^*)y$

for n = 1 the condition is that $F'(x^*) < 0$

Analyzing stability

Example: population model for two species

$\dot x_1 = x_1(4 - x_1 - x_2)$

$\dot x_2 = x_2(6 - x_2 - 3x_1)$

finding the equilibria: (i) $(x_1, x_2) = (0, 0)$, (ii) $(x_1, x_2) = (0, 6)$, (iii) $(x_1, x_2) = (4, 0)$, (iv) $(x_1, x_2) = (1, 3)$

computation of the Jacobian

$$\begin{pmatrix} 4 - 2x_1 - x_2 & -x_1 \\ -3x_2 & 6 - 2x_2 - 3x_1 \end{pmatrix}$$

eigenvalues: (i) $\lambda_{1,2} = 4, 6$, (ii) $\lambda_{1,2} = -2, -6$, (iii) $\lambda_{1,2} = -4, -6$, (iv) $\lambda_{1,2} = -2 \pm \sqrt{10}$
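The eigenvalue computations and the resulting stability classification can be reproduced with a few lines of code. This is a minimal sketch using the closed-form eigenvalues of a 2 × 2 matrix; the function names and the wording of the labels are assumptions made for illustration.

```python
import cmath

def jacobian(x1, x2):
    """Jacobian of the two-species system at (x1, x2)."""
    return ((4 - 2 * x1 - x2, -x1),
            (-3 * x2, 6 - 2 * x2 - 3 * x1))

def eigenvalues_2x2(J):
    """Roots of lambda^2 - trace*lambda + det = 0 (possibly complex)."""
    (a, b), (c, d) = J
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def classify(J):
    """Apply the eigenvalue test for local stability of an equilibrium."""
    reals = [lam.real for lam in eigenvalues_2x2(J)]
    if all(r < 0 for r in reals):
        return "locally asymptotically stable"
    if any(r > 0 for r in reals):
        return "unstable"
    return "inconclusive"

equilibria = [(0, 0), (0, 6), (4, 0), (1, 3)]
for eq in equilibria:
    print(eq, eigenvalues_2x2(jacobian(*eq)), classify(jacobian(*eq)))
```

By the theorem, the equilibria (0, 6) and (4, 0) are locally asymptotically stable, while (0, 0) and (1, 3) are unstable, the latter because $-2 + \sqrt{10} > 0$.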


Optimization

Introduction, unconstrained optimization

SB: 17, 19.2

Basic concepts

Definition

Point $x^* \in X$ is a maximum of $f : X \to \mathbb{R}$ on $S \subseteq X$ if $f(x^*) \geq f(x)$ for all $x \in S$

Mathematical programming (or optimization) problem: find a maximum (or a minimum) of a given function $f : X \to \mathbb{R}$ on a set $S \subset X$

function f is the objective function

S is the feasible set, if $x \in S$, then x is a feasible point (solution)

if $x \in S$ solves the maximization problem, it is an optimum

note: the main tool for the existence of an optimum is the Weierstrass theorem

Basic concepts

Definition

Assume that the domain X is a subset of a metric space. Then $x^*$ is a local maximum of the problem if there is ε > 0 such that $f(x^*) \geq f(x)$ for all $x \in N_\varepsilon(x^*) \cap S$

Local maxima and minima are called (local) extrema

If $f(x^*) > f(x)$ for all $x \in N_\varepsilon(x^*) \cap S$ and $x \neq x^*$, then $x^*$ is a strict local maximum (strict global maximum respectively)

Notations

The maximization problem is denoted as $\max_{x\in S} f(x)$ and the minimization problem as $\min_{x\in S} f(x)$

If f is maximized over x while y are fixed (exogenous) we denote: $\max_x f(x, y)$

Note: if x is a minimum of f, then it is a maximum of −f (wlog we may consider maximization problems)

If the supremum of f over S is attained at a point $x^*$, then $f(x^*) = \max_{x\in S} f(x) = \sup_{x\in S} f(x)$

Optimization problems

When S = X, i.e., the feasible set is the whole domain of f, the problem is unconstrained, otherwise the problem is constrained

The general optimization problem (for $X \subseteq \mathbb{R}^n$) is a nonlinear programming problem

When the problem involves time, it is a dynamic optimization problem

when there is no time involved, the problem is static

Optimization problems: NLP

Nonlinear programming problems where $S = \{x \in \mathbb{R}^n : g_1(x) \leq 0, \ldots, g_k(x) \leq 0 \text{ and } h_1(x) = 0, \ldots, h_m(x) = 0\}$, $g_i : \mathbb{R}^n \to \mathbb{R}$, $h_j : \mathbb{R}^n \to \mathbb{R}$, i = 1, . . . , k, j = 1, . . . , m

the inequalities $g_1(x) \leq 0, \ldots, g_k(x) \leq 0$ are inequality constraints (note that $g_i(x) \geq 0 \Leftrightarrow -g_i(x) \leq 0$), and the constraints $h_1(x) = 0, \ldots, h_m(x) = 0$ are equality constraints

if $f, g_1, \ldots, g_k, h_1, \ldots, h_m$ are linear (or affine) the problem is a linear programming problem

Unconstrained problems: FOC

Theorem

Let $X \subseteq \mathbb{R}^n$ be an open set, and $f : X \to \mathbb{R}$ be a differentiable function on X. If $x^*$ is a local extremum of f, then $\nabla f(x^*) = 0$.

Note: the converse is not necessarily true, i.e., $\nabla f(x^*) = 0$ is a necessary but not sufficient optimality condition, we call $\nabla f(x^*) = 0$ the first order optimality condition (FOC)

The points at which ∇f (x ) = 0 are critical points of f

For n = 1 the FOC is simply f ′(x ∗) = 0

Second order optimality conditions

f twice differentiable on X

2nd order necessary condition: if $x^*$ is a local maximum, then $D^2 f(x^*)$ is negative semidefinite

2nd order sufficient condition (SOC): if $x^*$ is a critical point and $D^2 f(x^*)$ is neg. def. then $x^*$ is a strict local maximum

note: for n = 1, $D^2 f(x^*) = f''(x^*)$, i.e., the necessary condition is $f''(x^*) \leq 0$ (i.e., $f'$ is decreasing at $x^*$)

note: for a minimization problem replace neg. (sem.) def. with pos. (sem.) def.


Least squares problems

Data, observations $(y_1, x_{11}, x_{21}, \ldots, x_{k1}), \ldots, (y_n, x_{1n}, \ldots, x_{kn})$

Linear model $y_i = \beta_1x_{1i} + \cdots + \beta_kx_{ki} + \varepsilon_i$

independent and identically distributed observation error $\varepsilon_i$, i = 1, . . . , n

unknown parameter β (vector)

more generally we may have a (nonlinear) model $y_i = f(x^i, \beta) + \varepsilon_i$, where β is a parameter vector

the linear model can be written as a system $y = X\beta + \varepsilon$, where $y, \varepsilon \in \mathbb{R}^n$, $X \in \mathbb{R}^{n\times k}$ and $\beta \in \mathbb{R}^k$

Least squares problems

Least squares problem $\min_\beta\,(y - X\beta)^T(y - X\beta)$

Solution by FOC and SOC

FOC: $\nabla f(\beta) = \nabla_\beta (y - X\beta)^T(y - X\beta) = 0$, now $\nabla f(\beta) = -2X^Ty + 2X^TX\beta$, solving for β gives $\beta = (X^TX)^{-1}X^Ty$

SOC: $D^2 f(\beta) = 2X^TX$ which is pos. sem. def. (why?)
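The formula obtained from the FOC can be tried on a small data set. The sketch below solves the normal equations $(X^TX)\beta = X^Ty$ for a model with a constant and one regressor, using plain Python; the data values and function names are hypothetical and serve only as an illustration.

```python
def normal_equations_2(X, y):
    """Solve (X^T X) beta = X^T y for a model with two regressors."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b        # nonzero when the columns of X are linearly independent
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

def ssr(X, y, beta):
    """Objective (y - X beta)^T (y - X beta)."""
    return sum((yi - r[0] * beta[0] - r[1] * beta[1]) ** 2 for r, yi in zip(X, y))

# hypothetical data: first column is a constant, second a single regressor
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [1.1, 2.9, 5.2, 6.8]
beta = normal_equations_2(X, y)
print(beta, ssr(X, y, beta))
```

Moving β away from the computed value in any direction should not lower the sum of squared residuals, which is what the second order condition expresses.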

Envelope theorem

Consider a problem $\max_{x\in\mathbb{R}^n} f(x, a)$, where $a \in \mathbb{R}^m$ is exogenous

How does the optimal f vary as a is changed?

what are the partial derivatives $\partial f(x(a), a)/\partial a_i$, i = 1, . . . , m?

i.e., we are interested in comparative statics of the value function $v(a) = \max_x f(x, a)$

Envelope theorem

Theorem

Assume that the optimum of f is unique in the neighborhood of $a^*$, f is differentiable at $(x(a^*), a^*)$, and x(a) is a differentiable function at $a^*$. It follows that $dv(a^*)/da_i = \partial f(x(a^*), a^*)/\partial a_i$ for all i = 1, . . . , m, where v is the value function

we denote $Dv(a^*) = D_a f(x(a^*), a^*)$ (the total derivative of v is the partial derivative of f w.r.t. a)

when is x(a) a differentiable function?

it is possible to get qualitative results without actually finding x(a)

partial derivatives are easier to compute than total derivatives

Example

$f(x, a) = -x^2 + 2ax + 4a^2$

Optimum by setting the gradient to zero: x(a) = a

Plug the solution into the objective function and differentiate w.r.t. a

$g(a) = f(x(a), a) = 5a^2$, hence $dg(a)/da = df(x(a), a)/da = 10a$

Same result by using the envelope theorem: $\partial f(x(a), a)/\partial a = 2x + 8a = 10a$
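The two routes to 10a can also be compared numerically with finite differences. In this sketch the step size h and the parameter value a = 1.5 are arbitrary assumptions; the first derivative differentiates the value function, the second differentiates f in a while holding x fixed at the optimum.

```python
def f(x, a):
    return -x ** 2 + 2 * a * x + 4 * a ** 2

def x_opt(a):
    return a                      # from the FOC -2x + 2a = 0

def value(a):
    """Value function v(a) = f(x(a), a)."""
    return f(x_opt(a), a)

def dvalue_da(a, h=1e-6):
    """Total derivative of the value function by central differences."""
    return (value(a + h) - value(a - h)) / (2 * h)

def partial_f_a(x, a, h=1e-6):
    """Partial derivative of f w.r.t. a, holding x fixed."""
    return (f(x, a + h) - f(x, a - h)) / (2 * h)

a = 1.5                           # hypothetical parameter value
print(dvalue_da(a), partial_f_a(x_opt(a), a), 10 * a)
```

All three printed numbers should agree up to rounding, as the envelope theorem predicts.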

Example: marginal costs

Assume that a firm minimizes costs of labor (L) and capital (K) when producing a given amount q

production function f(K, L), i.e., output q = f(K, L)

unit costs of labor and capital $w_L$ and $w_K$

problem $\min_{L,K} w_LL + w_KK$ s.t. q = f(K, L)

eliminate labor from q = f(K, L) to obtain the (short run optimal) cost function c(K, q)

In the long run capital is chosen such that the costs are minimized ⇒ K(q), in the short run capital is fixed

c(K, q) is the short run optimal cost

Long run and short run marginal costs are the same

by the envelope theorem $dc(K(q), q)/dq = \partial c(K(q), q)/\partial q$

Example: marginal costs

Test with production function √(KL), where K is capital and L is labor

set unit costs of labor and capital = 1

the cost of inputs K and L is K + L, and output is q = √(KL), from which we get L = q^2/K and c(K, q) = q^2/K + K

long run optimum K(q) = q, at which c(K(q), q) = 2q and dc(K(q), q)/dq = 2

Observations:

the graph of the function c(K, q) is above c(K(q), q) except at K = q (optimum)

at K = q the tangents of the functions are the same
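As a quick check of the envelope theorem in this example (a worked verification that uses only the functions defined on this slide), the partial derivative of the short run cost with respect to q, evaluated at the long run capital level, should coincide with the long run marginal cost, and the short run cost should never lie below the long run cost:

```latex
\[
\frac{\partial c(K,q)}{\partial q} = \frac{2q}{K},
\qquad
\left.\frac{\partial c(K,q)}{\partial q}\right|_{K=K(q)=q} = 2
= \frac{d}{dq}\,(2q) = \frac{d}{dq}\,c(K(q),q),
\]
\[
c(K,q) - c(K(q),q) = \frac{q^2}{K} + K - 2q = \frac{(q-K)^2}{K} \ge 0,
\quad \text{with equality only at } K = q .
\]
```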


Convex analysis

Convex sets, concave and quasiconcave functions

SB: 21, 22.1

Convex sets

Definition

Let V be a vector space. The set X ⊂ V is convex if λx + (1 − λ)y ∈ X ∀x, y ∈ X and λ ∈ [0, 1]

vector λx + (1 − λ)y is a convex combination of x and y

X is convex if and only if y = ∑_{i=1}^k λ_i x_i ∈ X for all k ≥ 2, x_i ∈ X, i = 1, . . . , k, ∑_i λ_i = 1 and λ_i ∈ [0, 1], i = 1, . . . , k

y is called a convex combination of x_1, . . . , x_k

a set X is strictly convex if X is convex and λx + (1 − λ)y ∉ ∂X for any x, y ∈ X, x ≠ y, λ ∈ (0, 1)

Convexity

Operations that preserve convexity

Closure and interior of a convex set are convex

Intersection of convex sets is convex

Image of a convex set under a linear map is convex

Cartesian product of convex sets is convex

Theorem

(Minimum distance theorem) Let S ⊂ R^n be a closed convex set and y ∉ S. Then there is a unique point x̄ ∈ S with minimum distance from y. A necessary and sufficient condition for x̄ to be the minimizer is (y − x̄) · (x − x̄) ≤ 0 for all x ∈ S

Convexity

Examples

hyperplanes p · x = c, (open and closed) half spaces p · x ≤ c (also p · x < c, p · x > c), polyhedral sets = intersections of half spaces, ellipsoids x · Vx ≤ c (V pos. def.), solutions of linear equations, simplex

Note: sets {x}, ∅, and R^n are convex

Every closed convex set can be presented as an intersection of closed half spaces

Convex and concave functions

Definition

Let V be a vector space and X ⊂ V a convex subset of V, function f : X → R is concave if

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) ∀x, y ∈ X, λ ∈ [0, 1]

Function f is strictly concave if

f(λx + (1 − λ)y) > λf(x) + (1 − λ)f(y) ∀x, y ∈ X, x ≠ y, λ ∈ (0, 1)

If −f is (strictly) concave then f is (strictly) convex

Concave functions

Hypograph of a function f : R^n → R is hyp f = {(x, µ) ∈ R^(n+1) : f(x) ≥ µ}

hyp f is the set of points below the graph of a function; respectively, the set above the graph is the epigraph (epi f)

f is a concave (convex) function if and only if hyp f (epi f) is a convex set

Examples of concave functions

ln(x), −e^x, x^α when α ∈ [0, 1], −|x|^p when p ≥ 1, min{x_1, . . . , x_n}, x · Vx when V is neg. sem. def.

Properties of concave functions

Upper level sets U(f; α) = {x ∈ X : f(x) ≥ α} are convex sets

Concave function is locally Lipschitz continuous on the interior of its domain

Extra: concave function is almost everywhere differentiable on the interior of its domain

If V = R^n and the function is bounded, concavity is equivalent to f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) holding for all x, y with some fixed λ ∈ (0, 1)

f is concave if and only if g_{x,y}(λ) = f(λx + (1 − λ)y) is concave on [0, 1] for all x and y

Operations that preserve concavity

Nonnegatively weighted sum of concave functions ∑_i w_i f_i(x), w_i ≥ 0, is concave

Minimum g(x) = min_{α∈A} f(x, α) is concave when f(x, α) is concave in x for all α ∈ A

Composite function f(x) = h(g(x)), g = (g_1, . . . , g_m) : R^n → R^m, h : R^m → R, f is concave in the following cases

1. when h is concave, non-decreasing and g_i is concave for all i

2. when h is concave, non-increasing and g_i is convex for all i

3. when h is concave and g is linear/affine (g(x) = Ax + b)


Examples

Model of a firm

input vector x ∈ R^n

production function f : R^n_+ → R_+ = {y ∈ R : y ≥ 0} that is concave and increasing in all its arguments

cost function c : R^n_+ → R, convex and increasing in all its arguments

price p > 0, profits pf(x) − c(x) defined for x ≥ 0

Expenditure function e(p) = min_{x∈X} p · x is concave

p ∈ R^n is a vector of prices of inputs x ∈ X ⊂ R^n_+, and X is convex

Differentiable concave functions

Graph of a concave function is below tangent plane:

f(y) − f(x) ≤ ∇f(x) · (y − x), ∀x, y ∈ X

Theorem

Differentiable function is (strictly) concave if and only if its gradient is (strictly) monotone

for f : R → R, f′(x) is decreasing

for f : R^n → R we have [∇f(x) − ∇f(y)] · (x − y) ≤ 0 for all x, y ∈ R^n

strict monotonicity: strict inequality for x ≠ y

Differentiable concave functions

Theorem

Twice differentiable function f is concave if and only if its Hessian matrix D^2 f(x) is negative semidefinite (on the domain of f)

note 1: this is the main tool for analyzing concavity

note 2: negative semidefiniteness implies monotonicity of ∇f

note 3: positive (semi)definiteness ⇔ convexity

note 4: for f : R → R concavity ⇔ f′′(x) non-positive for all x

Hessian matrix of a strictly concave function is not necessarily neg. def. (e.g., f(x) = −x^4 at x = 0), but if it is, then the function is strictly concave
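The Hessian test can be carried out numerically by looking at eigenvalues. The sketch below evaluates two Hessians at single points chosen for illustration (the points are assumptions): the Hessian of ln x_1 + ln x_2, which is negative definite for x ≫ 0, and the Hessian of x_1^2 x_2 at (1, 1), which is indefinite. A check at one point does not prove concavity on the whole domain; it only illustrates the definiteness test.

```matlab
% Hessian of u(x1,x2) = log(x1) + log(x2) at a hypothetical point x = (1/4, 1/4)
x  = [0.25; 0.25];
Hu = [-1/x(1)^2, 0; 0, -1/x(2)^2];
is_nsd_u = all(eig(Hu) <= 0);        % negative semidefinite at this point

% Hessian of f(x1,x2) = x1^2*x2 at the point (1,1)
Hf = [2 2; 2 0];
ev = eig(Hf);                        % eigenvalues 1 - sqrt(5) and 1 + sqrt(5)
is_indef = any(ev > 0) && any(ev < 0);
```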

Concave optimization

An optimization problem max_{x∈S} f(x) is concave (or convex) if f is a concave function and S is a convex set

when S = {x ∈ R^n : g(x) ≤ 0, h(x) = 0}, g = (g_1, . . . , g_k) and h(x) = Ax + b, then S is a convex set when g_i(x), i = 1, . . . , k, are convex functions

Theorem

Local maximum of a concave function is a global maximum

first order conditions give an optimum

Concave optimization

Theorem

The set of solutions of a concave problem is a convex set

The set of maximizers over S is denoted as argmax_{x∈S} f(x)

If f is strictly concave and the problem has a solution then it is unique

Note: concavity does not guarantee existence

Theorem

When f is concave and differentiable, then x* is an optimum if and only if ∇f(x*) · (x − x*) ≤ 0 for all x ∈ S

Variational inequality of concave optimization

Necessary and sufficient optimality condition for a concave problem

Parametric optimization and concavity

Function v(a) = max_{x∈X(a)} f(x, a) (value function) is concave when f is concave in (x, a) and X(a) is a graph convex correspondence

Graph convexity: λX(a_1) + (1 − λ)X(a_2) ⊆ X(λa_1 + (1 − λ)a_2)

e.g., X(a) = {x ∈ R^n : g_i(x, a) ≤ 0, i = 1, . . . , k} is graph convex when g_i are convex functions in (x, a)

Quasiconcave functions

Definition

Let X be a convex subset of a vector space V, function f : X → R is quasiconcave if f(λx + (1 − λ)y) ≥ min{f(x), f(y)} ∀x, y ∈ X and λ ∈ [0, 1]

Strict quasiconcavity: f(λx + (1 − λ)y) > min{f(x), f(y)}, x ≠ y, λ ∈ (0, 1)

(Strict) quasiconcavity is equivalent with U(f, α) (upper level set) being a (strictly) convex set for all α ∈ R

If −f is (strictly) quasiconcave, then f is (strictly) quasiconvex

Quasiconcave functions

Properties of a quasiconcave function f : Rn 7→ R

1. a differentiable function f on an open convex set X is quasiconcave if and only if f(y) ≥ f(x) ⇒ ∇f(x) · (y − x) ≥ 0 for all x, y ∈ X

2. if y · ∇f(x) = 0 ⇒ y · D^2 f(x)y ≤ 0 for all x, y ∈ X, then f is quasiconcave (for strictly quasiconcave functions we need "< 0")


Quasiconcavity and transformations

Monotone transformation of a quasiconcave function is quasiconcave

i.e., f(x) = h(g(x)), where g : R^n → R is quasiconcave and h : R → R is (strictly) increasing

⇒ if a function is a monotone transformation of a concave function then it is quasiconcave

If a monotone transformation of a function is concave then the original function is quasiconcave

if h(f(x)) is concave with increasing h, then f is quasiconcave

important specific case: log-concave functions, h = ln

Quasiconcavity and transformations

Other operations that preserve quasiconcavity

minimization: g(x) = min_y f(x, y) (f quasiconcave as a function of x)

composite function f(x) = h(Ax + b) is quasiconcave when h is quasiconcave (here x ∈ R^n and Ax + b ∈ R^m)

Examples

Cobb-Douglas utility functions U(x) = x_1^(α_1) · · · x_n^(α_n), when x ≥ 0 and α_i > 0, i = 1, . . . , n

C-D functions are log-concave, they are concave when ∑_i α_i < 1 and α_i ∈ (0, 1) for all i

CES utility functions U(x) = [∑_i α_i x_i^r]^(1/r), when x ≥ 0, α_i > 0, i = 1, . . . , n, r ≤ 1

Utility functions

Preference relations are often described by utility functions

decision maker prefers a to b if u(a) ≥ u(b)

in the ordinal theory of utility the measurement unit of utility is not essential; an increasing transformation of u describes the same preference relation

A property of a function that can (cannot) be preserved by an increasing transformation is an ordinal (cardinal) property

quasiconcavity is an ordinal while concavity is a cardinal property

diminishing marginal rate of substitution, i.e., MRS_xy = [∂u(x, y)/∂x]/[∂u(x, y)/∂y] (marginal rate of substitution between x and y) decreasing, is an ordinal property

diminishing marginal utility, i.e., MU_x = ∂u(x, y)/∂x decreasing, is a cardinal property

Quasiconcave optimization

max_{x∈S} f(x) is a quasiconcave optimization problem when f is a quasiconcave function and S is a convex set

A quasiconcave optimization problem may have local optima which are not global optima

argmax_{x∈S} f(x) is a convex set

if f is strictly quasiconcave and the problem has a global optimum, then it is unique

Theorem

If ∇f(x) · (y − x) < 0 for all y ∈ S, y ≠ x, then x ∈ S is a global maximum of a quasiconcave optimization problem max_{x∈S} f(x)

v(a) = max_{x∈S} f(x, a) is quasiconcave when f is quasiconcave in (x, a) and S is a convex set

Example: consumer’s problem

Variables

n commodities, xi amount of commodity i , price pi , income I

Objective function = utility function u(x )

u is usually assumed to be quasiconcave

Constraints

budget constraint p_1 x_1 + p_2 x_2 + . . . + p_n x_n ≤ I and non-negativity x_i ≥ 0, i = 1, . . . , n

Consumer’s problem: for given p and I find the utility maximizing commodity bundle

the solution is the consumer’s demand, if it is unique then it is a demand function

Questions: existence of demand functions, continuity, etc


Nonlinear programming

constrained optimization

SB: 18

NLP problems

NLP problem max f(x) s.t. (subject to) g_i(x) ≤ 0, i = 1, . . . , k, h_j(x) = 0, j = 1, . . . , m

we also denote g(x) ≤ 0 (inequality constraints in vector form) and h(x) = 0 (equality constraints)

if x satisfies the constraints it is feasible

if g_i(x) = 0, then the ith inequality constraint is active (binding) at x

when we denote max_x f(x, a) we refer to a problem in which a is exogenous

Inequality constraints into equality constraints

Adding slack variables

way to transform an inequality constraint to an equality constraint, e.g., by adding a variable s we can write the problem max f(x) s.t. g(x) ≤ 0 as the equality constrained problem max_(x,s) f(x) s.t. g(x) + s^2 = 0

with this trick it is actually possible to derive most of the results for inequality constrained problems from the theory of equality constrained problems, otherwise the trick is seldom of any use

Note: replacing an equality constraint with two inequality constraints does not work (the problem is that the Jacobians of the constraints become linearly dependent)

In many inequality constrained problems, it is possible to detect active constraints from the structure of the problem

Nonlinear programming

Example: consumer’s problem

max_x u(x) s.t. p_1 x_1 + . . . + p_n x_n ≤ I and x_i ≥ 0, i = 1, . . . , n

when u is increasing in all of its arguments, then the budget constraint is active at optimum (when prices are positive). Moreover, often u(x) = 0 when x_i = 0 for some i and u(x) > 0 when x ≫ 0, which means that at optimum x ≫ 0

Feasible points of inequality constrained problems

if g_i(x) = 0 for some i = 1, . . . , k, k > 1, we say that x is a corner point, otherwise x is an interior point

respectively, when x is an optimum, we say that it is a corner solution or an interior solution

First order conditions: assumptions

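Lagrange function L(x, λ, µ) = f(x) − ∑_i λ_i g_i(x) − ∑_j µ_j h_j(x)

λ ∈ R^k is a vector of Lagrange multipliers for inequality constraints and µ ∈ R^m contains the Lagrange multipliers of equality constraints

Assumptions: x* is a local optimum, f, g, h are differentiable at x*, the first k_0 inequality constraints are active at x*, and the following matrix has full rank

\[
\begin{pmatrix}
\partial g_1(x^*)/\partial x_1 & \cdots & \partial g_1(x^*)/\partial x_n \\
\vdots & \ddots & \vdots \\
\partial g_{k_0}(x^*)/\partial x_1 & \cdots & \partial g_{k_0}(x^*)/\partial x_n \\
\partial h_1(x^*)/\partial x_1 & \cdots & \partial h_1(x^*)/\partial x_n \\
\vdots & \ddots & \vdots \\
\partial h_m(x^*)/\partial x_1 & \cdots & \partial h_m(x^*)/\partial x_n
\end{pmatrix}
\]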

First order conditions

Result: there are Lagrange multipliers such that

∂L(x*, λ, µ)/∂x_i = 0 ∀i = 1, . . . , n, (1)

λ_i g_i(x*) = 0 ∀i = 1, . . . , k, (2)

g_i(x*) ≤ 0 ∀i = 1, . . . , k, (3)

h_j(x*) = 0 ∀j = 1, . . . , m, (4)

λ_i ≥ 0 ∀i = 1, . . . , k (5)

In vector notation:

∇_x L(x*, λ, µ) = 0,

λ · g(x*) = 0,

g(x*) ≤ 0,

h(x*) = 0,

λ ≥ 0

Remarks on FOCs

Necessary optimality condition

known as Karush-Kuhn-Tucker conditions

for minimization problems the condition is otherwise the same but the sign of the Lagrange multipliers of inequality constraints ("≤"-constraints) is changed

not all points satisfying FOCs are (local) optima

Works also when there are no constraints at all, or either only equality or inequality constraints

FOCs: nomenclature

The rank condition is known as non-degeneracy constraint qualification (NDCQ)

assuming that the first k_0 inequality constraints are active was done to simplify notation, more generally the matrix can be formed by collecting the active constraints

(1) Lagrangian optimality (x ∗ is a critical point of L)

(2) complementary slackness

(3)–(4) primal feasibility

(5) dual feasibility


Geometric interpretation

Problem max f (x ) s.t. g(x ) ≤ 0

At local maximum, x ∗, there is no feasible ascent direction

d ≠ 0 is a feasible direction at x* if there is λ̄ > 0 such that g(x* + λd) ≤ 0 for all λ ∈ (0, λ̄)

d ≠ 0 is an ascent direction at x* if there is λ̄ > 0 such that f(x* + λd) > f(x*) for all λ ∈ (0, λ̄)

⇒ at a local maximum there is no d ≠ 0 such that ∇g(x*) · d < 0 and ∇f(x*) · d > 0

when ∇g(x*) ≠ 0 this is equivalent to FOCs

Cookbook: solving FOCs

Method 1 (brute force): find all the points that satisfy the FOCs

for equality constrained problems simply find all solutions of the system of equations determined by FOCs

when there are inequality constraints, go through all possible combinations of active constraints; if some combination is not possible it will imply a wrong sign for one of the Lagrange multipliers

Method 2: make an educated guess of active constraints (or even guess the solution)

check your guess by plugging it into FOCs

Example 1

Problem max x_1 − x_2^2 s.t. x_1^2 + x_2^2 = 4, x_1, x_2 ≥ 0

Lagrangian L(x, λ, µ) = x_1 − x_2^2 + λ_1 x_1 + λ_2 x_2 − µ(x_1^2 + x_2^2 − 4)

FOCs:

∂L/∂x_1 = 1 + λ_1 − 2µx_1 = 0

∂L/∂x_2 = −2x_2 + λ_2 − 2µx_2 = 0

x_1^2 + x_2^2 = 4

x_i ≥ 0, i = 1, 2

λ_i x_i = 0, i = 1, 2

λ_i ≥ 0, i = 1, 2

Example 1

Deducing the active constraints

both inequality constraints cannot be active at the same time (x_1 = x_2 = 0 violates x_1^2 + x_2^2 = 4)

if λ_1 = λ_2 = 0 (e.g., when none of the inequality constraints is active) then from ∂L/∂x_2 = 0 we get µ = −1 (⇒ x_1 = −1/2, infeasible!) or x_2 = 0 (⇒ x_1 = 2, µ = 1/4)

Are there other solutions than (x_1, x_2, λ_1, λ_2, µ) = (2, 0, 0, 0, 1/4)? If there were, then x_2 > 0 and λ_2 = 0. If also x_1 > 0, then λ_1 = 0 and by the FOCs µ = −1, which leads to an infeasible x_1. If x_1 = 0, then ∂L/∂x_1 = 0 gives λ_1 = −1 < 0, which has the wrong sign. Hence there are no other solutions
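The candidate (x_1, x_2, λ_1, λ_2, µ) = (2, 0, 0, 0, 1/4) can be checked numerically. The sketch below plugs it into the FOCs and, as a sanity check that is not part of the slides, compares it with a brute-force search over the feasible arc, parametrized as x = (2 cos t, 2 sin t), t ∈ [0, π/2] (the parametrization and grid size are assumptions made for illustration).

```matlab
% Candidate from the FOCs of max x1 - x2^2 s.t. x1^2 + x2^2 = 4, x >= 0
x = [2; 0];  lambda = [0; 0];  mu = 1/4;

% Gradient of the Lagrangian w.r.t. x, should be zero
gradL = [1 + lambda(1) - 2*mu*x(1);
         -2*x(2) + lambda(2) - 2*mu*x(2)];

% Equality constraint residual, should be zero
feas = x(1)^2 + x(2)^2 - 4;

% Brute force over the feasible arc x = 2*(cos t, sin t), t in [0, pi/2]
t = linspace(0, pi/2, 1001);
fvals = 2*cos(t) - (2*sin(t)).^2;
[fmax, idx] = max(fvals);      % maximum is attained at t = 0, i.e. x = (2, 0)
```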

Example 2

f(x) = −(x_1 − 1)^2 − (x_2 − 1)^2 and h(x) = x_1^2 + x_2^2 − 8

Geometric interpretation: minimize the distance from (1, 1) to the circle h(x) = 0

By geometric intuition the solution seems to be (2, 2)

FOCs (with L = f − µh): equality constraint and −2(x_1 − 1) − 2µx_1 = 0, −2(x_2 − 1) − 2µx_2 = 0; by choosing µ = −1/2 the FOCs are satisfied at (2, 2)

Example: Cobb-Douglas consumer

Two commodities, Cobb-Douglas utility function U(x_1, x_2) = x_1^a x_2^(1−a) and budget constraint h(x) = 0, where h(x_1, x_2) = p_1 x_1 + p_2 x_2 − I

L(x, µ) = x_1^a x_2^(1−a) − µ(p_1 x_1 + p_2 x_2 − I)

NDCQ holds for p ≠ 0

FOC: budget constraint and a x_1^(a−1) x_2^(1−a) − µp_1 = 0, (1 − a) x_1^a x_2^(−a) − µp_2 = 0

By eliminating µ we get (1 − a)(x_1/x_2)^a − a(x_1/x_2)^(a−1) (p_2/p_1) = 0, i.e., p_1(1 − a) − p_2 a (x_1/x_2)^(−1) = 0 and eventually p_1 x_1 (1 − a) − p_2 x_2 a = 0

note: we assume x_1, x_2 > 0 at optimum

Example: Cobb-Douglas consumer

Solving the pair of equations: budget equation and the abovecondition

from the budget equation we get p_1 x_1 = I − p_2 x_2, and by plugging this into the other condition we obtain x_2 = (1 − a)I/p_2 and x_1 = aI/p_1

What can you say about the optimality of the solution?
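The closed-form demand can be compared with a direct search along the budget line. This is only a sketch: the parameter values a = 0.3, p = (2, 5) and I = 10 are hypothetical, and the grid search is a numerical sanity check, not a proof of optimality.

```matlab
% Hypothetical parameter values
a = 0.3;  p = [2 5];  I = 10;

% Demand from the FOCs: x1 = a*I/p1, x2 = (1-a)*I/p2
x = [a*I/p(1), (1-a)*I/p(2)];

% Cobb-Douglas utility
U = @(x1, x2) x1.^a .* x2.^(1-a);

% Grid search along the budget line p1*x1 + p2*x2 = I
x1grid = linspace(0.01, I/p(1) - 0.01, 2000);
x2grid = (I - p(1)*x1grid) / p(2);
[Umax, idx] = max(U(x1grid, x2grid));

budget_residual = p*x' - I;     % should be zero
```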

More general FOCs

FOCs can be formulated for problems where the optimized variables are infinite dimensional

formally this works only when the underlying space of optimized variables has particular structure

Example: consumer’s life cycle consumption problem

max ∑_{k=0}^∞ δ^k u(c_k), δ ∈ (0, 1)

period k consumption c_k

initial wealth (exogenous) w_0, interest rate r, dynamics w_{k+1} = (1 + r)(w_k − c_k), where w_k is period k wealth

the difference equation for wealth is considered as a constraint


Example (cont’d)

FOCs

Lagrangian

L(c, w, µ) = ∑_{k≥0} [δ^k u(c_k) − µ_k (w_{k+1} − (1 + r)(w_k − c_k))]

note: optimized variables are c_k, w_{k+1}, k ≥ 0, i.e., there are infinitely many variables to be optimized

differentiate L w.r.t. all variables (c_k’s and w_k’s) and set the differential to zero

δ^k u′(c_k) + µ_k (1 + r)(−1) = 0 (diff. L w.r.t. c_k)

(1 + r)µ_k − µ_{k−1} = 0 (diff. L w.r.t. w_k, k ≥ 1)

w_{k+1} = (1 + r)(w_k − c_k), k ≥ 0

FOCs are a difference equation for c_k, w_k and µ_k, k = 0, 1, . . .

note: we can eliminate µ_k’s
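The elimination of the multipliers can be written out explicitly. The derivation below uses only the two first order conditions stated above; the resulting relation is commonly called the Euler equation, a name that is not used in the slides.

```latex
\[
\mu_k = \frac{\delta^k u'(c_k)}{1+r},
\qquad
(1+r)\mu_k = \mu_{k-1}
\;\Longrightarrow\;
\delta^k u'(c_k) = \frac{\delta^{k-1} u'(c_{k-1})}{1+r}
\;\Longrightarrow\;
u'(c_{k-1}) = \delta (1+r)\, u'(c_k), \quad k \ge 1 .
\]
```

Together with the wealth dynamics w_{k+1} = (1 + r)(w_k − c_k), this gives a difference equation in c_k and w_k only.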


Comparative statics

Constrained optimization, interpretation of Lagrange multipliers, generalized envelope theorem

SB: 19.1, 19.2

Introduction

How does the solution of an optimization problem change when exogenous variables change?

example: how does consumer’s optimum change when the income is increased?

Optimization problem max f(x, a) s.t. h(x, a) = 0, where a is exogenous

the solution x (or argmax) will depend on exogenous variables, i.e., it is a correspondence (or function) x(a)

how does x(a) change as a function of a, what about v(a) = f(x(a), a)?

Interpretation of Lagrange multipliers

Theorem

Assume that x(a) solves max f(x) s.t. h(x) = a, where h : R^n → R^m, and x(a) and µ(a) (Lagrange multipliers corresponding to x(a)) are differentiable functions on an open set of exogenous variables a. Assume also that the first order conditions with NDCQ are satisfied at x(a), µ(a). Then ∇v(a) = µ(a), i.e., df(x(a))/da_i = µ_i(a).

Note: the result tells the change of optimal f for small changes of exogenous variables

Lagrange multipliers can be interpreted as shadow prices!

Why these assumptions? When can we be certain that x(a) is differentiable?

Interpretation of Lagrange multipliers

When we think of the right hand side of h(x) = a as an amount of resource, then df(x(a))/da_i tells how the optimum changes as the amount of resource i changes

Sketch for a proof:

let v denote the value function, by chain rule ∇_a v(a) = [D_a x(a)]^T ∇_x f(x(a))

Lagrangian optimality ∇_x f(x(a)) − [D_x h(x(a))]^T µ(a) = 0, i.e., ∇_x f(x(a)) = [D_x h(x(a))]^T µ(a)

by differentiating h(x(a)) = a, we get D_a[h(x(a))] = I ⇔ D_x h(x(a)) D_a x(a) = I (by chain rule)

combining the results ∇_a v(a) = [D_a x(a)]^T [D_x h(x(a))]^T µ(a) = I^T µ(a) = µ(a)

Interpretation of Lagrange multipliers

Theorem

max_x f(x) s.t. g(x) ≤ a and assumptions otherwise as before, and a has an open neighborhood such that the active constraints remain the same on that neighborhood. Then df(x(a))/da_i = λ_i(a), where λ_i(a) is the Lagrange multiplier corresponding to the ith constraint.

example: max x_1 + x_2 s.t. x_1^2 + x_2^2 ≤ a and x_2 ≤ 1, a > 0. What happens at a = 2?

Generalized envelope theorem

Theorem

max_x f(x, a) s.t. h(x, a) = 0, a ∈ R. Assume that the solution of the problem x(a) and the multiplier µ(a) are differentiable (on an open neighborhood of a), the FOCs are satisfied and NDCQ holds at optimum. It follows that df(x(a), a)/da = ∂L(x(a), µ(a), a)/∂a

Combination of the envelope theorem and the result on the interpretation of Lagrange multipliers

Example 1

f(x_1, x_2) = x_1^2 x_2 and h(x_1, x_2) = 2x_1^2 + x_2^2 − a

The problem can be solved by eliminating one of the variables from the constraint h(x) = 0, which gives an unconstrained problem

this trick works more generally, but usually elimination of variables is difficult; note also that if there are inequality constraints they have to be taken into account after the elimination

by eliminating x_1^2, i.e., x_1^2 = (a − x_2^2)/2, and plugging this into the objective function we get max x_2 (a − x_2^2)/2

differentiating and setting the derivative to zero we get a − 3x_2^2 = 0, i.e., x_2 = ±√(a/3) and x_1 = ±√(a/3); the optimum is x = (√(a/3), √(a/3))

Example 1

The drawback of the elimination approach is that it does not produce the Lagrange multiplier for the constraint h(x) = 0; to get the Lagrange multiplier we have to formulate the FOCs

∂f(x)/∂x_1 = 2x_1 x_2 and ∂h(x)/∂x_1 = 4x_1, hence µ(a) = x_2(a)/2 = √(a/12)

Try e.g., a_1 = 3 and a_2 = 3.3; according to the theory [f(x(a_2)) − f(x(a_1))]/[a_2 − a_1] ≈ µ(a_1); now f(x(a_2)) − f(x(a_1)) ≈ 0.1537 and µ(a_1) = 0.5, which gives µ(a_1)(a_2 − a_1) = 0.5 · 0.3 = 0.15; the difference is in the third decimal
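The numbers in this experiment can be reproduced with a few lines. The sketch uses the closed forms derived in this example, v(a) = f(x(a)) = (a/3)^(3/2) and µ(a) = √(a/12); the function handles `v` and `mu` are names introduced here for illustration.

```matlab
% Value function and multiplier of max x1^2*x2 s.t. 2*x1^2 + x2^2 = a
v  = @(a) (a/3).^1.5;      % f at the optimum x1 = x2 = sqrt(a/3)
mu = @(a) sqrt(a/12);      % Lagrange multiplier

a1 = 3;  a2 = 3.3;
dv     = v(a2) - v(a1);        % exact change of the optimal value
approx = mu(a1)*(a2 - a1);     % first order (shadow price) approximation
```

The gap between `dv` and `approx` shrinks as a_2 approaches a_1, since the multiplier is the derivative of the value function.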


Example 2

Output is produced from two raw materials x_1, x_2 (amounts)

production function x_1^α x_2^β, unit costs c_1 and c_2, price p

objective function (profit) p x_1^α x_2^β − c_1 x_1 − c_2 x_2

Regulator sets a constraint x_1 = x_2

Assume that the firm wants to increase the usage of input 1

this assumption imposes some restrictions on exogenous variables (see below)

the firm would rather face a constraint x_1 = x_2 + a, a > 0, than the original constraint

Example 2

We estimate how much the firm should invest in lobbying (or bribing) for the change of the constraint to the form x_1 = x_2 + a

We solve the problem without making the tempting elimination x_1 = x_2

we do not eliminate variables at this stage because we want the Lagrange multiplier of the constraint

FOCs:

αp x_1^(α−1) x_2^β − c_1 − µ = 0

βp x_1^α x_2^(β−1) − c_2 + µ = 0

x_1 = x_2

Example 2

Solving the FOCs by plugging x_2 = x_1 into the two other conditions and eliminating µ

we get αp x_1^(α+β−1) + βp x_1^(α+β−1) − c_1 − c_2 = 0

the solution is x_1 = x_2 = [(c_1 + c_2)/(p(α + β))]^(1/(α+β−1))

Lagrange multiplier is µ = (αc_2 − βc_1)/(α + β)

When exogenous variables satisfy αc_2 > βc_1, the firm is willing to increase the usage of input 1

for a small a the firm should invest at most aµ for lobbying, µ is the price of the constraint

Differentiability of solutions

When is x (a) differentiable? And what is its derivative?

Unconstrained problem

FOC: ∂f(x, a)/∂x_i = 0, i = 1, . . . , n, denote this set of equations as F(x, a) = 0

by the implicit function theorem, the solution x(a) is differentiable if the matrix D_x F = D_x^2 f is non-singular

the assumptions of IFT are satisfied when f ∈ C^2(N_ε(x(a))) and D_x^2 f(x, a) is non-singular on N_ε(x(a))

Differentiability of solutions

Equality constrained problem

FOCs are a system of equations that determines x(a) and µ(a)

FOCs can be written as

∇_x L(x, µ, a) = 0

h(x, a) = 0

IFT requires that the Jacobian of this system w.r.t. (x, µ) is non-singular

note: non-singularity also guarantees NDCQ

denote F(x, µ, a) = 0, IFT:

D_a(x(a), µ(a)) = −[D_(x,µ) F(x, µ, a)]^(−1) D_a F(x, µ, a)

Example: consumer’s problem

max_x u(x) s.t. p · x ≤ I, x ≥ 0, x, p ∈ R^n; p ≫ 0 and I are exogenous

we assume that at optimum the budget constraint is active and x ≫ 0

FOC:

∇u(x) − µp = 0

I − p · x = 0

The solution (when unique) is the Marshallian demand function x(p, I)

Example: consumer’s problem

Differentiability of the demand: the matrix below should be non-singular

\[
H(x, p) = \begin{pmatrix} D^2 u(x) & -p \\ -p^T & 0 \end{pmatrix}
\]

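H is the Jacobian of FOCs w.r.t. (x, µ)

H is non-singular when D^2 u(x) is neg. def. and p ≠ 0; note that H itself is not neg. def., because its last diagonal entry is zero, but D^2 u(x) is then neg. def. on the set {v : p · v = 0}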

Example u(x_1, x_2) = ln x_1 + ln x_2, FOCs:

1/x_1 − µp_1 = 0

1/x_2 − µp_2 = 0

I − p_1 x_1 − p_2 x_2 = 0

and

\[
H(x, p) = \begin{pmatrix} -1/x_1^2 & 0 & -p_1 \\ 0 & -1/x_2^2 & -p_2 \\ -p_1 & -p_2 & 0 \end{pmatrix}
\]

Example: consumer’s problem

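Using IFT we get the Jacobian

D_(p,I)(x(p, I), µ(p, I)) = −[H(x, p)]^(−1) D_(p,I) F(x, µ, p, I),

where

\[
D_{(p,I)} F(x, \mu, p, I) = \begin{pmatrix} -\mu & 0 & 0 \\ 0 & -\mu & 0 \\ -x_1 & -x_2 & 1 \end{pmatrix}
\]

e.g., for p = (2, 2) and I = 1, we get x_1 = x_2 = 1/4 and µ = 2,

\[
D_{(p,I)}(x(p, I), \mu(p, I)) = -\begin{pmatrix} -16 & 0 & -2 \\ 0 & -16 & -2 \\ -2 & -2 & 0 \end{pmatrix}^{-1}
\begin{pmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ -1/4 & -1/4 & 1 \end{pmatrix}
\]

we get

\[
D_{(p,I)}(x(p, I), \mu(p, I)) = \begin{pmatrix} -1/8 & 0 & 1/4 \\ 0 & -1/8 & 1/4 \\ 0 & 0 & -2 \end{pmatrix}
\]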
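The matrix computation in this example is easy to reproduce. The sketch below solves the linear system with the backslash operator instead of forming the inverse explicitly, and compares the result with the derivatives of the closed-form demand x_i = I/(2p_i) and multiplier µ = 2/I (these closed forms follow from the FOCs of the log utility example; the variable names are chosen here for illustration).

```matlab
% Jacobian of the FOCs w.r.t. (x, mu) at p = (2,2), I = 1, x = (1/4,1/4), mu = 2
H  = [-16   0  -2;
        0 -16  -2;
       -2  -2   0];

% Jacobian of the FOCs w.r.t. (p1, p2, I) at the same point
DF = [ -2     0    0;
        0    -2    0;
       -1/4  -1/4  1];

% Implicit function theorem: D(x, mu) = -inv(H)*DF, computed via backslash
J = -(H \ DF);

% Derivatives of x_i = I/(2*p_i) and mu = 2/I evaluated at the same point
J_closed = [-1/8    0   1/4;
               0 -1/8   1/4;
               0    0   -2 ];
```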


Sufficient optimality conditions

SB: 19.3, 21.5

Sufficient conditions for global optimum

NLP problem max f (x ) s.t. g(x ) ≤ 0 and h(x ) = 0

Theorem

Solution of the FOCs is a global optimum if f is concave, g_i, i = 1, . . . , k (g : R^n → R^k), are quasiconvex and h(x) = Ax + b

when f is strictly concave the solution is unique

concavity of f can be replaced by the assumption that f is quasiconcave and ∇f(x) ≠ 0 for all feasible x

Second order conditions

Theorem

Let f, g, h be twice continuously differentiable around x*. A point x* which satisfies the FOCs is a local maximum if D_x^2 L(x*, λ, µ) is negative definite on the set

C = {v ≠ 0 : ∇g_i(x*) · v = 0, i = 1, . . . , k_0,

∇g_i(x*) · v ≤ 0, i = k_0 + 1, . . . , k_1,

Dh(x*)v = 0},

where the first k_1 inequality constraints are active and the first k_0 of these have strictly positive Lagrange multipliers

neg. def. mathematically: v^T D_x^2 L(x*, λ, µ) v < 0 ∀v ∈ C

note: the Lagrange multiplier vectors λ and µ correspond to x*, i.e., the FOCs hold for (x*, λ, µ)

the neg. definiteness requirement is known as the second order sufficient condition

Second order conditions

Remarks

when all Lagrange multipliers corresponding to active inequality constraints are positive, then C is a tangent set defined by the active inequality constraints and the equality constraints

the 2nd order condition is not satisfied when C = ∅

a point can be a maximum even though the neg. def. requirement is not satisfied, however we have . . .

Theorem

If x* is a local maximum, then D_x^2 L is neg. sem. def. on C, i.e., v^T D_x^2 L v ≤ 0 ∀v ∈ C (when C ≠ ∅)

negative semidefiniteness does not imply optimality

Second order conditions

Variations of second order conditions

if D_x^2 L(x*, λ, µ) is neg. def. (for all v), then x* is a local maximum

if λ and µ are Lagrange multipliers corresponding to x* in FOCs and D_x^2 L(x, λ, µ) is neg. def. (for all v) and for all feasible x, then x* is a global maximum

Investigating definiteness

Determining the definiteness of D_x^2 L on the set defined by linear constraints, C = {v ∈ R^n : Dh(x*)v = 0}

determine the definiteness of the bordered Hessian

\[
H = \begin{pmatrix} 0 & Dh(x) \\ Dh(x)^T & D_x^2 L(x, \mu) \end{pmatrix}
\]

if h : R^n → R^m, then we need to check the last (n − m) leading principal minors: if they alternate in sign and det(H) has the same sign as (−1)^n, then D_x^2 L(x*, µ) is neg. def. on C

example: n = 2 and m = 1, it is enough to compute the determinant of H; if it is positive, then D_x^2 L(x, µ) is neg. def. on C

Recipe for checking the sufficient conditions

Let us assume that we have a candidate for an optimum from FOCs

1. Does the problem seem (quasi)concave?

if yes, show it, if no proceed to 2.

2. Does it seem possible that D^2 L is neg. def. for all v?

if yes, show it, otherwise proceed to step 3.

note: D^2 f is part of D^2 L, thus it makes sense to begin by checking whether D^2 f is neg. def.

if D^2 f is not neg. def. then it is unlikely that D^2 L is neg. def. for all v

if D^2 f is neg. def. then check the Hessians of constraints D^2 g_i, i = 1, . . . , k (and equality constraints)

if all Hessians are pos. def. (neg. def.) for constraints with positive (negative) Lagrange multiplier, then show that D^2 L is neg. def. for all v.

Recipe for checking the sufficient conditions

3. Determine the definiteness of the bordered Hessian

alternatively you may check the definiteness of D^2 L on C directly by using the definition; eliminate some of the variables (components of v) from the equation of the tangent set, plug the variables in the expression of D^2 L and hope that you can now say something on the definiteness


Example

max x_1^2 x_2 s.t. 2x_1^2 + x_2^2 = 3

From FOCs we get six candidates for an optimum

(x_1, x_2, µ) = (0, ±√3, 0), (±1, 1, 1/2), (±1, −1, −1/2)

Hessian matrix of the objective function

\[
D^2 f(x) = \begin{pmatrix} 2x_2 & 2x_1 \\ 2x_1 & 0 \end{pmatrix}
\]

the determinant is −4x_1^2 ≤ 0, i.e., the leading principal minors are 2x_2 (1st lead. princ. minor) and −4x_1^2 (2nd lead. princ. minor); it seems that the matrix is indefinite

Example

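Hessian of the constraint

\[
D^2 h(x) = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}
\]

the matrix is pos. def.

at points (x, µ) = (±1, 1, 1/2) we get

\[
D_x^2 L = \begin{pmatrix} 0 & \pm 2 \\ \pm 2 & -1 \end{pmatrix}
\]

which is indefinite; however, on v ∈ C = {v ≠ 0 : (4 × (±1), 2 × 1) · v = 0}, i.e., v_2 = ∓2v_1, the quadratic form defined by the Hessian is v · D^2 L v = ±4v_1 v_2 − v_2^2 = −8v_1^2 − 4v_1^2 = −12v_1^2 < 0, i.e., the points are local maxima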

Example

What about the rest of the solution candidates?

x = (0, ±√3): when x_2 ≥ 0 (x_2 ≤ 0) the objective function is at least (at most) zero. Hence, x = (0, √3) is a local minimum and x = (0, −√3) is a local maximum

at x = (±1, −1) we get

\[
D_x^2 L = \begin{pmatrix} 0 & \pm 2 \\ \pm 2 & 1 \end{pmatrix}
\]

which is indefinite; the bordered Hessian is

\[
\begin{pmatrix} 0 & \pm 4 & -2 \\ \pm 4 & 0 & \pm 2 \\ -2 & \pm 2 & 1 \end{pmatrix}
\]

in this case n = 2 and m = 1, i.e., we need to check the determinant of the matrix, which is −48, implying that D_x^2 L is pos. def. on C, which means that these points are local minima
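The bordered Hessian determinants of this example can be verified numerically. The sketch below builds the bordered Hessian from Dh(x) = (4x_1, 2x_2) and D_x^2 L = D^2 f − µ D^2 h as defined above; the function handles `Dh`, `D2L` and `Hb` are helper names introduced here for illustration.

```matlab
% Problem: max x1^2*x2 s.t. 2*x1^2 + x2^2 = 3
Dh  = @(x) [4*x(1), 2*x(2)];                                   % constraint Jacobian
D2L = @(x, mu) [2*x(2), 2*x(1); 2*x(1), 0] - mu*[4 0; 0 2];    % Hessian of the Lagrangian
Hb  = @(x, mu) [0, Dh(x); Dh(x)', D2L(x, mu)];                 % bordered Hessian

% n = 2, m = 1: the sign of det(Hb) decides definiteness on the tangent set
d_max = det(Hb([1;  1],  1/2));   % positive: D2L neg. def. on C, local maximum
d_min = det(Hb([1; -1], -1/2));   % negative: D2L pos. def. on C, local minimum
```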


Introduction to Programming and Matlab

Matlab tutorial

Introduction

Elements of programming

core: syntax (language) + logic + math

creativity + rigidity

learning by doing; work + making mistakes

Motivation for economists

most problems are too complicated to be solved with paper and pencil

way to gain deeper understanding of economic models, e.g. finding "new phenomena"

trend: computers become more efficient, computation costs decrease

Programming as a part of academic work

1. Analyzing problem; creating a model

2. Designing an algorithmic solution

3. Programming the solution in a particular programming language

4. Testing the correctness; going back to previous steps

5. Revising and extending the model and the software

Programming languages

Lay out an algorithm in a form that the computer "understands"

a programming language needs to be precise enough and close to human languages

Elements of a programming language

commands ("vocabulary")

rules how to combine commands

Learning to program is comparable to learning a language

note: programming languages are not forgiving to mistakes

Overview on programming languages

High-level programming languages

languages close to English (note: high vs. low level is relative)

need to be translated (compiled) to instructions that the computer hardware understands

Compiled languages

a separate compilation step is needed before program execution

examples: C, C++, Java, Fortran

Interpreted languages

program is compiled (translated) during the execution

examples: Matlab, Python, Lisp

Some examples and curiosities

R

GAUSS

Stata

GAMS

Mathematica

APL: a functional language

Octave: free alternative to Matlab

Matlab

Matlab is an interactive system and programming language for general scientific and technical computation and visualization

History

developed in the 1970s by Cleve Moler

the purpose was to develop an interactive program to access LINPACK and EISPACK (efficient Fortran linear algebra packages)

the name comes from "Matrix Laboratory"

rewritten in C in the 1980s

MathWorks was founded in 1984

Features of Matlab

Designed for scientific computing

Relatively easy to learn

Quick when preforming Matrix operations

Can be used to build graphical user interfaces that function asstand-alone programs

Has a huge number of Toolboxes

Has good built in graphics for plotting data and displayingimages

Matlab as a programming language

is an interpreted language; writing programs is easy, errors are easy to fix, slower than compiled languages

has object-oriented elements

is NOT a general purpose programming language

Basics of Using Matlab

variables and elementary operations

Basic principles

In Matlab all numeric data is stored in arrays of various sizes

the elementary (algebraic) operators apply to arrays (a matrix is an array with two dimensions)

state of mind: everything is an array (crucial for writing efficient code and understanding Matlab programming)

Note: Matlab can be used as a calculator for scalars

the usual scalar operators: +, -, *, /, ^, parentheses

relational operators: <, >, <=, >=, ==, ~=

logical operators: & (and), | (or), ~ (not), xor

standard algebraic rules apply

Numerical variables

Entering a variable: name = value

the name of a variable is a sequence of letters and numbers (no spaces, no numbers at the beginning); the name cannot be a reserved keyword

Entering scalars, matrices, and vectors

example 1, entering a scalar:

a=1

example 2, entering a 2× 2 matrix:

A= [1 0; 2 3];

note: the role of the semicolon (inside the brackets it separates rows; at the end of a statement it suppresses the output)

example 3, entering a row vector:

A= [1 0 2 3];

example 4, entering a column vector:

A= [1; 0; 2; 3];

Accessing the (i, j) element of an array:

A(i,j)

Other data types

Character arrays

A = 'string'

Structures

used for grouping data, example:

persons(1).name='John';

persons(1).age=35;

persons(2).name='Diana';

persons(2).age=60

Cell arrays

elements can contain any data, example:

A{1}=ones(3);

A{2}='John';

A{3}=persons

Generating arrays

Colon operator

syntax: variable=startvalue:stepsize:endvalue

example:

a=0:2:6;

Commands for creating arrays

arrays of zeros: zeros, arrays of ones: ones

diagonal matrices: diag, identity matrix: eye

random arrays: rand, randn, randi

Initializing variables

Matlab may execute faster if the zeros (or ones) function is used to set aside storage for an array before its elements are generated
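The array-creation commands and the preallocation advice above can be sketched as follows. This is a minimal illustration; the variable names (Z, I, D, squares) and the size n are chosen only for this example.

```matlab
n = 1000;                % illustrative problem size (an assumption)

Z = zeros(2, 3);         % 2-by-3 array of zeros
I = eye(3);              % 3-by-3 identity matrix
D = diag([1 2 3]);       % diagonal matrix with 1, 2, 3 on the diagonal

% Preallocate the result vector, then fill it inside the loop.
% Without the zeros call, the array would grow on every iteration.
squares = zeros(1, n);
for i = 1:n
    squares(i) = i^2;
end
```

The loop is used here only to show preallocation; the same vector could be produced directly by (1:n).^2.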

Handling arrays

Arrays can be concatenated

example: A=[B C];

A(r1:r2,k1:k2) returns the block of A consisting of columns k1 up to and including k2 from rows r1 up to and including r2

the same applies if there are more dimensions

note: A(r1:r2,:) picks the matrix consisting of rows r1 to r2

A(v1,v2) returns the elements in the rows specified in vector v1 and the columns specified in vector v2

"end" refers to the last index of the relevant dimension

Examples:

A=[magic(6) ones(6,1)];

B=A([1,3], [2,4])

a=A(2,:)

B=A(2:4,3:end)

Information about variables

To show a list of variables stored in memory use who and whos

To clear variables use clear

during programming, when testing your scripts, it may be worth clearing the variables every now and then

other clear commands: clf (clears the current figure) and clc (clears the command window)

Array functions

information on arrays: size and length (for vectors)

reshape, repmat, squeeze
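A short sketch of what the array functions listed above do; the variable names and the small example arrays are illustrative only.

```matlab
A = reshape(1:6, 2, 3);    % fills column by column: [1 3 5; 2 4 6]
[nrows, ncols] = size(A);  % nrows = 2, ncols = 3
len = length(1:6);         % number of elements of a vector: 6

T = repmat([1 2], 2, 2);   % tiles [1 2] into a 2-by-4 array

B = ones(2, 1, 3);         % 2-by-1-by-3 array
S = squeeze(B);            % singleton dimension removed: 2-by-3
```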

Saving data

save command (and load) and diary

Importing data

loading .txt files: importdata, dlmread, csvread, fopen, ...

reading .xls files: xlsread

Basic maths

+ and - work for arrays of the same size

* is matrix product

example:

a=[1 2; 3 4]*[3;6]

A^n is used to raise matrix A to a power n

' is transpose (complex conjugate transpose for complex arrays)

To make an operation (*, /, ^) element-wise use . before the operator (+ and - are already element-wise)

example:

a=[1 2; 3 4].*[5 6; 7 8];

b=[1 2; 3 4]*[5 6; 7 8];

note: many built-in functions operate element-wise
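The difference between the element-wise and the matrix versions of an operator can be checked on the matrices from the example above. The names A, B, E, M, P and Q are used only for this illustration.

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

E = A .* B;   % element-wise product: [5 12; 21 32]
M = A * B;    % matrix product:       [19 22; 43 50]

P = A .^ 2;   % element-wise square:  [1 4; 9 16]
Q = A ^ 2;    % matrix square A*A:    [7 10; 15 22]
```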

Logical expressions

Relational operators

smaller than (<), smaller than or equal to (<=), and correspondingly larger than (>), larger than or equal to (>=), equal (==), not equal (~=)

the outcome of a logical expression is either true (1) or false (0)

other commands: any, all

Logical conditions can be combined into logical expressions using logical operators

and (&), or (|), not (~)

Relational and logical operators can be used over arrays

for scalars it is recommended to use && (short-circuit and) instead of & and || instead of |

the outcome of a logical condition on arrays is a logical array

Example:

x=rand(10,1);

a= (x>.4 & x<.6)
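A logical array such as the one above is often used as an index (logical indexing) or summarized with any and all. The sketch below uses a fixed vector instead of rand so that the results are reproducible; the variable names are illustrative.

```matlab
x = [0.1 0.45 0.7 0.5 0.9];

mask = (x > 0.4 & x < 0.6);   % logical array: [0 1 0 1 0]
selected = x(mask);           % elements where mask is true: [0.45 0.5]
count = sum(mask);            % number of true entries: 2

has_large = any(x > 0.8);     % true, because 0.9 > 0.8
all_positive = all(x > 0);    % true, every element is positive
```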

Graphs and Plots in Matlab

2d and 3d graphics and image manipulation

Basic 2D plotting

plot command

plot(y) plots vector y against its indices

plot(x,y) plots vector y against x

fplot can be used for function plots (no need for arrays)

example:

x=1:100; y=x.^2-5; plot(x,y)

Style commands, see help plot

line styles: dashed, dotted, dash-dotted

plot markers: +, *, o, x

line colors, line width etc.

New figure window: figure

Plot commands

Labels, title and legends

xlabel, ylabel, title, legend, text

note: Matlab understands some LaTeX

Multiple plots in same figure

hold on (and hold off)

use plot(x1,y1,x2,y2)

combine multiple y values in a matrix: plot(x,[y1 y2])

Miscellaneous

grid on (or off), axis box, axis square, box on (off)

axis range: axis([xmin xmax ymin ymax])

Example:

figure(2); x=1:100; y=x.^2-5; plot(x,y,'k-.');

grid on; axis([10 50 0 10000]), axis square,

y2=x.^(1.5)-10; hold on; plot(x,y2,'b--');

xlabel('x'); title('example')

3D plots

Use plot3 to draw 3d lines

example:

z=[0:pi/50:10*pi];x=exp(-.2.*z).*cos(z);

y=exp(-.2.*z).*sin(z); plot3(x,y,z);

Surfaces and contours

to evaluate a function at given x, y grid points use meshgrid to create the grid

use surf to plot the surface

use contour to plot contours

example:

[x,y]=meshgrid(-5:.5:5,-10:.5:10); z=x.^2+2*sin(y);

figure(1); surf(x,y,z);

figure(2); contour(x,y,z)

Useful graphics commands

Several plots in the same figure: subplot(m,n,p)

splits the figure into an m × n matrix; p identifies the part where the next plot will be drawn

Bars, pies, histograms

example:

subplot(2, 2, 1); bar([2 3 1 4; 1:4]), title('Bar graph - multiple series')

subplot(2, 2, 2); bar3([2 3 1 4; 1:4]), title('3D bar graph')

subplot(2, 2, 3); pie([1 2 3 4]), title('Pie chart')

subplot(2, 2, 4); hist(randn(1000,1))

Exporting figures

use print command, example:

print -deps test.eps

Programming with Matlab

scripts, loops and conditional expressions, functions

Introduction

It is possible to write Matlab commands (or scripts) in m-files and execute these files

m-files can be called/run in Matlab by their name

anything in the Matlab workspace can be called from a script

note: m-files are text files

Functions are computer codes that accept inputs from the user, carry out computations, and produce outputs

functions are created in m-files

functions differ from scripts in having a function declaration (more later), and a function cannot access variables in the Matlab workspace unless you explicitly arrange it

note: many Matlab functions are implemented as m-files; to view the code use the edit command

Note: comments are added in m-files by starting the comment with %

in general you cannot comment your code too much

For loops

Syntax to repeat execution of a block of code

basic syntax:

for counter=startvalue:stepsize:endvalue

% some commands

end

the following syntax is also possible

for counter=[vector]

% some commands

end

example:

n=100; x=1:n;

for i=2:5:n

x(i)=x(i-1)+i;

end

plot(x)

Conditional execution with if/else structure

Syntax

if logical_expression1

% commands executed if logical_expression1 is true

elseif logical_expression2

% commands executed if logical_expression1 is false

% but logical_expression2 is true

else

% executed in all other cases

end

note: else and elseif are not necessary

a logical expression is composed of relational and logical operators

example:

if x >= 0

y = sqrt(x);

else

disp('Number cannot be negative');

end

Conditional execution with Switch structure

Alternative to the if structure

Syntax:

switch variable

case option1

statements ...

case option n

statements

otherwise

statements

end

Example:

state = input('Enter state: ', 's');   % 's' returns the typed text as a string

switch state

case {'TX', 'FL', 'AR'}

disp('State taxes: NO');

otherwise

disp('State taxes: YES')

end

While loop

Syntax:

while logical_expression

commands

end

the loop executes the commands as long as the logical expression is true

example:

n = input('Enter nr > 0: '); f = 1;

while n > 0

f = f .* n

n = n - 1

end

Breaking loops

break command inside a loop (for or while) terminates the loop

note: any Matlab script can be stopped from running by pressing control + c
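A minimal sketch of break inside a while loop: the loop condition is always true, and the loop is left as soon as a running sum exceeds a threshold. The threshold of 100 and the variable names are chosen only for illustration.

```matlab
total = 0;
n = 0;
while true
    n = n + 1;
    total = total + n;      % running sum 1 + 2 + ... + n
    if total > 100
        break;              % leave the loop once the sum exceeds 100
    end
end
% The loop stops at n = 14, where the sum is 105 (the sum up to 13 is 91).
```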

Interaction with the user

Receiving input from the user

as function arguments (to come)

input command, e.g., input('Enter state: ', 's')

Displaying output

disp command, example:

A = [1 4]; disp('The entries in A are'); disp(A);

fprintf (allows specifying the output format), example:

x = 2;

fprintf('The value of x as integer is %d\nand as real is %f\n', x, x);

Number display can be controlled by format command

Writing a function

Each function starts with a function-definition line that contains

the word function

a variable that defines the function output

a function name

input argument(s)

example:

function avg = Average3(x, y, z)

avg = (x + y + z) / 3;

Function name and name of the m-file should be the same

the function name has to be a valid Matlab variable name

recall that Matlab is case sensitive

an m-file without a function declaration is a "script" file

Using help

comment lines immediately following the first function line are returned when help is queried

More advanced function programming

Inputs and outputs

nargin and nargout, varargin

note: it is not necessary to give all arguments (unless needed) or to take all outputs

note: it is possible to write a function without arguments
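The sketch below shows how nargin can be used to give an argument a default value, together with help comments placed right after the function line. The function name scaled_mean and its arguments are hypothetical and serve only as an illustration; the code is meant to be saved in a file called scaled_mean.m.

```matlab
function m = scaled_mean(x, scale)
% SCALED_MEAN  Mean of the elements of x, multiplied by scale.
%   m = SCALED_MEAN(x) uses scale = 1.
%   m = SCALED_MEAN(x, scale) multiplies the mean by scale.

% Note: scaled_mean is a hypothetical name used for illustration only.
if nargin < 2
    scale = 1;              % default when the second argument is omitted
end
m = scale * mean(x(:));     % x(:) turns any array into a column vector
end
```

With this file on the path, scaled_mean([1 2 3]) uses the default scale, scaled_mean([1 2 3], 2) doubles the result, and help scaled_mean prints the comment block that follows the function line.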

Subfunctions

complex functions can be implemented by grouping smaller functions (called subfunctions) together in a single file

any function declaration after the initial one is a subfunction of the main function; a subfunction is visible to the main function and to the other subfunctions in the same file

example:

function [output1,output2]=myfunction(input1)

% some commands

c=sub_myfunction(a,b);

% some commands

function output1=sub_myfunction(input1,input2)

Local and global variables

Variables inside a function in an M-file are local variables

local variables are not defined outside the function

subfunctions have their own part of memory and cannot access variables of the main function (unless they are shared via global)

Global variables are available to all functions in a program

warning: it might become unclear what a global variable contains at a certain point, as any intermediate function could have accessed and changed it (hence avoid them)

global variables are typically reserved for constants

global variables are declared in the workspace using the keyword global

when used inside a function, the function must explicitly request the global variable by name

Miscellaneous: directories, diagnostics

How does Matlab search for function files?

first from the current working directory, then from other directories specified in the path

useful path commands: path, pathtool

Debugging and enhancing your code

the Matlab editor has a debugger tool

profile and tic (toc) commands can be used to analyze computation times

Function Handles

useful for functions of functions, example: f = @fun creates a function handle "f" that references function "fun"
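A short sketch of function handles passed to "functions of functions". It uses the built-in root finder fzero and the numerical integrator integral; the particular functions and starting point are chosen only for illustration.

```matlab
f = @(x) x.^2 - 2;            % anonymous function handle
root = fzero(f, 1);           % root of f near 1, approximately sqrt(2)

g = @sin;                     % handle to an existing function
area = integral(g, 0, pi);    % integral of sin over [0, pi], approximately 2
```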

Tips for efficiency

Avoid loops and vectorize instead

Use functions instead of scripts

Avoid overwriting variables

Do not output too much information to the screen

Use short circuit && and || instead of & and |

Use the sparse format if your matrices are very big but contain many zeros

Finally: there is a trade-off between speed, precision, and programming time

don’t spend an hour figuring out how to save yourself microseconds of computing time
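The first tip can be illustrated by computing the same vector with a loop and with a vectorized expression, and timing both with tic and toc. This is only a sketch: the size n and the formula are arbitrary choices, and the measured times depend on the machine and the Matlab version.

```matlab
n = 1e5;                       % illustrative problem size (an assumption)
x = linspace(0, 1, n);

% Loop version (with preallocation)
tic;
y_loop = zeros(1, n);
for i = 1:n
    y_loop(i) = x(i)^2 + 3*x(i);
end
t_loop = toc;

% Vectorized version: element-wise operators act on the whole array
tic;
y_vec = x.^2 + 3*x;
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
```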

Dynare

basics on writing a Dynare model, material: Dynare user guide at www.dynare.org

Introduction

"DYNARE is a pre-processor and a collection of MATLAB routines that has the great advantages of reading DSGE model equations written almost as in an academic paper." (DYNARE User Guide)

DYNARE can be used to:

compute the steady states of DSGE models

compute the solutions of deterministic models

compute first and second order approximations to solutions ofstochastic models

estimate parameters of DSGE models

compute optimal policies in linear-quadratic models

See www.dynare.org

The purpose

Finding a solution to a dynamic system of the form $E_\varepsilon[f(y_{t+1}, y_t, y_{t-1}, \varepsilon_t; \theta)] = 0$

such systems arise from stochastic macro models

$y_t$ are endogenous variables, $\varepsilon_t$ are exogenous shocks (expected value zero), $\theta$ are exogenous parameters

a solution is a function (e.g., policy) $y_t = g(y_{t-1}, \varepsilon_t)$; note that $f$ is known

in the deterministic case $\varepsilon_t$ are known (no expectation) and the solution is simply a path of $y$'s

Dynare gives a linearization of $g$ around the steady state

at the steady state $\bar{y}$ it holds that $E_\varepsilon[f(\bar{y}, \bar{y}, \bar{y}, \varepsilon; \theta)] = 0$

the first order Dynare solution is of the form $y_t = \bar{y} + A(y_{t-1} - \bar{y}) + B\varepsilon_t$; the second order approximation is analogous

Example

Optimal growth model

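$\max_{k_t, c_t} E\left[\sum_{t=0}^{\infty} \beta^t \frac{c_t^{1-\nu} - 1}{1-\nu}\right]$ (1)

$c_t + k_t = z_t k_{t-1}^{\alpha} + (1-\delta) k_{t-1}$ (2)

$z_t = (1-\rho) + \rho z_{t-1} + \varepsilon_t$ (3)

The FOCs lead to a system to be solved: equations (2), (3), and $c_t^{-\nu} = E\left[\beta c_{t+1}^{-\nu}\left(\alpha z_{t+1} k_t^{\alpha-1} + 1 - \delta\right)\right]$

here $y = [c, k, z]$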

Dynare files

Dynare models are written in .mod files

use the Matlab editor as if writing an m-file

when saving, it is important to append the extension '.mod' instead of '.m'

There are four major blocks in a mod-file

the preamble, listing the variables and the parameters

the model, containing the equations

initial values for linearization around the steady state

the variance-covariance matrix for the shocks

Dynare file blocks

The preamble

variables, example:

var c, k, z;

exogenous variables, e.g., shocks:

varexo e;

parameters and their values:

parameters alpha, beta, delta, nu, rho;

alpha=.36; beta=.99; delta=.025; nu=2; rho=.95;

The model (x(±n) denotes $x_{t \pm n}$)

model;

c^(-nu)=beta*c(+1)^(-nu)*(alpha*z(+1)*k^(alpha-1)+1-delta);

c+k=z*k(-1)^alpha+(1-delta)*k(-1);

z=(1-rho)+rho*z(-1)+e;

end;

Dynare file blocks

The model block (cont’d)

all the relevant equations need to be listed, also those for the shock process

there should be as many equations as there are endogenous variables

the timing reflects when a variable is decided; note that contemporaneous variables are written without parentheses

Initialization block

initial values for finding the steady state

it can be tricky to figure out a good initial guess!

initval;

c = 1; k = .8; z = 11;

end;
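For this particular model the steady state can be worked out by hand, which gives a natural initial guess. Setting all variables constant in the shock process gives z = 1; the Euler equation then reads 1 = beta*(alpha*z*k^(alpha-1) + 1 - delta), and the resource constraint gives c = z*k^alpha - delta*k. The sketch below evaluates these formulas in Matlab with the parameter values from the preamble; the names z_ss, k_ss and c_ss are chosen only for this illustration.

```matlab
% Parameter values as set in the preamble of the .mod file
alpha = 0.36; beta = 0.99; delta = 0.025;

% Steady state of the shock process: z = (1-rho) + rho*z implies z = 1
z_ss = 1;

% Euler equation at the steady state: 1 = beta*(alpha*z*k^(alpha-1) + 1 - delta)
k_ss = ((1/beta - 1 + delta) / (alpha*z_ss))^(1/(alpha - 1));

% Resource constraint at the steady state: c + k = z*k^alpha + (1-delta)*k
c_ss = z_ss*k_ss^alpha - delta*k_ss;

fprintf('k = %.4f, c = %.4f, z = %.4f\n', k_ss, c_ss, z_ss);
```

Such a closed form is not available for every model; when it is, the computed values can be entered directly in the initval block.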

Dynare file blocks

Random shock block

indicate the standard deviation for the exogenous innovation

shocks;

var e; stderr 0.007;

end;

covariances are given as var e, u = ...; (cov. of e and u)

Solution and properties block

the solution is found by the command stoch_simul; which (1) computes a Taylor approximation around the steady state of order one or two, (2) simulates data from the model, (3) computes moments, autocorrelations and impulse responses

inputs of stoch_simul: periods is the number of periods to use in simulations (default=0), irf is the number of periods on which to compute the impulse response functions (default=40), nofunctions suppresses the printing of the approximated solution, order is the order of the Taylor approximation (1 or 2)

Dynare outputs

Dynare is run in Matlab by "dynare modelname"

Outputs are saved in a global variable oo_ which contains the following fields

steady_state contains the steady states of the variables (in the order used in the preamble)

mean, var (variances), autocorr (autocorrelations) etc.

Saving variables

use "dynasave(FILENAME) [variable names separated by commas]" at the end of the .mod file to save variables

variables can be retrieved by load -mat FILENAME

note: Dynare clears all variables in the memory unless called by "dynare program.mod noclearall"

Dynare outputs

The main output is the linearized (or second order approximation of the) policy function "g" around the steady state

Example:

POLICY AND TRANSITION FUNCTIONS

                   k            z            c
constant   37.989254     1.000000     2.754327
k(-1)       0.976540    -0.000000     0.033561
z(-1)       2.597386     0.950000     0.921470
e           2.734091     1.000000     0.969968

interpretation, e.g., $k_t = \bar{k} + 0.97(k_{t-1} - \bar{k}) + 2.59(z_{t-1} - \bar{z}) + 2.73\varepsilon_t$, where $\bar{k}$ and $\bar{z}$ are steady-state values (the constants on the first line)
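To see how the printed coefficients are used, the first-order policy functions can be iterated by hand in Matlab. The sketch below starts at the steady state and applies a single shock of 0.007 (the standard deviation set in the shocks block) in the first period. The coefficients are copied from the table above; the horizon T and all variable names are chosen only for illustration, and this is not Dynare's own simulation routine.

```matlab
% Steady-state values (the "constant" row of the table)
k_bar = 37.989254; z_bar = 1.000000; c_bar = 2.754327;

T = 40;                          % illustrative horizon (an assumption)
e = zeros(1, T); e(1) = 0.007;   % one-time shock in the first period

k = zeros(1, T); z = zeros(1, T); c = zeros(1, T);
k_prev = k_bar; z_prev = z_bar;  % start at the steady state

for t = 1:T
    dk = k_prev - k_bar;         % deviation of k(-1) from steady state
    dz = z_prev - z_bar;         % deviation of z(-1) from steady state
    k(t) = k_bar + 0.976540*dk + 2.597386*dz + 2.734091*e(t);
    z(t) = z_bar + 0.950000*dz + 1.000000*e(t);
    c(t) = c_bar + 0.033561*dk + 0.921470*dz + 0.969968*e(t);
    k_prev = k(t); z_prev = z(t);
end
```

Plotting k - k_bar, z - z_bar and c - c_bar against 1:T shows the response of each variable to the shock.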

Loops

Running the same Dynare program for several parameter values

Write a Matlab script (or function) in which you loop over the relevant parameter values

for each value of the parameter use "save parameterfile parameter"

in the Dynare file call "load parameterfile" and replace the part in which the parameter value is set with "set_param_value('·',·)"; the first argument is the name in the .mod file and the second is the numerical value

Scope and limitations

Dynare has tools for LQ models, estimation, and regression

Finding steady-states

use other Matlab routines

Limitations

difficult to modify the Dynare code for "non-standard" calculations

difficult to detect errors

documentation?

limited to approximations, e.g., in dynamic programming there are other methods (implemented e.g., in the CompEcon toolbox)