



Quantitative Asset and Risk Management

Preparation text for the admission test

Mathematics & Statistics

Version 2017

Preamble

This text is the sole document needed to prepare for the admission test to the study program Quantitative Asset and Risk Management (ARIMA) at the University of Applied Sciences BFI Vienna. This admission test will be conducted as a PC-based multiple choice test. During the test applicants are either asked to perform similar calculations as shown in the text or to use the definitions included for a plausibility check on the presented answers.

Overall, this text contains six chapters, of which the first three discuss important mathematical concepts such as algebra, the definition and properties of functions and the calculation of the derivative of a function. Chapter 4 presents the fundamental concepts used for calculating the rate of return. Returns and return vectors are one of the most important inputs to models in mathematical finance and are therefore part of almost every course covered during the ARIMA study program.

Chapters 5 and 6 highlight basic statistical concepts. Chapter 5 discusses the possibilities of descriptive statistical methods used for analyzing and presenting given data sets. In the closing chapter the theoretical concept of random variables is briefly introduced. Additionally some of the most important theoretical probability distributions such as the Binomial distribution and the Gaussian distribution are covered in some detail.

The preparation text was written by the members of the ARIMA team, for the most part during the winter semester 2015/2016, so it is still rather new. Feedback is always welcome! We wish everyone reading this text a good time and hope that we will see some of you at our admission test.

Best regards,
The ARIMA team, November 2016, Vienna


Table of Contents

1 Algebra
  1.1 Elementary algebra
  1.2 Equations
    1.2.1 Linear equations
    1.2.2 Quadratic equations
    1.2.3 Exponential and logarithmic equations
    1.2.4 Radical equations

2 Functions
  2.1 Definition of functions
  2.2 Well defined functions
  2.3 Properties of functions
    2.3.1 Equations of functions
    2.3.2 Monotone functions
    2.3.3 Continuity
  2.4 Operations with functions
    2.4.1 Arithmetic operations
    2.4.2 Composition of functions
  2.5 Inverse functions
  2.6 Special functions
    2.6.1 Polynomial functions
    2.6.2 Power functions
    2.6.3 Rational functions
    2.6.4 Trigonometric functions
    2.6.5 Exponential function e^x
    2.6.6 Logarithmic functions

3 Differentiation
  3.1 Continuity and differentiability
  3.2 Computing the derivative
    3.2.1 Differentiation is linear
    3.2.2 The product rule
    3.2.3 The quotient rule
    3.2.4 The chain rule
    3.2.5 The elementary and the generalized power rule
    3.2.6 Derivatives of exponential and logarithmic functions
    3.2.7 Derivatives of trigonometric functions

4 Rate of return calculation
  4.1 Discrete rate of return
  4.2 Continuously compounded rate of return
  4.3 The relationship between discrete and continuous rates of return

5 Descriptive statistics
  5.1 Basic concepts of descriptive statistics
    5.1.1 Population
    5.1.2 Sample
    5.1.3 Statistical characteristics and their attributes
  5.2 Distribution analysis
    5.2.1 Absolute and relative frequencies
    5.2.2 Visualization of a frequency distribution
    5.2.3 Skewness
    5.2.4 Kurtosis
  5.3 Cumulative absolute and relative frequencies
  5.4 Empirical distribution function
  5.5 Measures of location
    5.5.1 Arithmetic mean
    5.5.2 Median
    5.5.3 α-quantile
    5.5.4 Mode
    5.5.5 Relationship among mean, median and mode
    5.5.6 Geometric mean
  5.6 Measures of dispersion
    5.6.1 Range
    5.6.2 Average absolute deviation
    5.6.3 Average squared deviation
    5.6.4 Box-and-whisker plot
  5.7 Correlation and regression
    5.7.1 Correlation
    5.7.2 Linear Regression

6 Random variables
  6.1 Introduction to random variables
  6.2 Cumulative distribution function
  6.3 Discrete random variables
  6.4 Continuous random variables
  6.5 Expected value of a random variable
    6.5.1 Expected value for a discrete random variable
    6.5.2 Expected value for a continuous random variable
    6.5.3 Properties of the expected value
  6.6 Variance of a random variable
    6.6.1 Formal definition of variance
    6.6.2 Variance of a discrete random variable
    6.6.3 Variance of a continuous random variable
    6.6.4 Properties of the variance
  6.7 Important probability distributions
    6.7.1 Hypergeometric distribution
    6.7.2 Binomial distribution
    6.7.3 Poisson distribution
    6.7.4 Normal distribution
    6.7.5 Standardized normal distribution


1 Algebra

The major difference between algebra and arithmetic is the inclusion of variables in algebra. While in arithmetic only numbers and their arithmetical operations (such as +, −, ·, ÷) occur, in algebra variables such as x and y or a and b are used to replace numbers.

The purpose of using variables is to allow generalizations in mathematics. This is useful because it allows

• arithmetical equations to be stated as laws (e.g. a + b = b + a for all a and b).

• reference to values which are not known. In the context of a problem, a variable may represent a certain value which is not yet known, but which may be found through the formulation and manipulation of equations.

• the exploration of mathematical relationships between quantities.

1.1 Elementary algebra

In elementary algebra, an expression may contain numbers, variables and arithmetical operations. These are conventionally written with higher-power terms on the left.

Examples for algebraic expressions

1. x + 2

2. x^2 − 2y + 7

3. z^8 + x^3(b + 2x) + a^4·b − π

4. (a^3 − 4ab + 3)/(a^2 − 1)

In mathematics it is important that the value of an expression is always computed the same way. Therefore, it is necessary to compute the parts of an expression in a particular order, known as the order of operations. In elementary algebra three mathematical operations are used: algebraic expressions can be linked via addition, multiplication and applying exponents. Additionally, algebraic expressions can contain parentheses and absolute value symbols.

The standard order of operations is expressed in the following list.

1. parentheses and other grouping symbols, including brackets

2. absolute value symbols and the fraction bar

3. exponents and roots

4. multiplication and division

5. addition and subtraction

The three basic algebraic operations have certain properties. These are briefly described in the following list.

Addition  The symbol for addition is + and for its inverse operation subtraction it is −. Addition is associative[1] and commutative[2], while subtraction is neither.

Multiplication  The standard symbol for multiplication is ·. Sometimes × is used as well. By convention, the symbol for multiplication can be left out if the interpretation is clear. The symbol for its inverse operation division is ÷ or the fraction bar. Similar to addition, multiplication is associative and commutative, while its inverse operation division is neither. Furthermore multiplication is distributive[3] over addition.

Exponentiation  Usually exponentiation is represented by raising the exponent, e.g. a^b. Unlike the other two operations, exponentiation is neither commutative nor associative. But it distributes over multiplication, therefore

(a·b)^c = a^c · b^c

for all a, b and c.

Exponentiation has two inverse operations, the logarithm and the nth root. For the logarithm

a^(log_a b) = b = log_a(a^b)

holds for all a and b, whereas for the root

(b√a)^b = a

holds for all a and b.

Additionally the following equations hold for exponentiation

a^(m/n) = (n√a)^m = n√(a^m),
a^b · a^c = a^(b+c),
(a^b)^c = a^(b·c).

[1] An operation ∘ is associative if and only if (a ∘ b) ∘ c = a ∘ (b ∘ c) holds for all a, b and c.
[2] An operation ∘ is commutative if and only if a ∘ b = b ∘ a holds for all a and b.
[3] An operation ∘ distributes over an operation ⋆ if and only if (a ⋆ b) ∘ c = (a ∘ c) ⋆ (b ∘ c) holds for all a, b and c.

Examples for elementary algebraic operations

1. Applying the multiplicative distribution rules for (x^2 + 2x − 1)(x + 2) we have

(x^2 + 2x − 1)(x + 2) = x^3 + 2x^2 − x + 2x^2 + 4x − 2
                      = x^3 + 4x^2 + 3x − 2.

2. For (a^2 − 3b)^2 we have

(a^2 − 3b)^2 = a^4 − 6a^2·b + 9b^2.

3. A fraction like (x + 4)/(x^2 − 1) + 3x/(x^2 − 1) + 4x can be simplified as

(x + 4)/(x^2 − 1) + 3x/(x^2 − 1) + 4x = ((x + 4) + 3x + 4x(x^2 − 1))/(x^2 − 1)
                                      = (x + 4 + 3x + 4x^3 − 4x)/(x^2 − 1)
                                      = (4x^3 + 4)/(x^2 − 1).

For 4x/7 + 8/7 − x we have

4x/7 + 8/7 − x = (4x − 7x + 8)/7 = (−3x + 8)/7.

Finally, (x^2 − 1)/(x + 1) is simplified as

(x^2 − 1)/(x + 1) = ((x + 1)(x − 1))/(x + 1) = x − 1.

4. Calculating with exponents simplifies (3a)^4 · (3a)^7 · a to

(3a)^4 · (3a)^7 · a = (3a)^(4+7) · (3a)/3 = (3^12 · a^12)/3 = 3^11 · a^12

and √x · x^2 to

√x · x^2 = x^(1/2) · x^2 = x^(5/2) = √(x^5).

5. A standard multiplicative rule is that

(x + 1)(x − 1) = x^2 − x + x − 1 = x^2 − 1

holds.
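The expansions above lend themselves to a quick numerical plausibility check. The following Python sketch is our own illustration, not part of the original text; it evaluates both sides of each identity at a few sample points.

```python
# Spot-check the worked expansions by evaluating both sides at sample points.
# The helper name `close` is ours, chosen only for this illustration.

def close(a, b, tol=1e-9):
    return abs(a - b) < tol

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # (x^2 + 2x - 1)(x + 2) = x^3 + 4x^2 + 3x - 2
    assert close((x**2 + 2*x - 1) * (x + 2), x**3 + 4*x**2 + 3*x - 2)
    # (x + 1)(x - 1) = x^2 - 1
    assert close((x + 1) * (x - 1), x**2 - 1)

for a in [-1.0, 0.5, 2.0]:
    for b in [-2.0, 0.0, 3.0]:
        # (a^2 - 3b)^2 = a^4 - 6 a^2 b + 9 b^2
        assert close((a**2 - 3*b)**2, a**4 - 6*a**2*b + 9*b**2)

print("all identities hold at the sample points")
```

Checking a handful of points does not prove an identity, but it catches expansion mistakes quickly.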

1.2 Equations

In general, an equation is a mathematical statement that asserts the equality of two expressions. It is written by placing the expressions on either side of an equals sign =, for example

x + 2 = 6

asserts that x + 2 is equal to 6.

Equations often express relationships between given quantities, the knowns, and quantities yet to be determined, the unknowns. Usually unknowns are denoted by letters at the end of the alphabet (x, y, z, w, ...) while knowns are denoted by letters at the beginning (a, b, c, ...). The process of expressing the unknowns in terms of the knowns is called solving the equation.

The example stated above shows an equation with a single unknown. A value of that unknown for which the equation is true is called a solution or root of the equation. In the example above, 4 is the solution.

If both sides of the equation are polynomials, the equation is called an algebraic equation.

1.2.1 Linear equations

The simplest equations to solve are linear equations that have only one variable. They contain only constant numbers and a single variable without an exponent.

The central technique is to add, subtract, multiply or divide both sides of the equation by the same number in order to isolate the variable on one side of the equation. Once the variable is isolated, the other side of the equation is the value of the variable.

In the general case a linear equation looks as follows

ax + b = c.

The solution is given by

x = (c − b)/a.
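The solution formula translates into a one-line routine. The sketch below is our own illustration (the function name is not from the text); it guards against a = 0, in which case the equation is not linear.

```python
# The general linear equation a*x + b = c has the solution x = (c - b)/a,
# provided a is non-zero.

def solve_linear(a: float, b: float, c: float) -> float:
    if a == 0:
        raise ValueError("not a linear equation: a must be non-zero")
    return (c - b) / a

# Example: 3x + 4 = 13  ->  x = 3
print(solve_linear(3, 4, 13))  # 3.0
```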

1.2.2 Quadratic equations

All quadratic equations can be expressed in the form

ax^2 + bx + c = 0,

where a is not zero (if it were, the equation would be a linear equation). As a ≠ 0 we may divide by a and rearrange the equation into the standard form

x^2 + px + q = 0.

Then the solution of the equation can be calculated by using the following formula:

x_{1,2} = −p/2 ± √(p^2/4 − q).    (1.1)

A quadratic equation with real coefficients (a, b, c ∈ R) can have either zero, one or two distinct real roots, or two distinct complex roots. The discriminant determines the number and nature of the roots. The discriminant is the expression underneath the square root sign. If it is negative, the solutions are complex (∈ C) and no real root exists. If it is positive, the solutions are real (∈ R) and if it is 0, there is only one real solution (formally, the two real solutions coincide).
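Formula (1.1) and the discriminant classification can be sketched in a few lines of Python (our own illustration; the function name is not from the text):

```python
import math

# Solve x^2 + p*x + q = 0 with formula (1.1); the discriminant p^2/4 - q
# decides how many real roots exist.

def solve_quadratic(p: float, q: float):
    disc = p**2 / 4 - q
    if disc < 0:
        return ()                    # no real root (complex solutions)
    if disc == 0:
        return (-p / 2,)             # the two real solutions coincide
    r = math.sqrt(disc)
    return (-p / 2 + r, -p / 2 - r)  # two distinct real roots

print(solve_quadratic(10, -11))  # (1.0, -11.0)
```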

10

1 Algebra

1.2.3 Exponential and logarithmic equations

An exponential equation is an equation of the form

a^x = b for a > 0, a ≠ 1,

which has the solution

x = log_a b = log b / log a

when b > 0.

Elementary algebraic techniques are used to rewrite a given equation. As mentioned above, this is used to isolate the unknown and thus calculate the solution.

For example, if

3 · 2^(x−1) + 1 = 10

then, by subtracting 1 from both sides, and then dividing both sides of the equation by 3, we obtain

2^(x−1) = 3.

Applying the logarithm (see section 2.6.6 for properties of the logarithmic functions) on both sides we get

(x − 1) log 2 = log 3

or

x = log 3 / log 2 + 1 ≈ 2.5850.

A logarithmic equation is an equation of the form

log_a x = b for a > 0,

which has the solution

x = a^b.

1.2.4 Radical equations

A radical equation is an equation of the form

x^(m/n) = a,    m, n ∈ N, n ≠ 0.

For these types of equations two cases have to be considered separately.

11

1 Algebra

• If the exponent m is odd the equation has the solution

x = m√(a^n) = (m√a)^n.

• If the exponent m is even and a ≥ 0 the equation has the two solutions

x = ± m√(a^n) = ± (m√a)^n.

Examples for solving equations

1. The equation 4x = 8 can be solved by dividing both sides by 4. The solution to the equation is x = 2.

2. The equation x^2 + 10x − 11 = 0 can be solved directly using the solution given in equation 1.1. As p = 10 and q = −11 we get:

x_{1,2} = −5 ± √(25 − (−11)) = −5 ± √36

x_1 = −5 + 6 = 1

x_2 = −5 − 6 = −11.

3. Let us consider[4]

4 log_10(x − 3) − 2 = 6.

By adding 2 to both sides, followed by dividing both sides of the equation by 4, we get

log_10(x − 3) = 2.

By applying the function f(x) = 10^x on both sides of the equation we obtain x − 3 = 10^2 = 100 and x = 103 respectively.

4. The equation

(x + 5)^(2/3) = 4

has an even exponent m = 2 and positive a = 4. Therefore there exist two solutions. We get

x + 5 = ± (√4)^3 = ± 8

x_{1,2} = −5 ± 8

and x ∈ {3, −13}.

[4] For the definition of the logarithm to the base 10, denoted as log_10(·), see section 2.6.6.


2 Functions

The first formalization of functions was created in the 17th century by Gottfried Wilhelm Leibniz. He coined the term function to indicate the dependence of one quantity on another.

A function can be described as a machine, a black box or a rule that, for each input, returns a corresponding output. Figure 2.1 shows a schematic picture of the relationships of a function f(x).

Figure 2.1: A function f takes an input x and returns an output f(x).

2.1 Definition of functions

In formal terms a function f is given by

f : D → R
x ↦ f(x).

To distinguish the different sets involved in the definition of a function the following terms are used:

Domain  The input x to a function is called the argument of the function. The set of all possible inputs D is called the domain.


Range  The output f(x) of a function is called the value. The set R of all possible outputs is called the range.

Image  The image of a subset A of the domain under the function can be defined as well. For A ⊆ D the image of A under f is denoted f(A) ⊆ R and is a subset of the function's range.

The image of the whole domain D under f is a subset of the range R as well. We have f(D) ⊆ R. f(D) is sometimes simply referred to as the image of f.

Preimage  The preimage of a subset B of the range of the function (B ⊆ R) under the function f is the set A ⊆ D of all elements x for which f(x) lies within B, formally A = f^(−1)(B) = {x ∈ D : f(x) ∈ B}.

Graph  The set {(x, f(x)) : x ∈ D} of all paired inputs and outputs is called the graph of the function. A function is fully defined by its graph. In case of a real-valued function f of real values, the graph of f lies in the xy-plane.

2.2 Well defined functions

Generally speaking, a function is well defined if there is a unique f(x) for every x in the domain of f. This means that a function is well defined if and only if every argument x has only one possible function value f(x) (or none).

For functions where the domain and the range consist of real numbers, there is a general rule to determine if the function is well defined. This rule is called the vertical line test.

A curve in the xy-plane is the graph of some function f with R as domain and range, if and only if no vertical line intersects the curve more than once.

Examples for different functions

1. The function of a simple parabola in the plane is defined by

f : R → R
x ↦ x^2

and shown in figure 2.2.


Figure 2.2: The polynomial function f(x) = x^2 from -5 to 5.

2. The function which maps every integer number to the constant number 4 is defined by

f : Z → Z
x ↦ 4

and shown in figure 2.3.

Figure 2.3: The constant function f(x) = 4 from -5 to 5.


3. A function which maps every number to its non-negative (absolute) value is defined by

f : R → R
x ↦ |x|

and shown in figure 2.4.

Figure 2.4: The absolute value function f(x) = |x| from -5 to 5.

4. Furthermore, a function can be defined piecewise. That is a function which is defined by multiple subfunctions, each subfunction applying to a certain interval of the main function's domain (a sub-domain):

f : R → R

x ↦  0,             x ≤ −1
     +√(1 − x^2),   −1 < x < 1
     x,             x ≥ 1.

Figure 2.5 shows the graph of the function.
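A piecewise definition translates directly into conditional branches. A minimal Python sketch of example 4 (our own illustration; the function name is not from the text):

```python
import math

# The piecewise function from example 4: each branch covers one sub-domain.

def f(x: float) -> float:
    if x <= -1:
        return 0.0
    if x < 1:
        return math.sqrt(1 - x**2)  # upper half of the unit circle
    return float(x)                 # x >= 1

print(f(-3), f(0), f(2))  # 0.0 1.0 2.0
```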

There are many ways to describe or represent a function. Some functions may be described by a formula or an algorithm that defines how to compute the output for a given input. A function can also be presented graphically. Sometimes functions are given by a table that gives the outputs for selected inputs. Furthermore a function can be described through its relationship with other functions, for example as an inverse function or as a solution of a differential equation.


Figure 2.5: A piecewise defined function from -5 to 5 with a jump at x = 1.

2.3 Properties of functions

It is not enough to say f is a function without specifying the domain and the range, unless these are known from the context. For example, a formula such as

f(x) = √(x^2 − 5x + 6)

is not a properly defined function on its own. However, it is standard to take the largest possible subset of R as the domain (in this case x ≤ 2 or x ≥ 3) and R as range.

2.3.1 Equations of functions

Well defined functions which lie in the xy-plane can be represented by their equations. For a function f(x) the equation can be obtained by replacing f(x) with y.

For example, the function

f : R → R
x ↦ x^2 − 4x + 3

lies in the xy-plane. The equation for this function is y = x^2 − 4x + 3.


2.3.2 Monotone functions

A function whose graph is always rising as it is traversed from left to right is said to be an increasing function.

Definition  Let x_1 and x_2 be arbitrary points in the domain of a function f. Then f is strictly increasing if

f(x_1) < f(x_2) whenever x_1 < x_2.

A function whose graph is always falling as it is traversed from left to right is said to be a decreasing function.

Definition  Again, let x_1 and x_2 be arbitrary points in the domain of a function f. Then f is strictly decreasing if

f(x_1) > f(x_2) whenever x_1 < x_2.

Note that strictly increasing and strictly decreasing functions are always injective and therefore invertible (see section 2.5).

2.3.3 Continuity

Another important property of functions is the property of continuity. A continuous function is a function for which small changes in the input result in small changes in the output. Otherwise, a function is said to be discontinuous.

If the function is a real-valued function of real values and therefore its graph lies in the xy-plane, then the continuity of the function can be recognized in the graph. The function is continuous if, roughly speaking, the graph is a single unbroken curve with no gaps or jumps.

There are several ways to make this intuition mathematically rigorous. For example:

The function f : D → R is continuous at c ∈ D, if and only if for every ε > 0 there is a δ > 0 such that

|f(x) − f(c)| < ε for all x ∈ D with |x − c| < δ.

More intuitively, we can say that if we want all the f(x) values to stay in some small neighborhood around f(c), we simply need to choose a small enough neighborhood for the x values around c. If that is possible, no matter how small the f(c)-neighborhood is, then f is continuous at c.

A function f is said to be continuous if it is continuous at all points c ∈ D.


Examples of continuous and discontinuous functions

1. All polynomials are continuous, for example f(x) = x^2 − 1.

2. The absolute value function f(x) = |x| is continuous.

3. The sign function

sgn(x) =   1 if x > 0
           0 if x = 0
          −1 if x < 0

is not continuous, as it jumps at x = 0.

2.4 Operations with functions

In analogy to arithmetic, it is possible to define addition, subtraction, multiplication and division of functions. Another important operation on functions is composition. When composing functions, the output from one function becomes the input to another function.

2.4.1 Arithmetic operations

For this section we consider the two functions f : D → R and g : D′ → R′.

Addition of functions  The addition of two functions f and g is defined as

(f + g)(x) = f(x) + g(x).

The domain of the new function f + g is the intersection of the domains of f and g. Formally, we get D ∩ D′ as the domain of the function f + g.

Subtraction of functions  Similarly, the subtraction of two functions f and g is defined as

(f − g)(x) = f(x) − g(x).

Again, the domain of the new function f − g is the intersection of the domains of f and g. Formally, we get D ∩ D′ as the domain of the function f − g.

Multiplication of functions  Additionally, the multiplication of two functions f and g is defined as

(f · g)(x) = f(x) · g(x),

where the domain of the new function f · g is the intersection of the domains of f and g. Again, we get D ∩ D′ as the domain of the function f · g.

Division of functions  Finally, the division of two functions f and g is defined as

(f/g)(x) = f(x)/g(x).

In this case the domain of the new function f/g is the intersection of the domains of f and g excluding the points where g(x) = 0 (to avoid division by zero). Note that the points x where g(x) = 0 can be written as the preimage of the set containing 0 under the function g. We get (D ∩ D′) \ g^(−1)({0}) as the new domain.

2.4.2 Composition of functions

Now we consider a new operation on functions, called composition. This operation has no direct analogy in arithmetic. Informally stated, the operation of composition is performed by substituting some function for the variable of another function.

Definition  Given functions f : D → R and g : D′ → R′, the composition of f and g, denoted by f ∘ g, is the function defined by

(f ∘ g)(x) = f(g(x)).

The domain of f ∘ g is defined as all x in the domain of g for which g(x) is in the domain of f. If we denote the new domain with E, this means E = {x ∈ D′ : g(x) ∈ D}.

Examples

1. Let f(x) = x^2 − 3x + 4 and g(x) = x^2 − 1. Then we have

a) (f + g)(x) = x^2 − 3x + 4 + x^2 − 1 = 2x^2 − 3x + 3

b) (f − g)(x) = x^2 − 3x + 4 − (x^2 − 1) = −3x + 5

c) (f · g)(x) = (x^2 − 3x + 4)(x^2 − 1) = x^4 − 3x^3 + 3x^2 + 3x − 4

d) (f/g)(x) = (x^2 − 3x + 4)/(x^2 − 1), which is defined for x ∈ R \ {−1, 1}.

2. Let f(x) = x^3 and g(x) = x^2 − 4. Then we have

a) (f + g)(x) = x^3 + x^2 − 4

b) (f − g)(x) = x^3 − x^2 + 4

c) (f · g)(x) = x^5 − 4x^3

d) (f/g)(x) = x^3/(x^2 − 4), which is defined for x ∈ R \ {−2, 2}.

3. Let f(x) = x^2 − 3x + 4 and g(x) = x^2 − 1. Let's calculate f ∘ g(x) and g ∘ f(x).

a) For f ∘ g(x) we get

f(g(x)) = (x^2 − 1)^2 − 3(x^2 − 1) + 4 = x^4 − 5x^2 + 8.

b) For g ∘ f(x) we get

g(f(x)) = (x^2 − 3x + 4)^2 − 1 = x^4 − 6x^3 + 17x^2 − 24x + 15.

4. Let f(x) = 1/x^2 and g(x) = x^2 + 4. Let's calculate f ∘ g(x) and g ∘ f(x).

a) For f ∘ g(x) we get

f(g(x)) = 1/(x^2 + 4)^2.

b) For g ∘ f(x) we get

g(f(x)) = (1/x^2)^2 + 4 = (4x^4 + 1)/x^4.

2.5 Inverse functions

In mathematics, an inverse function is a function that undoes another function. Two functions f and g are said to be inverse to each other if an input x into the function f produces an output y and plugging this y into the function g produces the output x, and vice versa.

Example  Suppose g is the inverse function of f. Then f(x) = y and g(y) = x. Furthermore f(g(x)) = x, meaning g(x) composed with f(x) leaves x unchanged.

A function f that has an inverse is called invertible. The inverse function is unique for every f and is denoted by f^(−1).


Definition and determination of the inverse function

Let f be a function whose domain is the set D and whose range is the set R. Then f is invertible if there exists a function g with domain R and range D with the property:

f(x) = y if and only if g(y) = x.

If f is invertible, the function g is unique and is called the inverse of f, denoted by f^(−1).

Not all functions have an inverse. For this rule to be applicable, each element y ∈ R must correspond to no more than one x ∈ D. A function f with this property is called injective.

The most powerful approach to find a formula for f^(−1), if it exists, is to solve the equation x = f(y) for y.

Graph of the inverse function

If f and f^(−1) are inverses, then the graph of the function y = f^(−1)(x) is the same as the graph of the equation x = f(y). Thus the graph of f^(−1) can be obtained from the graph of f by switching the positions of the x and y axes. This is equivalent to reflecting the graph across the line y = x.

In figure 2.6 the graphs of the functions f(x) and f^(−1)(x) of example 5 below are shown.

Examples

1. The function f(x) = x^2 is only invertible if the domain is restricted to positive numbers (or alternatively restricted to negative numbers). If so, the inverse function is f^(−1)(x) = √x.

2. For a function f(x) = x − a, where a is a constant, the inverse function is f^(−1)(x) = x + a.

3. For a function f(x) = mx, where m is a constant, the inverse function is f^(−1)(x) = x/m. Thereby m ≠ 0 must hold.

4. For a function f(x) = 1/x the inverse function is f^(−1)(x) = 1/x, where x ≠ 0 must hold.


Figure 2.6: f(x) = ((x + 1)/2)^3 in blue and its inverse f^(−1)(x) = 2·∛x − 1 in green from 0 to 2.

5. f is the function

f(x) = (x/2 + 1/2)^3.

To find the inverse we must solve the equation x = (y/2 + 1/2)^3 for y:

x = (y/2 + 1/2)^3
∛x = (y + 1)/2
2·∛x − 1 = y

Thus the inverse function f^(−1) is given by the formula

f^(−1)(x) = 2·∛x − 1.
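The defining property f(f^(−1)(x)) = x and f^(−1)(f(x)) = x can be checked numerically for example 5. The sketch below is our own illustration (names are ours); the cube root is computed via x**(1/3), which assumes x ≥ 0.

```python
# Verify that f_inv from example 5 undoes f on sample points.

def f(x):
    return (x / 2 + 1 / 2) ** 3

def f_inv(x):
    return 2 * x ** (1 / 3) - 1   # cube root; assumes x >= 0

for x in [0.0, 0.5, 1.0, 2.0]:
    assert abs(f(f_inv(x)) - x) < 1e-9
    assert abs(f_inv(f(x)) - x) < 1e-9

print("f and f_inv are inverses on the sample points")
```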

2.6 Special functions

There exists an overwhelming number of different functions. Nevertheless there are several classes of functions which should be considered separately.


2.6.1 Polynomial functions

A polynomial function is a function that can be defined by evaluating a polynomial. A function f of one argument is called a polynomial function if it satisfies

f(x) = a_n x^n + a_{n−1} x^(n−1) + ... + a_2 x^2 + a_1 x + a_0 = Σ_{i=0}^{n} a_i x^i    (2.1)

for all arguments x, where n is a non-negative integer and a_0, a_1, a_2, ..., a_n are constant coefficients. A polynomial function is either zero, or can be written as the sum of one or more non-zero terms as denoted in equation 2.1. The number of terms is always finite.

The exponent of a variable in a term is called the degree of that variable in that term. The degree of a polynomial is the largest degree of any term. Since x = x^1, the degree of a variable without exponent is one. A term with no variable is called a constant term, or just a constant. The degree of a constant term is 0, since x^0 = 1.
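A polynomial as in equation 2.1 is convenient to evaluate from its coefficient list. The sketch below uses Horner's scheme, which avoids computing powers explicitly; it is our own illustration, and the coefficient ordering a_0, ..., a_n is an assumption of this snippet.

```python
# Evaluate a_0 + a_1*x + ... + a_n*x^n with Horner's scheme.

def poly_eval(coeffs, x):
    """coeffs = [a0, a1, ..., an]."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# f(x) = x^2 + 3x + 20: coefficients [20, 3, 1]
print(poly_eval([20, 3, 1], 2))  # 4 + 6 + 20 = 30.0
```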

Figure 2.7: Three polynomial functions of degree 2, 3 and 5 from -5 to 5.

Polynomial functions are widely used in different applications in physics, economics, calculus and numerical analysis. Furthermore they can be used to approximate other functions. The application of polynomial functions in a wide range of topics is due to some elementary properties of polynomials:

1. A sum of polynomials is a polynomial. The degree of the resulting polynomial equals the higher degree of the two added polynomials, provided the leading terms do not cancel. For example (x^2 + 3x + 1) + (x − 1) = x^2 + 4x.

2. A product of polynomials is a polynomial. The degree of the resulting polynomial equals the sum of the degrees of the two multiplied polynomials. For example (x^2 + 3x + 1)(x − 1) = x^3 + 2x^2 − 2x − 1.

3. A composition of two polynomials is a polynomial. It is obtained by substituting a variable of the first polynomial by the second polynomial. The degree of the resulting polynomial equals the product of the degrees of the two polynomials of the composition. For example, for f(x) = x^2 + 3x + 1 and g(x) = x − 1 the composition f ∘ g results in

f ∘ g(x) = (x − 1)^2 + 3(x − 1) + 1 = x^2 + x − 1.

Note that the composition g ∘ f results in a different polynomial, namely

g ∘ f(x) = x^2 + 3x.

4. All polynomials are continuous functions.

5. The derivative (for more information on derivatives, see chapter 3) of the polynomial

a_n x^n + a_{n-1} x^{n-1} + ... + a_2 x^2 + a_1 x + a_0

is the polynomial

n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + ... + 2 a_2 x + a_1.

Furthermore polynomials are smooth functions, which means that all orders of derivatives of polynomials exist. The resulting derivatives of any order are always polynomials as well.

6. Any polynomial function P(x) of nth order has n roots, counted with multiplicity (roots are values for x where P(x) = 0). These roots may be complex numbers (∈ C).

Furthermore every polynomial function can be written as a product of simpler polynomials, using its roots. Let (x_i)_{i=1,...,n} denote the roots of the polynomial function P(x) of degree n. Then the following holds

P(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0 = a_n (x - x_1) · ... · (x - x_n).

Examples of different polynomials

1. A constant polynomial would be f(x) = 3.

2. Linear polynomials look like f(x) = 2x - 1/2 and f(x) = 3x/2 - 2.

3. A polynomial of degree 2 is f(x) = x^2 + 3x + 20.

4. A polynomial of degree 3 is f(x) = 7x^3/12 + x^2 - 5x.

5. A polynomial of degree 5 looks like f(x) = (x^5 - 3x^3)/100 - 8.

6. In figure 2.7 the graphs of the polynomials given in examples 3-5 are shown.
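The coefficient-based definition in equation 2.1 and the degree rules above translate directly into code. The following Python snippet is a minimal sketch (the helper names poly_eval and poly_mul are my own, not from the text); a polynomial is stored as a coefficient list where coeffs[i] is a_i.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n with Horner's scheme."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Property 2: (x^2 + 3x + 1)(x - 1) = x^3 + 2x^2 - 2x - 1
p = [1.0, 3.0, 1.0]   # 1 + 3x + x^2
q = [-1.0, 1.0]       # -1 + x
product = poly_mul(p, q)   # -> [-1.0, -2.0, 2.0, 1.0]
```

The degree of the product, len(product) - 1, is indeed the sum of the degrees of the factors, 2 + 1 = 3.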

2.6.2 Power functions

Power functions are elementary mathematical functions of the form

f : x ↦ a · x^r, a, r ∈ R.

The possible domain (see section 2.1) depends on the exponent. If roots of negative numbers are not allowed (which is mostly the case, except if the range includes complex numbers C), then the maximum size of the domain D of the power function depends on the exponent r ∈ R. Table 2.1 lists the possible domains, depending on the nature of the exponent r.

          r > 0          r < 0
r ∈ Z     D ⊆ R          D ⊆ R \ {0}
r ∉ Z     D ⊆ R_0^+      D ⊆ R^+

Table 2.1: The possible domains D for power functions, depending on r ∈ R.

For the maximum range of a power function, the sign of a has to be considered as well. If r ∈ Z, the range differs depending on whether r is an odd or even number. Table 2.2 lists the possibilities for the maximum range of the power function, depending on the nature of the coefficient a ∈ R and the exponent r ∈ R of the power function.

                    r > 0                        r < 0
          r even or r ∉ Z    r odd     r even or r ∉ Z    r odd
a > 0     R ⊆ R_0^+          R ⊆ R     R ⊆ R^+            R ⊆ R \ {0}
a < 0     R ⊆ R_0^-          R ⊆ R     R ⊆ R^-            R ⊆ R \ {0}

Table 2.2: The possible maximum ranges R for power functions, depending on a, r ∈ R.

Examples of power functions

1. f(x) = x^4

2. f(x) = (x^3)^{1/7} = x^{3/7} (the 7th root of x^3)

3. f(x) = 5/√x = 5x^{-1/2}

4. f(x) = 17 · 1/x^5 = 17x^{-5}
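Such functions are easy to evaluate numerically; Python's ** operator accepts real exponents. A small sketch (power_fn is a hypothetical helper name, not from the text):

```python
def power_fn(a, r):
    """Return the power function f(x) = a * x**r as a Python callable."""
    return lambda x: a * x ** r

f2 = power_fn(1.0, 3.0 / 7.0)   # f(x) = x^(3/7), example 2 above
f4 = power_fn(17.0, -5.0)       # f(x) = 17 * x^(-5), example 4 above

# x^(3/7) is the 7th root of x^3; for x = 128 = 2^7 this is 2^3 = 8:
print(f2(128.0))   # 8.0 up to floating-point rounding
```

Note that f4 illustrates the domain restriction of table 2.1: for the negative integer exponent r = -5, the value x = 0 must be excluded.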

2.6.3 Rational functions

A rational function is a function which can be written as the ratio of two polynomial functions (see section 2.6.1). Neither the coefficients of the polynomials nor the values taken by the function are necessarily rational numbers.

In case of one variable x, a function is called a rational function if and only if it can be written in the form

f(x) = P(x) / Q(x)

where P(x) and Q(x) are polynomial functions and Q is not the zero polynomial. The domain of f is the set of all points x for which the denominator Q(x) is not zero.

Examples for different rational functions

1. An example for a rational function of degree 2 is

f(x) = (x^2 - 3x - 2) / (x^2 - 4).

The graph of this function is shown in figure 2.8. It can be seen that the function has a horizontal asymptote at y = 1 and two vertical asymptotes at x = ±2. For x = ±2 the function is not defined.

2. A rational function of degree 3 is

f(x) = (x^3 - 2x) / (2(x^2 - 5)).

The graph of this function is shown in figure 2.9. It can be seen that the function has two vertical asymptotes at x = ±√5. Furthermore the linear function g(x) = x/2 is an asymptote. For x = ±√5 the function is not defined.


Figure 2.8: f(x) = (x^2 - 3x - 2)/(x^2 - 4) from -5 to 5 with vertical asymptotes at x = ±2.

Figure 2.9: f(x) = (x^3 - 2x)/(2(x^2 - 5)) from -5 to 5 with vertical asymptotes at x = ±√5.
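The domain restriction and the horizontal asymptote can be observed numerically. A small Python sketch for the function of example 1 (the name f is just illustrative): at x = ±2 the zero denominator raises an error, while for large x the values approach the asymptote y = 1.

```python
def f(x):
    """The rational function from example 1: (x^2 - 3x - 2) / (x^2 - 4)."""
    return (x ** 2 - 3.0 * x - 2.0) / (x ** 2 - 4.0)

# The function is undefined at the roots of the denominator, x = +-2:
try:
    f(2.0)
except ZeroDivisionError:
    print("f is not defined at x = 2")

# Far from the origin the values approach the horizontal asymptote y = 1:
print(f(1000.0))   # close to 1
```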

2.6.4 Trigonometric functions

Trigonometric functions are functions of an angle. They are used to relate angles of a triangle to the lengths of the sides of that triangle. The most familiar trigonometric functions are sine, cosine and tangent.

The definition of trigonometric functions works best in the context of the standard unit circle with radius 1. In such a circle a triangle is formed by a ray originating at the origin and making some angle θ with the x-axis. The sine of the angle θ gives the length of the y-component (rise) of the triangle, the cosine gives the length of the x-component (run) and the tangent function gives the slope (y-component divided by the x-component).

Thus it holds that tan(θ) = sin(θ)/cos(θ). These relationships are shown in figure 2.10.

Figure 2.10: The unit circle, a ray (θ = 45°) and the functions sin(θ), cos(θ) and tan(θ).

Trigonometric functions have a wide range of applications including computing unknown lengths and angles in triangles or modeling periodic phenomena. In figure 2.11 the sine function sin(θ) within the model of the unit circle and its function values are presented. It is easy to see that sine is a periodic function oscillating between -1 and 1.

For angles greater than 2π or less than -2π the circle is rotated more than once and therefore sine and cosine are periodic functions with periodicity 2π. To be more exact, the following holds for any angle θ and integer k:

sin θ = sin(θ + 2πk)

cos θ = cos(θ + 2πk).

The trigonometric functions satisfy a range of identities which make them very powerful in algebraic calculations. Here is a list of the most important identities.

• Most frequently used is the Pythagorean identity, which states that for any angle, the square of the sine plus the square of the cosine is 1. In symbolic form, the Pythagorean identity is written as

sin^2 x + cos^2 x = 1.

Figure 2.11: The sine function sin(θ), the unit circle and the graph of the function sin(θ).

• Other key relationships are the sum and difference formulas, which give the sine and cosine of the sum and difference of two angles in terms of sines and cosines of the angles themselves. For the sine the formulas state

sin(x + y) = sin x cos y + cos x sin y
sin(x - y) = sin x cos y - cos x sin y.

Similarly for the cosine the following holds

cos(x + y) = cos x cos y - sin x sin y
cos(x - y) = cos x cos y + sin x sin y.

• When the two angles are equal, the sum formulas reduce to simpler equations known as the double-angle formulas:

sin 2x = 2 sin x cos x
cos 2x = cos^2 x - sin^2 x = 2 cos^2 x - 1
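These identities can be spot-checked numerically with Python's math module; a small sketch (the variable names are my own) evaluates each identity at an arbitrary angle and confirms that the difference between both sides vanishes up to rounding.

```python
import math

theta = 0.7  # radians; any value works

# Each expression below is (left side) - (right side) of an identity:
pythagorean = math.sin(theta) ** 2 + math.cos(theta) ** 2 - 1.0
double_sin = math.sin(2 * theta) - 2 * math.sin(theta) * math.cos(theta)
double_cos = math.cos(2 * theta) - (math.cos(theta) ** 2 - math.sin(theta) ** 2)
tangent = math.tan(theta) - math.sin(theta) / math.cos(theta)
periodic = math.sin(theta) - math.sin(theta + 2 * math.pi * 3)  # k = 3

print(pythagorean, double_sin, double_cos)  # all close to 0
```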

2.6.5 Exponential function e^x

The exponential function is the function e^x, where e is the number

2.71828 18284 59045 23536 02874 71352 66249 77572 47093 69995 . . .

such that the function e^x is its own derivative. The exponential function can be used to model a relationship in which a constant change in the independent variable gives the same proportional change (e.g. percentage increase or decrease) in the dependent variable.

The function is often written as exp(x), especially when it is impractical to write the independent variable as a superscript. The graph of the exponential function for real numbers is illustrated in figure 2.12.

Figure 2.12: The exponential function exp(x) from -5 to 2.

The exponential function arises whenever a quantity grows or decays at a rate proportional to its current value. One such situation is continuously compounded interest. Leonhard Euler proved in the 18th century that the number

lim_{n→∞} (1 + 1/n)^n

actually exists. It is now known as e.

If a principal amount of 1 earns interest at an annual rate of x compounded monthly, then the interest earned each month is x/12 times the current value, so each month the total value is multiplied by (1 + x/12), and the value at the end of the year is

(1 + x/12)^12.

If instead interest is compounded daily this becomes

(1 + x/365)^365.

Letting the number of time intervals per year grow without bound leads to the limit definition of the exponential function

exp(x) = lim_{n→∞} (1 + x/n)^n,

first given by Euler. Another very important application of the exponential function stands in the center of mathematical finance which is the core of our study program, Quantitative Asset and Risk Management (ARIMA). This is the calculation of continuously compounded rates of return. For a brief introduction to return calculation, see chapter 4.

The importance of the exponential function in mathematics and sciences stems mainly from the properties of its derivative (see chapter 3). In particular,

d/dx e^x = e^x.

That is, e^x is its own derivative.
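The limit definition can be observed numerically: increasing the compounding frequency n drives (1 + x/n)^n toward exp(x). A small Python sketch (compound is a hypothetical helper name, not from the text):

```python
import math

def compound(x, n):
    """Value of 1 unit after one year at annual rate x, compounded n times."""
    return (1.0 + x / n) ** n

x = 0.05  # a 5% annual rate
for n in (12, 365, 1_000_000):
    print(n, compound(x, n))   # increases toward the limit
print(math.exp(x))             # the limit exp(0.05)
```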

2.6.6 Logarithmic functions

The logarithm of a number is the exponent by which another fixed value, the base, has to be raised to produce that number. For example, the logarithm of 100 to base 10 is 2, because 100 = 10^2.

More generally, if x = b^y, then y is the logarithm of x to base b, and is written

y = log_b(x).

For example log_10(100) = 2.

Logarithms were introduced by John Napier in the early 17th century as a means to simplify calculations. They were rapidly adopted by scientists, engineers and others to perform computations more easily using slide rules and logarithm tables. These devices rely on the fact that the logarithm of a product is the sum of the logarithms of the factors:

log_b(xy) = log_b(x) + log_b(y).

There are three distinct bases for which the logarithm is often used:

• The logarithm to base b = 10 is called the common logarithm and has many applications in science and engineering. In this text the notation log_10(x) for the logarithm to base 10 will be used. For example log_10(10000) = log_10(10^4) = 4.

• The logarithm where the base is the Euler constant e is called the natural logarithm. It is mostly used in mathematics and for the definition of continuously compounded rates of return in mathematical finance (see section 4.2). In this text the notation log(x) for the natural logarithm will be used. For example log(e^3) = 3.

• The logarithm to base b = 2 is called the binary logarithm and is used primarily in computer science. In this text, we will not use the binary logarithm. In literature the notation lb(x) is most commonly used. For example lb(8) = 3.

The graph of the natural logarithm function for real numbers is shown in figure 2.13.

Figure 2.13: The natural logarithm function log(x) for x ∈ ]0, 10].

It has been shown above that the logarithm of a product is the sum of the logarithms of the numbers being multiplied. Analogously, the logarithm of the ratio of two numbers is the difference of the logarithms. The following listing shows all properties of logarithms regarding arithmetic operations.

product  The logarithm of the product is the sum of the logarithms. In formula:

log_b(xy) = log_b(x) + log_b(y)

quotient  The logarithm of the ratio is the difference of the logarithms. In formula:

log_b(x/y) = log_b(x) - log_b(y)

power  The logarithm of x to the power of p is p times the logarithm of x. In formula:

log_b(x^p) = p · log_b(x)

root  The logarithm of the p-th root of x is the logarithm of x divided by p. In formula:

log_b(x^{1/p}) = log_b(x) / p

Finally the base of the logarithm can be changed. The logarithm log_b(x) can be computed from the logarithms of x and b with respect to an arbitrary base c using the formula

log_b(x) = log_c(x) / log_c(b).
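The rules above, including the change of base, can be spot-checked with Python's math module; a quick sketch (log_b is a hypothetical helper; math.log is the natural logarithm, i.e. base c = e):

```python
import math

def log_b(base, value):
    """Change of base: log_b(x) = log_c(x) / log_c(b) with c = e."""
    return math.log(value) / math.log(base)

x, y, b, p = 8.0, 5.0, 10.0, 3.0

# Each expression is (left side) - (right side) of one rule:
product = log_b(b, x * y) - (log_b(b, x) + log_b(b, y))
quotient = log_b(b, x / y) - (log_b(b, x) - log_b(b, y))
power = log_b(b, x ** p) - p * log_b(b, x)

print(product, quotient, power)   # all close to 0
```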


3 Differentiation

In mathematics the derivative of a function f is a measure of how the function changes as its input changes. Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity.

The derivative of a function at a chosen input value describes the best linear approximation to the function near that input value. For a real-valued function with a single variable, the derivative at a point equals the slope of the tangent line to the function at that point (see figure 3.1).

Figure 3.1: The graph of a function and its tangent line.

The process of finding a derivative is called differentiation. As stated above, differentiation is a method to compute the rate at which a dependent output y changes with respect to the change in the independent input x. This rate of change is called the derivative of y with respect to x.

The simplest case is when y is a linear function of x, meaning that the graph y = f(x) in the xy-plane is a straight line. In this case,

y = f(x) = kx + d

for k, d ∈ R and the slope k is given by

k = (change in y) / (change in x) = Δy/Δx,

where the symbol Δ (the uppercase form of the Greek letter Delta) is an abbreviation for "change in". This gives the exact value k for the slope of the straight line and subsequently for its derivative.


If the function f is not linear, the change in y divided by the change in x varies as x varies. Differentiation is a method to find an exact value for this rate of change at any given value of x. The idea is to compute the rate of change as the limit of the ratio of the differences

Δy/Δx = (f(x + h) - f(x)) / ((x + h) - x) = (f(x + h) - f(x)) / h,

as Δx (or h respectively) becomes infinitely small.

In figure 3.2 the graph of a function and its secant line are shown. As h becomes infinitely small, the secant line becomes the tangent line. Its slope is the derivative of the function at x.

Figure 3.2: The graph of a function and a secant line to that function.

In Leibniz's notation, such an infinitesimal change in x is denoted by dx, and the derivative of y with respect to x is written

dy/dx,

suggesting the ratio of two infinitesimal quantities. The expression is read as "the derivative of y with respect to x" or "dy over dx".

Simultaneously with Leibniz, the English mathematician and physicist Newton developed the concept of differentiation in the 17th century. He developed a slightly different notation where the difference quotient is defined as

Δf(x)/Δx = (f(x + h) - f(x)) / h.

The derivative is the value of the difference quotient as h becomes infinitesimally small.
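The difference quotient can be evaluated numerically for shrinking h; for f(x) = x^2 at x = 3 it approaches the derivative 6. A small Python sketch (the helper name difference_quotient is my own):

```python
def difference_quotient(f, x, h):
    """The ratio (f(x + h) - f(x)) / h from the definition above."""
    return (f(x + h) - f(x)) / h

def f(x):
    return x ** 2

# As h shrinks, the quotient 6 + h approaches the derivative f'(3) = 6:
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(f, 3.0, h))
```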


3.1 Continuity and differentiability

For a function f to have a derivative at point a, it is necessary for the function f to be continuous at a (see section 2.3.3), but continuity alone is not sufficient. For example, the absolute value function f(x) = |x| is continuous at x = 0, but is not differentiable there (see figure 3.3).

In the graph of the function in the xy-plane this can be seen as a kink or a cusp in the graph at x = 0. The derivative does not exist at x = 0 as the tangent line at this point is not unique.

Figure 3.3: The function f(x) = |x|. It is not differentiable at x = 0.

Let f be a differentiable function, and let f′(x) be its derivative. The derivative of f′(x) (if it has one) is written f′′(x) and is called the second derivative of f. Similarly the derivative of a second derivative is written f′′′(x) and is called the third derivative of f. These repeated derivatives are called higher-order derivatives.

A function that has infinitely many derivatives is called infinitely differentiable or smooth. For example, every polynomial function is infinitely differentiable (see section 2.6.1).

3.2 Computing the derivative

In theory the derivative of a function can be computed from the definition by considering the difference quotient and computing its limit. In practice the derivatives are computed using rules for obtaining derivatives of complicated functions from simpler ones.


3.2.1 Differentiation is linear

For any functions f(x) and g(x) and any real numbers a, b and c, the derivative of the function h(x) = a · f(x) + b · g(x) + c with respect to x is

h′(x) = a · f ′(x) + b · g′(x).

Special cases include

• the constant multiple rule
(a · f(x))′ = a · f′(x)

• the sum rule
(f(x) + g(x))′ = f′(x) + g′(x)

• the subtraction rule
(f(x) - g(x))′ = f′(x) - g′(x)

• the constant rule
(f(x) + c)′ = f′(x)

3.2.2 The product rule

For the functions f(x) and g(x), the derivative of the function h(x) = f(x) · g(x) with respect to x is

h′(x) = f ′(x) · g(x) + f(x) · g′(x).

3.2.3 The quotient rule

For the functions f(x) and g(x), the derivative of the function h(x) = f(x)/g(x) with respect to x is

h′(x) = (f(x)/g(x))′ = (f′(x) · g(x) - g′(x) · f(x)) / g(x)^2

whenever g(x) is nonzero.


3.2.4 The chain rule

The derivative of the function of a function h(x) = f(g(x)) (or h(x) = f ∘ g(x) respectively, see section 2.4.2) with respect to x is

h′(x) = f ′(g(x)) · g′(x).

3.2.5 The elementary and the generalized power rule

If f(x) = x^r and r ∈ R \ {0}, then

f′(x) = r x^{r-1}.

If r = 0 then the function f is a constant function and its derivative equals 0.

The most general power rule is the functional power rule. For any functions f(x) and g(x),

(f(x)^{g(x)})′ = (e^{g(x) · log(f(x))})′ = f(x)^{g(x)} · (f′(x) · g(x)/f(x) + g′(x) · log(f(x))),

where both sides are well defined.

3.2.6 Derivatives of exponential and logarithmic functions

The exponential function is its own derivative (see section 2.6.5), therefore

(e^x)′ = e^x.

For the logarithmic function the following holds

(log_c x)′ = 1 / (x log c), c > 0, c ≠ 1, x > 0

and

(log x)′ = 1/x, x > 0.

Furthermore

(log |x|)′ = 1/x  and  (x^x)′ = x^x (1 + log x)

holds as well. By applying the chain rule it can be stated that

(log f)′ = f′/f wherever f is positive.


3.2.7 Derivatives of trigonometric functions

The most important trigonometric functions were introduced in section 2.6.4. Their derivatives are

(sin x)′ = cos x,
(cos x)′ = - sin x,
(tan x)′ = 1/cos^2 x = 1 + tan^2 x.

Examples

1. The derivative of

f(x) = 4x^7

is

f′(x) = 4 · 7x^6 = 28x^6.

2. The derivative of

f(x) = 3x^8 + 2x + 1

is

f′(x) = 3 · 8x^7 + 2x^0 = 24x^7 + 2.

3. The derivative of

f(x) = 2√x = 2x^{1/2}

is

f′(x) = 2 · (1/2) x^{-1/2} = 1/√x.

4. The derivative of

f(x) = (x + 1)(x^2 - x + 1)

using the product rule is

f′(x) = x^0 · (x^2 - x + 1) + (x + 1)(2x - 1 · x^0)
      = x^2 - x + 1 + (x + 1)(2x - 1) = 3x^2.

5. The derivative of

f(x) = (2x + 4) / (x^2 + 1)

using the quotient rule is

f′(x) = (2(x^2 + 1) - 2x(2x + 4)) / (x^2 + 1)^2 = -2 (x^2 + 4x - 1) / (x^2 + 1)^2.

6. The derivative of

f(x) = sin(1/x^2)

using the chain rule is

f′(x) = cos(1/x^2) · (-2x^{-3}) = - cos(1/x^2) · 2/x^3.

7. The derivative of

f(x) = log 5x

is

f′(x) = 5/(5x) = 1/x.

8. The derivative of

f(x) = x^3 e^{cos x}

using the product rule and the chain rule is

f′(x) = 3x^2 e^{cos x} - x^3 sin x · e^{cos x}.
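Worked derivatives like these can be cross-checked by comparing the stated formula against a numerical approximation. A Python sketch using a central difference quotient (the helper names are my own) for examples 5 and 8 above:

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central difference approximation (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example 5: f(x) = (2x + 4)/(x^2 + 1) and its stated derivative
f5 = lambda x: (2.0 * x + 4.0) / (x ** 2 + 1.0)
d5 = lambda x: -2.0 * (x ** 2 + 4.0 * x - 1.0) / (x ** 2 + 1.0) ** 2

# Example 8: f(x) = x^3 e^{cos x} and its stated derivative
f8 = lambda x: x ** 3 * math.exp(math.cos(x))
d8 = lambda x: (3.0 * x ** 2 - x ** 3 * math.sin(x)) * math.exp(math.cos(x))

for x in (0.5, 1.3, 2.0):
    print(d5(x) - numeric_derivative(f5, x))  # close to 0
    print(d8(x) - numeric_derivative(f8, x))  # close to 0
```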


4 Rate of return calculation

The rate of return expresses the profit or loss as a percentage of the original value of the asset. It measures by what percentage the original value changed.

4.1 Discrete rate of return

The discrete rate of return R on an asset (alternatively called simple rate of return or periodically compounded rate of return) is calculated as

R = (S_{t2} - S_{t1}) / S_{t1} = S_{t2}/S_{t1} - 1,

where S_{t1} and S_{t2} are prices of an asset at different points in time t1 and t2, where t1 < t2. Hence R represents a T-period discrete rate of return, where T = t2 - t1.

In order to be able to compare rates of return that are calculated over different time periods, the rates of return are typically annualised. The annualised discrete rate of return R_ann is calculated from the T-period rate of return R as

R_ann = (1 + R)^{1/T} - 1.

A T-period rate of return R is calculated from an annualised rate of return R_ann as

R = (1 + R_ann)^T - 1.

The value at t2 of an investment in a specific asset at time t1 is

V_{t2} = V_{t1} · (1 + R) = V_{t1} · (1 + R_ann)^T,

where V_{t1} and V_{t2} represent the value of the investment at time t1 and t2, respectively.
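These formulas translate directly into code. A minimal Python sketch (the function names are my own), using a stock that rises from 100 € to 110 € over two years:

```python
def discrete_return(s1, s2):
    """T-period discrete rate of return R = S_t2 / S_t1 - 1."""
    return s2 / s1 - 1.0

def annualize(r_period, years):
    """R_ann = (1 + R)^(1/T) - 1 for a return R earned over T years."""
    return (1.0 + r_period) ** (1.0 / years) - 1.0

# 100 EUR -> 110 EUR over T = 2 years:
R = discrete_return(100.0, 110.0)        # 0.10, i.e. 10%
R_ann = annualize(R, 2.0)                # about 0.04881
value = 10_000.0 * (1.0 + R_ann) ** 2    # about 11 000 EUR
```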

Examples

Example 4.1 Consider the price information for four stocks, given in table 4.1.

Asset                                  Stock 1   Stock 2   Stock 3   Stock 4
Price on January 1st, 2014 (S_{t1})    100 €     67 €      3 486 €   587 €
Price on January 1st, 2016 (S_{t2})    110 €     69 €      3 674 €   203 €

Table 4.1: Prices of the 4 stocks on two dates for example 4.1.

Assume we want to calculate the T-period discrete rate of return R, the annualized discrete rate of return R_ann and the value of an original investment of 10 000 € in 2014 in each of the assets at January 1st, 2016.

For Stock 1 we get:

• a 2-year discrete rate of return:
R_1 = 110/100 - 1 = 10%.

• an annualized discrete rate of return:
R_ann,1 = 1.1^{1/2} - 1 = 4.881%.

• a value on January 1st, 2016:
^1V_{2016} = 10 000 · 1.04881^2 = 11 000 €.

For Stock 2 we get:

• a 2-year discrete rate of return:
R_2 = 69/67 - 1 = 2.985%.

• an annualized discrete rate of return:
R_ann,2 = 1.02985^{1/2} - 1 = 1.482%.

• a value on January 1st, 2016:
^2V_{2016} = 10 000 · 1.01482^2 = 10 298.51 €.

For Stock 3 we get:

• a 2-year discrete rate of return:
R_3 = 3 674/3 486 - 1 = 5.393%.

• an annualized discrete rate of return:
R_ann,3 = 1.05393^{1/2} - 1 = 2.661%.

• a value on January 1st, 2016:
^3V_{2016} = 10 000 · 1.02661^2 = 10 539.30 €.

For Stock 4 we get:

• a 2-year discrete rate of return:
R_4 = 203/587 - 1 = -65.417%.

• an annualized discrete rate of return:
R_ann,4 = 0.34583^{1/2} - 1 = -41.193%.

• a value on January 1st, 2016:
^4V_{2016} = 10 000 · 0.58807^2 = 3 458.26 €.

Example 4.2 Consider the price information for four stocks, given in table 4.2.

Asset                                      Stock 1   Stock 2   Stock 3   Stock 4
Price on November 30th, 2015 (S_{t1})      107 €     68 €      3 681 €   242 €
Price on December 17th, 2015 (S_{t2})      109 €     70 €      3 682 €   199 €

Table 4.2: Prices of the 4 stocks on two dates for example 4.2.

Assume we want to calculate the T-period discrete rate of return R, the annualized discrete rate of return R_ann and the value of an original investment of 10 000 € on November 30th, 2015 in each of the assets at December 17th, 2015.

The time period T is 17 days, corresponding to T = 17/365 = 0.04657534 years.

For Stock 1 we get:

• a 17/365-year discrete rate of return:
R_1 = 109/107 - 1 = 1.869%.

• an annualized discrete rate of return (note that 1/(17/365) = 365/17):
R_ann,1 = 1.01869^{365/17} - 1 = 48.827%.

• a value on December 17th, 2015:
^1V_{t2} = 10 000 · 1.48827^{17/365} = 10 186.92 €.

For Stock 2 we get:

• a 17/365-year discrete rate of return:
R_2 = 70/68 - 1 = 2.941%.

• an annualized discrete rate of return:
R_ann,2 = 1.02941^{365/17} - 1 = 86.336%.

• a value on December 17th, 2015:
^2V_{t2} = 10 000 · 1.86336^{17/365} = 10 294.12 €.

For Stock 3 we get:

• a 17/365-year discrete rate of return:
R_3 = 3 682/3 681 - 1 = 0.027%.

• an annualized discrete rate of return:
R_ann,3 = 1.00027^{365/17} - 1 = 0.585%.

• a value on December 17th, 2015:
^3V_{t2} = 10 000 · 1.00585^{17/365} = 10 002.72 €.

For Stock 4 we get:

• a 17/365-year discrete rate of return:
R_4 = 199/242 - 1 = -17.769%.

• an annualized discrete rate of return:
R_ann,4 = 0.82231^{365/17} - 1 = -98.501%.

• a value on December 17th, 2015:
^4V_{t2} = 10 000 · 0.01499^{17/365} = 8 223.14 €.

4.2 Continuously compounded rate of return

The continuously compounded rate of return r on an investment (alternatively called log-return) is mostly used in financial models. It is calculated as

r = log(S_{t2}) - log(S_{t1}) = log(S_{t2}/S_{t1}),

where log(·) is the natural logarithm and S_{t1} and S_{t2} are prices of an asset at different points in time t1 and t2, where t1 < t2. Hence r represents a T-period continuously compounded rate of return, where T = t2 - t1.

The annualized continuously compounded rate of return r_ann is calculated from the T-period rate of return r as

r_ann = r/T.

A T-period rate of return r is calculated from an annualized rate of return r_ann as

r = r_ann · T.

The value at t2 of an investment in a specific asset at time t1 is

V_{t2} = V_{t1} · e^r = V_{t1} · e^{r_ann · T},

where V_{t1} and V_{t2} represent the value of the investment at time t1 and t2, respectively, and e is Euler's number, which was introduced in section 2.6.5.
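Analogously to the discrete case, the log-return formulas can be sketched in Python (the function names are my own), again for a stock rising from 100 € to 110 € over two years:

```python
import math

def log_return(s1, s2):
    """Continuously compounded (log) return r = log(S_t2 / S_t1)."""
    return math.log(s2 / s1)

# 100 EUR -> 110 EUR over T = 2 years:
r = log_return(100.0, 110.0)               # about 0.09531
r_ann = r / 2.0                            # annualized: r / T
value = 10_000.0 * math.exp(r_ann * 2.0)   # about 11 000 EUR
```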


Examples

Example 4.3 Consider the price information for four stocks, given in table 4.3.

Asset                                  Stock 1   Stock 2   Stock 3   Stock 4
Price on January 1st, 2014 (S_{t1})    100 €     67 €      3 486 €   587 €
Price on January 1st, 2016 (S_{t2})    110 €     69 €      3 674 €   203 €

Table 4.3: Prices of the 4 stocks on two dates for example 4.3.

Assume we want to calculate the T-period continuously compounded rate of return r, the annualized continuously compounded rate of return r_ann and the value of an original investment of 10 000 € in 2014 in each of the assets at January 1st, 2016.

For Stock 1 we get:

• a 2-year continuously compounded rate of return:
r_1 = log(110/100) = 9.531%.

• an annualized continuously compounded rate of return:
r_ann,1 = 0.09531 · 1/2 = 4.766%.

• a value on January 1st, 2016:
^1V_{2016} = 10 000 · e^{0.04766 · 2} = 11 000 €.

For Stock 2 we get:

• a 2-year continuously compounded rate of return:
r_2 = log(69/67) = 2.941%.

• an annualized continuously compounded rate of return:
r_ann,2 = 0.02941 · 1/2 = 1.471%.

• a value on January 1st, 2016:
^2V_{2016} = 10 000 · e^{0.01471 · 2} = 10 298.51 €.

For Stock 3 we get:

• a 2-year continuously compounded rate of return:
r_3 = log(3 674/3 486) = 5.253%.

• an annualized continuously compounded rate of return:
r_ann,3 = 0.05253 · 1/2 = 2.626%.

• a value on January 1st, 2016:
^3V_{2016} = 10 000 · e^{0.02626 · 2} = 10 539.30 €.

For Stock 4 we get:

• a 2-year continuously compounded rate of return:
r_4 = log(203/587) = -106.182%.

• an annualized continuously compounded rate of return:
r_ann,4 = -1.06182 · 1/2 = -53.091%.

• a value on January 1st, 2016:
^4V_{2016} = 10 000 · e^{-0.53091 · 2} = 3 458.26 €.

Example 4.4 Consider the price information for four stocks, given in table 4.4.

Asset                                      Stock 1   Stock 2   Stock 3   Stock 4
Price on November 30th, 2015 (S_{t1})      107 €     68 €      3 681 €   242 €
Price on December 17th, 2015 (S_{t2})      109 €     70 €      3 682 €   199 €

Table 4.4: Prices of the 4 stocks on two dates for example 4.4.

Assume we want to calculate the T-period continuously compounded rate of return r, the annualized continuously compounded rate of return r_ann and the value of an original investment of 10 000 € on November 30th, 2015 in each of the assets at December 17th, 2015.

The time period T is 17 days, corresponding to T = 17/365 = 0.04657534 years.

For Stock 1 we get:

• a 17/365-year continuously compounded rate of return:
r_1 = log(109/107) = 1.852%.

• an annualized continuously compounded rate of return:
r_ann,1 = 0.01852 · 365/17 = 39.761%.

• a value on December 17th, 2015:
^1V_{t2} = 10 000 · e^{0.39761 · 17/365} = 10 186.92 €.

For Stock 2 we get:

• a 17/365-year continuously compounded rate of return:
r_2 = log(70/68) = 2.899%.

• an annualized continuously compounded rate of return:
r_ann,2 = 0.02899 · 365/17 = 62.238%.

• a value on December 17th, 2015:
^2V_{t2} = 10 000 · e^{0.62238 · 17/365} = 10 294.12 €.

For Stock 3 we get:

• a 17/365-year continuously compounded rate of return:
r_3 = log(3 682/3 681) = 0.027%.

• an annualized continuously compounded rate of return:
r_ann,3 = 0.00027 · 365/17 = 0.583%.

• a value on December 17th, 2015:
^3V_{t2} = 10 000 · e^{0.00583 · 17/365} = 10 002.72 €.

For Stock 4 we get:

• a 17/365-year continuously compounded rate of return:
r_4 = log(199/242) = -19.563%.

• an annualized continuously compounded rate of return:
r_ann,4 = -0.19563 · 365/17 = -420.035%.

• a value on December 17th, 2015:
^4V_{t2} = 10 000 · e^{-4.20035 · 17/365} = 8 223.14 €.

4.3 The relationship between discrete and continuously compounded rate of return

A continuously compounded rate of return is calculated from a discrete rate of return as

r = log(1 + R)

and a discrete rate of return is calculated from a continuously compounded rate of return as

R = e^r - 1.

If, for example, the discrete rate of return is 10%, R = 10%, the corresponding continuously compounded rate of return is r = log(1.1) = 9.531%. We may verify that e^{0.09531} - 1 = 10%. As a matter of fact r ≤ R holds in all cases.


A total loss of the investment (if the value of the invested amount corresponds to zero, V_{t2} = 0) corresponds to R = -100% and r = -∞.
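The two conversions, and the inequality r ≤ R, can be checked in a few lines of Python (the function names are my own):

```python
import math

def discrete_to_continuous(R):
    """Continuously compounded from discrete: r = log(1 + R)."""
    return math.log(1.0 + R)

def continuous_to_discrete(r):
    """Discrete from continuously compounded: R = e^r - 1."""
    return math.exp(r) - 1.0

R = 0.10
r = discrete_to_continuous(R)       # about 0.09531
R_back = continuous_to_discrete(r)  # recovers 0.10 up to rounding
```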

Figure 4.1 shows the relationship between these two definitions of returns. The figure shows the resulting returns for an investment with V_{t1} = 100 and different end-values V_{t2}.

Figure 4.1: The relationship between continuously compounded and discrete returns.


5 Descriptive statistics

Descriptive statistics is the study of the collection, organization, analysis and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of surveys and experiments. Statistical methods can be used for summarizing and describing a collection of data. This is called descriptive statistics and is useful in research to communicate the results of experiments.

5.1 Basic concepts of descriptive statistics

The set on which descriptive statistics is usually applied is called the population or the sample, depending on the size of the set. The term statistical characteristics denotes the entities we actually want to measure and analyze. The following sections provide a short overview of the most important basic concepts of statistics.

5.1.1 Population

For applying statistics to a problem it is necessary to begin with a population to be studied. A population can also be composed of observations of a process at various points in time, with data from each observation serving as a different member of the overall group. Data collected this way constitutes what is called a time series. In general, a population consists of all elements - individuals, items or objects - whose characteristics we wish to study.

Examples for different populations

• The ballots of every person who voted in a given election.

• The sex of every person of a country.

• The electric charge of every atom composing a crystal.

• Price of gasoline at every petrol station in a country.


5.1.2 Sample

A sample is a selection or a subset of entities we can actually study from within a population. This may be used to estimate characteristics of the whole population. Usually, an entire population cannot be surveyed because the cost of a census is too high. The main advantages of sampling are that

1. the costs are lower,

2. data collection is faster,

3. studying the whole population is often not practical,

4. if the object of interest is defined as a time series, data for the whole population cannot be collected as we would need observations for all points in time.

Examples for samples

• The ballots of 5000 voters in a given election.

• The returns of a given stock during 2008 - 2012.

• Prices of 50 petrol stations in a country.

5.1.3 Statistical characteristics and their attributes

When the population and the sampling process are specified, the entities of statistical investigation are fixed. The set of all entities in the population is usually denoted as Ω. Examples are voters, manufactured products or stocks.

Next the statistical characteristic, the object of statistical investigation, has to be specified. In this text the set of statistical characteristics is usually denoted as S. Examples are the vote, the exact dimension of the product or the prices of a stock at certain points in time.

Finally the statistical attributes, the properties of statistical characteristics, can be described. The statistical attributes therefore comprise all elements of the set S. Examples are SPÖ or ÖVP, 2.1243 cm or 2.1245 cm, and 75 € or 77 €.


Summarizing, the model used in statistical investigations can be formally denoted as

X : Ω → S
    ω ↦ s.

Statistical characteristics can be classified as qualitative or quantitative characteristics.

Qualitative statistical characteristics

If a sample is measured using qualitative characteristics, the statistical sample cannot directly be measured with numbers. Within qualitative statistical characteristics two concepts can be distinguished, namely nominal and ordinal characteristics.

1. A nominal characteristic is measured on a nominal scale. Within such a scale, there is no hierarchy of attributes and the distance between the attributes cannot be interpreted.

Therefore nominal characteristics only allow qualitative classification. The elements under investigation are measured in terms of whether the individual item belongs to some distinctive category, but these categories cannot be quantified or ordered.

Examples: Eye color, occupation, religious affiliation, stock exchange of an equity share.

2. An ordinal characteristic is measured on a so-called ordinal scale. For this scale, there is a hierarchy of attributes, but the distance between the attributes cannot be interpreted.

Therefore an ordinal characteristic allows one to rank measured items in terms of which has less and which has more of the quality represented by the characteristic, but there is no information about how much more.

Examples: Drinking habit, level of education, grades, rating of a company.

Quantitative statistical characteristics

The attributes of a quantitative characteristic are characterized by their magnitude. There exists not only a hierarchy of the attributes (in this case values), but also the


distance between them can be interpreted meaningfully. Depending on the values an attribute can have, two types can be distinguished.

1. A discrete characteristic may possess only certain distinct attributes. The number of values can be finite or countably infinite. Most of the time, the set of a discrete quantitative characteristic is a subset of the natural numbers, formally S ⊆ ℕ.

Examples for discrete characteristics: Number of bad items in a sample, monthly income in €, number of employees of a company.

2. A continuous characteristic may possess, at least theoretically, any value within an interval. In general the set of such a statistical characteristic S is a subset of the real numbers, formally S ⊆ ℝ.

Examples for continuous characteristics: Temperature, length, daily return of a stock.

5.2 Distribution analysis

For this section we concentrate on a statistical model with a finite set of statistical characteristics.

5.2.1 Absolute and relative frequencies

The absolute frequency of an event E_i is the number n_i of times the event occurred in the sample (or the population) under investigation. Formally, let S be the finite set of statistical characteristics containing k attributes a_j, j = 1, 2, . . . , k.

The absolute frequency of the attribute aj , j = 1, . . . , k is defined as

F(aj) = Number of cases in which aj occurs.

The relative frequency of the attribute aj , j = 1, . . . , k is defined as

f(a_j) = (1/n) · F(a_j),

where n denotes the number of observations. Important properties of absolute and relative frequencies are


1. The absolute frequency of any attribute is less than or equal to the number of observations,

0 ≤ F(aj) ≤ n, j = 1, . . . , k.

2. The relative frequency of any attribute is less than or equal to 1 (or 100%),

0 ≤ f(aj) ≤ 1, j = 1, . . . , k.

3. The sum of the absolute frequencies of all attributes equals the number of observations n,

∑_{j=1}^{k} F(a_j) = n.

4. The sum of all relative frequencies of all attributes equals 1 (or 100%),

∑_{j=1}^{k} f(a_j) = 1.

Example for the calculation of frequencies of a statistical model
Let X : Ω → S be the statistical model which assigns a grade in a given exam to all students. Therefore the set Ω is the set of all students, while the set S consists of all possible grades⁹, S = {1, 2, 3, 4, 5}. Clearly, each grade can occur several times within a given sample. The number of students who received a certain grade is then the absolute frequency of this grade.

Based on the calculation of frequencies, a frequency distribution can be defined. The sequence

F(a1),F(a2), . . . ,F(ak)

is called the absolute frequency distribution. Analogously, the sequence

f(a1), f(a2), . . . , f(ak)

is called the relative frequency distribution.

Example for the calculation of absolute and relative frequencies
Let X : Ω → S be a statistical model for the average harmonized Austrian inflation rate (HVPI) since 1996. For the sake of simplicity the data has been rounded to half-percent values. The data is presented in table 5.1. The data contains 18 data points and 7 different values. Therefore

n = 18 and k = 7.

The absolute and relative frequency distributions are presented in table 5.2.

⁹ The Austrian grade system is used for this example.


year   inflation p.a.
1996   2.0%
1997   1.0%
1998   1.0%
1999   0.5%
2000   2.0%
2001   2.5%
2002   1.5%
2003   1.5%
2004   2.0%
2005   2.0%
2006   1.5%
2007   2.0%
2008   3.0%
2009   0.5%
2010   1.5%
2011   3.5%
2012   2.5%
2013   2.0%

Table 5.1: Harmonized Austrian inflation rates since 1996, rounded in 1/2%-steps.

a_j    F(a_j)   f(a_j)
0.5%   2        0.111
1.0%   2        0.111
1.5%   4        0.222
2.0%   6        0.333
2.5%   2        0.111
3.0%   1        0.056
3.5%   1        0.056
∑      18       1.000

Table 5.2: Absolute and relative frequencies for the Austrian inflation data given in table 5.1.
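The frequency distributions in table 5.2 can be reproduced in a few lines of Python; the following sketch (variable names are ours) computes F(a_j) and f(a_j) from the raw data in table 5.1.

```python
from collections import Counter

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

n = len(infl)                          # number of observations, n = 18
F = Counter(infl)                      # absolute frequencies F(a_j)
f = {a: F[a] / n for a in sorted(F)}   # relative frequencies f(a_j) = F(a_j) / n

for a in sorted(F):
    print(f"{a}%: F = {F[a]}, f = {f[a]:.3f}")
```

The absolute frequencies sum to n = 18 and the relative frequencies to 1, matching properties 3 and 4 above.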

5.2.2 Visualization of a frequency distribution

There are several ways to visualize frequency distributions. In the following sections the most important methods are presented.

Bar charts

A bar chart depicts the frequencies by a series of bars. Absolute or relative frequencies can be depicted (see figure 5.1 for an absolute frequency bar chart).

Figure 5.1: A bar chart for the absolute frequencies for the Austrian inflation rates.

A common term for a bar chart of the frequency distribution is histogram. Most of the time a histogram combines a bar chart of the absolute frequency distribution with the probability density function of a normal distribution¹⁰ with the same mean and standard deviation as the given data. See figure 5.2 for an absolute frequency bar chart and the approximated normal distribution as a red line.

Figure 5.2: A histogram for the Austrian inflation rates.

Polygons

A frequency polygon is a simple line graph of the relative or absolute frequency distribution. See figure 5.3 for a relative frequency line chart for the given data.

¹⁰ For the definition of normal distributions see chapter 6.7.4.


Figure 5.3: A line chart for relative frequencies for the Austrian inflation rates.

Pie charts

A pie chart is a pie-shaped figure in which the pieces of the pie represent the relative frequencies. See figure 5.4 for a relative frequency pie chart for the given data.

Figure 5.4: A pie chart for the relative frequencies for the Austrian inflation rates.

5.2.3 Skewness

The frequency distribution charts with bars or polygons allow us to describe the frequency distribution in terms of skewness and kurtosis. In terms of skewness, a frequency curve can be


Figure 5.5: The different forms of frequency curves in terms of skewness (positively skewed, negatively skewed, symmetric).

positively skewed The curve is nonsymmetric with the tail to the right. See the distribution curve on the top of figure 5.5.

negatively skewed The curve is nonsymmetric with the tail to the left. See the distribution curve in the middle of figure 5.5.

symmetrical The curve is symmetric. See the distribution curve at the bottom of figure 5.5.

Karl Pearson suggested a simple calculation as a measure of skewness (sometimes denoted


as Pearson's median skewness). He defined the skewness coefficient S as¹¹

S = 3(X̄ − Me) / s,   (−3 ≤ S ≤ 3).

Example for the calculation of the skewness coefficient
The skewness coefficient S of the Austrian inflation data presented in table 5.1 is

S = 3 · (1.8056% − 2.00%) / 0.7885% = −0.740.

As can easily be seen, the given data is negatively skewed.
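This number can be reproduced by computing mean, median and sample standard deviation with Python's statistics module (a sketch; the data list is transcribed from table 5.1):

```python
import statistics

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

x_bar = statistics.mean(infl)    # arithmetic mean, about 1.8056 %
me = statistics.median(infl)     # median, 2.0 %
s = statistics.stdev(infl)       # sample standard deviation, about 0.7885 %

S = 3 * (x_bar - me) / s         # Pearson's skewness coefficient
print(round(S, 3))
```

A negative S confirms the negative skew of the data.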

5.2.4 Kurtosis

In terms of kurtosis, a frequency curve can be

mesokurtic The curve is neither flat nor peaked, in terms of the distribution of observed values. See type A in figure 5.6.

leptokurtic The curve is peaked, with the observations concentrated within a narrow range of values. See type B in figure 5.6.

platykurtic The curve is flat, with the observations distributed relatively evenly. See type C in figure 5.6.

Figure 5.6: The different forms of frequency curves in terms of kurtosis.

¹¹ The measures of location arithmetic mean X̄ and median Me are defined in section 5.5. The definition of the standard deviation s can be found in section 5.6.


5.3 Cumulative absolute and relative frequencies

In addition to the absolute and relative frequency distribution, the cumulative distribution can be of interest. The cumulative frequency distribution visualizes the number of values or percentage of values equal to or below a distinct attribute a_j.

Formally the cumulative absolute frequency of the attribute a_j, j = 1, . . . , k is defined as

F(a_1) + F(a_2) + . . . + F(a_j) = ∑_{i=1}^{j} F(a_i), j = 1, . . . , k.

Similarly the cumulative relative frequency of the attribute a_j, j = 1, . . . , k is defined as

f(a_1) + f(a_2) + . . . + f(a_j) = ∑_{i=1}^{j} f(a_i), j = 1, . . . , k.

Continued example
The cumulative distributions for the example in section 5.2.1 are presented in table 5.3. A polygon plot of the cumulative absolute or relative frequencies is denoted as an ogive.

j   a_j    F(a_j)  f(a_j)  ∑_{i=1}^{j} F(a_i)  ∑_{i=1}^{j} f(a_i)
1   0.5%   2       0.111   2                   0.111
2   1.0%   2       0.111   4                   0.222
3   1.5%   4       0.222   8                   0.444
4   2.0%   6       0.333   14                  0.778
5   2.5%   2       0.111   16                  0.889
6   3.0%   1       0.056   17                  0.944
7   3.5%   1       0.056   18                  1.000

Total      18      1.000

Table 5.3: Absolute and relative frequencies combined with their cumulative distributions, calculated for the data in table 5.1.
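The cumulative columns of table 5.3 are running sums over the attributes in ascending order; a possible sketch in Python (names are ours):

```python
from collections import Counter
from itertools import accumulate

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

n = len(infl)
attrs = sorted(set(infl))                      # a_1 < a_2 < ... < a_k
F = Counter(infl)
cum_F = list(accumulate(F[a] for a in attrs))  # cumulative absolute frequencies
cum_f = [c / n for c in cum_F]                 # cumulative relative frequencies

for a, cF, cf in zip(attrs, cum_F, cum_f):
    print(f"up to {a}%: {cF} observations, {cf:.3f}")
```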

In an ogive the data values are shown on the horizontal axis and either the cumulative absolute frequencies or the cumulative relative frequencies on the vertical axis. In figure 5.7 an ogive with cumulative relative frequencies is shown.

5.4 Empirical distribution function

The empirical distribution function, or empirical CDF, is the cumulative distribution function associated with the empirical measure of the sample. This CDF is a step function


Figure 5.7: An ogive for cumulative relative frequencies for the Austrian inflation rates.

that jumps up by 1/n at each of the n data points. The empirical distribution function estimates the true underlying cumulative distribution function of the data.

The empirical distribution function is defined as follows

F(x) = (number of elements in the sample with value ≤ x) / n

= 0 for x < a_1,
= ∑_{i=1}^{j} f(a_i) for a_j ≤ x < a_{j+1},
= 1 for x ≥ a_k.

The empirical distribution function fulfills some important properties.

1. The empirical CDF describes the proportion of values in the sample which are less than or equal to a distinct value x. Formally

F (x) = P(X ≤ x) (P : proportion)

2. The empirical CDF always lies between 0 and 1. Formally

0 ≤ F (x) ≤ 1 ∀x

3. The empirical CDF is a monotonically increasing function. Formally

∀x1, x2 : x1 < x2 ⇒ F (x1) ≤ F (x2)


4. The empirical CDF can be used to calculate the proportion of values within a certain half-open interval ]x_1, x_2].

P(x1 < X ≤ x2) = F (x2)− F (x1)

5. F(x) is right-continuous and has at most a finite number of jump discontinuities.

6. All empirical CDFs approach 0 as x→ −∞ and 1 as x→∞.

Figure 5.8: The empirical cumulative distribution function for the Austrian inflation rates.

The empirical cumulative distribution function can be represented in different ways. In figure 5.8 the empirical cumulative distribution function for the example of Austrian inflation rates is shown. The circles denote that the corresponding points are excluded and the filled points are included in the graph.

Additionally, the empirical CDF can be represented in tabular form. To do so, the domain of the function is divided into half-open intervals and the function value is specified for each interval. In table 5.4 the empirical cumulative distribution function for the Austrian inflation rates is shown.

Interpretation example
The empirical CDF allows the following calculations and interpretations.

1. The value of the empirical CDF at 0.022 is

F (0.022) = 0.778.


Interval          F(x)
]−∞, 0.005[       0.000
[0.005, 0.010[    0.111
[0.010, 0.015[    0.222
[0.015, 0.020[    0.444
[0.020, 0.025[    0.778
[0.025, 0.030[    0.889
[0.030, 0.035[    0.944
[0.035, +∞[       1.000

Table 5.4: The empirical cumulative distribution function for Austrian inflation rates since 1996 in tabular form.

This can be interpreted as: 77.8% of the average harmonized inflation rates are less than or equal to 2.20%.

2. The value of the difference of the empirical CDF at 0.030 and 0.021 can be used to calculate the proportion of inflation rates lying between 2.1% and 3.0%. We get

F(0.030) − F(0.021) = 0.944 − 0.778 = 0.167.

Therefore 16.7% of the harmonized inflation rates lie between 2.1% and 3.0%.
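Both interpretations can be checked directly from the definition F(x) = (number of values ≤ x)/n; a small sketch (the function name is ours):

```python
# Austrian inflation rates 1996-2013 (table 5.1), as decimals
infl = [x / 100 for x in [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
                          2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]]

def ecdf(x, data):
    """Proportion of sample values less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)

print(round(ecdf(0.022, infl), 3))                      # F(0.022)
print(round(ecdf(0.030, infl) - ecdf(0.021, infl), 3))  # proportion in ]0.021, 0.030]
```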

5.5 Measures of location

5.5.1 Arithmetic mean

The arithmetic mean, or simply the mean when the context is clear, is the central tendency of a collection of values, taken as the sum of the values divided by the size of the collection. Let

x_i, i = 1, . . . , n

be the observed values of a variate having the attributes

a_j, j = 1, . . . , k.

If the observed values for the whole population of size N are known, the mean is called the population mean. It is usually denoted as µ and is calculated as

µ = (1/N) ∑_{i=1}^{N} x_i


and therefore is a simple arithmetic mean of the population data. In terms of attributes the mean for the whole population is defined by

µ = (1/N) ∑_{j=1}^{k} a_j · F(a_j).

If the observed values consist of a sample of the population, the mean is called the sample mean. It is usually denoted as X̄ and is calculated as

X̄ = (1/n) ∑_{i=1}^{n} x_i

and therefore is a simple arithmetic mean of the sample data. In terms of attributes the mean for sample data is defined by

X̄ = (1/n) ∑_{j=1}^{k} a_j · F(a_j).

The arithmetic mean fulfills the following properties:

1. The deviations of the data from its arithmetic mean always sum to 0. Formally

∑_{i=1}^{n} (x_i − X̄) = 0.

2. The arithmetic mean is the best approximation to the data in terms of least squares. Formally

∀a ∈ ℝ: ∑_{i=1}^{n} (x_i − X̄)² ≤ ∑_{i=1}^{n} (x_i − a)².

3. If X̄_1 and X̄_2 are the arithmetic means of two samples of sizes n_1 and n_2 respectively, then the arithmetic mean X̄ of the distribution combining the two can be calculated as

X̄ = (n_1 · X̄_1 + n_2 · X̄_2) / (n_1 + n_2).

4. Given the linear transformation

x*_i = α + β x_i, i = 1, . . . , n and α, β ∈ ℝ,

we obtain

X̄* = α + β · X̄.


Example for the calculation of the arithmetic mean
The arithmetic mean for the Austrian inflation data presented in table 5.1 is

X̄ = (1/18) ∑_{i=1}^{18} x_i = 0.325/18 = 0.0181 = 1.81%.

In terms of attributes we get

X̄ = (1/18) ∑_{j=1}^{7} a_j · F(a_j) = 0.325/18 = 0.0181 = 1.81%.
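Both routes to the sample mean, over the observations and over the attributes, can be verified in a short sketch (names are ours):

```python
from collections import Counter

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

n = len(infl)
mean_obs = sum(infl) / n                  # (1/n) * sum of the x_i
F = Counter(infl)
mean_attr = sum(a * F[a] for a in F) / n  # (1/n) * sum of a_j * F(a_j)

print(round(mean_obs, 4), round(mean_attr, 4))
```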

5.5.2 Median

The median is the numerical value separating the higher half of a sample or a population from the lower half. The median of a finite list of numbers can be found by arranging all the observations from the lowest value to the highest value and picking the middle one.

If there is an even number of observations, then there is no single value in the middle. The median is then usually defined to be the mean of the two middle values. Formally, the median of a data set is defined by

Me = x_[(n+1)/2] if n is odd,
Me = (1/2) · (x_[n/2] + x_[n/2+1]) if n is even.

Here [·] denotes the position of the observation in the arranged (ascending or descending) data set.

Example for the calculation of the median
The median for the Austrian inflation data presented in table 5.1 is

Me = 0.02 = 2%.
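The case distinction above translates directly into code; a sketch (the function name is ours):

```python
def median(data):
    """Median via the odd/even case distinction from the text."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]               # x_[(n+1)/2], 1-based position
    return 0.5 * (xs[n // 2 - 1] + xs[n // 2])    # mean of the two middle values

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]
print(median(infl))
```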

5.5.3 α-quantile

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF) of a random variable (see section 6). Dividing ordered data into q essentially equal-sized data subsets is the motivation for q-quantiles. The quantiles are the data values marking the boundaries between consecutive subsets.


Put another way, the kth q-quantile for a random variable is the value x such that the probability that the random variable will be less than x is at most k/q and the probability that the random variable will be more than x is at most

(q − k)/q = 1 − k/q.

For a ranked data set, the α-quantile (0 < α < 1) is defined by

x_α = x_[k] if n · α is not an integer (then k is the following integer),
x_α = (1/2) · (x_[k] + x_[k+1]) if n · α is an integer (k = n · α).

In many applications the terms lower and upper quartile are used. Usually the 25% quantile is called the lower (first) quartile and the 75% quantile is called the upper (third) quartile.

Example for the calculation of a 25% quantile
When calculating the 25% quantile for the Austrian inflation data presented in table 5.1 we get:

n · α = 18 · 0.25 = 4.5.

As n · α is not an integer, we search for the data point in the sorted data set at the position x_[k] = x_[5]. This is

x[5] = 0.015 = 1.5%.
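The definition of x_α can be implemented directly; a sketch (the function name is ours) that reproduces the 25% quantile and the 75% quantile x_75% = 2.0% used later:

```python
import math

def quantile(data, alpha):
    """alpha-quantile (0 < alpha < 1) following the case distinction in the text."""
    xs = sorted(data)
    n_alpha = len(xs) * alpha
    if n_alpha == int(n_alpha):        # n * alpha is an integer: k = n * alpha
        k = int(n_alpha)
        return 0.5 * (xs[k - 1] + xs[k])
    k = math.ceil(n_alpha)             # otherwise k is the following integer
    return xs[k - 1]

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]
print(quantile(infl, 0.25), quantile(infl, 0.75))
```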

5.5.4 Mode

The mode is the number that appears most often in a set of numbers. In other words, the mode of a data set is the attribute with the highest absolute frequency. Usually it is denoted as Mo. Like the mean (see section 5.5.1) and the median (see section 5.5.2), the mode is a way of expressing, in a single number, important information about a random variable or a population.

The numerical value of the mode is usually different from those of the mean and median and may be very different for strongly skewed distributions. The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most extreme case occurs in uniform distributions, where all values occur equally frequently. The mode of a discrete probability distribution is the value x at which its probability distribution has its maximum value. Informally speaking, the mode is at the peak.


Example for the determination of the mode
The mode for the Austrian inflation data presented in table 5.1 is

Mo = 0.02 = 2.0%

as it has the highest absolute frequency, namely 6.
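Since the mode is the attribute with the highest absolute frequency, it falls out of the frequency count directly; a sketch:

```python
from collections import Counter

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

mode, count = Counter(infl).most_common(1)[0]  # attribute with highest frequency
print(mode, count)
```

Note that `most_common(1)` returns just one of the tied maxima when the mode is not unique, which, as noted above, can happen.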

5.5.5 Relationship among mean, median and mode

The information about mean, median and mode allows one to determine some properties of the sample distribution or population distribution respectively. Among other properties, the skewness (see section 5.2.3) of the distribution can be identified.

Figure 5.9: The values of mean, median and mode for three possibilities of a sample or population distribution.

In figure 5.9 the following three possibilities are presented:

1. For a symmetric frequency distribution curve with one peak, the values of the mean, median and mode are identical. They lie in the centre of the distribution. This is displayed in the image in the centre of figure 5.9.

2. For a frequency distribution curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest and the value of the median lies between the two. This is displayed in the right image of figure 5.9.

3. If a frequency distribution curve is skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two. This is displayed in the left image of figure 5.9.


5.5.6 Geometric mean

The geometric mean is a type of mean or average which indicates the central tendency or typical value of a set of numbers. A geometric mean is often used when comparing different items - finding a single figure of merit for these items - when each item has multiple properties that have different numeric ranges.

The geometric mean is similar to the arithmetic mean (see section 5.5.1), except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken. Formally, for a sample of size n the geometric mean is defined as

X̄_g = ⁿ√(x_1 · x_2 · . . . · x_n) = ⁿ√(∏_{i=1}^{n} x_i), x_i > 0, i = 1, . . . , n.

Example of usage of the geometric mean
The geometric mean can give a meaningful average to compare two companies which are each rated from 0 to 5 for their environmental sustainability and from 0 to 100 for their financial viability.

If an arithmetic mean is used instead of a geometric mean, the financial viability is given more weight because its numeric range is larger. A small percentage change in the financial rating (e.g. going from 80 to 90) makes a much larger difference in the arithmetic mean than a large percentage change in environmental sustainability (e.g. going from 2 to 5).

The use of a geometric mean normalizes the ranges being averaged, so that no range dominates the weighting, and a given percentage change in any of the properties has the same effect on the geometric mean. So, a 20% change in environmental sustainability from 4 to 4.8 has the same effect on the geometric mean as a 20% change in financial viability from 60 to 72.
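The normalizing effect can be checked numerically; the sketch below (names are ours) uses the two ratings from the example and shows that raising both ratings by 20% raises the geometric mean by exactly 20%.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    assert all(v > 0 for v in values)
    return math.prod(values) ** (1.0 / len(values))

before = geometric_mean([4.0, 60.0])  # sustainability 4, viability 60
after = geometric_mean([4.8, 72.0])   # both ratings increased by 20%
print(round(after / before, 6))
```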

5.6 Measures of dispersion

The measures of location indicate the general magnitude of the data and locate only the centre of a distribution. They do not establish the degree of variability or the spread of the individual items and their deviation from the average.

Two distributions of statistical data may be symmetrical and have common arithmetic mean, median and mode. Yet with these points in common they may differ widely in


the scatter of their values and thus in their measures of dispersion.

Statistical dispersion (also called statistical variability or variation) is variability or spread in a variable or a probability distribution. Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

5.6.1 Range

In descriptive statistics, the range of a set of data is the difference between the largest and smallest values. It is the smallest interval which contains all the data and provides an indication of statistical dispersion. Formally, the range is defined by

R = xmax − xmin

where x_min denotes the smallest value in the data set and x_max denotes the largest value in the data set.

The absolute range of a set of data has some disadvantages for interpretation. Firstly, it is strongly influenced by outliers and secondly, its calculation is based on two values only. Thus, the range is not always a satisfactory measure of dispersion.

The interquartile range counterbalances some of these disadvantages. It is defined as

RQ = x_75% − x_25%,

where x_75% is the 75% quantile and x_25% is the 25% quantile.

Example for the range of a distribution
The 25% quantile of the Austrian inflation data presented in table 5.1 is x_25% = 1.5%, the 75% quantile is x_75% = 2.0%. The lowest value is x_min = 0.5% and the highest value equals x_max = 3.5%.

Therefore we get

R = 3.5% − 0.5% = 3.0%

as range R and

RQ = 2.0% − 1.5% = 0.5%

as interquartile range RQ.
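Range and interquartile range are one-liners once the quantiles are available; the sketch below reuses the α-quantile rule from section 5.5.3 (the function name is ours):

```python
import math

def quantile(data, alpha):
    """alpha-quantile following the case distinction in section 5.5.3."""
    xs = sorted(data)
    n_alpha = len(xs) * alpha
    if n_alpha == int(n_alpha):
        k = int(n_alpha)
        return 0.5 * (xs[k - 1] + xs[k])
    return xs[math.ceil(n_alpha) - 1]

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

R = max(infl) - min(infl)                         # range
RQ = quantile(infl, 0.75) - quantile(infl, 0.25)  # interquartile range
print(R, RQ)
```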

5.6.2 Average absolute deviation

The average absolute deviation, or simply average deviation, of a data set is the average of the absolute deviations and is a summary statistic of statistical dispersion or variability. Typically, the point from which the deviation is measured is a measure of central tendency, most often the median or the mean of the data set.

1. Absolute average deviation from the median is defined as

d_Me = (1/n) ∑_{i=1}^{n} |x_i − Me|.

2. Absolute average deviation from the mean is defined as

d_X̄ = (1/n) ∑_{i=1}^{n} |x_i − X̄|.

Example of the absolute average deviation from the median and the mean
The absolute average deviation from the median of the Austrian inflation data presented in table 5.1 is 0.005833 or 0.5833%.

The absolute average deviation from the mean of the same data set is 0.006049 or 0.6049%.
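Both deviation measures can be reproduced with a short sketch (names are ours):

```python
import statistics

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

n = len(infl)
me = statistics.median(infl)
x_bar = statistics.mean(infl)

d_me = sum(abs(x - me) for x in infl) / n       # deviation from the median
d_mean = sum(abs(x - x_bar) for x in infl) / n  # deviation from the mean
print(round(d_me, 4), round(d_mean, 4))         # in %
```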

5.6.3 Average squared deviation

The average squared deviation of a data set is the average of the squared deviations and is a summary statistic of statistical dispersion or variability. The average squared deviation from the arithmetic mean is the most important measure of dispersion and is called variance. There are two definitions of variance, depending on the data set.

1. If the data set contains the whole population, the population variance σ² is defined as

σ² = (1/N) ∑_{i=1}^{N} (x_i − µ)²

where N is the size of the population and µ is the arithmetic mean of the population.

2. If the data set contains just a sample, the sample variance s² is defined as

s² = (1/(n − 1)) ∑_{i=1}^{n} (x_i − X̄)²

where n is the sample size and X̄ denotes the arithmetic mean of the sample.


For manual calculations the formulae for the population variance and the sample variance can be reformulated in order to obtain the following shortcut formulas:

1. For the population variance we get:

σ² = ( ∑_{i=1}^{N} x_i² − (∑_{i=1}^{N} x_i)² / N ) / N = ( ∑_{i=1}^{N} x_i² − N µ² ) / N.

2. The sample variance can be reformulated into:

s² = ( ∑_{i=1}^{n} x_i² − (∑_{i=1}^{n} x_i)² / n ) / (n − 1) = ( ∑_{i=1}^{n} x_i² − n X̄² ) / (n − 1).

The standard deviation is then defined as the square root of the variance. Hence

σ = √σ²

is the population standard deviation and

s = √s²

is the sample standard deviation.

If the mean and the standard deviation of a data set or a distribution are known, the coefficient of variation can be calculated.

1. For a population data set the coefficient of variation is defined as

v = σ / µ.

2. For a sample set the coefficient of variation is defined as

v = s / X̄.

Furthermore the average squared deviation from the median can be calculated. It is defined as

s_Me = (1/n) ∑_{i=1}^{n} (x_i − Me)².

Example for the calculation of the sample variance and sample standard deviation
The sample variance s² of the Austrian inflation data presented in table 5.1 is

s² = 0.000062173 or s² = 0.006217%.


Its sample standard deviation s is

s = 0.007885 or s = 0.7885%.

The coefficient of variation equals

v = 0.7885% / 1.8056% = 0.4367.
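The definition, the shortcut formula, the standard deviation and the coefficient of variation can all be checked in one sketch (names are ours; the data is in %, so s² comes out in %²):

```python
import math

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

n = len(infl)
x_bar = sum(infl) / n

s2 = sum((x - x_bar) ** 2 for x in infl) / (n - 1)                # definition
s2_short = (sum(x * x for x in infl) - n * x_bar ** 2) / (n - 1)  # shortcut formula
s = math.sqrt(s2)                                                 # standard deviation
v = s / x_bar                                                     # coefficient of variation
print(round(s, 4), round(v, 4))
```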

5.6.4 Box-and-whisker plot

A box-and-whisker plot (or simply box plot) is a convenient way of graphically depicting a data set through its five-number summary. It is constructed by drawing a box and two whiskers. The five values used to construct a box plot are:

1. the smallest observation (sample minimum) for the lower whisker

2. the lower quartile (x25%) for the lower end of the box

3. the median for the horizontal line within the box

4. the upper quartile (x75%) for the upper end of the box

5. the largest observation (sample maximum) for the upper whisker
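The five values can be collected programmatically before plotting; a sketch using the quantile rule from section 5.5.3 (the function name is ours):

```python
import math

def quantile(data, alpha):
    """alpha-quantile following the case distinction in section 5.5.3."""
    xs = sorted(data)
    n_alpha = len(xs) * alpha
    if n_alpha == int(n_alpha):
        k = int(n_alpha)
        return 0.5 * (xs[k - 1] + xs[k])
    return xs[math.ceil(n_alpha) - 1]

# Austrian inflation rates 1996-2013 (table 5.1), in %
infl = [2.0, 1.0, 1.0, 0.5, 2.0, 2.5, 1.5, 1.5, 2.0,
        2.0, 1.5, 2.0, 3.0, 0.5, 1.5, 3.5, 2.5, 2.0]

five = (min(infl),             # lower whisker
        quantile(infl, 0.25),  # lower end of the box
        quantile(infl, 0.50),  # median line
        quantile(infl, 0.75),  # upper end of the box
        max(infl))             # upper whisker
print(five)
```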

Figure 5.10: A box plot of the Austrian inflation data presented in table 5.1.

Depending on the data, a box plot may also indicate which observations might be considered outliers. This is the case if the whiskers are defined alternatively. Possible definitions for the whiskers are


• The minimum and the maximum of the data.

• One standard deviation above and below the mean of the data.

• The 9th percentile and the 91st percentile.

• The 2nd percentile and the 98th percentile.

Figure 5.10 shows the box plot for the Austrian inflation data presented in table 5.1.

5.7 Correlation and regression

5.7.1 Correlation

Correlation refers to any of a broad class of statistical relationships between two random variables or two sets of data. Correlations can (but do not necessarily) indicate a predictive relationship that can be exploited in practice.

For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, it is necessary to note that statistical dependence is not sufficient to demonstrate the presence of such a causal relationship.

The following possible forms of correlation should be distinguished:

1. Positive and negative correlations
If two variables change in the same direction, then this is called a positive correlation (e.g. stock returns of two companies in similar markets).

If two variables change in the opposite direction, then this is called a negative correlation (e.g. returns of stocks and bonds).

2. Linear and non-linear correlations.

The value of a correlation coefficient always lies within the interval [−1, 1]. In table 5.5 different degrees of correlation are distinguished.

Data for the following examples
In the following sections we will compare a time series of the change of the real Austrian gross domestic product (change in Austrian real GDP) in % and the change of consumer


degrees                  positive       negative
absence of correlation   0              0
perfect correlation      +1             −1
high degree              [0.75, 1[      ]−1, −0.75]
moderate degree          [0.25, 0.75[   ]−0.75, −0.25]
low degree               ]0, 0.25[      ]−0.25, 0[

Table 5.5: Depending on the value of the Pearson correlation coefficient, different degrees of correlation can be distinguished.

spending (in %) in Austria in the years from 1996 up to 2013¹². The data is presented in table 5.6.

        change in real   change in Austrian
year    Austrian GDP     consumer spending
        in %             in %
1996    2.4              2.9
1997    2.2              0.4
1998    3.6              2.9
1999    3.6              2.4
2000    3.4              3.1
2001    1.4              1.3
2002    1.7              0.5
2003    0.8              1.7
2004    2.7              2.3
2005    2.1              2.2
2006    3.4              2.1
2007    3.6              0.9
2008    1.5              0.8
2009    −3.8             0.5
2010    1.9              1.5
2011    3.1              0.8
2012    0.9              0.5
2013    0.2              −0.2

Table 5.6: %-changes in the Austrian GDP and the Austrian consumer spending from 1996 to 2013.

Pearson correlation coefficient

There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables. Furthermore, it should be used only when both variables are quantitative.

¹² Source of the data is the home page of Statistik Austria: www.statistik.at.


The Pearson correlation coefficient, when applied to a sample, is commonly represented by the letter r and may be referred to as the sample correlation coefficient. For two variables X and Y the Pearson correlation coefficient is defined as

r = ∑_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √( ∑_{i=1}^{n} (X_i − X̄)² · ∑_{i=1}^{n} (Y_i − Ȳ)² ).

If the sample means (X̄ and Ȳ) and the sample standard deviations (s_X and s_Y) are already known, the Pearson correlation coefficient can alternatively be calculated as

r = (1/(n − 1)) ∑_{i=1}^{n} ((X_i − X̄)/s_X) · ((Y_i − Ȳ)/s_Y).

Example for the Pearson correlation coefficient
The Pearson correlation coefficient of the data presented in table 5.6 is

r = 0.54.

Therefore the data is moderately positively correlated.
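The value r = 0.54 can be reproduced directly from the first formula above; a sketch with the data of table 5.6 transcribed into lists:

```python
import math

# %-changes 1996-2013 (table 5.6)
gdp = [2.4, 2.2, 3.6, 3.6, 3.4, 1.4, 1.7, 0.8, 2.7,
       2.1, 3.4, 3.6, 1.5, -3.8, 1.9, 3.1, 0.9, 0.2]
cons = [2.9, 0.4, 2.9, 2.4, 3.1, 1.3, 0.5, 1.7, 2.3,
        2.2, 2.1, 0.9, 0.8, 0.5, 1.5, 0.8, 0.5, -0.2]

n = len(gdp)
mx, my = sum(gdp) / n, sum(cons) / n

num = sum((x - mx) * (y - my) for x, y in zip(gdp, cons))
den = math.sqrt(sum((x - mx) ** 2 for x in gdp) *
                sum((y - my) ** 2 for y in cons))
r = num / den
print(round(r, 2))
```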

Spearman correlation coefficient

If at least one variable is ordinal, the Spearman rank correlation coefficient can be used. It is usually denoted by r_S and is a non-parametric measure of statistical dependence between two variables. It is applicable to linear and non-linear relationships.

It is defined using the ranks of the values of the variables. For assigning ranks, the variables must be ordinal or quantitative. For a sample of size n we have

R_i, i = 1, . . . , n : ranks of the characteristic X

R'_i, i = 1, . . . , n : ranks of the characteristic Y.

Then the Spearman correlation coefficient is defined as

rS = 1−6 ·∑n

i=1(Ri −R′i)2

(n− 1) · n · (n+ 1).

Example for the calculation of the Spearman correlation coefficient

In table 5.7 the given data and their corresponding ranks R_i and R'_i are shown. The lowest rank is given to the lowest data point in each column, the second lowest to

77

5 Descriptive statistics

year   GDP in %   consumer spending in %   R_i   R'_i   (R_i − R'_i)²
1996     2.4       2.9                     11    16     25
1997     2.2       0.4                     10     2     64
1998     3.6       2.9                     16    16      0
1999     3.6       2.4                     16    15      1
2000     3.4       3.1                     14    18     16
2001     1.4       1.3                      5     9     16
2002     1.7       0.5                      7     3     16
2003     0.8       1.7                      3    11     64
2004     2.7       2.3                     12    14      4
2005     2.1       2.2                      9    13     16
2006     3.4       2.1                     14    12      4
2007     3.6       0.9                     16     8     64
2008     1.5       0.8                      6     6      0
2009    -3.8       0.5                      1     3      4
2010     1.9       1.5                      8    10      4
2011     3.1       0.8                     13     6     49
2012     0.9       0.5                      4     3      1
2013     0.2      -0.2                      2     1      1
∑                                                      349

Table 5.7: Calculation of the Spearman correlation coefficient for the Austrian GDP and the Austrian consumer spending.

the following data points. If several data points coincide, all get the same rank but the following ranks have to be skipped in order to balance the effect.

For example, the years 2002, 2009 and 2012 saw a growth in consumer spending of 0.5%. Therefore all data points for 2002, 2009 and 2012 get the same rank 3, but the ranks 4 and 5 are excluded in the consecutive ranking process. The next lowest value for the change in consumer spending, observed for the years 2008 and 2011 and equal to 0.8%, gets rank 6.

Additionally the table includes the values for (R_i − R'_i)², which are used to calculate the Spearman correlation coefficient r_S.

The Spearman correlation coefficient of the data presented in table 5.6 equals

r_S = 1 − 6·∑_{i=1}^{18} (R_i − R'_i)² / ((18 − 1)·18·(18 + 1)) = 1 − 6·349/5814 = 0.64.

We see again that the data is moderately positively correlated.
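The ranking scheme described above (ties share the lowest applicable rank, and the following ranks are skipped) can be sketched in a few lines; the function names `bottom_ranks` and `spearman_rs` are our own:

```python
def bottom_ranks(values):
    """Rank 1 = smallest value; tied values share the smallest applicable rank."""
    return [1 + sum(1 for w in values if w < v) for v in values]

def spearman_rs(xs, ys):
    """Spearman rank correlation, using the formula from the text."""
    n = len(xs)
    d2 = sum((r - s) ** 2 for r, s in zip(bottom_ranks(xs), bottom_ranks(ys)))
    return 1 - 6 * d2 / ((n - 1) * n * (n + 1))

# %-changes 1996-2013 from table 5.6
gdp = [2.4, 2.2, 3.6, 3.6, 3.4, 1.4, 1.7, 0.8, 2.7, 2.1,
       3.4, 3.6, 1.5, -3.8, 1.9, 3.1, 0.9, 0.2]
consumer = [2.9, 0.4, 2.9, 2.4, 3.1, 1.3, 0.5, 1.7, 2.3, 2.2,
            2.1, 0.9, 0.8, 0.5, 1.5, 0.8, 0.5, -0.2]

print(round(spearman_rs(gdp, consumer), 2))  # 0.64
```

Note that this "bottom rank" handling of ties follows the text; statistics libraries often assign average ranks to ties instead, which gives a slightly different value.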


Scatter plot

For this method, the values of the two variables for every observation are plotted on a graph. One is taken along the horizontal axis (x-axis) and the other along the vertical axis (y-axis). By plotting the data we obtain points on the graph which are generally scattered, hence the name scatter plot.


Figure 5.11: A scatter plot of the Austrian GDP and Austrian consumer spending data presented in table 5.6.

The manner in which these points are scattered suggests the degree and the direction of correlation. Let the degree of correlation be denoted by ρ. Its direction is given by the signs positive and negative.

i) If all points lie exactly on a rising straight line the correlation is perfectly positive and ρ = +1.

ii) If all points lie exactly on a falling straight line the correlation is perfectly negative and ρ = −1.

iii) If the points lie in a narrow rising strip, the correlation is highly positive.

iv) If the points lie in a narrow falling strip, the correlation is highly negative.

v) If the points are spread widely over a broad rising strip the correlation is weakly positive.

vi) If the points are spread widely over a broad falling strip, the correlation is weakly negative.

vii) If the points are scattered without any specific pattern, the correlation is absent, i.e. ρ = 0.

Example for a scatter plot
Figure 5.11 shows the scatter plot of the data presented in table 5.6. It can be seen that the data is moderately positively correlated.

5.7.2 Linear Regression

Linear regression is an approach for modeling the relationship between a scalar dependent variable Y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple regression. More than one explanatory variable is called multiple regression.

In linear regression, data is modeled using linear predictor functions. Such functions are linear functions with a set of coefficients and explanatory variables, whose value is used to predict the outcome of a dependent variable Y. They can be denoted as

y = f(x).

Ordinary least squares

A large number of procedures have been developed for parameter estimation and inference in linear regression. In this text we focus on the basic and most important model of ordinary least squares estimation (sometimes referred to as OLS). It is a simple and computationally straightforward method.

The ordinary least squares method minimizes the sum of squared residuals, formally

∑_{i=1}^{n} (y_i − ŷ_i)² → min!

If the data X consists of only one explanatory variable, then the linear predictor function is defined by a constant and a scalar multiplier for x_i. This is called the simple regression model.

The vector of parameters in such a model is 2-dimensional, and is commonly denoted as (α, β):

y_i = α + β·x_i.


The coefficients α and β can be obtained by solving the following system of linear equations

n·α + β·∑_{i=1}^{n} x_i = ∑_{i=1}^{n} y_i
α·∑_{i=1}^{n} x_i + β·∑_{i=1}^{n} x_i² = ∑_{i=1}^{n} x_i·y_i

Alternatively the coefficients can be calculated directly by using the formulas

β = (∑ x_i·y_i − (1/n)·∑ x_i·∑ y_i) / (∑ x_i² − (1/n)·(∑ x_i)²) = Cov(x, y)/Var(x)   and   α = ȳ − β·x̄.

Coefficient of determination

We can further obtain the coefficient of correlation R of a linear regression model as

R = (n·∑ x_i·y_i − ∑ x_i·∑ y_i) / √( (n·∑ x_i² − (∑ x_i)²)·(n·∑ y_i² − (∑ y_i)²) ),   −1 ≤ R ≤ 1.

The coefficient of determination is then

R²,   0 ≤ R² ≤ 1.

Formally, the coefficient of determination is defined as

R² = SSR / SST,

where SSR denotes the sum of squares due to regression

SSR = ∑_{i=1}^{n} (ŷ_i − ȳ)²

and SST denotes the total sum of squares

SST = ∑_{i=1}^{n} (y_i − ȳ)².

The coefficient of determination R² can be interpreted as a measure for the goodness of fit of the model. Values close to one indicate a good model fit.

Applications of linear regression

Linear regression has many practical uses. Most applications of linear regression fall into one of the following two broad categories:


1. If the goal is prediction, or forecasting, linear regression can be used to fit a predictive model to an observed data set of Y and X values. After developing such a model, if an additional value of X is then given without its accompanying value of Y, the fitted model can be used to make a prediction of the value of Y.

2. Given a variable Y and a number of variables X_1, …, X_p that may be related to Y, linear regression analysis can be applied to quantify the strength of the relationship between Y and the X_j, to assess which X_j may have no relationship with Y at all and to identify which subsets of the X_j contain redundant information about Y.

Example of an OLS calculation

The simple regression model can be used to explain the change in Austrian consumer spending with the change in the Austrian GDP. Both time series are given in table 5.6. We search for α, β such that

Consumer_i = α + β·GDP_i + ε_i,

∑_{i=1}^{n} ε_i² → min.

Using the formulas from above we denote the consumer spending growth as y_i and the change in the Austrian GDP as x_i. We get the sums

∑_{i=1}^{n} x_i = 34.70   and   ∑_{i=1}^{n} y_i = 26.60,

∑_{i=1}^{n} x_i·y_i = 67.74   and   ∑_{i=1}^{n} x_i² = 120.55.

Therefore we have to solve the linear system

18.0·α + 34.70·β = 26.60
34.7·α + 120.55·β = 67.74

We get
α = 0.8864 and β = 0.3068.

The linear predictor function is

ŷ_i = α + β·x_i = 0.8864 + 0.3068·x_i.
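The closed-form formulas from above can be coded directly; the following sketch reproduces the coefficients of the example (the function name `ols_fit` is our own):

```python
def ols_fit(xs, ys):
    """Closed-form simple OLS: returns (alpha, beta) for y = alpha + beta*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    beta = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    alpha = sy / n - beta * sx / n
    return alpha, beta

# %-changes 1996-2013 from table 5.6
gdp = [2.4, 2.2, 3.6, 3.6, 3.4, 1.4, 1.7, 0.8, 2.7, 2.1,
       3.4, 3.6, 1.5, -3.8, 1.9, 3.1, 0.9, 0.2]
consumer = [2.9, 0.4, 2.9, 2.4, 3.1, 1.3, 0.5, 1.7, 2.3, 2.2,
            2.1, 0.9, 0.8, 0.5, 1.5, 0.8, 0.5, -0.2]

alpha, beta = ols_fit(gdp, consumer)
print(round(alpha, 4), round(beta, 4))  # 0.8864 0.3068
```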

Table 5.8 shows the calculated predictions ŷ_i in the fourth column and the two square sums needed to calculate the coefficient of determination in the last two columns. For the sum of squares due to regression we get

SSR = 5.050


year   GDP as x_i   Consumer as y_i   ŷ_i     (ŷ_i − ȳ)²   (y_i − ȳ)²
1996     2.4          2.9             1.62    0.021        2.023
1997     2.2          0.4             1.56    0.007        1.162
1998     3.6          2.9             1.99    0.263        2.023
1999     3.6          2.4             1.99    0.263        0.850
2000     3.4          3.1             1.93    0.204        2.632
2001     1.4          1.3             1.32    0.026        0.032
2002     1.7          0.5             1.41    0.005        0.956
2003     0.8          1.7             1.13    0.120        0.049
2004     2.7          2.3             1.71    0.056        0.676
2005     2.1          2.2             1.53    0.003        0.522
2006     3.4          2.1             1.93    0.204        0.387
2007     3.6          0.9             1.99    0.263        0.334
2008     1.5          0.8             1.35    0.017        0.459
2009    -3.8          0.5            -0.28    3.088        0.956
2010     1.9          1.5             1.47    0.000        0.000
2011     3.1          0.8             1.84    0.129        0.459
2012     0.9          0.5             1.16    0.099        0.956
2013     0.2         -0.2             0.95    0.281        2.814
∑                                             5.050       17.291

Table 5.8: Calculation of SSR and SST for the growth in consumer spending and the growth in the Austrian real GDP.


Figure 5.12: A simple linear regression, where the change in Austrian consumer spending is explained by the change of the Austrian GDP.

and the total sum of squares equals

SST = 17.291.


Therefore the coefficient of determination is

R² = SSR / SST = 5.050 / 17.291 = 0.2921.

This can be interpreted as:

About 30% of the variance of the growth in consumer spending can be explained by the variance of the GDP growth.

Figure 5.12 shows a scatter plot of the change in GDP and change in consumer spending. The resulting linear regression line is shown in red.
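The whole R² calculation can also be carried out in one self-contained sketch, fitting the line and then forming SSR and SST as defined above (all variable names here are our own):

```python
# Coefficient of determination for the regression of table 5.6
gdp = [2.4, 2.2, 3.6, 3.6, 3.4, 1.4, 1.7, 0.8, 2.7, 2.1,
       3.4, 3.6, 1.5, -3.8, 1.9, 3.1, 0.9, 0.2]
consumer = [2.9, 0.4, 2.9, 2.4, 3.1, 1.3, 0.5, 1.7, 2.3, 2.2,
            2.1, 0.9, 0.8, 0.5, 1.5, 0.8, 0.5, -0.2]

n = len(gdp)
xbar, ybar = sum(gdp) / n, sum(consumer) / n
# centered form of the OLS formulas: beta = Cov(x, y) / Var(x)
beta = (sum((x - xbar) * (y - ybar) for x, y in zip(gdp, consumer))
        / sum((x - xbar) ** 2 for x in gdp))
alpha = ybar - beta * xbar

yhat = [alpha + beta * x for x in gdp]                 # predictions
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # sum of squares due to regression
sst = sum((y - ybar) ** 2 for y in consumer)          # total sum of squares
r2 = ssr / sst
print(round(ssr, 2), round(sst, 2), round(r2, 4))     # 5.05 17.29 0.2921
```

The unrounded coefficients give essentially the same square sums as the rounded table values, and R² agrees with the 0.2921 from the text.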


6 Random variables

As opposed to other mathematical variables, a random variable conceptually does not have a single fixed value. Rather it can take on a set of possible different values, each with an associated probability.

In probability theory, random variables are defined in terms of functions on a probability space (Ω, Θ, P). A probability space consists of three parts.

1. Ω is the sample space, which is the set of all possible outcomes.

2. Θ is a field of events, where each event is a set containing zero or more outcomes.

• If the sample space Ω is a finite set, the set Θ is a subset of the power set of Ω, Θ ⊆ P(Ω).

• For an uncountably infinite sample space (for example ℝ) the sigma algebra Θ is strictly smaller than the power set P(Ω).

3. P assigns the probability to the events. Therefore P is a function from Θ to probability levels,

P : Θ → [0, 1].

Example
Let Ω be the sample space of all humans. The random variable X assigns each human to the measurable attribute height. We have

X : Ω → ℝ,   ω ↦ height of the person.

6.1 Introduction to random variables

A random variable is considered to be a function on the possible outcomes. Random variables are typically classified as either discrete or continuous.


Discrete variables can take on either a finite or at most a countably infinite set of discrete values. Their probability distribution is given by a probability mass function which directly maps a value of the random variable to a probability.

Continuous variables take on values that vary continuously within one or more (possibly infinite) intervals. There are an uncountably infinite number of individual outcomes, and each has a probability 0.

As a result, the probability distribution for many continuous random variables is defined using a probability density function. It indicates the density of the probability of a random variable in a small neighborhood around a given value. More technically, the probability that an outcome is in a particular range is derived from the definite integral of the probability density function for that range.

Both concepts can be united using a cumulative distribution function (CDF), which describes the probability that an outcome will be less than or equal to a specified value.

Examples for possible discrete and continuous random variables
Discrete random variables could be

• The number of cars passing a checkpoint in an hour.

• The number of home mortgages approved by a bank per year.

• The received points in an exam.

Continuous random variables could be

• The time it takes to make breakfast.

• The exact length of a car.

• The rate of return of a certain stock within one year.

6.2 Cumulative distribution function

In statistics the cumulative distribution function (CDF) describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x (similar to the empirical cumulative distribution function introduced in section 5.4).


Formally, for every real number x, the cumulative distribution function of a real-valued random variable X is given by

F_X(x) = P(X ≤ x),

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x. The probability that X lies in the interval ]a, b], where a < b, therefore is

P(a < X ≤ b) = F_X(b) − F_X(a).
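As a numerical illustration of this identity, the sketch below evaluates the CDF of the standard normal distribution (introduced in section 6.7.4), which can be written with the error function from Python's standard library; the helper name `Phi` is our own:

```python
from math import erf, sqrt

def Phi(x):
    """CDF of the standard normal distribution, Phi(x) = P(X <= x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# probability that a standard normal outcome lies in the interval ]-1, 1]
p = Phi(1) - Phi(-1)
print(round(p, 4))  # 0.6827
```

The result reproduces the well-known 68% rule for the normal distribution.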

If treating several random variables X, Y, … the corresponding letters are used as subscripts while, if treating only one, the subscript is omitted. It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions. The CDF of a continuous random variable X can be defined in terms of its probability density function f as follows:

F(x) = ∫_{−∞}^{x} f(t) dt.

6.3 Discrete random variables

If X is a discrete random variable, then the function

f_X(x) = P(X = x)

defined on the outcomes of X is called the probability mass function (PMF) of the discrete random variable X.

Suppose that

X : Ω → A,   A ⊆ ℝ

is a discrete random variable defined on a sample space Ω. Then the probability mass function

f_X : A → [0, 1]

for X is formally defined as

f_X(x) = P(X = x) = P({ω ∈ Ω : X(ω) = x}).

Note that f_X is defined for all real numbers, including those not in the image of X. It holds that f_X(x) = 0 for all x ∉ X(Ω). The sum of probabilities across all x must equal 1,

∑_{x∈A} f_X(x) = 1.


Therefore, if X is a discrete random variable it attains values x_1, x_2, … with probabilities P(X = x_i) = p_i. Then, the cumulative distribution function of X will be discontinuous at the points x_i and constant in between:

F(x) = P(X ≤ x) = ∑_{x_i ≤ x} P(X = x_i) = ∑_{x_i ≤ x} p_i.

Example for a discrete random variable
Suppose that Ω is the sample space of all outcomes of a single toss of a fair coin and X is the random variable defined on Ω assigning 0 to tails and 1 to heads. Since the coin is fair, the probability mass function is

f_X(x) = 1/2 for x ∈ {0, 1},   f_X(x) = 0 for x ∉ {0, 1}.

Therefore the variable X has the outcomes x_1 = 0 and x_2 = 1 and we can write the probability mass function in tabular form as

x_i                          x_1 = 0   x_2 = 1   x ∉ {0, 1}
p_i = P(X = x_i) = f(x_i)    1/2       1/2       0

The cumulative distribution function of the random variable X is

F(x) = 0 for −∞ < x < 0,   F(x) = 1/2 for 0 ≤ x < 1,   F(x) = 1 for x ≥ 1.

This is a special case of the binomial distribution (see section 6.7.2).
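The coin example above can be sketched directly; summing the PMF over all outcomes not larger than x yields the step-shaped CDF (the function names `pmf` and `cdf` are our own):

```python
def pmf(x):
    """Probability mass function of a single fair-coin toss (0 = tails, 1 = heads)."""
    return 0.5 if x in (0, 1) else 0.0

def cdf(x):
    """Cumulative distribution function F(x) = P(X <= x) of the same toss."""
    return sum(pmf(k) for k in (0, 1) if k <= x)

print(cdf(-0.5), cdf(0.5), cdf(2))  # 0 0.5 1.0
```

The CDF is constant between the outcomes and jumps by the corresponding probability at 0 and at 1.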

6.4 Continuous random variables

Intuitively, a continuous random variable is one which can take a continuous range of values, as opposed to a discrete distribution (see previous section), where the set of possible values is at most countable. Note that for continuous variables the probability that the random variable attains any distinct value always equals zero. Only the probability that the outcome will fall into an interval is nonzero.

Formally, if X is a continuous random variable, then it has a probability density function f_X(x). The probability of X falling into a given interval, say [a, b], is given by the integral

P(a ≤ X ≤ b) = ∫_{a}^{b} f_X(x) dx.


Here we see that the probability of X taking any single value a (that is a ≤ X ≤ a) is zero, because an integral with coinciding upper and lower limits always equals zero.

Equivalently, if the continuous random variable X has a differentiable distribution function F(x), then

F(x) = ∫_{−∞}^{x} f(u) du,

and the function

f(x) = d/dx F(x)

is the probability density function of X.

is the probability density function of X.

6.5 Expected value of a random variable

The expected value of a random variable is the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in case of discrete random variables, or densities in case of continuous variables.

6.5.1 Expected value for a discrete random variable

Suppose the random variable X takes value x_1 with probability p_1, value x_2 with probability p_2, and so on, up to value x_k with probability p_k. Then the expected value of this random variable X is defined as

E(X) = x_1·p_1 + x_2·p_2 + ⋯ + x_k·p_k = ∑_{i=1}^{k} x_i·p_i.

Since all probabilities p_i add up to 1 (∑_{i=1}^{k} p_i = 1) the expected value can be viewed as the weighted average, with the p_i being the weights:

E(X) = (x_1·p_1 + x_2·p_2 + ⋯ + x_k·p_k) / (p_1 + p_2 + ⋯ + p_k).

If all outcomes x_i are equally likely (that is p_1 = p_2 = … = p_k; e.g. the probability of each value when tossing a fair die always equals 1/6), then the weighted average becomes a simple average. For a fair die we have

E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5.


The expected value of a discrete random variable with countably many possible outcomes x_i, i ∈ ℕ only exists if

∑_{i=1}^{∞} |x_i|·p_i < ∞.

The generalized expected value of the form E(g(X)) for any function g(X) can be calculated as

E(g(X)) = ∑_{i=1}^{∞} g(x_i)·p_i.

6.5.2 Expected value for a continuous random variable

If the probability distribution of X has the probability density function f(x), the expected value can be computed as

E(X) = ∫_{−∞}^{∞} x·f(x) dx.

Similarly to the discrete case, the expected value of a continuous random variable only exists if

∫_{−∞}^{∞} |x|·f(x) dx < ∞.

If this holds, we can calculate the expected value of an arbitrary function of X, say g(X). With respect to the probability density function f_X(x) the expected value of g(X) is given by the inner product of f_X and g,

E(g(X)) = ∫_{−∞}^{∞} g(x)·f_X(x) dx.

Example for the calculation of the expected value of an arbitrary random variable with given probability density function
Suppose we have a continuous random variable X with the probability density function

f : ℝ → ℝ,   x ↦ 0 for x ∉ [0, 3],   x ↦ −(2/9)·x + 2/3 for x ∈ [0, 3].


Then the expected value E(X) can be calculated as

E(X) = ∫_{−∞}^{∞} x·f(x) dx = ∫_{0}^{3} x·(−(2/9)·x + 2/3) dx
     = ∫_{0}^{3} (−(2/9)·x² + (2/3)·x) dx
     = [−(2/9)·x³/3 + (2/3)·x²/2]_{0}^{3}
     = −(2/9)·(27/3) + 9/3 = 1.

6.5.3 Properties of the expected value

There are three major properties of the expected value.

Constants The expected value of a constant is equal to the constant itself. For example, if c is a constant, then E(c) = c.

Monotonicity If X and Y are random variables such that X ≤ Y almost surely, then

E(X) ≤ E(Y ).

Linearity The expected value E is linear in the sense that

E(X + c) = E(X) + c

E(X + Y ) = E(X) + E(Y )

E(aX) = aE(X)

where X, Y are random variables defined on the same probability space and a, c ∈ ℝ.

6.6 Variance of a random variable

The variance of a probability distribution is a measure of how far a set of values is spread out (compare section 5.6.3). It is one of several descriptors of a probability distribution, describing how far the values lie from the expected value (e.g. the mean).

The variance is a parameter describing in part either the actual probability distribution of an observed population of numbers, or the theoretical probability distribution of a sample of numbers. In the latter case a sample of data from such a distribution can be used to construct an estimate of its variance. In the simplest case this estimate can be the sample variance, as introduced in section 5.6.3.

6.6.1 Formal definition of variance

If a random variable X has the expected value µ = E(X), then the variance of X is given by

Var(X) = E((X − µ)²).

Here it can be seen that the variance is the expected value of the squared difference between the variable's realization and the variable's mean.

This definition can be expanded as follows, using the linearity of the expected value:

Var(X) = E((X − µ)²) = E(X² − 2·µ·X + µ²)
       = E(X²) − 2·µ·E(X) + µ² = E(X²) − 2·µ² + µ²
       = E(X²) − µ² = E(X²) − (E(X))².

A mnemonic for the expression above is mean of the square minus square of the mean. The variance of a random variable X is typically designated as Var(X), σ²_X, or simply σ². The standard deviation, denoted by σ, is defined as

σ = √σ²   (≥ 0).

6.6.2 Variance of a discrete random variable

If the random variable X is discrete with probability mass function x_1 ↦ p_1, …, x_n ↦ p_n, then

Var(X) = ∑_{i=1}^{n} p_i·(x_i − µ)²,

where µ is the expected value, i.e.

µ = ∑_{i=1}^{n} p_i·x_i.

That is the expected value of the square of the deviation of X from its own mean. Roughly, it can be expressed as the mean of the squares of the deviations of the data points from the average. It is thus the mean squared deviation.


Example for the calculation of the variance of rolling an unbiased die
The variance of the discrete random variable describing rolling an unbiased die is

Var(X) = (1 − 3.5)²·(1/6) + (2 − 3.5)²·(1/6) + ⋯ + (6 − 3.5)²·(1/6) ≈ 2.92.

6.6.3 Variance of a continuous random variable

If the random variable X is continuous with probability density function f(x), then the variance equals the second central moment, given by

Var(X) = ∫ (x − µ)²·f(x) dx,

where µ is the expected value,

µ = ∫ x·f(x) dx,

and where the integrals are definite integrals taken for x ranging over the range of X.

Note that not all distributions have a variance. If a continuous distribution does not have an expected value, as is the case for the Cauchy distribution, it does not have a variance either. Many other distributions for which the expected value does exist also do not have a finite variance because the integral in the variance definition diverges.

Example for the calculation of the variance of an arbitrary random variable with given probability density function
The variance of the random variable X with the probability density function

f : ℝ → ℝ,   x ↦ 0 for x ∉ [0, 3],   x ↦ −(2/9)·x + 2/3 for x ∈ [0, 3]

is

Var(X) = ∫_{0}^{3} (x − 1)²·(−(2/9)·x + 2/3) dx
       = ∫_{0}^{3} (−(2/9)·x³ + (10/9)·x² − (14/9)·x + 2/3) dx
       = [−(2/9)·x⁴/4 + (10/9)·x³/3 − (14/9)·x²/2 + (2/3)·x]_{0}^{3}
       = 1/2.


6.6.4 Properties of the variance

1. Variance is non-negative because the squares are positive or zero.

Var(X) ≥ 0

2. The variance of a constant random variable is zero; and if the variance of a variable in a data set is 0, then all entries have the same value.

P(X = a) = 1 ⟺ Var(X) = 0

3. Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to the values of the variable, the variance is unchanged.

Var(X + a) = Var(X)

4. If all values are scaled by a constant, the variance is scaled by the square of that constant.

Var(a·X) = a²·Var(X)

5. The variance of a sum of two random variables is given by

Var(a·X + b·Y) = a²·Var(X) + b²·Var(Y) + 2·a·b·Cov(X, Y)

Var(X − Y) = Var(X) + Var(Y) − 2·Cov(X, Y)

The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances. This stems from the identity above and the fact that for uncorrelated variables the covariance is zero.

6.7 Important probability distributions

6.7.1 Hypergeometric distribution

In statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of k successes in n draws from a finite population of size N containing m successes, without replacement. Formally, a random variable X follows the hypergeometric distribution if its probability mass function is given by

f(k; N, m, n) = P(X = k) = C(m, k)·C(N − m, n − k) / C(N, n),

where


• N is the population size

• m is the number of success states in the population

• n is the number of draws

• k is the number of successes in the draws

• C(a, b) is a binomial coefficient¹³

It is positive when

max(0, n + m − N) ≤ k ≤ min(m, n).

Expected value The expected value of the hypergeometric distribution is

E(X) = n·m/N.

Variance The variance of the hypergeometric distribution is

Var(X) = n·(m/N)·(1 − m/N)·(N − n)/(N − 1).

Example for a hypergeometric distribution
The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure. If the variable N describes the number of all marbles in the urn and m describes the number of white marbles, then N − m corresponds to the number of black marbles. In this example X is the random variable whose outcome is k, the number of white marbles actually drawn in the experiment.

Suppose there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are white?

¹³The binomial coefficient is defined as

C(a, b) = a! / (b!·(a − b)!)   for 0 ≤ b ≤ a.


We get

P(X = 4) = C(5, 4)·C(50 − 5, 10 − 4) / C(50, 10) = 5·8145060 / 10272278170 = 0.00396 = 0.396%.

6.7.2 Binomial distribution

The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N.

In general, if a random variable X follows the binomial distribution with the parameters n and p, we write X ∼ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function

f(k; n, p) = P(X = k) = C(n, k)·p^k·(1 − p)^(n−k),   for k = 0, 1, 2, …, n.

Figure 6.1 shows the probability mass function for different values of p, n = 100 and k = 0, …, n.


Figure 6.1: Probability mass functions of the binomial distribution with different values for p, n = 100 and k = 0, …, n.


Expected value The expected value of a binomially distributed random variable X ∼ B(n, p) is

E(X) = n·p.

Variance The variance is

Var(X) = n·p·(1 − p).

Example for a binomial distribution
A box contains 25 items, 10 of which are defective. A sample of two items with replacement will be taken. Then the probability that one of the items is defective is

P(X = 1) = C(2, 1)·(10/25)¹·(15/25)^(2−1) = 12/25 = 0.48 = 48%.

6.7.3 Poisson distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The distribution was first introduced by Simeon Denis Poisson (1781-1840). His work focused on certain random variables N that count the number of discrete occurrences that take place during a time interval of given length.

A discrete stochastic variable X is said to have a Poisson distribution with parameter λ > 0 if the probability mass function of X for k = 0, 1, 2, … is given by

f(k; λ) = P(X = k) = λ^k·e^(−λ) / k!.

Figure 6.2 shows the probability mass function for different values of λ and k = 0, …, 20.

For a Poisson distributed random variable X ∼ Pois(λ), the positive real number λ ∈ ℝ is equal to the expected value of X and also to the variance,

λ = E(X) = Var(X).

The Poisson distribution can be applied to systems with a large (theoretically infinite) number of possible events, each of which is rare.

Example for a Poisson distribution
Assume that the number of defaults per year on a large set of loans will be measured as



Figure 6.2: Probability mass functions of the Poisson distribution with different values for λ.

no defaults, one default, two defaults and so on. On average the number of defaults per year is 0.3. Then, the probability that there are two defaults in one year is

P(X = 2) = 0.3²·e^(−0.3) / 2! = 0.066673 / 2 = 0.03333.

Therefore the probability of two defaults in one year is 3.33%.
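The default-count example can be sketched directly from the PMF definition; the function name `poisson_pmf` is our own:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

# on average 0.3 defaults per year; probability of exactly two defaults
p = poisson_pmf(2, 0.3)
print(round(p, 4))  # 0.0333
```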

6.7.4 Normal distribution

In 1809 Carl Friedrich Gauss published his monograph Theoria motus corporum coelestium in sectionibus conicis solem ambientium where among other things he introduced several important statistical concepts, such as the method of least squares, the method of maximum likelihood and the normal distribution.

In probability theory, the normal distribution (or Gaussian distribution) is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function. A continuous variable X is normally distributed if its probability density function is of the form

f(x; µ, σ²) = 1/(σ·√(2π)) · e^(−(1/2)·((x − µ)/σ)²).

Figure 6.3 shows the probability density function of the normal distribution for different values for µ and σ². Sometimes it is denoted ϕ_{µ,σ²}(x).
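The density formula above can be sketched in a few lines; the function name `normal_pdf` is our own:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    sigma = sqrt(sigma2)
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# peak of the standard normal density at x = 0 is 1/sqrt(2*pi)
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
```

The density is symmetric around µ, so for instance `normal_pdf(1, 0, 1)` equals `normal_pdf(-1, 0, 1)`.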



Figure 6.3: Probability density function ϕ_{µ,σ²}(x) of normally distributed random variables with different values for µ and σ².

For a normally distributed continuous random variable X ∼ N(µ, σ²), the parameter µ is the mean or expected value,

E(X) = µ.

The second parameter σ² is its variance,

Var(X) = σ².

σ is known as the standard deviation. The distribution with µ = 0 and σ² = 1 is called the standard normal distribution.

The normal distribution is considered the most prominent probability distribution in statistics. There are several reasons for this:

1. The normal distribution arises from the central limit theorem, which states that under mild conditions the mean of a large number of random variables drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution.

2. The normal distribution is very tractable analytically. This means that a large number of results involving this distribution can be derived in explicit form.

There are several rules of thumb describing properties of the normal distribution.

• About 68% of the values lie within 1 standard deviation of the mean.



Figure 6.4: Cumulative distribution functions Φ_{µ,σ²}(x) of normally distributed random variables with different values for µ and σ².

• Similarly, about 95% of the values lie within 2 standard deviations of the mean.

• Nearly all (99.7%) of the values lie within 3 standard deviations. In figure 6.5 these levels are shown graphically.

Figure 6.5: Empirical rules of a normally distributed random variable shown with its density function.

6.7.5 Standardized normal distribution

As stated above, the normal distribution with µ = 0 and σ² = 1 is called the standard normal distribution. Every normally distributed random variable X with known parameters µ and σ² can be transformed into a random variable Z with standard normal distribution by the linear transformation

Z = (X − µ) / σ,

where µ is the mean of the population and σ is the standard deviation of the population. Here, Z represents the distance between the raw score and the population mean in units of the standard deviation. Z is negative when the raw score is below the mean, positive when above.

Standardization can be used to calculate prediction intervals for a normally distributed random variable X with known mean and variance. A prediction interval is an estimate of an interval in which future observations will fall with a certain probability, given what has already been observed. A prediction interval [l, u] for a future observation of the random variable X ∼ N(µ, σ²) with known mean and variance can be calculated from

γ = P(l < X < u) = P( (l − µ)/σ < (X − µ)/σ < (u − µ)/σ ) = P( (l − µ)/σ < Z < (u − µ)/σ ),

where

Z = (X − µ) / σ

is the standard score of X and is distributed according to the standard normal distribution.

A prediction interval is conventionally written as

[µ− zσ, µ+ zσ].

The most important values for z are shown in table 6.1 for symmetric intervals. For a single tail, γ has to be transformed as in γ′ = 1 − (1 − γ)/2.

significance γ   z value

50%              0.67
90%              1.64
95%              1.96
99%              2.58

Table 6.1: z-values for different levels of significance.
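The z-values in table 6.1 are quantiles of the standard normal distribution and can be reproduced with the standard library's inverse CDF, applying the single-tail transformation γ′ = 1 − (1 − γ)/2 from above; this is an illustrative check, not part of the original text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

for gamma in (0.50, 0.90, 0.95, 0.99):
    # Transform the symmetric level gamma to a single-tail probability
    p = 1 - (1 - gamma) / 2
    # The corresponding quantile is the z-value of table 6.1
    print(f"gamma = {gamma:.0%}: z = {Z.inv_cdf(p):.2f}")
```

Rounded to two decimals, the loop yields 0.67, 1.64, 1.96 and 2.58, matching the table.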

Example for the calculation of a prediction interval for a normally distributed random variable

Assume we want to calculate the 95% prediction interval for a normal distribution with a mean of µ = 5 and a standard deviation of σ = 2.


Then z equals 1.96 (more precisely, z = 1.959964). Therefore, the lower limit of the prediction interval is

l = 5 − (2 · 1.959964) = 1.0801

and the upper limit equals

u = 5 + (2 · 1.959964) = 8.9199.

Thus we get a prediction interval of approximately 1 to 9, in which 95% of all observations will lie.
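The same interval can be obtained directly from the quantiles of N(µ, σ²), again using the standard library's statistics.NormalDist; a sketch under the assumptions of the example (µ = 5, σ = 2, γ = 95%).

```python
from statistics import NormalDist

mu, sigma, gamma = 5.0, 2.0, 0.95

X = NormalDist(mu, sigma)
lower = X.inv_cdf((1 - gamma) / 2)      # 2.5% quantile of N(5, 4)
upper = X.inv_cdf(1 - (1 - gamma) / 2)  # 97.5% quantile of N(5, 4)
print(f"[{lower:.4f}, {upper:.4f}]")    # roughly [1.08, 8.92]
```

Working with the quantiles of N(µ, σ²) directly and standardizing first with z from table 6.1 are equivalent; the quantile route simply lets the library do the back-transformation µ ± zσ.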


List of Tables

2.1 The possible domains D for power functions, depending on r ∈ R
2.2 The possible maximum ranges R for power functions, depending on a, r ∈ R

4.1 Prices of the 4 Stocks on two dates for example 4.1
4.2 Prices of the 4 Stocks on two dates for example 4.2
4.3 Prices of the 4 Stocks on two dates for example 4.3
4.4 Prices of the 4 Stocks on two dates for example 4.4

5.1 Austrian CPI since 1996, rounded in 1/2%-steps
5.2 Absolute and relative frequencies
5.3 Absolute and relative frequencies of the Austrian inflation rates
5.4 The empirical CDF for Austrian inflation rates
5.5 Different degrees of correlation can be distinguished
5.6 %-changes in the Austrian GDP and consumer spending
5.7 Calculation of the Spearman correlation coefficient
5.8 Calculation of SSR and SST

6.1 z-values for different levels of significance


List of Figures

2.1 A function f takes an input x and returns an output f(x)
2.2 The polynomial function f(x) = x² from -5 to 5
2.3 The constant function f(x) = 4 from -5 to 5
2.4 The absolute value function f(x) = |x| from -5 to 5
2.5 A piecewise defined function from -5 to 5 with a jump at x = 1
2.6 The function f(x) = ((x + 1)/2)³ in blue and its inverse in green

2.7 Three polynomial functions of degree 2, 3 and 5 from -5 to 5
2.8 f(x) = (x² − 3x − 2)/(x² − 4) from -5 to 5 with vertical asymptotes at x = ±2
2.9 f(x) = (x³ − 2x)/(2(x² − 5)) from -5 to 5 with vertical asymptotes at x = ±√5

2.10 The unit circle, a ray (θ = 45) and the functions sin(θ), cos(θ) and tan(θ)
2.11 The sine function sin(θ), the unit circle and the graph of the function sin(θ)
2.12 The exponential function exp(x) from -5 to 2
2.13 The natural logarithm function log(x) for x ∈ ]0, 10]

3.1 The graph of a function and its tangent line
3.2 The graph of a function and a secant line to that function
3.3 The function f(x) = |x|. It is not differentiable at x = 0

4.1 The relationship between continuously compounded and discrete returns

5.1 A bar chart for the absolute frequencies for the Austrian inflation rates
5.2 A histogram for the Austrian inflation rates
5.3 A line chart for relative frequencies for the Austrian inflation rates
5.4 A pie chart for the absolute frequencies for the Austrian inflation rates
5.5 The different forms of frequency curves in terms of skewness
5.6 The different forms of frequency curves in terms of kurtosis
5.7 An ogive for the Austrian inflation rates
5.8 The empirical CDF for the Austrian inflation rates
5.9 The values of mean, median and mode for different distributions
5.10 A box plot
5.11 A scatter plot of the Austrian GDP and Austrian consumer spending
5.12 A simple linear regression

6.1 PMFs of the binomial distribution
6.2 PMFs of the Poisson distribution with different values for λ
6.3 PDFs ϕµ,σ²(x) of normally distributed random variables


6.4 CDFs Φµ,σ²(x) of normally distributed random variables
6.5 Empirical rules of a normally distributed random variable
