1. Descriptive statistics2. Foundations of inferential statistics3. Estimation and confidence intervals
4. Testing statistical hypotheses5. Regression analysis
2.1 Random variable2.2 Univariate distribution functions2.3 Population measures2.4 Random sample
2. Foundations of inferential statistics
Learning objectives
In the first topic we saw how to describe a sample, something that has already happened. The second topic describes a possible future event using random variables. When you have completed this topic you will be able to:
explain what a random variable is;
describe characteristics of a random variable using probability distributions and population measures;
view a data set as realizations of independent and identically distributed (i.i.d.) random variables.
23 / 221 Veronika Czellar HEC Paris Statistics
2.1 Random variable
Definition: random variable
A random variable (r.v.) $X$ is the outcome of an experiment that hasn’t happened yet.
Examples
Consider the experiment of flipping a coin. Denote by $X$ the number of heads. $X$ is a random variable.
Denote by $Y$ the number of hours an HEC student spends exercising per week. $Y$ is a random variable.
Notation: random variables are denoted with uppercase letters (e.g. $X$). Lowercase $x$ represents a realization of $X$.
Definitions
The set of all possible realizations of $X$ is denoted $\mathcal{X}$ and is called the range of the random variable.
The random variable $X$ is called discrete if $\mathcal{X}$ is countable, and continuous if $\mathcal{X}$ is uncountable.
Examples: discrete or continuous?
1. The number of heads at the flip of a coin.
2. The number of hours HEC students spend exercising per week.
We now wish to measure how likely each value is to occur.
2.2 Univariate distribution functions
2.2.1 Discrete random variables
Consider a discrete r.v. $X$ with range $\mathcal{X}$.
Definition: pmf
The probability mass function (pmf), $f(x)$, gives the probability that $X$ takes the value $x$:
$$f(x) = P(X = x) \geq 0 .$$
The pmf is a function $f : \mathcal{X} \to [0, 1]$ satisfying
$$\sum_{x \in \mathcal{X}} f(x) = 1 .$$
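The two conditions in the pmf definition can be checked mechanically. The sketch below is only an illustration: the helper name `is_valid_pmf` and the two example pmfs (a fair six-sided die and a deliberately broken one) are assumptions, not part of the course material.

```python
import math

def is_valid_pmf(pmf, tol=1e-12):
    """Return True if every f(x) lies in [0, 1] and the f(x) sum to 1."""
    values = list(pmf.values())
    in_range = all(0.0 <= v <= 1.0 for v in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical example: a fair six-sided die, f(x) = 1/6 for x = 1, ..., 6.
fair_die = {face: 1 / 6 for face in range(1, 7)}
# Not a pmf: the probabilities only sum to 0.9.
broken = {0: 0.5, 1: 0.4}

print(is_valid_pmf(fair_die))  # True
print(is_valid_pmf(broken))    # False
```

The pmf is stored as a dictionary mapping each value in the range to its probability, which keeps the sum over the range explicit.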
In our example of flipping a coin, the probability of landing heads may be 50%. However, some coins may be biased. For instance, the Belgian euro coin has been accused of unfair flipping (New Scientist, Jan. 4, 2002).
In general, the flip of a coin can be modeled by a Bernoulli distribution.
Definition: Bernoulli distribution (Jacob Bernoulli)
A random variable $X$ has a Bernoulli distribution if
$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p . \end{cases}$$
Hence, the pmf is
$$f(0) = 1 - p \quad \text{and} \quad f(1) = p .$$
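As a rough illustration of the Bernoulli definition, the following sketch writes the pmf as a function and simulates repeated coin flips. The function names, the seed, the number of flips and the choice p = 0.5 are all assumptions made for the example.

```python
import random

def bernoulli_pmf(x, p):
    """pmf of a Bernoulli(p) variable: f(1) = p, f(0) = 1 - p, 0 elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def draw_bernoulli(p, rng):
    """One realization: 1 with probability p, 0 otherwise."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.5                  # assumed fair coin
flips = [draw_bernoulli(p, rng) for _ in range(10_000)]
share_of_heads = sum(flips) / len(flips)
print(bernoulli_pmf(1, p), share_of_heads)
```

With many flips the observed share of heads should land near p, which is one way to read f(1) = p.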
Notation: $X \sim f$, if $X$ has a pmf given by $f$.
Definition
The cumulative distribution function (cdf) of $X$ is:
$$F_X(x) = P(X \leq x) = \sum_{y \leq x} f(y) .$$
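The cdf of a discrete r.v. is a running sum of the pmf. A minimal sketch, assuming a Bernoulli pmf with p = 0.3 stored as a dictionary (the helper name `cdf_from_pmf` is hypothetical):

```python
def cdf_from_pmf(pmf, x):
    """F_X(x) = P(X <= x): sum f(y) over all values y in the range with y <= x."""
    return sum(prob for y, prob in pmf.items() if y <= x)

# Assumed example: Bernoulli pmf with p = 0.3, so f(0) = 0.7 and f(1) = 0.3.
pmf = {0: 0.7, 1: 0.3}
for x in (-1, 0, 0.5, 1, 2):
    print(x, cdf_from_pmf(pmf, x))
```

The cdf is 0 below the smallest value, jumps by f(y) at each value y in the range, and stays flat in between.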
We now extend these definitions to the case of continuous random variables...
Problem
The probability of occurrence of a single value (e.g. the probability that the number of hours an HEC student spends exercising per week is exactly 3.1 hours) will typically be zero for a continuous random variable.
2.2.2 Continuous random variables
Consider a continuous r.v. $X$ with range $\mathcal{X} \subseteq \mathbb{R}$.
Definitions
The probability density function (pdf) is a function $f : \mathcal{X} \to \mathbb{R}^+$, satisfying:
1. $P(a \leq X \leq b) = \int_a^b f(x)\,dx$, the area under $f(x)$ between $a$ and $b$;
2. $\int_{\mathcal{X}} f(x)\,dx = 1$.
The cumulative distribution function (cdf) of $X$ is:
$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f(y)\,dy .$$
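To make the "area under the pdf" reading concrete, the sketch below approximates the integrals with a midpoint rule. The density f(x) = 2x on [0, 1] is a hypothetical example chosen because its integrals are easy to verify by hand; the helper names are also assumptions.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def density(x):
    """Hypothetical pdf on the range [0, 1]: f(x) = 2x, and 0 outside."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

total = integrate(density, 0.0, 1.0)        # property 2: close to 1
prob = integrate(density, 0.25, 0.5)        # P(0.25 <= X <= 0.5) = 0.5^2 - 0.25^2
cdf_at_half = integrate(density, 0.0, 0.5)  # F_X(0.5); the density is 0 below 0
print(total, prob, cdf_at_half)
```

Note that integrating over a single point (a = b) gives 0, which matches the observation that a continuous r.v. takes any single value with probability zero.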
Definition: normal distribution
A r.v. $X$ is normally distributed, denoted $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$, if its pdf is given by
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} , \quad x \in \mathbb{R} .$$
The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution.
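The normal pdf can be typed out directly from its formula. This is a small sketch, not course code: `normal_pdf` is a hypothetical helper, and the parameters in the cross-check (x = 1.2, mean 0.5, standard deviation 2) are arbitrary. The comparison uses `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written term by term as in the formula."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Standard normal at its peak: 1 / sqrt(2 * pi), about 0.3989.
print(normal_pdf(0.0, 0.0, 1.0))
# Cross-check against the standard library for arbitrary example parameters.
print(normal_pdf(1.2, 0.5, 2.0), NormalDist(mu=0.5, sigma=2.0).pdf(1.2))
```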
[Figure: histogram "Male height in U.S. (20+)". Horizontal axis: male heights in centimeters (150 to 200). Vertical axis: frequency (0 to 100).]
Sample mean and standard deviation: 176.3 cm and 7.65 cm. Almost all men are within three standard deviations of the mean, i.e. in [153.35, 199.25] cm. However...
Shaquille O’Neal: 216cm.
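The interval [153.35, 199.25] cm is the sample mean plus or minus three standard deviations. The sketch below reproduces it and, under the assumption that heights follow a normal distribution with exactly these parameters, shows how far out a height of 216 cm sits. The normal model is an assumption made for illustration only.

```python
from statistics import NormalDist

mean, sd = 176.3, 7.65                     # sample values quoted on the slide
low, high = mean - 3 * sd, mean + 3 * sd
print(round(low, 2), round(high, 2))       # 153.35 199.25

# Assumption: heights follow N(mean, sd^2) exactly.
model = NormalDist(mu=mean, sigma=sd)
inside = model.cdf(high) - model.cdf(low)  # about 0.9973
z = (216 - mean) / sd                      # about 5.19 standard deviations
tail = 1 - model.cdf(216)                  # P(X > 216) under the model
print(inside, z, tail)
```

Under this model the tail probability is tiny but not zero: the normal pdf is positive on the whole real line, so values far outside the three-standard-deviation band remain possible.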
Alert
The cdf of a normal distribution will play an important role, but its computation is challenging: it has no closed-form expression, so only numerical integration techniques are available.
We can compute the cdf $F_X(x)$ by using either a Z-table or Excel.
Excel: use
NORMDIST(x, µ, σ, 1) for the cdf of $\mathcal{N}(\mu, \sigma^2)$;
NORMDIST(x, µ, σ, 0) for the pdf of $\mathcal{N}(\mu, \sigma^2)$.
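For readers working outside Excel, a comparable helper can be sketched in Python. The function `normdist` below is hypothetical: it only mirrors the argument order of NORMDIST and is not an Excel or library API. It evaluates the cdf through the error function `math.erf`, which the standard library computes numerically.

```python
import math

def normdist(x, mu, sigma, cumulative):
    """Hypothetical helper mirroring NORMDIST(x, mu, sigma, cumulative)."""
    z = (x - mu) / sigma
    if cumulative:
        # cdf via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # pdf of N(mu, sigma^2)
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(normdist(0.0, 0.0, 1.0, 1))   # 0.5, by symmetry of the standard normal
print(normdist(0.0, 0.0, 1.0, 0))   # about 0.3989
```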
Example
Consider $X \sim \mathcal{N}(0, 1)$. Calculate $F_X(0.8)$ first using a table, then using Excel.
If $\mu \neq 0$ and/or $\sigma \neq 1$, we can standardize the r.v. as follows.
Theorem
Consider $X \sim \mathcal{N}(\mu, \sigma^2)$. Then,
$$\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1) .$$
Example
Calculate $F_X(0.8)$ for $X \sim \mathcal{N}(0.5, 2^2)$ using a table, then Excel.
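One way to check answers to the two examples is the sketch below, which uses `statistics.NormalDist` from the Python standard library in place of a table or Excel. It computes the second example twice: once by standardizing and looking up the standard normal cdf (the table route), and once directly with the given parameters. The variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist()                   # N(0, 1)
print(round(standard.cdf(0.8), 4))        # 0.7881

mu, sigma = 0.5, 2.0                      # X ~ N(0.5, 2^2)
z = (0.8 - mu) / sigma                    # standardized value, 0.15
via_table = standard.cdf(z)               # what a Z-table lookup at z gives
direct = NormalDist(mu, sigma).cdf(0.8)   # same probability, computed directly
print(round(via_table, 4), round(direct, 4))
```

Both routes agree because $P(X \leq 0.8) = P\left(\frac{X - \mu}{\sigma} \leq \frac{0.8 - \mu}{\sigma}\right)$, which is exactly what the theorem licenses.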
Sample                                Population
Observation $x_i$                     Random variable $X$
Histograms                            pdf, pmf, cdf
Sample mean $\bar{x}$                 ?
Sample variance $s^2$                 ?
Sample quantiles $\hat{q}_\alpha$     ?