CIS 2033 A Modern Introduction to Probability and Statistics Understanding Why and How Chapter 17: Basic Statistical Models Slides by Dan Varano Modified by Longin Jan Latecki




17.1 Random Samples and Statistical Models

Random Sample: A random sample is a collection of random variables X1, X2,…, Xn that have the same probability distribution and are mutually independent.

If F is the distribution function of each random variable Xi in a random sample, we speak of a random sample from F. Similarly, we speak of a random sample from a density f, a random sample from an N(µ, σ²) distribution, etc.


17.1 continued

Statistical Model for Repeated Measurements: A dataset consisting of values x1, x2,…, xn of repeated measurements of the same quantity is modeled as the realization of a random sample X1, X2,…, Xn. The model may include a partial specification of the probability distribution of each Xi.


17.2 Distribution features and sample statistics

Empirical Distribution Function: Fn(a) = #(Xi ∈ (−∞, a]) / n

Law of Large Numbers: lim n→∞ P(|Fn(a) − F(a)| > ε) = 0 for every ε > 0

This implies that for most realizations Fn(a) ≈ F(a).
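The approximation Fn(a) ≈ F(a) can be checked by simulation. The following is a minimal sketch, assuming an N(0, 1) distribution for F and the point a = 0.5; both are illustrative choices not taken from the slides, and the helper names `empirical_cdf` and `simulate_ecdf` are hypothetical.

```python
import random
from statistics import NormalDist


def empirical_cdf(sample, a):
    """Return Fn(a): the fraction of sample values lying in (-inf, a]."""
    return sum(1 for x in sample if x <= a) / len(sample)


def simulate_ecdf(n, a, dist, seed=0):
    """Draw a random sample of size n from dist and evaluate Fn(a)."""
    rng = random.Random(seed)  # fixed seed, used only for reproducibility
    sample = [rng.gauss(dist.mean, dist.stdev) for _ in range(n)]
    return empirical_cdf(sample, a)


if __name__ == "__main__":
    dist = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution F
    a = 0.5                               # assumed evaluation point
    for n in (10, 100, 10_000):
        # Columns: sample size, Fn(a), F(a)
        print(n, simulate_ecdf(n, a, dist), dist.cdf(a))
```

As n grows, the second column should typically move closer to the third, which is what the law of large numbers statement describes.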


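17.2 cont.

The histogram and kernel density estimate: for a random sample from a continuous distribution with density f and a small half-width h,

#(Xi ∈ (x−h, x+h]) / (2hn) ≈ f(x)
Height of histogram on (x−h, x+h] ≈ f(x)
fn,h(x) ≈ f(x)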
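Both density approximations can be sketched in a few lines. The example below assumes an N(0, 1) sample, the point x = 0, the half-width h = 0.2, and a standard normal kernel for fn,h; none of these choices come from the slides, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def window_density(sample, x, h):
    """Return #(Xi in (x-h, x+h]) / (2*h*n), the histogram height on that bin."""
    count = sum(1 for xi in sample if x - h < xi <= x + h)
    return count / (2 * h * len(sample))


def gaussian_kernel_density(sample, x, h):
    """Return a kernel density estimate at x with bandwidth h.

    Assumption: the kernel is the standard normal density.
    """
    kernel = NormalDist()  # standard normal, used as the kernel K
    total = sum(kernel.pdf((x - xi) / h) for xi in sample)
    return total / (len(sample) * h)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, used only for reproducibility
    sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    x, h = 0.0, 0.2
    print("histogram height:", window_density(sample, x, h))
    print("kernel estimate: ", gaussian_kernel_density(sample, x, h))
    print("true density f(x):", NormalDist().pdf(x))
```

The division by 2h matters: the raw fraction of points in the bin approximates the probability of the bin, roughly 2h·f(x), not the density itself.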


17.2 cont.

The sample mean, sample median, and empirical quantiles:

X̄n ≈ µ

Med(X1, X2,…, Xn) ≈ q0.5 = Finv(0.5)

qn(p) ≈ Finv(p) = qp
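These three approximations can be illustrated with a skewed distribution, where the mean and the median differ. The sketch below assumes an Exp(1) sample (µ = 1, Finv(p) = −ln(1 − p)) and computes qn(p) by linear interpolation at position p(n + 1) in the ordered sample. That interpolation rule is one common convention and is an assumption here, since the slides do not define qn(p); the function names are hypothetical.

```python
import math
import random
from statistics import mean, median


def empirical_quantile(sample, p):
    """Return qn(p) by linear interpolation at position p*(n+1).

    Assumption: this is one common convention for empirical quantiles.
    """
    xs = sorted(sample)
    n = len(xs)
    pos = p * (n + 1)
    k = math.floor(pos)
    alpha = pos - k
    if k < 1:        # position falls before the smallest order statistic
        return xs[0]
    if k >= n:       # position falls at or beyond the largest order statistic
        return xs[-1]
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


def exp_quantile(p, rate=1.0):
    """Return Finv(p) for the exponential distribution with the given rate."""
    return -math.log(1.0 - p) / rate


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, used only for reproducibility
    sample = [rng.expovariate(1.0) for _ in range(10_000)]
    print("sample mean vs mu:      ", mean(sample), 1.0)
    print("sample median vs q0.5:  ", median(sample), exp_quantile(0.5))
    print("qn(0.9) vs Finv(0.9):   ", empirical_quantile(sample, 0.9), exp_quantile(0.9))
```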


17.2 cont.

The sample variance and standard deviation, and the MAD:

Sn² ≈ σ² and Sn ≈ σ

MAD(X1, X2,…, Xn) ≈ Finv(0.75) − Finv(0.5), when F is symmetric about its median
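A short simulation can make the spread approximations concrete. The sketch below assumes an N(5, 2²) sample, which is symmetric about its median, so the MAD can be compared with Finv(0.75) − Finv(0.5). The distribution and the helper name `mad` are illustrative assumptions; `statistics.variance` uses the n − 1 denominator.

```python
import random
from statistics import NormalDist, median, stdev, variance


def mad(sample):
    """Return the median of absolute deviations from the sample median."""
    m = median(sample)
    return median([abs(x - m) for x in sample])


if __name__ == "__main__":
    dist = NormalDist(mu=5.0, sigma=2.0)  # assumed example distribution
    rng = random.Random(0)                # fixed seed, for reproducibility only
    sample = [rng.gauss(dist.mean, dist.stdev) for _ in range(10_000)]

    print("sample variance vs sigma^2:", variance(sample), dist.variance)
    print("sample std dev vs sigma:   ", stdev(sample), dist.stdev)
    print("MAD vs Finv(0.75)-Finv(0.5):",
          mad(sample), dist.inv_cdf(0.75) - dist.inv_cdf(0.5))
```

Unlike the sample variance, the MAD is barely affected by a single extreme value: `mad([1, 2, 3, 4, 100])` is still 1.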


17.2 cont.

Relative Frequencies: for a random sample X1, X2,…, Xn from a discrete distribution with probability mass function p, one has that

#(Xi = a) / n ≈ p(a)
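The relative-frequency approximation is easy to try out. The sketch below assumes a small probability mass function on the values 0, 1, 2; the pmf and the function names are hypothetical and serve only as an illustration.

```python
import random


def relative_frequency(sample, a):
    """Return #(Xi = a) / n for the given sample."""
    return sum(1 for x in sample if x == a) / len(sample)


def draw_discrete_sample(pmf, n, seed=0):
    """Draw n independent values from a pmf given as {value: probability}."""
    rng = random.Random(seed)  # fixed seed, used only for reproducibility
    values = list(pmf)
    weights = [pmf[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    pmf = {0: 0.5, 1: 0.3, 2: 0.2}  # assumed example pmf
    sample = draw_discrete_sample(pmf, 10_000)
    for a, p_a in pmf.items():
        # Columns: value a, relative frequency, p(a)
        print(a, relative_frequency(sample, a), p_a)
```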


17.4 The linear regression model

Simple Linear Regression Model: In a simple linear regression model for a bivariate dataset (x1, y1), (x2, y2),…, (xn, yn), we assume that x1, x2,…, xn are nonrandom and that y1, y2,…, yn are realizations of random variables Y1, Y2,…, Yn satisfying

Yi = α + βxi + Ui for i = 1, 2,…, n,

where U1,…, Un are independent random variables with E[Ui] = 0 and Var(Ui) = σ².


17.4 cont.

Y1, Y2,…, Yn do not form a random sample. The Yi have different distributions because the expectation of Yi depends on xi (whenever β ≠ 0):

E[Yi] = E[α + βxi + Ui] = α + βxi + E[Ui] = α + βxi
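The dependence of E[Yi] on xi can be seen by simulating the model many times and averaging each Yi separately. This is a minimal sketch, assuming normally distributed Ui; the model itself only requires E[Ui] = 0 and Var(Ui) = σ². The parameter values and function names are hypothetical.

```python
import random
from statistics import mean


def simulate_responses(xs, alpha, beta, sigma, rng):
    """Return one realization of Y1..Yn with Yi = alpha + beta*xi + Ui.

    Assumption: the Ui are drawn from N(0, sigma^2).
    """
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


def average_responses(xs, alpha, beta, sigma, reps=5000, seed=0):
    """Average each Yi over many independent replications of the model."""
    rng = random.Random(seed)  # fixed seed, used only for reproducibility
    runs = [simulate_responses(xs, alpha, beta, sigma, rng) for _ in range(reps)]
    # zip(*runs) groups the replications of Y1 together, then Y2, and so on
    return [mean(column) for column in zip(*runs)]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]                 # nonrandom x values (assumed)
    alpha, beta, sigma = 1.0, 2.0, 0.5   # assumed parameters
    averages = average_responses(xs, alpha, beta, sigma)
    for x, avg in zip(xs, averages):
        # Columns: xi, simulated average of Yi, alpha + beta*xi
        print(x, avg, alpha + beta * x)
```

Each average should land near its own α + βxi, so the Yi share the same variance but not the same expectation, which is why they are not identically distributed.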