Model Fitting. Jean-Yves Le Boudec

  • Slide 1
  • Model Fitting. Jean-Yves Le Boudec
  • Slide 2
  • Contents
  • Slide 3
  • Virus Infection Data. We would like to capture the growth of the number of infected hosts (explanatory model). An exponential model seems appropriate. How can we fit the model, in particular, what is the value of the growth rate α?
  • Slide 4
  • Least Square Fit of Virus Infection Data. Least square fit: α = 0.5173. Mean doubling time: 1.34 hours. Prediction at +6 hours: 100 000 hosts.
  • Slide 5
  • Least Square Fit of Virus Infection Data in Log Scale. Least square fit: α = 0.39. Mean doubling time: 1.77 hours. Prediction at +6 hours: 39 000 hosts.
  • Slide 6
  • Compare the Two: LS fit in natural scale versus LS fit in log scale.
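The following is a minimal sketch, in Python, of the two fits compared above, assuming an exponential model y ≈ c·e^(αt). The data, the constant c, and all variable names are hypothetical placeholders, not the original virus trace.

```python
# Sketch (not the author's code): least squares in natural scale vs. log scale
# for an exponential growth model, on synthetic data.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)                          # hours (hypothetical)
y = 5.0 * np.exp(0.5 * t) * rng.lognormal(0.0, 0.3, t.size)  # hypothetical counts

# Natural scale: fit y ~ c * exp(alpha * t) by nonlinear least squares.
(c_nat, alpha_nat), _ = curve_fit(lambda t, c, a: c * np.exp(a * t),
                                  t, y, p0=[1.0, 0.4])

# Log scale: fit log(y) ~ log(c) + alpha * t by ordinary least squares.
alpha_log, logc_log = np.polyfit(t, np.log(y), 1)

for name, a in [("natural scale", alpha_nat), ("log scale", alpha_log)]:
    print(f"{name}: alpha = {a:.3f}, doubling time = {np.log(2) / a:.2f} h")
```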
  • Slide 7
  • Which Fitting Method Should I Use? Which optimization criterion should I use? The answer lies in a statistical model: model not only the interesting part, but also the noise. For example, α = 0.5173.
  • Slide 8
  • How can I tell which is correct? Here α = 0.39.
  • Slide 9
  • Look at Residuals = Validate the Model
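A small sketch of what "look at residuals" can mean in practice, again on hypothetical data: fit the log-scale model and inspect the residuals with a normal QQ plot.

```python
# Sketch: residual check for the log-scale exponential fit, on synthetic data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)                          # hours (hypothetical)
y = 5.0 * np.exp(0.5 * t) * rng.lognormal(0.0, 0.3, t.size)

slope, intercept = np.polyfit(t, np.log(y), 1)          # log-scale least squares
residuals = np.log(y) - (intercept + slope * t)

stats.probplot(residuals, plot=plt)                     # normal QQ plot
plt.title("QQ plot of log-scale residuals")
plt.show()
```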
  • Slide 10
  • 9
  • Slide 11
  • Least Square Fit = Gaussian iid Noise. Assume the model: response = deterministic part + Gaussian iid noise with constant variance (homoscedasticity). The theorem says: minimizing least squares = computing the MLE for this model. This is how we computed the estimates for the virus example.
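A short derivation of the equivalence stated on this slide, in generic notation (θ for the parameter, f_θ for the model, n data points); the symbols here are not necessarily those of the original slides.

```latex
% Least squares = MLE under iid Gaussian noise (generic notation).
\[
  Y_i = f_\theta(x_i) + \epsilon_i, \qquad \epsilon_i \sim \text{iid } N(0,\sigma^2)
\]
\[
  \log L(\theta,\sigma) = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
\]
\[
  \hat\theta_{\mathrm{MLE}}
    = \arg\max_\theta \log L(\theta,\sigma)
    = \arg\min_\theta \sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
\]
```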
  • Slide 12
  • Least Square and Projection. Write down what the data point, the predicted response and the estimated parameter are for the virus example. Figure labels: data point; predicted response; estimated parameter; manifold = where the data point would lie if there were no noise.
  • Slide 13
  • Confidence Intervals
  • Slide 14
  • Slide 15
  • Robustness to Outliers
  • Slide 16
  • A Simple Example: Least Square versus L1 Norm Minimization
  • Slide 17
  • Mean Versus Median
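A minimal numerical sketch of the point made on these slides: fitting a constant by least squares gives (approximately) the sample mean, while L1-norm minimization gives the median, which is robust to the outlier. The data values are invented.

```python
# Sketch: least squares of a constant ~ mean, L1 minimization ~ median.
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1.0, 1.2, 0.9, 1.1, 50.0])     # last value is an outlier

ls = minimize_scalar(lambda m: np.sum((data - m) ** 2)).x   # least squares
l1 = minimize_scalar(lambda m: np.sum(np.abs(data - m))).x  # L1 minimization

print(ls, np.mean(data))     # LS estimate ~ sample mean (pulled by the outlier)
print(l1, np.median(data))   # L1 estimate ~ sample median (robust)
```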
  • Slide 18
  • 2. Linear Regression. Also called ANOVA (Analysis of Variance) = least square + linear dependence on the parameter. A special case where computations are easy.
  • Slide 19
  • Example 4.3. What is the parameter? Is it a linear model? How many degrees of freedom? What do we assume on the noise terms ε_i? What is the matrix X?
  • Slide 20
  • Slide 21
  • Does this model have full rank?
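A sketch of how the full-rank question can be checked numerically; the design matrix below is a hypothetical stand-in, not the X of Example 4.3.

```python
# Sketch: numerical full-rank check of a design matrix (made-up example with
# an intercept and one explanatory variable).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])       # columns: intercept, x_i

rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1], rank == X.shape[1])     # full rank iff rank == #columns
```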
  • Slide 22
  • Some Terminology. The x_i are called explanatory variables: assumed fixed and known. The y_i are called response variables: they are the data, assumed to be one sample output of the model.
  • Slide 23
  • Least Square and Projection. Figure labels: data point; predicted response; estimated parameter; manifold = where the data point would lie if there were no noise.
  • Slide 24
  • Solution of the Linear Regression Model
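A minimal sketch of the classical closed-form solution, on hypothetical data; K and H below follow the usual definitions K = (XᵀX)⁻¹Xᵀ and H = XK, which is how the projection picture above is commonly written (the notation is assumed, not copied from the slides).

```python
# Sketch: beta_hat = K y with K = (X^T X)^{-1} X^T, predicted response
# y_hat = H y with hat matrix H = X K. Data are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(x), x])           # intercept + slope model
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, x.size)      # hypothetical responses

K = np.linalg.inv(X.T @ X) @ X.T                    # beta_hat = K y
beta_hat = K @ y
H = X @ K                                           # projection (hat) matrix
y_hat = H @ y                                       # predicted response
residuals = y - y_hat

print(beta_hat)                                     # estimated parameters
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))
```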
  • Slide 25
  • Least Square and Projection. The theorem gives H and K. Figure labels: data; residuals; predicted response; estimated parameter; manifold = where the data point would lie if there were no noise.
  • Slide 26
  • The Theorem Gives the Estimate with a Confidence Interval
  • Slide 27
  • SSR. Confidence intervals use the quantity s; s² is called the Sum of Squared Residuals. Figure labels: data; residuals; predicted response.
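A sketch of how such confidence intervals are computed from the sum of squared residuals; the notation below (SSR, σ̂, n − p degrees of freedom) follows the standard textbook formula and may differ slightly from the slide's, and the data are invented.

```python
# Sketch: 95% confidence intervals for regression coefficients, using the
# sum of squared residuals and Student-t quantiles. Data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = np.arange(1.0, 21.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.8 * x + rng.normal(0, 0.5, x.size)

n, p = X.shape
G = np.linalg.inv(X.T @ X)
beta_hat = G @ X.T @ y
residuals = y - X @ beta_hat
ssr = residuals @ residuals                      # sum of squared residuals
sigma_hat = np.sqrt(ssr / (n - p))               # noise std-dev estimate

t = stats.t.ppf(0.975, df=n - p)                 # two-sided 95% quantile
for j in range(p):
    half = t * sigma_hat * np.sqrt(G[j, j])
    print(f"beta_{j}: {beta_hat[j]:.3f} +/- {half:.3f}")
```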
  • Slide 28
  • Validate the Assumptions with Residuals
  • Slide 29
  • Residuals. Residuals are given by the theorem. Figure labels: data; residuals; predicted response.
  • Slide 30
  • Standardized Residuals. The residuals e_i are an estimate of the noise terms ε_i. They are not (exactly) normal iid. What is the variance of e_i? A: σ²(1 − H_i,i). Standardized residuals are not exactly normal iid either, but their variance is 1.
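A sketch of the standardized residuals just described, on hypothetical data.

```python
# Sketch: standardized residuals r_i = e_i / (s * sqrt(1 - H_ii)), where H is
# the hat matrix and s estimates the noise standard deviation. Data invented.
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1.0, 31.0)
X = np.column_stack([np.ones_like(x), x])
y = 0.5 + 0.2 * x + rng.normal(0, 1.0, x.size)

n, p = X.shape
H = X @ np.linalg.inv(X.T @ X) @ X.T                # hat matrix
e = y - H @ y                                       # raw residuals
s = np.sqrt(e @ e / (n - p))                        # noise std-dev estimate
r = e / (s * np.sqrt(1.0 - np.diag(H)))             # standardized residuals

print(np.round(r[:5], 2))                           # roughly unit variance
```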
  • Slide 31
  • Which of these two models could be a linear regression model? A: both. Linear regression does not mean that y_i is a linear function of x_i. Warning: there is a hidden assumption: the noise is iid Gaussian, hence homoscedasticity.
  • Slide 32
  • Slide 33
  • 3. Linear Regression with L1 Norm Minimization = L1 norm minimization + linear dependence on the parameter. More robust, less traditional.
  • Slide 34
  • This is convex programming.
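One way to see the convex-programming claim is to write L1-norm regression as a linear program; the sketch below does this with scipy's linprog on invented data and compares the result with the least-squares fit.

```python
# Sketch: L1-norm (least absolute deviations) regression as a linear program.
# Variables are [beta (p), u (n)]; minimize sum(u) with -u <= y - X beta <= u.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
x = np.arange(1.0, 21.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.3 * x + rng.normal(0, 0.2, x.size)
y[-1] += 10.0                                       # one outlier

n, p = X.shape
c = np.concatenate([np.zeros(p), np.ones(n)])
A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * p + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
beta_l1 = res.x[:p]
beta_l2 = np.linalg.lstsq(X, y, rcond=None)[0]
print("L1 fit:", np.round(beta_l1, 3), " LS fit:", np.round(beta_l2, 3))
```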
  • Slide 35
  • Slide 36
  • Confidence Intervals. No closed form. Compare to the median! Bootstrap: how?
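A sketch of the percentile bootstrap idea raised here, applied to the sample median (the L1 estimator of a constant) on invented data; the same resample-and-refit loop applies to the L1 regression coefficients.

```python
# Sketch: percentile bootstrap confidence interval for the sample median.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=100)         # hypothetical sample

B = 999                                             # bootstrap replicates
medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(B)
])

lo, hi = np.percentile(medians, [2.5, 97.5])        # 95% percentile interval
print(f"median = {np.median(data):.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```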
  • Slide 37
  • Slide 38
  • 4. Choosing a Distribution. Know a catalog of distributions, guess a fit: shape; kurtosis, skewness; power laws; hazard rate. Fit. Verify the fit visually or with a test (see later).
  • Slide 39
  • Distribution Shape. Distributions have a shape. By definition, the shape is what remains the same when we shift and rescale. Example: normal distribution: what is the shape parameter? Example: exponential distribution: what is the shape parameter?
  • Slide 40
  • Standard Distributions. In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard. Standard normal: N(0,1); standard exponential: Exp(1); standard uniform: U(0,1).
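In formulas, the standard member generates its whole shape family by shifting and rescaling (the symbols Z, U below are generic, not taken from the slides):

```latex
% Standard member + shift/rescale generates the whole shape family.
\[
  Z \sim N(0,1) \;\Rightarrow\; \mu + \sigma Z \sim N(\mu, \sigma^2)
\]
\[
  Z \sim \mathrm{Exp}(1) \;\Rightarrow\; Z/\lambda \sim \mathrm{Exp}(\lambda)
\]
\[
  U \sim U(0,1) \;\Rightarrow\; a + (b-a)\,U \sim U(a,b)
\]
```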
  • Slide 41
  • Log-Normal Distribution
  • Slide 42
  • Slide 43
  • Skewness and Kurtosis
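A sketch of how sample skewness and (excess) kurtosis can be computed, on invented samples.

```python
# Sketch: sample skewness and excess kurtosis for two synthetic samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
normal_sample = rng.normal(size=10_000)
lognormal_sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

for name, sample in [("normal", normal_sample), ("lognormal", lognormal_sample)]:
    print(name,
          "skewness =", round(float(stats.skew(sample)), 2),
          "excess kurtosis =", round(float(stats.kurtosis(sample)), 2))
```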
  • Slide 44
  • Power Laws and Pareto Distribution
  • Slide 45
  • Complementary Distribution Functions, Log-Log Scales. Figure panels: Pareto, Lognormal, Normal.
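A sketch of how such a log-log complementary CDF plot can be produced; the sample sizes and distribution parameters are invented.

```python
# Sketch: empirical complementary CDFs P(X > x) on log-log axes, for samples
# from Pareto, lognormal and (folded) normal distributions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
samples = {
    "Pareto":    rng.pareto(a=1.5, size=10_000) + 1.0,
    "Lognormal": rng.lognormal(mean=0.0, sigma=1.0, size=10_000),
    "Normal":    np.abs(rng.normal(loc=3.0, scale=1.0, size=10_000)),
}

for name, x in samples.items():
    x = np.sort(x)
    ccdf = np.arange(x.size, 0, -1) / x.size        # empirical P(X >= x)
    plt.loglog(x, ccdf, label=name)

plt.xlabel("x"); plt.ylabel("P(X > x)"); plt.legend(); plt.show()
```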
  • Slide 46
  • Zipf's Law
  • Slide 47
  • Slide 48
  • Hazard Rate. Interpretation: probability that a flow dies in the next dt seconds given that it is still alive. Used to classify distributions: aging, memoryless, fat tail. Ex: normal? Exponential? Pareto? Log-normal?
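For reference, the hazard rate of a lifetime T with CDF F and density f is (generic notation, not copied from the slides):

```latex
% Hazard rate of a lifetime T with CDF F and density f.
\[
  \lambda(t) \;=\; \lim_{dt \to 0}
    \frac{\Pr\{t < T \le t + dt \mid T > t\}}{dt}
  \;=\; \frac{f(t)}{1 - F(t)}
\]
% In the classification used here: increasing lambda = aging, constant
% lambda = memoryless (exponential), decreasing lambda = fat tail.
```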
  • Slide 49
  • The Weibull Distribution. Standard Weibull CDF: F(x) = 1 − exp(−x^c) for x ≥ 0. Aging for c > 1, memoryless for c = 1, fat tailed for c < 1.
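The classification follows from the Weibull hazard rate, computed here from the standard CDF stated above:

```latex
% Hazard rate of the standard Weibull distribution F(x) = 1 - e^{-x^c}.
\[
  \lambda(x) \;=\; \frac{f(x)}{1 - F(x)}
  \;=\; \frac{c\,x^{c-1}\,e^{-x^c}}{e^{-x^c}}
  \;=\; c\,x^{c-1},
\]
% increasing for c > 1 (aging), constant for c = 1 (memoryless, i.e. the
% exponential distribution), decreasing for c < 1 (fat tailed).
```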