CSSS/POLS 510 Maximum Likelihood Estimation: Lab 4 Kai Ping (Brian) Leung 10/18/2019


  • 0. Agenda

    1. Key concepts

    2. Recap:
       - Simulating heteroskedastic normal data
       - Fitting a model using the simulated data and lm()
       - Fitting the heteroskedastic normal model using ML

    3. Calculating predicted values and confidence intervals

    4. Simulating predicted values using MASS and simcf

    5. Questions about Homework 2 and lectures

  • 1. Key concepts

    Two kinds of uncertainty

    - Key point today: there are two kinds of uncertainty
    - Uncertainty about the parameters vs. uncertainty about the sampling process
    - Estimating the parameters carries uncertainty (we only know their likelihood relative to other possible values)
    - In addition, the sampling process (which produces particular Gaussian random variables) is itself uncertain
    - Prediction is even harder, because it combines both kinds of uncertainty
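For a linear model fitted with lm(), the two kinds of uncertainty can be seen directly in what predict() returns. The sketch below is an illustration under assumed data-generating values (the intercept 1, slope 2 and error SD 1.5 are made up), not part of the lab code: the confidence interval uses only the standard error of the fitted mean, while the prediction interval also adds the residual variance.

```r
# Minimal sketch: a prediction interval adds fundamental uncertainty
# (residual variance) to estimation uncertainty (variance of the fitted mean).
# The data-generating values below are illustrative assumptions.
set.seed(123456)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 1.5)
fit <- lm(y ~ x)

newdata <- data.frame(x = c(0, 0.5, 1))
p <- predict(fit, newdata = newdata, se.fit = TRUE)

tcrit <- qt(0.975, df = p$df)
# Estimation uncertainty only: standard error of the fitted mean
ci_half <- tcrit * p$se.fit
# Estimation + fundamental uncertainty: add the residual variance
pi_half <- tcrit * sqrt(p$se.fit^2 + p$residual.scale^2)

# Compare with the half-widths predict() reports directly
ci_int <- predict(fit, newdata = newdata, interval = "confidence", level = 0.95)
pi_int <- predict(fit, newdata = newdata, interval = "prediction", level = 0.95)
cbind(ci_half, ci_reported = ci_int[, "upr"] - ci_int[, "fit"],
      pi_half, pi_reported = pi_int[, "upr"] - pi_int[, "fit"])
```

The prediction half-width is always larger, because the residual variance term does not shrink as the sample grows.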

  • Prerequisites

    rm(list = ls())     # Clear memory
    set.seed(123456)    # For reproducible random numbers
    library(MASS)       # Load packages
    library(tidyverse)
    library(simcf)      # Download from Chris's website and install manually

  • Code from last lab

    # R script file available on Chris's website: lab4_start_coden
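The lab's own script is not reproduced here. As a stand-in, this is a hedged sketch of the three recap steps. The sample size, the true `beta` and `gamma`, the exponential variance function, and the names `ls.result`, `llk.hetnorm` and `ml.result` are all assumptions for illustration.

```r
# Sketch of the recap steps (not the lab's actual script): simulate
# heteroskedastic normal data, fit lm(), then fit the heteroskedastic
# normal model by maximum likelihood. n, beta and gamma are assumed values.
set.seed(123456)
n  <- 1000
w1 <- runif(n)
w2 <- runif(n)
x  <- cbind(1, w1, w2)              # covariates of the mean (with intercept)
z  <- cbind(1, w1, w2)              # covariates of the variance (with intercept)
beta  <- c(0, 5, 15)                # assumed mean parameters
gamma <- c(1, 0, 3)                 # assumed variance parameters
mu     <- x %*% beta
sigma2 <- exp(z %*% gamma)          # variance increases with w2
y <- as.vector(mu + rnorm(n) * sqrt(sigma2))

# Least squares assumes a constant variance
ls.result <- lm(y ~ w1 + w2)

# Log likelihood of the heteroskedastic normal linear model, dropping the
# constant term; llk.hetnorm is a hypothetical helper name
llk.hetnorm <- function(param, y, x, z) {
  b  <- param[1:ncol(x)]
  g  <- param[(ncol(x) + 1):(ncol(x) + ncol(z))]
  xb <- x %*% b
  s2 <- exp(z %*% g)
  -0.5 * sum(log(s2)) - 0.5 * sum((y - xb)^2 / s2)
}

# Start from the least squares estimates and a constant log variance
stval <- c(coef(ls.result), log(mean(resid(ls.result)^2)), 0, 0)
ml.result <- optim(stval, llk.hetnorm, method = "BFGS", hessian = TRUE,
                   y = y, x = x, z = z,
                   control = list(fnscale = -1, maxit = 1000))  # maximize
pe  <- ml.result$par                  # ML point estimates
vcv <- solve(-ml.result$hessian)      # inverse of the negative Hessian
se  <- sqrt(diag(vcv))
round(cbind(true = c(beta, gamma), pe = pe, se = se), 3)
```

Setting `fnscale = -1` makes `optim()` maximize. The Hessian it returns is that of the log likelihood itself, so its negative inverse estimates the variance-covariance matrix.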

  • 3. Calculating predicted values and confidence intervals

    Motivation: We want to study how a change in a particular explanatory variable affects the outcome variable, all else being equal.

    Scenario 1: Vary covariate 1; hold covariate 2 constant

    1. Create a data frame with a set of hypothetical scenarios for covariate 1, while keeping covariate 2 at its mean
       - What is a sensible range of hypothetical scenarios for covariate 1? Consider the original range of w1.
    2. Calculate the predicted values using the predict() function
       - Hint: you need at least the following arguments:
         predict(object = ..., newdata = ..., interval = ..., level = ...)
    3. Plot the prediction intervals
    4. Similarly, calculate the confidence intervals using the predict() function
    5. Plot the confidence intervals; compare them with the prediction intervals
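Steps 1, 2 and 4 can be sketched in base R as follows. This assumes a simulated dataset standing in for the lab's data (the data-generating process and the name `ls.result` are assumptions). It also assumes a grid of w1 from 0 to 1 in steps of 0.05, which covers the range of a uniform(0, 1) covariate.

```r
# Sketch of steps 1, 2 and 4 with predict(). The data-generating process and
# object names are assumptions chosen for illustration.
set.seed(123456)
n  <- 1000
w1 <- runif(n)
w2 <- runif(n)
y  <- 5 * w1 + 15 * w2 + rnorm(n) * sqrt(exp(1 + 3 * w2))  # assumed DGP
ls.result <- lm(y ~ w1 + w2)

# Step 1: hypothetical w1 values over its observed range (about 0 to 1 here),
# with w2 held at its mean
w1range <- seq(0, 1, by = 0.05)
xhypo   <- data.frame(w1 = w1range, w2 = mean(w2))

# Step 2: 95% prediction intervals (estimation + fundamental uncertainty)
simPI.w1 <- data.frame(predict(ls.result, newdata = xhypo,
                               interval = "prediction", level = 0.95),
                       w1 = w1range)
# Step 4: 95% confidence intervals (estimation uncertainty only)
simCI.w1 <- data.frame(predict(ls.result, newdata = xhypo,
                               interval = "confidence", level = 0.95),
                       w1 = w1range)
head(simPI.w1)
head(simCI.w1)
```

Both data frames share the same `fit` column. Only `lwr` and `upr` differ, and the prediction band is always the wider one.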

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # Set up
    w1range

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    simPI.w1 %
      as_tibble() %>%            # Coerce it into a tibble
      bind_cols(w1 = w1range)    # Combine hypothetical w1 with predicted y

    head(simPI.w1)               # Inspect

    ## # A tibble: 6 x 4
    ##     fit     lwr   upr    w1
    ##   <dbl>   <dbl> <dbl> <dbl>
    ## 1  7.66 -0.536   15.8  0
    ## 2  7.90 -0.295   16.1  0.05
    ## 3  8.13 -0.0545  16.3  0.1
    ## 4  8.37  0.186   16.6  0.15
    ## 5  8.61  0.426   16.8  0.2
    ## 6  8.85  0.666   17.0  0.25

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # ggplot2
    theme_set(theme_classic())

    ggplot(simPI.w1, aes(x = w1, y = fit, ymax = upr, ymin = lwr)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 1")

    [Figure: prediction intervals; x-axis "Covariate 1", y-axis "Predicted Y"]

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # Calculate confidence intervals using predict()
    simCI.w1 %
      bind_cols(w1 = w1range)

    head(simCI.w1)

    ## # A tibble: 6 x 4
    ##     fit   lwr   upr    w1
    ##   <dbl> <dbl> <dbl> <dbl>
    ## 1  7.66  7.23  8.08  0
    ## 2  7.90  7.50  8.29  0.05
    ## 3  8.13  7.77  8.50  0.1
    ## 4  8.37  8.04  8.71  0.15
    ## 5  8.61  8.30  8.92  0.2
    ## 6  8.85  8.57  9.13  0.25

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # Plot confidence intervals
    ggplot(simCI.w1, aes(x = w1, y = fit, ymax = upr, ymin = lwr)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 1")

    [Figure: confidence intervals; x-axis "Covariate 1", y-axis "Predicted Y"]

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # How to plot two plots side by side
    # In ggplot2, we need to combine the two datasets and also create a new
    # variable that identifies whether each row comes from the prediction
    # intervals or the confidence intervals
    simALL.w1 % mutate(type = "PI"),
      simCI.w1 %>% mutate(type = "CI"))

    head(simALL.w1)

    ## # A tibble: 6 x 5
    ##     fit     lwr   upr    w1 type
    ##   <dbl>   <dbl> <dbl> <dbl> <chr>
    ## 1  7.66 -0.536   15.8  0    PI
    ## 2  7.90 -0.295   16.1  0.05 PI
    ## 3  8.13 -0.0545  16.3  0.1  PI
    ## 4  8.37  0.186   16.6  0.15 PI
    ## 5  8.61  0.426   16.8  0.2  PI
    ## 6  8.85  0.666   17.0  0.25 PI

  • 3. Calculating predicted values and confidence intervals: Scenario 1

    # Plot confidence intervals and prediction intervals side by side
    ggplot(simALL.w1, aes(x = w1, y = fit, ymax = upr, ymin = lwr)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 1") +
      facet_grid(~ type)

    [Figure: two panels, "CI" and "PI"; x-axis "Covariate 1", y-axis "Predicted Y"]

  • 3. Calculating predicted values and confidence intervals: Scenario 2

    Scenario 2: Vary covariate 2; hold covariate 1 constant

    w2range

  • 3. Calculating predicted values and confidence intervals: Scenario 2

    # Plot the prediction intervals for hypothetical w2
    ggplot(simPI.w2, aes(x = w2, y = fit, ymax = upr, ymin = lwr)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 2")

    [Figure: prediction intervals; x-axis "Covariate 2", y-axis "Predicted Y"]

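    The prediction intervals do not capture the heteroskedasticity of the outcome variable with respect to covariate 2. Because lm() assumes a constant error variance, the band has almost the same width at every value of covariate 2. Is there a better solution?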
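One way to state the problem compactly is to compare the two predictive variances. This assumes the heteroskedastic normal model specifies the variance as $\sigma_i^2 = \exp(\mathbf{z}_i\boldsymbol{\gamma})$, a common choice that is assumed here. Here $c$ indexes a hypothetical scenario and $k$ is the number of mean parameters.

```latex
\begin{aligned}
&\text{lm(): } && y_i \sim \mathcal{N}(\mathbf{x}_i\boldsymbol{\beta},\ \sigma^2)
  \;\Rightarrow\; \hat{y}_c \pm t_{0.975,\,n-k}\sqrt{\widehat{\mathrm{se}}(\mathbf{x}_c\hat{\boldsymbol{\beta}})^2 + \hat{\sigma}^2} \\
&\text{heteroskedastic: } && y_i \sim \mathcal{N}(\mathbf{x}_i\boldsymbol{\beta},\ \sigma_i^2),\quad \sigma_i^2 = \exp(\mathbf{z}_i\boldsymbol{\gamma})
  \;\Rightarrow\; \mathrm{Var}(\tilde{y}_c) \approx \mathrm{Var}(\mathbf{x}_c\tilde{\boldsymbol{\beta}}) + \exp(\mathbf{z}_c\hat{\boldsymbol{\gamma}})
\end{aligned}
```

Under lm() the fundamental-uncertainty term $\hat{\sigma}^2$ is the same for every scenario. In the heteroskedastic model it depends on $\mathbf{z}_c$, so the band widens as covariate 2 grows whenever its coefficient in $\boldsymbol{\gamma}$ is positive.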

  • 4. Simulating predicted values using simcf

    Motivation: Can we use simulation to produce the same prediction and confidence intervals? Recall from yesterday's lecture that we can draw many β̃ from a multivariate normal distribution.
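As a check on the idea, this sketch draws β̃ with `MASS::mvrnorm()` from a multivariate normal centred on the lm() estimates. It then compares the simulated 95% interval of the expected value with the one from predict(). The data-generating process, the scenario (w1 = 0.5, w2 at its mean) and the object names are assumptions.

```r
# Sketch: reproduce predict()'s 95% confidence interval by simulation,
# drawing beta-tilde from a multivariate normal centred on the lm() estimates.
# The data-generating process and the scenario are assumptions.
library(MASS)  # provides mvrnorm()
set.seed(123456)
n  <- 1000
w1 <- runif(n)
w2 <- runif(n)
y  <- 5 * w1 + 15 * w2 + rnorm(n) * sqrt(exp(1 + 3 * w2))  # assumed DGP
ls.result <- lm(y ~ w1 + w2)

# Draw many beta-tildes from N(beta-hat, estimated variance-covariance matrix)
sims <- 10000
simbetas <- mvrnorm(sims, mu = coef(ls.result), Sigma = vcov(ls.result))

# One hypothetical scenario: w1 = 0.5, w2 at its mean (intercept first)
xc <- c(1, 0.5, mean(w2))
simev  <- simbetas %*% xc                  # simulated expected values of y
sim_ci <- quantile(simev, c(0.025, 0.975))

analytic <- predict(ls.result, newdata = data.frame(w1 = 0.5, w2 = mean(w2)),
                    interval = "confidence", level = 0.95)
rbind(simulated = sim_ci, analytic = analytic[1, c("lwr", "upr")])
```

The two rows should agree closely. Adding a draw of the error term to each simulated expected value would turn this into an approximate prediction interval.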

    Demonstration:

    Scenario 1: Vary covariate 1; hold covariate 2 constant

    1. Create a data frame with a set of hypothetical scenarios for covariate 1 while keeping covariate 2 at its mean
    2. Simulate the predicted values using MASS and simcf
    3. Plot the results

  • 4. Simulating predicted values using simcf

    To use simcf to generate quantities of interest, we need three things:

    1. Draws of β̃ (and the other parameters) from a multivariate normal distribution
    2. The model formula(s)
    3. Hypothetical scenarios of substantive interest

  • 4. Simulating predicted values using simcf: Scenario 1

    # Draw parameters from the model predictive distribution
    sims

  • 4. Simulating predicted values using simcf: Scenario 1

    # Use `cfMake` to create a baseline dataset
    xhypo

  • 4. Simulating predicted values using simcf: Scenario 1

    # Use `cfChange` inside a loop over each hypothetical x value
    for (i in 1:length(w1range)) {
      xhypo
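The idea behind the `cfMake`/`cfChange` loop can be mimicked in base R. This sketch does not use the simcf API. It starts every scenario at the covariate means and then overwrites w1 one row at a time. The simulated covariates, the 0-to-1 grid, and the assumption that z uses the same covariates as x are all illustrative.

```r
# Hypothetical base-R stand-in for the cfMake()/cfChange() loop (not the simcf
# API): start every scenario at the covariate means, then change w1 row by row.
# The simulated covariates and the 0-to-1 grid are assumptions.
set.seed(123456)
n   <- 1000
dat <- data.frame(w1 = runif(n), w2 = runif(n))

w1range <- seq(0, 1, by = 0.2)
nscen   <- length(w1range)

# Baseline: every row holds each covariate at its sample mean
xhypo <- as.data.frame(matrix(colMeans(dat), nrow = nscen, ncol = ncol(dat),
                              byrow = TRUE, dimnames = list(NULL, names(dat))))

# Loop over scenarios, changing only w1 (w2 stays at its mean)
for (i in seq_len(nscen)) {
  xhypo$w1[i] <- w1range[i]
}

# The variance equation needs matching scenarios for its covariates z
zhypo <- xhypo
xhypo
```

The loop mirrors the slide's structure. In plain base R, `xhypo$w1 <- w1range` would do the same in one line.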

  • 4. Simulating predicted values using simcf: Scenario 1

    # Repeat the same procedure for z
    zhypo

  • 4. Simulating predicted values using simcf: Scenario 1

    # Simulate prediction intervals for heteroskedastic linear models
    # `hetnormsimpv()` is from the simcf package
    simRES.w1
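To make the simulation logic concrete without depending on simcf, here is a hand-rolled sketch of predictive simulation for a heteroskedastic normal linear model. It is not the source of `hetnormsimpv()`. The data-generating process, the starting values, the use of the mean as the point estimate, and names such as `llk`, `ml` and `simparam` are assumptions.

```r
# Hand-rolled analogue of predictive simulation for a heteroskedastic normal
# linear model (a sketch, not simcf's hetnormsimpv() source). The DGP,
# starting values and object names are assumptions.
library(MASS)  # provides mvrnorm()
set.seed(123456)
n  <- 1000
w1 <- runif(n)
w2 <- runif(n)
x  <- cbind(1, w1, w2)
z  <- x                                      # same covariates in the variance
y  <- as.vector(x %*% c(0, 5, 15) + rnorm(n) * sqrt(exp(z %*% c(1, 0, 3))))

# Heteroskedastic normal log likelihood (constant dropped), maximized by optim()
llk <- function(p) {
  s2 <- exp(z %*% p[4:6])
  -0.5 * sum(log(s2) + (y - x %*% p[1:3])^2 / s2)
}
ls.fit <- lm(y ~ w1 + w2)
ml <- optim(c(coef(ls.fit), log(mean(resid(ls.fit)^2)), 0, 0), llk,
            method = "BFGS", hessian = TRUE,
            control = list(fnscale = -1, maxit = 1000))

# Ingredient 1: draw beta-tilde and gamma-tilde from a multivariate normal
sims <- 10000
simparam <- mvrnorm(sims, ml$par, solve(-ml$hessian))

# Ingredient 3: scenarios varying w1, with w2 at its mean (same rows for x and z)
w1range <- seq(0, 1, by = 0.2)
xhypo <- cbind(1, w1range, mean(w2))

# Ingredient 2, the model y ~ N(x b, exp(z g)), is applied for each draw;
# rnorm() adds the fundamental uncertainty on top of the parameter draws
res <- t(sapply(seq_along(w1range), function(i) {
  mu <- simparam[, 1:3] %*% xhypo[i, ]       # expected value for each draw
  s2 <- exp(simparam[, 4:6] %*% xhypo[i, ])  # variance for each draw
  ysim <- rnorm(sims, mean = mu, sd = sqrt(s2))
  c(mean(ysim), quantile(ysim, c(0.025, 0.975)))
}))
simRES.w1 <- data.frame(pe = res[, 1], lower = res[, 2], upper = res[, 3],
                        w1 = w1range)
simRES.w1
```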

  • 4. Simulating predicted values using simcf: Scenario 1

    simRES.w1 %
      bind_rows() %>%            # Collapse the list into a data frame
      bind_cols(w1 = w1range)    # Combine with hypothetical w1 values

    simRES.w1                    # Inspect

    ## # A tibble: 6 x 4
    ##      pe lower upper    w1
    ##   <dbl> <dbl> <dbl> <dbl>
    ## 1  7.52 0.885  13.9   0
    ## 2  8.57 1.94   15.2   0.2
    ## 3  9.58 2.85   16.4   0.4
    ## 4 10.6  3.55   17.5   0.6
    ## 5 11.6  4.51   18.5   0.8
    ## 6 12.5  5.33   19.7   1

  • 4. Simulating predicted values using simcf: Scenario 1

    # ggplot2
    ggplot(simRES.w1, aes(x = w1, y = pe, ymax = upper, ymin = lower)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 1")

    [Figure: simulated prediction intervals; x-axis "Covariate 1", y-axis "Predicted Y"]

  • 4. Simulating predicted values using simcf

    Your turn for practice:

    Scenario 2: Vary covariate 2; hold covariate 1 constant

    1. Create a data frame with a set of hypothetical scenarios for covariate 2 while keeping covariate 1 at its mean
    2. Simulate the predicted values using MASS and simcf
    3. Plot the results

  • 4. Simulating predicted values using simcf: Scenario 2

    # Create hypothetical scenarios for w2
    w2range

  • 4. Simulating predicted values using simcf: Scenario 2

    # Plot the prediction intervals for hypothetical w2
    ggplot(simRES.w2, aes(x = w2, y = pe, ymax = upper, ymin = lower)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 2")

    [Figure: simulated prediction intervals; x-axis "Covariate 2", y-axis "Predicted Y"]

    Can we compare them with the prediction intervals generated by predict()?
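A quick numerical version of this comparison looks at interval widths instead of plots. This sketch simulates heteroskedastic prediction intervals as w2 varies and compares them with lm()'s constant-variance prediction intervals. The data-generating process, the grid for w2 and all object names are assumptions.

```r
# Sketch: compare simulated heteroskedastic prediction intervals with lm()'s
# constant-variance prediction intervals as w2 varies. DGP, grid and object
# names are assumptions for illustration.
library(MASS)  # provides mvrnorm()
set.seed(123456)
n  <- 1000
w1 <- runif(n)
w2 <- runif(n)
x  <- cbind(1, w1, w2)
z  <- x
y  <- as.vector(x %*% c(0, 5, 15) + rnorm(n) * sqrt(exp(z %*% c(1, 0, 3))))
ls.result <- lm(y ~ w1 + w2)

llk <- function(p) {                         # heteroskedastic normal log likelihood
  s2 <- exp(z %*% p[4:6])
  -0.5 * sum(log(s2) + (y - x %*% p[1:3])^2 / s2)
}
ml <- optim(c(coef(ls.result), log(mean(resid(ls.result)^2)), 0, 0), llk,
            method = "BFGS", hessian = TRUE,
            control = list(fnscale = -1, maxit = 1000))
simparam <- mvrnorm(10000, ml$par, solve(-ml$hessian))

# Scenario 2: vary w2, hold w1 at its mean
w2range <- seq(0, 1, by = 0.25)
xhypo <- cbind(1, mean(w1), w2range)

# Simulated 95% prediction intervals from the heteroskedastic model
sim_pi <- t(sapply(seq_along(w2range), function(i) {
  mu <- simparam[, 1:3] %*% xhypo[i, ]
  s2 <- exp(simparam[, 4:6] %*% xhypo[i, ])
  quantile(rnorm(nrow(simparam), mu, sqrt(s2)), c(0.025, 0.975))
}))
# lm() prediction intervals assume a constant error variance
lm_pi <- predict(ls.result, newdata = data.frame(w1 = mean(w1), w2 = w2range),
                 interval = "prediction", level = 0.95)

widths <- data.frame(w2 = w2range,
                     sim = sim_pi[, 2] - sim_pi[, 1],
                     lm  = lm_pi[, "upr"] - lm_pi[, "lwr"])
widths
```

Under these assumptions, the lm() widths stay nearly flat. The simulated widths start narrower at low w2 and end wider at high w2.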

  • 4. Simulating predicted values using simcf: Scenario 2

    # Combine the two data frames
    simPI.w2 %
      rename(pe = fit, lower = lwr, upper = upr)

    simALL.w2 % mutate(method = "sim"),
      simPI.w2 %>% mutate(method = "lm")
    )

    # Plot the prediction intervals for hypothetical w2
    ggplot(simALL.w2, aes(x = w2, y = pe, ymax = upper, ymin = lower)) +
      geom_line() +
      geom_ribbon(alpha = 0.1) +
      labs(y = "Predicted Y", x = "Covariate 2") +
      facet_grid(~ method)

    [Figure: two panels, "lm" and "sim"; x-axis "Covariate 2", y-axis "Predicted Y"]