
  • 1/22

    Data Science: Data Visualization Boot Camp

    Relationship

    Bubble Plot

    Chuck Cartledge, PhD

    24 January 2020

  • 2/22

    Type Sample data Hands on Q & A Conclusion References Files

    Table of contents (1 of 1)

    1 Type
        Uses
        General considerations

    2 Sample data

    3 Hands on

    4 Q & A

    5 Conclusion

    6 References

    7 Files

  • 3/22

    A definition

    “A bubble graph is a variation of a point or line graph where the data points (dots) have been replaced by circles (bubbles). The major advantage of a bubble graph versus a point or line graph is the ability to encode one or more additional variables by means of the bubble symbol. Bubble graphs might be two or three dimensional, . . . ”

    R. L. Harris [1]

  • 4/22

    R supplied data set (1 of 2)

    Included in the R package ggplot2.

    “This dataset contains a subset of the fuel economy data that the EPA makes available on . It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.”

    H. Wickham [2]

    library(ggplot2)

    ?mpg

    head(mpg)

    Resulting in:

  • 5/22

    R supplied data set (2 of 2)

    # A tibble: 6 x 11

      manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
    1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     comp
    2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     comp
    3 audi         a4        2  2008     4 manual(m6) f        20    31 p     comp
    4 audi         a4        2  2008     4 auto(av)   f        21    30 p     comp
    5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     comp
    6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     comp

  • 6/22

    More recent mileage data

    Downloaded from: https://www.fueleconomy.gov/feg/download.shtml

    Described at: https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle

    We will:

    1 Extract csv data from a zip file (39,865 rows)

    2 Select certain makes (attempt to replicate the sample data)

    3 Display different data for selected makes/models
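The first two steps above can be sketched as follows. This is a minimal illustration, not the presenter's script: the function names `read_vehicles_zip` and `select_makes`, the archive and member names, and the list of makes are all assumptions. The only thing taken from the slides is that the data has a `make` column.

```r
# Sketch only: helper names, file names, and makes are hypothetical.

read_vehicles_zip <- function(zip_path, csv_name = "vehicles.csv") {
  # unz() opens one member of a zip archive as a connection, so the
  # csv is read directly without unpacking it to disk first.
  read.csv(unz(zip_path, csv_name), stringsAsFactors = FALSE)
}

select_makes <- function(vehicles, makes) {
  # Keep only the rows whose make matches one of `makes`, ignoring case.
  keep <- tolower(vehicles$make) %in% tolower(makes)
  vehicles[keep, , drop = FALSE]
}

# Hypothetical usage, after downloading the zip file by hand:
# vehicles <- read_vehicles_zip("vehicles.csv.zip")
# selected <- select_makes(vehicles, c("Audi", "Ford", "Honda", "Hyundai"))
```

Matching on lower-cased names is a design choice made here because the sample `mpg` data stores manufacturers in lower case, while a downloaded file may not.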

  • 7/22

    The first codes. (1 of 3)

  • 8/22

    The first codes. (2 of 3)

    rm(list=ls())

    library(ggplot2)

    data(mpg, package="ggplot2")

    mpg_select

  • 9/22

    The first codes. (3 of 3)

    g + geom_point(aes(col=manufacturer))

    g + geom_jitter(aes(col=manufacturer))

    g + geom_jitter(aes(col=manufacturer, size=hwy)) +
      geom_smooth(aes(col=manufacturer), method="lm", se=FALSE)

    g + geom_jitter(aes(col=manufacturer, size=hwy)) +
      geom_smooth(aes(col=manufacturer), method="lm", se=FALSE) +
      labs(size = "Highway\n mpg",
           colour = "Brand")
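The code on these slides uses `g` and `mpg_select`, but the extracted text stops before either is defined. The following is a minimal sketch of one way the pieces might fit together. The manufacturers kept in `mpg_select`, the choice of `displ` and `cty` as axes, and the name `bubble` are assumptions made for illustration.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Assumption: these four makes are illustrative, not the slide's list.
mpg_select <- mpg[mpg$manufacturer %in%
                    c("audi", "ford", "honda", "hyundai"), ]

# Base plot: position encodes displacement and city mileage.
g <- ggplot(mpg_select, aes(x = displ, y = cty)) +
  labs(title = "Bubble chart",
       x = "Engine displacement (liters)",
       y = "City mpg")

# Colour encodes the make, bubble size encodes highway mileage,
# and one linear trend line is drawn per make.
bubble <- g +
  geom_jitter(aes(col = manufacturer, size = hwy)) +
  geom_smooth(aes(col = manufacturer), method = "lm", se = FALSE) +
  labs(size = "Highway\n mpg", colour = "Brand")

print(bubble)
```

`geom_jitter()` is used instead of `geom_point()` because many cars share the same displacement and mileage values, so unjittered bubbles would sit on top of each other.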

  • 10/22

    The second codes. (1 of 4)

  • 11/22

    The second codes. (2 of 4)

    rm(list=ls())

    library(ggplot2)

    saveFileName

  • 12/22

    The second codes. (3 of 4)

    labs(subtitle="mpg: Displacement vs City Mileage",
         title="Bubble chart",
         x="Engine displacement (liters)",
         y="City mpg",
         color="Manufacturer")

    g + geom_point()

    g + geom_point(aes(col=make))

    g + geom_jitter(aes(col=make))

    g + geom_jitter(aes(col=make, size=highway08)) +
      geom_smooth(aes(col=make), method="lm", se=FALSE) +
      labs(size = "Highway\n mpg",
           colour = "Brand")
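The slide shows only the tail of the plotting code, so here is a self-contained sketch of the same layering applied to a data frame shaped like the downloaded file. The columns `make` and `highway08` appear on the slide; `displ` and `city08` are assumed names for the displacement and city mileage columns. The function name `bubble_by_make` is hypothetical, and the rows in `toy` are made up purely to show the call; they are not EPA figures.

```r
library(ggplot2)

# Assumption: `vehicles` has columns make, displ, city08 and highway08.
bubble_by_make <- function(vehicles) {
  ggplot(vehicles, aes(x = displ, y = city08)) +
    geom_jitter(aes(col = make, size = highway08)) +
    geom_smooth(aes(col = make), method = "lm", se = FALSE) +
    labs(title = "Bubble chart",
         subtitle = "mpg: Displacement vs City Mileage",
         x = "Engine displacement (liters)",
         y = "City mpg",
         size = "Highway\n mpg",
         colour = "Brand")
}

# Made-up rows for illustration only.
toy <- data.frame(
  make      = rep(c("Honda", "Ford"), each = 4),
  displ     = c(1.5, 1.8, 2.0, 2.4, 2.0, 2.5, 3.5, 5.0),
  city08    = c(30, 28, 26, 24, 23, 21, 18, 15),
  highway08 = c(38, 36, 34, 31, 31, 29, 25, 22)
)

p <- bubble_by_make(toy)
```

Wrapping the layers in a function keeps the plot definition in one place, so the same call works on the toy rows and on a real selection of makes.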

  • 13/22

    The second codes. (4 of 4)

  • 14/22

    The third codes. (1 of 4)

  • 15/22

    The third codes. (2 of 4)

    rm(list=ls())

    library(ggplot2)

    saveFileName

  • 16/22

    The third codes. (3 of 4)

    " and average CO2 over time"),
         title="Bubble chart",
         x="Year",
         y="City mpg",
         caption = paste0("Idea taken from \"Practical",
                          " Statistics for Data",
                          " Scientists\", Bruce and Bruce."
                          )
         )

    g + geom_point()

    g + geom_jitter()

    g + geom_jitter(aes(size=co2,
                        shape=as.factor(cylinders)
                        ), alpha=0.5) +
      geom_smooth(colour="green", method="lm", se=FALSE) +

  • 17/22

    The third codes. (4 of 4)

      labs(size = "CO2\nmeasurements",
           shape = "Number\nof cylinders"
           )
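The data preparation for this plot is missing from the extracted slides. A minimal sketch of one plausible version follows, assuming the data frame has columns `make`, `year`, `cylinders`, `city08`, and `co2`, and assuming the plot shows per-year averages (the subtitle fragment mentions "average CO2 over time"). The function names `average_by_year` and `co2_bubble` are hypothetical.

```r
library(ggplot2)

# Assumption: `vehicles` has columns make, year, cylinders, city08, co2.
average_by_year <- function(vehicles, makes = "Honda") {
  chosen <- vehicles[vehicles$make %in% makes, ]
  # Mean city mpg and mean CO2 for each year and cylinder count.
  aggregate(cbind(city08, co2) ~ year + cylinders,
            data = chosen, FUN = mean)
}

co2_bubble <- function(yearly) {
  ggplot(yearly, aes(x = year, y = city08)) +
    # Size encodes CO2, shape encodes the number of cylinders.
    geom_jitter(aes(size = co2, shape = as.factor(cylinders)),
                alpha = 0.5) +
    geom_smooth(colour = "green", method = "lm", se = FALSE) +
    labs(title = "Bubble chart",
         x = "Year",
         y = "City mpg",
         size = "CO2\nmeasurements",
         shape = "Number\nof cylinders")
}

# Hypothetical usage with a data frame read from the downloaded file:
# print(co2_bubble(average_by_year(vehicles, makes = "Honda")))
```

Passing `makes = c("Honda", "Ford")` would widen the selection; whether the averages should then be kept separate per make is left open here.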

  • 18/22

    Hands-on

    1 The supervisor would like to see the effect of different “default” themes on the first plot. Show how to use the gray, linedraw, and classic themes.

    2 The CO2 plot displays data for Hondas only. Change the data selection command to include Fords, and discuss how the resulting plot could be improved.
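As a starting point for the first exercise, a theme is added to a plot like any other layer. The sketch below uses the full `mpg` data rather than the presenter's selection, which is an assumption made to keep it self-contained; `theme_gray()`, `theme_linedraw()`, and `theme_classic()` are the ggplot2 functions for the three themes named above.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Assumption: the full mpg data stands in for the slide's selection.
base_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_jitter(aes(col = manufacturer, size = hwy))

# One copy of the plot per theme, labelled so they can be compared.
themed <- list(
  gray     = base_plot + theme_gray()     + ggtitle("theme_gray"),
  linedraw = base_plot + theme_linedraw() + ggtitle("theme_linedraw"),
  classic  = base_plot + theme_classic()  + ggtitle("theme_classic")
)

for (plot in themed) print(plot)
```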

  • 19/22

    Q & A time.

    Q: How many Harvard MBAs does it take to screw in a lightbulb?

    A: Just one. He grasps it firmly and the universe revolves around him.

  • 20/22

    What have we covered?

    Bubble plots:

    Require slightly more thought and consideration than scatter plots

    Are used to show three or more related variables

    Are good for showing gross differences in the third dimension
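One reason bubbles suit gross rather than fine differences is that readers compare circle areas only roughly. A minimal sketch of making the encoding explicit is shown below, using ggplot2's `scale_size_area()`, which maps the value to bubble area so that a value of zero gets zero area. The variables plotted, the `max_size` of 6, and the name `area_bubbles` are arbitrary choices for illustration.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Bubble area (not radius) is proportional to highway mileage.
area_bubbles <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_jitter(aes(size = hwy), alpha = 0.4) +
  scale_size_area(max_size = 6, name = "Highway\n mpg")

print(area_bubbles)
```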

    Next: Columnar histograms (how grouping data can show patterns)

  • 21/22

    References (1 of 1)

    [1] Robert L. Harris, Information Graphics: A Comprehensive Illustrated Reference, Oxford University Press, 2000.

    [2] Hadley Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2009.

  • 22/22

    Files of interest

    1 Code snippet to create images in this presentation

    2 Extract Federal fuel data
