Lecture Notes 2: Variables and graphics
Highlights:
• Quantitative vs. qualitative variables
• Continuous vs. discrete and ordinal vs. nominal variables
• Frequency distributions
• Pie charts
• Bar charts
• Histograms and distribution shape
• Box plots
Variable (Data) Types
• Variables can be either qualitative or quantitative.
• Quantitative: Numeric - height, weight, number of customers, blood alcohol level
• Quantitative variables have values that we can do sensible math with. Numbers which do not represent quantities are not quantitative.
• Qualitative: Names or categories - eye color, type of car, political affiliation, breed of dog. Qualitative variables are sometimes also called categorical.
• Non-quantitative numbers are categorical.
Variable Levels
• Quantitative variables come in two levels, continuous and discrete:
• Quantitative Continuous: Numeric variables which can be given to an arbitrary number of decimal places. Typically, continuous variables are measured. Examples:
Variable Levels
• Quantitative Discrete: Numeric variables where only integer responses make sense. Typically, discrete variables are counted. Examples:
• Note that when a continuous variable is rounded to the nearest integer, it is still considered continuous.
• For instance, rounding temperature to the nearest integer is a common thing to do, but temperature is still considered continuous.
Variable Levels
• Qualitative variables also come in two levels, ordinal and nominal.
• Qualitative Ordinal: These are qualitative variables that are typically placed in a set order.
• If placing the values of a qualitative variable “out of order” would be confusing, then it should probably be treated as ordinal. Examples:
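As a small illustration of why order matters for an ordinal variable, the Python sketch below sorts a set of survey responses two ways. The response labels and the `LEVELS` ordering are hypothetical values made up for this example; they are not taken from the notes.

```python
# Hypothetical ordinal variable: agreement on a five-point scale.
# The labels and their order are assumed for illustration only.
LEVELS = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

responses = ["Agree", "Neutral", "Strongly disagree", "Agree", "Disagree"]

# Alphabetical sorting ignores the meaning of the categories.
alphabetical = sorted(responses)

# Sorting by position in LEVELS respects the natural order.
ordinal = sorted(responses, key=LEVELS.index)

print("Alphabetical:", alphabetical)
print("Ordinal:     ", ordinal)
```

The alphabetical listing puts "Agree" ahead of "Disagree" and "Neutral", which is the kind of confusing ordering that suggests the variable should be treated as ordinal.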
Variable Levels
• Qualitative Nominal: These are qualitative variables in which order does not matter. Most qualitative variables are nominal. Examples:
Data Graphics
• We will look at pie charts, bar charts, histograms, and box plots.
• All of the graphs we will look at show frequency distributions of data. Often this is shortened to just distribution.
• A distribution tells you the values a variable takes on, and the frequency with which those values are taken on.
• So, if we are interested in the distribution of blood types from a bank of donors, I could first show you the data like this…
B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB B B B AB B B B O A B A B A A B A A AB
Blood Types from a Group of 77 Donors
(This is raw data, not a distribution)
…or like this:
Blood Type     A    AB   B    O
# of Donors    18   18   30   11
This is a frequency distribution, because it tells you the different values that the variable “Blood Type” takes on, as well as how often it takes each value on.
I could show it to you like this:
This is also a frequency distribution. Does the visual aspect help give meaning to the distribution?
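The step from raw data to a frequency distribution is just a tally. A minimal Python sketch, using the 77 raw blood types listed in these notes (the variable names `raw` and `counts` are chosen here for illustration):

```python
from collections import Counter

# Raw data copied from the notes: one blood type per donor.
raw = (
    "B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB "
    "A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB "
    "B B B AB B B B O A B A B A A B A A AB"
)

# Tally how often each value of the variable "Blood Type" occurs.
counts = Counter(raw.split())

for blood_type in sorted(counts):
    print(f"{blood_type:<3}{counts[blood_type]:>4}")
print(f"Total {sum(counts.values())}")
```

The tally reproduces the table of donor counts, which is why the table is a frequency distribution and the raw list is not.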
Relative Frequency Distribution
• Sometimes it is useful to show relative frequency rather than just frequency.
• Relative frequency shows the different values a variable takes on, and how often it takes each value on as a proportion of the total.
• Proportions are often denoted as “p”, and given as:
\[ \text{relative freq.} = p = \frac{\text{\# of observations of interest}}{\text{total \# of observations}} \]
Relative Frequency Distribution
Relative frequency example:
Blood Type               A    AB   B    O    Total
# of Donors              18   18   30   11
Relative frequency (p)
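The relative frequencies for the blood type example follow from dividing each count by the total number of donors. A short Python sketch of that arithmetic, using the donor counts from the notes (the names `counts`, `total`, and `rel_freq` are illustrative):

```python
# Donor counts from the blood type frequency distribution.
counts = {"A": 18, "AB": 18, "B": 30, "O": 11}

# Total number of observations.
total = sum(counts.values())

# p = (# of observations of interest) / (total # of observations)
rel_freq = {blood_type: n / total for blood_type, n in counts.items()}

for blood_type, n in counts.items():
    print(f"{blood_type:<3}{n:>4}{rel_freq[blood_type]:>8.3f}")
print(f"Sum of relative frequencies: {sum(rel_freq.values()):.3f}")
```

Relative frequencies always add up to 1 (apart from rounding), which is a quick check on the arithmetic.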
[Pie chart: ST301 Student Attitudes. Survey results for students' attitudes towards statistics: Hate stats (20), Like stats (43), Open mind (198)]
Pie Charts
• Pie charts can be used to summarize one qualitative variable.
• Pie slices represent the proportion of observations in a class
• Sometimes frequency results are also included
• The more categories you have, the more difficult the pie chart will be to read.
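Because each slice represents a proportion, the angle of a slice is that proportion of a full 360 degrees. The Python sketch below works this out for the ST301 attitude counts shown in these notes; the names `attitudes` and `angles` are chosen here for illustration.

```python
# Counts from the ST301 student attitudes survey in the notes.
attitudes = {"Hate stats": 20, "Like stats": 43, "Open mind": 198}

total = sum(attitudes.values())

# Each slice's angle is its share of the total, scaled to 360 degrees.
angles = {label: 360 * n / total for label, n in attitudes.items()}

for label, n in attitudes.items():
    print(f"{label:<11}{n:>4}{n / total:>8.3f}{angles[label]:>8.1f} deg")
```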
[Two pie charts: opinions on the appropriate punishment for murder. Females: Death (11), Life in prison (14), No opinion (2), Neither (1), Depends (1). Males: Death (22), Life in prison (3), Depends (3), Neither (2), No opinion (1).]
Two pie charts
• Multiple pie charts can be used to compare two different groups. Here, the pie charts compare attitudes of females and males toward the appropriate punishment for murder.
• Often it is tough to make direct comparisons using pie charts.
Bar charts
• Bar charts can be used wherever pie charts are used.
• Like pie charts, they are used to show the distribution of a qualitative variable.
• Each bar in a bar chart shows you the frequency (or count) for the group it is associated with.
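To make the idea of "one bar per group, bar length equal to the count" concrete, here is a rough text-only sketch in Python. It reuses the blood type counts from earlier in these notes; drawing bars with `#` characters is just a convenience for this example, not how the charts in the notes were produced.

```python
# Donor counts from the blood type frequency distribution.
counts = {"A": 18, "AB": 18, "B": 30, "O": 11}

# One bar per group; the bar's length is the group's frequency.
bars = {label: "#" * n for label, n in counts.items()}

for label, bar in bars.items():
    print(f"{label:<3}{bar} {counts[label]}")
```

Bars of equal frequency come out exactly the same length, so ties such as "A" and "AB" are easy to spot.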
[Bar chart: Number Caught (axis from 0 to 50) by Species: Brown, Brook, Rainbow, Cutthroat, Lrg. Mth., Small Mth., Walleye, Salmon, Sunfish, Bluegill, Perch]
The chart above shows the frequency of catches for different species of fish.
Bar charts vs. pie charts
Here is the intro stats grade distribution bar chart from before, alongside a pie chart of the same data.
• Bar charts make comparing categories easier
• For example, it isn’t immediately obvious that the “A” slice is the same size as the “AB” slice. But it is obvious that the “A” bar is the same height as the “AB” bar.
The Histogram
[Histogram of Stat311 Heights (Inches); vertical axis: Frequency]
• A histogram displays the distribution of a quantitative variable.
• The difference between a histogram and a bar chart is that bar charts are for qualitative data and histograms are for quantitative data.
• With bar charts, each bar represents a different distinct group. With histograms, each bar represents the number of observations which fall into an interval, also known as a bin.
The Histogram
• Note that the number of bins on a histogram is arbitrary. The wider each bin is, the fewer bins there will be.
• Changing the number of bins can produce different looking histograms, even if the underlying data is exactly the same.
• The following four histograms represent the exact same data:
[Four histograms of the same data, each drawn with a different number of bins; horizontal axis from 0 to 60, vertical axis: Frequency]
Note: If an observation falls directly on a bin endpoint, it is typical to place that observation in the bar to the left of its value. But this is not a hard and fast rule.
Histogram example
Let’s briefly construct two different histograms that represent the same simple dataset below:
Heights of 10 randomly selected statistics students
65 67 66 69 69 66 64 64 63 72
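One way to work through this example is sketched below in Python. The two binnings (width 2 starting at 62, and width 5 starting at 60) are assumptions chosen for illustration; the notes do not specify which bins to use. The helper `bin_counts` is hypothetical and follows the convention mentioned earlier of placing a value that falls on a bin endpoint in the bar to its left.

```python
import math

# Heights of the 10 students, from the notes.
heights = [65, 67, 66, 69, 69, 66, 64, 64, 63, 72]

def bin_counts(values, start, width, n_bins):
    """Count values in bins (start + i*width, start + (i+1)*width].

    Each bin includes its right endpoint, so a value sitting exactly on
    an endpoint is counted in the bar to its left.
    """
    counts = [0] * n_bins
    for v in values:
        i = math.ceil((v - start) / width) - 1
        if not 0 <= i < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        counts[i] += 1
    return counts

def show(values, start, width, n_bins):
    """Print a rough text histogram, one line per bin."""
    for i, n in enumerate(bin_counts(values, start, width, n_bins)):
        lo = start + i * width
        print(f"({lo}, {lo + width}]  {'#' * n}")
    print()

narrow = bin_counts(heights, start=62, width=2, n_bins=5)
wide = bin_counts(heights, start=60, width=5, n_bins=3)

show(heights, 62, 2, 5)  # five bins of width 2
show(heights, 60, 5, 3)  # three bins of width 5
```

Both outputs describe the same ten heights, yet the narrow bins show more detail while the wide bins give a smoother outline.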
Distribution Shape
• Looking at a histogram allows us to discern a distribution’s shape.
• When there are lots of low values and just a few high values the distribution is said to be skewed to the right, or positively skewed
Distribution Shape
• When there are lots of high values and just a few low values the distribution is said to be skewed to the left, or negatively skewed
• The skewness of this histogram is less dramatic than that of the previous histogram.
Distribution Shape
• When the two halves of the histogram look approximately like mirror images the distribution is said to be (almost) symmetrical.
• We say “almost” symmetrical because it is unlikely that a histogram of data will be perfectly symmetrical.
Distribution Shape
• When there are two peaks in a histogram, we say that the data is bimodal.
• The mode is the most common value in a distribution.
• Bimodality may indicate that there are two distinct groups being combined into one dataset.
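As a quick illustration of the definition of the mode, the Python sketch below finds the most common value in the blood type data from earlier in these notes. Note that "bimodal" as used for histograms refers more loosely to two peaks, which need not be two exactly tied values. The list `donors` is rebuilt from the donor counts for convenience.

```python
from statistics import mode

# Donor counts from the blood type frequency distribution.
counts = {"A": 18, "AB": 18, "B": 30, "O": 11}

# Rebuild one entry per donor from the counts.
donors = [blood_type for blood_type, n in counts.items() for _ in range(n)]

# The mode is the value that occurs most often.
most_common = mode(donors)
print("Mode:", most_common)
```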
[Histogram of Heights (Inches); vertical axis: Frequency]
Boxplots
• Like histograms, boxplots are used to display the distribution of a quantitative variable.
• The shape of the distribution as well as the presence of any possible outliers is easily discerned from the boxplot. These outliers are drawn as dots.
• Boxplots are also useful for comparing multiple groups of data side by side.
Boxplot graphics
• The boxplot is sometimes called the box and whiskers plot.
• In a boxplot, half the data lies above the thick black line and half lies below it.
• Also, half the data lies inside the box, and half lies outside
• The dots are outliers.
We will discuss boxplots in detail in the next set of notes
Graphic Summaries
• Graphs are tools that allow us to give meaning to a set of data. They should give the reader a better understanding of what is going on than can be achieved by just looking at the raw data.
• “A picture is worth a thousand words.” In the case of statistics, a picture is also worth a whole lot of numbers.
• In the next set of notes, we will discuss some common statistics that can be used to summarize a set of data.