
Chapter 1 Section 1


Page 1: Chapter 1 Section 1
Page 2: Chapter 1 Section 1

Chapter 1 Section 1

Introduction to the Practice of Statistics

Page 3: Chapter 1 Section 1

Chapter 1 – Section 1

• The science of statistics is
– Collecting
– Organizing
– Summarizing
– Analyzing
information to draw conclusions or answer questions

Page 4: Chapter 1 Section 1

Chapter 1 – Section 1

• Organize and summarize the information
– Descriptive statistics (chapters 2 through 4)

• Draw conclusions or generalizations from the information
– Inferential statistics (chapters 9 through 11)

Page 5: Chapter 1 Section 1

Chapter 1 – Section 1

• A population

- Is the group to be studied

- Includes all of the individuals in the group

• A sample
– Is a subset of the population
– Is often used in analyses because getting access to the entire population is impractical
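The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 ID numbers, the sample size of 50, and the choice to draw the subset at random are all assumptions made for the example, not part of the slides.

```python
import random

# Hypothetical population: ID numbers for 1,000 individuals (made up for illustration).
population = list(range(1, 1001))

# Draw 50 individuals at random, without replacement.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 50)

# Every sampled individual belongs to the population, so the sample is a subset.
print(len(sample), set(sample) <= set(population))
```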

Page 6: Chapter 1 Section 1

Chapter 1 – Section 1

• Characteristics of the individuals under study are called variables
– Some variables have values that are attributes or characteristics … those are called qualitative or categorical variables
– Some variables have values that are numeric measurements … those are called quantitative variables

• The suggested approaches to analyzing problems vary by the type of variable

Page 7: Chapter 1 Section 1

Chapter 1 – Section 1

• Examples of qualitative variables
– Gender
– Zip code
– Blood type
– States in the United States
– Brands of televisions

• Qualitative variables have category values … those values cannot be added, subtracted, etc.

Page 8: Chapter 1 Section 1

Chapter 1 – Section 1

• Examples of quantitative variables
– Temperature
– Height and weight
– Sales of a product
– Number of children in a family
– Points achieved playing a video game

• Quantitative variables have numeric values … those values can be added, subtracted, etc.

Page 9: Chapter 1 Section 1

Chapter 1 – Section 1

• Quantitative variables can be either discrete or continuous

• Discrete variables
– Variables that have a finite or a countable number of possibilities
– Frequently variables that are counts

• Continuous variables
– Variables that have an infinite but not countable number of possibilities
– Frequently variables that are measurements

Page 10: Chapter 1 Section 1

Chapter 1 – Section 1

• Examples of discrete variables
– The number of heads obtained in 5 coin flips
– The number of cars arriving at a McDonald’s between 12:00 and 1:00
– The number of students in class
– The number of points scored in a football game

• The possible values of discrete variables can be listed

Page 11: Chapter 1 Section 1

Chapter 1 – Section 1

• Examples of continuous variables
– The distance that a particular model car can drive on a full tank of gas
– Heights of college students

Page 12: Chapter 1 Section 1

Summary: Chapter 1 – Section 1

• The process of statistics is designed to collect and analyze data to reach conclusions

• Variables can be classified by their type of data
– Qualitative or categorical variables
– Discrete quantitative variables
– Continuous quantitative variables

Page 13: Chapter 1 Section 1

Chapter 2

Organizing and

Summarizing Data

Page 14: Chapter 1 Section 1

Chapter 2 Sections

• Sections in Chapter 2
– Organizing Qualitative Data
– Organizing Quantitative Data
– Graphical Misrepresentations of Data

Page 15: Chapter 1 Section 1

Chapter 2 Section 1

Organizing Qualitative Data

Page 16: Chapter 1 Section 1

Chapter 2 – Section 1

• Qualitative data values can be organized by a frequency distribution

• A frequency distribution lists
– Each of the categories
– The frequency for each category

Page 17: Chapter 1 Section 1

Chapter 2 – Section 1

• A simple data set is

blue, blue, green, red, red, blue, red, blue

• A frequency table for this qualitative data is

Color   Frequency
Blue    4
Green   1
Red     3

• The most commonly occurring color is blue
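As a rough sketch, the frequency table for this color data can be reproduced with Python's standard `collections.Counter`. The variable names are chosen for illustration only; the slides themselves build the table by hand.

```python
from collections import Counter

# The simple data set from the slide.
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# Count how many times each category occurs.
frequency = Counter(colors)

# Print one row per category, in alphabetical order.
for color in sorted(frequency):
    print(f"{color.capitalize():<6} {frequency[color]}")

# The category with the highest frequency.
most_common_color, most_common_count = frequency.most_common(1)[0]
print("Most common:", most_common_color)
```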

Page 18: Chapter 1 Section 1

Chapter 2 – Section 1

• The relative frequencies are the proportions (or percents) of the observations out of the total

• A relative frequency distribution lists
– Each of the categories
– The relative frequency for each category

Page 19: Chapter 1 Section 1

Chapter 2 – Section 1

• A relative frequency table for this qualitative data is

Color   Relative Frequency
Blue    .500
Green   .125
Red     .375

• A relative frequency table can also be constructed with percents (50%, 12.5%, and 37.5% for the above table)
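The relative frequencies are each frequency divided by the total number of observations (8 for the color data). A minimal sketch, with the frequencies from the color example typed in directly:

```python
# Frequencies from the color example.
frequency = {"Blue": 4, "Green": 1, "Red": 3}

# Total number of observations.
total = sum(frequency.values())

# Relative frequency = frequency / total.
relative_frequency = {color: count / total for color, count in frequency.items()}

for color, proportion in relative_frequency.items():
    # Show both the proportion and the equivalent percent.
    print(f"{color:<6} {proportion:.3f}  ({proportion:.1%})")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.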

Page 20: Chapter 1 Section 1

Chapter 2 – Section 1

• Bar graphs for our simple data (using Excel)
– Frequency bar graph
– Relative frequency bar graph

Page 21: Chapter 1 Section 1

Chapter 2 – Section 1

• A Pareto chart is a particular type of bar graph

• A Pareto chart differs from a bar chart only in that the categories are arranged in order of decreasing frequency
– The category with the highest frequency is placed first (on the extreme left)
– The second highest category is placed second
– Etc.

• Pareto charts are often used when there are many categories but only the top few are of interest
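The ordering behind a Pareto chart amounts to sorting the categories by frequency, highest first. The sketch below does this for the color data and prints a crude text bar for each category; the slides use Excel for the actual chart, so the text rendering is only a stand-in.

```python
# Frequencies from the color example.
frequency = {"Blue": 4, "Green": 1, "Red": 3}

# Sort the categories from highest frequency to lowest.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Print the categories in Pareto order with one '#' per observation.
for color, count in pareto_order:
    print(f"{color:<6} {'#' * count}")
```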

Page 22: Chapter 1 Section 1

Chapter 2 – Section 1

• A Pareto chart for our simple data (using Excel)

Page 23: Chapter 1 Section 1

Chapter 2 – Section 1

• An example side-by-side bar graph comparing educational attainment in 1990 versus 2003

Page 24: Chapter 1 Section 1

Chapter 2 – Section 1

• An example of a pie chart

Page 25: Chapter 1 Section 1

Chapter 2 Section 2

Organizing Quantitative Data

Page 26: Chapter 1 Section 1

Chapter 2 – Section 2

• Consider the following data

• We would like to compute the frequencies and the relative frequencies

Page 27: Chapter 1 Section 1

Chapter 2 – Section 2

• The resulting frequencies and the relative frequencies

Page 28: Chapter 1 Section 1

Chapter 2 – Section 2

• Examples of histograms for discrete data
– Frequencies
– Relative frequencies

Page 29: Chapter 1 Section 1

Chapter 2 – Section 2

• Continuous data cannot be put directly into frequency tables since they do not have any obvious categories

• Categories are created using classes, or intervals of numbers

• The continuous data is then put into the classes

Page 30: Chapter 1 Section 1

Chapter 2 – Section 2

• For ages of adults, a possible set of classes is
    20 – 29
    30 – 39
    40 – 49
    50 – 59
    60 and older

• For the class 30 – 39
– 30 is the lower class limit
– 39 is the upper class limit

• The class width is the difference between consecutive lower class limits

• For the class 30 – 39, the class width is 40 – 30 = 10, where 40 is the lower class limit of the next class
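The 40 – 30 = 10 calculation subtracts the lower limit of one class from the lower limit of the next class. A short sketch of that arithmetic for the age classes, with the lower limits typed in by hand:

```python
# Lower class limits for the age classes 20-29, 30-39, 40-49, 50-59, 60 and older.
lower_limits = [20, 30, 40, 50, 60]

# Class width = lower limit of the next class minus lower limit of this class.
widths = [nxt - cur for cur, nxt in zip(lower_limits, lower_limits[1:])]
print(widths)

# Note: upper limit minus lower limit within one class gives 9, not the width.
upper_minus_lower = 39 - 30
print(upper_minus_lower)
```

The open-ended class "60 and older" has no width of its own, which is why the list has one fewer entry than there are classes.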

Page 31: Chapter 1 Section 1

Chapter 2 – Section 2

• All the classes have the same widths, except for the last class

• The class “60 and older” is an open-ended class because it has no upper limit

• Classes with no lower limits are also called open-ended classes

Page 32: Chapter 1 Section 1

Chapter 2 – Section 2

• The classes and the number of values in each can be put into a frequency table

• In this table, there are 1147 subjects between 30 and 39 years old

Age            Number (frequency)
20 – 29        533
30 – 39        1147
40 – 49        1090
50 – 59        493
60 and older   110

Page 33: Chapter 1 Section 1

Chapter 2 – Section 2

• Good practices for constructing tables for continuous variables
– The classes should not overlap
– The classes should not have any gaps between them
– The classes should have the same width (except for possible open-ended classes at the extreme low or extreme high ends)
– The class boundaries should be “reasonable” numbers
– The class width should be a “reasonable” number
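One way to see these practices in action is a small function that assigns each age to exactly one class. This is a sketch only: the function name `age_class`, the plain-hyphen labels, and the list of ages are made up for illustration and are not the data behind the table in the slides.

```python
from collections import Counter

def age_class(age):
    """Return the class label for an adult age (20 or older)."""
    if age < 20:
        raise ValueError("ages below 20 are outside these classes")
    if age >= 60:
        return "60 and older"  # open-ended class at the high end
    # Round down to the nearest multiple of 10 to get the lower class limit.
    lower = int(age // 10) * 10
    return f"{lower}-{lower + 9}"

# Hypothetical ages, for illustration only.
ages = [23, 35, 38, 41, 47, 52, 64, 71, 30, 29]

# Each age lands in exactly one class: no overlaps and no gaps.
table = Counter(age_class(age) for age in ages)
for label in ["20-29", "30-39", "40-49", "50-59", "60 and older"]:
    print(f"{label:<13} {table[label]}")
```

Because the lower limit is found by rounding down, an age such as 29.5 still falls in the 20-29 class rather than into a gap between 29 and 30.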

Page 34: Chapter 1 Section 1

Chapter 2 – Section 2

• Just as for discrete data, a histogram can be created from the frequency table

• Instead of individual data values, the categories are the classes – the intervals of data

Page 35: Chapter 1 Section 1

Chapter 2 – Section 2

• A stem-and-leaf plot is a different way to represent data that is similar to a histogram

• To draw a stem-and-leaf plot, each data value must be broken up into two components
– The stem consists of all the digits except for the rightmost one
– The leaf consists of the rightmost digit
– For the number 173, for example, the stem would be “17” and the leaf would be “3”

Page 36: Chapter 1 Section 1

Chapter 2 – Section 2

• In the stem-and-leaf plot below
– The smallest value is 56
– The largest value is 180
– The second largest value is 178

Page 37: Chapter 1 Section 1

Chapter 2 – Section 2

• To draw a stem-and-leaf plot
– Write all the values in ascending order
– Find the stems and write them vertically in ascending order
– For each data value, write its leaf in the row next to its stem
– The resulting leaves will also be in ascending order

• The list of stems with their corresponding leaves is the stem-and-leaf plot
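The steps above can be sketched in Python for whole-number data. The function name `stem_and_leaf` and the seven data values are hypothetical, chosen only to illustrate the procedure; they are not the data from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its list of leaves, both in ascending order.

    Assumes nonnegative whole numbers: the leaf is the rightmost digit
    and the stem is everything to its left.
    """
    plot = defaultdict(list)
    for value in sorted(values):          # ascending order first
        stem, leaf = divmod(value, 10)    # e.g. 173 -> stem 17, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data, for illustration only.
data = [173, 156, 180, 178, 161, 165, 172]
plot = stem_and_leaf(data)

# Write the stems vertically, each followed by its leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Sorting the values before splitting them is what guarantees that the leaves in each row come out in ascending order.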

Page 38: Chapter 1 Section 1

Chapter 2 – Section 2

• Modifications to stem-and-leaf plots
– Sometimes there are too many values with the same stem … we would need to split the stems (such as having 10-14 in one stem and 15-19 in another)
– If we wanted to compare two sets of data, we could draw two stem-and-leaf plots using the same stem, with leaves going left (for one set of data) and right (for the other set)

Page 39: Chapter 1 Section 1

Chapter 2 – Section 2

• A dot plot is a graph where a dot is placed over the observation each time it is observed

• The following is an example of a dot plot
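Since the dot plot image is not reproduced here, a text version can stand in for it. The sketch below stacks one `*` over a value each time that value is observed; the observations are hypothetical single-digit counts chosen for illustration.

```python
from collections import Counter

# Hypothetical observations (single-digit values), for illustration only.
observations = [2, 3, 3, 1, 2, 3, 0, 2, 3, 5]

counts = Counter(observations)
values = range(min(observations), max(observations) + 1)
height = max(counts.values())

# Build the plot from the top row down: a value gets a dot on a row
# if it was observed at least that many times.
rows = []
for level in range(height, 0, -1):
    rows.append(" ".join("*" if counts[v] >= level else " " for v in values))

# The bottom row is the axis showing the possible values.
rows.append(" ".join(str(v) for v in values))
print("\n".join(rows))
```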

Page 40: Chapter 1 Section 1

Chapter 2 – Section 2

• A useful way to describe a variable is by the shape of its distribution

• Some common distribution shapes are
– Uniform
– Bell-shaped (or normal)
– Skewed right
– Skewed left

Page 41: Chapter 1 Section 1

Chapter 2 – Section 2

• A variable has a uniform distribution when
– Each of the values tends to occur with the same frequency
– The histogram looks flat

Page 42: Chapter 1 Section 1

Chapter 2 – Section 2

• A variable has a bell-shaped distribution when
– Most of the values fall in the middle
– The frequencies tail off to the left and to the right
– It is symmetric

Page 43: Chapter 1 Section 1

Chapter 2 – Section 2

• A variable has a skewed right distribution when
– The distribution is not symmetric
– The tail to the right is longer than the tail to the left
– The arrow from the middle to the long tail points right

Page 44: Chapter 1 Section 1

Chapter 2 – Section 2

• A variable has a skewed left distribution when
– The distribution is not symmetric
– The tail to the left is longer than the tail to the right
– The arrow from the middle to the long tail points left

Page 45: Chapter 1 Section 1

Summary: Chapter 2 – Section 2

• Quantitative data can be organized in several ways
– Histograms based on data values are good for discrete data
– Histograms based on classes (intervals) are good for continuous data
– The shape of a distribution describes a variable … histograms are useful for identifying the shapes

Page 46: Chapter 1 Section 1

Chapter 2 Section 3

Graphical Misrepresentations of Data

Page 47: Chapter 1 Section 1

Chapter 2 – Section 3

• The two graphs show the same data … the difference seems larger for the graph on the left

• The vertical scale of the graph on the left is truncated (it does not start at zero), which exaggerates the difference
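The effect of a truncated vertical scale can be put in numbers. In the sketch below, the two data values (50 and 55) and the truncated baseline of 45 are hypothetical, since the graphs themselves are not reproduced here; `apparent_ratio` is a helper made up for this illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Hypothetical values: the second is 10% larger than the first.
a, b = 50.0, 55.0

# Axis starting at zero: the second bar is drawn 1.1 times as tall.
honest = apparent_ratio(a, b)

# Axis truncated to start at 45: the second bar is drawn twice as tall.
truncated = apparent_ratio(a, b, baseline=45.0)

print(honest, truncated)
```

The data are identical in both cases; only the starting point of the axis changes how large the difference looks.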

Page 48: Chapter 1 Section 1

Chapter 2 – Section 3

• The gazebo on the right is twice as large in each dimension as the one on the left

• However, its area is four times as large, so it looks much more than twice as big as the one on the left

[Figure: the original gazebo (left) and the gazebo drawn “twice” as large (right)]
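The reason the larger picture looks much more than twice as big is that doubling both the width and the height multiplies the area by four. A quick check, using a hypothetical picture 3 units wide and 2 units tall:

```python
# Hypothetical picture dimensions, for illustration only.
width, height = 3, 2
original_area = width * height

# Double each dimension, as in the "twice as large" picture.
scale = 2
scaled_area = (scale * width) * (scale * height)

# The area grows by the square of the scale factor.
area_ratio = scaled_area / original_area
print(original_area, scaled_area, area_ratio)
```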

Page 49: Chapter 1 Section 1

Summary: Chapter 2 – Section 1

• Qualitative data can be organized in several ways
– Tables are useful for listing the data, its frequencies, and its relative frequencies
– Charts such as bar graphs, Pareto charts, and pie charts are useful visual methods for organizing data
– Side-by-side bar graphs are useful for comparing two sets of qualitative data