BCOR 1020 Business Statistics
Lecture 3 – January 24, 2008
Overview
• Chapter 3 – Describing Data Visually
– Visual Description
– Dot Plots
– Frequency Distribution and Histograms
– Simple Line Charts & Bar Charts
– Scatter Plots
– Tables
– Pie Charts
– Maps and Pictograms
– Deceptive Graphs
Chapter 3 – Visual Description
Methods of organizing, exploring and summarizing data include:
• Visual (charts and graphs) – provides insight into characteristics of a data set without using mathematics.
• Numerical (statistics or tables) – provides insight into characteristics of a data set using mathematics.
Chapter 3 – Visual Description
Beginning with univariate data (a set of n observations on one variable), consider the following:
• Measurement – What are the units of measurement? Are the data integer or continuous? Any missing observations? Any concerns with accuracy or sampling methods?
• Central Tendency – Where are the data values concentrated? What seem to be typical or middle data values?
• Dispersion – How much variation is there in the data? How spread out are the data values? Are there unusual values?
• Shape – Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Chapter 3 – Visual Description
• Example: Price/Earnings Ratios:
• P/E ratios are current stock price divided by earnings per share in the last 12 months. For example:
Chapter 3 – Visual Description
Measurement – Look at the data and visualize how it was collected and measured.
Sorting – Sort the data and then summarize in a graphical display.
– Here are the sorted P/E ratios:
 8 10 10 10 13 13 14 14 15 15
16 16 17 18 19 19 20 20 21 22
23 26 26 27 29 29 34 48 55 68
– Sorting allows you to observe central tendency, dispersion and shape as well as minimum, maximum and range.
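As a minimal sketch of the sorting step, the snippet below sorts the 30 P/E ratios and reads off the minimum, maximum and range. The unsorted input order is an assumption made only for illustration; the values themselves are the ones listed on the slide.

```python
# The 30 P/E ratios from the slide, deliberately shuffled (the original
# collection order is not given, so this ordering is hypothetical).
pe_ratios = [20, 10, 48, 13, 29, 15, 8, 16, 26, 19,
             14, 23, 10, 55, 17, 13, 21, 27, 15, 68,
             10, 14, 34, 16, 18, 19, 20, 22, 26, 29]

sorted_pe = sorted(pe_ratios)      # ascending order
smallest = sorted_pe[0]            # minimum
largest = sorted_pe[-1]            # maximum
data_range = largest - smallest    # range = max - min

print(sorted_pe)
print("min:", smallest, "max:", largest, "range:", data_range)
```

Once the values are sorted, the extremes sit at the two ends of the list and the cluster of values between 10 and 30 is easy to see.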
Chapter 3 – Dot Plots
A dot plot is the simplest graphical display of n individual values of numerical data.
– Easy to understand.
– Not good for large samples (e.g., > 5,000).
Steps in Making a Dot Plot:
1. Make a scale that covers the data range
2. Mark the axes and label them
3. Plot each data value as a dot above the scale at its approximate location. If more than one data value lies at about the same axis location, the dots are piled up vertically.
* Figure 3.4 in your text details the MegaStat menus for creating a dotplot.
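The three steps can be sketched in plain text. This is not the MegaStat procedure; it is an illustrative stand-in in which the function name `text_dot_plot` and the one-character-per-unit scale are assumptions.

```python
from collections import Counter

pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

def text_dot_plot(values):
    """Return the lines of a text dot plot (one column per integer value)."""
    counts = Counter(values)
    low, high = min(values), max(values)      # step 1: scale covers the data range
    tallest = max(counts.values())
    lines = []
    # Step 3: pile repeated values vertically, drawing from the top row down.
    for level in range(tallest, 0, -1):
        row = "".join("*" if counts[x] >= level else " "
                      for x in range(low, high + 1))
        lines.append(row.rstrip())
    # Step 2: draw the axis and label its end points.
    lines.append("-" * (high - low + 1))
    gap = (high - low + 1) - len(str(low)) - len(str(high))
    lines.append(f"{low}{' ' * gap}{high}")
    return lines

for line in text_dot_plot(pe_ratios):
    print(line)
```

The stacked dots near the left end show where the values cluster, while the isolated dots at 48, 55 and 68 stand out as unusual values.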
Chapter 3 – Dot Plots
• Range of data shows dispersion.
• Can add annotations (text boxes) to call attention to specific features.
• Clustering shows central tendency.
• Dot plots do not tell much of shape of distribution.
Chapter 3 – Frequency Distributions and Histograms
Bins and Bin Limits:
• A frequency distribution is a table formed by classifying n data values into k classes (bins).
• Bin limits define the values to be included in each bin. Widths must all be the same.
• Frequencies are the number of observations within each bin.
• Express as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
Chapter 3 – Frequency Distributions and Histograms
Constructing a Frequency Distribution:
1. Sort data in ascending order (e.g., P/E ratios).
2. Choose the number of bins (k).
– k should be much smaller than n.
– Too many bins result in sparsely populated bins; too few, and dissimilar data values are lumped together.
– Herbert Sturges proposed the following rule: k = 1 + log2(n)
Sample Size (n)    Number of Bins (k)
16                 5
32                 6
64                 7
128                8
256                9
512                10
1024               11
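A short sketch of Sturges' rule reproduces the table above and applies the rule to the n = 30 P/E ratios. The function name `sturges_bins` is hypothetical, and rounding to the nearest whole bin is an assumption for sample sizes that are not powers of 2.

```python
import math

def sturges_bins(n):
    """Sturges' rule: k = 1 + log2(n). Returns the unrounded value."""
    return 1 + math.log2(n)

for n in [16, 32, 64, 128, 256, 512, 1024]:
    print(n, round(sturges_bins(n)))

# For the n = 30 P/E ratios the rule suggests roughly 6 bins.
print(30, round(sturges_bins(30), 2))
```

The rule is a starting point, not a requirement: a nearby value of k can be chosen when it gives more convenient bin limits.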
Chapter 3 – Frequency Distributions and Histograms
Constructing a Frequency Distribution:
3. Set the bin limits: $\text{Bin width} \approx \dfrac{X_{\max} - X_{\min}}{k}$
In our example, we will use k = 7 bins to get convenient bin limits. The approximate bin width is:
$\text{Bin width} \approx \dfrac{68 - 8}{7} = \dfrac{60}{7} \approx 8.57$
To obtain “nice” limits, we round the width to 10 and start the first bin at 0 to get bin limits:
0, 10, 20, 30, 40, 50, 60, 70
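The arithmetic can be checked with a few lines. The variable names are illustrative; the rounded width of 10 and the starting point of 0 are the choices stated on the slide.

```python
x_min, x_max, k = 8, 68, 7

raw_width = (x_max - x_min) / k    # 60 / 7, about 8.57
nice_width = 10                    # rounded up to a convenient width
start = 0                          # first bin starts at 0

# k bins need k + 1 limits.
bin_limits = [start + i * nice_width for i in range(k + 1)]

print(round(raw_width, 2))
print(bin_limits)
```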
Chapter 3 – Frequency Distributions and Histograms
Constructing a Frequency Distribution:
4. Put the data values in the appropriate bin.
– In general, the lower limit is included in the bin while the upper limit is excluded.
5. Create the table. You can include:
– Frequencies – counts for each bin.
– Relative frequencies – absolute frequency divided by total number of data values.
– Cumulative frequencies – accumulated relative frequency values as bin limits increase.
Example: Back to the P/E ratio data…
Chapter 3 – Frequency Distributions and Histograms
What are the bin limits for the P/E ratio data?
Bin Range              Frequency   Relative Frequency   Cumulative Relative Frequency
0 ≤ P/E Ratio < 10     1           0.0333               0.0333
10 ≤ P/E Ratio < 20    15          0.5000               0.5333
20 ≤ P/E Ratio < 30    10          0.3333               0.8667
30 ≤ P/E Ratio < 40    1           0.0333               0.9000
40 ≤ P/E Ratio < 50    1           0.0333               0.9333
50 ≤ P/E Ratio < 60    1           0.0333               0.9667
60 ≤ P/E Ratio < 70    1           0.0333               1.0000
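The frequency table can be rebuilt from the sorted data. In this sketch each bin includes its lower limit and excludes its upper limit, as described in step 4. Accumulating the counts, rather than the rounded relative frequencies, is a choice made here so that the last cumulative value comes out at exactly 1.

```python
pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]
bin_limits = [0, 10, 20, 30, 40, 50, 60, 70]

n = len(pe_ratios)
rows = []
cumulative_count = 0
for lower, upper in zip(bin_limits[:-1], bin_limits[1:]):
    # Lower limit included, upper limit excluded.
    freq = sum(1 for x in pe_ratios if lower <= x < upper)
    cumulative_count += freq
    rows.append((lower, upper, freq, freq / n, cumulative_count / n))

for lower, upper, freq, rel, cum in rows:
    print(f"{lower:>2} <= P/E < {upper:<2}  {freq:>2}  {rel:.4f}  {cum:.4f}")
```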
Chapter 3 – Frequency Distributions and Histograms
Histograms:
• A histogram is a graphical representation of a frequency distribution.
• A histogram is a bar chart.
– X-axis ticks show end points of each bin.
– Y-axis shows frequency (or relative/cumulative frequency) within each bin.
Consider 3 histograms for the P/E ratio data with different bin widths. Do they give you different impressions of the data?
[Figure: three histograms of the P/E ratio data, with k = 4, k = 7 and k = 13 bins]
* Figures 3.8 & 3.9 in your text detail the MegaStat menus for creating a histogram.
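To see how the number of bins changes the picture, here is a rough text version of the three histograms. The bin limits used for k = 4 (width 20 starting at 0) and k = 13 (width 5 starting at 5) are assumptions, since the slide does not state them; k = 7 uses the limits 0 to 70 from the worked example. The helper `histogram_counts` is hypothetical.

```python
pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

def histogram_counts(values, start, width, k):
    """Count values in k equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < k:
            counts[index] += 1
    return counts

# (k, start, width): the k = 4 and k = 13 limits are assumed, not from the slide.
for k, start, width in [(4, 0, 20), (7, 0, 10), (13, 5, 5)]:
    print(f"k = {k}")
    for i, count in enumerate(histogram_counts(pe_ratios, start, width, k)):
        lower = start + i * width
        print(f"  {lower:>2}-{lower + width:<2} {'#' * count}")
```

With few bins the long right tail is compressed into one or two bars; with many bins, empty bins appear between the largest values.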
Chapter 3 – Frequency Distributions and Histograms
Modal Class – a histogram bar that is higher than those on either side:
• Monomodal – a single modal class.
• Bimodal – two modal classes.
• Multimodal – more than two modal classes.
Caution: Modal classes may be artifacts of the way bin limits are chosen.
Chapter 3 – Frequency Distributions and Histograms
Shape:
• A histogram suggests the shape of the population.
• Skewness – indicated by the direction of the longer tail of the histogram.
– Left-skewed – (negatively skewed) a longer left tail.
– Right-skewed – (positively skewed) a longer right tail.
– Symmetric – both tail areas approximately the same.
Some examples…
Chapter 3 – Frequency Distributions and Histograms
Clickers
Consider the histogram of the P/E ratio data that was displayed earlier in this lecture. How would you describe the skewness of this histogram?
A = symmetric
B = left-skewed
C = right-skewed
Chapter 3 – Simple Line Charts
Simple Line Charts – Used to display a time series or spot trends, or to compare time periods.
• Can display several variables at once.
Chapter 3 – Simple Line Charts
Two-scale line chart – used to compare variables that differ in magnitude or are measured in different units.
Grid Lines – A line graph usually has no vertical grid lines. Horizontal lines can be added to make it easier to establish the y value.
Which is easier to read?
Chapter 3 – Simple Line Charts
Log Scales:
• Arithmetic scale – distances on the Y-axis are proportional to the magnitude of the variable being displayed.
• Logarithmic scale – (ratio scale) equal distances represent equal ratios.
– Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude. This will reveal more detail for small data values.
– Log scale is only suited for positive data values.
– Reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
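A small numeric sketch shows why constant percent growth plots as a straight line on a log scale. The series below is hypothetical (a starting value of 100 growing 10% per period) and is not taken from the trade data in the lecture.

```python
import math

# Hypothetical series: starts at 100 and grows 10% per period.
values = [100 * 1.10 ** t for t in range(6)]

# On an arithmetic scale the vertical steps keep getting larger...
arithmetic_steps = [b - a for a, b in zip(values, values[1:])]

# ...but on a log scale every step is the same size, log10(1.10),
# so the plotted points fall on a straight line.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print([round(s, 2) for s in arithmetic_steps])
print([round(s, 4) for s in log_steps])
```

If the log steps grew over time the curve would bend upward (increasing percent growth); if they shrank it would bend downward.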
Example…
Consider the following graphs illustrating U.S. Trade from 1959 to 2002. What does the log scale graph tell you about growth rate for both series?
[Figure: U.S. trade, 1959 to 2002, shown on an arithmetic scale and on a log scale]
Chapter 3 – Simple Line Charts
When to Use Log Scales:
• Useful for…
• time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, national debt, future income)
• financial charts that cover long periods of time, or data that grow rapidly (e.g., revenues)
Chapter 3 – Simple Line Charts
Tips for Effective Line Charts:
1. Line charts are used for time series data (never for cross-sectional data).
2. Y-axis shows numerical variable while X-axis shows time units with time increasing left to right.
3. Use a zero origin on the Y-axis unless more detail is needed.
4. Omit numerical labels on a line chart to avoid clutter. Use gridlines if needed.
5. Use data markers (squares, triangles, circles) if they don’t clutter the graph.
6. Don’t make lines too thick.
Chapter 3 – Bar Charts
Plain Bar Charts – Most common way to display attribute data.
• Bars represent categories or attributes.
• Lengths of bars represent frequencies.
Chapter 3 – Bar Charts
Pareto Charts – Special type of bar chart used in quality management to display the frequency of defects or errors of different types.
• Categories are displayed in descending order of frequency.
• Focus on significant few (i.e., few categories that account for most defects or errors).
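The ordering behind a Pareto chart can be sketched as follows. The defect categories and counts are invented for illustration only; they do not come from the lecture.

```python
# Hypothetical defect counts (illustrative data, not from the lecture).
defects = {"Scratches": 12, "Wrong label": 45, "Dents": 8,
           "Missing part": 30, "Other": 5}

# Pareto ordering: categories in descending order of frequency.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

total = sum(defects.values())
running = 0
pareto_rows = []
for category, count in ordered:
    running += count
    pareto_rows.append((category, count, 100 * running / total))

for category, count, cumulative_pct in pareto_rows:
    print(f"{category:<13} {count:>3}  {cumulative_pct:5.1f}%")
```

In this made-up example the first two categories account for 75% of all defects, which is the "significant few" pattern a Pareto chart is meant to expose.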
Chapter 3 – Bar Charts
Stacked Bar Chart – Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.
Chapter 3 – Bar Charts
Bar Charts for Time Series Data – Bar charts can be (and often are) used for time series data although it may be harder to compare trends.
Chapter 3 – Bar Charts
Tips for Effective Bar Charts:
1. Show the numerical variable of interest with vertical bars on the Y-axis, category labels on the X-axis.
2. For time series quantities, display the category labels on the horizontal X-axis with time increasing from left to right.
3. The height or length of each bar should be proportional to the quantity displayed.
4. Put numerical values at the top of each bar, except if too cluttered.
Chapter 3 – Scatter Plots
Example: Aircraft Fuel Consumption:
• Consider five observations on flight time and fuel consumption for a twin-engine Piper Cheyenne aircraft.
Trip Leg   Flight Time (hours)   Fuel Used (pounds)
1          2.3                   145
2          4.2                   258
3          3.6                   219
4          4.7                   276
5          4.9                   283
• A causal relationship is assumed since a longer flight would consume more fuel.
Chapter 3 – Scatter Plots
• Example: Aircraft Fuel Consumption:
• Here is the scatter plot with flight time on the X-axis and fuel use on the Y-axis.
• Is there an association between variables?
* Figure 3.31 in your text details the Excel menus for creating a scatter plot.
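One way to put a number on the association is the sample correlation coefficient. The lecture only shows the scatter plot, so computing Pearson's r here is an added illustration, and the function name `pearson_r` is hypothetical.

```python
import math

flight_hours = [2.3, 4.2, 3.6, 4.7, 4.9]
fuel_pounds = [145, 258, 219, 276, 283]

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(flight_hours, fuel_pounds)
print(round(r, 3))
```

A value of r close to +1 corresponds to points that lie close to an upward-sloping straight line, which is what the five trip legs show.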
Chapter 3 – Scatter Plots
Degree of Association/Correlation:
[Figure: four scatter plots showing very strong association, strong association, moderate association, and little or no association]
Clickers
Consider the scatter plot (below) comparing birthrates and life expectancies in several countries.
True or False: This graph shows a strong association between these two variables.
A = True
B = False
Chapter 3 – Tables
Tables are the simplest form of data display. A compound table is a table that contains time series data down the columns and variables across the rows.
Example: School Expenditures
• Arrangement of data is in rows and columns to enhance meaning.
• The data can be viewed by focusing on the time pattern (down the columns) or by comparing the variables (across the rows).
• Units of measure are stated in the footnote.
• Note merged headings to group columns.
                     Elementary and Secondary     Colleges and Universities
Year   All Schools   Total    Public   Private    Total    Public   Private
1960   142.2         99.6     93.0     6.6        42.6     23.3     19.3
1970   317.3         200.2    188.6    11.6       117.2    75.2     41.9
1980   373.6         232.7    216.4    16.2       140.9    93.4     47.4
1990   526.1         318.5    293.4    25.1       207.6    132.9    74.7
2000   691.9         418.2    387.8    30.3       273.8    168.8    105.0
Source: U.S. Census Bureau, Statistical Abstract of the United States: 2002, p. 133. Note: All figures are in billions of constant 2000/2001 dollars.
Chapter 3 – Tables
Tips for Effective Tables:
1. Keep the table simple, consistent with its purpose.
– Summary tables go in the main body.
– Detailed tables go in an appendix.
– In a slide show, main point of table should be clear within 10 seconds; otherwise, break up table.
2. Display the data to be compared in columns.
3. Round off data to 3 or 4 significant figures.
4. Table layout should guide the eye towards the desired comparison.
– Use spaces or shading to separate rows or columns.
– Use lines sparingly.
5. Keep row and column headings simple yet descriptive.
6. Use a consistent number of decimal digits within a column.
– Right-justify or decimal align the data.
Chapter 3 – Pie Charts
An Oft-Abused Chart:
• A pie chart can only convey a general idea of the data.
• Pie charts should be used to portray data which sum to a total (e.g., percent market shares).
– If frequency counts are important, use a bar chart or histogram.
• A pie chart should only have a few (i.e., 2 or 3) slices.
• Each slice should be labeled with data values or percents.
Chapter 3 – Maps and Pictograms
Spatial Variation and GIS:
• Maps can be used for displaying many kinds of data.
– Appropriate when patterns of variation across space are of interest.
– Self-explanatory and revealing.
– Assess patterns based on geography.
• GIS (geographic information systems) combines statistics, geography and graphics.
Chapter 3 – Maps and Pictograms
Example: U.S. population change by county, 1990/2000
Chapter 3 – Maps and Pictograms
Example: U.S. presidential election results, 2004
On election night 2004 and in the months and years since then, we have seen many maps that look like this.
The amount of red on the map is skewed because there are a lot of large states (geographically) in which a majority voted Republican.
One possible way to allow for this, suggested by Robert Vanderbei at Princeton University, is to use not just two colors on the map, red and blue, but instead to use red, blue, and shades of purple to indicate percentages of voters. Here is what the normal map looks like if you do this.
Source: http://www-personal.umich.edu/~mejn/election/
Chapter 3 – Maps and Pictograms
Example: U.S. presidential election results, 2004
We can also correct for this by making use of a cartogram, a map in which the sizes of states have been rescaled according to their population. That is, states are drawn with a size proportional not to their sheer topographic acreage -- which has little to do with politics -- but to the number of their inhabitants, states with more people appearing larger than states with fewer, regardless of their actual area on the ground.
Source: http://www-personal.umich.edu/~mejn/election/
Chapter 3 – Maps and Pictograms
Pictograms – A visual display in which data values are replaced by pictures.
• Although entertaining, they can create visual distortion. What do you think?
Chapter 3 – Deceptive Graphs
Error 1: Nonzero Origin
• A nonzero origin will exaggerate the trend.
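The size of the exaggeration can be worked out directly. In this sketch the two data values (50 and 55) and the truncated origin (45) are made-up numbers, and `drawn_height_ratio` is a hypothetical helper that compares how tall two bars are drawn for a given axis origin.

```python
def drawn_height_ratio(first, second, origin):
    """Ratio of the drawn bar heights when the Y-axis starts at `origin`."""
    return (second - origin) / (first - origin)

# Hypothetical values: the second value is 10% larger than the first.
first_value, second_value = 50, 55

with_zero_origin = drawn_height_ratio(first_value, second_value, 0)
with_nonzero_origin = drawn_height_ratio(first_value, second_value, 45)

print(with_zero_origin)      # bars differ by the true 10%
print(with_nonzero_origin)   # second bar is drawn twice as tall
```

Starting the axis at 45 makes a 10% difference look like a doubling, which is why a zero origin is the safer default.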
Chapter 3 – Deceptive Graphs
Error 2: Elastic Graph Proportions
• Keep the aspect ratio (width/height) below 2.00 so as not to exaggerate the graph. By default, Excel uses an aspect ratio of 1.8.
Chapter 3 – Deceptive Graphs
Error 3: Dramatic Title
• Keep titles short and grab readers' attention.
Error 4: Distracting Pictures
• Avoid them so as not to distract readers or impart an emotional slant.
Error 5: Authority Figures
• Can use pictures of authority figures to impart credibility to self-serving commercial claims.
Chapter 3 – Deceptive Graphs
Error 6: 3-D and Rotated Graphs
• Can make trends appear to dwindle into the distance or loom towards you.
[Figure: a deceptive 3-D graph next to the correct 2-D version]
Chapter 3 – Deceptive Graphs
Error 7: Missing Axis Demarcations
• If tick marks are missing, you cannot identify individual data values.
Error 8: Missing Measurement Units or Definitions
• Missing or unclear units of measurement can render a chart useless.
Error 9: Vague Source
• May indicate lost citation, unknown source, or mixed data sources. Use complete source citations.
Chapter 3 – Deceptive Graphs
Error 10: Complex Graphs
• Avoid if possible. Keep your main objective in mind. If necessary, break graph into smaller parts.
Chapter 3 – Deceptive Graphs
Error 11: Gratuitous Effects
• Avoid too many annoying special effects when using slide shows.
Error 12: Estimated Data
• Estimated points should be noted when used or avoided if possible.
Chapter 3 – Deceptive Graphs
Error 13: Area Trick
• As figure height increases, so does width, distorting the area.
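The distortion is easy to quantify under the assumption that a figure is scaled uniformly, so its width grows in the same proportion as its height. The helper `area_ratio` is hypothetical.

```python
def area_ratio(height_ratio):
    """Area ratio of two figures when width is scaled along with height."""
    width_ratio = height_ratio          # uniform scaling (assumed)
    return height_ratio * width_ratio

# A value that is twice as large gets a picture twice as tall,
# but the picture covers four times the area.
print(area_ratio(2))
print(area_ratio(3))
```

Readers tend to judge pictures by area, so a doubling in the data can look like a fourfold increase.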
Clickers
Consider the graph given below. What error is present that makes this a deceptive graph?
A = Non-Zero Origin
B = Dramatic Title
C = 3-D or Rotated
D = Complex Graph
[Figure: "Avg. Annual Tuition at Colorado 4-Year Universities"; X-axis: Academic Year, 1997-1998 through 2006-2007; Y-axis: Avg. Tuition, $3,000 to $7,000]
Chapter 3 – Appendix: Effective Excel Charts
Chart Wizard:
• Use the mouse to select (highlight) the data you want to plot.
• Click on the Chart Wizard icon on the toolbar to open a sequence of pop-up menus to guide you through the steps of creating a chart.
• Step 1: Select the Chart type and then click Next.
Effective Excel Charts – Chart Wizard
• Step 2: Add labels for years on the X-axis by selecting a data range (B4:B13). Click Next.
Effective Excel Charts – Chart Wizard
• Step 3: Embellish the chart by adding a title, axis labels, adjusting the gridlines or appending a data table to the graph by clicking on the appropriate tab.
Effective Excel Charts – Chart Wizard
• Step 4: Click Next to display the finished chart.
Effective Excel Charts – Embellished Charts
• Charts created in Excel can be edited to:
- Improve the titles (main, X-axis, Y-axis).
- Change the axis scales (minimum, maximum, demarcations).
- Display the data values (on the top of each bar).
[Figure: bar chart "Fractional Shares, 1993-2002" with data labels 110, 158, 285, 548, 957, 1551, 2607, 3834, 4871 and 5827 for 1993 through 2002]
Effective Excel Charts – Embellished Charts
• Charts created in Excel can be edited to:
- Add a data table underneath the graph.
[Figure: chart "Fractional Shares, 1993-2002" with a data table underneath listing 110, 158, 285, 548, 957, 1551, 2607, 3834, 4871 and 5827 for 1993 through 2002]
Effective Excel Charts – Embellished Charts
• Charts created in Excel can be edited to:
- Change color or patterns in the plot or chart areas.
[Figure: chart "Fractional Shares, 1993-2002"]
Effective Excel Charts – Embellished Charts
• Charts created in Excel can be edited to:
- Format the decimals (on the axes or data labels).
- Edit the gridlines (color, dotted or solid, patterns).
[Figure: chart of Series1 for 1993 through 2002 with a Y-axis from 0 to 7000]
Effective Excel Charts – Embellished Charts
• Charts created in Excel can be edited to:
- Alter the appearance of the bars (color, pattern, gap width).
[Figure: bar chart "Fractional Shares, 1993-2002"]
Effective Excel Charts – Embellished Charts
• To alter a chart’s appearance, click on any chart object and then right-click to see a menu of properties that you can change.
• For example, right-click on the Y-axis scale and choose Format Axis.
Effective Excel Charts – Embellished Charts
• Be careful about over-embellishing your charts.
[Figure: an embellished bar chart and an over-embellished chart]
Effective Excel Charts – Embellished Charts
• Excel offers many other types of specialized charts.
[Figure: a multiple bar chart and an area (mountain) chart]
Effective Excel Charts – Embellished Charts
• Other specialized Excel charts:
- Bubble chart displays three variables on a 2-dimensional scatter plot.
- Note: bubble size is proportional to third variable.
[Figure: bubble chart. Data from http://peltiertech.com/Excel/ChartsHowTo/HowToBubble.html]
Effective Excel Charts – Embellished Charts
• Other specialized Excel charts:
- Stock chart for high/low/close stock prices.
[Figure: stock chart. Data from http://finance.yahoo.com]
Effective Excel Charts – Embellished Charts
• Other specialized Excel charts:
- Radar (or Spider) chart compares individual performance against a benchmark.
- Caution, data may be distorted by emphasized areas.
Effective Excel Charts – Embellished Charts
• Other specialized Excel charts:
- Use floating bar charts to show a range of data.