Frequency Tables and Single variable Graphics


Listing a large set of data does not present much of a picture to the reader. Sometimes we want to condense the data into a more manageable form.

This can be accomplished with the aid of a frequency distribution.

FREQUENCY DISTRIBUTION


To demonstrate the concept of a frequency distribution, let’s use the following set of data:

3 2 2 3 2 4 4 1 2 2 4 3 2 0 2 2 1 3 3 1

A frequency distribution is used to represent this set of data by listing the x values with their frequencies. For example, the value 1 occurs in the sample three times; that is, the frequency for x = 1 is 3.


The frequency f is the number of times the value x occurs in the sample.

x   f
0   1
1   3
2   8
3   5
4   3


Ungrouped frequency distribution

We say ungrouped because each value of x in the distribution stands alone.
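The ungrouped frequency distribution can be reproduced with a short Python sketch. It assumes the 20 sample values are held in a plain list and uses the standard library's `collections.Counter` to count how often each distinct x occurs.

```python
from collections import Counter

# The 20 sample values from the example above.
data = [3, 2, 2, 3, 2, 4, 4, 1, 2, 2, 4, 3, 2, 0, 2, 2, 1, 3, 3, 1]

# Counter maps each distinct value x to its frequency f.
freq = Counter(data)

print("x  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")
```

The printed pairs match the x/f table, and the frequencies sum to the sample size, 20.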


Classes: When a large set of data has many different x values, rather than a few repeated values as in the previous example, we can group the data into a set of classes and construct a frequency table.

Lower and upper class limits: The lower class limit is the smallest value that can go into a class. The upper class limit is the largest value that can go into a class.

Number of classes: This usually takes a value between 8 and 15.

CONSTRUCTION OF A FREQUENCY TABLE


Class boundaries (true class limits) are numbers that do not occur in the sample data but are halfway between the upper limit of one class and the lower limit of the next class.

Relative frequency is the frequency of a class expressed as a proportion (or percentage) of the total number of observations.

Class mark (class mid-point) is the numerical value that is exactly in the middle of each class.

Class interval (class width) is the difference between the lower class limit of one class and the lower class limit of the next class.


The two basic guidelines that should be followed in constructing a grouped frequency distribution are:

1. Each class should be of the same width (there are some exceptions).

2. Classes should be set up so that they do not overlap and so that each piece of data belongs to exactly one class.


157 148 165 116 119 165 ... 140 198 208 161 312 200 (n = 155; there are 155 observations)

1. Rank the data.

2. Identify the lowest (L) and highest (H) scores and find the range (range = H - L).

3. Select the number of classes and find the class width.

L = 116, H = 312

Range = 312 - 116 = 196

Number of classes = 8

Class width = 196/8 = 24.5 ≈ 25

PROCEDURE OF CLASSIFICATION
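Steps 2 and 3 can be sketched in Python as follows. The rounding rule is an assumption: the width is taken as range // k + 1. Here that rounds 24.5 up to 25, and in general it guarantees that k classes of integer data reach H. The helper name `class_limits` is hypothetical.

```python
def class_limits(low, high, k):
    """Hypothetical helper: equal-width class limits for integer data.

    Assumes width = range // k + 1, i.e. range / k rounded up to the next
    whole number (with 1 added when k divides the range exactly), so the
    k classes always cover every value from low to high.
    """
    data_range = high - low
    width = data_range // k + 1
    limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]
    return width, limits


width, limits = class_limits(116, 312, 8)
print("class width:", width)
for lower, upper in limits:
    print(lower, upper)
```

With these limits the largest value, 312, falls in the last class, 291-315.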


CLASSES
Lower limit  Upper limit  Frequency  Relative frequency (%)  Class mid-point
116          140              8             5.2                   128
141          165             13             8.4                   153
166          190             31            20.0                   178
191          215             47            30.3                   203
216          240             35            22.6                   228
241          265             16            10.3                   253
266          290              4             2.6                   278
291          315              1             0.6                   303
TOTAL                       155           100.0

Relative frequency of the first class = 100 × (8/155) = 5.2

True class limit  Less than ... frequency  Less than ... rel. freq. (%)
115.5                   0                        0.0
140.5                   8                        5.2
165.5                  21                       13.5
190.5                  52                       33.5
215.5                  99                       63.9
240.5                 134                       86.5
265.5                 150                       96.8
290.5                 154                       99.4
315.5                 155                      100.0

CONSTRUCTION OF THE FREQUENCY DISTRIBUTION TABLE


Here f_i is the class frequency, m_i the class mid-point, C = 25 the class width, and b_i = (m_i - A)/C the coded class value, where A = 203 is the mid-point of the class coded 0.

Class (lower-upper)  f_i  b_i  Rel. freq. (%)  m_i  True class limit  Less than ... freq.  Less than ... rel. freq. (%)
                                                    115.5                  0                    0.0
116-140                8   -3        5.2       128  140.5                  8                    5.2
141-165               13   -2        8.4       153  165.5                 21                   13.5
166-190               31   -1       20.0       178  190.5                 52                   33.5
191-215               47    0       30.3       203  215.5                 99                   63.9
216-240               35    1       22.6       228  240.5                134                   86.5
241-265               16    2       10.3       253  265.5                150                   96.8
266-290                4    3        2.6       278  290.5                154                   99.4
291-315                1    4        0.6       303  315.5                155                  100.0
TOTAL                155           100.0
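The derived columns can be recomputed from the class limits and frequencies alone. The sketch below makes two assumptions. It treats the data as integer, so each true class limit lies 0.5 beyond a class limit. It also takes the coded value as b_i = (m_i - A)/C with A = 203 and C = 25, which reproduces the b_i column.

```python
# Class limits and frequencies from the frequency table.
limits = [(116, 140), (141, 165), (166, 190), (191, 215),
          (216, 240), (241, 265), (266, 290), (291, 315)]
freqs = [8, 13, 31, 47, 35, 16, 4, 1]
n = sum(freqs)          # 155 observations
A, C = 203, 25          # assumed origin (mid-point coded 0) and class width

rows = []
cum = 0
for (lower, upper), f in zip(limits, freqs):
    cum += f
    mid = (lower + upper) / 2
    rows.append({
        "class": f"{lower}-{upper}",
        "f": f,
        "b": round((mid - A) / C),          # coded class value b_i
        "rel_pct": round(100 * f / n, 1),   # relative frequency (%)
        "mid": mid,                         # class mid-point m_i
        "true_upper": upper + 0.5,          # upper true class limit
        "cum_f": cum,                       # "less than" frequency
        "cum_pct": round(100 * cum / n, 1), # "less than" relative frequency (%)
    })

print("lowest true class limit:", limits[0][0] - 0.5)
for r in rows:
    print(r["class"], r["f"], r["b"], r["rel_pct"], r["mid"],
          r["true_upper"], r["cum_f"], r["cum_pct"])
```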


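Measures of central tendency

Mean

$$\bar{x} = \frac{\sum_{i=1}^{8} m_i f_i}{n} \qquad \text{or} \qquad \bar{x} = A + C\,\frac{\sum_{i=1}^{8} b_i f_i}{n}$$

Median

$$M = L + C \cdot \frac{n/2 - \mathit{Cf}}{f}$$

where L is the lower true class limit of the median class (the class containing the (n/2)th observation), Cf is the cumulative frequency of the classes below it, and f is its frequency.

Mode is 203, the mid-point of the class with the highest frequency (191-215, f = 47).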
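Here is a minimal Python sketch of these grouped-data formulas. It works under the following assumptions:

* It uses the class limits and frequencies of the table.
* L is taken as the lower true class limit of the median class, and Cf as the cumulative frequency below it.
* The mode is estimated as the mid-point of the modal class.

```python
limits = [(116, 140), (141, 165), (166, 190), (191, 215),
          (216, 240), (241, 265), (266, 290), (291, 315)]
freqs = [8, 13, 31, 47, 35, 16, 4, 1]
n = sum(freqs)
A, C = 203, 25                          # assumed mean and class width
mids = [(lo + hi) / 2 for lo, hi in limits]
codes = [round((m - A) / C) for m in mids]

# Mean: direct form and coded (short-cut) form.
mean_direct = sum(m * f for m, f in zip(mids, freqs)) / n
mean_coded = A + C * sum(b * f for b, f in zip(codes, freqs)) / n

# Median: find the class in which the cumulative frequency reaches n/2.
cum = 0
for (lo, hi), f in zip(limits, freqs):
    if cum + f >= n / 2:
        L = lo - 0.5                    # lower true class limit of median class
        median = L + C * (n / 2 - cum) / f
        break
    cum += f                            # Cf: frequency of the classes below

# Mode: mid-point of the class with the largest frequency.
mode = mids[freqs.index(max(freqs))]

print(round(mean_direct, 2), round(mean_coded, 2), round(median, 2), mode)
```

Both forms of the mean give about 203.32. The median class is 191-215, which puts the median at about 204.06.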


Measures of position

What are the 25th, 75th percentiles and the median?

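$$\frac{x - \mathit{TCL}_1}{\mathit{TCL}_2 - \mathit{TCL}_1} = \frac{p - p_1}{p_2 - p_1}$$

where p is the required percentage, and TCL_1 and TCL_2 are the consecutive true class limits whose "less than" relative frequencies p_1 and p_2 bracket p.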


True class limit  Less than ... frequency  Less than ... rel. freq. (%)
115.5                   0                        0.0
140.5                   8                        5.2
165.5                  21                       13.5
190.5                  52                       33.5
215.5                  99                       63.9
240.5                 134                       86.5
265.5                 150                       96.8
290.5                 154                       99.4
315.5                 155                      100.0

$$\frac{x - 165.5}{190.5 - 165.5} = \frac{25 - 13.5}{33.5 - 13.5}, \qquad x = ?$$



165.5 ↔ 13.5%
x ↔ 25%
190.5 ↔ 33.5%

$$\frac{x - 165.5}{190.5 - 165.5} = \frac{25 - 13.5}{33.5 - 13.5} \;\Rightarrow\; x = 165.5 + 25 \times \frac{11.5}{20} = 179.88$$

P25=Q1=179.88

25% of observations lie below 179.88.
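The same interpolation works for any percentage. The sketch below walks the (true class limit, cumulative %) pairs of the table and interpolates linearly inside the interval that brackets p. The function name `percentile` is illustrative. Because the sketch uses the rounded percentages from the table, it reproduces P25 = 179.88. Interpolating on the raw counts instead (38.75 of 155 observations) would give about 179.81.

```python
# (true class limit, "less than" relative frequency in %) from the table.
ogive = [(115.5, 0.0), (140.5, 5.2), (165.5, 13.5), (190.5, 33.5),
         (215.5, 63.9), (240.5, 86.5), (265.5, 96.8), (290.5, 99.4),
         (315.5, 100.0)]


def percentile(p, points=ogive):
    """Value below which p% of the observations lie (linear interpolation)."""
    for (x1, p1), (x2, p2) in zip(points, points[1:]):
        if p1 <= p <= p2:
            # (x - x1) / (x2 - x1) = (p - p1) / (p2 - p1), solved for x
            return x1 + (p - p1) * (x2 - x1) / (p2 - p1)
    raise ValueError("p must lie between 0 and 100")


for p in (25, 50, 75):
    print(f"P{p} = {percentile(p):.2f}")
```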


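Standard deviation

$$s = \sqrt{\frac{\sum_{i=1}^{8} m_i^2 f_i - \frac{\left(\sum_{i=1}^{8} m_i f_i\right)^2}{n}}{n-1}} \qquad \text{or} \qquad s = C\sqrt{\frac{\sum_{i=1}^{8} b_i^2 f_i - \frac{\left(\sum_{i=1}^{8} b_i f_i\right)^2}{n}}{n-1}}$$

Coefficient of variation

$$CV = \frac{s}{\bar{x}} \times 100 = \frac{35.24}{203.32} \times 100 = 17.33\%$$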
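The two standard-deviation formulas and the coefficient of variation can be checked numerically. This sketch uses the class mid-points m_i and the coded values b_i = (m_i - A)/C from the table, with A = 203 and C = 25. The name `grouped_sd` is a hypothetical helper.

```python
import math

limits = [(116, 140), (141, 165), (166, 190), (191, 215),
          (216, 240), (241, 265), (266, 290), (291, 315)]
freqs = [8, 13, 31, 47, 35, 16, 4, 1]
n = sum(freqs)
A, C = 203, 25
mids = [(lo + hi) / 2 for lo, hi in limits]
codes = [round((m - A) / C) for m in mids]


def grouped_sd(values, freqs):
    """Sample SD of grouped data: sqrt((sum v^2 f - (sum v f)^2 / n) / (n - 1))."""
    n = sum(freqs)
    s1 = sum(v * f for v, f in zip(values, freqs))
    s2 = sum(v * v * f for v, f in zip(values, freqs))
    return math.sqrt((s2 - s1 ** 2 / n) / (n - 1))


s_direct = grouped_sd(mids, freqs)      # uses mid-points m_i
s_coded = C * grouped_sd(codes, freqs)  # uses coded values b_i, rescaled by C
mean = sum(m * f for m, f in zip(mids, freqs)) / n
cv = 100 * s_direct / mean              # coefficient of variation (%)

print(f"s = {s_direct:.2f} (coded: {s_coded:.2f}), CV = {cv:.2f}%")
```

Coding with b_i only shifts and rescales the mid-points. That is why multiplying the coded SD by C recovers the same s.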


Graphic Presentation of Data

We will learn how to present single-variable data using graphical techniques.

There are several graphic ways to describe data.

The method used is determined by the type of data and the idea to be presented.


Bar graphs and pie (circle) graphs are often used to summarize attribute (categorical) data.

Data are represented by frequency or proportion.

In graphical presentation, proportion is more meaningful than frequency.

BAR GRAPH AND PIE GRAPH

In a bar graph, the x axis represents the attribute categories, while the y axis (the bar's height) represents the proportion or frequency of each category.

In a pie graph, each slice represents the proportion of one category.


The marital status of women is given below:

Marital status  Freq.      %
Single            65     46.8
Married           32     23.0
Divorced          27     19.4
Widowed           10      7.2
Separated          5      3.6
Total            139    100.0

Example
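The percentages in the marital-status table, and the slice angles a pie graph would use, follow directly from the frequencies. Below is a minimal sketch. It assumes that each slice's angle is its proportion of 360 degrees.

```python
counts = {"Single": 65, "Married": 32, "Divorced": 27,
          "Widowed": 10, "Separated": 5}
total = sum(counts.values())

# Percent = bar height in a percent bar graph; angle = pie-slice size.
percent = {k: round(100 * f / total, 1) for k, f in counts.items()}
angle = {k: 360 * f / total for k, f in counts.items()}

for k in counts:
    print(f"{k:<10}{counts[k]:>4}{percent[k]:>7.1f}%{angle[k]:>8.1f} deg")
print(f"{'Total':<10}{total:>4}")
```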


[Figure: Bar chart of marital status of women; x axis: marital status (Single, Married, Divorced, Widowed, Separated); y axis: percent]


[Figure: Pie chart of marital status of women: Single 46.8%, Married 23.0%, Divorced 19.4%, Widowed 7.2%, Separated 3.6%]


STEM AND LEAF PLOT

The stem consists of the leading digit(s) of a data value, while the leaf consists of the trailing digit(s). For example, the numerical value 458 might be split into stem (45) and leaf (8).

This plot provides a convenient means of tallying the observations and can be used as a direct display of the data or as a preliminary step in constructing a frequency table.


Let’s construct a stem-and-leaf display of the following set of 20 test scores:

82 74 88 66 58 74 78 84 96 76

62 68 72 92 86 76 52 76 82 78

At a quick glance we see that there are scores in the 50s, 60s, 70s, 80s and 90s.

Let’s use the first digit of each score as the stem and the second digit as the leaf.


We will construct the display in a vertical position. Draw a vertical line and, to the left of it, list the stems in order:

5 |
6 |
7 |
8 |
9 |

Next we place each leaf on its stem. This is done by writing the trailing digit on the right side of the vertical line, opposite its corresponding leading digit (stem).

5 | 2 8
6 | 2 6 8
7 | 2 4 4 6 6 6 8 8
8 | 2 2 4 6 8
9 | 2 6
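The tallying step can be sketched in Python. The call `divmod(score, 10)` splits each two-digit score into its stem (tens digit) and leaf (units digit). The helper name `stem_and_leaf` is illustrative, and the sketch assumes two-digit, non-negative scores.

```python
from collections import defaultdict

scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

This sketch omits stems that receive no leaves. A fuller version would list every stem between the smallest and largest so that gaps in the data stay visible.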


All scores with the same tens digit are placed on the same branch. This may not always be desired. Suppose we construct the display again, but this time, instead of grouping ten possible values on each stem, we group the values so that only five possible values can fall on each stem.

(50-54) 5 | 2
(55-59) 5 | 8
(60-64) 6 | 2
(65-69) 6 | 6 8
(70-74) 7 | 2 4 4
(75-79) 7 | 6 6 6 8 8
(80-84) 8 | 2 2 4
(85-89) 8 | 6 8
(90-94) 9 | 2
(95-99) 9 | 6
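Splitting each stem into two lines only changes the grouping key: leaves 0-4 go on one line and leaves 5-9 on the other. Here is a self-contained sketch under the same two-digit assumption. The name `split_stem_and_leaf` is hypothetical.

```python
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]


def split_stem_and_leaf(values):
    """Group leaves by (stem, half): half 0 holds leaves 0-4, half 1 leaves 5-9."""
    lines = {}
    for v in sorted(values):            # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        lines.setdefault((stem, leaf // 5), []).append(leaf)
    return lines


display = split_stem_and_leaf(scores)
for (stem, half), leaves in display.items():
    low = stem * 10 + 5 * half          # first value covered by this line
    print(f"({low}-{low + 4}) {stem} | {' '.join(map(str, leaves))}")
```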


A histogram is a type of bar graph representing the frequency distribution of quantitative data.

A histogram is made up of the following components:

1. A title, which identifies the sample of concern.

2. A vertical scale, which identifies the frequencies (or relative frequencies) in the various classes.

3. A horizontal scale, which identifies the variable x (class mid-points, true class limits, or lower class limits).

HISTOGRAM


Birthweights of 60 infants are given below:

2920 4050 2300 3400 3912 3234 3110 4045 3310 2594 2187 2700 3300 4250 3350 2750 2342 2350 4680 4123 3470 2084 2211 2450 4070 4360 3700 2495 3203 2950 3250 2920 2630 2466 3100 2780 4265 1588 3700 4990 3510 3020 4174 2950 3380 2607 3225 2410 3450 1800 3110 3860 4100 2540 2960 1790 3610 2225 3232 2625


CLASSES
Lower limit  Upper limit  Frequency
1588         2013             3
2014         2439             8
2440         2865            11
2866         3291            14
3292         3717            11
3718         4143             7
4144         4569             4
4570         4995             2
TOTAL                        60
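The birthweight table can be rebuilt from the raw data. In this sketch the number of classes (k = 8) is taken from the table. The class width is assumed to be range // k + 1, which gives 3402 // 8 + 1 = 426 and reproduces the class limits shown. Variable names are illustrative.

```python
bwt = [2920, 4050, 2300, 3400, 3912, 3234, 3110, 4045, 3310, 2594,
       2187, 2700, 3300, 4250, 3350, 2750, 2342, 2350, 4680, 4123,
       3470, 2084, 2211, 2450, 4070, 4360, 3700, 2495, 3203, 2950,
       3250, 2920, 2630, 2466, 3100, 2780, 4265, 1588, 3700, 4990,
       3510, 3020, 4174, 2950, 3380, 2607, 3225, 2410, 3450, 1800,
       3110, 3860, 4100, 2540, 2960, 1790, 3610, 2225, 3232, 2625]

k = 8
low, high = min(bwt), max(bwt)              # 1588 and 4990
width = (high - low) // k + 1               # assumed rule: 3402 // 8 + 1 = 426
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]

# Count the observations inside each class (both limits inclusive).
counts = [sum(lower <= x <= upper for x in bwt) for lower, upper in limits]
rel = [round(100 * c / len(bwt), 2) for c in counts]   # relative frequency (%)

for (lower, upper), c, r in zip(limits, counts, rel):
    print(lower, upper, c, r)
print("TOTAL", sum(counts))
```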

[Figure: Histogram of birthweights (bwt); x axis: class mid-points; y axis: count]


CLASSES
Lower limit  Upper limit  Relative frequency (%)
1588         2013              5.00
2014         2439             13.33
2440         2865             18.33
2866         3291             23.33
3292         3717             18.33
3718         4143             11.67
4144         4569              6.67
4570         4995              3.33
TOTAL                        100.00

[Figure: Relative-frequency histogram of birthweights (bwt); x axis: class mid-points; y axis: percent]


[Figure: Two charts of birthweights (bwt) over the class mid-points, one on a count scale and one on a percent scale]


[Figure: A symmetric distribution, a right-skewed distribution, and a left-skewed distribution]


BOX PLOT (BOX AND WHISKER PLOT)

The median and the first and third quartiles of the distribution are used to construct a box plot. The box runs from the 25th to the 75th percentile, and the location of the median is indicated by a horizontal line inside the box. When there are outliers or extreme observations, the straight lines (whiskers) extend to the most extreme observations lying within 1.5 times the interquartile range above the 75th percentile and below the 25th percentile. If there are none, the whiskers run to the minimum and maximum values. Cases with values between 1.5 and 3 box lengths from the upper or lower edge of the box are called outliers. Cases with values more than 3 box lengths from the upper or lower edge of the box are called extreme points.
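The quantities behind a box plot can be computed with Python's standard `statistics` module. The sketch uses `statistics.quantiles` with its default "exclusive" method, which places quartiles at positions (n + 1)p. For the 60 birthweights this matches the reported percentiles 2553.5, 3110.0 and 3677.5. The fences follow the 1.5- and 3-box-length rules for outliers and extreme points.

```python
import statistics

bwt = [2920, 4050, 2300, 3400, 3912, 3234, 3110, 4045, 3310, 2594,
       2187, 2700, 3300, 4250, 3350, 2750, 2342, 2350, 4680, 4123,
       3470, 2084, 2211, 2450, 4070, 4360, 3700, 2495, 3203, 2950,
       3250, 2920, 2630, 2466, 3100, 2780, 4265, 1588, 3700, 4990,
       3510, 3020, 4174, 2950, 3380, 2607, 3225, 2410, 3450, 1800,
       3110, 3860, 4100, 2540, 2960, 1790, 3610, 2225, 3232, 2625]

q1, median, q3 = statistics.quantiles(bwt, n=4)  # default method="exclusive"
iqr = q3 - q1                                    # box length

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)         # 1.5 box lengths from the box
outer = (q1 - 3 * iqr, q3 + 3 * iqr)             # 3 box lengths from the box

outliers = [x for x in bwt
            if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
extremes = [x for x in bwt if x < outer[0] or x > outer[1]]

# Whiskers end at the most extreme observations inside the inner fences;
# with no outliers these are simply the minimum and maximum.
whisker_low = min(x for x in bwt if x >= inner[0])
whisker_high = max(x for x in bwt if x <= inner[1])

print(q1, median, q3, iqr)
print("outliers:", outliers, "extremes:", extremes)
print("whiskers:", whisker_low, whisker_high)
```

Other quartile conventions, such as the "inclusive" method, can shift the quartiles slightly for the same data. Software output should therefore be compared using the same rule.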


[Figure: Box plot of birthweights (BWT)]


Since there are no outliers, the whiskers extend to the minimum and maximum values.

BWT
Minimum                    1588
Maximum                    4990
Range                      3402
Percentiles  25            2553.5
             50 (Median)   3110.0
             75            3677.5


[Figure: Positions of the mode, median, and mean in left-skewed, right-skewed, and symmetric distributions]


SCATTER PLOT WITH ONE VARIABLE

A one-variable scatter plot displays the value of each observation as a small circle on an invisible vertical line (parallel to the y axis), with the y axis showing the original measurement scale.

[Figure: One-variable scatter plot of birthweights (BWT)]


In a line graph, individual data points are connected by a line. Line graphs provide a simple way to visually present a sequence of many values.

LINE GRAPH


The distribution of measles cases among seasons in an area is as follows:

Spring 75

Summer 25

Fall 50

Winter 100

[Figure: Line graph of measles cases by season; x axis: seasons (Spring, Summer, Fall, Winter); y axis: frequency]


Error bars help you visualize distributions and dispersion by indicating the variability of the measure being displayed. The mean of a scale variable is plotted for a set of categories, and the length of the bar on either side of the mean indicates a chosen number of standard deviations (for example, 1 SD or 2 SD). Error bars can extend in one direction or in both directions from the mean.

Error bars are sometimes displayed in the same chart with other chart elements such as bars or lines.

ERROR BARS
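The end points of such error bars are simply mean ± k·SD. Below is a small sketch for the 60 birthweights. It assumes the sample standard deviation (`statistics.stdev`, with an n - 1 denominator) and k = 1 or 2.

```python
import statistics

bwt = [2920, 4050, 2300, 3400, 3912, 3234, 3110, 4045, 3310, 2594,
       2187, 2700, 3300, 4250, 3350, 2750, 2342, 2350, 4680, 4123,
       3470, 2084, 2211, 2450, 4070, 4360, 3700, 2495, 3203, 2950,
       3250, 2920, 2630, 2466, 3100, 2780, 4265, 1588, 3700, 4990,
       3510, 3020, 4174, 2950, 3380, 2607, 3225, 2410, 3450, 1800,
       3110, 3860, 4100, 2540, 2960, 1790, 3610, 2225, 3232, 2625]

mean = statistics.mean(bwt)
sd = statistics.stdev(bwt)          # sample standard deviation

# Error bar end points for mean ± k standard deviations.
bars = {k: (mean - k * sd, mean + k * sd) for k in (1, 2)}

for k, (lower, upper) in bars.items():
    print(f"mean ± {k} SD: {lower:.1f} to {upper:.1f}")
```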


[Figure: Error bars for BWT showing the mean ± 1 SD and the mean ± 2 SD]