
2.3 Measures of Central Tendency


College of Engineering Engineering Statistics

Department of Dam & Water Resources Lecturer: Goran Adil & Chenar

-------------------------------------------------------------------------------------------------------------------------------

Measure of Central Tendency and Dispersion Chapter 2-3 & 2-4

Chapter 2

Part 3: Measure of Central Tendency

Learning Outcomes

When you complete this chapter, you should be able to do the following:

• Understand the mean, median and mode of a set of numerical data.
• Compute the mean, median and mode of a given set of data.
• Understand the mean, median and mode of a set of data as they apply to real-world situations.

Central Tendency

When we work with numerical data, it is apparent that in most sets of data the observed values tend to group themselves about some interior value; some central value seems to be characteristic of the data. This phenomenon is referred to as central tendency. For a given set of data, the measure of location we use depends on what we mean by "middle"; different definitions give rise to different measures. We shall consider the most commonly used measures, namely the arithmetic mean, the median and the mode. The formulas for finding these values depend on whether the data are ungrouped or grouped.

Sample and Population

Let $X_1, X_2, \dots, X_N$ be the population values of the variable of interest (in general, they are unknown). The population size is $N$.

Let $x_1, x_2, \dots, x_n$ be the sample values (these values are known). The sample size is $n$.

(i) A parameter is a measure (or number) obtained from the population values $X_1, X_2, \dots, X_N$ (parameters are unknown in general).

(ii) A statistic is a measure (or number) obtained from the sample values $x_1, x_2, \dots, x_n$ (statistics are known in general).


Mean (Arithmetic Mean)

The mean, often called the average of a numerical set of data, is simply the sum of the data values divided by the number of values. This is also referred to as the arithmetic mean. The mean is the balance point of a distribution.

∑ is the Greek capital letter sigma and denotes the summation of all x values.
x is the variable usually used to represent the individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.

The mean is sensitive to every value, so one exceptional value can affect the mean dramatically. The median overcomes that disadvantage.

Notation

            Sample data                    Population data
Size        n is the size of the sample    N is the size of the population
Mean        sample mean, $\bar{x}$         population mean, $\mu$
Median      sample median                  population median

Mean = (sum of the values) / (the number of values)

$\bar{x}$ is pronounced "x bar" and denotes the mean of a set of sample values:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The population mean is:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$


Note

• The mean is sensitive to every value, so one exceptional value can affect the mean dramatically, whereas the median overcomes that disadvantage.
• The mean is simple to calculate.
• There is only one mean for a given set of sample data.
• The mean can be distorted by extreme values.
• The mean can only be found for quantitative variables.

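Median

If the numerical data are sorted into order from low to high (or high to low), the median is the number in the middle location of the sorted list. When the number of values is even, there are two middle locations and the median is the average of the two middle values. The median is not affected by an extreme value.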

Sample Median is denoted by and Population median

Mode

The mode is the value in the data set that occurs most frequently. For ungrouped data, we simply look for the value with the largest frequency. If all values have the same frequency, no mode exists. If more than one value has the same largest frequency, then the mode is not unique. The mode is denoted by M.

Types of mode

1- Bimodal

When two data values occur with the same greatest frequency, each one is a mode and the data

set is bimodal.

2- Multimodal

When more than two data values occur with the same greatest frequency, each is a mode and the

data set is said to be multimodal.

3- No Mode

When no data value is repeated, we say that there is no mode.


Example 2.9

The following series is the minimum monthly flow (m³/s) in each of the 20 years 1957 to 1976 at Bywell on the River Tyne: 21, 36, 4, 16, 21, 21, 23, 11, 46, 10, 25, 12, 9, 16, 10, 6, 11, 12, 17, and 3. Calculate the mean, median and mode.

$$\sum_{i=1}^{n} x_i = 21 + 36 + 4 + 16 + 21 + 21 + 23 + 11 + 46 + 10 + 25 + 12 + 9 + 16 + 10 + 6 + 11 + 12 + 17 + 3 = 330$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{330}{20} = 16.5 \ \text{m}^3/\text{s}$$

Sorted data:

3, 4, 6, 9, 10, 10, 11, 11, 12, 12, 16, 16, 17, 21, 21, 21, 23, 25, 36, 46
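The three measures for Example 2.9 can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module (Python 3.8 or later for `multimode`); the variable names are illustrative and not part of the course material.

```python
import statistics

# Minimum monthly flows (m^3/s) at Bywell, River Tyne, 1957-1976 (Example 2.9).
flows = [21, 36, 4, 16, 21, 21, 23, 11, 46, 10,
         25, 12, 9, 16, 10, 6, 11, 12, 17, 3]

# Mean: sum of the values divided by the number of values.
mean_flow = statistics.mean(flows)

# Median: middle of the sorted list; with n = 20 (even) it is the
# average of the 10th and 11th sorted values.
median_flow = statistics.median(flows)

# Mode(s): every value tied for the highest frequency.
modes = statistics.multimode(flows)

print(f"n = {len(flows)}, sum = {sum(flows)}")
print(f"mean = {mean_flow}, median = {median_flow}, mode(s) = {modes}")
```

With these data the mean is 16.5 m³/s, the median is 14 m³/s (the average of 12 and 16) and the single mode is 21 m³/s, which occurs three times.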

Example 2.10

The data in the table below (Adamson, 1989) are the annual maximum flood-peak inflows to the Hardap Dam in Namibia, covering the period from October 1962 to September 1987. The data range from 30 to 6100 m³/s.

Annual maximum flood-peak inflows to Hardap Dam (Namibia): catchment area 12 600 km²

Year            1962-3   1963-4   1964-5   1965-6   1966-7   1967-8   1968-9   1969-70  1970-1
Inflow (m³/s)   1864     44       46       364      911      83       477      457      782

Year            1971-2   1972-3   1973-4   1974-5   1975-6   1976-7   1977-8   1978-9   1979-80
Inflow (m³/s)   6100     197      3259     554      1506     1508     236      635      230

Year            1980-1   1981-2   1982-3   1983-4   1984-5   1985-6   1986-7
Inflow (m³/s)   125      131      30       765      408      347      412

Calculate the mean, median and mode.
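A possible way to check Example 2.10 is sketched below. The helper `find_modes` is hypothetical; it follows the "No Mode" rule given earlier (no repeated value means no mode) and returns every tied value otherwise, so it also covers the bimodal and multimodal cases.

```python
from collections import Counter
import statistics

# Annual maximum flood-peak inflows (m^3/s) to Hardap Dam, 1962-63 to 1986-87.
inflows = [1864, 44, 46, 364, 911, 83, 477, 457, 782,
           6100, 197, 3259, 554, 1506, 1508, 236, 635, 230,
           125, 131, 30, 765, 408, 347, 412]

def find_modes(values):
    """Return the most frequent value(s), or [] when no value is repeated."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:          # no value repeats: "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == top)

mean_inflow = statistics.mean(inflows)
median_inflow = statistics.median(inflows)
modes = find_modes(inflows)

print(f"mean = {mean_inflow:.2f}, median = {median_inflow}, modes = {modes}")
```

Here the mean (about 858.8 m³/s) is roughly twice the median (412 m³/s), because a few very large floods such as 6100 m³/s pull the mean upwards, and no value is repeated, so there is no mode.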


Weighted Mean

The weighted mean of values $x$ with weights $w$ is:

$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$

Example 2.11

A water company operates in three divisions A, B and C. Each division has estimated the cost of bringing the levels of service up to a new standard. The costs are expressed as pounds per year per property over a five-year improvement period.

Find the weighted mean of the estimated cost for the following data.

                                        Division A   Division B   Division C
Estimated cost (£/year per property)    10           14           20
Number of properties (thousands)        300          200          100
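The weighted mean for Example 2.11 can be sketched as follows, assuming (the example does not say so explicitly) that the number of properties in each division is used as the weight. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

costs = [10, 14, 20]            # estimated cost, pounds/year per property (A, B, C)
properties = [300, 200, 100]    # number of properties in thousands (assumed weights)

cost_weighted = weighted_mean(costs, properties)
cost_simple = sum(costs) / len(costs)   # unweighted mean, for comparison

print(f"weighted mean = {cost_weighted:.2f}, simple mean = {cost_simple:.2f}")
```

Under that assumption the weighted mean is (10×300 + 14×200 + 20×100)/600 = £13 per year per property, lower than the unweighted mean of about £14.67 because the cheapest division has the most properties.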


Mean from a Frequency Table

$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$

where x = class midpoint, f = frequency, and ∑f = n.

☻ Calculators can easily find the mean of frequency tables, using the class midpoints and the frequencies.

☻ Mean values found from frequency tables only approximate the mean value found using the actual data.

Median from a Frequency Table

For grouped data, the median is found by first identifying the class containing the median and then applying the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$

where
l = lower class boundary of the median class (i.e. the class for which the cumulative frequency is just in excess of n/2)
h = class interval size of the median class
f = frequency of the median class
n = ∑f (the total number of observations)
c = cumulative frequency of the class preceding the median class

Note:

This formula is based on the assumption that the observations in each class are evenly distributed between the two class limits.


Mode from a Frequency Table

For grouped data, the mode is found by first identifying the class with the largest frequency, called the modal class, and then applying the following formula to the modal class:

$$\text{Mode} = L_1 + \frac{d_1}{d_1 + d_2}\,(L_2 - L_1)$$

where:
$L_1$ is the lower class boundary of the modal class;
$d_1$ is the difference between the frequency of the modal class and that of the previous class, and is always positive;
$d_2$ is the difference between the frequency of the modal class and that of the following class, and is always positive;
$L_2$ is the upper class boundary of the modal class.

The mode can also be obtained from a histogram.

Step 1: Identify the modal class and the bar representing it.
Step 2: Draw two cross lines as shown in the diagram.
Step 3: Drop a perpendicular from the intersection of the two lines until it touches the horizontal axis.
Step 4: Read the mode from the horizontal axis.


Example 2.12

Unit weight measurements from a boring are presented in the table below. This boring was drilled offshore in the Gulf of Mexico at the location of an oil production platform. The soil consists of a normally consolidated clay over the length of the boring. The unit weight varies with depth, and ranges from 95 to 125 pcf.

1- Determine the mean, median and mode from the frequency table.
2- Draw an ogive chart between depth and total unit weight and, from the graph, decide whether any relation between depth and unit weight exists.

Total Unit Weight Data from Offshore Boring

Depth (ft)                 0.5    1.0    1.5    5.0    6.5    7.5    16.5   19.0
Total unit weight (pcf)    105    119    117    99     101    96     114    100

Depth (ft)                 22.0   25.0   27.5   31.0   34.5   37.5   40.0   45.0
Total unit weight (pcf)    99     102    100    101    101    100    101    99

Depth (ft)                 50     60.5   62.0   71.5   72.0   81.5   82.0   91.5
Total unit weight (pcf)    100    103    101    106    109    100    104    102

Depth (ft)                 101.5  102.0  112.0  121.5  122.0  132.0  142.5  152.5
Total unit weight (pcf)    106    99     102    100    101    101    104    102

Depth (ft)                 162.0  172.0  191.5  201.5  211.5  241.5  251.5  261.8
Total unit weight (pcf)    105    95     116    107    112    114    109    110

Depth (ft)                 271.5  272.0  281.5  292.0  301.5  311.5  322.0  331.5
Total unit weight (pcf)    109    106    108    111    125    112    104    113

Depth (ft)                 341.5  342.0  352.0  361.5  362.0  371.5  381.5  391.5
Total unit weight (pcf)    112    113    116    124    117    114    115    114

Depth (ft)                 392.0  402.0  411.5  412.0  421.5  432.0  442.0  451.5
Total unit weight (pcf)    115    114    112    115    115    112    115    119

Frequency table

Class limits     Frequency   Cumulative   Class          Class boundaries    F×X    Remark
Lower   Upper    (F)         frequency    midpoint (x)   Lower     Upper
95      99       6           6            97             94.5      99.5
100     104      21          21+6=27      102            99.5      104.5            Mode
105     109      10          27+10=37     107            104.5     109.5            Median
110     114      14          37+14=51     112            109.5     114.5
115     119      11          51+11=62     117            114.5     119.5
120     124      1           62+1=63      122            119.5     124.5
125     129      1           63+1=64      127            124.5     129.5
Total            64
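The grouped-data formulas for the mean, median and mode can be applied to this frequency table as sketched below. The function names are illustrative, and the code assumes integer class limits with a gap of 1 between classes, so that each class boundary lies 0.5 beyond its class limit, as in the table.

```python
# Frequency table of Example 2.12: (lower limit, upper limit, frequency).
classes = [(95, 99, 6), (100, 104, 21), (105, 109, 10), (110, 114, 14),
           (115, 119, 11), (120, 124, 1), (125, 129, 1)]

def grouped_mean(classes):
    """Mean = sum(f * x) / sum(f), with x the class midpoint."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """Median = l + (h / f) * (n / 2 - c) for the median class."""
    n = sum(f for _, _, f in classes)
    c = 0                                  # cumulative frequency before the class
    for lo, hi, f in classes:
        if c + f > n / 2:                  # first class whose cumulative frequency exceeds n/2
            l = lo - 0.5                   # lower class boundary (assumed 0.5 offset)
            h = (hi + 0.5) - l             # class interval size
            return l + (h / f) * (n / 2 - c)
        c += f

def grouped_mode(classes):
    """Mode = L1 + d1 / (d1 + d2) * (L2 - L1) for the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))            # first class with the largest frequency
    lo, hi, f = classes[i]
    d1 = f - (freqs[i - 1] if i > 0 else 0)
    d2 = f - (freqs[i + 1] if i < len(freqs) - 1 else 0)
    l1, l2 = lo - 0.5, hi + 0.5            # class boundaries (assumed 0.5 offset)
    return l1 + d1 / (d1 + d2) * (l2 - l1)

mean_uw = grouped_mean(classes)
median_uw = grouped_median(classes)
mode_uw = grouped_mode(classes)
print(f"mean = {mean_uw:.2f}, median = {median_uw:.2f}, mode = {mode_uw:.2f} pcf")
```

For this table the sketch gives a mean of about 107.78 pcf (∑fx = 6898, n = 64), a median of 107.0 pcf (median class 105 to 109) and a mode of about 102.38 pcf (modal class 100 to 104), consistent with the rows marked in the table.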


Dispersion and Skewness

Sometimes the mean, median and mode may not be able to reflect the true picture of some data. The following example explains the reason.

Symmetric
A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.

Skewed
A distribution of data is skewed if it is not symmetric and extends more to one side than the other.

Skewed to the left
(Also called negatively skewed.) The distribution has a longer left tail; the mean and median are to the left of the mode.

Skewed to the right
(Also called positively skewed.) The distribution has a longer right tail; the mean and median are to the right of the mode.


Chapter 2

Part 4: Measure of Variation (Spread)

Objectives

In this section, we discuss the characteristic of variation. In particular, we present measures of variation, such as the standard deviation, as tools for analyzing data. Our focus here is not only to find values of the measures of variation, but also to interpret those values. In addition, we discuss concepts that help us to better understand the standard deviation.

Definition: A measure of dispersion may be defined as a statistic signifying the extent of the scatter of items around a measure of central tendency.

Measure of Variation (Dispersion)

The variation or dispersion in a set of values refers to how spread out the values are from each other.

• The variation is small when the values are close together.
• There is no variation if the values are the same.

Some measures of dispersion: range, variance, standard deviation and coefficient of variation.


Range

The range is the simplest measure of variability, calculated as:

Range = Largest observation − Smallest observation

E.g.

Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46

The range is the same in both cases, but the data sets have very different distributions.

Note:

The range is a poor measure of variation since it only takes into account two of the values (the largest and the smallest).
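The two data sets above can be compared in a short sketch. It computes the range of each set and, anticipating the sample standard deviation introduced later in this part, shows that a measure using every value does distinguish them. The function name `data_range` is illustrative.

```python
import statistics

def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

set_a = [4, 4, 4, 4, 50]
set_b = [4, 8, 15, 24, 39, 50]

for name, data in (("A", set_a), ("B", set_b)):
    # The range uses only two values; the standard deviation uses all of them.
    print(f"set {name}: range = {data_range(data)}, "
          f"sample std dev = {statistics.stdev(data):.2f}")
```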

Variance

Variance and its related measure, standard deviation, are arguably the most important statistics.

Used to measure variability, they also play a vital role in almost all statistical inference

procedures.

The variance is a measure that uses the mean as a point of reference.

The variance is small when all values are close to the mean. The variance is large when all

values are spread out from the mean.

Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \dots + (X_N - \mu)^2}{N} \quad (\text{unit}^2)$$

Population variance is denoted by σ² (lower-case Greek letter "sigma" squared).

Sample variance is denoted by s² (lower-case "s" squared).

where
$X_i$ = the item or observation
N = total number of observations in the population
µ = population mean

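Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1} \quad (\text{unit}^2)$$

Or

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}{n - 1}$$

$$s^2 = \frac{\sum x_i^2}{n - 1} - \frac{\left(\sum x_i\right)^2}{n\,(n - 1)}$$

where
$s^2$ = sample variance
$\bar{x}$ = sample mean
n = total number of observations in the sample

Note! The denominator is the sample size (n) minus one!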

The standard deviation

The standard deviation is simply the square root of the variance, thus:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$
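The difference between the population and sample formulas can be illustrated with a small sketch. The data values below are hypothetical and chosen only to give round numbers; the function names are illustrative. The results are cross-checked against the standard-library `statistics` module.

```python
import math
import statistics

def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Equivalent form: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

sigma2 = population_variance(data)
s2 = sample_variance(data)
print(f"population: variance = {sigma2}, std dev = {math.sqrt(sigma2)}")
print(f"sample:     variance = {s2:.4f}, std dev = {math.sqrt(s2):.4f}")

# The definitional and shortcut forms agree, and both match the library.
assert math.isclose(s2, sample_variance_shortcut(data))
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
```

For these eight values the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7 ≈ 4.57; dividing by n − 1 always gives the larger value.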

Standard Deviation from a Frequency Table

$$s = \sqrt{\frac{n\left[\sum (f \cdot x^2)\right] - \left[\sum (f \cdot x)\right]^2}{n\,(n - 1)}}$$

☻ The class midpoint must be used as the 'representative' score of each class for this computation.

Properties of Standard Deviation

• Measures the variation among data values.
• Values close together have a small standard deviation, but values with much more variation have a larger standard deviation.
• Has the same units of measurement (such as minutes, feet, pounds, and so on) as the original data values.
• For many data sets, a value is unusual if it differs from the mean by more than two standard deviations.
• Compare the standard deviations of two different data sets only if they use the same scale and units, and they have means that are approximately the same.
• The value of the standard deviation s is usually positive. It is zero only when all of the data values are the same number. (It is never negative.)
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values that are very far away from all of the others).


Coefficient of Variation

The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population (or sample).

If we want to compare the variation of two variables, we cannot use the variance or the standard deviation because:

1. The variables might have different units.
2. The variables might have different means.

Without an understanding of the relative size of the standard deviation compared to the original data, the standard deviation is of little use for comparing data sets. To address this problem, the coefficient of variation is used.

☻ The coefficient of variation is often used to compare the variability of two data sets. It allows comparison regardless of the units of measurement used for each set of data.

☻ The larger the coefficient of variation, the more the data vary.

For a sample:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

For a population:

$$CV = \frac{\sigma}{\mu} \times 100\%$$


Example 2.13

Stream flow velocity. A practical example of the mean is the determination of the mean velocity of a stream based on measurements of travel times over a given reach of the stream using a floating device. For instance, suppose 10 velocities are calculated as follows:

Velocity (m/s): 0.20 0.20 0.21 0.42 0.24 0.16 0.55 0.70 0.43 0.34

Calculate

1- Mean, Median and Mode,
2- Standard Deviation and
3- Coefficient of Variation


Example 2.14

Concrete cube test. From 28-day concrete cube tests made in England in 1990, the following

results of maximum load at failure in kilonewtons and compressive strength in newtons per

square millimeter were obtained:

Maximum load: 950, 972, 981, 895, 908, 995, 646, 987, 940, 937, 846, 947, 827,

961, 935, 956.

Compressive strength: 42.25, 43.25, 43.50, 39.25, 40.25, 44.25, 28.75, 44.25, 41.75,

41.75, 38.00, 42.50, 36.75, 42.75, 42.00, 33.50.

Calculate

1- Mean, Median and Mode,

2- Standard Deviation and

3- Coefficient of Variation
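Example 2.14 is a natural case for the coefficient of variation, since the two variables have different units and very different means. The sketch below uses the sample formula CV = 100·s/x̄ with the values exactly as listed above; the function and variable names are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: 100 * s / x_bar."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Maximum load at failure (kN), as listed in Example 2.14.
max_load = [950, 972, 981, 895, 908, 995, 646, 987, 940, 937, 846, 947, 827,
            961, 935, 956]

# Compressive strength (N/mm^2), as listed in Example 2.14.
strength = [42.25, 43.25, 43.50, 39.25, 40.25, 44.25, 28.75, 44.25, 41.75,
            41.75, 38.00, 42.50, 36.75, 42.75, 42.00, 33.50]

for name, data, unit in (("maximum load", max_load, "kN"),
                         ("compressive strength", strength, "N/mm^2")):
    print(f"{name}: mean = {statistics.mean(data):.2f} {unit}, "
          f"s = {statistics.stdev(data):.2f} {unit}, "
          f"CV = {coefficient_of_variation(data):.1f}%")
```

The standard deviations cannot be compared directly because one is in kN and the other in N/mm², whereas the two CV percentages are dimensionless and can be set side by side.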


Example 2.15

64 Samples of unit weight measurements from a boring are presented in Frequency table below.

This boring was drilled offshore in the Gulf of Mexico at the location of an oil production

platform. The soil consists of a normally consolidated clay over the length of the boring. The unit

weight varies with depth, and ranges from 95 to 125 pcf.

The data are summarised in Frequency Table shown below,

Determine Variance and Standard Deviation

Class limits     Frequency (f)    x      f·x     f·x²
Lower   Upper
95      99       6
100     104      21
105     109      10
110     114      14
115     119      11
120     124      1
125     129      1
Total            64
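The blank columns of this table and the resulting variance can be filled in with a short sketch based on the frequency-table formula s² = [n∑(f·x²) − (∑f·x)²] / [n(n − 1)], with x taken as the class midpoint. The variable names are illustrative.

```python
import math

# Frequency table of Example 2.15: (lower limit, upper limit, frequency).
classes = [(95, 99, 6), (100, 104, 21), (105, 109, 10), (110, 114, 14),
           (115, 119, 11), (120, 124, 1), (125, 129, 1)]

n = sum(f for _, _, f in classes)
sum_fx = 0.0
sum_fx2 = 0.0

print("midpoint x    f      f*x       f*x^2")
for lo, hi, f in classes:
    x = (lo + hi) / 2              # class midpoint represents the whole class
    sum_fx += f * x
    sum_fx2 += f * x * x
    print(f"{x:10.1f} {f:4d} {f * x:8.1f} {f * x * x:11.1f}")

# Sample variance and standard deviation from the frequency table.
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)
print(f"n = {n}, sum(fx) = {sum_fx}, sum(fx^2) = {sum_fx2}")
print(f"variance = {variance:.2f} pcf^2, standard deviation = {std_dev:.2f} pcf")
```

With ∑f·x = 6898 and ∑f·x² = 746 636, the sketch gives s² ≈ 50.17 pcf² and s ≈ 7.08 pcf.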


Estimation of Standard Deviation

Range Rule of Thumb

$$s \approx \frac{\text{range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4}$$

Usual Sample Values

Minimum 'usual' value = (mean) − 2 × (standard deviation) = $\bar{x} - 2s$

Maximum 'usual' value = (mean) + 2 × (standard deviation) = $\bar{x} + 2s$

The Empirical Rule

If the histogram is bell shaped:

1. Approximately 68% of all observations fall within one standard deviation of the mean.

2. Approximately 95% of all observations fall within two standard deviations of the mean.

3. Approximately 99.7% of all observations fall within three standard deviations of the mean.


Example 2.16

A bell-shaped data set contains sample data. The data set has a mean of 250 and a standard deviation of 30.

A) What is the range for usual data?

B) What is the range for unusual data?

C) Is a value of 130 unusual?
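Example 2.16 can be worked with the "usual values" rule (mean ± 2 standard deviations). The sketch below is one way to express it; the function names `usual_range` and `is_unusual` are illustrative.

```python
def usual_range(mean, std_dev):
    """Usual values lie between mean - 2s and mean + 2s."""
    return mean - 2 * std_dev, mean + 2 * std_dev

def is_unusual(value, mean, std_dev):
    """A value is unusual if it falls outside the usual range."""
    low, high = usual_range(mean, std_dev)
    return value < low or value > high

mean, std_dev = 250, 30
low, high = usual_range(mean, std_dev)

print(f"A) usual values: from {low} to {high}")
print(f"B) unusual values: below {low} or above {high}")
print(f"C) 130 unusual? {is_unusual(130, mean, std_dev)} "
      f"(it lies {(130 - mean) / std_dev:.1f} standard deviations from the mean)")
```

The usual range is 190 to 310, so 130, which is four standard deviations below the mean, is unusual.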

Example 2.17* (Example 2.9)

The following series is the minimum monthly flow (m³/s) in each of the 20 years 1957 to 1976 at Bywell on the River Tyne:

21, 36, 4, 16, 21, 21, 23, 11, 46, 10, 25, 12, 9, 16, 10, 6, 11, 12, 17, 3
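The example does not state a task here; given the section it appears in, one plausible use of these data is to compare the range rule of thumb (s ≈ range/4) with the sample standard deviation computed from all the values. The sketch below does that under this assumption; the variable names are illustrative.

```python
import statistics

# Minimum monthly flows (m^3/s) at Bywell, River Tyne, 1957-1976.
flows = [21, 36, 4, 16, 21, 21, 23, 11, 46, 10,
         25, 12, 9, 16, 10, 6, 11, 12, 17, 3]

# Range rule of thumb: s is roughly (highest value - lowest value) / 4.
s_estimate = (max(flows) - min(flows)) / 4

# Sample standard deviation computed from every value (n - 1 denominator).
s_actual = statistics.stdev(flows)

print(f"range rule estimate: s = {s_estimate:.2f} m^3/s")
print(f"computed value:      s = {s_actual:.2f} m^3/s")
```

For these data the rule gives (46 − 3)/4 = 10.75 m³/s against a computed value of about 10.56 m³/s, so the estimate is close in this case.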


Example 2.18

Water content measurements of a soil sample yielded data that followed a bell-shaped distribution with a mean of 26% and a standard deviation of 5%.

A) What is the range for usual data?

B) What is the range for unusual data?

C) Would water contents of 10% and 60% be considered usual values?


Chebyshev's Theorem

Applies to distributions of any shape.

The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.

For K = 2, at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.

For K = 3, at least 8/9 (about 89%) of all values lie within 3 standard deviations of the mean.

Example 2.19

The mean value of Specific Gravity results of 20 soil samples was 2.68 with a standard deviation

of 0.02.

Find the range at which at least 75% of the data will fall using Chebyshev’s Theorem
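Example 2.19 can be approached by inverting Chebyshev's bound: setting 1 − 1/K² equal to the required fraction gives K = 1/√(1 − fraction). The sketch below applies this to the specific gravity data; the function names are illustrative.

```python
import math

def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - 1 / k ** 2

def k_for_fraction(fraction):
    """Smallest k whose Chebyshev bound equals the given fraction (0 < fraction < 1)."""
    return 1 / math.sqrt(1 - fraction)

def interval(mean, std_dev, k):
    """Interval mean - k*s to mean + k*s."""
    return mean - k * std_dev, mean + k * std_dev

mean_sg, s_sg = 2.68, 0.02            # specific gravity results of Example 2.19
k = k_for_fraction(0.75)              # at least 75% of the data
low, high = interval(mean_sg, s_sg, k)

print(f"K = {k}, bound = {chebyshev_min_fraction(k):.0%}")
print(f"at least 75% of the values lie between {low:.2f} and {high:.2f}")
```

A fraction of 75% corresponds to K = 2, so at least 75% of the specific gravity values lie between 2.64 and 2.72, whatever the shape of the distribution.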


Example 2.20

Annual rainfall. If the annual rainfalls in a city are 22, 37, 25, 62, 33, 51, 56, 42, 53, and 49 cm over a 10-year period, find the minimum percentage of the data values that will fall between 36 and 50 cm.


Tutorials

Tutorial 2.9

Total cycle times of trucks hauling asphaltic concrete on a highway project were observed and

found to be (in minutes):

30 18 17 24 20

20 16 24 25 19

24 28 23 23 23

17 18 11 18

Find the sample mean, Median, Mode, Standard Deviation and Coefficient of variation.

Tutorial 2.10

Fifteen lots of 100 sections each of 108-in. concrete pipe were tested for porosity. The number

of sections in each lot failing to meet the standards were:

1 5 6 3 0

7 4 9 4 1

3 2 1 8 6

Compute the sample mean, Mode, Median, Standard Deviation and Variance, and coefficient of

variation. If the plant continues to manufacture pipe of this quality, can you suggest a possible

technique for quality control of the product? What cost factors enter the problem?

Tutorial 2.11

The following values of shear strength (in tons per square foot) were determined by unconfined compression tests of soil from Waukell Creek, California. Compute the mean, median, mode and standard deviation, along with the coefficient of variation.

0.12 0.21 0.36 0.37 0.39 0.46 0.47 0.50 0.50 0.51

0.53 0.58 0.61 0.62 0.77 0.81 0.93 1.05 1.59 1.73


Tutorial 2.12

The times (in seconds) for loading, swinging, dumping, and returning for a shovel moving

sandstone on a dam project were measured as shown in Table below.

Compute the sample mean, variance, and coefficient of variation of each set of data. If the variability in total time is causing hauling problems, which operation should be studied as the primary source of variability in the total? Which of the summary statistics would be most useful in such a study?

Load  Swing  Dump  Return  Total      Load  Swing  Dump  Return  Total
  25     10     2       8     45        18      7     2       8     35
  17      9     2       9     37        15      8     2      10     35
  14      8     2       9     33        25     10     2      10     35
  19     10     2       9     40        14      8     2      10     34
  18      8     2      10     38        14      8     2       9     33
  16     10     2      15     43        21      7     2       8     38
  19      7     2       8     36        17     10     2       9     38
  22     11     2       8     43        15      9     2      11     37
  17      9     2       8     36        16     12     2      12     43
  15     10     2       9     36        21      8     2      10     41
  20      8     2      11     41        13      9     2       9     36
  15     25     2      10     52        15     10     2       9     51
  26     10     2      13     51
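One way to compare the operations is to tabulate the mean, variance, and coefficient of variation of each component side by side. The sketch below does this with Python's `statistics` module. It assumes the sample (n - 1) variance, reads the table as a left-hand block of 13 cycles followed by a right-hand block of 12 cycles, and leaves out the Total column. The dictionary and variable names are illustrative.

```python
import statistics

# Cycle-time components in seconds, transcribed from the table
# (left-hand block first, then the right-hand block).
load = [25, 17, 14, 19, 18, 16, 19, 22, 17, 15, 20, 15, 26,
        18, 15, 25, 14, 14, 21, 17, 15, 16, 21, 13, 15]
swing = [10, 9, 8, 10, 8, 10, 7, 11, 9, 10, 8, 25, 10,
         7, 8, 10, 8, 8, 7, 10, 9, 12, 8, 9, 10]
dump = [2] * 25
ret = [8, 9, 9, 9, 10, 15, 8, 8, 8, 9, 11, 10, 13,
       8, 10, 10, 10, 9, 8, 9, 11, 12, 10, 9, 9]

operations = {"Load": load, "Swing": swing, "Dump": dump, "Return": ret}

summary = {}
for name, times in operations.items():
    mean = statistics.mean(times)
    var = statistics.variance(times)  # sample variance, divisor n - 1 (assumed)
    cv = var ** 0.5 / mean            # coefficient of variation (assumed s / mean)
    summary[name] = {"mean": mean, "variance": var, "cv": cv}
    print(f"{name:<6} mean = {mean:6.2f}  variance = {var:6.2f}  CV = {cv:.1%}")

# Operation with the largest spread in absolute terms and in relative terms.
largest_variance = max(summary, key=lambda k: summary[k]["variance"])
largest_cv = max(summary, key=lambda k: summary[k]["cv"])
print("largest variance:", largest_variance)
print("largest CV:", largest_cv)
```

With the values as transcribed here, the largest variance and the largest coefficient of variation do not belong to the same operation. Variance measures spread in absolute seconds, while the coefficient of variation measures spread relative to the mean, so the two can rank the operations differently.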

Tutorial 2.13

The following data for the Ogden Valley artesian aquifer have been collected over a period of years. Find the sample means, modes, medians, variances, standard deviations, and coefficients of variation.

Ogden Valley artesian aquifer discharge and recharge data

Year    Measured discharge (acre-ft)    Estimated recharge (acre-ft)
1935    11300                           11400
1936    12800                           14600
1937    12700                           13600
1938    10400                           10100
1939    10800                           9900
1940    11500                           1200
1941    9900                            9700
1942    11900                           11800
1943    1300                            12700
1944    13700                           13600
1945    14100                           14600
1946    15200                           14900
1947    15100                           14300
1948    15400                           14200
1949    16000                           17400
1950    16500                           16400
1951    16700                           14900


Tutorial 2.14

Embankment material for zone 1 of the Santa Rosita Dam in Southern Chihuahua, Mexico, will come from a borrow pit downstream from the dam site at a location that is frequently flooded. A cofferdam 800 m long is needed, and the contractor needs to know the optimum construction height. Normal flow (200 m³/s) requires a height of 3 m. Flooding will involve a 3-week delay in construction. Maximum flow rates from 1921 to 1965 were:

Year             1921  1922  1923  1924  1925  1926  1927  1928  1929
Inflow (m³/s)    1340  1380  1450   618   523   508  1220     -  1060

Year             1930  1931  1932  1933  1934  1935  1936  1937  1938
Inflow (m³/s)     412   184  1480   876   113   516  1780  1090   944

Year             1939  1940  1941  1942  1943  1944  1945  1946  1947
Inflow (m³/s)     397   282   353   597   995   611   985  1430   778

Year             1948  1949  1950  1951  1952  1953  1954  1955  1956
Inflow (m³/s)    1280  1020  1300  1000  1890   611   409   780   674

Year             1957  1958  1959  1960  1961  1962  1963  1964  1965
Inflow (m³/s)     969   870   329   458  1556  1217   819   576  1324

The contractor's options are:

Cofferdam height (m)    Capacity (m³/s)    Cost (US dollars)
3                       200                15,600
4.5                     550                18,600

The cost of a 3-week delay from flooding of the borrow pit is estimated as $30,000.

Compute the sample mean, mode, median, standard deviation, and variance. Will a histogram be useful in the analysis of the contractor's decision? Why? How would you structure the decision situation? How does time enter the problem?
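One possible way to structure the decision is to estimate, from the record, how often the annual maximum flow exceeded each cofferdam capacity, and then to compare construction cost plus expected delay cost. The sketch below is only an illustration under strong assumptions that the tutorial does not state: the borrow pit is exposed for a single flood season, at most one delay occurs, and the past relative frequency is used as the probability of flooding. The function name `exceedance_frequency` and the dictionary layout are hypothetical.

```python
# Annual maximum inflows in m^3/s, 1921-1965. The year 1928 has no record
# in the table and is left out, which gives 44 values.
flows = [1340, 1380, 1450, 618, 523, 508, 1220, 1060,
         412, 184, 1480, 876, 113, 516, 1780, 1090, 944,
         397, 282, 353, 597, 995, 611, 985, 1430, 778,
         1280, 1020, 1300, 1000, 1890, 611, 409, 780, 674,
         969, 870, 329, 458, 1556, 1217, 819, 576, 1324]

DELAY_COST = 30_000  # US dollars for one 3-week delay (from the tutorial)

# Contractor's options: height in m -> (capacity in m^3/s, cost in US dollars).
options = {3.0: (200, 15_600), 4.5: (550, 18_600)}


def exceedance_frequency(values, capacity):
    """Fraction of recorded years whose maximum flow exceeded the capacity."""
    return sum(v > capacity for v in values) / len(values)


expected_cost = {}
for height, (capacity, build_cost) in options.items():
    p_flood = exceedance_frequency(flows, capacity)
    # Assumption: one flood season of exposure and at most one delay.
    expected_cost[height] = build_cost + p_flood * DELAY_COST
    print(f"height {height} m: relative frequency of flooding = {p_flood:.3f}, "
          f"expected cost = ${expected_cost[height]:,.0f}")
```

Time enters through the exposure assumption: a construction period that spans several flood seasons, or only the dry part of one year, would change the flooding probability used here.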


Tutorial 2.15

The maximum annual flood flows for the Feather River at Oroville, California, for the period 1902 to 1960 are as follows. The data have been ordered, but the years of occurrence are also given. Compute the sample mean, median, mode, standard deviation, and variance. Plot a histogram and a frequency distribution. If a 1-year construction project is being planned and a flow of 20,000 cfs or greater will halt construction, what, in the past, has been the relative frequency of such flows?

Year            1907     1956     1928     1938     1940     1909     1960
Floods (cfs)    230,000  203,000  185,000  185,000  152,000  140,000  135,000

Year            1906     1914     1904     1953     1942     1943     1958
Floods (cfs)    128,000  122,000  118,000  113,000  110,000  108,000  102,000

Year            1903     1927     1951     1936     1941     1957     1915
Floods (cfs)    102,000  94,000   92,100   85,400   84,200   83,100   81,400

Year            1905     1917     1930     1911     1919     1925     1921
Floods (cfs)    81,000   80,400   80,100   75,400   65,900   64,300   62,300

Year            1945     1952     1935     1926     1954     1946     1950
Floods (cfs)    60,100   59,200   58,600   55,700   54,800   54,400   46,400

Year            1947     1916     1924     1902     1948     1922     1959     1910
Floods (cfs)    45,600   42,400   42,400   41,000   36,700   36,400   34,500   31,000
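A frequency distribution and the relative frequency of construction-halting flows can be tabulated as in the sketch below. The class width of 25,000 cfs is an assumption made for illustration, since the tutorial does not prescribe one, and the variable names are hypothetical.

```python
from collections import Counter

# Maximum annual flood flows in cfs, as listed in the table (43 values).
floods = [230_000, 203_000, 185_000, 185_000, 152_000, 140_000, 135_000,
          128_000, 122_000, 118_000, 113_000, 110_000, 108_000, 102_000,
          102_000, 94_000, 92_100, 85_400, 84_200, 83_100, 81_400,
          81_000, 80_400, 80_100, 75_400, 65_900, 64_300, 62_300,
          60_100, 59_200, 58_600, 55_700, 54_800, 54_400, 46_400,
          45_600, 42_400, 42_400, 41_000, 36_700, 36_400, 34_500, 31_000]

CLASS_WIDTH = 25_000  # assumed class width in cfs
THRESHOLD = 20_000    # flow in cfs that halts construction (from the tutorial)

n = len(floods)

# Frequency distribution: key each flood by the lower limit of its class,
# where a class covers [lower, lower + CLASS_WIDTH).
counts = Counter((q // CLASS_WIDTH) * CLASS_WIDTH for q in floods)
for lower in sorted(counts):
    upper = lower + CLASS_WIDTH
    print(f"{lower:>7,} to under {upper:>7,} cfs: "
          f"{counts[lower]:2d}  (relative frequency {counts[lower] / n:.3f})")

# Relative frequency of years with a flow of THRESHOLD or greater.
rel_freq_halt = sum(q >= THRESHOLD for q in floods) / n
print(f"relative frequency of flows >= {THRESHOLD:,} cfs: {rel_freq_halt:.3f}")
```

Every flood in the listing is above 20,000 cfs, so the relative frequency computed from the 43 listed values is 1.0.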

Tutorial 2.16

The water-treatment plant at an air station in California was constructed for a design capacity of 4,500,000 gal/day (domestic use). It is nearly always necessary to suspend lawn irrigation when demand exceeds supply. There are, of course, attendant losses. Measured demands during July and August 1965 (weekdays only) were (in thousands of gallons per day, ordered data):

2298 3205 3325 3609 3918 3992 4057 4188 4289 4363
4377 4448 4450 4524 4536 4565 4591 4657 4666 4670
4724 4737 4763 4784 4816 4817 4852 4887 4905 4908
4923 4941 4993 4998 5035 5041 5058 5142 5152 5152
5330 5535

Compute the sample mean, mode, median, standard deviation, and variance. Construct a cumulative histogram in which 4,500,000 gal/day is one of the interval boundaries. On a relative frequency basis, how often did demand exceed capacity?
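The cumulative counts behind such a histogram can be tabulated as in the following sketch. The class boundaries (2,000 to 6,000 thousand gal/day in steps of 500) are an assumption chosen so that the design capacity of 4,500 falls on a boundary; other widths would work equally well. The variable names are illustrative.

```python
# Measured weekday demands in thousands of gallons per day (42 values).
demand = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363,
          4377, 4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670,
          4724, 4737, 4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908,
          4923, 4941, 4993, 4998, 5035, 5041, 5058, 5142, 5152, 5152,
          5330, 5535]

CAPACITY = 4500  # design capacity in thousands of gallons per day

# Assumed class boundaries; 4500 is one of them, as the tutorial requires.
boundaries = list(range(2000, 6001, 500))

n = len(demand)

# Cumulative frequency: number of days with demand at or below each boundary.
cumulative = {b: sum(d <= b for d in demand) for b in boundaries}
for b, count in cumulative.items():
    print(f"demand <= {b}: {count:2d} days  (cumulative relative frequency {count / n:.3f})")

# Relative frequency of days on which demand exceeded the design capacity.
exceed_rel_freq = sum(d > CAPACITY for d in demand) / n
print(f"demand exceeded capacity on {exceed_rel_freq:.1%} of the measured days")
```

Because 4,500 is a class boundary, the exceedance frequency is simply one minus the cumulative relative frequency at that boundary.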