UKP 6053: ANALISIS DATA DAN PENAKSIRAN DESCRIPTIVE STATISTICS AND PROBABILITY Lecturer; Prof Madya Dr Abd Wahab Bin Jusoh Prepared by; Mr Sazliman Ismail (M20082000084)

Exercise Lesson 4,5,6


UKP 6053: ANALISIS DATA DAN PENAKSIRAN

DESCRIPTIVE STATISTICS AND

PROBABILITY

Lecturer;

Prof Madya Dr Abd Wahab Bin Jusoh

Prepared by;

Mr Sazliman Ismail (M20082000084)

Mr Mahmud Ahmad (M20082000083)

Mr Tho Siew Wei (M20082000303)

Mrs Norazlilah Md. Nordin (M20082000300)

Miss Wong Yew Tuang (M20082000XXX)


EXERCISES (Measures of Central Tendency)

3.1) Explain how the value of the median is determined for a data set that contains an

odd number of observations and for a data set that contains an even number of

observations.

Solution:

To determine the median of a data set that contains an odd number of observations, first rank the data in increasing order, then find the position of the middle term in a data set with n values as follows:

Position of the middle term = (n + 1)/2. The value at this position is the median.

For example, the marks for a sample of five students of class 5R are as follows:

76 56 85 63 91

So to find the median we must do the following steps:

STEP 1: rank the data in increasing order

56 63 76 85 91

STEP 2: find the position of middle term

where n = 5, so the position of the middle term = (n + 1)/2 = (5 + 1)/2 = 3

Therefore, the median mark for this sample is 76, the value of the third term.

If the number of observations is even, we also rank the data in increasing order, and the median is the average of the values of the two middle terms. For example, we have the following data:

4 6 11 9 20 14

STEP 1: rank the data in increasing order

4 6 9 11 14 20

STEP 2: find the position of middle term

where n = 6, so the position of the middle term = (n + 1)/2 = (6 + 1)/2 = 3.5

So the median is the average of the third and fourth terms: (9 + 11)/2 = 10.
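The two-step procedure can be sketched in Python. This is only an illustration: the function name `median` and the variable names are assumptions, not part of the original solution.

```python
def median(values):
    """Return the median using the (n + 1)/2 position rule."""
    ranked = sorted(values)        # STEP 1: rank the data in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                 # odd n: a single middle term
        return ranked[mid]
    # even n: average of the two middle terms
    return (ranked[mid - 1] + ranked[mid]) / 2


odd_result = median([76, 56, 85, 63, 91])     # marks of the five students
even_result = median([4, 6, 11, 9, 20, 14])   # the six-value data set
print(odd_result, even_result)
```

For the odd case the function picks the third ranked value, and for the even case it averages the third and fourth ranked values.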

3.2) Briefly explain the meaning of an outlier. Is the mean or the median a better

measure of central tendency for a data set that contains an outlier? Illustrate with

the help of an example.

Solution:

Outliers, or extreme values, are values that are very small or very large relative to the majority of the values in a data set. The mean is not always the best measure of central tendency because it is heavily influenced by outliers, whereas the median is not. The median is therefore the better measure for a data set that contains an outlier.

Example: The table below shows the monthly salary for 7 staff at Rahman Industries.

Staff Salary/month

1 1500

2 1000

3 800

4 15000

5 1200

6 1000

7 1600

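Mean for sample with the outlier = (1500 + 1000 + 800 + 15000 + 1200 + 1000 + 1600)/7 = 22100/7 = 3157.14

Mean for sample without the outlier = (1500 + 1000 + 800 + 1200 + 1000 + 1600)/6 = 7100/6 = 1183.33

The median is 1200 with the outlier and (1000 + 1200)/2 = 1100 without it, so it barely changes. The median is not influenced by outliers because it gives the center of a histogram, with half of the data values to the left of the median and half to the right of the median.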
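As a rough check of the salary example, the sketch below uses Python's standard `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import mean, median

salaries = [1500, 1000, 800, 15000, 1200, 1000, 1600]
without_outlier = [s for s in salaries if s != 15000]   # drop the outlier

mean_with = mean(salaries)
mean_without = mean(without_outlier)
median_with = median(salaries)
median_without = median(without_outlier)

print(f"mean:   {mean_with:.2f} -> {mean_without:.2f}")
print(f"median: {median_with} -> {median_without}")
```

The mean falls by almost two thousand when the single salary of 15000 is removed, while the median moves by only 100.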


3.4) Which of the three measures of central tendency (mean, median, mode) can be

calculated for quantitative data only, and which can be calculated for both

quantitative and qualitative data? Illustrate with example.

Solution:

Mode can be calculated for both kinds of data, qualitative and quantitative,

whereas the mean and median can be calculated for only quantitative data.

Example: The races of 10 students who are members of the Science and Mathematics Club are:

Malay Chinese Chinese Indian Chinese Malay Chinese Indian Chinese Malay

Because Chinese occurs more frequently than the other categories (5 times, against 3 for Malay and 2 for Indian), it is the mode for this data set. We cannot calculate the mean or the median for this data set.
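A minimal sketch of finding the mode of qualitative data, assuming the ten observations listed in the example. It uses `collections.Counter` from the standard library, and the variable names are illustrative.

```python
from collections import Counter

races = ["Malay", "Chinese", "Chinese", "Indian", "Chinese",
         "Malay", "Chinese", "Indian", "Chinese", "Malay"]

counts = Counter(races)                      # frequency of each category
mode, frequency = counts.most_common(1)[0]   # most frequent category
print(mode, frequency)
```

Counting works for labels as well as numbers, which is why the mode is available for qualitative data while the mean is not.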

3.18) The following data give the 1998 profits (million dollars) of the 10 airlines listed

in ‘Fortune magazine’s top 1000 U.S Corporations’. A number with a negative

sign represents a loss for that airline. The profits are respectively,

1314 821 1001 -286 538 383 433 -120 109 124

Compute the mean and median. Do these data have a mode? Explain.

Solution:

Mean = (1314 + 821 + 1001 – 286 + 538 + 383 + 433 – 120 + 109 + 124)/10 = 4317/10 = 431.7 million dollars

To find the median, we must first rank the data in increasing order:

-286 -120 109 124 383 433 538 821 1001 1314

Then find the position of the middle term = (n + 1)/2 = (10 + 1)/2 = 5.5

So the median is the average of the fifth and sixth terms: (383 + 433)/2 = 408 million dollars.

These data have no mode because each value occurs only once.

3.23) The following data give the 1998 revenues (million dollars) of the eight

automotive retailing and service companies.

1410 1604 1630 2298 2616 3045 5189 17487

(a) Calculate the mean and median for these data

(b) Do these data contain an outlier? If so, drop the outlier and recalculate the

mean and median. Which of these two summary measure changes by a large

amount when you drop the outlier

(c) Which is the better summary measure for these data, the mean or the median ?

Explain.

Solution:

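a) Mean = (1410 + 1604 + 1630 + 2298 + 2616 + 3045 + 5189 + 17487)/8 = 35279/8 = 4409.88 million dollars

The data are already ranked in increasing order. With n = 8, the position of the middle term = (8 + 1)/2 = 4.5, so the median is the average of the fourth and fifth terms: (2298 + 2616)/2 = 2457 million dollars.

b) Yes, these data contain an outlier, which is 17487 million dollars. After dropping it:

Mean = 17792/7 = 2541.71 million dollars

To find the median, rank the data in increasing or decreasing order. These data are already arranged in increasing order, so we can go on to find the position of the middle term with n = 7.

The position of the middle term = (7 + 1)/2 = 4, hence the median is 2298 million dollars.

The mean changes by a large amount when we drop the outlier (from 4409.88 to 2541.71), whereas the median changes only from 2457 to 2298.

c) The median is the better summary measure for these data because the mean is strongly affected by the outlier, whereas the median is not.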

EXERCISES (Measures of dispersion : Ungrouped data)

3.37) When is the value of the standard deviation for a data set zero? Give one example.

Calculate the standard deviation for the example and show that its value is zero.

Solution:

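The standard deviation of a data set is zero only when all the values are identical, so that every deviation (x – μ) is zero. For example, if all six students in a class scored 70, then μ = 70, Σ(x – μ)² = 0 and σ = √(0/6) = 0.

This is different from the sum of the deviations of the x values from the mean, which is always zero for any data set. That means Σ(x – μ) = 0.

Example: The final examination results in Physics for 6 students are 58, 76, 64, 68, 81 and 73, with mean μ = 420/6 = 70.

x      x – μ
58     58 – 70 = -12
76     76 – 70 = +6
64     64 – 70 = -6
68     68 – 70 = -2
81     81 – 70 = +11
73     73 – 70 = +3
Sum    0

From the table, the sum of the deviations of the x values from the mean is zero. The standard deviation of these marks is not zero, however, because the individual deviations are not all zero: Σ(x – μ)² = 144 + 36 + 36 + 4 + 121 + 9 = 350, so σ = √(350/6) ≈ 7.64.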
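The sketch below checks the two properties numerically. It assumes the six marks form a population, so it uses `statistics.pstdev`. The list `identical` is a hypothetical data set in which every mark equals 70; it is not taken from the exercise.

```python
from statistics import pstdev

identical = [70, 70, 70, 70, 70, 70]   # hypothetical: all values are the same
marks = [58, 76, 64, 68, 81, 73]       # the six Physics results, mean 70

sd_identical = pstdev(identical)             # zero spread
sd_marks = pstdev(marks)                     # spread around the mean of 70
deviation_sum = sum(x - 70 for x in marks)   # deviations always cancel out

print(sd_identical, round(sd_marks, 2), deviation_sum)
```

Only the data set with identical values has a standard deviation of zero, although the deviations of the real marks still sum to zero.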


3.41) The following data give the weekly food expenditures for a sample of five

families.

65 82 92 116 170

(a) Find the mean for these data. Calculate the deviations of the data values from

the mean. Is the sum of these deviations zero?

(b) Calculate the range, variance and standard deviation.

Solution:

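a) Mean = (65 + 82 + 92 + 116 + 170)/5 = 525/5 = 105 dollars

x      x – mean
65     65 – 105 = -40
82     82 – 105 = -23
92     92 – 105 = -13
116    116 – 105 = +11
170    170 – 105 = +65
Sum    0

The sum of these deviations is zero.

b) Range = 170 – 65 = 105 dollars

x      x²
65     4225
82     6724
92     8464
116    13456
170    28900
Σx = 525    Σx² = 61769

Variance, s² = [Σx² – (Σx)²/n]/(n – 1) = (61769 – 525²/5)/(5 – 1) = (61769 – 55125)/4 = 6644/4 = 1661

Standard deviation, s = √1661 = 40.76 dollars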
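These figures can be cross-checked with Python's `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) formulas. The variable names are illustrative assumptions.

```python
from statistics import mean, stdev, variance

expenditures = [65, 82, 92, 116, 170]

x_bar = mean(expenditures)
deviations = [x - x_bar for x in expenditures]
data_range = max(expenditures) - min(expenditures)
s2 = variance(expenditures)   # sample variance, divides by n - 1
s = stdev(expenditures)       # square root of the sample variance

print(x_bar, sum(deviations), data_range, s2, round(s, 2))
```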

EXERCISES : (Measure of dispersion : Grouped Data)


3.66) The following table gives information on the amounts (dollars) of electric bills for

August 1999 for a sample of 50 families.

Amount of electric bill (dollars) Number of families

0 to less than 20 5

20 to less than 40 16

40 to less than 60 11

60 to less than 80 10

80 to less than 100 8

Find the mean, variance and standard deviation.

Give a brief interpretation of the values in the column labeled mf in your table of calculations. What does Σmf represent?

Solution:

Since the data set includes only 50 families, it represents a sample.

Amount of electric bill (dollars)    f     m     mf      m²f
0 to less than 20                    5     10    50      500
20 to less than 40                   16    30    480     14400
40 to less than 60                   11    50    550     27500
60 to less than 80                   10    70    700     49000
80 to less than 100                  8     90    720     64800
Total                                n = 50      Σmf = 2500    Σm²f = 156200

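Mean, x̄ = Σmf/n = 2500/50 = 50 dollars

Variance, s² = [Σm²f – (Σmf)²/n]/(n – 1) = (156200 – 2500²/50)/(50 – 1) = (156200 – 125000)/49 = 31200/49 = 636.73

Standard deviation, s = √636.73 = 25.23 dollars

Here m is the midpoint of a class of electric bill amounts in dollars and f is the frequency of that class. Thus, mf represents the approximate total amount of the electric bills paid in August 1999 by the families in a class, and Σmf represents the approximate total amount paid by all 50 families in the sample.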
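A short sketch of the grouped-data calculation, assuming the sample formulas with divisor n – 1. The tuple layout (lower limit, upper limit, frequency) is an illustrative choice.

```python
from math import sqrt

# (lower limit, upper limit, number of families)
classes = [(0, 20, 5), (20, 40, 16), (40, 60, 11), (60, 80, 10), (80, 100, 8)]

n = sum(f for _, _, f in classes)
rows = [((lo + hi) / 2, f) for lo, hi, f in classes]   # (midpoint m, frequency f)
sum_mf = sum(m * f for m, f in rows)
sum_m2f = sum(m * m * f for m, f in rows)

grouped_mean = sum_mf / n
grouped_var = (sum_m2f - sum_mf ** 2 / n) / (n - 1)    # sample variance
grouped_sd = sqrt(grouped_var)

print(n, sum_mf, sum_m2f, grouped_mean, round(grouped_var, 2), round(grouped_sd, 2))
```

Because every bill in a class is replaced by the class midpoint, these results are approximations of the values that the raw data would give.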

EXERCISES (Chebyshev’s Theorem & Empirical Rule)


3.76) The mean time taken by all participants to run a road race was found to be 220

minutes with a standard deviation of 20 minutes. Using Chebyshev’s theorem,

find the percentage of runners who ran this road race in

a) 180 to 260 minutes

b) 160 to 280 minutes

c) 170 to 270 minutes

Solution:

From the given information, for this distribution;

Mean, μ = 220

Standard deviation, σ = 20

a) As shown below, each of the two points, 180 and 260, is 40 units away from the mean.

k = 40/σ = 40/20 = 2

Then, 1 – 1/k² = 1 – 1/2² = 1 – 0.25 = 0.75 or 75%

According to Chebyshev’s theorem, at least 75% of the runners ran this road race in 180 to 260 minutes.

[Figure: number line with μ – 2σ = 180, μ = 220 and μ + 2σ = 260, where 180 – 220 = –40 and 260 – 220 = 40. At least 75% of the values lie in the shaded area.]

b) As shown below, each of the two points, 160 and 280, is 60 units away from the mean.

k = 60/σ = 60/20 = 3

Then, 1 – 1/k² = 1 – 1/3² = 1 – 0.111 = 0.889 or 88.9%

According to Chebyshev’s theorem, at least 88.9% of the runners ran this road race in 160 to 280 minutes.

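[Figure: number line with μ – 3σ = 160, μ = 220 and μ + 3σ = 280, where 160 – 220 = –60 and 280 – 220 = 60. At least 88.9% of the values lie in the shaded area.]

c) As shown below, each of the two points, 170 and 270, is 50 units away from the mean.

k = 50/σ = 50/20 = 2.5

Then, 1 – 1/k² = 1 – 1/(2.5)² = 1 – 0.16 = 0.84 or 84%

According to Chebyshev’s theorem, at least 84% of the runners ran this road race in 170 to 270 minutes.

[Figure: number line with μ – 2.5σ = 170, μ = 220 and μ + 2.5σ = 270, where 170 – 220 = –50 and 270 – 220 = 50. At least 84% of the values lie in the shaded area.]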
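The three parts follow the same pattern, which can be sketched as a small helper. The function name `chebyshev_bound` is hypothetical, and the sketch assumes an interval that is symmetric about the mean with k greater than 1.

```python
def chebyshev_bound(mean, sd, low, high):
    """Return (k, 1 - 1/k**2) for an interval symmetric about the mean."""
    k = (high - mean) / sd
    if (mean - low) / sd != k or k <= 1:
        raise ValueError("interval must be symmetric about the mean with k > 1")
    return k, 1 - 1 / k ** 2


for low, high in [(180, 260), (160, 280), (170, 270)]:
    k, bound = chebyshev_bound(220, 20, low, high)
    print(f"{low} to {high} minutes: k = {k}, at least {bound:.1%}")
```

The bound is a minimum that holds for any distribution shape, so the actual percentage of runners in each interval may be higher.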

3.80) The mean life of a certain brand of auto batteries is 44 months with a standard

deviation of 3 months. Assume that the lives of all auto batteries of this brand

have a bell-shaped distribution. Using the empirical rule, find the percentage of

auto batteries of this brand that have a life of


a) 41 to 47 months

b) 38 to 50 months

c) 35 to 53 months

Solution:

We use the empirical rule to find the required percentages because the distribution of battery lives follows a bell-shaped curve. From the given information, for this distribution,

x̄ = 44 months and s = 3 months

a) Each of the two points, 41 and 47, is 3 units away from the mean. Therefore,

k = 3/3 = 1

Thus, the distance between 41 and 44 and between 44 and 47 is equal to s.

From the empirical rule, because the area within one standard deviation of the mean is approximately 68% for a bell-shaped curve, approximately 68% of the auto batteries have a life between 41 and 47 months.

b) Each of the two points, 38 and 50, is 6 units away from the mean. Therefore,

k = 6/3 = 2

Thus, the distance between 38 and 44 and between 44 and 50 is equal to 2s.


From the empirical rule, because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the auto batteries have a life between 38 and 50 months.

c) Each of the two points, 35 and 53, is 9 units away from the mean. Therefore,

k = 9/3 = 3

Thus, the distance between 35 and 44 and between 44 and 53 is equal to 3s.

From the empirical rule, because the area within three standard deviations of the mean is approximately 99.7% for a bell-shaped curve, approximately 99.7% of the auto batteries have a life between 35 and 53 months.

EXERCISES (Measure of Position)

2.89) The following data give the speeds of 13 cars, measured by radar, traveling on

interstate highway I-84.


73 75 69 68 78 69 74 76 72 79 68

77 71

(a) Find the values of three quartiles and interquartile range.

(b) Calculate the (approximate) value of the 35th percentile.

(c) Compute the percentile rank of 71.

Solution:

(a) Find the values of the three quartiles and the interquartile range.

First, we rank the given speeds in increasing order. Then we calculate the three quartiles as follows:

68 68 69 69 71 72 73 74 75 76 77 78 79

Q2 (median) = value of the 7th term = 73

Values less than the median: 68 68 69 69 71 72

Q1 = (69 + 69)/2 = 69

Values greater than the median: 74 75 76 77 78 79

Q3 = (76 + 77)/2 = 76.5

Interquartile range, IQR = Q3 – Q1 = 76.5 – 69 = 7.5

(b) Calculate the (approximate) value of the 35th percentile.

First, we arrange the given speeds in increasing order:

68 68 69 69 71 72 73 74 75 76 77 78 79

The position of the 35th percentile is:

kn/100 = (35 × 13)/100 = 4.55th term

The value of the 4.55th term can be approximated by the average of the fourth and fifth terms in the ranked data. Therefore,

35th percentile, P35 = (69 + 71)/2 = 70

(c) Compute the percentile rank of 71.

First, arrange the speeds in increasing order.

68 68 69 69 71 72 73 74 75 76 77 78 79

In this data set, 4 of the 13 speeds are less than 71.

Hence, percentile rank of 71 = (4/13) × 100 = 30.8%

Rounding this answer to the nearest integer, we can state that about 31% of the speeds in this sample are less than 71.
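The sketch below reproduces the three parts using the textbook conventions followed in this solution: quartiles as medians of the lower and upper halves, and a percentile approximated by averaging the two neighbouring terms. Library routines such as `statistics.quantiles` use other interpolation rules and may return slightly different values. The helper name `median_of` is an assumption.

```python
def median_of(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


speeds = [73, 75, 69, 68, 78, 69, 74, 76, 72, 79, 68, 77, 71]
ranked = sorted(speeds)
n = len(ranked)

q2 = median_of(ranked)
q1 = median_of(ranked[: n // 2])          # values below the median position
q3 = median_of(ranked[(n + 1) // 2:])     # values above the median position
iqr = q3 - q1

position = 35 * n / 100                   # position of the 35th percentile
lo = int(position)                        # whole part of the position
p35 = (ranked[lo - 1] + ranked[lo]) / 2   # average of the two neighbouring terms

percentile_rank = sum(x < 71 for x in ranked) / n * 100
print(q1, q2, q3, iqr, position, p35, round(percentile_rank, 1))
```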

EXERCISES (Probability)

4.19) Which of the following values cannot be probabilities of events and why ?

1/5 0.97 -0.5 1.56 5/3 0.0 -2/7 1.0


Solution:

The probability of an event always lies in the range 0 to 1.

For a simple event E1: 0 ≤ P(E1) ≤ 1

For a compound event A: 0 ≤ P(A) ≤ 1

The values -0.5, 1.56, 5/3 and -2/7 cannot be probabilities of events, because -0.5 and -2/7 are less than 0, while 1.56 and 5/3 are greater than 1.

Meanwhile, 1/5, 0.97, 0.0 and 1.0 can be probabilities of events because they are in the range 0 to 1.
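The range check can be sketched with `fractions.Fraction`, which parses both decimal strings and fractions such as "5/3" exactly. The function name `is_valid_probability` is illustrative.

```python
from fractions import Fraction

candidates = ["1/5", "0.97", "-0.5", "1.56", "5/3", "0.0", "-2/7", "1.0"]


def is_valid_probability(text):
    """True when the value lies in the closed range 0 to 1."""
    return 0 <= Fraction(text) <= 1


valid = [c for c in candidates if is_valid_probability(c)]
invalid = [c for c in candidates if not is_valid_probability(c)]
print("valid:", valid)
print("invalid:", invalid)
```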

4.25) A hat contains 40 marbles, of which 18 are red and 22 are green. If one marble is randomly selected out of this hat, what is the probability that this marble is

(a) red (b) green

Solution:

Let n denote the total number of marbles in the hat and f1 represents number of red

marble and f2 represents number of green marble.

n = 40, f1 = 18, f2 = 22

a) Using the relative frequency concept of probability:

P (red) = f1/n = 18/40 = 0.45

b) Using the relative frequency concept of probability:

P (green) = f2/n = 22/40 = 0.55

MARBLES    Frequency, f    RELATIVE FREQUENCY
RED        18              18/40 = 0.45
GREEN      22              22/40 = 0.55
           n = 40          Sum = 1.00

4.26) A dice is rolled once. What is the probability that

(a) a number less than 5 is obtained

(b) a number 3 to 6 is obtained ?

Solution:

a) The experiment has a total of six outcomes = 1, 2, 3, 4, 5 and 6.

All these outcomes are equally likely. Let A be an event that a number less

than 5 is observed on the dice. Event A include four outcomes = 1, 2, 3 and 4;

that is A = {1, 2, 3, 4}

If any one of these four number is obtained, event A is said to occur. Hence,

P(A) = 4/6 = 2/3 = 0.67

b) Let B be an event that a number 3 to 6 is obtained. Event B include four

outcomes: 3, 4, 5, 6; that is B = {3, 4, 5, 6}

If anyone of these four numbers is obtained, event B is said to occur. Hence,

P(B) = 4/6 = 2/3 = 0.67

4.45) How many different outcomes are possible for four rolls of a dice?

Solution:

Suppose we roll a dice four times, where each step has 6 outcomes; 1, 2, 3, 4, 5

and 6.

Total outcomes for four rolls of a dice

= 6 × 6 × 6 × 6 = 6^4 = 1296

4.46) How many different outcomes are possible for 10 tosses of a coin?

Solution:

Suppose we toss a coin 10 times, where each step has 2 outcomes; head and tail.

Total outcomes for 10 tosses of a coin

= 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2^10 = 1024
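As a sanity check on the counting rule used in 4.45 and 4.46, the outcomes can be enumerated with `itertools.product`. The variable names are illustrative.

```python
from itertools import product

die_outcomes = list(product(range(1, 7), repeat=4))   # four rolls, 6 faces each
coin_outcomes = list(product("HT", repeat=10))        # ten tosses, head or tail

print(len(die_outcomes), 6 ** 4)
print(len(coin_outcomes), 2 ** 10)
```

Each enumerated outcome is a tuple such as (1, 1, 1, 1), and the list lengths match the products of the number of outcomes at each step.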

4.47) A statistical experiment has eight equally likely outcomes that are denoted by 1,

2, 3, 4, 5, 6,7, and 8. Let event A ={ 2, 5, 7 } and event B = { 2, 4, 8 }.

(a) Are events A and B mutually exclusive events?

(b) Are events A and B independent events?

(c) What are the complements of events A and B, respectively, and their

probabilities?

Solution:

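a) The following Venn diagram shows events A and B.

[Venn diagram: A = {2, 5, 7} and B = {2, 4, 8} overlap at outcome 2; outcomes 1, 3 and 6 lie outside both events. Caption: Mutually nonexclusive events A and B.]

Outcome 2 belongs to both A and B, so the two events can happen at the same time. Hence, events A and B are not mutually exclusive.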

b) Two events are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. So, A and B are independent events if:

P (A | B) = P (A) or P (B | A) = P (B)

In this case, P (A) = 3/8, whereas P (A | B) = P (A and B)/P (B) = (1/8)/(3/8) = 1/3, so P (A | B) ≠ P (A). Likewise, P (B | A) = 1/3 ≠ P (B) = 3/8.

Thus, events A and B are not independent events.

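c) The complement of A is A’ = {1, 3, 4, 6, 8} and the complement of B is B’ = {1, 3, 5, 6, 7}.

P (A’) = 1 – P (A) = 1 – 3/8 = 5/8

P (B’) = 1 – P (B) = 1 – 3/8 = 5/8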
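A minimal sketch of exercise 4.47 using Python sets and exact fractions, assuming the eight outcomes are equally likely as stated in the question. The helper name `prob` is illustrative.

```python
from fractions import Fraction

S = set(range(1, 9))    # sample space: outcomes 1 to 8
A = {2, 5, 7}
B = {2, 4, 8}


def prob(event):
    """Classical probability with equally likely outcomes."""
    return Fraction(len(event), len(S))


mutually_exclusive = A.isdisjoint(B)      # False: outcome 2 is shared
p_a_given_b = prob(A & B) / prob(B)       # conditional probability P(A | B)
independent = p_a_given_b == prob(A)
complement_a = S - A
complement_b = S - B

print(mutually_exclusive, p_a_given_b, prob(A), independent)
print(sorted(complement_a), prob(complement_a))
print(sorted(complement_b), prob(complement_b))
```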


4.54) The following table gives a two-way classification, based on gender and

employment status, of the civilian labor force age 16 to 24 years as of July 1999.

The numbers in the table are thousands.

EMPLOYED UNEMPLOYED TOTAL

MALE 11638 1337 12975

FEMALE 10540 1157 11697

TOTAL 22178 2494 24672

(a) If one person is selected at random from these young persons, find the

probability that this person is

i) unemployed

ii) a female

iii) employed given the person is male

iv) a female given the person is unemployed

(b) Are the events ‘employed’ and ‘unemployed’ mutually exclusive? What about

the events ‘unemployed’ and ‘male’?

(c) Are the events ‘female’ and ‘unemployed’ independent? Why or why not?

Solution:

a) i) P (unemployed) = 2494/24672 = 0.1011

ii) P (female) = 11697/24672 = 0.4741

iii) P (employed | male) = 11638/12975 = 0.8970

iv) P (female | unemployed) = 1157/2494 = 0.4639

b) Mutually exclusive events are events that cannot occur together.

For the events ‘employed’ and ‘unemployed’, a person is either employed or unemployed. A person cannot have both statuses, meaning that he or she cannot be employed and unemployed at the same time. Thus, ‘employed’ and ‘unemployed’ are mutually exclusive events.

[Venn diagram: within the sample space S, the regions ‘employed’ and ‘unemployed’ do not overlap.]

For the events ‘unemployed’ and ‘male’, a selected male can be employed or unemployed. The 1337 thousand unemployed males belong to both events at the same time. Hence, the events ‘unemployed’ and ‘male’ are not mutually exclusive.

[Venn diagram: within the sample space S, the regions ‘male’ and ‘unemployed’ overlap.]

c) Two events are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. Thus the events ‘female’ and ‘unemployed’ are independent if either P (female | unemployed) = P (female) or P (unemployed | female) = P (unemployed).

However, in this experiment, P (female | unemployed) = 0.4639 ≠ P (female) = 0.4741, and P (unemployed | female) = 1157/11697 = 0.0989 ≠ P (unemployed) = 0.1011. The occurrence of the event ‘unemployed’ affects the probability of the occurrence of the event ‘female’, and vice versa. Thus, ‘female’ and ‘unemployed’ are not independent events.
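The probabilities in exercise 4.54 can be recomputed from the two-way table with the sketch below. The dictionary layout and the helper name `count` are illustrative assumptions, and the counts are in thousands as in the table.

```python
# Two-way table from the exercise (numbers in thousands)
table = {
    ("male", "employed"): 11638, ("male", "unemployed"): 1337,
    ("female", "employed"): 10540, ("female", "unemployed"): 1157,
}
total = sum(table.values())


def count(gender=None, status=None):
    """Sum the cells that match the given gender and/or status."""
    return sum(v for (g, s), v in table.items()
               if gender in (None, g) and status in (None, s))


p_unemployed = count(status="unemployed") / total
p_female = count(gender="female") / total
p_employed_given_male = count("male", "employed") / count(gender="male")
p_female_given_unemployed = count("female", "unemployed") / count(status="unemployed")

# Independence would require P(female | unemployed) to equal P(female)
independent = abs(p_female_given_unemployed - p_female) < 1e-9

print(round(p_unemployed, 4), round(p_female, 4))
print(round(p_employed_given_male, 4), round(p_female_given_unemployed, 4))
print("independent:", independent)
```

The two probabilities compared in part (c) differ by about 0.01, so the check reports that the events are not independent.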