Statistics Assignment 05

8/3/2019 Statistics Assignment 05


Assignment 05
Course: Statistics (STA 240)

Submitted To: Md. Mortuza Ahmmed

Submitted By: Shamsul Islam Raisy (BSCE-11106005)

Submitting Date: 21/11/2011



Statistics:

Statistics is the study of the collection, organization, analysis and interpretation of data.

Application of statistics in civil engineering: Statistics has many uses in different fields of civil engineering, and one of the most important is disaster management. Every year our country faces many natural disasters. We collect, organize and analyze the data from these disasters and interpret it to find better ways of dealing with them.

Sample and population: In a class of students, the whole class is the population, and a part selected from it, such as the class monitor, is a sample.

Variable: A characteristic whose value can change from one situation or individual to another is a variable. Ex. the height of a student (different students can have different heights), or the size of a shirt (shirts come in many different sizes).

Scale of measurement:

1. Nominal scale: the nominal scale of measurement satisfies only the identity property of measurement. Values assigned to variables represent descriptive categories and have no inherent numerical value with respect to magnitude. Ex. gender (male, female), color (black, white, red), religion (Islam, Hinduism, Buddhism).

2. Ordinal scale: the ordinal scale has the properties of both identity and magnitude. Each value on the ordinal scale has a unique meaning, and it has an ordered relationship to every other value on the scale. Ex. in a horse race: win, place and show; in a class: superior, good, average and poor.

3. Interval scale: the interval scale of measurement has the properties of identity, magnitude and equal intervals. Differences between values are meaningful and are measured on the same scale, but there is no absolute zero. Ex. women's dress sizes and temperature.

Size Bust Waist Hips
8 32 24 35
10 34 26 37

Day Temperature
Sunday 60 °F
Monday 65 °F
Tuesday 70 °F

4. Ratio scale: the ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals and an absolute zero. Ex. the weight of an object, which can be zero. We can say that C weighs twice as much as B, and that D is heavier than A, B and C.

Object Weight
A 0
B 2
C 4
D 8
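The difference between the interval and ratio scales can be checked numerically. The sketch below converts the Fahrenheit readings from the temperature table to Celsius: the equal steps stay equal, but the ratio between two readings changes because neither scale has a true zero. The weight ratio "C is twice B" survives any change of unit. The table gives no unit for the weights, so treating them as kilograms and converting to grams (`TO_GRAMS`) is purely an illustrative assumption.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

temps_f = [60, 65, 70]                       # Sunday, Monday, Tuesday
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Interval scale: equal differences stay equal after changing the unit ...
steps_f = [b - a for a, b in zip(temps_f, temps_f[1:])]
steps_c = [b - a for a, b in zip(temps_c, temps_c[1:])]
# ... but the ratio changes, because the zero point of each scale is arbitrary.
ratio_f = temps_f[2] / temps_f[0]
ratio_c = temps_c[2] / temps_c[0]
print(steps_f, [round(s, 2) for s in steps_c])
print(round(ratio_f, 3), round(ratio_c, 3))

# Ratio scale: weight has a true zero, so "C is twice B" survives a unit change.
weights = {"A": 0, "B": 2, "C": 4, "D": 8}
TO_GRAMS = 1000                              # assumption: the weights are in kg
print(weights["C"] / weights["B"],
      (weights["C"] * TO_GRAMS) / (weights["B"] * TO_GRAMS))
```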



Line Graph:

[Figure: line graph of the monthly share index, August to December]

Month Share Index
August 700
September 800
October 1000
November 600
December 400

Bar Diagram:

[Figure: bar diagram of population by religion]

Religion Population
Muslim 200
Hindu 100
Other 50

Pie Chart:

[Figure: pie chart of population share by religion]

Religion Population Percent
Muslim 200 57.14%
Hindu 100 28.57%
Other 50 14.29%
Total 350 100%
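Each pie-chart share is the category's count divided by the total. When the chart is drawn by hand, the same share is also multiplied by 360° to get the sector angle. The small sketch below reproduces the percentages in the table and also computes the sector angles, which the table itself does not list.

```python
def pie_shares(counts):
    """Return {category: (percent, sector angle in degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in counts.items()
    }

population = {"Muslim": 200, "Hindu": 100, "Other": 50}
shares = pie_shares(population)
for religion, (percent, angle) in shares.items():
    print(f"{religion:<7} {percent:6.2f}% {angle:7.2f} deg")
```

Rounded to two decimals, the percentages match the table: 57.14%, 28.57% and 14.29%.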



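Scatter Diagram: {(x, y)} = {(10, 20), (20, 40), (30, 45), (40, 50), (50, 55), (60, 50), (70, 45)}, where x is age and y is weight.

[Figure: scatter diagram of weight against age]

Age Weight
10 20
20 40
30 45
40 50
50 55
60 50
70 45

Stem & Leaf Plot: 11, 14, 16, 21, 23, 24, 27, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 50, 51.

Stem Leaf
1 1, 4, 6
2 1, 3, 4, 7
3 0, 1, 5, 6, 7, 8
4 0, 1, 2, 3
5 0, 1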
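Building a stem & leaf plot is mechanical: sort the data, split each value into a stem and a leaf, and group the leaves by stem. The sketch below assumes, as in this example, that every value is a non-negative whole number, so the stem is the tens part and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens part (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):           # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [11, 14, 16, 21, 23, 24, 27, 30, 31, 35, 36, 37, 38,
        40, 41, 42, 43, 50, 51]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, ", ".join(str(leaf) for leaf in leaves))
```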

Histogram:

[Figure: histogram of the class frequencies below; bar height = frequency / class width]

Class Frequency Height (frequency / class width 5)
5 to 10 5 5/5 = 1
10 to 15 10 10/5 = 2
15 to 20 15 15/5 = 3
20 to 25 10 10/5 = 2
25 to 30 5 5/5 = 1

Central tendency:

Central tendency means finding a single value that represents the whole data set.

Measures of central tendency:



1. Mean: the mean, also known as the arithmetic mean, is the value obtained by dividing the total of all values by the number of values:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4
$\bar{x} = \frac{5 + 6 + 7 + 3 + 4 + 7 + 8 + 5 + 4}{9} = \frac{49}{9} = 5.44$
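The mean formula translates directly into code. The sketch below reproduces the 49 / 9 example. It also checks the two series results worked out later in this assignment, for 1, 2, ..., 500 and 1000, 2000, ..., 50000, by averaging the whole series instead of using the (n + 1) / 2 shortcut.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

print(round(arithmetic_mean([5, 6, 7, 3, 4, 7, 8, 5, 4]), 2))   # 49 / 9
# Brute-force check of the series results:
print(arithmetic_mean(list(range(1, 501))))               # 1, 2, ..., 500
print(arithmetic_mean(list(range(1000, 50001, 1000))))    # 1000, ..., 50000
```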

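2. Median: the median is the middle value obtained after arranging all the values in ascending order. It is generally used when an extreme value is present. When n is odd:
Median = value of the $\left(\frac{n + 1}{2}\right)$th item
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4 [in ascending order: 3, 4, 4, 5, 5, 6, 7, 7, 8].
Median = value of the $\left(\frac{9 + 1}{2}\right)$th item = 5th item = 5.
Again, if the given values are 3, 4, 4, 5, 5, 6, 7, 7, 8, 9 (n = 10, even), the median is the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th items. Here $\frac{10}{2} = 5$ and $\frac{10}{2} + 1 = 6$; the 5th item is 5 and the 6th item is 6, so
Median = $\frac{5 + 6}{2} = 5.5$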
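The odd and even cases of the median rule fit in one small function. The sketch below is equivalent to Python's `statistics.median`. It covers both examples above and the 13-value sample 4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8 from a later exercise.

```python
def median(values):
    """Middle value after sorting; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th

print(median([5, 6, 7, 3, 4, 7, 8, 5, 4]))
print(median([3, 4, 4, 5, 5, 6, 7, 7, 8, 9]))
print(median([4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8]))
```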

3. Mode: the mode of a data set is the value that appears most often. When two values appear equally most often, both of them are modes. Ex. given values: 3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9. Mode = 7.

4. Geometric mean: it is useful when we want to find the average change of percentages, ratios or growth rates over time. We cannot use the GM when there is a 0 or a negative value in the set. The GM is never greater than the AM. The GM of a set of n positive numbers is defined as the nth root of the product of the n values:
$GM = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$ {where n = number of values, $x_1, \ldots, x_n$ = the values}
To find the average percent increase over time, the formula is:
$GM = \sqrt[n]{\frac{\text{value at end of period}}{\text{value at start of period}}} - 1$ {where n = number of periods}

5. Harmonic mean: we cannot use the HM when there is a 0 or a negative value in the set. The formula is:
$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$ {where n = number of values, $x_1, \ldots, x_n$ = the values}

Arithmetic mean is the best measure of central tendency: there are some criteria for a good measure of central tendency. They are:

I. Clearly defined.
II. Readily comprehensible.
III. Easily calculated.
IV. Based on all observations.
V. Little affected by extreme values.
VI. Capable of further algebraic treatment.



Since the arithmetic mean fulfills all of these criteria except the fifth, it can be argued that the arithmetic mean is the best measure of central tendency.
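The mode, geometric mean and harmonic mean defined above can be computed with a few helper functions. This is a sketch, not a reference implementation. The GM averages logarithms instead of multiplying raw values, which avoids overflow but gives the same nth root. `two_numbers_from_hm_gm` is a hypothetical helper that solves the later exercise (HM = 32/5, GM = 8) using ab = GM² and a + b = 2ab / HM.

```python
import math
from collections import Counter

def modes(values):
    """All values that occur most often (a data set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def geometric_mean(values):
    """nth root of the product, computed via the mean of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("GM needs a non-empty set of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    """n divided by the sum of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("HM needs a non-empty set of positive values")
    return len(values) / sum(1 / v for v in values)

def average_growth_rate(start, end, periods):
    """Average fractional increase per period: (end / start) ** (1 / periods) - 1."""
    return (end / start) ** (1 / periods) - 1

def two_numbers_from_hm_gm(hm, gm):
    """Hypothetical helper: find a <= b with GM = sqrt(ab) and HM = 2ab / (a + b)."""
    product = gm ** 2                 # ab = GM^2
    total = 2 * product / hm          # a + b = 2ab / HM
    disc = math.sqrt(total ** 2 - 4 * product)
    # a and b are the roots of t^2 - total * t + product = 0
    return (total - disc) / 2, (total + disc) / 2

print(modes([3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9]))
a, b = two_numbers_from_hm_gm(32 / 5, 8)
print(round(a, 6), round(b, 6), harmonic_mean([a, b]), geometric_mean([a, b]))
print(average_growth_rate(100, 121, 2))   # e.g. 100 -> 121 over two periods
```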

Extreme value in a real-life situation: in a family of three whose members weigh 40, 45 and 50 kg, the average weight is 45 kg. But if someone weighing 100 kg joins the family, the average becomes 235/4 = 58.75 kg. That is greater than three of the four weights and far from the fourth.

Sometimes measures of central tendency are not appropriate to work with. If we consider only the central tendency of a data set, we may draw the wrong conclusion about the whole data set. Ex. in a company where three employees earn 98000 tk, 75000 tk and 65000 tk, the average salary is about 79333 tk. This does not mean that everyone earns 79333 tk or an amount near it.
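Python's standard `statistics` module reproduces both examples. It also shows why the median is preferred when an extreme value is present. When the 100 kg member joins, the mean jumps from 45 to 58.75, while the median only moves from 45 to 47.5.

```python
from statistics import mean, median

family = [40, 45, 50]                     # weights in kg
family_with_outlier = family + [100]      # a 100 kg member joins
print(mean(family), median(family))
print(mean(family_with_outlier), median(family_with_outlier))

salaries = [98000, 75000, 65000]          # tk
print(round(mean(salaries)))              # no employee actually earns this amount
```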

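Find the two numbers whose harmonic mean = 32/5 and geometric mean = 8.

Let the two numbers be a and b, where a < b. Then
$GM = \sqrt{ab} = 8 \Rightarrow ab = 64 \Rightarrow a = \frac{64}{b}$ ...(i)
and
$HM = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b} = \frac{32}{5}$
$\Rightarrow \frac{2 \times 64}{a + b} = \frac{32}{5} \Rightarrow a + b = \frac{128 \times 5}{32} = 20$ ...(ii)
Putting the value of a from (i) into (ii):
$\frac{64}{b} + b = 20 \Rightarrow \frac{64 + b^2}{b} = 20 \Rightarrow b^2 - 20b + 64 = 0$
$\Rightarrow b^2 - 16b - 4b + 64 = 0 \Rightarrow b(b - 16) - 4(b - 16) = 0 \Rightarrow (b - 16)(b - 4) = 0$
So b = 16 or b = 4, and from (ii) a = 20 - 16 = 4 or a = 20 - 4 = 16.
Since a < b, the two numbers are a = 4 and b = 16.

Find the mean for the series 1, 2, 3, ..., 500.
The mean of the series 1, 2, 3, ..., n is $\frac{n + 1}{2}$, so the mean $= \frac{500 + 1}{2} = 250.5$.

What is the median of the sample 4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8?
In ascending order: 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 9. Here n = 13, so the median is the $\frac{13 + 1}{2} = 7$th value, which is 5.

Find the mean for the series 1000, 2000, 3000, ..., 50000.
Dividing every term by 1000 gives the series 1, 2, 3, ..., 50, whose mean is $\frac{50 + 1}{2} = 25.5$.
So the mean of the original series $= 25.5 \times 1000 = 25500$.

Measures of dispersion: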

Measures of dispersion show how widely the values of a data set are scattered around an average. They tell us how well the average represents the data: the smaller the dispersion, the more consistent the data and the more representative the average.

Methods of measures of dispersion:

The range is based on the largest and smallest values in the data set. The mean deviation, variance and standard deviation are all based on deviations from the arithmetic mean.

a. Range: the range is the simplest measure of dispersion. It is the difference between the largest and the smallest values in a data set. The formula is:
$\text{Range} = L - S$ {where L = largest value, S = smallest value}
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the range of his scores.
$\text{Range} = 98 - 75 = 23$
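In code the range is a one-liner. The function name `data_range` is only chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Largest value minus smallest value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

scores = [92, 75, 95, 90, 98]
print(data_range(scores))
```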

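b. Mean deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean. The formula is:
$MD = \frac{\sum |x - \bar{x}|}{n}$ {where x = each value, $\bar{x}$ = arithmetic mean, n = number of values}
Again, when we have to find the MD and SD of two numbers:
$MD = SD = \frac{L - S}{2}$ {where L = larger number, S = smaller number}
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the mean deviation of his scores.
Given $x_1 = 92$, $x_2 = 75$, $x_3 = 95$, $x_4 = 90$, $x_5 = 98$. So
$\bar{x} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$
$MD = \frac{|92 - 90| + |75 - 90| + |95 - 90| + |90 - 90| + |98 - 90|}{5} = \frac{2 + 15 + 5 + 0 + 8}{5} = \frac{30}{5} = 6$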
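A direct sketch of the mean-deviation formula follows. The second call checks the two-number shortcut MD = (L - S) / 2 with the values 5 and 3.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    if not values:
        raise ValueError("mean deviation of an empty data set is undefined")
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

scores = [92, 75, 95, 90, 98]
print(mean_deviation(scores))   # 6.0, matching the worked example
print(mean_deviation([5, 3]))   # two values: (5 - 3) / 2 = 1.0
```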

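c. Variance: the arithmetic mean of the squared deviations from the mean. The formula is:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the variance of his scores.
Given $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$. So
$\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma^2 = \frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5} = \frac{2^2 + 10^2 + (-8)^2 + (-8)^2 + 4^2}{5} = \frac{4 + 100 + 64 + 64 + 16}{5} = \frac{248}{5} = 49.6$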
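This sketch follows the text in dividing by n (the population variance, the same as `statistics.pvariance`). Dividing by n - 1 instead, as `statistics.variance` does for sample variance, would give 248 / 4 = 62 for these scores.

```python
def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n, as in the text)."""
    if not values:
        raise ValueError("variance of an empty data set is undefined")
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

scores = [86, 94, 76, 76, 88]
print(population_variance(scores))   # 248 / 5
```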

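d. Standard deviation: the square root of the variance. The formula is:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Again, when we have to find the MD and SD of two numbers:
$MD = SD = \frac{L - S}{2}$ {where L = larger number, S = smaller number}
or, when we have to find the standard deviation of n numbers:
$\sigma = \sqrt{\frac{1}{n}\sum x^2 - \left(\frac{1}{n}\sum x\right)^2}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the standard deviation of his scores.
Given $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$. So
$\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma = \sqrt{\frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5}} = \sqrt{\frac{2^2 + 10^2 + (-8)^2 + (-8)^2 + 4^2}{5}} = \sqrt{\frac{4 + 100 + 64 + 64 + 16}{5}} = \sqrt{\frac{248}{5}} = \sqrt{49.6} = 7.04$

Coefficient of variation: the coefficient of variation is used to compare the dispersion of different sets of data, including sets measured in different units. The formula is:
$CV = \frac{\sigma}{\bar{x}} \times 100$
Ex. Find the coefficient of variation of the two values 5 and 3.
Here $\sigma = \frac{5 - 3}{2} = 1$ and $\bar{x} = \frac{5 + 3}{2} = 4$, so
$CV = \frac{1}{4} \times 100 = 25\%$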
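The two standard-deviation formulas above (the definition and the $\frac{1}{n}\sum x^2 - \left(\frac{1}{n}\sum x\right)^2$ shortcut) should agree. The coefficient of variation then follows from the standard deviation and the mean. A sketch, again dividing by n as the text does:

```python
import math

def population_sd(values):
    """Square root of the mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def population_sd_shortcut(values):
    """Same quantity via sqrt(sum(x^2) / n - (sum(x) / n) ^ 2)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return population_sd(values) / (sum(values) / len(values)) * 100

scores = [86, 94, 76, 76, 88]
print(round(population_sd(scores), 2), round(population_sd_shortcut(scores), 2))
print(coefficient_of_variation([5, 3]))
```

The shortcut needs fewer passes over the data, but it subtracts two large, nearly equal numbers. For data with a large mean and small spread, the first form is numerically safer.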

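Correlation: correlation measures how strongly two variables are associated or related. The formula is:
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$ {where r = coefficient of correlation, $x = X - \bar{X}$, $y = Y - \bar{Y}$}
There are three patterns of correlation: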

I. Positive correlation: in a positive correlation, when one variable increases the other also increases, and when the first decreases the second also decreases. Ex. level of education and years spent in education: people with a higher level of education tend to have spent more years in education.

II. Negative correlation: in a negative correlation, when one variable increases the other decreases, and when the first decreases the second increases. Ex. watching TV and exam grades: a student who watches a lot of TV tends to get lower exam grades.

III. Zero correlation: in a zero correlation, changes in one variable are not associated with changes in the other. Ex. Bill Gates's money and my happiness: no matter how much money Bill Gates has, it does not make me sadder or happier.

Chart for correlation strength

Range Strength
+1 Perfectly positive
-1 Perfectly negative
0 to 0.3 Weakly positive
0.3 to 0.7 Moderately positive
0.7 to 1 Strongly positive
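The deviation form of r above can be coded directly. As an illustration, the sketch applies it to the age and weight pairs from the scatter diagram earlier in this assignment (x = age, y = weight, as that table labels them). The result, about 0.71, would fall in the "0.7 to 1" (strongly positive) band of the chart, although the scatter shows weight rising and then falling with age.

```python
import math

def pearson_r(xs, ys):
    """r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

ages = [10, 20, 30, 40, 50, 60, 70]
weights = [20, 40, 45, 50, 55, 50, 45]
print(round(pearson_r(ages, weights), 3))
```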



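Regression: regression is a statistical tool for investigating the relationship between variables. The formula is:
$y = \beta_0 + \beta_1 x$ {where y = dependent variable, x = independent variable, $\beta_0$ = intercept, $\beta_1$ = slope (regression coefficient)}
and
$\beta_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$, $\beta_0 = \bar{y} - \beta_1 \bar{x}$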
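The slope and intercept formulas above are enough to fit a least-squares line. The sketch below applies them, purely as an illustration, to the age and weight pairs from the scatter diagram, then predicts weight at an age of 35 (a made-up input). The data rise and then fall, so a straight line is only a rough summary here.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for y = b0 + b1 * x using the summation formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = (n * sxy - sx * sy) / denom
    b0 = sy / n - b1 * sx / n          # b0 = mean(y) - b1 * mean(x)
    return b0, b1

ages = [10, 20, 30, 40, 50, 60, 70]
weights = [20, 40, 45, 50, 55, 50, 45]
b0, b1 = least_squares_line(ages, weights)
print(f"weight = {b0:.2f} + {b1:.3f} * age")
print(round(b0 + b1 * 35, 2))          # predicted weight at age 35
```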

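Difference between regression & correlation

1. Regression: studies a cause-and-effect relationship, with one variable treated as dependent on the other. Correlation: cannot explain cause or effect; it only measures how closely the two variables move together.
2. Regression: the regression coefficient can take any value from $-\infty$ to $+\infty$. Correlation: the correlation coefficient always lies between -1 and +1.
3. Regression: can be used to predict the value of the dependent variable from the independent variable. Correlation: cannot be used for prediction on its own.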

Probability:

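Probability provides a way to measure and express our uncertainty when making decisions about a population from sample information. Probability reflects the long-run relative frequency of an outcome. A probability can be expressed as a decimal (0.1), a fraction (1/10) or a percentage (10%). Formulas:

I. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
II. $P(A \cup B) = P(A) + P(B)$ (when A and B are mutually exclusive)
III. $P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
IV. $P(A \cap B) = P(A) \cdot P(B)$ (when A and B are independent)
V. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B/A)}{P(B)}$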
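The five rules can be checked by exact counting on a small sample space. The sketch below uses one roll of a fair die, an example of our own choosing. The events A, B, C, D and E are also illustrative assumptions. `Fraction` keeps every probability exact, so the rules can be compared with `==`.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die (assumed example)

def P(event):
    """Classical probability: favourable outcomes / all outcomes in S."""
    return Fraction(len(event & S), len(S))

def P_given(a, b):
    """Conditional probability P(A/B) = P(A and B) / P(B)."""
    return P(a & b) / P(b)

A = {2, 4, 6}        # an even number
B = {4, 5, 6}        # a number greater than 3
C, D = {1}, {2}      # two mutually exclusive outcomes
E = {1, 2}           # a number less than 3 (independent of A)

# In Python, | and & on sets mean union and intersection.
rule_1 = P(A | B) == P(A) + P(B) - P(A & B)                        # general addition
rule_2 = P(C | D) == P(C) + P(D)                                   # mutually exclusive
rule_3 = P(A & B) == P(A) * P_given(B, A) == P(B) * P_given(A, B)  # multiplication
rule_4 = P(A & E) == P(A) * P(E)                                   # independent events
rule_5 = P_given(A, B) == P(A) * P_given(B, A) / P(B)              # Bayes' form
print(rule_1, rule_2, rule_3, rule_4, rule_5)
print(P(A | B), P_given(A, B))
```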

Important Terms:

Experiment: an experiment is an activity that is either observed or measured, such as tossing a coin or drawing a card.

Event: An event is a possible outcome of an experiment. Ex. if the experiment is to

sample six lamps coming off a production line, an event could be to get one defective

and five good ones.

Certain event: a certain event is an event that is sure to happen; its probability is 1. Ex. if a number is drawn from the set S = {2, 3, 5, 7, 11, 13, 17, 19} of eight numbers, the event E that the drawn number belongs to S includes all eight outcomes, so P(E) = 8/8 = 1.



Impossible event: an event that has no possibility of occurring; its probability is 0. Ex. finding a white ball in a jar containing only red balls is an impossible event.

Sample space: a sample space is the complete set of all possible outcomes of an experiment. Ex. for the letters of the word "singer", S = {s, i, n, g, e, r}; for the word "bee", S = {b, e}.

Mutually exclusive events: events that cannot happen at the same time are called mutually exclusive events. Ex. when a single coin is tossed, the events "heads" and "tails" are mutually exclusive.

Independent events: two or more events are called independent when the occurrence or non-occurrence of one does not affect the others. Ex. a coin toss and an exam grade.

Conditional probability: a conditional probability is the probability that an event X occurs given that another event Y has already occurred. It is denoted by P(X/Y).

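Probability Distributions: three important probability distributions are:

I. Binomial distribution: if X is the number of successes in n independent trials, each with the same probability of success p, then the probability distribution of X is called the binomial distribution. The formula is:
$P(X = x) = \binom{n}{x} p^x q^{n - x}$ {where n = number of trials, x = 0, 1, 2, 3, ..., n (number of successes), p = probability of success, q = 1 - p = probability of failure}
Mean of the binomial distribution: $\mu = E(X) = np$. Variance of the binomial distribution: $V(X) = \sigma^2 = npq$.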
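The binomial formula can be evaluated with `math.comb` (Python 3.8+). As a hypothetical illustration tied to the lamp example under "Event", the sketch takes 6 lamps, each assumed defective with probability 0.1, and computes the chance of exactly one defective. It then checks that the mean and variance of the whole distribution equal np and npq.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 6, 0.1      # hypothetical: 6 lamps, each defective with probability 0.1
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
print(round(probs[1], 4))                                      # P(exactly one defective)
print(round(mean, 4), n * p, round(variance, 4), n * p * (1 - p))   # compare with np and npq
```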

II. Poisson distribution: the Poisson distribution has many applications, for example:

a) The number of deaths by horse kicks in the army.
b) Birth defects and genetic mutations.
c) Rare diseases (leukemia).
d) Car accidents.
e) Traffic flow and ideal gap distance.
f) The number of typing errors on a page.
g) Hairs found in McDonald's burgers.
h) The spread of an endangered animal in Africa.
i) Failures of a machine in one month.

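The formula is:
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, 3, ... {where e = 2.71828, $\lambda$ = mean number of occurrences}
Mean and variance: $\mu = \lambda$, $\sigma^2 = \lambda$.

III. Normal probability distribution: the normal probability distribution is very common in the field of statistics. Formula:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
Mean and variance: $E(X) = \mu$, $V(X) = \sigma^2$.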



Area under the normal curve using integration: the probability that a continuous normal variable X lies in a particular interval [a, b] is the area under the curve bounded by x = a and x = b:
$P(a < X < b) = \int_a^b f(x)\,dx$

The standard normal distribution: in the standardized situation where $\mu = 0$ and $\sigma = 1$, we have
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
We can transform all the observations of any normal random variable X with mean $\mu$ and variance $\sigma^2$ into a new set of observations of another normal random variable Z with mean 0 and variance 1, using the transformation
$Z = \frac{X - \mu}{\sigma}$

Properties of the normal distribution:

a) The normal curve is symmetrical about the mean $\mu$.
b) The mean is at the middle and divides the area into two halves.
c) The total area under the curve is equal to 1.
d) It is completely determined by its mean and standard deviation.
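The Poisson and normal formulas can be evaluated with the standard `math` module. The sketch below assumes a hypothetical rate of 2 typing errors per page for the Poisson part. For the normal part, it computes P(a < X < b) in two ways: from the error function after the Z = (X - μ) / σ transform, and by numerically integrating f(x) with Simpson's rule. The two should agree, which illustrates the "area under the curve" definition.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!, for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), via the standardising transform z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_interval(a, b, mu, sigma):
    """P(a < X < b) as a difference of two cumulative probabilities."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

def area_simpson(a, b, mu, sigma, steps=1000):
    """Approximate the integral of normal_pdf from a to b with Simpson's rule."""
    if steps % 2:
        steps += 1                     # Simpson's rule needs an even step count
    h = (b - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

lam = 2                                          # hypothetical: 2 errors per page
print(round(poisson_pmf(0, lam), 4))             # chance of an error-free page
print(round(normal_interval(-1, 1, 0, 1), 4))    # about 68% within one sigma
print(round(area_simpson(-1, 1, 0, 1), 4))
```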

Sampling:

The methods of drawing a sample from a population are:

1. Simple random sampling: a simple random sample is a sample selected so that each item or person in the population has the same chance of being included. This can be done by two methods:

I. Lottery method: suppose we have to select 3 people at random from a group. We write each person's name on a separate small piece of paper and fold the papers so that no one can read the names. Then we shuffle them all in a jar and ask someone to pick three pieces of paper from it; these are the names of our 3 selected people. This method of simple random sampling is called the lottery method.

II. Random number method: random numbers can be obtained using a calculator, a spreadsheet, printed tables of random numbers, or by tossing coins or rolling dice.
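The lottery method is what `random.sample` does: it draws k distinct members, and every member is equally likely to be chosen. In the sketch, the ten-person group is hypothetical. The seed is there only so that a run can be repeated.

```python
import random

def lottery_sample(population, k, seed=None):
    """Draw k members without replacement; every member is equally likely."""
    rng = random.Random(seed)          # seeded only so that a run can be repeated
    return rng.sample(population, k)

group = [f"Person {i}" for i in range(1, 11)]   # hypothetical group of 10
chosen = lottery_sample(group, 3, seed=1)
print(chosen)
```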

2. Stratified sampling: suppose we have to select 1 single, 1 married and 2 divorced people from a group. To do that, we divide all the people (male and female) of the group into 3 subgroups: 1. single, 2. married and 3. divorced. Then we take one person at random from the first subgroup, one from the second subgroup and two from the third subgroup. This way we get our 1 single, 1 married and 2 divorced people. This method of sampling is called stratified sampling.
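Stratified sampling draws a separate simple random sample from each subgroup, using a fixed quota for each. In the sketch, the members of each stratum are hypothetical placeholders. The quotas are the 1 single, 1 married and 2 divorced people from the text.

```python
import random

def stratified_sample(strata, quotas, seed=None):
    """Draw quotas[name] members at random from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, quotas[name])
            for name, members in strata.items()}

strata = {                                   # hypothetical group members
    "single":   ["S1", "S2", "S3", "S4"],
    "married":  ["M1", "M2", "M3", "M4", "M5"],
    "divorced": ["D1", "D2", "D3"],
}
quotas = {"single": 1, "married": 1, "divorced": 2}
sample = stratified_sample(strata, quotas, seed=7)
print(sample)
```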

3. Systematic sampling: suppose we have to find out within two days what the students of a university think about a new drink, but we can ask only one hundred students in that time. There are five thousand students in the university. Since 5000 / 100 = 50, we arrange the students by ID and ask every 50th ID holder about our new drink. This process of sampling is called systematic sampling.
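Systematic sampling takes every kth member, where k = N / n (here 5000 / 100 = 50). The text does not say where to start. The sketch assumes the common practice of choosing the starting position at random among the first k. The student IDs 1 to 5000 are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every k-th member (k = N // n) after a random start within the first k."""
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)   # random start position: 0 .. k - 1
    return population[start::k][:n]

student_ids = list(range(1, 5001))             # hypothetical IDs 1..5000
chosen_ids = systematic_sample(student_ids, 100, seed=3)
print(len(chosen_ids), chosen_ids[:3])
```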



4. Cluster sampling: suppose again that we have to find out within two days what the students of a university think about a new drink. There are five thousand students in the university, studying thirty subjects, which is too much data to process in two days. So we select five subjects and ask twenty students from each selected subject about our new drink. This process of sampling is called cluster sampling.
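The example above first picks whole clusters (subjects) at random, then picks students inside each chosen cluster. This two-stage variant is what the sketch implements. Classic one-stage cluster sampling would instead survey every student in the chosen clusters. The assignment of the 5000 hypothetical student IDs to 30 subjects is made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Randomly choose whole clusters, then members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical: students 1..5000 spread over 30 subjects.
subjects = {f"Subject {s + 1}": [i for i in range(1, 5001) if i % 30 == s]
            for s in range(30)}
survey = cluster_sample(subjects, n_clusters=5, per_cluster=20, seed=5)
print(sorted(survey), sum(len(v) for v in survey.values()))
```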

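Difference between stratified sampling & cluster sampling

1. Stratified: two strata cannot be the same; they differ from each other. Cluster: two clusters can be the same (similar to each other).
2. Stratified: each stratum is homogeneous within itself, while different strata are heterogeneous. Cluster: the clusters are homogeneous with respect to one another.
3. Stratified: the population is divided into strata (groups), and a sample is drawn from every stratum. Cluster: the population is divided into clusters (bunches), and only some of the clusters are selected.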

Hypothesis:

A hypothesis is a statement about a population parameter that is subject to verification.

Null hypothesis: a statement about the value of a population parameter, developed for the purpose of testing numerical evidence. It is denoted by $H_0$.

Alternate hypothesis: a statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. It is denoted by $H_1$.

Level of significance: the probability of rejecting the null hypothesis when it is true. It is also called the level of risk, because it is the risk we take of rejecting the null hypothesis when it is really true. It is denoted by $\alpha$.

Hypothesis testing is done in five simple steps:

Step 1: Establish $H_0$ and $H_1$.
Step 2: Select the value of $\alpha$.
Step 3: Select the appropriate formula: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Calculate the value of z.
Step 5: Decide whether to accept or reject $H_0$, depending on the value of z. For a right-tailed test, if the calculated z is greater than the critical value $z_\alpha$, we reject $H_0$ and accept $H_1$; if it is less than $z_\alpha$, we do not reject $H_0$.

Even after all the values have been calculated correctly, two kinds of error are possible in these five steps:

Type 1 error: rejecting $H_0$ when it is true.
Type 2 error: accepting $H_0$ when it is false.

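The average I.Q. of university women in Bangladesh is suspected to be more than 110. A random sample of 64 women yielded an average I.Q. of 115.5 and a standard deviation of 20. Can you conclude that the average I.Q. of the women in the population is really more than 110? Test this at the 5% level of significance ($z_{0.05} = 1.64$).

Step 1: $H_0: \mu = 110$, $H_1: \mu > 110$
Step 2: $\alpha = 5\%$, $z_{0.05} = 1.64$
Step 3: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Here $\bar{x} = 115.5$, $\mu = 110$ and n = 64, and since the sample is large, the sample standard deviation 20 is used for $\sigma$. So
$z = \frac{115.5 - 110}{20 / \sqrt{64}} = \frac{5.5}{2.5} = 2.2$
Step 5: The value z = 2.2 is greater than $z_{0.05} = 1.64$, so we reject $H_0$ and conclude that the average I.Q. of university women in Bangladesh is more than 110.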
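The five steps of the right-tailed z test map onto a short function. The sketch below uses `statistics.NormalDist` (Python 3.8+) to get the critical value instead of a printed table. This gives about 1.645 for α = 0.05; the text rounds it to 1.64. The decision for the I.Q. example is the same either way.

```python
import math
from statistics import NormalDist

def one_sample_z_right_tailed(sample_mean, mu0, sd, n, alpha=0.05):
    """Right-tailed z test of H0: mu = mu0 against H1: mu > mu0."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))   # step 3 and step 4
    z_crit = NormalDist().inv_cdf(1 - alpha)        # critical value z_alpha
    return z, z_crit, z > z_crit                    # step 5: True means reject H0

z, z_crit, reject = one_sample_z_right_tailed(115.5, 110, 20, 64)
print(round(z, 2), round(z_crit, 3), "reject H0" if reject else "do not reject H0")
```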
