8/3/2019 Statistics Assignment 05
1/14
Assignment 05
Course: Statistics (STA 240)
Submitted To: Md. Mortuza Ahmmed
Submitted By: Shamsul Islam Raisy (BSCE-11106005)
Submitting Date: 21/11/2011
Statistics:
Statistics is the study of the collection, organization, analysis and interpretation of data.
Application of statistics in civil engineering: Statistics has many uses in the different fields of civil engineering. One of the most important is disaster management. Every year our country faces many natural disasters. We collect, organize and analyze the data on them and interpret the results in order to find better ways of dealing with nature.
Sample and population: A population is the whole group under study, and a sample is a part of it chosen to represent the whole. Ex. in a class of students, all the students together form the population and the monitor is a sample.
Variable: A variable is a quantity whose value can change from one situation to another. Ex. the height of a student (different students can have different heights), the size of a shirt (a shirt can come in many different sizes).
Scale of measurement:
1. Nominal scale: The nominal scale of measurement satisfies only the identity property of measurement. Values assigned to variables represent a descriptive category, but have no inherent numerical value with respect to magnitude. Ex. gender (male, female), color (black, white, red), religion (Islam, Hinduism, Buddhism).
2. Ordinal scale: The ordinal scale has the properties of both identity and magnitude. Each value on the ordinal scale has a unique meaning and an ordered relationship to every other value on the scale. Ex. in a horse race: win, place and show; in a class: superior, good, average and poor.
3. Interval scale: The interval scale of measurement has the properties of identity, magnitude and equal intervals. Equal differences on the scale represent equal differences in the quantity being measured. Ex. women's dress size and temperature.

Size   Bust   Waist   Hips
8      32     24      35
10     34     26      37

Day       Temperature
Sunday    60 °F
Monday    65 °F
Tuesday   70 °F

4. Ratio scale: The ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals, and an absolute zero. Ex. the weight of an object, which could be zero. We can say that C weighs twice as much as B and that D is heavier than A, B and C.

Object   Weight
A        0
B        2
C        4
D        8
Line Graph:
[Figure: line graph of the share index by month, August to December]

Month       Share Index
August      700
September   800
October     1000
November    600
December    400

Bar Diagram:
[Figure: bar diagram of population by religion]

Religion   Population
Muslim     200
Hindu      100
Other      50

Pie Chart:
[Figure: pie chart of population share by religion]

Religion   Population   Percent
Muslim     200          57.14%
Hindu      100          28.57%
Other      50           14.29%
Total      350          100%
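
The percent column of the pie chart table can be checked with a few lines of Python. This is only a sketch; the function name `percent_shares` is made up for the illustration, and the counts are the ones given in the table.

```python
# Population counts copied from the pie chart table.
population = {"Muslim": 200, "Hindu": 100, "Other": 50}

def percent_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}

shares = percent_shares(population)
for name, share in shares.items():
    # Two decimal places, as in the table.
    print(f"{name}: {share:.2f}%")
```

Each share is the category count divided by the total of 350, which is also the fraction of the circle that the category's slice takes up.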
Scatter Diagram: {x, y} = {(10,20), (20,40), (30,45), (40,50), (50,55), (60,50), (70,45)}
[Figure: scatter diagram of weight against age]

Age   Weight
10    20
20    40
30    45
40    50
50    55
60    50
70    45

Stem and Leaf Plot: 11, 14, 16, 21, 23, 24, 27, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 50, 51.

Stem   Leaf
1      1, 4, 6
2      1, 3, 4, 7
3      0, 1, 5, 6, 7, 8
4      0, 1, 2, 3
5      0, 1

Histogram:
[Figure: histogram of the classes in the table below]

Class      Frequency   Height (frequency / class width)
5 to 10    5           5/5 = 1
10 to 15   10          10/5 = 2
15 to 20   15          15/5 = 3
20 to 25   10          10/5 = 2
25 to 30   5           5/5 = 1

Central tendency:
A measure of central tendency is a single significant number that represents the whole data set.
Measures of central tendency:
1. Mean: Also known as the arithmetic mean, it is the value we get after dividing the total of all the values by the number of values: $\bar{x} = \frac{\sum x}{n}$.
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4
$\bar{x} = \frac{5 + 6 + 7 + 3 + 4 + 7 + 8 + 5 + 4}{9} = \frac{49}{9} = 5.44$
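2. Median: It is the middle value that we get after arranging all the values in ascending order. It is generally used when an extreme value is present. For n values, the median is the $\left(\frac{n+1}{2}\right)$th value.
Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4 [in ascending order: 3, 4, 4, 5, 5, 6, 7, 7, 8].
Median = the $\frac{9 + 1}{2}$ = 5th value = 5.
Again, if the given values are 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, then n = 10 is even, so the median is the average of the $\frac{10}{2}$ = 5th value and the $\frac{10}{2} + 1$ = 6th value. The 5th value is 5 and the 6th value is 6, so
Median = $\frac{5 + 6}{2} = 5.5$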
3. Mode: In a given set of numbers, the value that appears most often is the mode of that data set. When two numbers are tied for appearing most often, both of them are modes. Ex. given values: 3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9. Mode = 7.
4. Geometric mean: It is useful when we want to find the average change of percentages, ratios, or growth rates over time. We can't use the GM when there is a 0 or negative value in the set. The GM is never greater than the AM. The GM of a set of n positive numbers is defined as the nth root of the product of the n values:
$GM = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$ {where n = number of values and $x_1, \ldots, x_n$ = the values}
To find the average percent increase over time the formula would be:
$GM = \sqrt[n]{\frac{\text{value at end of period}}{\text{value at start of period}}} - 1$ {where n = number of periods}
5. Harmonic mean: We can't use the HM when there is a 0 or negative value in the set. The formula is:
$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$ {where n = number of values and $x_1, \ldots, x_n$ = the values}
Arithmetic mean is the best measure of central tendency: There are some criteria for a good measure of central tendency. They are:
I. Clearly defined.
II. Readily comprehensible.
III. Easily calculated.
IV. Based on all observations.
V. Little affected by extreme values.
VI. Capable of further algebraic treatment.
Since the arithmetic mean fulfills all of these criteria other than the fifth one, it can be regarded as the best measure of central tendency.
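
The five measures of central tendency can be tried out with Python's standard `statistics` module. The sketch below reuses the values from the mean, median and mode examples; the pair `[4, 16]` used for the geometric and harmonic means is only an illustrative choice of positive numbers.

```python
import statistics

values = [5, 6, 7, 3, 4, 7, 8, 5, 4]             # data from the mean and median examples
mode_values = [3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9]  # data from the mode example
positive_pair = [4, 16]                          # illustrative positive values (assumption)

mean = statistics.mean(values)                   # total of the values / number of values
median = statistics.median(values)               # middle value after sorting
modes = statistics.multimode(mode_values)        # every value tied for most frequent
gm = statistics.geometric_mean(positive_pair)    # nth root of the product
hm = statistics.harmonic_mean(positive_pair)     # n / sum of the reciprocals

print(round(mean, 2), median, modes, round(gm, 2), round(hm, 2))
```

`multimode` returns a list, so a data set with two equally frequent values yields both of them. For the pair used here the harmonic mean (6.4) is below the geometric mean (8), which in turn is below the arithmetic mean (10).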
Extreme value in a real-life situation: Suppose a family has three members whose weights are 40, 45 and 50 kg. The average weight is 45 kg. But if someone weighing 100 kg joins the family, the average becomes 58.75 kg, which is more than the other three values and very far from the last one.
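
The effect of the extreme value can be reproduced with a short sketch that compares the mean with the median before and after the 100 kg member joins. The variable names are chosen only for this illustration.

```python
import statistics

family = [40, 45, 50]           # weights in kg from the example
with_outlier = family + [100]   # the 100 kg member joins the family

# Mean and median before the extreme value is added.
print(statistics.mean(family), statistics.median(family))
# Mean and median after the extreme value is added.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean moves from 45 to 58.75, while the median only moves from 45 to 47.5, which is why the median is preferred when an extreme value is present.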
Sometimes measures of central tendency are not appropriate to work with. If we consider only the central tendency of a data set, we may draw the wrong conclusion about the whole data set. Ex. in a company where the employees' salaries are 98000 tk, 75000 tk and 65000 tk, the average salary is about 79333 tk, but this doesn't mean that everyone's salary is 79333 tk or some amount near that.
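Find the two numbers whose harmonic mean = 32/5 and geometric mean = 8.
Let the two numbers be a and b, with a < b. Given, HM = 32/5 and GM = 8.
$\sqrt{ab} = 8$, so $ab = 64$ ... (i)
$a = \frac{64}{b}$ ... (ii)
Again, $\frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{32}{5}$
$\frac{2ab}{a + b} = \frac{32}{5}$
$\frac{2 \times 64}{a + b} = \frac{32}{5}$
$32(a + b) = 5 \times 2 \times 64$
$a + b = 20$
$\frac{64}{b} + b = 20$
$\frac{64 + b^2}{b} = 20$
$64 + b^2 = 20b$
$b^2 - 20b + 64 = 0$
$b(b - 16) - 4(b - 16) = 0$
$(b - 16)(b - 4) = 0$
So, b = 16 or b = 4, and a = 20 - 16 = 4 or a = 20 - 4 = 16.
Since a < b, a = 4 and b = 16.

Find the mean for the series 1, 2, 3, ..., 500.
Mean for the series = $\frac{n + 1}{2} = \frac{500 + 1}{2} = 250.5$

What is the median of the sample 4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8?
In ascending order: 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 9.
Median = 5

Find the mean for the series 1000, 2000, 3000, ..., 50000.
We know, mean of the series 1, 2, 3, ..., n = $\frac{n + 1}{2}$
1000, 2000, 3000, ..., 50000 = 1000 × (1, 2, 3, ..., 50)
Mean of 1, 2, 3, ..., 50 = $\frac{50 + 1}{2} = 25.5$
Mean = 25.5 × 1000 = 25500

Measures of Dispersion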
Measures of dispersion show how widely the values of a data set are spread around an average. They tell us how well the average represents the data.
Methods of measures of dispersion:
The range is based on the largest and the smallest values in the data set; the mean deviation, variance and standard deviation are all based on deviations from the arithmetic mean.
a. Range: The simplest measure of dispersion is the range. It is the difference between the largest and the smallest values in a data set. The formula:
Range = largest value - smallest value
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find out the range for his scores.
Range = 98 - 75 = 23
b. Mean deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. The formula:
$MD = \frac{\sum |x - \bar{x}|}{n}$ {where x = value of each observation, $\bar{x}$ = arithmetic mean, n = number of observations}
Again, when we have to find out the MD and SD for two numbers:
$MD = SD = \frac{R}{2}$ {where R = range}
Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find out the mean deviation for his scores.
Given, $x_1 = 92$, $x_2 = 75$, $x_3 = 95$, $x_4 = 90$, $x_5 = 98$
So, $\bar{x} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$
$MD = \frac{|92 - 90| + |75 - 90| + |95 - 90| + |90 - 90| + |98 - 90|}{5} = \frac{2 + 15 + 5 + 0 + 8}{5} = \frac{30}{5} = 6$
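c. Variance: The arithmetic mean of the squared deviations from the mean. The formula is: $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find out the variance for his scores.
Given, $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$
So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma^2 = \frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5} = \frac{2^2 + 10^2 + 8^2 + 8^2 + 4^2}{5} = \frac{4 + 100 + 64 + 64 + 16}{5} = \frac{248}{5} = 49.6$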
d. Standard deviation: The square root of the variance. The formula is: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Again, when we have to find out the MD and SD for two numbers:
$MD = SD = \frac{R}{2}$ {where R = range}
Or when we have to find out the standard deviation for the first n natural numbers: $\sigma = \sqrt{\frac{n^2 - 1}{12}}$
Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find out the standard deviation for his scores.
Given, $x_1 = 86$, $x_2 = 94$, $x_3 = 76$, $x_4 = 76$, $x_5 = 88$
So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$
$\sigma = \sqrt{\frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5}} = \sqrt{\frac{2^2 + 10^2 + 8^2 + 8^2 + 4^2}{5}} = \sqrt{\frac{4 + 100 + 64 + 64 + 16}{5}} = \sqrt{\frac{248}{5}} = \sqrt{49.6} = 7.04$
Coefficient of variation: The coefficient of variation is used to compare the dispersion in different sets of data with different units of measurement. The formula is:
$CV = \frac{\sigma}{\bar{x}} \times 100$
Ex. Find the coefficient of variation for the two values 5 and 3.
Here, $\sigma = \frac{5 - 3}{2} = 1$
and, $\bar{x} = \frac{5 + 3}{2} = 4$
$CV = \frac{1}{4} \times 100 = 25\%$
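
The worked examples for the range, mean deviation, variance, standard deviation and coefficient of variation can be re-computed with the sketch below. The function names are invented for this illustration, and the variance divides by n (the population form), which is the form the worked examples use.

```python
import math

def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def mean_deviation(data):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def variance(data):
    """Mean of the squared deviations from the arithmetic mean (divides by n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def std_dev(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean."""
    return std_dev(data) / (sum(data) / len(data)) * 100

scores_a = [92, 75, 95, 90, 98]   # scores from the range and mean deviation examples
scores_b = [86, 94, 76, 76, 88]   # scores from the variance and standard deviation examples

print(value_range(scores_a), mean_deviation(scores_a))
print(variance(scores_b), round(std_dev(scores_b), 2))
print(coefficient_of_variation([5, 3]))
```

For two numbers the general formulas give the same result as the shortcut of half the range: `std_dev([5, 3])` and `mean_deviation([5, 3])` both come out as 1.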
Correlation: The correlation is a way to measure how associated or related two variables are. Formula is:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ {where r = correlation coefficient, $\bar{x}$ = mean of x, $\bar{y}$ = mean of y}
There are three patterns of correlation:
I. Positive correlation: In a positive correlation, if one of the observations increases the second one does the same, and when the first observation decreases the second one does the same. Ex. higher education and years spent on education: people with higher education tend to have spent more years in education.
II. Negative correlation: In a negative correlation, if one of the observations increases the second one decreases, and when the first observation decreases the second one increases. Ex. watching TV and exam grade: when a student watches a lot of TV he tends to have a lower grade in the exam.
III. Zero correlation: In a zero correlation, one observation doesn't have any effect on the other one. Ex. Bill Gates' money and my happiness: no matter how much money Bill Gates has, that doesn't make me sadder or happier.
Chart for correlation strength

Range      Strength
+1         Perfectly positive
-1         Perfectly negative
0 to .3    Weakly positive
.3 to .7   Moderately positive
.7 to 1    Strongly positive
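
As an illustration, the correlation coefficient can be computed for the age and weight pairs from the scatter diagram. The sketch assumes the usual product-moment form of the coefficient (sum of the products of the deviations, divided by the square root of the product of the sums of squared deviations); `pearson_r` is a name chosen here, not something from the assignment.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

age = [10, 20, 30, 40, 50, 60, 70]       # x values from the scatter diagram
weight = [20, 40, 45, 50, 55, 50, 45]    # y values from the scatter diagram

r = pearson_r(age, weight)
print(round(r, 3))
```

For these pairs r comes out at about 0.71, which falls just inside the strongly positive band of the chart.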
Regression: It's a statistical tool for investigating the relationship between variables. Formula is:
$y = a + bx$ {where a = value of y when x = 0, b = change in y for a change of 1 in x}
and,
$a = \bar{y} - b\bar{x}$, $b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
Difference between regression & correlation

Regression                                       Correlation
1. It can explain cause or effect.               1. It can't explain cause or effect.
2. The limit of regression is -∞ to +∞.          2. The limit of correlation is -1 to +1.
3. It can predict the future.                    3. It can't predict the future.
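
A regression line y = a + bx can be fitted to the age and weight pairs from the scatter diagram with the sketch below. It assumes the usual least-squares formulas, b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄; the function name `fit_line` is invented for the illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

age = [10, 20, 30, 40, 50, 60, 70]       # x values from the scatter diagram
weight = [20, 40, 45, 50, 55, 50, 45]    # y values from the scatter diagram

a, b = fit_line(age, weight)
predicted = a + b * 80                   # predicted weight at age 80 (extrapolation)
print(round(a, 2), round(b, 3), round(predicted, 2))
```

The slope works out to 0.375, so each extra unit of age adds 0.375 to the predicted weight. The prediction at age 80 lies outside the observed range and should be read with caution.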
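Probability:
Probability provides a way to find and express our uncertainty in making decisions about a population from sample information. Probability reflects the long-run relative frequency of the outcome. A probability can be expressed as a decimal (0.1), a fraction ($\frac{1}{10}$) or a percentage (10%). Formula:
I. $P(A \text{ or } B) = P(A) + P(B)$ [when A and B are mutually exclusive]
II. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
III. $P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
IV. $P(A \cap B) = P(A) \cdot P(B)$ [when A and B are independent]
V. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B/A)}{P(B)}$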
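
The addition, multiplication and conditional probability rules can be checked by counting outcomes. The sketch below uses a hypothetical experiment that is not in the assignment: rolling two fair dice, with A = "the sum is 8" and B = "the first die is even".

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on one outcome."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

def is_a(outcome):
    return outcome[0] + outcome[1] == 8   # A: the sum is 8

def is_b(outcome):
    return outcome[0] % 2 == 0            # B: the first die is even

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))
p_a_or_b = prob(lambda o: is_a(o) or is_b(o))
p_a_given_b = p_a_and_b / p_b             # conditional probability P(A/B)

print(p_a, p_b, p_a_and_b, p_a_or_b, p_a_given_b)
```

Using `Fraction` keeps every probability exact, so the addition rule P(A or B) = P(A) + P(B) - P(A and B) can be compared with the directly counted value without rounding.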
Important Terms:
Experiment: It's an activity that is either observed or measured, such as tossing a coin or drawing a card.
Event: An event is a possible outcome of an experiment. Ex. if the experiment is to sample six lamps coming off a production line, an event could be to get one defective lamp and five good ones.
Certain event: An event that is sure to occur, so its probability is 1. Ex. if a number is drawn from the sample S = {2, 3, 5, 7, 11, 13, 17, 19}, the probability that it is one of these eight numbers is $P = \frac{8}{8} = 1$.
Impossible event: An event which has no possibility of occurring. Ex. in a jar of red balls, finding a white ball is an impossible event.
Sample space: A sample space is the complete set of all possible outcomes of an experiment. Ex. singer = {s, i, n, g, e, r}, bee = {b, e}.
Mutually exclusive events: Events that can't happen at the same time are called mutually exclusive events. Ex. in a toss of a single coin, the events heads and tails are mutually exclusive.
Independent events: Two or more events are called independent when the occurrence or non-occurrence of one doesn't affect another. Ex. a coin toss and an exam grade.
Conditional probability: The probability of X given that Y has occurred. A conditional probability is denoted by P(X/Y).
Probability Distributions: There are three types of probability distribution:
I. Binomial distribution: The probability distribution of a random variable X that counts the number of successes in n independent trials is called the binomial distribution. The formula is:
$P(x) = {}^{n}C_{x}\, p^{x} q^{n - x}$ {where n = number of trials, x = 0, 1, 2, 3, ..., n, p = probability of success, q = probability of failure [q = 1 - p]}
Mean of binomial distribution is: $E(X) = \mu = np$
Variance of binomial distribution is: $V(X) = \sigma^2 = npq$
II. Poisson distribution: There are some applications for the Poisson distribution. Applications are:
a) The number of deaths by horse kicking in the army.
b) Birth defects and genetic mutations.
c) Rare diseases (leukemia).
d) Car accidents.
e) Traffic flow and ideal gap distance.
f) Number of typing errors on a page.
g) Hairs found in McDonald's burgers.
h) Spread of an endangered animal in Africa.
i) Failure of a machine in one month.
Formula is: $P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, x = 0, 1, 2, 3, ..., e = 2.71828, $\lambda$ = mean number of occurrences
Mean and variance: $E(X) = \lambda$, $V(X) = \sigma^2 = \lambda$.
III. Normal probability distribution: The normal probability distribution is very common in the field of statistics. Formula:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$
Mean and variance: $E(X) = \mu$, $V(X) = \sigma^2$
Area under the normal curve using integration: The probability that a continuous normal variable X is found in a particular interval [a, b] is the area under the curve bounded by x = a and x = b:
$P(a < X < b) = \int_a^b f(x)\, dx$
The standard normal distribution: If we have the standardized situation of $\mu = 0$ and $\sigma = 1$ then we have,
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$
We can transform all the observations of any normal random variable x with mean ($\mu$) and variance ($\sigma^2$) to a new set of observations of another normal random variable z with mean 0 and variance 1 using the following transform:
$z = \frac{x - \mu}{\sigma}$
Properties of the normal distribution:
a) The normal curve is symmetrical about the mean $\mu$.
b) The mean is at the middle and divides the area into halves.
c) The total area under the curve is equal to 1.
d) It's completely determined by its mean and standard deviation.
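
The three distributions can be evaluated numerically with the Python standard library. This is a sketch with invented function names; the area under the normal curve is taken from `statistics.NormalDist` rather than by integrating by hand, and the parameter values in the calls are arbitrary examples.

```python
import math
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(x occurrences) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

def normal_interval(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b): area under the normal curve between a and b."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

def z_score(x, mu, sigma):
    """Transform x to the standard normal variable z."""
    return (x - mu) / sigma

print(binomial_pmf(2, 4, 0.5))          # two heads in four fair coin tosses
print(poisson_pmf(0, 2.0))              # no occurrences when the mean is 2
print(normal_interval(-1.96, 1.96))     # central area of the standard normal curve
print(z_score(130, 100, 15))            # illustrative values only
```

Standardizing first and then using the standard normal curve gives the same area as working with the original mean and standard deviation, which is why a single table of z values is enough.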
Sampling:
The methods of drawing a sample from a population are:
1. Simple random sampling: A simple random sample is a sample selected so that each item or person in the population has the same chance of being included. This can be done by two methods:
I. Lottery method: Suppose we have to select 3 people randomly from a group of people. We write down all their names on separate small pieces of paper, then fold them so that no one can read which name is written on which piece. Then we shuffle them all in a jar and ask someone to pick three pieces of paper from the jar. The three names drawn are our 3 selected people. This method of selecting a simple random sample is called the lottery method.
II. Applying random numbers: Random numbers can be obtained using a calculator, a spreadsheet, printed tables of random numbers, tossing coins or rolling dice.
2. Stratified sampling: Suppose that from a group of people we have to select 1 single, 1 married and 2 divorced people. To be able to do that, we have to divide all the males and females of that group into 3 subgroups: 1. single, 2. married and 3. divorced. Then we take one person from the first subgroup, one from the second subgroup and two from the third subgroup. This way we get our 1 single, 1 married and 2 divorced people. This method of sampling is called stratified sampling.
3. Systematic sampling: Suppose that in a university we have to find out within two days what the students think about a new drink, but we can ask only a hundred students in that time. There are five thousand students in that university. So to complete this task in two days, we divide the total number of student IDs by 100, which gives 50, and we ask every 50th ID holder about our new drink. This process of sampling is called systematic sampling.
4. Cluster sampling: Suppose that in a university we have to find out within two days what the students think about a new drink. There are five thousand students in that university studying thirty subjects. That is a huge amount of data to process in two days. So to complete this task in two days, we select five subjects and ask twenty students from each selected subject about our new drink. This process of sampling is called cluster sampling.
Difference between stratified sampling & cluster sampling

Stratified sampling                                      Cluster sampling
1. Two strata cannot be the same.                        1. Two clusters can be the same.
2. Strata show the homogeneous and the heterogeneous     2. Clusters show the homogeneous type.
   type [depending on the situation].
3. Strata are divided into groups.                       3. Clusters are divided into branches.
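
The four sampling methods can be sketched with Python's `random` module. Everything below is hypothetical illustration: the student IDs 1 to 5000, the way students are assigned to thirty subjects, the names in the strata, the random starting point of the systematic sample and the random choice of clusters are all assumptions, not part of the assignment.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def simple_random_sample(population, k):
    """Every member has the same chance of being picked (lottery method)."""
    return rng.sample(population, k)

def stratified_sample(strata, sizes):
    """Draw the requested number of members from each stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

def systematic_sample(population, k):
    """Pick every (N // k)-th member, starting from a random position."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]

def cluster_sample(clusters, n_clusters, per_cluster):
    """Choose some clusters, then sample members only inside those clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

student_ids = list(range(1, 5001))                # hypothetical IDs of 5000 students
subjects = {}                                     # hypothetical split into 30 subjects
for sid in student_ids:
    subjects.setdefault(f"subject_{sid % 30}", []).append(sid)

people = {"single": ["A", "B", "C"], "married": ["D", "E"], "divorced": ["F", "G", "H"]}

lottery = simple_random_sample(student_ids, 3)
by_status = stratified_sample(people, {"single": 1, "married": 1, "divorced": 2})
every_50th = systematic_sample(student_ids, 100)
by_subject = cluster_sample(subjects, 5, 20)
print(lottery, by_status, every_50th[:3], list(by_subject))
```

With 5000 IDs and a sample of 100 the step is 50, matching the "every 50th ID holder" rule, and five subjects with twenty students each again give 100 respondents.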
Hypothesis:
A hypothesis is a statement about a parameter subject to verification.
Null hypothesis: A statement about the value of a population parameter developed for the purpose of testing numerical evidence. It is expressed by $H_0$.
Alternate hypothesis: A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. It is expressed by $H_1$.
Level of significance: The probability of rejecting the null hypothesis when it is true. It is also called the level of risk, because it is the risk we take of rejecting the null hypothesis when it is really true. It is expressed by $\alpha$.
Hypothesis testing is done in five simple steps. They are:
Step 1: Establishing $H_0$ and $H_1$.
Step 2: Selecting the value for $\alpha$.
Step 3: Selecting the appropriate formula. $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Calculating the value of z.
Step 5: Making a decision. We have to accept or reject $H_0$ depending on the value of z. If the value of z is more than $z_\alpha$, then $H_0$ is rejected and $H_1$ is accepted; if the value of z is less than $z_\alpha$, then $H_0$ is accepted.
In these five steps, even after calculating all the values correctly, two kinds of error are possible. They are:
Type 1 error: Rejecting $H_0$ when it is true.
Type 2 error: Accepting $H_0$ when it is false.
The average I.Q. of university women in Bangladesh is suspected to be more than 110. A random sample of 64 women yielded an average I.Q. of 115.5 and a standard deviation of 20. Can you conclude that the average I.Q. of the women in the population is really more than 110? Test this at the 5% level of significance ($z_{5\%} = 1.64$).
Step 1: $H_0: \mu = 110$, $H_1: \mu > 110$
Step 2: $z_{5\%} = 1.64$
Step 3: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: Here, $\bar{x} = 115.5$, $\mu = 110$, $n = 64$, $\sigma = 20$
$z = \frac{115.5 - 110}{20 / \sqrt{64}} = \frac{5.5}{2.5} = 2.2$
Step 5: The value of z = 2.2 is more than the value of $z_{5\%} = 1.64$, so $H_0$ is rejected: the average I.Q. of university women in Bangladesh is more than 110.
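
The five steps of the I.Q. example can be followed in code. The sketch below is a right-tailed z test with an invented function name; instead of the tabulated 1.64 it takes the critical value from `statistics.NormalDist`, which gives about 1.645 and leads to the same decision.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, hypothesized_mean, std_dev, n):
    """z = (sample mean - hypothesized mean) / (standard deviation / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (std_dev / math.sqrt(n))

alpha = 0.05                                   # Step 2: level of significance
z = one_sample_z(115.5, 110, 20, 64)           # Steps 3 and 4: test statistic
critical = NormalDist().inv_cdf(1 - alpha)     # right-tailed critical value, about 1.645
reject_h0 = z > critical                       # Step 5: decision

print(round(z, 2), round(critical, 3), reject_h0)
```

Because the alternate hypothesis is "more than 110", only large positive values of z count as evidence against the null hypothesis, so the whole 5% is placed in the right tail.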