Statistics Assignment 05


  • 8/3/2019 Statistics Assignment 05


    Assignment 05
    Course: Statistics (STA 240)

    Submitted To: Md. Mortuza Ahmmed
    Submitted By: Shamsul Islam Raisy (BSCE-11106005)
    Submitting Date: 21/11/2011


    Statistics:

    Statistics is the study of the collection, organization, analysis and interpretation of data.

    Application of statistics in civil engineering: Statistics is used in many areas of civil engineering. One of the most important is disaster management. Every year our country faces many natural disasters. We collect, organize and analyze the data on them and interpret the results to find better ways of dealing with nature.

    Sample and population: A population is the whole group under study, and a sample is a part of it chosen to represent it. Ex. in a class of students, all the students together are the population and the monitor alone is a sample.

    Variable: A quantity whose value can change from one situation to another is a variable. Ex. height of a student (different students can have different heights), size of a shirt (a shirt can come in many different sizes).

    Scale of measurement:

    1. Nominal scale: The nominal scale of measurement satisfies only the identity property of measurement. Values assigned to variables represent a descriptive category, but have no inherent numerical value with respect to magnitude. Ex. gender (male, female), color (black, white, red), religion (Islam, Hindu, Buddhism).

    2. Ordinal scale: The ordinal scale has the properties of both identity and magnitude. Each value on the ordinal scale has a unique meaning and an ordered relationship to every other value on the scale. Ex. in a horse race: win, place and show; in a class: superior, good, average and poor.

    3. Interval scale: The interval scale of measurement has the properties of identity, magnitude and equal intervals. Equal differences between values on the scale represent equal differences in the quantity measured. Ex. women's dress size and temperature.

    Size   Bust   Waist   Hips
    8      32     24      35
    10     34     26      37

    Day       Temperature
    Sunday    60 °F
    Monday    65 °F
    Tuesday   70 °F

    4. Ratio scale: The ratio scale of measurement satisfies all four properties of measurement: identity, magnitude, equal intervals and an absolute zero. Ex. the weight of an object, which could be zero. We can say that C weighs twice as much as B, and that D is heavier than A, B and C.

    Object   Weight
    A        0
    B        2
    C        4
    D        8


    Line Graph:

    [Figure: line graph of share index by month, August to December]

    Month       Share Index
    August      700
    September   800
    October     1000
    November    600
    December    400

    Bar Diagram:

    [Figure: bar diagram of population by religion]

    Religion   Population
    Muslim     200
    Hindu      100
    Other      50

    Pie Chart:

    [Figure: pie chart of population share by religion]

    Religion   Population   Percent
    Muslim     200          57.14%
    Hindu      100          28.57%
    Other      50           14.29%
    Total      350          100%
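
    The percentages used for the pie chart can be recomputed from the population counts. The following is a minimal sketch in Python; the variable names are illustrative and not part of the assignment.

```python
# Population counts taken from the religion table above.
population = {"Muslim": 200, "Hindu": 100, "Other": 50}

total = sum(population.values())

# Each share is the category count divided by the total, times 100.
percent = {name: round(count / total * 100, 2)
           for name, count in population.items()}

for name, share in percent.items():
    print(f"{name:<8}{population[name]:>6}{share:>9.2f}%")
print(f"{'Total':<8}{total:>6}")
```

    Each slice of the pie is proportional to its percentage, so the same numbers give the slice angles when multiplied by 3.6 degrees.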


    Scatter Diagram:

    {x, y} = {(10,20), (20,40), (30,45), (40,50), (50,55), (60,50), (70,45)}

    Age   Weight
    10    20
    20    40
    30    45
    40    50
    50    55
    60    50
    70    45

    [Figure: scatter diagram of weight against age]

    Stem & Leaf Plot:

    Data: 11, 14, 16, 21, 23, 24, 27, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 50, 51.

    Stem   Leaf
    1      1, 4, 6
    2      1, 3, 4, 7
    3      0, 1, 5, 6, 7, 8
    4      0, 1, 2, 3
    5      0, 1

    Histogram:

    Class      Frequency   Height (frequency / class width)
    5 to 10    5           5/5 = 1
    10 to 15   10          10/5 = 2
    15 to 20   15          15/5 = 3
    20 to 25   10          10/5 = 2
    25 to 30   5           5/5 = 1

    [Figure: histogram of the classes above, with bar height = frequency / class width]

    Central tendency:

    A measure of central tendency is a single representative value that summarizes the whole data set.

    Measures of central tendency:


    1. Mean: Also known as the arithmetic mean, it is the value we get by dividing the total of all values by the number of values: $\bar{x} = \frac{\sum x}{n}$.

    Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4

    $\bar{x} = \frac{5 + 6 + 7 + 3 + 4 + 7 + 8 + 5 + 4}{9} = \frac{49}{9} = 5.44$

    2. Median: It is the middle value we get after arranging all the values in ascending order. It is generally used when an extreme value is present. For n values, the median is the $\left(\frac{n + 1}{2}\right)$th value.

    Ex. given values: 5, 6, 7, 3, 4, 7, 8, 5, 4 [in ascending order: 3, 4, 4, 5, 5, 6, 7, 7, 8]

    Position = $\frac{9 + 1}{2} = 5$, so the median is the 5th value = 5.

    Again, if the given values are 3, 4, 4, 5, 5, 6, 7, 7, 8, 9 (n = 10), the two middle positions are $\frac{10}{2} = 5$ and $\frac{10}{2} + 1 = 6$. The 5th value is 5 and the 6th value is 6, so

    Median = $\frac{5 + 6}{2} = 5.5$

    3. Mode: In a given set of numbers, the value that appears most often is the mode of that data set. When two numbers appear most often, both of them are modes.

    Ex. given values: 3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9. Mode = 7

    4. Geometric mean: It is useful when we want to find the average change of percentages, ratios or growth rates over time. We can't use the GM when there is a 0 or a negative value in the set. The GM is never greater than the AM. The GM of a set of n positive numbers is defined as the nth root of the product of the n values:

    $GM = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$  {where n = number of values, $x_1, x_2, \ldots, x_n$ = the values}

    To find the average percent increase over time, the formula is:

    $GM = \sqrt[n]{\frac{\text{value at end of period}}{\text{value at start of period}}} - 1$  {where n = number of periods}

    5. Harmonic mean: We can't use the HM when there is a 0 or a negative value in the set. The formula is:

    $HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$  {where n = number of values, $x_1, x_2, \ldots, x_n$ = the values}

    Arithmetic mean is the best measure of central tendency: There are some criteria for a good measure of central tendency. They are:

    I. Clearly defined.
    II. Readily comprehensible.
    III. Easily calculated.
    IV. Based on all observations.
    V. Less affected by extreme values.
    VI. Capable of further algebraic treatment.


    And since the arithmetic mean fulfills every criterion other than the fifth, one can say that the arithmetic mean is the best measure of central tendency.

    Extreme value in a real-life situation: Take a family of three whose members weigh 40, 45 and 50 kg; the average is 45 kg. But if someone weighing 100 kg joins the family, the average becomes 58.75 kg, which is more than the three original values and very far from the new one.

    Sometimes measures of central tendency are not appropriate to work with. If we consider only the central tendency of a data set, we may draw a wrong conclusion about the whole data set. Ex. in a company, the average salary of three employees (98000 tk, 75000 tk and 65000 tk) is about 79333 tk, but this doesn't mean that everyone's salary is 79333 tk or some amount near it.
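
    The measures of central tendency discussed above can be checked numerically. The sketch below uses Python's standard `statistics` module on the example values from this section; it assumes Python 3.8 or later (for `multimode` and `geometric_mean`), and the variable names are illustrative only.

```python
import statistics

values = [5, 6, 7, 3, 4, 7, 8, 5, 4]

mean = statistics.mean(values)                 # 49 / 9
median_odd = statistics.median(values)         # 5th of the 9 sorted values
median_even = statistics.median([3, 4, 4, 5, 5, 6, 7, 7, 8, 9])  # mean of 5th and 6th
modes = statistics.multimode([3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 9])  # every most-frequent value
gm = statistics.geometric_mean(values)         # needs strictly positive values
hm = statistics.harmonic_mean(values)          # needs strictly positive values

# Effect of a single extreme value on the mean and on the median.
family = [40, 45, 50]
family_with_newcomer = family + [100]

print("mean:", round(mean, 2), "median:", median_odd, "modes:", modes)
print("GM:", round(gm, 2), "HM:", round(hm, 2))
print("family mean:", statistics.mean(family),
      "-> with newcomer:", statistics.mean(family_with_newcomer))
print("family median:", statistics.median(family),
      "-> with newcomer:", statistics.median(family_with_newcomer))
```

    For any set of positive values that are not all equal, the harmonic mean is the smallest of the three means and the arithmetic mean the largest. The family example shows why the median is preferred when an extreme value is present: the newcomer moves the median far less than the mean.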

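    Find the two numbers whose harmonic mean = 32/5 and geometric mean = 8.

    Let the two numbers be a and b, with a < b. Given, HM = 32/5 and GM = 8.

    $\sqrt{ab} = 8 \Rightarrow ab = 64 \Rightarrow a = \frac{64}{b}$ . . . (i)

    $\frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{32}{5} \Rightarrow \frac{2ab}{a + b} = \frac{32}{5} \Rightarrow \frac{2 \times 64}{a + b} = \frac{32}{5} \Rightarrow 32(a + b) = 5 \times 2 \times 64 \Rightarrow a + b = 20$

    Using (i): $\frac{64}{b} + b = 20 \Rightarrow \frac{64 + b^2}{b} = 20 \Rightarrow 64 + b^2 = 20b$

    $b^2 - 20b + 64 = 0 \Rightarrow b(b - 16) - 4(b - 16) = 0 \Rightarrow (b - 16)(b - 4) = 0$

    So, b = 16 or b = 4, and a = 20 - 16 = 4 or a = 20 - 4 = 16.

    Since a < b, a = 4 and b = 16.

    Find the mean for the series 1, 2, 3, . . . , 500.

    Mean for the series $= \frac{n + 1}{2} = \frac{500 + 1}{2} = 250.5$

    What is the median of the sample 4, 5, 7, 9, 6, 3, 2, 5, 1, 9, 8, 5, 8?

    In ascending order: 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 9. There are 13 values, so the median is the 7th value = 5.

    Find the mean for the series 1000, 2000, 3000, . . . , 50000.

    We know that the mean of 1, 2, 3, . . . , n is $\frac{n + 1}{2}$.

    The series 1000, 2000, 3000, . . . , 50000 is 1000 times the series 1, 2, 3, . . . , 50, whose mean is $\frac{50 + 1}{2} = 25.5$.

    So the required mean = 25.5 × 1000 = 25500

    Measures of Dispersion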

    Measures of dispersion describe how widely the values of a data set are spread around an average. They show how well the average represents the data: the smaller the dispersion, the more representative the average.

    Methods of measuring dispersion:

    The range is based on the largest and the smallest values in the data set; the mean deviation, variance and standard deviation are all based on deviations from the arithmetic mean.

    a. Range: The simplest measure of dispersion is the range. It is the difference between the largest and the smallest values in a data set. The formula is:

    Range = largest value - smallest value

    Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the range of his scores.

    Range = 98 - 75 = 23

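    b. Mean deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. The formula is:

    $MD = \frac{\sum |x - \bar{x}|}{n}$  {where x = each value, $\bar{x}$ = arithmetic mean = $\frac{\sum x}{n}$}

    Again, when we have to find the MD and SD for two numbers:

    $MD = SD = \frac{|a - b|}{2}$  {where a, b = the two numbers}

    Ex. A student took five exams and scored 92, 75, 95, 90 and 98. We have to find the mean deviation of his scores.

    Given, $x_1 = 92, x_2 = 75, x_3 = 95, x_4 = 90, x_5 = 98$

    So, $\bar{x} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$

    $MD = \frac{|92 - 90| + |75 - 90| + |95 - 90| + |90 - 90| + |98 - 90|}{5} = \frac{2 + 15 + 5 + 0 + 8}{5} = \frac{30}{5} = 6$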

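    c. Variance: The arithmetic mean of the squared deviations from the mean. The formula is:

    $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$

    Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the variance of his scores.

    Given, $x_1 = 86, x_2 = 94, x_3 = 76, x_4 = 76, x_5 = 88$

    So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$

    $\sigma^2 = \frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5} = \frac{2^2 + 10^2 + 8^2 + 8^2 + 4^2}{5} = \frac{4 + 100 + 64 + 64 + 16}{5} = \frac{248}{5} = 49.6$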

    d. Standard deviation: The square root of the variance. The formula is:

    $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

    Again, when we have to find the MD and SD for two numbers:

    $MD = SD = \frac{|a - b|}{2}$  {where a, b = the two numbers}

    Or, when we have to find the standard deviation of the first n natural numbers:

    $\sigma = \sqrt{\frac{n^2 - 1}{12}}$

    Ex. A student took five exams and scored 86, 94, 76, 76 and 88. We have to find the standard deviation of his scores.

    Given, $x_1 = 86, x_2 = 94, x_3 = 76, x_4 = 76, x_5 = 88$

    So, $\bar{x} = \frac{86 + 94 + 76 + 76 + 88}{5} = \frac{420}{5} = 84$

    $\sigma = \sqrt{\frac{(86 - 84)^2 + (94 - 84)^2 + (76 - 84)^2 + (76 - 84)^2 + (88 - 84)^2}{5}} = \sqrt{\frac{2^2 + 10^2 + 8^2 + 8^2 + 4^2}{5}} = \sqrt{\frac{4 + 100 + 64 + 64 + 16}{5}} = \sqrt{\frac{248}{5}} = \sqrt{49.6} = 7.04$

    Co-efficient of variation: The co-efficient of variation is used to compare the dispersion in different sets of data with different units of measurement. The formula is:

    $CV = \frac{\sigma}{\bar{x}} \times 100$

    Ex. Find the co-efficient of variation for the two values 5 and 3.

    Here, $\sigma = \frac{5 - 3}{2} = 1$

    and, $\bar{x} = \frac{5 + 3}{2} = 4$

    $CV = \frac{1}{4} \times 100 = 25\%$
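
    The dispersion measures worked out by hand above can be reproduced with a short script. This is a sketch only: `mean_deviation` and `coefficient_of_variation` are helper names introduced here for illustration, and the population forms `pvariance` and `pstdev` are used because the worked examples divide by n rather than n - 1.

```python
import statistics


def mean_deviation(xs):
    """Average absolute deviation from the arithmetic mean."""
    m = statistics.mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def coefficient_of_variation(xs):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100


scores_a = [92, 75, 95, 90, 98]   # range and mean deviation example
scores_b = [86, 94, 76, 76, 88]   # variance and standard deviation example

data_range = max(scores_a) - min(scores_a)
md = mean_deviation(scores_a)
variance = statistics.pvariance(scores_b)   # divides by n, as in the text
sd = statistics.pstdev(scores_b)
cv = coefficient_of_variation([5, 3])

print("range:", data_range)
print("mean deviation:", md)
print("variance:", variance)
print("standard deviation:", round(sd, 2))
print("coefficient of variation:", cv, "%")
```

    Note that `statistics.variance` and `statistics.stdev` (without the leading p) divide by n - 1 and would give slightly larger values than the hand calculation.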

    Correlation: Correlation is a way to measure how strongly two variables are associated or related. The formula is:

    $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$  {where r = correlation coefficient, $\bar{x}$ = mean of x, $\bar{y}$ = mean of y}

    There are three patterns of correlation:

    I. Positive correlation: In a positive correlation, if one of the observations increases the second one also increases, and when the first observation decreases the second one also decreases. Ex. higher education and years spent on education: people with higher education tend to have spent more years in education.

    II. Negative correlation: In a negative correlation, if one of the observations increases the second one decreases, and when the first observation decreases the second one increases. Ex. watching TV and exam grade: when a student watches a lot of TV, he tends to get a lower grade in the exam.

    III. Zero correlation: In a zero correlation, one observation doesn't have any effect on the other. Ex. Bill Gates' money and my happiness: no matter how much money Bill Gates has, it doesn't make me sadder or happier.

    Chart for correlation strength

    Range      Strength
    +1         Perfectly positive
    -1         Perfectly negative
    0 to .3    Weakly positive
    .3 to .7   Moderately positive
    .7 to 1    Strongly positive
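
    As an illustration, the correlation coefficient can be computed for the age and weight pairs used in the scatter diagram earlier in this assignment. The sketch below assumes the usual deviation form of Pearson's r, that is, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations; `pearson_r` is a name chosen here, not one from the source.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Age and weight pairs from the scatter diagram data.
age = [10, 20, 30, 40, 50, 60, 70]
weight = [20, 40, 45, 50, 55, 50, 45]

r = pearson_r(age, weight)
print("r =", round(r, 3))
```

    For this data r comes out at about 0.71: weight rises with age over most of the range, but the drop after age 50 keeps the relationship well short of perfect.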


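    Regression: It is a statistical tool for investigating the relationship between variables. The formula is:

    $y = a + bx$  {where a = intercept = $\beta_0$, b = slope = $\beta_1$}

    and,

    $a = \bar{y} - b\bar{x}$, $b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}$

    Difference between regression & correlation

    No.  Regression                                                  Correlation
    1.   It treats one variable as the cause and the other as the    It doesn't distinguish cause from effect.
         effect.
    2.   The regression coefficient ranges from $-\infty$ to         The correlation coefficient ranges from -1 to +1.
         $+\infty$.
    3.   It can be used to predict values.                           It can't be used to predict values.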
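
    A regression line can be fitted to the same age and weight data used for the scatter diagram. The sketch below assumes the ordinary least-squares formulas, slope b = (Σxy - n·x̄·ȳ) / (Σx² - n·x̄²) and intercept a = ȳ - b·x̄; the function name `fit_line` and the prediction at age 80 are illustrative choices, not part of the assignment.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


age = [10, 20, 30, 40, 50, 60, 70]
weight = [20, 40, 45, 50, 55, 50, 45]

a, b = fit_line(age, weight)
print(f"weight = {a:.3f} + {b:.3f} * age")

# Using the fitted line to predict (an extrapolation beyond the data).
predicted = a + b * 80
print("predicted weight at age 80:", round(predicted, 2))
```

    The fitted slope is 0.375, so each additional year of age goes with about 0.375 more units of weight on average. Predictions far outside the observed ages should be treated with caution.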

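    Probability:

    Probability provides a way to measure and express our uncertainty when making decisions about a population from sample information. Probability reflects the long-run relative frequency of an outcome. A probability can be expressed as a decimal (0.1), a fraction ($\frac{1}{10}$) or a percentage (10%). Formulas:

    I. $P(A \text{ or } B) = P(A) + P(B)$ (for mutually exclusive events)
    II. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
    III. $P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
    IV. $P(A \cap B) = P(A) \cdot P(B)$ (for independent events)
    V. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B/A)}{P(B)}$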
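
    The addition, multiplication and conditional probability rules can be checked by counting outcomes. The sketch below uses a hypothetical experiment that is not in the source, one roll of a fair six-sided die, with exact fractions so that the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event as favourable outcomes over all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
one = {1}

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_union = prob(even | greater_than_3)
p_union_by_rule = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

# Mutually exclusive events: the overlap is empty, so the probabilities add.
p_exclusive = prob(even | one)

# Conditional probability: P(A/B) = P(A and B) / P(B).
p_even_given_gt3 = prob(even & greater_than_3) / prob(greater_than_3)

# Multiplication rule: P(A and B) = P(B) * P(A/B).
p_both_by_rule = prob(greater_than_3) * p_even_given_gt3

print(p_union, p_union_by_rule, p_exclusive, p_even_given_gt3, p_both_by_rule)
```

    Here "even" and "greater than 3" are not independent: knowing that the roll is greater than 3 raises the probability of an even number from 1/2 to 2/3.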

    Important Terms:

    Experiment: It is an activity that is either observed or measured, such as tossing a coin or drawing a card.

    Event: An event is a possible outcome of an experiment. Ex. if the experiment is to sample six lamps coming off a production line, an event could be to get one defective lamp and five good ones.

    Certain event: An event that is sure to occur, so its probability is 1. Ex. if we have a sample of eight numbers, S = {2, 3, 5, 7, 11, 13, 17, 19}, the probability that a number drawn from S is one of those eight numbers is $\frac{8}{8} = 1$.


    Impossible event: An event which has no possibility of occurring. Ex. in a jar containing only red balls, finding a white ball is an impossible event.

    Sample space: A sample space is the complete set of all possible outcomes of an experiment. Ex. singer = {…, …, …, …, …, …}, bee = {…, …}.

    Mutually exclusive events: Events that can't happen at the same time are called mutually exclusive events. Ex. in a toss of a single coin, the events heads and tails are mutually exclusive.

    Independent events: Two or more events are called independent when the occurrence or non-occurrence of one doesn't affect the others. Ex. a coin toss and an exam grade.

    Conditional probability: The probability that X occurs given that Y has occurred. It is denoted by P(X/Y).

    Probability Distributions: Three types of probability distribution are covered here:

    I. Binomial distribution: The probability distribution of a random variable X that counts the number of successes in n independent trials, each with the same probability of success, is called the binomial distribution. The formula is:

    $P(x) = {}^{n}C_{x}\, p^{x} q^{n - x}$  {where n = number of trials, x = 0, 1, 2, 3, . . . , n, p = probability of success, q = probability of failure [q = 1 - p]}

    Mean of the binomial distribution: $\mu = E(X) = np$
    Variance of the binomial distribution: $V(X) = \sigma^2 = npq$

    II. Poisson probability: There are some applications for the Poisson distribution. The applications are:

    a) The number of deaths by horse kicking in the army.
    b) Birth defects and genetic mutations.
    c) Rare diseases (leukemia).
    d) Car accidents.
    e) Traffic flow and ideal gap distance.
    f) Number of typing errors on a page.
    g) Hairs found in McDonald's burgers.
    h) Spread of an endangered animal in Africa.
    i) Failure of a machine in one month.

    The formula is: $P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$  {where x = 0, 1, 2, 3, . . . , e = 2.71828, $\lambda$ = average number of occurrences}

    Mean and variance: $E(X) = \lambda$, $V(X) = \sigma^2 = \lambda$.

    III. Normal probability distribution: The normal probability distribution is very common in the field of statistics. Formula:

    $f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

    Mean and variance: $E(X) = \mu$, $V(X) = \sigma^2$

    Area under the normal curve using integration: The probability that a continuous normal variable X is found in a particular interval [a, b] is the area under the curve bounded by x = a and x = b:

    $P(a < X < b) = \int_{a}^{b} f(x)\, dx$

    The standard normal distribution: If we have the standardized situation of $\mu = 0$ and $\sigma = 1$, then we have

    $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$

    We can transform all the observations of any normal random variable x with mean ($\mu$) and variance ($\sigma^2$) to a new set of observations of another normal random variable z with mean 0 and variance 1 using the following transformation:

    $z = \frac{x - \mu}{\sigma}$

    Properties of the normal distribution:

    a) The normal curve is symmetrical about the mean $\mu$.
    b) The mean is at the middle and divides the area into halves.
    c) The total area under the curve is equal to 1.
    d) It is completely determined by its mean and standard deviation.
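
    The three distributions can be evaluated with the Python standard library alone. The sketch below assumes the standard textbook forms of the binomial and Poisson probability functions and computes normal areas through the error function `math.erf` instead of integrating numerically; all function names and the example parameters are chosen here for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable, via the standardized value z."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_interval(a, b, mu=0.0, sigma=1.0):
    """Area under the normal curve between x = a and x = b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Binomial: 10 tosses of a fair coin (hypothetical example).
n, p = 10, 0.5
binomial_probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
binomial_mean = sum(x * px for x, px in enumerate(binomial_probs))

print("P(5 heads in 10 tosses):", binomial_pmf(5, n, p))
print("binomial mean:", binomial_mean, "expected n*p:", n * p)
print("Poisson P(0) with lam = 2:", round(poisson_pmf(0, 2), 4))
print("P(-1.96 < Z < 1.96):", round(normal_interval(-1.96, 1.96), 4))
print("P(90 < X < 110), mu = 100, sigma = 10:",
      round(normal_interval(90, 110, 100, 10), 4))
```

    The last two lines illustrate the standardization step: the interval 90 to 110 for a variable with mean 100 and standard deviation 10 is the same area as -1 to +1 on the standard normal curve.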

    Sampling:

    The methods of drawing a sample from a population are:

    1. Simple random sampling: A simple random sample is a sample selected so that each item or person in the population has the same chance of being included. This can be done by two methods:

    I. Lottery method: Suppose we have to select 3 people at random from a group. We write all their names on separate small pieces of paper, fold them so that no one can read which name is written on which, and shuffle them in a jar. Then we ask someone to pick three pieces of paper from the jar; the names on them are our 3 selected people. This way of selecting a simple random sample is called the lottery method.

    II. Using random numbers: Random numbers can be obtained using a calculator, a spreadsheet or printed tables of random numbers, or by tossing coins or rolling dice.

    2. Stratified sampling: Suppose we have to select 1 single, 1 married and 2 divorced people from a group. To do that, we divide all the men and women of the group into 3 subgroups: 1. single, 2. married and 3. divorced. Then we take one person from the first subgroup, one from the second subgroup and two from the third subgroup. This way we get our 1 single, 1 married and 2 divorced people. This method of sampling is called stratified sampling.

    3. Systematic sampling: Suppose we have to find out within two days what the students of a university think about a new drink, but we can ask only a hundred students in that time. There are five thousand students in the university. So we take a sampling interval of 5000/100 = 50 and ask every 50th ID holder about our new drink. This process of sampling is called systematic sampling.


    4. Cluster sampling: Suppose we have to find out within two days what the students of a university think about a new drink. There are five thousand students in the university, studying thirty subjects. That is a huge amount of data to process in two days. So to complete the task in time, we select five of the subjects and ask twenty students from each selected subject about our new drink. This process of sampling is called cluster sampling.
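
    The four sampling methods described above can be sketched with Python's `random` module. Everything concrete here is a hypothetical stand-in: the student IDs 1 to 5000, the way students are assigned to the thirty subjects, and the names in each stratum. A fixed seed is used only so that the sketch gives the same result every time it is run.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

student_ids = list(range(1, 5001))  # 5000 hypothetical student IDs

# 1. Simple random sampling: every student has the same chance.
simple = rng.sample(student_ids, 100)

# 2. Stratified sampling: split into subgroups, draw a quota from each.
strata = {
    "single": ["S1", "S2", "S3", "S4"],
    "married": ["M1", "M2", "M3"],
    "divorced": ["D1", "D2", "D3"],
}
quota = {"single": 1, "married": 1, "divorced": 2}
stratified = {name: rng.sample(members, quota[name])
              for name, members in strata.items()}

# 3. Systematic sampling: interval k = 5000 / 100 = 50, random starting point.
k = len(student_ids) // 100
start = rng.randrange(k)
systematic = student_ids[start::k]

# 4. Cluster sampling: pick 5 of the 30 subjects, then 20 students from each.
subjects = {s: [i for i in student_ids if i % 30 == s] for s in range(30)}
chosen_subjects = rng.sample(sorted(subjects), 5)
cluster = [student
           for s in chosen_subjects
           for student in rng.sample(subjects[s], 20)]

print(len(simple), len(systematic), len(cluster))
print(stratified)
```

    All three student samples contain 100 students, but they differ in how the students were reached: the systematic sample is evenly spaced through the ID list, while the cluster sample comes from only five subjects.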

    Difference between stratified sampling & cluster sampling

    No.  Stratified sampling                                      Cluster sampling
    1.   Two strata cannot be the same.                           Two clusters can be the same.
    2.   Members within a stratum are alike (homogeneous),        Clusters are alike one another (homogeneous).
         while the strata differ from one another
         (heterogeneous).
    3.   The population is divided into groups (strata) and       The population is divided into branches (clusters)
         units are drawn from every group.                        and only some of the branches are selected.

    Hypothesis:

    A hypothesis is a statement about a population parameter that is subject to verification.

    Null hypothesis: A statement about the value of a population parameter developed for the purpose of testing numerical evidence. It is denoted by $H_0$.

    Alternate hypothesis: A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. It is denoted by $H_1$.

    Level of significance: The probability of rejecting the null hypothesis when it is true. It is also called the level of risk, because it is the risk we take of rejecting the null hypothesis when it is really true. It is denoted by $\alpha$.

    Hypothesis testing is done in five simple steps. They are:

    Step 1: Establishing $H_0$ and $H_1$.
    Step 2: Selecting the value for $\alpha$ and the corresponding critical value $z_\alpha$.
    Step 3: Selecting the appropriate formula: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
    Step 4: Calculating the value of z.
    Step 5: Making a decision. We reject or do not reject $H_0$ depending on the value of z. For a right-tailed test, if the value of z is more than $z_\alpha$, we reject $H_0$ and accept $H_1$; if the value of z is less than $z_\alpha$, we do not reject $H_0$.

    Even when all five steps are carried out correctly, two kinds of error are possible. They are:

    Type 1 error: Rejecting $H_0$ when it is true.
    Type 2 error: Accepting $H_0$ when it is false.

    The average I.Q. of university women in Bangladesh is suspected to be more than 110. A random sample of 64 women yielded an average I.Q. of 115.5 and a standard deviation of 20. Can you conclude that the average I.Q. of the women in the population is really more than 110? Test this at the 5% level of significance ($z_{5\%} = 1.64$).

    Step 1: $H_0: \mu = 110$, $H_1: \mu > 110$

    Step 2: $z_{5\%} = 1.64$

    Step 3: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

    Step 4: Here, $\bar{x} = 115.5$, $\mu = 110$, $n = 64$, $\sigma = 20$

    $z = \frac{115.5 - 110}{20 / \sqrt{64}} = \frac{5.5}{2.5} = 2.2$

    Step 5: The value of z = 2.2 is more than the value of $z_{5\%} = 1.64$, so we reject $H_0$ and conclude that the average I.Q. of university women in Bangladesh is more than 110.
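
    The five steps of this test can be written as a short script. This is a sketch under the same assumptions as the worked example: a right-tailed z test, with the critical value 1.64 taken as given in the question rather than computed. The function name `z_statistic` is illustrative.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# Step 1: H0: mu = 110 against H1: mu > 110 (right-tailed test).
mu0 = 110

# Step 2: critical value for the 5% level, as quoted in the question.
critical_value = 1.64

# Steps 3 and 4: compute z from the sample information.
z = z_statistic(sample_mean=115.5, mu0=mu0, sigma=20, n=64)

# Step 5: reject H0 when z exceeds the critical value.
reject_h0 = z > critical_value

print("z =", round(z, 2))
print("reject H0" if reject_h0 else "do not reject H0")
```

    Changing the sample size shows how the decision depends on n: with the same mean and standard deviation but only 16 women, z would be 1.1 and the null hypothesis would not be rejected.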