Dr. Rohit Joshi, IIM Shillong Data, Models and Decisions PGP 13-15

Session 2 Data Collection


7/17/2019 Session 2 Data Collection

http://slidepdf.com/reader/full/session-2-data-collection 1/50

Dr. Rohit Joshi, IIM Shillong

Data, Models and Decisions PGP 13-15


Why Study Statistics?

Decision Makers Use Statistics To:

Present and describe business data and information

Properly draw conclusions about large populations, using information collected from samples

Make reliable forecasts about a business activity

Improve business processes


Why Collect Data?

A marketing research analyst needs to assess the effectiveness of a new television advertisement.

A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.

An operations manager wants to monitor a manufacturing process to find out whether the quality of product being manufactured is conforming to company standards.

An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.


Types of Statistics

Statistics: The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics

Collecting, summarizing, and describing data

Inferential Statistics

Drawing conclusions and/or making decisions concerning a population based only on sample data


Descriptive Statistics

Collect data

e.g. Survey

Present data

e.g. Tables and graphs

Characterize data

e.g. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
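The sample mean on this slide can be sketched in a few lines of Python. The weights below are invented purely for illustration and the helper name `sample_mean` is an assumption, not something from the course material.

```python
# Hypothetical survey responses (kg); these values are made up for illustration.
weights_kg = [62.0, 70.5, 58.0, 81.5, 66.0]

def sample_mean(values):
    """Return the sum of the observations divided by the number of observations."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(values) / n

x_bar = sample_mean(weights_kg)
print(f"n = {len(weights_kg)}, sample mean = {x_bar:.2f} kg")
```

The result is a statistic: it describes this particular sample, and a different sample would generally give a different value.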


Inferential Statistics

Estimation

e.g. Estimate the population mean weight using the sample average weight

Hypothesis testing

e.g. Test the claim that the population average weight is 65 kg

Drawing conclusions and/or making decisions concerning a population based on sample results.

Basic Vocabulary of Statistics

VARIABLE

A variable is a characteristic of an item or individual.

DATA

Data are the different values associated with a variable.

POPULATION

A population consists of all the items or individuals about which you want to draw a conclusion.

SAMPLE

A sample is the portion of a population selected for analysis.

PARAMETER

A parameter is a numerical measure that describes a characteristic of a population.

STATISTIC

A statistic is a numerical measure that describes a characteristic of a sample.


Population vs. Sample

Population    Sample

Measures used to describe the population are called parameters

Measures computed from sample data are called statistics


Sources of Data

Primary Sources: The data collector is the one using the data for analysis

Data from a political survey

Data collected from an experiment

Observed data

Secondary Sources: The person performing data analysis is not the data collector

Analyzing census data

Examining data from print journals or data published on the internet.


Types of Variables

Categorical (qualitative) variables have values that can only be placed into categories, such as "yes" and "no."

Numerical (quantitative) variables have values that represent quantities.


Types of Data

Data

Categorical (defined categories)

Examples: Marital Status, Political Party, Eye Color

Numerical

Discrete (counted items)

Examples: Number of Children, Defects per hour

Continuous (measured characteristics)

Examples: Weight, Voltage


Probability

Empirical classical probability

Based on historical data

Computed after performing the experiment

Number of times an event occurred divided by the number of trials

Objective: everyone correctly using the method assigns an identical probability

Subjective probability

Different individuals may (correctly) assign different numeric probabilities to the same event

Mutually exclusive events

Collectively exhaustive events

Equally likely events


Random Variable

A random variable x takes on a defined set of values with different probabilities.

For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.

For example, if you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 100" is also a random variable (the percentage will be slightly different every time you poll).

Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over ("frequentist" view)


Random variables can be discrete or continuous

Discrete random variables have a countable number of outcomes

Examples: Dead/alive, dice, counts, etc.

Continuous random variables have an infinite continuum of possible values.

Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.


Probability functions

A probability function maps the possible values of x against their respective probabilities of occurrence, p(x)

p(x) is a number from 0 to 1.0.

The area under a probability function is always 1.


Discrete example: roll of a die

[Figure: bar chart of p(x) against x for x = 1, 2, ..., 6; every bar has height 1/6]

$$\sum_{\text{all } x} P(x) = 1$$


Probability mass function (pmf)

x        p(x)
1        p(x=1) = 1/6
2        p(x=2) = 1/6
3        p(x=3) = 1/6
4        p(x=4) = 1/6
5        p(x=5) = 1/6
6        p(x=6) = 1/6
Total    1.0


Cumulative distribution function (CDF)

[Figure: step plot of P(x) against x for x = 1, 2, ..., 6, rising through 1/6, 1/3, 1/2, 2/3 and 5/6 to 1.0]


Cumulative distribution function

x    P(x≤A)
1    P(x≤1) = 1/6
2    P(x≤2) = 2/6
3    P(x≤3) = 3/6
4    P(x≤4) = 4/6
5    P(x≤5) = 5/6
6    P(x≤6) = 6/6
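The pmf and CDF tables for a fair die can be rebuilt with exact fractions. This is a minimal sketch; the variable names are illustrative choices, and the die is assumed to be fair.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# pmf: every face of a fair die has probability 1/6.
pmf = {x: Fraction(1, 6) for x in faces}

# CDF: running total of the pmf, so cdf[x] = P(X <= x).
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

# A valid pmf must sum to 1.
assert sum(pmf.values()) == 1

for x in faces:
    print(f"x={x}  p(x)={pmf[x]}  P(X<={x})={cdf[x]}")
```

Note that `Fraction` prints values in lowest terms, so 3/6 appears as 1/2.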


Practice Problem:

The number of patients seen in the ER in any given hour is a random variable represented by x. The probability distribution for x is:

x       10    11    12    13    14
P(x)    .4    .2    .2    .1    .1

Find the probability that in a given hour:

a. exactly 14 patients arrive

b. at least 12 patients arrive

c. at most 11 patients arrive

a. p(x=14) = .1

b. p(x≥12) = (.2 + .1 + .1) = .4

c. p(x≤11) = (.4 + .2) = .6
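The three answers above can be checked by summing the relevant entries of the distribution. The helper name `prob` is an illustrative assumption; the probabilities are the ones given in the problem.

```python
# Probability distribution of hourly ER arrivals, as given in the problem.
dist = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

def prob(event):
    """Sum p(x) over every x for which the predicate `event` is true."""
    return sum(p for x, p in dist.items() if event(x))

p_exactly_14 = prob(lambda x: x == 14)
p_at_least_12 = prob(lambda x: x >= 12)
p_at_most_11 = prob(lambda x: x <= 11)

# Rounded for display because floating-point sums are not exact.
print(round(p_exactly_14, 2), round(p_at_least_12, 2), round(p_at_most_11, 2))
```

Parts b and c are complementary events, so their probabilities add to 1.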


Review Question 1

If you toss a die, what's the probability that you roll a 3 or less?

a. 1/6

b. 1/3

c. 1/2

d. 5/6

e. 1.0


Review Question 1

If you toss a die, what's the probability that you roll a 3 or less?

a. 1/6

b. 1/3

c. 1/2 (answer)

d. 5/6

e. 1.0


Review Question 2

Two dice are rolled and the sum of the face values is six. What is the probability that at least one of the dice came up a 3?

a. 1/5

b. 2/3

c. 1/2

d. 5/6

e. 1.0


Review Question 2

Two dice are rolled and the sum of the face values is six. What is the probability that at least one of the dice came up a 3?

a. 1/5 (answer)

b. 2/3

c. 1/2

d. 5/6

e. 1.0

How can you get a 6 on two dice?

1-5, 5-1, 2-4, 4-2, 3-3

One of these five has a 3.

∴ 1/5
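The counting argument above can be reproduced by enumerating all 36 ordered rolls and keeping only those that sum to six. This is a sketch assuming two fair, distinguishable dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))

# Condition on the event "the sum is six".
sum_is_six = [r for r in rolls if sum(r) == 6]

# Within that reduced sample space, keep rolls showing at least one 3.
with_a_three = [r for r in sum_is_six if 3 in r]

p = Fraction(len(with_a_three), len(sum_is_six))
print(sum_is_six)   # the five ways to roll a total of six
print(p)
```

Conditioning shrinks the sample space from 36 outcomes to 5, which is why the denominator is 5 rather than 36.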


Example: Suppose we flip two identical coins simultaneously. What is the probability of obtaining a head on the first coin (call event A) and a head on the second coin (call event B)?

Example: A card is drawn from a well shuffled pack of playing cards. What is the probability that it will be either a spade or a queen?

Example: In a DMD class there are 123 students, of which 93 students are males and 30 are females. Of these, 36 males and 18 females plan to major in Marketing. A student is selected at random from this class and it is found that this student plans to be a Marketing major. What is the probability that the student is a male?
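The slide poses these three examples without answers. One way to work them is by direct enumeration, sketched below under the assumptions of fair coins, a standard 52-card deck, and every student being equally likely to be selected. All names in the code are illustrative.

```python
from fractions import Fraction
from itertools import product

# Example 1: joint probability of independent events (head on both coins).
coin_outcomes = list(product("HT", repeat=2))
p_two_heads = Fraction(
    sum(1 for a, b in coin_outcomes if a == "H" and b == "H"),
    len(coin_outcomes),
)

# Example 2: addition rule, P(spade or queen) = P(spade) + P(queen) - P(both).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
p_spade_or_queen = Fraction(
    sum(1 for rank, suit in deck if suit == "spades" or rank == "Q"),
    len(deck),
)

# Example 3: conditional probability, P(male | Marketing major).
marketing_males, marketing_females = 36, 18
p_male_given_marketing = Fraction(
    marketing_males, marketing_males + marketing_females
)

print(p_two_heads, p_spade_or_queen, p_male_given_marketing)
```

In the third example the class totals (93 males, 30 females) do not enter the calculation, because the condition restricts attention to the 54 students who plan to major in Marketing.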


Continuous case

The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.

For example, recall the negative exponential function (in probability, this is called an "exponential distribution"):

$$f(x) = e^{-x}$$

This function integrates to 1:

$$\int_0^{+\infty} e^{-x}\,dx = \left. -e^{-x} \right|_0^{+\infty} = 0 + 1 = 1$$


For example, the probability of x falling within 1 to 2:

[Figure: curve of p(x) = e^{-x} against x, starting at 1 when x = 0, with the area between x = 1 and x = 2 shaded]

Clinical example: Survival times after lung transplant may roughly follow an exponential function.

Then, the probability that a patient will die in the second year after surgery (between years 1 and 2) is 23%.

$$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x} \right|_1^2 = -e^{-2} - (-e^{-1}) = -.135 + .368 = .23$$
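The 23% figure can be checked two ways: with the closed-form antiderivative and with a simple numerical integral. This is a sketch; the function names and the step count are illustrative assumptions.

```python
import math

def exp_cdf(x):
    """P(X <= x) for the density f(x) = e^(-x) on x >= 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# Closed form: P(1 <= X <= 2) = F(2) - F(1) = e^(-1) - e^(-2).
p_closed = exp_cdf(2) - exp_cdf(1)

# Numerical check: area under e^(-x) between 1 and 2.
p_numeric = midpoint_integral(lambda x: math.exp(-x), 1.0, 2.0)

print(f"closed form: {p_closed:.4f}, numeric: {p_numeric:.4f}")
```

For a continuous variable the probability of an interval is an area under the density, so single points carry zero probability and the endpoints can be included or excluded freely.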


Expected Value and Variance

All probability distributions are characterized by an expected value (mean) and a variance (standard deviation squared).


Expected value, formally

Discrete case:

$$E(X) = \sum_{\text{all } x_i} x_i \, p(x_i)$$

Continuous case:

$$E(X) = \int_{\text{all } x} x \, p(x)\,dx$$
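The discrete formula can be applied directly to a pmf stored as a dictionary. The sketch below uses the fair-die pmf from the earlier slides; the function names are illustrative, and the variance is computed as the probability-weighted squared deviation from the mean.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X): sum of x * p(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): sum of (x - E(X))^2 * p(x) over all values x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Fair die: each face has probability 1/6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expected_value(die_pmf)
var = variance(die_pmf)
print(f"E(X) = {mean} = {float(mean)}, Var(X) = {var}")
```

The expected value 3.5 is not a face of the die: it is a long-run average, not a possible outcome.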


A Situation

Acme Fruit and Vegetable Wholesalers buys tomatoes, then sells them to retailers. Acme currently pays ₹2000 per container. Tomatoes sold on the same day bring ₹5000 per container. Tomatoes are extremely perishable: any container not sold on the same day is worthless and must be disposed of (assume at no cost). The distribution manager's problem is to determine the optimum number of containers he should order each day. On days when he stocks more than he sells, his profit is reduced by the cost of the unsold containers. On the other hand, when retailers request more containers than he has in stock, he loses sales and makes a smaller profit than he could have.


Developing Payoff Table

Acme currently pays ₹2000 per container. Tomatoes sold on the same day bring ₹5000 per container. Profit = ₹3000 per container.

Payoff table (in ₹ '00)

ACTIONS (Quantity ordered)

EVENTS (Demand)    A1 = 10    A2 = 11    A3 = 12    A4 = 13
D1 = 10            300        280        260        240
D2 = 11            300        330        310        290
D3 = 12            300        330        360        340
D4 = 13            300        330        360        390

With Q the quantity ordered and D the demand: when D ≥ Q, P = 30Q, and when D < Q, P = 30D − 20(Q − D)
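The payoff table can be generated from the buying and selling prices alone. This is a minimal sketch: the constant and function names are illustrative assumptions, and amounts are expressed in hundreds of rupees to match the table.

```python
# Prices per container, in hundreds of rupees (2000 -> 20, 5000 -> 50).
COST = 20
PRICE = 50

quantities = [10, 11, 12, 13]   # actions: containers ordered
demands = [10, 11, 12, 13]      # events: containers requested

def payoff(quantity, demand):
    """Profit when `quantity` containers are stocked and `demand` are requested.

    Only min(quantity, demand) containers are sold; every stocked
    container is paid for, and unsold ones are worthless.
    """
    sold = min(quantity, demand)
    return PRICE * sold - COST * quantity

# One row per demand level, one column per order quantity.
table = {d: [payoff(q, d) for q in quantities] for d in demands}

print("Demand  " + "  ".join(f"Q={q}" for q in quantities))
for d in demands:
    print(f"D={d}    " + "   ".join(f"{v:4d}" for v in table[d]))
```

Writing the profit as revenue minus cost gives the same numbers as the two-case rule: each unit sold nets 30, and each unsold unit costs 20.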


Probability of Occurrence Principle

Let us suppose the manager kept a record of his sales for the past 100 days.

Daily Sales    Number of days sold    Probability of each number being sold
D1 = 10        15                     0.15
D2 = 11        20                     0.20
D3 = 12        40                     0.40
D4 = 13        25                     0.25

The expected value (EV) of decision alternative d_i is defined as:

$$EV(d_i) = \sum_{j=1}^{N} P(s_j)\,V_{ij}$$

where: N = the number of states of nature

P(s_j) = the probability of state of nature s_j

V_ij = the payoff corresponding to decision alternative d_i and state of nature s_j


Expected profit from stocking 10 containers

ACTION (Quantity ordered is 10)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            300                       0.15                          45
D2 = 11            300                       0.20                          60
D3 = 12            300                       0.40                          120
D4 = 13            300                       0.25                          75
Total EV                                                                   300

Expected profit from stocking 11 containers

ACTION (Quantity ordered is 11)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            280                       0.15                          42
D2 = 11            330                       0.20                          66
D3 = 12            330                       0.40                          132
D4 = 13            330                       0.25                          82.5
Total EV                                                                   322.5


Expected profit from stocking 12 containers

ACTION (Quantity ordered is 12)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            260                       0.15                          39
D2 = 11            310                       0.20                          62
D3 = 12            360                       0.40                          144
D4 = 13            360                       0.25                          90
Total EV                                                                   335

Expected profit from stocking 13 containers

ACTION (Quantity ordered is 13)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            240                       0.15                          36
D2 = 11            290                       0.20                          58
D3 = 12            340                       0.40                          136
D4 = 13            390                       0.25                          97.5
Total EV                                                                   327.5

Strategy adopted: stock 12 containers, the action with the highest total EV (335).
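The four expected-profit tables follow one recipe, so they can be computed in a loop. The sketch below derives the probabilities from the 100-day sales record and reuses the payoff rule from the situation description; all names are illustrative assumptions, and amounts are in hundreds of rupees.

```python
from fractions import Fraction

COST, PRICE = 20, 50                              # per container, in hundreds of rupees
days_sold = {10: 15, 11: 20, 12: 40, 13: 25}      # demand level -> number of days observed
total_days = sum(days_sold.values())

# Relative frequencies serve as the probabilities of each demand level.
prob = {d: Fraction(n, total_days) for d, n in days_sold.items()}

def payoff(quantity, demand):
    """Revenue on units actually sold minus the cost of every unit stocked."""
    return PRICE * min(quantity, demand) - COST * quantity

# EV(d_i) = sum over states j of P(s_j) * V_ij
expected_profit = {
    q: sum(prob[d] * payoff(q, d) for d in prob) for q in days_sold
}

best_quantity = max(expected_profit, key=expected_profit.get)

for q, ev in expected_profit.items():
    print(f"stock {q}: EV = {float(ev)}")
print("best order quantity:", best_quantity)
```

Exact fractions are used so the totals match the hand-computed tables without floating-point rounding.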


Important discrete probability distribution: The binomial


The Binomial Distribution: Properties

A fixed number of observations, n

e.g. 15 tosses of a coin; ten light bulbs taken from a warehouse

Two mutually exclusive and collectively exhaustive categories

e.g. head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl

Generally called "success" and "failure"

Probability of success is p, probability of failure is 1 − p

Constant probability for each observation

The outcome of one observation does not affect the outcome of the other

Two sampling methods

Infinite population without replacement

Finite population with replacement

Binomial distribution

Take the example of 5 coin tosses. What's the probability that you flip exactly 3 heads in 5 coin tosses?

Binomial distribution, generally

Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly X "successes" =

$$\binom{n}{X} p^{X} (1-p)^{n-X}$$

n = number of trials

X = number of successes out of n trials

p = probability of success

1 − p = probability of failure
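The general formula translates almost directly into code. The sketch below applies it to the earlier question of exactly 3 heads in 5 tosses, assuming a fair coin; the function name `binomial_pmf` is an illustrative choice.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials.

    math.comb(n, x) counts the arrangements of x successes among n trials;
    each arrangement has probability p^x * (1 - p)^(n - x).
    """
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.5
p_three_heads = binomial_pmf(3, n, p)
print(f"P(exactly 3 heads in 5 tosses) = {p_three_heads}")

# The probabilities over all possible counts 0..n must sum to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(f"sum over x = 0..{n}: {total}")
```

There are 10 ways to place 3 heads among 5 tosses, each with probability 1/32, which gives 10/32.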

Binomial distribution: example

If I toss a coin 20 times, what's the probability of getting exactly 10 heads?

$$\binom{20}{10} (.5)^{10} (.5)^{10} = .176$$

Binomial distribution: example

If I toss a coin 20 times, what's the probability of getting 2 or fewer heads?

$$\begin{aligned}
\binom{20}{0}(.5)^{0}(.5)^{20} &= \frac{20!}{20!\,0!}(.5)^{20} = 9.5 \times 10^{-7} \\
{}+ \binom{20}{1}(.5)^{1}(.5)^{19} &= \frac{20!}{19!\,1!}(.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5} \\
{}+ \binom{20}{2}(.5)^{2}(.5)^{18} &= \frac{20!}{18!\,2!}(.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4} \\
&= 2.0 \times 10^{-4}
\end{aligned}$$
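Both 20-toss examples can be checked numerically, assuming a fair coin. A cumulative probability such as "2 or fewer" is the sum of the individual pmf terms. The function names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """Probability of x or fewer successes: sum of the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 20, 0.5

p_exactly_10 = binomial_pmf(10, n, p)
p_at_most_2 = binomial_cdf(2, n, p)

print(f"P(X = 10) = {p_exactly_10:.3f}")
for k in range(3):
    print(f"  P(X = {k}) = {binomial_pmf(k, n, p):.2e}")
print(f"P(X <= 2) = {p_at_most_2:.2e}")
```

With p = 0.5 every sequence of 20 tosses has probability (.5)^20, so the cumulative probability is (1 + 20 + 190) / 2^20.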


All probability distributions are characterized by an expected value and a variance:

If X follows a binomial distribution with parameters n and p: X ~ Bin(n, p)

Mean

$$\mu = E(X) = np$$

Variance and Standard Deviation

$$\sigma^2 = np(1-p) \qquad \sigma = \sqrt{np(1-p)}$$

Where n = sample size

p = probability of success

(1 − p) = probability of failure
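The shortcut formulas for the mean and variance can be compared against the general definitions applied to the binomial pmf. The sketch below uses n = 20 and p = 0.5 as an illustrative choice; any valid n and p should agree in the same way.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.5

# Shortcut formulas for the binomial distribution.
mean_formula = n * p
var_formula = n * p * (1 - p)
sd_formula = math.sqrt(var_formula)

# General definitions: probability-weighted sums over all values of X.
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var_from_pmf = sum(
    (x - mean_from_pmf) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1)
)

print(f"mean: formula {mean_formula}, from pmf {mean_from_pmf:.6f}")
print(f"variance: formula {var_formula}, from pmf {var_from_pmf:.6f}")
print(f"standard deviation: {sd_formula:.4f}")
```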


Applications

A manufacturing plant labels items as either defective or acceptable

A firm bidding for contracts will either get a contract or not

A marketing research firm receives survey responses of "yes I will buy" or "no I will not"

New job applicants either accept the offer or reject it

Your team either wins or loses the football game at the company picnic


The Hypergeometric Distribution

The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.

The hypergeometric distribution is applicable when selecting from a finite population without replacement.


The Hypergeometric Distribution

$$P(X) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}}$$

Where

N = population size

A = number of successes in the population

N − A = number of failures in the population

n = sample size

X = number of successes in the sample

n − X = number of failures in the sample


The Hypergeometric Distribution: Example

3 different computers are checked from 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?

So, N = 10, n = 3, A = 4, X = 2

$$P(X = 2) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}} = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3$$

The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
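The computer-audit example can be reproduced with the counting formula. This is a sketch using exact fractions; the function name `hypergeom_pmf` and its argument order are illustrative choices that mirror the slide's symbols.

```python
import math
from fractions import Fraction

def hypergeom_pmf(x, N, A, n):
    """Probability of x successes in a sample of n drawn without replacement
    from a population of N items that contains A successes."""
    ways_successes = math.comb(A, x)          # choose x of the A successes
    ways_failures = math.comb(N - A, n - x)   # choose the rest from the failures
    ways_total = math.comb(N, n)              # all possible samples of size n
    return Fraction(ways_successes * ways_failures, ways_total)

# 10 computers, 4 with illegal software, 3 checked, 2 found with it.
p_two_of_three = hypergeom_pmf(2, N=10, A=4, n=3)
print(p_two_of_three, float(p_two_of_three))

# The probabilities for X = 0, 1, 2, 3 cover every possible sample.
total = sum(hypergeom_pmf(x, N=10, A=4, n=3) for x in range(4))
print("sum over X = 0..3:", total)
```

Because the computers are checked without replacement, the chance of a "success" changes from draw to draw, which is why the binomial formula does not apply here.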


The Hypergeometric Distribution: Characteristics

The mean of the hypergeometric distribution is:

$$\mu = E(X) = \frac{nA}{N}$$

The standard deviation is:

$$\sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}}$$

Where $\sqrt{\frac{N-n}{N-1}}$ is called the "Finite Population Correction Factor" from sampling without replacement from a finite population
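As a check on these formulas, the mean and standard deviation can be computed both from the closed forms and from the pmf itself. The sketch below reuses the computer-audit numbers (N = 10, A = 4, n = 3) as an illustrative case; the names are assumptions.

```python
import math
from fractions import Fraction

def hypergeom_pmf(x, N, A, n):
    """P(X = x) when sampling n items without replacement from N, A of them successes."""
    return Fraction(math.comb(A, x) * math.comb(N - A, n - x), math.comb(N, n))

N, A, n = 10, 4, 3

# Closed-form mean, standard deviation, and finite population correction factor.
mean_formula = n * A / N
fpc = math.sqrt((N - n) / (N - 1))
sd_formula = math.sqrt(n * A * (N - A) / N**2) * fpc

# The same quantities from the definition, summing over the possible values of X.
support = range(max(0, n - (N - A)), min(n, A) + 1)
mean_from_pmf = sum(x * hypergeom_pmf(x, N, A, n) for x in support)
var_from_pmf = sum((x - mean_from_pmf) ** 2 * hypergeom_pmf(x, N, A, n) for x in support)

print(f"mean: formula {mean_formula}, from pmf {float(mean_from_pmf)}")
print(f"sd:   formula {sd_formula:.4f}, from pmf {math.sqrt(var_from_pmf):.4f}")
print(f"finite population correction factor: {fpc:.4f}")
```

The correction factor is below 1 whenever n > 1, so sampling without replacement gives a smaller spread than the binomial with the same n and p = A/N.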


The Poisson Distribution: Definitions

An area of opportunity is a continuous unit or interval of time, volume, or such area in which more than one occurrence of an event can occur.

e.g. The number of scratches in a car's paint

e.g. The number of mosquito bites on a person

e.g. The number of computer crashes in a day


The Poisson Distribution: Properties

Apply the Poisson Distribution when:

You wish to count the number of times an event occurs in a given area of opportunity

The probability that an event occurs in one area of opportunity is the same for all areas of opportunity

The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity

The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller

The average number of events per unit is λ (lambda)


The Poisson Distribution: Formula

$$P(X) = \frac{e^{-\lambda}\lambda^{X}}{X!}$$

where:

P(X) = the probability of X events in an area of opportunity

λ = expected number of events

e = mathematical constant approximated by 2.71828


An example

Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a given minute, 7 cars will enter?

So, X = 7 and λ = 5:

$$P(7) = \frac{e^{-\lambda}\lambda^{X}}{X!} = \frac{e^{-5}\,5^{7}}{7!} = 0.104$$

So, there is a 10.4% chance 7 cars will enter the parking lot in a given minute.

Mean = λ    Variance = λ
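The parking-lot calculation can be reproduced with the Poisson formula, and the claim that the mean and variance both equal λ can be checked numerically. This is a sketch; the function name and the truncation point of 60 terms are illustrative assumptions (terms beyond that are negligible for λ = 5).

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number of events is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 5   # average number of cars entering per minute
p_seven = poisson_pmf(7, lam)
print(f"P(X = 7) = {p_seven:.4f}")

# Approximate the mean and variance by summing over x = 0..59.
xs = range(60)
mean_numeric = sum(x * poisson_pmf(x, lam) for x in xs)
var_numeric = sum((x - mean_numeric) ** 2 * poisson_pmf(x, lam) for x in xs)
print(f"mean = {mean_numeric:.4f}, variance = {var_numeric:.4f}")
```

Unlike the binomial, the Poisson has no fixed upper limit on the count, so the sums are truncated rather than exhaustive.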