Advanced Research Methods & Statistical Applications
Dr. Basheer Ahmad Samim

Arm & Sa Spring 13


DESCRIPTION

Statistical application


Quantitative Analysis

Statistical Applications

Recommended Readings (Books)
Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)
Statistical Methods for Practice and Research, Ajai S. Gaur and Sanjaya S. Gaur

Constant
A characteristic or property that does not change from individual to individual.

Variable
A characteristic or property that varies from individual to individual.

Types of Variable

Nominal Scale
Variable categories are mutually exclusive and exhaustive.
Variable categories have no logical order.
Examples: eye color, hair color, gender.

Ordinal Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Example: level of knowledge about SPSS.

Interval Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Equal differences in the characteristic are represented by equal differences in the measurements.
Examples: temperature, shoe size, IQ scores.

Ratio Scale
Data categories are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they possess.
Equal differences in the characteristic are represented by equal differences in the measurements.
The zero point reflects the absence of the characteristic.
Examples: height, weight, distance.

Measurement Scales

Data
The information collected for any kind of investigation.
Usually numerical but can be qualitative.

Primary Data
The initial material collected during the research process.
The information collected directly from the respondent.
Sources: personal investigation, through an investigator, through a questionnaire, through local sources, through telephone.

Secondary Data
The information collected and processed by people other than the researcher.
Sources: Government Organizations, Semi-Government Organizations,

Data Collection
Any of the following methods may be adopted:
(a) Personal interview
(b) Direct observation
(c) Mail interview (internet interview)
(d) Telephone interview
What are the pros and cons of each?

Data Management
Office editing
Post coding
Data entry and verification

Data Organization and Analysis
Preparing data for analysis.
Extracting descriptive measures from the data.
Using advanced statistical techniques to analyze the data and draw inferences from it.

Measures of Central Tendency
Arithmetic Mean
Quantiles (Median, Quartiles, Deciles, Percentiles)
Mode

Arithmetic Mean
A value obtained by dividing the sum of all the observations by their number.

If $X_1, X_2, \dots, X_n$ are $n$ observations of a variable $X$, then

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Arithmetic Mean
Example: the marks obtained by 8 students (listed with the data at the end of these slides).
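As a minimal sketch of the definition, the mean can be computed directly as sum over count. The eight marks below are read from the data lines at the end of the deck; that they belong to this slide is an assumption, and the function name `arithmetic_mean` is illustrative.

```python
# Marks of the 8 students, taken from the data lines at the end of the deck
# (assumed to be the data for this slide).
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Return the sum of all observations divided by their number."""
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(f"Sum = {sum(marks)}, n = {len(marks)}, mean = {mean_marks}")
```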

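Quantiles
For individual observations or a discrete frequency distribution, the ith quartile, jth decile and kth percentile are located in the array or discrete frequency distribution by the following relations:

$$Q_i = \text{value of the } \frac{i(n+1)}{4}\text{th item}, \quad i = 1, 2, 3$$

$$D_j = \text{value of the } \frac{j(n+1)}{10}\text{th item}, \quad j = 1, 2, \dots, 9$$

$$P_k = \text{value of the } \frac{k(n+1)}{100}\text{th item}, \quad k = 1, 2, \dots, 99$$

Quartiles
Example: the weekly TV watching times (hours), 20 observations. The raw data and the ordered array are listed with the data at the end of these slides.

With n = 20 the three quartiles are located at positions 5.25, 10.5 and 15.75 of the array, which gives Q1 = 22.0, Q2 = 30.5 and Q3 = 36.5.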
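The location rule can be sketched in a few lines. This is an illustration under two assumptions: positions are taken as i(n+1)/4 and fractional positions are resolved by linear interpolation between neighbouring items. The helper name `quantile` is hypothetical; the array is the ordered TV data from the end of the deck.

```python
import math


def quantile(sorted_values, numerator, denominator):
    """Value at 1-based position numerator*(n+1)/denominator of an ordered
    array, interpolating linearly between the two neighbouring items."""
    n = len(sorted_values)
    position = numerator * (n + 1) / denominator
    lower = math.floor(position)       # 1-based index of the item below
    fraction = position - lower        # how far to move towards the next item
    if lower < 1:                      # position falls before the first item
        return sorted_values[0]
    if lower >= n:                     # position falls at or after the last item
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Ordered array of the weekly TV watching times (hours).
tv_array = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

q1 = quantile(tv_array, 1, 4)
q2 = quantile(tv_array, 2, 4)
q3 = quantile(tv_array, 3, 4)
print(q1, q2, q3)
```

Deciles and percentiles follow from the same helper, for example `quantile(tv_array, 9, 10)` for the ninth decile.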

Mode
The mode is the value that occurs most frequently in a set of data, that is, the value that occurs the maximum number of times in a sequence of observations.

Example: the total automobile sales (in millions) in the United States for the last 14 years.
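A short sketch of finding the mode by counting occurrences. The 14 sales figures are read from the data lines at the end of the deck; assigning them to this slide is an assumption.

```python
from collections import Counter

# Total automobile sales (millions), 14 yearly values from the end of the deck.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]

# Count how often each value occurs and take the most frequent one.
counts = Counter(sales)
mode_value, mode_count = counts.most_common(1)[0]
print(f"Mode = {mode_value} million (occurs {mode_count} times)")
```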

Mode = 8.2 million

Measures of Dispersion
Measures of variation describe the variation present among the values of a data set, so they are measures of the spread of the values in the data.

Absolute Measures of Dispersion
Range
Quartile Deviation
Mean (Average) Deviation
Variance and Standard Deviation

Relative Measures of Dispersion
Coefficient of Range
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation (CV)

Range
Difference between the largest and the smallest observations.

Disadvantages of the Range
Ignores the way in which data are distributed.
Example (two dot plots on the scale 7 8 9 10 11 12): Range = 12 - 7 = 5 in both.

Sensitive to outliers.
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119

Inter-quartile Range (IQR)
Inter-quartile range = 3rd quartile - 1st quartile = Q3 - Q1

IQR is not affected by outliers.
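The contrast between the two measures can be checked on the two data sets from the range slide. This is a sketch; the quartile rule used here, position i(n+1)/4 with linear interpolation, is an assumption, and the helper name `quartile` is illustrative.

```python
import math


def quartile(sorted_values, i):
    """i-th quartile at 1-based position i*(n+1)/4, linearly interpolated.
    Assumes the position falls strictly inside the array, which holds for
    i = 1, 2, 3 when there are at least 3 observations."""
    position = i * (len(sorted_values) + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


# The two data sets differ only in their largest value (5 versus 120).
data_a = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
data_b = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_a = max(data_a) - min(data_a)
range_b = max(data_b) - min(data_b)
iqr_a = quartile(data_a, 3) - quartile(data_a, 1)
iqr_b = quartile(data_b, 3) - quartile(data_b, 1)

print(f"Range: {range_a} vs {range_b}")
print(f"IQR:   {iqr_a} vs {iqr_b}")
```

The single extreme value changes the range from 4 to 119 while the IQR stays the same.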

Inter-quartile Range
[Figure: box plot with Xminimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Xmaximum = 70; 25% of the observations in each of the four segments]
Inter-quartile Range (IQR) = 57 - 30 = 27

The Mean (Absolute) Deviation
Mean deviation is the average of the absolute deviations taken from the mean value.
[Table: X and its deviations from the mean]

Variance
Variance is the average of the squared deviations taken from the mean value.

X (cm)    (X - Mean)^2    X^2
4         36              16
6         16              36
9         1               81
12        4               144
13        9               169
16        36              256
60        102             702     (totals)
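Both definitions can be sketched on the six observations of the variance table. Following the wording of the slides, the sums are divided by n; a sample variance would divide by n - 1 instead. The function names are illustrative.

```python
import math

# The six observations (cm) from the variance table.
x = [4, 6, 9, 12, 13, 16]


def mean_deviation(values):
    """Average of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def variance(values):
    """Average of the squared deviations from the mean (divisor n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


md = mean_deviation(x)
var = variance(x)
sd = math.sqrt(var)

# Equivalent shortcut using the X^2 column: mean of squares minus squared mean.
var_shortcut = sum(v * v for v in x) / len(x) - (sum(x) / len(x)) ** 2

print(f"Mean deviation = {md:.4f}")
print(f"Variance = {var}, standard deviation = {sd:.4f}")
```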

Comparing Standard Deviations
[Figure: three dot plots on the scale 11 12 13 14 15 16 17 18 19 20 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
The smaller the standard deviation, the more tightly the scores cluster around the mean.
The larger the standard deviation, the more spread out the scores are from the mean.

Relative Measures of Variation

Coefficient of Variation (CV)
Can be used to compare two or more sets of data measured in different units, or in the same units but with different average sizes.

$$CV = \frac{S}{\bar{X}} \times 100\%$$

Use of Coefficient of Variation
Stock A: Average price last year = $50, Standard deviation = $5
Stock B: Average price last year = $100, Standard deviation = $5

Stock B is less variable relative to its price: its coefficient of variation is 5/100 = 5%, against 5/50 = 10% for Stock A.
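The stock comparison can be reproduced with a small helper. This is a sketch that takes the coefficient of variation as the standard deviation expressed as a percentage of the mean; the function name is illustrative.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(mean=50, std_dev=5)    # Stock A
cv_b = coefficient_of_variation(mean=100, std_dev=5)   # Stock B

print(f"CV of Stock A = {cv_a}%")
print(f"CV of Stock B = {cv_b}%")
```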

Note that both stocks have the same standard deviation; their relative variability differs only because their average prices differ.

Appropriate Choice of Measure of Variability
If data are symmetric, with no serious outliers, use the range and standard deviation.
If data are skewed and/or have serious outliers, use the IQR.
If comparing variation across two data sets, use the coefficient of variation (CV).

Five Number Summary
The five number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order: Min, Q1, Q2, Q3, Max.

From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2 - Q1 for the second quarter and Q3 - Q2 for the third quarter.

Example: the weekly TV viewing times (in hours).

The raw data and the ordered array are listed with the data at the end of these slides.

Five Number Summary
Minimum value = 5.0, Q1 = 22.0, Q2 = 30.5, Q3 = 36.5, Maximum value = 66.0
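A sketch of assembling the five number summary from the ordered array. The quartile rule, position i(n+1)/4 with linear interpolation, is an assumption; it reproduces the quartiles quoted on the slides. The function names are illustrative.

```python
import math


def quartile(sorted_values, i):
    """i-th quartile at 1-based position i*(n+1)/4, linearly interpolated.
    Assumes the position falls strictly inside the array."""
    position = i * (len(sorted_values) + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in that order."""
    ordered = sorted(values)
    return (ordered[0], quartile(ordered, 1), quartile(ordered, 2),
            quartile(ordered, 3), ordered[-1])


# Ordered array of the weekly TV viewing times (hours).
tv_array = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

summary = five_number_summary(tv_array)
print(summary)
```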

Box and Whisker Diagram
A box and whisker diagram, or box plot, is a graphical means of displaying the five number summary of a set of data. In a box plot the first quartile is placed at the lower hinge and the third quartile at the upper hinge. The median is placed between these two hinges. The two lines emanating from the box are called whiskers. The box and whisker diagram was introduced by Professor John W. Tukey.

Construction of Box-Plot
Start the box at Q1 and end it at Q3.
Within the box draw a line to represent Q2.
Draw the lower whisker from the minimum value up to Q1.
Draw the upper whisker from Q3 up to the maximum value.

Construction of Box-Plot
[Figure: box plot on a 0 to 70 scale with Minimum Value = 5.0, Q1 = 22.0, Q2 = 30.5, Q3 = 36.5, Maximum Value = 66.0]

Interpretation of Box-Plot
[Figure: box plot on a 0 to 70 scale]
A box-whisker plot is useful to identify:
Maximum and minimum values in the data.
Median of the data.
IQR = Q3 - Q1; a long box indicates more variability in the data.
Shape of the data, from the position of the line within the box:
    Line at the center of the box: symmetrical
    Line above the center of the box: negatively skewed
    Line below the center of the box: positively skewed
Outliers in the data.

Outliers
An outlier is a value that falls well outside the overall pattern of the data. It might be
the result of a measurement or recording error,
a member of a different population,
simply an unusual extreme value.

An extreme value need not be an outlier; it might, instead, be an indication of skewness.

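Inner and Outer Fences
Lower inner fence = Q1 - 1.5(Q3 - Q1), Upper inner fence = Q3 + 1.5(Q3 - Q1)
Lower outer fence = Q1 - 3(Q3 - Q1), Upper outer fence = Q3 + 3(Q3 - Q1)

If Q1 = 22.0, Q2 = 30.5, Q3 = 36.5, then Q3 - Q1 = 14.5 and
Inner fences: 22.0 - 21.75 = 0.25 and 36.5 + 21.75 = 58.25
Outer fences: 22.0 - 43.5 = -21.5 and 36.5 + 43.5 = 80.0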
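The fences and the resulting classification of the TV data can be sketched as follows. The multipliers 1.5 and 3 are the usual Tukey convention and are assumed here, since the slide's own formulas did not survive; the variable names are illustrative.

```python
# Quartiles quoted on the slide.
q1, q3 = 22.0, 36.5
iqr = q3 - q1

# Inner fences at 1.5 IQR and outer fences at 3 IQR beyond the quartiles
# (assumed convention).
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Ordered array of the weekly TV viewing times (hours).
tv_array = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

# Outside the outer fences: sure outliers.
sure_outliers = [v for v in tv_array if v < outer[0] or v > outer[1]]
# Outside the inner fences but not outside the outer fences: mild outliers.
mild_outliers = [v for v in tv_array
                 if (v < inner[0] or v > inner[1]) and v not in sure_outliers]

print("Inner fences:", inner)
print("Outer fences:", outer)
print("Mild outliers:", mild_outliers)
print("Sure outliers:", sure_outliers)
```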

Identification of the Outliers
Values that lie within the inner fences are normal values.
Values that lie outside the inner fences but inside the outer fences are possible (suspected, mild) outliers.
Values that lie outside the outer fences are sure outliers.

[Figure: box plot on a 0 to 80 scale]
Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.
Only 66 is a mild outlier.

Uses of Box and Whisker Diagram
Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.
[Figure: side-by-side box plots for Male and Female]

Standardized Variable
A variable that has mean 0 and variance 1 is called a standardized variable.
Values of a standardized variable are called standard scores.
Standard scores are unit-less.

Construction
$$Z = \frac{X - \bar{X}}{S}$$

X      (X - Mean)^2    Z          Z^2
3      25              -1.3624    1.8561
6      4               -0.5450    0.2970
11     9               0.8174     0.6682
12     16              1.0899     1.1879
32     54              0          4.009     (totals)

Variable Z has mean 0 and variance 1, so Z is a standard variable.
Standard score at X = 11 is Z = (11 - 8)/3.67 = 0.8174
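A sketch of the standardization on the four X values of the table, using the standard deviation with divisor n. The slide's figures appear to use S rounded to 3.67, so the scores printed here differ slightly in the third decimal place. The function name is illustrative.

```python
import math

# The four observations from the table.
x = [3, 6, 11, 12]


def standard_scores(values):
    """Return z = (x - mean) / S for each value, with S computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / s for v in values]


z = standard_scores(x)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / len(z)

print([round(v, 4) for v in z])
print(f"Mean of Z = {mean_z:.4f}, variance of Z = {var_z:.4f}")
```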

Performance evaluation by z-scores
The industry in which sales rep Mr. Atif works has mean annual sales = $2,500 and standard deviation = $500. The industry in which sales rep Mr. Asad works has mean annual sales = $4,800 and standard deviation = $600. Last year Mr. Atif's sales were $4,000 and Mr. Asad's sales were $6,000.
Which of the representatives would you hire if you have one sales position to fill?

Performance evaluation by z-scores
Sales rep Atif: Mean = $2,500, S = $500, X = $4,000, so Z = (4,000 - 2,500)/500 = 3.0
Sales rep Asad: Mean = $4,800, S = $600, X = $6,000, so Z = (6,000 - 4,800)/600 = 2.0
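The comparison can be sketched by standardizing each representative's sales against his own industry. The dictionary layout and names are illustrative only.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev


# Each rep's sales with the mean and standard deviation of his industry.
reps = {
    "Atif": {"sales": 4000, "mean": 2500, "std_dev": 500},
    "Asad": {"sales": 6000, "mean": 4800, "std_dev": 600},
}

scores = {name: z_score(r["sales"], r["mean"], r["std_dev"])
          for name, r in reps.items()}
best = max(scores, key=scores.get)

print(scores)
print("Higher standard score:", best)
```

Although Mr. Asad sold more in absolute terms, Mr. Atif's sales lie further above his industry's mean when measured in standard deviations.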

Mr. Atif is the better choice.

Measures of Skewness
A distribution in which the values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.

Symmetrical: Length of Right Tail = Length of Left Tail, Mean = Median = Mode, Sk = 0
a) Sk = (Mean - Mode)/SD
b) Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1)

A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than at the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.
Positively skewed: Length of Right Tail > Length of Left Tail, Mean > Median > Mode, Sk > 0

A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than at the lower end. A negatively skewed frequency curve has a longer tail on the left-hand side.
Negatively skewed: Length of Right Tail < Length of Left Tail, Mean < Median < Mode, Sk < 0
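The two skewness formulas can be sketched as small functions. The quartile-based measure is applied to the TV-time quartiles quoted earlier in the deck; the inputs to the mean-and-mode measure are hypothetical values chosen only to show the symmetric case. Function names are illustrative.

```python
def skewness_mean_mode(mean, mode, std_dev):
    """Formula (a): Sk = (Mean - Mode) / SD."""
    return (mean - mode) / std_dev


def skewness_quartiles(q1, q2, q3):
    """Formula (b): Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Quartiles of the weekly TV viewing times quoted on the slides.
sk_tv = skewness_quartiles(q1=22.0, q2=30.5, q3=36.5)

# Hypothetical symmetric case: mean equals mode, so Sk is zero.
sk_symmetric = skewness_mean_mode(mean=50.0, mode=50.0, std_dev=10.0)

print(f"Quartile skewness of TV data = {sk_tv:.4f}")
print(f"Mean-mode skewness, symmetric case = {sk_symmetric}")
```

The quartile measure looks only at the middle half of the data, so the high value 66 does not enter it; here the median lies closer to Q3 than to Q1, which gives a small negative coefficient.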

Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution. When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is leptokurtic. When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is platykurtic. A curve that is neither very peaked nor very flat-topped is taken as the basis for comparison and is called mesokurtic (normal).

Measures of Kurtosis
If Coefficient of Kurtosis > 3: Leptokurtic
If Coefficient of Kurtosis = 3: Mesokurtic
If Coefficient of Kurtosis < 3: Platykurtic
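The slides do not preserve the formula for the coefficient of kurtosis. A common choice that matches the threshold of 3 is the moment ratio m4 / m2^2, and the sketch below assumes that definition. It is applied, purely for illustration, to the six observations from the variance table; the function names and the tolerance are hypothetical.

```python
def kurtosis_coefficient(values):
    """Moment ratio m4 / m2**2 (assumed definition), using divisor n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def classify_kurtosis(coefficient, tolerance=1e-9):
    """Apply the > 3, = 3, < 3 rule from the slide."""
    if abs(coefficient - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if coefficient > 3 else "platykurtic"


x = [4, 6, 9, 12, 13, 16]   # observations from the variance table
b2 = kurtosis_coefficient(x)
shape = classify_kurtosis(b2)
print(f"Coefficient of kurtosis = {b2:.4f} ({shape})")
```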

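The Empirical Rule
[Figure: normal curve with areas 68.26%, 95.45% and 99.73%]
For a normal (bell-shaped) distribution, about 68.26% of the observations lie within one standard deviation of the mean, 95.45% within two standard deviations, and 99.73% within three.

SPSS: Statistical Package for Social Sciences

Data used in the examples

Marks obtained by 8 students:
67 72 68 70 65 68 75 63

Weekly TV watching times (hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21

Array of the weekly TV watching times:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66

Total automobile sales (in millions) in the United States for the last 14 years:
9.0 8.2 8.0 9.1 10.3 11.0 11.5
10.3 10.5 9.8 9.3 8.2 8.2 8.5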