Statistical Computation Research Method: QTN METHOD

by

Munawar Asikin, S.Si, MSE

April 14th, 2009

Approach to Statistical Research

1. Formulate a hypothesis

2. State predictions of the hypothesis

3. Perform experiments or observations

4. Interpret experiments or observations

5. Evaluate results with respect to hypothesis

6. Refine hypothesis and start again

(Basically the same as all other research)

HOW TO DO RESEARCH IN COMPUTER SCIENCE? PLEASE MENTION THE STEPS!!!

The QTN METHOD mostly depends on STATISTICS as a science, a set of tools, and a method

Basic Statistical Concepts

Descriptive Statistics

Probability Distributions

Sampling Distributions

Estimation

Hypothesis Testing

Goodness of Fit Test

Correlation Analysis

Basic Terminology

Population: a set of entities about which statistical inferences are to be drawn.

Sample: a number of independent observations from the same probability distribution.

Parameter: a value that identifies one member of a family of probability distributions. The distribution of a random variable is treated as belonging to such a family, whose members are distinguished from each other by the values of a finite number of parameters.

Bias: a factor that causes a statistical sample of a population to have some members of the population less represented than others.

Descriptive Statistics

DS involves describing data collections with a few key summary values

[Figure: large collections of data (population and sample) summarized by mean, standard deviation, and variance]
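The summary values named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the `sample` list holds hypothetical values, and the standard-library `statistics` module is an assumed tool choice.

```python
import statistics

# Hypothetical sample of eight measurements (illustrative values only).
sample = [4, 8, 6, 5, 3, 7, 9, 6]

mean = statistics.mean(sample)

# Sample variance and standard deviation divide by n - 1.
sample_var = statistics.variance(sample)
sample_sd = statistics.stdev(sample)

# Population variance divides by n (use it when the data are the whole population).
pop_var = statistics.pvariance(sample)

print(mean, sample_var, sample_sd, pop_var)
```

The sample and population versions differ only in the divisor, which is why the notation for each matters.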

Degrees of Freedom

In a data collection, the degrees of freedom indicate the number of data items that are independent of one another and that can carry unique pieces of information.

Population vs Sample

Don’t forget the notation!!!

Example of DOF: The Statement

I am thinking of the number 5

I am thinking of the number 7

The sum of the two numbers I am thinking of is 12

How many data items are independent of one another and can carry unique pieces of information?

Answer: TWO
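The same counting argument can be checked in Python. This is only a sketch: reusing the three numbers from the statements as a small `sample` is an assumption made for illustration.

```python
# The three statements from the slide: two numbers and their sum.
first, second, total = 5, 7, 12

# The third statement adds no new information: it follows from the other two.
assert first + second == total

# Same idea for a sample: deviations from the sample mean always sum to zero,
# so once n - 1 deviations are known, the last one is determined.
sample = [5, 7, 12]  # hypothetical sample, reusing the numbers above
mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]
last_from_others = -sum(deviations[:-1])
degrees_of_freedom = len(sample) - 1

print(deviations[-1], last_from_others, degrees_of_freedom)
```

This is the reason the sample variance divides by n - 1 rather than n.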

Probability Distributions

A discrete random variable can assume only certain specified values, usually the integers

A continuous random variable can assume any numerical value within some range

The expected value of a random variable is the average value of the variable over many trials or observations
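A short sketch can make the expected value concrete. The fair six-sided die, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from fractions import Fraction

# Discrete random variable: the face of a fair six-sided die (an assumed example).
faces = [1, 2, 3, 4, 5, 6]

# Expected value = sum of (value * probability); each face has probability 1/6.
expected = sum(Fraction(face, 6) for face in faces)

# Average over many simulated trials; the seed only makes the run repeatable.
rng = random.Random(2009)
trials = 100_000
average = sum(rng.choice(faces) for _ in range(trials)) / trials

print(float(expected), average)
```

The simulated average should land close to the exact value of 3.5, and closer still as the number of trials grows.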

Remember:

Statisticians and Economists: Random Walk Hypothesis

Example: Relationship Between Consumption and Gross Domestic Product

From A DISCRETE to An INTERVAL/RATIO SCALE

[Diagram: A qualitative Data -> SUMMATED RATING SCALE / SCORING SYSTEM -> A quantitative Data]

Sampling Distributions

A sampling distribution (SD) is the array of all possible sample statistics that can be drawn from a population for a given sample size

CENTRAL LIMIT THEOREM

As the sample size becomes larger, the sampling distribution of the sample mean tends toward the normal distribution
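The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the uniform(0, 1) population, the sample size of 30, the 2000 repetitions, and the seed are all illustrative choices, not values from the slides.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) population (clearly non-normal)."""
    return statistics.fmean(rng.random() for _ in range(n))


n = 30
means = [sample_mean(n) for _ in range(2000)]

# Theory: the sample means centre on 0.5 with spread sqrt(1/12) / sqrt(n).
theoretical_sd = (1 / 12) ** 0.5 / n ** 0.5

# For a normal distribution, about 68% of values lie within one standard deviation.
within_one_sd = sum(abs(m - 0.5) <= theoretical_sd for m in means) / len(means)

print(statistics.fmean(means), statistics.stdev(means), theoretical_sd, within_one_sd)
```

Even though single draws are uniform, the means cluster around 0.5 in a bell-shaped way, with roughly two thirds of them within one theoretical standard deviation.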

The Normal Distribution

Why?

One reason the normal distribution is important is that many psychological and organisational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.

Why?

A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normally distributed. Some tests work well even with very wide deviations from normality.

Estimation

A point Estimate vs An Interval Estimate

BE CAREFUL WITH YOUR ESTIMATION!!!

Demokrat vs PDIP and GOLKAR?

What is the better and quicker way to know WHO WILL WIN in the LEGISLATIVE POLLING?

INTERVAL ESTIMATION: QUICK COUNT SURVEY

INTERVAL ESTIMATION: DEMOKRAT THE WINNER

SURVEY: Xbar - ME <= True Value <= Xbar + ME

ME = margin of error

KPU manual tally ??????????
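An interval of the form Xbar - ME to Xbar + ME can be sketched for a survey proportion as follows. The 20% vote share and the sample of 2,000 are hypothetical numbers, not quick-count results, and the sketch assumes simple random sampling with the normal approximation.

```python
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a sample proportion.

    Assumes simple random sampling and the normal approximation.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    me = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - me, p_hat + me, me


# Hypothetical quick-count result: 20% of 2,000 sampled votes for one party.
lower, upper, me = proportion_interval(0.20, 2000)
print(f"{lower:.4f} <= true value <= {upper:.4f} (ME = {me:.4f})")
```

A party can be called the winner with some confidence only when its interval does not overlap the intervals of its rivals.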

Trends in Statistical Tests used in Research Papers

Historically Currently

Results in: Accept/Reject

Results in: p-Value

Results in: Approx. Mean

Hypothesis Testing: The Steps

1. State the hypothesis being tested: H0 vs H1

2. Collect a random sample of the items from the population, measure them, and compute the appropriate sample statistic

3. Assume the null hypothesis is true and consult the sampling distribution from which the sample statistic was drawn under this assumption

4. Compute the probability that such a sample could have been drawn from this sampling distribution

5. If this probability is high, do not reject H0; if this probability is low, H0 can be rejected
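The five steps can be sketched with a one-sample z test. This is one possible instance, not the only test the steps apply to: the hypothesized mean, the known sigma, the sample figures, and the 0.05 threshold are all hypothetical.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mean == mu0, with a known population sigma."""
    # Step 3: under H0 the sample mean follows Normal(mu0, sigma / sqrt(n)).
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Step 4: probability of a sample at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Steps 1-2 (hypothetical numbers): H0 says the mean is 100 with sigma 15;
# a random sample of 36 items has a mean of 106.
z, p_value = one_sample_z_test(106, 100, 15, 36)

# Step 5: compare with an assumed significance level.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(z, p_value, decision)
```

With these numbers the probability is low (about 0.016), so H0 would be rejected at the 0.05 level.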

Hypothesis Testing

H0 : Null Hypothesis, status quo

HA (also written H1) : Alternative Hypothesis, research question

So, either :

"The data does not support H0"

or

"We fail to reject H0"

Result of Hypothesis Test

State of Nature    Do not Reject H0    Reject H0
H0 true            Correct Decision    Type I Error
H0 false           Type II Error       Correct Decision

Goodness of Fit Test

The goodness of fit test determines whether sampled items may be assumed to have been drawn from a population that follows a specified distribution

Determine whether your data fits some known, suspected, or theoretical distribution

Examples:

Regression: F test, T test

Perception Survey: Pearson Correlation and Cronbach Alpha
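The slides do not name a specific goodness of fit statistic. One common choice, assumed here, is the chi-square statistic, which compares observed counts with the counts expected under the specified distribution. The die-roll counts below are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die, tested against a uniform distribution.
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
dof = len(observed) - 1  # categories minus one

print(statistic, dof)
```

The statistic is then compared with a chi-square table for the given degrees of freedom. For 5 degrees of freedom at the 5% level the critical value is about 11.07, so a statistic of 4.2 gives no reason to doubt the uniform distribution.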

Correlation Analysis

Scatter Diagrams plot X-Y data points on a two-dimensional graph

The Correlation Coefficient measures the extent to which two variables are linearly related to each other

GRAPH YOUR WORLD to know how it is!!!
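The correlation coefficient can be computed directly from its definition. The sketch below uses the Pearson coefficient and hypothetical X-Y data. The function name `pearson_r` is an illustrative choice.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical X-Y data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship, which is worth confirming by eye on the scatter diagram.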

Scatter Plots

Excellent for examining association between two variables

How many regression lines can be made from the data above? The BLUE concepts will be taught NEXT WEEK!!

Outliers (and their treatment)

An "outlier" is an observation that does not fit the pattern in the rest of the data

Check the data

Check with the measurer

If there is reason to believe it is NOT real, change it if possible, otherwise leave it out (but note).

If there is reason to believe it is real, leave it in and note.

Exercise

If the average IQ in a given population is 100, and the standard deviation is 15, what percentage of the population has an IQ of 145 or higher? (Assume IQ is normally distributed.)

Answer

P(X >= 145)

P(Z >= ((145 - 100)/15))

P(Z >= 3)

From the tables:

99.87% are less than 3

=> 0.13% of population
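The table lookup can be cross-checked in Python. This is a sketch that assumes IQ follows a normal distribution with the stated mean and standard deviation. `NormalDist` comes from the standard-library `statistics` module.

```python
from statistics import NormalDist

# IQ modelled as Normal(mean=100, sd=15), as the exercise assumes.
iq = NormalDist(mu=100, sigma=15)

# P(X >= 145) = 1 - P(X < 145), which is the same as P(Z >= 3).
p_at_least_145 = 1 - iq.cdf(145)
answer = f"{p_at_least_145:.2%}"

print(answer)
```

The result agrees with the table-based answer of 0.13%.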

Reliability Analysis

Reliability analysis allows you to study the properties of measurement scales and the items that make them up.

The Reliability Analysis procedure calculates a number of commonly used measures of scale reliability and also provides information about the relationships between individual items in the scale.

Intraclass correlation coefficients can be used to compute interrater reliability estimates.

Example

Does my questionnaire measure customer satisfaction in a useful way?

Using reliability analysis, you can determine the extent to which the items in your questionnaire are related to each other, you can get an overall index of the repeatability or internal consistency of the scale as a whole, and you can identify problem items that should be excluded from the scale.

Statistics

Descriptives for each variable and for the scale, summary statistics across items, inter-item correlations and covariances, reliability estimates, ANOVA table, intraclass correlation coefficients, Hotelling’s T-square, and Tukey’s test of additivity.

Models of Reliability Analysis:

Alpha (Cronbach). This is a model of internal consistency, based on the average inter-item correlation.

Split-half. This model splits the scale into two parts and examines the correlation between the parts.

Guttman. This model computes Guttman’s lower bounds for true reliability.

Parallel. This model assumes that all items have equal variances and equal error variances across replications.

Strict parallel. This model makes the assumptions of the parallel model and also assumes equal means across items.

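Cronbach’s Alpha Formula:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum S^2_{items}}{S^2_{total}}\right)$$

where:

$\alpha$ = Cronbach's Alpha coefficient

$S^2_{items}$ = variance of each item/question

$S^2_{total}$ = variance of the total score

$k$ = number of questions being tested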

K/(K-1): The Weight

[Figure: the weight K/(K-1) (vertical axis, 0 to 2.5) plotted against the number of items K (horizontal axis, 0 to 250)]

Beyond about 50 items, adding more items to the QUESTIONNAIRE hardly affects THE WEIGHT (it stays close to 1). So don’t use more than 50 items in your perception survey

SOLUTION: separate the items into several DIMENSIONS/INDICATORS
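A quick calculation shows how the weight k/(k-1) in Cronbach's alpha flattens out as the number of items grows. The item counts in the loop are illustrative choices.

```python
def weight(k):
    """Weight k / (k - 1) used in Cronbach's alpha, for k >= 2 items."""
    return k / (k - 1)


# Illustrative item counts, from a very short scale to a very long one.
for k in (2, 5, 10, 50, 100, 250):
    print(k, round(weight(k), 3))
```

The weight drops quickly from 2.0 at two items toward 1, and between 50 and 250 items it changes by less than 0.02.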

CAN YOU COMPUTE A CRONBACH ALPHA?

obs  a  b  c  d  e  f
  1  4  4  5  5  3  4
  2  4  4  5  4  3  5
  3  5  3  5  5  3  5
  4  5  4  4  4  3  4
  5  4  4  4  4  5  5
  6  3  3  3  3  3  5
  7  4  4  5  5  4  3
  8  5  4  4  5  4  4
  9  4  4  4  4  4  4
 10  3  5  3  5  4  4

CRONBACH ALPHA: USING SPSS

Reliability Statistics

Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
.093                .038                                            6

The data drawn from 10 people have a LOW RELIABILITY INDEX. It means that the QUESTION DESIGN cannot measure what the researcher intends to measure

What Next? REDESIGN the QUESTIONS and ADD MORE SAMPLES!

CRONBACH ALPHA: USING FOXPRO

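clear
set talk off
* Variance of each item; a..f are assumed to be fields of the open table
calculate var(a) to na
calculate var(b) to nb
calculate var(c) to nc
calculate var(d) to nd
calculate var(e) to ne
calculate var(f) to nf
* Variance of the total score; jum is assumed to hold a+b+c+d+e+f for each row
calculate var(jum) to njum
totvar = na + nb + nc + nd + ne + nf
* nilaik is the weight k/(k-1) for k = 6 items
nilaik = 6/(6-1)
relia = nilaik*(1-(totvar/njum))
? relia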
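For readers without FoxPro or SPSS, the same calculation can be sketched in Python using the ten questionnaire rows shown earlier. The function name `cronbach_alpha` and the use of the standard-library `statistics` module are illustrative choices, not part of the original material.

```python
import statistics

# Questionnaire answers from the slide: one row per respondent, items a..f.
rows = [
    [4, 4, 5, 5, 3, 4],
    [4, 4, 5, 4, 3, 5],
    [5, 3, 5, 5, 3, 5],
    [5, 4, 4, 4, 3, 4],
    [4, 4, 4, 4, 5, 5],
    [3, 3, 3, 3, 3, 5],
    [4, 4, 5, 5, 4, 3],
    [5, 4, 4, 5, 4, 4],
    [4, 4, 4, 4, 4, 4],
    [3, 5, 3, 5, 4, 4],
]


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])
    # zip(*rows) turns respondent rows into item columns.
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


alpha = cronbach_alpha(rows)
print(round(alpha, 3))
```

The result rounds to 0.093, the Cronbach's Alpha value reported in the SPSS output. Because alpha uses a ratio of variances, it does not matter whether both variances divide by n or by n - 1, as long as the same divisor is used for the items and the total.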
