This is part of the second-week material of the Computational Research Methods course. The lectures are given by Dr. Said Mirza Pahlevi, with Munawar Asikin, S.Si., MSE as co-lecturer. Any comments? [email protected]
Statistical Computation Research Method: QTN METHOD
by
Munawar Asikin, S.Si., MSE
April 14th, 2009
Approach to Statistical Research
1. Formulate a hypothesis
2. State predictions of the hypothesis
3. Perform experiments or observations
4. Interpret experiments or observations
5. Evaluate results with respect to the hypothesis
6. Refine the hypothesis and start again
(Basically the same as all other research.)
HOW TO DO RESEARCH IN COMPUTER SCIENCE? PLEASE LIST THE STEPS!!!
The QTN METHOD depends mostly on STATISTICS as a science, a set of tools, and a method
Basic Statistical Concepts
Descriptive Statistics
Probability Distributions
Sampling Distributions
Estimation
Hypothesis Testing
Goodness of Fit Test
Correlation Analysis
Basic Terminology
Population: a set of entities about which statistical inferences are to be drawn.
Sample: a number of independent observations from the same probability distribution.
Parameter: when the distribution of a random variable is assumed to belong to a family of probability distributions, the members of that family are distinguished from each other by the values of a finite number of parameters.
Bias: a factor that causes a statistical sample of a population to have some members of the population less represented than others.
Descriptive Statistics
Descriptive statistics (DS) involves describing data collections with a few key summary values
[Figure: large collections of data, either a population or a sample, are summarized by the mean, standard deviation, and variance]
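As a sketch of the summary values named above, Python's standard `statistics` module computes the mean, standard deviation, and variance, with separate functions for data treated as a whole population (`pvariance`, `pstdev`, dividing by N) and as a sample (`variance`, `stdev`, dividing by n − 1). The scores are hypothetical.

```python
import statistics

# Hypothetical scores, used only for illustration
scores = [4, 8, 6, 5, 3, 7, 9, 6]

mean = statistics.mean(scores)
# Treat the scores as the whole population (divide by N)
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)
# Treat the scores as a sample from a larger population (divide by n - 1)
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(f"mean={mean}, population variance={pop_var}, sample variance={sample_var:.4f}")
print(f"population sd={pop_sd:.4f}, sample sd={sample_sd:.4f}")
```

The two variances differ only in the divisor, which is why the population/sample distinction matters even for simple summaries.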
Degrees of Freedom
In a data collection, the degrees of freedom indicate the number of data items that are independent of one another and that can carry unique pieces of information.
Population vs Sample
Don’t forget the notation!!!
Example of DOF: The Statements
I am thinking of the number 5
I am thinking of the number 7
The sum of the two numbers I am thinking of is 12
How many data items are independent of one another and can carry unique pieces of information?
Answer: TWO (the third statement adds nothing new, because 5 + 7 = 12 follows from the first two)
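The same reasoning explains why the sample variance divides by $n - 1$. The $n$ deviations from the sample mean always sum to zero, so once $n - 1$ of them are known the last one is fixed, just as the sum statement is fixed by the first two. Using the common convention of $\mu, \sigma^2, N$ for a population and $\bar{x}, s^2, n$ for a sample:

$$\sum_{i=1}^{n}(x_i - \bar{x}) = 0 \quad\Longrightarrow\quad (x_n - \bar{x}) = -\sum_{i=1}^{n-1}(x_i - \bar{x})$$

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$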
Probability Distributions
A discrete random variable can assume only certain specified values, usually integers
A continuous random variable can assume any numerical value within some range
The expected value of a random variable is the average value of the variable over many trials or observations
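To make the expected value concrete, the sketch below uses a fair six-sided die as an assumed example. The exact expected value is the probability-weighted sum of the outcomes, Σ x·P(x) = 3.5, and the average of many simulated rolls approaches it.

```python
import random
from fractions import Fraction

# A fair six-sided die: a discrete random variable with values 1..6 (assumed example)
outcomes = range(1, 7)
prob = Fraction(1, 6)

# Exact expected value: sum of value * probability
expected = sum(x * prob for x in outcomes)

# Average value over many simulated trials approaches the expected value
rng = random.Random(42)
trials = [rng.choice(outcomes) for _ in range(100_000)]
average = sum(trials) / len(trials)

print(expected)           # 7/2
print(round(average, 2))  # close to 3.5
```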
Remember:
Statisticians and Economists: Random Walk Hypothesis
Example: Relationship Between Consumption and Gross Domestic Product
From a DISCRETE to an INTERVAL/RATIO SCALE
[Figure: qualitative data are converted into quantitative data through a SUMMATED RATING SCALE (a SCORING SYSTEM)]
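A summated rating scale turns qualitative answers into numbers by scoring each response and adding the scores. This sketch assumes a five-point Likert mapping from 1 (strongly disagree) to 5 (strongly agree); the mapping, the helper name `summated_score`, and the answers are hypothetical.

```python
# Assumed five-point Likert scoring scheme (hypothetical labels)
LIKERT_SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def summated_score(answers):
    """Convert one respondent's qualitative answers into a single quantitative score."""
    return sum(LIKERT_SCORES[a.strip().lower()] for a in answers)

respondent = ["Agree", "Strongly agree", "Neutral", "Agree"]
print(summated_score(respondent))  # 4 + 5 + 3 + 4 = 16
```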
Sampling Distributions
A sampling distribution (SD) is the array of all possible sample statistics that can be drawn from a population for a given sample size
CENTRAL LIMIT THEOREM
As the sample size becomes larger, the sampling distribution of the sample mean tends toward the normal distribution
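The theorem can be checked by simulation. This sketch assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1), draws 2,000 samples of size 30, and keeps the sample means. Their average should be close to 1 and their standard deviation close to 1/√30 ≈ 0.183, even though the population itself is far from normal.

```python
import random
import statistics

def sample_means(sample_size, replications, seed=0):
    """Means of repeated samples drawn from a skewed population.

    The population is assumed to be exponential with mean 1 and standard
    deviation 1 (a deliberately non-normal choice for illustration).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replications)
    ]

means = sample_means(sample_size=30, replications=2000)
print(round(statistics.fmean(means), 3))  # close to the population mean 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(30), about 0.183
```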
The Normal Distribution
Why?
One reason the normal distribution is important is that many psychological and organizational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables that are approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.
Why?
A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.
Estimation
A Point Estimate vs an Interval Estimate
BE CAREFUL WITH YOUR ESTIMATION!!!
Demokrat vs PDIP and GOLKAR?
What is the better and quicker way to know WHO WILL WIN the LEGISLATIVE POLL?
INTERVAL ESTIMATION: QUICK COUNT SURVEY
x̄ − ME ≤ True Value ≤ x̄ + ME
INTERVAL ESTIMATION: DEMOKRAT THE WINNER
ME = margin of error
KPU manual tally ??????????
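An interval estimate for a vote share can be sketched as p̂ ± ME with ME = z·√(p̂(1 − p̂)/n). This sketch assumes a simple random sample and a 95% confidence level; real quick counts sample polling stations, so their margins of error are computed differently. The shares, the sample size, and the helper name `interval_estimate` are hypothetical.

```python
import math
from statistics import NormalDist

def interval_estimate(p_hat, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a proportion, assuming simple random sampling."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me, me

# Hypothetical quick-count shares from a sample of 2,000 votes
lead = interval_estimate(0.20, 2000)
runner_up = interval_estimate(0.15, 2000)
print([round(v, 4) for v in lead])

# Conservative rule of thumb: call a winner only if the intervals do not overlap
print(lead[0] > runner_up[1])
```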
Trends in Statistical Tests Used in Research Papers
Historically ... Currently
Results in: Accept/Reject
Results in: p-Value
Results in: Approx. Mean
Hypothesis Testing: The Steps
1. State the hypothesis being tested: H0 vs H1
2. Collect a random sample of items from the population, measure them, and compute the appropriate sample statistic
3. Assume the null hypothesis is true and consult the sampling distribution from which the sample statistic would be drawn under this assumption
4. Compute the probability that such a sample could have been drawn from this sampling distribution
5. If this probability is high, do not reject H0; if this probability is low, H0 can be rejected
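The five steps can be sketched as a two-sided one-sample z test, assuming a normally distributed population with known standard deviation. The null mean, σ, sample values, the 0.05 cutoff, and the helper name `z_test` are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, n, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z test of H0: mu = mu0 against H1: mu != mu0."""
    # Step 3: under H0 the sample mean follows N(mu0, sigma / sqrt(n))
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Step 4: probability of a result at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: a low probability means H0 can be rejected
    return z, p_value, p_value < alpha

# Steps 1-2 (hypothetical): H0 mu = 100, sigma = 15; a sample of 36 has mean 105
z, p, reject = z_test(sample_mean=105, n=36, mu0=100, sigma=15)
print(round(z, 2), round(p, 4), reject)  # 2.0 0.0455 True
```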
Hypothesis Testing
H0: Null Hypothesis, status quo
HA (H1): Alternative Hypothesis, research question
So, either:
"The data do not support H0"
or
"We fail to reject H0"
Result of Hypothesis Test vs State of Nature
State of Nature    Do Not Reject H0    Reject H0
H0 true            Correct decision    Type I error
H0 false           Type II error       Correct decision
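A Type I error is rejecting H0 when H0 is actually true. A quick simulation under assumed settings (normal population with known σ = 15, samples of 25, α = 0.05) suggests that a correctly specified test makes this error in about α of the samples. The helper name and settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=100.0, sigma=15.0, n=25, alpha=0.05, reps=4000, seed=1):
    """Fraction of samples drawn with H0 true (mu = mu0) in which H0 is wrongly rejected."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > critical_z:
            rejections += 1  # Type I error: H0 is true but rejected
    return rejections / reps

print(type_one_error_rate())  # close to 0.05
```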
Goodness of Fit Test
The goodness-of-fit test determines whether sampled items may be assumed to have been drawn from a population that follows a specified distribution
Determine whether your data fit some known, suspected, or theoretical distribution
Examples:
Regression: F test, t test
Perception Survey: Pearson Correlation and Cronbach Alpha
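The classic goodness-of-fit test, Pearson's chi-square test, is swapped in here as a sketch (it is not one of the examples listed). It compares observed counts with the counts expected under the hypothesized distribution. The die-roll counts are hypothetical; 11.07 is the standard table value for 5 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die; H0: the die is fair
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / 6] * 6   # 10 rolls expected per face

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1               # 6 categories - 1 = 5 degrees of freedom
CRITICAL_5_DF = 11.07                # chi-square table value, alpha = 0.05, df = 5

# A statistic below the critical value means the fair-die hypothesis is not rejected
print(chi2, df, chi2 > CRITICAL_5_DF)
```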
Correlation Analysis
Scatter Diagram: plots X-Y data points on a two-dimensional graph
Correlation Coefficient: measures the extent to which two variables are linearly related to each other
GRAPH YOUR WORLD to know how it is!!!
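The correlation coefficient can be sketched directly from its definition, r = Sxy / √(Sxx·Syy), using deviations from the means. The X-Y pairs and the helper name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: how strongly x and y are linearly related (-1 to 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical X-Y pairs
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

On Python 3.10 or later, `statistics.correlation(x, y)` returns the same Pearson coefficient.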
Scatter Plots
Excellent for examining the association between two variables
How many regression lines can be drawn from the data above? The BLUE concepts will be taught NEXT WEEK!!
Outliers (and Their Treatment)
An "outlier" is an observation that does not fit the pattern in the rest of the data
Check the data
Check with the measurer
If there is reason to believe it is NOT real, correct it if possible; otherwise leave it out (but note this).
If there is reason to believe it is real, leave it out and note this.
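The checks above need candidate outliers first. One common screen, swapped in here as an assumption since no detection rule is given, is Tukey's fences: flag values more than 1.5 × IQR below the first quartile or above the third. Flagged values are candidates for the checks, not automatic deletions. The readings and the helper name `flag_outliers` are hypothetical.

```python
import statistics

def flag_outliers(data, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

readings = [12, 13, 12, 14, 13, 15, 12, 13, 40]  # hypothetical measurements
print(flag_outliers(readings))  # [40]
```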
Exercise
If the average IQ in a given population is 100 and the standard deviation is 15, what percentage of the population has an IQ of 145 or higher?
Answer (assuming IQ is normally distributed)
P(X >= 145)
= P(Z >= (145 - 100)/15)
= P(Z >= 3)
From the tables: 99.87% of values lie below Z = 3
=> about 0.13% of the population
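The table lookup can be checked with the standard library's `statistics.NormalDist`, under the same assumption that IQ is normal with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

# IQ assumed normally distributed with mean 100 and standard deviation 15
iq = NormalDist(mu=100, sigma=15)

z = (145 - iq.mean) / iq.stdev          # 3.0 standard deviations above the mean
share_above = 1 - iq.cdf(145)           # P(X >= 145)
print(z, round(share_above * 100, 2))   # 3.0 0.13
```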
Reliability Analysis
Reliability analysis allows you to study the properties of measurement scales and the items that make them up.
The Reliability Analysis procedure calculates a number of commonly used measures of scale reliability and also provides information about the relationships between individual items in the scale.
Intraclass correlation coefficients can be used to compute interrater reliability estimates.
Example
Does my questionnaire measure customer satisfaction in a useful way?
Using reliability analysis, you can determine the extent to which the items in your questionnaire are related to each other, you can get an overall index of the repeatability or internal consistency of the scale as a whole, and you can identify problem items that should be excluded from the scale.
Statistics
Descriptives for each variable and for the scale, summary statistics across items, inter-item correlations and covariances, reliability estimates, ANOVA table, intraclass correlation coefficients, Hotelling’s T-square, and Tukey’s test of additivity.
Models of Reliability Analysis:
Alpha (Cronbach). This is a model of internal consistency, based on the average inter-item correlation.
Split-half. This model splits the scale into two parts and examines the correlation between the parts.
Guttman. This model computes Guttman’s lower bounds for true reliability.
Parallel. This model assumes that all items have equal variances and equal error variances across replications.
Strict parallel. This model makes the assumptions of the parallel model and also assumes equal means across items.
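The split-half model can be sketched as follows: total the odd-numbered and even-numbered items separately, correlate the two half scores, and apply the Spearman-Brown correction 2r/(1 + r) to estimate the reliability of the full-length scale. The odd/even split, the helper names, and the item scores are assumptions for illustration; tools such as SPSS may split the items differently.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(rows):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    odd_half = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)

# Hypothetical scores: 5 respondents x 4 items
rows = [
    [1, 1, 1, 2],
    [1, 2, 2, 3],
    [2, 3, 2, 3],
    [2, 2, 3, 3],
    [3, 3, 3, 3],
]
print(round(split_half_reliability(rows), 3))  # 0.873
```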
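Cronbach’s Alpha Formula:
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} S^2_{i}}{S^2_{\text{total}}}\right)$$
where:
$\alpha$ = Cronbach’s alpha coefficient
$S^2_{i}$ = variance of item (question) $i$
$S^2_{\text{total}}$ = variance of the total score
$k$ = number of items (questions) being tested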
k/(k − 1): The Weight
[Figure: the weight k/(k − 1) plotted against the number of items k (0 to 250); it falls from 2.0 at k = 2 toward 1.0 as k grows]
With 50 or more items in a QUESTIONNAIRE, adding items hardly changes THE WEIGHT (the weight stays almost the SAME). So don’t use more than 50 items in your perception survey
SOLUTION: split the items into several DIMENSIONS/INDICATORS and compute alpha for each
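The weight can be tabulated directly to see the flattening; `alpha_weight` is a hypothetical helper name.

```python
def alpha_weight(k):
    """The factor k / (k - 1) in Cronbach's alpha for a scale of k items."""
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    return k / (k - 1)

# 2 -> 2.0, 10 -> 1.1111, 50 -> 1.0204, 250 -> 1.004
for k in (2, 5, 10, 20, 50, 100, 250):
    print(k, round(alpha_weight(k), 4))
```

Going from 50 to 250 items moves the weight by less than 0.02, which is the flattening the figure shows.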
CAN YOU COMPUTE A CRONBACH ALPHA?
obs a b c d e f
1 4 4 5 5 3 4
2 4 4 5 4 3 5
3 5 3 5 5 3 5
4 5 4 4 4 3 4
5 4 4 4 4 5 5
6 3 3 3 3 3 5
7 4 4 5 5 4 3
8 5 4 4 5 4 4
9 4 4 4 4 4 4
10 3 5 3 5 4 4
CRONBACH ALPHA: USING SPSS
Reliability Statistics
Cronbach’s Alpha    Cronbach’s Alpha Based on Standardized Items    N of Items
0.093               0.038                                           6
The data drawn from 10 people have a LOW RELIABILITY INDEX. This means that the QUESTION DESIGN cannot measure what the researcher intends to measure
What next? REDESIGN THE QUESTIONS and ADD MORE SAMPLES!
CRONBACH ALPHA: USING FOXPRO
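clear
set talk off                          && suppress command echo
* Variance of each item (fields a..f of the open table)
calculate var(a) to na
calculate var(b) to nb
calculate var(c) to nc
calculate var(d) to nd
calculate var(e) to ne
calculate var(f) to nf
* jum is assumed to hold each respondent's total score (a+b+c+d+e+f)
calculate var(jum) to njum
totvar = na + nb + nc + nd + ne + nf  && sum of item variances
nilaik = 6/(6-1)                      && weight k/(k-1) with k = 6 items
relia = nilaik * (1 - (totvar/njum))  && Cronbach's alpha
? relia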
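The same computation can be sketched in Python with the standard `statistics` module, using the ten responses from the exercise table. Like the FoxPro code, it sums the item variances and divides by the variance of each respondent's total. The sample variance (n − 1 divisor) is used throughout; because the same divisor appears in the numerator and the denominator, this choice does not change alpha. The helper name `cronbach_alpha` is hypothetical.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))  # one tuple of scores per item
    sum_item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Items a-f for the 10 respondents in the exercise table
data = [
    [4, 4, 5, 5, 3, 4],
    [4, 4, 5, 4, 3, 5],
    [5, 3, 5, 5, 3, 5],
    [5, 4, 4, 4, 3, 4],
    [4, 4, 4, 4, 5, 5],
    [3, 3, 3, 3, 3, 5],
    [4, 4, 5, 5, 4, 3],
    [5, 4, 4, 5, 4, 4],
    [4, 4, 4, 4, 4, 4],
    [3, 5, 3, 5, 4, 4],
]
print(round(cronbach_alpha(data), 3))  # 0.093, matching the SPSS output
```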