EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT AND EVALUATION
RELIABILITY
PRESENTED TO: DR AMRITA KAUR
PREPARED BY:
MARY (819709)
ANITAH A/P GOTANDABANI (819826)
RODZITA BINTI ABD MUTALIB (816732)
RAJESHWARY A/P KRISHNAN (819855)
SITI RUQAYYAH BINTI MD IDRIS (819859)
DEFINITION
That which can be relied on; dependable; trustworthy (Webster's New World Dictionary)
A measure is considered reliable if:
- it gives us the same result over and over again
- it provides the same result on two or more separate occasions
- it does not change when the concept being measured remains constant in value
- repeated measures give a similar result
Importance of Reliability
Internal consistency - if the scales do not accurately measure the concept in question, researchers will draw false conclusions from the data and make inaccurate diagnoses.
Inter-rater reliability - important for subjective methods such as observation, because a researcher could be biased and (consciously or unconsciously) record only behaviours that support their hypothesis.
Alternate forms reliability - if two different forms measuring the same construct obtain similar results and the correlation is high, the results are likely to be reliable; accurate conclusions and inferences can therefore be drawn from them.
Test-retest reliability - if a study is low in test-retest reliability, the results may be due to factors other than the manipulation of the independent variable (e.g. fatigue); such results cannot be published and applied to the real world, because conclusions drawn from the data may be false.
Reliable measures can also be used to predict academic success, make it easy to identify weak components, be presented graphically in various ways, and identify the most problematic areas in the system.
Factors Affecting Reliability
- Administrator Factors
- Number of Items on the Instrument
- The Test Taker
- Heterogeneity of the Items
- Heterogeneity of the Group Members
- Length of Time between Test and Retest
Poor or unclear directions given during administration, or inaccurate scoring, can affect reliability.
For example, suppose you were told that your scores on being social determined your promotion. Your responses are then more likely to reflect what you think the administrators want than what your behavior actually is.
Administrator Factors
The larger the number of items, the greater the chance for high reliability.
For example, it makes sense when you consider that twenty questions about your leadership style are more likely to produce a consistent result than four questions. Remedy: use longer tests, or accumulate scores from short tests.
Number of Items on the Instrument
For example, if you took an instrument in August when you had a terrible flu and again in December when you were feeling quite good, we might see a difference in your response consistency. If you were under considerable stress of some sort, or if you were interrupted while answering the instrument questions, you might give different responses.
The Test Taker
Heterogeneity of the Items -- The greater the heterogeneity of the items (differences in the kind of questions or in the difficulty of the questions), the greater the chance for high reliability correlation coefficients.
Heterogeneity of the Group Members -- The greater the heterogeneity of the group members in the preferences, skills or behaviors being tested, the greater the chance for high reliability correlation coefficients.
Heterogeneity
The shorter the time, the greater the chance for high reliability correlation coefficients.
As we have experiences, we tend to adjust our views a little from time to time. Therefore, the time interval between the first time we took an instrument and the second time is really an "experience" interval.
Experience happens, and it influences how we see things. Because internal consistency has no time lapse, one can expect it to have the highest reliability correlation coefficient.
Length of Time between Test and Retest
TYPES OF RELIABILITY
- Test-Retest Reliability
- Alternate / Equivalent Form
- Internal Consistency
- Interrater Reliability
TEST-RETEST RELIABILITY
Administering the same test twice, over a period of time, to a group or individual. The scores from Time 1 and Time 2 can then be correlated to evaluate the test's stability over time. For example: give a set of people a test twice and see whether the two sets of scores are correlated.
Pearson product-moment correlation:

r = [5(1233.5) - (325)(18.7)] / [sqrt(5(21375) - (325)^2) x sqrt(5(71.31) - (18.7)^2)]
  = 90 / (35.36 x 2.62)
  = 0.97
Student | Exam in May (X) | Exam in October (Y) | X^2 | Y^2 | XY
1 | 55 | 3.1 | 3025 | 9.61 | 170.5
2 | 65 | 3.5 | 4225 | 12.25 | 227.5
3 | 60 | 3.4 | 3600 | 11.56 | 204
4 | 70 | 4.2 | 4900 | 17.64 | 294
5 | 75 | 4.5 | 5625 | 20.25 | 337.5
Sum | 325 | 18.7 | 21375 | 71.31 | 1233.5
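The arithmetic above can be checked with a short Python sketch; the exam scores are taken from the table above.

```python
import math

# Exam scores from the table: May (X) and October (Y)
x = [55, 65, 60, 70, 75]
y = [3.1, 3.5, 3.4, 4.2, 4.5]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt(n * sum(a * a for a in x) - sx ** 2) * \
          math.sqrt(n * sum(b * b for b in y) - sy ** 2)
    return num / den

r = pearson(x, y)
print(round(r, 2))  # a coefficient near 1 indicates stable scores over time
```

A coefficient this close to 1 indicates strong test-retest stability for these five students.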
ALTERNATE / EQUIVALENT FORM
This reliability is the correlation between a group's scores on two forms of the same test. For example, give everyone in a group two forms of the same test and correlate the two sets of scores.
INTERRATER RELIABILITY
Involves having two raters independently observe and record specified behaviors, such as teaching or classroom engagement.

Interrater reliability = number of agreements / number of possible agreements
Item | Rater 1 | Rater 2 | Match
Answer 1 | 8 | 7 | no
Answer 2 | 9 | 8 | no
Answer 3 | 10 | 10 | yes
Answer 4 | 7 | 7 | yes
Answer 5 | 8 | 8 | yes
Answer 6 | 9 | 9 | yes
Answer 7 | 10 | 9 | no
Answer 8 | 9 | 9 | yes
Answer 9 | 8 | 8 | yes
Answer 10 | 7 | 7 | yes
Answer 11 | 8 | 7 | no
Answer 12 | 8 | 8 | yes
Match total: 8 out of 12 possible agreements
IRR = 8 / 12 = 0.67
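The agreement count can be verified with a minimal Python sketch, using the ratings from the table above:

```python
# Ratings from the table above: 12 answers scored by two independent raters
rater1 = [8, 9, 10, 7, 8, 9, 10, 9, 8, 7, 8, 8]
rater2 = [7, 8, 10, 7, 8, 9, 9, 9, 8, 7, 7, 8]

# Interrater reliability = number of agreements / number of possible agreements
agreements = sum(a == b for a, b in zip(rater1, rater2))
irr = agreements / len(rater1)
print(agreements, round(irr, 2))  # 8 exact agreements out of 12
```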
3. Internal Consistency
Internal consistency reliability refers to the consistency of results across the items within a test, ensuring that the various items measuring the same construct deliver consistent scores.
Split-Halves Test
The split-halves test for internal consistency reliability is the easiest type, and involves dividing a test into two halves. For example, a questionnaire to measure extroversion could be divided into odd and even questions. The results from the two halves are then statistically compared.

Example 1: 12 students take a test with 50 questions. For each student the total score is recorded, along with the sum of the scores for the even questions and the sum of the scores for the odd questions, as shown in Figure 1. Determine whether the test is reliable by using the split-half methodology with the Spearman-Brown correction.
Correlation between the two halves: r = 0.6673

Spearman-Brown correction:
P = 2r / (1 + r) = 2(0.6673) / (1 + 0.6673) = 0.8004

This result shows that the test is quite reliable.
Figure 1: Split-half methodology
Student | Score | Even | Odd
1 | 42 | 22 | 20
2 | 33 | 18 | 15
3 | 44 | 23 | 21
4 | 45 | 21 | 24
5 | 30 | 14 | 16
6 | 26 | 15 | 11
7 | 45 | 21 | 24
8 | 35 | 17 | 18
9 | 40 | 17 | 23
10 | 32 | 16 | 16
11 | 34 | 19 | 15
12 | 45 | 23 | 22
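The split-half calculation above can be reproduced in Python: correlate the even and odd subtotals from Figure 1, then apply the Spearman-Brown correction.

```python
import math

# Even- and odd-question subtotals for the 12 students in Figure 1
even = [22, 18, 23, 21, 14, 15, 21, 17, 17, 16, 19, 23]
odd  = [20, 15, 21, 24, 16, 11, 24, 18, 23, 16, 15, 22]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt(n * sum(a * a for a in x) - sx ** 2) * \
          math.sqrt(n * sum(b * b for b in y) - sy ** 2)
    return num / den

r = pearson(even, odd)   # correlation between the two halves
p = 2 * r / (1 + r)      # Spearman-Brown corrected reliability
print(round(r, 4), round(p, 4))
```

The Spearman-Brown step is needed because the half-test correlation underestimates the reliability of the full-length test.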
Kuder and Richardson Formula 20
It is equivalent to performing the split-half methodology on all combinations of questions, and is applicable when each question is either right or wrong. A correct answer scores 1 and an incorrect answer scores 0.
Kuder and Richardson Formula 20
Variance of the total scores:

S^2 = (192 - (42)^2 / 10) / (10 - 1)
    = (192 - 176.4) / 9
    = 1.733
Student | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | X (correct items) | X^2
1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 36
2 | 1 | 1 | 0 | 0 | 1 | 0 | 3 | 9
3 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 4
4 | 1 | 1 | 0 | 1 | 1 | 1 | 5 | 25
5 | 1 | 0 | 0 | 1 | 0 | 1 | 3 | 9
6 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 16
7 | 1 | 0 | 1 | 1 | 1 | 0 | 4 | 16
8 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 36
9 | 0 | 1 | 0 | 1 | 1 | 1 | 4 | 16
10 | 1 | 1 | 1 | 1 | 0 | 1 | 5 | 25
Sum | | | | | | | 42 | 192
p | 0.9 | 0.6 | 0.6 | 0.7 | 0.7 | 0.7
q = 1 - p | 0.1 | 0.4 | 0.4 | 0.3 | 0.3 | 0.3
p*q | 0.09 | 0.24 | 0.24 | 0.21 | 0.21 | 0.21 | (sum = 1.2)
S^2 = 1.733, k = 6, students = 10
Cont'd: K (number of questions) = 6, sum of pq = 1.2, S^2 = 1.733

KR-20 = (6 / (6 - 1)) x (1 - 1.2 / 1.733)
      = 1.2 x (0.308)
      = 0.369
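As a check, KR-20 can be computed directly from the 10-student, 6-question score matrix above (1 = correct, 0 = wrong):

```python
# Right/wrong item scores for the 10 students (rows) on 6 questions (columns)
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
]
n = len(scores)        # students
k = len(scores[0])     # questions
totals = [sum(row) for row in scores]  # X per student

# Sample variance of the total scores (the S^2 = 1.733 above)
var = (sum(t * t for t in totals) - sum(totals) ** 2 / n) / (n - 1)

# Sum of p*q over items, where p is the proportion answering correctly
pq = sum((c / n) * (1 - c / n) for c in (sum(col) for col in zip(*scores)))

kr20 = (k / (k - 1)) * (1 - pq / var)
print(round(kr20, 3))
```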
Kuder and Richardson Formula 21
Student | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | X (correct items) | X^2
1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 36
2 | 1 | 1 | 0 | 0 | 1 | 0 | 3 | 9
3 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 4
4 | 1 | 1 | 0 | 1 | 1 | 1 | 5 | 25
5 | 1 | 0 | 0 | 1 | 0 | 1 | 3 | 9
6 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 16
Sum | | | | | | | 23 | 99
Cont'd:
Mean = 23 / 6 = 3.833

S^2 = (99 - (23)^2 / 6) / (6 - 1)
    = (99 - 88.167) / 5
    = 2.167
Cont'd: K (number of questions) = 6, Mean = 3.833, S^2 = 2.167

KR-21 = (6 / (6 - 1)) x (1 - (3.833 x (6 - 3.833)) / (6 x 2.167))
      = 1.2 x (0.361)
      = 0.4332
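KR-21 is simpler than KR-20: it needs only the test length, the mean, and the variance of the total scores. The values below come from the 6-student table above.

```python
# Total scores X for the 6 students in the KR-21 example
totals = [6, 3, 2, 5, 3, 4]
n = len(totals)
k = 6  # number of questions

mean = sum(totals) / n
# Sample variance of the total scores
var = (sum(t * t for t in totals) - sum(totals) ** 2 / n) / (n - 1)

kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(round(kr21, 3))
```

KR-21 assumes all items are of equal difficulty, which is why it differs slightly from the KR-20 value computed from the full item data.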
Cronbach's Alpha
Cont'd: Calculate Cronbach's alpha for a 6-question questionnaire with Likert scores (1-7), answered by 10 students.
Student | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | X | X^2
1 | 3 | 1 | 4 | 5 | 1 | 5 | 19 | 361
2 | 4 | 3 | 5 | 4 | 3 | 1 | 20 | 400
3 | 3 | 4 | 4 | 4 | 4 | 4 | 23 | 529
4 | 3 | 5 | 2 | 3 | 1 | 2 | 16 | 256
5 | 3 | 5 | 4 | 4 | 3 | 4 | 23 | 529
6 | 4 | 5 | 3 | 5 | 2 | 5 | 24 | 576
7 | 2 | 5 | 3 | 5 | 4 | 2 | 21 | 441
8 | 3 | 4 | 2 | 4 | 4 | 2 | 19 | 361
9 | 3 | 4 | 4 | 5 | 3 | 6 | 25 | 625
10 | 3 | 2 | 3 | 3 | 2 | 5 | 18 | 324
Sum | 31 | 38 | 34 | 42 | 27 | 36 | 208 | 4402
Variance of each item | 0.322 | 1.956 | 0.933 | 0.622 | 1.344 | 2.933 | (sum = 8.11)
Cont'd:

alpha = (K / (K - 1)) x (1 - sum(s_i^2) / s_T^2)

K (number of items) = 6
sum(s_i^2) = 8.11 (sum of the item variances)
s_T^2 = 8.4 (variance of the total scores: (4402 - 208^2/10) / 9)

alpha = (6 / (6 - 1)) x (1 - 8.11 / 8.4)
      = 1.2 x (0.0345)
      = 0.041

Such a low alpha indicates that the items have poor internal consistency.
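Cronbach's alpha can be computed from the raw Likert responses in the table above (6 items, 10 students); this sketch derives both the item variances and the total-score variance from the data.

```python
# Likert responses (1-7): 10 students (rows) on 6 items (columns)
scores = [
    [3, 1, 4, 5, 1, 5],
    [4, 3, 5, 4, 3, 1],
    [3, 4, 4, 4, 4, 4],
    [3, 5, 2, 3, 1, 2],
    [3, 5, 4, 4, 3, 4],
    [4, 5, 3, 5, 2, 5],
    [2, 5, 3, 5, 4, 2],
    [3, 4, 2, 4, 4, 2],
    [3, 4, 4, 5, 3, 6],
    [3, 2, 3, 3, 2, 5],
]
n = len(scores)        # students
k = len(scores[0])     # items

def sample_var(values):
    """Sample variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

item_vars = sum(sample_var(list(col)) for col in zip(*scores))  # ~8.11
total_var = sample_var([sum(row) for row in scores])            # ~8.4
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 3))
```

Because the item variances nearly exhaust the total-score variance, alpha comes out close to zero.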