Analyzing & Interpreting Data Assessment Institute Summer 2005

Page 1: Analyzing & Interpreting Data Assessment Institute Summer 2005

Analyzing & Interpreting Data

Assessment Institute

Summer 2005

Page 2: Analyzing & Interpreting Data Assessment Institute Summer 2005

Categorical vs. Continuous Variables

► Categorical Variables
    Examples: student’s major, enrollment status, gender, ethnicity; also whether or not the student passed the cutoff on a test
► Continuous Variables
    Examples: GPA, test scores, number of credit hours
► Why make this distinction?
    Whether a variable is categorical or continuous affects whether a particular statistic can be used
    ► Doesn’t make sense to calculate the average ethnicity of students!

Page 3: Analyzing & Interpreting Data Assessment Institute Summer 2005

Averages

► Typical value of a variable
► In assessment we commonly compare averages of:
    Different groups
    ► Each group consists of different people
    ► Avg. score on a test for students in different classes
    Different occasions
    ► Same people tested on each occasion
    ► Avg. score on a test for students who took the test as freshmen and then again when they were seniors

Page 4: Analyzing & Interpreting Data Assessment Institute Summer 2005

Before calculating an average…

► Check to make sure that the variable:
    Is continuous
    Has values in your data set that are within the possible limits
    ► Check minimum and maximum values
    Does not have a distribution that is overly skewed
    ► If so, consider using the median
    Does not have any values that would be considered outliers
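
The checks listed on this slide can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the scores, the 0 to 35 limits, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import statistics

def check_before_averaging(values, low, high):
    """Collect simple diagnostics to review before reporting a mean."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    iqr = q3 - q1
    return {
        "min": min(values),
        "max": max(values),
        # Values outside the possible limits usually signal data-entry errors.
        "out_of_range": [v for v in values if v < low or v > high],
        # Rule of thumb (an assumption here): flag values more than
        # 1.5 * IQR below the first or above the third quartile.
        "possible_outliers": [v for v in values
                              if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
        # A large gap between mean and median hints at a skewed distribution.
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical total scores on a test whose possible range is 0 to 35.
scores = [22, 25, 27, 28, 30, 31, 33, 2, 40]
report = check_before_averaging(scores, low=0, high=35)
print(report)
```

In this made-up sample the value 40 is outside the possible limits and the value 2 is flagged as a possible outlier, so both would be worth checking before an average is reported.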

Page 5: Analyzing & Interpreting Data Assessment Institute Summer 2005

Histogram

Page 6: Analyzing & Interpreting Data Assessment Institute Summer 2005

Correlations (r)

► Captures linear relationship between two continuous variables (X and Y)
► Ranges from -1 to 1, with values closer to |1| indicating a stronger relationship than values closer to 0 (no relationship)
► Positive values:
    High X associated with high Y; low X associated with low Y
► Negative values:
    High X associated with low Y; low X associated with high Y
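
To make the definition concrete, the following sketch computes Pearson's r directly from its formula. The function name and the paired scores are hypothetical and only illustrate the positive and negative cases described on this slide.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores: high X with high Y gives a positive r,
# high X with low Y gives a negative r.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_positive, r_negative)
```

Because r captures only linear association, a scatterplot should still be inspected before the coefficient is interpreted.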

Page 7: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: scatterplot titled “Relationship between KWH1 Fall 2003 Total Scores and KWH1 Spring 2005 Total Scores”. X-axis: KWH1 Total Score Fall 2003 (0 to 35). Y-axis: KWH1 Total Score Spring 2005 (0 to 35).]

Scatterplot: Does the relationship appear linear? Is there a problem with restriction of range? Do there appear to be outliers?

In this example, dropping cases that appeared to be outliers did not change the relationship between the two administrations (r = .30), nor their averages.

Page 8: Analyzing & Interpreting Data Assessment Institute Summer 2005

Standards

[Figure: bar chart of Test Score (0 to 100) for Incoming Freshmen and Keystone Course Completers, with data labels 30 and 55 and a reference line labeled “STANDARD SET BY FACULTY”.]

► May want to use standard setting procedures to establish “cut-offs” for proficiency on the test
► Could be that students are gaining knowledge/skills over time, but are they gaining enough?
► Another common statistic calculated in assessment is the % of students meeting or exceeding a standard

Page 9: Analyzing & Interpreting Data Assessment Institute Summer 2005

A. Are the 29 senior music majors in Spring 2005 scoring higher on the Vocal Techniques 10-item test than last year’s 20 senior music majors?

► Compare averages of different groups

Yes, this year’s seniors scored higher (M = 6.72) than last year’s (M = 6.65).

Page 10: Analyzing & Interpreting Data Assessment Institute Summer 2005

B. Are senior kinesiology majors in different concentrations (Sports Management vs. Therapeutic Recreation) scoring differently on a test used to assess their “core” kinesiology knowledge?

► Compare averages of different groups

Concentration Name (Acronym)                              # Unique Items   N     Mean     SD       Min   Max
Exercise Science and Leadership (ESL)                     50               24    60.58%   7.66%    50%   74%
Physical and Health Education Teacher Education (PHETE)   50               13    54.16%   8.66%    42%   74%
Recreation Management (RM)                                50               9     59.78%   10.12%   44%   74%
Sport Management (SM)                                     50               64    51.84%   8.54%    22%   68%
Therapeutic Recreation (TR)                               50               4     57.00%   4.76%    52%   62%
Overall                                                   50               114   54.78%   9.08%    22%   74%

Page 11: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: bar chart of Mean Core Test Total (Percent Correct), 0% to 100%, by Concentration, showing each Concentration Mean alongside the Overall Mean. Data labels: ESL 60.58%, PHETE 54.16%, RM 59.78%, SM 51.84%, TR 57.00%.]

Page 12: Analyzing & Interpreting Data Assessment Institute Summer 2005

C. On the Information Seeking Skills Test (ISST), what percent of incoming freshmen in Fall 2004 met or exceeded the score necessary to be considered as having “proficient” information literacy skills?

► Percent of students meeting or exceeding a standard

Of the 2862 students attempting the ISST, 2751 (96%) met or exceeded the “proficient” standard.
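
The percentage reported on this slide is a simple proportion. The sketch below reproduces it from the two counts given above and also shows how the same statistic might be computed from raw scores; the helper function, the score list, and the cutoff of 25 are hypothetical.

```python
def percent_meeting_standard(scores, cutoff):
    """Percent of scores at or above the cutoff."""
    met = sum(1 for s in scores if s >= cutoff)
    return 100 * met / len(scores)

# Counts taken from the slide: 2751 of 2862 students met the standard.
slide_percent = 100 * 2751 / 2862
print(round(slide_percent))  # rounds to the 96% reported on the slide

# Hypothetical raw scores and cutoff, for illustration only.
example_percent = percent_meeting_standard([10, 20, 30, 40], cutoff=25)
print(example_percent)
```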

Page 13: Analyzing & Interpreting Data Assessment Institute Summer 2005

D. Are the well-being levels (as measured using six subscales, e.g., self-acceptance, autonomy, etc.) of incoming JMU freshmen different than the well-being levels of adults?

► Compare averages of different groups (JMU students vs. adults)
► More than one variable (six different subscales)

Page 14: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: chart of Average subscale score (axis from 3 to 18) for two samples, Ryff & Keyes 95 and F04, across six well-being subscales.]

Subscale                          Ryff & Keyes 95   F04
Self-Acceptance                   14.6              15.3
Positive Relations with Others    14.8              15.0
Autonomy                          15.2              13.9
Environmental Mastery             14.9              14.0
Purpose in Life                   14.4              14.8
Personal Growth                   15.7              15.7

Similarities
JMU incoming freshmen seem to be similar to the adult sample (N = 1100) in Positive Relations with Others and Personal Growth.

Differences
JMU incoming freshmen have significantly lower Autonomy and Environmental Mastery well-being compared to the adult sample and significantly higher Self-Acceptance and Purpose in Life.

While the differences for Self-Acceptance and Purpose in Life are considered small in practical significance (d = .14 and d = .25), the differences for Autonomy (d = .50) and Environmental Mastery (d = .35) are considered medium and small to medium, respectively.

Page 15: Analyzing & Interpreting Data Assessment Institute Summer 2005

► Comparing Means Across Different Occasions for Different Groups

E. Are students scoring higher on the Health and Wellness Questionnaire as sophomores compared to when they were freshmen? Does the difference depend on whether or not they have completed their wellness course requirement?

Page 16: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: chart titled “HWQ1-Part1 Mean Total Scores from Fall 2003 and Spring 2005 by Number of Wellness Courses Completed”. X-axis: Number of Wellness Courses Completed (No Courses, 1 Course). Y-axis: HWQ1-Part1 Total Score (15 to 60). Series: Fall 2003, Spring 2005. Data labels as extracted: 38.91, 40.52, 40.19, 41.38. Group sizes: “Non-Completers” N = 21; “Completers” N = 283.]

Page 17: Analyzing & Interpreting Data Assessment Institute Summer 2005

F. Are the writing portfolios collected in the fall semester yielding higher ratings than writing portfolios collected in the spring semester? Are the differences between the semesters the same across three academic years?

► Compare averages of different groups
► Six different groups (fall and spring for each academic year)

Page 18: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: chart titled “Average Portfolio Rating of Students Who Took the GWRIT Course in Different Semesters by Academic Year”. X-axis: Academic Year (2001-2002, 2002-2003, 2003-2004). Y-axis: Average Rating (1.00 to 4.00). Series: Fall, Spring. Data labels as extracted: 2.56, 2.56, 2.19, 2.42, 2.39, 2.22.]

In the 2001-2002 and 2002-2003 academic years, fall portfolios were rated slightly higher than spring portfolios. In the most current academic year, the fall and spring portfolio averages were about the same. There doesn’t seem to be overwhelming evidence that the difference between fall and spring portfolios is of importance.

Page 19: Analyzing & Interpreting Data Assessment Institute Summer 2005

G. Are students who obtained transfer or AP credit for their general education sociocultural domain course scoring differently on the 27-item Sociocultural Domain Assessment (SDA) than students who completed their courses at JMU?

► Compare averages of different groups

• JMU students: N = 369, M = 18.63, SD = 3.83
• AP/transfer students: N = 29, M = 18.55, SD = 3.68
• Difference was not statistically significant, t(335) = .11, p = .92, nor practically significant (d = .02).

Page 20: Analyzing & Interpreting Data Assessment Institute Summer 2005

G. What is the relationship between a student’s general education sociocultural domain course grade and their score on the 27-item Sociocultural Domain Assessment (SDA)?

► Relationship between two variables, finally!

[Figure: scatterplot titled “Relationship Between GSYC101 Course Grade and SDA Total Score”. X-axis: SDA Total Score (0 to 27). Y-axis: GPSYC101 Course Grade (0 to 4.5).]

[Figure: scatterplot titled “Relationship Between GSYC160 Course Grade and SDA Total Score”. X-axis: SDA Total Score (0 to 27). Y-axis: GPSYC160 Course Grade (0 to 4.5).]

Correlations shown with the two scatterplots: r = .31 and r = .23

Page 21: Analyzing & Interpreting Data Assessment Institute Summer 2005

Inferential Statistics

► “How likely is it to have found results such as mine in a population where the null hypothesis is true?”
► Comparing Averages of Different Groups
    Independent Samples T-test
    ► Null: Groups do not differ in population means
► Comparing Averages Across Different Occasions
    Paired Samples T-test
    ► Null: Occasions do not differ in population means
► Correlation
    ► Null: No relationship between variables in the population

Typically, want to reject the null: p-value < .05
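
As an illustration of the independent samples t-test, the sketch below computes the t statistic from summary statistics alone, using the group sizes, means, and standard deviations reported earlier for the SDA comparison (JMU vs. AP/transfer students). It assumes the pooled-variance (equal variances) form of the test, which the slides do not state, and it does not recompute the degrees of freedom or p-value that SPSS reported.

```python
import math

def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Independent samples t statistic, assuming equal population variances."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error

# Summary statistics from the SDA example in this deck.
t_stat = pooled_t(369, 18.63, 3.83, 29, 18.55, 3.68)
print(round(t_stat, 2))  # agrees with the t = .11 shown on the slide
```

A t statistic this close to zero is what one would expect if the null hypothesis of equal population means were true, which matches the slide's conclusion.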

Page 22: Analyzing & Interpreting Data Assessment Institute Summer 2005

Effect Sizes and Confidence Intervals

► Statistical significance is a function of both the magnitude of the effect (e.g., difference between means) and sample size
► Supplement with confidence intervals and effect sizes
    SPSS provides you with confidence intervals
    Can use Wilson’s Effect Size Calculator to obtain effect sizes
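
For two independent groups, the standardized mean difference (Cohen's d) can also be computed by hand. The sketch below assumes the common pooled-standard-deviation definition of d, which may differ in detail from what Wilson's calculator does, and applies it to the SDA summary statistics reported earlier in this deck.

```python
import math

def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# JMU students vs. AP/transfer students on the SDA.
d = cohens_d(369, 18.63, 3.83, 29, 18.55, 3.68)
print(round(d, 2))  # agrees with the d = .02 shown on the slide
```

Unlike the t statistic, d does not grow with sample size, which is why it is a useful supplement to a significance test.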

Page 23: Analyzing & Interpreting Data Assessment Institute Summer 2005

Wellness Domain Example
Goals & Objectives

Students take one of two courses to fulfill this requirement, either GHTH 100 or GKIN 100.

Goal 1: Students should be able to understand the dimensions of wellness, the various factors affecting each dimension, and how dimensions are interrelated.
    a. Identify the dimensions of wellness.
    b. Identify factors that influence each dimension of wellness.
    c. Recognize how dimensions of wellness are interrelated.

Goal 2: Students should be able to understand the relationship between personal behaviors and lifelong health and wellness.
    a. Recognize the importance of lifestyle in disease prevention.
    b. Recognize the relationship between health behaviors and wellness.
    c. Identify and apply the theories of health behavior change.
    d. Examine the role of consumer health issues related to overall wellness.

Goal 3: Students will recognize an individual’s level of health and wellness and understand how these levels impact quality of life.
    a. Assess one’s levels of health and wellness.
    b. Evaluate how one’s levels of health and wellness compare to recommended levels.
    c. Recognize how genetics, environment and lifestyle behaviors influence health and wellness levels.

Goal 4: Students will identify and implement strategies to improve their wellness.
    a. Identify a realistic and adjustable personal wellness plan.
    b. Recognize how to use self-management skills relating to healthy lifestyle behaviors.
    c. Participate in a greater number of healthy wellness-related activities.

* Not an actual goal/objective; created only for Assessment Institute instructional purposes.

Page 24: Analyzing & Interpreting Data Assessment Institute Summer 2005

Knowledge of Health and Wellness (KWH) Test Specification Table

Test Specification Table for KWH

Goal   Objective   Assessment Items Spring 2005   # KWH Items/Objective   % of KWH
1      a           30, 16                         2                       5.71
1      b           6, 24                          2                       5.71
1      c           5, 8, 10, 13                   4                       11.43
       Total # Items for Goal 1:                  8                       22.86
2      a           14, 20, 35                     3                       8.57
2      b           22, 23, 18, 3                  4                       11.43
2      c           25, 11, 26                     3                       8.57
2      d           2, 12, 27, 19                  4                       11.43
       Total # Items for Goal 2:                  14                      40.00
3      a           17, 28                         2                       5.71
3      b           9, 29, 7, 4, 31                5                       14.29
3      c           1                              1                       2.86
       Total # Items for Goal 3:                  8                       22.86
4      a           32, 33, 15                     3                       8.57
4      b           21, 34                         2                       5.71
4      c           Assessed via HWQ               ---                     ----
       Total # Items for Goal 4:                  5                       14.29
       Total # Items                              35                      100.00
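
The percentage column in the test specification table is each item count divided by the 35 items on the test. The short sketch below recomputes the goal-level percentages from the item counts in the table; the variable names are illustrative only.

```python
# Number of KWH items mapped to each goal, taken from the table above.
items_per_goal = {1: 8, 2: 14, 3: 8, 4: 5}

total_items = sum(items_per_goal.values())

# Percent of the test devoted to each goal, rounded as in the table.
percent_per_goal = {goal: round(100 * count / total_items, 2)
                    for goal, count in items_per_goal.items()}

print(total_items)
print(percent_per_goal)
```

A check like this is a quick way to confirm that a test blueprint covers every goal and that the percentages add up to the whole test.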

Page 25: Analyzing & Interpreting Data Assessment Institute Summer 2005

Data Management Plan
Wellness_Data.sav (N = 105)

Missing data indicated for all variables by "."

Name       Label                                                       Possible Values                                                   When Data Collected       Type
id                                                                                                                                                                 Numeric
kwhtot03   KWH Total Fall 2003                                         0 - 35                                                            Fall 2003                 Numeric
kwhtot05   KWH Total Spring 2005                                       0 - 35                                                            Spring 2005               Numeric
ghth100    Personal Wellness Course Grade                              Letter Grade                                                      Fall 2003 thru Fall 2004  Character
gkin100    Lifetime Fitness & Wellness Course Grade                    Letter Grade                                                      Fall 2003 thru Fall 2004  Character
hth100     Personal Wellness Course Grade                              Numeric Grade                                                     Fall 2003 thru Fall 2004  Numeric
kin100     Lifetime Fitness & Wellness Course Grade                    Numeric Grade                                                     Fall 2003 thru Fall 2004  Numeric
took_hth   Did the student take GHTH100?                               0 = "Did NOT take GHTH100"; 1 = "Did take GHTH100"
took_kin   Did the student take GKIN100?                               0 = "Did NOT take GKIN100"; 1 = "Did take GKIN100"
numwell    Has the student completed their wellness domain requirement?   0 = "Requirement NOT completed"; 1 = "Requirement Completed"

Page 26: Analyzing & Interpreting Data Assessment Institute Summer 2005

Item Analysis

Item 1

Page 27: Analyzing & Interpreting Data Assessment Institute Summer 2005

Item Difficulty

► The proportion of people who answered the item correctly (p)
    Used with dichotomously scored items
    Correct Answer: score = 1
    Incorrect Answer: score = 0
► Item difficulty a.k.a. p-value
► Dichotomous items
    Mean = p
    Variance = pq, where q = 1 - p

Page 28: Analyzing & Interpreting Data Assessment Institute Summer 2005

SPSS output for 1st 6 items of 35 item GKIN100 Test3 Spring 2005

              Mean     Std Dev   Cases
1. ITEM1      .5609    .4972     271.0
2. ITEM2      .9520    .2141     271.0
3. ITEM3      .8598    .3479     271.0
4. ITEM4      .7454    .4364     271.0
5. ITEM5      .6089    .4889     271.0
6. ITEM6      .5793    .4946     271.0

Mean is item difficulty (p)

Std Dev is a measure of the variability in the item scores

Cases is the sample size on which the analysis is based

58% of the sample obtained the correct response to Item 6. The difficulty or p-value of Item 6 is .58
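
Because a dichotomous item has variance pq, the Std Dev column can be recovered from the Mean column. The sketch below assumes that SPSS reports the sample standard deviation (dividing by n - 1 rather than n); under that assumption it reproduces the values shown for Items 1 and 6.

```python
import math

def item_sd_from_p(p, n):
    """Sample standard deviation of a 0/1 item with difficulty p and n cases."""
    q = 1 - p
    # pq is the variance computed with n in the denominator; rescale to n - 1.
    return math.sqrt(p * q * n / (n - 1))

sd_item1 = item_sd_from_p(0.5609, 271)
sd_item6 = item_sd_from_p(0.5793, 271)
print(round(sd_item1, 4), round(sd_item6, 4))
```

The standard deviation is largest when p is near .50 and shrinks toward zero as p approaches 0 or 1, which is why very easy or very hard items contribute little variability.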

Page 29: Analyzing & Interpreting Data Assessment Institute Summer 2005

Easiest & Hardest Items

► EASIEST (p = .99)
25. Causes of mortality today are:
    A. the same as in the early 20th century.
    B. mostly related to lifestyle factors.
    C. mostly due to fewer vaccinations.
    D. a result of contaminated water.

► HARDEST (p = .14)
34. Which of the following is a healthy lifestyle that influences wellness?
    A. brushing your teeth
    B. physical fitness
    C. access to health care
    D. obesogenic environment

Page 30: Analyzing & Interpreting Data Assessment Institute Summer 2005

Item Difficulty Guidelines

► High p-values: item is easy; low p-values: item is hard
► If p-value = 1.0 (or 0), everyone is answering the question correctly (or incorrectly) and there will be no variability in item scores
► If p-value is too low, the item is too difficult and needs revision, or perhaps the test is too long
► Good to have a mixture of difficulty in items on the test
► Once you know the difficulty of the items, usually sort them from easiest to hardest on the test

Page 31: Analyzing & Interpreting Data Assessment Institute Summer 2005

Item Discrimination

► Correlation between item score and total score on test
► Since dealing with dichotomous items, this correlation is usually either a biserial or point-biserial correlation
► Can range in value from -1 to 1
► Positive values closer to 1 are desirable

Page 32: Analyzing & Interpreting Data Assessment Institute Summer 2005

Item Discrimination Guidelines

► Item discrimination: can the item separate the men from the boys (women from the girls)?
    Can the item differentiate between low and high scorers?
► Want high item discrimination!
► Consider dropping or revising items with discriminations lower than .30
► Can be negative; if so, check the scoring key, and if the key is correct, may want to drop or revise the item
► a.k.a. rpbis or Corrected Item-Total Correlation

Page 33: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: two scatterplots of total score (TOTAL, 0 to 35) against item score (0 or 1), one for ITEM2 and one for ITEM17.]

Scatterplot of relationship between item 2 score (0 or 1) and total score, rpbis = .52
If I know your item score, I have a pretty good idea as to what your ability level or total score is.

Scatterplot of relationship between item 17 score (0 or 1) and total score, rpbis = .18
If I know your item score, I DO NOT have a pretty good idea as to what your ability level or total score is.

Page 34: Analyzing & Interpreting Data Assessment Institute Summer 2005

SPSS output for 1st 6 items of 35 item GKIN100 Test3 Spring 2005

         Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Alpha if Item Deleted
ITEM1    26.5424                      18.0047                          0.2046                             0.7772
ITEM2    26.1513                      18.1141                          0.5242                             0.7673
ITEM3    26.2435                      18.1553                          0.283                              0.7722
ITEM4    26.3579                      17.5418                          0.3783                             0.7672
ITEM5    26.4945                      17.8805                          0.2408                             0.775
ITEM6    26.524                       17.4578                          0.3418                             0.7691

Corrected Item-Total Correlation is Item Discrimination (rpbis)

Why is it called corrected item-total correlation? The “corrected” implies that the total is NOT the sum of all item scores, but the sum of item scores WITHOUT including the item in question.
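
The "corrected" total described above can be sketched as follows: for each item, correlate the 0/1 item score with the total of the remaining items. The response matrix is hypothetical (four students, three items) and the helper names are illustrative; this is not the SPSS procedure itself, only the same idea in plain Python.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; returns NaN if either variable has no variability."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def corrected_item_total(responses):
    """responses: one row per student, each row a list of 0/1 item scores."""
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        # Total score WITHOUT the item in question.
        rest_totals = [sum(row) - row[i] for row in responses]
        correlations.append(pearson_r(item_scores, rest_totals))
    return correlations

# Hypothetical scored responses.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
discrimination = corrected_item_total(responses)
print([round(r, 2) for r in discrimination])
```

Leaving the item out of the total avoids inflating the correlation, since an item always correlates with a total that includes itself.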

Page 35: Analyzing & Interpreting Data Assessment Institute Summer 2005

TOTAL * RR2

RR2            % of Total N   Mean (TOTAL)
1 (A)          .7%            15.5000
2* (B)         95.2%          27.6512
3 (C)          2.6%           17.7143
4 (D)          .7%            25.5000
9 (Missing)    .7%            2.5000
Total          100.0%         27.1033

% of Total N: percentage of sample choosing each alternative.

Mean: average total test score for students who chose each alternative.

Notice how the highest average total test score (M = 27.65) is associated with the correct alternative (B). All other means are quite a bit lower. This indicates that the item is functioning well and will discriminate.

This information is for item 2, where the item difficulty and discrimination were: p = .95, rpbis = .52

Page 36: Analyzing & Interpreting Data Assessment Institute Summer 2005

TOTAL * RR17

RR17           % of Total N   Mean (TOTAL)
1 (A)          15.1%          26.2927
2 (B)          3.3%           25.6667
3* (C)         69.7%          27.9048
4 (D)          11.1%          25.2333
9 (Missing)    .7%            2.5000
Total          100.0%         27.1033

% of Total N: percentage of sample choosing each alternative.

Mean: average total test score for students who chose each alternative.

Notice how the highest average total test score (M = 27.91) is associated with the correct alternative (C). Unlike item 2, with this item all other means are fairly close to 27.91. This indicates that the item does not discriminate as well as item 2.

This information is for item 17, where the item difficulty and discrimination were: p = .697, rpbis = .18
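
A distractor analysis like the two tables above groups students by the alternative they chose and reports, for each alternative, the percent choosing it and the mean total score. The sketch below shows one way to build such a table; the chosen alternatives and total scores are hypothetical, and the function name is illustrative rather than an SPSS feature.

```python
from collections import defaultdict

def distractor_table(choices, totals):
    """Map each alternative to (percent choosing it, mean total score)."""
    groups = defaultdict(list)
    for choice, total in zip(choices, totals):
        groups[choice].append(total)
    n = len(choices)
    return {choice: (100 * len(scores) / n, sum(scores) / len(scores))
            for choice, scores in sorted(groups.items())}

# Hypothetical data: alternative chosen on one item and each student's total score.
choices = ["A", "B", "B", "C", "B", "D"]
totals = [20, 30, 28, 18, 29, 22]

table = distractor_table(choices, totals)
for choice, (percent, mean_total) in table.items():
    print(choice, round(percent, 1), round(mean_total, 2))
```

If the keyed alternative has the highest mean total score and the distractors have clearly lower means, the item is behaving as intended.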

Page 37: Analyzing & Interpreting Data Assessment Institute Summer 2005

► Took information from SPSS distractor analysis output and put it in the following graph.
► Did this mainly for those items that were difficult (p < .50) or had low discrimination (rpbis < .20)

[Figure: Item 4. Chart with two series, “% choosing response” (0% to 100%) and “avg. score for those choosing response” (Average Total Score, 0 to 35), for alternatives A, B, C*, D. Data labels as extracted: 16%, 3%, 6%, 75%; 22.65, 28.28, 25.14, 24.32.]

4. The DSHEA of 1994 has:
    A. labeled certain drugs illegal based on their active ingredient.
    B. caused health food companies to lose significant business.
    C. made it easier for fraudulent products to stay on the market.
    D. caused an increase in the cost of many dietary supplements.

Page 38: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: Item 31. Chart with two series, “% choosing response” (0% to 100%) and “avg. score for those choosing response” (Average Total Score, 0 to 35), for alternatives A, B*, C, D. Data labels as extracted: 17%, 27%, 42%, 13%; 29.28, 26.49, 25.47, 26.96.]

Hard item, but the pattern of means indicates it is not problematic.

31. _____ aging relates to lifestyle.
    A. Time-dependent
    B. Acquired
    C. Physical
    D. Mental

Page 39: Analyzing & Interpreting Data Assessment Institute Summer 2005

[Figure: Item 10. Chart with two series, “% choosing response” (0% to 100%) and “avg. score for those choosing response” (Average Total Score, 0 to 35), for alternatives A, B*, C, D. Data labels as extracted: 1%, 56%, 39%, 2%; 27.67, 27.19, 21.17, 18.50.]

This item may be problematic: students choosing "C" are scoring almost as high on the test overall as those choosing "B".

10. Chris MUST get a beer during the commercials each time he watches the NFL. Which stage of addiction does this demonstrate?
    a) Exposure
    b) Compulsion
    c) Loss of control
    d) This is not an example of addiction.

Page 40: Analyzing & Interpreting Data Assessment Institute Summer 2005

Other Information from SPSS

► Descriptive statistics for total score

    Statistics for   Mean      Variance   Std Dev   N of Variables
    SCALE            27.1033   19.1152    4.3721    35

    Mean: average total score
    Std Dev: average # of points by which total scores vary from the mean
    N of Variables: # of items on the test

► A measure of the internal consistency reliability for your test, called coefficient alpha

    Alpha = .7779
    Ranges from 0 to 1, with higher values indicating higher reliability. Want it to be > .60

Page 41: Analyzing & Interpreting Data Assessment Institute Summer 2005

Test Score Reliability

► Reliability defined: extent or degree to which a scale/test consistently measures a person
► Need a test/scale to be reliable in order to trust the test scores! If I administered a test to you today, wiped out your memory, and administered it again to you tomorrow, you should receive the same score on both administrations!
► How much would you trust a bathroom scale if you consecutively weighed yourself 4 times and obtained weights of 145, 149, 142, 150?

Page 42: Analyzing & Interpreting Data Assessment Institute Summer 2005

Internal Consistency Reliability

► Internal consistency reliability: extent to which items on a test are highly intercorrelated
► SPSS reports Cronbach’s coefficient alpha
► Alpha may be low if:
    Test is short
    Items are measuring very different things (several different content areas or dimensions)
    Low variability in your total scores or small range of ability in the sample you are testing
    Test only contains either very easy items or very hard items
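
Coefficient alpha can be computed from the item variances and the variance of the total scores using the standard formula alpha = k / (k - 1) × (1 - sum of item variances / total score variance), where k is the number of items. The sketch below applies that formula to a hypothetical response matrix (four students, three items); it illustrates the calculation and is not the SPSS output for the GKIN100 test.

```python
import statistics

def cronbach_alpha(responses):
    """responses: one row per student, each row a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across students.
    item_variances = [statistics.variance([row[i] for row in responses])
                      for i in range(k)]
    # Sample variance of the students' total scores.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical scored responses.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The formula shows why the conditions listed above lower alpha: fewer items shrink k, and low total score variability shrinks the denominator of the ratio.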