Developing Instruments for Research
Carlo Magno, PhD
Lasallian Institute for Development and Educational Research
College of Education, De La Salle University, Manila
Activity 1: Assessment Schema Check-up
Answer the following questions as a group. Your answers should reflect your current practices in assessing your students. Write your answers on a piece of paper.
1. List the things that you do when preparing to write your test items. (procedure)
2. What are the things that you consider when writing your test items? (concepts)
3. What further steps do you take after you have scored and recorded the test papers? (procedure)
4. What other forms of assessment do you conduct aside from paper-and-pencil tests?
1. List the things that you do when preparing to write your test items.
• Prepare a Table of Specifications (TOS)
• Use the taxonomy of cognitive skills (Bloom’s taxonomy)
• Conduct an item review
2. What are the things that you consider when writing your test items?
• Learning objectives
• Curriculum/national standards
• Needs of students
• Higher-order thinking skills
• Test length
• Test instruction
• Test layout
• Scoring
3. What further steps do you take after you have scored and recorded the test papers?
• Item analysis
• Item difficulty
• Item discrimination
• Distracter analysis
• Reliability analysis
• Validity analysis
4. What other forms of assessment do you conduct aside from paper-and-pencil tests?
• Alternative forms of assessment
• Performance-based assessment
• Authentic assessment
• Portfolio assessment
Types of Measures
Non-cognitive measures
• Attitude
• Beliefs
• Interests
• Values
• Dispositions
Cognitive measures
• Tests (right and wrong answers)
Steps in Constructing Cognitive Measures
Decide what information should be sought. A new measure is developed when:
(1) No instruments are available to measure the construct.
(2) All available tests are foreign and not suitable for the stakeholders or sample that will take the measure.
(3) Existing measures are not appropriate for the purpose of the assessment.
(4) The test developer intends to explore the underlying factors of a construct and eventually confirm them.
Search for the content domain:
• Search for relevant literature reviews
• Look for the appropriate definition
• Explain the theory
• Specify the underlying variables (deconstruction)
[Figure: Unidimensional (one-factor) model, a single Factor with Subscales 1 to 5]
[Figure: Multidimensional (two-factor) model, Factor 1 and Factor 2, each with Subscales 1 to 5]
Write the first draft of items:
• Items are created for each subscale as guided by the conceptual definition.
• The number of items planned in the Table of Specifications is also considered.
• As much as possible, a large number of items are written to represent well the behavior being measured.
How to write items:
• Items are based on the definition of the subscales
• Provide the manifestation of the construct
• Descriptions from references
• Ask experts to write sample items
Content Outline                      No. of items
1. Table of specifications           10
2. Test and item characteristics     20
3. Test layout                       5
4. Test instructions                 5
5. Reproducing the test              5
6. Test length                       5
7. Scoring the test                  5
TOTAL                                55
One-grid Table of Specifications
             Cognitive Domain
Content      Knowledge    Comprehension    Application
I.
II.
III.
Two-grid Table of Specifications
Weight         Content Outline                    Knowledge   Comprehension   Application   No. of items
(Time Frame)                                      30%         40%             30%           by content area
35%            1. Table of specifications         1           4               4             9
30%            2. Test and item characteristics   2           3               3             8
10%            3. Test layout                     1           1               0             2
5%             4. Test instructions               0           1               0             1
5%             5. Reproducing the test            1           0               0             1
5%             6. Test length                     1           0               1             2
10%            7. Scoring the test                2           1               0             3
               TOTAL                              8           10              8             26
Three-grid Table of Specifications
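One way to turn content weights into item counts is sketched below. It assumes the largest-remainder rounding rule and a hypothetical helper name, allocate_items; neither comes from the slides. The counts it gives can differ slightly from a table that was adjusted by hand, as the table above appears to have been.

```python
def allocate_items(weights_pct, total_items):
    """Split total_items across content areas in proportion to integer
    percentage weights, using the largest-remainder method (an assumed rule)."""
    if sum(weights_pct) != 100:
        raise ValueError("weights must sum to 100")
    # divmod on integers gives (whole items, remainder) without float rounding.
    quotas = [divmod(w * total_items, 100) for w in weights_pct]
    counts = [whole for whole, _ in quotas]
    leftover = total_items - sum(counts)
    # Hand the leftover items to the areas with the largest remainders.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Weights of the seven content areas, for a 26-item test.
weights = [35, 30, 10, 5, 5, 5, 10]
items_per_area = allocate_items(weights, 26)
print(items_per_area)
```

The same function can be applied a second time within each content area to split its items across the Knowledge, Comprehension, and Application columns.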
Good questionnaire items should:
1. Include a vocabulary that is simple, direct, and familiar to all respondents
2. Be clear and specific
3. Not involve leading, loaded, or double-barreled questions
4. Be as short as possible
5. Include all conditional information prior to the key ideas
6. Be edited for readability
7. Be generalizable to a large sample
8. Avoid time-bound situations
Examples of bad items:
• I am satisfied with my wages and hours at the place where I work. (Double Barreled)
• I am not in favor of Congress passing a law not allowing any employer to force any employee to retire at any age. (Double Negative)
• Most people favor the death penalty. What do you think? (Leading Question)
Select a response format:
• After writing the items, the test developer decides on the appropriate response format to be used in the scale.
• The most common response formats used: binary type, multiple choice, short answer, essay.
Develop directions for responding:
• Directions or instructions for the target respondents should be created as early as when the items are created.
• Directions should be clear and concise.
• Respondents should be informed how to answer.
• When you intend to have a separate answer sheet, make sure to inform the respondents about it in the instructions.
• Instructions should also include ways of changing answers and how to answer (encircle, check, or shade).
• Inform the respondents in the instructions specifically what they need to do.
Conduct a judgmental review of items:
• Have experts review your items.
Reexamine and revise the questionnaire.
Prepare a draft and gather preliminary pilot data:
• Requires a layout of the test for the respondents.
• Make the scale as easy as possible to use.
• Each item can be identified with a number or a letter to facilitate scoring of responses later.
• The items should be structured for readability and recording responses.
• Whenever possible, items with the same response formats are placed together.
• Make the layout visually appealing to increase the response rate.
• The test should be self-explanatory, and the respondents should be able to complete it in a short time.
• Ordering of items: The first few questions set the tone for the rest of the items and determine how willingly and conscientiously respondents will work on subsequent questions.
Analyze pilot data:
• The responses in the test should be recorded using a spreadsheet.
• The numerical responses are then analyzed.
• The analysis consists of determining whether the test is reliable and valid.
Revise the instrument:
• The instrument is then revised by removing items with low factor loadings (not significant in CFA).
• Items whose removal will increase Cronbach’s alpha are also removed.
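The "alpha if item deleted" check used when revising the instrument can be sketched as follows. This is a minimal illustration, not the author's procedure: the data are made up (item 4 is deliberately unrelated to the others), and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


# Made-up pilot data: 5 respondents x 4 items; item 4 does not follow the others.
pilot = [
    [1, 1, 1, 3],
    [2, 2, 2, 1],
    [3, 3, 3, 2],
    [4, 4, 4, 3],
    [5, 5, 5, 1],
]
full_alpha = cronbach_alpha(pilot)
deleted = alpha_if_item_deleted(pilot)
for number, value in enumerate(deleted, start=1):
    flag = "consider removing" if value > full_alpha else "keep"
    print(f"item {number}: alpha if deleted = {value:.2f} ({flag})")
```

An item is a candidate for removal when the alpha computed without it is higher than the alpha of the full scale.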
Gather final pilot data:
• A large sample is again selected, with a size of three times the number of items.
Conduct additional validity and reliability analysis:
• Validity and reliability are again analyzed using the new pilot data.
Edit the test and specify the procedures for its use:
• Items with low factor loadings are again removed, resulting in fewer items.
• A new form of the test with the reduced set of items is formed.
Prepare the test manual:
• The test manual indicates the purpose of the test, instructions for administering it, the procedure for scoring, and how to interpret the scores, including the norms.
Test Length
The test must be of sufficient length to yield reliable scores.
The longer the test, the more reliable the results. This also bears on the validity of the test, because a test must be reliable in order to be valid.
For the grade school, one must consider the stamina and attention span of the pupils.
The test should be long enough to be adequately reliable and short enough to be practical to administer.
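The link between test length and reliability is commonly described with the Spearman-Brown prophecy formula. The sketch below assumes that formula applies, meaning the added items are comparable to the existing ones. The starting reliability of .40 is only an illustrative value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times as long,
    assuming the added items are comparable to the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Predicted reliability of a test with reliability .40 as it is lengthened.
for factor in (1, 2, 3, 6):
    print(f"{factor}x as long: {spearman_brown(0.40, factor):.2f}")
```

Each extra block of items adds less reliability than the one before, which is why length has to be weighed against the examinees' stamina and the time available.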
Test Instruction
It is the function of the test instructions to furnish the learning experiences needed to enable each examinee to understand clearly what he or she is being asked to do.
Instructions may be oral; a combination of written and oral instructions is probably desirable, except with very young children.
Instructions should be clear, concise, and specific.
Test layout
The arrangement of the test items influences the speed and accuracy of the examinee.
• Utilize the space available while retaining readability.
• Items of the same type should be grouped together.
• Arrange test items from easiest to most difficult as a means of reducing test anxiety.
• The test should be ordered first by type, then by content.
• Each item should be completed in the column and page in which it is started.
• If reference material is needed, it should occur on the same page as the item.
• If you are using numbers to identify items, it is better to use letters for the options.
Properties of a Test
• Reliability
• Validity
• Item discrimination/Item difficulty
Reliability
Consistency of scores obtained by the same person when retested with the identical test or with an equivalent form of the test
Test-Retest Reliability
Repeating the identical test on a second occasion
• Temporal stability
• Used when variables are stable, e.g., motor coordination, finger dexterity, aptitude, capacity to learn
• Correlate the scores from the first test and the second test.
• The higher the correlation, the more reliable the test
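Correlating the first and second administrations can be sketched with a hand-written Pearson correlation. The scores below are made up for five examinees, and pearson_r is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores of the same five examinees on two occasions.
first_test = [10, 12, 15, 18, 20]
second_test = [11, 12, 16, 17, 21]
test_retest_r = pearson_r(first_test, second_test)
print(f"test-retest reliability: {test_retest_r:.2f}")
```

The same computation applies to alternate forms (scores on form A against scores on form B) and to scorer reliability (rater 1 against rater 2).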
Alternate Form/Parallel Form
The same person is tested with one form on the first occasion and with another equivalent form on the second
• Equivalence; temporal stability and consistency of response
• Used for personality and mental ability tests
• Correlate scores on the first form and scores on the second form
Split half
Two scores are obtained for each person by dividing the test into equivalent halves
• Internal consistency; homogeneity of items
• Used for personality and mental ability tests
• The test should have many items
• Correlate scores of the odd- and even-numbered items
• Convert the obtained correlation coefficient into a reliability estimate using the Spearman-Brown formula
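The odd-even procedure can be sketched as below, using made-up right/wrong data for five examinees on six items. The function names are hypothetical, and the correction applied is the usual Spearman-Brown step-up for a test twice as long as each half.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(rows):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd_scores = [sum(r[0::2]) for r in rows]    # items 1, 3, 5, ...
    even_scores = [sum(r[1::2]) for r in rows]   # items 2, 4, 6, ...
    r_half = pearson_r(odd_scores, even_scores)
    return 2 * r_half / (1 + r_half)


# Made-up data: 5 examinees x 6 items, 1 = correct, 0 = wrong.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(answers)
print(f"split-half reliability: {reliability:.2f}")
```

The correction is needed because the raw correlation describes a test only half as long as the one actually administered.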
Kuder-Richardson (KR #20/KR #21)
Used when computing reliability for binary (e.g., true/false) items
• Consistency of responses to all items
• Used if there is a correct answer (right or wrong)
• Use the KR #20 or KR #21 formula
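The two formulas can be sketched as follows for right/wrong items. The data are made up, the function names are hypothetical, and the population variance of the total scores is assumed here; some texts use the sample variance instead.

```python
from statistics import pvariance


def kr20(rows):
    """KR-20: uses the proportion passing each item (p) and failing it (q)."""
    k, n = len(rows[0]), len(rows)
    p = [sum(r[j] for r in rows) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_pq / total_var)


def kr21(rows):
    """KR-21: a shortcut that needs only the mean and variance of total scores
    and assumes all items are equally difficult."""
    k = len(rows[0])
    totals = [sum(r) for r in rows]
    mean = sum(totals) / len(totals)
    return k / (k - 1) * (1 - mean * (k - mean) / (k * pvariance(totals)))


# Made-up data: 5 examinees x 6 items, 1 = correct, 0 = wrong.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
kr20_value = kr20(answers)
kr21_value = kr21(answers)
print(f"KR-20 = {kr20_value:.2f}, KR-21 = {kr21_value:.2f}")
```

KR-21 is never higher than KR-20; the two agree only when every item has the same difficulty.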
Coefficient Alpha
The reliability that would result if all values for each item were standardized (z transformed)
• Consistency of responses to all items
• Homogeneity of items
• Used for personality tests with multiple-scored items
• Use the Cronbach’s alpha formula
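For reference, the commonly used (raw-score) form of Cronbach's alpha is written out below. The symbols are notation chosen here, not taken from the slides: k is the number of items, s_i^2 the variance of item i, and s_X^2 the variance of the total scores.

```latex
\[
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_X^{2}}\right)
\]
```

When items are scored right or wrong, s_i^2 becomes p_i q_i and the expression reduces to KR #20.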
Inter-item reliability
• Consistency of responses to all items
• Homogeneity of items
• Used for personality tests with multiple-scored items
• Each item is correlated with every other item in the test
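Correlating each item with every other item can be sketched as below. The ratings are made up (five respondents, three items), and summarizing the pairs by their mean is one common choice, assumed here rather than stated in the slides.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def inter_item_correlations(rows):
    """Correlation for every pair of items, keyed by 1-based item numbers."""
    columns = list(zip(*rows))  # one tuple of responses per item
    return {
        (i + 1, j + 1): pearson_r(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Made-up ratings: 5 respondents x 3 items on a 5-point scale.
ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
pairs = inter_item_correlations(ratings)
mean_r = sum(pairs.values()) / len(pairs)
for (a, b), r in pairs.items():
    print(f"item {a} with item {b}: r = {r:.2f}")
print(f"mean inter-item correlation: {mean_r:.2f}")
```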
Scorer Reliability
Having a sample of test papers independently scored by two examiners
• Used to decrease examiner or scorer variance
• Used for clinical instruments employed in intensive individual tests, e.g., projective tests
• The two scores obtained from the two raters are correlated with each other
Validity
Degree to which the test actually measures what it purports to measure
Content Validity
Systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.
More appropriate for achievement tests & teacher made tests
Items are based on instructional objectives, course syllabi & textbooks
Consultation with experts
Making test specifications
Criterion-Prediction Validity
Prediction from the test to any criterion situation over a time interval
Used for hiring job applicants, selecting students for admission to college, and assigning military personnel to occupational training programs
Test scores are correlated with other criterion measures, e.g., mechanical aptitude and job performance as a machinist
Concurrent validity
Tests are administered to a group on whom criterion data are already available
Diagnosing existing status, e.g., college entrance exam scores of students compared with their average grade for their senior year.
Correlate the test score with the other existing measure
Construct Validity
The extent to which the test may be said to measure a theoretical construct or trait.
Used for personality tests and for measures that are multidimensional.
• Correlate a new test with a similar earlier test that measures approximately the same general behavior
• Factor analysis
• Comparison of the upper and lower group
• Point-biserial correlation (pass and fail with total test score)
• Correlate each subtest with the entire test
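The point-biserial correlation between passing an item and the total test score can be sketched as follows. The scores are made up, the function name is hypothetical, and the total score here still includes the item itself (some analysts remove it first).

```python
from math import sqrt
from statistics import pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a pass/fail item (1/0) and total score."""
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(passed) / len(item_scores)   # proportion passing the item
    q = 1 - p                            # proportion failing the item
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / pstdev(total_scores) * sqrt(p * q)


# Made-up data: pass/fail on one item and total scores of five examinees.
item = [1, 1, 0, 0, 0]
totals = [6, 5, 3, 1, 0]
r_pb = point_biserial(item, totals)
print(f"point-biserial r = {r_pb:.2f}")
```

A high positive value means that examinees who passed the item also tended to score high on the whole test.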
Convergent Validity
The test should correlate significantly with variables it is related to
• Commonly used for personality measures
• Multitrait-multimethod matrix
Divergent Validity
The test should not correlate significantly with variables from which it should differ
• Commonly used for personality measures
• Multitrait-multimethod matrix
Item Analysis
Item Difficulty – The percentage of respondents who answered an item correctly
Item Discrimination – Degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure.
Difficulty Index
Difficulty Index     Remark
.76 or higher        Easy item
.25 to .75           Average item
.24 or lower         Difficult item
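Computing the difficulty index and reading it against the table above can be sketched as follows. The answers are made up, the function names are hypothetical, and rounding the index to two decimals before classifying is an assumption made so that every value falls in one of the three bands.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def classify_difficulty(p):
    """Remark from the difficulty table, after rounding to two decimals."""
    p = round(p, 2)
    if p >= 0.76:
        return "Easy item"
    if p <= 0.24:
        return "Difficult item"
    return "Average item"


# Made-up data: 5 examinees x 6 items, 1 = correct, 0 = wrong.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
remarks = []
for number, column in enumerate(zip(*answers), start=1):
    p = difficulty_index(column)
    remarks.append(classify_difficulty(p))
    print(f"item {number}: p = {p:.2f} -> {remarks[-1]}")
```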
Discrimination Index     Remark
.40 and above            Very good item
.30 - .39                Good item
.20 - .29                Reasonably good item
.10 - .19                Marginal item
Below .10                Poor item
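One common way to obtain the discrimination index is the upper-lower group method: the proportion of the top scorers who got the item right minus the proportion of the bottom scorers who did. The sketch below assumes that method with groups of 27% each, a conventional cut-off that does not come from the slides. The data and function names are made up.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower."""
    n_group = max(1, round(fraction * len(total_scores)))
    # Rank examinees from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:n_group], order[-n_group:]
    p_upper = sum(item_scores[i] for i in upper) / n_group
    p_lower = sum(item_scores[i] for i in lower) / n_group
    return p_upper - p_lower


def classify_discrimination(d):
    """Remark from the discrimination table."""
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Good item"
    if d >= 0.20:
        return "Reasonably good item"
    if d >= 0.10:
        return "Marginal item"
    return "Poor item"


# Made-up data for ten examinees: total scores and 1/0 answers on one item.
totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
d = discrimination_index(item, totals)
print(f"D = {d:.2f} -> {classify_discrimination(d)}")
```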
Writing multiple choice items
Multiple Choice
1. Salvador Dali is
a. a famous Indian.
b. important in international law.
c. known for his surrealistic art.
d. the author of many avant-garde plays.
Why is the item faulty?
• It is recommended that the stem be a direct question.
• The stem should pose a clear, definite, explicit, and singular problem.
IMPROVED: With which one of the fine arts is Salvador Dali associated?
a. surrealistic painting
b. avant-garde theatre
c. polytonal symphonic music
d. impressionistic poetry
2. Milk can be pasteurized at home by
a. heating it to a temperature of 130°
b. heating it to a temperature of 145°
c. heating it to a temperature of 160°
d. heating it to a temperature of 175°
Why is the item faulty?
• Include in the stem any words that might otherwise be repeated in each response.
IMPROVED: The minimum temperature that can be used to pasteurize milk at home is:
a. 130°
b. 145°
c. 160°
d. 175°
3. Although the experimental research, particularly that by Hansmocker, must be considered equivocal and assumptions viewed as too restrictive, most testing experts would recommend as the easiest method of significantly improving paper-and-pencil achievement test reliability to
a. increase the size of the group being tested.
b. increase the differential weighting of items.
c. increase the objectivity of scoring.
d. increase the number of items.
e. increase the amount of testing time.
Why is the item faulty?
• Items should be stated simply and understandably, excluding all nonfunctional words from stem and alternatives.
IMPROVED: Assume a 10-item, 10-minute paper-and-pencil multiple choice achievement test has a reliability of .40. The easiest way of increasing the reliability to .80 would be to increase
a. group size
b. scoring objectivity
c. differential item scoring weights
d. the number of items
e. testing time
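The numbers in the improved item can be checked with the Spearman-Brown formula, assuming it applies (added items comparable to the original ten). The symbols are notation chosen here: r is the current reliability, r_new the target, and n the factor by which the test is lengthened.

```latex
\[
r_{\text{new}} = \frac{n\,r}{1 + (n-1)\,r}
\quad\Longrightarrow\quad
n = \frac{r_{\text{new}}\,(1 - r)}{r\,(1 - r_{\text{new}})}
  = \frac{0.80 \times 0.60}{0.40 \times 0.20} = 6
\]
```

Under that assumption the test would need to be about six times as long, roughly 60 items.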
4. None of the following cities is a state capital except
a. Bangor
b. Los Angeles
c. Denver
d. New Haven
Why is the item faulty?
• Avoid negatively stated items.
IMPROVED: Which of the following cities is a state capital?
a. Bangor
b. Los Angeles
c. Denver
d. New Haven
5. Who wrote Harry Potter and the Goblet of Fire?
a. J. K. Rowling
b. Manny Paquiao
c. Lea Salonga
d. Mark Twain
Why is the item faulty?
• If possible, the alternatives should be presented in some logical, numerical, or systematic order.
• Response alternatives should be mutually exclusive.
IMPROVED: Who wrote Harry Potter and the Goblet of Fire?
a. J. K. Rowling
b. J. R. R. Tolkien
c. V. Hugo
d. L. Carroll
6. Which of the following statements makes clear the meaning of the word “electron”?
a. An electronic tool
b. Neutral particles
c. Negative particles
d. A voting machine
e. The nuclei of atoms
Why is the item faulty?
• Make all responses plausible and attractive to the less knowledgeable and skillful student.
IMPROVED: Which of the following phrases is a description of an “electron”?
a. Neutral particle
b. Negative particle
c. Neutralized proton
d. Radiated particle
e. Atom nucleus
7. What is the area of a right triangle whose sides adjacent to the right angle are 4 inches and 3 inches long respectively?
a. 7
b. 12
c. 25
d. None of the above
Why is the item faulty?
• The response alternative “None of the above” should be used with caution, if at all.
IMPROVED: What is the area of a right triangle whose sides adjacent to the right angle are 4 inches and 3 inches respectively?
a. 6 sq. inches
b. 7 sq. inches
c. 12 sq. inches
d. 25 sq. inches
e. None of the above
8. As compared with the American factory worker in the early part of the 19th century, the American factory worker at the close of the century
a. was working long hours
b. received greater social security benefits
c. was to receive lower money wages
d. was less likely to belong to a labor union.
e. became less likely to have personal contact with employers
Why is the item faulty?
• Make options grammatically parallel to each other and consistent with the stem.
IMPROVED: As compared with the American factory worker in the early part of the 19th century, the American factory worker at the close of the century
a. worked longer hours.
b. had more social security.
c. received lower money wages.
d. was less likely to belong to a labor union.
e. had less personal contact with his employer.
9. The “standard error of estimate” refers to
a. the objectivity of scoring.
b. the percentage of reduced error variance.
c. an absolute amount of possible error.
d. the amount of error in estimating criterion scores.
Why is the item faulty?
• Avoid such irrelevant cues as “common elements” and “pat verbal associations.”
IMPROVED: The “standard error of estimate” is most directly related to which of the following test characteristics?
a. Objectivity
b. Reliability
c. Validity
d. Usability
e. Specificity
10. What name is given to the group of complex organic compounds that occur in small quantities in natural foods and are essential to normal nutrition?
a. Calories
b. Minerals
c. Nutrients
d. Vitamins
Why is the item faulty?
• In testing for understanding of a term or concept, it is generally preferable to present the term in the stem and alternative definitions in the options.
IMPROVED: Which of the following statements is the best description of a vitamin?
Activity 2: Workshop
Work with your group and write the items for your test.
1 hour