ASSESSMENT
ADIBAH BINTI ABDUL LATIF
SCHOOL OF EDUCATION, FACULTY OF SOCIAL SCIENCES AND HUMANITIES
TERMINOLOGY
Testing
Measurement
Evaluation
Assessment
TESTING
• A tool to determine a student’s ability to complete specific tasks or demonstrate mastery of a skill or knowledge of content
• The most critical basis for ensuring the validity of the interpretation of students’ scores
• Examples: Q&A session in class, assignment, performance task, test, quiz and final exam
MEASUREMENT
• A systematic process of assigning numerals (quantitative) to the test administered.
• It can be in raw scores, percentiles, standard scores, etc.
• Examples: assignment marks, total score in a final exam, mean of PLO, KPI score, mean of e-LPPT, ranking score.
12 August 2020, SPP2032: Educational Measurement and Evaluation
MEASUREMENT
[Figure: observations O1, O2, ..., On mapped to values x1, x2, ..., xn; labels: Ability, Score]
EVALUATION
• The process of describing, obtaining, and providing useful information for judging decision alternatives. This process allows one to make a judgment about the desirability or value of something.
• Examples: ABCD; Pass and Fail; HL, TM, MM; description of the value (Good, Excellent, Average); description of P1 to P5 in e-LPPT.
ASSESSMENT
• The process of gathering information to monitor and reflect on progress in learning and teaching, and to make educational decisions if necessary.
• Example: Dr A finds that her students are excellent in formative assessment but cannot perform well in their final exam. What should she do?
• Example: Dr B realises that there are two groups of ability/skill levels in his class. What can he do to make sure the learning environment supports students’ learning?
WHAT IS ASSESSMENT?
• The word “assess” comes from the Latin verb “assidere” meaning ‘to sit with’
• In assessment, one is supposed to sit with the learner. This implies it is something we do ‘with’ and ‘for’ students and not ‘to’ students.
“Assessment is at the heart of student experience” Brown and Knight (1994)
“If you want to change student learning then change the method of assessment”
Brown, Bull & Pendlebury (1997)
ASSESSMENT
FORMATIVE | SUMMATIVE
TRADITIONAL | ALTERNATIVE
OFFLINE | ONLINE
Continuously | At the end of lesson
Synchronous
Asynchronous
Online Examination
Asynchronous
Traditional
Restricted Responses
Extended Responses
Alternative
Performance-Based
Synchronous Traditional
Manual Invigilation
Online Proctoring
Without Invigilation and Proctoring
Types of Assessment (Function)
• Assessment FOR Learning: continuous assessment; learning evidence (process); give feedback to students on how to improve.
• Assessment OF Learning: at the end of learning; learning evidence (product); give judgement on students’ achievement.
• Assessment AS Learning: students monitor their own learning and use self-assessment.
Types of Assessment (Time)
ASSESSMENT
CONTINUOUS | AT THE END
• Placement: pre-test
• Formative: assignment, exercise, discussion
• Diagnostic: tutorial, remedial
• Summative: final exam, viva, mid-term exam, one-attempt quiz
Types of Assessment (Interpretation)
• Norm-Referenced Test: compare with the norm
• Criterion-Referenced Test: compare with the standard
ASSESSMENT IN OBE
Identify outcomes
Determine assessment
Learning activities
Curriculum
PLO-CLO-ULO | STRATEGIES / METHODS | TOOLS
CONSTRUCTIVE RELEVANCY
Distribution of Marks
Table 1: Subjects Without Practical Components
Percentage | Parts Assessed
10-20% | Soft skills (e.g. communication, teamwork, problem solving, responsibility)
40-60% | Academic coursework (tests, quizzes, assignments, papers)
30-40% | Final examination
Distribution of Marks
Table 2: Subjects With Practical Components
Percentage | Parts Assessed
10-20% | Soft skills (e.g. discipline, teamwork, problem solving, ethics)
80-90% | Practical knowledge and skills
Differences between Assessment and Evaluation
Dimension of Difference | Assessment | Evaluation
Timing | Formative | Summative
Focus of Measurement | Process-Oriented | Product-Oriented
Relationship Between Administrator and Recipient | Reflective | Prescriptive
Findings, Uses Thereof | Diagnostic | Judgmental
Ongoing Modifiability of Criteria, Measures Thereof | Flexible | Fixed
Standards of Measurement | Absolute | Comparative
Relation Between Objects of A/E | Cooperative | Competitive
PRINCIPLES OF ASSESSMENT
• Assessment should be well aligned with the educational learning outcomes.
• Assessment should be valid and reliable.
• Formative assessments need to scaffold students towards the summative assessment.
• Students should receive feedback on their work in a timely manner.
• Assessment should be inclusive and equitable for all students.
• Assessment is not used to threaten or intimidate students.
• Assessment should support students’ mastery learning.
VALIDITY
• Measuring what should be measured
• The appropriateness of the interpretations made from test scores and other evaluation results with regard to a particular use.
CHARACTERISTICS OF A GOOD TEST
CONTENT VALIDITY
• Most closely related to achievement tests
• The test represents the topics and cognitive processes in the syllabus.
• Does it measure the learning objectives (cognitive / affective / psychomotor)?
• Table of specifications: quiz, test, exam
• Construct formation and operational definition: rubric
• Subject matter expert: for both
TABLE OF SPECIFICATIONS
ACTIVITY 1
When is a Table of Specifications (Jadual Spesifikasi Item, JSI) constructed?
INTRODUCTION
Kubiszyn & Borich (2003) emphasized the following significance and components of a TOS:
1. A Table of Specifications consists of a two-way chart or grid relating instructional objectives to the instructional content.
The columns of the chart list the objectives or "levels of skills" (Gredler, 1999) to be addressed; the rows list the key concepts or content the test is to measure.
PURPOSE OF THE TOS
• Ensures content validity.
• Ensures a fair, representative sample of items.
• Focuses the test on the important content.
• Determines the weighting / time to be allocated in lectures.
PURPOSE OF THE TOS
• The TOS also helps lecturers as a planning guide for deciding which topics are more important, how much time a given topic needs, and which assignments / projects can help students learn the topic more meaningfully.
PURPOSE OF THE TOS
According to Bloom et al. (1971), "We have found it useful to represent the relation of content and behaviors in the form of a two-dimensional table with the objectives on one axis, the content on the other."
PURPOSE OF THE TOS
2. A Table of Specifications identifies not only the content areas covered in class; it also identifies the performance objectives at each level of the cognitive domain of Bloom's Taxonomy.
Teachers can be assured that they are measuring students' learning across a wide range of content and readings as well as cognitive processes requiring higher-order thinking.
PURPOSE OF THE TOS
3. A Table of Specifications is developed before the test is written. In fact it should be constructed before the actual teaching begins.
PURPOSE OF THE TOS
The cornerstone of classroom assessment practices is the validity of the judgments about students’ learning and knowledge.
A TOS is one tool that teachers can use to support their professional judgment when creating or selecting tests for use with their students.
PURPOSE OF THE TOS
In order to understand how best to modify a TOS to meet your needs, it is important to understand the goal of this strategy: improving the validity of a teacher’s evaluations based on a given assessment. Validity is the degree to which the evaluations or judgments we make as teachers about our students can be trusted, based on the quality of the evidence we gathered (Wolming & Wikström, 2010).
PURPOSE OF THE TOS
A Table of Specifications helps to ensure that there is a match between what is taught and what is tested. Classroom assessment should be driven by classroom teaching, which itself is driven by course goals and objectives.
Tables of Specifications provide the link between teaching and testing. (University of Kansas, 2013)
FORMULA
Formula A
Relative weight for the importance of content =
(number of TLOs OR class periods for one topic ÷ TOTAL number of TLOs OR class periods) × 100%
Example: (3/10) × 100 = 30%
Content | TLO / Hours spent | Relative weight of the subject
Topic 1 | 3 | 30%
Topic 2 | 1 | 10%
Topic 3 | 1 | 10%
Topic 4 | 2 | 20%
Topic 5 | 1 | 10%
Topic 6 | 2 | 20%
Total TLO / class periods for teaching the unit | 10 | 100%
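Formula A can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `relative_weights` is hypothetical, and the hours per topic are copied from the table above.

```python
# Hours (or TLOs) spent on each topic, as listed in the Formula A table.
hours_per_topic = {
    "Topic 1": 3, "Topic 2": 1, "Topic 3": 1,
    "Topic 4": 2, "Topic 5": 1, "Topic 6": 2,
}

def relative_weights(hours):
    """Formula A: each topic's share of the total teaching time, in percent."""
    total = sum(hours.values())
    return {topic: h * 100 / total for topic, h in hours.items()}

weights = relative_weights(hours_per_topic)
for topic, pct in weights.items():
    print(f"{topic}: {pct:.0f}%")
```

The weights should always add up to 100%, which is a quick sanity check before moving on to the item counts.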
FORMULA
Formula B
Relative weight for the item =
(% of weight in each Bloom level × total number of items in the test)
Example: 0.3 × 20 = 6
Objectives | Knowledge and Comprehension | Application and Analysis | Evaluation and Synthesis | Totals
Weight | 30% | 50% | 20% | 100%
Weight for item | 6 | 10 | 4 | 20
Topics (rows, not yet filled in): Topic 1 (30%), Topic 2 (10%), Topic 3 (10%), Topic 4 (20%), Topic 5 (10%), Topic 6 (20%)
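The same calculation for Formula B, as a hedged sketch: the Bloom-level weights and the 20-item test length come from the slide, while the function name `items_per_level` is an assumption made for illustration.

```python
# Bloom-level weights in percent, as given on the Formula B slide.
bloom_weights_pct = {
    "Knowledge and Comprehension": 30,
    "Application and Analysis": 50,
    "Evaluation and Synthesis": 20,
}
total_items = 20  # total number of items in the test

def items_per_level(weights_pct, n_items):
    """Formula B: number of items allotted to each Bloom level."""
    return {level: pct * n_items / 100 for level, pct in weights_pct.items()}

items = items_per_level(bloom_weights_pct, total_items)
for level, count in items.items():
    print(f"{level}: {count:.0f} items")
```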
FORMULA
Formula C
Number of questions in each topic for each level of objectives =
(total number of test items × relative weight of the topic × relative weight of the Bloom level)
Example: 20 × 0.3 × 0.3 = 1.8
Each cell shows the calculated value, with the rounded number of questions in brackets.
Topics | Knowledge and Comprehension (30%) | Application and Analysis (50%) | Evaluation and Synthesis (20%) | Totals (100%)
Topic 1 (30%) | 1.8 (2) | 3 (3) | 1.2 (1) | 6
Topic 2 (10%) | 0.6 (1) | 1 (1) | 0.4 (0) | 2
Topic 3 (10%) | 0.6 (1) | 1 (1) | 0.4 (0) | 2
Topic 4 (20%) | 1.2 (1) | 2 (2) | 0.8 (1) | 4
Topic 5 (10%) | 0.6 (0) | 1 (1) | 0.4 (1) | 2
Topic 6 (20%) | 1.2 (1) | 2 (2) | 0.8 (1) | 4
Number of questions | 6 | 10 | 4 | 20
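A short sketch of Formula C shows why the bracketed values in the table cannot come from rounding alone. The weights are taken from the slide; `cell_counts` is a hypothetical helper, and working in whole percentages is simply a choice to avoid floating-point noise.

```python
# Topic and Bloom-level weights in percent, as given on the Formula C slide.
topic_pct = {"Topic 1": 30, "Topic 2": 10, "Topic 3": 10,
             "Topic 4": 20, "Topic 5": 10, "Topic 6": 20}
bloom_pct = [30, 50, 20]  # Knowledge/Comprehension, Application/Analysis, Evaluation/Synthesis
total_items = 20

def cell_counts(topics, levels, n_items):
    """Formula C: items x topic weight x Bloom weight, for every cell."""
    return {t: [n_items * tp * lp / 10000 for lp in levels]
            for t, tp in topics.items()}

raw = cell_counts(topic_pct, bloom_pct, total_items)

# Naive rounding of every cell to the nearest whole question.
naive = {t: [round(v) for v in row] for t, row in raw.items()}
column_totals = [sum(row[i] for row in naive.values()) for i in range(len(bloom_pct))]

print(raw["Topic 1"])   # calculated values for Topic 1
print(column_totals)    # compare with the targets 6, 10, 4
```

With these weights, naive rounding gives column totals of 7, 10 and 3 rather than 6, 10 and 4, so one row has to be adjusted by hand. The table does this for Topic 5, moving one question from Knowledge and Comprehension to Evaluation and Synthesis.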
RELIABILITY
Test-retest reliability.
• Reliability coefficient is obtained by administering the same test twice and correlating the scores.
• An excellent measure of score consistency as one is directly measuring consistency from administration to administration.
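As an illustration of "correlating the scores", the sketch below computes a Pearson correlation between two administrations of the same test. The scores are invented for the example, and the slides do not say which correlation coefficient is intended; Pearson's r is assumed here because it is the usual choice.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of six students on the first and second administration.
first = [55, 62, 70, 48, 81, 66]
second = [58, 60, 73, 50, 79, 68]

r = pearson_r(first, second)
print(f"Test-retest reliability coefficient: {r:.2f}")
```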
RELIABILITY
Split-Half Test
• Coefficient is obtained by dividing a test into halves, correlating the scores on each half, and then correcting for length (longer tests tend to be more reliable).
• The split can be based on: odd versus even numbered items, randomly selecting items, or manually balancing content and difficulty.
• Advantage: only requires a single test administration.
• Weaknesses:
- the resultant coefficient will vary as a function of how the test was split.
- not appropriate for tests where speed is a factor.
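The split-half procedure can be sketched as follows, using an odd/even split. The slides only say "correcting for length"; the Spearman-Brown formula, 2r / (1 + r), is assumed here as the standard correction. The response matrix and function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Correct a half-test correlation to full test length (assumed correction)."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per student."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))

# Hypothetical data: 5 students x 6 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

reliability = split_half_reliability(responses)
print(f"Split-half reliability: {reliability:.2f}")
```

Splitting the same responses differently (for example first half versus second half) would generally give a different coefficient, which is the weakness noted above.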
RELIABILITY
Alternate Form Reliability
Most standardized tests provide equivalent forms that can be used interchangeably.
These alternative forms are typically matched in terms of content and difficulty.
Scores on pairs of alternative forms for the same examinees are correlated to provide a measure of consistency or reliability.
RELIABILITY
CORRELATION = RELIABILITY
Item Reliability
98% of the results could recur if the test is administered to another group of students
Activity
WHAT CAN BE CONCLUDED FROM THE GIVEN DIAGRAM?
Validity and Reliability
TEST DURATION
Carey (1988) pointed out that the time available for testing depends not only on the length of the class period but also on students' attention spans.
TEST DURATION
Linn & Gronlund (2000):
1. A true-false test item takes 15 seconds to answer unless the student is asked to provide the correct answer for false questions. Then the time increases to 30-45 seconds.
2. A seven item matching exercise takes 60-90 seconds.
TEST DURATION
3. A four-response multiple-choice test item that asks for an answer regarding a term, fact, definition, rule or principle (knowledge-level item) takes 30 seconds. The same type of test item at the application level may take 60 seconds.
TEST DURATION
4. Any test item format that requires solving a problem, analyzing, synthesizing information or evaluating examples adds 30-60 seconds to a question.
TEST DURATION
5. Short-answer test items take 30-45 seconds.
6. An essay test takes 60 seconds for each point to be compared and contrasted.
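These per-item times can be turned into a rough duration estimate for a planned test. The sketch below is only illustrative: the dictionary keys, the `estimate_minutes` helper and the sample blueprint are hypothetical, and the extra 30-60 seconds for problem-solving items (point 4) is left out for simplicity.

```python
# (min_seconds, max_seconds) per item, taken from the guidelines listed above.
SECONDS_PER_ITEM = {
    "true_false": (15, 15),
    "true_false_with_correction": (30, 45),
    "matching_7_items": (60, 90),
    "mcq_knowledge": (30, 30),
    "mcq_application": (60, 60),
    "short_answer": (30, 45),
    "essay_point": (60, 60),  # per point to be compared and contrasted
}

def estimate_minutes(blueprint):
    """Return the (lowest, highest) estimated test duration in minutes."""
    low = sum(SECONDS_PER_ITEM[kind][0] * n for kind, n in blueprint.items())
    high = sum(SECONDS_PER_ITEM[kind][1] * n for kind, n in blueprint.items())
    return low / 60, high / 60

# Hypothetical test: number of items of each kind.
blueprint = {
    "true_false": 10,
    "mcq_knowledge": 10,
    "mcq_application": 5,
    "short_answer": 4,
    "essay_point": 6,
}

low, high = estimate_minutes(blueprint)
print(f"Estimated duration: {low:.1f} to {high:.1f} minutes")
```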
TEST DIFFICULTY LEVEL
Difficulty level
• Ensures that the items constructed suit the students' ability level.
• Verifies the item difficulty level set in the TOS.
• Difficulty analysis is carried out to assign item levels for storage in the item bank.
• Reviews the item difficulty level assigned when the Table of Specifications was written.
• Classical Test Theory (CTT) analysis can serve as the basis of item analysis.
Item Difficulty Level: Definition
The percentage of students who answered the item correctly.
Difficulty | Percentage of students answering correctly (scale 0-100%)
High (Difficult) | ≤ 30%
Medium (Moderate) | > 30% and < 80%
Low (Easy) | ≥ 80%
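The definition and the three bands translate directly into code. This is a minimal sketch; the function names and the sample item (20 of 40 students correct) are assumptions for illustration.

```python
def difficulty_index(num_correct, num_students):
    """Percentage of students who answered the item correctly."""
    return num_correct * 100 / num_students

def difficulty_level(pct_correct):
    """Map a percentage correct onto the three bands given above."""
    if pct_correct <= 30:
        return "High (Difficult)"
    if pct_correct < 80:
        return "Medium (Moderate)"
    return "Low (Easy)"

# Hypothetical item: 20 of 40 students answered correctly.
p = difficulty_index(20, 40)
print(p, difficulty_level(p))
```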
• Determining the difficulty index for objective items: the calculation can be done in a table like the one below.
CTT
• Determining the difficulty index for subjective items: the calculation is done as in the table below.
CTT
Item Difficulty Level: Discussion
• Is a test that nobody failed too easy?
• Is a test on which nobody got 100% too difficult?
• Should items that are “too easy” or “too difficult” be thrown out?
INSTRUMENT QUALITY
Item Discrimination Index
Used to ensure that the items constructed function well. It can be analysed using CTT and IRT. A good item should be able to distinguish between high-achieving and low-achieving students. The discrimination index helps decide which items are discarded and which are kept in the item bank.
How do you analyse the item difficulty index?
What is a “good” value?
If the item has | Ratio of students who answered the item correctly
Positive discrimination | High achievers > low achievers
Negative discrimination | High achievers < low achievers
No discrimination | High achievers = low achievers
What is a “good” value?
Discrimination Index | Item Evaluation
0.40 and above | Very good
0.30-0.39 | Good and can be improved
0.20-0.29 | Marginal and needs improvement
0.19 and below | Bad; cannot be accepted and needs proper checking
• Example: If there are 40 pupils in a class, divide them into two groups: 20 high-achieving pupils and 20 low-achieving pupils. Suppose that for item 8, 16 pupils from the high-achieving group answered correctly, while only 4 pupils from the low-achieving group answered that item correctly.
Then, with $K_t$ the proportion correct in the high-achieving group and $K_r$ the proportion correct in the low-achieving group:
$K_t = \frac{16}{20} = 0.8$ (80%)
$K_r = \frac{4}{20} = 0.2$ (20%)
$D = K_t - K_r = 0.8 - 0.2 = 0.6$
(Conclusion: item 8 is a good item.)
CTT
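The worked example for item 8 can be reproduced with a short sketch. The function names are hypothetical; the cut-offs follow the evaluation table above and are treated here as continuous thresholds (for example, anything from 0.30 up to but not including 0.40 counts as "Good"), and both groups are assumed to be the same size.

```python
def discrimination_index(high_correct, low_correct, group_size):
    """D = K_t - K_r, assuming both groups contain group_size students."""
    return (high_correct - low_correct) / group_size

def evaluate_item(d):
    """Classify an item using the discrimination index bands above."""
    if d >= 0.40:
        return "Very good"
    if d >= 0.30:
        return "Good and can be improved"
    if d >= 0.20:
        return "Marginal and needs improvement"
    return "Bad; cannot be accepted and needs proper checking"

# Item 8 from the example: 16 of 20 high achievers and 4 of 20 low achievers correct.
d = discrimination_index(16, 4, 20)
print(d, evaluate_item(d))
```

A negative value of `d` would indicate negative discrimination, that is, more low achievers than high achievers answering the item correctly.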
TEST THEOREM
If an individual can perform the most difficult aspects of the objective, the instructor can "assume" the lower levels can be done.
However, if testing the lower levels, the instructor cannot "assume" the individual can perform the higher levels.
ALTERNATIVE ASSESSMENT
• Goes beyond traditional, psychometrically driven testing. Designed to assess learning tasks that stimulate critical thinking skills and require students to produce or demonstrate knowledge rather than simply recall information provided to them by others.
Alternative assessments are used to determine what students can and cannot do, in contrast to what they know or do not know.
WHEN TO USE?
• As a substitute for pencil-and-paper tests
• To measure higher-level skills or other skills that cannot be measured by pencil and paper
• E.g. acting skills, balancing, counting, drawing, experimenting, interviewing, musical skills, physical education skills, speaking skills, writing skills
Characteristics of Alternative Assessment
• HUMAN JUDGMENT IN SCORING
• REAL-WORLD APPLICATIONS
• MEANINGFUL INSTRUCTIONAL TASK
• HIGHER LEVEL OF THINKING
• STUDENTS' PERFORMANCE
ASSESSMENT
CONVENTIONAL | ALTERNATIVE
AUTHENTIC | PERFORMANCE-BASED
LITERATURE REVIEW ANALYSIS
ALTERNATIVE ASSESSMENT LADDER
The Ladder of Alternative Assessment
Examples of authentic assessment
• Research project
• Debate
• Writing speech / summary
• Studio work
• Portfolio
• Article review
• Writing journal / proposal
• Case study
AUTHENTIC ASSESSMENT
New Academia Learning Innovation
Not only performance-based, but happens in a real setting.
Emphasizes process more than product
Soft skills development
Holistic assessment
Rubric
TEACHING PRACTICES
CAPSTONE PROJECT
SERVICE LEARNING
2U2I PROGRAMME
WORK BASED LEARNING
JOB CREATION
SCORING ALTERNATIVE ASSESSMENT
METHOD
CHECKLIST
RATING SCALE
RUBRIC
HOLISTIC ANALYTIC
GRADED ASSIGNMENT
• Develop one Table of Specifications for your final exam, using the formulas given in this workshop.
• Submit a soft copy (Excel file)
• Individual/ Group Assignment according to your course.
GRADED ASSIGNMENT
• Analyze your final examination items using Classical Test Theory
• Submit a soft copy (Excel file)
• Individual/ Group Assignment according to your course
“Students can escape bad teaching but they cannot escape bad assessment.”
Boud, 1995
Give full measure and weight with justice
Give just measure and weight
Look at the measure, not the score…
Emphasize the outcome, not the output…
THANK YOU