Teaching material
PHYSICS EDUCATIONAL ASSESSMENT
BY:
KADEK AYU ASTITI, S. PD., M. PD.
NIP. 20140928 201404 2 002
Supported by:
Dana PGMIPAU Tahun 2014
PHYSICS EDUCATION PROGRAM
MATHEMATICS AND SCIENCE DEPARTMENT
FACULTY OF TEACHER TRAINING AND EDUCATION NUSA CENDANA UNIVERSITY
2014
PREFACE
A notable concern of many teachers is that they frequently have the task of constructing assessments to reflect on learning, but have relatively little training or information to rely on in this task. Assessment is central to teaching and learning. It is very important for our students because it shows them where they are falling short. That is why teachers should always discuss exams with students afterwards, to show them what the right answers were and where they made mistakes. For the same reason, students must be given their marks, and their exam scripts, as soon as possible. Assessment for Learning focuses on opportunities to develop students' ability to evaluate themselves, to make judgements about their own performance and to improve upon it. It makes use of authentic assessment methods and offers many opportunities for students to develop their skills through formative assessment, using summative assessment sparingly. To assess effectively, a teacher must understand the types of assessment, the types of assessment scales, methods of test construction, and tests of validity and reliability. Each aspect is discussed in this sourcebook.
To help teachers assess, Part One contains information on the meaning and types of assessment. It concerns general test construction and introduces the six levels of intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and evaluation. These levels assist in categorizing test questions, with knowledge as the lowest level. Part Two of the sourcebook is devoted to actual test question construction and to tests of validity and reliability. Five test item types are discussed: multiple choice, true-false, matching, completion, and essay. The information covers the appropriate use of each item type, the advantages and disadvantages of each, and the characteristics of well-written items. Suggestions for addressing higher-order thinking skills for each item type are also presented. This sourcebook was developed to accomplish three outcomes: 1) teachers will know the meaning of assessment and follow appropriate principles for developing and using assessment methods in their teaching, avoiding common pitfalls in student assessment; 2) teachers will be able to identify and accommodate the limitations of different informal and formal assessment methods; 3) teachers will gain an awareness that certain assessment approaches can be incompatible with certain instructional goals.
Kadek Ayu Astiti, S. Pd., M. Pd.
Contents
Preface
CHAPTER I TYPES OF ASSESSMENT
1.1 The Difference between Measurement, Assessment and Evaluation
1.2 General Types of Assessment
1.3 Norm-Referenced Assessment and Criterion-Referenced Assessment
CHAPTER II EVALUATION OF LEARNING OBJECTS
2.1 Cognitive Learning Outcomes
2.2 Affective Learning Outcomes
2.3 Psychomotor Learning Outcomes
2.4 The Type Value Scale
CHAPTER III LEARNING ASSESSMENT
3.1 Objective
3.2 Essay
CHAPTER IV NON-TEST ASSESSMENT
4.1 Observation
4.2 Interview
4.3 Questionnaire
4.4 Portfolios
4.5 Project
CHAPTER V VALIDITY TEST
5.1 Content Validity
5.2 Criterion Validity
5.3 Construct Validity
CHAPTER VI RELIABILITY TEST
6.1 External Consistency Reliability
6.2 Internal Consistency Reliability
CHAPTER I
TYPES OF ASSESSMENT
Purpose: After learning this material, students are expected to:
- Be able to explain the definition of assessment
- Be able to explain the difference between measurement, assessment, and evaluation
- Mention the types of assessment (summative and formative)
- Understand the concepts of criterion-referenced and norm-referenced assessment
1.1. The Difference between Measurement, Assessment and Evaluation
There is a lot of confusion over these three terms, as well as other terms associated with measurement, assessment, and evaluation. The following is an explanation of each of these terms:
Measurement, beyond its general definition, refers to the set of procedures and the principles for how to use those procedures in educational tests and assessments. Examples include raw scores, derived scores, standard scores, and so on. A measurement takes place when a “test” is given and a “score” is obtained. If the test collects quantitative data, the score is a number. If the test collects qualitative data, the score may be a phrase or word such as “excellent.”
Assessment is a process by which information is obtained relative to some known objective
or goal. As noted in my definition of test, an assessment may include a test, but also includes
methods such as observations, interviews, behavior monitoring, etc.
Evaluation focuses on grades and may reflect classroom components other than course content and mastery level. Evaluation comprises procedures used to determine whether the subject (i.e. the student) meets preset criteria, such as qualifying for special education services. It uses assessment (remember that an assessment may include a test) to make a determination of qualification in accordance with predetermined criteria.
For the purpose of schematic representation, the three concepts of measurement, assessment and evaluation have traditionally been depicted as three concentric circles of varying sizes, as in Figure 1.1.
Assessment plays a major role in how students learn, their motivation to learn, and how
teachers teach. Assessment is used for various purposes.
• Assessment for learning: where assessment helps teachers gain insight into what
students understand in order to plan and guide instruction, and provide helpful
feedback to students.
• Assessment as learning: where students develop an awareness of how they learn and
use that awareness to adjust and advance their learning, taking an increased
responsibility for their learning.
• Assessment of learning: where assessment informs students, teachers and parents, as well as the broader educational community, of achievement at a certain point in time, in order to celebrate success, plan interventions and support continued progress.
Assessment must be planned with its purpose in mind. Assessment for, as and of learning all
have a role to play in supporting and improving student learning, and must be appropriately
balanced. The most important part of assessment is the interpretation and use of the
information that is gleaned for its intended purpose. Assessment is embedded in the learning
process. It is tightly interconnected with curriculum and instruction.

Figure 1.1 The relationship among measurement, assessment and evaluation (three concentric circles: measurement within assessment, assessment within evaluation)

As teachers and students work towards the achievement of curriculum outcomes, assessment plays a constant role in
informing instruction, guiding the student’s next steps, and checking progress and
achievement. Teachers use many different processes and strategies for classroom assessment, and adapt them to suit the assessment purpose and the needs of individual students.
Table 1.1 Classroom assessment: from ... to ...
1. From: Classroom tests disconnected from the focus of instruction. To: Classroom tests reflecting the written and taught curriculum.
2. From: Assessment using only selected response formats. To: Assessment methods selected intentionally to reflect specific kinds of learning targets.
3. From: Mystery assessment, where students do not know in advance what they are accountable for learning. To: Transparency in assessment, where students know in advance what they will be held accountable for learning.
4. From: All assessments and assignments, including practice, “count” toward the grade. To: Some assessments and assignments “count” toward the grade; others are for practice or other formative use.
5. From: Students as passive participants in the assessment process. To: Students as active users of assessments as learning experiences.
6. From: Students not finding out until the graded event what they are good at and what they need to work on. To: Students being able to identify their strengths and areas for further study during learning.
1.2. General Types of Assessment
1.2.1. Summative assessment
Summative assessments are cumulative evaluations used to measure student growth after instruction, and are generally given at the end of a course in order to determine whether long-term learning goals have been met. Summative assessments provide evidence of student achievement for the purpose of making a judgment about student competence or program effectiveness. Typically, summative evaluation concentrates on learner outcomes rather than only the program of instruction. It is a means to determine a student’s mastery and understanding of information, skills, concepts and processes. Summative assessment occurs at the end of a formal learning experience, either a class or a program, and may include a variety of activities, for example tests, demonstrations, portfolios, internships, clinicals, and capstone projects. Summative assessment is a high-stakes type of assessment for the purpose of making a final judgment about student achievement and instructional effectiveness. By the time summative assessments occur, students have typically exited the learning mode.
Teachers/schools can use these assessments to identify strengths and weaknesses of
curriculum and instruction, with improvements affecting the next year's/term's students.
Summative assessments are given periodically to determine, at a particular point in time, what students know and do not know. Many associate summative assessments only with
standardized tests such as state assessments, but they are also used at and are an important
part of district and classroom programs. Summative assessment at the district and classroom
level is an accountability measure that is generally used as part of the grading process. The list
is long, but here are some examples of summative assessments:
a) State assessments
b) District benchmark or interim assessments
c) End-of-unit or chapter tests
d) End-of-term or semester exams
e) Scores that are used for accountability of schools (AYP) and students (report card grades).
The key is to think of summative assessment as a means to gauge, at a particular point
in time, student learning relative to content standards. Although the information gleaned from
this type of assessment is important, it can only help in evaluating certain aspects of the
learning process. Because they are spread out and occur after instruction every few weeks,
months, or once a year, summative assessments are tools to help evaluate the effectiveness of
programs, school improvement goals, alignment of curriculum, or student placement in
specific programs. Summative assessments happen too far down the learning path to provide
information at the classroom level and to make instructional adjustments and interventions
during the learning process. It takes formative assessment to accomplish this. The goal of
summative assessment is to evaluate student learning at the end of an instructional unit by
comparing it against some standard or benchmark. Information from summative assessments
can be used formatively when students or faculty use it to guide their efforts and activities in
subsequent courses.
1.2.2. Formative assessment
Formative assessment is part of the instructional process and an integral part of teaching and learning. It comprises the ongoing assessments, reviews, and observations in a classroom. Teachers use formative assessment to improve instructional
methods and student feedback throughout the teaching and learning process. For example, if a
teacher observes that some students do not grasp a concept, she or he can design a review
activity or use a different instructional strategy. Likewise, students can monitor their progress
with periodic quizzes and performance tasks. The results of formative assessments are used to
modify and validate instruction. Formative assessment occurs in the short term, as learners are in the process of making meaning of new content and of integrating it into what they already know. When incorporated into classroom practice, it provides the information needed to adjust teaching and learning while they are happening. In this sense, formative assessment informs both teachers and students about student understanding at a point when timely adjustments can be made. These adjustments help ensure students achieve targeted standards-based learning goals within a set time frame. Although formative assessment strategies appear
in a variety of formats, there are some distinct ways to distinguish them from summative
assessments. Formative assessment helps teachers determine next steps during the learning
process as the instruction approaches the summative assessment of student learning.
Some of the instructional strategies that can be used formatively include the following:
1. Criteria and goal setting with students engages them in instruction and the learning
process by creating clear expectations. In order to be successful, students need to
understand and know the learning target/goal and the criteria for reaching it. Establishing
and defining quality work together, asking students to participate in establishing norms of behavior for classroom culture, and determining what should be included in criteria for
success are all examples of this strategy. Using student work, classroom tests, or
exemplars of what is expected helps students understand where they are, where they need
to be, and an effective process for getting there.
2. Observations go beyond walking around the room to see if students are on task or
need clarification. Observations assist teachers in gathering evidence of student learning
to inform instructional planning. This evidence can be recorded and used as feedback for
students about their learning or as anecdotal data shared with them during conferences.
3. Questioning strategies should be embedded in lesson/unit planning. Asking better
questions allows an opportunity for deeper thinking and provides teachers with
significant insight into the degree and depth of understanding. Questions of this nature
engage students in classroom dialogue that both uncovers and expands learning. An “exit
slip” at the end of a class period to determine students’ understanding of the day’s lesson
or quick checks during instruction such as “thumbs up/down” or “red/green” (stop/go)
cards are also examples of questioning strategies that elicit immediate information about
student learning. Helping students ask better questions is another aspect of this formative
assessment strategy.
4. Self and peer assessment helps to create a learning community within a classroom.
Students who can reflect while engaged in metacognitive thinking are involved in their
learning. When students have been involved in criteria and goal setting, self-evaluation is
a logical step in the learning process. With peer evaluation, students see each other as
resources for understanding and checking for quality work against previously established
criteria.
5. Student record keeping helps students better understand their own learning as
evidenced by their classroom work. This process of students keeping ongoing records
of their work not only engages students, it also helps them, beyond a “grade,” to see
where they started and the progress they are making toward the learning goal. All of these
strategies are integral to the formative assessment process, and they have been suggested
by models of effective middle school instruction.
6. Balancing Assessment. As teachers gather information/data about student learning,
several categories may be included. In order to better understand student learning,
teachers need to consider information about the products (paper or otherwise) students
create and tests they take, observational notes, and reflections on the communication that
occurs between teacher and student or among students. When a comprehensive
assessment program at the classroom level balances formative and summative student
learning/achievement information, a clear picture emerges of where a student is relative
to learning targets and standards. Students should be able to articulate this shared
information about their own learning. When this happens, student-led conferences, a
formative assessment strategy, are valid. The more we know about individual students as
they engage in the learning process, the better we can adjust instruction to ensure that all
students continue to achieve by moving forward in their learning.
The goal of formative assessment is to monitor student learning to provide ongoing feedback
that can be used by instructors to improve their teaching and by students to improve their
learning. More specifically, formative assessments:
• help students identify their strengths and weaknesses and target areas that need work
• help faculty recognize where students are struggling and address problems
immediately
Formative assessments are generally low stakes, which means that they have low or no point
value. Examples of formative assessments include asking students to:
• draw a concept map in class to represent their understanding of a topic
• submit one or two sentences identifying the main point of a lecture
• turn in a research proposal for early feedback
1.3. Norm Referenced Assessment and Criterion Referenced Assessment
When we look at the types of assessment instruments, we can generally classify them
into two main groups: Criterion referenced assessments and norm-referenced assessments.
1.3.1. Norm Referenced assessment
Linn and Gronlund (2000) define a norm-referenced assessment as a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group. Norm-referenced tests allow us to compare a student's skills to those of others in his or her age group. Norm-referenced tests are developed by creating the test items and then administering the test to a group of students that will be used as the basis of comparison.
The essential characteristic of norm referencing is that students are awarded their
grades on the basis of their ranking within a particular cohort. Norm-referencing involves
fitting a ranked list of students’ ‘raw scores’ to a pre-determined distribution for awarding
grades. Usually, grades are spread to fit a ‘bell curve’ (a ‘normal distribution’ in statistical
terminology), either by qualitative, informal rough-reckoning or by statistical techniques of
varying complexity. For large student cohorts (such as in senior secondary education),
statistical moderation processes are used to adjust or standardise student scores to fit a normal
distribution. Norm-referenced standardized tests compare a student's performance to that of a norming or sample group who are in the same grade or of the same age. Student performance is communicated in percentile ranks, grade-equivalent scores, normal curve equivalents, scaled scores, or stanine scores.
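The norm-referenced score conversions mentioned above (z-scores, percentile ranks, stanines) can be sketched in a few lines. This is a minimal illustration that, for simplicity, treats the tested group itself as the norming group; real norm-referenced tests use a separately administered norming sample.

```python
import statistics

def norm_referenced_scores(raw_scores):
    """Convert raw scores into z-scores, percentile ranks and stanines.
    Simplifying assumption: the tested group itself serves as the
    norming group."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    results = []
    for x in raw_scores:
        z = (x - mean) / sd  # standard score relative to the group
        # percentile rank: percentage of the group scoring below x
        percentile = 100 * sum(s < x for s in raw_scores) / len(raw_scores)
        # stanine: nine-point standard scale with mean 5 and SD 2
        stanine = min(9, max(1, round(5 + 2 * z)))
        results.append({"raw": x, "z": round(z, 2),
                        "percentile": percentile, "stanine": stanine})
    return results
```

For raw scores [50, 60, 70, 80, 90], for example, the middle score of 70 sits at the 40th percentile (two of the five scores fall below it) and maps to stanine 5, the centre of the scale.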
1.3.2. Criterion Referenced Assessment
Criterion referenced is a students performance is easured against a standard. One form
of criterion referenced assessment is the benchmark, a description of a key task that students
are expected be perform. In contrast, criterion referencing assessment as the name implies,
involves determining a student’s grade by comparing his or her achievements with clearly
stated criteria for learning outcomes and clearly stated standards for particular levels of
performance. Linn and Gronlund (2000) define criterion referenced assessments in the
following a test or other type of assessment designed to provide a measure of performance
that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
Unlike norm-referencing, there is no pre-determined grade distribution to be generated, and a student’s grade is in no way influenced by the performance of others. Theoretically, all
students within a particular cohort could receive very high (or very low) grades depending
solely on the levels of individuals’ performances against the established criteria and standards.
The goal of criterion referencing is to report student achievement against objective reference
points that are independent of the cohort being assessed. Criterion referencing can lead to
simple pass fail grading schema, such as in determining fitness to practice in professional
fields. Criterion referencing can also lead to reporting student achievement or progress on a
series of key criteria rather than as a single grade or percentage. Criterion referencing is worth aspiring towards. It requires giving thought to expected learning outcomes; it is transparent for students; and the grades derived should be defensible in reasonably objective terms: students should be able to trace their grades to the specifics of their performance on set tasks. Criterion referencing lays an important framework for student
engagement with the learning process and its outcomes.
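A minimal sketch of criterion-referenced reporting on a series of key criteria might look as follows; the criterion names and cutoff scores are illustrative assumptions, not values from the text.

```python
def criterion_referenced_report(scores, cutoffs):
    """Report mastery per criterion against preset cutoff scores.
    The performance of other students plays no role in the result."""
    return {criterion: ("mastered" if scores.get(criterion, 0) >= cutoff
                        else "not yet mastered")
            for criterion, cutoff in cutoffs.items()}

# Hypothetical physics criteria with a preset mastery cutoff of 75 each:
cutoffs = {"kinematics": 75, "dynamics": 75}
report = criterion_referenced_report({"kinematics": 80, "dynamics": 60}, cutoffs)
```

Because each student is compared only with the preset standard, every student in a cohort could in principle be reported as having mastered (or not mastered) every criterion.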
The distinction between criterion- and norm-referenced assessment is that criterion referencing compares a student to a standard, while norm referencing compares a student to others. The following differences between norm-referenced and criterion-referenced tests are adapted from Popham (1975).
Table 1.2. Differences between Criterion-Referenced and Norm-Referenced Tests

Purpose
- Criterion-referenced tests: To determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished.
- Norm-referenced tests: To rank each student with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers.

Content
- Criterion-referenced tests: Measure specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
- Norm-referenced tests: Measure broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.

Item characteristics
- Criterion-referenced tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
- Norm-referenced tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.

Score interpretation
- Criterion-referenced tests: Each individual is compared with a preset standard for acceptable achievement; the performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
- Norm-referenced tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
Which of these methods is preferable? Mostly, students’ grades in universities are
decided on a mix of both methods, even though there may not be an explicit policy to do so.
In fact, the two methods are somewhat interdependent, more so than the brief explanations
above might suggest. Logically, norm-referencing must rely on some initial criterion-
referencing, since students’ ‘raw’ scores must presumably be determined in the first instance
by assessors who have some objective criteria in mind. Criterion-referencing, on the other
hand, appears more educationally defensible. But criterion-referencing may be very difficult,
if not impossible, to implement in a pure form in many disciplines. It is not always possible to
be entirely objective and to comprehensively articulate criteria for learning outcomes: some
subjectivity in setting and interpreting levels of achievement is inevitable in higher education.
This being the case, sometimes the best we can hope for is to compare individuals’
achievements relative to their peers.
Norm-referencing, on its own and if strictly and narrowly implemented, is undoubtedly unfair. With norm-referencing, a student’s grade depends, to some extent at least, not only on
his or her level of achievement, but also on the achievement of other students. This might lead
to obvious inequities if applied without thought to any other considerations. For example, a
student who fails in one year may well have passed in other years! The potential for
unfairness of this kind is most likely in smaller student cohorts, where norm-referencing may
force a spread of grades and exaggerate differences in achievement. Alternatively, norm-
referencing might artificially compress the range of difference that actually exists.
Recognising, however, that some degree of subjectivity is inevitable in higher
education, it is also worthwhile to monitor grade distributions – in other words, to use a
modest process of norm-referencing to watch the outcomes of a predominantly criterion-
referenced grading model. In doing so, if it is believed too many students are receiving low
grades, or too many students are receiving high grades, or the distribution is in some way
oddly spread, then this might suggest something is amiss and the assessment process needs
looking at. There may be, for instance, a problem with the overall degree of difficulty of the assessment tasks: not enough challenging examination questions, or too few, or assignment tasks that fail to discriminate between students with differing levels of knowledge
and skills. There might also be inconsistencies in the way different assessors are judging
student work. Best practice in grading in higher education involves striking a balance between
criterion referencing and norm-referencing. This balance should be strongly oriented towards
criterion referencing as the primary and dominant principle.
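The kind of modest norm-referenced monitoring described above can be sketched as a simple check on a grade distribution; the 0.5 "maximum expected share" threshold here is an illustrative assumption, not a value from the text.

```python
from collections import Counter

def flag_grade_distribution(grades, max_share=0.5):
    """Flag any grade awarded to more than max_share of the cohort.
    A flag does not prove the grading is wrong; it only signals that
    the assessment tasks or marking consistency may need looking at."""
    counts = Counter(grades)
    total = len(grades)
    return sorted(g for g, n in counts.items() if n / total > max_share)
```

For example, a cohort graded ["A", "A", "A", "B", "C"] would flag "A" (awarded to 60% of students), prompting a review of whether the tasks discriminated adequately, without automatically overriding the criterion-referenced grades themselves.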
References:
Bastanfar, A. (2009). Alternative in assessment. http://www3.telus.net/linguisticsissues/alternatives.
Garrison, C., & Ehringhaus, M. (2010). Formative and summative assessment in the classroom. www.measuredprogress.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice Hall.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351–372.
Popham, J. W. (1975). Educational evaluation. Englewood Cliffs, NJ: Prentice-Hall.
CHAPTER II
EVALUATION OF LEARNING OBJECTS
Purpose: After learning this material, students are expected to:
- Understand what is measured in the cognitive domain
- Understand what is measured in the affective domain
- Understand what is measured in the psychomotor domain
- Be able to explain the kinds of assessment scales
2. 1. Cognitive Learning Outcomes
One of the objects of evaluation result is the cognitive aspects of learning. The test
questions will focus on appropriate intellectual activity ranging from simple recall to problem
solving, crithical thinking and reasoning. Cognitive complexity refers to the various levels of
learning that can be tested. A good test reflects the goals of the instruction. If the instructor is
mainly concerned with students memorizing facts, the test should ask for simple recall of
material. If the instructor is trying to develop analytic skills, a test that asks for recall is
inappropriate and will cause students to conclude that memorization is the instructor's true
goal.
In 1956, after extensive research on educational goals, a group of researchers published its findings in a book edited by Dr. Benjamin S. Bloom, a professor at the University of Chicago. Bloom’s Taxonomy of Educational Objectives lists six levels of intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and evaluation.
Table 2.1 Cognitive complexity (adapted from Clay, 2001)

Knowledge: Recognizing and recalling information, including dates, events, persons, places; terms, definitions; facts, principles, theories; methods and procedures. Example questions: Who invented the...? What is meant by...? Where is the...?

Comprehension: Understanding the meaning of information, including restating (in one's own words), translating from one form to another, or interpreting, explaining, and summarizing. Example questions: Restate in your own words... Convert fractions into... List three reasons for...

Application: Applying general rules, methods, or principles to a new situation, including classifying something as a specific example of a general principle or using a formula to solve a problem. Example questions: How is... an example of...? How is... related to...? Why is... significant?

Analysis: Identifying the organization and patterns within a system by identifying its component parts and the relationships among the components. Example questions: What are the parts of...? Classify... according to... Outline/diagram...

Synthesis: Discovering/creating new connections, generalizations, patterns, or perspectives; combining ideas to form a new whole. Example questions: What would you infer from...? What ideas can you add to...? How would you create a...?

Evaluation: Using evidence and reasoned argument to judge how well a proposal would accomplish a particular purpose; resolving controversies or differences of opinion. Example questions: Do you agree...? How would you decide about...? What priority would you give...?
2. 2. Affective Learning Outcomes
Affective learning outcomes are learning outcomes related to their interests, attitudes
and values. Affective learning outcomes developed by karthwohl, et al as outlined in his
book: “Handbook II: The affective Domain”. According Karthwohl (in Mehren and
Lehmann, 1973) affective domain consisit of: receiving, responding, valuting, organization,
and Characteristing.
Table 2.2: Affective domain guide (adapted from Clay, 2001)
Level If the student must Then use these key words in objectives,
assignments and evaluations
Receiving …receive information
about or give attention
to this new attitude,
value or belief.
• be alert to
• be aware of
• be sensitive to
• experience
• listen to
• look at
• perceive existence
• Receive information on
• take notes on
• take notice of
• willingly attends
Responding …participate in, or
react to this new
attitude, value or belief
in a positive manner.
• allow others to
• answer questions
on
• contribute to
• cooperate with
dialog on
• discuss openly
• enjoy doing
• participate in
• reply to
• respect those who
Valuing …show some definite
involvement in or
commitment to this new
attitude, value or belief
• accept as right
• accept as true
• affirm belief/trust
in
• associate himself
with
• assume as true
• consider valuable
• decide based on
• indicate agreement
• influence others
• justify based on
• seek out more detail
Organizing …integrate this new
attitude, value or belief,
with the existing
organization of
attitudes, values and
beliefs, so that it has a
position of priority and
advocacy.
• Advocate
• integrate into life
• judge based on
• place in value
system
• prioritize based on
• persuade others
• systematize
Characterizing
…fully internalize this
new attitude, value or
belief so that it
consistently
characterizes thought
and action.
• act based on
• consistently carry
out
• consistently
practice
• fully internalize
• know by others as
• characterized by
• sacrifice for
• view life based on
2. 3. Psychomotor Learning Outcomes
Psychomotor learning outcomes are learning outcomes related to motor skills and the
ability to act individually. Pscychomotor behaviors are performed actions that are
neuromuscular in nature and demand certain levels of physical dexterity. This assessment is
suitable to assess the achievement of competence demanded of learners perform a specific
task example: experiment in laboratorium. Taxonomy is often used is the taxonomy of the
psychomotor learning outcomes Simpson (Gronlund and Linn, 1990. That taxonomi such as
perception, set, guided response, mechanism, Complex Overt Response, adaptation,
origination.
Table 2.3 Psychomotor Domain
Category Description Examples of activity Action verbs
Perception
Awareness: the ability to use
sensory cues to guide motor
activity. This ranges from
sensory stimulation, through
cue selection, to translation.
use and/or selection of senses
to absorb data for guiding
movement
Examples: Detects non-
verbal communication cues.
Estimates where a ball will
land after it is thrown and
then moves to the correct
location to catch the ball.
Adjusts heat of stove to
correct temperature by smell
and taste of food. Adjusts the
height of the forks on a
forklift by comparing where
the forks are in relation to the
pallet.
“By the end of the music
theatre program, students will
be able to relate types of
music to particular dance
steps.”
chooses, describes,
detects, differentiates,
distinguishes, feels,
hears, identifies,
isolates, notices,
recognizes, relates,
selects, separates,
touches,
Set
Readiness, a
learner's readiness
to act. Readiness to
act. It includes
mental, physical,
and emotional sets.
These three sets are
dispositions that
predetermine a
person’s response
to different
situations
(sometimes called
mindsets).
mental, physical or emotional
preparation before experience
or task
Examples: Knows and acts
upon a sequence of steps in a
manufacturing process.
Recognize one’s abilities and
limitations. Shows desire to
learn a new process
(motivation). NOTE: This
subdivision of Psychomotor is
closely related with the
"Responding to phenomena"
subdivision of the Affective
domain.
“By the end of the physical
education program, students
will be able to demonstrate
the proper stance for batting a
ball.”
arranges, begins,
displays, explains, gets
set, moves, prepares,
proceeds, reacts,
shows, states,
volunteers, responds,
starts,
Guided
Response
Attempt. The early
stages in learning a
complex skill that
includes imitation
imitate or follow instruction,
trial and error.
Examples: Performs a
mathematical equation as
assembles, builds,
calibrates, constructs,
copies, dismantles,
displays, dissects,
and trial and error.
Adequacy of
performance is
achieved by
practicing.
demonstrated. Follows
instructions to build a model.
Responds to hand signals of the
instructor while learning to
operate a forklift.
“By the end of the physical
education program, students
will be able to perform a golf
swing as demonstrated by the
instructor.”
fastens, fixes, follows,
grinds, heats, imitates,
manipulates, measures,
mends, mixes, reacts,
reproduces, responds
sketches, traces, tries.
Mechanism
basic proficiency,
the ability to
perform a complex
motor skill.
This is the
intermediate stage
in learning a
complex skill.
Learned responses
have become
habitual and the
movements can be
performed with
some confidence
and proficiency.
competently respond to
stimulus for action
Examples: Use a personal
computer. Repair a leaking
faucet. Drive a car.
“By the end of the biology
program, students will be able
to assemble laboratory
equipment appropriate for
experiments.”
assembles, builds,
calibrates, completes,
constructs, dismantles,
displays, fastens, fixes,
grinds, heats, makes,
manipulates, measures,
mends, mixes,
organizes, performs,
shapes, sketches.
Complex
Overt
Response
Expert proficiency:
the advanced stage
of learning a
complex skill.
The skillful
performance of
motor acts that
involve complex
movement patterns.
Proficiency is
indicated by a
quick, accurate,
and highly
coordinated
performance,
requiring a
minimum of
energy. This
category includes
performing without
hesitation, and
automatic
performance. For
example, players
Execute a complex process
with expertise
Examples: Maneuvers a car
into a tight parallel parking
spot. Operates a computer
quickly and accurately.
Displays competence while
playing the piano.
“By the end of the industrial
education program, students
will be able to demonstrate
proper use of woodworking
tools to high school students.”
assembles, builds,
calibrates, constructs,
coordinates,
demonstrates,
dismantles, displays,
dissects, fastens, fixes,
grinds, heats,
manipulates, measures,
mends, mixes,
organizes, sketches.
NOTE: The key words
are the same as
Mechanism, but will
have adverbs or
adjectives that indicate
that the performance is
quicker, better, more
accurate, etc.
often utter
sounds of
satisfaction or
expletives as soon
as they hit a tennis
ball or throw a
football, because
they can tell by the
feel of the act what
the result will
produce.
Adaptation
adaptable
proficiency, a
learner's ability to
modify motor skills
to fit a new
situation.
Skills are well
developed and the
individual can
modify movement
patterns to fit
special
requirements.
alter response to reliably meet
varying challenges
Examples: Responds
effectively to unexpected
experiences. Modifies
instruction to meet the needs
of the learners. Performs a task
with a machine that it was not
originally intended to perform
(machine is not damaged and
there is no danger in
performing the new task).
“By the end of the industrial
education program, students
will be able to adapt their
lessons on woodworking
skills for disabled students.”
adapts, adjusts, alters,
changes, integrates,
rearranges, reorganizes,
revises, solves, varies.
Origination
creative
proficiency, a
learner's ability to
create new
movement patterns.
Creating new
movement patterns
to fit a particular
situation or specific
problem. Learning
outcomes
emphasize
creativity based
upon highly
developed skills.
develop and execute new
integrated responses and
activities
Examples: Constructs a new
theory. Develops a new and
comprehensive training
program. Creates a new
gymnastic routine.
arranges, builds,
combines, composes,
constructs, creates,
designs, formulates,
initiate, makes,
modifies, originates, re-
designs,
trouble-shoots.
2.4. Types of Value Scales
Scales of measurement refer to the ways in which variables/numbers are defined and
categorized. Each scale of measurement has certain properties, which in turn determine the
appropriateness of certain statistical analyses. There are four measurement scales (or types of
data): nominal, ordinal, interval, and ratio.
2.4.1. Nominal value scale
A nominal value scale is a scale used to identify objects, individuals, or groups. A
questionnaire item answered yes (1) or no (0) is an example of a nominal scale; such numbers
are the least like real numbers. Nominal data basically refer to discrete categories, such as the
name of your school, the type of car you drive, gender, religion, menu items selected, etc.
A sub-type of nominal scale with only two categories (e.g. male/female) is called
"dichotomous." Nominal data can be clearly described in pie charts because they fall into
clear categories that sum to 100%.
2.4.2. Ordinal value scale
An ordinal value scale is a scale that ranks observations from low to high, with any ties
attributed to a lack of measurement sensitivity, e.g. scores from a questionnaire. Examples are
first rank, second rank, and so on. A questionnaire with a Likert scale uses an ordinal value
scale, such as disagree (1), doubtful (2), and agree (3). Ordinal data refer to quantities that
have a natural ordering: the ranking of favorite sports, the order of people in a line, the order
of runners finishing a race, or, more often, the choice on a rating scale from 1 to 5. Other
examples: class ranks, social class categories, etc.
Examples:
What is your gender? (nominal)
M – Male
F – Female
Which recreational activities do you participate in? (nominal)
1 – Hiking
2 – Fishing
3 – Boating
4 – Swimming
5 – Picnicking
How satisfied are you with our service? (ordinal)
1. very unsatisfied
2. unsatisfied
3. neutral
4. satisfied
5. very satisfied
How do you feel today? (ordinal)
1. very unhappy
2. unhappy
3. ok
4. happy
5. very happy
What is your hair colour? (nominal)
1 – Black
2 – Brown
3 – Blonde
4 – Gray
5 – Other
2.4.3. Interval value scale
An interval value scale shares the properties of the nominal and ordinal scales, but in
addition its intervals are constant, and it can be expressed in mathematical functions: it is a
scale with a fixed and defined interval. Examples are how many times a person goes to the
market (once, twice, etc.) or a final test score. Interval data are like ordinal data except that
the intervals between each value are equally split. The most common example is temperature
in degrees Fahrenheit: the difference between 29 and 30 degrees is the same magnitude as the
difference between 78 and 79.
2.4.4. Ratio value scale
A ratio value scale is a true numerical scale: it has equal distances between values and can
be expressed in mathematical functions. Ratio scales are the easiest to understand because
they are numbers as we usually think of them. The distance between adjacent numbers is
equal on a ratio scale, and a score of zero on a ratio scale means that there is none of whatever
is being measured. Most ratio scales are counts of things. Ratio data are interval data with a
natural zero point. Examples: weight, the distance of a street, time to complete a task, the size
of an object, etc.
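The scale of measurement determines which summary statistics are meaningful for a variable. The sketch below illustrates this in Python with small hypothetical data sets (all values are invented for illustration):

```python
from statistics import mode, median

# Hypothetical data illustrating the four scales of measurement.
hair_colour = ["black", "brown", "black", "blonde"]  # nominal: categories only
satisfaction = [1, 3, 4, 4, 5]                       # ordinal: Likert-style ranks
temps_f = [29, 30, 78, 79]                           # interval: equal steps, no true zero
weights_kg = [50.0, 60.0, 75.0]                      # ratio: true zero point

# Nominal: counting and the mode are the only meaningful summaries.
print(mode(hair_colour))                             # "black"

# Ordinal: order is meaningful, so the median is a safe summary.
print(median(satisfaction))                          # 4

# Interval: differences are comparable (29 -> 30 equals 78 -> 79)...
print(temps_f[1] - temps_f[0] == temps_f[3] - temps_f[2])  # True
# ...but ratios are not: 60 F is not "twice as warm" as 30 F.

# Ratio: a true zero point makes ratios meaningful.
print(weights_kg[2] / weights_kg[0])                 # 1.5
```

Note that applying a mean to the Likert codes, or a ratio to the Fahrenheit readings, would run without error but would not be statistically defensible; the scale, not the software, decides what is valid.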
Reference:
Clay, B. 2001. Is This a Trick Question? (A Short Guide to Writing Effective Test Questions).
Kansas State Department of Education.
Garrison, C. & Ehringhaus, M. 1995. Formative and Summative Assessment in the Classroom.
Gronlund, N. E. 1981. Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. 1981. Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. 2008. Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER III
TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain test assessment
- be able to describe the kinds of test assessment
- understand the advantages and disadvantages of using tests
As Figure 3.1 shows, tests constitute only a small set of options, among a wide range
of other options, for a language teacher to make decisions about students. The judgment
emanating from a test is not necessarily more valid or reliable than the one deriving from
qualitative procedures, since both should meet reliability and validity criteria to be considered as
informed decisions.

[Figure 3.1 Alternative assessment: decision making in an educational setting (tests, within
quantitative measurement, alongside non-test, non-measurement qualitative assessment by
description)]

The area circumscribed within quantitative decision-making is relatively small and represents
a specific choice made by the teacher at a particular time in the course,
while the vast area outside, which covers all non-measurement qualitative assessment
procedures, represents the wider range of procedures and their general nature. This means that
the qualitative approaches which result in descriptions of individuals, as contrasted to
quantitative approaches which result in numbers, can go hand in hand with the teaching and
learning experiences in the class and they can reveal more subtle shades of students’
proficiency.
A test is a method of measuring a person's ability, knowledge, or performance in completing
certain tasks or demonstrating mastery of a skill or knowledge of content. A test is a systematic
procedure for observing persons and describing them with either a numerical scale or a
category system. Thus a test may give either qualitative or quantitative information. The two
types of test are objective tests and essay tests. Essay tests are appropriate when:
• The group to be tested is small and the test is not to be reused.
• You wish to encourage and reward the development of student skill in
writing.
• You are more interested in exploring the student’s attitudes than in
measuring his/her achievement.
Objective tests are appropriate when:
• The group to be tested is large and the test may be reused.
• Highly reliable scores must be obtained as efficiently as possible.
• Impartiality of evaluation, fairness, and freedom from possible test
scoring influences are essential.
Either essay or objective tests can be used to 1) measure almost any important
educational achievement a written test can measure, 2) test understanding and ability to
apply principles, 3) test ability to think critically, and 4) test ability to solve problems.
3.1. Objective
Objective tests measure both your ability to remember facts and figures and your
understanding of course materials. These tests are often designed to make you think
independently, so don't count on recognizing the right answer. Instead, prepare yourself for
high level critical reasoning and making fine discriminations to determine the best answer.
Taking an objective examination is somewhat different from taking an essay examination.
The objective examination may be composed of true false, multiple choice, or matching
responses. Also included occasionally is a fill in section. There are certain things that you
must remember to do as you take this kind of test. First, roughly decide how to divide your
time. Quickly glance over the pages to see how many kinds of questions are being used and
how many there are of each kind. Secondly, carefully read the instructions and make sure that
you understand them before you begin to work. Indicate your answers exactly as specified in
the instructions. If your instructor has not indicated whether there is a penalty for guessing,
ask him or her about it; then, if there is a penalty, do not guess.
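When a penalty for guessing is applied, the usual rule is formula scoring: S = R − W/(k − 1), where R is the number right, W the number wrong, and k the number of options per item. A minimal sketch (the function name is ours, but the formula is the standard one):

```python
def corrected_score(right, wrong, n_options):
    """Classic correction-for-guessing: S = R - W / (k - 1).

    With k options per item, a blind guesser is right 1/k of the
    time, so each wrong answer cancels 1/(k - 1) of a right one
    on average, driving the expected guessed score to zero.
    """
    return right - wrong / (n_options - 1)

# A blind guesser on 40 four-option items expects 10 right, 30 wrong:
print(corrected_score(10, 30, 4))   # 0.0
# A student with 30 right and 6 wrong (4 left blank):
print(corrected_score(30, 6, 4))    # 28.0
```

This is why, under such a penalty, leaving an item blank beats guessing blindly; without a penalty, guessing never hurts.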
3.1.1. Multiple choice
Multiple choice is a test that has items formatted as multiple choice questions, and the
candidate must choose which answer or group of answers is correct. A multiple choice
question consists of two parts: 1) the stem, the statement or question that identifies the
question or problem, and 2) the choices; the incorrect choices are also known as distracters.
Usually, students are asked to select the one alternative that best completes a statement or
answers a question.
Multiple choice items can also provide an excellent basis for post test discussion, especially if
the discussion addresses why the incorrect responses were wrong as well as why the correct
responses were right. Unfortunately, multiple choice items are difficult and time consuming
to construct well. They may also appear too discriminating (picky) to students, especially
when the alternatives are well constructed and are open to misinterpretation by students who
read more into questions than is there. Multiple choice tests can be used to test the ability to:
1. Recall memorized information
2. Apply theory to routine cases
3. Apply theory to novel situations
4. Use judgment in analyzing and evaluating
Example of multiple choice:
A three-year-old child can usually be expected to:
a. Cry when separated from his or her mother
b. Have imaginary friends
c. Play with other children of the same age
d. Constantly argue with older siblings
3.1.2. True/false questions
True/false questions present candidates with a binary choice: a statement is either true or
false. This method presents problems with guessing, depending on the number of questions.
The true/false question is also a popular question type with only two options. True/false
questions usually state the relation of two things to one another. Because the instructor is
interested in knowing whether you know when and under what circumstances something is or
is not true, s/he usually includes some qualifiers in the statement. The qualifiers must be
carefully considered. With the following qualifiers, you are wiser to guess "yes" if you don't
know the answer because you may stand some chance of getting the answer right: most, some,
usually, sometimes, and great. On the other hand, with these next qualifiers, you should guess
"no" unless you are certain that the statement is true: all, no, always, is, never, is not, good,
bad, equal, less.
The following are advantages of true or false tests:
• Can test large amounts of content
• Students can answer 3-4 questions per minute
And the disadvantages are:
• They are easy
• It is difficult to discriminate between students that know the material and students who
do not
• Students have a 50-50 chance of getting the right answer by guessing
• Need a large number of items for high reliability
Example of true or false question:
1. Electrons are larger than molecules.
a. True b. False
2. True or false? The study of plants is known as botany.
a. True b. False
3. True or false? Is it recommended to take statements directly from the text to make
good true-false questions?
a. True b. False
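The 50-50 guessing chance above can be quantified: the score a pure guesser obtains on an n-item true/false test follows a binomial distribution, which is why a large number of items is needed for reliable discrimination. A short sketch (the function name is ours):

```python
from math import comb

def p_at_least(n_items, k_correct, p=0.5):
    """Probability that a pure guesser gets at least k_correct of
    n_items right (binomial upper tail; p = chance per item)."""
    return sum(comb(n_items, k) * p ** k * (1 - p) ** (n_items - k)
               for k in range(k_correct, n_items + 1))

# Chance of scoring 70% or better by guessing alone:
print(round(p_at_least(10, 7), 3))   # 0.172 on a 10-item test
print(p_at_least(50, 35) < 0.01)     # True: rare on a 50-item test
```

On a short test, roughly one guesser in six clears 70% by luck; lengthening the test shrinks that chance dramatically, which is the statistical content of the "need a large number of items" disadvantage.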
3.1.3. Matching Questions Type
A matching question is an item that provides a defined term and requires the test taker
to match identifying characteristics to the correct term. Matching questions give students
some opportunity for guessing. Students must know the information well, in that they are
presented with two columns of items between which they must establish relationships. If only
one match is allowed per item, then once items become eliminated, a few of the latter ones
may be guessed. A simple matching item consists of two columns: one column of stems or
problems to be answered, and another column of responses from which the answers are to be
chosen. Traditionally, the column of stems is placed on the left and the column of responses
is placed on the right.
Example:
Directions: match the following!
Water A. NaCl
Discovered radium B. H2O
Salt C. Fermi
Ammonia D. NH3
E. Curie
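The opportunity for guessing in matching items is smaller than it may appear: if each response may be used only once, a student who pairs the columns completely at random gets, on average, just one match right regardless of list length. A quick simulation (the function name is ours):

```python
import random

def mean_lucky_matches(n_items, trials=20000, seed=0):
    """Monte-Carlo estimate of correct pairs when an n-item matching
    exercise is answered by a random one-to-one pairing."""
    rng = random.Random(seed)
    key = list(range(n_items))   # the answer key: item i matches response i
    total = 0
    for _ in range(trials):
        guess = key[:]
        rng.shuffle(guess)       # a random one-to-one pairing
        total += sum(a == g for a, g in zip(key, guess))
    return total / trials

# The expected number of lucky matches stays near 1 for any length:
print(round(mean_lucky_matches(4)))    # 1
print(round(mean_lucky_matches(20)))   # 1
```

This is the classic fixed-point result for random permutations, and it is one reason matching exercises with more premises than responses (or reusable responses) resist guessing even better.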
3.1.4. Completion type
A completion (fill-in-the-blank) item provides the test taker with an identifying
characteristic and requires the test taker to recall the correct term. Completion items are
especially useful in assessing mastery of factual information when a specific word or phrase is
important to know. There are several variants of the completion type. The easier version
provides a word bank of possible words that will fill in the blanks; in some exams all words in
the word bank are used exactly once. If a teacher wants to create a test of medium difficulty,
they can provide a word bank in which some words may be used more than once and others
not at all. The hardest variety is a fill-in-the-blank test in which no word bank is
provided at all. This generally requires a higher level of understanding and memory than a
multiple choice test. Advantages:
• Good for who, what, where, when content
• Minimizes guessing
• Encourages more intensive study. Student must know the answer vs.
recognizing the answer.
• Can usually provide an objective measure of student achievement or
ability
Disadvantages:
• Difficult to assess higher levels of learning because the answers to
completion items are usually limited to a few words
• Difficult to construct so that the desired response is clearly indicated
• May overemphasize memorization of facts
• Questions may have more than one correct answer
• Scoring is time consuming
A completion item requires the student to answer a question or to finish an incomplete
statement by filling in a blank with the correct word or phrase.
For example,
A subatomic particle with a negative electric charge is called a(n) ____________.
3.2. Essay
An essay test is a test that requires the student to compose responses, usually lengthy, up
to several paragraphs. Essay tests measure higher-level thinking. A typical essay test usually
consists of a small number of questions to which the student is expected to recall and organize
knowledge in logical, integrated answers, questions that test higher-level processes such as
analysis, synthesis, evaluation, and creativity. The distinctive feature of essay-type tests is the
freedom of response: pupils are free to select, relate, and present ideas in their own words.
Items such as short answer or essay typically require a test taker to write a response to fulfill
the requirements of the item. In administrative terms, essay items take less time to construct.
As an assessment tool, essay items can test complex learning objectives as well as the
processes used to answer the questions. The items can also provide more realistic and
generalizable tasks. Finally, these items make it difficult for test takers to guess the correct
answers and require test takers to demonstrate their writing skills as well as correct spelling
and grammar.
Uses of essay test:
a. Assess the ability to recall, organize, and integrate ideas.
b. Assess the ability to express oneself in writing.
c. Ability to supply information.
d. Assess student understanding of subject matter.
e. Measure the knowledge of factual information.
The main advantages of essay and short answer items are that they permit students to
demonstrate achievement of such higher level objectives as analyzing and critical thinking.
Written items offer students the opportunity to use their own judgment, writing styles, and
vocabularies. They are less time consuming to prepare than any other item type. Research
indicates that students study more efficiently for essay type examinations than for selection
(multiple choice) tests. Students preparing for essay tests focus on broad issues, general
concepts, and interrelationships rather than on specific details. This studying results in
somewhat better student performance regardless of the type of exam they are given. Essay
tests also give the instructor an opportunity to comment on students' progress, the quality of
their thinking, the depth of their understanding, and the difficulties they may be having.
The following are the advantages of essay tests:
• Students less likely to guess
• Easy to construct
• Stimulates more study
• Allows students to demonstrate ability to organize knowledge, express opinions, show
originality.
Disadvantages:
• Can limit amount of material tested, therefore has decreased validity.
• Subjective, potentially unreliable scoring.
• Time consuming to score.
Types of essay test:
3.2.1. Restricted response
The restricted response question usually limits both the content and the response. The
content is usually restricted by the scope of the topic to be discussed; limitations on the form
of the response are generally indicated in the question. Another way of restricting responses in
essay tests is to base the questions on specific problems. For this purpose, introductory
material like that used in interpretive exercises can be presented. Such items differ from the
objective interpretive exercise only in that essay questions are used instead of
multiple choice or true or false items. Because the restricted response question is more
structured, it is most useful for measuring learning outcomes requiring the interpretation and
application of data in a specific area. Examples of restricted response: describe two situations
that demonstrate the application of the law of supply and demand; state any five definitions of
education.
Advantages of restricted response questions:
• restricted response questions are more structured
• they measure specific learning outcomes
• they provide for more ease of assessment
• any outcome measured by an objective interpretive exercise can be measured by
a restricted response question
3.2.2. Extended response
Extended response questions allow students to select information that they think is
pertinent, to organize the answer in accordance with their best judgment, and to integrate and
evaluate ideas as they think suitable. No restriction is placed on students as to the points they
will discuss or the type of organization they will use; such questions do not set limits on the
length or exact content to be discussed.
Teachers frame these questions in such a way as to give students the maximum possible
freedom to determine the nature and scope of the question, with the response, of course, being
related to the topic and within a stipulated time frame. The student may select
the points he thinks are most important, pertinent, and relevant, and may arrange and
organize the answer in whichever way he wishes, so these are also called
free response questions. This enables the teacher to judge the students' abilities to organize,
integrate, and interpret the material and express themselves in their own words. It also gives an
opportunity to comment on or look into students' progress, the quality of their thinking, the
depth of their understanding, their problem solving skills, and the difficulties they may be
having. These skills interact with each other and with the knowledge and understanding the
problem requires. Thus it is at the levels of synthesis and evaluation and in writing skills that
this type of question makes the greatest contribution. Examples: 1) describe at length the
defects of the present day examination system in the state of Maharashtra, and suggest ways
and means of improving the examination system; 2) describe the character of Hamlet;
3) "global warming is the next step to disaster" (discuss).
Reference:
Clay, B. 2001. Is This a Trick Question? (A Short Guide to Writing Effective Test Questions).
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. 1955. Construct Validity in Psychological Tests. Psychological
Bulletin.
Garrison, C. & Ehringhaus, M. 1995. Formative and Summative Assessment in the Classroom.
Gronlund, N. E. 1981. Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. 1981. Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. 2008. Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER IV
NON TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain non-test assessment
- be able to describe kinds of non-test assessment
A non-test instrument is an instrument other than an academic achievement test. The item
writing procedure for non-test instruments is the same as the procedure for writing learning
achievement tests: construct the test blueprint (lattice), write items according to the blueprint,
review the items, validate them, try them out, and refine them based on the results of the trials.
4.1. Observation
Observation should follow an established plan or checklist organized around concrete,
objective data, and it needs to be tied to the objectives of the course. Through observation,
teachers can assess their students' abilities simply by noting their classroom behavior or
completion of activities. By watching students as they work, teachers can identify signs of struggle and
determine where a child may be experiencing academic difficulties. Because students often do
not realize that they are being observed, teachers can ensure that the picture they receive of
student understanding represents the student's actual abilities. For most practitioners
observation is a feature of everyday working life and practitioners can often be found with a
notebook and pen close to hand to jot down unplanned observations that can be added to
normal recording systems at a later time. However, as previously discussed, specific
observations should be planned. Prior to beginning the observation practitioners should work
through the stages outlined in the previous section and, as a part of this process, the most
appropriate observational method should be selected from the range available. It will also be
helpful to produce a cover sheet including such details as:
• child’s name
• child’s age
• date
• name of observer
• the specific setting or area of setting
• permissions gained
• aims and purpose of observation
• start and finish times.
4.2. Interview
Interviews are the most frequently used method of personnel selection, but also are
used for school admissions, promotions, scholarships, and other awards. Interviews vary in
their content and structure. In a structured interview, questions are prepared before the
interview starts. An unstructured interview simply represents a free conversation between an
interviewer and interviewee, giving the interviewer the freedom to adaptively or intuitively
switch topics. Research has shown that unstructured interviews lack predictive validity or
show lower predictive validity than structured interviews. The best practices for conducting
interviews are:
• High degree of structure
• Selection of questions according to job requirements
• Assessment of aspects that cannot be better assessed with other methods
• Scoring with pre-tested, behavior-anchored rating scales
• Empirical examination of each question
• Rating only after the interview
• Standardized scoring
• Training of interviewers
Structured interviews can be divided into three types:
a. Behavioral description interview involves questions that refer to past behavior in real
situations, also referred to as job-related interview.
b. Situational interview uses questions that require interviewees to imagine hypothetical
situations (derived from critical incidents) and state how they would act in such
situations.
c. Multimodal interview combines the two approaches above and adds unstructured parts
to ensure high respondent acceptance.
Analyses of predictive validity of interviews for job performance have shown that they
are good predictors of job performance, add incremental validity above and beyond general
mental ability, and that behavioral description interviews show a higher validity than
situational interviews. Interviews are less predictive of academic performance as compared to
job-related outcomes. Predictive validity probably also depends on the content of the
interview, but the analyses aggregated interviews with different contents.
4.3. Questionnaire
Questionnaires are the most commonly used method for collecting information from
program participants when evaluating educational and extension programs. There are nine
steps involved in the development of a questionnaire:
1. Decide the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.
4.4. Portfolios
A portfolio is a collection of student work with a common theme or purpose. Like a
photographer's portfolio, it should contain the best examples of all of the student's work. For
subjects that are paper-based, the collection of a portfolio is simple. Homework is a structured
practice exercise that usually plays a part in grading. Sometimes instructors assign reading or
other homework covering the theoretical aspects of the subject matter, so that class time
can be used for more hands-on practical work. In a portfolio assessment, a teacher looks not at
one piece of work as a measure of student understanding, but instead at the body of work the
student has produced over a period of time. To allow for a portfolio assessment, a teacher
must compile student work throughout the term. This is commonly accomplished by
providing each student with a folder in which to store essays or other large activities. Upon
compilation of the portfolio, the teacher can review the body of work and determine the
degree to which the work indicates the student's understanding of the content.
Advantages of Portfolio Assessment
• Assesses what students can do and not just what they know.
• Engages students actively.
• Fosters student-teacher communication and depth of exploration.
• Enhances understanding of the educational process among parents and in the community.
• Provides goals for student learning.
• Offers an alternative to traditional tests for students with special needs.
The use of the portfolio as an assessment tool is a process with multiple steps. The
process takes time, and all of the component parts must be in place before the assessment can
be utilized effectively.
a. Decide on a purpose or theme. General assessment alone is not a sufficient goal for a
portfolio. It must be decided specifically what is to be assessed. Portfolios are most useful
for addressing the student’s ability to apply what has been learned. Therefore, a useful
question to consider is, What skills or techniques do I want the students to learn to apply?
The answer to this question can often be found in the school curriculum.
b. Consider what samples to include. Consider what samples of student work might best illustrate the
application of the standard or educational goal in question. Written work samples, of
course, come to mind. However, videotapes, pictures of products or activities, and
testimonials are only a few of the many different ways to document achievement.
c. Determine how samples will be selected. A range of procedures can be utilized here.
Students, maybe in conjunction with parents and teachers, might select work to be
included, or a specific type of sample might be required by the teacher, the school, or the
school system.
d. Decide whether to assess the process and the product or the product only. Assessing the
process would require some documentation regarding how the learner developed the
product. For example, did the student use the process for planning a short story or
utilizing the experimental method that was taught in class? Was it used correctly?
Evaluation of the process will require a procedure for accurately documenting the process
used. The documentation could include a log or video of the steps or an interview with
the student. Usually, if both the process and the product are to be evaluated, a separate
scoring system will have to be developed for each.
e. Develop an appropriate scoring system. Usually this is best done through the use of a
rubric, a point scale with descriptors that explain how the work will be evaluated. Points
are allotted with the highest quality work getting the most points. If the descriptors are
clear and specific, they become goals for which the student can aim. There should be a
separate scale for each standard being evaluated. For example, if one standard being
assessed is the use of grammatically correct sentence structure, five points might be
allotted if all sentences are grammatically correct. Then, a specific number of errors
would be identified for all other points with zero points given if there are more than a
certain number of errors. It is important that the standards for evaluation be carefully
explained. If we evaluate for clarity of writing, then an operational description of what is
meant by clarity should be provided. Points available should be small enough to be
practical and meaningful; an allotment of 20 points for clarity is not workable because an
evaluator cannot really distinguish between a 17- and an 18-point product with regard to
clarity.
f. Share the scoring system with the students. Qualitative descriptors of how the student
will be evaluated, known in advance, can guide learning and performance.
g. Engage the learner in a discussion of the product. Through the process of discussion the
teacher and the learner can explore the material in more depth, exchange feelings and
attitudes with regard to the product and the learning process, and reap the greatest
advantage of effective portfolio implementation.
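The rubric-based scoring system described in step (e) can be sketched in code. The following Python fragment is only an illustration: the standards, descriptors, and point values are hypothetical, not prescribed by any curriculum, and each standard gets its own small point scale as the text recommends.

```python
# A minimal sketch of a rubric-based portfolio scorer.
# Each standard has a small point scale with descriptors; the evaluator
# picks one defined level per standard. All values here are hypothetical.
rubric = {
    "grammar": {5: "all sentences correct", 3: "1-3 errors", 1: "4+ errors"},
    "clarity": {3: "main idea clear throughout", 2: "mostly clear", 1: "often unclear"},
}

def score_portfolio(awarded):
    """awarded maps each standard to the points the evaluator chose."""
    for standard, points in awarded.items():
        if points not in rubric[standard]:
            raise ValueError(f"{points} is not a defined level for {standard}")
    earned = sum(awarded.values())
    possible = sum(max(levels) for levels in rubric.values())
    return earned, possible

got, possible = score_portfolio({"grammar": 3, "clarity": 2})
print(f"{got}/{possible}")
```

Keeping each scale small, as the text advises, makes the levels distinguishable; a 20-point clarity scale would be impossible to apply consistently.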
4.5. Case Studies and Problem-Solving Assignments
Case studies and problem-solving assignments can be used to apply knowledge. This type
of assignment requires students to place themselves in, or react to, a situation where
their prior learning is needed to solve the problem or evaluate the situation. Case studies
should be realistic and practical, with clear instructions.
4.6. Projects
Projects are usually designed so that students can apply many of the skills they have
developed in the course by producing a product of some kind. Project assignments are
usually given early in the course, with a completion date toward the end of the quarter. By
asking students to complete a project, teachers can see how well their pupils can apply what
they have been taught. Successful completion of a project requires a student to translate their
learning into the completion of a task. Project-based assessment more closely approximates
how students will be assessed in the real world: employers will not ask their employees to
take tests, but will instead judge their merit by the work they complete. A project is an
example of a performance task.
References:
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of
recent research. Personnel Psychology.
Clay, B. (2001). Is This a Trick Question? (A Short Guide to Writing Effective Test Questions).
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Damiani, V. B. (2004). Portfolio Assessment in the Classroom. National Association of
School Psychologists.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Janz, T., Hellervik, L., & Gilmore, D. C. (1986). Behavior Description Interviewing (BDI).
Boston: Allyn & Bacon.
Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational
interview (SI). Journal of Applied Psychology.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
personnel psychology. Psychological Bulletin.
Schuler, H. (2002). Das Einstellungsinterview (Multimodales Interview [MMI]). Göttingen:
Hogrefe.
CHAPTER 5
VALIDITY TEST
Purpose:
- Be able to explain the definition of the validity test
- Be able to explain the function of the validity test
- Be able to test validity
Test validity is the extent to which a test (such as a chemical, physical, or scholastic test)
accurately measures what it purports to measure. Validity is divided into various kinds, such
as content validity, criterion validity, and construct validity.
5.1. Content Validity
Content validity is an estimate of how well a measure represents every single
element of a construct: the extent to which the content of the test matches the instructional
objectives. For example, a semester or quarter exam that only includes content covered
during the last six weeks is not a valid measure of the course's overall objectives; it has very
low content validity.
5.2. Criterion Validity
Criterion validity assesses whether a test reflects a certain set of abilities. If the
criterion is obtained some time after the test is given, one is studying predictive validity. If
the test score and criterion score are determined at essentially the same time, one is studying
concurrent validity.
Concurrent validity measures the test against a benchmark test and high correlation
indicates that the test has strong criterion validity. In concurrent validity, we assess the
operationalization's ability to distinguish between groups that it should theoretically be able to
distinguish between. For example, if we come up with a way of assessing manic depression,
our measure should be able to distinguish between people diagnosed with manic depression
and those diagnosed with paranoid schizophrenia. If we want to assess the concurrent
validity of a new measure of empowerment, we might give the measure to both migrant farm
workers and to the farm owners, theorizing that our measure should show that the farm
owners are higher in empowerment. As in any discriminating test, the results are more
powerful if you are able to show that you can discriminate between two groups that are very
similar. If the end-of-year math tests in 4th grade correlate highly with the statewide math
tests, they would have high concurrent validity.
Predictive validity is a measure of how well a test predicts abilities. It involves testing
a group of subjects for a certain construct and then comparing them with results obtained at
some point in the future. In predictive validity, we assess the operationalization's ability to
predict something it should theoretically be able to predict. For instance, we might theorize
that a measure of math ability should be able to predict how well a person will do in an
engineering-based profession. We could give our measure to experienced engineers and see if
there is a high correlation between scores on the measure and their salaries as engineers. A
high correlation would provide evidence for predictive validity -- it would show that our
measure can correctly predict something that we theoretically think it should be able to
predict.
5.3. Construct Validity
Construct validity is an assessment of how well you translated your ideas or theories
into actual programs or measures; it defines how well a test or experiment measures up to its
claims. A test designed to measure depression must measure only that particular construct,
not closely related constructs such as anxiety or stress. Construct validity refers to the degree
to which inferences can legitimately be made from the operationalizations in your study to
the theoretical constructs on which those operationalizations were based. Like external
validity, construct validity is related to generalizing. But where external validity involves
generalizing from your study context to other people, places, or times, construct validity
involves generalizing from your program or measures to the concept of your program or
measures.
Convergent validity tests that constructs that are expected to be related are, in fact,
related. In convergent validity, we examine the degree to which the operationalization is
similar to (converges on) other operationalizations that it theoretically should be similar to.
For instance, to show the convergent validity of a Head Start program, we might gather
evidence that shows that the program is similar to other Head Start programs. Or, to show the
convergent validity of a test of arithmetic skills, we might correlate the scores on our test
with scores on other tests that purport to measure basic math ability, where high correlations
would be evidence of convergent validity.
Discriminant validity (also referred to as divergent validity) tests that constructs that
should have no relationship do, in fact, have no relationship. In discriminant validity, we
examine the degree to which the operationalization is not similar to (diverges from) other
operationalizations that it theoretically should not be similar to. For instance, to show the
discriminant validity of a Head Start program, we might gather evidence that shows that the
program is not similar to other early childhood programs that don't label themselves as Head
Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might
correlate the scores on our test with scores on tests of verbal ability, where low correlations
would be evidence of discriminant validity.
References:
Clay, B. (2001). Is This a Trick Question? (A Short Guide to Writing Effective Test Questions).
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER VI
RELIABILITY TEST
Purpose:
- Be able to explain the definition of reliability test
- Be able to explain the function of reliability test
- Be able to test reliability
Reliability relates to the consistency of an assessment. Reliability is a necessary but
not sufficient condition for validity. For instance, if the needle of a scale sits five pounds
away from zero, it always over-reports my weight by five pounds. The measurement is
consistent, but it is consistently wrong, so the measurement is not valid. A reliable
assessment is one that consistently achieves the same results with the same (or similar)
cohort of students. Various factors affect reliability, including ambiguous questions, too
many options within a question paper, vague marking instructions, and poorly trained
markers. Reliability testing methods can be divided into two kinds: external consistency and
internal consistency.
6.1. External Consistency Reliability
Reliability as external consistency considers a test reliable if, after being administered
several times, it gives relatively consistent results. The methods included here are the
test-retest method and the parallel forms method.
Table 6.1 Test-Retest and Parallel Forms

No | Method         | Procedure                                                              | Technique
1  | Test-Retest    | The same test is given twice to the same students at different times   | Product-moment correlation (between scores on test 1 and test 2)
2  | Parallel Forms | Two similar/parallel tests are given to the same group of learners     | Product-moment correlation (between scores on instrument 1 and instrument 2)
6.1.1. Test-Retest Reliability
Test-retest reliability is used to assess the consistency of a measure from one time to
another: the reliability of an achievement test is measured by administering the same test
repeatedly. The weakness of this method is that if the time interval is too short, learners may
still remember the tested material at the second administration, so the second test result may
be better than the first.
The reliability coefficient in this case is simply the correlation between the scores
obtained by the same persons on the two administrations of the test. If the first test result
correlates well with the second test result, the test is said to be reliable. The analysis is done
by finding the correlation between the first and second test results, using the Pearson
product-moment correlation coefficient (r). The value of r always falls within the range
-1 to +1.
Example:

No | Student's name | Score test 1 (X) | Score test 2 (Y)
1  | Agustina       | 78               | 80
2  | Feby           | 80               | 85
3  | Antoni         | 77               | 80
4  | Chandra        | 90               | 85
5  | Dionisius      | 70               | 75
6  | Fitriani       | 73               | 78
etc.
The formula:

r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^{2} - (\sum X)^{2}][N\sum Y^{2} - (\sum Y)^{2}]}}

Description:
N = number of students
X = score on test 1
Y = score on test 2
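The product-moment formula can be applied directly to the sample scores in the example above. The following Python sketch implements the raw-score Pearson formula as written in the text, using the six example students' scores as illustrative data.

```python
# Pearson product-moment correlation for test-retest reliability,
# using the raw-score formula from the text:
# r_XY = (N*SumXY - SumX*SumY) / sqrt((N*SumX^2 - (SumX)^2)(N*SumY^2 - (SumY)^2))
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

test1 = [78, 80, 77, 90, 70, 73]  # first administration (X)
test2 = [80, 85, 80, 85, 75, 78]  # second administration (Y)

print(round(pearson_r(test1, test2), 2))  # a high r suggests a reliable test
```

For these six students the coefficient is close to +1, which would support the reliability of the test.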
6.1.2. Parallel Forms Reliability
Parallel forms reliability is used to assess the consistency of the results of two tests
constructed in the same way from the same content domain. This method requires two sets
of questions that have the same goals, level of difficulty, and composition of material, but
with different items; in other words, the two tests must be parallel. The reliability coefficient
is obtained by correlating the results of the first test with the results of the second test.
Example:

No | Student's name | Result of instrument 1 (X) | Result of instrument 2 (Y)
1  | Fransiska      | 78                         | 80
2  | Johnson        | 80                         | 85
3  | Leona          | 77                         | 80
4  | Ratya          | 90                         | 85
5  | Febriyanti     | 70                         | 75
6  | Karmila        | 73                         | 78
etc.
The formula:

r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^{2} - (\sum X)^{2}][N\sum Y^{2} - (\sum Y)^{2}]}}

Description:
N = number of students
X = score from instrument test 1
Y = score from instrument test 2
6.2. Internal Consistency Reliability
Reliability as internal consistency takes the view that a test is reliable if the test items
give consistent measurement results among themselves. The test-retest and parallel forms
methods have the disadvantage that they are time consuming. In most cases the researcher
wants to estimate reliability from a single administration of a test. This requirement has led
to the measurement of internal consistency, or homogeneity. Internal consistency measures
consistency within the tool, and several internal consistency methods exist. All of them have
one thing in common: the measurement is based on the results of a single administration.
The split-half technique and Cronbach's Alpha method can be used to estimate internal
consistency reliability. The statistical analysis for split-half reliability (the Spearman-Brown
and Guttman formulas) and for Cronbach's Alpha can be done with SPSS 17; the split-half
calculation with Flanagan's formula can be done in MS Excel.
6.2.1. Split-Half Reliability Method
In the split-half reliability method, the test is first divided into two equivalent halves
and the correlation coefficient between scores on these half-tests is found. This correlation
coefficient denotes the reliability of the half-test. The self-correlation coefficient of the
whole test is then estimated by various formulas. The measuring instrument can be divided
into two halves in a number of ways, but the best way is to find the correlation coefficient
between scores on odd-numbered and even-numbered items. The correlation coefficient can
be calculated using the following formulas:
a. Spearman-Brown Formula
The Spearman-Brown formula was designed to estimate the reliability of a test n
times as long as one for which we know a self-correlation. From the reliability of the
half-test, the self-correlation coefficient of the whole test is estimated by the following
Spearman-Brown formula:

r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}

Where:
r_tt = reliability of the total test estimated from the reliability of one of its halves
(reliability coefficient of the whole test)
r_hh = self-correlation of a half-test (reliability coefficient of the half-test)
b. Rulon/Guttman's Formula
An alternate method for finding split-half reliability was developed by Rulon. It requires
only the variance of the differences between each person's scores on the two half-tests
and the variance of total scores. These two values are substituted in the following
formula, which yields the reliability of the whole test directly:

r_{tt} = 1 - \frac{SD_{d}^{2}}{SD_{x}^{2}}

Where:
r_tt = reliability of the test
SD_d = SD of the differences between the half-test scores
SD_x = SD of the scores on the whole test
c. Flanagan's Formula
Flanagan gave a parallel formula for finding reliability using the split-half method:

r_{tt} = 2\left(1 - \frac{SD_{1}^{2} + SD_{2}^{2}}{SD_{t}^{2}}\right)

Where:
r_tt = reliability of the test
SD_1 = SD of the scores on the first half
SD_2 = SD of the scores on the second half
SD_t = SD of the scores on the whole test
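Rulon's and Flanagan's formulas are algebraically equivalent, which a quick numeric check can confirm. The half-test scores below are hypothetical, and population variances (the SD² terms in the formulas) are used throughout.

```python
# Rulon's and Flanagan's split-half formulas applied to the same data.
# Both yield the whole-test reliability directly from half-test scores.
from statistics import pvariance  # population variance, i.e. SD^2

def rulon(half1, half2):
    """r_tt = 1 - SD_d^2 / SD_x^2 (difference vs. total score variance)."""
    diffs  = [a - b for a, b in zip(half1, half2)]
    totals = [a + b for a, b in zip(half1, half2)]
    return 1 - pvariance(diffs) / pvariance(totals)

def flanagan(half1, half2):
    """r_tt = 2 * (1 - (SD_1^2 + SD_2^2) / SD_t^2)."""
    totals = [a + b for a, b in zip(half1, half2)]
    return 2 * (1 - (pvariance(half1) + pvariance(half2)) / pvariance(totals))

odd  = [3, 3, 2, 4, 2]   # hypothetical odd-item half-test scores
even = [3, 1, 0, 4, 1]   # hypothetical even-item half-test scores

print(round(rulon(odd, even), 3), round(flanagan(odd, even), 3))
```

Expanding both formulas in terms of the covariance of the two halves shows they each reduce to 4·cov(half1, half2)/SD_t², so the two estimates always agree.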
6.2.2. Cronbach's Alpha Method
Cronbach's Alpha is mathematically equivalent to the average of all possible split-half
estimates. A statistical analysis computer program such as SPSS 17 can be used to calculate
Cronbach's Alpha (α).
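As a sketch of the quantity SPSS computes, Cronbach's alpha can also be calculated by hand from the standard item-variance formula α = k/(k−1) · (1 − Σσᵢ²/σₜ²), where k is the number of items. The score matrix below is hypothetical.

```python
# Cronbach's alpha from a score matrix (rows = students, columns = items):
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
from statistics import pvariance  # population variance

def cronbach_alpha(scores):
    k = len(scores[0])                              # number of items
    item_vars = [pvariance(col) for col in zip(*scores)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# hypothetical responses of 5 students to a 4-item scale
scores = [
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(scores), 2))
```

Values above roughly 0.7 are conventionally taken to indicate acceptable internal consistency, though the threshold depends on the purpose of the test.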
References:
Clay, B. (2001). Is This a Trick Question? (A Short Guide to Writing Effective Test Questions).
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Curriculum Vitae
Kadek Ayu Astiti, S.Pd., M.Pd. was born in Singaraja on September 28, 1988.
She is the second child of the couple Made Suarsini and Ni Ketut Sudi.
Website address: www.kadekayuastiti.blogspot.com. History of
education: elementary school No. 6 Kampung Baru Singaraja-Bali; SMP
Negeri 3 Singaraja-Bali; SMA N 1 Singaraja-Bali; S1 Physics Education
at Ganesha University of Education; S2 Science Education at Ganesha
University of Education. Employment history: laboratory assistant at SMP N 1
Singaraja-Bali (2010-2011); teacher at SMP N 1 Singaraja-Bali (2011-2013); lecturer in
physics education at the University of Nusa Cendana (2014-present).