
Measuring & Evaluating Learning Outcomes




CHAPTER 1

A Perspective on Educational Assessment,

Measurement, and Evaluation

As teaching is causing learning among learners, teachers need to be

thoroughly aware of the processes in determining how successful they are

in the aforementioned task. They need to know whether their students are

achieving successfully the knowledge, skills, and values inherent in their

lessons. For this reason, it is critical for beginning teachers, to build a

repertoire measurement and evaluation of student learning. This chapter is

geared towards equipping you with the basic concepts in educational

assessment, measurement, and evaluation.

Measurement, Assessment, and Evaluation

Measurement as used in education is the quantification of what

students learned through the use of tests, questionnaires, rating scales,

checklists, and other devices. A teacher, for example, who gave his class a

10 – item quiz after a lesson on the agreement of subject and verb is

undertaking measurement of what was learned by the students on that

particular lesson.


Assessment, however, refers to the full range of information

gathered and synthesized by teachers about their students and their

classrooms (Arends, 1994). This information can be gathered in informal

ways, such as through observation or verbal exchange. It can also be

gathered through formal ways, such as assignments, tests, and written

reports or outputs.

While measurement refers to the quantification of students’

performance and assessment to the gathering and synthesizing of

information, evaluation is a process of making judgments, assigning value

or deciding on the worth of students’ performance. Thus, when a teacher

assigns a grade to the score you obtained in a chapter quiz or term

examination, he is performing an evaluative act. This is because he places

value on the information gathered on the test.

Measurement answers the question, how much does a student

learn or know? Assessment looks into how much change has occurred in

the student’s acquisition of a skill, knowledge or value before and after a

given learning experience. Since evaluation is concerned with making

judgments on the worth or value of a performance, it answers the question,

how good, adequate or desirable is it? Measurement and assessment are,

therefore, both essential to evaluation.


Educational Assessment: A Context for Educational

Measurement and Evaluation

As a framework for educational measurement and evaluation,

educational assessment is quite difficult to define. According to Stiggins

and his colleagues (1996), assessment is a method of evaluating personality in which an individual, living in a group, meets and solves a

variety of lifelike problems. From the viewpoint of Cronbach, as cited by

Jaeger (1997), three principal features of assessment are identifiable: (1)

the use of a variety of techniques; (2) reliance on observations in structured

and unstructured situations; and (3) integration of information. The

aforementioned definition and features of assessment are applicable to a

classroom situation. The term personality in the definition of assessment

refers to an individual’s characteristics which may be cognitive, affective

and psychomotor. The classroom setting is essentially social, which

provides both structured and unstructured phases. Even problem – solving

is a major learning task. Holistic appraisal of a learner, his or her

environment, and his or her accomplishments is the principal objective of

educational assessment.


Bloom (1970) has this to say on the process of educational

assessment:

Assessment characteristically starts with an analysis of the

criterion and the environment in which an individual lives, learns,

and works. It attempts to determine the psychological pressures the

environment creates, the role expected, and the demands and

pressures – their hierarchical arrangement, consistency, as well as

conflict. It then proceeds to the determination of the kinds of

evidence that are appropriate about the individuals who are placed

in this environment, such as their relevant strengths and

weaknesses, their needs and personality characteristics, their skills

and abilities.

From the foregoing description of the process of educational

assessment, it is very clear that educational assessment concerns itself

with the total educational setting and is a more inclusive term. This is

because it subsumes measurement and evaluation. It focuses not only on

the nature of the learner but also on what is to be learned and how it is to

be learned. In a real sense, it is diagnostic in intent or purpose. This is due

to the fact that through educational assessment the strengths and

weaknesses of an individual learner can be identified and at the same


time, the effectiveness of the instructional materials used and the

curriculum can be ascertained.

Assessments are continuously being undertaken in all educational

settings. Decisions are made about content and specific objectives, nature

of students and faculty, faculty morale and satisfaction, and the extent to

which student performances meet standards. Payne (2003) describes a

typical example of how assessments can be a basis for decision making:

1. The teacher reviews a work sample, which shows that some column additions are in error and that there are frequent carrying errors.

2. He / She assigns simple problems on succeeding pages and finds consistent addition errors in some number combinations, as well as repeated errors in carrying from one column to another.

3. He / She gives instruction through verbal explanation, demonstration, trial, and practice.

4. The student becomes successful in the calculations made in each preparation step after direct teacher instruction.

5. The student returns to the original pages, completes them correctly, and is monitored closely when new processes are introduced.


From the foregoing example, it can be seen that there is a very close

association between assessment and instruction. The data useful in

decision – making may be derived from informal assessments, such as

observations from interactions or from teacher – made tests.

Informed decision – making in education is very important owing to

the obvious benefits it can bring about (Linn, 1999). Foremost among

these benefits is the development of feelings of competence in academic skills and of one's perception of being able to function effectively in society. Finally, the affective side of development is equally important. Personal dimensions, like feelings of self-worth and being able to adjust to people and cope with various situations, lead to better overall life adjustment.


Purposes of Educational Assessment, Measurement and

Evaluation

Educational assessment, measurement and evaluation serve the

following purposes (Kellough et al., 1993):

Improvement of Student Learning – Knowing how well students

are performing in class can lead teachers to devise ways and means

of improving student learning.

Identification of Students’ Strengths and Weaknesses – Through

measurement, assessment, and evaluation, teachers are able to

single out their students’ strengths and weaknesses. Data on these

strengths and weaknesses can serve as bases for undertaking

reinforcement and / or enrichment activities for the students.

Assessment of the Effectiveness of a Particular Teaching

Strategy – Accomplishment of an instructional objective through the

use of a particular teaching strategy is important to teachers.

Competent teachers continuously evaluate their choice of strategies

on the basis of student achievement.


Appraisal of the Effectiveness of the Curriculum – Through

educational measurement, assessment, and evaluation, various

aspects of the curriculum are continuously evaluated by curriculum

committees on the basis of achievement test results.

Assessment and Improvement of Teaching Effectiveness –

Results of testing are used as a basis for determining teaching effectiveness. Knowledge of the results of testing can provide school administrators with inputs on the instructional competence of teachers

under their charge. Thus, intervention programs to improve teaching

effectiveness can be undertaken by the principals or even

supervisors on account of the results of educational measurement

and evaluation.

Communication with and Involvement of Parents in Their

Children’s Learning – Results of educational measurement,

assessment, and evaluation are utilized by the school teachers in

communicating to parents their children's learning difficulties. Knowing how well their children are performing academically can

lead them to forge a partnership with the school in improving and

enhancing student learning.


Types of Classroom Assessment

There are three general types of classroom assessment teachers

are engaged in (Airisian, 1994). These are as follows: official; sizing up;

and instructional.

Official assessment is undertaken by teachers to carry out the

bureaucratic aspects of teaching, such as giving students grades at the

end of each marking period. This type of assessment can be done through

formal tests, term papers, reports, quizzes, and assignments. Evidence

sought by teachers in official assessment is mainly cognitive.

Sizing up assessment, however, is done to provide teachers with

information regarding the students’ social, academic, and behavioral

characteristics at the beginning of each school year. Information gathered

by teachers, in this type of assessment, provides a personality profile of

each of these students to boost instruction and foster communication and

cooperation in the classroom.

Instructional assessment is utilized in planning instructional

delivery and monitoring the progress of teaching and learning. It is

normally done daily throughout the school year. It, therefore, includes

decisions on lessons to teach, teaching strategy to employ, and

instructional materials and resources to use in the classroom.


Methods of Collecting Assessment Data

Airisian (1994) identified two basic methods of collecting information

about the learners and instruction, namely: paper and pencil; and

observational techniques.

When the learners put down into writing their answers to questions

and problems, the assessment method is the paper – and – pencil technique.

Paper and pencil evidence that teachers are able to gather includes tests

taken by students, maps drawn, written reports, completed assignments, and practice exercises. By examining this evidence, teachers are able

to gather information about their students’ progress.

There are two general types of paper and pencil techniques: supply

and selection. Supply type requires the student to produce or construct an

answer to the question. Book report, essay question, class project, and

journal entry are examples of the supply – type of paper and pencil

technique.

Selection type, on the other hand, requires the student to choose

the correct answer from a list of choices or options. Multiple choice,

matching tests, and alternate response tests are examples of this technique, as the students

answer questions by simply choosing an answer from a set of options

provided.


The second method teachers utilize is observation. This method

involves watching the students as they perform certain learning tasks like

speaking, reading, performing laboratory investigation and participating in

group activities.

Sources of Evaluative Information

To be able to make correct judgments about students’ performance,

there is a need for teachers to gather accurate information. Thus, teachers

have to be familiar with the different sources of evaluative information.

Cumulative Record. It holds all the information collected on

students over the years. It is usually stored in the principal’s office or

guidance office and contains such things as vital statistics, academic

records, conference information, health records, family data and scores on

tests of aptitude, intelligence, and achievement. It may also contain

anecdotal and behavioral comments from previous teachers. These

comments are useful in understanding the causes of the students’

academic and behavioral problems.


Personal Contact. It refers to the teacher’s daily interactions with

his / her students. A teacher's observations of students as they work and relax, as well as daily conversations with them, can provide valuable clues that will be of great help in planning instruction. Observing students

not only tells the teacher how well students are doing but allows him / her

to provide them with immediate feedback. Observational information is

available in the classroom as the teacher watches and listens to students

in various situations. Examples of these situations are as follows:

1. Oral Reading. Can the student read well or not?

2. Answering Questions. Does the student understand concepts?

3. Following Directions. Does the student follow specified

instruction?

4. Seatwork. Does the student stay on – task?

5. Interest in the Subject. Does the student participate actively in

learning activities?

6. Using Instructional Materials. Does the student use the

material correctly?

Through accurate observations, a teacher can determine whether

the students are ready for the next lesson. He / She can also identify those

students who are in need of special assistance.


Analysis. Through a teacher’s analysis of the errors committed by

students, he / she can be provided with much information about their

attitude and achievement. Analysis can take place either during or

following instruction. Through analysis, the teacher will be able to identify

immediately students’ learning difficulties. Thus, teachers have to file

samples of students’ work for discussion during parent – teacher

conferences.

Open – ended Themes and Diaries. One technique that can be

used to provide information about students is by asking them to write about

their lives in and out of the school. Some questions that students can be

asked to react to are as follows:

1. What things do you like and dislike about school?

2. What do you want to become when you grow up?

3. What things have you accomplished which you are proud of?

4. What subjects do you find interesting? Uninteresting?

5. How do you feel about your classmates?

The use of diaries is another method for obtaining data for

evaluative purposes. A diary can consist of a record, written every 3 or 4

days, in which students write about their ideas, concerns, and feelings. An

analysis of students’ diaries often gives valuable evaluative information.


Conferences. Conferences with parents and the students’ previous

teachers can also provide evaluative information. Parents often have

information which can explain why students are experiencing academic

problems. Previous teachers can also describe students’ difficulties and

the techniques they employed in correcting them. Guidance counselors

can also be an excellent source of information. They can also shed light on

test results and personality factors, which might affect students’

performance in class.

Testing. Through testing, teachers can measure students’ cognitive

achievement, as well as their attitudes, values, feelings, and motor skills. It is

probably the most common measurement technique employed by teachers

in the classroom.


Types of Evaluation

Teachers need continuous feedback in order to plan, monitor, and

evaluate their instruction. Obtaining this feedback may take any of the

following types: diagnostic, formative, and summative.

Diagnostic evaluation is normally undertaken before instruction, in

order to assess students’ prior knowledge of a particular topic or lesson. Its

purpose is to anticipate potential learning problems and group / place

students in the proper course or unit of study. Placement of some

elementary school children in special reading programs based on a

reading comprehension test is an example of this type of evaluation.

Requiring entering college freshmen to enroll in Math Plus based on the

results of their entrance test in Mathematics is another example.

Diagnostic evaluation can also be called pre – assessment, since it

is designed to check the ability levels of the students in some areas so that

instructional starting points can be established. Through this type of

evaluation, teachers can be provided with valuable information concerning students' knowledge, attitudes, and skills when they begin studying a subject, which can be employed as a basis for remediation or

special instruction. Diagnostic evaluation can be based on teacher – made

tests, standardized tests or observational techniques.


Formative evaluation is usually administered during the

instructional process to provide feedback to students and teachers on

how well the former are learning the lesson being taught. Results of this

type of evaluation permit teachers to modify instruction as needed. Remedial

work is normally done to remedy deficiencies noted and bring the slow

learners to the level of their classmates or peers. Basically, formative

evaluation asks, "How are my students doing?" It uses pretests, homework,

seatwork, and classroom questions. Results of formative evaluation are

neither recorded, nor graded but are used for modifying or adjusting

instruction.

Summative evaluation is undertaken to determine students’

achievement for grading purposes. Grades provide teachers with the rationale for passing or failing students, based on a wide range of

accumulated behaviors, skills, and knowledge. Through this type of

evaluation, students' accomplishments during a particular marking term

are summarized or summed up. It is frequently based on cognitive

knowledge, as expressed through test scores and written outputs.

Examples of summative evaluation are chapter tests, homework

grades, completed project grades, periodical tests, unit test and

achievement tests.


This type of evaluation answers the question, "How did my students

fare?” Results of summative evaluation can be utilized not only for judging

student achievement but also for judging the effectiveness of the teacher

and the curriculum.

Approaches to Evaluation

According to Escarilla and Gonzales (1990), there are two

approaches to evaluation, namely: norm – referenced and criterion –

referenced.

Norm – referenced evaluation is one wherein the performance of a

student in a test is compared with the performance of the other students

who took the same examination. The following are examples of norm –

referenced evaluation:

1. Karl’s score in the periodical examination is below the mean.

2. Cynthia ranked fifth in the unit test in Physics.

3. Rey’s percentile rank in the Math achievement test is 88.


Criterion – referenced evaluation on the other hand, is an

approach to evaluation wherein a student’s performance is compared

against a predetermined or agreed upon standard. Examples of this

approach are as follows:

1. Sid can construct a pie graph with 75% accuracy.

2. Yves scored 7 out of 10 in the spelling test.

3. Lito can encode an article with no more than 5 errors in

spelling.
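To make the contrast concrete, the short Python sketch below interprets the same quiz score in both ways. The class scores, the student's score, and the 75% passing criterion are invented for illustration, and the percentile rank is computed using one common convention (the percentage of scores falling below the given score).

    # Interpreting one student's 10-item quiz score two ways.
    # All numbers here are illustrative, not taken from the text.
    class_scores = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
    student_score = 8

    # Norm-referenced reading: standing relative to the other students.
    below = sum(1 for s in class_scores if s < student_score)
    percentile_rank = 100 * below / len(class_scores)
    print(f"Percentile rank: {percentile_rank:.0f}")

    # Criterion-referenced reading: comparison against a preset standard.
    criterion = 0.75 * 10  # assumed standard: at least 75% of the items correct
    print("Meets the criterion" if student_score >= criterion else "Below the criterion")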


REFERENCES

Airisian, P.W. (1994). Classroom Assessment, 2nd Ed. New York: McGraw

Hill, Inc.

Bloom, B.S. (1970). The Evaluation of Instruction: Issues and Problems.

New York: Holt, Rinehart & Winston.

Clark, J. & I. Starr (1977). Secondary School Teaching Methods. New

York: Macmillan Publishing Company.

Escarilla, E. R. & E. A. Gonzales (1990). Measurement and Evaluation in

Secondary Schools. Makati: Fund for Assistance to Private

Education (FAPE).

Jaeger, R. M. (1997). Educational Assessment: Trends and Practices.

New York: Holt, Rinehart & Winston.

Kellough, R. D., et al. (1993). Middle School Teaching Methods and Resources. New York: Macmillan Publishing Company.

Payne, D. A. (2003). Measuring and Evaluating Educational Outcomes.

New York: Macmillan Publishing Company.


CHAPTER 2

Tests and Their Uses in Educational Assessment

The most common and important aspect of student evaluation in most

classrooms involves the tests teachers make and administer to their

students (Gronlund & Linn, 1990). Teachers, therefore, need to

understand the different types of tests and their uses in the assessment

and evaluation of the students’ learning. This chapter orients prospective

teachers on tests and their uses in education.

Test Defined

A test is a systematic procedure for measuring an individual’s

behavior (Brown, 1991). This definition implies that it has to be developed

following specific guidelines. It is a formal and systematic way of gathering

information about the learners’ behavior, usually through paper – and –

pencil procedure (Airisian, 1989).

Through testing, teachers can measure students’ acquisition of

knowledge, skills, and values in any learning area in the curriculum. While

testing is the most common measurement technique teachers use in the

classroom, there are certain limitations in its use. As pointed out by

Moore (1992), tests cannot measure student motivation, physical


limitations and even environmental factors. The foregoing indicates that

testing is only one measure of students' learning and achievement.

Uses of Tests

Tests serve a lot of functions for school administrators, supervisors,

teachers, and parents, as well (Arends, 1994; Escarilla & Gonzales, 1990).

School administrators utilize test results for making decisions

regarding the promotion or retention of students; improvement or

enrichment of the curriculum; and conduct of staff development programs

for teachers. Through test results, school administrators can also have a

clear picture of the extent to which the objectives of the school’s

instructional program are achieved.

Supervisors use test results in discovering learning areas needing

special attention and identifying teachers’ weaknesses and learning

competencies not mastered by the students. Test results can also provide

supervisors baseline data on curriculum revision.

Teachers, on the other hand, utilize tests for numerous purposes.

Through testing, teachers are able to gather information about the

effectiveness of instruction; give feedback to students about their progress;

and assign grades.


Parents, too, derive benefits from tests administered to their

children. Through test scores, they are able to determine how well their

sons and daughters are faring in school and how well the school is doing

its share in educating their children.

Types of Tests

Numerous types of tests are used in school. There are different

ways of categorizing tests, namely: ease of quantification of response, mode of response, mode of administration, test constructor, mode of

interpreting results, and nature of response (Manarang & Manarang, 1983;

Louisell & Descamps, 1992).

As to mode of response, tests can be oral, written, or performance.

1. Oral Test – It is a test wherein the test taker gives his answer

orally.

2. Written Test – It is a test where answers to questions are

written by the test taker.

3. Performance Test – It is one in which the test taker creates

an answer or a product that demonstrates his knowledge or

skill, as in cooking and baking.


As to ease of quantification of response, tests can either be

objective or subjective.

1. Objective Test – It is a paper and pencil test wherein

students’ answers can be compared and quantified to yield a

numerical score. This is because it requires convergent or

specific response.

2. Subjective Test – It is a paper – and – pencil test which is not

easily quantified as students are given the freedom to write

their answer to a question, such as an essay test. Thus, the

answer to this type of test is divergent.

As to mode of administration, tests can either be individual or

group.

1. Individual Test – It is a test administered to one student at a

time.

2. Group Test – It is one administered to a group of students

simultaneously.


As to test constructor, tests can be classified into standardized and

unstandardized.

1. Standardized Test – It is a test prepared by an expert or

specialist. This type of test samples behavior under uniform

procedures. Questions are administered to students with the

same directions and time limits. Results in this kind of test are

scored following a detailed procedure based on its manual

and interpreted based on specified norms or standards.

2. Unstandardized Test – It is one prepared by teachers for use

in the classroom, with no established norms of scoring and

interpretation of results. It is constructed by a classroom

teacher to meet a particular need.

As to the mode of interpreting results, tests can either be norm –

referenced or criterion – referenced.

1. Norm – referenced Test – It is a test that evaluates a

student’s performance by comparing it to the performance of a

group of students on the same test.

2. Criterion – referenced Test – It is a test that measures a

student’s performance against an agreed upon or pre –

established level of performance.


As to the nature of the answer, tests can be categorized into the

following types: personality, intelligence, aptitude, achievement,

summative, diagnostic, formative, socio – metric, and trade.

1. Personality Test – It is a test designed for assessing some

aspects of an individual’s personality. Some areas tested in

this kind of test include the following: emotional and social

adjustment; dominance and submission; value orientation;

disposition; emotional stability; frustration level; and degree of

introversion or extroversion.

2. Intelligence Test – It is a test that measures the mental ability

of an individual.

3. Aptitude Test – it is a test designed for the purpose of

predicting the likelihood of an individual’s success in a

learning area or field of endeavor.

4. Achievement Test – It is a test given to students to determine

what a student has learned from formal instruction in school.

5. Summative Test – It is a test given at the end of instruction to

determine students’ learning and assign grades.


6. Diagnostic Test – It is a test administered to students to

identify their specific strengths and weaknesses in past and

present learning.

7. Formative Test – It is a test given to improve teaching and

learning while it is going on. A test given after teaching the

lesson for the day is an example of this type of test.

8. Socio – metric Test – It is a test used in discovering learners’

likes and dislikes, preferences, and their social acceptance, as

well as social relationships existing in a group.

9. Trade Test – It is a test designed to measure an individual’s

skill or competence in an occupation or vocation.


CHAPTER 3

Assessment of Learning in the Cognitive Domain

Learning and achievement in the cognitive domain are usually

measured in school through the use of paper – and – pencil tests (Oliva,

1988). Teachers have to measure students’ achievement in all the levels of

the cognitive domain. Thus, they need to be cognizant of the procedures in

the development of the different types of paper – and – pencil tests. This

chapter is focused on acquainting prospective teachers with methods and

techniques of measuring learning in the cognitive domain.

Behaviors Measured and Assessed in the Cognitive Domain

There are three domains of behavior measured and assessed in

schools. The most commonly assessed, however, is the cognitive domain.

The cognitive domain deals with the recall or recognition of knowledge and the development of intellectual abilities and skills. It is organized into six hierarchical levels, namely: knowledge,

comprehension, application, analysis, synthesis, and evaluation.

1. Knowledge Level: behaviors related to recognizing and

remembering facts, concepts, and other important data on any

topic or subject.


2. Comprehension Level: behaviors associated with the

clarification and articulation of the main idea of what students are

learning.

3. Application Level: behaviors that have something to do with

problem – solving and expression, which require students to

apply what they have learned to other situations or cases in their

lives.

4. Analysis Level: behaviors that require students to think critically,

such as looking for motives, assumptions, cause – effect

relationship, differences and similarities, hypotheses, and

conclusions.

5. Synthesis Level: behaviors that call for creative thinking, such

as combining elements in new ways, planning original

experiments, creating original solutions to a problem and building

models.

6. Evaluation Level: behaviors that necessitate judging the value

or worth of a person, object, or idea or giving opinion on an issue.


Preparing for Assessment of Cognitive Learning

Prior to the construction of a paper – and – pencil test to be used in the

measurement of cognitive learning, teachers have to answer the following

questions (Airisian, 1994): What should be tested; what emphasis to give

to the various objectives taught; whether to administer a paper and pencil

test or observe each student directly; how long the test should take; and

how best to prepare students for testing.

What Should Be Tested. Identification of the information, skills, and

behaviors to be tested is the first important decision that a teacher has to

take. Knowledge of what shall be tested will enable a teacher to develop

an appropriate test for the purpose. The basic rule to remember, however,

is that testing emphasis should parallel teaching emphasis.

How to Gather Information About What to Test. A teacher has to

decide whether he should give a paper and pencil test or simply gather

information through observation. Should he decide to use a paper – and – pencil test, then he has to construct appropriate test items. If he decides to use observation of students' performance of the targeted skill, then he has to develop appropriate devices to use in recording his observations. Decisions on how to gather information about what to test depend on the objective or the nature of the behavior to be

tested.


How Long the Test Should Be. The answer to the aforementioned

question depends on the following factors: age and attention span of the

students; and type of questions to be used.

How Best to Prepare Students for Testing. To prepare students

for testing, Airisian (1994) recommends the following measures: (1)

providing learners with good instruction; (2) reviewing students before

testing; (3) familiarizing students with question formats; (4) scheduling the

test; and (5) providing students information about the test.

Assessing Cognitive Learning

Teachers use two types of tests in assessing student learning in the

cognitive domain: objective test and essay test (Reyes, 2000). An objective

test is a kind of test wherein there is only one answer to each item. On the

other hand, an essay test is one wherein the test taker has the freedom to

respond to a question based on how he feels it should be answered.


Types of Objective Tests

There are generally two types of objective tests: supply type and

selection type (Carey, 1995). In the supply type, the student constructs

his / her own answer to each question. Conversely, the student chooses

the right answer to each item in the selection type of objective test.

Supply types of Objective Tests: The following types of tests fall

under the supply type of test: completion drawing type, completion

statement type, correction type, identification type, simple recall type, and

short explanation type (Ebel & Frisbie, 1998).

Completion Drawing Type – an incomplete drawing is

presented which the student has to complete.

Example: In the following food web, draw arrow lines

indicating which organisms are consumers and

which are producers.

Completion Statement Type – an incomplete sentence is

presented and the student has to complete it by filling in the

blank.

Example: The capital city of the Philippines is

__________________.


Correction Type – a sentence with underlined word or phrase

is presented, which the student has to replace to make it right.

Example: Change the underlined word / phrase to make

each of the following statements correct. Write

your answer on the space before each number.

__________ 1. The theory of evolution was popularized by

Gregor Mendel.

__________ 2. Hydrography is the study of oceans and ocean

currents.

Identification Type – a brief description is presented and the

student has to identify what it is.

Example: To what does each of the following refer? Write

your answer on the blank before each number.

__________ 1. A flat representation of all curved surfaces of

the earth.

__________ 2. The transmission of parents’ characteristics

and traits to their offspring.


Simple Recall Type – a direct question is presented for the

student to answer using a word or phrase.

Example: What is the product of two negative numbers?

Who is the national hero in the Philippines?

Short Explanation Type – similar to an essay test but

requires a short answer.

Example: Explain in a complete sentence why the

Philippines was not really discovered by

Magellan.


Selection Types of Objective Test. Included in the category of selection type are the arrangement type, matching type, multiple choice type, alternate response type, key list test, and interpretive exercise.

Arrangement Type – Terms or objects are to be arranged by

the students in a specified order.

Example 1: Arrange the following events chronologically by

writing the letters A, B, C, D, E on the spaces

provided.

_______ Glorious Revolution _______ Russian Revolution

_______ American Revolution _______ French Revolution

_______ Puritan Revolution

Example 2: Arrange the following planets according to their

nearness to the sun, by using numbers, 1, 2, 3,

4, 5.

_______ Pluto _______Jupiter _______ Saturn

_______ Venus _______ Mars


Matching Type – A list of numbered items is to be matched with a list of lettered choices.

Example: Match the country in Column 1 with its capital city in

Column 2. Write letters only.

Column 1 Column 2

________ 1. Philippines a. Washington D. C.

________ 2. Japan b. Jeddah

________ 3. United States c. Jerusalem

________ 4. Great Britain d. Manila

________ 5. Israel e. London

f. Tokyo

g. New York

Multiple Choice Type – This type contains a question, problem, or unfinished sentence followed by several responses.

Example: The study of value is (a) axiology (c) epistemology

(b) logic (d) metaphysics.


Alternative Response Type – A test wherein there are only

two possible answers to the question. The true – false format is a

form of alternative response type. Variations on the true – false

include yes – no, agree – disagree, and right – wrong.

Example: Write True, if the statement is true; False, if it is false.

_________ 1. Lapulapu was the first Asian to repulse European

colonizers in Asia.

_________ 2. Magellan's expedition to the Philippines led to the

first circumnavigation of the globe.

_________ 3. The early Filipinos were uncivilized before the

Spanish conquest of the archipelago.

_________ 4. The Arabs introduced Islam in Southern

Philippines.


Key List Test – A test wherein the student has to examine

paired concepts based on a specified set of criteria (Oliva, 1998).

Example: Examine the paired items in Column 1 and Column 2.

On the blank before each number, write:

A = If the item in column 1 is an example of the item in column 2;

B = If the item in column 1 is a synonym of the item in column 2;

C = If the item in column 2 is the opposite of the item in column 1; and
D = If the items in columns 1 and 2 are not related in any way.

Column 1 Column 2

_____ 1. capitalism economic system

_____ 2. labor intensive capital intensive

_____ 3. Planned economy command economy

_____ 4. opportunity cost demand and supply

_____ 5. free goods economic goods

Interpretive Exercise – It is a form of a multiple choice type

of test that can assess higher cognitive behaviors. According to

Airisian (1994) and Mitchell (1992), an interpretive exercise provides students with some information or data followed by a series of

questions on that information. In responding to the questions in

an interpretive exercise, the students have to analyze, interpret,


or apply the material provided, like a map, excerpt of a story,

passage of a poem, data matrix, table or cartoon.

Example: Examine the data on child labor in Europe during the

period immediately after the Industrial Revolution in

the continent. Answer the questions given below by encircling the letter of your choice.


TABLE 1
Child Labor in the Years Right After the Industrial Revolution in Europe

Year          Number of Child Laborers
1750          1800
1760          3000
1770          5000
1780          3400
1790          1200
1800          600
1820          150

1. The employment of child labor was greatest in ____________.

a. 1750 c. 1770

b. 1760 d. 1780

2. As industrialization became rapid, what year indicated a

sudden increase in the number of child laborers?

a. 1760 c. 1780

b. 1770 d. 1790

3. Labor unions and government policies were responsible for addressing the problems of child labor. In what year was this evident?

a. 1780 c. 1800



b. 1790 d. 1820

Essay Test

This type of test presents a problem or question and the student is to

compose a response in paragraph form, using his or her own words and

ideas. There are two forms of the essay test: brief or restricted; and

extended.

Brief or Restricted Essay Test – This form of the essay test

requires a limited amount of writing or requires that a given

problem be solved in a few sentences.

Example: Why did early Filipino revolts fail? Cite and explain

2 reasons.

Extended Essay Test – This form of the essay test requires a

student to present his answer in several paragraphs or pages of

writing. It gives students more freedom to express ideas and

opinions and use synthesizing skills to change knowledge into a

creative idea.

Example: Explain your position on the issue of charter change

in the Philippines.


According to Reyes (2000) and Gay (1985), the essay test is

appropriate to use when learning outcomes cannot be adequately

measured by objective test items. Nevertheless, all levels of cognitive

behaviors can be measured with the use of the essay test as shown below.

Knowledge Level – Explain how Siddhartha Gautama became the Buddha.

Comprehension Level – What does it mean when a person

has crossed the Rubicon?

Application Level – Cite three instances showing the

application of the Law of Supply and Demand.

Analysis Level – Analyze the annual budget of your college

as to categories of funds, sources of funds, major

expenditures, and needs of your college.


Synthesis Level – Discuss the significance of the People’s

Power Revolution in the restoration of democracy in the

Philippines.

Evaluation Level – Are you in favor of the political platform of

the People’s Reform Party? Justify your answer.

Choosing the type of test depends on the teacher’s purpose and the

amount of time to be spent for the test. As a general rule, teachers must

create specific tests that will allow students to demonstrate targeted

learning competencies.


CHAPTER 4

An Introduction to the Assessment of Learning in the

Psychomotor and Affective Domains

As pointed out in the previous chapter, there are three domains of

learning objectives that teachers have to assess. While it is true that

achievement in the cognitive domain is the one teachers measure most

frequently, students’ growth in non – cognitive domains of learning should

also be given equal emphasis. This chapter expounds different ways by

which learning in the psychomotor and affective domains can be assessed

and evaluated.

Levels of Learning in the Psychomotor Domain

The psychomotor domain of learning is focused on processes and

skills involving the mind and the body (Eby & Kujawa, 1994). It is the

domain of learning which classifies objectives dealing with physical

movement and coordination (Arends, 1994; Simpson, 1966). Thus,

objectives in the psychomotor domain require significant motor

performance. Playing a musical instrument, singing a song, drawing,

dancing, putting a puzzle together, reading a poem and presenting a


speech are examples of skills developed in the aforementioned domain of

learning.

There are three levels of psychomotor learning: imitation,

manipulation and precision (Gronlund, 1970).

Imitation is the ability to carry out the basic rudiments of a skill

when given directions and under supervision. At this level the

total act is not performed skillfully. Timing and coordination of

the act are not yet refined.

Manipulation is the ability to perform a skill independently.

The entire skill can be performed in sequence. Conscious

effort is no longer needed to perform the skill, but complete

accuracy has not been achieved yet.

Precision is the ability to perform an act accurately, efficiently,

and harmoniously. Complete coordination of the skill has been

acquired. The skill has been internalized to such an extent that it

can be performed unconsciously.

Based on the foregoing list of objectives, it can be noted that these

objectives range from simple reflex reactions to complex actions, which

communicate ideas or emotions to others. Moreover, these objectives

serve as a reminder to every teacher that students under his charge have


to learn a variety of skills and be able to think and act in simple and

complex ways.

Measuring the Acquisition of Motor and Oral Skills

There are two approaches that teachers can use in measuring the

acquisition of motor and oral skills in the classroom: observation of student

performance and evaluation of student products (Gay, 1990).

Observation of Student Performance is an assessment approach

in which the learner does the desired skill in the presence of the teacher.

For instance, in a Physical Education class, the teacher can directly observe

how male students dribble and shoot the basketball. In this approach, the

teacher observes the performance of a student, gives feedback, and keeps

a record of his performance, if appropriate.

Observation of student performance can either be holistic or

atomistic (Louisell & Descamps, 1992). Holistic observation is employed

when the teacher gives a score or feedback based on pre – established

prototypes of how an outstanding, average, or deficient performance looks.

Prior to the observation, the teacher describes the different levels of

performance.


A teacher, for example, who required his students to make an oral

report on research they undertook describes the factors which go into an

ideal presentation. What the teacher may consider in grading the report,

include the following: knowledge of the topic; organization of the

presentation of the report; enunciation; voice projection; and enthusiasm.

The ideal presentation has to be described and the teacher has to comment on

each of these factors. A student whose presentation closely matches the

ideal described by the teacher would receive a perfect mark.

The second type of observation that can be utilized is atomistic or

analytic. This type of observation requires that a task analysis be

conducted in order to identify the major subtasks involved in the student

performance. For example, in dribbling the ball, the teacher has to identify

movements necessary to perform the task. Then, he has to develop a

checklist which enumerates the movements necessary to the performance

of the task. These movements are demonstrated by the teacher. As students

perform the dribbling of the ball, the teacher assigns checkmarks for each

of the various subtasks. After the student has performed the specified

action, all checkmarks are considered and an assessment of the

performance is made.
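A minimal sketch of how such an analytic checklist might be tallied is given below; the subtasks listed for dribbling and the observed checkmarks are hypothetical, not drawn from the text.

    # Atomistic (analytic) observation: each subtask identified by the task
    # analysis is checked off as the student performs it, then summarized.
    # Subtask names and observations here are purely illustrative.
    dribbling_checklist = {
        "keeps head up": True,
        "uses fingertips rather than palm": True,
        "keeps the ball below the waist": False,
        "shields the ball with the body": True,
        "dribbles with either hand": False,
    }

    performed = sum(dribbling_checklist.values())
    total = len(dribbling_checklist)
    print(f"Subtasks performed: {performed} of {total} ({100 * performed / total:.0f}%)")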


Evaluation of Student Products is another approach that teachers

can use in the assessment of students’ mastery of skills. For example,

projects in different learning areas may be utilized in assessing students’

progress. Student products include drawings, models, construction paper

products, etc.

The same principles involved in holistic and atomistic observations

apply to the evaluation of projects. The teacher has to identify prototypes

representing different levels of performance for a project or do a task

analysis and assign scores by subtasks. In either case, the students have to be informed of the criteria and procedures to be used in the assessment of their work.


Assessing Performance through Student Portfolios

Portfolio assessment is a new form of assessing students’

performance (Mitchell, 1992). A portfolio is but a collection of the students’

work (Airisian, 1994). It is used in the classroom to gather a series of

students’ performances or products that show their accomplishment and /

or improvement over time. It consists of carefully selected samples of the

students’ work indicating their growth and development in some curricular

goals. The following can be included in a student’s portfolio: representative

pieces of his / her writing; solved math problems; projects and puzzles

completed; artistic creations; videotapes of performance; and even tape

recordings.

Wolf (1989) says that portfolios can be used for the following

purposes:

Providing examples of student performance to parents;

Showing student improvement over time;

Providing a record of students’ typical performances to pass on to

the next year’s teacher;

Identifying areas of the curriculum that need improvement;

Encouraging students to think about what constitutes good

performance in a learning area; and

Grading students.


According to Airisian (1994), there are four steps to consider in

making use of this type of performance assessment: (1) establishing a

clear purpose; (2) setting performance criteria; (3) creating an appropriate

setting; and (4) forming scoring criteria or predetermined rating.

Purpose is very important in carrying out portfolio assessment. Thus,

there is a need to determine beforehand the objective of the assessment

and the guidelines for student products that will be included in the portfolio

prior to compilation.

While teachers need to collaborate with their colleagues in setting common criteria, it is crucial that they involve their students in setting standards of performance. This will enable the latter to claim ownership

over their performance.

Portfolio assessment also needs to consider the setting in which

students’ performance will be gathered. Shall it be a written portfolio? Shall

it be a portfolio of oral or physical performances, science experiments,

artistic productions and the like? Setting has to be looked into since

arrangements have to be made on how desired performance can be

properly collected.


Lastly, scoring methods and judging students’ performance are

required in portfolio assessment. Scoring students' portfolios, however, is

time consuming as a series of documents and performances has to be

scrutinized and summarized. Rating scales, anecdotal records, and

checklists can be used in scoring students’ portfolios. The content of a

portfolio, however, can be reported in the form of a narrative.

Tools for Measuring Acquisition of Skills

As pointed out previously, observation of student performance and

evaluation of student products are ways by which teachers can measure

the students’ acquisition of motor and oral skills. To overcome the problem

relating to validity and reliability, teachers can use rating scales, checklists

or other written guides to help them come up with unbiased or objective

observations of student performance.

A rating scale is nothing but a series of categories arranged in order of quality. It can be helpful in judging skills, products, and procedures. According to Reyes (2000), the following steps are to be followed in constructing a rating scale:

Identify the qualities of the product or performance to be assessed.

Create a scale for each quality or performance aspect.

Arrange the scales either from positive to negative or vice versa.

Write directions for accomplishing the rating scale.

Following is an example of a rating scale for judging a student

teacher's presentation of a lesson.

Rating Scale for Lesson Presentation

Student Teacher ___________________________ Date ______________

Subject _____________________________________________________

Rate the student teacher on each of the skill areas specified below.

Use the following code: 5 = Outstanding; 4 = Very satisfactory; 3 =

Satisfactory; 2 = Fair; 1 = Needs improvement. Encircle the number

corresponding to your rating.

5 4 3 2 1 Audience contact

5 4 3 2 1 Enthusiasm

5 4 3 2 1 Speech quality and delivery

5 4 3 2 1 Involvement of the audience

5 4 3 2 1 Use of non – verbal communication


5 4 3 2 1 Use of questions

5 4 3 2 1 Directions and refocusing

5 4 3 2 1 Use of reinforcement

5 4 3 2 1 Use of teaching aids and instructional materials
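As a rough illustration of how a completed scale might be summarized, the sketch below averages one rater's marks across the skill areas of the form; the individual ratings are invented for the example.

    # Summarizing a completed rating scale for a lesson presentation.
    # The ratings below are invented; the skill areas mirror the form above.
    ratings = {
        "Audience contact": 4,
        "Enthusiasm": 5,
        "Speech quality and delivery": 3,
        "Involvement of the audience": 4,
        "Use of non-verbal communication": 3,
        "Use of questions": 4,
        "Directions and refocusing": 3,
        "Use of reinforcement": 4,
        "Use of teaching aids and instructional materials": 5,
    }

    labels = {5: "Outstanding", 4: "Very satisfactory", 3: "Satisfactory",
              2: "Fair", 1: "Needs improvement"}
    average = sum(ratings.values()) / len(ratings)
    print(f"Average rating: {average:.1f} ({labels[round(average)]})")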

A checklist differs from a rating scale as it indicates the presence or

absence of specified characteristics. It is basically a list of criteria upon

which a student’s performance or end product is to be judged. The

checklist is used by simply checking off the criteria items that have been

met.

The response format on a checklist varies. It can be a simple check mark

indicating that an action took place. For instance, a checklist for observing

student participation in the conduct of a group experiment may appear like

this:

1. Displays interest in the experiment.

2. Helps in setting up the experiment.

3. Participates in the actual conduct of the experiment.

4. Makes worthwhile suggestions.


The rater would simply check the items that occurred during the conduct

of the group experiment.

Another type of checklist requires a yes or no response. The yes is

checked when the action is done satisfactorily; the no is checked when the

action is done unsatisfactorily. Below is an example of this type of

checklist.

Performance Checklist for a Speech Class

Name ___________________________________ Date ______________

Check Yes or No as to whether the specified criterion is met.

Did the student: YES NO

1. Use correct grammar? _______________ ______________

2. Make a clear presentation? _______________ ______________

3. Stimulate interest? _______________ ______________

4. Use clear direction? _______________ ______________

5. Demonstrate poise? _______________ ______________

6. Manifest enthusiasm? _______________ ______________

7. Use appropriate voice projection? _______________ ______________


Levels of Learning in the Affective Domain

Objectives in the affective domain are concerned with emotional

development. Thus, affective domain deals with attitudes, feelings, and

emotions. Learning intent in this domain of learning is organized according

to the degree of internalization. Krathwohl and his colleagues (1964)

identified four levels of learning in the affective domain.

Receiving involves being aware of and being willing to freely attend

to a stimulus.

Responding involves active participation. It involves not only freely

attending to a stimulus but also voluntarily reacting to it in some way.

It requires physical, active behavior.

Valuing refers to voluntarily giving worth to an object, phenomenon

or stimulus. Behaviors at this level reflect a belief, appreciation, or

attitude.

Commitment involves building an internally consistent value system

and freely living by it. A set of criteria is established and applied in

making choices.


Evaluating Affective Learning

Learning in the affective domain is difficult and sometimes

impossible to assess. Attitudes, values and feelings can be intentionally

concealed. This is because learners have the right not to show their

personal feelings and beliefs, if they choose to do so. Although the achievement of objectives in the affective domain is important in the

educational system, they cannot be measured or observed like objectives

in the cognitive and psychomotor domains.

Teachers attempt to evaluate affective outcomes when they

encourage students to express feelings, attitudes, and values about topics

discussed in class. They can observe students and may find evidence of

some affective learning.

Although it is difficult to assess learning in the affective domain,

there are some tools that teachers can use in assessing learning in this

area. Some of these tools are the following: attitude scale; questionnaire;

simple projective techniques; and self – expression techniques (Escarilla &

Gonzales, 1990; Ahmann & Glock, 1991).

Attitude Scale is a form of rating scale containing statements

designed to gauge students’ feelings on an attitude or behavior. An

example of an attitude scale is shown below.


An Attitude Scale for Determining Interest in Mathematics

Name __________________________________ Date _______________

Each of the statements below expresses a feeling toward

mathematics. Rate each statement on the extent to which you agree. Use

the following response code: SA = Strongly Agree; A = Agree; U = Uncertain; D = Disagree; SD = Strongly Disagree.

1. I enjoy my assignments in Mathematics.

2. The book we are using in the subject is interesting.

3. The lessons and activities in the subject challenge me to

give my best.

4. I do not find exercises during our lesson boring.

5. Mathematical problems encourage me to think critically.

6. I feel at ease during recitation and board work.

7. My grade in the subject is commensurate to the effort I

exert.

8. My teacher makes the lesson easy to understand.

9. I would like to spend more time in this subject.

10. I like the way our teacher presents the steps in solving

mathematical problems.

Response to the items is based on the response code provided in

the attitude scale. A value ranging from 1 to 5 is assigned to the options

provided. The value of 5 is usually assigned to the option “strongly

agree” and 1 to the option “strongly disagree.” When a statement is

negative, however, the assigned values are usually reversed. The

composite score is determined by adding the scale values and dividing the sum by the number of statements or items.
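To make this scoring procedure concrete, here is a minimal sketch in Python. It assumes a hypothetical list of responses already converted to numbers from 1 to 5 and a set of item positions that are negatively worded; the variable names and the reverse – coding helper are illustrative only and do not come from the text.

def reverse_code(value, scale_max=5, scale_min=1):
    """Reverse a scale value, e.g. 5 becomes 1 and 4 becomes 2 on a 1-5 scale."""
    return scale_max + scale_min - value

def composite_score(responses, negative_items=()):
    """Average the scale values, reversing negatively worded items first.

    responses      -- numeric answers (1 = strongly disagree ... 5 = strongly agree)
    negative_items -- zero-based positions of negatively worded statements
    """
    adjusted = [
        reverse_code(value) if index in negative_items else value
        for index, value in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# Example: ten answers to the mathematics attitude scale above,
# with no negatively worded items.
print(composite_score([5, 4, 4, 3, 5, 4, 3, 5, 4, 5]))  # 4.2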

A questionnaire can also be used in evaluating attitudes, feelings,

and opinions. It requires students to examine themselves and react to a

series of statements about their attitudes, feelings, and opinions. The

response style for a questionnaire can take any of the following forms: checklist type, semantic differential, and Likert scale.

The Checklist type of response provides the students a list of

adjectives for describing or evaluating something and requires them to

check those that apply. For example, a checklist questionnaire on

students’ attitudes in a science class may include the following:

This class is ________________ boring.

________________ exciting.

________________ interesting.

________________ unpleasant.

________________ highly informative.

I find Science ________________ fun.

________________ interesting.

________________ very tiring.

________________ difficult.

________________ easy.

The scoring of this type of test is simple. Subtract the number of

negative statements checked from the number of positive statements

checked.
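A minimal sketch of this subtraction, assuming hypothetical lists of the positive and negative adjectives and of the ones a student checked (none of these names come from the text):

def checklist_score(checked, positive, negative):
    """Score a checklist response: positives checked minus negatives checked."""
    checked = set(checked)
    return len(checked & set(positive)) - len(checked & set(negative))

# Example based on the science-class checklist above.
positive = {"exciting", "interesting", "highly informative", "fun", "easy"}
negative = {"boring", "unpleasant", "very tiring", "difficult"}
print(checklist_score(["interesting", "fun", "difficult"], positive, negative))  # 1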

Semantic differential is another type of response on a

questionnaire. It is usually a five – point scale anchored by polar or opposite adjectives. It is designed so that attitudes, feelings, and opinions can be

measured by degrees from very favorable to very unfavorable. Given

below is an example of a questionnaire employing the aforementioned

response type.

Working with my group members is:

Interesting _____ : _____ : _____ : _____ : _____ Boring

Challenging _____ : _____ : _____ : _____ : _____ Difficult

Fulfilling _____ : _____ : _____ : _____ : _____ Frustrating

The composite score on the total questionnaire is determined by

averaging the scale values given to the items included in the

questionnaire.

Likert scale is one of the frequently used styles of response in

attitude measurement. It is oftentimes a five – point scale linking the options “strongly agree” and “strongly disagree”. An example of this kind of

response is shown below.

A Likert Scale for Assessing Students’ Attitude Towards

Leadership Qualities of Student Leaders

Name ____________________________________ Date _____________

Read each statement carefully. Decide whether you agree or

disagree with each of them. Use the following response code: 5 = Strongly Agree; 4 = Agree; 3 = Undecided; 2 = Disagree; 1 = Strongly Disagree.

Write your response on the blank before each item.

Student leaders:

1. Have to work for the benefit of the students.

2. Should set example of good behavior to the

members of the organization.

3. Need to help the school in implementing campus

rules and regulations.

4. Have to project a good image of the school in the

community.

5. Must speak constructively of the school’s teachers

and administrators.

Scoring of a Likert scale is similar to the scoring of an attitude scale

earlier presented in this chapter.

Simple projective techniques are usually used when a teacher

wants to probe deeper into the student’s feelings and attitudes. Escarilla

and Gonzales (1990) say that there are three types of simple projective

techniques that can be used in the classroom, namely: word association,

unfinished sentences, and unfinished story.

In word association, the student is given a word and asked to

mention what comes to his / her mind upon hearing it. For example, what

comes to your mind upon hearing the word corruption?

In an unfinished sentence, the students are presented partial

sentences and are asked to complete them with words that best express

their feelings, for instance:

Given the chance to choose, I _____________________________.

I am happy when _______________________________________.

My greatest failure in life was ______________________________.

In an unfinished story, a story with no ending is deliberately

presented to the students, which they have to finish or complete. Through

this technique, the teacher will be able to sense students’ worries,

problems, and concerns.

Another way by which affective learning can be assessed is through

the use of self – expression techniques. Through these techniques,

students are provided the opportunity to express their emotions and views

about issues, themselves, and others. Self – expression techniques may

take any of the following forms: log book of daily routines or activities,

diaries, essays and other written compositions or themes, and

autobiographies.

CHAPTER REVIEW

1. What is meant by psychomotor learning? What are the levels of

learning under the psychomotor domain? Explain each.

2. What are the two general approaches in measuring the acquisition

of motor and oral skills? Differentiate each.

3. What are the guidelines to observe in undertaking atomistic and

holistic observation?

4. What is portfolio assessment? What are the advantages of using this

type of assessment in evaluating student performance and student

products?

5. What are the guidelines to observe in using portfolio assessment in

the classroom?

6. What are the tools teachers can use in measuring students’

acquisition of motor and oral skills? Briefly define each.

7. What do we mean by affective learning? What are the different

levels of affective learning? Describe each briefly.

8. What are the techniques teachers can employ in evaluating affective

learning? Discuss each very briefly.

CHAPTER 5

Constructing Objective Paper – and – Pencil Tests

Constructing a paper – and – pencil test is a professional skill. Becoming proficient at it takes study and practice. Owing to the recognized importance of a testing program, a prospective teacher has to take this task seriously and responsibly. He / She needs to be familiar

with the different types of test items and how best to write them. This

chapter seeks to equip prospective teachers with the skill in constructing

objective paper – and – pencil tests.

General Principles of Testing

Ebel and Frisbie (1999) listed five basic principles that should guide

teachers in measuring learning and in constructing their own test. These

principles are discussed below.

Measure all instructional objectives. The test a teacher writes

should be congruent with all the learning objectives focused on in class.

Cover all learning tasks. A good test is not focused only on one

type of objective. It must be truly representative of all targeted

learning outcomes.

Use appropriate test items. Test items utilized by a teacher have to

be in consonance with the learning objectives to be measured.

Make test valid and reliable. Teachers have to see to it that the

test they construct measures what it purports to measure. Moreover,

they need to ensure that the test will yield consistent results when the students take it a second time.

Use test to improve learning. Test scores obtained by the students

can serve as springboards for the teachers to re-teach concepts and skills that the students have not mastered.

Attributes of a Good Test as an Assessment Tool

A good test must possess the following attributes or qualities:

validity; reliability; objectivity; scorability; administrability; relevance;

balance; efficiency; difficulty; discrimination; and fairness (Sparzo, 1990; Reyes, 2000; Manarang and Manarang, 1993; Medina, 2002).

Validity – It is the degree to which a test measures what it

seeks to measure. To determine whether a test a teacher

constructed is valid or not, he / she has to answer the

following questions:

1. Does the test adequately sample the intended content?

2. Does it test the behaviors / skills important to the

content being tested?

3. Does it test all the instructional objectives of the content taken up in class?

Reliability – It is the accuracy with which a test consistently

measures that which it does measure. A test, therefore, is

reliable if it produces similar results when used repeatedly. A

test may be reliable but not necessarily valid. On the other

hand, a valid test is always a reliable one.

Objectivity – It is the extent to which personal biases or

subjective judgment of the test scorer is eliminated in checking

the student responses to the test items, as there is only one

correct answer for each question. For a test to be considered

objective, experts must agree on the right or best answer.

Thus, objectivity is a characteristic of the scoring of the test

and not of the form of the test questions.

Scorability – The test is easy to score or check, as an answer key and an answer sheet are provided.

Administrability – The test is easy to administer, as clear and simple instructions are provided to students, proctors, and scorers.

Relevance – It is the correspondence between the behavior

required to respond correctly to a test item and the purpose or

objective in writing the item. The test item should be directly

related to the course objectives and actual instruction. When

used in relation to educational assessment, relevance is

considered a major contributor to test validity.

Balance – Balance in a test refers to the degree to which the

proportion of items testing particular outcomes corresponds to that of the ideal test. The framework of the test is outlined by a table

of specifications.

Efficiency – It refers to the number of meaningful responses

per unit of time. A compromise has to be made among the available time for testing, ease of scoring, and relevance.

Difficulty – The test items should be appropriate in difficulty

level to the group being tested. In general, for a norm –

referenced test, a reliable test is one in which each item is

passed by half of the students. For a criterion – referenced

test, difficulty can be judged relative to the percentage passing

before and after instruction. Difficulty will ultimately depend on the skill and knowledge measured and on the students’ ability.

Discrimination – For a norm – referenced test, the ability of an item to discriminate is generally indexed by the difference between the proportions of good and poor students who respond correctly. For a criterion – referenced test, discrimination is usually associated with pretest – posttest differences, that is, with the ability of the test or item to distinguish competent from less competent students. A short computational sketch of difficulty and discrimination indices follows this list.

Fairness – To ensure fairness, the teacher should construct

and administer the test in a manner that allows students an

equal chance to demonstrate their knowledge or skills.
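The following is a minimal Python sketch of how the difficulty index (proportion of students answering an item correctly) and the norm – referenced discrimination index (proportion correct in the upper group minus proportion correct in the lower group) described above might be computed. The group sizes, sample data, and function names are illustrative assumptions, not drawn from the text.

def difficulty_index(responses):
    """Proportion of students who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)

def discrimination_index(upper_group, lower_group):
    """Difference between the proportions of upper- and lower-group students answering correctly."""
    return difficulty_index(upper_group) - difficulty_index(lower_group)

# Example: an item answered correctly by 7 of 8 upper-group students
# and 3 of 8 lower-group students.
upper = [1, 1, 1, 1, 1, 1, 1, 0]
lower = [1, 1, 1, 0, 0, 0, 0, 0]
print(difficulty_index(upper + lower))     # 0.625
print(discrimination_index(upper, lower))  # 0.5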

Steps in Constructing Classroom Tests

Constructing classroom tests is a skill. As such, there are steps that

a teacher has to follow (Reyes, 2000). These steps are outlined and

discussed below.

Identification of instructional objectives and learning

outcomes. This is the first step a teacher has to undertake

when constructing classroom tests. He / She has to identify

instructional objectives and learning outcomes, which will

serve as his / her guide in writing test items.

Listing of the topics to be covered by the test. After identifying the instructional objectives and learning outcomes, a

teacher needs to outline the topics to be included in the test.

Preparation of Table of Specification (TOS). The table of

specifications is a two – way table showing the content

coverage of the test and the objectives to be tested. It can

serve as a blueprint in writing the test items later.

Selection of the Appropriate Types of Tests. Based on the

TOS, the teacher has to select test types that will enable him /

her to measure the instructional objectives in the most

effective way. Choice of test type depends on what shall be

measured.

Writing Test Items. After determining the type of test to use,

the teacher proceeds to write the suitable test items.

Sequencing the Items. After constructing the test items, the

teacher has to arrange them based on difficulty. As a general rule, items have to be sequenced from the easiest to the most difficult for psychological reasons.

Writing the Directions or Instructions. After sequencing

items, the teacher has to write clear and simple directions,

which the students will follow in answering the test questions.

Preparation of the Answer Sheet and Scoring Key. To

facilitate checking of students’ answers, the teacher has to

provide answer sheets and prepare a scoring key in advance.

Preparing the Table of Specifications (TOS)

As already mentioned, the table of specifications is the teacher’s

blueprint in constructing a test for classroom use. According to Arends

(2001), the TOS is valuable to teachers for two reasons. First, it helps

teachers decide on what to include and leave out in a test. Second, it helps

them determine how much weight to give for each topic covered and

objective to be tested.

There are steps to observe in preparing a table of test specifications.

1. List down the topics covered for inclusion in the test.

2. Determine the objectives to be assessed by the test.

3. Specify the number of days / hours spent for teaching a

particular topic.

4. Determine the percentage allocation of test items for each of the topics covered. The formula to be applied is as follows:

% for a Topic = (Number of days / hours spent teaching the topic ÷ Total number of days / hours spent teaching the unit) x 100

Example: Mrs. Sid Garcia utilized 10 hours for teaching the unit on Pre – Spanish Philippines. She spent 2 hours in teaching the topic, “Early Filipinos and their Society.” What percentage of test items should she allocate for the aforementioned topic?

Solution: (2 ÷ 10) x 100 = 20%

5. Determine the number of items to construct for each topic.

This can be done by multiplying the percentage allocation for

each topic by the total number of items to be constructed.

Example: Mrs. Sid Garcia decided to prepare a 50 – item test

on the unit, “Pre – Spanish Philippines.” How many

items should she write for the topic mentioned in

step number 4?

Solution: 50 items x 0.20 (20%) = 10 items

6. Distribute the number of items to the objectives to be tested.

The number of items allocated for each objective depends on

the degree of importance attached by the teacher to it.

After going through the six steps, the teacher has to write the TOS in a grid or matrix, as shown below. A short computational sketch of the item allocation follows the sample table.

Table of Specification for a 50 – Item Test in Economics

Topic / Objective                        Knowledge   Comprehension   Application   Analysis   Total

The Nature of Economics                      1             2              2            5         10

Economic Systems                             3             2              3            2         10

Law of Demand & Supply                       3             3              3            6         15

Price Elasticity of Demand & Supply          3             3              2            7         15

Total                                       10            10             10           20         50
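The arithmetic in steps 4 and 5 can be sketched in a few lines of Python. The example below assumes hypothetical per – topic teaching hours for the Economics unit; the hours and the function name are illustrative only and do not come from the text.

def allocate_items(hours_per_topic, total_items):
    """Allocate test items to topics in proportion to the hours spent teaching each topic."""
    total_hours = sum(hours_per_topic.values())
    allocation = {}
    for topic, hours in hours_per_topic.items():
        percentage = hours / total_hours          # e.g. 2 / 10 = 0.20 (20%)
        allocation[topic] = round(percentage * total_items)
    return allocation

# Example: a 50-item test on a unit taught for 10 hours in all.
hours = {
    "The Nature of Economics": 2,
    "Economic Systems": 2,
    "Law of Demand & Supply": 3,
    "Price Elasticity of Demand & Supply": 3,
}
print(allocate_items(hours, 50))
# {'The Nature of Economics': 10, 'Economic Systems': 10,
#  'Law of Demand & Supply': 15, 'Price Elasticity of Demand & Supply': 15}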

General Guidelines in Writing Test Items

Airasian (1994) identified five basic guidelines for writing test items.

These guidelines are as follows:

1. Avoid wording that is ambiguous and confusing.

2. Use appropriate vocabulary and sentence structure.

3. Keep questions short and to the point.

4. Write items that have one correct answer.

5. Do not provide clues to the answer.

Criteria for Providing Test Directions

Test directions are very important in any written test as the inability

of the test taker to understand them affects the validity of a test. Thus,

directions should be complete, clear, and concise. The students must be

aware of what is expected of them. The method of answering has to be

kept as simple as possible. Test directions should also contain instructions

on guessing.

The following criteria should be kept in mind when writing directions

for a test (Linn, 1999):

Assume that the examinees and the examiner know nothing at all about objective tests.

In writing directions, use a clear, succinct style. Be as explicit

as possible but avoid long drawn – out explanations.

Emphasize the more important directions and key activities

through the use of underlining, italics, or different type size

or style.

Field or pretest the directions with a sample of both

examinees and examiners to identify possible

misunderstandings and inconsistencies and gather

suggestions for improvement.

Keep directions for different forms, subsections or booklets as

uniform as possible.

Where necessary or helpful, give practice items before each

regular section. This is very important when testing young

children or those unfamiliar with objective tests or with separate answer sheets.

Writing Multiple – Choice Items

The most widely used form of test item is the multiple – choice item.

This is because of its versatility. It can be used in measuring different kinds

of content and almost any type of cognitive behavior, from factual

knowledge to analysis of complex data. Furthermore, it is easy to score.

A multiple – choice item is composed of a stem, which sets up the

problem and asks a question, followed by a number of alternative

responses. Only one of the alternatives is the correct answer; the other alternatives are distractors or foils.

The principal goal in multiple – choice item construction is to write clear, concise, and unambiguous items. Consider the example below.

Poor: The most serious disease in the world is -

(A) Mental illness (C) Heart disease

(B) AIDS (D) Cancer

The correct answer depends on what is meant by “serious.”

Considering that heart disease leads to more deaths, mental illness affects

a number of people, and AIDS is a world – wide problem nowadays, there

are three possible answers. Nevertheless, the question can be reworded

as follows, for example:

Improved: The leading cause of death in the world today is:

(A) Mental illness (C) Heart disease

(B) AIDS (D) Cancer

To be able to write effective multiple – choice items, the following

guidelines should be followed:

1. Each item should be clearly stated, in the form of a

question or an incomplete statement.

2. Do not provide grammatical or contextual clues to the

correct answer. For instance, the use of an before the options indicates that the answer begins with a vowel.

3. Use language that even the poorest readers will

understand.

4. Write a correct or best answer and several plausible

distractors.

5. Each alternate response should fit the stem in order to

avoid giving clues to its correctness.

6. Refrain from using negatives or double negatives. They

tend to make the items confusing and difficult.

7. Use all of the above and none of the above only when they will contribute more than another plausible distractor.

8. Do not use items directly from the textbook. Test for

understanding not memorization.

Examine the following multiple – choice items.

Sample 1: A two – way grid summarizing the relationship

between test scores and criterion scores is

sometimes referred to as an:

(A) Correlation coefficient. (C) Probability histogram.

(B) Expectancy table. (D) Bivariate frequency distribution

Sample 1 is faulty because of its use of the article an, which can lead the student to the correct answer, B.

Improved: Two – way grids summarizing test – criterion

relationships are sometimes called:

(A) Correlation coefficient. (C) Probability histogram.

(B) Expectancy table. (D) Bivariate frequency distribution

Sample 2: Which of the following descriptions makes clear the

meaning of the word “electron”?

(A) An electronic gadget (D) A voting machine

(B) Neutral particles (E) The nuclei of atoms

(C) Negative particles

Sample 2 is poorly written owing to its use of distractors that are not

plausible or closely related to each other. Options A and D are not in any way associated with the remaining choices or alternatives.

Improved: Which of the following phrases is a description of an

electron?

(A) Neutral particle (D) Related particle

(B) Negative particle (E) Atom nucleus

(C) Neutralized proton

Sample 3: What is the area of a right triangle whose sides

adjacent to the right angle are 4 inches and 3 inches,

respectively?

Sample 3 is also erroneously written, as it used the option none of the above without caution. Why? This is because the answer, 6 square inches, is not among the choices, so the bright student will definitely choose option D. On the other hand, the student who solved the problem incorrectly and came up with an answer not found among the choices would also choose D, thereby getting the correct answer for the wrong reason. The option “none of the above” can be a good alternative only if the correct answer is included among the options or choices.

Improved: What is the area of a right triangle whose sides adjacent to the right angle are 4 inches and 3 inches, respectively?

(A) 6 square inches (D) 13 square inches

(B) 7 square inches (E) none of the above

(C) 12 square inches

Using Multiple Choice Items in Assessing Problem

Solving and Logical Thinking

Schools today are stressing problem – solving skills owing to society’s pressure on them to produce individuals with significant skills in this area. A number of terms have been used to

describe the basic operations of application. Terms like critical thinking and

logical reasoning are used as rubrics under which the basic processes of

problem identification, specification of alternative solutions, evaluation of

consequences, and solution selection are grouped.

Creating problem – solving measures follows a step – by – step

procedure (Haladyna & Downing, 1999).

Step 1. Decide on the principle / s to be tested. The principles and the problem situation selected should:

Involve known principles, but the situation in which the principles are to be applied should be new.

Involve significantly important principles.

Be pertinent to a problem or situation common to all students.

Be within the range of comprehension of all students.

Use only valid and reliable sources from which to draw data.

Be interesting to students.

Step 2. Determine the phrasing of the problem situation so as to

require the students in drawing their conclusion to do one

of the following:

Make a prediction.

Choose a course of action.

Offer an explanation for an observed phenomenon.

Criticize a prediction or explanation made by others.

Step 3. Set up the problem situation in which the principle or

principles selected operate. Present the problem to the

class with directions to draw a conclusion or conclusions

and give several supporting reasons for their answer.

Step 4. Edit the students’ answers, selecting those that are most

representative of their thinking. These will include

conclusions and supporting reasons that are both

acceptable and unacceptable.

Step 5. To the conclusions and reasons obtained from the students,

the teacher now adds any others that he or she feels are

necessary to cover the salient points. The total number of

items should be at least 50% more than is desired in the

final form to allow for elimination of poor items. Some types

of statements that can be used are as follows:

True statements of principles and facts

False statements of principles and facts

Acceptable and unacceptable analogies

Appeal to acceptable or unacceptable authority

Ridicule

Assumption of the conclusion

Teleological explanations

Step 6. Submit tests to colleagues or evaluators for criticisms.

Revise test based on these criticisms.

Step 7. Administer test. Follow with thorough class discussion.

Step 8. Conduct an item analysis.

Step 9. In the light of steps 7 and 8, revise the test.

Following are some examples of problem – solving items.

1. Ulysses wanted to go to the US. But Ulysses’ father, who is quite

strict with him, stated emphatically that he could not go unless he

got a grade of 1.25 in both his freshman English courses. Ulysses’ father always keeps his promises. When summer came, Ulysses went to the US. If, from this information, you conclude that Ulysses earned 1.25, you must be assuming that:

(A) Ulysses had never obtained a grade of 1.25 before.

(B) Ulysses had no money of his own.

(C) Ulysses’ father was justified in saying what he did.

(D) Ulysses went to the US with his father’s consent.

(E) Ulysses was very sure that he would be able to go.

2. Consider these facts about the coloring of animals:

Plant lice, which live on the stems of green plants,

are green.

The grayish – mottled moth resembles the bark of

the trees on which it lives.

Insects, birds, and mammals that live in the desert

are usually sandy or grey.

Polar bears and other animals living in the Arctic

region are white.

Which one of the following statements do these facts tend to

support?

(A) Animals that prey on others use colors as disguise.

(B) Some animals imitate the color and shape of other natural objects for protection.

(C) The coloration of animals has to do with their surroundings.

(D) Protective coloration is found more among insects and birds than among mammals.

(E) Many animals and insects have protective coloring.

Writing Alternate – Response Items

An alternate – response item is one wherein there are only two

possible answers to the stem. The true – false format is an alternate –

response item. Some variations of the basic true – false item include yes –

no, right – wrong, and agree – disagree items.

Alternate – response items seem easy to construct. Writing good

alternate – response items, however, requires skill so as to avoid triviality.

Writing good true – false items is difficult as there are few assertions that

are unambiguously true or false. Besides, they are susceptible to guessing.

Some guidelines to follow in writing alternate – response items are

given below.

1. Avoid the use of negatives.

2. Avoid the use of unfamiliar or esoteric language.

3. Avoid trick items that appear to be true but are false because of

an inconspicuous word or phrase.

4. Use quantitative and precise rather than qualitative language

where possible.

5. Don’t make true items longer than false items.

6. Refrain from creating a pattern of response.

7. Present a similar number of true and false statements.

8. Be sensitive to the use of specific determiners. Words such as

always, all, never, and none indicate sweeping generalizations,

which are associated with false items. Conversely, words like

usually and generally are associated with true items.

9. A statement must only have one central idea.

10. Avoid quoting exact statements from the textbooks.

Let us go over examples of alternate – response test items.

Sample 1. The raison d’etre for capital punishment is retribution

according to some peripatetic politicians.

This sample alternate response item is poorly written for it used

words that are very unfamiliar or difficult to understand by an average

student.

Improved: According to some politicians, the justification for the

existence of capital punishment can be traced to the

biblical statement, “an eye for an eye, a tooth for a tooth.”

Sample 2. From time to time efforts have been made to explain the

notion that there may be a cause – and – effect

relationship between arboreal life and primate anatomy.

Sample 2 is again faulty as it was copied verbatim from the textbook.

Improved: There is a known relationship between primate anatomy

and arboreal life.

Sample 3. Many people voted for Gloria Macapagal – Arroyo in the

last presidential election.

Sample 3 also violates the rules on writing alternate – response items owing to its use of imprecise language. As such, it is open to numerous and ambiguous interpretations.

Improved: Gloria Macapagal – Arroyo received more than 50% of

the votes cast in the last presidential election.

Alternate – response items allow teachers to sample a number of

cognitive behaviors in a limited amount of time. Even the scoring of

alternate – response items tends to be simple and easy. Nonetheless,

there are content and learning outcomes that cannot be adequately

measured by alternate – response items, like problem – solving and

complex learning.

Writing Matching Items

Matching items are designed to measure students’ ability to single

out pairs of matching phrases, words or other related facts from separate

lists. It is basically an efficient arrangement of a set of multiple – choice items

with all stems, called premises, having the same set of possible alternative

answers. Matching items are appropriate to use in measuring verbal

associative knowledge (Moore, 1997) or knowledge such as inventors and

inventions, titles and authors, or objects and their basic characteristics.

To be able to write good matching items, the following guidelines

have to be considered in the process.

1. Specify the basis for matching the premises with the

responses. Sound testing practice dictates that the directions spell

out the nature of the task. It is unfair and unreasonable that the student

should have to read through the stimulus and response list in order

to discern the basis for matching.

2. Be sure that the whole matching exercise is found on one

page only. Splitting the exercise is confusing, distracting, and time –

consuming for the student.

3. Avoid including too many premises on one matching item.

If a matching exercise is too long, the task becomes tedious and the

discrimination too fine.

4. Both the premises and responses should be in the same general category or class (e.g., inventors – inventions; authors – literary works; objects – characteristics).

5. Premises or responses composed of one or two words

should be arranged alphabetically.

Analyze the following matching exercise. Does it follow the

suggestions on writing a matching exercise?

Directions: Match Column A with Column B. You will be given one

point for each correct match.

Column A                                          Column B

1. Execution of Rizal                             a. 1521

2. Pseudonym of Ricarte                           b. 1896

3. Hero of Tirad Pass                             c. Gregorio del Pilar

4. Arrival of the Spaniards in the Philippines    d. Spoliarium

5. Masterpiece of Juan Luna                       e. Vibora

The matching exercise is poorly written, as the premises in Column A do not belong to the same category. Thus, answers can easily be guessed by the student. Below is an improved version of the above matching exercise.

Column A                                  Column B

1. National Hero of the Philippines       a. Aguinaldo

2. Hero of Tirad Pass                     b. Bonifacio

3. Brain of the Katipunan                 c. Del Pilar

4. Brain of the Philippine Revolution     d. Jacinto

5. The Sublime Paralytic                  e. Mabini

                                          f. Rizal

Writing Completion Items

Completion items require the students to associate an incomplete statement with a word or phrase recalled from memory (Ahmann, 1991). Each completion test item contains a blank, which the student must fill in correctly with one word or a short phrase. Inasmuch as the student is required to supply the answer from recall rather than select it, completion test items are useful for the testing of specific facts.

Guidelines in constructing completion items are as follows:

1. As a general rule, it is best to use only one blank in a

completion item.

2. The blank should be placed near or at the end of the

sentence.

3. Give clear instructions indicating whether synonyms will be

correct and whether spelling will be a factor in scoring.

4. Be definite enough in the incomplete statement so that only

one correct answer is possible.

5. Avoid using direct statements from the textbooks with a

word or two missing.

6. All blanks for all items should be of equal length and long

enough to accommodate the longest response.

Go over the following sample items:

Directions: On your answer sheet, write the expression that

completes each of the following sentences.

1. __________ is money earned from the use of money.

2. The Philippines is at the _________ and ________ of ________.

Sample 1 is poorly written as a well – written completion item should

have its blank either near or at the end of the sentence. In like manner, Sample 2 is also poorly written, as the statement is over – mutilated.

Following are the improved versions of these sample items.

1. Money earned from the use of money is called _________.

2. The Philippines is located in the continent of _________.

Writing Arrangement Items

Arrangement items are used for knowledge of sequence and order.

Arrangement of words alphabetically, of events chronologically, of numbers according to magnitude, of stages in a process, or of incidents in a story or novel are a few cases of this type of test. Some guidelines on

preparing this type of test are as follows:

1. Items to be arranged should belong to one category only.

2. Provide instructions on the rationale for arrangement or

sequencing.

3. Specify the response code students have to use in arranging the

items.

4. Provide sufficient space for the writing of the answer.

Following are examples of arrangement items.

Sample 1 Directions: Arrange the following decimals in the order

of magnitude by placing 1 above the smallest, 2 above

the next, 3 above the third, and 4 above the biggest.

(a) 0.2180 (b) 0.2801 (c) 0.2018 (d) 0.2081

Sample 2 Directions: The following words are arranged at

random. On your answer sheet, rearrange the words so

that they will form a sentence.

much the costs rose

Sample 3 Directions: Each group of letters below spells out a word if the letters are properly arranged. On your answer

sheet, rearrange the letters in each group to form a

word.

ybo ebul swie atgo

Writing Completion – Drawing Items

As pointed out in the previous chapter, a completion – drawing item

is one wherein an incomplete drawing is presented which the student has

to complete. The following guidelines have to be observed in writing the

aforementioned type of test item:

1. Provide instruction on how the drawing will be completed.

2. Present the drawing to be completed.

Writing Correction Items

The correction type of test item is similar to the completion item,

except that some words or phrases have to be changed to make the

sentence correct. The following have to be considered by the teacher in

writing this kind of test item.

1. Underline or italicize the word or phrase to be corrected in

a sentence.

2. Specify in the instruction where students will write their

correction of the underlined or italicized word or phrase.

3. Write items that measure higher levels of cognitive

behavior.

Following are examples of correction items written following the

guidelines in constructing this kind of item.

Directions: Change the underlined word or phrase to make each of

the following statements correct. Write your answer on

the space before each number.

1. Inflation caused by increased demand is known as

oil – push.

2. Inflation is the phenomenon of falling prices.

3. Expenditure on non – food items increases with

increased income according to Keynes.

4. The additional cost for producing an additional unit

of a product is average cost.

5. The sum of the fixed and variable costs is total

revenue.

Writing Identification Items

An identification type of test item is one wherein an unknown

specimen is to be identified by name or other criterion. In writing this type

of item, teachers have to observe the following guidelines:

1. The direction of the test should indicate clearly what has to

be identified, like persons, instruments, dates, events, steps in a process, and formulas.

2. Sufficient space has to be provided for the answer to each

item.

3. The question should not be copied verbatim from the

textbook.

Following are examples of identification items written following the

guidelines in constructing this type of test item.

Directions: Following are phrase definitions of terms. Opposite

each number, write the term defined.

1. Weight divided by volume.

2. Degree of hotness or coldness of a body

3. Changing speed of a moving body

4. Ratio of resistance to effort

Writing Enumeration Items

An enumeration item is one wherein the student has to list down

parts or elements / components of a given concept or topic. Guidelines to

follow in writing this type of test item include the following:

1. The exact number of expected answers has to be specified.

2. Spaces for the writing of answers have to be provided and should

be of the same length.

Below are examples of enumeration items.

Directions: List down or enumerate what are asked for in each of

the following.

Underlying Causes of World Wars I and II

1. ______________________ 4. ______________________

2. ______________________ 5. ______________________

3. ______________________

Factors Affecting the Demand for a Product

6. ______________________ 9. ______________________

7. ______________________ 10. _____________________

8. ______________________

Writing Analogy Items

An analogy item consists of a pair of words, which are related to

each other (Calmorin, 1994). This type of item is often used in measuring the student’s skill in seeing associations between paired words or concepts.

Examples of this type of item are given below.

Example 1: Black is to white, as peace is to ______________.

(a) Unity (c) Harmony

(b) Discord (d) Concord

Example 2: Bonifacio is to the Philippines, as ______________ is to the United States of America.

(a) Jefferson (c) Madison

(b) Lincoln (d) Washington

The following guidelines have to be considered in constructing analogy items (Calmorin, 1994):

1. The pattern of relationship in the first pair of words must be

the same pattern in the second pair.

2. Options must be related to the correct answer.

3. The principle of parallelism has to be observed in writing

the options.

4. More than three options have to be included in each

analogy item to lessen guessing.

5. All items must be grammatically consistent.

Writing Interpretative Test Item

The interpretative test item is often used in testing higher cognitive behavior. This kind of test item may involve analysis of maps, figures, or charts, or even comprehension of written passages. Airasian (1994)

suggested the following guidelines in writing this kind of test item:

1. The interpretative exercise must be related to the

instruction provided the students.

2. The material to be presented to the students should be

new to the students but similar to what was presented during

instruction.

3. Written passages should be as brief as possible. The

exercise should not be a test of general reading ability.

4. The students have to interpret, apply, analyze and

comprehend in order to answer a given question in the exercise.

Writing Short Explanation Items

This type of item is similar to an essay test but requires a short

response, usually a sentence or two. This type of question gives the students good practice in expressing themselves concisely. In writing this

type of test item, the following guidelines have to be considered:

1. Specify in the instruction of the test, the number of

sentences that students can use in answering the question.

2. Make the question brief and to the point for the students not

to be confused.

CHAPTER REVIEW

1. What are the basic principles of testing that teachers must

consider in constructing classroom tests? Explain each briefly.

2. What are the steps or procedures teachers have to follow

in writing their own tests? Explain the importance of each of

them.

3. What is the table of specification (TOS)? How is it

prepared?

4. What are the general guidelines in writing test items?

5. What are the specific guidelines to be observed in writing

the following types of test item:

5.1 Multiple – choice;

5.2 True – false;

5.3 Matching item;

5.4 Arrangement item;

5.5 Identification item;

5.6 Correction item;

5.7 Analogy;

5.8 Interpretative exercise;

5.9 Short explanation item?

CHAPTER 6

Constructing and Scoring Essay Tests

Many new teachers believe that essay tests are the easiest type of

assessment instrument to construct and score. This is not actually true.

The expenditure of time and effort is necessary if essay items and tests

are to yield meaningful information. An essay test permits direct

assessment of the attainment of numerous goals and objectives. In

contrast with the objective test item types, an essay test demands less

construction time per fixed unit of student time but a significant increase in

labor and time for scoring. This chapter exposes you to the problems and

procedures involved in developing, administering, and scoring essay tests.

General Types of Essay Items

There are two types of essay items: extended response and

restricted response.

An extended response essay item is one that allows for an in –

depth sampling of a student’s knowledge, thinking processes, and problem

– solving behavior relative to a specific topic. The open – ended nature of the task posed by an instruction such as “discuss essay and objective tests” is

challenging to a student. In order to answer this question correctly, the

student has to recall specific information and organize, evaluate, and write

an intelligible composition. Since it is poorly structured, such a free –

response essay item would tend to yield a variety of answers from the

examinees, both with respect to content and organization, and thus inhibit

reliable grading. The potential ambiguity of an essay task is probably the

single most important contributor to unreliability. In addition, the more extensive the responses required, the fewer the questions a teacher may ask, which definitely results in lower content validity for the test.

On the other hand, a restricted response essay item is one where

the examinee is required to provide limited response based on a specified

criterion for answering the question. It follows, therefore, that a more

restricted response essay item is, in general, preferable. An instruction

such as “discuss the relative advantages and disadvantages of essay tests

with respect to (1) reliability, (2) objectivity, (3) content validity, and (4)

usability” presents a better defined task more likely to lend itself to reliable

scoring and yet allows examinees sufficient opportunity or freedom to

organize and express their ideas creatively.

Learning Outcomes Measured Effectively with Essay

Items

Essay questions are designed to provide the students the opportunity to answer questions in their own words (Ornstein, 1990). They can be used in assessing the student’s skill in analyzing, synthesizing, evaluating, thinking logically, solving problems, and hypothesizing. According to Gronlund and Linn (1990), there are 12 complex learning outcomes that can be measured effectively with essay items. These include the abilities to:

Explain cause – effect relationships;

Describe relevant arguments;

Formulate tenable hypothesis;

State necessary assumptions;

Describe the limitations of data;

Explain methods and procedures;

Produce, organize, and express ideas;

Integrate learning in different areas;

Create original forms; and

Evaluate the worth of ideas.

Content versus Expression

It is frequently claimed that the essay item allows the student to present his or her knowledge and understanding and to organize the material in a unique form and style. More often than not, factors like expression, grammar,

spelling and the like are evaluated in relation to content. If the teacher has

attempted to develop students’ skills in expression, and if this learning

outcome is included in the table of specifications, the assessment of such

skills is just right and valid. If these skills are not part of the instructional

program, its not right to assess them. If the score of each essay question

includes an evaluation of the mechanics of English, this should be made

known to the student possible separate scores should be given to content

and expression.

Specific Types of Essay Questions

The following set of essay questions is presented to illustrate how an

essay item is phrased or worded to elicit particular behaviors and levels of

response.

I. Recall

A. Simple Recall

1. What is the chemical formula for sodium bicarbonate?

2. Who wrote the novel, “The Last of the Mohicans”?

B. Selective Recall in which a basis for evaluation

or judgment is suggested

1. Who among the Greek philosophers affected your

thinking as a student?

2. Which method of recycling is the most appropriate to

use at home?

II. Understanding

A. Comparison of two phenomena on a single

designated basis

1. Compare 19th century and present – day Filipino writers

with respect to their involvement in societal affairs.

B. Comparison of two phenomena in general

1. Compare the Philippine Revolution of 1896 with the People Power Revolution of 1986.

C. Explanation of the use or exact meaning of a

phrase or statement.

1. The legal system of the Mesopotamians was anchored

on the principle of an eye for an eye, a tooth for a tooth.

What does this principle mean?

D. Summary of a text or some portion of it

1. What is the central idea of communism as an

economic system?

E. Statement of an artist’s purpose in the

selection or organization

1. Why did Hemingway describe in detail the episode in

which Gordon, lying wounded, engages the oncoming

enemy?

III. Application. It should be clearly understood that whether or not a question requires application depends on the student’s prior educational experience. If an analysis has been taught explicitly, a question calling for that analysis is but simple recall.

A. Causes or Effects

1. Why did Fascism prevail in Germany and Italy but not

in Great Britain and France?

2. Why does frequent dependence on penicillin for the treatment of minor ailments result in its reduced effectiveness against major invasions of body tissues by infectious bacteria?

B. Analysis

1. Why was Hamlet torn by conflicting desires?

2. Why was the Propaganda Movement a successful

failure?

C. Statement of Relationship

1. A researcher reported that teaching style correlates with student achievement at about 0.75. What does this

correlation mean?

D. Illustrations or examples of principles

1. Identify three examples of the uses of the hammer in a

typical Filipino home.

E. Application of rules or principles in specified

situations

1. Would you weigh more or less on the moon? Why or

why not?

F. Reorganization of facts

1. Some radical Filipino historians assert that the Filipino

revolution against Spain was a revolution from the top

not from below. Using the same observation, what

other conclusion is possible?

IV. Judgment

A. Decision for or against

1. Should members of the Communist Party of the

Philippines be allowed to teach in colleges and

universities? Why or why not?

2. Nature is more influential than the environment in

shaping an individual’s personality. Prove or disprove

this statement.

B. Discussion

1. Trace the events that led to the downfall of the

dictatorial regime of Ferdinand Marcos.

C. Criticism of the adequacy, correctness, or

relevance of a statement

1. Former President Joseph Estrada was convicted of plunder by the Sandiganbayan. Comment on the adequacy of the evidence used by the said tribunal in reaching a decision on the case filed against the former chief executive of the country.

D. Formulation of new questions

1. What should be the focus of researches in education

to explain the incidence of failure among students with

high intelligence quotient?

2. What questions should parents ask their children in

order to determine the reasons why they join

fraternities and sororities?

Following are examples of essay questions based on Bloom’s

Taxonomy of Cognitive Objectives.

A. Knowledge

Explain how Egypt came to be called the gift of the Nile.

B. Comprehension

What is meant when a person says, “I had just crossed the

bridge?”

C. Application

Give at least three examples of how the law of supply operates in

our economy today.

D. Analysis

Explain the causes and effects of the People’s Power Revolution

on the political and social life of the Filipino people.

E. Synthesis

Describe the origin and significance of the celebration of

Christmas the world over.

Sources of Difficulty in the Use of Essay Tests

There are four sources of difficulty that are likely to be encountered

by teachers in the use of essay tests (Greenberg et al., 1996). Let us go over each of these difficulties and look into ways to minimize them.

Question Construction. The preparation of the essay item is the most important step in the development process. Language usage and word

choice are particularly important during the construction process. The

language dimension is very critical not only because it controls the

comprehension level of the item for the examinee, but it also specifies the

parameters of the task. As a test constructor, you need to narrowly specify,

define, and clarify what it is that you want from the examinees. Examine

this sample essay question, “Comment on the significance of Darwin’s

Origin of Species.” The question is quite broad considering that there are

several ways of responding to it. While the intention of the teacher who

wrote this item was to provide opportunity for the students to display their

mastery of the material, students could write for an hour and still not discover what their teacher really wants them to do relative to the aforementioned topic. An improved version of the same question follows: “Do you agree with Darwin’s concept of natural selection resulting in the survival of the fittest and the elimination of the unfit? Why or why not?”

Reader Reliability. A number of studies have been conducted, then and now, on the reliability of grading free – response test items. Results of

these researches failed to demonstrate consistently satisfactory agreement

among essay raters (Payne, 2003). Some of the specific contributory

factors in the lack of reader reliability include the following: quality of composition and penmanship; item readability; racial or ethnic prejudice in essay scoring; and the subjectivity of human judgment.

Instrument Reliability. Even if an acceptable level of scoring is

attained, there is no guarantee that measurement of desired behaviors will

be consistent. There remains the issue of the sampling of objectives or

behaviors represented by the test. One way to increase the reliability of an

essay test is to increase the number of questions and restrict the length of

the answers. The more specific and narrowly defined the questions, the

less likely they are to be ambiguous to the examinee. This procedure

should result in more uniform understanding and performance of the assigned tasks and in more consistent scoring. It also helps ensure better coverage of the domain of

objectives.

Instrument Validity. The number of test questions influences both

the validity and reliability of essay questions. As commonly constructed, an

essay test contains a small number of items; thus, the sampling of desired

behaviors represented in the table of specifications will be limited, and the test will suffer from decreased or lowered content validity.

There is another sense in which the validity of an essay test may be

questioned. Theoretically, the essay test allows the examinees to construct

a creative, organized, unique and integrated communication. Nonetheless,

Page 113: Measuring & Evaluating Learning Outcomes

these examinees spend most of their time very frequently in simply

recalling and organizing information, rather than integrating it. The

behavior elicited by the test, then, is not that hoped for by the teacher or

dictated by the table of specifications. Again, one way of handling the

problem is by increasing the number of items on the test.

Guidelines for Constructing, Evaluating, and Using

Essay Tests

Consider the following suggestions for constructing, evaluating and

using essay tests:

Limit the problem that the question poses so that it will have a

clear or definite meaning to most students.

Use simple words which will convey clear meaning to the

students.

Prepare enough questions to sample the material of the

subject area broadly, within a reasonable time limit.

Use the essay question for purposes it best serves, like

organization, handling complicated ideas and writing.

Prepare questions which require considerable thought, but

which can be answered in relatively few words.

Determine in advance how much weight will be accorded each

of the various elements expected in a complete answer.

Without knowledge of students’ names, score each question

for all students.

Require all students to answer all questions on the test.

Write questions about materials immediately relevant to the

subject.

Study past questions to determine how students performed.

Make gross judgments of the relative excellence of answers

as a first step in grading.

Word a question as simply as possible in order to make the

task clear.

Do not judge papers on the basis of external factors

unless they have been clearly stipulated.

Do not make a generalized estimate of an entire paper’s

worth.

Page 115: Measuring & Evaluating Learning Outcomes

Do not construct a test consisting of only one question.

Scoring Essay Tests

Most teachers would agree that the scoring of essay items and tests is among the most time-consuming and frustrating tasks associated with classroom assessment. Teachers are frequently unwilling to devote the large amount of time necessary for checking essay tests. It almost goes without saying, however, that if reliable scoring is to be achieved, the teacher needs to spend considerable time and effort.

Before focusing on the specific methods of scoring essay tests, let us consider the following guidelines. First, it is critical that the teacher prepare a detailed ideal answer in advance. This is necessary because it will serve as the criterion against which each student's response will be judged. If this is not done, the results can be seriously flawed: the subjectivity of the teacher can prevent consistent scoring, and it is also possible that student responses, rather than the criterion, will dictate what constitutes a correct answer. Second, student papers should be scored anonymously, and all answers to a given item should be scored one at a time, rather than grading each student's total test separately.

As already pointed out, essay questions are the most difficult to check owing to the absence of uniformity in the responses of the students who took the test. Moreover, a number of features of students' responses can distract the rater and contribute to subjective scoring of an essay item (Hopkins et al., 1990). These include handwriting, style, grammar, neatness, and knowledge of the students.

There are two ways of scoring an essay test: holistic and analytic (Kubiszyn & Borich, 1990).

Holistic Scoring. In this type of scoring, a total score is assigned to each response to an essay question based on the teacher's general impression or overall assessment of it. Answers are classified into one of the following categories: outstanding, very satisfactory, fair, and poor. A score value is then assigned to each of these categories: an outstanding response gets the highest score, while a poor response gets the lowest score.

Analytic Scoring. In this type of scoring, the essay is scored in terms of its components. An essay scored in this manner receives separate points for the organization of ideas; grammar and spelling; and supporting arguments or proofs.
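A brief sketch may help make the two approaches concrete. The following Python code is not from the text; the category labels, point values, rubric components, and weights are all hypothetical. It maps a holistic category to a score value, and totals an analytic rubric whose component weights were fixed before scoring, as the earlier guideline on weighting recommends.

# Hypothetical scoring sketch for one essay answer (illustrative values only).

# Holistic scoring: one overall judgment mapped to a score value.
HOLISTIC_SCALE = {"outstanding": 10, "very satisfactory": 8, "fair": 5, "poor": 2}

def holistic_score(category):
    """Return the score value assigned to an overall impression category."""
    return HOLISTIC_SCALE[category]

# Analytic scoring: separate points per component, weighted in advance.
ANALYTIC_WEIGHTS = {            # decided before the test is administered
    "organization of ideas": 4,
    "grammar and spelling": 2,
    "supporting arguments": 4,
}

def analytic_score(component_ratings):
    """Total the rubric; each rating is 0.0 to 1.0 of that component's weight."""
    return sum(ANALYTIC_WEIGHTS[c] * r for c, r in component_ratings.items())

print(holistic_score("very satisfactory"))            # prints 8
print(analytic_score({"organization of ideas": 0.75,
                      "grammar and spelling": 1.0,
                      "supporting arguments": 0.5}))  # prints 7.0

The analytic sketch also shows why a detailed ideal answer matters: without agreed-upon components and weights, there is nothing stable to rate each response against.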

As an essay test is difficult to check, teachers need to ensure objectivity in scoring students' responses (Hopkins et al., 1990). To minimize subjectivity in scoring an essay test, the following guidelines have to be considered by the teacher (Airasian, 1994):

Decide what factors constitute a good answer before administering an essay question.

Explain these factors in the test item.

Read all answers to a single essay question before reading answers to the other questions.

Reread essay answers a second time after the initial scoring.

CHAPTER 7

Administering and Scoring Objective Paper-and-Pencil Tests

While it is true that test formats and content coverage are important ingredients in constructing paper-and-pencil tests, the conditions under which students take the test are equally essential.

This chapter is focused on how tests should be administered and scored.

Arranging Test Items

Before administering a teacher-made test, the test items have to be reviewed. Once the review is completed, the items have to be assembled into a test. The following guidelines should be observed in assembling a test (Airasian, 1994; Jacobsen et al., 1993); a short sketch illustrating the first four guidelines appears after the list.

1. Similar items should be grouped together. For example, multiple-choice items should be together and separated from true-false items.

2. Arrange test items logically. Items have to be arranged from the easiest to the most difficult.

3. Selection items should be placed at the start of the test and supply items at the end.

4. Short-answer items should be placed before essay items.

5. Specify the directions that students have to follow in responding to each set of grouped items.

6. Avoid cramming items too close to each other. Leave enough space for the students to write their answers.

7. Avoid splitting multiple-choice or matching items across two different pages.

8. Number test items consecutively.
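As a rough illustration of guidelines 1 to 4 (and guideline 8), the Python sketch below uses invented item data; the field names, formats, and difficulty values are hypothetical, not taken from this book. It groups similar items, places selection formats before supply formats with short-answer before essay, sorts each group from easiest to most difficult, and numbers the items consecutively.

# Hypothetical test-assembly sketch (illustrative data only).

# Each item records its format and an estimated difficulty (0 = easy, 1 = hard).
items = [
    {"format": "essay",           "difficulty": 0.8, "stem": "Discuss..."},
    {"format": "multiple-choice", "difficulty": 0.3, "stem": "Which..."},
    {"format": "true-false",      "difficulty": 0.1, "stem": "The verb..."},
    {"format": "short-answer",    "difficulty": 0.5, "stem": "Define..."},
    {"format": "multiple-choice", "difficulty": 0.6, "stem": "Identify..."},
]

# Selection formats come before supply formats; short-answer before essay.
FORMAT_ORDER = ["true-false", "multiple-choice", "matching", "short-answer", "essay"]

def assemble(items):
    """Group similar items, order the groups, and sort each group easy-to-hard."""
    return sorted(items, key=lambda i: (FORMAT_ORDER.index(i["format"]),
                                        i["difficulty"]))

# Number the assembled items consecutively and print the resulting order.
for number, item in enumerate(assemble(items), start=1):
    print(number, item["format"], item["stem"])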

Administering the Test

Test administration is concerned with the physical and psychological setting in which students take the test; the setting should allow students to do their best (Airasian, 1994). Some guidelines that teachers should observe in administering a test are discussed below.

Provide a quiet and comfortable setting. This is essential, as interruptions can affect students' concentration and their performance on the test.

Anticipate questions that students may ask. This is also necessary, as students' questions can interrupt test-taking. To avoid such questions, teachers have to proofread their test questions before administering the test to the class.

Set a proper atmosphere for testing. This means that students have to know in advance that they will be given a test. Such advance notice allows them to prepare for the test and reduces test anxiety.


Discourage cheating. Students cheat for a variety of reasons, among them pressure from parents and teachers, as well as intense competition in the classroom. To prevent and discourage cheating, Airasian (1994) recommends two sets of strategies: strategies before testing and strategies during testing.

Strategies before Testing

Teach well.

Give students sufficient time to prepare for the test.

Acquaint the students with the nature of the test and its coverage.

Define to the students what is meant by cheating.

Explain the disciplinary measures to be imposed on students caught cheating.

Strategies during Testing

Require students to remove unnecessary materials from their desks.

Have students sit in alternate seats.

Go around the testing room and observe students during testing.

Prohibit the borrowing of materials such as pens and erasers.

Prepare alternate forms of the test.

Enforce the established rules on cheating.

Help students keep track of time.

Scoring Tests

After the administration of a test, the teacher needs to check the students' test papers in order to summarize their performance on the test. The difficulty of checking a test differs with the kind of test items used. Selection items are the easiest to score, followed by short-answer and completion items. The most difficult to score, however, is the essay item.

Scoring Objective Tests. The following guidelines have to be considered by a teacher in scoring an objective test; a short scoring sketch follows the list.

A key to correction (answer key) has to be prepared in advance for use in scoring the test papers.

Apply the same rules to all students when checking their responses to the test questions.

Score each part of the test separately to get a clear picture of how students fared and to determine the areas they failed to master.

Sum up the scores for grading purposes.
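As a rough illustration of these guidelines, the Python sketch below is not from the text; the answer key, item formats, and student responses are invented for the example. It scores one paper against a key prepared in advance, applies the same rule to every item, reports a subtotal for each part of the test, and sums the scores for grading.

# Hypothetical objective-test scoring sketch (illustrative data only).

# Answer key prepared in advance: item number -> (part of the test, correct answer).
KEY = {
    1: ("multiple-choice", "B"), 2: ("multiple-choice", "D"),
    3: ("true-false", "T"),      4: ("true-false", "F"),
    5: ("completion", "verb"),
}

def score_paper(responses):
    """Score one student's paper: a subtotal per part plus the total."""
    part_scores = {}
    for item, (part, correct) in KEY.items():
        earned = 1 if responses.get(item, "").strip().lower() == correct.lower() else 0
        part_scores[part] = part_scores.get(part, 0) + earned
    part_scores["total"] = sum(part_scores.values())   # sum of the part subtotals
    return part_scores

# The same key and the same rule are applied to every student's paper.
student = {1: "B", 2: "A", 3: "T", 4: "F", 5: "Verb"}
print(score_paper(student))
# {'multiple-choice': 1, 'true-false': 2, 'completion': 1, 'total': 4}

The per-part subtotals serve the third guideline: they show at a glance which sections, and therefore which objectives, the student has not yet mastered.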

Conducting Post-test Review

After scoring a test and recording the results, teachers have to provide students with information on their performance. This can be done by writing comments on the test paper to indicate how students fared on the test. Answers to the items have to be reviewed in class for the students to know where they committed mistakes. In so doing, students become aware of the right answers and of how the test was scored and graded.