A Metaevaluation Study on the Assessment of Teacher Performance in an Assessment Center in the Philippines

Running Head: Metaevaluation Study

A Metaevaluation Study on the Assessment of Teacher Performance in

an Assessment Center in the Philippines

Carlo Magno

De La Salle University-Manila

Nicole Tangco

Center for Learning and Performance Assessment

De La Salle-College of Saint Benilde

Abstract

The present study conducted a metaevaluation of the teacher performance system used in the

Performance Assessment Services Unit (PASU) of De La Salle-College of Saint Benilde. To

determine whether the evaluation system on teacher performance adheres to quality evaluation,

the standards of feasibility, utility, propriety, and accuracy are used as criteria. The system of

teacher performance evaluation in PASU includes the use of a student rating form called the Student

Instructional Report (SIR) and a rating scale used by peers called the Peer Evaluation Form

(PEF). A series of guided discussions was conducted among the different stakeholders of the

evaluation system in the college such as the deans and program chairs, teaching faculty, and

students to determine their appraisal of the evaluation system in terms of the four standards. A

metaevaluation checklist was also used by experts in measurement and evaluation in the Center

for Learning and Performance Assessment (CLPA). The results of the guided discussion showed

that most of the stakeholders were satisfied with the conduct of teacher performance assessment.

However, when judged against the standards of the Joint Committee, the ratings were very low.

The ratings for utility, propriety, and feasibility were fair, and the rating for accuracy was poor.

The areas for improvement are discussed in the paper.

A Metaevaluation Study on the Assessment of Teacher Performance in

an Assessment Center in the Philippines

It is a primary concern among educational institutions to assess the teaching performance

of teachers. Assessing teaching performance enables one to gauge the quality of instruction

represented by an institution and facilitate better learning among students. The Philippine

Accrediting Association of Schools, Colleges and Universities (PAASCU) judges a school not

by the number of hectares of property or buildings it owns but rather by the caliber of classroom

teaching and learning it can maintain (O’Donnell, 1996). Judging the quality of teacher

performance actually depends on the quality of assessing the components of teaching. When

PAASCU representatives visit schools, they place a high priority on firsthand observation of

actual faculty performance in the classroom. This implies the value of the teaching happening in

an educational institution as a measure of the quality of that institution. Different institutions

have a variety of ways of assessing teacher performance. These commonly include classroom

observation by and feedback from supervisors, assessment from peers, and students’ assessment,

all of which should be firmly anchored on the school’s mission and vision statements.

The De La Salle-College of Saint Benilde (DLS-CSB) uses a variety of assessment

techniques to come up with a valid evaluation of a teacher’s performance. As an institution that

has adopted the learner-centered psychological principles, any assessment technique it uses, as

mentioned in the school’s mission, “recognizes diversity by addressing various needs, interests,

and cultures. As a community of students, faculty, staff, and administrators, we strengthen our

relationships through transformational experiences guided by appreciation of individual worth,

creativity, professional competence, social responsibility, a sense of nationhood, and our faith.

We actively anticipate and respond to individual, industry, and societal needs by offering

innovative and relevant programs that foster holistic human development.” The processes in

teacher performance evaluation of instructors, professors and professionals of the college are

highly critical since they are used to decide on matters such as hiring, rehiring, and promotion. There

should be careful calibration and continuous study of the instruments used to assess teachers.

The process of evaluation in the college has been in place since the founding of the institution

in 1988. Since that time, different assessment techniques have been used to evaluate instructors,

professors, and professionals. The assessment of teachers is handled by the Center for Learning

and Performance Assessment (CLPA), which is primarily responsible for instrument

development, administration, scoring and the communication of assessment results to its

stakeholders. Currently, instructors and professors are assessed through the Student

Instructional Report (SIR), the Peer Evaluation Form (PEF), and academic advising. The current

forms of these instruments have been in use for the last three years.

At present, there is a need to evaluate the process of evaluating teacher

performance in DLS-CSB. Through a metaevaluation study, it may be determined whether the

processes meet the Joint Committee Standards for Evaluation. The Joint Committee Standards

set a “common language to facilitate communication and collaboration in evaluation.” They are very

helpful in a metaevaluation process since they provide a set of general rules for dealing with a

variety of specific evaluation problems. The processes and practices of the CLPA in assessing

teaching performance need to be examined to determine whether they meet the standards of utility, feasibility,

propriety, and accuracy. The metaevaluation technique involves the process of delineating,

obtaining, and applying descriptive information and judgmental information about the standards

of utility, feasibility, propriety, and accuracy of an evaluation in order to guide the evaluation

and to publicly report its strengths and weaknesses (Stufflebeam, 2000).

This study on metaevaluation addresses the issue of whether the process used by the

CLPA on evaluating teaching performance in DLS-CSB meets the standards and requirements of

a sound evaluation. Specifically, the study will provide information on the adequacy of the SIR,

peer assessment, and student advising in the following areas: (1) items and instructions for

responding; (2) process of administering the instruments; (3) procedures practiced in assessment;

(4) utility value from stakeholders; and (5) accuracy and validity of responses.

Models and Methods of Teacher Evaluation

Generally, teacher evaluations may be summative or formative. The instruments used for

summative evaluation are typically checklist-type forms that provide little room for narrative,

and take note of observable traits and methods that serve as criteria for continued employment,

promotions, and the like (Searfoss & Enz, 1996 in Isaacs, 2003). On the other hand, formative

evaluations are geared toward professional development. In this form of evaluation, teachers and

their administrators meet to try to trace the teacher’s further development as a professional

(Bradshaw, 1996 in Isaacs, 2003).

Another model is differentiated supervision, a flexible model of evaluation that

works from the premise that teaching is a profession, and as such, teachers should have a certain

level of control over their development as professionals (Glatthorn, 1997 in Isaacs, 2003). This

model allows “for the clinical model of evaluation, cooperative options that allow teachers to

work with peers, and self-directed options guided by the individual teacher” (Isaacs, 2003). The

model allows professional staff and supervisors/administrators options in the process applied for

supervision and evaluation. The supervision program is designed to be developmentally

appropriate to meet the needs of each member of the professional team. The three processes in

the Differentiated Supervision Model are: (1) Focused Supervision, (2) Clinical Supervision, and

(3) Self-Directed Supervision.

The method of collaborative evaluation was developed (Berliner, 1982; Brandt, 1996;

Wolf, 1996 in Isaacs, 2003) with mentor/administrator-teacher collaboration at its core.

Whether new or experienced, a teacher is aided by a mentor. It “requires a more intensive

administrative involvement that may include multiple observations, journal writing, or artifact

collections, plus a strong mentoring program” (Isaacs, 2003). At the end of a prescribed period,

the mentor and mentee sit down to compare notes on the data gathered over the observation

period. Together, they identify strengths, weaknesses, areas for improvement, and other such

points. In this model, there are no ratings, no evaluative commentaries and no summative write-

ups (Isaacs, 2003).

Another is the multiple evaluation checklist, which uses several instruments other than

administrator observations. Here, the peer evaluation, the self-evaluation, and the student

evaluation meet in varying combinations to form a teacher’s evaluation (Isaacs, 2003).

Self-evaluation also plays an important role in the evaluation process. It causes the

teacher to think about his or her methods more deeply, and causes him or her to consider the

long-term. It is also said to promote a sense of responsibility and the development of higher

standards (Lengeling, 1996 in Isaacs, 2003).

Then there is the most commonly-used evaluation, the student evaluation (Bonfadini,

1998; Lengeling, 1996; Strobbe, 1993; Williams & Ceci, 1997 in Isaacs, 2003). Student evaluations are the

easiest to administer and they provide a lot of insights about rapport-building skills, teacher

communication, and effectiveness. However, student evaluations have to be viewed with caution.

Williams and Ceci (1997, in Isaacs, 2003) found that a change in a content-free variable in teaching

was enough to cause a large increase in teacher ratings. In their study, the only variable modified

was the teaching style: teachers were told to be more enthusiastic and attended a seminar on

presentation methods.

Another reason for caution is one of the findings of the study by Bonfadini (1998), cited by Isaacs (2003).

He found that, upon asking students to rate their teachers according to four determinant areas, (a)

personal traits, (b) professional competence, (c) student-teacher relationships, and (d) classroom

management, the least used determinant was professional competence. Conclusion: students may

tend to look more at the packaging (content-free variables) rather than that which empirically

makes a good teacher—so viewing student-based information, says Isaacs (2003), should be

done with care.

In the field of teacher evaluation, the growing use of the portfolio is slowly softening the

otherwise sharp edges of the standardized instrument (Engelson, 1994; Glatthorn, 1997;

Shulman, 1988; Seldin, 1991 in Isaacs, 2003).

National standards are also used as a method for teacher evaluation. This approach is based on the

creation of a screening board other than the standard licensure committee, something that has

no counterpart in the Philippines. The creation of the National Board for Professional Teaching

Standards (1998) was prompted by the report A Nation Prepared: Teachers for the 21st Century

generated by the 1986 Carnegie Task Force on Teaching as a Profession, which in turn was

prompted by the 1983 A Nation at Risk report (Isaacs, 2003). It is the mission of the NBPTS to:

“…establish high and rigorous standards for what experienced teachers should know and

be able to do, to develop and operate a national, voluntary system of assessment and

certification for teachers, and to advance educational reforms for the purpose of

improving student learning in America's schools” (Isaacs, 2003).

The National Board Certification was meant as a complement to, but not a replacement

for, state licensure exams. While the latter represents the minimum standards required to teach,

the former stands as a test for more advanced standards in teaching as a profession. Unlike the

licensure examinations, it may or may not be taken; it is voluntary. As such, some schools offer

monetary rewards for the completion of the test, as well as opportunities for better positions (i.e.

certified teaching leadership and mentor roles) (Isaacs, 2003).

Criteria for Assessing Teacher Performance

What makes a good teacher? Offhand, one might say, good communication and rapport-

building skills, a sense of empathy for the students, and, of course, knowledge of the lessons to

be taught. These, however, are not easily measurable, and even if one should manage to grasp the

key to quantifying them, the degree of importance each one of those variables might bear might

change over time, due to the changing demands of the profession.

Paradigm Shifting Through the Years

As the paradigm of how teaching should be done shifts through the years, so should what

is considered as criteria of good teaching change. And it has, through the decades. In the 1970s,

for instance, the predominant philosophy was based on Madeline Hunter’s model, which was, in

turn, based on student achievement—“norm-referenced, machine-scorable, multiple-choice tests

of fairly low level knowledge” (Danielson and McGreal, 2000). Today, however, things are

different. A good student is not only one who can get perfect test scores. Rather, he or she is capable of complex

learning, of problem solving and applying knowledge to unfamiliar situations (Danielson and

McGreal, 2000).

This change did not happen overnight. Like most good changes, it was a gradual process,

with some small shift occurring as each decade passed. From the behaviorist

70s came an increase in the need to help students attain more complex goals, which soon, in the

80s and 90s, highlighted the need for critical thinking, problem solving, lifelong learning,

collaborative learning and a shift to a more constructivist perspective.

Danielson and McGreal (2000) described the shift of focus in teaching across history. In

the 1970s, emphasis was given to learning styles (which encouraged an emphasis on teacher-centered,

structured classrooms), anticipatory set, statement of objectives, instructional input, modeling,

checking for understanding, guided practice, and independent practice. In the 1980s, attention was

given to teacher effectiveness, which included expectancy studies, discipline models, Hunter

derivatives, effective school research, cooperative learning, and brain research. In the 1990s,

numerous studies on critical thinking were present in the literature. Teaching emphasized content

knowledge, content pedagogy, alternative assessment, multiple intelligence, collaborative

learning, cognitive learning theory, constructivist classrooms, authentic pedagogy, engaged

teaching and learning, and teaching for understanding. For the 21st century, teaching

effectiveness and critical thinking are refocused on authentic pedagogy. Authentic pedagogy

includes engaged teaching and learning and teaching for understanding.

While the trend in teaching is certainly a good factor to consider when thinking about

how to evaluate, it doesn’t end there. When evaluating, the National Board for Professional

Teaching Standards makes it clear that there is more than just teaching per se:

The fundamental requirements for proficient teaching are relatively clear: a broad

grounding in the liberal arts and sciences; knowledge of the subjects to be taught,

of the skills to be developed, and of the curricular arrangements and materials that

organize and embody that content; knowledge of general and subject-specific

methods for teaching and for evaluating student learning; knowledge of students

and human development; skills in effectively teaching students from racially,

ethnically, and socioeconomically diverse backgrounds; and the skills, capacities,

and dispositions to employ such knowledge wisely in the interest of students.

(The National Board for Professional Teaching Standards, 1998, p.4, as quoted by

Isaacs, 2003)

If one reads through the paragraph, one will notice that from talking about the academic

requirements, it moves to basic psychological knowledge and sociological understanding.

Danielson’s Components of Professional Practice

Danielson & McGreal (2000) proposed a model containing four domains embodying the

components of professional practice. Domain 1, Planning and Preparation, covers everything that

happens before contact with the students: knowing the topic, knowing what one has at one’s

disposal and being able to use it (Demonstrating knowledge of content and pedagogy,

demonstrating knowledge of students, selecting instructional goals, demonstrating knowledge of

resources, designing coherent instruction, and assessing student learning). Domain 2, The

Classroom Environment, speaks of setting the mood for learning and classroom management

(Creating an environment of respect and rapport, establishing a culture for learning, managing

classroom procedures, managing student behavior, and organizing physical space). The third

domain, Instruction, tackles the raison d’être of teaching, from being able to communicate

effectively to actually helping the students learn to giving feedback (Communicating clearly and

accurately, using questioning and discussion techniques, engaging students in learning, providing

feedback to students, and demonstrating flexibility and responsiveness). The last domain,

Professional Responsibilities, contains miscellaneous components of being a professional

teacher: being able to maintain accurate records, to talk to parents, and to reflect on one’s

teaching processes to be able to self-critique them (Reflecting on teaching, maintaining accurate

records, communicating with families, contributing to the school and district, growing and

developing professionally, showing professionalism).

Student Evaluation of Teaching (SET)

The student evaluation of teaching or SET is probably the most widely used form of

evaluation, favored for its simplicity and economy in administration, scoring, and interpretation.

Since it was first put into modern use by pioneers like Herman Remmers, it has been applied in

at least three different ways, as identified by McKeachie (1996). These are student guidance in

choice of course, improvement of teaching, and evaluating teaching for use in personnel

decisions (e.g., tenure or merit salary increases).

What Students Look At

If Wright’s (2006) source Olshavsky and Spreng (1995) is right, then what “level of

knowledge” do students possess and use upon evaluating their instructors?

For starters, Wright (2006) says that the “entertainment” level of classroom experience

(Costin, Greenough, & Menges, 1971 in Wright, 2006) is a big determinant. The “Dr. Fox” study

by Naftulin, Ware, and Donnelly (1973, in Wright, 2006) pointed to this. The researchers hired a

highly enthusiastic actor to give a lecture that was intentionally devoid of any educational value

—yet his teaching was rated quite highly. Definitely, most students want their classes to be more

fun and entertaining (Trout, 1997 in Wright, 2006).

Another thing that seems important to students is their teacher’s communication skills. In

a study by Williams and Ceci (1997, in Wright, 2006), it was found that, though course material,

the lecture format, and student performance were held constant, an improvement of

communication skills led to a significant improvement in the SETs in all areas.

Then, there are the findings of Wright (2000) himself. According to him, two factors that

affect SETs are perceived fairness in grading and instructor appearance.

It is very possible for students to even favor a factor that is detrimental to their

experiencing effective learning. Wright (2006) cites Strom, Hocevar, and Zimmer (1990) who

found that students who preferred easy courses preferred teachers with high student orientation

but did much better in the classes of teachers with low student orientation.

What Students Look At: Steiner et al.’s (2006) Study on Student Biases

Steiner et al. (2006) also did a study looking into what students consider when they

evaluate. They broke down student criteria into four variables: student perceptions, instructor

attributes, instructional formats, and course attributes. Each of these had several variables under

them, studied individually in relation to predicted SET scores. The findings are of interest in that

they show that students (at least those in the data set, since the researchers admitted that their

sample provided very little generalizability) consider how much they perceive they learned from

the teacher, and that they tend to consider things that are well beyond the teacher’s control.

Under student perceptions, the predicted value of SET increased with every unit increase

of a student’s perception of how much they learned and decreased with every unit decrease of the

grade students expect to get (i.e. from an A to a B). In addition to this, though not statistically

significant, the researchers noted as important that how challenging a course is perceived to be

by the students has a positive relationship with SETs.

Under instructor attributes, only one subvariable registered as significant—instructor’s

gender. Generally, student ratings were worse for female professors than for male professors.

Under instructional formats, the significant subvariables were using videos and guest

speakers, offering extra credit opportunities, the percent of course time spent lecturing and the

percent of course time spent in active learning activities. All of these, except for the percent of

course time spent lecturing had a positive relationship with SETs.

No significant subvariable surfaced under the course attributes variable, but the authors

noted that teaching electives and graduate-level classes, as opposed to teaching required

courses, seems to slightly improve SETs.

There was also an interaction effect noted, one between the instructor’s gender and

percent of course time spent lecturing. It appears that the negative effect of lecturing is

particularly pronounced if the professor is male.

The trouble with these findings is that they reveal that sometimes, students include in

their judgment things that are beyond the teachers’ control. For instance, simply being a female

instructor in the sample of Steiner et al. (2006) already greatly diminishes one’s chance for

merits, which is unfair when compared to the male professors. This may not be the case for

every sample, but the reader should keep in mind that any sample might carry biases that

teachers cannot control, so it is important to check for them.

Student Performance and Teacher Evaluation

Studies disagree about the relationship of grades, both actual and expected, to

SETs. Landrum and Dillinger (2004) say that instructor evaluation is weakly but significantly

correlated with actual grade but not with expected grade. Moore (2006) says that neither actual

nor expected grade (referred to as anticipated grade in his study) was an indicator of SETs. And

finally, Steiner et al.’s (2006) study says that expected grade is strongly correlated with SETs.

It is important to be able to establish how the grading system affects SETs because one

major issue that may invalidate the SETs is that “dumbing down” course work, which makes it easier

to get better grades, would generate better evaluations.

Are SETs Really Helpful?

To settle this question, one must first determine what use evaluation results are put to, for

the utility attached to them, whether for administrative decisions or for professional

improvement purposes, changes the answer.

Conducting a metaevaluation (which includes standards for utility) would help to

determine if the evaluation generated helps in making administrative decisions. On the other

hand, if it is the improvement of teaching quality that one is after, then one must be guided by

Remmers’ finding that evaluations do help bring about improvement, but not much (McKeachie,

1996) and by the idea that discussing the evaluation results with another teacher produces

substantial improvement (McKeachie et al., 1980 in McKeachie, 1996). The latter, however, is a

voluntary act, and this is where the dedication of the instructor, teacher, or professor to his or her

professional development comes in.

Aultman (2006) suggests the use of formative evaluation as she has found through

personal experience that it brings about unexpected benefits. She conducted her own formative

evaluation, separate from the official summative evaluation conducted at the end of the semester.

This she did in the third week of the semester, giving her a lot of time to reinforce her strengths

and improve on her weaknesses as identified by the students.

Her “mini-evaluation” was composed of three sections. The first section allowed students

to write down any questions regarding the course content up to that point of the term. The second

part asked students to rate her in four sections: pace of the course, clarity of her lectures, the

quality of the activities the class did to enhance their understanding, and her preparedness for

class. Finally, the third section asked for student comments on how she could improve the class.

In the next meeting, she tackled the evaluations with her students. She answered all the

content-related questions about the course and the questions raised in the evaluation about future

projects that they had not asked in class.

The results were quite astonishing. The students, seeing that their input was given

importance, soon began to raise more questions in class. Sometimes, they would come to class

early or stay behind to ask questions not only about the subject matter but occasionally also

about her dissertation progress or her family. The mini-evaluation had “served as a catalyst for

improved communication between my students and me. Students saw me as a real person as well as

their instructor” (Aultman, 2006).

What Aultman (2006) did requires a certain amount of dedication and a certain comfort

level with criticism. Students may just be the learners, but sometimes they get to see things

that teachers can’t see merely by virtue of their vantage point. If teachers accept that and manage

to learn from their students, they may find that SETs as a method of improving themselves are

effective.

Metaevaluation: “Evaluation of an evaluation”

In 1969, Michael Scriven used the term metaevaluation to describe the evaluation of any

evaluation, evaluative tool, device or measure. Seeing how so many decisions are based on

evaluation tools (which is typically their main purpose for existence in the first place—to help

people make informed decisions), it is no wonder that the need to do metaevaluative work on

these evaluation tools is as great as it is (Stufflebeam, 2000).

In the teaching profession, student evaluation of teachers stands as one of the main tools

of evaluating. However, as earlier stated, while it is but fair that students be included in the

evaluative process, depending on the evaluation process and content, it may not be very fair to

teaching professionals to have their very careers at the mercy of a potentially flawed tool.

The Process of Metaevaluation (Hummel, 2003)

How does one go about performing a metaevaluation? Stufflebeam (2000, in Hummel,

2003) identified certain steps:

1. Determine and Arrange to Interact with the Evaluation's Stakeholders. Stakeholders

can refer to anyone whose interests might be affected by the evaluation under the microscope.

These may include teachers, students, and administrators.

2. Staff the Metaevaluation with One or More Qualified Metaevaluators. Preferably, these

should be people with technical knowledge in psychometrics and people who are familiar with

the Joint Committee Personnel Evaluation Standards. It is sound to have more than one

metaevaluator on the job, so that more aspects may be covered objectively.

3. Define the Metaevaluation Questions. While this might differ on a case-to-case basis,

the four main criteria ought to be present: propriety, utility, feasibility, and accuracy.

4. As Appropriate, Agree on Standards, Principles, and/or Criteria to Judge the Evaluation

System or Particular Evaluation

5. Issue a Memo of Understanding or Negotiate a Formal Metaevaluation Contract

This will serve as a guiding tool. It contains the standards and principles contained in the

previous step and will help both the metaevaluators and their clients understand the direction the

metaevaluation will take.

6. Collect and Review Pertinent Available Information

7. Collect New Information as Needed, Including, for Example, On-Site Interviews,

Observations and Surveys

8. Analyze the Findings. Put together all the qualitative and quantitative data in such a

way that it will be easy to do the following step.

9. Judge the Evaluation's Adherence to the Selected Evaluation Standards, Principles,

and/or other criteria. This is the truly metaevaluative step. Here, one takes the analyzed data and

judges the evaluation based on the standards that were agreed upon and put down in the formal

contract. In another source, this step is lumped with the previous one to form a single step

(Stufflebeam, 2000).

10. Prepare and Submit the Needed Reports. This entails the finalization of the data into a

coherent report.

11. As Appropriate, Help the Client and Other Stakeholders Interpret and Apply the

Findings. This is important for helping the evaluation system under scrutiny improve by ensuring

that the clients know how to use the metaevaluative data properly.

The Standards of Metaevaluation

There are four standards of metaevaluation: propriety, utility, feasibility, and accuracy.

Propriety standards were set to ensure that the evaluation in question is done in an ethical

and legal manner (P1 Service Orientation, P2 Formal Written Agreements, P3 Rights of Human

Subjects, P4 Human Interactions, P5 Complete and Fair Assessment, P6 Disclosure of Findings,

P7 Conflict of Interest, P8 Fiscal Responsibility). They also check to see that the welfare of all

stakeholders is considered (Widmer, 2003 in Hummel, 2003).

Utility standards stand as a check for how much the evaluation in question caters to the

information needs of its users (Widmer, 2003 in Hummel, 2003). They include: (U1) Stakeholder

Identification, (U2) Evaluator Credibility, (U3) Information Scope and Selection, (U4) Values

Identification, (U5) Report Clarity, (U6) Report Timeliness and Dissemination, and (U7)

Evaluation Impact.

Feasibility standards make sure that the evaluation “is conducted in a realistic, well-

considered, diplomatic, and cost-conscious manner” (Widmer, 2003 in Hummel, 2003). They

include: (F1) Practical Procedures, (F2) Political Viability, and (F3) Cost Effectiveness.

Finally, accuracy standards make sure that the evaluation in question produces and

disseminates information that is both valid and useable (Widmer, 2003 in Hummel, 2003). They

include: (A1) Program Documentation, (A2) Context Analysis, (A3) Described Purposes and

Procedures, (A4) Defensible Information Sources, (A5) Valid Information, (A6) Reliable

Information, (A7) Systematic Information, (A8) Analysis of Quantitative Information, (A9)

Analysis of Qualitative Information, (A10) Justified Conclusion, (A11) Impartial Reporting, and

(A12) Metaevaluation.

It should be noted that the aforementioned standards were developed primarily for the

metaevaluation of the evaluation of education, training programs and educational personnel.

The Student Instructional Report

The Student Instructional Report (SIR) currently used by the College of Saint Benilde

originated from the SET form used by De La Salle University. It has been revised over the years:

instructions have been changed, and certain things were omitted from the manual. The items used

today are much the same as they were in 2000, and the instructions are more or less the same as

those written in 2003.

The SIR is administered in the eighth week of every term, the week directly after the

midterms week. The evaluands of the form are teachers; the evaluators are their students; and

other stakeholders are the chairs and deans, who use the data generated by the SIR for

administrative decisions. The results are presented to the teachers after the course cards are

given. By definition, then, it is a form of summative evaluation. There are currently no data that

speak to its value as a method of formative evaluation.

Peer Evaluation Form

The Peer Evaluation Form (PEF) is used by faculty members in observing the

performance of their colleagues. The PEF is designed to determine the extent to which the CSB

faculty has been exhibiting teaching behaviors along the areas of: teacher’s procedures, teacher’s

performance, and students’ actions as observed by their peers.

The PEF is used by a peer observer if the teacher is new in the college and due for

promotion. The peer discusses the observation and the rating given with the faculty member

evaluated. The faculty member signs the form after the conference proper.

Method

Guided Discussion

The Guided Discussion is the primary method of data-gathering for all groups concerned.

As stated above, the represented groups include the teachers, the chairs and/or deans, the CLPA-

PASU staff directly involved in the evaluation process, the evaluation measurement expert team

and the students.

As suggested by Galves (1988), there are five to seven (5-7) participants for every guided

discussion (GD) session. The participants for the GD were chosen by the deans of the respective

schools involved. The groups included are teachers, chairs and/or deans, the CLPA-PASU staff,

a team of evaluation measurement experts from CLPA and students.

Separate GD sessions were conducted for each of the schools of the college because they have

different needs. The scope of this study is to “assess and evaluate” the current practices

undertaken in the SIR and PEF system of administration, scoring, and interpretation. In the GD

sessions that were conducted, the participants were co-evaluators, considering that they all employ

the same PEF and the same SIR items and standards of practice.

Each of the first four groups in the aforementioned list discussed and evaluated along the lines of

one of the four criteria set by the Joint Committee Standards for Evaluation. The Teachers group

was set to discuss and evaluate the propriety aspect; the Chairs/Deans group, the utility aspect; the

CLPA-PASU Staff group, the feasibility aspect; and the team of experts, the accuracy aspect.

Before any of the GD sessions, the list of guide questions for each group was sent to the

chosen participants at least ten days before the scheduled GD session for that group, for a

pre-screening of the topics to be discussed. The participants were given the liberty to request that

other topics be added to the discussion or that certain topics be struck out.

The modified guide containing the set of questions to be covered was presented to the

participants. Three researchers played specific roles as prescribed by Galves (1988): the guide

asked the questions and guided the discussion; the recorder recorded the points raised per question

and any questions the participants cared to ask (using a medium visible to the whole group);

and the observer of the process was tasked to keep the discussion on track, regulate the time per

topic, and prevent anyone from monopolizing the discussion. The guide initiated the discussion

by presenting the new set of questions, at which point the participants were given another

opportunity to add or subtract topics for discussion. Once the final set of questions had been

decided upon and recorded by the recorder, responses were gathered and discussed by the group.

One key feature of the GD method is that a consensus on the issues under a topic must be

reached. When all the points were raised, the group was given the chance to look over their

responses to validate or invalidate them. Whatever the group decides to keep will be kept; what it

chooses to strike out gets stricken out.

The side-along evaluation done by the observer may be done at regular points throughout

the discussions as decided by the group (e.g., after each topic) and/or when he or she deems it fit

to interrupt (e.g., at points when the discussion goes astray, or the participants spend too much

time on one point).

A similar procedure was followed for the Student group. The purpose of the students’

discussion was to gather information on their perspectives on the evaluation process and their

perception of their role as evaluators.

At the end of each discussion, the participants were asked to give their opinion about the

usefulness and feasibility of having this sort of discussion every year to process their questions,

comments, doubts, and suggestions. This provides data for streamlining the metaevaluative

process for future use.

Extant Data, Reliability, and Validity Testing

The average ratings of the professors within the last three school years (AY 2003-2004,

2004-2005, and 2005-2006) were used to generate findings on how well the results could

discriminate between good teaching and teaching that needs improvement. Cronbach’s alpha was used to

determine the internal consistency of the old teacher performance instrument.

The average of the scores for the three terms was computed for each school year,

generating three average scores. These scores were compared to each other to check the

reliability across time.
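
The internal-consistency check described above can be illustrated with a minimal sketch. It assumes that the SIR responses are available as a table of item ratings, with one row per student respondent and one column per item. The function name `cronbach_alpha` and the sample ratings are hypothetical and are not taken from the PASU data. The sketch uses the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a list of respondent rows.

    Each row holds one respondent's ratings on the same k items.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four students rating one teacher on three items.
sample_ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(sample_ratings)
print(round(alpha, 2))
```

An alpha close to 1 indicates that the items rise and fall together across respondents, which is the sense in which the instrument is called internally consistent.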

Metaevaluation Checklist

A checklist was used to determine whether the evaluation meets the standards of utility,

feasibility, propriety, and accuracy. Seven experts in measurement and

evaluation were invited to evaluate the system used by the CLPA in assessing teachers’

performance on both the Student Instructional Report (SIR) and the Peer Evaluation Form (PEF). The

metaevaluators first used a 30-item checklist adopted from the Joint Committee Standards for

Evaluation. The metaevaluators were guided by information from the guided discussion

session notes (as transcribed by the recorder) and other extant data.

Instrumentation

For the GD sessions, a guide list was used. The guide is composed of a set of questions

under each standard that is meant to evaluate the evaluation system (see appendix A). The

questions in the GD are pre-written. In this data-gathering method, they are still subject to

change, both in the fielding of the questions prior to the GD sessions and on the day of the GD

session itself.

The Metaevaluation Checklist by Stufflebeam (2000) was used to rate the SIR and PEF

as an evaluation system. It is composed of ten items for each of the subvariables under each of

the four standards (see appendix B). The task is to check the items in each list that are applicable

in the current teacher performance evaluation system done by the center. Nine to ten (9-10) checked

items (a proportion of 0.9-1.0) generate a rating of excellent for that particular subvariable;

0.7-0.8, very good; 0.5-0.6, good; 0.3-0.4, fair; and 0-0.2, poor.

Data Analysis

The data obtained from the guided discussions (GD) were analyzed using the

qualitative approach. The important themes from the notes produced in the guided

discussions were extracted based on the appraisal components for each area of metaevaluation

standard. For utility, appraisal themes referring to stakeholder identification (persons affected by

the evaluation should be identified), evaluator credibility (trustworthiness and competence of the

evaluator), information scope and selection (broad selection of information/data for evaluation),

values identification (description of procedures and rationale of the evaluation), report clarity

(description of the evaluation being evaluated), report timeliness (findings and reports distributed

to users), and evaluation impact (the evaluation should encourage follow-through by

stakeholders) were extracted. For propriety, the appraisal themes extracted were on service

orientation (designed to assist and address effectively the needs of the organization), formal

agreement (obligations of the formal parties are agreed to in writing), rights of human subjects

(evaluation is conducted to respect and protect the rights of human subjects), and human

interaction (respect human dignity and worth). For feasibility, the themes extracted were on

practical procedures, political viability, fiscal viability, and legal viability. The qualitative data

were used as the basis for accomplishing the metaevaluation checklist for utility, feasibility, and

propriety.

For the standards on accuracy, the existing documents of processes,

procedures, programs, policies, documentations, and reports were made available to the

metaevaluators in order to accomplish the metaevaluation checklist in this area.

In the checklist, the number of items checked under each component of a metaevaluation standard

was divided by 10, and the resulting proportions were averaged across the metaevaluators who

accomplished the checklist. Each component was then interpreted as to whether the system reached

the typical standards of evaluation. The scores are interpreted as 0.9 to 1.0, Excellent; 0.7 to

0.8, Very Good; 0.5 to 0.6, Good; 0.3 to 0.4, Fair; 0.1 to 0.2, Poor.
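
The scoring rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than the Center's actual procedure: the function names `mean_proportion` and `interpret` and the sample counts are hypothetical, and each band is treated as a lower-bound cutoff (a score of 0.3 up to just below 0.5 is Fair, and so on). That reading is an assumption; it is chosen because it agrees with how in-between values such as 0.29 (Poor) and 0.65 (Good) are labeled in the rating tables of this paper.

```python
# Lower bound of each band, checked from highest to lowest (assumed cutoffs).
RATING_BANDS = [
    (0.9, "Excellent"),
    (0.7, "Very Good"),
    (0.5, "Good"),
    (0.3, "Fair"),
]


def mean_proportion(checked_counts, items_per_component=10):
    """Average proportion of checked items across metaevaluators.

    checked_counts holds, for one component, the number of items
    each metaevaluator checked out of items_per_component.
    """
    return sum(checked_counts) / (items_per_component * len(checked_counts))


def interpret(score):
    """Map a mean proportion to its verbal interpretation."""
    for lower_bound, label in RATING_BANDS:
        if score >= lower_bound:
            return label
    return "Poor"


# Hypothetical example: three metaevaluators checked 6, 7, and 8 of 10 items.
score = mean_proportion([6, 7, 8])
print(score, interpret(score))
```

Working from the raw counts and dividing once avoids accumulating rounding differences when a mean falls exactly on a band boundary.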

Results

Utility

Under utility there are five standards evaluated: stakeholder identification, values

identification, functional reporting, follow-up and impact, and

information scope and selection. Table 1 shows the themes and clusters formed in evaluating the

utility of the teacher performance evaluation system.

Table 1

Themes and Clusters for the Utility Standard

Standard: Stakeholder identification
  Cluster: Mode of feedback
    One on one basis
    Informal
    Post conference [document]
    Meetings
    A note is given if the feedback is urgent
  Cluster: Approaches to feedback
    Developmental – suggestions to further improve the teaching skills
    Evaluative – Standing of the faculty
  Cluster: Sources of feedback
    Students (SIR)
    Student Advising
    E-mail from students and parents
    Peers (senior faculty, chairs, deans)
  Cluster: Time of giving feedback
    If the rating is high (3.75 and above), no feedback is given
    When the results of the SIR are low
    If the faculty is new to the college
    Those who have been teaching for a long time and getting low ratings

Standard: Values Identification
  Cluster: Needs
    Results (that) are (not) too cumbersome for deans to read
    A print out of the results should be given
    The time taken to access the results turns off some teachers from accessing them
    Students having difficulty answering the SIR
    Students don’t see how teaching effectiveness is measured
    Create a particular form for laboratory classes in SHRIM classes.
  Cluster: Actions Taken
    Removing items that are valid and another computation is done.
    Other evaluation criteria is done
  Cluster: Instrument value
    There should be indicators for each score
    There should be factors of teaching effectiveness with clear labels
    Identify what the instrument measures
    There needs to be a lump score on learner-centeredness
    There are other success indicators that are not reflected in the SIR

Standard: Functional Reporting
  Cluster: Decisions
    Promotion
    Loading with course
    Retaining PTF
    Deloading a faculty
    Permanency
    Training enhancement
  Cluster: Function
    Use for improvement of the faculty
    The VPA comes up with a list of faculty that will be given teaching load based on SIR reports
    The PEF constricts what needs to be evaluated more.

Standard: Follow-up and Impact
  Cluster: Qualitative
    Give headings/labels for the different parts
    Come up with dimensions and subdimensions
    Devise a way to reach the faculty (yahoo, emails etc.)
    The teachers and students should see what aspects to improve on.
    There should be narrative explanations for the figures
  Cluster: Quantitative
    Faculty doesn’t understand the spreading index
    Conduct a seminar explaining the statistics
    Come up with a general global score.
    Each area should be represented with a number
    Verbal list of strengths and weaknesses of the faculty

For the standard on stakeholder identification, the strands were clustered into four

themes: Mode of feedback, approaches to feedback, sources of feedback, and time of giving

feedback. For the deans and chairs the mode of feedback took the form of “one on one basis,

approach is informal, post conferences, meetings, and when urgent a note is given.” The

approaches in giving feedback were both developmental (suggestions to further improve the

teaching skills) and evaluative (Standing of the faculty). The sources of feedback come from the

students through the SIR, student advising, e-mail from students and parents, and peers (senior

faculty, chairs, deans). Feedback is given “if the rating is high (3.75 and above); sometimes no

feedback is given; when the results of the SIR are low; if the faculty is new to the college; and

those who have been teaching for a long time and getting low ratings.”

For values identification, the strands were clustered into three themes: Needs, action

taken, and value of the instrument. According to the participants, the needs included “Results

(that) are (not) too cumbersome for deans to read; A print out of the results should be given; the

time taken to access the results turns off some teachers from accessing them; students having

difficulty answering the SIR; students don’t see how teaching effectiveness is measured and;

create a particular form for laboratory classes in SHRIM classes.” The action taken theme

included “removing items that are valid and another computation is done and; other evaluation

criteria is done.” The instrument value theme showed that for the instrument to be valuable,

“there should be indicators for each score; there should be factors of teaching effectiveness with

clear labels; identify what the instrument measures; there needs to be a lump score on learner-

centeredness and; there are other success indicators that are not reflected in the SIR.”

For Functional reporting, two clusters emerged: decisions and functions. The decisions

made by the teacher evaluation include promotion, loading with course, retaining PTF, deloading

a faculty, permanency, and training enhancement. The functions of the teacher evaluation are

“used for improvement of the faculty; the VPA comes up with a list of faculty that will be given

teaching load based on SIR reports and; the PEF constricts what needs to be evaluated more.”

The follow-up and impact standard included both qualitative and quantitative aspects. The qualitative

aspect of the instruments included suggestions to “give headings/labels for the different parts;

come up with dimensions and subdimensions; devise a way to reach the faculty (yahoo, emails

etc.); the teachers and students should see what aspects to improve on; and there should be

narrative explanations for the figures.” The quantitative aspect of the report included “faculty

doesn’t understand the spreading index; conduct a seminar explaining the statistics; come up

with a general global score; each area should be represented with a number; and a verbal list of

strengths and weaknesses of the faculty.”

Two clusters were identified for information scope and selection: perception and action.

Under perception, the faculty “looks at evaluation as something negative because the school uses

the results.” The suggested actions were to “come up with CLPA kit explaining the PEF and SIR;

check on the credibility on the answers of students; and SIR needs to be simplified for the

SDEAS.”

Table 2

Rating for Utility

Utility                                 Mean Rating    Interpretation
Stakeholder Identification              0.59           Good
Evaluator Credibility                   0.65           Good
Information Scope and Selection         0.78           Very Good
Values Identification                   0.66           Good
Report Clarity                          0.52           Good
Report Timeliness and Dissemination     0.29           Poor
Evaluation Impact                       0.35           Fair

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (0-.2)

The ratings for utility using the metaevaluation checklist showed that in most of the item

areas, the performance of the teacher evaluation processes is good. In particular, the area on

information scope and selection is very good. However, report timeliness is poor and evaluation

impact is fair and should thus be improved.

Propriety

The standards on propriety include service orientation, formal evaluation guidelines,

conflict of interest, confidentiality, and helpfulness. Table 3 shows the clusters and themes

formed from the guided discussions.

Table 3

Themes and Clusters for the Propriety Standard

Standard: Service Orientation
  Cluster: Results
    Not satisfied because the results come very late
    Prepare hard copies of the results
    Most faculty members could not access the results
    PEF qualitative results are not seen online
  Cluster: Examiner
    Friendly
    Sometimes late
    New staff have difficulty administering the form because they could not handle deaf students
    They are not able to answer the questions of students
  Cluster: Responding
    There should be orientation to students

Standard: Formal Evaluation Guidelines
  Cluster: Students
    They get tired of answering many SIR within the day
  Cluster: Frequency of Meetings
    No guidelines for modular classes and team teaching
    No SIR for OJT classes-teacher cannot be promoted
  Cluster: Observation visits
    Make clear who will call the teacher when the SIR is finish
    The observer can’t make other visits
    The PEF guidelines does not give instructions what the observer will do
    Not practical for the observer to go through the whole process of preobservation, observation and post observation.

Standard: Conflict of interest
    CLPA do not give in to requests
    Not too many queries about the SIR
    Because the LC is adopted by the college, more value is given to the SIR
    SIR is not fully explained to the teacher

Standard: Confidentiality
    The information is very confidential

Standard: Helpfulness
    The comments are read rather than the numbers
    It’s the comments that the teachers look at
    The numerical results need a more clear explanation
    Comments need to be broken down into specific factors

For service orientation, the clusters formed were on the results, examiner, and

responding. According to the participants, they were “not satisfied because the results come very

late.” There is a need to “prepare hard copies of the results” because “most faculty members

could not access the results” and the “PEF qualitative results are not seen online.” The

participants’ appraisal of the examiners includes being “friendly, sometimes late, new staff have

difficulty administering the form because they could not handle deaf students, and they are not

able to answer the questions of students.” In responding to the SIR it was mentioned that “there

should be orientation to students.”

For the formal evaluation guidelines, the three areas specified were the students,

frequency of meetings, and observation visits. For the students, it was mentioned that “they get

tired of answering many SIR (forms) within the day.” In terms of the frequency of meetings,

there are “no guidelines for modular classes and team teaching” and “no SIR for OJT classes and

the teacher cannot be promoted.” In the observation visits, there is a need to “make clear who will

call the teacher when the SIR is finish; the observer can’t make other visits; the PEF guidelines

do not give instructions what the observer will do; it is not practical for the observer to go

through the whole process of preobservation, observation and post observation.”

No clusters were formed for the conflict of interest. The themes extracted were “CLPA

do not give in to requests; not too many queries about the SIR; because the LC is adopted by the

college, more value is given to the SIR; and SIR is not fully explained to the teacher.”

For confidentiality, the majority of the participants agreed that “the information kept by the

center is very confidential.”

For the area on helpfulness the themes identified were “the comments are read rather than

the numbers; it’s the comments that the teachers look at; the numerical results need a more clear

explanation; and comments need to be broken down into specific factors.”

Table 4

Rating for Propriety

Propriety                        Mean Rating    Interpretation
Service Orientation              0.53           Good
Formal Agreement                 0.70           Very Good
Rights of Human Subjects         0.62           Good
Human Interaction                0.57           Good
Complete and Fair Assessment     0.38           Fair
Disclosure of Findings           0.50           Good
Conflict of Interest             0.48           Fair
Fiscal Responsibility            0.40           Fair

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2)

Most of the ratings for propriety using the metaevaluation checklist were pegged at good.

A very good rating was obtained for formal agreement. A fair rating is obtained in the areas of

complete and fair assessment, conflict of interest, and fiscal responsibility.

Feasibility

The standards on feasibility include practical procedures, political viability, fiscal

viability, and legal viability. Table 5 shows the clusters and themes for the standards.

Table 5

Themes and Clusters for the Feasibility Standard

Standard: Practical procedures
  Cluster: Understandability of the Instructions
    Generally, the students do not understand the instructions.
    Korean students don’t understand
    Some students (freshmen or even higher years) do not understand the instructions in parts 2 and 3: how is it done?
    Students normally ask, "Why 3 columns? Why can't column 2 be shaded if there is no 1?"
  Cluster: Difficulty with the Comments and Suggestions Part
    There are a lot of questions about the Comments and Suggestions part.
    Is it really required to answer, or is it optional?
    Others do not want to fill it in because the professor might read it and scold them.
    It is obvious they do not understand the instructions because they do not complete the sentence.
    The statement is not clear to them.
  Cluster: Difficulty with the Instrument as a Whole
    The instrument is complicated, not only the instructions.

Standard: Political viability
  Cluster: Time Issues in Administration
    Time: the 30 minutes is not enough for explaining and administering; add another 10 minutes for first-year classes. If the faculty would agree, if we prolong the time for administration
    Pressured
    The new instrument needs 30 minutes to be administered so other classes cannot be accommodated because the time will not be enough anymore.
    (In cases where) the last 30 minutes was used (for the evaluation)…and the next class (had theirs in the)…first 30 minutes, problem(s) occurs in managing the time.
  Cluster: Time Issues of Teachers
    Some faculty members do not want to be evaluated in the first 30 minutes
    Some faculty members dictate that the last 30 minutes will be used for evaluation.
    Teachers complain about the duration of the SIR administration but the guidelines indicated first 30 minutes.
  Cluster: Rescheduling Issues
    Rescheduling is always granted because some of the faculty members (or their students) do not show up.
    Rescheduling due to conflict with other activities: The Young Hoteliers’ Exposition and some tours and retreats have the same schedule as the SIR.
  Cluster: Frequency of Evaluation
    Is it necessary to evaluate every term?
  Cluster: Identifying and Anticipating Teacher-related Issues
    Anticipating name changes (e.g. female faculty members getting married) – We need to request for the updated list of the faculty names early in the term, a list including the faculty members who changed their surnames with ACTC.
    Faculty member comment – That evaluation form has been wrong for a long time.
    (To make sure the teacher is aware of his/her evaluation schedule) Signing in the receiving copy of memo is helpful.
    In case there is a transfer of classroom, the teacher should put a note in front of the door.
    Need to think of (ways of) maximizing the time of the teacher while outside of the classroom.
    The door should be locked to prevent the teacher from coming in before the students are done evaluating.
    Getting ahead of the faculty for the evaluation (Go to the room ahead of the faculty member to preempt anything he/she may have planned for the day)
    The notorious faculty members are remembered so that the staff know how to approach them next time.
    Some teachers announce the evaluation and it affects the attendance of students – students do not show up.
    The policies are not read by the faculty in terms of the guidelines. They always come and call CLPA about the release of results.
    Some faculty members do not know how to enter the faculty portal. The instructions on accessing the SIR results need to be included in the guidelines.
    Some faculty would want to sit at the back of the class and wants to stay inside during the SIR administration.
  Cluster: Anticipating Student Needs
    Compile the needs of students and present it to students in an attractive form.
    Drum up the interest of students in the evaluation.
  Cluster: Concerns about Utility
    Utilization is used by high end users. For others, the process (of utilization) is not clear.

Standard: Cost Effectiveness
  Cluster: Human resources
    The (human) resources (e.g. staff) are maximized during SIR.
    LASU staffers are also used.
    Have the transcript read with the LASU staff.
    Well-utilized
    Some staffers have difficulties going home because of the late hours.
    Meals provided - okay, except they’re redundant sometimes. They serve as good compensation for the late work hours.
  Cluster: Material resources
    The scanner is worth it because it encodes the responses fast and it helps meet the deadlines.
    The SIR process is well-supported by the College
    Sometimes it’s hard to administer in AKIC.
  Cluster: Technology
    The programmer is new (as of September 2006); he does not yet have a feel for the data processing, so he gets lost. He is not yet attuned to the work flow.
    The staffs are oriented with the use of the program.
    The program is shared.
    Random checking of the comments and the editing can be directly done
    The faculty members do not have their own PCs so they do not get the memo on time.
    Few respondents with online evaluation. We need to maximize online. If all classes come together for on-line the computers hang.

Standard: Legal viability
  Cluster: Standardizing the Evaluation Setting
    There is a common script
    The classroom is generally conducive in answering
    During C-break some classes are affected with the noise.

For practical procedures, the clusters formed were on the understandability of the

instructions, difficulty with the comments and suggestions part, and difficulty with the instrument as

a whole. These clusters show that while there are standardized procedures for every run of the

SIR, there is a difficulty following them because “generally, the students do not understand the

instructions.” The comments and suggestions part (part four) of the instrument appears to be

particularly problematic. Here too, the instructions do not seem to be clear to the students:

“halatang hindi naiintindihan ang instructions kasi hindi kinukumpleto ang sentence” (it is

obvious they do not understand the instructions because they do not complete the sentence).

Other than this, some students are not sure whether “talagang sagutan or optional” (they are

required to answer or it is optional). Others don’t feel safe answering this part because they are

afraid their professors will get back at them for whatever they write. Ultimately, observed the

participants, “the instrument (itself) is complicated, not only the instructions.”

For political viability, eight issue clusters were formed. These were time issues in

administration, time issues of teachers, rescheduling issues, frequency of evaluation, anticipating

name changes, identifying and anticipating teacher-related issues, anticipating student needs, and

concerns about utility. The time issues in administration concerned administration problems

regarding the first-thirty-minutes policy observed by the Center. The time allotment is generally

too short for the whole administration procedure, from giving instructions to the actual answering

of the instrument (“yung thirty minutes kulang sa pagexplain at pagadminister”; the thirty minutes

is not enough for explaining and administering). Teachers also have issues regarding the same

policy. Some refuse to be rated in the first thirty minutes, preferring to be rated in the last

thirty: there are faculty members who “dictate that

the last 30 minutes will be used for evaluation”. There are others who “complain about the

duration of the SIR administration”, even if “the guidelines (distributed in the eighth week of the

term, the week before the evaluation) indicated first 30 minutes.”

Though discouraged by the Center, rescheduling still does happen during the evaluation

period. Usually it is because “some of the faculty members (or their students) do not show up”.

Similarly, there are times when some students do come, but their numbers do not meet the fifty

percent quota required for each section’s evaluation. Another common reason for rescheduling

is a conflict of schedule with other activities: “(the) Young Hoteliers’ Exposition and some tours

and retreats have the same schedule as the SIR”.

The next issue “cluster” formed is regarding the frequency of evaluation; teachers

question whether there is a need to evaluate every term. Although there is only one strand, it is

important enough to be segregated as it gives voice to one of the interest groups’ major concerns.

The next cluster forms the biggest group, the cluster that talks about identifying and

anticipating the needs of one of the major interest groups/stakeholders of the whole

evaluation system: the teachers themselves. Their needs range from the minor (“We need to

request for the updated list of the faculty names early in the term, a list including the faculty

members who changed their surnames with ACTC.”) to the major (“Matagal nang mali yang

evaluation form”; that evaluation form has been wrong for a long time), with a lot in between.

Among the latter are the need to make sure that

teachers are aware of their evaluation schedules and the Center’s policies, to come up with ways

to deal with the teachers during the actual administration, and to equip them with the know-how

to access their online results.

Just as the teachers, the evaluatees, have needs, so do their evaluators, the students. By not

taking care of the students’ needs and/or preferences, the Center risks generating inaccurate

results. Thus, the Center should “compile the needs of students and present it (the SIR) to (the)

students in an attractive form. (CLPA should) drum up the interest of students in the evaluation.”

Last under this area are issues on utilization. There appears to be a need to make the

utilization clearer to the stakeholders, especially the teachers.

For the area on cost effectiveness, the clusters formed were human resources, material

resources, and technology. The human resources of the Center are “well-utilized”. Despite

special cases when they find it difficult to go home because of the late working

hours, the staff feel well compensated, in part because of the meals served. As to material resources,

“the SIR process is well-supported by the College” and so, everything is generally provided.

There are special cases where the evaluation setting makes administration difficult. For instance,

“sometimes it’s hard to administer in AKIC”, especially in the food labs. Finally, under the

theme of technology, the Center proved well-equipped enough to handle the pen-and-paper

instrument’s processing. However, it may be some time before the process become paperless; if

the memos would be delivered online, instead of personally, as is currently done, some of the

faculty would “not get the memo on time” because “the faculty members do not have their own

PCs”. Then, an attempt was made to administer the instrument online. A problem that was noted

in this regard was “kaunting respondents with online evaluation” (very few respondents are

gathered with the online evaluation). Other than that, “if all classes come together for on-line the

computers hang.”

For legal viability, only one theme was developed, standardizing the evaluation setting.

“There is a common script” to keep the instructions standardized and, although “During C-break

some classes are affected with the noise (of C-break activities)”, the “classroom is generally

conducive in answering”.

Table 6

Rating for Feasibility

Feasibility            Rating   Interpretation
Political Viability    0.23     Poor
Practical Procedure    0.68     Good
Cost effectiveness     0.50     Good

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2)

For the three areas of feasibility, a good rating was obtained for practical procedure and

cost effectiveness and poor for political viability.
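The mapping from a numerical rating to its verbal interpretation can be sketched in a few lines of Python. This is only an illustration: the function name `interpret_rating` is hypothetical, and because the note under the table leaves gaps between bands (for example between .6 and .7), the sketch assumes that each band begins at its lower bound, which agrees with the interpretations reported in Table 6.

```python
def interpret_rating(score: float) -> str:
    """Map a 0-1 checklist rating to its verbal interpretation.

    Assumption (not stated in the paper): each band starts at its lower
    bound (.9, .7, .5, .3) and any rating below .3 is read as Poor.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("rating must lie between 0 and 1")
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Very Good"
    if score >= 0.5:
        return "Good"
    if score >= 0.3:
        return "Fair"
    return "Poor"


# Ratings for the three feasibility areas, copied from Table 6.
feasibility_ratings = {
    "Political Viability": 0.23,
    "Practical Procedure": 0.68,
    "Cost effectiveness": 0.50,
}

labels = {area: interpret_rating(r) for area, r in feasibility_ratings.items()}
for area, label in labels.items():
    print(f"{area}: {label}")
```

Under this reading, a rating of 0.68 still falls in the Good band because it does not reach the .7 lower bound for Very Good.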

Accuracy

The standards of accuracy were rated based on the reliability reports of the instrument

from SY 2003-2004 to SY 2005-2006. The trend in the mean ratings of the

faculty from 2003 to 2006 was also obtained.

Table 7

Internal Consistency of the items for the SIR from 2003 to 2006

School Year 2003-2004 2004-2005 2005-2006
1st Term 0.873 0.875 0.881
2nd Term 0.888 0.892 0.894
3rd Term 0.892 0.885
Summer 0.832 0.866

The reliability of the SIR form was consistently high from 2003 to 2006. The Cronbach

alphas obtained are all at the same high level across the terms and across the three school

years. This indicates that the internal consistency of the SIR measure is stable and accurate

across time.
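For readers unfamiliar with the statistic, the sketch below shows one common way to compute Cronbach's alpha, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the small data set are hypothetical and are not taken from the SIR; the paper does not describe the software used to obtain the coefficients in Table 7.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha for a respondents-by-items score matrix.

    responses: list of rows, one per respondent, each a list of item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from five students on three items (1-5 scale).
sample = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 3))
```

Values close to 1, such as those in Table 7, indicate that the items vary together across respondents.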

Figure 1 shows a line graph of the means in the SIR each term across three school years.

Figure 1

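Data Trend from the Last Three Years

[Line graph: mean SIR rating (vertical axis, 3.70 to 4.40) per term (1st to 4th) across three school years, with one line each for Part 1, Part 2, and Part 3]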

The trend in the means shows that the SIR results rise sharply during the summer

terms (4th). The sharp rise can be observed from the spikes at the 4th term in the line

graph for the three parts of the SIR instrument. The means during the first, second, and third terms

are stable and then increase rapidly for the summer term.

Table 8

Rating for Accuracy

Accuracy                                Rating   Interpretation
Program documentation                   0.03     Poor
Content Analysis                        0.00     Poor
Described Purposes and Procedures       0.25     Poor
Defensible Information Sources          0.50     Good
Valid Information                       0.23     Poor
Reliable information                    0.35     Fair
Systematic information                  0.85     Very Good
Analysis of Quantitative information    0.25     Poor
Analysis of Qualitative information     0.00     Poor
Justified conclusions                   0.00     Poor
Impartial reporting                     0.38     Fair
Metaevaluation                          0.08     Poor

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2)

The ratings for accuracy using the metaevaluation checklist were generally poor in most

areas. Only systematic information was rated as very good, only defensible information sources

was rated as good, and both reliable information and impartial reporting were fair.

Table 9

Summary Ratings for the Standards

Standard       Rating   Percentage   Interpretation
Feasibility    4.5      25%          Fair
Accuracy       8.75     0%           Poor
Propriety      15.17    25%          Fair
Utility        13.5     25%          Fair

In the four standards as a whole, feasibility (25%), propriety (25%), and utility (25%) are

met fairly and accuracy (0%) is poor for the entire teacher performance evaluation system of the

center. The poor accuracy is due to zero ratings on content analysis, analysis of qualitative information, and

justified conclusions. The three standards rated as fair did not even meet half of the standards in

the metaevaluation checklist.

Figure 2

Outcome of the Standards

[Bar chart “Standard Outcome”: percentage (0% to 100%) for Feasibility, Accuracy, Propriety, and Utility]

Discussion

The overall findings in the metaevaluation of the teacher evaluation system at the Center

for Learning and Performance Assessment show that it falls below the standards of the Joint

Committee on Evaluation. The ratings of utility, propriety, and feasibility were fair and the

standard on accuracy is poor.

Under the standard of utility, report timeliness and dissemination were rated poor. This is due to the

lack of timely exchanges with the full range of right-to-know audiences. To improve the

timeliness of these exchanges, the Center needs to communicate consistently with the different offices

that it serves.

For propriety, the rating is only fair because low ratings were obtained for complete and

fair assessment, conflict of interest, and fiscal responsibility. To improve complete and fair

assessment, there is a need to assess and report the strengths and weaknesses of the procedure,

use the strengths to overcome weaknesses, and estimate the effects of the evaluation’s limitations on

the overall judgment of the system. In line with conflict of interest, there is a need to

release evaluation procedures, data, and reports for public review. For fiscal responsibility,

there is a need to maintain adequate personnel records concerning job allocations and time spent

on the job, and to employ comparison shopping for evaluation materials.

In the standards of accuracy, the majority of the ratings were poor, including program

documentation, content analysis, described purposes and procedures, valid information, analysis

of qualitative and quantitative information, justified conclusions, and metaevaluation. For program

documentation, the only criterion met was the technical report that documents the program’s

operations; the other nine criteria were not met. For content analysis, none of the criteria were met. In

described purposes and procedures, only recording the client’s purpose for the evaluation and

recording the actual evaluation procedures as implemented were met. The other eight criteria were not met.

For valid information, there is a need to focus evaluation on key ideas, employ multiple

measures to address each idea, provide detailed description of the constructs assessed, report the

type of information each employed procedure acquires, report and justify inferences, report the

comprehensiveness of the information provided by the procedures as set in relation to the

information needed, and establish meaningful categories of information by identifying regular

and recurrent themes using qualitative analysis. In the analysis of qualitative and quantitative

information, there is a need to conduct exploratory analysis to assure data correctness, choose

procedures appropriate to the system of evaluating teachers, specify assumptions being met by

the evaluation, report limitations of each analytic procedure, examine outliers and verify

correctness, analyze statistical interactions, and use displays to clarify the presentation and

interpretation of statistical results. In the areas of justified conclusions and metaevaluation, none

of the criteria were met.

In the standards of feasibility, political viability needs to be improved. For political

viability, the evaluation needs to consider ways to counteract attempts to bias or misapply the

findings, foster cooperation, involve stakeholders throughout the evaluation, issue interim

reports, report divergent views, and affirm a public contract.

Given the present condition of the SIR and PEF in evaluating faculty performance based

on the qualitative data, there are still gaps that need to be addressed in line with the evaluation

system. The stakeholders are largely unaware of the detailed standards for conducting

evaluations among their faculty and what is verbalized in the qualitative data is only based on

their personal experience and the practices required of the evaluation system. By contrast, the

standards on evaluation would specify more details that need to be met in the evaluation. Some

areas in the evaluation are interpreted by the stakeholders as acceptable based on the themes of

the qualitative data, but more criteria need to be met across the broader scope of teacher evaluation. It is

recommended that the Center for Learning and Performance Assessment consider the specific

areas found wanting under utility, propriety, feasibility, and especially accuracy to attain quality

standards in their conduct of teacher evaluation.

Appendix A

Gabay (Guide) Based on the Joint Committee Standards for Evaluation

Propriety
1. Does PASU provide quality service in delivering the SIR? (service orientation)
2. Are the guidelines for the SIR clear? (formal evaluation guidelines)
3. Does PASU answer queries on the SIR and PEF appropriately? (conflict of interest)
4. Is confidentiality of information maintained? (access to personnel evaluation)
5. Do the SIR and PEF help teachers improve their teaching performance? (interaction with evaluatees)
6. What items are applicable in your area?

Utility
1. Do you provide feedback on the rating of your faculty? (constructive orientation)
2. Do the SIR and PEF fit the needs of your school? (defined uses)
3. Is the SIR conducted professionally? (evaluator credibility) Does the PEF facilitate the feedback process?
4. Are the SIR and PEF helpful in decision making and providing loads for the teachers? (functional reporting)
5. Are the reports generated clear for faculty and chairpersons? (follow-up and impact)

Feasibility
1. Are the instructions for the SIR clear for students? (practical procedures)
2. What part of the policies and procedures for the SIR needs to be appealed and rectified? (political viability)
3. Are the resources used effectively? (fiscal viability)
4. Does the process adhere to testing standards? (legal viability)

Accuracy
1. Is the staff generally qualified to administer the evaluation? (defined roles)
2. Are the conditions of the students and faculty considered during administration of the SIR/PEF? (work environment)
3. Is the processing of the SIR/PEF well documented? (documentation and procedures)
4. Is the staff well-trained in the scoring, coding and data entry? (systematic data control)
5. Are potential biases safeguarded against? (bias control)
6. Do we periodically assess the evaluation? (monitoring and control)

Appendix B

Metaevaluation Checklist

The Metaevaluation Checklist: For Evaluating Evaluations against The Program Evaluation Standards - Accuracy

To meet the requirements for ACCURACY, evaluations should:

A1 Program Documentation

Collect descriptions of the intended program from various written sources

Collect descriptions of the intended program from the client and various stakeholders

Describe how the program was intended to function

Maintain records from various sources of how the program operated

As feasible, engage independent observers to describe the program's actual operations

Describe how the program actually functioned

Analyze discrepancies between the various descriptions of how the program was intended to function

Analyze discrepancies between how the program was intended to operate and how it actually operated

Ask the client and various stakeholders to assess the accuracy of recorded descriptions of both the intended and the actual program

Produce a technical report that documents the program's operations

TOTAL

Total ÷ 10

0.9 – 1.0 Excellent
0.7 – 0.8 Very Good
0.5 – 0.6 Good
0.3 – 0.4 Fair
0.1 – 0.2 Poor
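The "Total ÷ 10" step for a single sub-standard amounts to counting the checkpoints judged as met and dividing by ten. The sketch below illustrates this for one hypothetical rater; the function name `substandard_score` is an assumption, and the example marks only the last A1 checkpoint (the technical report) as met, following the description of program documentation in the Discussion.

```python
def substandard_score(checkpoints_met):
    """Return the proportion of the ten checkpoints marked as met.

    checkpoints_met: sequence of ten booleans, one per checkpoint,
    in the order the checkpoints are listed in the checklist.
    """
    if len(checkpoints_met) != 10:
        raise ValueError("each sub-standard has exactly ten checkpoints")
    return sum(1 for met in checkpoints_met if met) / 10


# Hypothetical single-rater record for A1 Program Documentation:
# only "Produce a technical report ..." (the tenth checkpoint) is met.
a1_checkpoints = [False] * 9 + [True]
a1_score = substandard_score(a1_checkpoints)
print(a1_score)
```

The resulting proportion is then read against the rating scale printed above.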

A2 Content Analysis

Use multiple sources of information to describe the program's context

Describe the context's technical, social, political, organizational, and economic features

Maintain a log of unusual circumstances

Record instances in which individuals or groups intentionally or otherwise interfered with the program

Record instances in which individuals or groups intentionally or otherwise gave special assistance to the program

Analyze how the program's context is similar to or different from contexts where the program might be adopted

Report those contextual influences that appeared to significantly influence the program and that might be of interest to potential adopters

Estimate effects of context on program outcomes

Identify and describe any critical competitors to this program that functioned at the same time and in the program's environment

Describe how people in the program's general area perceived

TOTAL

Total ÷ 10


A3 Described Purposes and Procedures

At the evaluation's outset, record the client's purposes for the evaluation

Monitor and describe stakeholders' intended uses of evaluation findings

Monitor and describe how the evaluation's purposes stay the same or change over time

Identify and assess points of agreement and disagreement among stakeholders regarding the evaluation's purposes

As appropriate, update evaluation procedures to accommodate changes in the evaluation's purposes

Record the actual evaluation procedures, as implemented

When interpreting findings, take into account the different stakeholders' intended uses of the evaluation

When interpreting findings, take into account the extent to which the intended procedures were effectively executed

Describe the evaluation's purposes and procedures in the summary and full-length evaluation reports

As feasible, engage independent evaluators to monitor and evaluate the evaluation's purposes and procedures

TOTAL

Total ÷ 10


A4 Defensible Information Sources

Obtain information from a variety of sources

Use pertinent, previously collected information once validated

As appropriate, employ a variety of data collection methods

Document and report information sources

Document, justify, and report the criteria and methods used to select information sources

For each source, define the population

For each population, as appropriate, define any employed sample

Document, justify, and report the means used to obtain information from each source

Include data collection instruments in a technical appendix to the evaluation report

Document and report any biasing features in the obtained information

TOTAL

Total ÷ 10


A5 Valid Information

Focus the evaluation on key questions

As appropriate, employ multiple measures to address each question

Provide a detailed description of the constructs and behaviors about which information will be acquired

Assess and report what type of information each employed procedure acquires

Train and calibrate the data collectors

Document and report the data collection conditions and process

Document how information from each procedure was scored, analyzed, and interpreted

Report and justify inferences singly and in combination

Assess and report the comprehensiveness of the information provided by the procedures as a set in relation to the information needed to answer the set of evaluation questions

Establish meaningful categories of information by identifying regular and recurrent themes in information collected using qualitative assessment procedures

TOTAL

Total ÷ 10


A6 Reliable Information

Identify and justify the type(s) and extent of reliability claimed

For each employed data collection device, specify the unit of analysis

As feasible, choose measuring devices that in the past have shown acceptable levels of reliability for their intended uses

In reporting reliability of an instrument, assess and report the factors that influenced the reliability, including the characteristics of the examinees, the data collection conditions, and the evaluator's biases

Check and report the consistency of scoring, categorization, and coding

Train and calibrate scorers and analysts to produce consistent results

Pilot test new instruments in order to identify and control sources of error

As appropriate, engage and check the consistency between multiple observers

Acknowledge reliability problems in the final report

Estimate and report the effects of unreliability in the data on the overall judgment of the program

TOTAL

Total ÷ 10


A7 Systematic Information

Establish protocols for quality control of the evaluation information

Train the evaluation staff to adhere to the data protocols

Systematically check the accuracy of scoring and coding

When feasible, use multiple evaluators and check the consistency of their work

Verify data entry

Proofread and verify data tables generated from computer output or other means

Systematize and control storage of the evaluation information

Define who will have access to the evaluation information

Strictly control access to the evaluation information according to established protocols

Have data providers verify the data they submitted

TOTAL

Total ÷ 10


A8 Analysis of Quantitative Information

Begin by conducting preliminary exploratory analyses to assure the data's correctness and to gain a greater understanding of the data

Choose procedures appropriate for the evaluation questions and nature of the data

For each procedure specify how its key assumptions are being met

Report limitations of each analytic procedure, including failure to meet assumptions

Employ multiple analytic procedures to check on consistency and replicability of findings

Examine variability as well as central tendencies

Identify and examine outliers and verify their correctness

Identify and analyze statistical interactions

Assess statistical significance and practical significance

Use visual displays to clarify the presentation and interpretation of statistical results

TOTAL

Total ÷ 10


A9 Analysis of Qualitative Information

Focus on key questions

Define the boundaries of information to be used

Obtain information keyed to the important evaluation questions

Verify the accuracy of findings by obtaining confirmatory evidence from multiple sources, including stakeholders

Choose analytic procedures and methods of summarization that are appropriate to the evaluation questions and employed qualitative information

Derive a set of categories that is sufficient to document, illuminate, and respond to the evaluation questions

Test the derived categories for reliability and validity

Classify the obtained information into the validated analysis categories

Derive conclusions and recommendations and demonstrate their meaningfulness

Report limitations of the referenced information, analyses, and inferences

TOTAL

Total ÷ 10


A10 Justified Conclusions

Focus conclusions directly on the evaluation questions

Accurately reflect the evaluation procedures and findings

Limit conclusions to the applicable time periods, contexts, purposes, and activities

Cite the information that supports each conclusion

Identify and report the program's side effects

Report plausible alternative explanations of the findings

Explain why rival explanations were rejected

Warn against making common misinterpretations

Obtain and address the results of a prerelease review of the draft evaluation report

Report the evaluation's limitations

TOTAL

Total ÷ 10


A11 Impartial Reporting

Engage the client to determine steps to ensure fair, impartial reports

Establish appropriate editorial authority

Determine right-to-know audiences

Establish and follow appropriate plans for releasing findings to all right-to-know audiences

Safeguard reports from deliberate or inadvertent distortions

Report perspectives of all stakeholder groups

Report alternative plausible conclusions

Obtain outside audits of reports

Describe steps taken to control bias

Participate in public presentations of the findings to help guard against and correct distortions by other interested parties

TOTAL

Total ÷ 10


A12 Metaevaluation

Designate or define the standards to be used in judging the evaluation

Assign someone responsibility for documenting and assessing the evaluation process and products

Employ both formative and summative metaevaluation

Budget appropriately and sufficiently for conducting the metaevaluation

Record the full range of information needed to judge the evaluation against the stipulated standards

As feasible, contract for an independent metaevaluation

Determine and record which audiences will receive the metaevaluation report

Evaluate the instrumentation, data collection, data handling, coding, and analysis against the relevant standards

Evaluate the evaluation's involvement of and communication of findings to stakeholders against the relevant standards

Maintain a record of all metaevaluation steps, information, and analyses

TOTAL

Total ÷ 10


SCORING THE EVALUATION FOR ACCURACY

Add the following:
Number of Excellent ratings (0-12) ____ x 4 = ____
Number of Very Good ratings (0-12) ____ x 3 = ____
Number of Good ratings (0-12) ____ x 2 = ____
Number of Fair ratings (0-12) ____ x 1 = ____
Total Score = ____ (Interpret below)

Strength of the Evaluation's provisions for Accuracy:
45 (93%) - 48 Excellent
33 (68%) - 44 Very Good
24 (50%) - 32 Good
12 (25%) - 23 Fair
0 (0%) - 11 Poor
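The scoring key above can be expressed as a short routine. The sketch below is illustrative only: the names `POINTS`, `ACCURACY_CUTOFFS`, and `score_accuracy` are hypothetical, and the twelve interpretations from Table 8 are treated as if they came from a single rater, which is an assumption. The example is therefore not meant to reproduce the rating of 8.75 reported in Table 9, since the way ratings were combined across raters is not described here.

```python
# Points awarded per sub-standard rating, as in the scoring key.
POINTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

# Lower bound of the total score for each strength band (maximum 48).
ACCURACY_CUTOFFS = [
    (45, "Excellent"),
    (33, "Very Good"),
    (24, "Good"),
    (12, "Fair"),
    (0, "Poor"),
]


def score_accuracy(ratings):
    """Return (total score, strength label) for twelve accuracy ratings."""
    if len(ratings) != 12:
        raise ValueError("the accuracy standard has twelve sub-standards")
    total = sum(POINTS[label] for label in ratings)
    for cutoff, strength in ACCURACY_CUTOFFS:
        if total >= cutoff:
            return total, strength
    raise ValueError("total score cannot be negative")


# Interpretations for A1 to A12 as listed in Table 8.
table8_labels = [
    "Poor", "Poor", "Poor", "Good", "Poor", "Fair",
    "Very Good", "Poor", "Poor", "Poor", "Fair", "Poor",
]
result = score_accuracy(table8_labels)
print(result)
```

The same pattern applies to the other standards by swapping in their own cutoffs and number of sub-standards.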

The Metaevaluation Checklist: For Evaluating Evaluations against The Program Evaluation Standards - Feasibility

To meet the requirements for FEASIBILITY, evaluations should:

F1 Practical Procedures

Tailor methods and instruments to information requirements

Minimize disruption

Minimize the data burden

Appoint competent staff

Train staff

Choose procedures that the staff are qualified to carry out

Choose procedures in light of known constraints

Make a realistic schedule

Engage locals to help conduct the evaluation

As appropriate, make evaluation procedures a part of routine events

TOTAL

Total ÷ 10


F2 Political Viability

Anticipate different positions of different interest groups

Avert or counteract attempts to bias or misapply the findings

Foster cooperation

Involve stakeholders throughout the evaluation

Agree on editorial and dissemination authority

Issue interim reports

Report divergent views

Report to right-to-know audiences

Employ a firm public contract

Terminate any corrupted evaluation

TOTAL

Total ÷ 10


F3 Cost Effectiveness

Be efficient

Make use of in-kind services

Produce information worth the investment

Inform decisions

Foster program improvement

Provide accountability information

Generate new insights

Help spread effective practices

Minimize disruptions

Minimize time demands on program personnel

TOTAL

Total ÷ 10


SCORING THE EVALUATION FOR FEASIBILITY

Add the following:
Number of Excellent ratings (0-3) ____ x 4 = ____
Number of Very Good ratings (0-3) ____ x 3 = ____
Number of Good ratings (0-3) ____ x 2 = ____
Number of Fair ratings (0-3) ____ x 1 = ____
Total Score = ____ (Interpret below)

Strength of the Evaluation's provisions for Feasibility:
11 (93%) - 12 Excellent
8 (68%) - 10 Very Good
6 (50%) - 7 Good
3 (25%) - 5 Fair
0 (0%) - 2 Poor

The Metaevaluation Checklist: For Evaluating Evaluations against The Program Evaluation Standards - Propriety

To meet the requirements for PROPRIETY, evaluations should:

P1 Service Orientation

Assess needs of the program's customers

Assess program outcomes against targeted customers' assessed needs

Help assure that the full range of rightful program beneficiaries are served

Promote excellent service

Make the evaluation's service orientation clear to stakeholders

Identify program strengths to build on

Identify program weaknesses to correct

Give interim feedback for program improvement

Expose harmful practices

Inform all right-to-know audiences of the program's positive and negative outcomes

TOTAL

Total ÷ 10


P2 Formal Agreements

Reach advance written agreements on:

Evaluation purpose and questions

Audiences

Evaluation reports

Editing

Release of reports

Evaluation procedures and schedule

Confidentiality/anonymity of data

Evaluation staff

Metaevaluation

Evaluation resources

TOTAL

Total ÷ 10


P3 Rights of Human Subjects

Make clear to stakeholders that the evaluation will respect and protect the rights of human subjects

Clarify intended uses of the evaluation

Keep stakeholders informed

Follow due process

Uphold civil rights

Understand participant values

Respect diversity

Follow protocol

Honor confidentiality/anonymity agreements

Do no harm

TOTAL

Total ÷ 10


P4 Human Interactions

Consistently relate to all stakeholders in a professional manner

Maintain effective communication with stakeholders

Follow the institution's protocol

Minimize disruption

Honor participants' privacy rights

Honor time commitments

Be alert to and address participants' concerns about the evaluation

Be sensitive to participants' diversity of values and cultural differences

Be even-handed in addressing different stakeholders

Do not ignore or help cover up any participant’s incompetence, unethical behavior, fraud, waste, or abuse

TOTAL

Total ÷ 10


P5 Complete and Fair Assessment

Assess and report the program's strengths

Assess and report the program's weaknesses

Report on intended outcomes

Report on unintended outcomes

Give a thorough account of the evaluation's process

As appropriate, show how the program's strengths could be used to overcome its weaknesses

Have the draft report reviewed

Appropriately address criticisms of the draft report

Acknowledge the final report's limitations

Estimate and report the effects of the evaluation's limitations on the overall judgment of the program

TOTAL

Total ÷ 10


P6 Disclosure of Findings

Define the right-to-know audiences

Establish a contractual basis for complying with right-to-know requirements

Inform the audiences of the evaluation's purposes and projected reports

Report all findings in writing

Report relevant points of view of both supporters and critics of the program

Report balanced, informed conclusions and recommendations

Show the basis for the conclusions and recommendations

Disclose the evaluation's limitations

In reporting, adhere strictly to a code of directness, openness, and completeness

Assure that reports reach their audiences

TOTAL

Total ÷ 10


P7 Conflict of Interest

Identify potential conflicts of interest early in the evaluation

Provide written, contractual safeguards against identified conflicts of interest

Engage multiple evaluators

Maintain evaluation records for independent review

As appropriate, engage independent parties to assess the evaluation for its susceptibility or corruption by conflicts of interest

When appropriate, release evaluation procedures, data, and reports for public review

Contract with the funding authority rather than the funded program

Have internal evaluators report directly to the chief executive officer

Report equitably to all right-to-know audiences

Engage uniquely qualified persons to participate in the evaluation, even if they have a potential conflict of interest; but take steps to counteract the conflict

TOTAL

Total ÷ 10


P8 Fiscal Responsibility

Specify and budget for expense items in advance

Keep the budget sufficiently flexible to permit appropriate reallocations to strengthen the evaluation

Obtain appropriate approval for needed budgetary modifications

Assign responsibility for managing the evaluation finances

Maintain accurate records of sources of funding and expenditures

Maintain adequate personnel records concerning job allocations and time spent on the job

Employ comparison shopping for evaluation materials

Employ comparison contract bidding

Be frugal in expending evaluation resources

As appropriate, include an expenditure summary as part of the public evaluation report

TOTAL

Total ÷ 10


SCORING THE EVALUATION FOR PROPRIETY

Add the following:
Number of Excellent ratings (0-8) ____ x 4 = ____
Number of Very Good ratings (0-8) ____ x 3 = ____
Number of Good ratings (0-8) ____ x 2 = ____
Number of Fair ratings (0-8) ____ x 1 = ____
Total Score = ____ (Interpret below)

Strength of the Evaluation's provisions for Propriety:
30 (93%) - 32 Excellent
22 (68%) - 29 Very Good
16 (50%) - 21 Good
8 (25%) - 15 Fair
0 (0%) - 7 Poor

The Metaevaluation Checklist: For Evaluating Evaluations against The Program Evaluation Standards - Utility

To meet the requirements for UTILITY, evaluations should:

U1 Stakeholder Identification

Clearly identify the evaluation client

Engage leadership figures to identify other stakeholders

Consult potential stakeholders to identify their information needs

Use stakeholders to identify other stakeholders

With the client, rank stakeholders for relative importance

Arrange to involve stakeholders throughout the evaluation

Keep the evaluation open to serve newly identified stakeholders

Address stakeholders' evaluation needs

Serve an appropriate range of individual stakeholders

Serve an appropriate range of stakeholder organizations

TOTAL

Total ÷ 10


U2 Evaluator Credibility

Engage competent evaluators

Engage evaluators whom the stakeholders trust

Engage evaluators who can address stakeholders’ concerns

Engage evaluators who are appropriately responsive to issues of gender, socioeconomic status, race, and language and cultural differences

Assure that the evaluation plan responds to key stakeholders' concerns

Help stakeholders understand the evaluation plan

Give stakeholders information on the evaluation plan's technical quality and practicality

Attend appropriately to stakeholders' criticisms and suggestions

Stay abreast of social and political forces

Keep interested parties informed about the evaluation's progress

TOTAL

Total ÷ 10


U3 Information Scope and Selection

Understand the client's most important evaluation requirements

Interview stakeholders to determine their different perspectives

Assure that evaluator and client negotiate pertinent audiences, questions, and required information

Assign priority to the most important stakeholders

Assign priority to the most important questions

Allow flexibility for adding questions during the evaluation

Obtain sufficient information to address the stakeholders' most important evaluation questions

Obtain sufficient information to assess the program's merit

Obtain sufficient information to assess the program's worth

Allocate the evaluation effort in accordance with the priorities assigned to the needed information

TOTAL

Total ÷ 10


U4 Values Identification

Consider alternative sources of values for interpreting evaluation findings

Provide a clear, defensible basis for value judgments

Determine the appropriate party(s) to make the valuational interpretations

Identify pertinent societal needs

Identify pertinent customer needs

Reference pertinent laws

Reference, as appropriate, the relevant institutional mission

Reference the program's goals

Take into account the stakeholders' values

As appropriate, present alternative interpretations based on conflicting but credible value bases

TOTAL

Total ÷ 10


U5 Report Clarity

Clearly report the essential information

Issue brief, simple, and direct reports

Focus reports on contracted questions

Describe the program and its context

Describe the evaluation's purposes, procedures, and findings

Support conclusions and recommendations

Avoid reporting technical jargon

Report in the language(s) of stakeholders

Provide an executive summary

Provide a technical report

TOTAL

Total ÷ 10


U6 Report Timeliness and Dissemination

Make timely interim reports to intended users

Deliver the final report when it is needed

Have timely exchanges with the program's policy board

Have timely exchanges with the program's staff

Have timely exchanges with the program's customers

Have timely exchanges with the public media

Have timely exchanges with the full range of right-to-know audiences

Employ effective media for reaching and informing the different audiences

Keep the presentations appropriately brief

Use examples to help audiences relate the findings to practical situations

TOTAL

Total ÷ 10


U7 Evaluation Impact

Maintain contact with audience

Involve stakeholders throughout the evaluation

Encourage and support stakeholders' use of the findings

Show stakeholders how they might use the findings in their work

Forecast and address potential uses of findings

Provide interim reports

Make sure that reports are open, frank, and concrete

Supplement written reports with ongoing oral communication

Conduct feedback workshops to go over and apply findings

Make arrangements to provide follow-up assistance in interpreting and applying the findings

TOTAL

Total ÷ 10


SCORING THE EVALUATION FOR UTILITY

Add the following:

Number of Excellent ratings (0-7) ____ x 4 = ____

Number of Very Good ratings (0-7) ____ x 3 = ____

Number of Good ratings (0-7) ____ x 2 = ____

Number of Fair ratings (0-7) ____ x 1 = ____

Total Score = ____ (Interpret below)

Strength of the Evaluation's provisions for Utility:

26 (93%) - 28 Excellent

19 (68%) - 25 Very Good

14 (50%) - 18 Good

7 (25%) - 13 Fair

0 (0%) - 6 Poor
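As a worked illustration of the utility scoring rule, suppose (hypothetically; these counts are not results from the study) that the seven utility standards received 2 Excellent, 3 Very Good, 1 Good, and 1 Fair rating:

```latex
\[
\text{Total Score} = (2 \times 4) + (3 \times 3) + (1 \times 2) + (1 \times 1) = 20,
\qquad \frac{20}{28} \approx 71\%
\]
```

A total of 20 falls in the 19-25 band, so the evaluation's provisions for utility would be read as Very Good under these assumed ratings.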
