Computers & Education 51 (2008) 448–462

www.elsevier.com/locate/compedu

Designing a Web-based assessment environment for improving pre-service teacher assessment literacy

Tzu-Hua Wang a,*, Kuo-Hua Wang b, Shih-Chieh Huang c

a Department of Education, National Hsinchu University of Education, No. 521, Nanda Rd., Hsinchu City 300, Taiwan
b Graduate Institute of Science Education, National Changhua University of Education, No. 1, Jinde Rd., Changhua City, Changhua County 500, Taiwan
c Biology Department, National Changhua University of Education, No. 1, Jinde Rd., Changhua City, Changhua County 500, Taiwan

Received 19 March 2007; received in revised form 16 May 2007; accepted 14 June 2007

doi:10.1016/j.compedu.2007.06.010

* Corresponding author. E-mail address: [email protected] (T.H. Wang).

Abstract

Teacher assessment literacy is a key factor in the success of teaching, but some studies have concluded that teachers lack it. The aim of this research is to propose the "Practicing, Reflecting and Revising with WATA system (P2R-WATA) Assessment Literacy Development Model" for improving pre-service teacher assessment literacy. The WATA system offers personalized learning resources and opportunities for pre-service teachers to assemble tests and administer them to students on-line. Furthermore, the WATA system facilitates the performance of test analysis and item analysis, and enables pre-service teachers to review statistical information from the test and item analyses in order to revise test items. Sixty pre-service teachers participated in this research. The results indicate that pre-service teachers using the P2R-WATA Assessment Literacy Development Model improve their assessment knowledge and assessment perspectives more effectively.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Teacher assessment literacy; Teacher education; WATA system; P2R-WATA Assessment Literacy Development Model

1. Introduction

The National Science Education Standards (NSES) of the United States emphasize the importance of assessment in the process of teaching. Assessment is, however, also important for students: some researchers have suggested that proper assessment can enhance student learning effectiveness (Campbell & Collins, 2007; Mertler, 2004; Mertler & Campbell, 2005). Unfortunately, numerous studies have revealed that teachers lack the assessment literacy required to administer proper and effective assessment in a classroom (e.g. Arter, 2001; Brookhart, 2001; Mertler, 2004; Popham, 2006; Wang, Wang, Wang, Huang, & Chen, 2004). The literature has concluded that the major reasons are "faults in teacher education programs" (Mertler, 2004; Stiggins, 2004) and "faults in teacher certification policy" (Stiggins, 1999, 2004). This research attempts to address the problem of "faults in teacher education programs". Based on the key elements of four successful training models mentioned in the literature, this research constructs a Web-based training model, named P2R-WATA, to enhance the development of teacher assessment literacy. This research also investigates the effectiveness of the P2R-WATA in improving the assessment literacy of pre-service Biology teachers.

2. Literature review

2.1. Teacher assessment literacy

Assessment literacy is part of the package of pedagogical content knowledge (PCK), which has been introduced as an element of the knowledge base for teaching (Shulman, 1986). PCK has been described as a "special amalgam of content and pedagogy that is uniquely the province of teachers, their own special form of professional understanding" (Shulman, 1987, p. 8). In recent years, research in the domain of teacher education has significantly advanced understanding of the content of PCK. Magnusson, Krajcik, and Borko (1999) conclude that "knowledge and beliefs about assessment in science" is one component of teachers' PCK. Magnusson et al. further explain that "knowledge and beliefs about assessment in science" includes "knowledge of dimensions of science learning to assess" and "knowledge of methods of assessment". The former refers to teacher knowledge of the aspects of student learning that are important to assess within a particular unit of study. The latter refers to teacher knowledge of the possible methods to assess those specific aspects of student learning. That is to say, teacher assessment literacy is included in "knowledge and beliefs about assessment in science".

In addition to Magnusson et al. (1999), in the paper "Assessment Literacy for the 21st Century" (Stiggins, 1995), Stiggins integrates earlier arguments and develops a brief characterization of those equipped with assessment literacy:

"... know the difference between sound and unsound assessments. ... are not intimidated by the sometimes mysterious and always daunting technical world of assessment ... knowing what they are assessing, why they are doing so, how best to assess the achievement of interest, how to generate sound samples of performance, what can go wrong, and how to prevent those problems before they occur. Most important, those who are truly sensitive to the potential negative consequences of inaccurate assessment never permit students to be put in a situation where their achievement might be mismeasured ..." (Stiggins, 1995).

Table 1
Definitions of teacher assessment literacy, by researcher/organization

Center for School Improvement and Policy Studies, Boise State University (2007): Assessment-literate educators recognize sound assessment, evaluation, and communication practices; they (a) understand which assessment methods to use to gather dependable information about student achievement; (b) communicate assessment results effectively, whether using report card grades, test scores, portfolios, or conferences; and (c) can use assessment to maximize student motivation and learning by involving students as full partners in assessment, record keeping, and communication.

Mertler (2004): Assessment-literate teachers (1) recognize sound assessment, evaluation, and communication practices; (2) understand which assessment methods to use to gather dependable information about student achievement; (3) communicate assessment results effectively, whether using report card grades, test scores, portfolios, or conferences; and (4) can use assessment to maximize student motivation and learning by involving students as full partners in assessment, record keeping, and communication.

North Central Regional Educational Laboratory (2007): The possession of knowledge about the basic principles of sound assessment practice, including terminology, the development and use of assessment methodologies and techniques, familiarity with standards of quality in assessment ... and familiarity with alternatives to traditional measurements of learning.


Moreover, some academic organizations and researchers also interpret assessment literacy in their own way, as shown in Table 1.

This research adopts the definition given by Mertler (2004) and divides teacher assessment literacy into two aspects: teacher assessment knowledge and teacher perspectives on assessment. Given the scope of this research, the former covers assessment knowledge about multiple-choice tests, such as the construction of test items, the assembling and administering of tests, test analysis, item analysis and so on. The latter covers teacher perspectives on the assessment functions and assessment procedures of multiple-choice tests.

2.2. Development of teacher assessment literacy

Research shows that assessment in the classroom has become an important activity in the instructional process (Mertler, 2004; Stiggins & Chappuis, 2005) because a well-designed assessment and its scoring carry quite positive educational significance for both teachers and students (AFT, NCME, & NEA, 1990; Stiggins, 2004; Stiggins & Chappuis, 2005). Popham (2006) indicated that assessment plays a pivotal role in the education of students. Campbell and Collins (2007) further suggested that when assessment and instruction work in tandem, and assessment is implemented effectively, improvement in student achievement is likely to occur. Moreover, according to the NSES of the United States, 'assessment is an important tool for good inquiry into teaching ... skilled teachers of science are diagnosticians who understand students' ideas, beliefs, and reasoning' (NRC, 1996, p. 63). In brief, assessment plays an important role for both students and teachers.

In recent years, the "No Child Left Behind (NCLB)" project carried out in the United States has put emphasis on assessments in schools because much evidence shows that performance on these assessments correlates with student scores in standardized achievement examinations (Campbell, Murphy, & Holt, 2002; Mertler, 2004). Campbell et al. (2002) argue that this tendency requires teachers to develop assessment-related abilities according to the curriculum standards set by each state and, further, to improve their own teaching and student learning effectiveness. Therefore, the promotion of professional knowledge of assessment among teachers has become an important issue, and in turn assessment literacy is greatly valued (Mertler, 2004). In addition to the growing importance of improving teacher assessment literacy resulting from the NCLB, Lukin, Bandalos, Eckhout, and Mickelson (2004) further indicate that in the "Standards-based, Teacher-led Assessment and Reporting System (STARS)" project devised by the state of Nebraska in 2000, district assessments are adopted instead to manage teaching materials with greater precision and to optimize student learning efficiency (Bandalos, 2004; Plake, Impara, & Buckendahl, 2004). Lukin et al. point out that the STARS education reform can be successful only when district assessment is used to improve teaching in schools and for school improvement and accountability purposes. Lukin et al. further observe that another element affecting the result of the STARS education reform is the improvement of classroom assessment. As a de-centralized model of education reform, STARS can be successful only with teacher participation and with teachers shouldering responsibility for student learning effectiveness. In this situation, classroom assessment quality and teacher assessment literacy are issues that need to be seriously considered. Since assessment in the classroom can be used to aid both instruction and learning, Nebraska State law LB 812 regulates teacher assessment ability and its development (Buckendahl, Impara, & Plake, 2004; Lukin et al., 2004; Plake et al., 2004). All in all, teacher assessment literacy has received more and more emphasis in the United States in recent years.

The literature review above shows that in recent decades, scholars have seen teaching, assessment and learning effectiveness as closely related to one another. Further, in educational reform projects carried out in recent years, assessment has played an important role in assisting teaching and learning and in promoting learning effectiveness. However, many researchers have claimed that both pre-service and in-service teachers are not equipped with appropriate classroom assessment ability. There were extensive discussions of this issue in the 1990s (e.g. Plake, Impara, & Fager, 1993), and it is seen as a problem that will persist for years to come (e.g. Campbell & Collins, 2007; Campbell et al., 2002; Mertler, 2004; Mertler & Campbell, 2005).

Thus, in recent decades, scholars have emphasized the importance of teacher assessment literacy. They believe that courses related to the development of teacher assessment literacy should be included in both pre-service and in-service teacher education (AFT, NCME, & NEA, 1990; Brookhart, 2001; Campbell et al., 2002; Lukin et al., 2004; Mertler, 2004; Stiggins, 1995, 1999, 2004). Four effective models for developing teacher assessment literacy are introduced below:


• Plake and Impara (1993): In addition to course training materials, the design of a "parent-teacher vignette" helps construct a "simulated parent-teacher conference". In this conference, teachers can simulate how to explain the significance of assessment scores to parents, which helps them put their assessment knowledge into practice.
• NAC (Lukin et al., 2004): The NAC program consists of 18 credit hours of graduate-level courses and targets experienced, practicing teachers and/or administrators. The 18 credit hours consist of six-hour courses offered in each of two consecutive summers, with six hours of "practicum" during the intervening school year. The six-hour course in the first summer covers basic assessment concepts as they are applied in both classroom and large-scale settings. The six-hour course in the second summer focuses on analyzing and interpreting assessment data and data-based decision-making. The distinguishing features of the NAC program are a great increase in assessment course credits and a six-credit-hour "practicum" that provides course-takers with opportunities to use their 12 credit hours of assessment knowledge and techniques in a realistic situation.
• ALLT (Arter, 2001), PALS & IPALS (Lukin et al., 2004): The training is done through a "learning team". Team members can study assessment materials together and share what they learn. Putting the assessment knowledge and assessment techniques they learn into practice is also emphasized.

In the four models mentioned above, assessment-related knowledge and techniques are all taught in a traditional way. An important characteristic shared by them is that they all provide opportunities for teachers to apply what they have learned to realistic classroom situations. This combination of classroom experience and assessment literacy training is affirmed by much of the related literature (Brookhart, 2001; Lukin et al., 2004; Mertler, 2004; Stiggins, 1999; Taylor & Nolen, 1996). Taylor and Nolen (1996) even argue that only when taught in realistic teaching situations can concepts of assessment be meaningful to the teachers taking courses on assessment literacy. For example, the model of Plake and Impara (1993) includes both a "practice exercise" and "hands-on experience". In the NAC model, the "practicum" course held in the spring and fall semesters requires course-takers to apply the assessment concepts and techniques they have learned in the summer semester to realistic classroom situations. The ALLT, PALS and IPALS models are similar in that they all include realistic classroom assessment. In ALLT, administrative staff and teachers are both involved in assisting with the assessment done in the classroom. PALS and IPALS also include a "student teaching semester", in which course-takers can apply the assessment knowledge and techniques they have learned to a realistic education environment.

This research takes the four models mentioned above as references. Drawing on their shared characteristic of integrating classroom experiences into the development of teacher assessment literacy, this research aims to construct a teacher assessment literacy development model in an e-Learning environment.

2.3. Web-based assessment system and teacher assessment literacy development

With the improvement of Internet communication technology and database technology in recent years, Web-Based Testing (WBT) has become a common and effective type of assessment and has been employed in different educational settings (He & Tymms, 2005; Sheader, Gouldsborough, & Grady, 2006; Wang et al., 2004; Wang, 2007). With WBT, teachers can construct test items, correct test papers and record scores on-line. Moreover, WBT can present a test paper to testees in the form of hypermedia, including audio, video and even virtual or dynamic images designed in JAVA. These characteristics enable WBT to take the place of traditional paper tests in some ways. Below we investigate how to develop a Web-based assessment system that helps develop teacher assessment literacy, from the twin angles of designing a Web-based assessment system and developing teacher assessment literacy.

Scholars have offered a great deal of advice on the design of computer-assisted and Web-based assessment systems. Bonham, Beichner, Titus, and Martin (2000) point out that a good assessment system should have the following characteristics and functions: it should be accessible through common Internet browser software, identify users by secret codes, grade automatically, and collect and record information related to student scores. Gardner, Sheridan, and White (2002) conclude that the system should be equipped with the ability to construct a test item bank, in which teachers can store the items they already have or items provided by textbook publishers. Teachers can then assemble tests on-line at any time and have students take them on-line. In addition, the system should also include a score bulletin board, which is constantly updated with the most recent student scores and enables students to query their scores at any time and monitor their own learning. He and Tymms (2005) conclude that the outstanding ability of computers to collect and process information makes it easy to collect information on examinee examination tracks. Therefore, an assessment system should not only passively collect information about examination tracks but also actively provide feedback, guide learning, and assist in detecting learning misconceptions.
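To make these design recommendations concrete, the following is a minimal sketch of the core entities such a system needs: an item bank, automatic grading, and a score board students can query. It is not the WATA implementation; all class, field and function names are hypothetical.

```python
# Minimal sketch (hypothetical names, not the WATA implementation) of the
# features recommended above: an item bank, automatic grading, and a score
# board that students can query at any time.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    stem: str
    options: List[str]
    answer: int  # index of the correct option


@dataclass
class TestSession:
    items: Dict[str, Item]                                 # item bank used for this test
    scores: Dict[str, int] = field(default_factory=dict)   # student -> score

    def grade(self, student: str, responses: Dict[str, int]) -> int:
        # automatic grading: one point per correctly answered item
        score = sum(
            1 for item_id, item in self.items.items()
            if responses.get(item_id) == item.answer
        )
        self.scores[student] = score
        return score

    def score_board(self) -> Dict[str, int]:
        # a constantly updated score board that students can query
        return dict(self.scores)


session = TestSession(items={"q1": Item("2 + 2 = ?", ["3", "4", "5"], answer=1)})
session.grade("s01", {"q1": 1})
print(session.score_board())  # {'s01': 1}
```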

In addition to the above design suggestions, the Web-based assessment system used in this research should also assist in developing teacher assessment literacy. This research therefore adopts the Web-based Assessment and Test Analysis system (WATA system) (Wang et al., 2004), which implements the complete assessment process described by Gronlund and Linn (1990, pp. 109–141, 228). From the perspective of Gronlund and Linn, the similarities and differences between the WATA system and six other Web-based assessment systems are shown in Table 2.

The 'Step 1–Step 7' in Table 2 are the "basic steps in classroom testing" proposed by Gronlund and Linn (1990, pp. 109–141, 228). Table 2 shows that the WATA system is the only system that allows the construction of a Two-Way Chart and assembles tests based on it. In "appraising the test", there are also many differences. For example, Gateway Testing System, Lon-CAPA, Mallard, Question Mark, Top-Class and Web@ssessor can present original scores, statistical charts and test results, but most of them do not perform item analysis. Question Mark is able to record the answer history of each student and perform item analysis; however, it does not analyse student scores in more detail (such as the T score) or perform a test analysis (such as KR-20 reliability). Based on the analysis above, the WATA system appears to satisfy the needs of this research more completely because it supports a comprehensive assessment process. That is why it is used in this research as the Web-based assessment system for the construction of an e-Learning environment to develop teacher assessment literacy.
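For reference, the statistics named here have standard textbook definitions (they are not reproduced from the WATA documentation). For a testee's raw score X on a k-item dichotomously scored test:

```latex
% Standard (textbook) definitions; not taken from the WATA system itself.
\[
\begin{aligned}
  z &= \frac{X - \bar{X}}{s_X}, \qquad T = 50 + 10z, \\[4pt]
  \mathrm{KR20} &= \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i\,(1-p_i)}{s_X^{2}}\right),
\end{aligned}
\]
```

where \bar{X} and s_X^2 are the mean and variance of the total scores and p_i is the proportion of testees answering item i correctly.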

The WATA system is equipped with a scaffold of the complete assessment process, the "Triple-A Model" (Wang et al., 2004). The Triple-A Model is developed from the "basic steps in classroom testing" described by Gronlund and Linn (1990, pp. 109–141, 228), which include "determining the purpose of testing", "constructing the Two-Way Chart", "selecting appropriate items according to the Two-Way Chart", "preparing relevant items", "assembling the test", and "appraising the test", along with the results of questionnaires and interviews conducted with in-service teachers (Wang et al., 2004). The content of the Triple-A Model includes:

Assembling: Teachers can construct the question database by themselves, arrange a Two-Way Chart and assemble tests based on it.
Administering: Teachers arrange and administer a multi-examination schedule.
Appraising: After tests are taken, teachers can perform test analysis and item analysis.

Table 2
Comparing functions of seven different WBT systems (Wang et al., 2004)

WBT system              Assembling                              Administering   Appraising
                        Step 1 & 2   Step 3   Step 4   Step 5   Step 6          Step 7
Gateway Testing System  X            X        V        V        V               D
Lon-CAPA                X            X        V        V        V               D
Mallard                 X            X        V        V        V               D
Question Mark           X            X        V        V        V               D
Top-Class               X            X        V        V        V               D
Web@ssessor             X            X        V        V        V               D
WATA                    V            V        V        V        V               V

X = not available; V = available; D = partially available. Step 1: determining the purpose of testing; Step 2: constructing the Two-Way Chart; Step 3: selecting appropriate items according to the Two-Way Chart; Step 4: preparing relevant items; Step 5: assembling the test; Step 6: administering the test; Step 7: appraising the test, including "variance", "standard deviation", "testee T score, z score and Z score", "mean of all grades", "average difficulty of the test", "KR-20 reliability", "DP", "ID", "answers on each item in upper group and lower group", "wrong answers of students for each item", "distracter analyses of all testees, upper group and lower group", and so on.
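As an illustration of the item analyses listed in the note to Table 2, the following is a minimal sketch of item difficulty, the upper/lower-group discrimination index, and a simple distracter count. The response-matrix layout and the toy data are assumptions; this is not WATA's internal code.

```python
# Minimal sketch of item analysis from a response matrix (toy data).
import numpy as np

# responses[s, i] = option chosen by student s on item i; key[i] = correct option
responses = np.array([
    [1, 2, 0],
    [1, 0, 0],
    [2, 2, 1],
    [1, 2, 2],
    [0, 2, 0],
    [1, 1, 0],
])
key = np.array([1, 2, 0])

correct = (responses == key).astype(float)   # 1 = right, 0 = wrong
totals = correct.sum(axis=1)

# item difficulty: proportion of testees answering each item correctly
difficulty = correct.mean(axis=0)

# discrimination index: P(upper 27% by total score) - P(lower 27%)
order = np.argsort(totals)
n_group = max(1, int(round(0.27 * len(totals))))
lower, upper = order[:n_group], order[-n_group:]
discrimination = correct[upper].mean(axis=0) - correct[lower].mean(axis=0)

# distracter analysis: how often each option was chosen on the first item
options, counts = np.unique(responses[:, 0], return_counts=True)

print("item difficulty (P):", difficulty)
print("discrimination index (D = P_upper - P_lower):", discrimination)
print("distracter counts for item 1:", dict(zip(options.tolist(), counts.tolist())))
```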


In addition to the Triple-A Model taken from the literature review, "personalized design" (Passig, 2001) and "situated design" (Clarke & Hollingsworth, 2002; Wiske, Sick, & Wirsig, 2001) are two important elements in designing an environment for teacher education in this research. These two designs are as follows:

• Personalized design: Passig (2001) observes that in an e-Learning environment for teacher education, personalized design is quite important. Egbert and Thomas (2001) also point out that an e-Learning environment requires teachers to participate and learn quickly, instead of passively waiting to be crammed with information. The e-Learning system should vary with the individual progress and needs of each teacher. Moreover, it should be able to provide a wide variety of personalized feedback whenever required. In other words, the teachers are given more time to reflect and should be able to decide their own learning procedures. They can search for and gain feedback from the system depending on their learning progress, the questions they encounter, and the situation they are in.
• Situated design: Clarke and Hollingsworth (2002) and Wiske et al. (2001) all point out that situated design is important for the teacher education environment. One advantage of e-Learning is that situations that cannot be easily simulated in a traditional classroom may be simulated with the help of information technology. Therefore, taking situated design as its essence, the Web-based assessment system constructed in this research provides a high degree of contextualization, which enables learners to experience a whole set of classroom assessment procedures in simulation.

The following describes how personalized and situated design are implemented concretely in the WATA system:

• Personalized design in the WATA system: The WATA system provides each user with an individual interface to the Triple-A Model. Each user can conduct personalized "assembling of test papers", "management of test schedules" and "test analysis and item analysis". Moreover, in each section and page of the Triple-A Model, the WATA system provides electronic learning resources related to that section or page.
• Situated design in the WATA system: The WATA system is equipped with the Triple-A Model. It provides a framework for the complete assessment process, helping users to simulate that process on-line. The complete assessment process stated by Gronlund and Linn (1990, pp. 109–141, 228) includes "determining the purpose of testing", "constructing the Two-Way Chart", "selecting appropriate items according to the Two-Way Chart", "preparing relevant items", "assembling the test", and "appraising the test".

In addition to adopting the WATA system to construct the e-Learning environment for teacher assessment literacy development, this research also takes as references the four teacher assessment literacy development models referred to in the literature: the model of Plake and Impara (1993), the NAC model (Lukin et al., 2004), the ALLT model (Arter, 2001), and the PALS and IPALS models (Lukin et al., 2004). Based on the view of combining classroom experiences with the development of assessment literacy held by Brookhart (2001), Lukin et al. (2004), and Taylor and Nolen (1996), this research develops the "P2R-WATA Assessment Literacy Development Model (P2R-WATA)" (Fig. 1) to provide teachers with a better model for training and promoting their assessment literacy. All participating pre-service teachers can simulate assembling tests according to the scaffold of the complete assessment process (Triple-A Model) in the WATA system, and "practice" administering and appraising tests on-line with real students. After the students finish the tests, the WATA system assists pre-service teachers in test analysis and item analysis, and then provides immediate statistical feedback. Using the statistical feedback provided by the system, pre-service teachers can "reflect" on the faults of the items they constructed. Based on what they have learned before, the electronic learning resources, and the statistical information provided by the WATA system, they can "revise" their items and re-administer the revised versions of their tests. This helps pre-service teachers test whether their revision strategies are effective. This research explores the effectiveness of the P2R-WATA in promoting teacher assessment literacy.


Fig. 1. P2R-WATA Assessment Literacy Development Model.


3. Methodology

3.1. Participants

Participants in this research consisted of 30 third-year Biology pre-service teachers (male: 20, female: 10) and 30 fourth-year Biology pre-service teachers (male: 15, female: 15). The third-year students were assigned to the control group and the fourth-year students to the experimental group. The average age of the control group is 21.47 (SD = .67), and that of the experimental group is 22.53 (SD = .67). There is no significant difference between the control group and the experimental group in entry behaviour on assessment knowledge (F(1,57) = .101, p = .752) or assessment perspectives (F(1,57) = .341, p = .562).

3.2. Instruments

3.2.1. Assessment Knowledge Test (AKT) & Survey of Assessment Perspectives (SAP)

The AKT is mainly used to evaluate the assessment knowledge of pre-service teachers. Before designing the AKT, all the assessment concepts slated for inclusion were established and coded. Based on the Standards for Teacher Competence in Educational Assessment of Students (STCEAS) (AFT, NCME, & NEA, 1990) and the basic assessment concepts of Gronlund and Linn (1990), three experts on Biology education and assessment chose the assessment concepts important to Biology teachers (Table 3). The 50 items in the first version of the AKT covered all assessment concepts listed in Table 3. After the pilot test of the AKT, items whose discrimination index was below .250 were removed (Noll, Scannell, & Craig, 1979). Forty items were retained in the final version of the AKT. Cronbach's α for the AKT is 0.993, and its average difficulty is .505.

The Survey of Assessment Perspectives (SAP) (Wang et al., 2004) is used to assess pre-service teachers' perspectives on assessment functions and procedures. The SAP is a nominal-categorical measurement, and each item groups participants into categories based upon agreement (Yes = 1 or No = 0). There are two subscales in the SAP: "perspectives about assessment functions", consisting of seven items (Cronbach's α = .71), and "perspectives about assessment steps", consisting of eight items (Cronbach's α = .78).

We also adopt the Multitrait-Multimethod Matrix (MTMM) to examine the construct validity of the instruments in this research, and find that the MTMM analysis supports both convergent and discriminant validity (Campbell & Fiske, 1959) (see Table 4).

3.2.2. Web-based Assessment and Test Analysis (WATA) system (Wang et al., 2004)

This research uses the WATA system to construct the "P2R-WATA Assessment Literacy Development Model". The WATA system was developed on the basis of the personalized Triple-A Model, which in turn is based on the "basic steps in classroom testing" proposed by Gronlund and Linn (1990, pp. 109–141, 228) and on interviews with 17 in-service teachers and assessment experts.


Table 3
Assessment concepts covered in the AKT

1. Constructing items, assembling test papers and administering tests
   1.1. Principles of constructing a multiple-choice item
   1.2. Characteristics of multiple-choice items
   1.3. Bloom's Taxonomy
   1.4. Difference between summative assessment and formative assessment
   1.5. Functions of summative assessment and formative assessment in teaching activity
   1.6. General steps of administering tests
   1.7. Administering tests
2. Analysis and appraising of testing data
   2.1. Test analysis
      2.1.1. Variance
      2.1.2. Average
      2.1.3. Standard deviation
      2.1.4. KR-20 and Cronbach's α reliability
      2.1.5. Test difficulty
      2.1.6. Validity (content validity – Two-Way Chart)
      2.1.7. Analysis of students' score distribution
      2.1.8. T scores
      2.1.9. z scores
      2.1.10. Normal distribution
   2.2. Item analysis
      2.2.1. Option distracter power analysis
      2.2.2. Item discrimination analysis
      2.2.3. Students' error-conception analysis
      2.2.4. Item difficulty analysis

Table 4
MTMM analysis of AKT and SAP

        AKT       SAP
AKT     (0.993)
SAP     0.441     (0.660)

The Triple-A Model comprises (Wang et al., 2004):

Assembling: Construction of item pools and test items; assembly of tests based on the Two-Way Chart (Fig. 2) (illustrated in the sketch following this list).
Administering: Random assignment of test items and item choices to testees; provision of personal identification numbers (PINs) and examination passwords for testees to take the test over the Internet; collection, recording and processing of the scores and other data from the tests.
Appraising: Analysis of the data collected during the tests; generation of test analysis and item analysis statistical reports (Fig. 3).
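To illustrate the "Assembling" step, the sketch below represents a Two-Way Chart as a mapping from content-by-cognitive-level cells to item counts and draws a test from a tagged item pool accordingly. The names and toy data are hypothetical; this is not the WATA data model.

```python
# Minimal sketch of assembling a test from a Two-Way Chart (toy data).
import random

# Two-Way Chart: (content topic, cognitive level) -> number of items wanted
two_way_chart = {
    ("Evolution", "Knowledge"): 2,
    ("Evolution", "Comprehension"): 1,
    ("Classification", "Knowledge"): 1,
    ("Classification", "Application"): 1,
}

# item pool: each item is tagged with its topic and cognitive level
item_pool = [
    {"id": f"item{i:02d}", "topic": topic, "level": level}
    for i, (topic, level) in enumerate(
        [("Evolution", "Knowledge")] * 4
        + [("Evolution", "Comprehension")] * 2
        + [("Classification", "Knowledge")] * 3
        + [("Classification", "Application")] * 2
    )
]


def assemble_test(chart, pool):
    """Draw, for every (topic, level) cell, the number of items the chart asks for."""
    test = []
    for (topic, level), n_items in chart.items():
        candidates = [it for it in pool if it["topic"] == topic and it["level"] == level]
        test.extend(random.sample(candidates, n_items))
    return test


print([it["id"] for it in assemble_test(two_way_chart, item_pool)])
```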

3.3. Research design

This research was implemented in the course "Biology Teaching Theory and Practice", an important required course for Biology pre-service teachers. The experimental group used the P2R-WATA Assessment Literacy Development Model (P2R-WATA); the control group used the P2R Assessment Literacy Development Model (P2R). The P2R-WATA is constructed based on the WATA system, whose primary framework is the aforementioned Triple-A Model.


Fig. 2. Assembling a test according to the Two-Way Chart.

Fig. 3. Test analysis and item analysis.

The Triple-A Model provides experimental group pre-service teachers with the "framework of complete assessment steps", which facilitates improvement of their assessment knowledge and assessment perspectives. In addition to the Triple-A Model, the P2R-WATA includes two essential components:

• Personalized design: This design is based on the suggestions of Passig (2001) and Egbert and Thomas (2001) that an e-Learning system for teacher education should vary with the individual progress and needs of each teacher. All experimental group pre-service teachers can improve their assessment knowledge and assessment perspectives using the personalized Triple-A Model interface. In addition, the WATA system supports individual pre-service teachers by providing personalized electronic learning resources.
• Situated design: This design is based on the important characteristic, shared by the four assessment literacy development models introduced before, of providing opportunities for learners to apply what they have learned to realistic classroom situations. All experimental group pre-service teachers can practice assembling, administering and appraising tests on-line. Using the P2R-WATA, pre-service teachers can "practice" assembling and administering tests on-line, and testees can take the tests over the Internet. Process data from the test is then analysed and a set of test-related statistics generated by the WATA system. Pre-service teachers can make use of the statistics to "reflect" on their own mistakes in test construction and on testee learning. After reflection, pre-service teachers "revise" their mistakes based on the statistics provided by the system. They may also draw on the personalized electronic learning resources provided by the WATA system. After revision, the revised test can be administered to the same testees again to test whether the revision strategies have been effective.

The P2R is similar to the P2R-WATA. The major difference is that the P2R-WATA provides experimental group pre-service teachers with an e-Learning environment using the WATA system and personalized electronic learning resources, whereas the P2R provides control group pre-service teachers with a traditional learning environment and printed rather than personalized electronic learning resources. The content of the printed learning resources was identical to that of the personalized electronic learning resources. Neither group received lectures from their professors during this research.

3.4. Research procedure

The research procedure consisted of the following steps:

1. All the pre-service teachers were divided into an experimental group and a control group. The pre-tests of the AKT and SAP were administered to both groups to ascertain their entry behaviour in assessment knowledge and assessment perspectives.

2. After the two pre-tests were administered, the experimental group and control group pre-service teachers were asked to practice assembling tests. All pre-service teachers were provided with teaching materials on the topics of 'Evolution' and 'Classification', and a Two-Way Chart. These two topics were selected from the textbook for students in the first grade of junior high school. Based on the provided teaching materials and the Two-Way Chart, the pre-service teachers constructed their own test papers, each containing ten multiple-choice items. Throughout the process of constructing the items, the pre-service teachers in the experimental group relied on the WATA system, while the pre-service teachers in the control group used Microsoft Word.

3. After the first version of the test papers was designed, first-graders from sixteen classes in a junior high school in central Taiwan took the tests. All of these junior high school students had previously been taught the topics of 'Evolution' and 'Classification'. Whether a class completed the test papers designed by the experimental group or by the control group pre-service teachers was randomly assigned. In the former case, the class of students took the test on-line with the WATA system in a computer classroom at the same time. In the latter case, the class of students took the test in paper-and-pencil format in a traditional classroom at the same time.

4. After the first version of the test papers was administered, all the process data from the tests was sent back to the individual pre-service teachers. The WATA system automatically corrected and analysed all test papers of the experimental group pre-service teachers, while the control group pre-service teachers manually corrected their test papers and did the analysis. After finishing the analysis, both groups had to compose test and item analysis reports and revise the items they themselves considered problematic, such as those with a poor discrimination index, distraction index or difficulty index. The pre-service teachers in both groups were not allowed to add or delete any items in the first version of the test papers; they were restricted to revising items. The experimental group pre-service teachers revised the items directly in the WATA system, while the control group pre-service teachers did so in Microsoft Word. After the revised test papers were constructed, both groups were given the mid-test of the AKT and SAP.

5. The classes of junior high school students that completed the first version of the test papers designed by the experimental group pre-service teachers were also asked to complete the revised version by the same group; the same procedure was used with the control group. After the revised test papers were administered, both groups again did the analysis, composed test and item analysis reports and revised the items. Finally, all pre-service teachers were given the post-test of the AKT and SAP, which was meant to evaluate the effectiveness of their learning.

3.5. Data collection and analysis

All data collected are quantitative, including pre-test, mid-test and post-test scores on the AKT and SAP. The pre-test scores are used to test the differences in entry behaviour between the control and experimental groups. In addition, repeated measures analysis, using the age of the pre-service teachers as a covariate to remove its effect, is used to test the differences in the improvement of assessment knowledge and assessment perspectives between the control and experimental groups.
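One way to set up such an analysis in software, shown purely as a hedged sketch, is a linear mixed model with a random intercept per teacher, time (pre/mid/post) and group as fixed effects, and age as a covariate. This approximates, but is not identical to, the repeated measures procedure reported below; the column names and toy data are assumptions.

```python
# Hedged sketch: a linear mixed model as one way to approximate a repeated
# measures analysis with group, time (pre/mid/post) and an age covariate.
# Toy data and column names are assumptions, not the study's data set.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "teacher": [t for t in ["t1", "t2", "t3", "t4", "t5", "t6"] for _ in range(3)],
    "group":   ["exp"] * 9 + ["ctrl"] * 9,
    "age":     [a for a in [22, 23, 21, 22, 21, 23] for _ in range(3)],
    "time":    ["pre", "mid", "post"] * 6,
    "score":   [50, 62, 77, 52, 61, 76, 49, 60, 75,   # experimental group
                51, 56, 63, 48, 54, 61, 50, 55, 62],  # control group
})

# the random intercept per teacher captures the repeated measurements
model = smf.mixedlm("score ~ C(time) * group + age", data=df, groups=df["teacher"])
result = model.fit()
print(result.summary())
```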

4. Results

4.1. Development of assessment knowledge

The results of the repeated measures analysis are shown in Table 5. There is a significant difference between the experimental and control groups (F(1,57) = 4.588, p < .05) on the pre-test, mid-test and post-test scores of the AKT. Furthermore, the experimental group performs significantly better than the control group, as shown by the post hoc test (Table 6; p < .05).

In addition, Table 5 indicates that there is no significant difference among the pre-test, mid-test and post-test scores for all pre-service teachers on the AKT. However, the "Group" factor significantly interacts with the pre-test, mid-test and post-test scores of the AKT (F(2,114) = 5.972, p < .01). The findings of the cross analysis for "Group" by AKT scores are shown in Table 7 and Fig. 4.

Table 5
Summary table of repeated measures on AKT (n = 60)

Source            SS         DF    MS        F
Between
  Age             129.976    1     129.976   1.102
  Group           541.082    1     541.082   4.588*
  Error           6722.739   57    117.943
Within
  AKT             75.031     2     37.515    .586
  AKT × Age       113.187    2     56.593    .811
  AKT × Group     833.740    2     416.870   5.972**
  Error           7957.435   114   69.802

Age: the age of pre-service teachers is used as a covariate to remove its effect; Group: experimental group and control group.
* p < .05. ** p < .01.

Table 6
Post hoc test of experimental and control groups on AKT

Treatment                                               AKT mean difference(a)   Standard error
Experimental group (n = 30) − control group (n = 30)    7.677*                   3.584

(a) Adjustments for multiple comparisons: Bonferroni method.
* p < .05.


Table 7
Cross analysis for Group by AKT scores

Group                         AKT         Mean score(a)   Standard error
Control group (n = 30)        Pre-test    49.781          3.045
                              Mid-test    55.176          2.378
                              Post-test   62.062          2.594
Experimental group (n = 30)   Pre-test    51.303          3.045
                              Mid-test    61.807          2.378
                              Post-test   76.938          2.594

(a) Evaluated at covariates appearing in the model: Age = 22.000. The age of pre-service teachers is used as a covariate to remove its effect.

Fig. 4. Graph of interaction between Group and AKT scores.


Fig. 4 shows that there is no significant difference between the experimental group and the control group in the pre-test scores of the AKT (see Section 3.1). However, the trend of the experimental group diverges significantly from that of the control group (Table 6; p < .05).

Table 8
Summary table of repeated measures on SAP (n = 60)

Source            SS        DF    MS       F
Between
  Age             .458      1     .458     .260
  Group           26.013    1     26.013   14.791**
  Error           100.246   57    1.759
Within
  SAP             .525      2     .262     .141
  SAP × Age       .103      2     .052     .028
  SAP × Group     23.566    2     11.783   6.311**
  Error           212.852   114   1.867

Age: the age of pre-service teachers is used as a covariate to remove its effect; Group: experimental group and control group.
** p < .01.


Table 10
Cross analysis for Group by SAP scores

Group                         SAP         Mean score(a)   Standard error
Control group (n = 30)        Pre-test    11.296          .479
                              Mid-test    11.921          .321
                              Post-test   12.958          .250
Experimental group (n = 30)   Pre-test    11.737          .479
                              Mid-test    14.579          .321
                              Post-test   14.909          .250

(a) Evaluated at covariates appearing in the model: Age = 22.000. The age of pre-service teachers is used as a covariate to remove its effect.

Fig. 5. Graph of interaction between Group and SAP scores.

Table 9
Post hoc test of experimental and control groups on SAP

Treatment                                               SAP mean difference(a)   Standard error
Experimental group (n = 30) − control group (n = 30)    1.683*                   .438

(a) Adjustments for multiple comparisons: Bonferroni method.
* p < .05.


4.2. Development of assessment perspectives

The results of the repeated measures analysis are shown in Table 8. There is a significant difference between the experimental and control groups (F(1,57) = 14.791, p < .01) on the pre-test, mid-test and post-test scores of the SAP. The post hoc test shows that the experimental group performs significantly better than the control group (Table 9; p < .05).

In addition, Table 8 indicates that there is no significant difference among all pre-service teachers on the pre-test, mid-test and post-test scores of the SAP. However, the "Group" factor significantly interacts with the pre-test, mid-test and post-test scores of the SAP (F(2,114) = 6.311, p < .01). The findings of the cross analysis for "Group" by SAP scores are shown in Table 10 and Fig. 5.

Fig. 5 shows that there is no significant difference between the experimental group and the control group in their SAP pre-test scores (see Section 3.1). However, the trend of the experimental group diverges significantly from that of the control group (Table 9; p < .05).

5. Concluding remarks

The positive findings on the effectiveness of the P2R-WATA in this research show that the P2R-WATA can improve Biology pre-service teachers' assessment knowledge and assessment perspectives more effectively than the P2R.

These results can be explained by the characteristics of the P2R-WATA. Its design is based on the suggestions of personalized design (Passig, 2001) and situated design (Clarke & Hollingsworth, 2002; Wiske et al., 2001). The pre-service teachers in the experimental group (P2R-WATA) can determine their own learning speed and process in the WATA system, and can use the information provided by the WATA system for feedback and reflection. The situated design of the P2R-WATA permits the pre-service teachers in the experimental group to combine assessment knowledge with assessment practice and in turn improve their own assessment literacy.

In addition, the results also suggest that a well-designed Web-based assessment system is suitable for teacher education aimed at improving teacher assessment literacy. Moreover, the Triple-A Model is suggested for inclusion in such a system. The Triple-A Model offers a comprehensive framework of assessment steps, providing a scaffold to facilitate the development of assessment literacy. The pre-service teachers in the P2R-WATA have the opportunity to practice assembling, administering and appraising tests on-line. Furthermore, they can exploit the array of statistical data generated by the WATA system to perform test and item analysis, compose test and item analysis reports, and revise the tests they have made. The P2R-WATA also provides experimental group pre-service teachers with opportunities to test their item revision strategies. Though the P2R provides control group pre-service teachers with opportunities to practice assembling, administering and appraising tests, it provides no personalized e-Learning environment, no personalized electronic learning resources, no scaffolding to develop assessment literacy, and no automatically generated statistical data. Pre-service teachers in the P2R may encounter problems when performing item and test analysis because the statistics are complex and difficult to compute without the aid of a dedicated system. Moreover, when faced with problems, the control group pre-service teachers are limited to the support of printed learning resources and cannot take advantage of personalized feedback and support from personalized electronic learning resources.

Preliminary results on the effectiveness of the P2R-WATA are promising. This research suggests that the P2R-WATA should be considered as a model for an assessment literacy development program. However, because the instruments in this research were developed primarily to assess Biology teachers' assessment literacy regarding multiple-choice tests in a traditional classroom, this research suggests that more instruments should be developed to assess the assessment literacy of teachers with other subject matter backgrounds, and to assess teachers' assessment literacy regarding alternative assessments (e.g. performance assessment, authentic assessment and portfolio assessment). In addition, further research should explore the effectiveness of the P2R-WATA in in-service teacher education. Moreover, longitudinal research should also be conducted to investigate whether teachers trained with the P2R-WATA apply their increased assessment literacy to their classroom teaching to improve student learning effectiveness.

Acknowledgements

This paper is a part of Dr. Tzu-Hua Wang's unpublished doctoral dissertation from the Graduate Institute of Science Education, National Changhua University of Education, Taiwan. Dr. Tzu-Hua Wang would like to express the deepest gratitude to his dissertation advisors, Prof. Shih-Chieh Huang and Prof. Kuo-Hua Wang, for their continuous support and invaluable guidance. Dr. Tzu-Hua Wang is also grateful to his dissertation committee members, Prof. Ching-Kuch Chang, Prof. Mei-Hung Chiu, Prof. Hsiang-Chuan Liu, Prof. Whe-Dar Lin, Prof. Wei-Lung Wang and Prof. Chih-Chiang Yang, for their thoughtful comments and suggestions. The authors are also grateful for the insightful comments from the referees.

References

American Federation of Teachers, National Council on Measurement in Education, & National Education Association (AFT, NCME, & NEA). (1990). The standards for teacher competence in the educational assessment of students. Educational Measurement: Issues and Practice, 9(4), 30–32.
Arter, J. (2001). Learning teams for classroom assessment literacy. NASSP Bulletin, 85(621), 53–65.
Bandalos, D. (2004). Introduction to the special issue on the Nebraska Standards-based, Teacher-led Assessment and Reporting System (STARS). Educational Assessment: Issues and Practice, 23(2), 4–6.
Bonham, S. W., Beichner, R. J., Titus, A., & Martin, L. (2000). Education research using Web-based assessment systems. Journal of Research on Computing in Education, 33, 28–45.
Brookhart, S. M. (2001). The standards and classroom assessment research. Paper presented at the annual meeting of the American Association of Colleges for Teacher Education, Dallas, TX (ERIC Document Reproduction Service No. ED451189).
Buckendahl, C. W., Impara, J. C., & Plake, B. S. (2004). A strategy for evaluating district developed assessments for state accountability. Educational Measurement: Issues and Practice, 23(2), 15–23.
Campbell, C., & Collins, V. L. (2007). Identifying essential topics in general and special education introductory assessment textbooks. Educational Measurement: Issues and Practice, 26(1), 9–18.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment literacy instrument: Applicability to preservice teachers. Paper presented at the annual meeting of the Mid-Western Educational Research Association, Columbus, OH.
Campbell, D., & Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Center for School Improvement and Policy Studies, Boise State University. (2007). What is assessment literacy? Assessment literacy. Retrieved May 16, 2007 (available http://csi.boisestate.edu/al/).
Clarke, D., & Hollingsworth, H. (2002). Elaborating a model of teacher professional growth. Teaching and Teacher Education, 18(8), 913–1059.
Egbert, J., & Thomas, M. (2001). The new frontier: A case study in applying instructional design for distance teacher education. Journal of Technology and Teacher Education, 9(3), 391–405.
Gardner, L., Sheridan, D., & White, D. (2002). A Web-based learning and assessment system to support flexible education. Journal of Computer Assisted Learning, 18, 125–136.
Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York: MacMillan.
He, Q., & Tymms, P. (2005). A computer-assisted test design and diagnosis system for use by classroom teachers. Journal of Computer Assisted Learning, 21(6), 419–429.
Lukin, L. E., Bandalos, D. L., Eckhout, T. J., & Mickelson, K. (2004). Facilitating the development of assessment literacy. Educational Measurement: Issues and Practice, 23(2), 26–32.
Magnusson, S., Krajcik, J., & Borko, H. (1999). Nature, sources and development of pedagogical content knowledge. In J. Gess-Newsome & N. G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 95–132). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mertler, C. A. (2004). Secondary teachers' assessment literacy: Does classroom experience make a difference? American Secondary Education, 32(3), 49–64.
Mertler, C. A., & Campbell, C. (2005). Measuring teachers' knowledge & application of classroom assessment concepts: Development of the assessment literacy inventory. Paper presented at the annual meeting of the American Educational Research Association, Quebec, Canada.
Noll, V., Scannell, D., & Craig, R. (1979). Introduction to educational measurement (4th ed.). Boston: Houghton Mifflin Company.
North Central Regional Educational Laboratory. (2007). Indicator. Assessment. Retrieved May 16, 2007 (available www.ncrel.org/engauge/framewk/pro/literacy/prolitin.htm).
National Research Council (NRC). (1996). National science education standards. Washington, DC: National Academy Press.
Passig, D. (2001). Future online teachers' scaffolding: What kind of advanced technological innovations would teachers like to see in future distance training projects? Journal of Technology and Teacher Education, 9(4), 599–606.
Plake, B. S., & Impara, J. C. (1993). Teacher assessment literacy: Development of training modules (ERIC Document Reproduction Service No. ED358131).
Plake, B. S., Impara, J. C., & Buckendahl, C. W. (2004). Technical quality criteria for evaluating district assessment portfolios used in the Nebraska STARS. Educational Measurement: Issues and Practice, 23(2), 10–14.
Plake, B. S., Impara, J. C., & Fager, J. J. (1993). Assessment competencies of teachers: A national survey. Educational Measurement: Issues and Practice, 12(4), 10–12, 39.
Popham, W. J. (2006). Needed: A dose of assessment literacy. Educational Leadership, 63(6), 84–85.
Sheader, E., Gouldsborough, I., & Grady, R. (2006). Staff and student perceptions of computer-assisted assessment for physiology practical classes. American Journal of Physiology – Advances in Physiology Education, 30(4), 174–180.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(1), 4–14.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57, 1–22.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Stiggins, R. J. (1999). Evaluating classroom assessment training in teacher education programs. Educational Measurement: Issues and Practice, 18(1), 23–27.
Stiggins, R. J. (2004). New assessment beliefs for a new school mission. Phi Delta Kappan, 86(1), 22–27.
Stiggins, R. J., & Chappuis, J. (2005). Using student-involved classroom assessment to close achievement gaps. Theory Into Practice, 44(1), 11–18.
Taylor, C. S., & Nolen, S. B. (1996). What does the psychometrician's classroom look like? Reframing assessment concepts in the context of learning. Education Policy Analysis Archives, 4(17) (available olam.ed.asu.edu/epaa/v4n17.html).
Wang, T. H. (2007). What strategies are effective for formative assessment in an e-learning environment? Journal of Computer Assisted Learning, 23, 171–186.
Wang, T. H., Wang, K. H., Wang, W. L., Huang, S. C., & Chen, S. Y. (2004). Web-based Assessment and Test Analyses (WATA) system: Development and evaluation. Journal of Computer Assisted Learning, 20(1), 59–71.
Wiske, M. S., Sick, M., & Wirsig, S. (2001). New technologies to support teaching for understanding. International Journal of Educational Research, 35, 483–501.