
Computers & Education 52 (2009) 749–761

Contents lists available at ScienceDirect

Computers & Education

journal homepage: www.elsevier.com/locate/compedu

Authoring diagram-based CBA with CourseMarker

Colin A. Higgins, Brett Bligh *, Pavlos Symeonidis, Athanasios Tsintsifas
School of Computer Science/Learning Sciences Research Institute/Visual Learning Lab, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham NG8 1BB, UK

Article info

Article history:
Received 27 February 2008
Received in revised form 18 November 2008
Accepted 26 November 2008

Keywords:
Authoring tools and methods
Architectures for educational technology system
Interactive learning environments
Evaluation of CAL systems

0360-1315/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compedu.2008.11.019

* Corresponding author. Tel.: +44 1158467662.
E-mail addresses: [email protected] (C.A. Higgins), [email protected] (B. Bligh), [email protected] (P. Symeonidis), [email protected] (A. Tsintsifas).

Abstract

The CourseMarker system has been used to assess free-response computer based assessment (CBA) exercises since 1998. The aim of the studies reported here was to evaluate the feasibility and usefulness of developing and deploying diagram-based exercises using DATsys, an authoring environment for diagram-based CBA, together with CourseMarker. Postgraduate students constructed diagram-based exercises in four domains. The process of constructing the exercises was captured as an indicator of feasibility. The exercises were then used to assess two cohorts of undergraduate students. Instruments including system submission logs and student questionnaires were used to assess usefulness.

Findings indicate that there is considerable potential for the assessment of free-response domains such as diagrams. Such an approach can help students as part of an iterative process of learning by allowing repeated submission of coursework, which may be most appropriate within a formative assessment context. The exercises are popular with students and demonstrate a gradual, though decelerating, increase in marks over subsequent submissions. The techniques are reliable, but further development allowing for alternative model solutions and assessment of the aesthetic appearance of diagrams would increase validity. Our techniques and findings are novel for CBA, and have implications for the increasingly important research area of formative assessment.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

As Higher Education institutions confront the challenge of providing academic courses with increasingly less favourable staff-to-student ratios, numerous strategies have been adopted in an attempt to manage the assessment issues associated with teaching large groups (Rust, 2001). Often, the amount of formative assessment has decreased, despite its advantages to student learning (Race, 2001). Computer based assessment (CBA) technologies have been proposed as a solution to mechanise the assessment process (Charman & Elmes, 1998a), thereby allowing large volumes of student work to be marked with a practical level of effort. Here we argue that, by using CBA architectures which allow for the marking of free-response exercises as well as the provision of rich feedback to students, it is possible to mechanise an iterative, formative process of learning for the student which is a considerable advance upon prevalent, reductionist automated assessments such as online multiple-choice exams.

CBA refers to the delivery of materials for teaching and assessment, the input of solutions by the students, an automated assessment process and the delivery of feedback, all achieved through an integrated, coherent, online system (Higgins & Bligh, 2006). Importantly, the entire process occurs online at the computer terminal.

CBA exercises can be categorised as either fixed-response or free-response. Fixed-response CBA exercises, such as multiple-choice questions, are the easiest to implement but are often criticised for assessing only the knowledge of a student, thereby encouraging surface-learning strategies (Johnstone & Ambusaidi, 2000). In free-response exercise types, the student is presented with an environment within which a solution can be constructed in a freeform way, encouraging problem-solving. However, free-response domains require a more complex marking algorithm, since the student solution cannot be precisely anticipated.

CBA exercises are used across a wide range of disciplines. For example, Charman and Elmes (1998b) describe the use of the QuestionMark Perception software to conduct formative assessment in Geography topics, while similar approaches have been documented in Mathematics (Greenhow, 2000), Health Sciences (Wybrew, 1998) and many others. Buchanan (2000) reports on the use of the bespoke web-based package PsyCAL to conduct self-assessment in Psychology modules. Paul and Boyle (1998) describe a CBA system used to assess second year undergraduates in Paleontology which generates randomised MCQ tests from a question bank, designed to simultaneously assess students both formatively and summatively.





The CourseMarker system used as the basis for this research is the successor to the widely used Ceilidh (Higgins, Hegazy, Symeonidis, & Tsintsifas, 2003), which introduced many key concepts into CBA, such as a multi-layer architecture, a hierarchical course structure, different user views and configurable marking tools. Ceilidh's influence can be seen widely in such systems as Kassandra (von Matt, 1994), the approach described by Oliver (1998), RoboProf (Daly, 1999), ASSYST (Jackson, 2000), ASAP (Douce, Livingstone, Orwell, Grindle, & Cobb, 2005), EduComponents (Amelung, Piotrowski, & Rösner, 2006) and others. Our aim in this paper is to demonstrate that the introduction of a flexible diagramming framework into such a CBA system can provide considerable benefit to practitioners, analogous to those provided for other free-response exercise types such as programming assignments. Additionally, we aim to describe how the process of creating and deploying such coursework would work in practice and how such exercises have been received by students. It is not our intention merely to showcase the idiosyncratic abilities of our system.

The DATsys system is an object-oriented framework for diagramming in a CBA context (Tsintsifas, 2002). Coursework authored in DATsys is designed to be distributed and assessed using the CourseMarker CBA system (Higgins et al., 2003). Unlike systems such as TRAKLA2 (Malmi & Korhonen, 2004) and PILOT (Bridgeman, Goodrich, Kobourov, & Tamassia, 2000), which carefully constrain student interaction, DATsys allows the free-form drawing of solutions by the student on a drawing canvas. Furthermore, DATsys is designed to accommodate a wide range of diagram domains, in contrast to work such as that by Thomas, Waugh, and Smith (2006) and Batmaz and Hinde (2006), who concentrate on the in-depth analysis of a single domain.

Our research question is as follows: can it be demonstrated that CBA in diagram-based domains is both feasible and useful? A course of action is feasible if it can be implemented such that the requirements are fulfilled. Usefulness is achieved when a system provides results which are of benefit to practitioners. We therefore set out the feasibility requirements of CBA and demonstrate that these can be fulfilled in four disparate domains of educational diagrams by describing the process of authoring exercises across the domains. We demonstrate usefulness by presenting experimental results from live exercises on undergraduate student cohorts.

Given the definition of CBA, it is necessary to examine the success of CourseMarker in automating the basic assessment process. Almond, Steinberg, and Mislevy (2002) summarise the four basic processes present in a simple assessment cycle. The Activity Selection Process selects and sequences tasks with an assessment or instructional focus, including administrative duties. The Presentation Process presents the task to the student and captures their response. Response Processing identifies and evaluates essential features in the response and records a series of "Observations". Finally, the Summary Scoring Process uses the Observations to update the "Scoring Record". To be considered successful, a CBA system must provide integrated, online facilities to enable each of these processes to occur.
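The four processes of the assessment cycle can be sketched in code. This is an illustrative model only, not CourseMarker's actual API: every class and method name here is an assumption introduced for the example.

```python
# Hypothetical sketch of the four-process assessment cycle summarised by
# Almond, Steinberg and Mislevy (2002). All names are illustrative.

class AssessmentCycle:
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.scoring_record = {}   # the "Scoring Record"

    def select_activity(self):
        """Activity Selection Process: pick and sequence the next task."""
        return self.tasks.pop(0) if self.tasks else None

    def present(self, task):
        """Presentation Process: show the task and capture the response.
        Here the response is simply read from the task dictionary."""
        return task["response"]

    def process_response(self, task, response):
        """Response Processing: evaluate essential features and record
        a series of Observations."""
        return {"correct": response == task["expected"]}

    def score(self, task, observations):
        """Summary Scoring Process: use the Observations to update
        the Scoring Record."""
        self.scoring_record[task["id"]] = 1 if observations["correct"] else 0
```

A single pass of the cycle then consists of selecting a task, presenting it, processing the response and scoring it; a CBA system must provide online facilities for each step.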

A key drawback of automated assessment is a perceived inability to assess higher-order learning (Carter et al., 2003). Existing work which seeks to address this problem includes the careful construction of assertion-reason multiple-choice questions (Charman & Elmes, 1998a) and graphical hotspot questions (King & Duke-Williams, 2001). In contrast to these approaches, which seek to counteract the limitations of the response medium by the use of intricate question design, our approach seeks to change the medium itself to allow richer, more naturalistic interactions for the learner.

Formative assessment aims to improve the learning of the student through feedback (Race, 2001). Formative assessment stands opposed to summative assessment, whose central function is to provide an indicator of achievement (e.g. a grade) at the conclusion of a unit of learning. Viewed as resource-intensive, formative assessment has suffered a decline as academic workload has increased, despite a survey of research publications concluding that, "if best practices were achieved in mathematics on a nationwide scale that would raise 'average' countries such as England and the USA into the top five" (Black & Wiliam, 1998).

Formative assessment, therefore, is set to become a key pedagogic battleground, given that it is unlikely that Higher Education institutions will reverse staff-to-student ratio trends within the foreseeable future. The work presented here, which allows an iterative process of learning to occur as a result of repeated student submissions, based upon meaningful automated feedback, presents a plausible avenue of advance within the context of an important problem.

2. Methodology

2.1. Subject and context

The CourseMarker system has been used to assess coursework for Computer Science undergraduates at the University of Nottingham since 1998 (Symeonidis, 2006), primarily for free-response programming assignments. CourseMarker is deeply integrated into the programming module, since students attend weekly lab sessions and complete coursework which is assessed by the system. Since the programming module is compulsory for all Computer Science undergraduates and carries the highest weighting of any module in the first year of the course, students are encouraged to devote considerable time and effort to the weekly exercises. CourseMarker, therefore, is embedded within the learning culture of the undergraduate course.

The lab sessions are supported by a team of "lab demonstrators", primarily postgraduate students who are paid by the hour for their tutoring work. The tutors have instructions to help with student problems by attempting to promote conceptual understanding. However, since the assessment is summative for the programming assignments, it is inappropriate for tutors to directly contribute to students' solutions.

The CourseMarker server systems run on a machine housed within the Computer Science building, and the client program is loaded onto each computer in the student terminal rooms. In this way, students can work at their convenience by accessing the CourseMarker client program from a terminal room during opening hours. Hence, attendance at the formal lab sessions has not been made mandatory. Instead, student submissions and marks can be monitored through CourseMarker's web-based administration facilities (Foxley, Higgins, Symeonidis, & Tsintsifas, 2001b). Students who are struggling are expected to attend lab sessions, and are contacted if they do not. In fact, the lab sessions are very well attended, by students of all ability levels, who value the culture of structure and support which is provided.

The CourseMarker system has interfaces for several different types of users (Foxley, Higgins, Hegazy, Symeonidis, & Tsintsifas, 2001a). For our present purpose, we focus on the feasibility of constructing diagram-based coursework by users known as 'developers', and the



usefulness of the exercises in assessing and providing feedback to students. Typically, the exercise developer is a postgraduate student who undertakes the work of writing CourseMarker exercise files on a paid basis, based upon ideas provided by the module lecturer.

Cohorts of undergraduate Computer Science students were used to test the exercises in a live setting. For the first three prototypical domains, the cohort consisted of 167 first year undergraduate students undertaking an introductory Software Tools module. For the fourth domain, entity–relationship diagrams, the cohort consisted of 141 second year undergraduate students taking a module on Database Systems.

To encourage student participation, two strategies were attempted. In the Software Tools module the exercises were assessed and scaffolded using the standard method of providing lab sessions. The exercises were summatively assessed, which is to say that the exercises counted towards award of module credit, albeit with a low weighting value. This method is known to encourage student participation, but may present challenges pedagogically (Race, 2001).

In the Database Systems module the exercises were considered as voluntary, formative assessment only. However, the exercises were the prelude to associated, summatively assessed exercises, which were compulsory. This two-part assessment strategy was intended to encourage student participation, since the students would gain valuable insight into the summative exercises by completing the formative exercises first, thereby receiving useful feedback and the chance to practise their skills. For each cohort, the first exercise was trivial and designed to allow students to learn to use the system, while the other exercises were progressively more complex.

2.2. Instruments

A variety of instruments were used to assess the feasibility and usefulness of constructing and running the exercises.

In terms of the construction of the exercises we were primarily interested in process, since the outputs would be exercises which we

would assess separately in terms of their usefulness to students. For diagram-based assessment to occur through CourseMarker, it is necessary to specify a notation for the domain which allows respondents to construct their solutions. It is also necessary to assess the solutions using an automated process once they have been submitted. The DATsys framework and the three editors allow customisation of the student diagram editor to the requirements of the exercise, while CourseMarker's generic marking mechanism manages the automated process according to conditions defined within the exercise files. To assess the feasibility of constructing appropriate domain notations and exercise files, postgraduate students undertook the exercise development work, paid for their time in a manner similar to the more established programming exercises.
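To make the idea of a marking condition concrete, the following is a minimal sketch of one plausible check: awarding marks for the fraction of a model solution's labelled connections that appear in the student's diagram, ignoring layout. The function names and the dictionary representation are hypothetical; CourseMarker's actual marking tools are configured through exercise files rather than an API like this.

```python
# Illustrative only: a toy 'marking condition' comparing diagram topology
# against a model solution. Representation and names are assumptions.

def connection_set(diagram):
    """Reduce a diagram to an order-independent set of labelled edges."""
    return {frozenset((a, b)) for a, b in diagram["connections"]}

def mark_topology(student, model, weight=100):
    """Award the fraction of the model's connections the student drew,
    scaled to the weight assigned to this criterion."""
    wanted = connection_set(model)
    found = connection_set(student) & wanted
    return round(weight * len(found) / len(wanted)) if wanted else weight
```

A student who drew one of two required connections would score half the criterion's weight; direction of drawing is deliberately ignored here, which is itself a marking-policy decision an exercise author would have to make.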

Two postgraduate students were involved in the work, with one constructing the exercises for the three prototypical domains in the Software Tools module while the other constructed the exercises for the Database Systems module. These developers were asked to keep notes of the procedures they used to develop the materials so that the feasibility of the process could be assessed. Section 4 summarises how this is accomplished, for each domain in turn, by the combination of graphical primitives, the specification of connectivity and how different criteria were expressed as extensions to the generic marking mechanism, based upon the notes taken by the developers. It must be noted that both postgraduates had previously been involved in the development of CourseMarker programming exercises and so had some familiarity with the system. The construction of programming exercises has now been successfully undertaken in practice for several years; we considered that if the development of diagram-based exercises could occur with comparable levels of effort then this would be a good indicator of feasibility.

With regard to the usefulness of the exercises to students, we collected both quantitative and qualitative data. The CourseMarker archiving server is responsible for the archival of student work and of marking results (Symeonidis, 2006). It can retrieve stored coursework or marks for exercises, units and courses upon request, dependent on suitable authentication. Thus, for each student, it was possible to track the changes made between submissions. It was also possible to access information such as the number of submissions made by each student and to see how the marks changed as the number of submissions increased. This data was combined with further information from the student questionnaires to form a quantitative picture of how the students interacted with the system over time.

Students in the Database Systems module were presented with a questionnaire divided into two sections. Likert scale questions within the first section were designed to broadly assess how useful the exercises had been to the student learning process and how enjoyable the experience of using the CBA system had been to the users. Qualitative data was collected in the second section through the use of open-ended questions. Unfortunately the distribution of questionnaires within the earlier Software Tools module was limited and regarded as unreliable as a source of data.

In both modules we conducted brief, informal interviews with the lab tutors, immediately after lab sessions, at the beginning and end of the demonstration series. We asked tutors:

• to summarise their perception of the exercises as learning aids for students;
• whether certain problems had been raised by multiple students;
• whether prominent problems were conceptual and related to the domain, or whether they were caused by inadequacies with the system;
• how they felt that students had progressed during the lab session.

Due to the lack of questionnaire data from the Software Tools module, we also asked the tutors within that module for their opinion on how popular the exercises had been with students.

2.3. Data analysis

When analysing our data, it was necessary to consider issues such as reliability and validity, as well as the relationships between the quantitative and qualitative data and the different aspects of the culture of learning they might illustrate. In keeping with Brown, Race, and Smith (1996), we define a measurement as reliable if it is consistent across assessors and between assessments on different occasions, while a measurement is valid if it measures the criteria without dependency on other qualities.



Section 4 presents a summary of the development process for the four different exercise domains. We summarise the notes made by the postgraduate exercise developers, providing an account which is valid in that it represents how the problem was solved practically. Our analysis in this case cannot be regarded as reliable, since different exercise developers might have solved the problems using a different approach. However, our data is sufficiently robust to demonstrate our core aim of feasibility, i.e. that a flexible diagramming framework and generic marking mechanism can be used to meaningfully assess free-response coursework in diagramming. We do not make the claim that these are the only available solutions.

Subsequently, Section 5 explores the experience which was gained when the exercises were used by student cohorts, supervised by lab demonstrators. We regard the first three exercise domains as prototypes; our inability to gain meaningful feedback from questionnaire data means that we relied on demonstrator perception, which is, again, valid but not reliable. We concentrate our attention mainly on the experience with the entity–relationship diagram exercises. We begin with the quantitative data from the CourseMarker archiving server. CBA assessment can confidently be considered reliable because a consistent marking process is applied to the solutions of all students (Charman & Elmes, 1998a). The validity of the marks, of course, cannot be verified without reference to the qualitative data, since the system is effectively applying marking processes without the benefit of domain knowledge. We continue our quantitative analysis by referencing the student feedback on the Likert scale questions. These marks are averaged and are reliable within context, but care must be taken when assigning validity due to the pre-immersion of the undergraduates within a culture in which the use of CourseMarker is well established. Students unfamiliar with the basics of the CourseMarker system might conceivably differ in their reactions to the diagramming exercises.

The qualitative data was used to contextualise those trends which could be observed in the quantitative data. Since a fundamental requirement of the experiment was to determine the success of the automated assessment itself, it was necessary to consider the observations and experience of the tutors who had led the laboratory sessions. Our primary criteria for identification within the interview data were the items with which we started the tutor interviews. We corroborated the tutor responses by referencing the answers to the open-ended questions in the second section of the student questionnaires. Finally, we formally evaluated how feasibility and usefulness had been demonstrated by our results, with reference to the definitions given in the introductory section.

3. DATsys and CourseMarker

The DATsys architecture has been influenced by the Unidraw (Vlissides, 1990), Hotdraw (Johnson, 1992), and JHotdraw (Beck & Gamma, 1997) object-oriented frameworks. In contrast to these frameworks, DATsys defines graphical tools that allow the configuration of extension points. Such tools are contained in two diagram editors, Daidalos and Ariadne. A third diagram editor, Theseus, is used by students to develop their solutions.

Fig. 1 illustrates the relationships between the three diagram editors that are part of the DATsys framework. Daidalos defines specifications for diagram notations, which are grouped into libraries. Ariadne uses these libraries to allow the authoring of diagram-based CBA exercises. The building of a diagram-based CBA exercise consists of describing how the student editor will function and how the student diagram will be marked. Both Daidalos and Ariadne are the front-end of the authoring system for diagram-based CBA. Theseus is the customised diagram editor that is unique to the CBA exercise. Daidalos, Ariadne and Theseus are all concrete implementations of DATsys editors. Ariadne is, additionally, integrated with CourseMarker's generic marking mechanism.

The authoring of a diagram-based CBA exercise involves the following stages:

• Using Daidalos to build a tool library for creating and connecting diagram elements.
• Using Ariadne to build a CBA exercise by choosing a subset of Daidalos' tools for the student tool library, selecting application features, developing the marking scheme, and configuring the marking tools.

The lifecycle of CBA exercises involves the following additional stages:

• Testing and deploying the exercise through CourseMarker.
• Running the exercise and marking student solutions.
• Administering the exercise and evaluating the results.

Fig. 1. A view of how DATsys relates to the marking of diagrams.



3.1. Daidalos

Users of Daidalos develop a tool library. Tool libraries are designed to contain tools that have been customised to suit specific graphical notations. Because more than one notation often exists for a type of diagram, tool libraries can be grouped. A group aims to represent a diagrammatic domain.

Daidalos’ interface presents the user with three main windows for:

• Tool library management.
• Interactive diagram element creation and editing on the canvas.
• Selection editing.

Fig. 2 illustrates a view of Daidalos and describes all its associated options. The tool library window allows the organisation of tools into tool libraries and these into groups of tool libraries. Supported functions include the loading and saving of libraries, and the adding and removing of tools. The Add Tool button creates a new tool by using the figures that are currently selected. Depending on the selection, Daidalos interprets the type and configuration of the tool to be created. Before adding a new tool, the selection must contain a valid specification for a diagram element. The specification is visual; it consists of the graphical appearance of the diagram element, together with its data model and its configuration for connectivity to other elements.

All three aspects of a diagram element are described interactively. The graphical view is drawn using primitive figures. The data model is specified by adding typed data fields. The connectivity is specified by choosing either perimeter-based or pin-based connections. The appearance and type of connection lines can be further configured by selecting appropriate options.
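The three aspects of a diagram element specification could be summarised in a structure like the following. This is a hedged sketch for illustration: Daidalos specifies these aspects interactively rather than in code, and all field and class names here are assumptions.

```python
# Hypothetical data structure mirroring the three aspects of a diagram
# element specification: graphical view, data model, and connectivity.
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class DiagramElementSpec:
    # Graphical view: drawn from primitive figures
    figures: list = field(default_factory=list)
    # Data model: typed data fields attached to the element (name -> type)
    data_fields: dict = field(default_factory=dict)
    # Connectivity: perimeter-based or pin-based connections
    connectivity: Literal["perimeter", "pin"] = "perimeter"
    pins: list = field(default_factory=list)  # only used when pin-based

# A plausible entity element for an entity-relationship notation:
entity = DiagramElementSpec(
    figures=["rectangle", "label"],
    data_fields={"name": str},
    connectivity="perimeter",
)
```

The choice between perimeter-based and pin-based connectivity determines whether connection lines may attach anywhere on the element's outline or only at fixed points.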

3.2. Ariadne

Ariadne loads existing diagram-based CBA exercises, together with the tool libraries that were previously authored with Daidalos. These exercises may already have been deployed in CourseMarker, in which case Ariadne loads them from CourseMarker's course area.

Fig. 2. Daidalos’ map of features.


Fig. 3. Ariadne’s map of features.


For development and testing purposes, the exercises can also reside in a local directory. Ariadne is used by teachers, who aim to create diagram-based CBA exercises. To accomplish this, the output of Ariadne for a single exercise consists of:

• An exercise specific tool library and application configuration file.
• A marking scheme and configuration for the marking tools.
• Configuration for the CBA exercise.

Fig. 3 illustrates a view of Ariadne and describes all its associated options. Ariadne retains most of the editing features of Daidalos. In addition, it contains a repository management window to manage the files that belong to a diagram-based CBA exercise.

For each of the configuration files, Ariadne uses an appropriate editor. For the description of the marking scheme, Ariadne offers wizard-based generation of the source code, together with a simple text editor that provides compilation and testing features.

3.3. Theseus

Upon execution, Theseus loads the exercise specific tool library and a configuration file that describes the available exercise options. Theseus is therefore customised, by parameterisation, to the requirements of the CBA exercise.

Theseus' users are the students who have to draw the exercise solution. Upon completion, the student presses the submit button on their CourseMarker client. The client sends a submission object to the CourseMarker submission server, which delegates it to the marking subsystem. The marking subsystem executes the appropriate diagrammatic marking tool, which examines the student's solution and returns marking results and feedback.
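The submit-mark-feedback round-trip described above can be sketched as follows. This is a simplified, hypothetical model of the interaction; the function names are assumptions and the actual CourseMarker classes are not reproduced here.

```python
# Minimal sketch of the submit -> mark -> feedback round-trip.
# All names are hypothetical illustrations, not CourseMarker code.

def marking_subsystem(solution: dict) -> dict:
    """Stand-in for a diagrammatic marking tool: checks one required node."""
    mark = 100 if "start" in solution.get("nodes", []) else 40
    feedback = "OK" if mark == 100 else "Missing a start node."
    return {"mark": mark, "feedback": feedback}

def submission_server(submission: dict) -> dict:
    # The server delegates the submission object to the marking subsystem
    # and returns the marking results and feedback.
    return marking_subsystem(submission["solution"])

def client_submit(student: str, solution: dict) -> dict:
    # The client wraps the drawn solution in a submission object and sends it.
    submission = {"student": student, "solution": solution}
    return submission_server(submission)

result = client_submit("s123", {"nodes": ["start", "stop"]})
print(result["mark"], result["feedback"])  # 100 OK
```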

Fig. 4 illustrates a view of Theseus for a sample exercise in logic design and describes all the options associated with the exercise. Theseus contains only a subset of the editing features of Daidalos.

4. Examples of diagram-based CBA exercises

Exercises in four different domains have been automatically assessed using DATsys and CourseMarker at the University of Nottingham. Exercises in logic circuit design, flowchart design, object-oriented design and entity–relationship diagrams were authored using Daidalos and Ariadne and deployed via CourseMarker servers.


Fig. 4. Theseus’ map of features.


4.1. Logic design coursework

Diagram-based CBA exercises in logic design were the first to be authored. Fig. 5 depicts the entire process.

Daidalos was used to create the tool library that represents the logic gates. The graphical representation of the gates and other components was made by placing bitmap pictures of the various gates into a rectangle with a transparent perimeter, as it proved easier to find appropriate bitmaps than to draw the gates using primitive figures. The connectivity properties for each gate were then set on the appropriate connection points.

After a tool library for gates was complete, Ariadne was used to select the application features that configured Theseus. For example, as simple circuit design does not require zooming, this feature was disabled. However, multiple undo and redo may be useful, so this feature was enabled in Theseus. Next, Ariadne was used to develop the marking scheme and configure the marking tools of the CBA exercise. In order to accurately assess exercises in logic circuit design, we developed a CourseMarker marking tool that simulates logic circuits. The feature tool was also used to test the students' solutions for specific characteristics, such as the minimum or maximum number of gates used.
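The two marking stages just described, dynamic simulation of the circuit followed by a feature check on the gate count, could be illustrated as follows. This is a hypothetical sketch of the idea, not the CourseMarker simulation tool itself.

```python
from itertools import product

# Hypothetical sketch: mark a logic circuit by simulating it over all input
# combinations (dynamic test), then check the gate count (feature test).

def dynamic_mark(student_circuit, model_circuit, n_inputs):
    """Award marks for the fraction of input cases with matching outputs."""
    cases = list(product([0, 1], repeat=n_inputs))
    correct = sum(student_circuit(*c) == model_circuit(*c) for c in cases)
    return 100 * correct / len(cases)

def feature_mark(gates_used, max_gates):
    # Deduct a small percentage if more components than necessary were used.
    return 100 if gates_used <= max_gates else 90

model = lambda a, b: a and b                  # the required logic: AND
student = lambda a, b: not (not a or not b)   # same truth table, more gates

print(dynamic_mark(student, model, 2))          # full marks: behaviour matches
print(feature_mark(gates_used=3, max_gates=1))  # small deduction: not optimal
```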

Two logic design exercises were set as coursework. The first exercise required students to draw a simple circuit for an elevator control board. The second exercise required the students to design a circuit for a switchboard that controls a nuclear facility. It is worth noting that the students were allowed to use more components than necessary to produce the required logic. In these cases, students were awarded full marks for the dynamic tests which used circuit simulation. However, during the features checking part of the marking process, the marking feedback would suggest to the students that their solution was not optimal. A small percentage of marks would be deducted by the features marking tool in this case.

4.2. Flowcharts

Exercises that use flowchart diagrams were the second type of exercise that was created. Fig. 6 depicts an overview of the authoring process.

The authoring of the tool library for the flowchart symbols involved a different process to that of logic gates. The view of the flowchart diagram elements was made by composing primitive shapes. The connectivity properties for each element were defined as perimeter-type. The connection figure used to connect two flowchart elements is a simple line figure, decorated with an arrowhead to denote the direction of the flow. The data model for each element is a text label that holds the statement within the flowchart symbol. Ariadne was used to configure Theseus to allow zooming, grouping, alignment, font editing and z-layer ordering.

Appropriate marking tools mark the student flowchart as effectively as for logic design. A flowchart tool was developed that converts the flowchart into BASIC code and uses CourseMarker's dynamic marking tool to test the correctness of the flowchart's execution. Originally


Fig. 5. Steps for authoring CBA exercises in logic design.

Fig. 6. Steps for authoring CBA exercises in flowchart design.


developed to assess programming coursework (Symeonidis, 2006), the dynamic marking tool runs student solutions against test data, produces textual output and assesses whether the output produced by the program satisfies the stated solution. The exercise author, therefore, needs to identify the test cases which will be "fed" to the simulated flowchart as well as the expected output values. Separately, the feature tool is again used to test the students' ability to construct their solution using the most appropriate diagramming components.
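The author's task of identifying test cases and expected outputs for the simulated flowchart might look like the following sketch. The format and names here are assumptions for illustration; the actual CourseMarker test-case configuration is not shown in the paper.

```python
# Hypothetical sketch of dynamic marking: the exercise author specifies input
# test cases and expected outputs; the simulated flowchart is run on each.

test_cases = [
    # (inputs fed to the flowchart, expected printed output)
    (("3", "1", "2"), "3 is largest"),
    (("1", "2", "3"), "3 is largest"),
    (("2", "3", "1"), "3 is largest"),
]

def simulated_flowchart(inputs):
    """Stand-in for the student flowchart converted to BASIC and executed."""
    largest = max(inputs, key=int)
    return f"{largest} is largest"

def flowchart_mark(run, cases):
    # Award marks for the fraction of test cases whose output matches.
    passed = sum(run(inputs) == expected for inputs, expected in cases)
    return 100 * passed / len(cases)

print(flowchart_mark(simulated_flowchart, test_cases))  # 100.0
```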

One exercise has been set as coursework for flowcharts. The exercise required students to draw a flowchart for comparing three numbers. Although simple, this example used all the nodes of the flowchart diagram notation. The solution of the exercise required the students to:

• Draw a starting and ending flowchart node, three input statements, three conditional statements and three printing statements.
• Enter statements within each flowchart symbol to define its meaning. The question description explained the simple syntax of these statements.
• Connect the flowchart symbols using single arrowed lines.



4.3. Object-oriented design

Object-oriented design diagrams are a key part of the object-oriented design methodology (Rumbaugh, Jacobson, & Booch, 2004). Within an educational setting, they are used to encourage students to think of computer systems in terms of objects and the relationships between those objects, rather than in more traditional terms such as procedures. The OO diagrams we wished to assess would contain such components as objects, classes and interfaces, and would use connection lines to represent the relationships between these nodes.

As for flowcharts, the connectivity properties for each object-oriented node were set as perimeter-type. The connection figure that connects two object-oriented design elements indicates the relationship type; students choose the connection figure according to which type is the most appropriate. Symbols for most object-oriented notations can be represented with primitives such as circles, diamonds, and arrows. Object-oriented design diagram exercises rely heavily on CourseMarker's feature tool, which is used to validate the relationships between diagram elements. The tool identifies redundant classes and interfaces and distinguishes between the cardinality of the diagramming components and their relationships.

The set coursework required students to design a hotel management application according to a well-defined specification of requirements. This exercise was harder to solve than those for the two previous domains, and took advantage of the expressiveness of the object-oriented diagram notation. The solution of the exercise required the students to place 12 components and draw 17 relationships. Four additional components were available in the toolbar as distracters.

4.4. Entity–relationship diagrams

Entity–relationship diagrams represent the structure of data in an abstract sense (Beynon-Davies, 1992). Entities are uniquely identifiable objects capable of an independent existence (for example, a CD), relationships indicate how entities are related (for example, a record company might 'publish' many CDs) and attributes identify aspects of an entity or relationship (for example, a CD might have a title). These diagrams are most often used within Computer Science education in the design process for database systems. In our case, this module was compulsory for second year Computer Science undergraduates.

The authoring of the tool library to represent the entities, relationships, attributes and connection lines of the entity–relationship domain involved composing shapes together in a similar way to the flowchart elements. Entities, relationships and attributes all have perimeter-type connectivity, while appropriate connection lines are decorated with forked heads (to represent, for example, one-to-many connection lines). The data model for entity, relationship and attribute elements consists of a text field which is editable by the student, as well as an uneditable name which permanently identifies the node type. The aim is to allow the student to enter free-form text into the nodes on the canvas, thereby increasing the free-response nature of the assessment. As before, Ariadne was used in the construction of the exercises. Theseus was configured to allow zooming and alignment.

The marking tool for entity–relationship diagrams is an attempt to construct a tool which can be re-used for as many future domains as possible. The tool works from the sole assumption that each diagram node is a composite figure whose members include a text field; indeed, this will always be true since the original elements were authored this way in Daidalos and this property can be repeated for future domains. As a result, each node can be identified in terms of two attributes: its name (entity, relationship or attribute) and its text content. Connection lines, by contrast, are identifiable in terms of three attributes: name, start node and end node. Allowing students to enter free-form text into the nodes makes it necessary to be flexible in marking, since students may use a range of strings to indicate the same intent. Two techniques are used to accommodate this. Firstly, questions can be carefully worded to encourage students to use particular terminology. Secondly, the marking scheme makes use of Oracles (Zin & Foxley, 1992), an extended regular expression notation which has already been used successfully in the assessment of programming coursework.

The entity–relationship tool operates in a similar way to the features testing tool. Four generic operators are available: exist, which searches for an element of a given type; exact, which searches for an element of given type and text; connection, which searches for a connection of given directionality and type between two specified element types; and exactConnection, which searches for a connection of given directionality and type between two elements of specified types and texts. A fifth operator, compositeRelationship, combines the functionality of the other operators in a domain specific way and has been implemented to make features test specifications more convenient to author.
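Under the assumption that each node carries a (name, text) pair and each connection a (name, start, end) triple, the four generic operators might be sketched as follows. This is a hypothetical implementation of the idea; Oracle-style matching is approximated here with ordinary regular expressions.

```python
import re

# Hypothetical sketch of the four generic feature-test operators. A diagram is
# modelled as nodes (name, text) and connections (name, start_text, end_text).

nodes = [("entity", "CD"), ("entity", "Company"), ("relationship", "publishes")]
connections = [("one-to-many", "Company", "publishes"),
               ("one-to-many", "publishes", "CD")]

def exist(kind):
    """Search for an element of a given type."""
    return any(name == kind for name, _ in nodes)

def exact(kind, pattern):
    """Search for an element of given type whose text matches the pattern."""
    return any(name == kind and re.fullmatch(pattern, text)
               for name, text in nodes)

def connection(kind, start_kind, end_kind):
    """Search for a connection of given type between two element types."""
    kind_of = {text: name for name, text in nodes}
    return any(c == kind and kind_of.get(s) == start_kind
               and kind_of.get(e) == end_kind
               for c, s, e in connections)

def exact_connection(kind, start_text, end_text):
    """Search for a connection of given type between two specific elements."""
    return any(c == kind and re.fullmatch(start_text, s)
               and re.fullmatch(end_text, e)
               for c, s, e in connections)

# Flexible matching: students may write "CD", "CDs" or "Compact Disc".
print(exact("entity", r"CDs?|Compact Discs?"))              # True
print(connection("one-to-many", "relationship", "entity"))  # True
```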

Three exercises were created. In the first exercise, students were asked to identify entities, relationships and attributes from a textual description of a CD catalogue and draw a diagram to represent these elements and the cardinality ratios between them. The second exercise built upon the first: the students were asked to refine their original diagrams, removing problematic constructs such as many-to-many relationships. The third exercise introduced a new problem, a movie database. The exercise asked the student to perform tasks analogous to the first two exercises. Later, students were to undertake summative coursework on the topic of SQL based upon their solution to the third exercise.

5. Results

The following subsections consider the results of deploying the exercises with two separate student cohorts. Section 5.1 provides a brief overview of how the logic design, flowchart and object-oriented design exercises were received in the Software Tools module. Section 5.2 presents more detailed results based upon the deployment of the entity–relationship diagram exercises in the Database Systems module.

5.1. Initial exercises

The two simple logic design exercises were intended to introduce the students to Theseus. Most of the students drew a correct solution and those that experienced difficulties were quickly helped by CourseMarker's feedback. Many students reported that they particularly liked the uncluttered feel of Theseus and the speed with which the system responded during the drawing of their solution. Learning to use Theseus was perceived as easy and intuitive.



The flowchart exercise was similarly popular. The exercise was more difficult; some students got the order of input wrong and received low grades, but with the help of CourseMarker's feedback they corrected their mistakes and improved their solution, receiving an improved mark with a repeated submission. The fact that the students could submit a solution, receive feedback and re-submit very quickly was met with widespread approval.

The object-oriented design exercise was more complex and involved and had been designed to be more time-consuming to solve. An unexpected side-effect of this was that a small but noticeable number of students attempted to draw their solution on paper first, before entering the solution into Theseus immediately prior to their first submission. We were concerned whether this meant that those students had reservations as to Theseus' usability. We had intended that students develop their solution interactively on the drawing canvas. Investigation revealed that some students were simply more at ease when drafting a solution first before "writing it up" as a second phase. Furthermore, some students stated that they developed their solutions on paper because that was how they usually sketched out their program designs. The trend seems, therefore, to relate more to pre-existing behavioural habits than to the properties of Theseus and warrants further investigation in future.

Exercises in these first three domains were prototypical and created to highlight the flexibility of the DATsys framework. However, the diagramming exercises were met with considerable success overall. The students' sense of ease with the course delivery interface can perhaps be explained by the fact that the cohort undertook the course concurrently with their compulsory first-year programming exercises, which were also delivered through and assessed by CourseMarker. However, the Theseus interface used by students to draw their diagrams was completely new and was well received.

5.2. Entity–relationship diagram exercises

Table 1 provides an overview of student submission and mark statistics for the initial problem set and the more complex subsequent exercise. Since the entity–relationship diagram exercises in the Database Systems course were of a purely formative nature, we were particularly keen to monitor student uptake. The construction of the exercises as a two-part assessment, where attempting the exercises would assist the students directly with the later formal coursework, combined with the intuitive and easy-to-use nature of the Theseus student editor, resulted in high student motivation: of 141 active students registered on the course, 130 attempted the exercises. It is clear that the students found the complex exercise more demanding than the initial exercise set, since they made 80% more submissions on average for the final exercise, with 8 students making more than 25 submissions and one student a total of 72. However, a comparable increase in marks was achieved across both sets of exercises. We will consider our concerns regarding the absolute validity of the percentage scores later in this section, but it is clear that the students' solutions converged towards the expectations of the marking system, over the course of several submissions, in response to the feedback provided onscreen.

A general pattern can be discerned across all three entity–relationship exercises. The improvement in marks between the earlier submissions is substantially larger than that between later submissions. Fig. 7 shows how, for the third exercise, the underlying average student mark improved over the first 9 submissions for those whose total submissions were 12 or fewer. On average, over the first 9 submissions a gradually improving underlying student average mark converges around the 70% mark.

Unfortunately, after 9 submissions the improvement in marks between submissions became negligible. This may provide an explanation as to why the number of students continuing to submit after this point sharply declined, since the feedback to the student would have changed little for 2 or 3 consecutive submissions. Institutionally, 70% or higher is considered a first class mark at the University of Nottingham. Feedback associated with the attainment of such a mark is therefore mostly positive. This, together with the marking limitations discussed later, explains why this mark was not generally higher.

Those students who submitted a great number of times failed to acquire proportionally higher marks. Some students seemed to randomly submit altered solutions in the hope of chancing on a higher mark. Others tended to be perpetually dissatisfied with their feedback and submitted more times in the hope of achieving a slightly higher result. These students conform to the "gamblers and perfectionists" stereotypes identified by Benford, Burke, and Foxley (1992).

It is worth noting that some students were simply interested in finding out how the automated assessment mechanism worked, submitting slightly different diagrams in order to see how the feedback would change. This was the first opportunity such students had encountered to make large numbers of submissions using CourseMarker. For the previous exercise domains, the number of allowed submissions was restricted (typically to 3–5 submissions per exercise) because the marks carried summative weight. Similar restrictions are in place for the undergraduate programming exercises which are assessed each year using CourseMarker. For the entity–relationship diagrams, unlimited submissions were allowed because the exercises were designed purely to help students. It is clear that students found the ability to re-submit their solutions helpful for more than the standard number of submissions, but the decision to allow unlimited submissions needs to be considered carefully in future.

The quantitative questionnaire questions asked the student to agree with a series of statements which were then scored on a five-point Likert scale, from 1 (disagree) to 5 (agree). 38 completed questionnaires were received, with none spoiled; the results of the questionnaire

Table 1. Summary of entity–relationship exercise statistics.

Exercise take-up: 92%

Initial problem
  Mean submissions per student: 5
  Average mark for first submission: 49.2%
  Average mark for final submission: 75.1%

More complex problem
  Mean submissions per student: 9
  Average mark for first submission: 50.7%
  Average mark for final submission: 70.1%


Fig. 7. First nine submissions of students who made 12 or fewer total submissions.

Table 2. Results of student evaluation questionnaire.

Statement: Mean score (N = 38)

The system was easy to use: 4.2
The diagram questions were easy to comprehend: 4.0
The diagram questions were useful to my learning process: 4.0
The feedback I received for my submissions motivated me to research further: 3.2
I made improvements to my solution as a result of the feedback I received: 3.8
The feedback was relevant to my solution: 2.7
The diagram exercises were a good use of my time: 3.8


are summarised in Table 2. Students strongly agreed that the system was easy to use and that the questions were easy to comprehend and useful. Students were motivated to conduct further research to improve their solution before submitting again, but were divided upon whether they felt that the feedback was strictly relevant to their solution.

Informal interviews with lab tutors uncovered major problems which need to be overcome if the exercises are to be improved. Most obviously, the marking mechanism failed to take into account diagram appearance. Many students committed little effort to presenting their diagram attractively since little could be gained by doing so. This meant that when unexpected feedback was received it was sometimes difficult for a lab assistant to determine what was wrong with a student diagram due to its poor layout.

Perhaps more serious was the inability of the features marking mechanism to cope with complex exercises where different model solutions are equally acceptable. Features are marked according to their presence, without a sense of context. For the two initial exercises, the extent of this shortcoming was hidden because few variations were plausible. Because the exercise specification for the third exercise was more substantial, however, further possibilities were available to students. Only common subset features were marked, meaning that students were heavily dependent upon lab tutors for assistance in developing their solution. This provides a further explanation as to why student submissions failed to converge to a mark higher than 70%. Not only were students receiving good feedback from CourseMarker when submitting their solution, but they were receiving assistance from authoritative sources (the tutors) whose mental picture of the model solution was likely inconsistent with that in the CourseMarker mark scheme.

Although it is necessary to consider all shortcomings if the materials are to be improved, it must be stressed that students and tutors were positive about their experiences with the exercises. Student marks demonstrably increased as a result of using the CourseMarker system. The system itself was popular with students, who found their experience of using it enjoyable. Although the tutors suggested improvements, they were aware that the system was novel and were positive about what it did achieve.

6. Evaluation

We now evaluate our experiences and show how they relate to the feasibility and usefulness criteria outlined in Section 2.

6.1. Feasibility

We have demonstrated that the authoring of diagram-based CBA is feasible. Four different domains have been implemented, requiring disparate processes in terms of authoring the diagram notation and marking the student solutions. CourseMarker has met the requirements of all CBA systems because the delivery of the materials, the input of student solutions, the automated marking and the delivery of feedback were all coherently managed by the system in each case.

Concretely, in terms of the assessment process as defined in Section 2, the Activity Selection Process is managed successfully by CourseMarker. The sequencing of assessment tasks can be specified exactly using a marking scheme, while the presentation of teaching materials can be achieved through the user clients and administrative duties are handled by CourseMarker's servers. The student launches the configured Theseus editor from within the CourseMarker client to draw their solution and then submits through CourseMarker after saving.

The Presentation Process is also fulfilled. The student is presented with a problem specification in the CourseMarker client. Upon setting up the exercise, the student can develop their solution within a parameterised Theseus client. Theseus allows the student to interactively "draw" their diagram upon a development canvas. The student can save their drawing by selecting the save function within Theseus; the drawing is then captured in a format which can be traversed by the marking mechanism.



Response Processing is fulfilled by the generic marking mechanism. Marking tools are specified when the domain is created. We have developed marking tools for four domains. We summarise approaches for possible further domains and discuss the possibility of a "generic" diagram marking tool below.

The Summary Scoring Process is successfully managed. CourseMarker assigns marks based upon a weighted summary of the tests it has carried out and stores these marks in a structured, logical marking result object.
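The weighted summary just described can be illustrated with a small sketch. The weights, test names and result structure below are hypothetical; the actual marking result object is not documented here.

```python
# Hypothetical sketch of the Summary Scoring Process: individual test marks
# are combined into an overall mark using author-assigned weights.

def summary_score(results):
    """results: list of (test_name, mark_out_of_100, weight)."""
    total_weight = sum(w for _, _, w in results)
    overall = sum(mark * w for _, mark, w in results) / total_weight
    return {"overall": round(overall, 1),
            "breakdown": {name: mark for name, mark, _ in results}}

marks = summary_score([
    ("dynamic simulation", 100, 0.7),  # e.g. circuit behaviour is correct
    ("feature check", 80, 0.3),        # e.g. more gates used than necessary
])
print(marks["overall"])  # 94.0
```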

6.2. Usefulness

The authoring of diagram-based CBA is useful. The courses were popular with students, who felt that the assessments assisted their learning process. The use of a CBA process allowed large numbers of submissions to be assessed per student, even across large cohorts. This would not have been possible using traditional assessment methods. When a new domain is to be assessed, the process of defining the domain notation in Daidalos is straightforward. Constructing the marking tools can be a lengthy process and care is required in the development of exercises. However, once exercises have been created they can be deployed repeatedly. The time-saving potential of CBA over the medium and long term is, therefore, considerable.

From the point of view of practitioners, the practical benefits of CBA are well established (Charman & Elmes, 1998a). By offering a mechanism for the authoring of diagram domain notations, a generic marking mechanism, a configurable student diagram editor and an integrated online environment, this work has demonstrated that CBA can be applied in an important free-response domain.

6.3. Future directions

The generic marking mechanism is flexible and powerful and does not limit the development of marking tools. Moreover, a breadth of experience has been gained in marking tool development which can be useful when new domains are to be assessed. However, the fact remains that marking tools must be constructed each time a new diagram domain is to be assessed. Such a development process can be lengthy and involved.

The assessment of exercises in entity–relationship diagrams using a tool based purely upon domain-neutral features testing was an attempt to determine whether a generic marking tool could be applied across a large number of domains. Future approaches can be distinguished based upon whether they are domain-specific or domain-independent.

Many future marking tools have been proposed for CourseMarker. Database scheme diagrams could be marked by using a suitable tool that converts the diagram to a database table, runs SQL queries, and tests the output data using oracles. Network diagrams might be converted into formats understood by various network simulator tools. Such tools could perform a variety of tasks including load balancing, distribution examination, data throughput analysis and performance scaling investigation. The output of such a network simulator tool could be read back by the marking system.

Domain-specific mechanisms for assessing entity–relationship diagrams have been documented by Thomas et al. (2006) and Batmaz and Hinde (2006). These approaches could be implemented as CourseMarker marking tools.

Medical diagrams (and any other type of picture-based diagrams) could be assessed by developing a marking tool that checks geometric positioning. The developer would configure the tool with the areas of interest, along with their names and coordinates. Analogue circuit diagrams could be marked by using external simulation tools such as Spice (Vladimirescu, 1994). Concept maps could be marked by latent semantic analysis tools similar to Lou's work on essay-based assessment (Foxley & Lou, 2001b).

A domain-independent approach is now being prototyped, based upon our approach to assessing entity–relationship diagrams (Bligh, 2006). The features testing tool used for the entity–relationship diagrams is used as part of a modified marking scheme where several solution cases are examined. The educator identifies features which are common to all plausible model solutions and then enunciates the differences between the model solutions in the alternate solution cases. Early results indicate that this approach can be useful, but that very large amounts of effort are required to develop the exercise marking schemes. Bligh also considers how marking tools can be used to aesthetically assess the layout of student diagrams developed within Theseus.
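The approach described above, common features plus enumerated alternative solution cases, could be sketched as follows. This is a hypothetical illustration of the idea only, with invented feature names and weights, not Bligh's implementation.

```python
# Hypothetical sketch of domain-independent marking with alternative model
# solutions: features common to every plausible solution are checked first,
# then the best-matching alternative case contributes the remainder.

def mark_with_cases(solution_features, common, cases):
    # Fraction of features common to all plausible model solutions.
    common_score = sum(f in solution_features for f in common) / len(common)
    # Each case lists the features distinguishing one model solution;
    # the student is credited for the case they match best.
    best_case = max(sum(f in solution_features for f in case) / len(case)
                    for case in cases)
    # Illustrative weighting: 60% common features, 40% best case.
    return 100 * common_score * 0.6 + 100 * best_case * 0.4

student = {"entity:CD", "entity:Company", "rel:publishes", "link:CD-publishes"}

common = ["entity:CD", "entity:Company", "rel:publishes"]
cases = [
    ["link:CD-publishes", "link:Company-publishes"],  # direct many-to-many
    ["entity:Contract", "link:Contract-CD"],          # decomposed alternative
]
print(mark_with_cases(student, common, cases))
```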

7. Conclusion

This research investigated the feasibility and usefulness of designing an authoring environment for developing diagram-based CBA and considered early experiences in exposing the exercises to student cohorts. An innovative facility has been designed and implemented to allow experimentation, research and development of diagram-based CBA coursework in a controlled environment. Together, the DATsys and CourseMarker systems make supporting the full lifecycle of diagram-based CBA coursework both viable and realistic. DATsys solves the problem of customising the diagram editor to the specifics of the exercise. CourseMarker's generic marking mechanism allows variation to be expressed through marking schemes and marking tools, while CourseMarker can support the full lifecycle of the CBA coursework. Between them, Ceilidh and CourseMarker have proved to be invaluable tools in the assessment of programming-based coursework for nearly two decades. We believe that, given the extensions outlined here, CourseMarker and DATsys can prove just as useful in managing diagram-based exercises.

References

Almond, R., Steinberg, L., & Mislevy, R. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning and Assessment, 1(5).
Amelung, M., Piotrowski, M., & Rösner, D. (2006). EduComponents: Experiences in e-assessment in computer science education. In Proceedings of the 11th annual ACM SIGCSE conference on innovation and technology in computer science education (pp. 88–92). Bologna, Italy.
Batmaz, F., & Hinde, C. J. (2006). A diagram drawing tool for semi-automatic assessment of conceptual database diagrams. In Proceedings of the 10th CAA conference. Loughborough, UK.
Beck, K., & Gamma, E. (1997). Advanced design with patterns in Java. In Object-oriented programming systems, languages and applications (OOPSLA'1997), tutorial 30.
Benford, S., Burke, E., & Foxley, E. (1992). Courseware to support the teaching of programming. In Proceedings of the conference on developments in the teaching of computer science (pp. 158–166).


C.A. Higgins et al. / Computers & Education 52 (2009) 749–761 761

Beynon-Davies, P. (1992). Database systems. Basingstoke: Palgrave.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Bligh, B. (2006). Formative computer based assessment in diagram based domains. PhD thesis. School of Computer Science and IT, University of Nottingham.
Bridgeman, S., Goodrich, M. T., Kobourov, S. G., & Tamassia, R. (2000). PILOT: An interactive tool for learning and grading. In Proceedings of the 31st ACM SIGCSE technical symposium on computer science education (pp. 139–143). Austin, TX, USA.
Brown, S., Race, P., & Smith, B. (1996). 500 Tips on assessment. London: Kogan Page.
Buchanan, T. (2000). The efficacy of a World-Wide Web mediated formative assessment. Journal of Computer Assisted Learning, 16(3), 193–200.
Carter, J., English, J., Ala-Mutka, K., Dick, M., Fone, W., Fuller, U., & Sheard, J. (2003). How shall we assess this? In Proceedings of the 8th annual joint conference integrating technology into computer science education (pp. 107–123). Thessaloniki, Greece.
Charman, D., & Elmes, A. (1998a). Computer based assessment (Volume 1): A guide to good practice. Plymouth: SEED Publications.
Charman, D., & Elmes, A. (1998b). A computer-based formative assessment strategy for a basic statistics module in geography. Journal of Geography in Higher Education, 22(3), 381–385.
Daly, C. (1999). RoboProf and an introductory computer programming course. In Proceedings of the 4th annual ACM SIGCSE/SIGCUE conference on innovation and technology in computer science education (pp. 155–158). Cracow, Poland.
Douce, C., Livingstone, D., Orwell, J., Grindle, S., & Cobb, J. (2005). A technical perspective on ASAP – Automated system for assessment of programming. In Proceedings of the 9th CAA conference. Loughborough, UK.
Foxley, E., Higgins, C. A., Hegazy, T., Symeonidis, P., & Tsintsifas, A. (2001a). The CourseMaster CBA system: Improvements over Ceilidh. In Proceedings of the 5th annual computer assisted assessment conference (pp. 189–201). Loughborough, UK.
Foxley, E., Higgins, C. A., Symeonidis, P., & Tsintsifas, A. (2001b). The CourseMaster automated assessment system: A next generation Ceilidh. In Computer assisted assessment workshop. Warwick, UK.
Foxley, E., & Lou, B. (1994). STAMS: A simple text automatic marking system. In Artificial intelligence and simulation of behaviour 94 conference for: Computational linguistics for speech and handwriting recognition. Leeds, UK.
Greenhow, M. (2000). Setting objective tests in mathematics with QM designer. Learning Technology Support Network Connections, 2(1), 21–26.
Higgins, C. A., & Bligh, B. (2006). Formative computer based assessment in diagram based domains. In Proceedings of the 11th annual ACM SIGCSE conference on innovation and technology in computer science education (pp. 98–102). Bologna, Italy.
Higgins, C. A., Hegazy, T., Symeonidis, P., & Tsintsifas, A. (2003). The CourseMaster CBA system: Improvements over Ceilidh. Journal of Education and Information Technologies, 8(3), 287–304.
Jackson, D. (2000). A semi-automated approach to online assessment. In Proceedings of the 5th annual ACM SIGCSE/SIGCUE conference on innovation and technology in computer science education (pp. 164–167). Helsinki, Finland.
Johnson, R. (1992). Documenting frameworks using patterns. In Proceedings of the 7th annual conference on object-oriented programming systems, languages, and applications (pp. 63–76).
Johnstone, A. H., & Ambusaidi, A. (2000). Fixed response: What are we testing? Chemistry Education: Research and Practice in Europe, 1(3), 323–328.
King, T., & Duke-Williams, E. (2001). Assessing higher level learning outcomes with CBA. In Handbook of the institute for learning and teaching in higher education conference: Professionalism in practice. York, UK.
Malmi, L., & Korhonen, A. (2004). Automatic feedback and resubmissions as learning aid. In Proceedings of the 4th IEEE international conference on advanced learning technologies (ICALT'04) (pp. 186–190). Joensuu, Finland.
Oliver, R. (1998). Experiences of assessing programming assignments by computer. In D. Charman & A. Elmes (Eds.), Computer based assessment (Volume 2): Case studies in science and computing (pp. 47–49). Plymouth: SEED Publications.
Paul, C., & Boyle, A. (1998). Computer-based assessment in palaeontology. In D. Charman & A. Elmes (Eds.), Computer based assessment (Volume 2): Case studies in science and computing (pp. 51–56). Plymouth: SEED Publications.
Race, P. (2001). A briefing on self, peer and group assessment. York: LTSN Generic Centre: Assessment Series.
Rumbaugh, J., Jacobson, I., & Booch, G. (2004). The unified modeling language reference manual (2nd ed.). Reading, MA: Addison-Wesley.
Rust, C. (2001). A briefing on assessment of large groups. York: LTSN Generic Centre: Assessment Series.
Symeonidis, P. (2006). Automated assessment of Java programming coursework for computer science education. PhD thesis. School of Computer Science and IT, University of Nottingham.
Thomas, P., Waugh, K., & Smith, N. (2006). Using patterns in the automatic marking of ER-diagrams. In Proceedings of the 11th annual ACM SIGCSE conference on innovation and technology in computer science education (pp. 83–87). Bologna, Italy.
Tsintsifas, A. (2002). A framework for the computer based assessment of diagram based coursework. PhD thesis. School of Computer Science and IT, University of Nottingham.
Vladimirescu, A. (1994). The SPICE book. New York: John Wiley.
Vlissides, J. (1990). Generalized graphical object editing. PhD thesis. Stanford University.
von Matt, U. (1994). Kassandra: The automatic grading system. Technical report UMIACS-TR-94-59. Department of Computer Science, University of Maryland, USA.
Wybrew, L. (1998). The use of computerised assessment in health science modules. In D. Charman & A. Elmes (Eds.), Computer based assessment (Volume 2): Case studies in science and computing (pp. 61–65). Plymouth: SEED Publications.
Zin, A. M., & Foxley, E. (1992). The "oracle" program. Technical report. Department of Computer Science, University of Nottingham, UK.