

248 IEEE TRANSACTIONS ON EDUCATION, VOL. 52, NO. 2, MAY 2009

p88110: A Graphical Simulator for Computer Architecture and Organization Courses

M. Isabel García, Santiago Rodríguez, Antonio Pérez, and Antonio García

Abstract—Studying fundamental Computer Architecture and Organization topics requires a significant amount of practical work if students are to acquire a good grasp of the theoretical concepts presented in classroom lectures or textbooks. Simulators are commonly used to reach this objective. However, as most of the available educational simulators focus on specific topics, different laboratory assignments usually require the use of different simulators. This paper presents a graphical and interactive reduced instruction set computer (RISC) processor and memory simulator that allows active learning of some theoretical concepts covered in Computer Architecture and Organization courses. The simulator can be configured to present different processor views, from a simple serial one, without caches or pipelines, to a more realistic one with caches and superscalar execution. This approach allows a set of increasingly complex code-based laboratory assignments to be developed using a single simulator, covering topics ranging from assembly language programming to the analysis of the different kinds of cache misses, pipeline hazards, or branch prediction hits and misses produced during program execution. The simulator has been included in an automatic assessment system that helps students to complete the assignments and helps teachers to evaluate the correctness of the students’ solutions in different environments, such as high-enrollment courses or distance education. Since 1996, both the simulator and the automatic assessment system have been successfully used by more than 5000 students in Computer Architecture and Organization courses at the Technical University of Madrid (UPM), Spain.

Index Terms—Automatic assessment, computer architecture and organization, computer science education, educational technology, graphical interface, high-enrollment course management.

I. INTRODUCTION

COMPUTER Architecture and Organization courses cover fundamental topics such as assembly language programming, input/output subsystems, cache memories, and pipelined and superscalar processors. Acquiring a good grasp of the theoretical concepts presented in classroom lectures and recommended textbooks requires students to carry out a substantial amount of practical work through laboratory assignments. It is of little use, for instance, for a student to know the principles of cache memory if s/he does not understand their influence on system performance. Software simulators are commonly used to help students understand these topics. Educational simulators provide simplified models of real computers, allowing the student to observe the interaction between the simulated components during program execution, as well as some events that generally occur “behind the scenes” on real computers. Many educational simulators have been developed for Computer Architecture and Organization courses, surveys of which can be found in [1]–[3].

Manuscript received March 04, 2008; revised May 15, 2008. Current version published May 06, 2009.

The authors are with the Department of Computer System Architecture and Technology (DATSI), Technical University of Madrid, 28660 Madrid, Spain (e-mail: mgarcia@fi.upm.es; srodri@fi.upm.es; aperez@fi.upm.es; dopico@fi.upm.es).

Digital Object Identifier 10.1109/TE.2008.927690

Most of these educational simulators focus on particular subjects. Spim [4] and MARS [5] are useful tools for assembly language laboratory assignments. Spim-Cache [6], DCMSim [7], and Dinero [8] are suitable for studying cache memory systems. Windlx [9], DLXview [10], SuperSim [11], and PSATSim [12] are useful in understanding basic pipeline-related concepts, out-of-order execution, and the principles of superscalar systems.

The authors’ goal was to have an interactive simulator that could be used to cover several of the core topics students should know about, rather than multiple simulators each focused on a specific topic. This approach reduces the time-consuming task of learning how each individual simulator works, as well as the programming language it uses, and allows students to focus on solving the problem at hand. Additionally, the authors needed a tool to be used in high-enrollment courses which would allow teachers to track the progress of the laboratory assignments, and to evaluate them automatically [13].

This paper presents an interactive processor-memory simulator with a graphical interface that is being used for teaching some theoretical concepts in Computer Architecture and Organization courses at the Computer Science School of the Technical University of Madrid (UPM), Spain.

A previous text-based version of the simulator [14] has been used by more than 5000 students since 1996. This version was successfully applied in the basic Computer Organization course offered in the second year of the Computer Engineering degree, which covers basic computer structure concepts such as the instruction set and assembly language programming, and in the Computer Architecture course offered in the third year, which covers topics such as cache memories and pipeline processors. Likewise, the simulator described in this paper will be used for the optional course Instruction-Level Parallelism (ILP) and Superscalar Architectures offered in the fourth year.

The proposed simulator, described in this paper, can be configured through the graphical interface to present different processor views: from a simple serial view, without caches or pipelines, to a more realistic view with caches or superscalar execution. This feature allows instructors to adapt the simulator to different laboratory assignments and course curricula. Caches and superscalar execution are disabled, for instance, on assembly language programming laboratory exercises.

The simulator includes an embedded editor and debugger, allowing the student to compose an assembly language program, assemble it, and then execute the assembled program. The execution can be carried out either step-by-step, displaying the contents of the processor resources at each step, or the whole program at once, to check the final status of the processor resources. Finally, the student can obtain some performance metrics to evaluate the impact of different program or component design decisions on performance.

0018-9359/$25.00 © 2009 IEEE

This paper is organized as follows: Section II describes the simulator, its graphical interface and the information it provides. Section III discusses different types of laboratory assignments proposed using this simulator. Section IV describes the assignment assessment system. Finally, Section V presents some conclusions based on the authors’ experience.

II. SIMULATOR DESCRIPTION

The p88110 emulates the behavior of the 32-bit MC88110 microprocessor [15] by simulating its main functional units. When the simulator development started in 1993, the 88000 architecture was considered a very “clean” design and the MC88110 microprocessor one of the simplest and best documented superscalar processors. Previous versions of the simulator are described in [14] and [16]. Although the MC88110 could be considered old-fashioned, it has several characteristics of educational interest, as follows.

• Since it has a reduced instruction set computer (RISC) architecture, it provides a simple instruction set, and simple but useful addressing modes, particularly suitable for novice students facing their first assembly language program.

• It has two on-chip caches in a Harvard architecture style and can therefore be used to introduce memory hierarchy principles, and to study the impact of its different design parameters on overall system performance.

• Being a pipelined two-way superscalar processor, it allows students to see how this approach can lead to higher performance, as well as how performance can be affected by data, structural, and control hazards.

Some characteristics of the actual processor, such as the extended floating point format, graphical units or the extended register file, have not been included because the authors do not consider them very relevant for academic purposes. Neither complete exception handling mechanisms nor input/output subsystems have been included in this simulator because the laboratory assignments related to these topics are developed using the bsvc simulator [17], which is based on a complex instruction set computer (CISC) microprocessor. In this way, students can understand and work with both RISC and CISC architectures. In the p88110, exceptions are detected and the simulation stops, warning the user and identifying the kind of exception.

The simulator provides a simple graphical interface. Fig. 1 shows the simulator main window, which consists of:

• a menu giving access to the different windows to open or edit a program, configure the memory hierarchy parameters, visualize the main system components, modify memory and register contents, and display some statistics;

• three command buttons to control program execution: Restart from the beginning, Execute the whole program (or until a breakpoint is reached), and Step-by-step execution. To execute more than one instruction or cycle, a number of steps can also be selected through a spin-button;

• the contents of the general purpose and PC registers, as well as the Carry and Overflow conditions. Register contents can be viewed in decimal or hexadecimal notation through two radio-buttons placed at the top right corner of the window;

• three text boxes, at the bottom, containing the next instruction to be executed, the number of instructions executed, and the elapsed time, expressed in clock cycles.

Fig. 1. p88110 simulator main window.

The main options, included in the main window menu, correspond to the main steps followed in developing a practical exercise.

A. Program Development

Students can write their assembly language program using a conventional editor. Nevertheless, the simulator has an embedded editor. The editor window (Fig. 2) enables the writing of new programs, which can be saved into files, as well as the opening and modifying of existing ones. This window includes the standard file management options, as well as the Assemble option.

The latter translates the assembly language program into the corresponding binary code that can be directly executed by the processor simulator. A warning window, with the appropriate messages and line numbers, is shown when there is any error during the assembly process. If there are no errors, the program binary code can be saved in a file to be used later. Saving this file avoids the need to repeat the assembly process every time the program has to be executed.


Fig. 2. Editor window.

As well as instructions, code can incorporate the following:

• pseudoinstructions such as org to specify where code and data have to be stored, res to reserve memory for data, and data to reserve and initialize a memory area;

• the definition of macros;

• comments separated from instructions by semicolons.

Students can assign different colors to the different code items (opcodes, labels, and comments) through the Configure option of the editor window. This feature allows them to become aware of syntax errors while writing their programs.

B. Configuration

The parameters to be configured depend on the complexity of the practical assignment. Fig. 3 shows the windows to configure some of the main parameters. For beginners’ practical exercises based on assembly language programming, the student should select serial execution. This execution mode allows details such as caches or superscalar execution to be ignored. Moreover, the user can choose between the options available for several operation modes, which are shown in the window on the left in Fig. 3:

• the byte ordering convention to be used: big endian or little endian;

• whether exceptions due to arithmetic operations, such as overflow or division by zero, are enabled or disabled;

• the rounding mode to be applied in floating point arithmetic operations. The four IEEE floating-point standard rounding modes (round-to-nearest, round-toward-zero, round-down, and round-up) are supported.
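The effect of the byte ordering option can be illustrated outside the simulator. The following minimal Python sketch is not part of p88110; the helper name `word_bytes` is hypothetical, and the standard `struct` module is used only to show how the same 32-bit word is laid out in memory under each convention.

```python
import struct


def word_bytes(value, big_endian=True):
    """Return the four bytes of a 32-bit unsigned word as stored in memory,
    lowest address first."""
    fmt = ">I" if big_endian else "<I"   # ">" = big endian, "<" = little endian
    return list(struct.pack(fmt, value))


# Same word, two memory layouts.
print([hex(b) for b in word_bytes(0x12345678, big_endian=True)])
print([hex(b) for b in word_bytes(0x12345678, big_endian=False)])
```

With big endian ordering the most significant byte (0x12) occupies the lowest address; with little endian ordering the least significant byte (0x78) does.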

Fig. 3. Parameter configuration.

For simulation exercises on cache memory, the main memory hierarchy parameters can also be configured, through the window shown on the right in Fig. 3:

• the main memory access time, expressed in clock cycles;

• the main cache memory parameters: access time, number of lines, bytes per line, organization policy (direct mapped, fully associative, or set associative), lines per set, and write policy (write through or write back). The replacement policy is assumed to be least recently used (LRU) for associative caches. Both instruction and data caches can be enabled and configured separately.
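How these cache parameters interact can be sketched with a toy model. The Python class below is an illustration only, not the simulator's implementation: the class name `Cache` and its methods are hypothetical, it tracks tags but no data, and it ignores access times and write policy. It takes the number of lines, bytes per line, and lines per set, and applies LRU replacement within each set.

```python
from collections import OrderedDict


class Cache:
    """Toy set-associative cache with LRU replacement (tags only, no data)."""

    def __init__(self, num_lines, bytes_per_line, lines_per_set):
        assert num_lines % lines_per_set == 0
        self.bytes_per_line = bytes_per_line
        self.lines_per_set = lines_per_set
        self.num_sets = num_lines // lines_per_set
        # One OrderedDict per set; keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one reference; return True on a hit, False on a miss."""
        block = address // self.bytes_per_line
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        if len(lines) == self.lines_per_set:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = True
        self.misses += 1
        return False

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Four lines of 16 bytes: direct mapped versus two-way set associative.
for ways in (1, 2):
    cache = Cache(num_lines=4, bytes_per_line=16, lines_per_set=ways)
    print(ways, [cache.access(a) for a in (0, 4, 64, 0)], cache.hit_ratio())
```

In this model a direct mapped cache corresponds to `lines_per_set=1` and a fully associative cache to `lines_per_set=num_lines`. In the example, addresses 0 and 64 collide in the direct mapped cache but coexist in the two-way cache.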

Finally, for advanced practical assignments, when pipeline and superscalar characteristics are studied, the student has to select parallel execution mode. In this mode, the simulated processor implements the MC88110 four-stage pipeline, with the following characteristics:

• the Decode and Write-back stages require one clock cycle, while Fetch and Execution/Memory stage latencies depend on whether there is a hit or a miss in the instruction and data caches, as well as on the latency of the functional unit involved;

• the processor can issue two instructions per clock cycle;

• most instructions are issued in the program order, but they can complete in a different order because of the different latencies of the eight functional units simulated [16];

• the processor implements a partial out-of-order issue model for branch and store instructions that can be issued even if their operands are not available;

• the processor uses static branch prediction and delayed branches. The simulator also implements the MC88110 history buffer to restore the processor state when there is a branch misprediction or an exception.
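The combination of in-order, two-way issue with out-of-order completion can be made concrete with a deliberately simplified Python sketch. It is not a model of the MC88110: the function name `completion_order`, the instruction names, and the latencies are hypothetical, and hazards, cache misses, and functional unit conflicts are ignored.

```python
def completion_order(instructions, issue_width=2):
    """Return instruction names in the order they finish executing.

    `instructions` is a list of (name, execute_cycles) in program order.
    Instructions issue in order, `issue_width` per cycle, with no stalls.
    """
    finished = []
    for position, (name, cycles) in enumerate(instructions):
        issue_cycle = position // issue_width
        # Ties are broken by program order (position).
        finished.append((issue_cycle + cycles, position, name))
    return [name for _, _, name in sorted(finished)]


# Hypothetical latencies: a long divide followed by shorter operations.
program = [("div", 5), ("add", 1), ("mul", 3), ("sub", 1)]
print(completion_order(program))
```

Even though `div` is issued first, it finishes last, because the shorter instructions issued with it and after it complete earlier.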

C. Execution and Debugging

The runtime environment depends on the selected configuration, as well as on the practical assignment, although there are some common characteristics:

• Program execution can be carried out all at once or step by step. In serial mode, a step involves a single instruction execution. When parallel execution is selected, a step corresponds to the operations carried out during a clock cycle.

• Debugging functions allow the student to disassemble the binary program from a particular code address, and to set or delete breakpoints. Fig. 4 shows the windows used for debugging. The Disassemble window can be used to set or delete breakpoints while viewing the program code. The breakpoints window lists the existing breakpoints.

• Students can visualize the main memory, as well as modify main memory locations and register contents, as shown in Fig. 5.

Fig. 4. Disassembling the object program.

Fig. 5. Displaying and modifying main memory and registers.

For beginners’ practical exercises, the simulator displays the processor state at the end of an instruction execution (or at the end of the program) in the simulator main window. The simulator also displays changes in the main memory, when the corresponding window is being displayed.

Fig. 6. Displaying cache memory contents and statistics.

For simulation exercises on cache memory, the student can observe the activities of the caches when an instruction is executed (or the cache contents and statistics at the end of the program) through the windows shown in Fig. 6.

Unlike other simulators that show tag and line information or tag and set information in the cache window, this simulator presents only two fields: the complete main memory address, and a value if the memory address content is present in the cache; otherwise this field is left blank. The main goal is to enable students to observe whether some particular information is present in the cache. In this way, for write-back caches, students can see that the cache and main memory can contain two different values for the same location.

The cache statistics window provides information about instruction and data accesses, read and write accesses, misses, hit ratio, and the number of dirty lines replaced for write-back caches.

Finally, for pipeline and superscalar exercises, the evolution of instructions through pipeline stages, the history buffer contents, and some pipeline statistics can be visualized at every clock cycle (Fig. 7).

The pipeline window shows the instruction addresses and the assembly code. Stalled instructions, due to hazards or cache misses, are shown in a different color.

As branch prediction is made at the Decode stage, instructions on the predicted path are tagged as Conditional (instructions at addresses 20028 and 20032). When the condition value is computed at the Execution stage, the prediction is checked. If the prediction was right, the Conditional tags are removed, and execution continues on the predicted path. Otherwise these instructions are aborted.

At the bottom of the pipeline window is the history buffer, which allows instructions to be retired in order, to preserve sequential consistency. Issued instructions are stored in this buffer in the program order, together with their state (completed, not executed or exception detected) and the previous value of their destination register, in order to restore the state prior to their execution when there is a misprediction or an exception.


Fig. 7. Displaying the p88110 pipeline and the history buffer.

If the instruction that reaches the head of the history buffer is a branch whose prediction failed, all the tagged instructions are removed and the values of their destination registers are restored to those saved in the history buffer.
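The recovery mechanism described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the p88110 or MC88110 implementation: the class `HistoryBuffer`, its method names, and the dictionary-based register file are all hypothetical, and the rollback is triggered directly rather than when the branch reaches the head of the buffer.

```python
class HistoryBuffer:
    """Toy history buffer: entries are kept in program order, oldest first."""

    def __init__(self, registers):
        self.registers = registers        # dict: register name -> value
        self.entries = []

    def issue(self, name, dest=None):
        """Record an issued instruction and the old value of its destination."""
        entry = {"name": name, "dest": dest,
                 "old": self.registers.get(dest), "state": "not executed"}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None):
        """Write the result (if any) and mark the instruction as completed."""
        if entry["dest"] is not None:
            self.registers[entry["dest"]] = value
        entry["state"] = "completed"

    def retire(self):
        """Remove completed instructions from the head, in program order."""
        retired = []
        while self.entries and self.entries[0]["state"] == "completed":
            retired.append(self.entries.pop(0)["name"])
        return retired

    def rollback(self, branch):
        """Undo every instruction issued after `branch`, youngest first."""
        while self.entries and self.entries[-1] is not branch:
            younger = self.entries.pop()
            if younger["dest"] is not None and younger["state"] == "completed":
                self.registers[younger["dest"]] = younger["old"]


regs = {"r2": 10}
hb = HistoryBuffer(regs)
branch = hb.issue("bcnd")                 # predicted branch, no destination
first = hb.issue("add", dest="r2")
hb.complete(first, 11)
second = hb.issue("sub", dest="r2")
hb.complete(second, 12)
hb.rollback(branch)                       # misprediction detected
print(regs["r2"], len(hb.entries))        # prints "10 1"
```

Restoring the youngest instruction first matters when several aborted instructions wrote the same register: each restore exposes the value saved by the next older one, so the register ends with the value it had before the branch.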

The pipeline statistics window shows the data dependencies found during the program execution, as well as the executed conditional branches, the mispredicted branches, and the prediction hit ratio.

III. THE LABORATORY ASSIGNMENTS

As mentioned above, the simulator is being used in the basic Computer Organization course offered in the second year of the Computer Engineering degree, and in the Computer Architecture course offered in the third year, and will be used for the optional ILP and Superscalar Architectures course, offered in the fourth year.

This section describes a selection of laboratory experiments for the second and third year, assignments A and B respectively, and a proposal for the fourth year, assignment C, which is currently carried out using two different simulators: DLXide for pipeline basics and DLXview [10] for out-of-order execution using the Tomasulo algorithm.

A. Assembly Language Programming Assignment

In this assignment, instead of proposing several simple and independent assembly language programs, which is a common practice in other universities, the implementation of a relatively complex problem is proposed.

An example of the problems proposed in this assignment consists of applying a filter to an image represented as an array of unsigned bytes (pixels). The filter is based on a recursive operation using a filter kernel, composed of a small square matrix of fractional numbers represented as an ASCII string.

The program must be structured as a main program with several simple subroutines to make program development and debugging easier. This structure also allows the automatic evaluation tool [13] to give the student a precise description of any errors present in each subroutine. Examples of these subroutines include: converting an ASCII character string into the corresponding floating point number, comparing the original image with the filtered one, or applying the filter to a specific pixel of the image.

The main program has to initialize the user stack, use the stack to store the arguments that must be passed to the different subroutines, call them, and collect the returned results. Both by reference and by value parameter passing methods have to be used, depending on the type of parameter.

Since the MC88110 has neither a special stack pointer register nor push and pop machine instructions, the student has to manage the stack using a general purpose register (e.g., r30) and implement these instructions through macros.

Also, the only branch to subroutine instructions available in the MC88110 (jsr and bsr) save the return address in a register (r1), which means that the student has to save this register onto the stack when nested or recursive subroutine calls are needed.

Finally, the student has to implement the stack frame for each subroutine to reserve memory for local variables. To implement this structure two general purpose registers are used: r30 as the stack pointer and r31 as the frame pointer.
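The stack discipline that students implement with macros can be modeled in a few lines of Python. This is a toy model, not assignment code: the class `Machine`, the initial stack address, and the downward growth of the stack in 4-byte steps are assumptions made for illustration. Only the roles of r30 (stack pointer) and r1 (return address) come from the text.

```python
class Machine:
    """Toy register file and memory for illustrating push and pop."""

    def __init__(self, stack_top=0x9000):     # hypothetical initial address
        self.reg = {"r1": 0, "r30": stack_top}
        self.mem = {}

    def push(self, register):
        # Assumed convention: decrement the stack pointer, then store the word.
        self.reg["r30"] -= 4
        self.mem[self.reg["r30"]] = self.reg[register]

    def pop(self, register):
        # Load the word at the top of the stack, then increment the pointer.
        self.reg[register] = self.mem[self.reg["r30"]]
        self.reg["r30"] += 4


m = Machine()
m.reg["r1"] = 0x1234      # return address of the outer call
m.push("r1")              # saved before a nested call
m.reg["r1"] = 0x5678      # the nested call overwrites r1
m.pop("r1")               # restored before returning
print(hex(m.reg["r1"]), hex(m.reg["r30"]))
```

The example shows why r1 must be saved: without the push and pop pair, the nested call would destroy the outer return address.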

With this practical assignment, the student gains hands-on experience with the main topics presented in Computer Organization classroom lectures, as follows:

• the computer’s instruction set, the main addressing modes, and how to use these to implement a program operating on scalar and array variables;

• managing the program stack to support simple and recursive procedure calls at the assembly level;

• operating with variables represented in different notations (ASCII, floating point, unsigned and two’s complement integer), and dealing with format conversion problems and possible rounding problems.

B. Cache Memory Assignment

In this assignment, students are provided with several simple but highly educational programs in assembly language. The corresponding version in C is also provided as comments on the program head, allowing the students to identify the correspondence between high level language constructs and assembly level instructions.

Due to the large number of cache parameters that interact, affecting the system performance, this assignment is scheduled in several stages.

First, students have to execute the programs with several data cache configurations with the same line size and write policy, four words per block and write-back caches, but with different capacity and organization policies. Then, they have to fill out a table like that shown in Table I, resulting from the execution of a program segment that can be expressed in C as follows:


TABLE I. RESULTS OBTAINED FOR WRITE-BACK CACHES WITH 64 AND 128 LINES

The left-most column shows the number of lines and the organization policy of the cache memory: dir for direct mapped, ac2 for two way set associative, and as for fully associative. Executed instructions, memory references (Refs), cycles, misses, and hit ratio (Hr) are provided by the simulator. Using this information students have to determine which accesses cause cache misses. Then they have to classify these into compulsory, capacity, or conflict categories, and compute the percentage of the total number of misses that each category represents. These percentages are shown in parentheses in Table I.

Carrying out this task requires students to calculate manually the main memory blocks where information is stored, as well as the cache blocks where they will be located, and determine the sequence of references made during the program execution. The debugging facilities provided by the simulator can also be used to check when a reference results in a hit or in a miss.
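The classification that students perform by hand can be cross-checked with a short script. The Python sketch below is not part of the simulator or the assessment system; the function names are hypothetical, and it uses one common textbook definition of the three categories. A miss is compulsory if the block was never referenced before; otherwise it is a capacity miss if a fully associative LRU cache of the same size would also miss, and a conflict miss if that cache would hit.

```python
from collections import OrderedDict


def lru_access(lines, key, capacity):
    """Touch `key` in an LRU-ordered dict; return True on a hit."""
    if key in lines:
        lines.move_to_end(key)
        return True
    if len(lines) == capacity:
        lines.popitem(last=False)         # evict the least recently used entry
    lines[key] = True
    return False


def classify_misses(addresses, num_lines, bytes_per_line, lines_per_set):
    """Count compulsory, capacity, and conflict misses for an address trace."""
    num_sets = num_lines // lines_per_set
    sets = [OrderedDict() for _ in range(num_sets)]   # cache under study
    fully = OrderedDict()      # fully associative cache of the same capacity
    seen = set()               # blocks referenced at least once
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for address in addresses:
        block = address // bytes_per_line
        hit = lru_access(sets[block % num_sets], block // num_sets, lines_per_set)
        fully_hit = lru_access(fully, block, num_lines)
        if not hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fully_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts


# Two blocks that collide in a direct mapped cache with four 16-byte lines.
print(classify_misses([0, 64, 0, 64], 4, 16, 1))
```

In the example, the first reference to each block is a compulsory miss, and the two later references are conflict misses, since a fully associative cache of four lines would hold both blocks.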

In the second stage, students have to check the influence on performance of the following parameters.

• Write policy. The same program has to be executed with the same capacity and organization policies but with write-through caches, and students have to compare the results obtained with those from the previous execution (hit ratios and execution time).

• Block size. Students have to check the influence of the block size on compulsory misses and miss penalty by maintaining the cache size and increasing the block size.

• Cache capacity. In this case, the block size is fixed, and students have to decrease the cache capacity (number of blocks) to check the impact on capacity misses and execution time.
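The trade-off explored in this stage is usually summarized by the average access time, computed as the hit time plus the miss ratio multiplied by the miss penalty. The Python sketch below applies this standard formula; the function name and the numeric values are hypothetical and are not measurements from the simulator.

```python
def average_access_time(hit_time, miss_ratio, miss_penalty):
    """Average memory access time in clock cycles."""
    return hit_time + miss_ratio * miss_penalty


# Hypothetical values: doubling the block size halves the miss ratio
# but doubles the miss penalty, so the average access time is unchanged.
small_blocks = average_access_time(hit_time=1, miss_ratio=0.25, miss_penalty=20)
large_blocks = average_access_time(hit_time=1, miss_ratio=0.125, miss_penalty=40)
print(small_blocks, large_blocks)
```

The example shows why the hit ratio alone can be misleading: a configuration with a better hit ratio does not improve performance if its miss penalty grows in proportion.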

Finally, students have to modify the programs provided by applying some software optimizations used in modern compilers, to reduce memory accesses (scalar replacement) and cache misses (loop interchange, loop fusion, loop distribution, loop unrolling, blocking, and merging arrays). The main goal of these techniques is to maximize the use of data loaded into the data cache before they are removed, by improving spatial or temporal locality.
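The benefit of loop interchange can be estimated by counting misses for the two traversal orders of a matrix. The following Python sketch is an illustration under assumptions, not one of the course programs: the matrix size, the row-major layout, the 4-byte word, and the direct mapped cache with eight lines of four words are all hypothetical choices.

```python
def direct_mapped_misses(addresses, num_lines=8, bytes_per_line=16):
    """Count misses of an address trace in a direct mapped cache."""
    lines = [None] * num_lines            # block number held by each line
    misses = 0
    for address in addresses:
        block = address // bytes_per_line
        index = block % num_lines
        if lines[index] != block:
            lines[index] = block
            misses += 1
    return misses


N, WORD = 16, 4                           # 16 x 16 matrix of 4-byte words


def address(i, j):
    """Byte address of element (i, j) in a row-major layout."""
    return (i * N + j) * WORD


# Original loop nest: the inner loop walks down a column (stride of one row).
column_order = [address(i, j) for j in range(N) for i in range(N)]
# After loop interchange: the inner loop walks along a row (stride of one word).
row_order = [address(i, j) for i in range(N) for j in range(N)]

print(direct_mapped_misses(column_order), direct_mapped_misses(row_order))
```

With these parameters every reference misses in column order (256 misses), whereas in row order only the first reference to each four-word block misses (64 misses), because consecutive references fall in the same cache line.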

The optimizations applied depend on the programs provided,which vary from course to course, although they maintain thesame general structure: several loops operating on arrays.

With this assignment students learn how to write programs that make efficient use of cache memories. By executing real-life programs, students gain a better understanding of the following basic concepts of cache memory operation:

• the impact of different cache organizations on performance;

• the major kinds of misses, why they occur, and how they can be reduced;

• that the best measure of cache performance, other than the hit ratio, is the average access time, which considers the hit time and miss penalty;

• the way in which write policy affects the average access time, through comparison of the most common types: write through with no allocation and write back with allocation;

• the impact of block and cache size on performance;

• that in some programs the replacement policy can result in higher miss rates for larger associativity, as shown in Table I.
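The average access time mentioned in the list above is commonly written as the hit time plus the miss rate multiplied by the miss penalty. The symbols and the numeric values below are illustrative assumptions, not measurements obtained with p88110.

```latex
% Average access time: T_hit = hit time, m = miss rate, T_miss = miss penalty
\[
  T_{\text{avg}} = T_{\text{hit}} + m \cdot T_{\text{miss}}
\]
% Assumed example: T_hit = 1 cycle, m = 0.05, T_miss = 20 cycles
\[
  T_{\text{avg}} = 1 + 0.05 \times 20 = 2 \ \text{cycles}
\]
```

In this example a hit ratio of 95% still doubles the average access time relative to the hit time, which is why the hit ratio alone can be a misleading measure.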

C. Pipeline and Superscalar Assignment

As mentioned above, this assignment corresponds to the optional ILP processors course offered in the fourth year. Students who choose this course have already studied the basic pipeline concepts in the Computer Architecture course, offered in the third year, and are therefore qualified to tackle some advanced concepts such as dynamic instruction scheduling techniques, superscalar execution, and branch prediction.

In this assignment, two different kinds of programs are provided, one of them operating on lists, and including a relatively high number of conditional and unconditional branches, and the other performing some matrix operation, with a low number of predictable conditional branches. First, students have to configure the simulator to operate in parallel execution mode, and configure the main memory and cache parameters. Then, they have to execute these programs, and to analyze the following aspects, by observing the flow of instructions through the pipeline stages, as well as the history buffer contents:

• the pipeline stalls produced by cache misses;

• the data, structural, and control hazards that cause pipeline stalls;

• the number of instructions in the Execute stage, i.e., the utilization of the available functional units;

• the instructions that complete out of order, due to the different latencies of functional units;

• the instructions dispatched out of order. In this simulator, branches and stores whose operands are not available do not delay the execution of subsequent instructions, allowing dynamic scheduling for these instructions;

• the system behavior when an exception occurs. The simulator provides a precise exception model by using the history buffer;

• the execution of branch instructions when prediction and delayed branches are used, and the activities and penalties when a branch is taken or not taken;

• the usefulness of the history buffer;

• the average number of instructions executed per cycle (IPC).

Second, students have to optimize the code using some of the previously studied software techniques, like loop unrolling or software pipelining. Then, the code obtained is reordered to avoid structural, data, and control hazards. The new IPC is calculated for the new code and compared with the original.
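Loop unrolling can be sketched at a high level as follows. The example is written in Python rather than in MC88110 assembly, and the dot-product kernel, the unrolling factor of four, and the function names are assumptions made for illustration. The unrolled version uses four independent accumulators, so that on a superscalar processor the four multiply-add chains would not depend on one another, and a cleanup loop handles the leftover elements.

```python
def dot_rolled(a, b):
    """Reference loop: one multiply-add per iteration."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation with the loop body unrolled four times."""
    n = len(a)
    limit = n - n % 4                 # largest multiple of 4 not exceeding n
    s0 = s1 = s2 = s3 = 0             # independent accumulators
    for i in range(0, limit, 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    total = s0 + s1 + s2 + s3
    for i in range(limit, n):         # cleanup loop for the remaining elements
        total += a[i] * b[i]
    return total


a = list(range(10))
b = list(range(10, 20))
print(dot_rolled(a, b), dot_unrolled4(a, b))
```

Both functions return the same value for integer inputs; with floating-point data the different summation order may change the rounding slightly.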

With this practical assignment, the student gets hands-on experience with the main ILP topics:

• the basics of dynamic scheduling and out-of-order execution;

• the hardware elements that support out-of-order execution and in-order completion;

• the difference between completing and retiring an instruction;

• the basics of speculative execution and superscalar processing;

• compiling techniques used in ILP processors;

• the influence of system memory on pipeline and superscalar execution performance.

IV. ASSIGNMENT ASSESSING SYSTEM

The proposed assignments are described in specification documents that are delivered to students. This environment allows students to learn by themselves and to build up their knowledge, as suggested in the constructivist approach [18], [19], while the teacher acts as an advisor who helps the students solve the problems they encounter.

The students have to download the specification document and the graphical simulator. This approach allows the students to work on the project at home, running p88110 on their own computers, because the simulator has been built using widely available software packages. When the students have completed the assignment, the teaching staff has to assess the students’ solutions. This process may be time-consuming if the number of students is greater than a few dozen. This constraint led the department to build a tool, described in detail in [20], that allows an easy interaction between the teaching staff and the students. This tool provides the following facilities.

• The student can access it by using any computer connected to the Internet.

• The student can submit the assignment to the tool. The student’s assignment is stored in the system until it is evaluated.

• Every assignment submitted to the system is evaluated (see below) by the automatic evaluator.

• The evaluation result can be accessed by the students.

• The students can improve their assignments using the feedback provided by the evaluator.

• The students can resubmit their assignments to the system to be evaluated again.

The automatic evaluator is a tool that helps the students and teachers to achieve their objectives, helping students to accomplish the requirements specified by the assignment and making it simpler for teachers to assess students’ projects. The evaluator works on a set of tests built using the same specification document delivered to the students. The automatic evaluator is run at specific times and executes an exhaustive set of tests for every assignment submitted for evaluation. For each test assembly code, the evaluator may append the student code if necessary. The evaluator assembles the new file and executes the test machine code. The results are compared with the expected results. If any difference is found, the student is notified that the test has failed.

Fig. 8. p88110 evaluation tool structure.

The simulator described in the previous sections was modified to provide this functionality. As shown in Fig. 8, the core of the p88110 is used to run the test set, having been modified to allow batch executions and to avoid infinite loops. Each test program runs until it finishes or the timeout is reached; in the latter case, the student is notified via the feedback file.
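The timeout mechanism can be pictured with the following Python sketch. It does not reproduce the p88110 batch mode: the `run_with_timeout` helper, the `step` callback standing for "simulate one more instruction", and the two toy programs are all hypothetical. The point is only that a wall-clock budget guarantees termination even when the student code loops forever.

```python
import time


def run_with_timeout(step, timeout_s):
    """Call step() until it returns False or the time budget is exhausted.

    Returns "finished" if the program ended by itself, "timeout" otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not step():
            return "finished"
    return "timeout"


def make_countdown(n):
    """Toy program that terminates after n steps."""
    state = {"left": n}

    def step():
        state["left"] -= 1
        return state["left"] > 0

    return step


def infinite_loop():
    """Toy program that never terminates."""
    return True


print(run_with_timeout(make_countdown(1000), timeout_s=1.0))
print(run_with_timeout(infinite_loop, timeout_s=0.05))
```

A batch evaluator built along these lines would write the "timeout" outcome to the feedback returned to the student instead of waiting indefinitely.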

The teaching staff has to create the following files in order to build a test.

• The configuration file contains the data set to be analyzed when the test finishes, i.e., the register and memory locations where the results are expected to be found. The statistical data the simulator has to compute are also specified. For example, the cache assignment requires analyzing the number of accesses to data and instruction caches, the number of cycles used to run the program, the hit ratio, etc. Furthermore, the configuration file contains the timeout provided for the test; if the timeout expires, no result is provided.

• The code file contains the main program that calls the student code. Depending on the assignment this file may be empty. For example, for the assembly language programming assignment this file contains the program that calls the students’ subroutines and arranges the students’ results in different registers and memory areas. The cache assignment does not use this file, because the main objective is for the student to optimize the program provided by the teaching staff, and this already includes the calling program.

• The results file contains the expected results for the test. The performance data does not have to be identical to these expected results, that is to say, the evaluator will pass a test if the student’s result is “similar” (within a certain range) to the statistical result included in this file.

• The feedback file contains a description of the test to be included in the report file when a test fails. In this case, the evaluator includes this file and suggests the type of error found, for example an infinite loop, a memory content error or some undesired statistical results to help students to improve their assignments.
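The comparison against the results file can be sketched as follows. The dictionary layout, the names `evaluate_test`, `exact`, and `stats`, and the 10% relative tolerance are assumptions for illustration; the paper specifies neither the file formats nor the width of the accepted range. Functional results are compared exactly here, while performance statistics only need to fall within the tolerance.

```python
import math


def evaluate_test(actual, expected, stats_tolerance=0.10):
    """Return a list of failure messages; an empty list means the test passed."""
    failures = []
    # Functional results (registers, memory words) must match exactly.
    for name, want in expected["exact"].items():
        got = actual["exact"].get(name)
        if got != want:
            failures.append(f"{name}: expected {want}, got {got}")
    # Performance statistics only need to be "similar" (relative tolerance).
    for name, want in expected["stats"].items():
        got = actual["stats"].get(name)
        if got is None or not math.isclose(got, want, rel_tol=stats_tolerance):
            failures.append(f"{name}: expected about {want}, got {got}")
    return failures


# Hypothetical expected results and two hypothetical student runs.
expected = {
    "exact": {"r2": 42, "mem[0x1000]": 7},
    "stats": {"cycles": 1000, "dcache_hit_ratio": 0.90},
}
run_ok = {
    "exact": {"r2": 42, "mem[0x1000]": 7},
    "stats": {"cycles": 1040, "dcache_hit_ratio": 0.88},
}
run_bad = {
    "exact": {"r2": 41, "mem[0x1000]": 7},
    "stats": {"cycles": 1500, "dcache_hit_ratio": 0.88},
}

print(evaluate_test(run_ok, expected))
print(evaluate_test(run_bad, expected))
```

The first run passes because its statistics lie within the tolerance; the second is reported with one functional failure and one statistical failure, which is the kind of information a feedback report could carry.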

When an evaluation has finished, the feedback file contains the tests that did not meet the specifications. The student can retrieve this file by using the front-end of the Assignment Assessing System.

Fig. 9. Computer structure results.

The assessment of the assignment is concluded when the student delivers a report containing the design of the assignment and an explanation of the results obtained. Finally, the students have to take an exam to measure the knowledge level acquired through the assignment.

V. CONCLUSION

This paper has presented p88110, a pedagogical simulator of the MC88110 used since 1996 at the UPM. This tool makes it easy for students to understand some theoretical concepts taught in Computer Organization and Architecture courses, by allowing the student to solve code-based exercises. The main benefits the simulator provides to both students and teachers are as follows.

• The student can discover the internal structure of the processor, i.e., cache operation, pipeline transitions, etc. Conventional laboratory assignments do not try to show these concepts.

• Most simulator parameters are fully configurable. This feature allows the student to change different parameters, to compare the different results, and to identify how some parameters influence performance.

• The “batch mode simulator” allows the teaching staff to assess the students’ assignments. This kind of assessment allows teachers to propose “real-life” laboratory assignments.

Teaching experience has shown that the simulator and the assessment system help the students to pass the theoretical exam. Figs. 9 and 10 show the percentage of students who passed the theoretical exam of Computer Organization and Architecture over the last five academic years. In both figures, the white bar represents the students who pass the theoretical exam and who have previously passed the assignment, and the shaded bar represents the students that do not pass the assignment but do pass the theoretical exam. Fig. 9 shows that the students who pass the assembly language programming assignment are twice as likely to pass the theoretical exam as those who have not passed the assignment. Fig. 10 shows a similar graph for the caches assignment. In this case, if the students have passed the assignment they quadruple their probability of passing the theoretical exam.

The difference between the two assignments is in the difficulty of the theoretical concepts involved. The cache assignment treats more difficult concepts than does the assembly assignment, such as replacement, organization, and write policies, or the kinds of misses (compulsory, capacity, and conflict). These concepts have to be understood by the students to improve their understanding of computer performance.

Fig. 10. Computer architecture results.

These results show that this simulator has many benefits, for students and teachers alike, and that it helps students to understand a computer’s internal structure.

REFERENCES

[1] W. Yurcik, G. Wolfe, and M. Holliday, “A survey of simulators used in computer organization/architecture courses,” in Proc. Summer Comput. Simul. Conf., Orlando, FL, Jul. 2001.

[2] A. Stojkovic, J. Djordjevic, and B. Nikolic, “WASP: A web-based simulator for an educational pipelined processor,” Int. J. Elect. Eng. Educ., vol. 44, no. 3, pp. 197–215, Jul. 2007.

[3] J. Djordjevic, B. Nikolic, and A. Milenkovic, “Flexible web-based educational system for teaching computer architecture and organization,” IEEE Trans. Educ., vol. 48, pp. 264–273, May 2005.

[4] J. Larus, SPIM S20: A MIPS R2000 Simulator, U. Wisconsin-Madison, Comput. Sci. Dept., 1990, Tech. Rep. 996.

[5] K. Vollmar and P. Sanderson, “MARS: An education-oriented MIPS assembly language simulator,” SIGCSE Bull., vol. 38, no. 1, pp. 239–243, 2006.

[6] J. Sahuquillo, N. Tomás, S. Petit, and A. Pont, “SPIM-cache: A pedagogical tool for teaching cache memories through code-based exercises,” IEEE Trans. Educ., vol. 50, pp. 244–250, Aug. 2007.

[7] E. Cordeiro, I. Stefani, T. Soares, and C. Martins, “DCMSIM: Didactic cache memory simulator,” in Proc. 33rd Frontiers in Educ. Conf., 2003, pp. F1C–14.

[8] J. Edler and M. Hill, Dinero IV Trace-Driven Uniprocessor Cache Simulator, 2008 [Online]. Available: http://pages.cs.wisc.edu/markhill/DineroIV

[9] H. Gruenbacher and M. Khosravipour, “WinDLX and MIPSim pipeline simulators for teaching computer architecture,” in Proc. IEEE Symp. Workshop Eng. Comput. Based Syst., Washington, DC, 1996, pp. 412–419.

[10] Y. Zhang and G. Adams-III, “An interactive, visual simulator for the DLX pipeline,” IEEE TCCA Newslett., pp. 9–12, 1997.

[11] A. Misev and M. Gusev, “Visual simulator for ILP dynamic OOO processor,” in Proc. Workshop Comput. Architect. Educ., New York, 2004, pp. 87–92.

[12] C. Smullen and T. Taha, “PSATSim: An interactive graphical superscalar architecture simulator for power and performance analysis,” in Proc. Workshop on Comput. Architec. Education, New York, 2006.

[13] S. Rodríguez, J. Pedraza, A. García, F. Rosales, and R. Méndez, “Computer-assisted assembly language programming laboratory,” Int. J. Elect. Eng. Educ., vol. 44, no. 3, pp. 216–229, Jul. 2007.

[14] S. Rodríguez, M. García, and R. Méndez, “Teaching computer architecture with a new superscalar processor emulator,” in Proc. 4th Ann. SIGCSE/SIGCUE Conf. Innovat. Technol. Comput. Sci. Educ., ITiCSE’99, Cracow, Poland, 1999, pp. 99–102.

[15] K. Diefendorff and M. Allen, “Organization of the Motorola 88110 superscalar RISC microprocessor,” IEEE Micro, vol. 12, no. 2, pp. 40–63, Apr. 1992.

[16] J. Pérez, S. Rodríguez, M. García, and R. Méndez, “The EM88110: Emulating a superscalar processor,” ACM SIGCSE Bull., no. 4, pp. 45–50, Dec. 1997.

[17] S. Rodríguez, J. Zamorano, F. Rosales, A. García, and J. Pedraza, “A framework for lab work management in mass courses. Application to low level input/output without hardware,” Comput. Educ., vol. 48, no. 2, pp. 153–170, Feb. 2007.

[18] L. Malmi, A. Korhonen, and R. Saikkonen, “Experiences in automatic assessment on mass courses and issues for designing virtual courses,” in Proc. 7th Ann. SIGCSE/SIGCUE Conf. Innov. Technol. Comput. Sci. Educ., ITiCSE’02, Aarhus, Denmark, 2002, pp. 55–59.

[19] M. Khalifa and R. Lam, “Web-based learning: Effects on learning process and outcome,” IEEE Trans. Educ., vol. 45, pp. 350–356, Nov. 2002.

[20] A. García, S. Rodríguez, F. Rosales, and J. Pedraza, “Automatic management of laboratory work in mass computer engineering courses,” IEEE Trans. Educ., vol. 48, pp. 89–98, Feb. 2005.

M. Isabel García received the B.S. degree in computer engineering and the Ph.D. degree in computer science from the Technical University of Madrid (UPM), Spain, in 1982 and 1985, respectively.

She is currently an Associate Professor with the Department of Computer Systems Architecture and Technology, UPM. Her teaching interests include computer architecture and instruction level parallelism architectures.

Santiago Rodríguez received the B.S. degree in computer engineering and the Ph.D. degree in computer science from the Technical University of Madrid (UPM), Spain, in 1990 and 1996, respectively.

He is currently an Associate Professor with the Department of Computer Systems Architecture and Technology, UPM. His teaching interests include computer architecture and fault tolerant computers.

Antonio Pérez received the B.S. degree in telecommunication engineering and the Ph.D. degree in computer science from the Technical University of Madrid (UPM), Spain, in 1979 and 1982, respectively.

He is currently a Full Professor with the Department of Computer Systems Architecture and Technology, UPM. His teaching interests include computer architecture, fault tolerant computers, and microprocessor systems design.

Antonio García received the B.S. degree in computer engineering and the Ph.D. degree in computer science from the Technical University of Madrid (UPM), Spain, in 1993 and 2001, respectively.

He is currently an Associate Professor with the Department of Computer Systems Architecture and Technology, UPM. His teaching interests include computer architecture, parallel and distributed systems, and hardware description languages.