Hardware-Software Approaches to In-Circuit Emulation for Embedded Processors
Chung-Fu Kao
National Sun Yat-Sen University
Hsin-Ming Chen
Andes Technology
Ing-Jer Huang
National Sun Yat-Sen University
An in-circuit emulator (ICE) is part of the development environment for a microprocessor- or microcontroller-based system, called a target system. (We use the terms microprocessor and microcontroller interchangeably unless we need to differentiate them.) While retaining the same functionality and physical features as the original microprocessor, the ICE provides extra debug and test mechanisms to support designers in the test, development, debug, and maintenance of target systems’ hardware and software. These mechanisms include single stepping, breakpoint setting and detection, tracing, and internal resource monitoring and modification.
Traditionally, designers have used an ICE mainly when debugging a microprocessor-based system design at the PCB (printed circuit board) level, as Figure 1 shows. (Although test, development, debug, and maintenance are different activities, they involve similar operations. We use the term debug to include all these activities unless there is a need to differentiate them.) To debug the system, designers pull the target microprocessor chip out of its slot on the board and insert the ICE into the slot to act as the target microprocessor.1 The host computer’s software controls the ICE’s operation via a communication channel. After debug is complete, the designer disconnects the ICE and places the original target microprocessor chip back in its slot. In this scenario, the ICE functions only during debug and doesn’t exist in the final product. The ICE’s cost and performance affect the development system but not the final product.
In the SoC era, however, the ICE no longer plays a negligible role. Responding to the needs of higher performance, more functionality, and higher integration levels, manufacturers are permanently embedding an ICE with the microprocessor core in the final product. For example, in microprocessors developed by ARM2 and IBM,3 there is no way to remove the ICE from the chip. ICE performance, cost, power consumption, test and debug support, and hardware-software interfacing have become important considerations in microprocessor-based platform design.
Thus, it has become necessary to comprehensively investigate the effects of embedding an ICE in a microprocessor core. Unfortunately, the design of ICEs has mainly followed an ad hoc approach. Architecture platforms, hardware-software interfaces, and operating methodologies vary widely among ICEs for different microprocessors. Furthermore, many ICEs are proprietary commercial products for which in-depth design information is unavailable. Most available information is in the form of user manuals or application notes, which provide very limited design information. Therefore, it is difficult to perform a fair comparison of on-chip debug approaches and select appropriate approaches for future designs under various application requirements.
In-circuit emulators have become part of the permanent structure of microprocessor cores to support on-chip test and debug activities in highly integrated environments such as SoCs. However, ICE design styles and operation principles are quite diverse. This article presents a taxonomy based on the notions of foreground and background operations and hardware-software implementation alternatives to organize existing in-circuit emulation approaches.
In-Circuit Emulation
0740-7475/08/$25.00 © 2008 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers
Authorized licensed use limited to: National Sun Yat Sen University. Downloaded on September 29, 2009 at 02:19 from IEEE Xplore. Restrictions apply.
In this article, our goal is to demystify ICE designs and their impact on the SoC environment. We classify existing ICE approaches, identify a basic design for each major category, and show how to instantiate it with an ARM7-based microprocessor. Finally, we conduct experiments to quantitatively analyze the hardware, software, and operational features of these on-chip debug approaches and draw conclusions about their applications in embedded-system design.
Classification of in-circuit emulation approaches
We divide in-circuit emulation operations into two modes: background debug mode (BDM) and foreground debug mode (FDM). In BDM, the user program executes normally, except that the ICE is active at the same time to monitor system status for trigger conditions such as a timer timeout, a breakpoint or watchpoint match, single stepping, and a full trace buffer. (Although breakpoints, watchpoints, single stepping, and traces are different activities, they can be implemented with similar basic operations. To simplify our discussion, we focus on the breakpoint activity.) Once a trigger condition occurs, the operation mode switches into FDM, in which the ICE, rather than the user program, takes control of the system.
In FDM, while the user program is halted, the ICE can observe or configure the microprocessor’s internal system status, including memory, registers, and other control or I/O signals. Alternatively, the ICE can communicate with the host computer through a communication channel, either receiving debug commands from the host and executing them or sending the internal system status back to the host. Finally, the ICE can switch the operation mode back to BDM to resume user program execution.
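The mode switching described above can be pictured as a two-state machine. The sketch below is a minimal Python model under simplifying assumptions: the class and method names (`DebugController`, `observe`, `resume`) are hypothetical, the only trigger is a program-counter breakpoint, and executing an instruction is reduced to reporting its address.

```python
from enum import Enum, auto


class Mode(Enum):
    BDM = auto()  # user program runs; ICE monitors trigger conditions
    FDM = auto()  # user program halted; ICE has control


class DebugController:
    """Hypothetical two-mode ICE controller (illustrative names only)."""

    def __init__(self, breakpoints):
        self.mode = Mode.BDM
        self.breakpoints = set(breakpoints)

    def observe(self, pc):
        """Called once per executed user instruction."""
        if self.mode is Mode.BDM and pc in self.breakpoints:
            self.mode = Mode.FDM  # trigger met: ICE takes over the system
        return self.mode

    def resume(self):
        """Host issued a resume command: go back to BDM."""
        self.mode = Mode.BDM


ctl = DebugController(breakpoints={0x20})
for pc in (0x00, 0x04, 0x20):
    ctl.observe(pc)
print(ctl.mode)  # Mode.FDM
ctl.resume()
print(ctl.mode)  # Mode.BDM
```

Richer variants, such as the limited-context FDM mentioned next, would add states to this machine rather than change its basic shape.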
We can refine these two modes into more
sophisticated debug modes. For example, in one
variation of FDM, the user program can continue
execution within a limited and safe context instead of
halting completely, while the ICE communicates with
the host. (Because of space limitations, we don’t go
into such details here.)
Figure 1. Microprocessor-based development system. (PCB: printed circuit board.)
September/October 2008
Both modes can be implemented with either software or hardware. Therefore, we can place all possible in-circuit emulation approaches into the four classes listed in Table 1. The software emulation class uses the all-software approach for both FDM and BDM, and hardware emulation uses the all-hardware approach. Hybrid emulation 1 uses software for FDM and hardware for BDM, and hybrid emulation 2 uses hardware for FDM and software for BDM. The table uses the notations F and B for foreground and background, and S and H for software and hardware. Thus, FSBH indicates a software foreground and a hardware background implementation and represents hybrid emulation 1.
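The F/B and S/H notation of Table 1 amounts to a simple lookup. This hypothetical helper (`classify` is an illustrative name, not part of any tool) maps the implementation choice for each mode to the class label and name used in the text.

```python
def classify(fdm_impl: str, bdm_impl: str) -> tuple[str, str]:
    """Map FDM/BDM implementation choices ('S' or 'H') to a Table 1 class."""
    names = {
        ("S", "S"): "software emulation",
        ("H", "H"): "hardware emulation",
        ("S", "H"): "hybrid emulation 1",
        ("H", "S"): "hybrid emulation 2",
    }
    key = (fdm_impl.upper(), bdm_impl.upper())
    if key not in names:
        raise ValueError("each mode must be 'S' (software) or 'H' (hardware)")
    return f"F{key[0]}B{key[1]}", names[key]


print(classify("S", "H"))  # ('FSBH', 'hybrid emulation 1')
```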
BDM operations
BDM includes two major tasks: detecting trigger conditions while the user program is executing, and suspending the user program and switching to FDM when the conditions are met.
Software BDM approaches
The two basic approaches to BDM software implementation are instrumenting and single stepping. Instrumenting creates a specialized version of the user program. Programmers construct this version by patching special instructions into the target locations of the original user program. Executing these instructions raises a software interrupt that transfers control to an exception handler (also called an interrupt service routine), causing the processor to enter FDM immediately. Alternatively, the instrumented program can perform simple condition checking at the cost of performance overhead. If the conditions are met, the processor enters FDM; otherwise, it resumes user program execution. This pure software approach can be implemented in almost all microprocessors. Its disadvantage is that the instrumented program is different from the original user program; a bug-free instrumented user program doesn’t guarantee a bug-free original user program.
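A minimal sketch of instrumenting, assuming a toy program represented as a list of mnemonic strings and a placeholder `SWI` opcode; real tools patch binary instruction words at memory addresses. It also shows why the instrumented program differs from the original: the replaced instructions must be saved so the debugger can restore them later.

```python
SWI = "SWI"  # placeholder for the software-interrupt instruction


def instrument(program, targets):
    """Return an instrumented copy of program plus the saved originals."""
    patched = list(program)          # the original program is left untouched
    saved = {}
    for addr in targets:
        saved[addr] = patched[addr]  # remember what we overwrite
        patched[addr] = SWI          # executing this will enter FDM
    return patched, saved


def run(program, saved):
    """Toy interpreter: stop at the first SWI (FDM entry) and report it."""
    for addr, instr in enumerate(program):
        if instr == SWI:
            return addr, saved[addr]  # halt location and replaced instruction
    return None                       # program finished without a trigger


prog = ["MOV", "ADD", "LDR", "STR"]
patched, saved = instrument(prog, [2])
print(patched)              # ['MOV', 'ADD', 'SWI', 'STR']
print(run(patched, saved))  # (2, 'LDR')
```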
Table 1. Classification of in-circuit emulation approaches.*
| Approach** | Foreground debug mode (FDM) | Background debug mode (BDM) | Advantages | Disadvantages | Examples |
|---|---|---|---|---|---|
| Software emulation (FSBS) | Software | Software | Flexible, easily modified | Large amount of system memory; longer time to detect breakpoint and return to user program | Motorola HC08 Monitor Mode [4]; Motorola MPC565 [5]; Infineon Tricore [6]; Intel x86 debug instructions [7]; ARM Angel debug monitor [8]; Intel IA-32/64 [7, 9, 10] |
| Hardware emulation (FHBH) | Hardware | Hardware | Real-time breakpoint detection; support for sophisticated breakpoint conditions | Gate count overhead; modification inflexibility | ARM embedded ICE [2]; hardware breakpoint in ARM RealView Debugger [11]; Nexus 5001 [12, 13] |
| Hybrid emulation 1 (FSBH) | Software | Hardware | Similar to FHBH; flexible FDM implementation | Longer time for FDM operations | Intel x86 debug register [7, 14]; Intel IA-32/64 [7, 9, 10] |
| Hybrid emulation 2 (FHBS) | Hardware | Software | Similar to FSBS; smaller supporting software for FDM operations | Higher hardware cost than FSBS | Motorola M68300/M68HC16 [15]; software breakpoint in ARM RealView Debugger [11]; any simple microprocessor core with a JTAG port, boundary scan cells, and an appropriate software interrupt instruction |
* ICE: in-circuit emulator.
** F: foreground; B: background; H: hardware; S: software.
The single-stepping approach can be used in microprocessors, such as Intel’s x86 microprocessors, that support the single-stepping exception.7 This single-stepping mechanism, once enabled, causes an exception to be raised after the execution of each instruction in the user program, and the corresponding exception handler takes control. The exception handler then activates the software monitor to check trigger conditions and decide whether to execute the next instruction in the user program or to switch to FDM. This approach’s advantage is that it achieves debug without instrumenting the user program. On the other hand, debugging with this approach is very slow, so it is infeasible for even a medium-size user program. To avoid this problem, the user program must be instrumented to turn on the single-stepping mechanism only within a small range in the program.
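The cost of the single-stepping approach, and the reason for limiting it to a small range, can be seen in this toy Python model. It assumes each instruction simply adds a number to an accumulator and that the trap is enabled only for indices in `step_range`; the handler-call count stands in for the monitor overhead paid on every stepped instruction. The function name is hypothetical.

```python
def single_step_debug(program, step_range, condition):
    """Toy model of software BDM by single stepping.

    After every instruction whose index lies in step_range, a simulated
    step exception runs the handler, which calls condition(index, state)
    and enters FDM when it returns True.
    Returns (stop_index or None, number_of_handler_invocations).
    """
    state = {"acc": 0}
    handler_calls = 0
    for index, delta in enumerate(program):
        state["acc"] += delta            # execute one user instruction
        if index in step_range:          # trap flag on only inside this range
            handler_calls += 1           # each call costs monitor cycles
            if condition(index, state):
                return index, handler_calls  # trigger met: switch to FDM
    return None, handler_calls


stop, calls = single_step_debug([1, 2, 3, 4, 5, 6], range(2, 5),
                                lambda i, s: s["acc"] >= 10)
print(stop, calls)  # 3 2
```

Widening `step_range` to the whole program makes the handler run after every instruction, which is exactly the slowdown the text warns about.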
In summary, the advantages of the software BDM approaches are that they are applicable to most microprocessors, flexible in design, easy to modify, and require little hardware support. They support a flexible number of breakpoints. In addition, they allow adjustment of the priority among the breakpoint exception and other hardware and software exceptions to protect critical tasks from being disrupted by the debug activity.
On the other hand, software BDM has several drawbacks. First, it takes up exception (interrupt) vector space and precious memory space, which might be limited in the SoC environment. Second, identifying a possible trigger condition takes a significant number of instruction cycles, making software BDM inappropriate for real-time debug. Third, software overhead makes it infeasible to detect complex trigger conditions involving masking, data dependency, and range checking. Finally, software BDM can detect only software logic bugs; it has difficulty detecting hardware and timing-related bugs.
Hardware BDM approaches
The basic hardware support approach provides a
mechanism to control the target microprocessor’s
program execution flow.16 Implementing BDM in
hardware usually involves a hardware comparator to
monitor address and data buses, control signals,
internal states, and I/O signals. The comparator
contains a set of registers that can be programmed
for several trigger conditions. The trigger conditions
can be more sophisticated than those of the software
approach, including masking and data dependency
(equal, not equal, greater than, less than, range, and so
forth), because these are easy to implement in
hardware. Once the trigger conditions are met, the
comparator stops the core clock or raises an exception
to halt the user program and enter FDM.
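As an illustration of the masked relational and range conditions mentioned above, the following Python function models the logic of one comparator channel. In hardware these comparisons run in parallel on every bus cycle; the mask polarity (1 = compare), the function name, and the set of relations are assumptions for this sketch.

```python
import operator

# Relational tests a comparator might offer; the exact set is design specific.
RELATIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def trigger(bus_value, mask, cond, ref=0, lo=0, hi=0):
    """Model one comparator channel.

    mask: 1 bits take part in the comparison (assumed polarity).
    cond: a key of RELATIONS, or "range" for lo <= value <= hi (inclusive).
    """
    value = bus_value & mask
    if cond == "range":
        return (lo & mask) <= value <= (hi & mask)
    return RELATIONS[cond](value, ref & mask)


print(trigger(0x2004, 0xFFFF, "range", lo=0x2000, hi=0x20FF))  # True
print(trigger(0x12AB, 0xFF00, "eq", ref=0x1200))                # True
print(trigger(5, 0xFF, "gt", ref=7))                            # False
```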
Implementing BDM in hardware is a simple concept
but requires careful design. An important issue is
proper timing in halting the microprocessor after the
trigger conditions are met to keep it in a stable, precise
state. Another issue is handling instruction parallelism
such as pipelining and superscalar execution so as to
retain the logical sequence and eliminate false
conditions caused by parallel execution.
The advantages of the hardware approach are that
it allows trigger condition checking in real time, and
the trigger conditions can be sophisticated because
these extra functions take only a few extra gates. In
addition, system status that is not directly accessible by
software can be handled by hardware. Therefore, the
hardware approach can detect hardware, software,
and timing bugs. The disadvantages are hardware
overhead, longer design and verification time, and
inflexibility in modifying the ICE (such as increasing
the number of supported breakpoints) after integration
in the SoC.
FDM operations
FDM consists of three major tasks: accessing and modifying internal system status and configuring BDM, interacting with the host computer, and switching back to BDM and resuming the user program.
Software FDM approaches
The software FDM implementation has the form of
a system service routine or software monitor that
usually resides in the system memory area.16 The
software monitor consists of a command loop that
interacts with the host computer to receive commands
from the user and feed information back to the user
through a communication channel. Upon receiving
the user’s command, the software monitor decodes
the command and calls the corresponding service
subroutine, such as setting breakpoints, accessing
memory, accessing registers, resuming the user
program, single stepping, or tracing.
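The command loop of a software monitor can be sketched as a decode-and-dispatch table. The command names (`bp`, `rd`, `wr`, `reg`, `go`), their text format, and the class name are hypothetical and do not correspond to any particular monitor’s protocol; the list of replies stands in for data sent back over the communication channel.

```python
class SoftMonitor:
    """Toy software monitor: decode host commands, call service routines."""

    def __init__(self, memory_size=256):
        self.memory = bytearray(memory_size)
        self.registers = {f"r{i}": 0 for i in range(16)}  # saved user registers
        self.breakpoints = set()
        self.running = False

    def handle(self, line):
        """Decode one command line and dispatch to its service routine."""
        cmd, *args = line.split()
        handler = getattr(self, f"cmd_{cmd}", None)
        if handler is None:
            return f"error: unknown command {cmd!r}"
        return handler(*args)

    def cmd_bp(self, addr):          # set a breakpoint
        self.breakpoints.add(int(addr, 0))
        return "ok"

    def cmd_rd(self, addr):          # read one memory byte
        return hex(self.memory[int(addr, 0)])

    def cmd_wr(self, addr, value):   # write one memory byte
        self.memory[int(addr, 0)] = int(value, 0) & 0xFF
        return "ok"

    def cmd_reg(self, name):         # read a saved user register
        return hex(self.registers[name])

    def cmd_go(self):                # resume the user program (leave FDM)
        self.running = True
        return "resuming"

    def command_loop(self, host_lines):
        """Process host commands until 'go'; return the replies."""
        replies = []
        for line in host_lines:
            replies.append(self.handle(line))
            if self.running:
                break
        return replies


m = SoftMonitor()
print(m.command_loop(["bp 0x40", "wr 0x10 0x5a", "rd 0x10", "go", "rd 0x11"]))
# ['ok', 'ok', '0x5a', 'resuming']
```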
Although a procedure call can invoke the software
monitor, it is more efficient to invoke it through an
interrupt (or exception) such as a software interrupt.
The exception handler backs up the user program’s
system status, checks the exception’s source, performs
system mode switching (if necessary), and finally calls
the software monitor. Once the service of the software
monitor is complete, the system leaves FDM and
returns to BDM by simply resuming the execution of
the user program.
The main advantages and disadvantages of software FDM are similar to those of software BDM. An additional advantage is the smooth transition software provides between BDM and FDM: entering (or leaving) the software monitor automatically suspends (or restarts) the user program. There is no need to release (or hold) the system clock to activate (or deactivate) the user program, as in the case of hardware FDM.
Hardware FDM approaches
Implementing FDM in hardware usually requires an I/O port specifically dedicated to debug, a set of registers for storing related information, and a debug controller for handling communication with the external world and executing FDM operations. Hardware FDM is independent of the microprocessor core and is thus driven by the test clock, which is different from the core clock. While the system is in FDM, the core clock halts and the hardware FDM is under the test clock’s control. To switch back to BDM, the test clock halts and the core clock resumes.
Although there are many possible hardware FDM
implementations, designs based on standard test
mechanisms make core integration and software
development easier. Such mechanisms include the
IEEE 1149.1 JTAG architecture,17 which provides serial
test access to the chip, and the newer IEEE 1500
architecture, which provides both serial and parallel
access to the chip.18
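Because hardware FDM designs built on IEEE 1149.1 are driven entirely by the TAP controller, it may help to see that controller’s 16-state machine. The table below follows the standard’s state diagram: each entry gives the next state on a TCK rising edge for TMS = 0 and TMS = 1. The Python encoding and the function name `clock_tms` are illustrative.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tms(state, tms_bits):
    """Apply one TCK rising edge per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


print(clock_tms("Run-Test/Idle", [1, 1, 0, 0]))  # Shift-IR
print(clock_tms("Run-Test/Idle", [1, 0, 0]))     # Shift-DR
```

A well-known property of this machine is that five consecutive TMS = 1 clocks reach Test-Logic-Reset from any state, which is how a debugger can resynchronize with the ICE without knowing its current state.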
Hardware FDM has two main advantages. First, the debug circuit is independent of the microprocessor core and thus takes no programming resources from the user program. No exception or service routine is necessary. Second, the test clock can run faster than the core clock to speed up debug operations, because the debug circuit is far simpler than the microprocessor core, which is not active during FDM. The main disadvantages are similar to those of hardware BDM.
FDM communication channels
An important distinction between software and hardware FDM lies in their communication channels. Figure 2 shows a generic block diagram of an SoC with an ICE. The SoC has two communication channels: the external I/O bus connected to the microprocessor’s system (memory) bus, and the external test bus connected to the test access mechanism.
In software, the debug channel can be regarded as a regular I/O port, accessible through memory-mapped I/O addresses or distinct I/O ports, as defined by the instruction set architecture. Therefore, software FDM communicates with the external world through the external I/O bus. The advantage of this approach is its simplicity. The disadvantage is that other SoC components might be blocked from accessing the system bus or might have to share bus use with software FDM. Thus, the approach can slow down both system and debug performance.
Hardware FDM communicates with the external world through the external test bus. This bus is visible only to the test access mechanism, not the microprocessor software. The advantage is that debug access doesn’t interfere with other activities on the system bus. The disadvantage is that additional I/O pins are necessary.
Classification examples
The debugger program for the Motorola 68HC11 evaluation board is an example of software emulation.19 For FDM, it uses a software monitor called Buffalo, which resides at the top of memory. The user can input a set of commands from the keyboard. The program performs a BDM breakpoint through a software interrupt (SWI) instruction patched into the target address. It performs single stepping through a counter (the OC5 timer) that generates an interrupt to halt the user program. The value set in the counter is the exact time required to run through the monitor and execute the next user instruction.
Figure 2. ICE communication channels in an SoC. (DMA: direct memory access; DSP: digital-signal processor.)
Another example of software emulation is the ARM Angel debug monitor.8 Angel is a program that lets developers debug applications running on ARM-based hardware. Angel requires ROM or flash memory to store the debug monitor code, and RAM to store data. A typical Angel system’s two main components are a host debugger and a debug monitor, which communicate through a physical connection such as a serial cable. The host debugger, acting as the FDM, runs on the host computer. The Angel debug monitor, acting as the BDM, runs on the target system. Angel uses its Angel Debug Protocol to communicate between the host and the target systems.
Intel’s x86 microprocessors, such as the IA-32/64, support both software emulation and hybrid emulation 1.7,10 In software emulation, the software FDM resides in the INT1 and INT3 handlers. In BDM, a breakpoint instruction (INT3, opcode 0CCh) patched into the target address causes an INT3 trap and activates the breakpoint. Turning on the trap flag, which causes an INT1 (debug exception) trap after each instruction, achieves single stepping. Alternatively, in hybrid emulation 1, the hardware comparator handles breakpoints. There are four breakpoint registers in hardware. The breakpoint comparison occurs in the linear address space, that is, before the physical address translation.
ARM’s microprocessor ICEs are examples of hardware emulation. A hardware comparator, called the ICEBreaker, serves as the hardware BDM. The supported breakpoint conditions are very sophisticated, including masking, data dependency, chaining, and range check. User program execution halts when a breakpoint is matched. A JTAG port serves as the hardware FDM. There are two scan chains for the microprocessor core’s I/O pins, and one scan chain for configuring the ICEBreaker. The RISCWatch debugger of IBM’s PowerPC microprocessors uses a similar hardware technique.3
ARM’s ICE hardware supports only two breakpoints (which we call BP0 and BP1). To overcome this limitation, ARM’s RealView Debugger, running on the host, uses an interesting technique to combine the hardware and software emulations.11 RealView provides one hardware breakpoint and an unlimited number of software breakpoints. The hardware breakpoint refers to BP0 in hardware. The so-called software breakpoints in RealView are actually accomplished by BP1 in hardware, as opposed to software interrupts in the previously described software emulation method. When a programmer places software breakpoints in the program under debug, RealView replaces the instructions in the corresponding locations with the same specific binary pattern (for example, 0xFFFFFFFF). In addition, RealView configures BP1 in the ICE hardware as a watchpoint, with the binary pattern as the target value under watch. When program execution reaches such locations, the binary pattern is fetched as an instruction from program memory and appears on the data bus. The binary pattern appearing on the data bus triggers the watchpoint and causes the processor to halt accordingly. The host debugger software can then read back the program counter through the JTAG port to determine the halted location. With this technique, classified as hybrid emulation 2, a single breakpoint circuit in hardware can support an unlimited number of software breakpoints.
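The RealView technique can be modeled in a few lines. This is a toy sketch, not RealView’s implementation: the class names are hypothetical, execution only advances the program counter, and a genuine program word equal to the pattern would trigger a false halt (a real tool must choose the pattern accordingly).

```python
PATTERN = 0xFFFFFFFF  # example pattern from the text


class ToyTarget:
    """Program memory plus one data-bus watchpoint (playing BP1's role)."""

    def __init__(self, program):
        self.mem = list(program)
        self.watch_value = None
        self.pc = 0

    def run(self):
        """Fetch from self.pc onward; halt when a fetched word matches."""
        while self.pc < len(self.mem):
            fetched = self.mem[self.pc]      # word driven onto the data bus
            if fetched == self.watch_value:  # watchpoint comparator fires
                return self.pc               # halted; host reads PC via JTAG
            self.pc += 1                     # toy "execution": just advance
        return None                          # ran off the end of the program


class ToyHostDebugger:
    """Hypothetical host: unlimited 'software' breakpoints, one watchpoint."""

    def __init__(self, target):
        self.target = target
        self.saved = {}                      # address -> original instruction
        target.watch_value = PATTERN         # configure the watchpoint once

    def set_breakpoint(self, addr):
        self.saved[addr] = self.target.mem[addr]
        self.target.mem[addr] = PATTERN

    def go(self):
        return self.target.run()

    def resume(self):
        """Step over the halted location and keep running.

        A real debugger would execute the saved original instruction;
        in this toy model that only advances the PC.
        """
        self.target.pc += 1
        return self.target.run()


host = ToyHostDebugger(ToyTarget([0x10, 0x11, 0x12, 0x13, 0x14]))
host.set_breakpoint(1)
host.set_breakpoint(3)
print(host.go(), host.resume(), host.resume())  # 1 3 None
```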
The National Sun Yat-Sen University’s retargetable embedded ICE module is another example of hardware emulation based on the JTAG architecture.20 To make the ICE module retargetable to a wide range of microprocessor architectures, the developers decided that its operations should be controlled only through test access port (TAP) instructions, not through instruction set architecture features such as instructions, system flags, or proprietary configuration registers. Therefore, they defined a TAP instruction set extension and additional hardware for the module’s JTAG architecture.
The Nexus 5001 Forum defined the IEEE-ISTO 5001-2003 debug interface specification to standardize the processor debug interface in embedded systems.12 The standard adopts the hardware emulation approach. It uses the JTAG port to access the internal debug circuit and allows optional extra pins, defined by the designer, for higher debug throughput or more complex control. At least two hardware breakpoints are required to meet the standard. Vendors such as IPextreme provide Nexus 5001-compliant debug modules for microprocessors such as the ARM7 and ARM9, and on-chip bus interfaces such as the Advanced High-Performance Bus (AHB).13
Finally, any microprocessor core with a basic JTAG port, boundary scan cells, and appropriate software interrupt capability is a typical example of hybrid emulation 2. The JTAG-related circuits serve as the hardware FDM, and the software interrupt instruction can be patched into the user program to serve as the software BDM.
Representative ICE designs
We have presented a classification scheme of in-circuit emulation approaches from the hardware and software perspective. However, quantitatively analyzing and comparing such approaches is still difficult because existing designs are implemented on significantly different platforms and for different purposes. Here, we identify a typical design for each class of ICE and show how to implement it on the same ARM7 microprocessor platform, thus allowing fair analysis and comparison.
Software emulation (FSBS)
Figure 3a shows a block diagram of the software emulation scheme for the ARM7 microprocessor. At the right is the ARM7 microprocessor core. External memory is connected to the address and data buses of the microprocessor core. The external memory is conceptually partitioned into three portions: system memory, user memory, and the communication device. Software FDM and software BDM are located in system memory and user memory, respectively. Software FDM is implemented with a software program segment called SoftFDM, activated by the SWI exception handler, which also resides in system memory. Software BDM is the instrumented user program under debug. The communication devices are memory-mapped I/O devices.
Figure 3b shows the memory layout in more detail. The upper part (system memory) contains the table that stores breakpoint information, the pool that preserves the register contents of the user program upon entering FDM, the I/O buffers that hold information while SoftFDM communicates with the host computer, and SoftFDM, which is part of the SWI handler. The instrumented user program resides in the user program memory. A target location in the user program, where the user intends to set a breakpoint, is replaced with the special SWI instruction, which serves as the FDM trigger.
Figure 3. Memory organization of software emulation for the ARM7 microprocessor: block diagram (a) and memory layout (b). (SWI: software interrupt.)
Figure 4 presents the basic structure of the SWI exception handler. Before entering SoftFDM, the SWI exception handler must back up the user register file and read the SWI instruction’s data field to determine the exception service vector. After entering SoftFDM, the program’s first task is to restore the registers polluted by the SWI exception handler. These housekeeping activities constitute software emulation’s major performance overhead. SoftFDM is a command loop that receives commands from the host computer, decodes them, and performs corresponding operations.
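The first steps of the SWI handler can be sketched in Python. The SWI encoding used here (condition in bits 31..28, 0b1111 in bits 27..24, a 24-bit comment field below) is the standard ARM-state format; the service number `SOFTFDM_SWI`, the register dictionary, and the function names are hypothetical, and the restore-and-command-loop part of SoftFDM is reduced to a placeholder.

```python
SOFTFDM_SWI = 0x10  # hypothetical SWI number reserved for entering SoftFDM


def decode_swi(instr: int) -> int:
    """Return the 24-bit comment field of an ARM-state SWI instruction.

    Bits 27..24 of an SWI are 0b1111; bits 31..28 hold the condition code.
    """
    if ((instr >> 24) & 0xF) != 0xF:
        raise ValueError(f"{instr:#010x} is not an SWI instruction")
    return instr & 0x00FFFFFF


def swi_handler(user_regs, instr, services):
    """Toy SWI handler following the order of steps described in the text."""
    backup = dict(user_regs)       # 1. back up the user register file
    number = decode_swi(instr)     # 2. read the SWI data (comment) field
    service = services[number]     # 3. select the exception service vector
    return service(backup)         # 4. enter SoftFDM (or another service)


def soft_fdm(saved_regs):
    # A real SoftFDM would restore polluted registers, then run its command loop.
    return f"SoftFDM entered, user pc={saved_regs['pc']:#x}"


services = {SOFTFDM_SWI: soft_fdm}
print(swi_handler({"pc": 0x8004, "r0": 1}, 0xEF000010, services))
# SoftFDM entered, user pc=0x8004
```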
Hardware emulation (FHBH)
Figure 5 shows a block diagram of the hardware emulation scheme for the ARM7 microprocessor core. The hardware monitor is the hardware BDM. The JTAG controller and its related components, such as the five I/O pins and the boundary scan chains, serve as the hardware FDM. The hardware monitor is connected to the microprocessor core’s address and data buses and checks the trigger conditions on those buses.
The hardware monitor’s major component is a comparator. Figure 6 shows the circuit diagram of the comparator, which supports two breakpoints. The figure shows the details of one breakpoint. Three kinds of information are necessary to configure a breakpoint: the control, data, and address signals. Each signal is further controlled by two parameters: the mask and the target value. It takes a total of six configuration registers to control a breakpoint setting. These configuration registers allow breakpoint checking to be data dependent and bitwise maskable. The hardware monitor is controlled by the debug-enable I/O pin and the hardware FDM. When a breakpoint is triggered, output signal breakpt is asserted. This disables the microprocessor core clock at the proper cycle to halt user program execution and switch the system into FDM, in which the system is under test clock control.
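The six-register breakpoint described above can be modeled as follows. The sketch assumes a mask bit of 1 means "compare this bit" (some designs use the opposite polarity), treats the control signals as a plain integer with hypothetical bit meanings, and ORs the two breakpoints together to form `breakpt`; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BreakpointConfig:
    """Six configuration registers of one breakpoint: value and mask per signal group."""
    addr_value: int
    addr_mask: int
    data_value: int
    data_mask: int
    ctrl_value: int
    ctrl_mask: int


def match(bus: int, value: int, mask: int) -> bool:
    # Assumed polarity: mask bit 1 = compare this bit, 0 = don't care.
    return (bus & mask) == (value & mask)


def breakpt(bps, addr_bus, data_bus, ctrl_bus, debug_enable=True):
    """Return True if the breakpt output would be asserted in this bus cycle."""
    if not debug_enable:
        return False
    return any(
        match(addr_bus, bp.addr_value, bp.addr_mask)
        and match(data_bus, bp.data_value, bp.data_mask)
        and match(ctrl_bus, bp.ctrl_value, bp.ctrl_mask)
        for bp in bps
    )


# BP0: address 0x8000 with (hypothetical) control bit 0 set; data ignored.
bp0 = BreakpointConfig(0x8000, 0xFFFFFFFF, 0, 0, 0b1, 0b1)
# BP1: data value 0x2A anywhere in 0x9000..0x90FF (low address byte masked).
bp1 = BreakpointConfig(0x9000, 0xFFFFFF00, 0x2A, 0xFF, 0, 0)
print(breakpt([bp0, bp1], 0x8000, 0x1234, 0b1))  # True
print(breakpt([bp0, bp1], 0x90F0, 0x2A, 0b0))    # True
print(breakpt([bp0, bp1], 0x8000, 0x1234, 0b0))  # False
```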
Hardware FDM is implemented with the IEEE 1149.1 JTAG architecture. The serial access imposed by the JTAG standard could cause a performance bottleneck during debug. To improve debug performance, designers can use newer architectures with parallel test access, such as IEEE 1500, for the FDM implementation, at the cost of higher hardware overhead.
Hybrid emulation 1
Hybrid emulation 1 (FSBH) uses the software FDM from software emulation and the hardware BDM from hardware emulation. Figure 7 shows the block diagram and the memory layout for hybrid emulation 1. These are similar to those of software emulation because the FDM is implemented with software. However, a few modifications are worth noting. First, an additional hardware module, the hardware monitor, connects to the memory data and address buses. The hardware monitor implements the hardware BDM. Second, there is no instrumented code in the user program, because the hardware monitor performs breakpoint checking in the background. Third, the hardware monitor’s behavior is similar to memory controllers such as memory management units. Thus, instead of holding the core clock for the microprocessor core as in hardware emulation, the hardware monitor halts the microprocessor core and enters the FDM by generating a data abort signal (using its breakpt output signal). Fourth, the software FDM is in the data abort exception handler, instead of the software interrupt handler as in software emulation. Fifth, an additional field called the hardware monitor registers is allocated in the system memory for configuration of the hardware monitor.
Figure 4. Basic structure of the SWI exception handler.
Hybrid emulation 2
Hybrid emulation 2 (FHBS) uses the hardware FDM from hardware emulation and the software BDM from software emulation. Figure 8 shows the block diagram and the memory layout for hybrid emulation 2. These illustrations are similar to those of hardware emulation because the FDM is implemented with hardware. However, again, we note a few modifications. First, the hardware module connected to the memory data and address buses in the hardware emulation scheme is not necessary here, because the software BDM checks trigger conditions. Second, although hardware performs the major FDM task, this design still needs a small software interface to manage the FDM controller. This is the FDM control routine in the SWI exception handler in Figure 8b. Third, because there is no hardware monitor to hold the core clock, the FDM control routine must hold the core clock by writing to a memory-mapped I/O address (0x0000001C in Figure 8b). While the clock is held, system control can be safely transferred to the FDM controller. The user can reactivate the core clock by properly configuring the related I/O circuit through the FDM controller.
Table 2 summarizes the implementation features of the four emulation approaches for the ARM7 microprocessor core.
Figure 5. Hardware emulation for the ARM7 microprocessor. (BDM: background debug mode; FDM: foreground debug mode; nTRST: test reset; TCK: test clock; TMS: test mode select; TDI: test data in; TDO: test data out.)
Figure 6. The major BDM hardware: the comparator. (BP: breakpoint.)
Quantitative comparisons
We constructed an FPGA-based prototyping system to verify and demonstrate various in-circuit emulation approaches. We built the ICE designs just described with an academic synthesizable microprocessor core that implements the ARM7 instruction set. We downloaded the ICEs to the prototyping system for experiments, synthesized them to standard cells, and analyzed their gate-level features.
Hardware analysis
We synthesized the ICEs with TSMC’s 0.35-micron standard cell library. The ARM7 core requires 46,167 gates. Table 3 presents our quantitative analysis of the ICE hardware for each of the four emulation approaches. Compared with the ARM7 core, the gate count overheads of the FSBS, FHBH, FSBH, and FHBS approaches are 0%, 15%, 11%, and 4%, respectively. The major gate count contributor is the hardware comparator, so the designer must carefully decide how many breakpoints and conditions the hardware supports. Regarding core clock speed, FHBH and FHBS have the same speed because they have the same critical path, and FSBS and FSBH likewise share the same speed. FHBH and FHBS incur a minor clock period overhead of 0.4% relative to FSBS and FSBH (17.10 ns versus 17.03 ns), due to the scan cells on the critical path. The experiment shows that most of the ICE hardware components are not on the critical path and don’t affect system performance.

Figure 7. Hybrid emulation 1 (FSBH) for the ARM7 microprocessor: block diagram (a) and memory layout (b).
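The overhead percentages quoted above follow directly from the gate counts and clock periods in Table 3. The short calculation below reproduces them. The function names are illustrative only.

```python
CORE_GATES = 46_167  # synthesized ARM7-compatible core (TSMC 0.35-micron cells)

ICE_GATES = {"FSBS": 0, "FHBH": 6_992, "FSBH": 4_912, "FHBS": 2_046}          # Table 3
CORE_PERIOD_NS = {"FSBS": 17.03, "FHBH": 17.10, "FSBH": 17.03, "FHBS": 17.10}  # Table 3


def gate_overhead_pct(ice_gates: int, core_gates: int = CORE_GATES) -> float:
    """ICE hardware expressed as a percentage of the bare core's gate count."""
    return 100.0 * ice_gates / core_gates


def period_penalty_pct(with_scan: float, without_scan: float) -> float:
    """Relative core clock period increase caused by scan cells on the critical path."""
    return 100.0 * (with_scan - without_scan) / without_scan


for name, gates in ICE_GATES.items():
    print(f"{name}: {gate_overhead_pct(gates):.1f}% gate overhead")
print(f"scan-cell period penalty: "
      f"{period_penalty_pct(CORE_PERIOD_NS['FHBH'], CORE_PERIOD_NS['FSBS']):.2f}%")
```

The script prints 0.0%, 15.1%, 10.6%, and 4.4% gate overhead, followed by a 0.41% period penalty. These values round to the figures in the text.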
FHBH and FHBS have another clock, the test clock, which drives the hardware while the core clock is halted. The experiment shows that the test clock is about 20% faster than the core clock (a 13.78-ns period versus 17.10 ns), because most of the complex system hardware modules are not used during test. This indicates that the hardware debug mechanism can operate faster than the normal system speed.
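"About 20% faster" can be read in two ways, so it helps to make the arithmetic explicit. This sketch uses only the Table 3 periods. The test clock period is about 19% shorter than the core clock period. Expressed as frequency, that corresponds to a gain of about 24%.

```python
CORE_PERIOD_NS = 17.10  # FHBH/FHBS core clock period, Table 3
TEST_PERIOD_NS = 13.78  # FHBH/FHBS test clock period, Table 3

period_reduction = 1 - TEST_PERIOD_NS / CORE_PERIOD_NS  # fraction by which the period shrinks
core_mhz = 1_000 / CORE_PERIOD_NS                       # a period in ns converts to MHz
test_mhz = 1_000 / TEST_PERIOD_NS
frequency_gain = test_mhz / core_mhz - 1

print(f"test clock period is {100 * period_reduction:.1f}% shorter")
print(f"core {core_mhz:.2f} MHz, test {test_mhz:.2f} MHz (+{100 * frequency_gain:.1f}%)")
```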
Software aspects
The related software modules are written in ARM7 assembly language, then assembled and linked with the ARM SDT v2.5 development tool. The machine code is downloaded into the chip's embedded memory. Table 4 presents our quantitative analysis of the ICE software for the four emulation approaches. Of all the approaches, FHBH needs no software code or resources. FSBH needs the most software code (920 bytes) and consumes one exception vector resource, but it can also debug the original user program. On the other hand, FHBS has the same software debug mechanism as FSBS but requires far less code in the SWI exception handler (44 bytes versus 888 bytes), thus saving precious system memory space.

Figure 8. Hybrid emulation 2 (FHBS) for the ARM7 microprocessor: block diagram (a) and memory layout (b).

Table 2. Implementation features of the four emulation approaches for the ARM7 microprocessor.
Feature | Software emulation (FSBS)* | Hardware emulation (FHBH)* | Hybrid emulation 1 (FSBH)* | Hybrid emulation 2 (FHBS)*
FDM approach | SoftFDM program in software exception handler | JTAG controller | SoftFDM program in data abort exception handler | JTAG controller with simple control routine in software exception handler
BDM approach | Instrumented code (SWI instruction) | Hardware monitor | Same as hardware emulation | Same as software emulation
Mode switch: BDM to FDM | Execute SWI instruction | breakpt signal stops core clock | breakpt signal raises data abort exception | Same as software emulation
Mode switch: FDM to BDM | Exit SWI exception handler | Input and execute JTAG restart instruction to resume core clock | Exit data abort exception handler | Same as hardware emulation
Suspend user program | Jump to SWI exception handler | Hold core clock with breakpt signal of hardware monitor | Jump to data abort exception handler | Hold core clock with SWI exception handler
Communication interface | Memory-mapped I/O | JTAG port | Same as software emulation | Same as hardware emulation
Set breakpoint | Use SWI instruction to patch instruction at breakpoint address | Scan breakpoint values into hardware monitor registers | Use store instruction to store target values in hardware monitor register buffer of system memory | Same as software emulation
Register access | Execute memory store instruction to store register values in output buffer of system memory | Scan in and execute memory store instruction to put register values on memory data bus; then scan out bus through boundary scan chain | Same as software emulation | Same as hardware emulation
Memory access | Execute memory load and store instructions to copy memory content to output buffer of system memory accessible by host | Scan in and execute memory load instruction to read memory content into register file; then use register access to output content to host | Same as software emulation | Same as hardware emulation
Single-step | Patch consecutive instructions | Set new breakpoint on next instruction address or use end-of-instruction signal as breakpoint trigger | Same as hardware emulation | Same as software emulation
* F: foreground; B: background; H: hardware; S: software.

Table 3. Quantitative comparison of ICE hardware in the four emulation approaches for the ARM7 microprocessor.
Feature | Software emulation (FSBS)* | Hardware emulation (FHBH)* | Hybrid emulation 1 (FSBH)* | Hybrid emulation 2 (FHBS)*
Gate count | 0 | 6,992 | 4,912 | 2,046
Core clock period, test clock period (ns) | 17.03, NA | 17.10, 13.78 | 17.03, NA | 17.10, 13.78
* F: foreground; B: background; H: hardware; S: software.
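In FSBS and FHBS, the BDM is instrumented code. Table 2 describes setting a breakpoint as patching the instruction at the breakpoint address with an SWI instruction. That is why these approaches cannot debug the original, unmodified user program (Table 4). The sketch below models the patching on a word-addressed memory dictionary. The encoding 0xEF000000 is the ARM SWI instruction with the AL (always) condition. The comment field value 0x42 and the class name `SoftBreakpoints` are hypothetical choices for illustration.

```python
SWI_AL = 0xEF000000      # ARM SWI encoding with the AL condition: cond = 1110, bits 27-24 = 1111
DEBUG_SWI_NUMBER = 0x42  # hypothetical 24-bit SWI comment field reserved for the debugger


class SoftBreakpoints:
    """Software-BDM breakpoints by instruction patching (illustrative sketch)."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory  # word address -> 32-bit instruction
        self.saved = {}       # breakpoint address -> original instruction

    def set(self, addr: int) -> None:
        if addr % 4:
            raise ValueError("ARM instructions are word aligned")
        if addr not in self.saved:  # patching twice would lose the original word
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = SWI_AL | DEBUG_SWI_NUMBER

    def clear(self, addr: int) -> None:
        self.memory[addr] = self.saved.pop(addr)  # restore the original instruction

    def is_debug_trap(self, instr: int) -> bool:
        # The SWI handler could use this test to tell debugger traps from other SWIs.
        return instr == SWI_AL | DEBUG_SWI_NUMBER


program = {0x8000: 0xE3A00001,  # MOV r0, #1
           0x8004: 0xE2800001}  # ADD r0, r0, #1
bps = SoftBreakpoints(program)
bps.set(0x8004)
print(hex(program[0x8004]))  # 0xef000042
bps.clear(0x8004)
print(hex(program[0x8004]))  # 0xe2800001
```

Each breakpoint costs one saved word in memory. This is consistent with Table 5's "unlimited, subject only to memory size" for the software BDM.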
ICE features
Table 5 presents our quantitative analysis of the ICE features for the four emulation approaches. The first three ICE features in the table relate to ICE capability. FHBH has the most complex and least flexible design, because everything is in hardware. FSBS is the simplest and most flexible, because everything is in software. FSBH and FHBS have medium complexity and are complementary in the flexibility of FDM and BDM: FSBH has a flexible FDM but an inflexible BDM, and FHBS has the reverse. FHBH and FSBH can provide sophisticated breakpoint conditions (instruction or data accesses with masking and data dependency checking), whereas FSBS and FHBS detect only instruction addresses.
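For intuition about the "instruction or data access with masking" conditions that the FHBH and FSBH comparators support, here is a minimal model of two hardware breakpoint units. It is a sketch under assumptions. The mask convention (a 1 bit means "don't care"), the field names, and the example addresses are our own. Data dependency checking is not modeled.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class BreakpointUnit:
    """One hypothetical hardware comparator; mask bits set to 1 are ignored."""
    on_fetch: bool            # True: match instruction fetches; False: data accesses
    addr: int
    addr_mask: int = 0
    data: int = 0
    data_mask: int = MASK32   # by default, ignore the data bus entirely

    def matches(self, is_fetch: bool, addr: int, data: int) -> bool:
        if is_fetch != self.on_fetch:
            return False
        addr_hit = ((addr ^ self.addr) & ~self.addr_mask & MASK32) == 0
        data_hit = ((data ^ self.data) & ~self.data_mask & MASK32) == 0
        return addr_hit and data_hit


# Table 5 lists two units, fixed in number, for FHBH and FSBH.
units = [
    BreakpointUnit(on_fetch=True, addr=0x8004),                      # exact instruction address
    BreakpointUnit(on_fetch=False, addr=0x4000_0000, addr_mask=0xFF,  # any address in a 256-byte window
                   data=0x0000_00FF, data_mask=0xFFFF_FF00),          # whose data has low byte 0xFF
]


def breakpt(is_fetch: bool, addr: int, data: int) -> bool:
    """Models the breakpt signal: asserted if any unit matches the current bus cycle."""
    return any(u.matches(is_fetch, addr, data) for u in units)


print(breakpt(False, 0x4000_0010, 0x1234_56FF))  # True
```

Each extra unit means another set of comparators and registers. This is why, as noted for Table 3, the comparator dominates the ICE gate count.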
The analysis shows that when choosing the appropriate in-circuit emulation approach, the designer must consider the flexibility requirements for FDM and BDM in a specific SoC development environment.
The last five ICE features in Table 5 are the latencies of various ICE operations. We express latency in physical time rather than cycle counts, because the operations in FHBH and FHBS involve both the core clock and the test clock, each running at its own speed. In addition, because some operations involve interactions between the SoC and the external world, the bandwidth of the communication channel also affects the latencies.
Therefore, where appropriate, Table 5 shows two versions of each latency, designated S and P for serial and parallel access. For the hardware FDM, serial access refers to the IEEE 1149.1 JTAG architecture, and parallel access refers to the IEEE 1500 architecture. For the software FDM, serial access and parallel access refer to an external I/O bus with 1-bit and 32-bit bandwidth, respectively. Furthermore, some ICE operations can be broken down into three steps, whose times are listed in parentheses in the table:
• set up the debug command,
• execute the command, and
• send feedback to the user.
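Each latency in Table 5 is the sum of the three steps listed above. The snippet below checks this for a few rows copied from the table. It also prints the serial-to-parallel ratio, which shows how much the narrow channel costs. The row selection is ours. The numbers are the table's.

```python
# (setup, execute, feedback) in ns, copied from Table 5; S = serial, P = parallel.
BREAKDOWNS = {
    ("FSBS", "breakpoint setup"):   {"S": (20828, 119, 0),  "P": (766, 119, 0)},
    ("FHBH", "breakpoint setup"):   {"S": (152, 3638, 0),   "P": (69, 248, 0)},
    ("FSBS", "memory word access"): {"S": (15072, 85, 545), "P": (818, 85, 17)},
    ("FHBH", "memory word access"): {"S": (3541, 51, 2935), "P": (441, 51, 165)},
}
REPORTED_TOTALS = {  # (serial, parallel) totals as printed in Table 5
    ("FSBS", "breakpoint setup"): (20947, 885),
    ("FHBH", "breakpoint setup"): (3790, 317),
    ("FSBS", "memory word access"): (15702, 920),
    ("FHBH", "memory word access"): (6527, 657),
}


def total_ns(steps):
    """An operation's latency is the sum of its setup, execute, and feedback steps."""
    return sum(steps)


for key, modes in BREAKDOWNS.items():
    serial, parallel = total_ns(modes["S"]), total_ns(modes["P"])
    assert (serial, parallel) == REPORTED_TOTALS[key]  # consistency check against Table 5
    print(f"{key[0]} {key[1]}: S={serial} ns, P={parallel} ns, S/P={serial / parallel:.1f}")
```

The ratios come out to roughly 23.7, 12.0, 17.1, and 9.9.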
The analysis of ICE operation latencies shows that FHBH has the shortest latencies, especially for breakpoint detection. There, the latency consists only of the time spent waiting for the current instruction to complete before a break can be taken. This feature makes FHBH the best candidate for real-time debug. The next-best candidate is FHBS. The worst is FSBH. Once the hardware monitor detects a breakpoint and raises the data abort exception, FSBH must wait for the current instruction to complete, preserve the system status, and then transfer control to the software FDM.
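The ranking above can be reproduced from Table 5's breakpoint-detection latencies by sorting on the worst case. As a rough reading, assumed from the 17.10-ns core period in Table 3, FHBH's 17-to-272-ns wait corresponds to about 1 to 16 core cycles. We do not claim which instructions produce the longest wait.

```python
CORE_PERIOD_NS = 17.10  # FHBH core clock period from Table 3

# Breakpoint-detection latency in ns as (best case, worst case), from Table 5.
DETECTION_NS = {
    "FSBS": (732, 732),
    "FHBH": (17, 272),              # only waits for the current instruction to finish
    "FSBH": (17 + 732, 272 + 732),  # same hardware wait, plus 732 ns per Table 5's breakdown
    "FHBS": (633, 633),
}

ranking = sorted(DETECTION_NS, key=lambda name: DETECTION_NS[name][1])  # by worst case
fhbh_cycles = [round(t / CORE_PERIOD_NS) for t in DETECTION_NS["FHBH"]]

print(ranking)      # ['FHBH', 'FHBS', 'FSBS', 'FSBH']
print(fhbh_cycles)  # [1, 16]
```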
Moreover, the latency breakdown indicates that the major contributor to latency is the time spent receiving commands from and sending feedback to the user, rather than the time spent executing the debug command. This observation suggests two strategies for greatly improving ICE performance in an SoC. First, develop a communication channel with high bandwidth and an efficient protocol. Second, store macros of ICE operations on chip, similar to the concept of microprogramming, to avoid sending a large number of primitive operations through the channel.
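A back-of-the-envelope model shows why on-chip macros help. The model rests on several assumptions: a 16-register dump as the example task, exactly one command setup per macro, and unchanged per-register feedback. None of these comes from the study. Only the per-step times are taken from Table 5's serial register-word row.

```python
def primitive_cost_ns(n: int, setup: int, execute: int, feedback: int) -> int:
    """n separate commands: every primitive pays setup, execution, and feedback."""
    return n * (setup + execute + feedback)


def macro_cost_ns(n: int, setup: int, execute: int, feedback: int) -> int:
    """One on-chip macro (assumed): the command crosses the channel once, while
    each primitive still executes and returns its own feedback."""
    return setup + n * (execute + feedback)


# Serial register-word access, (setup, execute, feedback) in ns from Table 5.
for name, steps in {"FSBS": (14714, 34, 545), "FHBH": (2260, 34, 978)}.items():
    before, after = primitive_cost_ns(16, *steps), macro_cost_ns(16, *steps)
    print(f"{name}: 16 registers {before} ns -> {after} ns ({before / after:.1f}x)")
```

Under these assumptions, the gain is largest where command setup dominates (about 10x for FSBS, under 3x for FHBH). This matches the observation that communication, not execution, is the bottleneck for these operations.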
Table 4. Quantitative comparison of ICE software in the four emulation approaches for the ARM7 microprocessor.
Feature | Software emulation (FSBS)* | Hardware emulation (FHBH)* | Hybrid emulation 1 (FSBH)* | Hybrid emulation 2 (FHBS)*
Code size (bytes): equivalent hardware gate count | 888 | NA | 920 | 44
Resources used | One exception vector: SWI | NA | One exception vector: data abort exception | SWI
Debug on original user program | No | Yes | Yes | No
* F: foreground; B: background; H: hardware; S: software.
Application domains
Table 6 gives the suitable SoC application domains
for the four emulation approaches.
Table 5. Quantitative comparison of ICE features in the four emulation approaches for the ARM7 microprocessor.
Feature | Software emulation (FSBS)* | Hardware emulation (FHBH)* | Hybrid emulation 1 (FSBH)* | Hybrid emulation 2 (FHBS)*
Design complexity | Simple, flexible | Complex, inflexible | Medium; flexible FDM but inflexible BDM | Medium; inflexible FDM but flexible BDM
Breakpoint number | Unlimited, subject only to memory size | 2, fixed in number | 2, fixed in number | Unlimited, subject only to memory size
Breakpoint condition | Instruction address only | Instruction or data access with masking, data dependency checking | Instruction or data access with masking, data dependency checking | Instruction address only
Breakpoint setup (ns)** | S: 20947 (20828, 119, 0); P: 885 (766, 119, 0) | S: 3790 (152, 3638, 0); P: 317 (69, 248, 0) | S: 124761 (124251, 510, 0); P: 4393 (3883, 510, 0) | S: 11268 (303, 10965, 0); P: 865 (138, 727, 0)
Latency for breakpoint detection (ns)** | 732 (0, 732, 0) | 17 to 272 (0, 17 to 272, 0); wait for one instruction to complete execution | 749 to 1,004 (17 to 272, 732, 0); wait for one instruction to complete execution | 633 (0, 633, 0)
Latency to resume user program (ns)** | S: 12210 (11444, 766, 0); P: 1124 (358, 766, 0) | S: 6818 (303, 6515, 0); P: 949 (138, 811, 0) | S: 11751 (11444, 307, 0); P: 665 (358, 307, 0) | S: 14797 (303, 14494, 0); P: 1458 (138, 1320, 0)
Latency to access one memory word (ns)** | S: 15702 (15072, 85, 545); P: 920 (818, 85, 17) | S: 6527 (3541, 51, 2935); P: 657 (441, 51, 165) | S: 15702 (15072, 85, 545); P: 920 (818, 85, 17) | S: 6527 (3541, 51, 2935); P: 657 (441, 51, 165)
Latency to access one register word (ns)** | S: 15293 (14714, 34, 545); P: 511 (460, 34, 17) | S: 3272 (2260, 34, 978); P: 337 (248, 34, 55) | S: 15293 (14714, 34, 545); P: 511 (460, 34, 17) | S: 3272 (2260, 34, 978); P: 337 (248, 34, 55)
* F: foreground; B: background; H: hardware; S: software.
** S: serial access; P: parallel access. Numbers in parentheses indicate the time to set up the debug command, the time to execute the command, and the time to send feedback to the user.

Table 6. Suitable SoC application domains of the four emulation approaches for the ARM7 microprocessor.
Software emulation (FSBS)* | Hardware emulation (FHBH)* | Hybrid emulation 1 (FSBH)* | Hybrid emulation 2 (FHBS)*
Functional software debug with rich memory resources and simple I/O structure and timing requirements | Real-time low-level hardware and software debug with complex I/O structure and timing requirements | Low-level hardware and software debug with flexibility in FDM but less real-time response | Functional software debug with limited memory resources
* F: foreground; B: background; H: hardware; S: software.

THE QUANTITATIVE ANALYSES show that FHBH hardware emulation is suitable for SoC designs where the extensive hardware cost is affordable and real-time hardware-software debug is a strict requirement. FSBS software emulation is suitable for SoC designs with rich memory resources and a simple I/O structure, in which functional software debug, not timing behavior, is the primary concern. It can also supplement FHBH hardware by providing extra capacity that the hardware lacks, such as more breakpoints. The FSBH approach is suitable for SoC designs requiring low-level hardware and software debug support and flexibility in the FDM implementation. FHBS is suitable for functional software debug in SoC designs with tight memory budgets. In the future, we’d like to investigate in-circuit emulation problems in superscalar processors, VLIW processors, multiprocessors, and hierarchical-core-based systems.
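The guidance above and in Table 6 can be condensed into a rough decision helper. This is a sketch only. The four boolean inputs and the order in which they are checked are our simplification of the conclusions, not criteria defined in the study.

```python
def suggest_approach(real_time: bool, hw_cost_ok: bool,
                     low_level_debug: bool, memory_rich: bool) -> str:
    """Rough reading of Table 6 and the conclusions; the inputs are a simplification."""
    if real_time and hw_cost_ok:
        return "FHBH"  # real-time low-level HW/SW debug, hardware cost affordable
    if low_level_debug:
        return "FSBH"  # low-level debug with a flexible FDM, slower break response
    # Functional software debug: choose by available system memory.
    return "FSBS" if memory_rich else "FHBS"


print(suggest_approach(real_time=False, hw_cost_ok=False,
                       low_level_debug=False, memory_rich=False))  # FHBS
```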
Acknowledgments
We thank the editors and reviewers for their valuable suggestions for improving this work. This work was partially funded by the National Science Council (Taiwan) under contract 91-2218-E-110-005.
References
1. M. Rafiquzzaman, Microprocessors and Microcomputer Development Systems, Harper & Row, 1984.
2. S. Furber, ARM System-on-Chip Architecture, 2nd ed., Addison-Wesley, 2000.
3. “RISCWatch Debugger,” IBM, http://www-306.ibm.com/chips/products/powerpc/tools/riscwatc.html.
4. K. Kikuchi and J. Suchyta, HCS08 Background Debug Mode versus HC08 Monitor Mode, Motorola application note AN2497/D, June 2003; http://e-www.motorola.com/files/microcontrollers/doc/app_note/AN2497.pdf.
5. MPC565/MPC566 User’s Manual, MPC565UM/D revision 2, Motorola, 2002.
6. Tricore 1 Architecture Manual, v1.3.3, Infineon Technologies, 2002.
7. Pentium Processor Family User’s Manual, vol. 3, Architecture and Programming Manual, Intel, 1994.
8. “Angel Debug Monitor,” ARM, http://www.arm.com/products/DevTools/AngelDebugMonitor.html.
9. “Debugging Support, Embedded Intel386DX Microprocessor,” data sheet, Intel, 1995.
10. Intel 64 and IA-32 Architectures Software Developer’s Manual, vol. 3b, System Programming Guide, 2006; http://www.intel.com/design/processor/manuals/253669.pdf.
11. “RealView Debugger,” ARM, http://www.arm.com/products/DevTools/RVD.html.
12. The Nexus 5001 Forum Standard for a Global Embedded Processor Debug Interface, IEEE Industry Standards and Technology Organization, 23 Dec. 2003, http://www.nexus5001.org/standard.html.
13. “Freescale Nexus 5001 Software Debug Interfaces,” IPextreme, http://www.ip-extreme.com/IP/nexus_5001.html.
14. M. El Shobaki and L. Lindh, “A Hardware and Software Monitor for High-Level System-on-Chip Verification,” Proc. 2nd Int’l Symp. Quality Electronic Design, IEEE CS Press, 2001, pp. 56-61.
15. C. Melear, “Emulation Techniques for Microcontrollers,” Proc. WESCON/97 Conf., IEEE Press, 1997, pp. 532-541.
16. A.B.T. Hopkins and K.D. McDonald-Maier, “Debug Support for Complex Systems on-Chip: A Review,” IEE Proc. Computers and Digital Techniques, vol. 153, no. 4, 3 July 2006, pp. 197-207.
17. IEEE Std. 1149.1-2001, Test Access Port and Boundary-Scan Architecture, IEEE, 2001.
18. IEEE Std. 1500, Embedded Core Test (SECT), IEEE Computer Society, 2005, http://grouper.ieee.org/groups/1500/index.html.
19. W.C. Wray, J.D. Greenfield, and R. Bannatyne, Using Microprocessors and Microcomputers: The Motorola Family, 4th ed., Prentice Hall, 1998.
20. I.-J. Huang et al., “A Retargetable Embedded In-Circuit Emulation Module for Microprocessors,” IEEE Design & Test, vol. 19, no. 4, July/Aug. 2002, pp. 28-38.
Chung-Fu Kao completed the work described in
this article while completing his PhD at National Sun
Yat-Sen University, Taiwan. His research interests
include SoC platform design, design for verification,
and hardware-software coverification. He has a BS in
computer science and information engineering from
Tamkang University, Taiwan, and an MS and a PhD in
computer science and engineering from National Sun
Yat-Sen University.
Hsin-Ming Chen is an advanced engineer at Andes Technology. He completed the work described in this article while completing his MS at National Sun Yat-Sen University. His research interests include embedded-ICE design and DFT. He has a BS in computer information science from National Chin-Yi University of Technology, Taiwan, and an MS in computer science and engineering from National Sun Yat-Sen University.
Ing-Jer Huang is a professor in the Department of Computer Science and Engineering at National Sun Yat-Sen University. His research interests include microprocessors, SoC design, design automation, system software, embedded systems, and hardware-software codesign. He has a BS in electrical engineering from National Taiwan University, and an MS and a PhD in computer engineering from the University of Southern California. He is a member of the IEEE and the ACM.
Direct questions and comments about this article to Ing-Jer Huang, Embedded Systems Laboratory (F5014), Dept. of Computer Science and Engineering, National Sun Yat-Sen University, 70 Lien-Hai Rd, Kaohsiung City, Taiwan 80424 ROC; ijhuang@cse.nsysu.edu.tw.
For further information on this or any other computing topic, please visit our Digital Library at http://www.computer.org/csdl.
IEEE Design & Test of Computers, September/October 2008