Real instruction set architectures Part 2: internal CPU storage, overview of Intel architectures

Big-Endian vs. Little-Endian: quick recap

• In a big-endian machine, the bytes of a data item are stored left to right: the most significant byte sits at the lowest address (the first byte, the “big end”)

• Little-endian is the opposite: the least significant byte sits at the lowest address (the “little end”), and the most significant byte comes last

• Note that, in either case, the bits within each byte are arranged left to right, so a little-endian integer isn’t exactly the same thing as a big-endian integer backwards
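
The two layouts can be checked with a short Python sketch. The value `0x12345678` and the helper name `show` are arbitrary choices for illustration; `int.to_bytes` and `int.from_bytes` are standard Python.

```python
value = 0x12345678  # 32-bit integer; 0x12 is the most significant byte

# Lay the same integer out in memory order under each convention.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

def show(data):
    """Format bytes as hex, lowest address first (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in data)

print(show(big))     # 12 34 56 78
print(show(little))  # 78 56 34 12

# Only the order of the bytes differs; each byte keeps its own bit pattern.
byte_reversed = big[::-1]
print(byte_reversed == little)  # True
print(int.from_bytes(little, byteorder="little") == value)  # True
```

Reversing all 32 bits would give a different pattern again, which is the point of the note above: the swap happens at byte granularity, not bit granularity.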

Byte ordering & data movement

• Computer networks are big endian:

– Little endian machines must convert integers (e.g. network device addresses) to big-endian order before they can be passed over the network

– Little endian machines must also convert integers retrieved from the network to the machine’s native byte order
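
As a rough illustration of that conversion, the sketch below uses Python's standard `struct` module, whose `"!"` prefix selects network (big-endian) byte order on any host. The port number and address are made-up example values, not taken from the slides.

```python
import struct

port = 8080            # hypothetical 16-bit value
address = 0xC0A80001   # hypothetical 32-bit value (192.168.0.1)

# "!" = network byte order: most significant byte goes on the wire first,
# whatever the byte order of the machine running this code.
wire = struct.pack("!HI", port, address)
print(wire.hex())  # 1f90c0a80001

# The receiver converts back to native integers with the same format.
received_port, received_address = struct.unpack("!HI", wire)
print(received_port, hex(received_address))  # 8080 0xc0a80001
```

On a big-endian host the packing step changes nothing; on a little-endian host it swaps the bytes. Using an explicit format keeps the program correct on both.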

Byte ordering & data movement

• Any program that reads/writes file data must be aware of byte ordering

– For example, Windows BMPs were developed on a little endian machine; an application on a big endian machine that reads a BMP must reverse the byte order of multi-byte values

– Photoshop, JPEG, MacPaint, Sun raster files: big endian

– GIF, PC Paintbrush, RTF: little endian
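
A minimal sketch of the BMP case, assuming the standard 14-byte BMP file header (2-byte "BM" signature, 4-byte file size, two 2-byte reserved fields, 4-byte pixel-data offset). The header bytes here are synthetic and the function name `parse_bmp_file_header` is hypothetical.

```python
import struct

# Synthetic header: "BM", file size 70, reserved = 0, pixel data at offset 54.
header = b"BM" + bytes([70, 0, 0, 0]) + bytes(4) + bytes([54, 0, 0, 0])

def parse_bmp_file_header(data):
    """Decode the header with an explicit little-endian format ("<")."""
    signature, size, _reserved1, _reserved2, offset = struct.unpack(
        "<2sIHHI", data[:14]
    )
    return signature, size, offset

print(parse_bmp_file_header(header))  # (b'BM', 70, 54)

# Reading the size field as big endian gives a wildly wrong value,
# which is what a big endian machine sees if it does not swap bytes.
wrong_size = struct.unpack(">I", header[2:6])[0]
print(wrong_size == 70 << 24)  # True
```

Stating the byte order in the format string, rather than relying on the host's native order, lets the same reader run unchanged on either kind of machine.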

Internal CPU storage

• 3 choices for data storage in CPU:

– Stack architecture:

• Use stack to execute instructions; operands stored at top of stack

• No random access

– Accumulator architecture:

• Minimum of internal complexity; short instructions

• One (implicit) operand stored in accumulator

• Involves high volume of memory traffic

– General Purpose register: see next slide
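
To make the first two styles concrete, here is a toy sketch that evaluates Z = (A + B) * C on a stack machine and on an accumulator machine. The mnemonics (`PUSH`, `POP`, `LOAD`, `STORE`, `ADD`, `MUL`), the variable names, and both interpreters are invented for illustration and do not correspond to any real instruction set.

```python
memory = {"A": 2, "B": 3, "C": 4, "Z": 0}  # hypothetical variables

def run_stack(program, mem):
    """Stack machine: arithmetic takes its operands from the top of the stack."""
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(mem[arg[0]])
        elif op == "POP":
            mem[arg[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return mem

def run_accumulator(program, mem):
    """Accumulator machine: one operand is always the implicit accumulator."""
    acc = 0
    for op, name in program:
        if op == "LOAD":
            acc = mem[name]
        elif op == "ADD":
            acc += mem[name]
        elif op == "MUL":
            acc *= mem[name]
        elif op == "STORE":
            mem[name] = acc
    return mem

stack_program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
                 ("PUSH", "C"), ("MUL",), ("POP", "Z")]
acc_program = [("LOAD", "A"), ("ADD", "B"), ("MUL", "C"), ("STORE", "Z")]

stack_result = run_stack(stack_program, dict(memory))
acc_result = run_accumulator(acc_program, dict(memory))
print(stack_result["Z"], acc_result["Z"])  # 20 20
```

Note that every accumulator instruction names a memory location, which is the memory traffic the slide refers to, while the stack machine's arithmetic instructions name no operands at all.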

General Purpose Register (GPR)

• Set (>1) of GPRs

• Most common architecture in use today

• Registers are faster than memory; easier for other parts of the CPU to handle register data (than data from memory)

• As hardware gets cheaper, CPUs tend to include more registers

• GPRs mean longer instructions, because register(s) must be specified; takes more time to fetch/decode longer instructions

Classification of GPR architectures

• Memory to memory (VAX):

– Instruction uses 2-3 operands, stored in memory

– Instructions can perform operations without involving registers

• Register to memory (Intel, Motorola): at least one operand must be in a register

• Load-store (SPARC, MIPS, Alpha, PowerPC): requires data to be moved into registers before any operation is performed
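
The difference between the three classes shows up in how C = A + B is coded. The sketch below simulates only the load-store case; the comments show what the other two might look like. All mnemonics, register numbers, and the `run_load_store` interpreter are hypothetical and not real VAX, Intel, or MIPS syntax.

```python
# Memory to memory:   ADD C, A, B        (one instruction, all operands in memory)
# Register to memory: MOV R1, A
#                     ADD R1, B          (one operand in a register, one in memory)
#                     MOV C, R1
# Load-store:         see the program below (arithmetic on registers only)

def run_load_store(program, mem, num_regs=4):
    """Toy load-store machine: only LOAD and STORE touch memory."""
    regs = [0] * num_regs
    for instr in program:
        op = instr[0]
        if op == "LOAD":        # LOAD rd, addr
            _, rd, addr = instr
            regs[rd] = mem[addr]
        elif op == "STORE":     # STORE rs, addr
            _, rs, addr = instr
            mem[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, rs1, rs2
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] + regs[rs2]
    return mem

program = [
    ("LOAD", 1, "A"),
    ("LOAD", 2, "B"),
    ("ADD", 3, 1, 2),
    ("STORE", 3, "C"),
]

result = run_load_store(program, {"A": 2, "B": 3, "C": 0})
print(result["C"])  # 5
```

The load-store version needs the most instructions, but each one is simple, and the arithmetic instruction never waits on memory.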

Operand number / instruction length

• Instructions can be formatted 2 ways:

– Fixed-length: fast, but wastes space

– Variable-length: more complex to decode, but saves space

• Real-life compromise often involves 2-3 instruction lengths (each format has a fixed length, but the length varies from one format to another)

Some historical architectures

• VAX: Digital’s line of midsize computers, dominant in academia in the 70s and 80s

• Characteristics:

– Variable-length instructions; anywhere from 2 to 5 operands

– Full set of addressing modes: operands can be anywhere; single instruction could take up to 31 bytes

– “High level” instructions: complexity built into instruction set to make programmers’ task easier

– Extensive set of data types at machine level

Some historical architectures

• Motorola’s 68000 series

– Initial Apple Macintosh, early Sun workstations

– Variable-length instructions: 0-2 operands

– Wide variety of addressing modes (but not as many as VAX)

– Could not start an instruction until previous one was completed

Intel architectures

• 8086 chip: introduced in 1978

– Handled 16-bit data, 20-bit addresses

– Could address 1 million bytes of memory (2^20 = 1,048,576)

– CPU split into 2 parts:

• Execution unit: contained GPRs & ALU

• Bus interface unit: included instruction queue, segment registers, instruction pointer (SR & IP are special-purpose registers)

8086 GPRs

• AX: accumulator

• BX: base register: could be used to extend addressing

• CX: count register

• DX: data register

• Some 8086 instructions require use of specific GPR, but in general, could use any of these to hold data

Byte-level addressing

• Each GPR addressable at word or byte level

• For example, AX divided into:

– AH (contains the most significant byte)

– AL (contains the least significant byte)

• Same for BX, CX, DX
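
The word/byte relationship can be modeled with shifts and masks. This is only a sketch of the arithmetic: the functions `get_ah`, `get_al`, `set_ah`, and `set_al` are hypothetical helpers, and the starting value is arbitrary.

```python
def get_ah(ax):
    """High byte of a 16-bit register value."""
    return (ax >> 8) & 0xFF

def get_al(ax):
    """Low byte of a 16-bit register value."""
    return ax & 0xFF

def set_ah(ax, value):
    """Replace the high byte, leaving the low byte untouched."""
    return (ax & 0x00FF) | ((value & 0xFF) << 8)

def set_al(ax, value):
    """Replace the low byte, leaving the high byte untouched."""
    return (ax & 0xFF00) | (value & 0xFF)

ax = 0x1234
print(hex(get_ah(ax)), hex(get_al(ax)))  # 0x12 0x34
print(hex(set_al(ax, 0xFF)))             # 0x12ff
print(hex(set_ah(ax, 0xAB)))             # 0xab34
```

Writing to AL or AH changes the corresponding half of AX, because the byte registers are not separate storage but names for parts of the same 16 bits.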

Other registers in 8086

• Pointer registers:

– SP: stack pointer: used as offset into stack

– BP: base pointer: used to reference parameters pushed on stack; indicates lowest value SP can reach

– IP: holds address of next instruction (like Pep/8’s PC)

• Index registers:

– SI: source index; used as source pointer for string operations

– DI: destination index; used as destination pointer for string operations

– Both SI & DI sometimes used to supplement GPRs

Other registers in 8086

• Status flags register: bits indicate CPU status & results (overflow, carry, negative, etc.)

• Segment registers

– 8086 assembly language programs divided into specialized blocks of code called segments

– Each segment holds specific types of information

8086 Segments

• Code segment: program itself (instructions)

• Data segment: program data

• Stack segment: program’s runtime stack (for procedure calls)

8086 segments

• To access information in a segment, had to specify item’s offset from segment start

• System needed to store segment addresses; these were stored in segment registers:

– CS: code segment

– DS: data segment

– SS: stack segment

– ES: extra segment (used by some string operations to handle memory addressing)

• Addresses specified in segment/offset form: XXX:YYY, where XXX is the value stored in a segment register, and YYY is the offset from the start of the segment
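
The slides do not spell out how the two 16-bit parts become a 20-bit address. On the 8086 the segment value is shifted left 4 bits (multiplied by 16) and the offset is added, with the result wrapping at 20 bits. A minimal sketch, with a hypothetical function name and arbitrary example values:

```python
def physical_address(segment, offset):
    """8086-style address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1234, 0x0010)))  # 0x12350

# Different segment:offset pairs can name the same physical byte.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))  # True

# The highest reachable address is the top of the 1 MB space.
print(hex(physical_address(0xF000, 0xFFFF)))  # 0xfffff
```

This is how 16-bit registers reach a 2^20-byte address space: the segment register picks a 16-byte-aligned starting point, and the offset covers up to 64 KB beyond it.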

Evolution of Intel platform

• Basic 8086 ISA used in many successor chips:

– 8087

• Introduced in 1980

• Added floating-point instructions, 80-bit stack

– 80286

• Introduced 1982

• Could address up to 16 MB of memory


• 80386

– Could address 4 GB of RAM

– 32-bit chip, with 32-bit bus, 32-bit word

– To achieve backward compatibility, Intel kept same basic architecture, register sets

– Used new naming convention in registers: EAX, EBX, etc. were 32-bit (extended) versions of AX, BX, etc.; could still access original 16-bit registers (and their byte components) using original names


• 80486

– Added high-speed cache memory for performance improvement

– Integrated math co-processor

• Pentium™ series

– Intel quit using numbers: couldn’t trademark them

– 32-bit registers, 64-bit bus

– Employed superscalar design, with multiple ALUs; could run instructions in parallel, handling more than one instruction per clock cycle

Pentium™ series

• Pro added branch prediction

• II added MMX

• III added increased support for 3D graphics using floating-point instructions

• P4: 1.4 GHz and higher clock rates; 42 million transistors per CPU; 400 MHz (and faster) system bus; refinements to cache & floating-point operations

Pentium™ series

• Itanium: Intel’s first 64-bit chip

– Employs hardware emulator to maintain backward compatibility with x86

– 4 integer ALUs, 4 floating-point ALUs, 4 cache levels, 128 registers each for integers and floating-point numbers

– Multiple miscellaneous registers for dealing with efficient instruction loading for branching

– Addresses up to 16 GB of RAM

CISC vs. RISC

• CISC: complex instruction set computing

– Employed by Intel up through Pentium Pro

– Pentium II and III used combined CISC/RISC: CISC architecture with RISC core that could translate CISC instructions to RISC

• RISC: reduced instruction set computing

• CISC emphasizes complexity in hardware, simplicity in software; RISC is opposite

• RISC is generally considered superior in performance
