HyperTransport 3.1 Interconnect Technology


Are your company's technical training needs being addressed in the most effective manner?

MindShare has over 25 years' experience in conducting technical training on cutting-edge technologies. We understand the challenges companies have when searching for quality, effective training which reduces the students' time away from work and provides cost-effective alternatives. MindShare offers many flexible solutions to meet those needs. Our courses are taught by highly-skilled, enthusiastic, knowledgeable and experienced instructors. We bring life to knowledge through a wide variety of learning methods and delivery options.

- PCI Express 2.0
- Intel Core 2 Processor Architecture
- AMD Opteron Processor Architecture
- Intel 64 and IA-32 Software Architecture
- Intel PC and Chipset Architecture
- PC Virtualization
- USB 2.0
- Wireless USB
- Serial ATA (SATA)
- Serial Attached SCSI (SAS)
- DDR2/DDR3 DRAM Technology
- PC BIOS Firmware
- High-Speed Design
- Windows Internals and Drivers
- Linux Fundamentals
... and many more.

All courses can be customized to meet your group's needs. Detailed course outlines can be found at www.mindshare.com.

    world-class technical training

    MindShare training courses expand your technical skillset

www.mindshare.com | 4285 Slash Pine Drive, Colorado Springs, CO 80908 USA

    M 1.602.617.1123 O 1.800.633.1440 [email protected]

    Engage MindShare

    Have knowledge that you want to bring to life? MindShare will work with you to Bring Your Knowledge to Life.

Engage us to transform your knowledge and design courses that can be delivered in classroom or virtual classroom settings, create online eLearning modules, or publish a book that you author.

    We are proud to be the preferred training provider at an extensive list of clients that include:

    ADAPTEC AMD AGILENT TECHNOLOGIES APPLE BROADCOM CADENCE CRAY CISCO DELL FREESCALE

    GENERAL DYNAMICS HP IBM KODAK LSI LOGIC MOTOROLA MICROSOFT NASA NATIONAL SEMICONDUCTOR

    NETAPP NOKIA NVIDIA PLX TECHNOLOGY QLOGIC SIEMENS SUN MICROSYSTEMS SYNOPSYS TI UNISYS

    Classroom Training

Invite MindShare to train you in-house, or sign up to attend one of our many public classes held throughout the year and around the world. No more boring classes; the MindShare Experience is sure to keep you engaged.

    Virtual Classroom Training

The majority of our courses are delivered live over the web in an interactive environment with WebEx and a phone bridge. We deliver training cost-effectively across multiple sites and time zones. Imagine being trained in your cubicle or home office and avoiding the hassle of travel. Contact us to attend one of our public virtual classes.

    eLearning Module Training

MindShare is also an eLearning company. Our growing list of interactive eLearning modules includes:

- Intro to Virtualization Technology
- Intro to I/O Virtualization
- Intro to PCI Express 2.0 Updates
- PCI Express 2.0
- USB 2.0
- AMD Opteron Processor Architecture
- Virtualization Technology
...and more

MindShare Press

Purchase our books and eBooks or publish your own content through us. MindShare has authored over 25 books and the list is growing. Let us help make your book project a successful one.

    MindShare Learning Options

MindShare Classroom | MindShare Virtual Classroom | MindShare eLearning | MindShare Press


    HyperTransport 3.1

Interconnect Technology

MINDSHARE, INC.

Brian Holden
Jay Trodden
Don Anderson

MINDSHARE PRESS


The authors and publishers have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

    Visit MindShare, Inc. on the web: www.mindshare.com

    Library of Congress Control Number: 2006934887

    Copyright 2008 by MindShare, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

    Printed in the United States of America.

For information on obtaining permission for use of material from this work, please submit a written request to:

MindShare Press
Attn: Maryanne Daves
4285 Slash Pine Dr.
Colorado Springs, CO 80908
Fax: 719-487-1434

    Set in 10-point Times New Roman by MindShare, Inc.

    ISBN 978-0-9770878-2-2

    First Printing September 2008


1 Introduction to HyperTransport

    This Chapter

This chapter discusses some of the motivations leading to the development of HyperTransport. It reviews some of the attributes that limit the ability of older generation I/O buses to keep pace with the increasing demands of new applications and advances in processor and memory technologies. The chapter then summarizes the key features behind the improved performance of HT over earlier buses.

    The Next Chapter

The next chapter provides an overview of HT architecture, including the primary elements of HT technology and the relationship between them. The chapter describes the general features, capabilities, and limitations of HT and introduces the terminology and concepts necessary for in-depth discussions of the various HT topics in subsequent chapters.

Computers: Three Subsystems

A server, desktop, or notebook computer system is comprised of three major subsystems:

1. Processor. In all systems, the processor device may have more than one CPU core. In servers, there may be more than one processor device.

2. Main DRAM Memory.

3. I/O Subsystem (Southbridge). This connects to all other devices including such things as graphics, mass storage, networks, legacy hardware, and the buses required to support them: PCIe, PCI, PCI-X, AGP, USB, IDE, etc.


    CPUs are Faster Than Their Interconnect

Because of improvements in both the CPU internal execution speed and in the number of CPU cores per device, processor devices are more demanding than ever when they access external resources such as memory and I/O. Each external read or write by a CPU core represents a huge performance hit compared to internal execution.

    Multiple Processors Aggravate The Problem

In systems with multiple processor devices, such as servers, the problem of accessing external devices becomes worse because of competition for access to system DRAM and the single set of I/O resources.

Caches Help DRAM Keep Up

System DRAM memory has kept up reasonably well with the increasing demands of CPUs for a couple of reasons. First, the performance penalty for accessing memory is mitigated by the use of internal processor cache memory. Modern processors generally implement multiple levels of internal caches that run at the full CPU clock rate and are tuned for high hit rates. Each fetch from an internal cache eliminates the need for an external bus cycle to memory.

Second, in cases where an external memory fetch is required, DRAM technology and the use of high-speed, mixed-signal interfaces (e.g. DDR-3, FB-DIMM, etc.) have allowed DRAM to maintain bandwidths comparable with the processor external bus rates.

    Southbridge Interconnect Must Keep Pace

In order to prevent system slow-downs, the connectivity to the Southbridge with its I/O subsystem must keep pace with the processor.

    I/O Accesses Can Slow Down The Processor

Although external DRAM accesses by processors can be minimized through the use of internal caches, there is no way to avoid external bus operations when accessing I/O devices. Many applications require the processor to perform small, inefficient external transactions, which then must find their way through the I/O subsystem to the bus hosting the device.


    Lack of Bandwidth Also Hurts Fast Peripherals

Similarly, bus master I/O devices using slower subsystem buses to reach main memory are also hindered by the lack of bandwidth. Some modern peripheral devices are capable of running much faster than the buses they live on. This presents another system bottleneck. This is a particular problem in cases where applications are running that emphasize latency-critical and/or real-time movement of data through the I/O subsystem over CPU processing.

    Reducing I/O Bottlenecks

Two important schemes have been used to connect I/O devices to main memory. The first is the shared bus approach, as used in PCI and PCI-X. The second involves point-to-point component interconnects, and includes some proprietary buses as well as open architectures such as HyperTransport and PCI-Express. These are described here, along with the advantages and disadvantages of each.

    The Historic Approach

Figure 1-1 on page 12 depicts the classic North-South bridge PCI implementation. Note that the PCI bus acts as both an add-in bus for user peripheral cards and as an interconnect bus to memory for all devices residing on or below it. Even traffic to and from the USB and IDE controllers integrated in the South Bridge must cross the PCI bus to reach main memory.

The topology shown in Figure 1-1 was the traditional way of connecting desktop systems for a number of reasons, including:

1. A shared bus reduces the number of traces on the motherboard to a single set.

2. All of the devices located on the PCI bus are only one bridge interface away from the principal target of their transactions: main DRAM memory.

3. A single, very popular protocol (PCI) can be used for all embedded devices, add-in cards, and chipset components attached to the bus.

Unfortunately, some of the things that made this topology so popular also have made it difficult to fix the I/O bandwidth problems which have become more obvious as processors and memory became faster.


Figure 1-1: Classic PCI North-South Bridge System. (Block diagram: the CPU connects over the FSB to a North Bridge hosting main memory and an AGP graphics accelerator with local video memory; the shared PCI bus below the North Bridge carries the PCI slots and the South Bridge, whose integrated USB, IDE, interrupt controller, sound, and ISA/Super I/O legacy functions all reach main memory through that PCI bus.)


2 HT Architectural Overview

    The Previous Chapter

To understand why HT was developed, it is helpful to review the previous generation of I/O buses and interconnects. The previous chapter reviewed the factors that limit the ability of older generation buses to keep pace with the increasing demands of new applications. Finally, it discussed the key factors of the HT technology that provide its improved capability.

    This Chapter

This chapter provides an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. This chapter summarizes the features, capabilities, and limitations of HT and provides the background information necessary for in-depth discussions of the various HT topics in later chapters.

    The Next Chapter

The next chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups.

    General

HyperTransport is in its essence a hardware interface. While software is involved in configuring, controlling, and utilizing the interface, the bulk of the protocols described in this book are implemented in hardware. The interface has been built into many devices, including all of the microprocessors in AMD's Turion, Athlon, Phenom, and Opteron product lines.


HyperTransport provides a point-to-point interconnect that can be extended to support a wide range of devices. Figure 2-1 on page 19 illustrates a sample HT system with four internal links. HyperTransport provides a high-speed, high-performance, point-to-point dual simplex link for interconnecting IC components on a PCB. Data is transmitted from one device to another across the link.

The width of the link along with the clock frequency at which data is transferred are scalable and auto-negotiable:

- Link width ranges from 2 bits to 32 bits
- Clock frequency ranges from 200 MHz to 3.2 GHz

This scalability allows for a wide range of link performance and potential applications, with bandwidths ranging from 200 MB/s to 51.2 GB/s.

Once again referring to Figure 2-1, the HT bus has been extended in the sample system via a series of devices known as tunnels. A tunnel is an HT device that performs some function and also has a second HT interface that permits the connection of another HT device. The end device is termed a cave, which always represents the termination of a chain of devices that all reside on the same HT bus. Cave devices include a function, but no additional HT connection. The series of devices that comprise an HT bus is sometimes simply referred to as an HT chain.

In Figure 2-1, the tunnel devices provide connections to Infiniband, PCI-Express, and Ethernet. The cave device provides a connection to a Serial Attached SCSI storage device. Devices supporting connections to any of these other technologies can be made as either cave or tunnel devices.

Additional HT buses (i.e. chains) may be implemented in a given system by using an HT-to-HT bridge. In this way, a fabric of HT devices may be implemented. Refer to the section entitled "Extending the Topology" on page 31 for additional detail.

    Transfer Types Supported

HT supports two types of addressing semantics:

1. legacy PC, address-based semantics
2. messaging semantics common to networking environments

Most of this book discusses the address-based semantics common to compatible PC implementations. Message-passing semantics are discussed in "The Direct Packet Feature Set" on page 497.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    12/30

    Chapter 2: HT Architectural Overview

    19

    Address-Based Semantics

The HT bus was initially implemented as a PC compatible solution that by definition uses address-based semantics. This includes both the 40-bit, 1 Terabyte (TB) and the 64-bit, 18 Exabyte (EB) address spaces. The 64-bit address map is the same as the 40-bit address map with 24 extra bits that are defined to be 00_0000h for 40-bit transactions. Transactions specify locations within this address space that are to be read from or written to. The address space is divided into blocks that are allocated for particular functions, listed in Figure 2-2 on page 20. Unless otherwise specified, 40-bit addresses are used in this book.

Read and write request command packets contain a 40-bit address Addr[39:2]. When an address extension doubleword is present, twenty-four additional address bits are provided for a total of 64 address bits.
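The split between the packet's 40-bit address field and the optional extension can be pictured with a small helper. This is only a sketch of the description above; the struct, field, and function names are ours, not the packet layout defined by the specification:

    #include <stdint.h>

    /* Split a 64-bit byte address into the pieces described in the text:
     * Addr[39:2] carried in the request packet, and 24 extension bits
     * (Addr[63:40]) carried in an optional address-extension doubleword.
     * Names are illustrative only. */
    typedef struct {
        uint64_t addr_39_2;    /* 38-bit field: address bits [39:2]  */
        uint32_t addr_63_40;   /* 24-bit field: address bits [63:40] */
        int      needs_extension;
    } ht_req_addr;

    static ht_req_addr ht_split_address(uint64_t byte_addr)
    {
        ht_req_addr a;
        a.addr_39_2  = (byte_addr >> 2) & 0x3FFFFFFFFFull;          /* 38 bits */
        a.addr_63_40 = (uint32_t)((byte_addr >> 40) & 0xFFFFFF);    /* 24 bits */
        /* A 40-bit transaction leaves the upper 24 bits at 00_0000h. */
        a.needs_extension = (a.addr_63_40 != 0);
        return a;
    }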

Figure 2-1: Example HyperTransport System. (Block diagram: two CPUs attach to a memory/graphics hub that acts as the HyperTransport host bridge, with DDR SDRAM and PCIe slots; a chain of HyperTransport tunnel devices provides Infiniband, PCIe/Gigabit Ethernet, and HTX-slot connectivity, and the chain terminates in a HyperTransport cave device serving a SAS/SATA RAID disk array.)


HyperTransport does not contain dedicated I/O address space as PCI does. Instead, CPU I/O space is memory-mapped to a high address range (FD_FC00_0000h to FD_FDFF_FFFFh). Each HyperTransport device is configured at initialization time by the boot ROM configuration software to respond to a range of memory address spaces. The devices are assigned addresses via the base address registers contained in the configuration register header. Note that these registers are based on the PCI configuration registers, and are also mapped to memory space (FD_FE00_0000h to FD_FFFF_FFFFh). Also unlike the PCI bus, there is no dedicated configuration address space.

Additional memory address ranges are used for interrupt signaling and system management messages. Details regarding the use of each range of address space are discussed in subsequent chapters that cover the related topic. For example, a detailed discussion of the configuration address space can be found in Chapter 14, entitled "Device Configuration," on page 329.

Figure 2-2: 40-bit HT Address Map

    00_0000_0000h - FC_FFFF_FFFFh   1012 GB   DRAM / Memory-Mapped I/O
    FD_0000_0000h - FD_F8FF_FFFFh   3984 MB   Interrupt / EOI
    FD_F900_0000h - FD_F90F_FFFFh      1 MB   Legacy PIC IACK
    FD_F910_0000h - FD_F91F_FFFFh      1 MB   System Management
    FD_F920_0000h - FD_FBFF_FFFFh     46 MB   Reserved
    FD_FC00_0000h - FD_FDFF_FFFFh     32 MB   I/O
    FD_FE00_0000h - FD_FFFF_FFFFh     32 MB   Configuration
    FE_0000_0000h - FE_1FFF_FFFFh    512 MB   Extended Configuration
    FE_2000_0000h - FF_FFFF_FFFFh   7680 MB   Reserved
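The ranges in Figure 2-2 translate directly into a simple address decoder. The routine below is an illustrative helper of ours (not specification code) that classifies a 40-bit address against the map:

    #include <stdint.h>

    /* Classify a 40-bit HT address against the Figure 2-2 map.
     * The function name and label strings are illustrative only. */
    static const char *ht_region(uint64_t addr)
    {
        if (addr <= 0xFCFFFFFFFFull) return "DRAM / memory-mapped I/O";
        if (addr <= 0xFDF8FFFFFFull) return "Interrupt / EOI";
        if (addr <= 0xFDF90FFFFFull) return "Legacy PIC IACK";
        if (addr <= 0xFDF91FFFFFull) return "System Management";
        if (addr <= 0xFDFBFFFFFFull) return "Reserved";
        if (addr <= 0xFDFDFFFFFFull) return "I/O";
        if (addr <= 0xFDFFFFFFFFull) return "Configuration";
        if (addr <= 0xFE1FFFFFFFull) return "Extended Configuration";
        return "Reserved";
    }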


    3 Signal Groups

    The Previous Chapter

The previous chapter provided an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. The chapter summarized the features, capabilities, and limitations of HT and provided the background information necessary for in-depth discussions of the various HT topics in later chapters.

This Chapter

This chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

    The Next Chapter

The next chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type is presented.

    Introduction

Signals on each HyperTransport link fall into two groups: high-speed signals associated with the sending and receiving of control and data packets, and miscellaneous low-speed signals required for such things as reset and power management. The low-speed signals are not scalable and employ conventional low-voltage CMOS signaling. The high-speed signal group is scalable in terms of both bus width and clock rate, and each signal is carried by a low-voltage differential signal pair.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    15/30

    HyperTransport 3.1 Interconnect Technology

    54

While device pin count varies with scaling, signal group functions remain the same; the only real difference in signaling between a 32-bit link and a 2-bit link is the number of bit times required to shift information onto the bus.

    The Signal Groups

As illustrated in Figure 3-1 on page 54, the high-speed HyperTransport signals on each link consist of an outbound (transmit) set of signals and an inbound (receive) set of signals for each device; these are routed point-to-point. Having two sets of uni-directional signals allows concurrent traffic. In addition, there is one set of low-speed signals that may be bused to multiple devices.

Figure 3-1: HyperTransport Signal Groups. (Diagram: each link direction carries CAD[p:0], CTL[n:0], and CLK[m:0] differential signal pairs from transmitter (XMT) to receiver (RCV), routed point-to-point between Device A and Device B; the low-speed PWROK, RESET#, and optional LDTSTOP#/LDTREQ# signals are driven by system logic and may be bused to both devices; VHT = 1.2 Volts. The signal counts scale with link width as follows:)

    Link width         2    4    8    16   32
    CLK:        m =    0    0    0    1    3
    CTL (Gen3): n =    0    0    0    1    3
    CTL (Gen1): n =    0    0    0    0    0
    CAD:        p =    1    3    7    15   31
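The width table in Figure 3-1 reduces to one rule: one CLK pair (and, in Gen3, one CTL pair) per group of 8 or fewer CAD pairs, with Gen1 always using a single CTL pair. The helpers below are our own illustration of that rule; the names are not from the specification:

    /* Highest signal index per link direction for a given CAD width.
     * Reproduces the Figure 3-1 table; helper names are illustrative. */
    static int ht_cad_msb(int width)           { return width - 1; }             /* p */
    static int ht_clk_msb(int width)           { return (width + 7) / 8 - 1; }   /* m */
    static int ht_ctl_msb(int width, int gen3) { return gen3 ? (width + 7) / 8 - 1 : 0; } /* n */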


    The High Speed Signals (One Set In Each Direction)

Each high-speed signal is actually a differential signal pair. The CAD (Command/Address/Data) signals carry the HyperTransport packets. When a link transmitter sends packets on the CAD bus, the receive side of the interface uses the CLK signals, also supplied by the transmitter, to latch in packet information during each bit time. The CTL signal or signals are used to delineate between the packet types.

    The CAD Signal Group

The CAD bus is always driven by the transmitter side of a link, and is comprised of signal pairs that carry HyperTransport requests, responses, and data. Each CAD bus may consist of between 2 bits (two differential signal pairs) and 32 bits (thirty-two differential signal pairs). The HyperTransport specification permits the CAD bus width to be different (asymmetrical) for the two directions. To enable the corresponding receiver to make a distinction as to the type of information currently being sent over the CAD bus, the transmitter also drives the CTL signals (see the following description).

    Control Signal (CTL)

This set of signal pairs is driven by the transmitter to identify the information being sent concurrently over the CAD signals. The receiver uses this information to delineate the incoming CAD information.

In the Gen1 protocol, there is one (and only one) CTL signal pair for each link direction, regardless of the width of the CAD bus. If this signal is asserted (high), the transmitter is indicating that it is sending a control packet; if deasserted, the transmitter is sending a data packet. The CTL line may only change on the boundaries of the transfer of the logical 32-bit CAD bus. The receiver uses this restriction to allow framing on the data.

In the Gen3 protocol, there is one CTL signal pair for each set of 8 or fewer CAD lines. These CTL signal(s) are encoded in a manner to transfer four CTL bits per 32 CAD bits. Five of the sixteen available codepoints for these bits are used to delineate the transfer of commands, inserted commands, the CRC for a command with data, the CRC for a command without data, and finally the data itself. The receiver frames on the presence of the used codepoints.
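The Gen1 framing restriction lends itself to a tiny checker: CTL may only change value on boundaries of the logical 32-bit CAD transfer, so on a narrow link one CTL value holds for several bit times. The function below is a sketch of ours (not specification code), assuming one captured CTL sample per bit time:

    #include <stdbool.h>
    #include <stddef.h>

    /* Verify a captured CTL trace obeys the Gen1 rule: for a link 'width' bits
     * wide, one logical 32-bit CAD transfer spans 32/width bit times, and CTL
     * may only change on those boundaries. Illustrative helper only. */
    static bool ht_gen1_ctl_legal(const bool *ctl, size_t bit_times, int width)
    {
        size_t per_dword = 32 / (size_t)width;   /* bit times per 32 CAD bits */
        for (size_t i = 1; i < bit_times; i++)
            if (i % per_dword != 0 && ctl[i] != ctl[i - 1])
                return false;                    /* changed mid-dword */
        return true;
    }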


    Clock Signal(s) (CLK)

Each HyperTransport transmitter sends a differential clock signal along with the CAD and CTL signals to the receiver at the other end of the link. There is one CLK signal pair for each set of 8 or fewer CAD signal pairs. Having a CLK signal pair per set of 8 or fewer CAD signal pairs aids in the partitioning of on-chip clock circuitry as well as easing board layout.

In HyperTransport revisions 1.03 to 1.1, the CLK signal is source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 800 MHz.

In HyperTransport revision 2.0, the CLK signal is also source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 1.4 GHz.

In HyperTransport revision 3.1, the CLK signal is also source synchronous with the CAD and CTL lines, but the phase of CAD and CTL with respect to CLK is allowed to vary within a defined budget over several bit times. The clock speeds range from 200 MHz (default) up to 3.2 GHz.

For all revisions, the jitter on the CLK line is tightly controlled.

    Scaling Hazards: Burden Is On The Transmitter

It is a requirement in HyperTransport that the software in the transmitter side of each link must be aware of the capabilities of its corresponding receiver and avoid the double hazard of a scalable bus: running at a faster clock rate than the receiver can handle or using a wider data path than the receiver supports. Because the link is not a shared bus, the transmitter side of each device is concerned with the capabilities of only one target.

    The Low Speed Signals

    Power OK (PWROK) And Reset (RESET#)

PWROK used with RESET# indicates to HyperTransport devices whether a Cold or Warm Reset is in progress. Which system logic component is responsible for managing the PWROK and RESET# signals is beyond the scope of the HyperTransport specification, but timing and use of the signals are defined. The basic use of the signals includes:


    4 Packet Protocol

    The Previous Chapter

The previous chapter described the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL, and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

    This Chapter

This chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type is presented.

    The Next Chapter

The next chapter describes HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered there include background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

    The Packet-Based Protocol

HyperTransport employs a packet-based protocol in which all information (address, commands, and data) travels in packets which are multiples of four bytes each. Packets are used in link management (e.g. flow control and error reporting) and as building blocks in constructing more complex transactions such as read and write data transfers.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    19/30

    HyperTransport 3.1 Interconnect Technology

    64

It should be noted that, while packet descriptions in this chapter are in terms of bytes, the link's bidirectional interface width (2, 4, 8, 16, or 32 bits) ultimately determines the amount of packet information sent during each bit time on HyperTransport links. There are two bit times per clock period.

Before looking at packet function and use, the following sections describe the mechanics of packet delivery over 2, 4, 8, 16, and 32 bit scalable link interfaces.

    8 Bit Interfaces

For 8-bit interfaces, one byte of packet information may be sent in each bit time. For example, a 4-byte request packet would be sent by the transmitter during four adjacent bit times, least significant byte first, as shown in Figure 4-1 on page 64. Total time to complete a four-byte packet is two clock periods.

Figure 4-1: Four Byte Packet On An 8-Bit Interface. (Diagram: Device A drives CAD[0-7] to Device B; packet bytes 0 through 3 are sent in four consecutive bit times, least significant byte first, spanning two clock periods.)


    Interfaces Narrower Than 8 Bits

For link interfaces that are narrower than 8 bits, the first byte of packet information is shifted out over multiple bit times, least significant bits first. Referring to Figure 4-2 on page 65, a 2-bit interface would require four bit times to transmit each byte of information. After the first byte is sent, subsequent bytes in the packet are shifted out in the same manner. Total time to complete a four-byte packet: eight clock periods.

Figure 4-2: Four Byte Packet On A 2-Bit Interface. (Diagram: Device A drives CAD[0-1] to Device B; each byte is shifted out two bits per bit time, least significant bits first, so the four-byte packet occupies sixteen bit times, or eight clock periods.)
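The 8-bit and 2-bit examples follow from the same arithmetic, which also covers the wider links described next. The helpers below are an illustrative sketch of ours, not spec-defined functions:

    /* Bit times and clock periods needed to transfer 'packet_bytes' over a link
     * that is 'width_bits' wide; HT sends two bit times per clock period. */
    static unsigned ht_bit_times(unsigned packet_bytes, unsigned width_bits)
    {
        return (packet_bytes * 8u) / width_bits;
    }

    static unsigned ht_clock_periods(unsigned packet_bytes, unsigned width_bits)
    {
        return ht_bit_times(packet_bytes, width_bits) / 2u;
    }

    /* A 4-byte packet: 4 bit times / 2 clocks on an 8-bit link,
     * 16 bit times / 8 clocks on a 2-bit link,
     * 2 bit times / 1 clock on a 16-bit link. */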


    Interfaces Wider Than 8 Bits

For 16- or 32-bit interfaces, packet delivery is accelerated by sending multiple bytes of packet information in parallel with each other.

    16 Bit Interfaces

On 16-bit interfaces, two bytes of information may be sent in each bit time. Referring to Figure 4-3 on page 66, note that even-numbered bytes travel on the lower portion of the 16-bit interface, odd-numbered bytes on the upper portion.

Figure 4-3: Four Byte Packet On A 16-Bit Interface. (Diagram: even-numbered packet bytes travel on CAD[0-7] and odd-numbered bytes on CAD[8-15], so bytes 0 through 3 are transferred in two bit times, one clock period.)
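The byte-lane assignment implied by Figure 4-3 generalizes to any interface wider than 8 bits: packet byte i travels on byte lane (i mod number-of-lanes) during bit time (i div number-of-lanes). A small sketch of ours, not a spec-defined function:

    /* For a link 'width_bits' wide (16 or 32), packet byte 'byte_index' travels
     * on byte lane (byte_index mod lanes), i.e. CAD[8*lane+7 : 8*lane], during
     * bit time (byte_index / lanes). Illustrative helper only. */
    typedef struct { unsigned lane; unsigned bit_time; } ht_byte_slot;

    static ht_byte_slot ht_wide_lane(unsigned byte_index, unsigned width_bits)
    {
        unsigned lanes = width_bits / 8u;
        ht_byte_slot s = { byte_index % lanes, byte_index / lanes };
        return s;
    }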


6 I/O Ordering

    The Previous Chapter

The previous chapter describes HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered there include background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

    This Chapter

This chapter describes the ordering rules which apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements that are used when interfacing PCI, PCI-X, PCIe and AGP can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

    The Next Chapter

In the next chapter, examples are presented which apply the packet principles described in the preceding chapter. The examples also entail more complex system transactions than discussed previously, including reads, posted and non-posted writes, and atomic read-modify-write operations.

    The Purpose Of Ordering Rules

Some of the important reasons for enforcing ordering rules on packets moving through HyperTransport include the following:

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    23/30

    HyperTransport 3.1 Interconnect Technology

    126

    Maintain Data Correctness

If transactions are in some way dependent on each other, a method is required to assure that they complete in a deterministic way. For example, if Device A performs a write transaction targeting main memory and then follows it with a read request targeting the same location, what data will the read transaction return? HyperTransport ordering seeks to make such events predictable (deterministic) and to match the intent of the programmer. Note that, compared to a shared bus such as PCI, HyperTransport transaction ordering is complicated somewhat by point-to-point connections which result in target devices on the same chain (logical bus) being at different levels of fabric hierarchy.

    Avoid Deadlocks

Another reason for ordering rules is to prevent cases where the completions of two separate transactions are each dependent on the other completing first. HyperTransport ordering includes a number of rules for deadlock avoidance. Some of the rules are in the specification because of known deadlock hazards associated with other buses to which HyperTransport may interface (e.g. PCI).

    Support Legacy Buses

One of the principal roles of HyperTransport is to serve as a backbone bus which is bridged to other peripheral buses. HyperTransport explicitly supports PCI, PCI-X, and AGP and the ordering requirements of those buses.

    Maximize Performance

Finally, HyperTransport permits devices in the path to the target, and the target itself, some flexibility in reordering packets around each other to enhance performance. When acceptable, relaxed ordering may be enabled by the requester on a per-transaction basis using attribute bits in request and response packets.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    24/30

    Chapter 6: I/O Ordering

    127

    Introduction: Three Types Of Traffic Flow

HyperTransport defines three types of traffic: Programmed I/O (PIO), Direct Memory Access (DMA), and Peer-to-Peer. Figure 6-1 on page 127 depicts the three types of traffic.

1. Programmed I/O traffic originates at the host bridge on behalf of the CPU and targets I/O or memory-mapped I/O in one of the peripherals. These types of transactions are often generated by the CPU to set up peripherals for bus master activity, check status, program configuration space, etc.

2. DMA traffic originates at a bus master peripheral and typically targets main memory. It may also originate from a DMA controller located within the processor device outside of the CPU itself. This traffic is used so that the CPU may be off-loaded from the burden of moving large amounts of data to and from the I/O subsystem. Generally, the CPU uses a few PIO instructions to program the peripheral device with information about a required DMA transfer (transfer size, target address in memory, read or write, etc.), then performs some other task while the DMA transfer is carried out. When the transfer is complete, the DMA device may generate an interrupt message to inform the CPU.

3. Peer-to-Peer traffic is generated by an interior node and targets another interior node. In HyperTransport, direct peer-to-peer traffic is not allowed. As indicated in Figure 6-1 on page 127, the request is issued upstream and must travel to the host bridge. The host bridge examines the address and determines whether the request should be reflected downstream. If the request is non-posted, the response will similarly travel from the target back up to the host bridge and then be reissued to the original requester. (A minimal sketch of this reflection decision follows Figure 6-1 below.)

Figure 6-1: PIO, DMA, And Peer-to-Peer Traffic. (Diagram: three Bus 0 chains below host bridges illustrate the three traffic types: a PIO request flows from the host bridge (UnitID 0) down to a target tunnel device, a DMA request flows from a source tunnel device up toward main memory behind the host bridge, and a peer-to-peer request flows from a source device up to the host bridge, which reissues it downstream to the target.)
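To make the reflection step in item 3 concrete, the sketch below models the decision the host bridge makes for each request arriving from below: if the address belongs to an interior (downstream) device, the request is reissued downstream; otherwise it proceeds toward memory. This is our own illustrative pseudologic, not code or data structures from the specification:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative model of the host bridge's peer-to-peer reflection decision.
     * Types, fields, and function names are invented for this sketch. */
    typedef struct { uint64_t base, limit; } addr_range;

    typedef struct {
        addr_range downstream[8];   /* address ranges owned by interior devices */
        int        count;
    } host_bridge;

    /* True if the request address belongs to a device below the bridge. */
    static bool targets_interior_device(const host_bridge *hb, uint64_t addr)
    {
        for (int i = 0; i < hb->count; i++)
            if (addr >= hb->downstream[i].base && addr <= hb->downstream[i].limit)
                return true;
        return false;
    }

    /* Decide what to do with a request that has travelled up to the bridge:
     * reflect it downstream (peer-to-peer) or forward it toward memory (DMA).
     * For a non-posted peer-to-peer request, the response later retraces the
     * path: target -> host bridge -> original requester. */
    typedef enum { REFLECT_DOWNSTREAM, FORWARD_TO_MEMORY } bridge_action;

    static bridge_action bridge_route_upstream(const host_bridge *hb, uint64_t addr)
    {
        return targets_interior_device(hb, addr) ? REFLECT_DOWNSTREAM
                                                 : FORWARD_TO_MEMORY;
    }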


    The Ordering Rules

HyperTransport packet ordering rules are divided into groups: general rules, rules for upstream I/O ordering, and rules for downstream ordering. Even the peer-to-peer example in Figure 6-1 on page 127 can be broken into two parts: the request moving to the bridge (covered by upstream ordering rules) and the reflection of the request downstream to the peer-to-peer target (covered by downstream I/O ordering rules). Refer to Chapter 23, entitled "I/O Compatibility," on page 525 for a discussion of ordering when packets move between HyperTransport and another protocol (PCI, PCI-X, or AGP).

    General I/O Ordering Limits

    Ordering Covers Targets At Same Hierarchy Level

Ordering rules only apply to the order in which operations are detected by targets at the same level in the HyperTransport fabric hierarchy. Referring to Figure 6-2 on page 128, assume that two peer-to-peer writes targeting devices on two different chains have been performed by the end device in chain 0.

Figure 6-2: Targets At Different Levels In Hierarchy And In Different Chains. (Diagram: an HT host bridge and I/O hub spawn three chains (Chain 0, Chain 1, Chain 2) built from HT-to-SCSI, HT-to-PCI-X, HT-to-GbE, and HT-to-PCI tunnel devices; Request A and Request B are issued from the end device on Chain 0 toward targets on different chains.)


7 Transaction Examples

    The Previous Chapter

The previous chapter describes the ordering rules which apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements that are used when interfacing PCI, PCI-X, PCIe and AGP can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

    This Chapter

In this chapter, examples are presented that apply the packet principles in the preceding chapters and include more complex system transactions not previously discussed. The examples include reads, posted and non-posted writes, and atomic read-modify-write.

    The Next Chapter

HT uses an interrupt signaling scheme similar to PCI's Message Signaled Interrupts. The next chapter defines how HT delivers interrupts to the host bridge via posted memory writes. The chapter also defines an End of Interrupt message and details the mechanism that HT uses for configuring and setting up interrupt transactions (which is different from the PCI-defined mechanisms).

    Transaction Examples: Introduction

The following section describes some transactions which might be seen in HyperTransport. Not all cases are covered, but the ones which are presented are intended to illustrate two important aspects of each transaction:


    Packet Format And Optional Features

Most of the control packet variants have a number of fields used to enable optional features: isochronous vs. standard virtual channels, byte vs. dword data transfers, posted vs. non-posted channels, etc. In each of the examples, the key options are described; in some cases, certain packet fields don't apply to a particular request or response at all. Refer to Chapter 4, entitled "Packet Protocol," on page 63 for low-level, bit-by-bit packet field descriptions.

    General Sequence Of Events

Once the packet fields are determined, the next topic covered in the transaction examples is a general summary of events which must occur in the HyperTransport topology to perform the transaction. The description starts at the agent initiating an information or request control packet and follows it (and possibly a data packet) to the target; if there is a response required, this packet (and any data associated with it) is similarly followed from the target back to the original requester. Packet movement through HyperTransport involves aspects of topics described in more detail in other chapters, including flow control, packet ordering and routing, error handling, etc.

For each of the examples, assume the link interface CAD bus is eight bits in each direction. This simplifies the discussion of packet bytes moving across the connection: each byte of packet content is sent in one bit time. The only difference, when link width is greater or less than 8 bits, is how the packet bytes are shifted out and received. For example, on a 2-bit interface each byte of packet content is shifted onto the link over 4 bit times. For an interface 32 bits wide, four bytes of packet content are sent in parallel with each other in each bit time.


    Example 1: NOP Information Packet

Situation: Device 2 in Figure 7-1 on page 147 is using a NOP packet to inform Device 1 about the availability of two new Posted Request (CMD) entries and one new Response (Data) entry in its receiver flow control buffers.

Figure 7-1: Example 1: NOP Information Packet With Buffer Updates. (Diagram: Device 2 drives the NOP packet to Device 1 over its transmit CAD[0-7], CTL, and CLK signals, updating Device 1's view of Device 2's receiver flow control buffers. The four NOP packet bytes carry:)

    Byte 0: Cmd[5:0] = 000000b (NOP), DisCon = 0b
    Byte 1: PostCmd[1:0] = 10b (2), PostData[1:0] = 00b, Response[1:0] = 00b, ResponseData[1:0] = 01b (1)
    Byte 2: NonPostCmd[1:0] = 00b, NonPostData[1:0] = 00b, Isoc = 0b, Diag = 0b
    Byte 3: RxNextPktToAck[7:0] = 25h


    Example 1: NOP Packet Setup

(Refer to Figure 7-1 on page 147.)

Command[5:0] Field (Byte 0, Bits 5:0)

The NOP information packet command code. There are no option bits within this field. For this example, field = 000000b.

DisCon Bit Field (Byte 0, Bit 6)

This bit is set to indicate an LDTSTOP# sequence is beginning. Not active for this example. Refer to "HT Link Disconnect/Reconnect Sequence" on page 217 for a discussion of the DisCon bit and the LDTSTOP# sequence. For this example, field = 0b.

PostCMD[1:0] Field (Byte 1, Bits 1:0)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Posted Request (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume that two new entries have become available and this field = 10b.

PostData[1:0] Field (Byte 1, Bits 3:2)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Posted Request (Data) flow control buffers that have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available and this field = 00b.

Response[1:0] Field (Byte 1, Bits 5:4)

This NOP field dynamically reports the number of new entries (0-3) in the receiver's Response (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available. For this example, field = 00b.

ResponseData[1:0] Field (Byte 1, Bits 7:6)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Response (Data) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume one new entry in this buffer has become available. For this example, field = 01b.
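Putting the fields above together, the four bytes of this example NOP can be assembled as shown below. The bit positions follow the field descriptions in this example (ResponseData in bits 7:6 of byte 1; Isoc and Diag left at 0); the struct and helper names are ours, not the specification's:

    #include <stdint.h>

    /* Assemble the Example 1 NOP packet bytes from its fields.
     * Illustrative helper; bit positions follow the field descriptions above. */
    typedef struct { uint8_t b[4]; } ht_nop_packet;

    static ht_nop_packet ht_build_nop(uint8_t post_cmd, uint8_t post_data,
                                      uint8_t resp, uint8_t resp_data,
                                      uint8_t nonpost_cmd, uint8_t nonpost_data,
                                      uint8_t rx_next_pkt_to_ack)
    {
        ht_nop_packet p;
        p.b[0] = 0x00;                                /* Cmd[5:0]=000000b, DisCon=0 */
        p.b[1] = (uint8_t)((post_cmd  & 3)       |    /* bits 1:0  PostCmd          */
                           (post_data & 3) << 2  |    /* bits 3:2  PostData         */
                           (resp      & 3) << 4  |    /* bits 5:4  Response         */
                           (resp_data & 3) << 6);     /* bits 7:6  ResponseData     */
        p.b[2] = (uint8_t)((nonpost_cmd  & 3)      |  /* bits 1:0  NonPostCmd       */
                           (nonpost_data & 3) << 2);  /* bits 3:2  NonPostData; Isoc/Diag = 0 */
        p.b[3] = rx_next_pkt_to_ack;                  /* RxNextPktToAck[7:0]        */
        return p;
    }

    /* Example 1: two new posted-command entries, one new response-data entry.
     * ht_build_nop(2, 0, 0, 1, 0, 0, 0x25) yields bytes 00h, 42h, 00h, 25h. */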
