HyperTransport 3.1 Interconnect Technology


Are your company's technical training needs being addressed in the most effective manner?

MindShare has over 25 years' experience in conducting technical training on cutting-edge technologies. We understand the challenges companies have when searching for quality, effective training which reduces the students' time away from work and provides cost-effective alternatives. MindShare offers many flexible solutions to meet those needs. Our courses are taught by highly-skilled, enthusiastic, knowledgeable and experienced instructors. We bring life to knowledge through a wide variety of learning methods and delivery options.

- PCI Express 2.0
- Intel Core 2 Processor Architecture
- AMD Opteron Processor Architecture
- Intel 64 and IA-32 Software Architecture
- Intel PC and Chipset Architecture
- PC Virtualization
- USB 2.0
- Wireless USB
- Serial ATA (SATA)
- Serial Attached SCSI (SAS)
- DDR2/DDR3 DRAM Technology
- PC BIOS Firmware
- High-Speed Design
- Windows Internals and Drivers
- Linux Fundamentals
... and many more.

All courses can be customized to meet your group's needs. Detailed course outlines can be found at www.mindshare.com.

    world-class technical training

    MindShare training courses expand your technical skillset

www.mindshare.com | 4285 Slash Pine Drive, Colorado Springs, CO 80908 USA

    M 1.602.617.1123 O 1.800.633.1440 [email protected]

    Engage MindShare

    Have knowledge that you want to bring to life? MindShare will work with you to Bring Your Knowledge to Life.

Engage us to transform your knowledge and design courses that can be delivered in classroom or virtual classroom settings, create online eLearning modules, or publish a book that you author.

    We are proud to be the preferred training provider at an extensive list of clients that include:

    ADAPTEC AMD AGILENT TECHNOLOGIES APPLE BROADCOM CADENCE CRAY CISCO DELL FREESCALE

    GENERAL DYNAMICS HP IBM KODAK LSI LOGIC MOTOROLA MICROSOFT NASA NATIONAL SEMICONDUCTOR

    NETAPP NOKIA NVIDIA PLX TECHNOLOGY QLOGIC SIEMENS SUN MICROSYSTEMS SYNOPSYS TI UNISYS

    Classroom Training

Invite MindShare to train you in-house, or sign up to attend one of our many public classes held throughout the year and around the world. No more boring classes; the MindShare Experience is sure to keep you engaged.

    Virtual Classroom Training

The majority of our courses are delivered live over the web in an interactive environment with WebEx and a phone bridge. We deliver training cost-effectively across multiple sites and time zones. Imagine being trained in your cubicle or home office and avoiding the hassle of travel. Contact us to attend one of our public virtual classes.

    eLearning Module Training

MindShare is also an eLearning company. Our growing list of interactive eLearning modules includes:

- Intro to Virtualization Technology
- Intro to I/O Virtualization
- Intro to PCI Express 2.0 Updates
- PCI Express 2.0
- USB 2.0
- AMD Opteron Processor Architecture
- Virtualization Technology
...and more

MindShare Press

Purchase our books and eBooks or publish your own content through us. MindShare has authored over 25 books and the list is growing. Let us help make your book project a successful one.

    MindShare Learning Options

MindShare Classroom | MindShare Virtual Classroom | MindShare eLearning | MindShare Press


    HyperTransport 3.1

Interconnect Technology

MINDSHARE, INC.

Brian Holden
Jay Trodden
Don Anderson

MINDSHARE PRESS


The authors and publishers have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

    Visit MindShare, Inc. on the web: www.mindshare.com

    Library of Congress Control Number: 2006934887

    Copyright 2008 by MindShare, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

    Printed in the United States of America.

For information on obtaining permission for use of material from this work, please submit a written request to:

MindShare Press
Attn: Maryanne Daves
4285 Slash Pine Dr.
Colorado Springs, CO 80908
Fax: 719-487-1434

    Set in 10-point Times New Roman by MindShare, Inc.

    ISBN 978-0-9770878-2-2

    First Printing September 2008


1 Introduction to HyperTransport

    This Chapter

This chapter discusses some of the motivations leading to the development of HyperTransport. It reviews some of the attributes that limit the ability of older generation I/O buses to keep pace with the increasing demands of new applications and advances in processor and memory technologies. The chapter then summarizes the key features behind the improved performance of HT over earlier buses.

    The Next Chapter

The next chapter provides an overview of HT architecture, including the primary elements of HT technology and the relationship between them. The chapter describes the general features, capabilities, and limitations of HT and introduces the terminology and concepts necessary for in-depth discussions of the various HT topics in subsequent chapters.

Computers: Three Subsystems

A server, desktop, or notebook computer system is comprised of three major subsystems:

1. Processor. In all systems, the processor device may have more than one CPU core. In servers, there may be more than one processor device.

2. Main DRAM Memory.

3. I/O Subsystem (Southbridge). This connects to all other devices including such things as graphics, mass storage, networks, legacy hardware, and the buses required to support them: PCIe, PCI, PCI-X, AGP, USB, IDE, etc.


    CPUs are Faster Than Their Interconnect

Because of improvements in both the CPU internal execution speed and in the number of CPU cores per device, processor devices are more demanding than ever when they access external resources such as memory and I/O. Each external read or write by a CPU core represents a huge performance hit compared to internal execution.

    Multiple Processors Aggravate The Problem

In systems with multiple processor devices, such as servers, the problem of accessing external devices becomes worse because of competition for access to system DRAM and the single set of I/O resources.

Caches Help DRAM Keep Up

System DRAM memory has kept up reasonably well with the increasing demands of CPUs for a couple of reasons. First, the performance penalty for accessing memory is mitigated by the use of internal processor cache memory. Modern processors generally implement multiple levels of internal caches that run at the full CPU clock rate and are tuned for high hit rates. Each fetch from an internal cache eliminates the need for an external bus cycle to memory.

Second, in cases where an external memory fetch is required, DRAM technology and the use of high-speed, mixed-signal interfaces (e.g. DDR-3, FB-DIMM, etc.) have allowed DRAM to maintain bandwidths comparable with the processor external bus rates.

    Southbridge Interconnect Must Keep Pace

In order to prevent system slow-downs, the connectivity to the Southbridge with its I/O subsystem must keep pace with the processor.

    I/O Accesses Can Slow Down The Processor

Although external DRAM accesses by processors can be minimized through the use of internal caches, there is no way to avoid external bus operations when accessing I/O devices. Many applications require the processor to perform small, inefficient external transactions, which then must find their way through the I/O subsystem to the bus hosting the device.


    Lack of Bandwidth Also Hurts Fast Peripherals

Similarly, bus master I/O devices using slower subsystem buses to reach main memory are also hindered by the lack of bandwidth. Some modern peripheral devices are capable of running much faster than the buses they live on. This presents another system bottleneck. This is a particular problem in cases where applications are running that emphasize latency-critical and/or real-time movement of data through the I/O subsystem over CPU processing.

    Reducing I/O Bottlenecks

Two important schemes have been used to connect I/O devices to main memory. The first is the shared bus approach, as used in PCI and PCI-X. The second involves point-to-point component interconnects, and includes some proprietary buses as well as open architectures such as HyperTransport and PCI-Express. These are described here, along with the advantages and disadvantages of each.

    The Historic Approach

Figure 1-1 on page 12 depicts the classic North-South bridge PCI implementation. Note that the PCI bus acts as both an add-in bus for user peripheral cards and as an interconnect bus to memory for all devices residing on or below it. Even traffic to and from the USB and IDE controllers integrated in the South Bridge must cross the PCI bus to reach main memory.

The topology shown in Figure 1-1 was the traditional way of connecting desktop systems for a number of reasons, including:

1. A shared bus reduces the number of traces on the motherboard to a single set.

2. All of the devices located on the PCI bus are only one bridge interface away from the principal target of their transactions: main DRAM memory.

3. A single, very popular protocol (PCI) can be used for all embedded devices, add-in cards, and chipset components attached to the bus.

Unfortunately, some of the things that made this topology so popular also have made it difficult to fix the I/O bandwidth problems which have become more obvious as processors and memory became faster.


Figure 1-1: Classic PCI North-South Bridge System. (Block diagram: the CPU connects over the FSB to a North Bridge hosting main memory and an AGP graphics accelerator with local video memory; the shared PCI bus below the North Bridge carries the PCI slots and the South Bridge, whose integrated USB, IDE, interrupt controller, sound, and ISA/Super I/O legacy functions all reach main memory through that PCI bus.)


2 HT Architectural Overview

    The Previous Chapter

To understand why HT was developed, it is helpful to review the previous generation of I/O buses and interconnects. The previous chapter reviewed the factors that limit the ability of older generation buses to keep pace with the increasing demands of new applications. Finally, it discussed the key factors of the HT technology that provide its improved capability.

    This Chapter

This chapter provides an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. This chapter summarizes the features, capabilities, and limitations of HT and provides the background information necessary for in-depth discussions of the various HT topics in later chapters.

    The Next Chapter

The next chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups.

    General

HyperTransport is in its essence a hardware interface. While software is involved in configuring, controlling, and utilizing the interface, the bulk of the protocols described in this book are implemented in hardware. The interface has been built into many devices, including all of the microprocessors in AMD's Turion, Athlon, Phenom, and Opteron product lines.


HyperTransport provides a point-to-point interconnect that can be extended to support a wide range of devices. Figure 2-1 on page 19 illustrates a sample HT system with four internal links. HyperTransport provides a high-speed, high-performance, point-to-point dual simplex link for interconnecting IC components on a PCB. Data is transmitted from one device to another across the link.

The width of the link along with the clock frequency at which data is transferred are scalable and auto-negotiable:

- Link width ranges from 2 bits to 32 bits
- Clock frequency ranges from 200 MHz to 3.2 GHz

This scalability allows for a wide range of link performance and potential applications, with bandwidths ranging from 200 MB/s to 51.2 GB/s.

Once again referring to Figure 2-1, the HT bus has been extended in the sample system via a series of devices known as tunnels. A tunnel is an HT device that performs some function and also has a second HT interface that permits the connection of another HT device. The end device is termed a cave, which always represents the termination of a chain of devices that all reside on the same HT bus. Cave devices include a function, but no additional HT connection. The series of devices that comprise an HT bus is sometimes simply referred to as an HT chain.

In Figure 2-1, the tunnel devices provide connections to Infiniband, PCI-Express, and Ethernet. The cave device provides a connection to a Serial Attached SCSI storage device. Devices supporting connections to any of these other technologies can be made as either cave or tunnel devices.

Additional HT buses (i.e. chains) may be implemented in a given system by using an HT-to-HT bridge. In this way, a fabric of HT devices may be implemented. Refer to the section entitled "Extending the Topology" on page 31 for additional detail.

    Transfer Types Supported

HT supports two types of addressing semantics:

1. legacy PC, address-based semantics
2. messaging semantics common to networking environments

Most of this book discusses the address-based semantics common to compatible PC implementations. Message-passing semantics are discussed in "The Direct Packet Feature Set" on page 497.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    12/30

    Chapter 2: HT Architectural Overview

    19

    Address-Based Semantics

The HT bus was initially implemented as a PC compatible solution that by definition uses address-based semantics. This includes both the 40-bit, 1 Terabyte (TB) and the 64-bit, 18 Exabyte (EB) address spaces. The 64-bit address map is the same as the 40-bit address map with 24 extra bits that are defined to be 00_0000h for 40-bit transactions. Transactions specify locations within this address space that are to be read from or written to. The address space is divided into blocks that are allocated for particular functions, listed in Figure 2-2 on page 20. Unless otherwise specified, 40-bit addresses are used in this book.

Read and write request command packets contain a 40-bit address Addr[39:2]. When an address extension doubleword is present, twenty-four additional address bits are provided for a total of 64 address bits.
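The split between the packet's 40-bit address field and the optional extension can be pictured with a small helper. This is only a sketch of the description above; the struct, field, and function names are ours, not the packet layout defined by the specification:

    #include <stdint.h>

    /* Split a 64-bit byte address into the pieces described in the text:
     * Addr[39:2] carried in the request packet, and 24 extension bits
     * (Addr[63:40]) carried in an optional address-extension doubleword.
     * Names are illustrative only. */
    typedef struct {
        uint64_t addr_39_2;    /* 38-bit field: address bits [39:2]  */
        uint32_t addr_63_40;   /* 24-bit field: address bits [63:40] */
        int      needs_extension;
    } ht_req_addr;

    static ht_req_addr ht_split_address(uint64_t byte_addr)
    {
        ht_req_addr a;
        a.addr_39_2  = (byte_addr >> 2) & 0x3FFFFFFFFFull;          /* 38 bits */
        a.addr_63_40 = (uint32_t)((byte_addr >> 40) & 0xFFFFFF);    /* 24 bits */
        /* A 40-bit transaction leaves the upper 24 bits at 00_0000h. */
        a.needs_extension = (a.addr_63_40 != 0);
        return a;
    }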

Figure 2-1: Example HyperTransport System. (Block diagram: two CPUs attach to a memory/graphics hub that acts as the HyperTransport host bridge, with DDR SDRAM and PCIe slots; a chain of HyperTransport tunnel devices provides Infiniband, PCIe/Gigabit Ethernet, and HTX-slot connectivity, and the chain terminates in a HyperTransport cave device serving a SAS/SATA RAID disk array.)


HyperTransport does not contain dedicated I/O address space as PCI does. Instead, CPU I/O space is memory-mapped to a high address range (FD_FC00_0000h to FD_FDFF_FFFFh). Each HyperTransport device is configured at initialization time by the boot ROM configuration software to respond to a range of memory address spaces. The devices are assigned addresses via the base address registers contained in the configuration register header. Note that these registers are based on the PCI configuration registers, and are also mapped to memory space (FD_FE00_0000h to FD_FFFF_FFFFh). Also unlike the PCI bus, there is no dedicated configuration address space.

Additional memory address ranges are used for interrupt signaling and system management messages. Details regarding the use of each range of address space are discussed in subsequent chapters that cover the related topic. For example, a detailed discussion of the configuration address space can be found in Chapter 14, entitled "Device Configuration," on page 329.

Figure 2-2: 40-bit HT Address Map

    00_0000_0000h - FC_FFFF_FFFFh   1012 GB   DRAM / Memory-Mapped I/O
    FD_0000_0000h - FD_F8FF_FFFFh   3984 MB   Interrupt / EOI
    FD_F900_0000h - FD_F90F_FFFFh      1 MB   Legacy PIC IACK
    FD_F910_0000h - FD_F91F_FFFFh      1 MB   System Management
    FD_F920_0000h - FD_FBFF_FFFFh     46 MB   Reserved
    FD_FC00_0000h - FD_FDFF_FFFFh     32 MB   I/O
    FD_FE00_0000h - FD_FFFF_FFFFh     32 MB   Configuration
    FE_0000_0000h - FE_1FFF_FFFFh    512 MB   Extended Configuration
    FE_2000_0000h - FF_FFFF_FFFFh   7680 MB   Reserved
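The ranges in Figure 2-2 translate directly into a simple address decoder. The routine below is an illustrative helper of ours (not specification code) that classifies a 40-bit address against the map:

    #include <stdint.h>

    /* Classify a 40-bit HT address against the Figure 2-2 map.
     * The function name and label strings are illustrative only. */
    static const char *ht_region(uint64_t addr)
    {
        if (addr <= 0xFCFFFFFFFFull) return "DRAM / memory-mapped I/O";
        if (addr <= 0xFDF8FFFFFFull) return "Interrupt / EOI";
        if (addr <= 0xFDF90FFFFFull) return "Legacy PIC IACK";
        if (addr <= 0xFDF91FFFFFull) return "System Management";
        if (addr <= 0xFDFBFFFFFFull) return "Reserved";
        if (addr <= 0xFDFDFFFFFFull) return "I/O";
        if (addr <= 0xFDFFFFFFFFull) return "Configuration";
        if (addr <= 0xFE1FFFFFFFull) return "Extended Configuration";
        return "Reserved";
    }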


    3 Signal Groups

    The Previous Chapter

The previous chapter provided an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. The chapter summarized the features, capabilities, and limitations of HT and provided the background information necessary for in-depth discussions of the various HT topics in later chapters.

This Chapter

This chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

    The Next Chapter

The next chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type is presented.

    Introduction

Signals on each HyperTransport link fall into two groups: high-speed signals associated with the sending and receiving of control and data packets, and miscellaneous low-speed signals required for such things as reset and power management. The low-speed signals are not scalable and employ conventional low-voltage CMOS signaling. The high-speed signal group is scalable in terms of both bus width and clock rate, and each signal is carried by a low-voltage differential signal pair.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    15/30

    HyperTransport 3.1 Interconnect Technology

    54

While device pin count varies with scaling, signal group functions remain the same; the only real difference in signaling between a 32-bit link and a 2-bit link is the number of bit times required to shift information onto the bus.

    The Signal Groups

As illustrated in Figure 3-1 on page 54, the high-speed HyperTransport signals on each link consist of an outbound (transmit) set of signals and an inbound (receive) set of signals for each device; these are routed point-to-point. Having two sets of uni-directional signals allows concurrent traffic. In addition, there is one set of low-speed signals that may be bused to multiple devices.

Figure 3-1: HyperTransport Signal Groups. (Diagram: each link direction carries CAD[p:0], CTL[n:0], and CLK[m:0] differential signal pairs from transmitter (XMT) to receiver (RCV), routed point-to-point between Device A and Device B; the low-speed PWROK, RESET#, and optional LDTSTOP#/LDTREQ# signals are driven by system logic and may be bused to both devices; VHT = 1.2 Volts. The signal counts scale with link width as follows:)

    Link width         2    4    8    16   32
    CLK:        m =    0    0    0    1    3
    CTL (Gen3): n =    0    0    0    1    3
    CTL (Gen1): n =    0    0    0    0    0
    CAD:        p =    1    3    7    15   31
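The width table in Figure 3-1 reduces to one rule: one CLK pair (and, in Gen3, one CTL pair) per group of 8 or fewer CAD pairs, with Gen1 always using a single CTL pair. The helpers below are our own illustration of that rule; the names are not from the specification:

    /* Highest signal index per link direction for a given CAD width.
     * Reproduces the Figure 3-1 table; helper names are illustrative. */
    static int ht_cad_msb(int width)           { return width - 1; }             /* p */
    static int ht_clk_msb(int width)           { return (width + 7) / 8 - 1; }   /* m */
    static int ht_ctl_msb(int width, int gen3) { return gen3 ? (width + 7) / 8 - 1 : 0; } /* n */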


    The High Speed Signals (One Set In Each Direction)

Each high-speed signal is actually a differential signal pair. The CAD (Command/Address/Data) signals carry the HyperTransport packets. When a link transmitter sends packets on the CAD bus, the receive side of the interface uses the CLK signals, also supplied by the transmitter, to latch in packet information during each bit time. The CTL signal or signals are used to delineate between the packet types.

    The CAD Signal Group

The CAD bus is always driven by the transmitter side of a link, and is comprised of signal pairs that carry HyperTransport requests, responses, and data. Each CAD bus may consist of between 2 bits (two differential signal pairs) and 32 bits (thirty-two differential signal pairs). The HyperTransport specification permits the CAD bus width to be different (asymmetrical) for the two directions. To enable the corresponding receiver to make a distinction as to the type of information currently being sent over the CAD bus, the transmitter also drives the CTL signals (see the following description).

    Control Signal (CTL)

This set of signal pairs is driven by the transmitter to identify the information being sent concurrently over the CAD signals. The receiver uses this information to delineate the incoming CAD information.

In the Gen1 protocol, there is one (and only one) CTL signal pair for each link direction, regardless of the width of the CAD bus. If this signal is asserted (high), the transmitter is indicating that it is sending a control packet; if deasserted, the transmitter is sending a data packet. The CTL line may only change on the boundaries of the transfer of the logical 32-bit CAD bus. The receiver uses this restriction to allow framing on the data.

In the Gen3 protocol, there is one CTL signal pair for each set of 8 or fewer CAD lines. These CTL signal(s) are encoded in a manner to transfer four CTL bits per 32 CAD bits. Five of the sixteen available codepoints for these bits are used to delineate the transfer of commands, inserted commands, the CRC for a command with data, the CRC for a command without data, and finally the data itself. The receiver frames on the presence of the used codepoints.
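The Gen1 framing restriction lends itself to a tiny checker: CTL may only change value on boundaries of the logical 32-bit CAD transfer, so on a narrow link one CTL value holds for several bit times. The function below is a sketch of ours (not specification code), assuming one captured CTL sample per bit time:

    #include <stdbool.h>
    #include <stddef.h>

    /* Verify a captured CTL trace obeys the Gen1 rule: for a link 'width' bits
     * wide, one logical 32-bit CAD transfer spans 32/width bit times, and CTL
     * may only change on those boundaries. Illustrative helper only. */
    static bool ht_gen1_ctl_legal(const bool *ctl, size_t bit_times, int width)
    {
        size_t per_dword = 32 / (size_t)width;   /* bit times per 32 CAD bits */
        for (size_t i = 1; i < bit_times; i++)
            if (i % per_dword != 0 && ctl[i] != ctl[i - 1])
                return false;                    /* changed mid-dword */
        return true;
    }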


    Clock Signal(s) (CLK)

Each HyperTransport transmitter sends a differential clock signal along with the CAD and CTL signals to the receiver at the other end of the link. There is one CLK signal pair for each set of 8 or fewer CAD signal pairs. Having a CLK signal pair per set of 8 or fewer CAD signal pairs aids in the partitioning of on-chip clock circuitry as well as easing board layout.

In HyperTransport revisions 1.03 to 1.1, the CLK signal is source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 800 MHz.

In HyperTransport revision 2.0, the CLK signal is also source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 1.4 GHz.

In HyperTransport revision 3.1, the CLK signal is also source synchronous with the CAD and CTL lines, but the phase of CAD and CTL with respect to CLK is allowed to vary within a defined budget over several bit times. The clock speeds range from 200 MHz (default) up to 3.2 GHz.

For all revisions, the jitter on the CLK line is tightly controlled.

    Scaling Hazards: Burden Is On The Transmitter

It is a requirement in HyperTransport that the software in the transmitter side of each link must be aware of the capabilities of its corresponding receiver and avoid the double hazard of a scalable bus: running at a faster clock rate than the receiver can handle or using a wider data path than the receiver supports. Because the link is not a shared bus, the transmitter side of each device is concerned with the capabilities of only one target.

    The Low Speed Signals

    Power OK (PWROK) And Reset (RESET#)

PWROK used with RESET# indicates to HyperTransport devices whether a Cold or Warm Reset is in progress. Which system logic component is responsible for managing the PWROK and RESET# signals is beyond the scope of the HyperTransport specification, but timing and use of the signals are defined. The basic use of the signals includes:


    4 Packet Protocol

    The Previous Chapter

The previous chapter described the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL, and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

    This Chapter

This chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type is presented.

    The Next Chapter

The next chapter describes HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered there include background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

    The Packet-Based Protocol

HyperTransport employs a packet-based protocol in which all information (address, commands, and data) travels in packets which are multiples of four bytes each. Packets are used in link management (e.g. flow control and error reporting) and as building blocks in constructing more complex transactions such as read and write data transfers.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    19/30

    HyperTransport 3.1 Interconnect Technology

    64

It should be noted that, while packet descriptions in this chapter are in terms of bytes, the link's bidirectional interface width (2, 4, 8, 16, or 32 bits) ultimately determines the amount of packet information sent during each bit time on HyperTransport links. There are two bit times per clock period.

Before looking at packet function and use, the following sections describe the mechanics of packet delivery over 2, 4, 8, 16, and 32 bit scalable link interfaces.

    8 Bit Interfaces

For 8-bit interfaces, one byte of packet information may be sent in each bit time. For example, a 4-byte request packet would be sent by the transmitter during four adjacent bit times, least significant byte first, as shown in Figure 4-1 on page 64. Total time to complete a four-byte packet is two clock periods.

Figure 4-1: Four Byte Packet On An 8-Bit Interface. (Diagram: Device A drives CAD[0-7] to Device B; packet bytes 0 through 3 are sent in four consecutive bit times, least significant byte first, spanning two clock periods.)


    Interfaces Narrower Than 8 Bits

For link interfaces that are narrower than 8 bits, the first byte of packet information is shifted out over multiple bit times, least significant bits first. Referring to Figure 4-2 on page 65, a 2-bit interface would require four bit times to transmit each byte of information. After the first byte is sent, subsequent bytes in the packet are shifted out in the same manner. Total time to complete a four-byte packet: eight clock periods.

Figure 4-2: Four Byte Packet On A 2-Bit Interface. (Diagram: Device A drives CAD[0-1] to Device B; each byte is shifted out two bits per bit time, least significant bits first, so the four-byte packet occupies sixteen bit times, or eight clock periods.)
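The 8-bit and 2-bit examples follow from the same arithmetic, which also covers the wider links described next. The helpers below are an illustrative sketch of ours, not spec-defined functions:

    /* Bit times and clock periods needed to transfer 'packet_bytes' over a link
     * that is 'width_bits' wide; HT sends two bit times per clock period. */
    static unsigned ht_bit_times(unsigned packet_bytes, unsigned width_bits)
    {
        return (packet_bytes * 8u) / width_bits;
    }

    static unsigned ht_clock_periods(unsigned packet_bytes, unsigned width_bits)
    {
        return ht_bit_times(packet_bytes, width_bits) / 2u;
    }

    /* A 4-byte packet: 4 bit times / 2 clocks on an 8-bit link,
     * 16 bit times / 8 clocks on a 2-bit link,
     * 2 bit times / 1 clock on a 16-bit link. */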


    Interfaces Wider Than 8 Bits

For 16- or 32-bit interfaces, packet delivery is accelerated by sending multiple bytes of packet information in parallel with each other.

    16 Bit Interfaces

On 16-bit interfaces, two bytes of information may be sent in each bit time. Referring to Figure 4-3 on page 66, note that even-numbered bytes travel on the lower portion of the 16-bit interface, odd-numbered bytes on the upper portion.

Figure 4-3: Four Byte Packet On A 16-Bit Interface. (Diagram: even-numbered packet bytes travel on CAD[0-7] and odd-numbered bytes on CAD[8-15], so bytes 0 through 3 are transferred in two bit times, one clock period.)
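The byte-lane assignment implied by Figure 4-3 generalizes to any interface wider than 8 bits: packet byte i travels on byte lane (i mod number-of-lanes) during bit time (i div number-of-lanes). A small sketch of ours, not a spec-defined function:

    /* For a link 'width_bits' wide (16 or 32), packet byte 'byte_index' travels
     * on byte lane (byte_index mod lanes), i.e. CAD[8*lane+7 : 8*lane], during
     * bit time (byte_index / lanes). Illustrative helper only. */
    typedef struct { unsigned lane; unsigned bit_time; } ht_byte_slot;

    static ht_byte_slot ht_wide_lane(unsigned byte_index, unsigned width_bits)
    {
        unsigned lanes = width_bits / 8u;
        ht_byte_slot s = { byte_index % lanes, byte_index / lanes };
        return s;
    }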


6 I/O Ordering

    The Previous Chapter

The previous chapter describes HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered there include background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

    This Chapter

This chapter describes the ordering rules which apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements that are used when interfacing PCI, PCI-X, PCIe and AGP can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

    The Next Chapter

In the next chapter, examples are presented which apply the packet principles described in the preceding chapter. The examples also entail more complex system transactions than discussed previously, including reads, posted and non-posted writes, and atomic read-modify-write operations.

    The Purpose Of Ordering Rules

Some of the important reasons for enforcing ordering rules on packets moving through HyperTransport include the following:

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    23/30

    HyperTransport 3.1 Interconnect Technology

    126

    Maintain Data Correctness

If transactions are in some way dependent on each other, a method is required to assure that they complete in a deterministic way. For example, if Device A performs a write transaction targeting main memory and then follows it with a read request targeting the same location, what data will the read transaction return? HyperTransport ordering seeks to make such events predictable (deterministic) and to match the intent of the programmer. Note that, compared to a shared bus such as PCI, HyperTransport transaction ordering is complicated somewhat by point-to-point connections which result in target devices on the same chain (logical bus) being at different levels of fabric hierarchy.

    Avoid Deadlocks

Another reason for ordering rules is to prevent cases where the completions of two separate transactions are each dependent on the other completing first. HyperTransport ordering includes a number of rules for deadlock avoidance. Some of the rules are in the specification because of known deadlock hazards associated with other buses to which HyperTransport may interface (e.g. PCI).

    Support Legacy Buses

One of the principal roles of HyperTransport is to serve as a backbone bus which is bridged to other peripheral buses. HyperTransport explicitly supports PCI, PCI-X, and AGP and the ordering requirements of those buses.

    Maximize Performance

Finally, HyperTransport permits devices in the path to the target, and the target itself, some flexibility in reordering packets around each other to enhance performance. When acceptable, relaxed ordering may be enabled by the requester on a per-transaction basis using attribute bits in request and response packets.

  • 7/25/2019 HyperTransport 3.1 Interconnect Technology.pdf

    24/30

    Chapter 6: I/O Ordering

    127

    Introduction: Three Types Of Traffic Flow

HyperTransport defines three types of traffic: Programmed I/O (PIO), Direct Memory Access (DMA), and Peer-to-Peer. Figure 6-1 on page 127 depicts the three types of traffic.

1. Programmed I/O traffic originates at the host bridge on behalf of the CPU and targets I/O or memory-mapped I/O in one of the peripherals. These types of transactions are often generated by the CPU to set up peripherals for bus master activity, check status, program configuration space, etc.

2. DMA traffic originates at a bus master peripheral and typically targets main memory. It may also originate from a DMA controller located within the processor device outside of the CPU itself. This traffic is used so that the CPU may be off-loaded from the burden of moving large amounts of data to and from the I/O subsystem. Generally, the CPU uses a few PIO instructions to program the peripheral device with information about a required DMA transfer (transfer size, target address in memory, read or write, etc.), then performs some other task while the DMA transfer is carried out. When the transfer is complete, the DMA device may generate an interrupt message to inform the CPU.

3. Peer-to-Peer traffic is generated by an interior node and targets another interior node. In HyperTransport, direct peer-to-peer traffic is not allowed. As indicated in Figure 6-1 on page 127, the request is issued upstream and must travel to the host bridge. The host bridge examines the address and determines whether the request should be reflected downstream. If the request is non-posted, the response will similarly travel from the target back up to the host bridge and then be reissued to the original requester. (A minimal sketch of this reflection decision follows Figure 6-1 below.)

Figure 6-1: PIO, DMA, And Peer-to-Peer Traffic. (Diagram: three Bus 0 chains below host bridges illustrate the three traffic types: a PIO request flows from the host bridge (UnitID 0) down to a target tunnel device, a DMA request flows from a source tunnel device up toward main memory behind the host bridge, and a peer-to-peer request flows from a source device up to the host bridge, which reissues it downstream to the target.)
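To make the reflection step in item 3 concrete, the sketch below models the decision the host bridge makes for each request arriving from below: if the address belongs to an interior (downstream) device, the request is reissued downstream; otherwise it proceeds toward memory. This is our own illustrative pseudologic, not code or data structures from the specification:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative model of the host bridge's peer-to-peer reflection decision.
     * Types, fields, and function names are invented for this sketch. */
    typedef struct { uint64_t base, limit; } addr_range;

    typedef struct {
        addr_range downstream[8];   /* address ranges owned by interior devices */
        int        count;
    } host_bridge;

    /* True if the request address belongs to a device below the bridge. */
    static bool targets_interior_device(const host_bridge *hb, uint64_t addr)
    {
        for (int i = 0; i < hb->count; i++)
            if (addr >= hb->downstream[i].base && addr <= hb->downstream[i].limit)
                return true;
        return false;
    }

    /* Decide what to do with a request that has travelled up to the bridge:
     * reflect it downstream (peer-to-peer) or forward it toward memory (DMA).
     * For a non-posted peer-to-peer request, the response later retraces the
     * path: target -> host bridge -> original requester. */
    typedef enum { REFLECT_DOWNSTREAM, FORWARD_TO_MEMORY } bridge_action;

    static bridge_action bridge_route_upstream(const host_bridge *hb, uint64_t addr)
    {
        return targets_interior_device(hb, addr) ? REFLECT_DOWNSTREAM
                                                 : FORWARD_TO_MEMORY;
    }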


    The Ordering Rules

HyperTransport packet ordering rules are divided into groups: general rules, rules for upstream I/O ordering, and rules for downstream ordering. Even the peer-to-peer example in Figure 6-1 on page 127 can be broken into two parts: the request moving to the bridge (covered by upstream ordering rules) and the reflection of the request downstream to the peer-to-peer target (covered by downstream I/O ordering rules). Refer to Chapter 23, entitled "I/O Compatibility," on page 525 for a discussion of ordering when packets move between HyperTransport and another protocol (PCI, PCI-X, or AGP).

    General I/O Ordering Limits

    Ordering Covers Targets At Same Hierarchy Level

Ordering rules only apply to the order in which operations are detected by targets at the same level in the HyperTransport fabric hierarchy. Referring to Figure 6-2 on page 128, assume that two peer-to-peer writes targeting devices on two different chains have been performed by the end device in chain 0.

Figure 6-2: Targets At Different Levels In Hierarchy And In Different Chains. (Diagram: an HT host bridge and I/O hub spawn three chains (Chain 0, Chain 1, Chain 2) built from HT-to-SCSI, HT-to-PCI-X, HT-to-GbE, and HT-to-PCI tunnel devices; Request A and Request B are issued from the end device on Chain 0 toward targets on different chains.)


7 Transaction Examples

    The Previous Chapter

The previous chapter describes the ordering rules which apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements that are used when interfacing PCI, PCI-X, PCIe and AGP can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

    This Chapter

In this chapter, examples are presented that apply the packet principles in the preceding chapters and include more complex system transactions not previously discussed. The examples include reads, posted and non-posted writes, and atomic read-modify-write.

    The Next Chapter

HT uses an interrupt signaling scheme similar to PCI's Message Signaled Interrupts. The next chapter defines how HT delivers interrupts to the host bridge via posted memory writes. The chapter also defines an End of Interrupt message and details the mechanism that HT uses for configuring and setting up interrupt transactions (which is different from the PCI-defined mechanisms).

    Transaction Examples: Introduction

The following section describes some transactions which might be seen in HyperTransport. Not all cases are covered, but the ones which are presented are intended to illustrate two important aspects of each transaction:


    Packet Format And Optional Features

Most of the control packet variants have a number of fields used to enable optional features: isochronous vs. standard virtual channels, byte vs. dword data transfers, posted vs. non-posted channels, etc. In each of the examples, the key options are described; in some cases, certain packet fields don't apply to a particular request or response at all. Refer to Chapter 4, entitled "Packet Protocol," on page 63 for low-level, bit-by-bit packet field descriptions.

    General Sequence Of Events

Once the packet fields are determined, the next topic covered in the transaction examples is a general summary of events which must occur in the HyperTransport topology to perform the transaction. The description starts at the agent initiating an information or request control packet and follows it (and possibly a data packet) to the target; if there is a response required, this packet (and any data associated with it) is similarly followed from the target back to the original requester. Packet movement through HyperTransport involves aspects of topics described in more detail in other chapters, including flow control, packet ordering and routing, error handling, etc.

For each of the examples, assume the link interface CAD bus is eight bits in each direction. This simplifies the discussion of packet bytes moving across the connection: each byte of packet content is sent in one bit time. The only difference, when link width is greater or less than 8 bits, is how the packet bytes are shifted out and received. For example, on a 2-bit interface each byte of packet content is shifted onto the link over 4 bit times. For an interface 32 bits wide, four bytes of packet content are sent in parallel with each other in each bit time.


    Example 1: NOP Information Packet

Situation: Device 2 in Figure 7-1 on page 147 is using a NOP packet to inform Device 1 about the availability of two new Posted Request (CMD) entries and one new Response (Data) entry in its receiver flow control buffers.

Figure 7-1: Example 1: NOP Information Packet With Buffer Updates. (Diagram: Device 2 drives the NOP packet to Device 1 over its transmit CAD[0-7], CTL, and CLK signals, updating Device 1's view of Device 2's receiver flow control buffers. The four NOP packet bytes carry:)

    Byte 0: Cmd[5:0] = 000000b (NOP), DisCon = 0b
    Byte 1: PostCmd[1:0] = 10b (2), PostData[1:0] = 00b, Response[1:0] = 00b, ResponseData[1:0] = 01b (1)
    Byte 2: NonPostCmd[1:0] = 00b, NonPostData[1:0] = 00b, Isoc = 0b, Diag = 0b
    Byte 3: RxNextPktToAck[7:0] = 25h


    Example 1: NOP Packet Setup

(Refer to Figure 7-1 on page 147.)

Command[5:0] Field (Byte 0, Bits 5:0)

The NOP information packet command code. There are no option bits within this field. For this example, field = 000000b.

DisCon Bit Field (Byte 0, Bit 6)

This bit is set to indicate an LDTSTOP# sequence is beginning. Not active for this example. Refer to "HT Link Disconnect/Reconnect Sequence" on page 217 for a discussion of the DisCon bit and the LDTSTOP# sequence. For this example, field = 0b.

PostCMD[1:0] Field (Byte 1, Bits 1:0)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Posted Request (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume that two new entries have become available and this field = 10b.

PostData[1:0] Field (Byte 1, Bits 3:2)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Posted Request (Data) flow control buffers that have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available and this field = 00b.

Response[1:0] Field (Byte 1, Bits 5:4)

This NOP field dynamically reports the number of new entries (0-3) in the receiver's Response (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available. For this example, field = 00b.

ResponseData[1:0] Field (Byte 1, Bits 7:6)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver's Response (Data) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume one new entry in this buffer has become available. For this example, field = 01b.
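Putting the fields above together, the four bytes of this example NOP can be assembled as shown below. The bit positions follow the field descriptions in this example (ResponseData in bits 7:6 of byte 1; Isoc and Diag left at 0); the struct and helper names are ours, not the specification's:

    #include <stdint.h>

    /* Assemble the Example 1 NOP packet bytes from its fields.
     * Illustrative helper; bit positions follow the field descriptions above. */
    typedef struct { uint8_t b[4]; } ht_nop_packet;

    static ht_nop_packet ht_build_nop(uint8_t post_cmd, uint8_t post_data,
                                      uint8_t resp, uint8_t resp_data,
                                      uint8_t nonpost_cmd, uint8_t nonpost_data,
                                      uint8_t rx_next_pkt_to_ack)
    {
        ht_nop_packet p;
        p.b[0] = 0x00;                                /* Cmd[5:0]=000000b, DisCon=0 */
        p.b[1] = (uint8_t)((post_cmd  & 3)       |    /* bits 1:0  PostCmd          */
                           (post_data & 3) << 2  |    /* bits 3:2  PostData         */
                           (resp      & 3) << 4  |    /* bits 5:4  Response         */
                           (resp_data & 3) << 6);     /* bits 7:6  ResponseData     */
        p.b[2] = (uint8_t)((nonpost_cmd  & 3)      |  /* bits 1:0  NonPostCmd       */
                           (nonpost_data & 3) << 2);  /* bits 3:2  NonPostData; Isoc/Diag = 0 */
        p.b[3] = rx_next_pkt_to_ack;                  /* RxNextPktToAck[7:0]        */
        return p;
    }

    /* Example 1: two new posted-command entries, one new response-data entry.
     * ht_build_nop(2, 0, 0, 1, 0, 0, 0x25) yields bytes 00h, 42h, 00h, 25h. */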
