AMD Opteron TM Overview. June 1, 2015Computation Products Group 2 Top Level Agenda The HPC Market &...
- Slide 1
- AMD Opteron TM Overview
- Slide 2
- Top Level Agenda
  - The HPC Market & AMD
  - AMD64: A Programmer's View
  - AMD Opteron Processor
  - The HW Core Improvements
  - Integrated Memory Controller
  - HyperTransport Technology
  - Clustering
  - Performance
  - System Solutions & Applications
  - Development Platforms
  - Recent Events
  - Summary
- Slide 3
- Computing System Evolution: Mainframes to Desktops to Clusters
  - Mainframes (~1965): tightly coupled processor, computer, OS and software from a single company; proprietary software; >$1M
  - Departmental minicomputers (~1970): significant proliferation of servers as machines leave the glass houses
- Slide 42
- AMD64: Desktop Processor (AMD Athlon 64)
  - [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache and L2 cache; DDR memory controller (72-bit interface); HyperTransport (16-bit) replaces the address, data and control bus]
  - 8-byte memory controller supporting 200, 266 and 333 MHz DDR memory
  - CHIPKILL ECC with x4 DRAMs
  - Drives up to 4 registered DIMMs (4 DIMMs at 333MHz)
  - Future memory technology supported as it is defined
  - Up to 4GB x4 DRAMs (4GB DIMMs)
  - HyperTransport technology I/O
  - On-chip L1 and L2 cache: 64KB L1 I-cache, 64KB L1 D-cache, up to 1MB ECC-protected L2 cache
  - 740-pin PGA package
- Slide 43
- 1P AMD Athlon 64 Desktop Processor System
  - [System diagram: AMD Athlon 64 with 200-333MHz 72-bit registered DDR (4GB DRAM); 16x16 HyperTransport @ 1600 MT/s to the AMD-8151 AGP 8X tunnel (AGP: 32 bits @ 533MHz) and on to the AMD-8111 I/O hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE)]
  - System strengths:
    - Memory latency, bandwidth and memory reach: 2^40 bytes physical (1 terabyte), 2^48 bytes virtual
    - I/O latency and bandwidth: ~1600 MT/s (6.4 GB/s)
    - 64-bit CPU
    - More reliable: lower chip count, improved machine check, improved error handling
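The memory-reach figures above are plain powers of two. The short Python sketch below evaluates them, assuming "1 Terabyte" means a binary terabyte (2^40 bytes); the helper name `describe` is illustrative only.

```python
# Evaluate the address-space sizes quoted on the slide:
# 2^40 bytes of physical address space and 2^48 bytes of virtual address space.
TB = 2**40  # assumption: "terabyte" taken as a binary terabyte (2^40 bytes)


def describe(bits: int) -> str:
    """Describe a byte-addressed space that is `bits` address bits wide."""
    size = 2**bits
    return f"{bits}-bit: {size:,} bytes = {size // TB} TB"


physical = describe(40)
virtual = describe(48)
print(physical)  # 40-bit: 1,099,511,627,776 bytes = 1 TB
print(virtual)   # 48-bit: 281,474,976,710,656 bytes = 256 TB
```

The virtual space is 256 times the physical reach, so a single process can map far more than the 1TB the memory system can physically address.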
- Slide 44
- 1P AMD Opteron 100 Series
  - [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache and L2 cache; DDR memory controller (72/144-bit interface); HyperTransport (16-bit) replaces the address, data and control bus]
  - 18 CAS lines for 32GB of memory
  - AMD64: 1-way value server
  - 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory
  - CHIPKILL ECC with x4 DRAMs
  - Drives up to 8 registered DIMMs (8 DIMMs at 333MHz)
  - Future memory technology supported as it is defined
  - Up to 4GB x4 DRAMs (4GB DIMMs)
  - Three 16-bit non-coherent HyperTransport technology links
  - On-chip L1 and L2 cache: 64KB L1 I-cache, 64KB L1 D-cache, up to 1MB ECC-protected L2 cache
  - 940-pin PGA package
- Slide 45
- 1P AMD Opteron 100 Desktop Processor System
  - [System diagram: AMD Opteron with 8GB DRAM; 16x16 HyperTransport @ 1600 MT/s links to the AMD-8151 AGP 8X tunnel (AGP: 32 bits @ 533MHz), to AMD-8131 PCI-X tunnels and to the AMD-8111 I/O hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE)]
  - System strengths: ideal for cost-sensitive system designs where I/O is the critical commodity
    - Storage servers
    - Low-end DCC workstations
- Slide 46
- 2P AMD Opteron 200 Series
  - [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache and L2 cache; DDR memory controller (72/144-bit interface); HyperTransport (16-bit) replaces the address, data and control bus]
  - 18 CAS lines for 32GB of memory
  - AMD64: 2-way performance server
  - 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory
  - CHIPKILL ECC with x4 DRAMs
  - Drives up to 8 registered DIMMs (8 DIMMs at 333MHz)
  - Future memory technology supported as it is defined
  - Up to 4GB x4 DRAMs (4GB DIMMs)
  - One coherent and two 16-bit non-coherent HyperTransport technology links
  - On-chip L1 and L2 cache: 64KB L1 I-cache, 64KB L1 D-cache, up to 1MB ECC-protected L2 cache
  - 940-pin PGA package
- Slide 47
- 2P AMD Opteron 200 Server
  - [System diagram: two AMD Opteron processors, each with 8GB DRAM; AMD-8131 PCI-X tunnels; an AMD-8111 I/O hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE); a bridge or SSL/IPSec device]
  - System strengths: ideal for systems where a large flat memory is important (16GB of SMP memory)
    - Data mining
    - Relational database applications
- Slide 48
- 4P-8P AMD Opteron 800
  - [Block diagram: AMD64 processor core with L1 instruction cache, L1 data cache and L2 cache; DDR memory controller (72/144-bit interface); HyperTransport (16-bit)]
  - AMD64: 4- to 8-way performance server
  - 16-byte memory controller supporting 200, 266 and 333 MHz DDR memory
  - CHIPKILL ECC with x4 DRAMs
  - Drives up to 8 registered DIMMs (8 DIMMs at 333MHz)
  - Future memory technology supported as it is defined
  - Up to 4GB x4 DRAMs (4GB DIMMs)
  - Three 16-bit coherent HyperTransport technology links
  - On-chip L1 and L2 cache: 64KB L1 I-cache, 64KB L1 D-cache, up to 1MB ECC-protected L2 cache
  - 940-pin PGA package
- Slide 49
- AMD Opteron 800 HPC Processing Node
  - HPC strengths:
    - Flat, SMP-like memory model: all four processors reside within the same 2^48 memory map
    - Expandable to 8P NUMA
    - Glueless coherent multiprocessing: low latency and high bandwidth, ~1600 MT/s (6.4 GB/s)
    - 32GB of memory on a high-bandwidth external memory bus (>5.3 GB/s)
    - Native high-bandwidth memory-mapped I/O (>25 Gbit/s)
- Slide 50
- Model Number Implementation
  - First digit = scalability of the AMD Opteron processor
  - Second and third digits = relative performance among AMD Opteron processors
  - Model number conveys directional improvement
  - AMD Opteron 800 Series (up to 8-way): 846 (2.0GHz), 844 (1.8GHz), 842 (1.6GHz), 840 (1.4GHz)
  - AMD Opteron 200 Series (up to 2-way): 246 (2.0GHz), 244 (1.8GHz), 242 (1.6GHz), 240 (1.4GHz)
  - AMD Opteron 100 Series (1-way): 146 (2.0GHz), 144 (1.8GHz)
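The numbering scheme above can be expressed as a small lookup. This is a hypothetical decoder, not an AMD tool: it accepts only the model numbers listed on the slide and copies the clock for each performance index from that list.

```python
# Hypothetical decoder for the three-digit model numbers listed on the slide.
# First digit: scalability (maximum processors per system).
# Second and third digits: relative performance within the Opteron family.
LISTED_MODELS = {846, 844, 842, 840, 246, 244, 242, 240, 146, 144}
MAX_PROCESSORS = {8: 8, 2: 2, 1: 1}               # 800, 200 and 100 Series
CLOCK_GHZ = {46: 2.0, 44: 1.8, 42: 1.6, 40: 1.4}  # clocks as listed on the slide


def decode_model(model: int) -> dict:
    """Split a model number into series, scalability and performance index."""
    if model not in LISTED_MODELS:
        raise ValueError(f"model {model} is not listed on the slide")
    first, perf = divmod(model, 100)
    return {
        "series": first * 100,
        "max_processors": MAX_PROCESSORS[first],
        "perf_index": perf,
        "clock_ghz": CLOCK_GHZ[perf],
    }


print(decode_model(246))
# {'series': 200, 'max_processors': 2, 'perf_index': 46, 'clock_ghz': 2.0}
```

Within each series a higher performance index maps to a higher clock here, which is the "directional improvement" the slide describes; the index-to-clock table holds only for the parts listed.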
- Slide 51
- Price Performance Positioning
  - [Chart: performance vs. price positioning of the AMD Opteron 100, 200 and 800 Series, with 256K and 1M L2 cache variants]
  - A solution unto itself
- Slide 52
- Opteron Processor Architecture
- Slide 53
- The Elements of the CPU
  - [Block diagram of the processor core:]
  - Front end: 64KB L1 instruction cache, branch prediction, fetch, scan/align, Fastpath and microcode engine, Instruction Control Unit (72 entries)
  - Integer: decode & rename, three AGU/ALU pairs, multiplier, result bus
  - Floating point: decode & rename, 36-entry FP scheduler feeding the FADD, FMUL and FMISC units
  - Memory: 64KB L1 data cache, 44-entry load/store queue, L2 cache, bus unit
  - Integrated northbridge: System Request Queue, crossbar, memory controller, HyperTransport
- Slide 54
- Processor Throughput
  - Supply 16 instruction bytes to the decoder per cycle
  - Convert x86 instructions to fixed-length OPs
  - Dispatch 3 OPs per cycle to the integer/FP schedulers (24-entry integer scheduler)
  - Instructions use one of two decoding pipelines:
    - Fastpath: instructions that decode into two or fewer µOPs are decoded by hardware and then packed into 3 dispatch positions
    - Microcode: for x86 instructions that decode into more than two µOPs, the decoder calculates a microcode ROM entry point and fetches the sequence from the microcode ROM
  - Compared to the AMD Athlon XP, more instructions use the Fastpath
    - E.g., packed SSE is microcoded on the AMD Athlon XP but takes the Fastpath on AMD Opteron processors
    - AMD Opteron has 8% fewer microcoded instructions for SPECint2000
    - AMD Opteron has 28% fewer microcoded instructions for SPECfp2000
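The decode rule above can be sketched as a toy model. It is not AMD's decoder: the per-instruction micro-op counts in `trace` are invented, and reading "8% fewer" as a relative reduction in the microcoded share is an assumption.

```python
# Toy model of the two decode pipelines (not AMD's actual decoder logic):
# an x86 instruction that decodes into two or fewer micro-ops uses the
# hardware Fastpath; anything longer is sequenced from the microcode ROM.
FASTPATH_LIMIT = 2  # "two or fewer" micro-ops, per the slide


def decode_path(uop_count: int) -> str:
    if uop_count < 1:
        raise ValueError("an instruction decodes into at least one micro-op")
    return "fastpath" if uop_count <= FASTPATH_LIMIT else "microcode"


def microcoded_fraction(uop_counts) -> float:
    """Share of instructions in a trace that take the microcode path."""
    counts = list(uop_counts)
    return sum(decode_path(n) == "microcode" for n in counts) / len(counts)


def relative_reduction(before: float, after: float) -> float:
    """E.g. 0.08 for '8% fewer' when comparing two microcoded fractions."""
    return (before - after) / before


trace = [1, 1, 2, 3, 1, 4, 2, 1]  # hypothetical micro-op counts per instruction
print(decode_path(2), microcoded_fraction(trace))  # fastpath 0.25
```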
- Slide 55
- Floating Point & Integer Performance
  - FPU throughput
    - SSE2, x87: theoretical (1 mul + 1 add)/cycle; realized 1.9 FLOPs/cycle
    - SSE, 3DNow!: theoretical (2 mul + 2 add)/cycle; realized 3.4+ FLOPs/cycle
  - 32-bit integer throughput: 1 add per clock cycle, 1 multiply per clock cycle; multiply latency has shrunk from 5 cycles on the AMD Athlon to 3 cycles on the AMD Opteron
  - 64-bit integer throughput: 1 add per clock cycle, 1 multiply every other clock cycle; multiply latency is 4 cycles
  - Integer instruction scheduler: out-of-order (OOO) from a queue of 24* integer macro-ops
    - *The AMD Athlon instruction scheduler is 18 macro-ops deep
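The per-cycle FPU figures above translate into GFLOPS once a clock is fixed. This sketch assumes a 2.0 GHz clock (the fastest part on the model-number slide); the dictionary and function names are illustrative.

```python
# Peak and realized floating-point rates from the per-cycle figures on the slide.
# ASSUMPTION: a 2.0 GHz core clock (the fastest part on the model-number slide).
CLOCK_HZ = 2.0e9

FLOPS_PER_CYCLE = {
    # unit: (theoretical, realized) FLOPs per cycle, as quoted on the slide
    "SSE2/x87": (2.0, 1.9),    # 1 mul + 1 add per cycle
    "SSE/3DNow!": (4.0, 3.4),  # 2 mul + 2 add per cycle
}


def gflops(flops_per_cycle: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a per-cycle FLOP count into GFLOPS at the given clock."""
    return flops_per_cycle * clock_hz / 1e9


for unit, (peak, realized) in FLOPS_PER_CYCLE.items():
    print(f"{unit}: peak {gflops(peak):.1f} GFLOPS, "
          f"realized {gflops(realized):.1f} GFLOPS ({realized / peak:.0%} of peak)")
```

Four 2 GHz processors at the SSE2/x87 peak give 4 x 4.0 = 16 GFLOPS, the same figure the deck quotes for its 4P nodes.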
- Slide 56
- Internal Caching
  - L1 caches: 64KB each for instructions and data; 2-way set associative; the data cache is ECC protected, the instruction cache is parity protected
  - L2 cache: caches instruction and data streams; 16-way set associative; ECC protected; >2X the AMD Athlon XP L2-to-L1 bandwidth
  - Improved translation look-aside buffers (TLBs) for large multiprocessor workloads: twice the size and lower latencies than the AMD Athlon XP
    - L2 TLB: 512 entries, 4-way associative
    - L1 TLB: 32 entries each for instructions and data, fully associative
  - Machine check architecture for reporting failures
  - [Block diagram: L1 instruction cache (64KB), L1 data cache (64KB), 44-entry load/store queue, L2 cache, bus unit]
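The cache and TLB geometries above determine how an address is split into tag, set index and line offset. This sketch assumes 64-byte cache lines and 4KB pages, neither of which is stated on the slide; the function names are illustrative.

```python
# Set and index arithmetic for the cache and TLB geometries listed above.
# ASSUMPTIONS: 64-byte cache lines and 4KB pages (the slide gives neither).
LINE_BYTES = 64
PAGE_BYTES = 4096


def num_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets = capacity / (associativity x line size)."""
    return capacity_bytes // (ways * line_bytes)


def split_address(addr: int, sets: int, line_bytes: int = LINE_BYTES):
    """Return (tag, set index, byte offset); sets and line size are powers of two."""
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


l1_sets = num_sets(64 * 1024, ways=2)  # 64KB, 2-way -> 512 sets
l2_tlb_sets = 512 // 4                 # 512 entries, 4-way -> 128 sets
l2_tlb_reach = 512 * PAGE_BYTES        # 2MB mapped by the L2 TLB with 4KB pages
print(l1_sets, l2_tlb_sets, l2_tlb_reach, split_address(0x12345, l1_sets))
# 512 128 2097152 (2, 141, 5)
```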
- Slide 57
- Reliability Features
  - L1 cache
    - Data cache is ECC protected via a background scrubber
    - Instruction cache is parity protected on read/write
  - L2 cache
    - Cache tag arrays are ECC protected via a background scrubber
    - Instructions are parity protected, data is ECC protected
    - ECC bits are reused for branch prediction and instruction decode (end bits)
  - DRAM is ECC protected with chipkill ECC support
    - Each fetch is parity checked
    - ECC via scrubber; the scrub period is user-programmable from 40ns to 84µs
  - Remaining arrays are parity protected: instruction cache tags and TLBs, data tags and TLBs (generally read-only data that can be recovered)
  - Machine Check Architecture: reports failures and predictive-failure results (ECC, branch predictor, ThermTrip, memory scrubbers)
- Slide 58
- Branch Prediction Improvements
  - Full L1 cache coverage, with twice the selectors of the AMD Athlon XP
  - 4K branch target addresses, backed up by a branch address calculator (4-cycle correction for unconditional relative branches)
  - 16K bimodal counters: four times the AMD Athlon XP
  - Full pre-decode and branch identification in the L2 cache, new and unique to the AMD Opteron family of processors; reuses L2 ECC bits on clean/shared instruction lines, plus one extra bit
  - [Block diagram: branch prediction, fetch, scan/align, Fastpath and microcode engine feeding OPs to the Instruction Control Unit (72 entries)]
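The "bimodal counters" above are, in the general technique, 2-bit saturating counters indexed by branch address. This is a generic sketch of that technique: the 16K-entry size comes from the slide, but indexing by low address bits and the branch address used are assumptions, not AMD's actual scheme.

```python
# Generic bimodal branch predictor: a table of 2-bit saturating counters.
# The 16K-entry size comes from the slide; indexing by the low bits of the
# branch address is an ASSUMPTION for illustration, not AMD's actual scheme.
class BimodalPredictor:
    def __init__(self, entries: int = 16 * 1024):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.mask = entries - 1
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


bp = BimodalPredictor()
pc = 0x401000                                 # hypothetical branch address
outcomes = [True] * 8 + [False] + [True] * 8  # mostly taken, one not-taken
hits = 0
for taken in outcomes:
    hits += int(bp.predict(pc) == taken)
    bp.update(pc, taken)
print(f"{hits}/{len(outcomes)} correct")  # 15/17 correct
```

The single not-taken outcome costs only one misprediction: the counter drops from 3 to 2 and still predicts taken. That hysteresis is the usual reason for 2-bit rather than 1-bit counters.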
- Slide 59
- Integrated Northbridge
- Slide 60
- Firmware View of the Northbridge
  - Performs the same functions found in a northbridge:
    - Memory controller fully integrated
    - Host-bridge function as defined by the PCI spec
    - PCI-to-PCI bridge as defined by the PCI spec
    - Graphics Address Remapping Table (GART)
    - Multiprocessor coherency
  - Controlled via PCI configuration registers: memory controller configuration, HyperTransport technology routing
  - Configured by firmware
    - HyperTransport initialization via hardware: auto-size, coherent or non-coherent, legacy path to the ROM in the southbridge
    - HyperTransport technology speed and routing via firmware
  - Everything else in firmware follows existing paradigms: PCI enumeration, memory sizing and configuration, I/O controller setup
  - [Block diagram: System Request Queue, crossbar, memory controller, HyperTransport]
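Because the northbridge is controlled through PCI configuration registers, firmware reaches it with ordinary PCI configuration cycles. This sketch only builds the configuration-mechanism #1 address (written to I/O port 0xCF8, with data through 0xCFC) and touches no hardware; the placement of node N's northbridge at bus 0, device 0x18 + N, functions 0-3 is stated here as an assumption.

```python
# Compute the PCI configuration-mechanism #1 address written to I/O port 0xCF8
# to reach an AMD Opteron northbridge register; the data is then read or
# written through port 0xCFC. This only builds the address, it touches no hardware.
# ASSUMPTION: node N's northbridge appears on bus 0 as device 0x18 + N,
# functions 0-3 (HyperTransport config, address map, DRAM controller, misc control).


def config_address(bus: int, device: int, function: int, register: int) -> int:
    """Encode enable bit, bus, device, function and dword-aligned register."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("field out of range")
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (register & 0xFC)


def northbridge_address(node: int, function: int, register: int) -> int:
    """Address of a register in the northbridge of processor `node` (0-7)."""
    if not 0 <= node < 8:
        raise ValueError("node must be 0-7")
    return config_address(0, 0x18 + node, function, register)


print(hex(northbridge_address(0, 0, 0x00)))  # 0x8000c000
print(hex(northbridge_address(1, 2, 0x90)))  # 0x8000ca90
```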
- Slide 61
- Systems View of the Northbridge (assumes a 2GHz processor clock)
- Slide 62
- HyperTransport Technology
  - Screaming I/O for chip-to-chip communication
    - High bandwidth
    - Point-to-point links
    - Split transaction and full duplex
    - Differential signaling
    - Tunneling capability
  - HyperTransport links
    - Three 16-bit links (3.2 GB/s per direction)
    - Reduced pin count compared to typical bus-based systems
    - Compatible with high-volume PC board infrastructure
    - Each can be a cHT: coherent (processor-to-processor) link or an ncHT: non-coherent (processor-to-I/O) link
  - Enables scalable 2-8 processor cache-coherent MP systems (glueless MP)
  - For more info see: http://www.HyperTransport.org/
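The 3.2 GB/s per-direction figure above follows from link width times transfer rate. This sketch counts raw signalling bandwidth only and ignores packet and protocol overhead, which is an assumption of the calculation.

```python
# Peak HyperTransport bandwidth per direction = link width (bytes) x transfer rate.
# ASSUMPTION: raw signalling rate only; packet/protocol overhead is ignored.
def ht_bandwidth_gbs(width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak one-direction bandwidth in GB/s (10^9 bytes per second)."""
    return width_bits / 8 * mega_transfers_per_s * 1e6 / 1e9


per_direction = ht_bandwidth_gbs(16, 1600)  # 16-bit link at 1600 MT/s
aggregate = 2 * per_direction               # links are full duplex
print(f"{per_direction:.1f} GB/s per direction, {aggregate:.1f} GB/s aggregate")
```

The same formula gives 0.4 GB/s per direction (0.8 GB/s aggregate) for an 8-bit link at 400 MT/s, which matches the 200 MHz DDR interface the deck quotes for the AMD-8111 I/O hub.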
- Slide 63
- Performance
- Slide 64
- Multi-Processor Performance Evaluation: Simulation Parameters
  - Microbenchmark simulations: RTL based, cycle accurate, DRAM page hit
  - System parameters:
    - AMD Opteron 2 GHz CPU
    - Memory clock = 333 MHz data rate (registered PC2700 DDR memory)
    - DRAM width = 128 bits, interleaved
    - CAS latency = 2.5 memory clocks
    - HT frequency = 1600 MHz data rate (16 bits)
    - DDR peak bandwidth = 5.4 GB/s
    - HT peak bandwidth = 3.2 GB/s (each direction)
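The DDR parameters above can be turned into a peak bandwidth and a CAS latency in nanoseconds. This sketch takes "333 MHz data rate" as the exact DDR333 rate of 1000/3 MT/s on a clock of half that frequency; the variable names are illustrative.

```python
# Peak DDR bandwidth and CAS latency in nanoseconds for the simulated system.
# PC2700 (DDR333): ~333.3 MT/s data rate on a ~166.7 MHz clock, two transfers per clock.
DATA_RATE_MTS = 1000 / 3   # "333 MHz data rate"
BUS_BITS = 128             # 128-bit interleaved DRAM interface
CAS_CLOCKS = 2.5           # CAS latency in memory clocks

clock_mhz = DATA_RATE_MTS / 2
peak_gbs = BUS_BITS / 8 * DATA_RATE_MTS * 1e6 / 1e9
cas_ns = CAS_CLOCKS * 1000 / clock_mhz
print(f"peak {peak_gbs:.2f} GB/s, CAS latency {cas_ns:.1f} ns")
# peak 5.33 GB/s, CAS latency 15.0 ns
```

The exact figure is about 5.33 GB/s; the slide's 5.4 GB/s follows the PC2700 module name, which rounds each 64-bit channel's ~2.67 GB/s up to 2.7 GB/s.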
- Slide 65
- SPECint Performance
  - [Chart: SPECint2000 score (400-1300) vs. operating frequency (1000-3000 MHz), comparing AMD Opteron processor estimates with Intel Xeon processor results*]
  - *Source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
  - AMD Opteron estimates based on 2GHz lab hardware using 32-bit binaries
- Slide 66
- SPECfp Performance Comparison
  - [Chart: SPECfp2000 score (400-1500) vs. operating frequency (1000-5000 MHz), comparing AMD Opteron processor estimates with Intel Xeon processor results*; callouts mark frequency gaps of ~400 MHz and ~1100 MHz between points A and B]
  - *Source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
  - AMD Opteron estimates based on 2GHz lab hardware using 32-bit binaries
- Slide 67
- SPECfp2000 Base Competitive Summary (32-bit Windows, PC2700 CAS2.5)
  - [Chart: SPECfp2000 score (0-1400) vs. CPU frequency (0-3 GHz); series: Base (IA32), Peak (IA32) and AMD Opteron processor (estimated performance); comparison points for AMD Opteron, P4 400FSB, P4 533FSB and PIII 133FSB; annotation: AMD Opteron redesign effort]
  - Source: http://www.spec.org
- Slide 68
- AMD Opteron SPEC Projections Compared to Alpha EV7
  - AMD Opteron should be more cost-effective than the Alpha EV7
    - Standards versus proprietary
    - Millions per month versus 100s
- Slide 69
- AMD Opteron SPEC Projections Compared to Itanium-2
  - AMD Opteron will be more cost-effective than Itanium-2
    - Standards versus proprietary
    - Millions per month versus 1,000s
- Slide 70
- Integrated Memory Controller Latency (local memory access, registered memory, CAS 2.5)
  - 1.6GHz, PC2700: 65ns (L1 cache miss, TLB hit); 85-95ns (L1 cache miss, TLB miss)
  - [Chart: access time (ns) vs. block size (bytes) for several strides (bytes); markers at 32k and >1M]
- Slide 74
- Sufficiently Uniform Memory Organization (SUMO)
  - Disadvantages
    - 3P and 4P nodes work better if the OS is aware of the memory map
    - >4P may require a NUMA-aware OS if the cache hit rate is low
  - Advantages
    - Software view of memory is SMP
    - Latency difference between local and remote memory is a function of the number of processors in the node
      - 1P and 2P look like an SMP machine
      - 3P and 4P are NUMA-like but can still be viewed as a ccUMA or asymmetric SMP node
      - >4P can be viewed as ccUMA and, depending on the cache hit rate, may or may not require a NUMA-aware OS
    - Physical address space is flat and can be viewed as fully coherent or not (MOESI states)
    - DRAM can be contiguous or interleaved
    - Additional processor nodes bring a true increase in memory bandwidth
    - Designed for lower overall system chip count (glueless interface)
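One way to see why local/remote latency differences grow with the processor count is to average the hop distance between each processor and each node's memory. This sketch assumes remote latency grows with hop count, that every node's memory is accessed equally, and that the 4P system is wired as a square; the slide states none of these.

```python
# Average hop distance from a processor to memory on every node (local = 0 hops),
# found by breadth-first search over the coherent HyperTransport links.
# ASSUMPTIONS: remote latency grows with hop count, and the 4P system is wired
# as a square (each processor linked to two others); the slide shows neither.
from collections import deque


def hops_from(src, links):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_hops(links):
    """Mean hops over all (requesting node, memory node) pairs."""
    nodes = list(links)
    total = sum(sum(hops_from(n, links).values()) for n in nodes)
    return total / len(nodes) ** 2


TOPOLOGIES = {
    "1P": {0: []},
    "2P": {0: [1], 1: [0]},
    "4P square": {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]},
}
for name, links in TOPOLOGIES.items():
    print(f"{name}: {average_hops(links):.2f} average hops")
```

Under these assumptions the average distance rises from 0 (1P) to 0.5 (2P) to 1.0 (4P), which is consistent with 1P and 2P behaving like SMP while 3P and 4P become NUMA-like.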
- Slide 75
- Future NUMA Systems: Scaling Beyond 8 Processors
  - Scaling beyond 8P is enabled by an external coherent HyperTransport switch
    - Coherent interconnect
    - Snoop filter
    - Data caching
  - Up to 16 processors within the same 2^40 SMP memory space
  - [Diagram: groups of 4P nodes attached through switches SW0-SW3 to an interconnect fabric]
- Slide 76
- AMD Opteron Support ICs
- Slide 77
- AMD Opteron Support ICs
  - AMD is committed to delivering the highest-quality system solutions
  - Providing a family of x86-64 processors is just the start
  - AMD will promote and enable a broad range of HyperTransport support silicon from internal and external design efforts
  - AMD, with the HyperTransport Consortium, will grow the HyperTransport ecosystem
- Slide 78
- HyperTransport Technology Consortium
- Slide 79
- AMD-8131 HyperTransport PCI-X Tunnel
  - Dual PCI-X master: each PCI-X bridge independently supports
    - 66, 100 and 133MHz PCI-X protocol
    - 33 and 66MHz PCI 2.2 protocol
    - SHPC controller
    - 64-bit data path
    - IOAPIC
    - Arbiter for up to 5 masters
    - Hot-swap
  - HyperTransport support: 16/16 up, 8/8 down, independent support for up to 1600MT/s up and down; full link auto-sizing and speed selection
  - 829 OBGA, 37.5mm body, 1.27mm pitch, full array, 6-layer motherboard breakout
  - [System diagram: AMD Opteron or AMD Athlon 64 connected by 16x16 HyperTransport @ 1600MT/s to the AMD-8131 dual PCI-X tunnel, which connects by 8x8 HyperTransport @ 800MT/s to the AMD-8111 I/O hub (FLASH, SIO, LPC 32 bits @ 33MHz, USB 1.0/2.0, AC97, UDMA100, 10/100 Ethernet, 10/100 PHY, 100BaseT)]
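A quick back-of-envelope check shows why the tunnel's 16-bit upstream link can carry both PCI-X bridges. This sketch compares peak bus rates only, takes "133MHz" as 400/3 MHz, and ignores protocol overhead; all of these are assumptions.

```python
# Compare the peak load of the two PCI-X bridges with the 16-bit upstream
# HyperTransport link. ASSUMPTIONS: peak bus rates only, protocol overhead
# ignored, and "133MHz" taken as 400/3 MHz.
def bus_gbs(width_bits: int, mhz: float) -> float:
    """Peak transfer rate in GB/s for a bus of the given width and rate."""
    return width_bits / 8 * mhz * 1e6 / 1e9


pcix_133 = bus_gbs(64, 400 / 3)  # one PCI-X 64/133 bridge, ~1.07 GB/s
both_bridges = 2 * pcix_133      # ~2.13 GB/s if both run flat out
ht_up = bus_gbs(16, 1600)        # 3.2 GB/s per direction on the 16/16 link
headroom = ht_up - both_bridges
print(f"PCI-X total {both_bridges:.2f} GB/s vs HT {ht_up:.1f} GB/s")
```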
- Slide 80
- AMD-8111 HyperTransport I/O Hub
  - I/O hub engineered from past successful AMD I/O hub development efforts
  - 8x8-wide 200 MHz DDR HyperTransport technology interface (800MB/s aggregate bandwidth)
  - Enhanced 10/100 Ethernet MAC
  - USB 1.1, USB 2.0, EDMA, AC97
  - LPC for BIOS ROM and Super I/O
  - PCI version 2.2, 33/32 bridge (legacy), supporting arbitration of up to 8 external masters
  - SMBus 1.0 and 2.0 controllers
  - 492 PBGA, 35x35mm body, 1.27mm pitch
  - [Diagram: AMD-8111 I/O hub on an 8x8 HyperTransport link @ 800MHz, with FLASH, SIO, LPC (32 bits @ 33MHz), NIC (10/100 BaseT), USB 1.1/2.0, AC97, MII and EIDE]
- Slide 81
- AMD-8151 HyperTransport AGP Tunnel
  - 8x AGP, fully AGP 3.0 compliant; 66, 133, 266 and 533MHz operation
  - HyperTransport support: 16/16 up, 8/8 down, independent support for up to 1600MT/s up and up to 800MT/s down; full link auto-sizing and speed selection
  - 564 OBGA, 31x31mm body, 1.27mm pitch, full array
  - [System diagram: AMD Opteron or AMD Athlon 64 linked to the AMD-8151 HyperTransport AGP tunnel (8x AGP to graphics), which connects by 8x8 HyperTransport @ 800MT/s to the AMD-8111 I/O hub (FLASH, SIO, LPC 32 bits @ 33MHz, USB 1.0/2.0, AC97, UDMA100, 10/100 Ethernet, 10/100 PHY, 100BaseT)]
- Slide 82
- Opteron & Athlon Server Chipset Roadmap
  - [Roadmap chart, 2H02 through 2005: AMD-760MP/MPX (7th generation); AMD-8111 HyperTransport I/O Hub, AMD-8151 HyperTransport AGP Tunnel and AMD-8131 HyperTransport PCI-X Tunnel with 2 PCI-X bridges (8th generation); followed by a second-generation HyperTransport PCI device and a second-generation HyperTransport I/O hub]
- Slide 83
- Desktop Infrastructure Roadmap: Athlon 64 Desktop Chipset Roadmap
- Slide 84
- A Growing Ecosystem of HyperTransport-Enabled ICs
  - Available today:
    - Dual MIPS processor: Broadcom BCM1250
    - PCI 66/64 bridge from Alliance Semiconductor
    - NITROX Security Macro Processor from Cavium Networks
    - FPGAs from Xilinx and Altera
  - Announced:
    - RM9000 MIPS processor from PMC-Sierra
    - 4-port 8/8 HyperTransport switch with swap support from Alliance Semiconductor
    - SSL/TLS record processing systems: Broadcom BCM5850
    - Luminance Modular Array Technology: Lightspeed Semiconductor
  - Planned:
    - InfiniBand bridge
    - Proprietary high-speed interconnect
    - 4-port 16/16 non-coherent switch
    - 4-port 16/16 coherent switch
    - PCI-X bridges
- Slide 85
- HyperTransport Technology 4-Way 16/16 Non-Coherent Switch
  - Extends the fabric by re-mapping Unit_IDs at each port
  - Tracks the path of packets that pass through it, guaranteeing the same return path
    - Records the incoming Unit_ID so it can be restored in the response packet
  - Follows the same rules as the processor host interface
  - Peer-to-peer through the switch, freeing up the host
  - Facilitates multiple host fabrics
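The Unit_ID bookkeeping described above can be illustrated with a small model. Everything here is hypothetical (class and field names, the 32-entry tag pool, the allocation policy); it only shows the idea of rewriting the Unit_ID on the way out and restoring it on the response.

```python
# Toy model of the switch bookkeeping described above. The switch re-maps the
# Unit_ID of each request it forwards, records the original Unit_ID and source
# tag, and restores them in the response so it returns along the same path.
# HYPOTHETICAL: class/field names, the 32-tag pool and the tag-allocation policy.
from dataclasses import dataclass


@dataclass
class Packet:
    unit_id: int
    tag: int      # source tag identifying an outstanding transaction
    payload: str


class NonCoherentSwitch:
    def __init__(self, switch_unit_id: int, max_tags: int = 32):
        self.switch_unit_id = switch_unit_id
        self.free_tags = list(range(max_tags))
        self.pending = {}  # outgoing tag -> (ingress port, original Unit_ID, original tag)

    def forward_request(self, ingress_port: int, pkt: Packet) -> Packet:
        if not self.free_tags:
            raise RuntimeError("no free tags; the request must wait")
        out_tag = self.free_tags.pop(0)
        self.pending[out_tag] = (ingress_port, pkt.unit_id, pkt.tag)
        return Packet(self.switch_unit_id, out_tag, pkt.payload)

    def return_response(self, pkt: Packet):
        port, unit_id, tag = self.pending.pop(pkt.tag)
        self.free_tags.append(pkt.tag)
        return port, Packet(unit_id, tag, pkt.payload)


sw = NonCoherentSwitch(switch_unit_id=4)
fwd = sw.forward_request(ingress_port=2, pkt=Packet(unit_id=1, tag=7, payload="read"))
port, resp = sw.return_response(Packet(fwd.unit_id, fwd.tag, "data"))
print(fwd, port, resp)
```

Allocating the switch's own tags keeps requests from different ports from colliding even if two downstream devices happen to use the same source tag.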
- Slide 86
- Nine-Channel GigE Firewall
  - [System diagram: AMD Opteron with a 16x16 HyperTransport link @ 1600MT/s and 8x8 HyperTransport links (1000M transfers/s; 400MT/s to the AMD-8111 I/O hub); AMD-8131 PCI-X tunnels providing PCI-X at 64 bits @ 133MHz; AMD-8111 I/O hub with FLASH, LPC, legacy PCI, USB 1.0, AC97, UDMA133, MII, 10/100 PHY and 100BaseT management LAN; Zircon BMC, SIO, PCI graphics (VGA)]
- Slide 87
- AMD Opteron DP: 2P Server with SSL/IPsec Encryption
  - [System diagram: SP1011 PCI bridge (PCI 66/64), SP 8/8 switch, Security Macro Processor, RM9000x2 with DDR SDRAM on a SysAD bus]
- Slide 88
- 1U/1P AMD Opteron Server
- Slide 89
- 1U/2P AMD Opteron Server
- Slide 90
- 4P Coherent System Based on Two 2P MP Nodes
  - [System diagram: four AMD Opteron DP processors, each with 200-333MHz 9-byte registered DDR (8GB DRAM); 16x16 HyperTransport @ 1600MT/s; AMD-8131 PCI-X tunnels; AMD-8111 I/O hubs with FLASH, LPC, legacy PCI, USB 1.0, AC97, UDMA133, MII, 10/100 PHY, 100BaseT management LAN, management SIO and PCI graphics (VGA); Horis probe directory SRAM]
- Slide 91
- AMD Opteron Beowulf 4P SMP Processing Node
  - [System diagram: four AMD Opteron processors, each with 200-333MHz 9-byte registered DDR (8GB DRAM); AMD-8111 I/O hubs with FLASH, LPC, legacy PCI, USB 1.0, AC97, UDMA133, MII, 10/100 PHY, 100BaseT management LAN, management SIO and PCI graphics (VGA); 16x16 HyperTransport @ 1600MT/s to AMD-8131 PCI-X tunnels and on to the next AMD-8131 tunnel]
  - One 4P SMP node: 16G-flops, 32GB DRAM, 10GB/s memory bandwidth
- Slide 92
- HyperTransport Technology on the Backplane
  - [Diagram: non-coherent interconnect built from SI4041 switches, with 4P blades, switches and the AMD-8111 on the backplane; hot-swap connection]
- Slide 93
- 2- to 8-Processor System Topologies (NUMA)
- Slide 94
- 2P Server with Add-on Accelerator Daughter Card
  - [System diagram: AMD Opteron processors with 200-333MHz 72-bit registered DDR (8GB DRAM); AMD-8131 PCI-X tunnels; AMD-8111 I/O hub (FLASH, SIO, LPC, PCI 33/32, USB 1.1/2.0, AC97, ACR 1.0, GMII, EIDE, NIC 10/100); 8x8 HyperTransport @ 1.6GB/s to a HyperTransport-enabled daughter card carrying a Luminance Modular Array ASIC interface device]
- Slide 95
- AMD Athlon 64 1P Blade Design
  - [Diagram: AMD Athlon 64 with 4GB DRAM, linked by 16x16 HyperTransport @ 1,000MT/s to a Luminance Modular Array ASIC interface device with an HCA interface and boot ROM]
  - Ultra-low-cost blade design: 4GB 333MHz DRAM, 2GHz processor, ~35 watts
  - The Luminance device boots the processor and provides the HCA network interface
- Slide 96
- AMD Opteron Processor DP 2P Graphics Workstation
- Slide 97
- 2P AMD Opteron Processor Graphics Workstation (Cave)
- Slide 98
- High-Density SprayCooled Blade Configuration
  - 4P 16G-flop blade design
  - 64GB of SMP DRAM
  - ASIC boots the 4P unit
  - PCI-X provides all I/O
  - Vapor cooled in a sealed enclosure
  - External VRM
- Slide 99
- How ISR SprayCool Technology Works
  - a. As the electronics are sprayed, the fluid vaporizes, cooling the electronics to a low, stable temperature
  - b. Vapor travels through the heat exchanger to be condensed
  - c. Fluid collects in a reservoir
  - d. Fluid is purified by the filtration system
  - e. Fluid is pumped back into the electronics in a continuous cycle
  - f. A sealed enclosure protects the electronics from dust, dirt and salt air
- Slide 100
- High Density HPC Cluster: SprayCool Technology from ISR
  - 16 cards at 16G-flops per card: 256G-flops peak throughput
  - 64GB of memory per card: 1 terabyte of system memory
  - ~2,240 cubic inches (14 x 10 x 16 in.)
  - 114M-flops per cubic inch
  - ~0.46GB of memory storage per cubic inch
  - ~6K watts: ~3 watts per cubic inch
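The density figures above can be recomputed from the per-card numbers. This sketch assumes the "14 10 16" labels on the slide are the enclosure's dimensions in inches; with that volume, the slide's 114M-flops and ~3 watts per cubic inch both check out.

```python
# Recompute the cluster's densities from its per-card figures.
# ASSUMPTION: the "14 10 16" labels on the slide are the enclosure's
# dimensions in inches (14 x 10 x 16 = 2,240 cubic inches).
CARDS = 16
GFLOPS_PER_CARD = 16
GB_PER_CARD = 64
WATTS = 6000
VOLUME_IN3 = 14 * 10 * 16

peak_gflops = CARDS * GFLOPS_PER_CARD  # 256 G-flops
memory_gb = CARDS * GB_PER_CARD        # 1024 GB (1 TB)
mflops_per_in3 = peak_gflops * 1000 / VOLUME_IN3
gb_per_in3 = memory_gb / VOLUME_IN3
watts_per_in3 = WATTS / VOLUME_IN3
print(f"{mflops_per_in3:.0f} M-flops/in^3, {gb_per_in3:.2f} GB/in^3, "
      f"{watts_per_in3:.1f} W/in^3")
# 114 M-flops/in^3, 0.46 GB/in^3, 2.7 W/in^3
```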
- Slide 101
- AMD Reference Design Kits
- Slide 102
- Four Hardware Platforms
  - Solo (AMD): 1P AMD Opteron motherboard for desktop applications
  - Serenade (AMD): 2P AMD Opteron system board for HPC and server applications
  - Quartet (AMD): 4U-4P AMD Opteron system board for HPC and server applications
  - Khepri (Newisys): 1U-2P AMD Opteron server board
- Slide 103
- Solo Features
  - AMD Athlon 64 uniprocessor
  - Two unbuffered PC2700/PC2100 DDR DIMMs
  - AMD-8151 AGP 8X HyperTransport tunnel
  - AMD-8111 I/O hub
  - Four PCI 32-bit 33MHz slots
  - Two ATA-100 EIDE connectors
  - Six USB 2.0 ports: 3 on the back panel, 2 on the front panel and 1 on the ACR
  - AC'97 audio
  - SMBus 1.0 and 2.0 support
  - One ACR slot
  - 1 fan with sense and 1 fan without sense
  - Floppy, serial, parallel, 2 PS/2 and 2 IEEE 1394a connectors
  - LPC Super I/O with 2 fans with sense
  - 4-layer ATX form factor with ATX power supply
  - PC2001, WHQL, Energy Star, WfM 2.0 compliant
- Slide 104
- Hammer Performance Desktop (Solo RDK)
- Slide 105
- 1U/2P Serenade
  - CPU/memory complex
    - AMD Opteron processor 200 Series (supports up to 2 processors)
    - Four banks of 128-bit registered DDR memory per CPU (DDR 200-333)
  - I/O
    - Full-size PCI-X slots: two PCI-X 64/100 MHz or one PCI-X 64/133 (not hot-pluggable)
    - One mini-PCI slot
    - Dual Broadcom 10/100/1000 Ethernet onboard
    - Dual LSI U320 SCSI (one channel to disk, one channel to rear expansion)
    - Single USB 1.1 to the front
    - SIO (floppy, serial, keyboard, mouse)
  - Management
    - Single dedicated 10/100 management LAN
    - Optional BMC management controller, IPMI 1.5 compliant
  - Storage
    - Dual drive bays: (standard) IDE or (standard or hot-swap) SCSI drives
    - Slim-line IDE CD-ROM or slim-line floppy drive
  - Physicals
    - 1U rack-mount server form factor, tool-less access, full-extension slide rails
    - Single 500W power supply, line cord accessible at the rear
    - Removable blowers; cooling performed front-to-rear (passive CPU heatsinks)
    - Front LED panel with activity and status: PWR, RESET, USB, PCI-Video
    - Dimensions: 1U H x 19 in. W x 28 in. D
- Slide 106
- 1U/2P Serenade Front View
  - [Labelled photo: 500W power supply; CD-ROM or floppy drive (slim-line); drive carriers (x2, SCSI hot-swappable); 10 redundant blowers (front-to-back cooling); AMD Opteron 200 Series (x2); 32/33MHz PCI (half-height/half-length, video option); 8 DIMMs DDR 266-333 ECC (4 DIMMs per CPU); SCSI disk option (mini-PCI); full-size PCI-X slots (x2) 64/100 MHz or single PCI-X 64/133 (riser with sideband); 28 in. depth]
- Slide 107
- 1U/2P Serenade Rear View
  - [Labelled photo: full-size PCI-X slots (x2) 64/100 MHz or single PCI-X 64/133 module assembly (riser with sideband); AMD Opteron 200 Series (x2); cooling ducts; dedicated 10/100 IPMI management port; dual 10/100/1000 Ethernet; 32/33MHz PCI (half-height/half-length, standard half-height video option); PS/2 ports; U320 SCSI option (mini-PCI); USB port]
- Slide 108
- Quartet: 4U/4P SledgeHammer MP 940-pin Processor
- Slide 109
- Quartet System Features
  - 4U rack-mount server form factor (25 in. deep), EIA standard
  - 4P AMD Opteron (940-pin)
  - Four banks of 128-bit registered DDR memory per CPU (designed for DDR-333), 16 total
  - Five full-size PCI-X slots (AMD-8131): two PCI-X 64/133 MHz (hot-pluggable) and three PCI-X 64/66 MHz
  - Ethernet ports: dual Broadcom 10/100/1000 onboard; single 10/100 (AMD-8111)
  - Dual LSI U320 SCSI (one channel to disk, one channel to rear expansion)
  - System management: QLogic UL BMC, IPMI 1.5 via dedicated LAN/modem
- Slide 110
- Quartet System Features (cont.)
  - Dual IDE: slim-line CD-ROM, slim floppy
  - Dual USB: one front, one rear
  - SIO (floppy, serial, keyboard, mouse)
  - Storage: four 1-inch hot-swap Ultra320 SCSI drives
  - Video: ATI 4MB (via PCI 32/33 card option)
  - Three 500W hot-swap power supplies (2+1 redundancy) for 4U; rear accessible to three line cords
  - Hot-swap redundant fans (10)
  - Front LED panel with activity and status: PWR, RESET, USB, PCI-Video
  - Full-extension slide rails
  - Dimensions: 5.25 in. H x 19 in. W x 28 in. D (*5.25 in. is the main/processor section; an additional 1.75 in. is the power supply bay)
  - Cooling front to rear (passive CPU heatsinks)
  - Tool-less access
- Slide 111
- Khepri: Dual-Processor AMD Opteron System
  - 1U, 2P AMD Opteron
  - 16GB RAM maximum
  - Fully managed
  - Linux 32- and 64-bit; Windows 32-bit 2000 and .NET Server; Windows 64-bit (when available)
- Slide 112
- Khepri Block Diagram
- Slide 113
- Khepri Alpha Internal View
- Slide 114
- Availability
  - Solo (AMD Athlon 64): prototypes are available now; production planned for Sept. 2003
  - Serenade (AMD): development platform RDK available now; production planned for June 2003
  - Quartet (AMD): RDK available June 2003; production planned for Aug. 2003
  - Khepri (Newisys): development units are available now through the AMD Beachhead Program; in production now
- Slide 115
- Platform Enablement Program
  - Over the past 24 months, AMD has provided technical design support to some 50 companies
  - To date, Newisys has enabled over 17 vendors with its Khepri 2P platform reference design
  - By launch (April 2003) there will be 4+ announcements of 4P HPC servers based on AMD Opteron
  - By Nov. 2003 there will be many more vendors with 4P, and up to four vendors with 8P SMP/NUMA AMD Opteron platforms
  - With the availability of a coherent HyperTransport switch, the NUMA server can grow to 32P and beyond
- Slide 116
- 2002-2003 AMD Server Roadmap
  - [Roadmap chart, 4Q02 through 4Q03, across Enterprise Scalable, Basic, Value and Ultra-Value DP/MP systems: AMD Opteron processor SledgeHammer MP (SH MP) and SledgeHammer DP (SH DP) at 1.4 to 2.6 GHz; AMD Athlon MP processor Barton (BAR, 266MHz FSB) at 2.2/2800+; AMD Athlon MP processor Thoroughbred (THR, 266MHz FSB) at 1.67/2000+ to 2.13/2600+]
- Slide 117
- Summary
- Slide 118
- AMD Opteron Processor
  - Optimized for high-performance operation
    - Chip infrastructure optimized for the sub-micron process, impacting power distribution, clocking, circuit design and layout
    - 20-25% better performance per clock than the AMD Athlon XP: smart low-latency memory controller; branch prediction, cache and TLB improvements; advanced clock distribution methods
    - New operand/address sizes, rather than new instructions
  - Integrated DDR memory system controller
    - Closing the gap between external memory access and CPU speed
    - Lower latency than the current state of the art (AMD Athlon processor)
    - Greater bandwidth than the current state of the art (AMD Athlon system)
  - Integrated coherent HyperTransport I/O
    - Supports high-speed peripheral connections: >6.4GB/s throughput
    - Coherent HyperTransport technology supports a glueless MP interface
- Slide 119
- Slide 120
- Trademark Attribution: Copyright 2002 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow Logo, AMD Athlon, AMD Opteron, 3DNow! and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Consortium. MMX is a trademark of Intel Corporation. Other product names used in this presentation are for identification purposes only and may be trademarks of their respective companies.