Customer Training
Implementing, Simulating, &
Debugging External Memory
Interfaces A-MNL-ISDMI-12-0-v1
Table of Contents
Objectives
Agenda
Section 1 – Introduction to Altera’s Memory Solutions
  Memory Selection Criteria
  Double Data Rate Memory Interfaces
  DDR Logic Implementation in Altera FPGAs
  Altera High-Speed Memory Interface IP
  High Performance Controller II
  ALTMEMPHY and UniPHY
  UniPHY Calibration
  Hard Memory Interface
Section 2 – Memory Interface Design Flow in the Quartus II Software
  Parameterize with the MegaWizard™ Plug-In Manager
  Timing Derating
  Quartus II Project Settings
  Exercise 1: Create the design
Section 3 – Functionality and Simulation of a Memory System
  Controller Operation and Connections to User Logic
  Performing a Simulation
  Exercise 2: Simulation of the controller
Section 4 – Board and Termination Considerations
  Creating I/O Assignments
  Board Design and Simulation Basics
  Choosing Optimal Termination Settings
  Recommended Settings
  Exercise 3: Complete the controller
Section 5 – Timing Analysis
  Timing Analysis Methodology
  General Recommendations for Closing Timing
  Exercise 4: Perform timing analysis on the interface and test in hardware
Section 6 – Final Topics
  DDR2/3 Controllers with UniPHY EMIF Toolkit
  Using High Performance Interfaces with Nios II and Qsys
  Multiple Memory Controllers on a Single FPGA
Conclusions
Resources
Appendix
© 2012 Altera Corporation—Confidential
Objectives
Achieve comfort level with Altera® memory interface IP, focusing on DDR3 interfaces
Parameterize and instantiate a High Performance memory controller in a Quartus® II project
Test and debug an external memory interface (EMIF) through:
  Simulation
  Static timing analysis
  External Memory Interface Toolkit
  In-system testing (SignalTap® II embedded logic analyzer)
Apply required I/O and other constraints to the interface
Gain practical experience with the entire design and verification flow through lab exercises
Agenda
Introduction to Altera’s memory interface options
  Source synchronous double data rate (DDR) interfaces
Parameterizing memory controllers in the Quartus II software
Verifying functionality of a DDR interface in simulation
Board and termination considerations
Performing static timing analysis
Final topics
  External Memory Interface Debug Toolkit
  Memory interfaces with a Nios® II processor and Qsys
  Using multiple memory controllers inside an FPGA
Quartus II Software – Two Editions
                    Subscription Edition   Web Edition
Devices supported   All                    Selected devices
Features            100%                   95%
Distribution        Internet & DVD         Internet & DVD
Price               Paid                   Free
Subscription Edition required for memory controller IP
Feature comparison available on Altera web site
Altera’s Complete Memory Solution
Advanced FPGA Architecture
  DQS phase shift circuitry
  Registers in I/O cells
  Feature-rich PLLs & clock management
Memory Controller MegaCore® IP
  Open source datapath
  Reference designs
  Graphical user interface
  Included in the free IP base suite (Subscription Edition)
Software Support
  Automatic generated constraints
  System-level timing analysis
  Spice and IBIS simulation models
Device Handbooks, External Memory Interface Handbook
  Interface description and use
  Timing analysis
  Electrical analysis
  Board design guidelines
Development Kits & Hardware Reference Platforms
  Demo project
  Schematic and gerber files
Section 1: Introduction to Altera’s Memory Solutions
Current Common Memory Interfaces
(Chart: cost per bit versus cycle time for QDRII SRAM, RLDRAM II, and DDR SDRAM)
QDRII and II+ SRAM
  DDR per port
  Separate RD and WR ports
  Mem access via 1 addr bus
  Fastest access time
  Higher cost
  Lower density
RLDRAM II
  DDR
  Common I/O (single data bus) or separate I/O (read and write data buses)
  Reduced latency
  SRAM-like fast access time but DRAM low price
  Good for latency sensitive applications, such as traffic mgmt, caches, videos
DDR/DDR2/DDR3 SDRAM
  DDR
  Lowest cost
  Access via row & column address buses
  Bank management increases bandwidth by interleaving
  Migrating to DDR3 is the trend: higher data rate & lower power
Altera FPGAs Support Multiple Interface Types
Example: a Stratix® III, IV, or V device with a DDR/2/3 memory system is a common solution to system requirements for data buffering and other low-latency storage
(Diagram: Altera FPGA connected to QDRII/II+, RLDRAM II, and DDR3/DDR2/DDR memories, and to system I/O, other chips, and backplanes)
  QDRII/II+ and RLDRAM II: HSTL-15/18 Class I, HSTL-15/18 Class II
  DDR3/DDR2/DDR: SSTL-15/18/2 Class I, SSTL-15/18/2 Class II, differential SSTL-2/15/18
Common memory data width is 72 bits
  8 data bytes plus ECC
Support for up to 144-bit wide interfaces for DDR2 and DDR3
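As a rough illustration of the 72-bit width mentioned above, the sketch below separates the 8 ECC bits from the 64 data bits and computes a peak payload bandwidth. The function name and the 1066 Mbps per-pin rate are illustrative assumptions, and the result ignores refresh and bus turnaround overhead.

```python
def peak_bandwidth_mbytes(interface_bits, ecc_bits, mbps_per_pin):
    """Peak payload bandwidth in MB/s (assumed model: every data pin at full rate)."""
    data_bits = interface_bits - ecc_bits   # ECC bits carry no payload
    return data_bits * mbps_per_pin / 8     # bits per second to bytes per second


if __name__ == "__main__":
    # 72-bit interface = 8 data bytes plus 8 ECC bits
    print(peak_bandwidth_mbytes(72, 8, 1066))
```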
Memory Selection Criteria
Parameter | DDR3 SDRAM | DDR2 SDRAM | DDR SDRAM | RLDRAM II | QDR II/II+ SRAM
Performance | 300-800 MHz | 200-533 MHz | 100-200 MHz | 200-533 MHz | 154-350 MHz
Altera supports | Up to 1066 Mbps | Up to 800 Mbps | Up to 400 Mbps | Up to 1600 Mbps | Up to 1400 Mbps
Density | 512 MB - 8 GB, 32 MB - 8 GB (DIMM) | 256 MB - 1 GB, 32 Mb - 4 GB (DIMM) | 128 MB - 1 GB, 32 Mb - 2 GB (DIMM) | 288 MB, 576 MB | 8 - 72 MB
I/O standard | SSTL-15 Class I, II | SSTL-18 Class I, II | SSTL-2 Class I, II | HSTL-1.8V/1.5V | HSTL-1.8V/1.5V
Data width | 4, 8, 16 | 4, 8, 16 | 4, 8, 16, 32 | 9, 18, 36 | 8, 9, 18, 36
Burst length | 8 | 4, 8 | 2, 4, 8 | 2, 4, 8 | 2, 4
CAS latency | 5 - 10 | 3, 4, 5 | 2, 2.5, 3 | 4, 6, 8 | N/A
Data strobe | Differential bidirectional strobe only | Differential or single-ended bidirectional strobe | Single-ended bidirectional strobe | Free running differential read and write clocks | Free running read and write clocks
Our Main Focus Today
DDR3 memory implementations
  Stratix series devices
  High Performance Controller II (HPCII)
  UniPHY physical interface
Basic discussion of how to implement other memory interface types, devices, and IP
  ALTMEMPHY
  Arria® and Cyclone® series devices, soft and hard implementations
Double Data Rate Memory Interfaces
DDR Memory Interfaces
Write cycle → FPGA to memory
  DQS strobe clock phase shifted 90° (center-aligned) with respect to data (DQ) signals
Read cycle → FPGA from memory
  Receives DQS edge-aligned with data and introduces phase shift to center-align for data capture
(Diagram: FPGA (logic + memory controller) connected to memory through DQ, DQS, and clk/clk#, with DQ and DQS waveforms for the write operation and the read operation)
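To put a number on the 90° shift described above, the following sketch converts a phase shift into a time delay for a given clock frequency. The function name and the 400 MHz example are assumptions chosen for illustration only.

```python
def phase_shift_ps(clock_mhz, phase_deg):
    """Time delay in picoseconds for a phase shift at the given clock frequency."""
    period_ps = 1_000_000 / clock_mhz        # one clock period in picoseconds
    return period_ps * phase_deg / 360       # fraction of a period


if __name__ == "__main__":
    # A 90 degree shift is a quarter of the clock period
    print(phase_shift_ps(400, 90))
```

At 400 MHz the period is 2500 ps, so a 90° shift places the strobe edge 625 ps into the data window.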
DDR Data Read (FPGA input path)
(Diagram: data_in (DQ) is captured by registers clocked on both edges of inclock, which is the strobe (DQS) shifted by 90°; neg_reg_out is the output of the negative-edge register; data_out_h and data_out_l go to the FPGA fabric)
Waveform:
  data_in       B0 A0 B1 A1 B2 A2
  neg_reg_out   B0 B1 B2 B3
  data_out_l    xx B0 B1 B2
  data_out_h    xx A0 A1 A2
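The read path above can be sketched behaviorally as follows. This is a simplified software model, not RTL: it assumes the DQ stream alternates falling-edge data (B) and rising-edge data (A) as in the waveform, and the function name is hypothetical.

```python
def ddr_capture(dq_samples):
    """Model a DDR input register pair.

    dq_samples alternates falling-edge and rising-edge data: B0, A0, B1, A1, ...
    Returns one {data_out_h, data_out_l} pair per full clock cycle.
    """
    pairs = []
    neg_reg_out = None  # value held by the negative-edge register
    for index, sample in enumerate(dq_samples):
        if index % 2 == 0:
            # Falling edge of inclock: the negative-edge register captures B
            neg_reg_out = sample
        else:
            # Rising edge: capture A and present it together with the held B
            pairs.append({"data_out_h": sample, "data_out_l": neg_reg_out})
    return pairs


if __name__ == "__main__":
    for pair in ddr_capture(["B0", "A0", "B1", "A1", "B2", "A2"]):
        print(pair)
```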
DDR Write Logic (FPGA output path)
(Diagram: data_in_h and data_in_l from the FPGA fabric are registered on outclock and multiplexed onto dataout (DQ))
Additional registers for center-aligning DQS strobe not shown
Waveform:
  data_in_h   B0 B1 B2
  data_in_l   A0 A1 A2
  data_out    B0 A0 B1 A1 B2 A2
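The write path is the mirror image of the read path: two single-rate words are interleaved onto one double-rate pin. The sketch below is a behavioral model under the assumption that data_in_h is driven first in each outclock cycle, as the waveform shows; the function name is hypothetical.

```python
def ddr_output(data_in_h, data_in_l):
    """Interleave two single-data-rate streams into one double-data-rate stream."""
    dq = []
    for high_word, low_word in zip(data_in_h, data_in_l):
        dq.append(high_word)  # first half of the outclock cycle
        dq.append(low_word)   # second half of the outclock cycle
    return dq


if __name__ == "__main__":
    print(ddr_output(["B0", "B1", "B2"], ["A0", "A1", "A2"]))
```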
DDR vs. DDR2
DDR is the original
DDR2 SDRAM offers key improvements over DDR
  On-die termination (ODT) improves signal integrity and timing margin
  Consumes less power; SSTL-18 I/O standard instead of SSTL-2
DDR3 Improvements Over DDR2
Lower power (SSTL-15) and double performance
  Even lower power DDR3L runs at 1.35 V
Over 400 MHz operation requires fly-by termination for CK and address commands
  Better signal integrity
  Complicates timing analysis / controller design
DDR3 Leveling
Clock signals routed in daisy-chain fly-by topology (see next slide)
  Improves signal integrity on high fan-out clocks
  Other signals still point-to-point
Special leveling circuitry required
  Automatically accounts for delays and phase adjustments
  Aligns all signals on writes and reads
  Stratix III, IV, and V devices only
DDR3 Write Leveling
(Diagram: memory controller driving two ranks of DRAM devices, with fly-by termination at the end of the clock chain)
Legend:
  Clock for top & bottom memory rank (fly-by, double drop)
  DQ, DM, DQS, DQS# (point-to-point)
Device PHY is responsible for skewing outgoing DQ & DQS/DQS# to match the clock flight times to each component
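The idea behind write leveling can be sketched numerically. The example below assumes the flight times are known, which is a simplification: in hardware the PHY finds the delays through calibration rather than from a table. The function name and the picosecond values are hypothetical.

```python
def write_leveling_delays(clock_flight_ps, dqs_flight_ps):
    """Extra DQS launch delay per component so DQS arrives with the fly-by clock.

    clock_flight_ps: clock arrival time at each component along the fly-by chain
    dqs_flight_ps:   point-to-point DQS flight time to the same component
    """
    return [clock - dqs for clock, dqs in zip(clock_flight_ps, dqs_flight_ps)]


if __name__ == "__main__":
    # The clock reaches each successive component later; DQS traces are matched
    clock_flight = [200, 350, 500, 650]
    dqs_flight = [200, 200, 200, 200]
    print(write_leveling_delays(clock_flight, dqs_flight))
```

Components farther down the chain need a larger DQS delay because the clock reaches them later.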
DDR3 Read Leveling
(Diagram: memory controller and two ranks of DRAM devices in fly-by topology, with fly-by termination)
Legend:
  Clock for top & bottom (fly-by, double drop)
  DQ, DM, DQS, DQS# (P2P)
Device PHY is responsible for de-skewing incoming DQ & DQS/DQS# to match the clock flight times to each component
DDR Logic Implementation in Altera FPGAs
Memory Implementation in FPGAs
FPGAs implement DDR circuitry in different ways depending upon resources available
DDR data (DQ) and strobe (DQS) pins should be placed onto dedicated equi-skew DQ/DQS placement blocks in device for optimal memory performance (more later)
Wraparound (banks placed around corner of device) or split (banks split between opposite sides) configurations supported in some scenarios
Note: See Appendix for additional device families
Cyclone V Devices
Dedicated bidirectional DQ/DQS pins on top, bottom, and right
Dedicated transceivers on left side (all devices)
Two hard memory controllers on top/bottom banks
DDR read / write logic implemented in I/O cells on 3 sides of device
Up to 8 reconfigurable fractional PLLs
Four DLLs available
  Each I/O bank can access adjacent DLLs for DQS phase shift
  Each DLL has two outputs - allows multiple interfaces to have separate frequencies
  Differential DQS also possible
(Diagram: Cyclone V device floorplan showing FPLLs, DLLs, hard controllers, DQ/DQS in I/O, bonding, and transceiver blocks)
Arria V Devices
Dedicated bidirectional DQ/DQS pins on top and bottom
Dedicated transceivers on left (all devices) and right (most GX and GT devices)
Four hard memory controllers on top/bottom banks
DDR read / write logic implemented in I/O cells on 2 or 3 sides of device
Up to 16 reconfigurable fractional PLLs
Four DLLs available
  Each I/O bank can access adjacent DLLs for DQS phase shift
  Each DLL has two outputs - allows multiple interfaces to have separate frequencies
  Differential DQS also possible
(Diagram: Arria V device floorplan showing FPLLs, DLLs, hard controllers, DQ/DQS in I/O, bonding, and transceiver blocks)
Arria II GZ / Stratix III / IV Devices
Dedicated bidirectional DQ/DQS pins on all banks
  Top/bottom banks optimized for memory performance
  Side banks optimized to support differential I/O
Some Stratix IV GX/GT devices have dedicated transceivers on left/right sides
DDR read / write logic implemented in I/O cells on all sides of device
Up to 12 reconfigurable PLLs
Four DLLs available
  Each I/O bank can access adjacent DLLs for DQS phase shift
  Each DLL has two outputs - allows multiple interfaces to have separate frequencies
  Differential DQS also possible
(Diagram: Stratix III device floorplan showing PLLs and DLLs; memory performance optimized on top and bottom of FPGA; LVDS (non-CDR) SERDES support on left and right sides of die)
Stratix V Devices
Dedicated bidirectional DQ/DQS pins on top/bottom (all devices) and right (larger GS devices)
Dedicated transceivers on left side (all devices) and right (some devices)
DDR read / write logic implemented in I/O cells on 3 sides of device
Up to 28 reconfigurable fractional PLLs
Four DLLs available
  Each I/O bank can access adjacent DLLs for DQS phase shift
  Each DLL has two outputs - allows multiple interfaces to have separate frequencies
  Differential DQS also possible
(Diagram: Stratix V device floorplan showing FPLLs, DLLs, DQ/DQS in I/O, and transceiver blocks)
Example DQ/DQS Block
(Diagram: DQ/DQS block showing the DQ output path, the DQ input path, the phase delay via DLL, and the DQS clock)
FPGA External Memory Support
Family DDR3 DDR2 DDR RLDRAM II QDR-II+ QDR-II
Stratix V (1)
Stratix IV E/GX/GT
Stratix III
Hardcopy® III/IV
Arria V (1,2) (2)
Arria II GX
Arria II GZ
Cyclone V (1,2) (2)
Cyclone IV E/GX
Cyclone III
(1): Devices support DDR3 and DDR3L
(2): Soft or hard controller
Family/Protocol Support
Maximum half-rate frequency (MHz)
Family DDR3 DDR2 DDR RLDRAM II QDR-II+ QDR-II
Stratix V 1066 400 533 550 350
Stratix IV E/GX/GT 533 400 200 533 550 350
Stratix III 533 400 200 400 400 350
Hardcopy III/IV 400/533 333 200 400 350 300
Arria V 667 400 400 400 350
Arria II GX 400 333 200 250 250
Arria II GZ 400 333 350 350 300
Cyclone V 400 (h) / 300 (s) 400 (h) / 300 (s)
Cyclone IV GX 200 167
Cyclone IV E 167 133
Cyclone III 200 167
Try the External Memory Interface Spec Estimator: http://www.altera.com/technology/memory/estimator/mem-emif-index.html
Altera High-Speed Memory Interface IP
Memory Interfaces: A 2 (or 3) Part Solution
(Diagram: inside the FPGA, user logic connects through an optional multi-port front end (MPFE) to the memory controller and then to the PHY, which connects to external memory; UniPHY is shown with QDR II, QDR II+, RLDRAM II/III, and DDR2/3 memory; ALTMEMPHY is shown with DDR1/2/3 memory)
Auto-calibrated PHY bridging physical interface and controller
  Uses one PLL and automatically selects all required clock phases
Multi-port front end (MPFE) for multiple, independent accesses to hardened controller (discussed later)
  Cyclone V and Arria V hard memory interface only
DDR Memory Interface Blocks
User logic
  Generates data to be written to memory
  Receives data read from memory
Memory controller (soft or hard)
  Altera High Performance Controller (HPC) II or custom controller
  Initiator of read and write commands
  Instantiates PHY if used
UniPHY (soft or hard)
  Instantiated by Altera HPCII or can be added separately
  Read data/write data/address/command path
  Clock and reset management
  Automatic calibration during memory initialization
  I/O logic to external memory
High Performance Controller II
High Performance Controller II Features
Supports up to 1066 MHz DDR3 memory
Power management
Advanced bank management w/ command reordering
Inter-bank data reordering
Five cycle controller latency (6 w/ ECC)
ECC with sub-word write
Flexible system interface
Run time programmable
Efficiency Monitor and Protocol Checker
Multi-cast writes
Quarter-rate support
Quasi 1T/2T
Quick Review: DDRx Command Cycle
(State diagram: states are Idle, Refreshing, Bank active, Reading, Writing, and Pre-charging; REF leads from Idle to Refreshing; ACT leads from Idle to Bank active; READ or READ AP leads to Reading; WRITE or WRITE AP leads to Writing; PRE leads to Pre-charging)
DDR initialization and configuration not shown
Reads and writes are bursts (2, 4, or 8 bit sequential or interleaved column accesses)
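The command cycle above can be sketched as a small state machine. This is a simplified single-bank model for illustration: the "DONE" event (completion of a burst, refresh, or precharge) and the separate READING_AP / WRITING_AP states are assumptions added so the auto-precharge variants can be represented, and no timing parameters are modeled.

```python
# (state, event) -> next state; "DONE" is a hypothetical completion event
TRANSITIONS = {
    ("IDLE", "REF"): "REFRESHING",
    ("REFRESHING", "DONE"): "IDLE",
    ("IDLE", "ACT"): "BANK_ACTIVE",
    ("BANK_ACTIVE", "READ"): "READING",
    ("BANK_ACTIVE", "WRITE"): "WRITING",
    ("BANK_ACTIVE", "READ_AP"): "READING_AP",
    ("BANK_ACTIVE", "WRITE_AP"): "WRITING_AP",
    ("READING", "DONE"): "BANK_ACTIVE",
    ("WRITING", "DONE"): "BANK_ACTIVE",
    ("READING_AP", "DONE"): "PRECHARGING",   # auto-precharge closes the row
    ("WRITING_AP", "DONE"): "PRECHARGING",
    ("BANK_ACTIVE", "PRE"): "PRECHARGING",
    ("READING", "PRE"): "PRECHARGING",
    ("WRITING", "PRE"): "PRECHARGING",
    ("PRECHARGING", "DONE"): "IDLE",
}


def run(events, state="IDLE"):
    """Apply a sequence of commands/events and return the final bank state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event} in state {state}")
        state = TRANSITIONS[key]
    return state


if __name__ == "__main__":
    print(run(["ACT", "READ", "PRE", "DONE"]))
```

A READ issued while the bank is idle raises an error in this model, which mirrors the rule that a row must be activated before a column access.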
HPC II Advanced Bank Management
Look-ahead bank management
  Efficient bank interleaving support
  Issue activate and precharge commands early
  Use auto-precharge where possible
  In-order read/writes
Per access open or close page policy
  Read/write accesses with auto-precharge
  Automatic cancellation of auto-precharge on page hits
(Timing diagrams: with no look-ahead, the command bus sits idle, which is not efficient; with look-ahead bank management, idle cycles are used for bank management)
Command   Address   Condition
Read      Bank 0    Activate required
Read      Bank 1    Precharge required
Read      Bank 2    Precharge required
Inter-Bank Data Reordering
Intelligent reordering of read and write commands going to different bank addresses in an efficient manner
  Mitigates bus turn-around time (read to write, write to read)
  Reduces conflict between rows
(Timing diagrams)
  Data Reordering OFF: WR to RD turnaround, RD to WR turnaround
  Data Reordering ON: WR to RD turnaround, WR to RD turnaround, WR to WR turnaround
Other Aspects of Data Reordering
Command aging
  Mechanism to favor older commands over newer commands during data reordering if these commands are requested for access at the same time
  Management of aging reduces latency
Starvation control
  Commands are “starving” if not served after a period of time
  Starvation limit can be set (default is 10 commands)
  Logic added to prevent command from starving
  Also reduces latency
Without Starvation Control
(Diagram: command sequence showing local commands (from user logic to controller) and the resulting memory commands, with write-to-write and write-to-read turnarounds marked)
Note: Write to write turnaround time is shorter than write to read turnaround time
To minimize bus turnaround time, controller favors write over read if write was issued previously and vice versa
Causes read command to be pushed to the end, resulting in large latency
With Starvation Control
(Diagram: command sequence showing local commands and the resulting memory commands, with write-to-write, write-to-read, and read-to-write turnarounds marked)
Note: Write to write turnaround time is shorter than write to read / read to write turnaround time
Note: Starvation counter increments for every command issued
User sets starvation limit
Starved command served immediately when starvation counter reaches limit
Example: starvation limit set to 2
  After 2 commands, read tagged as starved (promoted into priority command) and served immediately
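The effect of starvation control can be reproduced with a toy scheduler. The sketch below is an assumed model, not the HPC II implementation: all commands are queued up front, the scheduler prefers a command of the same kind (read or write) as the last one issued to avoid a turnaround, and every pending command carries a counter that increments each time another command is issued. The function name and command labels are hypothetical.

```python
def schedule(commands, starvation_limit=None):
    """Return the issue order for a queue of commands such as 'W0' or 'R0'.

    The first character of each command gives its kind ('W' or 'R').
    starvation_limit=None disables starvation control.
    """
    pending = [{"cmd": command, "waited": 0} for command in commands]
    issued = []
    last_kind = None
    while pending:
        starved = [entry for entry in pending
                   if starvation_limit is not None
                   and entry["waited"] >= starvation_limit]
        same_kind = [entry for entry in pending if entry["cmd"][0] == last_kind]
        if starved:
            choice = starved[0]       # oldest starved command has priority
        elif same_kind:
            choice = same_kind[0]     # avoid a read/write bus turnaround
        else:
            choice = pending[0]       # otherwise serve the oldest command
        pending.remove(choice)
        issued.append(choice["cmd"])
        last_kind = choice["cmd"][0]
        for entry in pending:
            entry["waited"] += 1      # counter increments for every command issued
    return issued


if __name__ == "__main__":
    queue = ["W0", "R0", "W1", "W2", "W3"]
    print(schedule(queue))                       # read pushed to the end
    print(schedule(queue, starvation_limit=2))   # read promoted after 2 commands
```

Without a limit the read is served last; with a limit of 2 it is served third, at the cost of one extra bus turnaround.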
Full-Rate and Half-Rate Modes
Simplify design by halving application side frequency and doubling data width
Half-rate mode required for DDR3
(Diagram, full-rate logic: memory, 8 bits DDR at 200 MHz → DDR to SDR → 16 bits SDR at 200 MHz → user logic)
(Diagram, half-rate logic: memory, 8 bits DDR at 200 MHz → DDR to SDR → 16 bits SDR at 200 MHz → SDR to HDR → 32 bits HDR at 100 MHz → user logic)
Implemented directly in I/O for Cyclone V, Arria V, and Stratix III / IV / V FPGAs
Implemented in fabric for Cyclone III / IV FPGAs
Quarter-Rate Mode
Allows the controller and user logic to run at a quarter of the memory clock frequency
Allows further flexibility without compromising performance
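The width and frequency trade-off in the full-, half-, and quarter-rate modes described above can be checked with a short calculation. The function and parameter names below are illustrative assumptions; the 8-bit, 200 MHz figures are taken from the full-rate and half-rate example.

```python
def local_interface(dq_width, mem_clock_mhz, rate_divider):
    """Width and clock of the user-side interface.

    rate_divider: 1 = full rate, 2 = half rate, 4 = quarter rate.
    """
    local_width = dq_width * 2 * rate_divider      # DDR moves 2 words per memory clock
    local_clock_mhz = mem_clock_mhz / rate_divider
    return local_width, local_clock_mhz


if __name__ == "__main__":
    for name, divider in (("full", 1), ("half", 2), ("quarter", 4)):
        width, clock = local_interface(8, 200, divider)
        # Throughput (width x clock) is the same in every mode
        print(name, width, "bits at", clock, "MHz")
```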
Quasi 1T/2T
Row commands: ACT or PRE
Column commands: READ or WRITE
Half-rate mode: 1 controller clock cycle = 2 memory clock cycles
Quarter-rate mode: 1 controller clock cycle = 4 memory clock cycles
Improve command bandwidth by issuing two commands every controller clock cycle
Flexible System Interface
Avalon interface
  Avalon®-ST (streaming) for user logic access to controller
  Avalon-MM (memory mapped) slave interface for access to configuration and status register (CSR)
  See the Avalon Interface Specifications (http://www.altera.com/literature/manual/mnl_avalon_spec.pdf) for details
Burst size adaptation for efficient DRAM accesses
  Combines short local transactions into memory bursts
  Splits long local transactions into memory bursts
Integrated low latency half-rate system interface
  Supports an optional half system interface speed
  Maintains the controller in the faster clock domain to reduce latency
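Burst size adaptation can be illustrated by splitting one long local transaction into memory bursts. The sketch below assumes bursts are aligned to the burst length and addresses are counted in beats; both the alignment rule and the function name are assumptions for illustration, not a description of the controller's internal logic.

```python
def split_into_bursts(start_addr, length, burst_len=8):
    """Return (burst_base_address, beats_used) for each memory burst touched."""
    bursts = []
    addr = start_addr
    end = start_addr + length
    while addr < end:
        base = addr - (addr % burst_len)       # bursts are aligned to burst_len
        stop = min(base + burst_len, end)      # stop at burst or transaction end
        bursts.append((base, stop - addr))
        addr = stop
    return bursts


if __name__ == "__main__":
    # A 12-beat transaction starting at address 6 touches three 8-beat bursts
    print(split_into_bursts(6, 12))
```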
Efficiency Monitor and Protocol Checker
Reports on read and write throughput of the interface by counting command transfers and wait times
Checks legality of commands issued by user logic to the controller
Measures full path memory read latency
  Read commands from user logic time-stamped
  Returned data timestamp compared to when command was issued
Implemented as Avalon slave interface for manual access or by EMIF Toolkit (described later)
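The timestamp comparison used for the read latency measurement can be sketched in software. This is a hypothetical model: the class, its method names, and the use of a tag to match commands with returned data are assumptions, and cycle counts stand in for hardware timestamps.

```python
class ReadLatencyMonitor:
    """Track read latency by time-stamping commands and returned data."""

    def __init__(self):
        self.issue_cycle = {}   # tag -> cycle when the read command was issued
        self.latencies = []

    def on_read_command(self, tag, cycle):
        self.issue_cycle[tag] = cycle

    def on_read_data(self, tag, cycle):
        # Compare the return timestamp with the stored issue timestamp
        latency = cycle - self.issue_cycle.pop(tag)
        self.latencies.append(latency)
        return latency

    def average_latency(self):
        return sum(self.latencies) / len(self.latencies)


if __name__ == "__main__":
    monitor = ReadLatencyMonitor()
    monitor.on_read_command("r0", cycle=10)
    monitor.on_read_command("r1", cycle=12)
    monitor.on_read_data("r0", cycle=31)
    monitor.on_read_data("r1", cycle=37)
    print(monitor.latencies, monitor.average_latency())
```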
Other Advanced Features
Runtime configurable
  Timing parameters
  Address widths
  Controller behavior
Error correction code (ECC) with sub-word writes
Multicast write to mitigate effects of multiple activates
Refresh timing control
  Programmable periodic refresh
  User requested auto-refresh
Power management
  User requested self-refresh
  Automatic entry / exit power-down mode
HPC II Architecture
See appendix for detailed block diagram and description of blocks that make up the High Performance Controller
ALTMEMPHY and UniPHY
Altera Memory PHY Solutions
Feature UniPHY ALTMEMPHY
Available as a MegaCore IP
Support for DDR2/3
Support for QDR II/II+ and RLDRAM II
PLL/DLL sharing
Smart calibration algorithms
Latency 0.5 1.0
For all new designs with supported memory
UniPHY Provides Higher Flexibility With Half the Latency
UniPHY Advantages
(Block diagrams comparing ALTMEMPHY and UniPHY, with blocks marked as soft or hard. Blocks shown include user logic, memory controller, calibration sequencer, clock generation with PLL and DLL, reconfiguration logic, mimic path, write path, read path, DQS path, address/command path, FIFO, DQ I/O block, and the I/O structure connecting to memory.)
Hard read/write paths in all supported devices, implemented as FIFO in Stratix V devices
Soft I/O grouping and calibration sequencer provide flexibility
Better resource sharing (PLL, DLL) for multiple interfaces
UniPHY Benefits
UniPHY Enhancements | Benefits
Half the latency | Improved system performance
PLL, DLL, and on-chip termination (OCT) logic sharing | Easier to create multiple memory interfaces on a single device
More configurations | DDR1/2/3, QDRII/II+, RLDRAM II; mainstream configurations: widths, burst sizes, DIMM types, and multi-rank support
Nios II processor-based calibration sequencer | Higher performance with advanced calibration algorithms makes for easier design and debug
Modular clear text code | Easy to build custom PHY
Ease of use enhancements | Flexible timing model; pin and timing constraint enhancements; improved testbenches
Device ALTMEMPHY UniPHY
Arria II / II GX
Arria II GZ
Cyclone III
Cyclone IV
Hardcopy III - IV
Stratix III - IV
Stratix V
All new device families will only support UniPHY
UniPHY Interface to Controller and Memory
DLL and PLL instantiated at same level as PHY
  Can be set to master or slave
  Facilitates sharing between multiple controllers
OCT block can also be shared or instantiated outside of UniPHY
(Diagram: UniPHY top-level file containing UniPHY, DLL, PLL, and OCT, with the Altera PHY Interface (AFI), reset interface, memory interface, RUP & RDN (or RZQ) pins, PLL/DLL sharing interface, and OCT sharing interface)
UniPHY Clocks
Typical half-rate design clocks in table below
  Default phases for a >240 MHz memory frequency
  Additional clocks needed for different scenarios, for example half-rate to quarter-rate conversion
  Run Report Clocks in TimeQuest timing analyzer for details
Clock | Source | Clock rate | Phase | Description
pll_afi_clk | PLL c0 | Half | 0° | Controller clock
pll_mem_clk | PLL c1 | Full | 0° | Output memory clock
pll_write_clk | PLL c2 | Full | 90° (45° for Stratix V devices) | Write data clock
pll_addr_cmd_clk | PLL c3 | Half | 270° (adjustable) | Address/command output clock
pll_avl_clk | PLL c5 | | 0° | Nios sequencer clock
pll_config_clk | PLL c6 | | 0° | Scan chain clock
DQS | External memory | Full | 90° | Read data strobe
The UniPHY Sequencer
Parameterizable Nios II processor system generated at run time
Implements calibration algorithm to maintain center alignment of data and clock signals through I/O delay chain adjustment
Hands control over to memory controller once calibration is completed
For more information on UniPHY and sequencer architectural blocks, see the Appendix
(Block diagram: Nios II processor with Avalon-MM interface, RAM (calibration software storage), debug interface, tracking manager (tracking samples, to debug DQS enable module), scan chain control (SCC) manager (to I/Os), read write (RW) manager, PHY manager (AFI, PHY parameters, includes FIFO info), and data manager)
UniPHY Calibration
Overview of calibration
Calibration stages
Calibration signals
Overview of Calibration
Configures PHY and I/Os for reliable data transfer
Performed by Nios II processor-based sequencer
Determines the delay settings needed to center-align data signals
Two tasks performed
  1. FIFO buffer calibration: sets data valid (VFIFO) and read latency (LFIFO) lengths in the read datapath of UniPHY
  2. I/O calibration: adjusts delay chains and clock phase settings
When calibration completes, control is passed to the memory controller
The Chicken-and-Egg Calibration Problem
Calibration, at a very high level, works like this:
  Set the knobs to some value
  Write to memory
  Read from memory
  Check if what you read is correct
If so, the knob settings are good...
...if not... well, either the write or the read failed
To test a write you need to be able to read
To test a read you need to be able to write
“Guaranteed” Write
Special write mode that attempts to get known data into memory that can be used for read calibration
Write a constant burst of zeros to one bank, and a burst of ones to another bank
Back-to-back read of these two banks can be used for read calibration
Calibration Stages
Read calibration part one: DQS enable calibration, then DQ/DQS centering
Write calibration part one: Leveling
Write calibration part two: DQ/DQS centering
Read calibration part two: Read latency minimization
Read Calibration Part One
Objectives
  Calculates when read data is received after a read command is issued to set up the Data Valid Prediction FIFO (VFIFO) cycle
  Aligns the input DQ with respect to DQS to maximize the read margins
Actions
  Uses guaranteed writes to perform:
  DQS enable phase calibration
  DQ/DQS centering
DQS Enable Phase Calibration
Goal: set up phase and latency of VFIFO to best capture DQ without DQS glitches, including postamble
dqs_enable should be active from before first DQS rising edge until before last falling edge
DQ/DQS Centering
Goal: Center DQ signals with respect to each other and center DQS to aligned DQ
1. Sweep D1 (DQ input) delay chain to align DQ to each other
2. Sweep aligned DQ to center DQS
DQS not adjusted, only DQ delay
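The sweep-and-center idea used in the calibration stages can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the 16-tap range, and the pass/fail pattern are hypothetical, and the real sequencer runs its own algorithm on the Nios II processor.

```python
def center_of_passing_window(results):
    """Return the index in the middle of the longest run of passing
    settings, or None if no setting passed.

    results: sequence of booleans, one per delay setting in sweep order.
    """
    best_start, best_len = None, 0
    start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical sweep: the read-back check passes only for delay taps 3..9.
sweep = [3 <= tap <= 9 for tap in range(16)]
chosen_tap = center_of_passing_window(sweep)
```

Picking the middle of the passing window, rather than its first passing setting, leaves roughly equal margin on both sides.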
Write Calibration Part One
Objectives
  Align DQS to the memory clock at each device
  Compensate for address, command, and clock skew at each device
Actions
  Perform a variety of random burst pattern writes with different delay and phase settings followed by a read
  Simple patterns could lead to incorrect alignment
  Sequencer picks the closest delay and phase values to the center of the window
Write Leveling Procedure
pll_write_clk phase adjustment (PLL)
D5 and D6 output delay chain adjustment (in 50 ps increments)
PLL again
D5 and D6 delay adjustment again
Final Calibration Stages
Write calibration part two
  DQ/DQS centering similar to read calibration
  D5 and D6 delay chains adjusted
Read calibration part two
  LFIFO at maximum latency so far
  Reduce LFIFO latency until read fails
  Increase latency by two for margin
Control handed over to memory controller
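The read latency minimization step can be sketched as a simple loop. This is a hedged model, not the sequencer's code: `read_passes` is a hypothetical callback standing in for a hardware read-back test, and the sketch reads "increase latency by two" as two cycles above the first failing value.

```python
def minimize_read_latency(read_passes, start_latency, margin=2):
    """Lower the latency from start_latency until a read fails,
    then add `margin` cycles back (never exceeding start_latency)."""
    if not read_passes(start_latency):
        raise ValueError("reads fail even at the starting latency")
    latency = start_latency
    while latency >= 0 and read_passes(latency):
        latency -= 1  # still passing, try one cycle lower
    # latency is now the first failing value (or -1 if nothing failed)
    return min(latency + margin, start_latency)


# Hypothetical hardware behavior: reads pass only at latency >= 5.
final_latency = minimize_read_latency(lambda lat: lat >= 5, start_latency=12)
```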
Calibration Signals
Signal Description
afi_cal_fail Asserts high if calibration fails
afi_cal_success Asserts high if calibration is successful
afi_cal_req Synchronous reset for sequencer
Hard Memory Interface
(Optional; Cyclone V & Arria V devices only)
Multi-Port Front End (MPFE)
[Figure: hard memory interface block diagram. Blocks: Avalon-MM/ST adaptor, MPFE, MM register, hardened controller, hardened PHY.]
MPFE Architecture
Multiple Avalon ports for access to hard controller & PHY
  Up to 6 command ports
  Up to 4 read-data ports
  Up to 4 write-data ports
  Configure as read-only or write-only or combine into bidirectional
Internal Avalon port widths from 32 to 256 bits depending on number used and whether uni- or bidirectional
Avalon-MM to ST implementation in fabric for connectivity
Request scheduling done through set priority levels (absolute) and weighted round robin (relative)
Hardened Controller
Functionally similar to soft controller
DRAM interface is 40 bits wide to accommodate from 8 bits up to 32 bits + ECC
Multiple controllers can be bonded for wider interfaces, even if using different clocks
  From controller to user logic: synchronized
  From controller to memory: not synchronized
Counters track data in FIFO buffers to ensure data is sent and received on same cycle
Hardened PHY
Again, similar to soft UniPHY
Portions of sequencer remain soft
  Soft: Nios II processor, instruction/data RAM, Avalon fabric
  Hard: R/W manager, PHY manager, Data manager (run at full rate)
Connects to predefined I/O register blocks and pins
Pins available as regular user I/O if not using hard interface
Test Your Knowledge: Intro to Memory IP
1. What controller feature, required for running DDR3 at high speeds, is available only in Stratix devices?
A. Read and write leveling
2. What do the half-rate and quarter-rate modes allow you to do?
A. Run the internal interface logic at half or quarter of the speed of the external memory to ease timing closure
3. What FPGA settings are adjusted during the calibration process?
A. PLL clock output phase and I/O delay chain lengths
Section 1 Resources
Memory Resource Center: http://www.altera.com/technology/memory/mem-index.jsp
User guides
  External Memory Interfaces Handbook
  Quartus II Software Handbook
Device handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Application notes
  AN461: QDRII and QDRII+ with Stratix III/IV devices
  AN637: Sharing External Memory Bandwidth Using the MPFE Reference Design
Section 2: Memory Interface Design Flow in the Quartus II Software
Recommended Memory Interface Design Flow
[Flowchart, in order:]
1. Start design
2. Select device
3. Create and parameterize memory interface
4. Instantiate PHY & controller (example or custom design)
5. Perform functional simulation (optional but recommended). Expected results? If no, debug design
6. Add constraints (I/O, timing, etc.) and compile
7. Verify timing. Meets timing and performance? If no, adjust constraints
8. Board (PCB) related tasks (layout, simulation, termination, drive strength settings, etc.)
9. Verify functionality & SI on board. Works correctly? If no, debug design
10. Design complete
3 Main Design Flows
MegaWizard™ Plug-In Manager flow
  Full custom parameterization of IP core variant
  Instantiate anywhere in existing design
  Can generate a complete example design and testbench
  Our focus for today
SOPC Builder flow
  Generates complete simulation environment
  Integrate memory interface with other custom IP
  Uses Avalon-MM interfaces for easy integration
  Not recommended for new designs; use Qsys instead
Qsys flow (discussed later)
  All advantages of SOPC Builder plus…
  Hierarchical system design
  Higher performance interconnect
Create or Open a Quartus II Project
Create a new Quartus II project or open an existing project
Select the target device or device family
  Memory type support
  Desired performance
Parameterize with the MegaWizard Plug-In Manager
Creating the Interface
MegaWizard Plug-In Manager
Easy creation and parameterization of entire interface
Tools menu or Tasks window
Select Memory Controller IP
Select PHY, output file HDL, and instance name
Parameterize the IP
Enable hard interface (Arria V & Cyclone V devices only)
Multiple settings tabs
Memory presets
PHY Settings
PHY-only generation for use with custom controller
Clock frequencies & half/full-rate mode selection
Fine adjustment of clock phases
Resource sharing options for multiple memory interfaces
Memory Parameters
Use presets as a “starting point” for customizing external memory parameters and timing
Adjust parameters (if needed) to match data sheet and memory use
Custom Memory Presets
Create new preset from MegaWizard settings (.qprs file)
Update and save with custom settings
Memory Initialization Options
Configures mode registers with MRS command during initialization
Adjust parameters (if needed) to match data sheet and memory use
Check memory device datasheet for details on each mode register setting
Memory Timing
Adjust memory timings to match datasheet; may need to derate default values
Reading Memory Datasheets
1GB Micron DDR3 MT9JSF12872AY-1G1 DIMM
  72: bits wide (including ECC check bits)
  128: words deep x 1,000,000
  (72-8)/8 = 8 bytes x 128M = 1 GB
Component:
  Simple chip / single device soldered on board (or DIMM)
  Requires one datasheet for general speed ratings and options
DIMM (UDIMM or RDIMM):
  Collections of components placed into sockets
  Additional datasheet required for component-specific timing numbers
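The capacity arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and "128M" is assumed here to mean 128 x 2^20 words, which makes the result exactly 1 GiB.

```python
def dimm_capacity_bytes(width_bits, ecc_bits, depth_words):
    """Usable capacity: ECC check bits do not count as data storage."""
    data_bytes_per_word = (width_bits - ecc_bits) // 8
    return data_bytes_per_word * depth_words


MEG = 2**20  # assumption: "M" is a binary mega
capacity = dimm_capacity_bytes(width_bits=72, ecc_bits=8, depth_words=128 * MEG)
capacity_gib = capacity / 2**30
```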
Setting Memory Options (Either Datasheet)
Obtain row and column addressing widths, number of clock pairs and chip selects, DQ width, etc.
Adjust Memory Parameters tab
Setting Memory Options (cont.)
If using different memory than preset, be sure to set options for correct operating speed, speed bin, and configuration
Example: CAS latency (CL) and CAS write latency (CWL) from component datasheet
For 533 MHz, -187E components must use CL of 7 or 8 and CWL of 6; CWL of 5 not allowed
Can cause initialization or calibration failures!
Setting Timing Parameters
Adjust for desired operating frequency on Memory Timing tab
Be aware of units! Different vendors use different units, and the units used in specs may not match the units in the MegaWizard!
Most common error: clock cycles vs. ps/ns
With memory presets, units should match, but be sure to double check!
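To illustrate the clock cycles vs. ps/ns mismatch, the sketch below converts between the two using integer picoseconds to avoid rounding surprises. The helper names are hypothetical, and the example values (a 1875 ps clock period and a 13125 ps parameter) are illustrative only; always take real numbers from the memory datasheet.

```python
def ps_to_cycles(t_ps, clock_period_ps):
    """Round a time in picoseconds up to a whole number of clock cycles."""
    return -(-t_ps // clock_period_ps)  # ceiling division on integers


def cycles_to_ps(cycles, clock_period_ps):
    """Convert a cycle count back to picoseconds."""
    return cycles * clock_period_ps


# Illustrative values: 1.875 ns clock period, 13.125 ns timing parameter.
cycles = ps_to_cycles(13125, 1875)
```

Entering the cycle count into a field that expects nanoseconds (or the reverse) gives a value that is wrong by a factor of the clock period, which is why the unit of each field is worth checking.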
Timing Derating
Setup and hold time settings for DQ (wrt DQS/DQSn) and control/address (wrt CK/CK#) must be “derated”
  tDS, tDH, tIS, tIH specifications in component datasheet
Adjust values to account for different slew rates on signals, usually due to additional loading
  Example: single vs. multi-rank (multi-CS) memory configurations
Without derating, timing analysis may be overly optimistic
  Timing analysis passes, actual board implementation fails!
Timing Derating (cont.)
1. Enter base values for settings (memory preset or from component datasheet)
2. Perform board simulations to determine slew rates with tools like Mentor Graphics® HyperLynx (discussed later)
3. Enter simulation information into MegaWizard Plug-In Manager to automatically derate
If board simulation results not available, Altera defaults can be used (not recommended)
Automatic Derating
1. Enter base values for settings in the Memory Timing tab
2. Enter slew rate information into Board Settings tab
Derated values automatically calculated
Additional Board Settings
Adjusts generated SDC constraints for timing analysis to account for board effects
Account for ISI effects usually found in multi-rank systems
Account for differences in board trace lengths
Controller Settings
Required for SOPC Builder and Qsys integration
Chip-row-bank-col: use bank look-ahead to hide effect of burst lengths greater than column width
Chip-bank-row-col: allocate separate physical banks to multiple masters
Larger depth = more efficient, but more resources required (possible max freq. hit)
Enable data reordering and set starvation limit
Enable ECC and choose whether to auto correct
Enable CSR interface and how it will be accessed
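The difference between the two address orderings can be sketched by splitting a flat local address into fields, most significant field first. This is an illustrative model only: the function name and the field widths (1 chip bit, 13 row bits, 3 bank bits, 10 column bits) are assumptions, not values from the IP.

```python
def split_local_address(addr, order, widths):
    """Split addr into named fields. `order` lists fields MSB first;
    `widths` maps each field name to its width in bits."""
    fields = {}
    for name in reversed(order):  # peel fields off starting at the LSB end
        w = widths[name]
        fields[name] = addr & ((1 << w) - 1)
        addr >>= w
    return fields


WIDTHS = {"chip": 1, "row": 13, "bank": 3, "col": 10}  # hypothetical widths
CHIP_ROW_BANK_COL = ("chip", "row", "bank", "col")
CHIP_BANK_ROW_COL = ("chip", "bank", "row", "col")

# First address past the end of the column range:
next_addr = 1 << WIDTHS["col"]
a = split_local_address(next_addr, CHIP_ROW_BANK_COL, WIDTHS)
b = split_local_address(next_addr, CHIP_BANK_ROW_COL, WIDTHS)
```

In this model a sequential access that runs past the last column moves to the next bank under chip-row-bank-col, and to the next row of the same bank under chip-bank-row-col.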
Multi-Port Front End Settings (Hard Controller Only)
Bond with another hard controller to create wider data widths
Absolute priority (1-7; higher level has priority over lower levels)
Weight setting for weighted round robin (WRR) (0-32; relative priority)
Diagnostics Settings
Reduce simulation time by skipping calibration and initialization (only really needed in hardware)
Enable the Efficiency Monitor for use with the EMIF Toolkit
Generate the IP
Clicking Finish generates the IP
Choose whether or not to generate the example design
MegaWizard Output
Top-level wrapper file for instantiation
  <project_directory>/<variation_name>.v or .vhd
Files for synthesis and simulation pointed to by .qip (added to project automatically)
  <project_directory>/<variation_name>/
  <project_directory>/<variation_name>_sim/
Always add QIP file to Quartus II project
  Adds all IP HDL files to project for synthesis
  One file to add instead of multiple files
Generated Example Design
Complete working reference design project (.qpf) if example design generation enabled
Files in <project_directory>/<variation_name>_example_design/example_project
[Figure: example design block diagram. A traffic generator (Avalon-MM master) drives the DDR/2/3 controller over the local interface (Avalon-MM); the controller logic connects to the PHY over the AFI; the PHY connects to the external DDR/2/3 memory. Also shown: clock source, PLL, DLL, and the pass or fail outputs.]
Hierarchy of Example Design
Top level: <variation_name>_example.v/vhd
Driver / User Logic (issues reads/writes): <variation_name>_example_d0.v/vhd
Megacore top level: <variation_name>_example_if0.v/vhd
  Instantiates controller core: <variation_name>_example_if0_c0.v/vhd
  Instantiates PHY core: <variation_name>_example_if0_p0.v/vhd
Top Level Code (Verilog)
module ddr3_top_example (
    // Must assign these ports to I/O pins
    input  wire        pll_ref_clk,
    input  wire        global_reset_n,
    output wire [12:0] mem_a,
    output wire [2:0]  mem_ba,
    output wire        mem_ck,
    output wire        mem_ck_n,
    output wire        mem_cke,
    inout  wire [63:0] mem_dq,
    inout  wire [7:0]  mem_dqs,
    inout  wire [7:0]  mem_dqs_n,
    output wire [7:0]  mem_dm,
    output wire        mem_cs_n,
    output wire        mem_ras_n,
    output wire        mem_cas_n,
    output wire        mem_we_n,
    output wire        mem_reset_n,
    output wire        mem_odt,
    // Designed for use with test driver only; could be promoted to
    // debug header / port if desired
    output wire        drv_status_fail,
    output wire        drv_status_test_complete,
    output wire        drv_status_pass
);
Constraints Required
Generated by MegaWizard Plug-In Manager and pointed to by .qip
  Full set of SDC timing constraints (.sdc files)
  Tcl scripts to create timing report on memory interface margin
    <variation_name>_report_timing.tcl
    <variation_name>_report_timing_core.tcl
  Pin I/O standards and grouping script (must be run by user)
    <variation_name>_pin_assignments.tcl
Required from user
  Pin placement constraints done in Quartus II Pin Planner
  Pin Planner file (.ppf) available for early I/O planning flow
  See the I/O System Design online training for details
QDR II/II+ SRAM Controller With UniPHY
[Figure: example design block diagram. A traffic generator (Avalon-MM master) connects over the local interface (Avalon-MM) to the QDR II/II+ controller, which contains a write data FIFO and a command issuing FSM with separate read and write ports; the controller connects over the AFI to the PHY (with PLL and DLL), which connects to the QDR II/II+ SRAM.]
RLDRAM Controller With UniPHY
Similar to QDR except has timers that interrupt controller to do refresh
[Figure: example design block diagram. A traffic generator (Avalon-MM master) connects over the local interface (Avalon-MM) to the RLDRAM controller, which contains a write data FIFO, a command issuing FSM, a refresh timer, and bank timers; the controller connects over the AFI to the PHY (with PLL and DLL), which connects to the RLDRAM II.]
Quartus II Project Settings
Recommended Quartus II Settings
Optimize hold timing feature for All Paths
Standard Fit (highest effort) option
  Default is Auto Fit (shorter compilation)
  Stand-alone memory designs should meet timing with Auto Fit
Physical synthesis
Compiling the Example Design
Open and use .qpf project in example_project directory
Test Your Knowledge: Design Flow
1. Why is timing derating necessary when creating an external memory interface?
A. Adjusts setup and hold timing requirements to the external memory to account for board effects
2. What information is stored in a memory preset?
A. The default values for all memory interface parameter settings based on a given external memory device
3. What are the main components of the example design generated by the MegaWizard Plug-In Manager?
A. Top-level design, traffic generator, controller, and PHY
Section 2 Resources
User guides
  External Memory Interfaces Handbook (Volume 2, Section I)
  Quartus II Software Handbook
Device handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Please go to Exercise 1
Create a design that includes a high performance memory controller and PHY
Section 3: Functionality and Simulation of a Memory System
Agenda
Controller functionality
  Connections to core RTL design
  Latency
  Consecutive burst support
Top-level system design description
Simulation models and directory structure
Simulation setup inside the Quartus II software
  Choosing EDA simulator and setting up NativeLink
  Using Quartus II software-generated scripts
Running simulation
Controller Operation and Connections to User Logic
Example Design Revisited
Simulation project with testbench and memory model found in <project_directory>/<variation_name>_example_design/simulation
[Figure: testbench block diagram. The testbench contains the example design and a memory model. Inside the example design, a traffic generator (Avalon-MM master) drives the DDR/2/3 controller over the local interface (Avalon-MM); the controller logic connects to the PHY over the AFI; the PHY connects to the memory model. Also shown: clock source, PLL, DLL, and the pass or fail outputs.]
Memory Controller Interface Signals
[Figure: DDR3 SDRAM controller signal diagram. The memory controller connects over the AFI to the UniPHY or ALTMEMPHY.]
Local interface (Avalon-MM), signals from user logic:
  avl_addr, avl_be, avl_burstbegin, avl_read_req, local_refresh_req, local_refresh_chip, avl_size, avl_wdata, avl_write_req, local_autopch_req, local_self_rfsh_chip, local_self_rfsh_req, local_multicast, csr_addr, csr_be, csr_read_req, csr_write_req, csr_wdata
Local interface (Avalon-MM), signals to user logic:
  local_init_done, avl_rdata, avl_rdata_valid, avl_rdata_error, avl_ready, local_refresh_ack, local_power_down_ack, local_self_rfsh_ack, ecc_interrupt, csr_rdata, csr_rdata_valid, csr_waitrequest
External memory interface:
  mem_addr, mem_ac_parity, mem_ba, mem_cas_n, mem_cke, mem_cs_n, mem_dm, mem_odt, mem_ras_n, mem_we_n, parity_error_n, mem_dq, mem_dqs, mem_dqs_n, mem_err_out_n
Verify Design in Silicon
SignalTap II Logic Analyzer
  Verify local memory interface and pass-or-fail signals
Do not use on external memory interface pins
  Tapping signals adds stubs, affects timing!
Ops Initiated from User Logic or Example Driver
From local (Avalon) interface
  Memory writes
  Memory reads
  Writes and reads must not be activated simultaneously, else core falls into unknown state
Refresh
  If user-controlled refresh option enabled
  Else Auto Refresh (ARF) command periodically issued
Local Interface: Hand Shaking Schemes
Local interface signals can be separated into 2 groups
1. request / avl_ready group
  avl_write_req, avl_read_req, avl_addr, avl_size
  Controller returns: avl_ready*
  *High: controller ready to accept signals
  *Low: user logic must hold read/write request, size, and address signals until avl_ready = 1
2. write / read data group
  avl_wdata: data to write put on bus along with avl_write_req
  avl_rdata_valid, avl_rdata: read data valid signal tells local interface that valid data is present
Hand Shaking when avl_ready Low
Local interface must hold read/write request, size, and address signals fixed until avl_ready = 1
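The hold-until-ready rule can be modeled cycle by cycle. The sketch below is a behavioral illustration, not RTL: the function name, the request labels, and the ready pattern are all hypothetical.

```python
def drive_requests(requests, ready_per_cycle):
    """Return one (driven_request, avl_ready) pair per clock cycle.

    The same request stays on the bus, unchanged, for every cycle in
    which avl_ready is low; it is accepted in the first cycle where
    avl_ready is high.
    """
    trace = []
    pending = list(requests)
    for ready in ready_per_cycle:
        if not pending:
            break
        current = pending[0]  # held unchanged while avl_ready is low
        trace.append((current, ready))
        if ready:
            pending.pop(0)  # accepted; present the next request
    return trace


# Hypothetical run: the controller stalls for two cycles on the second request.
trace = drive_requests(["RD@0x10", "WR@0x20"], [True, False, False, True])
```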
HPC II Architecture: Functional Overview
Memory refresh
  Periodically issue auto-refresh command for data retention
  Can choose user-controlled refresh option
Memory initialization
  Memory must be initialized before functional use
  Initialize memory automatically in MegaWizard settings
Training/calibration (user transparent)
  Between controller, PHY, and memory
  Controller NOT READY during this phase
Memory write/read
  Must follow bank management order
  Core automatically manages bank within memory chips
Controller Operation – Bus Commands
Command                                Acronym  ras_n  cas_n  we_n
No operation                           NOP      High   High   High
Active (opens bank for reads/writes)   ACT      Low    High   High
Read                                   RD       High   Low    High
Write                                  WR       High   Low    Low
Burst terminate (DDR only)             BT       High   High   Low
Precharge                              PCH      Low    High   Low
Auto refresh                           ARF      Low    Low    High
Mode register set                      MRS      Low    Low    Low
DDR2/3 bursts can be burst chop of 4 (BC4), full burst length of 8 (BL8), or “on-the-fly” (set during initialization)
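When reading simulation waveforms, the command table can be turned into a small lookup. This is a convenience sketch; the dictionary and function names are hypothetical, with 1 standing for High and 0 for Low.

```python
# Keys are (ras_n, cas_n, we_n); 1 = High, 0 = Low.
COMMANDS = {
    (1, 1, 1): "NOP",
    (0, 1, 1): "ACT",
    (1, 0, 1): "RD",
    (1, 0, 0): "WR",
    (1, 1, 0): "BT",   # burst terminate, DDR only
    (0, 1, 0): "PCH",
    (0, 0, 1): "ARF",
    (0, 0, 0): "MRS",
}


def decode_command(ras_n, cas_n, we_n):
    """Map the three command pins to the acronym used on the slide."""
    return COMMANDS[(ras_n, cas_n, we_n)]


example = decode_command(ras_n=0, cas_n=1, we_n=1)
```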
Read and Write Bursts
Controller allows you to use any burst length up to maximum burst length set on memory device
  Full-rate: burst lengths 1, 2, 4 (local) = 2, 4, 8 (memory)
  Half-rate: burst lengths 1, 2 (local) = 4, 8 (memory)
“On-the-fly” burst length selection with address pin lets controller optimize bursts for maximum efficiency
Full-rate controller   local interface   1  2  4
                       memory interface  2  4  8
Half-rate controller   local interface   1  2
                       memory interface  4  8
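The local-to-memory burst length relationship in the table is a fixed multiplier per rate mode, as this small sketch shows. The names are hypothetical; the factors 2 and 4 are read directly from the table above.

```python
# Memory beats per local-interface beat, taken from the table above.
RATE_FACTOR = {"full": 2, "half": 4}


def memory_burst_length(local_burst, rate):
    """Burst length seen at the memory for a given local burst length."""
    return local_burst * RATE_FACTOR[rate]


full_rate = [memory_burst_length(b, "full") for b in (1, 2, 4)]
half_rate = [memory_burst_length(b, "half") for b in (1, 2)]
```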
Consecutive Bursting
Data from one read or write command is concatenated with data from subsequent command
  Effectively moving data on every clock cycle, despite burst size limits
  No wait states/gaps on DDR data bus
  Possible within an open row, when the next (RD/WR) command is issued within an interval of burst length / 2 cycles
  HPCII manages this intelligently, storing commands and issuing them out of order if possible
Any gaps or wait states means the bus is empty; efficiency of controller goes down
Consecutive Bursting
Controller issues next command on BL/2 cycles interval to concatenate bursts as long as addressing within same row
  Otherwise ACT needed to address new row
  HPCII bank look-ahead mitigates this
As long as accessing row doesn't change, controller can continue bursting over and over (until Refresh)
[Figure: SDRAM rows 1, 2, 3; a gap breaks the consecutive burst]
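The effect of gaps on efficiency can be estimated with simple arithmetic: a burst of length BL occupies the DDR data bus for BL/2 memory clock cycles, so commands spaced further apart than that leave the bus idle. The sketch below is a simplified model with a hypothetical function name; it ignores refresh, activates, and bus turnaround.

```python
from fractions import Fraction


def data_bus_utilization(burst_length, command_interval_cycles):
    """Fraction of cycles in which the data bus carries data, assuming
    one command every `command_interval_cycles` memory clock cycles."""
    busy_cycles = Fraction(burst_length, 2)  # DDR: two beats per cycle
    if command_interval_cycles < busy_cycles:
        raise ValueError("bursts would overlap on the data bus")
    return busy_cycles / command_interval_cycles


back_to_back = data_bus_utilization(burst_length=8, command_interval_cycles=4)
with_gap = data_bus_utilization(burst_length=8, command_interval_cycles=6)
```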
Consecutive READ Burst
Consecutive WRITE Burst
Altera PHY Interface (AFI)
Communication protocol between the controller and PHY
Single data rate interface transfers high and low data in Single data rate interface transfers high and low data in one clock cycle
Up to PHY to split into rising and falling edge datap p g g g
Handles the transition between full memory clock speed and half (or quarter) rate data
AFI bus signal width = mem_signal_width * signal_rate * AFI_RATE_RATIO
where
signal_rate = 1 for SDR protocols, 2 for DDR protocols
AFI_RATE_RATIO = 1 for full-rate, 2 for half-rate, 4 for quarter-rate
Ex.: 13-bit DDR3 address bus using half-rate mode
13 * 1 * 2 = 26-bit AFI_ADDR_WIDTH
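The width formula can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the dictionary and function names are made up for illustration, and the 64-bit DQ bus in the second call is a hypothetical example.

```python
# Multipliers from the slide: signal_rate and AFI_RATE_RATIO.
SIGNAL_RATE = {"SDR": 1, "DDR": 2}
AFI_RATE_RATIO = {"full": 1, "half": 2, "quarter": 4}


def afi_width(mem_signal_width, protocol, rate):
    """AFI bus width = mem_signal_width * signal_rate * AFI_RATE_RATIO."""
    return mem_signal_width * SIGNAL_RATE[protocol] * AFI_RATE_RATIO[rate]


# Address signals are single data rate, so signal_rate is 1.
print(afi_width(13, "SDR", "half"))  # slide example: 13 * 1 * 2
# Hypothetical 64-bit DQ bus, double data rate, half-rate controller.
print(afi_width(64, "DDR", "half"))
```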
Implementing, Simulating, & Debugging External Memory Interfaces
A-MNL-ISDMI-12-0-v1 65
Read Sequence from User Logic
As illustrated on next slide…
1) User logic requests READ by asserting avl_read_req along with size and address
Accepted by controller, indicated by avl_ready high
2) Controller issues ACT to PHY over AFI
3) PHY issues ACT to memory
4) Controller issues READ over AFI
5) PHY issues READ to memory
6) PHY receives DDR data from memory
7) Controller receives SDR (half-rate) read data over AFI
8) User logic receives read data from controller
[Figure: block diagram annotated with steps 1 through 8 of the read sequence]
Write Sequence from User Logic
As illustrated on next slide…
1) User logic requests WRITE by asserting avl_write_req along with size and address
Accepted by controller, indicated by avl_ready high
2) Controller receives write data
3) Controller issues ACT to PHY over AFI
4) PHY issues ACT to memory
5) Controller issues WRITE over AFI
6) PHY issues WRITE command to memory
7) Controller issues write data to PHY over AFI
8) PHY sends write data to memory
[Figure: block diagram annotated with steps 1 through 8 of the write sequence]
Latency
Read latency
Cycles for data to arrive at local interface after read request
Total latency: from read command to data
Memory latency: from read command hitting the memory to data back
Write latency
Cycles for data to arrive at memory interface after write request
Basic assumptions
Reading and writing to rows that are already open
avl_ready signal is asserted high (no wait states)
Number of clock cycles using local (PHY) clock
Read Latency Components
Controller latency: avl_read_req to afi_rdata_en (AFI signal)
Command output latency: afi_rdata_en to mem_cs_n
CAS latency: read command to DQ data appearing on the bus
PHY read data input latency: until read data appears on the local interface
Write Latency Components
Controller latency: avl_write_req to write command on AFI
Command output latency: write command on AFI to mem_cs_n
PHY write data output latency: until write data appears on memory interface DQ/DQS pins
DDR3 Latency with UniPHY
Measured in full-rate (memory clock) clock cycles
Varies depending on read or write and whether memory’s required CAS write latency (CWL) setting is odd or even
Rate | Controller address & command | PHY address & command | Memory maximum read | PHY read return | Controller read return | Round trip | Round trip without memory
Quarter | 20 | 8-11 | 5-11 | 14-17 | 8 | 57-65 | 52-56
Half | 10 | 3-4 | 5-11 | 6-7 | 4 | 28-36 | 23-25
Full | 5 | 0 | 5-11 | 4 | 10 | 24-30 | 19
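For the half-rate and full-rate rows, the round-trip columns equal the sum of the per-stage columns, which the sketch below reproduces. The data structure and function name are hypothetical. The quarter-rate row is left out because its published round-trip range is not a plain sum of the per-stage extremes.

```python
# (min, max) full-rate clock cycles per stage, copied from the latency table:
# controller addr/cmd, PHY addr/cmd, memory max read, PHY read return,
# controller read return.
LATENCY = {
    "half": [(10, 10), (3, 4), (5, 11), (6, 7), (4, 4)],
    "full": [(5, 5), (0, 0), (5, 11), (4, 4), (10, 10)],
}
MEMORY_STAGE = 2  # index of the "memory maximum read" column


def round_trip(rate, include_memory=True):
    """Return (min, max) round-trip latency by summing the per-stage ranges."""
    stages = [
        stage
        for index, stage in enumerate(LATENCY[rate])
        if include_memory or index != MEMORY_STAGE
    ]
    return sum(lo for lo, _ in stages), sum(hi for _, hi in stages)


for rate in ("half", "full"):
    print(rate, round_trip(rate), round_trip(rate, include_memory=False))
```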
Performing a Simulation
Recommended Memory Interface Design Flow
[Flowchart]
Start design
Select device
Create and parameterize memory interface
Instantiate PHY & controller (example or custom design)
Perform functional simulation (optional, but recommended)
Expected results? If no: debug design
Add constraints (I/O, timing, etc.) and compile
Verify timing
Meets timing and performance? If no: adjust constraints
Board (PCB) related tasks (layout, simulation, termination, drive strength settings, etc.)
Verify functionality & SI on board
Works correctly? If no: debug design
Design complete
Review: Design Files Generated
MegaWizard Plug-In Manager generates:
IP instance
Simulation model with scripts for simulating with Mentor Graphics, Cadence, or Synopsys® tools
(Optional) Example design and scripts for generating example design testbench
About the Traffic Generator
State machine with Avalon-MM interface
Individual/block reads/writes
Sequential/random addressing
Writes data patterns to a range of addresses in all memory banks
Reads back data
Checks to see if it matches
Testbench outputs
drv_status_pass, drv_status_fail
Active high; indicates test pass or fail
drv_status_test_complete
Transitions high for one clock cycle at end of test
Message printed to simulation console stating whether test PASSES or FAILS
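The write, read back, and compare flow of the traffic generator can be sketched in software. This is not the generated driver; it is a behavioral illustration in which a Python dict stands in for the memory, and every name (including the `corrupt` argument used to force a failure) is hypothetical.

```python
import random


def run_traffic_test(memory, num_banks, addrs_per_bank, seed=0, corrupt=None):
    """Write a pattern to every bank, read it back, and report pass/fail."""
    rng = random.Random(seed)
    expected = {}
    # Write phase: one pseudo-random 32-bit word per (bank, address).
    for bank in range(num_banks):
        for addr in range(addrs_per_bank):
            data = rng.getrandbits(32)
            memory[(bank, addr)] = data
            expected[(bank, addr)] = data
    # Optional fault injection so the failing path can be exercised.
    if corrupt is not None:
        memory[corrupt] ^= 1
    # Read-back phase: compare every location against what was written.
    mismatches = [loc for loc, data in expected.items() if memory.get(loc) != data]
    status = {
        "pass": not mismatches,
        "fail": bool(mismatches),
        "test_complete": True,
    }
    print("test PASSES" if status["pass"] else "test FAILS")
    return status


run_traffic_test({}, num_banks=8, addrs_per_bank=16)
run_traffic_test({}, num_banks=8, addrs_per_bank=16, corrupt=(3, 5))
```

The returned dictionary mirrors the roles of drv_status_pass, drv_status_fail, and drv_status_test_complete, without modeling their cycle-level timing.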
Greater Project Directory Structure
(Quartus II design project folder)
Project (.qpf) file
Settings (.qsf) file
Quartus IP (.qip) file (only file needed to be added to project)
(instance design files and constraints)
(standalone reference design)
• Example project .qpf, .qsf, and .qip for synthesis
• Scripts for generating Verilog or VHDL modules for simulation
• Includes top-level testbench and generic memory model
(functional simulation files for components of the IP plus scripts for simulating in 3rd party tools)
Generating Simulation Modules
1. Open simulation example project
<variation_name>_example_design/simulation/<variation_name>_example_sim.qpf
2. Run appropriate Tcl script to generate Verilog or VHDL
generate_sim_verilog_example_design.tcl
generate_sim_vhdl_example_design.tcl
Creates submodules folder
Creates verilog or vhdl folder
cadence, synopsys, and mentor subfolders contain simulation scripts for each vendor's tool
Example Project Sim Hierarchy
[vhdl/verilog]/<variation_name>_example_sim.v/.vhd (simulation example design)
<variation_name>_example_sim_e0.v/.vhd (example design wrapper)
<variation_name>_example_sim_e0_d0.v/.vhd (traffic generator)
<variation_name>_example_sim_e0_if0.v/.vhd (example design DUT)
alt_mem_if_ddr3_mem_model_top_ddr3_mem_if_dm_pins_en_mem_if_dqsn_en.sv
(generic memory model)
status_checker_no_ifdef_params.sv (pass/fail status checker)
In [vhdl/verilog]/submodules/ folder
Detailed Simulation Example Design
[Figure: block diagram of ddr3_top_example_sim.v/vhd. The wrapper ddr3_top_example_sim_e0.v/vhd contains the traffic generator (ddr3_top_example_sim_e0_d0.v/vhd) connected over the local interface (Avalon-MM) to ddr3_top_example_sim_e0_if0.v/vhd (controller logic, AFI, PHY, PLL, DLL). Also shown: memory model, status checker with pass, fail, and test_complete outputs, and the pll_ref_clk and global_reset_n inputs]
Note: Simulate just UniPHY with your own controller and traffic generator with the Generate PHY only option
Generic or Vendor Model?
Generic: fully follows all memory protocol specifications
Guaranteed to simulate accurately with generated IP
No additional setup or configuration required
Vendor: standardized and more thorough than generic model
May require additional setup and configuration
Requires manual connection to testbench
Vendor model simulation is not supported, but may provide a more accurate simulation of your actual memory interface
Obtain Model from Vendor
Vendor downloads may include additional parameter files and instructions
Placement of Vendor Model Files
All sim modules generated in submodules folder, including generic memory model
Place downloaded vendor model files (usually model and parameters files) here
Edit Vendor Model File
Define selected device parameters
Select correct speed grade and device width
`define sg15E   // Speed grade -15E
`define x16     // DQS bank width or part width
Be sure to include any parameter files
`include "ddr3_parameters.vh"
See instructions included with model files for parameter selection details
Replace Generic Instantiation
[Figure: the generic memory model instantiation ("Replace this") and the vendor model instantiation that takes its place ("With this")]
Point Script to Vendor Model
For Mentor Graphics ModelSim®, edit msim_setup.tcl
For Synopsys VCS, edit vcs_setup.sh or vcsmx_setup.sh
For Cadence NCSim, edit ncsim_setup.sh
Similar scripts for simulating just IP (not example design) in <variation_name>_sim/
VHDL Simulation Notes
Generated files are mix of VHDL and encrypted or plain-text Verilog
Requires mixed-language simulation tool
Encrypted files allow for use with ModelSim tool with VHDL-only license
Use Verilog or synthesis fileset for ModelSim mixed-language license
Run the Simulation
Run script appropriate to tool
From shell or within simulation tool
ModelSim example from shell: vsim -do run.do
ModelSim example in tool: do run.do
Post-Fit (Gate-Level) Simulation
Not possible with UniPHY due to inherent issues with its behavior in a post-fit netlist
Sampling X's during calibration
Internal 0-cycle transfers require delays for simulation
Can be worked around with a "quasi-post fit scheme" using Quartus incremental compilation
Pre-map RTL for UniPHY
Post-fit RTL for rest of design
See Simulating Memory IP chapter of the EMIF handbook for the step-by-step process
Using 3rd Party Controller IP
Vendor should provide test bench with test generator used to stimulate controller through Avalon (local) interface
If using own controller, follow Avalon interface specifications
Start design using the Altera example driver and add/connect extra signals as needed to exercise additional controller functionality
Use PHY-only generation to simplify setup
Connecting User Logic to Controller
Replace traffic generator with user logic
<variation_name>_example_sim_e0_d0.v in hierarchy
Borrow from test methodology defined inside traffic generator module
Obey functional requirements defined in controller handbook and external (physical) memory datasheets
Simulation Steps Summarized
Generate memory IP, choosing to generate the example design
Generate Verilog or VHDL simulation files using script
Edit and instantiate vendor memory simulation model
Point simulation script to location of vendor model
Run appropriate simulation script
Simulate and verify
Test Your Knowledge: Functionality & Simulation
1. What are the three main functions of the controller?
A. Memory initialization, refresh, and reading/writing; calibration is done by sequencer before controller handoff
2. What step must be performed in the simulation example project before you can run a simulation?
A. Generate files for Verilog or VHDL simulation using generated Tcl script
Section 3 Resources
User guides
External Memory Interfaces Handbook (Volume 2, Section I)
Quartus II Software Handbook
Device handbooks
Cyclone III, Cyclone IV, & Cyclone V FPGAs
Arria GX, Arria II GX/GZ, & Arria V FPGAs
Stratix III, Stratix IV, & Stratix V FPGAs
Please go to Exercise 2
Simulation of the controller
Section 4: Board and Termination Considerations
Recommended Memory Interface Design Flow
[Flowchart]
Start design
Select device
Create and parameterize memory interface
Instantiate PHY & controller (example or custom design)
Perform functional simulation (optional, but recommended)
Expected results? If no: debug design
Add constraints (I/O, timing, etc.) and compile
Verify timing
Meets timing and performance? If no: adjust constraints
Board (PCB) related tasks (layout, simulation, termination, drive strength settings, etc.)
Verify functionality & SI on board
Works correctly? If no: debug design
Design complete
Agenda
Assigning I/O constraints
Pin locations, loading
Termination schemes
Creating I/O Assignments
Some I/O Assignments Done For You
MegaWizard Plug-In Manager generates Tcl script that includes certain I/O constraints
<variation_name>_if0_p0_pin_assignments.tcl
You must source this script yourself prior to running Pin Planner
Script automatically constrains:
I/O Standard assignments for the various memory interface pins
Input and Output Termination assignments (discussed later)
Current Strength assignments for the high fanout address/command/control signals
DQ Group assignments to associate DQS signals with the DQ signals they clock
Memory Interface Delay Chain Configuration assignments to set the DQ, DQS, and DM I/O to use the flexible timing model
Additional I/O Constraints to Specify
Board trace components or output pin load assignments
Based on board memory topology & memory datasheet specifications
Pin Location assignments
Creating I/O Location and Other Assignments
Create and manage in the Quartus II Pin Planner
Optimized locations for memory interface pins defined in device handbooks and highlighted in Pin Planner
Implement predefined pin-outs from board layout guidelines or define and send them to board developer
May be limited by locations on device of memory I/O specific features
Quartus II Pin Planner
Assignments menu > Pin Planner
[Figure: Pin Planner window showing the Package View, Groups list, and All Pins (signals) list]
Assigning DQ Pins
View or right-click menu
[Figure: Pin Planner screenshot with example assignment PIN_C6, I/O standard 1.8-V HSTL Class II; legend: Q = DQ, S = DQS]
Assigning DQS Pins
One DQS (or DQS differential pair) for each DQ block
Assigning All Other Required Pins
mem_a[ ]
mem_ba[ ]
mem_ck[ ]
mem_ck_n[ ]
mem_cas, mem_cas_n
mem_ras, mem_ras_n
mem_we, mem_we_n
mem_cs, mem_cs_n
mem_dm
mem_odt
etc.
Stratix V Layout Guidelines
Devices include unique architectural features to support highest frequency memory interfaces
Guidelines must be followed to guarantee timing closure
See the Stratix V device handbook chapter entitled External Memory Interfaces in Stratix V Devices for details
http://www.altera.com/literature/hb/stratix-v/stx5_51008.pdf
Stratix V Sub-Banks and Leveling Blocks
Each I/O bank made up of multiple "sub-banks"
Example: Upper-left bank has sub-banks named 8A, 8B, and 8C; more on larger devices
Leveling blocks
Generate delayed (PVT compensated) versions of the source clock (e.g. 0°, 45°, 90°) from the PHY clock tree (next slide)
Distribute output clocks to all I/Os in each sub-bank
Implement DQS phase shift
Connected to only one of the PHY clock trees (center, left, or right) available
[Figure: I/O sub-bank whose I/Os are fed by a leveling block; the leveling block connects to the DLL and, through the PHY clock tree, to the PLL]
PLLs & PHY Clock Trees
Dedicated high-speed, low-skew balanced trees
Three each on top and bottom edge of device
Controlled by left, center, or right dedicated PLLs along edge
Each PLL can drive only one PHY clock tree
Each PHY clock tree can reach one device edge
Each leveling block can only access one PHY clock tree
[Figure: left, center, and right PLLs along a device edge driving the left, center, and right PHY clock trees across the sub-banks]
DLL Placement and Limitations
4 DLLs available in the corners of a device
Each DLL can serve the 2 adjacent sides
Maximum number of incompatible interfaces on each side of a device is 2
When implementing multiple interfaces (discussed later)
DLL can be shared without sharing PLLs
DLL is clocked by PLL, hence their frequencies must be the same
[Figure: device outline with a DLL in each of the four corners]
Mandatory Stratix V Layout Guidelines
Interfaces must not be split between top and bottom edges of device
Due to PLL/DLL limitations discussed
All CK/CK#, address, control, command pins for an interface should be in same I/O sub-bank
For optimum timing, especially for >800 MHz interfaces
Highly recommended, but not required for ≤800 MHz
Highly Recommended Stratix V Guidelines
Use a center, instead of a left or right corner, PLL
Prevents long PHY clock tree delay from corner PLLs
Set PLL input clock reference to pin that drives center PLL
For ≥800 MHz interfaces and for wide interfaces that must straddle between quadrants
If possible, avoid straddling
If center PLL not possible, all pins should be in same quadrant as PLL
Advanced I/O Timing for Memory
Cyclone III, Stratix III, and all newer devices
Adjust timing based on HSPICE model for more accurate timing analysis
Create models for all output and bidirectional memory I/O
Match model to selected termination scheme
For use mostly when a 3rd-party board simulation tool is not available
See the I/O System Design online training for details
Assign signal to pin, select pin, then View or right-click menu > Board Trace Model
Board Trace Constraints
Allow you to model board topology from board-level simulations into the Quartus II design flow
For example:
Near and far trace lengths
Near and far trace distributed inductance
Near and far trace distributed capacitance
Near end capacitor values
Far end capacitive (IC) load
Far end termination values
Setting Far Capacitance
Set load based on memory input capacitance
Found in vendor datasheet
Example: 4 memory components, each with 1 CS# input
Load on single FPGA CS# output pin is 4 x 2 pF = 8 pF
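The far-end load is simply the sum of the input capacitances hanging on one FPGA output. The sketch below repeats the slide's arithmetic; the function name and the `inputs_per_component` parameter are hypothetical, and the 2 pF figure is the example value from the slide, not a datasheet number.

```python
def far_end_load_pf(num_components, input_capacitance_pf, inputs_per_component=1):
    """Total capacitive load (pF) seen by one FPGA output pin."""
    return num_components * inputs_per_component * input_capacitance_pf


# Slide example: 4 memory components, one CS# input each, 2 pF per input.
print(far_end_load_pf(4, 2.0))
```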
Board Design and Simulation Basics
The Need for Good Board Design
Memory interfaces are getting faster
Increased speed comes at a price
Smaller data valid windows
Signal integrity affected
Board design can be a balancing act
Must ensure timing requirements are met
Must ensure quality signals at receivers
Often also need to minimize power
Achieving Good Signal Integrity
Meeting timing and good signal integrity (SI) requires good placement and routing
Board routing constraints are a necessity
Good signal integrity requires termination
Termination (usually) requires components
Termination resistors
Termination power supply rail(s) circuitry
More components = more difficult placement + routing + greater power usage!
Follow guidelines in the External Memory Interfaces Handbook
Design Methodology for Optimal SI
Minimize component usage to save power
Take advantage of features and settings available in FPGA and memory
Perform board-level simulation, taking features and settings into account
Experiment before building board
Determine best topology
Help generate board routing constraints
After board build, compare actual results with simulations
Board Design 101: What is Impedance?
Impedance
The resistance to the flow of energy in a transmission line
Impedance (Z) is a complex number
Z = R + jX, where R = resistance (real part) and X = reactance (imaginary part)
For capacitors and inductors (not discussed here)
Characteristic Impedance, Z0
Purely real impedance found on lossless transmission lines
For this discussion, assume Z0 = 50 Ω (common line value)
Impedance discontinuities on an electrical path cause reflections of energy
Impedance matched lines have minimal loss and yield better SI
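How strongly a discontinuity reflects can be quantified with the standard voltage reflection coefficient, Γ = (Z_load − Z0) / (Z_load + Z0). The slides do not state this formula, so the sketch below is supplementary; it assumes purely resistive impedances and uses hypothetical load values.

```python
def reflection_coefficient(z_load, z0=50.0):
    """Voltage reflection coefficient at a load on a line of impedance z0.

    0 means no reflection (matched); +1 is an open circuit; -1 is a short.
    Assumes purely real (resistive) impedances in ohms.
    """
    return (z_load - z0) / (z_load + z0)


for z_load in (50.0, 100.0, 25.0):
    gamma = reflection_coefficient(z_load)
    print(f"load {z_load:5.1f} ohm on a 50 ohm line: gamma = {gamma:+.3f}")
```

A matched load gives zero; the further the load departs from Z0 in either direction, the larger the fraction of the incident wave that returns toward the driver.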
Signal Reflections & Impedance Matching
[Figure: driver with source voltage VS and source impedance ZS driving a transmission line Z0 into load ZT, with Z0 ≠ ZT; labels mark where the initial reflection and the problem reflection are seen]
VS = source voltage
ZS = source impedance
Z0 = transmission line (characteristic) impedance
ZT = load impedance
Accumulated reflections at the load cause ringing and/or over/undershoot
Termination
Need ways to
Prevent reflections: Match impedance all along path
Dissipate reflections: External components or device features to dissipate reflection energy
Termination schemes prevent or dissipate signal reflections
Device settings
Component topologies
Designing Termination Schemes
Design and choose best scheme through simulation
Example simulation tool: Mentor Graphics HyperLynx
Simple graphical interface to configure board stack-up
Draw I/O buffers, board trace components
Apply IBIS models (custom or generic) to buffers
Virtual probes attached to points, usually receiver, in circuit
Simulations performed using an interface similar to an oscilloscope
Also needed for getting slew rate values for timing derating
See Appendix for details and examples
IBIS Models
Voltage or current vs. time tables describe buffer behavior
Custom IBIS models for design created in Quartus II software by EDA Netlist Writer
Generic models available from Altera web site
http://www.altera.com/support/software/download/ibis/ibs-ibis_index.jsp
Choosing Optimal Termination Settings
Termination Schemes: Series Termination
Added to source impedance to match trace impedance
Dissipates reflections back at source
15 Ω included on DDR3 DIMMs (as shown)
Decent signal integrity with low power usage
However, source impedance changes depending on whether signal is high or low
[Figure: FPGA and DDR3 DIMM connected over line Z0, with source impedance ZS and series resistor RS; ZS + RS = Z0]
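The sizing rule ZS + RS = Z0 rearranges to RS = Z0 − ZS. The sketch below applies it; the function name is hypothetical and the 35 Ω driver impedance is an assumed value chosen only so the result lands on the 15 Ω resistor mentioned on the slide.

```python
def series_resistor(z0, z_source):
    """Series resistor (ohms) that makes z_source + RS equal the line impedance."""
    rs = z0 - z_source
    if rs < 0:
        raise ValueError("source impedance already exceeds the line impedance")
    return rs


# Assumed 35 ohm driver on a 50 ohm trace.
print(series_resistor(50, 35))
```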
Termination Schemes: Parallel
VTT (VCC/2) power supply required
DDR: 1.25V
DDR2: 0.9V
DDR3: 0.75V
Impedance matched at receiver
Good for unidirectional signals (command, address, etc.) or at the receiver of a bidirectional signal (DQ, DQS, etc.)
[Figure: FPGA or memory driver (ZS) driving line Z0 into an FPGA or memory receiver with parallel termination ZP to VTT; ZP = ZS = Z0]
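The VTT values listed for each standard follow from VTT = VCC/2. The sketch below assumes the usual I/O supply voltages of 2.5 V (DDR), 1.8 V (DDR2), and 1.5 V (DDR3), which are consistent with the slide's VTT values but are not stated on it; the names are hypothetical.

```python
# Assumed I/O supply voltage per memory standard, in volts.
VCC = {"DDR": 2.5, "DDR2": 1.8, "DDR3": 1.5}


def vtt(standard):
    """Termination rail voltage: half of the I/O supply."""
    return VCC[standard] / 2


for standard in VCC:
    print(f"{standard}: VTT = {vtt(standard):.2f} V")
```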
FPGA and Memory Termination Settings
Reduce (or remove) external termination components using settings on FPGA and external memory
FPGA
Series and Parallel On-Chip Termination (OCT)
Dynamic OCT
DDR3 read/write leveling
See Appendix for additional features
External memory
Normal and reduced drive strength
On-Die Termination (ODT)
Dynamic ODT
Quartus II Software Support for OCT
Assign termination scheme in the Pin Planner
Select series or parallel with or without calibration
OCT with Calibration
Output (series) or input (parallel) buffer impedance automatically matched to external ±1% resistors at end of device configuration
Stratix III & IV devices: RUP & RDN for 25 Ω or 50 Ω settings
Stratix V devices: single RZQ for a range of values
Requires OCT calibration block resource
[Figure: external calibration resistors: RUP tied to VCCIO, and RZQ or RDN]
Stratix V OCT Calibration Blocks
4 OCT calibration blocks in the corners of a device
Each calibration block can connect to any sub-bank on any side of the device
Each sub-bank can connect to only one OCT calibration block
OCT calibration blocks can be shared by multiple interfaces (discussed later) provided that they have the same
Series/parallel termination settings
Sub-bank voltage
Dynamic OCT
Dynamically turn on or off series or parallel OCT
Power savings
Correctly matched impedances for reads or writes
[Figure: Write (Class I): Stratix III / IV / V with series OCT on and parallel OCT off, driving ZS through Z0 into memory terminated by ZP to VTT. Read (Class I): Stratix III / IV / V with series OCT off and parallel OCT on (shown as two 100 Ω resistors with VCC), receiving from memory through RS and Z0]
Recall: DDR3 Leveling
DDR3 clock signals routed in daisy-chain fly-by topology on DIMM
Discrete memory components can be placed and routed on PCB to support leveling
Improves signal integrity on high fan-out clocks
Other signals still point-to-point
Stratix III / IV / V devices have special leveling circuitry
Automatically accounts for delays and phase adjustments
Aligns all signals on writes and reads
Required for DDR3 operating at 240 MHz or higher
Enabling Leveling
Automatically enabled based on frequency and geometry
Leveling on above 240 MHz
Leveling required for DDR3 DIMMs
Disable leveling for DDR2/3 star topologies
External Memory: Output Drive Strength
Calibrated w/ VT through external precision resistor
DIMM: 34 Ω
34 + 15 = 49 Ω
Discrete: reduced 40 Ω
Set in memory's mode register MR1
[Figure: DDR3 DIMM, normal drive: Zmem + RS = Z0 = ZS. DDR3 component, reduced drive: Zmem ≈ Z0 = ZS]
External Memory: On-Die Termination
Enabled or disabled parallel termination with no external components
Dedicated ODT input pin on memory enabled by controller during writes
20, 30, 40, 60, 120 Ω calibrated nominal settings
  Based on divided down 240 Ω RZQ resistor connected to memory
Dynamic ODT: automatically switch between a normal ODT setting for reads and a different setting for writes
  Good for multi-DIMM configurations to reduce jitter and minimize reflections
[Figure: FPGA (ZS) driving trace Z0 into DDR3 memory, with ODT (ZP) terminating to VCC/2]
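The nominal ODT values listed above are integer divisions of the 240 Ω RZQ reference. A minimal sketch of that arithmetic follows; the constant and function names are illustrative only.

```python
RZQ_OHM = 240.0
ODT_DIVISORS = (12, 8, 6, 4, 2)  # divisors that yield 20, 30, 40, 60, 120 ohm


def odt_settings(rzq_ohm=RZQ_OHM):
    """Map each RZQ/n label to its nominal termination value in ohms."""
    return {f"RZQ/{d}": rzq_ohm / d for d in ODT_DIVISORS}


for label, ohms in odt_settings().items():
    print(f"{label}: {ohms:.0f} ohm")
```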
Recommendations
Many options and settings!
Need to find the best recommendations for features available in FPGA and memory
Try to get rid of components where possible to simplify board placement and routing
Simulate, simulate, simulate!
Recommended Settings (50 Ω trace)
Single-rank unregistered DIMM (UDIMM)
Signal type | FPGA side (1) (SSTL-15) | Memory side for writes (2) | Memory drive strength for reads (2)
DQ | Input & output: calibrated 50 Ω | 60 Ω ODT (RZQ/4) | 40 Ω (RZQ/6)
DQS | Input & output: calibrated differential 50 Ω | 60 Ω ODT (RZQ/4) | 40 Ω (RZQ/6)
DM | Output only: calibrated 50 Ω | 60 Ω ODT (RZQ/4) | 40 Ω (RZQ/6)
Address/command | Maximum drive strength | 39 Ω on-board fly-by termination | -
CK/CK# | Calibrated differential 50 Ω | 72 Ω differential on-board fly-by termination plus compensation capacitors | -
(1): set by <variation_name>_pin_assignments.tcl
(2): set in MegaWizard Plug-In Manager
Test Your Knowledge: I/O and Termination
1. What View option in the Pin Planner makes it easy to place external memory signals on the optimized pins of a device?
A. Show DQ/DQS Pins in x8/x9 mode or however wide your DQS byte lanes should be
2. How are the two types of OCT set on reads and writes when dynamic OCT is enabled on bidirectional signals?
A. Write: series OCT on, parallel OCT off; read: series OCT off, parallel OCT on
Section 4 Resources
User Guides
  External Memory Interfaces Handbook (Volume 2, Section I)
  Quartus II Software Handbook
Device Handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Application notes
  AN465: Implementing OCT Calibration in Stratix III Devices
  AN476: Impact of I/O Settings on Signal Integrity in Stratix III Devices
Please go to Exercise 3
Completing the controller
Section 5: Timing Analysis
Recommended Memory Interface Design Flow
[Flowchart]
Start design: select device
Create and parameterize memory interface
Instantiate PHY & controller (example or custom design)
Perform functional simulation (optional, but recommended)
  Expected results? If no, debug design
Add constraints (I/O, timing, etc.) and compile
Verify timing
  Meets timing and performance? If no, adjust constraints
Board (PCB) related tasks (layout, simulation, termination, drive strength settings, etc.)
Verify functionality & SI on board
  Works correctly? If no, debug design
Design complete
Agenda
Timing analysis methodology
  Timing components
  Timing paths
  Timing constraints and report files
  Timing analysis description
  Timing margin report
Timing closure
  Common issues
  Optimizing timing
Timing Analysis Methodology
Meeting timing requirements is challenging
Simplified implementation through Physical layer interface IPs
Numerous device features
Supported through TimeQuest timing analyzer
Timing Components
Source-synchronous timing paths
Calibrated timing paths
Internal FPGA timing paths
FPGA timing parameters
Source-Synchronous Paths
Clock and data pass from transmitting device
Example: FPGA-to-memory write datapath
  Adjust phase of DQS to center clock within data valid window
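Centering the strobe in the data valid window corresponds, in the ideal case, to a 90° shift of the memory clock. The sketch below converts that to picoseconds for a given clock frequency. It ignores setup/hold asymmetry, jitter, and board skew, and the function names are illustrative.

```python
def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz


def dqs_center_shift_ps(freq_mhz):
    """Ideal DQS delay that lands the strobe mid-way through a DDR bit time."""
    bit_time_ps = clock_period_ps(freq_mhz) / 2.0  # DDR: two bits per clock
    return bit_time_ps / 2.0                       # half a bit = 90 degrees


for f in (400.0, 533.0):
    print(f"{f:.0f} MHz: shift DQS by about {dqs_center_shift_ps(f):.0f} ps")
```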
Calibrated Paths
Data capture clocks dynamically positioned at power up
Reads
  Sequencer analyzes path delays between read capture and read FIFO buffer
  Sets up FIFO write clock phase for optimal timing
  Read postamble calibration done similarly
  Read data valid signal calibrated to delay between read command issued and data received
Writes
  Write-leveling and programmable output delay chains align DQS with CK at memory
Both
  Dynamic deskew adjusts delay of each DQ/DQS to center-align DQ with associated DQS
Internal FPGA Timing Paths
Impact memory interface timing
Common to all FPGA designs
Standard timing constraints required, i.e. clock constraints
TimeQuest timing analyzer reports these paths
FPGA Timing Parameters
I/O toggle rates vary based on
  Speed grade
  Loading
  I/O bank location
  Termination
  Drive strength
  Slew rate
Output clock specifications (from FPGA datasheet)
  Clock period jitter
  Half-period jitter
  Cycle-to-cycle jitter
  Skew between FPGA clock outputs
Memory Timing Paths (Stratix III/IV Devices)
[Figure: Stratix III/IV memory timing paths. Labels: I/O source synchronous & calibrated; calibrated; internal source synchronous (ALTMEMPHY only; bypassed by UniPHY)]
Memory Timing Paths (Arria V, Cyclone V, & Stratix V Devices)
[Figure: Arria V, Cyclone V, & Stratix V memory timing paths. Label: I/O source synchronous & calibrated]
UniPHY – DDR2/DDR3 Timing Paths
Timing path | Applicable clock(s) | Description
Address and command | pll_addr_cmd_clk, pll_mem_clk | Setup and hold margin for all address and command pins, from FPGA outputs to memory inputs
Clock-to-strobe | pll_addr_cmd_clk | DQS arrival at memory with respect to CK/CK#
Core | pll_afi_clk | Internal timing of the UniPHY IP, between internal core registers
Core recovery/removal | PHY clocks | Internal timing of the asynchronous reset signals to the UniPHY IP
Read capture | DQS_IN | Setup and hold margin for the DQ pins with respect to DQS strobe at the FPGA capture registers
Write datapath | DQS_OUT, pll_write_clk | Setup and hold margin for the DQ and DM pins with respect to DQS strobe at the memory
Read resynch | pll_avl_clk | Synchronizing captured data with read FIFO
UniPHY Files
File (<variation_name>...) | Description
.sdc | Clock constraints for PLL inputs; generated clock constraints for PLL outputs; derive clock uncertainty; exceptions (false paths and multi-cycle paths); output delays on address and command outputs; input and output delays on DQ inputs and outputs
_timing.tcl | Includes memory interface and FPGA device parameters
_report_timing.tcl | Main script for reporting timing slacks
_report_timing_core.tcl | Contains high-level procedures for report timing script
_pin_map.tcl | Library of functions and procedures used by other scripts
_parameters.tcl | Defines parameters describing the geometry of the core and PLL configuration (Do not change)
Timing Analysis Description
Areas that are analyzed by the TimeQuest timing analyzer in a design that includes a memory IP
  Address and command
  Core and core reset
  Read capture
  Write
  Read resynchronization
  Write leveling
  Bus turnaround time
Address and Command
Single data rate signals
Latched by memory using the FPGA output clock
TimeQuest analyzes from set_output_delay constraints (-max and -min)
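For intuition about where the -max and -min values come from, the sketch below applies the textbook source-synchronous relation between memory setup/hold times and board trace delays. The generated .sdc derives its own values from the parameters entered in the MegaWizard; the function name and every number here are hypothetical.

```python
def output_delay_bounds(t_setup_ps, t_hold_ps,
                        data_trace_max_ps, data_trace_min_ps,
                        clk_trace_max_ps, clk_trace_min_ps):
    """Return (max, min) output delay in ps for a source-synchronous output.

    max: latest data arrival relative to the earliest clock, plus setup time.
    min: earliest data arrival relative to the latest clock, minus hold time.
    """
    max_delay = data_trace_max_ps - clk_trace_min_ps + t_setup_ps
    min_delay = data_trace_min_ps - clk_trace_max_ps - t_hold_ps
    return max_delay, min_delay


# Hypothetical address/command numbers, for illustration only.
max_ps, min_ps = output_delay_bounds(
    t_setup_ps=200.0, t_hold_ps=275.0,
    data_trace_max_ps=520.0, data_trace_min_ps=480.0,
    clk_trace_max_ps=510.0, clk_trace_min_ps=490.0,
)
print(f"-max {max_ps / 1000.0:.3f} ns, -min {min_ps / 1000.0:.3f} ns")
```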
Core and Core Reset
Core analysis
  All internal core paths in the FPGA fabric
Core reset analysis
  Recovery/removal analysis of asynchronous reset signals to UniPHY
Read Capture
Timing analysis indicates slack for DQ signals
  Signals are latched by DQS
TimeQuest analyzes timing using
  set_input_delay (-max and -min)
  set_max_delay, set_min_delay
Base analysis from before calibration
Emulation of calibration and timing margins after calibration (Arria II, Cyclone IV, Stratix IV and V devices only)
  See “Analyzing Timing of Memory IP” chapter in EMIF handbook (Volume 2, Section I, chapter 10) for details on calibration emulation
Write
Timing analysis indicates slack on DQ signals
Latched by memory using DQS strobe output from FPGA
TimeQuest analyzes timing using set_output_delay (-max and -min)
Analyzes base timing and after calibration
Read Resynchronization
UniPHY implements FIFO buffer (sequencer)
  Synchronizes data transfer from the data capture to the core
  Calibration process sets the depth of the FIFO buffer
  No dedicated synchronization clock is required
Write Leveling DDR2/3
tDQSS memory timing parameter
  Calibrated path details skew margin for the arrival of DQS/DQS# rising edge with respect to CK/CK# rising edge at memory
tDSS/tDSH memory timing parameters
  Setup and hold skew margin for arrival of DQS/DQS# falling edge to CK/CK# rising edge at memory
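A tDQSS-style margin can be pictured as the allowed DQS-to-CK skew window minus the residual skew after write leveling. The sketch below assumes a limit of ±0.25 tCK, which is a placeholder; the real limit comes from the memory datasheet for the chosen speed grade. The function name and the example skew are likewise hypothetical.

```python
def tdqss_margin_ps(freq_mhz, dqs_ck_skew_ps, limit_tck=0.25):
    """Remaining margin in ps; negative means the skew exceeds the limit.

    limit_tck is an assumed fraction of the clock period, not a datasheet value.
    """
    tck_ps = 1e6 / freq_mhz
    return limit_tck * tck_ps - abs(dqs_ck_skew_ps)


print(tdqss_margin_ps(400.0, dqs_ck_skew_ps=100.0))
print(tdqss_margin_ps(400.0, dqs_ck_skew_ps=-700.0))
```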
Bus Turnaround Time
Analyzes margin from when bus switches from writing to reading
Prevents possible data bus contention failures
Stratix IV and V devices only
Timing Margin Report
Report DDR task in TimeQuest
  Analyzes all external memory interfaces
Run <variation_name>_report_timing.tcl
  Analyzes only this particular interface
Reports timing slacks on specific paths mentioned
  Read capture
  Read resynchronization
  Address and command
  Core
  Core recovery and removal
  Write
  Write leveling
Timing Report Summary
Quickly check for any failures in any of the critical timing areas
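The same quick check can be pictured in a few lines: any path whose setup or hold slack is negative is a failure. The dictionary layout, the function name, and the slack values below are made up for illustration and do not reflect the actual report file format.

```python
def failing_paths(report):
    """Return the names of paths with negative setup or hold slack (ns)."""
    return [name for name, (setup, hold) in report.items()
            if setup < 0 or hold < 0]


# Hypothetical slack values in ns: (setup, hold).
summary = {
    "Read capture": (0.120, 0.095),
    "Write": (-0.034, 0.210),
    "Address and command": (0.410, 0.388),
    "Core": (0.052, -0.004),
}
print(failing_paths(summary))
```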
General Recommendations
Accurately enter parameter settings and timing values, accounting for unit differences between datasheet and MegaWizard Plug-In Manager
Remember to derate timing parameters based on slew rates obtained through simulation
Create board trace models and enter board information in MegaWizard Plug-In Manager
Follow recommendations discussed today
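Unit mismatches are an easy source of error when copying datasheet values into the parameter editor. The helper below is a hypothetical sketch that normalizes a value given in ps, ns, or clock cycles (tCK) to picoseconds; which unit each MegaWizard field expects should be checked against the field label.

```python
def to_ps(value, unit, freq_mhz=None):
    """Convert a timing value to picoseconds.

    unit is "ps", "ns", or "tCK"; tCK values need the memory clock frequency.
    """
    if unit == "ps":
        return float(value)
    if unit == "ns":
        return value * 1000.0
    if unit == "tCK":
        if freq_mhz is None:
            raise ValueError("freq_mhz is required for tCK-relative values")
        return value * 1e6 / freq_mhz
    raise ValueError(f"unknown unit: {unit}")


print(to_ps(7.5, "ns"))
print(to_ps(0.25, "tCK", freq_mhz=400.0))
```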
Common Issues and Solutions (1)
Missing timing margin report
  Ensure .sdc file is attached to Quartus project
    Done automatically by UniPHY
  PHY should not be the top-level project entity
Incomplete timing margin report
  Check if memory interface pins are optimized away
  Ensure memory pins are connected at the top-level of the FPGA design
Read capture timing failures (Stratix III/IV devices)
  DQS phase shift selected is not optimal
  Board skew is too large
    Make sure board skew parameters are set correctly
    Default mismatch is 20 ps
Common Issues and Solutions (2)
Write timing
  Negative margins reported if PLL phase shift is not optimal
  Adjust PLL phase shift on the write clock
    Edit clock c2 in ALTPLL MegaWizard Plug-In Manager for <variation_name>_pll_memphy.v
    Regenerating memory IP will overwrite this!
PHY reset recovery and removal
  PHY reset signals should not be globals
    Global Signal assignment set to Off (should be set by pin_placement.tcl)
  Manually adjust logic placement (last resort)
Address and Command Timing Solutions
Change the PLL phase shift used to generate these signals
  PHY settings tab: Additional CK/CK# phase and Additional Address and Command clock phase
Ensure board trace model is accurately represented
  Especially far-end load and trace delay differences between address/command and memory clock I/O
Make sure PLL phase shift not negated by Fitter delay chain adjustment
  D5 Delay assignment set to 0
Optimizing Timing
Quartus II optimization settings
  Optimization Technique to Speed
  Physical synthesis optimizations
    Effort level to Extra (will greatly increase compile time)
Use Design Space Explorer (DSE) if necessary to sweep settings
Test Your Knowledge: Timing Analysis
1. What four components must be analyzed to perform a full timing analysis on an external memory interface?
A. Source-synchronous timing paths, calibrated timing paths, internal paths, timing parameters
2. What is the best thing you can do to make sure that an external memory interface meets timing?
A. Make sure that all parameter settings, especially the memory timing settings, are correct
3. What adjustments can you try, without regenerating the interface, if the design does not meet timing?
A. Depending on the type of failure, PLL write clock phase shift, global signal assignments, optimization settings
Section 5 Resources
User Guides
  External Memory Interfaces Handbook (Volume 2, Section I)
  Quartus II Software Handbook
Device Handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Please go to Exercise 4
Perform timing analysis on the memory interface and test in hardware
Section 6: Final Topics
Agenda
DDR2/3 Controllers with UniPHY EMIF Toolkit
Using High Performance DDR Memory Controllers with Nios II and Qsys
Multiple controllers in single FPGA
DDR2/3 Controllers with UniPHY EMIF Toolkit
DDR2/3 Controllers with UniPHY Toolkit
GUI and Tcl task/report interface, similar to TimeQuest
Report on calibration status and selected settings
Margining activities
Generate and save reports on calibration and margins
Enabling Communication via CSR Port
Controller Settings tab
Diagnostics tab
EMIF Toolkit
[Figure: development system connected over JTAG to a JTAG Avalon master in the FPGA, which reaches the CSR port of the HPC II controller; the controller connects through AFI to UniPHY and the external memory]
Launching the EMIF Toolkit
1. Program the device
2. Quartus II Tools menu > External Memory Toolkit
3. Link Project to Device task
4. Select the project’s JTAG debugging information (.jdi) file
5. Create connections to the memory interface and/or the efficiency monitor
6. Generate and view calibration and margining reports
See the optional steps at the end of Lab 4 for details on using the EMIF Toolkit
Memory Interfaces with a Nios II Processor and Qsys
Accessing Memory from Nios II Processor
Use Qsys to build system
  Supports DDR2/DDR3 SDRAM HPC plus QDR II/II+ and RLDRAM, all with UniPHY
  Controller features Avalon interface
Stand-alone PHY not supported
  Needs a memory controller or driver
Add SDC file to Quartus II project and run Tcl scripts manually
System performance is limited by
  Nios II processor performance
  Number of peripherals connected to interconnect
Qsys Systems
Tools menu > Qsys
  Create new Qsys system inside Quartus II project
Add all microcontroller peripherals required in system
  Nios II processor, communications core/s, PLL, DMA controller, memory IP, etc.
  Same MegaWizard utility used as described previously
  Remember to enable Generate power-of-2 data bus widths on Controller Settings tab
Note: Nios II processor and DMA controllers can both initiate burst transactions with the memory
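To see what a power-of-2 data bus width means in practice, the sketch below rounds a local data width down to the nearest power of two. That the option rounds down (rather than up) is an assumption here, and the function name is illustrative; the EMIF Handbook is the authority on the option's exact behavior.

```python
def power_of_two_width(width_bits):
    """Largest power of two that does not exceed width_bits."""
    if width_bits < 1:
        raise ValueError("width must be a positive number of bits")
    return 1 << (width_bits.bit_length() - 1)


for width in (64, 72, 288):
    print(f"{width}-bit local data width -> {power_of_two_width(width)} bits")
```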
Memory IP in Qsys
DDR interfaces
Models of external memory for simulation with Qsys system
Configurable traffic generator
Example System in Qsys
PLL included in memIP component; clock other components with it, if possible
Multiple Memory Controllers in a Single FPGA
Multiple Memory Interfaces
Save cost and board space, and reduce partitioning complexity
Independent read and write transactions on each interface
Unique address/command bus for each interface
If they do not contend for device elements, treat them as independent modules
  Otherwise, share resources if controllers are operating at same frequency and are all half-rate or all full-rate
  PLL, DLL, and OCT block can be shared
Creating Multiple Memory Controllers
Run through the MegaWizard flow once for each interface
Evaluate device resource availability
  Are there sufficient pins?
    Interfaces cannot share pins
  Need to share DLL?
    Interfaces must operate at same frequency
    DLL sharing possible across same side of some devices
  Need to share PLL and/or clock network?
    Interfaces must operate at same frequency
  Need to share OCT block?
    RUP & RDN or RZQ must connect to block and be in I/O bank with same voltage as the memory interface
Sharing of device resources requires RTL modifications
Example design available at: http://www.altera.com/support/examples/verilog/ver-stratix-v-multiple-ddr3-uniphy.html
DLLs in Stratix FPGAs
Support any number of interfaces running at same frequency, limited by pin, PLL, clock, and logic resources
4 located at corners of device
  Can shift DQS pins connected to adjacent sides of FPGA
Maximum of 4 unique frequencies
PLL/DLL/OCT Sharing
PLL/DLL/OCT block instantiated at same level as PHY to ease sharing
Clocks from Master instance drive master instance and signals are exported
Clock and OCT inputs exposed and must be connected when set to Slave
Set to No sharing, Master, or Slave
Requirements for OCT/PLL/DLL Sharing
Same type of memory
  DDR3, QDRII, etc.
Same internal clock rate
  Full rate, half rate, quarter rate
Same interface clock rate
  533 MHz, for example
Same PLL input clock rate
  100 MHz, for example
Same clock phase requirements
  Setting on PHY Settings tab (previous slide)
  Additional core-to-periphery phase of 45°, for example
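The requirements above amount to a field-by-field comparison of the two interface configurations. A minimal sketch follows, with a hypothetical `InterfaceConfig` record whose field names are invented for illustration; it reports which requirements block sharing rather than just returning yes or no.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InterfaceConfig:
    memory_type: str         # e.g. "DDR3", "QDRII"
    rate: str                # "full", "half", or "quarter"
    mem_clk_mhz: float       # interface clock rate
    pll_ref_clk_mhz: float   # PLL input clock rate
    extra_phase_deg: float   # additional core-to-periphery phase


def sharing_mismatches(master, slave):
    """Return the names of fields that differ; an empty list means compatible."""
    slave_values = asdict(slave)
    return [name for name, value in asdict(master).items()
            if slave_values[name] != value]


a = InterfaceConfig("DDR3", "half", 533.0, 100.0, 45.0)
b = InterfaceConfig("DDR3", "half", 533.0, 100.0, 45.0)
c = InterfaceConfig("DDR3", "full", 400.0, 100.0, 45.0)
print(sharing_mismatches(a, b))
print(sharing_mismatches(a, c))
```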
Example of Full Resource Sharing
[Figure: a master controller and a slave controller, each driven through its local interface by a traffic generator and each connected to its own memory (Memory 1, Memory 2), with shared OCT, DLL, and PLL blocks. Signals shown:
  OCT: oct_rup, oct_rdn, oct_ctl_rs_value[13:0], oct_ctl_rt_value[13:0]
  DLL: dll_delayctrl[5:0]
  PLL: pll_ref_clk, afi_clk, afi_half_clk, afi_reset_n, pll_mem_clk, pll_write_clk, pll_addr_cmd_clk]
Number of PLL Outputs by Device
Device family | Number of Enhanced PLL clock outputs | Number of dedicated clock outputs
Stratix III | Left/right: 7 clock outputs; top/bottom: 10 clock outputs | Left/right: 2 single-ended or 1 differential pair; top/bottom: 6 single-ended or 4 single-ended and 1 differential pair
Stratix IV | Left/right: 7 clock outputs; top/bottom: 10 clock outputs | Left/right: 2 single-ended or 1 differential pair; top/bottom: 6 single-ended or 4 single-ended and 1 differential pair
Stratix V | 18 clock outputs each | 4 single-ended or 2 single-ended and 1 differential pair
Global Resources Available and DDR3 Half-Rate Clocking Requirements
Device family | Global clock network | Regional clock network
Stratix III | 16 | 64-88
Stratix IV | 16 | 64-88
Stratix V | 16 | 92
Device family | # of full-rate clocks | # of half-rate clocks
Stratix III | 3 global | 1 global, 1 regional
Stratix IV | 3 global | 1 global, 1 regional
Stratix V | 1 global, 2 regional | 2 global
Stratix V Multiple Interface Guidelines
Pins from multiple interfaces cannot be placed in the same I/O sub-bank
  Pins in an interface all require access to the same leveling block
  OK if sharing resources (PLL, DLL, OCT)
Multiple interfaces cannot share the same PLL input reference clock
  Would force same PLL (for a single PHY clock tree) to be used for both interfaces
  Interfaces can’t share a single PHY clock tree
  OK if sharing PLL/DLL
Multi-Controller Considerations
Independent controllers (no resource sharing)
  Follow the design flow for each instance of the controller
  I/O constraints Tcl file uses generic top-level pin names (mem_dq, mem_dqs, etc.)
    Edit the memory interface pin names to match your top-level
    Or import .ppf file into Pin Planner to add unique prefix names to each instance
Additional considerations when sharing resources between multiple controllers
  Edit timing.tcl script for slave interface(s) to use master PLL clocks
  Edit report_timing_core.tcl script for slave interface(s) to allow master interface to adjust DQ/DQS delay chains
  See EMIF Handbook for details on editing these files
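The renaming problem with generic pin names can be pictured as building a per-instance name map. In the sketch below only mem_dq and mem_dqs come from the slide; the other pin names, the prefixes, and the function name are hypothetical, and the real renaming is done in the generated Tcl file or through the Pin Planner.

```python
# Only mem_dq and mem_dqs appear on the slide; the rest are placeholders.
GENERIC_PINS = ("mem_dq", "mem_dqs", "mem_dm", "mem_a", "mem_ck")


def prefixed_pin_map(prefix, pins=GENERIC_PINS):
    """Map each generic pin name to a unique per-instance top-level name."""
    return {pin: f"{prefix}_{pin}" for pin in pins}


first = prefixed_pin_map("ddr3_a")
second = prefixed_pin_map("ddr3_b")
print(first["mem_dq"], second["mem_dq"])

# No top-level name may be claimed by both instances.
assert not set(first.values()) & set(second.values())
```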
Efficiently Fitting Memory Interfaces
Example showing 2 possible fits on a Stratix III device for
  Two 72 bit DDR3 interfaces (different sizes) and
  Four 36 bit (18R / 18W) QDRII+ interfaces (different sizes)
[Figure: two Stratix III floorplans annotated with the number of I/O per bank (24, 32, 40, or 48). Callouts: two 114 pin 72 bit DDR3 implementations; one 159 pin 72 bit DDR3 implementation; one 66 pin 36 bit QDRII+ implementation; four 64 pin 36 bit QDRII implementations]
Test Your Knowledge: Final Topics
1. What does the EMIF toolkit allow you to do?
A. Analyze how the interface was calibrated by the sequencer; view timing margin on all DQ paths
2. What does implementing the memory interface as a Qsys system component allow you to do?
A. Use a Nios II processor or any other Avalon master component to control the interface.
3. How does enabling master or slave PLL/DLL/OCT sharing affect the generation of the controller?
A. Master: outputs of resources exposed to top-level of controller; slave: inputs exposed and must be connected to resource output from master
Section 6 Resources
User Guides
  External Memory Interfaces Handbook (Volume 5, Section 2)
    Chapter 2: Implementing Multiple Memory Interfaces Using UniPHY
    Chapter 3: DDR3 SDRAM Controller with UniPHY Using Qsys
  Quartus II Software Handbook
Device Handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Conclusions
Summary
Discussed Altera DDR3 memory interface IP
  How to find, parameterize, and instantiate a High Performance DDR3 memory controller in a Quartus II project
Listed the steps required to simulate a system
Pointed out what type of termination schemes to use
Walked through static timing analysis in the TimeQuest timing analyzer and techniques for solving common timing problems
Discussed using the High Performance Memory controllers inside a Qsys system
Highlighted how you can implement multiple memory controllers in a single FPGA
References
Resources
Memory Resource Center
  http://www.altera.com/technology/memory/mem-index.jsp
User Guides
  External Memory Interface Handbook
    http://www.altera.com/literature/lit-external-memory-interface.jsp
Device Handbooks
  Cyclone III, Cyclone IV, & Cyclone V FPGAs
  Arria GX, Arria II GX/GZ, & Arria V FPGAs
  Stratix III, Stratix IV, & Stratix V FPGAs
Learn More Through Technical Training
Instructor-Led Training
  With Altera's instructor-led training courses, you can:
  Learn from an experienced Altera technical training engineer (instructor)
  Complete hands-on exercises with guidance from an Altera instructor
  Ask questions and receive real-time answers from an Altera instructor
  Each instructor-led class is one or two days in length (8 working hours per day)
Virtual Classroom Training
  With Altera's virtual classroom training:
  Get the best of both worlds!
  All the benefits of a live, instructor-led training class from the comfort of your home or office
Online Training
  With Altera's online training courses, you can:
  Take a course at any time that is convenient for you
  Take a course from the comfort of your home or office (no need to travel as with instructor-led courses)
  Each online course takes approximately one to three hours to complete
http://www.altera.com/training
View training class schedule and register for a class
Altera Technical Support
Reference Quartus II software on-line help Quartus II Handbook Quartus II Handbook World-wide web: http://www.altera.com
Search for answers to problems with Knowledge Database Download literature View design examples View online trainingsg
MySupport: http://www.altera.com/mysupport Field applications engineers: contact your local Altera
sales officesales office Altera Wiki: www.alterawiki.com Altera Forum: www.alteraforum.com Altera Forum: www.alteraforum.com Intellectual Property Support
http://www.altera.com/support/ip/ips-index.html
Instructor-Led and Virtual Training Curriculum
[Curriculum chart. Legend: Foundation Classes, Advanced Follow-On Classes, Specialized Classes, Future Classes, Available as a Virtual Class]
Classes shown include:
  Quartus II Software: Foundation
  Quartus II Software: Timing Analysis
  Quartus II Software: Debug & Analysis
  Optimization Using Incremental Compilation
  Introduction to VHDL, Introduction to Verilog, Introduction to Qsys
  Advanced VHDL, Advanced Verilog, Advanced Qsys
  Advanced Timing Analysis, Timing Closure
  External Memory Interfaces
  Video Design Framework Workshop
  Developing SW for the Nios II Processor (2 days)
  DSP Builder Standard Blockset, DSP Builder Advanced Blockset
  Building Gigabit Interfaces in Altera Transceivers
  Designing w/ ARM based SoC
  System Verilog, Scripting, ModelSim, Designing w/ OpenCL
  System Console, Partial Reconfiguration, Getting Started w/ PCIe
http://www.altera.com/training
Thank You
© 2012 Altera Corporation—Confidential ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at www.altera.com/legal.
© 2011 Altera Corporation—Confidential ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the United States and are trademarks or registered trademarks in other countries.
Appendix
Appendix Table of Contents
  Basics of source synchronous interfaces
  Example applications
  Older device resources for memory IP
  HPCII, UniPHY, and sequencer architecture
  ALTMEMPHY parameterization
  Details on manual timing derating
  Detailed hierarchy of example design
  DDR/2 latencies
  11.0 simulation with NativeLink and/or ALTMEMPHY
  Simulation of initialization and calibration
  Signal integrity analysis
  More information on termination
  Other FPGA features for signal integrity
  ALTMEMPHY timing paths
  Multiple interfaces with ALTMEMPHY
  SOPC Builder performance considerations
Source Synchronous Interfaces
Strobe or clock signal sent from driver chip, not separate clock source
Target device uses transmitted clock to sample incoming data
Data & clock routed identically to maintain phase relationship at destination device
Example shown: Driver shifts clock to meet receiver timing
[Diagram: Driver sends Data and a delayed Clock to Receiver]
Source Synchronous Clocking Schemes
[Diagram: edge-aligned and center-aligned clocking, each shown for SDR and DDR]
Source Synchronous Benefits
Higher maximum bus speed than common clock systems
  Common clock method is limited by absolute delay of data signal
  Source synchronous method is limited by delay difference (skew) between data and clock
No theoretical frequency limit
  Practical limits include I/O edge rates, SSN, signal integrity, and minimum pulse widths
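The contrast between the two clocking methods can be put into numbers with a deliberately simplified model. The sketch below is illustrative only: the formulas ignore jitter, duty-cycle distortion, and the practical limits listed above, and every delay value is a made-up assumption, not a figure from any datasheet.

```python
def common_clock_min_period_ns(t_co, t_flight, t_setup):
    """Common clock: the whole driver-to-receiver delay must fit in one period."""
    return t_co + t_flight + t_setup


def source_sync_min_bit_time_ns(t_setup, t_hold, skew):
    """Source synchronous: only setup, hold, and data-to-clock skew limit the bit time.

    `skew` is the total (peak-to-peak) data-to-clock skew at the receiver.
    """
    return t_setup + t_hold + skew


# Hypothetical delays in nanoseconds (assumptions for illustration only).
cc_period = common_clock_min_period_ns(t_co=3.0, t_flight=1.5, t_setup=0.5)
ss_bit = source_sync_min_bit_time_ns(t_setup=0.5, t_hold=0.5, skew=0.25)

print(f"common clock: {cc_period} ns per transfer")
print(f"source synchronous: {ss_bit} ns per transfer")
```

In this model the flight time drops out of the source synchronous budget entirely, because clock and data travel together; only their mismatch matters.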
Double-Data Rate (DDR) Interfaces
Not restricted to use with memory systems
Data sent on rising and falling clock edges
Often uses complementary clocking for higher performance
[Timing diagram: complementary clocks clk_p and clk_n; data beats AL AH BL BH CL CH DL DH EL EH form the 1st through 5th words]
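The word-to-beat relationship in the diagram (each word travels as a low half and a high half on consecutive clock edges) can be sketched in software. This is only a behavioral illustration: the function names are hypothetical, and sending the low half first simply follows the AL, AH order in the diagram; which clock edge carries which half is an assumption.

```python
def ddr_serialize(words, width):
    """Split each `width`-bit word into two half-width beats (low half first)."""
    half = width // 2
    mask = (1 << half) - 1
    beats = []
    for word in words:
        beats.append(word & mask)            # low half, e.g. AL
        beats.append((word >> half) & mask)  # high half, e.g. AH
    return beats


def ddr_deserialize(beats, width):
    """Recombine consecutive (low, high) beats into full-width words."""
    half = width // 2
    return [beats[i] | (beats[i + 1] << half) for i in range(0, len(beats), 2)]


words = [0xA1B2, 0xC3D4]
beats = ddr_serialize(words, width=16)
print([hex(b) for b in beats])
print([hex(w) for w in ddr_deserialize(beats, width=16)])
```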
Example – Packet Buffering Application
[Block diagram: Altera FPGA containing SPI 4.2 RX, SPI 4.2 TX, core logic, an RLDRAM II interface (600 Mbps) to external RLDRAM II devices, and a QDRII SRAM interface (1 Gbps) to external QDRII SRAM]
Example - Embedded Application
[Block diagram: Altera FPGA containing a Nios II embedded processor, a memory controller, a PCI interface, a DDR2 interface (533 Mbps DDR2 SDRAM) to a DDR2 SDRAM DIMM, and a memory interface to RLDRAM/QDRII (600 Mbps RLDRAM II or 1 Gbps QDRII SRAM)]
Cyclone III / IV Devices
Cyclone III & IV E devices: dedicated bidirectional DQ/DQS pins on all banks around die
Cyclone IV GX devices: top, bottom, & right sides only
DDR write logic implemented in I/O cells
DDR read logic implemented in FPGA fabric
Up to 4 reconfigurable PLLs for static use during write and dynamic use during read operation
Performance optimized on top and bottom of FPGA
Note: Left and right-hand sides of FPGA support LVDS data I/O. Top and bottom optimized for memory performance
[Diagram: Cyclone III device floorplan with I/O banks 1 through 8 and a PLL in each corner]
Arria GX / Arria II GX Devices
Dedicated bidirectional DQ/DQS pins on top and bottom banks
  Non-DQS mode (PLL-generated capture clock) support on sides of device
Up to 4 PLLs for static use in read and write operations
DDR read/write logic implemented in I/O cells on top and bottom of FPGA
  Can use regular I/O on sides of die for lower performance
DLL used for DQS phase shift in DQS blocks during read
Fast PLLs optimized to support LVDS and SERDES on side (non-CDR)
Built-in CDR circuitry (xcvr. on left-hand side in Arria II GX devices)
[Diagram: Arria GX device floorplan with I/O banks, PLLs, DLLs, and transceiver blocks]
Stratix III / IV I/O Memory Interface IP
Hard IP with powerful features
Available behind every DQ pin on all 4 sides
Stratix III IOE: 31 registers vs. older device IOE: 6 registers (input, output, and OE registers)
[Diagram: Stratix III IOE for DQ and DQS pins, showing the DLL-controlled delay chain (control signals from DLL, on a per-bank basis), write leveling, half-rate resync, read calibration, DQS logic, and postamble blocks; clocks DQ_CLK, 0°_CLK, and Resync_CLK each with a phase[8:0] control, plus divided DQ_CLK and divided 0°_CLK; core-side signals IOD0OUT to IOD3OUT, CDATA0IN to CDATA3IN, OE0, and OE1]
HPC II Architecture
[Block diagram: Avalon-ST input interface (to user logic), command generator, timing-bank pool, arbiter, rank timer, write data buffer, read data buffer, ECC, AFI interface (to PHY), and config & status register (CSR) interface]
HPC II Interfaces
Avalon-ST input interface
  Entry point into controller from user logic
  Communicates with masters requesting data
CSR (Avalon-MM slave) interface
  Provides runtime access to controller configuration and status registers
  Independent status and efficiency monitoring or can be used along with the EMIF debug toolkit (discussed later)
Avalon PHY (AFI) interface
  Communicates between the controller and PHY using AFI 3.0 specification
  Single data rate (SDR) interface
  See External Memory Interface Handbook for details
HPC II Architectural Blocks
Command generator
  Accepts commands from streaming interface and ECC logic
Timing bank pool
  Parallel queue that works with arbiter to enable data reordering
  Tracks all incoming requests
  Passes data to arbiter once write data is ready
Arbiter
  Orders requests to be passed to external memory following arbitration rules
  If only one master issuing request, grant request immediately
  If 2 or more masters have outstanding requests:
    Read request granted before write request
    Otherwise, oldest request granted first
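The arbitration rules listed for the arbiter can be expressed as a small behavioral model. This is a sketch of the stated rules only, not the controller's RTL: the `Request` type, its fields, and the `grant` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    master: str    # hypothetical identifier of the requesting master
    is_read: bool  # True for a read, False for a write
    age: int       # cycles spent waiting; larger means older


def grant(requests):
    """Pick the next request according to the rules on the slide."""
    if not requests:
        return None
    masters = {r.master for r in requests}
    if len(masters) == 1:
        # Only one master is asking: grant immediately (oldest of its requests).
        return max(requests, key=lambda r: r.age)
    # Two or more masters: reads win over writes; ties go to the oldest request.
    reads = [r for r in requests if r.is_read]
    pool = reads if reads else requests
    return max(pool, key=lambda r: r.age)


pending = [
    Request("dma", is_read=False, age=9),
    Request("cpu", is_read=True, age=2),
]
print(grant(pending))  # the younger read is granted ahead of the older write
```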
Other Blocks
Rank timer
  Maintains rank-specific timing in multi-rank (multiple CS signals) memory topologies (typically for DIMMs)
  Limits number of activates in a given timing period
  Manages read-to-write and write-to-read turnaround time
  Manages delay between activating different banks
Write and Read data buffers
  Stores data to write or read data as it passes between the user logic and the PHY/memory interface
ECC
  Encoder and decoder-corrector generates interrupts on errors
  Can detect and fix single-bit errors
  Can detect double-bit errors
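The ECC behavior described above (correct single-bit errors, detect double-bit errors) is what a SECDED code provides. The toy example below uses an extended Hamming code over 4 data bits to show the principle; it is an illustrative assumption, not the controller's actual code, which works on much wider data words.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0: overall parity; 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for bit in code[1:]:
        code[0] ^= bit                   # make the parity of the whole word even
    return code


def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-bit error'."""
    code = list(code)
    syndrome = 0
    overall = 0
    for pos, bit in enumerate(code):
        overall ^= bit
        if bit:
            syndrome ^= pos              # XOR of the positions of all set bits
    if overall:                          # odd parity: treat as a single-bit error
        code[syndrome] ^= 1              # syndrome 0 points at the parity bit itself
        status = "corrected"
    elif syndrome:                       # even parity but nonzero syndrome
        status = "double-bit error"
    else:
        status = "ok"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = secded_encode(0b1011)
one_flip = list(word)
one_flip[6] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(secded_decode(word), secded_decode(one_flip), secded_decode(two_flips))
```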
Address & Command Datapath
Transfers address/command signals from AFI clock domain to SDR address/command clock domain
Cycle shifter can shift in full-rate cycles to implement correct write latencies
[Diagram: full-rate cycle shifter followed by a core DDIO, clocked by afi_clk or addr_cmd_clk, producing full-rate addr/cmd. Timing: half-rate inputs H0/L0 and H1/L1 on afi_clk are output as L0, H0, L1, H1 on addr_cmd_clk (270°), shown against mem_clk]
Write Datapath
[Diagram: write datapath. afi_wdata_valid and wdata[1:0], wdata[3:2], ... pass through an HDR-to-SDR DDIO stage and an SDR-to-DDR DDIO stage, clocked by phy_write_clk (0° phase), to drive DQ, DQS, and DQSn]
Read Datapath
Read FIFO
  Synchronizes read data with afi_clk domain and converts SDR to HDR or QDR
VFIFO (valid) and LFIFO (read latency) parameters set by calibration
[Diagram: DQ is captured using gated DQS (data capture), then passes through the read FIFO (full SDR in, HDR or QDR out) into the afi_clk (half-rate) domain. rdata_en drives the VFIFO, which generates DQS enable, and the LFIFO, which generates read enable]
Sequencer Architecture
SCC Manager
  Sets delays and phases on I/O through adjustment of dynamic delay chains based on calibration algorithm
RW Manager
  Acts as controller through AFI to send commands to memory: configure memory registers, activate, precharge, refresh, guaranteed writes, write/read bursts, etc.
PHY Manager
  Direct access to PHY to pass on calibration results, such as calibrated FIFO buffer parameters
  Indicates completion and pass/fail status of calibration process
Data Manager
  Stores parameter information for software access
Tracking Manager
  Takes over AFI interface during operation after refresh to track DQS enable signal
  Adjusts as needed due to voltage and temperature changes
ALTMEMPHY: Memory Settings
Settings analogous to UniPHY
ALTMEMPHY: PHY Settings (1)
Stratix devices: improve SI
Newer Stratix devices: fine tune PLL phase shift to improve address/command timing (240 for dev. boards; typically use 270)
Devices with DLLs: share DLL resource with other core(s)
Arria, Cyclone, and older Stratix devices: fixed 90 degree phase shifts to improve address/command timing
Older Cyclone and Stratix devices: worst case skew for timing constraints (newer devices use Board Settings)
ALTMEMPHY: PHY Settings (2)
For HardCopy II development
For newer Stratix devices (discussed later)
Reduce simulation time (analogous to UniPHY setting)
ALTMEMPHY Memory Presets
Filters on left filter displayed presets on right
Load custom presets (in XML) from IP’s /lib directory
ALTMEMPHY Preset Editor
White cells are programmable options (analogous to UniPHY Memory Parameters)
Gray cells for creating custom memory presets
Save as a custom preset
Saving Custom Memory Presets
Default save directory is /lib folder for IP component
  C:\altera\<ver>\ip\altera\ddr3_high_perf\lib
New memory added automatically to Presets list
Must manually Load Preset if saved in different directory
Setting Timing Parameters (cont.)
But ns used in Preset Editor and preset for 533 MHz!
533 MHz: 4 x 1.875 ns = 7.5 ns
400 MHz: 4 x 2.5 ns = 10 ns
Adjust appropriately and save as custom preset
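The conversion behind the two figures above is one multiplication: a parameter given in clock cycles is multiplied by the clock period. The helper below is a sketch with a hypothetical function name; it treats the nominal "533 MHz" as 533.33 MHz, which matches the 1.875 ns period used on the slide.

```python
def cycles_to_ns(cycles, freq_mhz):
    """Convert a timing parameter from memory clock cycles to nanoseconds."""
    period_ns = 1000.0 / freq_mhz
    return cycles * period_ns


# A 4-cycle parameter at the preset frequency and at a 400 MHz target.
print(cycles_to_ns(4, 1600 / 3))  # about 7.5 ns at 533.33 MHz
print(cycles_to_ns(4, 400))       # 10.0 ns at 400 MHz
```

A preset created for 533 MHz therefore holds 7.5 ns for this parameter, and the value has to be raised to 10 ns before the preset is reused at 400 MHz.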
Manual Derating (Stratix III devices only)
1. Find correct derating values (∆tDS, ∆tDH, ∆tIS, ∆tIH) in component datasheet based on signal slew rates
2. Add derating to base values (Ex: tDS = tDS(base) + ∆tDS)
3. Normalize values to VREF and enter in Board Settings tab
   Datasheets reference to VIH & VIL, not VREF used by Altera
Example: ∆tIS, ∆tIH derating values; select values based on slew rates
See “Derate Memory Setup and Hold Timing” section (Volume 3, Section II, chapter 3) and “Timing Derating Methodology” chapter in the External Memory Interface Handbook for more details and examples
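Step 2 of the manual flow is plain addition, which the following sketch makes explicit. All numbers are placeholders chosen for illustration (they are not taken from any memory datasheet), and the VREF normalization of step 3 is not modeled here; see the handbook chapters cited above for that procedure.

```python
# Hypothetical base timing values in picoseconds (placeholders, not datasheet data).
base_ps = {"tDS": 50, "tDH": 125, "tIS": 175, "tIH": 250}

# Hypothetical derating values looked up from slew-rate tables (may be negative).
delta_ps = {"tDS": 100, "tDH": 45, "tIS": 150, "tIH": 94}


def derate(base, delta):
    """Apply t(derated) = t(base) + delta to every parameter."""
    return {name: base[name] + delta[name] for name in base}


derated_ps = derate(base_ps, delta_ps)
for name, value in derated_ps.items():
    print(f"{name} = {base_ps[name]} ps + {delta_ps[name]} ps = {value} ps")
```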
Automatic Derating
1. Enter base values for settings
   Memory Timing tab (UniPHY) or Preset Editor (ALTMEMPHY)
2. Enter slew rate information and/or # of slots/devices for approximation
   Board Settings tab (UniPHY and ALTMEMPHY)
Derated values automatically calculated
[Screenshots: UniPHY and ALTMEMPHY]
ALTMEMPHY EDA Settings
Altera libraries needed for 3rd-party simulation
Special simulation model; cannot be synthesized
Timing and resource estimation netlist for 3rd-party synthesis tools
ALTMEMPHY Summary
Choose optional files to generate; main variation and Quartus II IP
files always generated
MegaWizard Generation Messages
UniPHY
ALTMEMPHY
Custom Controller With ALTMEMPHY (1)
Custom Controller With ALTMEMPHY (2)
Quickly and easily configure physical layer interface
Connect to your own controller or driver and still gain the benefit of the Altera physical interface
Add SDC Files to Project
UniPHY
  <variation_name>.sdc
  <variation_name>_sequencer_cpu.sdc
ALTMEMPHY
  <variation_name>_phy_ddr_timing.sdc
  <variation_name>_example_top.sdc (optional; only if using example design)
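In a Quartus II settings (.qsf) file, SDC files are listed with `set_global_assignment -name SDC_FILE`. The helper below is a hypothetical convenience script (not an Altera tool) that prints those assignment lines for the file names listed above. It assumes the files sit in the project directory; adjust the paths if your generated files are located elsewhere.

```python
def sdc_assignments(variation_name, phy, example_design=False):
    """Return .qsf assignment lines for the SDC files of one memory interface."""
    if phy == "uniphy":
        files = [f"{variation_name}.sdc", f"{variation_name}_sequencer_cpu.sdc"]
    elif phy == "altmemphy":
        files = [f"{variation_name}_phy_ddr_timing.sdc"]
        if example_design:
            files.append(f"{variation_name}_example_top.sdc")
    else:
        raise ValueError("phy must be 'uniphy' or 'altmemphy'")
    return [f"set_global_assignment -name SDC_FILE {name}" for name in files]


# "ddr3_top" is a hypothetical variation name used only for illustration.
for line in sdc_assignments("ddr3_top", "uniphy"):
    print(line)
```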
SDRAM Memory (Typical Configuration)
[Diagram: Micron 1 GB DDR3 SDRAM block diagram showing multiple memory banks, control logic, and read/write logic]
Example Design Revisited For 11.0
See main presentation for 11.1 and later
File hierarchy:
  ddr3_top_example_sim_tb.v
  ddr3_top_example_sim.v
  ddr3_top_example_sim_ddr3_top_example_sim.v
  ddr3_top_example_sim_ddr3_top_example_sim_e0.v
[Block diagram: test driver connected through the local interface (Avalon-MM, master to slave) to the controller logic, which connects over AFI to the PHY and then to the memory model; PLL and DLL blocks; inputs pll_ref_clk and global_reset_n; status checker outputs pass, fail, and test_complete]
DDR/DDR2 Read Latency - ALTMEMPHY
Columns: Device, Frequency (MHz), Interface, Controller Latency, Address and Command Latency (FPGA, I/O), CAS Latency, Read Data Latency (FPGA, I/O), Total Read Latency (local clock cycles, time in ns)

Device                     MHz  Interface  Ctrl  A/C FPGA  A/C I/O  CAS  RD FPGA  RD I/O  Cycles  ns
Arria GX                   233  Half-rate  5     3         1        2    4.5      1       18      154
Arria GX                   167  Full-rate  5     2         1        4    5        1       19      114
Arria II GX                233  Half-rate  5     3         1        2.5  5.5      1       18      154
Arria II GX                167  Full-rate  5     2         1        4    6        1       20      120
Cyclone III, Cyclone IV    200  Half-rate  5     3         1        2    4.5      1       18      180
Cyclone III, Cyclone IV    167  Full-rate  5     2         1        4    5        1       19      114
Stratix II, Stratix II GX  333  Half-rate  5     3         1        2    4.5      1       18      108
Stratix II, Stratix II GX  267  Half-rate  5     3         1        2    4.5      1       18      135
Stratix II, Stratix II GX  200  Full-rate  5     2         1        4    5        1       19      95
Stratix III, Stratix IV    400  Half-rate  5     3         1        2.5  7.125    1.5     20      100
Stratix III, Stratix IV    267  Full-rate  4     2         1.5      4    7        1       20      75
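The time column in these latency tables follows from the cycle count and the local clock period. The sketch below assumes the local clock runs at the memory clock frequency in full-rate mode and at half of it in half-rate mode; the function name is hypothetical. Results agree with the table to within rounding of the nominal frequencies (for example, 233 MHz is nominally 233.33 MHz).

```python
def latency_ns(local_cycles, mem_freq_mhz, rate):
    """Convert a latency in local clock cycles to nanoseconds."""
    divider = {"full": 1, "half": 2}[rate]          # local clock = memory clock / divider
    local_period_ns = divider * 1000.0 / mem_freq_mhz
    return local_cycles * local_period_ns


print(latency_ns(18, 233, "half"))  # table: 154 ns (Arria GX read, half-rate)
print(latency_ns(19, 167, "full"))  # table: 114 ns (Arria GX read, full-rate)
print(latency_ns(20, 400, "half"))  # table: 100 ns (Stratix III/IV read, half-rate)
```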
DDR/DDR2 Write Latency - ALTMEMPHY
Columns: Device, Frequency (MHz), Interface, Controller Latency, Address and Command Latency (FPGA, I/O), Memory Write Latency, Total Write Latency (local clock cycles, time in ns)

Device                     MHz  Interface  Ctrl  A/C FPGA  A/C I/O  Mem Write  Cycles  ns
Arria GX                   233  Half-rate  5     3         1        1.5        12      103
Arria GX                   167  Full-rate  5     2         1        3          12      72
Arria II GX                233  Half-rate  5     3         1        2.5        12      103
Arria II GX                167  Full-rate  5     2         1        4          12      72
Cyclone III, Cyclone IV    200  Half-rate  5     3         1        1.5        12      120
Cyclone III, Cyclone IV    167  Full-rate  5     2         1        3          12      72
Stratix II, Stratix II GX  333  Half-rate  5     3         1        1.5        12      72
Stratix II, Stratix II GX  267  Half-rate  5     3         1        1.5        12      90
Stratix II, Stratix II GX  200  Full-rate  5     2         1        3          12      60
Stratix III, Stratix IV    400  Half-rate  5     3         1        2          12      60
Stratix III, Stratix IV    267  Full-rate  5     2         1.5      3          13      49
DDR3 Typical Latency - ALTMEMPHY
Device       Controller Rate  Frequency (MHz)  Latency Type  Total Latency (local clock cycles)  Time (ns)
Stratix III  Half             400              Read          23                                  115
Stratix III  Half             400              Write         14                                  68
Stratix IV   Half             400              Read          23                                  115
Stratix IV   Half             400              Write         14                                  68

The exact latency depends on your precise configuration. You should obtain precise latency from simulation, but this figure may vary in hardware because of the automatic calibration process.
DDR2 Latency - UniPHY
Columns: Controller Rate, Controller Address & Command, PHY Address & Command, Memory Maximum Read, PHY Read Return, Round Trip, Round Trip w/out Memory

Rate  Ctrl A/C  PHY A/C       Mem Max Read  PHY Read Return  Round Trip            Round Trip w/out Memory
Full  5         1             3-7           4                13-17                 10
Half  10        3 (1), 4 (2)  3-7           8                24-28 (1), 26-28 (2)  21 (1), 22 (2)

(1) Even write latency
(2) Odd write latency
EDA Settings Revisited
Not necessary for UniPHY; generated designs work for both simulation and synthesis
Greater Project Directory Structure (ALTMEMPHY)
(Quartus II design project folder)
  Project (.qpf) file
  Settings (.qsf) file
  MegaWizard-generated IP files
  (testbench directory)
NativeLink Simulation (11.0 & ALTMEMPHY only)
Perform Analysis & Elaboration
Set NativeLink Simulation Options
Tools menu, Options
Path to simulator executable directory
Choose EDA Simulation Tool & Language
Establish Simulation Settings
Simulation files stored here
Testbench control
NativeLink Test Benches
RTL test bench automatically created for example project (11.0)
Create new testbench manually (11.1 and later)
DUT and Test Bench Files
Replace generic model with vendor model
Run Simulation via NativeLink
Run EDA RTL Simulation
  Creates simulation files in “simulation/modelsim” directory
Creates script: <variation_name>_example_sim_run_msim_rtl_verilog.do
Simulation Script
Compiles all required files, starts simulation, sets up waveform view, and advances the simulation
Can be edited and run manually
  Example: add or change waveforms to view
Overwritten each time simulation started with NativeLink
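Because NativeLink regenerates the script on every run, manual edits are safest in a copy with a different name. The helper below is one possible way to make that copy; the function name and the `_custom` suffix are hypothetical choices, and the same thing can be done by hand in a file manager.

```python
import shutil
from pathlib import Path


def keep_custom_copy(generated_script, suffix="_custom"):
    """Copy a generated .do script to a new name that NativeLink will not overwrite."""
    src = Path(generated_script)
    dst = src.with_name(src.stem + suffix + src.suffix)
    shutil.copyfile(src, dst)
    return dst


# Usage (path shown is hypothetical):
# keep_custom_copy("simulation/modelsim/ddr3_top_example_sim_run_msim_rtl_verilog.do")
```

Edit and source the copied script in the simulator, and leave the generated file untouched.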
Edit and Source “.do” Script
Run Simulation Script Manually
Format Waveforms During Simulation
Simulation Details – Start-up Sequence
Self-calibrating control block
Startup: training pattern
  Calibrates out process differences (board, memory, & FPGA)
Normal operation: monitor & adjust
  Compensate for voltage and temperature variations
  No interruption of operation
[Flow diagram stages, in order: external memory device initialization, write data training, calibration, functional use of memory]
Core Function: Supplementary Information
Initialization is activated automatically by core immediately after reset release and cannot be stopped by user logic
Global reset going into the PHY can re-start sequence
Simulation Stages
1. Device initialization
2. Training and calibration stage
3. Functional test begins
Functional Stage
Functional use begins
  Test driver takes over after local_init_done goes high
HyperLynx LineSim GUI
Pick and place parts to build circuit
Set properties for each component
Simulate and view results in scope window
Display eye-pattern at receiver
[Screenshot labels: board stackup, Simulate, output buffer, series resistance, trace element, input buffer]
Modeling Stratix III to DDR2 SDRAM DIMM
Hyperlynx setup
Hyperlynx Sim vs. Real Measurement
DDR2 SDRAM Read
Difference between simulation and measured trace could be due to capacitance in the via where the measurement was taken
Gauging Signal Integrity and Quality
“Eye” opening: persistent oscilloscope measurement at receiver
Bigger eye means better margin, less over/undershoot
Compare with simulation
[Figure labels: width, height, overshoot, undershoot]
Parts of the Eye
Width
  Faster edge rates widen the eye
  Helps meet setup and hold timing
Height
  Bigger height, larger signal swing
  Ensure meeting of VIH and VIL specifications
Over/undershoot
  Caused by accumulation of reflections or over-driving signal
  Ringing can cause false triggering
  Receiver damage possible over long term if specifications greatly exceeded
Check Point
Does my simulated eye meet my criteria? If not, adjust settings and iterate
[Flowchart: determine board design constraints; perform board-level simulations; meets timing & performance? If no, adjust termination, drive strength, etc. and simulate again; if yes, continue design]
No Termination Scheme
Not recommended
Bad signal integrity on both reads and writes
No external components
[Diagram: FPGA (ZS) driving a trace (Z0) to the memory, with Z0 ≠ ZS; potential electrical discontinuities]
Eye (R/W) for No Termination Scheme
Bad signal integrity!
Small eye width and height
Large over/undershoot
Termination Schemes: Parallel Class II
Two resistors needed per line
  One for each receiver
VTT power supply required
Impedance matched and energy absorbed at both ends
Good for bidirectional signals
[Diagram: FPGA (ZS) and memory connected by a trace (Z0), with a parallel resistor ZP to VTT at each end; ZP = Z0 = 2ZS]
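Why matching the parallel resistor to the trace impedance absorbs energy can be seen from the standard reflection coefficient at the end of a line, Γ = (ZL − Z0) / (ZL + Z0). The sketch below evaluates it for a matched end and for an unterminated end; modeling the unterminated receiver as a 1 MΩ load is an assumption made only for illustration.

```python
def reflection_coefficient(z_load, z0):
    """Fraction of the incident voltage wave reflected at a load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)


Z0 = 50.0  # ohms, typical trace impedance

matched = reflection_coefficient(50.0, Z0)      # ZP = Z0: nothing reflected
unterminated = reflection_coefficient(1e6, Z0)  # high-impedance input: almost all reflected

print(f"matched end:      {matched:.3f}")
print(f"unterminated end: {unterminated:.3f}")
```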
Non-Fly-By (Star) Topology
Parallel termination typically placed before memory
Creates un-terminated stubs
  Stub itself causes reflections and ringing
[Diagram: trace Z0 with termination ZP at the branch point, followed by stubs Zstub1 and Zstub2 to the memory devices]
Daisy Chain (Fly-By) Topology
ZP can be placed past receiver
Removes stubs
More difficult to route
[Diagram: trace Z0 passing each receiver in turn, terminated by ZP past the last receiver]
Board Area & Component Usage
Many components needed for full Class II!
[Board photo labels: FPGA, DIMM connector, fly-by pull-ups & bypass, pull-ups and VTT bypass]
Initial Conclusions
Termination in some form is essential for high speed designs
Class I good for unidirectional signals (address, command)
Class II good for bidirectional signals (DQ, DQS)
  Lots of resistors needed
  Extra power rail/plane needed
Utilize FPGA Adjustable I/O Features
Transistors inside I/O elements can be tailored to allow either drive strength or impedance adjustment
  Thus, these options are mutually exclusive
The former is calibrated around drive strength and the latter around impedance, but either can be used in this context
Setting FPGA Drive Strength
Multiple settings available on all Altera devices
  Settings depend on selected I/O standard
8 mA
  Typical of Class I termination
  Prevents over/undershoot due to less loading
16 mA
  Typical of Class II termination
  Increases eye height in more heavily loaded configuration
Programmable Slew Rate
Slew rate
  Rate a signal takes to transition from one logic state to another
  Measured in V/ns
Stratix III/IV devices
  Set as 0 (slowest), 1, 2, or 3 (fastest; default)
  Not available when series OCT in use
Stratix V devices
  Only 2 settings: 0 (slow) and 1 (fast; default)
  Not available when series OCT in use
Faster slew rate means faster, stronger edges
  More chance of overshoot, noise (SSN)
Slower slew rate means slower edges
  Slower signaling, but less noise
Board Trace Mismatch Compensation
Manually adjust I/O delay to compensate for longer/shorter board trace
Digitally programmable in 50 ps steps
Compensation for 0 – 5 ½ inches
  FR4 (standard board material) delay: ~170 ps/inch
“Last resort” debugging; check IP parameters and board trace models first
Stratix III / IV / V: user controlled
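The numbers on this slide give a quick way to estimate a delay setting for a known length mismatch. The sketch below uses the slide's approximate figures (about 170 ps per inch of FR4 and 50 ps per step); the function name is hypothetical, and the real delay per step and per inch should come from the device datasheet and the board stackup.

```python
FR4_PS_PER_INCH = 170  # approximate propagation delay from the slide
STEP_PS = 50           # programmable delay step from the slide


def delay_steps(mismatch_inches):
    """Return (steps, residual_ps) that best compensate a trace length mismatch."""
    delay_ps = mismatch_inches * FR4_PS_PER_INCH
    steps = round(delay_ps / STEP_PS)
    residual_ps = steps * STEP_PS - delay_ps  # positive means over-compensated
    return steps, residual_ps


print(delay_steps(1.0))  # a 1 inch mismatch is about 170 ps
print(delay_steps(5.5))  # upper end of the range quoted on the slide
```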
Deliberately Skew DQ Data Output
Reduce Simultaneous Switching Noise (SSN)
  Delaying adjacent edges reduces total number of simultaneous switching output (SSO) edges
Controllable in 50 ps steps
See Advanced I/O System Design online training for more information about SSN
[Waveform figure labels: 5 ns, 700 ps]
Series OCT (Class I)
Choose 50 Ω for Class I (unidirectional signals)
  For typical Z0 of 50 Ω
Similar to standard 8 mA drive strength
  Typically used for Class I
Eye with OCT slightly bigger than with drive strength setting (mutually exclusive settings)
[Diagram: FPGA with series OCT (ZS) driving a trace (Z0) to the memory, with ZP to VTT at the memory end; ZS = ZP = Z0, VTT = VCC/2]
Series OCT (Class II)
Choose 25 Ω for Class II
Similar to standard 16 mA drive strength
  Typically used for Class II
Eye with OCT slightly bigger than with drive strength setting
[Diagram: FPGA with series OCT (ZS) and memory connected by a trace (Z0), with ZP to VTT at both ends; ZS = ZP/2 = Z0/2]
Parallel OCT
50 Ω Thevenin equivalent parallel termination
  No VTT required
Bidirectional and input pins only
No external termination resistor(s) needed at FPGA
Calibrates using external 50 Ω resistors
  Uses same RUP, RDN resistors as for series OCT
[Diagram: Stratix III / IV / V bidirectional pin (ZS) connected over a trace (Z0) to a bidirectional memory pin; inside the FPGA, two 100 Ω resistors form a divider from VCC]
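The 50 Ω figure follows from the Thevenin equivalent of the two 100 Ω resistors in the diagram. The sketch assumes one resistor connects the pin to VCC and the other to ground (the usual split-termination arrangement), and uses a 1.8 V supply purely as an example value.

```python
def thevenin(r_up, r_down, vcc):
    """Thevenin equivalent of a resistor divider: (resistance in ohms, voltage in volts)."""
    r_th = r_up * r_down / (r_up + r_down)  # the two resistors appear in parallel
    v_th = vcc * r_down / (r_up + r_down)   # divider midpoint voltage
    return r_th, v_th


r_th, v_th = thevenin(100.0, 100.0, vcc=1.8)
print(f"R_th = {r_th} ohm, V_th = {v_th:.2f} V")
```

The divider therefore behaves like a single 50 Ω resistor tied to VCC/2, which is why no separate VTT supply is needed.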
Loading Considerations
Decisions on settings and termination need to take type of loading into accountyp g
DIMM vs. discrete components No connector and no on-board series resistors Reduces loading, improves signal integrity
Single-rank vs. dual-rank DIMMsHigher density DIMMs increase load Higher density DIMMs increase load
Slows edge rates, affecting signal integrity Improve with increased drive strength at the expense of more power
usage
Multiple DIMMs Many options: one or both populated multiple sets of controlsMany options: one or both populated, multiple sets of controls See EMIF Handbook for detailed analysis and recommendations
Summary: Unidirectional Recommendations
Class I
ODT on memory if available
  DDR2: 50 Ω or 75 Ω
  DDR3: 60 Ω or 120 Ω
If ODT is not supported, use memory-side external termination
Series OCT at 50 Ω
No external components possible with DDR2/3!
[Diagram labels: FPGA, ZS = 50 Ω, Z0 = 50 Ω, ZP to VTT = VCC/2 at DIMM; second panel: ZS, ODT at DIMM]
Summary: Bidirectional Recommendations
Class II FPGA side external termination (if not Stratix III / IV / V device)
ODT on memory on writes (dynamic ODT with DDR3)
  If ODT is not supported, use external termination at memory
Dynamic series/parallel OCT at 50 Ω if available
  Series OCT at 25 Ω if not
No extra components possible with Stratix III / IV / V device and DDR2/3!
Cyclone III and Cyclone IV
ALTMEMPHY performs initial data capture using self-calibrating circuit
DQS strobes from memory are not used for capture
Dynamic PLL clock used to capture DQ data signals
ALTMEMPHY Timing Paths (1)
Timing Path | ALTMEMPHY Variations | Applicable Clock | Description
Address and command | All | mem_clk | Setup and hold margin for all address and command pins or for mem_cke, mem_cs_n, and mem_odt pins.
PHY | All | PHY clocks | Internal timing of the ALTMEMPHY megafunction.
PHY reset | All | PHY clocks | Internal timing of the asynchronous reset signals to the ALTMEMPHY megafunction.
DQS vs. CK | DDR/DDR2 | mem_clk_DQSS | Skew requirement for the DQS strobe at the memory with respect to the arrival time of CK/CK#.
Half-rate address and command | DDR/DDR2/DDR3 | mem_clk | Setup and hold margin for the address and command pins (except for mem_cs_n, mem_cke, and mem_odt pins) with respect to the mem_clk clock at the memory when the PHY is in half-rate mode.
Mimic | DDR/DDR2 | clk[0] | The setup margin for the voltage and temperature tracking mechanism.
Read capture | All | dqs | Setup and hold margin for the DQ pins with respect to DQS strobe at the FPGA capture registers.
ALTMEMPHY Timing Paths (2)
Timing Path | ALTMEMPHY Variations | Applicable Clock | Description
Read postamble | DDR/DDR2 | Postamble clock | Setup and hold time margin for the postamble path that is calibrated with the resynchronization clock phase.
Read postamble enable | DDR/DDR2 | DQS clocks | The setup and hold margin for the postamble logic that enables and disables the DQS signal going to the DQ registers.
Read resync. | DDR/DDR2/DDR3 | Resync. clock | Setup and hold margin for the DQ data with respect to resynchronization and the postamble clock at the resynchronization and the postamble registers.
Write datapath | All | DQS | Setup and hold margin for the DQ pins with respect to DQS strobe at the memory.
Write leveling tDQSS | DDR3 except Arria II GX | CK/CK# clocks | Skew margin for the arrival time of the DQS strobe at the memory with respect to the arrival time of CK/CK# at the memory.
Write leveling tDSS/tDQSH | DDR3 except Arria II GX | CK/CK# clocks | Setup and hold margin for the DQS falling edge with respect to the CK clock at the memory.
ALTMEMPHY
File | Description
ddr_timing.sdc | Clock constraints for PLL inputs; generated clock constraints for PLL outputs; derive clock uncertainty; exceptions (false paths and multi-cycle paths); output delays on address and command outputs; output delays on DQS strobe outputs
example_top.sdc | Provides constraints for the example driver block
ddr_timing.tcl | Includes memory interface and FPGA device parameters
report_timing.tcl | Reports timing slacks
report_timing_core.tcl | Contains high-level procedures for report timing script
ddr_pins.tcl | Library of useful functions
Note: Files are preceded with variation name.
Mimic Path - ALTMEMPHY
Mimics the round trip delay
Enables calibration sequencer to track variations
  Voltage
  Temperature
Adjusts without affecting operation of controller
No timing constraints required for Arria II GX and Stratix IV devices
Cyclone III and IV devices place mimic register close to the IOE
DQS vs. CK Path Cyclone III/IV
Indicates skew requirement for arrival of DQS strobe at memory
Requires timing constraints to account for duty cycle distortion (set_output_delay max & min)
ALTMEMPHY DLL Sharing
Instantiate DLL externally
  Ensure Instantiate DLL externally option turned on in PHY Settings page of MegaWizard Plug-In Manager
Stratix devices only
ALTMEMPHY PLL Sharing
ALTMEMPHY requires 5 PLL output taps (min)
  phy_clk_1x - static system clock for data path and controller
  mem_clk_2x - static DQS output clock for DQS, CK/CK#, and input to DLL
  write_clk_2x - static DQ output clock for DQ signals 90° before DQS
  resynch_clk_2x - dynamic phase clock for resynchronization and postamble
  measure_clk_2x - dynamic phase clock for VT tracking
  mem_clk_ext_2x - optional static clock when dedicated outputs used for CK/CK#
  ac_clk_2x - dedicated static clock for address and command signals (Arria II GX, Stratix III, IV, V devices only)
Multiple PLL clock output sharing options
  Depending on number of clock networks and PLL outputs available
Share static clocks, saving up to 4 clock networks
  With unique resynchronization clocks for each interface
  While mimic paths can be shared or independent
  User needs to design own logic to share mimic clock
Requires modifications to design
  Once ALTMEMPHY megafunction files changed, cannot re-open in MegaWizard Plug-In Manager
Number of PLL Outputs by Device
Device Family | Number of Enhanced PLL Clock Outputs | Number of Dedicated Clock Outputs
Arria II GX | 4 clock outputs each | 1 single-ended or 1 differential pair; 3 single-ended or 3 differential pair total
Cyclone III, Cyclone IV | 5 clock outputs each | 1 single-ended or 1 differential pair total (not for memory interface use)
Stratix III | Left/right: 7 clock outputs; Top/bottom: 10 clock outputs | Left/right: 2 single-ended or 1 differential pair; Top/bottom: 6 single-ended or 4 single-ended and 1 differential pair
Arria II GZ, Stratix IV | Left/right: 7 clock outputs; Top/bottom: 10 clock outputs | Left/right: 2 single-ended or 1 differential pair; Top/bottom: 6 single-ended or 4 single-ended and 1 differential pair
Stratix V | 18 clock outputs each | 4 single-ended or 2 single-ended and 1 differential pair
Example Stratix III/IV PLL Sharing Limitations (ALTMEMPHY)
Top / bottom PLLs have up to ten clock outputs
Up to three controllers sharing same PLL with separate resynchronization clock and measure clocks
  (4 static) + (2 dynamic x 3)
Or up to five controllers sharing the same PLL (4 static) and measure clocks (1 dynamic) with a separate resynchronization clock (1 dynamic x 5)
  (4 static) + (1 dynamic) + (1 dynamic x 5)
When sharing measure clock, ensure memory devices accessed by each different controller are laid out with same trace lengths
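The clock budgets on this slide are simple sums, so a short script can check whether a planned sharing scheme fits in one PLL. The Python sketch below is illustrative only: the function name and its arguments are hypothetical and are not part of any Altera tool. It assumes the slide's model, in which the static clocks are shared, every controller keeps its own resynchronization clock, and the measure clock is either shared or separate.

```python
def pll_outputs_needed(controllers, static=4, shared_measure=False):
    """Count PLL outputs used when several ALTMEMPHY controllers share one PLL.

    Model taken from the slide: static clocks are shared by all controllers,
    each controller has its own dynamic resynchronization clock, and the
    dynamic measure clock is either shared (1 total) or one per controller.
    """
    resync = controllers                      # one resync clock per controller
    measure = 1 if shared_measure else controllers
    return static + resync + measure


TOP_BOTTOM_OUTPUTS = 10  # Stratix III/IV top/bottom PLL, per the slide

# Separate resync and measure clocks: 4 + (2 x 3) = 10
print(pll_outputs_needed(3) <= TOP_BOTTOM_OUTPUTS)
# Shared measure clock: 4 + 1 + (1 x 5) = 10
print(pll_outputs_needed(5, shared_measure=True) <= TOP_BOTTOM_OUTPUTS)
# A fourth controller with separate measure clocks would need 12 outputs
print(pll_outputs_needed(4) <= TOP_BOTTOM_OUTPUTS)
```

Both configurations from the slide land exactly on the ten-output limit, which is why they are quoted as maximums.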
Cyclone III / IV Clock Sharing Limitations
PLLs have up to 5 clock outputs
  (3 static) + (2 dynamic) for single interface
Need additional PLL to resynchronize other interface
Dealing with Clock Crossing Logic
Clock crossing bridges added if components not clocked by memory controller (adds latency)
Manually add bridge or use half-rate bridge in controller
Must cut unrelated timing paths from timing analysis and place & route
Uses clock-crossing FIFOs to translate transfers across clock domains
Example SDC Syntax to Cut Paths

#**************************************************************
# Create Clocks
#**************************************************************
# Define external clock frequency from oscillator:
create_clock -name clk_0 -period 20.000 -waveform {0.000 10.000} [get_ports clk] -add

#**************************************************************
# Define aliases for long clock names:
#**************************************************************
# Names are braced so that Tcl does not treat [1] and [0] as command substitution.
set uniphy_ddr2_0_clock_source {sopc_top_inst|the_uniphy_ddr2_0|mem_if|controller_phy_inst|memphy_top_inst|upll_memphy|altpll_component|auto_generated|pll1|clk[1]}
set system_clk {sopc_top_inst|the_pll|the_pll|altpll_component|auto_generated|pll1|clk[0]}

#**************************************************************
# Set False Paths
#**************************************************************
# Cutting the paths between the system clock and DDR local clock since there is a
# clock crossing bridge between them (FIFOs)
set_false_path -from [get_clocks $system_clk] -to [get_clocks $uniphy_ddr2_0_clock_source]
set_false_path -from [get_clocks $uniphy_ddr2_0_clock_source] -to [get_clocks $system_clk]
Performance Considerations (1)
Don’t waste bandwidth through width mismatches
  Ideal data size at Avalon interface is 32 bits for (32-bit) Nios II processor
Double-data rate and half-rate stages both double incoming data width
  16-bit memory device best for full-rate option (16x2=32)
  8-bit external memory device best when using half-rate option (8x2x2=32)
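The width rule above can be written as a one-line calculation. The following Python sketch is an illustration only; the function name is hypothetical. It applies the two doublings named on the slide: one for double data rate and one more when the controller runs in half-rate mode.

```python
def avalon_width(mem_width_bits, half_rate):
    """Local (Avalon) data width produced by a memory of the given DQ width.

    The DDR stage doubles the width once; a half-rate stage doubles it again.
    """
    width = mem_width_bits * 2        # double data rate
    if half_rate:
        width *= 2                    # half-rate stage
    return width


print(avalon_width(16, half_rate=False))  # 32
print(avalon_width(8, half_rate=True))    # 32
print(avalon_width(16, half_rate=True))   # 64
```

The last case yields 64 bits, twice what a 32-bit master consumes per transfer, which is the kind of width mismatch the slide advises against.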
Performance Considerations (2)
Consider full-rate DDR (16-bit memory chip) example:
  Data at Avalon interface (burst of 1): ABCDEFGH (32 bits per Avalon clock)
  Data at memory interface (burst of 2): EFGH ABCD (2x16 bits at memory interface)
Consider half-rate DDR (8-bit memory chip) example:
  Data at Avalon interface (burst of 1): ABCDEFGH (32 bits per Avalon clock)
  Data at memory interface (burst of 4): GH EF CD AB (4x8 bits at memory interface)
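The two examples can be reproduced by slicing one 32-bit word into memory-width beats. The Python sketch below is illustrative; the function name is hypothetical, and it assumes that the left-to-right order on the slide (EFGH before ABCD) is time order, so the least significant slice goes out first.

```python
def memory_beats(word, total_bits, beat_bits):
    """Split an Avalon word into memory-interface beats, low-order beat first.

    Assumption: the slide lists beats in time order, least significant first.
    """
    mask = (1 << beat_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, beat_bits)]


word = 0xABCDEF01  # stand-in for the slide's ABCDEFGH

# Full rate, 16-bit memory: burst of 2
print([f"{beat:04X}" for beat in memory_beats(word, 32, 16)])
# Half rate, 8-bit memory: burst of 4
print([f"{beat:02X}" for beat in memory_beats(word, 32, 8)])
```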
CSR Address Map
Address | Bit | Description
0x004 | 25 | Reports the value of afi_cal_fail
0x004 | 24 | Reports the value of afi_cal_success
0x004 | 0 | Initiates a soft reset
0x005 | 23:16 | Write figure of merit
0x005 | 7:0 | Read figure of merit
0x006 | 23:16 | Initial failing error group of calibration
0x006 | 7:0 | Initial failing error stage of calibration
0x007 | 31:0 | Indicates whether DQS edges have been identified for each group
Figure of merit: sum over all groups of minimum margin on DQ + margin on DQS divided by 2; measure of interface health
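As a reading aid, the bit fields in the table can be unpacked with shifts and masks. The Python sketch below is illustrative only: the function and key names are hypothetical, it covers only the read fields listed above (bit 0 of 0x004 is a write action and is not decoded), and the mapping of bit n to DQS group n at 0x007 is an assumption.

```python
def decode_csr(addr, value):
    """Decode one 32-bit CSR word using the bit fields listed in the table."""
    if addr == 0x004:
        return {
            "afi_cal_fail": (value >> 25) & 1,
            "afi_cal_success": (value >> 24) & 1,
        }
    if addr == 0x005:
        return {
            "write_figure_of_merit": (value >> 16) & 0xFF,
            "read_figure_of_merit": value & 0xFF,
        }
    if addr == 0x006:
        return {
            "failing_group": (value >> 16) & 0xFF,
            "failing_stage": value & 0xFF,
        }
    if addr == 0x007:
        # Assumed mapping: bit n reports DQS group n.
        return {"dqs_edges_found": [(value >> n) & 1 for n in range(32)]}
    raise ValueError(f"address 0x{addr:03X} is not in the table")


print(decode_csr(0x004, 1 << 24))
print(decode_csr(0x005, 0x00120034))
```

How the word is read from the device (for example over the JTAG CSR port) is outside this sketch.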
Implementing, Simulating, and Debugging External Memory
Interfaces
Exercise Manual
Software Requirements: Quartus® II software v. 12.0, ModelSim®-Altera® Edition software v. 10.0d
Hardware requirements: Stratix® IV GX FPGA development kit
Link to the Quartus II and External Memory Interfaces Handbooks:
http://www.altera.com/literature/hb/qts/quartusii_handbook.pdf
http://www.altera.com/literature/lit-external-memory-interface.jsp
Use the link below to download the design files for the exercises:
http://www.altera.com/customertraining/ILT/Memory_Interface_12_0_v1.zip
Implementing, Simulating, and Debugging External Memory Interfaces Exercises
Copyright © 2012 Altera Corporation
A-MNL-ISDMI-EX-12-0-v1
Exercise 1
Create a Quartus II Design with the High Performance DDR3 SDRAM Controller with
UniPHY
Exercise 1
Objective:
• Create and parameterize an instance of the DDR3 High Performance Controller (HPC) II with UniPHY
Introduction:
This document walks you through the steps necessary to create, constrain, and verify operation of the DDR3 SDRAM High Performance Controller with UniPHY in a Stratix IV GX device. The lab is targeted to the Stratix IV GX FPGA Development Board. However, using the same steps, it could be targeted to any development board supporting DDR3 (or DDR2 with modifications).
Hardware Requirements:
- Altera Stratix IV GX FPGA Development Board, which includes:
o Stratix IV EP4SGX230KF40C2 FPGA
o Micron MT41J64M16LA-15E 1 Gb (128 MB) DDR3 SDRAM (x5) components (top interface: 128 MB; bottom interface: 512 MB)
- USB-Blaster™ programming interface built into development board and connected between the computer and the board via USB
- The appropriate power supply connected to the board
Performance Expectations:
The Stratix IV C2 device is rated at up to 533 MHz (1066 Mbps per pin) for DDR3 SDRAM. Using all four “bottom” port DDR3 devices (U5, U12, U18, U24) on the development board gives a maximum bandwidth of:
64 (bits wide) x 1066 Million (bits/second) = 68224 Million bits per second or 68.2 Gbps
In this lab, you will connect the DDR3 memory controller to all four of the “bottom” port DDR3 SDRAM devices on the development board (64 bits wide; about 68.2 Gbps).
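The bandwidth figure above follows from the interface width and the double data rate. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and the result is a theoretical peak, not a measured throughput.

```python
def peak_bandwidth_mbps(width_bits, clock_mhz):
    """Theoretical peak in Mbps: two transfers per clock on every data pin."""
    return width_bits * clock_mhz * 2


mbps = peak_bandwidth_mbps(64, 533)
print(mbps)          # 68224 (Mbps), about 68.2 Gbps
print(mbps / 8)      # the same peak expressed in megabytes per second
```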
As you proceed through the exercises, be sure to completely read the instructions for each step and sub-step in this lab manual. Each step first summarizes what you will be doing in that step before providing detailed instructions. Use the lines next to each step (____) to keep track of your progress or to check off completed steps in the exercises.
If you have any questions or problems, please ask the instructor for assistance.
Step 1: Extract the lab files
____ 1. Unzip the lab project files, if necessary. In an Explorer window, go to C:/altera_trn/Memory. This is your lab installation directory. Please check with your instructor if you do not see this directory. Delete any old lab file folders that may already exist there. Double-click the executable file Memory_Interface_12_0_v1.exe. If you cannot find this file, ask your instructor for assistance. In the WinZip dialog box, just click Unzip to automatically extract the files in place to a new folder named MEM12_0 in the directory mentioned above.
From now on, this will be referred to as the <Mem lab install directory>.
____ 2. Create a Quartus II project in order to start creating the memory IP. Start the Quartus II software, version 12.0, from the Start menu (All Programs → Altera 12.0 Build 178 → Quartus II 12.0; use the 64-bit version if using an Altera training laptop or desktop) or from a shortcut on the desktop.
____ 3. From the File menu, select New Project Wizard. Choose the following options in completing the wizard:
a. Page 1 - Name the project and top-level design entity siv_ddr3 and set the project directory to <Mem lab install directory>.
b. Page 2 - Leave this page blank as you will add files later in the exercise.
c. Page 3 - Set the Device Family to Stratix IV (GT/GX/E). Set Devices to Stratix IV GX to help filter the Available devices list. You can further filter the list by setting the Pin count to 1517 and the Speed grade to 2. Select the EP4SGX230KF40C2 device. Be sure to select the correct device!
d. Page 4 - Though you will be simulating the design later, leave this page blank.
e. Click Finish.
The project you just created won’t actually be used in the labs today. You’ll be using the example project created by the MegaWizard® Plug-In Manager exclusively. However, in your own designs, you would create your own top-level project with your own user logic to talk to the controller and then instantiate the memory IP into it.
Step 2: Create and parameterize the DDR3 memory IP
In this step you will create a DDR3 IP block targeting the Stratix IV GX family. The DDR3 IP generated will include synthesizable and simulation versions of the Altera HPCII and UniPHY. The Quartus II 12.0 MegaWizard Plug-in Manager will also create an example project instantiating the DDR3 IP block, as well as a traffic generator, allowing you to test the controller in simulation as well as on the targeted FPGA development board.
____ 1. From the Tools menu, select MegaWizard Plug-In Manager.
____ 2. When the MegaWizard Plug-In Manager opens, select Create a new custom megafunction variation and click Next.
____ 3. On page 2a of the MegaWizard tool, make the following selections (if necessary):
Device family Stratix IV
Type of output file Verilog HDL
Output file C:/altera_trn/Memory/MEM12_0/ddr3_top
____ 4. Expand the Interfaces folder, the External Memory folder, and then the DDR3 SDRAM folder. Select DDR3 SDRAM Controller with UniPHY v12.0. The tool should look like the screenshot below. Click Next to continue.
The tool will launch the DDR3 SDRAM Controller with UniPHY dialog box. This dialog box has six tabs across the top. You will configure the DDR3 SDRAM controller for the Stratix IV GX Development Board.
The lab instructions don’t go into detail about each setting you’ll be making in the parameter editor. If you are interested in learning more about the settings, you can hover over the setting to get a tooltip; click Documentation in the upper right hand corner of the tool; or search for the setting name in the External Memory Interfaces Handbook, included with the exercise files in the Vendor_files folder or on the Altera web site at http://www.altera.com/literature/lit-external-memory-interface.jsp.
____ 5. Under the PHY Settings tab, make the following selections. Any setting not listed should be left at its default value.
FPGA Speed Grade 2
Clocks
Memory clock frequency 533
PLL reference clock frequency 50
Rate on Avalon-MM interface Half
Advanced PHY Settings
Advanced clock phase control Enabled
Additional address and command clock phase 0.0
Additional CK/CK# phase 0.0
PLL, DLL, OCT sharing mode No sharing
____ 6. Select a memory preset. From the Presets list on the right, select the memory found on the development board, MICRON MT41J64M16LA-15E, and click Apply.
The preset sets default values for the memory you’ll be interfacing to, saving time in having to scour the memory data sheet for information. If you are curious about this memory, its datasheet can be found in the Vendor_files folder in the lab installation directory.
____ 7. Select the Memory Parameters tab. Verify that the memory preset selected the correct values and set custom values for this design using the list below. Again, if a setting is not listed, leave it at its default value. (Some of these settings we’ll discuss in more detail throughout the rest of today.)
Memory Parameters
Memory vendor Micron
Memory Format Discrete Device
Memory device speed grade 666.667
Total interface width 64
DQ/DQS group size 8
Number of chip selects 1
Number of clocks per chip select 1
Row address width 13
Column Address Width 10
Bank address width 3
Enable DM pins Enabled
Memory Topology
Fly-by topology Enabled
Memory Initialization Options
READ Burst Type Sequential
DLL precharge power down DLL off
Memory CAS latency setting 8
Output drive strength setting RZQ/7
Memory additive CAS latency Disabled
ODT Rtt nominal value RZQ/4
Auto selfrefresh method Manual
Selfrefresh temperature Normal
Memory write CAS latency setting 6
Dynamic ODT (Rtt_WR) value RZQ/4
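The address widths in the table can be sanity-checked against the density of the memory device. The Python sketch below is illustrative; the function name is hypothetical, and the per-device data width of 16 bits is an assumption taken from the 64-bit interface being built from four devices.

```python
def device_capacity_bits(row_bits, col_bits, bank_bits, device_width):
    """Capacity of one device: addressable locations times bits per location."""
    locations = 1 << (row_bits + col_bits + bank_bits)
    return locations * device_width


bits = device_capacity_bits(row_bits=13, col_bits=10, bank_bits=3, device_width=16)
print(bits == 1 << 30)        # True: 1 Gb per device
print(bits // 8 // 2**20)     # 128 (MB per device)
print(4 * bits // 8 // 2**20) # 512 (MB for the four bottom-port devices)
```

These values agree with the 1 Gb (128 MB) devices and the 512 MB bottom interface listed in the hardware requirements.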
____ 8. Select the Memory Timing tab, but do not make any changes to the values. If you want, compare the values here with the values found in the vendor data sheet, found in the Vendor_files directory of the class file installation directory. The setup and hold times (tIS, tIH, tDS, tDH) will get derated automatically based on the board parameters of the development kit.
Note that the automatic derating set in the next step will not change the values entered into the Memory Timing tab.
____ 9. Select the Board Settings tab. Under Setup and Hold Derating, switch the derating method to Specify slew rates to calculate setup and hold times. Enter the values for this and the Board Skews using the table below. There is no need to adjust the ISI values.
These are all unique values that have already been determined for the Stratix IV GX development board through board simulation. Don’t worry if you see warnings or errors as you enter these values; they will go away once all values have been entered.
Setup and Hold Derating
CK/CK# slew rate (Differential) 4.0
Address and command slew rate 1.5
DQS/DQS# slew rate (Differential) 3.0
DQ slew rate 1.5
Board Skews
Maximum CK delay to DIMM/device 0.618
Maximum DQS delay to DIMM/device 0.368
Minimum delay difference between CK and DQS 0.25
Maximum delay difference between CK and DQS 0.378
Maximum skew within DQS group 0.017
Maximum skew between DQS groups 0.128
Average delay difference between DQ and DQS 0.021
Maximum skew within address and command bus 0.072
Average delay difference between address and command and CK 0.015
____ 10. Select the Controller Settings tab. On this tab, change Maximum Avalon-MM burst length to 64.
____ 11. Also on the Controller Settings tab, turn on Enable Configuration and Status Register Interface, making sure that CSR port host interface is set to Internal (JTAG).
The CSR port will be used later when you use the EMIF Toolkit with the interface. Leave all other settings at their defaults.
____ 12. Select the Diagnostics tab. Make sure that the Auto-calibration mode is set to Skip calibration to save time later when the interface gets simulated.
____ 13. Also on the Diagnostics tab, enable Skip Memory Initialization Delays and Enable verbose memory model output.
____ 14. Under Debugging Options, set the Debugging feature set to Option 1 if it isn’t set already.
____ 15. Finally, turn on Enable the Efficiency Monitor and Protocol Checker on the Controller Avalon Interface. Again, these features will be used later with the EMIF Toolkit.
____ 16. Click Finish. When prompted, be sure that Generate Example Design is enabled, and click Generate. Let the instructor know when you have started generating the IP.
Exercise Summary
• Created a top-level project for the memory interface
• Created, parameterized, and started generation of the IP
END OF EXERCISE 1
Exercise 2
Verify the High Performance DDR3 Memory Controller Functionality through Simulation
Objectives:
• Generate and examine the simulation module files created with the simulation scripts
• Simulate the example design
The MegaWizard Plug-In Manager generates three folders: a version of the IP for instantiation and synthesis, a version of the IP for simulation, and an example design folder. The example design folder contains a full example project for synthesis and scripts for generating files for simulation. The example synthesis project and the simulation files generated from the scripts each include a traffic generator block to test the design in simulation as well as on the development board. The simulation version of the project includes a testbench and a generic external memory model for testing. The diagram below represents the simulation system generated by the scripts.
In this exercise, you’ll generate the example simulation files (top-level entity is ddr3_top_example_sim), and perform a scripted simulation using the ModelSim simulator. To save time, you will simulate the system using the generic memory model. This is not as accurate as using a vendor memory model, but it will give a good approximation of the actual behavior of the interface.
Step 1: Generate the simulation files and examine them
As noted in the presentation, the IP generation only creates scripts to generate files for simulation instead of generating an entire project. This simplifies things since a Quartus II project is not necessary for simulating the design in a 3rd-party simulation tool, such as the ModelSim simulator. A simple project is generated, however, to make it easy to run the simulation file generation scripts.
____ 1. From the end of the last exercise, click Exit and choose to add the .qip file to the project if asked.
____ 2. From the File menu, select Open Project. Open the simulation generation project, generate_sim_example_design.qpf, located in
<Mem lab install directory>/ddr3_top_example_design/simulation/
For your own reference, there’s also a README file in this location that explains the simulation file generation we are about to go through.
____ 3. Change the device setting to the correct one. From the Assignments menu, select Device. Select from the Available devices list the same Stratix IV GX selected earlier (EP4SGX230KF40C2).
____ 4. From the Tools menu, select Tcl Scripts.
____ 5. Select the script named generate_sim_verilog_example_design.tcl. Examine the script, if you’d like, in the Preview window. When you’re ready, click Run.
____ 6. The script may take some time to execute, with no indication that it’s running. Click OK in the dialog that appears when it completes.
If you’d like, you can also run the generate_sim_vhdl_example_design.tcl script to generate the VHDL version of the simulation files. However, since the main IP was generated as Verilog, we’ll use the Verilog version for this and the rest of the lab exercises. Of course, if this was your own design and VHDL is your preferred HDL, you would use it instead.
____ 7. Open the newly-generated ddr3_top_example_sim.v file in the verilog directory using the Quartus II text editor or WordPad. Do not edit the file.
The testbench instantiates the example design (ddr3_top_example_sim_e0), a status checker, and the generic DDR3 SDRAM memory module and connects the memory interface signals appropriately.
____ 8. Close the top-level file.
____ 9. If you’d like, examine the files listed in the diagram above, but do not edit them. They can be found in
<Mem lab install directory>/ddr3_top_example_design/simulation/verilog/submodules/
Step 2: Simulate the design in the ModelSim-Altera software
The generated simulation files include a script to completely handle the compilation of the simulation files and the setup and running of the simulation itself. You’ll replace this script with one that displays additional waveforms during the simulation.
____ 1. In an Explorer window, copy the run.do file from the Constraint_files folder in the lab install directory to the mentor simulation directory that was generated by the script, replacing the existing run.do:
<Mem lab install directory>/ddr3_top_example_design/simulation/verilog/mentor
____ 2. Start the ModelSim-Altera software from the Windows Start menu (All Programs → Altera 12.0 Build 178 → ModelSim-Altera 10.0d (Quartus II 12.0) Edition) or from a shortcut on the desktop. Click Close if the introduction window opens.
____ 3. From the File menu, select Change Directory. Change the directory to
<Mem lab install directory>/ddr3_top_example_design/simulation/verilog/mentor
____ 4. From the Tools menu, go to the Tcl submenu and select Execute Macro. Select the run.do file you copied and click Open.
The script runs, compiling the testbench and all the files that make up the interface in the submodules folder. When the simulation actually starts, you’ll see the Wave window appear. The window will populate with all of the top-level signals listed in the testbench file along with a “virtual” signal named memcommandwave. This signal, defined in run.do, creates a mnemonic for the control signals (ras_n, cas_n, we_n) to make it easy to see the commands being sent to the memory.
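The mnemonic that memcommandwave provides amounts to a small lookup on the control pins. The Python sketch below shows the idea using the standard DDR3 command encoding. The dictionary and function names are illustrative and are not taken from run.do, and apart from WR and RD the mnemonic spellings are assumptions. Commands that also depend on other pins, such as CKE or A10, are simplified away.

```python
# Keys are (ras_n, cas_n, we_n) sampled while cs_n is low.
DDR3_COMMANDS = {
    (0, 0, 0): "MRS",   # mode register set
    (0, 0, 1): "REF",   # refresh
    (0, 1, 0): "PRE",   # precharge
    (0, 1, 1): "ACT",   # activate
    (1, 0, 0): "WR",    # write
    (1, 0, 1): "RD",    # read
    (1, 1, 0): "ZQC",   # ZQ calibration
    (1, 1, 1): "NOP",   # no operation
}


def decode_command(cs_n, ras_n, cas_n, we_n):
    """Return a mnemonic for the command on the DDR3 control pins."""
    if cs_n:
        return "DES"    # device deselected; the other pins are ignored
    return DDR3_COMMANDS[(ras_n, cas_n, we_n)]


print(decode_command(0, 0, 0, 0))  # MRS
print(decode_command(0, 1, 0, 0))  # WR
print(decode_command(0, 1, 0, 1))  # RD
```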
____ 5. Observe the simulation status in the Transcript and Wave windows.
The Transcript window displays the write and read tests performed by the interface and evaluates whether the results are correct.
The Wave window displays a graphical representation of the top-level signal levels. During the simulation, select the Wave window to highlight it and use the Wave zoom buttons to update the window and observe signal activity. Right-click a signal to change its radix (hex is useful for the address and dq buses). To hide the expanded signal names that include the name of the testbench, click the tiny button at the bottom of the signal list with the tooltip Toggle leaf names <-> full names.
The simulation will run for approximately 170 μs. When the functional simulation is successful, the testbench should output a SIMULATION PASSED message in the Transcript window. This will take several minutes.
____ 6. When the simulation finishes, a dialog box will appear asking if you are finished simulating. Click No so you will be able to explore the Wave window.
____ 7. Explore the functionality of the controller in greater detail by looking at the transactions on the waveforms in the wave window.
The entire simulation waveform should resemble the following:
The early activity on the DDR3 interface signals is the controller writing to the mode registers (cs_n, ras_n, cas_n, we_n all low; ba picks which of the 4 mode registers to write to). The empty part of the simulation is the calibration that was skipped when the IP was generated. After this, observe e0_emif_status_local_init_done going high. Once this happens, the traffic generator starts to test the DDR3 interface by generating write and read transactions. Once testing is finished, the e0_drv_status_test_complete signal goes high and the simulation finishes.
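The sequence described above (initialization done, then traffic, then test complete) could be logged by a small monitor in a testbench. This is a hedged sketch only: the module name sim_monitor and its port names are hypothetical, and it is not part of the generated example testbench. You would connect the ports to the status signals named in the paragraph above.

```verilog
// Hypothetical testbench monitor: reports when initialization finishes and
// when the traffic generator signals that its test is complete.
module sim_monitor (
    input wire local_init_done,   // e.g. e0_emif_status_local_init_done
    input wire test_complete      // e.g. e0_drv_status_test_complete
);
    initial begin
        // Block until the interface reports that initialization is done.
        wait (local_init_done);
        $display("%0t: init done, traffic generator may start", $time);

        // Block until the traffic generator reports completion.
        wait (test_complete);
        $display("%0t: traffic generator test complete", $time);
    end
endmodule
```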
Diving into the waveforms, you can observe the read and write transactions on the DDR3 interface. Try to correlate the read back data with the data written to memory. This is easiest to do with the tests at the very end of the simulation, which can be matched to information at the bottom of the Transcript window, but it can still be a little tricky due to the data masking (dm). Write and read tests occur in successive groups of issued commands, so the first dq data written with the WR command on memcommandwave should correspond to the first dq data read when memcommandwave is RD. The dm masking can be seen in the Wave window or listed with the data in the Transcript. Ask the instructor if you need assistance.
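When matching written and read data by hand, it helps to remember that a byte lane with dm high is not written, so the read returns whatever the location held before. The sketch below illustrates that rule under stated assumptions: the module name dm_expected, its ports, and the two-byte default width are hypothetical and do not come from the example design.

```verilog
// Hypothetical helper: computes the data a later read should return after a
// masked write. A dm bit of 1 means the corresponding byte lane was masked.
module dm_expected #(
    parameter BYTES = 2                    // number of byte lanes (assumed)
) (
    input  wire [8*BYTES-1:0] old_data,    // contents before the write
    input  wire [8*BYTES-1:0] wr_data,     // dq data driven with the WR command
    input  wire [BYTES-1:0]   dm,          // 1 = byte lane masked (not written)
    output reg  [8*BYTES-1:0] expected     // data a following RD should return
);
    integer i;
    always @* begin
        for (i = 0; i < BYTES; i = i + 1)
            // Keep the old byte where the lane is masked, else take the new one.
            expected[8*i +: 8] = dm[i] ? old_data[8*i +: 8] : wr_data[8*i +: 8];
    end
endmodule
```

For example, with old_data = 16'hAAAA, wr_data = 16'h3C12, and dm = 2'b10, the upper byte is masked and expected is 16'hAA12.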
____ 8. Close the simulator after exploring the signals in the Wave window.
Exercise Summary
• Generated the files for performing a functional simulation of the example design
• Ran the simulation and examined the results
END OF EXERCISE 2
Lab 3
Complete the DDR3 Memory Controller
Step 1: Modify the top-level of the design
In this step, you will use the synthesizable version of the design and set it up for running on the development board. As part of this, you’ll change the polarity of the test output signals so they can drive LEDs on the development board.
____ 1. From the Quartus II File menu, select Open Project.
____ 2. Open ddr3_top_example.qpf found in:
<Mem lab install directory>/ddr3_top_example_design/example_project
____ 3. Change the target device for the project. From the Assignments menu, select Device. Again select the same Stratix IV GX device (EP4SGX230KF40C2) from the Available devices list that you selected for the original top-level project. Don’t click OK yet.
____ 4. Change the reserved setting for unused pins on the device so that the device will draw less power. Click Device and Pin Options, and select the Unused Pins category.
____ 5. Set Reserve all unused pins to As input tri-stated. Click OK twice.
____ 6. Open the top level of the design by double-clicking ddr3_top_example in the Project Navigator.
____ 7. Change the example project status signals to active low so that they can drive LEDs on the development board. In the module definition at the top of the ddr3_top_example.v file, look for the output signals drv_status_test_complete, drv_status_pass, and drv_status_fail. For each of these three signals, append _n to the signal name.
____ 8. Add Verilog wire and assign statements to invert the three signals. A little further down in the file, you should see a number of wire type declarations. Underneath, add the following new wire declarations and assign statements to invert the signals:
    wire drv_status_test_complete;
    wire drv_status_pass;
    wire drv_status_fail;
    assign drv_status_test_complete_n = ~drv_status_test_complete;
    assign drv_status_pass_n = ~drv_status_pass;
    assign drv_status_fail_n = ~drv_status_fail;
____ 9. Save and close the ddr3_top_example.v file.
Step 2: Add constraints and assign I/O locations
Signals entering or exiting the FPGA device need to be assigned physical pin locations on the device I/O. The signals that require these location assignments are listed in the tables below. In this step, you will source a Tcl script created by the MegaWizard Plug-In Manager to set up a number of I/O assignments and then use another script to create the required I/O location assignments. A synthesized netlist is required in order to run these scripts.
Top-level design inputs and outputs
Inputs            Outputs
pll_ref_clk       drv_status_pass_n
global_reset_n    drv_status_fail_n
soft_reset_n      drv_status_test_complete_n
DDR3 interface signals
mem_a       mem_odt
mem_ba      mem_ras_n
mem_ck      mem_cas_n
mem_ck_n    mem_we_n
mem_cke     mem_dq
mem_cs_n    mem_dqs
mem_dm      mem_dqs_n
oct_rdn     oct_rup
____ 1. Synthesize the project. From the Processing menu, go to Start, and select Start Analysis & Synthesis. You can also click the Start Analysis & Synthesis toolbar icon. Ignore all warnings.
____ 2. While the design is synthesizing, copy the ddr3_top_globals_pin_locations_bot.tcl file from the <Mem lab install directory>/Constraint_files directory to the submodules directory: <Mem lab install directory>/ddr3_top_example_design/example_project/ddr3_top_example/submodules
____ 3. When synthesis is complete, add constraints generated by the MegaWizard Plug-In Manager to the project. From the Tools menu, select Tcl Scripts.
____ 4. Select the ddr3_top_example_if0_p0_pin_assignments.tcl script. Click Run. Click OK when the script completes.
This Tcl script sources 2 other Tcl scripts located in the submodules directory, creating I/O assignments for the DDR3 pins. If you like, open the Tcl scripts in the submodules folder and examine them.
____ 5. From the Assignments menu, open the Assignment Editor.
The Assignment Editor is basically a spreadsheet of the assignments for the project that are stored in the .qsf file. In the Assignment Editor, you can see all the assignments that were added by the script. Notice the Input Termination and Output Termination assignments. You may also notice a value of Flexible_timing for assignments named Memory Interface Delay Chain Configuration. This indicates the use of the newer, flexible timing model on the I/O delay chains used for the interface, as opposed to the older macro timing model.
You’ll also see a number of Global Signal assignments. These assignments set the PLL clocks (signals named auto_generated|clk) to use the global routing resources and force a number of control signals to not use the global routing resources. Remember that putting control signals on the global resources could cause recovery and removal timing failures.
At this point in the design flow, you have to assign all of the pin-outs to the design as per your board requirements. Normally, you would use the Pin Planner or source a pre-defined pin-out script for this. Today, to save time, you will use a premade script named ddr3_top_globals_pin_locations_bot.tcl prepared specifically for this lab.
____ 6. Select Tcl Scripts from the Tools menu again, and Run the ddr3_top_globals_pin_locations_bot.tcl script that you copied to the submodules folder earlier. Click OK when complete.
____ 7. Open the Quartus II Pin Planner by selecting Pin Planner from the Assignments menu.
You can see that the interface has been placed along the bottom edge of the chip.
The Pin Planner should appear with DQ/DQS pin groups highlighted as shown above. If the pin groups are not highlighted, from the View menu, select Show, then Show DQ/DQS Pins and finally In x8/x9 Mode.
Feel free to examine the I/O assignments for all the top-level signals. Right-click in the All Pins list at the bottom of the window and select Customize columns to add columns for other assignments for each I/O pin, such as Output Termination and Input Termination.
____ 8. When you are finished, close the Pin Planner.
____ 9. From the File menu in the Quartus II software, select Save Project.
Step 3: Add a SignalTap™ II instance to the design
The SignalTap II embedded logic analyzer allows you to tap and monitor any internal node(s) in the design. Normally, you might add a SignalTap II instance to your design after you discover that something has gone wrong, and you need to debug it. In this section of the lab, you will add it to your design now for use later in the day to tap a number of useful signals. This approach will save us some time by avoiding an extra compile.
Recall that the traffic generator provides the pass, fail, and test_complete signals (named drv_status_pass, drv_status_fail, and drv_status_test_complete by default) to indicate whether the memory interface is operating correctly or not. You inverted these signals in the design to drive LEDs on the Stratix IV GX development board. The new signals, drv_status_pass_n, drv_status_fail_n, and drv_status_test_complete_n are tied to user LEDs D23, D22, and D21 (labeled 0, 1, and 2 on the board) respectively. D21 should turn on at the end of the test. D23 turns on to indicate a passing test while D22 turns on to indicate a test failure. You will add SignalTap II nodes to allow you to probe these signals along with a number of others.
The SignalTap II file has already been created for you.
____ 1. Examine the SignalTap II file. From the File menu, select Open (not Open project). Change the Files of type to SignalTap II Logic Analyzer Files (*.stp), and open the ddr3_top.stp file from <Mem lab install directory>.
Notice the signals that will be tapped by the logic analyzer. Besides the drv_status_test_complete_n, drv_status_pass_n, and drv_status_fail_n status signals, the other signals are all Avalon bus signals, indicated by the avl_ prefix. As mentioned earlier, you should only tap the local Avalon interface between the controller and the traffic generator (or your user logic) because you want to avoid adding stubbed routing paths on the timing-critical signals of the external interface.
The logic analyzer Setup tab should look like this:
____ 2. From the Assignments menu, select Settings. Go to the SignalTap II Logic Analyzer category.
____ 3. Turn on Enable SignalTap II Logic Analyzer. Click the browse button next to SignalTap II File name and open the ddr3_top.stp file. Don’t click OK yet.
Step 4: Make final project settings and compile the design
Before starting the compilation, you’ll make some final project settings that will help optimize the design in order to meet timing.
____ 1. Still in the Settings dialog box, go to the Fitter Settings category and set the Fitter effort to Standard fit.
____ 2. Make sure that Optimize hold timing is enabled and set to All Paths.
____ 3. Make sure that Optimize multi-corner timing is turned on.
____ 4. In the Compilation Process Settings category, make sure Use smart compilation and Run Assembler during compilation are turned on.
____ 5. In the Analysis & Synthesis Settings category, set the Optimization Technique to Speed.
____ 6. Finally, in the Physical Synthesis Optimizations category, turn on all 4 options under Optimize for performance (combinational logic, register retiming, asynchronous signal pipelining, and register duplication), and set the Effort level to Extra. Click OK to close the Settings dialog box.
While these options will certainly increase compile time, they can help guarantee that the design will meet timing.
You are now ready to compile your design.
____ 7. Start compilation by either selecting Start Compilation from the Processing menu or clicking Start Compilation in the toolbar.
The compilation process will take some time. Please inform the instructor once you have started compiling.
Exercise Summary
• Made I/O related assignments, including termination settings and locations
• Added the SignalTap II embedded logic analyzer to the design to capture internal signal data during runtime
END OF EXERCISE 3
Lab 4
Verify the High Performance DDR3 Memory Controller through Timing Analysis and In-System Testing on the Board
Step 1: Verify timing using the TimeQuest timing analyzer
____ 1. Once compilation has completed, open the TimeQuest Timing Analyzer by selecting TimeQuest Timing Analyzer from the Tools menu or by clicking its toolbar button.
____ 2. Double-click Report DDR in the Device Specific Reports folder in the Tasks pane. Once this is performed, a green checkmark should appear.
This will perform the necessary steps in order to obtain timing reports. It will create a post-fit, slow corner timing netlist, read in the SDC files that the MegaWizard Plug-In Manager automatically added to the project, and update the timing netlist based on the timing constraints. TimeQuest will then generate the DDR timing report.
____ 3. When the script is run, you will see timing information in the Console window at the bottom of the window. Your design should meet all DDR timing (but see note below). If you had failing paths, however, you would need to investigate and fix them.
If you see some timing failures in the Core, they can be safely ignored for the purposes of this lab. The CSR interface and the efficiency monitor add some additional delay between the example driver and the controller. For a final design, you could remove these debug features (which you’ll experiment with soon) in order to meet timing.
____ 4. Locate the worst address/command path in the Chip Planner. Highlight the first path shown in the if0 Address Command (setup) Summary of Paths tab.
____ 5. Right-click the path and select Locate Path.
____ 6. Select Locate in Chip Planner.
____ 7. Observe the path in the Chip Planner when it opens. Once the Chip Planner opens, from the View menu, select Show Delays. You may have to zoom in to see the delay label.
____ 8. Click the + on the delay label to expand the path into its actual routing segments through the device.
Feel free to cross-probe from other paths in TimeQuest to the Chip Planner to see how they were routed. Each path you locate from TimeQuest is stored in the Locate History window at the bottom of the Chip Planner for easy review later. You may also wish to look at the timing waveforms on the Waveform tab in TimeQuest to graphically explore timing margins, etc.
____ 9. Close the Chip Planner and TimeQuest when you are finished.
Step 2: Verify operation of the DDR3 interface
After verifying the timing requirements, you can now download the design to the FPGA and verify that the DDR3 interface works properly. You’ll use the SignalTap II embedded logic analyzer to do that in this step.
____ 1. Plug in the Stratix IV GX development board and turn it on. The fan should start running and a number of LEDs should light.
____ 2. Set the rotary dial switch (SW2) to position 1.
____ 3. Plug the USB A-B cable into the board and connect it to your computer.
You should not have to install the drivers for the built-in USB-Blaster hardware if you are working on an Altera training computer. If the New Hardware Wizard appears on a training computer or you are using your own machine, please ask the instructor for assistance in getting the driver set up.
____ 4. Open the SignalTap file added to the project in Lab 3 if it’s not already open. You will program the device from here.
____ 5. Select USB-Blaster [USB-0] from the Hardware menu if it is not already selected. If the USB-Blaster connection does not appear in the list, click Setup and select it from the Currently selected hardware list. Click Close.
If you don’t see the USB-Blaster connection or cannot connect, please ask the instructor for assistance.
____ 6. Click browse to point to the ddr3_top_example.sof file that was generated by the Assembler during compilation.
The JTAG Chain Configuration section should look like the screenshot below. Make sure that the EP4SGX230 is selected as the target Device.
____ 7. Click the program button in the JTAG Chain Configuration section to program the device.
It will take about 20 to 30 seconds to program the device, during which time a status bar will appear to fill and refill a number of times. This is normal.
The example driver provides the pass, fail, and test_complete signals to determine whether your memory interface is operating correctly. The pass signal of the example driver is driven to logic high as long as the data written to memory matches what is read from the same location. Remember you connected these signals through inverters to LEDs on the development board. If D22 and D21 are turned on, that indicates proper functioning of the interface (the test passed and the test completed, respectively). If D23 turns on, the test has failed, indicating a mismatch between what was written and what was read back.
Other signals (like local_cal_fail) were not inverted earlier, so you should also see D20 lit, indicating that the calibration did not fail.
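The behavior of the three status signals can be pictured with a small model. This is a sketch under stated assumptions and not the example driver's actual implementation: the module name status_model, the expected and last inputs, and the 32-bit default width are all hypothetical. It only illustrates that pass stays high until a read mismatch occurs.

```verilog
// Hypothetical model of pass / fail / test_complete status generation.
module status_model #(
    parameter W = 32                     // data width (assumed)
) (
    input  wire         clk,
    input  wire         reset_n,         // active-low reset restarts the test
    input  wire         rdata_valid,     // read data is valid this cycle
    input  wire [W-1:0] rdata,           // data read back from memory
    input  wire [W-1:0] expected,        // data previously written there
    input  wire         last,            // marks the final read of the test
    output reg          pass,
    output reg          fail,
    output reg          test_complete
);
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            pass          <= 1'b1;       // pass is high until a mismatch
            fail          <= 1'b0;
            test_complete <= 1'b0;
        end else if (rdata_valid) begin
            if (rdata != expected) begin
                pass <= 1'b0;            // mismatch is sticky until reset
                fail <= 1'b1;
            end
            if (last)
                test_complete <= 1'b1;   // all reads have been checked
        end
    end
endmodule
```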
____ 8. Press the design’s global reset button (PB0) to restart the test and observe the LEDs.
[Figure: development board with the Test fail, Test pass, and Test complete LEDs and the Global reset button labeled]
Step 3: Verify the design with the SignalTap II Embedded Logic Analyzer
With the SignalTap II logic analyzer, you’ll be able to see the actual data transfers on the Avalon (local) bus, similar to what you saw earlier in the simulation on the external interface.
____ 1. Click Run Analysis in the SignalTap II Instance Manager to start the logic analyzer.
The logic analyzer starts looking for the trigger: the falling edge of drv_status_test_complete_n. You should observe the Status of the logic analyzer instance as Waiting for trigger.
The driver starts running as soon as the device is programmed. To actually catch the driver test, you’ll need to reset the design.
____ 2. Press the global_reset_n button (PB0) on the development board.
This should trigger the logic analyzer when the drv_status_test_complete_n signal goes low.
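A falling-edge trigger condition means the signal was high on the previous sample and is low on the current one. The sketch below shows that comparison in HDL form. It is an illustration only, not how the SignalTap II logic analyzer is implemented, and the module name falling_edge_detect is hypothetical.

```verilog
// Hypothetical falling-edge detector: fell pulses for one clock cycle when
// sig was high on the previous sample and is low on the current sample.
module falling_edge_detect (
    input  wire clk,
    input  wire sig,      // e.g. drv_status_test_complete_n
    output wire fell
);
    reg sig_d = 1'b1;     // previous sample; starts high (inactive level)

    always @(posedge clk)
        sig_d <= sig;     // remember the current sample for the next cycle

    assign fell = sig_d & ~sig;
endmodule
```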
If you dig deeper into the waveforms (left-click to zoom in, right-click to zoom out), you can look at the read and write transactions on the Avalon interface and correlate the writes with the read back data. In the screenshot above (from a 16-bit version of the interface), the cursor (on the right) is placed at a location where avl_rdata_valid is high, indicating valid incoming data. If you look at avl_rdata at that point, highlighted in the Value column on the left, you can see it starts with 3C12… If you look at the indicated avl_wdata, you can see the write that generated this read. It may sometimes be difficult to find matching data because of the data reordering performed by the controller, but you should be able to find 16-bit or 32-bit patterns that match.
The pass signal always stays high, indicating that the test has passed.
____ 3. Try triggering on other signals. Switch back to the Setup tab, and try triggering on the rising edge of avl_write_req, avl_read_req, and avl_rdata_valid, switching the other signals back to don’t care each time and setting the Trigger position to the Pre trigger position.
Triggering on the rising edge of each of these signals will let you see the behavior of the Avalon interface right after calibration.
Step 4: Generate calibration and margin reports with the EMIF Debug Toolkit (optional; time permitting; discussed in the next presentation section)
The EMIF Toolkit gives you the unique ability to analyze how your memory interface was calibrated by the UniPHY sequencer and how much margin you have in your design. For example, if memory calibration failed, you could use the EMIF Toolkit to pinpoint which signal failed and why. It could be a problem with unmatched board routing or a delay chain may have been set incorrectly. The toolkit can help figure it out.
The EMIF Toolkit makes use of the CSR Avalon interface you enabled to communicate during runtime with the Qsys system-based UniPHY sequencer. Operation of the EMIF Toolkit is very similar to the TimeQuest timing analyzer, making use of tasks to connect the toolkit to the memory interface sequencer and to generate reports. To use the toolkit, you have to establish a link with the CSR interface of the sequencer.
____ 1. From the Quartus II Tools menu, select External Memory Interface Toolkit.
____ 2. From the Tasks pane in the toolkit, double-click Initialize Connections.
This task looks for all available connections through the JTAG interface and generates a Discovered Connections report. Each connection found is made up of a number of nodes, so you’ll see multiple rows in this report even though there is actually only a single connection. This report is useful for figuring out which memory interface you want to link the toolkit to if you have more than one JTAG connection or more than one memory interface on one or more devices on your board. Since this design uses only one interface with a single JTAG interface, you could have skipped this step.
____ 3. Double-click the Link Project to Device task. Click OK.
This establishes a link between the hardware connection and the Quartus II project through a JTAG debugging information, or .jdi, file, which was generated by the Assembler during compilation. The file provides information about the debugging interfaces that were compiled into the design and programmed into the FPGA device. For the memory toolkit, this includes the CSR interface and the efficiency monitor.
____ 4. Create a connection to the memory interface sequencer. Double-click the Create Memory Interface Connection task.
____ 5. Look over the information displayed. Optionally, enter a Connection name. Click OK.
Once you establish this connection, a number of additional tasks become available. You can also look through some newly generated reports that summarize the interface and indicate whether any DQS groups or memory ranks were masked during calibration of the interface.
____ 6. Double-click the Rerun Calibration task in the Commands folder.
Since the toolkit was not active during the initial calibration after programming the device, you must rerun calibration to get information about it from the sequencer. This also generates the Calibration Report folder.
____ 7. Examine the generated reports in the Calibration Report folder.
The calibration reports indicate the margins that were observed during calibration and the final delay chain settings and phase adjustments that were selected to best center the DQS in the DQ bits.
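The idea of centering a strobe in a data valid window can be expressed with simple arithmetic on delay taps. The following is a sketch of that arithmetic only, under the assumption that the window is described by its first and last passing tap; it is not the UniPHY sequencer's algorithm, and the module name dvw_center and its ports are hypothetical.

```verilog
// Hypothetical illustration: given the first and last passing delay taps of
// a data valid window, pick the middle tap and report the smaller margin.
// Assumes right_edge >= left_edge.
module dvw_center #(
    parameter W = 6                      // width of a tap index (assumed)
) (
    input  wire [W-1:0] left_edge,       // first passing delay tap
    input  wire [W-1:0] right_edge,      // last passing delay tap
    output wire [W-1:0] center,          // tap nearest the middle of the window
    output wire [W-1:0] margin           // taps of margin on the tighter side
);
    wire [W:0]   sum      = left_edge + right_edge;  // extra bit avoids overflow
    wire [W-1:0] margin_l = center - left_edge;      // taps to the left edge
    wire [W-1:0] margin_r = right_edge - center;     // taps to the right edge

    assign center = sum[W:1];                        // (left + right) / 2
    assign margin = (margin_l < margin_r) ? margin_l : margin_r;
endmodule
```

For example, a window from tap 4 to tap 20 gives a center of 12 with 8 taps of margin on each side.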
____ 8. Double-click the Generate Margining Report task in the Commands folder, and examine the generated reports in the Margin Report folder.
The margining reports indicate how much margin there is on each DQ signal before a read or write would fail. The Read Data Valid Windows and Write Data Valid Windows reports graphically illustrate the DQ Pin Post Calibrations Margins report, displaying DQS for each DQ bus group as a black line within the data valid window (DVW).
____ 9. Make a connection to the efficiency monitor. Double-click the Create Efficiency Monitor Connection task.
____ 10. Again, look over the connection information and optionally enter a Connection name. Click OK.
____ 11. Examine the reports generated in the Avalon-MM Efficiency Monitor folder.
In general, efficiency is calculated as the number of active cycles of data transfer divided by the total number of operating cycles. The efficiency numbers you see in the Efficiency Monitor Statistics report are somewhat low because not much data was transferred by the traffic generator for its test. If you were using your own logic to continuously read and write data to the interface, you would get a better picture of the efficiency of the controller.
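The ratio described above can be gathered with two counters. This is a minimal sketch and not the efficiency monitor's actual implementation: the module name efficiency_count and the xfer input are hypothetical. In your own design, xfer would be whatever condition marks a cycle in which data actually moves on the local interface.

```verilog
// Hypothetical efficiency counters: efficiency = active_cycles / total_cycles.
module efficiency_count (
    input  wire        clk,
    input  wire        reset_n,
    input  wire        xfer,            // high in every cycle that moves data
    output reg  [31:0] active_cycles,   // numerator of the efficiency ratio
    output reg  [31:0] total_cycles     // denominator of the efficiency ratio
);
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            active_cycles <= 32'd0;
            total_cycles  <= 32'd0;
        end else begin
            total_cycles <= total_cycles + 32'd1;      // count every cycle
            if (xfer)
                active_cycles <= active_cycles + 32'd1; // count data cycles
        end
    end
endmodule
```

As a worked example, 250 active cycles out of 1000 total cycles corresponds to an efficiency of 25%.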
The Protocol Checker Summary Statistics report indicates if there were any violations of the Avalon bus protocol between the user logic (the traffic generator in this example design) and the interface.
____ 12. When you are done looking through the reports, close the EMIF Toolkit and quit the Quartus II software.
Exercise Summary
• Performed a timing analysis on the interface
• Verified the functional operation of the memory interface using the board LEDs and the SignalTap II logic analyzer
• Used the EMIF Toolkit to generate reports relating to calibration and margin on the interface, as well as using it to evaluate controller efficiency
END OF EXERCISE 4