Bern University of Applied Sciences BTE5380 - Embedded Systems
Lecture 3
Xilinx 7 Technology
Xilinx Series 7 Zynq Technology
Print version of the lecture in BTE5380 - Embedded Systems
Presented in October 2014
Document Revision
by Andreas Habegger <[email protected]> from HuCE-microLab at Bern University of Applied Sciences
1 Introduction
Introduction
All Xilinx FPGAs contain the same basic resources:
- logic resources
  - slices, grouped into configurable logic blocks (CLBs), containing combinatorial logic and register resources
  - memory
  - multipliers
- interconnect resources
  - programmable interconnect
  - IOBs, the interface between the FPGA and the outside world
- other resources
  - global clock buffers
  - boundary scan logic

The slides are based on Xilinx tutorials.
7-Series FPGA Families
HuCE-microLab Last change: 31. 12. 2014 Page 1 of 15
2 Logic Resources
Configurable Logic Block (CLB) in 7-Series FPGA
- primary resource for design
  - combinatorial functions
  - flip-flops
- a CLB contains two slices
  - SLICEM: LUT can be used for logic and memory
  - SLICEL: LUT can only be used for logic
- connected to a switch matrix for routing to other FPGA resources
- the carry chain runs vertically in a column from one slice to the one above
Slice Resource
- 4 six-input look-up tables (LUTs)
- multiplexers
- carry chains
- SRL
- 4 flip-flops
- 4 latches
6-Input LUT with Dual Output
- a LUT can be used as two 5-input LUTs with common inputs
  - minimal speed impact compared to a 6-input LUT
  - one or two outputs
- any function of six variables, or two functions of five variables
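The dual-output behavior can be sketched in software. The following minimal Python model assumes the LUT is a 64-bit truth table in which output O6 is indexed by all six inputs and O5 by the lower five. The names `lut6_2`, `pack_two_functions`, and `INIT` are illustrative and not part of any Xilinx tool.

```python
def lut6_2(init, inputs):
    """Model a dual-output 6-input LUT.

    init   -- 64-bit truth table (assumed layout: bit i is the output for input value i)
    inputs -- integer 0..63 formed from the six LUT inputs (bit 5 is the sixth input)
    Returns (o6, o5).
    """
    o6 = (init >> (inputs & 0x3F)) & 1   # full 64-entry lookup
    o5 = (init >> (inputs & 0x1F)) & 1   # lower 32 entries only
    return o6, o5


def pack_two_functions(f_upper, f_lower):
    """Build a truth table holding two 5-input functions that share inputs."""
    init = 0
    for x in range(32):
        init |= (f_lower(x) & 1) << x          # read back on O5
        init |= (f_upper(x) & 1) << (32 + x)   # read back on O6 when the sixth input is 1
    return init


# Example: a full adder (sum and carry) in one LUT; a, b, cin are input bits 0..2.
def full_sum(x):
    return (x ^ (x >> 1) ^ (x >> 2)) & 1


def full_carry(x):
    a, b, c = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (a & b) | (a & c) | (b & c)


INIT = pack_two_functions(full_sum, full_carry)
```

With the sixth input tied high, `lut6_2(INIT, 0b100000 | x)` returns the sum on O6 and the carry on O5 for the same three input bits.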
Wide Multiplexers
- each F7MUX combines the outputs of two LUTs
  - can implement any arbitrary 7-input function
  - can implement an 8-1 multiplexer
- each F8MUX combines the outputs of two F7MUXes
  - can implement any arbitrary 8-input function
  - can implement a 16-1 multiplexer
- the MUX is controlled by the AX, BX, CX slice inputs
- the MUX output can drive out combinatorially or to the flip-flop/latch
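As a rough illustration of how an 8-1 multiplexer fits into two LUTs plus one F7MUX, the sketch below models each LUT as a 4-1 multiplexer (four data inputs and two select inputs use all six LUT inputs) and lets the F7MUX pick between them with the third select bit. The function names are hypothetical.

```python
def lut_mux4(data, sel):
    """4:1 multiplexer that fits one 6-input LUT (4 data + 2 select inputs)."""
    return (data >> (sel & 0b11)) & 1


def f7mux(lut_a, lut_b, s):
    """2:1 multiplexer combining two LUT outputs; s models the slice select input."""
    return lut_b if s else lut_a


def mux8(data, sel):
    """8:1 multiplexer built from two 4:1 LUT muxes and one F7MUX."""
    low = lut_mux4(data & 0xF, sel)           # data bits 0..3
    high = lut_mux4((data >> 4) & 0xF, sel)   # data bits 4..7
    return f7mux(low, high, (sel >> 2) & 1)
```

A 16-1 multiplexer follows the same pattern: two such 8-1 structures feed one F8MUX, which is controlled by a fourth select bit.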
Carry Chain
- the carry chain can implement fast arithmetic addition and subtraction
  - the carry out is propagated vertically through the four LUTs in a slice
  - the carry chain propagates from one slice to the slice in the same column in the CLB above
- carry look-ahead
  - combinatorial carry look-ahead over the four LUTs in a slice
  - implements faster carry cascading from slice to slice
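The logical behavior of such a carry chain can be sketched as follows. This model assumes each bit position has a carry multiplexer and an XOR gate, with the LUT supplying the propagate signal (a XOR b). It reproduces the arithmetic result only, not the look-ahead timing, and the names `carry4` and `add` are illustrative.

```python
def carry4(s, di, cin):
    """Model one 4-bit carry-chain stage.

    s   -- list of 4 propagate bits (from the LUTs, here a XOR b)
    di  -- list of 4 data bits used when the carry is not propagated (here a)
    cin -- carry input from the slice below
    Returns (list of 4 sum bits, carry out).
    """
    out = []
    carry = cin
    for i in range(4):
        out.append(s[i] ^ carry)            # sum bit: propagate XOR carry-in
        carry = carry if s[i] else di[i]    # carry mux: pass carry or take data bit
    return out, carry


def add(a, b, width=16):
    """Add two unsigned integers by cascading 4-bit carry stages upward."""
    assert width % 4 == 0
    result, carry = 0, 0
    for base in range(0, width, 4):
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        s = [x ^ y for x, y in zip(a_bits, b_bits)]
        bits, carry = carry4(s, a_bits, carry)
        for i, bit in enumerate(bits):
            result |= bit << (base + i)
    return result, carry
```

Each call to `carry4` corresponds to one slice, and the returned carry feeds the next stage, mirroring the upward cascade in a column.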
Slice Flip-Flops and Flip-Flop/Latches
- each slice has 4 flip-flop/latches (FF/L)
  - can be configured as either FFs or latches
  - the D input can come from the O6 LUT output, the carry chain, the wide mux, or the AX, BX, CX, DX slice input
- each slice also has 4 flip-flops (FF)
  - the D input can come from the O5 LUT output, the carry chain, the wide mux, or the AX, BX, CX, DX slice input
- if any of the FF/Ls are configured as latches, the 4 FFs are not available
Slice Flip-Flop Capabilities
- all flip-flops are D type
- all flip-flops have a single clock input (CK)
  - clocks can be inverted at the slice boundary
- all flip-flops have an active-high clock enable (CE)
- all flip-flops have an active-high SR input
  - the input can be synchronous or asynchronous, as determined by the corresponding configuration bit
  - sets the flip-flop value to a predetermined state, as determined by the corresponding configuration bit
Control Sets
- all flip-flops/latches in a slice share the same CK, SR, and CE signals
  - this is referred to as the "control set" of the flip-flops
  - CE and SR are active high
- if any one flip-flop uses the CE, all others must use the same CE
  - CE gates the clock at the slice boundary
  - saves power
- if any one flip-flop uses the SR, all others must use the same SR
  - the reset value used for each flip-flop is individually set by the SRVAL attribute
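A small behavioral sketch may help to picture a control set: the flip-flops below share one CE and one SR, while each keeps its own SRVAL. The class name is hypothetical, SR is modeled as synchronous with priority over CE, and the power-up state is assumed to equal SRVAL for simplicity.

```python
class SliceRegisters:
    """Four D flip-flops sharing one control set (clock, CE, SR)."""

    def __init__(self, srval):
        self.srval = list(srval)   # per-flip-flop set/reset value (SRVAL)
        self.q = list(srval)       # assumption: power-up state equals SRVAL

    def clock(self, d, ce=1, sr=0):
        """Apply one rising clock edge with a synchronous SR."""
        if sr:                     # shared SR drives every flip-flop to its own SRVAL
            self.q = list(self.srval)
        elif ce:                   # shared CE: all flip-flops load together
            self.q = list(d)
        return list(self.q)        # with CE low, every flip-flop holds its value
```

Because CE and SR are single arguments for the whole group, it is not possible in this model to enable or reset one flip-flop without the others, which is the restriction a control set imposes.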
SLICEM Used as 32-bit Shift Register
- versatile SRL-type shift registers
  - variable-length shift register
  - synchronous FIFOs
  - content-addressable memory (CAM)
  - pattern generator
  - compensation for delay/latency
- shift register length is determined by the address
  - constant value giving a fixed delay line
  - dynamic addressing for an elastic buffer
- cascadable up to 128x1 shift registers in one slice
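The addressable-tap idea can be sketched with a short behavioral model. It assumes a 32-stage register whose output is selected combinationally by a 5-bit address, so that tap address N yields a delay of N+1 clock cycles. The class name `SRL32` is illustrative only.

```python
class SRL32:
    """32-stage shift register with an addressable output tap."""

    def __init__(self):
        self.stages = [0] * 32

    def clock(self, d, ce=1):
        """Shift one bit in on a clock edge when enabled."""
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def read(self, addr):
        """Return the bit at tap addr (0..31); the read is combinational."""
        return self.stages[addr & 0x1F]
```

Holding `addr` constant gives a fixed delay line, while changing it at run time gives the elastic-buffer behavior mentioned above.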
Shift Register LUT Example
- operation D-NOP must add 17 pipeline stages of 64 bits each
  - 1088 flip-flops (hence 136 slices), or
  - 64 SRLs (hence 16 slices)
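The numbers in this example can be checked with a few lines of arithmetic. The sketch assumes 8 storage elements and 4 LUTs per slice, and one 32-bit SRL per LUT; the constant names are chosen for illustration.

```python
STAGES = 17           # pipeline stages to add (from the example above)
WIDTH = 64            # bits per stage
FFS_PER_SLICE = 8     # assumption: 4 flip-flops + 4 flip-flop/latches per slice
LUTS_PER_SLICE = 4    # each SLICEM LUT can act as one 32-bit SRL
SRL_DEPTH = 32

flip_flops = STAGES * WIDTH                  # one flip-flop per bit per stage
ff_slices = flip_flops // FFS_PER_SLICE

srls_per_bit = -(-STAGES // SRL_DEPTH)       # ceiling division: 17 stages fit in one SRL
srls = WIDTH * srls_per_bit
srl_slices = srls // LUTS_PER_SLICE

print(flip_flops, ff_slices, srls, srl_slices)   # 1088 136 64 16
```

Under these assumptions the SRL implementation needs 16 slices instead of 136, a reduction by a factor of 8.5.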
SLICEM Used as Distributed SelectRAM Memory
- uses the same storage that is used for the look-up table function
- synchronous write, asynchronous read
  - can be converted to synchronous read using the flip-flops available in the slice
- various configurations
  - single port: one LUT6 = 64x1 or 32x2 RAM
  - dual port (D): 1 read/write port + 1 read-only port
  - simple dual port (SDP): 1 write-only port + 1 read-only port
  - quad port (Q): 1 read/write port + 3 read-only ports
3 I/O Resources
7-Series FPGA I/O
- wide range of voltages
  - 1.2 V to 3.3 V operation
- many different I/O standards
  - single-ended and differential
  - referenced inputs
  - 3-state support
- very high performance
  - up to 1600 Mb/s LVDS
  - up to 1866 Mb/s single-ended for DDR3
- easy interfacing to standard memories
  - hardware support for QDRII+ and DDR3
- digitally controlled impedance
- low power: features to reduce power
I/O Block Diagram
I/O Electrical Resources
- P and N pins can be configured as
  - individual single-ended signals, or
  - differential pairs
- the receiver can be standard CMOS or a voltage comparator
  - when standard CMOS:
    - logic 0 when "near" ground
    - logic 1 when "near" VCCO
  - referenced to VREF:
    - logic 0 when "below" VREF
    - logic 1 when "above" VREF
  - differential:
    - logic 0 when VP < VN
    - logic 1 when VP > VN
I/O Logical Resources
- two blocks of logic per I/O pair
  - master and slave
  - can operate independently or concatenated
- each block contains
  - ILOGIC/ISERDES: SDR, DDR, or high-speed serial input logic
  - OLOGIC/OSERDES: SDR, DDR, or high-speed serial output logic
  - IDELAY: selectable fine-grained input delay
  - ODELAY: selectable fine-grained output delay, only available on high-performance I/O
Serial/Parallel Converters
- input serial-to-parallel and output parallel-to-serial converters
- clocks in data from the input pad or IDELAY
  - D is clocked on the high-speed clock (CLK)
  - can be SDR or DDR
- sends de-serialized data to the fabric
  - Q is clocked on the low-speed clock (CLKDIV)
  - CLK and CLKDIV must be in phase
- de-serialized data widths
  - single data rate: 2, 3, ..., 8
  - double data rate: 4, 6, 8
- cascade with slave for wider ratios
  - double data rate: 10, 14
- has BITSLIP logic for framing parallel data
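The role of BITSLIP in framing can be illustrated with a simplified software model. It assumes that each slip moves the word boundary by exactly one bit and that the first received bit is the most significant bit of a word; the real hardware slip sequence depends on the mode and is not reproduced here. The function names are hypothetical.

```python
def deserialize(bits, width, slip=0):
    """Group a serial bit stream into parallel words of the given width.

    slip models the number of boundary adjustments: each one moves the word
    boundary by one bit position (a simplification for this sketch).
    """
    words = []
    for start in range(slip, len(bits) - width + 1, width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit    # first received bit ends up as MSB
        words.append(word)
    return words


def find_alignment(bits, width, training_word):
    """Try each boundary until every word equals the training word."""
    for slip in range(width):
        words = deserialize(bits, width, slip)
        if words and all(w == training_word for w in words):
            return slip
    return None
```

A typical use is to send a known training word repeatedly, slip until `find_alignment` succeeds, and then keep that boundary for payload data.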
4 Memory Resources
7-Series Block RAM and FIFO
- all members of the 7-series families have the same block RAM/FIFO
- fully synchronous operation
  - all operations are synchronous; all outputs are latched
- optional internal pipeline register for higher-frequency operation
- two independent ports access common data
  - individual address, clock, write enable, clock enable
  - independent data widths for each port
- multiple configuration options
- integrated 64/72-bit Hamming error correction
- integrated control for fast and efficient FIFOs
Single-Port Block RAM
- single read/write port
  - clock: CLKA, address: ADDRA, write enable: WEA
  - write data: DIA, read data: DOA
- 36-kbit configurations
  - 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36
- 18-kbit configurations
  - 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
- configurable write mode
  - WRITE_FIRST: data written on DIA is available on DOA
  - READ_FIRST: the old contents of the RAM at ADDRA are presented on DOA
  - NO_CHANGE: DOA holds its previous value (saves power)
- optional output register for maximum performance
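The three write modes differ only in what appears on DOA during a write. A minimal behavioral sketch, using the port names from the list above as argument names and a hypothetical class name, could look like this:

```python
class SinglePortRam:
    """Synchronous single-port RAM with selectable write mode."""

    def __init__(self, depth, write_mode="WRITE_FIRST"):
        if write_mode not in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE"):
            raise ValueError(write_mode)
        self.mem = [0] * depth
        self.write_mode = write_mode
        self.doa = 0   # output latch

    def clock(self, addra, dia=0, wea=0):
        """Apply one rising edge of CLKA and return DOA."""
        if wea:
            old = self.mem[addra]
            self.mem[addra] = dia
            if self.write_mode == "WRITE_FIRST":
                self.doa = dia      # new data appears on the output
            elif self.write_mode == "READ_FIRST":
                self.doa = old      # previous contents appear on the output
            # NO_CHANGE: the output latch keeps its previous value
        else:
            self.doa = self.mem[addra]   # plain read
        return self.doa
```

The model ignores the optional output register, which would add one further clock cycle of latency to DOA.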
Dual-Port Block RAM
- two separate read/write ports
  - ports have separate clock, address, data in, data out, write enable, ...
  - clocks can be asynchronous to each other
  - the two ports can have different widths and modes
- no contention avoidance when both ports access the same address, except
  - if clocked by the same clock, and the write port is READ_FIRST
- simple dual-port block RAM
  - each port has separate clock and address
- block RAM cascading
  - 128Kb, 256Kb, 512Kb, 1Mb, ...
FIFO
- full featured
  - synchronous or asynchronous read and write clocks
  - flags: full, empty, programmable almost-full/almost-empty
- FIFO configurations
  - any 36Kb block RAM: 8Kx4, 4Kx9, ..., 512x72
  - any 18Kb block RAM: 4Kx4, 2Kx9, ..., 512x36
  - write and read widths must be equal
- can use the integrated error correction when used in the x72 width
5 DSP Resources
7-Series DSP48 Slice
Using DSP48 for Non-DSP Functions
6 ADC Resources
7-Series XADC
- XADC is a high-quality and flexible analog interface
  - dual 12-bit ADCs, on-chip sensors, 17 flexible analog inputs, and track & holds with programmable signal conditioning
  - 1 V input range
  - 16-bit resolution
  - built-in digital gain and offset calibration
- analog mixed signal (AMS)
  - using the FPGA programmable logic to customize the XADC and replace other external analog functions; for example, linearization, calibration, filtering, and DC balancing to improve data conversion resolution
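To get a feeling for the numbers, the sketch below treats one ADC as an ideal 12-bit unipolar quantizer over the 1 V input range, which gives a step size of 1 V / 4096, about 244 µV. The function names and the simple averaging filter are illustrative assumptions, not the XADC's actual transfer function or register interface.

```python
ADC_BITS = 12
V_RANGE = 1.0   # volts, unipolar input range


def lsb_volts():
    """Size of one code step for an ideal converter."""
    return V_RANGE / (1 << ADC_BITS)


def to_code(vin):
    """Ideal unipolar quantizer: truncate, then clamp to the valid code range."""
    max_code = (1 << ADC_BITS) - 1
    code = int(vin / lsb_volts())
    return max(0, min(max_code, code))


def average(codes):
    """Average several samples, one way filtering in logic can reduce noise."""
    return sum(codes) / len(codes)
```

Averaging in the programmable logic is one example of the filtering mentioned above: the mean of many noisy samples can resolve steps finer than a single 12-bit code.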
XADC Block Diagram
- fast sampling
  - conversion time of 1 µs with support for simultaneous sampling
  - self and external triggering timing modes
  - separate track/hold amplifier for each ADC
- flexible analog inputs
  - differential analog inputs with high CMRR
  - support for unipolar, bipolar, and true differential input signal types
7 Clocking Resources
High-Performance Clocking
- modern applications have complex clocking requirements
  - extremely high-performance clock signals
  - support for multiple frequency domains across a wide frequency range
  - de-skewing of clocks relative to one another
  - low jitter and precise duty cycle to maintain the widest possible data valid window
  - lowest possible system power
- FPGAs have a rich mixture of clocking resources to accommodate these requirements
Clock Management
- systems usually require multiple clock frequencies from the same source
  - minimizing the number of oscillators lowers system cost
- external clock sources can often be noisy
  - filtering jitter cleans up clocks, widening the data valid window
- many circuits need to be clocked at the same time to ensure correct operation
  - de-skewing and aligning clocks eliminates hold-time issues and race conditions
Clock Networks
- the FPGA has clock management tiles (CMTs)
  - one mixed-mode clock manager (MMCM) and one PLL in each CMT
  - performs frequency synthesis, clock de-skewing, and jitter filtering
- the FPGA has four clock-capable inputs in each bank
  - these inputs are regular I/O pins with dedicated connections to internal clock resources
  - clock inputs can be used single-ended or differential
- the FPGA is a regular array of resources
  - many of these resources require clocks
  - clocks must be distributed to these resources
- for synchronous operation, clocks must arrive at the clocked elements with
  - extremely low clock skew: ensures minimal hold time
  - low clock jitter: allows highest performance
  - duty cycle preservation: important for double data rate
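Frequency synthesis in a clock manager can be pictured as choosing a multiplier M, an input divider D, and an output divider O. The sketch below assumes the relation f_vco = f_in * M / D and f_out = f_vco / O and searches integer settings for a target frequency. The VCO limits and parameter ranges used as defaults are placeholders for illustration, not data-sheet values, and the function name is hypothetical.

```python
def synthesize(f_in, f_target, vco_min=600.0, vco_max=1200.0,
               m_range=range(2, 65), d_range=range(1, 11),
               o_range=range(1, 129)):
    """Search integer multiply/divide settings for a clock synthesizer.

    Assumed relation: f_vco = f_in * M / D and f_out = f_vco / O (all in MHz).
    The default limits and ranges are placeholders, not data-sheet values.
    Returns (M, D, O, f_out) with the smallest frequency error, or None.
    """
    best = None
    for d in d_range:
        for m in m_range:
            f_vco = f_in * m / d
            if not vco_min <= f_vco <= vco_max:
                continue                     # skip settings outside the VCO range
            for o in o_range:
                f_out = f_vco / o
                err = abs(f_out - f_target)
                if best is None or err < best[0]:
                    best = (err, m, d, o, f_out)
    return None if best is None else best[1:]
```

For example, `synthesize(100.0, 150.0)` looks for a way to derive 150 MHz from a 100 MHz oscillator; several outputs with different O values can share one VCO, which is how multiple frequencies come from a single source.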
Die View
- global clock buffers and routing column in the chip center
- MMCMs and PLLs are in CMT columns adjacent to I/O columns
- horizontal spines of the global clock network run through the center of each clock region, driven by BUFHs
- regional clock routing driven by BUFRs
- I/O clock networks driven by BUFIOs
8 Conclusion
Conclusion
- the FPGA slices contain LUTs, multiplexers, and carry logic for combinatorial logic, as well as registers
- the FPGA IOBs contain DDR registers as well as serial/parallel resources
- multiple I/O standards are supported
- the FPGA contains dedicated block RAM and DSP resources
- the FPGA contains dedicated MMCMs, PLLs, and clock routing resources
- the Zynq-7000 processing platform is a system-on-chip (SoC) processor with embedded programmable logic fabric of either Artix or Kintex 7-series FPGA