51
SEU effects in FPGA How to deal with them? Csaba Soos PH-ESE-BE

SEU effects in FPGA How to deal with them?

  • Upload
    ulani

  • View
    57

  • Download
    0

Embed Size (px)

DESCRIPTION

Csaba Soos PH-ESE-BE. SEU effects in FPGA How to deal with them?. Outline. Introduction Radiation environment (LHC), definitions SEE in FPGA devices Impact on device resources SEU testing Mitigation techniques SM encoding, memory protection, reconfiguration, TRM etc. Commercial FPGAs - PowerPoint PPT Presentation

Citation preview

Page 1: SEU effects in FPGA How to deal with them?

SEU effects in FPGAHow to deal with them?

Csaba SoosPH-ESE-BE

Page 2: SEU effects in FPGA How to deal with them?

Outline Introduction

Radiation environment (LHC), definitions SEE in FPGA devices

Impact on device resources SEU testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

2/6/2009 R2E Radiation School: SEU effects in FPGA 2

Page 3: SEU effects in FPGA How to deal with them?

Radiation environment

Beam – beam interactions (near IPs) Beam – residual gas interactions Beam losses

2/6/2009 R2E Radiation School: SEU effects in FPGA 3

TID

SEE

Page 4: SEU effects in FPGA How to deal with them?

Radiation environment

2/6/2009 R2E Radiation School: SEU effects in FPGA 4

Comparison between Space environment and the CMS at the LHCSource: F. Guistino’s PhD thesis

Page 5: SEU effects in FPGA How to deal with them?

Single Event Effects (SEE)

2/6/2009 R2E Radiation School: SEU effects in FPGA 5

Heavy ion striking a transistor and creating charge along its path

Page 6: SEU effects in FPGA How to deal with them?

Single Event Effects (SEE) Single Event Upset (SEU)

State change, due to the charges collected by the circuit sensitive node, if higher than the critical charge (Qct)

For each device there is a critical LET Single Event Functional Interrupt (SEFI)

Special SEU, which affects one specific part of the device and causes the malfunctioning of the whole device

Single Event Latch-up (SEL) Parasitic PNPN structure (thyristor) gets triggered, and

creates short between power lines Single Event Gate Rupture (SEGR)

Destruction of the gate oxide in the presence of a high electric field during radiation (e.g. during EEPROM write)

2/6/2009 R2E Radiation School: SEU effects in FPGA 6

Page 7: SEU effects in FPGA How to deal with them?

Definitions and Units

Flux: rate at which particles impinge upon a unit surface area, given in particles/cm2/s

Fluence: total number of particles that impinge upon a unit surface area for a given time interval, given in particles/cm2

Total dose, or radiation absorbed dose (rad): amount of energy deposited in the material (1 Gy = 100 rad)

2/6/2009 R2E Radiation School: SEU effects in FPGA 7

Page 8: SEU effects in FPGA How to deal with them?

Definitions and Units

Linear Energy Transfer (LET): the mass stopping power of the particle, given in MeV/mg/cm2

Cross-section (σ): the probability that the particle flips a single bit, given in cm2/bit, or cm2/device

Failure in time rate (in 1 billion hours):FIT/Mbit = Cross-section*Particle flux*106*109

Mean Time Between Functional Failure:MTBFF = SEUPI*[1/(Bits*Cross-section*Particle

flux)]2/6/2009 R2E Radiation School: SEU effects in FPGA 8

Page 9: SEU effects in FPGA How to deal with them?

Failure rate calculation

Example:FIT/Mb = 100Configuration size = 20 MbFIT = FIT/Mb * Size = 2000,i.e. 2000 errors are expected in 1 billion hours

(Note: fluence above is 14 n/hour)Expected fluence: 3 x 1010 n/10 years

# of errors in 10 years = 2000 x (3 x 1010/ 14 x 109) = 4286

Taking into account the SEUPI factor:# of errors in 10 years = 4286 / 10 = 428

2/6/2009 R2E Radiation School: SEU effects in FPGA 9

Page 10: SEU effects in FPGA How to deal with them?

Failure rate calculation

ALICE Detector Data Link:Fluence (10 years): F = 3.9 x 1011 n/cm2

Cross-section: σ = 8.2 x 10-13 cm2/LC (i.e. per logic cell)

# of configuration errors per LC: F x σ = 0.32 error/LC

# of LCs in the design : 2500# of configuration errors per device: 2500 x 0.32

= 800

In other words, ~1 error per hour in one of the 400 link cards2/6/2009 R2E Radiation School: SEU effects in FPGA 10

Page 11: SEU effects in FPGA How to deal with them?

2/6/2009 R2E Radiation School: SEU effects in FPGA 11

Introduction Radiation environment (LHC), definitions

SEE in FPGA devices Impact on device resources

SEU Testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

Page 12: SEU effects in FPGA How to deal with them?

Sample FPGA architecture

2/6/2009 R2E Radiation School: SEU effects in FPGA 12

Page 13: SEU effects in FPGA How to deal with them?

FPGA logic cell and routing

2/6/2009 R2E Radiation School: SEU effects in FPGA 13

LUTDFF

M

M

M

M M M M

Page 14: SEU effects in FPGA How to deal with them?

Sensitive FPGA resources Configuration memory

It defines the logic functions (LUT) and the routing

Large devices contain several megabits of configuration memory

Large fraction of this memory is not used by a design (SEU Probability Impact, SEUPI)

User logic User RAM, flip-flops

Additional FPGA resources (JTAG, POR etc.) Single-event Functional Interrupt (SEFI)

2/6/2009 R2E Radiation School: SEU effects in FPGA 14

Page 15: SEU effects in FPGA How to deal with them?

Configuration memory vs. SRAM Configuration memory is more robust

Size constraints are not the same; SRAM cells must be smaller, hence more sensitive

Configuration memory is based on a static latch Configuration memory has higher critical charge

Configuration memory does not have to be fast Manufactures can improve the design (e.g. by maximizing

the capacitive load) However, there are much more configuration memory

cells in the device; the chance of an upset is higher Embedded RAMs follow the standard manufacturing

trends, but they can be protected by ECC (or other techniques)

2/6/2009 R2E Radiation School: SEU effects in FPGA 15

Page 16: SEU effects in FPGA How to deal with them?

SEU in configuration memory May change the programmed

combinatorial logic by rewriting the LUT e.g. A & B => A & !B

May create internal open, or short circuit (will not damage the device) e.g. Q = GND or ‘floating’

May have no impact on the device operation (don’t care configuration cell) 10 is a good (pessimistic) derating factor (can

be 100 !)

2/6/2009 R2E Radiation School: SEU effects in FPGA 16

Page 17: SEU effects in FPGA How to deal with them?

SEU in user logic Flip-flop (dynamic)

User RAM (static)

2/6/2009 R2E Radiation School: SEU effects in FPGA 17

DFF‘0’

clk

Q

1 1 0 1 1 1 1 01 0 1 0 1 1 0 11 0 1 1 1 1 1 01 1 1 0 1 1 1 1

1 1 0 1 1 1 1 01 0 1 0 1 1 0 11 0 1 1 1 0 1 01 1 1 0 1 1 1 1

‘0’

Page 18: SEU effects in FPGA How to deal with them?

2/6/2009 R2E Radiation School: SEU effects in FPGA 18

Introduction Radiation environment (LHC), definitions

SEE in FPGA devices Impact on device resources

SEU Testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

Page 19: SEU effects in FPGA How to deal with them?

Rosetta experiment

Real-time experiment with atmospheric neutrons Link between accelerated testing (proton or

neutron) and the real effects of atmospheric neutrons

Experimental sites at different locations and at different altitudes Sets of 100 devices are monitored constantly Altitudes from -488 m to 4023m

Verification carried out using simulation and by tests done at the Los Alamos Neutron Science Center17/12/2008 PH-ESE Seminar: FGPA's in Radiation Zones 19

Page 20: SEU effects in FPGA How to deal with them?

Rosetta experiment

17/12/2008 PH-ESE Seminar: FGPA's in Radiation Zones 20

Family, process

Neutron @ 10 MeV Rosetta (atmospheric)

CRAM (cm2) BRAM (cm2) CRAM (FIT/Mb)

BRAM (FIT/Mb)

V2, 150 nm 2.50E-14 2.64E-14 401 397V2P, 130 nm 2.74E-14 3.91E-14 384 614S3, 90 nm 2.40E-14 3.48E-14 199 390V4, 90 nm 1.55E-14 2.74E-14 246 352S3E/A. 90 nm 1.31E-14 2.63E-14 108 306V5, 65 nm 6.67E-15 3.96E-14 151 635

Note: configuration FIT/Mb does not include SEUPI=10 derating factor. Reference fluxat NYC =14 n/hour. Reminder: FIT = number of errors in 1 billion hours.Source: Xilinx

Page 21: SEU effects in FPGA How to deal with them?

Accelerated testing

High-energy proton or neutron beam proton: package shadowing and TID

dependence Heavy-ion irradiation Static or dynamic testing

Configuration or application memory read back Large shift-registers

See for example: ATLAS policy Or consult the JEDEC JESD89 standards

JESD89A, JESD89-1A, JESD89-3A

2/6/2009 R2E Radiation School: SEU effects in FPGA 21

Page 22: SEU effects in FPGA How to deal with them?

2/6/2009 R2E Radiation School: SEU effects in FPGA 22

Introduction Radiation environment (LHC), definitions

SEE in FPGA devices Impact on device resources

SEU Testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

Page 23: SEU effects in FPGA How to deal with them?

Configuration management

2/6/2009 R2E Radiation School: SEU effects in FPGA 23

time

SEUReconfiguration

Read

time

SEU

Regular reconfiguration

Page 24: SEU effects in FPGA How to deal with them?

Reconfiguration: Altera

Built-in CRC detection reports about flips in the configuration memory

Location information can help to filter out the ‘don’t care’ changes and to act upon critical errors only

2/6/2009 R2E Radiation School: SEU effects in FPGA 24

Page 25: SEU effects in FPGA How to deal with them?

Reconfiguration: Xilinx

Partial reconfiguration (scrubbing) The system remains fully operational Some parts of the device cannot be

refreshed Half-latch Full configuration can refresh everything

Combine with TMR to reduce the error rate

2/6/2009 R2E Radiation School: SEU effects in FPGA 25

time

Regular reconfiguration

Module #1

Module #2

Module #3

Page 26: SEU effects in FPGA How to deal with them?

Triple-module redundancy It works, if the SEU stays in one of the

triplicated modules, or on the data path It fails, if the errors accumulate, and two

out of the three modules fail, or the SEU is in the voter

2/6/2009 R2E Radiation School: SEU effects in FPGA 26

DFF

DFF

DFF

CombLogic

CombLogic

CombLogic

Majority

Voter

A

B

Clk

Out

Page 27: SEU effects in FPGA How to deal with them?

Functional TMR (FTMR)

VHDL approach for automatic TMR insertion

Configurable redundancy in combinatorial and sequential logic

Resource increase factor: 4.5 – 7.5 Performance decrease Ref.: Sandi Habinc

http://microelectronics.esa.int/techno/fpga_003_01-0-2.pdf

2/6/2009 R2E Radiation School: SEU effects in FPGA 27

Page 28: SEU effects in FPGA How to deal with them?

Improved TMR by Xilinx

2/6/2009 R2E Radiation School: SEU effects in FPGA 28

DFF

DFF

DFF

CombLogic

CombLogic

CombLogic

Majority

Voter

A

B

Clk

Majority

Voter

Majority

Voter

Supported by the XTMR Tool from Xilinx

Minority

Voter

Minority

Voter

Minority

Voter

PCB trace

Page 29: SEU effects in FPGA How to deal with them?

Multiple-Bit Upsets

2/6/2009 R2E Radiation School: SEU effects in FPGA 29

Ref.: H. Quinn et al, “Domain Crossing Errors: Limitations on SingleDevice Triple-Modular Redundancy Circuits in Xilinx FPGAs”

Page 30: SEU effects in FPGA How to deal with them?

State-machines

Used to control sequential logic SEU may alter/halt the execution Encoding can be changed to improve SEU

immunity (be careful with optimization)

2/6/2009 R2E Radiation School: SEU effects in FPGA 30

SM type Speed Resources Protection

Binary Fast Smallest NoneOne-hot Slow Large PoorHamming 2 Good Moderate FairHamming 3 Slowest Largest Good

Ref.: G. Burke and S. Taft, “Fault Tolerant State Machines”, JPL

Page 31: SEU effects in FPGA How to deal with them?

Very sensitive resource Optimized for speed/area -> Low Qct

Errors can easily accumulate Mitigation

Parity, ECC, EDAC, TRM, scrubbing

User memory

2/6/2009 R2E Radiation School: SEU effects in FPGA 31

ECCencode RAM ECC

decode

RAM

RAM

RAM

Vote

Vote

Vote

Scrub control

D

D

D

WE

WE

WE

A

A

A

Q

Q

Q

Page 32: SEU effects in FPGA How to deal with them?

2/6/2009 R2E Radiation School: SEU effects in FPGA 32

Introduction Radiation environment (LHC), definitions

SEE in FPGA devices Impact on device resources

SEU Testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

Page 33: SEU effects in FPGA How to deal with them?

Altera HardCopy devices

SRAM-based FPGA is used as prototype Using a HardCopy-compatible FPGA ensures

that the ASIC always works Design is seamlessly converted to ASIC

No extra tool/effort/time needed Increased SEU immunity and lower power

Expensive and not reprogrammable

We loose the biggest advantage of the FPGA

2/6/2009 R2E Radiation School: SEU effects in FPGA 33

Page 34: SEU effects in FPGA How to deal with them?

Xilinx Aerospace Products

Virtex-4 QPro V-grade Total-dose tolerance at least 250 krad SEL Immunity up to LET > 100 MeV/mg-cm2

Characterization report (SEU, SEL, SEFI): http://parts.jpl.nasa.gov/docs/NEPP07/NEPP07FPGAv4Static.pdf

Expensive , but reprogrammable

2/6/2009 R2E Radiation School: SEU effects in FPGA 34

Page 35: SEU effects in FPGA How to deal with them?

Xilinx’s SIRF products

SIRF – Single-Event Immune Reconfigurable FPGA

Radiation hardened by design (RHBD) Design goals:

Total-dose > 300 krad SEL immune > 100 MeV/mg-cm2

SEU rate < 1E-10 errors/bit-day SEFI rate < 1E-10 errors/bit-day

It will be certainly expensive

2/6/2009 R2E Radiation School: SEU effects in FPGA 35

Page 36: SEU effects in FPGA How to deal with them?

Actel ProASIC3 FPGA

2/6/2009 R2E Radiation School: SEU effects in FPGA 36

Flash-memory based configuration 0.13 micron process SEL free1

SEU immune configuration1

Heavy Ion cross-sections (saturation) 2E-7 cm2/flip-flop 4E-8 cm2/SRAM bit

Total-dose Up 15 krad (some issues above)

Not expensive and reprogrammable Note 1: Tested at LET = 96 MeV/mg-cm2

Page 37: SEU effects in FPGA How to deal with them?

Actel Antifuse FPGA

Non-volatile antifuse technology (OTP) 0.15 micron process SEU immune configuration SEU hardened (TMR) flip-flop Heavy Ion cross-section (saturation)

9E-10 cm2/flip-flop 3.5E-8 cm2/SRAM bit (w/o EDAC)

Total-dose Up to 300 krad

Expensive and not reprogrammable 2/6/2009 R2E Radiation School: SEU effects in FPGA 37

Page 38: SEU effects in FPGA How to deal with them?

2/6/2009 R2E Radiation School: SEU effects in FPGA 38

Introduction Radiation environment (LHC), definitions

SEE in FPGA devices Impact on device resources

SEU Testing Mitigation techniques

SM encoding, memory protection, reconfiguration, TRM etc.

Commercial FPGAs SRAM-based FPGAs, flash-based FPGAs,

antifuse FPGAs Applications

Page 39: SEU effects in FPGA How to deal with them?

ALICE TPC Readout Control Unit Measured cross-section (Xilinx FPGA): 2.8E-9

cm2/device Expected flux: 100 – 400 p/cm2-s Number of boards (i.e. FPGA devices): 216 Expected SEFI in 4 hours: 3.5 failures It is at the limit of what can be tolerated Active Partial Reconfiguration has been

implementedRef.: K. Røed et all, “Irradiation tests of the complete ALICE

TPC Front-End Electronics chain”

2/6/2009 R2E Radiation School: SEU effects in FPGA 39

Page 40: SEU effects in FPGA How to deal with them?

ALICE TPC RCUActive reconfiguration Functionality of both DCS and RCU board

can experience errors due to radiation effects in the FPGAs

Simple reloading of configuration data causes downtime and is thus not applicable to RCU board (interruption of data-flow) - Active error detection and reconfiguration

scheme using an FPGA capable of refreshing firmware w/o interrupting operation

Active Partial Reconfiguration “scrubbing”DCS-board

RCU-board

Radiation tolerantFPGA(Actel)

XilinxVirtex-II Pro

FPGAFLASH

Altera FPGAw/ ARM cpu

Bank 0

Bank 1Bank 2Bank 3

FLASH memw/ Linux

Bank 0Bank 1Bank 2Bank 3

SelectMap IF

2/6/2009 40R2E Radiation School: SEU effects in FPGA

Page 41: SEU effects in FPGA How to deal with them?

SEFI test with XilinxVirtex-II Pro FPGA

Scrubbing started after ~200 s:

Errors are correctedContinuously

~sec to “scrubb” fulldevice

Improved to ~ms

60 180 240 3600 seconds120

32 B

it Li

nes,

RED

= E

rror

s

Plain Shift Register (flux ~1.5*107 p/cm2-s)

Test carried out by G. Tröger, KIP

ALICE TPC RCUTest results

Page 42: SEU effects in FPGA How to deal with them?

ALICE DDL Source Interface Unit Prototype design (Altera FPGA)

Expected failure rate: ~ 1 failure /1 hour / 400 SIU cards This was not accepted

Every time there is a failure, the run needs to be restarted Several mitigation techniques were discussed

Reconfiguration => complex board design, size constraints

Design has been migrated to flash-based FPGA No configuration loss TID tolerance meets the requirements

Read more at: http://cern.ch/ddl/radtol

2/6/2009 R2E Radiation School: SEU effects in FPGA 42

Page 43: SEU effects in FPGA How to deal with them?

Summary Make sure you understand the requirements

Simulation of the environment is essential Try to select the components/technologies

Pay attention to the requirements Test your components

Look around, you may find some information about the selected components

Try to assess the risk SEU may not be critical, or it can be catastrophic

Mitigate Verify

2/6/2009 R2E Radiation School: SEU effects in FPGA 43

Page 44: SEU effects in FPGA How to deal with them?

Additional documentation Radiation hardness assuranceLink:

http://lhcb-elec.web.cern.ch/lhcb-elec/html/radiation_hardness.htm

Report on “Suitability of reprogrammable FPGAs in space applications” by Sandi Habinc, Gaisler Research

Link: http://microelectronics.esa.int/techno/fpga_002_01-0-4.pdf

2/6/2009 R2E Radiation School: SEU effects in FPGA 44

Page 45: SEU effects in FPGA How to deal with them?

Thank you!

2/6/2009 R2E Radiation School: SEU effects in FPGA 45

Page 46: SEU effects in FPGA How to deal with them?

Spare slides

2/6/2009 R2E Radiation School: SEU effects in FPGA 46

Page 47: SEU effects in FPGA How to deal with them?

TID trends

2/6/2009 R2E Radiation School: SEU effects in FPGA 47

050100150200250300350400

50100150200250300350nm

TID

Kra

ds (S

i)(p

er 1

019.

6)

*See “CMOS SCALING, DESIGN PRINCIPLES and HARDENING-BY-DESIGN METHODOLOGIES” by Ron Lacoe, Aerospace Corp2003 IEEE NSREC Short Course 2003

Page 48: SEU effects in FPGA How to deal with them?

Typical cross-section curve

2/6/2009 R2E Radiation School: SEU effects in FPGA 48

Page 49: SEU effects in FPGA How to deal with them?

Half-latches (Xilinx)

• Half-latches are used across the device to drive constants• Upset in the pull-up can change the state of the inverter• Partial configuration cannot restore the original state

– Latch can recover, after several seconds, due to the leakage of the pull-up transistor

• Mitigation requires the removal of the half-latches

2/6/2009 R2E Radiation School: SEU effects in FPGA 49

M

‘1’‘0’

‘0’‘1’

‘0’ or ‘1’

Weak pull-up

Page 50: SEU effects in FPGA How to deal with them?

Typical workflow

Environment simulation

Component testing Mitigation Verification

2/6/2009 R2E Radiation School: SEU effects in FPGA 50

Page 51: SEU effects in FPGA How to deal with them?

CMS mitigation example

2/6/2009 R2E Radiation School: SEU effects in FPGA 51

by J. Hauser