Experience with the design and submission of the Medipix3 pixel readout chip in 0.13 µm CMOS

X. Llopart

RD51 Paris, 14th October


Outline

Introduction to the Medipix3
Medipix3 prototype
Medipix3 requirements
Available tools
IP blocks
Through Silicon Via (TSV)
Verification
Conclusions

Performance of the Medipix2 & Timepix

Single photon counting provides excellent noise-free images
Ideal in photon-starved situations
Many different applications, both foreseen and otherwise!

Electron microscopy for biology
Neutron imaging
Nuclear power plant decommissioning
Adaptive optics for astronomy
Dosimetry in space
Gas detectors

www.cern.ch/Medipix

[Figure: X-ray transmission image of a termite worker body (left) and detail of its head (bottom). Even the fine internal structure of the antennae is recognizable. (Magnified 15x, time = 30 s, tube at 40 kV and 70 mA)]

Introduction to Medipix3

Main limitations of the Medipix2 chip:

Charge sharing in the sensor is an issue:
  Flat field correction is sensitive to the incoming spectrum
  Threshold should be exactly half of the peak for correct counting with monochromatic illumination
  Energy resolution is limited by the charge sharing tail
Detector is ‘blind’ during readout (only one counter per pixel)
Using serial readout it takes about 5 ms to read out one frame
Only 3-side buttable
Radiation hardness

The Medipix3 collaboration started in June 2005 with 15 members (now 17)

By the end of 2005 a Medipix3 prototype with 8x8 55 µm pixels was sent to fabrication

Medipix3 Prototype (R. Ballabriga)

IBM 130 nm CMOS8RF with 8 metals, LM process. MPW through MOSIS.
First tests around March 2006.
We tested the idea of local communication between 2x2 pixel clusters to correct the charge sharing distortion effect
It took ~1 year to fully test the prototype
We learned a lot from this step:
  Gain mismatch between channels
  Digital coupling into analog sensitive lines (arbiter signals, common summing node)
  Signals producing double counting close to threshold

R. Ballabriga, et al., “The Medipix3 Prototype, a Pixel Readout Chip Working in Single Photon Counting Mode with Improved Spectrometric Performance”, IEEE Trans. Nucl. Sci., vol. 54, pp. 1824-1829

Prototyping is a time-consuming but useful process!!!

[Figure: Medipix3 prototype chip; dimension labels 1000 µm and 2000 µm]

Collaboration approved floorplan & requirements

Highly configurable pixels:
  Maintain pixel and matrix size
  Single Pixel Mode (SPM)
  Charge Summing Mode (CSM)
  Colour Mode (110 µm x 110 µm)
  2 independent thresholds per pixel (8 in colour mode)
  2 programmable counters with overflow (1, 4 or 12 bits) or 1 counter of 24 bits
  Sequential Read/Write mode
  Semi-Sequential Read/Write mode
  Continuous Read/Write mode
  Pixel counter fast reset

Maximize active area:
  Multiple-dicing options and minimal IO periphery
  Through Silicon Via (TSV) pads included on-chip

Increase connectivity flexibility:
  Region (block) of interest readout
  Minimize number of lines
  All control/data lines use LVDS
  Configurable data output port (1 to 8)
  On-chip band-gap and DACs
  On-chip test pulse
  E-fuses for chip identification

Which tools did we have?

IBM CMOS8RF 130 nm → the technology has many possibilities (manual of 520 pages…):
  Thin (2.2 nm) and thick (5.2 nm) gate oxide NFET/PFET
  Low power (high threshold) or regular (low threshold) NFET/PFET
  Zero-VT thin and thick NFETs
  Thin triple-well NFETs
  3.3 V IO NFET/PFET
  5 to 8 metal layers of different “flavours” (LM, MA and OL) with Cu and Al
  MIM capacitors (only in MA and OL)
  E-fuses
  And more…

Digital design flow using ARM/Artisan standard cells and IO pads

Design flow was realized by Manhattan Routing, tuned for an LM process.

New verification tools (ASSURA, CALIBRE) matched with the IBM design kit

The right choice of technology “flavour”

IBM CMOS8RF 8-metal with MA was chosen:

Good:
  We needed MIM capacitors at the shaper front-end. This would free quite some area.
  Smaller inter-layer capacitances -> smaller input capacitance!
  Top metal is already thick Al, which is needed for bump-bonding
  M7 is thick Cu -> good power distribution

Bad:
  Prototype was done in 8-LM -> re-check simulations
  Our digital design flow (MRE) was based on an LM BEOL

Devices used:

Pixel:
  Regular thin NFET/PFET (analog)
  MIM capacitors (analog)
  Low power thin NFET/PFET (digital)

Periphery:
  Regular thin NFET/PFET (core logic)
  Regular thick NFET/PFET (IOs)
  Zero-Vt thick NFET (LVDS driver)
  3.3 V IO NFET/PFET (e-fuse)
  E-fuses

Pixel schematic

[Figure: block diagram of pixel E within its 3x3 neighbourhood (pixels A to I), split into an ANALOG and a DIGITAL part. Labels visible in the diagram: Input Pad, Test Input, TestBit, CTEST, CF, gm (twice), VFBK, PolarityBit, GainMode, two discriminators (DISC) with thresholds THA and THB, ConfigTHA<0:4>, ConfigTHB<0:4>, cluster common control logic and arbitration circuitry, EnablePixelCom, ColourMode, Equalize, THH, CounterSel, ContReadWrite, CounterA, CounterB, ShutterA, ShutterB, Previous Pixel A/B, Next Pixel A/B, ReadClock, FastClear, ReadEnable. Analog and digital signals go to adjacent pixels (A, B, D) and (F, H, I) and come from adjacent pixels (F, H, I) and (A, B, D).]

Pixel layout

Full custom design -> 3 man-years (Rafa and Winnie)
Basic matrix cell is a 2 x 2 pixel matrix
Each pixel contains ~1600 transistors: > 100 million transistors on the chip
Changes from the prototype:
  Some enclosed-layout NFETs for radiation tolerance enhancement
  Added MIM caps
  Programmable binary counter (1, 4, 12 or 24 bits) with overflow
  Fast matrix reset
13 configuration bits per pixel
2 independent test pulse circuits per pixel column
Two power domains:
  AVDD 1.5 V: 10.1 µA/pixel max
  VDD 1.5 V: 10 nA/MHz/pixel (-> 2 µA/pixel @ 200 MHz readout clock)

[Figure: pixel layout, 55 µm x 55 µm]
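The per-pixel figures on this slide can be scaled up to the full matrix with a few lines of arithmetic. The sketch below assumes a 256 x 256 pixel matrix: the 256 columns follow from the End-of-Column numbering (EoC 0 to 255), while the square matrix is an assumption, as are all function and constant names.

```python
# Rough matrix-level totals from the per-pixel numbers quoted on the slide.
# ASSUMPTION: 256 x 256 pixels (256 columns from the EoC numbering; a square
# matrix is assumed, it is not stated on this slide).
COLUMNS = 256
ROWS = 256
TRANSISTORS_PER_PIXEL = 1600          # "~1600" on the slide
ANALOG_UA_PER_PIXEL = 10.1            # AVDD, maximum
DIGITAL_NA_PER_MHZ_PER_PIXEL = 10.0   # VDD, scales with readout clock


def matrix_totals(readout_clock_mhz: float) -> dict:
    """Return pixel count, transistor count and supply currents for the matrix."""
    pixels = COLUMNS * ROWS
    digital_ua_per_pixel = DIGITAL_NA_PER_MHZ_PER_PIXEL * readout_clock_mhz / 1000.0
    return {
        "pixels": pixels,
        "transistors": pixels * TRANSISTORS_PER_PIXEL,
        "digital_ua_per_pixel": digital_ua_per_pixel,
        "analog_ma_total": pixels * ANALOG_UA_PER_PIXEL / 1000.0,
        "digital_ma_total": pixels * digital_ua_per_pixel / 1000.0,
    }


if __name__ == "__main__":
    for key, value in matrix_totals(200.0).items():
        print(f"{key}: {value}")
```

Under these assumptions the matrix holds about 105 million transistors, consistent with the "> 100 million" on the slide, and the digital current of 2 µA/pixel at 200 MHz is reproduced.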

Medipix3 Periphery (I)

[Figure: periphery block diagram. 256 End-of-Column cells (EoC 0 to 255) and 256 test pulse cells (TpC 0 to 255); IO logic; band-gap and 25 DACs; E-fuses (32 bits); power domains AVDD, VDD, DVDD25 and AVDD33; x8 LVDS inputs (DataIn, ClkIn, Reset, Shutter, Shutter1, FastClear, EnableIn, TP_Switch); x10 LVDS outputs (EnableOut, ClkOut, DataOut7 to DataOut0); analog pads ExtBG, ExtDAC and DACOut]

1 man-year (Xavi)
All the data communication is done through the bottom periphery.
Chip needs between 12 (1 data out port) and 18 (8 data out ports) LVDS pairs.
1 analog output line is used to monitor the internal DACs
4 different power domains
This block has been synthesized and automatically laid out using the digital design flow from MRE

Medipix3 Periphery (II)

Several blocks have been full-custom designed and then integrated inside the digital flow:
  LVDS driver (VDD/DVDD)
  LVDS receiver (VDD/DVDD)
  E-fuses block (VDD/AVDD33)
  Analog periphery (band-gap, 25 DACs and monitoring logic) (AVDD)
  End Of Column (VDD)
  Test pulse circuitry (AVDD)

The periphery has been synthesized using a target readout clock frequency of 350 MHz

The design has been verified at this frequency with the post-layout realization including parasitic RC

LVDS 130 nm Tx & Rx

Medipix3 will use only LVDS for the chip IO communication
  8 receivers: Reset, Shutter, Shutter1_CounterSelCRW, MatrixFastClear, TP_Switch, EnableIn, ClockIn, DataIn
  10 drivers: EnableOut, ClockOut, DataOut[0..7]
No LVDS drivers available in the IBM CMOS8 ARM IO libraries → must be designed in-house (full-custom)

Requirements:
  ~500 Mbps
  Minimum power consumption
  Dual power (VDDio: 2.5 V and VDDcore: 1.2-1.5 V): use of thick and thin oxide FETs!!!
  Radiation hard: ELT for the thick oxide NFETs (new extraction tool for ASSURA LVS is available)
  Must fit in the standard CMOS8 ARM IO MA pad size for compatibility with the digital design flow (73 x 247 µm)

LVDS Driver

Based on the 0.25 µm LVDS driver from Paulo Moreira (CERN)
Added auto-bias circuitry
Monte-Carlo simulation with process and mismatch corners, RLC wire-bond parasitics and VDDcore = 1.5 V @ 500 MHz
Radiation hard

Parameter                                           Value
VOUT Low                                            1 V
VOUT High                                           1.4 V
VOUT Common                                         1.2 V
Number of drivers                                   10
Maximum operating frequency                         500 MHz
Power supplies                                      DVDD = 2.5 V, VDD = 1.2-1.5 V
Power consumption per channel                       ~11.25 mW (4.5 mA); floating: ~1.25 mW (500 µA)
Maximum power consumption (8 DataOut ports used)    ~112.5 mW (45 mA)
Minimum power consumption (1 DataOut port used)     ~42.5 mW (17 mA)
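The minimum and maximum driver power figures in the table follow from the per-channel values. The sketch below reproduces them under two assumptions that are inferred from the numbers rather than stated on the slide: EnableOut and ClockOut are always active, and every unused DataOut driver sits in the low-power "floating" state. The function and constant names are illustrative only.

```python
# Power budget of the 10 LVDS drivers as a function of the number of
# DataOut ports in use.
# ASSUMPTIONS (inferred from the table, not stated explicitly):
#   - EnableOut and ClockOut are always active,
#   - unused DataOut drivers are left in the "floating" state.
ACTIVE_MW, ACTIVE_MA = 11.25, 4.5      # per active channel
FLOATING_MW, FLOATING_MA = 1.25, 0.5   # per floating channel
TOTAL_DRIVERS = 10
CONTROL_DRIVERS = 2                    # EnableOut, ClockOut


def driver_budget(data_ports: int) -> tuple:
    """Return (power in mW, current in mA) for 1 to 8 active DataOut ports."""
    if not 1 <= data_ports <= 8:
        raise ValueError("data_ports must be between 1 and 8")
    active = CONTROL_DRIVERS + data_ports
    floating = TOTAL_DRIVERS - active
    power_mw = active * ACTIVE_MW + floating * FLOATING_MW
    current_ma = active * ACTIVE_MA + floating * FLOATING_MA
    return power_mw, current_ma


if __name__ == "__main__":
    for ports in (1, 2, 4, 8):
        print(ports, driver_budget(ports))
```

With 1 port this gives 42.5 mW (17 mA) and with 8 ports 112.5 mW (45 mA), matching the table.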

LVDS Receiver

Based on a schematic from Miguel Novais (CERN) for a 1.2 Gbit/s receiver.
Self-biased (always on)
Radiation hard

Parameter                                  Value
Number of receivers                        8
Maximum operating frequency                500 MHz
Power supplies                             DVDD = 2.5 V, VDD = 1.2-1.5 V
Power consumption per channel              ~2 mW (800 µA)
Total power consumption (8 channels)       ~16 mW (6.40 mA)

LVDS Layout

Layout fits in the ARM IO library pitch size
2 PBAREWIRE cells side by side, which were emptied. Only the ESD diodes were kept.
Both cells have been digitally characterized and included in the digital design flow.
All thick oxide NFETs have been laid out as ELTs

[Figure: LVDS_RX and LVDS_TX layouts; dimension labels 146 µm and 247 µm]

E-Fuses

IBM stopped providing the laser-blown fuses: “The “laser fuses” cost more, occupy more area than the e-FUSE, function at the wafer level only, and prohibit placement of circuits below and wiring above the fuse”.

E-fuses are programmed by electrically “burning” a salicided polysilicon strip. Burning changes the resistance from ~100 Ω to >5 kΩ

This means:
  Wafers from IBM will come “blank”
  Higher system complexity: logic to burn and logic to read
  Burning needs a 3.3 V power supply
  Programming transistor current: 10 mA < Ion < 13.5 mA
  Programming time: > 0.18 ms and < 1.0 ms

32 bits included and burned during probe testing

E-Fuse block layout

[Figure: e-fuse block layout; dimension labels 300 µm, 65 µm, 14 µm and 30 µm]

Programming:
  Programming pulse length (> 0.18 ms and < 1.0 ms) is set by a 9-bit register.
  Fuse selection for programming is done through a 5-bit fuse decoder.
  Only 1 bit is burned at a time.

Reading:
  All e-fuses are read at once
  MC simulations show a sense threshold of 500 Ω ± 100 Ω
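A behavioural sketch of the programming and reading scheme may help to picture it. It models the 5-bit decoder as a one-hot select over 32 fuses and the read-out as a comparison against the nominal 500 Ω sense threshold, using the ~100 Ω (intact) and >5 kΩ (burned) resistances quoted for the fuses. The bit ordering, the convention that a burned fuse reads as 1, and all function names are assumptions made for illustration; this is not the chip's actual logic.

```python
# Behavioural model of a 32-bit e-fuse block.
# ASSUMPTIONS: burned fuse reads as logic 1, bit 0 is fuse address 0,
# and the sense threshold is taken at its nominal 500 ohm value.
SENSE_THRESHOLD_OHM = 500.0
N_FUSES = 32                  # 5-bit address space
INTACT_OHM = 100.0            # "~100 ohm" before burning
BURNED_OHM = 6000.0           # any value above the quoted ">5 kohm"


def decode_fuse_select(address: int) -> int:
    """5-bit address -> one-hot 32-bit select word (one fuse at a time)."""
    if not 0 <= address < N_FUSES:
        raise ValueError("address must fit in 5 bits")
    return 1 << address


def burn_sequence(chip_id: int) -> list:
    """Select words for every bit of chip_id that has to be burned, in order."""
    return [decode_fuse_select(b) for b in range(N_FUSES) if (chip_id >> b) & 1]


def read_fuses(resistances_ohm: list) -> int:
    """Sense all fuses at once and pack them into a 32-bit word."""
    if len(resistances_ohm) != N_FUSES:
        raise ValueError("expected 32 fuse resistances")
    word = 0
    for bit, resistance in enumerate(resistances_ohm):
        if resistance > SENSE_THRESHOLD_OHM:
            word |= 1 << bit
    return word


if __name__ == "__main__":
    chip_id = 0x1234ABCD
    fuses = [INTACT_OHM] * N_FUSES
    for select in burn_sequence(chip_id):
        fuses[select.bit_length() - 1] = BURNED_OHM   # burn one fuse per pulse
    print(hex(read_fuses(fuses)))
```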

Analog periphery: Band-Gap

Medipix3 includes a band-gap voltage reference (designed and tested by P. Moreira)
The forward voltage of one of the band-gap diodes is used to monitor the temperature
Power supply sensitivity: 1.2 mV per V of AVDD
Temperature sensitivity: 0.1 mV/ºC
The output of the band-gap is used by the on-chip DACs to generate their outputs with minimal temperature and power supply dependence

[Figure: measured output voltage [V] versus temperature [ºC], from -100 to 200 ºC, with linear fit y = -0.0016x + 0.7779, R² = 0.9986]

[Figure: layout; dimension labels 1000 µm and 235 µm]
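The linear fit quoted on the plot (y = -0.0016x + 0.7779, with x in ºC and y in V) can be inverted to turn a measured voltage into a temperature estimate. The sketch below assumes that the fit describes the forward voltage of the temperature-monitoring diode; the function names are hypothetical.

```python
# Temperature read-back from the linear fit shown on the slide.
# ASSUMPTION: the fit is the forward voltage of the monitoring diode.
SLOPE_V_PER_C = -0.0016   # fitted slope, V per degree Celsius
INTERCEPT_V = 0.7779      # fitted voltage at 0 degrees Celsius


def diode_voltage(temp_c: float) -> float:
    """Expected output voltage [V] at a given temperature [degrees C]."""
    return SLOPE_V_PER_C * temp_c + INTERCEPT_V


def temperature_from_voltage(voltage_v: float) -> float:
    """Invert the fit: estimated temperature [degrees C] from a voltage [V]."""
    return (voltage_v - INTERCEPT_V) / SLOPE_V_PER_C


if __name__ == "__main__":
    for temp in (-50.0, 0.0, 25.0, 100.0):
        v = diode_voltage(temp)
        print(f"{temp:7.1f} C -> {v:.4f} V -> {temperature_from_voltage(v):7.1f} C")
```

A slope of -1.6 mV/ºC means that a 1 mV measurement error corresponds to roughly 0.6 ºC.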

Analog periphery: DACs

There are 25 DACs on-chip:
  10 x 9-bit and 15 x 8-bit DACs
  18 linear current-output and 7 linear voltage-output DACs
  Power supply sensitivity: 1 LSB per 250 mV of AVDD
  Temperature sensitivity: 1 LSB per 25 ºC

Transistor current matching in 0.13 µm is ~2 times worse than in 0.25 µm -> bigger transistors for the same current copy. Why? Different substrate resistivity!

[Figure: DAC layouts; dimension labels 230 µm x 235 µm and 450 µm x 235 µm]

End Of Column

There is 1 End of Column cell per column, realized with the MRE flow
It includes column buffering and the CTPR and DAC registers
Cell has been tested successfully on all corners up to 750 MHz
Medipix2/Timepix EndOfColumn + buffering was ~200 µm x 34 µm (75% bigger)
Propagation delay <3 ns from bottom to top of the column (using a metal width of 400 nm and a total line capacitance of 3 pF)

[Figure: End of Column cell layout with dimension labels 22 µm and 68.4 µm, and column propagation delay (<3 ns)]

On-chip test pulse

There are 2 independent test pulse circuits per column in order to test the charge summing circuitry (TP_1 and TP_2)
The test pulse amplitude range is controlled by 3 voltage DACs (TP_REFA, TP_REFB and TP_REF)
The test pulse frequency is controlled by the LVDS input TP_SWITCH

[Figure: test pulse circuit with TP_REF, TP_REFA, TP_REFB and TP_SWITCH, and outputs TP_1 and TP_2]

Parameter                      Value
Channels per column            2
Minimum step                   2.5 mV → 86.2 e-
Linear dynamic range [± 1%]    925 mV → ~32 ke-
TP_REF min                     8’b0100_0001 → ~325 mV
TP_REF max                     8’b1111_1010 → ~1250 mV
TP_REFA and TP_REFB min        9’b0_1000_0010 → ~312.5 mV
TP_REFA and TP_REFB max        9’b1_1111_0100 → ~1250 mV
Typical rise/fall time         < 50 ns
Current consumption            CTPR bit enabled: ~500 µA @ default DAC values; CTPR bit disabled: negligible
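The table mixes DAC codes, voltages and injected charge, so a small conversion helper is useful. The sketch below takes the charge scale directly from the "2.5 mV → 86.2 e-" entry. The LSB sizes (5 mV for the 8-bit TP_REF, 2.5 mV for the 9-bit TP_REFA/TP_REFB) are assumptions inferred from the tabulated maximum codes, and the implied test capacitance assumes an ideal Q = C·V injection. All names are illustrative.

```python
# Test pulse conversions: DAC code -> mV -> injected electrons.
# ASSUMPTIONS: LSB sizes inferred from the maximum codes in the table
# (250 -> 1250 mV for 8 bits, 500 -> 1250 mV for 9 bits); ideal Q = C * V.
ELECTRON_CHARGE_C = 1.602176634e-19
ELECTRONS_PER_MV = 86.2 / 2.5          # from "2.5 mV -> 86.2 e-"
LSB_MV = {8: 5.0, 9: 2.5}              # inferred, see assumptions above


def dac_code_to_mv(code: int, bits: int) -> float:
    """Linear DAC model: output voltage [mV] for a given code."""
    if bits not in LSB_MV or not 0 <= code < (1 << bits):
        raise ValueError("unsupported resolution or code out of range")
    return code * LSB_MV[bits]


def mv_to_electrons(amplitude_mv: float) -> float:
    """Injected charge [e-] for a test pulse step of amplitude_mv."""
    return amplitude_mv * ELECTRONS_PER_MV


def implied_test_capacitance_ff() -> float:
    """Capacitance [fF] implied by the charge-per-voltage scale."""
    return 86.2 * ELECTRON_CHARGE_C / 2.5e-3 * 1e15


if __name__ == "__main__":
    print(dac_code_to_mv(0b1111_1010, 8))          # TP_REF max
    print(dac_code_to_mv(0b1_1111_0100, 9))        # TP_REFA/TP_REFB max
    print(round(mv_to_electrons(925.0)))           # linear dynamic range
    print(round(implied_test_capacitance_ff(), 2))
```

Under these assumptions the 925 mV linear range corresponds to about 31.9 ke-, in line with the ~32 ke- in the table, and the implied test capacitance is about 5.5 fF.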

Medipix3 Periphery

[Figure: periphery block diagram, repeated from the Medipix3 Periphery (I) slide]

IO Logic
  Includes ~1 nF on-chip decoupling capacitance between VDD and VSS
  Synthesized using MRE
  Powered through VDD/VSS

E-fuses
  32 bits
  Enclosed NMOS transistors used for improved radiation hardness
  Powered through VDDA33/VDDA/VSSA

EoColumn and TPulse buffer
  1 EoC per column (VDD/VSS)
  2 TpC per column (VDDA/VSSA)

BG and DACs (25)
  BG from P. Moreira with temperature sensor
  10 9-bit DACs and 15 8-bit DACs
  Powered through VDDA/VSSA

IO pads
  ARM power pads used
  LVDS IN/OUT and SenseOut pads use enclosed NMOS transistors for improved radiation hardness
  All pads include TSV connection
  DVDD/DVSS, VDD/VSS, VDDA/VSSA

IO Pads strategy

The IO power pads used are from the GPIO MA CMOS8 ARM library
This library includes full ESD protection circuitry
Two types of bonding are possible:
  Wire bonding (WB)
  Through Silicon Via (TSV)

[Figure: IO pad arrangement showing IBM IO 130 nm cells with wire-bond (WB) pads and TSV landing pads in M1; dimension labels 73 µm, 70 µm, 247 µm and 1000 µm]

Pads      Type   Min pitch   Number
Bottom    WB     73 µm       110
Bottom    TSV    92 µm       108
Top       WB     146 µm      84
Top       TSV    146 µm      84

Bottom left corner

The center of the first active pixel is at 1804 µm with WB extensions or at 804 µm with TSV.
Row of sensor guard-ring connected to VSSA.
Alignment marks in the four corners of the chip.
Through-via High-Voltage pads in the four corners of the chip.
Logo details:

[Figure: bottom left corner of the chip; dimension labels 1000 µm and 800 µm]

Through Silicon Via (TSV)

Through Silicon Via (TSV) technology is a vertical electrical connection passing completely through a silicon wafer or die.
The connection to the PCB is then done through a BGA -> dead area due to WB is eliminated!
Typical state-of-the-art TSVs in a 50 µm thinned wafer are 35 µm diameter vias with a 60 µm minimum pitch.

[Figure: Timepix to BGA using TSV (Z. Vykydal)]

Medipix3 TSV landing pads

The TSV landing pads are laid out in M1 with an octagonal shape of 70 µm diameter.
There are 108 in-line TSV pads at the bottom and 84 in-line TSV pads at the top.
There are 4 rectangular TSVs for the High Voltage connection to the back of the detector either via BB or WB.

[Figure: TSV landing pad detail; dimension labels 70 µm, 78 µm and 135 µm]

Chip                            X [µm]   Y [µm]    Active area
Medipix2 and Timepix            14111    16120     87.1%
Medipix3 top and bottom WB      14100    17300     81.2%
Medipix3 bottom WB              14100    ~15900    88.4%
Medipix3 top and bottom TSV     14100    ~15300    91.9%
Medipix3 bottom TSV             14100    ~14900    94.3%

Top Metal (MA) and passivation opening (DV) displayed

Multiple dicing cuts depending on:
  Top power connection
  WB or TSV bonding

[Figure: chip outlines for the dicing options, each 14100 µm wide, with heights of 17300, 15900, 15300 and 14900 µm]
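The active-area percentages in the table can be cross-checked from the chip outlines. The sketch below assumes a sensitive matrix of 256 x 256 pixels on a 55 µm pitch (14080 µm x 14080 µm); the 256 x 256 matrix size is an assumption that is not stated on this slide, and the names are illustrative.

```python
# Active area fraction for the different dicing and bonding options.
# ASSUMPTION: sensitive matrix of 256 x 256 pixels at 55 um pitch.
PIXEL_PITCH_UM = 55.0
PIXELS_PER_SIDE = 256

# (name, chip width X [um], chip height Y [um], percentage quoted on the slide)
DICING_OPTIONS = [
    ("Medipix2 and Timepix", 14111.0, 16120.0, 87.1),
    ("Medipix3 top and bottom WB", 14100.0, 17300.0, 81.2),
    ("Medipix3 bottom WB", 14100.0, 15900.0, 88.4),
    ("Medipix3 top and bottom TSV", 14100.0, 15300.0, 91.9),
    ("Medipix3 bottom TSV", 14100.0, 14900.0, 94.3),
]


def active_fraction_percent(chip_x_um: float, chip_y_um: float) -> float:
    """Sensitive matrix area as a percentage of the total chip area."""
    side_um = PIXEL_PITCH_UM * PIXELS_PER_SIDE
    return 100.0 * side_um * side_um / (chip_x_um * chip_y_um)


if __name__ == "__main__":
    for name, x_um, y_um, quoted in DICING_OPTIONS:
        computed = active_fraction_percent(x_um, y_um)
        print(f"{name:30s} computed {computed:5.1f}%  quoted {quoted:5.1f}%")
```

Under this assumption every row is reproduced to within about 0.1 percentage point, the small residuals being consistent with the approximate ("~") chip heights.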

Medipix3 DRC

CALIBRE DRC™ used
Many DRC errors still present due to:
  ZVT enclosed-layout gates in LVDS Tx
  MIM cap area -> Vmax ≥ 6 V
  MQ to gate RX diode per pixel (GR131f)
  Bump-bonding openings in the pixel
  Multi-dice options
These DRC errors were sent to IBM for waiver clearance

Answer from IBM:

“the design can be manufactured but CERN accepts entirely the risks involved by violating the specific design rule”

Medipix3 LVS

CALIBRE (from Mentor Graphics) was used for the first time in the group for doing LVS inside Cadence -> lots of manual reading!
This is a true hierarchical tool. ASSURA?
Final LVS ran for ~10 h on an 8-core CPU with 16 GB.
Chip completed!!!

Conclusion

The Medipix3 chip is the first 130 nm engineering run organized through the CERN HEP service.
The Medipix3 prototype demonstrated the principle of local communication between pixels to solve charge sharing effects.
From there it still took 4 man-years (3 people) to complete the design. Why?
  Change of BEOL technology from the prototype
  New programmable counter
  Many unavailable blocks (DACs, LVDS driver and receiver, e-fuse bits, …)
  Use of new tools for the first time (MRE, CALIBRE LVS, …)
Experience gained should reduce design time for future projects

Chip was sent to IBM on 24th September!!!
