Experience with the design and submission of the Medipix3 pixel readout chip in 0.13 µm CMOS
X. Llopart
RD51 Paris-14th October
Outline
- Introduction to the Medipix3
- Medipix3 prototype
- Medipix3 requirements
- Available tools
- IP Blocks
- Through Silicon Via
- Verification
- Conclusions
Performance of the Medipix2 & Timepix
- Single photon counting provides excellent noise-free images
- Ideal in photon-starved situations
- Many different applications, both foreseen and otherwise:
  - Electron microscopy for biology
  - Neutron imaging
  - Nuclear power plant decommissioning
  - Adaptive optics for astronomy
  - Dosimetry in space
  - Gas detectors
www.cern.ch/Medipix
[Figure: X-ray transmission image of a termite worker body (left) and detail of its head (bottom). Even the fine internal structure of the antennae is recognized. (Magnified 15x, time = 30 s, tube at 40 kV and 70 mA)]
Introduction to Medipix3
Main limitations of the Medipix2 chip:
- Charge sharing in the sensor is an issue:
  - Flat field correction is sensitive to the incoming spectrum
  - Threshold should be exactly half of the peak for correct counting with monochromatic illumination
  - Energy resolution is limited by the charge sharing tail
- Detector is 'blind' during readout (only one counter per pixel)
- Using serial readout it takes about 5 ms to read out one frame
- Only 3-side buttable
- Radiation hardness
The Medipix3 collaboration started in June 2005 with 15 members (now 17).
By the end of 2005 a Medipix3 prototype with 8 x 8 pixels of 55 µm was sent to fabrication.
Medipix3 Prototype (R. Ballabriga)
- IBM 130 nm CMOS8RF with 8 metals, LM process. MPW through MOSIS.
- First tests around March 2006.
- We tested the idea of local communication between 2 x 2 pixel clusters to correct the charge sharing distortion effect
- It took ~1 year to fully test the prototype
- We learned a lot from this step:
  - Gain mismatch between channels
  - Digital coupling into analog sensitive lines (arbiter signals, common summing node)
  - Signals producing double counting close to threshold
R. Ballabriga, et al. “The Medipix3 Prototype, a Pixel Readout Chip Working in Single Photon Counting Mode with Improved Spectrometric Performance”, IEEE Trans. Nucl. Sci., vol 54, pp: 1824 - 1829
Prototyping is a time-consuming but useful process!
[Figure: prototype chip, dimension labels 1000 µm and 2000 µm]
Collaboration approved floorplan & requirements
- Highly configurable pixels:
  - Maintain pixel and matrix size
  - Single Pixel Mode (SPM)
  - Charge Summing Mode (CSM)
  - Colour Mode (110 µm x 110 µm)
  - 2 independent thresholds per pixel (8 in colour mode)
  - 2 programmable counters with overflow (1, 4 or 12 bits) or 1 x 24-bit counter
  - Sequential Read/Write mode
  - Semi-Sequential Read/Write mode
  - Continuous Read/Write mode
  - Pixel counter fast Reset
- Maximize active area:
  - Multiple-dicing options and minimal IO periphery
  - Through Silicon Via (TSV) pads included on-chip
- Increase connectivity flexibility:
  - Region (Block) of interest readout
  - Minimize number of lines
  - All control/data lines use LVDS
  - Configurable data output port (1 to 8)
  - On-chip Band-Gap and DACs
  - On-chip test pulse
  - E-fuses for chip identification
Which tools we had
- IBM CMOS8RF 130 nm → Technology has many possibilities (manual of 520 pages…):
  - Thin (2.2 nm) and thick (5.2 nm) gate oxide NFET/PFET
  - Low power (high threshold) or Regular (low threshold) NFET/PFET
  - Zero-VT thin and thick NFETs
  - Thin Triple Well NFETs
  - 3.3V IO NFET/PFET
  - 5 to 8 metal layers of different “flavours” (LM, MA and OL) with Cu and Al
  - MIM capacitors (only in MA and OL)
  - E-Fuses
  - And more…
- Digital design flow using ARM/Artisan standard cells and IO pads
- Design flow was realized by Manhattan Routing, tuned for an LM process
- New verification tools (ASSURA, CALIBRE) matched with the IBM design kit
The right choice of technology “flavour”
IBM CMOS8RF 8-metals with MA was chosen:
- Good:
  - We needed MIM capacitors at the shaper front-end. This would free quite some area.
  - Smaller inter-layer capacitances -> smaller input capacitance!
  - Top metal is already thick Al, which is needed for bump-bonding
  - M7 is thick Cu -> good power distribution
- Bad:
  - Prototype was done in 8-LM -> re-check simulations
  - Our digital design flow (MRE) was based on an LM BEOL
Devices used:
- Pixel:
  - Regular thin NFET/PFET (Analog)
  - MIM capacitors (Analog)
  - Low power thin NFET/PFET (Digital)
- Periphery:
  - Regular thin NFET/PFET (core logic)
  - Regular thick NFET/PFET (IOs)
  - Zero-Vt thick NFET (LVDS driver)
  - 3.3V IO NFET/PFET (e-fuse)
  - E-fuses
Pixel schematic
[Figure: block diagram of pixel E in the 3 x 3 neighbourhood A B C / D E F / G H I. Analog part: Input Pad, Test Input with TestBit and CTEST, CF, VFBK, gm stages, PolarityBit, GainMode, signals to adjacent pixels (A, B, D) and from adjacent pixels (F, H, I), two discriminators (DISC) with thresholds THA and THB and configuration bits ConfigTHA<0:4> and ConfigTHB<0:4>. Digital part: cluster common control logic and arbitration circuitry, CounterA and CounterB with ShutterA and ShutterB, Previous/Next Pixel A and B, ReadClock, CounterSel, FastClear, ReadEnable, ContReadWrite, EnablePixelCom, ColourMode, EqualizeTHH.]
Pixel layout
- Full custom design -> 3 man-years (Rafa and Winnie)
- Basic matrix cell is a 2 x 2 pixel matrix
- Each pixel contains ~1600 transistors: > 100 M transistors per chip
- Changes from the prototype:
  - Some enclosed-layout NFETs for radiation tolerance enhancement
  - Added MIM caps
  - Programmable binary counter (1, 4, 12 or 24 bits) with overflow
  - Fast matrix Reset
- 13 configuration bits per pixel
- 2 independent test pulse circuits per pixel column
- Two power domains:
  - AVDD 1.5 V: 10.1 µA/pixel max
  - VDD 1.5 V: 10 nA/MHz/pixel -> 2 µA/pixel @ 200 MHz readout clock
[Figure: pixel layout, 55 µm x 55 µm]
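The per-pixel figures on this slide can be scaled up to the full matrix with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: it assumes a 256 x 256 pixel matrix (the slides show 256 columns, the row count is an assumption), and all names are illustrative rather than taken from the design.

```python
# Back-of-the-envelope scaling of the per-pixel figures quoted on the slide.
PIXELS_PER_SIDE = 256                  # assumption: 256 x 256 matrix
TRANSISTORS_PER_PIXEL = 1600           # "~1600" on the slide
ANALOG_UA_PER_PIXEL = 10.1             # AVDD domain, maximum
DIGITAL_NA_PER_MHZ_PER_PIXEL = 10.0    # VDD domain
READOUT_CLOCK_MHZ = 200.0


def matrix_summary():
    """Return matrix-level totals derived from the per-pixel numbers."""
    n_pixels = PIXELS_PER_SIDE * PIXELS_PER_SIDE
    digital_ua_per_pixel = DIGITAL_NA_PER_MHZ_PER_PIXEL * READOUT_CLOCK_MHZ / 1000.0
    return {
        "pixels": n_pixels,
        "transistors": n_pixels * TRANSISTORS_PER_PIXEL,
        "digital_ua_per_pixel": digital_ua_per_pixel,
        "analog_matrix_ma": n_pixels * ANALOG_UA_PER_PIXEL / 1000.0,
        "digital_matrix_ma": n_pixels * digital_ua_per_pixel / 1000.0,
    }


summary = matrix_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under this assumption the transistor count comes out just above 100 million, consistent with the "> 100 M" quoted above, and the digital current per pixel reproduces the 2 µA figure.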
Medipix3 Periphery (I)
[Figure: Medipix3 periphery block diagram. End-of-column cells EoC 0 to EoC 255 and test pulse cells TpC 0 to TpC 255; IO Logic; Band-Gap and 25 DACs; E-Fuses (32 bits); power domains AVDD, VDD, DVDD25 and AVDD33; 8 inputs (DataIn, ClkIn, Reset, Shutter, Shutter1, FastClear, EnableIn, TP_Switch); 10 outputs (EnableOut, ClkOut, DataOut7 to DataOut0); analog pads ExtBG, ExtDAC and DACOut.]
- 1 man-year (Xavi)
- All the data communication is done through the bottom periphery.
- Chip needs between 12 (1 data out port) and 18 (8 data out ports) LVDS pairs.
- 1 analog output line is used to monitor the internal DACs
- 4 different power domains
- This block has been synthesized and automatically laid out using the digital design flow from MRE
Medipix3 Periphery (II)
Several blocks have been full-custom designed and then integrated inside the digital flow:
- LVDS driver (VDD/DVDD)
- LVDS receiver (VDD/DVDD)
- E-fuses block (VDD/AVDD33)
- Analog Periphery (Band-Gap, 25 DACs and monitoring logic) (AVDD)
- End Of Column (VDD)
- Test Pulse circuitry (AVDD)
The periphery has been synthesized using a target readout clock frequency of 350 MHz.
The design has been verified at this frequency with the post-layout realization including parasitic RC.
LVDS 130nm Tx & Rx
Medipix3 will use only LVDS for the chip IO communication:
- 8 Receivers: Reset, Shutter, Shutter1_CounterSelCRW, MatrixFastClear, TP_Switch, EnableIn, ClockIn, DataIn.
- 10 Drivers: EnableOut, ClockOut, DataOut[0..7]
No LVDS drivers are available in the IBM CMOS8 ARM IO libraries → must be designed in-house (full-custom)
Requirements:
- ~500 Mbps
- Minimum power consumption
- Dual power (VDDio: 2.5V and VDDcore: 1.2-1.5V): use of thick and thin oxide FETs
- Radiation hard: ELT for the thick oxide NFETs (a new extraction tool for ASSURA LVS is available)
- Must fit in the standard CMOS8 ARM IO MA pad size for compatibility with the digital design flow (73 x 247 µm)
LVDS Driver
- Based on the 0.25 µm LVDS driver from Paulo Moreira (CERN)
- Added auto-bias circuitry
- Monte-Carlo simulation with process and mismatch corners, RLC wire-bond parasitics and VDDcore = 1.5 V @ 500 MHz
- Radiation hard

Parameter | Value
VOUT Low | 1 V
VOUT High | 1.4 V
VOUT Common | 1.2 V
Number of drivers | 10
Maximum operating frequency | 500 MHz
Power supplies | DVDD = 2.5 V, VDD = 1.2-1.5 V
Power consumption per channel | ~11.25 mW (4.5 mA); floating: ~1.25 mW (500 µA)
Maximum power consumption (8 DataOut ports used) | ~112.5 mW (45 mA)
Minimum power consumption (1 DataOut port used) | ~42.5 mW (17 mA)
LVDS Receiver
- Based on a schematic from Miguel Novais (CERN) for a 1.2 Gbit/s receiver
- Self-biased (always on)
- Radiation hard

Parameter | Value
Number of receivers | 8
Maximum operating frequency | 500 MHz
Power supplies | DVDD = 2.5 V, VDD = 1.2-1.5 V
Power consumption per channel | ~2 mW (800 µA)
Total power consumption (8 channels) | ~16 mW (6.40 mA)
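The totals in the driver and receiver tables can be reproduced from the per-channel currents. The sketch below is an illustration, not the designers' calculation: it assumes that EnableOut and ClockOut are always active, that unused DataOut drivers sit in the "floating" state, and that all current is drawn from DVDD = 2.5 V. The function and constant names are hypothetical.

```python
# Reproduce the LVDS power totals from the per-channel figures in the tables.
DVDD_V = 2.5
DRIVER_ACTIVE_MA = 4.5      # per active driver
DRIVER_FLOATING_MA = 0.5    # per floating (unused) driver
N_DRIVERS = 10              # EnableOut, ClockOut, DataOut[0..7]
ALWAYS_ACTIVE_DRIVERS = 2   # assumption: EnableOut and ClockOut
RECEIVER_MA = 0.8           # per receiver, always on
N_RECEIVERS = 8


def driver_current_ma(data_ports):
    """Total driver current for a given number of DataOut ports in use."""
    if not 1 <= data_ports <= 8:
        raise ValueError("data_ports must be between 1 and 8")
    active = ALWAYS_ACTIVE_DRIVERS + data_ports
    floating = N_DRIVERS - active
    return active * DRIVER_ACTIVE_MA + floating * DRIVER_FLOATING_MA


def power_mw(current_ma):
    """Power drawn from DVDD for a given current."""
    return current_ma * DVDD_V


for ports in (1, 8):
    i_ma = driver_current_ma(ports)
    print(f"{ports} DataOut port(s): {i_ma:.1f} mA, {power_mw(i_ma):.1f} mW")

rx_ma = N_RECEIVERS * RECEIVER_MA
print(f"8 receivers: {rx_ma:.2f} mA, {power_mw(rx_ma):.1f} mW")
```

With these assumptions the sketch gives 17 mA and 45 mA for the one-port and eight-port driver configurations and 6.4 mA for the receivers, matching the tabulated values.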
LVDS Layout
- Layout fits in the ARM IO library pitch size: 2 PBAREWIRE cells side by side, which were emptied. Only ESD diodes were kept.
- Both cells have been digitally characterized and included in the digital design flow.
- All thick oxide NFETs have been laid out as ELTs
[Figure: LVDS_RX and LVDS_TX layout, 146 µm x 247 µm]
E-Fuses
- IBM stopped providing the laser-blown fuses: “The “laser fuses” cost more, occupy more area than the e-FUSE, function at the wafer level only, and prohibit placement of circuits below and wiring above the fuse”.
- E-fuses are made by electronically “burning” a salicided polysilicon strip. Burning changes the resistance from ~100 Ω to >5 kΩ.
- This means:
  - Wafers from IBM will come “blank”
  - Higher system complexity: logic to burn and logic to read
  - Burning needs a 3.3 V power supply
  - Programming transistor current Ion: 10 mA < Ion < 13.5 mA
  - Programming time: > 0.18 ms and < 1.0 ms
- 32 bits included and burned during probe testing
E-Fuse block layout
[Figure: E-fuse block layout, dimension labels 300 µm, 65 µm, 14 µm and 30 µm]
Programming:
- Programming pulse length (> 0.18 ms and < 1.0 ms) is set by a 9-bit register.
- Fuse selection for programming is done through a 5-bit fuse decoder. Only 1 bit is burned at a time.
Reading:
- All e-fuses are read at once
- MC simulations show a sense threshold of 500 Ω ± 100 Ω
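The burn-one-bit-at-a-time and read-all-at-once behaviour can be illustrated with a small behavioural model. This is only a sketch: the class and method names are hypothetical, the mapping "burned fuse reads as 1" is an assumption, and the burned resistance uses the quoted lower bound of 5 kΩ.

```python
# Behavioural sketch of a 32-bit e-fuse block (not the actual circuit).
UNBURNED_OHM = 100.0         # "~100 Ω" before burning
BURNED_OHM = 5000.0          # ">5 kΩ" after burning (lower bound)
SENSE_THRESHOLD_OHM = 500.0  # nominal sense threshold from MC simulation
N_FUSES = 32                 # addressed by the 5-bit fuse decoder


class EFuseBlock:
    def __init__(self):
        self.resistance = [UNBURNED_OHM] * N_FUSES

    def burn(self, address):
        """Burn a single fuse; the 5-bit address selects one of 32."""
        if not 0 <= address < N_FUSES:
            raise ValueError("address must fit in 5 bits")
        self.resistance[address] = BURNED_OHM  # irreversible in hardware

    def program(self, chip_id):
        """Burn the set bits of chip_id, one fuse at a time."""
        for address in range(N_FUSES):
            if (chip_id >> address) & 1:
                self.burn(address)

    def read(self):
        """Read all fuses at once by comparing against the threshold."""
        value = 0
        for address, ohms in enumerate(self.resistance):
            if ohms > SENSE_THRESHOLD_OHM:  # assumption: burned reads as 1
                value |= 1 << address
        return value


block = EFuseBlock()
block.program(0xDEADBEEF)
print(hex(block.read()))
```

Because a burned fuse cannot be restored, programming a second value can only add bits to what is already stored, which is why the wafers arrive "blank" and each chip is written once during probe testing.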
Analog periphery: Band-Gap
- Medipix3 includes a band-gap voltage reference (designed and tested by P. Moreira)
- The forward voltage of one of the band-gap diodes is used to monitor the temperature
- Power supply sensitivity: 1.2 mV per V of AVDD
- Temperature sensitivity: 0.1 mV/ºC
- The output of the band-gap is used by the on-chip DACs to generate their output with minimal temperature and power supply dependence
[Figure: measured output voltage [V] (axis 0.4 to 1) versus temperature [ºC] (axis -100 to 200), with linear fit y = -0.0016x + 0.7779, R² = 0.9986]
[Figure: band-gap layout, 1000 µm x 235 µm]
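The linear fit quoted with the plot (y = -0.0016x + 0.7779) can be inverted to estimate temperature from a measured voltage. The sketch below assumes that the fitted curve is the temperature-monitor diode voltage, which the slide does not state explicitly; the function names are illustrative.

```python
# Invert the linear fit from the plot: V = SLOPE * T + INTERCEPT.
SLOPE_V_PER_C = -0.0016   # fit slope, volts per degree Celsius
INTERCEPT_V = 0.7779      # fit intercept, volts at 0 degrees Celsius


def voltage_from_temperature(temp_c):
    """Fitted output voltage at a given temperature."""
    return SLOPE_V_PER_C * temp_c + INTERCEPT_V


def temperature_from_voltage(volts):
    """Temperature estimate from a measured voltage, using the same fit."""
    return (volts - INTERCEPT_V) / SLOPE_V_PER_C


v_room = voltage_from_temperature(25.0)
print(f"Fitted voltage at 25 C: {v_room:.4f} V")
print(f"Recovered temperature: {temperature_from_voltage(v_room):.1f} C")
```

A slope of about -1.6 mV/ºC means a 1 mV measurement error corresponds to roughly 0.6 ºC, so the readout resolution of the monitoring line sets the achievable temperature resolution.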
Analog periphery: DACs
- There are 25 DACs on-chip: 10 x 9-bit and 15 x 8-bit DACs
- 18 linear current and 7 linear voltage output DACs
- Power supply sensitivity: 1 LSB per 250 mV of AVDD
- Temperature sensitivity: 1 LSB per 25 ºC
- Transistor current matching in 0.13 µm is ~2 times worse than in 0.25 µm -> bigger transistors for the same current copy. Why? Different substrate resistivity!
[Figure: DAC layouts, 230 µm x 235 µm and 450 µm x 235 µm]
End Of Column
- There is 1 End of Column cell per column, realized with the MRE flow
- It includes column buffering and the CTPR and DACs registers
- Cell has been tested successfully on all corners up to 750 MHz
- Medipix2/Timepix EndOfColumn + buffering was ~200 µm x 34 µm (75% bigger)
- Propagation delay <3 ns from bottom to top of the column (using a metal width of 400 nm and a total line capacitance of 3 pF)
[Figure: End of Column cell layout, 22 µm x 68.4 µm, with column lines labelled <3 ns]
On-chip test pulse
- There are 2 independent test pulse circuits per column in order to test the charge summing circuitry (TP_1 and TP_2)
- The test pulse amplitude range is controlled by 3 voltage DACs (TP_REFA, TP_REFB and TP_REF)
- The test pulse frequency is controlled by LVDS input TP_SWITCH
[Figure: test pulse circuit with TP_REF, TP_REFA, TP_REFB, TP_1, TP_2 and TP_SWITCH]

Parameter | Value
Channels per column | 2
Minimum step | 2.5 mV → 86.2 e-
Linear dynamic range [± 1%] | 925 mV → ~32 ke-
TP_REF min | 8’b0100_0001 → ~325 mV
TP_REF max | 8’b1111_1010 → ~1250 mV
TP_REFA and TP_REFB min | 9’b0_1000_0010 → ~312.5 mV
TP_REFA and TP_REFB max | 9’b1_1111_0100 → ~1250 mV
Typical Rise/Fall time | < 50 ns
Current consumption | CTPR bit enabled: ~500 µA @ default DAC values; CTPR bit disabled: negligible
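The conversion between test pulse amplitude and injected charge follows from the quoted minimum step (2.5 mV → 86.2 e-). The sketch below assumes the charge scales linearly with amplitude (Q = C·V) and infers a 5 mV LSB for the 8-bit TP_REF DAC from the two TP_REF rows of the table; the names and the derived capacitance are illustrative, not values stated on the slide.

```python
# Convert test pulse amplitude to injected charge, from the tabulated step.
STEP_MV = 2.5                 # minimum step from the table
ELECTRONS_PER_STEP = 86.2     # charge injected per minimum step
ELEMENTARY_CHARGE_C = 1.602176634e-19
TP_REF_LSB_MV = 5.0           # assumption: inferred from the TP_REF min/max rows


def electrons_from_mv(amplitude_mv):
    """Injected charge in electrons for a given pulse amplitude."""
    return amplitude_mv / STEP_MV * ELECTRONS_PER_STEP


def implied_test_capacitance_ff():
    """Test capacitance implied by Q = C * V, in femtofarads."""
    charge_c = ELECTRONS_PER_STEP * ELEMENTARY_CHARGE_C
    return charge_c / (STEP_MV * 1e-3) * 1e15


def tp_ref_mv(code):
    """Nominal TP_REF output for an 8-bit DAC code."""
    if not 0 <= code <= 255:
        raise ValueError("TP_REF is an 8-bit DAC")
    return code * TP_REF_LSB_MV


print(f"925 mV -> {electrons_from_mv(925.0):.0f} e-")
print(f"Implied test capacitance: {implied_test_capacitance_ff():.2f} fF")
print(f"TP_REF min/max: {tp_ref_mv(0b01000001)} mV / {tp_ref_mv(0b11111010)} mV")
```

The 925 mV linear range works out to about 31.9 ke-, in line with the "~32 ke-" entry, and the implied injection capacitance is a little over 5.5 fF.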
Medipix3 Periphery
- IO Logic
  - Includes ~1 nF on-chip decoupling capacitance between VDD and VSS
  - Synthesized using MRE
  - Powered through VDD/VSS
- E-fuses
  - 32 bits
  - Enclosed NMOS transistors used for improved radiation hardness
  - Powered through VDDA33/VDDA/VSSA
- EoColumn and TPulse buffer
  - 1 EoC per column (VDD/VSS)
  - 2 TpC per column (VDDA/VSSA)
- BG and DACs (25)
  - BG from P. Moreira with temperature sensor
  - 10 9-bit DACs and 15 8-bit DACs
  - Powered through VDDA/VSSA
- IO pads
  - ARM power pads used
  - LVDS IN/OUT and SenseOut pads use enclosed NMOS transistors for improved radiation hardness
  - All pads include TSV connection
  - DVDD/DVSS, VDD/VSS, VDDA/VSSA
IO Pads strategy
- The IO power pads used are from the GPIO MA CMOS8 ARM library
- This library includes full ESD protection circuitry
- Two types of bonding possible:
  - Wire bonding (WB)
  - Through Silicon Via (TSV)
[Figure: IO pad layout. IBM IO 130 nm pads (73 µm x 247 µm) with wire-bond (WB) pad extensions (1000 µm) and TSV landing pads in M1; one further dimension label of 70 µm]

Pads | Type | Min pitch | Number
Bottom | WB | 73 µm | 110
Bottom | TSV | 92 µm | 108
Top | WB | 146 µm | 84
Top | TSV | 146 µm | 84
Bottom left corner
- The center of the first active pixel is at 1804 µm with WB extensions or at 804 µm with TSV.
- Row of sensor guard-ring connected to VSSA.
- Alignment marks in the four corners of the chip.
- Through-via High-Voltage pads in the four corners of the chip.
- Logo details:
[Figure: bottom left corner of the chip with logo detail, dimension labels 1000 µm and 800 µm]
Through Silicon Via (TSV)
- Through Silicon Via (TSV) technology is a vertical electrical connection passing completely through a silicon wafer or die.
- The connection to the PCB is then done through BGA -> dead area due to WB is eliminated!
- Typical state-of-the-art TSVs in a 50 µm thinned wafer are 35 µm diameter vias with a 60 µm minimum pitch.
[Figure: Timepix to BGA using TSV (Z. Vykydal)]
Medipix3 TSV landing pads
- The TSV landing pads are laid out in M1 with an octagonal shape of 70 µm diameter.
- There are 108 in-line TSV pads at the bottom and 84 in-line TSV pads at the top.
- There are 4 rectangular TSVs for the High Voltage connection to the back of the detector, either via BB or WB.
[Figure: TSV landing pad layout (dimension labels 70 µm, 78 µm and 135 µm) and chip outlines of 14100 µm x 15900 µm, 14100 µm x 15300 µm and 14100 µm x 14900 µm]

Chip | X [µm] | Y [µm] | Active area
Medipix2 and Timepix | 14111 | 16120 | 87.1%
Medipix3 top and bottom WB | 14100 | 17300 | 81.2%
Medipix3 bottom WB | 14100 | ~15900 | 88.4%
Medipix3 top and bottom TSV | 14100 | ~15300 | 91.9%
Medipix3 bottom TSV | 14100 | ~14900 | 94.3%
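The active-area percentages in the table can be approximated as the ratio of the sensitive matrix area to the diced chip area. The sketch below assumes a 256 x 256 matrix of 55 µm pixels (the row count is an assumption) and treats the "~" chip heights as exact, so agreement with the quoted percentages is expected only to within about 0.1 percentage point.

```python
# Estimate the active-area fraction for each dicing option in the table.
PIXEL_PITCH_UM = 55.0
PIXELS_PER_SIDE = 256  # assumption: 256 x 256 matrix

# name: (X [um], Y [um], quoted active area [%])
CHIPS = {
    "Medipix2 and Timepix": (14111.0, 16120.0, 87.1),
    "Medipix3 top and bottom WB": (14100.0, 17300.0, 81.2),
    "Medipix3 bottom WB": (14100.0, 15900.0, 88.4),
    "Medipix3 top and bottom TSV": (14100.0, 15300.0, 91.9),
    "Medipix3 bottom TSV": (14100.0, 14900.0, 94.3),
}


def active_fraction(x_um, y_um):
    """Sensitive matrix area divided by total chip area."""
    side_um = PIXEL_PITCH_UM * PIXELS_PER_SIDE
    return side_um * side_um / (x_um * y_um)


for name, (x_um, y_um, quoted) in CHIPS.items():
    computed = 100.0 * active_fraction(x_um, y_um)
    print(f"{name}: computed {computed:.2f}%, quoted {quoted}%")
```

The comparison makes the benefit of the dicing options explicit: removing the top pad row and replacing wire-bond extensions with TSV pads shrinks only the Y dimension, while the sensitive area stays fixed.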
- Top Metal (MA) and passivation opening (DV) displayed
- Multiple dicing cuts depending on:
  - Top power connection
  - WB or TSV bonding
[Figure: two full-chip layouts, each 14100 µm x 17300 µm]
Medipix3 DRC
- CALIBRE DRC™ used
- Many DRC errors still present due to:
  - ZVT enclosed layout gates in LVDS Tx
  - MIM cap area -> Vmax ≥ 6 V
  - MQ to gate RX diode per pixel (GR131f)
  - Bump bonding openings in the pixel
  - Multi-dice options
- These DRC errors were sent to IBM for waiver clearance
- Answer from IBM: “the design can be manufactured but CERN accepts entirely the risks involved by violating the specific design rule”
Medipix3 LVS
- CALIBRE (from Mentor Graphics) was used for the first time in the group for doing LVS inside Cadence -> lots of manual reading!
- This is a true hierarchical tool. ASSURA?
- The final LVS ran for ~10 h on an 8-core CPU with 16 GB.
- Chip completed!!!
Conclusion
- The Medipix3 chip is the first 130 nm engineering run organized through the CERN HEP service.
- The Medipix3 prototype demonstrated the principle of local communication between pixels to solve charge sharing effects.
- From there it still took 4 man-years (3 people) to complete the design. Why?
  - Change of BEOL technology from the prototype
  - New programmable counter
  - Many unavailable blocks (DACs, LVDS driver and receiver, e-fuse bits, …)
  - Use of new tools for the first time (MRE, CALIBRE LVS, …)
- Experience gained should reduce design time for future projects
- Chip was sent to IBM on 24th September!!!