6
Ball grid array (BGA) solder joint intermittency real-time detection N. Roth a, * , W. Wondrak a , A. Willikens a , J. Hofmeister b a Daimler AG, 71059 Sindelfingen, Germany b Ridgetop Group, Tucson/Arizona, USA article info Article history: Received 12 July 2008 abstract This paper presents test results and specifications for SJ BIST (solder joint built-in-self-test), an innovative sensing method for detecting faults in solder-joint networks that belong to the I/O ports of field program- mable gate arrays (FPGAs), especially in ball grid array packages. It is well known that fractured solder joints typically maintain sufficient electrical contact to operate correctly for long periods of time. Subse- quently, the damaged joint begins to exhibit intermittent failures: the faces of a fracture separate during periods of stress, causing incorrect FPGA signals. SJ BIST detects faults at least as low as 100 X with zero false alarms: minimum detectable fault period is one-half the period of the FPGA clock – guaranteed detection is two clock periods. SJ BIST correctly detects and reports instances of high-resistance with no false alarms: test results are shown in this paper. Being able to detect solder-joint faults in FPGAs increases fault coverage and health management capabilities, and provides support for condition-based and reliability-centered maintenance. Ó 2008 Elsevier Ltd. All rights reserved. 1. Introduction This paper presents test results and specifications for SJ BIST (solder joint built-in-self-test), which is an innovative sensing method for detecting faults in solder-joint networks that belong to the I/O ports of field programmable gate arrays (FPGAs), espe- cially FPGAs in ball grid array (BGA) packages, such as a XILINX Ò FG1156 [1–5]. FPGAs are widely used as controllers in aerospace and automotive applications, and being able to detect solder-joint faults increases both fault coverage and health management capa- bilities and support for condition-based and reliability-centered maintenance. As both the pitch between the solder balls of the sol- der joints of BGA packages and the diameter of the solder balls de- crease, the importance a real-time solder-joint fault sensor for FPGAs increases. SJ BIST is the first known for detecting high-resis- tance faults in solder-joint networks of operational FPGAs. The current version of SJ BIST is a verilog-based, two-pin test group core intended to be incorporated within an end-use applica- tion in the FPGA. Test assemblies of printed wire (circuit) boards (PWBs) with programmed FPGAs have been assembled, developed and are being tested to failure at the Daimler Research Center in Germany and at Ridgetop Group facility in Tucson, Arizona. The program for the test boards contains temporary data collec- tion routines for statistical analysis and uses less than 250 cells out of over 78,000 cells for a 5-million gate, 1156-pin FPGA to test 8 pins. Initial testing indicates SJ BIST is capable of detecting high- resistance faults of 100 X or lower and which last one-half of a clock period or longer. In addition to producing no false alarms in the current period of testing, no additional failure mechanisms are introduced to an assembly because, with the exception of fault reporting and handling, SJ BIST is (1) not invasive to an application program; (2) is not compute intensive and so timing problems are not anticipated; and (3) any failures in SJ BIST itself is a failure in the FPGA or the board. Therefore, either a real failure is reported or no failure is reported and neither is a false alarm. 1.1. Mechanics of failure Solder-joint fatigue damage caused by thermo-mechanical and shock stresses is cumulative and manifests as voids and cracks, which propagate in number and size. Eventually, the solder-joint fractures [6–9] and FGPA operational failures occur. An illustration of a damaged solder joint (or bump) on the verge of fracturing is shown in Fig. 1. Fractured bumps, during periods of increasing stress, tend to momentarily open and cause hard-to-diagnose, intermittent faults of high resistance of hundreds of Ohms [9–11]. Such faults typi- cally last for periods of hundreds of nanoseconds, or less, to more than 1 ls [6,10,12–15]. These intermittent faults increase in frequency as evidenced by a practice of logging BGA package failures only after multiple events of high-resistance: an initial event followed by some num- ber (for example, 2–10) of additional events within a specified per- iod of time, such as 10% of the number of cycles of the initial event [13–15]. Even then, an intermittent fault in a solder-joint network 0026-2714/$ - see front matter Ó 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.microrel.2008.07.064 * Corresponding author. Tel.: +49 7031 4389 398. E-mail address: [email protected] (N. Roth). Microelectronics Reliability 48 (2008) 1155–1160 Contents lists available at ScienceDirect Microelectronics Reliability journal homepage: www.elsevier.com/locate/microrel

Ball grid array (BGA) solder joint intermittency real-time detection

  • Upload
    j

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Ball grid array (BGA) solder joint intermittency real-time detection

Microelectronics Reliability 48 (2008) 1155–1160

Contents lists available at ScienceDirect

Microelectronics Reliability

journal homepage: www.elsevier .com/locate /microrel

Ball grid array (BGA) solder joint intermittency real-time detection

N. Roth a,*, W. Wondrak a, A. Willikens a, J. Hofmeister b

a Daimler AG, 71059 Sindelfingen, Germanyb Ridgetop Group, Tucson/Arizona, USA

a r t i c l e i n f o a b s t r a c t

Article history:Received 12 July 2008

0026-2714/$ - see front matter � 2008 Elsevier Ltd. Adoi:10.1016/j.microrel.2008.07.064

* Corresponding author. Tel.: +49 7031 4389 398.E-mail address: [email protected] (N.

This paper presents test results and specifications for SJ BIST (solder joint built-in-self-test), an innovativesensing method for detecting faults in solder-joint networks that belong to the I/O ports of field program-mable gate arrays (FPGAs), especially in ball grid array packages. It is well known that fractured solderjoints typically maintain sufficient electrical contact to operate correctly for long periods of time. Subse-quently, the damaged joint begins to exhibit intermittent failures: the faces of a fracture separate duringperiods of stress, causing incorrect FPGA signals. SJ BIST detects faults at least as low as 100 X with zerofalse alarms: minimum detectable fault period is one-half the period of the FPGA clock – guaranteeddetection is two clock periods. SJ BIST correctly detects and reports instances of high-resistance withno false alarms: test results are shown in this paper. Being able to detect solder-joint faults in FPGAsincreases fault coverage and health management capabilities, and provides support for condition-basedand reliability-centered maintenance.

� 2008 Elsevier Ltd. All rights reserved.

1. Introduction

This paper presents test results and specifications for SJ BIST(solder joint built-in-self-test), which is an innovative sensingmethod for detecting faults in solder-joint networks that belongto the I/O ports of field programmable gate arrays (FPGAs), espe-cially FPGAs in ball grid array (BGA) packages, such as a XILINX�

FG1156 [1–5]. FPGAs are widely used as controllers in aerospaceand automotive applications, and being able to detect solder-jointfaults increases both fault coverage and health management capa-bilities and support for condition-based and reliability-centeredmaintenance. As both the pitch between the solder balls of the sol-der joints of BGA packages and the diameter of the solder balls de-crease, the importance a real-time solder-joint fault sensor forFPGAs increases. SJ BIST is the first known for detecting high-resis-tance faults in solder-joint networks of operational FPGAs.

The current version of SJ BIST is a verilog-based, two-pin testgroup core intended to be incorporated within an end-use applica-tion in the FPGA. Test assemblies of printed wire (circuit) boards(PWBs) with programmed FPGAs have been assembled, developedand are being tested to failure at the Daimler Research Center inGermany and at Ridgetop Group facility in Tucson, Arizona.

The program for the test boards contains temporary data collec-tion routines for statistical analysis and uses less than 250 cells outof over 78,000 cells for a 5-million gate, 1156-pin FPGA to test 8pins.

ll rights reserved.

Roth).

Initial testing indicates SJ BIST is capable of detecting high-resistance faults of 100 X or lower and which last one-half of aclock period or longer. In addition to producing no false alarmsin the current period of testing, no additional failure mechanismsare introduced to an assembly because, with the exception of faultreporting and handling, SJ BIST is (1) not invasive to an applicationprogram; (2) is not compute intensive and so timing problems arenot anticipated; and (3) any failures in SJ BIST itself is a failure inthe FPGA or the board. Therefore, either a real failure is reportedor no failure is reported and neither is a false alarm.

1.1. Mechanics of failure

Solder-joint fatigue damage caused by thermo-mechanical andshock stresses is cumulative and manifests as voids and cracks,which propagate in number and size. Eventually, the solder-jointfractures [6–9] and FGPA operational failures occur. An illustrationof a damaged solder joint (or bump) on the verge of fracturing isshown in Fig. 1.

Fractured bumps, during periods of increasing stress, tend tomomentarily open and cause hard-to-diagnose, intermittent faultsof high resistance of hundreds of Ohms [9–11]. Such faults typi-cally last for periods of hundreds of nanoseconds, or less, to morethan 1 ls [6,10,12–15].

These intermittent faults increase in frequency as evidenced bya practice of logging BGA package failures only after multipleevents of high-resistance: an initial event followed by some num-ber (for example, 2–10) of additional events within a specified per-iod of time, such as 10% of the number of cycles of the initial event[13–15]. Even then, an intermittent fault in a solder-joint network

Page 2: Ball grid array (BGA) solder joint intermittency real-time detection

Fig. 1. Crack propagation at the top and bottom of a solder joint, 15 mm BGA [7,22].

1156 N. Roth et al. / Microelectronics Reliability 48 (2008) 1155–1160

might not result in an operational fault. For example, the faultmight be in a ground or power connection; or it might occur duringa period when the network is not being written; or it might be tooshort in duration to cause a signal error.

Damage accumulates and eventually there is a catastrophic fail-ure, such as might happen when a solder ball becomes displaced asdepicted in Fig. 2.

Fig. 3 represents HALT test results performed on XILINX FG1156Daisy Chain packages in which 30 out of 32 tested packages failedin a test period consisting of 3108 HALT cycles. Each temperature

Fig. 2. Intermittent failure caused by fractured solder joint and vibration stress.

Fig. 3. XILINX FPGA HALT test results [14].

cycle of the HALT was a transition from �55�C to 125�C in30 min: 3-min ramps and 12-min dwells. What is not immediatelyapparent is that each of the logged FPGA failures (diamond sym-bols) represents at least 30 events of high resistance: a FAIL wasdefined as being at least two OPENS within the same temperaturecycle. A single OPEN in any temperature cycle was not counted as aFAIL event; and the package was deemed as failed only after 15FAILS [14] had been logged.

1.2. State of the art

The use of leading indicators of failure for prognostication ofelectronics has been previously demonstrated [16–19]. One impor-tant reason for using an in situ solder-joint fault sensor is thatstress magnitudes are hard to derive, much less keep track of[20]; another reason is that even though a particular damaged sol-der joint might not result in immediate FPGA operational faults,the fault indicates the FPGA is likely to have, or will soon have,other damaged I/O ports – the FPGA is no longer reliable. In a man-ufacturing environment, SJ BIST addresses a concern that failuremodes caused by the PWB-FPGA assembly are not being detectedduring component qualification [11].

FPGAs are not amenable to the measurement techniques typi-cally used in manufacturing reliability tests such as highly acceler-ated life tests (HALTs) [9]. This is because, for example, a 4-pointprobe measurement requires devices to be powered-off; and be-cause FPGA I/O ports are digital, rather than analog, circuits.

2. SJ BIST

SJ BIST requires the attachment of a small capacitor to two un-used I/O ports as near as possible to a corner of the package. Fig. 4is a block diagram of an FPGA containing SJ BIST with a capacitorconnected to two I/O pins. SJ BIST writes a logical ‘1’ to chargethe capacitor and then reads the voltage across the charged capac-itor. If the solder-joint network of the I/O port of the FPGA isundamaged, the write causes the capacitor to be fully chargedand a logical ‘1’ is read by SJ BIST. When the solder-joint network

FPGA

SJBIST

SOLDER BALL

CAPACITOR VOLTAGE

FAULT

Fig. 4. SJ BIST block diagram.

Page 3: Ball grid array (BGA) solder joint intermittency real-time detection

Fig. 5. Capacitor signal with an initiated 300 X fault. Fig. 6. Clock frequency, external capacitor value and detectable fault resistance.

N. Roth et al. / Microelectronics Reliability 48 (2008) 1155–1160 1157

is damaged, the effective resistance of the SJ network increases, thecharging time constant increases, the write fails to sufficientlycharge the capacitor, and a logical ‘0’ instead of a logical ‘1’ is read:a fault occurs, is detected by SJ BIST and is reported.

2.1. Test results

Tests have been run at various clock frequencies ranging from10 KHz to 100 MHz and for varying solder-joint network condi-tions ranging from no fault to over 100 X of fault resistance: Allfaults of 100 X or larger were detected, and there were no falsealarms.

Referring to Fig. 5, voltages on a 47 pF capacitor connected to agroup of two I/O ports are shown. The red1 market arrows showsthe capacitor voltage when the resistance is increased to 300 X, awrite fault occurs. SJ BIST correctly reported the fault and incre-mented the fault count for that I/O pin.

2.2. Clock frequency, capacitor value and fault resistance

The relationship between the frequency of the FPGA clock, thevalue of the connected capacitor to each group of two I/O pins,and the detectable fault resistance is shown in Fig. 6.

2.3. SJ BIST sensitivity and resolution

For a given clock frequency, there is a capacitor value that willcause SJ BIST to always detect a fault resistance at least as low as100 X with a duration of at least two clock periods. Faults of short-er duration are detectable when they occur at the beginning of thewrite and last for about one-half of a clock period.

For clock frequencies of one-half or higher of the maximumclock frequency of the FPGA, the capacitance of the I/O port is suf-ficient – no external capacitance needs to be connected for SJ BISTto correctly detect faults with no alarms.

3. Intermittency mitigation

SJ BIST is a very useful sensor for mitigating intermittencies.Early detection of failure of an unused I/O pin allows the electronicboard to be replaced before subsequent fatigue damage causes an

1 For interpretation of color in Figs. 5, 7 and 8 the reader is referred to the webversion of this article.

application I/O pin to fail, and therefore intermittent operationalanomalies are avoided.

3.1. Intermittent behavior of fractured solder joints

As previously seen in Fig. 2, fractured solder joints tend to main-tain electrical contact until periods of increasing stress occur: ahard open longer than a millisecond did not occur until any shocksand vibrations [22].

3.2. Intermittent confirmation and prognostic warning

Detection of intermittent faults can be used to confirm that theelectronic board with that FPGA is a likely candidate for replace-ment to address reported operational anomalies. In the absenceof any reported operational anomalies, detected faults can be usedas a prognostic warning.

4. Pin selection

We recommend a deployed SJ BIST application would mostlikely use at least four groups of cores: one group of two I/O pinsnear each corner of an FPGA (the pink-shaded pins) because re-search indicates the solder balls nearest the corner are most likelyto fail first.

4.1. FEM simulation

Fig. 8 shows a finite element simulation with ANSYSTM underthermo-mechanical stress (based on Tref = 125�C as least strain,the X–Y Mises-Strain have the maximum at �55 �C).

High stress caused by physical mounting coupled with thermo-mechanical stresses is the most likely cause of the observed failuredistribution of solder balls in BGA packages.

The next-most likely solder balls to fail are those under the dieof the FPGA. Evidence of this failure distribution is the reserving ofthe four I/O pins at each corner for ground and all of the pins underthe dies for power and ground (the orange-shaded pins).

5. Test activities

Extensive experiments, including HALTs, have been plannedand are presently being conducted. The primary objectives arethe following: (1) perform final sensitivity, resolution and clock

Page 4: Ball grid array (BGA) solder joint intermittency real-time detection

1158 N. Roth et al. / Microelectronics Reliability 48 (2008) 1155–1160

frequency measurements to update Fig. 6; (2) collect, evaluate andpublish statistical data related to test I/O port location, first failureand probability of failure distribution. The pink- and light-blue-shaded pins in Fig. 7 are the 64 pins we selected for HALT testing:32 groups of two-pin SJ BIST cores.

5.1. HALT test board

Fig. 9 is a block diagram of the HALT test board we designed.The SJ BIST test program is loaded into a PROM, which then loadsthe program into each FPGA when the board is powered on. EachFPGA generates 640 bits of data (64 � (2 faults signals + 8 bits ofcount)); each board generates 2560 bits of data. We wrote a Lab-VIEW� program to control the collection of data, and we wrote aMATLAB� program to process the data.

Fig. 10 shows a manufactured, populated and soldered HALT testboard. There are three connectors: (1) XILINX programmer connec-tion, (2) power input and (3) experiment control connections.

Fig. 7. FG1156 I/O pin footprint [21].

Fig. 8. Thermo-mechani

Fig. 9. HALT test board block diagram.

Fig. 10. HALT experiment board with four XILINX FG1156 FPGAs.

The test boards are placed in a thermo oven and cycled from�55 �C to 125 �C in 5 min ramps and 25 min dwells. The boardsare subjected to random cyclic shock with variable frequencysuperimposed with random vibrations. Data is collected every

cal FEM simulation.

Page 5: Ball grid array (BGA) solder joint intermittency real-time detection

N. Roth et al. / Microelectronics Reliability 48 (2008) 1155–1160 1159

30 min. This procedure is repeated for each test board until theboards are damaged.

6. Results

After 2000 of thermo-cycles between�55�C and 125 �C the BGAPackage was pulled down from the PCB. The picture in Fig. 11shows the bottom of the BGA Package.

6.1. Left picture

The left cracked ball have a small remaining contact area, so it issufficient for an electrical contact (was not detected with SJBIST).The right cracked ball is completely separated between the padand the ball (was detected with SJBIST).

6.2. Right picture

Shows remaining bumps together with broken bumps.

Fig. 11. Rupture after pulled down the BGA package.

Fig. 12. Rupture after pulled down the BGA package.

Fig. 13. Grinding of a cracked and detected bump.

The effective contact area is getting smaller with increasingnumber of cycles. That’s similar with the model in the left pictureof Fig. 1.

Useful for comparison is the rupture of an unstressed FPGA afterpull down. In Fig. 12 all bumps remain on the package site (leftpicture).

The bumps on the PCB site (right picture) pulled down com-pletely with the PCB pads and wires, the soldering process by Xi-linx on the BGA package site should be ok.

Fig. 13 present a grinding from a detected bump. The crackpropagation starts at the edge and propagates along the phase ofintermetalisation.

7. Summary and conclusion

In this paper we provided updated information on SJ BIST. Abrief overview of the mechanics-of-failure was included: the pri-mary contributor to fatigue damage is thermo-mechanical stressesrelated to CTE mismatches, shock and vibration, and power on–offsequencing. Solder-joint fatigue damage can result in fractures thatcause intermittent instances of high-resistance spikes that arehard-to-diagnose. In reliability testing, OPENS (faults) are oftencharacterized by spikes of a 100 X or more lasting for less than100 ns–1 ls or longer.

Prior to SJ BIST, there were no known methods for detectinghigh-resistance faults in solder-joint networks belonging to theI/O ports of operational, fully-programmed FPGAs.

An in situ SJ BIST to test or monitor selected I/O pins is usefulbecause stress magnitudes are hard to derive, which leads to inac-curate life expectancy predictions; and even though a particulardamaged solder-joint port might not result in immediate FPGAoperational failure, the damage indicates the FPGA is no longerreliable. SJ BIST can also be used in newly designed manufacturingreliability tests to investigate failure modes related to the PWB-FPGA assembly.

References

[1] Hofmeister JP, Lall P, Graves Russ. In-situ, real-time detector for faults in solderjoints belonging to operational, fully programmed FPGAs. In: Proceedings, IEEEAUTOTESTCON 2006, Anaheim, CA, September 18–21, 2006. p. 237–43.

[2] Hofmeister JP, Lall P, Graves R. In-situ, real-time detector for faults in solder-joint networks belonging to operational, fully programmed fieldprogrammable gate arrays (FPGAs). In: IEEE instrumentation andmeasurement magazine, August, 2007. p. 32–7.

[3] Pradeep Lall, Prakriti Choudhary, Sameep Gupte, Jeff Suhling, JamesHofmeister, Justin Judkins, Douglas Goodman. Statistical pattern recognitionand built-in reliability test for feature extraction and health monitoring ofelectronics under shock loads. In: 57th IEEE electronic components andtechnology conference, June 1, 2007.

[4] James P, Hofmeister, Pradeep Lall, Edgar Ortiz, Jeremy Ralston-Good, DouglasGoodman. Prognostic solder-joint sensors for programmed FPGAs. In: CMSEconference 2007, Components Technology Institute Inc., Los Angeles, CA.,March 12–15, 2007. p. 159–68.

[5] James P. Hofmeister, Pradeep Lall, Edgar Ortiz, Doug Goodman, Justin Judkins.Real-time detection of solder-joint faults in operational field programmablegate arrays. In: IEEE aerospace conference 2007, Big Sky, MT, March 4–9, 2007,Track 11-0908. p. 1–9.

[6] Accelerated Reliability Task IPC-SM-785. SMT force group standard, productreliability committee of the IPC. Published by Analysis Tech. Inc.; 2005.www.analysistech.com/event-tech-IPC-SM-785.

[7] Lall P, Islam MN, Singh N, Suhling JC, Darveaux R. Model for BGA and CSPreliability in automotive underhood applications. IEEE Trans Comp PackTechnol 2004;27(3):585–93.

[8] Gannamani R, Valluri V, Sidharth, Zhang M-L. Reliability evaluation of chipscale packages. Advanced micro devices, Sunnyvale, CA, Daisy Chain Samples,Application Note, Spansion, July 2003. p. 4–9.

[9] Sony semiconductor quality and reliability handbook, vol. 2, p. 66–7. RevisedMay 2001. http://www.sony.net/products/SC-HP/tec/catalog (vol. 4, p. 120–9).

[10] Use condition based reliability evaluation: an example applied to ball gridarray (BGA) packages. SEMATECH technology transfer #99083813A-XFR,International SEMATECH; 1999. p. 6.

[11] Comparison of ball grid array (BGA) component and assembly levelqualification tests and failure modes. SEMATECH technology

Page 6: Ball grid array (BGA) solder joint intermittency real-time detection

1160 N. Roth et al. / Microelectronics Reliability 48 (2008) 1155–1160

transfer #00053957A-XFR, International SEMATECH, May 31, 2000.p. 1–4.

[12] Roergren R, Teghall P-E, Carlsson P. Reliability of BGA packages in anautomotive environment. IVF – The Swedish Institute of ProductionEngineering Research, Argongatan 30, SE-431 53 Moelndal, Sweden. http://www.ivf.se [accessed December 25, 2005].

[13] Hodges Popp DE, Mawer A, Presas G. Flip chip PBGA solder-joint reliability:power cycling versus thermal cycling. Motorola Semiconductor ProductsSector, Austin, TX, December 19, 2005.

[14] The reliability report, XILINX, September 1, 2003. p. 225–9. xgoogle.xilinx.com.[15] Clech J-P, Noctor DM, Manock JC, Lynott GW, Bader FE. Surface mount

assembly failure statistics and failure-free times. In: Proceedings, 44th ECTC,Washington, DC, May 1–4, 1994. p. 487–97.

[16] Lall P, Choudhary P, Gupte S. Health monitoring for damage initiation andprogression during mechanical shock in electronic assemblies. In: Proceedingsof the 56th IEEE electronic components and technology conference, San Diego,California, May 30–June 2, 2006. p. 85–94.

[17] Lall P, Hande M, Singh MN, Suhling J, Lee J. Feature extraction and damage datafor prognostication of leaded and leadfree electronics. In: Proceedings of the

56th IEEE electronic components and technology conference, San Diego,California, May 30–June 2, 2006. p. 718–27.

[18] Lall P, Islam N, Suhling J. Leading indicators of failure for prognostication ofleaded and lead-free electronics in harsh environments. In: Proceedings of theASME InterPACK conference, San Francisco, CA, Paper IPACK2005-73426, July17–22, 2005. p. 1–9.

[19] Lall P, Islam MN, Rahim K, Suhling J. Prognostics and health management ofelectronic packaging. IEEE Transactions on Components and PackagingTechnologies, Paper Available in Digital Format on IEEE Explore, March2005. p. 1–12.

[20] Lall P. Challenges in accelerated life testing. In: Inter society conference,Thermal phenomena; 2004. p. 727.

[21] XILINX Fine-pitch BGA (FG1156/FGG1156) package, PK039, vol. 1.2. June 25,2004.

[22] Maia Filhoa WC, Brizoux M, Frémonb H, Danto Y. Improved physicalunderstanding of intermittent failure in continuous monitoring method.ESREF; 2006.