Fast and Scalable Physics-Based Electromigration Checking for … · 2017-12-19 · Fast and...

Fast and Scalable Physics-Based Electromigration Checking for Power

Grids in Integrated Circuits

Sandeep Chatterjee

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Electrical & Computer Engineering

University of Toronto

Abstract

Fast and Scalable Physics-Based Electromigration Checking for Power Grids in Integrated

Circuits

Sandeep Chatterjee

Doctor of Philosophy

Graduate Department of Electrical & Computer Engineering

University of Toronto

Electromigration (EM) is a key reliability concern in chip power/ground (p/g) grids, which

has been exacerbated by the high current levels and narrow metal lines in modern grids. EM

checking is expensive due to the large sizes of modern p/g grids and is also inherently difficult

due to the complex nature of the EM phenomenon. Traditional EM checking is based on

empirical models, but better models are needed for accurate prediction due to the very small

margins between the allowed failure rates (spec) and the failure rates at which the chips actually

operate in the field. Thus, more accurate physics-based EM models have recently been proposed,

but they remain computationally expensive because they require the solution of a system of partial

differential equations (PDEs). In this work, we extend the existing physics-based models for EM

in metal branches to track EM degradation in multi-branch interconnect trees and propose a fast

and scalable methodology for power grid EM verification. We speed up our implementation by

using filtering schemes (that focus the computation only on the most EM susceptible trees) and

by developing optimized numerical methods to solve the PDE system arising out of the physics-

based EM models. The lifetimes found using our physics-based approach are on average 2.35x

longer than those based on a (calibrated) Black’s model, as extended to handle mesh power

grids. With a runtime of only 10 minutes for a 4.1M node grid, our approach is extremely fast

and should scale well for large integrated circuits.

Acknowledgements

When people congratulate me on completing my final defense, I cannot help but look back

at the last 4 years of my life: how rewarding and enriching this journey has been. And it would

not have been possible without the help and support of a lot of people, to whom I would like

to express my sincere gratitude in this acknowledgment.

First and foremost, I would like to thank my supervisor Professor Farid N. Najm, because

without his support and encouragement this work would not have been possible. I have learned

a lot of things from him, which have helped make me a better person overall. I am truly thankful

for his brilliant technical (and non-technical) advice and his thoughtful suggestions. He is the

best supervisor one could hope for, and I consider myself extremely lucky that he chose me as

one of his students.

I would like to thank my committee members Professor Vaughn Betz, Professor Paul Chow,

Professor Sean Hum and Professor Peng Li for taking time to review this work and for providing

me with constructive comments, which have definitely improved the quality of this work. I would

also like to thank Dr. Valeriy Sukharev for providing me with the opportunity to collaborate

with him; I learned a lot from him about the industry and about Armenia! I appreciate the

financial support for this project provided by the University of Toronto, Natural Sciences and

Engineering Research Council (NSERC) of Canada, Mentor Graphics (a Siemens business) and

by Semiconductor Research Corporation (SRC).

I consider myself lucky to have such a good set of friends, whose support and encouragement

made the last 4 years of my life so easy and memorable. I would like to thank Mohammad

Fawaz, my friend and colleague, with whom I shared my masters at the University of Toronto

and now we are both finishing our Ph.D. together. As it turns out, we are also joining the

same company after graduation; let’s hope this path continues in the future too. Many thanks

to Zahi Moudallal, who is a wonderful guy and is an excellent person to go talk to if you are

having problems with mathematical proofs or notation, or in general too. And how can I forget

Abdul-Amir (Abed) Yassine, who is my cubicle neighbor and a fellow geek? We share a

common love for TV series and comic book movies, and I have enjoyed our long and “fruitful”

discussions on all related topics. I hope one day he gets the cubicle he deserves! This

acknowledgment would be incomplete without mentioning my friends: Genevieve Hayden,

Aakar Gupta, Aakash Nigam, Dikshant Sharma, Divyam Beniwal, Balsher Singh Sidhu, Vipin

Mathew, Aapar Agarwal, Ajay Thomas, Monika Patel, Noha Sinno, Mehul Srivastava, Nihal

Anand, Rajeev Acharya, Venkatesh Medabalimi, Hari Sridhar and countless others who have

made this journey exciting. I will never forget the numerous Toronto adventures, hikes, camping

trips, dinners, barbecues, board game nights, late night walks and discussions I had with them.

Also, many thanks to my friends in India for their support and motivation. I wish you all the

best for the future.

My biggest gratitude goes to my parents, Mr. Jitendra Kr. Chatterjee and Mrs. Soma

Chatterjee, for their continued support and encouragement throughout my Ph.D. and for wishing

only the best for me. Thank you, mom and dad, for believing in me, for making me what I am

today, for all that you have done for me and for which I am forever indebted to you. Thank

you again for all the support; this work is dedicated to you.

Lastly, I offer my regards to those whom I might have missed but who supported me in any

respect during the completion of this work.

Contents

1 Introduction

1.1 Motivation

1.2 Contribution

1.3 Organization

2 Background

2.1 Electromigration Basics

2.1.1 Atomic Flux

2.1.2 Void Nucleation Phase

2.1.3 Void Growth Phase

2.1.4 Effective-EM Current

2.2 EM failure Models

2.2.1 Black’s model

2.2.2 Physics-based EM models

2.3 Korhonen’s Model and its adaptations

2.3.1 The Korhonen’s model

2.3.2 Solution for blocking boundary at both ends

2.3.3 Riege Thompson Model

2.3.4 CTHKS Model

2.4 Review of Power Grid EM checking approaches

2.4.1 Industrial EM checking approach

2.4.2 Recent approaches

2.5 Power Grid model

2.6 Partial Differential Equations (PDE)

2.7 Ordinary Differential Equations (ODE)

2.7.1 Runge-Kutta Methods

2.7.2 Linear Multi-Step Methods

2.7.3 Error estimates

2.7.4 Variable time-stepping

2.8 Compact Thermal Models

2.9 State Space Models

2.10 Mean estimation using Monte Carlo random sampling

3 Extended Korhonen’s model

3.1 Introduction

3.2 Interconnect Tree EM analysis

3.2.1 Assigning reference directions

3.2.2 Incorporating thermal stress

3.3 Extending Korhonen’s model to trees

3.3.1 Boundary Laws for junctions

3.3.2 PDE system for a general interconnect tree

3.3.3 Void growth and resistance change

3.4 Solving EKM using IVP formulation

3.4.1 Scaling

3.4.2 Discretization for a tree branch

3.4.3 Boundary Conditions at Diffusion Barrier

3.4.4 Boundary Conditions at Dotted-I junction

3.4.5 Boundary Conditions at T junction

3.4.6 Boundary Conditions at Plus junction

3.5 Verifying EKM and the IVP formulation

3.5.1 Verifying the numerical approach

3.5.2 Verifying the model

3.6 Comparison between EKM and Black’s model

3.7 Importance of Temperature distribution

4 LTI Models for trees

4.1 Introduction

4.2 State Space representation for a tree

4.2.1 Subtrees and Time-spans

4.2.2 LTI system for a subtree

4.2.3 LTI system for pre-void phase

4.2.4 Final State Space representation

4.3 Choosing the value of N

4.4 Justification for the use of effective-EM currents

5 Solution Techniques

5.1 Introduction

5.2 Equivalent Homogeneous LTI system for EKM

5.3 Using BDF formulas

5.3.1 Review of BDF with fixed time-step

5.3.2 Variable coefficient BDF methods

5.4 Applying VCBDF to solve the Homogeneous LTI system

5.5 Computing Matrix Exponential using the Arnoldi process

5.5.1 Motivation

5.5.2 The Arnoldi process

5.5.3 Solving the Homogeneous LTI system

5.6 Solvers that use the matrix exponential

5.6.1 Newton Solver

5.6.2 Predictor

5.7 Experimental Results

6 Power Grid EM Checking

6.1 Introduction

6.2 Early Failures

6.3 Determining Branch Temperatures

6.4 Power Grid EM analysis approaches

6.4.1 Power Grid Model

6.4.2 The Main Approach

6.4.3 Improved performance with Filtering

6.4.4 Parallelization using shared memory

6.5 Experimental Results

6.5.1 Main Approach vs Filtering Approach

6.5.2 Comparison of Performance and Accuracy between the solvers

6.5.3 Black’s Model vs. EKM for grid MTF estimation

6.5.4 Effect of Early Failures

6.5.5 Speed-up due to parallelization

6.5.6 Break-up of time consumed by different tasks in the code

6.5.7 Overall scalability of the approach

7 Conclusions and Future Work

Appendices

A Properties of system matrix A

A.1 Proof of theorem 1

A.2 Special Case

B The math behind the Filtering approach

B.1 Integration details

B.2 Deriving confidence bound on µ

B.2.1 Finding δκζ

B.2.2 Finding δµ′ζ

Bibliography

List of Tables

2.1 Butcher tableau characterizing an m-stage RK formula with built-in error estimates

3.1 Comparison of upstream-to-downstream MTF ratio as reported in [1] and as estimated using EKM.

5.1 Comparison of solver metrics and runtime

6.1 Details of Power Grids used in experiments.

6.2 Table of Physical constants

6.3 Configuration parameters to be used for evaluating all power grid benchmarks

6.4 Notation used to simplify presentation

6.5 Comparison of Power grid MTF obtained using the Main Approach and the Filtering Approach.

6.6 Comparing the performance and accuracy of VCBDF2-VCBDF4 methods for power grid EM checking using RK45 as reference

6.7 Comparison of the RK45 solver (run on the first machine) and the Predictor+Newton solver on the second machine (Quad-core i7@3.4GHz)

6.8 Comparison of power grid MTF as estimated using Black’s model and Extended Korhonen’s model (with VCBDF2 solver).

List of Figures

1.1 Wire lifetime and current density scaling. Figure taken from [2].

2.1 (a) A conventional or late failure, (b) early failure and (c) simple schematic representation for both failures. (a) and (b) taken from [3] and [4], respectively.

2.2 A simple volume element with flux divergence.

2.3 3D stress tensor on a small volume element. For each component, the first subscript/index denotes the direction of the outward normal from the face and the second subscript/index is the direction of the stress acting on that face.

2.4 Schematic for a confined metal line, showing a volume element.

2.5 (a) Stress evolution at different points along the line and (b) stress profile along the line at different time points

2.6 Comparison of stress evolution at cathode of a finite line calculated using Riege-Thompson model and the reference solution (2.11).

2.7 Simple multi-branch interconnect structures.

2.8 Schematic for a typical on-die power grid.

2.9 DC model of a power grid.

2.10 (a) Cuboids resulting from spatial discretization along x, y and z axis with their indices (note that we have not shown cuboids with indices (i, j − 1, k) and (i, j + 1, k) for clarity) and (b) the equivalent electrothermal model for each cuboid. The conductances gxT, gyT and gzT are shared by the neighbouring cuboids.

3.1 Cross sectional schematic of Cu dual damascene interconnects.

3.2 A typical interconnect tree structure.

3.3 A simple 3-terminal tree Td. Dashed arrows denote reference directions.

3.4 Stress profile around a junction immediately after void nucleation.

3.5 For Td, (a) evolution of stress at junctions with time and (b) stress profile with time.

3.6 Tree with a (a) dotted-I junction and (b) T junction.

3.7 (a) Comparing stress evolution for a dotted-I structure as obtained using EKM and the CTHKS model, and (b) the error rate plot with respect to the CTHKS solution.

3.8 (a) Comparing stress evolution for a T-structure as obtained using EKM and the CTHKS model, and (b) the error rate plot with respect to the CTHKS solution.

3.9 Stress profile across the T-structure with time.

3.10 Schematic of a finite line.

3.11 (a) Comparing stress evolution for a finite-line as obtained using EKM and the reference solution, and (b) the error rate plot with respect to the reference solution.

3.12 Comparing the estimated MTF and its 95% confidence bounds as obtained using EKM with the ones reported by Gan et al. [5]. Note that the confidence bounds get tighter as the number of TTF samples is increased.

3.13 (a) Schematic view of the test structure used in [1], and (b) Upstream and downstream configurations as defined with respect to the left via. Both figures taken from [1]. Here, TiN (Titanium Nitride) is used for barrier liner and SiN (Silicon Nitride) is used for capping.

3.14 (a) Initial current density profile for T1 and heat map showing MTFs estimated using (b) Extended Korhonen’s model (MTFekm), (c) Black’s model (MTFblk) and (d) MTFblk − MTFekm. All MTF values are in years.

3.15 (a) Initial current density profile for T2 and heat map showing MTFs estimated using (b) Extended Korhonen’s model (MTFekm), (c) Black’s model (MTFblk) and (d) MTFblk − MTFekm. All MTF values are in years.

3.16 (a) The actual temperature profile and the assumed nominal temperature distribution. Heat map showing MTFs estimated with (b) actual temperature profile (MTFT ), (c) assuming Tm,k = 327.6K for all branches (MTF T ) and (d) MTFT − MTF T . All MTF values are in years.

3.17 Estimated MTF as per EKM using (a) the actual temperature profile, and assuming the temperature to be (b) 315K (c) 327.6K and (d) 340K for all branches. The x-axis for all plots represents the junction IDs. Junctions with MTF ≥ 100 years have not been shown.

4.1 Notion of subtrees and time-spans.

4.2 Error rate plots for LTI models M8-M50 with respect to the reference solution obtained using M64.

4.3 (a) Runtime vs. accuracy trade-off for LTI models with different discretizations and (b) Percentage error in estimated junction void nucleation times for LTI models M8-M50 with respect to M64. Smaller is better.

4.4 The stress evolution at junctions in response to periodic pulsed branch currents and their average (effective) values. The time-periods are (a) 2 months, (b) 1 month, (c) 2 weeks, (d) 1 week and (e) is a random waveform.

4.5 Frequency response of the pre-void LTI system for Td using Bode plots. The LTI system of Td has three outputs and three inputs for the pre-void phase.

4.6 Frequency response of the post-void LTI system for Td using Bode plots. Here, n2 has a void, and is thus a part of both branches b1 and b2. Also, now there are only two inputs because a voided diffusion barrier has no inputs.

5.1 Obtaining the next void nucleation time using the Newton solver.

5.2 Obtaining the next void nucleation time using Predictor.

5.3 Showing part of trees (a) T1 and (b) T2 used for comparing solvers. The orange dots show the junctions.

5.4 (a) Error rate plot for stress evolution at junctions as obtained using VCBDF2-VCBDF6 solvers and expm approximation and (b) the average absolute error with respect to RK45 solver.

5.5 Percentage error in the estimated TTFs of (a) T1 and (b) T2 using the proposed solvers and RK45 solver.

5.6 Empirical complexity of VCBDF2 solver for trees (a) T1 and (b) T2, and VCBDF3 solver for trees (c) T1 and (d) T2, computed by using the fitting function time = aN^b, where b is the complexity.

6.1 (a) An arrangement of two trees connected by a via taken from the power grid and (b) the corresponding schematic showing early and conventional failures.

6.2 Thermal modelling of power grid using CTMs.

6.3 (a) Heat map for Pself heating + Plogic and (b) temperature profile (in Kelvin) for the M1 layer in ibmpgnew2.

6.4 (a) Heat map for Pself heating + Plogic and (b) temperature profile (in Kelvin) for the M1 layer in PG7.

6.5 (a) Goodness-of-fit plot for normal distribution and (b) probability density function (pdf) using 200 mesh TTF samples from ibmpg2 main approach.

6.6 The idea for expm filtering scheme. The dotted lines show the would-be stress evolution if the boundary conditions are not updated when stress reaches σth. Junction 1 fails before t = tm, Junction 2 fails after.

6.7 Variation of p2 with sample number.

6.8 Flow chart showing the MTF estimation using the Filtering approach. EF stands for early failure.

6.9 Workflow for each process in our parallel implementation.

6.10 Comparing the main approach with the filtering approach for the first 5 grids showing (a) 95% confidence bounds on the estimated MTF, and the TTF samples obtained by each for (b) ibmpg2 and (c) ibmpg5.

6.11 Impact of early failures (EF) on (a) the maximum voltage drop (shown for one sample grid) and (b) estimated mesh MTF for ibmpg2. Maximum voltage drop at t = 0 is 3.8%vdd, and vth = 5%vdd.

6.12 Statistics of mesh TTF samples for ibmpg2 grid shows an underlying bimodal distribution for different modes of grid failure. MTFA = 6.67 yrs, MTFB = 7.99 yrs, MTFall = 7.66 yrs.

6.13 Bar chart comparing speed-ups obtained using 4, 8 and 12 parallel processes with respect to sequential code. Higher is better.

6.14 The figure shows how tm is updated for (a) ibmpg2 and (b) ibmpg5 with MC iterations for P parallel processes.

6.15 Showing a breakdown of the total runtime (in terms of percentages) consumed by different tasks in the code.

6.16 (a) tBDF212 vs. branch count for all test grids and (b) scalability analysis for grids that only have straight trees.

A.1 (a) A typical interconnect tree T with its corresponding graphs (b) G(T), (c) the converse G′(T) and (d) Part of graph Γ(A) for any two adjacent points i and k. Here, N = 4 and the vertex at n1 is the root.

A.2 All paths starting from the root and ending in a diffusion barrier for (a) G(T) and (b) the corresponding converse paths in G′(T).

List of Symbols

Symbol Description

σ Hydrostatic stress

t Time

x Distance along the length of branch from some reference point

σth Critical Stress threshold for void nucleation

σT Thermal stress

Ja Atomic flux

B Bulk modulus

C Concentration of atoms

Cv Vacancy concentration

Ω Atomic volume

kb Boltzmann’s constant

Tm Temperature of the metal

q∗ Effective charge

Da Atomic diffusion coefficient or diffusivity

Q Activation energy for vacancy formation

Ea Activation energy in Black’s model

n Current exponent in Black’s model

G Conductance matrix

v Vector of node voltage drops across the power grid

Tamb Ambient temperature

η Dimensionless (scaled) hydrostatic stress

τ Dimensionless (scaled) time

ξ Dimensionless (scaled) distance along the length of branch from some reference point

δ Thickness of void interface

L, w, h Length, width and height of a branch

j Current density of a branch

N Number of discretizations per branch

vth Voltage drop threshold vector for mesh model

ρm Resistivity of metal (Copper)

ρb Resistivity of barrier metal (Tantalum)

GT Thermal conductance matrix

CT Thermal capacitance matrix

gxT, gyT, gzT Thermal conductances in the x, y and z directions

Tzs Stress free annealing temperature

x State vector for state space representation of a system

A System matrix for state space representation of a system

B Input matrix for state space representation of a system

L Output matrix for state space representation of a system

u Input vector for state space representation of a system

y Output vector for state space representation of a system

h Time step taken by the numerical method

ai, bi Scalar coefficients of a numerical method

ϵPLTE Principal local truncation error

T Random variable that represents the statistics of time to failure of a grid

F (t) Cumulative Distribution Function of a Random variable

µ Mean Time to Failure

v Unbiased estimator of variance

zζ/2 The (1− ζ/2)-percentile of the standard normal distribution

Φ(t) The cdf of standard normal distribution

φ(t) Probability density function (pdf) of the standard normal distribution

tm Active set cutoff threshold

Chapter 1

Introduction

1.1 Motivation

On-die power/ground (p/g) grids are subjected to a wide variety of degradation mechanisms.

For example, the p/g grid must be designed to withstand the deterioration resulting from Time-

Dependent Dielectric Breakdown (TDDB), current crowding at corners in the metal structure,

and stresses generated due to non-uniform temperature distribution and electromigration. As a

result of these ongoing phenomena, the capacity of the grid to deliver the required power to the

underlying logic circuits reduces over time until it finally fails. Accurately accounting for these

degradation mechanisms is key to optimally designing a power grid that is fast and reliable in

the field for a desired amount of time.

As a result of continued scaling of integrated circuit technology, electromigration (EM) has

become a major reliability concern for the design of on-die power grids in large integrated

circuits. Electromigration is the mass transport of metal atoms due to momentum transfer

between electrons and the atoms in a metal line. This ‘mass transport’ of metal atoms eventually

leads to void formation in the metal line, which degrades its conductivity. If multiple lines

experience failure due to EM, a grid might not be able to provide enough voltage to the

underlying logic blocks, which will result in timing violations and failure of the whole IC.

While it is next to impossible to avoid EM degradation in narrow metal lines, one can design

the power grids to withstand EM damage for a target lifetime. This is where EM models and

CAD tools come into play: their main purpose is to estimate EM damage in a given layout so

that the designer can judiciously use metal resources. While signal and clock lines also suffer

from EM degradation, it is often the case that these lines carry bidirectional current. As a

result, the damage caused by EM is partially reversed and these lines have a longer lifetime. In

contrast, p/g lines carry mostly unidirectional current, with no benefit of healing. Moreover,

the signal lines are more likely to degrade due to thermal fatigue, rather than electromigration

damage [6].

Figure 1.1: Wire lifetime and current density scaling. Figure taken from [2].

Electromigration is a complex phenomenon and its study, spanning several decades, includes

theoretical analysis, empirical and physical models and full-chip EM checking techniques. When

EM was first discovered to be a failure mechanism for commercial IC designs in 1966 [7], the

initial solution was to make the lines wider. However, wider lines entail less area for routing,

which leads to more design iterations and longer time-to-market, which ultimately results in

a lower return on investment. Hence, a lot of research has been conducted since 1966 on the

reliability of metal lines under the influence of EM, with the purpose of understanding and

controlling EM damage. Some of this research was focused on improving the resilience of metal

lines to EM failures by improving the fabrication processes and the materials involved. Other

researchers focused on estimating the EM degradation using mathematical models. Simple

empirical EM models, such as Black’s model [8], were proposed that helped in understanding

the dependence of EM on the current density, line microstructure and a host of other factors. A

series model was proposed to estimate the reliability of the whole power grid from the reliability

of its individual metal lines [9], where it was conservatively assumed that one line failure would

cause the whole system to fail. Based on the series model, and some simplifying assumptions,

Statistical Electromigration Budgeting (SEB) was proposed [10] to allow for reliability trade-

offs between different parts of the grid. Black’s model for line failure combined with the series

model for grid failure is used in state-of-the-art industrial tools today for EM checking.

Industrial EM tools, based on simple failure models, have served well for the past 40 years.

However, over the last decade, technology scaling has exacerbated EM [2, 7]. It is now becoming

much harder to sign off on chip designs using state-of-the-art EM checking tools, as there is

no margin left between the predicted EM stress (obtained from the EM tools) and the EM

design rules (formulated based on a target lifetime) [11]. There are at least two reasons for

the loss of the safety margins. First, the EM lifetime itself is becoming progressively worse

due to technology scaling. Fig. 1.1 shows the lifetime and current density trends as the metal

pitch is reduced due to technology scaling. As the interconnect dimensions are scaled down

in smaller technology nodes, their lifetime under the influence of EM decreases even under

constant current density [12]. Moreover, since the supply voltages are not scaling down by

the same factor as the line widths, the current densities keep on increasing, which further

reduces the EM lifetimes. Second, the loss of safety margins can also be traced back to the

simplicity and pessimism built into the EM models used by the industrial tools. This simplicity

and pessimism is often rationalized on the grounds of necessity (the actual physical system is

too complex to be analyzed, and modern power grids are very large with up to a billion nodes)

and conservatism (the analyzed system is worse than the actual one). But, as the IC designs

become more complex and new factors come to bear, this simplicity and pessimism, combined

with reduction in EM lifetimes, leave no breathing room for designers who are now forced

to over-design the grids. Thus, there is a need to reconsider the traditional approaches and

develop better EM models that can accurately assess EM degradation so that we can eliminate

the pessimism built into state-of-the-art EM tools and accurately estimate EM lifetime.

1.2 Contribution

The goal of this research is twofold: first to develop an EM model that can accurately estimate

EM lifetime and second, to use that EM model for the verification of on-die power grids. Given

that it is hard to model all the complexity of the EM phenomenon using empirical models, we

will use physics-based EM models for our work. Several physics-based EM models have been

proposed in the literature [13, 14, 15, 16, 17, 18, 19], some of which have been used for power

grid EM checking [20, 21, 22, 23], but as we will explain in the next chapter, these approaches

are either so slow that they are not scalable to large grids or they are simplified in a way that

prevents them from taking into account all the factors that affect EM in real designs, so that

they are inaccurate.

In this work, we propose a fast and scalable finite-difference based physical EM checking

approach that accounts for process and temperature variations across the die. Our major

contributions are:

1. We propose a new physics-based EM model, that builds on Korhonen’s one-dimensional

(1D) physical model [16], and augments it by introducing boundary laws at junctions

(where multiple branches meet) to track the material flow and stress evolution in multi-

branch metal segments (for arbitrary complex geometries). We also account for the ther-

mal stresses generated by non-uniform temperature distribution across the grid. We refer

to this as the Extended Korhonen’s Model, or EKM.

2. For each tree, EKM starts out as a system of partial differential equations (PDEs) coupled

by boundary laws. We show that this PDE system can be expressed as a succession of

Linear Time Invariant (LTI) systems, where each state represents the hydrostatic stress

at some point on the tree. We study the properties of this linear system to justify the

use of some well known practices in the field, such as the use of effective DC currents in

EM analysis.

3. We develop new numerical approaches, based on Backward Differentiation Formulas

(BDFs) and model order reduction techniques, that are very fast and efficient as com-

pared to the traditional solvers for solving the LTI systems resulting from EKM. These

approaches are optimized by eliminating the Newton iteration step usually associated with

BDFs, and by using customized error control for the problem at hand. These optimized

solvers are partly the reason that our approach is scalable to large grids.

4. We propose a Power Grid EM checking scheme that uses

a) Compact Thermal Models (CTMs) [24] to determine the temperature distribution

of the grid,

b) Extended Korhonen’s Model to track EM degradation in the metal segments and

c) the mesh model [25], as opposed to the series model, to determine grid failure.

The mesh model factors in the inherent redundancy of modern power grids while estimat-

ing its reliability, and gives an accurate estimate of the grid lifetime. The random nature

of EM degradation, caused by process variation, is taken care of by using a Monte Carlo

method, in which successive samples of the grid time to failure (TTF) are found, until

the estimate of the overall Mean Time to Failure (MTF) has converged. We improve our

runtime and scalability by using several filtering schemes that estimate up-front the active

set of trees that are most likely to impact the MTF assessment of the grid. We show that

the filtering schemes have a minimal impact on the accuracy of MTF estimation. Since

EKM provides a natural way to account for early failures (big voids that disconnect the

via above), we also detect early failures and update the state of the system accordingly.

On the implementation side, we parallelize our code using a multi-process architecture to

take advantage of all available cores in a machine.
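
The Monte Carlo loop described in item 4 can be pictured with a short sketch. The snippet below is only an illustration under assumptions that are not taken from this work: `sample_grid_ttf` is a hypothetical stand-in for one full grid simulation (here a plain lognormal draw), and the stopping rule (a normal-approximation confidence interval whose half-width falls below a relative tolerance) is one common choice, not necessarily the criterion used later in the thesis.

```python
import math
import random


def estimate_mtf(sample_ttf, rel_tol=0.01, z=1.96, min_samples=30, max_samples=1_000_000):
    """Draw TTF samples until the confidence-interval half-width of the mean
    drops below rel_tol * mean (normal approximation). Returns (mean, n)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    while n < max_samples:
        x = sample_ttf()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            std_err = math.sqrt(m2 / (n - 1) / n)
            if z * std_err <= rel_tol * abs(mean):
                break
    return mean, n


# Hypothetical stand-in for one grid TTF sample (in years). A real run would
# simulate the whole grid for one random assignment of diffusivities.
rng = random.Random(0)


def sample_grid_ttf():
    return rng.lognormvariate(0.0, 0.5)


mtf, samples_used = estimate_mtf(sample_grid_ttf)
```

The running-variance update avoids storing all samples, which matters when each sample is an expensive grid simulation.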

Testing our approach on the IBM grid benchmarks [26] and internal benchmarks, with the

largest grid up to 4.1M nodes, shows that the MTFs estimated using our physics-based approach

are on average 2.35x longer than those based on a (calibrated) Black’s model. This justifies

the claim that Black’s model can be highly inaccurate for modern power grids and confirms the

need for physical models. With a run-time of only around 16.2 minutes for the most difficult-

to-solve grid and 10.3 minutes for the largest (4.1M) grid, our approach is extremely fast and

should scale well for large integrated circuits.

1.3 Organization

The thesis is organized as follows: Chapter 2 provides the necessary background material on

electromigration and the prior art regarding the EM models and power grid EM checking

approaches. It also covers the basics of ODE solvers, mean estimation of distributions and

LTI models. Chapter 3 presents the Extended Korhonen’s Model and verifies it by comparing

its results with data from experiments published in the literature. In Chapter 4, we study

in detail the LTI models arising out of EKM and introduce the concept of state stamps, that

can be used to quickly and efficiently assemble the LTI system. Chapter 5 develops fast and

scalable numerical approaches that are used to obtain the stress evolution in trees over time

and to determine the time and location of the next void nucleation in a tree. In Chapter 6, we

describe in detail our power grid EM checking approaches that use the physics-based EM model

we proposed in Chapter 3. We also compare the MTF estimates obtained using a calibrated

Black’s model and EKM to show the inherent limitations of Black’s model. We conclude

and give future research directions in Chapter 7.

Chapter 2

Background

In this chapter, we will review the required background material. We will start by reviewing

the basics of Electromigration in Section 2.1, followed by the mathematical models that have

been proposed to explain the process of EM degradation in Section 2.2. We will then focus

on one particular physics-based EM model, namely Korhonen’s model and its adaptations in

Section 2.3. In Section 2.4 we will review the industrial EM checking approaches for power grids,

with some recently proposed enhancements and, in Section 2.5, we will present the power grid

model that is used in the field to perform EM checks. In Sections 2.6 and 2.7, we will review the

numerical methods for solving Partial Differential Equations (PDE) and Ordinary Differential

Equations (ODE), respectively. We will then apply one of the numerical methods (method of

lines) to the heat transfer PDE in Section 2.8 and show the electro-thermal equivalence. In

Section 2.9, we will review the state space models and finally in Section 2.10, we will review

the Monte Carlo random sampling approach for estimating the mean of a distribution within

user specified error tolerances.

2.1 Electromigration Basics

Electromigration is the mass transport of metal atoms due to momentum transfer between

electrons (driven by an electric field) and the atoms in a metal line. Equivalently, one can

also say that EM is the diffusive motion of vacancies in a metal segment under the influence

of an applied electric field and/or stress gradients. A vacancy is the absence of a metal atom

in a crystal lattice. As we will see a little later, the movement of atoms/vacancies generates

mechanical stress within a metal segment, which is used as a measure of EM degradation. EM

is highly dependent on the specific microstructure of a given line. As such, due to random

manufacturing variations, the time to failure (TTF) due to EM is a random variable. For a

given microstructure, the rate of EM degradation depends on the type of metal, geometry,

temperature and current density of the given line segment.


2.1.1 Atomic Flux

Under conditions of high current density, metal atoms are pushed in the direction of the electron

flow. The number of atoms moving across a cross-section of a metal line per second per unit

area is known as the atomic flux. The total atomic flux in a metal segment is the result of

fluxes generated due to two different phenomena:

i) electronic flux, which is generated by the applied electric field and is always opposite to the

direction of the applied electric field (i.e., the atoms are pushed in a direction opposite to

the applied electric field) and

ii) gradient flux, generated by the stress gradient itself and always flows from points of low

vacancy concentration (i.e. compressive stress) to high vacancy concentration (i.e. tensile

stress).

Note that the gradient flux counteracts the electronic flux. For example, consider a finite metal

line embedded in a rigid dielectric material. Then, the metal atoms and the atomic flux are

confined within the line. We express this by saying the atomic flux is blocked at the boundary

and cannot escape. Now, if we apply a strong electric field in the line, the electric current

will flow from anode to cathode (recall that by convention, electric current always flows from

anode to cathode). Then, the electronic flux will push the metal atoms from cathode to anode.

Correspondingly, the vacancies will move towards the cathode, and will generate tensile stress

there. The anode end of the line will develop compressive stress. This stress gradient in turn

generates the gradient flux that flows from anode to cathode, and opposes the electronic flux. A

higher spatial stress derivative leads to a higher gradient flux and vice versa. The phenomenon

of gradient flux opposing the electronic flux is often referred to as the back-stress effect in the

literature [27]. The process of EM degradation can be divided into two phases: void nucleation

and void growth.

2.1.2 Void Nucleation Phase

If the in-flow of metal atoms is equal to the out-flow at every point on a line segment, then

clearly no deformation or failure will occur. On the other hand, if the in-flow is not equal to

the out-flow, atomic flux divergence (AFD) is said to occur. AFD is a necessary prerequisite

for EM degradation and is typically observed in locations with some sort of barrier to atomic

movement, such as at the end of a line, at locations where the width of the metal segment

changes or around grain boundaries where the microstructure changes. Flux divergence at

these locations generates points of high tensile and compressive stresses within the segment.

The amount of compressive stress needed to cause a pile-up of metal atoms (a hillock) leading

to a short circuit is very high in modern metal systems, hence failure due to short circuit is

not usually observed. However, the build up of tensile stress eventually leads to formation of

a void when the stress reaches a pre-determined critical threshold. This initial phase of EM

Figure 2.1: (a) A conventional or late failure, (b) early failure and (c) simple schematic representation for both failures. (a) and (b) taken from [3] and [4], respectively.

degradation, when stress is increasing over time but the void has not yet nucleated, is known

as the void nucleation phase.

If the critical stress threshold for void nucleation cannot be reached, the stress profile settles

at some steady state value. This happens because as the tensile and compressive stresses in

a metal segment increase with time, the gradient flux also increases. On the other hand, the

electronic flux remains constant because it depends on the applied electric field. When the

gradient flux becomes equal to the electronic flux, the net atomic flux becomes zero and the

system reaches a steady state. For a given metal segment, the steady state stress profile is

primarily determined by the applied electric field.

2.1.3 Void Growth Phase

Once a void nucleates, the void growth phase begins. In some cases, depending on the geometry

and the location of the void, nucleation by itself might be enough to cause failure due to open

circuit by disconnecting the via [28], as shown in Fig. 2.1b and the schematic of Fig. 2.1c. These

failures are often observed in testing and are typically referred to as early failures. Early failures

give rise to bimodal TTF distributions [29]. On the other hand, a line may still continue to

conduct current even after void nucleation, so that it is not quite an open circuit. This situation

is shown in Fig. 2.1a and the schematic of Fig. 2.1c, and is referred to as a conventional failure.

In this case, the void grows in the direction of the electronic flux and the line resistance increases

towards some finite steady-state value. Even if the void spans the whole cross-section of the

line, conduction remains possible through the high resistance barrier metal liner surrounding

the metal, as shown in Fig. 2.1c. In testing of single isolated lines, failure is deemed to happen

when the increase in resistance is 10% to 20% of the initial resistance value.

2.1.4 Effective-EM Current

EM is a long-term failure mechanism. As such, short-term transients typically experienced

in chip workloads do not play a significant role in EM degradation. Thus, standard practice

in the field is to use an effective-EM current model [30] to estimate EM degradation, so that

the lifetime of a metal line when carrying the constant effective current and the time-varying

transient current is the same. The effective-EM current is often computed based on some

assumed periodic current waveform with period tp. If the waveform is unidirectional, then the

effective-EM current is equal to the time-average current density [31]

$$ j_{dc,\mathrm{eff}} = j_{avg} = \frac{1}{t_p}\int_0^{t_p} j(\tau)\,d\tau. \tag{2.1} $$

For the case of bidirectional currents, let j+(t) and j−(t) denote the current waveforms in the

chosen positive and negative directions, respectively. Then, the effective-EM current density is

given as [30, 32]

$$ j_{ac,\mathrm{eff}} = \frac{1}{t_p}\left(\int_0^{t_p} j^{+}(\tau)\,d\tau - \phi\int_0^{t_p} |j^{-}(\tau)|\,d\tau\right), \tag{2.2} $$

where ϕ is the EM recovery factor that is determined experimentally. The positive direction is

chosen such that $\int_0^{t_p} j^{+}(\tau)\,d\tau \ge \int_0^{t_p} |j^{-}(\tau)|\,d\tau$.
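
As a rough illustration of (2.1) and (2.2), the sketch below computes the effective-EM current density from one period of a uniformly sampled waveform using the trapezoidal rule. The function name, the sampling convention and the numerical values (including the recovery factor) are assumptions made for the example only; in practice ϕ is fitted from experiments.

```python
def effective_em_current(samples, dt, phi):
    """Effective-EM current density over one period of a sampled waveform.

    samples: current densities at uniformly spaced times covering one period
             (first and last entries are the period end points).
    dt:      sampling interval, so that t_p = dt * (len(samples) - 1).
    phi:     EM recovery factor (an experimentally fitted value).
    """
    def trapz(values):
        # Composite trapezoidal rule on a uniform grid.
        return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

    t_p = dt * (len(samples) - 1)
    q_pos = trapz([max(j, 0.0) for j in samples])    # integral of j+(t)
    q_neg = trapz([max(-j, 0.0) for j in samples])   # integral of |j-(t)|
    if q_pos < q_neg:
        # The positive direction is the one carrying the larger integral.
        q_pos, q_neg = q_neg, q_pos
    return (q_pos - phi * q_neg) / t_p


# Unidirectional waveform: the result is the time average, as in (2.1).
j_dc = effective_em_current([2.0, 2.0, 2.0, 2.0, 2.0], dt=1.0, phi=0.7)

# Bidirectional waveform with a hypothetical recovery factor of 0.5, as in (2.2).
j_ac = effective_em_current([4.0, 4.0, 4.0, -2.0, -2.0, -2.0], dt=1.0, phi=0.5)
```

For a unidirectional waveform the negative-direction integral is zero, so the recovery factor has no effect and the function reduces to the plain time average.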

2.2 EM failure Models

Many empirical and physics-based models have been proposed to explain EM degradation in a

line. We will now review some of these models, focusing on EM models that are important to

understand the contribution of this work.

2.2.1 Black’s model

One of the earliest empirical models for estimating the EM mean time to failure (MTF) was

proposed by J. R. Black in 1969 [8]. As per his model, the time to failure (TTF) of an isolated

metal line has a lognormal distribution (to account for the randomness due to microstructure)

with mean time to failure given as

$$ \mathrm{MTF} = \frac{A_{bl}}{j^{n}}\exp\left(\frac{E_a}{k_b T_m}\right), \tag{2.3} $$

where Abl is a proportionality constant, j is the constant current density (current per unit

cross-sectional area) in the line, kb is Boltzmann’s constant, Tm is the temperature of the line,

n is the so-called current density exponent and Ea is the activation energy. The parameters

Abl, n and Ea are determined experimentally using accelerated testing: isolated metal lines are

tested with high current densities at higher than typical operating temperatures. The TTFs

thus obtained are fitted to a lognormal distribution using goodness of fit methods to estimate

the MTF under testing conditions. The parameters Abl, n and Ea are then determined using

regression analysis [33], and are used for extrapolating the results back to typical operating

conditions.
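
The extrapolation step can be sketched as follows. This is a minimal example assuming the usual form of Black's equation (2.3); the parameter values (n = 2, Ea = 0.9 eV, and the test and use conditions) are placeholders chosen for illustration and are not taken from this work.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mtf(a_bl, j, n, e_a, t_m):
    """Mean time to failure from Black's equation (2.3)."""
    return a_bl / j**n * math.exp(e_a / (K_B * t_m))


def acceleration_factor(j_test, t_test, j_use, t_use, n, e_a):
    """Ratio MTF(use) / MTF(test); the constant A_bl cancels out."""
    return (j_test / j_use) ** n * math.exp(e_a / K_B * (1.0 / t_use - 1.0 / t_test))


# Hypothetical accelerated-test and operating conditions (A/m^2, K).
af = acceleration_factor(j_test=2e10, t_test=573.0, j_use=1e10, t_use=378.0, n=2.0, e_a=0.9)
```

Multiplying the MTF measured under test conditions by `af` gives the extrapolated MTF at operating conditions, which is exactly the step that the criticism by Lloyd [38] discussed below calls into question.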

Later, Blech et al. [34, 35, 36] discovered that not all lines fail due to EM: an isolated

metal line (that has not already failed) is immune to EM failure if the product of its length and

current density is less than the critical Blech product (jL)c, defined as [37]

$$ (jL)_c = \frac{\Omega\,\Delta\sigma_{max}}{q^{*}\rho}, \tag{2.4} $$

where Ω is the atomic volume, ∆σmax > 0 is the maximum stress difference between the cathode

and the anode before void nucleation occurs, q∗ is the absolute value of the effective charge of

the migrating atoms and ρ is the resistivity of the metal. This phenomenon later came to be

known as the Blech effect.
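
A short sketch of how the Blech criterion (2.4) can be applied as a filter is given below. All material values are illustrative assumptions (an atomic volume close to that of copper, an effective charge of one electron charge, and an assumed stress difference of 1.2 GPa); they are not parameters reported in this work.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in C


def critical_blech_product(omega, delta_sigma_max, q_star, rho):
    """Critical Blech product (jL)_c of (2.4), in A/m."""
    return omega * delta_sigma_max / (q_star * rho)


def is_em_immune(j, length, omega, delta_sigma_max, q_star, rho):
    """True if the line's jL product is below the critical Blech product."""
    return j * length < critical_blech_product(omega, delta_sigma_max, q_star, rho)


# Assumed, illustrative parameters (SI units).
params = dict(omega=1.18e-29, delta_sigma_max=1.2e9, q_star=1.0 * E_CHARGE, rho=2.25e-8)

short_line_immune = is_em_immune(j=1e10, length=10e-6, **params)
long_line_immune = is_em_immune(j=1e10, length=1e-3, **params)
```

With these assumed numbers the 10 µm line falls below the critical product while the 1 mm line does not, which illustrates why short lines can be filtered out before any detailed analysis.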

Equation (2.3), combined with the Blech effect (2.4), is known as Black’s model and

is currently the EM model used in state-of-the-art commercial tools. The benefit of

using Black’s model is that it is computationally very fast and scales well as the problem size

increases. However, Lloyd [38] pointed out that the fitting parameters Abl, n and Ea obtained

under accelerated testing conditions are not valid at actual operating conditions, and this

leads to significant errors in lifetime extrapolations. Further, Hauschildt et al. [39] conducted

experiments which demonstrated that n depends on the temperature and thermal stress and Ea

depends on the current density of the line. These observations make the use of Black’s model

controversial.

2.2.2 Physics-based EM models

To remedy the shortcomings of Black’s model, many physics-based EM models have been

proposed. These physics-based models are often presented in the form of partial differential

equations (PDE), that express how a physical quantity of interest, which provides a measure

of EM degradation, is influenced by factors such as the material properties, geometry, current

density and temperature of the metal structure. The PDE, coupled with appropriate boundary

Figure 2.2: A simple volume element with flux divergence.

Figure 2.3: 3D stress tensor on a small volume element. For each component, the first subscript/index denotes the direction of the outward normal from the face and the second subscript/index is the direction of the stress acting on that face.

conditions, can track the EM degradation of a metal structure. Physics-based EM models are

versatile and can be easily adapted to handle different configurations, as opposed to Black’s

model where the fitting parameters are usually valid only for the range of conditions under

which they were obtained.

Most physics-based EM models are based on the following continuity equation

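$$ \frac{\partial C_v}{\partial t} = \nabla J_a + \gamma(t), \tag{2.5} $$

where Cv is the vacancy concentration, i.e., the number of vacancies per unit volume, Ja is the atomic

flux, γ(t) is a sink/source term that models the recombination/generation of vacancies at grain

boundaries and ∇ is the del (nabla) operator, which in Cartesian coordinates can be stated as:

$$ \nabla = \frac{\partial}{\partial x} + \frac{\partial}{\partial y} + \frac{\partial}{\partial z}. \tag{2.6} $$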

Simply put, (2.5) states that for a small volume element, the time rate of change of vacancy

concentration is equal to the sum of the spatial gradient (derivative) of atomic flux and the rate

of recombination/generation of vacancies (higher flux gradient means higher flux divergence

and vice-versa). For example, consider a small volume element, for which the out-flow of atoms

is greater than the in-flow (Fig. 2.2), which means a positive gradient for Ja. If we ignore γ(t)

for simplicity, then we can see that the vacancy concentration in the volume element increases

with time, which generates tensile stress and may eventually lead to a void nucleation. The

physics-based EM models proposed in the literature differ in what they use as a measure of EM

degradation, and how they account for the recombination/generation of vacancies.

The earliest physics-based models [13, 14] used vacancy concentration as a measure of EM

degradation and their failure criterion was based on critical vacancy concentration, i.e. if the

vacancy concentration at any point along the metal line reaches a critical value, a void nucleates

at that point. However, when this model was applied to isolated metal lines, it was found that

the predicted failure times were orders of magnitude smaller than the observed failure times.

This anomaly was corrected by Kirchheim [15], who proposed the first EM model which used

hydrostatic stress σ as a measure of EM degradation. Here, hydrostatic stress is the average

of all normal components of the full stress tensor (see Fig. 2.3), i.e. σ = (σxx + σyy + σzz)/3.

Kirchheim’s model used the relationship between vacancy concentration and stress to “track”

the evolution of stress in a line. A void nucleates when stress at any point on the line reaches

a critical stress threshold. Kirchheim’s model was later simplified by Korhonen et al. [16] using

Hooke’s Law. Further, Kirchheim and Korhonen et al. solved their respective models to obtain

a closed form expression for σ(x, t) (stress as a function of position x on the line at time t)

for a simple configuration: a single metal line embedded in a rigid dielectric with atomic flux

blocked at the line ends. We will study Korhonen’s model in detail in the next section.

All EM models presented up to this point are one-dimensional (1D) models, i.e. at any

given point along the line (x axis), the gradients of stress along the y and z axes are ignored

by assuming that the stress is uniform over the whole cross sectional area. These 1D models

require more computation than Black’s model, but scale moderately well as the problem size

increases. Sarychev et al. [17] proposed the first three dimensional (3D) EM model that can

track stress along the x, y and z axes. Later, Sukharev et al. [18] introduced the concept of

‘plated’ atoms to capture generation/annihilation of vacancies at grain boundaries and Orio

[19] introduced the notion of a 3D diffusion coefficient to model EM degradation in greater

detail. These 3D EM models, though accurate, are computationally expensive and do not scale

well. As such, they are not suitable for full-chip p/g grid EM checking.

2.3 Korhonen’s Model and its adaptations

In this section, we will review the 1D EM model proposed by Korhonen [16], which will be

referred to as Korhonen’s model throughout this work. We will then focus on some of its

adaptations proposed in the literature.

2.3.1 The Korhonen’s model

Consider a metal line confined in a rigid dielectric material with line length along the x axis, as

shown in Fig. 2.4. If it is assumed that stress is uniform across the cross section of the line, then

for any volume element within the line, the relative change in C(x, t), the number of metal atoms

per unit volume, corresponds to the increment in hydrostatic stress σ(x, t) as per Hooke’s Law

$$ \frac{dC}{C} = -\frac{d\sigma}{B}, \tag{2.7} $$

Figure 2.4: Schematic for a confined metal line, showing a volume element.

where B is the bulk modulus and C is often referred to as the concentration of atoms. In

an ideal lattice, C = 1/Ω, where Ω is the atomic volume. The atomic flux Ja, in the volume

element is a combination of the gradient flux, generated when ∂σ/∂x ≠ 0 and the electronic

flux, generated when the current density j ≠ 0. It can be stated as

$$ J_a = \frac{D_a C\,\Omega}{k_b T_m}\left(\frac{\partial\sigma}{\partial x} - \frac{q^{*}\rho}{\Omega}\,j\right), \tag{2.8} $$

where Da is the coefficient of atomic diffusion (also called the diffusivity), kb is the Boltzmann’s

constant, Tm is the temperature in Kelvin, q∗ is the absolute value of the effective charge of

the migrating atoms and ρ is the resistivity of the conductor. Using (2.7) and (2.8) in (2.5),

assuming γ(t) to be proportional to −∂C/∂t and applying some simplifying approximations,

Korhonen proposed that the hydrostatic stress σ(x, t), at location x from some reference point

and at time t, can be found by solving the following PDE

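$$ \frac{\partial\sigma}{\partial t} = \frac{\partial}{\partial x}\left[\frac{D_a B\,\Omega}{k_b T_m}\left(\frac{\partial\sigma}{\partial x} - \frac{q^{*}\rho}{\Omega}\,j\right)\right]. \tag{2.9} $$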

In Korhonen’s formulation, σ is positive for tensile stress and negative for compressive stress.

If the stress at any point along the line reaches the critical stress threshold σth > 0, a void

nucleates at that point. Korhonen’s model captures the dynamics of stress evolution within a

volume element, and as with any PDE, one needs to specify boundary conditions and initial

conditions in order to obtain a solution. Note that in (2.9), it is implicitly assumed that stress,

diffusivity and current density are differentiable with respect to x.

Diffusivity of metal lines

The atomic diffusion coefficient Da is usually expressed using the Arrhenius law

$$ D_a = D_0\exp\left(-\frac{Q}{k_b T_m}\right), \tag{2.10} $$

Figure 2.5: (a) Stress evolution at different points along the line (stress versus time in years, for x = 0, x = L/2 and x = L) and (b) stress profile along the line at different time points (stress versus length in units of 10^-6 m, for t = 0, 0.20, 0.80, 1.80, 3.80 and 10.00).

where D0 is a constant and Q is the activation energy for vacancy formation and diffusion. The

randomness in TTF due to EM is primarily accounted for by the corresponding randomness

in Da, which has been shown to be lognormally distributed [40]. Strictly speaking, Da also

depends on the stress value at a given point. However, it has been reported that the numerical

results with stress dependent Da are “not too different” from constant Da [16]. Hence, as in

many previous works [20, 21, 22, 23, 41], we will assume that Da is stress-independent.
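
To give a feel for the temperature sensitivity implied by the Arrhenius law and for the lognormal spread of the diffusivity, a small sketch follows. The pre-factor, activation energy and lognormal shape parameter used here are hypothetical values chosen for illustration, and treating the nominal diffusivity as the median of the lognormal is likewise an assumption of this example.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def diffusivity(d0, q_act, t_m):
    """Arrhenius law for the atomic diffusion coefficient."""
    return d0 * math.exp(-q_act / (K_B * t_m))


def sample_diffusivity(d_nominal, sigma_ln, rng):
    """Draw a lognormal diffusivity whose median is the nominal value."""
    return rng.lognormvariate(math.log(d_nominal), sigma_ln)


# Hypothetical parameters: D0 in m^2/s, Q in eV, temperatures in K.
d_cool = diffusivity(7.56e-5, 0.86, 378.0)
d_hot = diffusivity(7.56e-5, 0.86, 398.0)
speedup = d_hot / d_cool  # how much faster atoms diffuse when 20 K hotter

rng = random.Random(1)
d_sample = sample_diffusivity(d_cool, 0.3, rng)
```

Because the temperature enters through an exponential, even a modest temperature difference across the die changes the diffusivity by a large factor.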

2.3.2 Solution for blocking boundary at both ends

Korhonen provided an analytical solution for a finite line with flux blocked at both ends.

Consider a finite metal segment of length L that carries a current density j and has a constant

diffusivity Da throughout the line. Korhonen assumed blocked boundary conditions (flux was

blocked at both ends), i.e. Ja(0, t) = Ja(L, t) = 0 and zero initial stress in the metal segment.

Then, as per (2.9), the stress can be found as

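$$ \sigma(x,t) = \frac{q^{*}\rho jL}{\Omega}\left[\frac{x}{L} - \frac{1}{2} - 4\sum_{n=0}^{\infty} m_n^{-2}\exp\left(-\frac{m_n^{2}\nu t}{L^{2}}\right)\cos\left(m_n\frac{L-x}{L}\right)\right], \tag{2.11} $$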

where mn = (2n+1)π and ν = DaBΩ/(kbTm). We will refer to (2.11) as the reference solution

for the finite line. Fig. 2.5 shows the stress evolution for a finite line as per (2.11) with L = 50 µm

and j = 6 × 10^9 A/m^2 flowing from x = 0 to x = L. Since the current flows from x = 0 (anode)

to x = L (cathode), the electron flow pushes the metal atoms in the opposite direction. This

results in development of tensile stress at x = L (cathode) and compressive stress at x = 0

(anode), as shown in Fig. 2.5.
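
The reference solution can be evaluated numerically by truncating the series. The sketch below assumes the form of (2.11) in which the stress is tensile at x = L, writes the driving term q∗ρj/Ω as a single parameter `drive`, and uses normalized units (L = ν = 1); the function name and the number of retained terms are choices made for this example.

```python
import math


def korhonen_reference(x, t, length, nu, drive, terms=2000):
    """Truncated series for the stress in a finite line blocked at both ends.

    drive is q* rho j / Omega, so drive * length sets the stress scale.
    The stress is zero at t = 0 and tends to drive * (x - length / 2).
    """
    total = 0.0
    for n in range(terms):
        m = (2 * n + 1) * math.pi
        decay = math.exp(-m * m * nu * t / length**2)
        total += decay * math.cos(m * (length - x) / length) / (m * m)
    return drive * length * (x / length - 0.5 - 4.0 * total)


# Normalized example: cathode (x = L) and anode (x = 0) after a long time.
cathode_late = korhonen_reference(1.0, 10.0, 1.0, 1.0, 1.0)
anode_late = korhonen_reference(0.0, 10.0, 1.0, 1.0, 1.0)
```

The series converges slowly near t = 0, where many terms are needed, and very quickly at later times, where the first few exponentials dominate.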

Role of j, L and Da

The final steady state stress profile across the line can be easily obtained by setting t = ∞ in

(2.11), and is given by

$$ \sigma(x,\infty) = \frac{q^{*}\rho jL}{\Omega}\left(\frac{x}{L} - \frac{1}{2}\right). \tag{2.12} $$

The stress profile at t = 10 yrs, as shown in Fig. 2.5b, is almost the steady state stress profile.

As per (2.12), the steady state stress profile depends on the product of current density j and

line length L. Note that the steady state tensile stress at the cathode is the maximum tensile

stress that can be achieved in the line. Thus, for a finite line to be EM immune, we must have

$$ \max[\sigma(x,\infty)] = \sigma(L,\infty) < \sigma_{th} \implies jL < \frac{2\,\Omega\,\sigma_{th}}{q^{*}\rho}, \tag{2.13} $$

which is the same as the critical Blech product (2.4) with ∆σmax = 2σth (the stress difference

between the cathode and the anode is maximum during the steady state).

As mentioned before, the atomic flux should be zero at steady state, and this is also readily

observable from Korhonen’s model. From (2.12), it is easy to see that at t = ∞

$$ \frac{\partial\sigma}{\partial x} = \frac{q^{*}\rho}{\Omega}\,j, \tag{2.14} $$

which when used in (2.8), gives

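$$ J_a = \frac{D_a C\,\Omega}{k_b T_m}\left(\frac{\partial\sigma}{\partial x} - \frac{q^{*}\rho}{\Omega}\,j\right) = \frac{D_a C\,\Omega}{k_b T_m}\left(\frac{q^{*}\rho}{\Omega}\,j - \frac{q^{*}\rho}{\Omega}\,j\right) = 0. \tag{2.15} $$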

For a given current density, the time rate of change of stress depends on the atomic diffusion

coefficient Da: a higher value of Da leads to a higher rate of EM degradation and vice versa.

Since Da has an exponential dependence on temperature, it becomes important to include

temperature in EM analysis. The observations that the steady state stress profile depends on

j and L and that the derivative of stress with respect to time depends on Da are applicable for

complex interconnect structures as well.
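
These two observations can be checked with a simple numerical experiment. The sketch below discretizes (2.9) for a single line with constant diffusivity and blocked ends, using a cell-centred finite-difference grid and backward Euler time stepping. It is a generic textbook scheme written for illustration, not the optimized solver developed in this work; the grid size, time step and normalized parameters are arbitrary choices.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def simulate_blocked_line(length, nu, drive, cells, dt, steps):
    """Backward Euler for d(sigma)/dt = d/dx[nu * (d(sigma)/dx - drive)]
    with zero flux at both ends and zero initial stress."""
    h = length / cells
    r = nu * dt / h**2
    lower = [-r] * cells
    upper = [-r] * cells
    diag = [1.0 + 2.0 * r] * cells
    diag[0] = diag[-1] = 1.0 + r        # end cells have one neighbour only
    source = [0.0] * cells
    source[0] = -nu * drive * dt / h     # flux leaves the anode-side cell
    source[-1] = nu * drive * dt / h     # and accumulates at the cathode side
    sigma = [0.0] * cells
    for _ in range(steps):
        rhs = [s + b for s, b in zip(sigma, source)]
        sigma = solve_tridiagonal(lower, diag, upper, rhs)
    return sigma


# Normalized run (L = nu = drive = 1) long enough to reach steady state.
profile = simulate_blocked_line(1.0, 1.0, 1.0, cells=50, dt=0.01, steps=1000)
```

In this normalized run the final profile is linear with slope `drive`, so its extremes scale with the product of `drive` and `length`, while `nu` only changes how many steps are needed to get there.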

2.3.3 Riege-Thompson Model

Korhonen’s analytical solution for stress evolution in case of a finite line is theoretically inter-

esting, but is not practically useful as modern ICs are made of connected metal segments that

have complex geometries. Thus, many authors have made efforts to adapt Korhonen’s model

to track stress in multi-branch interconnect structures.

S. P. Hau-Riege and C. V. Thompson [42] developed a closed form analytical expression for

stress evolution at a junction (a point where multiple metal lines meet). They supplemented

Korhonen’s model with boundary conditions that model the interaction of atomic flux at the

junction and conceptually replaced connected branches with semi-infinite limbs. Further, they

Figure 2.6: Comparison of stress evolution at cathode of a finite line calculated using Riege-Thompson model and the reference solution (2.11).

Figure 2.7: Simple multi-branch interconnect structures.

assumed that the stress at the other end of the limbs is constant and is always equal to the

initial stress σ0. With these simplifying assumptions, the stress evolution at the junction is

given by

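$$ \sigma_{jn}(t) = \sigma_0 + \frac{2\rho q^{*}}{\Omega}\sqrt{\frac{B\,\Omega\,t}{\pi k_b T_m}}\;\frac{\sum_k D_{a,k}\,j_k}{\sum_k \sqrt{D_{a,k}}}. \tag{2.16} $$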

Fig. 2.6 compares the stress evolution at the cathode of a finite line using (2.11) and (2.16). Because

Riege-Thompson’s model replaces branches with semi-infinite limbs, it cannot account for the

back-stress developed due to the blocking flux boundary on the anode end of the finite line. This is

why in Riege-Thompson’s model, the junction stress exceeds the steady state stress value and

the solution discrepancy increases with time. Nevertheless, it is accurate for small time-spans

and it does provide an upper bound on the stress value at a junction and has been used in some

works for power grid EM checking [21].
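
The behaviour shown in Fig. 2.6 can be reproduced with a short sketch. It assumes the square-root-of-time form of the junction stress in (2.16) with a common temperature for all limbs, treats each `j_k` as positive when it drives tensile stress at the junction (an assumed sign convention), and compares a single limb against the truncated series of (2.11) at the cathode in normalized units. Function and parameter names are hypothetical.

```python
import math


def junction_stress(t, limbs, b_mod, omega, kb_t, q_star, rho, sigma0=0.0):
    """Junction stress for semi-infinite limbs; limbs is a list of
    (D_a_k, j_k) pairs, one per limb connected to the junction."""
    num = sum(d * j for d, j in limbs)
    den = sum(math.sqrt(d) for d, _ in limbs)
    scale = 2.0 * rho * q_star / omega
    return sigma0 + scale * math.sqrt(b_mod * omega * t / (math.pi * kb_t)) * num / den


def finite_line_cathode_stress(t, length, nu, drive, terms=2000):
    """Truncated reference series for a blocked finite line, at x = L."""
    total = 0.0
    for n in range(terms):
        m = (2 * n + 1) * math.pi
        total += math.exp(-m * m * nu * t / length**2) / (m * m)
    return drive * length * (0.5 - 4.0 * total)


# Normalized units: all constants equal to 1, so nu = 1 and drive = 1.
unit = dict(b_mod=1.0, omega=1.0, kb_t=1.0, q_star=1.0, rho=1.0)
early_rt = junction_stress(1e-4, [(1.0, 1.0)], **unit)
early_ref = finite_line_cathode_stress(1e-4, 1.0, 1.0, 1.0)
late_rt = junction_stress(10.0, [(1.0, 1.0)], **unit)
late_ref = finite_line_cathode_stress(10.0, 1.0, 1.0, 1.0)
```

At early times the two values agree closely, whereas at late times the semi-infinite approximation keeps growing while the finite-line stress saturates at its steady state value.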

2.3.4 CTHKS Model

Chen et al. [43, 44] recently developed analytical closed form expressions for stress evolution

in simple multi-branch segments shown in Fig. 2.7. In doing so, they made the following

simplifying assumptions:

i) All branch lengths are equal, assumed to be L.

ii) All branches have the same constant diffusivity Da and temperature Tm.

iii) The initial stress at t = 0 is zero everywhere.

iv) There are no voids at t = 0.

We will refer to their model as the CTHKS model, after the initials of the authors. As per this

model, the stress evolution in branch b1 of a 3-terminal tree as shown in Fig. 2.7a is

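\[ \sigma_1(x,t) = \frac{q^*\rho}{2\Omega}\sum_{n=0}^{\infty}\Big\{ 2j_1\big[g(3L+4nL-x,t)+g(L+4nL+x,t)\big] - 2j_2\big[g(L+4nL-x,t)+g(3L+4nL+x,t)\big] + (j_2-j_1)\big[g(2L+4nL-x,t)+g(4nL-x,t)+g(4L+4nL+x,t)+g(2L+4nL+x,t)\big]\Big\}, \tag{2.17} \]

where g(u, t) is defined as

\[ g(u,t) \triangleq 2\sqrt{\frac{\nu t}{\pi}}\,\exp\!\left(-\frac{u^2}{4\nu t}\right) - u\,\mathrm{erfc}\!\left(\frac{u}{2\sqrt{\nu t}}\right), \tag{2.18} \]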

with erfc being the complementary error function and ν = DaBΩ/(kbTm). The authors provided similar analytical expressions for all interconnect trees shown in Fig. 2.7, which can be found in [44].

They compared their solutions to the results obtained using COMSOL Multiphysics software

and reported a maximum percentage error of 0.5%.
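The series (2.17) is straightforward to evaluate numerically once it is truncated. The following is a minimal sketch, not the CTHKS authors' implementation. It assumes that branch b1 occupies −L ≤ x ≤ 0 with the junction at x = 0, that the prefactor is q∗ρ/(2Ω) with the first bracket weighted by 2j1, and it lumps q∗ρ/Ω into a single hypothetical parameter `q_rho_over_omega`. The function names and the truncation length `n_terms` are illustrative choices.

```python
import math


def g(u, t, nu):
    """Kernel g(u, t) of (2.18), with nu = Da*B*Omega/(kb*Tm) and t > 0."""
    s = math.sqrt(nu * t)
    gauss = 2.0 * s / math.sqrt(math.pi) * math.exp(-u * u / (4.0 * nu * t))
    return gauss - u * math.erfc(u / (2.0 * s))


def sigma_b1(x, t, j1, j2, L, nu, q_rho_over_omega, n_terms=50):
    """Truncated series for the stress in branch b1 of the 3-terminal tree.

    Assumptions (not taken from the source): -L <= x <= 0, junction at x = 0,
    and q_rho_over_omega stands for q* rho / Omega.
    """
    total = 0.0
    for n in range(n_terms):
        s = 4 * n * L
        total += 2 * j1 * (g(3 * L + s - x, t, nu) + g(L + s + x, t, nu))
        total -= 2 * j2 * (g(L + s - x, t, nu) + g(3 * L + s + x, t, nu))
        total += (j2 - j1) * (
            g(2 * L + s - x, t, nu) + g(s - x, t, nu)
            + g(4 * L + s + x, t, nu) + g(2 * L + s + x, t, nu)
        )
    return 0.5 * q_rho_over_omega * total


# Sanity check in normalized units: with j1 = j2 the tree behaves like a
# single finite line of length 2L, whose end stress saturates at
# (q* rho j / Omega) * L and whose midpoint stress stays at zero.
end_stress = sigma_b1(-1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0)
mid_stress = sigma_b1(0.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0)
```

Because g decays like a Gaussian in u, only the first few image terms matter unless νt is much larger than L², so a fixed truncation of a few tens of terms is usually more than enough.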

Both the Riege-Thompson and the CTHKS models have several shortcomings. Neither model is directly applicable to the complex interconnect layouts found in modern power grids. The Riege-Thompson model allows for different diffusivities and temperatures in the branches connected to a junction, which the CTHKS model does not. On the other hand, the CTHKS model can account for the back-stress generated due to EM, which the Riege-Thompson model cannot. Finally, neither model can be applied during the void growth phase of EM. All these factors greatly limit their usefulness for power grid EM checking.

2.4 Review of Power Grid EM checking approaches

2.4.1 Industrial EM checking approach

The state of the art approach for p/g grid EM checking is to break up the grid into isolated

branches, assess the reliability of each branch separately using Black’s model and use the earliest

branch failure time as the failure time for the whole grid. Thus, it is assumed that the grid

fails as soon as any one of its branches fails. This is known as the series model of grid failure,

which was first proposed in [9]. Under the series model, the failure rate of the system is the

sum of failure rates of its individual components. Some industrial EM tools use this concept to

budget EM reliability among various parts of the grid. In other words, this allows designers to

re-balance metal usage in different parts of the grid (e.g. widening some lines to improve their

reliability while narrowing others) in a way that doesn’t impact the overall reliability of the

grid. This idea of EM budgeting was first introduced by J. Kitchin [10] in 1995 and is known

as Statistical Electromigration Budgeting (SEB).

As mentioned before, the reliability assessment for each individual branch is done using

Black’s model. Recall that as per Black’s model, 1) a line is immune to EM failure if the

product of its current density and length is less than the critical Blech product and 2) the MTF

of a branch is inversely proportional to its current density, raised to some power. For branches

that are deemed not to be EM immune as per Blech’s criterion, a maximum allowed current density limit jmax is calculated based on a target (series model) MTF, denoted as µtarget, using the following relation [45], which is derived from Black’s equation

\[ j_{max} = j_{acc}\left(\frac{\mu_{acc}}{\mu_{target}}\right)^{1/\eta}\exp\!\left[\frac{E_a}{\eta k_b}\left(\frac{1}{T_{m,use}}-\frac{1}{T_{m,acc}}\right)\right], \tag{2.19} \]

where µacc is the observed MTF under accelerated testing conditions using current density jacc

and temperature Tm,acc, while Tm,use is the actual operating temperature at which the chip will

be used and the other symbols are as defined before.
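As a rough illustration of how (2.19) is used, the sketch below computes a current density limit from accelerated test data. It assumes Black's equation in the common form MTF = A j^(−η) exp(Ea/(kb Tm)); the symbol names `eta` (current density exponent) and `E_a` (activation energy in eV), the helper `black_mtf` and all numerical values are illustrative assumptions rather than values taken from this work.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def j_max(j_acc, mu_acc, mu_target, T_use, T_acc, eta, E_a):
    """Maximum allowed current density for a target MTF, per Black's equation."""
    scale = (mu_acc / mu_target) ** (1.0 / eta)
    thermal = math.exp(E_a / (eta * K_B) * (1.0 / T_use - 1.0 / T_acc))
    return j_acc * scale * thermal


def black_mtf(j, T, eta, E_a, A=1.0):
    """Black's equation, MTF = A * j^(-eta) * exp(E_a / (k_b * T))."""
    return A * j ** (-eta) * math.exp(E_a / (K_B * T))


# Hypothetical numbers: stress test at 573 K and 3e10 A/m^2 gave an MTF of
# 100 hours; the target MTF at a 378 K use temperature is 1e5 hours.
limit = j_max(j_acc=3e10, mu_acc=100.0, mu_target=1e5,
              T_use=378.0, T_acc=573.0, eta=2.0, E_a=0.9)
```

A branch whose effective current density exceeds `limit` would be flagged under this branch-by-branch check, regardless of what its neighbours are doing.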

This industrial EM checking approach is highly inaccurate for at least two reasons:

1. Ignoring Material Flow :

In order to apply Black’s model, it is implicitly assumed that the connected neighboring branches have no impact on the lifetime of a given branch. This is incorrect because in today’s mesh structured power grids, many branches within the same layer are connected as part of what is called an interconnect tree, and atomic flux can flow freely between them. As a result, connected lines influence each other: two identical connected branches that carry the same current density can in practice have quite different values of MTF, as Gan et al. [5] and Wei et al. [46] have demonstrated in their experiments.

2. Series System Assumption:

The second problem lies with the series system model of the power grid failure. Modern

power grids use a mesh structure. As such, there are many paths for the current to flow

from the C4 bumps to the underlying logic, a characteristic that we refer to as redundancy.

Mesh power grids are in fact closer to (but not quite) a parallel system. As such, it is

highly pessimistic to assume that a single branch failure will always cause the whole grid

to fail.

Over the last few years, many approaches have been proposed that overcome some of these

shortcomings. We will review them next.

2.4.2 Recent approaches

Chatterjee et al. [25, 47] proposed the mesh model as an alternative to the series model. In the

mesh model, a grid is deemed to have failed not when the first branch fails, but when enough

branches have failed so that the voltage drop at some grid node(s) has exceeded a pre-defined

threshold that is chosen so as to avoid causing errors in the underlying logic. However, [25, 47]

still used Black’s model to find the reliability of individual branches, which as we saw before is

inaccurate.

Huang et al. [20] proposed a compact EM model for approximating the TTF of a branch

within an interconnect tree by using a modified version of Korhonen’s solution for a finite line

(2.11). The modification accounted for the material flow and was based on the steady state

stress analysis for the whole tree. Huang et al. approximated the kinetics of branch resistance

change due to void growth using a drift velocity model and used the mesh model to determine

grid failure. The authors later extended their work to incorporate thermal stresses in the grid

[22]. However, their approach was very slow, requiring up to 32 hours to estimate the failure

time of a 400K node grid. The modification based on steady state analysis can determine the

potential void locations in a tree, but the actual time and sequence of void nucleations might

vary considerably from the predicted ones. Moreover, in their approach, only one power grid

TTF sample was obtained and thus, the random nature of EM degradation was not accounted for.

Li et al. [21, 23] used the Riege-Thompson model (2.16) to drive their EM verification

tool. In [21], the authors also proposed a heuristic greedy approach to increase the tree widths

in order to meet power grid integrity and reliability constraints. But, their approach suffers

from all the drawbacks of Riege-Thompson model. In addition, the authors assumed atomic

diffusivity to be the same throughout the whole tree, which is not true. Atomic diffusivity Da

can be assumed to be the same over short distances, but it varies across the whole tree due

to random grain boundary orientations [48, 41]. Thus, there is a need for a new EM checking

approach that accurately models EM degradation using physics-based models, combined with

a mesh model to account for redundancy, while being fast enough to be practically useful.

Figure 2.8: Schematic for a typical on-die power grid.

2.5 Power Grid model

An on-die power/ground (p/g) grid is a multi-layered metal structure that is used to deliver

power from the external package to the underlying logic. A typical power grid structure is

as shown in Fig. 2.8. Each metal layer mostly consists of a set of alternating parallel power

and ground stripes, that are respectively connected to the power and ground stripes of the

immediate upper and lower neighboring layers by vias. This gives rise to the mesh structure

in modern grids. These metal stripes are the multi-branch structures that are referred to as

interconnect trees. Note that the stripes are not necessarily straight lines: they may have bends

or orthogonal branches. However, they do not have loops. The top layer is connected to the

external package through C4 bumps, while the bottom layer is connected to the underlying

logic. The metal stripes are embedded in a rigid dielectric material, such as Silicon Dioxide.

The minimum spacing between the stripes is determined by the technology node. Usually, some

power or ground stripes are removed from a layer to make room for signal lines, which means

that the stripes in power grids are not uniformly placed. The width and height of the metal

stripes increase as we go from the bottom layer to the top layer.

There are three types of parasitic effects on a p/g grid: resistive, capacitive and inductive.

The resistive parasitics are responsible for the voltage drop across the grid under DC currents,

which is typically referred to as the IR drop. The capacitive effects arise due to the proximity

of metal wires, MOSFET capacitances and de-coupling capacitances. The inductive effects

are mostly due to the connections to the package through the C4 bumps, and are referred to

as L di/ dt drops. However, when it comes to EM, only the resistive parasitics are important

because EM analysis is based on effective-EM (DC) current densities. A p/g grid is a linear

system, with current sources (modeling the effects of the underlying logic circuits) as inputs

and node voltage drops as outputs. Since p/g grids carry mostly unidirectional currents, the

effective-EM currents are the same as average currents. In this work, we will use the mesh model [25, 47] for p/g grid reliability checks, in which user-provided thresholds on average voltage drops are used to determine the grid lifetime. In this framework, it becomes sufficient to perform DC analysis of the power grid, driven by average source currents. Thus, a DC model of the grid as shown in Fig. 2.9, devoid of any capacitances and inductances, is sufficient for EM verification.

Figure 2.9: DC model of a power grid.

The power grid nodes, excluding the nodes connected to the voltage sources, are numbered 1, 2, …, m with the ground node being 0. Let i = [ik] ∈ R^m be the vector of non-negative average source currents tied to the grid, such that ik = 0 if node k has no current source. Let uk(t) be the voltage at node k, and u(t) = [uk(t)] ∈ R^m be the vector of all node voltage signals. The voltage vector u(t) is a function of time t because it varies over large time-scales as the grid degrades due to EM. Applying Kirchhoff’s current law (KCL) at every node leads to the following nodal analysis (NA) formulation

\[ G(t)\,u(t) = -i + G_v(t)\,u_{dd}, \tag{2.20} \]

where G(t) and Gv(t) are m × m conductance matrices that vary over large time-scales and udd is a constant vector, each entry of which is equal to vdd. Gv is a matrix of conductances connected to the voltage sources and the matrix G = [gj,k] can be easily constructed using element stamps [49]. If we set ik = 0 ∀k, then clearly u(t) = udd for all time, so that G(t)udd = Gv(t)udd from (2.20). Define vk(t) ≜ vdd − uk(t) to be the voltage drop at node k, and let v(t) = [vk(t)] ∈ R^m be the vector of voltage drops. Substituting u(t) = udd − v(t) in (2.20) and using G(t)udd = Gv(t)udd, the NA formulation can be re-written in terms of the voltage drop vector as

\[ G(t)\,v(t) = i. \tag{2.21} \]

We will use this revised system to obtain the voltage drops directly for a given power grid.
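To make the voltage drop formulation (2.21) concrete, the following minimal sketch stamps a tiny resistive chain and solves G v = i with plain Gaussian elimination. The helper names `stamp_conductance` and `solve_linear`, the 0-based node numbering and the element values are all illustrative assumptions; a production tool would use a sparse direct or iterative solver instead.

```python
def stamp_conductance(G, a, b, g):
    """Stamp a conductance g between nodes a and b.

    A node given as None is tied to a voltage source; in the voltage drop
    formulation its drop is zero, so only the diagonal entry is affected.
    """
    for node in (a, b):
        if node is not None:
            G[node][node] += g
    if a is not None and b is not None:
        G[a][b] -= g
        G[b][a] -= g


def solve_linear(G, rhs):
    """Solve G x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[r]] for r, row in enumerate(G)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (A[r][n] - tail) / A[r][r]
    return x


# Chain: vdd --(2 S)-- node 0 --(1 S)-- node 1 --(1 S)-- node 2
m = 3
G = [[0.0] * m for _ in range(m)]
stamp_conductance(G, None, 0, 2.0)
stamp_conductance(G, 0, 1, 1.0)
stamp_conductance(G, 1, 2, 1.0)
i_src = [0.0, 0.1, 0.2]      # average source currents (A)
v = solve_linear(G, i_src)   # voltage drops (V)

# Mesh-model style check against a hypothetical voltage drop threshold.
v_threshold = 0.7
grid_ok = max(v) <= v_threshold
```

As branches fail due to EM, the corresponding conductances are removed from G and the system is solved again, which is why G is written as a function of time.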

2.6 Partial Differential Equations (PDE)

A PDE is an equation for some quantity z (dependent variable) that depends on two or more independent variables and involves derivatives of z with respect to at least some of the independent variables. A second order PDE for z(x, t) in two independent variables t and x is of the general form

\[ A\frac{\partial^2 z}{\partial t^2} + 2B\frac{\partial^2 z}{\partial t\,\partial x} + C\frac{\partial^2 z}{\partial x^2} + D\frac{\partial z}{\partial t} + E\frac{\partial z}{\partial x} + Fz = G(t,x), \tag{2.22} \]

where A, B and C cannot all be zero. In order to solve the PDE, one needs to specify the boundary conditions (conditions to be satisfied at the boundary of the domain of an independent variable, say x, for all t) and the initial conditions (e.g. the value of z is specified ∀x at some t = t0). A second order PDE is said to be linear if the equation and its boundary and initial conditions do not include any non-linear combination of the dependent variable z or its derivatives. With the coefficient of the mixed derivative written as 2B, as in (2.22), a second order PDE is said to be parabolic if B² − AC = 0. Korhonen’s model (2.9) is a parabolic PDE if Da is assumed to be independent of the stress.

For any given boundary and initial conditions, the objective of solving a PDE is to find the

value of z for all x at some t = tf . There are many ways to solve a PDE, and the solution method

to be used depends on the problem itself. Laplace transform is a powerful technique to obtain an

analytical closed form solution, or the exact solution of a PDE. However, for complex systems,

it is often not possible to derive a closed form solution. For such systems, numerical solution

approaches such as the finite difference method [50], finite element method [51], finite volume

method [52], gradient discretization method(s) [53] or spectral method(s) [54] are preferred.

For numerically solving a linear parabolic PDE, the method of lines is a particularly useful

technique. The method of lines (MoL) [55] is a special finite-difference based technique, where

the basic idea is to discretize the PDE in all but one independent variable, so that we are left

with a set of Ordinary Differential Equations (ODE) that approximate the PDE. As we will

see, there are many well-established methods for solving an ODE system. We can use them to

solve the ODE system approximating the PDE, giving us the solution of the PDE system.

Discretizing the PDE along any variable requires us to approximate the partial derivatives.

For a sufficiently smooth function, one can approximate the partial derivatives using difference

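formulas obtained from the Taylor series. Consider a sufficiently smooth function z(x, t) : R × R → R. Then, using the Taylor series, we can write

\[ z(x+\Delta x, t+\Delta t) = z(x,t) + \frac{\partial z}{\partial t}\Delta t + \frac{\partial z}{\partial x}\Delta x + \frac{1}{2!}\left[\frac{\partial^2 z}{\partial t^2}(\Delta t)^2 + 2\frac{\partial^2 z}{\partial x\,\partial t}\Delta t\,\Delta x + \frac{\partial^2 z}{\partial x^2}(\Delta x)^2\right] + \ldots \tag{2.23} \]

To approximate the partial derivative of z with respect to, say x, we set ∆t = 0 in (2.23) and re-arrange to get

\[ \frac{\partial z}{\partial x} = \frac{z(x+\Delta x,t)-z(x,t)}{\Delta x} - \frac{1}{2!}\frac{\partial^2 z}{\partial x^2}\Delta x - \frac{1}{3!}\frac{\partial^3 z}{\partial x^3}(\Delta x)^2 - \ldots \tag{2.24a} \]

\[ \frac{\partial z}{\partial x} \approx \frac{z(x+\Delta x,t)-z(x,t)}{\Delta x}. \tag{2.24b} \]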

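Equation (2.24b) is known as the forward difference approximation and is accurate up to the first order, i.e. the norm of all terms ignored in (2.24b) (which is essentially the error) is bounded from above by K∆x, where K is a constant. Similarly, if ∆x is replaced by −∆x in (2.23), we obtain the backward difference approximation, which is also first order accurate

\[ \frac{\partial z}{\partial x} = \frac{z(x,t)-z(x-\Delta x,t)}{\Delta x} + \frac{1}{2!}\frac{\partial^2 z}{\partial x^2}\Delta x - \frac{1}{3!}\frac{\partial^3 z}{\partial x^3}(\Delta x)^2 + \ldots \tag{2.25a} \]

\[ \frac{\partial z}{\partial x} \approx \frac{z(x,t)-z(x-\Delta x,t)}{\Delta x}. \tag{2.25b} \]

Adding (2.24a) and (2.25a) and dividing by two, we get the central difference formula, which is second order accurate

\[ \frac{\partial z}{\partial x} = \frac{z(x+\Delta x,t)-z(x-\Delta x,t)}{2\Delta x} - \frac{1}{3!}\frac{\partial^3 z}{\partial x^3}(\Delta x)^2 - \ldots \tag{2.26a} \]

\[ \frac{\partial z}{\partial x} \approx \frac{z(x+\Delta x,t)-z(x-\Delta x,t)}{2\Delta x}. \tag{2.26b} \]

Higher order partial derivatives can be similarly obtained. The central difference formula approximating the second order partial derivative can be stated as

\[ \frac{\partial^2 z}{\partial x^2} \approx \frac{z(x+\Delta x,t)+z(x-\Delta x,t)-2z(x,t)}{(\Delta x)^2}. \tag{2.27} \]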

We will use the central difference formulas (2.26b) and (2.27) for approximating the partial

derivatives.
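The difference in accuracy between the first order and second order formulas can be observed numerically. The sketch below applies the forward and central difference formulas to the test function sin(x), which is an arbitrary choice made only for illustration, and measures how the error shrinks when ∆x is halved: a first order formula should roughly halve its error, while a second order formula should reduce it by about a factor of four.

```python
import math


def forward_diff(f, x, dx):
    """Forward difference, first order accurate, as in (2.24b)."""
    return (f(x + dx) - f(x)) / dx


def central_diff(f, x, dx):
    """Central difference, second order accurate, as in (2.26b)."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)


def second_central_diff(f, x, dx):
    """Central difference for the second derivative, as in (2.27)."""
    return (f(x + dx) + f(x - dx) - 2.0 * f(x)) / (dx * dx)


x0 = 1.0
exact = math.cos(x0)  # d/dx sin(x) at x0


def error(formula, dx):
    return abs(formula(math.sin, x0, dx) - exact)


# Error ratio when the step is halved from 1e-2 to 5e-3.
ratio_forward = error(forward_diff, 1e-2) / error(forward_diff, 5e-3)
ratio_central = error(central_diff, 1e-2) / error(central_diff, 5e-3)
```

The observed ratios are close to 2 and 4 respectively, in line with the leading error terms in (2.24a) and (2.26a).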

2.7 Ordinary Differential Equations (ODE)

An ordinary differential equation (ODE) is an equation for some quantity z (dependent variable)

that depends on one independent variable and involves ordinary derivatives (as opposed to

partial) of z with respect to the independent variable. A first order ODE can be written as

\[ \frac{dz}{dt} = f(z(t), t), \tag{2.28} \]

where f : R × R → R, z : R → R and t is an independent scalar variable. It is of first order

because the highest derivative is only the first derivative. If z and f are vectors in Rn, then we

get a system of ordinary differential equations, or simply an ODE system. In order to solve an

ODE system, one needs to specify the initial condition(s), i.e. the value of z at t = t0. An ODE

system with an initial condition is generally referred to as an Initial Value Problem or IVP

\[ \frac{dz}{dt} = f(z(t), t), \quad z(t_0) = z_0, \quad t \in [t_0, t_f]. \tag{2.29} \]

A sufficient condition for this IVP to have a unique solution is that f(z(t), t) be continuous on [t0, tf] × R^n and that it satisfies the Lipschitz condition [49] with respect to z. An IVP is said

to be well-posed if for a given finite perturbation in the initial condition z0, the perturbation

in the solution of the IVP is bounded. We will assume that all IVPs we are trying to solve are

well-posed. The ODE system (2.28) with z ∈ Rn is said to be linear if f(z, t) takes the form

f(z, t) = A(t)z(t) + u(t), (2.30)

where A(t) ∈ R^{n×n} and u(t) ∈ R^n. Further, if A(t) is independent of time, then we end up with a Linear Time Invariant (LTI) system

\[ \frac{dz}{dt} = A z(t) + u(t). \tag{2.31} \]

We will return to linear ODEs and LTI systems when we discuss the state-space representation

of a system.

There are many well-known techniques for numerically solving an IVP, i.e. an ODE with a

given initial condition. All these techniques involve discretization of the independent variable,

usually time t, and extending the known initial solution at t = t0 in a step-by-step fashion such

that dz/ dt = f(z, t) is (approximately) satisfied for all time-steps t0 < t1 < t2 < . . . < tn−1 <

tn < . . . up to the final time point t = tf . We will denote the true solution at time tn by z(tn),

and the approximate numerical solution obtained by the numerical method as zn. Obviously

for a good numerical method, zn ≈ z(tn) within some user-specified error bound. The solution

between two time-points tn−1 and tn is obtained by using an interpolation polynomial, which

depends on the numerical method being used. Numerical methods for solving IVPs can be

broadly classified into two types [49, 56, 57]:

1. One-step methods: These methods make use of the previously computed solution at time-point tn to compute the solution at the next time-point tn+1. Some examples of such methods would be Forward Euler (FE), Backward Euler (BE), Trapezoidal (TR) and Runge-Kutta (RK) methods. The Runge-Kutta methods often evaluate the function f(·) at intermediate time-points between tn and tn+1 in order to improve the accuracy of the solution.

2. Multi-step methods: A k-step method makes use of previously computed solutions at k time points tn, tn−1, …, tn−k+1 to compute the solution at the next time-point tn+1. Multi-step methods (k > 1) require some start-up scheme to compute the first k solutions before the method can be applied. These methods are particularly suitable for stiff systems. We will focus mainly on linear multi-step methods, because circuit equations are often stiff. Backward Differentiation formulas (BDF) and Adams-Moulton methods are some examples of multi-step methods.

Table 2.1: Butcher tableau characterizing an m stage RK formula with built-in error estimates

a1 | b11  b12  ...  b1m
a2 | b21  b22  ...  b2m
.. | ...  ...  ...  ...
am | bm1  bm2  ...  bmm
---+--------------------
   | w1   w2   ...  wm
   | w*1  w*2  ...  w*m
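To make the one-step methods listed above concrete, the sketch below applies Forward Euler, Backward Euler and the Trapezoidal method to the scalar test equation dz/dt = −λz with a large λ. The test equation, the value of λ and the step size are illustrative assumptions chosen so that hλ = 10; the point is only to show why implicit methods are preferred when the system is stiff.

```python
LAM = 1000.0  # decay rate of the test equation dz/dt = -LAM * z


def forward_euler(z, h):
    """Explicit: z_{n+1} = z_n + h f(z_n)."""
    return z + h * (-LAM * z)


def backward_euler(z, h):
    """Implicit: z_{n+1} = z_n + h f(z_{n+1}), solved in closed form here."""
    return z / (1.0 + h * LAM)


def trapezoidal(z, h):
    """Implicit: z_{n+1} = z_n + (h/2) (f(z_n) + f(z_{n+1}))."""
    return z * (1.0 - 0.5 * h * LAM) / (1.0 + 0.5 * h * LAM)


def integrate(step, z0, h, n_steps):
    """Apply a one-step method n_steps times with a fixed step h."""
    z = z0
    for _ in range(n_steps):
        z = step(z, h)
    return z


h = 0.01  # h * LAM = 10, far larger than Forward Euler can tolerate
z_fe = integrate(forward_euler, 1.0, h, 20)
z_be = integrate(backward_euler, 1.0, h, 20)
z_tr = integrate(trapezoidal, 1.0, h, 20)
```

The true solution decays to essentially zero over this interval. Backward Euler and the Trapezoidal method both decay as well, whereas Forward Euler multiplies the solution by −9 at every step and diverges, so it would need a step size below 2/λ to remain bounded.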

In general, almost all numerical methods for solving IVPs can be written in the following general form

\[ \sum_{j=-1}^{k-1} a_j z_{n-j} = h\,\phi_f(z_{n+1}, z_n, \ldots, z_{n-k+1}, t_n; h), \tag{2.32} \]

where k ≥ 1, aj are scalar coefficients, h = tn+1 − tn is the time-step (assumed to be fixed) and φf(·) is a function that depends on f(·). The objective is to solve (2.32) for zn+1. A numerical method is said to be convergent if, for a well-posed IVP satisfying the Lipschitz condition, we have

\[ \lim_{h\to 0}\left(\max_{t_n\in[t_0,t_f]} \|z(t_n) - z_n\|\right) = 0. \tag{2.33} \]

Convergence guarantees that for a well behaved IVP, any desired level of accuracy can be achieved by choosing a small enough fixed step-size h. A numerical method is convergent if and only if it is both zero-stable and consistent. A numerical method is said to be zero-stable if there exists a constant h0 > 0 such that for a well-posed IVP, the change in its initial condition by a finite amount produces a bounded change in the (discrete) solution of the IVP obtained by applying the numerical method with fixed step-size h < h0. Since a numerical method approximates the underlying solution, the LHS and RHS of (2.32) when applied to the true solution differ by O(h^p), where O(h^p) denotes that the discrepancy between the LHS and RHS of (2.32) has an upper bound of Kh^p for some constant K. This discrepancy is referred to as the residual. A numerical method is said to be consistent if its residual is O(h^p) with p ≥ 2.

2.7.1 Runge-Kutta Methods

An m-stage Runge-Kutta (RK) method evaluates the function f(·) at m points in the interval [tn, tn+1] and φf(·) is the weighted average of these sampled values. Specifically, an m stage RK formula to evaluate zn+1 can be stated as

\[ z_{n+1} = z_n + h\,(w_1 k_1 + \ldots + w_m k_m), \tag{2.34} \]

\[ k_j = f\Big(z_n + h\sum_{r=1}^{m} b_{jr} k_r,\; t_n + h a_j\Big). \tag{2.35} \]

This formula is characterized by the table of m² + 3m parameters shown in Table 2.1, with the last row mainly used for computing error estimates [see (2.48)]. This table of parameters is usually referred to as the Butcher tableau. The parameters are usually chosen to make the implementation easier or to improve the accuracy of the method. An RK method is said to be explicit if bjr = 0 for r ≥ j, otherwise it is implicit. Explicit RK methods are numerically less expensive because the kj can be computed sequentially. On the other hand, implicit RK methods may lead to a fully coupled non-linear system that can be hard to solve.
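The sequential evaluation of the stages in an explicit RK method can be sketched as follows. The function `explicit_rk_step` is a hypothetical helper that takes the tableau entries a, b and w in the notation of Table 2.1, and the tableau shown is the classical fourth order RK method, used here only as a familiar example for a scalar ODE.

```python
import math


def explicit_rk_step(f, z, t, h, a, b, w):
    """One step of an explicit m-stage RK method, following (2.34)-(2.35).

    Only the entries b[j][r] with r < j are used, which is what makes the
    method explicit: each k_j depends on previously computed stages only.
    """
    k = []
    for j in range(len(w)):
        z_stage = z + h * sum(b[j][r] * k[r] for r in range(j))
        k.append(f(z_stage, t + h * a[j]))
    return z + h * sum(w[j] * k[j] for j in range(len(w)))


# Butcher tableau of the classical 4-stage, fourth order RK method.
RK4_A = [0.0, 0.5, 0.5, 1.0]
RK4_B = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
RK4_W = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]


def integrate_rk4(f, z0, t0, h, n_steps):
    """Fixed-step integration using the tableau above."""
    z, t = z0, t0
    for _ in range(n_steps):
        z = explicit_rk_step(f, z, t, h, RK4_A, RK4_B, RK4_W)
        t += h
    return z


# dz/dt = z with z(0) = 1, integrated to t = 1; the exact answer is e.
z_end = integrate_rk4(lambda z, t: z, 1.0, 0.0, 0.1, 10)
rk4_error = abs(z_end - math.e)
```

Swapping in a different explicit method only requires a different tableau, which is the main practical appeal of the Butcher representation.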

2.7.2 Linear Multi-Step Methods

For linear multi-step (LMS) methods, φf(·) is linear so that (2.32) becomes

\[ \sum_{j=-1}^{k-1} a_j z_{n-j} = h\sum_{j=-1}^{k-1} b_j f(z_{n-j}, t_{n-j}), \tag{2.36} \]

where the bj are also scalar coefficients. If b−1 = 0, the method is said to be explicit, otherwise it is implicit and one may need to solve a non-linear equation to compute the value of zn+1.

For many LMS methods, the scalar coefficients aj and bj can be determined using the

linear difference operator. The linear difference operator is also useful in defining the order of

an LMS method, which determines its accuracy and will be useful later in understanding the

error estimates. The linear difference operator of an LMS method with fixed time-step h is an operator that takes an arbitrary function s(t) and produces the following time function

\[ D[s(t); h] \triangleq \sum_{j=-1}^{k-1} a_j s(t - jh) - h\sum_{j=-1}^{k-1} b_j s^{(1)}(t - jh), \tag{2.37} \]

where s(t) is assumed to be differentiable as often as desired and s^(1)(·) is the 1st derivative of s(·). If D is applied to the true solution z(t) and evaluated at tn, we get the so-called residual

\[ R_{n+1} = \sum_{j=-1}^{k-1} a_j z(t_n - jh) - h\sum_{j=-1}^{k-1} b_j z^{(1)}(t_n - jh). \tag{2.38} \]

Using the Taylor series expansion of s(t) in (2.37), evaluating the derivatives and collecting similar terms, we can write

\[ D[s(t); h] = C_0 s(t) + C_1 h s^{(1)}(t) + \ldots + C_q h^q s^{(q)}(t) + \ldots, \tag{2.39} \]

where s^(q) denotes the qth derivative of s with respect to t and

\[ C_0 = \sum_{j=-1}^{k-1} a_j, \tag{2.40} \]

\[ C_1 = -\sum_{j=-1}^{k-1} j a_j - \sum_{j=-1}^{k-1} b_j, \tag{2.41} \]

\[ C_q = \frac{(-1)^q}{q!}\sum_{j=-1}^{k-1} j^q a_j - \frac{(-1)^{q-1}}{(q-1)!}\sum_{j=-1}^{k-1} j^{q-1} b_j. \tag{2.42} \]

An LMS method is said to be of order p if we have C0 = C1 = … = Cp = 0 and Cp+1 ≠ 0, so that the residual is given by

\[ R_{n+1} = C_{p+1} h^{p+1} z^{(p+1)}(t_n) + O(h^{p+2}), \tag{2.43} \]

where O(h^{p+2}) denotes that the norm of all following terms is bounded from above by Kh^{p+2}, where K is a constant. Thus, an order p LMS method has a residual of the order of h^{p+1}.
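The order conditions above lend themselves to a direct check. The sketch below evaluates the coefficients Cq of (2.40)-(2.42) in exact rational arithmetic and reports the order p together with Cp+1 for a few well-known methods. The dictionary-based representation of the coefficients, indexed by j from −1 to k − 1 as in (2.36), and the function names are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def lms_coefficient(q, a, b):
    """C_q of (2.40)-(2.42); a and b map the index j to a_j and b_j."""
    if q == 0:
        return sum(a.values(), Fraction(0))
    sum_a = sum(j ** q * aj for j, aj in a.items())
    sum_b = sum(j ** (q - 1) * bj for j, bj in b.items())
    return (Fraction((-1) ** q, factorial(q)) * sum_a
            - Fraction((-1) ** (q - 1), factorial(q - 1)) * sum_b)


def lms_order(a, b, max_q=12):
    """Return (p, C_{p+1}): the order and the first non-zero coefficient."""
    for q in range(max_q + 1):
        c = lms_coefficient(q, a, b)
        if c != 0:
            return q - 1, c
    raise ValueError("all coefficients vanish up to max_q")


F = Fraction
forward_euler = ({-1: F(1), 0: F(-1)}, {-1: F(0), 0: F(1)})
backward_euler = ({-1: F(1), 0: F(-1)}, {-1: F(1), 0: F(0)})
trapezoidal = ({-1: F(1), 0: F(-1)}, {-1: F(1, 2), 0: F(1, 2)})
# BDF2: z_{n+1} - (4/3) z_n + (1/3) z_{n-1} = (2/3) h f_{n+1}
bdf2 = ({-1: F(1), 0: F(-4, 3), 1: F(1, 3)}, {-1: F(2, 3), 0: F(0), 1: F(0)})

orders = {
    "FE": lms_order(*forward_euler),
    "BE": lms_order(*backward_euler),
    "TR": lms_order(*trapezoidal),
    "BDF2": lms_order(*bdf2),
}
```

Using `Fraction` avoids round-off, so a coefficient that should vanish is reported as exactly zero rather than as a small floating point number.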

2.7.3 Error estimates

Error in the computed solution depends on the stability of the IVP we are trying to solve and

on the stability of the numerical method used to solve the IVP. Well-posed IVPs are stable and

do not introduce significant errors. For any numerical method, the total error consists of two

components: a local error which is introduced in the present step (moving from tn to tn+1)

and a global error which is propagated from all previous steps (from t0 to tn+1). For stable

LMS numerical methods, it can be shown that the local error of an order p method is O(h^{p+1}) and its global error is O(h^p) [49]. In general, it is really hard to estimate the global error; it

is computationally expensive and is of limited applicability [58, 59]. Instead, in practice the

accuracy of the computed solution is often determined by analyzing the error introduced in a

single integration step: it is ensured that the local error per integration step is bounded as per

some user provided tolerance. The local truncation error (LTE) is often used as a measure of

the local error and is defined as follows: For a k-step method, let zn+1 be the value returned when we artificially set the previous k computed solutions to their true solutions, i.e. we set zn−j = z(tn−j) for j = 0, 1, …, k − 1. Then the LTE is defined as

\[ \epsilon_{LTE}(h) = z(t_{n+1}) - z_{n+1}. \tag{2.44} \]

The LTE can be thought of as a direct measure of how well the discrete numerical formula approximates the true solution. Lambert [60] shows that for an LMS method

\[ \epsilon_{LTE}(h) = R_{n+1} + O(h^{p+2}). \tag{2.45} \]

Motivated by this, the principal local truncation error or PLTE for an order p LMS method with fixed time-step h is defined as

\[ \epsilon_{PLTE} = C_{p+1} h^{p+1} z^{(p+1)}(t_n), \tag{2.46} \]

where the expression of the residual from (2.43) was used. In most cases, εPLTE is used as a proxy for local error estimation in LMS methods, with Cp+1 often referred to as the error constant.

For RK methods, the LTE can be estimated using Richardson error estimates [61], or the estimate can be built into the RK method. Such RK methods combine two methods, usually one of order p and another of order p − 1, that share the same intermediate evaluations within a single step but have different output coefficients w∗r. Specifically, the lower order method is defined as

\[ z^*_{n+1} = z_n + h\,(w^*_1 k_1 + \ldots + w^*_m k_m), \tag{2.47} \]

where kj are the same as in (2.35). The coefficients w∗r form the bottom row of Table 2.1. The error estimate is then calculated as

\[ \epsilon_{RK} = z_{n+1} - z^*_{n+1} = h\sum_{r=1}^{m}(w_r - w^*_r)\,k_r, \tag{2.48} \]

which can be shown to be O(h^p).

2.7.4 Variable time-stepping

Error estimates are useful not only to judge the accuracy of a solution, but also to implement variable step-size numerical solvers. Almost all modern ODE solver implementations use

variable time-stepping: they monitor the accuracy of the solution using error estimates and

adaptively change the step-size in the course of the computation. The decision to increase or

decrease the step-size is primarily taken based on user-specified error tolerances: the step-size

is decreased if the user-specified tolerance is not met (decreasing step-size should decrease error

for convergent numerical methods), and it is increased if the error estimate for the past few

time-steps is very low as compared to the user provided error tolerances. This usually results in

larger step sizes when the solution is varying slowly and smaller step sizes when the solution is

changing rapidly. In the absence of variable time stepping, a fixed time-step ODE solver is forced to take the smallest time step that satisfies the user tolerances at the steepest part of the solution. As a result, variable time-step solvers are considerably faster than their fixed time-step variants.

Implementing a change in time-step requires the use of some heuristics to decide how much

the time-step should change. There is no universal best way of changing the time-step; it

depends on the type of IVP and the numerical method being used. Some strategies include

scaling the time step by a constant factor for every increase or decrease (i.e. the next time-step

is sh or h/s where h is the previous time-step), while in others the scaling factor depends on the

error estimation itself. Changing the time-step is easier for single step methods: they can just

evaluate the solution at the next time-point and move on. However, changing the time-step is

a little harder for multi-step methods as all the previous time-points were obtained using the

older time-step value. This is usually overcome using interpolation methods or using variable

coefficient formulas. Finally, changing the time-step in implicit ODE solvers, which often require solving non-linear coupled equations (or at the very least a linear-system solve), can slow down the solver as it prevents re-using the previously computed factorizations.
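A minimal sketch of error-controlled time-stepping for a one-step method is given below. It uses the embedded Heun-Euler pair (orders 2 and 1), in which the difference between the two solutions serves as the error estimate in the spirit of (2.48). The controller constants (safety factor 0.9, growth limited to 2x, shrinkage limited to 0.2x), the initial step `h0` and the test problem are illustrative assumptions and not the settings used in this work.

```python
import math


def heun_euler_step(f, z, t, h):
    """Embedded pair: Heun (order 2) with forward Euler (order 1)."""
    k1 = f(z, t)
    k2 = f(z + h * k1, t + h)
    z_high = z + 0.5 * h * (k1 + k2)
    z_low = z + h * k1
    return z_high, abs(z_high - z_low)  # solution and error estimate


def integrate_adaptive(f, z0, t0, tf, tol, h0=1e-3):
    """Integrate a scalar ODE from t0 to tf with a per-step error tolerance."""
    t, z, h = t0, z0, h0
    accepted = 0
    while tf - t > 1e-12:
        h = min(h, tf - t)  # do not step past the final time
        z_new, err = heun_euler_step(f, z, t, h)
        if err <= tol:      # accept the step
            t += h
            z = z_new
            accepted += 1
        # The estimate scales as h^2 for this pair, hence the square root.
        scale = 0.9 * math.sqrt(tol / err) if err > 0.0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return z, accepted


# dz/dt = -z, z(0) = 1: the solution flattens out, so steps should grow.
z_end, n_accepted = integrate_adaptive(lambda z, t: -z, 1.0, 0.0, 5.0, 1e-6)
fixed_step_count = 5.0 / 1.2e-3  # steps needed if the initial step were kept
```

For this problem the controller ends up taking far fewer steps than a fixed-step run that keeps the small step required near t = 0, which is the behaviour described above: larger steps where the solution varies slowly.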

2.8 Compact Thermal Models

Since temperature plays an important role in EM degradation, we will need a way to determine

the power grid temperature distribution. This can be done by using Compact Thermal Models

[62], as shown below. This section also provides a small case study for the application of MoL

to a PDE.

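Assuming an isotropic thermal conductivity that is independent of position and temperature, the heat transfer/diffusion equation for solids in Cartesian coordinates can be written as [24, 63]

\[ D c_p \frac{\partial T_m}{\partial t} = \kappa_T\left(\frac{\partial^2 T_m}{\partial x^2} + \frac{\partial^2 T_m}{\partial y^2} + \frac{\partial^2 T_m}{\partial z^2}\right) + \gamma_T, \tag{2.49} \]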

where Tm is the time and space dependent temperature profile in Kelvin (K), γT is the power density of the heat source(s) (W/m³), κT is the thermal conductivity (W/(m·K)), D is the material density (kg/m³) and cp is the specific heat (J/(kg·K)). The PDE (2.49) can be converted to a system of ODEs by using the Method of Lines, i.e. by discretizing along the spatial domain (x, y and z axes). Let ∆x, ∆y and ∆z be the discretization steps along the x, y and z axes, respectively. Then, we will end up with small cuboid shaped volume elements of dimension ∆x × ∆y × ∆z as shown in Fig. 2.10a. Each cuboid is identified by a unique triplet index (i, j, k) and is isothermal with temperature Tm(i, j, k). The index i increases as we move along the x direction, j increases along the y direction and k increases along the z direction. Then, we can write using the central difference formula

\[ D c_p \frac{dT_m(i,j,k)}{dt} = \kappa_T\frac{T_m(i+1,j,k) - 2T_m(i,j,k) + T_m(i-1,j,k)}{\Delta x^2} + \kappa_T\frac{T_m(i,j+1,k) - 2T_m(i,j,k) + T_m(i,j-1,k)}{\Delta y^2} + \kappa_T\frac{T_m(i,j,k+1) - 2T_m(i,j,k) + T_m(i,j,k-1)}{\Delta z^2} + \gamma_T. \tag{2.50} \]

Figure 2.10: (a) Cuboids resulting from spatial discretization along the x, y and z axes with their indices (note that we have not shown cuboids with indices (i, j−1, k) and (i, j+1, k) for clarity) and (b) the equivalent electrothermal model for each cuboid. The conductances gxT, gyT and gzT are shared by the neighbouring cuboids.

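After multiplying both sides by ∆x∆y∆z and some re-arranging, we get

\[ c_T\frac{dT_m(i,j,k)}{dt} + 2\,(g_{xT} + g_{yT} + g_{zT})\,T_m(i,j,k) - g_{xT}\big[T_m(i+1,j,k) + T_m(i-1,j,k)\big] - g_{yT}\big[T_m(i,j+1,k) + T_m(i,j-1,k)\big] - g_{zT}\big[T_m(i,j,k+1) + T_m(i,j,k-1)\big] = i_T, \tag{2.51} \]

where

\[ g_{xT} = \kappa_T\frac{\Delta y\,\Delta z}{\Delta x}, \quad g_{yT} = \kappa_T\frac{\Delta x\,\Delta z}{\Delta y}, \quad g_{zT} = \kappa_T\frac{\Delta x\,\Delta y}{\Delta z}, \tag{2.52} \]

\[ c_T = D c_p\,\Delta x\,\Delta y\,\Delta z, \quad i_T = \gamma_T\,\Delta x\,\Delta y\,\Delta z. \tag{2.53} \]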

Clearly, an equivalence can be drawn between (2.51) and electric circuits, where Tm(i, j, k) is

equivalent to voltage at node (i, j, k), κT is equivalent to electrical conductivity, gxT , gyT and

gzT are equivalent to electrical conductances, cT is equivalent to a capacitor to the ground and

iT is equivalent to a current source. The cuboid is thus equivalent to a thermal node that has a

current source, a capacitor and 6 resistors connected to neighbouring thermal nodes, as shown

in Fig. 2.10b, and so (2.51) can be obtained by applying KCL at this node. This arrangement

is known as a Compact Thermal Model (CTM) [62]. Using CTMs, we can express the system

of ODEs ∀i, j, k in a given volume concisely as

\[ C_T\frac{dT_m(t)}{dt} + G_T T_m(t) = i_{Ts}(t) + G_{T,0}T_{amb}, \tag{2.54} \]

where GT is the thermal conductance matrix, CT is the diagonal thermal capacitance matrix,

Tm(t) is the vector of temperatures at all thermal nodes, iTs(t) is the vector of iT values for each

thermal node, Tamb is the surrounding ambient temperature and GT,0 is a matrix consisting of

thermal conductances from boundary nodes (at the top, bottom and sides) to the surroundings

that model the heat transfer between the given volume and the surroundings. These boundary

conditions can be isothermal (fixed temperature), insulated (no heat transfer) or convective

(heat loss due to difference in ambient and boundary temperatures) [24]. Equation (2.54) is

the thermal equivalent of the RC power grid model obtained using NA analysis.
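The element values in (2.52) and (2.53), and the node equation (2.51), can be illustrated with a short sketch. The function names `ctm_elements` and `node_dT_dt` and the argument layout are illustrative assumptions, not part of the original text; the code only evaluates the formulas above for a single cuboid.

```python
def ctm_elements(kappa_t, density, c_p, gamma_t, dx, dy, dz):
    """Per-cuboid thermal conductances, capacitance and source, as in (2.52)-(2.53)."""
    volume = dx * dy * dz
    return {
        "g_x": kappa_t * dy * dz / dx,   # conductance to each x neighbour
        "g_y": kappa_t * dx * dz / dy,   # conductance to each y neighbour
        "g_z": kappa_t * dx * dy / dz,   # conductance to each z neighbour
        "c_t": density * c_p * volume,   # thermal capacitance to ground
        "i_t": gamma_t * volume,         # equivalent current (heat) source
    }


def node_dT_dt(elem, t_node, t_x, t_y, t_z):
    """Solve the node equation (2.51) for dT_m/dt.

    t_x, t_y and t_z are (minus, plus) pairs of neighbour temperatures.
    """
    flow_out = (
        elem["g_x"] * (2.0 * t_node - t_x[0] - t_x[1])
        + elem["g_y"] * (2.0 * t_node - t_y[0] - t_y[1])
        + elem["g_z"] * (2.0 * t_node - t_z[0] - t_z[1])
    )
    return (elem["i_t"] - flow_out) / elem["c_t"]
```

When all neighbours are at the node temperature, the conductive terms cancel and the node heats at the rate γT/(D cp), which is a quick sanity check on the signs in (2.51).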

2.9 State Space Models

A state space model (SSM) of a linear system with $n$ state variables, $n_i$ inputs and $n_o$ outputs,

can be written as

$$\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t), \tag{2.55a}$$

$$y(t) = L(t)\,x(t) + D(t)\,u(t), \tag{2.55b}$$

where $\dot{x}(t) \triangleq dx/dt$, $x(t) = [x_i(t)] \in \mathbb{R}^{n}$ is the vector of states $x_i$, $A(t) = [a_{i,j}(t)] \in \mathbb{R}^{n \times n}$ is the system matrix, $B(t) = [b_{i,j}(t)] \in \mathbb{R}^{n \times n_i}$ is the input matrix, $L(t) = [l_{i,j}(t)] \in \mathbb{R}^{n_o \times n}$ is the output matrix, $D(t) = [d_{i,j}(t)] \in \mathbb{R}^{n_o \times n_i}$ is the feedforward matrix, $u(t) = [u_i(t)] \in \mathbb{R}^{n_i}$ is the vector of inputs to the system and $y(t) = [y_i(t)] \in \mathbb{R}^{n_o}$ is the output of the system. At any

given time t, the state vector describes the linear system completely. Note that (2.55a) is a first

order ODE and is the same as (2.30). If all the matrices in (2.55) are independent of time t,

then we end up with a linear time invariant (LTI) system of the form

$$\dot{x}(t) = A\,x(t) + B\,u(t), \tag{2.56a}$$

$$y(t) = L\,x(t) + D\,u(t). \tag{2.56b}$$

An example of an LTI system is (2.54), with $x(t) = T_m(t)$, $A = -C_T^{-1} G_T$, $B = C_T^{-1}$ and

$u(t) = i_{Ts}(t) + G_{T,0} T_{amb}$. Similar to an ODE, one needs to specify the initial condition at some

time t = t0 in order to solve a SSM. Consider a simple homogeneous (no input) LTI system of

the form

$$\dot{x}(t) = A\,x(t), \tag{2.57}$$

with given initial condition $x(0)$. Then, its solution can be shown to be

$$x(t) = e^{At} x(0), \tag{2.58}$$

where $e^{At}$ is the matrix exponential that can be expressed as

$$e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}. \tag{2.59}$$

Moreover, if the system matrix A has distinct eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{n-1}$ or is diagonalizable,

then each state variable $x_i$ can be expressed as a weighted sum of $n$ exponential components

$$x_i(t) = \sum_{j=0}^{n-1} m_{i,j}\, e^{\lambda_j t}, \qquad i = 0, 1, \ldots, n-1, \tag{2.60}$$

where the mi,j are constant coefficients that depend on the initial conditions and the eigenvec-

tors of A. A homogeneous system, as given by (2.57), is asymptotically stable if and only if all

eigenvalues of A have strictly negative real part and is unstable if any eigenvalue of A has a

positive real part. From (2.60), if all eigenvalues have negative real parts, then xi(∞) = 0. So

an asymptotically stable system decays to zero in the absence of any input. On the other hand,

if any eigenvalue $\lambda_i$ has a positive real part, then $e^{\lambda_i t}$ blows up as $t \to \infty$, which results in an

unstable system.

In the presence of inputs u(t), the solution is the sum of a homogeneous response that

depends on the initial condition (given by (2.58)) and a forced response which is calculated

using a convolution integral, as shown here

$$x(t) = e^{At} x(0) + \int_0^t e^{A(t-\tau)} B\,u(\tau)\,d\tau. \tag{2.61}$$

In most cases, computing the matrix exponential is a very expensive operation. As such,

numerical integration methods presented in Section 2.7 are often used to solve a state space

model.
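As a small illustration of (2.58)-(2.60), the sketch below evaluates the homogeneous response of a 2×2 LTI system in two ways: with a truncated version of the series (2.59) and with the modal sum (2.60). The matrix `A`, the initial state `x0` and the helper names are illustrative assumptions chosen so that the eigenvalues (−2 and −4) and eigenvectors are known in closed form; the truncated series is used only for demonstration and is not meant as a practical way to compute the matrix exponential.

```python
import math


def mat_vec(a, x):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def expm_series_apply(a, x0, t, terms=60):
    """Approximate e^{At} x0 by truncating the series (2.59) after `terms` terms."""
    result = list(x0)
    term = list(x0)  # holds (At)^k x0 / k!
    for k in range(1, terms):
        term = [t * v / k for v in mat_vec(a, term)]
        result = [r + v for r, v in zip(result, term)]
    return result


# Illustrative stable system: eigenvalues -2 (eigenvector [1, 1]) and -4 ([1, -1]).
A = [[-3.0, 1.0], [1.0, -3.0]]
x0 = [1.0, 0.0]  # equals 0.5*[1, 1] + 0.5*[1, -1]


def modal_solution(t):
    """Weighted sum of exponentials, as in (2.60), for the system above."""
    slow = 0.5 * math.exp(-2.0 * t)
    fast = 0.5 * math.exp(-4.0 * t)
    return [slow + fast, slow - fast]
```

Both eigenvalues have negative real parts, so both routes give a state that decays towards zero, in line with the stability discussion above.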

2.10 Mean estimation using Monte Carlo random sampling

Consider a continuous random variable (RV) T that has a certain distribution. The RV T is

a function that maps the outcome of a random process (e.g. the microstructure of a line or a

grid) to a real number (e.g. the TTF). The sets $\{T \le t\}$ represent the events where the RV

T has a value less than or equal to t, and have assigned probability values. A Cumulative Distribution

Function or cdf of a RV T, denoted by $F_T(t)$, is defined as

$$F_T(t) \triangleq P\{T \le t\}, \tag{2.62}$$

where the right hand side is equal to the probability that the RV T takes a value less than or

equal to t. A RV is completely characterized by its cdf.

Random sampling refers to the process of iteratively generating sample values from an

underlying distribution of a RV. Monte Carlo methods estimate a quantity of interest based on

repeated random sampling. A classic example is the problem of estimating the mean of a RV,

denoted by $\mu$ or E[T]. Ideally, if we draw a very large number of samples from the underlying

distribution of the RV and calculate the arithmetic average, we can estimate E[T]. However,

it is often practically not possible to obtain a large number of samples. Instead, one would like

to know how close the estimated mean $\hat{\mu}$ is to the true mean $\mu$ of the distribution and stop

sampling when the estimated mean is close enough to the true mean. This is where Monte Carlo random

sampling comes into play.

Suppose we are sampling from a RV that has a normal distribution and the variance of the

distribution is not known. Then, in order to ensure an upper bound $\epsilon_{mc}$ on the relative error

between $\hat{\mu}$ and $\mu$ with a confidence of $(1-\zeta) \times 100\%$, the number of samples $s$ needed is given

by [64]

$$s \ge \left[ \frac{z_{\zeta/2}\sqrt{v}}{|\hat{\mu}|\,\epsilon_{mc}/(1+\epsilon_{mc})} \right]^2, \tag{2.63}$$

where $z_{\zeta/2}$ is the $(1-\zeta/2)$-percentile of the standard normal distribution (i.e. a normal distribution with mean 0 and variance 1) and $v$ is the unbiased estimator of variance calculated as

$$v = \frac{1}{s-1} \sum_{i=1}^{s} (T_i - \hat{\mu})^2, \tag{2.64}$$

with Ti being the ith sample obtained. The usage of v instead of the true variance (which

is unknown) is acceptable only when s is large enough. As per [64], (2.63) can be used only

when $s \ge 30$. Roughly speaking, a confidence of $(1-\zeta) \times 100\%$ means that the estimation

procedure using (2.63) as a stopping criterion satisfies the relative error bound $|\hat{\mu} - \mu|/\mu \le \epsilon_{mc}$

$(1-\zeta) \times 100\%$ of the time.
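The stopping rule can be sketched as a sampling loop that re-evaluates the bound (2.63) after every new sample. This is a minimal sketch under stated assumptions: the bound is taken to use the sample standard deviation (the square root of the variance estimator in (2.64)), the function name `estimate_mean` and its `draw_sample` callback are hypothetical, and the running mean and variance are updated with Welford's method, which is a choice made here for convenience rather than something prescribed by the text.

```python
import math
import random
from statistics import NormalDist


def estimate_mean(draw_sample, eps_mc=0.05, zeta=0.05,
                  min_samples=30, max_samples=100_000):
    """Draw samples until the bound (2.63) is met; return (mean estimate, sample count)."""
    z = NormalDist().inv_cdf(1.0 - zeta / 2.0)  # z_{zeta/2}
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    while n < max_samples:
        x = draw_sample()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples and mean != 0.0:
            v = m2 / (n - 1)  # unbiased variance estimate, as in (2.64)
            allowed = abs(mean) * eps_mc / (1.0 + eps_mc)
            if n >= (z * math.sqrt(v) / allowed) ** 2:
                break
    return mean, n


# Illustrative use: a normal RV with mean 10 and standard deviation 2.
rng = random.Random(7)
mean_hat, samples_used = estimate_mean(lambda: rng.gauss(10.0, 2.0))
```

The `min_samples=30` guard mirrors the remark that (2.63) should only be used once at least 30 samples are available.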

Chapter 3

Extended Korhonen’s model

3.1 Introduction

In this chapter, we will present the first main contribution of our work: the Extended Korhonen’s

model (EKM). We will begin by formally defining interconnect trees and explaining why they

are important for EM checking. We will then introduce EKM by using a simple interconnect

tree as an example. After this, we will state the boundary laws to model the material transfer

between the connected branches and state EKM as a PDE system. We will then describe our

numerical approach for converting the PDE system to an ODE system by using the Method of

Lines. We will verify our numerical approach and the proposed model itself by comparing its

results with prior art. Finally, we will compare the estimated junction MTFs in a tree obtained

using (calibrated) Black’s model and EKM to show the inherent inaccuracy of Black’s model

and investigate the effect of temperature on EM lifetime. A preliminary version of this work

appeared in [65].

3.2 Interconnect Tree EM analysis

Figure 3.1: Cross sectional schematic of Cu dual-damascene interconnects.

As mentioned before, breaking up a tree into individual branches for EM analysis is not accurate because it ignores the material flow between the connected branches. However, in modern p/g grids, one does not have to treat the whole p/g grid as a connected structure when it comes to material flow. Modern p/g grids are made of Copper (Cu) and are fabricated using a dual damascene process [21]. In a dual-damascene process, the metal line and via are formed simultaneously using copper. A barrier metal liner (usually Tantalum) must completely surround all Cu interconnects to prevent the Copper from diffusing into the surrounding dielectric. The cross section of a typical metal via structure in a Cu dual damascene process is as shown in Fig. 3.1. Due to the presence of the barrier metal liner around the vias and branches, Cu atoms from one tree cannot diffuse to another tree. As a result, the metal atoms are confined within a tree and we only need to account for the material flow between the branches of a tree while conducting EM analysis.

Figure 3.2: A typical interconnect tree structure.

An interconnect tree is a continuously connected acyclic structure of straight metal lines

within one layer of metalization such that atomic flux can flow freely within it. Fig. 3.2 shows a

typical interconnect tree structure. Formally, an interconnect tree is a graph T = (N, B) with no cycles, where N is a set of grid junctions and B is a set of resistive branches. A branch is

defined to be a continuous straight metal line of uniform width that has the same current density

along its length. A junction is any point on the interconnect tree where a branch ends or where

a via is located. Usually, but not always, current density around a junction is discontinuous,

as different branches in a tree are allowed to have different widths. This discontinuity can be

caused either by differences in the widths of connected branches, or by a change in the currents

due to the presence of a via. We define the degree of a junction to be the number of branches

connected to it. Note that a via does not contribute to the degree of a junction. In this work, a

junction with degree 1 will be referred to as a diffusion barrier, a junction with degree 2 will be

referred to as a dotted-I junction, a junction with degree 3 will be referred to as a T junction

and a junction with degree 4 will be referred to as a plus junction. We treat corners in a tree

as dotted-I junctions. Due to the planar nature of interconnect trees, junctions with degrees

higher than 4 are rarely found in practice.

Many previous works [20, 22, 21] assumed atomic diffusivity Da to be constant throughout

the tree. In our case, we will assume Da to be constant within a branch, but it may vary across

different branches within a tree. Thus, we end up with a piecewise constant Da throughout the

tree. This is done for two reasons: 1) It allows for a more general framework that can easily

fall back to a constant Da for the whole tree if required and 2) it is physically more accurate

to assume an effective diffusivity (at a macroscopic level) that varies over short distances [41, 66]

due to random grain boundary orientations. If required, a long branch can be broken down

into smaller branches with different diffusivities.

There are two consequences of assuming fixed branch diffusivities. First, atomic flux di-

vergence (AFD) is now higher at branch ends, i.e. junctions, as compared to branch interior.

Higher AFD leads to higher (positive or negative) time-rate of change of stress. Thus, in our

model, voids will nucleate only at junctions of a tree. This is not a problem since it is much

more common in the field to find voids around via locations and grain boundaries [5, 46, 28]

as compared to the branch interior. Second, branches cannot have temperature gradients be-

cause Da depends on temperature [as shown in (2.10)]. Practically, temperatures cannot change

drastically over short distances, hence assuming a branch to be isothermal is a very mild as-

sumption. The branch diffusivities can vary over time if the temperature changes, but at any

given time, the entire branch has the same diffusivity.

3.2.1 Assigning reference directions

Before doing any analysis, we need to assign reference directions to all branches. This is

necessary to consistently track the directions of branch currents and atomic flux throughout

the tree.

An interconnect tree is equivalent to a graph, with grid junctions as vertices and branches as

edges. With this analogy, there are many ways to assign reference direction to the branches. We

choose the following way: starting from any diffusion barrier, we traverse the whole interconnect

tree using a breadth-first search on the graph. This creates predecessor-successor relationships

between the junctions. The reference direction for each branch is then assigned from predecessor

to successor. The branch current (and atomic flux) is positive if it flows in the reference

direction, otherwise it is negative. Likewise, the reference point for distance is the predecessor

junction, so that for any branch bk, xk = 0 is the predecessor and xk = Lk (line length) is the

successor. In Fig. 3.2, if we choose to start from the leftmost diffusion barrier (labelled as n1),

then the reference directions for each branch would be as shown by the dashed arrow lines.
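The traversal described above can be sketched as follows. The function name `assign_reference_directions`, the junction labels and the representation of branches as unordered junction pairs are illustrative assumptions; the example tree in the usage lines is hypothetical and is not the tree of Fig. 3.2.

```python
from collections import deque


def assign_reference_directions(branches, start):
    """Orient each branch from predecessor to successor using breadth-first search.

    branches: iterable of (junction_a, junction_b) pairs describing a tree.
    start:    a diffusion barrier (degree-1 junction) to start the traversal from.
    Returns a list of (predecessor, successor) pairs, one per branch.
    """
    adjacency = {}
    for a, b in branches:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    if len(adjacency.get(start, [])) != 1:
        raise ValueError("start junction must be a diffusion barrier (degree 1)")

    directed = []
    visited = {start}
    queue = deque([start])
    while queue:
        pred = queue.popleft()
        for succ in adjacency[pred]:
            if succ not in visited:
                visited.add(succ)
                directed.append((pred, succ))  # x_k = 0 at pred, x_k = L_k at succ
                queue.append(succ)
    return directed


# Hypothetical five-junction tree, traversed from the barrier "n1".
example = [("n2", "n1"), ("n2", "n3"), ("n3", "n4"), ("n3", "n5")]
directions = assign_reference_directions(example, "n1")
```

A branch current or atomic flux is then counted as positive when it flows from the first to the second junction of its pair.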

3.2.2 Incorporating thermal stress

In the case of on chip interconnects, the metal lines are embedded in a rigid confinement.

Because of the difference in coefficients of thermal expansion (CTE) of the metal (Copper) $\alpha_m$ and the confinement (Silicon Dioxide) $\alpha_{si}$, stress is generated as the metal cools down after deposition. This so-called thermal stress can be expressed as [67]

$$\sigma_{T,k}(t) = B(\alpha_m - \alpha_{si})(T_{zs} - T_{m,k}(t)), \tag{3.1}$$

where $B$ is the bulk modulus, $\sigma_{T,k}$ is the thermal stress, $T_{m,k}$ is the temperature of branch $b_k$ and $T_{zs} > T_{m,k}$ is the stress free annealing temperature. The initial stress $\sigma_k(x_k, 0)$ in branch $b_k$ at $t = 0$ is equal to its thermal stress so that

$$\sigma_k(x_k, 0) = \sigma_{T,k}(0). \tag{3.2}$$

Figure 3.3: A simple 3-terminal tree $T_d$. Dashed arrows denote reference directions.

If σk(xk, 0) > σth, thermally induced voids will nucleate in a tree. However, that would require

the branch temperatures to be much lower than room temperature, which is a highly

unlikely scenario. In fact, the supported temperature range for commercial devices is 0 °C to

70 °C [68]. Thus, thermally induced voids are ignored in this work.
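The relation (3.1) and the nucleation condition can be sketched in a few lines. The function names are illustrative assumptions, and the second helper simply inverts (3.1) to find the branch temperature at which the thermal stress would equal a given threshold; no material values are implied.

```python
def thermal_stress(bulk_modulus, cte_metal, cte_confinement, t_stress_free, t_branch):
    """Thermal stress of a branch, following (3.1)."""
    return bulk_modulus * (cte_metal - cte_confinement) * (t_stress_free - t_branch)


def thermal_nucleation_temperature(bulk_modulus, cte_metal, cte_confinement,
                                   t_stress_free, sigma_th):
    """Branch temperature at which the thermal stress of (3.1) equals sigma_th."""
    return t_stress_free - sigma_th / (bulk_modulus * (cte_metal - cte_confinement))
```

A thermally induced void would require the branch to be colder than the temperature returned by the second helper, which is the scenario the text rules out for the supported operating range.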

3.3 Extending Korhonen’s model to trees

To find the level of EM degradation in an interconnect tree, we will extend Korhonen’s model

to account for the coupling of stress between the tree branches. For better understanding, we

will first illustrate our approach with a simple interconnect tree as shown in Fig. 3.3. We will

then generalize the scheme into a set of boundary laws and state the PDE system for the whole tree.

Consider a simple tree $T_d = (\mathcal{N}, \mathcal{B})$, with $\mathcal{N} = \{n_1, n_2, n_3\}$ and $\mathcal{B} = \{b_1, b_2\}$, as shown in

Fig. 3.3. Branch bk has dimensions Lk × wk × hk (length × width × height), carries a current

density jk, has an atomic diffusivity of Da,k and temperature Tm,k, where k is 1 or 2 in this

case. The reference direction for both branches is from left to right, so that n1 is the reference

point for b1 and n2 is the reference point for b2. Within branch bk, the distance from its

respective reference point is denoted by xk. Note that x1 = L1 and x2 = 0 denote the same

point: the location of junction n2. We are interested in stress as a function of position and

time, i.e. σ1(x1, t) and σ2(x2, t) for branches b1 and b2, respectively. Once σ1 and σ2 are known,

we can easily determine the EM degradation in the branches.

For any point within a branch, Korhonen’s model (2.9) captures the dynamics of stress

evolution. Since atomic diffusivity is assumed to be constant for a branch, we can re-write (2.9)

for branches b1 and b2 as

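$$\frac{\partial \sigma_k}{\partial t} = \frac{B \Omega D_{a,k}}{k_b T_{m,k}} \frac{\partial}{\partial x_k}\left( \frac{\partial \sigma_k}{\partial x_k} - \frac{q^* \rho}{\Omega} j_k \right), \qquad x_k \in (0, L_k) \text{ and } k = 1, 2. \tag{3.3}$$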

At junctions, the diffusivity and current density change abruptly. As such, their spatial derivatives are undefined and Korhonen’s model cannot be applied at junctions. Instead, we need to state the boundary conditions to describe the behaviour of stress and atomic flux at the junctions. For example, in Fig. 3.3, we need to state the boundary conditions at the two diffusion barriers n1 and n3 and the dotted-I junction n2.

Figure 3.4: Stress profile around a junction immediately after void nucleation.

Diffusion Barrier

Junctions n1 and n3 are diffusion barriers, where the atomic flux is blocked. Considering the

nucleation phase first, Ja is zero at the barrier so that from (2.8)

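$$J_{a,1}(0, t) = 0 \implies \frac{\partial \sigma_1(0, t)}{\partial x_1} = \frac{q^* \rho}{\Omega} j_1, \tag{3.4a}$$

$$J_{a,2}(L_2, t) = 0 \implies \frac{\partial \sigma_2(L_2, t)}{\partial x_2} = \frac{q^* \rho}{\Omega} j_2. \tag{3.4b}$$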

We next move to the void growth phase. For a void to nucleate at n1 (n3), we must have

j1 < 0 (j2 > 0), so that the electron flow pushes the metal atoms away from n1 (n3). Exactly

what happens around a void is somewhat complicated and cannot be fully captured in a 1D

model. Sukharev et al. [67] provide a simplified extension of the Korhonen 1D model to describe

behaviour of stress around a void, which we will use in our work. When the stress value at any

junction reaches σth, a void nucleates at that point. Just after the void nucleation, stress falls

to zero inside the void and at the void surface, but remains at its original value σth at a very

short distance of δ ≈ 1 nm from the void surface. We refer to δ as the thickness of the void

interface. For example, the stress profile at n1 just after void nucleation is shown in Fig. 3.4.

Recall that stress gradient gives rise to gradient flux that flows from points with lower stress

towards points of higher stress. In this case, the high spatial stress gradient gives rise to a

high gradient flux that always flows away from the void. This flux is responsible for the void

growth. The net flux at the junction is now the sum of this gradient flux and the electronic flux.

However, the magnitude of the electronic flux is very small as compared to the gradient flux.

We thus ignore the electronic flux and state the boundary conditions at the diffusion barriers

during the void growth phase as

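$$\frac{\partial \sigma_1(0, t)}{\partial x_1} = \frac{\sigma_1(0, t)}{\delta}, \tag{3.5a}$$

$$\frac{\partial \sigma_2(L_2, t)}{\partial x_2} = -\frac{\sigma_2(L_2, t)}{\delta}, \tag{3.5b}$$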

where σ1(0, t) = σth and σ2(L2, t) = σth at the time of void nucleation. A growing void presents

a moving boundary problem for a PDE that is computationally very expensive to solve. However,

as we will see, the steady state void length is very small (≈ 0.5% of the line length), so that (3.5)

is a good approximation for the stress gradient around the void.

Dotted-I Junction

The interaction of atomic flux at dotted-I junction n2 is the key to describing the coupling of

stresses in branches b1 and b2. Considering the nucleation phase first, the junction n2 is the

same physical point of both b1 and b2, so that

σ1(L1, t) = σ2(0, t). (3.6)

In other words, stress is continuous across the junction. This makes sense because if stress is

discontinuous across a junction and abruptly jumps from one value to another within a short

distance, the high stress gradient would quickly equalize the stress across the junction simply

because the atomic flux can flow freely between b1 and b2 when there is no void at n2. This

brings us to our second boundary condition, which can be stated mathematically as

w1h1Ja,1(L1, t) = w2h2Ja,2(0, t). (3.7)

Note that (3.7) is applicable to an infinitesimal cross-section at n2, and states that the material

flow across an infinitesimal cross-section is conserved. This is true for any infinitesimal cross-

section inside the branch as well, and is implicitly accounted for in Korhonen’s model. However,

over a finite region or volume element, the net atomic flux entering may not be equal to the

net atomic flux leaving, which gives rise to flux divergence and generates stress in the line.

Next we will consider the void growth phase. Once a void nucleates at n2, it is shared by

both branches b1 and b2. For our 1D model, we make the reasonable assumption that the void

completely covers the entire cross-sectional area of the junction. As a result, there would be

no flow of atomic flux between b1 and b2. Hence, during the void growth phase, we effectively

treat n2 as a diffusion barrier for both branches b1 and b2, so that

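$$\frac{\partial \sigma_1(L_1, t)}{\partial x_1} = -\frac{\sigma_1(L_1, t)}{\delta}, \qquad \frac{\partial \sigma_2(0, t)}{\partial x_2} = \frac{\sigma_2(0, t)}{\delta}, \tag{3.8}$$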

where σ1(L1, t) = σ2(0, t) = σth at the time of void nucleation. The alternate assumption, that

a void partially covers the cross-section at a junction is hard to model in a 1D scenario where

every location is essentially treated as a point. Note that the branches are still electrically

connected as the current can flow through the barrier metal liner.

As we will see a little later, (3.3) combined with the boundary conditions obtained from

(3.4)-(3.8) and the initial condition as stated in (3.2), is the PDE system that completely

determines σ1 and σ2. We will next generalize the above schemes for capturing flux interactions

at junctions, into a set of laws that forms the basis for our approach.

3.3.1 Boundary Laws for junctions

Consider a junction np, and let Bp be the set of branches connected to np. Let tf,p be the time

of void nucleation for this junction. Then, the boundary laws, motivated mainly by the law of

conservation of mass and physical observations, can be stated as:

Law 1. Until a void nucleates at np, the stress values in any two branches where they meet at

np are equal.

Law 2. For t < tf,p, the number of metal atoms flowing into np per unit time is the same as

the number of metal atoms flowing out from it

$$\sum_{b_k \in \mathcal{B}_{p,in}} w_k h_k J_{a,k} = \sum_{b_k \in \mathcal{B}_{p,out}} w_k h_k J_{a,k}, \tag{3.9}$$

where wk (hk) is the width (height) of the branch, Bp,in is the set of branches for which the

reference direction is going into np, and Bp,out is the set of branches for which the reference

direction is going out from np.

Law 3. For t ≥ tf,p, there is no flow of atomic flux between the connected branches Bp. The

stress gradient at the junction, generalizing from (3.5) and (3.8), is

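$$\frac{\partial \sigma_{k,p}}{\partial x_k} = \pm \frac{\sigma_{k,p}}{\delta}, \tag{3.10}$$

where $\sigma_{k,p}$ is the value of stress at end-point $n_p$ of branch $b_k$. The sign is positive for $b_k \in \mathcal{B}_{p,out}$ and negative for $b_k \in \mathcal{B}_{p,in}$.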

3.3.2 PDE system for a general interconnect tree

We are now ready to state the complete PDE system that describes the stress evolution for an

interconnect tree of arbitrary complex geometry over time. We refer to this PDE system as the

Extended Korhonen’s model.

Consider a tree T = (N, B). A branch bk ∈ B has dimensions Lk × wk × hk and carries

a current density jk. Let Da,k and Tm,k represent the atomic diffusivity and temperature of

branch bk. Let xk denote the distance from the reference point (predecessor junction) in branch

bk with 0 ≤ xk ≤ Lk. For any junction np ∈ N , let Bp,in (Bp,out) be the set of connected

branches for which the reference direction is going into (out of) the junction, and let tf,p > 0

be its time of void nucleation. Then, the Extended Korhonen’s model can be stated as

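$$\text{PDE:} \quad \frac{\partial \sigma_k}{\partial t} = \frac{B \Omega D_{a,k}}{k_b T_{m,k}} \frac{\partial}{\partial x_k}\left( \frac{\partial \sigma_k}{\partial x_k} - \frac{q^* \rho}{\Omega} j_k \right), \quad \forall b_k \in \mathcal{B},\ x_k \in (0, L_k), \tag{3.11a}$$

$$\text{BC:} \quad \forall n_p \in \mathcal{N} \text{ s.t. } t < t_{f,p}: \quad \sum_{b_k \in \mathcal{B}_{p,in}} w_k h_k J_{a,k}(L_k, t) = \sum_{b_k \in \mathcal{B}_{p,out}} w_k h_k J_{a,k}(0, t), \tag{3.11b}$$

$$\sigma_k(L_k, t) = \sigma_i(0, t), \quad \forall (b_k, b_i) \in \mathcal{B}_{p,in} \times \mathcal{B}_{p,out}, \tag{3.11c}$$

$$\forall n_p \in \mathcal{N} \text{ s.t. } t \ge t_{f,p}: \quad \frac{\partial \sigma_{k,p}}{\partial x_k} = \begin{cases} -\sigma_{k,p}(L_k, t)/\delta & \forall b_k \in \mathcal{B}_{p,in}, \\ \phantom{-}\sigma_{k,p}(0, t)/\delta & \forall b_k \in \mathcal{B}_{p,out}, \end{cases} \tag{3.11d}$$

$$\text{IC:} \quad \sigma_k(x_k, 0) = \sigma_{T,k}(0) \quad \forall b_k \in \mathcal{B}. \tag{3.11e}$$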

3.3.3 Void growth and resistance change

Once the stress at any point on the tree reaches σth, a void nucleates at that point. As noted

before, in EKM, void nucleation occurs only at junctions and not within the branches. Once

a void nucleates at a junction, it is shared by all the branches connected to that junction, i.e.

it affects the resistance of all connected branches. Tracking void growth is useful in order to

determine the change in branch resistances and the corresponding current densities. However,

void growth dynamics in dual damascene copper interconnects is a complex phenomenon and

involves void migration (movement of void within a branch), healing (void size reduction due

to change in the current direction) and saturation (steady state void volume for given branch

current densities) [69, 70]. Since p/g grid branches carry mostly unidirectional current, void

healing rarely happens. Also, there is no change in void size during migration [69], which means

that void migration has no effect on the branch resistance. Thus, we will ignore void migration

and healing in this work.

Sukharev et al. [67] show that the initial void growth rate for an EM induced void is very

high. This is attributed to the high initial gradient flux as explained in Section 3.3. Hence,

as a conservative approximation, we assume that once a void nucleates at any junction np, the

void lengths for all branches bk connected to np reach their steady state values in a very short

period of time. As a result, the line resistance rises immediately to its steady state value for

all connected branches. The steady state void volume for branch bk can be calculated as

$$V_{k,sat} = L_k w_k h_k \left( \frac{\sigma_{T,k}}{B} + \frac{q^* \rho |j_k| L_k}{2 B \Omega} \right). \tag{3.12}$$

In our case, since we assume that a void covers the entire cross-section area, the void length is

simply given by lk,v = Vk,sat/(wkhk). In the presence of a void, the branch current is forced to

take the high resistance path through the metal liner. Correspondingly, the branch resistance

Rk becomes

$$R_k = \rho_b\, l_{k,v}/A_b + \rho_m (L_k - l_{k,v})/A_m, \tag{3.13}$$

where $\rho_m$ ($\rho_b$) and $A_m$ ($A_b$) are the resistivity and cross-sectional area of the metal (liner),

respectively. For any branch $b_k$, $V_{k,sat}$ and $j_k$ are interdependent. As such,

we iteratively find jk and Vk,sat using modified Richardson iteration. It should be noted that

although we assume a saturated volume for the void, the boundary conditions for any junction

where a void has nucleated is the same as the one used for transient void growth. Thus, in

assuming immediate steady state void volume, we have replaced the actual transient current

densities by their respective conservative steady state values.
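The resistance update of (3.13) can be sketched as below, taking the saturated void volume as a given input. The function names and argument order are illustrative assumptions; the iterative coupling between the void volume and the branch current density mentioned above is not reproduced here.

```python
def void_length(v_sat, width, height):
    """Void length for a void spanning the full cross-section: l_v = V_sat / (w h)."""
    return v_sat / (width * height)


def branch_resistance(rho_metal, area_metal, rho_liner, area_liner, length, l_void):
    """Branch resistance with a void of length l_void, following (3.13)."""
    if not 0.0 <= l_void <= length:
        raise ValueError("void length must lie between 0 and the branch length")
    liner_part = rho_liner * l_void / area_liner             # current forced through the liner
    metal_part = rho_metal * (length - l_void) / area_metal  # remaining intact metal
    return liner_part + metal_part
```

With `l_void = 0` the expression reduces to the resistance of the intact metal line, and any positive void length raises the resistance whenever the liner path is more resistive per unit length than the metal.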

3.4 Solving EKM using IVP formulation

In this section, we will describe our approach for solving the Extended Korhonen’s model using

method of lines (MoL). First, for points within a branch, we will use MoL to convert the PDE

system into an ODE system by discretizing along the spatial domain. Then, using the laws

proposed in Section 3.3.1, we will derive the boundary conditions at the junctions. Finally, we

will merge the two and state the IVP formulation that describes the stress evolution for a given tree.

Since we will deal with power grids that are composed of trees, the IVP (as well as the LTI

systems in the next chapter) is formulated to solve EKM for trees. However, EKM as shown in

(3.11), is applicable to non-tree interconnect structures (that have loops) as well and one can

formulate an equivalent IVP (or LTI system) for general graphs.

3.4.1 Scaling

Before proceeding with MoL, we will scale stress, distance and time by introducing their di-

mensionless variants. This leads to stable PDEs that are easier to solve numerically. We define

the following scaling factors for any branch bk ∈ B

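$$\tau \triangleq \frac{B \Omega}{k_b T_m^\star} \frac{D_a^\star}{L_c^2}\, t, \qquad \eta_k \triangleq \frac{\Omega \sigma_k}{k_b T_m^\star}, \qquad \xi_k \triangleq \frac{x_k}{L_k}, \tag{3.14}$$

where $D_a^\star$ is the atomic diffusivity at some chosen nominal temperature $T_m^\star$ and $L_c$ is some chosen characteristic length. The new variables $\tau$, $\eta$ and $\xi$ are referred to as reduced time, stress and distance, respectively. Using (3.14) in (3.11a) and applying the chain rule, we get

$$\frac{\partial \eta_k}{\partial \tau} = \theta_k \frac{\partial}{\partial \xi_k}\left( \frac{\partial \eta_k}{\partial \xi_k} - \alpha_k \right), \tag{3.15}$$

where $\theta_k = (L_c^2 D_{a,k} T_m^\star)/(L_k^2 D_a^\star T_{m,k})$ and $\alpha_k = (q^* \rho j_k L_k)/(k_b T_m^\star)$. Since, for any given branch, $\alpha_k$ is not a function of distance $\xi_k$, we have $\partial \alpha_k/\partial \xi_k = 0$, so that

$$\frac{\partial \eta_k}{\partial \tau} = \theta_k \frac{\partial^2 \eta_k}{\partial \xi_k^2}. \tag{3.16}$$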

Equation (3.16) constitutes the scaled PDE to be solved for ∀bk ∈ B. Also, the atomic flux in

bk can be restated in terms of the reduced variables as

$$J_{a,k} = \frac{D_{a,k}\, C\, T_m^\star}{L_k T_{m,k}} \left( \frac{\partial \eta_k}{\partial \xi_k} - \alpha_k \right). \tag{3.17}$$
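The per-branch coefficients of the scaled PDE can be computed directly from their definitions. The sketch below evaluates θk and αk as defined after (3.15) and the reduced stress ηk of (3.14); the function names and the keyword `q_star_rho` (standing for the product q∗ρ) are illustrative assumptions.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def reduced_coefficients(length, diffusivity, temperature, current_density,
                         l_char, d_nominal, t_nominal, q_star_rho, k_b=K_B):
    """Return (theta_k, alpha_k) for one branch, as defined after (3.15)."""
    theta = (l_char ** 2 * diffusivity * t_nominal) / (length ** 2 * d_nominal * temperature)
    alpha = q_star_rho * current_density * length / (k_b * t_nominal)
    return theta, alpha


def reduced_stress(sigma, omega, t_nominal, k_b=K_B):
    """Reduced (dimensionless) stress eta = Omega * sigma / (k_b * T_nominal)."""
    return omega * sigma / (k_b * t_nominal)
```

A branch that sits at the nominal temperature and diffusivity and whose length equals the characteristic length has θk = 1, so θk can be read as the diffusion rate of a branch relative to that nominal case.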

3.4.2 Discretization for a tree branch

We uniformly discretize branch bk into N segments, where N is the same for all branches

because we have scaled all branch lengths to 1 as in (3.14). The reduced stress at each of the

N + 1 discrete spatial points 0, 1, . . . N in branch bk is denoted by ηk,i and the time rate of

change of ηk,i is [from (3.16)]

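$$\frac{\partial \eta_{k,i}}{\partial \tau} = \theta_k \frac{\partial^2 \eta_{k,i}}{\partial \xi_k^2}, \qquad i = 0, 1, \ldots, N. \tag{3.18}$$

Further, we approximate the partial derivatives with respect to $\xi_k$ using the central difference formula, so that (3.18) leads to

$$\frac{d\eta_{k,i}}{d\tau} = \theta_k \left( \frac{\eta_{k,i+1} + \eta_{k,i-1} - 2\eta_{k,i}}{(\Delta\xi)^2} \right), \qquad i = 0, 1, \ldots, N, \tag{3.19}$$

where $\Delta\xi = \Delta\xi_k = 1/N$, $\forall k$. The corresponding atomic flux $J_{a,k,i}$ at the $i$th point in branch $b_k$ is given as

$$J_{a,k,i} = \frac{D_{a,k}\, C\, T_m^\star}{L_k T_{m,k}} \left( \frac{\eta_{k,i+1} - \eta_{k,i-1}}{2\Delta\xi} - \alpha_k \right). \tag{3.20}$$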

The ODE system given by (3.19) ∀bk ∈ B, combined with the initial condition (3.2), approx-

imates the PDE system (3.11); so that the solution of the ODE system gives us the solution

of (3.11). However, the formulation is not yet complete because the ODEs at all junctions

(i = 0, N ∀bk ∈ B) require the values of ηk,−1 and ηk,N+1, which are not part of the ξk

domain. The values at these ghost points are obtained by solving the respective boundary

condition(s), as we next explain.

To simplify the presentation going forward, we define the following for any two branches

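$b_i, b_k \in \mathcal{B}$:

$$r_{ik} \triangleq L_i/L_k, \qquad p_{ik} \triangleq D_{a,i} T_{m,k}/(D_{a,k} T_{m,i}), \qquad w_{ik} \triangleq w_i/w_k, \qquad \gamma_{ik} \triangleq r_{ki} w_{ik} p_{ik}, \qquad \Upsilon_k \triangleq \theta_k/(\Delta\xi)^2. \tag{3.21}$$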

3.4.3 Boundary Conditions at Diffusion Barrier

Consider a diffusion barrier np connected to branch bk. We have two cases, one where np is the

predecessor junction (at ξk = 0, start of the branch) and one where it is the successor junction

(at ξk = 1, branch end). We will first obtain the boundary conditions for np at ξk = 0. Let τf

be the time of void nucleation at this barrier. Then, the corresponding boundary condition is

[using (3.9) and (3.10)]

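$$\frac{\partial \eta_{k,0}}{\partial \xi_k} = \begin{cases} \alpha_k & \tau < \tau_f, \\ \eta_{k,0}(L_k/\delta) & \tau \ge \tau_f, \end{cases} \tag{3.22}$$

where $\eta_{k,0}$ corresponds to $\sigma_{k,p}$ in (3.10), with $\eta_{k,0} = \eta_{th} = \Omega\sigma_{th}/(k_b T_m^\star)$ at $\tau = \tau_f$. Using the central difference approximation, we get

$$\frac{\eta_{k,1} - \eta_{k,-1}}{2\Delta\xi} = \begin{cases} \alpha_k & \tau < \tau_f, \\ \eta_{k,0}(L_k/\delta) & \tau \ge \tau_f, \end{cases} \tag{3.23}$$

which can be easily solved for $\eta_{k,-1}$:

$$\eta_{k,-1} = \begin{cases} \eta_{k,1} - 2\Delta\xi\,\alpha_k & \tau < \tau_f, \\ \eta_{k,1} - 2\Delta\xi\,\eta_{k,0}(L_k/\delta) & \tau \ge \tau_f. \end{cases} \tag{3.24}$$

Similarly, for a diffusion barrier at $\xi_k = 1$, we get

$$\eta_{k,N+1} = \begin{cases} \eta_{k,N-1} + 2\Delta\xi\,\alpha_k & \tau < \tau_f, \\ \eta_{k,N-1} - 2\Delta\xi\,\eta_{k,N}(L_k/\delta) & \tau \ge \tau_f. \end{cases} \tag{3.25}$$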
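To see how the ghost points close the ODE system, the sketch below treats a single branch that is blocked at both ends during the nucleation phase: it evaluates the right-hand side of (3.19) using the τ < τf ghost values from (3.24) and (3.25), and advances it in reduced time. The explicit (forward) Euler step and the function names are assumptions made here to keep the example short; they are not the numerical method used in this work.

```python
def barrier_rhs(eta, theta, alpha):
    """Right-hand side of (3.19) for one branch with a diffusion barrier at each end."""
    n = len(eta) - 1
    d_xi = 1.0 / n
    upsilon = theta / d_xi ** 2
    ghost_lo = eta[1] - 2.0 * d_xi * alpha      # eta_{k,-1} from (3.24), tau < tau_f
    ghost_hi = eta[n - 1] + 2.0 * d_xi * alpha  # eta_{k,N+1} from (3.25), tau < tau_f
    padded = [ghost_lo] + list(eta) + [ghost_hi]
    return [upsilon * (padded[i + 2] + padded[i] - 2.0 * padded[i + 1])
            for i in range(n + 1)]


def integrate(eta0, theta, alpha, tau_end, d_tau):
    """Advance the reduced stress with explicit Euler steps (illustrative only)."""
    eta = list(eta0)
    for _ in range(int(round(tau_end / d_tau))):
        rhs = barrier_rhs(eta, theta, alpha)
        eta = [e + d_tau * r for e, r in zip(eta, rhs)]
    return eta


# Illustrative run: N = 20 segments, uniform initial reduced stress of 0.1.
final_eta = integrate([0.1] * 21, theta=1.0, alpha=0.4, tau_end=2.0, d_tau=0.0005)
```

With these boundary values the zero-flux condition holds at both ends, so the trapezoidal average of the reduced stress is conserved and the profile relaxes to a straight line of slope αk. The explicit step must satisfy roughly Δτ ≤ (Δξ)²/(2θk), which is why stiff problems of this kind are usually handled with implicit integrators.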

3.4.4 Boundary Conditions at Dotted-I junction

Consider a dotted-I junction np. Without loss of generality, we will assume that np is at the

end of branch 1 and at the beginning of branch 2. To formulate the ODE at np, we need the

value of at least one of the ghost points (η1,N+1 or η2,−1). Let τf be the time of void nucleation

at this junction. Then, using (3.9), we have for τ < τf (h1 = h2 within a metal layer)

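$$w_1 J_{a,1,N} - w_2 J_{a,2,0} = 0. \tag{3.26}$$

Define $\Delta\eta_{1,N} \triangleq \eta_{1,N+1} - \eta_{1,N-1}$ and $\Delta\eta_{2,0} \triangleq \eta_{2,1} - \eta_{2,-1}$. Then substituting the expression for atomic flux from (3.20) in (3.26), we get

$$
\begin{aligned}
& \frac{w_1 D_{a,1} C T_m^\star}{L_1 T_{m,1}} \left( \frac{\Delta\eta_{1,N}}{2\Delta\xi} - \alpha_1 \right) - \frac{w_2 D_{a,2} C T_m^\star}{L_2 T_{m,2}} \left( \frac{\Delta\eta_{2,0}}{2\Delta\xi} - \alpha_2 \right) = 0 \\
\implies\ & \Delta\eta_{1,N} - \gamma_{21} \Delta\eta_{2,0} = u_1,
\end{aligned}
\tag{3.27}
$$

where $u_1 = 2\Delta\xi\,(\alpha_1 - \gamma_{21}\alpha_2)$. Also, from law 1, $\eta_{1,N} = \eta_{2,0}$ when $\tau < \tau_f$. Hence, the time rate of change of stress should also be the same, so that using (3.16)

$$\frac{\partial \eta_{1,N}}{\partial \tau} = \frac{\partial \eta_{2,0}}{\partial \tau} \implies \frac{\partial^2 \eta_{1,N}}{\partial \xi_1^2} = \frac{\theta_2}{\theta_1} \frac{\partial^2 \eta_{2,0}}{\partial \xi_2^2} \quad \text{for } \tau < \tau_f. \tag{3.28}$$

Applying the central difference formula in (3.28), we get

$$
\begin{aligned}
& \frac{\eta_{1,N+1} + \eta_{1,N-1} - 2\eta_{1,N}}{(\Delta\xi)^2} = \frac{\theta_2}{\theta_1} \left( \frac{\eta_{2,1} + \eta_{2,-1} - 2\eta_{2,0}}{(\Delta\xi)^2} \right) \\
\implies\ & \Delta\eta_{1,N} + 2(\eta_{1,N-1} - \eta_{1,N}) = r_{12}^2\, p_{21} \left( -\Delta\eta_{2,0} + 2(\eta_{2,1} - \eta_{2,0}) \right) \\
\implies\ & \Delta\eta_{1,N} + r_{12}^2\, p_{21}\, \Delta\eta_{2,0} = u_2,
\end{aligned}
\tag{3.29}
$$

where $u_2 = 2\left( r_{12}^2 p_{21}\, \eta_{2,1} - \eta_{1,N-1} + (1 - r_{12}^2 p_{21})\, \eta_{1,N} \right)$. Solving for $\Delta\eta_{1,N}$ and $\Delta\eta_{2,0}$ from (3.27) and (3.29), we get

$$\Delta\eta_{1,N} = \frac{r_{12} u_1 + w_{21} u_2}{r_{12} + w_{21}}, \qquad \Delta\eta_{2,0} = -\frac{u_1 - u_2}{r_{12}\, p_{21}\, (r_{12} + w_{21})}. \tag{3.30}$$

Thus, the final expressions for the ghost points $\eta_{1,N+1}$ and $\eta_{2,-1}$ are

$$\eta_{1,N+1} = \eta_{1,N-1} + \frac{r_{12} u_1 + w_{21} u_2}{r_{12} + w_{21}}, \tag{3.31a}$$

$$\eta_{2,-1} = \eta_{2,1} + \frac{u_1 - u_2}{r_{12}\, p_{21}\, (r_{12} + w_{21})}. \tag{3.31b}$$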

Once a void nucleates at np, it is treated as a diffusion barrier for all connected branches.

Thus, for τ ≥ τf , the boundary conditions are given by

$$\eta_{1,N+1} = \eta_{1,N-1} - 2\Delta\xi\,\eta_{1,N}(L_1/\delta), \tag{3.32a}$$

$$\eta_{2,-1} = \eta_{2,1} - 2\Delta\xi\,\eta_{2,0}(L_2/\delta). \tag{3.32b}$$
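The nucleation-phase ghost points of (3.31) can be checked numerically. The sketch below evaluates u1 and u2 and the two ghost values for a dotted-I junction; the function name and argument order are illustrative assumptions, and the ratios r12, w21 and p21 follow the definitions in (3.21).

```python
def dotted_i_ghost_points(eta1_prev, eta_junction, eta2_next,
                          alpha1, alpha2, r12, w21, p21, d_xi):
    """Ghost values (eta_{1,N+1}, eta_{2,-1}) at a dotted-I junction for tau < tau_f.

    eta1_prev    : eta_{1,N-1}, last interior point of branch 1
    eta_junction : eta_{1,N} = eta_{2,0}, the shared junction value (law 1)
    eta2_next    : eta_{2,1}, first interior point of branch 2
    """
    gamma21 = r12 * w21 * p21   # gamma_21 from (3.21)
    beta = r12 ** 2 * p21       # theta_2 / theta_1
    u1 = 2.0 * d_xi * (alpha1 - gamma21 * alpha2)
    u2 = 2.0 * (beta * eta2_next - eta1_prev + (1.0 - beta) * eta_junction)
    d_eta1 = (r12 * u1 + w21 * u2) / (r12 + w21)       # (3.30)
    d_eta2 = -(u1 - u2) / (r12 * p21 * (r12 + w21))    # (3.30)
    return eta1_prev + d_eta1, eta2_next - d_eta2      # (3.31a), (3.31b)
```

For two identical branches carrying the same current density (r12 = w21 = p21 = 1, α1 = α2), the junction is indistinguishable from an interior point, and the ghost values reduce to the actual neighbouring values η2,1 and η1,N−1.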

3.4.5 Boundary Conditions at T junction

Consider a T junction np. Similar to the dotted-I junction, we will assume that np is at the

end of branch 1 and at the beginning of branches 2 and 3. To complete the ODE formulation

at np, we need the value of at least one of the ghost points (η1,N+1, η2,−1 or η3,−1). Let τf be

the time of void nucleation at this junction. Then, using (3.9), we get (h1 = h2 = h3 within a

metal layer)

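$$w_1 J_{a,1,N} - w_2 J_{a,2,0} - w_3 J_{a,3,0} = 0 \quad \text{for } \tau < \tau_f. \tag{3.33}$$

Also, for $\tau < \tau_f$, stress should be continuous across $n_p$ (law 1), so that $\eta_{1,N} = \eta_{2,0} = \eta_{3,0}$, which gives [using (3.16)]

$$\frac{\partial \eta_{1,N}}{\partial \tau} = \frac{\partial \eta_{k,0}}{\partial \tau} \implies \frac{\partial^2 \eta_{1,N}}{\partial \xi_1^2} = \frac{\theta_k}{\theta_1} \frac{\partial^2 \eta_{k,0}}{\partial \xi_k^2} \quad \text{for } \tau < \tau_f,\ k = 2, 3. \tag{3.34}$$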

Same as before, we substitute the expression of atomic flux from (3.20) in (3.33) and apply the

central difference formula in (3.34) to obtain the value of ghost points. We omit the complete

derivation and only present the final values

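\[
\eta_{1,N+1} = \eta_{1,N-1} + \frac{u_1 r_{12}r_{13} + u_2 r_{13}w_{21} + u_3 r_{12}w_{31}}{r_{12}r_{13} + r_{13}w_{21} + r_{12}w_{31}}, \tag{3.35a}
\]
\[
\eta_{2,-1} = \eta_{2,1} + \frac{u_1 r_{13} - u_2 (r_{13} + w_{31}) + u_3 w_{31}}{r_{12}\,p_{21}\,(r_{12}r_{13} + r_{13}w_{21} + r_{12}w_{31})}, \tag{3.35b}
\]
\[
\eta_{3,-1} = \eta_{3,1} + \frac{u_1 r_{12} + u_2 w_{21} - u_3 (r_{12} + w_{21})}{r_{13}\,p_{31}\,(r_{12}r_{13} + r_{13}w_{21} + r_{12}w_{31})}, \tag{3.35c}
\]

where $u_1 = 2\Delta\xi\,(\alpha_1 - \gamma_{21}\alpha_2 - \gamma_{31}\alpha_3)$, and $u_k = 2\left(r_{1k}^2 p_{k1}\eta_{k,1} - \eta_{1,N-1} + (1 - r_{1k}^2 p_{k1})\eta_{1,N}\right)$, for k = 2, 3.

Using law 3, np is treated as a diffusion barrier during the void growth phase, so that for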

τ ≥ τf

η1,N+1 = η1,N−1 − 2∆ξη1,N (L1/δ), (3.36a)

ηk,−1 = ηk,1 − 2∆ξηk,0(Lk/δ), k = 2, 3. (3.36b)
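Since the derivation of (3.35) is omitted, a numerical cross-check can be useful. The sketch below assumes, by analogy with (3.27) and (3.29), that the pre-void T-junction conditions reduce to the linear system $\Delta\eta_{1,N} - \gamma_{21}\Delta\eta_{2,0} - \gamma_{31}\Delta\eta_{3,0} = u_1$ and $\Delta\eta_{1,N} + r_{1k}^2 p_{k1}\Delta\eta_{k,0} = u_k$ for k = 2, 3, with $\Delta\eta_{1,N} = \eta_{1,N+1} - \eta_{1,N-1}$ and $\Delta\eta_{k,0} = \eta_{k,1} - \eta_{k,-1}$. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
"""Cross-check of the T-junction ghost-point formulas (3.35) against a direct solve."""


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def t_junction_increments(u, r12, r13, w21, w31, p21, p31):
    """Increments implied by (3.35): (d_eta_1N, d_eta_20, d_eta_30)."""
    u1, u2, u3 = u
    den = r12 * r13 + r13 * w21 + r12 * w31
    d1 = (u1 * r12 * r13 + u2 * r13 * w21 + u3 * r12 * w31) / den
    # eta_{k,-1} = eta_{k,1} - d_eta_{k,0}, hence the leading minus signs
    d2 = -(u1 * r13 - u2 * (r13 + w31) + u3 * w31) / (r12 * p21 * den)
    d3 = -(u1 * r12 + u2 * w21 - u3 * (r12 + w21)) / (r13 * p31 * den)
    return [d1, d2, d3]


# Hypothetical ratios and right-hand sides (not taken from the thesis).
r12, r13, w21, w31, p21, p31 = 1.3, 0.7, 0.9, 1.4, 1.1, 0.8
u = [0.25, -0.4, 0.15]
gamma21 = r12 * w21 * p21
gamma31 = r13 * w31 * p31

system = [
    [1.0, -gamma21, -gamma31],      # assumed flux balance at the junction
    [1.0, r12 ** 2 * p21, 0.0],     # stress-rate continuity, branch 2
    [1.0, 0.0, r13 ** 2 * p31],     # stress-rate continuity, branch 3
]
direct = solve_linear(system, u)
closed = t_junction_increments(u, r12, r13, w21, w31, p21, p31)
max_gap = max(abs(a - b) for a, b in zip(direct, closed))
```

Under these assumptions the direct solve and the closed-form expressions agree to rounding error, which is a quick way to catch a sign or index slip when transcribing (3.35).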

3.4.6 Boundary Conditions at Plus junction

The boundary conditions for the plus junction can be obtained by following the same procedure

as done before. Consider a plus junction np, which is at the end of branch 1 and at the beginning

of branches 2, 3 and 4. Let τf be the time of void nucleation at this junction. Then, using law

1 and equations (3.9) and (3.16), we have for τ < τf

w1Ja,1,N − w2Ja,2,0 − w3Ja,3,0 − w4Ja,4,0 = 0, (3.37)

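\[
\frac{\partial \eta_{1,N}}{\partial \tau} = \frac{\partial \eta_{k,0}}{\partial \tau}
\;\Longrightarrow\;
\frac{\partial^2 \eta_{1,N}}{\partial \xi_1^2} = \frac{\theta_k}{\theta_1}\,\frac{\partial^2 \eta_{k,0}}{\partial \xi_k^2},
\quad k = 2, 3, 4. \tag{3.38}
\]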

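Solving as before, we can obtain the values of the ghost points, which are as shown here

\[
\eta_{1,N+1} = \eta_{1,N-1} + \frac{u_1 r_{12}r_{13}r_{14} + u_2 r_{13}r_{14}w_{21} + u_3 r_{12}r_{14}w_{31} + u_4 r_{12}r_{13}w_{41}}{r_{12}r_{13}r_{14} + r_{13}r_{14}w_{21} + r_{12}r_{14}w_{31} + r_{12}r_{13}w_{41}}, \tag{3.39a}
\]
\[
\eta_{2,-1} = \eta_{2,1} + \frac{u_1 r_{13}r_{14} - u_2 (r_{13}r_{14} + r_{14}w_{31} + r_{13}w_{41}) + u_3 r_{14}w_{31} + u_4 r_{13}w_{41}}{r_{12}\,p_{21}\,(r_{12}r_{13}r_{14} + r_{13}r_{14}w_{21} + r_{12}r_{14}w_{31} + r_{12}r_{13}w_{41})}, \tag{3.39b}
\]
\[
\eta_{3,-1} = \eta_{3,1} + \frac{u_1 r_{12}r_{14} + u_2 r_{14}w_{21} - u_3 (r_{12}r_{14} + r_{14}w_{21} + r_{12}w_{41}) + u_4 r_{12}w_{41}}{r_{13}\,p_{31}\,(r_{12}r_{13}r_{14} + r_{13}r_{14}w_{21} + r_{12}r_{14}w_{31} + r_{12}r_{13}w_{41})}, \tag{3.39c}
\]
\[
\eta_{4,-1} = \eta_{4,1} + \frac{u_1 r_{12}r_{13} + u_2 r_{13}w_{21} + u_3 r_{12}w_{31} - u_4 (r_{12}r_{13} + r_{13}w_{21} + r_{12}w_{31})}{r_{14}\,p_{41}\,(r_{12}r_{13}r_{14} + r_{13}r_{14}w_{21} + r_{12}r_{14}w_{31} + r_{12}r_{13}w_{41})}, \tag{3.39d}
\]

where $u_1 = 2\Delta\xi\,(\alpha_1 - \gamma_{21}\alpha_2 - \gamma_{31}\alpha_3 - \gamma_{41}\alpha_4)$, and $u_k = 2\left(r_{1k}^2 p_{k1}\eta_{k,1} - \eta_{1,N-1} + (1 - r_{1k}^2 p_{k1})\eta_{1,N}\right)$, for k = 2, 3, 4.

Figure 3.5: For Td, (a) evolution of stress at junctions with time and (b) stress profile with time. [Axes: (a) time (yrs); (b) x ($10^{-6}$ m), with profiles at 0.00, 0.59, 2.98, 4.76, 4.79 and 6.00 yrs.]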

Using law 3, np is treated as a diffusion barrier during the void growth phase. Thus, for

τ ≥ τf

η1,N+1 = η1,N−1 − 2∆ξη1,N (L1/δ), (3.40a)

ηk,−1 = ηk,1 − 2∆ξηk,0(Lk/δ), k = 2, 3, 4. (3.40b)

The IVP formulation is completed by eliminating the ghost points from the ODEs at junctions

by using (3.24), (3.25), (3.31), (3.32), (3.35), (3.36), (3.39) and (3.40). Fig. 3.5 shows the

solution obtained using the IVP formulation for tree Td of Fig. 3.3, with L1 = L2 = 50 µm, and $j_1 = -j_2 = 6 \times 10^{9}$ A/m$^2$. In this scenario, since the electronic flux moves away from junction

n2 in both branches, it develops tensile stress which ultimately leads to void nucleation.

3.5 Verifying EKM and the IVP formulation

In this section, we will first verify the IVP formulation of EKM by comparing our numerical

results with known analytical solutions for simple interconnect trees. Then, we will compare

the lifetime estimates of Extended Korhonen’s model with experimental results published in

the literature to verify the model itself. We will use a standard variable time step Runge-Kutta

method with the Butcher tableau as given by Dormand and Prince [71] for integrating the IVPs

obtained.
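To make the integration step concrete, the sketch below integrates the IVP of a single finite line with two diffusion barriers. It is not the integrator used here: the adaptive Dormand-Prince pair (available, for example, as the `RK45` method of SciPy's `solve_ivp`) is swapped for classical fixed-step RK4 so that the example has no dependencies. The barrier ODEs are the ones obtained after ghost-point elimination, $d\eta_0/d\tau = -2\Upsilon(\eta_0 - \eta_1) - 2\Delta\xi\Upsilon\alpha$ and $d\eta_N/d\tau = 2\Upsilon(\eta_{N-1} - \eta_N) + 2\Delta\xi\Upsilon\alpha$, with $\Upsilon = \theta/(\Delta\xi)^2$. The values of `THETA`, `ALPHA` and the initial stress are hypothetical.

```python
"""Fixed-step RK4 integration of the finite-line IVP (illustrative stand-in)."""


def rhs(eta, upsilon, dxi, alpha):
    """Time derivative of the reduced stress after ghost-point elimination."""
    n = len(eta) - 1
    out = [0.0] * (n + 1)
    out[0] = -2.0 * upsilon * (eta[0] - eta[1]) - 2.0 * dxi * upsilon * alpha
    for i in range(1, n):
        out[i] = upsilon * (eta[i - 1] - 2.0 * eta[i] + eta[i + 1])
    out[n] = 2.0 * upsilon * (eta[n - 1] - eta[n]) + 2.0 * dxi * upsilon * alpha
    return out


def rk4_step(eta, dt, upsilon, dxi, alpha):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(eta, upsilon, dxi, alpha)
    k2 = rhs([e + 0.5 * dt * k for e, k in zip(eta, k1)], upsilon, dxi, alpha)
    k3 = rhs([e + 0.5 * dt * k for e, k in zip(eta, k2)], upsilon, dxi, alpha)
    k4 = rhs([e + dt * k for e, k in zip(eta, k3)], upsilon, dxi, alpha)
    return [
        e + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for e, a, b, c, d in zip(eta, k1, k2, k3, k4)
    ]


N = 10                    # discretizations per branch
THETA, ALPHA = 1.0, 0.5   # hypothetical reduced diffusivity and driving term
ETA_INIT = 0.2            # hypothetical uniform initial reduced stress
DXI = 1.0 / N
UPSILON = THETA / DXI ** 2
DT = 0.5 / UPSILON        # keeps DT * 4 * UPSILON = 2, inside the RK4 stability range

eta = [ETA_INIT] * (N + 1)
for _ in range(int(round(3.0 / DT))):
    eta = rk4_step(eta, DT, UPSILON, DXI, ALPHA)

end_to_end = eta[-1] - eta[0]
trapezoid_mean = DXI * (0.5 * eta[0] + sum(eta[1:-1]) + 0.5 * eta[-1])
```

At steady state the discrete stress gradient equals `ALPHA` everywhere, so the end-to-end stress difference approaches `ALPHA`, while the trapezoidal mean stays at its initial value. An adaptive scheme mainly changes how the step size is chosen, not these invariants.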

Figure 3.6: Tree with a (a) dotted-I junction and (b) T junction.

3.5.1 Verifying the numerical approach

As mentioned in the background, analytical solutions are known for 1) a finite line with blocked

boundary conditions, as provided by Korhonen [16] and 2) for simple interconnect trees (with

some simplifying assumptions) as given by the CTHKS model [44]. We will compare our

numerical solution with both analytical solutions.

First, we compare our solution with the analytical solution of the CTHKS model. Chen et

al. [44] compared the solution of CTHKS model to the solution obtained by using COMSOL,

an industry standard PDE solver, and reported a maximum error of 0.5%. Thus, a comparison

with the CTHKS model would provide an indirect comparison of our numerical method with

COMSOL.

The CTHKS model makes some simplifying assumptions to derive the analytical solution, which are listed in Section 2.3.4. In order to compare EKM and the CTHKS model, we make the same simplifying assumptions. We will use the simple interconnect trees shown in Fig. 3.6, with all branch lengths being L = 50 µm. We will use N = 20 discretizations per branch to formulate the IVP (a higher value of N gives a more accurate solution and vice versa). The initial branch current densities for the dotted-I structure Td are assumed to be $j_1 = 1 \times 10^{9}$ A/m$^2$ and $j_2 = -2 \times 10^{9}$ A/m$^2$. Fig. 3.7a compares the stress evolution at the junctions of tree Td with time as obtained using EKM and the CTHKS model. In Fig. 3.7b, we plot the percent error between the stress values against the CTHKS solution, i.e. if σEKM(x, t) and σCTHKS(x, t) represent the solutions obtained using EKM and the CTHKS model respectively, at some discrete point x at time t, then a blue dot is the point (σCTHKS(x, t), 100 × (σEKM(x, t) − σCTHKS(x, t))/σCTHKS(x, t)). The maximum absolute error between the solutions obtained using the CTHKS model and the EKM is 0.5 MPa, i.e. max(|σCTHKS(x, t) − σEKM(x, t)|) = 0.5 MPa. The red and black lines show the contour for the maximum absolute error. This kind of plot is known as the error rate plot, and we will frequently use it to show the error between two quantities. The percentage errors are high when the stress values are close to 0, which is to be expected. For the T-structure T⊥, the current densities are $j_1 = 0.9 \times 10^{9}$ A/m$^2$, $j_2 = -2 \times 10^{9}$ A/m$^2$ and $j_3 = -0.8 \times 10^{9}$ A/m$^2$. The comparison of stress evolution at the junctions is shown in Fig. 3.8a. The error rate plot in Fig. 3.8b shows a maximum absolute error of 0.9 MPa. This demonstrates that the results obtained from the EKM are in excellent agreement with the CTHKS model, and by extension COMSOL. For a better comprehension of how the stress profile in T⊥ varies over time, we show a 3D plot of stress evolution in Fig. 3.9.

Figure 3.7: (a) Comparing stress evolution for a dotted-I structure as obtained using EKM and the CTHKS model, and (b) the error rate plot with respect to the CTHKS solution. [Axes: (a) time (yrs); (b) stress (MPa) against percent error, with contours at ±0.502 MPa.]

Figure 3.8: (a) Comparing stress evolution for a T-structure as obtained using EKM and the CTHKS model, and (b) the error rate plot with respect to the CTHKS solution. [Axes: (a) time (yrs); (b) stress (MPa) against percent error.]
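The error rate plot used in Figs. 3.7b and 3.8b can be sketched in a few lines. The helper below is only an illustration of the definition given above; the function name and the sample stress values are hypothetical and are not data from the experiments.

```python
"""Points of an error rate plot and the maximum absolute error (illustrative)."""


def error_rate_points(reference, estimate):
    """Return [(reference, percent error), ...] and the maximum absolute error."""
    points = []
    max_abs = 0.0
    for ref, est in zip(reference, estimate):
        max_abs = max(max_abs, abs(est - ref))
        if ref != 0.0:  # the percent error is undefined for a zero reference
            points.append((ref, 100.0 * (est - ref) / ref))
    return points, max_abs


# Hypothetical stress samples in MPa: reference solution and numerical solution.
ref_mpa = [-40.0, -2.0, 0.5, 20.0]
est_mpa = [-40.4, -2.3, 0.9, 20.5]
points, max_abs = error_rate_points(ref_mpa, est_mpa)
percent = [p for _, p in points]
```

In this toy data set the largest percent error belongs to the sample whose reference stress is closest to zero, even though its absolute error is not the largest, which is the behaviour noted above for the stress values near 0. The contour lines of such a plot are the curves ±100 × max_abs/σ.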

For comparison with the reference solution proposed by Korhonen, we use the finite line as

shown in Fig. 3.10 with L = 40 µm and $j = 2 \times 10^{9}$ A/m$^2$. The initial thermal stress is assumed

to be 434.7 MPa. We use N=40 discretizations per branch. The stress values are computed

for each discretized point in the line for 200 equidistant time points between 0-50 years. In

Fig. 3.11a, we compare the stress evolution at the junctions and at x = L/2 (with time) as

obtained using the reference solution (2.11) and the numerical solution of the IVP formulation.

The error rate plot (Fig. 3.11b) for stress values at all discretized points over time with respect

to the stress values obtained using the reference solution shows a maximum absolute error of

∼1.37 MPa. In this case, the maximum percent error is approximately 0.34%, which shows

that our numerical solution for a finite line is very close to the reference solution.

Figure 3.9: Stress profile across the T-structure with time. [Axes: x (µm) and y (µm); snapshots at t = 0.20, 0.80, 1.80, 3.80, 5.80 and 10.00 yrs.]

Figure 3.10: Schematic of a finite line.

Figure 3.11: (a) Comparing stress evolution for a finite line as obtained using EKM and the reference solution, and (b) the error rate plot with respect to the reference solution. [Legend: n1, L/2 and n2, each for the reference solution and for EKM; axes: time (yrs) and stress (MPa).]

3.5.2 Verifying the model

We will now verify the model itself using previously published experimental data by Gan et al. [5]

and Moreau et al. [1].

Figure 3.12: Comparing the estimated MTF and its 95% confidence bounds as obtained using EKM with the ones reported by Gan et al. [5]. Note that the confidence bounds get tighter as the number of TTF samples is increased. [x-axis: experiment number (i)-(v); y-axis: MTF; legend: Experiment (Gan et al.), Simulation (EKM), Used for calibration, Black's model.]

Gan et al. conducted experiments using the dotted-I structure shown in Fig. 3.6a, with each branch being L = 250 µm long. They used 5 different current density configurations, which are listed as follows:

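i) $j_1 = 2.5 \times 10^{10}$ A/m$^2$, $j_2 = 2.5 \times 10^{10}$ A/m$^2$.

ii) $j_1 = 2.5 \times 10^{10}$ A/m$^2$, $j_2 = 0$ A/m$^2$.

iii) $j_1 = 2.5 \times 10^{10}$ A/m$^2$, $j_2 = 0.5 \times 10^{10}$ A/m$^2$.

iv) $j_1 = 2.5 \times 10^{10}$ A/m$^2$, $j_2 = -0.5 \times 10^{10}$ A/m$^2$.

v) $j_1 = 2.5 \times 10^{10}$ A/m$^2$, $j_2 = -2.5 \times 10^{10}$ A/m$^2$.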

The failure was determined based on a 30% increase in the branch resistance.

We use data from configuration (i) to calibrate EKM: since EKM assumes that voids immediately reach their steady state after nucleation, we empirically choose an appropriate value for the mean diffusivity, so that the mean void nucleation times (as estimated using EKM) are equal to the mean failure times reported in [5]. We then use the calibrated model to estimate the MTF for all the remaining configurations. The MTF and confidence bounds estimated using EKM are based on 100 TTF samples, where each TTF sample is obtained by assigning lognormally generated diffusivities to the branches of the tree. The results are shown in Fig. 3.12, where the comparison is made in terms of the MTF and its 95% confidence bounds.

As can be seen, EKM gives a conservative estimate for configuration (ii) and a close enough MTF estimate for configurations (iii), (iv) and (v), which is well within the experimentally determined 95% confidence bounds. Since branch b2 has zero current density in configuration (ii), it takes longer to reach the 30% resistance increase, which is not accounted for in EKM because it assumes that the voids immediately achieve their steady state volume. Simulating with EKM, we found that for configurations (ii), (iv) and (v), the voids nucleate only at junction n2, and for configuration (iii), the voids nucleate at junctions n2 and n3, which is the same as observed by Gan et al. in [5]. On the other hand, if Black's model were calibrated using configuration (i), it would predict the same MTF for b1 in all cases (shown by the dashed line in Fig. 3.12) regardless of the current density j2, which is pessimistic in this case.

Figure 3.13: (a) Schematic view of the test structure used in [1], and (b) upstream and downstream configurations as defined with respect to the left via. Both figures taken from [1]. Here, TiN (titanium nitride) is used for the barrier liner and SiN (silicon nitride) is used for capping.

Table 3.1: Comparison of upstream-to-downstream MTF ratio as reported in [1] and as estimated using EKM. Here µ denotes the MTF, the subscripts u and d denote the upstream and downstream configurations, and the last two columns are the ratios $\mu_u^{exp}/\mu_d^{exp}$ and $\mu_u^{ekm}/\mu_d^{ekm}$.

| Temp. (°C) | ib (mA) | Upstream, experiment [1]: µ | stdev. | Upstream, EKM: µ | stdev. | Downstream, experiment [1]: µ | stdev. | Downstream, EKM: µ | stdev. | Ratio, experiment | Ratio, EKM |
| 250 | 15 | 1294.5 | 0.33 | 1292.3 | 0.34 | – | – | 480.43 | 0.37 | – | 2.67 |
| 250 | 25 | 979.3 | 0.28 | 617.2 | 0.34 | 378.9 | 0.25 | 223.35 | 0.37 | 2.58 | 2.76 |
| 300 | 10 | 748.7 | 0.34 | 834.8 | 0.29 | 203.9 | 0.36 | 351.25 | 0.33 | 3.67 | 2.38 |
| 300 | 20 | 348.5 | 0.22 | 332.3 | 0.30 | – | – | 120.53 | 0.31 | – | 2.76 |
| 350 | 15 | 184.6 | 0.25 | 211.3 | 0.26 | 48.5 | 0.28 | 84.51 | 0.28 | 3.81 | 2.50 |
| 350 | 25 | 98.3 | 0.18 | 110.6 | 0.28 | – | – | 40.05 | 0.30 | – | 2.76 |

Figure 3.14: (a) Initial current density profile for T1 and heat map showing MTFs estimated using (b) Extended Korhonen's model (MTFekm), (c) Black's model (MTFblk) and (d) MTFblk − MTFekm. All MTF values are in years. [x-axis: branch number; the vertical axis of panel (a) is scaled by $10^{9}$.]

Moreau et al. [1] conducted experiments to find out the impact of redundant through-silicon vias (TSVs) on EM lifetimes. Redundant TSVs are often used as simple design solutions to increase the EM lifetime. The schematic view of their test structure is shown in Fig. 3.13a. They used two configurations, an upstream configuration where the direction of the electron flow was towards the RDL (redistribution layer) and a downstream configuration in which the direction of the electron flow was away from the RDL, as shown in Fig. 3.13b. For the upstream configuration, voids will nucleate below the right hand side vias and for the downstream configuration, voids will nucleate below the left hand side via. Let ib represent the magnitude of current flowing in the 500 µm branch. In their experiments, it was observed that even if ib is kept at the same magnitude in both configurations, the presence of redundant TSVs on the right side improved the EM lifetime by 2-4x in the upstream configuration due to the metal reservoir effect. To simulate this metal structure using EKM, we first calibrate it based on data available for the upstream configuration at T = 250 °C and ib = 15 mA. Using this calibrated model, we estimate the MTF for all the other configurations. Ideally one should use the data at different temperature points to obtain a more accurate calibration, but in this case we are interested in the ratio of MTF as observed in the upstream and downstream configurations, rather than the actual MTF values. The results are tabulated in Table 3.1. As can be seen, EKM consistently predicts that the MTF in the upstream configuration is 2-3x longer than in the downstream configuration, which is similar to the reported results. Since there are 4 redundant TSVs, a ratio close to 4 was to be expected. This effect cannot be modelled by only using Black's model as it simply depends on the current density of a given line, and thus would give the same MTF in both configurations.
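The ratios discussed above can be recomputed directly from the MTF columns of Table 3.1. The short script below is only a bookkeeping sketch: the dictionary layout and names are hypothetical, and the numbers are the tabulated values keyed by (temperature in °C, ib in mA).

```python
"""Recompute the upstream-to-downstream MTF ratios of Table 3.1."""

# (temp degC, ib mA) -> (upstream MTF, downstream MTF, tabulated ratio)
EKM_ROWS = {
    (250, 15): (1292.3, 480.43, 2.67),
    (250, 25): (617.2, 223.35, 2.76),
    (300, 10): (834.8, 351.25, 2.38),
    (300, 20): (332.3, 120.53, 2.76),
    (350, 15): (211.3, 84.51, 2.50),
    (350, 25): (110.6, 40.05, 2.76),
}
# Only the rows with a reported downstream experiment appear here.
EXP_ROWS = {
    (250, 25): (979.3, 378.9, 2.58),
    (300, 10): (748.7, 203.9, 3.67),
    (350, 15): (184.6, 48.5, 3.81),
}


def recomputed_ratios(rows):
    """Map each configuration to upstream MTF divided by downstream MTF."""
    return {key: up / down for key, (up, down, _) in rows.items()}


def largest_gap(rows):
    """Largest difference between a recomputed and a tabulated ratio."""
    ratios = recomputed_ratios(rows)
    return max(abs(ratios[key] - rows[key][2]) for key in rows)


ekm_ratios = recomputed_ratios(EKM_ROWS)
exp_ratios = recomputed_ratios(EXP_ROWS)
ekm_gap = largest_gap(EKM_ROWS)
exp_gap = largest_gap(EXP_ROWS)
```

The recomputed EKM ratios all fall between 2 and 3 and the experimental ones between 2 and 4, matching the tabulated ratio columns to within a few hundredths.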

3.6 Comparison between EKM and Black’s model

In this section, we will compare the MTFs estimated using Black’s model and EKM for two

interconnect trees denoted as T1 and T2, extracted from the IBM power grid benchmarks [26].

Figure 3.15: (a) Initial current density profile for T2 and heat map showing MTFs estimated using (b) Extended Korhonen's model (MTFekm), (c) Black's model (MTFblk) and (d) MTFblk − MTFekm. All MTF values are in years. [x-axis: branch number; the vertical axis of panel (a) is scaled by $10^{9}$.]

Both trees are straight metal stripes, consisting of 193 junctions (2 diffusion barriers and 191

dotted-I junctions) and 192 branches each. For a fair comparison, we calibrate Black’s model

based on data obtained from Korhonen’s model, so that for a finite line, the MTF predicted

by Black’s model and EKM are the same. Since Black’s model gives branch MTFs and EKM

computes junction MTFs, we report the junction MTFs as the MTF for all connected branches

for this comparison. The MTF estimate from EKM is the arithmetic average of 100 TTF

samples, where each TTF sample is obtained by assigning lognormally generated diffusivities

to all branches in the tree and simulating the tree up to 100 years.
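The sampling procedure can be illustrated with a toy Monte Carlo loop. The sketch below does not run EKM: as a loudly hypothetical stand-in, each TTF is taken to be inversely proportional to a single lognormally distributed diffusivity and is capped at the 100-year simulation horizon. The confidence interval uses a normal approximation on the sample mean, which is also an assumption of the sketch; `median_ttf`, `sigma` and the seed are made-up values.

```python
"""Toy Monte Carlo estimate of the MTF with a 95% confidence interval."""
import math
import random
import statistics


def sample_ttfs(n_samples, median_ttf=30.0, sigma=0.4, horizon=100.0, seed=1):
    """Draw TTF samples from a stand-in model: TTF = median_ttf / relative diffusivity."""
    rng = random.Random(seed)
    ttfs = []
    for _ in range(n_samples):
        d_rel = rng.lognormvariate(0.0, sigma)   # diffusivity relative to its median
        ttfs.append(min(median_ttf / d_rel, horizon))  # simulation stops at the horizon
    return ttfs


def mtf_with_ci(ttfs, z=1.96):
    """Arithmetic-mean MTF and a normal-approximation 95% confidence interval."""
    mean = statistics.mean(ttfs)
    half_width = z * statistics.stdev(ttfs) / math.sqrt(len(ttfs))
    return mean, mean - half_width, mean + half_width


mtf_100, lo_100, hi_100 = mtf_with_ci(sample_ttfs(100))
mtf_10k, lo_10k, hi_10k = mtf_with_ci(sample_ttfs(10000))
width_100 = hi_100 - lo_100
width_10k = hi_10k - lo_10k
```

With 100 times more samples the interval shrinks by roughly a factor of ten, which is the behaviour noted in the caption of Fig. 3.12. In the actual flow, `sample_ttfs` would be replaced by one EKM simulation of the tree per sample.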

Tree T1 has a high current density profile, with maximum initial branch current density being $5.1 \times 10^{9}$ A/m$^2$ (Fig. 3.14a). In this case, the calibrated Black's model estimates the smallest MTF to be around 6 yrs, whereas the smallest MTF found using the Extended Korhonen's model is around 24 yrs, which is ∼4x longer. Fig. 3.14b and 3.14c show the heat map of MTFs

of all branches within the tree as estimated using EKM and Black’s model and Fig. 3.14d shows

the difference in the estimated values. This scenario clearly shows that Black’s model can be

highly pessimistic.

Next, consider tree T2 which has a low current density profile, with maximum initial branch current density being $1.5 \times 10^{9}$ A/m$^2$ (Fig. 3.15a). Here, due to the Blech effect, Black's model predicts that no failure should occur. However, due to the material flow between the branches, we found that the smallest MTF would be around 2.2 yrs. This test case shows that Black's model can also be highly optimistic for a tree, especially when it has a low current density profile. Similar to the previous figure, Fig. 3.15b and 3.15c show the heat map of MTFs of all branches within the tree as estimated using EKM and Black's model and Fig. 3.15d shows the difference in the estimated values.

Figure 3.16: (a) The actual temperature profile and the assumed nominal temperature distribution. Heat map showing MTFs estimated with (b) the actual temperature profile ($\mathrm{MTF}_{T}$), (c) assuming Tm,k = 327.6 K for all branches ($\mathrm{MTF}_{\bar{T}}$) and (d) $\mathrm{MTF}_{T} - \mathrm{MTF}_{\bar{T}}$. All MTF values are in years.

3.7 Importance of Temperature distribution

In this section, we will explore the effect of temperature on the lifetimes estimated using EKM.

For this study, we will use tree T1. The junction MTFs are obtained by taking the average of

100 TTF samples, where each TTF sample is obtained by assigning lognormally generated

diffusivities to all branches in the tree and simulating the tree up to 100 years.

Figure 3.17: Estimated MTF as per EKM using (a) the actual temperature profile, and assuming the temperature to be (b) 315 K, (c) 327.6 K and (d) 340 K for all branches. The x-axis for all plots represents the junction IDs. Junctions with MTF ≥ 100 years have not been shown.

We first estimate the MTFs using the actual temperature distribution, as shown in Fig. 3.16a. This temperature distribution was obtained by using compact thermal models (the detailed procedure is presented in Section 6.3). For this case, the smallest MTF was observed to be around 24 years. Now, we artificially assume a constant temperature of 327.6 K throughout the tree, i.e. Tm,k = 327.6 K ∀k. Note that 327.6 K is the average of the actual branch temperatures. In this case, the first failure happens around 22.5 yrs, which is close enough to the actual smallest MTF. However, the similarity ends here, with all subsequent junction MTFs being different from each other. In particular, the actual MTFs are lower for branches 1-50 and 140-192 where the actual temperature is more than 327.6 K and are higher for branches 51-140 where the actual temperature is less than 327.6 K (see Fig. 3.16b, 3.16c and 3.16d). This shows that a single nominal temperature cannot model the effect of an uneven temperature distribution. A higher nominal temperature would result in lower MTF values for all junctions and vice versa. This is shown in Fig. 3.17, where we show the MTF computed with different values of nominal temperature. The minimum MTFs for test cases with Tm,k as 315 K, 327.6 K and 340 K ∀k are 60.1 years, 22.5 years and 9.07 years, respectively. Hence, temperature distribution plays a very important role and should be taken into account while doing EM lifetime analysis.
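The three minimum MTFs quoted above give a feel for how steep the temperature dependence is. The back-of-the-envelope sketch below assumes, purely for illustration, a single-exponential Arrhenius-type dependence MTF ∝ exp(Ea/(kB T)) between neighbouring temperatures and extracts the apparent Ea. This functional form is an assumption of the sketch, not a description of how EKM computes lifetimes, and `apparent_activation_energy` is a hypothetical helper.

```python
"""Apparent activation energy implied by the reported minimum MTFs (illustrative)."""
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K

# Nominal temperature (K) -> minimum MTF (years), as reported in the text.
MIN_MTF = {315.0: 60.1, 327.6: 22.5, 340.0: 9.07}


def apparent_activation_energy(t_cold, mtf_cold, t_hot, mtf_hot):
    """Ea in eV from two (T, MTF) points, assuming MTF ~ exp(Ea / (kB * T))."""
    return K_B_EV * math.log(mtf_cold / mtf_hot) / (1.0 / t_cold - 1.0 / t_hot)


ea_low = apparent_activation_energy(315.0, MIN_MTF[315.0], 327.6, MIN_MTF[327.6])
ea_high = apparent_activation_energy(327.6, MIN_MTF[327.6], 340.0, MIN_MTF[340.0])
shrink_factors = (MIN_MTF[315.0] / MIN_MTF[327.6], MIN_MTF[327.6] / MIN_MTF[340.0])
```

Both temperature intervals give an apparent value of about 0.7 eV, and each step of roughly 12.5 K shrinks the minimum MTF by a factor of about 2.5, which is why a few kelvin of error in the assumed temperature matters for lifetime estimates.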

Chapter 4

LTI Models for trees

4.1 Introduction

In the last chapter, we described in detail the Extended Korhonen’s model (EKM) and also

verified it against known analytical solutions for simple cases and published experimental results

in the literature. In this chapter, we will dig deeper into EKM to show that it has a state space

representation, which is a succession of Linear Time Invariant (LTI) systems. Expressing EKM as an LTI system has at least two advantages. First, it allows us to analyze EKM better

using well-known LTI system concepts, which we will do in this chapter. Second, we can now

develop optimized numerical methods to solve EKM, which we will do in the next chapter. A

preliminary version of this work appeared in [72].

In this chapter, we will show that EKM is an asymptotically stable system with all eigenvalues being negative real numbers. We also investigate the accuracy vs. speed trade-off for LTI models obtained using different values of N. Finally, we will justify the use of average currents

for EKM by studying the frequency response of tree LTI systems.

4.2 State Space representation for a tree

A PDE system is said to be linear if the equation, its boundary and initial conditions do not

include any non-linear combination of the variables or their derivatives. From (3.11), it is clear

that for EKM, there are no non-linear combinations for the variables (σ, x and t) and the

derivatives involved. Thus, EKM is a linear PDE system. To illustrate this point, we will

revisit the example with dotted-I structure Td presented in Section 3.3. Specifically, we will

show that when no voids are present in Td, the IVP formulation obtained after eliminating the

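ghost points is essentially an LTI system. For clarity, we repeat the following definitions

\[
r_{ik} \triangleq L_i/L_k, \quad
p_{ik} \triangleq \frac{D_{a,i}\,T_{m,k}}{D_{a,k}\,T_{m,i}}, \quad
w_{ik} \triangleq w_i/w_k, \quad
\gamma_{ik} \triangleq r_{ki}\,w_{ik}\,p_{ik}, \quad
\Upsilon_k \triangleq \theta_k/(\Delta\xi)^2. \tag{4.1}
\]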


At the diffusion barrier n1, the stress evolution is given by the ODE

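\[
\frac{d\eta_{1,0}}{d\tau} = \Upsilon_1\left(\eta_{1,-1} + \eta_{1,1} - 2\eta_{1,0}\right). \tag{4.2}
\]

Substituting $\eta_{1,-1}$ from (3.24) in (4.2), we can eliminate the ghost point

\[
\frac{d\eta_{1,0}}{d\tau} = \Upsilon_1\left(\eta_{1,1} - 2\Delta\xi\,\alpha_1 + \eta_{1,1} - 2\eta_{1,0}\right) = -2\Upsilon_1(\eta_{1,0} - \eta_{1,1}) - 2\Upsilon_1\Delta\xi\,\alpha_1.
\]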

Similarly, we can eliminate the ghost points in ODEs at n2 and n3. The final IVP can be

written as

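\begin{align}
\frac{d\eta_{1,0}}{d\tau} &= -2\Upsilon_1(\eta_{1,0} - \eta_{1,1}) - 2\Delta\xi\,\Upsilon_1\alpha_1, \tag{4.3a}\\
\frac{d\eta_{1,i}}{d\tau} &= \Upsilon_1(\eta_{1,i-1} - 2\eta_{1,i} + \eta_{1,i+1}), \quad i \in \{1, 2, \dots, N-1\}, \tag{4.3b}\\
\frac{d\eta_{1,N}}{d\tau} &= 2\chi_{12}\Upsilon_1\left(\eta_{1,N-1} - (1 + \gamma_{21})\eta_{1,N} + \gamma_{21}\eta_{2,1}\right) + 2\Delta\xi\,\chi_{12}\Upsilon_1(\alpha_1 - \gamma_{21}\alpha_2), \tag{4.3c}\\
\frac{d\eta_{2,i}}{d\tau} &= \Upsilon_2(\eta_{2,i-1} - 2\eta_{2,i} + \eta_{2,i+1}), \quad i \in \{1, 2, \dots, N-1\}, \tag{4.3d}\\
\frac{d\eta_{2,N}}{d\tau} &= 2\Upsilon_2(\eta_{2,N-1} - \eta_{2,N}) + 2\Delta\xi\,\Upsilon_2\alpha_2, \tag{4.3e}\\
\eta_{k,i}(0) &= \frac{\Omega\,\sigma_{T,k}(0)}{k_b T_m^{\star}}, \quad k \in \{1, 2\} \text{ and } i \in \{1, 2, \dots, N-1\}, \tag{4.3f}
\end{align}

where the junction coefficient, written here as $\chi_{12}$, is $\chi_{12} = r_{12}/(r_{12} + w_{21})$. Clearly, (4.3) can be written as an LTI system

\[
\frac{d}{d\tau}
\begin{bmatrix}
\eta_{1,0}(\tau)\\ \eta_{1,1}(\tau)\\ \vdots\\ \eta_{1,N}(\tau)\\ \eta_{2,1}(\tau)\\ \vdots\\ \eta_{2,N}(\tau)
\end{bmatrix}
=
\begin{bmatrix}
-2\Upsilon_1 & 2\Upsilon_1 & 0 & \cdots & & & \\
\Upsilon_1 & -2\Upsilon_1 & \Upsilon_1 & \cdots & & & \\
 & \ddots & \ddots & \ddots & & & \\
 & \cdots & 2\chi_{12}\Upsilon_1 & -2\chi_{12}\Upsilon_1(1+\gamma_{21}) & 2\chi_{12}\Upsilon_1\gamma_{21} & 0 & \cdots \\
 & \cdots & 0 & \Upsilon_2 & -2\Upsilon_2 & \Upsilon_2 & \cdots \\
 & & & & \ddots & \ddots & \ddots \\
 & & & & \cdots & 2\Upsilon_2 & -2\Upsilon_2
\end{bmatrix}
\begin{bmatrix}
\eta_{1,0}(\tau)\\ \eta_{1,1}(\tau)\\ \vdots\\ \eta_{1,N}(\tau)\\ \eta_{2,1}(\tau)\\ \vdots\\ \eta_{2,N}(\tau)
\end{bmatrix}
+
\begin{bmatrix}
2\Delta\xi\,\Upsilon_1 & 0 & 0\\
0 & 0 & 0\\
\vdots & \vdots & \vdots\\
0 & 2\Delta\xi\,\chi_{12}\Upsilon_1 & 0\\
0 & 0 & 0\\
\vdots & \vdots & \vdots\\
0 & 0 & 2\Delta\xi\,\Upsilon_2
\end{bmatrix}
\begin{bmatrix}
-\alpha_1\\ \alpha_1 - \gamma_{21}\alpha_2\\ \alpha_2
\end{bmatrix}. \tag{4.4}
\]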

Following a similar procedure, it can be shown that the IVP formulation of Td in the presence of voids is also an LTI system. Note that (4.4) is not the final LTI system for a tree with no voids, because it can be shown that the system matrix is singular. We will discuss this case in detail in Section 4.2.3.

Figure 4.1: Notion of subtrees and time-spans.

4.2.1 Subtrees and Time-spans

Before going any further, we need to introduce the concept of subtrees and time-spans. When a

void nucleates at a junction, EKM conceptually treats it as a diffusion barrier for all connected

branches, so that there is no material flow between them. Thus, the tree is effectively divided

into separate subtrees. A subtree of tree $T = (\mathcal{N}, \mathcal{B})$ is a graph whose junction set is a subset of $\mathcal{N}$ and whose branch set is a subset of $\mathcal{B}$. Fig. 4.1 illustrates the notion of subtrees. Let τp be the time of the pth void nucleation, with τ0 = 0. For the time-span [τ0, τ1), a tree has no voids. Thus, the only subtree is the tree T itself and Nf, the set of failed junctions, is empty. We will refer to this time-span as the pre-void phase. For all

subsequent time-spans, the subtrees will have at least one failed junction that has a void. We

will refer to these time-spans as the post-void phase. At τ = τ1, the first void nucleates, say at

n2. Since n2 is a dotted-I junction, it divides the tree into two subtrees T1 and T2, as shown

in Fig. 4.1. Note that n2 appears in both subtrees, and as per EKM is treated as a diffusion

barrier with a void, which we will refer to as a voided diffusion barrier. Similarly, at τ = τ2,

junction n6 fails creating three new subtrees. The whole tree is now divided into four subtrees.

In general, if Nf is the set of failed junctions in the whole tree, then the number of subtrees ns

can be found using

\[
n_s = 1 + \sum_{n_p \in N_f} \deg(n_p) - |N_f|, \tag{4.5}
\]

where deg(np) is the degree of junction np before void nucleation and |Nf | is the number of

junctions that have failed in the tree.
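Formula (4.5) can be checked against a direct count of connected components. The sketch below splits every failed junction into one private copy per incident branch and counts the resulting components with a small union-find. The example tree is hypothetical (it is not the tree of Fig. 4.1), although, like the example above, it has a degree-2 junction n2 and a degree-3 junction n6; the function names are made up as well.

```python
"""Count subtrees by splitting failed junctions and compare with (4.5)."""


def subtrees_by_formula(failed_degrees):
    """n_s = 1 + sum of degrees of failed junctions - number of failed junctions."""
    return 1 + sum(failed_degrees) - len(failed_degrees)


def subtrees_by_splitting(branches, failed):
    """Count connected components after turning failed junctions into barriers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idx, (a, b) in enumerate(branches):
        # A failed junction gets a separate copy for every branch touching it.
        end_a = (a, idx) if a in failed else a
        end_b = (b, idx) if b in failed else b
        root_a, root_b = find(end_a), find(end_b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(node) for node in list(parent)})


# Hypothetical tree: n2 has degree 2 (dotted-I), n6 has degree 3 (T junction).
BRANCHES = [("n1", "n2"), ("n2", "n3"), ("n3", "n6"), ("n6", "n7"), ("n6", "n8")]

after_first_void = subtrees_by_splitting(BRANCHES, {"n2"})
after_second_void = subtrees_by_splitting(BRANCHES, {"n2", "n6"})
```

For this tree the count goes from one subtree to two after n2 fails and to four after n6 fails, in agreement with (4.5) evaluated for degrees [2] and [2, 3].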

4.2.2 LTI system for a subtree

Consider a subtree of tree T, let $\mathcal{N}$ and $\mathcal{B}$ denote the sets of junctions and branches of this subtree, and let Nf be the set of failed junctions in the subtree [e.g. Nf = {n2} for T2 in the time-span [τ1, τ2)]. Similar to the IVP formulation, we uniformly discretize each branch bk into N segments, where N is the same for all branches. Then, there would be a total of q + 1 discretized points, where $q = N|\mathcal{B}|$. Each discretized point is given a unique index i ∈ i0 + {0, 1, 2, . . . , q}, where the offset i0 ensures unique indices for all discretized points within the tree T. Let xi represent the reduced stress at the ith discretized point in the tree. Then, using (3.18), the time rate of change of xi in branch bk is

\[
\frac{\partial x_i}{\partial \tau} = \theta_k\,\frac{\partial^2 x_i}{\partial \xi_k^2}. \tag{4.6}
\]

Replacing the partial spatial derivative with respect to ξk in (4.6) with the central difference approximation and solving the boundary conditions at junctions to eliminate the ghost points leads to the following translated LTI system for a subtree in the time-span [τp, τp+1)

\begin{align}
\dot{x}(\tau - \tau_p) &= A\,x(\tau - \tau_p) + B\,u, \tag{4.7a}\\
y(\tau - \tau_p) &= L\,x(\tau - \tau_p), \tag{4.7b}\\
x(0) &= x_0, \tag{4.7c}
\end{align}

where $x = [x_i] \in \mathbb{R}^{q+1}$ is the state vector of the subtree, $A = [a_{i,j}] \in \mathbb{R}^{(q+1)\times(q+1)}$ is the system matrix, $B = [b_{i,j}] \in \mathbb{R}^{(q+1)\times(|\mathcal{N}|-|N_f|)}$ is the input matrix, $u = [u_i] \in \mathbb{R}^{|\mathcal{N}|-|N_f|}$ is the input vector, $L = [l_{i,j}] \in \mathbb{R}^{|\mathcal{N}|\times(q+1)}$ is the output matrix and $y = [y_i] \in \mathbb{R}^{|\mathcal{N}|}$ is the output vector that consists of stress values at all junctions. Here $\mathcal{N}$ and $N_f$ are the junction set and the set of failed junctions of the subtree. The initial condition $x_0$ is easily obtained from the stress profile of the tree at τ = τp computed using the LTI models of the previous time-span, or it is given by the residual thermal stress at τ = 0.

Each state xi contributes some non-zero entries to the ith row of A, B, u and L, which

we refer to as a state stamp. State stamps are conceptually similar to element stamps used

in SPICE for generating circuit matrices. The notion of stamps is useful to assemble the LTI

system for a given subtree: we start by initializing all matrices and vectors to zeros and add

the stamps as we traverse through the tree. The state stamps are determined based on the

location, the adjacent points and the presence or absence of a void at point i. Two points are

said to be adjacent to each other if they are physically next to each other in a subtree. We will

use A(i) to denote the set of indices for points adjacent to i.

State Stamps for A

Diffusion barrier. Consider state xi for a diffusion barrier np at the beginning or at the end of branch bk, with A(i) = {i1}. Let τf be the time of void nucleation at this barrier. Then, the non-zero entries in the ith row are given as

\[
a_{i,i} =
\begin{cases}
-2\Upsilon_k & \tau < \tau_f,\\
-2\Upsilon_k\,(1 + \Delta\xi\,L_k/\delta) & \tau \ge \tau_f,
\end{cases} \tag{4.8a}
\]
\[
a_{i,i_1} = 2\Upsilon_k \quad \forall \tau. \tag{4.8b}
\]

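Higher degree junctions. Consider a state xi for a junction np with degree d (d is 2, 3 or 4) and A(i) = {i1, i2, . . . , id}. Without loss of generality, we will assume that np is at the end of branch 1 and at the beginning of branches 2, . . . , d. Let τf be the time of void nucleation at np. Then, the state stamps corresponding to np for τ < τf are

\[
a_{i,i} = -2\chi_{1d}\Upsilon_1 \sum_{k=1}^{d} \gamma_{k1}, \tag{4.9a}
\]
\[
a_{i,i_k} = 2\chi_{1d}\Upsilon_1\gamma_{k1}, \quad k = 1, \dots, d, \tag{4.9b}
\]

where the junction coefficient, written here as $\chi_{1d}$, is

\[
\chi_{12} = \frac{r_{12}}{r_{12} + w_{21}}, \qquad
\chi_{13} = \frac{r_{12}r_{13}}{r_{12}r_{13} + r_{13}w_{21} + r_{12}w_{31}}, \tag{4.10a}
\]
\[
\chi_{14} = \frac{r_{12}r_{13}r_{14}}{r_{12}r_{13}r_{14} + r_{13}r_{14}w_{21} + r_{12}r_{14}w_{31} + r_{12}r_{13}w_{41}}. \tag{4.10b}
\]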

As mentioned before, when a void nucleates at junction np, it generates new subtrees.

Clearly, each subtree will have at least one void located at the newly created voided diffusion

barrier. For any subtree, let i be the index of the discretized point at the beginning or the end of branch bk where a void is present, and let A(i) = {i1} be its only adjacent point in the subtree. Then the state stamps for A are simply given by

\[
a_{i,i} = -2\Upsilon_k\,(1 + \Delta\xi\,L_k/\delta), \qquad a_{i,i_1} = 2\Upsilon_k. \tag{4.11}
\]

Branch interior. Consider state xi for a discretized point within branch bk, with A(i) = {i1, i2}. Then, the non-zero entries of the ith row of A are

\[
a_{i,i} = -2\Upsilon_k, \qquad a_{i,i_1} = a_{i,i_2} = \Upsilon_k. \tag{4.12}
\]

In EKM, a void cannot nucleate inside a branch. Hence, there are no state stamps for the

corresponding case.
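The stamps for A can be exercised on the dotted-I example. The sketch below assembles the pre-void system matrix from stamps (4.8), (4.9) and (4.12), and a post-void matrix for a single-branch subtree whose first point is a voided diffusion barrier, using (4.11). The junction coefficient of (4.10) is called `chi12` here, and all numeric values (including `L_OVER_DELTA`) are hypothetical. Zero row sums mean that the all-ones vector is in the null space, so the pre-void matrix is singular; the Gershgorin bound shows that no eigenvalue can have a positive real part.

```python
"""Assemble A from state stamps and inspect row sums (illustrative values)."""


def stamp_interior(a, i, ups):
    """Branch-interior stamp (4.12) for row i."""
    a[i][i - 1], a[i][i], a[i][i + 1] = ups, -2.0 * ups, ups


def dotted_i_prevoid(n, ups1, ups2, chi12, gamma21):
    """Pre-void matrix for two branches joined at a dotted-I junction."""
    size = 2 * n + 1
    a = [[0.0] * size for _ in range(size)]
    a[0][0], a[0][1] = -2.0 * ups1, 2.0 * ups1            # barrier n1, (4.8)
    for i in range(1, n):
        stamp_interior(a, i, ups1)
    a[n][n - 1] = 2.0 * chi12 * ups1                      # (4.9b) with k = 1
    a[n][n] = -2.0 * chi12 * ups1 * (1.0 + gamma21)       # (4.9a)
    a[n][n + 1] = 2.0 * chi12 * ups1 * gamma21            # (4.9b) with k = 2
    for i in range(n + 1, 2 * n):
        stamp_interior(a, i, ups2)
    a[2 * n][2 * n - 1], a[2 * n][2 * n] = 2.0 * ups2, -2.0 * ups2  # barrier n3
    return a


def branch_postvoid(n, ups, dxi, l_over_delta):
    """Single-branch subtree with a voided barrier at its first point, (4.11)."""
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    a[0][0], a[0][1] = -2.0 * ups * (1.0 + dxi * l_over_delta), 2.0 * ups
    for i in range(1, n):
        stamp_interior(a, i, ups)
    a[n][n - 1], a[n][n] = 2.0 * ups, -2.0 * ups          # un-voided barrier
    return a


def gershgorin_upper(a):
    """Largest right end of the Gershgorin discs (bounds eigenvalue real parts)."""
    return max(
        row[i] + sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(a)
    )


N_SEG = 4
DXI = 1.0 / N_SEG
R12, W21, P21 = 1.5, 0.8, 1.2           # hypothetical ratios
UPS1 = 1.0 / DXI ** 2                   # theta_1 = 1 (hypothetical)
UPS2 = UPS1 * R12 ** 2 * P21            # from theta_2/theta_1 = r12^2 p21
CHI12 = R12 / (R12 + W21)
GAMMA21 = R12 * W21 * P21
L_OVER_DELTA = 5000.0                   # hypothetical L_k / delta

pre = dotted_i_prevoid(N_SEG, UPS1, UPS2, CHI12, GAMMA21)
post = branch_postvoid(N_SEG, UPS1, DXI, L_OVER_DELTA)
pre_row_sums = [sum(row) for row in pre]
post_row_sums = [sum(row) for row in post]
```

In the post-void matrix only the voided-barrier row has a strictly negative row sum, which is what breaks the null vector and is consistent with A being non-singular once a void is present.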

Theorem 1. (properties of A) For a subtree, let A be the system matrix obtained using stamps (4.8)-(4.12). Then:

(a) For the pre-void phase, all eigenvalues of A are real and non-positive, with exactly one

eigenvalue being 0.

(b) For the post-void phase, A is non-singular, with all eigenvalues being real and negative.

The proof of this theorem is given in Appendix A. From Theorem 1, A is singular in the

pre-void phase. This is problematic because a singular matrix is not invertible, and hence we

cannot find the steady state solution for the LTI system (which will be required later) and we

also cannot apply any model order reduction techniques. Thus, we will derive a non-singular

LTI system for the pre-void phase in the next section. But before we do that, we will complete

this section by presenting the state stamps for B, L and u.

State Stamps for B

Similar to A, each state xi contributes some non-zero entries to the input matrix B. By the

nature of the ODE system, the inputs are present only at junctions that have no voids, so that the number of inputs is equal to |N| − |Nf|, the number of un-voided junctions in a subtree. Thus, B is a (q + 1) × (|N| − |Nf|) matrix. Let all the un-voided junctions in a subtree be represented as np, with p ∈ {0, 1, 2, . . . , |N| − (|Nf| + 1)}. Then, the state stamps for B are as follows:

Branch interior and voided diffusion barrier. Any state xi that lies within a branch or is at a voided diffusion barrier does not contribute anything to the ith row of B. Thus, the corresponding row in B is all zeros.

Diffusion barrier. For a diffusion barrier at junction np with state xi at the beginning or the end of branch bk, the non-zero entry in the ith row of B is

\[
b_{i,p} = 2\Delta\xi\,\Upsilon_k. \tag{4.13}
\]

Higher degree junctions. For a junction np with degree d ∈ {2, 3, 4}, which is at the end of branch 1 and at the beginning of branches 2, . . . , d, the state stamp is given as

\[
b_{i,p} = 2\Delta\xi\,\chi_{1d}\Upsilon_1, \tag{4.14}
\]

where $\chi_{1d}$ denotes the junction coefficient given in (4.10).

Overall, the structure of B is such that the pth column corresponding to the un-voided junction np with state xi has a non-zero entry at the ith row. All other entries are 0.

State Stamps for u

The input vector $u = [u_p] \in \mathbb{R}^{|\mathcal{N}|-|N_f|}$ is the vector of inputs at un-voided tree junctions. Here, the value of up corresponds to junction np, and is determined as follows:

Diffusion barrier. For a diffusion barrier np located at the beginning or end of branch bk, we have

\[
u_p = \pm\alpha_k, \tag{4.15}
\]

where the sign is positive for a diffusion barrier at the end of a branch and is negative for a diffusion barrier at the start of a branch.

Higher degree junctions. For a junction np with degree d ∈ {2, 3, 4}, which is at the end of branch 1 and at the beginning of branches 2, . . . , d, we have

\[
u_p = \alpha_1 - \sum_{k=2}^{d} \gamma_{k1}\alpha_k. \tag{4.16}
\]

State Stamps for L

The output matrix $L = [l_{p,i}] \in \mathbb{R}^{|\mathcal{N}|\times(q+1)}$ is just a matrix of 1's and 0's that selects the states at junctions to be the output of the system

\[
l_{p,i} =
\begin{cases}
1 & x_i \text{ is a state at junction } n_p,\\
0 & \text{otherwise.}
\end{cases} \tag{4.17}
\]

4.2.3 LTI system for pre-void phase

From theorem 1, A is singular in the pre-void phase. This happens because the corresponding

boundary conditions model it as a closed system, i.e. there is no exchange of atoms with

other trees. This creates a dependency among the states xi of the whole tree, which leads to a

singular system matrix. In this subsection, we will state that dependency, which is essentially

an alternate form of conservation of mass, and use it to get a corresponding non-singular LTI

system. Since T = T in the pre-void phase, we will consider the whole tree while applying the

conservation of mass.

From Hooke’s law (2.7), we can write for branch bk ∈ B

$$C(\xi_k, \tau) = C_0\, e^{-\eta_k k_b T^\star_m/(B\Omega)}, \tag{4.18}$$

where ηk ≡ ηk(ξk, τ), C is the concentration of atoms and C0 is its equilibrium value in the

absence of stress. Then, the total number of atoms Ntot in the tree at any time τ can be written

as (h, the height of the tree is same for all branches in the tree)

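$$\begin{aligned}
N_{tot} &= C_0 h \sum_{b_k \in \mathcal{B}} w_k L_k \int_0^1 e^{-\eta_k k_b T^\star_m/(B\Omega)}\, d\xi_k \\
&\approx C_0 h \sum_{b_k \in \mathcal{B}} w_k L_k \int_0^1 \left(1 - \frac{\eta_k k_b T^\star_m}{B\Omega}\right) d\xi_k \\
&= C_0 h \left( \sum_{b_k \in \mathcal{B}} w_k L_k - \frac{k_b T^\star_m}{B\Omega} \sum_{b_k \in \mathcal{B}} w_k L_k \int_0^1 \eta_k\, d\xi_k \right),
\end{aligned} \tag{4.19}$$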

where we used the approximation e^x ≈ 1 + x for |x| ≪ 1 because ηk kb T⋆m ≪ BΩ, ∀τ. Since

the number of atoms in the tree is the same for any time τ , the tensile/compressive stresses

generated by the movement of atoms can only vary in a way that conserves the number of

atoms in the tree. Thus, the second summation term on the right hand side of (4.19) should

be constant. Define

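$$\beta(\tau) \triangleq \sum_{b_k \in \mathcal{B}} L_k w_k h_k \int_0^1 \eta_k(\xi_k, \tau)\, d\xi_k = \sum_{i=0}^{q} c_i x_i(\tau), \tag{4.20}$$

where q = N|B| and the integral was evaluated using the trapezoidal rule. The values of the ci coefficients are

$$c_i = \begin{cases} L_k w_k h_k \Delta\xi & \text{if } x_i \text{ is inside branch } b_k, \\ (\Delta\xi/2) \sum_{b_k \in \mathcal{B}_p} L_k w_k h_k & \text{if } x_i \text{ is at junction } n_p. \end{cases} \tag{4.21}$$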

Bp is the set of branches connected to np. Since the residual thermal stress values for all points

at τ = 0 are known from (3.2), $\beta(0) = \beta_0 = \sum_{i=0}^{q} c_i x_i(0)$ is a known quantity. Then, in order to

satisfy the conservation of mass, we must have

$$\beta_0 = \beta(\tau) = \sum_{i=0}^{q} c_i x_i(\tau) \quad \forall \tau. \tag{4.22}$$

This gives us a linear dependence between the states so that one state can be eliminated from

(4.7), which will make the system matrix non-singular as it removes the (only) zero eigenvalue.

Note that we can only eliminate a non-output state from the system. Without loss of generality,

let x0 be the non-output state to be eliminated. If we denote x = [xi] ∈ Rq for 1 ≤ i ≤ q to be

the new state vector for the pre-void phase, we can write from (4.22)

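$$x_0(\tau) = -c^T x(\tau) + \beta_0/c_0, \tag{4.23}$$

where $c = c_0^{-1} [\, c_1 \;\; c_2 \;\; \ldots \;\; c_q \,]^T \in \mathbb{R}^q$. Now, the singular LTI system for the pre-void phase can be

written as

$$\begin{bmatrix} \dot{x}_0(\tau) \\ \dot{x}(\tau) \end{bmatrix} = \underbrace{\begin{bmatrix} a_{0,0} & a_{1q} \\ a_{q1} & A_q \end{bmatrix}}_{A} \begin{bmatrix} x_0(\tau) \\ x(\tau) \end{bmatrix} + \underbrace{\begin{bmatrix} b_{0|N|} \\ B_q \end{bmatrix}}_{B} u, \tag{4.24a}$$

$$y(\tau) = \underbrace{\begin{bmatrix} l_{|N|0} & L_q \end{bmatrix}}_{L} \begin{bmatrix} x_0(\tau) \\ x(\tau) \end{bmatrix}, \tag{4.24b}$$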

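where u ∈ R^{|N|} is as obtained using state stamps for the pre-void phase and

$$\begin{aligned}
a_{1q} &= [a_{i,k}]^T \in \mathbb{R}^{q} && \text{for } i = 0,\; 1 \le k \le q, \\
a_{q1} &= [a_{i,k}] \in \mathbb{R}^{q} && \text{for } 1 \le i \le q,\; k = 0, \\
A_q &= [a_{i,k}] \in \mathbb{R}^{q \times q} && \text{for } 1 \le i, k \le q, \\
b_{0|N|} &= [b_{i,k}]^T \in \mathbb{R}^{|N|} && \text{for } i = 0,\; 0 \le k \le |N| - 1, \\
B_q &= [b_{i,k}] \in \mathbb{R}^{q \times |N|} && \text{for } 1 \le i \le q,\; 0 \le k \le |N| - 1, \\
l_{|N|0} &= [l_{i,k}] \in \mathbb{R}^{|N|} && \text{for } 0 \le i \le |N| - 1,\; k = 0, \\
L_q &= [l_{i,k}] \in \mathbb{R}^{|N| \times q} && \text{for } 0 \le i \le |N| - 1,\; 1 \le k \le q.
\end{aligned}$$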

Since we are eliminating x0, the first row in (4.24a) is removed, and we are left with

˙x(τ) = aq1 x0(τ) + Aqx(τ) + Bqu. (4.25)

Using (4.23) in the LTI system (4.25), we get

˙x(τ) = (Aq − aq1 cT ) x(τ) + Bqu+ (β0/c0)aq1. (4.26)

Define

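$$A \triangleq A_q - a_{q1} c^T, \tag{4.27a}$$

$$B \triangleq B_q + (\beta_0/c_0)\, a_{q1} \tilde{u}^T, \quad \tilde{u} \in \mathbb{R}^{|N|} \text{ and } \tilde{u} \cdot u = 1, \tag{4.27b}$$

$$L \triangleq L_q. \tag{4.27c}$$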

Then, the non-singular LTI system for the pre-void phase can be stated as

˙x(τ) = Ax(τ) + Bu, (4.28a)

y(τ) = Lx(τ), (4.28b)

$$x(0) = [\, \eta_{T,1}(0) \;\; \eta_{T,2}(0) \;\; \ldots \;\; \eta_{T,q}(0) \,]^T, \tag{4.28c}$$

where ηT,i(0) is the initial reduced thermal stress at point i.
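The elimination step in (4.23)-(4.27) can be tried out on a small stand-in problem. The sketch below is only an illustration under stated assumptions: the 5-point chain `A` with blocked ends, the trapezoidal weights `c`, the helper names (`matvec`, `det`, `eliminate`) and the choice of eliminated index `e` are all hypothetical, and the inputs u are omitted. It shows that substituting the conservation constraint leaves the remaining state derivatives unchanged, and that the reduced matrix is no longer singular.

```python
def matvec(M, v):
    """Dense matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def det(M):
    """Determinant by Laplace expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))


def eliminate(A, c, beta0, e):
    """Substitute x_e = (beta0 - sum_{i != e} c_i x_i) / c_e into x' = A x.

    Returns (A_hat, g, keep) with x_keep' = A_hat x_keep + g, mirroring
    A_q - a_q1 c^T and (beta0 / c0) a_q1 in (4.26).
    """
    keep = [i for i in range(len(A)) if i != e]
    A_hat = [[A[i][k] - A[i][e] * c[k] / c[e] for k in keep] for i in keep]
    g = [A[i][e] * beta0 / c[e] for i in keep]
    return A_hat, g, keep


# Hypothetical closed chain: zero-flux ends, so the weighted sum c^T x is conserved.
A = [[-2.0, 2.0, 0.0, 0.0, 0.0],
     [1.0, -2.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, -2.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, -2.0, 1.0],
     [0.0, 0.0, 0.0, 2.0, -2.0]]
c = [0.5, 1.0, 1.0, 1.0, 0.5]          # trapezoidal weights, as in (4.21)
x = [1.0, 0.5, -0.25, 2.0, 0.0]        # arbitrary state
beta0 = sum(ci * xi for ci, xi in zip(c, x))

A_hat, g, keep = eliminate(A, c, beta0, e=2)   # drop an interior (non-output) point
full = matvec(A, x)
reduced = [r + gi for r, gi in zip(matvec(A_hat, [x[i] for i in keep]), g)]
```

Here `det(A)` vanishes because the closed chain has a zero eigenvalue, while `det(A_hat)` does not, which is the property the pre-void construction relies on.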

4.2.4 Final State Space representation

From the previous discussion, it is clear that the state space representation for a tree has to be

updated whenever a void nucleates at any of its junctions. In addition, the size of the model

changes as voids nucleate in the tree. For any given time-span [τp, τp+1), the system matrix,

input matrix and the output matrix are fixed (independent of time), which gives us an LTI

system. Once a void nucleates, EKM assumes that it reaches its steady state volume in a

negligible amount of time. Correspondingly, the branch resistances change fairly quickly and

the current densities also change to their new effective values in a negligible amount of time.

As such, the input vector u is also fixed for a given time-span. Overall, for each time-span,

EKM is an LTI system with step inputs.

To state the succession of LTI systems, we first define the following

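$$x(\tau) \triangleq \begin{cases} x(\tau) & p = 0, \\ [\, x_1(\tau - \tau_p)^T \;\; \ldots \;\; x_{n_s}(\tau - \tau_p)^T \,]^T & p > 0, \end{cases} \tag{4.29a}$$

$$A(\tau) \triangleq \begin{cases} A & p = 0, \\ \operatorname{diag}(A_1, \ldots, A_{n_s}) & p > 0, \end{cases} \tag{4.29b}$$

$$B(\tau) \triangleq \begin{cases} B & p = 0, \\ \operatorname{diag}(B_1, \ldots, B_{n_s}) & p > 0, \end{cases} \tag{4.29c}$$

$$u(\tau) \triangleq \begin{cases} u & p = 0, \\ [\, u_1^T \;\; \ldots \;\; u_{n_s}^T \,]^T & p > 0, \end{cases} \tag{4.29d}$$

$$L(\tau) \triangleq \begin{cases} L & p = 0, \\ \operatorname{diag}(L_1, \ldots, L_{n_s}) & p > 0, \end{cases} \tag{4.29e}$$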

where the subtrees are numbered 1 to ns [ns is obtained from (4.5)] and it is assumed that the indices

of all points within a subtree are contiguous. Then, the complete state space representation of a

tree can simply be stated as

ẋ(τ) = A(τ)x(τ) + B(τ)u(τ), (4.30a)

y(τ) = L(τ)x(τ), (4.30b)

x(τp) = xp,0. (4.30c)

Here, xp,0 is the initial condition of the LTI system for the time-span [τp, τp+1), and is given by

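$$x_{p,0} = \begin{cases} [\, \eta_{T,1}(0) \;\; \ldots \;\; \eta_{T,N|\mathcal{B}|}(0) \,]^T & p = 0, \\ [\, -c^T x(\tau_1^-) + \beta_0/c_0 \;\;\; x(\tau_1^-)^T \,]^T & p = 1, \\ P\, x(\tau_p^-) & p \ge 2, \end{cases} \tag{4.31}$$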

with x(τ−p ) being the solution obtained at τ = τp using the LTI system of the previous time-span

[τp−1, τp) and P is just an incidence matrix of 1s and 0s that maps the stress values from the

old indices to the new ones, taking care of the fact that the newly voided junction can now be

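a part of multiple subtrees. The size (order) of the state space representation of a tree is given by

$$q = \begin{cases} N|\mathcal{B}| & 0 \le \tau < \tau_1, \\ \sum_{i=1}^{n_s} (N|\mathcal{B}_i| + 1) & \tau_p \le \tau < \tau_{p+1} \text{ and } p \ge 1, \end{cases} \tag{4.32}$$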

where Bi is set of branches in the ith subtree.

From theorem 1 and Section 4.2.3, it is clear that all eigenvalues of the system matrix A are negative real numbers for all time-spans. Thus, the corresponding LTI systems are asymptotically stable, such that the forced response for step inputs grows towards some steady state value. This is to be expected because steady-state stress in a confined finite metal line has been studied and reported in the literature [34, 35, 46, 18], and it is natural to expect it to generalize to interconnect trees as well.

4.3 Choosing the value of N

The accuracy of our numerical approach heavily depends on how well the LTI model approx-

imates the PDE system. A finer discretization leads to a larger LTI system that results in a

more accurate approximation but takes longer to solve and vice versa. As such, it becomes

imperative to study what value of N gives a good accuracy-speed trade-off.

For this study, we will again use tree T1, which was used in Chapter 3 as well. Recall that

this tree is a straight metal stripe with 193 branches and 192 junctions. We will denote the

LTI model generated with N discretizations per branch as M_N. In this study, we will generate

the following LTI models for T1: M8, M10, M16, M20, M25, M32, M40, M50 and M64, with

M64 being the reference solution as it is the most accurate. For each LTI model, we simulate

T1 for a time-period of 15 years, and store the stress values at all the outputs in the tree for

100 equidistant time-points. We also store the void nucleation times and the sequence of void

nucleations as estimated by the different LTI models. We use the 2nd order variable coefficient

Backward Differentiation Formula (VCBDF2) solver, to be presented in the next chapter, to

0 100 200 300 400 500 600

Stress (Mpa)

0 100 200 300 400 500 600

Stress (Mpa)

N = 10

0 100 200 300 400 500 600

Stress (Mpa)

N = 16

0 100 200 300 400 500 600

Stress (Mpa)

N = 20

0 100 200 300 400 500 600

Stress (Mpa)

N = 25

0 100 200 300 400 500 600

Stress (Mpa)

N = 32

0 100 200 300 400 500 600

Stress (Mpa)

N = 40

0 100 200 300 400 500 600

Stress (Mpa)

N = 50

Figure 4.2: Error rate plots for LTI models M8-M50 with respect to the reference solutionobtained usingM64.

simulate these LTI models1.

Fig. 4.2 shows the error rate plot for all models (M8-M50) with respect to the reference solution obtained using M64. In each plot, the red and black lines show ε_abs,ub, the upper bound on the absolute error between the two solutions. An LTI model with a smaller ε_abs,ub has a more accurate solution. As expected, ε_abs,ub decreases as we increase N. Fig. 4.3a shows the trade-off between runtime and accuracy (i.e. ε_abs,ub). The trade-off is clear: increasing N decreases ε_abs,ub at the cost of runtime. However, increasing N beyond 16 gives diminishing returns: the decrease in ε_abs,ub slows down while the runtime increases rapidly. A similar trend can be seen for other trees as well.

Another way of reporting the accuracy is to compare the void nucleation times and the sequence of junction failures as obtained using the different LTI models. For all LTI models (M8-M64), we found 3 junction failures and the sequence of junction failures obtained was exactly the same, but the estimated failure times varied slightly. The percentage error in the estimated failure times is shown in Fig. 4.3b. In this case (i.e. for tree T1 using the VCBDF2 solver), there is clearly no benefit in moving beyond N = 16. In fact, the errors in M20 and M25 are larger than in M16 for this case. The general trend, however, is similar to what we observed before: diminishing returns after N = 16. Hence, we will use N = 16 to generate the tree LTI models in all future experiments.

[Figure 4.3: (a) Runtime vs. accuracy trade-off for LTI models with different discretizations and (b) Percentage error in estimated junction void nucleation times for LTI models M8-M50 with respect to M64. Smaller is better. Horizontal axis: N; legend in (b): 1st failure, 2nd failure, 3rd failure.]

¹ We use VCBDF2 because it will be the main numerical solver used for obtaining our final results.

4.4 Justification for the use of effective-EM currents

Because EM is a long term failure mechanism, short-term transients do not play a significant role in EM dynamics. Thus, the standard practice in the field is to use an effective-EM current, essentially a DC current, for doing EM analysis. For power grid lines, which carry mostly unidirectional currents, the effective-EM current is the time-average of the current waveform. However, we need to verify that average currents are indeed sufficient for EM analysis, i.e. that no significant effect of the transients is lost. In this section, we will provide a theoretical basis and an experimental justification for the use of effective-EM currents.

Although the motivation to use effective currents comes from experimental evidence [30, 32,

73], one can also understand it in terms of Korhonen’s model and EKM. A simple integration

of (3.11a) for any branch bk gives

σk(xk, tp) = σk(xk, 0) +BΩ

kbTm,k

(∫ tp

∂σk∂xk

dt− q∗ρ

∫ tp

0jk(t) dt

. (4.33)

From (4.33), it can be seen that the stress evolution with time is determined by the integral

of the current density waveform jk(t), and thus the cumulative behaviour of current density

is more important than short time transients. Given that EKM is a linear system, if jk(t) is

replaced by an effective-EM (DC) current density jeff such that the integration up to time tp is

the same as that obtained using jk(t), the stress values obtained at time tp would be the same.

In other words, if

$$j_{eff} = \frac{1}{t_p} \int_0^{t_p} j_k(t)\, dt, \tag{4.34}$$

the system response at tp using the time-varying (transient) and effective current densities will

be the same. This provides the basis for using average currents for EM lifetime analysis.
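As a rough illustration of this averaging argument, the sketch below compares the response of a scalar first-order system dx/dt = -a x + j(t) to a pulsed input and to its time-average. This scalar model is only an assumed stand-in for one mode of the tree LTI system, not EKM itself, and the names `respond` and `pulsed_vs_effective` as well as the parameter values are hypothetical.

```python
import math


def respond(a, segments, x0=0.0):
    """Exact response of dx/dt = -a*x + j(t) for piecewise-constant j(t).

    segments is a list of (duration, level) pairs applied in order.
    """
    x = x0
    for dt, level in segments:
        decay = math.exp(-a * dt)
        x = x * decay + (level / a) * (1.0 - decay)
    return x


def pulsed_vs_effective(a, period, n_periods, peak=2.0, duty=0.5):
    """Return (response to pulses, response to the DC average) at the end time."""
    on, off = duty * period, (1.0 - duty) * period
    pulses = [(on, peak), (off, 0.0)] * n_periods
    j_eff = peak * duty                      # time-average over whole periods, cf. (4.34)
    total = period * n_periods
    return respond(a, pulses), respond(a, [(total, j_eff)])


slow = pulsed_vs_effective(a=1.0, period=1.0, n_periods=10)
fast = pulsed_vs_effective(a=1.0, period=0.01, n_periods=1000)
gap_slow = abs(slow[0] - slow[1])
gap_fast = abs(fast[0] - fast[1])
```

In this toy setting the gap between the pulsed and the averaged response shrinks as the pulse period becomes short compared to the time constant 1/a, which is the behaviour the experiments below examine for the actual tree model.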

In order to justify the use of effective-EM currents, we conduct a small experiment using

the three junction tree Td with two diffusion barriers n1 and n3 and a dotted-I junction n2

(see Fig. 3.3). The experiment consists of five tests. In each test, we compare the system

response, i.e. stress evolution at junctions for the pre-void phase as obtained using a transient

branch current waveform and its effective (average) value. The results are shown in Fig. 4.4.

The transient current waveforms for j1 and j2 are periodic unidirectional DC pulses with a

duty-ratio of 0.5 for the first four tests and are randomly generated for the fifth test case. The

time-period of the pulse waveforms is chosen to be large enough so that its effect on the stress

evolution is visible. In all cases, it can be clearly observed that the stress evolution computed

using jeff closely tracks the stress evolution obtained using the pulsed or the random waveform. A similar observation can be made for the post-void phase as well.

The transient and effective system responses become nearly identical as the time-period is reduced from 2 months (Fig. 4.4a) to 1 week (Fig. 4.4d). This agreement between the system responses obtained using transient and effective current densities becomes more pronounced as the time-period is reduced further. This observation can be readily explained by the frequency

response of the LTI system. Recall that a frequency response determines the gain of the output

with respect to the input as the input frequency is varied in a given spectrum. For a multi-input

multi-output system (as is the case with our LTI system), the frequency response of each output

with respect to each input has to be considered.

In Fig. 4.5 and Fig. 4.6, we show the frequency response of all outputs with respect to each

input for the pre-void and post-void phase, respectively, using Bode plots. Each plot also shows

the bandwidth, defined as the first frequency where the gain drops below 70.79% (-3 dB) of its

DC value. Clearly, the Bode plots for all outputs show a frequency response similar to a low

pass filter, where only the low frequency input components are allowed to pass through and the

high frequency components are attenuated. The computed bandwidths for all outputs are in

the range of 1-25 Hz, which is very small when compared to the operating frequency of modern

logic circuits. Thus, the use of average currents for EM analysis is justified in modern on-die

power grids.
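The bandwidth definition used above can be sketched numerically. The example assumes a generic first-order low-pass gain with a hypothetical corner frequency `fc`; it is not the transfer function of the tree LTI system, and `gain` and `bandwidth` are illustrative names. It sweeps a logarithmic frequency grid and reports the first frequency where the gain falls below 10^(-3/20) (about 70.79%) of the DC gain.

```python
import math


def gain(f, fc):
    """Magnitude response of an assumed first-order low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (f / fc) ** 2)


def bandwidth(fc, f_lo=1e-2, f_hi=1e8, points_per_decade=200):
    """First grid frequency where the gain drops below -3 dB of its DC value."""
    threshold = 10 ** (-3 / 20) * gain(0.0, fc)
    n = int(round(points_per_decade * math.log10(f_hi / f_lo)))
    for i in range(n + 1):
        f = f_lo * 10 ** (i / points_per_decade)
        if gain(f, fc) < threshold:
            return f
    return None                      # gain never dropped below the threshold


fc = 22.78                           # hypothetical corner frequency in Hz
bw = bandwidth(fc)
```

For a first-order response the value returned lies within the grid resolution of the corner frequency, so any input component far above it (such as logic switching activity) is strongly attenuated.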

[Figure 4.4: The stress evolution at junctions in response to periodic pulsed branch currents and their average (effective) values. The time-periods are (a) 2 months, (b) 1 month, (c) 2 weeks, (d) 1 week and (e) is a random waveform. Legend: trans. resp., eff. resp.; axes: time (yrs), current density.]

[Figure 4.5: Frequency response of the pre-void LTI system for Td using Bode plots. The LTI system of Td has three outputs and three inputs for the pre-void phase. Panels: outputs n1, n2, n3 against inputs u1, u2, u3; horizontal axis: frequency (Hz), 10^0 to 10^8. Bandwidth annotations: 22.78 Hz, 1.49 Hz, 2.05 Hz.]

[Figure 4.6: Frequency response of the post-void LTI system for Td using Bode plots. Here, n2 has a void, and is thus a part of both branches b1 and b2. Also, now there are only two inputs because a voided diffusion barrier has no inputs. Panels: out:n1, in:u1 (bandwidth: 0.25 Hz); out:n2 (in b1), in:u1 (bandwidth: 0.24 Hz); out:n2 (in b2), in:u2 (bandwidth: 1.46 Hz); out:n3, in:u2 (bandwidth: 1.56 Hz); horizontal axis: frequency (Hz), 10^0 to 10^8.]

Chapter 5

Solution Techniques

5.1 Introduction

In the last chapter, we showed that the EKM for a tree is essentially a succession of LTI

systems. These LTI systems can be easily solved using standard numerical techniques presented

in Section 2.7. However, our final goal is to estimate the reliability of the power grid, which

might require solving large LTI systems for thousands of trees in the grid. Thus, we need faster

and scalable numerical approaches. As such, we will focus on developing optimized numerical

methods for solving tree LTI systems in this chapter.

First, we will present an equivalent homogeneous LTI system for EKM because it has the

advantage of requiring less computational work per step in the numerical methods. We then

present three numerical methods for solving the homogeneous LTI system to determine the next

void nucleation: Variable coefficient Backward Differentiation Formulas (VCBDF), Newton’s

method and a Predictor based method. Newton’s method and the Predictor based method

use model order reduction (based on the Arnoldi process) to quickly compute the analytical

solution involving the matrix exponential. Finally, we compare and report the performance

and accuracy of the numerical techniques, by using a standard variable time-step Runge-Kutta

method with Butcher tableau as given by Dormand and Prince [71] and as implemented by [56]

as the reference solution. A preliminary version of this work will appear in [74].

5.2 Equivalent Homogeneous LTI system for EKM

From (4.30), EKM is an LTI system with a fixed input vector for any given time-span [τp, τp+1)

ẋ(τ) = Ax(τ) + Bu, (5.1a)

y(τ) = Lx(τ), (5.1b)

x(τp) = xp,0. (5.1c)


Since the input vector is fixed, we can simplify the LTI system by using the following change

of variables

z(τ) = x(τ)− xss, (5.2)

where x_ss = −A^{−1}Bu is the steady state stress profile of the tree for the given fixed

input u, assuming σth → ∞. Let yss = Lxss. Then, the homogeneous LTI system can be

written as

ż(τ) = Az(τ), (5.3a)

y(τ) = Lz(τ) + yss, (5.3b)

z(τp) = xp,0 − xss. (5.3c)

Any numerical method for solving EKM needs to integrate the above ODE system [ż(τ) = f(z, τ) = Az(τ)] to find z(τ) ∈ R^q, where q is given in (4.32). The main objective of numerically solving (5.3) is to compute 1) the time and location of the next void nucleation in the tree and 2) the stress profile of the tree at the time of void nucleation. We need this information to set up the LTI system for the next time-span. As we will see in the next chapter, this objective fits in the larger framework of determining the power grid MTF.
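The change of variables (5.2) is easy to check on a tiny example. In the sketch below the 2x2 matrices `A` and `B`, the input `u` and the helper names are hypothetical values chosen only for illustration. It computes x_ss from A x_ss = -B u and confirms that the derivative of the shifted state z = x - x_ss needs no input term.

```python
def matvec(M, v):
    """Dense matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve2(M, r):
    """Solve a 2x2 linear system M y = r by Cramer's rule."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / d,
            (M[0][0] * r[1] - r[0] * M[1][0]) / d]


A = [[-3.0, 1.0], [1.0, -2.0]]       # hypothetical stable system matrix
B = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, -0.25]                      # fixed (step) input

Bu = matvec(B, u)
x_ss = solve2(A, [-b for b in Bu])    # x_ss = -A^{-1} B u

x = [0.3, -0.7]                       # any state of the original system
z = [xi - si for xi, si in zip(x, x_ss)]

dx = [ax + b for ax, b in zip(matvec(A, x), Bu)]   # right-hand side of (5.1a)
dz = matvec(A, z)                                  # right-hand side of (5.3a)
```

Both right-hand sides agree, so each step of a numerical method only needs products with A, which is the saving the homogeneous form is meant to provide.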

In the subsequent sections, we will make use of the notation and the theory presented in

Section 2.7. Specifically, we will use zn to denote the solution computed by the numerical method that approximates the true solution z(τn), and zn[i] to denote its ith component, which approximates the ith component zi(τn) of the true solution, i.e. zn ≈ z(τn) and zn[i] ≈ zi(τn).

5.3 Using BDF formulas

We found from practical experience that (5.3) is a stiff system. An LTI system with all negative eigenvalues (which is the case for us) is said to be stiff if the ratio of its largest to smallest magnitude eigenvalue is very large [49]. We observed this ratio to be of the order of 10^9 to 10^10 for many of the system matrices. Solving a stiff system is difficult because the solution consists of a combination of rapidly varying and slowly varying components, which usually forces explicit integration methods (like Runge-Kutta) to take smaller time-steps in order to maintain the solution accuracy. Thus, one needs to use appropriate numerical methods while solving (5.3). In this section, we will describe the use of variable coefficient Backward Differentiation Formulas (VCBDFs) to numerically integrate the LTI systems.
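The step-size restriction that stiffness places on explicit methods can be seen on the scalar test equation dz/dτ = λz. The sketch below is a generic illustration, not the thesis solver: the eigenvalue `lam_fast`, the step sizes and the function names are assumed values. It contrasts forward (explicit) Euler with backward (implicit) Euler, the 1-step member of the BDF family.

```python
def forward_euler(lam, h, n, z0=1.0):
    """n explicit Euler steps for dz/dtau = lam * z."""
    z = z0
    for _ in range(n):
        z = z + h * lam * z
    return z


def backward_euler(lam, h, n, z0=1.0):
    """n implicit Euler steps: z_next = z + h * lam * z_next."""
    z = z0
    for _ in range(n):
        z = z / (1.0 - h * lam)
    return z


lam_fast = -1.0e4                    # hypothetical fast (stiff) eigenvalue
h_large = 1.0e-2                     # step chosen for the slow dynamics
h_small = 1.0e-4                     # step inside the explicit stability limit 2/|lam|

explicit_large = forward_euler(lam_fast, h_large, 20)
implicit_large = backward_euler(lam_fast, h_large, 20)
explicit_small = forward_euler(lam_fast, h_small, 20)
```

With the large step the explicit iterate grows without bound even though the true solution decays, whereas the implicit iterate decays as it should; the explicit method only behaves once the step is cut to the scale of the fastest eigenvalue.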

5.3.1 Review of BDF with fixed time-step

BDFs are a type of linear multi-step (LMS) method that is particularly suited to solving stiff systems. Suppose we wish to solve an ODE system ż = f(z, τ) where z(τ) is a vector function of τ. Then, a k-step BDF method takes the following general form

zn+1 + a0zn + a1zn−1 + · · ·+ ak−1zn−(k−1) = h b−1f(zn+1, τn+1), (5.4)

where a−1 = 1 by convention and h = τn−τn−1 is the fixed time-step taken by the BDF method.

A k-step BDF method can be derived using the linear difference operator if its order is also k.

Recall that for an order k method, the first k + 1 coefficients of the linear difference operator

should be zero (see Section 2.7.2). In other words, we must have C0 = C1 = . . . = Ck = 0,

which gives k + 1 equations in k + 1 unknowns. These unknowns are the scalar coefficients

b−1, a0, . . . , ak−1 of the k-step BDF formula. Hence, we can solve C0 = C1 = . . . = Ck = 0

to get their value. For example, a 2-step BDF method with fixed h, obtained by setting

C0 = C1 = C2 = 0, is given by

$$z_{n+1} - \frac{4}{3} z_n + \frac{1}{3} z_{n-1} = \frac{2}{3} h f(z_{n+1}, \tau_{n+1}), \tag{5.5a}$$

$$\epsilon_{PLTE} = -\frac{2}{9} h^3 z^{(3)}(\tau_n), \tag{5.5b}$$

where z(3)(τn) is the third derivative of z(·) evaluated at τn. Similarly, fixed time-step BDFs

of order 3-6 can be derived and are given in [49]. BDF formulas of orders greater than 6 are

known to be unstable.
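A quick numerical check of the fixed-step 2-step formula (5.5a) is sketched below on the scalar problem dz/dτ = λz, for which the implicit equation can be solved in closed form. The function name, the exact start-up value and the choice λ = -1 are assumptions made for this illustration.

```python
import math


def bdf2_scalar(lam, t_end, n):
    """Integrate dz/dtau = lam * z, z(0) = 1, with n fixed BDF2 steps."""
    h = t_end / n
    z_prev, z_cur = 1.0, math.exp(lam * h)       # exact values at tau_0 and tau_1
    for _ in range(n - 1):
        # (5.5a) with f = lam * z, solved for z_{n+1}
        z_next = ((4.0 / 3.0) * z_cur - (1.0 / 3.0) * z_prev) / (1.0 - (2.0 / 3.0) * h * lam)
        z_prev, z_cur = z_cur, z_next
    return z_cur


exact = math.exp(-1.0)
err_coarse = abs(bdf2_scalar(-1.0, 1.0, 10) - exact)
err_fine = abs(bdf2_scalar(-1.0, 1.0, 20) - exact)
ratio = err_coarse / err_fine
```

Halving the step reduces the error by roughly a factor of four, as expected of a second order method.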

As mentioned in the background, all modern ODE solver implementations use variable

time-stepping to speed-up the computation. One way to extend a fixed time-step order k BDF

formula to incorporate variable time-steps is to use interpolation methods. Here, the solution

obtained at previous time-points τn, τn − hpre, . . . , τn − (k − 1) hpre is first interpolated to

the new time-points τn, τn − hnew, . . ., τn − (k − 1) hnew by using an appropriate polynomial

before applying the BDF formula, where hpre is the present time-step of the solver and hnew

is the new time-step. This technique has a problem: if hnew is significantly larger than hpre,

the interpolation polynomial might not provide accurate solutions at the new time-points. This

might lead to stability problems, especially if the step-size is changing frequently [75, 76]. Hence,

instead of using interpolation polynomials, we can re-derive the BDF methods so that they have

built-in support for non-equidistant data points.

5.3.2 Variable coefficient BDF methods

These methods are called variable coefficient BDF (VCBDF) methods because the coefficients

b−1, a0, . . . , ak−1 are now dependent on the sequence of time-steps taken, and thus are not

fixed as before. The derivation of VCBDF methods requires us to restate the concepts of linear

difference operator and residual for the case of variable time-step methods, as we do next.

Linear Difference Operator and Residual for Variable time-step methods

The linear difference operator, as stated in (2.37), assumed a fixed time-step h. The corre-

sponding time function for the variable step-case of a k-step method can be stated as

$$D[\, s(\tau); \vec{h} \,] \triangleq \sum_{j=-1}^{k-1} a_j\, s(\tau - \Delta_j) - h_{n+1} \sum_{j=-1}^{k-1} b_j\, s^{(1)}(\tau - \Delta_j), \tag{5.6}$$

where hn+1 = τn+1 − τn, s(τ) is some function that can be differentiated as often as desired, s^(1)(·) is the 1st derivative of s(·), $\vec{h}$ is a vector of time-steps hn+1, hn, . . ., hn−(k−2) and ∆j = τn − τn−j. We assume that the ratio hn+1/hn−j is bounded. Similar to the fixed time-step case, when (5.6) is applied to the true solution z(τ) (instead of the computed sequence zn, zn−1, . . .) and evaluated at τn, it gives the residual Rn+1 for the variable time-step case

$$R_{n+1} = D[\, z(\tau); \vec{h} \,]_{\tau_n} = \sum_{j=-1}^{k-1} a_j\, z(\tau_n - \Delta_j) - h_{n+1} \sum_{j=-1}^{k-1} b_j\, z^{(1)}(\tau_n - \Delta_j). \tag{5.7}$$

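Using a Taylor series expansion of s(τ′) around τ, we have

$$s(\tau') = s(\tau) + \sum_{r=1}^{\infty} \frac{1}{r!}\, s^{(r)}(\tau)\, (\tau' - \tau)^r, \tag{5.8}$$

where s^(r)(τ) is the rth derivative of s(·) evaluated at τ. Differentiating (5.8) with respect to τ′, we get

$$s^{(1)}(\tau') = \sum_{r=1}^{\infty} \frac{1}{(r-1)!}\, s^{(r)}(\tau)\, (\tau' - \tau)^{r-1}. \tag{5.9}$$

Then, evaluating (5.8) and (5.9) at τ′ = τ − ∆j, we get for j = −1, 0, 1, . . . , k − 1

$$s(\tau - \Delta_j) = s(\tau) + \sum_{r=1}^{\infty} \frac{(-\Delta_j)^r}{r!}\, s^{(r)}(\tau), \tag{5.10}$$

$$s^{(1)}(\tau - \Delta_j) = \sum_{r=1}^{\infty} \frac{(-\Delta_j)^{r-1}}{(r-1)!}\, s^{(r)}(\tau). \tag{5.11}$$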

Plugging (5.10) and (5.11) into (5.6) and collecting terms with the same order of differentiation

leads to

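$$D[\, s(\tau); \vec{h} \,] = C_0 s(\tau) + C_1 s^{(1)}(\tau) + \ldots + C_r s^{(r)}(\tau) + \ldots \tag{5.12}$$

with

$$C_0 = \sum_{j=-1}^{k-1} a_j, \tag{5.13a}$$

$$C_1 = -\sum_{j=-1}^{k-1} \Delta_j a_j - h_{n+1} \sum_{j=-1}^{k-1} b_j, \tag{5.13b}$$

$$C_r = \frac{(-1)^r}{r!} \sum_{j=-1}^{k-1} (\Delta_j)^r a_j - \frac{(-1)^{r-1} h_{n+1}}{(r-1)!} \sum_{j=-1}^{k-1} (\Delta_j)^{r-1} b_j. \tag{5.13c}$$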

Clearly, this is very similar to the equidistant case as shown in (2.39)-(2.42), with the difference being that the C_r values are now a function of the ∆j values, and hence of the vector $\vec{h}$. Define

$$\hbar \triangleq \max(\vec{h}) = \max(h_{n+1}, h_n, h_{n-1}, \cdots, h_{n-(k-2)}).$$

Then ∆j = τn − τn−j ≤ Kℏ = O(ℏ), which in turn gives (∆j)^r = O(ℏ^r) and h_{n+1}(∆j)^{r−1} = O(ℏ^r). From (5.13c), we then get C_r = O(ℏ^r). A variable time-step method is said to be of order k if C0 = C1 = . . . = Ck = 0 and Ck+1 ≠ 0, so that the residual is given by

$$R_{n+1} = \sum_{r=k+1}^{\infty} C_r z^{(r)}(\tau_n) = C_{k+1} z^{(k+1)}(\tau_n) + O(\hbar^{k+2}), \tag{5.14}$$

which motivates the definition of PLTE for a variable time-step method

$$\epsilon_{PLTE} = C_{k+1} z^{(k+1)}(\tau_n). \tag{5.15}$$

Because Ck+1 = O(ℏ^{k+1}), the PLTE is essentially Kℏ^{k+1} z^{(k+1)}(τn) for some constant K, which is similar to what is observed in the fixed time-step case. In fact, this whole generalization for variable time-step methods gracefully falls back to the respective equations for the fixed time-step methods if we assume hn+1 = hn = · · · = hn−(k−2) = h.

Deriving the VCBDF methods

Generalizing from (5.4), a k-step VCBDF method will use the following difference equation for

an ODE system ż = f(z, τ)

zn+1 + a0zn + a1zn−1 + · · ·+ ak−1zn−(k−1) = hn+1 b−1f(zn+1, τn+1), (5.16)

where we are trying to compute the solution zn+1 at time-point τn+1 based on previously

computed solution points (τn, zn), (τn−1, zn−1), . . ., (τn−(k−1), zn−(k−1)). Similar to the case of

fixed-step BDF methods, a k-step VCBDF method will be of order k, so that its PLTE is given

by (5.15). We are now ready to derive the VCBDF methods.

Deriving the 2-step VCBDF method The 2-step VCBDF (VCBDF2) and its PLTE are

given by

hn+1b−1f(zn+1, τn+1) = zn+1 + a0zn + a1zn−1, (5.17a)

ε_PLTE = C3 z^(3)(τn), (5.17b)

where C3 is the error constant. In order to find the values of the coefficients, we set C0, C1 and

C2 in (5.13) to zero, so that

$$C_0 = 0 \implies a_1 + a_0 = -1, \tag{5.18a}$$

$$C_1 = 0 \implies a_1 h_n + b_{-1} h_{n+1} = h_{n+1}, \tag{5.18b}$$

$$C_2 = 0 \implies a_1 h_n^2 - 2 b_{-1} h_{n+1}^2 = -h_{n+1}^2, \tag{5.18c}$$

which gives three equations in three unknowns that can be solved to find the values of the

coefficients

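$$a_0 = -\frac{(h_n + h_{n+1})^2}{h_n (h_n + 2 h_{n+1})}, \tag{5.19a}$$

$$a_1 = \frac{h_{n+1}^2}{h_n (h_n + 2 h_{n+1})}, \tag{5.19b}$$

$$b_{-1} = \frac{h_n + h_{n+1}}{h_n + 2 h_{n+1}}. \tag{5.19c}$$

Using (5.19) in (5.13c), we can find the value of the error constant as:

$$C_3 = -\frac{h_{n+1}^2 (h_n + h_{n+1})^2}{6 (h_n + 2 h_{n+1})}, \tag{5.20}$$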

which completes the derivation of the VCBDF2 method.
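The VCBDF2 coefficients can be evaluated exactly with rational arithmetic, which makes it easy to confirm that they satisfy (5.18) and fall back to the fixed-step values of (5.5). The function name and the sample step sizes below are hypothetical.

```python
from fractions import Fraction


def vcbdf2_coefficients(h_next, h_cur):
    """Return (a0, a1, b_minus1, C3) for step sizes h_{n+1} = h_next and h_n = h_cur."""
    hn1, hn = Fraction(h_next), Fraction(h_cur)
    d = hn + 2 * hn1
    a0 = -(hn + hn1) ** 2 / (hn * d)                   # (5.19a)
    a1 = hn1 ** 2 / (hn * d)                           # (5.19b)
    b = (hn + hn1) / d                                 # (5.19c)
    c3 = -hn1 ** 2 * (hn + hn1) ** 2 / (6 * d)         # (5.20)
    return a0, a1, b, c3


# Equal steps recover the fixed-step BDF2 formula and its error constant.
fixed = vcbdf2_coefficients(1, 1)

# Unequal steps (h_{n+1} = 3, h_n = 2) still satisfy the order conditions (5.18).
h_next, h_cur = Fraction(3), Fraction(2)
a0, a1, b, c3 = vcbdf2_coefficients(h_next, h_cur)
```

With unit steps the result is (-4/3, 1/3, 2/3, -2/9), matching (5.5a) and (5.5b) for h = 1.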

Deriving higher order VCBDF methods Similarly, we can find the coefficients and the

error constant Ck+1 for a k-step VCBDF (VCBDFk) method of order k, with k = 3, 4, 5, 6, by setting C0 = C1 = . . . = Ck = 0, which gives rise to the following set of equations

$$\sum_{j=0}^{k-1} a_j = -1, \tag{5.21a}$$

$$\sum_{j=0}^{k-1} a_j (\Delta_j)^r + (-1)^{r+1}\, r\, b_{-1} h_{n+1}^r = (-1)^{r+1} h_{n+1}^r, \quad r = 1, 2, \ldots, k. \tag{5.21b}$$

Before presenting the solution of (5.21), we will define the sum operator Ψ(u, v), where u and

v are integers, as follows

$$\Psi(u, v) \triangleq \begin{cases} \sum_{j=u}^{v} h_{n-j} & \text{if } u \le v, \\ 0 & \text{otherwise.} \end{cases} \tag{5.22}$$

Note that Ψ(u, u) = hn−u and ∆j = Ψ(0, j − 1) when j > 0. The solution to (5.21) can now be

stated as

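$$b_{-1} = \left[ \sum_{i=-1}^{k-2} \frac{h_{n+1}}{\Psi(-1, i)} \right]^{-1}, \tag{5.23a}$$

$$a_j = (-1)^{j+1}\, h_{n+1} b_{-1}\, \frac{\prod_{i=-1,\, i \ne j-1}^{k-2} \Psi(-1, i)}{\prod_{i=-1}^{j-1} \Psi(i, j-1)\; \prod_{i=j}^{k-2} \Psi(j, i)}, \quad j = 0, 1, \ldots, k-1, \tag{5.23b}$$

with the error constant being

$$C_{k+1} = -\frac{h_{n+1} b_{-1}}{(k+1)!} \prod_{i=-1}^{k-2} \Psi(-1, i). \tag{5.24}$$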

Using (5.22)-(5.24), we can compute the coefficients for all VCBDF methods.
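A sketch of how the sum operator Ψ can be used to evaluate the coefficients of a k-step method is given below. It is an illustration under assumptions rather than the thesis implementation: the coefficient expressions in the code are the ones obtained by differentiating the interpolating polynomial through the k+1 solution points, written in terms of Ψ, and the function name and the list layout `steps = [h_{n+1}, h_n, ..., h_{n-(k-2)}]` are hypothetical.

```python
from fractions import Fraction


def vcbdf_coefficients(steps):
    """Return ([a_0..a_{k-1}], b_minus1, C_{k+1}) for k = len(steps)."""
    hs = [Fraction(s) for s in steps]
    k = len(hs)

    def psi(u, v):
        # Psi(u, v) = sum_{j=u}^{v} h_{n-j}; h_{n-j} is stored at hs[j + 1]
        return sum((hs[j + 1] for j in range(u, v + 1)), Fraction(0))

    def prod(values):
        out = Fraction(1)
        for val in values:
            out *= val
        return out

    h1 = hs[0]
    b = 1 / sum(h1 / psi(-1, i) for i in range(-1, k - 1))
    a = []
    for j in range(k):
        num = prod(psi(-1, i) for i in range(-1, k - 1) if i != j - 1)
        den = (prod(psi(i, j - 1) for i in range(-1, j))
               * prod(psi(j, i) for i in range(j, k - 1)))
        a.append((-1) ** (j + 1) * h1 * b * num / den)
    c_next = -h1 * b * prod(psi(-1, i) for i in range(-1, k - 1)) / prod(range(1, k + 2))
    return a, b, c_next


a3, b3, c4 = vcbdf_coefficients([1, 1, 1])        # fixed-step BDF3 as a sanity check
a2, b2, c3 = vcbdf_coefficients([3, 2])           # variable-step 2-step method
```

With equal steps and k = 3 this reproduces the classical BDF3 coefficients, and with k = 2 it agrees with the closed forms of the 2-step method derived earlier.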

5.4 Applying VCBDF to solve the Homogeneous LTI system

Eliminating the Newton iteration step A major drawback of an implicit method like

VCBDF is the requirement of a computationally expensive Newton iteration step to solve (5.16)

for a non-linear f(·). Fortunately, in our case, the ODE system we are trying to solve is a

homogeneous LTI system, as shown in (5.3a). Thus, the solution zn+1 at the next time-point

τn+1 is easily obtained by doing a linear system solve of the following equation

$$(h_{n+1} b_{-1} A - I)\, z_{n+1} = \sum_{i=0}^{k-1} a_i z_{n-i}, \tag{5.25}$$

where I is the identity matrix and the coefficients b−1, a0, . . . correspond to the chosen k-step

BDF method. There is no need to re-factor the LHS if hn+1 and b−1 are unchanged.
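The linear solve in (5.25) can be sketched for the 2-step method on a small system. Everything concrete below is an assumption made for illustration: the 2x2 matrix `A`, the alternating step sequence, the exact start-up value and the helper names. A production solver would factor the left-hand side once and reuse it while hn+1 and b−1 stay fixed, which this sketch does not attempt.

```python
import math


def vcbdf2_coeffs(h_next, h_cur):
    """Floating-point a0, a1, b_{-1} of the 2-step variable coefficient BDF."""
    d = h_cur + 2.0 * h_next
    return (-(h_cur + h_next) ** 2 / (h_cur * d),
            h_next ** 2 / (h_cur * d),
            (h_cur + h_next) / d)


def solve2(M, r):
    """Solve a 2x2 linear system by Cramer's rule."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / d,
            (M[0][0] * r[1] - r[0] * M[1][0]) / d]


def vcbdf2_lti(A, z0, z1, steps):
    """Advance z' = A z; steps[0] is the step already taken from z0 to z1."""
    z_prev, z_cur, h_cur = z0, z1, steps[0]
    for h_next in steps[1:]:
        a0, a1, b = vcbdf2_coeffs(h_next, h_cur)
        # (h b A - I) z_next = a0 z_cur + a1 z_prev, cf. (5.25) with k = 2
        M = [[h_next * b * A[0][0] - 1.0, h_next * b * A[0][1]],
             [h_next * b * A[1][0], h_next * b * A[1][1] - 1.0]]
        rhs = [a0 * z_cur[i] + a1 * z_prev[i] for i in range(2)]
        z_prev, z_cur, h_cur = z_cur, solve2(M, rhs), h_next
    return z_cur


def exact(t):
    """Closed-form solution for A = [[-2, 1], [1, -2]] and z(0) = [1, 0]."""
    return [0.5 * (math.exp(-t) + math.exp(-3.0 * t)),
            0.5 * (math.exp(-t) - math.exp(-3.0 * t))]


A = [[-2.0, 1.0], [1.0, -2.0]]
steps = [0.01, 0.012] * 50                      # deliberately non-uniform steps
z_end = vcbdf2_lti(A, exact(0.0), exact(steps[0]), steps)
z_ref = exact(sum(steps))
error = max(abs(p - q) for p, q in zip(z_end, z_ref))
```

No Newton iteration appears anywhere: each step is a single linear solve, which is the point made above.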

Estimating the PLTE Another factor that usually affects the performance of a k-step VCBDF method is the estimation of ε_PLTE, as it requires the calculation of the (k+1)th derivative of z(τ), which is usually not available or is difficult to compute. However, given that we have a homogeneous LTI system, the (k + 1)th derivative is simply

$$z^{(k+1)}(\tau_n) = A^{k+1} z(\tau_n) \approx A^{k+1} z_n. \tag{5.26}$$

Hence, it is straightforward to compute the PLTE

$$\epsilon_{PLTE} \approx C_{k+1} A^{k+1} z_n. \tag{5.27}$$

Note that ε_PLTE = [ε_PLTE,i] is a (q × 1) vector, with the ith value denoting the error in the ith solution component of zn.

Error Control and Variable time stepping We use ε_PLTE for error control, i.e. keeping the solution within ε_abs and ε_rel, the absolute and relative error bounds provided by the user, respectively, and in deciding the value of the next time-step. As with many modern ODE implementations, we use a weighted root-mean-square norm to compute a scalar error metric from (5.27), as shown below

$$\epsilon_s = \sqrt{\frac{\sum_{i=0}^{q-1} (w_i\, \epsilon_{PLTE,i})^2}{q}}, \tag{5.28}$$

where the weight wi is based on the value of the current solution zn[i] and on the tolerances provided by the user

$$w_i = \frac{1}{\epsilon_{abs} + |z_n[i]|\, \epsilon_{rel}}. \tag{5.29}$$

We accept a step when ε_s ≤ 1, otherwise we reject it. For determining the new step-size for a k-step VCBDF method, we empirically found the following to be a good heuristic

$$h_{new} = \begin{cases} \min\left(0.6\, \epsilon_s^{-1/(k+1)} h_{pre},\; 10 h_{pre},\; h_{max}\right) & \epsilon_s \le 0.1 \text{ and } n_{last} \ge k + 4, \\ \max\left(0.6\, \epsilon_s^{-1/k} h_{pre},\; 0.2 h_{pre}\right) & \epsilon_s \ge 1, \\ h_{pre} & \text{otherwise,} \end{cases} \tag{5.30}$$

where nlast is the number of steps taken since the last change in step-size and hmax is the

maximum allowed step-size. This heuristic works due to the following reasons:

• Stability: We have an upper bound of 10 and a lower bound of 0.2 for the ratio hnew/hpre.

We found this necessary in empirical testing to ensure stability of the method. If hnew

becomes less than a pre-defined minimum step-size, we stop the integration.

• Lazy Time-step Changes: Even if a change in step-size is warranted by ǫs, we defer it

until at least k + 4 steps have been taken by the k-step VCBDF method since the last

change in time-step. This condition seeks to balance the trade-off between re-factorizing

the LHS of (5.25) to take a larger time-step versus using the previous factorization to

quickly compute the new solution using the present but smaller time-step. Frequently re-factoring the LHS of (5.25) can considerably slow down the method and hence we avoid it.

• Upper limit on time-step: We have an absolute upper limit on the value of step-size h.

This is particularly useful due to the nature of the problem: stress evolution due to EM

has a gently decreasing slope in time. As such, the time-steps taken by the solver tend to

keep increasing. Also, we only need to integrate the LTI system for a comparatively short

time-span until the next void nucleates. As such, at some point, re-factoring the LHS

of (5.25) in order to take a larger time-step becomes less efficient than using the already

computed factorization to take the smaller step. In order to illustrate this point, let us take

a simple example. Suppose we have to integrate for a time-span of ∆τ , with hpre being

the present time-step and hnew = 10hpre being the larger time-step that could be taken.

Also, let tfac and tbf be the CPU time required for LU-factorization and backward-forward

solve, respectively. Typically, we have tfac/tbf ≥ 50. Then, changing the time-step to the

new value will not be preferable if

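$$t_{fac} + \frac{\Delta\tau}{10 h_{pre}}\, t_{bf} > \frac{\Delta\tau}{h_{pre}}\, t_{bf} \implies h_{pre} > \frac{9\, t_{bf}}{10\, t_{fac}}\, \Delta\tau \implies h_{pre} > 0.018\, \Delta\tau.$$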

Note that for simplicity, we assumed in the above analysis that the time-step is fixed

(with h equal to either hpre or 10hpre) when we integrate in the time-span ∆τ . However, a

similar conclusion can be drawn for variable time-steps as well, mainly because factoring

a matrix is costly as compared to doing a backward-forward solve.
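The error metric and the step-size heuristic described above can be sketched as two small functions. This is one possible reading rather than the thesis code: the reciprocal weight, the use of min and max to apply the bounds of 10, 0.2 and hmax, the guard for a zero error estimate and the names `scaled_error` and `next_step` are all assumptions of this illustration.

```python
import math


def scaled_error(plte, z, eps_abs, eps_rel):
    """Weighted root-mean-square norm of the PLTE vector, cf. (5.28)-(5.29)."""
    q = len(z)
    total = sum((e / (eps_abs + abs(zi) * eps_rel)) ** 2 for e, zi in zip(plte, z))
    return math.sqrt(total / q)


def next_step(eps_s, h_pre, k, n_last, h_max):
    """Step-size update in the spirit of (5.30) for a k-step method."""
    if eps_s <= 0.1 and n_last >= k + 4:
        # Grow, but by at most 10x and never beyond h_max.
        grow = 10.0 if eps_s == 0.0 else 0.6 * eps_s ** (-1.0 / (k + 1))
        return min(grow * h_pre, 10.0 * h_pre, h_max)
    if eps_s >= 1.0:
        # Shrink, but by no more than a factor of 5.
        return max(0.6 * eps_s ** (-1.0 / k) * h_pre, 0.2 * h_pre)
    return h_pre                      # keep the step and reuse the factorization


eps_s = scaled_error([1e-6, 2e-6], [1.0, 0.0], eps_abs=1e-6, eps_rel=1e-3)
grown = next_step(0.05, 1.0, k=2, n_last=6, h_max=100.0)
deferred = next_step(0.05, 1.0, k=2, n_last=3, h_max=100.0)
shrunk = next_step(4.0, 1.0, k=2, n_last=0, h_max=100.0)
```

The `deferred` case shows the lazy update: the error estimate would allow a larger step, but too few steps have been taken since the last change, so the present step (and factorization) is kept.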

Determining void nucleation time A void nucleates at a junction when its stress value

reaches the critical threshold xth = Ωσth/(kbT⋆m) > 0. Let i be the index of the discretized

point located at a junction and let xss,i be its steady state value. Then, while stepping through

time, if zn[i] + xss,i < xth and zn+1[i] + xss,i ≥ xth, this junction will fail at τ = τf such that

τn < τf ≤ τn+1. The value of τf can be determined using linear interpolation, or by using a

Newton divided difference formula [49] of an appropriate order. In practice, linear interpolation

works quite well because we limit the maximum step-size taken by the solver. Once τf is

known, the stress values at all discretized points are computed for τ = τf , so that we have all

the information required to set up the next LTI system.
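The linear interpolation step can be sketched as follows. This is an illustration only: the function name and argument layout are assumptions, with `x_n` and `x_n1` standing for the junction stress zn[i] + xss,i and zn+1[i] + xss,i at the two ends of the step.

```python
def interpolate_nucleation_time(tau_n, tau_n1, x_n, x_n1, x_th):
    """Linearly interpolate the time at which the stress reaches x_th."""
    if not (x_n < x_th <= x_n1):
        raise ValueError("threshold is not crossed within this time-step")
    # Fraction of the step at which the straight line through the two
    # samples reaches the threshold.
    fraction = (x_th - x_n) / (x_n1 - x_n)
    return tau_n + fraction * (tau_n1 - tau_n)


# Stress rises from 0.5 to 1.5 over the step [1.0, 2.0]; threshold is 1.0.
tau_f = interpolate_nucleation_time(1.0, 2.0, 0.5, 1.5, 1.0)
```

Because the result always lies in (τn, τn+1], limiting the maximum step-size directly bounds the interpolation error.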

So far, we have solved the LTI system by stepping through time using numerical methods

that use difference equations [like (5.16)] to approximate the underlying function. The next two

numerical methods take a different approach: instead of using difference equations,

they use the analytical solution of the LTI system itself to determine the next void nucleation.

However, the analytical solution requires computation of the matrix exponential, which can be

a very expensive operation. In the following sections, we will first present a fast and efficient

technique for approximating the matrix exponential using model order reduction, specifically

the Arnoldi process [77]. Then, we will present the two final numerical approaches that use the

matrix exponential computation to find the time and location of the next void nucleation in

the tree.

5.5 Computing Matrix Exponential using the Arnoldi process

5.5.1 Motivation

The homogeneous LTI system for time span [τp, τp+1), as presented in (5.3), has a closed form

analytical solution, given by

\[ z(\tau) = e^{A(\tau-\tau_p)} z(\tau_p), \tag{5.31} \]

which, using the change of variables, gives the solution for the original LTI system (5.1)

\[ x(\tau) = x_{ss} + e^{A(\tau-\tau_p)} (x_{p,0} - x_{ss}), \tag{5.32} \]

where $e^{A(\tau-\tau_p)}$ is the matrix exponential and $x_{ss} = -A^{-1}Bu$. However, the full size of the state

space representation of a tree, as given in (4.32), becomes very large for finer discretizations

(i.e. large N) or for large trees (larger branch count) and computing the matrix exponential for

such a big system is computationally expensive. Hence, we will now present a way of computing

the matrix exponential using a reduced order model obtained by the Arnoldi Process.

5.5.2 The Arnoldi process

The Arnoldi process [77] for some matrix $M = [m_{i,k}] \in \mathbb{R}^{q \times q}$ attempts to compute an upper

Hessenberg matrix $H = [h_{i,k}] \in \mathbb{R}^{q \times q}$ and an orthonormal basis $V = [v_{i,k}] \in \mathbb{R}^{q \times q}$ such that

\[ V^T M V = H \iff M V = V H. \tag{5.33} \]

Note that orthonormality implies $V^T V = I$, where I is the identity matrix. Also, an upper

Hessenberg matrix is a matrix in which all entries below the first sub-diagonal are zero. An

example of a 4 × 4 upper Hessenberg matrix would be

\[ \begin{bmatrix} 1 & 9 & 8 & 4 \\ 2 & 16 & 5 & -1 \\ 0 & 7 & 6 & 22 \\ 0 & 0 & 2 & 4 \end{bmatrix}. \tag{5.34} \]

If $V = [v_1\ v_2\ \ldots\ v_q]$ with $v_i \in \mathbb{R}^q$, then H is the orthogonal projection of M onto

$\mathrm{span}\{v_1, v_2, \ldots, v_q\}$. Equating the kth columns in $MV = VH$ gives

\[ M v_k = \sum_{i=1}^{k+1} h_{i,k} v_i, \qquad k = 1, 2, \ldots, q-1. \tag{5.35} \]

Algorithm 1 Arnoldi Process
Input: L_A, U_A, q, y, s
Output: H_s, V_s, s_invr
 1: s_invr ← q                               ⊲ Initially, assume size of invariant subspace to be the size of A
 2: H_s ← zeros(s + 1, s)
 3: V_s ← zeros(q, s)
 4: v_1 ← y/‖y‖_2
 5: for j = 1 → s do
 6:     x ← BF Substitution(L_A, U_A, v_j)   ⊲ backward-forward substitution
 7:     for i = 1 → j do
 8:         h_{i,j} = v_i^T x
 9:         x ← x − h_{i,j} v_i              ⊲ modified Gram-Schmidt orthogonalization
10:     end for
11:     h_{j+1,j} = ‖x‖_2
12:     if h_{j+1,j} == 0 then
13:         s_invr ← j                       ⊲ Size of invariant sub-space is j
14:         return
15:     end if
16:     v_{j+1} ← x/h_{j+1,j}
17: end for

If we define $V_s \triangleq [v_1\ v_2\ \ldots\ v_s]$ and $H_s \triangleq [h_{i,k}] \in \mathbb{R}^{s \times s}$ with $1 \le i, k \le s < q$, then it can

be shown that (5.35) is equivalent to [78]

\[ M V_s = V_s H_s + h_{s+1,s} v_{s+1} e_s^T, \tag{5.36} \]

where $e_s \in \mathbb{R}^s$ is the sth unit vector (a vector that has 1 in position s and 0 elsewhere). The

eigenvalues of $H_s$ converge to the s extreme (largest magnitude) eigenvalues of M. Hence, if we

stop the Arnoldi process after obtaining s columns, we end up with the projection $V_s^T M V_s \approx H_s$

that approximates the s largest magnitude eigenvalues of M.

5.5.3 Solving the Homogeneous LTI system

Recall that all eigenvalues of A are negative real numbers. Because the large magnitude eigen-

values of A die out quickly, the dynamics of stress evolution is primarily governed by the set

of smallest magnitude eigenvalues of A, which we refer to as the dominant modes. Hence, in

our case, we want to approximate the smallest magnitude eigenvalues of the system matrix A,

which can be done by applying the Arnoldi process to M = A−1, because the smallest magnitude

eigenvalues of A correspond to the largest magnitude eigenvalues of A−1 (if λ is an eigenvalue

of A, then 1/λ is an eigenvalue of A−1). Algorithm 1 gives the procedure we use to compute

Hs and Vs such that

\[ V_s^T A^{-1} V_s \approx H_s, \tag{5.37} \]

which gives

\[ V_s^T A^{-1} V_s H_s^{-1} \approx I \implies A^{-1} V_s H_s^{-1} \approx V_s \implies H_s^{-1} \approx V_s^T A V_s. \tag{5.38} \]

The inputs to the algorithm are $L_A$ and $U_A$, respectively the lower triangular and the upper

triangular matrices obtained from the LU factorization of A, the size of the original system q,

an arbitrary starting vector (seed) $y \in \mathbb{R}^q$ and s, the desired number of extreme eigenvalues to

approximate. The outputs are the Hessenberg matrix $H_s$, the orthonormal matrix $V_s$, and the size

of the invariant sub-space $s_{invr}$ [78] in case it is less than s. Note that for computational efficiency,

we avoid explicitly computing $A^{-1}$ and use backward-forward substitution to compute the

matrix-vector product $A^{-1} v_j$ in line 6. This algorithm costs s backward-forward substitutions,

and $q^2/2 + O(q)$ inner products and scale-add operations.
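As an illustration of Algorithm 1, the sketch below runs the Arnoldi process on M = A−1 in plain Python. It is not the thesis' C++ implementation: the LU-based backward-forward substitution is replaced by a user-supplied `solve` callback that returns A−1v, vectors are plain lists, the exact-zero test of line 12 is replaced by a small tolerance `tol`, and the diagonal test matrix is made up for the example.

```python
import math


def dot(a, b):
    """Inner product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))


def arnoldi_inverse(solve, y, s, tol=1e-12):
    """Arnoldi process on M = A^{-1}; solve(v) must return A^{-1} v.

    Returns (H, V, s_invr). H is stored as (s + 1) rows by s columns, and V
    is a list of orthonormal basis vectors; its first s entries are the
    columns of V_s (one extra vector v_{s+1} is kept when no breakdown occurs).
    """
    q = len(y)
    norm_y = math.sqrt(dot(y, y))
    V = [[c / norm_y for c in y]]          # v_1 = y / ||y||_2
    H = [[0.0] * s for _ in range(s + 1)]
    s_invr = q                             # assume no smaller invariant subspace
    for j in range(s):
        x = solve(V[j])                    # x = A^{-1} v_j without forming A^{-1}
        for i in range(j + 1):             # modified Gram-Schmidt orthogonalization
            H[i][j] = dot(V[i], x)
            x = [xc - H[i][j] * vc for xc, vc in zip(x, V[i])]
        H[j + 1][j] = math.sqrt(dot(x, x))
        if H[j + 1][j] <= tol:             # invariant subspace of size j + 1 found
            s_invr = j + 1
            break
        V.append([c / H[j + 1][j] for c in x])
    return H, V, s_invr


# Hypothetical 3 x 3 system with negative real eigenvalues: A = diag(-1, -2, -4).
diagonal = [-1.0, -2.0, -4.0]


def solve_diag(v):
    return [c / d for c, d in zip(v, diagonal)]


H, V, s_invr = arnoldi_inverse(solve_diag, [1.0, 1.0, 1.0], s=2)
```

Passing the solver as a callback mirrors the point made above: only products of the form A−1vj are needed, so one factorization of A can be reused for every iteration.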

If we define

\[ \tilde{z} = V_s^T z \iff z = V_s \tilde{z}, \tag{5.39} \]

then we can write the homogeneous LTI system (5.3) as

\[ \dot{\tilde{z}}(\tau) = V_s^T A z(\tau) = V_s^T A V_s \tilde{z}(\tau) \approx H_s^{-1} \tilde{z}(\tau), \tag{5.40a} \]

\[ y(\tau) \approx L V_s \tilde{z}(\tau) + y_{ss}, \tag{5.40b} \]

\[ \tilde{z}(\tau_p) = V_s^T (x_{p,0} - x_{ss}). \tag{5.40c} \]

Here, using the orthonormal basis $V_s$, we project the original state vector of size q on to the

reduced state vector of size s to generate a reduced order model that captures the dominant

modes. The solution to (5.40a) is simply

\[ \tilde{z}(\tau) \approx e^{H_s^{-1}(\tau-\tau_p)} \tilde{z}(\tau_p). \tag{5.41} \]

From (5.39) and (5.40c), we can re-write (5.41) as

\[ V_s^T z(\tau) \approx e^{H_s^{-1}(\tau-\tau_p)} V_s^T (x_{p,0} - x_{ss}) \implies z(\tau) \approx V_s e^{H_s^{-1}(\tau-\tau_p)} V_s^T (x_{p,0} - x_{ss}). \]

Finally, using the change of variables (5.2), we obtain x(τ)

\[ x(\tau) \approx x_{ss} + V_s e^{H_s^{-1}(\tau-\tau_p)} V_s^T (x_{p,0} - x_{ss}). \tag{5.42} \]

Equation (5.42) is similar to (5.32), with the exception that here, we need to compute the

matrix exponential of a much smaller matrix (s ≪ q), which can be done efficiently using the

scaling and squaring method given in [78]. Note that $e^{A(\tau-\tau_p)} \approx V_s e^{H_s^{-1}(\tau-\tau_p)} V_s^T$, with equality

holding only when s = q or s = s_invr.

There is another way to estimate (5.32) using the reduced order model, which is slightly

more optimized. Note that we need to compute the matrix-vector product $e^{A(\tau-\tau_p)}(x_{p,0} - x_{ss})$,

for which computing $e^{A(\tau-\tau_p)}$ explicitly is not required. The product can be computed directly

if, instead of input y being an arbitrary vector in Algorithm 1, we use $y = (x_{p,0} - x_{ss})$ [78].

Then, x(τ) can be obtained using

\[ x(\tau) \approx x_{ss} + \|x_{p,0} - x_{ss}\|_2 \, V_s e^{H_s^{-1}(\tau-\tau_p)} e_1, \tag{5.43} \]

where $e_1 = [1\ 0\ \ldots\ 0]^T \in \mathbb{R}^s$. We will refer to (5.43) as the expm approximation.

5.6 Solvers that use the matrix exponential

The unique characteristic of the expm approximation is that it can directly compute the stress

profile of a tree/subtree at any given time-point τ ∈ [τp, τp+1), without stepping through time.

We will now present two numerical methods that utilize this unique characteristic to compute

the location and time of the next void nucleation in the tree.

5.6.1 Newton Solver

Let m be the set of indices assigned to all unfailed junctions (junctions with no void) in the

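tree. Define $g_i(\tau) : \mathbb{R} \to \mathbb{R}$, ∀i ∈ m

\[ g_i(\tau) \triangleq x_i(\tau) - x_{th}, \tag{5.44} \]

where $x_{th} = \Omega\sigma_{th}/(k_b T_m^\star)$. Another equivalent definition would be $g_i(\tau) \triangleq z_i(\tau) - z_{th,i}$, where

$z_{th,i} = x_{th} - x_{ss,i}$. Clearly $g_i(\tau) \le 0$, with it being 0 only when a junction fails at τ = τf so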

that xi(τf ) = xth. Then, for the time span [τp, τp+1), we can state the objective of finding the

next void nucleation time in the tree as the following problem

Find the minimum τf > τp s.t. gi(τf ) = 0 for some i ∈ m. (5.45)

The index i associated with the minimum τf for which gi(τf ) = 0 gives the location of the

newly failed junction.

One way of solving (5.45) is to apply Newton’s method to solve $g_i(\tau_f) = 0$ for

every unfailed junction in the tree. Newton’s method is an iterative method in which the

function to be solved is linearized in the neighborhood of the present candidate solution using

the gradient (slope) to find the next candidate solution. If we use $\tau_f^k$ to denote the present

candidate solution, then the next candidate solution $\tau_f^{k+1}$ is obtained using

\[ \tau_f^{k+1} = \tau_f^k - \frac{g_i(\tau_f^k)}{\dot{g}_i(\tau_f^k)}. \tag{5.46} \]

To apply Newton’s method, we have to evaluate $g_i(\tau)$ ∀i ∈ m, which can be done using the expm

approximation, and $\dot{g}_i(\tau) = \dot{x}_i(\tau)$, which is already known from the LTI system formulation.

Figure 5.1: Obtaining the next void nucleation time using the Newton solver. [Plot of stress versus time (yrs); marked points: [0.0 yrs, 434.7 MPa], [0.7 yrs, 498.9 MPa], [2.7 yrs, 559.8 MPa], [4.8 yrs, 587.7 MPa], [6.1 yrs, 598.3 MPa], [6.4 yrs, 600.0 MPa].]

The Newton iterations are terminated when the following two conditions are satisfied

\[ |g_i(\tau_f^{k+1})| \le \epsilon_{nt,abs}, \]

\[ |\tau_f^{k+1} - \tau_f^k| \le \epsilon_{nt,rel}\, \tau_f^k + \epsilon_{nt,abs}, \]

where $\epsilon_{nt,abs}$ and $\epsilon_{nt,rel}$ are the absolute and relative error tolerances provided by the user. A

typical sequence of Newton iterations to find the next void nucleation time is shown in Fig. 5.1. The blue

curve shows the actual stress evolution and linearized models used by Newton’s method are

shown by dashed orange lines. For this case, the solution was obtained in 5 iterations.
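The Newton update and the two termination conditions can be sketched as below. This is only an illustration: in the thesis the stress and its slope come from the expm approximation and the LTI system, whereas here each junction is given a made-up single-exponential stress model (`make_model`), and all function names and default tolerances are assumptions.

```python
import math


def newton_nucleation_time(x, dx, x_th, tau0, eps_abs=1e-9, eps_rel=1e-9,
                           max_iter=50):
    """Solve g(tau) = x(tau) - x_th = 0 with Newton's method, starting at tau0."""
    tau = tau0
    for _ in range(max_iter):
        tau_next = tau - (x(tau) - x_th) / dx(tau)
        # Both stopping conditions must hold: small residual and small step.
        converged = (abs(x(tau_next) - x_th) <= eps_abs and
                     abs(tau_next - tau) <= eps_rel * abs(tau) + eps_abs)
        tau = tau_next
        if converged:
            return tau
    raise RuntimeError("Newton iteration did not converge")


def make_model(x_ss, rate):
    """Hypothetical junction stress x(tau) = x_ss * (1 - exp(-rate * tau))."""
    return (lambda tau: x_ss * (1.0 - math.exp(-rate * tau)),
            lambda tau: x_ss * rate * math.exp(-rate * tau))


def next_void(junction_models, x_th, tau_p):
    """Run Newton per unfailed junction and return the earliest (index, time)."""
    times = {i: newton_nucleation_time(x, dx, x_th, tau_p)
             for i, (x, dx) in junction_models.items()}
    first = min(times, key=times.get)
    return first, times[first]


models = {"a": make_model(2.0, 1.0), "b": make_model(3.0, 0.5)}
junction, tau_f = next_void(models, x_th=1.0, tau_p=0.0)
```

In this toy setup junction "a" reaches the threshold at τ = ln 2 and junction "b" at τ = 2 ln 1.5, so "a" is reported as the next failure. A junction whose steady-state stress stays below the threshold would never converge and has to be excluded beforehand.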

5.6.2 Predictor

Newton’s method uses a linear model to approximate the function gi(τ), or xi(τ), around the

candidate solution. However, once the stress values of a junction are determined for a few

time-points using the expm approximation, we can also use other higher order (possibly non-

linear) models for extrapolating the rest of the trend for the nearby time-points. This works in

practice because from experience, we know that (except for a small time-interval after the void

nucleation) stress is a slowly varying function of time, so that the dynamics of stress near the

known solutions can be approximated well enough. While various exponential or log functions

may be suitable, we have found empirically that the following power function template provides

a very good local temporal approximation

\[ x_i(\tau) = c\,\tau^{\,b + a \ln \tau}, \tag{5.47} \]

where a, b and c are parameters to be determined. Taking ln on both sides of (5.47), we get

\[ \ln(x_i(\tau)) = \ln c + (b + a \ln \tau) \ln \tau = \ln c + b \ln \tau + a (\ln \tau)^2. \tag{5.48} \]

Figure 5.2: Obtaining the next void nucleation time using Predictor. [Plot of stress versus time (yrs); legend: actual stress evolution, TTF predictor fit, points used, estimated TTF; marked points: [2.5 yrs, 556.6 MPa], [4.0 yrs, 579.6 MPa], [5.5 yrs, 593.7 MPa], [6.4 yrs, 600.0 MPa].]

Thus, $\ln(x_i(\tau))$ is a simple quadratic in ln τ, with a, b and ln c as the three coefficients. The

coefficients can be easily determined using regression analysis and least-squares fitting if

the value of $x_i(\tau)$ is computed for at least three time-points. Once the coefficients are known,

τf can be computed using the roots of the quadratic polynomial

\[ \tau_f = \exp\left\{ \frac{-b + \sqrt{b^2 - 4a\ln(c/x_{th})}}{2a},\ \frac{-b - \sqrt{b^2 - 4a\ln(c/x_{th})}}{2a} \right\}. \tag{5.49} \]

We will refer to this technique as the Predictor because we are essentially using curve-fitting to

predict the junction failure time. The accuracy of the Predictor approach heavily depends on

the time-points chosen: if the actual junction TTF is close to the chosen time-points, then the

Predictor gives accurate results. Given that we do not know the failure times beforehand, we

use heuristics to choose the time-points at which to evaluate the stress profile using the expm

technique. Fig. 5.2 shows how (5.47) provides a local approximation to the stress evolution at

the given junction, which can then be solved to find the failure time.
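The fit-and-solve idea behind the Predictor can be sketched as follows. This is a simplified illustration, not the thesis implementation: it uses exactly three samples (so the quadratic in ln τ is interpolated rather than least-squares fitted), it assumes positive times and positive normalized stress values so that the logarithms exist, and the rule of picking the smallest root beyond the last sample is an assumption standing in for the unspecified heuristics. The sample data are synthetic.

```python
import math


def fit_power_template(taus, xs):
    """Fit ln x = ln c + b ln tau + a (ln tau)^2 through three samples."""
    u0, u1, u2 = (math.log(t) for t in taus)
    v0, v1, v2 = (math.log(x) for x in xs)
    d01 = (v1 - v0) / (u1 - u0)            # first divided differences
    d12 = (v2 - v1) / (u2 - u1)
    a = (d12 - d01) / (u2 - u0)            # second divided difference
    b = d01 - a * (u0 + u1)
    ln_c = v0 - b * u0 - a * u0 * u0
    return a, b, ln_c


def predict_failure_time(taus, xs, x_th):
    """Return the predicted tau_f, or None if the fitted curve never reaches x_th."""
    a, b, ln_c = fit_power_template(taus, xs)
    k = ln_c - math.log(x_th)              # ln(c / x_th)
    if abs(a) < 1e-14:                     # template degenerates to a pure power law
        roots = [-k / b]
    else:
        disc = b * b - 4.0 * a * k
        if disc < 0.0:
            return None
        r = math.sqrt(disc)
        roots = [(-b + r) / (2.0 * a), (-b - r) / (2.0 * a)]
    # Assumed selection rule: earliest candidate after the last sampled time.
    candidates = [math.exp(u) for u in roots if math.exp(u) > max(taus)]
    return min(candidates) if candidates else None


def template(tau, a=0.1, b=0.5, c=2.0):
    """Synthetic stress curve that follows the power template exactly."""
    return c * tau ** (b + a * math.log(tau))


sample_times = [1.0, 2.0, 3.0]
tau_f = predict_failure_time(sample_times, [template(t) for t in sample_times],
                             x_th=template(5.0))
```

Since the synthetic data follow the template exactly, the predicted failure time recovers τ = 5; for real stress curves the quality of the prediction depends on how close the samples are to the actual failure time, as noted above.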

5.7 Experimental Results

In this section, we will report the performance and accuracy of the proposed numerical methods,

by comparing them to a standard variable time-step Runge-Kutta method with the Butcher

tableau as given by Dormand and Prince [71] and as implemented in [56]. We will refer to this

solver as RK45, as it computes fourth- and fifth-order accurate solutions. The performance

comparison will be done in terms of run-time, and accuracy will be compared using error rate

plots and the estimated time and sequence of void nucleations.

Figure 5.3: Part of trees (a) T1 and (b) T2 used for comparing solvers. The orange dots show the junctions.

C++ implementations were written for all the proposed methods: VCBDF2-VCBDF6, the

Newton solver and the Predictor based solver. The size of all reduced order models computed

using the Arnoldi Process was chosen to be s = min(0.05q, 100), where q is the original size of

the LTI system, as we empirically found that this gave the best accuracy-speed trade-off. For

the comparison, we choose two trees from IBM power grid benchmarks. The first tree T1, shown in Fig. 5.3a, is a structurally simple straight metal stripe, with 192 branches and 193 junctions

(2 diffusion barriers and 191 dotted-I junctions). The second tree T2, shown in Fig. 5.3b, has a

more complex structure and consists of 540 branches and 541 junctions (26 diffusion barriers,

494 dotted-I, 18 T and 3 plus junctions). The LTI models for both trees were generated with

N = 16 discretizations per branch. We used two machines with two different CPU architectures

for carrying out the simulations, as the relative performance of the proposed solvers seems to

vary depending on the CPU architecture.

We will first report the accuracy of solvers VCBDF2-VCBDF6 and the expm approximation

by computing the stress values at all junctions in T1 for specific time-points and comparing

the results with the reference solution obtained from the RK45 solver. The comparison is

done using error rate plots and is shown in Fig. 5.4a. The maximum percentage error is less

than 0.07% and 0.2% for all VCBDF solvers and the expm approximation, respectively, which

clearly demonstrates their accuracy. Fig. 5.4b shows the average absolute error between the

solutions. All the errors are of the order of $10^{-4}$ to $10^{-2}$ MPa, which is relatively small. For

VCBDF methods, the average error decreases as the order increases, which shows that higher

order VCBDF methods are more accurate. In terms of the average error, accuracy of expm

approximation is similar to VCBDF4 method. A similar trend is observed for tree T2, and some

other trees we tested. The accuracy of a given solver is independent of the CPU architecture.

For the next comparison, we simulate both trees using all solvers for a period of 20 years,

and collect various performance metrics as well as the time and sequence of junction failures.

The sequence of junction failures obtained using all solvers is identical. Fig. 5.5 shows the

percentage error between the junction TTFs estimated using the proposed solvers and the RK45

solver. Clearly, the errors are very small for all solvers except the Predictor, which has the

highest percentage error for both trees. This shows that the VCBDF solvers and the Newton

solver have high accuracy, with the Predictor having medium to low accuracy.

Figure 5.4: (a) Error rate plot for stress evolution at junctions as obtained using VCBDF2-VCBDF6 solvers and expm approximation and (b) the average absolute error with respect to RK45 solver. [Panel (a) plots percent error against stress (MPa) for each solver; the values shown in panel (b), in the order of its axis labels VCBDF2, VCBDF3, VCBDF4, VCBDF5, VCBDF6, expm, are 1.3435e-02, 2.5455e-03, 1.3663e-03, 1.2880e-04, 1.1936e-04 and 1.2950e-03.]

Figure 5.5: Percentage error in the estimated TTFs of (a) T1 and (b) T2 using the proposed solvers and RK45 solver. [Bars are grouped by solver (VCBDF2-VCBDF6, Newton, Predictor) and by failure index (1st to 3rd failure in one panel, 1st to 4th in the other), with a close-up view of the error in the VCBDF methods.]

Table 5.1 compares the performance of the proposed solvers and RK45 using the following

metrics:

- Total Steps: the number of successful time-steps taken by the solver to simulate the tree

up to 20 years.

- Failed Steps: the number of time-steps rejected by the solver due to high error.

- f(z, τ) evals: the number of derivative evaluations $f(z_n, \tau_n) = A z_n$ at the present time

point τn.

- LU’s: The number of LU factorizations computed for A and $(h_{n+1} b_{-1} A - I)$.

- BF subs.: The number of backward-forward substitutions done using already computed

factorization of either A or $(h_{n+1} b_{-1} A - I)$.

- expm solves: The number of times expm approximation was used for computing x(τ).

- time taken: The time taken by the solver, in seconds, to simulate the corresponding tree

up to 20 years.

- speed-up: The speed-up obtained by the solver as compared to RK45 solver.

From the data, it is clear that except for one scenario (VCBDF6 for T1 on Core i7), the proposed

solvers are faster than RK45, sometimes by orders of magnitude. This is to be expected, as

RK45 is not optimized for the problem at hand while our VCBDF solvers benefit from the

time-stepping and other optimizations in Section 5.4. Among all VCBDF solvers, VCBDF2 is

the fastest solver, with VCBDF3 and VCBDF4 being a close second and third, respectively.

The solvers VCBDF5 and VCBDF6 are comparatively slow. The slowdown of

the higher order solvers can be attributed to the calculation of PLTE using (5.27): A VCBDFk

solver requires $A^{k+1}$ to compute the PLTE, which results in higher error norms for larger k

values (in our case, $\|A^{k_1} z\| \ge \|A^{k_2} z\|$ iff $k_1 \ge k_2$). This forces the higher order VCBDF solvers

to take smaller time-steps in order to maintain the solution accuracy. As a by-product, we get

more accurate solutions, as evidenced by the preceding accuracy comparison. Also, $A^{k+1}$ becomes

denser as k increases, so that calculation of the PLTE itself takes more time.

For a given problem, the performance of VCBDF solvers is better on the Xeon CPU

than on the Core i7. Overall, the VCBDF solvers are ∼2.5x faster on the Xeon CPU

than on the Core i7 CPU. Given that all the performance metrics (number of steps, LU

factorizations etc.) are almost identical on both CPUs, the difference in performance stems

from faster LU factorization of $(h_{n+1} b_{-1} A - I)$ and correspondingly faster backward-forward

substitution on Xeon CPUs. For the given simulations, LU factorization of $(h_{n+1} b_{-1} A - I)$ and

the corresponding backward-forward substitution are respectively 5.5x and 8x faster on Xeon

than on Core i7, even though we use SuiteSparse [79, 80, 81, 82] to perform all sparse

matrix operations on both machines.

The performance of the Newton solver and the Predictor varies depending on the structure

of the tree and the machine architecture. On the Xeon CPU, the Newton solver outperforms all the

VCBDF solvers for T1 but is slower than VCBDF2-VCBDF5 for T2. On the other hand, on

the Core i7 CPU, the Newton solver is at least 1.5x faster than all VCBDF solvers for both

trees T1 and T2. The performance of the Predictor follows a similar trend. Curiously

enough, this difference in performance comes from the difference in speed of backward-forward

substitution in the Arnoldi process on the two machines: backward-forward substitution using

the LU factorization of A is 10x faster on the Core i7 than on the Xeon CPU. This makes the

Newton solver and the Predictor a viable option for machines with a Core i7 CPU.

We also compute the empirical complexity of the fastest numerical methods. This is done

by increasing the value of N for a given tree (we do not use the runtimes from different trees to

compute the complexity because there are other factors that might affect the runtime, such as

the structure of the tree). The results are shown in Fig. 5.6. With almost linear complexities,

the VCBDF2 and VCBDF3 solvers appear to be scalable for large problem sizes.

Figure 5.6: Empirical complexity of VCBDF2 solver for trees (a) T1 and (b) T2, and VCBDF3 solver for trees (c) T1 and (d) T2, computed by using the fitting function time = $aN^b$, where b is the complexity.
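The exponent b in a fitting function of the form time = aN^b can be estimated by ordinary least squares on the logarithms of both quantities. The sketch below shows one such fit; the function name is illustrative, and the run-times are synthetic values generated from a known power law, not measurements from the thesis.

```python
import math


def fit_power_law(ns, times):
    """Least-squares fit of time = a * N**b on log-log axes; returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)      # intercept mapped back from ln a
    return a, b


# Synthetic run-times following time = 0.01 * N**1.1 (made-up numbers).
discretizations = [8, 16, 32, 64]
runtimes = [0.01 * n ** 1.1 for n in discretizations]
a_fit, b_fit = fit_power_law(discretizations, runtimes)
```

A fitted exponent close to 1 corresponds to the almost linear behaviour described above.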

Table 5.1: Comparison of solver metrics and runtime

Host CPU             Tree  Solver     Total    Failed  f(z, τ)   LU’s   BF      expm    Time       Speed-up
                                      steps    steps   evals            subs.   solves  taken (s)
Xeon E5-2687W 3GHz   T1    RK45       248246   15      1490252   –      –       –       43.38      –
                           VCBDF2     388      7       –         104    395     –       0.46       93.64x
                           VCBDF3     431      8       –         148    439     –       0.60       72.32x
                           VCBDF4     1048     15      –         212    1063    –       0.92       46.94x
                           VCBDF5     3261     28      –         1078   3289    –       5.47       7.93x
                           VCBDF6     7167     49      –         3874   7216    –       14.51      2.99x
                           Newton     –        –       24        1      489     28      0.36       120.51x
                           Predictor  –        –       –         1      1241    40      0.49       87.84x
                     T2    RK45       233907   8       1404550   –      –       –       151.31     –
                           VCBDF2     548      16      –         153    564     –       3.51       43.09x
                           VCBDF3     656      15      –         220    671     –       3.97       38.12x
                           VCBDF4     1034     22      –         365    1056    –       5.23       28.92x
                           VCBDF5     2301     42      –         683    2343    –       8.54       17.71x
                           VCBDF6     5815     66      –         2488   5881    –       24.88      6.08x
                           Newton     –        –       29        1      758     35      8.68       17.43x
                           Predictor  –        –       –         1      896     31      8.97       16.86x
Core i7 4770 3.4GHz  T1    RK45       248246   15      1490252   –      –       –       42.26      –
                           VCBDF2     388      7       –         104    395     –       1.34       31.63x
                           VCBDF3     429      8       –         148    437     –       1.49       28.35x
                           VCBDF4     1027     15      –         215    1042    –       3.09       13.68x
                           VCBDF5     3101     27      –         1084   3128    –       27.33      1.55x
                           VCBDF6     7044     48      –         3777   7092    –       59.73      0.71x
                           Newton     –        –       24        1      489     28      0.31       138.28x
                           Predictor  –        –       –         1      1241    40      0.47       89.85x
                     T2    RK45       233907   8       1404550   –      –       –       161.03     –
                           VCBDF2     548      16      –         153    564     –       4.96       32.45x
                           VCBDF3     662      15      –         220    677     –       6.10       26.41x
                           VCBDF4     1021     23      –         394    1044    –       8.09       19.90x
                           VCBDF5     2319     46      –         798    2365    –       14.45      11.14x
                           VCBDF6     5916     77      –         2764   5993    –       40.82      3.94x
                           Newton     –        –       29        1      758     35      3.07       52.48x
                           Predictor  –        –       –         1      896     31      3.06       52.69x

Chapter 6

Power Grid EM Checking

6.1 Introduction

In this chapter, we will present two approaches for estimating the mean time to failure (MTF) of

a power grid under the influence of electromigration. We will start by explaining what an early

failure is and why it impacts the power grid reliability, which will be followed by our approach

for determining branch temperatures using compact thermal models. Then, we will present

the two power grid EM checking approaches: the main approach and the filtering approach,

where the second approach improves over the first one by focusing the computation only on

the EM-susceptible trees in the grid. Finally, we will present the experimental results where,

among other things, we will compare 1) the power grid MTFs obtained using a calibrated

Black’s model and the Extended Korhonen Model and 2) the performance of our solvers in the

context of power grid EM checking.

6.2 Early Failures

A void nucleation at a junction typically increases the resistance of all connected branches.

However, in a power grid, which consists of multiple trees electrically connected to each other

by vias, a void nucleation may have another effect. Consider two trees in two consecutive

metal layers connected by a via as shown in Fig. 6.1a, the schematic representation of which

is shown in Fig. 6.1b. In this case, we have two junctions, one above and one below the via.

Depending on the direction of the current densities, a void might form above or below the via.

If a large enough void forms below a via, it might in some cases cause an open circuit failure

by disconnecting the via. This phenomenon is known as early failure and has been reported in

the literature [29]. It happens because the capping layer is not conductive; hence if the void

covers the entire cross-section of a via (as shown in Fig. 6.1b), there is no conductive path left

between the via and the tree below and the current in the via completely falls to 0. On the

other hand, voids that form above the via generally happen at the top of the line away from the

via, and so take a long time to completely fill the cross-section, and even then do not translate

to an open circuit because the current can continue to flow through the (high resistance) metal

liner. We will refer to these kinds of failure as conventional failures. Removal of a via, as it

happens during early failures, can have a severe impact on grid reliability and thus should be

accounted for in the EM analysis.

Figure 6.1: (a) An arrangement of two trees connected by a via taken from the power grid and (b) the corresponding schematic showing early and conventional failures.

6.3 Determining Branch Temperatures

Temperature affects EKM on the following three fronts:

1. The initial stress at t = 0 for any given tree is mainly due to the thermal stress, which is

strongly dependent on the initial temperature [see (3.2)]. A higher thermal stress often

leads to a smaller void nucleation time and vice-versa.

2. The diffusivity of branch bk, which primarily determines the time rate of change of stress,

depends on its temperature Tm,k [see (2.10)]. Diffusivity increases with increase in temper-

ature, so that the time rate of change of stress also increases with increase in temperature

and results in smaller void nucleation times.

3. The steady state void length depends on the thermal stress: higher thermal stress leads

to larger voids.

We have already seen in Section 3.7 that it is important to account for temperature variation

across a tree while estimating its EM degradation using EKM. We also saw that there is no

‘nominal temperature’ that can capture the effect of the actual temperature variation. As

such, it becomes important to determine the temperature profile of all trees across all layers

in the power grid for realistic EM assessment. We do this using the compact thermal models

(CTM) obtained using electro-thermal equivalence, as detailed in Section 2.8. We will now

briefly summarize the procedure for applying the CTM approach to determine the temperature

distribution of the whole power grid.

Figure 6.2: Thermal modelling of power grid using CTMs.

Each layer in the power grid is discretized into uniform volume elements called thermal

blocks [24]. Each thermal block represents an isothermal volume within a layer, and as such all

branches and junctions that reside within a thermal block have the same temperature. Since we

assume the atomic diffusivity to be the same throughout a branch, there can be no temperature

gradient within a branch. Hence, each branch is associated with only one thermal block. For

each block, we perform thermal analysis using CTMs [62] based on electro-thermal equivalence.

Recall that a CTM is a lumped thermal RC network, with heat dissipation modelled as a current

source, as shown in Fig. 6.2. Specifically, each thermal block is represented as a thermal node

connected to 6 resistors, a current source and a capacitor, and their values can be calculated

using (2.52).

The number of thermal blocks per layer is the same and is decided based on the required

resolution of temperature distribution. In addition, we assume convective boundary condition

[24] at the top and insulated boundary conditions at the four sides to model the heat transfer

between the power grid and the surroundings. The CTMs for thermal blocks, combined with

the boundary conditions, give us a thermal grid that can be solved to find the temperature

distribution of the power grid [see (2.54)]. In our case, we are only interested in the steady

state temperature distribution because transients in temperature occur on a time scale that is

small when compared to that of EM. Thus, we ignore the thermal capacitance and use the steady

state temperature distribution in our analysis, which can be obtained by solving

\[ G_T T_m(t) = i_{Ts}(t) + G_{T,0} T_{amb} \tag{6.1} \]

for Tm(t). This gives the temperature at every thermal node, and correspondingly for all

branches. All symbols in (6.1) are explained in Section 2.8. The total power dissipated in the

kth thermal block ($i_{Ts,k}$) is calculated using $i_{Ts,k} = P_{\text{self heating}} + P_{\text{logic}}$, where $P_{\text{self heating}}$ is due

to the average power dissipated by Joule heating of the metal branches within the thermal block

and $P_{\text{logic}}$ is the average heat dissipated by the underlying logic, due to active switching activity

and leakage currents. Note that $P_{\text{logic}}$ contributes to power dissipation of thermal blocks in the

lowest layer only.
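To make the steady-state solve concrete, the sketch below sets up and solves a system of the form (6.1) for a made-up stack of two thermal blocks. Everything numeric here is hypothetical, and the interpretation of the right-hand side is an assumption: the ambient term is taken to be the conductance to ambient multiplied by the ambient temperature, added only at the block that touches the convective boundary. A small dense Gaussian elimination stands in for the sparse solver a real grid would need.

```python
def solve_dense(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Hypothetical two-block stack: block 0 (bottom) dissipates 1 W and is tied to
# block 1 (top) through 2 W/K; block 1 is tied to ambient through 0.5 W/K.
g_01, g_amb, t_amb = 2.0, 0.5, 300.0
power = [1.0, 0.0]
g_t = [[g_01, -g_01],
       [-g_01, g_01 + g_amb]]
rhs = [power[0], power[1] + g_amb * t_amb]
temperatures = solve_dense(g_t, rhs)
```

All of the 1 W has to leave through the 0.5 W/K ambient path, so the top block settles 2 K above ambient and the bottom block a further 0.5 K above that.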

Figure 6.3: (a) Heat map for $P_{\text{self heating}} + P_{\text{logic}}$ and (b) temperature profile (in Kelvin) for the M1 layer in ibmpgnew2. [Axes: xcoord (mm), ycoord (mm).]

Figure 6.4: (a) Heat map for $P_{\text{self heating}} + P_{\text{logic}}$ and (b) temperature profile (in Kelvin) for the M1 layer in PG7. [Axes: xcoord (mm), ycoord (mm).]

Fig. 6.3 and 6.4 show the heat map for power consumption and the computed temperature

profile for the lowest metal layer M1 using CTMs for power grids ibmpgnew2 and PG7, respectively. The

specifications for these grids are provided in Table 6.1. For PG7, the four bottom

layers are divided into 3×3 sub-grid islands, of which one island is switched off. This gives

rise to the power heat map as shown in Fig. 6.4a. The corresponding temperature profile in

Fig. 6.4b also reflects this, with the temperature being lowest above the switched off island.

6.4 Power Grid EM analysis approaches

6.4.1 Power Grid Model

As we saw in Chapter 4, the cutoff frequency for tree LTI models is less than 25 Hz. As

such, short-term transients with frequencies in MHz or GHz range typically experienced in

chip workloads do not play a significant role in EM degradation. Hence, and consistent with

standard practice in the field, we use an effective-current model [30], so that the grid currents

are assumed to be constant at some average (effective) value, at least during the void nucleation

phase. As per EKM, once a void nucleates, branch resistances change fairly quickly and the

currents change, also fairly quickly, to new effective values. Thus, between any two successive

void nucleations, the power grid has fixed currents, voltages, and conductances and so can be

modelled using a DC model as given in (2.21), which we restate here

G(t)v(t) = i, (6.2)

where G(t) is the time-varying (but piecewise-constant) conductance matrix, v(t) is the corre-

sponding time-varying (but piecewise constant) vector of node voltage drops and i is the vector

of average (effective) values of the current sources tied to the grid.

6.4.2 The Main Approach

We will use the mesh model [25, 47] to find the Mean Time to Failure (MTF), in which the

grid is deemed to fail not when the first void nucleates, but when enough voids have nucleated

so that the user-provided voltage drop threshold value has been exceeded at some grid node. The

voltage-drop threshold value for every grid node (or a subset of grid nodes) is captured in the

vector vth and ensures that there are no timing violations in the underlying logic as long as node

voltage drops are below the threshold. As a byproduct, however, this process also produces the

time when the first void nucleates, which helps us generate the MTF under a series model, in

which a grid is deemed to fail when the first void nucleates. We report the series model MTF

for comparison purposes.

Obtaining one grid TTF sample

We assume that the grid is undamaged (no voids) at t = 0 and that all node voltage drops are

less than vth, i.e. v(0) < vth. We calculate the initial temperature distribution at t = 0, which

gives the initial thermal stress profile for the trees and the branch diffusivities. A power grid

is a collection of interconnect trees. As such, to estimate the EM degradation of the grid, we

formulate the LTI system for every tree as shown in Section 4.2.4 and numerically integrate

them to obtain the stress at all junctions as a function of time. At this point our main objective

is to find the time and location of the next void nucleation among all junctions in all the trees.

Let nf be the next junction that fails and tf be its time of void nucleation. Then, to determine

nf and tf efficiently, we propose the following 3 step approach:

1. Sort : For every unfailed junction in a tree, we calculate a crude estimate of the junction

nucleation time by using a simple linear model with slope equal to the present time-rate

of change of stress at that junction. The trees are then sorted in ascending order by their

minimum junction nucleation time.

2. Simulate: We set nf = ∅ and tf = ∞, and start simulating the trees one by one as

determined by the ordering of the previous step, up to either the first junction failure or

tf , whichever is earlier. If a junction in a tree fails before tf , we update nf and tf so that

they always store the best estimate of next junction to fail and its TTF. When we finish

simulating all trees, nf and tf have the correct final values.

3. Synchronize: Since the sorting step is based on a crude estimate, we might have trees

that have been simulated to a time point greater than tf. The solutions for these trees are no longer valid because the void nucleation at nf will change the current densities in the

power grid. Thus, we re-simulate all such trees to determine their stress profile at t = tf .
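The three steps above can be sketched as follows. This is a toy illustration only: each tree has a single junction whose stress follows a closed-form saturating curve, and a simple time-stepping loop stands in for the numerical integration of the tree LTI system. The class `Tree`, the function `next_nucleation`, the simulation horizon and all parameter values are assumptions of this sketch.

```python
import math

SIGMA_TH = 600.0  # critical stress threshold (MPa)


class Tree:
    """Toy tree with one junction: stress(t) = sigma_ss * (1 - exp(-t / tau))."""

    def __init__(self, name, sigma_ss, tau):
        self.name, self.sigma_ss, self.tau = name, sigma_ss, tau
        self.t_sim = 0.0  # time up to which the tree has been simulated

    def stress(self, t):
        return self.sigma_ss * (1.0 - math.exp(-t / self.tau))

    def linear_estimate(self):
        # Crude nucleation time from the present (t = 0) slope of the stress.
        slope = self.sigma_ss / self.tau
        return SIGMA_TH / slope if slope > 0 else math.inf

    def simulate(self, t_stop, dt=0.01):
        """Step until the stress reaches SIGMA_TH or t_stop; return failure time or None."""
        t = 0.0
        while t < t_stop:
            t = min(t + dt, t_stop)
            if self.stress(t) >= SIGMA_TH:
                self.t_sim = t
                return t
        self.t_sim = t_stop
        return None


def next_nucleation(trees, horizon=50.0):
    # 1. Sort: ascending order of the crude linear estimates.
    order = sorted(trees, key=lambda tr: tr.linear_estimate())
    # 2. Simulate: each tree only up to the best failure time found so far.
    n_f, t_f = None, math.inf
    for tree in order:
        t_fail = tree.simulate(min(t_f, horizon))
        if t_fail is not None and t_fail < t_f:
            n_f, t_f = tree, t_fail
    # 3. Synchronize: bring trees that overshot t_f back to t = t_f.
    resynced = []
    for tree in trees:
        if tree.t_sim > t_f:
            tree.simulate(t_f)
            resynced.append(tree.name)
    return n_f, t_f, resynced


trees = [Tree("A", 900.0, 10.0), Tree("B", 2000.0, 30.0), Tree("C", 500.0, 5.0)]
n_f, t_f, resynced = next_nucleation(trees)
```

Here the linear estimate ranks tree C first and tree B last, although C never reaches the threshold and B is the first to fail; trees A and C are therefore simulated past tf and must be synchronized.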

In principle, the simulate and synchronize steps can be done using any of the proposed

solvers from the previous chapter. While the RK45 solver and all VCBDFk solvers do a pretty

good job at simulating a tree, the Newton solver and the Predictor have some shortcomings.

The Newton solver suffers from the same drawbacks as a general Newton method: if we don’t

start sufficiently close to the true solution, the convergence is not guaranteed. On the other

hand, it is really difficult to determine the time-points for which the Predictor gives a good

junction TTF estimate. Thus, the junction failure times obtained by the Predictor are usually

not accurate enough, which was evident in the experimental results of the last chapter. To

overcome these shortcomings, we combine the two solvers as follows: we first use the Predictor

to find the time of next junction failure and then use the Newton solver to refine the estimated

failure time. This works well in practice because the solution of the Predictor is always close to the true solution, and evaluating expm and the derivatives is cheap compared to generating the reduced model itself. Hence, for power grid EM assessment, we will combine the

Newton and Predictor methods into one method, which we will refer to as the Predictor+Newton

method.
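The refinement idea can be illustrated with a scalar example: a coarse guess of the nucleation time (playing the role of the Predictor output) is polished with Newton iterations on stress(t) = σth. The closed-form stress curve, the guess of 10 years and the function name `newton_refine` are assumptions of this sketch; the tolerances mirror the Newton tolerances of Table 6.3.

```python
import math

SIGMA_TH = 600.0            # critical stress threshold (MPa)
SIGMA_SS, TAU = 900.0, 10.0  # hypothetical junction: steady-state stress and time constant


def stress(t):
    return SIGMA_SS * (1.0 - math.exp(-t / TAU))


def stress_rate(t):
    return (SIGMA_SS / TAU) * math.exp(-t / TAU)


def newton_refine(t_guess, abs_tol=1e-8, rel_tol=1e-5, max_iter=20):
    """Refine a predicted nucleation time by Newton iteration on stress(t) = SIGMA_TH."""
    t = t_guess
    for _ in range(max_iter):
        step = (stress(t) - SIGMA_TH) / stress_rate(t)
        t -= step
        if abs(step) <= abs_tol + rel_tol * abs(t):
            return t
    raise RuntimeError("Newton iteration did not converge; the guess may be too far off")


t_predicted = 10.0                  # coarse estimate, standing in for the Predictor
t_refined = newton_refine(t_predicted)
t_exact = TAU * math.log(3.0)       # closed-form crossing time of this toy curve
```

Because the starting point is already close to the crossing, only a handful of iterations are needed, which is the property the combined method relies on.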

Once nf and tf are known, we calculate the steady state volume of the void using (3.12),

update the resistances of connected branches using (3.13) and compute the new voltage drops

and current density values. We then examine to see if the recently nucleated void leads to an

early failure, by checking the following two conditions: i) is the void located below a via (this

is determined using the power grid structure) and ii) is the void large enough to disconnect the

via. If both conditions are met, the void at nf leads to an early failure, so that we remove the

via from the power grid and update the voltage drops and the current density values. Then we

re-calculate the power dissipation for all thermal nodes, find the new temperature distribution

Figure 6.5: (a) Goodness-of-fit plot for normal distribution (TTF samples against the goodness-of-fit line) and (b) probability distribution function (pdf) from simulation and using parameters from the fit, using 200 mesh TTF samples from ibmpg2 main approach. Both panels have Time (yrs) on the x-axis.

and update the branch diffusivities. The LTI systems for all trees are updated as given in

Section 4.2.4. At this stage, we again need to find nf and tf , which can be done by repeating

the sort-simulate-synchronize steps.

The time of first void nucleation gives the TTF of the grid as per the series model. Due

to increase in branch resistances, the voltage drops in the grid continue to increase as we

move forward in time. Each time we update the voltage drop, we check to see if a voltage drop

violation has occurred somewhere. The earliest time when the voltage drop at any node exceeds

vth is the TTF of the grid as per the mesh model. As the power grid size increases, updating

voltage drops due to changing branch resistances becomes more computationally expensive,

which can limit the scalability of our approach. To overcome this problem, we update voltage drops using the Preconditioned Conjugate Gradient (PCG) method. At t = 0, we have to factorize the conductance matrix in order to find the initial voltage drops. We use this initial factorization as a preconditioner for our Conjugate Gradient (CG) method. This makes the voltage drop updates efficient: because the perturbation in G(t) due to void nucleations is small, the factorization of G(0) acts as an excellent incomplete factorization for G(t), which results in convergence within a few iterations.
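A minimal sketch of this idea on a hypothetical four-node chain grid is given below. The Cholesky factor of G(0) is computed once and then reused as the preconditioner when one branch conductance changes. Dense pure-Python linear algebra is used only to keep the example self-contained; the helper names and all numerical values are assumptions.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(A[r][r] - s) if r == c else (A[r][c] - s) / L[c][c]
    return L


def apply_inverse(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for r in range(n):
        y[r] = (b[r] - sum(L[r][k] * y[k] for k in range(r))) / L[r][r]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (y[r] - sum(L[k][r] * x[k] for k in range(r + 1, n))) / L[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(G, i, L0, x0, tol=1e-10, max_iter=50):
    """Preconditioned CG for G x = i, preconditioned with the factor L0 of G(0)."""
    x = x0[:]
    r = [bi - gi for bi, gi in zip(i, matvec(G, x))]
    b_norm = math.sqrt(dot(i, i))
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = apply_inverse(L0, r)
    p, rz = z[:], dot(r, z)
    for it in range(1, max_iter + 1):
        Gp = matvec(G, p)
        alpha = rz / dot(p, Gp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * gi for ri, gi in zip(r, Gp)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_inverse(L0, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def chain_grid(g_supply, g_branches):
    """Conductance matrix of a chain of nodes, node 0 tied to the supply."""
    n = len(g_branches) + 1
    G = [[0.0] * n for _ in range(n)]
    G[0][0] += g_supply
    for k, g in enumerate(g_branches):
        G[k][k] += g
        G[k + 1][k + 1] += g
        G[k][k + 1] -= g
        G[k + 1][k] -= g
    return G


i_eff = [0.01, 0.01, 0.01, 0.01]
G0 = chain_grid(10.0, [5.0, 5.0, 5.0])
L0 = cholesky(G0)                      # factorized once, at t = 0
v0 = apply_inverse(L0, i_eff)          # initial voltage drops
G1 = chain_grid(10.0, [5.0, 4.0, 5.0])  # one branch degraded by a void
v1, iterations = pcg(G1, i_eff, L0, v0)
```

Since a single void changes only a few entries of G, the preconditioned system is a low-rank perturbation of the identity, and the iteration count stays very small in this example.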

Estimating grid MTF

To account for the random nature of EM degradation, we perform Monte Carlo random sampling

to estimate the MTF. In each Monte Carlo iteration, we assign new lognormally generated

diffusivities to all the branches in the grid. This effectively produces a new instance of the

whole power grid, which we refer to as a sample grid. Then, as stated in the previous section,

we simulate the sample grid to generate a TTF value based on the series model and another

based on the mesh model. With enough samples, we form two averages as our estimates of the

series MTF and the mesh MTF.

Let T be the RV that represents the statistics of the mesh TTF for this approach, then

the expected value of T, denoted by E[T], is the mesh MTF of the grid. Using goodness of fit

methods, it was found that the normal distribution is a good fit for T (see Fig. 6.5). Therefore,

we can use standard statistical sampling (Monte Carlo) [64] to find the value of E[T] to within a

user-specified error tolerance. The number of samples required for the Monte Carlo to terminate

is given in (2.63). This stopping criterion ensures that we have (1 − ζ) × 100% confidence (e.g. ζ = 0.05 for 95% confidence) that the relative error in MTF estimation is less than the user-provided relative error threshold εmc (e.g. εmc = 0.05 for a 5% relative error threshold).
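The sampling loop can be sketched as below. Since (2.63) is not restated in this section, the sketch assumes a standard normal-theory rule: stop once the (1 − ζ) confidence half-width of the sample mean, relative to the mean, falls below εmc/(1 + εmc), after a minimum of 30 samples. The function `sample_grid_ttf` is a hypothetical stand-in for simulating one sample grid; it only draws lognormal diffusivities and maps the largest one to a TTF.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def sample_grid_ttf(rng, n_branches=50, mu_ln=0.0, sigma_ln=0.3):
    # Hypothetical stand-in for one sample grid: assign lognormal diffusivities
    # to all branches and let the fastest-diffusing branch set the TTF.
    diffusivities = [rng.lognormvariate(mu_ln, sigma_ln) for _ in range(n_branches)]
    return 10.0 / max(diffusivities)


def estimate_mtf(eps_mc=0.05, zeta=0.05, min_samples=30, max_samples=10000, seed=1):
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - zeta / 2.0)  # e.g. about 1.96 for zeta = 0.05
    samples = []
    while len(samples) < max_samples:
        samples.append(sample_grid_ttf(rng))
        n = len(samples)
        if n >= min_samples:
            half_width = z * stdev(samples) / math.sqrt(n)
            # Assumed stopping rule (a stand-in for (2.63)).
            if half_width / mean(samples) <= eps_mc / (1.0 + eps_mc):
                break
    return mean(samples), len(samples)


mtf, n_samples = estimate_mtf()
```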

Though this is the most accurate approach, numerically solving all the trees in the power

grid using the EKM can be computationally expensive. In this work, we use this approach only

on smaller grids and refer to it as the main approach. The results from this approach serve as

a benchmark of comparison for more optimized approaches.

6.4.3 Improved performance with Filtering

We will now present a method that drastically reduces the run-time with almost no impact on

accuracy. We will refer to this as the Filtering approach. For each sample grid, solving all the

trees up to the time of grid failure yields a specific sequence of void nucleation times in certain

trees that are of interest. In particular, all trees that nucleate their first void before the time

of grid failure are of interest to us. All trees that nucleate their first void after the grid failure

are inconsequential to the analysis, and we would do well to filter them out in the first place.

Unfortunately, we don’t know up-front which set of trees should be solved, and which can be

discarded. However, we can devise an approximate filtering scheme that indicates which subset

of trees will most likely nucleate a void before all the rest, and focus our computation on those

trees.

Finding the Active set

For a given sample grid, we restrict our attention to a subset of trees whose estimated first void

nucleation times are smaller than some threshold t = tm. We call this subset of trees the

active set and tm as the active set cutoff threshold. tm is a part of the Monte Carlo process. We

start with a sufficiently high value of tm, that is reduced as more TTF samples are obtained.

We will provide more details on tm later. Note that we don’t need to know the actual void

nucleation times for junctions in a tree, rather we only need to know if the first void in a tree

nucleates before tm. In addition, any filtering approach needs to be quick, or at least it should

be faster than simulating all trees in the grid to be considered viable. We now present some

filtering approaches that can be employed to speed-up the MTF estimation.

Steady State Filter In this approach, we compute the steady state stress profile of all the

trees using the respective LTI models. Any tree with a junction that has a maximum steady

state stress value larger than the critical stress threshold σth is included in the active set.

Figure 6.6: The idea for the expm filtering scheme (x-axis: Time (yrs); curves for Junction 1 and Junction 2). The dotted lines show the would-be stress evolution if the boundary conditions are not updated when stress reaches σth. Junction 1 fails before t = tm, Junction 2 fails after.

Riege-Thompson Filter As mentioned in Section 2.3.3, Riege and Thompson proposed an

analytical expression for stress evolution at a junction by replacing all its connected branches

with semi-infinite limbs [42]. After some algebraic manipulation, the TTF of a junction as

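estimated by their model can be stated as

\[
t_{f,p} =
\begin{cases}
\left(\dfrac{\sigma_{th} - \sigma_0}{\rho q^*}\right)^2 \dfrac{\Omega \pi k_b T_m}{4B} \left(\dfrac{\sum_{b_k \in B_p} \sqrt{D_{a,k}}}{\sum_{b_k \in B_p} D_{a,k} j_k}\right)^2, & \sum_{b_k \in B_p} D_{a,k} j_k > 0, \\[2ex]
\infty, & \sum_{b_k \in B_p} D_{a,k} j_k \le 0.
\end{cases}
\]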

Recall that due to the assumption of semi-infinite limbs, the Riege-Thompson model cannot account

for back stress generated due to EM, and thus it gives a conservative TTF estimate for a

junction. Based on this conservative approximation, the trees that are likely to nucleate their

first void before tm are declared to be part of the active set.
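As a hedged illustration, the sketch below evaluates a junction TTF under the semi-infinite-limb assumption. It assumes the closed form t = ((σth − σ0)/(ρq∗))² · ΩπkbT/(4B) · (Σ√Da,k / ΣDa,k jk)², which follows from the Korhonen model for limbs of equal width meeting at a junction; the branch diffusivities, current densities and temperature used here are hypothetical, while the physical constants are those of Table 6.2.

```python
import math

K_B = 1.38e-23        # Boltzmann's constant (J/K)
OMEGA = 1.66e-29      # atomic volume (m^3)
BULK = 135.21e9       # bulk modulus (Pa)
Q_STAR = 8.0109e-19   # effective charge (C)
RHO = 2.1991e-8       # resistivity of copper (ohm*m)
SIGMA_TH = 600e6      # critical stress threshold (Pa)


def junction_ttf(branches, temp_k, sigma_0=0.0):
    """Conservative junction TTF in seconds under the semi-infinite-limb assumption.

    branches: list of (D_a in m^2/s, signed current density j in A/m^2).
    Assumes all limbs at the junction have the same width.
    """
    drive = sum(d * j for d, j in branches)
    if drive <= 0.0:
        return math.inf  # net atomic flux does not build tensile stress here
    root_sum = sum(math.sqrt(d) for d, _ in branches)
    prefactor = ((SIGMA_TH - sigma_0) / (RHO * Q_STAR)) ** 2
    prefactor *= OMEGA * math.pi * K_B * temp_k / (4.0 * BULK)
    return prefactor * (root_sum / drive) ** 2


def in_active_set(branches, temp_k, t_m_seconds):
    return junction_ttf(branches, temp_k) < t_m_seconds


SECONDS_PER_YEAR = 365.25 * 24 * 3600
one_branch = [(1e-20, 1e10)]     # hypothetical diffusivity and current density
ttf_1x = junction_ttf(one_branch, 373.0)
ttf_2x = junction_ttf([(1e-20, 2e10)], 373.0)
ttf_reverse = junction_ttf([(1e-20, -1e10)], 373.0)
```

For a single limb the estimate scales as 1/j², so doubling the current density reduces the estimated TTF by a factor of four, and a reversed current direction yields an infinite TTF.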

expm based filter If the stress evolution at a junction is to cause void nucleation before time

tm, then that junction’s would-be stress value at tm should be higher than σth (see Fig. 6.6).

Here, the would-be stress value at tm denotes the hypothetical stress value at a junction if

the boundary conditions are not updated at the time of void nucleation. We use the expm

approximation to calculate the would-be stress profile of every tree at t = tm, and any tree with

junction stresses greater than σth at t = tm is included in the active set.

VCBDF2 based filter In this approach, we generate LTI models for all trees using a smaller

value of N , usually 8 or 10, and then use the VCBDF2 solver with relaxed tolerances to integrate

these coarse tree LTI models up to tm, or up to the time of its first void nucleation, whichever

is earlier. We include a tree in the active set if its first void nucleation time is less than tm.

We have already shown in Section 4.3 that the LTI models generated using smaller values of N

give less accurate but correct results.

The steady state filter and the Riege-Thompson filter are computationally very efficient, but

they are also very conservative so that the active set obtained using them usually consists of

many trees that will not fail before tm. The expm filter has very few false positives. However,

it assumes that the tensile stress at junctions is a monotonic function of time, which is not

always true and this makes the expm filter exclude trees where the stress overshoots σth before

dropping down to a value less than σth at t = tm. Finally, the VCBDF2 based filter provides a

good estimate of the active set, but it might not be as fast as the expm based filter because it

still has to step through time. Also, the performance of the VCBDF2 based filter depends on

tm: a higher value of tm usually slows down the VCBDF2 based filter. The performance of the

expm filter is independent of tm, as it does only one expm evaluation.
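The trade-off between a single evaluation at tm and stepping through time can be seen on a toy junction whose would-be stress overshoots σth and then relaxes. The closed-form stress curve below is hypothetical and merely stands in for the LTI response; `single_point_filter` mimics the one-shot evaluation at tm, and `stepping_filter` mimics a solver that walks through time.

```python
import math

SIGMA_TH = 600.0  # critical stress threshold (MPa)


def would_be_stress(t, a=1500.0, tau_a=2.0, b=1100.0, tau_b=6.0):
    # Hypothetical non-monotonic would-be stress: a fast tensile build-up
    # followed by a slower relaxation towards a steady state of a - b = 400 MPa.
    return a * (1.0 - math.exp(-t / tau_a)) - b * (1.0 - math.exp(-t / tau_b))


def single_point_filter(stress, t_m):
    """Include the tree only if the would-be stress at t_m exceeds the threshold."""
    return stress(t_m) > SIGMA_TH


def stepping_filter(stress, t_m, dt=0.01):
    """Include the tree if the stress exceeds the threshold anywhere in (0, t_m]."""
    steps = int(round(t_m / dt))
    return any(stress(k * dt) > SIGMA_TH for k in range(1, steps + 1))


t_m = 12.0
kept_by_single_point = single_point_filter(would_be_stress, t_m)
kept_by_stepping = stepping_filter(would_be_stress, t_m)
```

In this example the one-shot evaluation misses the junction because the stress has already dropped below σth at tm, whereas the stepping filter catches the earlier overshoot at the cost of many more evaluations.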

Estimating mesh MTF from limited samples

If the sample grid fails before tm, we obtain a sample TTF. On the other hand, it might be

the case that the sample grid hasn’t failed up to t = tm. In this case, we set the TTF sample

equal to tm, and such a sample is called a limited sample. Thus, in the Filtering approach, we effectively sample from an RV T′ that has a maximum value of tm and whose distribution coincides with the normal distribution of T for all t < tm, where T is the RV that represents the statistics of the mesh TTF as obtained from the main approach. The RV T′ is a limited RV, and has the following definition.

Definition 1. Let Y be a random variable (RV) with cumulative distribution function (cdf)

FY (t) and let l and u be two scalars with l < u and at least one of them finite. Then, RV Y′

is called a limited RV between limits l and u, with Y being the underlying RV, if it has the

following cdf [83]

\[
F_{Y'}(t) =
\begin{cases}
0, & t < l, \\
F_Y(t), & l \le t < u, \\
1, & t \ge u.
\end{cases}
\]

In our case, T′ has a limited normal distribution with l = −∞ and u = tm, and the

underlying normal RV is T. A straightforward averaging of obtained TTF samples would give

us E[T′], which is not the mesh MTF of the power grid. Thus, in this section, we will derive a

relation to estimate E[T], the actual mesh MTF from the obtained samples.

Using the law of total expectation [84], we can write for T

E[T] = E[T|T ≤ tm]F (tm) + E[T|T > tm](1− F (tm)), (6.5)

where F (t) is the cdf of the normal RV T. We can also express E[T′] in similar terms

E[T′] = E[T′|T′ ≤ tm]F ′(tm) + E[T′|T′ > tm](1− F ′(tm)), (6.6)

where F ′(t) is the cdf of RV T′. From the definition of a limited RV, we know that F ′(tm) =

F (tm), E[T′|T′ ≤ tm] = E[T|T ≤ tm] and E[T′|T′ > tm] = tm. Hence, we can re-write (6.6)

E[T′] = E[T|T ≤ tm]F (tm) + tm(1− F (tm)). (6.7)

Subtracting (6.7) from (6.5), we get

E[T] = E[T′] + (E[T|T > tm]− tm)(1− F (tm))

= E[T′] + E[T− tm|T > tm](1− F (tm)). (6.8)

The term E[T− tm|T > tm] is the Mean Residual Life (MRL) of the power grid at t = tm, and

it can be shown that [85]

\[
E[T - t_m \mid T > t_m] = \frac{1}{1 - F(t_m)} \int_{t_m}^{\infty} [1 - F(z)]\,dz. \tag{6.9}
\]

Combining (6.8) and (6.9), we get

\[
E[T] - E[T'] = \int_{t_m}^{\infty} [1 - F(z)]\,dz. \tag{6.10}
\]
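The identity (6.10) can be checked numerically for a normal T. The sketch below uses hypothetical values (mean 10 years, standard deviation 2 years, tm = 11 years) and a simple trapezoidal rule; E[T′] is computed as the expectation of min(T, tm).

```python
from statistics import NormalDist


def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    inner = sum(f(lo + k * h) for k in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


T = NormalDist(mu=10.0, sigma=2.0)  # hypothetical mesh TTF distribution (years)
t_m = 11.0                          # active set cutoff threshold (years)
lower = T.mean - 10.0 * T.stdev     # the tails beyond 10 sigma are negligible
upper = T.mean + 10.0 * T.stdev

# Right-hand side of (6.10): integral of the survival function from t_m.
tail_integral = trapezoid(lambda z: 1.0 - T.cdf(z), t_m, upper)

# E[T'] for the limited RV: samples above t_m are reported as t_m.
mean_limited = trapezoid(lambda z: min(z, t_m) * T.pdf(z), lower, upper)

gap = T.mean - mean_limited         # left-hand side of (6.10)
```

With these values both sides agree to within the quadrature error, which confirms that plain averaging of limited samples underestimates the mesh MTF by exactly the tail integral.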

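Define $\mu \triangleq E[T]$, $v^2 \triangleq Var[T]$, $\mu' \triangleq E[T']$ and $(v')^2 \triangleq Var[T']$. Also, let Φ(t) be the cdf of a standard normal distribution N(0, 1) (normal distribution with mean 0 and variance 1), so that

\[
\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-z^2/2}\,dz = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{t}{\sqrt{2}}\right)\right], \tag{6.11}
\]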

which can be computed using the erf() function on most operating systems. Then, from the

definition of the cdf of a normal distribution, we can re-write (6.10) as

\[
\mu - \mu' = \int_{t_m}^{\infty} \left[1 - \Phi\left(\frac{z - \mu}{v}\right)\right] dz. \tag{6.12}
\]

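The RHS of (6.12) can be integrated to give (see appendix B for a step-by-step derivation)

\[
\mu - \mu' = (\mu - t_m)\left[1 - \Phi\left(\frac{t_m - \mu}{v}\right)\right] + v\,\phi\left(\frac{t_m - \mu}{v}\right), \tag{6.13}
\]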

where $\phi(t) = e^{-t^2/2}/\sqrt{2\pi}$ is the pdf of the standard normal distribution. Using (6.13), we could have estimated µ from µ′ if the standard deviation v of the underlying normal were known. Unfortunately, that is not the case. However, as we will show in the next subsection, we can estimate (with some confidence) the value of F(tm), the cdf at t = tm, from the Monte Carlo experiments. Thus, F(tm) is a known quantity. Let $p_f \triangleq F(t_m)$. Then, we can write

\[
p_f = \Phi\left(\frac{t_m - \mu}{v}\right) \implies v = \frac{t_m - \mu}{\Phi^{-1}(p_f)}, \tag{6.14}
\]

where $\Phi^{-1}$ denotes the inverse cdf of a standard normal distribution, which can also be evaluated on most operating systems using the erfinv() function. Now, using (6.14) in (6.13), we get

\[
\mu - \mu' = (\mu - t_m)\left[1 - p_f - \frac{\phi(\Phi^{-1}(p_f))}{\Phi^{-1}(p_f)}\right],
\]

which can be easily solved for µ

\[
\mu = \frac{\mu' + (\kappa - 1)\,t_m}{\kappa}, \tag{6.15}
\]

where κ is a function of pf

\[
\kappa = p_f + \frac{\phi(\Phi^{-1}(p_f))}{\Phi^{-1}(p_f)}. \tag{6.16}
\]
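A short round-trip check of (6.15) and (6.16) is sketched below: starting from a hypothetical normal T (mean 10 years, standard deviation 2 years) and tm = 11 years, the limited mean µ′ and pf are computed in closed form, and the correction then recovers µ. The function names `kappa` and `mesh_mtf` are assumptions of this sketch.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def kappa(p_f):
    """kappa(p_f) as in (6.16). Requires 0 < p_f < 1 and p_f != 0.5."""
    x = STD.inv_cdf(p_f)  # raises StatisticsError outside (0, 1)
    return p_f + STD.pdf(x) / x  # x = 0 at p_f = 0.5, where mu equals t_m


def mesh_mtf(mu_limited, p_f, t_m):
    """Recover mu = E[T] from mu' = E[T'] and p_f = F(t_m), as in (6.15)."""
    k = kappa(p_f)
    return (mu_limited + (k - 1.0) * t_m) / k


# Hypothetical underlying distribution and cutoff.
mu_true, v_true, t_m = 10.0, 2.0, 11.0
a = (t_m - mu_true) / v_true
p_f = STD.cdf(a)
# Limited mean from (6.13): mu' = mu - [(mu - t_m)(1 - Phi(a)) + v * phi(a)].
mu_limited = mu_true - ((mu_true - t_m) * (1.0 - p_f) + v_true * STD.pdf(a))

mu_recovered = mesh_mtf(mu_limited, p_f, t_m)
```

Note that the correction needs only µ′, pf and tm; the unknown standard deviation v has been eliminated through (6.14).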

Modifying the Monte Carlo stopping criteria

In addition to finding µ from µ′, we also need to derive a new stopping criterion for the Monte

Carlo random sampling process to ensure that µ computed using (6.15) is estimated within

the user specified tolerances akin to the main approach. In order to do this, we have to first

introduce the notion of a true value, estimated value and error in estimation. Let $T'_1, T'_2, \ldots, T'_s$ be s samples obtained from RV T′ using a Monte Carlo process, of which $s_{lim}$ are limited samples. Then, define

\[
\hat{\mu}' \triangleq \frac{1}{s} \sum_{k=1}^{s} T'_k, \qquad \hat{p}_f \triangleq \frac{s - s_{lim}}{s}, \tag{6.17}
\]

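where $\hat{\mu}'$ is the estimated value of µ′ and $\hat{p}_f$ is the estimated value of pf obtained using the TTF samples. Using $\hat{\mu}'$ and $\hat{p}_f$ in (6.15) we can calculate $\hat{\mu}$, the estimated value of µ. Note that µ′, pf and µ are the true values, so that

\[
\lim_{s \to \infty} \hat{\mu}' = \mu', \qquad \lim_{s \to \infty} \hat{p}_f = p_f, \qquad \text{and} \qquad \lim_{s \to \infty} \hat{\mu} = \mu. \tag{6.18}
\]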

Then, the error in estimation can be written as

\[
\delta\mu = |\hat{\mu} - \mu|, \qquad \delta\mu' = |\hat{\mu}' - \mu'|, \qquad \text{and} \qquad \delta p_f = |\hat{p}_f - p_f|. \tag{6.19}
\]

Similar to the main approach, we would like to stop the Monte Carlo process when we are

(1−ζ)×100% confident that the relative error in estimated MTF is less than some user provided

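threshold εmc. This can be achieved if

\[
\frac{\delta\mu_\zeta}{\mu} \le \epsilon_{mc} \iff \frac{\delta\mu_\zeta}{\hat{\mu}} \le \frac{\epsilon_{mc}}{1 + \epsilon_{mc}}, \tag{6.20}
\]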

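where $\delta\mu_\zeta$ is the (1 − ζ) × 100% confidence bound on the estimation error δµ. In other words, this means that the interval $[\hat{\mu} - \delta\mu_\zeta,\ \hat{\mu} + \delta\mu_\zeta]$ will contain µ (the true value) (1 − ζ) × 100% of the time. To find $\delta\mu_\zeta$, we apply propagation of errors [86] to (6.15)

\[
\delta\mu_\zeta = \sqrt{\left(\frac{\partial \mu}{\partial \mu'}\,\delta\mu'_\zeta\right)^2 + \left(\frac{\partial \mu}{\partial \kappa}\,\delta\kappa_\zeta\right)^2}, \tag{6.21}
\]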

where $\delta\mu'_\zeta$ and $\delta\kappa_\zeta$ are the (1 − ζ) × 100% confidence bounds on µ′ and κ, respectively. $\delta\mu'_\zeta$ is obtained from simulation, using the technique given in [83], and $\delta p_{f\zeta}$ can be calculated from the TTF samples using [87]. The complete details are given in appendix B. Here, we present

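the final expression

\[
(\delta\mu_\zeta)^2 = \frac{(\delta\mu'_\zeta)^2}{\hat{\kappa}^2} + \frac{z_{\zeta/2}^2\,(t_m - \hat{\mu}')^2\,\hat{p}_f (1 - \hat{p}_f)}{4 s\,\hat{\kappa}^4 y^4}, \tag{6.22}
\]

when $s\hat{p}_f \ge 5$ and $s(1 - \hat{p}_f) \ge 5$. Here, $z_{\zeta/2}$ is the (1 − ζ/2)-percentile of N(0, 1), $\hat{\kappa}$ is the estimated value of κ using $\hat{p}_f$ and $y = \Phi^{-1}(\hat{p}_f)/\sqrt{2}$. Similar to the main approach, we obtain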

at least 30 TTF samples before starting to check the stopping criteria (6.20).
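The error propagation can be sketched in code as follows. The sketch differentiates (6.15) and (6.16) directly, using the fact that dκ/dpf = −1/(Φ−1(pf))², and bounds pf with the normal approximation to a binomial proportion. As a simplifying assumption, the bound on µ′ is taken as the usual normal-theory half-width of the sample mean rather than the technique of [83]. The synthetic samples and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD = NormalDist()  # standard normal N(0, 1)


def kappa(p_f):
    x = STD.inv_cdf(p_f)
    return p_f + STD.pdf(x) / x


def mtf_with_bound(samples, t_m, zeta=0.05):
    """Return (estimated mesh MTF, confidence bound) from limited TTF samples."""
    s = len(samples)
    s_lim = sum(1 for t in samples if t >= t_m)  # limited samples sit at t_m
    p_hat = (s - s_lim) / s
    if s * p_hat < 5 or s * (1.0 - p_hat) < 5:
        raise ValueError("need s*p_f >= 5 and s*(1 - p_f) >= 5")
    z = STD.inv_cdf(1.0 - zeta / 2.0)
    mu_lim = mean(samples)
    d_mu_lim = z * stdev(samples) / math.sqrt(s)    # assumed bound on mu'
    d_p = z * math.sqrt(p_hat * (1.0 - p_hat) / s)  # bound on p_f
    x = STD.inv_cdf(p_hat)
    k = kappa(p_hat)
    d_kappa = d_p / x ** 2                          # |d kappa / d p_f| = 1 / x^2
    mu_hat = (mu_lim + (k - 1.0) * t_m) / k         # (6.15)
    # Propagation of errors: d mu/d mu' = 1/k, d mu/d kappa = (t_m - mu')/k^2.
    d_mu = math.hypot(d_mu_lim / k, (t_m - mu_lim) * d_kappa / k ** 2)
    return mu_hat, d_mu


def should_stop(mu_hat, d_mu, eps_mc=0.05):
    return d_mu / mu_hat <= eps_mc / (1.0 + eps_mc)  # as in (6.20)


rng = random.Random(0)
t_m = 11.0
samples = [min(rng.gauss(10.0, 2.0), t_m) for _ in range(400)]  # synthetic limited samples
mu_hat, d_mu = mtf_with_bound(samples, t_m)
```

The term 1/x² grows quickly as pf approaches 0.5, so the bound widens when tm is close to the MTF itself.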

Final workflow of the Filtering approach

The workflow of the filtering approach is very similar to the main approach, but with a few

key differences. First, instead of simulating all the trees, we determine the active set at t = 0

using the previously presented filters, and simulate only the trees in the active set to obtain a grid TTF sample. In practice, we use a combination of filters. For example, we first use the Riege-Thompson filter to remove trees that have their first failure greater than ktm for

some k > 1, and then apply either expm or the VCBDF2 based filters to finalize the active set.

Also, we usually add a slack ∆tm > 0 to tm in deciding the active set, so that any tree that has

the first failure time less than tm +∆tm becomes a part of the active set. The slack ∆tm not

only results in a conservative active set, but it also ensures that we don’t miss out on any tree

due to the error incurred by using a reduced order or coarse LTI model. The slack also serves

another important purpose: it might happen that trees that were not included in the active set

at t = 0 may become eligible to be a part of it due to change in current densities caused by

the previous junction failures. Usually, this behaviour is observed for trees that were excluded

from the active set due to a small margin¹. Hence, including them in the active set safeguards our approach against potential pitfalls.

¹ It is rare for a tree whose first junction failure time was previously much greater than tm to suddenly become eligible to be a part of the active set.

Second, the mesh MTF is calculated using (6.15), and the stopping criterion given by (6.20) and (6.22) is used. If the final value of tm is chosen such that 0 < pf ≤ 1, then (6.15)

is used to calculate the mesh MTF. If pf = 0, (6.15) cannot be used to estimate the MTF µ

because pf = 0 =⇒ Φ−1(pf ) = −∞, so that κ = 0 and (6.15) gives µ = ∞. This scenario

might happen if the value of tm is so small to begin with that all mesh TTF samples are limited

samples. In this case, we can only say with certainty that µ > tm. On the other hand, if pf = 1,

this means that none of the samples obtained are limited by tm, and thus the standard Monte

Carlo stopping criteria (2.63) can also be applied.

Figure 6.7: Variation of p2 with sample number.

Third, we now have an ‘extra’ Monte

Carlo parameter tm, whose value needs to be

decided. The value for tm should be chosen

carefully. If the value of tm is high, then we

will waste a lot of time simulating trees that

would never fail before grid failure and our

performance will suffer. On the other hand,

a lower value of tm reduces the run-time, but

we found from experiments that it results in

slower Monte Carlo convergence and would

require more iterations. The sweet spot for

tm lies somewhere in between. Indeed, from

experiments we found that we get the best

runtime if tm is close to, but greater than the MTF µ. Alas, we obviously don’t know the MTF

beforehand. Hence, we use an adaptive strategy for determining tm. For the first few iterations,

we choose a sufficiently high value, so that the TTF samples are not limited by tm. Then, based

on the mean of the samples obtained so far we decrease the value of tm so that it gets closer to

the estimated mean. While many strategies for reducing tm based on the obtained samples are

possible, we found the following to be the most effective

\[
t_{m,k+1} = \min\left\{ t_{m,k},\ p_1 \left[ t_{m,k} + (\hat{\mu} - t_{m,k}) \left( 0.3 + \frac{0.4}{1 + e^{-p_2 (s - 10)}} \right) \right] \right\}, \tag{6.23}
\]

where p1 ≥ 1 is a safety factor and p2 is the steepness of the logistic function, which varies with the number of samples obtained as shown in Fig. 6.7. The overall flow of the filtering approach

is shown in Fig. 6.8.
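A sketch of the adaptive update is given below. It assumes that (6.23) takes the form t_{m,k+1} = min{t_{m,k}, p1 [t_{m,k} + (µ̂ − t_{m,k})(0.3 + 0.4/(1 + e^{−p2(s−10)}))]}, where µ̂ is the current MTF estimate and s the number of samples. For simplicity p2 is held constant here, whereas in our flow it varies with the sample number (Fig. 6.7); the values p1 = 1.1 and p2 = 0.5 are hypothetical.

```python
import math


def next_cutoff(t_m, mu_hat, s, p1=1.1, p2=0.5):
    """One update of the active set cutoff threshold (assumed form of (6.23))."""
    # Logistic weight: about 0.3 for few samples, 0.5 at s = 10, about 0.7 for many.
    weight = 0.3 + 0.4 / (1.0 + math.exp(-p2 * (s - 10)))
    candidate = p1 * (t_m + (mu_hat - t_m) * weight)
    return min(t_m, candidate)  # the threshold is never increased


# Starting from t_m = 20 years with an estimated MTF of 10 years.
t_m_history = [20.0]
for s in range(5, 40, 5):
    t_m_history.append(next_cutoff(t_m_history[-1], mu_hat=10.0, s=s))
```

The threshold moves towards the estimated mean more aggressively as more samples become available, while the safety factor p1 keeps it above the estimate.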

6.4.4 Parallelization using shared memory

The TTF estimates of the different sample grids, one per Monte Carlo iteration, are independent of each other, and thus their computation can be parallelized. In our implementation, we use a multi-process

architecture, with each process bound to a separate core, to carry out different MC iterations

Figure 6.8: Flow chart showing the MTF estimation using the Filtering approach. EF stands for early failure.

in parallel (see Fig. 6.9). These processes use shared memory for inter-process communication.

The first process allocates and initializes the shared memory object, which contains 1) a random number generator to be used for generating sample grids, 2) a table to store the results

generated by all MC iterations, 3) the time threshold tm required for finding the active set, 4)

the process IDs of all active processes using the shared memory and 5) several read-write locks

to synchronize the read/write accesses to the shared memory. All subsequent processes map

this memory space to their own address space.

At the beginning of every MC iteration, each process uses the shared random number

generator to generate a sample grid. When a process completes its MC iteration and obtains

a TTF sample, it writes the results to the shared table, updates the estimated MTF and tm

based on the TTF samples obtained so far from all the processes and checks if the stopping criterion has been satisfied. If it is not satisfied, this process starts a new MC iteration. On the other hand, if the stopping criterion is satisfied, this process sends an interrupt to all the other

Figure 6.9: Workflow for each process in our parallel implementation.

processes to signal the end of the task and then stops. Any time a process receives an interrupt,

it assumes that the stopping criterion has been satisfied in some other process, and thus stops.

The last process to stop deallocates the shared memory object.
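The worker loop can be sketched as below. To keep the example self-contained, Python threads sharing one object stand in for the processes and the shared memory segment, a lock stands in for the read-write locks, and an event stands in for the interrupt. The stopping criterion is replaced by a fixed sample count, and the TTF computation is a placeholder; all names are hypothetical.

```python
import random
import threading


class SharedState:
    """Stand-in for the shared memory object."""

    def __init__(self, seed, target_samples):
        self.lock = threading.Lock()      # guards the generator and the table
        self.rng = random.Random(seed)    # shared random number generator
        self.results = []                 # table of TTF samples
        self.stop = threading.Event()     # plays the role of the interrupt
        self.target_samples = target_samples


def worker(state):
    while not state.stop.is_set():
        with state.lock:
            draw = state.rng.random()     # seed material for one sample grid
        ttf = 5.0 + 10.0 * draw           # placeholder for simulating the grid
        with state.lock:
            if state.stop.is_set():
                return                    # another worker already finished
            state.results.append(ttf)
            # Placeholder stopping criterion: a fixed number of samples.
            if len(state.results) >= state.target_samples:
                state.stop.set()          # signal all other workers


def run(n_workers=4, target_samples=40, seed=0):
    state = SharedState(seed, target_samples)
    threads = [threading.Thread(target=worker, args=(state,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.results


results = run()
```

Checking the stop flag while holding the lock ensures that no sample is recorded after the stopping condition has been met by another worker.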

6.5 Experimental Results

A C++ implementation was written to test the proposed electromigration assessment methodologies. Two types of test grids were used to verify our approach: IBM power grids [26] and

internal grids. The IBM power grids are a suite of 8 power grid benchmarks drawn from real

industrial designs. The largest IBM grid has around 720K electrical nodes. In order to simulate larger benchmarks, we use internally generated power grids. The internal grids are our

own non-uniform grids, synthesized as per user specifications, including grid dimensions, metal

layers, pitch, and width per layer. The current sources are randomly placed on the grid. The

technology specifications are consistent with a 1 V, 45 nm CMOS technology. The details for the grids are as given in Table 6.1. The grid names prefixed with ‘ibm’ are IBM power grid

benchmarks and the grids PG1-PG7 are internal grids, which go up to 4.1M nodes.

The interconnect material is assumed to be Copper, and the physical constants used for

simulation are listed in Table 6.2. The configuration parameters for the numerical solvers and

the Monte Carlo MTF estimation procedure are as given in Table 6.3. The size of the reduced

order models generated using the Arnoldi Process was chosen to be min(0.05q, 100), where q is

the size of the original LTI system. To make the presentation clear, Table 6.4 lists the notation

we will use in the later sections for presenting the experimental results. Similar to the last

chapter, we will use two different machines to carry out the experiments. The first machine

Table 6.1: Details of Power Grids used in experiments.

Grid Name    Metal Layers   Nodes       Branches    Junctions   Trees    C4s     Current sources   v0,max (% vdd)
ibmpg1       2              6,085       10,853      11,562      709      100     5,387             4.4
ibmpg2       4              61,677      61,143      61,605      462      120     18,419            4.7
ibmpg3       5              410,011     401,412     409,601     8,189    494     100,527           4.4
ibmpg4       6              474,524     465,416     475,069     9,653    312     132,916           4.9
ibmpg5       3              248,838     495,656     497,658     2,002    100     236,600           4.7
ibmpg6       3              403,915     797,579     807,825     10,246   132     380,742           9.95
ibmpgnew1    6              315,951     698,101     717,629     19,528   494     178,965           4.8
ibmpgnew2    6              717,754     698,101     717,629     19,528   494     178,965           4.4
PG1⋆         8              36,862      36,189      36,862      673      9       2,448             4.9
PG2⋆         8              146,112     144,755     146,112     1,357    64      30,010            4.9
PG3          8              560,468     557,816     560,468     2,652    100     40,254            4.9
PG4          8              1,232,260   1,226,703   1,232,260   5,557    225     89,508            4.9
PG5          8              1,643,814   1,636,888   1,643,814   6,926    668     188,161           4.8
PG6          8              2,629,448   2,617,216   2,629,431   12,215   944     566,736           4.8
PG7          8              4,094,704   4,082,039   4,094,704   12,665   1,471   886,124           4.75

⋆ PG1 and PG2 will only be used for validating the filtering approach.

is a 12-core 3 GHz Linux machine with a Xeon CPU and 128 GB of RAM. The second machine is a quad-core 3.4 GHz Linux machine with a Core i7 CPU and 32 GB of RAM. Unless stated

otherwise, all simulations are run on the first machine.

We carried out many experiments with the following objectives in mind: a) To validate the

Filtering approach as presented in Section 6.4.3 by comparing its results with the main approach

of Section 6.4.2, b) To verify the correctness of proposed numerical methods (VCBDFk and

Predictor+Newton) in the context of power grid EM checking by comparing their results to the

results obtained from the RK45 solver, c) To check the accuracy of Black’s model for power grid

EM verification and d) To show that accounting for early failures in a power grid EM verification

is important. Finally, we will report the speed-up obtained by our tool due to parallelization,

the break-up of time consumed by various parts of the code and study the overall scalability of

our approach.

Table 6.2: Table of Physical constants

Symbol      Description                                       Value
B           Bulk modulus                                      135.21 GPa [88]
Ω           Atomic volume                                     1.66 × 10^−29 m^3
kb          Boltzmann’s constant                              1.38 × 10^−23 Joule/K
q∗          Effective charge                                  8.0109 × 10^−19 C [89]
σth         Critical stress threshold                         600 MPa [20]
δ           Thickness of void interface                       10^−9 m [67]
Tamb        Ambient temperature                               300 K (27 °C)
Tzs         Stress free annealing temperature                 623 K [22, 90]
αm − αsi    Difference in coefficients of thermal expansion   1.068 × 10^−5 K^−1
ρm          Resistivity of metal (Copper)                     2.1991 × 10^−8 ohm·m
ρb          Resistivity of barrier metal (Tantalum)           1.7082 × 10^−7 ohm·m

Table 6.3: Configuration parameters to be used for evaluating all power grid benchmarks

Symbol     Description                                              Value
N          Number of discretizations per branch                     16
εabs       Absolute error tolerance for ODE                         10^−6
εrel       Relative error tolerance for ODE                         10^−3
ζ          To ensure a (1 − ζ) × 100% confidence bound on MTF       0.05 (95% confidence)
εmc        Maximum relative error bound on estimated MTF            0.05 (max. 5% error)
vth        Voltage drop threshold for all nodes                     5% of vdd†
εnt,abs    Absolute error tolerance for stopping Newton iteration   10^−8
εnt,rel    Relative error tolerance for stopping Newton iteration   10^−5
tm,0       Initial value of active set cut-off threshold            20 years

† For ibmpg6, we use 10% of vdd.

Table 6.4: Notation used to simplify presentation

Symbol    Description
µ_s^x     Series MTF for a given grid estimated using x, where x is either a numerical method or an EM model.
µ_m^x     Mesh MTF for a given grid estimated using x.
t_P^x     The time taken to estimate the Mesh MTF using x with P parallel processes.

Figure 6.10: Comparing the main approach with the filtering approach for the first 5 grids showing (a) 95% confidence bounds on the estimated MTF, and the TTF samples obtained by each for (b) ibmpg2 and (c) ibmpg5.

6.5.1 Main Approach vs Filtering Approach

We will first verify that the filtering approach indeed leads to significant speed-ups with minimal

loss of accuracy as compared to the main approach. Table 6.5 compares the series and mesh

MTF as estimated using the main approach (µ_s^main and µ_m^main, respectively) and the filtering approach (µ_s^flt and µ_m^flt, respectively). We could only test the main approach on six of the

smallest benchmarks using the VCBDF2 solver due to memory and runtime constraints. From

Table 6.5, it is clear that as the grid size increases, the filtering approach leads to larger speed-

ups with negligible loss in accuracy. Note that both these approaches were parallelized by using

12 processes to carry out the different Monte Carlo iterations simultaneously. In Fig. 6.10a, we

show the 95% confidence bounds on MTF as obtained using the main approach and the filtering

approach, which can be seen to be almost identical. In Fig. 6.10b and 6.10c, we show the sample

TTF values, sorted in ascending order, obtained by both approaches for ibmpg2 and ibmpg5

grids, respectively, in the process of estimating the MTF. Clearly, for these grids, the filtering

approach does a good job of identifying the EM-susceptible trees because the TTF samples

Table 6.5: Comparison of Power grid MTF obtained using the Main Approach and the Filtering Approach.

          Main Approach                         Filtering Approach                   Error (%)        Speed-up
Grid      µ_s^main   µ_m^main   t_12^main       µ_s^flt   µ_m^flt   t_12^flt        Series   Mesh    t_12^main / t_12^flt
Name      (yrs)      (yrs)      (mins)          (yrs)     (yrs)     (mins)
ibmpg1    3.39       7.08       2.80            3.39      6.99      0.87            0.03     1.30    3.23x
ibmpg2    6.60       11.94      9.36            6.63      11.98     1.36            0.53     0.35    6.86x
ibmpg5    4.44       6.35       43.14           4.45      6.34      2.44            0.29     0.14    17.65x
PG1       6.81       11.73      1.14            6.81      11.69     0.21            0.08     0.36    5.52x
PG2       2.55       6.07       58.50           2.57      6.17      2.75            0.56     1.64    21.26x
PG3∗      4.39       15.19      469.37          4.36      16.50     2.62            0.65     8.64    179.45x

∗ The MC process for the main approach could not converge within the set time limit for PG3.

from both approaches are almost the same. This proves the value of the filtering approach, as

it makes MTF estimation using physics-based EM models scalable by focusing the computation

only on EM-susceptible trees. For all subsequent sections, we will use the filtering approach for

obtaining the MTF estimates.

6.5.2 Comparison of Performance and Accuracy between the solvers

Table 6.6 compares the performance and accuracy of the VCBDF2-VCBDF4 solvers presented

in chapter 5 in the context of power grid EM checking by comparing their run-time and the

estimated MTFs with those of the RK45 solver. The MTF estimation for all simulations is

parallelized using 12 processes, one running on each core. We use only the three fastest solvers,

VCBDF2-VCBDF4, for this comparison. Since the VCBDF solvers have been optimized for

the problem at hand, they are very fast as compared to the standard RK45 solver. Overall, for

the given benchmarks, VCBDF2 is 39.6x faster, VCBDF3 is 31.9x faster and VCBDF4 is 22.2x

faster than RK45. The VCBDF solvers are also accurate, with the average percentage error

in MTF estimation across the board being only around 1%. As expected, the error in MTF

estimation decreases as we move towards the higher order solvers.

We test the Predictor+Newton solver on the second machine which has a Core i7 CPU

because, as we saw in the last chapter, these solvers had consistently better performance on

the Core i7 CPU as compared to the Xeon CPU. For the baseline, we will again use the results

obtained using the RK45 solver. Table 6.7 shows the comparison. In spite of using reduced

order models, the Predictor+Newton method is an accurate method, with the error in the series

and mesh MTF being only 1% and 1.84%, respectively. We also compare the speed-up obtained

by the Predictor+Newton solver with respect to the RK45 solver. We realize that this is not a

good comparison, since the runtimes are obtained on different machines and the RK45 solver

Table 6.6: Comparing the performance and accuracy of VCBDF2-VCBDF4 methods for power grid EM checking using RK45 as reference

           RK45                       VCBDF2                           VCBDF3                           VCBDF4
Grid       µ^RK_s  µ^RK_m  t^RK_12    µ^BDF2_s  µ^BDF2_m  t^BDF2_12    µ^BDF3_s  µ^BDF3_m  t^BDF3_12    µ^BDF4_s  µ^BDF4_m  t^BDF4_12
Name       (yrs)   (yrs)   (mins)     (yrs)     (yrs)     (mins)       (yrs)     (yrs)     (mins)       (yrs)     (yrs)     (mins)
ibmpg1     3.51    7.04    6.12       3.39      6.99      0.87         3.55      7.06      1.26         3.53      7.06      1.42
ibmpg2     6.71    11.91   35.93      6.63      11.98     1.36         6.63      11.98     1.43         6.63      11.98     1.86
ibmpg3     4.56    7.02    326.34     4.57      6.96      4.56         4.56      6.83      6.45         4.54      6.99      10.58
ibmpg4     8.82    17.05   336.43     8.83      16.83     8.01         8.65      17.08     10.64        8.65      17.08     15.45
ibmpg5     4.52    6.17    15.28      4.45      6.34      2.44         4.43      6.33      2.95         4.43      6.33      3.42
ibmpg6     5.58    11.27   237.72     5.61      11.40     16.21        5.61      11.25     21.03        5.61      11.24     29.13
ibmpgnew1  4.01    13.28   39.56      3.97      13.18     5.67         3.99      13.18     6.78         3.99      13.18     8.16
ibmpgnew2  4.58    7.18    62.56      4.62      7.21      4.88         4.63      7.20      5.51         4.62      7.22      5.97
PG3        4.35    16.87   369.42     4.36      16.50     2.62         4.35      16.46     3.45         4.40      16.92     5.60
PG4        3.60    10.43   426.61     3.60      10.36     9.47         3.61      10.29     10.34        3.64      10.46     13.85
PG5        3.91    8.55    236.71     4.00      8.58      3.80         4.00      8.66      4.18         3.96      8.66      5.49
PG6        –       –       –          3.23      14.87     10.95        3.28      14.57     13.75        3.22      14.89     22.48
PG7        –       –       –          4.31      9.10      10.35        4.24      9.20      11.78        4.13      9.11      15.23

Average error/speed-up                1.05%     1.08%     39.6x        1.01%     1.16%     31.9x        1.01%     0.69%     22.2x

Table 6.7: Comparison of the RK45 solver (run on the first machine) and the Predictor+Newton solver on the second machine (Quad-core i7@3.4GHz)

           RK45                       Predictor+Newton                     Error (%)        Speed-up
Grid       µ^RK_s  µ^RK_m  t^RK_12    µ^prnew_s  µ^prnew_m  t^prnew_4     Series  Mesh     t^RK_12 / t^prnew_4
Name       (yrs)   (yrs)   (mins)     (yrs)      (yrs)      (mins)
ibmpg1     3.51    7.04    6.12       3.51       7.04       2.46          0.07    0.09     2.49x
ibmpg2     6.71    11.91   35.93      6.62       11.92      5.28          1.42    0.05     6.80x
ibmpg3     4.56    7.02    326.34     4.51       7.31       14.59         1.13    4.17     22.37x
ibmpg4     8.82    17.05   336.43     8.79       17.37      23.37         0.32    1.86     14.40x
ibmpg5     4.52    6.17    15.28      4.47       6.34       4.46          0.98    2.77     3.43x
ibmpg6     5.58    11.27   237.72     5.53       10.52      60.81         0.87    6.66     3.91x
ibmpgnew1  4.01    13.28   39.56      4.01       13.27      15.45         0.06    0.06     2.56x
ibmpgnew2  4.58    7.18    62.56      4.64       7.18       12.38         1.32    0.07     5.05x
PG3        4.35    16.87   369.42     4.30       17.29      8.42          1.29    2.52     43.87x
PG4        3.60    10.43   426.61     3.65       10.35      31.66         1.43    0.77     13.47x
PG5        3.91    8.55    236.71     3.99       8.66       12.67         2.09    1.24     18.68x

Average                                                                   1.00%   1.84%    12.46x

Table 6.8: Comparison of power grid MTF as estimated using Black’s model and Extended Korhonen’s model (with VCBDF2 solver).

           Black’s model                  EKM (using VCBDF2)               Comparison
Grid       µ^blk_s  µ^blk_m  t^blk_1      µ^ekm_s  µ^ekm_m  t^ekm_1       µ^ekm_s / µ^blk_s   µ^ekm_m / µ^blk_m   t^blk_1 / t^ekm_1
Name       (yrs)    (yrs)    (mins)       (yrs)    (yrs)    (mins)
ibmpg2     2.33     5.07     2.620        6.63     11.98    11.12         2.85x               2.36x               0.24x
ibmpg3     2.58     5.72     28.292       4.57     6.96     46.56         1.77x               1.22x               0.61x
ibmpg4     2.50     5.28     36.812       8.83     16.83    64.40         3.53x               3.19x               0.57x
ibmpg5     2.25     3.58     2.249        4.45     6.34     20.52         1.98x               1.77x               0.11x
ibmpg6     1.37     1.54     4.818        5.61     11.40    121.28        4.08x               7.42x               0.04x
ibmpgnew1  1.63     3.33     6.557        3.97     13.18    50.58         2.44x               3.95x               0.13x
ibmpgnew2  1.78     6.10     46.387       4.62     7.21     31.26         2.60x               1.18x               1.48x
PG3        8.65     15.49    7.717        4.36     16.50    29.15         0.50x               1.07x               0.26x
PG4        3.25     6.01     16.935       3.60     10.36    75.76         1.11x               1.72x               0.22x
PG5        3.83     8.69     44.499       4.00     8.58     27.87         1.04x               0.99x               1.60x
PG6        3.70     9.10     105.294      3.23     14.87    76.67         0.87x               1.63x               1.37x
PG7        2.43     5.23     62.652       4.31     9.10     95.70         1.77x               1.74x               0.65x

Average                                                                   2.05x               2.35x               0.60x

is parallelized with 12 processes, whereas the Predictor+Newton solver is parallelized with only 4 processes. Nevertheless, the Predictor+Newton solver is still 12.5x faster than the RK45 solver. If we extrapolate the runtimes of the Predictor+Newton solver to 12 cores by dividing them by a conservative scaling factor of 2.5, then this solver will be about as fast as the VCBDF3 solver. Also, on the Core i7 CPU, we found that the Predictor+Newton method is on average 2.7x faster than the VCBDF2 method.
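The extrapolation to 12 cores can be checked with a few lines of arithmetic. The sketch below is only illustrative: the variable names are hypothetical, and the numbers are the averages quoted in Tables 6.6 and 6.7 together with the scaling factor of 2.5 assumed in the text. Dividing every Predictor+Newton runtime by 2.5 multiplies every per-grid speed-up, and hence the average, by the same factor.

```python
# Averages quoted in the text (Tables 6.6 and 6.7); names are illustrative only.
avg_speedup_prnew_4proc = 12.46  # Predictor+Newton with 4 processes vs. RK45 with 12 processes
scaling_4_to_12 = 2.5            # conservative 4 -> 12 process scaling factor assumed in the text
avg_speedup_vcbdf3 = 31.9        # VCBDF3 with 12 processes vs. RK45 with 12 processes

# Dividing each runtime by the scaling factor multiplies each speed-up by it.
extrapolated_speedup = avg_speedup_prnew_4proc * scaling_4_to_12

print(f"Extrapolated Predictor+Newton speed-up: {extrapolated_speedup:.2f}x")
print(f"VCBDF3 speed-up:                        {avg_speedup_vcbdf3:.2f}x")
```

Under these assumptions the two speed-ups differ by only a few percent, which is the sense in which the two solvers are comparable.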

From the results, we can see that using the VCBDF2 solver, the run-time for the most

difficult to solve grid (ibmpg6) is only around 16.2 minutes and the run time for the largest grid

(PG7) is around 10.4 minutes. This shows the scalability of our approach for large grids, which

has been made possible by a combination of optimized numerical methods and good filtering

techniques.

6.5.3 Black’s Model vs. EKM for grid MTF estimation

Table 6.8 lists the series and mesh MTFs obtained using Black’s model and the Extended

Korhonen’s model (EKM) proposed in this work. Columns µblks and µblkm denote respectively

the estimated series and mesh MTFs when Black’s model was used to determine branch TTFs

[25, 47]. We calibrate the Black’s model based on data obtained from Korhonen’s model, so that

for a finite line, the MTF predicted by Black’s model and EKM are the same. The columns µekms

and µekmm list the series and mesh MTFs, respectively, estimated using the filtering approach

given in Section 6.4.3 with the VCBDF2 solver. From the table, we note that µekms > µblks

for all grids except PG3 and PG6, and µekmm > µblkm for all grids except PG5 (for which it is

almost equal). Overall, the mesh (series) MTF estimated using EKM is 2.35x (2.05x) longer than that found using Black’s model (with the ratio µekmm /µblkm being as high as 7.42x for ibmpg6). This serves as evidence of how Black’s model can lead to over-design of grids: Suppose that for grid ibmpg4, the target mesh MTF is 10 years. If the grid EM sign-off was done using Black’s model, ibmpg4 would have failed the sign-off because its mesh MTF estimated using Black’s model is 5.28 years. Thus, a designer would conclude that the lines need to be widened in order to achieve the target MTF. However, taking into account the material flow between connected branches, we can see that the grid survives for 16.8 years. In fact, the designer can even reduce the cross-sectional area of branches to reduce metal usage, because there is an extra reserve of more than 6 years. Thus, the use of Black’s model leads to over-design of grids and leaves a lot of margin on the table.

Next, we compare the performance of the Black’s model based mesh MTF estimation engine with the EKM engine. Since the code that estimates the MTF using Black’s model is not parallelized, we report the sequential run-times for our approach, whereby all Monte Carlo iterations are performed in a single process. The last column of Table 6.8 lists the ratio of the Black’s model runtime to the EKM runtime; on average, our approach runs at 0.6x the speed of the Black’s model based approach. However, the fact that our approach, which solves large PDE systems for several trees, is only this much slower than an approach based on a simple empirical model is quite encouraging.
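To make the direction of the ratios in the comparison columns of Table 6.8 explicit, the following sketch recomputes them for two rows. It assumes that the MTF columns are EKM divided by Black's model and that the runtime column is Black's runtime divided by EKM runtime; the function and dictionary names are hypothetical. Because the table entries are rounded, recomputed ratios can differ from the printed ones in the last digit for other rows.

```python
# Rows copied from Table 6.8:
# (mu_s_blk, mu_m_blk, t_blk, mu_s_ekm, mu_m_ekm, t_ekm), MTFs in years, runtimes in minutes.
rows = {
    "ibmpg2": (2.33, 5.07, 2.620, 6.63, 11.98, 11.12),
    "PG5":    (3.83, 8.69, 44.499, 4.00, 8.58, 27.87),
}

def comparison_ratios(row):
    """Return (series MTF ratio, mesh MTF ratio, runtime ratio) for one table row."""
    mu_s_blk, mu_m_blk, t_blk, mu_s_ekm, mu_m_ekm, t_ekm = row
    series_ratio = mu_s_ekm / mu_s_blk   # > 1 means EKM predicts a longer series MTF
    mesh_ratio = mu_m_ekm / mu_m_blk     # > 1 means EKM predicts a longer mesh MTF
    runtime_ratio = t_blk / t_ekm        # < 1 means the EKM engine is slower
    return series_ratio, mesh_ratio, runtime_ratio

ratios = {name: comparison_ratios(row) for name, row in rows.items()}
for name, (s, m, t) in ratios.items():
    print(f"{name}: series {s:.2f}x, mesh {m:.2f}x, runtime {t:.2f}x")
```

A runtime ratio below 1 therefore means that the EKM engine took longer than the Black's model engine for that grid.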

6.5.4 Effect of Early Failures

In order to assess the impact of early failures on the grid lifetime, we present a case study

using the ibmpg2 grid; we estimate its mesh MTF under two settings, one where early failure

detection is on and the other where early failure detection is turned off. As can be seen from

Fig. 6.11b, turning off early failures gives an optimistic MTF estimate which is 34% longer

than the actual MTF. Thus, if the target product lifetime is set as 15 yrs, this grid will fail

EM sign off due to the impact of early failures, but would erroneously succeed if early failures

are ignored. The difference in MTFs stems from the influence of early failures on node voltage

drops. In Fig. 6.11a, we show how the maximum node voltage drop changes with time (for one

sample grid) as the voids nucleate due to EM. Since early failures lead to removal of a via, their

impact on voltage drops is more severe, which ultimately leads to shorter lifetimes. In general,

the effect of early failures gets more pronounced as the difference between the maximum initial

voltage drop and vth increases.

Statistical analysis of EM failures in copper interconnects often shows bimodal distributions

due to the presence of early failures [29]. A similar bimodal distribution can be observed in

Figure 6.11: Impact of early failures (EF) on (a) the maximum voltage drop (shown for one sample grid) and (b) estimated mesh MTF for ibmpg2. Maximum voltage drop at t = 0 is 3.8% vdd, and vth = 5% vdd. [Plots; x-axis: Time (yrs); curves: EF detection ON and EF detection OFF; annotated values: TTF = 11.56 yrs and TTF = 16.37 yrs, MTF = 12.46 yrs and MTF = 16.74 yrs.]

Figure 6.12: Statistics of mesh TTF samples for ibmpg2 grid shows an underlying bimodal distribution for different modes of grid failure. MTFA = 6.67 yrs, MTFB = 7.99 yrs, MTFall = 7.66 yrs. [Plots; x-axis: Time (yrs); (a) probability plot with Mode A, Mode B and their fits; (b) empirical pdf for Mode A, Mode B and all TTF samples.]

the statistics for mesh TTF samples obtained using our power grid EM analysis. Consider the

following two failure modes for a given sample grid : Mode A, in which all junction failures

that lead to grid failure are early failures and Mode B, where at least one junction failure is

a conventional failure. Fig. 6.12a and Fig. 6.12b show, respectively, the probability plot and the empirical pdf for the two failure modes obtained using 2500 mesh TTF samples from ibmpg2.

Since the pdf for failure modes A and B have a lot of overlap, the overall distribution is almost

normal.

Figure 6.13: Bar chart comparing speed-ups obtained using 4, 8 and 12 parallel processes with respect to sequential code. Higher is better. [x-axis: Speed-up; bars for ibmpg1, ibmpg2, ibmpg3, ibmpg4, ibmpg5, ibmpg6, ibmpgnew1, ibmpgnew2 and PG7, with P = 4, P = 8 and P = 12.]

Figure 6.14: The figure shows how tm is updated for (a) ibmpg2 and (b) ibmpg5 with MC iterations for P parallel processes. [x-axis: MC iteration number; curves for P = 1, P = 4, P = 8 and P = 12.]

6.5.5 Speed-up due to parallelization

The speed-ups obtained with the multi-process architecture are shown in Fig. 6.13. All speed-ups are calculated relative to the sequential runtime, in which all MC iterations are performed in a single process. We obtained an average speed-up of 3.4x with 4 parallel processes, 5.7x with 8 parallel processes and 8.5x with 12 parallel processes. The reason for this sub-linear speed-up is the ‘slow’ update of tm in the parallelized version as compared to the sequential version. Fig. 6.14 shows this phenomenon. Recall that the initial value of tm is 20 years, and it is updated as more TTF samples are obtained. A higher value of tm leads to a longer runtime for a given MC iteration, because more trees are included in the active set. In the sequential version, by design only the first 5 Monte Carlo iterations run with tm = 20 years, after which all subsequent iterations use updated values of tm. On the other hand, for a parallelized version with P processes, the first P Monte Carlo iterations will run with tm = 20 years. This affects the scalability of our parallel version and results in sub-linear speed-ups.

Figure 6.15: Breakdown of the total runtime (in terms of percentages) consumed by different tasks in the code. [Stacked bar chart; x-axis: Percent of total time; bars for ibmpg2, ibmpg3, ibmpg4, ibmpg5, ibmpg6, ibmpgnew1, ibmpgnew2 and PG7; tasks: Extract power grid trees; Calculate v0 using cholmod; Compute initial temp. dist. + prepare trees; Generate sample grid + re-init data structures; Find the active set; Simulate; Synchronize; Update voltage drops using PCG.]
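One way to quantify the sub-linear behaviour reported in Section 6.5.5 is the parallel efficiency, i.e. the speed-up divided by the number of processes. The short sketch below applies this standard definition to the three average speed-ups quoted in the text; the variable names are hypothetical and no per-grid data are assumed.

```python
# Average speed-ups quoted in the text, keyed by the number of parallel processes P.
average_speedup = {4: 3.4, 8: 5.7, 12: 8.5}

# Parallel efficiency = speed-up / P; a value of 1.0 would be ideal linear scaling.
parallel_efficiency = {p: s / p for p, s in average_speedup.items()}

for p in sorted(parallel_efficiency):
    print(f"P = {p:2d}: speed-up {average_speedup[p]:.1f}x, "
          f"efficiency {parallel_efficiency[p]:.1%}")
```

The efficiency drops as P grows, which is consistent with the explanation that the first P iterations all run with the initial value of tm.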

6.5.6 Break-up of time consumed by different tasks in the code

In Fig. 6.15, we show the percentage of time consumed by different tasks in the code while

estimating the MTF using VCBDF2 solver with 12 parallel processes. These percentages are

based on the total time taken by the respective task across multiple calls. From the figure, it

can be seen that overall, the majority of the runtime is spent in doing the following three tasks:

1) finding the next junction failure using the Sort-Simulate-Synchronize steps (consumes ∼36% on average across all grids), 2) finding the active set (consumes ∼35% on average) and 3) updating the node voltage drops using PCG after void nucleation(s) (consumes ∼16.5% on average). The synchronize step consumes only around 0.19% of the runtime, which shows that the sort step does a good job of ordering the trees. The other tasks (in Fig. 6.15) mainly consist of updating the temperature distribution and the MTF estimate, checking the stopping criteria and updating the power grid structure after void nucleation.

Figure 6.16: (a) t^BDF2_12 vs. branch count for all test grids and (b) scalability analysis for grids that only have straight trees. [x-axis: Number of branches (×10^6); y-axis: t^BDF2_12.]

6.5.7 Overall scalability of the approach

Fig. 6.16a shows the run-times for all the grids we tested, plotted in ascending order of their

branch count. We use branch count as a measure of problem size because from (4.32), a

higher branch count leads to a larger LTI system. Overall, the runtime increases as we move

towards grids with higher branch count, but this increase is erratic because a host of other

factors, such as the geometry of the tree, the stiffness of the tree LTI system, the difference

between the maximum initial voltage drop and vth and the sensitivity of node voltage drops to

branch resistance values, also influence the runtime. If we select grids that have only straight

metal stripes as trees (i.e. no T or plus junctions), then we get a more consistent trend, as

shown in Fig. 6.16b. Computing the empirical complexity for these grids using the function $t^{BDF2}_{12} = a n^{b}$, with n being the number of branches and the exponent b being the scalability factor, gives a = 0.0052 and b = 0.5045. The reason for this sub-linear scalability can be mainly attributed

to the observation that the percentage of trees which become a part of the active set reduces

as the grid size increases, so that the increment in computation is less than the corresponding

increment in the problem size. If this trend continues for larger grids as well, and we ignore the

effect of other factors, then a simple calculation shows that a power grid with a billion branches

can be solved in around 3 hours.
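The "simple calculation" behind the 3 hour figure can be reproduced by evaluating the fitted power law at n = 10^9 branches. The sketch below assumes that the fitted runtime is expressed in minutes (the unit used for the runtimes in the tables of this chapter); the function name is hypothetical.

```python
# Fitted coefficients quoted in the text for grids with only straight trees.
A_COEFF = 0.0052
B_EXPONENT = 0.5045

def predicted_runtime_minutes(num_branches: float) -> float:
    """Evaluate the empirical model t = a * n**b (assumed to be in minutes)."""
    return A_COEFF * num_branches ** B_EXPONENT

billion_branch_minutes = predicted_runtime_minutes(1e9)
billion_branch_hours = billion_branch_minutes / 60.0

print(f"Predicted runtime for 1e9 branches: {billion_branch_minutes:.1f} min "
      f"({billion_branch_hours:.2f} h)")
```

This is an extrapolation far beyond the range of the tested grids, so it only holds if the fitted trend continues and the other factors listed above are ignored, as stated in the text.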

Chapter 7

Conclusions and Future Work

A well-designed power grid in Integrated Circuits (ICs) not only must perform as desired, but it

should also survive and function as intended for a target lifetime before failing. As the modern

designs become more complex and the structural dimensions of electronic interconnects become

ever-smaller due to technology scaling, electromigration (EM) has emerged as a major reliability

concern for modern on-die power grids. Modern power grids are huge, and can have up to a billion nodes. Due to the scale of the problem, only the simplest EM methods have been used so far for power grid EM analysis. State of the art industrial EM checking tools use Black’s model for branch failure combined with a series model for grid failure to determine system reliability under the influence of EM. While this has served the purpose for the last 40 years, we are now at a stage where the simplicity and pessimism of the EM tools, which were once their virtues, are now acting against them. Technology scaling has increased branch current densities, which has drastically reduced the EM lifetimes. Thus, the industrial EM tools, due to their pessimistic approach, are now unable to provide any breathing room for designers, who are forced to overuse metal resources in designing the power grids.

This necessitates an EM checking tool that moves away from the overly simplistic EM models

and is scalable so that it can be applied to large power grids. In this work, we developed such

an approach. We proposed the Extended Korhonen’s Model (EKM) that can track stress in

multi-branch interconnect trees of arbitrary geometry. We then showed that this model can be

expressed as an LTI system and developed fast and scalable numerical methods for solving these

LTI systems. Finally, we developed a scalable approach for power grid EM checking using a filtering scheme that determines upfront the set of trees that are most likely to impact the failure of a grid, and then focuses the computation on those trees. The techniques developed in this

work have allowed the EM verification of a 4.1M node grid using physics-based models in only

∼10 minutes, which has not been done before. The results and the studies done in this work

clearly demonstrate that Black’s model is inaccurate, and one should move to physics-based

models to estimate EM degradation accurately.

There are many avenues to further extend this work. A desirable extension would be to

develop a budgeting framework (akin to SEB for the series model) that will enable the chip


level reliability to be traded between different parts of the grid using the Extended Korhonen’s

model. Also, since we have an LTI system representation, there might be a way to bypass

the Monte Carlo iterations altogether by developing an effective statistical model that directly

evaluates the mesh model based MTF. Some other proposed extensions include incorporating

current crowding in our EM tool and solving the reverse problem: given a power grid and a target MTF, how can we generate current constraints that guarantee the grid survival up to the target MTF?

Appendices

Appendix A

Properties of system matrix A

In this section, we will provide the proof for theorem 1. The proof depends on the properties of 1) irreducible and 2) non-negative matrices. As such, we will start with the definitions of such matrices, and state the supporting theorems and lemmas that we will use to prove theorem 1. The proof also appeals to some simple graph theory concepts, which we will review a little later. Wherever possible, we have grouped the definitions/concepts so that they precede the theorem or lemma where they will be used, to make the presentation clear.

Definition 2. A square matrix $M = [m_{i,k}] \in \mathbb{R}^{n \times n}$ induces a directed graph Γ(M) whose vertices are 0, 1, 2, . . . , n − 1, and whose directed edges are i → k if $m_{i,k} \neq 0$. We call Γ(M) the directed graph of matrix M.

Definition 3. If there is a directed path in the graph from every vertex to every other vertex,

then the graph is said to be strongly connected.

Definition 4. A matrix $M = [m_{i,k}]$ is said to be irreducible if Γ(M) is strongly connected [91].

Lemma 1. A is an irreducible matrix for both pre-void and post-void phase.

Proof. For any subtree T, consider two weighted directed graphs G(T) and G′(T), where the discretized points are the vertices and any two adjacent points have an edge between them. In G(T), the direction of each edge is the same as the reference direction assigned to the branches and in G′(T), the direction of each edge is opposite to the assigned reference direction, so that G′(T) is the converse of G(T). For G(T) (G′(T)), the weight of a directed edge i → k (k → i) between adjacent vertices is equal to $a_{i,k}$ ($a_{k,i}$), which can be determined using (4.8)-(4.12). Fig. A.1b and A.1c show the graphs G(T) and G′(T) for the interconnect tree of Fig. A.1a. A weighted adjacency matrix can be used to represent the connectivity of a graph. If a graph has n nodes, then the adjacency matrix will be of size n × n. The entry in the ith row and kth column of a weighted adjacency matrix is equal to the weight of edge i → k if it exists, otherwise it is 0. Let $W = [w_{i,k}]$ and $W' = [w'_{k,i}]$ be the weighted adjacency matrices for graphs G(T) and G′(T), respectively. Then, $w_{i,k} = a_{i,k}$ and $w'_{k,i} = a_{k,i}$. Now, we can write A as

A = W + Ad +W′, (A.1)

Figure A.1: (a) A typical interconnect tree T with its corresponding graphs (b) G(T), (c) the converse G′(T) and (d) part of graph Γ(A) for any two adjacent points i and k. Here, N = 4 and the vertex at n1 is the root.

where Ad is simply a diagonal matrix whose ith diagonal entry is equal to ai,i, the ith diagonal

entry of A. From (A.1), it is clear that Γ(A) = G(T ) ∪ G′(T ) ∪ Γ(Ad), so that for any two

adjacent vertices i and k, Γ(A) has both edges i→ k and k → i, as shown in Fig. A.1d. Thus,

in Γ(A) there is always a path from every vertex to every other vertex. Hence Γ(A) is strongly

connected and A is irreducible.

Definition 5. A matrix $M = [m_{i,k}]$ is said to be diagonally dominant if $|m_{i,i}| \geq \sum_{k \neq i} |m_{i,k}| \ \forall i$.

Lemma 2. A is diagonally dominant for both pre-void and post-void phase.

Proof. From the state stamps (4.8)-(4.12), we have for the ith row in A

\[ |a_{i,i}| > \sum_{k=0,\,k \neq i}^{q} |a_{i,k}|, \quad \text{for a voided diffusion barrier}, \tag{A.2a} \]

\[ |a_{i,i}| = \sum_{k=0,\,k \neq i}^{q} |a_{i,k}|, \quad \text{otherwise}. \tag{A.2b} \]

Thus, A is diagonally dominant.

Figure A.2: All paths starting from the root and ending in a diffusion barrier for (a) G(T) and (b) the corresponding converse paths in G′(T).

We will now review some simple graph theory concepts that are required to state the proof.

For any two vertices i and k, if there exists a directed path from i to k, then i is said to be an

ancestor of k and k is a descendant of i. In addition, if i and k are adjacent points, then i is

the parent of k and k is the child of i. In G(T ), each vertex has at most one parent while in

G′(T ), each vertex has at most one child. Note that there is only one vertex in G(T ) that has no parents. The same vertex in G′(T ) has no children. We will designate this vertex as the root

for both G(T ) and G′(T ). Another concept that we will appeal to is a linear graph or a path.

A path is a tree where each vertex has at most one child or equivalently at most one parent.

Consider all paths in G(T ) that start from the root and end at a diffusion barrier, as shown in

Fig. A.2a. Clearly, the union of all such paths is equal to the graph itself. Since G′(T ) is the

converse of G(T ), the paths in G′(T ) are the converse of the paths in G(T ) (see Fig. A.2b).

We will now state all the remaining definitions and theorems, followed by the final proof.

Definition 6. A matrix M = [mi,k] is said to be non-negative if mi,k ≥ 0 ∀i, k.

Definition 7. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the (real or complex) eigenvalues of a matrix $M = [m_{i,k}] \in \mathbb{R}^{n \times n}$. Then the spectral radius κ(M) is defined as

\[ \kappa(M) = \max\{|\lambda_1|, |\lambda_2|, \ldots, |\lambda_n|\}. \tag{A.3} \]

Theorem 2. (Perron-Frobenius theorem) Let M ∈ Rn×n and suppose that M is irreducible

and non-negative. Then κ(M) > 0 is a simple eigenvalue of M with an associated positive

eigenvector.

Definition 8. A matrix $M = [m_{i,k}] \in \mathbb{R}^{n \times n}$ is said to be irreducibly diagonally dominant if M is irreducible, all its rows are diagonally dominant and there is at least one row i that satisfies $|m_{i,i}| > \sum_{k=0,\,k \neq i}^{n} |m_{i,k}|$.

Theorem 3. An irreducibly diagonally dominant matrix is non-singular.

The proofs for theorems 2 and 3 have been provided in [91].

A.1 Proof of theorem 1

Part (a). We will first prove part 1(a), by proving the following statements for the system

matrix A of subtree $T = \{N, B\}$ in the pre-void phase:

(i) All eigenvalues of A have non-positive real parts.

(ii) There is exactly one eigenvalue at 0.

(iii) All eigenvalues of A are real.

Proving (i). From the Gershgorin disc theorem [78], all eigenvalues of A are located in the union of q + 1 discs

\[ \bigcup_{i=0}^{q} \left\{ z \in \mathbb{C} : |z - a_{i,i}| \leq \sum_{k=0,\,k \neq i}^{q} |a_{i,k}| \right\} \equiv G(A). \tag{A.4} \]

From diagonal dominance, we always have $|a_{i,i}| \geq \sum_{k=0,\,k \neq i}^{q} |a_{i,k}|$ and $a_{i,i} < 0 \ \forall i$. Thus, G(A) lies in the left half of the complex plane, touching the imaginary axis at the origin. Hence, all eigenvalues of A have non-positive real parts.

Proving (ii). In the pre-void phase, all the row sums in A are zero. Thus, we must have at least one eigenvalue at 0. Indeed, we have $Ay = 0$ for $y = [1 \ 1 \ \ldots \ 1]^T$ or a multiple thereof. Thus, y is an eigenvector for the 0 eigenvalue. Define

\[ A_c = A + cI, \tag{A.5} \]

where $c = \max_i |a_{i,i}|$. Then, clearly $A_c$ is non-negative and irreducible because it is obtained by only adding c to the diagonal entries of A (non-diagonal elements are unaffected). Also, if $\lambda_0 \geq \lambda_1 \geq \ldots \geq \lambda_q$ are the eigenvalues of A (including multiplicities), then $\lambda_0 + c \geq \lambda_1 + c \geq \ldots \geq \lambda_q + c$ are the eigenvalues of $A_c$. From (i), and since all eigenvalues of A are real as shown in (iii), we know that all eigenvalues are non-positive; thus $\lambda_0 = 0$ is the largest eigenvalue of A and $\lambda_0 + c = c$ is the largest eigenvalue of $A_c$. But, since $A_c$ is non-negative and irreducible, we must have $\kappa(A_c) = c$. By the Perron-Frobenius theorem, c is a simple eigenvalue of $A_c$. Hence, 0 is a simple eigenvalue of A.

Proving (iii). We will prove this by showing that A is similar to a symmetric matrix. For

this, we restate (A.1)

A = W + Ad +W′. (A.6)

Let $D = [d_{i,k}] \in \mathbb{R}^{(q+1) \times (q+1)}$ be a diagonal matrix ($d_{i,k} = 0$ if $k \neq i$) and $S \triangleq D^{-1} A D$ be a matrix similar to A. Then

\[ S = D^{-1} W D + D^{-1} A_d D + D^{-1} W' D = D^{-1} W D + A_d + D^{-1} W' D. \tag{A.7} \]

For S to be symmetric, we must have $S = S^T$, so that

\[ D^{-1} W D + A_d + D^{-1} W' D = \left( D^{-1} W D + A_d + D^{-1} W' D \right)^T = D W^T D^{-1} + A_d + D (W')^T D^{-1}. \tag{A.8} \]

Note that by construction, we have $w_{k,i} \neq 0 \iff w'_{i,k} \neq 0$. Thus, the structure (sparsity pattern) of W and $(W')^T$ is the same. This is to be expected because G′(T) is the converse of G(T). Then, from (A.8), S will be symmetric if we can find a diagonal matrix D such that

\[ D^{-1} W D = D (W')^T D^{-1}, \tag{A.9} \]

which in turn requires

\[ w_{i,k} \frac{d_{k,k}}{d_{i,i}} = w'_{k,i} \frac{d_{i,i}}{d_{k,k}} \implies a_{i,k} \frac{d_{k,k}}{d_{i,i}} = a_{k,i} \frac{d_{i,i}}{d_{k,k}} \implies (d_{k,k})^2 = \frac{a_{k,i}}{a_{i,k}} (d_{i,i})^2. \tag{A.10} \]

Thus, if (A.10) is satisfied for all edges i → k in G(T) and k → i in G′(T), S will be symmetric. To show that such a satisfying assignment is possible, consider a path in G(T) that starts from the root and ends at any diffusion barrier. For every edge P(k) → k in the path, where P(k) denotes the (only) parent of vertex k in the path, we enforce the following condition

\[ (d_{k,k})^2 = \left( \frac{a_{k,P(k)}}{a_{P(k),k}} \right) (d_{P(k),P(k)})^2, \tag{A.11} \]

that leads to the following transitive relation for any vertex k in the path

\[ (d_{k,k})^2 = \left( \frac{a_{k,P(k)}}{a_{P(k),k}} \right) \left( \frac{a_{P(k),P(P(k))}}{a_{P(P(k)),P(k)}} \right) \cdots \left( \frac{a_{C(r),r}}{a_{r,C(r)}} \right) (d_{r,r})^2. \tag{A.12} \]

Here, r is the index of the root vertex and C(k) denotes the (only) child of vertex k in the path. If we choose $d_{r,r} \neq 0$, then we can uniquely determine all $d_{k,k}$ values, corresponding to vertex k in the path, as we traverse it starting from the root. By traversing all the paths starting from the root and ending at diffusion barriers, we can determine the D matrix, such that S is symmetric. A, being similar to a symmetric matrix, will have real eigenvalues.

Part (b). In the post-void phase, the system matrix A will have at least one voided diffusion barrier. Hence, there will be at least one row i that satisfies $|a_{i,i}| > \sum_{k=0,\,k \neq i}^{q} |a_{i,k}|$ [from (4.11)]. Thus, as per definition 8, A is irreducibly diagonally dominant and hence, non-singular. Also, as we did for the pre-void phase, we can prove that all eigenvalues of A are real and non-positive. However, since 0 cannot be an eigenvalue of A in the post-void phase, all eigenvalues of A are negative real numbers.
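The constructive argument used in part (iii), where the diagonal matrix D is built by walking from the root towards the diffusion barriers, can be illustrated on a small example. The sketch below is not the thesis implementation: the five-vertex tree and all coupling values are hypothetical, and the only properties borrowed from the text are positive off-diagonal entries between adjacent vertices, negative diagonal entries and zero row sums (pre-void phase). It builds A, constructs D from the parent-to-child recurrence $(d_{k,k})^2 = (a_{k,P(k)}/a_{P(k),k})\,(d_{P(k),P(k)})^2$, and checks that $S = D^{-1} A D$ is symmetric and that every Gershgorin disc touches the origin from the left.

```python
import math

# Hypothetical tree with 5 discretized points; vertex 0 is the root.
# PARENT[k] is the parent of vertex k (parents have smaller indices than children).
N_VERTICES = 5
PARENT = {1: 0, 2: 1, 3: 1, 4: 3}

# Hypothetical positive couplings for each edge p -> k, stored as (a[p][k], a[k][p]).
COUPLING = {
    (0, 1): (2.0, 1.0),
    (1, 2): (3.0, 1.5),
    (1, 3): (0.5, 2.0),
    (3, 4): (1.0, 4.0),
}

def build_system_matrix():
    """Assemble A with positive off-diagonals and zero row sums (pre-void phase)."""
    n = N_VERTICES
    A = [[0.0] * n for _ in range(n)]
    for (p, k), (a_pk, a_kp) in COUPLING.items():
        A[p][k] = a_pk
        A[k][p] = a_kp
    for i in range(n):
        A[i][i] = -sum(A[i][k] for k in range(n) if k != i)
    return A

def symmetrizing_diagonal(A, d_root=1.0):
    """Apply d_k^2 = (a[k][P(k)] / a[P(k)][k]) * d_P(k)^2 from the root outwards."""
    d = [0.0] * N_VERTICES
    d[0] = d_root
    for k in sorted(PARENT):          # visits every parent before its children
        p = PARENT[k]
        d[k] = d[p] * math.sqrt(A[k][p] / A[p][k])
    return d

def similarity_transform(A, d):
    """Return S = D^{-1} A D for the diagonal matrix D = diag(d)."""
    n = N_VERTICES
    return [[A[i][k] * d[k] / d[i] for k in range(n)] for i in range(n)]

def is_symmetric(S, tol=1e-12):
    n = len(S)
    return all(abs(S[i][k] - S[k][i]) <= tol for i in range(n) for k in range(n))

def gershgorin_rightmost(A):
    """Largest real value reached by any Gershgorin disc (centre + radius)."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][k]) for k in range(n) if k != i)
               for i in range(n))

A = build_system_matrix()
d = symmetrizing_diagonal(A)
S = similarity_transform(A, d)

print("row sums of A:", [sum(row) for row in A])
print("S is symmetric:", is_symmetric(S))
print("rightmost Gershgorin point:", gershgorin_rightmost(A))
```

Since S is symmetric and similar to A, the eigenvalues of this example matrix are real, and the Gershgorin check confirms that none of them can be positive. In the post-void phase one diagonal entry would become strictly dominant, which is the case covered by Theorem 3.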

A.2 Special Case

For a subtree that only has dotted-I junctions and diffusion barriers, we can also prove that

system matrix A has distinct eigenvalues. The proof relies on the following theorem, which has

been stated and proved in [92].

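Theorem 4. Let $A \in \mathbb{R}^{n \times n}$ be a tridiagonal matrix

\[ A = \begin{bmatrix} a_1 & b_2 & & & \\ c_2 & a_2 & b_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & b_n \\ & & & c_n & a_n \end{bmatrix}. \]

Then A has n real and distinct eigenvalues if it satisfies the following three conditions for i = 2, 3, . . . , n:

i) A is irreducible, i.e. $b_i c_i \neq 0$.

ii) A is diagonally dominant, i.e. $|a_i| \geq |b_i| + |c_i|$.

iii) $\mathrm{sign}(b_i c_i) = \mathrm{sign}(a_{i-1} a_i)$.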

Lemma 3. If a subtree only has diffusion barriers and dotted-I junctions, all eigenvalues of

system matrix A = [ai,k] obtained using state-stamps (4.8)-(4.12) are real and distinct for both

pre-void and post-void phase.

Proof. Consider a subtree T that has only diffusion barriers and dotted-I junctions. Clearly, T will have only two diffusion barriers at the two ends with multiple dotted-I junctions in between.

Without loss of generality, we will assume that the indices are ordered, so that the index of

a parent is always less than the index of its child. This imposes a complete ordering on the

assigned indices to the discretized points, so that if the leftmost diffusion barrier was chosen

as the root, the indices would increase as we go from left to right, with the rightmost diffusion

barrier having the largest assigned index. For such a case, the system matrix A = [ai,k] obtained

using state-stamps (4.8)-(4.12) will be tridiagonal. In addition it satisfies all conditions stated

in theorem 4:

i. A is irreducible (see Lemma 1)

ii. A is diagonally dominant (see Lemma 2).

iii. Since $a_{i,i} < 0 \ \forall i$ and $a_{i,k} > 0$ for $k \neq i$, we always have $a_{k,k} a_{k-1,k-1} > 0$ and $a_{k,k-1} a_{k-1,k} > 0$, so that both products have the same sign.

Thus, from theorem 4, all eigenvalues of A are real and distinct.

Appendix B

The math behind the Filtering

approach

In this section, we will provide step by step details of the integration of (6.12) that leads to

(6.13). We will also show how we get the expression for δµζ , the (1 − ζ) × 100% confidence

bound on µ.

B.1 Integration details

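Let us denote the RHS of (6.12) by I. Now using (6.11) in (6.12), we can write

\[ I = \int_{t_m}^{\infty} \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{z - \mu}{v\sqrt{2}} \right) \right] dz. \tag{B.1} \]

Let $y \triangleq \frac{z - \mu}{v\sqrt{2}} \implies dz = v\sqrt{2}\, dy$ and $h \triangleq \frac{t_m - \mu}{v\sqrt{2}}$. Then, (B.1) becomes

\[ I = \frac{v}{\sqrt{2}} \int_{h}^{\infty} \mathrm{erfc}(y)\, dy, \tag{B.2} \]

where we used erfc(y) = 1 − erf(y). From the definition of erfc, we get

\[ \int_{h}^{\infty} \mathrm{erfc}(y)\, dy = \int_{h}^{\infty} \left( \frac{2}{\sqrt{\pi}} \int_{y}^{\infty} e^{-u^2} du \right) dy. \tag{B.3} \]

Using integration by parts, we get:

\begin{align*}
\int_{h}^{\infty} \mathrm{erfc}(y)\, dy
&= \left[ y \frac{2}{\sqrt{\pi}} \int_{y}^{\infty} e^{-u^2} du \right]_{h}^{\infty} - \int_{h}^{\infty} y \frac{d}{dy} \left( \frac{2}{\sqrt{\pi}} \int_{y}^{\infty} e^{-u^2} du \right) dy \\
&= \frac{2}{\sqrt{\pi}} \left[ y \int_{y}^{\infty} e^{-u^2} du \right]_{h}^{\infty} - \int_{h}^{\infty} y \left( -\frac{2}{\sqrt{\pi}} e^{-y^2} \right) dy \quad \text{(using Leibniz rule)} \\
&= \frac{2}{\sqrt{\pi}} \left[ y \int_{y}^{\infty} e^{-u^2} du \right]_{h}^{\infty} - \frac{1}{\sqrt{\pi}} \left[ e^{-y^2} \right]_{h}^{\infty} \quad \text{(integrating with the substitution } w = y^2\text{)} \\
&= \frac{2}{\sqrt{\pi}} \left[ 0 - h \int_{h}^{\infty} e^{-u^2} du \right] - \frac{1}{\sqrt{\pi}} \left[ 0 - e^{-h^2} \right] \\
&= -h\, \mathrm{erfc}(h) + \frac{1}{\sqrt{\pi}} e^{-h^2}.
\end{align*}

By substituting h, and replacing the corresponding expressions for the standard normal cdf Φ(·) and pdf φ(·), we get the final expression

\begin{align*}
I &= -\frac{t_m - \mu}{2} \left[ 1 - \mathrm{erf}\left( \frac{t_m - \mu}{v\sqrt{2}} \right) \right] + \frac{v}{\sqrt{2\pi}} \exp\left( -\frac{(t_m - \mu)^2}{2v^2} \right) \\
&= (\mu - t_m) \left[ 1 - \frac{1}{2} \left( 1 + \mathrm{erf}\left( \frac{t_m - \mu}{v\sqrt{2}} \right) \right) \right] + v\, \phi\left( \frac{t_m - \mu}{v} \right) \\
&= (\mu - t_m) \left[ 1 - \Phi\left( \frac{t_m - \mu}{v} \right) \right] + v\, \phi\left( \frac{t_m - \mu}{v} \right). \tag{B.4}
\end{align*}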
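The closed form in (B.4) can be sanity-checked numerically. The sketch below assumes that the integrand in (B.1) is $\frac{1}{2}[1 - \mathrm{erf}((z - \mu)/(v\sqrt{2}))]$, i.e. one minus the normal cdf, and that (B.4) reads $I = (\mu - t_m)[1 - \Phi((t_m - \mu)/v)] + v\,\phi((t_m - \mu)/v)$. It compares this expression with a plain trapezoidal quadrature of the integral. The value tm = 20 years is the initial value quoted in Section 6.5.5; the values of µ and v and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, std. dev. 1)

def closed_form_integral(mu: float, v: float, tm: float) -> float:
    """Closed form assumed for (B.4): (mu - tm) * (1 - Phi(x)) + v * phi(x)."""
    x = (tm - mu) / v
    return (mu - tm) * (1.0 - STD_NORMAL.cdf(x)) + v * STD_NORMAL.pdf(x)

def numerical_integral(mu: float, v: float, tm: float,
                       upper_sigmas: float = 12.0, steps: int = 20_000) -> float:
    """Trapezoidal estimate of the integral of 0.5 * erfc((z - mu) / (v * sqrt(2)))
    from tm up to a cut-off far in the upper tail (the integrand is negligible there)."""
    upper = max(tm, mu) + upper_sigmas * v
    width = (upper - tm) / steps

    def integrand(z: float) -> float:
        return 0.5 * math.erfc((z - mu) / (v * math.sqrt(2.0)))

    total = 0.5 * (integrand(tm) + integrand(upper))
    for i in range(1, steps):
        total += integrand(tm + i * width)
    return total * width

# Hypothetical mean and standard deviation (years); tm = 20 years as in Section 6.5.5.
mu_example, v_example, tm_example = 12.0, 3.0, 20.0
i_closed = closed_form_integral(mu_example, v_example, tm_example)
i_numeric = numerical_integral(mu_example, v_example, tm_example)

print(f"closed form: {i_closed:.8f}")
print(f"quadrature:  {i_numeric:.8f}")
```

Agreement between the two numbers for several choices of µ, v and tm gives some confidence that the signs and factors of 1/2 in the derivation are consistent.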

B.2 Deriving confidence bound on µ

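Given that µ is a function of µ′ and κ, the estimation errors in µ′ and κ propagate to µ. Thus,

using propagation of errors [86], we can write

$$\delta\mu_\zeta = \sqrt{\left(\frac{\partial\mu}{\partial\mu'}\,\delta\mu'_\zeta\right)^2 + \left(\frac{\partial\mu}{\partial\kappa}\,\delta\kappa_\zeta\right)^2}, \tag{B.5}$$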

where δµζ , δµ′ζ and δκζ are the (1− ζ)× 100% confidence bounds for µ, µ′ and κ, respectively.

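The partial derivatives in (B.5) can be easily calculated:

$$\frac{\partial\mu}{\partial\mu'} = \frac{1}{\kappa}, \qquad \frac{\partial\mu}{\partial\kappa} = \frac{t_m-\mu'}{\kappa^2}. \tag{B.6}$$

Thus, if we determine δµ′ζ and δκζ, we can determine δµζ.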
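The propagation step in (B.5)-(B.6) amounts to a few lines of arithmetic. The following is a minimal sketch, assuming the relation µ = tm − (tm − µ′)/κ, which is consistent with the partial derivatives 1/κ and (tm − µ′)/κ² but is not restated in this appendix. The function names and the numbers in the usage line are hypothetical.

```python
import math


def mu_from_limited_mean(tm, mu_prime, kappa):
    """Assumed relation (not restated in this appendix): mu = tm - (tm - mu') / kappa."""
    return tm - (tm - mu_prime) / kappa


def delta_mu(tm, mu_prime, kappa, d_mu_prime, d_kappa):
    """Combine the bounds on mu' and kappa in quadrature, as in (B.5)."""
    dmu_dmu_prime = 1.0 / kappa                   # first partial derivative in (B.6)
    dmu_dkappa = (tm - mu_prime) / kappa ** 2     # second partial derivative in (B.6)
    return math.hypot(dmu_dmu_prime * d_mu_prime, dmu_dkappa * d_kappa)


# Hypothetical numbers, for illustration only.
bound = delta_mu(tm=9.0, mu_prime=8.6, kappa=-0.4, d_mu_prime=0.05, d_kappa=0.02)
```

Because both terms enter squared, the sign of κ does not affect the resulting bound; only its magnitude does, so a small |κ| amplifies both error contributions.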

B.2.1 Finding δκζ

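Let $y = \Phi^{-1}(p_f)/\sqrt{2}$. Then, κ can be written as

$$\kappa = p_f + \frac{\phi(y\sqrt{2})}{y\sqrt{2}} = p_f + \frac{e^{-y^2}}{2y\sqrt{\pi}}. \tag{B.7}$$

Define $g(y) \triangleq e^{-y^2}/(2y\sqrt{\pi})$. Then, we can write $\kappa = p_f + g(y)$, with error $\delta\kappa_\zeta$ as

$$\delta\kappa_\zeta = \sqrt{(\delta p_{f,\zeta})^2 + \left(\frac{\partial g}{\partial y}\,\delta y_\zeta\right)^2}, \tag{B.8}$$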

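where $\delta p_{f,\zeta}$ is the (1 − ζ) × 100% confidence bound on $p_f$. Now

$$\frac{\partial g}{\partial y} = \left(\frac{1}{2\sqrt{\pi}}\right)\frac{\partial\left(e^{-y^2}/y\right)}{\partial y} = -\frac{e^{-y^2}(2y^2+1)}{2y^2\sqrt{\pi}}, \tag{B.9}$$

$$\begin{aligned}
y = \frac{\Phi^{-1}(p_f)}{\sqrt{2}} &\implies p_f = \Phi(y\sqrt{2}) \\
&\implies \delta p_{f,\zeta} = \frac{d}{dy}\Phi(y\sqrt{2})\,\delta y_\zeta = \frac{1}{2}\,\frac{d}{dy}\bigl(1 + \operatorname{erf}(y)\bigr)\,\delta y_\zeta = \frac{1}{2}\,\frac{2e^{-y^2}}{\sqrt{\pi}}\,\delta y_\zeta \\
&\implies \delta y_\zeta = \frac{\sqrt{\pi}}{e^{-y^2}}\,\delta p_{f,\zeta}.
\end{aligned} \tag{B.10}$$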

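Using (B.9) and (B.10) back in (B.8), we get

$$\delta\kappa_\zeta = \sqrt{(\delta p_{f,\zeta})^2 + \left(\frac{e^{-y^2}(2y^2+1)}{2y^2\sqrt{\pi}} \times \frac{\sqrt{\pi}\,\delta p_{f,\zeta}}{e^{-y^2}}\right)^2} = \delta p_{f,\zeta}\sqrt{1 + \left(\frac{2y^2+1}{2y^2}\right)^2}. \tag{B.11}$$

From [87], the (1 − ζ) × 100% confidence bound on $p_f$ after obtaining s samples is

$$\delta p_{f,\zeta} \le z_{\zeta/2}\sqrt{\frac{\hat{p}_f(1-\hat{p}_f)}{s}} \quad \text{when } s\hat{p}_f \ge 5,\ s(1-\hat{p}_f) \ge 5, \tag{B.12}$$

where $z_{\zeta/2}$ is the $100(1-\zeta/2)$th percentile of the standard normal distribution and $\hat{p}_f$ is the estimated

value of $p_f$. Hence, for s samples, we can state the (1 − ζ) × 100% confidence bound on κ as

$$\delta\kappa_\zeta \le z_{\zeta/2}\sqrt{\frac{\hat{p}_f(1-\hat{p}_f)}{s}\left[1 + \left(\frac{2y^2+1}{2y^2}\right)^2\right]}. \tag{B.13}$$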
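The bound on κ can be evaluated directly from the sample estimate of pf. The sketch below is a hedged illustration: it assumes the amplification factor √(1 + ((2y² + 1)/(2y²))²) obtained by substituting (B.9) and (B.10) into (B.8), and the function name and argument names are hypothetical. Note that y = 0 (an estimate of exactly 0.5) is a singular point of g(y), so the sketch does not handle that case.

```python
import math
from statistics import NormalDist


def kappa_and_bound(p_hat, s, zeta):
    """Return (kappa, bound on kappa) from an estimated failure probability.

    p_hat : estimated value of p_f from s samples
    zeta  : the bound is a (1 - zeta) * 100% confidence bound
    """
    # Validity condition of the normal-approximation interval in (B.12).
    if s * p_hat < 5 or s * (1.0 - p_hat) < 5:
        raise ValueError("normal approximation requires s*p >= 5 and s*(1-p) >= 5")

    std = NormalDist()
    y = std.inv_cdf(p_hat) / math.sqrt(2.0)              # y = Phi^{-1}(p_f) / sqrt(2)
    g = math.exp(-y * y) / (2.0 * y * math.sqrt(math.pi))
    kappa = p_hat + g                                     # (B.7)

    z = std.inv_cdf(1.0 - zeta / 2.0)                     # z_{zeta/2}
    d_pf = z * math.sqrt(p_hat * (1.0 - p_hat) / s)       # (B.12)
    amplification = math.sqrt(1.0 + ((2.0 * y * y + 1.0) / (2.0 * y * y)) ** 2)
    return kappa, d_pf * amplification                    # assumed form of (B.13)


# Hypothetical usage: 20% estimated failure probability from 1000 samples, 95% confidence.
kappa, d_kappa = kappa_and_bound(0.2, 1000, 0.05)
```

The amplification factor is always at least √2 and grows without bound as the estimate of pf approaches 0.5, so the bound on κ is loosest near that point.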

B.2.2 Finding δµ′ζ

Because RV T′ has a limited normal distribution, the confidence intervals for the normal dis-

tribution cannot be applied directly to calculate the confidence bounds in this case. Hence,

we use the technique presented in [83] which uses the notion of generalized confidence intervals

[93]. The procedure requires calculating the percentiles of generalized pivotal quantities (GPQ)

using simulation. The steps are as follows:

1. After obtaining s samples, estimate the mean µ using (6.15) and the standard deviation v using

$$v = \frac{t_m-\mu'}{\Phi^{-1}(p_f)\,p_f + \phi\left(\Phi^{-1}(p_f)\right)}, \tag{B.14}$$

where (B.14) was found using (6.13)-(6.16).

2. Generate a large number of (Z, U²) samples, where Z is a sample from the standard normal distribution and U² is a sample from the chi-squared distribution with s − 1 degrees of freedom.

3. For each sample (Z, U²), calculate the GPQs Qµ and Qv for µ and v, respectively, using

$$Q_\mu = \mu - \frac{Z}{U/\sqrt{s-1}}\,\frac{v}{\sqrt{s}}, \qquad Q_v = \frac{v}{U/\sqrt{s-1}}. \tag{B.15}$$

4. For each (Qµ, Qv), calculate µ′ by substituting all occurrences of µ by Qµ and v by Qv in (6.13).

5. Sort all values obtained in the previous step in ascending order. The 100(ζ/2)th and 100(1 − ζ/2)th percentiles of the sorted values give us the (1 − ζ) × 100% lower confidence bound µ′lb and the corresponding upper confidence bound, respectively, from which δµ′ζ can be estimated.
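The five steps above can be sketched as a short Monte Carlo routine. This is an illustrative sketch under stated assumptions, not the thesis implementation: `limited_mean` assumes that (6.13) has the form µ′ = µ − I with I given by (B.4), the GPQs follow the usual form with s − 1 degrees of freedom, and all function names, the number of draws, and the input values are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def limited_mean(mu, v, tm):
    """Assumed form of (6.13): mu' = mu - I, with I taken from (B.4)."""
    x = (tm - mu) / v
    tail = (mu - tm) * (1.0 - _STD.cdf(x)) + v * _STD.pdf(x)
    return mu - tail


def gpq_bounds(mu_hat, v_hat, tm, s, zeta, draws=20000, seed=0):
    """Percentile bounds on mu' from generalized pivotal quantities (steps 2 to 5)."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)                               # Z ~ N(0, 1)
        u_sq = rng.gammavariate((s - 1) / 2.0, 2.0)           # U^2 ~ chi-squared(s - 1)
        ratio = math.sqrt(u_sq) / math.sqrt(s - 1)            # U / sqrt(s - 1)
        q_v = v_hat / ratio                                   # GPQ for v
        q_mu = mu_hat - (z / ratio) * v_hat / math.sqrt(s)    # GPQ for mu
        values.append(limited_mean(q_mu, q_v, tm))            # step 4
    values.sort()                                             # step 5
    lower = values[math.floor(0.5 * zeta * (draws - 1))]
    upper = values[math.ceil((1.0 - 0.5 * zeta) * (draws - 1))]
    return lower, upper


# Hypothetical usage: estimates from s = 50 samples, 95% confidence.
lo, hi = gpq_bounds(mu_hat=10.0, v_hat=2.0, tm=9.0, s=50, zeta=0.05, draws=5000, seed=1)
```

The chi-squared draw uses the fact that a gamma variate with shape (s − 1)/2 and scale 2 is chi-squared with s − 1 degrees of freedom, which keeps the sketch within the Python standard library.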

Bibliography

[1] S. Moreau and D. Bouchu, “Reliability of dual damascene tsv for high density integration:

The electromigration issue,” in 2013 IEEE International Reliability Physics Symposium

(IRPS), April 2013, pp. CP.1.1–CP.1.5.

[2] J. Warnock, “Circuit design challenges at the 14nm technology node,” in ACM/IEEE 48th

Design Automation Conference (DAC-2011), San Diego, CA, June 5-9 2011, pp. 464–467.

[3] M. Hauschildt, M. Gall, S. Thrasher, P. Justison, R. Hernandez, H. Kawasaki,

and P. S. Ho, “Statistical analysis of electromigration lifetimes and void evolution,”

Journal of Applied Physics, vol. 101, no. 4, p. 043523, 2007. [Online]. Available:

http://dx.doi.org/10.1063/1.2655531

[4] C. S. Hau-Riege, “An introduction to cu electromigration,” Microelectron-

ics Reliability, vol. 44, no. 2, pp. 195 – 205, 2004. [Online]. Available:

https://doi.org/10.1016/j.microrel.2003.10.020

[5] C. L. Gan, C. V. Thompson, K. L. Pey, and W. K. Choi, “Experimental characterization

and modeling of the reliability of three-terminal dual-damascene Cu interconnect trees,”

J. Appl. Phys., vol. 94, no. 2, pp. 1222–1228, 2003.

[6] R. Monig, R. R. Keller, and C. A. Volkert, “Thermal fatigue testing of thin metal films,”

Review of Scientific Instruments, vol. 75, no. 11, pp. 4997–5004, 2004.

[7] B. Geden, “Understand and avoid electromigration (EM) and IR-drop in cus-

tom IP blocks,” Synopsys, White Paper, November 2011. [Online]. Available:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.443.498&rep=rep1&type=pdf

[8] J. R. Black, “Electromigration- a brief survey and some recent results,” IEEE Transactions

on Electronic devices, vol. 16, no. 4, pp. 338–347, 1969.

[9] D. Frost and K. Poole, “A method for predicting VLSI-device reliability using series models

for failure mechanisms,” Reliability, IEEE Transactions on, vol. R-36, no. 2, pp. 234–242,

June 1987.

[10] J. Kitchin, “Statistical electromigration budgeting for reliable design and verification in

a 300-MHz microprocessor,” in VLSI Circuits, 1995. Digest of Technical Papers., 1995

Symposium on, June 1995, pp. 115–116.

[11] A. S. Oates, “Interconnect reliability challenges for technology scaling: A circuit focus,” in

2016 IEEE Int. Interconnect Tech. Conf. / Adv. Metallization Conf. (IITC/AMC), May

2016, pp. 59–59.

[12] C. K. Hu, D. Canaperi, S. T. Chen, L. M. Gignac, B. Herbst, S. Kaldor, M. Krishnan,

E. Liniger, D. L. Rath, D. Restaino, R. Rosenberg, J. Rubino, S. C. Seo, A. Simon, S. Smith,

and W. T. Tseng, “Effects of overlayers on electromigration reliability improvement for

cu/low k interconnects,” in 2004 IEEE International Reliability Physics Symposium. Pro-

ceedings, April 2004, pp. 222–228.

[13] R. Rosenberg and M. Ohring, “Void formation and growth during electromigration in

thin films,” Journal of Applied Physics, vol. 42, no. 13, pp. 5671–5679, 1971. [Online].

Available: http://scitation.aip.org/content/aip/journal/jap/42/13/10.1063/1.1659998

[14] M. Shatzkes and J. R. Lloyd, “A model for conductor failure considering diffusion

concurrently with electromigration resulting in a current exponent of 2,” Journal

of Applied Physics, vol. 59, no. 11, pp. 3890–3893, 1986. [Online]. Available:

http://scitation.aip.org/content/aip/journal/jap/59/11/10.1063/1.336731

[15] R. Kirchheim, “Stress and electromigration in Al-lines of integrated circuits,” Acta

Metallurgica et Materialia, vol. 40, no. 2, pp. 309 – 323, 1992. [Online]. Available:

http://www.sciencedirect.com/science/article/pii/095671519290305X

[16] M. A. Korhonen, P. Borgesen, K. N. Tu, and C.-Y. Li, “Stress evolution due to electro-

migration in confined metal lines,” J. Appl. Phys., vol. 73, no. 8, pp. 3790 –3799, apr

[17] M. E. Sarychev, Y. V. Zhitnikov, L. Borucki, C.-L. Liu, and T. M. Makhviladze,

“General model for mechanical stress evolution during electromigration,” Journal

of Applied Physics, vol. 86, no. 6, pp. 3068–3075, 1999. [Online]. Available:

[18] V. Sukharev, E. Zschech, and W. D. Nix, “A model for electromigration-induced

degradation mechanisms in dual-inlaid copper interconnects: Effect of microstructure,”

Journal of Applied Physics, vol. 102, no. 5, pp. –, 2007. [Online]. Available:

[19] R. de Orio, H. Ceric, and S. Selberherr, “Physically based models of electromigration:

From Black’s equation to modern TCAD models,” Microelectronics Reliability,

vol. 50, no. 6, pp. 775 – 789, 2010. [Online]. Available:

http://www.sciencedirect.com/science/article/pii/S0026271410000193

[20] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, “Physics-based Electromigration Assess-

ment for Power Grid Networks,” in ACM/EDAC/IEEE Design Automation Conf., June

2014, pp. 1–6.

[21] D.-A. Li, M. Marek-Sadowska, and S. Nassif, “A method for improving power grid resilience

to electromigration-caused via failures,” IEEE Trans. Very Large Scale Integr. (VLSI)

Syst., vol. 23, no. 1, pp. 118–130, Jan 2015.

[22] X. Huang, V. Sukharev, J.-H. Choy, M. Chew, T. Kim, and S. X.-D. Tan,

“Electromigration assessment for power grid networks considering temperature and

thermal stress effects,” Integration, the VLSI Journal, vol. 55, pp. 307–315, 2016. [Online].

Available: https://doi.org/10.1016/j.vlsi.2016.04.001

[23] D. A. Li, M. Marek-Sadowska, and S. R. Nassif, “T-VEMA: A temperature- and variation-

aware electromigration power grid analysis tool,” IEEE Transactions on Very Large Scale

Integration (VLSI) Systems, vol. 23, no. 10, pp. 2327–2331, Oct 2015.

[24] Y.-K. Cheng, P. Raha et al., “ILLIADS-T: an electrothermal timing simulator for tempera-

ture sensitive reliability diagnosis of CMOS VLSI chips,” IEEE Trans. on Computer-Aided

Design of Integrated Circuits and Systems, vol. 17, no. 8, pp. 668–681, Aug 1998.

[25] S. Chatterjee, M. Fawaz, and F. N. Najm, “Redundancy-Aware Electromigration Checking

for Mesh Power Grids,” in IEEE/ACM Int. Conf. on Comput. Aided Design, San Jose,

CA, Nov. 2013, pp. 540–547.

[26] S. R. Nassif, “Power grid analysis benchmarks,” in ASP-DAC, 2008, pp. 376–381.

[27] Y.-L. Cheng, S. Y. Lee, C. C. Chiu, and K. Wu, “Back stress model on electromigra-

tion lifetime prediction in short length copper interconnects,” in 2008 IEEE International

Reliability Physics Symposium, April 2008, pp. 685–686.

[28] B. Li, J. Gill, C. Christiansen, T. Sullivan, and P. S. McLaughlin, “Impact of via-line

contact on cu interconnect electromigration performance,” in IEEE Int. Rel. Phys. Symp.,

April 2005, pp. 24–30.

[29] E. T. Ogawa, K. D. Lee, H. Matsuhashi, K. S. Ko, P. R. Justison, A. N. Ramamurthi, A. J.

Bierwag, P. S. Ho, V. A. Blaschke, and R. H. Havemann, “Statistics of electromigration

early failures in Cu/oxide dual-damascene interconnects,” in 39th Annual IEEE Int. Rel.

Physics Symp. Proc., 2001, pp. 341–349.

[30] L. M. Ting, J. S. May, W. R. Hunter, and J. W. McPherson, “AC electromigration char-

acterization and modeling of multilayered interconnects,” in IEEE Int. Rel. Phys. Symp.,

March 1993, pp. 311–316.

[31] V. Sukharev, X. Huang, and S. X.-D. Tan, “Electromigration induced stress evolution

under alternate current and pulse current loads,” Journal of Applied Physics, vol. 118,

no. 3, p. 034504, 2015.

[32] K. Lee, “Electromigration recovery and short lead effect under bipolar- and unipolar-pulse

current,” in IEEE International Reliability Physics Symposium (IRPS), april 2012, pp.

6B.3.1 –6B.3.4.

[33] “Standard method for calculating the electromigration model parameters for current den-

sity and temperature,” JEDEC Solid State Technology Association, Arlington, VA, Stan-

dard, Feb 1998.

[34] I. A. Blech, “Electromigration in thin aluminium on titanium nitride,” Journal of Applied

Physics, vol. 47, no. 4, pp. 1203–1208, 1976, doi: 10.1063/1.322842.

[35] I. A. Blech and C. Herring, “Stress generation by electromigration,” Applied Physics Let-

ters, vol. 29, no. 3, pp. 131–133, 1976.

[36] I. A. Blech and K. L. Tai, “Measurement of stress gradients generated by electromigration,”

Applied Physics Letters, vol. 30, no. 8, pp. 387–389, 1977.

[37] A. Abbasinasab and M. Marek-Sadowska, “Blech effect in interconnects: Applications and

design guidelines,” in Proceedings of the 2015 Symposium on International Symposium on

Physical Design, ser. ISPD ’15. New York, NY, USA: ACM, 2015, pp. 111–118. [Online].

Available: http://doi.acm.org/10.1145/2717764.2717772

[38] J. Lloyd, “Black’s law revisited-Nucleation and Growth in Electromigration failure,”

Microelectronics Reliability, vol. 47, no. 9-11, pp. 1468–1472, 2007. [Online]. Available:

http://www.sciencedirect.com/science/article/pii/S0026271407003630

[39] M. Hauschildt, C. Hennesthal, G. Talut, O. Aubel, M. Gall, K. B. Yeap, and E. Zschech,

“Electromigration early failure void nucleation and growth phenomena in Cu and Cu(Mn)

interconnects,” in Reliability Physics Symposium (IRPS), 2013 IEEE International, April

2013, pp. 2C.1.1–2C.1.6.

[40] J. Lloyd and J. Kitchin, “The electromigration failure distribution: The fine-line case,” J.

Appl. Phys., vol. 69, no. 4, pp. 2117–2127, Feb 1991.

[41] V. Mishra and S. S. Sapatnekar, “The impact of electromigration in copper interconnects

on power grid integrity,” in Proceedings of the 50th Annual Design Automation Conference,

2013, pp. 88:1–88:6. [Online]. Available: http://doi.acm.org/10.1145/2463209.2488842

[42] S. P. Hau-Riege and C. V. Thompson, “Experimental characterization and modeling of

the reliability of interconnect trees,” J. Appl. Phys., vol. 89, no. 1, pp. 601–609, 2001.

[43] H.-B. Chen, S.-D. Tan, V. Sukharev, X. Huang, and T. Kim, “Interconnect reliability

modeling and analysis for multi-branch interconnect trees,” in ACM/EDAC/IEEE Design

Automation Conf., June 2015, pp. 1–6.

[44] H. B. Chen, S. X. D. Tan, X. Huang, T. Kim, and V. Sukharev, “Analytical modeling

and characterization of electromigration effects for multibranch interconnect trees,” IEEE

Trans. on Comput.-Aided Design of Integrated Circuits and Systems, vol. 35, no. 11, pp.

1811–1824, Nov 2016.

[45] B. Li, P. S. McLaughlin, J. P. Bickford, P. Habitz, D. Netrabile, and T. D. Sullivan,

“Statistical evaluation of electromigration reliability at chip level,” IEEE Transactions on

Device and Materials Reliability, vol. 11, no. 1, pp. 86–91, March 2011.

[46] F. L. Wei, C. S. Hau-Riege, A. P. Marathe, and C. V. Thompson, “Effects

of active atomic sinks and reservoirs on the reliability of Cu low-k intercon-

nects,” Journal of Applied Physics, vol. 103, no. 8, 2008. [Online]. Available:

[47] S. Chatterjee, M. Fawaz, and F. N. Najm, “Redundancy-aware power grid electromigration

checking under workload uncertainties,” IEEE Transactions on Computer-Aided Design of

Integrated Circuits and Systems, vol. 34, no. 9, pp. 1509–1522, Sept 2015.

[48] J. Lloyd and J. Kitchin, “The electromigration failure distribution: The fineline case,”

Journal of Applied Physics, vol. 69, no. 4, pp. 2117–2127, 1991.

[49] F. N. Najm, Circuit Simulation. John Wiley and Sons, 2010.

[50] J. Thomas, Numerical Partial Differential Equations: Finite Difference Methods.

Springer-Verlag New York, 1995.

[51] J. N. Reddy, An Introduction to the Finite Element Method, 3rd ed. McGraw-Hill, 2006.

[52] T. Barth and M. Ohlberger, Finite Volume Methods: Foundation and Analysis. John Wiley

& Sons, Ltd, 2004. [Online]. Available: http://dx.doi.org/10.1002/0470091355.ecm010

[53] J. Droniou, R. Eymard, T. Gallouet, and R. Herbin, “Gradient schemes: a generic frame-

work for the discretisation of linear, nonlinear and nonlocal elliptic and parabolic equa-

tions,” Mathematical Models and Methods in Applied Sciences, vol. 23, no. 13, pp. 2395–

2432, 2013.

[54] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang, Spectral Methods: Fundamen-

tals in Single Domains. Springer-Verlag Berlin Heidelberg, 2006.

[55] W. Schiesser, Computational Mathematics in Engineering and Applied Science: ODEs,

DAEs, and PDEs. Taylor & Francis, 1993.

[56] E. Hairer, S. P. Norsett, and G. Wanner, Solving ordinary differential equations, 2nd ed.

Springer-Verlag Berlin Heidelberg, 1993.

[57] E. Hairer and G. Wanner, Solving Ordinary Differential Equations II: Stiff and Differential-

Algebraic Problems, 2nd ed. Springer-Verlag Berlin Heidelberg, 1996.

[58] L. F. Shampine and H. A. Watts, “Global error estimates for ordinary differential

equations,” ACM Trans. Math. Softw., vol. 2, no. 2, pp. 172–186, Jun 1976. [Online].

[59] L. Shampine, What everyone solving differential equations numerically should know, Jan

1978. [Online]. Available: https://www.osti.gov/scitech/biblio/6219108

[60] J. D. Lambert, Numerical Methods for Ordinary Differential Systems: The Initial Value

Problem. Wiley, 1991.

[61] J. C. Butcher, Numerical Methods for Ordinary Differential Equations, 2nd ed. John

Wiley and Sons, 2003.

[62] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan,

“Hotspot: a compact thermal modeling methodology for early-stage VLSI design,” IEEE

Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 5, pp. 501–513, May 2006.

[63] M. N. Ozisik, Boundary Value Problems of Heat Conduction. Mineola, New York: Dover

Publications, Inc., 2002.

[64] I. Miller, J. E. Freund, and R. Johnson, Probability and Statistics for Engineers. Engle-

wood Cliffs, N.J.: Prentice-Hall, Inc., 1990.

[65] S. Chatterjee, V. Sukharev, and F. N. Najm, “Fast physics-based electromigration checking

for on-die power grids,” in 2016 IEEE/ACM International Conference on Computer-Aided

Design (ICCAD), Nov 2016, pp. 1–8.

[66] R. de Orio, H. Ceric, and S. Selberherr, “A compact model for early electromigration

failures of copper dual-damascene interconnects,” Microelectronics Reliability, vol. 51, pp.

1573 – 1577, 2011. [Online]. Available: https://doi.org/10.1016/j.microrel.2011.07.049

[67] V. Sukharev, A. Kteyan, and X. Huang, “Postvoiding stress evolution in confined metal

lines,” IEEE Transactions on Device and Materials Reliability, vol. 16, no. 1, pp. 50–60,

March 2016.

[68] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design

Perspective, 2nd ed. Pearson, Dec 2002.

[69] Z.-S. Choi, J. Lee, M. K. Lim, C. L. Gan, and C. V. Thompson, “Void dynamics in

copper-based interconnects,” Journal of Applied Physics, vol. 110, no. 3, p. 033505, 2011.

[Online]. Available: http://dx.doi.org/10.1063/1.3611408

[70] L. Arnaud, F. Cacho, L. Doyen, F. Terrier, D. Galpin, and C. Monget, “Analysis

of electromigration induced early failures in cu interconnects for 45 nm node,”

Microelectronic Engineering, vol. 87, no. 3, pp. 355 – 360, 2010. [Online]. Available:

https://doi.org/10.1016/j.mee.2009.06.014

[71] J. Dormand and P. Prince, “A family of embedded runge-kutta formulae,” Journal of

Computational and Applied Mathematics, vol. 6, no. 1, pp. 19 – 26, 1980. [Online].

Available: http://dx.doi.org/10.1016/0771-050X(80)90013-3

[72] S. Chatterjee, V. Sukharev, and F. N. Najm, “Power grid electromigration checking using

physics-based models,” IEEE Trans. on Comput.-Aided Design of Integrated Circuits and

Systems, no. 99, 2017.

[73] J. J. Clement, “Reliability analysis for encapsulated interconnect lines under dc and

pulsed dc current using a continuum electromigration transport model,” Journal of Applied

Physics, vol. 82, no. 12, pp. 5991–6000, 1997.

[74] S. Chatterjee, V. Sukharev, and F. N. Najm, “Fast physics-based electromigration assess-

ment by efficient solution of linear time-invariant (LTI) systems,” in 2017 IEEE/ACM

International Conference on Computer-Aided Design (ICCAD), Nov 2017, pp. 1–1, to ap-

[75] K. W. Tu, “Stability and convergence of general multistep and multivariate methods with

variable step size,” Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, 1972, dept.

of Computer Science.

[76] R. K. Brayton, F. G. Gustavson, and G. D. Hachtel, “A new efficient algorithm for solving

differential-algebraic systems using implicit backward differentiation formulas,” Proceed-

ings of the IEEE, vol. 60, no. 1, pp. 98–108, Jan 1972.

[77] W. E. Arnoldi, “The principle of minimized iterations in the solution of the matrix

eigenvalue problem,” Quarterly of Applied Mathematics, vol. 9, no. 1, pp. 17–29, 1951.

[Online]. Available: http://www.jstor.org/stable/43633863

[78] N. J. Higham, Functions of Matrices: Theory and Computation. Society for Industrial

and Applied Mathematics, 2008.

[79] T. A. Davis, “A column pre-ordering strategy for the unsymmetric-pattern multifrontal

method,” ACM Trans. Math. Softw., vol. 30, no. 2, pp. 165–195, Jun. 2004. [Online].

[80] ——, “Algorithm 832: Umfpack v4.3—an unsymmetric-pattern multifrontal method,”

ACM Trans. Math. Softw., vol. 30, no. 2, pp. 196–199, Jun. 2004. [Online]. Available:

http://doi.acm.org/10.1145/992200.992206

[81] T. A. Davis and I. S. Duff, “A combined unifrontal/multifrontal method for unsymmetric

sparse matrices,” ACM Trans. Math. Softw., vol. 25, no. 1, pp. 1–20, Mar. 1999. [Online].

[82] T. Davis and I. Duff, “An unsymmetric-pattern multifrontal method for sparse lu factor-

ization,” SIAM Journal on Matrix Analysis and Applications, vol. 18, no. 1, pp. 140–158,

1997. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/S0895479894246905

[83] I. Bebu and T. Mathew, “Confidence intervals for limited moments and truncated moments

in normal and lognormal models,” Statistics & Probability Letters, vol. 79, no. 3, pp. 375

– 380, 2009.

[84] N. Weiss, P. Holmes, and M. Hardy, A Course in Probability. Pearson Addison Wesley,

[85] E. A. Amerasekera and F. N. Najm, Failure Mechanisms in Semiconductor Devices, 2nd ed.

John Wiley and Sons, Oct. 1998.

[86] H. H. Ku, “Notes on the use of propagation of error formulas,” Journal of Research of

the National Bureau of Standards, vol. 70C, no. 4, pp. 263–273, 1966. [Online]. Available:

http://archive.org/details/jresv70Cn4p263

[87] L. D. Brown, T. T. Cai, and A. DasGupta, “Interval estimation for a binomial

proportion,” Statistical Science, vol. 16, no. 2, pp. 101–117, 2001. [Online]. Available:

http://www.jstor.org/stable/2676784

[88] V. Sukharev, E. Zschech, and W. D. Nix, “A model for electromigration-induced

degradation mechanisms in dual-inlaid copper interconnects: Effect of microstructure,”

Journal of Applied Physics, vol. 102, no. 5, p. 053505, 2007. [Online]. Available:

http://dx.doi.org/10.1063/1.2775538

[89] A. Lodder and J. P. Dekker, “The electromigration force in metallic bulk,” in Proc. of the

Stress Induced Phenomena in Metallization: 4th International Workshop, vol. 418, 1998,

pp. 315–329.

[90] A. L. S. Loke, “Process integration issues of low-permittivity dielectrics with copper for

high-performance interconnects,” Ph.D. dissertation, STANFORD UNIVERSITY, Mar

[91] R. A. Horn and C. R. Johnson, Eds., Matrix Analysis. New York, NY, USA: Cambridge

University Press, 1986.

[92] K. Veselic, “On real eigenvalues of real tridiagonal matrices,” Linear Algebra and its Ap-

plications, vol. 27, pp. 167 – 171, 1979.

[93] S. Weerahandi, “Generalized confidence intervals,” Journal of the American Statistical

Association, vol. 88, no. 423, pp. 899 – 905, Sept. 1993.
