
Statistical Performance Analysis and Modeling Techniques for Nanometer VLSI Designs

Ruijing Shen • Sheldon X.-D. Tan • Hao Yu


Ruijing Shen
Department of Electrical Engineering
University of California
Riverside, USA

Hao Yu
Department of Electrical and Electronic
Nanyang Technological University
Nanyang Avenue 50, Singapore

Sheldon X.-D. Tan
Department of Electrical Engineering
University of California
Riverside, USA

ISBN 978-1-4614-0787-4
e-ISBN 978-1-4614-0788-1
DOI 10.1007/978-1-4614-0788-1
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2012931560

© Springer Science+Business Media, LLC 2012
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To our families

Preface

As VLSI technology scales into the nanometer regime, chip design engineering faces several challenges. One profound change in the chip design business is that engineers can no longer realize a design precisely in silicon. Consequently, chip performance, manufacturing yield, and lifetime cannot be determined accurately at the design stage. The main culprit is that many chip parameters, such as oxide thickness (affected by chemical and mechanical polishing, CMP) and impurity density (affected by doping fluctuations), cannot be determined or estimated precisely and thus become unpredictable at the device, circuit, and system levels. These so-called manufacturing process variations start to play an essential role, and their influence on performance, yield, and reliability becomes significant. As a result, variation-aware design methodologies and computer-aided design (CAD) tools are widely believed to be the key to mitigating the unpredictability challenges for 45 nm technologies and beyond. Variational characterization, modeling, and optimization therefore have to be incorporated into each step of the design and verification processes to ensure reliable chips and profitable manufacturing yields.

The book is divided into five parts. Part I introduces the basic concepts and mathematical notation relevant to statistical analysis. It also introduces established algorithms and theories such as the Monte Carlo method, the spectral stochastic method, and the principal factor analysis method and its variants. Part II focuses on techniques for statistical full-chip power consumption analysis considering process variations. Chapter 3 reviews existing statistical leakage analysis methods, since leakage power is more susceptible to process variations. Chapter 4 presents a gate-level leakage analysis method that considers both inter-die and intra-die variations with spatial correlations using the spectral stochastic method. Chapter 5 addresses a similar problem to that of the previous chapter, but presents a more efficient, linear-time algorithm based on a virtual grid modeling of process variations with spatial correlations. In Chap. 6, a statistical dynamic power analysis technique using the combined virtual grid and orthogonal polynomial methods is presented. In Chap. 7, a statistical total chip power estimation method is presented, in which a collocation-based spectral stochastic method is applied to obtain the variational total chip power based on accurate SPICE simulation.


Part III focuses on variational analysis of on-chip power grid networks under process variations. Chapter 8 introduces an efficient stochastic method for analyzing the voltage drop variations of on-chip power grid networks, considering log-normal leakage current variations with spatial correlation. Chapter 9 presents another stochastic method for a similar problem, in which model order reduction is applied to improve the efficiency of the simulation. Chapter 10 introduces a new approach to variational power grid analysis, where model order reduction techniques and variational subspace modeling are used to obtain the variational voltage drop responses.

Part IV of this book is concerned with statistical interconnect extraction and modeling under process variations. Chapter 11 presents a statistical capacitance extraction method using the Galerkin-based spectral stochastic method. Chapter 12 discusses a parallel and incremental solver for stochastic capacitance extraction. Chapter 13 gives a statistical inductance extraction method using the collocation-based spectral stochastic method.

Part V of this book focuses on the performance bound and statistical analysis of nanometer analog/mixed-signal circuits and on yield analysis and optimization based on statistical performance analysis and modeling. Chapter 14 presents a performance bound analysis technique in the s-domain for linearized analog circuits using symbolic and affine interval methods. Chapter 15 presents an efficient stochastic mismatch analysis technique for analog circuits using the Galerkin-based spectral stochastic method and nonlinear modeling. Chapter 16 shows a yield analysis and optimization technique, and Chap. 17 describes a yield optimization algorithm based on an improved voltage binning scheme.

The content of the book comes mainly from the authors' recent publications. Many of those original publications can be found at http://www.ee.ucr.edu/~stan/project/sts_ana/main_sts_ana_proj.htm. Future errata and updates for this book can be found at http://www.ee.ucr.edu/~stan/project/books/book11_springer.htm.

Riverside, CA, USA    Ruijing Shen
Riverside, CA, USA    Sheldon X.-D. Tan
Singapore, Singapore    Hao Yu

Acknowledgment

The contents of the book mainly come from the research done in the Mixed-Signal Nanometer VLSI Research Lab (MSLAB) at the University of California at Riverside over the past several years. Some of the presented methods also come from Dr. Hao Yu’s research group at Nanyang Technological University, Singapore.

It is a pleasure to record our gratitude to the many Ph.D. students who have contributed to this book. They include Dr. Duo Li, Dr. Ning Mi, Dr. Zhigang Hao, and Mr. Fang Gong (UCLA), some of whose research is presented in this book. Special thanks are also given to Dr. Hai Wang, who helped revise and proofread the final draft of this book.

Sheldon X.-D. Tan is grateful to his collaborator Prof. Yici Cai of Tsinghua University for the collaborative research that led to some of the work presented in this book. Sheldon X.-D. Tan is also indebted to Dr. Jinjun Xiong and Dr. Chandu Visweswariah of IBM for their insights into many important problems in industry, which inspired some of the work in this book.

The authors would like to thank both the National Science Foundation and the National Natural Science Foundation of China for their financial support for this book. Sheldon X.-D. Tan highly appreciates the consistent support of Dr. Sankar Basu of the National Science Foundation over the past 7 years. This book project is funded in part by NSF grant under No. CCF-0448534; in part by NSF grants under No. OISE-0623038, OISE-0929699, OISE-1051787, CCF-1116882, and OISE-1130402; and in part by the National Natural Science Foundation of China (NSFC) grant under No. 60828008. We would also like to thank the UC Regent’s Committee on Research Fellowship and the Faculty Fellowships from the University of California at Riverside for their support. Dr. Hao Yu would also like to acknowledge the funding support from NRF2010NRF-POC001-001, Tier-1-RG 26/10, and Tier-2-ARC 5/11 in Singapore.

Last but not least, Sheldon X.-D. Tan would like to thank his wife, Yan, and his daughters, Felicia and Leslay, for their understanding and support during the many hours it took to write this book. Ruijing Shen would like to express her deepest gratitude to her adviser, Prof. Sheldon X.-D. Tan, for his help, trust, and guidance. There are wonders as well as frustrations in academic research. His kindness, insight, and suggestions always guided her in the right direction. A special word of thanks goes to all of Ruijing’s mentors at Tsinghua University (Prof. Xiangqing He, Prof. Xianlong Hong, Prof. Changzheng Sun, et al.). They taught her about the world of electronics (and much beyond). Finally, Ruijing Shen is extremely grateful to her husband, Boyuan Yan, the whole family, and all her friends. She would like to thank them for their constant support and encouragement during the writing of this manuscript.

Contents

Part I Fundamentals

1 Introduction
  1 Nanometer Chip Design in Uncertain World
    1.1 Causes of Variations
    1.2 Process Variation Classification and Modeling
    1.3 Process Variation Impacts
  2 Book Outline
    2.1 Statistical Full-Chip Power Analysis
    2.2 Variational On-Chip Power Delivery Network Analysis
    2.3 Statistical Interconnect Modeling and Extraction
    2.4 Statistical Analog and Yield Analysis and Optimization
  3 Summary

2 Fundamentals of Statistical Analysis
  1 Basic Concepts in Probability Theory
    1.1 Experiment, Sample Space, and Event
    1.2 Random Variable and Expectation
    1.3 Variance and Moments of Random Variable
    1.4 Distribution Functions
    1.5 Gaussian and Log-Normal Distributions
    1.6 Basic Concepts for Multiple Random Variables
  2 Multiple Random Variables and Variable Reduction
    2.1 Components of Covariance in Process Variation
    2.2 Random Variable Decoupling and Reduction
    2.3 Principal Factor Analysis Technique
    2.4 Weighted PFA Technique
    2.5 Principal Component Analysis Technique
  3 Statistical Analysis Approaches
    3.1 Monte Carlo Method
    3.2 Spectral Stochastic Method Using Stochastic Orthogonal Polynomial Chaos
    3.3 Collocation-Based Spectral Stochastic Method
    3.4 Galerkin-Based Spectral Stochastic Method
  4 Sum of Log-Normal Random Variables
    4.1 Hermite PC Representation of Log-Normal Variables
    4.2 Hermite PC Representation with One Gaussian Variable
    4.3 Hermite PC Representation of Two and More Gaussian Variables
  5 Summary

Part II Statistical Full-Chip Power Analysis

3 Traditional Statistical Leakage Power Analysis Methods
  1 Introduction
  2 Static Leakage Modeling
    2.1 Gate-Based Static Leakage Model
    2.2 MOSFET-Based Static Leakage Model
  3 Process Variational Models for Leakage Analysis
  4 Full-Chip Leakage Modeling and Analysis Methods
    4.1 Monte Carlo Method
    4.2 Traditional Grid-Based Methods
    4.3 Projection-Based Statistical Analysis Methods
  5 Summary

4 Statistical Leakage Power Analysis by Spectral Stochastic Method
  1 Introduction
  2 Flow of Gate-Based Method
    2.1 Random Variables Transformation and Reduction
    2.2 Computation of Full-Chip Leakage Currents
    2.3 Time Complexity Analysis
  3 Numerical Examples
  4 Summary

5 Linear Statistical Leakage Analysis by Virtual Grid-Based Modeling
  1 Introduction
  2 Virtual Grid-Based Spatial Correlation Model
  3 Linear Chip-Level Leakage Power Analysis Method
    3.1 Computing Gate Leakage by the Spectral Stochastic Method
    3.2 Computation of Full-Chip Leakage Currents
    3.3 Time Complexity Analysis
  4 New Statistical Leakage Characterization in SCL
    4.1 Acceleration by Look-Up Table Approach
    4.2 Enhanced Algorithm
    4.3 Computation of Full-Chip Leakage Currents
    4.4 Incremental Leakage Analysis
    4.5 Time Complexity Analysis
    4.6 Discussion of Extension to Statistical Runtime Leakage Estimation
    4.7 Discussion about Runtime Leakage Reduction Technique
  5 Numerical Examples
    5.1 Accuracy and CPU Time
    5.2 Incremental Analysis
  6 Summary

6 Statistical Dynamic Power Estimation Techniques
  1 Introduction
  2 Prior Works
    2.1 Existing Relevant Works
    2.2 Segment-Based Power Estimation Method
  3 The Presented New Statistical Dynamic Power Estimation Method
    3.1 Flow of the Presented Analysis Method
    3.2 Acceleration by Building the Look-Up Table
    3.3 Statistical Gate Power with Glitch Width Variation
    3.4 Computation of Full-Chip Dynamic Power

7 Statistical Total Power Estimation Techniques
  1 Introduction
  2 Review of the Monte Carlo-Based Power Estimation Method
  3 The Statistical Total Power Estimation Method
    3.1 Flow of the Presented Analysis Method Under Fixed Input Vector
    3.2 Computing Total Power by Orthogonal Polynomials
    3.3 Flow of the Presented Analysis Method Under Random Input Vectors
  4 Numerical Examples
  5 Summary

Part III Variational On-Chip Power Delivery Network Analysis

8 Statistical Power Grid Analysis Considering Log-Normal Leakage Current Variations
  1 Introduction
  2 Previous Works
  3 Nominal Power Grid Network Model
  4 Problem Formulation
  5 Statistical Power Grid Analysis Based on Hermite PC
    5.1 Galerkin-Based Spectral Stochastic Method
    5.2 Spatial Correlation in Statistical Power Grid Analysis
    5.3 Variations in Wires and Leakage Currents
  6 Numerical Examples
    6.1 Comparison with Taylor Expansion Method
    6.2 Examples Without Spatial Correlation
    6.3 Examples with Spatial Correlation
    6.4 Consideration of Variations in Both Wire and Currents
  7 Summary

9 Statistical Power Grid Analysis by Stochastic Extended Krylov Subspace Method
  1 Introduction
  2 Problem Formulation
  3 Review of Extended Krylov Subspace Method
  4 The Stochastic Extended Krylov Subspace Method: StoEKS
    4.1 StoEKS Algorithm Flowchart
    4.2 Generation of the Augmented Circuit Matrices
    4.3 Computation of Hermite PCs of Current Moments with Log-Normal Distribution
    4.4 The StoEKS Algorithm
    4.5 A Walk-Through Example
    4.6 Computational Complexity Analysis

10 Statistical Power Grid Analysis by Variational Subspace Method
  1 Introduction
  2 Review of Fast Truncated Balanced Realization Methods
    2.1 Standard Truncated Balanced Realization Methods
    2.2 Fast and Approximate TBR Methods
    2.3 Statistical Reduction by Variational TBR
  3 The Presented Variational Analysis Method: varETBR
    3.1 Extended Truncated Balanced Realization Scheme
    3.2 The Presented Variational ETBR Method

Part IV Statistical Interconnect Modeling and Extractions

11 Statistical Capacitance Modeling and Extraction
  1 Introduction
  2 Problem Formulation
  3 Presented Orthogonal PC-Based Extraction Method: StatCap
    3.1 Capacitance Extraction Using Galerkin-Based Method
    3.2 Expansion of Potential Coefficient Matrix
    3.3 Formulation of the Augmented System
  4 Second-Order StatCap
    4.1 Derivation of Analytic Second-Order Potential Coefficient Matrix
    4.2 Formulation of the Augmented System
  5 Numerical Examples
  6 Additional Notes
  7 Summary

12 Incremental Extraction of Variational Capacitance
  1 Introduction
  2 Review of GMRES and FMM Algorithms
    2.1 The GMRES Method
    2.2 The Fast Multipole Method
  3 Stochastic Geometrical Moment
    3.1 Geometrical Moment
    3.2 Orthogonal PC Expansion
  4 Parallel Fast Multipole Method with SGM
    4.1 Upward Pass
    4.2 Downward Pass
    4.3 Data Sharing and Communication
  5 Incremental GMRES
    5.1 Deflated Power Iteration
    5.2 Incremental Precondition
  6 piCAP Algorithm
    6.1 Extraction Flow
    6.2 Implementation Optimization
  7 Numerical Examples
    7.1 Accuracy Validation
    7.2 Speed Validation
    7.3 Eigenvalue Analysis
  8 Summary

13 Statistical Inductance Modeling and Extraction
  1 Introduction
  2 Problem Formulation
  3 The Presented Statistical Inductance Extraction Method: statHenry
    3.1 Variable Decoupling and Reduction
    3.2 Variable Reduction by Weighted PFA
    3.3 Flow of statHenry Technique



Part V Statistical Analog and Yield Analysisand Optimization Techniques

14 Performance Bound Analysis of Variational LinearizedAnalog Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2211 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2212 Review of Interval Arithmetic and Affine Arithmetic . . . . . . . . . . . . . . . . . 2223 The Performance Bound Analysis Method Based

on Graph-based Symbolic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2233.1 Variational Transfer Function Computation .. . . . . . . . . . . . . . . . . . . . . 2233.2 Performance Bound by Kharitonov’s Functions . . . . . . . . . . . . . . . . . 228


15 Stochastic Analog Mismatch Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2351 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2352 Preliminary .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

2.1 Review of Mismatch Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2372.2 Nonlinear Model Order Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

3 Stochastic Transient Mismatch Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2393.1 Stochastic Mismatch Current Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2393.2 Perturbation Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2403.3 Non-Monte Carlo Analysis by Spectral Stochastic Method .. . . . 2403.4 A CMOS Transistor Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

4 Macromodeling for Mismatch Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2424.1 Incremental Trajectory-Piecewise-Linear Modeling .. . . . . . . . . . . . 2434.2 Stochastic Extension for Mismatch Analysis . . . . . . . . . . . . . . . . . . . . 246

5 Numerical Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2475.1 Comparison of Mismatch Waveform-Error and Runtime . . . . . . . 2485.2 Comparison of TPWL Macromodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

16 Statistical Yield Analysis and Optimization
  1 Introduction
  2 Problem Formulations
  3 Stochastic Variation Analysis for Yield Analysis
    3.1 Algorithm Overview
    3.2 Stochastic Yield Estimation and Optimization
    3.3 Fast Yield Calculation
    3.4 Stochastic Sensitivity Analysis
    3.5 Multiobjective Optimization
  4 Numerical Examples
    4.1 NMC Mismatch for Yield Analysis
    4.2 Stochastic Yield Estimation
    4.3 Stochastic Sensitivity Analysis
    4.4 Stochastic Yield Optimization
  5 Summary

17 Voltage Binning Technique for Yield Optimization
  1 Introduction
  2 Problem Formulation
    2.1 Yield Estimation
    2.2 Voltage Binning Problem
  3 The Presented Voltage Binning Method
    3.1 Voltage Binning Considering Valid Segment
    3.2 Bin Number Prediction Under Given Yield Requirement
    3.3 Yield Analysis and Optimization
  4 Numerical Examples
    4.1 Setting of Process Variation
    4.2 Prediction of Bin Numbers Under Yield Requirement
    4.3 Comparison Between Uniform and Optimal Voltage Binning Schemes
    4.4 Sensitivity to Frequency and Power Constraints
    4.5 CPU Times
  5 Summary

References

Index

List of Figures

Fig. 1.1 OPC and PSM procedures in the manufacture process
Fig. 1.2 Chemical and mechanical polishing (CMP) process
Fig. 1.3 The dishing and oxide erosion after the CMP process
Fig. 1.4 The comparison of circuit total power distribution of circuit c432 in ISCAS’85 benchmark sets (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted with permission from [62] © 2011 IEEE

Fig. 2.1 Grid-based model for spatial correlations

Fig. 3.1 Subthreshold leakage currents for four different input patterns in AND2 gate under 45 nm technology
Fig. 3.2 Gate oxide leakage currents for four different input patterns in AND2 gate under 45 nm technology
Fig. 3.3 Typical layout of a MOSFET
Fig. 3.4 Procedure to derive the effective gate channel length model

Fig. 4.1 An example of a grid-based partition. Reprinted with permission from [157] © 2010 Elsevier
Fig. 4.2 The flow of the presented algorithm
Fig. 4.3 Distribution of the total leakage currents of the presented method, the grid-based method, and the MC method for circuit SC0 (process variation parameters set as Case 1). Reprinted with permission from [157] © 2010 Elsevier

Fig. 5.1 Location-dependent modeling with the T(i) of grid cell i defined as its seven neighbor cells. Reprinted with permission from [159] © 2010 IEEE
Fig. 5.2 The flow of the presented algorithm
Fig. 5.3 Relation between ρ(d) and d/η
Fig. 5.4 The flow of statistical leakage characterization in SCL
Fig. 5.5 The flow of the presented algorithm using statistical leakage characterization in SCL
Fig. 5.6 Simulation flow for full-chip runtime leakage

Fig. 6.1 The dynamic power versus effective channel length for an AND2 gate in 45 nm technology (70 ps active pulse as partial swing, 130 ps active pulse as full swing). Reprinted with permission from [60] © 2010 IEEE
Fig. 6.2 A transition waveform example {E1, E2, ..., Em} for a node. Reprinted with permission from [60] © 2010 IEEE
Fig. 6.3 The flow of the presented algorithm
Fig. 6.4 The flow of building the sub LUT

Fig. 7.1 The comparison of circuit total power distribution of circuit c432 in ISCAS’85 benchmark sets (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted with permission from [62] © 2011 IEEE
Fig. 7.2 The flow of the presented algorithm under a fixed input vector
Fig. 7.3 The selected power points a, b, and c from the power distribution under random input vectors. Reprinted with permission from [62] © 2011 IEEE
Fig. 7.4 The flow of the presented algorithm with random input vectors and process variations
Fig. 7.5 The comparison of total power distribution PDF and CDF between STEP method and MC method for circuit c880 under a fixed input vector. Reprinted with permission from [62] © 2011 IEEE
Fig. 7.6 The comparison of total power distribution PDF and CDF between STEP method and Monte Carlo method for circuit c880 under random input vector. Reprinted with permission from [62] © 2011 IEEE

Fig. 8.1 The power grid model used

Fig. 8.2 Distribution of the voltage in a given node with one Gaussian variable, σ_g = 0.1, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.3 Distribution of the voltage caused by the leakage currents in a given node with one Gaussian variable, σ_g = 0.5, in the time instant from 0 ns to 126 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.4 Distribution of the voltage in a given node with two Gaussian variables, σ_g1 = 0.1 and σ_g2 = 0.5, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.5 Correlated random variables setup in ground circuit divided into two parts. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.6 Distribution of the voltage in a given node with two Gaussian variables with spatial correlation, at time 70 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.7 Correlated random variables setup in ground circuit divided into four parts. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.8 Distribution of the voltage in a given node with four Gaussian variables with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.9 Distribution of the voltage in a given node with the circuit partitioned into 5 × 5 with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.10 Distribution of the voltage in a given node in circuit5 with variation on G, C, I, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

Fig. 9.1 The EKS algorithm
Fig. 9.2 Flowchart of the StoEKS algorithm. Reprinted with permission from [110] © 2008 IEEE
Fig. 9.3 The StoEKS algorithm
Fig. 9.4 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo of a circuit with 280 nodes with three random variables. g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE

Fig. 9.5 Distribution of the voltage variations in a given node by StoEKS, HPC, and MC of a circuit with 2,640 nodes with seven random variables. g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE
Fig. 9.6 Distribution of the voltage variations in a given node by StoEKS and MC of a circuit with 2,640 nodes with 11 random variables. g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE
Fig. 9.7 A PWL current source at certain node. Reprinted with permission from [110] © 2008 IEEE
Fig. 9.8 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo of a circuit with 280 nodes with three random variables using the time-invariant leakage model. g_i = 0.1 I_p. Reprinted with permission from [110] © 2008 IEEE

Fig. 10.1 Flow of ETBR
Fig. 10.2 Flow of varETBR
Fig. 10.3 Transient waveform at the 1,000th node (n1_20583_11663) of ibmpg1 (p = 10, 100 samples). Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.4 Transient waveform at the 1,000th node (n3_16800_9178400) of ibmpg6 (p = 10, 10 samples). Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.5 Simulation errors of ibmpg1 and ibmpg6. Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.6 Relative errors of ibmpg1 and ibmpg6. Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.7 Voltage distribution at the 1,000th node of ibmpg1 (10,000 samples) when t = 50 ns. Reprinted with permission from [91] © 2010 Elsevier

Fig. 11.1 A 2 × 2 bus. Reprinted with permission from [156] © 2010 IEEE
Fig. 11.2 Three-layer metal planes. Reprinted with permission from [156] © 2010 IEEE

Fig. 12.1 Multipole operations within the FMM algorithm. Reprinted with permission from [56] © 2011 IEEE
Fig. 12.2 Structure of augmented system in piCAP
Fig. 12.3 The M2M operation in an upward pass to evaluate local interactions around sources
Fig. 12.4 The M2L operation in a downward pass to evaluate interactions of well-separated source cube and observer cube
Fig. 12.5 The L2L operation in a downward pass to sum all integrations

Fig. 12.6 Prefetch operation in M2L. Reprinted with permission from [56] © 2011 IEEE
Fig. 12.7 Stochastic capacitance extraction algorithm
Fig. 12.8 Two distant panels in the same plane
Fig. 12.9 Distribution comparison between Monte Carlo and piCAP
Fig. 12.10 The structure and discretization of two-layer example with 20 conductors. Reprinted with permission from [56] © 2011 IEEE
Fig. 12.11 Test structures: (a) plate, (b) cubic, and (c) crossover 2 × 2. Reprinted with permission from [56] © 2011 IEEE
Fig. 12.12 The comparison of eigenvalue distributions (panel width as variation source)
Fig. 12.13 The comparison of eigenvalue distributions (panel distance as variation source)

Fig. 13.1 The statHenry algorithm
Fig. 13.2 Four test structures used for comparison
Fig. 13.3 The loop inductance L12l distribution changes for the 10-parallel-wire case under 30% width and height variations
Fig. 13.4 The partial inductance L11p distribution changes for the 10-parallel-wire case under 30% width and height variations

Fig. 14.1 The flow of the presented algorithm
Fig. 14.2 An example circuit. Reprinted with permission from [61]. © 2011 IEEE
Fig. 14.3 A matrix determinant and its DDD representation. Reprinted with permission from [61]. © 2011 IEEE
Fig. 14.4 (a) Kharitonov’s rectangle in state 8. (b) Kharitonov’s rectangle for all nine states. Reprinted with permission from [61]. © 2011 IEEE
Fig. 14.5 (a) A low-pass filter. (b) A linear model of the op-amp in the low-pass filter. Reprinted with permission from [61]. © 2011 IEEE
Fig. 14.6 Bode diagram of the CMOS low-pass filter. Reprinted with permission from [61]. © 2011 IEEE
Fig. 14.7 Bode diagram of the CMOS cascode op-amp. Reprinted with permission from [61]. © 2011 IEEE

Fig. 15.1 Transient mismatch (the time-varying standard deviation) comparison at output of a BJT mixer with distributed inductor: the exact by Monte Carlo and the exact by orthogonal PC expansion. Reprinted with permission from [52]. © 2011 ACM

Fig. 15.2 Transient nominal (x^(0)(t)) (a) and transient mismatch (α1(t)) (b) for one output of a CMOS comparator by the exact orthogonal PC and the isTPWL. Reprinted with permission from [52]. © 2011 ACM
Fig. 15.3 Transient waveform comparison at output of a diode chain: the transient nominal, the transient with mismatch by SiSMA (adding mismatch at ic only), the transient with mismatch by the presented method (adding mismatch at transient trajectory). Reprinted with permission from [52]. © 2011 ACM
Fig. 15.4 Transient mismatch (α1(t), the time-varying standard deviation) comparison at output of a BJT mixer with distributed substrate: the exact by OPC expansion, the macromodel by TPWL (order 45), and the macromodel by isTPWL (order 45). The waveform by isTPWL is visually identical to the exact OPC. Reprinted with permission from [52]. © 2011 ACM
Fig. 15.5 (a) Comparison of the ratio of the waveform error by TPWL and by isTPWL under the same reduction order. (b) Comparison of the ratio of the reduction runtime by maniMOR and by isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline. Reprinted with permission from [52]. © 2011 ACM

Fig. 16.1 Example of the stochastic transient variation or mismatch
Fig. 16.2 Distribution of output voltage at tmax
Fig. 16.3 Parametric yield estimation based on orthogonal PC-based stochastic variation analysis
Fig. 16.4 Stochastic yield optimization
Fig. 16.5 Power consumption optimization
Fig. 16.6 Schematic of operational amplifier
Fig. 16.7 NMC mismatch analysis vs. Monte Carlo for operational amplifier case
Fig. 16.8 Schematic of Schmitt trigger
Fig. 16.9 Comparison of Schmitt trigger example
Fig. 16.10 Schematic of SRAM 6-T cell
Fig. 16.11 Voltage distribution at BL_B node
Fig. 16.12 NMC mismatch analysis vs. MC

Fig. 17.1 The algorithm sketch of the presented new voltage binning method
Fig. 17.2 The delay and power change with supply voltage for C432
Fig. 17.3 Valid voltage segment graph and the voltage binning solution
Fig. 17.4 Histogram of the length of valid supply voltage segment len for C432

Fig. 17.5 The flow of greedy algorithm for covering most uncovered elements in S
Fig. 17.6 Yield under uniform and optimal voltage binning schemes for C432
Fig. 17.7 Maximum achievable yield as function of power and performance constraints for C2670

List of Tables

Table 3.1 Different methods for full-chip SLA
Table 3.2 Relative errors by using different fitting formulas for leakage currents of AND2 gate
Table 3.3 Process variation parameter breakdown for 45 nm technology

Table 4.1 Process variation parameter breakdown for 45 nm technology
Table 4.2 Comparison of the mean values of full-chip leakage currents among three methods
Table 4.3 Comparison of the standard deviations of full-chip leakage currents among three methods
Table 4.4 CPU time comparison among three methods

Table 5.1 Summary of test cases used in this chapter
Table 5.2 Accuracy comparison of different methods based on Monte Carlo
Table 5.3 CPU time comparison
Table 5.4 Incremental leakage analysis cost

Table 6.1 Summary of benchmark circuits
Table 6.2 Statistical dynamic power analysis accuracy comparison against Monte Carlo
Table 6.3 CPU time comparison

Table 7.1 Summary of benchmark circuits
Table 7.2 Total power distribution under fixed input vector
Table 7.3 Sampling number comparison under fixed input vector
Table 7.4 Total power distribution comparison under random input vector and spatial correlation

Table 8.1 Accuracy comparison between Hermite PC (HPC) and Taylor expansion
Table 8.2 CPU time comparison with the Monte Carlo method of one random variable
Table 8.3 CPU time comparison with the Monte Carlo method of two random variables
Table 8.4 Comparison between non-PCA and PCA against Monte Carlo methods
Table 8.5 CPU time comparison with the MC method considering variation in G, C, I

Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method. g_i(t) = 0.1 u_di(t)
Table 9.2 Accuracy comparison of different methods, StoEKS, HPC, and MC. g_i(t) = 0.1 u_di(t)
Table 9.3 Error comparison of StoEKS and HPC over Monte Carlo methods. g_i(t) = 0.1 u_di(t)

Table 10.1 Power grid (PG) benchmarks
Table 10.2 CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10)
Table 10.3 Projected CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10, 10,000 samples)
Table 10.4 Relative errors for the mean of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and different numbers of variables
Table 10.5 Relative errors for the variance of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and different numbers of variables
Table 10.6 CPU times (s) comparison of StoEKS and varETBR (q = 50, p = 10) with 10,000 samples for different numbers of variables

Table 11.1 Number of nonzero element in W_i
Table 11.2 The test cases and the parameters setting
Table 11.3 CPU runtime (in seconds) comparison among MC, SSCM, and StatCap (1st/2nd)
Table 11.4 Capacitance mean value comparison for the 1 × 1 bus
Table 11.5 Capacitance standard deviation comparison for the 1 × 1 bus
Table 11.6 Error comparison of capacitance mean values among SSCM, and StatCap (first- and second-order)

Table 11.7 Error comparison of capacitance standard deviations among SSCM, and StatCap (first- and second-order)

Table 12.1 Accuracy comparison of two orthogonal PC expansions
Table 12.2 Incremental analysis versus MC method
Table 12.3 Accuracy and runtime (s) comparison between MC(3,000), piCap
Table 12.4 MVP runtime (s)/speedup comparison for four different examples
Table 12.5 Runtime and iteration comparison for different examples
Table 12.6 Total runtime (s) comparison for two-layer 20-conductor by different methods

Table 13.1 Accuracy comparison (mean and variance values of inductances) among MC, HPC, and statHenry
Table 13.2 CPU runtime comparison among MC, HPC, and statHenry
Table 13.3 Reduction effects of PFA and wPFA
Table 13.4 Variation impacts on inductances using statHenry

Table 14.1 Extreme values of |P(jω)| and Arg P(jω) for nine states
Table 14.2 Summary of coefficient radius reduction with cancellation
Table 14.3 Summary of DDD information and performance of the presented method

Table 15.1 Scalability comparison of runtime and error for the exact model with MC, the exact model with OPC, and the isTPWL macromodel with OPC

Table 16.1 Comparison of accuracy and runtime
Table 16.2 Comparison of accuracy and runtime
Table 16.3 Sensitivity of �output with respect to each MOSFET width variation �pi
Table 16.4 Sensitivity of �vBL_B and �power with respect to each MOSFET width variation �pi
Table 16.5 Comparison of different yield optimization algorithms for SRAM cell

Table 17.1 Predicted and actual number of bins needed under yield requirement
Table 17.2 Yield under uniform and optimal voltage binning schemes (%)
Table 17.3 CPU time comparison (s)

Part I Fundamentals

Chapter 1 Introduction

1 Nanometer Chip Design in Uncertain World

As VLSI technology scales into the nanometer regime, chip design engineering faces several challenges in maintaining historical rates of performance improvement and capacity increase with CMOS technologies. One profound change in the chip design business is that engineers can no longer realize a design precisely in silicon. As a result, chip performance, manufacturing yield, and lifetime cannot be determined accurately at the design stage. The main culprit is that many chip parameters, such as oxide thickness (affected by chemical and mechanical polishing, CMP) and impurity density (affected by doping fluctuations), cannot be determined precisely and are thus unpredictable. These so-called manufacturing process variations start to play a major role, and their influence on the chip’s performance, yield, and reliability becomes significant [16, 78, 121, 122, 170].

Traditional corner-based analysis and design approaches apply guard bands to account for parameter variations, which may lead to overly conservative designs. Such pessimism can lead to increased design effort and prolonged time to market. Moreover, the worst case of a circuit does not necessarily correspond to all parameters being at their worst or best process conditions, so it becomes extremely difficult to find the true worst case by simulating a limited number of corner cases.
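
The last point can be illustrated with a small numerical sketch. The quadratic `delay` function below is purely hypothetical (it is not a model from this book) and is chosen only so that its maximum lies inside the normalized parameter box rather than at a corner; the parameter names and ranges are likewise assumptions.

```python
import itertools

def delay(x, y):
    """Hypothetical normalized delay as a function of two process
    parameters x, y in [-1, 1]; an assumed toy model, not from the book."""
    return 5.0 - (x - 0.3) ** 2 - (y + 0.4) ** 2

# Corner-based analysis: evaluate only the 2^2 extreme parameter combinations.
corners = list(itertools.product((-1.0, 1.0), repeat=2))
worst_corner = max(corners, key=lambda p: delay(*p))
worst_corner_delay = delay(*worst_corner)

# Dense sweep of the whole parameter box on a 21 x 21 grid.
grid = [i / 10 for i in range(-10, 11)]
worst_point = max(itertools.product(grid, grid), key=lambda p: delay(*p))
worst_delay = delay(*worst_point)

print("worst corner      :", worst_corner, worst_corner_delay)
print("worst point in box:", worst_point, worst_delay)
```

In this toy case the four corners under-report the worst delay because the maximum sits in the interior of the box. A dense sweep is only feasible here because there are two parameters; with many parameters the number of grid points grows exponentially, which is part of the motivation for statistical methods.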

As a result, it is imperative to develop new design methodologies that consider the impacts of various process and environmental uncertainties and elevated temperature on chip performance. Variational impacts have to be incorporated into every step of the design process to ensure reliable chips and profitable manufacturing yields. The design methodologies and design tools, from the system level down to the physical level, have to consider variability impacts on chip performance, which calls for new statistical optimization approaches for designing nanometer VLSI systems.

Performance modeling and analysis of nanometer VLSI systems in the presence of process-induced variation and uncertainty is a crucial problem facing IC chip designers and design tool developers. How to efficiently and accurately assess the impacts of the process variations on circuit performance in the various physical design steps is critical for fast design closure, yield improvement, and cost reduction of VLSI design and fabrication processes. The design methodologies and design tools from the system level down to the physical level have to embrace variability impacts on nanometer VLSI chips, which calls for statistical/stochastic-based approaches for designing VLSI systems at 90 nm and beyond. The advantage and promise of statistical analysis is that the impact of parameter variations on a circuit is obtained simultaneously with less computing effort, and the impact on yield can be properly understood and used for further optimization.

R. Shen et al., Statistical Performance Analysis and Modeling Techniques for Nanometer VLSI Designs, DOI 10.1007/978-1-4614-0788-1_1, © Springer Science+Business Media, LLC 2012

1.1 Causes of Variations

To consider the impact of variations on circuit performance, we should first understand the sources of variations and how they affect circuit performance.

The first source is process-induced variation, which is the fluctuation of process parameter values during the manufacture process. Those variations affect the performance of devices and interconnects. For instance, chip leakage power (especially subthreshold leakage power) is very sensitive to channel length variations owing to the exponential relationship between leakage current and effective channel length. Process variation is caused by different sources such as lithography (optical proximity correction, phase-shift masks), etching, CMP, and the doping process [16, 170]. Figure 1.1 gives cartoon illustrations of the optical proximity correction (OPC) process (a) and the phase-shift mask (PSM) process (b). Figure 1.2 shows the CMP process. Some of the variations are systematic, e.g., those caused by the lithography process [42, 129]. Some are purely random, e.g., the doping density of impurities and edge roughness, etching, and CMP [7]. Process variations can occur at different levels: wafer level, inter-die level, and intra-die level. We discuss these levels in detail in Sect. 1.2.
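The exponential sensitivity of leakage to channel length can be illustrated with a toy model. The sketch below assumes a hypothetical leakage law I = I0 exp(-(L - L_NOM) / L_CHAR); the constants I0, L_NOM, L_CHAR, and the 5% standard deviation are illustrative assumptions, not values from this book. Under these assumptions, a symmetric Gaussian spread in channel length yields a skewed leakage distribution whose mean lies above its median.

```python
import math
import random
import statistics

# Hypothetical constants for illustration only (not taken from the book).
I0 = 1.0        # nominal leakage current, arbitrary units
L_NOM = 45.0    # nominal effective channel length, nm
L_CHAR = 5.0    # characteristic length of the exponential dependence, nm


def leakage(length_nm):
    """Toy subthreshold leakage model: exponential in channel length."""
    return I0 * math.exp(-(length_nm - L_NOM) / L_CHAR)


def sample_leakage(n_samples, sigma_nm, seed=0):
    """Draw Gaussian channel lengths and return the resulting leakage samples."""
    rng = random.Random(seed)
    return [leakage(rng.gauss(L_NOM, sigma_nm)) for _ in range(n_samples)]


samples = sample_leakage(n_samples=20000, sigma_nm=0.05 * L_NOM)
mean_leak = statistics.fmean(samples)
median_leak = statistics.median(samples)
print(f"nominal={leakage(L_NOM):.3f} mean={mean_leak:.3f} median={median_leak:.3f}")
```

Because the exponent is Gaussian in this toy model, the leakage samples are log-normally distributed, which is why log-normal variables appear repeatedly in the later chapters.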

In addition to the process-induced variations, there are also variations from the chip operational environment. These include temperature variations and power supply variations, which affect circuit timing and power. A reduced power supply reduces the driving strength of the devices and hence degrades their performance. The so-called power supply integrity has become a serious concern for chip sign-off. On the other hand, increased temperature leads to more leakage, which in turn results in more heat generation and a higher on-chip temperature. Such positive feedback can sometimes lead to thermal runaway and ultimate failure of the devices. Further, both supply voltage degradation and temperature are subject to process-induced variations, as they are functions of chip power (dynamic, short-circuit, and leakage), which is susceptible to process variations.

In addition to the variations mentioned above, chip performance also changes over time due to aging and other physical reliability effects such as hot carrier injection, negative/positive bias temperature instability (N/PBTI), and electromigration. Hot carrier injection can trigger numerous types of physical damage in the devices and cause threshold voltage shifts. N/PBTI also leads to increased threshold voltage and decreased drain current and transconductance of devices. Electromigration results in increased wire resistance and timing degradation of wires, and can even lead to wire failure in the worst case. Those variations typically appear after chips have been used for a while and, in the past, were studied more as reliability issues than as variation problems. In this book, we therefore do not consider such aging- and reliability-related variations.

Fig. 1.1 OPC and PSM procedures in the manufacture process: (a) optical proximity correction (OPC) process; (b) phase-shift mask (PSM) process


Fig. 1.2 Chemical and mechanical polishing (CMP) process

1.2 Process Variation Classification and Modeling

To facilitate modeling and analysis, it is beneficial to classify the process variations into different categories. In general, process variations can be classified into two categories [16, 170]: inter-die and intra-die. Inter-die variations are the variations from die to die, wafer to wafer, and lot to lot. They are typically represented by a single variable for each die. As a result, inter-die variations are global variables and affect all the devices on a chip in the same way, e.g., they make the transistor gate channel lengths of all the devices on the same chip smaller.

In this book, we can model parameter variation as follows:

$$\delta_{\text{total}} = \delta_{\text{inter}}, \tag{1.1}$$

where $\delta_{\text{inter}}$ represents the inter-die variation. Typically, inter-die variations have simple distributions such as Gaussian. For a single parameter variation, the impact of inter-die variation can be captured very easily, as all the devices in a die take the same value. In other words, under inter-die variation, if the circuit performance metrics such as power, timing, and noise of all gates or devices are sensitive to the process parameters in a similar way, then the circuit performance can be analyzed at multiple process corners using deterministic analysis methods. However, if a number of inter-die process variations are considered and they are also correlated, the number of corner cases grows exponentially with the number of process parameters.
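The exponential growth of corner cases is easy to see by enumeration. The following is a minimal sketch; the parameter names and the three normalized levels per parameter (slow, typical, fast) are hypothetical choices for illustration.

```python
from itertools import product


def enumerate_corners(parameter_levels):
    """Return every corner, i.e., every combination of per-parameter levels."""
    names = list(parameter_levels)
    return [dict(zip(names, combo))
            for combo in product(*(parameter_levels[n] for n in names))]


# Hypothetical parameters, each with slow/typical/fast levels (normalized shifts).
levels = {
    "channel_length": (-1.0, 0.0, 1.0),
    "oxide_thickness": (-1.0, 0.0, 1.0),
    "threshold_voltage": (-1.0, 0.0, 1.0),
}
corners = enumerate_corners(levels)
print(len(corners))  # 3 parameters with 3 levels each give 3**3 = 27 corners

# The corner count is levels ** parameters, so it grows exponentially.
for n_params in (3, 6, 12):
    print(n_params, 3 ** n_params)
```

With twelve such parameters, the count already exceeds half a million deterministic simulations, which is the cost that statistical methods aim to avoid.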

Intra-die variations correspond to variability within a single chip. Intra-die variations may affect different devices differently on the same die, e.g., they make some devices have smaller gate oxide thicknesses and others have larger gate oxide thicknesses. In addition, intra-die variations may exhibit spatial correlation due to proximity effects, i.e., devices located close to each other are more likely to have similar characteristics than those placed far apart.

Fig. 1.3 The dishing and oxide erosion after the CMP process

Obviously, intra-die variations typically involve a large number of variables, as each device may require its own variable. As a result, statistical methods must be used, as the corner-based method is too expensive in this case. Intra-die variation can be further classified into wafer-level variation, layout-dependent variation, and statistical variation [170] based on the sources of the variations. Wafer-level variation comes from lens aberration effects. Layout-dependent variation is caused by lithographic and etching processes such as CMP, OPC, and phase-shift masks (PSM). CMP may lead to variations in dimensions called dishing and oxide erosion. Figure 1.3 gives a cartoon illustration of the dishing and oxide erosion after the CMP process.

Optical proximity effects are layout dependent and lead to different critical dimension (CD) variations depending on the neighboring layout of a pattern. Those layout-dependent variations are typically spatially correlated (they also have purely random components). Statistical variations come from random dopant variations, whose impacts were not significant in the past but become more visible as the CD scales down. Those variations are purely random and not spatially correlated. However, their impact on performance tends to be limited in general due to the averaging effect.

In summary, we can model all the components of variation as follows:

$$\delta_{\text{total}} = \delta_{\text{inter}} + \delta_{\text{intra}}, \tag{1.2}$$

where $\delta_{\text{inter}}$ and $\delta_{\text{intra}}$ represent the inter-die variation and intra-die variation, respectively. In some works, such as [13, 95, 170], $\delta_{\text{inter}}$ and $\delta_{\text{intra}}$ are both modeled as Gaussian random variables. In general, we will consider both the Gaussian and non-Gaussian cases.

For layout-dependent $\delta_{\text{intra}}$, the value of a parameter $p$ located at $(x, y)$ can be modeled as a location-dependent normally distributed random variable [101]:

$$p = \mu_p + \delta_x + \delta_y + \epsilon, \tag{1.3}$$

where $\mu_p$ is the mean value (nominal design parameter value) at $(0, 0)$, and $\delta_x$ and $\delta_y$ stand for the gradients of the parameter indicating the spatial variations of $p$ along the $x$ and $y$ directions, respectively. $\epsilon$ represents the random intra-chip variation. Due to spatial correlations in the intra-die variation [195], the vector of all random components across the chip, $\epsilon$, has a correlated multivariate normal distribution, $\epsilon \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the spatially correlated parameters. If the covariance matrix is the identity matrix, then there is no correlation among the variables.
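A correlated multivariate normal vector of this kind can be sampled by factoring the covariance matrix. The sketch below is a minimal illustration under stated assumptions: the exponential decay of correlation with distance, the correlation length, and the three gate locations are hypothetical choices, not the correlation model used in this book. It draws independent standard normal values and maps them through a Cholesky factor of the covariance matrix.

```python
import math
import random


def exp_covariance(points, sigma, corr_length):
    """Covariance matrix with an assumed exponential decay over distance."""
    return [[sigma ** 2 * math.exp(-math.dist(p, q) / corr_length)
             for q in points] for p in points]


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T == matrix (matrix must be SPD)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def sample_correlated(lower, rng):
    """Map independent N(0, 1) draws z to correlated draws eps = L z."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(row[k] * z[k] for k in range(i + 1)) for i, row in enumerate(lower)]


# Three hypothetical gate locations (arbitrary units): two close, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (20.0, 0.0)]
cov = exp_covariance(points, sigma=1.0, corr_length=5.0)
lower = cholesky(cov)
rng = random.Random(1)
draws = [sample_correlated(lower, rng) for _ in range(20000)]


def sample_corr(i, j):
    """Sample second moment of components i and j; with zero mean and
    unit sigma this estimates their correlation."""
    return sum(d[i] * d[j] for d in draws) / len(draws)


print(round(sample_corr(0, 1), 2), round(sample_corr(0, 2), 2))
```

In this setup the two neighboring locations come out strongly correlated, while the distant location is nearly uncorrelated with them, which mirrors the proximity effect described above.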

1.3 Process Variation Impacts

In this section, we discuss the impact of the variations on the performance of a circuit. We have discussed different variations and their sources in the previous sections. It has been shown that variations in device channel length have the largest impact on device and circuit performance [151, 170]. Channel length variations consist of both inter-die and intra-die variations and have spatially correlated components and purely random components. Channel length directly affects the leakage current and the driving strength of a device.

It is well accepted that process variations have huge impacts on circuit timing, power, yield, and reliability, and many studies have been done to assess their impacts in the past decade. In 2003, Borkar from Intel Corporation showed in a famous figure that the leakage current variation can be 20× with a 1.3× variation in timing [8]. As a result, leakage analysis and estimation have been intensively studied recently. Furthermore, our recent study shows that the total chip power variation can be significant, as glitch-related variation and other variation impacts on dynamic power can be significant [60, 62]. Figure 1.4 compares the total power distributions of circuit c432 from the ISCAS'85 benchmark under two sources of variation. The first (upper) distribution is obtained with random input vectors. The second is obtained with a fixed input vector but under process variations with spatial correlation. As can be seen, the variance induced by process variations is comparable with the variance induced by random input vectors, which is quite significant.

In the following chapters, we present detailed studies to assess the impacts of process variations on full-chip power (leakage, dynamic, and total power), interconnects and their delays, voltage drops on power distribution networks, analog circuit performance, and yield.

2 Book Outline

The book presents the latest developments in modeling and analysis of VLSI systems in the presence of process variations at the nanometer scale. The authors make no attempt to be comprehensive on the selected topics. Instead, we want to provide some promising perspectives from the angle of new analysis algorithms to solve the existing problems with reduced design cycle and cost. We hope this book can guide chip designers in understanding the potential and limitations of the existing design tools when improving their circuit design productivity, CAD developers in implementing the state-of-the-art techniques in their tools, CAD researchers in developing better, new-generation algorithms, and students in understanding and mastering the emerging needs in the research.

Fig. 1.4 The comparison of circuit total power distribution of circuit c432 in the ISCAS'85 benchmark set: (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Both panels plot occurrences versus power in W. Reprinted with permission from [62] © 2011 IEEE

The book consists of five parts. Part I starts with a review of fundamental statistical and stochastic mathematical concepts, presented in Chap. 2. We discuss random processes, correlation matrices, and the Monte Carlo (MC) method. We also review orthogonal polynomial chaos (PC) and the related spectral stochastic method, as well as principal factor analysis (PFA) and its variants for variable reduction.

2.1 Statistical Full-Chip Power Analysis

Part II of this book focuses on techniques for statistical full-chip power consumption analysis considering process variations. We look at important aspects of statistical power analysis, such as leakage power, dynamic power, and total power estimation techniques, in different chapters.


Chapter 3 gives an overall review of the statistical leakage analysis problem considering process variations with spatial correlations. The chapter discusses the existing approaches and presents the pros and cons of those methods.

Chapter 4 presents a method for analyzing full-chip leakage current distributions. The method considers both intra-die and inter-die variations with spatial correlations. The presented method employs the spectral stochastic method and the multidimensional Gaussian quadrature method to represent and compute variational leakage at the gate level, and uses orthogonal decomposition to reduce the number of random variables by exploiting the strong spatial correlations of intra-die variations.

Chapter 5 gives a linear-time algorithm for full-chip statistical analysis of leakage power in the presence of general spatial correlation (strong or weak). The presented algorithm adopts a set of uncorrelated virtual variables over grid cells to represent the original physical random variables with spatial correlation, and the size of a grid cell is determined by the correlation length. A look-up table (LUT) is further applied to cache the statistical leakage information of each type of gate in the library, which avoids computing leakage for each gate instance. As a result, the full-chip leakage can be calculated with $O(N)$ time complexity, where $N$ is the number of grid cells on the chip.

Chapter 6 proposes a statistical dynamic power estimation method considering the spatial correlation in process variation. The chapter first shows that channel length variations have significant impacts on the dynamic power of a gate. As in leakage analysis, virtual grid-based modeling is applied here to consider the spatial correlations among gates. The segment-based statistical power method is used to deal with the impacts of glitch variations on dynamic power. The orthogonal polynomials of a statistical gate power are computed based on switching segment probabilities. The total full-chip dynamic power expressions are then computed by summing up the resulting orthogonal polynomials (their coefficients).

Chapter 7 introduces an efficient statistical chip-level total power estimation method considering process variations with spatial correlation. The new method computes the total power via circuit-level simulation under realistic input testing vectors. To consider the process variations with spatial correlation, the PFA method is applied to transform the correlated variables into uncorrelated ones and, at the same time, reduce the number of resulting random variables. Afterward, Hermite polynomials and sparse grid techniques are used to estimate the total power distribution in a sampling-based way.

2.2 Variational On-Chip Power Delivery Network Analysis

Part III of the book deals with variational analysis of on-chip power grid (distribution) networks to assess the impacts of process variations on voltage drop noise and power delivery integrity. This part has three chapters: Chaps. 8–10.

Chapter 8 introduces an efficient stochastic method for analyzing the voltage drop variations of on-chip power grid networks, considering log-normal leakage current variations with spatial correlation. The new analysis is based on the orthogonal polynomial chaos (OPC) representation of random processes. This method considers the effects of both wire variations and subthreshold leakage current variations, which are modeled as log-normally distributed random variables, on the power grid voltage variations. To consider the spatial correlation, orthogonal decomposition is carried out to map the correlated random variables into independent variables.

Chapter 9 presents another stochastic method for solving problems similar to those presented in Chap. 8. The new method, called StoEKS, still applies Hermite orthogonal polynomials to represent the random variables in both power grid networks and input leakage currents. But unlike other orthogonal polynomial-based stochastic simulation methods, the extended Krylov subspace (EKS) method is employed to compute variational responses from the augmented matrices consisting of the coefficients of Hermite polynomials. The new contribution of this method lies in accelerating the spectral stochastic method by using the EKS method to quickly solve the variational circuit equations. By using the reduction technique, the presented method partially mitigates the increased circuit-size problem associated with the augmented matrices from the Galerkin-based spectral stochastic method.

Chapter 10 gives a new approach to variational power grid analysis. The new approach, called ETBR for extended truncated balanced realization, is based on model order reduction techniques to reduce the circuit matrices before the simulation. Unlike the (improved) extended Krylov subspace methods EKS/IEKS, ETBR performs fast truncated balanced realization on the response Gramian to reduce the original system. ETBR also avoids the adverse explicit moment representation of the input signals. Instead, it uses a spectrum representation of the input signals in the frequency domain, obtained by fast Fourier transformation.

The new algorithm is very efficient and scalable for huge networks with a large number of variational variables. This approach, called varETBR for variational ETBR, is based on model order reduction techniques to reduce the circuit matrices before the variational simulation. It performs the parameterized reduction on the original system using variation-bearing subspaces. varETBR calculates variational response Gramians by MC-based numerical integration, considering both system and input source variations, to generate the projection subspace. varETBR scales well with the number of variables and is flexible for different variational distributions and ranges, as demonstrated in the experimental results. After the reduction, MC-based statistical simulation is performed on the reduced system, and the statistical responses of the original system are obtained thereafter.

2.3 Statistical Interconnect Modeling and Extraction

Part IV of this book is concerned with statistical interconnect extraction and modeling under process variations. There are three chapters: Chaps. 11–13.


Chapter 11 introduces a statistical capacitance extraction method for interconnect conductors considering process variations. The new method is called StatCap, where orthogonal polynomials are used to represent the statistical processes. The chapter shows how the variational potential coefficient matrix is represented in a first-order form using Taylor expansion and orthogonal decomposition. Then an augmented potential coefficient matrix, which consists of the coefficients of the polynomials, is derived. After that, the corresponding augmented system is solved to obtain the variational capacitance values in the orthogonal polynomial form. Chapter 11 further extends StatCap to the second-order form to give more accurate results without loss of efficiency compared to the linear models.

Chapter 12 presents a parallel and incremental solver for stochastic capacitance extraction. The overall extraction flow is called piCAP. The random geometrical variation is described by stochastic geometrical moments (SGMs), which leads to a densely augmented system equation. To efficiently extract the capacitance and solve the system equation, a parallel fast multipole method (FMM) is derived in the framework of SGMs. This can efficiently estimate the stochastic potential interaction and its matrix-vector product (MVP) with charge. Moreover, a generalized minimal residual method with incremental update is developed to calculate both the nominal value and the variance.

Chapter 13 presents a method for statistical inductance extraction and modeling for interconnects considering process variations. The new method, called statHenry, is based on the collocation-based spectral stochastic method. The coefficients of the partial inductance orthogonal polynomial are computed via the collocation method, where a fast multidimensional Gaussian quadrature method is applied with sparse grids. To further improve the efficiency of the presented method, a random variable reduction scheme is used. Given the interconnect wire variation parameters, the resulting method can derive the parameterized closed form of the inductance value. The chapter shows that both partial and loop inductance variations can be significant given the width and height variations. The presented approach can work with any existing inductance extraction tool to extract the variational partial and loop inductance or impedance.

2.4 Statistical Analog and Yield Analysis and Optimization

In Part V of this book, we discuss the variational analysis of analog and mixed-signal circuits as well as yield analysis and optimization methods based on statistical performance analysis and modeling. We present the performance bound analysis technique in the s-domain for linearized analog circuits (Chap. 14) and the stochastic mismatch analysis of analog circuits (Chap. 15). Chapter 16 presents a yield analysis and optimization technique, and Chap. 17 presents a voltage binning scheme.

Chapter 14 introduces a performance bound analysis of analog circuits considering process variations. The presented method applies graph-based symbolic analysis and affine interval arithmetic to derive the variational transfer functions of (linearized) analog circuits, with variational coefficients in the form of intervals. Then the frequency response bounds (maximum and minimum) are obtained by analyzing a finite number of transfer functions given by the control-theoretic Kharitonov polynomial functions, which can be computed very efficiently. We also show in this chapter that the response bounds given by Kharitonov's functions are conservative given the correlations among coefficient intervals in transfer functions.
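As a small illustration of why only a finite number of polynomials is needed, the sketch below builds the four standard Kharitonov polynomials from interval coefficients. It is a generic textbook construction, not the implementation of Chap. 14; the function name and the example intervals are hypothetical, and how the polynomials are then used to bound the frequency response is left to that chapter.

```python
def kharitonov_polynomials(lower, upper):
    """Return the four Kharitonov polynomials for interval coefficients.

    lower[i] and upper[i] bound the coefficient of s**i (ascending powers).
    Each result is a list of coefficients in the same ascending order.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    # Bound-selection patterns repeat with period 4 over the power index:
    # "l" picks the lower bound and "h" picks the upper bound.
    patterns = ("llhh", "hhll", "lhhl", "hllh")
    return [[lower[i] if pattern[i % 4] == "l" else upper[i]
             for i in range(len(lower))]
            for pattern in patterns]


# Hypothetical interval polynomial of degree 4 (coefficients of s^0 ... s^4).
lower = [1.0, 2.0, 3.0, 4.0, 5.0]
upper = [1.5, 2.5, 3.5, 4.5, 5.5]
for poly in kharitonov_polynomials(lower, upper):
    print(poly)
```

Regardless of the polynomial degree, only these four coefficient selections are produced, which is what keeps the bound computation cheap.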

Chapter 15 discusses a fast non-Monte Carlo (NMC) method to calculate the mismatch of analog circuits in the time domain. The local random mismatch is described by a noise source with an explicit dependence on geometric parameters and is further expanded by orthogonal polynomial chaos (OPC). The resulting equation forms a stochastic differential algebraic equation (SDAE). To deal with large-scale problems, the SDAE is linearized at a number of snapshots along the nominal transient trajectory and, hence, is naturally embedded into trajectory-piecewise-linear (TPWL) macromodeling. The modeling is further improved with a novel incremental aggregation of subspaces identified at those snapshots.

Chapter 16 introduces a fast NMC method to capture physical-level stochastic variations for system-level yield estimation and optimization. Based on the orthogonal PC expansion concept, an efficient and true NMC mismatch analysis is developed to estimate the parametric yield. Moreover, this work derives the stochastic sensitivity for yield within the framework of orthogonal polynomials. Using these sensitivities, a corresponding multiobjective optimization is developed to improve the yield rate and other performance merits simultaneously. As a result, the presented approach can automatically tune design parameters for a robust design.

Chapter 17 gives a yield optimization technique that uses voltage binning to improve the yield of chips. The voltage binning technique assigns different supply voltages to different chips in order to improve the yield. The chapter introduces the valid voltage segment concept, which is determined by the timing and power constraints of chips. Then we show a formulation to predict the maximum number of bins required under the uniform binning scheme from the distribution of the length of the valid supply voltage segment. With this concept, an optimal binning scheme can be modeled as a set-cover problem. A greedy algorithm is developed to solve the resulting set-cover problem in an incremental way. The presented method can also be extended to deal with ranged supply voltages for dynamic voltage scaling under different operation modes (such as low-power and high-performance modes).
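The set-cover view of voltage binning can be sketched with the classic greedy heuristic. The code below is a generic illustration and not the algorithm of Chap. 17; the function name, the chip identifiers, the valid voltage segments, and the candidate voltage list are all hypothetical. Each chip is covered by a bin voltage if that voltage lies inside the chip's valid segment.

```python
def greedy_voltage_bins(segments, candidate_voltages):
    """Greedy set cover: pick supply voltages until every coverable chip is covered.

    segments maps a chip id to its valid (v_min, v_max) supply-voltage segment.
    Returns (chosen voltages, set of chip ids that no candidate can cover).
    """
    covers = {v: {chip for chip, (lo, hi) in segments.items() if lo <= v <= hi}
              for v in candidate_voltages}
    uncovered = set().union(*covers.values())
    uncoverable = set(segments) - uncovered
    chosen = []
    while uncovered:
        # Pick the voltage that covers the most still-uncovered chips.
        best = max(candidate_voltages, key=lambda v: len(covers[v] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen, uncoverable


# Hypothetical valid supply-voltage segments (in volts) for five chips.
segments = {
    "chip_a": (0.90, 1.00),
    "chip_b": (0.95, 1.05),
    "chip_c": (1.10, 1.20),
    "chip_d": (1.15, 1.25),
    "chip_e": (1.40, 1.50),
}
candidates = [0.95, 1.00, 1.10, 1.15, 1.20]
bins, lost = greedy_voltage_bins(segments, candidates)
print(bins, lost)  # [0.95, 1.15] {'chip_e'}
```

In this example two bins cover four chips, and the fifth chip counts as yield loss because no candidate voltage falls inside its valid segment.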

3 Summary

In this chapter, we first described the motivations for the statistical and variational analysis and modeling of nanometer VLSI systems. We then briefly introduced all the chapters in the book, which are divided into five parts: introduction and fundamentals; statistical full-chip power analysis; variational power delivery network analysis; statistical interconnect extraction and modeling; and performance bound and statistical analysis for analog/mixed-signal circuits as well as statistical yield analysis and optimization.

Throughout the book, numerical examples are provided to shed light on the discussed topics and to help the reader gain more insight into the discussed methods. Our treatment of those topics is not meant to be comprehensive, but we hope it can guide circuit designers and CAD developers in understanding the important impacts of variability and reliability on nanometer chips and the limitations of their existing tools. We hope this book helps readers apply those techniques and develop new-generation CAD tools to design emerging nanometer VLSI systems.

Chapter 2 Fundamentals of Statistical Analysis

To make this book self-contained, this chapter reviews the relevant mathematical concepts used in this book. We first review basic probability and statistical concepts. Then we introduce mathematical notation for statistical processes with multiple variables and variable reduction methods. We then go through some statistical analysis approaches such as the MC method and the spectral stochastic method. Finally, we discuss some fast techniques to compute the sum of random variables with log-normal distributions.

1 Basic Concepts in Probability Theory

An understanding of probability theory is essential to statistical analysis. In this section, we first explain some basic concepts in probability theory. More details and other stochastic theories can be found in [132].

1.1 Experiment, Sample Space, and Event

Definition 2.1. An experiment is any process of observation or procedure that can be repeated (theoretically) an infinite number of times and has a well-defined set of possible outcomes.

Definition 2.2. A sample space is the set of all possible outcomes of an experiment.

Definition 2.3. An event is a subset of the sample space of an experiment.

Consider the following experiments as examples:

Example 1. Tossing a coin. Sample space: $S = \{\text{head}, \text{tail}\}$ or $S = \{0, 1\}$, where 0 represents a tail and 1 represents a head.



1.2 Random Variable and Expectation

Usually, we are interested in some value associated with a random event rather than the event itself. For example, in the experiment of tossing two dice, we may only care about the sum of the two dice, not the outcome of each die.

Definition 2.4. A random variable $X$ on a sample space $S$ is a real-valued function $X : S \to \mathbb{R}$.

Definition 2.5. A discrete random variable is a random variable that takes only a finite or countably infinite number of values (arises from counting).

Definition 2.6. A continuous random variable is a random variable whose set of assumed values is uncountable (arises from measurement).

Let $X$ be a random variable and let $a \in \mathbb{R}$. The event "$X = a$" represents the set $\{s \in S \mid X(s) = a\}$, and the probability of this event is written as

$$\Pr(X = a) = \sum_{s \in S :\, X(s) = a} \Pr(s).$$
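The sum over outcomes can be made concrete with the two-dice experiment mentioned earlier. The following minimal sketch assumes two fair six-sided dice (the fairness assumption and the helper names are ours) and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Sample space of tossing two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))
pr_outcome = Fraction(1, len(sample_space))


def dice_sum(outcome):
    """Random variable X: maps an outcome s to the sum of the two dice."""
    return outcome[0] + outcome[1]


def pr_equals(random_variable, value):
    """Pr(X = a): add Pr(s) over all outcomes s with X(s) = a."""
    return sum((pr_outcome for s in sample_space if random_variable(s) == value),
               Fraction(0))


print(pr_equals(dice_sum, 7))  # 1/6
```

Six of the 36 outcomes give a sum of 7, so the event "X = 7" has probability 6/36 = 1/6.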

Example 2. Continuous random variable. A CPU is picked randomly from a group of CPUs whose area should be 1 cm². Due to some error in the manufacture process, the area of a chip could vary from chip to chip in the range 0.9 cm² to 1.05 cm², excluding the latter.

Let $X$ denote the area of a selected chip. Possible outcomes: $0.9 \le X < 1.05$.

Example 3. Refer to the previous example. The area of a selected chip is a continuous random variable. The following table gives the area in cm² of 100 chips. It lists the observed values of the continuous random variable, the corresponding frequencies, and their probabilities.

Area X (cm²)   Number of chips   Pr(a ≤ X < b)
0.90–0.95      8                 0.08
0.95–1.00      57                0.57
1.00–1.05      35                0.35
Total          100               1.00

Definition 2.7. The expectation $E[X]$, or $\mu$, of a discrete random variable $X$ is

$$E[X] = \mu = \sum_i i \cdot \Pr(X = i),$$

where the sum is taken over all values in the range of $X$. If $\sum_i |i| \cdot \Pr(X = i)$ converges, then the expectation is finite. Otherwise, the expectation is said to be unbounded.

$E[X]$ is also called the mean value of the probability distribution.
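As a rough numerical illustration, the expectation formula can be applied to the binned chip areas of Example 3. The sketch below makes one assumption that is not in the text: every chip in a bin is treated as if its area were the bin midpoint, which turns the continuous variable into a discrete one.

```python
# Observed bins from Example 3: (lower edge, upper edge, number of chips).
bins = [(0.90, 0.95, 8), (0.95, 1.00, 57), (1.00, 1.05, 35)]
total = sum(count for _, _, count in bins)

# Relative frequencies play the role of Pr(a <= X < b).
probabilities = [count / total for _, _, count in bins]

# Assumption (not in the text): every chip in a bin has the midpoint area.
midpoints = [(low + high) / 2 for low, high, _ in bins]

# E[X] = sum over values of value * probability.
expected_area = sum(m * p for m, p in zip(midpoints, probabilities))
print(f"estimated E[X] = {expected_area:.4f} cm^2")
```

The estimate comes out slightly below the 1 cm² target because more chips fall below the target area than above it in this sample.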


1.3 Variance and Moments of Random Variable

Theorem 2.1. Markov's inequality. For a random variable $X$ that takes on only nonnegative values and for all $a > 0$, we have

$$\Pr(X \ge a) \le \frac{E[X]}{a}.$$

Proof. Let $X$ be a random variable such that $X \ge 0$ and let $a > 0$. Define a random variable $I$ by

$$I = \begin{cases} 1, & \text{if } X \ge a, \\ 0, & \text{otherwise,} \end{cases}$$

where $E[I] = \Pr(I = 1) = \Pr(X \ge a)$ and

$$I \le \frac{X}{a}. \tag{2.1}$$

Taking the expectations of both sides of (2.1) gives the inequality

$$E[I] = \Pr(X \ge a) \le E\left[\frac{X}{a}\right] = \frac{E[X]}{a},$$

where we used Lemma 2.3. □

Definition 2.8. The $k$th moment of a random variable $X$ is $E[X^k]$. The variance of $X$ is

$$\begin{aligned}
\operatorname{Var}[X] &= E\left[(X - E[X])^2\right] \\
&= E\left[X^2 - 2X \cdot E[X] + (E[X])^2\right] \\
&= E[X^2] - 2 \cdot E[X] \cdot E[X] + (E[X])^2 \\
&= E[X^2] - (E[X])^2,
\end{aligned}$$

and the standard deviation of $X$ is defined as

$$\sigma(X) = \sqrt{\operatorname{Var}[X]}.$$
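The two expressions for the variance can be checked on a small example. The sketch below uses a fair six-sided die (our choice of example) with exact fractions, computing the variance once from the definition and once from the first two moments.

```python
from fractions import Fraction

# Fair six-sided die: Pr(X = i) = 1/6 for i = 1, ..., 6.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}


def moment(pmf, k):
    """k-th moment E[X^k] of a discrete random variable."""
    return sum(x ** k * p for x, p in pmf.items())


mean = moment(pmf, 1)
var_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_by_moments = moment(pmf, 2) - mean ** 2
print(mean, var_by_definition, var_by_moments)  # 7/2 35/12 35/12
```

Both routes give 35/12, as the derivation above requires.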

The area under each curve is 1.

Theorem 2.2. Chebyshev's inequality. For any $a > 0$ and a random variable $X$, we have

$$\Pr(|X - E[X]| \ge a) \le \frac{\operatorname{Var}[X]}{a^2}.$$


Proof. Note that

$$\Pr(|X - E[X]| \ge a) = \Pr\left((X - E[X])^2 \ge a^2\right)$$

and the random variable $(X - E[X])^2 \ge 0$. Use Markov's inequality and the definition of variance to obtain

$$\Pr\left((X - E[X])^2 \ge a^2\right) \le \frac{E\left[(X - E[X])^2\right]}{a^2} = \frac{\operatorname{Var}[X]}{a^2}$$

as required. □

Corollary 2.1. For any $t > 1$ and a random variable $X$, we have

$$\Pr\left(|X - E[X]| \ge t \cdot \sigma(X)\right) \le \frac{1}{t^2},$$

$$\Pr\left(|X - E[X]| \ge t \cdot E[X]\right) \le \frac{\operatorname{Var}[X]}{t^2 (E[X])^2}.$$

Proof. The results follow from the definitions of variance and standard deviation and Chebyshev's inequality. □
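Both inequalities can be checked exactly on a small distribution. The sketch below again uses a fair six-sided die as an illustrative nonnegative random variable (our choice, not an example from the text) and compares exact tail probabilities with the Markov and Chebyshev bounds over a range of thresholds.

```python
from fractions import Fraction

# Fair six-sided die as a small nonnegative random variable.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())


def pr(predicate):
    """Probability of the event {X satisfies predicate}."""
    return sum((p for x, p in pmf.items() if predicate(x)), Fraction(0))


def markov_holds(a):
    """Check Pr(X >= a) <= E[X] / a for a > 0."""
    return pr(lambda x: x >= a) <= mean / a


def chebyshev_holds(a):
    """Check Pr(|X - E[X]| >= a) <= Var[X] / a^2 for a > 0."""
    return pr(lambda x: abs(x - mean) >= a) <= variance / a ** 2


thresholds = [Fraction(k, 2) for k in range(1, 13)]  # 0.5, 1.0, ..., 6.0
print(all(markov_holds(a) for a in thresholds))     # True
print(all(chebyshev_holds(a) for a in thresholds))  # True
print(pr(lambda x: x >= 5), mean / 5)  # exact tail 1/3 versus Markov bound 7/10
```

The last line shows that the bounds can be loose: the exact tail probability is 1/3, while Markov's inequality only guarantees that it is at most 7/10.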

1.4 Distribution Functions

Definition 2.9. A discrete probability distribution is a table (or a formula) listing all possible values that a discrete variable can take on, together with the associated probabilities.

Definition 2.10. The function $f(x)$ is called a probability density function (PDF) for the continuous random variable $X$ if

$$\int_a^b f(x)\,dx = \Pr(a \le X \le b) \tag{2.2}$$

for any values of $a$ and $b$.

That is to say, the area under the curve of $f(x)$ between any two ordinates $x = a$ and $x = b$ is the probability that $X$ lies between $a$ and $b$. It is easy to see that the total area under the PDF curve bounded by the $x$-axis is equal to 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{2.3}$$


Definition 2.11. For a real-valued random variable $X$, the probability distribution is completely characterized by its cumulative distribution function (CDF):

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \Pr[X \le x], \quad x \in \mathbb{R}, \tag{2.4}$$

which describes the probability that the random variable falls in the interval $(-\infty, x]$.
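The relations between the PDF, interval probabilities, and the CDF can be checked numerically. The sketch below assumes an exponential density with a hypothetical rate of 2 as the example PDF (this density is our choice and does not appear in the text) and integrates it with the composite trapezoidal rule.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential density


def pdf(x):
    """Example density f(x) = RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(func, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    interior = sum(func(a + i * h) for i in range(1, steps))
    return h * (0.5 * (func(a) + func(b)) + interior)


def cdf(x, lower_limit=0.0):
    """F(x) as the integral of the density up to x (the density is 0 below 0)."""
    return integrate(pdf, lower_limit, x) if x > lower_limit else 0.0


prob_interval = integrate(pdf, 0.5, 1.5)  # Pr(0.5 <= X <= 1.5), cf. (2.2)
total_area = integrate(pdf, 0.0, 40.0)    # approximates the total area in (2.3)
print(prob_interval, cdf(1.5) - cdf(0.5), total_area)
```

The interval probability obtained directly from the PDF agrees with the difference of two CDF values, and the total area is numerically close to 1.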

1.5 Gaussian and Log-Normal Distributions

Definition 2.12. A Gaussian distribution (also called normal distribution) is denoted as $N(\mu, \sigma^2)$, where, as usual, $\mu$ identifies the mean and $\sigma^2$ the variance. The PDF is defined as follows:

\[
f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \tag{2.5}
\]

The CDF of the standard normal distribution is denoted by $\Phi(x)$ and can be computed as an integral of the PDF:

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right], \quad x \in \mathbb{R}, \tag{2.6}
\]

where erf is the error function.

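Definition 2.13. If $X$ is distributed normally with mean $\mu$ and variance $\sigma^2$, then the exponential of $X$, $Y = \exp(X)$, follows a log-normal distribution. That is to say, a log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed.

The PDF and CDF of a log-normal distribution are as follows:

\[
f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0, \tag{2.7}
\]

\[
F_X(x; \mu, \sigma) = \frac{1}{2}\,\mathrm{erfc}\left[-\frac{\ln x - \mu}{\sigma\sqrt{2}}\right] = \Phi\left(\frac{\ln x - \mu}{\sigma}\right), \tag{2.8}
\]

where erfc is the complementary error function, $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$.

More details about the sum of multiple log-normal distributions are given in Sect. 4 of Chap. 2.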
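The relation between (2.6), (2.7), and (2.8) can be verified with the Python standard library alone. This is a small sketch; the function names and the parameter values used in the checks are illustrative assumptions, not part of the text.

```python
import math

def std_normal_cdf(x):
    """Phi(x) from (2.6), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lognormal_pdf(x, mu, sigma):
    """Log-normal PDF (2.7), valid for x > 0."""
    z = (math.log(x) - mu) ** 2 / (2.0 * sigma**2)
    return math.exp(-z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF (2.8), written with the complementary error function."""
    return 0.5 * math.erfc(-(math.log(x) - mu) / (sigma * math.sqrt(2.0)))

# The median of a log-normal variable is exp(mu), so the CDF there is 0.5.
print(lognormal_cdf(math.exp(0.3), mu=0.3, sigma=0.5))
# Both forms of (2.8) should agree.
print(lognormal_cdf(2.0, 0.0, 0.5), std_normal_cdf(math.log(2.0) / 0.5))
```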


1.6 Basic Concepts for Multiple Random Variables

Definition 2.14. Two random variables $X$ and $Y$ are independent if

\[
\Pr\left((X = x) \cap (Y = y)\right) = \Pr(X = x) \cdot \Pr(Y = y)
\]

for all $x, y \in \mathbb{R}$. Furthermore, the random variables $X_1, X_2, \ldots, X_k$ are mutually independent if for any subset $I \subseteq \{1, 2, \ldots, k\}$ and any values $x_i$ for $i \in I$, we have

\[
\Pr\left(\bigcap_{i \in I} (X_i = x_i)\right) = \prod_{i \in I} \Pr(X_i = x_i).
\]

Theorem 2.3. Linearity of expectations. Let $X_1, X_2, \ldots, X_n$ be a finite collection of discrete random variables with finite expectations. Then

\[
E\left[\sum_i X_i\right] = \sum_i E[X_i].
\]

Proof. We use induction on the number of random variables. For the base case, let $X$ and $Y$ be random variables. Use the law of total probability to get

\[
\begin{aligned}
E[X + Y] &= \sum_i \sum_j (i + j) \cdot \Pr\left((X = i) \cap (Y = j)\right) \\
&= \sum_i \sum_j i \cdot \Pr\left((X = i) \cap (Y = j)\right) + \sum_i \sum_j j \cdot \Pr\left((X = i) \cap (Y = j)\right) \\
&= \sum_i i \sum_j \Pr\left((X = i) \cap (Y = j)\right) + \sum_j j \sum_i \Pr\left((X = i) \cap (Y = j)\right) \\
&= \sum_i i \cdot \Pr(X = i) + \sum_j j \cdot \Pr(Y = j) \\
&= E[X] + E[Y].
\end{aligned}
\]

□

Linearity of expectations holds for any collection of random variables, even if they are not independent. Furthermore, if $\sum_{i=1}^{\infty} E[|X_i|]$ converges, then it can be shown that

\[
E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i].
\]

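Lemma 2.1. Let $c$ be any constant and $X$ a random variable. Then

\[
E[cX] = c \cdot E[X].
\]

Proof. The case $c = 0$ is trivial. Suppose $c \ne 0$. Then

\[
\begin{aligned}
E[cX] &= \sum_i i \cdot \Pr(cX = i) \\
&= c \cdot \sum_i (i/c) \cdot \Pr(X = i/c) \\
&= c \cdot \sum_k k \cdot \Pr(X = k) \\
&= c \cdot E[X]
\end{aligned}
\]

as required. □

If $X$ and $Y$ are two random variables, their covariance is

\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E\left[(X - E[X])(Y - E[Y])\right] \\
&= E\left[(Y - E[Y])(X - E[X])\right] \\
&= \mathrm{Cov}(Y, X).
\end{aligned}
\]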

Theorem 2.4. For any two random variables $X$ and $Y$, we have

\[
\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y).
\]

Proof. Use the linearity of expectations, and the definitions of variance and covariance, to obtain

\[
\begin{aligned}
\mathrm{Var}[X + Y] &= E\left[(X + Y - E[X + Y])^2\right] \\
&= E\left[(X + Y - E[X] - E[Y])^2\right] \\
&= E\left[(X - E[X])^2 + (Y - E[Y])^2 + 2(X - E[X])(Y - E[Y])\right] \\
&= E\left[(X - E[X])^2\right] + E\left[(Y - E[Y])^2\right] + 2 \cdot E\left[(X - E[X])(Y - E[Y])\right] \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y)
\end{aligned}
\]

as required. □


Theorem 2.4 can be extended to a sum of any finite number of random variables. For a collection $X_1, \ldots, X_n$ of random variables, it can be shown that

\[
\mathrm{Var}\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i] + 2 \cdot \sum_i \sum_{j > i} \mathrm{Cov}(X_i, X_j).
\]

Theorem 2.5. For any two independent random variables $X$ and $Y$, we have

\[
E[X \cdot Y] = E[X] \cdot E[Y].
\]

Proof. Let the indices $i$ and $j$ assume all values in the ranges of $X$ and $Y$, respectively. As $X$ and $Y$ are independent random variables,

\[
\begin{aligned}
E[X \cdot Y] &= \sum_i \sum_j ij \cdot \Pr\left((X = i) \cap (Y = j)\right) \\
&= \sum_i \sum_j ij \cdot \Pr(X = i) \cdot \Pr(Y = j) \\
&= \left[\sum_i i \cdot \Pr(X = i)\right] \left[\sum_j j \cdot \Pr(Y = j)\right] \\
&= E[X] \cdot E[Y]
\end{aligned}
\]

as required. □

Corollary 2.2. For any independent random variables $X$ and $Y$, we have

\[
\mathrm{Cov}(X, Y) = 0
\]

and

\[
\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].
\]

Proof. As $X$ and $Y$ are independent, so are $X - E[X]$ and $Y - E[Y]$. For any random variable $Z$, we have

\[
E\left[Z - E[Z]\right] = E[Z] - E\left[E[Z]\right] = 0.
\]

Using Theorem 2.5, the covariance of $X$ and $Y$ is

\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E\left[(X - E[X])(Y - E[Y])\right] \\
&= E\left[X - E[X]\right] \cdot E\left[Y - E[Y]\right] \\
&= 0.
\end{aligned}
\]

Conclude via the latter equation and Theorem 2.4 that

\[
\begin{aligned}
\mathrm{Var}[X + Y] &= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y) \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y]
\end{aligned}
\]

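as required. □

Definition 2.15. For a collection of random variables, $X = (X_1, \ldots, X_n)$, the covariance matrix $\Omega_{n \times n}$ is defined as

\[
\Omega =
\begin{pmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_{n-1}, X_1) & \mathrm{Cov}(X_{n-1}, X_2) & \cdots & \mathrm{Cov}(X_{n-1}, X_n) \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n)
\end{pmatrix}.
\]

When $X_1, \ldots, X_n$ are mutually independent random variables, it can be shown by induction that

\[
\mathrm{Var}\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i].
\]

In this case, the covariance matrix is a diagonal matrix.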
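Theorem 2.4 and the covariance matrix of Definition 2.15 can be illustrated on sampled data. The sketch below assumes NumPy; the 2-by-2 matrix `cov_true` and the sample size are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
cov_true = np.array([[1.0, 0.6],
                     [0.6, 2.0]])           # illustrative covariance matrix
samples = rng.multivariate_normal([0.0, 0.0], cov_true, size=50_000)

# Sample covariance matrix Omega (columns are the variables X and Y).
omega = np.cov(samples, rowvar=False)

# Left- and right-hand sides of Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y).
var_sum = np.var(samples.sum(axis=1), ddof=1)
rhs = omega[0, 0] + omega[1, 1] + 2.0 * omega[0, 1]
print(var_sum, rhs)
```

The two printed values agree up to floating-point rounding, because the identity holds for sample statistics as well as for the underlying distribution.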

2 Multiple Random Variables and Variable Reduction

2.1 Components of Covariance in Process Variation

In general, process variation can be classified into two categories [13]: inter-die and intra-die. Inter-die variations are variations from die to die, while intra-die variations correspond to variability within a single chip. Inter-die variations are global and, hence, affect all the devices on a chip in a similar fashion. For example, they can cause the channel lengths of all the devices on the same chip to be smaller. Intra-die variations may affect devices on the same chip differently. For example, they can cause some devices to have smaller gate oxide thicknesses and others to have larger gate oxide thicknesses. Intra-die variations may exhibit spatial correlation. For example, devices located close to each other are more likely to have similar characteristics.


Fig. 2.1 Grid-based model for spatial correlations (the figure shows gates Gate1 to Gate5 placed on the grid)

In general, we can model parameter variation as follows:

\[
\delta_{\mathrm{total}} = \delta_{\mathrm{inter}} + \delta_{\mathrm{intra}}, \tag{2.9}
\]

where $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ represent the inter-die variation and intra-die variation, respectively. In general [13, 95, 169], $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ can be modeled as Gaussian random variables. In this chapter, we will discuss both Gaussian and non-Gaussian cases. Note that due to the global effect of inter-die variation, a single random variable $\delta_{\mathrm{inter}}$ is used for all gates/grids in one chip.

For $\delta_{\mathrm{intra}}$, the value of parameter $p$ located at $(x, y)$ can be modeled as a normally distributed random variable [101] dependent on location:

\[
p = \mu_p + \delta_x + \delta_y + \epsilon, \tag{2.10}
\]

where $\mu_p$ is the mean value (nominal design parameter value) at $(0, 0)$ and $\delta_x$ and $\delta_y$ stand for the gradients of the parameter indicating the spatial variations of $p$ along the $x$ and $y$ directions, respectively. $\epsilon$ represents the random intra-chip variation. Due to spatial correlations in the intra-chip variation, the vector of all random components across the chip, $\epsilon$, has a correlated multivariate normal distribution, $\epsilon \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the spatially correlated parameters.

A grid-based method is introduced in [13] to account for correlation. In the grid-based method, the chip is partitioned into $\sqrt{n}_{\mathrm{row}} \times \sqrt{n}_{\mathrm{col}} = n$ grids to capture the intra-die spatial correlation of parameters. Since devices close to each other are more likely to have similar characteristics than those placed far away, grid-based methods assume perfect correlation among the devices in the same grid, high correlations among those in close grids, and low to zero correlations in faraway grids. For example, in Fig. 2.1, Gate1 and Gate2 are drawn exaggeratedly large. They are located in the same grid square, and hence, their parameter variations, such as the variations of their gate channel lengths, are assumed to be always identical. Gate1 and Gate3 lie in neighboring grids, and hence, their parameter variations are not identical but highly correlated due to their spatial proximity. For example, when Gate1 has a larger than nominal gate channel length, Gate3 is more likely to have a larger than nominal gate channel length. On the other hand, Gate1 and Gate4 are far away from each other, so their parameters can be assumed to be weakly correlated or uncorrelated. For example, when Gate1 has a larger than nominal gate channel length, the gate channel length of Gate4 may be either larger or smaller than nominal.

With the grid-based model, we can use a single random variable $p(x, y)$ to model a parameter variation in a single grid at location $(x, y)$. As a result, $n$ random variables are needed for each type of parameter, where each represents the value of the parameter in one of the $n$ grids. In addition, we assume that correlation only exists among parameters of the same type in different grids. For example, the gate length $L$ of transistors in the $i$th grid is correlated with that in nearby grids but is uncorrelated with other parameters, such as the gate oxide thickness $T_{ox}$, in any grid including the $i$th grid itself. Note that this assumption is not critical and can easily be removed. For each type of parameter, a correlation matrix $\Sigma$ of size $n \times n$ represents the spatial correlation of this parameter. Notice that the number of grid partitions needed is determined by the process, not the circuit, so we can apply the same correlation model to different designs under the same process.

2.2 Random Variable Decoupling and Reduction

Because of correlation, the large number of random variables involved in VLSI design can be reduced. After the random variables are decoupled, one may further reduce the cost of statistical analysis by the spectral stochastic method discussed in Sect. 3. Since the random variables are correlated, this correlation should be removed before using the spectral stochastic method. In this part, we first present the theoretical basis for decoupling the correlation of random variables.

Proposition 2.1. For a set of zero-mean Gaussian-distributed variables $\zeta^*$ whose covariance matrix is $\Omega$, if there is a matrix $L$ satisfying $\Omega = LL^T$, then $\zeta^*$ can be represented by a set of independent standard normal distributed variables $\xi$ as $\zeta^* = L\xi$.

Proof. By the properties of the normal distribution, a linear transformation preserves the zero mean of the variables and yields another normal distribution. Thus, we only need to prove that the covariance matrix of $L\xi$ equals $\Omega$. According to the definition of covariance,

\[
\mathrm{cov}(L\xi) = E\left[L\xi (L\xi)^T\right] = L\,E\left[\xi\xi^T\right] L^T. \tag{2.11}
\]

Since $\xi$ follows a standard normal distribution with independent components, $E[\xi\xi^T]$ is the identity matrix, and

\[
L\,E\left[\xi\xi^T\right] L^T = LL^T = \Omega. \tag{2.12}
\]
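Proposition 2.1 can be tried out with a Cholesky factor as the matrix $L$. This is a minimal sketch assuming NumPy; the exponentially decaying covariance matrix `omega` is an assumed example, not a model taken from the text.

```python
import numpy as np

n = 4
idx = np.arange(n)
# Assumed example covariance: correlation decays with grid distance.
omega = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)

L = np.linalg.cholesky(omega)            # lower triangular, omega = L @ L.T

rng = np.random.default_rng(2)
xi = rng.standard_normal((n, 200_000))   # independent standard normal variables
zeta_star = L @ xi                       # correlated variables with covariance omega

omega_hat = np.cov(zeta_star)            # rows are variables
print(np.round(omega_hat, 2))
print(np.round(omega, 2))
```

The sample covariance of the transformed variables approaches the target matrix as the number of samples grows.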

2.3 Principal Factor Analysis Technique

Note that the solution for decoupling is not unique. For example, Cholesky decomposition can be used to seek $L$ since the covariance matrix $\Omega$ is always a positive semidefinite matrix. However, Cholesky decomposition cannot reduce the number of variables. Principal factor analysis (PFA) [74] can substitute for Cholesky decomposition when variable reduction is needed. Eigendecomposition of the covariance matrix yields

\[
\Omega = LL^T, \quad L = \left[\sqrt{\lambda_1}\,e_1, \ldots, \sqrt{\lambda_n}\,e_n\right], \tag{2.13}
\]

where $\{\lambda_i\}$ are the eigenvalues in order of descending magnitude and $\{e_i\}$ are the corresponding eigenvectors. PFA reduces the number of components in $\xi$ by truncating $L$ to its first $k$ columns.

The error of PFA can be controlled by $k$:

\[
\mathrm{err} = \frac{\sum_{i=k+1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i}, \tag{2.14}
\]

where a bigger $k$ leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In our experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
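The truncation in (2.13) and the error measure (2.14) can be sketched as follows, assuming NumPy. The helper name `pfa` and the exponential correlation model with a length of eight grid units are illustrative assumptions; they do not reproduce the experiment with 20 parallel wires mentioned above.

```python
import numpy as np

def pfa(omega, k):
    """Return the n-by-k truncated factor L_k and the relative error (2.14)."""
    lam, vec = np.linalg.eigh(omega)        # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]           # re-sort in descending order
    lam, vec = lam[order], vec[:, order]
    L_k = vec[:, :k] * np.sqrt(lam[:k])     # columns sqrt(lambda_i) * e_i
    err = lam[k:].sum() / lam.sum()
    return L_k, err

n = 40
idx = np.arange(n)
omega = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 8.0)  # assumed model

for k in (5, 10, 20, 40):
    L_k, err = pfa(omega, k)
    print(k, L_k.shape, round(err, 4))
```

With `k = n` the factor reproduces the covariance matrix exactly; smaller `k` trades accuracy for fewer independent variables.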

2.4 Weighted PFA Technique

One idea is to consider the importance of the outputs during the reduction process when using PFA. Recently, the weighted PFA (wPFA) technique has been used [204] to improve variable reduction efficiency.

If a weight is defined for each physical variable $\zeta_i$ to reflect its impact on the output, then a set of new variables $\zeta^*$ is formed:

\[
\zeta^* = W\zeta, \tag{2.15}
\]

where $W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$ is a diagonal matrix of weights. As a result, the covariance matrix of $\zeta^*$, $\Omega(\zeta^*)$, now contains the weight information, and performing PFA on $\Omega(\zeta^*)$ leads to the weighted variable reduction. Specifically, we have

\[
\Omega(\zeta^*) = E\left[W\zeta (W\zeta)^T\right] = W\,\Omega(\zeta)\,W^T, \tag{2.16}
\]

and denote its eigenvalues and eigenvectors by $\lambda_i^*$ and $e_i^*$. Then, the variables $\zeta$ can be approximated by a linear combination of a set of independent dominant variables $\xi^*$:

\[
\zeta = W^{-1}\zeta^* \approx W^{-1} \sum_{i=1}^{k} \sqrt{\lambda_i^*}\, e_i^*\, \xi_i^*. \tag{2.17}
\]

The error control process is similar to (2.14) but uses the weighted eigenvalues $\lambda_i^*$.

2.5 Principal Component Analysis Technique

We first briefly review the concept of principal component analysis (PCA), which is used here to transform correlated random variables into uncorrelated random variables [75].

Suppose that $x$ is a vector of $n$ random variables, $x = [x_1, x_2, \ldots, x_n]^T$, with covariance matrix $\Omega$ and mean vector $\mu_x = [\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n}]$. To find the orthogonal random variables, we first calculate the eigenvalues and corresponding eigenvectors of $\Omega$. Then, by ordering the eigenvectors in descending order of their eigenvalues, the orthogonal matrix $A$ is obtained. Here, $A$ is expressed as

\[
A = \left[e_1^T, e_2^T, \ldots, e_n^T\right]^T, \tag{2.18}
\]

where $e_i$ is the eigenvector corresponding to eigenvalue $\lambda_i$, which satisfies

\[
\lambda_i e_i = \Omega e_i, \quad i = 1, 2, \ldots, n, \tag{2.19}
\]

and

\[
\lambda_i < \lambda_{i-1}, \quad i = 2, 3, \ldots, n. \tag{2.20}
\]

With $A$, we can perform the transformation to get orthogonal random variables $y = [y_1, y_2, \ldots, y_n]^T$ by using

\[
y = A(x - \mu_x), \tag{2.21}
\]

where $y_i$ is a random variable with Gaussian distribution. The mean, $\mu_{y_i}$, is 0 and the standard deviation, $\sigma_{y_i}$, is $\sqrt{\lambda_i}$ on the condition that [75]

\[
e_i^T e_i = 1, \quad i = 1, 2, \ldots, n. \tag{2.22}
\]

Here, because of the orthogonal property of matrix $A$,

\[
A^{-1} = A^T. \tag{2.23}
\]

To reconstruct the original random variables, we use the following equation:

\[
x = A^T y + \mu_x. \tag{2.24}
\]
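The transformation (2.21) and the reconstruction (2.24) can be sketched with NumPy as below. The helper name `pca_matrix` and the 3-by-3 covariance matrix `cov_true` are illustrative assumptions; the sample covariance and sample mean stand in for $\Omega$ and $\mu_x$.

```python
import numpy as np

def pca_matrix(omega):
    """Return A (rows are unit eigenvectors e_i^T) and eigenvalues, descending."""
    lam, vec = np.linalg.eigh(omega)
    order = np.argsort(lam)[::-1]
    return vec[:, order].T, lam[order]

rng = np.random.default_rng(3)
cov_true = np.array([[2.0, 0.9, 0.3],
                     [0.9, 1.0, 0.4],
                     [0.3, 0.4, 0.5]])       # illustrative covariance
x = rng.multivariate_normal([1.0, 2.0, 3.0], cov_true, size=20_000).T  # (n, N)

mu_x = x.mean(axis=1, keepdims=True)
omega = np.cov(x)
A, lam = pca_matrix(omega)

y = A @ (x - mu_x)            # (2.21): uncorrelated variables
x_rec = A.T @ y + mu_x        # (2.24): reconstruction

print(np.round(np.cov(y), 3))  # close to diag(lam)
print(np.round(lam, 3))
```

The covariance of `y` is diagonal with the eigenvalues on the diagonal, so each $y_i$ has standard deviation $\sqrt{\lambda_i}$, as stated above.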

3 Statistical Analysis Approaches

3.1 Monte Carlo Method

Monte Carlo techniques [41] are usually used to estimate the value of a definite, finite-dimensional integral of the form

\[
G = \int_S g(X) f(X)\,dX, \tag{2.25}
\]

where $S$ is a finite domain and $f(X)$ is a PDF over $X$, i.e., $f(X) \ge 0$ for all $X$ and $\int_S f(X)\,dX = 1$. We can accomplish the MC estimation of the value of $G$ by drawing a set of independent samples $X_1, X_2, \ldots, X_{MC}$ from $f(X)$ and by applying

\[
G_{MC} = (1/MC) \sum_{i=1}^{MC} g(X_i). \tag{2.26}
\]

The estimator $G_{MC}$ above is a random variable. Its mean value is the integral $G$ to estimate, i.e., $E(G_{MC}) = G$, making it an unbiased estimator. The variance of $G_{MC}$ is $\mathrm{Var}(G_{MC}) = \sigma^2/MC$, where $\sigma^2$ is the variance of the random variable $g(X)$ given by

\[
\sigma^2 = \int_S g^2(X) f(X)\,dX - G^2. \tag{2.27}
\]

We can use the standard deviation of $G_{MC}$ to assess its accuracy in estimating $G$. If the sample number $MC$ is sufficiently large, then by the Central Limit Theorem, $\frac{G_{MC} - G}{\sigma/\sqrt{MC}}$ has an approximate standard normal distribution ($N(0, 1)$). Hence,

\[
P\left(G - 1.96\frac{\sigma}{\sqrt{MC}} \le G_{MC} \le G + 1.96\frac{\sigma}{\sqrt{MC}}\right) \approx 0.95, \tag{2.28}
\]

where $P$ is the probability measure. Equation (2.28) shows that $G_{MC}$ will be in the interval $\left[G - 1.96\frac{\sigma}{\sqrt{MC}},\; G + 1.96\frac{\sigma}{\sqrt{MC}}\right]$ with 95% confidence. Thus, one can use the error measure

\[
|\mathrm{Error}| \approx \frac{2\sigma}{\sqrt{MC}} \tag{2.29}
\]

in order to assess the accuracy of the estimator.
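The estimator (2.26) and the 95% interval from (2.28) can be sketched as follows, assuming NumPy. The function name `mc_estimate`, its callback arguments, and the test integrand $g(X) = X^2$ with $X \sim N(0, 1)$ (for which $G = 1$) are illustrative assumptions. The unknown $\sigma$ is replaced by the sample standard deviation.

```python
import numpy as np

def mc_estimate(g, sampler, n_samples, rng):
    """Return the MC estimate G_MC and the 95% half-width 1.96 * sigma / sqrt(MC)."""
    x = sampler(rng, n_samples)          # independent samples drawn from f(X)
    values = g(x)
    g_mc = values.mean()                 # estimator (2.26)
    sigma = values.std(ddof=1)           # sample estimate of sigma in (2.27)
    half_width = 1.96 * sigma / np.sqrt(n_samples)
    return g_mc, half_width

rng = np.random.default_rng(4)
g_mc, half_width = mc_estimate(
    g=lambda x: x**2,
    sampler=lambda r, m: r.standard_normal(m),
    n_samples=100_000,
    rng=rng,
)
print(f"G_MC = {g_mc:.4f} +/- {half_width:.4f}")
```

Halving the interval width requires four times as many samples, which is the $1/\sqrt{MC}$ convergence implied by (2.29).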

3.2 Spectral Stochastic Method Using Stochastic Orthogonal Polynomial Chaos

One recent advance in fast statistical analysis is to apply stochastic orthogonal polynomial chaos (OPC) [187] to nanometer-scale integrated circuit analysis. Based on the Askey scheme [196], any random variable can be represented by OPC, and random variables with different probability distributions are associated with different types of orthogonal polynomials.

Hermite polynomial chaos (Hermite PC or HPC) utilizes a series of orthogonal polynomials (with respect to the Gaussian distribution) to facilitate stochastic analysis [197]. These polynomials are used as the orthogonal basis to decompose a random process in a similar way that sine and cosine functions are used to decompose a periodic signal in a Fourier series expansion. Note that for the Gaussian and log-normal distributions, Hermite polynomials are the best choice as they lead to an exponential convergence rate [45]. For non-Gaussian and non-log-normal distributions, there are other orthogonal polynomials such as Legendre for the uniform distribution, Charlier for the Poisson distribution, and Krawtchouk for the binomial distribution [44, 187].

For a random variable $y(\xi)$ with finite variance, where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ is a vector of zero-mean orthogonal Gaussian random variables, the random variable can be approximated by a truncated Hermite PC expansion as follows [45]:

\[
y(\xi) = \sum_{k=0}^{P} a_k H_k^n(\xi), \tag{2.30}
\]

where $n$ is the number of independent random variables, $H_k^n(\xi)$ are $n$-dimensional Hermite polynomials, and $a_k$ are the deterministic coefficients. The number of terms $P$ is given by

\[
P = \sum_{k=0}^{p} \frac{(n - 1 + k)!}{k!\,(n - 1)!}, \tag{2.31}
\]

where $p$ is the order of the Hermite PC. Similarly, a random process $v(t, \xi)$ with finite variance can be approximated as

\[
v(t, \xi) = \sum_{k=0}^{P} a_k(t) H_k^n(\xi). \tag{2.32}
\]

If only one random variable/process is considered, the one-dimensional Hermite polynomials are expressed as follows:

\[
H_0^1(\xi) = 1, \quad H_1^1(\xi) = \xi, \quad H_2^1(\xi) = \xi^2 - 1, \quad H_3^1(\xi) = \xi^3 - 3\xi, \ \ldots. \tag{2.33}
\]

Hermite polynomials are orthogonal with respect to the Gaussian weighted expectation (the superscript $n$ is dropped for simple notation):

\[
\langle H_i(\xi), H_j(\xi) \rangle = \langle H_i^2(\xi) \rangle\, \delta_{ij}, \tag{2.34}
\]

where $\delta_{ij}$ is the Kronecker delta and $\langle \cdot, \cdot \rangle$ denotes an inner product defined as follows:

\[
\langle f(\xi), g(\xi) \rangle = \frac{1}{\sqrt{(2\pi)^n}} \int f(\xi)\, g(\xi)\, e^{-\frac{1}{2}\xi^T \xi}\,d\xi. \tag{2.35}
\]

Similar to Fourier series, the coefficient $a_k$ for random variable $y$ and $a_k(t)$ for random process $v(t)$ can be found by a projection operation onto the HPC basis:

\[
a_k = \frac{\langle y(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \tag{2.36}
\]

\[
a_k(t) = \frac{\langle v(t, \xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}. \tag{2.37}
\]

Once we obtain the Hermite PC, we can calculate the mean and variance of the random variable $y(\xi)$ by a one-time analysis as (one Gaussian variable case):

\[
\begin{aligned}
E(y(\xi)) &= y_0, \\
\mathrm{Var}(y(\xi)) &= y_1^2\,\mathrm{Var}(\xi_1) + y_2^2\,\mathrm{Var}\left(\xi_1^2 - 1\right) \\
&= y_1^2 + 2y_2^2.
\end{aligned} \tag{2.38}
\]

Similarly, for the random process $v(t, \xi)$ (one Gaussian variable case), the mean and variance are as follows:

\[
\begin{aligned}
E(v(t, \xi)) &= v_0(t), \\
\mathrm{Var}(v(t, \xi)) &= v_1^2(t)\,\mathrm{Var}(\xi_1) + v_2^2(t)\,\mathrm{Var}\left(\xi_1^2 - 1\right) \\
&= v_1^2(t) + 2v_2^2(t).
\end{aligned} \tag{2.39}
\]
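The closed-form mean and variance in (2.38) can be compared against sampling. The sketch below assumes NumPy, whose `numpy.polynomial.hermite_e` module implements the same polynomial family as (2.33) ($He_2 = \xi^2 - 1$, $He_3 = \xi^3 - 3\xi$). The coefficient values are made up for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

coeffs = [1.0, 0.5, 0.2]      # illustrative y0, y1, y2 of a second-order HPC

# Closed-form statistics from (2.38).
mean_hpc = coeffs[0]
var_hpc = coeffs[1] ** 2 + 2.0 * coeffs[2] ** 2

# Sampling check: evaluate y(xi) = y0 + y1*xi + y2*(xi^2 - 1) at random xi.
rng = np.random.default_rng(5)
xi = rng.standard_normal(400_000)
y = hermeval(xi, coeffs)

print(mean_hpc, y.mean())
print(var_hpc, y.var())
```

The closed-form values need no sampling at all, which is the "one-time analysis" advantage noted above.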

One critical problem that remains is how to obtain the coefficients of the Hermite PC in (2.36) and (2.37) efficiently. There are two kinds of techniques for calculating these coefficients: the collocation-based spectral stochastic method and the Galerkin-based spectral stochastic method. In the later parts of the book, we refer to them simply as the collocation-based and Galerkin-based methods.

3.3 Collocation-Based Spectral Stochastic Method

The collocation method is mainly based on computing the definite integral of a function [70]. Gaussian quadrature is the commonly used method. With it, we can compute the coefficients $a_k$ and $a_k(t)$ in (2.36) and (2.37), respectively. We review this method using the Hermite polynomials shown below.

Our objective is to determine the numerical solution of the integral $\langle y(\xi), H_k(\xi) \rangle$ ($y$ can be a random variable or a random process). In our problem, this is a one-dimensional numerical quadrature problem based on Hermite polynomials [70]. Thus, we have

\[
\langle y(\xi), H_k(\xi) \rangle = \frac{1}{\sqrt{2\pi}} \int y(\xi)\, H_k(\xi)\, e^{-\frac{1}{2}\xi^2}\,d\xi \approx \sum_{i=0}^{P} y(\xi_i)\, H_k(\xi_i)\, w_i. \tag{2.40}
\]

Here we have only a single random variable $\xi$; $\xi_i$ and $w_i$ are the Gauss-Hermite quadrature abscissas (quadrature points) and weights.

The quadrature rule states that if we select the roots of the $P$th Hermite polynomial as the quadrature points, the quadrature is exact for all polynomials of degree $2P - 1$ or less for (2.40). This is called $(P - 1)$-level accuracy of the Gauss-Hermite quadrature.
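The projection (2.36) evaluated with the quadrature (2.40) can be sketched for a single Gaussian variable as below. The code assumes NumPy and uses its `hermegauss` routine, whose weights correspond to the weight function $e^{-\xi^2/2}$ and are therefore divided by $\sqrt{2\pi}$ here. The function name `hpc_coefficients` and the quadratic test response are illustrative assumptions; the normalization $\langle H_k^2 \rangle = k!$ is the standard one for this polynomial family.

```python
from math import factorial, pi, sqrt

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hpc_coefficients(y, order, n_points):
    """Project y(xi) onto He_0..He_order using Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_points)
    weights = weights / sqrt(2.0 * pi)       # weights now sum to 1
    coeffs = []
    for k in range(order + 1):
        basis = hermeval(nodes, [0] * k + [1])            # He_k at the nodes
        inner = np.sum(y(nodes) * basis * weights)        # <y, H_k> as in (2.40)
        coeffs.append(inner / factorial(k))               # divide by <H_k^2> = k!
    return np.array(coeffs)

# Illustrative response that is already a second-order Hermite PC.
response = lambda xi: 2.0 + 3.0 * xi + 0.5 * (xi**2 - 1.0)
coeffs = hpc_coefficients(response, order=2, n_points=4)
print(coeffs)
```

With four quadrature points the rule is exact for polynomials up to degree 7, so the three coefficients of the quadratic test response are recovered up to rounding.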

For multiple random variables, a multidimensional quadrature is required. The traditional way of computing a multidimensional quadrature is to use a direct tensor product based on one-dimensional Gauss-Hermite quadrature abscissas and weights [126]. With this method, the number of quadrature points needed for $n$ dimensions at level $P$ is about $(P + 1)^n$, which is well known as the curse of dimensionality.

Smolyak quadrature [126], also known as sparse grid quadrature, is used as an efficient method to reduce the number of quadrature points. Let us define a one-dimensional sparse grid quadrature point set $\Theta_1^P = \{\xi_1, \xi_2, \ldots, \xi_P\}$, which uses $P + 1$ points to achieve degree $2P + 1$ of exactness. The sparse grid for an $n$-dimensional quadrature at degree $P$ chooses points from the following set:

\[
\Theta_n^P = \bigcup_{P + 1 \le |i| \le P + n} \left(\Theta_1^{i_1} \times \cdots \times \Theta_1^{i_n}\right), \tag{2.41}
\]

where $|i| = \sum_{j=1}^{n} i_j$. The corresponding weight is

\[
w_{j_{i_1} \ldots j_{i_n}}^{i_1 \ldots i_n} = (-1)^{P + n - |i|} \binom{n - 1}{n + P - |i|} \prod_m w_{j_{i_m}}^{i_m}, \tag{2.42}
\]

where $\binom{n - 1}{n + P - |i|}$ is the combinatorial number and $w$ is the weight for the corresponding quadrature points. It has been shown that interpolation on a Smolyak grid ensures a bound for the mean-square error [126]

\[
|E_P| = O\left(N_P^{-r} (\log N_P)^{(r + 1)(n - 1)}\right),
\]

where $N_P$ is the number of quadrature points and $r$ is the order of the maximum derivative that exists for the delay function. The number of quadrature points increases as $O\left(\frac{n^P}{P!}\right)$.

It can be shown that a sparse grid of at least level $P$ is required for an order $P$ representation. The reason is that the approximation contains order $P$ polynomials for both $y(\xi)$ and $H_j(\xi)$. Thus, there exists $y(\xi)H_j(\xi)$ with order $2P$, which requires a sparse grid of at least level $P$ with an exactness degree of $2P + 1$.

Therefore, level 1 and level 2 sparse grids are required for linear and quadratic models, respectively. The number of quadrature points is about $2n$ for the linear model and $2n^2$ for the quadratic model. The computational cost is about the same as that of the Taylor-conversion method, while keeping the accuracy of the homogeneous chaos expansion.

In addition to the sparse grid technique, we can also employ several accelerating techniques. Firstly, when $n$ is too small, the number of quadrature points for the sparse grid may be larger than that of the direct tensor product of a Gaussian quadrature. For example, if there are only two variables, the number is 5 and 15 for level 1 and level 2 sparse grids, compared to 4 and 9 for the direct tensor product. In this case, the sparse grid will not be used. Secondly, the set of quadrature points (2.41) may contain the same points with different weights. For example, the level 2 sparse grid for three variables contains four instances of the point (0,0,0). Combining these points by summing the weights reduces the computational cost of $y(\xi_i)$.

3.4 Galerkin-Based Spectral Stochastic Method

The Galerkin-based method is based on the principle of orthogonality: the best approximation of $y(\xi)$ is obtained when the error, $\Delta(\xi)$, defined as

\[
\Delta(\xi) = y(\xi) - \hat{y}(\xi), \tag{2.43}
\]

is orthogonal to the approximation, where $\hat{y}(\xi)$ denotes the truncated Hermite PC approximation of $y(\xi)$. That is,

\[
\langle \Delta(\xi), H_k(\xi) \rangle = 0, \quad k = 0, 1, \ldots, P, \tag{2.44}
\]

where $H_k(\xi)$ are Hermite polynomials. In this way, we have transformed the stochastic analysis process into a deterministic form, in which we only need to compute the corresponding coefficients of the Hermite PC.

For illustration, consider two Gaussian variables $\xi = [\xi_1, \xi_2]$ and assume that the charge vector in panels can be written as a second-order ($p = 2$) Hermite PC. We have

\[
y(\xi) = y_0 + y_1\xi_1 + y_2\xi_2 + y_3(\xi_1^2 - 1) + y_4(\xi_2^2 - 1) + y_5(\xi_1\xi_2), \tag{2.45}
\]

which will be solved by (2.44). Once the Hermite PC of $y(\xi)$ is known, the mean and variance of $y(\xi)$ can be evaluated trivially. As an example, for one random variable, the mean and variance are calculated as

\[
\begin{aligned}
E(y(\xi)) &= y_0, \\
\mathrm{Var}(y(\xi)) &= y_1^2\,\mathrm{Var}(\xi) + y_2^2\,\mathrm{Var}(\xi^2 - 1) \\
&= y_1^2 + 2y_2^2.
\end{aligned} \tag{2.46}
\]

To account for correlations among random variables, we apply PCA (Sect. 2.5) to transform the correlated variables into a set of independent variables.

4 Sum of Log-Normal Random Variables

Leakage current usually follows a log-normal distribution. Due to the exponential convergence rate, Hermite PC can be used to represent log-normal variables and the sum of log-normal variables [109].


4.1 Hermite PC Representation of Log-Normal Variables

Let $g(\xi)$ be a Gaussian random variable and $l(\xi)$ be the random variable obtained by taking the exponential of $g(\xi)$,

\[
l(\xi) = e^{g(\xi)}, \quad g(\xi) = \ln(l(\xi)). \tag{2.47}
\]

For a log-normal random variable $l(\xi)$, let the mean and the variance of $g(\xi)$ be $\mu_g$ and $\sigma_g^2$; then the mean and variance of $l(\xi)$ are

\[
\mu_l = e^{\mu_g + \frac{\sigma_g^2}{2}}, \tag{2.48}
\]

\[
\sigma_l^2 = e^{2\mu_g + \sigma_g^2}\left[e^{\sigma_g^2} - 1\right], \tag{2.49}
\]

respectively.

A general Gaussian variable $g(\xi)$ can always be represented in the following affine form:

\[
g(\xi) = \sum_{i=0}^{n} \xi_i g_i, \tag{2.50}
\]

where $\xi_i$ are orthogonal Gaussian variables, that is, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, $\langle \xi_i \rangle = 0$, and $\xi_0 = 1$, and $g_i$ is the coefficient of the individual Gaussian variable. Note that such a form can always be obtained by using the Karhunen–Loeve orthogonal expansion method [45].

In our problem, we need to represent the log-normal random variable $l(\xi)$ by using the Hermite PC expansion form:

\[
l(\xi) = \sum_{k=0}^{P} l_k H_k^n(\xi), \tag{2.51}
\]

where $l_0 = \exp\left(\mu_g + \frac{\sigma_g^2}{2}\right)$. To find the other coefficients, we can apply (2.36) to $l(\xi)$. Therefore, we have

\[
l_k = \frac{\langle l(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}. \tag{2.52}
\]

As was shown in [44], the coefficients $l_k$ can be written as

\[
l_k = \frac{\langle H_k(\xi - g) \rangle}{\langle H_k^2(\xi) \rangle} \exp\left[\mu_g + \frac{1}{2}\sum_{j=1}^{n} g_j^2\right], \tag{2.53}
\]

where $n$ is the number of independent Gaussian random variables.

The log-normal process can then be written as

\[
l(\xi) = l_0\left(1 + \sum_{i=1}^{n} \xi_i g_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{(\xi_i\xi_j - \delta_{ij})}{\langle (\xi_i\xi_j - \delta_{ij})^2 \rangle}\, g_i g_j + \cdots\right), \tag{2.54}
\]

where $g_i$ is defined in (2.50).

4.2 Hermite PC Representation with One Gaussian Variable

In this case, $\xi = [\xi_1]$. For the second-order Hermite PC ($P = 2$), following (2.54), we have

\[
l(\xi) = l_0\left(1 + \sigma_g\xi_1 + \frac{1}{2}\sigma_g^2\left(\xi_1^2 - 1\right)\right). \tag{2.55}
\]

Hence, the desired Hermite PC coefficients, $I_{0,1,2}$, can be expressed as $l_0$, $l_0\sigma_g$, and $\frac{1}{2}l_0\sigma_g^2$, respectively.
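The one-variable coefficients $l_0$, $l_0\sigma_g$, and $\frac{1}{2}l_0\sigma_g^2$ can be cross-checked by projecting $\exp(\mu_g + \sigma_g\xi_1)$ onto the Hermite basis with Gauss-Hermite quadrature. This sketch assumes NumPy; the function names, the 20-point rule, and the values $\mu_g = 0.1$, $\sigma_g = 0.3$ are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def lognormal_hpc2(mu_g, sigma_g):
    """Closed-form second-order coefficients from (2.55)."""
    l0 = math.exp(mu_g + 0.5 * sigma_g**2)
    return np.array([l0, l0 * sigma_g, 0.5 * l0 * sigma_g**2])

def lognormal_hpc_by_projection(mu_g, sigma_g, order=2, n_points=20):
    """Numerical projection of exp(mu_g + sigma_g * xi) onto He_0..He_order."""
    nodes, weights = hermegauss(n_points)
    weights = weights / math.sqrt(2.0 * math.pi)   # standard normal weighting
    l_vals = np.exp(mu_g + sigma_g * nodes)
    return np.array([
        np.sum(l_vals * hermeval(nodes, [0] * k + [1]) * weights) / math.factorial(k)
        for k in range(order + 1)
    ])

closed_form = lognormal_hpc2(0.1, 0.3)
projected = lognormal_hpc_by_projection(0.1, 0.3)
print(closed_form)
print(projected)
```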

4.3 Hermite PC Representation of Two and More Gaussian Variables

For two random variables ($n = 2$), assume that $\xi = [\xi_1, \xi_2]$ is a normalized uncorrelated Gaussian random variable vector that represents the random variable $g(\xi)$:

\[
g(\xi) = \mu_g + \sigma_1\xi_1 + \sigma_2\xi_2. \tag{2.56}
\]

Note that, for $i \ne j$,

\[
\langle (\xi_i\xi_j - \delta_{ij})^2 \rangle = \langle \xi_i^2\xi_j^2 \rangle = \langle \xi_i^2 \rangle \langle \xi_j^2 \rangle = 1.
\]

Therefore, the expansion of the log-normal random variable using second-order Hermite PCs can be expressed as

\[
l(\xi) = l_0\left(1 + \sigma_1\xi_1 + \sigma_2\xi_2 + \frac{\sigma_1^2}{2}(\xi_1^2 - 1) + \frac{\sigma_2^2}{2}(\xi_2^2 - 1) + 2\sigma_1\sigma_2\xi_1\xi_2\right), \tag{2.57}
\]

where

\[
\mu_l = l_0 = \exp\left(\mu_g + \frac{1}{2}\sigma_1^2 + \frac{1}{2}\sigma_2^2\right).
\]

Hence, the desired Hermite PC coefficients, $I_{0,1,2,3,4,5}$, can be expressed as $l_0$, $l_0\sigma_1$, $l_0\sigma_2$, $\frac{1}{2}l_0\sigma_1^2$, $\frac{1}{2}l_0\sigma_2^2$, and $2l_0\sigma_1\sigma_2$, respectively.

Similarly, for four Gaussian random variables, assume that $\xi = [\xi_1, \xi_2, \xi_3, \xi_4]$ is a normalized, uncorrelated Gaussian random variable vector. The random variable $g(\xi)$ can be expressed as

\[
g = \mu_g + \sum_{i=1}^{4} \sigma_i\xi_i. \tag{2.58}
\]

As a result, the log-normal random variable $l(\xi)$ can be expressed as

\[
l(\xi) = l_0\left(1 + \sum_{i=1}^{4} \xi_i\sigma_i + \sum_{i=1}^{4} \frac{1}{2}\left(\xi_i^2 - 1\right)\sigma_i^2 + \sum_{i=1}^{4}\sum_{j=1}^{4} \sigma_i\sigma_j\xi_i\xi_j + \cdots\right), \tag{2.59}
\]

where

\[
\mu_l = l_0 = \exp\left(\mu_g + \frac{1}{2}\sum_{i=1}^{4} \sigma_i^2\right).
\]

Hence, the desired Hermite PC coefficients can be expressed using (2.59) above.

5 Summary

A grounding in probability theory is required to understand statistical analysis and modeling for VLSI design in the nanometer regime. In this chapter, we introduced the relevant fundamentals employed in statistical analysis. First, we presented the basic concepts and components such as mean, variance, and covariance due to process variation. After that, we reviewed techniques for statistical variable decoupling and reduction using PFA and PCA. We further discussed the spectral stochastic analysis required for the extraction, mismatch, and yield analysis in later chapters. We also discussed methods to estimate the sum of log-normal random variables, as required for leakage current estimation.

Part II Statistical Full-Chip Power Analysis

Chapter 3 Traditional Statistical Leakage Power Analysis Methods

1 Introduction

Process-induced variability has a huge impact on circuit performance in sub-90 nm VLSI technologies [120]. This is particularly the case for leakage power, which has increased dramatically with technology scaling and is becoming the dominant component of chip power dissipation [71].

Leakage power and its proportion in chip power dissipation have increased dramatically with technology scaling [71]. The dominant factors in leakage currents are subthreshold leakage currents $I_{sub}$ and gate oxide leakage currents $I_{gate}$. Subthreshold leakage currents rapidly increase with every technology generation (about a 5× to 10× increase per generation [24]) and are highly sensitive to threshold voltage $V_{th}$ variations owing to the exponential relationship between $I_{sub}$ and $V_{th}$. On the other hand, as the gate oxide thickness, $T_{ox}$, scales down, $I_{gate}$ grows rapidly as $I_{gate}$ has an exponential dependence on $T_{ox}$.

Both leakage currents are highly sensitive to process variations due to the exponential relation between the leakage current and variational parameters such as effective channel lengths. As process-induced variability becomes more pronounced in the deep submicron regime [120], leakage variations become more significant, and traditional worst-case-based approaches will lead to extremely pessimistic and expensive overdesigned solutions. Statistical estimation and analysis of leakage power considering process variability are critical in various chip design steps to improve design yield and robustness. In the leakage estimation model, we can obtain chip-level leakage statistics, such as the mean value and standard deviation, from process information, library information, and design information.

Many methods have been proposed for the statistical modeling of chip-level leakage current. Early work in [169] gives analytic expressions for the mean value and variance of the leakage currents of CMOS gates considering only subthreshold leakage. The method in [119] provides simple analytic expressions for the leakage currents of the whole chip considering global variations only. The method in [192] uses third-order Hermite polynomials without considering spatial correlations and only calculates the mean value of the full-chip leakage current. In [114], reverse-biased source/drain junction band-to-band tunneling (BTBT) leakage current is considered, in addition to the subthreshold leakage currents, for estimating the mean values and variances of the leakage currents of gates only. In [142], the PDFs of stacked CMOS gates and the entire chip are derived considering both inter-die and intra-die variations. In [14], a hardware-based statistical model of dynamic switching power and static leakage power was presented, which was extracted from experiments in a predetermined process window.

Table 3.1 Different methods for full-chip SLA

Criteria               Categories
Process variation      Inter-die; intra-die, with or without spatial correlation
Leakage distribution   Log-normal; non-log-normal
Speedup method         MC; grid-based; gate-based; projection-based
Leakage component      Isub; Igate
Static leakage model   Gate-based; MOSFET-based

Chip-level SLA methods can be classified into different categories based ondifferent criteria as shown in Table 3.1. Our classification and survey may notbe complete as this is still an active research field and more efficient methodswill be developed in the future. We will present in detail some recent importantdevelopments in the section such as Monte Carlo method and the traditional grid-based method [13]. The gate-based spectral stochastic method [155] and the virtualgrid-based method will be introduced in Chap. 4 and Chap. 5, respectively. Weremark that our limited coverage of the other methods, which are presented inminimal detail, does not diminish the value of their contributions.

This chapter is structured as follows. In Sect. 2, we discuss the static leakage model for one gate/MOSFET, and then Sect. 3 gives the process variation models for computing statistical information of full-chip leakage current. Section 4 presents the recently proposed chip-level statistical leakage modeling and analysis works. The chapter concludes with a summary and a brief discussion of potential future research.

2 Static Leakage Modeling

Full-chip leakage current has two components: subthreshold leakage current and gate leakage current. Here we describe the empirical models for both of them, based on the assumption that the leakage current under process variations follows a log-normal distribution.

2.1 Gate-Based Static Leakage Model

The subthreshold leakage current, Isub, is exponentially dependent on the threshold voltage, Vth. Vth is observed to be most sensitive to the gate oxide thickness Tox and the effective gate channel length L due to short-channel effects. When the change in L or Tox is small, Isub depends exponentially on both parameters, with the effect of Tox being relatively weak. For the gate oxide leakage current, both channel length and oxide thickness have strong impacts, and the leakage current is an exponential function of the two variables.

The leakage model is based on gates, as in [13] and [155]. We follow the analytical expressions given in [13], which estimate the subthreshold leakage currents and the gate oxide leakage currents as follows:

$$I_{sub} = e^{a_1 + a_2 L + a_3 L^2 + a_4 T_{ox}^{-1} + a_5 T_{ox}}, \tag{3.1}$$

$$I_{gate} = e^{a_1 + a_2 L + a_3 L^2 + a_4 T_{ox} + a_5 T_{ox}^2}, \tag{3.2}$$

where a1 through a5 are the fitting coefficients for each unique input combination of a gate. We can then use a LUT to store the fitting parameters. For a k-input gate, the size of the LUT is 2^k × 10, since there are two equations for each input combination and each equation has five fitting parameters. In [13], only the dominant states are kept for leakage current, i.e., states with only one “off” transistor in a series transistor stack. However, with technology scaling down to 45 nm, this is no longer the practical case. The Isub based on the model in (3.1) still has a large error compared to the simulation results. Hence, the authors in [155] keep all the states.

After choosing sampling points for L and Tox linearly in their 3σ regions and conducting SPICE simulation at each point, the subthreshold leakage current is stored as the original curve. We can then perform the curve fitting process. Figures 3.1 and 3.2 show the curve fitting results of Isub and Igate for four input patterns in the AND2 gate. Here, 100 points are chosen linearly in the 3σ regions for L and Tox. These figures show that the curves fit the SPICE results very well, and the currents in the four cases are comparable with each other. Since there is no “dominant state,” all of them need to be considered.

Table 3.2 shows the errors compared with industry SPICE simulation results for the AND2 gate for Isub. Max Err. is the maximum error given by one input combination, and Avg Err. refers to the average error over all the input patterns. If we add more terms into (3.1), as shown in Table 3.2, we can reduce the average error from about 8% to about 3%. After we obtain the analytic expression for each input combination, we take the average of the leakage currents of all the input combinations to arrive at the final analytic expression for each gate, in lieu of the dominant states used in [13].
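The curve fitting step can be sketched as an ordinary least-squares fit of ln(Isub) against the five terms in the exponent of (3.1). This is only a minimal illustration: the 10 × 10 sampling grid, the coefficient values in `a_true`, and the function names are assumptions made for the example, and the reference currents are synthetic stand-ins for SPICE results.

```python
import numpy as np

def design_matrix(L, Tox):
    """Columns follow the exponent of (3.1): 1, L, L^2, 1/Tox, Tox."""
    return np.column_stack([np.ones_like(L), L, L**2, 1.0 / Tox, Tox])

def fit_log_leakage(L, Tox, I_ref):
    """Least-squares fit of ln(I) to the five-term model; returns a1..a5."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(L, Tox), np.log(I_ref), rcond=None)
    return coeffs

def predict(coeffs, L, Tox):
    """Evaluate the fitted exponential model at the given points."""
    return np.exp(design_matrix(L, Tox) @ coeffs)

# Hypothetical 10 x 10 sampling grid over the +/-3 sigma (+/-12%) ranges.
L_axis = np.linspace(0.88 * 18.0, 1.12 * 18.0, 10)    # nm
Tox_axis = np.linspace(0.88 * 1.8, 1.12 * 1.8, 10)    # nm
L, Tox = (g.ravel() for g in np.meshgrid(L_axis, Tox_axis))

# Synthetic reference currents standing in for SPICE data (made-up coefficients).
a_true = np.array([2.0, -0.30, 0.004, 1.5, -0.8])
I_ref = predict(a_true, L, Tox)

a_fit = fit_log_leakage(L, Tox, I_ref)
rel_err = np.abs(predict(a_fit, L, Tox) - I_ref) / I_ref
print("max error: %.2e, avg error: %.2e" % (rel_err.max(), rel_err.mean()))
```

Adding the extra terms of Table 3.2 would amount to appending columns such as Tox**2 or Tox / L to `design_matrix`; with real SPICE data the residual would no longer be near zero as it is for this synthetic case.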

Fig. 3.1 Subthreshold leakage currents for four different input patterns in AND2 gate under 45 nm technology. [Plot omitted: four panels, Input Pattern 0 to Input Pattern 3, each showing ln(Isub) in ln(nA) versus sample point index (0 to 100) for the SPICE and curve-fitting results.]

Based on this model, the leakage current of one gate under process variation can be estimated by log-normal distributions. The average leakage of a gate can be computed as a weighted sum of the leakage under different input states,

$$I^{avg}_{sub} = \sum_{j \in \text{input states}} P_j I_{sub,j}, \tag{3.3}$$

$$I^{avg}_{gate} = \sum_{j \in \text{input states}} P_j I_{gate,j}, \tag{3.4}$$

$$I_{leak,chip} = \sum_{\forall\, \text{gates } i = 1, \ldots, n} \left( I^{avg}_{sub,i} + I^{avg}_{gate,i} \right), \tag{3.5}$$

where $P_j$ is the probability of input state $j$; $I_{sub,j}$ and $I_{gate,j}$ are the subthreshold leakage and the gate oxide leakage at input state $j$, respectively; and $n$ is the total number of gates in the circuit. The interaction between these two leakage mechanisms is included in the total leakage estimation.

Fig. 3.2 Gate oxide leakage currents for four different input patterns in AND2 gate under 45 nm technology. [Plot omitted: four panels, one per input pattern, each showing ln(Igate) in ln(nA) versus sample point index (0 to 100) for the SPICE and curve-fitting results.]

Table 3.2 Relative errors by using different fitting formulas for leakage currents of AND2 gate

Fitting components                     Max Err. (%)   Avg Err. (%)
Original: L, L^2, Tox^-1, Tox          14.7           8.46
Add Tox^2                              13.95          8.26
Add Tox^2, Tox/L                       7.08           5.95
Add Tox^2, Tox/L, L/Tox                7.14           4.94
Add Tox^2, Tox/L, L/Tox, Tox · L       3.67           3.49

Since all the leakage components can be approximated as a log-normal distribution, we can simply sum up the distributions of the log-normals for all gates to get the full-chip leakage distribution. Note that there exist spatial correlations, and the leakage distributions of any two gates may be correlated. Therefore, the full-chip leakage current is calculated by a sum of correlated log-normals:

$$S = \sum_{i=1}^{p} e^{Y_i}, \tag{3.6}$$

where $p$ is the total number of log-normals to sum, $Y_i$ is a Gaussian random variable, and $Y = [Y_1, Y_2, \ldots, Y_p]$ forms a multivariate normal distribution with covariance matrix $\Sigma_Y$. The vector $Y$ is a function of L and Tox.


Fig. 3.3 Typical layout of a MOSFET. [Drawing omitted: source, gate, and drain regions annotated with Leff, W, and the layout parameters A and B.]

2.2 MOSFET-Based Static Leakage Model

As in [96], the statistical model for the subthreshold leakage current is sometimes formulated at the MOSFET level. Here, we only discuss the formulation developed for NMOS transistors; the method can be easily extended to PMOS transistors. We first formulate Isub of one MOSFET and then develop the Leff for a nonrectilinear transistor.

The leakage current of an ideal transistor can be expressed as a function of Leff [65]. The curve-fitted leakage model considering the narrow-width effect is shown in (3.7),

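$$I_{sub} = \alpha_{sub}\, \frac{\sqrt{q\, \varepsilon_{si}\, N_{cheff}}\, \left(W^2 + \alpha_W W\right)}{\left(V_{ds}^2 + \alpha_{ds1} V_{ds} + \alpha_{ds2}\right) \exp\left(\alpha_{L1} L_{eff}^2 + \alpha_{L2} L_{eff}\right)} \left[ 2 - \exp\left(-\frac{A}{A_0}\right) - \exp\left(-\frac{B}{B_0}\right) \right] \left[ 1 - \exp\left(-\frac{V_{ds}}{V_T}\right) \right] \exp\left(\frac{V_{gs} - V_{thlin}}{n V_T}\right), \tag{3.7}$$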

where all $\alpha$'s are fitting parameters, $\varepsilon_{si}$ is the dielectric constant of Si, $N_{cheff}$ is the effective channel doping concentration, and $A$ and $B$ are layout parameters as shown in Fig. 3.3.

When high-k techniques are used to better insulate the gate from the channel for sub-65-nm technologies, the gate oxide tunneling effect is moderated and controlled [96]. In this case, Igate is less important than Isub.

A real gate structure under sub-90-nm technology has rough (nonrectilinear) edges, which can be translated into an equivalent single transistor with effective gate channel length Leff. As shown in Fig. 3.4, a nonrectilinear gate can be divided into several slices of subgates, each of which has its own length and shares the same characteristic width W0 along the width direction. In this way, the leakage current of one nonrectilinear gate, IG, can be approximated as the sum of the leakage currents of all the slices along the width direction:

$$I_G = \sum_{j=1}^{M} I_j(L_j, W_0) = I(L_{eff}, W), \tag{3.8}$$

Fig. 3.4 Procedure to derive the effective gate channel length model. [Drawing omitted: a nonrectilinear gate of width W divided into slices of width W0 and length Li, and the equivalent rectangular gate of length Leff and width W.]

where $W$ is the width of the gate and each slice is a regular gate. Under this framework, supposing we have $M$ slices along the width direction, we have

$$\mu = \frac{\sum_{j=1}^{M} L_j}{M}, \tag{3.9}$$

$$\sigma = \sqrt{\frac{\sum_{j=1}^{M} (L_j - \mu)^2}{M}}. \tag{3.10}$$

The Leff for the equivalent gate can be calculated by

$$L_{eff} = L_{min} + \alpha \ln\left(\frac{\sigma W}{W_0}\right), \tag{3.11}$$

where $\alpha$ is the fitting parameter.

After we set up the Leff model, the equivalent Leff can be used in the compact model for leakage current as shown in (3.7).
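The slice statistics and the effective length can be computed in a few lines. The sketch below reads (3.10) as the population standard deviation of the slice lengths and treats Lmin and the fitting parameter α as given inputs; the slice lengths and all numeric values are hypothetical and chosen only to exercise the formulas.

```python
import numpy as np

def effective_length(slice_lengths, W, W0, L_min, alpha):
    """Return (mu, sigma, L_eff) for a gate cut into slices of width W0."""
    Lj = np.asarray(slice_lengths, dtype=float)
    mu = Lj.mean()                                   # mean slice length, (3.9)
    sigma = np.sqrt(np.mean((Lj - mu) ** 2))         # spread of slice lengths, (3.10)
    # (3.11); note that sigma = 0 (a perfectly rectilinear gate) makes the log diverge.
    L_eff = L_min + alpha * np.log(sigma * W / W0)
    return mu, sigma, L_eff

# Hypothetical gate: five slices of width W0 = 20 nm, so W = 100 nm.
slices_nm = [17.0, 18.0, 19.0, 18.5, 17.5]
mu, sigma, L_eff = effective_length(slices_nm, W=100.0, W0=20.0, L_min=17.0, alpha=0.5)
print(mu, sigma, L_eff)
```

The resulting `L_eff` would then replace Leff in the compact model (3.7).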

3 Process Variational Models for Leakage Analysis

In this section, we present the process variation models for computing variational leakage currents. Process variation occurs at different levels: wafer level, inter-die level, and intra-die level. Furthermore, variations are caused by different sources such as lithography, materials, aging, etc. [7]. Some of the variations are systematic, e.g., those caused by the lithography process [42, 129]. Some are purely random, e.g., the doping density of impurities and edge roughness [7]. In this section, we first introduce different kinds of process variations and then the process variational model for leakage analysis.

Table 3.3 Process variation parameter breakdown for 45 nm technology

Parameter                    Component                          σ² distribution   σ
Gate length (L)              Inter-die                          20%               4% × 18 nm
                             Intra-die, spatially correlated    80%
Gate oxide thickness (Tox)   Inter-die                          20%               4% × 1.8 nm
                             Intra-die, noncorrelated           80%

The main process parameter with a big impact on leakage current is the transistor threshold voltage Vth. Vth is observed to be most sensitive to the effective gate channel length L and the gate oxide thickness Tox. The ITRS [71] indicates that the gate channel length variation is a primary factor for device parameter variation, and that the number of dopants in the channel results in an unacceptably large statistical variation of the threshold voltage. Therefore, we must consider the variations in L and Tox, since leakage current is most sensitive to these parameters [13]. To reflect reality, we model spatial correlations in the gate channel length, while the gate oxide thickness values for different gates are taken to be uncorrelated.

Here we list an example of detailed parameters for gate channel length and gate oxide thickness variations for 45 nm technology in Table 3.3. As indicated in the second column, we can decompose each parameter variation into “inter-die” and “intra-die” variations. For intra-die variation, we further distinguish components with and without spatial correlation. In most cases, these variations can be modeled by Gaussian distributions [33, 178]. The total variance (σ²) is computed by summing up the variances of all components, since the sum of independent Gaussian random variables is still Gaussian.

Under inter-die variation, if the leakage currents of all gates or devices are sensitive to the process parameters in similar ways, then the circuit performance can be analyzed at multiple process corners using deterministic analysis methods. However, statistical methods must be used to correctly predict the leakage if intra-die variations are involved. As leakage current varies exponentially with these parameters, simply using worst-case values for all parameters can result in leakage estimates that are exponentially larger than the values actually obtained, which is too inaccurate to be used in practical cases.

Electrical measurements of a full wafer show that the intra-die gate channel length variation has strong spatial correlation [42]. This implies that devices that are physically close to each other are more likely to be similar than those that are far apart. Therefore, the intra-die variation of gate channel lengths is modeled based on this kind of correlation. There are several different models that can represent such spatial correlations. Take the exponential model [195], for instance:

$$\rho(r) = e^{-r^2/\eta^2}, \tag{3.12}$$

where $r$ is the distance between two panel centers and $\eta$ is the correlation length. We notice that the strong spatial correlation suggested by (3.12) has been exploited by [13] to speed up the calculation, where the full chip is divided into N grids and the correlated random variables are perfectly correlated within a grid. The strong spatial correlation is exploited naturally by the grid-based method, by PCA (for Gaussian distributions), or by independent component analysis (for non-Gaussian distributions), which can transform the correlated random variables into a reduced number of independent ones. Details will be given in the next section. For the gate oxide thickness, Tox, strong spatial correlation does not exist; therefore, we assume that the Tox values of different gates are uncorrelated.

The last column of Table 3.3 shows the standard deviation (σ) of each variation. According to statistical theory regarding Gaussian distributions, about 99.7% of the samples should fall in the range of ±3σ. According to [71], the physical gate channel length for high-performance logic in 45 nm technology will be 18 nm, and the physical variation should be controlled within ±12%. Therefore, we let 3σ be 12%, and a similar analysis can be done for Tox.
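As a quick worked check of these numbers, and assuming that the 20%/80% entries of Table 3.3 are shares of the total variance σ² (an interpretation of the table, not a statement from [71]), the component standard deviations follow as:

$$
\begin{aligned}
\sigma_L &= \frac{12\%}{3} \times 18\ \text{nm} = 0.72\ \text{nm}, &
\sigma_{L,\text{inter}} &= \sqrt{0.2}\,\sigma_L \approx 0.32\ \text{nm}, &
\sigma_{L,\text{intra}} &= \sqrt{0.8}\,\sigma_L \approx 0.64\ \text{nm}, \\
\sigma_{T_{ox}} &= 4\% \times 1.8\ \text{nm} = 0.072\ \text{nm}, &
\sigma_{T_{ox},\text{inter}} &= \sqrt{0.2}\,\sigma_{T_{ox}} \approx 0.032\ \text{nm}, &
\sigma_{T_{ox},\text{intra}} &= \sqrt{0.8}\,\sigma_{T_{ox}} \approx 0.064\ \text{nm}.
\end{aligned}
$$

In each row the squared components add back to the total, e.g., $0.32^2 + 0.64^2 \approx 0.72^2$.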

For a gate/module in a chip with gate channel length $L$ and process variation $\Delta L$, using our model parameters in Table 3.3, we have

$$L = \mu_L + \Delta L, \qquad \Delta L = \Delta L_{inter} + \Delta L_{intra\_corr}, \tag{3.13}$$

where $\mu_L$ is the nominal design parameter value, and $\Delta L_{inter}$ is constant for all gates in all grids since it is a global factor that applies to the entire chip. For one chip sample, we only need to generate it once. $\Delta L_{intra\_corr}$ differs between gates or grids and has spatial correlation. Therefore, we generate one value for each gate/grid, and the spatial correlation follows the exponential model in (3.12), so that the correlation coefficient diminishes with the distance between any two gates/grids.

As for the gate oxide thickness Tox, using the model parameters in Table 3.3, we have

$$T_{ox} = \mu_{ox} + \Delta T_{ox}, \qquad \Delta T_{ox} = \Delta T_{ox,inter} + \Delta T_{ox,intra\_uncorr}, \tag{3.14}$$

where $\mu_{ox}$ is the nominal design parameter value. For the same reason as $\Delta L_{inter}$, $\Delta T_{ox,inter}$ is constant for all gates in all grids. $\Delta T_{ox,intra\_uncorr}$ differs between gates/grids but does not have spatial correlation.
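One way to generate a chip sample under this model is sketched below. The 4 × 4 grid layout, the correlation length `eta`, the function names, and the use of an eigendecomposition to impose the correlation of (3.12) are assumptions made for illustration; the standard deviations use the 45 nm values discussed above with a 20%/80% variance split.

```python
import numpy as np

def correlation_matrix(xy, eta):
    """rho(r) = exp(-r^2 / eta^2) for all pairs of grid (or gate) centers."""
    diff = xy[:, None, :] - xy[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=-1) / eta ** 2)

def sample_chip(xy, eta, mu_L, sigma_L, mu_ox, sigma_ox, rng, inter_share=0.2):
    """Draw one chip sample of (L, Tox) per grid, following (3.13) and (3.14)."""
    n = len(xy)
    s_inter, s_intra = np.sqrt(inter_share), np.sqrt(1.0 - inter_share)
    # Matrix square root of the correlation matrix via eigendecomposition.
    w, V = np.linalg.eigh(correlation_matrix(xy, eta))
    root = V * np.sqrt(np.clip(w, 0.0, None))
    dL_inter = sigma_L * s_inter * rng.standard_normal()             # one draw per chip
    dL_intra = sigma_L * s_intra * (root @ rng.standard_normal(n))   # spatially correlated
    dT_inter = sigma_ox * s_inter * rng.standard_normal()            # one draw per chip
    dT_intra = sigma_ox * s_intra * rng.standard_normal(n)           # uncorrelated
    return mu_L + dL_inter + dL_intra, mu_ox + dT_inter + dT_intra

rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])     # hypothetical grid centers
L, Tox = sample_chip(xy, eta=2.0, mu_L=18.0, sigma_L=0.72,
                     mu_ox=1.8, sigma_ox=0.072, rng=rng)
print(L.round(2), Tox.round(3))
```

Repeating `sample_chip` many times yields the chip population used by a Monte Carlo analysis; neighboring grids then show a higher sample correlation in L than distant ones, while Tox is correlated only through the shared inter-die term.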

After the process variations are modeled as correlated distributions, we can apply the PCA in Sect. 2.2 of Chap. 2 to decompose correlated Gaussian distributions into independent ones. After PCA, the process variations (e.g., $\Delta V_{th}$, $\Delta T_{ox}$, and $\Delta L$) of each gate can be modeled as

$$\Delta X_{G,i} = V_{G,i}\, E, \tag{3.15}$$

where the vector $\Delta X_{G,i} = [\Delta x_{G,i,1}, \Delta x_{G,i,2}, \ldots]^T$ stands for the parameter variations of the $i$th gate. $E = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m]^T$ represents the random variables for modeling both inter-die and intra-die variations of the entire die. Here $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m\}$ can be extracted by PCA. They are independent and satisfy the standard Gaussian distribution (i.e., zero mean and unit standard deviation). $m$ is the total number of these random variables. For practical industry designs, $m$ is typically large (e.g., $10^3$ to $10^6$). $V_{G,i}$ captures the correlations among the random variables.

When $m$ is a large number, the size of $V_{G,i}$ can be extremely large. However, $\Delta X_{G,i}$ only depends on the intra-die variations within its neighborhood, so $V_{G,i}$ should be quite sparse. In Sect. 4, the gate-based spectral stochastic method and the projection-based method will use this sparsity property to reduce the computational cost in two different ways.
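The decomposition in (3.15) can be illustrated with an eigendecomposition of a covariance matrix built from the spatial correlation model (3.12). This is a minimal sketch: the 5 × 5 grid, the unit variances, the correlation length, and the 99% retained-variance threshold (`energy`) are assumptions for the example, not values from the text.

```python
import numpy as np

def pca_factor(cov, energy=0.99):
    """Return V such that x = V @ eps has covariance close to cov, eps ~ N(0, I)."""
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1]                       # largest eigenvalues first
    w, V = np.clip(w[order], 0.0, None), V[:, order]
    # Smallest number of components whose variance share reaches `energy`.
    k = int(np.searchsorted(np.cumsum(w) / w.sum(), energy)) + 1
    k = min(k, len(w))
    return V[:, :k] * np.sqrt(w[:k])                  # row i plays the role of V_{G,i}

# Hypothetical 5 x 5 grid with strong spatial correlation (correlation length 5).
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])
d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
cov = np.exp(-d2 / 5.0 ** 2)

V_G = pca_factor(cov)
eps = np.random.default_rng(2).standard_normal(V_G.shape[1])   # independent N(0, 1)
x = V_G @ eps                                                   # one correlated sample
print(cov.shape[0], "correlated variables ->", V_G.shape[1], "independent ones")
```

With strong spatial correlation, far fewer independent variables than grids are needed, which is the variable reduction exploited later in the chapter. For full-chip sizes a dense eigendecomposition like this one becomes the expensive step.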

Gate-based statistical leakage analysis typically starts from the leakage modeling for one gate,

$$I_{leak,i} = f(E), \tag{3.16}$$

where $I_{leak,i}$ represents the total leakage current of the $i$th gate. Different models can be chosen here to represent the relationship between $E$ and $I_{leak,i}$. For example, quadratic models are used to guarantee accuracy:

$$\log(I_{leak,i}) = E^T A_{leak,i} E + B_{leak,i}^T E + C_{leak,i}, \tag{3.17}$$

where $A_{leak,i} \in R^{m \times m}$, $B_{leak,i} \in R^m$, and $C_{leak,i} \in R$ are the coefficients. More details will be given in the next section.

Given the leakage models of all the individual gates, the full-chip leakage current is the sum of the leakage currents of all the gates on the chip:

$$I_{leak,Chip} = I_{leak,1} + I_{leak,2} + \cdots + I_{leak,n}, \tag{3.18}$$

where $n$ is the total number of gates in a chip. If we choose the quadratic model in (3.17), then (3.18) implies that the full-chip leakage current is the sum of many log-normal distributions. As we mentioned before, this sum can be approximated as a log-normal distribution [13]. Therefore, we can also use a quadratic model to approximate the logarithm of the full-chip leakage:

$$\log(I_{leak,Chip}) = E^T A_{Chip} E + B_{Chip}^T E + C_{Chip}, \tag{3.19}$$

where $A_{Chip} \in R^{m \times m}$, $B_{Chip} \in R^m$, and $C_{Chip} \in R$ are the coefficients. In (3.17) and (3.19), the quadratic coefficient matrices $A_{leak,i}$ and $A_{Chip}$ can be extremely large in order to capture all the intra-die variations, which makes the quadratic modeling problem extremely expensive in practical applications. Several approaches have been proposed to reduce the size of the model, with more details given in the next section.


4 Full-Chip Leakage Modeling and Analysis Methods

Full-chip statistical leakage modeling and analysis methods can be classified into different categories based on different criteria, as shown in Table 3.1. In this section, we present in detail three important methods: the MC method, the traditional grid-based method, and the projection-based method.

4.1 Monte Carlo Method

The Monte Carlo technique mentioned in Sect. 3.1 of Chap. 2 can be used to estimate the value of leakage power at gate level as well as chip level.

For full-chip leakage current, $I_{leak,Chip}$ is G in (2.25). If the number of samples MC is large enough, then we can obtain a sufficiently accurate result. However, for full-chip leakage current analysis, the MC estimator is too expensive. A more efficient method with good accuracy is needed.

Several techniques exist for improving the accuracy of Monte Carlo evaluation of finite integrals. In these techniques, the goal is to construct an estimator with a reduced variance for a given, fixed number of samples. In other words, the improved estimator can provide the same accuracy as the standard Monte Carlo estimator while needing considerably fewer samples. This is desirable because computing the value of $g(X_i)$ is typically costly.
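A plain Monte Carlo estimator for the sum of correlated log-normals in (3.6) can be sketched as follows. The 50-gate circuit, the covariance structure (half shared, half independent variance), and the sample counts are hypothetical; the standard error returned by the function shows the slow 1/sqrt(samples) convergence that makes plain MC expensive.

```python
import numpy as np

def mc_full_chip_leakage(mu_y, cov_y, n_samples, rng):
    """Plain Monte Carlo estimate of the mean and std of S = sum_i exp(Y_i)."""
    Y = rng.multivariate_normal(mu_y, cov_y, size=n_samples)
    S = np.exp(Y).sum(axis=1)                    # one full-chip leakage value per sample
    mean, std = S.mean(), S.std(ddof=1)
    return mean, std, std / np.sqrt(n_samples)   # last value: standard error of the mean

# Hypothetical circuit: 50 gates, log-leakage variance 0.09, half of it shared.
n_gates = 50
mu_y = np.zeros(n_gates)
cov_y = 0.09 * (0.5 * np.eye(n_gates) + 0.5 * np.ones((n_gates, n_gates)))

rng = np.random.default_rng(3)
for n in (100, 1000, 20000):
    mean, std, stderr = mc_full_chip_leakage(mu_y, cov_y, n, rng)
    print(n, round(mean, 3), round(std, 3), round(stderr, 4))

# Closed-form mean of a sum of log-normals, used here only as a reference value.
exact_mean = np.exp(mu_y + np.diag(cov_y) / 2).sum()
```

In a real flow each sample would require evaluating the leakage model of every gate, so the cost grows with both the sample count and the circuit size.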

4.2 Traditional Grid-Based Methods

Since the number of gates on an entire chip is very large and every gate has its own variational parameters, the resulting number of random variables is very large. For greater efficiency, the grid-based method partitions a chip into several grids and assigns the same parameters to all the gates in one grid.

A full-chip SLA method considering spatial correlations in the intra-die and inter-die variations was proposed in [13]. This method introduces a grid-based partitioning of the circuit to reduce the number of variables at a loss of accuracy. A projection-based approach has been proposed in [95] to speed up the leakage analysis, where Krylov-subspace-based reduction is performed on the coefficient matrices of second-order expressions. This method assumes independent random variables after a preprocessing step such as PCA. However, owing to the large number of random variables involved ($10^3$ to $10^6$), the PCA-based preprocessing can be very expensive. The work in [65] proposes a linear-time complexity method to compute the mean and variance of full-chip leakage currents by exploiting the symmetric property of one existing exponential spatial correlation formula. The method only considers subthreshold leakage, and it requires the chip cells and modules to be partitioned into a regular grid with similar uniform fitting functions, which is typically impractical. In the grid-based method of [13], both subthreshold leakage and gate oxide leakage of only dominant input states are considered in (3.4). Here we consider only intra-die variation of parameters. The extension to handling inter-die variation is straightforward, as shown at the end of this subsection.

As shown in (3.6), the total leakage current of a chip is the sum of correlated leakage components, which can be approximated as a log-normal using Wilkinson’s method [2]. A sum of $t$ log-normals, $S = \sum_{i=1}^{t} e^{Y_i}$, is approximated as the log-normal $e^Z$, where $Z \sim N(\mu_z, \sigma_z)$. In Wilkinson’s approach, the mean value and standard deviation of $Z$ are obtained by matching the first two moments, $u_1$ and $u_2$, of $\sum_{i=1}^{t} e^{Y_i}$ as follows:

$$u_1 = E(S) = e^{\mu_z + \sigma_z^2/2} = \sum_{i=1}^{t} e^{\mu_{y_i} + \sigma_{y_i}^2/2}, \tag{3.20}$$

$$u_2 = E(S^2) = e^{2\mu_z + 2\sigma_z^2} = \sum_{i=1}^{t} e^{2\mu_{y_i} + 2\sigma_{y_i}^2} + 2 \sum_{i=1}^{t-1} \sum_{j=i+1}^{t} e^{\mu_{y_i} + \mu_{y_j}}\, e^{(\sigma_{y_i}^2 + \sigma_{y_j}^2 + 2 r_{ij} \sigma_{y_i} \sigma_{y_j})/2}, \tag{3.21}$$

where $r_{ij}$ is the correlation coefficient of $Y_i$ and $Y_j$. Solving (3.20) and (3.21) for $\mu_z$ and $\sigma_z$ yields

$$\mu_z = 2 \ln u_1 - \frac{1}{2} \ln u_2, \tag{3.22}$$

$$\sigma_z^2 = \ln u_2 - 2 \ln u_1. \tag{3.23}$$
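The moment matching in (3.20) to (3.23) translates directly into code. The sketch below takes the means and the covariance matrix of the Gaussian exponents $Y_i$ (so that the entry for a pair is $r_{ij}\sigma_{y_i}\sigma_{y_j}$); the function name and the two-term example are illustrative assumptions.

```python
import numpy as np

def wilkinson(mu_y, cov_y):
    """Approximate S = sum_i exp(Y_i) by exp(Z); return (mu_z, sigma_z)."""
    mu_y = np.asarray(mu_y, dtype=float)
    cov_y = np.asarray(cov_y, dtype=float)
    var = np.diag(cov_y)
    u1 = np.exp(mu_y + var / 2.0).sum()                         # first moment, (3.20)
    # Second moment, (3.21): the full double sum over (i, j) covers the diagonal
    # terms exp(2 mu_i + 2 var_i) and counts every off-diagonal pair twice.
    expo = (mu_y[:, None] + mu_y[None, :]
            + (var[:, None] + var[None, :] + 2.0 * cov_y) / 2.0)
    u2 = np.exp(expo).sum()
    mu_z = 2.0 * np.log(u1) - 0.5 * np.log(u2)                  # (3.22)
    sigma_z = np.sqrt(np.log(u2) - 2.0 * np.log(u1))            # (3.23)
    return mu_z, sigma_z

# Two identical, perfectly correlated terms: S = 2 exp(Y), so Z = Y + ln 2 exactly.
mu_z, sigma_z = wilkinson([1.0, 1.0], [[0.25, 0.25], [0.25, 0.25]])
print(mu_z, sigma_z)
```

The double sum makes the cost quadratic in the number of terms, which is the O(N²) bottleneck that the grid-based merging described next is designed to avoid.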

From the above formulas, we can see that a pair-by-pair computation needs to be done for all correlated pairs of variables, i.e., for all $i$, $j$ such that $r_{ij} \neq 0$. This leads to a very expensive computation. First, leakage currents of different gates are correlated because of the spatial correlation of L. Second, Isub and Igate associated with the same NMOS transistor are correlated. Third, the Isub values in the same transistor stack are also correlated. If there are $N$ gates in the circuit, the complexity of computing the sum will be $O(N^2)$, which is far from practical for large circuits. Therefore, the grid-based method uses several approximations to reduce the time complexity. In the grid-based method, gates in the same grid have the same parameter values. For example, let $I_{sub,i}$ be the subthreshold leakage currents for Gate$_i$ ($i = 1, \ldots, t$) under the same input vector, and assume that these gates are all in the same grid $k$. Then

$$I_{sub,i} = \alpha_i\, e^{Y'_i + \beta_0 \cdot dL_k + \beta_1 \cdot dT_{ox,i}}, \tag{3.24}$$

where $\alpha_i$, $\beta_0$, and $\beta_1$ are the fitting coefficients. Since we assume that L is spatially correlated and Tox is uncorrelated, all of the $I_{sub,i}$ in the same grid should use the same variable $dL_k$ and different $dT_{ox}$ values. Then, the sum of the leakage terms $I_{sub,i}$ in grid $k$ is given by

$$e^{Y'_i + \beta_0 \cdot dL_k} \cdot \sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}}. \tag{3.25}$$

Note that the second part of the above expression is a sum of independent log-normal variables, which is a special case of the sum of correlated log-normal variables. By using Wilkinson’s method, this can be computed in linear time. Therefore, for gates of the same type with the same input state in the same grid, the time complexity is only linear, and we can approximate the sum of the leakage of all gates by a log-normal variable which can be superposed in the original expression. Similarly, Igate of different gates in the same grid can be summed in linear time and approximated by a log-normal variable.

Now, if the chip is divided into $n$ grids, the number of correlated leakage components in each grid can be reduced to a small constant $c$, determined by the cell library. As a result, the total number of correlated log-normals to sum is no more than $c \cdot n$. In general, the number of grids is set to be substantially smaller than the number of gates in the chip and can be regarded as a constant. Therefore, the complexity of the sum of log-normals in the grid-based method is reduced from $O(N^2)$ to a substantially smaller $O(n^2)$.

As we discussed before, leakage currents of different gates are correlated due to spatially correlated parameters such as the transistor gate channel length. Furthermore, Isub and Igate are correlated within the same gate. In addition, leakage currents under different input vectors of the same gate are correlated because they are sensitive to the same parameters of the gate, regardless of whether or not these are spatially correlated. We must carefully predict the distribution of total leakage in the circuit, and the correlations of these leakage currents must be correctly considered when they are summed up.

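As we mentioned before, the leakage currents that arise from the same leakage mechanism, in the same grid, and from the same entry of the LUT are merged into a single log-normally distributed leakage component to reduce the number of correlated leakage components to sum. Let $I^{sum}_1$ and $I^{sum}_2$ be two merged sums, which correspond to the subthreshold leakage and gate oxide leakage components in the same grid, respectively. These can be calculated as

$$I^{sum}_1 = e^{Y'_1 + \beta_0 \cdot dL} \cdot \sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}} = e^{Y'_1 + \beta_0 dL}\, e^{\lambda}, \tag{3.26}$$

$$I^{sum}_2 = e^{Y'_2 + \beta'_0 \cdot dL} \cdot \sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}} = e^{Y'_2 + \beta'_0 dL}\, e^{\nu}, \tag{3.27}$$

where $e^{\lambda}$ and $e^{\nu}$ are the log-normal approximations of the sums of independent log-normals, $\sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}}$, in $I^{sum}_1$ and $I^{sum}_2$, respectively, as described in (3.25).

Note that $\sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}}$ may be correlated, since the same gate could have both subthreshold and gate leakage. Therefore, $e^{\lambda}$ and $e^{\nu}$ are correlated, and we need to derive the correlation between $\lambda$ and $\nu$. Since the Tox values are independent in different gates, only gates that contribute to both sums produce nonzero terms, and for such a gate $dT_{ox,i}$ and $dT'_{ox,i}$ are the same random variable with variance $\sigma^2_{T_{ox,i}}$. We can therefore easily compute the covariance $\mathrm{cov}\left(\sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}}, \sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}}\right)$ as

$$\sum_{i} \alpha_i \alpha'_i\, e^{(\beta_1^2 + \beta_1'^2)\, \sigma^2_{T_{ox,i}}/2} \left( e^{\beta_1 \beta'_1 \sigma^2_{T_{ox,i}}} - 1 \right). \tag{3.28}$$

The covariance between $e^{\lambda}$ and $e^{\nu}$ is then found as

$$\mathrm{cov}(e^{\lambda}, e^{\nu}) = E(e^{\lambda + \nu}) - E(e^{\lambda}) E(e^{\nu}) = e^{\mu_{\lambda} + \mu_{\nu} + (\sigma_{\lambda}^2 + \sigma_{\nu}^2)/2} \left( e^{\mathrm{cov}(\lambda, \nu)} - 1 \right), \tag{3.29}$$

where $\mu_{\lambda}$, $\mu_{\nu}$ and $\sigma_{\lambda}$, $\sigma_{\nu}$ are the mean values and standard deviations of $\lambda$ and $\nu$, respectively. Solving (3.29) for $\mathrm{cov}(\lambda, \nu)$, we have

$$\mathrm{cov}(\lambda, \nu) = \log\left( 1 + \frac{\mathrm{cov}(e^{\lambda}, e^{\nu})}{e^{\mu_{\lambda} + \mu_{\nu} + (\sigma_{\lambda}^2 + \sigma_{\nu}^2)/2}} \right). \tag{3.30}$$

Since $e^{\lambda}$ and $e^{\nu}$ are approximations of $\sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}}$, respectively, it is reasonable to assume that

$$\mathrm{cov}(e^{\lambda}, e^{\nu}) = \mathrm{cov}\left( \sum_{i=1}^{t} \alpha_i \cdot e^{\beta_1 \cdot dT_{ox,i}},\ \sum_{i=1}^{t'} \alpha'_i \cdot e^{\beta'_1 \cdot dT'_{ox,i}} \right). \tag{3.31}$$

At the same time, the mean values and standard deviations of $\lambda$ and $\nu$ are already known from the approximations; therefore, $\mathrm{cov}(\lambda, \nu)$ can be computed easily.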

We can extend the framework for statistical computation of full-chip leakage considering spatial correlations in intra-die variations of parameters to handle inter-die variation. For each type of parameter, a global random variable can be applied to all gates in the circuit to model the inter-die effect. In addition, this framework is general and can be used to predict the circuit leakage under other parameter variations or other leakage components. However, if the Gaussian or log-normal assumption does not hold, we cannot use the grid-based method to estimate full-chip leakage.

4.3 Projection-Based Statistical Analysis Methods

Recent work in [5] presents a unified approach for statistical timing and leakage current analysis using quadratic polynomials. However, this method only considers the long-channel effects and ignores the short-channel effects (ignoring channel length variables) for the gate leakage models. The coefficients of the orthogonal polynomial chaos (PC) at gate level are computed directly by inner products via the efficient Smolyak quadrature method. The method also tries to reduce the number of variables via the moment matching method, which further speeds up the quadrature process at the cost of more errors.

This projection-based method computes the moments of statistical leakages via moment matching techniques, which are well developed in the area of interconnect model order reduction [177]. In the projection-based method, the quadratic models in (3.17) and (3.19) are used to guarantee accuracy. Li et al. [97] proposed a projection-based approach (PROBE) to reduce the quadratic modeling cost. In a quadratic model, we need to compute all elements of the quadratic coefficient matrix, which is the main difficulty. Take $A_{Chip}$ in (3.19), for example. In most real cases, $A_{Chip}$ is (approximately) rank deficient. As a result, the full matrix $A_{Chip}$ can be approximated by a low-rank matrix $\tilde{A}_{Chip}$ chosen to minimize $\| A_{Chip} - \tilde{A}_{Chip} \|_F$. Here, $\| \cdot \|_F$ denotes the Frobenius norm, which is the square root of the sum of the squares of all matrix elements. Li et al. [97] proved that the optimal rank-$R$ approximation is

$$\tilde{A}_{Chip} = \sum_{r=1}^{R} \lambda_{Chip,r}\, P_{Chip,r}\, P_{Chip,r}^T, \tag{3.32}$$

where $m$ stands for the total number of random variables, and $\lambda_{Chip,r} \in R$ and $P_{Chip,r} \in R^m$ are the $r$th dominant eigenvalue and eigenvector of the matrix $A_{Chip}$, respectively.
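The rank-R approximation in (3.32) can be illustrated on a small symmetric matrix. Note the assumption: this sketch forms the eigendecomposition of a matrix that is already known, purely to show what (3.32) computes; it is not the PROBE algorithm of [97], whose purpose is to avoid building the full matrix in the first place. The 8 × 8 test matrix and its spectrum are made up.

```python
import numpy as np

def rank_R_approximation(A, R):
    """Best rank-R approximation (Frobenius norm) of a symmetric matrix A."""
    A = 0.5 * (A + A.T)                               # enforce symmetry
    lam, P = np.linalg.eigh(A)
    keep = np.argsort(np.abs(lam))[::-1][:R]          # R dominant eigenvalues by magnitude
    A_tilde = (P[:, keep] * lam[keep]) @ P[:, keep].T
    return A_tilde, lam[keep], P[:, keep]

# Hypothetical 8 x 8 coefficient matrix with two dominant eigenvalues.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))      # random orthonormal basis
spectrum = np.array([5.0, -3.0, 0.1, 0.05, 0.01, 0.0, 0.0, 0.0])
A = (Q * spectrum) @ Q.T

A_tilde, lam, P = rank_R_approximation(A, R=2)
err = np.linalg.norm(A - A_tilde, "fro")
print(lam, err)
```

The Frobenius error equals the root sum of squares of the discarded eigenvalues, so a fast-decaying spectrum means that a small R already captures almost all of the quadratic term.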

The PROBE method proposed in [97] is efficient in handling $10^1$ to $10^2$ random variables. However, there are $10^3$ to $10^6$ variables in a full-chip SLA. This led Li et al. [98] to improve the projection-based analysis algorithm by exploiting the underlying sparse structure of the leakage analysis problem. Specifically, the improved methodology includes (1) a two-step iterative algorithm for quadratic SLA modeling, (2) a quadratic model compaction algorithm for leakage distribution estimation, and (3) an incremental analysis algorithm for locally updating the leakage distribution.

5 Summary

In this chapter, we have presented the problem of statistical leakage analysis under process variations with spatial correlations. We then discussed the existing approaches and presented the pros and cons of those methods. All the existing approaches either suffer from high computing costs (MC method), or can only work for variations with strong spatial correlations (grid-based method), or make strong assumptions about parameter variations (no spatial correlation in the projection-based method).

In the following chapters, we show how those problems can be resolved or mitigated. We will mainly present two statistical leakage analysis methods: the spectral-stochastic-based method with variable reduction techniques and the virtual grid-based approach.

Chapter 4 Statistical Leakage Power Analysis by Spectral Stochastic Method

1 Introduction

In this chapter, we present a general gate-based full-chip leakage modeling and analysis method [157]. The gate-based method starts with the process variational parameters, such as the channel length variation, δL, and the gate oxide thickness variation, δTox, and it can derive the full-chip leakage current Ileak in terms of those variables directly (or their corresponding transformed variables). Unlike existing grid-based methods, which trade accuracy for speedup, the presented method is gate-based and uses principal component analysis (PCA) to reduce the number of variables with much less accuracy loss, assuming that the geometrical variables are Gaussian. For non-Gaussian variables, independent component analysis (ICA) [68] can be used. The presented method considers both inter-die and intra-die variations, and it can work with various spatial correlations. The presented method becomes linear under strong spatial correlations. Unlike the existing approaches [13, 65], the presented method does not make any assumptions about the distributions of the final total leakage currents for both gates and chips and does not require any grid-based partitioning of the chip. Compared with [5], the presented method applies a more efficient multidimensional numerical quadrature method (vs. a reduced number of variables using inner products via moment matching), considers more accurate leakage models, and presents more comprehensive comparisons with other methods.

The presented method first fits both the subthreshold and gate oxide leakage currents into analytic expressions in terms of parameter variables. We show that by using more terms in the gate-level analytic models, we can achieve better accuracy than [13]. Second, the presented method employs the orthogonal polynomial chaos (OPC), which gives the best representation for specific distributions [45] and is also called the spectral stochastic method, to represent the variational gate leakages in an analytic form in terms of the random variables. This step is achieved by using the numerical Gaussian quadrature method, which is much faster than the MC method. The total leakage currents are finally computed by simply summing up the resulting analytical orthogonal polynomials of all gates (their coefficients). The spatial correlations are taken care of by PCA or ICA, and at the same time, the number of random variables can also be substantially reduced in the presence of strong spatial correlations during the decomposition process. Numerical examples on the PDWorkshop91 benchmarks in a 45 nm technology show that the presented method is about 10 times faster than the recently presented method [13] with consistently better accuracy.

2 Flow of Gate-Based Method

To analyze the statistical model of chip-level leakage current, traditional methods are grid based. Since the number of gates on a whole chip is very large and every gate has its own variational parameters, the number of random variables is huge. For efficiency, the traditional methods therefore partition a chip into several grids and assume that all the gates in one grid have the same parameters, as mentioned in Sect. 4.2 of Chap. 3. However, this is not the real case. Take Fig. 4.1 as one example. Here the distance between Gate 1 and Gate 2 is smaller than the distance between Gate 1 and Gate 3. In a grid-based method, we suppose that Gate 1 has strong correlation with Gate 3 and weak correlation with Gate 2. But actually, the situation is the opposite. In this section, we present the full-chip statistical leakage analysis method. This method is gate based instead of grid based, yet it gains better speed as well as better accuracy than the method in [13], which is grid based. Our algorithm is shown in Fig. 4.2. The presented algorithm basically consists of three major parts. The first part (step 1) is precharacterization, which builds the analytic leakage expressions (3.1) and (3.2) for each type of gate. This step only needs to be done once for a standard cell library (SCL). The second part (steps 2–4) generates a set of independent random variables and builds the gate-level analytic leakage current expressions and covariances. The final part (step 5) computes the final leakage expressions by simple polynomial additions and calculates other statistical information.

[Figure: chip partitioned into grids; labeled gates: Gate 1, Gate 2, Gate 3]

Fig. 4.1 An example of a grid-based partition. Reprinted with permission from [157] © 2010 Elsevier

Input: standard cell library, netlist, placement information of the design, $\sigma$ of $L$ and $T_{ox}$

Output: analytic expression of the full-chip leakage currents in terms of Hermite polynomials

1. Generate the fitting parameter matrices $a_{sub}$ and $a_{gate}$ of $I_{sub}$ and $I_{gate}$ in (3.1) and (3.2) for each type of gate (after a SPICE run on each input pattern) (Sect. 2).

2. Perform PCA to transform and reduce the original parameter variables in $L$ into independent random variables in $L_k$ (Sect. 2.2).

3. Generate the level-2, $n$-dimensional Smolyak quadrature point set with corresponding weights.

4. Calculate the coefficients of the Hermite polynomials of $I_{sub,k}$ and $I_{gate,k}$ for the final leakage analytic expression for each gate using (4.9) and (4.10).

5. Calculate the analytic expression of the full-chip leakage current by simple polynomial additions and calculate $\mu_{leakage}$, $\sigma_{leakage}$, PDF, and CDF of the leakage current if required.

Fig. 4.2 The flow of the presented algorithm

2.1 Random Variables Transformation and Reduction

In the presented gate-based approach, instead of using grid-based partitioning as in [13] to reduce the number of channel length variables in the presence of strong spatial correlation, we apply PCA to reduce the number of random variables. Our method starts with the following random variable vectors:

$$L = [L_1, L_2, \ldots, L_n] + \delta L_{inter}, \tag{4.1}$$

$$T_{ox} = [T_{ox,1}, T_{ox,2}, \ldots, T_{ox,n}] + \delta T_{ox,inter}, \tag{4.2}$$

where $n$ is the total number of gates on the whole chip, and $\delta L_{inter}$ and $\delta T_{ox,inter}$ represent the inter-die (global) variations. In total, we have $2n + 2$ random variables. There exist correlations between $L$ among different gates, represented by the covariance matrix $\mathrm{cov}(L_i, L_j)$ computed by (3.12).

The first step is to perform PCA on $L$ to get a set of independent random variables $L' = [L'_1, L'_2, \ldots, L'_n]$, where $L = PL'$ and $P = \{p_{ij}\}$ is the $n \times n$ principal component coefficient matrix. In this process, singular value decomposition (SVD) is used on the covariance matrix, and the singular values are arranged in decreasing order, which means that the elements in $L'$ are arranged in decreasing weight order. Then the number of elements in $L'$ can be reduced by only considering the dominant part of $L'$ as $[L'_1, L'_2, \ldots, L'_k]$ (e.g., the weight should be bigger than 1%), where $k$ is the number of reduced random variables. Then every element $L'_i$ in $L'$ can be represented by an orthogonal Gaussian random variable $\xi_i$ with normal distribution:

$$L'_i = \mu_i + \sigma_i \xi_i, \tag{4.3}$$

where $\mu_i$ and $\sigma_i$ are the mean value and standard deviation of $L'_i$. And $L$ can be represented as

$$L = \begin{pmatrix} \mu_{L_1} \\ \mu_{L_2} \\ \vdots \\ \mu_{L_n} \end{pmatrix} + \begin{pmatrix} p_{11} & \cdots & p_{1k} \\ p_{21} & \cdots & p_{2k} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nk} \end{pmatrix} \begin{pmatrix} \sigma_1 \xi_1 \\ \sigma_2 \xi_2 \\ \vdots \\ \sigma_k \xi_k \end{pmatrix} + \delta L_{inter}. \tag{4.4}$$

For $[T_{ox,1}, T_{ox,2}, \ldots, T_{ox,n}]$, $\delta L_{inter}$, and $\delta T_{ox,inter}$, we can also represent them by using the standard Gaussian variables as

$$T_{ox,j} = \mu_{ox,j} + \sigma_{ox,j}\,\xi_{ox,j}, \qquad \delta L_{inter} = \sigma_{L,inter}\,\xi_{L,inter}, \qquad \delta T_{ox,inter} = \sigma_{ox,inter}\,\xi_{ox,inter}, \tag{4.5}$$

where $\xi_{ox,j}$, $\xi_{L,inter}$, and $\xi_{ox,inter}$ are independent orthonormal Gaussian random variables. As a result, we can present $L$ and $T_{ox}$ by $k + n + 2$ independent orthonormal Gaussian random variables:

$$\xi = [\xi_1, \xi_2, \ldots, \xi_{k+n+2}]. \tag{4.6}$$

Then $I_{sub}(L, T_{ox})$ and $I_{gate}(L, T_{ox})$ can be modeled as $I_{sub}(\xi)$ and $I_{gate}(\xi)$, respectively.

But among the $k + n + 2$ variables, only the $k + 2$ variables related to the channel lengths are correlated. In other words, the $n$ variables $T_{ox,i}$ of each gate are independent. As a result, for the $j$th gate, we only have $k + 3$ independent variables; the corresponding variable vector, $\xi_g = \{\xi_{g,j}\}$, is defined as

$$\xi_{g,j} = [\xi_1, \ldots, \xi_k, \xi_{ox,j}, \xi_{L,inter}, \xi_{ox,inter}]. \tag{4.7}$$
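The reduction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gates sit on a hypothetical one-dimensional line, the covariance is taken as $\sigma^2 \exp(-d^2/\eta^2)$, and the eigenpairs are found by plain power iteration with deflation instead of a library SVD. The function names and all numeric values are made up for the example.

```python
import math

def gaussian_cov(positions, sigma, eta):
    """Covariance sigma^2 * exp(-d^2 / eta^2) between gates at 1-D positions."""
    return [[sigma ** 2 * math.exp(-((a - b) / eta) ** 2) for b in positions]
            for a in positions]

def dominant_components(cov, min_weight=0.01, iters=2000):
    """Eigenpairs of cov whose share of the total variance exceeds min_weight.

    Power iteration with deflation; adequate for a small symmetric
    positive semidefinite matrix with well-separated eigenvalues.
    """
    n = len(cov)
    work = [row[:] for row in cov]                 # copy that gets deflated
    trace = sum(cov[i][i] for i in range(n))
    values, vectors = [], []
    for _component in range(n):
        v = [float(i + 1) for i in range(n)]       # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-14:                       # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        if lam / trace <= min_weight:              # below the 1% weight cutoff
            break
        values.append(lam)
        vectors.append(v)
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]    # remove the found component
    return values, vectors

def channel_lengths(mean, values, vectors, xi):
    """Evaluate L = mean + P_k * (sigma_i * xi_i), with sigma_i = sqrt(lambda_i)."""
    return [mean[i] + sum(vectors[m][i] * math.sqrt(values[m]) * xi[m]
                          for m in range(len(values)))
            for i in range(len(mean))]

positions = [float(i) for i in range(8)]            # hypothetical gate coordinates
cov = gaussian_cov(positions, sigma=1.0, eta=10.0)  # strong spatial correlation
values, vectors = dominant_components(cov)
k = len(values)
retained = sum(values) / sum(cov[i][i] for i in range(len(cov)))
print(k, round(retained, 4))
```

With strong correlation, only a few components pass the 1% cutoff, so `k` is much smaller than the number of gates. A production flow would more likely call an existing eigenvalue or SVD routine.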

2.2 Computation of Full-Chip Leakage Currents

For each gate, we first need to present the leakage currents in order-2 Hermite polynomials, as shown below for both the subthreshold and gate leakage currents, $I_{sub}(\xi_{g,j})$ and $I_{gate}(\xi_{g,j})$:

$$I_{sub}(\xi_{g,j}) = \sum_{i=0}^{P} I_{sub,i,j}\, H_i^2(\xi_{g,j}), \qquad I_{gate}(\xi_{g,j}) = \sum_{i=0}^{P} I_{gate,i,j}\, H_i^2(\xi_{g,j}), \tag{4.8}$$

where the $H_i^2(\xi_{g,j})$ are order-2 Hermite polynomials. $I_{sub,i,j}$ and $I_{gate,i,j}$ are then computed by the numerical Gaussian quadrature method discussed in Sect. 3.3 of Chap. 2. Let $S$ be the size of the $Z$-dimensional second-order (level 2) quadrature point set, with $Z = k + 3$. Then $I_{sub,i,j}$ and $I_{gate,i,j}$ can be computed as follows:

$$I_{sub,i,j} = \sum_{l=1}^{S} I_{sub}(\xi_l)\, H_i^2(\xi_l)\, w_l \,/\, \langle H_i^2(\xi_{g,j}) \rangle, \tag{4.9}$$

$$I_{gate,i,j} = \sum_{l=1}^{S} I_{gate}(\xi_l)\, H_i^2(\xi_l)\, w_l \,/\, \langle H_i^2(\xi_{g,j}) \rangle, \tag{4.10}$$

where $I_{sub}(\xi_l)$ and $I_{gate}(\xi_l)$ are computed using (3.1) and (3.2). As a result, their coefficients for the $i$th Hermite polynomial at the $j$th gate can be added directly as

$$I_{leakage,i,j} = \sum I_{sub,i,j} + \sum I_{gate,i,j}. \tag{4.11}$$
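A one-dimensional analogue of (4.9) and (4.10) may help make the quadrature step concrete. The sketch below is an assumption-laden illustration, not the multidimensional Smolyak rule used in the chapter: it uses the standard 3-point Gauss-Hermite rule for a single standard normal variable, and `gate_leakage` is a hypothetical stand-in for the fitted models (3.1) and (3.2).

```python
import math

# 3-point Gauss-Hermite rule for a standard normal variable;
# it integrates polynomials up to degree 5 exactly.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

# Hermite polynomials up to order 2 and their norms E[H_i^2].
HERMITE = (lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0)
NORMS = (1.0, 1.0, 2.0)

def hermite_coefficients(f):
    """Project f(xi) onto H_0, H_1, H_2 by numerical quadrature."""
    return [sum(f(x) * h(x) * w for x, w in zip(NODES, WEIGHTS)) / norm
            for h, norm in zip(HERMITE, NORMS)]

def gate_leakage(xi):
    """Hypothetical gate leakage model (not the book's fitted expression)."""
    return math.exp(0.2 + 0.3 * xi)

coeffs = hermite_coefficients(gate_leakage)
print(coeffs)
```

Each coefficient costs one model evaluation per quadrature point, which is why a small point set is so much cheaper than MC sampling.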

After the leakage currents are calculated for each gate, we can proceed to compute the leakage current for the whole chip as follows:

$$I_{leakage}(\xi) = \sum_{j=1}^{n} \left( I_{sub}(\xi_{g,j}) + I_{gate}(\xi_{g,j}) \right). \tag{4.12}$$

The summation is done for each coefficient of the Hermite polynomials. Then we obtain the analytic expression of the final leakage currents in terms of $\xi$.

We can then obtain the mean value, variance, PDF, and CDF of the leakage current very easily. For instance, the mean value and variance for the full-chip leakage current are

$$\mu_{leakage} = I_{leakage,\,0th}, \tag{4.13}$$

$$\sigma^2_{leakage} = \sum I^2_{leakage,\,1st} + 2 \sum I^2_{leakage,\,2nd,\,type1} + \sum I^2_{leakage,\,2nd,\,type2}, \tag{4.14}$$

where $I_{leakage,\,ith}$ is the leakage coefficient for the $i$th Hermite polynomial of second order, defined as follows:

$$H_{0th}(\xi) = 1, \quad H_{1st}(\xi) = \xi_i, \quad H_{2nd,\,type1}(\xi) = \xi_i^2 - 1, \quad H_{2nd,\,type2}(\xi) = \xi_i \xi_j, \; i \neq j. \tag{4.15}$$
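The coefficient-wise summation in (4.12) and the moment formulas (4.13) and (4.14) can be sketched as follows. The dictionary layout is an assumption made for this illustration: the key `()` holds the constant term, `(i,)` the coefficient of $\xi_i$, `(i, i)` the coefficient of $\xi_i^2 - 1$, and `(i, j)` with `i < j` the coefficient of $\xi_i \xi_j$. The gate coefficients are made-up numbers.

```python
from collections import defaultdict

def add_gates(gate_polys):
    """Sum Hermite coefficients over all gates, term by term."""
    total = defaultdict(float)
    for poly in gate_polys:
        for key, coeff in poly.items():
            total[key] += coeff
    return dict(total)

def mean_and_variance(poly):
    """Mean and variance of a second-order Hermite expansion."""
    mean = poly.get((), 0.0)
    variance = 0.0
    for key, c in poly.items():
        if len(key) == 1:                   # first-order term xi_i
            variance += c * c
        elif len(key) == 2 and key[0] == key[1]:
            variance += 2.0 * c * c         # xi_i^2 - 1 has variance 2
        elif len(key) == 2:
            variance += c * c               # xi_i * xi_j, i != j
    return mean, variance

# Variable 0 is shared by both gates; variables 1 and 2 are gate-local.
gate1 = {(): 1.0, (0,): 0.5, (0, 0): 0.1, (0, 1): 0.2}
gate2 = {(): 2.0, (0,): 0.25, (2,): 0.3}
chip = add_gates([gate1, gate2])
mu, var = mean_and_variance(chip)
print(mu, var)
```

Because the coefficients of the shared variable are added before squaring, the correlation between the two gates is carried into the chip-level variance automatically.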


2.3 Time Complexity Analysis

To analyze the time complexity, one typically does not count the precharacterization cost of step 1 in Fig. 4.2. For the PCA step (step 2), which essentially uses SVD on the covariance matrix, the computation cost is $O(nk^2)$ if we are only interested in the first $k$ dominant singular values. This is the case for strong spatial correlation.

In step 3, we need to compute the weights of the level 2 $(k + 3)$-dimensional Smolyak quadrature point set. For a quadratic model with $k + 3$ variables, the number of Smolyak quadrature points is about $(k + 3)^2$. So the time cost for generating the Smolyak quadrature point set is $O((k + 3)^2)$.

In step 4, we need to call (3.1) and (3.2) $S$ times for each gate. In each call, we need to compute $k + 3$ variables in the Hermite polynomials. The computing cost for the two steps is $O(n(k + 3) \times S)$, where $n$ is the number of gates. After the leakage currents are computed for each gate, it takes $O(n(k + 3))$ to compute the full-chip leakage current.

The total computing cost is $O(nk^2 + (k + 3)^2 + n(k + 3)S + n(k + 3))$. For second-order Hermite polynomials, $S \propto k^2$, so the time complexity becomes $O(nk^3)$. If $k \ll n$ (for strong spatial correlation), we end up with a linear time complexity $O(n)$. In sub-90 nm VLSI technologies, the spatial correlation is strong, and it becomes stronger as the technology scales down, which allows our method to achieve good time complexity.
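The collapse of the total cost to $O(nk^3)$ can be checked term by term. The short derivation below writes $S = c\,k^2$ for some constant $c$, which is only a stand-in for the proportionality $S \propto k^2$ stated above:

```latex
\begin{aligned}
O\!\left(nk^2 + (k+3)^2 + n(k+3)S + n(k+3)\right)
  &= O\!\left(nk^2 + k^2 + c\,n(k+3)k^2 + nk\right) \\
  &= O\!\left(nk^3\right), \qquad k \ge 1 .
\end{aligned}
```

The third term dominates all the others, and when $k$ stays bounded as the number of gates $n$ grows, the bound reduces to $O(n)$.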

3 Numerical Examples

The presented method has been implemented in Matlab 7.4.0. For comparison purposes, we also implemented the grid-based method in [13] and the pure MC method. All the experiments were carried out on a Linux system with quad Intel Xeon CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [155, 157].

The methods for full-chip statistical leakage estimation are tested on circuits in the PDWorkshop91 benchmark set. The circuits are synthesized with the Nangate Open Cell Library, and the placement is from MCNC [106]. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139].

Table 4.1 shows the detailed parameters for gate length and gate oxide thickness variations. Here we choose two sets of $\sigma^2$ distributions. The last column of Table 4.1 shows the standard deviation ($\sigma$) of each variation. The $3\sigma$ values of parameter variations for $L$ and $T_{ox}$ are set to 12% of the nominal parameter values, of which inter-die variations constitute 20% and intra-die variations 80% (Case 1), or inter-die variations constitute 50% and intra-die variations 50% (Case 2). The parameter $L$ is modeled as a sum of correlated sources of variation, and the gate oxide thickness $T_{ox}$ is modeled as an independent source of variation. The same framework can be easily extended to include other parameters of variation. Both $L$ and $T_{ox}$ in each gate are modeled as Gaussian parameters. For the correlated $L$, the spatial correlation is modeled based on the exponential spatial correlation in (3.12). For [13], we still partition the chip into a number of regular grids, and the numbers of grid partitions of the spatial correlation model used for the benchmarks are given in Table 4.1.

Table 4.1 Process variation parameter breakdown for 45 nm technology
[Table body missing in the source; surviving column headers: Case 1 $\sigma^2$ distribution ($\sigma$), Case 2 $\sigma^2$ distribution ($\sigma$)]

For comparison purposes, we perform MC simulations with 500,000 runs, the grid-based method in [13], and the presented method on the benchmarks. The large number of MC runs is needed because the presented method is quite accurate. Figure 4.3 shows the full-chip leakage current distribution (PDF and CDF) of circuit SC0 with 125 gates, considering variation in gate length and gate oxide thickness as in Table 4.1 for Case 1, and the spatial correlation of gate length. It shows that our method fits the MC results very well and is more accurate than [13]. Other test cases show similar comparison results. The comparison of the mean values and standard deviations of full-chip leakage currents is shown in Tables 4.2 and 4.3. For Case 1, the average errors for the mean value and standard deviation of the presented gate-based method are 0.8% and 4.04%, respectively, while for the grid-based method in [13], they are 4.08% and 39.7%, respectively. For Case 2, the average errors for the mean value and standard deviation of the presented gate-based method are 0.8% and 5.51%, respectively, while for the grid-based method in [13], they are 4.17% and 28.4%, respectively. The presented gate-based method is more accurate than the grid-based method, especially for the standard deviation. Since we use a 45 nm technology, while the results in [13] are based on a 100 nm technology, the error ranges are different. (In [13], the average errors for the mean value and standard deviation are 1.3% and 4.1%.) The results of the grid-based method in [13] become worse as the technology scales down, since the dominant state assumption no longer holds.

Table 4.4 compares the CPU times of the three methods. From this table, we can see that even though our method is gate based, it is still faster than the method in [13], which is grid based. The presented method is also much faster than the MC method. On average, the presented method has about 16× speedup over the grid-based method in [13]. We notice that the method in [13] becomes faster when a smaller number of grids is used. But this can lead to large errors even with strong spatial correlations.

[Figure: two panels, "Probability Density of Leakage Current Comparison" (probability density vs. full-chip leakage current in nA) and "Cumulative Distribution of Leakage Current Comparison" (probability vs. full-chip leakage current in nA); curves: Monte Carlo, Our Method, Grid-based Method]

Fig. 4.3 Distribution of the total leakage currents of the presented method, the grid-based method, and the MC method for circuit SC0 (process variation parameters set as Case 1). Reprinted with permission from [157] © 2010 Elsevier

Table 4.2 Comparison of the mean values of full-chip leakage currents among three methods

                                             $\mu$ of $I_{leak}$ ($\mu$A)    Errors (%)
Circuit  Gate #  Grid #  Variation setting   MC       [13]     New      [13]     New
SC0      125     4       Case 1              1.84     1.75     1.82     −4.67    −0.84
                         Case 2              1.84     1.75     1.82     −4.85    −0.87
SC2      1888    16      Case 1              29.98    28.88    29.70    −3.65    −0.91
                         Case 2              30.02    28.89    29.75    −3.77    −0.89
SC5      6417    64      Case 1              107.9    103.6    107.2    −3.93    −0.65
                         Case 2              107.9    103.6    107.2    −3.9     −0.65

Table 4.3 Comparison of the standard deviations of full-chip leakage currents among three methods

                             $\sigma$ of $I_{leak}$ ($\mu$A)    Errors (%)
Circuit  Variation setting   MC       [13]     New      [13]     New
SC0      Case 1              0.495    0.668    0.524    35.0     −5.77
         Case 2              0.632    0.726    0.689    14.9     9.04
SC2      Case 1              8.606    10.86    8.798    26.2     2.23
         Case 2              10.71    12.03    11.36    12.33    6.13
SC5      Case 1              26.19    41.36    25.11    57.9     −4.12
         Case 2              26.19    41.36    25.11    57.9     −4.12

Table 4.4 CPU time comparison among three methods

                             CPU time (s)                         Speedup of New (×)
Circuit  Variation setting   MC            [13]      New       over [13]   over MC
SC0      Case 1              378.1         11.35     1.40      8.11        270.1
         Case 2              358.6         7.47      1.41      5.30        254.33
SC2      Case 1              1.35 × 10^4   168.51    18.79     30.6        718.5
         Case 2              1.35 × 10^4   87.94     17.23     5.10        437.96
SC5      Case 1              2.76 × 10^5   3335      121.2     27.52       2277
         Case 2              2.06 × 10^5   7798.3    443.95    17.56       464.33

4 Summary

In this chapter, we have presented a gate-based method for analyzing the full-chip leakage current distribution of digital circuits. The method considers both intra-die and inter-die variations with spatial correlations. The new method employs orthogonal polynomials and the multidimensional Gaussian quadrature method to represent and compute variational leakage at the gate level and uses orthogonal decomposition to reduce the number of random variables by exploiting the strong spatial correlations of intra-die variations. The resulting algorithm compares very favorably with the existing grid-based method in terms of both CPU time and accuracy. The presented method has about 16× speedup over [13] with consistently better accuracy.

Chapter 5 Linear Statistical Leakage Analysis by Virtual Grid-Based Modeling

1 Introduction

When the spatial correlation is weak, the existing general approaches mentioned in Chaps. 3 and 4 do not work well, as the number of correlated variables cannot be reduced very much. Recently, an efficient method was proposed [200] to address this problem. The method is based on simplified gate leakage models and formulates the major computation tasks into matrix–vector multiplications via Taylor's expansion. It then applies fast numerical methods like the fast multipole method or the precorrected fast Fourier transformation (FFT) method to compute the multiplication. However, this method assumes that the gate-level leakage currents are purely log-normal, and the chip-level leakage is also approximated by a log-normal distribution, which is not the case, as we will show in this chapter. Also, it can only give the means and variances, not the complete distribution of the leakage powers.

In this chapter, a linear statistical leakage analysis technique using a virtual grid-based model is presented [158, 159]. We start with a new linear-time algorithm for statistical leakage analysis in the presence of any spatial correlation (from no spatial correlation to the 100% correlated situation). The presented algorithm exploits the following property: the leakage current of a gate in the presence of spatial correlation is affected by process variations in the neighbor area. As a result, the gate leakage current can be efficiently computed by considering the neighbor area in constant time. We adopt a newly used spatial correlation model where a new set of location-dependent uncorrelated virtual variables is defined over grid cells to represent the original correlated random variables via fitting. To compute the statistical leakage current of a gate on the new set of variables, the collocation-based method is applied, and the variational gate leakages and total leakage currents are represented in an analytic form in terms of the random variables, which can give complete statistical information. The presented method considers both inter-die and intra-die variations and can work with any spatial correlations (strong or weak, as defined in Sect. 3). Unlike the existing approaches [13, 65], the presented method does not make any assumptions about the final distributions of total leakage currents at either the gate or the chip level. In case of medium and strong correlations, the presented method can also work in linear time by properly sizing the grid cells so that both locality of correlation and accuracy are still preserved.

Furthermore, we bring forth a novel characterization of SCL for statistical leakage information, and we have the following observations: (1) The set of neighbor cells is usually small (on the order of 10) and depends only on the relative position, not the absolute position on the chip. (2) As proved later, the number of neighbor cells involved in our model is not related to the strength (level) of spatial correlation. (3) The collocation-based method is applied, and the variational leakage of a gate is represented in an analytic form in terms of the virtual random variables, which can give the complete distribution. (4) The gate-level leakage distribution is only related to the type of gates in an SCL. This statistical leakage characterization can be stored in a look-up table (LUT), which only needs to be built once for an SCL. The full-chip leakage of any chip can then be easily calculated by summing up certain items in the LUT.

The main highlights of the presented algorithm are as follows:

1. We apply the virtual grid-based model for spatial correlation modeling in the statistical leakage analysis, making the resulting algorithm linear time for the first time for all spatial correlation (weak or strong) cases.

2. A new characterization in SCL for statistical leakage analysis has been used. The corresponding algorithm can accelerate full-chip statistical analysis for all spatial correlation conditions (from weak to strong). To the best knowledge of the authors, the presented approach is the first published algorithm which can guarantee $O(N)$ time complexity for all spatial correlation conditions.

3. In addition, an incremental algorithm has been applied. When a few local changes are made, only a small circuit (including the changed gates) is involved in the updating process. Our numerical examples show that the incremental analysis can achieve a further 10× speedup compared with the library-enabled full-chip analysis approach.

In addition to the main highlights, we also present a forward-looking way to extend the presented method to handle runtime leakage analysis. In order to estimate the maximum runtime leakage, the input state under the maximum leakage input vector needs to be chosen. For transient runtime leakage simulation, every time the input vector changes, the input states of some gates on a chip will be updated. Therefore, the incremental technique makes efficient runtime leakage simulation possible. More details are given in Sect. 4.6.

Numerical examples on the PDWorkshop91 benchmarks on a 45 nm technology show that the presented method using the novel characterization in SCL is on average two orders of magnitude faster than the recently proposed method [13] with similar accuracy. For the weak correlation situation, more speedup can be observed. We remark that the experiment in this chapter is based on idle-time leakage. However, the linear-time algorithm can also be applied to runtime leakage by selecting different input states under certain input vectors. Notice that glitch events are ignored in the simplified discussion, which may cause estimation errors [99], and need to be considered in future work. More details are discussed in Sect. 4.6.


2 Virtual Grid-Based Spatial Correlation Model

The virtual grid-based model is based on the observation that the leakage current of a gate in the presence of spatial correlation only correlates to its neighbor area. If we can introduce a set of uncorrelated variables to model the localized correlation, computing the leakage current of one gate can be done in constant time by only considering its neighbor area. Hence, the total full-chip statistical leakage currents can then be computed in linear time by simply adding all the gate leakage currents together in terms of the virtual set of variables. Notice that the virtual random variables in different grids are always independent, which is different from the traditional grid-based model. This idea was proposed recently for fast statistical timing analysis [15] to address computationally efficient modeling for weak spatial correlation; it is similar to the PCA-based approach [155], but with a different set of independent variables.

Specifically, the chip area is still divided into a set of grid cells. When the spatial correlation is weak enough to be ignored, the cell can become so small that one cell only contains one gate. Then we introduce a "virtual" random variable for each cell for one source of process variation.

These virtual random variables are independent and will be the basis for the statistical leakage current calculation concerned with spatial correlation. Then we can express the original physical random variable of a gate in a grid cell as a linear combination of the virtual random variables of its own cell as well as its nearby neighbors. Since the virtual random variable in each cell has a specific location on the chip, such a location-dependent correlation model still retains the important spatial physical meaning (in contrast to PCA-based models). The grid partition can be made of any shape. We use hexagonal grid cells [15] in this chapter since they have minimum anisotropy for 2D space.

Here we define the distance between the centers of two direct neighbor grid cells as the grid length $d_c$. Gates located in the same cell have strong correlation (larger than a given threshold value $\rho_{high}$) and are assumed to have the same parameter variations. The "spatial correlation distance" $d_{max}$ is defined as the minimum distance beyond which the spatial correlation between any two cells is sufficiently small (smaller than a given threshold value $\rho_{low}$) that we can ignore it.

In this model, the $j$th grid cell is associated with one virtual random variable $\xi_j \sim N(0, 1)$, which is independent of all other virtual random variables. $L_j$ can then be expressed in terms of the virtual variables of its $k$ closest neighbor cells. We introduce the concept of the correlation index neighbor set $T(j)$ for cell $j$, and the corresponding variable vector, $\xi_{grid_j}$, is defined as

$$\xi_{grid_j} = [\xi_q,\; q \in T(j)] \tag{5.1}$$

to model the spatial correlation of $L_j$ as

$$L_j = \sum_{q \in T(j)} \alpha_q \cdot \xi_q. \tag{5.2}$$


[Figure: hexagonal grid with cells numbered 1 to 10 and the distances $d_1$, $d_2$, $d_3$, and $d_{max}$ marked]

Fig. 5.1 Location-dependent modeling with the $T(i)$ of grid cell $i$ defined as its seven neighbor cells. Reprinted with permission from [159] © 2010 IEEE

For example, a hexagonal grid partition is used as shown in Fig. 5.1, and if $T(i)$ for each cell is defined as its closest $k = 7$ neighbor cells, then $L$ located at cell $(x_i, y_i)$ can be represented as a linear combination of seven virtual random variables located in its neighbor set. Taking $L_1$ in Fig. 5.1 for instance, we have $L_1 = \alpha_1 \xi_1 + \alpha_2 \xi_2 + \cdots + \alpha_7 \xi_7$.

This concept of a virtual random variable helps to model the spatial correlation. Two cells close to each other will share more common spatial random variables, which means the correlation is strong. On the other hand, two cells physically far away from each other will share fewer or no common spatial random variables. In this way, the spatial correlation is modeled as a homogeneous and isotropic random field, and the spatial correlation is only related to distance. That is to say, the spatial correlation can be fully described by $\rho(d)$ in (3.12). $d_{max}$ is the distance beyond which $\rho(d)$ becomes small enough to be approximated as zero.

Since $\rho(d)$ is only a function of distance, the number of unique distance values between two correlated grid cells equals the number of unique element values in $\Omega_N$. From Fig. 5.1, the spatial correlation distance equals the distance between cell 1 and cell 10, which is $d_{max} = \sqrt{7}\, d_c$, and there are only three unique correlation distances, $d_1$ to $d_3$. Correspondingly, there are only three unique elements in $\Omega_N$, not including two special values: 0 for $d \geq d_{max}$ and 1 for distances within one cell.
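The relation between the grid length, the cutoff distance, and the two correlation thresholds can be made concrete with a short sketch. It assumes the correlation function has the form $\rho(d) = \exp(-d^2/\eta^2)$; the values of `eta` and `rho_high` are hypothetical.

```python
import math

def rho(d, eta):
    """Assumed spatial correlation model rho(d) = exp(-d^2 / eta^2)."""
    return math.exp(-(d / eta) ** 2)

def distance_for(rho_value, eta):
    """Invert rho(d): the distance at which the correlation equals rho_value."""
    return eta * math.sqrt(-math.log(rho_value))

eta = 100.0                       # hypothetical correlation length
rho_high = 0.5                    # hypothetical same-cell threshold
d_c = distance_for(rho_high, eta)
d_max = math.sqrt(7.0) * d_c      # cutoff for the seven-cell hexagonal set
rho_low = rho(d_max, eta)         # algebraically equal to rho_high ** 7
print(d_c, d_max, rho_low)
```

Under this model, fixing the seven-cell neighbor set ties the two thresholds together, so choosing $\rho_{high}$ also determines the correlation level that is treated as zero.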

Furthermore, the same correlation index can be used for all grid cells, and the coefficient $\alpha_k$ should be the same for the same distance because of the homogeneity and isotropy of the spatial correlation. For the cell marked 1 in Fig. 5.1, we only have two unique values among the seven coefficients, i.e., we set $p_0 = \alpha_1$ and $p_1 = \alpha_i$, $i = 2, 3, \ldots, 7$. In other words, we have

$$L_1 = p_0 \xi_1 + p_1 (\xi_2 + \cdots + \xi_7). \tag{5.3}$$

In this way, although there are seven random variables involved in the neighbor set, there are only two unknown coefficients left in the linear function in (5.3) due to the symmetry property of the hexagonal partition.
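How the model in (5.3) produces distance-dependent correlation can be checked by counting shared virtual variables. The sketch below uses axial coordinates `(q, r)` for the hexagonal grid, which is a representation chosen for this illustration; the values of `p0` and `p1` are arbitrary.

```python
import math

# Offsets of the six direct neighbors of a hexagonal cell in axial coordinates.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def coefficients(cell, p0, p1):
    """Weights of the virtual variables (keyed by cell) that make up L_cell."""
    q, r = cell
    coeffs = {cell: p0}
    for dq, dr in NEIGHBOR_OFFSETS:
        coeffs[(q + dq, r + dr)] = p1
    return coeffs

def covariance(cell_a, cell_b, p0, p1):
    """E(L_a L_b) for independent N(0, 1) virtual variables."""
    a = coefficients(cell_a, p0, p1)
    b = coefficients(cell_b, p0, p1)
    return sum(w * b[c] for c, w in a.items() if c in b)

def center_distance(cell_a, cell_b):
    """Distance between cell centers in units of the grid length d_c."""
    dq = cell_a[0] - cell_b[0]
    dr = cell_a[1] - cell_b[1]
    return math.sqrt(dq * dq + dq * dr + dr * dr)

p0, p1 = 0.8, 0.2
origin = (0, 0)
for other in [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]:
    print(center_distance(origin, other), covariance(origin, other, p0, p1))
```

Cells at distance $d_c$ share four variables, cells at $\sqrt{3}\,d_c$ share two, cells at $2d_c$ share one, and cells at $\sqrt{7}\,d_c$ share none, so the covariance falls to zero exactly at the cutoff distance.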

According to (3.12), a nonlinear overdetermined system can be built to determine the two unique values of $p_0$ and $p_1$ as follows:

$$\begin{aligned}
\rho(0) &= E(L_1^2) = p_0^2 + 6p_1^2, \\
\rho(d_1) &= E(L_1, L_2) = 2p_0 p_1 + 2p_1^2, \\
\rho(d_2) &= E(L_1, L_9) = 2p_1^2, \\
\rho(d_3) &= E(L_1, L_8) = p_1^2.
\end{aligned} \tag{5.4}$$

The system in (5.4) can be solved by formulating it as a nonlinear least-squares optimization problem. In matrix form, we can rewrite (5.2) for a whole chip as

$$L = P_{N,N} \cdot \xi, \tag{5.5}$$

where $N$ is the number of grid cells and $\xi = [\xi_1, \xi_2, \ldots, \xi_N]$. According to (5.2), the correlation index set contains only $k$ spatial random variables, which is a very small fraction of the total spatial random variables. As a result, $P_{N,N}$ is a sparse matrix. Every gate is only concerned with $k$ virtual random variables, which have specific location information.
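The least-squares fit of $p_0$ and $p_1$ to the four equations in (5.4) can be sketched with a plain Gauss-Newton iteration. This is one possible solver chosen for illustration, not necessarily the one used by the authors. The targets below are synthetic values generated from known coefficients so that the result can be checked; in the actual flow they would be $\rho(0)$, $\rho(d_1)$, $\rho(d_2)$, and $\rho(d_3)$ from (3.12).

```python
def model(p0, p1):
    """Right-hand sides of the four equations in (5.4)."""
    return (p0 * p0 + 6.0 * p1 * p1,
            2.0 * p0 * p1 + 2.0 * p1 * p1,
            2.0 * p1 * p1,
            p1 * p1)

def jacobian(p0, p1):
    """Partial derivatives of each equation with respect to (p0, p1)."""
    return [(2.0 * p0, 12.0 * p1),
            (2.0 * p1, 2.0 * p0 + 4.0 * p1),
            (0.0, 4.0 * p1),
            (0.0, 2.0 * p1)]

def fit(targets, p0=1.0, p1=0.1, iterations=50):
    """Gauss-Newton iteration for the overdetermined system."""
    for _ in range(iterations):
        r = [m - t for m, t in zip(model(p0, p1), targets)]
        jac = jacobian(p0, p1)
        # Normal equations (J^T J) delta = -J^T r for the 2 x 2 case.
        a = sum(j[0] * j[0] for j in jac)
        b = sum(j[0] * j[1] for j in jac)
        d = sum(j[1] * j[1] for j in jac)
        g0 = sum(j[0] * ri for j, ri in zip(jac, r))
        g1 = sum(j[1] * ri for j, ri in zip(jac, r))
        det = a * d - b * b
        if abs(det) < 1e-15:       # singular normal matrix, stop iterating
            break
        p0 -= (d * g0 - b * g1) / det
        p1 -= (a * g1 - b * g0) / det
    return p0, p1

targets = model(0.8, 0.2)          # synthetic, exactly consistent targets
p0_fit, p1_fit = fit(targets)
print(p0_fit, p1_fit)
```

With targets taken from a real correlation function, the four equations generally cannot all be met exactly, and the fit returns the coefficients with the smallest squared residual; a damped variant may be needed if the starting point is poor.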

Fundamentally, the PCA-based method performs a similar process and has a similar transformation matrix between the original and the new set of variables:

$$L = V_{n,n} \cdot \xi, \tag{5.6}$$

where $V_{n,n}$ is the transformation matrix obtained from the eigenvalue decomposition of the correlation matrix in PCA. The major difference is that $V_{n,n}$ is a dense matrix even though the original correlation matrix is sparse. This makes a huge difference, especially when the spatial correlation is weak, as the eigendecomposition will take almost $O(n^3)$ to compute. The virtual independent spatial correlation model also works for medium and strong correlation cases, which will be shown in the next section.

3 Linear Chip-Level Leakage Power Analysis Method

In this section, we will present the new full-chip statistical leakage analysis method. We first introduce the overall flow of the presented method and highlight the major computing steps. The presented algorithm flow is summarized in Fig. 5.2.

The presented algorithm consists of three major parts. The first part (steps 1 and 2) is precharacterization. Step 1 builds the analytic leakage expressions (3.1) and (3.2) for each type of gate, which only needs to be done once for an SCL. Step 2 deals with a small-sized nonlinear overdetermined system, which can be solved with any least-squares optimization algorithm. The second part (step 3) generates a small set of independent virtual random variables and builds the analytic leakage current expressions and covariances for each gate on top of the new random variables. The final part (step 4) computes the final full-chip leakage expressions by simple polynomial additions. From the final expressions, we can calculate important statistical information (like the mean, variance, and even the whole distribution). In the following, we briefly explain some important steps.

3.1 Computing Gate Leakage by the Spectral StochasticMethod

In the following, we use the orthogonal polynomial-based modeling approaches mentioned in Sect. 3.2 of Chap. 2. Note that for Gaussian and log-normal distributions, the Hermite polynomial is the best choice, as it leads to an exponential convergence rate [45]. For non-Gaussian and non-log-normal distributions, there are other orthogonal polynomials. The presented method can be extended to other distributions with different orthogonal polynomials.

In our problem, $y(\xi)$ in (2.30) will be the leakage current for each gate and eventually for the full chip. For the $j$th gate, from (5.2), $L_j$ only relates to $k$ independent virtual random variables in $T(j)$. Since $k$ is a small number, step 3 in Fig. 5.2 can be very efficient.

To compute the gate leakage current, we need to present both $I_{sub}$ and $I_{gate}$ of each gate in second-order Hermite polynomials, respectively:

$$I_{sub}(\xi_{grid_j}) = \sum_{i=0}^{P} I_{sub,i,j}\, H_i(\xi_{grid_j}), \tag{5.7}$$

$$I_{gate}(\xi_{grid_j}) = \sum_{i=0}^{P} I_{gate,i,j}\, H_i(\xi_{grid_j}), \tag{5.8}$$

where the $H_i(\xi_{grid_j})$ are second-order Hermite polynomials defined as in (4.15). $I_{sub,i,j}$ and $I_{gate,i,j}$ are then computed by the numerical Smolyak quadrature method in (2.40).

Notice that the time complexity of computing the leakage for a gate is $O(k^2)$, and the number of involved independent random variables $k$ is very small compared to the total number of gates. The analytic expression is also a function of those random variables.


After the leakage currents are calculated for each gate, we can proceed to compute the leakage current for the whole chip as follows:

$$I_{chip}(\xi) = \sum_{j=1}^{n} \left( I_{sub}(\xi_{grid_j}) + I_{gate}(\xi_{grid_j}) \right). \tag{5.9}$$

The summation is done for each coefficient of the Hermite polynomials. Then we obtain the analytic expression of the final leakage currents in terms of $\xi$.

We can then obtain the mean value and variance of the full-chip leakage current very easily as follows:

$$\mu_{chip} = I_{chip,\,0th}, \tag{5.10}$$

$$\sigma^2_{chip} = \sum I^2_{chip,\,1st} + 2 \sum I^2_{chip,\,2nd,\,type1} + \sum I^2_{chip,\,2nd,\,type2}, \tag{5.11}$$

where $I_{chip,\,ith}$ is the leakage coefficient for the $i$th Hermite polynomial of second order defined in (4.15). Since Hermite polynomials with orders higher than two have no contribution to the mean value or standard deviation, second order is good enough for estimating $\mu_{chip}$ and $\sigma_{chip}$ in (5.10) and (5.11).


To analyze the time complexity, one typically does not count the precharacterization cost of step 1 in Fig. 5.2, and the time cost of step 2 is negligible compared to the following steps. In step 3, we need to compute the weights of the level-2 $k$-dimensional Smolyak quadrature point set. For a quadratic model with $k+3$ variables, the number of Smolyak quadrature points is $S \sim O(k^2)$ based on the discussion in Sect. 3.1. So the time cost for generating the Smolyak quadrature point set is $O(k^2)$. In step 4, we need to call (3.1) and (3.2) $S$ times for each gate. In each call, we need to compute $k+3$ variables in the Hermite polynomials. The computational cost for the two steps is $O(nk \times S)$, where $n$ is the number of gates. After the leakage currents are computed for each gate, it takes $O(n(k+3))$ to compute the full-chip leakage current.

For the second-order Hermite polynomials, $S \propto k^2$, and $k$ is the number of grid cells in the correlated neighbor index set, which is a very small constant. As a result, the time complexity of our approach becomes linear, i.e., $O(n)$.

4 New Statistical Leakage Characterization in SCL

In this section, we explain why a new characterization that models statistical leakage can be added to the SCL and how it can be applied in our new full-chip statistical leakage analysis method.

4.1 Acceleration by Look-Up Table Approach

The spatial correlation in (5.2) is related to the distance between two grid cells. As a result, the neighbor set $T(i)$ represents the relative location, not the absolute location. In other words, a local neighbor set $T$ and a local set of variables $\xi_{loc} = [\xi_1, \ldots, \xi_k]$ can be shared by all the gates in all the cells.

The local neighbor set $T$ and the coefficients in (5.2) are determined by $d_{max}/d_c$.

From the specific spatial correlation model in (3.12) (as shown in Fig. 5.3),

$$d_{max} = \eta \sqrt{-\ln(\rho_{low})}, \qquad d_c = \eta \sqrt{-\ln(\rho_{high})}, \tag{5.12}$$

then the ratio of the spatial correlation distance $d_{max}$ over the grid length $d_c$ becomes

$$d_{max}/d_c = \sqrt{\ln(\rho_{low}) / \ln(\rho_{high})}. \tag{5.13}$$

Fig. 5.3 Relation between $\rho(d)$ and $d/\eta$ for $\rho = \exp(-d^2/\eta^2)$; the thresholds $\rho_{high}$ and $\rho_{low}$ correspond to $d_c/\eta$ and $d_{max}/\eta$, respectively

Once the threshold values $\rho_{high}$ and $\rho_{low}$ are set, $d_{max}/d_c$ is not related to the correlation length $\eta$. This means we can determine the grid length once we know the spatial correlation distance for a specific correlation formula, at the cost of controlled errors (set by $\rho_{high}$ and $\rho_{low}$).
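The independence of $d_{max}/d_c$ from $\eta$ can be checked numerically. The following is a minimal sketch of (5.12) and (5.13) for the Gaussian-shaped correlation $\rho(d) = \exp(-d^2/\eta^2)$; the threshold values and correlation lengths are hypothetical and chosen only for illustration.

```python
import math


def distances(eta, rho_low, rho_high):
    """d_max and d_c from (5.12) for rho(d) = exp(-d**2 / eta**2)."""
    d_max = eta * math.sqrt(-math.log(rho_low))
    d_c = eta * math.sqrt(-math.log(rho_high))
    return d_max, d_c


def grid_ratio(rho_low, rho_high):
    """d_max / d_c from (5.13); eta does not appear."""
    return math.sqrt(math.log(rho_low) / math.log(rho_high))


# Hypothetical thresholds and correlation lengths.
rho_low, rho_high = 0.01, 0.9
ratios = []
for eta in (100.0, 1000.0):
    d_max, d_c = distances(eta, rho_low, rho_high)
    ratios.append(d_max / d_c)
print(ratios, grid_ratio(rho_low, rho_high))
```

Both correlation lengths give the same ratio, so the size of the neighbor set is fixed by the two thresholds alone.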

Furthermore, (5.13) shows that the spatial correlation (strong or weak) has nothing to do with $T$ and the virtual random variables used in our model. At the same time, the fitting parameters of static leakage in (3.1) and (3.2) are only related to the types of gates in a library. As a result, the coefficients of the Hermite polynomials for the leakage of one gate are only functions of the type of the gate, $\rho_{high}$, and $\rho_{low}$. Therefore, a simple LUT can be used to store the coefficients of the Hermite polynomials of each type of gate in the library. In other words, we do not need to compute the coefficients of the Hermite polynomials for each gate; we just look them up in the table instead. This makes a big difference, as the time complexity is reduced from $O(n)$ to $O(N)$, where $n$ is the number of gates and $N$ is the number of grid cells on chip.

For the LUT, supposing $Q$ is the number of Hermite polynomials involved and $m$ is the number of gate types in the library, it includes two matrices as follows:

$$C_S = \{I_{sub,q,j}\}, \qquad C_G = \{I_{gate,q,j}\}. \tag{5.14}$$

Here $I_{sub,q,j}$ represents the coefficient of $H_q$ for the $j$th kind of gate in the library for subthreshold leakage, and $I_{gate,q,j}$ represents the coefficient of $H_q$ for the $j$th kind of gate in the library for gate oxide leakage. $C_S$ and $C_G$ are $Q \times m$ matrices. Notice that the table needs to be built only once and can be reused for different designs with different conditions of spatial correlation, since the new algorithm is independent of the spatial correlation length $\eta$ and of the circuit design information. In this way, the LUT actually builds a new characterization in the SCL, which represents the statistical leakage behavior of each standard cell.

4.2 Enhanced Algorithm

The enhanced new algorithm consists of two parts. The first part is precharacterization, as shown in Fig. 5.4. We build analytic leakage current expressions for each kind of gate on top of a small set of independent virtual random variables. For fixed values of $\rho_{high}$ and $\rho_{low}$ and one library, a new characterization is added to the SCL by building a LUT, which stores the coefficients of the Hermite polynomials of $I_{sub}$ and $I_{gate}$ for the analytic leakage expressions of each kind of gate. This process only needs to be done once for one library, given $\rho_{high}$ and $\rho_{low}$. Besides, it involves a small-size nonlinear overdetermined problem, which can be solved quickly with any least-squares algorithm.

Fig. 5.4 The flow of statistical leakage characterization in SCL

Fig. 5.5 The flow of the presented algorithm using statistical leakage characterization in SCL

When we deal with full-chip statistical leakage analysis, the coefficients of the local Hermite polynomials in the neighbor grid cell set for each cell can be simply calculated from the LUT. After transferring the local coefficients to the corresponding global positions, we can compute the final full-chip leakage expressions by simple polynomial additions. From the resulting expression, we can calculate other statistical information (such as the mean, the variance, and even the whole distribution). The new algorithm flow is summarized in Fig. 5.5. In the following, we briefly explain some important steps.



Here we define a gate mapping matrix as follows:

$$G_{N \times m} = \{g_{i,j}\}, \tag{5.15}$$

where $g_{i,j}$ represents the number of gates of the $j$th kind in the library located in the $i$th grid cell. Then the coefficients of the local Hermite polynomials in the neighbor set for all the cells on chip can be easily calculated from the LUT as follows:

$$I_{sub,loc} = G \cdot C_S^T, \qquad I_{gate,loc} = G \cdot C_G^T. \tag{5.16}$$
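The matrix product in (5.16) can be sketched in a few lines. The example below is only an illustration: the LUT entries, the gate counts, and the helper name `matmul_transposed` are hypothetical, and plain lists are used in place of a matrix library.

```python
def matmul_transposed(G, C):
    """Return G * C^T for G (N x m) and C (Q x m) as an N x Q list of lists."""
    return [[sum(g * c for g, c in zip(g_row, c_row)) for c_row in C]
            for g_row in G]


# Hypothetical LUT with Q = 3 Hermite terms (rows) and m = 2 gate types (columns).
C_S = [[1.0, 2.0],     # coefficients of H_0 for gate types 0 and 1
       [0.1, 0.3],     # coefficients of H_1
       [0.01, 0.02]]   # coefficients of H_2

# Gate mapping matrix for N = 2 grid cells: G[i][j] gates of type j in cell i.
G = [[4, 1],
     [0, 5]]

I_sub_loc = matmul_transposed(G, C_S)   # N x Q local coefficients
print(I_sub_loc)
```

Each row of the result holds the local Hermite coefficients of one grid cell, obtained as a count-weighted sum of the per-type LUT entries.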

In order to get the full-chip leakage current, the local coefficients need to be transferred to their corresponding global positions:

$$T(i) = (x_i, y_i) + T. \tag{5.17}$$

For the $i$th grid cell, the local set of random variables $\xi_{loc}$ should be transferred to the corresponding positions in $T(i)$. Therefore, $I_{sub,loc}$ and $I_{gate,loc}$ can be transferred to the corresponding global coefficients based on the global virtual random variable set $\xi$. For example, the coefficient of $\xi_i$ in the $i$th cell is

$$I_{sub}(\xi_i) = \sum_{k,\, i \in T(k)} I_{sub,loc}\left(\xi_{T(k)-(x_k,y_k)}\right). \tag{5.18}$$
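The transfer in (5.17) and (5.18) amounts to a scatter-add over grid positions. The sketch below handles only first-order coefficients and represents the local neighbor set $T$ as a list of relative offsets; these simplifications, the function name, and the sample data are assumptions made for illustration, and positions falling outside the chip are not treated specially.

```python
from collections import defaultdict


def scatter_first_order(local_coeffs, cell_xy, offsets):
    """Accumulate per-cell local first-order coefficients into global ones.

    local_coeffs[c][t] -- coefficient, in cell c, of the local variable
                          attached to relative offset offsets[t]
    cell_xy[c]         -- (x, y) grid position of cell c
    offsets            -- local neighbor set T as relative (dx, dy) pairs
    Returns a dict mapping a global grid position to its summed coefficient.
    """
    global_coeffs = defaultdict(float)
    for coeffs, (x, y) in zip(local_coeffs, cell_xy):
        for coeff, (dx, dy) in zip(coeffs, offsets):
            # T(i) = (x_i, y_i) + T, as in (5.17)
            global_coeffs[(x + dx, y + dy)] += coeff
    return dict(global_coeffs)


# Hypothetical example: two cells in a row, three-cell neighbor set.
offsets = [(-1, 0), (0, 0), (1, 0)]
cell_xy = [(0, 0), (1, 0)]
local_coeffs = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
global_coeffs = scatter_first_order(local_coeffs, cell_xy, offsets)
print(global_coeffs)
```

The variable at position (0, 0) collects contributions from both cells, which is the sum over all $k$ with $i \in T(k)$ in (5.18).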

Next, we can proceed to compute the leakage current of the whole chip as follows:

$$I_{chip}(\xi) = \sum I_{sub}(\xi) + I_{gate}(\xi). \tag{5.19}$$

The summation is done for each coefficient of the global Hermite polynomials to obtain the analytic expression of the final leakage currents in terms of $\xi$. We can then obtain the mean value, variance, PDF, and CDF of the leakage current very easily. For instance, the mean value and variance of the full-chip leakage current are

$$\mu_{chip} = I_{chip,0th}, \tag{5.20}$$

$$\sigma_{chip}^2 = \sum I_{chip,1st}^2 + 2 \sum I_{chip,2nd,type1}^2 + \sum I_{chip,2nd,type2}^2, \tag{5.21}$$

where $I_{chip,ith}$ is the leakage coefficient for the $i$th Hermite polynomial of second order defined in (4.15).


4.4 Incremental Leakage Analysis

During leakage-aware circuit optimization, a few small changes might be made to the circuit, but we do not want to compute the whole-chip leakage from scratch again. In this case, incremental analysis becomes necessary. In this section, we show how this can be done in our look-up-table-based framework.

For brevity, we only consider the case where one gate is changed. However, the presented incremental approach can be easily extended to handle a number of gates.

Assume one gate located in the $i$th grid cell is changed (e.g., a gate of the $j$th type is replaced by one of the $(j+1)$th type), resulting in

$$I_{chip}^{new} = I_{chip}^{old} - I_{grid\text{-}i}^{old} + I_{grid\text{-}i}^{new}, \tag{5.22}$$

where $I_{chip}^{new}$ and $I_{chip}^{old}$ denote the full-chip leakage currents after and before the change, respectively, and $I_{grid\text{-}i}^{old}$ and $I_{grid\text{-}i}^{new}$ are the leakage currents in the $i$th grid cell before and after the change, respectively.

As defined in (5.15), $g_{i,j}$ in the gate mapping matrix represents the number of gates of the $j$th kind in the library located in the $i$th cell on a chip. Therefore, we can quickly generate the new gate mapping matrix $G^{new}$ by updating only two elements in $G^{old}$:

$$g_{i,j}^{new} = g_{i,j}^{old} - 1, \qquad g_{i,j+1}^{new} = g_{i,j+1}^{old} + 1. \tag{5.23}$$

In the incremental analysis process, we can consider the updated part as a small circuit, in which there is only one grid cell (the $i$th cell on chip) and only two types of gates in the library (the $j$th and the $(j+1)$th). Then the updating gate mapping matrix is

$$G^{update} = [-1 \;\; 1], \tag{5.24}$$

and the LUTs in (5.14) used in the small circuit are only

$$C_S^{update} = [I_{sub,j},\, I_{sub,j+1}], \qquad C_G^{update} = [I_{gate,j},\, I_{gate,j+1}], \tag{5.25}$$

where $I_{sub,j/(j+1)}$ and $I_{gate,j/(j+1)}$ are the $j$th/$(j+1)$th columns of $C_S$ and $C_G$, respectively.

Compared to the size of the whole chip, the small circuit is much simpler and only contains a few terms. Therefore, updating the leakage distribution using (5.24) and (5.25) is much cheaper than the full-blown chip leakage analysis.
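A minimal sketch of the update in (5.22) through (5.25) is given below, restricted to the local coefficients of the affected cell; mapping the result to global positions is not shown. The LUT values and the function name are hypothetical.

```python
def incremental_update(cell_coeffs, C, j_old, j_new):
    """Update a cell's local coefficients when one gate of type j_old is
    replaced by one gate of type j_new.

    cell_coeffs -- list of Q local coefficients currently held for the cell
    C           -- LUT as a Q x m list of lists (rows: Hermite terms)
    """
    g_update = (-1, 1)  # G^update in (5.24)
    return [old + g_update[0] * row[j_old] + g_update[1] * row[j_new]
            for old, row in zip(cell_coeffs, C)]


# Hypothetical LUT: Q = 3 Hermite terms, m = 2 gate types.
C_S = [[1.0, 2.0],
       [0.1, 0.3],
       [0.01, 0.02]]

# Cell with 4 gates of type 0 and 1 gate of type 1.
old_coeffs = [4 * r[0] + 1 * r[1] for r in C_S]
# Replace one type-0 gate by a type-1 gate.
new_coeffs = incremental_update(old_coeffs, C_S, 0, 1)
# Reference: recompute the cell from scratch with 3 and 2 gates.
reference = [3 * r[0] + 2 * r[1] for r in C_S]
print(new_coeffs, reference)
```

The update touches only $Q$ numbers regardless of how many gates or grid cells the chip contains.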



Considering the statistical leakage analysis of a certain chip, for each grid cell, we need to compute a weighted sum over the $m$ kinds of gates in this cell for every coefficient in the neighbor set (of size $k$). For a quadratic model with $k$ variables, the number of coefficients is about $S \sim k^2$. So the time cost for this step is $O(k^2 \cdot m \cdot N)$, where $N$ is the number of cells. For transferring the local coefficients to their global positions and summing them up, the time cost is $O(N)$. Next, it takes $O(N)$ to compute the full-chip leakage current. Since $k$ and $m$ are very small constants, the time complexity of our approach becomes $O(N)$.

4.6 Discussion of Extension to Statistical Runtime Leakage Estimation

The leakage current for each input combination obtained in Sect. 2 of Chap. 3 can be used to estimate the average leakage in standby (idle) mode as well as the time-variant leakage in active (runtime) mode.

For idle leakage analysis, we take the average of the leakage currents over all the input combinations to arrive at an analytic expression for each gate as in (5.26), in lieu of the dominant states used in [13]. The reason for keeping all input states is that technology downscaling narrows the gap between the leakage under dominant states and the others. Considering only one state in leakage analysis will lead to a large error compared to the simulation results:

$$I_{sub}^{avg} = \sum_{i \in \text{all input states}} P_i I_{sub,i}, \qquad I_{gate}^{avg} = \sum_{i \in \text{all input states}} P_i I_{gate,i}, \tag{5.26}$$

where $P_i$ is the probability of input state $i$, and $I_{sub,i}$ and $I_{gate,i}$ are the subthreshold leakage and gate leakage values at input state $i$, respectively.
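Since each per-state leakage is itself a Hermite expansion, the average in (5.26) can be taken coefficient by coefficient. The sketch below assumes a hypothetical two-input gate with four equally likely input states and two coefficients per state; none of these values come from the library data.

```python
def average_over_states(state_probs, state_coeffs):
    """Probability-weighted average of coefficient vectors, as in (5.26)."""
    n_terms = len(state_coeffs[0])
    return [sum(p * coeffs[q] for p, coeffs in zip(state_probs, state_coeffs))
            for q in range(n_terms)]


# Hypothetical data: four input states, assumed equally likely.
probs = [0.25, 0.25, 0.25, 0.25]
coeffs = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [6.0, 0.4]]
avg = average_over_states(probs, coeffs)
print(avg)
```

The averaged vector is what an idle-mode LUT would store for this gate type, whereas a runtime LUT would keep all four rows.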

On the other hand, runtime leakage might change when a new input vector is applied. By choosing the input state at the gate level under a certain input vector, the final analytic expression for runtime leakage can be obtained. Notice that the LUT for runtime leakage is larger than the one used in idle-time leakage analysis. For runtime leakage, the analytic expressions of all input patterns cannot be combined and have to be stored separately.

The presented statistical characterization in SCL is fast enough to make runtime leakage estimation under a series of input vectors possible. More details on statistical runtime leakage analysis are given in the following part.


Fig. 5.6 Simulation flow for full-chip runtime leakage. The flow starts with SLA on the given initial input vector and the input states of all gates on chip; whenever the input vector changes, the runtime leakage behavior is updated by incremental leakage analysis.

Here we present a forward-looking way to extend the presented method to handle runtime leakage current estimation. In traditional power analysis, leakage was considered important only in the idle time. However, as technology scales down, the growth of leakage power becomes significant even during runtime, for instance, for computing the maximum power bound [38].

Runtime leakage, however, is input-signal dependent and changes each time the input signals change, which means it becomes time varying. As a result, the runtime leakage analysis will take an extremely long time, as we need to perform the statistical analysis for each input vector along the time domain. Fortunately, with the novel statistical characterization in SCL and the incremental approach discussed in Sect. 4.4, leakage analysis at each cycle is fast enough to make runtime leakage estimation possible.

In the following, we show how to extend the presented statistical leakage method to handle the runtime leakage analysis. First, in the runtime leakage analysis, given the initial input vector and the initial state of each gate on a chip, the initial leakage analysis can be done using the algorithm in Fig. 5.5. After that, every time the input vector changes, the input states of some gates on the chip will be updated. Instead of computing the chip-level leakage from the very beginning, the incremental technique discussed in Sect. 4.4 can be applied here to update the runtime leakage information. The flow of the presented statistical analysis of runtime leakage is shown in Fig. 5.6.

Also, one notable difference is that the gate-level leakage analytical expressions in (3.1) and (3.2) for all input states need to be stored for runtime leakage analysis, instead of the average value in (5.26) used for idle-time leakage analysis.
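The runtime flow described above (one full analysis, then incremental updates per input-vector change) can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: grid positions are ignored, the per-state LUT is keyed by a hypothetical `(gate_type, state)` pair, and all values are made up.

```python
def initial_leakage(gate_states, lut):
    """Full analysis: sum the per-state coefficient vectors of all gates."""
    n_terms = len(next(iter(lut.values())))
    total = [0.0] * n_terms
    for gate_type, state in gate_states:
        for q, c in enumerate(lut[(gate_type, state)]):
            total[q] += c
    return total


def update_leakage(total, gate_states, changes, lut):
    """Incremental step: only gates whose input state changed are touched."""
    total = list(total)
    for idx, new_state in changes.items():
        gate_type, old_state = gate_states[idx]
        old = lut[(gate_type, old_state)]
        new = lut[(gate_type, new_state)]
        total = [t - o + n for t, o, n in zip(total, old, new)]
        gate_states[idx] = (gate_type, new_state)
    return total


# Hypothetical per-state LUT for a single gate type with two input states.
lut = {("INV", 0): [1.0, 0.1], ("INV", 1): [2.0, 0.2]}
gate_states = [("INV", 0), ("INV", 1)]

total = initial_leakage(gate_states, lut)
# A new input vector flips the state of gate 0.
total = update_leakage(total, gate_states, {0: 1}, lut)
print(total)
```

The cost of each update is proportional to the number of gates whose state changed, not to the size of the chip.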

Second, sometimes the maximum statistical runtime leakage estimation is required instead of such transient results of leakage. In fact, the maximum runtime leakage of a circuit can be much greater than the minimum leakage (by a few orders of magnitude [99]). Besides, the input vectors causing the maximum leakage current highly depend on process variations due to the shrinking physical dimensions.


To obtain the maximum statistical runtime leakage, we follow the work in [38], which proposed a technique to accurately estimate the runtime maximum/minimum leakage vector considering both cell functionalities and process variations. One can first run the tool in [38] to obtain the input vector that gives the maximum leakage power. Then one can apply the presented SCL tool to obtain the maximum/minimum statistical leakage power under that input. The presented statistical leakage characterization in SCL will work as long as the input vector is given.

We note that glitch events also have an effect on runtime leakage power, and ignoring glitches can cause an estimation error of approximately 5–20% depending on the circuit topology [99]. However, glitches have not been considered in any existing statistical runtime leakage analysis work so far and will be investigated in the future.

4.7 Discussion about Runtime Leakage Reduction Technique

Runtime leakage reduction technology such as power gating [1] is widely applied in the design of mobile devices nowadays. Although the model of leakage power used in this chapter is idle-time leakage, the presented method can be extended to leakage computation under the runtime scenario with leakage reduction.

By shutting off the idle blocks, power gating is an effective technique for saving leakage power. Following the runtime leakage model for power gating in [73], the variational part of the full-chip leakage can be estimated as

$$I_{leak} = (1 - W) \sum_{i \in \text{all gates}} I_i^{gate}, \tag{5.27}$$

where $W$ is the empirical switching factor. From [198], the leakage of a gate $I^{gate}$ can be approximated by a single exponential function of its virtual ground voltage ($V_{VG}$):

$$I^{gate} \approx \hat{I} e^{-K_{gate} V_{VG}}, \tag{5.28}$$

where $K_{gate}$ is the leakage reduction exponent and $\hat{I}$ is the zero-$V_{VG}$ leakage current. Notice that both the switching factor $W$ in (5.27) and the leakage reduction exponent $K_{gate}$ in (5.28) are related only to the type of gates and not to a statistical factor. Therefore, the presented LUT approach can work for both idle leakage and runtime leakage with power-gating activities.
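For concreteness, (5.27) and (5.28) can be evaluated as in the sketch below. The numerical values of $W$, $K_{gate}$, $V_{VG}$, and $\hat{I}$ are hypothetical placeholders, not data from [73] or [198].

```python
import math


def gated_gate_leakage(i_hat, k_gate, v_vg):
    """Single-exponential model (5.28): I_hat * exp(-K_gate * V_VG)."""
    return i_hat * math.exp(-k_gate * v_vg)


def chip_leakage(switching_factor, gate_leakages):
    """Variational part of full-chip leakage under power gating, as in (5.27)."""
    return (1.0 - switching_factor) * sum(gate_leakages)


# Hypothetical values, for illustration only.
gates = [gated_gate_leakage(1.0e-9, 5.0, 0.2),
         gated_gate_leakage(2.0e-9, 5.0, 0.2)]
i_leak = chip_leakage(0.25, gates)
print(i_leak)
```

Because $W$ and $K_{gate}$ are deterministic per gate type, they act as scale factors on the LUT coefficients rather than as additional random variables.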


The presented methods with and without the LUT have been implemented in Matlab 7.8.0. Since the leakage model for the method in [200] has to be purely log-normal (linear terms in the exponent parts), we did not choose it for comparison purposes.


Table 5.1 Summary of test cases used in this chapter

Circuit  Gate #  Area (μm²)        Test case  dmax (μm)  dc (μm)  Grid #
SC0      125     1,459 × 1,350     Case 1     2,190      730      2 × 2
                                   Case 2     1,095      365      4 × 4
SC1      1,888   4,892 × 4,874     Case 3     1,896      612      8 × 8
                                   Case 4     918        328      16 × 16
SC2      6,417   10,092 × 10,466   Case 5     984        328      32 × 31
                                   Case 6     482        164      64 × 64
VLSI     2e6     SC2 × 256         Case 7     6,301      2,144    112 × 112

All the experiments were carried out on a Linux system with quad Intel Xeon CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [158, 159].

The methods for full-chip statistical leakage analysis were tested on circuits in the PDWorkshop91 benchmark set. The circuits were synthesized with the Nangate Open Cell Library [125], and the placement is from MCNC [106]. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139].

According to [71], $L$ and $T_{ox}$ for high-performance logic in 45 nm technology will be 18 nm and 1.8 nm, respectively, and the physical variation should be controlled within ±12%. So the $3\sigma$ values of the variations for $L$ and $T_{ox}$ were set to 12% of the nominal values, of which inter-die variations constitute 20% and intra-die variations 80%. $L$ is modeled as a sum of spatially correlated sources of variation, and $T_{ox}$ is modeled as an independent source of variation. The same framework can be easily extended to include other parameters of variation. Both $L$ and $T_{ox}$ are modeled as Gaussian parameters. For the correlated $L$, the spatial correlation is modeled based on (3.12), and the partition adopts Fig. 5.1. The test cases are given in Table 5.1 (all lengths in μm), where the test case "VLSI" is generated by duplicating SC2 as a unit block into a 16 × 16 array.

For comparison purposes, we performed MC simulations with 50,000 runs using (3.1) and (3.2), the method in [13] (which only considers the spatial correlation of neighbor grid cells), and the presented approaches on the benchmarks.

5.1 Accuracy and CPU Time

The comparison of the mean values and standard deviations of the full-chip leakage current is shown in Table 5.2, where New is the presented method. The average errors for the mean and standard deviation ($\sigma$) values of the new technique are 4.52% and 3.92%, respectively, while for the method in [13], the average errors for the mean value and $\sigma$ are 4.12% and 3.83%, respectively. Table 5.2 shows that these two algorithms have almost the same accuracy, and our method can handle both strong and weak spatial correlations by adjusting the grid size. For a very large circuit such as Case 7, MC and the method in [13] run out of memory, but the presented method still works.

Table 5.2 Accuracy comparison of different methods based on Monte Carlo

                      Mean value (μA)                Errors (%)
Test case  Grid #     MC      Method [13]  New       Method [13]  New
Case1      2 × 2      3.311   3.105        3.169     -6.20        -4.28
Case2      4 × 4      3.310   3.105        3.169     -6.20        -4.28
Case3      8 × 8      30.04   28.88        30.46     -3.85        -1.38
Case4      16 × 16    30.04   28.88        30.46     -3.85        -1.38
Case5      32 × 32    191.6   179.0        182.7     -6.59        -4.65
Case6      64 × 64    191.6   179.0        182.7     -6.59        -4.65
Case7      112 × 112  –       –            2.6e4     –            –

                      Standard deviation (μA)        Errors (%)
Test case  Grid #     MC      Method [13]  New       Method [13]  New
Case1      2 × 2      0.904   0.837        0.861     -7.40        -4.69
Case2      4 × 4      0.594   0.547        0.548     -7.91        -7.74
Case3      8 × 8      5.713   5.494        5.417     -3.83        -5.18
Case4      16 × 16    5.307   5.400        5.067     1.75         -4.52
Case5      32 × 32    33.87   31.83        32.25     -6.02        -4.78
Case6      64 × 64    33.20   30.27        29.34     -8.83        -11.63
Case7      112 × 112  –       –            4.1e3     –            –

Table 5.3 CPU time comparison

Test case  MC       Method in [13]  New    LUT
Case1      83.14    2.96            0.10   0.023
Case2      87.09    13.16           0.14   0.036
Case3      828.42   26.24           0.86   0.033
Case4      869.12   74.50           0.87   0.609
Case5      7532.77  117.77          8.65   1.005
Case6      7873.54  490.84          10.67  7.191
Case7      –        –               2598   3.7313

Table 5.3 compares the CPU times of MC, the method in [13], the presented method (New), and the presented method using statistical leakage characterization in SCL (abbreviated as LUT). This table shows that the presented new method, New, is much faster than the method in [13] and MC simulation. On average, the presented algorithm has about a 113× speedup over [13] and many orders of magnitude over the MC method. The speed of our approach is also not affected by the total number of grid cells. If the spatial correlation is strong, which means $d_{max}$ is large, $d_c$ can be increased at the same time without loss of accuracy. So the number of neighbor grid cells in $T(i)$ will still be much smaller than the number of gates. The presented method will be efficient and linear in both cases. Table 5.3 also shows that the presented method can gain further speedup with the LUT technique using statistical leakage characterization in SCL.


Table 5.4 Incremental leakage analysis cost

           Cost time (s)    Speedup over
Test case  Incremental LUT  MC     [13]   New    LUT
Case1      3.78e-4          2.2e5  2.7e4  265    53
Case2      1.53e-4          5.7e5  8.1e4  915    157
Case3      0.0026           3.2e5  3.7e4  331    13
Case4      1.12e-4          7.8e6  6.7e5  7768   407
Case5      0.0095           7.9e5  1.1e5  911    16
Case6      2.77e-4          2.8e7  6.1e6  3.9e4  3.1e4

5.2 Incremental Analysis

For comparison purposes, one gate in each benchmark circuit is changed, and the presented incremental algorithm is applied to update the leakage value locally. Table 5.4 shows the computational cost of the incremental analysis and the speedup over the four different leakage analysis methods in Table 5.3. Compared with the LUT approach (the fifth column in Table 5.3), the incremental analysis achieves a 13× to 3.1e4× speedup. As discussed in Sect. 4.4, the minicircuit for updating only contains a small constant number of terms. Therefore, when the problem size increases further, we expect that the incremental analysis could achieve more speedup over the full leakage analysis.

6 Summary

In this chapter, we have presented a linear algorithm for full-chip statistical analysis of leakage currents in the presence of any condition of spatial correlation (strong or weak). The new algorithm adopts a set of uncorrelated virtual variables over grid cells to represent the original physical random variables with spatial correlation, and the size of the grid cell is determined by the correlation length. As a result, each physical variable is always represented by virtual variables in a local neighbor set. Furthermore, a LUT is used to cache the statistical leakage information of each type of gate in the library to avoid computing leakage for each gate instance. As a result, the full-chip leakage can be calculated with $O(N)$ time complexity, where $N$ is the number of grid cells on chip. The new method maintains the linear complexity from strong to weak spatial correlation and has no limitation on the leakage current model or variation model.

This chapter also presented an incremental analysis scheme to update the leakage distribution more efficiently when local changes to a circuit are made. Numerical examples show that the presented method is about 1,000× faster than the recently proposed method [13] with similar accuracy and many orders of magnitude faster than the MC method. Numerical results show that the presented incremental analysis can further achieve significant speedup over the full leakage analysis.

Chapter 6
Statistical Dynamic Power Estimation Techniques

1 Introduction

It is well accepted that process-induced variability has a huge impact on circuit performance in sub-90 nm VLSI technologies. Process variations have to be assessed in various VLSI design steps to ensure robust circuit design. Process variations consist of both inter-die ones, which affect all the devices on the same chip in the same way, and intra-die ones, which represent variations of parameters within the same chip. The latter include spatially correlated variations and purely independent or uncorrelated variations. Spatial correlation describes the phenomenon that devices close to each other are more likely to have similar characteristics than when they are far apart. It was shown that variations in practical chips in the nanometer range are spatially correlated [195]. A simple assumption of independence for the involved random variables can lead to significant errors.

One great challenge from aggressive technology scaling is the increasing power consumption, which has become a major issue in VLSI design. The variations in process parameters and timing delays also result in variations in power consumption. Many statistical leakage power analysis methods have been proposed to handle both inter-die and intra-die process variations considering spatial variation [13, 65, 155, 200]. However, the problem is far from being solved for dynamic power estimation.

Dynamic power for a digital circuit in general is expressed as follows:

$$P_{dyn} = \frac{1}{2} f_{clk} V_{dd}^2 \sum_{j=1}^{n} C_j S_j, \tag{6.1}$$

where $n$ is the number of gates on chip, $f_{clk}$ is the clock frequency, $V_{dd}$ is the supply voltage, $C_j$ is the sum of the load capacitance and the equivalent short-circuit capacitance at node $j$, and $S_j$ is the switching activity for gate $j$. This expression, however, does not give the explicit impacts of the effective channel length ($L_{eff}$) and gate oxide thickness ($T_{ox}$) of the gate on the dynamic power. In the work of [64], $L_{eff}$ and $T_{ox}$ are shown to have the most impact on gate dynamic power consumption. Figure 6.1 shows dynamic power variations due to different effective channel lengths for an AND2 gate in 45 nm technology. It can be seen that the channel length of a gate has a significant impact on its dynamic power.

Fig. 6.1 The dynamic power versus effective channel length for an AND2 gate in 45 nm technology (70 ps active pulse as partial swing, 130 ps active pulse as full swing). The plot shows AND2 gate dynamic power (W) against the $L_{eff}$ ratio from 0.8 to 1.2 for full-swing and partial-swing transitions. Reprinted with permission from [60] © 2010 IEEE
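As a reference point for the deterministic expression (6.1), the sketch below evaluates it for a two-node example. The clock frequency, supply voltage, capacitances, and switching activities are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(f_clk, v_dd, caps, activities):
    """P_dyn = 0.5 * f_clk * V_dd^2 * sum_j C_j * S_j, as in (6.1)."""
    return 0.5 * f_clk * v_dd ** 2 * sum(c * s for c, s in zip(caps, activities))


# Hypothetical circuit: 1 GHz clock, 1.0 V supply, two nodes.
p_dyn = dynamic_power(1.0e9, 1.0, [1.0e-15, 2.0e-15], [0.5, 0.25])
print(p_dyn)
```

Nothing in this function depends on $L_{eff}$ or $T_{ox}$, which is the limitation the statistical method in this chapter addresses.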

In this chapter, we propose to develop a more efficient statistical dynamic power estimation method considering channel length variations with spatial correlation and gate oxide thickness variations, which are not considered in the existing works. The presented dynamic power analysis method explicitly considers the spatial correlations and glitch width variations on a chip. The presented method [60] follows the segment-based statistical power analysis method [30], where dynamic power is estimated based on the switching period instead of switching events to accommodate the glitch width variations. To consider the spatial correlation of the channel length, we set up a set of uncorrelated variables over virtual grids to represent the original physical random variables via fitting. In this way, the $O(n^2)$ time complexity for computing the variances can be reduced to linear-time complexity ($n$ is the number of gates in the circuit). The algorithm works for both strong and weak correlations. Furthermore, a LUT is created to cache statistical information for each type of gate to avoid running SPICE repeatedly. The presented method has no restrictions on the models of statistical distributions for dynamic power. Numerical examples show that the presented method has a 300× speedup over the recently proposed method [30] and many orders of magnitude over the MC method.


2 Prior Works

2.1 Existing Relevant Works

Many works on dynamic power analysis have been proposed in the past. MC-based simulation was proposed in [10], where the circuit is simulated for a large number of input vectors to gain statistics for the average power. Later, probabilistic methods for power estimation were proposed and widely used [29, 48, 116, 117, 183] because statistical estimates can be obtained without time-consuming exhaustive simulation. In [117], the concept of probability waveforms is proposed to estimate the mean and variance of the current drawn by each circuit node. In [116], the notion of transition density is introduced, and transition densities are propagated through combinational logic modules without regard to their structure. However, the author did not consider the inner signal correlation; thus, the algorithm is only applicable to combinational circuits. Ghosh et al. [48] extended the transition density theory to consider sequential circuits via symbolic simulation to calculate the correlations between internal lines due to reconvergence. However, the performance of this algorithm is restricted by its memory space complexity. In [29, 183], the authors used tagged probabilistic simulation (TPS) to model the set of all possible events at the output of each circuit node, which is more efficient than [48] due to its effectiveness in computing the signal correlation. The work [48] is based on a zero-delay model, and the works [10, 116, 183] are based on a real delay model. However, all of them assume a fixed delay model, which is no longer valid under process variation. At the same time, all the previous works only consider full-swing transitions, and partial-swing effects are not well accounted for.

Recently, several approaches have been proposed for fast statistical dynamic power estimation [4, 18, 30, 64, 66, 138]. Alexander et al. [4] proposed to consider the delay variations and glitches for estimating dynamic power. With efficient simulation of input vectors, this algorithm has a linear-time complexity. But the variation model is quite simple, as only minimum and maximum bounds for the delay were obtained, and partial swings are not considered. Pilli et al. [138] presented another approach, which divides the clock cycle into a number of time slots and computes the transition density for each slot, but only the mean value of the dynamic power can be estimated. In [66], the authors used supergates and timed Boolean functions to filter glitches and consider signal correlations due to reconvergent fan-outs but failed to consider the correlations including placement information. Chou et al. [18] used a probabilistic delay model based on the MC simulation technique for dynamic power estimation but also did not consider placement information. Harish et al. [64] used a hybrid power model based on MC analysis; the method is only applied to a small two-stage two-input NAND gate, and for large circuits, Monte Carlo simulation can be really time consuming.


Fig. 6.2 A transition waveform example $\{E_1, E_2, \ldots, E_m\}$ for a node, with the events ordered along the time axis. Reprinted with permission from [60] © 2010 IEEE

2.2 Segment-Based Power Estimation Method

Dinh et al. [30] recently proposed a method that is not based on the fixed delay gate model, in order to consider the partial-swing effect as well as the effect of process variation.

To accurately estimate the dynamic power in the presence of process variation, the work in [30] introduces the transition waveform concept, which is similar to the probability waveform [117] or tagged waveform [29] concepts except that the variance of the transition time is introduced. Specifically, a transition waveform consists of a set of transition events, each of which is a triplet $(p, t, \delta_t)$, where $p$ is the probability for the transition to occur, $t$ is the mean time of the transition, and $\delta_t$ is the standard deviation of the transition time. Figure 6.2 shows an example of a transition waveform for a node.

The triplets are then propagated from the primary inputs to the primary outputs, and they are computed for every node. In addition to propagating the switching probabilities like traditional methods, this method also propagates the variances along the signal paths, which is done in a straightforward way based on second-order moment matching. Glitch filtering is also performed to ensure accuracy and reduce the number of switches during the propagation.

Unlike the traditional power estimation methods in [29, 117], which count the transition times (or their probabilities), i.e., the edges in the transition waveform, to estimate the dynamic power, the work in [30] proposed to count the transition segments (durations), which are pairs of transition events, to take into account the impact of the different glitch widths on the dynamic power consumption. For $n$ transition events in a transition waveform, the number of segments is $C_n^2 = n(n-1)/2$, which increases the complexity of the computation compared to the edge-based method. Another implication is that the traditional edge-based power consumption formula (6.1) cannot be used any more. As a result, a LUT is built from the SPICE simulation results for different glitch widths. The total dynamic power for a gate is then the probability-weighted average dynamic power over all the switching segments, which is then summed up to compute the total chip dynamic power. However, this method does not consider spatial correlation, which can lead to significant errors and is the main issue to be addressed in this chapter.


3 The Presented New Statistical Dynamic Power Estimation Method

3.1 Flow of the Presented Analysis Method

In this section, we present the new full-chip statistical dynamic power analysis method. The presented approach follows the segment-based power estimation method [30]. The presented algorithm propagates the triplet switching events from the primary inputs to the outputs. It then computes the statistical dynamic power at each node based on orthogonal polynomial chaos and virtual grid-based variables for the channel length, which deal with the spatial correlation discussed in Sect. 3 of Chap. 3 and Sect. 2 of Chap. 5.

We first present the overall flow of the presented method in Fig. 6.3 and then highlight the major computing steps later.

The dynamic power for one gate (under glitch width $W_g$ with variation and fixed load capacitance $C_l$) can be represented by a Hermite polynomial expansion as

\[ P_{dyn,W_g,C_l}(\zeta_{g,j}) = \sum_{q=0}^{Q} P_{dyn,q,j} H_q(\zeta_{g,j}). \tag{6.2} \]



Fig. 6.4 The flow of building the sub LUT

$P_{dyn,q,j}$ is then computed by the numerical Smolyak quadrature method. In this chapter, we use second-order Hermite polynomials for statistical dynamic power analysis. The coefficient for the $q$th Hermite polynomial at the $j$th gate, $P_{dyn,q,j}$, can be computed as follows:

\[ P_{dyn,q,j} = \sum_{l} P_{dyn}(\zeta_l) H_q(\zeta_l) w_l / \langle H_q^2(\zeta_{g,j}) \rangle, \tag{6.3} \]

where $\zeta_l$ is a Smolyak quadrature sample and $w_l$ is its weight. From the dynamic power LUT $P_{dyn} = f(L, T_{ox}, W_g, C_l)$, we can interpolate $P_{dyn}(\zeta_l)$, which is the dynamic power for every Smolyak sampling point.
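The projection in (6.3) can be illustrated with a minimal one-dimensional sketch. It is not the chapter's implementation: it assumes a single standard normal variable, replaces the Smolyak sparse grid with a plain Gauss-Hermite rule from NumPy, and uses a made-up polynomial in place of the LUT-interpolated gate power. The function name `hermite_coefficients` is hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficients(f, order=2, n_points=5):
    """Project f(zeta), zeta ~ N(0, 1), onto He_0..He_order as in (6.3).

    Assumption: one random variable and a tensor Gauss-Hermite rule,
    standing in for the Smolyak quadrature used in the text.
    """
    nodes, weights = hermegauss(n_points)          # weight function exp(-x^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)   # normalise to the N(0, 1) density
    values = f(nodes)                              # "power" at every quadrature sample
    coeffs = []
    for q in range(order + 1):
        basis = np.zeros(q + 1)
        basis[q] = 1.0                             # selects He_q in hermeval
        h_q = hermeval(nodes, basis)
        # <He_q^2> = q! for probabilists' Hermite polynomials
        coeffs.append(np.sum(values * h_q * weights) / math.factorial(q))
    return np.array(coeffs)

# Made-up gate power model: 1 + 2*He_1 + 3*He_2.
toy_power = lambda z: 1.0 + 2.0 * z + 3.0 * (z ** 2 - 1.0)
print(hermite_coefficients(toy_power))
```

Because the toy model is itself a second-order Hermite expansion, the recovered coefficients should match the ones used to build it, which is a convenient sanity check for the quadrature weights and the normalisation.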

3.2 Acceleration by Building the Look-Up Table

Since we follow the segment-based power estimation method, we have to characterize the powers from SPICE simulation with different sets of parameters. In the look-up table, the power of a gate is a function of $L$ and $T_{ox}$ as well as the glitch width $W_g$ and the load capacitance $C_l$, that is, $P_{dyn} = f(L, T_{ox}, W_g, C_l)$. We then perform SPICE simulation on different sets of those four parameters to get accurate data and build the LUT.

On the other hand, we observe that the coefficients of Hermite polynomials fordynamic power of one gate in (6.2) and (6.3) are only functions of the type of thegate, �high and �low (defined in Sect. 2 of Chap. 5) and Wg and Cl . Therefore, anothersub LUT can be used to store the coefficients of Hermite polynomials for each kindof gate instead of computing the coefficients for each gate. The time complexityreduces from the number of gates, O.n/, to the number of grids, O.N /. Figure 6.4shows the flow of sub LUT construction.


3.3 Statistical Gate Power with Glitch Width Variation

To compute the statistical gate power expression considering the glitch width variations, we need to compute the probability of each switching segment, assuming that the glitch widths follow the normal distribution:

\[ \Pr(w = w_i) = \frac{1}{\sigma_w \sqrt{2\pi}} \exp\left( -\frac{(w_i - \mu_w)^2}{2\sigma_w^2} \right). \tag{6.4} \]

The Hermite polynomial coefficients for (6.2) under glitch width $w_i$ and load capacitance $C_l$ can be interpolated from the sub LUT. For a gate with index $k$ and the transition waveform $(p_1, t_1, \delta_{t_1})$, $(p_2, t_2, \delta_{t_2})$, ..., $(p_M, t_M, \delta_{t_M})$, there are $M(M-1)/2$ segments. The resulting statistical power is the probabilistic sum of the powers from each segment (their Hermite polynomial expressions):

\[ P_{dyn,C_l}(\zeta_{g,k}) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \Pr(i,j) \cdot P_{dyn,C_l}(\zeta_{g,k}, i, j), \tag{6.5} \]

in which $P_{dyn,C_l}(\zeta_{g,k}, i, j)$ is the dynamic power of gate $k$ caused by the switching segment between transitions $E_i$ and $E_j$, and $\Pr(i,j)$ is the probability that the switching segment $(E_i, E_j)$ occurs. The segment occurs only if there are transitions at both $E_i$ and $E_j$ and there are no transitions between $E_i$ and $E_j$:

\[ \Pr(i,j) = p_i \cdot p_j \cdot \prod_{k=i+1}^{j-1} (1 - p_k). \tag{6.6} \]

In the following, we write $P_{dyn,C_l}(\zeta_{g,k})$ as $P_{dyn}(\zeta_{g,k})$ when no confusion can arise.
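The segment bookkeeping in (6.5) and (6.6) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the per-segment power is a plain number rather than a vector of Hermite coefficients, indices are 0-based, and `segment_power` is a hypothetical callback standing in for the sub LUT interpolation by glitch width.

```python
from itertools import combinations

def segment_probability(p, i, j):
    """Pr(i, j) from (6.6): transitions at events i and j, none in between.

    p is a list of transition probabilities; i < j are 0-based indices.
    """
    prob = p[i] * p[j]
    for k in range(i + 1, j):
        prob *= 1.0 - p[k]
    return prob

def gate_power(p, segment_power):
    """Probability-weighted sum over all M(M-1)/2 segments, as in (6.5).

    segment_power(i, j) is a hypothetical callback returning the power of
    segment (E_i, E_j); in the text this comes from the sub LUT.
    """
    total = 0.0
    for i, j in combinations(range(len(p)), 2):
        total += segment_probability(p, i, j) * segment_power(i, j)
    return total

# Three transition events with made-up probabilities.
p = [0.5, 0.2, 0.4]
print(segment_probability(p, 0, 2))      # 0.5 * 0.4 * (1 - 0.2)
print(gate_power(p, lambda i, j: 1.0))   # sum of all three segment probabilities
```

With unit segment power the result is simply the sum of the segment probabilities, which makes it easy to check the enumeration by hand.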

3.4 Computation of Full-Chip Dynamic Power

The dynamic power for each gate is calculated using (6.5). To compute the full-chip dynamic power, we first need to transfer the local coefficients to the corresponding global positions. Then we can proceed to compute the dynamic power for the whole chip as follows:

\[ P_{dyn}^{total}(\zeta) = \sum_{j=1}^{n} P_{dyn}(\zeta_{g,j}). \tag{6.7} \]

The summation is done for each coefficient of the global Hermite polynomials to obtain the analytic expression of the final dynamic power in terms of $\zeta$. We can then easily obtain the mean value, variance, PDF, and CDF of the full-chip dynamic power. For instance, the mean value and variance for the full-chip dynamic power are

\[ \mu_{total} = P_{dyn,0th}, \tag{6.8} \]

\[ \sigma_{total}^2 = \sum P_{dyn,1st}^2 + 2 \sum P_{dyn,2nd,type1}^2 + \sum P_{dyn,2nd,type2}^2, \tag{6.9} \]

where $P_{dyn,ith}$ is the power coefficient for the $i$th Hermite polynomial of second order defined in (4.15).
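As a small worked illustration of (6.8) and (6.9), the sketch below evaluates the mean and variance from a set of second-order coefficients. It assumes, as the factor of 2 suggests, that "type1" denotes the squared terms of the form $\zeta_i^2 - 1$ (whose second moment is 2) and "type2" denotes the cross terms $\zeta_i \zeta_j$; this reading of (4.15), the function name, and the numbers are assumptions for illustration.

```python
def mean_and_variance(c0, first, second_sq, second_cross):
    """Mean and variance of a second-order Hermite expansion.

    c0:           constant (0th-order) coefficient
    first:        coefficients of zeta_i
    second_sq:    coefficients of (zeta_i^2 - 1), assumed to be "type1"
    second_cross: coefficients of zeta_i * zeta_j, i < j, assumed "type2"
    """
    mean = c0
    variance = (sum(c * c for c in first)
                + 2.0 * sum(c * c for c in second_sq)
                + sum(c * c for c in second_cross))
    return mean, variance

# Made-up coefficients for a chip with two global variables.
mu, var = mean_and_variance(1.0, [0.3, 0.4], [0.1], [0.2])
print(mu, var)   # variance = 0.09 + 0.16 + 2 * 0.01 + 0.04
```

The cost of this step does not depend on the number of gates once the coefficients have been summed, which is why the statistics are described as easy to obtain.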


The presented method and the segment-based analysis [30] have been implemented in Matlab V7.8. The initial results of this chapter were published in [60].

The presented new method was tested on circuits in the ISCAS'89 benchmark set. The circuits were synthesized with the Nangate Open Cell Library under 45 nm technology, and the placement is from UCLA/Umich Capo [145]. For comparison purposes, we performed MC simulations (10,000 runs) considering spatial correlation, the method in [30], and the presented method on the benchmark circuits. In our MC implementation, similar to [30], we do not run SPICE on the original circuits, as this is too time consuming on an ordinary computer. Instead, we compute the results via interpolation from the characterization data computed from SPICE runs. The $3\sigma$ range of $L$ and $T_{ox}$ is set to 20%, of which inter-die variations constitute 20% and intra-die variations 80%. $L$ and $T_{ox}$ are modeled as Gaussian random variables. $L$ is modeled as the sum of spatially correlated sources of variation based on (3.12). $T_{ox}$ is modeled as an independent source of spatial variation. The same framework can easily be extended to include other variational parameters.

The characterization data for each type of gate in SCL are collected using HSPICE simulation. For each type of gate, we perform repeated simulation on sampling points in the $3\sigma$ range of $L$, $T_{ox}$, and input glitch width $W_g$ for several different load capacitances to obtain the gate dynamic powers and gate delays. The table of characterization data is used to interpolate the value of the dynamic power for each type of gate with different process parameters. We use 21 sample points for the glitch width, from 50 ps to 150 ps.

In transition waveform computation, the gate delays are obtained through the table of characterization data, and the input signal probabilities are 0.5, with switching probabilities of 0.75. The test cases are given in Table 6.1 (all length units in μm). In the first column, s and w stand for strong and weak spatial correlations, respectively.

The comparison results of mean values and standard deviations of full-chip dynamic power are shown in Table 6.2, where MC Co represents Monte Carlo


Table 6.1 Summary of benchmark circuits

Test case    Gate #   Grid #   Area (μm × μm)
s1196 (s)    529      27       95 × 90
s1196 (w)    529      294      95 × 90
s5378 (s)    2779     93       209.5 × 198
s5378 (w)    2779     1300     209.5 × 198
s9234 (s)    5597     161      278.5 × 270
s9234 (w)    5597     2358     278.5 × 270

Table 6.2 Statistical dynamic power analysis accuracy comparison against Monte Carlo

                      Mean value (mW)             Errors (%)
Test case    Grid #   MC Co    [30]     New       [30]    New
s1196 (s)    27       1.14     1.19     1.14      3.82    0.49
s1196 (w)    294      1.14     1.19     1.14      3.98    0.41
s5378 (s)    93       6.09     6.24     5.98      2.46    1.85
s5378 (w)    1300     6.09     6.23     5.98      2.29    1.85
s9234 (s)    161      12.8     13.2     12.5      2.94    2.31
s9234 (w)    2358     12.8     13.1     12.5      2.78    2.14

                      Standard deviation (mW)     Errors (%)
Test case    Grid #   MC Co    [30]      New      [30]    New
s1196 (s)    27       0.0912   0.00394   0.0845   95.68   7.33
s1196 (w)    294      0.0671   0.00395   0.0645   94.11   3.94
s5378 (s)    93       0.470    0.00877   0.435    98.13   7.61
s5378 (w)    1300     0.436    0.00891   0.412    97.96   5.68
s9234 (s)    161      0.964    0.0185    0.882    98.08   8.52
s9234 (w)    2358     0.894    0.0191    0.839    97.87   6.14

considering spatial correlation, and New is the presented method. The method in [30] cannot consider spatial correlation, as it assumes that the powers of the gates are independent Gaussian random variables. In our implementation of [30], we assume the same variation for $L_{eff}$ and $T_{ox}$ but without spatial correlations. The average errors for the mean and standard deviation ($\sigma$) values of the New technique are 1.49% and 6.54% compared to MC Co, respectively, while for the method in [30] the average errors for the mean value and $\sigma$ are 3.04% and 96.97%, respectively. As a result, not considering spatial correlations can lead to significant errors. Furthermore, comparing the mean and standard deviation of MC Co, the average std/mean ratio is 7.21%, which means that spatial correlation in the process parameters has a significant impact on the distribution of the dynamic power. The results in Table 6.2 also show that our method can handle both strong and weak spatial correlations by adjusting the grid size.

Table 6.3 compares the CPU times of the three methods, which shows that the New method is much faster than the method in [30] and MC simulation. On average, the presented technique has about 377× speedup over [30] and 5,123× speedup over the MC method. In [30], the dynamic power of each gate needs to be interpolated from the LUT because of the different $L$, $T_{ox}$, and glitch width variations; the complexity


Table 6.3 CPU time comparison

             CPU time (s)                Speedup over
Test case    MC Co    [30]    New       MC Co   [30]
s1196 (s)    1261     88      0.30      4242    296
s1196 (w)    1225     92      0.33      3743    281
s5378 (s)    7037     522     1.19      5927    440
s5378 (w)    6859     517     1.41      4874    367
s9234 (s)    14805    1062    2.11      7026    504
s9234 (w)    13978    1058    2.84      4927    373

is a linear function of the number of gates, $O(n)$; in the New algorithm, however, only the coefficients of the Hermite polynomials for each type of gate need to be computed, and the overall complexity is a linear function of the number of grids, $O(N)$.

5 Summary

In this chapter, we have presented a new statistical dynamic power estimation method considering the spatial correlation in the presence of process variation. The presented method considers the variational impacts of channel length on gate dynamic powers. To consider the spatial correlation, it uses a spatial correlation model in which a new set of uncorrelated variables is defined over virtual grids to represent the original physical random variables by least-square fitting. To compute the statistical dynamic power of a gate on the new set of variables, the new method applies the flexible OPC-based method, which can be applied to any gate model. We adopted the segment-based statistical power method to consider the impacts of glitch width variations on dynamic powers. The full-chip dynamic power expressions are then computed by summing up the resulting orthogonal polynomials (their coefficients) on the new set of variables for all gates. Numerical results on the ISCAS'89 benchmark with 45 nm technology show that the presented method has on average about 377× speedup over the recently proposed segment-based statistical power estimation method [30] and about 5,123× speedup over the MC method.

Chapter 7 Statistical Total Power Estimation Techniques

1 Introduction

For digital CMOS circuits, the total power consumption is given by the following formula:

\[ P_{total} = P_{dyn} + P_{short} + P_{leakage}, \tag{7.1} \]

in which $P_{dyn}$, $P_{short}$, and $P_{leakage}$ represent dynamic power, short-circuit power, and leakage power, respectively. Most of the previous works on power estimation focus either on dynamic power estimation [10, 28–30, 64, 116] or on leakage power estimation [13, 95, 158, 200]. As technology scales down to the nanometer range, process-induced variability has a huge impact on circuit performance [120]. Furthermore, many variational parameters in practical nanometer chips are spatially correlated, which makes the computations even more difficult [195], and a simple assumption of independence for the random variables involved can lead to significant errors.

Early research on power analysis focused mainly on dynamic power analysis [10, 28, 29, 116]; the solutions range from the transition density-based method [116] and the tagged probabilistic method [29] to the practical MC-based method [10, 28, 29]. Later on, designers realized that leakage power is becoming more and more significant and is very sensitive to process variations. As a result, full-chip leakage power estimation considering process variations under spatial correlation has been intensively studied in the past [13, 95, 158, 200]; the methods can be grid based [13, 158], projection based [95], or simplified gate leakage model based [200].

Although total power can be computed by simply adding the dynamic power and leakage power (plus short-circuit power), practically, dynamic power and leakage power are correlated. For instance, the leakage power of a gate depends on its input state, which depends on the primary inputs and the timing of the circuit. Using the dominant state or average values is less accurate than precise circuit-level simulation under realistic testing input vectors. Under the process variations with


Fig. 7.1 The comparison of circuit total power distribution of circuit c432 in ISCAS'85 benchmark sets (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Both panels plot occurrences against power (W). Reprinted with permission from [62] © 2011 IEEE

spatial correlation, the dynamic power and leakage power are more correlated via process parameters. As a result, traditional separate approaches will not be accurate. Circuit-level total power estimation based on real testing vectors is more desirable.

Figure 7.1 shows the comparison of the circuit total power distribution of c432 from the ISCAS'85 benchmark. We show two power variations. The first (upper) plot is obtained with random input vectors. The second is obtained using a fixed input vector but under process variations with spatial correlation. As can be seen, the variance induced by process variations is comparable with the variance induced by random input vectors. As a result, considering the impact of process variation on the total chip power is important for early design solution exploration and post-layout design sign-off validation.

Several works have been proposed to estimate the dynamic power considering process variation. Harish et al. [64] used a hybrid power model based on MC analysis, but the method was applied only to a small two-stage two-input NAND gate. The work in [4] used a variational delay model to obtain minimum and maximum delay bounds in order to estimate the number of glitches and the dynamic power. The work in [30] introduced a new method based on the transition waveform concept, where the transition waveform is propagated through the circuit and the effect of partial swing can be considered. However, none of these works consider process-induced variations with spatial correlation, which can be significant (as shown in Fig. 7.1).

In this chapter, we present an efficient statistical chip-level total power estimation (STEP) method [62] considering process variations under spatial correlation, in which both the dynamic power and the leakage power are included. To the best knowledge of the authors, this is the first work toward statistical total power analysis. The presented method uses a commercial Fast-SPICE tool (UltraSim) to obtain the total chip power. To consider the process variations with spatial correlation, we first apply the PFA method to transform the correlated variables into uncorrelated ones and at the same time reduce the number of resulting random variables. Afterward, Hermite polynomials and sparse grid techniques are used to estimate the total power distribution in a sampling way. Numerical examples show that the proposed method is 78× faster than the MC method under a fixed input vector and 26× faster than the MC method considering both random input vectors and process variations with spatial correlation.

2 Review of the Monte Carlo-Based Power Estimation Method

In general, the dynamic power $P_{dyn}$ is expressed as in (6.1). Many previous works on dynamic power estimation are based on (6.1); they can be MC based [10, 28, 29] or probabilistic based [29, 116]. The MC-based method is considered more accurate than the probabilistic-based method without losing much efficiency [10]. In the MC-based method, the switching activity $S_i$ in (6.1) can be modeled as

\[ S_i = \frac{n_i(T)}{T}, \tag{7.2} \]

in which $n_i(T)$ is the number of transitions of node $i$ in the time interval $(-T/2, T/2]$. The mean power $P_T$ is defined as

\[ P_T = E\left[ P_{dyn} \right]. \tag{7.3} \]

The key part of MC simulation is the stopping criterion. Suppose we perform $N$ different simulations of the circuit, each of length $T$, and the average and standard deviation of the $N$ different $P_{dyn}$ values are $m_{dyn}$ and $s_{dyn}$, respectively. Then, by the central limit theorem, we have

\[ \lim_{N \to \infty} P\left\{ \frac{P_T - m_{dyn}}{s_{dyn}/\sqrt{N}} \le P_{dyn} \right\} = \Phi\left( P_{dyn} \right), \tag{7.4} \]


in which $P$ is the probability and $\Phi(P_{dyn})$ is the CDF of the standard normal distribution. Therefore, given the confidence level $(1 - \alpha)$, it follows that

\[ P\left\{ \Phi_{\alpha/2} < \frac{P_T - m_{dyn}}{s_{dyn}/\sqrt{N}} \le \Phi_{1-\alpha/2} \right\} = 1 - \alpha, \tag{7.5} \]

where $\Phi_{x}$ denotes the $x$-quantile of the standard normal distribution. As $\Phi_{\alpha/2} = -\Phi_{1-\alpha/2}$, given a specified error tolerance $\epsilon$, (7.5) can be recast as

\[ \frac{\left| P_T - m_{dyn} \right|}{m_{dyn}} \le \frac{\Phi_{1-\alpha/2}\, s_{dyn}}{m_{dyn} \sqrt{N}} \le \epsilon. \tag{7.6} \]

Equation (7.6) can be viewed as the stopping criterion: the simulation stops once $N$, $m_{dyn}$, and $s_{dyn}$ satisfy it.

Afterward, the works in [28, 29] further improved the efficiency of the MC-based method. In [29], the authors transformed the power estimation problem into a survey sampling problem and applied stratified random sampling to improve the efficiency of MC sampling. In [28], the authors proposed two new sampling techniques, module based and cluster based, which can adapt the stratification to further improve the efficiency of Monte Carlo-based techniques. However, all of these works are based on gate-level logic simulation, as they only consider dynamic power. For total power estimation and for estimating the impact of process variations, one needs transistor-level simulations. As a result, improving the efficiency of the MC method becomes crucial and will be addressed in this chapter.
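The stopping criterion (7.6) can be sketched as a simple loop. This is a minimal illustration, not the implementation of [10]: `simulate` is a hypothetical callback that returns the power of one simulated run, the `min_runs` and `max_runs` guards are assumptions added so that the sample standard deviation is meaningful and the loop always terminates, and the Gaussian toy model at the end is made up.

```python
import math
import random
from statistics import NormalDist

def mc_mean_power(simulate, alpha=0.05, eps=0.01, min_runs=30, max_runs=100000):
    """Run simulate() until the stopping criterion (7.6) holds.

    Returns (estimated mean power, number of runs). Assumes positive power.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # quantile Phi_{1 - alpha/2}
    n, total, total_sq = 0, 0.0, 0.0
    while n < max_runs:
        x = simulate()
        n += 1
        total += x
        total_sq += x * x
        if n < min_runs:
            continue
        m = total / n                                              # m_dyn
        s = math.sqrt(max(total_sq - n * m * m, 0.0) / (n - 1))    # s_dyn
        if z * s <= eps * m * math.sqrt(n):                        # criterion (7.6)
            break
    return total / n, n

# Toy run: power samples with mean 1.0 and standard deviation 0.1 (made up).
rng = random.Random(0)
estimate, runs = mc_mean_power(lambda: rng.gauss(1.0, 0.1))
print(estimate, runs)
```

For a relative spread $s_{dyn}/m_{dyn}$ of about 0.1, a 95% confidence level, and a 1% tolerance, the criterion is expected to be met after roughly $(1.96 \times 0.1 / 0.01)^2 \approx 384$ runs, which shows how quickly the run count grows as the tolerance is tightened.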

3 The Statistical Total Power Estimation Method

In this section, we present the new statistical method for full-chip total power estimation, called STEP. The method can consider both fixed input vectors and random input vectors for power estimation. The power distribution considering process variations under fixed input vectors is important because it can reveal the power distribution for the maximum power, the minimum power, or the power due to user-specified input vectors. This technique can be further applied to estimate the distribution of the maximum power dissipation [188]. The power distribution under random input vectors is also important, as it shows the total power distribution caused by random input vectors and process variations with spatial correlation. We first give the overall flow of the presented method under a fixed input vector in Fig. 7.2 and then highlight the major computing steps. The flow of the presented method considering random input vectors follows afterward. The spatial correlation model is the same as in Sect. 3 of Chap. 3.


Fig. 7.2 The flow of the presented algorithm under a fixed input vector

3.1 Flow of the Presented Analysis Method Under Fixed Input Vector

The STEP method uses a commercial Fast-SPICE tool for accurate total power simulation. It transforms the correlated variables into uncorrelated ones and reduces the number of random variables using the PFA method [57]. Then it computes the statistical total power based on Hermite polynomials and sparse grid techniques [45].

3.2 Computing Total Power by Orthogonal Polynomials

Instead of using the MC method, a better approach is to use the spectral stochastic method, which requires far fewer samples than standard MC for a small number of variables, as discussed in Sect. 3.3 of Chap. 2.

In our problem, $x(\zeta)$ is the total power for the full chip, and $k$ is the number of variables remaining after reduction by the PFA method. The full-chip total power can be represented by the HPC expansion as

\[ P_{tot}(\zeta) = \sum_{q=0}^{Q} P_{tot,q} H_q(\zeta). \tag{7.7} \]


$P_{tot,q}$ is then computed by the numerical Smolyak quadrature method. In this chapter, we use second-order Hermite polynomials for statistical total power analysis, and the number of Smolyak quadrature samples for $k$ random variables is $2k^2 + 3k + 1$. The coefficient for the $q$th Hermite polynomial, $P_{tot,q}$, can be computed as follows:

\[ P_{tot,q} = \sum_{l} P_{tot}(\zeta_l) H_q(\zeta_l) w_l / \langle H_q^2(\zeta) \rangle, \tag{7.8} \]

where $\zeta_l$ is a Smolyak quadrature sample and $w_l$ is its weight. As stated in Sect. 2.2 of Chap. 2, a given quadrature sample can be converted to a sample in terms of the original gate effective channel length variables via $\delta = L\zeta_l$. Thus, $P_{tot}(\zeta_l)$ can be obtained by running a circuit simulation tool such as Fast-SPICE using the specified $L_{eff}$ obtained from $\delta$ for each gate.
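The sample count $2k^2 + 3k + 1$ determines how many Fast-SPICE runs are needed, so a short calculation makes the cost comparison with MC concrete. The sketch below only evaluates that formula and compares it with the 10,000 MC runs used in the experiments; the function name is hypothetical.

```python
def smolyak_sample_count(k):
    """Number of second-order Smolyak quadrature samples for k variables."""
    return 2 * k * k + 3 * k + 1

MC_RUNS = 10000  # number of MC runs used for comparison in the experiments

for k in (6, 8, 9):
    n = smolyak_sample_count(k)
    print(k, n, round(MC_RUNS / n))
# prints:
# 6 91 110
# 8 153 65
# 9 190 53
```

These values agree with the sample counts and speedups listed in Table 7.3, and they show that the advantage over MC shrinks as the number of reduced variables $k$ grows.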

After the coefficients of the analytic expression of the total power (7.7) are obtained, we can easily get the mean value, variance, PDF, and CDF of the full-chip total power. For instance, the mean value and variance for the full-chip total power are

\[ \mu_{tot} = P_{tot,0th}, \tag{7.9} \]

\[ \sigma_{tot}^2 = \sum P_{tot,1st}^2 + 2 \sum P_{tot,2nd,type1}^2 + \sum P_{tot,2nd,type2}^2, \tag{7.10} \]

where $P_{tot,ith}$ is the power coefficient for the $i$th Hermite polynomial of second order defined in (4.15).

3.3 Flow of the Presented Analysis Method Under Random Input Vectors

To consider more input vectors, or the random input vectors used in traditional dynamic power analysis, one simple way is to treat the input vector as one more variational parameter in our statistical analysis framework. This strategy fits easily into the simple MC-based method [10], as we just add one dimension to the variable space. For the spectral stochastic method, however, it is difficult to add this variable to the existing space.

In probability theory, the PDF of a function of several random variables can be calculated from the conditional PDF for a single random variable. Let $P_{total} = g(U_{in}, L_{eff})$, in which $U_{in}$ is the variable of random input vectors and $L_{eff}$ is the variable of the gates' effective channel lengths. The PDF of the total power $P_{total}$ can be calculated by

\[ f_{P_{total}}(p) = \int_{-\infty}^{\infty} f_{L_{eff}}(l \mid u)\, f_{U_{in}}(u)\, du, \tag{7.11} \]


Fig. 7.3 The selected power points a, b, and c from the power distribution under random input vectors. The plot shows the total power distribution under random input vectors together with the total power distributions under the selected power points, against power. Reprinted with permission from [62] © 2011 IEEE

in which the PDF under random input vectors, $f_{U_{in}}(u)$, is obtained by the MC-based method [10], and the conditional PDF $f_{L_{eff}}(l \mid U_{in} = u)$ under a fixed input $u$ can be obtained or interpolated from samples calculated by the fixed-input algorithm in Fig. 7.2. Note that $u$ can be viewed as the power of the chip under input $u$.

We use the example in Fig. 7.3 to illustrate the presented method. In this figure, we first compute the power distribution (solid line) with random input vectors only. Then we select three input power points, $a$, $b$, and $c$ (with three corresponding input vectors). At each of these input power points, we perform statistical power analysis with process variations under the fixed power input (using the corresponding input vector). After this, we interpolate the power distributions for the other power points for the final integration.

The flow of the presented analysis method under random input vectors is shown in Fig. 7.4. The STEP algorithm computes the total power under random input vectors using the MC-based method [10].
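The integration in (7.11) can be sketched numerically as follows. This is one possible reading under stated assumptions, not the chapter's implementation: the conditional distribution at each power point is taken to be Gaussian, its mean and standard deviation are linearly interpolated between the selected power points, and the integral over $u$ is replaced by an average over MC samples of the input-vector-only power. The function name and all numbers are made up for illustration.

```python
import numpy as np

def total_power_pdf(p_grid, u_samples, anchors_u, anchors_mean, anchors_std):
    """Approximate (7.11) by averaging conditional PDFs over samples of u.

    u_samples:    powers obtained with random input vectors only (MC-based)
    anchors_u:    increasing power points where the fixed-input analysis was run
    anchors_mean, anchors_std: mean and std of the power at those points
    Assumption: each conditional PDF is Gaussian with interpolated moments.
    """
    u = np.asarray(u_samples, dtype=float)
    mu = np.interp(u, anchors_u, anchors_mean)   # conditional mean at each u
    sd = np.interp(u, anchors_u, anchors_std)    # conditional std at each u
    p = np.asarray(p_grid, dtype=float)[:, None]
    cond = np.exp(-0.5 * ((p - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return cond.mean(axis=1)                     # average over u ~ f_Uin

# Made-up example in microwatts: three power points a, b, c.
rng = np.random.default_rng(0)
u_samples = rng.normal(600.0, 50.0, size=2000)
p_grid = np.linspace(300.0, 900.0, 1201)
pdf = total_power_pdf(p_grid, u_samples,
                      anchors_u=[500.0, 600.0, 700.0],
                      anchors_mean=[500.0, 600.0, 700.0],
                      anchors_std=[15.0, 18.0, 21.0])
print(np.sum(pdf) * (p_grid[1] - p_grid[0]))     # should be close to 1
```

Only three fixed-input analyses are needed regardless of how many input-vector samples are drawn, which is where the saving over a joint MC simulation would come from in this reading.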


The presented method has been implemented in Matlab V7.8, and Cadence Ultrasim 7.0 was used for Fast-SPICE simulations. All the experimental results have been obtained on a Linux system with quad Intel Xeon CPUs at 3 GHz and 16 GB memory. The initial results of this chapter were published in [62].

The STEP method was tested on circuits in the ISCAS'85 benchmark set. The circuits were synthesized with the Nangate Open Cell Library under 45 nm technology, and the placement is obtained from UCLA/Umich Capo [145]. The test cases are given in Table 7.1 (all length units in μm).

The effective channel length $L_{eff}$ is modeled as the sum of spatially correlated sources of variation based on (3.12). The nominal value of $L_{eff}$ is 50 nm and the $3\sigma$ range is set to 20%. The same framework can easily be extended to include other variational parameters.

Fig. 7.4 The flow of the presented algorithm with random input vectors and process variations

Table 7.1 Summary of benchmark circuits

Circuit   Gate #   Input #   Output #   Area (μm × μm)
c432      242      36        7          55 × 48
c880      383      60        16         85 × 84
c1355     562      41        32         84 × 78
c1908     972      33        25         102 × 102
c3540     1705     50        22         141 × 144

Firstly, we use the MC-based method [10] to obtain the mean and standard deviation (std) of each sample circuit under random input vectors. The input signal and transition probabilities are 0.5, with a clock cycle of 180 ps. The simulation time for each sample circuit is 10 clock cycles, and the error tolerance $\epsilon$ is 0.01.

Secondly, we observe the total power distribution for each sample circuit under a fixed input vector. For each sample circuit, one input vector is selected, and then we run the MC simulations (10,000 runs) under process variations with spatial correlation as well as our presented STEP method. The results are shown in Table 7.2, in which MC Co and STEP denote the MC method considering process variations with spatial correlation and the presented method, respectively. The average errors for the mean and standard deviation of the STEP method are 2.90% and 6.00%, respectively. Figure 7.5 shows the total power distribution (PDF and CDF) of circuit c880 under a fixed input. Table 7.3 gives the values of the correlation length $\delta$, the reduced number of variables $k$, and the number of Fast-SPICE runs (sample count) for the two methods. Sampling time dominates the total simulation time for both MC


Table 7.2 Total power distribution under fixed input vector

          Mean (μW)          Err     Std (μW)          Err
Circuit   MC Co     STEP     (%)     MC Co    STEP     (%)
c432      267.6     261.7    2.23    10.22    9.54     6.78
c880      606.9     610.5    0.59    19.88    18.09    9.02
c1355     785.6     799.4    1.76    40.51    43.25    6.77
c1908     1404.9    1294.4   7.86    76.15    79.73    4.71
c3540     2824.6    2766.8   2.05    268.5    261.2    2.73

Fig. 7.5 The comparison of total power distribution PDF and CDF between STEP method and MC method for circuit c880 under a fixed input vector. Both panels plot probability against power (W) for the New and Monte Carlo methods. Reprinted with permission from [62] © 2011 IEEE

Table 7.3 Sampling number comparison under fixed input vector

                      Sample count        Speedup
Circuit   δ     k     MC Co     STEP      over MC Co
c432      50    6     10,000    91        110
c880      50    9     10,000    190       53
c1355     50    9     10,000    190       53
c1908     100   6     10,000    91        110
c3540     100   8     10,000    153       65

Co and the STEP methods, and the STEP method has a 78× speedup over the MC Co method on average. More speedup can be gained for large cases.

Thirdly, we compare the STEP method with the MC method under both random input vectors and process variations with spatial correlation. We select three power


Table 7.4 Total power distribution comparison under random input vector and spatial correlation

           Mean (μW)                       Errors (%)
Circuits   MC Co     MC nCo    STEP        MC nCo   STEP
c432       299.9     299.9     312.7       0.01     4.26
c880       609.8     604.5     604.4       0.88     0.89
c1355      802.6     777.1     778.3       3.18     3.04
c1908      1375.1    1361.6    1361.3      0.98     0.99
c3540      2775.8    2821.7    2822.2      1.65     1.67

           Standard deviation (μW)         Errors (%)
Circuits   MC Co     MC nCo    STEP        MC nCo   STEP
c432       45.3      40.4      44.6        10.9     1.52
c880       57.1      51.5      56.5        9.76     0.95
c1355      56.3      30.2      60.5        46.4     7.45
c1908      115.5     79.4      128.5       31.3     11.3
c3540      309.3     180.4     280.8       41.7     9.21

points from the total power distribution obtained by the MC-based method [10] and get the corresponding input vectors. We perform the STEP method under these three input vectors and obtain the corresponding means and standard deviations. The (mean, std) samples for other power points with distinct power values can be interpolated from the three samples.

Equation (7.11) is used to calculate the PDF of the total power distribution under both random input vectors and process variations with spatial correlation. The results are shown in Table 7.4; MC Co, MC nCo, and STEP represent the MC method considering process variations with spatial correlation, the MC method without considering process variations with spatial correlation, and the presented method, respectively. The average errors of the mean and the standard deviation of our method compared with MC Co are 2.17% and 6.09%, respectively, while the average errors of the mean and the standard deviation of MC nCo compared with MC Co are 1.34% and 28.01%, respectively. The std error of MC nCo increases for larger test cases.

Clearly, the MC method considering only random input vectors fails to capture the true distribution when both input vectors and process variations are considered. The parameter values of $\delta$ and $k$ are the same as in Table 7.3. The difference is that we need to run STEP three times, and the total sample numbers increase correspondingly. However, the STEP method still has a 26× speedup over the MC method on average and remains accurate. Figure 7.6 shows the power distribution comparison (PDF and CDF) of the STEP method and the MC method under both random input vectors and process variations with spatial correlation for circuit c880. We observe that the total power under a fixed input vector or under random input vectors has a distribution similar to normal, as shown in Figs. 7.5 and 7.6. This justifies the use of Hermite PC to represent the total power distributions.

Fig. 7.6 The comparison of total power distribution PDF and CDF between STEP method and Monte Carlo method for circuit c880 under random input vector. Both panels plot probability against power (W) for the New and Monte Carlo methods. Reprinted with permission from [62] © 2011 IEEE

5 Summary

In this chapter, we have presented an efficient statistical total chip power estimation method considering process variations with spatial correlation. The new method is based on accurate circuit-level simulation under realistic testing input vectors to obtain accurate total chip powers. To improve the estimation efficiency, an efficient sampling-based approach has been applied using the OPC-based representation and random variable transformation and reduction techniques. Numerical examples show that the presented method is 78× faster than the MC method under a fixed input vector and 26× faster than the MC method considering both random input vectors and process variations with spatial correlation.

Part III Variational On-Chip Power Delivery Network Analysis

Chapter 8 Statistical Power Grid Analysis Considering Log-Normal Leakage Current Variations

1 Introduction

As discussed in Part II, process-induced variability has a huge impact on chip leakage currents, owing to the exponential relationship between the subthreshold leakage current $I_{sub}$ and the threshold voltage $V_{th}$, as shown below [172]:

$$I_{sub} = I_{s0}\, e^{\frac{V_{gs} - V_{th}}{n V_T}} \left(1 - e^{-\frac{V_{ds}}{V_T}}\right), \tag{8.1}$$

where $I_{s0}$ is a constant related to the device characteristics, $V_T$ is the thermal voltage, and $n$ is a constant. It was shown in [78] that leakage variations for 90 nm can be 20×. Based on the ITRS [71], the leakage power accounts for more than 60% at 45 nm, which has many consequences for chip design, especially for the design of the power grid. The grid will develop voltage drops at all nodes that are correspondingly significant, with strong within-die components. The voltage drop is unavoidable and manifests itself as background noise on the grid, which has an impact on circuit delay and operation.
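To make the exponential sensitivity in (8.1) concrete, the following minimal sketch evaluates the formula for two threshold voltages. The numerical defaults (`i_s0`, `n`, `v_t`) and the bias values are illustrative assumptions, not values taken from this chapter.

```python
import math

def subthreshold_leakage(v_gs, v_th, v_ds, i_s0=1e-6, n=1.5, v_t=0.0259):
    """Evaluate (8.1). The default parameter values are hypothetical."""
    gate_term = math.exp((v_gs - v_th) / (n * v_t))
    drain_term = 1.0 - math.exp(-v_ds / v_t)
    return i_s0 * gate_term * drain_term

# A 50 mV drop in threshold voltage at an otherwise identical bias point.
nominal = subthreshold_leakage(v_gs=0.0, v_th=0.30, v_ds=1.0)
shifted = subthreshold_leakage(v_gs=0.0, v_th=0.25, v_ds=1.0)

# The ratio depends only on the Vth shift: exp(0.05 / (n * V_T)).
ratio = shifted / nominal
print(f"leakage ratio for a 50 mV Vth shift: {ratio:.2f}")
```

Under these assumed values, a modest threshold shift multiplies the leakage several times over, which is why Gaussian threshold variations translate into strongly skewed current distributions.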

Clearly, the leakage current has an exponential dependency on the threshold voltage $V_{th}$. In the sequel, the leakage current mainly refers to the subthreshold leakage current. Detailed analysis shows that $I_{sub}$ is also an exponential function of the effective channel length $L_{eff}$ [142]. In fact, $L_{eff}$ is strongly correlated with $V_{off}$, as $V_{off}$ variations are typically caused by $L_{eff}$. So if we model $V_{th}$ or $L_{eff}$ as a random variable with Gaussian variation caused by the inter-die or intra-die process variations, then the leakage currents will have a log-normal distribution, as shown in [142]. On top of this, those random variables are spatially correlated within a die, owing to the nature of many physical and chemical manufacturing processes [120].
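The link between a Gaussian exponent and a log-normal current can be checked numerically. The sketch below samples a Gaussian variable, exponentiates it, and compares the sample mean and variance with the standard closed-form log-normal moments. The values of `mu` and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean and standard deviation of the Gaussian exponent g.
mu, sigma = -1.0, 0.3

# If g is Gaussian, exp(g) is log-normal.
g = rng.normal(mu, sigma, size=200_000)
leak = np.exp(g)

# Closed-form moments of a log-normal variable.
mean_exact = np.exp(mu + 0.5 * sigma**2)
var_exact = (np.exp(sigma**2) - 1.0) * np.exp(2.0 * mu + sigma**2)

mean_mc = leak.mean()
var_mc = leak.var()
print(mean_mc, mean_exact)
print(var_mc, var_exact)
```

Note that the mean of the log-normal current is larger than `exp(mu)`, so ignoring the variation underestimates the average leakage.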

On-chip power grid analysis and design have been intensively studied in the past due to the increasing impact of excessive voltage drops as technologies scale [84, 191, 206]. Owing to the increasing impact of leakage currents and their variations on circuit performance, especially on the on-chip power delivery networks, a number of research works have been proposed recently to perform stochastic analysis of power grid networks under process-induced leakage current variations. The voltage drop of power grid networks subject to leakage current variations was first studied in [39, 40]. This method assumes that the log-normal distribution of the node voltage drop is caused by the log-normal leakage current inputs and is based on a localized MC (sampling) method to compute the variance of the node voltage drop. However, this localized sampling method is limited to the static DC solution of power grids modeled as resistor-only networks. Therefore, it can only compute the responses to the standby leakage currents. Dynamic leakage currents, however, are becoming more significant, especially now that sleep transistors are intensively used for reducing leakage power. In [131, 169], impulse responses are used to compute the means and variances of node voltage responses caused by general current variations. But this method needs to know the impulse response from all the current sources to all the nodes, which is expensive to compute for a large network. In [142], the PDF of leakage currents is computed based on the Gaussian variations of channel lengths.

2 Previous Works

A number of research works have been proposed recently to address the voltage drop variation issues in on-chip power delivery networks under process variations. The voltage drop of power grid networks subject to leakage current variations was first studied in [39, 40]. This method assumes that the log-normal distribution of the node voltage drop is caused by log-normal leakage current inputs and is based on a localized MC (sampling) method to compute the variance of the node voltage drop. However, this localized sampling method is limited to the static DC solution of power grids modeled as resistor-only networks. Therefore, it can only compute the responses to the standby leakage currents. Dynamic leakage currents, however, are becoming more significant, especially now that sleep transistors are intensively used for reducing leakage power. In [131, 169], impulse responses are used to compute the means and variances of node voltage responses due to general current variations. But this method needs to know the impulse responses from all the current sources to all the nodes, which is expensive to compute for a large network. This method also cannot consider the variations of the wires in the power grid networks.

Recently, a number of analysis approaches based on the so-called spectral stochastic analysis method have been proposed for analyzing interconnect and power grid networks [46, 47, 108, 190]. This method is based on the OPC expansion of random processes and the Galerkin theory to represent and solve for the stochastic responses of statistical linear dynamic systems. The spectral stochastic method only needs to solve for some coefficients of the orthogonal polynomials by using normal transient simulation of the original circuits. The work in [190] applied the spectral stochastic method to compute the variational delay of interconnects. In [46, 47], the spectral stochastic method has been applied to compute the voltage drop variations caused by Gaussian-only variations in the power grid wires and input currents (approximating them as Gaussian variations by using first-order Taylor expansion). Intra-die variations can be considered in [46]. Recently, the authors extended the spectral stochastic method by specifically considering the log-normal leakage variations to solve for the variational voltage drops in on-chip power grid networks [107, 108]. Spatial correlations were also considered in [109].

In this chapter, we apply the spectral statistical method to deal with leakage current inputs with log-normal distributions and spatial correlations [108]. We show how to represent a log-normal distribution in terms of Hermite polynomials, assuming a Gaussian distribution of the threshold voltage $V_{th}$ in consideration of intra-die variation. To consider the spatial correlation, we apply orthogonal decomposition via PCA to map the correlated random variables into independent variables. To the best knowledge of the authors, the presented method is the first method able to perform statistical analysis on power grids with variational dynamic leakage currents having log-normal distributions and spatial correlations. Experimental results show that the presented method predicts the variances of the resulting log-normal-like node voltage drops more accurately than the Taylor expansion-based Gaussian approximation method.

Notice that we only consider the leakage current inputs with log-normal distributions in this chapter. For general current variations from the dynamic power of the circuits, which typically can be modeled as Gaussian distributions, existing work [47] using Taylor series expansion has been explored. The voltage variations caused by the dynamic power can be considered on top of the variations from the log-normal leakage currents. We notice that similar work, which considers only leakage variations, has been done before [39, 40].

We also remark that the $V_{dd}$ drop will have an impact on the leakage currents, which creates a negative feedback for the leakage current itself, as an increasing $V_{dd}$ drop leads to lower $V_{gs}$ in (8.1), which leads to smaller $I_{sub}$. However, to consider this effect, both the power grid and the signal circuits need to be simulated together, which would be very expensive. Hence, in practice, a two-step simulation approach is used, where the power grid and signal circuits are simulated separately but in an iterative way to consider the coupling between them. In light of this simulation methodology, the presented method can be viewed as only one step (the power grid simulation step) in such a method.

3 Nominal Power Grid Network Model

The power grid networks in this chapter are modeled as RC networks with known time-variant current sources, which can be obtained by gate-level logic simulations of the circuits. Figure 8.1 shows the power grid models used in this chapter. For a power grid (vs. the ground grid), some nodes having known voltage are modeled as constant voltage sources. For C4 power grids, the known voltage nodes can be internal nodes inside the power grid. Given the current source vector, $u(t)$, the node voltages can be obtained by solving the following differential equations, which are formulated using the modified nodal analysis (MNA) approach:

$$G v(t) + C \frac{dv(t)}{dt} = B u(t), \tag{8.2}$$

where $G \in R^{n \times n}$ is the conductance matrix, $C \in R^{n \times n}$ is the matrix resulting from storage elements, $v(t)$ is the vector of time-variant node voltages and branch currents of voltage sources, $u(t)$ is the vector of independent sources, and $B$ is the input selector matrix.

Fig. 8.1 The power grid model used
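As an illustration of how (8.2) can be integrated in time, the sketch below applies a backward Euler step to a tiny RC network. The chapter does not state which integration scheme the authors use, so backward Euler, the zero initial condition, and the two-node example values are all assumptions made for this sketch.

```python
import numpy as np

def transient_backward_euler(G, C, B, u_of_t, t_end, h):
    """Integrate G v + C dv/dt = B u(t) with backward Euler, starting from v = 0."""
    n = G.shape[0]
    steps = int(round(t_end / h))
    lhs = G + C / h                      # constant left-hand side for a fixed step
    v = np.zeros(n)
    history = [v.copy()]
    for k in range(1, steps + 1):
        rhs = B @ u_of_t(k * h) + (C / h) @ v
        v = np.linalg.solve(lhs, rhs)
        history.append(v.copy())
    return np.array(history)

# Hypothetical two-node RC ladder: node 1 ties to the reference and to node 2.
G = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
C = np.diag([1e-3, 1e-3])
B = np.array([[0.0],
              [1.0]])

def u(t):
    return np.array([0.1])              # constant current drawn at node 2

v_hist = transient_backward_euler(G, C, B, u, t_end=0.05, h=1e-4)
v_dc = np.linalg.solve(G, B @ u(0.0))   # steady state that the transient should approach
```

For a large grid one would factor the constant left-hand side once and reuse the factorization at every step instead of calling a dense solver repeatedly.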

We remark that the proposed method can be directly applied to power grids modeled as RLC/RLCK circuits. But inductive effects are still most visible at the board and package levels, and the recent power grid networks from IBM consist only of resistance [123].


4 Problem Formulation

In this section, we present the modeling of leakage current under intra-die variations for power grid networks. Note that in this case, the leakage current is a random process, rather than a random variable as in the full-chip leakage analysis in the preceding part of this book. After this, we present the problem that we try to solve.

The $G$ and $C$ matrices and input currents $I(t)$ depend on the circuit parameters, such as metal wire width, length, and thickness on power grids, and transistor parameters, such as channel length, width, gate oxide thickness, etc. Some previous work treats all circuit parameters and current sources as uncorrelated Gaussian random variables [47]. In this chapter, we consider both power grid wire variations and the log-normal leakage current variations caused by the channel length variations, which are modeled as Gaussian (normal) variations [142].

Process variations can also be classified into inter-die (die-to-die) variations and intra-die variations. In inter-die variations, all the parameter variations are correlated. The worst-case corner can be easily found by setting the parameters to their range limits (mean plus $3\sigma$). The difficulty lies in the intra-die variations, where the circuit parameters are either uncorrelated or spatially correlated within a die. Intra-die variations also consist of local and layout-dependent deterministic components and random components, which typically are modeled as a multivariate Gaussian process with some spatial correlations [12]. In this chapter, we first assume we have a number of independent (uncorrelated) transformed orthonormal random Gaussian variables $\xi_i(\theta)$, $i = 1, \ldots, n$, which model the channel length, the device threshold voltage variations, and other variations. Then, we consider spatial correlation in the intra-die variation. We apply the PCA method in Sect. 2.2 of Chap. 2 to transform the correlated variables into uncorrelated variables before the spectral statistical analysis.

Let $\Omega$ denote the sample space of the experimental or manufacturing outcomes. For $\omega \in \Omega$, let $\xi_d(\omega) = [\xi_d^1(\omega), \ldots, \xi_d^r(\omega)]$ be a vector of $r$ Gaussian variables representing the circuit parameters of interest. After the PCA operation, we obtain the independent random variable vector $\xi = [\xi_1, \ldots, \xi_n]$. Notice that $n \le r$ in general. Therefore, given the process variations, the MNA for (8.2) becomes

$$G(\xi) v(t) + C(\xi) \frac{dv(t)}{dt} = I(t, \xi(\theta)). \tag{8.3}$$

The variation in wire width and thickness will cause variation in the conductance matrix $G(\xi)$ and capacitance matrix $C(\xi)$. The variations are more related to the back end of the line (BEOL), as power grids are mainly metals at the top or middle layers. The input current vector, $I(t, \xi(\theta))$, has both deterministic and random components. In this chapter, to simplify our analysis, we assume the dynamic currents (power) caused by circuit switching are still modeled as deterministic currents, as we only consider the leakage variations. In practice, the variations caused by the dynamic power of circuits can be significant. But the voltage variations caused by the leakage variations can be viewed as background noise, which can be considered together with dynamic power-induced variations later.


To obtain the variational current sources $I(t, \xi(\theta))$, some library characterization methods will be used to compute $I(t, \xi(\theta))$ once we know the effective channel length $L_{eff}$ variations, threshold voltage ($V_{th}$) variations, and other variation sources under different input patterns. With such a variation-aware cell library, we can more accurately obtain $I(t, \xi(\theta))$ based on the logic simulation of the whole chip under some inputs.

Note that from a practical perspective, a user may only be interested in voltage variations over a period of time or the worst case in a period of time. This information can be easily obtained once we know the variations at any given time instance. In other words, the information we obtain here can be used to derive any other information that is of interest to designers.

The problem we need to solve is to efficiently find the mean and variance of the voltage $v(t)$ at any node and at any time instance. A straightforward method is the MC-based sampling method in Sect. 3.1 of Chap. 2. We randomly generate $G(\xi)$, $C(\xi)$, and $I(t, \xi(\theta))$, which are based on the log-normal distribution; solve (8.3) in the time domain for each sample; and compute the means and variances based on sufficient samples. Obviously, MC will be computationally expensive. However, MC gives the most reliable results and is the most robust and flexible method.
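The MC baseline described above can be sketched as follows. To stay short, this sketch is restricted to a static (DC) resistive grid with a deterministic conductance matrix and one independent log-normal current source per node; the chapter's MC flow also randomizes G and C and solves (8.3) in the time domain. The function name and the example values are hypothetical.

```python
import numpy as np

def mc_node_voltage_stats(G, mu, sigma, n_samples=2000, seed=0):
    """Monte Carlo mean/variance of DC node voltages for log-normal current sources."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    xi = rng.standard_normal((n_samples, n))      # independent normalized Gaussians
    currents = np.exp(mu + sigma * xi)            # one log-normal source per node
    volts = np.linalg.solve(G, currents.T).T      # one DC solve per sample (batched)
    return volts.mean(axis=0), volts.var(axis=0)

# Hypothetical two-node resistive grid.
G_demo = np.array([[2.0, -1.0],
                   [-1.0, 1.0]])
mu_demo, sigma_demo = -2.0, 0.3
mean_v, var_v = mc_node_voltage_stats(G_demo, mu_demo, sigma_demo)

# Because the network is linear, the exact mean voltage follows from the mean current.
mean_current = np.exp(mu_demo + 0.5 * sigma_demo**2)
mean_v_exact = np.linalg.solve(G_demo, np.full(2, mean_current))
```

The cost is one linear solve (or one full transient simulation in the dynamic case) per sample, which is what the spectral method in the next section avoids.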

Specifically, we expand the variational $G$ and $C$ around their mean values and keep the first-order terms, as in [22, 102, 134]:

$$G(\xi) = G_0 + G_1 \xi_1 + G_2 \xi_2 + \cdots + G_M \xi_M,$$
$$C(\xi) = C_0 + C_1 \xi_1 + C_2 \xi_2 + \cdots + C_M \xi_M. \tag{8.4}$$

We remark that the presented method can be trivially extended to the second- and higher-order terms [134]. The input current variation $i(t, \xi)$ follows the log-normal distribution, as leakage variations are the dominant factors:

$$i(\xi) = e^{g(\xi)}, \qquad g(\xi) = \mu + \sigma \xi. \tag{8.5}$$

Note that the input current variation $i(\xi)$ is not a function of time, as we only model the static leakage variations for simplicity of presentation. However, the presented approach can be easily applied to time-variant variations with any distribution.

5 Statistical Power Grid Analysis Based on Hermite PC

5.1 Galerkin-Based Spectral Stochastic Method

To simplify the presentation, we first assume that $C$ and $G$ are deterministic in (8.3). We will remove this assumption later. When $v(t, \xi)$ is an unknown random process as described in Sect. 3.2 of Chap. 2 (with unknown distribution), like the node voltages in (8.3), the coefficients can be computed by using the Galerkin-based method. In this way, we transform the stochastic analysis process into a deterministic process, where we only need to compute the coefficients of its Hermite PC. Once we obtain those coefficients, the mean and variance of the random variables can be easily computed, as shown later in this section.

For illustration purposes, considering one Gaussian variable $\xi = [\xi_1]$, we can assume that the node voltage response can be written as a second-order ($p = 2$) Hermite PC:

$$v(t, \xi) = v_0(t) + v_1(t) \xi_1 + v_2(t) \left(\xi_1^2 - 1\right). \tag{8.6}$$

Assume that the input leakage current sources can also be represented by a second-order Hermite PC,

$$I(t, \xi) = I_0(t) + I_1(t) \xi_1 + I_2(t) \left(\xi_1^2 - 1\right). \tag{8.7}$$

By applying the Galerkin equation (2.44) and noting the orthogonal property of the various orders of Hermite PCs, we end up with the following equations:

$$G v_i(t) + C \frac{dv_i(t)}{dt} = I_i(t), \tag{8.8}$$

where $i = 0, 1, 2, \ldots, P$.

For two independent Gaussian variables, we have

$$v(t, \xi) = v_0(t) + v_1(t) \xi_1 + v_2(t) \xi_2 + v_3(t) \left(\xi_1^2 - 1\right) + v_4(t) \left(\xi_2^2 - 1\right) + v_5(t) \xi_1 \xi_2. \tag{8.9}$$

Assume that we have a similar second-order Hermite PC for the input leakage current $I(t, \xi)$,

$$I(t, \xi) = I_0(t) + I_1(t) \xi_1 + I_2(t) \xi_2 + I_3(t) \left(\xi_1^2 - 1\right) + I_4(t) \left(\xi_2^2 - 1\right) + I_5(t) \xi_1 \xi_2. \tag{8.10}$$

Equation (8.8) is then valid with $i = 0, \ldots, 5$. For more than two Gaussian variables, we can obtain similar results, with more Hermite PC coefficients to be solved by using (8.8).

Once we obtain the Hermite PC of $v(t, \xi)$, we can obtain the mean and variance of $v(t, \xi)$ by (2.39).
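The decoupled solves in (8.8) and the final moment computation can be sketched for the one-variable, second-order case. The sketch uses the DC form of (8.8) for brevity, and the variance formula relies on the standard facts that the basis terms 1, ξ, ξ² − 1 are orthogonal with squared norms 1, 1, and 2; equation (2.39) is not reproduced in this section, so its exact form is assumed to match. The function names and example values are hypothetical.

```python
import numpy as np

def solve_pc_coefficients(G, I_coeffs):
    """DC form of (8.8): one deterministic solve per Hermite PC coefficient."""
    return [np.linalg.solve(G, I_i) for I_i in I_coeffs]

def pc_mean_var_1d(v_coeffs):
    """Mean and variance of v0 + v1*xi + v2*(xi^2 - 1) for a standard normal xi."""
    v0, v1, v2 = v_coeffs
    mean = v0
    var = v1**2 + 2.0 * v2**2    # <xi^2> = 1 and <(xi^2 - 1)^2> = 2
    return mean, var

# Hypothetical two-node grid with a single variational source at node 2.
G_pc = np.array([[2.0, -1.0],
                 [-1.0, 1.0]])
I_pc = [np.array([0.0, 1.0]),      # I0
        np.array([0.0, 0.1]),      # I1
        np.array([0.0, 0.005])]    # I2
v_pc = solve_pc_coefficients(G_pc, I_pc)
v_mean, v_var = pc_mean_var_1d(v_pc)
```

Each coefficient is obtained with the same deterministic matrix, so a single factorization of G can serve all of the solves.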

One critical problem remaining so far is how to obtain the Hermite PC (8.7) for a leakage current with log-normal distribution. Our method is based on Sect. 4 of Chap. 2, and we will show how it can be applied to solve our problems for one or more independent Gaussian variables.

Once we have the Hermite PC representation of the leakage current sources $I(t, \xi)$, the node voltages $v(t, \xi)$ can be computed by using (8.8).



5.2 Spatial Correlation in Statistical Power Grid Analysis

Spatial correlations exist in the intra-die variations in different forms and have been modeled for timing analysis [12, 121]. The general way to consider spatial correlation is to map the correlated random variables into a set of independent variables. This can be done by using orthogonal mapping techniques, such as PCA in Sect. 2.2 of Chap. 2. In this chapter, we also apply the PCA method in our spectral statistical analysis framework for power grid statistical analysis.

To consider intra-die variation in $V_{th}$, the chip is divided into $n$ regions. Let $\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_n]$ be a random variable vector representing the variation of $V_{th}$ on different parts of the circuit. In other words, in the $i$th region, the leakage current $I_{sub_i} = c\, e^{V_{th}(\Phi_i)}$ follows the log-normal distribution. Here, $\Phi_i$ is a random variable with Gaussian distribution, $\mu_\Phi = [\mu_{\Phi_1}, \mu_{\Phi_2}, \ldots, \mu_{\Phi_n}]$ is the mean vector of $\Phi$, and $C$ is the covariance matrix of $\Phi$.

With PCA, we can get the corresponding uncorrelated random variables $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ from the equation

$$\xi = A(\Phi - \mu_\Phi). \tag{8.11}$$

Also, the original random variables can be expressed as

$$\Phi_i = \sum_{j=1}^{n} a_{ij} \xi_j + \mu_{\Phi_i}, \quad i = 1, 2, \ldots, n, \tag{8.12}$$

where $a_{ij}$ is the $i$th row, $j$th column element in the orthogonal mapping matrix defined in (2.21). $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ is a vector of orthogonal Gaussian random variables. The mean of $\xi_j$ is 0 and its variance is $\lambda_j$, $j = 1, 2, \ldots, n$. The distribution of $\xi_i$ can be written as

$$\xi_i = \mu_{\xi_i} + \sigma_{\xi_i} \hat{\xi}_i, \quad i = 1, 2, \ldots, n. \tag{8.13}$$

$\hat{\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_n]$ is a vector of orthogonal normalized Gaussian random variables. $\Phi_i$ can then be expressed in terms of the normalized random variables $\hat{\xi}$:

$$\Phi_i = \sum_{j=1}^{n} a_{ij} \sqrt{\lambda_j}\, \hat{\xi}_j + \mu_{\Phi_i}, \quad i = 1, 2, \ldots, n. \tag{8.14}$$

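With (8.14), the leakage current can be expanded as a Hermite PC:

$$I(\Phi_i) \propto e^{\Phi_i} = e^{\sum_{j=1}^{n} g_j \hat{\xi}_j + \mu_{\Phi_i}} = \mu_i \left( 1 + \sum_{j=1}^{n} \hat{\xi}_j g_j + \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\hat{\xi}_j \hat{\xi}_k - \delta_{jk}}{\left\langle \left( \hat{\xi}_j \hat{\xi}_k - \delta_{jk} \right)^2 \right\rangle} g_j g_k + \cdots \right). \tag{8.15}$$

Here,

$$g_j = a_{ij} \sqrt{\lambda_j}, \quad j = 1, 2, \ldots, n. \tag{8.16}$$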

Therefore, the MNA equation with correlated random variables $\Phi$ in the current source can be expressed in terms of the uncorrelated random variables $\hat{\xi}$ as follows:

$$G v(t) + C \frac{dv(t)}{dt} = I_i(t, \hat{\xi}). \tag{8.17}$$

With the orthogonal property of $\hat{\xi}$, (8.17) can be solved simply by using (8.8), $i = 1, 2, \ldots, P$.
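The PCA mapping in (8.11) to (8.14) can be sketched with an eigendecomposition of the covariance matrix. The covariance values below are hypothetical. The sketch stores the eigenvectors as columns of `U`, so that the correlated variables are recovered as Φ = μ_Φ + U √Λ ξ̂; whether this matches the row/column convention of the mapping matrix in (2.21) is an assumption, since that equation is not reproduced here.

```python
import numpy as np

def pca_decorrelate(cov):
    """Return eigenvalues (descending) and matching eigenvectors of a covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical covariance of two spatially correlated region variables.
cov = np.array([[1.25, 1.00],
                [1.00, 1.25]])
mu_phi = np.array([0.0, 0.0])

lam, U = pca_decorrelate(cov)
L = U * np.sqrt(lam)                            # scale column j by sqrt(lambda_j), cf. (8.16)

# Generate correlated samples from independent normalized Gaussians, cf. (8.14).
rng = np.random.default_rng(1)
xi_hat = rng.standard_normal((100_000, 2))
phi = mu_phi + xi_hat @ L.T
sample_cov = np.cov(phi, rowvar=False)
```

Dropping the columns of `L` that belong to very small eigenvalues is the usual way to reduce the number of independent variables.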

5.3 Variations in Wires and Leakage Currents

In this section, we consider variations in the width ($W$) and thickness ($T$) of the wires of power grids, as well as in the threshold voltage ($V_{th}$) of active devices, which are reflected in the leakage currents. Without loss of generality, these variations are assumed to be independent of each other. As mentioned in [47], the MNA equation for the ground circuit becomes

$$G(\xi_g) v(t) + C(\xi_c) \frac{dv(t)}{dt} = I(\xi_I, t). \tag{8.18}$$

The variation in width $W$ and thickness $T$ will cause variation in the conductance matrix $G$ and capacitance matrix $C$, while variation in threshold voltage will cause variation in the leakage currents $I$. Thus, the conductance and capacitance of wires can be expressed as in [47]:

$$G(\xi_g) = G_0 + G_1 \xi_g,$$
$$C(\xi_c) = C_0 + C_1 \xi_c. \tag{8.19}$$

$G_0$ and $C_0$ represent the deterministic components of the conductance and capacitance of the wires. $G_1$ and $C_1$ represent the sensitivity matrices of the conductance and capacitance. $\xi_g$ and $\xi_c$ are normalized random variables with Gaussian distribution, representing process variation in wire conductance and capacitance, respectively. As mentioned in the previous section, the variation in leakage current can be represented by a second-order Hermite PC as in (2.55):

$$I(t, \xi_I) = I_0(t) + I_1(t) \xi_I + I_2(t) \left(\xi_I^2 - 1\right). \tag{8.20}$$


Here, $\xi_I$ is a normalized Gaussian random variable representing variation in threshold voltage. $I(t, \xi_I)$ follows a log-normal distribution:

$$I = e^{g(\xi_I)}, \qquad g(\xi_I) = \mu_I + \sigma_I \xi_I. \tag{8.21}$$

As in the previous part, the desired Hermite PC coefficients, $I_{0,1,2}$, can be expressed as $I_0$, $I_0 \sigma_I$, and $\frac{1}{2} I_0 \sigma_I^2$, respectively. $I_0$ is the mean of the leakage current source, which is expressed as

$$I_0 = \exp\left(\mu_I + \frac{1}{2} \sigma_I^2\right). \tag{8.22}$$
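The closed-form coefficients above can be cross-checked by projecting the log-normal function onto the Hermite basis with Gauss-Hermite quadrature. The sketch below is only a numerical sanity check under assumed values of μ_I and σ_I; the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def lognormal_hermite_pc(mu, sigma):
    """Closed-form second-order Hermite PC coefficients of exp(mu + sigma*xi)."""
    i0 = np.exp(mu + 0.5 * sigma**2)            # (8.22)
    return i0, i0 * sigma, 0.5 * i0 * sigma**2

def project_numerically(mu, sigma, n_quad=40):
    """Galerkin projection <f, H_k> / <H_k^2> using probabilists' Gauss-Hermite nodes."""
    x, w = hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)                # normalize weights to the standard normal pdf
    f = np.exp(mu + sigma * x)
    c0 = np.sum(w * f)                          # <H_0^2> = 1
    c1 = np.sum(w * f * x)                      # <H_1^2> = 1
    c2 = np.sum(w * f * (x**2 - 1.0)) / 2.0     # <H_2^2> = 2
    return c0, c1, c2

closed_form = lognormal_hermite_pc(-1.0, 0.5)
numerical = project_numerically(-1.0, 0.5)
```

The two tuples should agree to quadrature accuracy, which confirms the factors σ_I and σ_I²/2 in the first- and second-order coefficients.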

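Considering the influence of $\xi_g$, $\xi_c$, and $\xi_I$, the node voltage is therefore expanded by Hermite PC in the second-order form as

$$v(t, \xi) = v_0(t) + v_1(t) \xi_g + v_2(t) \xi_c + v_3(t) \xi_I + v_4(t) \left(\xi_g^2 - 1\right) + v_5(t) \left(\xi_c^2 - 1\right) + v_6(t) \left(\xi_I^2 - 1\right) + v_7(t) \xi_g \xi_c + v_8(t) \xi_g \xi_I + v_9(t) \xi_c \xi_I. \tag{8.23}$$

Now the task is to compute the coefficients of the Hermite PC of the node voltage $v(t, \xi)$. Applying the Galerkin equation (2.44), we only need to solve the following equations:

$$\langle \Delta(t, \xi), 1 \rangle = 0, \quad \langle \Delta(t, \xi), \xi_g \rangle = 0,$$
$$\langle \Delta(t, \xi), \xi_c \rangle = 0, \quad \langle \Delta(t, \xi), \xi_I \rangle = 0,$$
$$\langle \Delta(t, \xi), \xi_g^2 - 1 \rangle = 0, \quad \langle \Delta(t, \xi), \xi_c^2 - 1 \rangle = 0,$$
$$\langle \Delta(t, \xi), \xi_I^2 - 1 \rangle = 0, \quad \langle \Delta(t, \xi), \xi_g \xi_c \rangle = 0,$$
$$\langle \Delta(t, \xi), \xi_g \xi_I \rangle = 0, \quad \langle \Delta(t, \xi), \xi_c \xi_I \rangle = 0. \tag{8.24}$$

With the distributions of $\xi_g$, $\xi_c$, and $\xi_I$, we can get the coefficients $v(t) = [v_0(t), v_1(t), \ldots, v_9(t)]^T$ of the node voltage from

$$\tilde{G} v(t) + \tilde{C} \frac{dv(t)}{dt} = \tilde{I}(t), \tag{8.25}$$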


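where

$$\tilde{G} = \begin{bmatrix}
G_0 & G_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
G_1 & G_0 & 0 & 0 & 2G_1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0 & 0 \\
0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & G_1 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 \\
0 & 0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & G_0
\end{bmatrix}$$

$$\tilde{C} = \begin{bmatrix}
C_0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1 & 0 & 0 \\
C_1 & 0 & C_0 & 0 & 0 & 2C_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & C_1 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 \\
0 & 0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0
\end{bmatrix}$$

$$\tilde{I}(t) = [I_0(t), 0, 0, I_1(t), 0, 0, I_2(t), 0, 0, 0]^T. \tag{8.26}$$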

Knowing the Hermite PC coefficients of the node voltage $v(t, \xi)$, it is easy to get the mean and variance of $v(t, \xi)$, which describe the random characteristics of the node voltage in the given circuit.
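To show where the off-diagonal blocks of an augmented matrix come from, the sketch below builds the Galerkin system for a reduced case with only one Gaussian variable in the conductance, $G(\xi) = G_0 + G_1 \xi$, and the basis $1$, $\xi$, $\xi^2 - 1$. This is a simplified illustration rather than the ten-term system of (8.25); the coupling pattern it produces matches the $G_0$, $G_1$, $2G_1$ entries seen in the rows associated with $\xi_g$. The scalar example values are hypothetical.

```python
import numpy as np

# T[j, k] = <xi * H_k * H_j> / <H_j^2> for the basis H = (1, xi, xi^2 - 1).
T = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

def augmented_matrix(M0, M1):
    """Galerkin-projected matrix for M(xi) = M0 + M1*xi, truncated at second order."""
    return np.kron(np.eye(3), M0) + np.kron(T, M1)

# Hypothetical one-node DC example: (G0 + G1*xi) v = 1 with 10% conductance variation.
G0 = np.array([[1.0]])
G1 = np.array([[0.1]])
A = augmented_matrix(G0, G1)
rhs = np.array([1.0, 0.0, 0.0])          # deterministic unit current: only I0 is nonzero
v = np.linalg.solve(A, rhs)              # stacked coefficients [v0, v1, v2]

mean_v = v[0]
var_v = v[1]**2 + 2.0 * v[2]**2
```

The augmented system is three times the original size here and ten times in (8.25), which is the scalability cost the following remark refers to.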

We remark that the presented method will lead to large circuit matrices, which will add more computation costs. To mitigate this scalability problem for really large power grid circuits, we can apply partitioning strategies to compute the variational responses for each subcircuit, which will be small enough for efficient computation, as done in the existing work [17, 206].


This section describes the simulation results of circuits with log-normal leakage current distributions for a number of power grid networks. All the presented methods have been implemented in Matlab, using sparse matrix techniques. All the experiments have been carried out on a Linux system with dual Intel Xeon CPUs at 3.06 GHz and 1 GB of memory. The initial results of this chapter were published in [108, 109].


The power grid circuits we test are RC mesh circuits based on the values from some industry circuits, which are driven only by leakage currents, as we are only interested in the variations from the leakage currents. The resistor values are in the range of $10^{-2}\ \Omega$, and the capacitor values are in the range of $10^{-12}$ F.

6.1 Comparison with Taylor Expansion Method

We first compare the presented method with the simple Taylor expansion method for one and more Gaussian variables.

For simplicity, we assume one Gaussian random variable $g(\xi)$, which is expressed as

$$g = \mu_g + \sigma_g \xi, \tag{8.27}$$

where $\xi$ is a normalized Gaussian random variable with $\langle \xi \rangle = 0$ and $\langle \xi^2 \rangle = 1$. The log-normal random variable $l(\xi)$, obtained from $g(\xi)$, is written as

$$l(\xi) = e^{g(\xi)} = \exp(\mu_g + \sigma_g \xi). \tag{8.28}$$

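Expanding the exponential into a Taylor series and keeping all the terms up to second order, we have

$$\begin{aligned}
l(\xi) &= 1 + \sum_{i=0}^{1} \xi_i g_i + \frac{1}{2} \sum_{i=0}^{1} \sum_{j=0}^{1} \xi_i \xi_j g_i g_j + \cdots \\
&= 1 + \mu_g + \frac{1}{2} \mu_g^2 + \frac{1}{2} \sigma_g^2 + (\sigma_g + \mu_g \sigma_g) \xi + \frac{1}{2} \sigma_g^2 \left(\xi^2 - 1\right) + \cdots.
\end{aligned} \tag{8.29}$$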

We observe that the second-order Taylor expansion, as shown in (8.29), is similar to the second-order Hermite PC in (2.57). Hence, the Galerkin-based method can still be applied; we then use (8.8) to obtain the Hermite PC coefficients of the node voltage $v(t, \xi)$ accordingly. We want to emphasize, however, that the polynomials generated by Taylor expansion in general are not orthogonal with respect to Gaussian distributions and cannot be used with the Galerkin-based method, unless we only keep the first order of the Taylor expansion results (with less accuracy). In this case, the resulting node voltage distribution is still Gaussian, which obviously is not correct.

We note that the first-order Taylor expansion has been used in statistical timing analysis [12]. The delay variations, owing to interconnects and devices, can be approximated with this limitation. The skew distributions may be computed easily with a Gaussian process.


Table 8.1 Accuracy comparison between Hermite PC (HPC) and Taylor expansion

δg           0.01    0.1     0.3     0.5     0.7
HPC (%)      3.19    1.88    2.07    5.5     2.92
Taylor (%)   3.19    1.37    2.41    16.6    24.02

To compare these two methods, we use the MC method to measure the accuracy of the two methods in terms of standard deviation. For MC, we use 2,000 samples, which represents 97.7% accuracy. The results are summarized in Table 8.1. In this table, δg is the standard deviation of the Gaussian threshold voltage variable in the log-normal current source, HPC is the error of the standard deviation from the Hermite PC method as a relative percentage against the MC method, and Taylor is the error of the standard deviation from the Taylor expansion method as a relative percentage against the MC method.

We can observe that when the variation of the current source increases, the Taylor expansion method results in significant errors compared to the MC method (up to 24.02% at δg = 0.7), while the errors of the presented method stay below 6% in all cases. This clearly shows the advantage of the presented method.
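The trend in Table 8.1 can be illustrated analytically for a single log-normal source, without any circuit. The sketch below compares the standard deviation implied by the second-order Hermite PC coefficients and by the second-order Taylor coefficients in (8.29) with the exact log-normal standard deviation. It assumes $\mu_g = 0$ and is not a reproduction of the table, which reports node voltage errors measured against MC on power grid circuits.

```python
import numpy as np

def std_exact(mu, s):
    """Exact standard deviation of exp(mu + s*xi) for a standard normal xi."""
    return np.sqrt((np.exp(s**2) - 1.0) * np.exp(2.0 * mu + s**2))

def std_hpc2(mu, s):
    """Std from second-order Hermite PC coefficients I0*s and I0*s^2/2."""
    i0 = np.exp(mu + 0.5 * s**2)
    return np.sqrt((i0 * s)**2 + 2.0 * (0.5 * i0 * s**2)**2)

def std_taylor2(mu, s):
    """Std from the xi and (xi^2 - 1) coefficients of the Taylor form (8.29)."""
    c1 = s + mu * s
    c2 = 0.5 * s**2
    return np.sqrt(c1**2 + 2.0 * c2**2)

mu_g = 0.0                                   # assumed mean of the Gaussian exponent
sigmas = np.array([0.01, 0.1, 0.3, 0.5, 0.7])
exact = std_exact(mu_g, sigmas)
err_hpc = np.abs(std_hpc2(mu_g, sigmas) - exact) / exact * 100.0
err_taylor = np.abs(std_taylor2(mu_g, sigmas) - exact) / exact * 100.0

for s, eh, et in zip(sigmas, err_hpc, err_taylor):
    print(f"sigma={s:4.2f}  HPC error={eh:6.3f}%  Taylor error={et:6.3f}%")
```

The Taylor coefficients miss the factor $e^{\mu_g + \sigma_g^2/2}$ that scales every Hermite PC coefficient, so their error grows quickly with $\sigma_g$.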

6.2 Examples Without Spatial Correlation

Figure 8.2 shows the node voltage distribution at one node of a ground network with 1,720 nodes. The MC results are obtained with 2,000 samples. The standard deviation of the log-normal current sources with one Gaussian variable is 0.1. The mean and $3\sigma$ computed by the Hermite PC method are also marked in the figure and fit the MC results very well. Figure 8.3 shows the node voltages and their variations caused by the leakage currents from 0 ns to 126 ns. The circuit selected contains 64 nodes with one Gaussian variable of 0.06 in the current source. The blue solid lines are the mean, upper bound, and lower bound. The cyan lines are the node voltages of 2,000 MC runs. Most of the MC results are between the upper bound and the lower bound.

Another observation is that when the standard deviation, $\sigma_g$, is small, the shape looks Gaussian, as in Fig. 8.2, but it is in fact log-normal. In the case of two random variables, one with a large and the other with a small standard deviation, the larger one dominates, which gives the log-normal shape seen in Fig. 8.4.

To consider multiple random variables, we divide the circuit into several partitions. We first divide the circuit into two parts. Figure 8.4 shows the voltage of one node at a particular time instance for a ground network with 336 nodes and two independent variables. The standard deviations of the two Gaussian variations are $\sigma_{g1} = 0.5$ and $\sigma_{g2} = 0.1$. The $3\sigma$ variations are also marked in the figure.

Tables 8.2 and 8.3 show the speedup of the Hermite PC method over the MC method with 2,000 samples considering one and two random variables, respectively.


[Figure 8.2: histogram titled "Distribution of voltage at given node (one variable, σ = 0.1)"; horizontal axis: Voltage (volts); vertical axis: Number of occurrences; markers at μ − 3δ, μ, and μ + 3δ]

Fig. 8.2 Distribution of the voltage in a given node with one Gaussian variable, $\sigma_g = 0.1$, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure 8.3: plot titled "Comparison between Hermite PC and Monte Carlo"; horizontal axis: time (ns); vertical axis: voltage (V), scaled by 10^-3]

Fig. 8.3 Distribution of the voltage caused by the leakage currents in a given node with one Gaussian variable, $\sigma_g = 0.5$, in the time instant from 0 ns to 126 ns. Reprinted with permission from [109] © 2008 IEEE


[Figure 8.4: histogram titled "Distribution of voltage at given node (two variables, σ = 0.1 and 0.5)"; horizontal axis: Voltage (volts); vertical axis: Number of occurrences; markers at μ − 3δ, μ, and μ + 3δ]

Fig. 8.4 Distribution of the voltage in a given node with two Gaussian variables, $\sigma_{g1} = 0.1$ and $\sigma_{g2} = 0.5$, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

Table 8.2 CPU time comparison with the Monte Carlo method of one random variable

Ckt         #node    p   n   MC (s)       #MC    HPC (s)   Speedup
gridrc 6    280      2   1   766.06       2000   1.0156    754.3
gridrc 12   3240     2   1   4389         2000   8.3281    527.0
gridrc 5    49600    2   1   2.3 × 10^5   2000   298.02    771.76

Table 8.3 CPU time comparison with the Monte Carlo method of two random variables

Ckt         #node    p   n   MC (s)        #MC    HPC (s)   Speedup
gridrc 3    280      2   2   1.05 × 10^3   2000   2.063     507.6
gridrc 5    49600    2   2   2.49 × 10^5   2000   445.6     558.7
gridrc 9    105996   2   2   6.11 × 10^5   2000   1141.8    535.1

In the two tables, #node is the number of nodes in the power grid circuits, p is the order of the Hermite PCs, and n is the number of independent Gaussian random variables. #MC is the number of samples used for the MC method. HPC and MC represent the CPU times used for the Hermite PC method and the MC method, respectively. It can be seen that the presented method is about two orders of magnitude faster than the MC method.

When more Gaussian variables are used for modeling intra-die variations, we need to compute more Hermite PC coefficients. Hence, the speedup will be smaller if the MC method uses the same number of samples, as shown in gridrc 12. Also, one observation is that the speedup depends on the sampling size in the MC method. The speedup of the presented method over the MC method depends on many factors, such as the order of polynomials, the number of variables, etc. In general, the speedup should not have a clear relationship with the circuit sizes. We still use 2,000 samples for MC, which represents about 97.7% accuracy (as the error in MC is roughly $1/\sqrt{2000}$ for 2,000 samples).

[Figure 8.5: ground circuit divided into two parts, labeled Φ1 = ξ1 + 0.5ξ2 and Φ2 = ξ2 + 0.5ξ1]

Fig. 8.5 Correlated random variables setup in ground circuit divided into two parts. Reprinted with permission from [109] © 2008 IEEE

Table 8.4 Comparison between non-PCA and PCA against Monte Carlo methods

               Mean (% error)       Std dev (% error)
ckt   #nodes   Non-PCA   PCA        Non-PCA   PCA
1     336      10.3      0.52       18.8      1.13
2     645      8.27      0.59       11.4      1.16
3     1160     10.8      0.50       2.6       0.73

6.3 Examples with Spatial Correlation

To model the intra-die variations with spatial correlations, we divide the power grid circuit into several parts. We first consider the case where the circuit is partitioned into two parts. In this case, we have two independent random current variables, $\xi_1$ and $\xi_2$. The correlated variables for the two parts are $\Phi_1 = \xi_1 + 0.5\xi_2$ and $\Phi_2 = \xi_2 + 0.5\xi_1$, respectively, as shown in Fig. 8.5.

Table 8.4 shows the percentage error of the mean and standard deviation for HPC with PCA and for HPC without PCA, both measured against Monte Carlo. As shown in the table, it is necessary to use PCA when spatial correlation is considered. Figure 8.6 shows the node voltage distribution of one node in a ground network with 336 nodes, using both PCA and non-PCA methods.

To get more accuracy, we divide the circuit into four parts, and each part is correlated with its neighbors, as shown in Fig. 8.7. $\phi$ is the correlated random variable vector we use in the circuit, and $\zeta = [\zeta_1, \zeta_2, \zeta_3, \zeta_4]$ are independent Gaussian random variables with standard deviations $\sigma_1 = 0.1$, $\sigma_2 = 0.2$, $\sigma_3 = 0.1$, and $\sigma_4 = 0.5$. Figure 8.8 is the voltage distribution of a given node. The mean voltage and the worst-case voltages are given as the solid lines. Figure 8.9 is the voltage distribution of a circuit with 1,160 nodes. The circuit is partitioned into 25 parts, arranged in five rows and five columns, with spatial correlation. The dashed blue lines are the mean, upper bound, and lower bound by Hermite PC, while the solid red lines are the mean, upper bound, and lower bound by MC with 2,000 samples.


[Figure 8.6: histogram titled "Distribution of voltage considering spatial correlation (two variables)"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dotted line: Monte Carlo; solid line: HPC with PCA; dashed line: HPC without PCA]

Fig. 8.6 Distribution of the voltage in a given node with two Gaussian variables with spatial correlation, at time 70 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

φ1=ζ1+0.5ζ2+0.5ζ3 φ3=ζ3+0.5ζ1+0.5ζ4

φ2=ζ2+0.5ζ1+0.5ζ4 φ4=ζ4+0.5ζ2+0.5ζ3

Fig. 8.7 Correlated random variables setup in ground circuit divided into four parts. Reprinted with permission from [109] © 2008 IEEE

Note that the size of the ground networks we analyzed is mainly limited by the solving capacity of Matlab on a single Intel CPU Linux workstation. Given the long simulation time of large MC sampling runs, we limit the ground network size to about 3,000 nodes.

Also note that for more accurate modeling, we need to have more partitions of the circuits, and thus, more independent Gaussian variables are needed, as shown in [12].

6.4 Consideration of Variations in Both Wire and Currents

Figure 8.10 shows the node voltage distribution at one node of a ground circuit, circuit5, which contains 280 nodes, considering variation in conductance, capacitance, and leakage current. The maximum $3\delta$ variation is 10% in $\xi_g$, $\xi_c$, and $\xi_I$. In the figures, the solid lines are the mean voltage and worst-case voltages using the HPC method. The histogram bars are the Monte Carlo results of 2,000 samples. The dotted lines are the mean voltage and worst-case voltages of the 2,000 samples. From the figures, we can see that the results obtained from the two methods match very well.

[Figure 8.8: histogram titled "Distribution of voltage considering spatial correlation (four variables)"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3σ, μ, and μ+3σ]

Fig. 8.8 Distribution of the voltage in a given node with four Gaussian variables with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure 8.9: histogram titled "Distribution of voltage considering spatial correlation (5×5)"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dashed: HPC; line: Monte Carlo]

Fig. 8.9 Distribution of the voltage in a given node with the circuit partitioned into 5 × 5 parts with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure 8.10: histogram titled "Distribution of voltage considering variance in G, C, I"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dot: Monte Carlo; line: HPC]

Fig. 8.10 Distribution of the voltage in a given node in circuit5 with variation on G, C, I, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

Table 8.5 CPU time comparison with the MC method considering variation in G, C, I

Ckt          # of nodes   MC (s)    HPC (s)   Speedup
gridrc 6     280          1320.1    9.25      142.7
gridrc 12    3,240        12183     141.4     86.2
gridrc 62    9,964        63832     3261      19.6

Table 8.5 shows the CPU speedup of the HPC method over the MC method. The sample number of Monte Carlo is 3,500, and we can see that the presented method is about two orders of magnitude faster than the MC method when considering variations in conductance, capacitors, and voltage sources. The speedup becomes smaller for larger circuits. This is because of the super-linear time complexity of the linear solver, as the augmented matrices in (8.26) grow faster than the individual matrices $G_i$ and $C_i$. The presented method does not favor very large circuits. Practically, this scalability problem can be mitigated by using partitioning-based strategies [17].


7 Summary

In this chapter, we have presented a stochastic simulation method for fast estimation of the voltage variations caused by process-induced log-normal leakage current variations with spatial correlations. The presented analysis is based on the Hermite PC representation of random processes. We extended the existing Hermite PC-based power grid analysis method [47] by considering log-normal leakage distributions as well as spatial correlations. The new method considers both log-normal leakage distribution and wire variations at the same time. The numerical results show that the new method is more accurate than the Gaussian-only Hermite PC using the Taylor expansion method for analyzing leakage current variations and is two orders of magnitude faster than MC methods with small variation errors. In the presence of spatial correlations, a method that does not consider them may lead to large errors, roughly 8–10% in our tested cases. Numerical examples show the correctness and high accuracy of the presented method. It leads to about 1% or less error in both mean and standard deviation and is about two orders of magnitude faster than MC methods.

Chapter 9
Statistical Power Grid Analysis by Stochastic Extended Krylov Subspace Method

1 Introduction

In this chapter, we present a stochastic method for analyzing the voltage drop variations of on-chip power grid networks with log-normal leakage current variations. The method is called StoEKS, and it still applies the spectral stochastic method to solve for the variational responses. Different from the existing spectral-stochastic-based simulation method, however, the EKS method [177, 191] is employed to compute variational responses using the augmented matrices consisting of the coefficients of Hermite polynomials. Our work is inspired by a recent spectral-stochastic-based model order reduction method [214]. We apply this work to the variational analysis of on-chip power grid networks considering variational leakage currents with the log-normal distribution.

Our contribution lies in the acceleration of the spectral stochastic method by using, for the first time, the EKS method to quickly solve the variational circuit equations. By using the Krylov-subspace-based reduction technique, the new method partially mitigates the increased circuit-size problem associated with the augmented matrices from the Galerkin-based spectral stochastic method. We will show how the coefficients of Hermite PCs are computed for variational circuit matrices and for the current moments used in EKS with log-normal distribution. Numerical examples show that the presented StoEKS is about two orders of magnitude faster than the existing Hermite PC-based simulation method, with similar error compared with the MC method. StoEKS can analyze much larger circuits than the existing Hermite PC method on the same computation platform.

The variational power grid models and the problem we plan to solve here are the same as in Chap. 8. The rest of this chapter is organized as follows: Sect. 3 reviews the orthogonal PC-based stochastic simulation method and the improved EKS method. Section 4 presents our new statistical power grid simulation method. Section 5 presents the numerical examples, and Sect. 6 concludes this chapter.




In this chapter, we assume that the variational current source in (8.3), $u(t, \xi)$, consists of two components:

$$u(t, \xi) = u_d(t) + u_v(t, \xi), \tag{9.1}$$

where $u_d(t)$ is the dynamic current vector from circuit switching, which is still modeled as deterministic currents because we only consider the leakage variations. $u_v(t, \xi)$ is the variational leakage current vector, which is dominated by subthreshold leakage currents and may also change over time. $u_v(t, \xi)$ follows the log-normal distribution.

The problem we need to solve is to efficiently find the mean and variance of the voltage $v(t, \xi)$ at any node at any time instance without using a time-consuming sampling-based method, such as MC.

3 Review of Extended Krylov Subspace Method

In this subsection, we briefly review the EKS method in [191] and [89] for fast computation of responses from linear dynamic systems.

The EKS method uses the Krylov-like reduction method to speed up the simulation process. Different from the Krylov-based model order reduction method, EKS performs the reduction considering both system matrices and input signals before the simulation (so the subspace is no longer a Krylov subspace). So it essentially is a simulation approach using the Krylov subspace reduction method. It assumes that input signals can be represented by piecewise linear (PWL) sources.

Let $V = [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_k]$ be an orthogonal basis for the moment subspace $(m_0, m_1, \ldots, m_k)$ of input $u(t)$. Following is the high-level description of the EKS algorithm (Fig. 9.1) [191].

Then the original circuit described by (8.2) can be reduced to a smaller system:

$$\hat{G} z + \hat{C} \frac{dz(t)}{dt} = \hat{B} u, \tag{9.2}$$

where

$$\hat{G} = V^T G V, \quad \hat{C} = V^T C V, \quad \hat{B} = V^T B, \quad v(t) = V z(t).$$

After the reduced system in (9.2) has been solved for the given input $u(t)$, the solution $z(t)$ can then be mapped back into the original space by $v(t) = V z(t)$.
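The projection in (9.2) and the mapping back to the original space can be sketched as follows. This is a minimal illustration, assuming dense NumPy matrices, a fixed time step, zero initial state, and backward Euler as the integration rule (the text does not prescribe one); the function name `simulate_reduced` and the callable `u_of_t` are hypothetical.

```python
import numpy as np

def simulate_reduced(G, C, B, V, u_of_t, h, steps):
    """Project G v + C dv/dt = B u onto V, integrate the reduced system with
    backward Euler, and map every reduced state back with v = V z."""
    G_hat = V.T @ G @ V
    C_hat = V.T @ C @ V
    B_hat = V.T @ B
    lhs = G_hat + C_hat / h              # constant for a fixed time step
    z = np.zeros(V.shape[1])             # assumed zero initial condition
    history = []
    for n in range(1, steps + 1):
        rhs = C_hat @ z / h + B_hat @ u_of_t(n * h)
        z = np.linalg.solve(lhs, rhs)
        history.append(V @ z)            # back to the original node voltages
    return np.array(history)

# Usage: a single RC node driven by a unit step settles at v = 1.
one = np.array([[1.0]])
v = simulate_reduced(one, one, one, one, lambda t: np.array([1.0]), 0.1, 200)
```

Only the small reduced matrices are factored and solved at each step, which is where the savings over simulating the full system come from.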

As EKS models a PWL source as a sum of delayed ramps in the Laplace domain, the terms contain $1/s$ and $1/s^2$ moments [191], while the traditional Krylov space starts from the 0th moment. Therefore, moment shifting must be performed in EKS, which causes complex computation and more errors. This problem is resolved in [89] by the IEKS algorithm, which shows that the moments of $1/s$ and $1/s^2$ are zero for PWL input sources.

Input: $G$, $C$, $B$, $u(t)$, and moment order $q$
Output: orthogonal basis $V = \{\hat{v}_0, \hat{v}_1, \ldots, \hat{v}_{q-1}\}$
$\hat{v}_0 = \alpha_0 v_0$, where $v_0 = G^{-1} B u_0$, $\alpha_0 = 1/\mathrm{norm}(v_0)$;
set $h_s = 0$;
for $i = 1 : q-1$
    $v_i = G^{-1}\{\prod_{j=0}^{i-1} \alpha_j B u_i - C(\hat{v}_{i-1} + \alpha_{i-1} h_s)\}$;
    $h_s = 0$;
    for $j = 0 : i-1$
        $h = \hat{v}_j^T v_i$;
        $h_s = h_s + h \hat{v}_j$;
    end
    $\bar{v}_i = v_i - h_s$;
    $\alpha_i = 1/\mathrm{norm}(\bar{v}_i)$;
    $\hat{v}_i = \alpha_i \bar{v}_i$
end

Fig. 9.1 The EKS algorithm
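The moment subspace that EKS spans can be illustrated in Python. Note that this sketch is a simplified variant and not the recursion of Fig. 9.1: it computes the raw response moments first and orthonormalizes them afterwards with a QR factorization, which is easier to read but less robust numerically for high orders. The function name `moment_basis` and the small test circuit are assumptions made for illustration.

```python
import numpy as np

def moment_basis(G, C, B, input_moments):
    """Orthonormal basis for the response moments of (G + sC) v(s) = B u(s).

    input_moments[i] is the i-th moment vector u_i of the input, so that
    u(s) = u_0 + u_1 s + u_2 s^2 + ...
    """
    moments = []
    prev = np.zeros(G.shape[0])
    for u_i in input_moments:
        # Matching powers of s gives G m_i = B u_i - C m_{i-1}, with m_{-1} = 0.
        prev = np.linalg.solve(G, B @ u_i - C @ prev)
        moments.append(prev)
    M = np.column_stack(moments)
    V, _ = np.linalg.qr(M)               # reduced QR: orthonormal columns
    return V, M

# Usage on a hypothetical three-node resistive chain with unit capacitors.
G = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
C = np.eye(3)
B = np.array([[1.0], [0.0], [0.0]])
V, M = moment_basis(G, C, B, [np.array([1.0]), np.array([0.5])])
```

Because the input moments enter the recursion, the resulting subspace depends on the excitation as well as on the circuit, which is the property that distinguishes EKS from a plain Krylov projection.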

Assume that we want to obtain a single input source $u_j(s)$ in the following moment form:

$$u_j(s) = u_1 + u_2 s + u_3 s^2 + \cdots + u_L s^{L-1}.$$

A PWL source $u_j(t)$ is represented by a series of value-time pairs such as $(a_1, \tau_1)$, $(a_2, \tau_2)$, $\ldots$, $(a_{K+2}, \tau_{K+2})$, and $L$ moments need to be calculated. As proposed in [89], the $m$th moment for current source $u_j(t)$ in a current source vector $u(s)$ can be calculated as

$$u_{j,m} = \left(a_1 - \frac{\alpha_1 \tau_1}{m+1}\right)\beta_1^{(m)} - \sum_{i=1}^{K} (\alpha_i - \alpha_{i+1})\,\beta_{i+1}^{(m+1)} - \left(a_{K+2} - \frac{\alpha_{K+1} \tau_{K+2}}{m+1}\right)\beta_{K+2}^{(m)}, \quad m = 1, \ldots, L. \tag{9.3}$$

Here,

$$\beta_i^{(m)} = \frac{(-\tau_i)^m}{m!}, \qquad \alpha_i = \frac{a_{i+1} - a_i}{\tau_{i+1} - \tau_i}.$$
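The moment formula (9.3) can be exercised on a small waveform. The sketch below reads (9.3) as the sum of a first-breakpoint term, a sum over interior slope changes, and a last-breakpoint term, and it assumes that the PWL source is zero outside its first and last breakpoints; the function name `pwl_moments` is hypothetical. Under these assumptions, the first moment $u_{j,1}$ equals the area under the waveform.

```python
from math import factorial

def pwl_moments(values, times, num_moments):
    """Moments u_{j,1..L} of a PWL source given as value-time pairs, per (9.3)."""
    n = len(values)                       # n = K + 2 breakpoints
    # alpha[i] is the slope of the segment between breakpoints i and i + 1.
    alpha = [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
             for i in range(n - 1)]

    def beta(i, m):
        return (-times[i]) ** m / factorial(m)

    moments = []
    for m in range(1, num_moments + 1):
        first = (values[0] - alpha[0] * times[0] / (m + 1)) * beta(0, m)
        middle = sum((alpha[i] - alpha[i + 1]) * beta(i + 1, m + 1)
                     for i in range(n - 2))
        last = (values[-1] - alpha[-1] * times[-1] / (m + 1)) * beta(n - 1, m)
        moments.append(first - middle - last)
    return moments

# Usage: a triangular pulse that rises to 1 at t = 1 and returns to 0 at t = 2.
triangle = pwl_moments([0.0, 1.0, 0.0], [0.0, 1.0, 2.0], 2)
```

For the triangular pulse, the two moments are the pulse area and the negated first time moment, which gives a quick sanity check on the signs in the formula.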

The EKS/IEKS method, however, has its limitations. One major drawback is that current sources have to be represented in the explicit moment form, which may be inaccurate and numerically unstable when high-order moments are employed for high-frequency-rich current waveforms, owing to the well-known problem in the explicit moment matching method [136].

Recently, a more stable and accurate algorithm, called ETBR, has been proposed [93], which is based on the more accurate fast truncated balanced reduction method. It uses a frequency spectrum to represent the current sources and is thus more flexible and accurate. Since our contribution in this chapter is not about improving the EKS method, we just use EKS as a baseline algorithm for StoEKS.

4 The Stochastic Extended Krylov Subspace Method: StoEKS

In this section, we present the new stochastic simulation algorithm, StoEKS, which is based on both the spectral stochastic method and the EKS method [191]. The main idea is that we use the spectral stochastic method to convert the statistical simulation into a deterministic simulation problem. Then we apply EKS to solve the converted problem.

4.1 StoEKS Algorithm Flowchart

First, we present the StoEKS algorithm flowchart, which is shown in Fig. 9.2. The algorithm starts with the variational $G(\xi)$, $C(\xi)$, and variational input source $u(t, \xi)$. Then, it applies the spectral stochastic method to convert the variational system (8.3) into a deterministic system, which consists of augmented matrices of $G(\xi)$ and $C(\xi)$ and the position matrix $B$ in (8.3) with new unknowns. Next, we generate the first $L$ moments of the Hermite polynomial coefficients of the current sources, $U_L$, with log-normal distribution. We then apply EKS/IEKS to solve the obtained deterministic system for response $Z$ using the computed projection matrix $V$. After this, we get back to the transient response of the original augmented system by $v(t) = V z(t)$. Finally, we compute the mean and variance of any voltage node from $v(t)$.

In the following subsections, we present the detailed descriptions for some critical steps of the StoEKS algorithm.

4.2 Generation of the Augmented Circuit Matrices

We first show how we convert the variational circuit equation into a deterministic one, which is suitable for EKS. Our work follows the recently presented stochastic model order reduction (SMOR) method [214]. SMOR is based on Hermite PC and the Krylov-based projection method.

[Figure 9.2: flowchart of the StoEKS algorithm, with the following boxes in order]
Given variance of G, C, u
→ Get augmented system G_sts, C_sts, B_sts, u_sts
→ Compute first L moments of u_sts by IEKS for every current source
→ Obtain orthogonal basis V by IEKS on the augmented system
→ Solve reduced system, z(t), based on orthogonal basis V
→ Project back to original circuit x(t) = Vz(t)
→ Get mean and variance of the voltage of every node

Fig. 9.2 Flowchart of the StoEKS algorithm. Reprinted with permission from [110] © 2008 IEEE

We first assume that $G(\xi)$, $C(\xi)$, and $u(t, \xi)$ in (8.3) are represented in Hermite PC forms with a proper order $P$:

$$G(\xi) = G_0 + G_1 H_1(\xi) + G_2 H_2(\xi) + \cdots + G_P H_P(\xi),$$

$$C(\xi) = C_0 + C_1 H_1(\xi) + C_2 H_2(\xi) + \cdots + C_P H_P(\xi),$$

$$u(t, \xi) = (u_0(t) + u_d(t)) + u_1(t) H_1(\xi) + \cdots + u_P(t) H_P(\xi).$$

Here, $H_i(\xi)$ are the Hermite PC basis functions for $G(\xi)$, $C(\xi)$, and $u(t, \xi)$. $P$ is also the number of these basis functions, which depends on the number of random variables $n$ and the expansion order $p$ in (2.31). $G_i$, $C_i$, and $u_i$ are the Hermite polynomial coefficients of the conductance, capacitors, and current source. $G_0$ and $C_0$ are the mean values of the conductance and capacitors. $G_i$ and $C_i$ are the variational parts for the conductance and capacitors.

Ideally, to obtain $G$ and $C$ in the HPC format, i.e., to compute $G_i$ and $C_i$ from the width and length variables, one can use the spectral stochastic analysis method [86], which is a fast MC method, or other extraction methods. For this chapter, we simply assume that we obtain such information. The detail of how $G_i$ and $C_i$ are obtained is as follows:

$$G_i = a_i \times G_0, \qquad C_i = a_i \times C_0, \quad i = 1, \ldots, P. \tag{9.4}$$

$a_i$ is the variational percentage for $H_i$. Substituting (9.4) into (8.3), the system equations become

$$\sum_{i=0}^{P-1} \sum_{j=0}^{P-1} G_i v_j H_i H_j + s \sum_{i=0}^{P-1} \sum_{j=0}^{P-1} C_i v_j H_i H_j = u_d(t) + \sum_{i=0}^{P-1} u_i(t) H_i. \tag{9.5}$$

Here, $v_i$ are the Hermite polynomial coefficients of the node voltages $v(t, \xi)$:

$$v(t, \xi) = v_0(t) + v_1(t) H_1 + v_2(t) H_2 + \cdots + v_{P-1}(t) H_{P-1}. \tag{9.6}$$

After performing the inner product with $H_k$ on both sides of equation (9.5), it becomes

$$\sum_{i=0}^{P-1} \sum_{j=0}^{P-1} G_i v_j \langle H_i H_j, H_k \rangle + s \sum_{i=0}^{P-1} \sum_{j=0}^{P-1} C_i v_j \langle H_i H_j, H_k \rangle = \sum_{i=0}^{P-1} u_i \langle H_i, H_k \rangle + \langle H_k, 1 \rangle u_d(t), \quad k = 0, 1, \ldots, P-1, \tag{9.7}$$

where $\langle H_i H_j, H_k \rangle$ is the inner product of $H_i H_j$ and $H_k$. On the right-hand side (rhs) of (9.7), the inner product is calculated based on $H_i$ and $H_k$.

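Notice that $\langle H_k, 1 \rangle = 1$ when $k = 0$, and $\langle H_k, 1 \rangle = 0$ when $k \neq 0$. In general, the coefficients of $H_i H_j$ are calculated in (9.5), and the inner product is defined as

$$\langle H_i H_j, H_k \rangle = \int_{-\infty}^{+\infty} H_i H_j H_k \, d\xi, \tag{9.8}$$

where the integral is taken with respect to the Gaussian probability measure of $\xi$, considering the independence of the Hermite polynomials $H_i$, $H_j$, and $H_k$. Similarly, the inner product of two polynomials is

$$\langle H_i, H_j \rangle = \int_{-\infty}^{+\infty} H_i H_j \, d\xi. \tag{9.9}$$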


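The inner product is a constant and can be computed a priori and stored in a table for fast computation. Based on the $P$ equations and the orthogonal nature of the Hermite polynomials, these equations can be written in matrix form as

$$(G_{sts} + s C_{sts}) V = B_{sts} u_{sts}, \tag{9.10}$$

$$G_{sts} = \begin{bmatrix} G_{00} & \cdots & G_{0,P-1} \\ \vdots & \ddots & \vdots \\ G_{P-1,0} & \cdots & G_{P-1,P-1} \end{bmatrix}, \qquad C_{sts} = \begin{bmatrix} C_{00} & \cdots & C_{0,P-1} \\ \vdots & \ddots & \vdots \\ C_{P-1,0} & \cdots & C_{P-1,P-1} \end{bmatrix},$$

$$u_{sts} = \begin{bmatrix} u_0(t) + u_d(t) \\ u_1(t) \\ \vdots \\ u_{P-1}(t) \end{bmatrix}, \qquad V = \begin{bmatrix} V_0(t) \\ V_1(t) \\ \vdots \\ V_{P-1}(t) \end{bmatrix}, \tag{9.11}$$

$$B_{sts} = \begin{bmatrix} B_0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & B_{P-1} \end{bmatrix}, \tag{9.12}$$

$$B_i = B, \qquad G_{kj} = \sum_{i=0}^{P-1} G_i \langle H_i H_j, H_k \rangle, \qquad C_{kj} = \sum_{i=0}^{P-1} C_i \langle H_i H_j, H_k \rangle,$$

where $G_{sts} \in \mathbb{R}^{mP \times mP}$, $C_{sts} \in \mathbb{R}^{mP \times mP}$, $B_{sts} \in \mathbb{R}^{mP \times l}$, $m$ is the size of the original circuit, and $P$ is the number of Hermite polynomials. In [214], PRIMA-like reduction is performed on (9.10) to obtain the reduced variational system.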
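The inner products that fill the augmented matrices can be tabulated once with Gauss-Hermite quadrature. The sketch below is restricted to a single standard Gaussian variable and uses the probabilists' Hermite polynomials from NumPy (the family that is orthogonal under the Gaussian density); the function name `hermite_triple_table` is hypothetical, and the multi-variable case would take products of such one-dimensional values.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_triple_table(P):
    """table[i, j, k] = E[He_i(xi) He_j(xi) He_k(xi)] for a standard Gaussian xi."""
    # A rule with n nodes integrates polynomials up to degree 2n - 1 exactly;
    # the integrand here has degree at most 3 * (P - 1).
    nodes, weights = hermegauss((3 * (P - 1)) // 2 + 1)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to a probability measure
    # Row d holds He_d evaluated at every quadrature node.
    basis = np.array([hermeval(nodes, [0.0] * d + [1.0]) for d in range(P)])
    return np.einsum("q,iq,jq,kq->ijk", weights, basis, basis, basis)

table = hermite_triple_table(3)
```

With such a table, each block of the augmented matrix is a weighted sum of the coefficient matrices, and the table can be reused for both the conductance and the capacitance matrices.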

4.3 Computation of Hermite PCs of Current Moments with Log-Normal Distribution

In this section, we show how to compute the Hermite coefficients for the variational leakage currents and their corresponding moments used in the augmented equation (9.10).


Let $u_v^i(t, \xi)$ be the $i$th current in the current vector $u_v(t, \xi)$ in (9.1), which is a function of the normalized Gaussian random variables $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ and time $t$:

$$u_v^i(t, \xi) = e^{g(t, \xi)} = e^{\sum_{j=0}^{n} g_j(t) \xi_j}. \tag{9.13}$$

The leakage current sources therefore follow the log-normal distribution. We can then represent $u_v^i(t, \xi)$ by using the Hermite PC expansion form:

$$u_v^i(t, \xi) = \sum_{k=0}^{P} u_{vk}^i(t) H_k^n(\xi) = u_{v0}^i(t) \left( 1 + \sum_{i=1}^{n} \xi_i g_i(t) + \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\xi_i \xi_j - \delta_{ij}}{\langle (\xi_i \xi_j - \delta_{ij})^2 \rangle} g_i(t) g_j(t) + \cdots \right), \tag{9.14}$$

where

$$u_{v0}^i(t) = e^{g_0(t) + \frac{1}{2} \sum_{i=1}^{n} g_i(t)^2}, \qquad P = \sum_{k=0}^{p} \frac{(n - 1 + k)!}{k!\,(n - 1)!}. \tag{9.15}$$

$n$ is the number of random variables and $p$ is the order of the Hermite PC expansion. As a result, the variational variable $u(t, \xi)$ leads to the $u_{sts}$ in (9.10):

$$u_{sts} = \left[ u_0(t)^T + u_d(t)^T,\; u_1(t)^T,\; \ldots,\; u_{P-1}(t)^T \right]^T. \tag{9.16}$$

Note that $u_d(t)$ is the deterministic current source vector.

In the EKS method, we need to compute the moments of input sources in the frequency domain. Suppose $(a_{i1}, \tau_{i1})$, $(a_{i2}, \tau_{i2})$, $\ldots$, $(a_{i,K+2}, \tau_{i,K+2})$ are the PWL series of value-time pairs for $u_i(t)$ or $u_0(t) + u_d(t)$ in (9.16). Using equation (9.3), we can get the first $L$ moments for each $u_i$, $i = 1, 2, \ldots, P$ in (9.16), respectively, and we have

$$u_i(s) = m_{u_i 1} + m_{u_i 2} s + \cdots + m_{u_i L} s^{L-1}, \tag{9.17}$$

where $m_{u_i k}$ is the $k$th-order moment vector of the Hermite PC coefficient for $u_i$. In this way, we can compute the moments of the Hermite PC coefficients for every current source.


Input: Augmented system $G_{sts}$, $C_{sts}$, $B_{sts}$, $u_{sts}$
Output: The HPC coefficients of node voltage, $v$
1. Get the first $L$ moments of $u_{sts}$ for each current source.
2. Compute the orthogonal basis $V$ of the subspace from (9.10).
3. Obtain the reduced system matrices from $\hat{G} = V^T G_{sts} V$, $\hat{C} = V^T C_{sts} V$, $\hat{B} = V^T B_{sts}$.
4. Solve $\hat{G} z(t) + \hat{C} \frac{dz(t)}{dt} = \hat{B} u_{sts}(t)$.
5. Project back to the original space to get $v(t) = V z(t)$.
6. Compute the variational values (mean, variance) of the specified nodes.

Fig. 9.3 The StoEKS algorithm

4.4 The StoEKS Algorithm

Given $G_{sts}$, $C_{sts}$, and $u_{sts}$ in moment forms, we can obtain the orthogonal basis $V$ using the EKS algorithm. The reduced system can then be obtained by this orthogonal basis $V$ from equation (9.3). The reduced system becomes

$$\hat{G}_{sts} z(t) + \hat{C}_{sts} \frac{dz(t)}{dt} = \hat{B}_{sts} u_{sts}. \tag{9.18}$$

Here,

$$\hat{G}_{sts} = V^T G_{sts} V, \qquad \hat{C}_{sts} = V^T C_{sts} V, \qquad \hat{B}_{sts} = V^T B_{sts}. \tag{9.19}$$

The reduced system can be solved in the time domain by any standard integration algorithm. The solution of the reduced system, $z(t)$, can then be projected back to the original space by $\tilde{v}(t) = V z(t)$.

By solving the augmented equation in (9.10), we can obtain the mean and variance of any node voltage $v(t)$ by

$$E(v(t)) = E\left( v_0(t) + \sum_{i=1}^{P-1} v_i(t) H_i \right) = v_0,$$

$$\mathrm{var}(v(t)) = \mathrm{var}\left( v_0(t) + \sum_{i=1}^{P-1} v_i(t) H_i \right) = \sum_{i=1}^{P-1} v_i(t)^2 \, \mathrm{var}(H_i).$$
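These two formulas translate directly into code. The sketch below assumes a single Gaussian variable and the unnormalized probabilists' Hermite basis, for which the variance of the $k$th polynomial is $k!$; the function name `pc_mean_variance` is hypothetical.

```python
from math import factorial

def pc_mean_variance(coeffs):
    """Mean and variance of v = sum_k coeffs[k] * He_k(xi) for one Gaussian xi."""
    mean = coeffs[0]
    # var(He_k) = E[He_k^2] = k! for k >= 1; He_0 = 1 contributes no variance.
    variance = sum(c ** 2 * factorial(k) for k, c in enumerate(coeffs) if k >= 1)
    return mean, variance

mean, variance = pc_mean_variance([1.0, 0.5, 0.1])
```

No sampling is involved: once the coefficients are known at a time point, the statistics follow from a short weighted sum of squares.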

Further, the distribution of $v(t)$ can also be easily calculated from the characteristics of the Hermite PC and the distribution of $\xi_1, \xi_2, \ldots, \xi_N$. Figure 9.3 is the StoEKS algorithm for given $G_{sts}$, $C_{sts}$, $B_{sts}$, and $u_{sts}$.


4.5 A Walk-Through Example

In the following, we consider a simple case where we only have three independent variables to illustrate the method. We assume that there are three independent variables $\xi_g$, $\xi_c$, and $\xi_I$ associated with matrices $G$ and $C$ and the input sources, respectively, in the circuit.

We assume that the variational component in (9.1), $u_v(t, \xi_I)$, follows the log-normal distribution as

$$u_v(t, \xi_I) = e^{g(t, \xi_I)}, \qquad g(t, \xi) = \mu_I(t) + \sigma_I(t) \xi_I. \tag{9.20}$$

Then equation (8.3) becomes

$$G(\xi_g) v(t) + C(\xi_c) \frac{dv(t)}{dt} = B u(t, \xi_I). \tag{9.21}$$

The variation in width $W$ and thickness $T$ will cause variation in the conductance matrix $G$ and the storage matrix $C$, while variation in threshold voltage will cause variation in the leakage currents $u(t, \xi_I)$. Thus, the resulting system can be written as [47]

$$G(\xi_g) = G_0 + G_1 \xi_g, \qquad C(\xi_c) = C_0 + C_1 \xi_c. \tag{9.22}$$

$G_0$ and $C_0$ represent the deterministic components of the conductance and capacitance of the wires. $G_1$ and $C_1$ represent the sensitivity matrices of the conductance and capacitance. $\xi_g$ and $\xi_c$ are random variables with normalized Gaussian distribution, representing process variation in the conductance and capacitance of the wires, respectively.

$\xi_I$ is a normalized Gaussian random variable representing variation in threshold voltage.

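Using the Galerkin-based method as in [107] with second-order Hermite PCs, we end up solving the following equation:

$$G_{sts} v(t) + C_{sts} \frac{dv(t)}{dt} = B_{sts} u_{sts}(t), \tag{9.23}$$

where

$$G_{sts} = \begin{bmatrix}
G_0 & G_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
G_1 & G_0 & 0 & 0 & 2G_1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0 & 0 \\
0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0 \\
0 & G_1 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 \\
0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 \\
0 & 0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & G_0
\end{bmatrix},$$

$$C_{sts} = \begin{bmatrix}
C_0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1 & 0 & 0 \\
C_1 & 0 & C_0 & 0 & 0 & 2C_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1 \\
0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & C_1 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 \\
0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 \\
0 & 0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0
\end{bmatrix},$$

$$u_{sts}(t) = [u_0(t) + u_d(t),\, 0,\, 0,\, u_3(t),\, 0,\, 0,\, u_6(t),\, 0,\, 0,\, 0]^T.$$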

One observation we have is that although the augmented circuit matrices are much bigger than before, they are very sparse and also consist of repeated coefficient matrices from the HPC. As a result, the reduction techniques can significantly improve the simulation efficiency.
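The block pattern of the augmented matrices in this walk-through can be reproduced programmatically. The sketch below rests on two assumptions inferred from the pattern rather than stated in the text: the ten basis functions are ordered as 1, $\xi_g$, $\xi_c$, $\xi_I$, $\xi_g^2 - 1$, $\xi_c^2 - 1$, $\xi_I^2 - 1$, $\xi_g \xi_c$, $\xi_g \xi_I$, $\xi_c \xi_I$, and each block row $k$ is normalized by $\langle H_k, H_k \rangle$. The names `BASIS`, `triple_1d`, `coupling`, and `augment` are hypothetical.

```python
from math import factorial, prod
import numpy as np

# Multi-indices (degree in xi_g, xi_c, xi_I) of the assumed basis ordering.
BASIS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0),
         (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

def triple_1d(a, b, c):
    """E[He_a He_b He_c] for one standard Gaussian variable (closed form)."""
    total = a + b + c
    if total % 2:
        return 0
    s = total // 2
    if s < max(a, b, c):
        return 0
    return (factorial(a) * factorial(b) * factorial(c)
            // (factorial(s - a) * factorial(s - b) * factorial(s - c)))

def coupling(var):
    """T[k, j] = <xi_var * H_j, H_k> / <H_k, H_k> for variable index var."""
    size = len(BASIS)
    T = np.zeros((size, size))
    for k, hk in enumerate(BASIS):
        norm = prod(factorial(deg) for deg in hk)          # <H_k, H_k>
        for j, hj in enumerate(BASIS):
            value = prod(triple_1d(int(d == var), hj[d], hk[d]) for d in range(3))
            T[k, j] = value / norm
    return T

def augment(M0, M1, var):
    """Augmented matrix for M(xi) = M0 + M1 * xi_var."""
    return np.kron(np.eye(len(BASIS)), M0) + np.kron(coupling(var), M1)

Tg = coupling(0)   # positions and multipliers of the G_1 blocks
Tc = coupling(1)   # positions and multipliers of the C_1 blocks
```

The nonzero entries of `Tg` and `Tc` mark where the $G_1$ and $C_1$ blocks appear, including the factor of two next to the second-order terms, and the Kronecker form makes the sparsity and repetition of the augmented matrices explicit.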

4.6 Computational Complexity Analysis

In this subsection, we analyze the computing costs of both the StoEKS and HPC methods and show the theoretical advantage of StoEKS over the non-reduction-based HPC method.

First, if the PCA operation is performed, which essentially uses SVD on the covariance matrix, its computation cost is $O(l n^2)$. Here, $l$ is the number of original correlated random variables and $n$ is the number of dominant singular values retained, which is also the number of independent random variables after PCA. Since the number of random variables $l$ is typically much smaller than the circuit size, the running time of PCA is not significant for the total cost.

After we transform the original circuit matrices into the augmented circuit matrices in (9.10), which are still very sparse, the matrix sizes grow from $m \times m$ to $Pm \times Pm$, where $P$ is the number of Hermite polynomials used. This number depends on the Hermite polynomial order and the number of variables used, as shown in (2.31).

Typically, solving an $n \times n$ linear system takes $O(n^{\alpha})$ (typically, $1 \le \alpha \le 1.2$ for sparse circuits), and matrix factorization takes $O(n^{\beta})$ (typically, $1.1 \le \beta \le 1.5$ for sparse circuits). For HPC, assuming that we need to compute $w$ time steps in transient analysis (taking $w$ forward and backward substitutions after one LU decomposition), the computing cost is

$$O\left(w (mP)^{\alpha} + (mP)^{\beta}\right). \tag{9.24}$$

For StoEKS, we only need approximately $q$ steps, where $q$ is the order of the reduced model, after the one LU decomposition to compute the projection matrix $V$. So the total computational cost is

$$O\left(q (mP)^{\alpha} + (mP)^{\beta} + mPq^2 + q^3 + wq^2\right), \tag{9.25}$$

without considering the cost of the PCA operations ($l n^2$), as we did not perform PCA in our experiments. The last three terms are the costs of performing the reduction (QR operation) and the transient simulation of the reduced circuit (which has very dense matrices) in the time domain. Since $q \ll w$, the computing cost of StoEKS can be significantly lower than that of HPC. The presented method can also be further improved by using the hierarchical EKS method [11].
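The two cost expressions can be compared numerically for a representative problem size. The sketch below evaluates (9.24) and (9.25) with all hidden constants set to one, which is an assumption, and with the exponents at the upper ends of the quoted ranges; the values of $m$, $P$, $w$, and $q$ in the usage lines are illustrative rather than measured.

```python
def hpc_cost(m, P, w, alpha=1.2, beta=1.5):
    """Operation-count model of (9.24); constants are assumed to be one."""
    n = m * P
    return w * n ** alpha + n ** beta

def stoeks_cost(m, P, w, q, alpha=1.2, beta=1.5):
    """Operation-count model of (9.25); constants are assumed to be one."""
    n = m * P
    return q * n ** alpha + n ** beta + n * q ** 2 + q ** 3 + w * q ** 2

# Hypothetical setting: 2,640 nodes, 10 PC terms, 1,000 time steps, order 5.
ratio = hpc_cost(2640, 10, 1000) / stoeks_cost(2640, 10, 1000, 5)
print(f"modeled cost ratio HPC/StoEKS: {ratio:.1f}")
```

The model shows why the gain grows with the number of time steps: HPC pays one full-size substitution per step, while StoEKS pays only a small dense solve per step after the basis is built.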


This section describes the simulation results of circuits with both capacitance and conductance variations and leakage current variation. The leakage current variation follows the log-normal distribution. The capacitance and conductance variations follow the Gaussian distribution.

All the presented methods have been implemented in Matlab 7.0. All the experiments were carried out on a Dell PowerEdge 1900 workstation (running Linux) with Intel quad-core Xeon CPUs at 2.99 GHz and 16 GB of memory. To solve large circuits in Matlab, the external linear solver package UMFPACK [184] has been used, which is linked with Matlab using the Matlab mexFunction interface. The initial results of this chapter were published in [110, 111].

As mentioned in Sect. 4 of Chap. 8, we assume that the random variables used in the chapter for $G$ and $C$ and the current sources are independent after the PCA transformation.

First, we assume a time-variant leakage model, in which $u_v^i(t, \xi)$ in (9.13) is a function of time $t$, and we further assume that $g_j(t)$, the standard deviation, is a fixed percentage, say 10%, of $u_d(t)$ in (9.1), i.e., $g_i(t) = 0.1 u_{di}(t)$, where $u_{di}(t)$ is the $i$th component of the PWL current $u_d(t)$. Figures 9.4–9.6 show the results at one particular node under this configuration.

Figure 9.4 shows the node voltage distribution at one node of a ground network with 280 nodes, considering variation in conductance, capacitance, and leakage current (with three random variables). The standard deviation (s.d.) of the log-normal current sources with one Gaussian variable is $0.1 u_{di}(t)$. The s.d. in conductance and capacitance are also 0.1 of the mean. The mean and s.d. computed by the Hermite PC method and by Hermite PC with EKS are also marked in the figure, and they fit the MC results very well. In Fig. 9.4, the dotted lines are the mean and s.d. calculated by MC. The solid lines are the mean and s.d. by the algorithm in [108], which is named HPC. The dashed lines are the results from StoEKS. The MC results are obtained with 3,000 samples. The reduced order for EKS is five, $q = 5$.


[Figure 9.4: histogram titled "Comparison of voltage distribution among three methods with three RV"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dash: StoEKS; dot: Monte Carlo; line: HPC]

Fig. 9.4 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo of a circuit with 280 nodes with three random variables. $g_i(t) = 0.1 u_{di}(t)$. Reprinted with permission from [110] © 2008 IEEE

Figure 9.5 shows the distribution at one node of a ground network with 2,640 nodes. The parameter $g_i(t)$ is set to the same value as in the circuit with 280 nodes. The s.d. in conductance are 0.02, 0.05, and 0.1 of the mean for three variables. The s.d. in capacitance are 0.02, 0.02, and 0.1 of the mean for three variables. There are seven random variables in total. The dotted lines represent the MC results, and the dashed lines represent the results given by StoEKS. From these two figures, we can see only a marginal difference between the three methods. The reduced order for EKS is also five, $q = 5$.

Figure 9.6 shows the distribution at one node of a ground network with 280 nodes, but with a different variation setting of the parameters. The standard deviations in conductance are set to 0.02, 0.02, 0.03, 0.05, and 0.05 of the mean for five variables, respectively, i.e., their $a_i$ in (9.4) are set to those values. The standard deviations in capacitance are likewise set to 0.02, 0.03, 0.04, 0.05, and 0.05 of the mean for five variables, respectively. The standard deviation of the log-normal current sources is 0.1 of the mean. There are 11 random variables in all. It is even harder for HPC to compute the mean and s.d. of the circuit. The dotted lines represent the MC results, and the dashed lines represent the results given by StoEKS. The reduced order for EKS is ten.

[Figure 9.5: histogram titled "Comparison of voltage distribution among three methods with seven RV"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dash: StoEKS; dot: Monte Carlo; line: HPC]

Fig. 9.5 Distribution of the voltage variations in a given node by StoEKS, HPC, and MC of a circuit with 2,640 nodes with seven random variables. $g_i(t) = 0.1 u_{di}(t)$. Reprinted with permission from [110] © 2008 IEEE

Table 9.1 shows the speedup of the StoEKS and HPC methods over the MC method under different numbers of random variables. In the table, #RV is the number of random variables used, which is 3, 7, or 11. The variation value setup of three random variables is the same as the circuit used in Fig. 9.4. The variation value setup of seven random variables is the same as the circuit used in Fig. 9.5. The variation value setup of 11 random variables is the same as the circuit used in Fig. 9.6. The first speedup is the speedup of StoEKS over MC, and the second speedup is the speedup of HPC over MC.

From the table, we observe that we cannot obtain the results from HPC or MC in reasonable time when the circuit becomes large enough. Meanwhile, StoEKS can deliver all the results.

We remark that the intra-die variations are typically very spatially correlated [16]. After a transformation like PCA, the number of variables can be significantly reduced. As a result, in our examples, we do not assume a large number of variables.

Tables 9.2 and 9.3 show the mean and s.d. comparison of different methods over the MC method for several circuits. Again, #RV is the number of random variables used. Table 9.2 contains the values we obtain from different methods, and Table 9.3 presents the error comparison of StoEKS and HPC over Monte Carlo, respectively.


[Figure 9.6: histogram titled "Comparison of voltage distribution between two methods with eleven RV"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ; dash: StoEKS; dot: Monte Carlo]

Fig. 9.6 Distribution of the voltage variations in a given node by StoEKS and MC of a circuit with 2,640 nodes with 11 random variables. $g_i(t) = 0.1 u_{di}(t)$. Reprinted with permission from [110] © 2008 IEEE

Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method. $g_i(t) = 0.1 u_{di}(t)$

#nodes      #RV   MC            StoEKS    Speedup   HPC [108]   Speedup
280         3     694.35        0.3       2314.5    2.37        292.97
280         7     671.46        2.37      283.31    227.94      2.94
280         11    684.88        24.26     28.23     914.34      0.74
2,640       3     5925.7        4.33      1368.5    55.35       107.1
2,640       7     5927.6        25.02     236.9     1952.2      3.04
2,640       11    6042.2        693.27    8.72      –           –
12,300      3     3.54 × 10^4   21.62     1637.4    298.84      118.5
12,300      7     3.30 × 10^4   151.71    217.65    –           –
119,600     3     –             258.21    –         –           –
119,600     7     –             2074.8    –         –           –
1,078,800   3     –             1830.4    –         –           –


Table 9.2 Accuracy comparison of different methods, StoEKS, HPC, and MC. $g_i(t) = 0.1 u_{di}(t)$

                 Mean                      Std dev
#nodes   #RV     MC      StoEKS   HPC      MC       StoEKS   HPC
280      3       0.047   0.047    0.047    0.0050   0.0048   0.0048
2,640    3       0.39    0.39     0.39     0.048    0.046    0.046
12,300   3       1.66    1.66     1.66     0.16     0.17     0.17
280      7       0.047   0.047    0.047    0.0056   0.0055   0.0055
2,640    7       0.39    0.39     0.39     0.048    0.046    0.046
12,300   7       2.56    2.56     –        0.31     0.30     –
280      11      0.047   0.047    0.047    0.0039   0.0039   0.0040
2,640    11      0.39    0.39     –        0.033    0.033    –

Table 9.3 Error comparison of StoEKS and HPC over Monte Carlo methods. $g_i(t) = 0.1 u_{di}(t)$

                StoEKS %     HPC %        StoEKS %     HPC %
#nodes  #RV     error in μ   error in μ   error in σ   error in σ
280     3       0.19         0.28         3.14         3.10
2,640   3       1.23         1.05         4.31         4.51
12,300  3       0.10         0.08         2.95         2.98
280     7       0.063        0.17         1.12         1.54
2,640   7       0.076        0.11         4.18         4.60
12,300  7       0.23         –            0.23         –
280     11      0.42         0.21         0.18         0.52
2,640   11      0.18         –            0.30         –

[Figure: plot titled "A PWL current source at one node"; x-axis: time (s), from 0 to 2 × 10^−7; y-axis: current (A), from 0 to 0.045]

Fig. 9.7 A PWL current source at certain node. Reprinted with permission from [110] © 2008 IEEE


[Figure: histogram titled "Comparison of voltage distribution among three methods with three RVs"; x-axis: Voltage (volts); y-axis: Number of occurrences; markers at μ−3δ, μ, and μ+3δ]

Fig. 9.8 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo of a circuit with 280 nodes with three random variables using the time-invariant leakage model. $g_i = 0.1 I_p$. Reprinted with permission from [110] © 2008 IEEE

We can see that StoEKS differs only marginally from MC, while it can simulate much larger circuits than the existing HPC method on the same platform.

Finally, we use a time-invariant leakage model, in which we assume that $u_{iv}(\xi)$ in (9.13) is not a function of time $t$ and further assume that $g_j$, the standard deviation, is a fixed percentage of a constant current value in (9.1). In our test cases, we use the peak current, $I_p \approx 41$ mA as shown in Fig. 9.7, as the constant value. Figure 9.8 shows the results in this configuration.

6 Summary

In this chapter, we have presented a fast stochastic method for analyzing the voltage drop variations of on-chip power grid networks. The new method, called StoEKS, applies HPC to represent the random variables in both power grid networks and input leakage currents with log-normal distribution. This HPC method transforms a statistical analysis problem into a deterministic analysis problem in which larger, augmented circuit matrices are created. The augmented circuit matrices consist of the coefficients of Hermite polynomials representing both variational parameters in circuit matrices and input sources. We then applied the EKS method to compute variational responses from the augmented circuit equations. The presented method does not require any sampling operations, unlike the collocation-based spectral stochastic analysis method. Numerical examples have shown that the presented method is about two orders of magnitude faster than the existing Hermite PC-based simulation method and even more orders of magnitude faster than the MC method, with marginal errors. StoEKS also increases the analysis capacity of the statistical simulation methods based on the spectral stochastic method presented in Chap. 8.

Chapter 10
Statistical Power Grid Analysis by Variational Subspace Method

1 Introduction

In this chapter, we present a novel scalable statistical simulation approach for large power grid network analysis considering process variations [92]. The new algorithm scales well to large networks with a large number of random variables. Our work is inspired by the recent work on variational model order reduction using the fast balanced truncation method (called the variational Poor Man's TBR method, or varPMTBR [134]). The new method, called varETBR, is based on the recently proposed ETBR method [93, 94]. To consider the variational parameters, we extend the concept of the response Gramian, which was used in ETBR to compute the reduction projection subspace, to the variational response Gramian. MC-based numerical integration is then employed to compute the multidimensional integrals. Different from traditional reduction approaches, varETBR calculates the variational response Gramians, considering both system and input source variations, to generate the projection subspace. In this way, much more efficient reduction can be performed for interconnects with massive terminals like power grid networks [177]. Furthermore, the new method is based on the globally more accurate balanced truncation reduction method instead of the less accurate Krylov subspace method used in EKS/IEKS [89, 191]. After the reduction, MC-based statistical simulation is performed on the reduced system, and the statistical responses of the original system are obtained thereafter. varETBR only requires the simulation of the reduced circuit using any existing transient analysis method. It is insensitive to the number of variables and the variation ranges in terms of computing costs and accuracy, which makes it very general and scalable. Numerical results on a number of the IBM benchmark circuits [123] with up to 1.6 million nodes show that varETBR can be up to 1,900× faster than the MC method and is much more scalable than the StoEKS method [110, 111]. varETBR can solve very large power grid networks with large numbers of random variables, large variation ranges, and different variational distributions.



The rest of this chapter is organized as follows: Sect. 2 reviews the EKS methods and fast balanced truncation methods. Our new variational analysis method, varETBR, is presented in Sect. 3. Section 4 shows the experimental results, and Sect. 5 concludes this chapter.

2 Review of Fast Truncated Balanced Realization Methods

2.1 Standard Truncated Balanced Realization Methods

The truncated balanced realization (TBR)-based reduction method has two steps. The balancing step transforms the system so that its states are equally controllable and observable. The truncating step then discards the weak states, which usually leads to much smaller models. The major advantage of the TBR method is its ability to give a deterministic global bound for the approximation error as well as to provide nearly optimal models in terms of errors and model sizes.

Given a system in a standard state-space form,

$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t), \tag{10.1}$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{p \times n}$, and $y(t), u(t) \in \mathbb{R}^{p}$. The controllability and observability Gramians are the unique symmetric positive definite solutions to the Lyapunov equations:

$$A X + X A^{T} + B B^{T} = 0, \qquad A^{T} Y + Y A + C^{T} C = 0. \tag{10.2}$$

Since the eigenvalues of the product $XY$ are invariant under similarity transformation, we can perform a similarity transformation $(A_b = T^{-1} A T,\ B_b = T^{-1} B,\ C_b = C T)$ to diagonalize the product $XY$ such that

$$T^{-1} X Y T = \Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2), \tag{10.3}$$

where $T$ is the transformation matrix and the Hankel singular values of the system, $\sigma_k$, are arranged in descending order. If we partition the matrices as

$$\begin{bmatrix} W_1^{T} \\ W_2^{T} \end{bmatrix} X Y \begin{bmatrix} V_1 & V_2 \end{bmatrix} = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \tag{10.4}$$

where $\Sigma_1 = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2)$ contains the $r$ largest eigenvalues of the Gramian product $XY$ and $W_1$ and $V_1$ are the corresponding left and right eigenvectors, then a reduced model can be obtained as follows:

$$\dot{x}(t) = A_r x(t) + B_r u(t), \qquad y(t) = C_r x(t), \tag{10.5}$$

where $A_r = W_1^{T} A V_1$, $B_r = W_1^{T} B$, and $C_r = C V_1$. One of the most desirable features of the TBR method is its proven error bound: the error in the transfer function of the order-$r$ approximation is bounded by $2\sum_{k=r+1}^{n} \sigma_k$ [50, 112]. In the TBR procedure, the computational cost is dominated by solving the Lyapunov equations, which has complexity $O(n^3)$ and makes the method too expensive for large problems.
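The TBR steps above can be sketched on a small example. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the 4-state system is hypothetical, the Lyapunov equations are solved by dense Kronecker vectorization (practical only for very small $n$), and the eigenvectors of $XY$ are obtained through the symmetric matrix $X^{1/2} Y X^{1/2}$, which has the same eigenvalues, so that only symmetric eigensolvers are needed.

```python
import numpy as np


def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 by Kronecker vectorization (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    X = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)  # remove round-off asymmetry


def sqrt_psd(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T


def tbr(A, B, C, r):
    X = lyap(A, B @ B.T)            # controllability Gramian, first equation of (10.2)
    Y = lyap(A.T, C.T @ C)          # observability Gramian, second equation of (10.2)
    Xh = sqrt_psd(X)
    M = Xh @ Y @ Xh                 # symmetric, same eigenvalues as X Y
    s2, Q = np.linalg.eigh(0.5 * (M + M.T))
    order = np.argsort(s2)[::-1]    # descending sigma_k^2
    s2, Q = s2[order], Q[:, order]
    hsv = np.sqrt(np.clip(s2, 0.0, None))        # Hankel singular values
    V1 = Xh @ Q[:, :r]                           # right eigenvectors of X Y
    W1T = (Q[:, :r].T @ Xh @ Y) / s2[:r, None]   # left eigenvectors, W1^T V1 = I
    return W1T @ A @ V1, W1T @ B, C @ V1, hsv


def tf(A, B, C, w):
    """Transfer function C (jw I - A)^{-1} B at angular frequency w."""
    return C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B)


# Hypothetical stable, minimal 4-state SISO system.
A = np.array([[-1.0, 1.0, 0.0, 0.0],
              [-1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, -3.0, 2.0],
              [0.0, 0.0, -2.0, -3.0]])
B = np.array([[1.0], [0.0], [1.0], [0.0]])
C = np.array([[1.0, 0.0, 0.5, 0.0]])

r = 2
Ar, Br, Cr, hsv = tbr(A, B, C, r)
freqs = np.logspace(-2, 2, 200)
err = max(abs(tf(A, B, C, w) - tf(Ar, Br, Cr, w))[0, 0] for w in freqs)
bound = 2.0 * hsv[r:].sum()
print(f"max sampled error {err:.3e}, bound {bound:.3e}")
```

The sampled transfer-function error stays below twice the sum of the truncated Hankel singular values, which is the error bound quoted above. The dense Kronecker solve makes the cost problem visible: its matrix has $n^2$ rows, so this route is even less scalable than a dedicated Lyapunov solver.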

2.2 Fast and Approximate TBR Methods

The TBR method generally suffers from high computational cost, as it needs to solve the expensive Lyapunov equations (10.2). To mitigate this problem, fast TBR methods [134, 196], which compute approximate Gramians, have been proposed recently. The Poor Man's TBR method, or PMTBR [134], was proposed for variational interconnect modeling.

Specifically, the Gramian X can also be computed in the time domain as

$$X = \int_{0}^{\infty} e^{At} B B^{T} e^{A^{T} t}\, dt. \tag{10.6}$$

From Parseval's theorem, and the fact that the Laplace transform of $e^{At}$ is $(sI - A)^{-1}$, the Gramian $X$ can also be computed in the frequency domain as

$$X = \int_{-\infty}^{+\infty} (j\omega I - A)^{-1} B B^{T} (j\omega I - A)^{-H}\, d\omega, \tag{10.7}$$

where the superscript $H$ denotes the Hermitian transpose. Let $\omega_k$ be the $k$th sampling point. If we define

$$z_k = (j\omega_k I - A)^{-1} B, \tag{10.8}$$

then based on the numerical quadrature rule, $X$ can be approximated as [134]:

$$\hat{X} = \sum_{k} w_k z_k z_k^{H} = Z W^{2} Z^{H}, \tag{10.9}$$

where $Z = [z_1, z_2, \ldots, z_n]$ and $W$ is a diagonal matrix with diagonal entries $w_{kk} = \sqrt{w_k}$; the weights $w_k$ come from a specific numerical quadrature method. Since $\hat{X}$ is symmetric, it is orthogonally diagonalizable:

$$\hat{V}^{T} \hat{X} \hat{V} = \begin{bmatrix} \hat{V}_1^{T} \\ \hat{V}_2^{T} \end{bmatrix} \hat{X} \begin{bmatrix} \hat{V}_1 & \hat{V}_2 \end{bmatrix} = \begin{bmatrix} \hat{\Sigma}_1 & 0 \\ 0 & \hat{\Sigma}_2 \end{bmatrix}, \tag{10.10}$$

where $\hat{V}^{T} \hat{V} = I$. $\hat{V}$ converges to the eigenspaces of $X$, and the dominant eigenvectors $\hat{V}_1$ can be used as the projection matrix in a model reduction approach ($A_r = \hat{V}_1^{T} A \hat{V}_1$, $B_r = \hat{V}_1^{T} B$).
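The sampling idea in (10.8) to (10.10) can be sketched as follows. This is an illustration under assumptions, not the PMTBR code of [134]: the 30-state test system, the logarithmic frequency grid, the uniform weights, and the truncation tolerance are all hypothetical choices. The dominant eigenvectors of $ZW^2Z^H$ are taken as the left singular vectors of the weighted sample matrix, with real and imaginary parts stacked so that the projection matrix is real.

```python
import numpy as np

n = 30
# Hypothetical stable test system (RC-ladder-like): x' = A x + B u, y = C x.
A = -(2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.zeros((n, 1))
B[0, 0] = 1.0
C = B.T.copy()


def pmtbr_basis(A, B, omegas, weights, tol=1e-8):
    """Orthonormal basis for the dominant eigenspace of the sampled Gramian."""
    I = np.eye(A.shape[0])
    cols = [np.sqrt(w) * np.linalg.solve(1j * om * I - A, B)
            for om, w in zip(omegas, weights)]
    ZW = np.hstack(cols)
    # Stacking real and imaginary parts keeps the basis real and accounts
    # for the conjugate samples at -omega.
    U, s, _ = np.linalg.svd(np.hstack([ZW.real, ZW.imag]), full_matrices=False)
    r = int(np.sum(s > tol * s[0]))   # keep directions above the tolerance
    return U[:, :r], s


def tf(A, B, C, om):
    """Scalar transfer function C (j om I - A)^{-1} B."""
    return (C @ np.linalg.solve(1j * om * np.eye(A.shape[0]) - A, B))[0, 0]


omegas = np.logspace(-1, 1, 8)                       # assumed sampling points
V1, s = pmtbr_basis(A, B, omegas, np.ones_like(omegas))
Ar, Br, Cr = V1.T @ A @ V1, V1.T @ B, C @ V1         # congruence projection

worst = max(abs(tf(A, B, C, om) - tf(Ar, Br, Cr, om)) / abs(tf(A, B, C, om))
            for om in omegas)
print(f"reduced order {V1.shape[1]} of {n}, "
      f"worst relative error at the samples {worst:.2e}")
```

No Lyapunov equation is solved here; the cost is one linear solve per sampling point plus an SVD of a tall, thin matrix, which is what makes the approach attractive for large circuits.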


2.3 Statistical Reduction by Variational TBR

In [134], PMTBR has been extended to reduce interconnect circuits with variational parameters. The idea is that the computation of the Gramian in (10.7) can be viewed as computing the mean of $(j\omega I - A)^{-1} B B^{T} (j\omega I - A)^{-H}$ with respect to a statistical variable $\omega$, the frequency. If there are more statistical parameters, the Gramian can still be viewed as a mean, but taken over all the variables (including the frequency variable).

In the fast TBR framework, computing the Gramian (10.7) is essentially a one-dimensional integral with respect to the frequency $\omega$. When multiple variables with specific distributions are considered, a multidimensional integral with respect to the random variables has to be computed. As in PMTBR, the MC method is employed in variational TBR to compute the multidimensional integral.

One important observation in varPMTBR is that the number of samples needed to build the subspace is much smaller than the number of general MC samples needed to achieve the same accuracy. As a result, varPMTBR is much faster than the brute-force Monte Carlo method, and its costs are much less sensitive to the number of random variables and the variation ranges, which makes this method much more efficient than the existing variational or parameterized model order reduction methods [208].

3 The Presented Variational Analysis Method: varETBR

In this section, we detail the presented varETBR method. We begin with the recently proposed ETBR method for deterministic power grid analysis based on reduction techniques.

3.1 Extended Truncated Balanced Realization Scheme

The presented method is based on the recently proposed ETBR method [93], which we review first.

For a linear system in (8.2), we first define the frequency-domain response Gramian,

$$X_r = \int_{-\infty}^{+\infty} (j\omega C + G)^{-1} B u(j\omega) u^{T}(j\omega) B^{T} (j\omega C + G)^{-H}\, d\omega, \tag{10.11}$$

which is different from the Gramian concepts in the traditional TBR-based reduction framework. Notice that in the new Gramian definition, the input signals $u(j\omega)$ are considered. As a result, $(j\omega C + G)^{-1} B u(j\omega)$ serves as the system response with respect to the input signal $u(j\omega)$, and the resulting $X_r$ becomes the response Gramian.


Fig. 10.1 Flow of ETBR

To compute the response Gramian $X_r$ quickly, we can use an MC-based method to estimate its numerical value, as done in [134]. Specifically, let $\omega_k$ be the $k$th sampling point over the frequency range. If we further define

$$z_k^{r} = (j\omega_k C + G)^{-1} B u(j\omega_k), \tag{10.12}$$

then $\hat{X}_r$ can be computed approximately by numerical quadrature methods:

$$\hat{X}_r = \sum_{k} w_k z_k^{r} (z_k^{r})^{H} = Z_r W^{2} Z_r^{H}, \tag{10.13}$$

where $Z_r$ is a matrix whose columns are $z_k^{r}$ and $W$ is a diagonal matrix with diagonal entries $w_{kk} = \sqrt{w_k}$; the weights $w_k$ come from a specific quadrature method.

The projection matrix can be obtained by singular value decomposition (SVD) of $Z_r$. After this, we can reduce the original matrices to small ones and then perform the transient analysis on the reduced circuit matrices. The ETBR algorithm is summarized in Fig. 10.1.

Notice that we need the frequency response caused by the input signal $u(j\omega_k)$ in (10.12). This can be obtained by applying the FFT to the input signals in the time domain. Using frequency spectrum representations for the input signals is a significant improvement over the EKS method, as we avoid the explicit moment representation of the current sources, which is not accurate for currents rich in high-frequency components due to the well-known problems of explicit moment matching methods [137]. Accuracy is also improved owing to the use of the fast balanced truncation method for the reduction, which has global accuracy [112, 134].


Note that because we use a congruence transformation for the reduction, with orthogonal columns in the projection matrix (obtained by an Arnoldi or Arnoldi-like process), the reduced system must be stable. For simulation purposes, this is sufficient. If all the observable ports are also the current source nodes, i.e., $y(t) = B^{T} v(t)$, where $y(t)$ is the voltage vector at all observable ports, the reduced system is also passive. It was also shown in [134] that the fast TBR method has a time complexity similar to that of multiple-point Krylov-subspace-based reduction methods. The extended TBR method also has computation costs similar to those of the EKS method.
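The ETBR flow described in this subsection can be sketched on a toy circuit. The sketch makes several assumptions that are not taken from the text: the system is written as $C\,\dot{v} + G v = B u(t)$ in normalized units, the 20-node RC ladder and the PWL waveform are hypothetical, the sampling points are simply the lowest FFT bins of the input, and backward Euler is used for the transient analysis.

```python
import numpy as np

n, T, N = 20, 20.0, 400
h = T / N
t = np.linspace(0.0, T, N + 1)

# Hypothetical RC ladder in normalized units: C dv/dt + G v = B u(t).
G = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = np.eye(n)
B = np.zeros((n, 1))
B[n // 2, 0] = 1.0
u = np.interp(t, [0.0, 2.0, 5.0, 8.0, T], [0.0, 0.0, 1.0, 0.0, 0.0])  # PWL source


def transient(G, C, B, u, h):
    """Backward Euler from a zero initial state; returns all node waveforms."""
    v = np.zeros((G.shape[0], len(u)))
    lhs = C / h + G
    for k in range(1, len(u)):
        v[:, k] = np.linalg.solve(lhs, C @ v[:, k - 1] / h + B[:, 0] * u[k])
    return v


# Input spectrum by FFT; the q lowest frequency bins serve as sampling points.
q, r = 16, 8
U = np.fft.rfft(u[:-1])
omega = 2.0 * np.pi * np.fft.rfftfreq(N, d=h)
Z = np.column_stack([
    np.linalg.solve(1j * omega[k] * C + G, B[:, 0] * U[k])   # z_k^r of (10.12)
    for k in range(q)
])

# Dominant left singular vectors of the sampled responses form the projection.
Vr = np.linalg.svd(np.hstack([Z.real, Z.imag]), full_matrices=False)[0][:, :r]

v_full = transient(G, C, B, u, h)
v_red = Vr @ transient(Vr.T @ G @ Vr, Vr.T @ C @ Vr, Vr.T @ B, u, h)
rel_err = np.max(np.abs(v_full - v_red)) / np.max(np.abs(v_full))
print(f"reduced order {r} of {n}, max relative waveform error {rel_err:.2e}")
```

Because the columns of `Z` are scaled by the input spectrum, frequencies where the source carries little energy contribute little to the SVD, which is the practical effect of including $u(j\omega)$ in the response Gramian.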

3.2 The Presented Variational ETBR Method

Before introducing the presented method, we first give a new statistical interpretation of the Gramian computation.

3.2.1 Statistical Interpretation of Gramian

For a linear dynamic system formulated in state-space (MNA) equations as in (8.2), if the complex frequency $j\omega$ is a vector of random variables with uniform distribution in the frequency domain, then the state responses $V(j\omega) = (G + j\omega C)^{-1} B u(j\omega)$ become random variables in the frequency domain. Their covariance matrix can be computed as

$$X_r = E\{V(j\omega) V(j\omega)^{T}\} = \int_{-\infty}^{+\infty} V(j\omega) V(j\omega)^{T}\, d\omega, \tag{10.14}$$

where $E\{x\}$ stands for computing the mean of the random variable $x$, and $X_r$ is defined in (10.11). The response Gramian can essentially be viewed as the covariance matrix associated with the state responses. $X_r$ can also be interpreted as the mean of the function $P(j\omega)$ over evenly distributed random variables $j\omega$ in $[-\infty, +\infty]$.¹ The ETBR method actually performs the PCA transformation of the mentioned random process with uniform distribution.
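The PCA view can be checked numerically: the dominant eigenvectors of the sampled (uncentered) covariance matrix $\frac{1}{N} Z Z^{T}$ span the same subspace as the dominant left singular vectors of the sample matrix $Z$. The sketch below uses random real data in place of circuit responses, so the matrix `Z` is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 40))   # 40 hypothetical response samples, 6 states

# PCA route: eigenvectors of the sampled covariance-like matrix (1/N) Z Z^T.
X_hat = Z @ Z.T / Z.shape[1]
evals, evecs = np.linalg.eigh(X_hat)        # eigenvalues in ascending order
pca_basis = evecs[:, ::-1][:, :2]           # two dominant principal directions

# SVD route: left singular vectors of the sample matrix itself.
svd_basis = np.linalg.svd(Z, full_matrices=False)[0][:, :2]

# Orthogonal projectors are invariant to the sign of each basis vector,
# so they are the right objects to compare.
gap = np.linalg.norm(pca_basis @ pca_basis.T - svd_basis @ svd_basis.T)
print(f"difference between the two projectors: {gap:.2e}")
```

This is why the projection matrix can be taken directly from an SVD of the sampled responses without ever forming the Gramian estimate.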

3.2.2 Computation of Variational Response Gramian

Define $P(j\omega) = V(j\omega) V(j\omega)^{T}$. Now suppose that, in addition to the frequency variable $j\omega$, $P(j\omega, \xi)$ is also a function of the random variable $\xi$ with probability density $f(\xi)$. The new variational response Gramian $X_{vr}$ can be defined as

$$X_{vr} = \int_{s_\xi} \int_{-\infty}^{+\infty} f(\xi) P(j\omega, \xi)\, d\omega\, d\xi = E\{P(j\omega, \xi)\}, \tag{10.15}$$

where $s_\xi$ is the domain of variable $\xi$ with a specific distribution. Hence, $X_{vr}$ is essentially the mean of $P(j\omega, \xi)$ with respect to both $j\omega$ and $\xi$. The concept can be extended to more random variables $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$, and each variable $\xi_i$ adds one more dimension of integration to the integral.

¹ Practically, the frequency range of interest is always bounded.

As a result, calculating the variational Gramian is equivalent to computing the multidimensional integral in (10.15), which can be computed by numerical quadrature methods. For one-dimensional integration, efficient methods like the Gaussian quadrature rule [173] exist. For a multidimensional integral, quadrature points are created by taking tensor products of one-dimensional quadrature points. Unfortunately, the number of such points grows exponentially with the number of variables (dimensions), which makes the integration intractable for practical problems [165].
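The exponential growth of tensor-product quadrature can be seen in a few lines. The sketch below builds a tensor-product Gauss-Legendre rule on $[-1, 1]^d$ and counts the integrand evaluations; the integrand is a hypothetical test function chosen so that the exact value is known, and the helper name `tensor_quadrature` is introduced here for illustration only.

```python
import itertools

import numpy as np


def tensor_quadrature(f, n_points, n_vars):
    """Integrate f over [-1, 1]^n_vars with a tensor-product Gauss-Legendre rule.

    Returns the integral estimate and the number of integrand evaluations.
    """
    x, w = np.polynomial.legendre.leggauss(n_points)
    total, count = 0.0, 0
    for idx in itertools.product(range(n_points), repeat=n_vars):
        idx = list(idx)
        total += np.prod(w[idx]) * f(x[idx])   # product weight times f(node)
        count += 1
    return total, count


def f(p):
    """Hypothetical integrand: product of x_i^2, exact integral (2/3)^d."""
    return float(np.prod(p ** 2))


for d in (1, 2, 4, 6):
    value, count = tensor_quadrature(f, 3, d)
    print(d, count, value, (2.0 / 3.0) ** d)
```

With only three points per dimension the rule already needs $3^d$ evaluations, and in the Gramian setting each evaluation is a full linear solve, which is what motivates the sampling-based alternative discussed next.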

Practically, established techniques like MC or quasi-MC are more amenable to computing these integrals [173], as their computational costs do not depend on the number of variables (integral dimensions). In this chapter, we apply the standard MC method to compute the variational Gramian $X_{vr}$. The MC estimation of (10.15) consists of sampling $N$ random points $x_i \in S$, where $S$ is the domain for both frequency and other variables, from a uniform distribution, and then computing the estimate as

$$\hat{X}_{vr} = \frac{1}{N} \sum_{i=1}^{N} P(x_i). \tag{10.16}$$
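A toy version of the estimator (10.16) is sketched below. The response function is a hypothetical stand-in, $v(x) = [1, x, x^2]^T$ with $x$ uniform on $[0, 1]$, chosen only because the exact mean of $P(x) = v(x) v(x)^T$ is known in closed form (entry $(i, j)$ equals $1/(i + j + 1)$ with zero-based indices).

```python
import numpy as np


def mc_gramian(sample_response, n_samples, rng):
    """Monte Carlo estimate (1/N) * sum_i P(x_i) with P(x) = v(x) v(x)^T."""
    acc = 0.0
    for _ in range(n_samples):
        v = sample_response(rng.uniform(0.0, 1.0))
        acc = acc + np.outer(v, v)
    return acc / n_samples


def toy_response(x):
    """Hypothetical response vector; not a circuit model."""
    return np.array([1.0, x, x * x])


# Exact mean of v v^T for x uniform on [0, 1].
exact = np.array([[1.0 / (i + j + 1) for j in range(3)] for i in range(3)])

rng = np.random.default_rng(0)
for n_samples in (100, 10_000):
    err = np.max(np.abs(mc_gramian(toy_response, n_samples, rng) - exact))
    print(n_samples, err)
```

The cost per sample is one evaluation of the response regardless of how many random variables the response depends on, which is the property the text relies on.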

The MC method has a slow convergence rate ($1/\sqrt{N}$) in general, although it can be improved to ($1/N$) by quasi-MC methods. But as observed by Phillips [134], the projection subspace constructed from the sampled points actually converges much faster than the value of $\hat{X}_{vr}$. As we are concerned with the projection subspace rather than the actual numerical values of $X_{vr}$, we need to draw only a small number of samples, as shown in the experimental results. The varETBR algorithm flow is shown in Fig. 10.2, where $\hat{G}(\xi) = V_r^{T} G(\xi) V_r$ and $\hat{C}(\xi) = V_r^{T} C(\xi) V_r$ stand for

$$\hat{G}(\xi) = V_r^{T} G_0 V_r + V_r^{T} G_1 V_r \xi_1 + \cdots + V_r^{T} G_M V_r \xi_M, \tag{10.17}$$

$$\hat{C}(\xi) = V_r^{T} C_0 V_r + V_r^{T} C_1 V_r \xi_1 + \cdots + V_r^{T} C_M V_r \xi_M. \tag{10.18}$$

The algorithm starts with the given power grid network and the number of samplings $q$, which are used for building the projection subspace. Then it computes the variational responses

$$z_k^{r} = \left[ s_k C(\xi_1^{k}, \ldots, \xi_M^{k}) + G(\xi_1^{k}, \ldots, \xi_M^{k}) \right]^{-1} B u(s_k, \xi_1^{k}, \ldots, \xi_M^{k})$$

at randomly chosen sampling points. Then we perform the SVD on $Z_r = [z_1^{r}, z_2^{r}, \ldots, z_q^{r}]$ to construct the projection matrix. After the reduction, we perform the MC-based statistical analysis to obtain the variational responses from $v(t) = V_r \hat{v}(t)$.

Fig. 10.2 Flow of varETBR

We remark that in both the ETBR flow (Fig. 10.1) and the varETBR flow (Fig. 10.2), we perform MC-like random sampling to obtain $q$ sampling points over the $(M+1)$-dimensional space spanned by the given frequency range and parameter spaces (for ETBR, sampling is on the given frequency range only). We note that the MC-based sampling method is also used in the PMTBR method [134].

Compared with existing approaches, varETBR offers several advantages and features. First, varETBR uses only MC sampling, so it is easy to implement and is very general in dealing with different variation distributions and large variation ranges. It is also more amenable to parallel computing, as each sampling in the frequency domain can be done in parallel. Second, it is very scalable for solving large networks with a large number of variables, as reduction is performed. Third, varETBR is more accurate over wide-band frequency ranges, as it samples over the frequency band (compared with the less accurate moment-matching-based EKS method). Last, it avoids the explicit moment representation of the input signals, leading to more accurate results than the EKS method when signals are rich in high-frequency components.


The varETBR algorithm has been implemented in Matlab and tested on an Intel quad-core workstation with 16 GB of memory under a Linux environment. The initial results of this chapter were published in [91, 92].


Table 10.1 Power grid (PG) benchmarks

Name     # of nodes  # of V sources  # of I sources
ibmpg1   30,638      14,308          10,774
ibmpg2   127,238     330             37,926
ibmpg3   851,584     955             201,054
ibmpg4   953,583     962             276,976
ibmpg5   1,079,310   539,087         540,800
ibmpg6   1,670,494   836,239         761,484

All the benchmarks are real PG circuits from IBM provided by [123], but the circuits in [123] are resistor-only circuits. For transient analysis, we need to add capacitors and transient input waveforms. As a result, we modified the benchmark circuits. First, we added one grounded capacitor on each node with a random value on the order of picofarads. Second, we replaced the DC current sources in the benchmark by PWL signals. The values of these signals are also randomly generated based on their original values in the DC benchmarks. We implemented a parser in Python to transform the SPICE-format benchmarks into Matlab format.

The summary of our transient PG benchmarks is shown in Table 10.1. We use the MNA formulation to set up the circuit matrices. To solve PG circuits with 1.6 million nodes efficiently in Matlab, an external linear solver package, UMFPACK [184], is used, which is linked with Matlab through a Matlab mexFunction.

We will compare varETBR with the MC method, first in accuracy and then in CPU times. In all the test cases, the number of samples used for forming the subspace in varETBR is 50, based on our experience. The reduced order is set to $p = 10$, which is sufficiently accurate in practice. Here we set the variation range, the ratio of the maximum variation value to the nominal value, to 10% and set the number of variables to 6 (2 for $G$, 2 for $C$, and 2 for $i$). $G(\xi)$ and $C(\xi)$ follow Gaussian distributions. $i(t, \xi)$, which models the leakage variations [39], follows a log-normal distribution.

varETBR is essentially a kind of reduced MC method. It inherits the merits of MC methods, which are less sensitive to the number of variables and can reflect the real distribution very accurately given a sufficient number of samples. The main disadvantage of MC is that it is too slow for large-scale circuits. varETBR first reduces the circuit to a small size while maintaining sufficient accuracy. Thus, varETBR can run the MC simulation on the reduced circuit very fast. Note that the reduction process is done only once during the simulation process.

To verify the accuracy of our varETBR method, we show the results of simulations on ibmpg1 (100 samples) and ibmpg6 (10 samples). Figures 10.3 and 10.4 show the results of varETBR and the pure MC method at the 1,000th node (named n1_20583_11663 in SPICE format) of ibmpg1 and at the 1,000th node (named n3_16800_9178400 in SPICE format) of ibmpg6, respectively. The circuit equations in MC are solved by Matlab.

The absolute errors and relative errors of ibmpg1 and ibmpg6 are shown in Figs. 10.5 and 10.6. We can see that the errors are very small and our varETBR is


[Figure: plot titled "Transient waveforms on node 1000 of ibmpg1"; x-axis: Time (s), from 0 to 2 × 10^−7; y-axis: Voltage (V); curves: varETBR, Monte Carlo]

Fig. 10.3 Transient waveform at the 1,000th node (n1_20583_11663) of ibmpg1 ($p = 10$, 100 samples). Reprinted with permission from [91] © 2010 Elsevier

[Figure: plot titled "Transient Waveforms on Node 1000 of ibmpg6"; x-axis: Time (s), from 0 to 2 × 10^−7; y-axis: Voltage (V); curves: varETBR, Monte Carlo]

Fig. 10.4 Transient waveform at the 1,000th node (n3_16800_9178400) of ibmpg6 ($p = 10$, 10 samples). Reprinted with permission from [91] © 2010 Elsevier


[Figure: two panels; (a) "Simulation errors of ibmpg1 (100 samples)", (b) "Simulation errors of ibmpg6 (10 samples)"; x-axis: Time (s), from 0 to 2 × 10^−7; y-axis: Voltage (V)]

Fig. 10.5 Simulation errors of ibmpg1 and ibmpg6. Reprinted with permission from [91] © 2010 Elsevier

[Figure: two panels; (a) "Relative errors of ibmpg1 (100 samples)", (b) "Relative errors of ibmpg6 (10 samples)"; x-axis: Time (s), from 0 to 2 × 10^−7; y-axis: Percentage]

Fig. 10.6 Relative errors of ibmpg1 and ibmpg6. Reprinted with permission from [91] © 2010 Elsevier

very accurate. Note that the errors are influenced not only by the variations but also by the reduced order. To increase the accuracy, we may increase the reduced order. In our tests, we set the reduced order to $p = 10$ for all the benchmarks.

Next, we compare the accuracy with MC in terms of the probability distributions, including means and variances. Figure 10.7 shows the voltage distributions of both varETBR and the original MC at the 1,000th node of ibmpg1 at $t = 50$ ns (there are 200 time steps between 0 ns and 200 ns in total). The simulation waveforms at $t = 50$ ns can also be seen in Fig. 10.3. Note that the results do not follow a Gaussian distribution, because $G(\xi)$ and $C(\xi)$ follow Gaussian distributions while $i(t, \xi)$ follows a log-normal distribution. From Fig. 10.7, we can see that not only are the means and the variances of varETBR and MC almost the same but so are their probability distributions.


[Figure: histogram titled "Distributions of voltages for Monte Carlo and varETBR"; x-axis: Voltages (V); y-axis: Number of events; markers at μ−3σ, μ, and μ+3σ; series: Monte Carlo, varETBR]

Fig. 10.7 Voltage distribution at the 1,000th node of ibmpg1 (10,000 samples) when $t = 50$ ns. Reprinted with permission from [91] © 2010 Elsevier

Table 10.2 CPU times (s) comparison of varETBR and Monte Carlo ($q = 50$, $p = 10$)

                  varETBR (s)           Monte Carlo
Test Ckts         Red. (s)   Sim. (s)   Sim. (s)
ibmpg1 (100)      23         14         739
ibmpg1 (10000)    23         1335       70719
ibmpg2 (10)       115        1.4        536
ibmpg3 (10)       1879       1.5        4973
ibmpg4 (10)       2130       1.3        5275
ibmpg5 (10)       1439       1.3        5130
ibmpg6 (10)       1957       1.5        6774

Finally, we compare the CPU times of varETBR and the pure Monte Carlo method. To assess the efficiency of varETBR in CPU time and memory, it is not necessary to run a large number of simulations for both varETBR and MC. We therefore run 10 or 100 samples for each benchmark, since the accuracy has already been shown. Although we run only a small number of samples, the speedup will be the same. Table 10.2 shows the actual CPU times of both varETBR (including FFT costs) and MC on the given set of circuits. The number of sampling points in the reduction is $q = 50$. The reduction order is $p = 10$. Table 10.3 shows the projected CPU times of varETBR (one-time reduction plus 10,000 simulations) and MC (10,000 samples).
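The projection from a few measured samples to 10,000 samples is simple arithmetic, sketched below under the assumption that simulation cost scales linearly with the number of samples while the reduction is paid once. The numbers are transcribed from three rows of Table 10.2; the function name `project` is introduced here for illustration.

```python
# (reduction s, varETBR simulation s, MC simulation s, samples run), Table 10.2
measured = {
    "ibmpg2": (115, 1.4, 536, 10),
    "ibmpg3": (1879, 1.5, 4973, 10),
    "ibmpg6": (1957, 1.5, 6774, 10),
}


def project(red, sim, mc, n_run, n_target=10_000):
    """Scale per-sample costs to n_target samples; the reduction is paid once."""
    scale = n_target / n_run
    t_var = red + sim * scale
    t_mc = mc * scale
    return t_var, t_mc, t_mc / t_var


for name, row in measured.items():
    t_var, t_mc, ratio = project(*row)
    print(name, round(t_var), round(t_mc), round(ratio))
```

Under this assumption the projected varETBR times and the speedups agree with Table 10.3 (for example about 1,515 s and 354× for ibmpg2). Scaling the 10-sample MC times of Table 10.2 by 1,000 gives values ten times larger than the Monte Carlo column printed in Table 10.3 for ibmpg2 to ibmpg6, whereas the printed speedups match the scaled values.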

In varETBR, the circuit model becomes much smaller after reduction, and we need to perform the reduction only once. varETBR is therefore much faster overall than


Table 10.3 Projected CPU times (s) comparison of varETBR and Monte Carlo ($q = 50$, $p = 10$, 10,000 samples)

Test Ckts  varETBR (s)  Monte Carlo (s)  Speedup
ibmpg1     1358         70719            53×
ibmpg2     1515         53600            354×
ibmpg3     3379         497300           1472×
ibmpg4     3430         527500           1538×
ibmpg5     2739         513000           1873×
ibmpg6     3457         677400           1960×

Table 10.4 Relative errors for the mean of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 ($q = 50$, $p = 10$, 10,000 samples) for different variation ranges and different numbers of variables

            Variation range
#Variables  var = 10%  var = 30%  var = 50%  var = 100%
M = 6       0.16%      0.08%      0.17%      0.21%
M = 9       0.16%      0.25%      0.08%      0.23%
M = 12      0.25%      0.07%      0.07%      0.28%
M = 15      0.15%      0.06%      0.05%      0.06%

Table 10.5 Relative errors for the variance of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 ($q = 50$, $p = 10$, 10,000 samples) for different variation ranges and different numbers of variables

            Variation range
#Variables  var = 10%  var = 30%  var = 50%  var = 100%
M = 6       0.27%      1.54%      1.38%      1.73%
M = 9       0.25%      0.67%      1.32%      1.27%
M = 12      0.42%      0.07%      0.68%      1.41%
M = 15      0.18%      1.11%      0.67%      2.14%

MC (up to 1,960×). Basically, the larger the original circuit is, the larger the speedup of varETBR will be. The reduction time is negligible compared with the total MC simulation time.

Note that we run the random simulation 10,000 times for ibmpg1, as shown in Table 10.2, to show the efficiency of varETBR in practice.

It can be seen that varETBR is very scalable. In practice, it is almost independent of the variation range and the number of variables. One possible reason is that varETBR already captures the most dominant subspaces even with a small number of samples (50 in our case), as explained in Sect. 3.

When we increase the variation range and the number of variables, the accuracy of varETBR is almost unchanged. Tables 10.4 and 10.5 show the mean and variance comparison between the two methods for 10,000 MC runs, where we increase the number of variables from 6 to 15 and the variation range from 10% to 100%. The tables show that varETBR is very insensitive to the number of variables and


Table 10.6 CPU times (s) comparison of StoEKS and varETBR ($q = 50$, $p = 10$) with 10,000 samples for different numbers of variables

           M = 5              M = 7              M = 9
Test Ckts  StoEKS  varETBR    StoEKS  varETBR    StoEKS  varETBR
ibmpg1     165     1315       572     1338       3748    1326
ibmpg2     1458    1387       –       1351       –       1377

variation range for a given circuit, ibmpg1, where simulations are run on 10,000 samples for both varETBR ($q = 50$, $p = 10$) and MC.

The variation range var is the ratio of the maximum variation value to the nominal value. So "var = 100%" means that the maximum variation value may be as large as the nominal value.

From Tables 10.4 and 10.5, we observe that varETBR is basically insensitive to the number of variables and the variation range. Here we use the same sampling size ($q = 50$) and reduced order ($p = 10$) for all combinations of number of variables and variation range. The computation cost of varETBR is also almost the same for different numbers of variables and different variation ranges. This is consistent with the observation in PMTBR [134]. One explanation for this insensitivity is that the subspace obtained even with a small number of samplings contains the dominant response Gramian subspaces for the wide parameter and frequency ranges.

Finally, to demonstrate the efficiency of varETBR, we compare it with a recently proposed similar approach, the StoEKS method [111], which employs Krylov subspace reduction with orthogonal polynomials, on the same suite of IBM circuits.

Table 10.6 shows the comparison results, where "–" means an out-of-memory error. StoEKS can only finish the smaller circuits ibmpg1 (30 k) and ibmpg2 (120 k), while varETBR can go through all the benchmarks (up to 1.6 M nodes) easily. The CPU time of StoEKS increases rapidly with the number of variables, and StoEKS fails to complete the computation as the variable count increases. For varETBR, the CPU time is independent of the number of variables and depends only on the reduced order and the number of samples used in the reduced MC simulation. Here we select the reduced order $p = 10$ and 10,000 samples, which are sufficient in practice to obtain an accurate probability distribution.

5 Summary

In this chapter, we have presented a new scalable statistical power grid analysis approach based on ETBR reduction techniques. The new method, called varETBR, performs reduction on the original system using variation-bearing subspaces before MC statistical transient simulation. Different from the varPMTBR method, both system and input source variations are considered in generating the projection subspace, which is obtained by sampling variational response Gramians. As a result, varETBR can reduce systems with many terminals, like power grid networks, while preserving variational information. After the reduction, MC-based statistical simulation is performed on the reduced system to obtain the statistical responses of the original system. Numerical examples show that varETBR can be 1,900× faster than the MC method and scales to very large power grid networks with large numbers of random variables and large variation ranges. varETBR is also much more scalable than StoEKS [111] on the IBM benchmark circuits.

Part IV
Statistical Interconnect Modeling and Extractions

Chapter 11
Statistical Capacitance Modeling and Extraction

1 Introduction

It is well accepted that process-induced variability has a huge impact on circuit performance in sub-100 nm VLSI technologies [120, 121]. Process variations have to be assessed in various VLSI design steps to ensure robust circuit design. Process variations consist of both systematic ones, which depend on patterns and other process parameters, and random ones, which have to be dealt with using stochastic approaches. Efficient capacitance extraction approaches using the boundary element method (BEM), such as FastCap [115], HiCap [164], and PHiCap [199], have been proposed in the past. To consider the impact of variations on the interconnects, one has to consider the RLC extraction processes of the three-dimensional structures modeling the interconnect conductors. In this chapter, we investigate the impact of geometry variations on the extracted capacitance.

Statistical extraction of capacitance considering process variations has been studied recently, and several approaches have been proposed [74, 87, 207, 208, 210] under different variational models. The method in [87] uses analytical formulas to consider the variations in capacitance extraction and has only first-order accuracy. The FastSies program considers the rough surface effects of the interconnect conductors [210]; it assumes only Gaussian distributions and has high computational costs. The method in [74] combines hierarchical extraction and PFA to solve the statistical capacitance extraction problem.

Recently, a capacitance extraction method using a collocation-based spectral stochastic method was proposed [205, 208]. This approach is based on the Hermite PC representation of the variational capacitance. It applies the numerical quadrature (collocation) method to compute the coefficients of the extracted capacitance in Hermite polynomial form, where the capacitance extraction process (solving the potential coefficient matrices) is performed many times (sampling). One of the major problems with this method is that many redundant operations are carried out, such as the setup of the potential coefficient matrices for each sample, which corresponds to solving one particular extraction problem. For second-order Hermite polynomials, the number of samples is $O(m^2)$, where $m$ is the number of variables. So if $m$ is large, the approach loses its efficiency advantage over the Monte Carlo method.

In this chapter, instead of using the numerical quadrature method, we use a different spectral stochastic method based on the Galerkin scheme. The Galerkin-based spectral stochastic method has been applied in the past to statistical interconnect modeling [35, 187] and to on-chip power grid analysis considering process variations [109–111]. The presented method, called StatCap [156], first transforms the original stochastic potential coefficient equations into a larger deterministic system (via the Galerkin-based method) and then solves it using an iterative method. It avoids the less efficient sampling process of the existing collocation-based extraction approach. As a result, the potential coefficient equations and the corresponding augmented system need to be set up only once, versus many times in the collocation-based sampling method. This can lead to a significant saving in CPU time. Also, the augmented potential coefficient system is sparse, symmetric, and low rank, which is further exploited by an iterative solver to gain extra speedup. To consider second-order effects, we derive the closed-form OPC for the capacitance integral equations directly in terms of the variational variables, without loss of speed compared with the linear model. Numerical examples show that the presented method, based on first-order and second-order effects, can deliver two orders of magnitude speedup over the collocation-based spectral stochastic method and many orders of magnitude over the MC method.

The highlights of the presented algorithm are as follows:

1. Proposing the Galerkin-based spectral stochastic method to solve the statistical capacitance extraction problem, where the Galerkin scheme (vs. the collocation method) is used to compute the coefficients of capacitance.

2. Deriving the closed-form Hermite polynomial coefficients for the potential coefficient matrices in both first-order and second-order forms.

3. Studying the augmented matrix properties and showing that the augmented matrix is still quite sparse, low rank, and symmetric.

4. Solving the augmented systems by the minimum residue conjugate gradient method [130] to take advantage of the sparsity, low rank, and symmetry of the augmented matrices.

5. Comparing with the existing statistical capacitance extraction methods based on the spectral stochastic collocation approach [208] and the MC method and showing the superiority of the presented method.

We remark that we have put less emphasis on acceleration techniques for the extraction process, such as the multipole scheme [115], the hierarchical methods [164, 199], and more sophisticated iterative solvers such as the generalized minimal residual method (GMRES) [149], which are key components of those methods. These are not the area where our major contributions are made. We believe that those existing acceleration techniques can significantly speed up the presented method, as they did for the deterministic problem. This is especially the case for the hierarchical approach [164]: the number of panels (and thus of random variables) can be considerably reduced, and the interactions between panels are constant. These are areas for our future investigation.


For an $m$-conductor system, the capacitance extraction problem based on the BEM formulation is to solve the following integral equation [118]:

$$\int_S \frac{1}{|\vec{x}_i - \vec{x}_j|}\,\rho(\vec{x}_j)\,da_j = v(\vec{x}_i), \tag{11.1}$$

where $\rho(\vec{x}_j)$ is the charge distribution on the surface at conductor $j$, $v(\vec{x}_i)$ is the potential at conductor $i$, and $\frac{1}{|\vec{x}_i - \vec{x}_j|}$ is the free space Green function.1 $da_j$ is the surface area on the surface $S$ of conductor $j$. $\vec{x}_i$ and $\vec{x}_j$ are point vectors. To solve for the capacitance from one conductor to the rest, we set that conductor's potential to one and the potentials of all other $m - 1$ conductors to zero. The resulting charges are the capacitances. The BEM divides the surfaces into $N$ small panels and assumes a uniform charge distribution on each panel, which transforms (11.1) into a linear algebraic equation:

$$P q = v, \tag{11.2}$$

where $P \in \mathbb{R}^{N \times N}$ is the potential coefficient matrix, $q$ is the vector of charges on the panels, and $v$ is the preset potential on each panel. By solving the above linear equation, we obtain all the panel charges (and thus the capacitance values). In the potential coefficient matrix $P$, each element is defined as

$$P_{ij} = \frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_j, \tag{11.3}$$

where $G(\vec{x}_i, \vec{x}_j) = \frac{1}{|\vec{x}_i - \vec{x}_j|}$ is the Green function of a point source at $\vec{x}_j$, $S_j$ is the surface of panel $j$, and $s_j$ is the area of panel $j$.

Process variations introduce conductor geometry variations, which are reflected in the fact that the sizes of the panels and the distances between panels become random variables. Here we assume that each panel is still a two-dimensional surface. These variations make each element in the capacitance matrix follow some random distribution. The problem we need to solve now is to derive this random distribution and then to effectively compute the mean and variance of the involved capacitance given the geometry randomness parameters.

1 Note that the scale factor $1/(4\pi\epsilon_0)$ can be ignored here to simplify the notation and is used in the implementation to give results in units of farads.
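As a rough illustration of the deterministic problem (11.2), the following sketch assembles $P$ for two parallel square plates and solves $Pq = v$. It is a minimal sketch under stated assumptions, not the implementation used in this chapter: every off-diagonal entry uses the point approximation $1/|\vec{x}_i - \vec{x}_j|$ between panel centers, the self term uses the closed-form potential at the center of a uniformly charged square of side $a$, namely $4\ln(1+\sqrt{2})/a$, and the scale factor $1/(4\pi\epsilon_0)$ is dropped as in the footnote. The function names and the plate geometry are hypothetical.

```python
import numpy as np

def plate_panels(n, side, z):
    """Centers of an n-by-n grid of square panels on a plate at height z."""
    a = side / n                                  # panel edge length
    c = (np.arange(n) + 0.5) * a
    xx, yy = np.meshgrid(c, c, indexing="ij")
    centers = np.column_stack([xx.ravel(), yy.ravel(), np.full(n * n, z)])
    return centers, a

def potential_matrix(centers, a):
    """Potential coefficient matrix with point off-diagonal entries (assumption)."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, 1.0)                   # placeholder, overwritten below
    P = 1.0 / dist
    # Self term: (1/s) * integral of 1/r over a square panel, seen from its center.
    np.fill_diagonal(P, 4.0 * np.log(1.0 + np.sqrt(2.0)) / a)
    return P

n, side, gap = 4, 1.0, 0.5                        # hypothetical geometry
c1, a = plate_panels(n, side, 0.0)
c2, _ = plate_panels(n, side, gap)
P = potential_matrix(np.vstack([c1, c2]), a)

# Conductor 1 at unit potential, conductor 2 grounded.
v = np.concatenate([np.ones(n * n), np.zeros(n * n)])
q = np.linalg.solve(P, v)

C11 = q[: n * n].sum()    # self capacitance (unscaled)
C12 = q[n * n :].sum()    # coupling capacitance (unscaled)
print(C11, C12)
```

Summing the panel charges per conductor gives one column of the capacitance matrix; repeating the solve with conductor 2 at unit potential gives the other column.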

In this chapter, we follow the variational model introduced in [74], where each point in panel $i$ is disturbed by a vector $\Delta n_i$ that has the same direction as the normal direction of panel $i$:

$$\vec{x}_i' = \vec{x}_i + \Delta n_i, \tag{11.4}$$

where the length of $\Delta n_i$ follows a Gaussian distribution, $|\Delta n_i| \sim N(0, \sigma^2)$. If the value is negative, the direction of the perturbation is reversed. The correlation between the random perturbations on the panels is governed by an empirical formulation such as the exponential model [212]:

$$\rho(r) = e^{-r^2/\eta^2}, \tag{11.5}$$

where $r$ is the distance between two panel centers and $\eta$ is the correlation length.

The most straightforward method is to use MC simulation to obtain the distributions, mean values, and variances of all the capacitances. But the MC method is extremely time consuming, as each sample run requires the formulation of the changed potential coefficient matrix $P$.

3 Presented Orthogonal PC-Based Extraction Method: StatCap

In this section, we present the new spectral-stochastic-based method, StatCap, which uses the OPC to represent random variables starting from the geometry parameters.

In the presented method, we first represent the variational potential coefficient matrix $P$ in a first-order form using the Taylor expansion. We then extend the method to handle second-order variations in Sect. 4.

3.1 Capacitance Extraction Using Galerkin-Based Method

Here the charge $q(\xi)$ in (11.2) is an unknown random variable vector (with normal distribution), and the potential coefficient equation becomes

$$P(\xi)\,q(\xi) = v, \tag{11.6}$$

where both $P(\xi)$ and $q(\xi)$ are in Hermite PC form. The coefficients can then be computed by using the Galerkin-based method in Sect. 3.4 of Chap. 2. The principle of orthogonality states that the best approximation of $v(\xi)$ is obtained when the error $\Delta(\xi)$, defined as

$$\Delta(\xi) = P(\xi)\,q(\xi) - v, \tag{11.7}$$

is orthogonal to the approximation. That is,

$$\langle \Delta(\xi), H_k(\xi) \rangle = 0, \quad k = 0, 1, \ldots, P, \tag{11.8}$$

where $H_k(\xi)$ are Hermite polynomials. In this way, we have transformed the stochastic analysis process into a deterministic one, in which we only need to compute the corresponding coefficients of the Hermite PC.

For illustration, consider two Gaussian variables $\xi = [\xi_1, \xi_2]$. Assuming that the charge vector on the panels can be written as a second-order ($p = 2$) Hermite PC, we have

$$q(\xi) = q_0 + q_1\xi_1 + q_2\xi_2 + q_3(\xi_1^2 - 1) + q_4(\xi_2^2 - 1) + q_5\,\xi_1\xi_2, \tag{11.9}$$

which will be solved by using the augmented potential coefficient matrices to be discussed in Sect. 3.3. Once the Hermite PC of $q(\xi)$ is known, the mean and variance of $q(\xi)$ can be evaluated trivially. For example, for one random variable, the mean and variance are calculated as

$$E(q(\xi)) = q_0, \qquad \mathrm{Var}(q(\xi)) = q_1^2\,\mathrm{Var}(\xi) + q_2^2\,\mathrm{Var}(\xi^2 - 1) = q_1^2 + 2q_2^2. \tag{11.10}$$

To account for correlations among the random variables, we apply PCA to transform the correlated variables into a set of independent variables.
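The moment formulas in (11.10) can be checked numerically. The sketch below is only an illustration with made-up scalar coefficients: it evaluates the closed-form mean and variance of a one-variable second-order Hermite PC and compares them with sample statistics from a plain Monte Carlo run. The two-variable formula follows from the norms $\langle \xi_i^2 \rangle = 1$, $\langle (\xi_i^2 - 1)^2 \rangle = 2$, and $\langle (\xi_1\xi_2)^2 \rangle = 1$ of the basis in (11.9).

```python
import numpy as np

def pc_moments_1d(q0, q1, q2):
    """Mean and variance of q0 + q1*xi + q2*(xi^2 - 1), xi ~ N(0, 1), as in (11.10)."""
    return q0, q1**2 + 2.0 * q2**2

def pc_moments_2d(q):
    """Mean and variance for the six-term expansion (11.9), q = [q0, ..., q5]."""
    q0, q1, q2, q3, q4, q5 = q
    return q0, q1**2 + q2**2 + 2.0 * q3**2 + 2.0 * q4**2 + q5**2

# Hypothetical coefficients, chosen only for the demonstration.
q0, q1, q2 = 1.5, 0.3, -0.1
mean, var = pc_moments_1d(q0, q1, q2)

# Monte Carlo cross-check with a fixed seed.
rng = np.random.default_rng(0)
xi = rng.standard_normal(400_000)
samples = q0 + q1 * xi + q2 * (xi**2 - 1.0)
print(mean, samples.mean())
print(var, samples.var())
```

The same bookkeeping applies entry by entry when each $q_k$ is a vector of panel charges.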

3.2 Expansion of Potential Coefficient Matrix

Specifically, each element in the potential coefficient matrix $P$ can be expressed as

$$P_{ij} = \frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_j, \tag{11.11}$$

where $G(\vec{x}_i, \vec{x}_j)$ is the free space Green function defined in (11.3).

Notice that if panel $i$ and panel $j$ are far away (their distance is much larger than the panel size), we can use the following approximation [74]:

$$P_{ij} \approx G(\vec{x}_i, \vec{x}_j), \quad i \neq j. \tag{11.12}$$

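Suppose the variation of panel $i$ can be written as $\Delta n_i = \delta n_i\,\vec{n}_i$, where $\vec{n}_i$ is the unit normal vector of panel $i$ and $\delta n_i$ is the scalar variation. Taking the Taylor expansion of the Green function gives

$$G(\vec{x}_i + \Delta n_i, \vec{x}_j + \Delta n_j) = \frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|} \tag{11.13}$$

$$= \frac{1}{|\vec{x}_i - \vec{x}_j|} + \nabla\frac{1}{|\vec{x}_i - \vec{x}_j|} \cdot (\Delta n_j - \Delta n_i) + O((\Delta n_i - \Delta n_j)^2). \tag{11.14}$$

From the free space Green function, we have

$$\nabla G(\vec{x}_i, \vec{x}_j) = \nabla\frac{1}{|\vec{x}_i - \vec{x}_j|} = \nabla\frac{1}{|\vec{r}|} = \frac{\vec{r}}{|\vec{r}|^3}, \tag{11.15}$$

$$\vec{r} = \vec{x}_i - \vec{x}_j. \tag{11.16}$$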
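The first-order expansion (11.13)–(11.16) can be sanity-checked numerically. The sketch below is illustrative only (the panel positions, normals, and perturbation sizes are made up): it compares the exact perturbed Green function with the linearized form $1/|\vec{r}| + (\vec{r}/|\vec{r}|^3)\cdot(\Delta n_j - \Delta n_i)$ and records the error as the perturbation shrinks. A second-order remainder should drop by roughly a factor of 100 for each tenfold reduction.

```python
import numpy as np

def green(xi, xj):
    """Free space Green function 1/|xi - xj| (scale factor omitted)."""
    return 1.0 / np.linalg.norm(xi - xj)

def green_first_order(xi, xj, dni, dnj):
    """Linearized Green function following (11.14)-(11.16)."""
    r = xi - xj
    rn = np.linalg.norm(r)
    return 1.0 / rn + np.dot(r, dnj - dni) / rn**3

# Hypothetical panel centers and unit normals.
xi = np.array([0.0, 0.0, 0.0])
xj = np.array([1.0, 2.0, 0.5])
ni = np.array([0.0, 0.0, 1.0])
nj = np.array([1.0, 0.0, 0.0])

errors = []
for delta in (1e-1, 1e-2, 1e-3):
    dni = delta * ni            # perturbation of panel i along its normal
    dnj = -0.5 * delta * nj     # perturbation of panel j, reversed direction
    exact = green(xi + dni, xj + dnj)
    approx = green_first_order(xi, xj, dni, dnj)
    errors.append(abs(exact - approx))

print(errors)
```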

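We first ignore the second-order terms so that the variation is in linear form. As a result, the potential coefficient matrix $P$ can be written as

$$P \approx P_0 + P_1 = \begin{pmatrix}
G(\vec{x}_1 + \Delta n_1, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_1 + \Delta n_1, \vec{x}_n + \Delta n_n) \\
G(\vec{x}_2 + \Delta n_2, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_2 + \Delta n_2, \vec{x}_n + \Delta n_n) \\
\vdots & \cdots & \vdots \\
G(\vec{x}_n + \Delta n_n, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_n + \Delta n_n, \vec{x}_n + \Delta n_n)
\end{pmatrix}, \tag{11.17}$$

where

$$P_0 = \begin{pmatrix}
G(\vec{x}_1, \vec{x}_1) & G(\vec{x}_1, \vec{x}_2) & \cdots & G(\vec{x}_1, \vec{x}_n) \\
G(\vec{x}_2, \vec{x}_1) & G(\vec{x}_2, \vec{x}_2) & \cdots & G(\vec{x}_2, \vec{x}_n) \\
\vdots & \vdots & \cdots & \vdots \\
G(\vec{x}_n, \vec{x}_1) & G(\vec{x}_n, \vec{x}_2) & \cdots & G(\vec{x}_n, \vec{x}_n)
\end{pmatrix},$$

$$P_1 = \begin{pmatrix}
0 & \cdots & \nabla G(\vec{x}_1, \vec{x}_n) \cdot (\Delta n_n - \Delta n_1) \\
\nabla G(\vec{x}_2, \vec{x}_1) \cdot (\Delta n_1 - \Delta n_2) & \cdots & \nabla G(\vec{x}_2, \vec{x}_n) \cdot (\Delta n_n - \Delta n_2) \\
\vdots & \cdots & \vdots \\
\nabla G(\vec{x}_n, \vec{x}_1) \cdot (\Delta n_1 - \Delta n_n) & \cdots & 0
\end{pmatrix}.$$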

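We can further write $P_1$ in the following form:

$$P_1 = V_1 \cdot N_1 \cdot J_1 - J_1 \cdot N_1 \cdot V_1, \tag{11.18}$$

$$J_1 = \begin{pmatrix}
0 & \nabla G(\vec{x}_1, \vec{x}_2) & \cdots & \nabla G(\vec{x}_1, \vec{x}_n) \\
\nabla G(\vec{x}_2, \vec{x}_1) & 0 & \cdots & \nabla G(\vec{x}_2, \vec{x}_n) \\
\vdots & \vdots & \cdots & \vdots \\
\nabla G(\vec{x}_n, \vec{x}_1) & \cdots & \nabla G(\vec{x}_n, \vec{x}_{n-1}) & 0
\end{pmatrix},$$

$$N_1 = \begin{pmatrix}
\vec{n}_1 & 0 & \cdots \\
0 & \vec{n}_2 & \cdots \\
\vdots & \cdots & \vdots \\
0 & \cdots & \vec{n}_n
\end{pmatrix}, \qquad
V_1 = \begin{pmatrix}
\delta n_1 & 0 & \cdots \\
0 & \delta n_2 & \cdots \\
\vdots & \cdots & \vdots \\
0 & \cdots & \delta n_n
\end{pmatrix},$$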

where $J_1$ and $N_1$ are vector matrices and $V_1$ is a diagonal matrix.

To deal with spatial correlation, $P_1$ can be further expressed as a linear combination of the dominant and independent variables

$$\xi = [\xi_1, \xi_2, \ldots, \xi_p] \tag{11.19}$$

obtained through the PCA operation. As a result, $V_1$ can be further expressed as

$$\begin{pmatrix}
\sum_{i=1}^{p} a_{1i}\xi_i & 0 & \cdots \\
0 & \sum_{i=1}^{p} a_{2i}\xi_i & \cdots \\
\vdots & \cdots & \vdots \\
\cdots & 0 & \sum_{i=1}^{p} a_{ni}\xi_i
\end{pmatrix}. \tag{11.20}$$

Finally, we can represent $P_1$ as

$$P_1 = \sum_i P_{1i}\,\xi_i, \tag{11.21}$$

where

$$P_{1i} = A_i \cdot N_1 \cdot J_1 - J_1 \cdot N_1 \cdot A_i \tag{11.22}$$

and

$$A_i = \begin{pmatrix}
a_{1i} & 0 & \cdots & 0 \\
0 & a_{2i} & \cdots & 0 \\
\vdots & \vdots & \cdots & \vdots \\
0 & \cdots & 0 & a_{ni}
\end{pmatrix}. \tag{11.23}$$
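The PCA step that produces the coefficients $a_{ki}$ in (11.20) can be sketched as follows. This is a minimal illustration under assumptions, not the chapter's implementation: the covariance of the scalar panel variations is taken as $\sigma^2\rho(r)$ with $\rho$ from (11.5), the dominant components are chosen by a retained-variance threshold (the `energy` parameter is a hypothetical knob), and the panel centers are a made-up row of 20 panels.

```python
import numpy as np

def pca_coefficients(centers, sigma, eta, energy=0.95):
    """Return (A, cov) with delta_n ~= A @ xi, xi independent N(0, 1).

    A[k, i] plays the role of a_{ki} in (11.20); cov[k, l] = sigma^2 * rho(r_kl).
    """
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    cov = sigma**2 * np.exp(-(d / eta) ** 2)       # correlation model (11.5)
    w, U = np.linalg.eigh(cov)                     # eigenvalues in ascending order
    order = np.argsort(w)[::-1]                    # sort descending
    w = np.clip(w[order], 0.0, None)               # guard against tiny negatives
    U = U[:, order]
    # Smallest p whose eigenvalues capture the requested share of the variance.
    p = int(np.searchsorted(np.cumsum(w) / w.sum(), energy) + 1)
    A = U[:, :p] * np.sqrt(w[:p])
    return A, cov

# Hypothetical geometry: 20 panel centers along a line, in units of wire width.
centers = np.column_stack([0.1 * np.arange(20), np.zeros(20), np.zeros(20)])
A, cov = pca_coefficients(centers, sigma=0.1, eta=2.0)
print(A.shape)   # (number of panels, number of retained variables p)
```

Column $i$ of `A` is the diagonal of $A_i$ in (11.23), so each $P_{1i}$ in (11.22) can be formed directly from it.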


3.3 Formulation of the Augmented System

Once the potential coefficient matrix is represented in the affine form shown in (11.21), we are ready to solve for the coefficients $P_{1i}$ by using the Galerkin-based method, which results in a larger system with augmented matrices and variables.

Specifically, for $p$ independent Gaussian random variables $\xi = [\xi_1, \ldots, \xi_p]$, there are $K = 2p + p(p-1)/2$ first- and second-order Hermite polynomials. $H_i(\xi)$, $i = 1, \ldots, K$, denotes each Hermite polynomial, with $H_1 = \xi_1, \ldots, H_p = \xi_p$. The variational charge vector $q(\xi)$ can then be written as

$$q(\xi) = q_0 + \sum_{i=1}^{K} q_i H_i(\xi), \tag{11.24}$$

where each $q_i$ is a vector associated with one polynomial. So the random linear equation can be written as

$$P q = \left(P_0 + \sum_{i=1}^{p} P_{1i} H_i\right)\left(q_0 + \sum_{i=1}^{K} q_i H_i\right) = v. \tag{11.25}$$

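Expanding the equation and taking the inner product with $H_i$ on both sides, we can derive a new linear system:

$$\left(W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i}\right) Q = V, \tag{11.26}$$

where $\otimes$ is the tensor product and

$$Q = \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_K \end{pmatrix}, \qquad
V = \begin{pmatrix} v \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{11.27}$$

and

$$W_i = \begin{pmatrix}
\langle H_i H_0 H_0 \rangle & \langle H_i H_0 H_1 \rangle & \cdots & \langle H_i H_0 H_K \rangle \\
\langle H_i H_1 H_0 \rangle & \langle H_i H_1 H_1 \rangle & \cdots & \langle H_i H_1 H_K \rangle \\
\vdots & \vdots & \langle H_i H_l H_m \rangle & \vdots \\
\langle H_i H_K H_0 \rangle & \langle H_i H_K H_1 \rangle & \cdots & \langle H_i H_K H_K \rangle
\end{pmatrix}, \tag{11.28}$$

where $\langle H_i H_l H_m \rangle$ represents the inner product of the three Hermite polynomials $H_i$, $H_l$, and $H_m$. The matrix $\left(W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i}\right)$ in (11.26) is called the augmented potential coefficient matrix. Since the $H_i$ are at most second-order polynomials, we can quickly calculate every element in $W_i$ with a lookup table (LUT) for any number of random variables.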
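To make the construction of (11.26)–(11.28) concrete, the sketch below builds the matrices $W_i$ for a small number of variables and assembles the augmented matrix with a Kronecker product. It is a hedged illustration rather than the chapter's implementation: the triple products are evaluated by tensor Gauss–Hermite quadrature instead of a lookup table, the basis ordering ($1$, $\xi_i$, $\xi_i^2 - 1$, $\xi_i\xi_j$) is an assumption, and the matrices `P0` and `P1_list` are made-up symmetric examples.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite_basis(p):
    """Basis H_0..H_K (K = 2p + p(p-1)/2) as functions of samples x of shape (n, p)."""
    basis = [lambda x: np.ones(x.shape[0])]
    for i in range(p):
        basis.append(lambda x, i=i: x[:, i])
    for i in range(p):
        basis.append(lambda x, i=i: x[:, i] ** 2 - 1.0)
    for i, j in itertools.combinations(range(p), 2):
        basis.append(lambda x, i=i, j=j: x[:, i] * x[:, j])
    return basis

def triple_products(p):
    """W[i, l, m] = <H_i H_l H_m> under the standard Gaussian measure."""
    nodes, weights = hermegauss(4)            # exact up to degree 7 per variable
    weights = weights / np.sqrt(2.0 * np.pi)  # normalize to a probability measure
    grid = np.array(list(itertools.product(nodes, repeat=p)))
    wts = np.prod(np.array(list(itertools.product(weights, repeat=p))), axis=1)
    H = np.array([h(grid) for h in hermite_basis(p)])
    return np.einsum("in,ln,mn,n->ilm", H, H, H, wts)

def augmented_matrix(P0, P1_list, W):
    """W_0 (x) P_0 + sum_i W_i (x) P_1i, as in (11.26)."""
    A = np.kron(W[0], P0)
    for i, P1i in enumerate(P1_list, start=1):
        A += np.kron(W[i], P1i)
    return A

p = 2
W = triple_products(p)

# Hypothetical 2-panel example matrices.
P0 = np.array([[2.0, 0.5], [0.5, 2.0]])
P1_list = [np.array([[0.0, 0.1], [0.1, 0.0]]),
           np.array([[0.0, -0.05], [-0.05, 0.0]])]
A = augmented_matrix(P0, P1_list, W)

# Right-hand side V = [v, 0, ..., 0] and solution Q = [q_0, ..., q_K].
v = np.array([1.0, 0.0])
V = np.concatenate([v, np.zeros(A.shape[0] - v.size)])
Q = np.linalg.solve(A, V)
q_mean = Q[: v.size]      # q_0 is the mean of the charge vector
print(W.shape, A.shape, q_mean)
```

For larger $p$ the tensor quadrature grid grows as $4^p$, which is why a lookup table of the few distinct triple products is the practical choice.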

We remark that the matrices $W_i$ are very sparse due to the nature of the inner product. As a result, their tensor products with $P_{1i}$ also lead to a very sparse augmented matrix in (11.26). We have the following observations regarding the structure of the $W_i$ and the augmented matrix:

1. Observation 1: $W_0$ is a diagonal matrix.

2. Observation 2: For the $W_i$ matrices, $i \neq 0$, all the diagonal elements are zero.

3. Observation 3: All $W_i$ are symmetric, and the resulting augmented matrix $W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i}$ is also symmetric.

4. Observation 4: If one element at position $(l, m)$ in $W_i$ is not zero, i.e., $W_i(l, m) \neq 0$, then the elements at the same position $(l, m)$ of $W_j$, $j \neq i$, must be zero. In other words,

$$W_i(l, m)\,W_j(l, m) = 0 \quad \text{when } i \neq j, \qquad \forall\, i, j = 1, \ldots, p \ \text{and} \ l, m = 1, \ldots, K.$$

This sparsity can save memory significantly, as we do not need to actually perform the tensor product shown in (11.26). Instead, we can add all the $W_i$ together and expand each element of the resulting matrix by the specific $P_{1i}$ during the solving process, as there is no overlap among the $W_i$ at any element position.

As the original potential coefficient matrix is quite sparse and low rank, the augmented matrix is also low rank. As a result, the sparsity, low rank, and symmetry can be exploited by iterative solvers to speed up the extraction process, as shown in the experimental results. In our implementation, the minimum residue conjugate gradient method [130] is used as the solver since the augmented system is symmetric.

4 Second-Order StatCap

In this section, we extend StatCap to consider second-order perturbations. We show the derivation of the coefficient matrix elements in second-order OPC from the geometric variables. As a result, the second-order potential coefficient matrix can be computed very quickly. In our second-order StatCap, we consider both the far-field and near-field cases when (11.11) is approximated.


4.1 Derivation of Analytic Second-Order Potential Coefficient Matrix

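Each element in the potential coefficient matrix $P$ can be expressed as

$$P_{ij} = \frac{1}{s_i s_j}\int_{S_i}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_i\,da_j
\approx \frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_j \tag{11.29}$$

$$\approx \frac{1}{s_i}\int_{S_i} G(\vec{x}_i, \vec{x}_j)\,da_i, \tag{11.30}$$

where $G(\vec{x}_i, \vec{x}_j)$ is the free space Green function defined in (11.3).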

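We assume the same definitions for $\Delta n_i$, $\delta n_i$, and $\vec{n}_i$ as in Sect. 3. If we consider both first-order and second-order terms, we have the following Taylor expansion of $P_{ij}$:

$$\begin{aligned}
P_{ij}(\Delta n_i, \Delta n_j)
&= P_{ij,0} + \nabla P_{ij} \cdot \Delta n_i + \nabla P_{ij} \cdot \Delta n_j \\
&\quad + \Delta n_j^T \nabla^2 P_{ij}\,\Delta n_j + \Delta n_i^T \nabla^2 P_{ij}\,\Delta n_i \\
&\quad + 2\,\Delta n_j^T \nabla^2 P_{ij}\,\Delta n_i + O((\Delta n_i - \Delta n_j)^3) \\
&\approx P_{ij,0} + \frac{\partial P_{ij}}{\partial n_i}\,\delta n_i + \frac{\partial P_{ij}}{\partial n_j}\,\delta n_j \\
&\quad + \frac{\partial^2 P_{ij}}{\partial n_i^2}\,\delta n_i^2 + \frac{\partial^2 P_{ij}}{\partial n_j^2}\,\delta n_j^2 + 2\,\frac{\partial^2 P_{ij}}{\partial n_i\,\partial n_j}\,\delta n_i\,\delta n_j.
\end{aligned} \tag{11.31}$$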

To deal with the spatial correlation, $\Delta n_i$ can be further expressed as a linear combination of the dominant and independent variables in (11.19) through the PCA operation. As a result,

$$\Delta n_i = \delta n_i\,\vec{n}_i = (a_{i1}\xi_1 + \cdots + a_{ip}\xi_p)\,\vec{n}_i, \tag{11.32}$$

where $a_{iL}$ is defined in (11.20). After that, $P$ is represented by a linear combination of Hermite polynomials:

$$P = P_0 + \sum_{L=1}^{p} P_{1L}\,\xi_L + \sum_{L=1}^{p} P_{2L}\,(\xi_L^2 - 1) + \sum_{L_1}\sum_{L_2,\,L_1 \neq L_2} P_{2L_1,L_2}\,\xi_{L_1}\xi_{L_2}, \tag{11.33}$$

where $P_{2L}$ is the coefficient corresponding to the first type of second-order Hermite polynomial, $\xi_L^2 - 1$, and $P_{2L_1,L_2}$ is the coefficient corresponding to the second type of second-order Hermite polynomial, $\xi_{L_1}\xi_{L_2}$ ($L_1 \neq L_2$).

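So for each element $P_{ij}$ in $P$, the coefficients of the orthogonal polynomials can be computed as follows:

$$P_{ij,1L} = a_{iL}\,\frac{\partial P_{ij}}{\partial n_i} + a_{jL}\,\frac{\partial P_{ij}}{\partial n_j}, \tag{11.34}$$

$$P_{ij,2L} = a_{iL}^2\,\frac{\partial^2 P_{ij}}{\partial n_i^2} + a_{jL}^2\,\frac{\partial^2 P_{ij}}{\partial n_j^2} + 2\,a_{iL}\,a_{jL}\,\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}, \tag{11.35}$$

$$P_{ij,2L_1,L_2} = 2\,a_{iL_1}a_{iL_2}\,\frac{\partial^2 P_{ij}}{\partial n_i^2} + 2\,a_{jL_1}a_{jL_2}\,\frac{\partial^2 P_{ij}}{\partial n_j^2} + 2\,(a_{iL_1}a_{jL_2} + a_{iL_2}a_{jL_1})\,\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}. \tag{11.36}$$

Hence, we need analytic expressions for the partial derivatives of $P_{ij}$ to obtain the coefficients of the Hermite polynomials. The details of the derivations for computing the derivatives used in (11.34)–(11.36) can be found in the appendix section.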

4.2 Formulation of the Augmented System

Similar to Sect. 3, once the potential coefficient matrix is represented in the affine form shown in (11.33), we are ready to solve for the coefficients $P_{1L}$, $P_{2L}$, and $P_{2L_1,L_2}$ by using the Galerkin-based method.

In this case, $P$ in (11.33) is rewritten as

$$P = P_0 + \sum_{i=1}^{p} P_{1i} H_i + \sum_{i=p+1}^{K} P_{2i} H_i. \tag{11.37}$$

So after considering the first-order and second-order Hermite polynomials in $P$, the random linear equation can be written as

$$P q = \left(P_0 + \sum_{i=1}^{p} P_{1i} H_i + \sum_{i=p+1}^{K} P_{2i} H_i\right)\left(q_0 + \sum_{i=1}^{K} q_i H_i\right) = v. \tag{11.38}$$

Expanding the equation and taking the inner product with $H_i$ on both sides, we can derive a new linear system:

$$\left(W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i} + \sum_{i=p+1}^{K} W_i \otimes P_{2i}\right) Q = V, \tag{11.39}$$


Table 11.1 Number of nonzero elements in W_i

i            i = 0    1 ≤ i ≤ p    p + 1 ≤ i ≤ 2p    2p + 1 ≤ i ≤ K
# Nonzero    K        2p + 2       p + 3             2p + 4

where $\otimes$ is the tensor product, $Q$ and $V$ are the same as in (11.27), and $W_i$ has the same definition as in (11.28).

Again, the matrix on the left-hand side of (11.39) is the augmented potential coefficient matrix for the second-order StatCap. Since the $H_i$ are at most second-order polynomials, we can still use a LUT to calculate every element in $W_i$ for any number of random variables.

Now we study the properties of the augmented potential coefficient matrix. We review the features and observations we made for the first-order StatCap.

For $W_i$, which is a $K \times K$ matrix with $K = p(p+3)/2$, the number of nonzero elements is shown in Table 11.1. From Table 11.1, we can see that the matrices $W_i$, $i = 1, \ldots, K$, are still very sparse. As a result, their tensor products with $P_{1i}$ and $P_{2i}$ still give rise to a sparse augmented matrix in (11.39).

For the four observations in Sect. 3 regarding the structure of $W_i$, $i = p + 1, \ldots, K$, and the augmented matrix, we find that all the observations are still valid except for Observation 2. As a result, all the efficient implementation and solving techniques mentioned at the end of Sect. 3 can be applied to the second-order method.


In this section, we compare the results of the presented first-order and second-order StatCap methods against the MC method and the SSCM method [208], which is based on the spectral stochastic collocation method. The StatCap methods have been implemented in Matlab 7.4.0. We use the minimum residue conjugate gradient method as the iterative solver. We also implement the SSCM method in Matlab using the sparse grid package [81, 82]. We do not use any hierarchical algorithm to accelerate the calculation of the potential coefficient matrix for either StatCap or SSCM. Instead, we use the analytic formula in [194] to compute the potential coefficient matrices.

All the experiments are carried out on a Linux system with Intel quad-core Xeon CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [21, 156].

We test our algorithm on six test cases. The specific running parameters for each test case are summarized in Table 11.2, where p is the number of dominant and independent random variables obtained through the PCA operation and MC # is the number of MC runs. The 2 × 2 bus is shown in Fig. 11.1, and the three-layer metal plane capacitance is shown in Fig. 11.2. In all the experiments, we set the standard deviation to 10% of the wire width and the correlation length $\eta$ to 200% of the wire width.

Table 11.2 The test cases and the parameter settings

            1 × 1 bus   2 × 2 bus   Three-layer   3 × 3 bus   4 × 4 bus   5 × 5 bus
Panel #     28          352         75            720         1,216       4,140
p           10          15          8             21          28          35
MC #        10,000      6,000       6,000         6,000       6,000       6,000

Fig. 11.1 A 2 × 2 bus. Reprinted with permission from [156] © 2010 IEEE

First, we compare the CPU times of the four methods. The results are shown in Table 11.3. In the table, StatCap(1st/2nd) refers to the presented first- and second-order methods, respectively. SP(X) is the speedup of the first-order StatCap over MC or SSCM. All capacitance values are in picofarads.

It can be seen that both the first- and second-order StatCap are much faster than both SSCM and the MC method. For large test cases, such as the 5 × 5 bus case, MC and SSCM run out of memory, but StatCap still works well. For all the cases, StatCap delivers about two orders of magnitude speedup over SSCM and three orders of magnitude speedup over the MC method. Notice that both SSCM and StatCap use the same random variables after PCA reduction.


Fig. 11.2 Three-layer metal planes. Reprinted with permission from [156] © 2010 IEEE

Table 11.3 CPU runtime (in seconds) comparison among MC, SSCM, and StatCap(1st/2nd)

Case                      MC #      MC          SSCM         StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
1 × 1 bus                 10,000    2,764       49.35        1.55           3.59           1,783    32
2 × 2 bus                 6,000     63,059      2,315        122            190            517      19
Three-layer metal plane   6,000     16,437      387          4.11           6.67           3,999    94
3 × 3 bus                 6,000     2.2 × 10^5  7,860        408            857            534      19
4 × 4 bus                 6,000     –*          3.62 × 10^4  1,573          6,855          260      23
5 × 5 bus                 6,000     –*          –            1.7 × 10^4     6.0 × 10^4     –        –

* – out of memory

We notice that both MC and SSCM need to compute the potential coefficient matrices each time the geometry changes. This computation can be significant compared to the CPU time of solving the potential coefficient equations. This is one of the reasons that SSCM and MC are much slower than StatCap, in which the augmented system needs to be set up only once.

Table 11.4 Capacitance mean value comparison for the 1 × 1 bus

        MC        SSCM      StatCap(1st)   StatCap(2nd)
C11     135.92    135.90    136.58         136.21
C12     −57.11    −57.01    −57.49         −57.27
C21     −57.11    −57.02    −57.49         −57.27
C22     135.94    135.69    136.58         136.21

Table 11.5 Capacitance standard deviation comparison for the 1 × 1 bus

        MC      SSCM    StatCap(1st)   StatCap(2nd)
C11     2.42    2.49    3.13           2.63
C12     1.71    1.74    2.02           1.86
C21     1.72    1.71    2.02           1.86
C22     2.51    2.52    3.19           2.63

Also, SSCM uses the sparse grid scheme to reduce the number of collocation points needed to derive the coefficients of the OPC. But the number of collocation points is still on the order of $O(m^2)$ for second-order Hermite polynomials, where $m$ is the number of variables. Thus, it requires $O(m^2)$ solutions for the different geometries. In our algorithm, we also consider the second-order Hermite polynomials, but we need to solve the augmented system only once. The solving process can be further improved by using more advanced solvers or acceleration techniques.

Next, we compare the accuracy. The statistics for the 1 × 1 bus case from the four algorithms are summarized in Tables 11.4 and 11.5 for the mean value and standard deviation, respectively. The parameter settings for each case are listed in Table 11.2. We make sure that SSCM and the first-order and second-order StatCap use the same number of random variables after the PCA operations.

From these two tables, we can see that first-order StatCap, second-order StatCap, and SSCM give similar results for both the mean value and the standard deviation compared with the MC method. For all the other cases, the numbers of MC simulations are as shown in Table 11.3, and similar experimental results are obtained. The maximum and average errors of the mean value and standard deviation for all the test cases are shown in Tables 11.6 and 11.7. Measured against the MC method, the second-order StatCap is more accurate than the first-order StatCap, while Table 11.3 shows that its run time remains of the same order as that of the first-order StatCap and is still much lower than that of SSCM and MC.
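As a worked check of how the error figures relate to the tabulated capacitances, the sketch below recomputes the maximum and average relative errors of the mean values for the 1 × 1 bus from Table 11.4. It assumes the error is the entry-wise relative deviation from the MC value, which is an assumption about the metric rather than a definition stated in the text. With that assumption, the StatCap rows reproduce the 1 × 1 bus entries of Table 11.6 (about 0.67%/0.57% and 0.28%/0.24%), and the SSCM row agrees to within the rounding of the tabulated capacitances.

```python
import numpy as np

# Mean capacitances for the 1 x 1 bus, copied from Table 11.4 (C11, C12, C21, C22).
mc = np.array([135.92, -57.11, -57.11, 135.94])
methods = {
    "SSCM":         np.array([135.90, -57.01, -57.02, 135.69]),
    "StatCap(1st)": np.array([136.58, -57.49, -57.49, 136.58]),
    "StatCap(2nd)": np.array([136.21, -57.27, -57.27, 136.21]),
}

def relative_errors(estimate, reference):
    """Maximum and average entry-wise relative error, in percent (assumed metric)."""
    err = 100.0 * np.abs((estimate - reference) / reference)
    return err.max(), err.mean()

results = {name: relative_errors(values, mc) for name, values in methods.items()}
for name, (max_err, avg_err) in results.items():
    print(f"{name:13s} max {max_err:.2f}%  avg {avg_err:.2f}%")
```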

6 Additional Notes

In this appendix section, we detail the derivations for computing the derivatives in (11.34)–(11.36).

Table 11.6 Error comparison of capacitance mean values among SSCM and StatCap (first- and second-order)

                                                  SSCM     StatCap(1st)   StatCap(2nd)
1 × 1 bus, MC(10,000) as standard
  Max err                                         0.19%    0.67%          0.28%
  Avg err                                         0.14%    0.57%          0.24%
2 × 2 bus, MC(6,000) as standard
  Max err                                         0.32%    0.49%          1.19%
  Avg err                                         0.15%    0.24%          0.89%
Three-layer metal plane, MC(6,000) as standard
  Max err                                         0.33%    0.81%          0.43%
  Avg err                                         0.11%    0.58%          0.11%
4 × 4 bus, SSCM as standard
  Max err                                         0        0.76%          0.35%
  Avg err                                         0        0.40%          0.09%
5 × 5 bus, StatCap(2nd) as standard
  Max err                                         –        0.59%          0
  Avg err                                         –        0.28%          0

First, we consider the scenario where panel $i$ and panel $j$ are far away (their distance is much larger than the panel size). In this case, the approximations in (11.12) and (11.13) are still valid. From the free space Green function, we have (11.15) and (11.16) for the first-order Hermite polynomials, and we have the following for the second-order Hermite polynomials:

$$P_{ij,0} = \frac{1}{|\vec{x}_i - \vec{x}_j|}, \tag{11.40}$$

$$\frac{\partial P_{ij}}{\partial n_i} = -\frac{\vec{r} \cdot \vec{n}_i}{|\vec{r}|^3}, \tag{11.41}$$

$$\frac{\partial P_{ij}}{\partial n_j} = \frac{\vec{r} \cdot \vec{n}_j}{|\vec{r}|^3}, \tag{11.42}$$

$$\frac{\partial^2 P_{ij}}{\partial n_i^2} = \frac{3(\vec{r} \cdot \vec{n}_i)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3}, \tag{11.43}$$


Table 11.7 Error comparison of capacitance standard deviations among SSCM and StatCap (first- and second-order)

                                                  SSCM      StatCap(1st)   StatCap(2nd)
1 × 1 bus, MC(10,000) as standard

  Max err                                         14.28%    12.98%         25.99%
  Avg err                                         6.11%     8.51%          6.04%
3-layer metal plane, MC(6,000) as standard
  Max err                                         23.32%    21.39%         11.75%
  Avg err                                         3.33%     10.35%         4.38%
4 × 4 bus, SSCM as standard
  Max err                                         0         25.7%          6.68%
  Avg err                                         0         16.1%          3.89%
5 × 5 bus, StatCap(2nd) as standard
  Max err                                         –         17.5%          0
  Avg err                                         –         7.92%          0

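$$\frac{\partial^2 P_{ij}}{\partial n_j^2} = \frac{3(\vec{r} \cdot \vec{n}_j)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3}, \tag{11.44}$$

$$\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i} = -\frac{3(\vec{r} \cdot \vec{n}_j)(\vec{r} \cdot \vec{n}_i)}{|\vec{r}|^5}. \tag{11.45}$$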
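The far-field derivatives (11.41)–(11.45) can be checked against central finite differences of the point kernel $1/|\vec{x}_i + \delta n_i\,\vec{n}_i - \vec{x}_j - \delta n_j\,\vec{n}_j|$. The sketch below is an illustrative check with made-up panel positions. It uses perpendicular unit normals, the configuration in which the mixed derivative of the point kernel reduces to the form in (11.45); the step size `h` is an arbitrary choice.

```python
import numpy as np

def kernel(xi, xj, ni, nj, di, dj):
    """Point kernel with panels i and j shifted by di, dj along their normals."""
    return 1.0 / np.linalg.norm(xi + di * ni - xj - dj * nj)

def analytic_derivatives(xi, xj, ni, nj):
    """Closed forms (11.41)-(11.45) evaluated at zero perturbation."""
    r = xi - xj
    R = np.linalg.norm(r)
    rni, rnj = r @ ni, r @ nj
    return {
        "d_i": -rni / R**3,
        "d_j": rnj / R**3,
        "d_ii": 3.0 * rni**2 / R**5 - 1.0 / R**3,
        "d_jj": 3.0 * rnj**2 / R**5 - 1.0 / R**3,
        "d_ji": -3.0 * rnj * rni / R**5,
    }

def finite_differences(xi, xj, ni, nj, h=1e-4):
    """Central finite-difference estimates of the same derivatives."""
    f = lambda di, dj: kernel(xi, xj, ni, nj, di, dj)
    return {
        "d_i": (f(h, 0) - f(-h, 0)) / (2 * h),
        "d_j": (f(0, h) - f(0, -h)) / (2 * h),
        "d_ii": (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2,
        "d_jj": (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2,
        "d_ji": (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2),
    }

# Hypothetical far-apart panels with perpendicular unit normals.
xi = np.array([0.0, 0.0, 0.0])
xj = np.array([1.0, 2.0, 0.5])
ni = np.array([0.0, 1.0, 0.0])
nj = np.array([1.0, 0.0, 0.0])

exact = analytic_derivatives(xi, xj, ni, nj)
approx = finite_differences(xi, xj, ni, nj)
for key in exact:
    print(key, exact[key], approx[key])
```

These five numbers per panel pair are all that (11.34)–(11.36) need in the far-field case.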

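Second, we consider the scenario where panel $i$ and panel $j$ are near each other (their distance is comparable with the panel size). In this case, the approximation in (11.12) is no longer accurate, and we must consider the general form in (11.29) and (11.30).

Since panel $i$ and panel $j$ are perpendicular to $\vec{n}_i$ and $\vec{n}_j$, respectively, for $\frac{\partial P_{ij}}{\partial n_j}$ and $\frac{\partial^2 P_{ij}}{\partial n_j^2}$, with (11.29), we have

$$\begin{aligned}
\frac{\partial P_{ij}}{\partial n_j}
&\approx \frac{\partial}{\partial n_j}\left[\frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_j\right]
= \frac{\partial}{\partial n_j}\left[\frac{1}{s_j}\int_{S_j} \frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\,da_j\right] \\
&= \frac{1}{s_j}\int_{S_j} \frac{\partial}{\partial n_j}\,\frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\,da_j
= \frac{1}{s_j}\int_{S_j} \frac{\vec{r} \cdot \vec{n}_j}{|\vec{r}|^3}\,da_j \\
&= \frac{\vec{r} \cdot \vec{n}_j}{s_j}\int_{S_j} \frac{1}{|\vec{r}|^3}\,da_j,
\end{aligned} \tag{11.46}$$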

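$$\begin{aligned}
\frac{\partial^2 P_{ij}}{\partial n_j^2}
&\approx \frac{\partial^2}{\partial n_j^2}\left[\frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\,da_j\right]
= \frac{\partial^2}{\partial n_j^2}\left[\frac{1}{s_j}\int_{S_j} \frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\,da_j\right] \\
&= \frac{1}{s_j}\int_{S_j} \frac{\partial^2}{\partial n_j^2}\,\frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\,da_j
= \frac{1}{s_j}\int_{S_j} \left[\frac{3(\vec{r} \cdot \vec{n}_j)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3}\right] da_j \\
&= \frac{3(\vec{r} \cdot \vec{n}_j)^2}{s_j}\int_{S_j} \frac{da_j}{|\vec{r}|^5} - \frac{1}{s_j}\int_{S_j} \frac{da_j}{|\vec{r}|^3}.
\end{aligned} \tag{11.47}$$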

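Similarly, with (11.30), we can further obtain

$$\frac{\partial P_{ij}}{\partial n_i}
\approx \frac{\partial}{\partial n_i}\left[\frac{1}{s_i}\int_{S_i} G(\vec{x}_i, \vec{x}_j)\,da_i\right]
= -\frac{\vec{r} \cdot \vec{n}_i}{s_i}\int_{S_i} \frac{1}{|\vec{r}|^3}\,da_i, \tag{11.48}$$

$$\frac{\partial^2 P_{ij}}{\partial n_i^2}
\approx \frac{\partial^2}{\partial n_i^2}\left[\frac{1}{s_i}\int_{S_i} G(\vec{x}_i, \vec{x}_j)\,da_i\right]
= \frac{3(\vec{r} \cdot \vec{n}_i)^2}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^5} - \frac{1}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^3}. \tag{11.49}$$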


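For $\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}$, we need to further consider two cases. First, when panel $i$ and panel $j$ are parallel, we have

$$\frac{\partial^2 P_{ij}}{\partial n_i^2} = \frac{\partial^2 P_{ij}}{\partial n_j^2} = -\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}. \tag{11.50}$$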

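Second, we consider the case where panel $i$ and panel $j$ are not parallel. Then we arrive at

$$\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}
= \frac{\partial}{\partial n_j}\left(\frac{\partial P_{ij}}{\partial n_i}\right)
\approx \frac{\partial}{\partial n_j}\left[-\frac{\vec{r} \cdot \vec{n}_i}{s_i}\int_{S_i} \frac{1}{|\vec{r}|^3}\,da_i\right]
= -\frac{\vec{r} \cdot \vec{n}_i}{s_i}\,\frac{\partial}{\partial n_j}\left[\int_{S_i} \frac{1}{|\vec{r}|^3}\,da_i\right]. \tag{11.51}$$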

Assume the conductors have rectangular geometries. Then two panels are either parallel or perpendicular. Since panel $i$ and panel $j$ are not parallel, these two panels are perpendicular.

Without loss of generality, we assume that panel $i$ is parallel to the xz-plane and panel $j$ is parallel to the yz-plane. Then it is easy to see that $\vec{n}_i = (0, 1, 0)$ and $\vec{n}_j = (1, 0, 0)$. Let $u_{kl}$, $k, l \in \{0, 1\}$, denote the four corners of panel $i$, with $(x_{ik}, y_i, z_{il})$ being the Cartesian coordinates of corner $u_{kl}$, and let the center of gravity be $(x_i, y_i, z_i)$. Let $t_{kl}$, $k, l \in \{0, 1\}$, denote the four corners of panel $j$, with $(x_j, y_{jk}, z_{jl})$ being the Cartesian coordinates of corner $t_{kl}$, and let the center of gravity be $(x_j, y_j, z_j)$.

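After that, (11.51) can be further reduced to

$$\begin{aligned}
\frac{\partial^2 P_{ij}}{\partial n_j\,\partial n_i}
&= \frac{y_j - y_i}{s_i}\,\frac{\partial}{\partial x_j}\left[\int_{x_{i0}}^{x_{i1}}\int_{z_{i0}}^{z_{i1}} \frac{dx\,dz}{|\vec{r}|^3}\right] \\
&= \frac{y_j - y_i}{s_i}\,\frac{\partial}{\partial x_j}\left[\int_{x_{i0} - x_j}^{x_{i1} - x_j}\left(\int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}\,'|^3}\right) dx\right] \\
&= \frac{y_j - y_i}{s_i}\left(\int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}_-|^3} - \int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}_+|^3}\right) \\
&= \frac{y_j - y_i}{s_i}\sum_{k=0}^{1}\sum_{l=0}^{1}\left(\frac{(-1)^{k+l+1}(z_{il} - z_j)}{(x_{ik} - x_j)^2 + (y_i - y_j)^2} \cdot \frac{1}{\sqrt{(x_{ik} - x_j)^2 + (y_i - y_j)^2 + (z_{il} - z_j)^2}}\right),
\end{aligned} \tag{11.52}$$

where

$$|\vec{r}| = \sqrt{(x - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2},$$

$$|\vec{r}\,'| = \sqrt{x^2 + (y_i - y_j)^2 + (z - z_j)^2},$$

$$|\vec{r}_+| = \sqrt{(x_{i1} - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2},$$

$$|\vec{r}_-| = \sqrt{(x_{i0} - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2}.$$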
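The last step of (11.52) relies on the closed-form integral $\int dz\,(A + (z - z_j)^2)^{-3/2} = (z - z_j)/\bigl(A\sqrt{A + (z - z_j)^2}\bigr)$ with $A = (x_{ik} - x_j)^2 + (y_i - y_j)^2$. The sketch below checks that step numerically for one made-up pair of perpendicular panels: it evaluates the double sum of (11.52) and compares it with the third line of (11.52), where the two $z$-integrals are computed by a simple midpoint rule. The geometry and the number of quadrature points are arbitrary choices for the illustration.

```python
import numpy as np

def z_integral_midpoint(A, z0, z1, zj, n=20000):
    """Midpoint-rule estimate of the integral of (A + (z - zj)^2)^(-3/2) over [z0, z1]."""
    h = (z1 - z0) / n
    z = z0 + (np.arange(n) + 0.5) * h
    return np.sum(h / (A + (z - zj) ** 2) ** 1.5)

def mixed_derivative_closed(x_i, z_i, y_i, center_j):
    """Double sum in the last line of (11.52); x_i and z_i hold the panel-i corner limits."""
    xj, yj, zj = center_j
    s_i = (x_i[1] - x_i[0]) * (z_i[1] - z_i[0])
    total = 0.0
    for k in (0, 1):
        A = (x_i[k] - xj) ** 2 + (y_i - yj) ** 2
        for l in (0, 1):
            dz = z_i[l] - zj
            total += (-1) ** (k + l + 1) * dz / (A * np.sqrt(A + dz**2))
    return (yj - y_i) / s_i * total

def mixed_derivative_numeric(x_i, z_i, y_i, center_j):
    """Third line of (11.52) with both z-integrals done numerically."""
    xj, yj, zj = center_j
    s_i = (x_i[1] - x_i[0]) * (z_i[1] - z_i[0])
    A_minus = (x_i[0] - xj) ** 2 + (y_i - yj) ** 2   # uses |r_-|
    A_plus = (x_i[1] - xj) ** 2 + (y_i - yj) ** 2    # uses |r_+|
    I_minus = z_integral_midpoint(A_minus, z_i[0], z_i[1], zj)
    I_plus = z_integral_midpoint(A_plus, z_i[0], z_i[1], zj)
    return (yj - y_i) / s_i * (I_minus - I_plus)

# Hypothetical panel i in the plane y = 0 and center of panel j.
x_i, z_i, y_i = (0.0, 1.0), (0.0, 1.0), 0.0
center_j = (1.5, 0.7, 0.4)

closed = mixed_derivative_closed(x_i, z_i, y_i, center_j)
numeric = mixed_derivative_numeric(x_i, z_i, y_i, center_j)
print(closed, numeric)
```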

7 Summary

In this chapter, we have introduced a statistical capacitance extraction method, called StatCap, for three-dimensional interconnects considering process variations. The presented method is based on the orthogonal polynomial method to represent the variational geometrical parameters in a deterministic way. We consider both first-order and second-order variational effects. The presented method avoids the sampling operations in the existing collocation-based spectral stochastic method. It solves an enlarged potential coefficient system to obtain the coefficients of the OPC for capacitance. StatCap needs to set up the augmented equation only once and can exploit the sparsity and low-rank property to speed up the extraction process. It can also consider second-order perturbation effects to generate more accurate quadratic variational capacitance. Numerical examples show that our method is two orders of magnitude faster than the recently proposed statistical capacitance extraction method based on the spectral stochastic collocation method and many orders of magnitude faster than the MC method for several practical interconnect structures.

Chapter 12 Incremental Extraction of Variational Capacitance

1 Introduction

Since the interconnect length and cross area are at different scales, the variational capacitance extraction is quite different between the on-chip [21, 205, 209] and the off-chip [34, 210] cases. The on-chip interconnect variation from the geometrical parameters, such as the width of one panel and the distance between two panels, is more dominant [21, 209] than the rough surface effect seen from the off-chip package trace. However, it is unknown how to leverage the stochastic process variation into the matrix-vector product (MVP) by the fast multipole method (FMM) [21, 34, 205, 209, 210]. Similar to dealing with the stochastic analog mismatch for transistors [133], a cost-efficient full-chip extraction needs to explore an explicit relation between the stochastic variation and the geometrical parameter such that the electrical property can show an explicit dependence on geometrical parameters. Moreover, the expansion by OPC with different collocation schemes [21, 34, 187, 196, 209] always results in an augmented and dense system equation. This significantly increases the complexity when dealing with a large-scale problem. The corresponding GMRES thereby needs to be designed in an incremental fashion to consider the update from the process variation. As a result, a scalable extraction algorithm similar to [77, 118, 163] is required to consider the process variation, with the new MVP and GMRES developed accordingly as well.

To address the aforementioned challenges, this chapter introduces a new technique [56], which contributes as follows. First, to reveal an explicit dependence on geometrical parameters, the potential interaction is represented by a number of geometrical moments (GMs). As such, the process variation can be further included by expanding the GMs with the use of orthogonal polynomial chaos (OPC); the expanded moments are called stochastic geometrical moments (SGMs) in this chapter. Next, with the use of the SGM, the process variation can be incorporated into a modified FMM algorithm that evaluates the MVP in parallel. Finally, an incremental GMRES method is introduced to update the preconditioner with different variations. Such a parallel and incremental full-chip capacitance extraction considering the stochastic variation is called piCAP. Parallel and incremental analyses are the two effective techniques in reducing computational cost. Experiments show that the presented method with stochastic polynomial expansion is hundreds of times faster than the MC-based method while maintaining a similar accuracy. Moreover, the parallel MVP in the presented method is up to 3× faster than the serial method, and the incremental GMRES in the presented method is up to 15× faster than nonincremental GMRES methods.

2 Review of GMRES and FMM Algorithms

2.1 The GMRES Method

The resulting potential coefficient matrix $P$ is usually dense in the BEM method in Sect. 2 of Chap. 11. As such, directly solving (11.2) would be computationally expensive. FastCap [118] applies an iterative GMRES method [149] to solve (11.2). Instead of performing an expensive LU decomposition of the dense $P$, GMRES first forms a preconditioner $W$ such that $W^{-1} \cdot P$ has a smaller condition number than $P$, which can accelerate the convergence of iterative solvers [150]. Take the left preconditioning as an example:

$$(W^{-1} \cdot P)\, q = W^{-1} \cdot b.$$

Then, using either ME [118], low-rank approximation [77], or the hierarchical-tree method [163] to efficiently evaluate the MVP for $(W^{-1} \cdot P)\, q_i$ ($q_i$ is the solution at the $i$-th iteration), the GMRES method minimizes the residual error

$$\min \; \| W^{-1} \cdot b - (W^{-1} \cdot P)\, q_i \|$$

iteratively until convergence.

Clearly, the use of GMRES requires a well-designed preconditioner and a fast MVP. In fact, FMM is able to accelerate the evaluation of the MVP with $O(N)$ time complexity, where $N$ is the number of variables. We introduce FMM first in what follows.
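The left-preconditioned iteration above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FastCap or piCAP implementation: the dense test matrix, the Jacobi choice $W = \mathrm{diag}(P)$, and the helper name `gmres` are all hypothetical. The MVP is passed in as a callable so that a fast evaluator such as an FMM could be substituted for the dense product.

```python
import numpy as np

def gmres(matvec, b, tol=1e-10, max_iter=None):
    """Minimal full (unrestarted) GMRES with zero initial guess.

    Returns (approximate solution, number of iterations used).
    """
    n = b.size
    max_iter = n if max_iter is None else max_iter
    beta = np.linalg.norm(b)
    Q = np.zeros((n, max_iter + 1))          # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))   # upper Hessenberg matrix
    Q[:, 0] = b / beta
    y = np.zeros(0)
    for k in range(max_iter):
        w = matvec(Q[:, k])                  # the only place the MVP is needed
        for j in range(k + 1):               # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)
        e1[0] = beta
        Hk = H[:k + 2, :k + 1]
        y = np.linalg.lstsq(Hk, e1, rcond=None)[0]   # minimize the residual
        residual = np.linalg.norm(Hk @ y - e1)
        if residual <= tol * beta or H[k + 1, k] < 1e-14:
            return Q[:, :k + 1] @ y, k + 1
        Q[:, k + 1] = w / H[k + 1, k]
    return Q[:, :max_iter] @ y, max_iter

# Hypothetical dense, badly scaled test matrix standing in for P.
n = 40
idx = np.arange(n)
P = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
P = P + np.diag(np.linspace(1.0, 200.0, n))
b = np.ones(n)

# Assumed preconditioner: W = diag(P), so W^{-1} is trivial to apply.
W_inv = np.diag(1.0 / np.diag(P))

q_plain, it_plain = gmres(lambda v: P @ v, b)
q_prec, it_prec = gmres(lambda v: W_inv @ (P @ v), W_inv @ b)
```

Printing `it_plain` and `it_prec` compares the iteration counts of the two runs on this particular test matrix.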

2.2 The Fast Multipole Method

The FMM was initially proposed to speed up the evaluation of long-ranged particle forces in the N-body problem [141, 193]. It can also be applied to iterative solvers by accelerating the calculation of the MVP [118]. Let us take the capacitance extraction problem as an example to introduce the operations in the FMM. In general, the FMM discretizes the conductor surface into panels and forms a cube with a finite height containing a number of panels. Then, it builds a hierarchical oct-tree of cubes and evaluates the potential interaction $P$ at different levels.

Fig. 12.1 Multipole operations within the FMM algorithm. Reprinted with permission from [56] © 2011 IEEE

Specifically, the FMM first assigns all panels to leaf cells/cubes and computes the MEs for all panels in each leaf cell. Then, the FMM calculates the multipole expansion of each parent cell using the expansions of its children cells (called M2M operations in the upward pass). Next, the local field expansions of the parent cells can be obtained by adding multipole expansions of well-separated parent cells at the same levels (called M2L operations). After that, the FMM descends the tree structure to calculate the local field expansion of each panel based on the local expansion of its parent cell (called L2L operations in the downward pass). All these operations are illustrated in Fig. 12.1.

In order to further speed up the evaluation of the MVP, the presented stochastic extraction has a parallel evaluation of $Pq$ with variations, which is discussed in Sect. 4, and an incremental preconditioner, which is discussed in Sect. 5. Both of these features depend on finding an explicit dependence between the stochastic process variation and the geometric parameters, which is discussed in Sect. 3.

3 Stochastic Geometrical Moment

With FMM, the complexity of evaluating the MVP $Pq$ can be reduced to $O(N)$ during the GMRES iteration. Since the spatial decomposition in FMM is geometrically dependent, it is helpful to express $P$ using GMs with an explicit geometry dependence. As a result, this can lead to an efficient recursive update (M2M, M2L, L2L) of $P$ on the oct-tree. The geometry dependence is also one key property to preserve in the presence of the stochastic variation. In this section, we first derive the geometrical moment and then expand it by stochastic orthogonal polynomials to calculate the potential interaction with variations.

3.1 Geometrical Moment

Process variation includes global systematic variations and local random variations. This chapter focuses on local random variations, or stochastic variations, which are more difficult to handle. Note that although there are many variation sources, without loss of generality, the chapter considers two primary geometrical parameters with stochastic variation for the purpose of illustration: panel distance ($d$) and panel width ($h$). Due to the local random variation, the width of the discretized panel, as well as the distance between panels, may show random deviations from the nominal value. Though there could exist a systematic correlation between $d$ and $h$ for each panel, PCA in Sect. 2.2 of Chap. 2 can be first applied to decouple those correlated parameters and hence potentially reduce the number of random variables. After the PCA for the global systematic variation, we focus on the more challenging part: the local random variation. With expansions in Cartesian coordinates, we can relate the potential interaction with the geometry parameter through GMs that can be extended to consider stochastic variations.

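Let the center of an observer cube be $r_0$ and the center of a source cube be $r_c$. We assume that the distance between the $i$th source panel and $r_c$ is a vector $\mathbf{r}$:

$$\mathbf{r} = r_x \vec{x} + r_y \vec{y} + r_z \vec{z}$$

with $|\mathbf{r}| = r$, and the distance between $r_0$ and $r_c$ is a vector $\mathbf{d}$:

$$\mathbf{d} = d_x \vec{x} + d_y \vec{y} + d_z \vec{z}$$

with $|\mathbf{d}| = d$.

In Cartesian coordinates ($x$-$y$-$z$), when the observer is outside the source region ($d > r$), a multipole expansion (ME) [9, 72] can be defined as

$$
\frac{1}{|\mathbf{r} - \mathbf{d}|}
= \sum_{p=0}^{\infty} \frac{(-1)^p}{p!}
  (\underbrace{\mathbf{r} \cdots \mathbf{r}}_{p})
  \underbrace{\cdot\, \cdots\, \cdot}_{p}
  \Big( \underbrace{\nabla \cdots \nabla}_{p} \frac{1}{d} \Big)
= \sum_{p=0}^{\infty} M_p
= \sum_{p=0}^{\infty} l_p(d)\, m_p(r),
\tag{12.1}
$$

by expanding $\mathbf{r}$ around $r_c$, where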


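$$
\begin{aligned}
l_0(d) &= \frac{1}{d}, & m_0(r) &= 1; \\
l_1(d) &= \frac{d_k}{d^3}, & m_1(r) &= -r_k; \\
l_2(d) &= \frac{3 d_k d_l}{d^5}, & m_2(r) &= \frac{1}{6} \left( 3 r_k r_l - \delta_{kl} r^2 \right); \\
&\;\;\vdots \\
l_p(d) &= \underbrace{\nabla \cdots \nabla}_{p} \frac{1}{d}, & m_p(r) &= \frac{(-1)^p}{p!} (\underbrace{\mathbf{r} \cdots \mathbf{r}}_{p}).
\end{aligned}
\tag{12.2}
$$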

Note that $d_k$, $d_l$ are the coordinate components of vector $\mathbf{d}$ in Cartesian coordinates, and $r_k$, $r_l$ are those of vector $\mathbf{r}$. $\nabla$ is the gradient (nabla) operator that takes the spatial derivative, $\delta_{kl}$ is the Kronecker delta function, and $(\mathbf{r} \cdots \mathbf{r})$ and $(\nabla \cdots \nabla \frac{1}{d})$ are rank-$p$ tensors with $x^{\alpha} y^{\beta} z^{\gamma}$ ($\alpha + \beta + \gamma = p$) components.

Assume that there is a spatial shift at the source-cube center $r_c$, for example, changing one child's center to its parent's center by $\mathbf{h}$ ($|\mathbf{h}| = c \cdot h$), where $c$ is a constant and $h$ is the panel width. This leads to the following transformation for $m_p$ in (12.2):

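$$
m_p' = (\underbrace{(\mathbf{r} + \mathbf{h}) \cdots (\mathbf{r} + \mathbf{h})}_{p})
= m_p + \sum_{q=1}^{p} \frac{p!}{q!\,(p-q)!}
  (\underbrace{\mathbf{h} \cdots \mathbf{h}}_{q})\, m_{p-q}.
\tag{12.3}
$$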
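The binomial structure of the shift in (12.3) can be checked numerically for low orders. The sketch below is an illustration only: it uses raw Cartesian moments (sums of $q$, $q\,r_k$, and $q\,r_k r_l$ over point charges) rather than the normalized $m_p$ of (12.2), and the helper names `raw_moments` and `shift_moments` are hypothetical.

```python
def raw_moments(charges, positions):
    """Raw Cartesian moments up to order 2 about the origin of `positions`."""
    m0 = sum(charges)
    m1 = [sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)]
    m2 = [[sum(q * r[k] * r[l] for q, r in zip(charges, positions))
           for l in range(3)] for k in range(3)]
    return m0, m1, m2

def shift_moments(m0, m1, m2, h):
    """Moments of the same charges after every r is replaced by r + h.

    Order p picks up all lower orders with binomial weights, e.g.
    (r + h)(r + h) = rr + hr + rh + hh.
    """
    s1 = [m1[k] + h[k] * m0 for k in range(3)]
    s2 = [[m2[k][l] + h[k] * m1[l] + m1[k] * h[l] + h[k] * h[l] * m0
           for l in range(3)] for k in range(3)]
    return m0, s1, s2

# Hypothetical charges with positions given relative to a child-cube center.
charges = [1.0, -0.5, 2.0]
child_positions = [(0.1, 0.2, -0.1), (-0.3, 0.0, 0.2), (0.25, -0.15, 0.05)]
h = (0.5, -0.5, 0.5)   # child center minus parent center

# M2M-style update: reuse the child moments instead of revisiting the charges.
shifted = shift_moments(*raw_moments(charges, child_positions), h)

# Reference: recompute the moments directly about the parent center.
parent_positions = [tuple(r[k] + h[k] for k in range(3)) for r in child_positions]
direct = raw_moments(charges, parent_positions)
```

The shifted and directly computed moments agree to rounding error, which is what lets a parent cube accumulate its moments from its children alone.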

Moreover, when the observer is inside the source region ($d < r$), a local expansion (LE) under Cartesian coordinates is simply achieved by exchanging $\mathbf{d}$ and $\mathbf{h}$ in (12.1):

$$
\frac{1}{|\mathbf{r} - \mathbf{h}|} = \sum_{p=0}^{\infty} L_p = \sum_{p=0}^{\infty} m_p(h)\, l_p(r).
\tag{12.4}
$$

Also, when there is a spatial shift of the observer-cube center $r_0$, the shift of moments $l_p(r)$ can be derived similarly to (12.3).

Clearly, $M_p$, $L_p$, and their spatial shifts show an explicit dependence on the panel width $h$ and panel distance $d$. For this reason, we call $M_p$ and $L_p$ GMs. As such, we can also express the potential coefficient

$$
4\pi\epsilon_0 \cdot P(h, d) \simeq
\begin{cases}
\sum_{p=0}^{\infty} M_p & \text{if } d > r, \\
\sum_{p=0}^{\infty} L_p & \text{otherwise,}
\end{cases}
\tag{12.5}
$$

as a geometry-dependent function $P(h, d)$ via GMs.
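The multipole branch of (12.5) can be checked numerically by truncating the sum after $p = 2$. The sketch below contracts $l_p$ with $m_p$ from (12.2) in closed form; it assumes that $\mathbf{d}$ points from the source center to the observer, so that the $p = 1$ term enters as $+\mathbf{r} \cdot \mathbf{d} / d^3$ (the sign of odd-order terms depends on that choice of direction). The function name and the sample vectors are hypothetical.

```python
import math

def multipole_terms(r, d):
    """Terms p = 0, 1, 2 of the expansion of 1/|d - r|, valid for |d| > |r|."""
    dn = math.sqrt(sum(x * x for x in d))
    r2 = sum(x * x for x in r)
    rd = sum(a * b for a, b in zip(r, d))
    M0 = 1.0 / dn                                  # l_0 m_0
    M1 = rd / dn ** 3                              # dipole term
    # (3 d_k d_l / d^5) contracted with (3 r_k r_l - delta_kl r^2) / 6
    M2 = (3.0 * rd * rd - r2 * dn * dn) / (2.0 * dn ** 5)
    return M0, M1, M2

r = (0.5, -0.3, 0.2)      # source point relative to the source-cube center
d = (10.0, 4.0, -3.0)     # observer relative to the source-cube center

exact = 1.0 / math.dist(d, r)
terms = multipole_terms(r, d)
err_monopole = abs(terms[0] - exact)
err_quadrupole = abs(sum(terms) - exact)
```

With $r/d \approx 0.055$ in this example, each additional order reduces the truncation error by roughly that ratio, which is why well-separated cubes need only a few moments.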


Moreover, assuming that local random variations are described by two random variables, $\xi_h$ for the panel width $h$ and $\xi_d$ for the panel distance $d$, the stochastic forms of $M_p$ and $L_p$ become

$$\hat{M}_p(\xi_h, \xi_d) = M_p(h_0 + h_1 \xi_h,\; d_0 + d_1 \xi_d),$$

$$\hat{L}_p(\xi_h, \xi_d) = L_p(h_0 + h_1 \xi_h,\; d_0 + d_1 \xi_d), \tag{12.6}$$

where $h_0$ and $d_0$ are the nominal values, and $h_1$ and $d_1$ define the perturbation range (% of nominal). Similarly, the stochastic potential interaction becomes $\hat{P}(\xi_h, \xi_d)$.

3.2 Orthogonal PC Expansion

By expanding the stochastic potential interaction $\hat{P}(\xi_h, \xi_d)$ with OPC, we can further derive the SGMs similarly to Sect. 4 of Chap. 11.

We use $n = 1$ as an example to illustrate the general expression in Sect. 4 of Chap. 11. First, the potential coefficient matrix $\hat{P}$ can be expanded with the first two Hermite polynomials by

$$\hat{P}(\xi) = P_0 \Phi_0(\xi) + P_1 \Phi_1(\xi) = P_0 + P_1 \xi.$$
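One way to obtain the two coefficients of such a first-order expansion is Galerkin projection with Gauss-Hermite quadrature. The sketch below is an assumption-laden illustration rather than the chapter's procedure: a scalar $1/d$ stands in for one entry of the potential matrix, the perturbation is a small Gaussian spread around the nominal distance, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite_pc_first_order(f, nominal, spread, n_quad=8):
    """Project f(nominal + spread * xi), xi ~ N(0, 1), onto Phi_0 = 1 and Phi_1 = xi."""
    x, w = hermegauss(n_quad)          # nodes/weights for weight exp(-x^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)       # normalize to the standard normal density
    vals = f(nominal + spread * x)
    c0 = np.sum(w * vals)              # E[f * Phi_0]
    c1 = np.sum(w * vals * x)          # E[f * Phi_1] / E[Phi_1^2], with E[Phi_1^2] = 1
    return c0, c1

# Stand-in for one potential entry: 1/d with d0 = 10 and a 5% spread.
p0, p1 = hermite_pc_first_order(lambda d: 1.0 / d, nominal=10.0, spread=0.5)

# Sanity case: a linear function is reproduced exactly by the first-order expansion.
a0, a1 = hermite_pc_first_order(lambda d: 2.0 * d + 1.0, nominal=10.0, spread=0.5)
```

Here `p0` stays close to the nominal value $1/d_0$, while `p1` is negative because the interaction weakens as the distance grows.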

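Then, the $W_k$ ($k = 0, 1$) matrix becomes

$$
W_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
W_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 0 \end{pmatrix},
$$

and the newly augmented coefficient system can be written as

$$
\begin{aligned}
\mathcal{P} &= W_0 \otimes P_0 + W_1 \otimes P_1 \\
&= \begin{pmatrix} P_0 & 0 & 0 \\ 0 & P_0 & 0 \\ 0 & 0 & P_0 \end{pmatrix}
 + \begin{pmatrix} 0 & P_1 & 0 \\ P_1 & 0 & 2P_1 \\ 0 & 2P_1 & 0 \end{pmatrix} \\
&= \begin{pmatrix} P_0 & P_1 & 0 \\ P_1 & P_0 & 2P_1 \\ 0 & 2P_1 & P_0 \end{pmatrix}.
\end{aligned}
\tag{12.7}
$$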
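The Kronecker structure of (12.7) can be sketched directly, together with a blockwise product that never forms the augmented matrix. The matrices $W_0$ and $W_1$ are copied from the text; the random symmetric blocks standing in for $P_0$ and $P_1$, and the function names, are hypothetical.

```python
import numpy as np

W0 = np.eye(3)
W1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 2.0],
               [0.0, 2.0, 0.0]])

def augmented_matrix(P0, P1):
    """Dense augmented system W0 (x) P0 + W1 (x) P1, for reference only."""
    return np.kron(W0, P0) + np.kron(W1, P1)

def augmented_matvec(P0, P1, Q):
    """Blockwise MVP; row j of Q is the coefficient vector q_j.

    Uses (A (x) B) vec(X) = vec(A X B^T) for row-major vec, so only
    products with the N x N blocks P0 and P1 are required.
    """
    return W0 @ Q @ P0.T + W1 @ Q @ P1.T

# Hypothetical small symmetric blocks.
rng = np.random.default_rng(0)
N = 4
P0 = rng.random((N, N))
P0 = P0 + P0.T
P1 = 0.1 * rng.random((N, N))
P1 = P1 + P1.T
Q = rng.random((3, N))

big = augmented_matrix(P0, P1)
dense_product = big @ Q.reshape(-1)
block_product = augmented_matvec(P0, P1, Q).reshape(-1)
```

Because the blockwise form needs only MVPs with $P_0$ and $P_1$, each of those can in principle be delegated to a fast evaluator.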

By solving for $q_0$, $q_1$, $\ldots$, and $q_n$, the Hermite polynomial expansion of the charge density can be obtained. In particular, the mean and the variance can be obtained from

$$E(q(\xi_d)) = q_0,$$

$$\mathrm{Var}(q(\xi_d)) = q_1^2\, \mathrm{Var}(\xi_d) + q_2^2\, \mathrm{Var}(\xi_d^2 - 1) = q_1^2 + 2 q_2^2.$$
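The closed-form mean and variance can be cross-checked against a small quadrature. The sketch below uses the three-point Gauss-Hermite rule for a standard normal variable, which is exact for polynomials up to degree five and therefore exact for the squared second-order expansion. The coefficient values and function names are hypothetical.

```python
import math

def pc_mean_variance(q0, q1, q2=0.0):
    """Mean and variance of q0 + q1*xi + q2*(xi^2 - 1) for xi ~ N(0, 1)."""
    return q0, q1 * q1 + 2.0 * q2 * q2

def quadrature_mean_variance(q0, q1, q2=0.0):
    """Same statistics from the 3-point Gauss-Hermite rule (exact to degree 5)."""
    nodes = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
    weights = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)
    vals = [q0 + q1 * x + q2 * (x * x - 1.0) for x in nodes]
    mean = sum(w * v for w, v in zip(weights, vals))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, vals))
    return mean, var

# Hypothetical expansion coefficients.
closed = pc_mean_variance(-2.78, 0.19, 0.01)
check = quadrature_mean_variance(-2.78, 0.19, 0.01)
```

The factor of two on $q_2^2$ comes from $\mathrm{Var}(\xi^2 - 1) = E[\xi^4] - 2E[\xi^2] + 1 = 2$.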

Fig. 12.2 Structure of augmented system in piCAP (axes: matrix column index and matrix row index, 0 to 60; blocks $P_0$ on the diagonal, $P_1$ and $2P_1$ off the diagonal)

Note that under a BEM formulation, the expanded terms $P_i$ are still dense. With a single plate example, we show the structure of the augmented system in (12.7) in Fig. 12.2.

Considering that the dimension of $\hat{P}$ is further augmented, solving the augmented system (11.25) would be expensive. In the following, we present a parallel FMM to reduce the cost of MVP evaluations in Sect. 4 and an incremental preconditioner to reduce the cost of GMRES evaluation in Sect. 5.

4 Parallel Fast Multipole Method with SGM

As discussed in Sect. 3, we need an efficient evaluation of the MVP $\mathcal{P} \cdot \mathcal{Q}$ for the augmented and dense system (11.25). The block structure of $\mathcal{P}$ can be utilized to simplify the evaluation of the MVP ($\mathcal{P} \cdot \mathcal{Q}$). In the framework of a parallel FMM, each product $P_{i,j} \cdot q_i$ ($q = q_0, q_1, \ldots, q_n$), that is, the MVPs of both the nominal values and their variations, can be efficiently evaluated at the block level before being summed to obtain the final $\mathcal{P} \cdot \mathcal{Q}$. Though the parallel FMM has been discussed before, such as in [201], the extension to deal with stochastic variation for capacitance extraction needs to be addressed in the context of SGMs. In the following, we illustrate the parallel FMM considering the process variation.

The first step of a parallel FMM evaluation is to hierarchically subdivide space in order to form the clusters of panels. This is accomplished by using a tree structure to represent each subdivision. We assume that there are $N$ panels at the finest (or bottom) level. Given depth $H$, we build an oct-tree with $H = \lceil \log_8 (N/n) \rceil$ by assigning $n$ panels to one cube. In other words, there are $8^H$ cubes at the bottom level. A parallel FMM further distributes a number of cubes into different processors to evaluate $\mathcal{P}$. The decomposition of the tasks needs to minimize the communication cost and balance the workload. In the following steps, the stochastic $\mathcal{P} \cdot \mathcal{Q}$ is evaluated in two passes: an upward pass for multipole expansions (MEs) and a downward pass for local expansions (LEs), both of which are further illustrated in detail below.

Fig. 12.3 The M2M operation in an upward pass to evaluate local interactions around sources (labels: center of leaf source, center of parent source)
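The depth rule and a balanced distribution of leaf cubes can be sketched as follows. The depth is computed with integer arithmetic to avoid floating-point rounding in $\log_8$. The contiguous, near-equal split across processors is only one simple assumed scheme, not necessarily the decomposition used in piCAP, and both function names are hypothetical.

```python
def octree_depth(num_panels, panels_per_cube):
    """Smallest H with 8**H * panels_per_cube >= num_panels, i.e. ceil(log8(N/n))."""
    depth = 0
    while (8 ** depth) * panels_per_cube < num_panels:
        depth += 1
    return depth

def partition_cubes(num_cubes, num_procs):
    """Contiguous, near-equal split of leaf-cube indices (assumed scheme)."""
    base, extra = divmod(num_cubes, num_procs)
    ranges, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)   # first `extra` ranks get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges

H = octree_depth(num_panels=5000, panels_per_cube=8)
leaf_cubes = 8 ** H
work = partition_cubes(leaf_cubes, num_procs=3)
```

A real decomposition would also account for communication, for example by keeping spatially neighboring cubes on the same processor.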

4.1 Upward Pass

The upward pass manages the computation during the source expansion, which is illustrated in Fig. 12.3.

It accumulates the multipole-expanded near-field interaction starting from the bottom level ($l = 0$). For each child cube (leaf) without variation (nominal contribution to $P_0$) at the bottom level, it first evaluates the stochastic geometrical moment with (12.1) for all panels in that cube. If each panel experiences a variation $\xi_d$ or $\xi_h$, it calculates $P_i(\xi) \cdot q$ ($i \neq 0$; $\xi = \xi_d, \xi_h$) by adding the perturbation $h_i \xi_h$ or $d_i \xi_d$ to consider different variation sources, and then evaluates the SGMs with (12.6).

After building the MEs for each panel, it traverses to the upper level to consider the contribution from parents as shown in Fig. 12.3. The moment of a parent cube can be efficiently updated by summing the moments of its eight children via an M2M operation. Based on (12.3), the M2M translates the children's $\hat{M}_p$ into their parents.

The M2M operations at different parents are performed in parallel since there is no data dependence. Each processor builds its own panels' SGMs while ignoring the existence of other processors.

4.2 Downward Pass

The potential evaluation for the observer is managed during a downward pass. At the $l$th level ($l > 0$), two cubes are said to be adjacent if they have at least one common vertex. Two cubes are said to be well separated if they are not adjacent at level $l$ but their parent cubes are adjacent at level $l - 1$. Otherwise, they are said to be far from each other. The list of all the well-separated cubes from one cube at level $l$ is called the interaction list of that cube.
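The definitions above translate directly into a brute-force construction of an interaction list. In the sketch below, a cube is identified by integer grid coordinates on its level, the parent is obtained by halving each coordinate, and a cube is treated as adjacent to itself; these indexing conventions and the function names are assumptions made for illustration.

```python
from itertools import product

def adjacent(a, b):
    """Cubes that share at least one vertex (or coincide): Chebyshev distance <= 1."""
    return max(abs(x - y) for x, y in zip(a, b)) <= 1

def parent(cube):
    """Grid coordinates of the parent cube one level up."""
    return tuple(x // 2 for x in cube)

def interaction_list(cube, side):
    """Well-separated cubes of `cube` on a level with `side` cubes per axis."""
    result = []
    for other in product(range(side), repeat=3):
        # Not adjacent on this level, but the parents are adjacent.
        if not adjacent(cube, other) and adjacent(parent(cube), parent(other)):
            result.append(other)
    return result

interior = interaction_list((3, 3, 3), side=8)
corner = interaction_list((0, 0, 0), side=8)
```

For an interior cube, the parents' neighborhood covers $6^3 = 216$ cubes, of which $3^3 = 27$ are adjacent, leaving 189 entries; a corner cube has $4^3 - 2^3 = 56$.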

From the top level $l = H - 1$, interactions from the cubes on the interaction list to one cube are calculated by an M2L operation at one level (the M2L operation at the top level is illustrated in Fig. 12.4). Assuming that a source-parent center $r_c$ is changed to an observer-parent's center $r_0$, this leads to an LE (12.4) using the ME (12.1) when exchanging $\mathbf{r}$ and $\mathbf{d}$. As such, the M2L operation translates the source's $\hat{M}_p$ into the observer's $\hat{L}_p$ for a number of source-parents on the interaction list of one observer-parent at the same level. Due to the use of the interaction list, the M2L operations have a data dependence that introduces overhead for a parallel evaluation.

After the M2L operation, interactions are further recursively distributed down to the children from their parents by an L2L operation (the converse of the upward pass, shown in Fig. 12.5). Assume that the parent's center $r_0$ is changed to the child's center $r_0'$ by a constant $\mathbf{h}$. Identical to the M2M update by (12.3), an L2L operation updates $\mathbf{r}$ by $\mathbf{r}' = \mathbf{r} + \mathbf{h}$ for all children's $\hat{L}_k$s. In this stage, all processors can perform the same M2L operation at the same time on different data. This perfectly employs the parallelism.

Finally, the FMM sums the L2L results for all leaves at the bottom level ($l = 0$) and tabulates the computed products $P_i \cdot q_j$ ($i, j = 0, 1, \ldots, n$). By summing up the products in order, the FMM returns the product $\mathcal{P} \cdot \mathcal{Q}^{(i)}$ in (11.25) for the next GMRES iteration.

4.3 Data Sharing and Communication

The total runtime complexity for the parallel FMM using stochastic GMs can be estimated by $O(N/B) + O(\log_8 B) + C(N, B)$, where $N$ is the total number of panels and $B$ is the number of processors used. The term $C(N, B)$ represents the communication or synchronization overhead.

Fig. 12.4 The M2L operation in a downward pass to evaluate interactions of well-separated source cube and observer cube (labels: center of parent source, center of parent observer)

Therefore, it is desirable to minimize the overhead of data sharing and communication during a parallel evaluation. In the presented parallel FMM implementation, the message-passing interface (MPI) is used for data communication and synchronization between multiple processors. We notice that data dependency mainly comes from the interaction list during M2L operations. In this operation, a local cube needs to know the ME moments from cubes in its interaction list. To design a task distribution with small latency between computation and communication, the implementation uses a complement interaction list and a prefetch operation.

As shown in Fig. 12.6, the complement interaction list (or dependency list) for the cube under calculation records cubes that require their ME moments to be listed within the shaded area. As such, the studied cube first anticipates which ME moments will be needed by other dependent cubes (such as Cube 0, ..., Cube k shown in Fig. 12.6). Then, it distributes the required ME moments to these cubes prior to the computation. From the point of view of these dependent cubes, they can “prefetch” the required ME moments and perform their own calculations without stalls. Therefore, the communication overhead can be significantly reduced.

Fig. 12.5 The L2L operation in a downward pass to sum all integrations (labels: center of leaf observer, center of parent observer)

Fig. 12.6 Prefetch operation in M2L (labels: cube under calculation; dependency list of Cube 0, Cube 1, ..., Cube k). Reprinted with permission from [56] © 2011 IEEE

5 Incremental GMRES

The parallel FMM presented in Sect. 4 provides a fast MVP for the fast GMRES iteration. As discussed in Sects. 2 and 3, another critical factor for a fast GMRES is the construction of a good preconditioner. In this section, to improve the convergence of the GMRES iteration during the extraction, we first present a deflated power iteration. Then, we introduce an incremental preconditioner in the framework of the deflated power iteration.

5.1 Deflated Power Iteration

The convergence of GMRES can be slow in the presence of degenerated small eigenvalues of the potential matrix $P$, as is the case for most extraction problems with fine meshes. Constructing a preconditioner $W$ to shift the eigenvalue distribution (spectrum) of the preconditioned matrix $W \cdot P$ can significantly improve the convergence [49]. This is one of the so-called deflated GMRES methods [166].

To avoid fully decomposing $P$, an implicitly restarted Arnoldi method from ARPACK¹ can be applied to find its first $K$ eigenvalues $[\lambda_1, \ldots, \lambda_K]$ and its $K$th-order Krylov subspace composed of the first $K$ eigenvectors $V_K = [v_1, \ldots, v_K]$, where

$$P V_K = V_K D_K, \quad V_K^T V_K = I. \tag{12.8}$$

Note that $D_K$ is a diagonal matrix composed of the first $K$ eigenvalues:

$$D_K = V_K^T P V_K = \mathrm{diag}[\lambda_1, \ldots, \lambda_K]. \tag{12.9}$$

Then, a corresponding spectrum preconditioner is formed:

$$W = I + \sigma (V_K D_K^{-1} V_K^T), \tag{12.10}$$

which leads to a shifted eigenspectrum using

$$(W \cdot P)\, v_i = (\sigma + \lambda_i)\, v_i, \quad i = 1, \ldots, K. \tag{12.11}$$

Note that $\sigma$ is the shifting value that leads to a better convergence. This method is called deflated power iteration. Moreover, as discussed below, the spectral preconditioner $W$ can be easily updated in an incremental fashion.
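The shift property in (12.11) can be verified on a small symmetric matrix. The sketch below substitutes a dense symmetric eigensolver for the implicitly restarted Arnoldi method and assumes that the "first $K$" eigenvalues are the $K$ smallest ones; the test matrix, the shift value, and the function name are hypothetical.

```python
import numpy as np

def spectral_preconditioner(P, K, sigma):
    """W = I + sigma * V_K D_K^{-1} V_K^T built from the K smallest eigenpairs of symmetric P."""
    lam, V = np.linalg.eigh(P)               # eigenvalues in ascending order
    lam_K, V_K = lam[:K], V[:, :K]
    # V_K / lam_K scales column j by 1 / lambda_j, i.e. V_K D_K^{-1}.
    W = np.eye(P.shape[0]) + sigma * (V_K / lam_K) @ V_K.T
    return W, lam_K, V_K

# Hypothetical symmetric positive definite stand-in for the potential matrix.
n = 20
idx = np.arange(n)
P = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))

sigma, K = 1.0, 3
W, lam_K, V_K = spectral_preconditioner(P, K, sigma)

# Each deflated eigenvalue moves from lambda_i to sigma + lambda_i.
shifted_ok = all(
    np.allclose(W @ P @ V_K[:, i], (sigma + lam_K[i]) * V_K[:, i])
    for i in range(K)
)
```

Only the inverse of the diagonal $D_K$ is needed, so forming $W$ is cheap once the $K$ eigenpairs are known.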

5.2 Incremental Precondition

The essence of the deflated GMRES is to form a preconditioner that shifts degenerated small eigenvalues. For a new $P'$ with updated $\delta P$, the distribution of the degenerated small eigenvalues changes accordingly. Therefore, given a preconditioner $W$ for the nominal system with the potential matrix $P^{(0)}$, it would be expensive to run another native Arnoldi iteration to form a new preconditioner $W'$ for a new $P'$ with updated $\delta P$ from $P^{(1)}, \ldots, P^{(n)}$. Instead, we show that $W$ can be incrementally updated as follows.

¹ http://www.caam.rice.edu/software/ARPACK/

If there is a perturbation $\delta P$ in $P$, the perturbation $\delta v_i$ of the $i$th eigenvector $v_i$ ($i = 1, \ldots, K$) can be given by [171]:

$$\delta v_i = V_i B_i^{-1} V_i^T\, \delta P\, v_i. \tag{12.12}$$

Note that $V_i$ is the subspace composed of

$$[v_1, \ldots, v_j, \ldots, v_K],$$

and $B_i$ is the perturbed spectrum

$$\mathrm{diag}[\lambda_i - \lambda_1, \ldots, \lambda_i - \lambda_j, \ldots, \lambda_i - \lambda_K]$$

($j \neq i$; $i, j = 1, \ldots, K$). As a result, $\delta V_K$ can be obtained similarly for the $K$ eigenvectors.

Assume that the perturbed preconditioner is $W'$:

$$W' = I + \sigma V_K' (D_K')^{-1} (V_K')^T = W + \delta W, \tag{12.13}$$

where

$$V_K' = V_K + \delta V_K, \quad D_K' = (V_K')^T P\, V_K'. \tag{12.14}$$

After expanding $V_K'$ by $V_K$ and $\delta V_K$, the incremental change in the preconditioner $W$ can be obtained by

$$\delta W = \sigma \left( E_K - V_K D_K^{-1} F_K D_K^{-1} V_K^T \right), \tag{12.15}$$

where

$$E_K = \delta V_K D_K^{-1} V_K^T + (\delta V_K D_K^{-1} V_K^T)^T, \tag{12.16}$$

and

$$F_K = \delta V_K^T V_K D_K + (\delta V_K^T V_K D_K)^T. \tag{12.17}$$

Note that all the above inverse operations only deal with the diagonal matrix $D_K$, and hence, the computational cost is low.
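The first-order eigenvector update in (12.12) can be compared with a full re-decomposition on a small example. The sketch below keeps all eigenvectors in the subspace (that is, $K$ equals the matrix size) so that the first-order formula is complete; the test matrices and the function name are hypothetical, and a dense eigensolver again stands in for the Arnoldi iteration.

```python
import numpy as np

def eigvec_perturbation(lam, V, dP, i):
    """First-order change of eigenvector i: V_i B_i^{-1} V_i^T dP v_i."""
    others = [j for j in range(len(lam)) if j != i]
    V_i = V[:, others]                      # all eigenvectors except v_i
    B_i = lam[i] - lam[others]              # diagonal of B_i
    return V_i @ ((V_i.T @ dP @ V[:, i]) / B_i)

# Hypothetical symmetric matrix with well-separated eigenvalues.
n = 5
P = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.05 * (np.ones((n, n)) - np.eye(n))
S = np.cos(np.add.outer(np.arange(n), np.arange(n)))   # symmetric pattern
dP = 1e-3 * S                                           # small perturbation

lam, V = np.linalg.eigh(P)
lam_new, V_new = np.linalg.eigh(P + dP)

errors_first_order, errors_no_update = [], []
for i in range(n):
    v_pred = V[:, i] + eigvec_perturbation(lam, V, dP, i)
    # Eigenvectors are defined up to sign; align the reference with v_i.
    v_ref = V_new[:, i] * np.sign(V_new[:, i] @ V[:, i])
    errors_first_order.append(np.linalg.norm(v_ref - v_pred))
    errors_no_update.append(np.linalg.norm(v_ref - V[:, i]))
```

The remaining error of the updated eigenvectors is second order in the perturbation, which is what makes an incremental preconditioner update plausible for small variation ranges.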

Since only one Arnoldi iteration is needed to construct the nominal spectral preconditioner $W$, the preconditioner can be updated efficiently whenever $\delta P$ changes. For example, $\delta P$ is different when one alters the perturbation range $h_1$ of the panel width or changes the variation type from panel width $h$ to panel distance $d$. We call this deflated GMRES method with the incremental preconditioner an iGMRES method.


For our problem in (11.25), we first analyze an augmented nominal system with

$$\mathcal{W} = \mathrm{diag}[W, W, \ldots, W],$$

$$\mathcal{P} = \mathrm{diag}[P^{(0)}, P^{(0)}, \ldots, P^{(0)}],$$

$$\mathcal{D}_K = \mathrm{diag}[D_K, D_K, \ldots, D_K],$$

$$\mathcal{V}_K = \mathrm{diag}[V_K, V_K, \ldots, V_K],$$

which are all block diagonal with $n$ blocks. Hence, there is only one preconditioning cost from the nominal block $P^{(0)}$. In addition, the variation contributes to the perturbation matrix by

$$
\delta \mathcal{P} =
\begin{pmatrix}
0 & P_{0,1} & \cdots & P_{0,n} \\
P_{1,0} & 0 & \cdots & P_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
P_{n,0} & P_{n,1} & \cdots & 0
\end{pmatrix}.
\tag{12.18}
$$

6 piCAP Algorithm

We further discuss how to apply iGMRES to the presented stochastic capacitance extraction in this part. For a full-chip extraction, simultaneously considering variations from all kinds of geometrical parameters would significantly increase model complexity, if at all possible. In this chapter, we study the stochastic variation contributed by each parameter individually in an incremental fashion. Together with the incremental GMRES discussed above, the computational cost can be dramatically reduced for a large-scale extraction.

6.1 Extraction Flow

The overall parallel extraction flow in piCAP is presented in Fig. 12.7. First, piCAP discretizes conductor surfaces into small panels and builds a hierarchical oct-tree of cubes, which are distributed among many processors. Then, it sets the potential of a certain conductor $j$ to 1 V while other conductors are grounded. After that, the spectrum preconditioner $\mathcal{W}$ is built according to the variational system $\mathcal{P}$ and updated partially for different variation sources. With the preconditioner, piCAP uses GMRES to solve the augmented linear system $\mathcal{P} \cdot \mathcal{Q} = \mathcal{B}$ iteratively until convergence. The parallel FMM described in Sect. 4 is performed to provide the MVP $\mathcal{P} \cdot \mathcal{Q}$ efficiently for GMRES. Finally, the variational capacitance $C_{ij}$ can be obtained by summing up the panel charges on conductor $i$.

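As an example, we can take the procedure for panel distance $d$. With the first-order OPC expansion and the inner product, we obtain the augmented potential coefficient matrix below:

$$
\begin{aligned}
\mathcal{P} &= \mathcal{P}^{(0)} + \delta \mathcal{P} \\
&= \begin{pmatrix} P_0 & 0 & 0 \\ 0 & P_0 & 0 \\ 0 & 0 & P_0 \end{pmatrix}
 + \begin{pmatrix} 0 & P_1 & 0 \\ P_1 & 0 & 2P_1 \\ 0 & 2P_1 & 0 \end{pmatrix} \\
&= \begin{pmatrix} P_0 & P_1 & 0 \\ P_1 & P_0 & 2P_1 \\ 0 & 2P_1 & P_0 \end{pmatrix}.
\end{aligned}
\tag{12.19}
$$

Fig. 12.7 Stochastic capacitance extraction algorithm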


Notice that the first-order OPC expansion is used here for illustration, and a higher order expansion can provide more accurate variance information.

With the spectrum preconditioner in Sect. 5, we can build $\mathcal{W}^{(0)}$ for $\mathcal{P}^{(0)}$ and $\delta \mathcal{W}$ for $\delta \mathcal{P}$. Thus, the preconditioner $\mathcal{W}$ for the augmented system can be written as

$$\mathcal{W} = \mathcal{W}^{(0)} + \delta \mathcal{W}. \tag{12.20}$$

Therefore, the preconditioned GMRES can be used to solve the linear system $\mathcal{P} \cdot \mathcal{Q} = \mathcal{B}$ with $\mathcal{W}$ as the preconditioner. In each iteration, the parallel FMM in Sect. 4 is invoked to provide the MVP $\mathcal{P} \cdot \mathcal{Q}$ quickly. More specifically, FMM first calculates the geometric moments for the potential coefficient $P_0$ in $\mathcal{P}^{(0)}$ with (12.5). Then, it introduces a certain perturbation range $d_1$ (% of nominal) to the panel distance $d$ and recalculates the geometric moments for $P_1$ in $\delta \mathcal{P}$ according to (12.9). With all geometric moments, FMM can evaluate $\mathcal{P}^{(0)}$ and $\delta \mathcal{P}$ and then return the final MVP $\mathcal{P} \cdot \mathcal{Q}$.

When GMRES converges, it yields the resultant vector $\mathcal{Q}_d = [q_0, q_1, \ldots, q_n]^T$, which contains the mean as well as the variance for the geometric parameter $d$ by

$$E(q(\xi_d)) = q_0,$$

$$\mathrm{Var}(q(\xi_d)) = q_1^2\, \mathrm{Var}(\xi_d) + q_2^2\, \mathrm{Var}(\xi_d^2 - 1) = q_1^2 + 2 q_2^2.$$

The above procedure can be similarly applied to calculate the variance and the mean for the geometrical parameter $h$. Clearly, the stochastic orthogonal expansion leads to an augmented system with perturbed blocks in the off-diagonal. It increases the computational cost for any GMRES method and remains an unresolved issue in the previous applications of the stochastic orthogonal polynomial [21, 34, 187, 209]. In addition, when the variation changes, the $\mathcal{P}$ matrix should be partially updated. Forming a new preconditioner to consider the augmented (11.26) would therefore be expensive.

Based on (12.15), we can do an incremental update of the preconditioner $\mathcal{W}$ to consider a new variation $P^{(i)}$ when changing the perturbation range of $h_i$ or $d_i$. Moreover, we can also make an incremental update of $\mathcal{W}$ when changing the variation type from $P^{(i)}(h)$ to $P^{(i)}(d)$. This can dramatically reduce costs when applying the deflated GMRES during the variational capacitance extraction. The same procedure can be easily extended for high-order expansions with stochastic orthogonal polynomials.

6.2 Implementation Optimization

The memory complexity of iGMRES limits its scalability to large-scale problems; it generally comes from two parts: memory consumption of the preconditioner and of the MVP. Moreover, there is a time complexity mainly from time-consuming LU and eigenvalue decompositions.

The first memory bottleneck is the $O(N^2)$ storage requirement of the preconditioner matrix. For example, a second-order expanded system contains $3N$ variables, where $N$ is the number of panels. This is expensive to maintain. Because each block of $P_{i,j}$ is a symmetric positive semi-definite matrix, we can prune some small off-diagonal entries, store half of them, and further apply a compressed sparse column (CSC) format to store the preconditioner matrix. This can reduce the cost to build and store the block-diagonal spectral preconditioner. Another memory bottleneck, for the MVP, is resolved by the intrinsic matrix-free property of FMM. This exploits the tree hierarchy to speed up the MVP evaluation with a cost of $O(N \log N)$ for both memory and CPU time. Thus, the presented FMM using SGMs can be efficiently used for large-sized variational capacitance extraction.

The time complexity stems mainly from the analysis of the preconditioner of the nominal system during the first time. A restarted Arnoldi method in ARPACK can be used to efficiently identify the first $K$ eigenvalues. This can significantly reduce the cost to $O(N)$. As a result, the computational cost to form the preconditioner is reduced even during the first time.


Based on the presented algorithm, a program has been developed for piCAP using C++ on Linux network servers with Xeon processors (2.4 GHz CPU and 2 GB memory). In this section, we first validate the accuracy of SGMs by comparing them with the MC integral. Then, we study the parallel runtime scalability when evaluating the potential interaction using MVP with charge. In addition, the incremental GMRES preconditioner is verified when compared to its nonincremental counterpart with total runtime. Finally, the spectral preconditioner is validated by analyzing the spectrum of the potential coefficient matrix. The initial results of this chapter were published in [53].

7.1 Accuracy Validation

To validate the accuracy of SGM by first-order and second-order expansions, we use two distant square panels as shown in Fig. 12.8. The nominal center-to-center distance $d$ is $d_0$, and the nominal panel width $h$ is $h_0$.

7.1.1 Orthogonal PC Expansion

First, we compare the accuracy of first-order and second-order OPC expansions against the exact values from the integration method. The $C_{ij}$ between these two panels is calculated with different methods as listed in Table 12.1. It can be observed that the second-order OPC expansion achieves higher accuracy than the first-order expansion when compared with the exact values from the integration method. Thus, a higher order OPC expansion can lead to a more accurate result but at a higher computational expense due to the larger-scale system.

Fig. 12.8 Two distant panels in the same plane (panels i and j of width h, separated by distance d; axes X (um), Y (um), Z (um))

Table 12.1 Accuracy comparison of two orthogonal PC expansions

2 panels, d0 = 25 μm, h0 = 5 μm
             First-order orthogonal PC   Second-order orthogonal PC   Integration
Cij (fF)     −2.7816                     −2.777                       −2.7769

2 panels, d0 = 15 μm, h0 = 2 μm
             First-order orthogonal PC   Second-order orthogonal PC   Integration
Cij (fF)     −1.669                      −1.6677                      −1.6677

7.1.2 Incremental Analysis

One possible concern is about the accuracy of the incremental analysis, which considers independent variation sources separately and combines their contributions to get the total variable capacitance. In order to validate it, we first introduce panel width variation (Gaussian distribution with perturbation range $h_1$) to panel $j$ in Fig. 12.8 and calculate the variable capacitance distribution. Then, panel distance variation $d_1$ is added to panel $j$ and the same procedure is conducted. As such, according to the incremental analysis, we can obtain the total capacitance as a superposition of the nominal capacitance and both variation contributions. Moreover, we introduce MC simulations (10,000 times) as the baseline, where both variations are introduced simultaneously. The comparison is shown in Table 12.2, and we can observe that the results from the incremental analysis achieve high accuracy.

Ideally, all variations would be considered simultaneously, but the dimension of the system can increase exponentially with the number of variations, and thus the complexity is prohibitive. As a result, when the variation sources are independent, it is possible and necessary to separate them by solving the problem with each variation individually.


Table 12.2 Incremental analysis versus MC method

2 panels, d0 = 10 μm, h0 = 2 μm, d1 = 30% d0, h1 = 30% h0
           Incremental analysis (fF)   MC (fF)    Error (%)
μ_Cij      −1.1115                     −1.1137    0.19
σ_Cij      0.11187                     0.11211    0.21

2 panels, d0 = 25 μm, h0 = 5 μm, d1 = 20% d0, h1 = 20% h0
           Incremental analysis (fF)   Monte Carlo (fF)   Error (%)
μ_Cij      −2.7763                     −2.7758            0.018
σ_Cij      0.19477                     0.194              0.39

Table 12.3 Accuracy and runtime (s) comparison between MC (3,000) and piCAP

2 panels, d0 = 7.07 μm, h0 = 1 μm, d1 = 20% d0
              MC         piCAP
Cij (fF)      −0.3113    −0.3056
Runtime (s)   2.6965     0.008486

2 panels, d0 = 11.31 μm, h0 = 1 μm, d1 = 10% d0
              MC         piCAP
Cij (fF)      −0.3861    −0.3824
Runtime (s)   2.694      0.007764

2 panels, d = 4.24 μm, h0 = 1 μm, d1 = 20% d0, h1 = 20%
              MC         piCAP
Cij (fF)      −0.2498    −0.2514
Runtime (s)   2.7929     0.008684

7.1.3 Stochastic Geometrical Moments

Next, the accuracy of the presented SGM-based method is verified with the same example in Fig. 12.8. To do so, we introduce a set of different random variation ranges with Gaussian distribution for the distance d and width h. For this example, the MC method is used to validate the accuracy of the SGMs.

First, the MC method calculates Cij 3,000 times, each time introducing a normally distributed random variation to the distance d. From these samples, we can evaluate the distribution of the variational capacitance, including its mean value μ and standard deviation σ.

Then, we introduce the same random variation to the geometric moments in (12.6) with the stochastic polynomial expansion. Because of the explicit dependence on geometrical parameters according to (12.1), we can efficiently calculate the Ĉij values. Table 12.3 shows the Cij value and runtime for the two approaches. The comparison shows that SGMs not only maintain high accuracy, with an average error of 1.8%, but are also up to 347× faster than the MC method.

Moreover, Fig. 12.9 shows the Cij distribution from MC (3,000 samples) under a 10% panel distance variation with Gaussian distribution. The mean and variance computed by piCAP are marked in the figure with dashed lines and fit the MC results very well.


[Figure: histogram titled "Distribution of Cij compare between two methods"; x-axis: Cij (pF); y-axis: number of occurrences; μ, μ−3σ, and μ+3σ are marked.]

Fig. 12.9 Distribution comparison between Monte Carlo and piCAP

7.2 Speed Validation

In this part, we study the runtime scalability using a few large examples to show the advantages of both the parallel FMM for MVP and the deflated GMRES with incremental preconditioners.

7.2.1 Parallel Fast Multipole Method

The four large examples comprise 20, 40, 80, and 160 conductors, respectively. For the two-layer example with 20 conductors, each conductor is of size 1 × 1 × 25 μm (width × thickness × length), and piCAP employs a uniform 3 × 3 × 50 discretization. Figure 12.10 shows its structure and surface discretization.

For each example, we use a different number of processors to calculate the MVP of P × q by the parallel FMM. Here we assume that only d has a 10% perturbation range with Gaussian distribution. As shown in Table 12.4, the runtime of the parallel MVP decreases markedly as more processors are involved. Owing to the use of the complement interaction list, the communication latency is largely reduced and the runtime scales well with the number of processors. In fact, the dependent list can eliminate the major communication overhead and achieve a further 1.57× speedup with four processors. Moreover, the total MVP runtime with four processors is about 3× faster on average than the runtime with a single processor.


Fig. 12.10 The structure and discretization of the two-layer example with 20 conductors. Reprinted with permission from [56] © 2011 IEEE

Table 12.4 MVP runtime (s)/speedup comparison for four different examples

#Wire      20               40               80               160
#Panels    12,360           10,320           11,040           12,480
1 proc     0.737515/1.0     0.541515/1.0     0.605635/1.0     0.96831/1.0
2 procs    0.440821/1.7×    0.426389/1.4×    0.352113/1.7×    0.572964/1.7×
3 procs    0.36704/2.0×     0.274881/2.0×    0.301311/2.0×    0.489045/2.0×
4 procs    0.273408/2.7×    0.19012/2.9×     0.204606/3.0×    0.340954/2.8×
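As a small worked check, the sketch below recomputes speedup and parallel efficiency from the runtimes listed in Table 12.4 and inverts Amdahl's law to estimate the parallel fraction implied by the four-processor results. The variable names are illustrative, and the Amdahl estimate is only a rough model that ignores communication overhead.

```python
# Runtimes (s) from Table 12.4, indexed by number of wires; entries are for 1-4 processors.
runtimes = {
    20:  [0.737515, 0.440821, 0.36704, 0.273408],
    40:  [0.541515, 0.426389, 0.274881, 0.19012],
    80:  [0.605635, 0.352113, 0.301311, 0.204606],
    160: [0.96831, 0.572964, 0.489045, 0.340954],
}

speedups = {w: [t[0] / x for x in t] for w, t in runtimes.items()}
efficiency = {w: [s / (p + 1) for p, s in enumerate(sp)] for w, sp in speedups.items()}

# Average four-processor speedup over the four examples.
avg_speedup_4 = sum(sp[3] for sp in speedups.values()) / len(speedups)

# Amdahl's law: S = 1 / ((1 - f) + f / p); solve for the parallel fraction f.
p = 4
parallel_fraction = (1.0 - 1.0 / avg_speedup_4) / (1.0 - 1.0 / p)

for w in runtimes:
    print(w, [round(s, 2) for s in speedups[w]], [round(e, 2) for e in efficiency[w]])
print("average 4-proc speedup:", round(avg_speedup_4, 2))
print("implied parallel fraction:", round(parallel_fraction, 2))
```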

It is worth mentioning that the MVP must be performed many times in an iterative solver such as GMRES. Hence, even a small reduction in MVP runtime can have a substantial impact on the total solution runtime, especially when the problem size increases rapidly.

7.2.2 Deflated GMRES

piCAP has been used to analyze three different structures, as shown in Fig. 12.11. The first is a plate of size 32 × 32 μm, discretized into 16 × 16 panels. The other two examples are a cubic capacitor and a 2 × 2 bus crossover structure.


[Figure: three panels, (a) plate, (b) cubic, (c) bus2x2.]

Fig. 12.11 Test structures: (a) plate, (b) cubic, and (c) crossover 2 × 2. Reprinted with permission from [56] © 2011 IEEE

Table 12.5 Runtime and iteration comparison for different examples

                                       Diagonal prec.       Spectral prec.
               #Panel    #Variable     #Iter    Time        #Iter    Time
Single plate   256       768           29       24.594      11       8.625
Cubic          864       2,592         32       49.59       11       19.394
Crossover      1,272     3,816         41       72.58       15       29.21

For each example, we obtain two stochastic equation systems in (12.19) by separately considering variations in the width h of each panel and in the centric distance d between two panels. Both have 20% perturbation ranges around their nominal values and follow a Gaussian distribution.

To demonstrate the effectiveness of the deflated GMRES with a spectral preconditioner, two different algorithms are compared in Table 12.5. The baseline algorithm (column "Diagonal prec.") constructs a simple preconditioner from the diagonal entries. As the fine mesh structure in the extraction usually introduces degenerate or small eigenvalues, such a preconditioning strategy within the traditional GMRES usually needs many more iterations to converge. In contrast, since the deflated GMRES employs the spectral preconditioner to shift the distribution of nondominant eigenvalues, it accelerates the convergence of GMRES and reduces the number of iterations. As shown in Table 12.5, the deflated GMRES consistently reduces the number of iterations, by about 3× on average.
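The effect of shifting small eigenvalues can be sketched with a generic spectral (deflation) preconditioner of the form M = I + U(λ_max T⁻¹ − I)Uᵀ, with T = UᵀAU. This is a textbook-style construction on a synthetic symmetric matrix, used here only as an assumption for illustration; it is not claimed to be the exact preconditioner of (12.15), and the matrix is not a potential coefficient matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 5

# Synthetic symmetric matrix with k tiny eigenvalues (stand-in for a fine-mesh system).
eigs = np.concatenate([np.logspace(-4, -3, k), np.linspace(1.0, 2.0, n - k)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T

# Eigenvectors of the k smallest eigenvalues span the deflation subspace.
w, V = np.linalg.eigh(A)   # eigenvalues returned in ascending order
U = V[:, :k]
T = U.T @ A @ U
lam_max = w[-1]

# Generic spectral preconditioner: moves the k small eigenvalues to lam_max
# and leaves the rest of the spectrum unchanged.
M = np.eye(n) + U @ (lam_max * np.linalg.inv(T) - np.eye(k)) @ U.T

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(M @ A)
print(f"condition number before: {cond_before:.3e}")
print(f"condition number after : {cond_after:.3e}")
```

With the small eigenvalues moved away from zero, the spectrum of the preconditioned matrix is far more compact, which is the property that lets GMRES converge in fewer iterations.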

7.2.3 Incremental Preconditioner

With the spectral preconditioner, an incremental GMRES can be designed easily to update the preconditioner when considering different stochastic variations. It often happens that the perturbation range of one geometry parameter changes, or that the variation type changes from one geometry parameter to another. As the system equation in (12.19) is augmented to 3× larger than the nominal system, it becomes computationally expensive to apply any nonincremental GMRES method whenever the variation changes. As shown by the experiments, the incremental preconditioning in the deflated GMRES can reduce the computational cost dramatically.

Table 12.6 Total runtime (s) comparison for the two-layer 20-conductor example by different methods

Discretization                           Total runtime (s)
w × t × l        #Panel    #Variable     Nonincremental    Incremental
3 × 3 × 7        2,040     6,120         419.438           81.375
3 × 3 × 15       3,960     11,880        3,375.205         208.266
3 × 3 × 24       6,120     18,360        –                 504.202
3 × 3 × 60       14,760    44,280        –                 7,584.674

As described in Sect. 5, iGMRES needs to construct the preconditioner only once, for the nominal system, and then updates it with perturbations from the matrix block P^(1). To verify the efficiency of this incremental preconditioning strategy, we apply two different perturbation ranges of h1 to the panels of the two-layer 20-conductor example shown in Fig. 12.10. Then, we compare the total runtime of iGMRES and GMRES, both with deflation. The results are shown in Table 12.6.

From Table 12.6, we can see that a nonincremental approach must reconstruct its preconditioner whenever the variations are updated, which is very time consuming. The presented iGMRES greatly reduces the CPU time spent constructing the preconditioner by only updating the nominal spectral preconditioner incrementally with (12.15). iGMRES achieves a speedup of up to 15× over the nonincremental algorithm, and only iGMRES can finish all large-scale examples up to 14,760 panels.

Moreover, we investigate the speedup that each technique contributes to the overall performance. Parallel MVP using FMM reduces the total runtime by 36% on average compared with its serial counterpart. Similarly, the spectral preconditioner reduces the total runtime by 27% on average, and the incremental preconditioner reduces it by 21% on average. Parallel MVP is therefore the most effective of these techniques for achieving speedup.

7.3 Eigenvalue Analysis

The spectral preconditioner can shift the eigenvalue distribution to improve the convergence of GMRES. In this section, we therefore compare the resulting spectrum with the nominal case to further verify the efficiency of the spectral preconditioner. We use a single plate as the experimental example and calculate the spectrum of the potential coefficient matrix P for the nominal and perturbed systems.


[Figure: eigenvalue (log scale) versus eigenvalue index for the nominal system, the perturbed system, and the preconditioned perturbed system.]

Fig. 12.12 The comparison of eigenvalue distributions (panel width as variation source)

7.3.1 Perturbed System with Width as a Variation Source

First, we study the spectrum of the nominal system without variation, which is shown as plus signs in Fig. 12.12. The eigenvalues are clearly not close to each other, which can lead to a large number of GMRES iterations.

We introduce panel width variation h to generate the perturbed system P(ξ) × q(ξ) = v. Here we assume that h has a 20% perturbation range. The eigenvalue distribution of the perturbed system, shown as circle signs in Fig. 12.12, can change dramatically from the nominal case and disperses over a larger area. Therefore, to speed up the convergence, we construct a spectral preconditioner as described in Sect. 5 and apply it to the perturbed system. The spectrum of the preconditioned perturbed system is shown as star signs in Fig. 12.12. It can be observed that the preconditioned system has a more compact eigenvalue distribution because the spectral preconditioner shifts the dispersed eigenvalues into a confined area.

Moreover, when the linear system is solved with an iterative solver such as GMRES, the convergence speed depends greatly on the eigenvalue distribution of the system matrix. With a more compact spectrum, the spectral preconditioner can dramatically accelerate the convergence of iGMRES in the presented method.


[Figure: eigenvalue (log scale) versus eigenvalue index for the nominal system, the perturbed system, and the preconditioned perturbed system.]

Fig. 12.13 The comparison of eigenvalue distributions (panel distance as variation source)

7.3.2 Perturbed System with Distance as a Variation Source

Similarly, we can introduce panel distance variation d into the nominal system to obtain the perturbed system P(ξ) × q(ξ) = v. Again, the distance d has a 20% perturbation range.

We plot the spectrum of the perturbed system with distance variation as circle signs in Fig. 12.13. Compared with the spectrum in Fig. 12.12, we find that panel width variation has more influence on the spectrum of the perturbed system than panel distance variation does. With spectral preconditioning, the spectrum becomes more compact, as shown by the star signs in Fig. 12.13. In fact, all eigenvalues of the preconditioned perturbed system are close to 0.2, which gives the system matrix a small condition number and thus leads to fast convergence of GMRES.

8 Summary

In this chapter, we introduced GMs to capture local random variations for full-chip capacitance extraction. Based on GMs, the stochastic capacitance can be calculated via OPC by FMM in a parallel fashion. As such, the complexity of the MVP needed to evaluate both nominal and stochastic values is largely reduced. Moreover, an incrementally preconditioned GMRES is developed to handle different types of variation updates, with improved convergence through spectrum deflation.


A number of experiments show that the presented approach is up to 347× faster than the MC-based evaluation of variation with similar accuracy, up to 3× faster than the serial method in MVP, and up to 15× faster than nonincremental GMRES methods. In detail, the observed speedup of the presented approach comes from two sources: the efficient parallel FMM and the non-MC evaluation by OPC. The potential speedup of a parallel algorithm is given by Amdahl's law. As FMM and OPC can be highly parallelized, the presented extraction can achieve significant speedups on parallel computing platforms. Note, however, that the spectral preconditioning is not parallelized. For example, the parallel MVP in FMM reduces the total runtime by 36% on average, while spectral preconditioning and incremental evaluation reduce the total runtime by 27% and 21% on average, respectively. As such, the parallel MVP contributes the largest runtime reduction. Moreover, we have also investigated how data sharing reduces communication overhead in the parallel implementation. The results show that a data-sharing technique such as the dependence list can eliminate the major communication overhead and achieve up to 1.57× speedup for the parallel MVP on four processors. Future work will extend the presented approach to general capacitance extraction with non-square-panel geometry.

Chapter 13
Statistical Inductance Modeling and Extraction

1 Introduction

A significant portion of process variations is purely random in nature [122]. As a result, variation-aware design methodologies and statistical computer-aided design (CAD) tools are widely believed to be the key to mitigating some of the challenges for 45 nm technologies and beyond [122, 148]. Variational considerations have to be incorporated into every step of the design and verification processes to ensure reliable chips and profitable manufacturing yields.

In this chapter, we investigate the impact of geometric variations on the extracted inductance (partial or loop). Parasitic extraction algorithms have been intensively studied in the past to estimate the resistance, capacitance, inductance, and susceptance of 3D interconnects [76, 118, 147, 211]. Many efficient algorithms, such as FastCap [118], FastHenry [76], and FastImp [211], were proposed based on the BEM or on volume discretization methods (for partial element equivalent circuit (PEEC)-based inductance extraction [147]). In the nanometer regime, circuit layouts will have significant variations, both systematic and random, coming from the fabrication process. Much recent research has addressed capacitance extraction under different variational models while considering process variations [74, 207, 208, 210]. However, less research has been done on variational inductance extraction.

We present a new statistical inductance extraction method called statHenry [143], based on a spectral stochastic collocation scheme. This approach is based on the Hermite PC representation of the variational inductance. statHenry applies the collocation idea: the inductance extraction process is performed many times at predetermined sampling positions so that the coefficients of the orthogonal polynomials of the variational inductance can be computed using the weighted least-squares method. The number of samples is O(m^2) for second-order Hermite polynomials, where m is the number of variables. If m is large, the approach will lose its efficiency compared to the MC method. To mitigate this problem, a weighted principal factor analysis (wPFA) method is performed to reduce the number of variables by exploiting the spatial correlations of the variational parameters. Numerical examples show that the presented method is orders of magnitude faster than the MC method with very small errors for several practical interconnect structures. We also show that typical variations in the width and height of wires (10–30%) can cause significant variations in both partial and loop inductance.
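The collocation idea can be sketched in one dimension. The example below is a minimal illustration under stated assumptions, not the statHenry code: `response` is a hypothetical stand-in for one extracted inductance as a function of a single standard normal variable, the collocation points are Gauss-Hermite nodes from NumPy, and the second-order Hermite PC coefficients are fitted by weighted least squares.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermevander

def response(xi):
    # Hypothetical stand-in for an extracted inductance (pH) versus a
    # normalized geometric perturbation xi ~ N(0, 1).
    return 3.0 - 0.15 * xi + 0.02 * (xi**2 - 1.0)

order = 2
nodes, weights = hermegauss(5)        # collocation points and quadrature weights
weights = weights / weights.sum()     # normalize to the standard normal measure

# Columns are He_0 = 1, He_1 = xi, He_2 = xi^2 - 1 evaluated at the nodes.
basis = hermevander(nodes, order)
samples = response(nodes)             # one "extraction" per collocation point

# Weighted least squares: minimize sum_i w_i (samples_i - basis_i . coeffs)^2.
sw = np.sqrt(weights)
coeffs, *_ = np.linalg.lstsq(basis * sw[:, None], samples * sw, rcond=None)

norms = np.array([1.0, 1.0, 2.0])     # E[He_k^2] = k!
mean = coeffs[0]
variance = np.sum(coeffs[1:] ** 2 * norms[1:])
print("PC coefficients:", coeffs)
print("mean:", mean, "std:", np.sqrt(variance))
```

Once the coefficients are known, the mean is the zeroth coefficient and the variance is the weighted sum of the squared higher-order coefficients, so no further sampling is required.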


For a system with m conductors, we first divide all conductors into b filaments. The resistances and inductances of all filaments are stored in the matrices $R_{b\times b}$ and $L_{b\times b}$, respectively, each of dimension $b \times b$. R is a diagonal matrix with diagonal elements

$$ R_{ii} = \frac{l_i}{\sigma a_i}, \tag{13.1} $$

where $l_i$ is the length of filament i, $\sigma$ is the conductivity, and $a_i$ is the cross-sectional area of filament i. The inductance matrix L is dense. As in [76], $L_{ij}$ can be written as

$$ L_{ij} = \frac{\mu}{4\pi a_i a_j} \int_{V_i}\int_{V_j} \frac{\mathbf{l}_i \cdot \mathbf{l}_j}{\|\mathbf{r} - \mathbf{r}'\|}\, dV_i\, dV_j, \tag{13.2} $$

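where $\mu$ is the permeability, $\mathbf{l}_i$ and $\mathbf{l}_j$ are unit vectors in the lengthwise directions of filaments i and j, $\mathbf{r}$ is an arbitrary point in the filament, and $V_i$ and $V_j$ are the volumes of filaments i and j, respectively. Assuming magnetoquasistatic electric fields, the inductance extraction problem is then to find the solution of the discretized integral equation

$$ \left(\frac{l_i}{\sigma a_i}\right) I_i + j\omega \sum_{j=1}^{b} \left( \frac{\mu}{4\pi a_i a_j} \int_{V_i}\int_{V_j} \frac{\mathbf{l}_i \cdot \mathbf{l}_j}{\|\mathbf{r} - \mathbf{r}'\|}\, dV_i\, dV_j \right) I_j = \frac{1}{a_i} \int_{a_i} (\Phi_A - \Phi_B)\, dA, \tag{13.3} $$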

where $I_i$ and $I_j$ are the currents inside filaments i and j, $\omega$ is the angular frequency, and $\Phi_A$ and $\Phi_B$ are the potentials at the end faces of the filament. Equation (13.3) can be written in matrix form as

$$ (R + j\omega L)\, I_b = V_b, \tag{13.4} $$


where $I_b \in \mathbb{C}^b$ is the vector of the b filament currents and $V_b$ is a vector of dimension b containing the filament voltages. We first solve for the inductance of one conductor, which we call the primary conductor, and then for the inductance between it and all others, which we call the environmental conductors. To do this, we set the voltages of the filaments in the primary conductor to unit voltage and the voltages of all other filaments to zero. $I_b$ can then be calculated by solving a system of linear equations, together with the current conservation (Kirchhoff's current law (KCL)) equation

$$ M I_b = I_m \tag{13.5} $$

on all the filaments, where M is an adjacency matrix for the filaments and $I_m$ contains the currents of all m conductors. By repeating this process with each of the m conductors as the primary conductor, we obtain the vectors $I_{m,i}$, $i = 1, \ldots, m$, which form an $m \times m$ matrix $I_p = [I_{m,1}, I_{m,2}, \ldots, I_{m,m}]$. Since the voltages of all primary conductors have been set to unit voltage, the resistance and inductance can be obtained from the real part and the imaginary part, respectively, of the inverse of $I_p$.
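The procedure above can be sketched on a toy problem. The filament values below are made up for illustration and are not output of FastHenry; the layout of M (one row per conductor, summing the currents of its filaments) and the 1 GHz frequency are assumptions of this sketch. Since the imaginary part of the inverse of the current matrix is ωL, the sketch divides by ω to report inductance.

```python
import numpy as np

omega = 2 * np.pi * 1e9  # assumed angular frequency (1 GHz)

# Toy data: 2 conductors with 2 filaments each (b = 4). Illustrative values only.
R = np.diag([1.0, 1.2, 0.9, 1.1])                  # filament resistances (ohm)
L = 1e-12 * np.array([[3.0, 2.0, 1.0, 0.8],
                      [2.0, 3.0, 0.9, 1.0],
                      [1.0, 0.9, 3.0, 2.0],
                      [0.8, 1.0, 2.0, 3.0]])       # filament partial inductances (H)
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])               # sums filament currents per conductor

Zb = R + 1j * omega * L                            # filament impedance matrix, as in (13.4)
m = M.shape[0]
Ip = np.zeros((m, m), dtype=complex)
for k in range(m):
    Vb = M[k]                                      # unit voltage on primary conductor k
    Ib = np.linalg.solve(Zb, Vb)                   # filament currents
    Ip[:, k] = M @ Ib                              # conductor currents, as in (13.5)

Z = np.linalg.inv(Ip)                              # conductor impedance matrix
R_cond = Z.real                                    # resistances (ohm)
L_cond = Z.imag / omega                            # inductances (H)
print("R (ohm):\n", R_cond)
print("L (pH):\n", L_cond * 1e12)
```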

Process variations affecting conductor geometry are reflected by changes in the width w and height h of the conductors. We ignore variations in the length of the wires, as they are typically insignificant compared to the length itself. These variations make each element of the inductance matrix follow some random distribution. The problem is solved by deriving this distribution and then efficiently computing the mean and variance of the inductance for the given geometric randomness parameters. In this chapter, we assume that the width and height of each filament i are disturbed by random variables $n_{w,i}$ and $n_{h,i}$, which gives

$$ w_i' = w_i + n_{w,i}, \tag{13.6} $$

$$ h_i' = h_i + n_{h,i}, \tag{13.7} $$

where each perturbation $x_i$ (either $n_{w,i}$ or $n_{h,i}$) follows a Gaussian distribution, $x_i \sim N(0, \sigma^2)$. The correlation between the random perturbations of each wire's width and height is governed by an empirical formulation such as the widely used exponential model

$$ \rho(r) = e^{-r^2/\eta^2}, \tag{13.8} $$

where r is the distance between two panel centers and $\eta$ is the correlation length. The most straightforward method is to use an MC-based simulation to obtain the distribution, mean, and variance of all those inductances. Unfortunately, the MC method is extremely time consuming, and more efficient statistical approaches are needed.


3 The Presented Statistical Inductance Extraction Method: statHenry

In this section, we present the new statistical inductance extraction method, statHenry. The presented method is based on the spectral stochastic method, where the integration in (2.36) is computed via an improved numerical quadrature method. It relies on the efficient multidimensional numerical Gaussian and Smolyak quadrature in Sect. 3.3 of Chap. 2 and on the variable decoupling and reduction technique in Sect. 2.2 of Chap. 2.

3.1 Variable Decoupling and Reduction

In the inductance extraction problem, process variations exist in the width w and height h of the conductors, which make each element of the inductance matrix (13.2) follow some random distribution. The problem is solved by deriving this distribution and then efficiently computing the mean and variance of the inductance for the given geometric randomness parameters. As shown in (13.6) and (13.7), each filament i is modeled by two Gaussian random variables, $n_{w,i}$ and $n_{h,i}$. If there are n filaments, the inductance extraction problem involves 2n Gaussian random variables with spatial correlation modeled as in (13.8).

Even with sparse grid quadrature, the number of sampling points still grows quadratically with the number of variables. As a result, we should further reduce the number of variables by exploiting the spatial correlations of the given random width and height parameters of wires.

The spectral stochastic method requires independent random variables as its input. Since the height and width variables of all wires are correlated, this correlation should be removed before using the spectral stochastic method. As proved in Sect. 2.3 of Chap. 2, the theoretical basis for decoupling the correlation of those variables is Cholesky decomposition.

Proposition 13.1. For a set of zero-mean Gaussian distributed variables $\zeta^*$ whose covariance matrix is $\Omega_{2n\times 2n}$, if there is a matrix L satisfying $\Omega = LL^T$, then $\zeta^*$ can be represented by a set of independent standard normally distributed variables $\zeta$ as $\zeta^* = L\zeta$.
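A numerical sketch of Proposition 13.1 follows. It is an illustration under assumptions: only width perturbations of five wires are modeled, the covariance is built from the correlation model (13.8), and the wire spacing, standard deviation, and correlation length are made-up values. Independent standard normal samples are mapped through the Cholesky factor, and the empirical covariance is compared with the target.

```python
import numpy as np

rng = np.random.default_rng(2)

# Covariance from the correlation model (13.8) for 5 wire centers on an
# assumed 2 um spacing; sigma and eta are illustrative assumptions.
centers = 2.0 * np.arange(5)
sigma, eta = 0.1, 8.0
r = np.abs(centers[:, None] - centers[None, :])
Omega = sigma**2 * np.exp(-(r / eta) ** 2)

Lc = np.linalg.cholesky(Omega)             # Omega = Lc @ Lc.T
indep = rng.standard_normal((5, 200000))   # independent standard normal variables
correlated = Lc @ indep                    # correlated zero-mean Gaussian variables

empirical = np.cov(correlated)
print("max |empirical - Omega| =", np.abs(empirical - Omega).max())
```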

Here the covariance matrix $\Omega_{2n\times 2n}$ contains the covariances between all the $n_{w,i}$ and $n_{h,i}$ of each filament, and $\Omega$ is always a positive semidefinite matrix owing to the nature of a covariance matrix. PFA [74] can substitute for Cholesky decomposition when variable reduction is needed. Eigendecomposition of $\Omega_{2n\times 2n}$ yields

$$ \Omega_{2n\times 2n} = LL^T, \quad L = \left[\sqrt{\lambda_1}\,e_1, \ldots, \sqrt{\lambda_{2n}}\,e_{2n}\right], \tag{13.9} $$


where $\{\lambda_i\}$ are the eigenvalues in order of descending magnitude and $\{e_i\}$ are the corresponding eigenvectors. After PFA, the number of random variables involved in inductance extraction is reduced from 2n to k by truncating L to its first k columns.

The error of PFA can be controlled by k:

$$ \mathrm{err} = \frac{\sum_{i=k+1}^{2n} \lambda_i}{\sum_{i=1}^{2n} \lambda_i}, \tag{13.10} $$

where a larger k leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In the experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
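The truncation rule can be sketched as follows. The helper name `pfa`, the 1% tolerance, and the test covariance (width perturbations of 20 wires on an assumed 2 μm center spacing with an 8 μm correlation length) are assumptions for illustration, so the number of retained variables printed here need not match the counts reported in the experiments.

```python
import numpy as np

def pfa(Omega, tol=0.01):
    """Return the truncated factor L_k of (13.9) and the error of (13.10)."""
    lam, E = np.linalg.eigh(Omega)
    order = np.argsort(lam)[::-1]                 # descending eigenvalues
    lam, E = np.clip(lam[order], 0.0, None), E[:, order]
    tail = 1.0 - np.cumsum(lam) / lam.sum()       # err of (13.10) for k = 1, 2, ...
    k = int(np.argmax(tail <= tol)) + 1           # smallest k meeting the tolerance
    Lk = E[:, :k] * np.sqrt(lam[:k])              # columns sqrt(lambda_i) * e_i
    return Lk, tail[k - 1]

centers = 2.0 * np.arange(20)
sigma, eta = 0.1, 8.0
r = np.abs(centers[:, None] - centers[None, :])
Omega = sigma**2 * np.exp(-(r / eta) ** 2)

Lk, err = pfa(Omega, tol=0.01)
rel = np.linalg.norm(Lk @ Lk.T - Omega) / np.linalg.norm(Omega)
print("variables kept:", Lk.shape[1], "of", Omega.shape[0])
print("truncation error (13.10):", err, " relative matrix error:", rel)
```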

3.2 Variable Reduction by Weighted PFA

PFA for variable reduction considers only the spatial correlation between wires and ignores the influence of the inductance itself. One idea is to consider the importance of the outputs during the reduction process. We follow the recently proposed wPFA technique to achieve more efficient variable reduction [204].

If a weight is defined for each physical variable $\zeta_i$ to reflect its impact on the output, then a set of new variables $\zeta^*$ is formed:

$$ \zeta^* = W\zeta, \tag{13.11} $$

where $W = \mathrm{diag}(w_1, w_2, \ldots, w_{2n})$ is a diagonal matrix of weights. As a result, the covariance matrix of $\zeta^*$, $\Sigma(\zeta^*)$, now contains the weight information, and performing PFA on $\Sigma_{2n\times 2n}(\zeta^*)$ leads to the weighted variable reduction. Specifically, we have

$$ \Sigma_{2n\times 2n}(\zeta^*) = E\left[W\zeta\,(W\zeta)^T\right] = W\,\Sigma_{2n\times 2n}(\zeta)\,W^T \tag{13.12} $$

and denote its eigenvalues and eigenvectors by $\lambda_i^*$ and $e_i^*$. Then, the variables $\zeta$ can be approximated by a linear combination of a set of independent dominant variables $\xi^*$:

$$ \zeta = W^{-1}\zeta^* \approx W^{-1} \sum_{i=1}^{k} \sqrt{\lambda_i^*}\, e_i^*\, \xi_i^*. \tag{13.13} $$

Fig. 13.1 The statHenry algorithm

The error control process is similar to (13.10), but uses the weighted eigenvalues $\lambda_i^*$. For inductance extraction, we take the partial inductance of the deterministic structure as the weight, since this nominal structure has approximately the same inductance as the variational structure. By performing wPFA on the same example with 20 parallel wires, the 40 variables can now be reduced to 8, rather than to 14 as with PFA (more details are given in the experimental results).
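A minimal sketch of the weighted reduction in (13.11)–(13.13) is given below. The function name `wpfa`, the test covariance, and in particular the decaying weight vector are hypothetical choices made for illustration; in the text, the weights come from the partial inductance of the deterministic structure, which is not computed here.

```python
import numpy as np

def wpfa(Omega, weights, tol=0.01):
    """Weighted PFA sketch: returns B such that zeta ~= B @ xi, and k."""
    W = np.diag(weights)
    Omega_w = W @ Omega @ W.T                     # weighted covariance, as in (13.12)
    lam, E = np.linalg.eigh(Omega_w)
    order = np.argsort(lam)[::-1]
    lam, E = np.clip(lam[order], 0.0, None), E[:, order]
    tail = 1.0 - np.cumsum(lam) / lam.sum()       # (13.10) with weighted eigenvalues
    k = int(np.argmax(tail <= tol)) + 1
    # (13.13): zeta ~= W^{-1} sum_i sqrt(lambda*_i) e*_i xi*_i
    B = np.linalg.solve(W, E[:, :k] * np.sqrt(lam[:k]))
    return B, k

centers = 2.0 * np.arange(20)
sigma, eta = 0.1, 8.0
r = np.abs(centers[:, None] - centers[None, :])
Omega = sigma**2 * np.exp(-(r / eta) ** 2)

unit_weights = np.ones(20)                        # reduces to plain PFA
decaying_weights = np.exp(-0.5 * np.arange(20))   # hypothetical output sensitivities

_, k_unit = wpfa(Omega, unit_weights)
B, k_weighted = wpfa(Omega, decaying_weights)
print("variables kept with unit weights    :", k_unit)
print("variables kept with decaying weights:", k_weighted)
```

With unit weights the procedure coincides with plain PFA; when only a few variables matter to the output, the weighted spectrum concentrates in fewer factors, so fewer variables are retained for the same tolerance.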

3.3 Flow of statHenry Technique

Having explained all the important pieces from related work in Chap. 2, we are now ready to present the new algorithm, statHenry. Figure 13.1 is a flowchart of the presented algorithm.


In this section, we compare the results of the statHenry method against the MC method and a simple method using HPC with the sparse grid technique but without variable reduction. statHenry has been implemented in Matlab 8.0. All the experimental results were obtained using a computer with a 1.6 GHz Intel quad-core i7-720 and 4 GB memory running the Microsoft Windows 7 Ultimate operating system. The version of FastHenry is 3.0 [76]. The initial results of this chapter were published in [63, 143].

For the experiment, we set up four test cases to examine the algorithm: 2 parallel wires, 5 parallel wires, 10 parallel wires, and 20 parallel wires, as shown in Fig. 13.2. In all four models, all of the wires have a width of 1 μm, a length of 6 μm, and a pitch of 1 μm between them. The unit of inductance in the experimental results is picohenry (pH).


Fig. 13.2 Four test structures used for comparison

We set the standard deviation to 10% of the wire widths and wire heights and the correlation length η to 8 μm to indicate a strong correlation.

First, we compare the accuracy of the three methods in terms of the mean and standard deviation of the loop/partial inductance. The results are summarized in Table 13.1, which reports the results for the four test cases mentioned above. In each case, we report the results for the partial self-inductance of wire 1 (L11p) and the loop inductance between wires 1 and 2 (L12l). Columns 3–4 are the mean value and standard deviation for the MC method (MC). Columns 5–12 are the mean value, standard deviation, and their errors relative to the MC method for HPC and the presented method. The average errors of the mean and standard deviation of the HPC method are 0.05% and 2.01% compared with the MC method, while those of the statHenry method are 0.05% and 2.06%, respectively. The MC results come from 10,000 FastHenry runs.

It can be seen that statHenry is very accurate for both the mean and the standard deviation compared with the HPC method and the MC method. We observe that a 10% standard deviation for the width and height results in variations from 2.73% to 5.10% for the partial and loop inductances, which is significant for timing.

Next, we show the CPU time speedup of the presented method. The results are summarized in Table 13.2. It can be seen that statHenry can be about two orders of magnitude faster than the MC method. The average speedups of the HPC method and the statHenry method over the MC method are 54.1 and 349.7, respectively. We notice that the speedup goes down as the number of wires grows. This is expected, as more wires lead to more variables, even after variable reduction, and the number of samples in the collocation method is O(m^2) for second-order Hermite polynomials, where m is the number of variables. As a result, more samples are needed to compute the coefficients, while MC uses a fixed number of samples (10,000 for all cases).


Table 13.1 Accuracy comparison (mean and variance values of inductances) among MC, HPC, and statHenry

                             Values (pH)                   Error (%)
Wires   Inductance           MC      HPC     statHenry     HPC     statHenry
2       L11p         Mean    2.851   2.850   2.850         0.02    0.03
                     Std     0.080   0.078   0.078         2.31    2.47
2       L12l         Mean    3.058   3.057   3.056         0.05    0.06
                     Std     0.158   0.156   0.155         1.50    2.21
5       L11p         Mean    2.849   2.851   2.851         0.08    0.07
                     Std     0.078   0.078   0.078         0.86    0.24
5       L12l         Mean    3.054   3.058   3.058         0.11    0.11
                     Std     0.155   0.156   0.156         1.01    0.70
10      L11p         Mean    2.852   2.853   2.853         0.01    0.02
                     Std     0.079   0.078   0.078         1.23    1.37
10      L12l         Mean    3.059   3.060   3.060         0.05    0.05
                     Std     0.159   0.156   0.156         1.55    1.74
20      L11p         Mean    2.852   2.853   2.853         0.03    0.03
                     Std     0.081   0.078   0.078         3.74    3.82
20      L12l         Mean    3.059   3.060   3.060         0.04    0.05
                     Std     0.163   0.156   0.156         3.88    3.96

Table 13.2 CPU runtime comparison among MC, HPC, and statHenry

         MC          HPC         Speedup     statHenry   Speedup
Wires    Time (s)    Time (s)    (vs. MC)    Time (s)    (vs. MC)
2        5394.4      32.6        165.4       9.8         550.4
5        7442.8      192.5       38.7        12.6        589.1
10       8333.5      893.7       9.3         42.5        195.9
20       13698.3     4532.9      3.0         215.8       63.5

Table 13.3 Reduction effects of PFA and wPFA

         Original     PFA                   wPFA
Wires    variables    Reduction   Points    Reduction   Points
2        4            4           45        2           15
5        10           4           45        2           15
10       20           6           91        4           45
20       40           14          435       8           153

Table 13.3 shows the reduction effects of PFA and wPFA for all the cases under the same errors. We can see that wPFA achieves a smaller number of reduced variables and fewer quadrature points for sampling, and thus better efficiency for the entire extraction algorithm.
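The relation between reduced variables and quadrature points in Table 13.3 can be checked numerically. Every entry in the table is consistent with 2m^2 + 3m + 1 points for m reduced variables; this closed form is inferred here from the table entries and is an assumption of the sketch, not a formula stated in the text.

```python
def sparse_grid_points(m):
    # Assumed closed form that reproduces the point counts of Table 13.3.
    return 2 * m * m + 3 * m + 1

# wires: (reduced variables with PFA, reduced variables with wPFA), from Table 13.3
table_13_3 = {2: (4, 2), 5: (4, 2), 10: (6, 4), 20: (14, 8)}

point_counts = {}
for wires, (m_pfa, m_wpfa) in table_13_3.items():
    point_counts[wires] = (sparse_grid_points(m_pfa), sparse_grid_points(m_wpfa))
    print(f"{wires:2d} wires: PFA {point_counts[wires][0]:3d} points, "
          f"wPFA {point_counts[wires][1]:3d} points")
```

The quadratic growth explains why reducing 14 variables to 8 for the 20-wire case cuts the number of extraction runs from 435 to 153.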

Finally, we study the variational impacts on partial and loop inductances under different variabilities of width and height using statHenry and the MC method.

The variation statistics are summarized in Table 13.4. Here we report the results for standard deviations from 10% to 30% of the width and height, using the statHenry method and the MC method for the 10-parallel-wire case. As the variation due to process imperfections grows as technology advances, we can see that the inductance variation will also grow. Considering a typical 3σ range for variation, a 30% standard deviation means that width and height changes can reach 90% of their values. It can be seen that as the variations of width and height increase (from 10% to 30%), the std/mean of the partial inductance grows from 2.75% to 8.65%, while that of the loop inductance grows from 5.10% to 15.9%, which can significantly impact the noise and delay of the wires. The average errors of the mean and standard deviation of statHenry are 0.33% and 1.75% compared with MC over all variabilities of width and height. From this, we can see that the results of statHenry agree closely with MC under different variations.

Table 13.4 Variation impacts on inductances using statHenry

10 parallel wires, L11p (pH)

             Monte Carlo        statHenry          Error
Variation    Mean     Std       Mean     Std       Mean (%)   Std (%)
10%          2.852    0.079     2.853    0.078     0.02       1.37
20%          2.872    0.163     2.862    0.160     0.35       1.84
30%          2.890    0.245     2.879    0.249     0.36       1.45

10 parallel wires, L12l (pH)

             Monte Carlo        statHenry          Error
Variation    Mean     Std       Mean     Std       Mean (%)   Std (%)
10%          3.059    0.159     3.060    0.156     0.05       1.74
20%          3.097    0.325     3.078    0.319     0.61       1.84
30%          3.128    0.484     3.110    0.495     0.56       2.26

[Figure: plot titled "loop inductance L12 distribution of 10 parallel wires"; x-axis: loop inductance L12 (pH); y-axis: probability; curves: Monte Carlo and statHenry.]

Fig. 13.3 The loop inductance L12l distribution changes for the 10-parallel-wire case under 30% width and height variations


[Figure: plot titled "partial inductance L11 distribution of 10 parallel wires"; x-axis: partial inductance L11 (pH); y-axis: probability; curves: Monte Carlo and statHenry.]

Fig. 13.4 The partial inductance L11p distribution changes for the 10-parallel-wire case under 30% width and height variations

Figures 13.3 and 13.4 show the loop (for wire 1 and wire 2, L12l) and partial (for wire 1 itself, L11p) inductance distributions under 30% deviations of width and height for the 10-parallel-wire case.

5 Summary

In this chapter, we have presented a new statistical inductance extraction method, called statHenry, for interconnects considering process variations with spatial correlation. This new method is based on the collocation-based spectral stochastic method, where OPC is used to represent the variational geometrical parameters in a deterministic way. Statistical inductance values are then computed using a fast multidimensional Gaussian quadrature method with the sparse grid technique. To further improve the efficiency of the presented method, a random variable reduction scheme based on wPFA is applied. Numerical examples show that the presented method is orders of magnitude faster than the MC method with very small errors for several practical interconnect structures. We also show that both partial and loop inductance variations can be significant for typical 10–30% standard deviations of the width and height of interconnect wires.

Part V Statistical Analog and Yield Analysis and Optimization Techniques

Chapter 14 Performance Bound Analysis of Variational Linearized Analog Circuits

1 Introduction

Analog and mixed-signal circuits are very sensitive to process variations, as many matchings are required. This situation becomes worse as technology continues to scale to 90 nm and below owing to the increasing process-induced variability [122, 148]. Transistor-level mismatch is the primary obstacle to reaching a high yield rate for analog designs in sub-90 nm technologies. For example, due to an inverse-square-root-law dependence on the transistor area, the mismatch of CMOS devices nearly doubles for each process generation below 90 nm [80, 104]. Since the traditional worst-case or corner-case analysis is too pessimistic and sacrifices speed, power, and area, the statistical approach [133] has become a trend for estimating analog mismatch and performance variations. The variations in the analog components can be systematic (global spatial variation) or stochastic (local random variation). In this chapter, we model both variations as parameter intervals on the components of analog circuits.

Analog circuit designers usually perform a MC analysis to analyze the stochastic mismatch and predict the variational responses of their designs under faults. As MC analysis requires a large number of repeated circuit simulations, its computational cost is high. Moreover, the pseudorandom generator in MC introduces numerical noise that may lead to errors. A more efficient variational analysis, which can give the performance bounds, is highly desirable.

Bounding or worst-case analysis of analog circuits under parameter variations has been studied in the past for fault-driven testing and tolerance analysis of analog circuits [83, 162, 179]. The proposed approaches include sensitivity analysis [185], the sampling method [168], and interval arithmetic-based approaches [83, 140, 162, 179]. But the sensitivity-based method cannot give the worst case in general, and the sampling-based method is limited to a few variables. Interval arithmetic methods, in general, have had a reputation for being overly pessimistic. Recently, worst-case analysis of linearized analog circuits in frequency domain has been proposed [140], where Kharitonov’s functions [79] were applied to obtain the performance bounds in frequency domain, but no systematic method was proposed to obtain variational transfer functions.

In this chapter, we propose a performance bound analysis algorithm of analog circuits considering the process variations [61]. The presented method employs several techniques to compute the bounding responses of analog circuits in the frequency domain. First, the presented method models the variations of component values as intervals measured from tested chips and manufacturing processes. Then the presented method applies determinant decision diagram (DDD) graph-based symbolic analysis to derive the exact symbolic transfer functions from linearized analog circuits. After this, affine interval arithmetic is applied to compute the variational transfer functions of the analog circuit with variational coefficients in the form of intervals. Finally, the frequency response bounds (maximum and minimum) are obtained by evaluating a finite number of special transfer functions given by Kharitonov’s theorem, which provides proven response bounds for given interval polynomial functions in frequency domain. We show that symbolic decancellation is critical for reducing inherent pessimism in the affine interval analysis. We also show that response bounds given by Kharitonov’s functions are conservative, given the correlations among coefficient intervals in transfer functions. Numerical examples demonstrate that the presented method is more efficient than the MC method.

The rest of this chapter is organized as follows: Sect. 2 gives a review of interval arithmetic and affine arithmetic. The performance bound analysis method is described in Sect. 3. Section 4 shows the experimental results, and Sect. 5 summarizes this chapter.

2 Review of Interval Arithmetic and Affine Arithmetic

Interval arithmetic was introduced by Moore in the 1960s [113] to solve range estimation considering uncertainties. In interval arithmetic, a classical variable $x$ is represented by an interval $\hat{x} = [x^-, x^+]$ which satisfies $x^- \le x \le x^+$. However, interval arithmetic suffers from the overestimation problem, as it often yields an interval that is much wider than the exact range of the function.

As an example, given $\hat{x} = [-1, 1]$, the interval evaluation of $\hat{x} - \hat{x}$ produces $[-1 - 1, 1 - (-1)] = [-2, 2]$ instead of $[0, 0]$, which is the actual range of that expression.
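The overestimation in this example can be reproduced with a minimal sketch of classical interval arithmetic. The `Interval` class below is a hypothetical helper written for illustration, not code from the presented method; it only implements the two operations needed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Classical interval [lo, hi] (illustrative helper, not from the source)."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # The operands are treated as unrelated quantities, even when they
        # are the same variable; this is the source of the overestimation.
        return Interval(self.lo - other.hi, self.hi - other.lo)


x = Interval(-1.0, 1.0)
print(x - x)  # Interval(lo=-2.0, hi=2.0), although the true range is [0, 0]
```

Because the interval carries no record of which variable it came from, the dependency between the two operands is lost.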

Affine arithmetic was proposed by Stolfi and de Figueiredo [25] to overcome the error explosion problem of standard interval analysis. In affine interval, the affine form $\hat{x}$ of random variable $x$ is given by

\[ \hat{x} = x_0 + \sum_{i=1}^{n} x_i \varepsilon_i, \tag{14.1} \]

in which each noise symbol $\varepsilon_i$ $(i = 1, 2, \ldots, n)$ is an independent component of the total uncertainties of $x$ which satisfies $-1 \le \varepsilon_i \le 1$, the coefficient $x_i$ is the magnitude of $\varepsilon_i$, and $x_0$ is the central value of $\hat{x}$. The conversion from affine intervals to classical intervals is easy as $\hat{x}$ in (14.1) can be converted to $[x_0 - \mathrm{rad}(\hat{x}), x_0 + \mathrm{rad}(\hat{x})]$ in which $\mathrm{rad}(\hat{x}) = \sum_{i=1}^{n} |x_i|$ is defined as the radius of the affine expression $\hat{x}$. The basic operations of addition and subtraction in affine arithmetic are defined by

\[ \hat{x} \pm \hat{y} = (x_0 \pm y_0) + \sum_{i=1}^{n} (x_i \pm y_i)\varepsilon_i. \tag{14.2} \]

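Returning to the previous example, if $x$ has the affine form $\hat{x} = 0 + \varepsilon_1$ then $\hat{x} - \hat{x} = \varepsilon_1 - \varepsilon_1 = 0$ gives the accurate result. Affine arithmetic multiplication is defined as

\[ \hat{x} \cdot \hat{y} = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + x_i y_0)\varepsilon_i + \mathrm{rad}(\hat{x}) \cdot \mathrm{rad}(\hat{y}) \cdot \varepsilon_{n+1}, \tag{14.3} \]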

in which $\varepsilon_{n+1}$ is a new noise symbol that is distinct from all the other noise symbols $\varepsilon_i$ $(i = 1, 2, \ldots, n)$. We notice that affine operations mitigate the problem associated with symbolic cancellations in addition. For multiplication, however, the problem can still exist. For instance, $\hat{x} \cdot \hat{y} - \hat{y} \cdot \hat{x}$ should be 0, but each product generates its own new symbol $\varepsilon_{n+1}$ when the multiplications are done first, so the complete cancellation no longer happens.
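The operations (14.1)–(14.3) can be sketched as a small class. This is an illustrative sketch only: the `Affine` class, the dictionary storage of noise symbols, and the counter used to number new symbols are assumptions made here, not the implementation used in the experiments.

```python
import itertools

# Counter that hands out ids for new noise symbols (illustrative choice).
_new_symbol = itertools.count(1_000_000)


class Affine:
    """Affine form x0 + sum_i x_i * eps_i, with eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        # Maps a noise-symbol id i to its coefficient x_i; zeros are dropped.
        self.terms = {i: float(c) for i, c in (terms or {}).items() if c != 0.0}

    def rad(self):
        return sum(abs(c) for c in self.terms.values())

    def interval(self):
        r = self.rad()
        return (self.center - r, self.center + r)

    def _combine(self, other, sign):
        ids = set(self.terms) | set(other.terms)
        terms = {i: self.terms.get(i, 0.0) + sign * other.terms.get(i, 0.0)
                 for i in ids}
        return Affine(self.center + sign * other.center, terms)

    def __add__(self, other):          # (14.2), plus sign
        return self._combine(other, 1.0)

    def __sub__(self, other):          # (14.2), minus sign
        return self._combine(other, -1.0)

    def __mul__(self, other):          # (14.3)
        ids = set(self.terms) | set(other.terms)
        terms = {i: self.center * other.terms.get(i, 0.0)
                    + self.terms.get(i, 0.0) * other.center for i in ids}
        terms[next(_new_symbol)] = self.rad() * other.rad()
        return Affine(self.center * other.center, terms)


x = Affine(0.0, {1: 1.0})
print((x - x).interval())          # (0.0, 0.0): shared symbols cancel

a = Affine(1.0, {1: 1.0})
b = Affine(1.0, {2: 1.0})
print((a * b - b * a).interval())  # (-2.0, 2.0): the two new symbols differ
```

The second result shows the residual pessimism of multiplication: the linear parts cancel, but the two independently created noise symbols do not.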

3 The Performance Bound Analysis Method Based on Graph-based Symbolic Analysis

We first present the whole algorithm flow of the presented performance bound analysis algorithm in Fig. 14.1. Basically, the presented method consists of two major computing steps. The first step is to compute the variational transfer functions from the variational circuit parameters, which will be done via the DDD-based symbolic analysis method and affine interval arithmetic (steps 1–3). Second, we compute the frequency response bounds via Kharitonov’s functions, which require just a few transfer function evaluations (step 4). Kharitonov’s functions lead to proven upper and lower bounds for the frequency domain responses of a variational transfer function. We will present the two major computing steps in the following sections.

3.1 Variational Transfer Function Computation

In this section, we first provide a brief overview of DDD [160]. Next we show how affine arithmetic can be applied to compute the variational transfer function.



[Figure: RC circuit driven by current source I, with resistors R1, R2, R3, capacitors C1, C2, C3, and nodes 1, 2, 3]

Fig. 14.2 An example circuit. Reprinted with permission from [61]. © 2011 IEEE

3.1.1 Symbolic Analysis by Determinant Decision Diagrams

Determinant decision diagrams [160] are a compact and canonical graph-based representation of determinants. The concept is best illustrated using a simple RC filter circuit shown in Fig. 14.2.

Its system equations can be written as

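\[
\begin{bmatrix}
\frac{1}{R_1} + sC_1 + \frac{1}{R_2} & -\frac{1}{R_2} & 0 \\
-\frac{1}{R_2} & \frac{1}{R_2} + sC_2 + \frac{1}{R_3} & -\frac{1}{R_3} \\
0 & -\frac{1}{R_3} & \frac{1}{R_3} + sC_3
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
=
\begin{bmatrix} I \\ 0 \\ 0 \end{bmatrix}.
\]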

We view each entry in the circuit matrix as one distinct symbol and rewrite its system determinant in the left-hand side of Fig. 14.3. Then its DDD representation is shown in the right-hand side.

A DDD is a signed, rooted, directed acyclic graph with two terminal nodes, namely, the 0-terminal vertex and the 1-terminal vertex. Each nonterminal DDD vertex is labeled by a symbol in the determinant denoted by $a_i$ ($A$ to $G$ in Fig. 14.3), and a positive or negative sign denoted by $s(a_i)$. It originates two outgoing edges, called 1-edge and 0-edge. Each vertex $a_i$ represents a symbolic expression $D(a_i)$ defined recursively as follows:

\[ D(a_i) = a_i \cdot s(a_i) \cdot D_{a_i} + D_{\overline{a}_i}, \tag{14.4} \]

where $D_{a_i}$ and $D_{\overline{a}_i}$ represent, respectively, the symbolic expressions of the nodes pointed to by the 1-edge and 0-edge of $a_i$. The 1-terminal vertex represents expression 1, whereas the 0-terminal vertex represents expression 0. For example, vertex $E$ in Fig. 14.3 represents expression $E$, vertex $F$ represents expression $-EF$, and vertex $D$ represents expression $DG - FE$. We also say that a DDD vertex $D$ represents an expression defined in the DDD subgraph rooted at $D$.

[Figure: left, the determinant with rows (A, B, 0), (C, D, E), (0, F, G); right, its DDD with signed vertices A to G, 1-edges and 0-edges, and the 1- and 0-terminals]

Fig. 14.3 A matrix determinant and its DDD representation. Reprinted with permission from [61]. © 2011 IEEE

A 1-path in a DDD corresponds to a product term in the original determinant. It is defined as a path from the root vertex ($A$ in our example) to the 1-terminal, including all symbols and signs of the nodes that originate the 1-edges along the path. In our example, there exist three 1-paths representing three product terms: $ADG$, $-AFE$, and $-CBG$. The root vertex represents the sum of these product terms. The size of a DDD is the number of DDD nodes, denoted by $|DDD|$.

Once a DDD has been constructed, the numerical value of the determinant it represents can be computed by performing a depth-first-type search of the graph and applying (14.4) at each node, whose time complexity is a linear function of the size of the graph (its number of nodes). The computing step is called Evaluate(D) where D is a DDD root. With proper node ordering and hierarchical approaches, DDD can be very efficient for computing transfer functions of large analog circuits [160, 174].
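The Evaluate(D) step can be sketched as a memoized depth-first recursion over (14.4). The vertex arrangement below is an assumption: it is one DDD consistent with the vertex expressions and the three product terms quoted above, and the figure may order or share vertices differently. The `Vertex` class and the numeric symbol values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    name: str     # symbol a_i labeling the vertex
    sign: int     # s(a_i), either +1 or -1
    one: object   # child reached by the 1-edge (Vertex, or terminal 0/1)
    zero: object  # child reached by the 0-edge (Vertex, or terminal 0/1)


def evaluate(v, values, memo=None):
    """Apply (14.4) depth first; each vertex is computed once."""
    memo = {} if memo is None else memo
    if isinstance(v, int):            # 0-terminal or 1-terminal
        return float(v)
    if v.name not in memo:
        memo[v.name] = (values[v.name] * v.sign * evaluate(v.one, values, memo)
                        + evaluate(v.zero, values, memo))
    return memo[v.name]


# Assumed structure reproducing ADG - AFE - CBG.
G = Vertex("G", +1, 1, 0)
E = Vertex("E", +1, 1, 0)
F = Vertex("F", -1, E, 0)   # represents -EF
D = Vertex("D", +1, G, F)   # represents DG - FE
C = Vertex("C", +1, G, 0)
B = Vertex("B", -1, C, 0)
A = Vertex("A", +1, D, B)   # root

values = {"A": 2, "B": 3, "C": 5, "D": 7, "E": 11, "F": 13, "G": 17}
print(evaluate(A, values))  # -303.0
```

The printed value agrees with the direct expansion $ADG - AFE - CBG = 238 - 286 - 255$. Vertex $G$ is shared by $D$ and $C$, and the memo table ensures it is evaluated only once, which is what keeps the cost linear in the number of vertices.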

In order to compute the symbolic coefficients of the transfer function in different powers of $s$, the original DDD can be expanded to the s-expanded DDD [161]. By doing this, each coefficient of the transfer function is represented by a coefficient DDD. The s-expanded DDD can be constructed from the complex DDD in linear time in the size of the original complex DDD [161].

3.1.2 Variational Transfer Function

Assume that each circuit parameter $\hat{x}$ becomes an affine interval $\hat{x} = x_0 + \sum_{i=1}^{n} x_i \varepsilon_i$ due to process variations. We now want to compute the variational transfer functions. The resulting transfer functions will take the following s-expanded rational form:

\[ H(s) = \frac{N(s)}{D(s)} = \frac{\sum_{i=0}^{m} \hat{a}_i s^i}{\sum_{j=0}^{n} \hat{b}_j s^j}, \tag{14.5} \]

where coefficients $\hat{a}_i$ and $\hat{b}_j$ are all affine intervals. This can be computed by means of affine arithmetic [25]. Basically, the DDD Evaluation operation traverses the DDD in a depth-first style and performs one multiplication and one addition at each node as shown in (14.4). Now the two operations will be replaced by the addition and multiplication from affine arithmetic.

3.1.3 Symbolic Decancellation in DDD Evaluation Using Affine Arithmetic

As mentioned before, the interval and affine arithmetic operations are very sensitive to symbolic term cancellations, which, however, are strongly present in the DDD and s-expanded DDD. It was shown that about 70–90% of the terms in the determinant of an MNA-formulated circuit matrix are canceling terms [175]. Notice that symbolic cancellation always happens even in the presence of parameter variations.

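In DDD evaluation, we have both addition and multiplication as shown in (14.4). Cancellation can lead to large errors if not removed. For example, considering two terms $\hat{x} \cdot \hat{y} \cdot \hat{z}$ and $\hat{z} \cdot \hat{y} \cdot (-\hat{x})$, and supposing $\hat{x} = 1 + \varepsilon_1$, $\hat{y} = 1 + \varepsilon_2$, $\hat{z} = 1 + \varepsilon_3$, then

\[
\begin{aligned}
\hat{x} \cdot \hat{y} \cdot \hat{z} &= (1 + \varepsilon_1 + \varepsilon_2 + \varepsilon_4) \cdot \hat{z} \\
&= 1 + \varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 + 3\varepsilon_5, \\
\hat{z} \cdot \hat{y} \cdot (-\hat{x}) &= (1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_6) \cdot (-\hat{x}) \\
&= -1 - \varepsilon_1 - \varepsilon_2 - \varepsilon_3 - \varepsilon_6 - 3\varepsilon_7.
\end{aligned}
\]

However, the addition of these two terms is

\[ \hat{x} \cdot \hat{y} \cdot \hat{z} + \hat{z} \cdot \hat{y} \cdot (-\hat{x}) = \varepsilon_4 + 3\varepsilon_5 - \varepsilon_6 - 3\varepsilon_7, \tag{14.6} \]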


which should be 0. The reason is that in affine multiplication defined in (14.3), the new noise symbol is actually a function of the original noise symbols $\varepsilon_i$ $(i = 1, 2, \ldots, n)$, but affine arithmetic assumes the new symbol is independent from the original ones. As a result, the symbolic canceling terms will lead to inaccurate results, which can be as large as $[-8, 8]$ for (14.6).
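As a quick check of the interval quoted for (14.6), the radius definition from Sect. 2 can be applied to the four leftover noise symbols, which affine arithmetic treats as independent. The symbol $\hat{r}$ for the residual is introduced here only for this worked step.

```latex
\begin{aligned}
\hat{r} &= \varepsilon_4 + 3\varepsilon_5 - \varepsilon_6 - 3\varepsilon_7,
  \qquad r_0 = 0,\\
\mathrm{rad}(\hat{r}) &= |1| + |3| + |-1| + |-3| = 8,\\
[\,r_0 - \mathrm{rad}(\hat{r}),\; r_0 + \mathrm{rad}(\hat{r})\,] &= [-8,\, 8].
\end{aligned}
```

The exact value of the expression is 0 for every choice of $\varepsilon_1, \varepsilon_2, \varepsilon_3$, so the whole width of 16 is pessimism caused by the uncancelled terms.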

Fortunately, we can perform the decancellation operation on coefficient DDDs in the s-expanded DDDs in a very efficient way during or after the coefficient DDD construction, so that the resulting coefficient DDD is cancellation free [175], which can significantly improve the interval computation accuracy as shown in the experimental results.

3.1.4 Increase the Accuracy of Affine Arithmetic by Considering Second-Order Noise Symbols

The affine arithmetic operations used in DDD evaluation are addition and multiplication. The affine addition is accurate as it does not introduce any new noise symbol. However, for the affine multiplication shown in (14.3), a new noise symbol $\varepsilon_{n+1}$ is added every time, which loosens the affine bound compared with the real bound. In our implementation, we store the coefficients of first-order as well as second-order noise symbols, and we only add a new noise symbol for higher orders. The affine multiplication in (14.3) is changed to:

\[
\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + x_i y_0)\varepsilon_i
+ \sum_{i=1}^{n} x_i y_i \varepsilon_i^2
+ \sum_{i=1}^{n} \sum_{j=i+1}^{n} (x_i y_j + x_j y_i)\varepsilon_i \varepsilon_j. \tag{14.7}
\]

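For simplicity, assume $x^-, x^+, x_i, y^-, y^+, y_i > 0$ $(i = 0, 1, \cdots, n)$; then the bound of $\hat{x}\hat{y}$ in (14.7) is $[x_0 y_0 - \mathrm{rad}_1, x_0 y_0 + \mathrm{rad}_2]$, in which

\[ \mathrm{rad}_1 = \sum_{i=1}^{n} (x_0 y_i + x_i y_0) - \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j, \tag{14.8} \]

\[ \mathrm{rad}_2 = \sum_{i=1}^{n} (x_0 y_i + x_i y_0) + \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j, \tag{14.9} \]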

which is more accurate than the bound $[x_0 y_0 - \mathrm{rad}_2, x_0 y_0 + \mathrm{rad}_2]$ obtained by the original affine multiplication in (14.3). For other combinations of the values of $x^-, x^+, x_i, y^-, y^+, y_i$, the accuracy of affine multiplication can also be increased accordingly by considering second-order noise symbols.
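The difference between the two bounds can be seen on a small numeric case. The sketch below evaluates (14.8) and (14.9) under the positivity assumption stated above; the function name and the sample coefficients are illustrative choices, not values from the experiments.

```python
def product_bounds(x0, xs, y0, ys):
    """Bounds of x*y for affine forms sharing noise symbols (all values > 0).

    Returns (first_order, second_order):
      first_order  uses (14.3): [x0*y0 - rad2, x0*y0 + rad2]
      second_order uses (14.8)-(14.9): [x0*y0 - rad1, x0*y0 + rad2]
    """
    lin = sum(x0 * yi + xi * y0 for xi, yi in zip(xs, ys))
    quad = sum(xs) * sum(ys)   # equals the double sum of x_i * y_j
    rad1 = lin - quad
    rad2 = lin + quad
    center = x0 * y0
    return (center - rad2, center + rad2), (center - rad1, center + rad2)


# x = 2 + 0.5*e1 + 0.25*e2 and y = 3 + 0.5*e1 + 0.5*e2 (hypothetical values)
first, second = product_bounds(2.0, [0.5, 0.25], 3.0, [0.5, 0.5])
print(first)   # (1.0, 11.0)
print(second)  # (2.5, 11.0)
```

For these values the exact range of the product is $[1.25 \cdot 2, 2.75 \cdot 4] = [2.5, 11]$, so the second-order bound is tight at both ends while the first-order lower bound is loose.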


3.2 Performance Bound by Kharitonov’s Functions

Given a transfer function with variational coefficients, one can use an MC-based approach to compute the variational responses in frequency domain. However, this can be done more efficiently via Kharitonov’s functions, which are few in number yet give proven bounds on the responses in frequency domain.

Kharitonov’s seminal work proposed in 1978 [79] was originally concerned with the stability issues of a polynomial (with real coefficients) with coefficient uncertainties (due to perturbations). He showed that one needs to verify only four special polynomials to ensure that all the variational polynomials are stable.

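Specifically, given a family of polynomials with real and variational coefficients,

\[ P(s) = p_0 + p_1 s + \cdots + p_n s^n, \quad p_i^- \le p_i \le p_i^+, \quad i = 0, \cdots, n. \tag{14.10} \]

Then the four special Kharitonov’s functions are:

\[ Q_1(j\omega) = P_{emin}(\omega) + jP_{omin}(\omega), \tag{14.11} \]

\[ Q_2(j\omega) = P_{emin}(\omega) + jP_{omax}(\omega), \tag{14.12} \]

\[ Q_3(j\omega) = P_{emax}(\omega) + jP_{omin}(\omega), \tag{14.13} \]

\[ Q_4(j\omega) = P_{emax}(\omega) + jP_{omax}(\omega), \tag{14.14} \]

where

\[ P_{emin}(\omega) = p_0^- - p_2^+ \omega^2 + p_4^- \omega^4 - p_6^+ \omega^6 + \cdots, \tag{14.15} \]

\[ P_{emax}(\omega) = p_0^+ - p_2^- \omega^2 + p_4^+ \omega^4 - p_6^- \omega^6 + \cdots, \tag{14.16} \]

\[ P_{omin}(\omega) = p_1^- \omega - p_3^+ \omega^3 + p_5^- \omega^5 - p_7^+ \omega^7 + \cdots, \tag{14.17} \]

\[ P_{omax}(\omega) = p_1^+ \omega - p_3^- \omega^3 + p_5^+ \omega^5 - p_7^- \omega^7 + \cdots. \tag{14.18} \]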

One important observation is that the four special functions given by Kharitonov’s theorem create a rectangle (called Dasgupta’s rectangle) [23] in the response complex domain as shown in Fig. 14.4a, where the rectangle has edges in parallel with the real and imaginary axes. The four Kharitonov’s functions (polynomials) correspond to the four corners of the rectangle.
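The four corners can be computed directly from the coefficient bounds. The following is a minimal sketch of (14.11)–(14.18) for a fixed $\omega \ge 0$; the function name and the sample coefficient intervals are assumptions made for illustration.

```python
def kharitonov_corners(lo, hi, w):
    """Return [Q1, Q2, Q3, Q4] of (14.11)-(14.14) at angular frequency w >= 0.

    lo[i] and hi[i] are the bounds p_i^- and p_i^+ of coefficient p_i.
    """
    pe_min = pe_max = po_min = po_max = 0.0
    for i, (p_lo, p_hi) in enumerate(zip(lo, hi)):
        # (jw)^i contributes +w^i for i = 0, 1 (mod 4), -w^i for i = 2, 3 (mod 4).
        if (i // 2) % 2 == 0:
            t_min, t_max = p_lo * w**i, p_hi * w**i
        else:
            t_min, t_max = -p_hi * w**i, -p_lo * w**i
        if i % 2 == 0:      # even powers form the real part
            pe_min += t_min
            pe_max += t_max
        else:               # odd powers form the imaginary part
            po_min += t_min
            po_max += t_max
    return [complex(pe_min, po_min), complex(pe_min, po_max),
            complex(pe_max, po_min), complex(pe_max, po_max)]


# Hypothetical interval polynomial: p0 in [1, 2], p1 in [2, 3], p2 in [3, 4]
corners = kharitonov_corners([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], 1.0)
print(corners)  # [(-3+2j), (-3+3j), (-1+2j), (-1+3j)]
```

Every polynomial in the family then satisfies $P_{emin} \le \mathrm{Re}\,P(j\omega) \le P_{emax}$ and $P_{omin} \le \mathrm{Im}\,P(j\omega) \le P_{omax}$ at this frequency, which is the rectangle described above.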

Later, Levkovich et al. [90] showed that Kharitonov’s theorem can be used to calculate the amplitude and phase envelopes of a family of interval rational transfer functions of continuous-time systems in frequency domain. The results can be easily interpreted based on Dasgupta’s rectangle (which is also called Kharitonov’s rectangle), which clearly shows the largest magnitude (the longest distance from the origin of the complex plane to one corner of the rectangle). The same can be derived for the smallest magnitudes and the bounds of the phase responses.


[Figure: panels a and b; rectangle in the complex plane with edges labeled emin, emax, omin, omax and corners labeled 1 to 4]

Fig. 14.4 (a) Kharitonov’s rectangle in state 8. (b) Kharitonov’s rectangle for all nine states. Reprinted with permission from [61]. © 2011 IEEE

Table 14.1 Extreme values of |P(jω)| and arg P(jω) for nine states

State  Max |P(jω)|           Min |P(jω)|  Max arg[P(jω)]  Min arg[P(jω)]
1      Q4                    Q1           Q2              Q3
2      Q3 or Q4              Pemin        Q2              Q1
3      Q3                    Q2           Q4              Q1
4      Q1 or Q3              Pomax        Q4              Q2
5      Q1                    Q4           Q3              Q2
6      Q1 or Q2              Pemax        Q3              Q4
7      Q2                    Q3           Q1              Q4
8      Q2 or Q4              Pomin        Q3              Q1
9      Q1, Q2, Q3, or Q4     0            2π              0

Specifically, in the complex frequency domain, the magnitude and phase response of Kharitonov’s rectangle in the complex plane can be divided into nine states, which are shown in Fig. 14.4b [90]. The corresponding maximum and minimum magnitude and phase of the nine states are shown in Table 14.1:

\[ P_{\max}(\omega) = \max(|Q_1(j\omega)|, |Q_2(j\omega)|, |Q_3(j\omega)|, |Q_4(j\omega)|), \tag{14.19} \]

\[ P_{\min}(\omega) = \min(|Q_1(j\omega)|, |Q_2(j\omega)|, |Q_3(j\omega)|, |Q_4(j\omega)|, |P_{emin}|, |P_{omin}|, |P_{emax}|, |P_{omax}|, 0). \tag{14.20} \]

For the phase envelopes (states 1 to 8 in Table 14.1),

\[ \max \arg P(j\omega) = \max(\arg Q_1(j\omega), \arg Q_2(j\omega), \arg Q_3(j\omega), \arg Q_4(j\omega)). \tag{14.21} \]

In Table 14.1, $|P(j\omega)|$ and $\arg[P(j\omega)]$ are defined as the magnitude and phase of the polynomial $P(j\omega)$. Once the variational transfer function is obtained from (14.5), the coefficients can be converted from affine interval to classical interval as $\hat{a}_i = [a_i^-, a_i^+]$ and $\hat{b}_j = [b_j^-, b_j^+]$. Afterward, one can compute the upper and lower bounds of the transfer function easily:

\[ \max|H(s)| = \max|N(s)| / \min|D(s)|, \tag{14.22} \]

\[ \min|H(s)| = \min|N(s)| / \max|D(s)|, \tag{14.23} \]

\[ \max \arg[H(s)] = \max \arg[N(s)] - \min \arg[D(s)], \tag{14.24} \]

\[ \min \arg[H(s)] = \min \arg[N(s)] - \max \arg[D(s)]. \tag{14.25} \]

Since the maximum and minimum magnitude and phase of numerator $N(s)$ and denominator $D(s)$ have only a few possible cases, which are shown in Table 14.1, it is very straightforward to obtain the magnitude and phase bounds of $H(s)$ compared to large sampling-based MC simulations [90].
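The magnitude bounds (14.22) and (14.23) can be sketched with a geometric shortcut: for an axis-parallel rectangle, the largest magnitude is the distance to the farthest corner and the smallest magnitude is the distance from the origin to the rectangle. This is an assumption-laden illustration that replaces the state lookup of Table 14.1 with an equivalent distance computation; the function names and the sample rectangles are hypothetical, and phase bounds are not covered.

```python
import math


def magnitude_range(re_lo, re_hi, im_lo, im_hi):
    """Min and max distance from the origin to an axis-parallel rectangle."""
    far = max(math.hypot(r, i) for r in (re_lo, re_hi) for i in (im_lo, im_hi))
    # Nearest point: clamp 0 onto each axis range separately.
    near_re = 0.0 if re_lo <= 0.0 <= re_hi else min(abs(re_lo), abs(re_hi))
    near_im = 0.0 if im_lo <= 0.0 <= im_hi else min(abs(im_lo), abs(im_hi))
    return math.hypot(near_re, near_im), far


def transfer_magnitude_bounds(num_rect, den_rect):
    """Apply (14.22)-(14.23) to numerator and denominator rectangles."""
    n_min, n_max = magnitude_range(*num_rect)
    d_min, d_max = magnitude_range(*den_rect)
    if d_min == 0.0:
        raise ZeroDivisionError("denominator rectangle contains the origin")
    return n_min / d_max, n_max / d_min


# Rectangles given as (Pemin, Pemax, Pomin, Pomax) at one frequency.
h_min, h_max = transfer_magnitude_bounds((3.0, 6.0, 4.0, 8.0),
                                         (-1.0, 1.0, 2.0, 4.0))
print(h_min, h_max)
```

In this example the denominator rectangle straddles the imaginary axis, so its smallest magnitude is $|P_{omin}| = 2$ rather than a corner value, matching the kind of entry that appears in the Min column of Table 14.1.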

It was shown that if all the variational coefficients are not correlated and the value of each coefficient in the numerator and denominator belongs to a finite real interval, the magnitude and phase bounds are precise (real bounds) [90], i.e., each bound will be attained by one function in the variational function family. But in our problem, we know that each circuit parameter may contribute to several coefficients during the evaluations of coefficient DDDs, and thus, the variational coefficients are not independent.

However, DDD can generate the dominant terms of each coefficient in different powers of $s$ by performing the shortest path algorithm [176]. The shared parameters in the dominant terms can be removed from different coefficients to tighten the affine interval bounds and reduce the correlation between coefficients.

In the experiment part, we show that the bounds given by Kharitonov’s theorem are conservative and they indeed cover all the responses from the MC simulation results.


The presented method has been implemented in C++, and the affine arithmetic part is based on [43]. All the experimental results were obtained on a Linux system with quad Intel Xeon CPUs at 3 GHz and 16 GB memory. The presented performance bound method was tested on two sample circuits: one is a CMOS low-pass filter (shown in Fig. 14.5), and the other is a CMOS cascode op-amp circuit [154], where the small signal model is used to model the MOSFET transistors. The initial results of this chapter were published in [61].

The information about the complexity of the complex DDD and s-expanded DDD after symbol decancellation is shown in columns 1 to 7 in Table 14.3, in which NumP and DenP are the total numbers of product terms in the numerator and denominator of the transfer function and |DDD| is the size (number of vertices) of the DDD representing both the numerator and the denominator of the transfer function. From the table, we can see that s-expanded DDDs are able to represent a huge number of product terms with a relatively small number of vertices by means of sharing among different coefficient DDDs.

Fig. 14.5 (a) A low-pass filter. (b) A linear model of the op-amp in the low-pass filter. Reprinted with permission from [61]. © 2011 IEEE

Table 14.2 Summary of coefficient radius reduction with cancellation

           Ave.                Max.                Min.
Var. (%)   Num. (%)  Den. (%)  Num. (%)  Den. (%)  Num. (%)  Den. (%)
 5         23.2      35.2      36.8      51.7      2.0       25.7
10         36.9      52.0      54.5      66.6      3.9       41.4
15         45.9      61.9      64.8      73.6      5.8       51.6

Table 14.3 Summary of DDD information and performance of the presented method

           Complex DDD             s-Expanded DDD
Circuit    NumP   DenP   |DDD|     NumP    DenP     |DDD|
Low-pass   5      8      31        7       70       32
Cascode    76     216    153       4,143   13,239   561

                      Global          Local           Bound range
Circuit    Number     variation (%)   variation (%)   Mag (%)   Pha (%)   Speedup
           of ε                                                           to MC
Low-pass   7          5               10              95.1      93.8      115
                      10              10              92.5      91.9      101
Cascode    30         5               10              83.9      84.3      77
                      10              10              81.1      80.2      68

First, we show that term decancellation is critical in improving the accuracy of interval bounds in DDD evaluation using affine interval. Table 14.2 shows the effect of coefficient affine radius reduction considering term decancellation for the two given example circuits during the DDD evaluation under different sets of variations. Var, Num, and Den represent process variation, numerator, and denominator, respectively. As can be seen from the table, the average radius reduction is 35.4% and 49.8% for numerators and denominators, respectively, and the reduction effect grows with increasing process variation. As a result, symbolic decancellation can indeed significantly reduce the pessimism of affine arithmetic.


[Figure: Bode diagram of CMOS lowpass filter; panels: magnitude (dB) and phase (deg) versus frequency (Hz); curves: Monte Carlo, Nominal, Affine DDD]

Fig. 14.6 Bode diagram of the CMOS low-pass filter. Reprinted with permission from [61]. © 2011 IEEE

Second, we present the performance of the presented method. For the low-pass filter example, we introduce three noise symbols $\varepsilon$ as the local variation source for the VCCS, resistor, and capacitor inside the linear op-amp model shown in Fig. 14.5b. And we introduce another four noise symbols $\varepsilon$ for other devices of the filter as global variation. For the cascode op-amp example, we introduce three noise symbols $\varepsilon$ for the VCCS, resistor, and capacitor inside the small signal model for each MOSFET transistor as local variation source and introduce another six noise symbols $\varepsilon$ for other devices in the op-amp as global variations. The total number of noise symbols for each testing circuit is shown in the 8th column in Table 14.3. As a DDD expression is exactly symbolic and does not have any approximations, it is proved to be accurate compared with SPICE (which uses the simple linearized device models). In the experiments, we compare the obtained result with the Monte Carlo simulations using DDD. We test the presented algorithm on different global/local variation pairs as shown in column 9. We introduce the bound range, which is the average value of the bound of the MC simulation divided by the bound of the presented method.

Shown in Figs. 14.6 and 14.7 are the two results for comparison of the presented method and the MC method under 10% global, 10% local variation and 5% global, 10% local variation, in which Affine DDD is the presented method and Nominal is the response of the circuit without parameter variation. During all the simulations, we found that the bound calculated by Kharitonov’s functions in the presented method is always the conservative bound compared with MC. However, further investigation is needed to obtain tighter bounds using affine arithmetic. We chose the number of MC samples to be 10,000. The speedup of the presented method compared with MC is shown in column 12 in Table 14.3. The average speedup is 90× for the given circuits.

[Figure: Bode diagram of CMOS cascode opamp; panels: magnitude (dB) and phase (deg) versus frequency (Hz); curves: Nominal, Affine DDD, Monte Carlo]

Fig. 14.7 Bode diagram of the CMOS cascode op-amp. Reprinted with permission from [61]. © 2011 IEEE

5 Summary

In this chapter, we have presented a performance bound analysis algorithm of analog circuits considering process variations. The presented method applies a graph-based symbolic analysis and affine interval arithmetic to derive the variational transfer functions of linearized analog circuits with variational coefficients. Then the frequency response bounds were obtained by using Kharitonov’s polynomial theorem. We have shown that symbolic decancellation is important and necessary to reduce pessimism for affine interval analysis. We also showed that the response bound given by Kharitonov’s functions is conservative given the correlations among coefficient intervals in transfer functions. Numerical examples demonstrated the effectiveness of the presented algorithm compared to the MC method.

Chapter 15 Stochastic Analog Mismatch Analysis

1 Introduction

For sub-90 nm technologies, mismatch in transistors is one of the primary obstacles to reaching a high yield rate for analog designs. For example, mismatch of CMOS devices nearly doubles for every process generation below 90 nm [80, 104] due to an inverse-square-root-law dependence on the transistor area.

Similar to leakage analysis, the traditional worst-case-based analysis is too pessimistic and sacrifices speed, power, and area. Therefore, the statistical approach [6, 80, 105, 128, 133] becomes a viable approach to estimate analog mismatch. Analog circuit designers usually perform a MC analysis to analyze and predict the statistical mismatch and functionality of VLSI designs. As MC analysis requires a large number of repeated circuit simulations to achieve accurate results, its computational cost is extremely high. Besides, the MC pseudorandom generator introduces numerical noise that may lead to errors.

Recently, many NMC methods [6, 80, 128] were developed to analyze stochastic mismatch in VLSI. The authors of [128] calculated dc sensitivities with respect to small device-parameter perturbations and scaled them as desired mismatches, while [80] extended the above work by modeling dc mismatches as ac noise sources. In a transient simulation, the mismatch is converted back from the power spectral density (PSD) in frequency domain. These NMC mismatch simulations can be much faster than the MC approaches, but the accuracy remains a concern.

Recently, the mismatch was studied within the framework of the stochastic differential algebraic equation (SDAE), in a method called SiSMA [6]. SiSMA is similar to dealing with the transient noise [27]. Due to the random variable existing in the DAE, it is unknown if the derivative is still continuous. Besides, the mismatch of the channel current in transistors is of top interest to designers. As a result, the mismatch was modeled as a stochastic current source in SiSMA and formed an SDAE. Assuming the magnitude of the stochastic mismatch is much smaller than the nominal case, the nominal SDAE at dc can be linearized with the stochastic current source. The obtained dc solution from SiSMA is used as the initial condition (ic) for transient analysis. This assumption may not be accurate enough for describing the mismatch during the transient simulation, as the stochastic current source is only included during dc. Another limitation is that SiSMA calculates the mismatch by the extraction and analysis of a covariance matrix to avoid an expensive MC simulation. When there are thousands of devices, it would be slow to analyze the covariance matrix. Moreover, the computation is expensive for large-scale problems since the entire circuit is analyzed twice. As a result, there is still a need for a faster transient mismatch analysis technique, which requires two improvements: a different NMC method and an efficient macromodel obtained by nonlinear model order reduction (MOR).

This chapter presents a fast NMC mismatch analysis, named the isTPWL method [202], which uses an incremental and stochastic TPWL macromodel. First, we introduce the transient mismatch model and its macromodeling in this chapter and then the way to linearize the SDAE along a series of snapshots on a nominal transient trajectory. After that, a stochastic current source (for mismatch) is added at each snapshot as a perturbation, which is more accurate than considering the mismatch through an ic condition [6]. We further show how to apply an improved TPWL model order reduction [58, 144, 181] to generate a stochastic nonlinear macromodel along the snapshots of the nominal transient trajectory. After that, we apply it for a fast transient mismatch analysis along the full transient trajectory. The presented approach applies incremental aggregation on local tangent subspaces, linearized at snapshots. In this way, the applied technique can reduce the computational complexity of [58] and even improve the accuracy of [144].

The numerical examples show that the isTPWL method is 5× more accurate than the work in [144] and is 20× faster than the work in [58] on average. Besides, the nonlinear macromodels reduce the runtime by up to 25× compared to the use of the full model during the mismatch analysis.

Next, in order to solve the SDAE efficiently and avoid applying MC iterations or analyzing the expensive covariance matrix [6], the stochastic variation is described by the spectral stochastic method based on OPC and forms a corresponding SDAE [196]. The chapter presents a new method to apply OPC for nonlinear analog circuits during an NMC mismatch analysis. Numerical results show that compared to the MC method, the presented method is 1,000 times faster with a similar accuracy.

The rest of the chapter is organized in the following manner. In Sect. 2, the background of the mismatch model and the nonlinear model order reduction are presented. Section 3 discusses a transient mismatch analysis in SDAE, including a perturbation analysis and an NMC analysis by the OPC expansions. We develop an incremental and stochastic TPWL model order reduction for mismatch in Sect. 4. Numerical examples are given in Sect. 5. Section 6 concludes and summarizes the chapter.


2 Preliminary

2.1 Review of Mismatch Model

Precise mismatch modeling and analysis are the key to a robust analog circuit design. Similar to the two components of process variation, inter-die and intra-die, there are global and local components of mismatch. The global mismatch affects the whole chip the same way, while the local mismatch is more complex and the most difficult one to analyze, and hence, it is the focus of this chapter.

The local mismatch is dependent on the variation in process parameters. Pelgrom’s model [133] is one of the most popular CMOS mismatch models. It relates the local mismatch variance of one electrical parameter (such as the channel current $I_d$) to geometrical parameters (such as the area $A$) by a geometrical dependence equation as follows:

\[ \sigma_{I_d} = \frac{\sigma_\beta}{\sqrt{A}}, \tag{15.1} \]

where $A = W \cdot L$ is the area of a device with width $W$ and length $L$, and $\sigma_\beta$ is an extracted constant depending on the operating region $\beta$.
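The inverse-square-root area dependence in (15.1) is easy to check numerically. The sketch below is illustrative only: the function name is hypothetical and the extracted constant is set to 1.0 in arbitrary units, since no numeric value is given in the text.

```python
import math


def local_mismatch(coeff, width, length):
    """Mismatch per (15.1): extracted constant divided by sqrt(W * L)."""
    return coeff / math.sqrt(width * length)


# Hypothetical constant of 1.0 (arbitrary units); W and L in the same length unit.
large = local_mismatch(1.0, 2.0, 2.0)   # area 4
small = local_mismatch(1.0, 1.0, 1.0)   # area 1
print(large, small)  # 0.5 1.0
```

Halving both $W$ and $L$ reduces the area by a factor of four and therefore doubles the local mismatch.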

Considering process parameters other than the geometry, a more general-purpose mismatch model can be derived through the so-called backward propagation of variance (BPV) method [105] for other devices such as diodes and BJTs. For example, the base current $I_b$ depends on the base current density, emitter area, and sheet resistance. The BPV model relates the local mismatch of an electrical property $e$ to those process parameters $p_l$ by a first-order sensitivity:

$$\Delta e = \sum_l \left(\frac{\partial e}{\partial p_l}\right) \Delta p_l. \tag{15.2}$$

Based on the mismatch model in (15.2), an NMC transient mismatch analysis for a large number of transistors can be developed, as shown in Sect. 3.
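The first-order sensitivity sum in (15.2) can be sketched numerically as follows. This is only an illustration under stated assumptions: the sensitivities are estimated by central finite differences, and the property model `e` (a product of two parameters) is hypothetical rather than a device equation from [105].

```python
def first_order_mismatch(e, p_nom, dp, rel_step=1e-6):
    """Approximate the mismatch of e as sum_l (de/dp_l) * dp_l, cf. (15.2)."""
    total = 0.0
    for l, delta in enumerate(dp):
        step = rel_step * max(abs(p_nom[l]), 1.0)
        hi = list(p_nom)
        lo = list(p_nom)
        hi[l] += step
        lo[l] -= step
        sensitivity = (e(hi) - e(lo)) / (2.0 * step)  # central difference
        total += sensitivity * delta
    return total

# Hypothetical property: current = density * area.
def current(p):
    return p[0] * p[1]

mismatch = first_order_mismatch(current, p_nom=[2.0, 3.0], dp=[0.1, -0.2])
```

For this bilinear example the sensitivities are 3 and 2, so the first-order mismatch is 3(0.1) + 2(-0.2) = -0.1.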

2.2 Nonlinear Model Order Reduction

Here we first discuss the nominal model of a nonlinear circuit and then extend it to a stochastic model. The nominal nonlinear circuit is described by the following differential-algebraic equation (DAE):

$$f(x, \dot{x}, t) = B u(t), \tag{15.3}$$

where $x$ (with $\dot{x} = dx/dt$) are the state variables, which include nodal voltages and branch currents; $f(x, \dot{x}, t)$ describes the nonlinear $i$-$v$ relation; and $u(t)$ are the external sources, with a topology matrix $B$ describing how they are connected to the circuit. The time cost of solving the MNA equations in (15.3) has three parts: device evaluation, matrix factorization, and time-step control and integration. Matrix factorization dominates the runtime when the circuit is large or when the devices are latent most of the time. Under this condition, model order reduction can be used to reduce the circuit size and hence the overall runtime. Model order reduction can therefore serve as a powerful speedup tool in a transient mismatch analysis as well.

The basic idea of model order reduction is to find a low-dimensional subspace that represents the original state space while preserving the system response, which is usually realized as a coordinate transformation. For linear circuits, the coordinate transformation can be described by a linear mapping:

$$z = V^T x, \quad x = V z, \tag{15.4}$$

where $V \in \mathbb{R}^{N \times q}$ ($q \ll N$) is a low-dimensional projection matrix. $V$ can be constructed from the first few dominant bases spanning a space of moments (or derivatives of transfer functions) [36, 127].

For nonlinear circuits, model order reduction is more complex, and many MOR techniques have been developed [58, 144, 146, 181]. Similar to MOR for linear circuits, a nonlinear mapping can be defined by a function $\phi$:

$$z = \phi(x), \quad x = \phi^{-1}(z). \tag{15.5}$$

Without loss of generality, and for simplicity of illustration, we assume an ordinary differential equation (ODE) form

$$\dot{x} = f(x, t) + B u(t) \tag{15.6}$$

for the DAE in (15.3). Since

$$\dot{z} = \frac{d\phi}{dx}\frac{dx}{dt} = \left(\frac{d\phi}{dx} f(x, t)\right) + \left(\frac{d\phi}{dx} B\right) u(t), \tag{15.7}$$

we have

$$\dot{z} = \hat{f}(z, t) + \hat{B} u(t), \quad \hat{f}(z, t) = \left.\left(\frac{d\phi}{dx} f(x, t)\right)\right|_{x = \phi^{-1}(z)}, \quad \hat{B} = \frac{d\phi}{dx} B. \tag{15.8}$$

In this way, if a proper lower-dimensional mapping function $\phi$ (from dimension $N$ to dimension $q$) can be found, the original nonlinear system can be reduced within a tangent subspace spanned by $d\phi/dx$ (also called the manifold).


The authors of [58] related the above nonlinear mapping function $\phi$ to the TPWL method [144], which leads to a local two-dimensional (2D) projection [58]. The advantage is that such a local 2D projection is constructed from local tangent subspaces and therefore maintains high accuracy. The drawback is its time complexity: the local 2D projection can be computationally expensive to project and store when the number of local tangent subspaces is large. The TPWL method [144], on the other hand, approximates the nonlinear mapping function $\phi$ by aggregating those local tangent subspaces with a global SVD, which results in a one-dimensional (1D) projection. The global 1D projection is more efficient and takes less runtime. However, the accuracy of the TPWL model order reduction is limited because the information in the dominant bases of each local tangent subspace is lost during the global SVD [58]. In Sect. 4, an incremental aggregation that balances speed and accuracy is introduced. In addition, the nonlinear model order reduction is extended to consider the stochastic mismatch, as shown in Sect. 4.

3 Stochastic Transient Mismatch Analysis

3.1 Stochastic Mismatch Current Model

It is difficult to add the stochastic mismatch $\xi$ to the state variable $x$ of (15.3) directly, since $f(x, \dot{x}, \xi)$ may not be differentiable. Therefore, we model the mismatch as a current source $i(x, \xi)$ added to the right-hand side of (15.3), similar to SiSMA [6]:

$$f(x, \dot{x}, t) = F\, i(x, \xi) + B u(t). \tag{15.9}$$

Here, $F$ is the topology matrix describing how $i$ is connected to the circuit. Based on the BPV equation in (15.2), the stochastic current source $i$ has the following form:

$$i(x, \xi) = n(x) \sum_l g_\beta(p_l)\, \xi_l, \tag{15.10}$$

where $\xi_l$ is a random variable with a stochastic distribution $W(\xi_l)$ for the parameter $p_l$; $n(x)$ describes the bias-dependent condition (depending on $x$ and $\dot{x}$), provided by a nominal transient simulation; and $g_\beta(p_l)$ is a constant for the parameter $p_l$ at operating region $\beta$. Taking one CMOS transistor with respect to the area parameter $A$ as an example, $\xi_A$ is a Gaussian random variable, $g_\beta(A)$ is $\sigma_\beta/\sqrt{A}$, and $n(x)$ becomes $I_d$. In general, $g_\beta(p_l)$ can be either derived from analytical device equations or characterized from measurements [105].
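The structure of the current source in (15.10) can be sketched as below. The helper names and numbers are hypothetical. The second function additionally assumes that the $\xi_l$ are independent with unit variance, an assumption made here for illustration that goes beyond what (15.10) itself states.

```python
import math

def mismatch_current(n_x, g, xi):
    """Evaluate i(x, xi) = n(x) * sum_l g_l * xi_l for one sample of xi."""
    if len(g) != len(xi):
        raise ValueError("g and xi must have the same length")
    return n_x * sum(g_l * xi_l for g_l, xi_l in zip(g, xi))

def mismatch_current_sigma(n_x, g):
    """Standard deviation of i, assuming independent unit-variance xi_l."""
    return abs(n_x) * math.sqrt(sum(g_l ** 2 for g_l in g))

# Hypothetical bias current and sensitivity constants.
sample = mismatch_current(2.0, [0.1, 0.2], [1.0, -1.0])
sigma = mismatch_current_sigma(1e-3, [0.03, 0.04])
```

Because the random variables enter linearly, the bias-dependent factor $n(x)$ only scales the spread of the injected current.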


3.2 Perturbation Analysis

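In this chapter, we assume that the impact of the local mismatch is small, so that (15.9) can be solved by treating the right-hand-side mismatch term as a perturbation to the nominal trajectory $x^{(0)}(t)$ of the circuit, where $x^{(0)}(t)$ is the nominal state variable, that is, the solution of the nonlinear circuit equation

$$f\left(x^{(0)}, \dot{x}^{(0)}, t\right) = B u(t). \tag{15.11}$$

A first-order Taylor expansion of $f(x, \dot{x}, t)$ in (15.9) around the nominal trajectory leads to

$$f\left(x^{(0)}, \dot{x}^{(0)}, t\right) + \frac{\partial f(x, \dot{x}, t)}{\partial x}\left(x - x^{(0)}\right) + \frac{\partial f(x, \dot{x}, t)}{\partial \dot{x}}\left(\dot{x} - \dot{x}^{(0)}\right) = F\, i\left(x^{(0)}, \xi\right) + B u(t), \tag{15.12}$$

or, after subtracting (15.11),

$$G\left(x^{(0)}, \dot{x}^{(0)}\right) x_m + C\left(x^{(0)}, \dot{x}^{(0)}\right) \dot{x}_m = F\, i\left(x^{(0)}, \xi\right), \tag{15.13}$$

where

$$G\left(x^{(0)}, \dot{x}^{(0)}\right) = \left.\frac{\partial f(x, \dot{x}, t)}{\partial x}\right|_{x = x^{(0)},\, \dot{x} = \dot{x}^{(0)}}, \quad C\left(x^{(0)}, \dot{x}^{(0)}\right) = \left.\frac{\partial f(x, \dot{x}, t)}{\partial \dot{x}}\right|_{x = x^{(0)},\, \dot{x} = \dot{x}^{(0)}} \tag{15.14}$$

are the linearized conductive and capacitive components stamped by the companion models in SPICE, and $x_m = x - x^{(0)}$ is the first-order perturbed mismatch response. Recall that $x^{(0)}(t)$ and $\dot{x}^{(0)}(t)$ are a number of time-dependent biasing points along the transient trajectory.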
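To make the linearization in (15.14) concrete, the sketch below computes $G$ and $C$ for a single-node circuit and checks $G$ against a finite difference. The circuit (a capacitor, a resistor, and a diode to ground) and all component values are hypothetical; in a simulator such as SPICE these Jacobians come from the device companion models.

```python
import math

# Hypothetical single-node circuit: capacitor, resistor, and diode to ground.
I_S, V_T, R, C_CAP = 1e-14, 0.025, 1e3, 1e-12

def f(x, x_dot):
    """Node equation f(x, x_dot): capacitor, resistor, and diode currents."""
    return C_CAP * x_dot + x / R + I_S * (math.exp(x / V_T) - 1.0)

def jacobians(x0, x_dot0):
    """Analytical G = df/dx and C = df/dx_dot at the bias point (x0, x_dot0)."""
    G = 1.0 / R + (I_S / V_T) * math.exp(x0 / V_T)
    C = C_CAP  # f is linear in x_dot, so C does not depend on the bias here
    return G, C

x0, x_dot0 = 0.6, 1e6
G, C = jacobians(x0, x_dot0)

# Central finite difference as an independent check of G.
eps = 1e-7
G_fd = (f(x0 + eps, x_dot0) - f(x0 - eps, x_dot0)) / (2.0 * eps)
```

Because the diode conductance grows exponentially with the bias, $G$ must be re-evaluated at each biasing point along the nominal trajectory.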

3.3 Non-Monte Carlo Analysis by Spectral Stochastic Method

Performing Monte Carlo or correlation-based mismatch analysis can be very expensive. In this part, the random variable $\xi$ in the perturbed SDAE (15.13) is therefore handled through an OPC expansion using the spectral stochastic method in Sect. 3.2 of Chap. 2. Different process variations correspond to different orthogonal polynomials. In this chapter, we assume that the random process parameters for the local mismatch have a Gaussian distribution. Therefore, the corresponding Hermite polynomials (for one random variable)

$$\Phi(\xi) = [\Phi_0(\xi), \Phi_1(\xi), \Phi_2(\xi), \ldots]^T = [1, \xi, \xi^2 - 1, \ldots]^T \tag{15.15}$$

are used to construct the basis of the HPC expansion to calculate the mean and the variance of $x_m(t)$.

The first step is to expand the stochastic state variable $x_m(t)$ as

$$x_m(t) = \sum_i \alpha_i(t)\, \Phi_i(\xi). \tag{15.16}$$

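Then, we take the inner product of the residual error

$$\Delta(\xi) = G\left(x^{(0)}, \dot{x}^{(0)}\right) \sum_i \alpha_i(t)\, \Phi_i(\xi) + C\left(x^{(0)}, \dot{x}^{(0)}\right) \sum_i \dot{\alpha}_i(t)\, \Phi_i(\xi) - F\, n\left(x^{(0)}\right) \sum_l g_\beta(p_l)\, \xi_l$$

with the orthogonal basis $\Phi_j(\xi)$, which results in

$$\langle \Delta(\xi), \Phi_j(\xi) \rangle = \int_\xi \Delta(\xi)\, \Phi_j(\xi)\, W(\xi)\, d\xi = 0, \tag{15.17}$$

where $W(\xi)$ is the PDF of the random variable $\xi$. We assume that all parameters involved here follow a Gaussian distribution.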

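Without loss of generality, for one random variable $\xi$ modeling one geometrical parameter $p$, it is easy to verify that (15.17) leads to

$$\alpha_0 = 0, \quad \alpha_2 = 0, \qquad G\left(x^{(0)}, \dot{x}^{(0)}\right) \alpha_1(t) + C\left(x^{(0)}, \dot{x}^{(0)}\right) \dot{\alpha}_1(t) = F\, n\left(x^{(0)}\right) g_\beta(p) \tag{15.18}$$

with a second-order HPC expansion of $x_m(\xi)$. The corresponding variance is thereby given by

$$\mathrm{Var}\left(x_m(\xi)\right) = \alpha_1^2\, \mathrm{Var}(\xi) + \alpha_2^2\, \mathrm{Var}(\xi^2 - 1) = \alpha_1^2, \tag{15.19}$$

so the time-varying standard deviation is $|\alpha_1(t)|$.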
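The orthogonality that underlies (15.17) through (15.19) can be checked numerically. The sketch below uses NumPy's Gauss-Hermite quadrature for the probabilists' weight $e^{-\xi^2/2}$ to form the Gram matrix of the basis $\{1, \xi, \xi^2 - 1\}$ under the standard normal PDF. The choice of five quadrature nodes is an assumption that is sufficient here, since the integrands have degree at most four.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight function exp(-xi^2 / 2).
nodes, weights = hermegauss(5)
weights = weights / np.sqrt(2.0 * np.pi)  # normalize to the standard normal PDF

# Rows: the basis functions 1, xi, xi^2 - 1 evaluated at the nodes.
basis = np.vstack([np.ones_like(nodes), nodes, nodes ** 2 - 1.0])

# gram[i, j] approximates the inner product <Phi_i, Phi_j> of (15.17).
gram = (basis * weights) @ basis.T
```

The off-diagonal entries vanish, and the diagonal gives $\mathrm{Var}(\xi) = 1$ and $\mathrm{Var}(\xi^2 - 1) = 2$, which are the factors appearing in (15.19).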

The first-order OPC coefficient $\alpha_1(t)$ in (15.18) can be solved by backward-Euler integration as follows:

$$\left(G_k + \frac{1}{h} C_k\right) \alpha_1(t_k) = \frac{1}{h} C_k\, \alpha_1(t_k - h) + F\, i_k, \tag{15.20}$$

where

$$G_k = G\left(x_k^{(0)}, \dot{x}_k^{(0)}\right), \quad C_k = C\left(x_k^{(0)}, \dot{x}_k^{(0)}\right), \quad i_k = n(x_k) \sum_l g_\beta(p_l) \tag{15.21}$$

are the Jacobians and the mismatch current source at the $k$th time instant along the nominal trajectory $x^{(0)}$.
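A minimal sketch of the backward-Euler recursion in (15.20) is given below. The function name and the scalar test system are hypothetical. The per-step matrices are passed in as lists, standing in for the Jacobians that a simulator would evaluate along the nominal trajectory, and `rhs_list` holds the already-assembled terms $F\, i_k$.

```python
import numpy as np

def backward_euler_alpha1(G_list, C_list, rhs_list, h, alpha0):
    """Integrate (G_k + C_k/h) a_k = (C_k/h) a_{k-1} + rhs_k, cf. (15.20)."""
    alpha = [np.asarray(alpha0, dtype=float)]
    for Gk, Ck, bk in zip(G_list, C_list, rhs_list):
        lhs = Gk + Ck / h
        alpha.append(np.linalg.solve(lhs, Ck @ alpha[-1] / h + bk))
    return np.array(alpha)

# Hypothetical scalar system with G = C = 1 and a unit mismatch source.
steps, h = 100, 0.01
one = np.array([[1.0]])
alpha = backward_euler_alpha1([one] * steps, [one] * steps,
                              [np.array([1.0])] * steps, h, alpha0=[0.0])
final_alpha = alpha[-1][0]
```

For this test system the continuous solution is $1 - e^{-t}$, so the value at $t = 1$ should be close to $0.632$, with the usual first-order discretization error of backward Euler.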


It is easy to see that a naive application of the above perturbation-based mismatch analysis is still slow, since $G_k$, $C_k$, and $i_k$ have to be evaluated at every time step along the nominal trajectory. Therefore, in Sect. 4, only $K$ snapshots along the nominal trajectory are used within a macromodeling framework instead of linearizing along the full nominal trajectory.

3.4 A CMOS Transistor Example

In this part, we use one CMOS transistor as an example, modeled with a geometric parameter $A$ and the corresponding Gaussian random variable $\xi_A$. Then (15.18) becomes

$$\left(G_k + \frac{1}{h} C_k\right) \alpha_1(t_k) = \frac{1}{h} C_k\, \alpha_1(t_k - h) + \frac{\sigma_\beta}{\sqrt{A}} \cdot (I_d)_k \tag{15.22}$$

at the $k$th time step. Recall that $G_k$, $C_k$, and $(I_d)_k$ represent the nominal values of the conductance ($g_{ds}$), capacitance ($c_{ds}$), and channel current $I_d$ evaluated at $t_k$; $g_\beta(A)$ is $\sigma_\beta/\sqrt{A}$, and $n(x)$ becomes $I_d$. Note that $\sigma_\beta$ is the constant extracted from Pelgrom’s model.

In this way, the transient mismatch voltage ($x_m = \alpha_1(t)\, \Phi_1(\xi_A)$) of this transistor has a time-varying variance $\alpha_1(t)^2$, which can be solved from the above perturbation equation. In most cases, $\sigma_\beta/\sqrt{A}$ corresponds to a few percent of the nominal channel current $I_d$. More importantly, we can simultaneously solve the transient mismatch vector using (15.18), with $g_\beta(p_l)$ generally characterized by the BPV model [105], for thousands of transistors of different types.

4 Macromodeling for Mismatch Analysis

To speed up the analysis, we can take $K$ snapshots along a nominal transient trajectory instead of performing a full simulation for the nominal transient and the transient mismatch. The subspaces or macromodels can then be found from the $K$ snapshots with respect to the right-hand sides of the nominal input and the stochastic current source, respectively. Afterward, efficient transient analysis and transient mismatch estimation can be performed along the full transient trajectory using those macromodels. In the following, we first introduce an incremental TPWL method for the nominal transient to balance accuracy and efficiency when generating the macromodel. We then extend this approach to the incremental stochastic TPWL (isTPWL) to handle the stochastic mismatch.


4.1 Incremental Trajectory-Piecewise-Linear Modeling

As discussed in Sect. 2, the first step in TPWL takes a small number of snapshots along a typical transient trajectory and performs a local reduction at each linearized snapshot or biasing point. The second step creates a global subspace from the sequence of linearized local subspaces obtained at those snapshots. A singular value decomposition (SVD) [51] is then applied to analyze the global subspace and to construct a global projection matrix with weights. The linearized stochastic DAE (15.18) can be naturally reduced in the framework of the TPWL method, since the stochastic mismatch analysis is performed along the nominal trajectory $x^{(0)}$.

Suppose that $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ are taken along the nominal trajectory $x^{(0)}$. The linearized SDAE at the $k$th snapshot is

$$G_k\, \alpha_1(t) + C_k\, \dot{\alpha}_1(t) = F\, i_k. \tag{15.23}$$

In the frequency domain, the above linearized subsystem is contained in a subspace $\{A_k, A_k R_k, A_k^2 R_k, \ldots\}$ composed of moments expanded at a frequency point $s_0$ using two moment matrices:

$$A_k = (G_k + s_0 C_k)^{-1} C_k, \quad R_k = (G_k + s_0 C_k)^{-1} F. \tag{15.24}$$

With the use of the block-Arnoldi orthonormalization [127], a $q_0$th-order projection matrix $V_k \in \mathbb{R}^{N \times q_0}$,

$$V_k = \left[v_k^1, v_k^2, \ldots, v_k^{q_0}\right], \quad k = 1, \ldots, K, \tag{15.25}$$

can be constructed locally. Here the subscript denotes the index of the snapshot, and the superscript denotes the index of the reduction order.
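The construction of a local projection matrix from (15.24) and (15.25) can be sketched as follows. This is a simplified stand-in rather than the block-Arnoldi procedure of [127]: it forms the Krylov vectors $R_k, A_k R_k, A_k^2 R_k, \ldots$ explicitly and orthonormalizes them with a QR factorization, which is less robust numerically for high orders. The function name and the small ladder-like test matrices are hypothetical.

```python
import numpy as np

def local_krylov_basis(G, C, F, s0, order):
    """Orthonormal basis for span{R, A R, ..., A^(order-1) R}, cf. (15.24)."""
    shifted = G + s0 * C
    A = np.linalg.solve(shifted, C)
    R = np.linalg.solve(shifted, F)
    blocks = [R]
    for _ in range(order - 1):
        blocks.append(A @ blocks[-1])
    V, _ = np.linalg.qr(np.hstack(blocks))  # columns of V are orthonormal
    return V

# Hypothetical linearized snapshot: tridiagonal G, identity C, one source.
N = 6
G = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C = np.eye(N)
F = np.zeros((N, 1))
F[0, 0] = 1.0
V = local_krylov_basis(G, C, F, s0=0.0, order=3)

# Reduced DC solve: project, solve the small system, and lift back.
z_dc = np.linalg.solve(V.T @ G @ V, V.T @ F)
x_dc_reduced = V @ z_dc
x_dc_full = np.linalg.solve(G, F)
```

Since the zeroth moment $R_k$ lies in the span of $V$, the reduced model reproduces the DC response of this symmetric positive definite example exactly, which is the moment-matching property the projection relies on.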

4.1.1 Local Tangent Subspace

When the input vector is given (usually a set of typical inputs is used), we take $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ along a nominal transient trajectory $x^{(0)}(t)$ and linearize the DAE (15.3) at the $K$ snapshots (or biasing points), with the first snapshot $x_1$ taken at the ic point. The linearized DAE at the $k$th ($k = 1, \ldots, K$) snapshot is

$$G_k\left(x - x_k^{(0)}\right) + C_k\left(\dot{x} - \dot{x}_k^{(0)}\right) = \delta_k, \quad \delta_k = B u(t_k) - f\left(x_k^{(0)}, \dot{x}_k^{(0)}, t_k\right), \tag{15.26}$$

where $\delta_k$ represents the rhs source and the “nonequilibrium” update. In the frequency domain, $x_k^{(0)}$ at the $k$th snapshot is contained in a subspace of moments $\{A_k, A_k R_k, A_k^2 R_k, \ldots\}$ expanded at a frequency point $s_0$, where

$$A_k = (G_k + s_0 C_k)^{-1} C_k, \quad R_k = (G_k + s_0 C_k)^{-1} \delta_k \tag{15.27}$$

are two moment matrices.

With the use of the block-Arnoldi orthonormalization [127], a $q_0$th-order projection matrix $V_k \in \mathbb{R}^{N \times q_0}$ with $q_0$ bases

$$V_k = \left[v_k^1, v_k^2, \ldots, v_k^{q_0}\right] \tag{15.28}$$

can be constructed locally to represent that local subspace. We call $v_k^i$ ($k = 1, \ldots, K$; $i = 1, \ldots, q_0$) the first $q_0$ dominant bases of $V_k$, where the subscript and superscript denote the index of the local subspace and the order of the dominant basis, respectively. Block-Arnoldi orthonormalization finds a linear coordinate transformation $V_k$ that maintains $\|z - z_k^{(0)}\| \approx \|x - x_k^{(0)}\|$. Moreover, as discussed in the following part, those $V_k$ span a subspace for $d\phi/dx$, the tangent (also called the manifold) of the mapping function $\phi$ introduced in Sect. 2. In this chapter, we call the space spanned by the $V_k$ the local tangent subspace.

4.1.2 Local and Global Projection

One approach to approximating the nonlinear mapping function $\phi$ introduced in Sect. 2 is presented in [58]:

$$x = \phi^{-1}(z) \approx \sum_{k=1}^{K} w_k \left[x_k + V_k\left(z - z_k^{(0)}\right)\right] \tag{15.29}$$

and

$$z = \phi(x) \approx \sum_{k=1}^{K} w_k \left[z_k + V_k^T\left(x - x_k^{(0)}\right)\right], \tag{15.30}$$

where $w_k$ (with $\sum_{k=1}^{K} w_k = 1$) is the weighted kernel function, which depends on the distance between a point on the trajectory and a linearization point [144].
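The chapter does not spell out the kernel itself. As an illustration only, the sketch below uses an exponential distance-based weighting of the kind commonly used with TPWL models: weights decay with the distance to each linearization point, relative to the nearest one, and are normalized to sum to one. The function name, the steepness parameter `beta`, and its default value are assumptions made for this sketch, not values taken from the chapter.

```python
import numpy as np

def tpwl_weights(z, z_snapshots, beta=25.0):
    """Normalized weights that favor the linearization points closest to z."""
    d = np.linalg.norm(z_snapshots - z, axis=1)
    d_min = d.min()
    if d_min == 0.0:
        w = (d == 0.0).astype(float)   # z coincides with a snapshot
    else:
        w = np.exp(-beta * d / d_min)  # sharp decay away from the nearest point
    return w / w.sum()

# Hypothetical reduced-space snapshots and evaluation point.
z_snaps = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
w = tpwl_weights(np.array([0.9, 0.0]), z_snaps)
```

With a steep kernel, the model is dominated by the nearest linearization point and blends smoothly only where two points are almost equally close.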

A nonlinear model order reduction is derived in terms of a local two-dimensional (2D) projection based on (15.8), (15.29), and (15.30) as follows:

$$\sum_{l=1}^{K} \sum_{k=1}^{K} w_l w_k \left[V_l^T G_k V_k\left(z - z_k^{(0)}\right) + V_l^T C_k V_k\left(\dot{z} - \dot{z}_k^{(0)}\right)\right] = \sum_{l=1}^{K} w_l V_l^T \delta_k, \tag{15.31}$$

where we assume that all $V_k$ are reduced to the same order $q_0$. For circuits with a sharp transition (input) or strong nonlinearity (device), the numerical examples show that the number of sampled snapshots (or neighbors) has to be quite large to maintain high accuracy. As a result, the computational cost of the local 2D projection (15.31) in [58] becomes prohibitive.

On the other hand, the TPWL method in [144] approximates the nonlinear mapping function $\phi$ by aggregating the local subspaces $V_k \in \mathbb{R}^{N \times q_0}$ into a unified global subspace $\mathrm{span}\{V_1, V_2, \ldots, V_K\}$, which can be further compressed into a lower-dimensional subspace $V \in \mathbb{R}^{N \times q}$ ($q \ll N$) by an SVD as follows:

$$V = \mathrm{SVD}_q\left([V_1, V_2, \ldots, V_K]\right). \tag{15.32}$$

This procedure is called global aggregation. A global aggregation generates a global one-dimensional (1D) projection by

$$\sum_{k=1}^{K} w_k \left[V^T G_k V\left(z - z_k^{(0)}\right) + V^T C_k V\left(\dot{z} - \dot{z}_k^{(0)}\right)\right] = \sum_{k=1}^{K} w_k V^T \delta_k. \tag{15.33}$$

It is easy to see that such a global 1D projection requires less projection time and storage than the local 2D projection. However, the global 1D projection usually requires a higher order $q$ to achieve an accuracy similar to that of the local 2D projection with order $q_0$ ($q_0 < q$) [58], since the dominant bases of the local $V_k$ are interpolated by the global aggregation.
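The global aggregation in (15.32) can be sketched with a truncated SVD as below. The helper name is hypothetical, and $\mathrm{SVD}_q$ is interpreted here as keeping the $q$ leading left singular vectors of the concatenated local bases, which is an assumption about the notation.

```python
import numpy as np

def aggregate_svd(local_bases, q):
    """Keep the q leading left singular vectors of [V_1, ..., V_K]."""
    stacked = np.hstack(local_bases)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :q]

# Hypothetical local bases: two directions lying in the x-y plane of R^3.
V1 = np.array([[1.0], [0.0], [0.0]])
V2 = np.array([[0.6], [0.8], [0.0]])
V = aggregate_svd([V1, V2], q=2)
```

The aggregated basis spans the same plane as the two local directions, but the individual local directions are no longer distinguishable inside it, which is the loss of per-subspace information discussed above.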

4.1.3 Incremental Aggregation of Subspaces

The local 2D projection in (15.31) requires a longer runtime and more storage than the global 1D projection (15.33). On the other hand, the local 2D projection (15.31) is more accurate than the global 1D projection (15.33) by $V$. Therefore, we need a procedure that balances accuracy and efficiency.

The manifold $d\phi/dx$ can be covered by the local tangent subspaces $\{V_1, V_2, \ldots, V_K\}$ along the trajectory, where each $V_k$ is composed of dominant bases of different orders, $\{v_k^1, v_k^2, \ldots, v_k^{q_0}\}$. An effective aggregation therefore needs to consider the order, or dominance, of those bases. This motivates us to decompose the space spanned by those local tangent subspaces according to the order first. In this way, (15.29) becomes

$$
\begin{aligned}
x = \phi^{-1}(z) &\approx \sum_{k=1}^{K} w_k x_k + \sum_{k=1}^{K} w_k \sum_{p=1}^{q_0} v_k^p \left(z - z_k^{(0)}\right) \\
&= \sum_{k=1}^{K} w_k x_k + \sum_{p=1}^{q_0} \sum_{k=1}^{K} v_k^p w_k \left(z - z_k^{(0)}\right) \\
&= \sum_{k=1}^{K} w_k x_k + \left[v_1^1 w_1\left(z - z_1^{(0)}\right) + \cdots + v_K^1 w_K\left(z - z_K^{(0)}\right)\right] \\
&\quad + \cdots + \left[v_1^{q_0} w_1\left(z - z_1^{(0)}\right) + \cdots + v_K^{q_0} w_K\left(z - z_K^{(0)}\right)\right].
\end{aligned}
\tag{15.34}
$$

After that, we can form global tangent subspaces in the order of the dominant bases by

$$\mathrm{span}\left\{v_1^1, v_2^1, \ldots, v_K^1\right\}, \; \ldots, \; \mathrm{span}\left\{v_1^{q_0}, v_2^{q_0}, \ldots, v_K^{q_0}\right\}. \tag{15.35}$$

A global projection matrix $V$ is then constructed by incremental aggregation. In this process, we first aggregate each global tangent subspace by order:

$$V^1 = \mathrm{SVD}_q\left(\left[v_1^1, \ldots, v_K^1\right]\right), \; \ldots, \; V^{q_0} = \mathrm{SVD}_q\left(\left[v_1^{q_0}, \ldots, v_K^{q_0}\right]\right). \tag{15.36}$$

That is, we identify a $V^p$ ($p = 1, \ldots, q_0$) to represent the $p$th-order global tangent subspace.

Then, the global projection matrix $V$ is further aggregated as

$$V = \mathrm{SVD}_q\left(\left[V^1, V^2, \ldots, V^{q_0}\right]\right) \tag{15.37}$$

from those global tangent subspaces in descending order of dominance. As shown by the numerical examples, we can usually choose a much lower $q_0$ ($q_0 \ll q$) for each local tangent subspace $V_k$, and the order $q$ depends on the number of snapshots. For circuits with a sharp transition (input waveform) or strong nonlinearity (device), the number of snapshots is large, and so is the order $q$.

The information in the dominant bases at low orders is preserved, since the local tangent subspaces are incrementally aggregated according to their ordered bases. As shown by the numerical examples, compared with the previous TPWL method [144], this incremental aggregation results in higher accuracy at a similar computational cost in projection time and memory storage. Another benefit of the presented incremental aggregation is that it can consider more sampled biasing (linearization) points than the approach in [58], for which the computational cost of the local 2D projection would increase dramatically.
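The two-stage aggregation of (15.36) and (15.37) can be sketched as follows. As before, $\mathrm{SVD}_q$ is interpreted as keeping the leading left singular vectors (at most $q$ of them), which is an assumption about the notation. The helper names and the randomly generated local bases are hypothetical placeholders for bases obtained from block-Arnoldi at each snapshot.

```python
import numpy as np

def svd_basis(M, q):
    """Leading left singular vectors of M (at most q columns)."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :min(q, U.shape[1])]

def incremental_aggregate(local_bases, q):
    """Aggregate bases of the same order first (15.36), then across orders (15.37)."""
    q0 = local_bases[0].shape[1]
    per_order = []
    for p in range(q0):
        # Collect the p-th dominant basis vector from every snapshot.
        same_order = np.column_stack([Vk[:, p] for Vk in local_bases])
        per_order.append(svd_basis(same_order, q))
    # Orders are concatenated in descending dominance before the final SVD.
    return svd_basis(np.hstack(per_order), q)

# Hypothetical setup: K = 3 snapshots, N = 8 states, local order q0 = 2.
rng = np.random.default_rng(1)
local_bases = [np.linalg.qr(rng.standard_normal((8, 2)))[0] for _ in range(3)]
V = incremental_aggregate(local_bases, q=4)
```

Grouping by order before the final compression keeps the first-order directions of all snapshots together, so they are not mixed with less dominant directions in a single global SVD.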

4.2 Stochastic Extension for Mismatch Analysis

Having introduced the incremental aggregation, we now extend the above discussion to build the TPWL macromodel for stochastic mismatch analysis. Instead of linearizing the DAE in (15.3) directly, we linearize the SDAE (15.18) at $K$ snapshots along the nominal trajectory in a similar manner, and then construct the local tangent subspace $V_k$ from

$$A'_k = (G_k + s_0 C_k)^{-1} C_k, \quad R'_k = (G_k + s_0 C_k)^{-1} \delta'_k. \tag{15.38}$$

Here $\delta'_k$ is determined by the nonequilibrium correction associated with $F\, i_k$. After that, we can build a similar incrementally aggregated mapping $V$ through (15.36) and (15.37).

Then, the global macromodel is constructed from a set of weighted local macromodels, where we use

$$\sum_{k=1}^{K} w_k \cdot \left(V^T G_k V\, \alpha_1(t) + V^T C_k V\, \dot{\alpha}_1(t) - V^T F \cdot i_k\right) = 0 \tag{15.39}$$

to calculate the transient mismatch. We call this macromodeling technique, which is sampled from $K$ snapshots, the isTPWL method. Using such a macromodel, we can efficiently perform a transient mismatch analysis for the full trajectory.


5 Numerical Examples

For the numerical examples of the presented method, a modernized SPICE3 (http://ngspice.sourceforge.net/) is used to generate the $K$ snapshots of a nominal trajectory and to extract the mismatch current model. The presented mismatch algorithm has been implemented in C and Matlab, where the OPC expansion, backward-Euler integration, and the incremental and stochastic TPWL (isTPWL) are implemented in Matlab. For comparison, the TPWL method and the maniMOR method are implemented exactly following the procedures described in [144] and [58], respectively. For instance, for the TPWL method [144], the state variables at the snapshots are added to provide “richer” information during the global aggregation. The flow under MC analysis with 1,000 iterations is implemented as the baseline. The initial results of this chapter were published in [202].

All experimental results are measured on an Intel dual-core 2.0 GHz PC with 2 GB of memory. We compare the accuracy and study the scalability of the presented method with four industrial analog/RF circuits, which contain different devices such as diodes, BJTs, and CMOS transistors. The circuits also include the extracted parasitics, so that the matrix factorization time is dominant. For the characterization of $g_\beta(p_l)$, we apply Pelgrom’s model for CMOS transistors and the BPV model for diodes and BJTs. All of them result in approximately 10% variation from the nominal bias $n(x)$ (e.g., $I_d$ for a CMOS transistor). In addition, the waveform error is measured by taking the averaged difference of two waveforms. Three waveforms are measured at each time step: the transient nominal ($x^{(0)}(t)$), the transient mismatch ($\alpha_1(t)$, the time-varying standard deviation), and the transient ($x(t)$, the nominal plus the standard deviation).


Table 15.1 Scalability comparison of runtime and error for the exact model with MC, the exact model with OPC, and the isTPWL macromodel with OPC

Case  Circuit       # of nodes  # of steps  # of snapshots  # of orders
1     Diode chain   802         225         24              25
2     BJT mixer-1   238         135         25              25
3     BJT mixer-2   1,248       219         83              45
4     CMOS comp.    654         228         75              60

      Exact MC    Exact OPC              OPC+isTPWL
Case  Time (s)    Time (s)   Error (%)   Time (s)   Error (%)
1     520.1       0.53       0.41        0.02       0.43
2     338.0       0.34       0.29        0.02       0.36
3     348.0       0.20       0.18        0.04       0.24
4     412.1       0.39       0.41        0.08       0.62

5.1 Comparison of Mismatch Waveform-Error and Runtime

In this part, we first compare the accuracy of the transient mismatch waveform between the MC method (1,000 iterations) and the exact orthogonal PC. After that, we further compare the accuracy with the isTPWL macromodel. In addition, we compare the transient mismatch waveform with the waveform obtained by adding the mismatch as an initial condition, similar to the setting in the SiSMA [6] technique. Finally, the runtime and waveform error are summarized in Table 15.1.

The first example is a BJT mixer circuit including an extracted distributed inductor with 238 state variables. The waveforms are compared by solving the perturbed SDAE (15.13) using the MC analysis and the OPC expansion, respectively. We apply MC analysis with a Gaussian distribution 1,000 times at each time step and calculate the time-varying standard deviation. The transient mismatch takes 348 s with the MC analysis and only 0.20 s (more than 1,000 times speedup) with the exact OPC expansion up to the second order, with an error of less than 0.18%. The two transient mismatch waveforms obtained from the two methods are virtually identical, as shown in Fig. 15.1.

Next, we show the further speed improvement obtained by macromodeling. The second example is a CMOS comparator including an extracted power supply with 654 state variables. The waveforms of the exact OPC and of the one further reduced by isTPWL are compared in this part. Figure 15.2a shows the comparison of the transient nominal, while Fig. 15.2b shows the comparison of the transient mismatch. Here 75 snapshots are used to generate the macromodel, and the original model is reduced to a macromodel of order 60. For a short transient with 228 time steps, the exact OPC takes 0.39 s and the isTPWL takes 0.08 s (five times speedup). The error of the waveforms analyzed by isTPWL is 0.62%.

We further compare the transient mismatch waveforms for different ways of adding the mismatch. The first is to add the stochastic mismatch only at the ic condition, like the procedure used in SiSMA [6] (Fig. 15.3). The second is to add the stochastic mismatch at every time step, as in the presented approach. In this part, we use a diode chain with 802 state variables. Figure 15.3 shows one waveform of the transient nominal and two waveforms with the mismatch added differently, from which we can see that the waveform with the mismatch added at ic shows a nonnegligible difference.

Fig. 15.1 Transient mismatch (the time-varying standard deviation, in mV, versus time in ns) comparison at the output of a BJT mixer with distributed inductor: the exact by Monte Carlo and the exact by orthogonal PC expansion. Reprinted with permission from [52]. © 2011 ACM

Fig. 15.2 Transient nominal ($x^{(0)}(t)$, in V) (a) and transient mismatch ($\alpha_1(t)$, in mV) (b), versus time in ns, for one output of a CMOS comparator by the exact orthogonal PC and the isTPWL. Reprinted with permission from [52]. © 2011 ACM


Fig. 15.3 Transient waveform (in V, versus time in ns) comparison at the output of a diode chain: the transient nominal, the transient with mismatch by SiSMA (adding mismatch at ic only), and the transient with mismatch by the presented method (adding mismatch along the transient trajectory). Reprinted with permission from [52]. © 2011 ACM

Fig. 15.4 Transient mismatch ($\alpha_1(t)$, the time-varying standard deviation, in mV, versus time in ns) comparison at the output of a BJT mixer with distributed substrate: the exact by OPC expansion, the macromodel by TPWL (order 45), and the macromodel by isTPWL (order 45). The waveform by isTPWL is visually identical to the exact OPC. Reprinted with permission from [52]. © 2011 ACM

Finally, Table 15.1 summarizes the runtime and error for four different analog/RF circuits. In this table, the waveform error is defined as the relative difference between the exact model and the macromodel, and the runtime is the total simulation time. We find that the OPC expansion reduces the runtime by 1,000 times with an error of 0.23% on average. Moreover, the macromodel by isTPWL further reduces the runtime by up to 25 times (diode chain) with an error of up to 0.43%. This demonstrates the efficiency and accuracy of the isTPWL method for transient mismatch analysis.

Fig. 15.5 (a) Comparison of the ratio of the waveform error by TPWL and by isTPWL under the same reduction order. (b) Comparison of the ratio of the reduction runtime by maniMOR and by isTPWL under the same reduction order. Both panels cover the four circuits (diode chain, BJT mixer-1, BJT mixer-2, and CMOS comparator). In both cases, isTPWL is used as the baseline. Reprinted with permission from [52]. © 2011 ACM

5.2 Comparison of TPWL Macromodel

With isTPWL, we can further improve the accuracy and runtime, as shown in this part. First, Fig. 15.4 presents the transient mismatch waveform comparison for a BJT mixer including the distributed substrate, with 1,248 state variables in total. Here 83 snapshots are used for both TPWL and isTPWL to reduce the original model to a macromodel of order 45. We find that the waveform by isTPWL is visually identical to the exact OPC expansion, whereas the waveform by TPWL [144] shows a nonnegligible waveform error, 4.5 times larger than that by isTPWL. Figure 15.5 further summarizes the comparison for the four circuits used in the previous section. Figure 15.5a compares the ratio (TPWL vs. isTPWL) of the waveform errors for simulated macromodels by TPWL [144] and by isTPWL under the same model reduction order. Figure 15.5b compares the ratio (maniMOR vs. isTPWL) of the reduction time for reduced macromodels by maniMOR [58] and by isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline when calculating the ratio. The numerical examples show that the isTPWL method is 5 times more accurate than TPWL [144] and 20 times faster than maniMOR [58] on average, which clearly demonstrates the advantage of using the incremental aggregation.

6 Summary

This chapter has presented a fast non-MC mismatch analysis. It models the mismatch by a current source associated with a random variable and forms an SDAE. The random variable in the SDAE is expanded by OPC, which leads to an efficient solution without using MC or correlation analysis. Moreover, the SDAE has been solved by an improved TPWL model order reduction, called isTPWL. An incremental aggregation has been introduced to balance efficiency and accuracy when generating the macromodel. Numerical examples show that, compared with the MC method, the presented method is 1,000 times faster with similar accuracy. Moreover, on average, the isTPWL method is 5 times more accurate than the work in [144] and 20 times faster than the work in [58]. In addition, the use of a reduced macromodel reduces the runtime by up to 25 times compared with the use of a full model.

Chapter 16 Statistical Yield Analysis and Optimization

1 Introduction

A robust design beyond 90 nm is challenging due to process variations [6, 20, 31, 32, 37, 54, 55, 59, 67, 80, 88, 100, 105, 124, 133, 135, 153, 180, 187, 189, 203]. The sources of variation include etching, lithography, polishing, and stress. For example, the proximity effect caused by stress from shallow-trench isolation regions affects the stress in the channel of nearby transistors and therefore affects carrier mobility and threshold voltage. Process variation (or mismatch) significantly threatens not only the timing closure of digital circuits but also the functionality of analog circuits. To ensure robustness in terms of a high yield rate, in addition to performance, a fast engine for yield estimation and optimization is needed to verify designs beyond 90 nm. Note that there are two types of variations: systematic global variation and stochastic local variation. Stochastic variation, such as analog mismatch, is the most difficult to handle. One either performs thousands of MC (Monte Carlo) runs, which consumes engineering resources, or uses the pessimistic process corners provided by the foundry. Since corners are usually pessimistic for yield and MC is too expensive for verification, a stochastic engine with an NMC approach is required for yield estimation and optimization.

To ensure a robust design, the development of a fast variation (mismatch) analysis to estimate yield is the first priority. Many NMC methods have been developed recently for stochastic variation (mismatch) analysis, as discussed in Chap. 15.

Next, one needs to improve or optimize the yield by tuning parameters at nominal conditions to ensure a robust design. An efficient approach is to derive and employ the yield sensitivity with respect to design parameters. Unfortunately, it has not been known how to calculate the stochastic sensitivity in the framework of the OPC [187, 196]. This chapter is the first to discuss the stochastic sensitivity analysis under OPC, which can be effectively deployed in any gradient-based optimization such as sequential linear or quadratic programming. Moreover, it is necessary, even imperative, to optimize two or more objectives or performance merits simultaneously [26, 103, 152], such as maximizing the benefit and minimizing the expense. To do so, we formulate a stochastic optimization problem and develop a multiobjective optimization algorithm to improve the yield rate and other objectives simultaneously. Our OPC-sensitivity-based algorithm performs the optimization by changing the nominal point along the gradient directions of the orthogonal PC-expanded SDAE [52]. Experiments show that the fast mismatch analysis can achieve up to 700 times speedup while maintaining 2% accuracy; meanwhile, our optimization procedure can improve the yield rate to 95.5% and enhance other performance merits compared with other existing methods.

2 Problem Formulations

We formulate the yield optimization problem in this chapter. This is based on the observation that the parameter vector $p$ can change the performance metric $f_m$, such as delay and output swing, and further lead to circuit failures that affect the yield rate. In general, the parametric yield $Y(p)$ is defined as the percentage of manufactured circuits that satisfy the performance constraints.

To illustrate this, consider an output voltage that discharges from high to low. Because process variation perturbs the parameter vector $p$ away from its nominal value, the transient waveform varies (mismatch) as shown in Fig. 16.1.

[Figure: output voltage versus time; waveforms are marked as success or fail relative to $v_{threshold}$ at $t_{max}$]

Fig. 16.1 Example of the stochastic transient variation or mismatch

[Figure: histogram of the number of occurrences versus output voltage; the performance constraint separates the successful region from the failed region]

Fig. 16.2 Distribution of output voltage at $t_{max}$

The performance constraint $h(p; t)$ in this case is

$$h(p; t) = f_m(t_{max}) - f_{m,threshold} \le 0. \tag{16.1}$$

This means that those curves below $v_{th}$ at $t_{max}$ correspond to successful samples. In addition, one can plot the distribution of output voltages at $t_{max}$, as shown in Fig. 16.2. Samples located to the left of the performance constraint are successes, while those to the right are failures.

As such, the parametric yield can be defined as

$$Y(p; t) = \int_{S} \mathrm{pdf}(f_m(p; t))\, dS, \tag{16.2}$$

where $S$ is the successful region and $\mathrm{pdf}(f_m(p; t))$ is the PDF of the performance metric $f_m(p; t)$ of interest. With the parametric yield so defined, one can optimize it by tuning the parameters under stochastic variations. Meanwhile, one needs to consider other performance merits, such as power and area, during the optimization process.
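For reference, the definition in (16.2) can be approximated by brute-force sampling, which is the MC baseline the chapter compares against. The following is a minimal sketch, not the authors' implementation: the function names, the toy linear performance model, and the assumption of independent zero-mean Gaussian perturbations around the nominal parameters are all hypothetical choices made for illustration.

```python
import random

def monte_carlo_yield(performance, nominal, sigma, threshold,
                      n_samples=10000, seed=0):
    """Estimate yield as the fraction of samples with f_m(p) - threshold <= 0."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        # Perturb each nominal parameter with zero-mean Gaussian variation.
        p = [p0 + rng.gauss(0.0, s) for p0, s in zip(nominal, sigma)]
        if performance(p) - threshold <= 0.0:
            passed += 1
    return passed / n_samples

def toy_performance(p):
    """Hypothetical linear performance model, for illustration only."""
    return 0.5 * p[0] + 0.5 * p[1]

# The threshold equals the nominal performance, so about half the samples pass.
yield_estimate = monte_carlo_yield(toy_performance, [1.0, 1.0], [0.1, 0.1],
                                   threshold=1.0)
print(f"estimated yield: {yield_estimate:.3f}")
```

Each sample here stands for one full circuit simulation, which is why the sampling cost motivates the NMC approach.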

Accordingly, the stochastic multiobjective optimization problem in this chapter can be formulated as follows:

$$\begin{aligned}
\text{Maximize} \quad & Y(p), \\
\text{Minimize} \quad & pc(p), \\
\text{Subject to} \quad & Y(p) \ge \bar{Y}, \\
& pc(p) \le \overline{pc}, \\
& F(p) \le F_{max}, \\
& p_{min} \le p_0 \le p_{max}.
\end{aligned} \tag{16.3}$$

Here, $Y(p)$ is the parametric yield associated with the parameter vector $p$, and $pc(p)$ is the power consumption. $F(p)$ denotes other performance metrics (such as area $A$), which define the feasible design space. Moreover, $\bar{Y}$ and $\overline{pc}$ are the minimum yield rate and maximum power consumption (or targeted values) that can be accepted, respectively. In other words, the multiobjective optimization procedure simultaneously maximizes $Y(p)$, which should be larger than $\bar{Y}$, and minimizes $pc(p)$, which should be smaller than $\overline{pc}$. Meanwhile, the other constraints defined by $F(p)$ should be satisfied.

Moreover, $p$ is a vector of the process parameters with variations and can be expressed as $p = p_0 + \delta p$. Here, $p_0$ is a vector of the nominal values assigned in the design stage, and $\delta p$ consists of parameter variations with zero-mean Gaussian distributions. In addition, all nominal values of the process parameters $p_0$ are assumed to lie within the feasible parameter space $(p_{min}, p_{max})$ and can be tuned for a better yield rate.

One effective solution to this optimization problem is the gradient-based approach, which requires the calculation of the sensitivity in the stochastic domain. As discussed later, this chapter develops a stochastic sensitivity analysis, which can be embedded into sequential linear programming (SLP) to solve this optimization problem efficiently.

3 Stochastic Variation Analysis for Yield Analysis

In this section, we show how to apply the OPC technique introduced in Sect. 3.2 of Chap. 2 to analyze and estimate the yield.

We first review existing work on mismatch analysis [6, 32, 105, 133]. Here we focus on the stochastic variation, also referred to as local mismatch. We illustrate the stochastic variation analysis using MOS transistors. A similar approach can be extended to other types of transistors by the so-called propagation of variance (POV) method [32, 105].

The mismatch of one MOS transistor is usually modeled by Pelgrom's model [133], which relates the local mismatch variance of one electrical parameter to geometrical parameters by

$$\sigma = \frac{\sigma_\beta}{\sqrt{W \cdot L}}, \tag{16.4}$$

where $\sigma_\beta$ is the additional fitting parameter.

To consider the local mismatch during circuit simulation without running Monte Carlo, SiSMA [6] models the random local mismatch of a MOS transistor by a stochastic noise current source $\xi$, connected in parallel with the nominal drain current $I_D$. $\xi$ can be expressed by

$$\xi = I_D^{\beta}\, t_m(W, L)\, \eta(x, y). \tag{16.5}$$

Here, $I_D^{\beta}$ is determined by the operating region of the MOS transistor; $t_m(W, L)$ accounts for the geometry of the device active area:

$$t_m(W, L) = 1 + \frac{\sigma_\beta}{\sqrt{W \cdot L}}, \tag{16.6}$$

and $\eta(x, y)$ refers to the sources of all the variations that depend on the device position, which can include the spatial correlation [6]. Here, $\eta(x, y) = 1$ because all parameters are decoupled after the PCA.

Note that the random variable in the stochastic current source can be expanded by the spectral stochastic method [187, 196]. For example, let us use the channel length $L$ of one MOS transistor as the variation source. Assuming the variation of $L$ is small, one can expand $t_m(W, L)$ around the nominal values $W^{(0)}$ and $L^{(0)}$ with a first-order Taylor expansion:

$$\begin{aligned}
t_m(W, L) &= 1 + \frac{\sigma_\beta}{\sqrt{W L}} \\
&\approx 1 + \frac{\sigma_\beta}{\sqrt{W^{(0)}}} \left[ \frac{1}{\sqrt{L^{(0)}}} - \frac{1}{2\sqrt{\left(L^{(0)}\right)^3}} \left( L - L^{(0)} \right) \right] \\
&= 1 + \frac{\sigma_\beta}{\sqrt{W^{(0)}}} \left[ \frac{1}{\sqrt{L^{(0)}}} - \frac{1}{2\sqrt{\left(L^{(0)}\right)^3}}\, \zeta \right].
\end{aligned} \tag{16.7}$$

Here, $\zeta = L - L^{(0)}$ is the random variable for the variation of the channel length $L$. One can describe $\zeta$ by OPC. Based on the Askey scheme [196], a Gaussian distribution of $\zeta$ can be expanded using Hermite polynomials $\Phi_i$ ($i = 0, \ldots, n$) by

$$\zeta = \sum_{i=0}^{n} g'_i \Phi_i, \tag{16.8}$$

where $g'_i$ is the OPC expansion coefficient.
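The small-variation assumption behind (16.7) can be checked numerically. The sketch below compares the exact $1/\sqrt{L}$ with its first-order Taylor expansion about the nominal length. The nominal value `L0` and the perturbation sizes are hypothetical and chosen only for illustration.

```python
import math

def inv_sqrt_exact(L):
    """Exact geometry factor 1/sqrt(L)."""
    return 1.0 / math.sqrt(L)

def inv_sqrt_linear(L, L0):
    """First-order Taylor expansion of 1/sqrt(L) about the nominal length L0."""
    return 1.0 / math.sqrt(L0) - (L - L0) / (2.0 * math.sqrt(L0 ** 3))

L0 = 45e-9  # hypothetical nominal channel length in meters
for rel in (0.01, 0.05, 0.10):
    L = L0 * (1.0 + rel)
    exact = inv_sqrt_exact(L)
    approx = inv_sqrt_linear(L, L0)
    rel_err = abs(approx - exact) / exact
    print(f"(L - L0)/L0 = {rel:.2f}: relative error = {rel_err:.2e}")
```

The error grows roughly quadratically with the relative perturbation, which is why the linearization is reasonable only for small variations of $L$.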

As such, one can summarize the expression of the stochastic current source $\xi$ as

$$\begin{aligned}
\xi &= I_D^{\beta} \left[ 1 + \frac{\sigma_\beta}{\sqrt{W^{(0)}}} \left( \frac{1}{\sqrt{L^{(0)}}} - \frac{1}{2\sqrt{\left(L^{(0)}\right)^3}} \sum_{i=1}^{n} g'_i \Phi_i \right) \right] \\
&= \sum_{i=0}^{n} g_i \Phi_i,
\end{aligned} \tag{16.9}$$

where $g_i$ denotes the new expansion coefficients, which now carry the geometry dependence.

Knowing the expression of $\xi$ for one parameter variation source, multiple process parameters $p_i$ ($i = 1, \cdots, m$) can be considered by a vector of stochastic current sources $\xi(t)$.

On the other hand, any integrated circuit is composed of passive and active devices described by a number of terminal-branch equations. According to Kirchhoff's current law (KCL), one can obtain a differential algebraic equation (DAE) as below:

$$\frac{d}{dt} q(x(t)) + f(x(t), t) + B \cdot u(t) = 0. \tag{16.10}$$

Here, $x(t)$ is the vector of state variables consisting of node voltages and branch currents. $q(x(t))$ contains the dynamic (energy-storage) contributions such as charges and fluxes. Also, $f(x(t), t)$ describes the static (resistive) contributions, and $u(t)$ denotes input sources. $B$ describes how the sources are connected into the circuit, which is determined by the circuit topology.

Similar to [6], one can add $\xi(t)$, representing the mismatch, to the right-hand side (rhs) of the DAE:

$$\frac{d q(x(t))}{dt} + f(x(t)) + B \cdot u(t) = T \cdot \xi(t), \tag{16.11}$$

which describes the circuit and system under stochastic variations. Note that $T$ is the topology matrix describing how $\xi(t)$ is connected into the circuit, and one has

$$T \cdot \xi(t) = \sum_{i=1}^{m} T_{p_i} \xi_{p_i} \tag{16.12}$$

for multiple parameters. Here, $\xi_{p_i}$ is the mismatch current source for the $i$th parameter variation, which can be expanded using OPC as shown in (16.9).

3.1 Algorithm Overview

In summary, we outline the overall algorithm flow in Algorithm 1. From this flow, we observe that the optimization procedure involves several optimization iterations. Each iteration contains three major steps: stochastic yield estimation, stochastic sensitivity analysis, and stochastic yield optimization. The last is achieved by tuning nominal parameters along the obtained gradient directions. Notice that we take all design parameters as random variables; fixed parameters that cannot be tuned can be removed from this procedure by parameter screening.


3.2 Stochastic Yield Estimation and Optimization

In this section, we discuss how to estimate the parametric yield and further optimize it by tuning parameters automatically. We first show how to estimate the parametric yield with the stochastic variation (mismatch) $(\mu_{f_m;t}, \sigma_{f_m;t})$ obtained from the above NMC mismatch analysis.

3.3 Fast Yield Calculation

First, we construct the performance distribution at one time step $t_k$ from $(\mu_{f_m}(t_k), \sigma_{f_m}(t_k))$, shown as the solid curve from $\mu - 3\sigma$ to $\mu + 3\sigma$ in Fig. 16.3. Then, the performance constraint is given as

$$h(p; t_k) = f_m(p; t_k) - f_{m,threshold} \le 0. \tag{16.13}$$

With this constraint, the boundary separating the success region from the failure region can be plotted as the straight line $h(p; t_k) = 0$ in Fig. 16.3.

As a result, the performance $f_m(t_k)$ located to the left of $h(p; t_k) = 0$ (shown as the shaded region) satisfies the constraint in (16.13) and thus belongs to the successful region $\hat{S}$. Hence, the parametric yield can be estimated with the area ratio

$$Y(p) = \frac{\hat{S}}{S_{f_m}}. \tag{16.14}$$

[Figure: distribution of the performance $f_m$ (number of occurrences versus performance) from $\mu_{f_m} - 3\sigma_{f_m}$ to $\mu_{f_m} + 3\sigma_{f_m}$, with the line $h(p; t) = 0$ bounding the shaded success region]

Fig. 16.3 Parametric yield estimation based on orthogonal PC-based stochastic variation analysis

When the entire region area is normalized to $S_{f_m} = 1$, $Y(p)$ becomes $\hat{S}$ and is determined by the integration below:

$$Y(p) = \int_{\hat{S}} \mathrm{pdf}(f_m(p; t_k))\, dS = \int_{\hat{S}} \mathrm{pdf}(\mu_{f_m}, \sigma_{f_m})\, dS, \tag{16.15}$$

where $\mathrm{pdf}(f_m)$ is the probability density function (PDF) of the performance merit of interest, characterized by $\mu_{f_m}$ and $\sigma_{f_m}$ at the time step $t_k$.
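When the performance distribution is taken to be Gaussian with the mean and standard deviation delivered by the mismatch analysis, the integral in (16.15) reduces to a normal CDF evaluated at the threshold. The sketch below assumes exactly that, and it integrates over the whole left tail rather than clipping at three standard deviations. The function names and the example numbers are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_yield(mu, sigma, threshold):
    """Probability that f_m ~ N(mu, sigma^2) satisfies f_m - threshold <= 0."""
    return normal_cdf((threshold - mu) / sigma)

# Hypothetical example: mean 0.10, standard deviation 0.05, threshold 0.12.
print(f"yield = {gaussian_yield(0.10, 0.05, 0.12):.4f}")
```

A single evaluation of this closed form replaces the histogram counting that an MC flow would need, given that the mean and standard deviation are already available.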

3.4 Stochastic Sensitivity Analysis

To enhance the yield rate, most optimization engines need sensitivity information to identify and tune the critical parameters. However, with the emerging process variations beyond 90 nm, traditional sensitivity analysis becomes inefficient: it either uses the worst-case scenario or conducts MC simulations [88, 100, 153]. Therefore, an efficient NMC-based stochastic sensitivity analysis is needed. With all parameter variations calculated from the fast mismatch analysis in Chap. 15, one can further explore the contribution of the parameter variation $\sigma_{\xi_{p_i}}$ to the performance variation $\sigma_{\xi_{f_m}}$. This can be used in the optimization procedure to obtain better performance merits. In this section, we develop an approach to evaluate the sensitivity of the transient variation (mismatch) with respect to each parameter variation.

We start from the definition of stochastic sensitivity, which expresses the relationship between the performance metric variation $\xi_{f_m}$ and the parameter variation $\xi_p$. From now on, we write $\xi_{f_m}(t) = f_m(\xi_p; t)$ for illustration purposes and assume the random parameter vector $\xi_p \in R^m$. As such, the stochastic sensitivity can be defined by

$$s_{p_i}(t) = \frac{\partial f_m(\xi_p; t)}{\partial \xi_{p_i}}, \quad i = 1, \cdots, m, \tag{16.16}$$

where $s_{p_i}(t)$ is the derivative of the performance variation $\xi_{f_m}$ with respect to the $i$th random parameter variable $\xi_{p_i}$ at one time instant $t$. Depending on the problem or circuit under study, the performance $f_m$ can be output voltage, period, or power, and the parameter can be transistor width, length, or oxide thickness. This stochastic sensitivity can also be understood through the POV relationship [32, 105]:

$$\sigma^2_{\xi_{f_m}} = \sum_i \left( \frac{\partial f_m(\xi_p; t)}{\partial \xi_{p_i}} \right)^2 \sigma^2_{\xi_{p_i}}. \tag{16.17}$$

Here, $\sigma^2_{\xi_{p_i}}$ is the parameter variance and $\sigma^2_{\xi_{f_m}}$ is the performance variance.

Note that the performance variation $\xi_{f_m}$ is mainly determined by $\alpha_1$ [196] in (16.15) at time step $t_k$ as derived in Sect. 3.3, while $\alpha_2$ has little impact on the performance variation. As such, one can truncate the OPC expansion to first order for the calculation of mean and variance, and experiments show that the first-order expansion provides adequate accuracy. Therefore, $\alpha_1$ is the dominant moment for $\xi_{f_m}$, and we have the following:

$$\alpha_1(t_k) = c_1 + c_0 T \cdot g(t_k), \tag{16.18}$$

where

$$c_0 = \left( G_k^{(0)} + \frac{1}{h} C_k^{(0)} \right)^{-1}, \qquad c_1 = c_0 \cdot \left( \frac{1}{h} C_k^{(0)} \alpha_1(t_k - h) \right).$$

As such, one can further calculate the stochastic sensitivity $\partial f_m(\xi_p; t) / \partial \xi_{p_i}$ using

$$s_{p_i}(t_k) = \frac{\partial f_m(\xi_p; t)}{\partial \xi_{p_i}} = \left( c_0 T_{p_i} \right) \cdot \frac{\partial g(t_k)}{\partial p_i}, \tag{16.19}$$

which can be utilized in any gradient-based optimization to improve the yield rate.
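To make the recursion in (16.18) and the sensitivity in (16.19) concrete, the following sketch applies them to a scalar case in which the conductance, capacitance, and topology terms are plain numbers. This is an illustrative reduction, not the authors' simulator: the single-node RC values, the constant source coefficient `g`, and its derivative `dg_dp` are hypothetical.

```python
def alpha1_step(alpha1_prev, g_k, dg_dp, G, C, T, h):
    """One backward-Euler step for the first-order OPC coefficient.

    Implements alpha1(t_k) = c1 + c0 * T * g(t_k) with
    c0 = 1 / (G + C / h) and c1 = c0 * (C / h) * alpha1(t_k - h),
    plus the per-step sensitivity s = c0 * T * dg/dp.
    """
    c0 = 1.0 / (G + C / h)
    c1 = c0 * (C / h) * alpha1_prev
    alpha1 = c1 + c0 * T * g_k
    sensitivity = c0 * T * dg_dp
    return alpha1, sensitivity

# Hypothetical scalar circuit: G in siemens, C in farads, h in seconds.
G, C, T, h = 1e-3, 1e-12, 1.0, 1e-11
g, dg_dp = 1e-6, 2e-2  # assumed source coefficient and its derivative

alpha1 = 0.0
for _ in range(2000):  # march over roughly 20 time constants
    alpha1, s = alpha1_step(alpha1, g, dg_dp, G, C, T, h)

print(f"alpha1 after settling: {alpha1:.6e}, sensitivity: {s:.6e}")
```

With a constant source the coefficient settles to $T g / G$, which gives a simple sanity check on the recursion. In a real circuit $c_0$ is a matrix inverse and $T_{p_i}$ selects where each mismatch source enters.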


3.5 Multiobjective Optimization

Next, we make use of the sensitivities $s_{p_i}$ to improve the parametric yield. Meanwhile, since power is also a primary design concern, we treat power consumption reduction as an extra objective and solve the multiobjective optimization problem defined in Sect. 2. Note that other performance merits can be treated as optimization objectives in a similar way. By tuning the nominal process parameters along gradient directions, we enable more parameter values under process variations to satisfy the performance constraints. This is an important feature for a robust design. In this section, we realize this by sequential linear programming (SLP).

At the beginning of each optimization iteration, the nonlinear objective functions $Y(p)$ and $pc(p)$ can be approximated by linearization:

$$\begin{aligned}
Y(p) &= Y\left(p^{(0)}\right) + \nabla_p Y\left(\xi_{p^{(0)}}\right)^T \left(p - p^{(0)}\right), \\
pc(p) &= pc\left(p^{(0)}\right) + \nabla_p pc\left(\xi_{p^{(0)}}\right)^T \left(p - p^{(0)}\right),
\end{aligned} \tag{16.20}$$

where $p^{(0)}$ represents the nominal design parameters while $p$ contains the process variations of these parameters. Note that (16.20) is a first-order Taylor expansion of the parametric yield $Y(p)$ defined in (16.15) and of the power consumption $pc(p)$ around the nominal parameter point $p^{(0)}$. Thus, $\nabla_p Y(\xi_{p^{(0)}})$ is a vector consisting of $\partial Y(\xi_p) / \partial \xi_{p_i}$. The same is true for the power consumption gradient $\nabla_p pc(\xi_{p^{(0)}})$. Therefore, the nonlinear optimization can be transformed into a series of linear optimization subproblems. The optimization terminates when the convergence criterion is met.

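As such, the stochastic multiobjective yield optimization problem in Sect. 2 can be reformulated as

$$\begin{aligned}
\text{Maximize} \quad & Y(p) = Y\left(p^{(0)}\right) + \nabla_p Y\left(p^{(0)}\right)^T \left(p - p^{(0)}\right), \\
\text{Minimize} \quad & pc(p) = pc\left(p^{(0)}\right) + \nabla_p pc\left(p^{(0)}\right)^T \left(p - p^{(0)}\right), \\
\text{Subject to} \quad & Y(p) \ge \bar{Y}, \\
& pc(p) \le \overline{pc}, \\
& F(p) \le F_{max}, \\
& p_{min} \le p \le p_{max},
\end{aligned}$$

where $\delta p = p - p^{(0)}$ is the step size. Within each iteration, the sensitivity vectors $\nabla_p Y(p^{(0)})$ and $\nabla_p pc(p^{(0)})$, as well as $\delta p$, should be updated.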

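[Figure: performance distributions (number of occurrences versus performance $f_m$) with means $\mu_{f_m}(p_0)$ and $\mu_{f_m}(p_1)$, shown relative to the constraint line $h(p; t) = 0$]

Fig. 16.4 Stochastic yield optimization

However, the stochastic sensitivity analysis in Sect. 3.4 can only calculate $\partial F(\xi_p; t) / \partial \xi_{p_i}$ rather than $\partial Y(\xi_p) / \partial \xi_{p_i}$. To obtain $\partial Y(\xi_p) / \partial \xi_{p_i}$, we start from (16.15) with the following derivation:

$$\begin{aligned}
\frac{\partial Y(\xi_p)}{\partial \xi_{p_i}} &= \int_{\hat{S}} \frac{\partial\, \mathrm{pdf}(F(\xi_p; t))}{\partial \xi_{p_i}}\, dS \\
&= \int_{\hat{S}} \frac{\partial\, \mathrm{pdf}(F)}{\partial F} \cdot \frac{\partial F(\xi_p; t)}{\partial \xi_{p_i}}\, dS.
\end{aligned} \tag{16.21}$$

As a result, $\partial Y(\xi_p) / \partial \xi_{p_i}$ can be obtained with $\partial F(\xi_p; t) / \partial \xi_{p_i}$ calculated from the stochastic sensitivity analysis. Note that the PDF of the performance variation and the integration region $\hat{S}$ are both given by the yield estimation in (16.15).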

We illustrate the presented optimization procedure for the yield objective function $Y(p)$ through Fig. 16.4. With the parametric yield estimation using the NMC mismatch analysis, the distribution of the performance $f_m$ for nominal parameters $p_0$ can be plotted as a solid curve, which has a mean value $\mu_{f_m}(p_0)$. With the performance constraint $h(p; t) \le 0$ in (16.1), the shaded area to the left of the constraint line is the desired successful region. A yield optimization procedure needs to move the performance distribution to the left so that the shaded area is maximized. Therefore, the problem here is how to change the process parameters $p$ in order to move the performance distribution for an enhanced yield rate.

Moreover, power consumption can be estimated by

$$pc(p) = -\left[ V_{dd} \cdot \bar{i}_{V_{dd}} \right], \tag{16.22}$$

where $V_{dd}$ is the power supply voltage source and $\bar{i}_{V_{dd}}$ is the average value of the current through the voltage source. The power consumption optimization is illustrated in Fig. 16.5. The initial design generates the current $i_{V_{dd}}$ denoted by the black curve and leads to a high power consumption $pc$.
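The power estimate in (16.22) amounts to averaging the simulated supply current and multiplying by the supply voltage. The sketch below assumes uniformly spaced time samples and assumes that the supply current is recorded as a negative number, as the current axis of Fig. 16.5 suggests, so that a leading minus sign yields a positive power. Both points are assumptions, and the sample values are hypothetical.

```python
def average_power(vdd, supply_current_samples):
    """Average power -Vdd * mean(i_Vdd) from uniformly sampled supply current."""
    i_avg = sum(supply_current_samples) / len(supply_current_samples)
    return -vdd * i_avg

# Hypothetical current samples in amperes, negative by the assumed convention.
samples = [-1e-5, -2e-5, -3e-5]
print(f"pc = {average_power(5.0, samples):.3e} W")
```

For nonuniform time steps a time-weighted average would be needed instead of the plain mean.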


[Figure: current through the power supply (A) versus time (ns) for the initial, middle, and optimal designs]

Fig. 16.5 Power consumption optimization

According to (16.22), $pc$ can be reduced by reducing the average magnitude of $i_{V_{dd}}$. To do so, we move the minimum point on the current trajectory close to zero and obtain the optimal design with minimum $pc$, shown as the red curve in Fig. 16.5. As such, the power optimization requires changing $p$ so that the minimum point of $i_{V_{dd}}$ moves close to zero for smaller power consumption. To solve this problem, the parametric yield rate $Y(p_0)$ is first calculated from (16.15) and the performance distribution is constructed accordingly, similar to the one in Fig. 16.4. Then, the targeted yield rate $\bar{Y}$ is compared with $Y(p_0)$ by

$$\Delta Y(p_0) = \bar{Y} - Y(p_0). \tag{16.23}$$

Next, the NMC stochastic sensitivity analysis is performed to find $\partial F(\xi_p; t) / \partial \xi_{p_i}$ and thus $\partial Y(\xi_p) / \partial \xi_{p_i}$ in (16.21). As a result, with the first-order Taylor expansion in the SLP (16.20), one can determine the parameter increment $\delta p_{yield} = p - p^{(0)}$ needed to reach $Y(p) = \bar{Y}$ by

$$\delta p_{yield} = \frac{\bar{Y} - Y\left(p^{(0)}\right)}{\nabla_p Y\left(p^{(0)}\right)} = \frac{\Delta Y\left(p^{(0)}\right)}{\nabla_p Y\left(p^{(0)}\right)}. \tag{16.24}$$

On the other hand, we perform the same procedure to optimize the power consumption. As in (16.19), we calculate the sensitivity of power consumption with respect to the process parameters at the point where $i_{V_{dd}}$ takes its minimum value:

$$\frac{\partial pc(p)}{\partial p_i} = -\left[ V_{dd} \cdot \left. \frac{\partial i_{V_{dd}}}{\partial p_i} \right|_{i_{V_{dd}} = \text{Minimum}} \right]. \tag{16.25}$$

The corresponding parameter increments can be computed as

$$\delta p_{power} = \frac{\overline{pc} - pc\left(p^{(0)}\right)}{\nabla_p pc\left(p^{(0)}\right)} = \frac{\Delta pc\left(p^{(0)}\right)}{\nabla_p pc\left(p^{(0)}\right)}. \tag{16.26}$$

In this way, the total change to the process parameters is the weighted sum

$$\delta p_{total} = \lambda_1 \cdot \delta p_{yield} + \lambda_2 \cdot \delta p_{power}, \quad (\lambda_1, \lambda_2 \in [0, 1]), \tag{16.27}$$

where $\lambda_1$ and $\lambda_2$ are the weights for yield and power consumption. Also, $\lambda_1$ and $\lambda_2$ can be updated dynamically, and the weight $\lambda$ should be larger for the performance merit that is farther from its target value.

Therefore, one can update $p$ with the new parameter $p_0 + \delta p_{total}$. The NMC mismatch analysis is then conducted to update the performance distribution, which is denoted by a dashed curve in Fig. 16.4. With the updated parameters and performance distribution, all performance constraints $F(p) \le F_{max}$ are checked for violations. If they are still satisfied, $p$ becomes the new design point, and this procedure is repeated to enhance the yield rate.
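One update of the nominal point, following (16.24), (16.26), and (16.27), can be sketched as below. The equations write the step as a quotient by a gradient vector. This sketch interprets that quotient as the minimum-norm step that closes the gap to first order, which is an assumption made here and not something the text specifies. Clipping to the parameter bounds is likewise an illustrative choice, and all names and numbers are hypothetical.

```python
def min_norm_step(target_gap, grad):
    """Smallest step dp such that grad . dp equals target_gap to first order."""
    norm_sq = sum(g * g for g in grad)
    if norm_sq == 0.0:
        return [0.0] * len(grad)  # no usable direction
    return [target_gap * g / norm_sq for g in grad]

def combined_update(p0, y0, y_target, grad_y, pc0, pc_target, grad_pc,
                    w_yield, w_power, p_min, p_max):
    """Weighted sum of the yield step and the power step, clipped to bounds."""
    dp_yield = min_norm_step(y_target - y0, grad_y)
    dp_power = min_norm_step(pc_target - pc0, grad_pc)
    dp_total = [w_yield * a + w_power * b for a, b in zip(dp_yield, dp_power)]
    return [min(max(p + d, lo), hi)
            for p, d, lo, hi in zip(p0, dp_total, p_min, p_max)]

# Hypothetical two-parameter example.
p_new = combined_update(
    p0=[1.0, 1.0],
    y0=0.5, y_target=0.9, grad_y=[2.0, 0.0],
    pc0=2.0, pc_target=1.0, grad_pc=[0.0, 4.0],
    w_yield=1.0, w_power=0.5,
    p_min=[0.5, 0.9], p_max=[1.1, 2.0],
)
print(p_new)
```

After such an update the mismatch analysis would be rerun at the new nominal point to refresh the yield, the power, and both gradients before the next iteration.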


The presented NMC algorithms have been implemented for NMC mismatch analysis, yield estimation, and optimization in a Matlab-based circuit simulator. All experiments are performed on a Linux server with a 2.4 GHz Xeon processor and 4 GB memory. In the experiments, we take the widths of MOSFETs as the variable process parameters. The initial results of this chapter were published in [52].

However, the presented approach only considers design parameters such as the channel width $W$, because the distribution of design parameters under process variations can be shifted by tuning their nominal values. As such, more design parameter values under process variations can satisfy the performance constraints and the total yield rate can be enhanced, which is also needed for a robust design. Therefore, parameters that are not tunable, such as the channel length $L$, are not considered in the presented approach.

We first use an operational amplifier (OPAM) to validate the accuracy and efficiency of the NMC mismatch analysis by comparing it with MC simulations. Then, a Schmitt trigger is used to verify the presented parametric yield estimation and stochastic sensitivity analysis. Next, we demonstrate the validity and efficiency of the presented yield optimization flow using a six-transistor SRAM cell.


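[Figure: schematic of the operational amplifier with transistors Mp1, Mp2, Mn3, Mn4, Mp5, Mn6, Mp7, and Mp8, bias current Is, inputs Input+ and Input−, the Output node, and supplies Vdd = +5 V and Vss = −5 V]

Fig. 16.6 Schematic of operational amplifier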

4.1 NMC Mismatch for Yield Analysis

The OPAM is shown in Fig. 16.6, which consists of eight MOS transistors. Their widths are treated as stochastic variational parameters with Gaussian distributions and a 10% random perturbation from their nominal values. Moreover, we consider the matching design requirements for the input pair devices, such as the same nominal width ($W_{p1} = W_{p2}$, $W_{n3} = W_{n4}$, $W_{p5} = W_{p8}$) and the fixed width ratio ($W_{n6} = k W_{n3}$).

We first introduce the width variations to all MOS transistors and perform 1,000 MC simulations with a high confidence level to find the variational trajectories at the output node. Then, we apply the developed NMC mismatch analysis to the OPAM and locate the boundaries ($\mu - 3\sigma$, $\mu + 3\sigma$) of the variational trajectories with a single run of transient circuit simulation. The results are shown in Fig. 16.7, where the blue lines denote the MC simulations and the two black lines are the results from the presented mismatch analysis. We observe that our approach captures the transient stochastic variation (mismatch) as accurately as the MC result.

We further compare the accuracy and efficiency of the NMC mismatch analysis and the MC method in Table 16.1. From this table, we can see that the NMC mismatch analysis not only achieves 2% accuracy but also gains a 680× speedup over the MC method.

Fig. 16.7 NMC mismatch analysis vs. Monte Carlo for operational amplifier case

Table 16.1 Comparison of accuracy and runtime (operational amplifier example)

Metric                      Proposed    Monte Carlo
Runtime (seconds)           1.33        905.06
Mean value (μ), volts       0.35493     0.34724
Std. value (σ), volts       0.57032     0.56272

4.2 Stochastic Yield Estimation

We further consider the Schmitt trigger shown in Fig. 16.8 to demonstrate the stochastic yield estimation. Similarly, we assume the widths of all MOSFETs to have 10% variations from their nominal values and to conform to Gaussian distributions. Moreover, we consider the lower switching threshold $V_{TL}$ to be the performance metric of the parametric yield, which can change due to MOSFET width variations. Thus, the performance constraint for the parametric yield is the following: when the input $V_{TL}$ is 1.8 V and the output is initially set to $V_{dd} = 5$ V, the output $V_{OUH}$ should be greater than 4.2 V.

First, we perform 1,000 MC simulations and compare them with the NMC stochastic variation analysis, as shown in Fig. 16.9a. Then, the output distribution from the MC simulation at the time step where the input equals 1.8 V is plotted in Fig. 16.9b. The PDF estimated by the NMC mismatch analysis is compared with the MC simulations in the same figure. We can observe that the two distributions coincide very well.

Then, the yield rate can be calculated efficiently from the PDF estimated by the NMC mismatch analysis. We list the mean ($\mu$), standard deviation ($\sigma$), and yield estimation results from the presented approach and those from MC simulations in Table 16.2.


[Figure: schematic of the Schmitt trigger with transistors Mp1, Mp2, Mp3, Mn1, Mn2, and Mn3 between Vdd and GND, with input Vin and output Vout]

Fig. 16.8 Schematic of Schmitt trigger

Table 16.2 Comparison of accuracy and runtime (Schmitt trigger example)

Metric                      Proposed    Monte Carlo
Runtime (seconds)           1.06        801.84
Mean value (μ), volts       4.2043      4.1993
Std. value (σ), volts       0.10487     0.094346
Yield rate                  0.48357     0.47059

With the accurate estimation of the output distribution, the presented method can calculate the yield rate with 2.7% accuracy as well as a 756× speedup compared to the MC method.
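The quoted speedup and accuracy follow from the entries of Table 16.2 by simple ratios. The short check below recomputes them. Taking the MC value as the reference for the relative error is an assumption about how the error is defined, so the result should land close to, but not necessarily exactly on, the quoted percentage.

```python
# Values copied from Table 16.2 (Schmitt trigger example).
runtime_nmc, runtime_mc = 1.06, 801.84
yield_nmc, yield_mc = 0.48357, 0.47059

speedup = runtime_mc / runtime_nmc
yield_error = abs(yield_nmc - yield_mc) / yield_mc  # relative to MC (assumed)

print(f"speedup = {speedup:.1f}x, relative yield error = {100 * yield_error:.2f}%")
```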

More importantly, the NMC mismatch analysis scales linearly because all process variation sources can be modeled as additive mismatch current sources and introduced into the rhs of the DAE system in (16.11).

4.3 Stochastic Sensitivity Analysis

Furthermore, we apply the presented stochastic sensitivity analysis to the Schmitt trigger example to find the contribution of each variation source to the output variation. Note that we are interested in the lower switching threshold $V_{TL}$, where the input increases from zero and the output decreases from $V_{dd}$. The sensitivities of the output voltage variation $\xi_{output}$ with respect to all MOSFET width variations $\xi_{p_i}$ at the time step where the input equals 1.8 V are shown in Table 16.3. From this table, we can observe that the widths of the Mp1, Mp2, and Mn3 transistors are more critical than those of the other MOSFETs.


[Figure, panel (a): NMC mismatch analysis vs. MC; output voltage (volt) versus time (ns)]

[Figure, panel (b): output distributions from NMC mismatch analysis and Monte Carlo; number of occurrences versus output voltage (volt)]

Fig. 16.9 Comparison of Schmitt trigger example

Table 16.3 Sensitivity of $\xi_{output}$ with respect to each MOSFET width variation $\xi_{p_i}$

Parameter     Sensitivity
Mn1 width     2.4083e-4
Mn2 width     2.4083e-4
Mn3 width     4.8069e-3
Mp1 width     2.4692e-2
Mp2 width     2.4692e-2
Mp3 width     0
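The sensitivities in Table 16.3 can be combined with the POV relationship in (16.17) to rank how much each width contributes to the output variance. The sketch below does this under the placeholder assumption that every width variation has the same unit standard deviation. Real per-device standard deviations are not given in the text and would change the absolute numbers, though not the mechanics.

```python
# Sensitivities copied from Table 16.3.
sensitivities = {
    "Mn1": 2.4083e-4, "Mn2": 2.4083e-4, "Mn3": 4.8069e-3,
    "Mp1": 2.4692e-2, "Mp2": 2.4692e-2, "Mp3": 0.0,
}

def variance_contributions(sens, sigma_p):
    """Per-parameter terms (s_i * sigma_i)^2 of the propagation-of-variance sum."""
    return {name: (s * sigma_p[name]) ** 2 for name, s in sens.items()}

# Placeholder assumption: identical unit standard deviation for every width.
sigma_p = {name: 1.0 for name in sensitivities}

contributions = variance_contributions(sensitivities, sigma_p)
total_variance = sum(contributions.values())
ranking = sorted(contributions, key=contributions.get, reverse=True)

for name in ranking:
    share = contributions[name] / total_variance
    print(f"{name}: {100 * share:.2f}% of output variance")
```

Because the contributions scale with the square of the sensitivity, the two PMOS devices with the largest entries dominate the sum under this equal-variance assumption.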


[Figure: schematic of the 6-T SRAM cell with transistors Mn1, Mn2, Mn3, Mn4, Mp5, and Mp6 between Vdd = +5 V and GND; WL = 1, BL = 1, BL_B = 1, Q = 1, Q_B = 0]

Fig. 16.10 Schematic of SRAM 6-T cell

4.4 Stochastic Yield Optimization

To demonstrate the yield optimization using stochastic sensitivity analysis, we use a typical design of a 6-T SRAM cell, shown in Fig. 16.10. In this example, the performance merit is the access time of the SRAM, which is determined by the voltage difference between BL_B and BL. Initially, both BL_B and BL are precharged to $V_{dd}$, while Q_B stores zero and Q stores one. When reading the SRAM cell, BL_B starts to discharge from $V_{dd}$ and produces a voltage difference $V$ between it and BL. The time it takes BL_B to produce a large enough voltage difference $V_{th}$ is called the access time. If the access time is larger than the threshold at the time step $t_{threshold}$, this leads to an access time failure. In the experiment, we assume that $t_{threshold} = 0.04$ ns and $V_{th} = 0.1338$ V.

Similarly, all channel widths of the MOSFETs are considered as variational parameters that conform to Gaussian distributions with a 12% perturbation from their nominal values. As such, when the access time differs from the nominal value due to variations in channel width, access time failures occur, and thus yield loss may happen. To enhance the yield, we first perform the NMC mismatch analysis to find the voltage distribution of BL_B at $t_{threshold}$, which is shown in Fig. 16.11. Also, as a baseline for comparison, we run 1,000 MC simulations to plot the variational transient waveforms of BL_B, which are shown in Fig. 16.12. This validates the accuracy of the NMC mismatch analysis.

Then, the sensitivity analysis developed in this chapter is used to find $\partial \xi_{v_{BL\_B}} / \partial \xi_{p_i}$ and $\partial \xi_{power} / \partial \xi_{p_i}$, where $\xi_{p_i}$ is the width variation of the $i$th MOS transistor and $\xi_{power}$ is the power variation of the supply voltage source. The results are shown in Table 16.4. From this table, we can see that only Mn1, Mn2, and Mp6 influence the access time and power variations; we can also see that their nominal values can be tuned to reduce access time failures for a better parametric yield rate and to lower the power consumption simultaneously.


[Figure: histogram of the number of occurrences versus output voltage (volt)]

Fig. 16.11 Voltage distribution at BL_B node

Fig. 16.12 NMC mismatch analysis vs. MC

Table 16.4 Sensitivity of $\xi_{v_{BL\_B}}$ and $\xi_{power}$ with respect to each MOSFET width variation $\xi_{p_i}$

Parameter     Sensitivity ($\xi_{v_{BL\_B}}$)    Sensitivity ($\xi_{power}$)
Mn1 width     1.3922e-3                         3.7888e-4
Mn2 width     2.0787e-3                         5.7816e-4
Mp6 width     7.0941e-2                         −5.8871e-4

We then apply the developed multiobjective yield optimization to improve the yield. For comparison purposes, two other algorithms have been implemented:

1. Baseline: the generic gravity-directed method in [167], which moves the nominal parameters to the center of gravity of the successful region

2. Single-objective optimization, which only improves the yield


Table 16.5 Comparison of different yield optimization algorithms for SRAM cell

Parameter            First cut    Baseline     Single objective    Multiobjective
Mn1 width (m)        1e-5         2.872e-5     2.7841e-5           3.577e-5
Mn2 width (m)        1e-5         2.3282e-5    2.2537e-5           2.7341e-5
Mp6 width (m)        3e-5         1.5308e-5    1.6296e-5           9.7585e-6
Power (W)            1.0262e-5    3.0852e-5    1.2434e-5           1.0988e-5
Area (m^2)           2.4e-11      2.81e-11     2.8e-11             2.88e-11
Yield (%)            49.32        94.23        95.49               95.31
Runtime (seconds)    2.42         32.384       27.226              15.21
Iterations           1            12           10                  6

The results from all optimization methods are shown in Table 16.5. From this table, it can be observed that all methods improve the parametric yield to around or even above 95%, compared with the initial design. The corresponding nominal values can be used as better initial design parameters. Meanwhile, the area stays within the maximum acceptable area criterion $A \le 1.2 A_{initial}$.

However, the optimal designs from the baseline (gravity-directed) method and the single-objective optimization require 2.75× and 21% more power consumption, respectively, when compared with the initial design. The proposed method leads to an optimal design with only 7% more power. This demonstrates that the presented multiobjective optimization not only improves the yield rate but also suppresses the power penalty at the same time. Moreover, the presented optimization procedure needs only six iterations to achieve the shown results within 15.21 s. Notice that the parametric yield $Y(p)$ can be further improved with a higher target yield $\bar{Y}$ and more optimization iterations.

5 Summary

In this chapter, we have presented a fast NMC method to calculate mismatch in the time domain with the consideration of local random process variations. We model the mismatch by a stochastic current source expanded by OPC. This leads to an efficient solution for mismatch and further for the parametric yield rate without using expensive MC simulations. In addition, we are the first to derive the stochastic sensitivity of yield within the context of OPC. This leads to a multiobjective optimization method that improves the yield rate and other performance merits simultaneously. Numerical examples demonstrate that the presented NMC approach can achieve 2% accuracy with up to a 700× speedup when compared to Monte Carlo simulations. Moreover, the presented multiobjective optimization can improve the yield rate up to 95.3% with other performance merits optimized at the same time. The presented approach requires the distribution type of the process variations to be known in advance.

Chapter 17 Voltage Binning Technique for Yield Optimization

1 Introduction

Process-induced variability has a huge impact on circuit performance and yield in nanometer VLSI technologies [71]. Indeed, the characteristics of devices and interconnects are prone to increasing process variability as device geometries approach the size of atoms. The yield loss from process fluctuations is expected to increase as transistor sizes scale down. As a result, improving yield under process variations is critical to mitigate the huge impact of process uncertainties.

Supply voltage adjustment can be used as a technique to reduce yield loss, based on the fact that both chip performance and power consumption depend on the supply voltage. Increasing the supply voltage improves chip performance. Both dynamic power and leakage power, however, become worse at the same time [182]. In contrast, a lower supply voltage reduces the power consumption but makes the chip slower. In other words, faster chips usually have higher power consumption and slower chips often come with lower power consumption. Therefore, it is possible to reduce yield loss by adjusting the supply voltage to make some failing chips satisfy the application constraints.

For yield enhancement, there are different schemes for supply voltage adjustment. In [182], the authors proposed an adaptive supply voltage method that reduces the impact of parameter variations by assigning an individual supply voltage to each manufactured chip. This methodology can be very effective, but it requires significant effort in chip design and testing at many different supply voltages. Recently, a new voltage binning technique was proposed in the patent [85] for yield optimization as an alternative to adaptive supply voltage. All manufactured chips are divided into several bins, and a certain supply voltage is assigned to each bin so that all chips in the bin work under the corresponding supply voltage. At the cost of a small yield loss, this technique is much more practical than adaptive supply voltage. However, only a general idea is given in [85], without details on selecting optimal supply voltage levels. Another recent work [213] provides a statistical technique of yield computation for different voltage binning schemes. From the results of statistical timing and variational power analysis, the authors developed a combination of analytical and numerical techniques to compute joint PDFs of chip yield as a function of inter-die variation in effective gate length L, and to solve the problem of computing optimal supply voltages for a given binning scheme.

However, the method in [213] only works under several assumptions and approximations that cause accuracy loss in both the yield analysis and the optimal voltage binning scheme. The statistical model for timing and power analysis used in [213] is simplified by lumping all process variations other than the inter-die variation in L into one Gaussian random variable. In fact, intra-die variations have a huge impact on performance and power consumption [3,158], and other process variations (gate oxide thickness, threshold voltage, etc.) have different distributions and should not be reduced to a single Gaussian distribution. Furthermore, this technique cannot predict the number of voltage bins needed under a given yield requirement before solving the voltage binning problem.

In general, voltage binning for yield improvement is an emerging technique with many unsolved issues. In this chapter, we present a new voltage binning scheme to optimize yield. The presented method first computes the set of working supply voltage segments under timing and power constraints from either measurements of real chips or MC-based SPICE simulations of a chip with process variations. Then, based on the distribution of voltage segment lengths, we propose a formula to predict the upper bound on the number of bins needed under the uniform binning scheme for a given yield requirement. Furthermore, we frame the voltage binning scheme as a set-cover problem in graph theory and solve it with a greedy algorithm in an incremental way. The presented method is not limited by the number or types of process variability involved because it is based on actual measured results. Furthermore, the presented algorithm can be easily extended to deal with a range of working supply voltages for dynamic voltage scaling under different operation modes (such as low-power and high-performance modes).

Numerical examples on a number of benchmarks under 45 nm technology show that the presented method can correctly predict the upper bound on the number of bins required. Compared to the uniform scheme, the optimal binning scheme significantly reduces the number of bins needed to achieve the same yield, at very small CPU cost.


2.1 Yield Estimation

A “good” chip needs to satisfy two requirements:

(1) Timing slack is positive, S > 0, under the working frequency.
(2) Power does not exceed the limit, P < Plim.


For a single supply voltage, the parametric chip yield is defined as the percentage of manufactured chips satisfying these constraints. Specifically, we compute the yield for a given voltage level by direct integration in the space of process parameters:

$$Y = \int \cdots \int_{S > 0,\; P < P_{\mathrm{lim}}} f(X_1, \ldots, X_n)\, dX_1 \cdots dX_n, \tag{17.1}$$

where f(X1, X2, ..., Xn) is the joint PDF of X1 to Xn, which represent the process variations. Spatial correlation also exists in the intra-die part of the variation. The existing approach in [213] ignores the intra-die variation in process parameters, which means that only one random variable, for inter-die variation, is considered, and all variations other than the inter-die variation in Leff are lumped into one Gaussian random variable. In this way, the multidimensional integral in (17.1) can be evaluated numerically as a two- or three-dimensional integral. However, spatial correlation can have a significant impact on both the statistical timing and the statistical power of a circuit [12,158], and thus on yield analysis as well.
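When the full joint PDF is kept, the integral in (17.1) is usually approximated by sampling rather than evaluated in closed form. The following is a minimal sketch of such a Monte Carlo estimate, not the authors' implementation. The function `estimate_yield` and the toy models `toy_sample`, `toy_slack`, and `toy_power` are hypothetical names and formulas introduced only for illustration.

```python
import random


def estimate_yield(sample_params, slack, power, p_lim, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the yield integral (17.1).

    Returns the fraction of sampled parameter vectors X for which the
    timing slack is positive and the power stays below p_lim.
    """
    rng = random.Random(seed)
    good = 0
    for _ in range(n_samples):
        x = sample_params(rng)                  # one draw from f(X1, ..., Xn)
        if slack(x) > 0 and power(x) < p_lim:   # region S > 0, P < Plim
            good += 1
    return good / n_samples


# Hypothetical toy models, for illustration only (not from the chapter).
def toy_sample(rng):
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def toy_slack(x):
    return 1.0 - 0.5 * x[0] - 0.3 * x[1]   # slack shrinks as parameters drift up


def toy_power(x):
    return 1.0 - 0.4 * x[0]                # the faster corner draws more power


toy_yield = estimate_yield(toy_sample, toy_slack, toy_power, p_lim=1.5)
```

Each sample plays the role of one manufactured chip, so the estimate is simply the share of "good" chips among the samples.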

2.2 Voltage Binning Problem

We first define voltage binning scheme as in [213].

Definition 17.1. A voltage binning scheme is a set of supply voltage levels V = {V1, V2, ..., Vk}; a set of corresponding bins U = {U1, U2, ..., Uk}, which is also a partitioning of all chips; and a binning algorithm B, which distributes manufactured chips among the bins.

The binning algorithm B assigns chips to bins so that any chip in bin Ui meets both the performance and power constraints at the supply voltage level Vi corresponding to Ui. The yield loss consists of the chips that cannot be assigned to any bin in U.

A voltage binning scheme depends on two factors: the bin voltage levels V and the binning algorithm B. Different binning algorithms result in different yields even for the same bin voltage levels V. In the optimization process, however, the focus is on binning algorithms that produce the maximum possible yield. That is, with an optimal binning algorithm, every “good” chip (a chip that satisfies the performance and power constraints at some level in V) is assigned to a voltage bin. In this way, the yield under the bin voltage levels V reaches its maximum value.

Therefore, the problem of computing optimal voltage binning scheme can beformulated as follows:

$$\max_{\mathbf{V}} \; Y \quad \text{s.t.} \quad V_{\min} \le V_i \le V_{\max} \;\; \text{for all } V_i \in \mathbf{V}, \tag{17.2}$$

where Y is the total yield under the optimal voltage binning scheme with supply voltage levels V = {V1, V2, ..., Vk}.


We would like to mention one special type of voltage binning in which there is an infinite number of voltage bins covering all possible voltage levels. This binning scheme allows the supply voltage to be individually tailored for each chip to meet the timing and power constraints. The yield in this case is the maximum possible yield, denoted Ymax, which is an upper bound on the yield of any other voltage binning scheme. As a result, for the optimal solution, kopt is the minimum number of bins for which Yk,opt = Ymax.

3 The Presented Voltage Binning Method

In this section, we present a new voltage binning scheme, which not only gives a good solution for a given set of voltage levels, but also computes the minimum number of bins required. Figure 17.1 presents the overall flow of the presented method and highlights the major computing steps. Basically, steps 1 and 2 compute the valid voltage segment for each chip. Step 3 determines the voltage levels and the chip assignments to the resulting bins by a greedy set-covering method. In Fig. 17.1, Sleft denotes the set of uncovered voltage segments left in the complete set of valid voltage segments Sval, Vi is the ith supply voltage level, and the chips assigned to Ui meet both the power and timing constraints at supply voltage Vi.

The algorithm in step 3 finds the voltage levels one at a time, each chosen greedily to cover as many chips as possible (a chip is covered if its valid Vdd segment contains the given voltage level). The algorithm stops when all the chips are covered, and the number of levels selected so far (kopt) is the number of bins needed to reach the maximum possible yield Ymax. The presented method also provides a formula to predict the number of bins required under the uniform binning scheme from the distribution of the length of the valid Vdd segment, which can serve as a guideline for the number of bins required.

Fig. 17.1 The algorithm sketch of the presented new voltage binning method

Fig. 17.2 The delay and power change with supply voltage for C432 (panels: mean delay (ns) and mean power (μW), each as a function of supply voltage (V))

3.1 Voltage Binning Considering Valid Segment

For a chip, the working supply voltage range (segment) [Vlow, Vhigh] can be considered as a knob for trading off the power against the timing of the circuit. Supply voltage affects power consumption and timing performance in opposite ways. Reducing the supply voltage decreases the dynamic power and leakage power, and is often considered the most effective technique for low-power design. On the other hand, propagation delay increases as the supply voltage decreases [186]. Figure 17.2 shows the mean delay and power consumption as functions of supply voltage, which exhibit these trends clearly. As a result, given the power consumption bound and the timing constraint for a chip, Vlow is mainly decided by the timing constraint and Vhigh is mainly determined by the power constraint. Since process variation leads to different timing performance and power consumption, the valid Vdd segment [Vlow, Vhigh] is different for each chip. The measured timing and total power data from a chip can therefore be mapped onto the corresponding working Vdd segment, which is step 1 in Fig. 17.1. For some chips, we may have Vlow > Vhigh (an invalid segment), which means that these chips fail at any supply voltage. We call them “bad” chips.


Fig. 17.3 Valid voltage segment graph and the voltage binning solution (horizontal axis: Vdd, with voltage levels V1, V2, V3 between Vmin and Vmax)

Suppose there are N sampled chips from testing, of which nbad are bad chips. The maximum possible yield achievable by the voltage binning scheme alone is

$$Y_{\max} = (N - n_{\mathrm{bad}})/N. \tag{17.3}$$
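Steps 1 and 2 of the flow, together with (17.3), can be sketched as follows. This is a minimal illustration, assuming that delay and power have been characterized on a discrete grid of supply voltages and that delay decreases while power increases monotonically with Vdd. The function names and the sample numbers are hypothetical and are not taken from the chapter.

```python
def valid_segment(vdd_grid, delays, powers, t_max, p_lim):
    """Return (v_low, v_high) for one chip, or None for a "bad" chip.

    vdd_grid, delays, and powers are equal-length sequences holding the
    characterized delay and total power at each supply voltage.
    """
    timing_ok = [v for v, d in zip(vdd_grid, delays) if d < t_max]   # S > 0
    power_ok = [v for v, p in zip(vdd_grid, powers) if p < p_lim]    # P < Plim
    if not timing_ok or not power_ok:
        return None
    v_low, v_high = min(timing_ok), max(power_ok)
    return (v_low, v_high) if v_low <= v_high else None


def max_yield(segments):
    """Ymax = (N - n_bad) / N as in (17.3); None entries mark bad chips."""
    n_bad = sum(1 for s in segments if s is None)
    return (len(segments) - n_bad) / len(segments)


# Illustrative numbers only (not measured data).
grid = [0.8, 0.9, 1.0, 1.1, 1.2]
chip_a = valid_segment(grid, [0.30, 0.26, 0.22, 0.19, 0.17],
                       [100, 140, 190, 250, 320], t_max=0.25, p_lim=260)
chip_b = valid_segment(grid, [0.34, 0.30, 0.27, 0.24, 0.22],
                       [150, 200, 260, 330, 410], t_max=0.25, p_lim=260)
y_max = max_yield([chip_a, chip_b])
```

In this toy data the first chip has a valid segment, while the second chip needs at least 1.1 V for timing but at most 0.9 V for power, so it is a bad chip.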

We then define the set of valid segments Sval = {[Vlow, Vhigh]} by removing the bad chips from the sample set and keeping only the valid segments (step 2 in Fig. 17.1). The voltage binning problem in (17.2) can then be framed as a set-cover problem. Take Fig. 17.3, for instance: there are nval = 13 horizontal segments between Vmin and Vmax (each corresponding to a valid Vdd segment), and the problem becomes using the minimum number of vertical lines to cover all the horizontal segments. In this case, three voltage levels cover all the Vdd segments of these 13 chips. One chip may be covered by more than one voltage level, in which case it can be assigned to any voltage level that covers it. The problem is well known in graph theory and has known efficient solutions. This valid voltage segment model has several benefits compared with other yield analysis models for voltage binning:

1. The distribution of the length of the valid supply voltage segment provides information about the minimum number of bins for uniform binning under a given yield requirement (e.g., to achieve 99% of Ymax; more details in Sect. 3.2).

2. The model can also be used when the allowed supply voltage level for one voltage bin is an interval or a group of discrete values for a voltage scaling mechanism instead of a scalar (details in Sect. 3.3).

3.2 Bin Number Prediction Under Given Yield Requirement

The distribution of the valid Vdd segment length (defined as len = Vhigh - Vlow) can be a guide in yield optimization when there is a lower bound requirement on yield, and it works for both uniform binning and optimal binning. Notice that the optimal binning always has an equal or better yield than the uniform binning. In fact, the experimental results show that the number of bins needed for optimal voltage binning is much smaller than the prediction from the distribution of len. Figure 17.4 shows the histogram of the valid supply voltage segment length, len, for the test circuit C432, from which it is hard to tell which type of distribution it follows. However, it is quite simple to obtain the numerical probability density function (PDF) and cumulative distribution function (CDF) from the measured data of the test samples, as well as the mean value and standard deviation.

Fig. 17.4 Histogram of the length of valid supply voltage segment len for C432 (horizontal axis: length of valid Vdd range (V); vertical axis: number of sample chips in each bin; markers: mean value, one σ, two σ)

Suppose the yield requirement is Yreq and the allowed supply voltages for testing lie in [Vmin, Vmax]. For the uniform voltage binning scheme, there are k bins, and the set of supply voltage levels is V = {V1, V2, ..., Vk}. Since the voltage binning scheme is uniform,

$$V_i - V_{i-1} = \Delta V = \text{const.} \quad (i = 2, 3, \ldots, k). \tag{17.4}$$

For the uniform voltage binning scheme, we have the following observations:

Observation 1. If there are k bins in [Vmin, Vmax], then

$$\Delta V = (V_{\max} - V_{\min})/(k + 1). \tag{17.5}$$

Observation 2. For a Vdd segment [Vlow, Vhigh] with length len = Vhigh - Vlow, if len > ΔV, there must exist at least one Vdd level in the set of supply voltage levels V = {V1, V2, ..., Vk} that covers [Vlow, Vhigh]. Now we have the following result:

Proposition 17.1. For the yield requirement Yreq, the upper bound on the number of voltage bins, kup, can be determined by

$$k_{\mathrm{up}} = \frac{V_{\max} - V_{\min}}{F^{-1}(1 - Y_{\mathrm{req}})} - 1, \tag{17.6}$$

where $F^{-1}$ is the inverse of the CDF $F$ of len.

Equation (17.6) says that the upper bound on the number of voltage bins in the uniform scheme can be predicted from the yield requirement and the distribution of len.

Proof Sketch for Proposition 17.1: If the uniform binning scheme satisfies the yield requirement Yreq, then

$$1 - F(\Delta V) \ge Y_{\mathrm{req}} \quad \text{(Observation 2)}. \tag{17.7}$$

For the upper bound on the number of voltage bins, kup, the corresponding spacing ΔVmin can be calculated by

$$\Delta V_{\min} = \frac{V_{\max} - V_{\min}}{k_{\mathrm{up}} + 1} \quad \text{(Observation 1)}. \tag{17.8}$$

From (17.7) and (17.8),

$$Y_{\mathrm{req}} = 1 - F(\Delta V_{\min}) = 1 - F\!\left(\frac{V_{\max} - V_{\min}}{k_{\mathrm{up}} + 1}\right), \tag{17.9}$$

which is an equivalent form of (17.6). Q.E.D.

Notice that the optimal binning always has a better or equal yield compared to uniform binning using the same number of bins. Therefore, if the uniform voltage binning scheme with k bins already satisfies the yield requirement, k bins are also enough for the optimal voltage binning scheme. The histogram of the length of the valid Vdd segment can thus be used to estimate the upper bound on the number of bins needed for a given yield requirement under both uniform and optimal voltage binning schemes, and this can be done right after mapping the measured power and timing data to working Vdd segments.
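As an illustration of Proposition 17.1, the bound can be evaluated directly from the sampled segment lengths. The sketch below is a minimal example under stated assumptions: the inverse CDF is taken as an order statistic of the samples, the result of (17.6) is rounded up to an integer, and the synthetic segments are invented purely for demonstration. None of the function names come from the chapter.

```python
import math
import random


def predict_bin_upper_bound(lengths, y_req, v_min, v_max):
    """Upper bound k_up of (17.6) from an empirical inverse CDF of len."""
    ordered = sorted(lengths)
    # Empirical F^{-1}(1 - Y_req): at most a (1 - Y_req) share of samples is shorter.
    m = min(int((1.0 - y_req) * len(ordered)), len(ordered) - 1)
    delta_v = ordered[m]
    if delta_v <= 0:
        raise ValueError("segment length quantile must be positive")
    # Rounding up to an integer bin count is an assumption of this sketch.
    return max(math.ceil((v_max - v_min) / delta_v - 1.0), 1)


def uniform_yield(segments, k, v_min, v_max):
    """Share of valid segments covered by k uniform levels, spaced as in (17.5)."""
    step = (v_max - v_min) / (k + 1)
    levels = [v_min + i * step for i in range(1, k + 1)]
    covered = sum(1 for lo, hi in segments if any(lo <= v <= hi for v in levels))
    return covered / len(segments)


# Synthetic valid segments inside [0.8, 1.4], for illustration only.
rng = random.Random(1)
segments = []
for _ in range(1000):
    lo = rng.uniform(0.8, 1.1)
    segments.append((lo, lo + rng.uniform(0.05, 0.3)))
lengths = [hi - lo for lo, hi in segments]

k_up = predict_bin_upper_bound(lengths, y_req=0.95, v_min=0.8, v_max=1.4)
achieved = uniform_yield(segments, k_up, v_min=0.8, v_max=1.4)
```

Here the yield is measured relative to the valid chips only, which matches a requirement normalized by Ymax. With k_up uniform levels, every segment at least as long as the chosen quantile contains a level, so the achieved share should not fall below the requirement.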

3.3 Yield Analysis and Optimization

The whole voltage binning algorithm for yield analysis and optimization is given in Fig. 17.1. After the yield analysis and optimization, the supply voltage levels V = {V1, V2, ..., Vk,opt} and the corresponding set of bins U = {U1, U2, ..., Uk,opt} can be calculated up to kopt, where Yk,opt = Ymax.

There are many algorithms for solving the set-cover problem in step 3. An exact set-cover algorithm yields the globally optimal solution, but the decision version of the general set-covering problem is NP-complete. In this chapter, we use a greedy approximation algorithm, as shown in Fig. 17.5, which can easily be implemented to run in polynomial time and achieves a good enough approximation of the optimal solution. Note that the greedy approximation is not required: any set-cover algorithm can be used in step 3, so the choice of algorithm is not a limitation of the presented valid supply voltage segment model. The size of the solution found by GREEDY-SET-COVER is within a logarithmic factor of the optimum in general, and at most a small constant times larger than the optimum when the sets are small [19], which is found to be satisfactory in the experimental results. In addition, the greedy algorithm guarantees that each new voltage level covers the largest number of segments corresponding to still-uncovered test chips, which makes the algorithm incremental. As a result, if only k - 1 bins are needed, we can stop the computation at k - 1 instead of k, and when the designer needs more voltage bins, the computation does not need to be started all over again. This incremental property is very useful for circuit design, because when the number of bins increases from k - 1 to k, the existing k - 1 voltage levels stay the same.

Fig. 17.5 The flow of greedy algorithm for covering most uncovered elements in S
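The greedy step can be sketched in a few lines. The code below is a minimal illustration of the standard greedy set-cover heuristic applied to voltage segments, not a reproduction of Fig. 17.5. It assumes that candidate levels may be restricted to the Vlow values of uncovered segments, which loses nothing because any level covering a group of closed segments can be lowered to the largest Vlow in that group. The function name and the sample segments are hypothetical.

```python
def greedy_voltage_binning(segments, max_bins=None):
    """Greedy set cover over valid Vdd segments.

    segments: list of (v_low, v_high) tuples for the valid chips.
    Returns (levels, bins), where bins[i] lists the chip indices assigned
    to levels[i]. Stops when all chips are covered or max_bins is reached.
    """
    uncovered = set(range(len(segments)))
    levels, bins = [], []
    while uncovered and (max_bins is None or len(levels) < max_bins):
        best_v, best_cov = None, set()
        for j in sorted(uncovered):
            v = segments[j][0]   # candidate level: Vlow of an uncovered chip
            cov = {i for i in uncovered
                   if segments[i][0] <= v <= segments[i][1]}
            if len(cov) > len(best_cov):
                best_v, best_cov = v, cov
        levels.append(best_v)
        bins.append(sorted(best_cov))
        uncovered -= best_cov    # Sleft shrinks after each selected level
    return levels, bins


# Illustrative segments only (not data from the chapter).
demo = [(0.8, 1.0), (0.9, 1.1), (0.95, 1.2), (1.15, 1.3), (1.25, 1.4)]
levels, bins = greedy_voltage_binning(demo)
first_level_only, _ = greedy_voltage_binning(demo, max_bins=1)
```

Because each level is chosen without revisiting earlier ones, running with a smaller `max_bins` returns a prefix of the full solution, which is the incremental behavior described above.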

We remark that the presented method can be easily extended to deal with a group of discrete values Vg,1, Vg,2, ... for dynamic voltage scaling under different operation modes instead of a single voltage. For example, the ith supply voltage level Vi may contain two discrete values, Vs and Vh, which are the supply voltages for the power-saving mode and the high-performance mode, respectively (any voltage in between also works for the selected chips). The set-cover algorithm in Fig. 17.5 then uses a range Vg (defined by the user) instead of a single voltage level to cover the voltage segments. This extension is straightforward for the presented method.
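One way to see why the range extension is straightforward is to reduce it to the single-level case. The sketch below assumes, for illustration only, that every bin uses a range of the same user-defined width, so that Vh = Vs + width. A chip then works over the whole range exactly when Vs falls in a shortened copy of its segment. The function names are hypothetical.

```python
def range_covers(segment, v_s, v_h):
    """True if a chip with the given valid segment works at both v_s and v_h."""
    v_low, v_high = segment
    return v_low <= v_s and v_h <= v_high


def shrink_for_range(segments, width):
    """Map each valid segment to the interval of admissible lower voltages Vs.

    A segment [v_low, v_high] hosts the range [Vs, Vs + width] exactly when
    v_low <= Vs <= v_high - width. Segments shorter than width map to None.
    """
    return [(lo, hi - width) if hi - lo >= width else None
            for lo, hi in segments]


# Illustrative segments only (not data from the chapter).
demo = [(0.8, 1.2), (0.9, 1.0), (1.0, 1.4)]
shrunk = shrink_for_range(demo, width=0.25)
```

Any single-level covering procedure applied to the shortened segments then selects Vs, and the matching Vh follows from the fixed width.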


In this section, the presented voltage binning technique for yield analysis and optimization is verified on circuits in the ISCAS’85 benchmark set with constraints on timing performance and power consumption. The circuits were synthesized with the Nangate Open Cell Library. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139]. The presented method has been implemented in Matlab 7.8.0. All the experiments were carried out on a Linux system with quad Intel Xeon CPUs at 2.99 GHz and 16 GB of memory.


Table 17.1 Predicted and actual number of bins needed under yield requirement

Circuit  Yreq  Predicted  Real for uni.  Real for opt.
C432     99%   25         23             4
         97%   10         9              3
         95%   7          6              3
C1908    99%   27         12             7
         97%   11         6              3
         95%   7          3              3
C2670    99%   8          4              3
         97%   5          3              2
         95%   3          2              1
C7552    99%   30         12             5
         97%   9          4              3
         95%   6          3              2

4.1 Setting of Process Variation

For each circuit in the benchmark, 10,000 Monte Carlo samples are generated from process variations. In this chapter, effective gate length L and gate oxide thickness Tox are considered as the two main sources of process variations. According to [71], the physical variation in L and Tox should be controlled within ±12%. So the 3σ values of the variations for L and Tox were set to 12% of the nominal values, of which inter-die variations constitute 20% and intra-die variations 80%. L is modeled as a sum of spatially correlated sources of variation, and Tox is modeled as an independent source of variation. The same framework can be easily extended to include other variational parameters. Both L and Tox are modeled as Gaussian parameters. For the correlated L, the spatial correlation was modeled based on the exponential models [195].
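A sampling setup of this kind can be sketched as follows. This is only an illustration under several assumptions that the text does not fix: the 20%/80% split is interpreted as a split of the variance, the exponential correlation is taken as exp(-d/eta) with a hypothetical correlation length `eta`, and the gate locations and nominal length are placeholders. All function names are hypothetical.

```python
import math
import random


def exp_correlation(points, eta):
    """Correlation matrix rho(d) = exp(-d / eta) between gate locations (assumed form)."""
    return [[math.exp(-math.dist(p, q) / eta) for q in points] for p in points]


def cholesky(a):
    """Lower-triangular factor low with low * low^T = a, for a positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gate_lengths(points, l_nom, rng, three_sigma_frac=0.12,
                        inter_share=0.2, eta=1.0):
    """One Monte Carlo sample of L at every location.

    inter_share is treated as the inter-die share of the total variance,
    which is an assumption of this sketch.
    """
    sigma = three_sigma_frac * l_nom / 3.0
    low = cholesky(exp_correlation(points, eta))
    g_inter = rng.gauss(0.0, 1.0)                       # shared by all gates
    z = [rng.gauss(0.0, 1.0) for _ in points]
    intra = [sum(low[i][k] * z[k] for k in range(i + 1))
             for i in range(len(points))]               # spatially correlated part
    return [l_nom + sigma * (math.sqrt(inter_share) * g_inter
                             + math.sqrt(1.0 - inter_share) * intra[i])
            for i in range(len(points))]


# Placeholder locations and nominal length, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
sample = sample_gate_lengths(points, l_nom=45.0, rng=random.Random(0))
```

Nearby gates receive strongly correlated intra-die offsets, while the single inter-die term shifts all gates on the chip together.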

The power and timing information as a function of supply voltage for each test chip is characterized by SPICE simulation. Under 45 nm technology, the typical supply voltage range is 0.85 V to 1.3625 V [69]. Therefore, Vdd is varied between 0.8 V and 1.4 V in this chapter, which is sufficient for 45 nm technology.

We remark that, in practice, the power and timing information can be obtained from measurements. As a result, all the sources of variability of transistors and interconnects, including inter-die and intra-die variations with spatial correlations, are considered automatically.

4.2 Prediction of Bin Numbers Under Yield Requirement

As mentioned in Sect. 3.2, the presented valid segment model can be used to predict the number of bins needed under a yield requirement before voltage binning optimization. Table 17.1 shows the comparison between the predicted number and the actual number needed under the yield requirement for the test chips. In this table, Yreq is the lower bound requirement for yield optimization (normalized by Ymax). Column 3 is the predicted number of bins, and columns 4 and 5 are the actual bin numbers found for the uniform and optimal voltage binning schemes, respectively. This table validates the upper bound formula for the needed number of bins in Sect. 3.2: the predicted value is always an upper bound on the actual number of bins needed, and it can therefore serve as a guide for meeting the yield requirement in optimization. Table 17.1 also shows that the optimal voltage binning scheme can significantly reduce the number of bins compared with the uniform voltage binning scheme under the same yield requirement. When the yield requirement is 99% of the optimal yield, the optimal voltage binning scheme reduces the bin count by 52% on average.

Table 17.2 Yield under uniform and optimal voltage binning schemes (%)

Circuit  Ymax   VB    1 bin  2 bins  5 bins  10 bins  kopt
C432     96.66  Uni.  60.19  79.04   90.52   94.36    4,514
                Opt.  80.08  88.68   96.42   96.66    10
C1908    98.06  Uni.  71.80  91.46   95.20   97.04    437
                Opt.  89.18  92.88   97.18   98.06    21
C2670    90.15  Uni.  81.12  87.13   89.74   89.95    1,205
                Opt.  85.77  88.34   89.83   90.08    13
C7552    93.46  Uni.  73.94  86.38   91.40   92.34    1,254
                Opt.  87.22  90.30   92.64   93.26    18

4.3 Comparison Between Uniform and Optimal VoltageBinning Schemes

Numerical examples for both the uniform and the optimal voltage binning schemes with different numbers of bins are used to verify the presented voltage binning technique. Table 17.2 shows the results, where Ymax is the maximum chip yield, achieved when Vdd is adjusted individually for each manufactured chip; VB stands for the voltage binning scheme used; and kopt is the minimum number of bins needed to achieve Ymax. From Table 17.2, we can see that the yield of the optimal VB always increases with the number of bins, with Ymax as the upper bound, and that voltage binning can significantly improve yield compared with a single supply voltage. Column 8 in Table 17.2 shows that the number of bins needed to achieve Ymax in the optimal voltage binning scheme is, on average, only 1.88% of the number of bins needed in the uniform scheme, which means that the optimal voltage binning scheme is much more economical for reaching the best possible yield.
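The two averages quoted for Tables 17.1 and 17.2 can be reproduced from the tabulated values. The short check below assumes that each average is the arithmetic mean of the per-circuit figures, which is an interpretation rather than something the text states. The variable names are introduced only for this check.

```python
# Bin counts at Yreq = 99% from Table 17.1: (uniform, optimal) per circuit.
bins_99 = {"C432": (23, 4), "C1908": (12, 7), "C2670": (4, 3), "C7552": (12, 5)}
avg_reduction = sum(1 - opt / uni for uni, opt in bins_99.values()) / len(bins_99)

# kopt from Table 17.2: (uniform, optimal) per circuit.
k_opt = {"C432": (4514, 10), "C1908": (437, 21),
         "C2670": (1205, 13), "C7552": (1254, 18)}
avg_ratio = sum(opt / uni for uni, opt in k_opt.values()) / len(k_opt)

print(f"average bin reduction at 99%: {avg_reduction:.1%}")
print(f"average kopt ratio (optimal/uniform): {avg_ratio:.3%}")
```

Under this interpretation the mean reduction is about 52%, and the mean kopt ratio is about 1.885%, consistent with the quoted 1.88% if that figure was truncated rather than rounded.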

Figure 17.6 compares the yields from the uniform and optimal voltage binning schemes with the number of bins from 1 to 10 for C432. The figure shows that the optimal binning scheme always provides a higher yield than the uniform binning scheme. For the optimal voltage binning scheme, the yield gain per added bin shrinks as the number of bins increases, because the greedy algorithm picks the levels that cover the most chips first. Similar behavior is observed in the yield results for the other test circuits.

Fig. 17.6 Yield under uniform and optimal voltage binning schemes for C432 (plot title: yield under different number of voltage bins; horizontal axis: number of voltage bins; vertical axis: yield; curves: optimal VB, uniform VB)

4.4 Sensitivity to Frequency and Power Constraints

For very strict power or frequency constraints, voltage binning provides more opportunities to improve yield. Figure 17.7 shows the changes in parametric yield for C432 with and without voltage binning yield optimization as the frequency and power consumption requirements change, where Pnorm is the normalized power constraint and fnorm is the normalized frequency constraint. The figure shows that parametric yield is sensitive to both performance and power constraints. As a result, yield can be substantially increased by binning the supply voltage into a very small number of levels in the optimal voltage binning scheme. For example, without the voltage binning technique, the yield falls to 0% when the constraints become 20% stricter, while the voltage binning technique keeps the yield as high as 80% under the same conditions.

4.5 CPU Times

Fig. 17.7 Maximum achievable yield as function of power and performance constraints for C2670

Table 17.3 compares the CPU times among different voltage binning schemes and different numbers of bins. Since the inputs of the presented algorithm in Fig. 17.1 are, in practice, measured data from real chips, the time cost of measuring the data is not counted in the time cost of the voltage binning method. In this chapter, however, the timing and power data are generated from SPICE simulation. There are three steps in the presented method, as shown in Fig. 17.1. It is easy to see that the time complexity of steps 1 and 2 is O(N) each, where N is the number of MC sample points. From [19], step 3 can run within O(N^2 ln N) time. Therefore, the speed of the voltage binning algorithm does not depend on the size of the circuit. Table 17.3 confirms that the CPU time of the binning technique is insignificant even for the case of 10 bins, and that the time cost does not increase with the number of gates on the chip.

Table 17.3 CPU time comparison (s)

Circuit  VB    1 bin   2 bins  5 bins  10 bins
C432     Uni.  0.0486  0.0571  0.0866  0.1374
         Opt.  0.0747  0.0786  0.0823  0.0827
C1908    Uni.  0.0551  0.0749  0.1237  0.2037
         Opt.  0.0804  0.0840  0.0874  0.0901
C2670    Uni.  0.0347  0.0371  0.0425  0.0504
         Opt.  0.0686  0.0696  0.0711  0.0704
C7552    Uni.  0.0476  0.0565  0.0925  0.1493
         Opt.  0.0775  0.0791  0.0802  0.0812

5 Summary

In this chapter, we have presented a voltage binning technique to improve the yield of chips. First, a novel formulation has been introduced to predict the maximum number of bins required under the uniform binning scheme from the distribution of the valid Vdd segment length. We then developed an approximation of the optimal binning scheme based on a greedy set-cover solution to minimize the number of bins while keeping the corresponding voltage levels incremental. The presented method can also be extended to deal with a range of working supply voltages for dynamic voltage scaling operation. Numerical results on some benchmarks in 45 nm technology show that the presented method can correctly predict the upper bound on the number of bins required. Compared to the uniform scheme, the presented optimal binning scheme significantly reduces the number of bins needed to achieve the same yield, at very small CPU cost.

References

1. A. Abdollahi, F. Fallah, and M. Pedram, “Runtime mechanisms for leakage current reduction in CMOS VLSI circuits,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug 2002, pp. 213–218.

2. A. Abu-Dayya and N. Beaulieu, “Comparison of methods of computing correlated lognormal sum distributions and outages for digital wireless applications,” in Proc. IEEE Vehicular Technology Conference, vol. 1, June 1994, pp. 175–179.

3. K. Agarwal, D. Blaauw, and V. Zolotov, “Statistical timing analysis for intra-die process variations with spatial correlations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2003, pp. 900–907.

4. J. D. Alexander and V. D. Agrawal, “Algorithms for estimating number of glitches and dynamic power in CMOS circuits with delay variations,” in IEEE Computer Society Annual Symposium on VLSI, May 2009, pp. 127–132.

5. S. Bhardwaj, S. Vrudhula, and A. Goel, “A unified approach for full chip statistical timing and leakage analysis of nanoscale circuits considering intradie process variations,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1812–1825, Oct 2008.

6. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, and M. Alessandrini, “SiSMA: A tool for efficient analysis of analog CMOS integrated circuits affected by device mismatch,” IEEE TCAD, pp. 192–207, 2004.

7. S. Borkar, T. Karnik, and V. De, “Design and reliability challenges in nanometer technologies,” in Proc. Design Automation Conf. (DAC). IEEE Press, 2004, pp. 75–75.

8. S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter variations and impact on circuits and microarchitecture,” in Proc. Design Automation Conf. (DAC). IEEE Press, 2003, pp. 338–342.

9. C. Brau, Modern Problems In Classical Electrodynamics. Oxford Univ. Press, 2004.

10. R. Burch, F. Najm, P. Yang, and T. Trick, “A Monte Carlo approach for power estimation,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 1, no. 1, pp. 63–71, Mar 1993.

11. Y. Cao, Y. Lee, T. Chen, and C. C. Chen, “HiPRIME: hierarchical and passivity reserved interconnect macromodeling engine for RLKC power delivery,” in Proc. Design Automation Conf. (DAC), 2002, pp. 379–384.

12. H. Chang and S. Sapatnekar, “Statistical timing analysis under spatial correlations,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 9, pp. 1467–1482, Sept. 2005.

R. Shen et al., Statistical Performance Analysis and Modeling Techniques for Nanometer VLSI Designs, DOI 10.1007/978-1-4614-0788-1, © Springer Science+Business Media, LLC 2012


13. H. Chang and S. S. Sapatnekar, “Full-chip analysis of leakage power under process variations,including spatial correlations,” in Proc. IEEE/ACM Design Automation Conference (DAC),2005, pp. 523–528.

14. H. Chen, S. Neely, J. Xiong, V. Zolotov, and C. Visweswariah, “Statistical modeling andanalysis of static leakage and dynamic switching power,” in Power and Timing Modeling, Op-timization and Simulation: 18th International Workshop, (PATMOS), Sep 2008, pp. 178–187.

15. R. Chen, L. Zhang, V. Zolotov, C. Visweswariah, and J. Xiong, “Static timing: back toour roots,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2008,pp. 310–315.

16. C. Chiang and J. Kawa, Design for Manufacturability. Springer, 2007.17. E. Chiprout, “Fast flip-chip power grid analysis via locality and grid shells,” in Proc. Int.

Conf. on Computer Aided Design (ICCAD), Nov 2004, pp. 485–488.18. T.-L. Chou and K. Roy, “Power estimation under uncertain delays,” Integr. Comput.-Aided

Eng., vol. 5, no. 2, pp. 107–116, Apr 1998.19. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed.

MIT Press, 2001.20. P. Cox, P. Yang, and O. Chatterjee, “Statistical modeling for efficient parametric yield

estimation of MOS VLSI circuits,” in IEEE Int. Electron Devices Meeting, 1983, pp. 391–398.21. J. Cui, G. Chen, R. Shen, S. X.-D. Tan, W. Yu, and J. Tong, “Variational capacitance

modeling using orthogonal polynomial method,” in Proc. IEEE/ACM International GreatLakes Symposium on VLSI, 2008, pp. 23–28.

22. L. Daniel, O. C. Siong, L. S. Chay, K. H. Lee, and J. White, “Multi-parameter moment-matching model-reduction approach for generating geometrically parameterized interconnectperformance models,” IEEE Trans. on Computer-Aided Design of Integrated Circuits andSystems, vol. 23, no. 5, pp. 678–693, May 2004.

23. S. Dasgupta, “Kharitonov’s theorem revisited,” Systems & Control Letters, vol. 11, no. 5,pp. 381–384, 1988.

24. V. De and S. Borkar, “Technology and design challenges for low power and high perfor-mance,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug 1999,pp. 163–168.

25. L. H. de Figueiredo and J. Stolfi, “Self-validated numerical methods and applications,” inBrazilian Mathematics Colloquium monographs, IMPA/CNPq, Rio de Janeiro, Brazil, 1997.

26. K. Deb, Multi-objective optimization using evolutionary algorithms. Wiley Publishing,Hoboken, NJ, 2002.

27. A. Demir, E. Liu, and A.Sangiovanni-Vincentelli, “Time-domain non-Monte Carlo noisesimulation for nonlinear dynamic circuits with arbitrary excitations,” IEEE TCAD, pp. 493–505, 1996.

28. C. Ding, C. Hsieh, and M. Pedram, “Improving the efficiency of Monte Carlo powerestimation VLSI,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5,pp. 584–593, Oct 2000.

29. C. Ding, C. Tsui, and M. Pedram, “Gate-level power estimation using tagged probabilisticsimulation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,vol. 17, no. 11, pp. 1099–1107, Nov 1998.

30. Q. Dinh, D. Chen, and M. D. Wong, “Dynamic power estimation for deep submicron circuitswith process variation,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan2010, pp. 587–592.

31. S. W. Director, P. Feldmann, and K. Krishna, “Statistical integrated circuit design,” IEEE J. of Solid State Circuits, pp. 193–202, 1993.

32. P. Drennan and C. McAndrew, “Understanding MOSFET mismatch for analog design,” IEEE J. of Solid State Circuits, pp. 450–456, 2003.

33. S. G. Duvall, “Statistical circuit modeling and optimization,” in Intl. Workshop Statistical Metrology, Jun 2000, pp. 56–63.

34. T. El-Moselhy and L. Daniel, “Stochastic integral equation solver for efficient variation-aware interconnect extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 2008.

35. J. Fan, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical model order reduction for interconnect circuits considering spatial correlations,” in Proc. Design, Automation and Test In Europe. (DATE), 2007, pp. 1508–1513.

36. P. Feldmann and R. W. Freund, “Efficient linear circuit analysis by Pade approximation via the Lanczos process,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 5, pp. 639–649, May 1995.

37. P. Feldmann and S. W. Director, “Improved methods for IC yield and quality optimization using surface integrals,” in IEEE/ACM ICCAD, 1991, pp. 158–161.

38. R. Fernandes and R. Vemuri, “Accurate estimation of vector dependent leakage power in presence of process variations,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2009, pp. 451–458.

39. I. A. Ferzli and F. N. Najm, “Statistical estimation of leakage-induced power grid voltage drop considering within-die process variations,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2003, pp. 865–859.

40. I. A. Ferzli and F. N. Najm, “Statistical verification of power grids considering process-induced leakage current variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2003, pp. 770–777.

41. G. S. Fishman, Monte Carlo: Concepts, Algorithms, and Applications. Springer, 1996.

42. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos, “Modeling within-die spatial correlation effects for process design co-optimization,” in Proceedings of the 6th International Symposium on Quality of Electronic Design, 2005, pp. 516–521.

43. O. Gay, D. Coeurjolly, and N. Hurst, “Libaffa: C++ affine arithmetic library for GNU/Linux,” May 2005, http://savannah.nongnu.org/projects/libaa/.

44. R. Ghanem, “The nonlinear Gaussian spectrum of log-normal stochastic processes and variables,” Journal of Applied Mechanics, vol. 66, pp. 964–973, December 1999.

45. R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach. Dover Publications, 2003.

46. P. Ghanta, S. Vrudhula, and S. Bhardwaj, “Stochastic variational analysis of large power grids considering intra-die correlations,” in Proc. IEEE/ACM Design Automation Conference (DAC), July 2006, pp. 211–216.

47. P. Ghanta, S. Vrudhula, R. Panda, and J. Wang, “Stochastic power grid analysis considering process variations,” in Proc. Design, Automation and Test In Europe. (DATE), vol. 2, 2005, pp. 964–969.

48. A. Ghosh, S. Devadas, K. Keutzer, and J. White, “Estimation of average switching activity in combinational and sequential circuits,” in Proc. IEEE/ACM Design Automation Conference (DAC), June 1992, pp. 253–259.

49. L. Giraud, S. Gratton, and E. Martin, “Incremental spectral preconditioners for sequences of linear systems,” Appl. Num. Math., pp. 1164–1180, 2007.

50. K. Glover, “All optimal Hankel-norm approximations of linear multi-variable systems and their L∞ error bounds,” Int. J. Control, vol. 36, pp. 1115–1193, 1984.

51. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996.

52. F. Gong, X. Liu, H. Yu, S. X. Tan, and L. He, “A fast non-Monte-Carlo yield analysis and optimization by stochastic orthogonal polynomials,” ACM Trans. on Design Automation of Electronics Systems, 2012, in press.

53. F. Gong, H. Yu, and L. He, “Picap: a parallel and incremental capacitance extraction considering stochastic process variation,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 2009, pp. 764–769.

54. F. Gong, H. Yu, and L. He, “Stochastic analog circuit behaviour modelling by point estimation method,” in ACM International Symposium on Physical Design (ISPD), 2011.

55. F. Gong, H. Yu, Y. Shi, D. Kim, J. Ren, and L. He, “QuickYield: an efficient global-search based parametric yield estimation with performance constraints,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 2010, pp. 392–397.

56. F. Gong, H. Yu, L. Wang, and L. He, “A parallel and incremental extraction of variational capacitance with stochastic geometric moments,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, 2012, in press.

57. R. L. Gorsuch, Factor Analysis. Hillsdale, NJ, 1974.

58. C. J. Gu and J. Roychowdhury, “Model reduction via projection onto nonlinear manifolds, with applications to analog circuits and biochemical systems,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2008.

59. C. Gu and J. Roychowdhury, “An efficient, fully nonlinear, variability-aware non-Monte-Carlo yield estimation procedure with applications to SRAM cells and ring oscillators,” in Proc. Asia South Pacific Design Automation Conf., 2008, pp. 754–761.

60. Z. Hao, R. Shen, S. X.-D. Tan, B. Liu, G. Shi, and Y. Cai, “Statistical full-chip dynamic power estimation considering spatial correlations,” in Proc. Int. Symposium. on Quality Electronic Design (ISQED), March 2011, pp. 677–782.

61. Z. Hao, R. Shen, S. X.-D. Tan, and G. Shi, “Performance bound analysis of analog circuits considering process variations,” in Proc. Design Automation Conf. (DAC), July 2011, pp. 310–315.

62. Z. Hao, S. X.-D. Tan, and G. Shi, “An efficient statistical chip-level total power estimation method considering process variations with spatial correlation,” in Proc. Int. Symposium. on Quality Electronic Design (ISQED), March 2011, pp. 671–676.

63. Z. Hao, S. X.-D. Tan, E. Tlelo-Cuautle, J. Relles, C. Hu, W. Yu, Y. Cai, and G. Shi, “Statistical extraction and modeling of inductance considering spatial correlation,” Analog Integr Circ Sig Process, 2012, in press.

64. B. P. Harish, N. Bhat, and M. B. Patil, “Process variability-aware statistical hybrid modeling of dynamic power dissipation in 65 nm CMOS designs,” in Proc. Int. Conf. on Computing: Theory and Applications (ICCTA), Mar 2007, pp. 94–98.

65. K. R. Heloue, N. Azizi, and F. N. Najm, “Modeling and estimation of full-chip leakage current considering within-die correlation,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2007, pp. 93–98.

66. F. Hu and V. D. Agrawal, “Enhanced dual-transition probabilistic power estimation with selective supergate analysis,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005, pp. 366–372.

67. G. M. Huang, W. Dong, Y. Ho, and P. Li, “Tracing SRAM separatrix for dynamic noise margin analysis under device mismatch,” in Proc. of IEEE Int. Behavioral Modeling and Simulation Conf., 2007, pp. 6–10.

68. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, 2001.

69. “Intel Pentium processor E5200 series specifications,” Intel Co., http://ark.intel.com/Product.aspx?id=37212.

70. A. Iserles, A First Course in the Numerical Analysis of Differential Equations, 3rd ed. Cambridge University, 1996.

71. “International technology roadmap for semiconductors (ITRS), 2010 update,” 2010, http://public.itrs.net.

72. J. D. Jackson, Classical Electrodynamics. John Wiley and Sons, 1975.

73. H. Jiang, M. Marek-Sadowska, and S. R. Nassif, “Benefits and costs of power-gating technique,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005, pp. 559–566.

74. R. Jiang, W. Fu, J. M. Wang, V. Lin, and C. C.-P. Chen, “Efficient statistical capacitance variability modeling with orthogonal principle factor analysis,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005, pp. 683–690.

75. I. T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.

76. M. Kamon, M. Tsuk, and J. White, “FastHenry: a multipole-accelerated 3D inductance extraction program,” IEEE Trans. on Microwave Theory and Techniques, pp. 1750–1758, Sept. 1994.

77. S. Kapur and D. Long, “IES3: A fast integral equation solver for efficient 3-dimensional extraction,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1997.

78. T. Karnik, S. Borkar, and V. De, “Sub-90 nm technologies-challenges and opportunities for CAD,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), San Jose, CA, Nov 2002, pp. 203–206.

79. V. L. Kharitonov, “Asymptotic stability of an equilibrium position of a family of systems of linear differential equations,” Differential. Uravnen., vol. 14, pp. 2086–2088, 1978.

80. J. Kim, K. Jones, and M. Horowitz, “Fast, non-Monte-Carlo estimation of transient performance variation due to device mismatch,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2007.

81. A. Klimke, “Sparse Grid Interpolation Toolbox – user’s guide,” University of Stuttgart, Tech. Rep. IANS report 2006/001, 2006.

82. A. Klimke and B. Wohlmuth, “Algorithm 847: spinterp: Piecewise multilinear hierarchical sparse grid interpolation in MATLAB,” ACM Transactions on Mathematical Software, vol. 31, no. 4, 2005.

83. L. Kolev, V. Mladenov, and S. Vladov, “Interval mathematics algorithms for tolerance analysis,” IEEE Trans. on Circuits and Systems, vol. 35, no. 8, pp. 967–975, Aug 1988.

84. J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like technique for power grid analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 10, pp. 1148–1160, Oct 2002.

85. M. W. Kuemerle, S. K. Lichtensteiger, D. W. Douglas, and I. L. Wemple, “Integrated circuit design closure method for selective voltage binning,” in U.S. Patent 7475366, Jan 2009.

86. Y. S. Kumar, J. Li, C. Talarico, and J. Wang, “A probabilistic collocation method based statistical gate delay model considering process variations and multiple input switching,” in Proc. Design, Automation and Test In Europe. (DATE), 2005, pp. 770–775.

87. A. Labun, “Rapid method to account for process variation in full-chip capacitance extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 941–951, June 2004.

88. K. Lampaert, G. Gielen, and W. Sansen, “Direct performance-driven placement of mismatch-sensitive analog circuits,” in Proc. IEEE/ACM Design Automation Conference (DAC), 1995, pp. 445–449.

89. Y. Lee, Y. Cao, T. Chen, J. Wang, and C. Chen, “HiPRIME: Hierarchical and passivity preserved interconnect macromodeling engine for RLKC power delivery,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 6, pp. 797–806, 2005.

90. A. Levkovich, E. Zeheb, and N. Cohen, “Frequency response envelopes of a family of uncertain continuous-time systems,” IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 42, no. 3, pp. 156–165, Mar 1995.

91. D. Li and S. X.-D. Tan, “Statistical analysis of large on-chip power grid networks by variational reduction scheme,” Integration, the VLSI Journal, vol. 43, no. 2, pp. 167–175, April 2010.

92. D. Li, S. X.-D. Tan, G. Chen, and X. Zeng, “Statistical analysis of on-chip power grid networks by variational extended truncated balanced realization method,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2009, pp. 272–277.

93. D. Li, S. X.-D. Tan, and B. McGaughy, “ETBR: Extended truncated balanced realization method for on-chip power grid network analysis,” in Proc. Design, Automation and Test In Europe. (DATE), 2008, pp. 432–437.

94. D. Li, S. X.-D. Tan, E. H. Pacheco, and M. Tirumala, “Fast analysis of on-chip power grid circuits by extended truncated balanced realization method,” IEICE Trans. on Fundamentals of Electronics, Communications and Computer Science (IEICE), vol. E92-A, no. 12, pp. 3061–3069, 2009.

95. P. Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-dependent port packing,” in Proc. Design Automation Conf. (DAC), 2006, pp. 267–272.

96. T. Li, W. Zhang, and Z. Yu, “Full-chip leakage analysis in nano-scale technologies: Mechanisms, variation sources, and verification,” in Proc. Design Automation Conf. (DAC), June 2008, pp. 594–599.

97. X. Li, J. Le, L. Pileggi, and A. Strojwas, “Projection-based performance modeling for inter/intra-die variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005, pp. 721–727.

98. X. Li, J. Le, and L. T. Pileggi, “Projection-based statistical analysis of full-chip leakage power with non-log-normal distributions,” in Proc. IEEE/ACM Design Automation Conference (DAC), July 2006, pp. 103–108.

99. Y. Lin and D. Sylvester, “Runtime leakage power estimation technique for combinational circuits,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2007, pp. 660–665.

100. B. Liu, F. V. Fernandez, and G. Gielen, “An accurate and efficient yield optimization method for analog circuits based on computing budget allocation and memetic search technique,” in Proc. Design Automation and Test Conf. in Europe, 2010, pp. 1106–1111.

101. Y. Liu, S. Nassif, L. Pileggi, and A. Strojwas, “Impact of interconnect variations on the clock skew of a gigahertz microprocessor,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2000, pp. 168–171.

102. Y. Liu, L. T. Pileggi, and A. J. Strojwas, “Model order-reduction of RC(L) interconnect including variational analysis,” in DAC ’99: Proceedings of the 36th ACM/IEEE conference on Design automation, 1999, pp. 201–206.

103. R. Marler and J. Arora, “Survey of multi-objective optimization methods for engineering,” Struct Multidisc Optim, vol. 26, pp. 369–395, 2004.

104. H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, “Challenge: Variability characterization and modeling for 65- to 90-nm processes,” in Proc. IEEE Custom Integrated Circuits Conf., 2005.

105. C. McAndrew, J. Bates, R. Ida, and P. Drennan, “Efficient statistical BJT modeling, why beta is more than ic/ib,” in Proc. IEEE Bipolar/BiCMOS Circuits and Tech. Meeting, 1997.

106. “MCNC benchmark circuit placements,” http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/Placement/.

107. N. Mi, J. Fan, and S. X.-D. Tan, “Simulation of power grid networks considering wires and lognormal leakage current variations,” in Proc. IEEE International Workshop on Behavioral Modeling and Simulation (BMAS), Sept. 2006, pp. 73–78.

108. N. Mi, J. Fan, and S. X.-D. Tan, “Statistical analysis of power grid networks considering lognormal leakage current variations with spatial correlation,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), 2006, pp. 56–62.

109. N. Mi, J. Fan, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical analysis of on-chip power delivery networks considering lognormal leakage current variations with spatial correlations,” IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 55, no. 7, pp. 2064–2075, Aug 2008.

110. N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Fast variational analysis of on-chip power grids by stochastic extended Krylov subspace method,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 11, pp. 1996–2006, 2008.

111. N. Mi, S. X.-D. Tan, P. Liu, J. Cui, Y. Cai, and X. Hong, “Stochastic extended Krylov subspace method for variational analysis of on-chip power grid networks,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2007, pp. 48–53.

112. B. Moore, “Principal component analysis in linear systems: Controllability, observability, and model reduction,” IEEE Trans. Automat. Contr., vol. 26, no. 1, pp. 17–32, 1981.

113. R. E. Moore, Interval Analysis. Prentice-Hall, 1966.

114. S. Mukhopadhyay and K. Roy, “Modeling and estimation of total leakage current in nano-scaled CMOS devices considering the effect of parameter variation,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), 2003, pp. 172–175.

115. K. Nabors and J. White, “Fastcap: A multipole accelerated 3-d capacitance extraction program,” IEEE TCAD, pp. 1447–1459, Nov 1991.

116. F. Najm, “Transition density: a new measure of activity in digital circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 310–323, Feb 1993.

117. F. Najm, R. Burch, P. Yang, and I. Hajj, “Probabilistic simulation for reliability analysis of CMOS VLSI circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 4, pp. 439–450, Apr 1990.

118. K. Nabors and J. White, “FastCap: a multipole accelerated 3D capacitance extraction program,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 10, no. 11, pp. 1447–1459, 1991.

119. S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-μm CMOS,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510, Mar 2004.

120. S. Nassif, “Delay variability: sources, impact and trends,” in Proc. IEEE Int. Solid-State Circuits Conf., San Francisco, CA, Feb 2000, pp. 368–369.

121. S. Nassif, “Design for variability in DSM technologies,” in Proc. Int. Symposium. on Quality Electronic Design (ISQED), San Jose, CA, Mar 2000, pp. 451–454.

122. S. R. Nassif, “Model to hardware correlation for nm-scale technologies,” in Proc. IEEE International Workshop on Behavioral Modeling and Simulation (BMAS), Sept 2007, keynote speech.

123. S. R. Nassif, “Power grid analysis benchmarks,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), 2008, pp. 376–381.

124. S. R. Nassif and K. J. Nowka, “Physical design challenges beyond the 22 nm node,” in Proc. ACM Int. Sym. Physical Design (ISPD), 2010, pp. 13–14.

125. “Nangate open cell library,” http://www.nangate.com/.

126. E. Novak and K. Ritter, “Simple cubature formulas with high polynomial exactness,” Constructive Approximation, vol. 15, no. 4, pp. 449–522, Dec 1999.

127. A. Odabasioglu, M. Celik, and L. Pileggi, “PRIMA: Passive reduced-order interconnect macro-modeling algorithm,” IEEE TCAD, pp. 645–654, 1998.

128. J. Oehm and K. Schumacher, “Quality assurance and upgrade of analog characteristics by fast mismatch analysis option in network analysis environment,” IEEE J. of Solid State Circuits, pp. 865–871, 1993.

129. M. Orshansky, L. Milor, and C. Hu, “Characterization of spatial intrafield gate CD variability, its impact on circuit performance, and spatial mask-level correction,” in IEEE Trans. on Semiconductor Devices, vol. 17, no. 1, Feb 2004, pp. 2–11.

130. C. C. Paige and M. A. Saunders, “Solution of sparse indefinite systems of linear equations,” SIAM J. on Numerical Analysis, vol. 12, no. 4, pp. 617–629, September 1975.

131. S. Pant, D. Blaauw, V. Zolotov, S. Sundareswaran, and R. Panda, “A stochastic approach to power grid analysis,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2004, pp. 171–176.

132. A. Papoulis and S. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-Hill, 2001.

133. M. Pelgrom, A. Duinmaijer, and A. Welbers, “Matching properties of MOS transistors,” IEEE J. of Solid State Circuits, pp. 1433–1439, 1989.

134. J. R. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 1, pp. 43–55, 2005.

135. L. Pileggi, G. Keskin, X. Li, K. Mai, and J. Proesel, “Mismatch analysis and statistical design at 65 nm and below,” in Proc. IEEE Custom Integrated Circuits Conf., 2008, pp. 9–12.

136. L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, pp. 352–366, April 1990.

137. L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation Methods. New York: McGraw-Hill, 1994.

138. S. Pilli and S. Sapatnekar, “Power estimation considering statistical IC parametric variations,” in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS), vol. 3, June 1997, pp. 1524–1527.

139. “Predictive Technology Model,” http://www.eas.asu.edu/~ptm/.

140. L. Qian, D. Zhou, S. Wang, and X. Zeng, “Worst case analysis of linear analog circuit performance based on Kharitonov’s rectangle,” in Proc. IEEE Int. Conf. on Solid-State and Integrated Circuit Technology (ICSICT), Nov 2010.

141. W. T. Rankin, III, “Efficient parallel implementations of multipole based n-body algorithms,” Ph.D. dissertation, Duke University, Durham, NC, USA, 1999.

142. R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, “Statistical analysis of subthreshold leakage current for VLSI circuits,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 2, pp. 131–139, Feb 2004.

143. J. Relles, M. Ngan, E. Tlelo-Cuautle, S. X.-D. Tan, C. Hu, W. Yu, and Y. Cai, “Statistical extraction and modeling of 3D inductance with spatial correlation,” in Proc. IEEE International Workshop on Symbolic and Numerical Methods, Modeling and Applications to Circuit Design, Oct 2010.

144. M. Rewienski and J. White, “A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 2, pp. 155–170, Feb 2003.

145. J. Roy, S. Adya, D. Papa, and I. Markov, “Min-cut floorplacement,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1313–1326, July 2006.

146. J. Roychowdhury, “Reduced-order modelling of time-varying systems,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 1999, pp. 53–56.

147. A. E. Ruehli, “Equivalent circuits models for three dimensional multiconductor systems,” IEEE Trans. on Microwave Theory and Techniques, pp. 216–220, 1974.

148. R. Rutenbar, “Next-generation design and EDA challenges,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), January 2007, keynote speech.

149. Y. Saad and M. H. Schultz, “GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM J. on Sci and Sta. Comp., pp. 856–869, 1986.

150. Y. Saad, Iterative methods for sparse linear systems. SIAM, 2003.

151. S. B. Samaan, “The impact of device parameter variations on the frequency and performance of VLSI chips,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), ser. ICCAD ’04, 2004, pp. 343–346.

152. Y. Sawaragi, H. Nakayama, and T. Tanino, Theory of Multiobjective Optimization (vol. 176 of Mathematics in Science and Engineering). Orlando, FL: Academic Press Inc. ISBN 0126203709, 1985.

153. F. Schenkel, M. Pronath, S. Zizala, R. Schwencker, H. Graeb, and K. Antreich, “Mismatch analysis and direct yield optimization by specwise linearization and feasibility-guided search,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2001.

154. A. S. Sedra and K. C. Smith, Microelectronic Circuits. Oxford University Press, USA, 2009.

155. R. Shen, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical modeling and analysis of chip-level leakage power by spectral stochastic method,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2009, pp. 161–166.

156. R. Shen, S. X.-D. Tan, J. Cui, W. Yu, Y. Cai, and G. Chen, “Variational capacitance extraction and modeling based on orthogonal polynomial method,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 11, pp. 1556–1565, 2010.

157. R. Shen, S. X.-D. Tan, N. Mi, and Y. Cai, “Statistical modeling and analysis of chip-level leakage power by spectral stochastic method,” Integration, the VLSI Journal, vol. 43, no. 1, pp. 156–165, January 2010.

158. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear algorithm for full-chip statistical leakage power analysis considering weak spatial correlation,” in Proc. Design Automation Conf. (DAC), Jun. 2010, pp. 481–486.

159. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear statistical analysis for full-chip leakage power with spatial correlation,” in Proc. IEEE/ACM International Great Lakes Symposium on VLSI (GLSVLSI), May 2010, pp. 227–232.

160. C.-J. Shi and X.-D. Tan, “Canonical symbolic analysis of large analog circuits with determinant decision diagrams,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 1, pp. 1–18, Jan 2000.

161. C.-J. Shi and X.-D. Tan, “Compact representation and efficient generation of s-expanded symbolic network functions for computer-aided analog circuit design,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 7, pp. 813–827, April 2001.

162. C.-J. R. Shi and M. W. Tian, “Simulation and sensitivity of linear analog circuits under parameter variations by robust interval analysis,” ACM Trans. Des. Autom. Electron. Syst., vol. 4, pp. 280–312, July 1999.

163. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 1998.

164. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 3, pp. 330–336, March 2002.

165. R. W. Shonkwiler and L. Lefton, An introduction to parallel and vector scientific computing. Cambridge University Press, 2006.

166. V. Simoncini and D. Szyld, “Recent computational developments in Krylov subspace methods for linear systems,” Num. Lin. Alg. with Appl., pp. 1–59, 2007.

167. R. S. Soin and R. Spence, “Statistical exploration approach to design centering,” Proceedings of the Institution of Electrical Engineering, pp. 260–269, 1980.

168. R. Spence and R. Soin, Tolerance Design of Electronic Circuits. Addison-Wesley, Reading, MA, 1988.

169. A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, “Modeling and analysis of leakage power considering within-die process variations,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug 2002, pp. 64–67.

170. A. Srivastava, D. Sylvester, and D. Blaauw, Statistical Analysis and Optimization for VLSI: Timing and Power. Springer, 2005.

171. G. W. Stewart, Matrix Algorithms, Vol. II. SIAM Publisher, 2001.

172. B. G. Streetman and S. Banerjee, Solid-State Electronic Devices, 5th ed. Prentice Hall, 2000.

173. E. Suli and D. Mayers, An Introduction to Numerical Analysis. Cambridge University, 2006.

174. S. X.-D. Tan, W. Guo, and Z. Qi, “Hierarchical approach to exact symbolic analysis of large analog circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 8, pp. 1241–1250, August 2005.

175. S. X.-D. Tan and C.-J. Shi, “Efficient DDD-based interpretable symbolic characterization of large analog circuits,” IEICE Trans. on Fundamentals of Electronics, Communications and Computer Science (IEICE), vol. E86-A, no. 12, pp. 3112–3118, Dec 2003.

176. S. X.-D. Tan and C.-J. Shi, “Efficient approximation of symbolic expressions for analog behavioral modeling and analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 6, pp. 907–918, June 2004.

177. S. X.-D. Tan and L. He, Advanced Model Order Reduction Techniques in VLSI Design. Cambridge University Press, 2007.

178. R. Teodorescu, B. Greskamp, J. Nakano, S. R. Sarangi, A. Tiwari, and J. Torrellas, “A model of parameter variation and resulting timing errors for microarchitects,” in Workshop on Architectural Support for Gigascale Integration (ASGI), Jun 2007.

179. W. Tian, X.-T. Ling, and R.-W. Liu, “Novel methods for circuit worst-case tolerance analysis,” IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 43, no. 4, pp. 272–278, Apr 1996.

180. S. Tiwary and R. Rutenbar, “Generation of yield-aware Pareto surfaces for hierarchical circuit design space exploration,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2006, pp. 31–36.

181. S. K. Tiwary and R. A. Rutenbar, “Faster, parametric trajectory-based macromodels via localized linear reductions,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2006, pp. 876–883.

182. J. W. Tschanz, S. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 826–829, May 2003.

183. C.-Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power consumption under a real delay model,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 1993, pp. 224–228.

184. “UMFPACK,” http://www.cise.ufl.edu/research/sparse/umfpack/.

185. J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design. New York, NY: Van Nostrand Reinhold, 1995.

186. M. Vratonjic, B. R. Zeydel, and V. G. Oklobdzija, “Circuit sizing and supply-voltage selection for low-power digital circuit design,” in Power and Timing Modeling, Optimization and Simulation: 18th International Workshop, (PATMOS), 2006, pp. 148–156.

187. S. Vrudhula, J. M. Wang, and P. Ghanta, “Hermite polynomial based interconnect analysis in the presence of process variations,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 10, 2006.

188. C.-Y. Wang and K. Roy, “Maximum power estimation for CMOS circuits using deterministic and statistical approaches,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 1, pp. 134–140, Mar 1998.

189. H. Wang, H. Yu, and S. X.-D. Tan, “Fast analysis of nontree-clock network considering environmental uncertainty by parameterized and incremental macromodeling,” in Proc. IEEE/ACM Asia South Pacific Design Automation Conf. (ASPDAC), 2009, pp. 379–384.

190. J. Wang, P. Ghanta, and S. Vrudhula, “Stochastic analysis of interconnect performance in the presence of process variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2004, pp. 880–886.

191. J. M. Wang and T. V. Nguyen, “Extended Krylov subspace method for reduced order analysis of linear circuit with multiple sources,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2000, pp. 247–252.

192. J. M. Wang, B. Srinivas, D. Ma, C. C.-P. Chen, and J. Li, “System-level power and thermal modeling and analysis by orthogonal polynomial based response surface approach (OPRS),” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2005, pp. 727–734.

193. M. S. Warren and J. K. Salmon, “A parallel hashed oct-tree n-body algorithm,” in Proceedings of the 1993 ACM/IEEE conference on Supercomputing, ser. Supercomputing ’93, 1993, pp. 12–21.

194. D. Wilton, S. Rao, A. Glisson, D. Schaubert, O. Al-Bundak, and C. Butler, “Potential integrals for uniform and linear source distributions on polygonal and polyhedral domains,” IEEE Trans. on Antennas and Propagation, vol. AP-32, no. 3, pp. 276–281, March 1984.

195. J. Xiong, V. Zolotov, and L. He, “Robust extraction of spatial correlation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 4, 2007.

196. D. Xiu and G. Karniadakis, “The Wiener-Askey polynomial chaos for stochastic differential equations,” SIAM J. Scientific Computing, vol. 24, no. 2, pp. 619–644, Oct 2002.

197. D. Xiu and G. Karniadakis, “Modeling uncertainty in flow simulations via generalized polynomial chaos,” J. of Computational Physics, vol. 187, no. 1, pp. 137–167, May 2003.

198. H. Xu, R. Vemuri, and W. Jone, “Run-time active leakage reduction by power gating and reverse body biasing: An energy view,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2008, pp. 618–625.

199. S. Yan, V. Sarim, and W. Shi, “Sparse transformation and preconditioners for 3-d capacitance extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 9, pp. 1420–1426, 2005.

http://www.cise.ufl.edu/research/sparse/umfpack/

References 297

200. Z. Ye and Z. Yu, “An efficient algorithm for modeling spatially-correlated process variation instatistical full-chip leakage analysis,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),Nov 2009, pp. 295–301.

201. L. Ying, G. Biros, D. Zorin, and H. Langston, “A new parallel kernel-independent fast multi-pole method,” in IEEE Conf. on High Performance Networking and Computing, 2003.

202. H. Yu, X. Liu, H. Wang, and S. X.-D. Tan, “A fast analog mismatch analysis by an incrementaland stochastic trajectory piecewise linear macromodel,” in Proc. Asia South Pacific DesignAutomation Conf. (ASPDAC), Jan 2010, pp. 211–216.

203. H. Yu and S. X.-D. Tan, “Recent advance in computational prototyping for analysis ofhigh-performance analog/RF ICs,” in IEEE International Conf. on ASIC (ASICON), 2009,pp. 760–764.

204. W. Yu, C. Hu, and W. Zhang, “Variational capacitance extraction of on-chip interconnectsbased on continuous surface model,” in Proc. IEEE/ACM Design Automation Conference(DAC), July 2009, pp. 758–763.

205. W. Zhang, W. Yu, Z. Wang, Z. Yu, R. Jiang, and J. Xiong, “An efficient method for chip-levelstatistical capacitance extraction considering process variations with spatial correlation,” inProc. Design, Automation and Test In Europe. (DATE), Mar 2008, pp. 580–585.

206. M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of powerdistribution networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits andSystems, vol. 21, no. 2, pp. 159–168, Feb 2002.

207. Y. Zhou, Z. Li, Y. Tian, W. Shi, and F. Liu, “A new methodology for interconnect parasiticsextraction considering photo-lithography effects,” in Proc. Asia South Pacific Design Automa-tion Conf. (ASPDAC), Jan 2007, pp. 450–455.

208. H. Zhu, X. Zeng, W. Cai, J. Xue, and D. Zhou, “A sparse grid based spectral stochastic collo-cation method for variations-aware capacitance extraction of interconnects under nanometerprocess technology,” in Proc. Design, Automation and Test In Europe. (DATE), Mar 2007,pp. 1514–1519.

209. Z. Zhu and J. Phillips, “Random sampling of moment graph: a stochastic Krylov-reduction algorithm,” in Proc. Design, Automation and Test In Europe. (DATE), April 2007,pp. 1502–1507.

210. Z. Zhu and J. White, “FastSies: a fast stochastic integral equation solver for modelingthe rough surface effect,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005,pp. 675–682.

211. Z. Zhu, B. Song, and J. White, “Algorithms in FastImp: a fast and wideband impedanceextraction program for complicated 3-d geometries,” in Proc. Design Automation Conf.(DAC). New York, NY, USA: ACM, 2003, pp. 712–717.

212. Z. Zhu, J. White, and A. Demir, “A stochastic integral equation method for modeling therough surface effect on interconnect capacitance,” in Proc. Int. Conf. on Computer AidedDesign (ICCAD), 2004, pp. 887–891.

213. V. Zolotov, C. Viweswariah, and J. Xiong, “Voltage binning under process variation,” in Proc.Int. Conf. on Computer Aided Design (ICCAD), Nov 2009, pp. 425–432.

214. Y. Zou, Y. Cai, Q. Zhou, X. Hong, S. X.-D. Tan, and L. Kang, “Practical implementation ofstochastic parameterized model order reduction via hermite polynomial chaos,” in Proc. AsiaSouth Pacific Design Automation Conf. (ASPDAC), Jan 2007, pp. 367–372.

Index

A
Adaptive voltage supply
  yield optimization, 273
Affine interval, 13
  performance bound analysis, 222
Arnoldi algorithm
  capacitance extraction, 194, 199
  power grid, 150
Askey scheme, 29
  yield analysis, 257
Augmented potential coefficient matrix
  capacitance extraction, 167

B
Balancing
  TBR, 146
Baseline
  yield, 271
BEM
  boundary element method, 163
  capacitance extraction, 165, 184
  inductance extraction, 209
BEOL
  back-end-of-the-line, 111
Bin voltage level
  yield, 275
Binning algorithm
  yield, 275
Block-Arnoldi orthonormalization, 243
BPV
  backward propagation of variance, 237
  mismatch, 242

C
CAD
  developers, 9
  inductance extraction, 209
Capacitance extraction, 163
Capacitance matrix
  power grid, 111
CDF
  cumulative distribution function, 19
Charge distribution
  capacitance extraction, 165
Chebyshev’s inequality, 17–18
Cholesky decomposition, 26
CMP, 3
Collocation-based method
  spectral stochastic method, 31
Collocation-based spectral stochastic method
  capacitance extraction, 163
  leakage analysis, 65
Conductance matrix, 110
Continuous random variable, 16
Corner-based, 3
Correlation index neighbor set
  statistical leakage analysis, 67
Covariance, 21
Covariance matrix, 8, 23, 25
  statistical leakage analysis, 43, 57
Critical dimension, 7

D
DAE
  differential-algebra-equation, 235
  yield, 258

DDD
  determinant decision diagram, 222
Decancellation
  performance bound analysis, 227
Delay
  dynamic power, 86
  inductance extraction, 217
  power grid, 107
  yield, 254
Deterministic current source, 134
Discrete probability distribution, 18
Discrete random variable, 16
Dishing, 7
Downward pass, 185
Dynamic current
  power grid, 128
Dynamic power, 10
  yield optimization, 273
Dynamic power analysis, 85

E
Effective channel length
  dynamic power analysis, 84
  power grid, 112
  statistical leakage analysis, 41
  yield, 257, 274
EKS, 11
  extended Krylov subspace, 127
Extended Krylov subspace method, 11
  power grid, 128
Electrical parameter, 256
Electromigration, 4
ETBR, 11
  extended truncated balanced realization, 11, 145
  power grid, 130, 148
Event, 15
Expectation, 16
Experiment, 15
Exponential correlation model
  capacitance extraction, 166
  inductance extraction, 211

F
Fast multipole method, 12
Filament current, 211
Filament voltage, 211
FMM
  fast-multipole-method, 183
Free space Green function, 168

G
Galerkin-based method, 33
  spectral stochastic method, 31
Galerkin-based spectral stochastic method, 11, 166
  capacitance extraction, 164, 166
  power grid, 113, 136
Gate oxide leakage
  statistical leakage analysis, 41
Gate oxide thickness
  statistical leakage analysis, 41
  dynamic power analysis, 84
Gaussian-Hermite quadrature
  fundamental, 31
Gaussian distribution, 19
Gaussian quadrature
  fundamental, 31
  leakage analysis, 10
  statistical leakage analysis, 59
  inductance extraction, 212
Gaussian
  capacitance extraction, 166
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 241
  power grid, 111
  random variable, 7
  statistical leakage analysis, 58
  yield, 256
  yield optimization, 275
Geometric variation
  capacitance extraction, 166
  inductance extraction, 209
Geometrical parameter, 256
Glitch width variation
  dynamic power analysis, 89
Glitch
  dynamic power analysis, 86
Global aggregation, 245
GM
  geometrical moment, 186
GMRES
  capacitance extraction, 183
  generalized minimal residual, 164
Gradient-based yield optimization, 256
Gramian
  power grid, 145, 147
Greedy algorithm, 13

Green function, 168
Grid-based method, 24
  statistical leakage analysis, 49

H
Hermite polynomials
  total power analysis, 10, 95
  yield, 257
HOC
  Hermite polynomial chaos, 33
Hot carrier injection, 4
HPC
  capacitance extraction, 163, 166
  Hermite polynomial chaos, 29
  inductance extraction, 214–215
  power grid, 115, 131
  statistical leakage analysis, 40
  total power analysis, 97

I
Idle leakage, 77
IEKS
  improved extended Krylov subspace methods, 11
IGMRES
  incremental GMRES, 195
Incremental aggregation, 246
Independent, 20
  capacitance extraction, 167
  power grid, 110
  statistical leakage analysis, 57, 67
Inductance extraction, 209
Inductance matrix, 210
Inner product
  capacitance extraction, 171
  mismatch, 241
  power grid, 132
Inter-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 57
  yield optimization, 275
Interval arithmetic
  performance bound analysis, 222
Intra-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 55
  yield optimization, 275
IsTPWL
  incremental stochastic TPWL, 236
  mismatch, 247
ITRS
  International technology roadmap for semiconductors, 107

K
KCL
  Kirchhoff’s current law, 211
  yield, 258
Kharitonov’s functions, 13
  performance bound analysis, 222, 228
Kharitonov’s polynomials, 13
Krylov subspace
  capacitance extraction, 194

L
Layout dependent variation, 7
LE
  local expansion, 187
Leakage power, 39
  yield optimization, 273
Local tangent subspace
  mismatch, 244
Log-normal, 19
  power grid, 111, 134
  statistical leakage analysis, 41
Log-normal leakage current, 11
Look-up table
  capacitance extraction, 171
  gate-based leakage analysis, 41
  LUT, 66
Look-up-table, 10
LU decomposition, 184
Lyapunov equation, 146

M
Macromodel
  mismatch, 242
ManiMOR
  mismatch, 247
Markov’s inequality, 17–18
Maximum possible yield, 276
MC
  capacitance extraction, 166
  dynamic power analysis, 90
  mismatch, 235
  Monte Carlo, 28
  performance bound analysis, 221, 228
  power grid, 132, 151
  statistical leakage analysis, 49, 61
  total power analysis, 95
  yield, 253, 260, 282
  inductance extraction, 211

ME
  multipole expansion, 186
Mean value, 16
  dynamic power analysis, 90
  mismatch, 241
  power grid, 116
  statistical leakage analysis, 39, 58
  yield, 261
  inductance extraction, 211
  total power analysis, 100
Mismatch, 235
  analog circuits, 13
  performance bound analysis, 221
  yield, 253
MNA
  modified nodal analysis, 111
  power grid, 115
Moment, 17
  power grid, 129
  statistical leakage analysis, 50
MOR
  mismatch, 236, 238
  model order reduction, 236
Multi-objective optimization, 262
Multivariate Gaussian process
  power grid, 111
Mutually independent, 20
MVP
  matrix-vector product, 183

N
NBTI, 4
NMC
  mismatch, 235
  non-Monte Carlo, 253
Non-Monte Carlo method, 13
  yield, 259

O
OPAM
  operational amplifier, 265
Optical proximity correction, 7
Optimal binning scheme, 280
Ordinary differential equation
  ODE, 238
Orthogonal decomposition
  capacitance extraction, 12
  leakage analysis, 10
  power grids, 11
Orthogonal PC
  power grids, 11
Orthogonal polynomial chaos, 29, 158
  analog circuits, 13
  capacitance extraction, 166, 183, 188
  dynamic power analysis, 87
  leakage analysis, 55
  mismatch, 236, 240
  power grid, 108, 127
  statistical leakage analysis, 53
  yield, 257
  yield analysis and optimization, 13
Oxide erosion, 7

P
Panel-distance, 186
Panel-width, 186
Parametric yield, 254, 275
PBTI, 4
PCA
  capacitance extraction, 167, 186
  power grid, 111, 150
  principal component analysis, 27
  statistical leakage analysis, 49, 57, 67
  yield, 257
PDF
  mismatch, 241
  probability density function, 18
  total power analysis, 99
  yield, 255, 263
  yield optimization, 274
Pelgrom’s model
  mismatch, 237
  yield, 256
Performance bound analysis, 12, 222
Performance metric, 255
Perturbation
  mismatch, 240
Perturbed SDAE
  mismatch, 240
PFA
  principal factor analysis, 26
  total power analysis, 10, 95
Phase-shift mask, 7

PiCAP, 12
  parallel and incremental capacitance extraction, 183
PMTBR
  power grid, 147
Potential coefficient matrix
  second-order, 168
  capacitance extraction, 165
POV
  propagation of variation, 256
  yield, 261
Power constraint, 276
Power grid network, 109
Power grids, 10
Pre-set potential, 165
Preconditioner, 184
Primary conductor, 211
Principal factor analysis, 10
Process variation, 4, 23
  capacitance extraction, 163, 165, 183
  inductance extraction, 209
  performance bound analysis, 221
  statistical leakage analysis, 45
  total power analysis, 95
  yield, 253
Projection matrix, 147
PSD
  power spectral density, 235
PWL
  piece-wise linear, 128

Q
Quadrature points, 31
  statistical leakage analysis, 59

R
Random variable, 16
Random variable reduction, 12
RC network, 109
Response Gramian, 11, 148
RHS
  right-hand-side, 258
Run-time leakage, 77
  estimation, 77
  reduction, 79

S
Sample space, 15
  power grid, 111
Schmitt trigger, 265
SCL
  standard cell library, 66
Segment
  dynamic power analysis, 86
Set covering, 276
SGM
  stochastic geometric moment, 189
Single-objective yield optimization, 272
Singular value
  power grid, 146
Slack, 274
SLP
  sequential linear programming, 256
  yield, 262
Smolyak quadrature
  dynamic power analysis, 88
  fundamental, 32
  inductance extraction, 212
  statistical leakage analysis, 60
  total power analysis, 98
SMOR
  stochastic model order reduction, 130
Snapshot
  mismatch, 243
Sparse grid quadrature, 32
Sparse grid
  inductance extraction, 214
  total power analysis, 10, 95
Sparse grids
  inductance extraction, 12
Spatial correlation, 8, 23
  capacitance extraction, 169
  power grid, 111
  statistical leakage analysis, 46, 57, 67
  total power analysis, 95
  yield optimization, 275
Spatial correlations
  leakage analysis, 10
Spectral-stochastic-based MOR
  power grid, 127
Spectral stochastic method
  leakage analysis, 10
  mismatch, 240
  power grid, 108
  statistical leakage analysis, 40
  total power analysis, 97
  yield, 257
SPICE
  dynamic power analysis, 86
  mismatch, 240
  total power analysis, 95

SSCM
  capacitance extraction, 175
Standard deviation, 17–18
  dynamic power analysis, 90
  mismatch, 241
  statistical leakage analysis, 39, 58
  total power analysis, 100
StatCap, 12
  statistical capacitance extraction, 166
State-space
  power grid, 146
StatHenry, 12, 212
Statistical leakage analysis, 10
Statistical variation, 7
Statistical yield, 12
STEP
  statistical chip-level total power estimation, 95
Stochastic current source
  yield, 257
Stochastic differential-algebra-equation, 13
  mismatch, 235
Stochastic geometrical moments, 183
Stochastic sensitivity, 261
StoEKS, 11
  stochastic Krylov subspace method, 127
Subthreshold leakage, 39
  power grid, 107
  statistical leakage analysis, 41
Supply voltage, 263
Supply voltage adjustment
  yield optimization, 273
SVD
  mismatch, 245
  singular-value-decomposition, 239
Switching segment, 89
Symbolic analysis, 13
  performance bound analysis, 223
Symbolic cancellation
  performance bound analysis, 223

T
Taylor expansion, 118
  capacitance extraction, 166
  mismatch, 240
TBR
  truncated balanced realization, 146
Tensor product
  capacitance extraction, 171
Threshold voltage
  power grid, 107
  statistical leakage analysis, 41
Timing constraint, 276
Total power, 10, 93
TPWL
  mismatch, 246
  trajectory-piecewise-linear, 236
Trajectory-piecewise-linear macromodeling, 13
Truncating
  TBR, 146
Transition waveform
  dynamic power analysis, 86

U
Uniform binning scheme, 277
Upward pass, 185

V
Valid voltage segments
  yield, 276
VarETBR
  variational TBR, 11
Variance, 17–18
  inductance extraction, 211
  mismatch, 241
  statistical leakage analysis, 46, 59
  yield, 261
Variation-aware design
  inductance extraction, 209
Variation
  capacitance extraction, 167
  yield, 257
Variational current source
  power grid, 128
Variational response Gramian, 151
Variational transfer function
  performance bound analysis, 226
VarPMTBR
  variational Poor man’s TBR, 145
Virtual grid
  dynamic power analysis, 10, 87
  statistical leakage analysis, 67
Virtual variables, 10
Voltage binning method, 13
  yield optimization, 273
Voltage binning scheme
  yield, 275

W
Wafer-level variation, 7
Wire thickness
  power grid, 111

Wire width
  power grid, 111
Worst case (corner)
  mismatch, 235
  performance bound analysis, 221
  power grid, 111
  statistical leakage analysis, 39
  yield, 260
WPFA
  weighted PFA, 26

Y
Yield estimation, 253
Yield optimization, 253
Yield sensitivity, 253
