Superposition Principle Applied to Thermal Analysis for 3DICs


Hadrien Clarke1 and Kazuaki Murakami1,2

1 Institute of Systems, Information Technologies and Nanotechnologies, Japan

2 Kyushu University, Japan

Abstract. Temperature is an important reliability factor in 3DICs. Transient thermal analysis is known to be time-consuming for fine-grained models. However, only some parts of an IC are likely to become hotspots. This paper presents a method that computes the temperatures of selected parts instead of the whole chip. In the first stage, the 3DIC is characterized by precomputing its thermal responses to power impulses with any given heat-equation solving technique. Then, the thermal profiles of only the relevant parts of the 3DIC are computed. Our experiments show that, excluding precomputation time, the computation time for designated parts can be reduced without affecting the accuracy of the underlying heat-equation solving technique. Moreover, because the computations are independent, parallelization is straightforward, which allows the method to exploit modern multi-core architectures.

Keywords: Thermal Analysis, 3DIC, Parallelization

1. Introduction

The reliability of integrated circuits (ICs) depends strongly on their operating temperature [1]. As technology evolves from planar to stacked-die integration, ICs will have higher power densities and will therefore face higher temperatures [3]. It is thus important to have tools that predict the thermal profiles an IC will encounter and ensure that they stay within the operating range. However, because generic approaches such as the finite difference method (FDM) or the finite element method (FEM) require considerable computation time to obtain the thermal profiles of an IC, simplifications are often made.

Several models have been used for thermal analysis. HotSpot [2] and ISAC [4] are tools that compute the dynamic temperature of chips. HotSpot uses a thermal-electrical analogy in which the system is turned into an RC circuit, whereas ISAC uses dynamic spatial and temporal granularity adaptation to eliminate unnecessary cells and time steps.

Another approach is to solve the heat equation analytically using the Green's function method and FFTs, as in [5] and [6]. However, this approach is limited to steady-state studies.

When a thermal profile is computed, it usually has to be computed for every cell of the system. Yet only some parts of an IC (mostly located on the active layer) dissipate power, and the whole chip responds thermally to them. Furthermore, only a few parts are likely to become hotspots (e.g. register files or reorder buffers in CPUs [8], [9]). In this paper, we propose a method that characterizes the thermal response of the whole chip to each power-producing part. As in [7], we take advantage of the superposition principle. The thermal response of only the designated parts can then be computed, for any given power profile, from data obtained in a precomputation stage. The method in this work is based on the FDM but would work with any other solving technique. We call this method Spata.

2. Superposition Principle Applied to Thermal Analysis

2.1. Heat Equation and Superposition Principle


2011 International Conference on Circuits, System and Simulation IPCSIT vol.7 (2011) © (2011) IACSIT Press, Singapore

The thermal profile of a chip is governed by the heat equation:

$$\rho\, c_p \frac{\partial T(x,y,z,t)}{\partial t} = k\, \Delta T(x,y,z,t) + p(x,y,z,t)$$

where $\rho$ is the density of the material (in kg·m⁻³), $c_p$ is its specific heat capacity (in J·kg⁻¹·K⁻¹), $T$ is the temperature (in K), $k$ is the thermal conductivity (in W·m⁻¹·K⁻¹), $p$ is the volumetric power density (in W·m⁻³) and $\Delta$ is the Laplacian operator.

Fig. 1: Method Flow.


Fig. 2: (a) A 3DIC divided into c cells with two layers containing n power cells. (b) Transfer functions linking the thermal response of cell j to the power dissipated by cells 1, 2,…, n.

Since the system representing the chip is linear, the superposition principle applies: if $F$ is a linear operator and $F\{x_i(t)\} = y_i(t)$ for all $i \in \{1, 2, \ldots, n\}$, then

$$F\left\{\sum_{i=1}^{n} \alpha_i x_i(t)\right\} = \sum_{i=1}^{n} \alpha_i y_i(t), \quad \forall \alpha_i \in \mathbb{R}$$

Additionally, the system is time-invariant (see footnote 1); hence, according to linear time-invariant (LTI) system theory:

$$F\{x_i(t)\} = y_i(t) = x_i(t) \ast h_i(t)$$

where $h_i(t) = F\{\delta_i(t)\}$ is the response of the system represented by $F$ to an impulse, and $\ast$ denotes convolution. This means that knowing $h_i(t)$ is enough to compute the response to any given input.

The two previous equations give:

1 If an input signal x(t) produces an output y(t), then any time-shifted input x(t - τ) produces the correspondingly time-shifted output y(t - τ).


$$F\left\{\sum_{i=1}^{n} \alpha_i x_i(t)\right\} = \sum_{i=1}^{n} \alpha_i \, x_i(t) \ast h_i(t)$$

2.2. Model and Method

A 3D chip is now considered. It is modeled as a cube split into $c$ cells, among which $n$ cells, called power cells, may dissipate power, as shown in Figure 2a. Typically, each plane of the cube corresponds to a layer of a die in the 3DIC, but the granularity is at the user's discretion. In our case, $x_i(t)$ is the power dissipated by power cell $i$, noted $p_i(t)$; $h_i(t)$ is the transfer function linking the power dissipated at cell $i$ to the temperature of cell $j$, noted $h_{j,i}(t)$; and $y_i(t)$ is the contribution of $x_i(t)$ to the temperature rise of cell $j$ above the ambient temperature $T_a$, noted $\theta_{j,i}(t)$. Thus, as depicted in Figure 2b, the temperature rise $\theta_j(t) = T_j(t) - T_a$ of cell $j$ is:

$$\theta_j(t) = F\left\{\sum_{i=1}^{n} p_i(t)\right\} = \sum_{i=1}^{n} p_i(t) \ast h_{j,i}(t) = \sum_{i=1}^{n} \theta_{j,i}(t)$$

The method flow is shown in Figure 1. Since the whole system is characterized by the transfer functions $h_{j,i}(t)$, once they are all known, the dynamic temperature of any cell can be computed with the previous equation for any power profiles $p_i(t)$ of the power cells. The transfer functions are obtained by observing the response of all $c$ cells to an impulse $p_i(t) = \delta(t)$ applied to each power cell, one at a time. Moreover, convolutions are commonly computed with FFTs, which reduces the cost of a length-$t$ convolution from $O(t^2)$ to $O(t \log t)$. Consequently, the FFTs $H_{j,i}(s)$ of the transfer functions are computed as well. As long as the system remains unchanged, that is, as long as the material characteristics and the size of the chip are not altered, these steps need to be executed only once.
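The characterization step can be sketched as follows, assuming a hypothetical discrete-time linear model `theta[k+1] = M @ theta[k] + B @ p[k]` in place of a real heat-equation solver (any linear, time-invariant solver would play the same role). One unit impulse is applied per power cell, the response of all c cells is recorded, and the spectra are computed once. The zero padding to `2 * t` samples is our own choice: it makes later spectrum products equal to linear rather than circular convolutions.

```python
import numpy as np

# Hypothetical stand-in for a heat-equation solver: a discrete-time linear
# state-space model theta[k+1] = M @ theta[k] + B @ p[k], where theta holds the
# temperature rise of the c cells and p the power of the n power cells.
rng = np.random.default_rng(1)
c, n, t = 8, 3, 64
M = 0.6 * np.eye(c) + 0.03 * rng.random((c, c))  # row sums < 1, hence stable
B = rng.random((c, n))


def solve(power):
    """Simulate the model for a power array of shape (n, t); return (c, t)."""
    theta = np.zeros((c, power.shape[1]))
    for k in range(power.shape[1] - 1):
        theta[:, k + 1] = M @ theta[:, k] + B @ power[:, k]
    return theta


def characterize():
    """Apply a unit impulse to one power cell at a time; return h[j, i, k]."""
    h = np.zeros((c, n, t))
    for i in range(n):
        impulse = np.zeros((n, t))
        impulse[i, 0] = 1.0
        h[:, i, :] = solve(impulse)
    return h


h = characterize()
# Zero-pad to at least 2t - 1 samples so that products of spectra correspond
# to linear (not circular) convolutions over the first t samples.
n_fft = 2 * t
H = np.fft.rfft(h, n=n_fft, axis=-1)  # H[j, i, :], computed once per chip model
```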

Finally, some cells may share the same power profile. For instance, the cells describing a unit of a floorplan share the same power profile $p_X(t)$ (Figure 3a). Therefore, for each cell $j$, the transfer functions of the cells of unit $X$ can be added, yielding a transfer function $H_{j,X}(s)$ that represents the global effect of that unit on cell $j$ (Figure 3b).
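A minimal sketch of the unit grouping follows, with random placeholder spectra and a hypothetical floorplan dictionary `units` (unit name to power-cell indices). It also checks the identity that justifies the grouping: multiplying a shared power spectrum by each cell's transfer function and summing gives the same result as multiplying it once by the summed transfer function.

```python
import numpy as np

# Placeholder spectra H[j, i, :] for c cells and n power cells (random values;
# in practice they come from the characterization stage).
rng = np.random.default_rng(2)
c, n, n_freq = 6, 5, 33
H = rng.random((c, n, n_freq)) + 1j * rng.random((c, n, n_freq))

# Hypothetical floorplan: unit name -> indices of the power cells it contains.
units = {"B": [0, 1, 2], "C": [3, 4]}


def group_by_unit(H, units):
    """Return H_unit[name] = sum of H[:, i, :] over the cells i of that unit."""
    return {name: H[:, cells, :].sum(axis=1) for name, cells in units.items()}


H_unit = group_by_unit(H, units)

# Every cell of a unit shares the unit power spectrum P_X, so the per-cell sum
# of products equals a single product with the grouped transfer function.
P_B = rng.random(n_freq)
per_cell = sum(P_B * H[:, i, :] for i in units["B"])
print(np.allclose(per_cell, P_B * H_unit["B"]))  # True
```

Grouping reduces the number of spectrum products per target cell from the number of power cells to the number of units.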

Once the unit power profiles $p_X(t)$ are provided, their FFTs $P_X(s)$ are computed. Since convolution in the time domain becomes multiplication in the frequency domain, and since the Fourier transform is linear, the previous equation becomes:

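$$\Theta_j(s) = \sum_{X} \Theta_{j,X}(s) = \sum_{X} P_X(s)\, H_{j,X}(s)$$

where the sums run over the power units $X$, $\Theta_{j,X}(s)$ is the contribution of unit $X$, and $\Theta_j(s)$ is the FFT of $\theta_j(t)$. Finally, an inverse FFT of $\Theta_j(s)$ gives the dynamic temperature of any desired cell $j$.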


Fig. 3: (a) Cells within the same unit have the same power profile. (b) The transfer functions of cells representing the same unit can be added since these cells share the same power profile.

3. Experiments

In this section, Spata is illustrated using Matlab. To compute the transfer functions, an explicit FDM (Forward-Time Central-Space, FTCS) is used. This technique is computationally inexpensive. Its downside is limited accuracy: its truncation error is first order in the time step and second order in the space step. In addition, the scheme is only conditionally stable, so the solution can diverge if these steps are not chosen with care. A more computationally intensive technique, such as an implicit scheme centered in time and space (e.g. Crank-Nicolson), would provide better accuracy and be numerically stable for any time step. However, the FTCS scheme is sufficient for a proof of concept. Furthermore, to validate the FDM implementation, tests were run and the results were compared against those of a commercial FEM package.
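The stability issue mentioned above can be illustrated in 1D with a minimal sketch (hypothetical grid, boundary values and initial hot spot). For FTCS applied to $\partial T/\partial t = \alpha\, \partial^2 T/\partial x^2$, with $\alpha = k/(\rho c_p)$ the thermal diffusivity, the ratio $r = \alpha \Delta t / \Delta x^2$ must satisfy $r \le 1/2$; in 3D with equal spacing the limit becomes $r \le 1/6$. Below, $r = 0.4$ stays bounded while $r = 0.6$ diverges.

```python
import numpy as np


def ftcs_1d(r, n_cells=50, n_steps=200):
    """Explicit FTCS for dT/dt = alpha * d2T/dx2 in 1D with both end values
    held at zero. r = alpha*dt/dx**2; the scheme is stable only for r <= 1/2."""
    T = np.zeros(n_cells)
    T[n_cells // 2] = 1.0  # initial hot spot
    for _ in range(n_steps):
        # The right-hand side is evaluated from the old values before assignment.
        T[1:-1] = T[1:-1] + r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
    return T


stable = ftcs_1d(r=0.4)
unstable = ftcs_1d(r=0.6)
print(np.abs(stable).max() <= 1.0, np.abs(unstable).max() > 1e6)  # True True
```

With $r \le 1/2$ each update is a convex combination of neighboring values, so the peak cannot grow; above that limit, high-frequency errors are amplified at every step.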

In this experiment, a 1 cm × 1 cm × 1 cm cube split into 6³ = 216 cells, with two layers containing power cells, is considered, as depicted in Figure 2a. The thermal conductivity is set to 150 W·m⁻¹·K⁻¹, the density to 2330 kg·m⁻³ and the specific heat capacity to 705 J·kg⁻¹·K⁻¹; these values were chosen to reflect the properties of silicon. To model a cooling system, the bottom face of the cube is assumed to remove heat by convection [10]:

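$$k \left.\frac{\partial \theta(x,y,z,t)}{\partial z}\right|_{z=z_{bottom}} = \beta \times \theta(x,y,z_{bottom},t)$$

where $z$ points away from the bottom face, $\theta = T - T_a$ is the temperature rise above ambient, and $\beta$ is a heat transfer coefficient set to 20000 W·m⁻²·K⁻¹. The other faces of the cube are assumed adiabatic.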

The characterization precomputation for this configuration requires 2 × 6² = 72 FDM runs and 2 × 6² × 6³ = 15,552 FFTs. With 25,000 time steps of 10⁻³ s for each FDM run, this stage takes 265.4 s on our test machine (see footnote 2). Approximately a quarter of this time is spent on FFTs. By exploiting the symmetries of the cube, the precomputation time could be reduced to 49.8 s.
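The run and FFT counts follow directly from the grid dimensions. A short check, assuming a 6 × 6 × 6 grid whose two power layers each hold 6 × 6 power cells:

```python
# Arithmetic behind the precomputation figures (6 x 6 x 6 grid assumed,
# two power layers of 6 x 6 cells each).
cells_per_edge = 6
c = cells_per_edge ** 3              # total cells
n = 2 * cells_per_edge ** 2          # power cells: one impulse FDM run each
n_fft = n * c                        # one FFT per (power cell, observed cell) pair
simulated_time = 25000 * 1e-3        # seconds simulated per FDM run
print(c, n, n_fft, simulated_time)   # 216 72 15552 25.0
```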

Two units that actually dissipate power (units B and C in Figure 3a) are now defined with arbitrary power profiles (shown in Figure 4a), and the thermal response of the penultimate layer is obtained both with a direct FDM and with Spata. The FDM requires 2.9 s, while Spata takes 0.77 s. Whereas the FDM computation time is the same whichever cells are analyzed, the Spata computation time depends linearly on the number of analyzed cells (see Section 4). Computing the thermal profile of a single cell takes 14.2 ms.

Figure 4b shows the thermal profile of the blue cell in Figure 4a. The accuracy of the thermal profiles essentially depends on the solving technique used in the characterization stage. In this experiment, the difference between the results of Spata and those of a direct FDM was less than 10⁻¹³ K.


Fig. 4: (a) Arbitrary power profiles of units B and C. (b) Thermal profile of a designated cell.

4. Discussion

2 AMD Opteron 290 (2.8 GHz), 8 GB of RAM


The results in Section 3 show that the computation time of Spata depends on the number of power units u and on the number of target cells q, whereas the computation time of an FDM is constant for a given number of time steps t and of cells c in the cube. Indeed, the complexity of the FDM is O(c t), while summing convolutions computed with FFTs has a complexity of O(q u t log t).
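The break-even condition can be expressed as a small cost model. This sketch only counts dominant terms (constant factors are ignored, and the base-2 logarithm is an arbitrary choice), so measured times can differ from what it suggests; the function names are hypothetical.

```python
import math


def fdm_cost(c, t):
    """Operation-count model of a direct explicit FDM: O(c t)."""
    return c * t


def spata_cost(q, u, t):
    """Operation-count model of the Spata analysis stage: O(q u t log t)
    for q target cells and u power units (FFT-based convolutions)."""
    return q * u * t * math.log2(t)


def spata_is_cheaper(q, u, c, t):
    """Compare both models; the factor t cancels, leaving q*u*log2(t) < c."""
    return spata_cost(q, u, t) < fdm_cost(c, t)


print(spata_is_cheaper(q=1, u=2, c=216, t=25000))    # True
print(spata_is_cheaper(q=200, u=2, c=216, t=25000))  # False
```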

As a result, the computation time of Spata may exceed that of a direct FDM in some situations (q u log t > c). However, more accurate solving techniques are usually more computationally intensive than the simple FDM used in this study, which raises the cost of a direct solution but not that of the Spata analysis stage. Also, only parts of 3DICs are usually relevant for thermal analysis (e.g. power-dissipating layers), which favors our approach. Moreover, the computations of the thermal profiles of the cells are independent of each other. Parallelization is therefore straightforward, and the method can easily benefit from a multithreaded implementation on modern multi-core or many-core architectures such as GPUs.
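Because each target cell only needs the unit spectra and the power spectra, the per-cell computations can be distributed without coordination. The sketch below uses random placeholder data and a thread pool from Python's standard library purely for brevity; for CPU-bound work in CPython, a process pool, native threads or a GPU implementation may be needed for real speedups. The target-cell list is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Placeholder precomputed data: per-unit transfer functions of shape (c, t)
# and unit power profiles of length t, both zero-padded to n_fft for the FFTs.
rng = np.random.default_rng(4)
c, t = 12, 64
n_fft = 2 * t
h_unit = {X: rng.random((c, t)) for X in ("B", "C")}
H_unit = {X: np.fft.rfft(h, n=n_fft, axis=-1) for X, h in h_unit.items()}
power = {X: rng.random(t) for X in ("B", "C")}
P = {X: np.fft.rfft(p, n=n_fft) for X, p in power.items()}


def cell_profile(j):
    """Temperature rise of cell j; depends on no other cell's result."""
    theta_s = sum(P[X] * H_unit[X][j] for X in P)
    return np.fft.irfft(theta_s, n=n_fft)[:t]


targets = [0, 3, 7, 11]  # hypothetical target cells
with ThreadPoolExecutor(max_workers=4) as pool:
    profiles = dict(zip(targets, pool.map(cell_profile, targets)))
```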

For the precomputation stage, the complexity follows that of the underlying solving technique. In the experiment of Section 3, since the complexity of the FDM is O(c t), the impulse simulations cost O(n c t), with n the number of power cells, and the subsequent n c FFTs add O(n c t log t). Although this phase is time-consuming, it needs to be executed only once per chip model, and it is parallelizable as well since the impulse simulations are independent of each other.

5. Conclusions and Future Work

In this paper, Spata, a method for the thermal analysis of 3DICs, has been presented. This method is based on the superposition principle and consists of two stages. First, the 3DIC to be analyzed is characterized using any heat-equation solving technique. For a given chip model, this precomputation stage is executed only once. Then, in an analysis stage, the thermal profiles of designated parts of the chip are computed from the data of the previous stage by taking advantage of the superposition principle. This stage is executed whenever a new power profile is provided.

Spata has so far been illustrated only with a simple proof-of-concept example, and several improvements are possible. Future work may include the use of a better solving technique to increase the accuracy of the analyses, and the study of real test cases to obtain practically exploitable results. Lastly, as most computations in the method are independent, Spata can easily be accelerated with a multithreaded implementation.

6. Acknowledgment

The authors would like to thank Aran Raoufi from the University of Tehran for his kind assistance with this work.

7. References

[1] Failure Mechanisms and Models for Semiconductor Devices. JEDEC publication JEP122E. http://www.jedec.org.

[2] Huang, W., Sankaranarayanan, K., Skadron, K., Ribando, R., Stan, M.: Accurate, Pre-RTL Temperature-Aware Design Using a Parameterized, Geometric Thermal Model. In: Design, Automation, and Test in Europe, 2007.

[3] Puttaswamy, K., Loh, G.: Thermal Analysis of a 3D Die-Stacked High-Performance Microprocessor. In: Great Lakes Symposium on VLSI, 2006.

[4] Yang, Y., Gu, Z., Zhu, C., Dick, R., Shang, L.: ISAC: Integrated Space and Time Adaptive Chip-Package Thermal Analysis. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2007.

[5] Zhan, Y., Sapatnekar, S.: A High Efficiency Full-Chip Thermal Simulation Algorithm. In: International Conference on Computer Aided Design, 2005.

[6] Oh, D., Chen, C., Hu, Y.: 3DFFT: Thermal Analysis of Non-Homogenous IC using 3D FFT Green Function Method. In: International Symposium on Quality Electronic Design, 2007.

[7] Michaud, P., Sazeides, Y.: ATMI: Analytical Model of Temperature in Microprocessors. In: Workshop on Modeling, Benchmarking and Simulation, 2007.

[8] Kim, J., Jhang, S., Jhon, C.: Dynamic register-renaming scheme for reducing power-density and temperature. In: Symposium on Applied Computing, 2010.


[9] Black, B., Annavaram, M., Brekelbaum, N., DeVale, J., Jiang, L., Loh, G., McCauley, D., Morrow, P., Nelson, D., Pantuso, D., Reed, P., Rupley, J., Shankar, S., Shen, S., Webb, C.: Die Stacking (3D) Microarchitecture. In: Symposium on Microarchitecture, 2006.

[10] Cheng, Y-K., Raha, P., Teng, C-C., Rosenbaum, E., Kang, S-M.: ILLIADS-T: An Electrothermal Timing Simulator for Temperature-Sensitive Reliability Diagnosis of CMOS VLSI Chips. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1998.
