IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, …

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, SPECIAL ISSUE, VOL. 6, NO. 1, DECEMBER 2013 1

A Combined Design-Time/Test-Time Study ofthe Vulnerability of Sub-Threshold Devices to

Low Voltage Fault AttacksAlessandro Barenghi, Cedric Hocquet, David Bol, Francois-Xavier Standaert, Francesco

Regazzoni Member, IEEE , and Israel Koren Fellow, IEEE

Abstract—The continuous scaling of VLSI technology and the possibility to run circuits in subthreshold voltage rangemake it possible to implement standard cryptographic primitives within the very limited circuit and power budget of RFIDdevices. However, such cryptographic implementations raise concerns regarding their vulnerability to both active andpassive side-channel attacks. In particular, when focusing on RFID targeted designs, it is important to evaluate theirresistance against low cost physical attacks. A low cost fault injection attack can be mounted, for example, by loweringthe supply voltage of the chip with the goal of causing setup time violations.In this paper, we provide an in-depth characterization of a chip implementation of the AES cipher. The chip has beendesigned using a 65nm low power standard cell library and operates in a subthreshold voltage range. We first show thatit is possible to inject faults (through lowering the supply voltage) compliant with the fault models required to performattacks against the AES cipher. We then investigate the possibility of predicting, at design time, which parts of thechip are more likely to be sensitive to such fault injection attacks and produce the desirable (from the point of viewof the attacker) faulty behavior. Identifying such sensitive logic signals allows us to suggest to the designer a tailoredcountermeasure strategy for thwarting these attacks, with a minimal impact on the circuit’s performance.

Index Terms—Setup time violation, Fault attacks, AES, Design Simulation

F

1 INTRODUCTION

Radio Frequency Identification (RFID) devices arenowadays used in a wide range of applications, suchas health care, supply chain management, and pet iden-tification [1]. Such a pervasive diffusion raises concernsregarding the privacy of the users, as RFID tags oftenstore sensitive information. The design of RFID devices

• A. Barenghi is with the Department of Electronics, Informationand Biotechnology, Politecnico di Milano, Milano, Italy.

• C. Hocquet is with National Instruments, Zaventem, Belgium.

• D. Bol is with ICTEAM institute, Universite catholique deLouvain, Louvain-la-Neuve, Belgium

• F. X. Standaert is with ICTEAM institute, Universite catholiquede Louvain, Louvain-la-Neuve, Belgium.

• F. Regazzoni is with Delft University of Technology, Delft,The Netherlands, and ALaRI - University of Lugano, Lugano,Switzerland.

• I. Koren is with University of Massachusetts at Amherst, Mas-sachusetts, USA.

is commonly constrained by a very strict power andarea budget. Thus, incorporating the circuitry needed toguarantee a sound security margin against attackers is achallenging task, as security primitives, if not properlyimplemented, are quite demanding in terms of area andpower.

A particularly appealing solution to meet the powerconsumption constraints is to exploit nanometer CMOStechnologies, while adopting known aggressive powersaving techniques, and operating the device at a sub-threshold voltage [2]. Typical supply voltages employedin this context range between 0.3V and 0.5V, signifi-cantly lower than the common ones needed to work inthe saturation region (1-1.2V).

Nanometer CMOS technologies also allow to meetthe area constraints when implementing standard crypto-graphic algorithms such as AES. As the manufacturingprocesses for nanometer CMOS technologies becomemore widespread, a larger number of commercial ap-plications will be able to afford the cost of using RFIDtags [3].

Using low power cell libraries and operating thedevice at a subthreshold voltage, result in a significant


reduction of the power consumption but at the cost of aconsiderably lower clock frequency at which the devicewill operate. This, however, is acceptable since, even ifspeed in RFIDs is an important design parameter, it isnot commonly a critical one.

A key concern for every secure cryptographic imple-mentation is its vulnerability to both active and passiveside-channel attacks. When designing RFIDs, it is partic-ularly important to evaluate their resistance against lowcost physical attacks. A well-known such attack is a faultinjection carried out by simply decreasing the supplyvoltage. It has been shown that cipher implementationsmay experience functional failures if the Vdd is reducedbelow the reference supply voltage.

In [4] we demonstrated the practical feasibility ofthese attacks against nanometer CMOS devices workingat a subthreshold voltage. The considered AES co-processor design has a data path of 8 bits and is imple-mented using a 65nm low power library [5]. In particular,it was shown that it is possible to generate timing faults(violations of the flip-flops’ setup time) localized withina single byte of the state of the AES cipher. In the samepaper, we also showed that it is possible to inject suchfaults even if the effects of process variations may shiftthe exact voltage level to which the power supply of thedevice should be lowered.

In this paper we address an important question in thedesign of secure, low-power, low-area cryptographic im-plementations: can the locations of potential setup-timeviolations (resulting from lowering the supply voltage)be predicted at design time, through industry standardEDA toolchains? Such predictions can allow the designerto assess the vulnerability of the circuit to fault injectionattacks and modify the circuit design accordingly.

We tackle this issue through extending the fault lo-cation and timing characterization of [4] down to thesingle bit level, for all the faults which can be usedfor attack purposes. By exploiting the detailed results ofthis analysis we investigate the feasibility of gatheringinformation regarding the vulnerability of the chip atdesign time. In particular, we consider the feasibility ofreproducing the faults, which we measured experimen-tally, using simulations that can be performed prior tomanufacturing. In addition, we investigate the usefulnessof static timing analysis to predict which state bits ofthe cipher are more likely to be affected by setup timeviolation induced faults.

The remainder of the paper is organized as follows.In Section 2 we provide background information on theAES cipher, briefly discuss several previously proposedfault attacks on AES and describe the one consideredin our study. We then present the architecture of our

AES design in Section 3. Section 4 describes the ex-perimental setup that we employed to collect the faultmeasurements on the chip samples and the bit-levelcharacterization of the faults. Finally, Section 5 reportsour results concerning the feasibility of predicting thesetup time violation faults at design time by employingan industry grade EDA toolchain, and discusses anefficient countermeasure which can be applied at designtime. Section 6 presents our conclusions and points outfuture research directions.

2 BACKGROUND

In this section we provide a brief overview of thestandard block cipher AES that is used as a case studyin our experiments and simulations. We then review theknown fault injection attacks against AES, focusing onthose that can be mounted with a limited budget. Thissection also includes a detailed description of the basicdifferential fault attack that we have mounted in ourexperiments. We stress the importance of investigatingfaults induced through setup time violations caused byunderfeeding the cipher implementation, as they are,non-destructive and easy to perform, thus representinga concrete threat to secure digital devices.

2.1 The AES Cipher

The cipher considered in this work is the AdvancedEncryption Standard [6], due to its wide spread adoptionand the substantial cryptanalytic scrutiny it has under-gone in the last 12 years. The selected variants of theRijndael [7] algorithm that the AES standard supportsinclude a plaintext block size of 128 bits and three keysizes of 128, 192 and 256 bits.

The AES cipher is based on the iteration of around function composed of four primitives: SUB-BYTES, SHIFTROWS, MIXCOLUMNS and ADDROUND-KEY. The number of times the round function is iterated,Nr, is 10, 12 or 14 times depending on the length ofthe key employed. The only exception to the repetitionof the four primitives is the fact that the last round ofthe encryption is missing the MIXCOLUMNS primitive.Moreover, an extra ADDROUNDKEY is performed be-fore the first round as a pre-whitening of the input state.

The inner state of the AES cipher after round r,denoted by Sr, can be represented as a 4× 4 matrix,where each element is 8-bit wide. We denote the n-thbyte, counting from left to right, from top to bottom asSr

nEach primitive of the AES cipher contributes either

confusion or diffusion effects to the cipher, or adds


a dependency on the value of the key. The SUB-BYTES primitive is a non-linear mapping over Z28 thatintroduces a non-linear confusion effect. This mappingis applied to a single byte at a time, Sr

n, and can beimplemented either as a lookup table or computed onthe fly. The SHIFTROWS primitive provides a row-wisediffusion effect to the inner state of the AES cipher. Itrotates the four rows of the state Sr by 0,1,2 or 3 bytepositions, respectively, while the values of the rotatedbytes remain unaltered. The MIXCOLUMNS primitiveprovides column-wise diffusion of the cipher state byconsidering the column as a vector of values over Z28 ,and multiplying the vector by a constant matrix. Thisoperation linearly combines in an invertible way the con-tents of the four bytes of a column. The last operation,the ADDROUNDKEY primitive, combines the state ofthe AES cipher with a 4×4 key matrix through bitwiseexclusive or (xor).

Since the ADDROUNDKEY primitive is repeated Nr +1 times, there is a need to expand the initial key providedby the user into Nr+1 round keys through a key scheduleroutine. The key schedule process is non-destructive, i.e.,all the operations performed are bijective. As a result, ifa person is in possession of 4, 6 or 8 contiguous 32-bitwords of the key schedule, he is able to reconstruct thefull 128, 192 or 256 bit secret key.

2.2 Active side-channel attacks against AES

A number of fault injection attacks on the AES cipherhave been reported in the literature. Although some ofthem were not experimentally validated at the time theywere presented [8], [9], [10], [11], several other weresuccessfully mounted on real world implementations.Among the ones viable with low cost equipment is thework of Hutter et al. [12] where the authors were able tomount a successful attack by causing temporary brownouts and glitches on the power supply line of an 8-bitmicrocontroller running a software implementation ofAES. In [13], Schmidt et al. attacked an implementationof AES by blanking selectively the memory where theSBoxes are held, effectively reducing the entire AESalgorithm to the last ADDROUNDKEY, performed on azero-filled state. A technique which has proven effectivein inducing controlled faults is through causing setuptime violations by lowering the supply voltage belowthe level the circuit was designed for. In [14], Selmaneet al. report the effects of attacking a commercial gradeASIC implementation of AES in a smart card, usingthis fault induction technique, while in [15] the authorssuccessfully applied the technique to a full ARM9 corerunning a software implementation of AES. We refer

the interested reader to a comprehensive survey on faultattacks and countermeasures in [16]. For the sake ofcompleteness, we also recall that passive side-channelanalysis techniques have been successful in attackingreal world implementations of RFID chips, as is reportedin [17].

2.3 The attack under consideration

We now present the attack methodology considered inthis paper to determine which faults represent a threatto the security of the AES implementation. Dusart etal. [18] have shown that it is possible to successfullyretrieve the whole secret key of an AES-128 cipher,through the injection of a single byte-wide fault duringthe regular functioning of the cipher. The proposed attackrelies on the injection of a single byte fault betweenthe MIXCOLUMNS operation of the eighth round andthe MIXCOLUMNS of the ninth round, as depicted inFigure 1. Due to the lack of the MIXCOLUMNS operationduring the tenth round, the effect of the fault is spreadonly over 4 of the 16 bytes of the state. Since the keyaddition is performed byte-wise, the values of these 4bytes are influenced only by 4 bytes of the last round key.Exploiting this fact and assuming that the injected faulthas corrupted only one byte, the attacker may proceed torecover the 4 bytes of the key by comparing the correctand faulty ciphertexts.

To this end, the attacker makes an hypothesis on theunknown part of the key and proceeds to invert the effectof the last ADDROUNDKEY on the part of the cipherthat was affected by the fault (greyed out in the figure)obtaining four values belonging to the state marked asI in Figure 1. This operation is performed on boththe erroneous and the correct values of the ciphertext,yielding two groups of 4-byte values. Subsequently, theattacker inverts the effect of both the SHIFTROWS andSUBBYTES primitives, since their effect is fully known,obtaining successfully a faulty and a correct hypotheticalvalues for four bytes of the state S9, denoted respec-tively by w = {S9

0, S91, S

92, S

93} and w = {S9

0,S91,S

92,S

93} .

Bypassing the effect of the ADDROUNDKEY operation,to further roll back the cipher, the attacker computesthe exclusive or of w and w. Doing so, the effect ofthe ADDROUNDKEY function is canceled since in com-puting δ = w⊕ w the key values are added twice. Thisallows the attacker to effectively compute the differencebetween the correct and the erroneous state of the cipherright before the MIXCOLUMNS operation. Since theMIXCOLUMNS operation is linear with respect to theexclusive or, it is possible to map the 4-byte differenceδ into the difference before the operation by multiplying


S0 4 8 12

1 5 9 13

2 6 10 14

3 7 11 15

S9

0 4 8 12

1 5 9 13

2 6 10 14

3 7 11 15

MixColumnShiftRows

ShiftRows AddRoundKey

AddRoundKey

SubBytes

Round 9

Round 10

10

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9

S9 S

9

S10

S10

S10

S10

S10

S10

S10

S10

S10

S10

S10 S

10

S10

S10

S10

S10

IS9

Fig. 1. Effects of the propagation of a single fault injected between the MIXCOLUMNS operations of theeighth and ninth rounds. The fault propagates to only a quarter of the cipher state, allowing the attacker todetect such a situation.

the value by the inverse of the matrix employed in theregular MIXCOLUMNS. At this point, the attacker maycheck if the obtained difference is actually composedof a single byte, as the fault model required by theattack mandates. Depending on whether the differencematches the fault model or not, the attacker decides ifthe key hypothesis made at the beginning of this rollbackprocedure is a valid one or not.

The attacker iterates the same difference analysisprocedure for all the possible 232 values of the fourunknown bytes of the key and stores only the ones whichproduce a single byte difference before the last MIX-COLUMNS operation. A single pass of this procedureyields roughly a thousand valid candidates for the 32-bitwide key slice, and may be repeated if more than onefaulty ciphertext caused by a single byte fault is availableto the attacker. With a second sweep of the procedurethe number of key candidates is reduced to one witha reasonably high probability [19]. Since it is possiblefor the attacker to discern, looking at the bytes that areaffected by the faults, which slice of the key is the oneunder consideration, he can reconstruct the whole lastround key with 4 injected faults and a brute force effortof 10004 ∼ 240 AES encryptions, which takes about acouple of minutes on a modern desktop computer, orwith 8 injected faults and no brute force effort at all.

Further reduction of the number of faults needed forthe attack can be achieved by injecting a single fault be-tween the MIXCOLUMNS operation of the seventh roundand the MIXCOLUMNS of the eighth round (instead offour single-byte faults injected simultaneously on each

column of the state before the last MIXCOLUMNS),thanks to the diffusing effect of the SHIFTROWS op-eration of the ninth round. This way, it is possible toretrieve the complete last round key of the cipher witha single fault and a modest brute force effort or twofaults and no brute force effort. The main drawbackof this method is that it is not possible to determinewhether a faulty ciphertext has been the result of a faultcomplying with the required timing hypothesis, since theentire ciphertext value is altered. Nonetheless, if a faultwhich does not match the fault hypothesis is employedin the key recovery procedure, the number of valid keycandidates drops to zero during the first iteration of theprocedure, thus allowing the attacker to be aware of theissue.

After performing the aforementioned procedure, theattacker has all the bits of the last round key and is thusable to reconstruct the original key, in case a key sizeof 128 bit is employed. If larger key sizes are used, itis necessary to recover more key material in order tosuccessfully break the cipher. An extension of the aboveattack is reported in [21], and requires twice the numberof faults in order to recover the entire AES-256 key.The fault model assumed is the same, but the attackerhas to inject faults also during the round before the onewhere the aforementioned attack takes place. The samepaper also mentions the possibility of attacking even anAES cipher where no key schedule has been employed,and the 1408-bit key material is filled with random keybits. To this end, the attacker should be able to injectsingle byte faults during all the rounds of the cipher,


ControlUnit

I/OInterface

32 x 8-bitSRAM

Datapath

Ciphertext

Plaintext

C1

C2

D

8-bit Data Bus

Control Signal

Reg

(a) AES Component

Scratch Register

Key ScheduleRCON

MUX

SBox 1/4 MixColumnC1

D

C2

(b) AES Component Datapath

Fig. 2. Block diagram of the AES module proposed by Feldhofer et al. [20] and implemented with a 65nmsubthreshold cell library.

thus retrieving all the round keys one by one.

3 ARCHITECTURE OF THE COPROCESSOR

In this section we describe the structure of the AES co-processor chosen to implement the cipher, given the tightarea and power constraints [5].

The selected 8-bit AES implementation is tailoredto be used in low cost, low power devices such asRFIDs and follows the high-level structure proposed byFeldhofer et al. [20], supporting only a 128-bit key. Theblock diagram of this design is depicted in Figure 2. Thechip communicates with the user through an 8-bit widedata bus managed by an I/O interface module. The stateand the initial key of the cipher are stored in a 32-bytewide register file, connected to the 8-bit wide data-pathof the chip, and driven by the finite-state control unit.

The compute portion of the design is composed ofthree components: a module to compute the non-lineartransformation (S-box) that is used by the SubBytesprimitive and the key expansion routine, a module tocompute one quarter of the MixColumn primitive perclock cycle, and an exclusive-or module to performthe round key addition and the linear combinations ofsubkeys which are needed to execute the AES keyschedule. The ShiftRows primitive is performed througha proper access pattern in the load and store operations.

The implemented AES co-processor differs from theone in [20] in terms of the technique used to implementthe S-Box non-linear transformation. To achieve a lowergate count, we followed the guidelines of the lightweight

S-box implementation proposed by Satoh et al. [22]. Thisimplementation maps the input byte of the S-Box into thecomposite Galois field Z(((22)2)2), effectively reducingthe input size of the nonlinear primitives to be computedto 2 bits. Then, it performs an efficient inversion of thefour elements over Z(2) obtained from the previous map-ping, and maps back the resulting inverted elements toZ(28), subsequently performing the affine transformationas a sequence of bitwise xor operations. This designtechnique that has a significantly lower gate count, wasfurther optimized by Mentens et al. in [23], yielding theimplementation technique that we employ.

The HDL code of the described AES design wassynthesized for a target clock frequency of 100 kHzusing Synopsys Design Compiler. The target technologywas the STMicroelectronics 65nm LP CMOS technologywith seven interconnect metal layers. Considering thefact that the device would operate in the subthresholdregime, we manually removed from the standard-celllibrary the cells which may not operate correctly, i.e., theones with the longest transistor stack, such as NAND3 orNOR3. We set the timing condition to the worst-case pro-cess corner (slow NMOS and slow PMOS transistors),low temperature (-15 C), and operating supply voltageof 0.4V to achieve the desired 100KHz clock frequency.The optimized design was then placed and routed usingCadence SoC Encounter and manufactured in silicon.Twenty dies were encapsulated in a 44-pin CeramicQuad Flat Package (CQFP) and were tested to verifythe correct execution of encryption and decryption. All


twenty tested chips were found to operate correctly at afrequency of 1.3MHz with a power supply of 0.45V and400KHz at 0.4V. By relaxing the clock frequency further,it is possible to keep the AES circuit operating correctlydown to 0.25 V, which represents thus the functionallimit of the design.

4 EXPERIMENTAL SETUP AND FAULT CHAR-ACTERIZATION

This section describes the measurement setup used andthe experiments conducted in order to profile the be-havior of the low-power AES implementation and inves-tigate the feasibility of low-cost fault injection attacksbased on voltage throttling. It also provides a precisecharacterization of the setup time violation faults whichcan be exploited by an attacker.

4.1 Experimental setup

To perform the required measurements, the packagedchips were mounted on suitable sockets and dedicatedPCBs were built. The chips under test were connectedto a Keithley K236 tunable voltage generator, which issufficiently precise to allow modifying the supply voltageby as little as 0.1mV. The power supply was connecteddirectly to the power pin of the chips under test. All thetested chips were clocked at a frequency of 1.3MHz bymeans of an external clock generator and each encryptionrequired 1100 clock cycles (less than 1ms). These clockrates and encryption times are within the typical rangeof RFID systems.

To carry out the experiments we connected the in-puts/outputs of the PCB to a logic generator/analyzer,the National Instruments NI6552, which was connectedto a PC where the plaintexts and keys were stored. Theentire acquisition system, including the scaling of thesupply voltage, was controlled by a Labview program,allowing to automate the acquisition process.

4.2 Performing the Attacks

The first experiment, with the objective of finding outwhether the performance of the circuit degrades grace-fully, was conducted by testing how many differentencryptions the chip was able to perform while loweringthe supply voltage by 0.1mV at each step. The voltagereduction was carried out in order to exactly identify thevoltage level at which it is more likely that only a smallnumber of setup time violations occur but no functionalerrors happen, which is the desired situation for anattacker. At each voltage step, we collected the resultsof ten thousand encryption operations and compared

0.4275 0.4280 0.4285 0.4290 0.42950

10

20

30

40

50

60

Voltage V

Faul

tPe

rcen

tage

Fig. 3. Percentage of faulty outputs at different sup-ply voltages for a single sample chip.

them to the correct ones. Figure 3 depicts the resultsof the experiments in terms of percentage of faultyciphertexts versus supply voltage. As can be seen fromthe figure, there is a 0.8mV interval where the faultoccurrence is limited to less than 10% of the encryptionsand the fault occurrences gradually increase when thevoltage is further lowered. The 0.8mV zone where theprobability of a fault occurrence is low is relevant whenperforming fault attacks as it maximizes the likelihood ofinjecting a single fault into the computation, as opposedto the catastrophic behavior experienced when the supplyvoltage is considerably lower, as reported in [14], [24].The 0.8mV supply voltage interval is well within thereach of the precision of the employed tunable voltagegenerator.

Circuits implementations in current CMOS technolo-gies are known to experience process variations. Thesevariations may have a noticeable impact on the behaviorof the chip especially when it is operated at subthresholdvoltage levels. An important question here is whetherthese process variations significantly reduce the alreadysmall exploitable voltage interval (where very few faultsoccur). To this end, we tested the impact of process vari-ations on the position and width of the desired voltageinterval. Figure 4 reports the results of conducting thesame campaign on five different sample chips with thesame AES design. As can be seen, process variationsresult in a noticeable offset of the fault injection thresh-old for the sampled chip, due to their significant impacton subtreshold timing characteristics [2]. However, itcan be noticed that the rate of the degradation of the


0.390 0.395 0.400 0.405 0.410 0.415 0.420 0.425 0.430 0.435 0.440 0.445 0.4500

10

20

30

40

50

Voltage V

Faul

tPe

rcen

tage

Fig. 4. Comparison of the voltage threshold of the first appearance of faults among different sample chipsimplementing the same AES co-processor design. The five chip degradation curves are depicted in black.The impact of temperature variations on the degradation curve of the chip marked by round plotmarks isdepicted by the blue (T= 25C), green (T = 50C), and red (T= 75C) curves.

circuit performance is similar for all the samples andconsequently, the width of the voltage interval staysabout the same. This in turn implies that, regardless ofthe different offsets in the fault onset zone, it is possibleto inject successfully single byte faults with the same lowcost equipment. Note that an attacker may identify theproper voltage interval by lowering the supply voltage ofthe device using first large steps, and then much smallersteps once a faulty behavior is exhibited.

A related question is whether operating the chip atdifferent temperatures may impact the success rate ofthe fault injection attacks since temperature changes areknown to affect the working voltage of a circuit. Figure 4also shows the effect of operating a chip instance atdifferent temperatures achieved by placing the chip ina thermostatic chamber. As it can be seen, raising theworking temperature of the chip lowers the voltage pointwhere a significant amount of faults starts to happen.However, the degradation curve shows that it is stillpossible to identify a reasonable interval where the faultsare exploitable by an attacker.

After characterizing the graceful degradation of thechips, in terms of fault occurrence frequency, we inves-tigated the actual fault patterns to find out whether singlebyte faults were present in the erroneous ciphertextswhich were collected. To this end, we collected roughly670k faulty ciphertexts while running the chip undertest within the 10% faulty ciphertext region mentionedbefore. In order to uniformly stress the AES implemen-tation, the plaintexts used during this campaign wereselected from the NIST standard AES test vectors. The

goal of this analysis was to understand the variationsin the observed fault patterns. By examining the faultyciphertexts produced by the device, we found out thatthe errors induced by setup time violations caused only822 unique faults. This implies that the positions wherethe faults occur are very regular, since there was anapproximate repetition rate of 1000 for each fault pattern.The fault repetition rate was uniform with respect to thedifferent plaintexts.

The last step in the characterization of attacks on thisAES implementation was to find out how many faults outof the ones obtained, were practically usable for carryingout the attack. To this end, we computed all the correctinter-state values for the ten thousand plaintext/key pairsemployed in the experiments and the ones resulting fromthe decryption of the faulty outputs with the correctkey. The inner states of the cipher at the beginning ofeach round obtained in this way were compared withthe correct values and all the per-state differences wereanalyzed. We identified which faults were likely to havebeen generated by a single byte difference between thecorrect inner state and the faulty one. We note that,although we are not able to record the actual inner stateof the chip, it is very unlikely that a multi-byte differenceinduces the same effect as a single byte fault in an AESencryption, due to the fast diffusion properties of thecipher. We also note that an attacker does not need toperform this analysis (which would be impossible forhim due to the lack of the key knowledge), since allthe mentioned fault attack techniques on AES are ableto successfully discard faults that do not fit the desired


0 1 2 3 4 5 6 7 8 9 10

100

200

300

400

Cipher Round

Num

ber

offa

ults

Fig. 5. Per-round distribution of the single byte faultsin the states of the cipher on a chip sample. Round0 indicates the faults occurring before the cipherstarted, allegedly during the load operations. Eachline is obtained with a fixed voltage range, sweepingthe [426.6−427.7]mV interval in 0.1mV steps, faintercolors represent lower voltages.

fault pattern.In order to avoid possible fault pattern repetitions due

to the same plaintext or key, 10000 different plaintext andkey combination were used as input to each experiment.We performed 10 different measurements lowering thevoltage level by 0.1mV each time, while keeping thevoltage within the low fault rate region, to determinehow wide is the actual voltage window to achieve usablesingle bit faults. Based on these results we found outthat, out of 39881 faulty results collected over the 10runs, 30386 were the outcome of a single byte fault,thus resulting in an average 76% of the injected faultsfitting the desired fault model (single byte modification).The percentage of exploitable faults ranges from 61%(lowest voltage) to 82% (highest voltage), supporting theobservation that a bigger voltage drop induces graduallymore catastrophic faults in the computation. This resultimplies that the actual exploitable window for the faultinjection is at least 1mV wide. The last step to confirmthe feasibility of mounting fault injection attacks on thechip is to verify that the faults are hitting the specificround positions required by the attack of [18]. Figure 5shows the distribution of the single byte faults in thestate of the cipher for each voltage level employed inthe measurements.

As can be seen, the faults tend to hit all the roundsof the cipher, albeit in larger numbers for the earliest

rounds. This different sensitivity to single byte faultscan be ascribed to the critical paths in the control unitallowing slightly different slack times among differentrounds due to the specific encoding of the state register.

The very low fault rate for the first state of thecipher is to be attributed to the fact that the architecturehas just loaded the values and has not performed anysignificant operation on the plaintext yet. These resultsshow that it is possible to generate successfully therequired faults in order to attack the AES implementationunder consideration. In particular, since the position ofthe single byte fault in the inner state is almost uniformlydistributed, the hypotheses made in [18], [19] on therequired number of faults hold.

To confirm the feasibility of an attack on the chipwe executed the aforementioned attacks on an ad-hoc Csoftware implementation of the algorithm on a Core 2Quad Q6600 based desktop.

To provide an in-depth characterization of the faults,we examined all the single byte faults which we wereable to measure, counting the number of their occur-rences on a bit-level. Figure 6 shows a bit-level char-acterization of the single byte faults which we haveobserved in our chips. In particular, the state bit flips thatoccurred during the computation of each round of theAES are shown. Figures 6(a) and 6(b) depict the numberof single bit flips, split into bit flip-ups (effectively atransient bit set fault) and bit flip-downs (transient bitreset). As shown in the figure, both flip-ups and flip-downs are present, thus yielding a fault model whichdiffers from the one observed on a software-based AESby [24], where only bit resets were reported. We are notable to compare with the fault patterns obtained throughsetup time violations on a smart card in [14] as the paperdoes not report them.

Figures 6(c) and 6(d) show the cumulative single bitflipping statistics for two different chips of our design,to examine the impact of process variations on the faultpatterns. As can be seen from the figures, the values ofsome state bytes are consistently more sensitive to faultsacross different instances of the chip (e.g., byte 12),while some bytes are never hit by a fault, namely bytes1,5,9 and 13. The consistency of the fault patterns acrossdifferent experiments is compatible with the findingsof [25], which analyzed fault patterns on an FPGAdevice. The observations regarding the fault immunityof these four bytes were consistent in all the resultswe collected, and point to the possibility that the pathsdriving these bytes are effectively shorter than the otherones. However, we note that, among the fault-sensitivebytes, the amount of sensitivity may vary significantly:for instance, byte 7 is affected by a large number of


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

0123456789

10

State Byte

Rou

nd

0

50

100

150

(a) Single bit flip-downs - Chip 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

0123456789

10

State Byte

Rou

nd

0

50

100

(b) Single bit flip-ups - Chip 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

0123456789

10

State Byte

Rou

nd

0

100

200

(c) Single bit flips - Chip 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

0123456789

10

State Byte

Rou

nd

0

100

200

300

(d) Single bit flips - Chip 2

Fig. 6. Bit level characterization of the single byte faults in the state of the cipher.

faults in Chip 1, while it is almost never hit by them inChip 2.

5 FAULT PREDICTABILITY AT DESIGN TIME

This section describes the EDA flow used to design thecircuit under test and discusses how an early assessmentof the potential fault attacks is carried out. In particular,we discuss how an early prediction of the points in thecircuit which are more sensitive to setup-time violationscan be identified using static timing analysis. Finally, wediscuss to what extent the results obtained in simulationsmatch the ones measured on the real chip, and we exploitthem to propose efficient countermeasures against thisattack strategy.

5.1 Design Toolchain descriptionThe design flow used to assess the viability of fault at-tacks against the device under test is depicted in Figure 7,and it is composed of common EDA tools. The inputsof the design flow are an HDL definition of the cipher

implementation, in our case described in VHDL, andthe target standard-cell library, in the STMicroelectronics65nm LP CMOS technology.

As previously mentioned, we first recharacterized thelibrary at the desired voltage, then we synthesized ourdesign setting the appropriate target frequency.

The synthesized netlist was then placed and routedusing Cadence Design Systems SoC Encounter. Alsoproduced were a parasitics file (in spef format), the finalnetlist of the circuit (in verilog format), and the backannotation of the delay (in sdf format).

Once the design was completed, we carried out a postplace-and-route simulation using the netlist as well as thecorresponding delays. This simulation is used to producethe test vector, in VCD format, which will be used in thefollowing steps of the evaluation process. We performedtwo different analyses with the aim of discerning if thetools available in the common EDA flow for transistorlevel simulation and static timing analysis are effectivein predicting the fault patterns of the chip under test.


CMOS libraryFront-end and Back-end view

Logic SynthesysSynopsys DC

HDL code ofthe design

Library recharacterizationSynopsys Liberty NCX

CMOS libraryrecharacterized

RTL validation

post-synthvalidation

post-P&Rvalidation

P&RCadence Encounter

Synth netlist

P&R netlist

test vector(VCD format)

parasitic(SPEF format)

SPICE simulationSynopsys Nanosim

Supply Voltagescaled

CMOS libraryrecharacterized

SimulatedCiphertext

Transistormodel

Timing analysisSynopsys Primepower

Path delay

Fig. 7. Outline of the full simulation flow employed for both SPICE simulations and static timing analysis.

5.2 Predicting faults with transistor level simu-lations

The first analysis was aimed at understanding if a tran-sistor level simulator is capable of correctly predictingthe fault patterns we have measured in practice. To thisend, a transistor level simulation campaign was carriedout using Synopsys Nanosim, a fast SPICE simulator,employing the aforementioned data files as input. Inaddition to the data files describing the circuit, Nanosimrequires a transistor-level model of the gates availablein the library. The device behavior was simulated fora working voltage between 0.3V and 0.5V, in 1 mVsteps. The simulation resolution was set to 100ns, andwe employed a clock signal with the same frequencyas during the real measurements, yielding a numberof faulty ciphertexts reasonably close to the numbersobserved in the experiments. We can thus speculate thatNanosim is an effective tool to predict the setup timeviolations which we were able to measure in practice.

However, performing an extensive characterization of

the chip through transistor level simulation, to find whichare the paths that are more likely to fail through setuptime violations, is very time consuming. In fact, such acharacterization would require to perform a significantamount of simulations with different input-key pairsin order to provide a good circuit coverage. As onetransistor level simulation requires 10 minutes in ourcase, performing a simulation matching the experimentalcollection campaign would exceed 10k hours of compu-tation time. We therefore, followed another methodologyto predict which paths are more likely to fail using statictiming analysis.

5.3 Predicting faults with static timing analysis

To perform the static timing analysis, Synposys Prime-Power was used, employing the spef and sdf files, therelative Verilog netlist, and the file describing the timingcharacteristics of the library used during the synthesis.

The technology library file is not the original one, butthe one storing the power consumption and the timing


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

5,800

6,000

6,200

6,400

6,600

6,800

7,000

7,200

7,400

7,600

7,800

State Byte

Prop

agat

ion

Del

ay[n

s]

(a) Worst case propagation timings and hold propagation timingsfor all the input lines of the state register

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

0

500

1,000

1,500

2,000

2,500

3,000

3,500

4,000

4,500

State Byte

Tota

lnu

mbe

rof

faul

ts

(b) Fault rates measured on Chip 1 (green) and Chip 2 (blue),adding together all rounds

Fig. 8. Input delays (worst-case) for all the bits of the cipher 8(a); state and amount of faults hitting everystate bit 8(b).

information of the library after the recharacterization atthe appropriate voltage.

The result of a static timing analysis is a very detailedtiming characterization of the paths inside our design. Weextracted the worst-case delays associated with the inputconnections of the state and key registers. Examiningthe delays, we noticed that the ones characterizing thecipher state and the ones characterizing the key byteswere the same, bitwise. We ascribe this behavior to thefact that both the cipher state and the key are stored in thesame 32-byte SRAM bank, making it thus likely for themto have the same delays. Figure 8(a) depicts the delayscharacterizing the input lines of the latches storing thecipher state: it can be noticed that the bytes in position 1,5, 9 and 13 have a significantly lower worst-case delayfor their input signal. Figure 8(b) shows the number offaults per bit measured on chips 1 and 2, adding togetherall the faults occurring throughout the whole 10 roundsof the cipher, with fainter colors representing lower inputvoltages. It can be seen that while lowering the voltage,the number of faults per bit increases, but the percentageof faults hitting a single bit are roughly the same. Thisin turn points to an increase in the number of singlebyte faults when lowering the voltage, while maintainingthe faulty bit pattern even at lower voltages. Comparingthe two figures, one can notice that the bits having aninput delay very different from the critical path delay(e.g., in bytes 1,5,9 and 13) are completely fault-free. Incontrast, the bits characterized by an input delay close tothe critical path are affected by a significant amount of

faults. Also notice that, among them, the number of faultshitting a bit is strongly influenced by process variationsin some cases. For instance, the bits in the eighth byteof the state are affected by only a few faults in chip 2,while they are the second most affected in chip 1.

Based on these results we can conclude that statictiming analysis provides an effective way for the de-signer to predict circuit paths which are likely to notexperience setup time violations upon an attack, whileamong the ones at risk, the effective success rate of anattack is influenced by chip-to-chip process variations. Inan attempt to propose an effective approach to preventsetup-time violation based fault attacks, we note that theattacker relies on the early failures of some particularlines to achieve a meaningful faulty result. A viableapproach for a designer to protect the chip against suchattacks could be through introducing an extra logic path,having a total delay sensibly longer than the critical pathof the chip. Such a path will be acting as a canary,due to its extra length, in case a setup time violationis induced by the attacker. If a setup time violationon the canary path is detected, proper countermeasuresagainst the fault attack, e.g., output random values asthe ciphertext, and/or erase the key, can be activated.According to the delays reported in Figure 8(a) and thefault rates depicted in Figure 8(b), we note that pathsshorter than 20% with respect to the critical path, arenever hit by setup time violations, regardless of theprocess variations. It is thus sensible to assume that acanary path longer than the critical path by about 20%


will be adequate.

6 CONCLUSIONS

In this paper we have presented a detailed characteriza-tion of an AES coprocessor realized with 65nm technol-ogy and operating in a subtreshold voltage region. Thecharacterization showed that it is possible to effectivelyperform setup time violation attacks on the implementedciphers through reducing the supply voltage in 0.1mVsteps. We were able to provide a bit-level description ofthe fault frequencies and provide insights regarding theirpredictability with common EDA tools. In particular, wereport that by employing static timing analysis tools, itis possible to obtain reliable estimates of the worst casetimings for the input lines of the cipher state registersand pinpoint which ones are more likely to be vulnerableto setup time violation attacks. Finally, we proposed acountermeasure that designers can use in order to protecttheir design against these attacks.

ACKNOWLEDGEMENTS

This work was partially supported by the Walloon Re-gion (E-USER and S@T Skywin projects) and by theNanotera program (SecWear project). Francois-XavierStandaert is an associate researcher of the Belgian Fundfor Scientific Research (FNRS-F.R.S.). We acknowledgethe support of Lubos Gaspar and Franois Stas fromUniversite Catholique de Louvain, Louvain-la-Neuve, forperforming the temperature-controlled measurements.

REFERENCES

[1] R. Das and P. Harrop, “RFID forecasts, players and opportuni-ties 2011-2021,” IDTechEx report, 2010.

[2] D. Bol, “Robust and energy-efficient ultra-low-voltage circuitdesign under timing constraints in 65/45 nm cmos,” Journalof Low Power Electronics and Applications, vol. 1, no. 1, pp.1–19, 2011. [Online]. Available: http://www.mdpi.com/2079-9268/1/1/1

[3] D. Bol, R. Ambroise, D. Flandre, and J. Legat, “Interests andlimitations of technology scaling for subthreshold logic,” VeryLarge Scale Integration (VLSI) Systems, IEEE Transactions on,vol. 17, no. 10, pp. 1508–1519, 2009.

[4] A. Barenghi, C. Hocquet, D. Bol, F.-X. Standaert, F. Regazzoni,and I. Koren, “Exploring the Feasibility of Low Cost FaultInjection Attacks on Sub-threshold Devices through an Exampleof a 65nm AES Implementation,” in RFIDSec, ser. LectureNotes in Computer Science, A. Juels and C. Paar, Eds., vol.7055. Springer, 2011, pp. 48–60.

[5] C. Hocquet, D. Kamel, F. Regazzoni, J.-D. Legat, D. Flandre,D. Bol, and F.-X. Standaert, “Harvesting the potential of nano-cmos for lightweight cryptography: an ultra-low-voltage 65nm aes coprocessor for passive rfid tags,” J. CryptographicEngineering, vol. 1, no. 1, pp. 79–86, 2011.

[6] NIST, “Announcing the advanced encryption standard aes,” Fed-eral Information Processing Standards Publication 197, Tech.Rep., 2001.

[7] J. Daemen and V. Rijmen, The Design of Rijndael: AES - TheAdvanced Encryption Standard. Springer, 2002.

[8] C.-N. Chen and S.-M. Yen, “Differential fault analysis on aeskey schedule and some countermeasures,” in Proc. InformationSecurity and Privacy, pp. 217–217, 2003.

[9] C. Giraud, “DFA on AES,” Advanced Encryption Standard -AES, 4th International Conference, vol. 3373, pp. 27–41, 2005.

[10] C. H. Kim and J.-J. Quisquater, “New Differential Fault Anal-ysis on AES Key Schedule: Two Faults Are Enough,” inProc. International Conference on Smart Card Research andAdvanced Applications, pp. 48–60, 2008.

[11] A. Moradi, M. T. M. Shalmani, and M. Salmasizadeh, “Ageneralized method of differential fault attack against AEScryptosystem,” in Proc. International Workshop on Crypto-graphic Hardware and Embedded Systems, pp. 91–100, 2006.

[12] M. Hutter, T. Plos, and J.-M. Schmidt, “Contact-Based FaultInjections and Power Analysis on RFID Tags,” in Proc. IEEEEuropean Conference on Circuit Theory and Design, pp. 409–412, 2009.

[13] J.-M. Schmidt, M. Hutter, and T. Plos, “Optical Fault Attacks onAES: A Threat in Violet,” in Proc. Workshop on Fault Diagnosisand Tolerance in Cryptography, pp. 13–22, 2009.

[14] N. Selmane, S. Guilley, and J.-L. Danger, “Practical Setup TimeViolation Attacks on AES,” in EDCC-7 ’08: Proc. SeventhEuropean Dependable Computing Conference. Washington,DC, USA: IEEE Computer Society, 2008, pp. 91–96.

[15] A. Barenghi, G. M. Bertoni, L. Breveglieri, M. Pellicioli, andG. Pelosi, “Low Voltage Fault Attacks to AES,” in HOST,M. Tehranipoor and J. Plusquellic, Eds. IEEE ComputerSociety, 2010.

[16] A. Barenghi, L. Breveglieri, I. Koren, and D. Naccache, “FaultInjection Attacks on Cryptographic Devices: Theory, Practice,and Countermeasures,” Proceedings of the IEEE, vol. 100,no. 11, pp. 3056–3076, 2012.

[17] T. Kasper, M. Silbermann, and C. Paar, “All you can eat orbreaking a real-world contactless payment system,” in FinancialCryptography, ser. Lecture Notes in Computer Science, R. Sion,Ed., vol. 6052. Springer, 2010, pp. 343–350.

[18] P. Dusart, G. Letourneux, and O. Vivolo, “Differential FaultAnalysis on A.E.S,” CoRR, vol. cs.CR/0301020, 2003.

[19] G. Piret and J.-J. Quisquater, “A Differential Fault AttackTechnique against SPN Structures, with Application to the AESand KHAZAD,” in CHES, ser. Lecture Notes in ComputerScience, C. D. Walter, Cetin Kaya Koc, and C. Paar, Eds., vol.2779. Springer, 2003, pp. 77–88.

[20] M. Feldhofer, J. Wolkerstorfer, and V. Rijmen, “AES im-plementation on a grain of sand,” Information Security, IEEProceedings, vol. 152, no. 1, pp. 13–20, 2005.

[21] A. Barenghi, G. M. Bertoni, L. Breveglieri, and G. Pelosi, “Afault induction technique based on voltage underfeeding withapplication to attacks against AES and RSA, journal = Journalof Systems and Software,” vol. 86, no. 7, pp. 1864–1878, 2013.

[22] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A CompactRijndael Hardware Architecture with S-Box Optimization,” inProceedings of ASIACRYPT 2001, ser. LNCS, no. 2248, 2000,pp. 239–254.

[23] N. Mentens, L. Batina, B. Preneel, and I. Verbauwhede, “Asystematic evaluation of compact hardware implementations forthe rijndael s-box,” in CT-RSA, ser. Lecture Notes in ComputerScience, A. Menezes, Ed., vol. 3376. Springer, 2005, pp. 323–333.

[24] A. Barenghi, G. Bertoni, E. Parrinello, and G. Pelosi, “LowVoltage Fault Attacks on the RSA Cryptosystem,” in Workshopon Fault Diagnosis and Tolerance in Cryptography, 2009, pp.23–31.


[25] A. Pellegrini, V. Bertacco, and T. M. Austin, “Fault-based attackof rsa authentication,” in DATE. IEEE, 2010, pp. 855–860.

Alessandro Barenghi AlessandroBarenghi is currently a Post-Doc ResearchAssistant in Dipartimento di Elettronicae Informazione at Politecnico di Milano,where he obtained his Ph.D. title in 2011,with the thesis “Developments in SideChannel Attacks to Digital CryptographicDevices: Differential Power and FaultAnalysis”.

The main area of interest for his researches is computer,embedded and network security. He is currently tacklingthe issue of providing security at design time for embeddedsystems through the use of compiler techniques. In addition tohis interests in computer security, he is also working in the fieldof formal languages and compilers: his current interest regardstechniques for parallel parsing, employing operator precedencegrammars.

Cedric Hocquet Cedric Hocquet receivedthe M.S. degree in electrical engineer-ing from Universite Catholique de Louvain,Louvain-la-Neuve, Belgium, in 2009. Hewas with Universite Catholique de Lou-vain, Louvain-la-Neuve, Belgium, as a Re-searcher until 2012. His main researchsubjects were ultra-low-power digital circuitdesign and design automation. He has au-

thored more than ten papers and conference contributions. Heis now with National Instruments, Zaventem, Belgium.

David Bol David Bol received the M.Sc.degree in Electromechanical Engineeringand the Ph.D degree in Engineering Sci-ence from Universit catholique de Louvain(UCLouvain), Louvain-la-Neuve, Belgiumin 2004 and 2008, respectively. In 2005, hewas a visiting Ph.D student at the CNM Na-tional Centre for Microelectronics, Sevilla,Spain, in advanced logic design. In 2009,

he was a postdoctoral researcher at intoPIX, Louvain-la-Neuve,Belgium, in low-power design for JPEG2000 image process-ing. In 2010, he was a visiting postdoctoral researcher at theUC Berkeley Laboratory for Manufacturing and Sustainability,Berkeley, CA, in life-cycle assessment of the semiconductorenvironmental impact. He is now an assistant professor atUCLouvain, conducting researches in green VLSI and SoCs fora sustainable Internet-of-Things.

Prof. Bol has authored or co-authored more than 50 technicalpapers and conference contributions and holds a patent. In2008, he received the Best Paper and Best Poster Awards fromthe IEEE Int. Conf. on Computer Design (ICCD) and the IEEEInt. SOI Conf., respectively. He also serves as an editor for MDPIJ. Low-Power Electronics and Applications, as a TPC memberof IEEE SubVt and S3S conferences and as a reviewer forvarious journals and conferences such as IEEE J. of Solid-StateCircuits, IEEE Trans. on VLSI Syst., IEEE Trans. on Circuitsand Syst. I/II and ACM Design Automation Conf. Since 2008,he presented several invited keynote tutorials in internationalconferences.

Francois-Xavier Standaert Franois-Xavier Standaert was born in Brussels,Belgium in 1978. He received the ElectricalEngineering degree and PhD degree fromthe Universite catholique de Louvain,respectively in June 2001 and June2004. In 2004-2005, he was a Fulbrightvisiting researcher at Columbia University,Department of Computer Science,

Network Security Lab and at the MIT Medialab, Center forBits and Atoms. In March 2006, he was a founding memberof IntoPix s.a. From 2005 to 2008, he was a post-doctoralresearcher of the UCL Crypto Group and a regular visitorof the two aforementioned laboratories. Since September2008, he is associate researcher of the Belgian Fund forScientific Research (F.R.S.-FNRS) and professor at the UCLInstitute of Information and Communication Technologies,Electronics and Applied Mathematics (ICTEAM). In 2010, hewas program co-chair of CHES (IACR’s flagship workshopon cryptographic hardware). In June 2011, he has beenawarded a Starting Independent Research Grant by theEuropean Research Council. His research interests includedigital electronics, FPGAs and cryptographic hardware, lowpower implementations for constrained environments (RFIDs,sensor networks, ...), the design and cryptanalysis of symmetriccryptographic primitives, physical security isssues in generaland side-channel analysis in particular.


Francesco Regazzoni Dr. FrancescoRegazzoni is a postdoctoral researcherat the ALaRI Institute of University ofLugano. After receiving his Master ofScience degree from Politecnico diMilano, he completed his PhD degreeat the ALaRI Institute of University ofLugano. Previously, he has been assistantresearcher at the Crypto Group of the

Universit Catholique de Louvain and at Technical University ofDelft, and visiting researcher at several institutions, includingNEC Labs America, Ruhr University of Bochum, and EPFL.His research interests are mainly focused on embeddedsystems security, covering in particular side channel attacks,cryptographic hardware, electronic design automation forsecurity, and random number generators.

Israel Koren Israel Koren (M’76 - SM’87 -F’91) is currently a Professor of Electricaland Computer Engineering at the Univer-sity of Massachusetts, Amherst. He hasbeen a consultant to numerous compa-nies including IBM, Analog Devices, Intel,AMD and National Semiconductors. His re-search interests include Fault-Tolerant sys-tems, Computer Architecture, VLSI yield

and reliability, Secure Cryptographic systems, and ComputerArithmetic. He publishes extensively and has over 250 publica-tions in refereed journals and conferences. He is an AssociateEditor of the VLSI Design Journal, and Sustainable Computing:Informatics and Systems. He served as General Chair, ProgramChair and Program Committee member for numerous confer-ences. He is the author of the textbook ”Computer ArithmeticAlgorithms,” 2nd Edition, A.K. Peters, 2002 and a co-author of”Fault Tolerant Systems,” Morgan-Kaufman, 2007.

Documents

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, …