8
A NOVEL MICROPROCESSOR-INTRINSIC PHYSICAL UNCLONABLE FUNCTION Abhranil Maiti Electrical & Computer Engineering Virginia Tech Blacksburg, Virginia, 24061 email: [email protected] Patrick Schaumont Electrical & Computer Engineering Virginia Tech Blacksburg, Virginia, 24061 email: [email protected] ABSTRACT We present a novel Physical Unclonable Function (PUF) exploiting the variability existing in a microprocessor pipeline to uniquely identify the microprocessor chip. The PUF accepts a microprocessor instruction as a challenge and produces the delay in a data path or a control path in the microprocessor as the response. The delay value is captured by over-clocking the micropro- cessor. The entire mechanism can be controlled by the microprocessor itself. Moreover, this PUF requires no dedicated hardware resources. It is a microprocessor- intrinsic PUF solution. We demonstrate our proposed idea using the SPARC instruction set implemented in a 32-bit LEON3 processor in a Spartan 3E FPGA. Our implementation based on the characterization of a sub- set of five SPARC instructions can produce 37 secure response bits for authentication. 1. INTRODUCTION An on-chip Physical Unclonable Function (PUF) ex- ploits manufacturing process variation inside integrated circuits (ICs) to generate chip-unique challenge-response pairs (CRPs). The relation between a challenge and the corresponding response is determined by complex statistical variation in logic and interconnect in an IC. Many useful applications for PUFs have been proposed so far due to their ability to generate a random key, to store a key without dedicated memory, and to prevent physical tampering. For example, they can be used in device authentication and secret key generation [1]. Guajardo et al. discussed the use of PUFs for Intel- lectual Property (IP) protection, remote service activa- tion, and secret key storage [2]. A PUF-based RFID tag has also been proposed to prevent product counter- feiting [3]. In this paper, we introduce a novel PUF that ex- tracts variability existing in a microprocessor pipeline to uniquely identify the microprocessor chip. This PUF accepts a microprocessor instruction as a challenge and produces the delay in a data path or a control path in the microprocessor as the response. The delay value is captured by operating the microprocessor pipeline at an elevated clock frequency, also known as over-clocking. The entire PUF mechanism can be controlled by the microprocessor itself using software programs. More- over, the proposed PUF does not require any dedicated hardware or any type of modification in the existing hardware except a circuit that can vary the clock fre- quency of the microprocessor. We assume the avail- ability of a digital clock manager (DCM) or a phase locked loop (PLL) that can be used for clock genera- tion/variation in general, not just for the PUF. In an FPGA, a DCM or a PLL is available as a standard com- ponent and hence, can be used for this PUF with no ex- tra cost. In this context, one may argue that an FPGA can be configured solely with the PUF for authentica- tion and then reconfigured with the main application so that the area consumed by the PUF does not become an issue. However, configuring an FPGA separately for authentication and the main application will lead to the overhead of reconfiguration. In our method, the PUF can exist along with the main application while consuming minimal area. Why do we need a new PUF? This is an important question, given the number of PUF proposals already available. Most of the proposed PUFs require a signif- icant amount of hardware resources. For example, a ring oscillator-based PUF requires a pair of ring oscil- lators to generate one bit of PUF response [1]. A but- terfly PUF consumes two latches to produce the same [4]. However, there are PUFs that can be implemented without the need of dedicated hardware resource; they are called intrinsic PUF. Guajardo et al. exploited ran- dom power-up values of SRAM cells to build an SRAM- PUF [5]. Similar work has been done by Holcomb et al. [6]. The main drawback of this type of PUFs is that they require power cycling. Hence, the random infor- 978-1-4673-2256-0/12/$31.00 c 2012 IEEE 380

A Novel Microprocessor-Intrinsic Physical Unclonable Function

Embed Size (px)

DESCRIPTION

A Novel Microprocessor-Intrinsic Physical Unclonable Function

Citation preview

Page 1: A Novel Microprocessor-Intrinsic Physical Unclonable Function

A NOVEL MICROPROCESSOR-INTRINSIC PHYSICAL UNCLONABLEFUNCTION

Abhranil Maiti

Electrical & Computer EngineeringVirginia Tech

Blacksburg, Virginia, 24061email: [email protected]

Patrick Schaumont

Electrical & Computer EngineeringVirginia Tech

Blacksburg, Virginia, 24061email: [email protected]

ABSTRACT

We present a novel Physical Unclonable Function (PUF)exploiting the variability existing in a microprocessorpipeline to uniquely identify the microprocessor chip.The PUF accepts a microprocessor instruction as achallenge and produces the delay in a data path or acontrol path in the microprocessor as the response. Thedelay value is captured by over-clocking the micropro-cessor. The entire mechanism can be controlled by themicroprocessor itself. Moreover, this PUF requires nodedicated hardware resources. It is a microprocessor-intrinsic PUF solution. We demonstrate our proposedidea using the SPARC instruction set implemented ina 32-bit LEON3 processor in a Spartan 3E FPGA. Ourimplementation based on the characterization of a sub-set of five SPARC instructions can produce 37 secureresponse bits for authentication.

1. INTRODUCTION

An on-chip Physical Unclonable Function (PUF) ex-ploits manufacturing process variation inside integratedcircuits (ICs) to generate chip-unique challenge-responsepairs (CRPs). The relation between a challenge andthe corresponding response is determined by complexstatistical variation in logic and interconnect in an IC.Many useful applications for PUFs have been proposedso far due to their ability to generate a random key, tostore a key without dedicated memory, and to preventphysical tampering. For example, they can be usedin device authentication and secret key generation [1].Guajardo et al. discussed the use of PUFs for Intel-lectual Property (IP) protection, remote service activa-tion, and secret key storage [2]. A PUF-based RFIDtag has also been proposed to prevent product counter-feiting [3].

In this paper, we introduce a novel PUF that ex-tracts variability existing in a microprocessor pipelineto uniquely identify the microprocessor chip. This PUF

accepts a microprocessor instruction as a challenge andproduces the delay in a data path or a control path inthe microprocessor as the response. The delay value iscaptured by operating the microprocessor pipeline at anelevated clock frequency, also known as over-clocking.The entire PUF mechanism can be controlled by themicroprocessor itself using software programs. More-over, the proposed PUF does not require any dedicatedhardware or any type of modification in the existinghardware except a circuit that can vary the clock fre-quency of the microprocessor. We assume the avail-ability of a digital clock manager (DCM) or a phaselocked loop (PLL) that can be used for clock genera-tion/variation in general, not just for the PUF. In anFPGA, a DCM or a PLL is available as a standard com-ponent and hence, can be used for this PUF with no ex-tra cost. In this context, one may argue that an FPGAcan be configured solely with the PUF for authentica-tion and then reconfigured with the main application sothat the area consumed by the PUF does not becomean issue. However, configuring an FPGA separatelyfor authentication and the main application will leadto the overhead of reconfiguration. In our method, thePUF can exist along with the main application whileconsuming minimal area.

Why do we need a new PUF? This is an importantquestion, given the number of PUF proposals alreadyavailable. Most of the proposed PUFs require a signif-icant amount of hardware resources. For example, aring oscillator-based PUF requires a pair of ring oscil-lators to generate one bit of PUF response [1]. A but-terfly PUF consumes two latches to produce the same[4]. However, there are PUFs that can be implementedwithout the need of dedicated hardware resource; theyare called intrinsic PUF. Guajardo et al. exploited ran-dom power-up values of SRAM cells to build an SRAM-PUF [5]. Similar work has been done by Holcomb et al.[6]. The main drawback of this type of PUFs is thatthey require power cycling. Hence, the random infor-

978-1-4673-2256-0/12/$31.00 c©2012 IEEE 380

Page 2: A Novel Microprocessor-Intrinsic Physical Unclonable Function

Data path /Control Path

Register1 Register2

Clock (frequency = f, period = Tc, Tc = 1/f)

Tpath

Fig. 1. A pipelined delay path

Chip1 Tpath1

Tc

FFPs

Chip2 Tpath2

Chip3 Tpath3

Chip4 Tpath4

Chipm Tpathm

Fig. 2. Variable FFPs based on variable slacks in datapath / control path

mation is not easily accessible. Maes et al. showed thatthe power-up states of the D flip-flops (DFFs) inside anFPGA can be captured by modifying the programmingbitfile [7]. In the DFF-based PUFs also, in order to ex-tract the PUF CRPs, one has to power cycle the chipwhich is not the best practical solution.

Our proposed PUF is also intrinsic in nature. Thekey difference between the existing intrinsic PUFs andthe proposed one is that our PUF does not depend onthe power-up state of the circuit. As a result, the pro-posed PUF can evaluate the CRPs anytime during theoperation of a circuit.

We identify several benefits out of the proposed PUF.First, a microprocessor is a ubiquitous circuit element,present in almost every embedded application. Thismakes it an ideal candidate for an intrinsic PUF. Sec-ond, we use microprocessor instructions to build thePUF challenge-response mechanism. This provides away of integrating low-level hardware information (vari-ability) with the high-level software applications (soft-ware programs). Finally, any post-processing of thePUF data can be flexibly done in software obviatingany need of error-correction hardware. The main con-tributions of this work are :

• We first propose how variability in the data pathand the control path of a microprocessor pipelinecan be captured using microprocessor instructions.We introduce a challenge-response mechanism basedon the variability information.

• We demonstrate the idea on a SPARC instructionset implemented in a 32-bit LEON3 processor in aSpartan 3E FPGA. We analyzed different types ofinstructions such as arithmetic instructions, logi-cal instructions, and control instructions. Our re-sults show good amount of uniqueness among thechips we tested in terms of the PUF responses. Itproduced 37 secure response bits for authentica-tion based on a subset of five SPARC instructions.They also exhibit a high value of reliability.

The proposed approach or anything similar has notbeen reported in literature previously as far as our knowl-edge extends. In this paper, we focus on establishingthe idea that process variation information inherent ina microprocessor can be extracted using microprocessorinstructions to create a PUF challenge-response mech-anism. A complete characterization and validation ofthis PUF in terms of stress testing (variation in temper-ature and supply voltage, aging) and security analysisare expected future work, but it is out of the scope ofthe present contribution. The rest of the paper is orga-nized as follows. We introduce the detailed architectureof the proposed PUF in Section 2. We present the ex-perimental results in Section 3. Section 4 discusses therelated work. We conclude in Section 5.

2. MICROPROCESSOR-INTRINSIC PUF

In this PUF, we aim to exploit the delay variabilitythat exists in several data paths and control paths in amicroprocessor pipeline. Depending on the instructionthat is run on the microprocessor, a particular set ofdata paths or control paths are activated. Due to man-ufacturing process variation, these paths have variabledelay from one chip to another. We extract this variabledelay to authenticate a chip. Figure 1 shows a typicalpipelined delay path. In a microprocessor pipeline, sim-ilar paths exist. The clock period Tc is constrained bythe following relationship:

Tc ≥ tc q + tcomb + tsetup (1)

where tc q is clock-to-output delay of the register 1,tcomb is the combinational path delay, and tsetup is theset up time of the register 2. We represent tc q+tcomb+tsetup as Tpath. The value of Tpath for a particular pathvaries from one chip to another due to process varia-tion. In addition, the value of Tpath also depends onthe type of instruction being executed, as well as itsoperands. If we increase the clock frequency i.e. reducethe clock period, an instruction will fail once the value

381

Page 3: A Novel Microprocessor-Intrinsic Physical Unclonable Function

f1 f2 f3 f4f0

k

0.5k

0

Chip1

Chip2

Chip3

Frequency

no of

correct

executions

=

pass count

(pc)

Transition

Fig. 3. Comparison of failure behavior of an instruc-tion across a set of chips

of the reduced Tc reaches the value of Tpath. In otherwords, the instruction will reach a failure point oncethe slack (which is the difference between Tpath andthe original Tc) becomes negative. We call the point offailure for a single instruction a frequency failure point(FFP). This has been demonstrated in Figure 2; thevalue of Tpath for a particular data path/control pathvaries randomly over a set of m chips. The instructionset of a microprocessor leads to a set of FFPs. Theobjective of the proposed PUF is to authenticate a mi-croprocessor through the FFPs of its instruction set, ora suitable subset of it.

2.1. How do we generate a PUF response bit?

We first explain a simple concept that shows how a re-sponse bit can be generated using the FFPs correspond-ing to an instruction. We observed that an instructiondoes not stop executing abruptly at a particular fre-quency while the microprocessor is over-clocked. It goesthrough a transition phase in which there are severalfrequencies at which an instruction executes with par-tial failure. This means when run several times, it fails afraction of the total number of times it is run. With fur-ther increase in the clock frequency, it fails completelybeyond a certain frequency. Essentially, the failure ofan instruction due to over-clocking occurs through aregion and not at a point. Figure 3 shows such a sce-nario. (It is a hypothetical example and not based onreal data.) It shows how a particular instruction failsdue to over-clocking for three different chips. The fail-ure behavior is estimated by running the instructionk(k > 0) times at different frequencies and countingthe number of times it is correctly executed; we call itthe pass count (pc). It is represented along the Y-axisin Figure 3. Before the frequency f1, the pc of chip1 isk. It starts failing partially at f1. This means that thepc is a fraction of k at f1. Finally, the pc becomes zero

fstart fend

k

0.5k

0

Frequency

pass count

0.1k

0.9k

10 01 0011

Fig. 4. Response generation in the proposed PUF

at f2. The region between f1 and f2 (colored in gray)is the transition region for chip1 for the instruction ex-ecuted. Due to process variation, chip2 and chip3 havetheir transition regions for the same instruction at dif-ferent frequencies as illustrated in Figure 3. The tran-sition region for chip2 and chip3 are not highlighted forsimplicity. Now, to generate a response bit r at an ar-bitrary frequency f for a particular instruction, let usapply a simple quantization rule as follows:

r =

{1 if pc/k > 0.5

0 otherwise(2)

Following the above rule, three chips in Figure 3 pro-duce different response bits at different frequencies. Forexample, at f2, chip1, chip2, and chip3 produce ‘0’,‘1’,and ‘1’ respectively. At f3, they produce ‘0’, ‘0’, and‘1’ respectively. We can create a string of response bitsthis way by sampling an instruction at different frequen-cies. This is the basic working principle of the proposedPUF.

2.2. Formalizing the PUF mechanism

We follow the standard PUF mechanism consisting oftwo phases: enrollment and evaluation. In the enroll-ment, we characterize several instructions at differentlevels of over-clocking to measure the transition regions.We slightly modify the basic quantization rule proposedin the previous section. We divide the transition regionin four parts based on the pc value and assign a binaryvalue (used as PUF response, r) to those sub-regionsusing the follow rule:

r =

00 if pc/k ≤ 0.1

01 if 0.1 < pc/k ≤ 0.5

10 if 0.5 < pc/k ≤ 0.9

11 if pc/k > 0.9

(3)

382

Page 4: A Novel Microprocessor-Intrinsic Physical Unclonable Function

The idea is depicted in Figure 4. Suppose, at five con-secutive sampling frequencies in the transition region, achip produces the following pc values for an instruction:95, 87, 46, 17, and 5. Assuming k=100, the correspond-ing responses would be 11, 10, 01, 01, 00 according toEquation (3). The resultant response is the concatena-tion of these bit-strings i.e. 1110010100.

The region between 0.5k and 0.9k is more likely toexecute an instruction correctly but it is distinct fromthe region where the processor has 100% pc. On theother hand, the region between 0.1k and 0.5k is morelikely to execute an instruction incorrectly but it is dis-tinct from the region where the pc is 0. This is the rea-son why we divided the transition region in four parts.The reason behind choosing the threshold of 0.1k and0.9k is to reduce errors in response bits. Due to noisein the circuit, the pc around the frequency fstart (referto Figure 4) in the transition region may fluctuate be-tween 100% and values slightly lower than 100%. Simi-larly, near the frequency fend (refer to Figure 4), it mayfluctuate between 0 and values slightly higher than 0.During enrollment, we collect three types of informa-tion: a) the frequency at which an instruction startsfailing partially. (the transition start point fstart) b)the frequency at which it completely stops executing.(the transition end point fend) c) the sampling frequen-cies in the transition region and the pc values at thesampling frequencies including fstart and fend. Thesethree types of information are captured and stored in asecure database as failure transition information (FTI).Each of these three types of information are utilized forauthentication as they vary from one chip to another.

During evaluation, a verifier can check the chip’sclaimed identity as follows. First, the verifier defines achallenge by selecting one or more instructions and a setof frequencies at which the instructions will be run. Theclaiming chip then executes the selected instructions atthose frequencies and returns the binary responses foreach using Equation (3). Finally, the verifier calculatesthe binary responses using the FTI stored in his securedatabase, and compares with that returned by the chip.A match implies an authentic chip. To clone a chip, anadversary needs to know the precise detail of the FTIof a chip for all characterized instructions.

3. RESULTS

We characterized three types of instructions: arith-metic, logical, and control transfer. We present resultsfor addition, unsigned multiplication, unsigned divi-sion, AND operation, and a branch instruction. We im-plemented our proposed PUF using the SPARC instruc-tion set in a 32-bit LEON3 processor. It is tested on

five Xilinx Spartan3E 500 FPGAs under normal tem-perature and regular supply voltage. The processor isconfigured with an internal RAM of 32 Kbytes alongwith a multiplier and a divider unit. It does not haveany external memory, instruction cache or data cache.All the instructions are written in assembly code andloaded to the internal memory using the JTAG-baseddebug interface of the LEON3 processor.

While the instructions characterized for PUF failwith over-clocking, rest of the microprocessor still func-tions correctly. When the clock frequency is restoredto the normal value, the microprocessor comes backto fully functional state with the characterized instruc-tions executing correctly. This is achieved by small,step-by-step variation of frequency. We used a step of0.5 MHz. This way the microprocessor can run thePUF fully autonomously. We controlled the processorfrequency by two different methods. Initially, we usedan external, high-speed pulse generator. In a further re-finement of the design, we replaced the external pulsegenerator by the on-board FPGA frequency generators(DCMs), to demonstrate that the proposed PUF canalso run fully autonomously. The step of 0.5 MHz iseasily available using the pulse generator. Even a DCMin a Virtex V FPGA can achieve this step as our im-plementation shows. Moreover, a frequency step as lowas 0.038 MHz using a DCM in an FPGA has been re-ported in [8]. The results presented in this paper aregenerated using the frequency generator.

To save the value of pc in memory, we first char-acterized the FTI of the store instruction and ensuredthat the clock frequency never exceeds the transitionregion of the store instruction. This characterizationalso showed that the FTIs of the arithmetic instruc-tions, logical instruction, and the control instructionsare located at a much lower frequency than that of thestore instruction. This ensures that the collected datarepresents the variability of the instruction we want tocharacterize (such as addition and multiplication) andnot the variability of the store instruction.

3.1. Performance evaluation of the PUF

Two parameters are used to evaluate the performanceof the PUF: uniqueness and reliability. Uniqueness is anestimate of the ability of a PUF to uniquely distinguishdifferent chips based on the generated responses. Relia-bility quantifies the reproducibility of the response bits.If two chips, i and j (i 6= j), have n-bit responses, Ri

and Rj respectively for the challenge C, the uniqueness

383

Page 5: A Novel Microprocessor-Intrinsic Physical Unclonable Function

among m chips is defined as:

Uniqueness =2

m(m − 1)

m−1∑

i=1

m∑

j=i+1

HD(Ri, Rj)

n× 100%

(4)

HD stands for Hamming distance. For a chip i, thereliability is estimated as follows.

Reliability = 100%− 1

L

L∑

l=1

HD(Ri, R′i,l)

n× 100%

(5)

where R′i,l is the l-th sample of Ri, and L is the total

number of samples. In order to evaluate these param-eters, we need to form an n-bit response string usingthe PUF. For the proposed PUF, not only the transi-tion regions for an instruction are located at differentfrequencies for different chips, the length of the tran-sition region also varies from one chip to another. Weused 90nm technology FPGAs, and the length of thetransition region is 2 MHz on an average. This requirescareful calibration of the PUF measurements. We usea simple method to construct the response string in or-der to evaluate PUF performances. Let us explain withan example of the addition instruction. Figure 5shows the result in terms of pc values for the opera-tion adding 0x7FFFFFFF and 0x1. The pc values areshown as percentage of k with k = 100 (refer to Equa-tion (3)). Among the five chips, chip2 has the lowestfstart (shown as fmin in Figure 5), and chip5 has thehighest fend (shown as fmax in Figure 5). Before fmin,each of the chips has 100% pc value, and hence all ofthem produce ‘11’ in this region. On the other hand,after fmax, all the chips have a pc value of 0 leadingto ‘00’ response by all. Therefore, there is no infor-mation available outside the range from fmin to fmax.Therefore, we derive response bits using Equation (3)starting at fmin until fmax with a step of 0.5 MHz. With

115 120 125 130 135 140 1450

20

40

60

80

100

Frequency (MHz)

Pa

ss

Co

un

t (%

)

Chip1

Chip2

Chip3

Chip4

Chip5

fmax = 140.5 MHzfmin = 117.5 MHz

Fig. 5. Result for ADD 0x7FFFFFFF, 0x1

130 135 140 145 150 1550

20

40

60

80

100

Frequency (MHz)

Pass C

ou

nt

(%)

130 135 140 145 150 1550

20

40

60

80

100

Frequency (MHz)

Pass C

ou

nt

(%)

Chip1

Chip2

Chip3

Chip4

Chip5fmin = 128.5 MHz fmax = 154.5 MHz

fmin = 131 MHz fmax = 156.5 MHz

Fig. (a)

Fig. (b)

Fig. 6. (a)Result for MUL 0xFFFFFFFF,0xFFFFFFFF (b)Result for MUL 0xFFFFFFFF,0x80000001

105 110 115 120 125 130 135 140 1450

20

40

60

80

100

Frequency (MHz)

Pa

ss

Co

un

t (%

)

105 110 115 120 125 130 135 140 1450

20

40

60

80

100

Frequency (MHz)

Pa

ss

Co

un

t (%

)

Chip1

Chip2

Chip3

Chip4

Chip5

fmin = 110 MHz fmax = 131 MHz

fmin = 117 MHz fmax = 141 MHz

Fig. (a)

Fig. (b)

Fig. 7. (a)Result for DIV0xFFFFFFFE00000001,0xFFFFFFFF (b)Resultfor DIV 0xFA0, 0x14

115 120 125 130 135 140 1450

20

40

60

80

100

Frequency (MHz)

Pa

ss

Co

un

t (%

)

115 120 125 130 135 140 1450

20

40

60

80

100

Frequency (MHz)

Pa

ss

Co

un

t (%

)

Chip1

Chip2

Chip3

Chip4

Chip5

fmin = 117 MHz

fmin = 117 MHz

fmax = 141.5 MHz

fmax = 139 MHz

Fig. (a)

Fig. (b)

Fig. 8. (a)Result for AND 0xFFFFFFFF,0xAAAAAAAA (b)Result for BGE instruction

384

Page 6: A Novel Microprocessor-Intrinsic Physical Unclonable Function

Table 1. Summary of PUF performance for different operationsOperation (No of Responses) Uniqueness ReliabilityADD 0x7FFFFFFF, 0x1 (94) 38.7% 97.4%MULT 0xFFFFFFFF, 0xFFFFFFFF (106) 36% 98.1%MULT 0xFFFFFFFF, 0x80000001 (104) 36.1% 98%DIV 0xFFFFFFFE00000001, 0xFFFFFFFF (86) 38.1% 99%DIV 0x0000000000000FA0, 0x14 (98) 37.3% 95.6%AND 0xFFFFFFFF, 0xAAAAAAAA (100) 36% 99%BGE (90) 40.6% 98.3%

fmin= 117.5 MHz and fmax = 140.5 MHz, there are 47sampling points in total each producing 2 response bitsaccording to Equation (3). Therefore, there are 47 ×2 = 94 responses produced by each of the chips. Theaverage uniqueness among five chips based on a 94-bitresponse string is 38.7% (refer to Table 1). We alsotested different input operands such as different lengthof carry propagation. However, our observation is thatno significant variability is found based on carry lengthvariation for the addition instruction.

For the unsigned multiplication operation, we presentresults for two pairs of operands: (0xFFFFFFFF,0xFFFFFFFF) and (0xFFFFFFFF, 0x80000001). Fig-ure 6 shows that this instruction has an FTI that is de-pendent on the input operands. For each of the testedchips, the transition region for the two multiplicationsare located at different frequencies. For example, inChip2, fstart=128.5 MHz for the first pair of operands(shown as fmin in Figure 6(a)) while fstart=131 MHzfor the second pair of operands (shown as fmin in Fig-ure 6(b)). Using the same method as described for theaddition instruction, the multiplication instruction pro-duced 106 response bits for 53 sampling frequencies forthe first pair of operands. The second pair of operandsproduced 104 response bits for 52 sampling frequencies.The average uniqueness values are 36% and 36.1% re-spectively.

Similar to the unsigned multiplication operation,we observed operand dependency in case of the un-signed division instruction also. We present two cases:(0xFFFFFFFE00000001, 0xFFFFFFFF) and (0xFA0,0x14). Figure 7 shows that the two pairs of operandsled to FTIs that are located at distinct frequency lo-cations. The uniqueness values are 38.1% and 37.3%respectively for the two cases respectively.

There are nine different logic instructions supportedby the SPARC instruction set. We characterized all ofthem. The results show that all of the instruction havea very similar FTI. We have presented the result for theAND operation with the operand pair (0xFFFFFFFF,0xAAAAAAAA) in Figure 8(a). The measured unique-

ness is 36%. Finally, we present results for the controltransfer instruction. There are sixteen branch instruc-tions supported by the SPARC instruction set. Like thelogical operations, they also show similar FTIs. Figure8(b) shows the result for the instruction BGE (branchon greater or equal). It can be noticed that unlike othertypes of instruction, the branch instruction has a veryshort transition region for all the chips and reaches a pcvalue of 0 abruptly. The average uniqueness producedby this instruction is 40.6%.

Table 1 shows a summary for different instructionsused as challenge for the proposed PUF. Though theaverage uniqueness value shows good result, it deviatesfrom the ideal value of 50% of fully random responses.This is because the proportion of ‘0’s and ‘1’s in theproduced responses is partly biased. The smaller pop-ulation size of five chips partially contributes to a lowvalue of uniqueness. We present the uniqueness valuewithout any post-processing. By using hashing or othertechniques, as has been used by several other PUFs,the uniqueness can be improved. Moreover, the pre-transition region produces only ‘11’ as response whilethe post-transition region produces only ‘00’. However,it still helps to distinguish one chip from another iftwo chips have different FTIs. The experimental re-sults support that. For example, in Figure 5, fromthe frequency 120 MHz to 127.5 MHz, chip1 is in pre-transition region producing only ‘11’ as responses. Inthe same frequency range, chip2 is in post-transition re-gion producing only ‘00’ as responses, and hence theyare distinguished. The PUF shows high reliability. Thereliability parameter is estimated based on 100 mea-surement samples. A value close to 100% indicates thatthe PUF responses are highly reliable at the normaloperating condition. We plan the evaluation of envi-ronmental reliability (temperature, voltage) as futurework.

385

Page 7: A Novel Microprocessor-Intrinsic Physical Unclonable Function

3.2. Security analysis of the proposed PUF

We estimate the number of secure response bits for au-thentication. By secure bits, we mean those responsebits that cannot be predicted given the informationabout other response bits. The number of secure re-sponse bits is a function of the number of instructionswe characterize, as well as the frequency spacing withwhich we sample the transition region (0.5 MHz in ourexperiment). These are the design parameters of theproposed PUF.

Let us assume that an adversary is attempting topredict the response r at a frequency fi. We assumethat she knows the value of r at the previous sam-pling frequency fi−1 and the next sampling frequencyfi+1 (fi−1 <fi <fi+1). Then she can guess r at fi tosome extent. If r at fi−1 and r at fi+1 are same, it isobvious that r at fi has the same value too. This isbecause with increase in frequency, pc can either staythe same or decrease. On the other hand if r at fi−1

and r at fi+1 are not same, different possibilities mayarise. Table 2 shows the possibilities depending on allpossible combination of r at fi−1 and r at fi+1 whenthey are different. Assuming the possible values of rat fi are equally likely, we can estimate the numberof secure bits at fi using Shannon’s formula. For ex-ample, the first row in the third column in Table 2shows that the possible value of r at fi is ‘11’ or ‘10’i.e two possibilities. Hence the number of secure bitsis 2 × (1/2) × (−log2(1/2)) = 1. In this case, r cannothave values ‘01’ and ‘00’ as pc may either stay constantor may decrease with increase in frequency. Similarly,for the second row, it is 3× (1/3)× (−log3(1/3)) ≈ 1.5.

This analysis shows that secure bits are available atthe transition regions. There are seven transition re-gions for five instructions with multiplication and divi-sion having two operations each. With each transitionregion producing around five secure bits on an average,each chip has 37 secure bits on an average as shown intable Table 3.

Table 2. Secure response bit estimation chartr atfi−1

r atfi+1

Possible r atfi

No of securebits at fi

11 10 11, 10 111 01 11, 10, 01 1.511 00 11, 10, 01, 00 210 01 10, 01 110 00 10, 01, 00 1.501 00 01, 00 1

Table 3. Estimated Number of secure response bitsChip1 Chip2 Chip3 Chip4 Chip5 Average

38 38 37 38 34 37

4. RELATED WORK

Besides the SRAM PUF and the DFF-based PUF thathave been already discussed, there are several otherworks that are relevant to this work. Suh et al. pro-posed a secure processor architecture using PUF as asource of secret [9]. The proposed architecture providedsecure environment for trusted computing by prevent-ing physical attacks on memory and tampering of pro-gram. This work did not propose any new PUF; theyused arbiter PUF in their architecture. In our work, weexploit the variability in a processor pipeline to builda new PUF. In another work, hardware authentica-tion is achieved by calculating a checksum using micro-architectural complexity of a processor model [10]. Thistechnique can distinguish one processor model from an-other; distinguishing one chip from the other is not pos-sible by it. We instead focus on chip-level authentica-tion.

Wong et al. used variable clock to characterize thedelay variability in several components of an FPGAsuch as LUTs, flip-flops [8]. In our proposed PUF, wealso employ variable clock frequency to a soft-core mi-croprocessor in an FPGA in order to measure variabil-ity. However, the work by Wong et al. only focusedon characterizing variability in generic FPGA compo-nents. We instead built a PUF mechanism using thevariability in a microprocessor pipeline. Majzoobi etal. used a very similar technique as proposed in [8] tobuild a PUF [11]. However, their PUF exploits the de-lay variability of FPGA building blocks, and not of amicroprocessor pipeline like ours.

5. CONCLUSION

We introduced a microprocessor-intrinsic PUF. We in-troduced the concept supported by an on-chip imple-mentation and experimental results. The experimentalresults shows a moderate value of uniqueness in the re-sponse bits, whereas the reliability is high at normaloperating condition. As part of the future work, weplan to characterize those instructions that have notbeen tested. We would also like to test the reliability ofthe PUF under varying temperature and supply volt-age. A detailed security analysis is also a part of thefuture plan.

386

Page 8: A Novel Microprocessor-Intrinsic Physical Unclonable Function

Acknowledgment

This research was supported in part through NSF grantno 0964680 and NSF grant no 0855095.

6. REFERENCES

[1] G. E. Suh and S. Devadas, “Physical unclonable func-tions for device authentication and secret key gener-ation,” in Proceedings of the 44th annual Design Au-tomation Conference, ser. DAC ’07, 2007, pp. 9–14.

[2] J. Guajardo, S. Kumar, G.-J. Schrijen, and P. Tuyls,“Brand and ip protection with physical unclonablefunctions,” in Circuits and Systems, 2008. ISCAS2008. IEEE International Symposium on, may 2008,pp. 3186 –3189.

[3] S. Devadas, E. Suh, S. Paral, R. Sowell, T. Ziola, andV. Khandelwal, “Design and implementation of puf-based ”unclonable” rfid ics for anti-counterfeiting andsecurity applications,” in RFID, 2008 IEEE Interna-tional Conference on, april 2008, pp. 58–64.

[4] S. Kumar, J. Guajardo, R. Maes, G.-J. Schrijen, andP. Tuyls, “Extended abstract: The butterfly puf pro-tecting ip on every fpga,” in Hardware-Oriented Secu-rity and Trust, 2008. HOST 2008. IEEE InternationalWorkshop on, 2008, pp. 67 –70.

[5] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls,“Fpga intrinsic pufs and their use for ip protection,” inProceedings of the 9th international workshop on Cryp-tographic Hardware and Embedded Systems, ser. CHES’07. Berlin, Heidelberg: Springer-Verlag, 2007, pp.63–80.

[6] D. Holcomb, W. Burleson, and K. Fu, “Power-up sramstate as an identifying fingerprint and source of truerandom numbers,” Computers, IEEE Transactions on,vol. 58, no. 9, pp. 1198 –1210, sept. 2009.

[7] R. Maes, P. Tuyls, and I. Verbauwhede, “Intrinsicpufs from flip-flops on reconfigurable devices,” in 3rdBenelux Workshop on Information and System Secu-rity (WISSec 2008), Eindhoven,NL, 2008, p. 17.

[8] J. S. J. Wong, P. Sedcole, and P. Y. K. Cheung, “Self-measurement of combinatorial circuit delays in fpgas,”ACM Trans. Reconfigurable Technol. Syst., vol. 2, pp.10:1–10:22, June 2009.

[9] G. Suh, C. O’Donnell, I. Sachdev, and S. Devadas, “De-sign and implementation of the aegis single-chip secureprocessor using physical random functions,” in Com-puter Architecture, 2005. ISCA ’05. Proceedings. 32ndInternational Symposium on, june 2005, pp. 25 – 36.

[10] D. Y. Deng, A. H. Chan, and G. E. Suh, “Hardware au-thentication leveraging performance limits in detailedsimulations and emulations,” in Proceedings of the 46thAnnual Design Automation Conference, ser. DAC ’09,2009, pp. 682–687.

[11] M. Majzoobi, A. Elnably, and F. Koushanfar, “Fpgatime-bounded unclonable authentication,” in Proceed-ings of the 12th international conference on Informa-tion hiding, ser. IH’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 1–16.

387