
Verifiable ASICs: trustworthy hardware with untrusted components

Riad S. Wahby◦⋆, Max Howald†⋆, Siddharth Garg⋆, abhi shelat‡, and Michael Walfish⋆

◦Stanford University  ⋆New York University  †The Cooper Union  ‡The University of Virginia

June 10th, 2016


Setting: ASICs with mutually distrusting designer, manufacturer

[Diagram: the Principal (government, chip designer) sends a chip design to the Manufacturer (“foundry” or “fab”)]

Here we are thinking about ASICs, not CPUs:

[Diagram: a CPU (RAM, cache, register file, ALU) vs. an ASIC (inputs in[0] … in[n] feeding logic and D/Q flip-flops)]


Untrusted manufacturers can craft hardware Trojans

e.g., a network firewall appliance with a custom chip for packet processing

What if our packet-processing chip has a back door?

Threat: incorrect execution of the packet filter

(Other concerns, e.g., secret state, are important but orthogonal)

US DoD controls its supply chain with trusted foundries.


Trusted fabs are the only way to get strong guarantees

For example, stealthy Trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013]

But trusted fabrication is not a panacea:

✗ Only 5 countries have cutting-edge fabs on-shore

✗ Building a new fab takes $$$$$$ and years of R&D

✗ Semiconductor scaling: chip area and energy go with the square and cube of transistor length (“critical dimension”); see the sketch below

✗ So using an old fab means an enormous performance hit, e.g., India’s best on-shore fab is 10^8× behind the state of the art

Can we get trust more cheaply?
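To make the scaling bullet concrete, here is a back-of-the-envelope sketch in Python. The node sizes are hypothetical examples, not figures from the talk; the square and cube exponents are the ones stated above.

```python
# Back-of-the-envelope sketch of the scaling bullet above: chip area
# grows with the square, and energy with the cube, of the critical
# dimension. The node sizes below are hypothetical examples.

def penalty(old_nm: float, new_nm: float) -> tuple[float, float]:
    """(area, energy) cost of building at old_nm instead of new_nm."""
    ratio = old_nm / new_nm
    return ratio ** 2, ratio ** 3

area_x, energy_x = penalty(old_nm=350.0, new_nm=14.0)
print(f"area: {area_x:.0f}x, energy: {energy_x:.0f}x")  # area: 625x, energy: 15625x
```

Even this modest 25× gap in critical dimension compounds into orders of magnitude of area and energy, which is why an old trusted fab is such an expensive hedge.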


Verifiable ASICs

Principal: F → designs for P, V

Untrusted fab (fast) builds P

Trusted fab (slow) builds V

Integrator assembles V and P

[Diagram: V sends input x to P; P returns output y and a proof that y = F(x)]
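This shape suggests a thin interface between the two chips. Below is a minimal sketch (Python, with hypothetical names throughout; nothing here is Zebra’s actual API) of how an integrator might compose a trusted V with an untrusted P:

```python
# Minimal sketch of the deployment shape above (hypothetical interfaces;
# not Zebra's actual API). The integrator wires a slow trusted V to a
# fast untrusted P and releases an output only if V accepts the proof.
from dataclasses import dataclass
from typing import Protocol, Tuple

class Prover(Protocol):
    """Built by the fast, untrusted fab."""
    def run(self, x: int) -> Tuple[int, object]: ...   # returns (y, proof)

class Verifier(Protocol):
    """Built by the slow, trusted fab."""
    def check(self, x: int, y: int, proof: object) -> bool: ...

@dataclass
class Integrator:
    v: Verifier
    p: Prover

    def compute(self, x: int) -> int:
        y, proof = self.p.run(x)
        if not self.v.check(x, y, proof):
            raise RuntimeError("proof rejected: P misbehaved")
        return y  # correct with high probability, even if P is Trojaned
```

The point of the division of labor is that V can be small and slow (built at a trusted, older node) while P is large and fast (built at an advanced, untrusted node).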

Can we build Verifiable ASICs?

[Diagram: V + P, with input x, output y, and a proof that y = F(x), vs. simply building F in a trusted fab]

Makes sense if V + P are cheaper than trusted F

[Figure (screenshot from a survey of built systems): worker’s cost normalized to native C, log scale from 10^0 to 10^13, for matrix multiplication (m=128) and PAM clustering (m=20, d=128), across Pepper, Ginger, Pinocchio, Zaatar, CMT, native C, Allspice, TinyRAM, and Thaler, annotated “10^8×”. Caption: “Figure 5—Prover overhead normalized to native execution cost for two computations. Prover overheads are generally enormous.” Backdrop: a wall of citations from theory (Babai85, GMR85, BCC86, BFLS91, FGLSS91, Kilian92, ALMSS92, AS92, Micali94, BG02, GOS06, IKO07, GKR08, KR09, GGP10, Groth10, GLR11, Lipmaa11, BCCT12, GGPR13, BCCT13, KRR14, …) and systems (SBW11, CMT12, SMBW12, TRMP12, …, KZMQCPPsS15)]

Reasons for hope:

• Running time of V < F (asymptotically)

• Implementations exist

• P overheads are massive, but using an advanced fab might offset these costs


Reasons for caution:

• Theory is silent about feasibility

• Onus is heavier than in prior work

• Hardware issues: energy, chip area

• Need a physically realizable circuit design

• Need V to save costs at plausible computation sizes


A qualified success

Zebra: a hardware design that saves costs . . . sometimes.

Probabilistic proof protocols, briefly

[Diagram: V sends input x; P returns output y and a proof that y = F(x)]

F must be expressed as an arithmetic circuit (AC): a generalized Boolean circuit over F_p (∧ → ×, ∨ → +)

AC satisfiable ⇐⇒ F was executed correctly

P convinces V that the AC is satisfiable

What about other schemes? e.g., FHE [GGP10], MIP+FHE [BC12], MIP [BTWV14], PCIP [RRR16], IOP [BCS16], PIR [BHK16], … These all seem a bit further from practicality.
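To ground the AC representation, here is a minimal sketch (Python) of evaluating a layered arithmetic circuit over F_p. The gate encoding is a hypothetical toy format, not the one any of these systems use, and the satisfiability and proof machinery is omitted entirely.

```python
# Minimal sketch: evaluating a layered arithmetic circuit over F_p.
# The ("add"/"mul", left, right) gate encoding is a toy format.
P = 2**61 - 1  # prime field modulus (illustrative choice)

def eval_layer(gates, wires):
    """Each gate reads two wire indices from the layer below."""
    out = []
    for op, left, right in gates:
        if op == "add":          # ∨ maps to + in the generalized circuit
            out.append((wires[left] + wires[right]) % P)
        else:                    # "mul": ∧ maps to ×
            out.append((wires[left] * wires[right]) % P)
    return out

def eval_circuit(layers, inputs):
    wires = list(inputs)
    for gates in layers:         # evaluate layer by layer, inputs upward
        wires = eval_layer(gates, wires)
    return wires

# F(x0, x1, x2, x3) = (x0 + x1) * (x2 * x3) as a depth-2 layered AC
layers = [
    [("add", 0, 1), ("mul", 2, 3)],  # layer nearest the inputs
    [("mul", 0, 1)],                 # output layer
]
print(eval_circuit(layers, [3, 4, 2, 5]))  # -> [70]
```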

Two protocol families:

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark:
+ Nondeterministic ACs, arbitrary connectivity
+ Few rounds (≤ 3)
✗ Unsuited to hardware implementation

IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice:
– Deterministic ACs; layered, low depth
– Many rounds
✓ Suited to hardware implementation


Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

F must be expressed as a layered arithmetic circuit.

[Diagram: V sends x; P (thinking…) returns y; then a sum-check [LFKN90], followed by more sum-checks, round by round]

1. V sends inputs

2. P evaluates, returns output y

3. V constructs a polynomial relating y to the last layer’s input wires

4. V engages P in a sum-check (sketched below), gets a claim about the second-to-last layer

5. V iterates, gets a claim about the inputs, which it can check
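The sum-check protocol does the heavy lifting in steps 4 and 5. Here is a minimal, self-contained sketch (Python) of the [LFKN90] sum-check over F_p for a toy polynomial; it is illustrative only. The polynomial g is a stand-in, not a GKR layer polynomial, and an honest prover would compute its round messages far more cheaply than this brute-force summation.

```python
# Minimal sum-check sketch [LFKN90] over F_p, the core subroutine in
# GKR-style IPs. Illustrative only: g is a toy polynomial, and a real
# prover never evaluates its messages by brute-force summation.
import itertools, random

P = 2**61 - 1  # prime field modulus (2^61 - 1 is a Mersenne prime)

def g(x):
    # Toy 3-variate polynomial standing in for a GKR layer polynomial.
    return (x[0] * x[1] + 2 * x[1] * x[2] + x[0]) % P

N = 3  # number of variables

def partial_sum(prefix, t):
    """Honest prover's round message h_i(t): fix earlier variables to
    the verifier's challenges (prefix), set the current variable to t,
    and sum g over boolean settings of the remaining variables."""
    total = 0
    for tail in itertools.product((0, 1), repeat=N - len(prefix) - 1):
        total = (total + g(list(prefix) + [t] + list(tail))) % P
    return total

# Prover's overall claim: H = sum of g over the boolean cube {0,1}^3.
H = sum(g(list(x)) for x in itertools.product((0, 1), repeat=N)) % P

claim, challenges = H, []
for i in range(N):
    h = lambda t: partial_sum(challenges, t)   # prover's round message
    assert (h(0) + h(1)) % P == claim          # verifier's consistency check
    r = random.randrange(P)                    # verifier's random challenge
    claim = h(r)
    challenges.append(r)

# Final check: one evaluation of g at a random point. (In GKR, this very
# evaluation becomes a claim about the next layer down, chaining the
# sum-checks in step 5 above.)
assert g(challenges) % P == claim
print("sum-check accepted; H =", H)
```

Each round replaces a sum over one Boolean variable with an evaluation at a random field element; soundness rests on the fact that a lying prover must forge a low-degree univariate polynomial that survives the verifier’s check at a random point.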


4. V engages P in a sum-check, getsclaim about second-last layer

5. V iterates

, gets claim aboutinputs, which it can check

V Px

thinking...

y

thinking...

... sum-check[LFKN90]

more sum-checks

Page 44: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

1. V sends inputs

2. P evaluates, returns output y

3. V constructs polynomial relatingy to last layer’s input wires

4. V engages P in a sum-check, getsclaim about second-last layer

5. V iterates, gets claim aboutinputs, which it can check

V Px

thinking...

y

thinking...

... sum-check[LFKN90]

more sum-checks
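A minimal sketch of the verifier's side of one sum-check round (illustrative interfaces, not Zebra's; assumes a degree-2 round polynomial H_j reported by its values at 0, 1, 2, as in the per-layer computation later in the deck):

    import random

    p = 2**61 - 1  # illustrative prime modulus

    def eval_deg2(h0, h1, h2, r):
        """Evaluate the degree-2 poly through (0,h0), (1,h1), (2,h2) at r, mod p."""
        inv2 = pow(2, -1, p)                      # modular inverse of 2
        l0 = (r - 1) * (r - 2) % p * inv2 % p     # / (0-1)(0-2) = / 2
        l1 = r * (r - 2) % p * (p - 1) % p        # / (1-0)(1-2) = / (-1)
        l2 = r * (r - 1) % p * inv2 % p           # / (2-0)(2-1) = / 2
        return (h0 * l0 + h1 * l1 + h2 * l2) % p

    def sumcheck_verify(claim, num_rounds, prover_round):
        """prover_round(j, challenges) returns H_j(0), H_j(1), H_j(2)."""
        challenges = []
        for j in range(num_rounds):
            h0, h1, h2 = prover_round(j, challenges)
            if (h0 + h1) % p != claim % p:        # consistency with running claim
                raise ValueError(f"sum-check failed at round {j}")
            r = random.randrange(p)               # V's random coin
            claim = eval_deg2(h0, h1, h2, r)      # new claim: the value H_j(r)
            challenges.append(r)
        return claim, challenges                  # final claim checked against inputs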

Pages 45–47:

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

Soundness error ∝ 1/p

Cost to execute F directly: O(depth · width)

V's sequential running time: O(depth · log width + |x| + |y|) (assuming precomputed queries)

P's sequential running time: O(depth · width · log width)
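To make the asymptotics concrete, an illustrative instance (numbers chosen for this example, not taken from the paper): a layered AC with depth d = 40 and width w = 2^10.

\[
  \underbrace{d \cdot w}_{\text{direct execution}} \approx 4.1\times10^{4},
  \qquad
  \underbrace{d \cdot w \log w}_{\mathcal{P}\text{'s time}} \approx 4.1\times10^{5},
  \qquad
  \underbrace{d \log w + |x| + |y|}_{\mathcal{V}\text{'s time}} \approx 400 + |x| + |y|.
\]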

Pages 48–51:

Extracting parallelism in Zebra

P executing the AC: layers are sequential, but all gates at a layer can be executed in parallel.

Proving step: can V and P interact about all of F's layers at once?

No. V must ask questions in order, or soundness is lost.

But: there is still parallelism to be extracted. . .

Pages 52–59:

Extracting parallelism in Zebra's P

V questions P about F(x1)'s output layer; simultaneously, P returns F(x2).

Next, V questions P about F(x1)'s next layer and F(x2)'s output layer; meanwhile, P returns F(x3).

This process continues until V and P interact about every layer simultaneously, but for different computations.

V and P can complete one proof in each time step.

Pages 60–61:

Extracting parallelism in Zebra's P with pipelining

[Diagram: input x enters a chain of sub-provers, one per layer (layer 0, layer 1, . . . , layer d−1), producing output y; each sub-prover exchanges queries and responses with V.]

This approach is just a standard hardware technique, pipelining; it is possible because the protocol is naturally staged.

There are other opportunities to leverage the protocol's structure.
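A toy model of this schedule (illustrative Python, not Zebra's RTL): at time step t, the sub-prover for layer L works on instance t − L, so once the pipeline fills, one proof completes per step.

    def pipeline_schedule(d, steps):
        """Print which (instance, layer) pairs are active at each time step."""
        for t in range(steps):
            active = [(t - layer, layer) for layer in range(d) if t - layer >= 0]
            completed = max(t - d + 2, 0)   # instance i finishes at step i + d - 1
            print(f"step {t}: active={active}, proofs completed={completed}")

    pipeline_schedule(d=3, steps=6)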

Pages 62–72:

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

    H[k] = Σ_{g ∈ layer} δ(g, k)

In software (a runnable Python rendering appears below):

    // compute H[0], H[1], H[2]
    for k ∈ {0, 1, 2}:
        H[k] ← 0
        for g ∈ layer:
            H[k] ← H[k] + δ(g, k)    // δ uses state[g]

    // update lookup table with V's random coin
    for g ∈ layer:
        state[g] ← δ(g, r_j)

In hardware: one gate prover per gate evaluates δ(g, 0), δ(g, 1), δ(g, 2), and δ(g, r_j) in parallel; an adder tree sums the per-gate terms, and each gate prover holds state[g] in a local register instead of in RAM.
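A runnable rendering of the software pseudocode above (Python; delta and state are illustrative stand-ins for the protocol's actual per-gate polynomial and lookup table):

    p = 2**61 - 1  # illustrative prime modulus

    def sumcheck_round(layer, state, delta, r_j):
        """One round: compute H[0..2], then refresh state with V's coin r_j."""
        H = [0, 0, 0]
        for k in range(3):                       # compute H[0], H[1], H[2]
            for g in layer:
                H[k] = (H[k] + delta(g, k, state)) % p   # delta uses state[g]
        # update the lookup table with V's random coin
        new_state = {g: delta(g, r_j, state) for g in layer}
        return H, new_state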

Pages 73–75:

Zebra's design approach

✓ Extract parallelism
    e.g., pipelined proving
    e.g., parallel evaluation of δ by gate provers

✓ Exploit locality: distribute data and control
    e.g., no RAM: data is kept close to the places it is needed
    e.g., latency-insensitive design: localized control

✓ Reduce, reuse, recycle
    e.g., computation: save energy by adding memoization to P (see the sketch below)
    e.g., hardware: save chip area by reusing the same circuits
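A sketch of the memoization idea (illustrative; Zebra memoizes intermediate terms in hardware, not via a Python cache): many gate provers share equality-polynomial terms, so each distinct term can be computed once.

    from functools import lru_cache

    p = 2**61 - 1  # illustrative prime modulus

    @lru_cache(maxsize=None)
    def chi(bit, r):
        """chi(b, r) = r if b = 1, else 1 - r; shared by many gates."""
        return r % p if bit else (1 - r) % p

    def eq(bits, rs):
        """Multilinear equality polynomial, reusing cached chi terms."""
        acc = 1
        for b, r in zip(bits, rs):
            acc = acc * chi(b, r) % p
        return acc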

Pages 76–81:

Architectural challenges

Interaction between V and P requires a lot of bandwidth
✗ V and P on a circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

Protocol requires input-independent precomputation [VSBW13]
✓ Zebra amortizes precomputations over many V-P pairs

Precomputations need secrecy, integrity
✗ Give V trusted storage? Cost would be prohibitive
✓ Zebra uses untrusted storage + authenticated encryption (sketched below)

[Diagram: rather than holding the i-th precomputation prei in trusted storage, V fetches an authenticated ciphertext Ek(prei) from untrusted storage, then interacts with P on input x, output y, and a proof that y = F(x).]
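A software sketch of that last point, using AES-GCM from the pyca/cryptography library (illustrative: the store/load helpers and index-binding choice are ours, not necessarily Zebra's exact scheme):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)   # provisioned inside trusted V
    aead = AESGCM(key)

    def store(i, pre_i):
        """Encrypt precomputation pre_i for untrusted storage."""
        nonce = os.urandom(12)
        # bind the blob to its index i so blobs cannot be swapped or replayed
        ct = aead.encrypt(nonce, pre_i, str(i).encode())
        return nonce, ct

    def load(i, nonce, ct):
        """Raises InvalidTag if untrusted storage tampered with the blob."""
        return aead.decrypt(nonce, ct, str(i).encode())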

Page 82:

Implementation

Zebra's implementation includes:

• a compiler that produces synthesizable Verilog for P
• two V implementations: hardware (Verilog) and software (C++)
• a library to generate V's precomputations
• Verilog simulator extensions to model software or hardware V's interactions with P

Pages 83–84:

. . . and it seemed to work really well! Zebra can produce 10k–100k proofs per second, while existing systems take tens of seconds per proof!

But that's not a serious evaluation. . .

Pages 85–88:

Evaluation method

[Diagram: Zebra's V-P pair (input x, output y, proof that y = F(x)) versus a native implementation of F.]

Baseline: direct implementation of F in the same technology as V

Metrics: energy and chip size per throughput (discussed in paper)

Measurements: based on circuit synthesis and simulation, published chip designs, and CMOS scaling models

Charge for: V, P, and communication; retrieving and decrypting precomputations; PRNG; operator communicating with V

Constraints: trusted fab = 350 nm; untrusted fab = 7 nm; 200 mm² max chip area; 150 W max total power

350 nm: 1997 (Pentium II); 7 nm: ≈ 2017 [TSMC]; ≈ 20-year gap between trusted and untrusted fab

Page 89:

Application #1: number theoretic transform

NTT: a Fourier transform over F_p

Widely used, e.g., in computer algebra
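A minimal recursive NTT sketch in Python (illustrative field and root of unity; the deck does not specify Zebra's parameters):

    p = 2**64 - 2**32 + 1   # an NTT-friendly prime ("Goldilocks")

    def ntt(a, omega):
        """NTT of a (length a power of 2) with primitive len(a)-th root omega."""
        n = len(a)
        if n == 1:
            return list(a)
        even = ntt(a[0::2], omega * omega % p)
        odd = ntt(a[1::2], omega * omega % p)
        out, w = [0] * n, 1
        for i in range(n // 2):
            t = w * odd[i] % p
            out[i] = (even[i] + t) % p
            out[i + n // 2] = (even[i] - t) % p
            w = w * omega % p
        return out

    n = 8
    omega = pow(7, (p - 1) // n, p)   # 7 generates this field's multiplicative group
    y = ntt(list(range(n)), omega)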

Page 90:

Application #1: number theoretic transform

[Plot: ratio of baseline energy to Zebra energy (log scale, 0.1–3; higher is better) versus log2(NTT size), from 6 to 13.]

Page 91:

Application #2: Curve25519 point multiplication

Curve25519: a commonly-used elliptic curve

Point multiplication: the primitive underlying, e.g., ECDH
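For reference, the software primitive being accelerated, via the pyca/cryptography library (usage sketch only; Zebra implements the point multiplication itself as an AC):

    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    alice = X25519PrivateKey.generate()
    bob = X25519PrivateKey.generate()

    # each exchange() performs one Curve25519 point multiplication
    shared_a = alice.exchange(bob.public_key())
    shared_b = bob.exchange(alice.public_key())
    assert shared_a == shared_b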

Page 92:

Application #2: Curve25519 point multiplication

[Plot: ratio of baseline energy to Zebra energy (log scale, 0.1–3; higher is better) versus the number of parallel Curve25519 point multiplications: 84, 170, 340, 682, 1147.]

Page 93:

A qualified success

Zebra: a hardware design that saves costs. . .

. . . sometimes.

Pages 94–95:

Summary of Zebra's applicability

Applies to IPs, but not arguments:
1. Computation F must have a layered, shallow, deterministic AC
2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
3. Amortizes precomputations over many instances

Common to essentially all built proof systems:
4. Computation F must be very large for V to save work
5. Computation F must be efficient as an arithmetic circuit

[Figure 5, reproduced from a CACM survey: prover overhead normalized to native execution cost for matrix multiplication (m=128) and PAM clustering (m=20, d=128) across Pepper, Ginger, Pinocchio, Zaatar, CMT, Allspice, TinyRAM, and Thaler; overheads range up to ~10^13 and are generally enormous.]

System                               | Amortization regime                     | Advice
Zebra                                | many V-P pairs                          | short
Allspice [VSBW13]                    | batch of instances of a particular F    | short
Bootstrapped SNARKs [BCTV14a, CTV15] | all computations                        | long
BCTV [BCTV14b]                       | all computations of the same length     | long
Pinocchio [PGHR13]                   | all future instances of a particular F  | long
Zaatar [SBVBPW13]                    | batch of instances of a particular F    | long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:

V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU
⇒ break-even point ≥ 16 × 10^6 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16 × 10^6 gates
⇒ breaking even requires > 1 CPU op per AC gate, e.g., computations over F_p rather than machine integers
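The break-even arithmetic behind those figures (our rough rendering of the slide's numbers):

\[
  6\,\mathrm{ms} \times 2.7\,\mathrm{GHz} \approx 1.6 \times 10^{7}\ \text{CPU ops},
\]

so V saves work only if executing F natively costs more than about 16 million operations; and since 32 GB of RAM caps the AC at about $16 \times 10^{6}$ gates, each gate must replace more than one native CPU op.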

Pages 96–99:

Arguments versus IPs, redux

Design principle       | IPs [GKR08, CMT12, VSBW13] | Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism    | ✓                          | ✓
Exploit locality       | ✓                          | ✗
Reduce, reuse, recycle | ✓                          | ✗

Argument protocols seem unfriendly to hardware:

P computes over the entire AC at once ⇒ needs RAM

P does crypto for every gate in the AC ⇒ needs special crypto circuits

. . . but we hope these issues are surmountable!

Page 100: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. By leveraging Pinocchio, Pantry has experimented with simple applications that hide the prover’s state from the verifier, but there is more work to be done here and other notions of privacy that are worth providing. For example, we can provide verifiability while concealing the program that is executed (by composing techniques from Pantry, Pinocchio, and TinyRAM). A speculative application is to produce versions of Bitcoin in which transactions can be conducted anonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this research area really had nothing to do with practice. For example, the PCP theorem is a landmark achievement of complexity theory, but if we were to implement the theory as proposed, generating the proofs, even for simple computations, would have taken longer than the age of the universe. In contrast, the projects described in this article have not only built systems from this theory but also performed experimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, is that these systems are basically toys. Part of the reason that we are willing to label them near-practical is painful experience with what the theory used to cost. (As a rough analogy, imagine a graduate student’s delight in discovering hexadecimal machine code after years spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that a remotely deployed machine is executing correctly. In the streaming context, the verifier may not have space to compute locally, so we could use CMT [21] to check that the outputs are correct, in concert with Thaler’s refinements [57] to make the prover truly inexpensive. Finally, data-parallel cloud computations (like MapReduce jobs) perfectly match the regimes in which the general-purpose schemes perform well: abundant CPU cycles for the prover and many instances of the same computation with different inputs.

Moreover, the gap separating the performance of the current research prototypes and plausible deployment in the cloud is a few orders of magnitude—which is certainly daunting, but, given the current pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead, the effects will go far beyond verifying cloud computations: we will have new ways of building systems. In any situation in which one module performs a task for another, the delegating module will be able to check the answers. This could apply at the micro level (if the CPU could check the results of the GPU, this would expose hardware errors) and the macro level (distributed systems could be built under very different trust assumptions).

But even if none of the above comes to pass, there are exciting intellectual currents here. Across computer systems, we are starting to see a new style of work: reducing sophisticated cryptography and other achievements of theoretical computer science to practice [32, 43, 47, 60]. These developments are likely a product of our times: the preoccupation with strong security of various kinds, and the computers powerful enough to run previously “paper-only” algorithms. Whatever the cause, proof-based verifiable computation is an excellent example of this tendency: not only does it compose theoretical refinements with systems techniques, it also raises research questions in other sub-disciplines of Computer Science. This cross-pollination is the best news of all.

Acknowledgments

We thank Srinath Setty both for his help with this article, including the experimental aspect, and for his deep influence on our understanding of this area. This article has also benefited from many productive conversations with Justin Thaler, whose patient explanations have been most helpful. This draft was improved by experimental assistance from Riad Wahby; by detailed feedback from Alexis Gallagher and the anonymous CACM reviewers; and by comments from Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai, Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish. This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System                                 Amortization regime                       Advice
Zebra                                  many V-P pairs                            short
Allspice [VSBW13]                      batch of instances of a particular F      short
Bootstrapped SNARKs [BCTV14a, CTV15]   all computations                          long
BCTV [BCTV14b]                         all computations of the same length       long
Pinocchio [PGHR13]                     all future instances of a particular F    long
Zaatar [SBVBPW13]                      batch of instances of a particular F      long

Exception: [CMT12] with logspace-uniform ACs
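To read the table concretely: if a system's setup costs S seconds, amortized over N instances with per-instance cost c, the verifier's effective cost is S/N + c. A tiny Python sketch of this arithmetic, with invented numbers:

# Illustrative arithmetic only (numbers invented for the example): a verifier
# pays setup cost S once per amortization unit and c per instance, so its
# effective per-instance cost is S/N + c.
def amortized_cost(setup_s, per_instance_s, n_instances):
    return setup_s / n_instances + per_instance_s

# A long-advice system with 100 s of setup and 10 ms per instance only
# approaches its per-instance cost once N is large:
for n in (1, 100, 10_000):
    print(n, round(amortized_cost(100.0, 0.010, n), 4), "s/instance")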

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:

V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16 × 10^6 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16 × 10^6 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g., computations over F_p rather than machine integers
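As a sanity check on these figures, a small Python sketch (our arithmetic, using only the numbers quoted above): at 2.7 GHz, the 6 ms fixed verification cost alone corresponds to roughly 16 × 10^6 CPU operations.

# Back-of-the-envelope check using only the numbers quoted above
# (2.7 GHz clock, 6 ms fixed verification cost, 3 us per I/O element).
CLOCK_HZ = 2.7e9
FIXED_S = 6e-3
PER_IO_S = 3e-6

def break_even_ops(io_elements):
    # Native CPU ops that F must cost before checking a proof beats rerunning F.
    return (FIXED_S + io_elements * PER_IO_S) * CLOCK_HZ

print(f"{break_even_ops(0):.2e}")     # 1.62e+07, i.e. >= 16 x 10^6 ops
print(f"{break_even_ops(1000):.2e}")  # 2.43e+07: grows with |x| + |y|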

Page 101: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 102: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 103: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 104: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 105: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 106: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages.For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be ableto co-design the language, computational model, and verification ma-chinery: many of the protocols naturally work with parallel computa-tions, and the current verification machinery is already amenable toa parallel implementation. Another systems question is to develop arealistic database application, which requires concurrency, relationalstructures, etc. More generally, an important test for this area—sofar unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. Byleveraging Pinocchio, Pantry has experimented with simple applica-tions that hide the prover’s state from the verifier, but there is morework to be done here and other notions of privacy that are worthproviding. For example, we can provide verifiability while conceal-ing the program that is executed (by composing techniques fromPantry, Pinocchio, and TinyRAM). A speculative application is toproduce versions of Bitcoin in which transactions can be conductedanonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this researcharea really had nothing to do with practice. For example, the PCPtheorem is a landmark achievement of complexity theory, but if wewere to implement the theory as proposed, generating the proofs,even for simple computations, would have taken longer than theage of the universe. In contrast, the projects described in this articlehave not only built systems from this theory but also performedexperimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, isthat these systems are basically toys. Part of the reason that we arewilling to label them near-practical is painful experience with whatthe theory used to cost. (As a rough analogy, imagine a graduatestudent’s delight in discovering hexadecimal machine code afteryears spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that aremotely deployed machine is executing correctly. In the streamingcontext, the verifier may not have space to compute locally, so wecould use CMT [21] to check that the outputs are correct, in concertwith Thaler’s refinements [57] to make the prover truly inexpensive.Finally, data parallel cloud computations (like MapReduce jobs)perfectly match the regimes in which the general-purpose schemesperform well: abundant CPU cycles for the prover and many in-stances of the same computation with different inputs.

Moreover, the gap separating the performance of the currentresearch prototypes and plausible deployment in the cloud is a feworders of magnitude—which is certainly daunting, but, given thecurrent pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead,the effects will go far beyond verifying cloud computations: we willhave new ways of building systems. In any situation in which onemodule performs a task for another, the delegating module will beable to check the answers. This could apply at the micro level (ifthe CPU could check the results of the GPU, this would exposehardware errors) and the macro level (distributed systems could bebuilt under very different trust assumptions).

But even if none of the above comes to pass, there are excitingintellectual currents here. Across computer systems, we are startingto see a new style of work: reducing sophisticated cryptographyand other achievements of theoretical computer science to prac-tice [32, 43, 47, 60]. These developments are likely a product ofour times: the preoccupation with strong security of various kinds,and the computers powerful enough to run previously “paper-only”algorithms. Whatever the cause, proof-based verifiable computationis an excellent example of this tendency: not only does it composetheoretical refinements with systems techniques, it also raises re-search questions in other sub-disciplines of Computer Science. Thiscross-pollination is the best news of all.

AcknowledgmentsWe thank Srinath Setty both for his help with this article, includingthe experimental aspect, and for his deep influence on our under-standing of this area. This article has also benefited from many pro-ductive conversations with Justin Thaler, whose patient explanationshave been most helpful. This draft was improved by experimen-tal assistance from Riad Wahby; by detailed feedback from AlexisGallagher and the anonymous CACM reviewers; and by commentsfrom Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai,Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish.This work was supported by AFOSR grant FA9550-10-1-0073; NSF

System Amortization regime Advice

Zebra many V-P pairs short

Allspice[VSBW13]

batch of instancesof a particular F

short

BootstrappedSNARKs[BCTV14a,CTV15]

all computations long

BCTV[BCTV14b]

all computationsof the same length

long

Pinocchio[PGHR13]

all future instancesof a particular F

long

Zaatar[SBVBPW13]

batch of instancesof a particular F

long

Exception: [CMT12] with logspace-uniform ACs

For example, libsnark [BCTV14b], a highly optimized implementa-tion of [GGPR13] and Pinocchio [PGHR13]:

V ’s work: 6 ms + (|x |+ |y |) · 3 µs on a 2.7 GHz CPU

⇒ break-even point ≥ 16× 106 CPU ops

With 32 GB RAM, libsnark handles ACs with ≤ 16× 106 gates

⇒ breaking even requires > 1 CPU op per AC gate, e.g.,computations over Fp rather than machine integers

Page 107: Verifiable ASICs: trustworthy hardware with untrusted ... · [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013] But

Summary of Zebra’s applicability

Applies to IPs, but not arguments

1. Computation F must have a layered, shallow, deterministic AC

2. Must have a wide gap between cutting-edge fab (for P)and trusted fab (for V)

3. Amortizes precomputations over many instances

4. Computation F must be very large for V to save work

5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

101

105

0

109

103

107

1011

wor

ker’

s co

st

norm

aliz

ed to

nat

ive

C

matrix multiplication (m=128) PAM clustering (m=20, d=128)

N/A

1013

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Pepp

erG

inge

r

Pin

occh

io

Zaa

tar

CM

T

nativ

e C

Alls

pice

Tin

yRA

M

Tha

ler

Figure 5—Prover overhead normalized to native execution cost for twocomputations. Prover overheads are generally enormous.

you’re sending Will Smith and Jeff Goldblum into space to saveEarth; then you care a lot more about correctness than costs (a calcu-lus that applies to ordinary satellites too). More prosaically, there isa scenario with an abundance of server CPU cycles, many instancesof the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce [15] is thereforean encouraging application.

OPEN QUESTIONS AND NEXT STEPSThe main issue in this area is performance, and the biggest problemis the prover’s overhead. The verifier’s costs are also quantitativelyhigher than ideal; qualitatively, we would ideally eliminate the veri-fier’s setup phase while retaining expressivity (such schemes existin theory [11] but have very high overheads, even in principle).

The computational model is also a critical area of focus. OnlyTinyRAM handles data-dependent loops, only Pantry handles re-motely stored inputs, and only TinyRAM and Pantry handle com-putations that work with RAM. Unfortunately, TinyRAM adds highoverhead to the circuit representation for every operation in the givencomputation; Pantry, on the other hand, adds even higher overheadto its constraint representation but only to operations that interactwith state. While improving the overhead of either representationwould be worthwhile, a more general research direction is to movebeyond the circuit and constraint model.

There are also questions in systems and programming languages. For instance, can we develop programming languages that are well-tailored to the circuit or constraint formalism? We might also be able to co-design the language, computational model, and verification machinery: many of the protocols naturally work with parallel computations, and the current verification machinery is already amenable to a parallel implementation. Another systems question is to develop a realistic database application, which requires concurrency, relational structures, etc. More generally, an important test for this area—so far unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. By leveraging Pinocchio, Pantry has experimented with simple applications that hide the prover’s state from the verifier, but there is more work to be done here and other notions of privacy that are worth providing. For example, we can provide verifiability while concealing the program that is executed (by composing techniques from Pantry, Pinocchio, and TinyRAM). A speculative application is to produce versions of Bitcoin in which transactions can be conducted anonymously, in contrast to the status quo [18, 22].

REFLECTIONS AND PREDICTIONS

It is worth recalling that the intellectual foundations of this research area really had nothing to do with practice. For example, the PCP theorem is a landmark achievement of complexity theory, but if we were to implement the theory as proposed, generating the proofs, even for simple computations, would have taken longer than the age of the universe. In contrast, the projects described in this article have not only built systems from this theory but also performed experimental evaluations that terminate before publication deadlines.

So that’s the encouraging news. The sobering news, of course, is that these systems are basically toys. Part of the reason that we are willing to label them near-practical is painful experience with what the theory used to cost. (As a rough analogy, imagine a graduate student’s delight in discovering hexadecimal machine code after years spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that a remotely deployed machine is executing correctly. In the streaming context, the verifier may not have space to compute locally, so we could use CMT [21] to check that the outputs are correct, in concert with Thaler’s refinements [57] to make the prover truly inexpensive. Finally, data-parallel cloud computations (like MapReduce jobs) perfectly match the regimes in which the general-purpose schemes perform well: abundant CPU cycles for the prover and many instances of the same computation with different inputs.

Moreover, the gap separating the performance of the current research prototypes and plausible deployment in the cloud is a few orders of magnitude—which is certainly daunting, but, given the current pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead, the effects will go far beyond verifying cloud computations: we will have new ways of building systems. In any situation in which one module performs a task for another, the delegating module will be able to check the answers. This could apply at the micro level (if the CPU could check the results of the GPU, this would expose hardware errors) and the macro level (distributed systems could be built under very different trust assumptions).

But even if none of the above comes to pass, there are exciting intellectual currents here. Across computer systems, we are starting to see a new style of work: reducing sophisticated cryptography and other achievements of theoretical computer science to practice [32, 43, 47, 60]. These developments are likely a product of our times: the preoccupation with strong security of various kinds, and the computers powerful enough to run previously “paper-only” algorithms. Whatever the cause, proof-based verifiable computation is an excellent example of this tendency: not only does it compose theoretical refinements with systems techniques, it also raises research questions in other sub-disciplines of Computer Science. This cross-pollination is the best news of all.

Acknowledgments

We thank Srinath Setty both for his help with this article, including the experimental aspect, and for his deep influence on our understanding of this area. This article has also benefited from many productive conversations with Justin Thaler, whose patient explanations have been most helpful. This draft was improved by experimental assistance from Riad Wahby; by detailed feedback from Alexis Gallagher and the anonymous CACM reviewers; and by comments from Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai, Guy Rothblum, Riad Wahby, Eleanor Walfish, and Mark Walfish. This work was supported by AFOSR grant FA9550-10-1-0073; NSF


Page 108

Recap

[Diagram: V sends input x to P; P returns output y along with a proof that y = F(x)]

+ Verifiable ASICs: a new approach to building trustworthy hardware under a strong threat model

+ First hardware design for a probabilistic proof protocol

+ Improves performance compared to trusted baseline

– Improvement compared to the baseline is modest

– Applicability is limited:
    precomputations must be amortized
    computation needs to be “big enough”
    large gap between trusted and untrusted technology
    does not apply to all computations

Bottom line: Zebra is plausible—when it applies

https://www.pepper-project.org/
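
In code form, the recap diagram amounts to the interface below (hypothetical names; real systems add setup phases and protocol-specific proof formats):

```python
from typing import Any, Callable, Tuple

def prove(F: Callable[[Any], Any], x: Any) -> Tuple[Any, Any]:
    """Untrusted P: run F on input x, emit output y and a proof."""
    y = F(x)
    proof = None  # placeholder for a protocol-specific transcript
    return y, proof

def verify(x: Any, y: Any, proof: Any) -> bool:
    """Trusted V: accept y only if the proof convinces V that
    y = F(x), using far less work than running F itself."""
    raise NotImplementedError("protocol-specific checks go here")
```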
