Carnegie Mellon
Increasing Intrusion Tolerance Via Scalable Redundancy
Greg Ganger (greg.ganger@cmu.edu)
Natassa Ailamaki, Mike Reiter, Priya Narasimhan, Chuck Cranor
Technical Objective
- To design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better
- Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grows
Targeting three types of services
- Read-write data objects
- Custom "flat" object types for particular applications, notably directories for implementing an intrusion-tolerant file system
- Arbitrary objects that support object nesting
Expected Impact
- Significant efficiency and scalability benefits over today's protocols for intrusion tolerance
- For example, for data services, we anticipate:
  - At least a twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures), with improvements growing as the system scales up
  - A twofold improvement in throughput, again growing with system size
- Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas
The Problem Space
- Distributed services manage redundant state across servers to tolerate faults
- We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client
  - A faulty server or client may behave arbitrarily
- We also make no timing assumptions in this work: an "asynchronous" system
- Primary existing practice: replicated state machines
  - Offers no load dispersion, requires full data replication, and degrades as the system scales, with O(N^2) messages
Our approach
- Combine techniques to eliminate work in common cases
- Server-side versioning
  - Allows optimism, with read-time repair if necessary
  - Allows work to be off-loaded to clients in lieu of server agreement
- Quorum systems (and erasure coding)
  - Allow load dispersion (and more efficient redundancy for bulk data)
- Several other techniques applied to defend against Byzantine actions
- Major risk: could be complex for arbitrary objects
Evaluation
- Scenario I: "centralized server setting"
- Baseline: the BFT library
  - Popular, publicly available implementation of Byzantine fault-tolerant state machine replication (by Castro & Liskov)
  - Reported to be an efficient implementation of that approach
- Two measures:
  - Average latency of operations, from the client's perspective
  - Peak sustainable throughput of operations
- Our consistency definition: linearizability of invocations
Outline
- Overview
- Read-write storage protocol
- Some results
- Continuing work
Read-write block storage
- Clients erasure-code/replicate blocks into fragments
- Storage-nodes version fragments on every write
[Figure: a client erasure-codes a data block into fragments F1-F5, one per storage-node]
Challenges: Concurrency
- Concurrent updates can violate linearizability
[Figure: two clients concurrently write fragments of different data blocks across the same five servers, interleaving their fragments]
Challenges: Server Failures
- Faulty servers can attempt to mislead clients
- Typically addressed by "voting"
[Figure: a client reads fragments 1-5; one faulty server returns a corrupted fragment 4' in place of 4]
Challenges: Client Failures
- Byzantine client failures can also mislead clients
- Typically addressed by submitting each request via an agreement protocol
[Figure: a Byzantine client writes mismatched fragments (2', 4'), so readers cannot decode a consistent data block]
Consistency via versioning
- Leverage versioning storage-nodes for consistency
- Allow writes to proceed with versioning
  - All writes create new data versions
  - Partial writes and concurrency won't destroy data
- Reader detects and resolves update conflicts
  - Concurrency is rare in file-system workloads (typically < 1%)
  - Offloads work to the client, resulting in greater scalability
- Only perform extra work when needed
  - Optimistically assume fault-free, concurrency-free operation
  - Single round-trip for reads and writes in the common case
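The write-side half of this idea can be sketched in a few lines: a toy versioning storage-node (names hypothetical) that only ever inserts new versions, so partial or concurrent writes cannot destroy earlier data, and readers resolve conflicts from the version history later.

```python
import bisect

class StorageNode:
    """Toy versioning storage-node: every write inserts a new
    (timestamp, fragment) version; nothing is overwritten, so a
    partial or concurrent write cannot destroy earlier data."""

    def __init__(self):
        self.versions = []  # kept sorted by logical timestamp

    def write(self, ts, fragment):
        bisect.insort(self.versions, (ts, fragment))

    def read_latest(self):
        # Readers resolve update conflicts from these histories at read time
        return self.versions[-1] if self.versions else None
```

Even out-of-order arrivals leave every version intact; the common-case cost is one insertion per write.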
Our system model
- Crash-recovery storage-node fault model
  - Up to t total bad storage-nodes (crashed or Byzantine)
  - Up to b ≤ t Byzantine (arbitrary faults)
  - So, t - b faults are crash-recovery faults
- Client fault model: any number of crash or Byzantine clients
- Asynchronous timing model
- Point-to-point authenticated channels
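Under this fault model, the configurations evaluated later use N = 2t + 2b + 1 storage-nodes; a one-line helper makes the sizing concrete (a sketch of that bound only, not of the full quorum-size constraints):

```python
def min_storage_nodes(t: int, b: int) -> int:
    """Smallest storage-node count for tolerating t total faults,
    b of them Byzantine (b <= t), per the N = 2t + 2b + 1 bound
    used in the evaluation."""
    assert 0 <= b <= t
    return 2 * t + 2 * b + 1
```

For example, t = b = 1 needs N = 5; t = 4 with b = 1 needs N = 11; t = b = 4 needs N = 17, matching the configurations in the response-time results.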
Read/write protocol
- Unit of update: a block
  - Complete blocks are read and written
  - Erasure coding may be used for space efficiency
- Update semantics: read-write
  - No guarantee about contents between read and write
  - Sufficient for block-based storage
- Consistency: linearizability
- Liveness: wait-freedom
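The slides do not fix a particular erasure code. As a stand-in for the fragment mechanics, here is a toy single-parity code (any m of m+1 fragments rebuild the block); PASIS's real m-of-n codes tolerate more erasures, so this is illustrative only:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, m: int) -> list:
    """Split a block into m data fragments plus one XOR parity
    fragment; any m of the m+1 fragments rebuild the block."""
    assert len(data) % m == 0, "caller pads the block first"
    size = len(data) // m
    frags = [data[i * size:(i + 1) * size] for i in range(m)]
    frags.append(reduce(xor_bytes, frags))  # parity fragment
    return frags

def decode(frags: list, m: int) -> bytes:
    """Rebuild the block from fragments; at most one may be None."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if missing:
        others = [f for f in frags if f is not None]
        frags = list(frags)
        frags[missing[0]] = reduce(xor_bytes, others)
    return b"".join(frags[:m])
```

With m = 3 a 16 KB block becomes four fragments of about 5.3 KB each, i.e., far less redundancy overhead than N full replicas.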
R/W protocol: Write
1. Client erasure-codes data-item into N data-fragments
2. Client tags write requests with a logical timestamp
   - A round-trip is required to read the logical time
3. Client issues requests to at least W storage-nodes
4. Storage-nodes validate integrity of request
5. Storage-nodes insert request into version history
6. Write completes after W requests have completed
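The steps above can be sketched synchronously (a minimal model with hypothetical names; integrity validation in step 4 and waiting for exactly W acks are elided):

```python
class StorageNode:
    """Minimal stand-in storage-node holding a version history."""
    def __init__(self):
        self.versions = {}               # logical timestamp -> fragment

    def max_ts(self):
        return max(self.versions, default=0)

    def write(self, ts, fragment):
        self.versions[ts] = fragment     # step 5: insert into history

def write_block(nodes, fragments, W):
    """Steps 2-6 of the write path (request validation omitted)."""
    ts = max(n.max_ts() for n in nodes) + 1    # step 2: read logical time
    acks = 0
    for node, frag in zip(nodes, fragments):   # step 3: issue requests
        node.write(frag and ts, frag) if False else node.write(ts, frag)
        acks += 1
    assert acks >= W                            # step 6: W requests completed
    return ts
```

In the real asynchronous protocol the client returns as soon as W acknowledgements arrive rather than contacting every node first.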
R/W protocol: Read
1. Client reads the latest version from a subset of storage-nodes
   - The read set is guaranteed to intersect with the latest complete write
2. Client determines the latest candidate write (the "candidate")
   - The set of responses containing the latest timestamp
3. Client classifies the candidate as one of: complete, incomplete, repairable
   - For consistency, only complete writes can be returned
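Step 2 is a one-liner over the responses (tuple layout is a hypothetical choice for illustration):

```python
def latest_candidate(responses):
    """Among (node_id, timestamp, fragment) responses, the candidate
    is the set of responses carrying the highest timestamp."""
    top = max(ts for _, ts, _ in responses)
    return top, [r for r in responses if r[1] == top]
```

The size of the returned set feeds directly into the classification step on the next slide.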
R/W protocol: Read classification
- Based on the client's (limited) system knowledge
  - Failures and asynchrony lead to imperfect information
- Candidate classification rules:
  - Complete: candidate exists on W nodes
    - Candidate is decoded and returned
  - Incomplete: candidate cannot exist on W nodes
    - Read the previous version to determine a new candidate
    - Iterate: perform classification on the new candidate
  - Repairable: candidate may exist on W nodes
    - Repair and return the data-item
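These three rules reduce to simple counting over what the client has observed (a sketch; the actual protocol's thresholds also account for Byzantine responders):

```python
def classify(candidate_count, responses_seen, N, W):
    """Classify a candidate from imperfect knowledge: candidate_count
    responders hold it, and up to N - responses_seen unseen nodes
    might also hold it."""
    possibly_holding = candidate_count + (N - responses_seen)
    if candidate_count >= W:          # provably on W nodes
        return "complete"
    if possibly_holding < W:          # cannot be on W nodes
        return "incomplete"
    return "repairable"               # may be on W nodes
```

For N = 5, W = 3: seeing the candidate on 3 of 5 responders is complete; on 1 of 5 is incomplete (fall back to the previous version); on 2 of 4 is repairable, since one unseen node might hold it.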
Example: Successful read (N=5, W=3, t=1, b=0)
[Figure: timeline across five storage-nodes. A complete write of D0 at T0 reaches three nodes; a partial write of D1 at T1 reaches only one. A client read after T1 finds D1 as the latest candidate, classifies it incomplete, takes D0 as the new candidate, determines D0 complete, and returns D0.]
Example: Repairable read (N=5, W=3, t=1, b=0)
[Figure: timeline across five storage-nodes. After a complete write of D0 and partial writes of D1 and D2, a client read after T2 finds D2 as the latest candidate, classifies it repairable, writes D2 to additional nodes to complete the repair, and returns D2.]
Protecting against Byzantine storage-nodes
- Must defend against servers that modify data in their possession
- Solution: cross checksums [Gong 89]
  - Hash each data-fragment
  - Concatenate all N hashes to form the cross checksum
  - Append the cross checksum to each fragment
  - Clients verify hashes against fragments and use cross checksums as "votes"
[Figure: a data-item encoded into data-fragments; the fragments' hashes are concatenated into the cross checksum]
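The construction is a straight concatenation of fragment hashes (a sketch using SHA-256; the slides do not specify the hash function):

```python
import hashlib

def cross_checksum(fragments):
    """Gong-style cross checksum: the concatenated hash of every
    fragment, stored by each storage-node alongside its fragment."""
    return b"".join(hashlib.sha256(f).digest() for f in fragments)

def verify(index, fragment, cc):
    """Reader-side check: does a returned fragment match its slot
    in the cross checksum? Matching checksums act as votes."""
    h = hashlib.sha256(fragment).digest()
    return cc[index * 32:(index + 1) * 32] == h
```

A server that alters its fragment fails verification against the cross checksum the other servers vouch for.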
Protecting against Byzantine clients
- Must ensure all fragment sets decode to the same value
- Solution: validating timestamps
  - Write: place the hash of the cross checksum in the timestamp
    - Also prevents multiple values being written at the same timestamp
  - Storage-nodes validate their fragment against the corresponding hash
  - Read: regenerate the fragments and cross checksum
[Figure: Byzantine encoding with a "poisonous" fragment — fragment sets F1-F5 that decode to different data-items (≠)]
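The write-side validation can be sketched as follows (hypothetical names; SHA-256 is an assumed hash, and the read-side fragment regeneration is elided):

```python
import hashlib

def cross_checksum(fragments):
    return b"".join(hashlib.sha256(f).digest() for f in fragments)

def make_timestamp(logical_time, fragments):
    """Validating timestamp: logical time plus the hash of the cross
    checksum, so two different values cannot share a timestamp."""
    cc_hash = hashlib.sha256(cross_checksum(fragments)).digest()
    return (logical_time, cc_hash)

def node_validates(ts, index, fragment, cc):
    """Storage-node check on write: the presented cross checksum must
    match the timestamp, and the node's fragment must match its slot."""
    if hashlib.sha256(cc).digest() != ts[1]:
        return False
    return cc[index * 32:(index + 1) * 32] == hashlib.sha256(fragment).digest()
```

A Byzantine client cannot get two inconsistent fragment sets accepted at one timestamp: any substituted fragment or checksum fails one of the two checks.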
Experimental setup
- Prototype system: PASIS
- 20-node cluster
  - Dual 1 GHz Pentium III storage-nodes
  - Single 2 GHz Pentium 4 clients
  - 100 Mb switched Ethernet
- 16 KB data-item size (before encoding)
  - Each fragment is the data-item size, i.e., a blowup of N over the data-item size
PASIS response time
[Figure: mean response time (ms, 0-20) vs. total failures tolerated (t, 1-4), with a 1-way 16 KB ping for reference. Curves: writes and reads under two fault models, b = t and b = 1, with N = 2t + 2b + 1 (up to N = 17 for b = t, N = 11 for b = 1). Read cost is dominated by decode computation; write cost by network delay for redundant fragments.]
Throughput experiment
- Same system set-up as the response-time experiment
- Clients issue read or write requests
  - Increase the number of clients to increase load
- Demonstrate the value of erasure codes
  - Increase m to reduce per storage-node load
- Compare with Byzantine atomic broadcast: the BFT library [Castro & Liskov 99]
  - Supports arbitrary operations
  - Replication (with multicast) limits write throughput
  - O(N^2) messages limit performance scalability
PASIS vs. BFT: Write throughput
[Figure: write throughput (req/s, 0-3500) vs. number of clients (0-8), for b = t = 1. PASIS configurations m = 2, N = 5 and m = 3, N = 6; BFT with m = 1, N = 4. PASIS has higher write throughput than BFT (annotated at 60%): erasure codes reduce per storage-node load, while BFT's replication increases it.]
PASIS vs. BFT: Read throughput
[Figure: read throughput (req/s, 0-3500) vs. number of clients (0-8), for b = t = 1. PASIS configurations m = 2, N = 5 and m = 3, N = 6; BFT with m = 1, N = 4.]
Continuing work
- New testbed: 70 servers connected with switched Gbit/s networking
  - Experiments can then explore higher scalability points
  - Both the baseline and our results will come from this testbed
- Protocol for arbitrary deterministic functions on objects
  - Built from the same basic primitives
- Protocol for objects with nested objects
  - Adds the requirement of replicated invocations
Summary
- Goal: to design, implement, and evaluate new protocols for implementing intrusion-tolerant services that scale better
  - Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grows
- Started with a protocol for read-write storage based on versioning and quorums
  - Scales efficiently (and much better than BFT)
  - Also flexible (can add assumptions to reduce costs)
- Going forward (in progress): generalize the types of objects and operations that can be supported
Questions?
Garbage collection
- Pruning old versions is necessary to reclaim space
- Versions prior to the latest complete write can be pruned
- Storage-nodes need to know the latest complete write
  - In isolation they do not have this information
  - Perform a read operation to classify the latest complete write
- Many possible policies exist for when to clean what
  - Best to clean during idle time (if possible)
  - Rank blocks in order of greatest potential gains
  - Work remains in this area
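Once a storage-node learns the latest complete write's timestamp (via a read operation), the pruning rule itself is simple (a sketch over a hypothetical timestamp-to-fragment map):

```python
def prune(versions, latest_complete_ts):
    """Reclaim versions strictly older than the latest complete
    write; that write itself (and anything newer) must be kept."""
    return {ts: frag for ts, frag in versions.items()
            if ts >= latest_complete_ts}
```

Keeping the latest complete write preserves read correctness; everything newer is kept because it may still become complete.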