40
Anti-Persistence or History Independent Data Structures Moni Naor Vanessa Teague Weizmann Institute Stanford

Anti-Persistence or History Independent Data Structures

  • Upload
    anais

  • View
    37

  • Download
    0

Embed Size (px)

DESCRIPTION

Anti-Persistence or History Independent Data Structures. Moni Naor Vanessa Teague Weizmann Institute Stanford. Why hide your history?. Core dumps Losing your laptop The entire memory representation of data structures is exposed Emailing files - PowerPoint PPT Presentation

Citation preview

Page 1: Anti-Persistence  or History Independent Data Structures

Anti-Persistence or

History Independent Data Structures

Moni Naor Vanessa Teague

Weizmann Institute Stanford

Page 2: Anti-Persistence  or History Independent Data Structures

Why hide your history?

Core dumps Losing your laptop

– The entire memory representation of data structures is exposed

Emailing files– The editing history may be exposed (e.g. Word)

Maintaining lists of people– Sports teams, party invitees

Page 3: Anti-Persistence  or History Independent Data Structures

Making sure that nobody learns from history

A data structure has:– A “legitimate” interface: the set of operations

allowed to be performed on it– A memory representation

The memory representation should reveal no information that cannot be obtained from the “legitimate” interface

Page 4: Anti-Persistence  or History Independent Data Structures

History of history independence

Issue dealt with in Cryptographic and Data Structures communities

Micciancio (1997): history independent trees– Motivation: incremental crypto– Based on the “shape” of the data structure, not including

memory representation– Stronger performance model!

Uniquely represented data structures– Treaps (Seidel & Aragon), uniquely represented dictionaries– Ordered hash tables (Amble & Knuth 1974)

Page 5: Anti-Persistence  or History Independent Data Structures

More History

Persistent Data Structures: possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)– We want the opposite: anti-persistence

Oblivious RAM (Goldreich and Ostrovsky)

Page 6: Anti-Persistence  or History Independent Data Structures

Overview

Definitions History independent open addressing hashing History independent dynamic perfect hashing

– Memory Management

(Union Find) Open problems

Page 7: Anti-Persistence  or History Independent Data Structures

Precise Definitions

A data structure is – history independent: if any two sequences of

operations S1 and S2 that yield the same content induce the same probability distribution on memory representation

– strongly history independent: if given any two sets of breakpoints along S1 and S2 s.t. corresponding points have identical contents, S1 and S2 induce the same probability distributions on memory representation at those points

Page 8: Anti-Persistence  or History Independent Data Structures

Relaxations Statistical closeness Computational indistinguishability

– Example where helpful: erasing Allow some information to be leaked

– Total number of operations– n-history independent: identical distributions if the last n

operations where identical as well Under-defined data structures: same query can yield several

legitimate answers, – e.g. approximate priority queue – Define identical content: no suffix T such that set of permitted results

returned by S1 T is different from the one returned by S2 T

Page 9: Anti-Persistence  or History Independent Data Structures

History independence is easy (sort of)

If it is possible to decide the (lexicographically) “first” sequence of operations that produce a certain contents, just store the result of that

This gives a history independent version of a huge class of data structures

Efficiency is the problem…

Page 10: Anti-Persistence  or History Independent Data Structures

Dictionaries

Operations are insert(x), lookup(x) and possibly delete(x)

The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)

Elements x U some universe Size of table/memory N

Page 11: Anti-Persistence  or History Independent Data Structures

Goal

Find a history independent implementation of dictionaries with good provable performance.

Develop general techniques for history independence

Page 12: Anti-Persistence  or History Independent Data Structures

Approaches

Unique representation– e.g. array in sorted order– Yields strong history independence

Secret randomness– e.g. array in random order– only history independence

Page 13: Anti-Persistence  or History Independent Data Structures

Open addressing: traditional version

Each element x has a probe sequence

h1(x), h2(x), h3(x), ...– Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ...– Double hashing– Uniform hashing

Element is inserted into the first free space in its probe sequence– Search ends unsuccessfully at a free space

Efficient space utilization– Almost all the table can be full

Page 14: Anti-Persistence  or History Independent Data Structures

y

Open addressing: traditional version

y x

y

x arrived before y, so move y

No clash, so insert y

Not history independent becauselater-inserted elements move furtheralong in their probe sequence

Page 15: Anti-Persistence  or History Independent Data Structures

History independent version

At each cell i, decide elements’ priorities independently of insertion order

Call the priority function pi(x,y). If there is a clash, move the element of lower

priority At each cell, priorities must form a total order

Page 16: Anti-Persistence  or History Independent Data Structures

Insertion

xy x

x

p2(x,y)? No, so move xxy

Page 17: Anti-Persistence  or History Independent Data Structures

Search

Same as in the traditional algorithm In unsuccessful search, can quit as soon as

you find a lower-priority element

No deletionsNo deletions

Problematic in open addressing anyway

Page 18: Anti-Persistence  or History Independent Data Structures

Strong history independence

Claim:

For all hash functions and priority functions, the final configuration of the table is independent of the order of insertion.

Conclusion:

Strongly history independent

Page 19: Anti-Persistence  or History Independent Data Structures

Proof of history independence

A static insertion algorithm (clearly history independent):

x3

x5x5

x3

x6

x4

x2

x1

x5

x3x6

x4

x2 x1 p1(x2,x1) so insert x2

p1(x6,x4) and p6(x3,x6), so insert x3

insert x5

x2

x3

x2

x5

x2

x5

x3

Gather up the rejects and restart

x1

x6 x4

p3(x4,x5) and p3(x4,x6).Insert x4 and remove x5

x4x4

x1

x4 x5

x1

Page 20: Anti-Persistence  or History Independent Data Structures

Proof of history independence

Nothing moves further in the static algorithm than in the dynamic one– By induction on rounds of the static alg.

Vice versa– By induction on the steps in the dynamic alg.

Strongly history independent

Page 21: Anti-Persistence  or History Independent Data Structures

Some priority functions

Global– A single priority independent of cell

Random– Choose a random order at each cell

Youth-rules– Call an element “younger” if it has moved less far

along its probe sequence; younger elements get higher priority

Page 22: Anti-Persistence  or History Independent Data Structures

Youth-rules

x

y

p2(x,y) because x has taken fewer steps than y

x

y

y

Use a tie-breaker if steps are equalThis is a priority function

Page 23: Anti-Persistence  or History Independent Data Structures

Specifying a scheme

Priority rule– Choice of priority functions– In Youth-rules – determined by probe sequence

Probe functions– How are they chosen– Maintained– Computed

Page 24: Anti-Persistence  or History Independent Data Structures

Implementing Youth-rules

Let each hi be chosen from a pair-wise independent collection– For any two x and y the r.v. hi(x) and hi(y) are

uniform and independent.

Let h1, h2, h3,… be chosen independently

Example: hi(x) = (ai ·x mod U) + bi mod NSpace: 2 elements per function

Need only log N functions

Page 25: Anti-Persistence  or History Independent Data Structures

Performance Analysis

Based on worst-case insertion sequence The important parameter: - the fraction of

the table that is used ·N elementsAnalysis of expected insertion time and

search time (number of probes to the table)– Have to distinguish successful and

unsuccessful search

Page 26: Anti-Persistence  or History Independent Data Structures

Analysis via the Static Algorithm

For insertions, the total number of probes in static and dynamic algorithm are identical– Easier to analyze the static algorithm

Key point for Youth-rules: in the phase i all unsettled elements are in the ith probe in their sequence– Assures fresh randomness of hi (x)

Page 27: Anti-Persistence  or History Independent Data Structures

Performance

For Youth-rules, implemented as specified: For any sequence of insertion the expected

probe-time for insertion is at most 1/(1-) For any sequence of insertion the expected

probe-time for successful or unsuccessful search is at most 1/(1-)

Analysis based on static algorithm

is the fraction of the table that is used

Page 28: Anti-Persistence  or History Independent Data Structures

Comparison to double hashing

Analysis of double hashing with truly random functions [Guibas & Szemeredi, Lueker & Molodowitch]

Can be replaced by log n wise independent functions (Schmidt & Siegel)

log n wise independent is relatively expensive: – either a lot of space or log n time

Youth-rules is a simple and provably efficient scheme with very little extra storageExtra benefit of considering history independence

Page 29: Anti-Persistence  or History Independent Data Structures

Other Priority Functions

[Amble & Knuth] log(1/(1-)) for global– Truly random hash functions

Experiments show about log(1/(1-)) for most priority functions tried

Page 30: Anti-Persistence  or History Independent Data Structures

Other types of data structures

Memory management (dealing with pointers)– Memory Allocation

Other state-related issues

Page 31: Anti-Persistence  or History Independent Data Structures

Dynamic perfect hashing:FKS scheme, dynamized

x6

x4

x1

x2

x5

x3s0

h0

h1

hk

h s1

sk

Top-level table:O(n) space

Low-level tables:O(n) space total.

Each gets about si2

n elements tobe inserted

The hi are perfect on their respective sets. Rechoose h or some hi to maintain perfection and linear space.

Page 32: Anti-Persistence  or History Independent Data Structures

A subtle problem:the intersection bias problem

Suppose we have:– a set of states {1, 2, ...}– a set of objects {h1, h2, ...}– a way to decide whether hi is “good” for j.

Keep a “current” h as states change Change h only if it is no longer “good”.

– Choose uniformly from the “good” ones for . Then this is not history independent

– h is biased towards the intersection of those good for current and for previous states.

Page 33: Anti-Persistence  or History Independent Data Structures

Dynamized FKS is not history independent

Does not erase upon deletion Uses history-dependent memory allocation Hash functions (h, h1, h2, ...) are changed

whenever they cease to be “good”– Hence they suffer from the intersection bias

problem, since they are biased towards functions that were “good” for previous sets of elements

– Hence they leak information about past sets of elements

Page 34: Anti-Persistence  or History Independent Data Structures

Making it history independent

Use history independent memory allocation

Upon deletion, erase the element and rechoose the appropriate hi. This solves the low-level intersection bias problem.

Some other minor changes Solve the top-level intersection bias

problem...

Page 35: Anti-Persistence  or History Independent Data Structures

Solving the top-level intersection bias problem

Can’t afford a top-level rehash on every deletion

Generate two “potential h”s 1 and 2 at the beginning– Always use the first “good” one– If neither are good, rehash at every deletion– If not using 1, keep a top-level table for it for easy

“goodness” checking (likewise for 2)

Page 36: Anti-Persistence  or History Independent Data Structures

Proof of history independence

Table’s state is defined by: The current set of elements Top-level hash functions

– Always the first “good” i, or rechosen each step

Low-level hash functions– Uniformly chosen from perfect functions

Arrangement of sub-tables in memory– Use history-independent memory allocation

Some other history independent things

Page 37: Anti-Persistence  or History Independent Data Structures

Performance

Lookup takes two steps Insertion and deletion take expected amortized

O(1) time– There is a 1/poly chance that they will take more

Page 38: Anti-Persistence  or History Independent Data Structures

Open Problems

Better analysis for youth-rules as well as other priority functions with no random oracles.

Efficient memory allocation – ours is O(s log s)

Separations– Between strong and weak history independence– Between history independent and traditional

versions e.g. for union find

Can persistence and (computational) history independence co-exist efficiently?

Page 39: Anti-Persistence  or History Independent Data Structures

Conclusion

History independence can be subtle We have two history independent hash tables:

– Based on open addressing Very space efficient but no deletion

– Dynamic perfect hashing Allows deletion, constant-time lookup

Page 40: Anti-Persistence  or History Independent Data Structures

Open addressing: implementing hash functions

For all i, generate random independent ai, bi

hi(x) = (aix mod U + bi) mod N– U : size of universe; prime– N : size of hash table

x’s probe sequence is h1(x), h2(x), h3(x), ... We need log n hash functions

– n is the number of elements in the table