Graph Analysis with Node Differential Privacy

Preview:

DESCRIPTION

Graph Analysis with Node Differential Privacy . Sofya Raskhodnikova Penn State University. Joint work with Shiva Kasiviswanathan ( GE Research ), Kobbi Nissim ( Ben-Gurion U. and Harvard U. ), Adam Smith ( Penn State ). Publishing information about graphs. - PowerPoint PPT Presentation

Citation preview

1

Graph Analysis with Node Differential Privacy

Sofya RaskhodnikovaPenn State University

Joint work with Shiva Kasiviswanathan (GE Research), Kobbi Nissim (Ben-Gurion U. and Harvard U.),

Adam Smith (Penn State)

2

Publishing information about graphs

Many datasets can be represented as graphs• “Friendships” in online social network• Financial transactions• Email communication• Romantic relationships

American J. Sociology,  Bearman, Moody, Stovel

image source http://community.expressor-software.com/blogs/mtarallo/36-extracting-data-facebook-social-graph-expressor-tutorial.html

Privacy is a big issue!

Differential privacy for graph data

Differential privacy [Dwork McSherry Nissim Smith 06]An algorithm A is -differentially private if for all pairs of neighbors and all sets of answers S:

𝑷𝒓 [ 𝑨 (𝑮 )∈𝑺 ] ≤𝒆𝝐𝑷𝒓 [ 𝑨 (𝑮′ )∈𝑺 ]

Graph G

Aqueries

answers)(

Government,researchers,businesses

(or) maliciousadversary

3image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Trustedcurator

Users

4

Two variants of differential privacy for graphs

• Edge differential privacy

Two graphs are neighbors if they differ in one edge.

• Node differential privacy

Two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.

G: G:

G: G:

Node differentially private analysis of graphs

5image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

• Two conflicting goals: utility and privacy– Impossible to get both in the worst case

• Previously: no node differentially private algorithms that are accurate on realistic graphs

Graph G

Aqueries

answers)(

Government,researchers,businesses

(or) maliciousadversary

Trustedcurator

Users

6

Our contributions

• First node differentially private algorithms that are accurate for sparse graphs– node differentially private for all graphs– accurate for a subclass of graphs, which includes

• graphs with sublinear (not necessarily constant) degree bound• graphs where the tail of the degree distribution is not too heavy• dense graphs

• Techniques for node differentially private algorithms• Methodology for analyzing the accuracy of such

algorithms on realistic networks

Concurrent work on node privacy [Blocki Blum Datta Sheffet 13]

7

• Node differentially private algorithms for releasing – number of edges– counts of small subgraphs

(e.g., triangles, -triangles, -stars)– degree distribution

• Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with -decay for constant Notation: fraction of nodes in G of degree

– Every graph satisfies -decay– Natural graphs (e.g., “scale-free” graphs, Erdos-Renyi) satisfy

Our contributions: algorithms

A graph G satisfies -decay if for all

𝒅 𝑡 ⋅𝒅

≤ 𝒕−𝜶

… …

Frequency

Degrees…

……

8

Our contributions: accuracy analysis• Node differentially private algorithms for releasing

– number of edges– counts of small subgraphs

(e.g., triangles, -triangles, -stars)– degree distribution

• Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with -decay for constant

– number of edges– counts of small subgraphs (e.g., triangles, -triangles, -stars)– degree distribution

A graph G satisfies -decay if for all

(1+o(1))-approximation

}

……

9

Previous work ondifferentially private computations on graphs

Edge differentially private algorithms• number of triangles, MST cost [Nissim Raskhodnikova Smith 07]• degree distribution [Hay Rastogi Miklau Suciu 09, Hay Li Miklau Jensen 09,

Karwa Slavkovic 12]• small subgraph counts [Karwa Raskhodnikova Smith Yaroslavtsev 11]• cuts [Blocki Blum Datta Sheffet 12]

Edge private against Bayesian adversary (weaker privacy)• small subgraph counts [Rastogi Hay Miklau Suciu 09]

Node zero-knowledge private (stronger privacy)• average degree, distances to nearest connected, Eulerian,

cycle-free graphs for dense graphs [Gehrke Lui Pass 12]

10

Challenge for node privacy: high local sensitivity• Local sensitivity [NRS’07]:

• Global sensitivity [DMNS’06]:

For many functions of the data, node is high.• Consider adding a node connected to all other nodes.• Examples: (G)= |E(G)|. (G)= # of s in G.

𝑳𝑺 𝒇 (𝑮 )= max𝐺 ′ :𝐧 𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺

|𝑓 (𝐺 )− 𝑓 (𝐺′ )|

Edge is 1; node is for all G. Edge is ; node is |E(G)|.

11

“Projections” on graphs of small degree

Let = family of all graphs, = family of graphs of degree .Notation. = node over.

= node over .Observation. is low for many useful .Examples: = (compare to = ) = (compare to = )

Idea: ``Project’’ on graphs in for a carefully chosen d << n.

𝓖𝓖𝑑

Goal: privacy for all graphs

12

Method 1: Lipschitz extensions

• Release via GS framework [DMNS’06]

• Requires designing Lipschitz extension for each function – we base ours on maximum flow and linear and convex programs

𝓖𝓖𝑑

low

high

𝒇 ′= 𝒇

=

A function is a Lipschitz extension of from to if

agrees with on and=

13

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

Add edge iff .(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑1

𝑑

14

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

Add edge iff .(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .Proof: (1) (G) = for all G (2) = 2⋅

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑 𝑑1

deg (𝑣 )/¿ deg (𝑣 )/¿1/

15

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .Proof: (1) (G) = for all G (2) = 2⋅ = 2

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑 𝑑1

6'

𝑑 𝑑

6

16

For a graph G=([n], E), define LP with variables for all triangles :

(G) is the value of LP.Lemma. (G) is a Lipschitz extension of .

• Can be generalized to other counting queries• Other queries use convex programs

Lipschitz extensions via linear/convex programs

Maximize

for all triangles for all nodes

∑𝑇=△ of 𝐺

𝑥𝑇

¿𝝏𝒅 𝒇 △

17

Method 2: Generic reduction to privacy over

• Time(A) = Time(B) + O(m+n)• Reduction works for all functions How it works: Truncation T(G) outputs G with nodes of degree removed.• Answer queries on T(G) instead of G

𝓖𝓖𝑑

low

high

𝑻

Input: Algorithm B that is node-DP over Output: Algorithm A that is node-DP over , has accuracy similar to B on “nice” graphs

via Smooth Sensitivity framework [NRS’07] via finding a DP upper bound on [Dwork Lei 09, KRSY’11]

and running any algorithm that is -node-DP over

T query f

+ noiseT(G)

G

S (G)

A

18

Generic Reduction via Truncation• Truncation T(G) removes

nodes of degree .• On query , answer

How much noise?• Local sensitivity of as a map

• Lemma., where = nodes of degree .

• Global sensitivity is too large.

… …d

Frequency

Degrees

Nodes that determine

𝐿𝑆𝑇 (𝐺 )= max𝐺′ :𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺

𝑑𝑖𝑠𝑡 (𝑇 (𝐺 ) ,𝑇 (𝐺′ ))

19

Smooth Sensitivity of Truncation

Lemma. is a smooth bound for , computable in time

“Chain rule”: is a smooth bound for

Smooth Sensitivity Framework [NRS ‘07]is a smooth bound on local sensitivity ofif

– for all neighbors

T query f

+ noiseT(G)

G

S (G)

A

20

Utility of the Truncation Mechanism

Lemma. If we truncate to a random degree in ,

• Application to releasing the degree distribution: an -node differentially private algorithm such that

with probability at least if satisfies -decay for

Utility: If G is -bounded, expected noise magnitude is .

T query f

+ noiseT(G)

G

S (G)

A

21

Techniques used to obtain our results• Node differentially private algorithms for releasing

– number of edges– counts of small subgraphs (e.g., triangles, -triangles, -stars)– degree distribution

via Lipschitz extensions

} via generic reduction

22

Conclusions• It is possible to design node differentially private algorithms

with good utility on sparse graphs– One can first test whether the graph is sparse privately

• Directions for future work– Node-private algorithm for releasing cuts– Node-private synthetic graphs– What are the right notions of privacy for graph data?

Recommended