
PRIVACY-PRESERVING RELEASES OF SOCIAL NETWORKS

Chih-Hua Tai

Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

DATA MINING

The primary task in data mining: development of models about aggregated data.
◦ Finding frequent patterns
◦ Finding rules

FINDING PATTERNS


FINDING RULES


DATA MINING VS. PRIVACY


Can we develop accurate models without access to precise information in individual data records? Why?


PRIVACY ISSUES IN DATA


DATA MINING AND PRIVACY

The primary task in data mining: development of models about aggregated data.

Can we develop accurate models without access to precise information in individual data records?

Answer: yes, by randomization (R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,” SIGMOD 2000).

But what about the utility of the randomized data?
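The randomization idea can be illustrated with a minimal Python sketch. It shows only the perturbation step of the Agrawal and Srikant approach (each value receives independent random noise) and checks that an aggregate, here the mean, survives; the paper's procedure for reconstructing the original distribution is not shown. The function name `randomize`, the uniform noise, and the age data are hypothetical choices for illustration.

```python
import random
import statistics

def randomize(values, spread, seed=None):
    """Return a copy of `values` with independent uniform noise in [-spread, spread] added."""
    rng = random.Random(seed)
    return [x + rng.uniform(-spread, spread) for x in values]

# Hypothetical data: 10,000 ages between 20 and 60.
data_rng = random.Random(0)
ages = [data_rng.randint(20, 60) for _ in range(10_000)]
noisy_ages = randomize(ages, spread=30, seed=1)

# Individual records are heavily distorted, yet the aggregate mean is nearly unchanged.
print(f"true mean {statistics.mean(ages):.2f}, randomized mean {statistics.mean(noisy_ages):.2f}")
```

The noise hides each individual value, while errors average out over many records; that trade-off is exactly the utility question raised above.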


DATA MINING VS. SOCIAL NETWORKS


Attributes: Name, Salary, …

Links: Friends, Neighborhood, …

Communities: Interests, Activities, …

PRIVACY ISSUES ON SOCIAL NETWORKS

Personal information can leak even if the vertex identities are hidden…

Much information can be used to re-associate a vertex with its identity:

Vertex degree: k-degree anonymity, …

Neighborhood configuration: k-neighborhood anonymity, k-automorphism anonymity, k-isomorphism anonymity, grouping-and-collapsing, …
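As a rough illustration, k-degree anonymity is commonly defined as every degree value in the graph being shared by at least k vertices. The sketch below checks that condition on a graph stored as a dictionary of neighbour sets; `build_graph` and `is_k_degree_anonymous` are hypothetical helper names introduced here.

```python
from collections import Counter, defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def is_k_degree_anonymous(adj, k):
    """True if every degree value that occurs is shared by at least k vertices."""
    degree_counts = Counter(len(nbrs) for nbrs in adj.values())
    return all(count >= k for count in degree_counts.values())

square = build_graph([(1, 2), (2, 3), (3, 4), (4, 1)])  # every vertex has degree 2
path = build_graph([(1, 2), (2, 3)])                     # degrees 1, 2, 1
print(is_k_degree_anonymous(square, 4), is_k_degree_anonymous(path, 2))  # True False
```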


C.-H. Tai, P. S. Yu, D.-N. Yang and M.-S. Chen, "Privacy-preserving Social Network Publication Against Friendship Attacks," In KDD, 2011.

FRIENDSHIP ATTACK

There is yet another type of information that enables vertex re-identification: the friendship attack.

FRIENDSHIP ATTACK

Given a target individual A and the degree pair information $D_2 = (d_1, d_2)$, a friendship attack $(D_2, A)$ exploits $D_2$ to identify a vertex $v_1$ corresponding to A in a published social network, where $v_1$ connects to another vertex $v_2$ with the degree pair $(d_{v_1}, d_{v_2}) = (d_1, d_2)$.
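The friendship attack can be sketched as a candidate filter: the attacker keeps only vertices of degree d1 that have a neighbour of degree d2 and, guessing uniformly, re-identifies the target with probability 1 / (number of candidates). This is a minimal sketch on a hypothetical toy graph (not the network from Ex 1); `friendship_candidates` and `degree_candidates` are illustrative names.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_candidates(adj, d):
    """Vertices matching a plain degree attack with degree d."""
    return {v for v, nbrs in adj.items() if len(nbrs) == d}

def friendship_candidates(adj, d1, d2):
    """Vertices of degree d1 with at least one neighbour of degree d2."""
    return {
        v1 for v1, nbrs in adj.items()
        if len(nbrs) == d1 and any(len(adj[v2]) == d2 for v2 in nbrs)
    }

# Hypothetical toy network (not the graph from Ex 1).
g = build_graph([(1, 2), (1, 3), (1, 4), (2, 3), (4, 5), (5, 6)])
print(sorted(degree_candidates(g, 2)))         # [2, 3, 4, 5]: four candidates
print(sorted(friendship_candidates(g, 2, 1)))  # [5]: re-identified with certainty
```

Knowing only the degree leaves four candidates, while the degree pair narrows them to one.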

[Figure: example social network with vertices 1–10; Alice's vertex is labeled]

Ex 1. Assume that an attacker knows that Alice has 3 connections, Bob has 2 connections, and Alice and Bob are friends. The attacker identifies v9 as Alice with 100% confidence.

FRIENDSHIP ATTACK


In the DBLP data set, the percentages of vertices that can be re-identified with probability larger than 1/k by degree and friendship attacks:

| k | Original network: degree attack | Original network: friendship attack | k-degree anonymized network: friendship attack |
|---|---|---|---|
| 5 | 0.28% | 5.37% | 2.89% |
| 10 | 0.53% | 10.69% | 4.65% |
| 15 | 0.73% | 14.71% | 5.82% |
| 20 | 0.93% | 18.44% | 7.23% |

NEW PRIVACY MODEL AGAINST FRIENDSHIP ATTACK

k²-degree anonymity: A social network is k²-degree anonymous if, for every vertex with an incident edge of degree pair (d1, d2), there exist at least k − 1 other vertices, each of which also has an incident edge of the same degree pair.
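A direct check of this definition, as a sketch: collect, for every degree pair (d1, d2), the vertices of degree d1 that have an incident edge to a vertex of degree d2, and require each such group to contain at least k vertices. The function names are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_pair_groups(adj):
    """Map each degree pair (d1, d2) to the vertices of degree d1 that have an
    incident edge leading to a vertex of degree d2."""
    groups = defaultdict(set)
    for v1, nbrs in adj.items():
        for v2 in nbrs:
            groups[(len(nbrs), len(adj[v2]))].add(v1)
    return groups

def is_k2_degree_anonymous(adj, k):
    """True if every occurring degree pair is shared by at least k vertices."""
    return all(len(vertices) >= k for vertices in degree_pair_groups(adj).values())

path4 = build_graph([(1, 2), (2, 3), (3, 4)])  # degree pairs (1,2), (2,1), (2,2)
print(is_k2_degree_anonymous(path4, 2))  # True
print(is_k2_degree_anonymous(path4, 3))  # False
```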

[Figure: a 2²-degree anonymous version of the example social network]

Ex 2. Even with the knowledge (D2, A) = ((3, 2), Alice), the probability that an attacker re-identifies Alice in the 2²-degree anonymous social network is at most ½.

THE ANONYMIZATION

Problem formulation: Given a graph G(V, E) and an integer k, the problem is to anonymize G to satisfy k²-degree anonymity such that information distortion is minimized.

The challenge: any alteration of an edge affects the degrees of two vertices, and therefore the degree pairs of all edges incident to those vertices.

GRAPH ANONYMIZATION ALGORITHMS

Integer programming formulation: obtains the optimal solution but scales poorly.

DEgree SEquence ANonymization (DESEAN):
Step 1. Degree Sequence Anonymization: determine the groups of vertices that protect each other.
Step 2. Privacy Constraint Satisfaction: eliminate the advantage of knowing friendship information.
Step 3. Anonymous Degree Realization: make the vertices in the same group share the same vertex degree.

ALGORITHM DESEAN

Step 1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a target degree $d_x$ for each cluster x such that each cluster contains at least k vertices and the weighted degree difference

$$\omega \sum_{v \in x,\ d_v < d_x} (d_x - d_v) + (1 - \omega) \sum_{v \in x,\ d_v > d_x} (d_v - d_x)$$

is as small as possible (ω weights the degree increases and 1 − ω the degree decreases).
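Step 1 can be sketched as a dynamic program, under an assumption of ours that the slide does not state: clusters are contiguous runs of the vertices sorted by degree. `cluster_cost` evaluates the weighted degree difference for one cluster and picks its cheapest target degree; `anonymize_degree_sequence` chooses the partition. Both names and the example degrees are hypothetical, and DESEAN's actual clustering may differ.

```python
def cluster_cost(degrees, omega):
    """Cheapest (cost, target degree) for one cluster.

    Raising a degree costs omega per unit and lowering it costs (1 - omega) per unit.
    The cost is convex and piecewise linear in the target, so checking the cluster's
    own degree values is enough to find the minimum.
    """
    best = None
    for target in sorted(set(degrees)):
        cost = sum(
            omega * (target - d) if d < target else (1 - omega) * (d - target)
            for d in degrees
        )
        if best is None or cost < best[0]:
            best = (cost, target)
    return best


def anonymize_degree_sequence(degree_of, k, omega):
    """Split vertices, ordered by degree, into contiguous clusters of at least k
    vertices minimising the total weighted degree difference.
    Returns {vertex: target degree}."""
    order = sorted(degree_of, key=lambda v: degree_of[v], reverse=True)
    n = len(order)
    if n < k:
        raise ValueError("need at least k vertices")
    best = [float("inf")] * (n + 1)  # best[i]: minimum cost for the first i vertices
    start = [0] * (n + 1)            # start index of the last cluster in that optimum
    best[0] = 0.0
    for i in range(k, n + 1):
        for j in range(0, i - k + 1):  # last cluster is order[j:i], size >= k
            if best[j] == float("inf"):
                continue
            cost, _ = cluster_cost([degree_of[v] for v in order[j:i]], omega)
            if best[j] + cost < best[i]:
                best[i], start[i] = best[j] + cost, j
    targets, i = {}, n
    while i > 0:  # walk back through the chosen clusters
        j = start[i]
        _, target = cluster_cost([degree_of[v] for v in order[j:i]], omega)
        targets.update({v: target for v in order[j:i]})
        i = j
    return targets

# Hypothetical degrees; k = 2 and omega = 0.5 as in Ex 3.
print(anonymize_degree_sequence({"a": 4, "b": 3, "c": 3, "d": 2, "e": 1}, k=2, omega=0.5))
# {'d': 1, 'e': 1, 'a': 3, 'b': 3, 'c': 3}
```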

[Figure: example graph for Ex 3 with vertices 1–10]

Ex 3. Given k = 2 and ω = 0.5.


[Figure: the Ex 3 graph illustrating Step 1]


Step 2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure that, for each pair of clusters (x, y), the number of vertices in x directly connected to vertices in y is either zero or at least k. An attacker who knows that the target (in cluster x) has a friend in cluster y then still faces at least k candidates.
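A minimal checker for the Step 2 constraint, assuming each vertex's cluster is given as a dictionary. It reports the cluster pairs that Step 2 would still need to repair by adding or deleting edges; the repair itself is not performed. The function name and example graph are hypothetical, and treating x = y as a pair to check is an assumption of this sketch.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def privacy_violations(adj, cluster_of, k):
    """Cluster pairs (x, y) for which the number of vertices in x directly connected
    to some vertex in y is neither zero nor at least k (pairs with x == y included)."""
    linked = defaultdict(set)  # (x, y) -> vertices in x with a neighbour in y
    for v, nbrs in adj.items():
        for u in nbrs:
            linked[(cluster_of[v], cluster_of[u])].add(v)
    return sorted(pair for pair, vertices in linked.items() if len(vertices) < k)

# Hypothetical graph: triangle 1-2-3 plus pendant vertex 4 attached to 1.
g = build_graph([(1, 2), (2, 3), (1, 3), (1, 4)])
clusters = {1: "X", 4: "X", 2: "Y", 3: "Y"}
print(privacy_violations(g, clusters, k=2))  # [('X', 'Y')]: only vertex 1 of X touches Y
```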


[Figure: the Ex 3 graph illustrating Step 2]


Step 3. Anonymous Degree Realization. Adjust edges in G such that the vertices in each cluster x meet the target degree $d_x$ selected in Step 1.
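For illustration only, Step 3 can be approximated by a greedy procedure that connects vertices still below their target degree. This hypothetical sketch only adds edges (the step may also delete them) and ignores the Step 2 constraints, so it is not the paper's realization procedure.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def realize_degrees(adj, target):
    """Greedily add edges until every vertex reaches its target degree.
    Mutates `adj` and returns the added edges. Only degree increases are handled;
    a ValueError signals that no valid partner is left for some vertex."""
    added = []
    while True:
        deficit = {v: target[v] - len(adj[v]) for v in sorted(adj) if len(adj[v]) < target[v]}
        if not deficit:
            return added
        v = max(deficit, key=deficit.get)  # largest deficit first (smallest id on ties)
        partners = [u for u in deficit if u != v and u not in adj[v]]
        if not partners:
            raise ValueError(f"cannot raise the degree of {v!r} by adding edges")
        u = max(partners, key=deficit.get)
        adj[v].add(u)
        adj[u].add(v)
        added.append((v, u))

g = build_graph([(1, 2), (2, 3), (3, 4)])            # hypothetical path, degrees 1, 2, 2, 1
print(realize_degrees(g, {1: 2, 2: 2, 3: 2, 4: 2}))  # [(1, 4)]: the path becomes a cycle
```

Each added edge lowers two deficits at once, so the loop always terminates.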

[Figure: the Ex 3 graph illustrating Step 3]


C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, ”Structural diversity for privacy in publishing social networks,” In SDM, 2011.

COMMUNITY IDENTIFICATION

Vertex identification is considered to be an important privacy issue in publishing social networks.
◦ k-degree anonymity, k-neighborhood anonymity, …

In addition to a vertex identity, each individual is also associated with a community identity.
◦ It could be used to infer sensitive information, such as political party affiliation or disease.
◦ It is a kind of structural information.

COMMUNITY IDENTIFICATION

Community information is explicitly given:

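Ex. Alice has recently learned…
◦ Bob is sick.
◦ Bob participates in this social network.
◦ Bob has 5 friends. (vertex degree attack)

[Figure: the published network with an AIDS community and an SLE community]

If all vertices with 5 friends lie in the AIDS community, Alice concludes: Bob has AIDS!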

COMMUNITY IDENTIFICATION

Community information is not given:

Ex. Alice knows that Bob participates in this social network and has 5 friends. (vertex degree attack)

Alice can then learn an approximation of Bob’s neighborhood.

COMMUNITY IDENTIFICATION

[Figure: percentage of vertices violating k-structural diversity in (a) DBLP and (b) ca-CondMat]

k-degree anonymization alone is insufficient to prevent community identification.

COMMUNITY IDENTIFICATION

[Figure: structural diversity in (a) the original DBLP and (b) the 10-degree anonymized DBLP]

Vertices with large degrees appear in only a small set of communities.

NEW PRIVACY MODEL AGAINST COMMUNITY IDENTIFICATION

k-Structural Diversity: To protect against the vertex degree attack, for each vertex there should be other vertices with the same degree located in at least k − 1 other communities.

If a graph satisfies k-structural diversity, then it also satisfies k-degree anonymity, since every vertex then shares its degree with at least k − 1 other vertices.
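A sketch of a k-structural diversity check, assuming every vertex belongs to exactly one community (given as a dictionary). It lists the vertices the definition leaves unprotected. Names and data are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def structural_diversity_violations(adj, community, k):
    """Vertices v for which fewer than k - 1 communities other than v's own
    contain a vertex with the same degree as v (one community per vertex assumed)."""
    communities_by_degree = defaultdict(set)
    for v, nbrs in adj.items():
        communities_by_degree[len(nbrs)].add(community[v])
    return sorted(
        v for v, nbrs in adj.items()
        if len(communities_by_degree[len(nbrs)] - {community[v]}) < k - 1
    )

ring = build_graph([(i, i % 6 + 1) for i in range(1, 7)])  # 6-cycle, all degree 2
community = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}
print(structural_diversity_violations(ring, community, k=3))  # []
print(structural_diversity_violations(ring, community, k=4))  # [1, 2, 3, 4, 5, 6]
```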


THE ANONYMIZATION

Problem formulation: Given a graph G(V, E, C) and an integer k with $1 \le k \le |C|$, the problem is to anonymize G to satisfy k-structural diversity such that information distortion is minimized.

The challenge: how to preserve community structures, even when the communities are implicit, while preserving privacy.

PROBLEM FORMULATION

Operation Adding Edge
◦ Connect two vertices belonging to the same community.
◦ Can avoid destroying the communities.

PROCEDURE MERGENCE

◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d.

[Figure: vertex v and communities Com. 1, Com. 2, and Com. 3]


PROCEDURE CREATION

To create a new anonymous group for a vertex v such that the vertices in the group are located in at least k different communities and all have the same degree as v.

[Figure: vertex v and communities Com. 1, Com. 2, and Com. 3]


THE EDGE-REDIRECTION MECHANISM

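◦ Is defined on three vertices w, v, and x in the same community, where w is an anonymized vertex and v and x are two not-yet-anonymized vertices.

◦ Replaces the edge (w, v) with the edge (w, x), so the degree of w (and hence its protection) is unchanged, while the degree of v decreases by one and that of x increases by one.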

[Figure: edge (w, v) redirected to (w, x)]
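The mechanism can be sketched on a dictionary-of-sets graph. The hypothetical `redirect_edge` below validates the preconditions (same community, existing edge (w, v), no duplicate (w, x)) and then swaps the edge; it assumes x is already a key of the graph.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def redirect_edge(adj, community, w, v, x):
    """Replace the edge (w, v) with (w, x), where w, v, x share a community.
    w keeps its degree; v loses one incident edge and x gains one."""
    if not community[w] == community[v] == community[x]:
        raise ValueError("w, v and x must belong to the same community")
    if v not in adj[w]:
        raise ValueError("(w, v) is not an edge")
    if x == w or x in adj[w]:
        raise ValueError("(w, x) would be a self-loop or a duplicate edge")
    adj[w].remove(v)
    adj[v].remove(w)
    adj[w].add(x)
    adj[x].add(w)

# Hypothetical community "C"; x starts with one edge so it appears in the graph.
g = build_graph([("w", "v"), ("w", "a"), ("x", "a")])
community = {"w": "C", "v": "C", "x": "C", "a": "C"}
redirect_edge(g, community, "w", "v", "x")
print({u: len(nbrs) for u, nbrs in sorted(g.items())})  # {'a': 2, 'v': 0, 'w': 2, 'x': 2}
```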

THE EDGE-REDIRECTION MECHANISM

Q. Why is the Edge-Redirection mechanism needed?

[Figure: protecting v by mergence (before and after), with vertices v and w across Com. 1, Com. 2, and Com. 3]

THE EDGE-REDIRECTION MECHANISM

[Figure: protecting v by creation (before and after), with vertices v and w across Com. 1, Com. 2, and Com. 3]

ALGORITHM EDGECONNECT (EC)

EC combines:
◦ Procedures Mergence and Creation, where Rv denotes the set of edges that could be redirected away from v.
◦ The Edge-Redirection mechanism.

PROBLEM FORMULATION


Operation Splitting Vertex
◦ Replace a vertex v with a set of substitute vertices such that each substitute vertex is connected by at least one edge originally incident to v.
◦ Each substitute vertex presents part of the truth about v.
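A minimal sketch of the Splitting Vertex operation, assuming the caller decides how v's edges are distributed: each substitute receives a non-empty group of v's former neighbours, and the groups partition those neighbours (an assumption of this sketch; the slide only requires at least one edge per substitute). Substitutes are named `(v, i)` purely for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected simple graph as {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def split_vertex(adj, v, neighbour_groups):
    """Replace v with substitutes (v, 0), (v, 1), ...; substitute i takes over the
    edges from v to the vertices in neighbour_groups[i]. The groups must be non-empty
    and partition v's neighbours, so every original edge of v is kept exactly once."""
    groups = [set(g) for g in neighbour_groups]
    flat = [u for g in groups for u in g]
    if any(not g for g in groups) or len(flat) != len(set(flat)) or set(flat) != adj[v]:
        raise ValueError("groups must be non-empty and partition the neighbours of v")
    del adj[v]
    substitutes = []
    for i, group in enumerate(groups):
        s = (v, i)
        adj[s] = set(group)
        for u in group:
            adj[u].remove(v)
            adj[u].add(s)
        substitutes.append(s)
    return substitutes

star = build_graph([("v", "a"), ("v", "b"), ("v", "c")])  # hypothetical star centred at v
print(split_vertex(star, "v", [["a"], ["b", "c"]]))  # [('v', 0), ('v', 1)]
```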


PROCEDURE CREATIONBYSPLIT

Split a set of vertices, including v, to create a new anonymous group

[Figure: v in Com. 1, Com. 2, and Com. 3 before splitting, and its substitutes v1 and v2 after splitting]

PROCEDURE MERGEBYSPLIT

Split v into a set of substitute vertices such that each substitute vertex is protected in some existing anonymous group.

[Figure: v in Com. 1, Com. 2, and Com. 3 split into substitutes v1, v2, and v3]

C.-H. Tai, P.-J. Tseng, P. S. Yu and M.-S. Chen, "Identities Anonymization in Dynamic Social Networks," In ICDM-11, 2011.

THE PROBLEM IN DYNAMIC SCENARIOS…

A dynamic social network is released sequentially.

An attacker can monitor a victim for a period w.

Therefore, the adversary's knowledge includes:
1. The releases $G^{t-w+1}, G^{t-w+2}, \dots, G^{t}$ during w.
2. A degree sequence $\Delta_v^w = (d_v^{t-w+1}, d_v^{t-w+2}, \dots, d_v^{t})$ of a victim v during w.

[Figure: releases G1 and G2]

Ex. John has two friends at time 1, and three friends at time 2.

PRIVACY MODEL: k^w-STRUCTURAL DIVERSITY ANONYMITY

Base case of w = 1: A group $\theta_d$, consisting of all vertices of degree d, is a k-shielding group if there is a vertex subset $\theta \subseteq \theta_d$ such that (1) $|\theta| \ge k$, and (2) for any two vertices u and v in $\theta$, $C_u \cap C_v = \emptyset$, where C is the community identity.

The adversary's knowledge includes:
1. The released social network $G^t$.
2. A degree sequence $\Delta_v^1 = (d_v^t)$ of a victim v.

Ex. Mary has four friends.

[Figure: release G]

PRIVACY MODEL: k^w-STRUCTURAL DIVERSITY ANONYMITY

Dynamic scenarios of w > 1: A consistent group $\Theta_\Delta$ is the set of vertices that always share the same degree during w.

A consistent group $\Theta_\Delta$ is k-shielding if, at each time instant t in w, there is a vertex subset $\Theta^t \subseteq \Theta_\Delta$ such that (1) $|\Theta^t| \ge k$, and (2) for any two vertices u and v in $\Theta^t$, $C_u^t \cap C_v^t = \emptyset$, where $C^t$ is the community identity at time t.
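The k-shielding condition can be checked by brute force on small examples. The sketch below groups vertices by identical degree sequences (consistent groups) and, for each time instant, searches for k members with pairwise-disjoint community sets; with w = 1 it reduces to the base case. The exhaustive search is exponential, the function names and data are hypothetical, and the paper's algorithm is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def has_disjoint_subset(members, communities, k):
    """True if some k members have pairwise-disjoint community sets.
    Brute force over all k-subsets: exponential, fine only for toy examples."""
    return any(
        all(not (communities[u] & communities[v]) for u, v in combinations(subset, 2))
        for subset in combinations(members, k)
    )

def consistent_groups(degree_seqs):
    """Group vertices whose degree sequences over the window w are identical."""
    groups = defaultdict(list)
    for v, seq in degree_seqs.items():
        groups[tuple(seq)].append(v)
    return dict(groups)

def unshielded_groups(degree_seqs, communities_over_time, k):
    """Degree sequences whose consistent group is not k-shielding.
    communities_over_time[i][v] is v's set of communities at the i-th instant of w."""
    return sorted(
        seq for seq, members in consistent_groups(degree_seqs).items()
        if not all(has_disjoint_subset(members, comms, k) for comms in communities_over_time)
    )

# Hypothetical window of w = 2 releases.
degree_seqs = {"a": (2, 3), "b": (2, 3), "c": (2, 3), "d": (1, 1)}
communities_over_time = [
    {"a": {1}, "b": {2}, "c": {1}, "d": {3}},
    {"a": {1}, "b": {1}, "c": {2}, "d": {3}},
]
print(unshielded_groups(degree_seqs, communities_over_time, k=2))  # [(1, 1)]
print(unshielded_groups(degree_seqs, communities_over_time, k=3))  # [(1, 1), (2, 3)]
```

With k = 2 the group {a, b, c} is shielded at both instants by a different pair each time; with k = 3, a and c share community 1 at the first instant.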


THE ANONYMIZATION

Problem formulation: Suppose that every vertex in a series of sequential releases $G^{t-w+1}, G^{t-w+2}, \dots, G^{t-1}$ is protected. Given $G^{t-w+1}, G^{t-w+2}, \dots, G^{t-1}$ and k, anonymize the current social network $G^t$ such that every vertex is protected in a k-shielding consistent group.

The challenges:
◦ The anonymization depends not only on the current social network but also on the previous w − 1 releases.
◦ Searching through all the w − 1 previous releases to eliminate privacy leaks is time-consuming.

ANONYMIZATION ALGORITHM

Construct a CS (Clustering Sequence) Table to avoid searching through the w graphs.

CS-Table:
◦ Summarizes the vertex information in the w − 1 previous releases, so that v's information in the w graphs can be fetched without scanning the graphs.
◦ Sorts the vertices in a hierarchical clustering manner according to their degree sequences during w, so that vertices in the same k-shielding consistent group are close together in the CS-Table.
◦ Can be updated incrementally.

Following the vertex ranking in the CS-Table, anonymize each vertex so that it is protected.
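The CS-Table idea can be sketched, under our own assumptions, as a sort of the vertices by degree sequence (latest degree first, then earlier ones). This yields a prefix hierarchy in which each consistent group occupies a contiguous run. `cs_table`, `consistent_runs`, and `slide_window` are hypothetical names; this table stores the full window, and the incremental update assumes a fixed vertex set.

```python
from itertools import groupby

def cs_table(degree_seqs):
    """Rank vertices by degree sequence, comparing the latest degree first and then
    earlier ones, so that vertices of one consistent group end up adjacent."""
    return sorted(degree_seqs, key=lambda v: (tuple(reversed(degree_seqs[v])), str(v)))

def consistent_runs(order, degree_seqs):
    """Split a CS-table ordering into runs of identical degree sequences."""
    return [list(run) for _, run in groupby(order, key=lambda v: tuple(degree_seqs[v]))]

def slide_window(degree_seqs, new_degrees):
    """Incremental update for a new release: drop the oldest degree, append the newest.
    Assumes the vertex set does not change between releases."""
    return {v: (*seq[1:], new_degrees[v]) for v, seq in degree_seqs.items()}

seqs = {"a": (2, 3), "b": (1, 3), "c": (2, 3), "d": (1, 1)}  # hypothetical, w = 2
order = cs_table(seqs)
print(order, consistent_runs(order, seqs))
# ['d', 'b', 'a', 'c'] [['d'], ['b'], ['a', 'c']]
seqs = slide_window(seqs, {"a": 4, "b": 4, "c": 4, "d": 1})
print(consistent_runs(cs_table(seqs), seqs))  # [['d'], ['a', 'b', 'c']]
```

Because consistent groups are contiguous, an anonymizer can walk the ranking once instead of rescanning the w releases.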


THANK YOU~!
