Reconstruction from Randomized Graph via Low Rank Approximation

Reconstruction from Randomized Graph via Low Rank Approximation. Leting Wu, Xiaowei Ying, Xintao Wu. Dept. Software and Information Systems, Univ. of N.C. Charlotte. Outline: Background & Motivation; Low Rank Approximation on Graph Data; Reconstruction from Randomized Graph; Evaluation.

Slide 1

Leting Wu, Xiaowei Ying, Xintao Wu
Dept. Software and Information Systems
Univ. of N.C. Charlotte

Reconstruction from Randomized Graph via Low Rank Approximation

Outline

  • Background & Motivation
  • Low Rank Approximation on Graph Data
  • Reconstruction from Randomized Graph
  • Evaluation
  • Privacy Issue

Background & Motivation

Background

In the process of publishing or outsourcing network data for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).

Graph Randomization/Perturbation:

  • Random Add/Del edges (number of edges unchanged)
  • Random Switch edges (node degrees unchanged)
  • Feature preserving randomization
  • Spectrum preserving randomization
  • Feature preserving via Markov-chain based graph generation
  • Clustering: grouping subgraphs into supernodes

Motivation

We focus on whether we can reconstruct a graph from the randomized graph s.t. [condition missing in the transcript]


Our Focus
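The Rand Add/Del scheme listed on the Background slide can be sketched in a few lines. This is a minimal sketch, assuming an undirected graph stored as a symmetric 0-1 nested list with a zero diagonal, and assuming that k true edges are deleted and k false edges are added, all chosen from the original graph. The function name `rand_add_del` and its parameters are hypothetical, not taken from the slides.

```python
import random
from itertools import combinations


def rand_add_del(adj, k, seed=None):
    """Return a copy of a symmetric 0-1 adjacency matrix with k true edges
    deleted and k false edges added, so the number of edges is unchanged."""
    rng = random.Random(seed)
    n = len(adj)
    pairs = list(combinations(range(n), 2))  # unordered node pairs with i < j
    edges = [(i, j) for i, j in pairs if adj[i][j] == 1]
    non_edges = [(i, j) for i, j in pairs if adj[i][j] == 0]
    if k > len(edges) or k > len(non_edges):
        raise ValueError("k exceeds the number of edges or non-edges")

    out = [list(row) for row in adj]
    for i, j in rng.sample(edges, k):        # delete k true edges
        out[i][j] = out[j][i] = 0
    for i, j in rng.sample(non_edges, k):    # add k false edges
        out[i][j] = out[j][i] = 1
    return out
```

Because both samples are drawn from the original graph, the randomized graph differs from it in exactly 2k node pairs.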

Low Rank Approximation on Graph Data

Adjacency Matrix & Its Eigen-Decomposition

Matrix Representation of Network

  • Adjacency Matrix A (symmetric)

Eigen-decomposition:
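The formula on this slide is missing from the transcript. For a symmetric n × n matrix the eigen-decomposition has the standard form below; the symbols λ_i (eigenvalues) and x_i (orthonormal eigenvectors) are assumed notation and may differ from the slide's.

```latex
A = \sum_{i=1}^{n} \lambda_i \, x_i x_i^{T},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n,
\qquad x_i^{T} x_j = \delta_{ij}
```

An adjacency matrix without self-loops has zero trace, so its eigenvalues sum to zero. Any graph with at least one edge therefore has both positive and negative eigenvalues.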

Question: How do the eigenvalues and eigenvectors relate to graph topology?


Leading Eigenpairs vs. Graph Topology

What is the role of positive and negative eigen-pairs in graph topology?

Without loss of generality, we partition the node set into two groups, and the adjacency matrix can be partitioned as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ (with $A_{21} = A_{12}^{T}$) represents the edges between the groups.

[Figures: three slides titled "Leading Eigenpairs vs. Graph Topology", each with panels labeled Original, r = 1 and r = 2; the third slide adds a panel labeled r = 4]

Low Rank Approximation on Graph Data

Low Rank Approximation: [formula missing in the transcript]

This provides the best rank-r approximation to A. To keep the structure of an adjacency matrix, discretize it as follows: [formula missing in the transcript]
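A minimal sketch of this step, under stated assumptions: the r eigen-pairs with the largest absolute eigenvalues are kept (the choice that gives the best rank-r approximation in Frobenius norm for a symmetric matrix), and entries are rounded to 1 at a threshold of 0.5. The slide's own selection and discretization rules are not preserved, so the threshold and the name `low_rank_reconstruct` are illustrative only.

```python
import numpy as np


def low_rank_reconstruct(adj, r, threshold=0.5):
    """Return (rank-r approximation, discretized 0-1 matrix) for a symmetric
    adjacency matrix. The 0.5 threshold is an assumed discretization rule."""
    adj = np.asarray(adj, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(adj)        # symmetric eigen-decomposition
    top = np.argsort(-np.abs(eigvals))[:r]        # r largest |eigenvalues| (assumption)
    approx = (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T
    recon = (approx >= threshold).astype(int)     # discretize back to 0/1
    np.fill_diagonal(recon, 0)                    # no self-loops
    return approx, recon


if __name__ == "__main__":
    # Two disjoint cliques (5 and 4 nodes): two dominant positive eigenvalues.
    adj = np.zeros((9, 9), dtype=int)
    adj[:5, :5] = 1
    adj[5:, 5:] = 1
    np.fill_diagonal(adj, 0)
    _, recon = low_rank_reconstruct(adj, r=2)
    print((recon == adj).all())
```

In this toy graph each clique contributes one large positive eigenvalue, so two eigen-pairs already carry the whole community structure.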


Reconstruction from Randomized Graph

Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)

[Figure]

Determine Number of Eigen-pairs

Question: How to choose an optimal rank r for reconstruction?

Solution: Choose [symbol missing in the transcript] as the indicator, since it is closely related to the other features and there exists an explicit moment estimator

[formula missing in the transcript]

where m is the number of edges and k is the number of edges added/deleted.
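One plausible reading of this selection rule is sketched below: reconstruct at each candidate rank and keep the rank whose indicator value is closest to the estimate of the original value. Everything specific here is an assumption. The slide's indicator and its moment estimator are missing from the transcript, so `target` is simply passed in, the triangle count is a stand-in indicator, and `reconstruct` uses an assumed 0.5 threshold.

```python
import numpy as np


def reconstruct(adj, r, threshold=0.5):
    """Discretized rank-r approximation (threshold is an assumption)."""
    vals, vecs = np.linalg.eigh(np.asarray(adj, dtype=float))
    top = np.argsort(-np.abs(vals))[:r]
    approx = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    recon = (approx >= threshold).astype(int)
    np.fill_diagonal(recon, 0)
    return recon


def triangle_count(adj):
    """Stand-in indicator feature: number of triangles, trace(A^3) / 6."""
    a = np.asarray(adj, dtype=np.int64)
    return int(np.trace(a @ a @ a)) // 6


def choose_rank(randomized, target, indicator=triangle_count, max_rank=None):
    """Return the smallest rank whose reconstructed indicator is closest to
    `target`, a hypothetical estimate of the indicator on the original graph."""
    n = len(randomized)
    max_rank = n if max_rank is None else max_rank
    best_r, best_gap = None, float("inf")
    for r in range(1, max_rank + 1):
        gap = abs(indicator(reconstruct(randomized, r)) - target)
        if gap < best_gap:                 # strict: ties keep the smaller rank
            best_r, best_gap = r, gap
    return best_r
```

The eigen-decomposition is recomputed for every candidate rank to keep the sketch short; computing it once and reusing it would be the obvious refinement.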

Algorithm

Evaluation

Effect of Noise (Political Blogs)

  • The method works well up to a certain level of noise
  • Even with a high level of noise, the reconstructed features are still closer to the original than the randomized ones

Reconstructed Features on 3 real network data

Reconstruction Quality

  • When the reconstruction quality is positive, the reconstructed features are closer to the original ones than the randomized ones
  • All positive for the three data sets

Privacy Issue

Question 1: Can this reconstruction be used by attackers?

Define the normalized Frobenius distance between A and the reconstructed graph as [formula missing in the transcript]

[Figure: normalized F norm for Political Books, Enron and Political Blogs]

Question 2: Which type of graphs would have privacy breached?

For low rank graphs, which have [condition missing in the transcript], the distance between the reconstructed graph and the original graph can be very small.
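The distance used in this comparison can be sketched as follows. The slide's exact definition is missing from the transcript, so the normalization below (dividing by the Frobenius norm of the original matrix) is an assumption, as is the function name.

```python
import math


def normalized_frobenius_distance(original, other):
    """||original - other||_F / ||original||_F for two equally sized matrices
    given as nested lists. The normalization is an assumed convention."""
    diff_sq = sum(
        (a - b) ** 2
        for row_a, row_b in zip(original, other)
        for a, b in zip(row_a, row_b)
    )
    norm_sq = sum(a ** 2 for row in original for a in row)
    if norm_sq == 0:
        raise ValueError("the original matrix has no edges")
    return math.sqrt(diff_sq / norm_sq)
```

For 0-1 adjacency matrices the squared numerator counts the entries in which the two graphs differ, so a small distance means an attacker would recover most of the true edges.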


Synthetic Low Rank Graphs

[Figure] A set of synthetic low rank graphs generated from Political Blogs shows that the reconstruction works on both the distance and the features.

Conclusion

  • We show the relationship between graph topological structure and the eigen-pairs of the adjacency matrix
  • We propose a low rank approximation based reconstruction algorithm with a novel solution to determine the optimal rank
  • For most social networks, our algorithm does not incur further disclosure risks of individual privacy, except for networks with low ranks or a small number of dominant eigenvalues

Questions?

Acknowledgments

This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.

Thank You!

Background

Publish/outsource data for mining/analysis

[Figure: the Data Owner releases/publishes the original graph data to the Public/Third party/Research Inst.; the published data is labeled "Under Attacks!!!"]

  • Privacy: protect sensitive data (identity, relationship, sensitive attributes)
  • Utility: preserve features/patterns/distributions of data

Background

Spectral Filter for Numerical Data: derive an estimation of U from the perturbed data

  • Calculate the covariance matrix
  • Apply spectral decomposition to the covariance matrix
  • Derive the eigenvalue information from the covariance matrix of the noise V and choose a proper number of dimensions, r
  • Let [symbols missing in the transcript], and obtain the estimated data set using [formula missing in the transcript]
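The spectral filtering steps for numerical data can be sketched as a projection onto the leading eigenvectors of the covariance matrix. This is a minimal sketch under assumptions: the perturbed data is taken to be the original data U plus additive noise V, the data is centered before projection, and r is passed in directly instead of being derived from the noise eigenvalues. The slide's exact formulas are missing from the transcript, and the name `spectral_filter` is illustrative.

```python
import numpy as np


def spectral_filter(perturbed, r):
    """Estimate the original data from perturbed rows (samples x attributes)
    by projecting onto the r leading eigenvectors of the sample covariance."""
    y = np.asarray(perturbed, dtype=float)
    mean = y.mean(axis=0)
    centered = y - mean
    cov = np.cov(centered, rowvar=False)          # covariance of the perturbed data
    vals, vecs = np.linalg.eigh(cov)              # spectral decomposition
    q = vecs[:, np.argsort(vals)[::-1][:r]]       # r leading eigenvectors
    return centered @ q @ q.T + mean              # projected (filtered) data
```

The projection discards the noise that falls outside the r-dimensional principal subspace, which is why the estimate is usually closer to U than the perturbed data when U is close to low rank.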


New Challenges

  • A is a 0-1 adjacency matrix, whereas U is a numerical matrix
  • The covariance matrix is positive semi-definite and has only non-negative eigenvalues, whereas A has both positive and negative eigenvalues
  • A covariance matrix cannot be defined for graph data
  • The strategy for determining the number of eigen components used for numerical data does not work for graph data, since the first eigenvalue of the noise matrix could be very large

Data Sets

Political Blogs

  • Based on incoming and outgoing links and posts around the time of the 2004 presidential election
  • 16714 links among 1222 US political blogs

Political Books

  • Based on the political books sold by Amazon.com, where nodes represent books and edges represent co-purchasing of books
  • 105 nodes and 441 edges

Enron

  • Based on the email corpus of a real organization covering a 3-year period, where an edge indicates that at least 5 emails were sent between two people
  • 151 nodes and 869 edges

Future Work

  • Study whether similar LRA reconstruction can be derived for other edge-based perturbation strategies such as Rand Switch and K-Anonymity
  • Reconstruction of distribution from networked data
  • Distribution of networked data?
  • Randomization mechanism
  • Privacy vs. utility (in general social networks and with background knowledge attacks)
  • Spectral analysis of graph topology (signed/weighted/directed graph)