Jayasree Dasari et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.12, December- 2014, pg. 236-244
© 2014, IJCSMC All Rights Reserved 236
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue. 12, December 2014, pg.236 – 244
RESEARCH ARTICLE
Sanitization Techniques for Protecting Social Networks from Inference Attacks

JAYASREE DASARI¹, K. R. KOTESWARA RAO²

¹Student, Department of CSE, Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, JNTU Kakinada University, A.P., India
²Assistant Professor, Department of CSE, Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, JNTU Kakinada University, A.P., India

[email protected]¹, [email protected]²
________________________________________________________________________________________________________________________________
Abstract-- Social networking applications have become popular because they provide a virtual platform through which people from all walks of life can communicate over the Internet. These applications allow users to publish their data and share it with other users. In the process, adversaries can launch private information inference attacks to learn sensitive information. In this paper we examine how such applications are vulnerable to private information inference attacks and devise mechanisms to prevent them. We employ various sanitization techniques that prevent private information inference attacks, so that sensitive information is not disclosed to unauthorized people. We built a prototype system as a proof of concept and tested it using the Census dataset. The empirical results are encouraging.
Index Terms – Social networks, data mining, inference attacks, sanitization techniques
_____________________________________________________________________________
I. INTRODUCTION
Social networking has been around for many years. People from all walks of life depend on the Internet for various kinds of information, and the use of social networks such as Facebook has grown steadily across all demographics. As a result, a great deal of information is made available over social networks. When sensitive information is disclosed, it can be misused by unknown parties. Moreover, the security settings provided by social networks are inadequate, so users' private information is not fully protected. When some data is known, missing data can often be obtained from other data sources, because data is replicated across multiple sites. An attack that infers unknown information from known information in this way is called a private information inference attack.
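To make the idea concrete, the following sketch shows one common form of such an attack: joining an "anonymized" release with a public profile list on shared quasi-identifiers. This example is illustrative only; all names, attributes, and values below are invented and are not from the paper's dataset.

```python
# Illustrative linkage-style inference attack (hypothetical data).
# An adversary matches a sanitized release (names removed, sensitive
# column kept) against public profiles on shared quasi-identifiers
# (age, zipcode) to re-attach identities to sensitive values.

public_profiles = [
    {"name": "Alice", "age": 34, "zipcode": "53715"},
    {"name": "Bob",   "age": 51, "zipcode": "53703"},
]

anonymized_release = [
    {"age": 34, "zipcode": "53715", "condition": "diabetes"},
    {"age": 51, "zipcode": "53703", "condition": "healthy"},
]

def infer(profiles, release):
    """Re-identify records by matching on the quasi-identifiers."""
    inferred = {}
    for person in profiles:
        for record in release:
            if (person["age"], person["zipcode"]) == (record["age"], record["zipcode"]):
                inferred[person["name"]] = record["condition"]
    return inferred

print(infer(public_profiles, anonymized_release))
# {'Alice': 'diabetes', 'Bob': 'healthy'}
```

Even though no name appears in the release, the shared attributes are enough to link each sensitive value back to a person.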
Figure 1 – Possible ways in which social networking can be used
As can be seen in Figure 1, the information available over the Internet, and especially on social networks, pertains to alliances, contacts, social media, careers, connections, social skills, relationships, virtual communities, friendship, and business opportunities. Among this information, sensitive information disclosure is the problem to be addressed.
In this paper we use various sanitization mechanisms to prevent private information inference attacks. The remainder of the paper is structured as follows. Section II reviews related work. Section III presents the proposed methodology. Section IV describes the implementation and results. Section V concludes the paper and provides directions for future work.
II. RELATED WORKS
This section reviews prior work. In [5], attacks on anonymized networks are explored; the goal of such attacks is to learn the identity of people. In [7], several anonymization techniques are explored to address the problem of sensitive information disclosure. He et al. [8] focused on building a Bayesian network from the links available in a social network, carrying out their research with hypothetical attributes. Several methods are proposed in [9], and anonymization is explored using the technique known as k-anonymity. In [10], specific usage scenarios in social networks are explored, along with various attacks such as stalking and re-identification. Even when social networking applications publish public data after anonymization, many attacks can still disclose sensitive information. One such attack is the private information inference attack, in which an attacker uses different sources of data to obtain unknown data from known data by matching known attributes. In [11], usage trends of Facebook data were explored.
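The k-anonymity idea mentioned above can be checked mechanically: a table is k-anonymous with respect to a set of quasi-identifiers if every combination of quasi-identifier values appears in at least k rows. A minimal sketch of that check, on invented data:

```python
# Minimal k-anonymity check (illustrative; the table values are made up).
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs in >= k rows."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

table = [
    {"age": "30-39", "zipcode": "537**", "condition": "diabetes"},
    {"age": "30-39", "zipcode": "537**", "condition": "healthy"},
    {"age": "50-59", "zipcode": "537**", "condition": "flu"},
]

print(is_k_anonymous(table, ["age", "zipcode"], 2))  # False: the 50-59 group has only 1 row
print(is_k_anonymous(table, ["age", "zipcode"], 1))  # True
```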
Link-based classification has been studied as a way to address the problem. Various methods related to link-based classification are studied in [12], including mean-field relaxation, loopy belief propagation, and iterative link-based classification. These techniques were not considered in [13], where an alternative approach using Markov networks is proposed. Dynamic methods are employed in [14] to predict class labels for unknown data, and predicting private attributes is carried out in [15]. Classifying the data available on a social network can provide useful insights into its attributes, and it also makes it possible to devise plans for protecting the data. When data is published, it must be ensured that the exposed data does not allow adversaries to perform inference attacks.
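The simplest instance of link-based classification, in the spirit of the relational classifiers surveyed in [12] and [18], predicts an unlabeled node's attribute from the majority attribute of its already-labeled neighbors. A toy sketch, with an invented friendship graph:

```python
# Relational-neighbor classification sketch (hypothetical graph and labels).
# An unlabeled user's hidden attribute is guessed from the majority
# label among their labeled friends.
from collections import Counter

friends = {"u1": ["u2", "u3", "u4"]}          # u1's friend list
labels = {"u2": "private", "u3": "private", "u4": "public"}  # known labels

def predict(node):
    """Majority vote over the labeled friends of `node`."""
    votes = Counter(labels[f] for f in friends[node] if f in labels)
    return votes.most_common(1)[0][0]

print(predict("u1"))  # 'private' (2 of u1's 3 labeled friends are 'private')
```

This illustrates why hiding one's own attribute is not enough: the attribute leaks through the labels of one's links.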
III. PROPOSED SOLUTION
In this paper we propose a solution for preventing private information inference attacks in social networks. An inference attack is an attack used to obtain private and sensitive information from known data. Even though sensitive information is not directly disclosed, it is possible to match the known information against other available data sources and complete an inference attack successfully. This can be prevented with sanitization techniques. In this
paper, we propose a framework that supports both mounting inference attacks and preventing them through sanitization techniques.
Figure 2 – Outline of the proposed architecture
As can be seen in Figure 2, the proposed architecture enables possible inference attacks in order to demonstrate how well the proposed solution prevents them in social networks. Sanitization techniques are used to combat such attacks. Naïve Bayes classification and network classification are used to implement the inference component. More details on the sanitization techniques can be found in [50]. Generalized information loss and structural information loss are the metrics used to evaluate the sanitization techniques.
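The Naïve Bayes inference component mentioned above can be sketched in a few lines. The paper's prototype is in Java and its training data is not given, so the following is an illustrative Python version with invented counts: it guesses a hidden class (e.g., a private attribute) from a single released attribute value using prior × likelihood.

```python
# Tiny Naive Bayes attribute-inference sketch (hypothetical training data).
# Each training pair is (released attribute value, hidden class).
from collections import Counter

train = [
    ("sports", "A"), ("sports", "A"), ("music", "A"),
    ("music", "B"), ("music", "B"), ("sports", "B"),
]

def naive_bayes(feature):
    """Return the class maximizing P(class) * P(feature | class)."""
    classes = Counter(c for _, c in train)
    best, best_p = None, -1.0
    for c, n in classes.items():
        prior = n / len(train)
        likelihood = sum(1 for f, cc in train if cc == c and f == feature) / n
        p = prior * likelihood
        if p > best_p:
            best, best_p = c, p
    return best

print(naive_bayes("sports"))  # 'A' (2 of the 3 'sports' examples are class A)
```

A released attribute that correlates with the hidden one is all an attacker needs; sanitization aims to weaken exactly this correlation.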
IV. PROTOTYPE AND RESULTS
We built a prototype application in the Java programming language. The application demonstrates the inference attacks, the information loss procedures, and the results. The experimental environment is a PC with a dual-core processor and 4 GB RAM running the Windows 7 operating system. The application is tested using census data collected from Internet sources.
Figure 3 – Sanitization process
As can be seen in Figure 3, the data is split into multiple result tables which are then subjected to sanitization. The purpose of sanitization is to allow the data to be exposed to the public without disclosing sensitive information.
Figure 4 – Clustering process
As can be seen in Figure 4, there is provision for loading data, anonymizing it, clustering it, and viewing the table. Two clusters are formed, with the age column used as the clustering criterion.
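The paper does not specify which clustering algorithm the prototype applies to the age column, so the following is only a plausible sketch: a one-dimensional two-means pass that splits ages into two groups, as in Figure 4.

```python
# 1-D two-means sketch for clustering on the age column (illustrative;
# the paper's prototype may use a different algorithm).

def two_means(ages, iters=10):
    """Split a list of ages into two clusters around two moving centers."""
    c1, c2 = min(ages), max(ages)  # initialize centers at the extremes
    for _ in range(iters):
        g1 = [a for a in ages if abs(a - c1) <= abs(a - c2)]
        g2 = [a for a in ages if abs(a - c1) > abs(a - c2)]
        c1 = sum(g1) / len(g1)     # recompute each center as its group mean
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

print(two_means([23, 25, 27, 52, 55, 58]))
# ([23, 25, 27], [52, 55, 58])
```

Clustering before generalization groups similar records together, which keeps the generalized intervals (and hence the information loss) small.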
Figure 5 – Zipcode process for anonymization
As can be seen in Figure 5, the zipcode of the census data is used for anonymization. The UI provides controls for loading clustered data, generating zip codes, viewing zip codes, and re-zipping the zipcode.
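A common way to anonymize zip codes (assumed here; the paper does not state its exact rule) is to suppress trailing digits so that several records share one generalized value:

```python
# Zip-code generalization by suffix suppression (illustrative rule).

def generalize_zip(zipcode, keep=3):
    """Keep the first `keep` digits and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

print(generalize_zip("535558"))          # '535***'
print(generalize_zip("535558", keep=2))  # '53****'
```

Keeping fewer digits merges more records into each group, strengthening anonymity at the cost of higher information loss.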
Figure 6 – General information loss process
General information loss is computed as shown in Figure 6. The UI facilitates loading clustered data and running the general information loss process, and allows users to view the total general information loss and the natural general information loss.
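The paper does not give its exact formula for generalized information loss, but a metric in the style common in the k-anonymity literature measures, for each numeric cell generalized to an interval [lo, hi], the fraction of the attribute's domain that the interval covers, averaged over all cells. A sketch under that assumption:

```python
# Generalization information loss sketch (assumed formula, not the
# paper's): loss of one cell = (hi - lo) / (domain_max - domain_min),
# averaged over all generalized cells.

def gen_info_loss(intervals, domain):
    lo_d, hi_d = domain
    return sum(hi - lo for lo, hi in intervals) / (len(intervals) * (hi_d - lo_d))

# Ages generalized to per-cluster ranges; the age domain is assumed [20, 60].
loss = gen_info_loss([(23, 27), (52, 58)], domain=(20, 60))
print(round(loss, 3))  # 0.125
```

A loss of 0 means nothing was generalized; a loss of 1 means every cell was widened to the full domain, i.e., the value was effectively suppressed.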
Figure 7 – Sensitive value secure process
As can be seen in Figure 7, the sensitive value secure process provides controls for processing the data, calculating sensitive values, securing sensitive data, and viewing the results.
Figure 8 – Total data loss
As can be seen in Figure 8, the total data loss is presented. The screen provides various details and visualizes them using demographic features.
Figure 9 – Generalization information loss
The generalization information loss is computed and presented in Figure 9. The result shows the general information loss for cluster 1, cluster 2, and the total loss incurred.
Figure 10 – Structural information loss
The structural information loss is computed and presented in Figure 10. The result shows the structural information loss for cluster 1, cluster 2, and the total loss incurred.
V. CONCLUSIONS AND FUTURE WORK
In this paper we studied the need for protecting private information in social networks, focusing in particular on private information inference attacks. In these attacks, adversaries infer unknown information from known information, and such attacks expose weaknesses in the security mechanisms of social networking systems. We employed information inference attacks and devised sanitization techniques to prevent them. We built a prototype application as a proof of concept and tested it using the Census dataset obtained from Internet sources. The empirical results are encouraging. In future work we will continue to investigate private information inference attacks in order to build foolproof, privacy-preserving security.
REFERENCES
[1] Facebook Beacon, 2007.
[2] T. Zeller, "AOL Executive Quits After Posting of Search Data," The New York Times, http://www.nytimes.com/2006/08/22/technology/22iht-aol.2558731.html?pagewanted=all&_r=0, Aug. 2006.
[3] K.M. Heussner, "'Gaydar' on Facebook: Can Your Friends Reveal Sexual Orientation?" ABC News, http://abcnews.go.com/Technology/gaydar-facebook-friends/story?id=8633224#.UZ939UqheOs, Sept. 2009.
[4] C. Johnson, "Project Gaydar," The Boston Globe, Sept. 2009.
[5] L. Backstrom, C. Dwork, and J. Kleinberg, "Wherefore Art Thou r3579x?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography," Proc. 16th Int'l Conf. World Wide Web (WWW '07), pp. 181-190, 2007.
[6] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, "Anonymizing Social Networks," Technical Report 07-19, Univ. of Massachusetts Amherst, 2007.
[7] K. Liu and E. Terzi, "Towards Identity Anonymization on Graphs," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '08), pp. 93-106, 2008.
[8] J. He, W. Chu, and V. Liu, "Inferring Privacy Information from Social Networks," Proc. Intelligence and Security Informatics, 2006.
[9] E. Zheleva and L. Getoor, "Preserving the Privacy of Sensitive Relationships in Graph Data," Proc. First ACM SIGKDD Int'l Conf. Privacy, Security, and Trust in KDD, pp. 153-171, 2008.
[10] R. Gross, A. Acquisti, and J.H. Heinz, "Information Revelation and Privacy in Online Social Networks," Proc. ACM Workshop Privacy in the Electronic Soc. (WPES '05), pp. 71-80, http://dx.doi.org/10.1145/1102199.1102214, 2005.
[11] H. Jones and J.H. Soltren, "Facebook: Threats to Privacy," technical report, Massachusetts Inst. of Technology, 2005.
[12] P. Sen and L. Getoor, "Link-Based Classification," Technical Report CS-TR-4858, Univ. of Maryland, Feb. 2007.
[13] B. Taskar, P. Abbeel, and D. Koller, "Discriminative Probabilistic Models for Relational Data," Proc. 18th Ann. Conf. Uncertainty in Artificial Intelligence (UAI '02), pp. 485-492, 2002.
[14] A. Menon and C. Elkan, "Predicting Labels for Dyadic Data," Data Mining and Knowledge Discovery, vol. 21, pp. 327-343, 2010.
[15] E. Zheleva and L. Getoor, "To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles," Technical Report CS-TR-4926, Univ. of Maryland, College Park, July 2008.
[16] N. Talukder, M. Ouzzani, A.K. Elmagarmid, H. Elmeleegy, and M. Yakout, "Privometer: Privacy Protection in Social Networks," Proc. IEEE 26th Int'l Conf. Data Eng. Workshops (ICDE '10), pp. 266-269, 2010.
[17] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, "Inferring Private Information Using Social Network Data," Proc. 18th Int'l Conf. World Wide Web (WWW), 2009.
[18] S.A. Macskassy and F. Provost, "Classification in Networked Data: A Toolkit and a Univariate Case Study," J. Machine Learning Research, vol. 8, pp. 935-983, 2007.
[19] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-based Systems, pp. 557-570, 2002.
[20] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, "L-Diversity: Privacy Beyond K-Anonymity," ACM Trans. Knowledge Discovery from Data, vol. 1, no. 1, p. 3, 2007.
[21] C. Dwork, "Differential Privacy," Automata, Languages and Programming, M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds., vol. 4052, pp. 1-12, Springer, 2006.
[22] A. Friedman and A. Schuster, "Data Mining with Differential Privacy," Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 493-502, 2010.
[23] K. Fukunaga and D.M. Hummels, "Bayes Error Estimation Using Parzen and K-nn Procedures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-9, no. 5, pp. 634-643, http://portal.acm.org/citation.cfm?id=28809.28814, Sept. 1987.
[24] C. Clifton, "Using Sample Size to Limit Exposure to Data Mining," J. Computer Security, vol. 8, pp. 281-307, http://portal.acm.org/citation.cfm?id=371090.371092, Dec. 2000.
[25] K. Tumer and J. Ghosh, "Bayes Error Rate Estimation Using Classifier Ensembles," Int'l J. Smart Eng. System Design, vol. 5, no. 2, pp. 95-110, 2003.
[26] C. van Rijsbergen, S. Robertson, and M. Porter, "New Models in Probabilistic Information Retrieval," Technical Report 5587, British Library, 1980.
[27] D.J. Watts and S.H. Strogatz, "Collective Dynamics of Small-World Networks," Nature, vol. 393, no. 6684, pp. 440-442, June 1998.
AUTHORS
JAYASREE DASARI is currently working towards her M.Tech degree at Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, A.P., India. Her research interests include networking and cloud computing.
K.R. KOTESWARA RAO is working as an Assistant Professor at Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, A.P., India. His main research interests are data mining and big data mining.