Jayasree Dasari et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.12, December- 2014, pg. 236-244
© 2014, IJCSMC All Rights Reserved 236
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue. 12, December 2014, pg.236 – 244
RESEARCH ARTICLE
Sanitization Techniques for Protecting Social Networks from Inference Attacks

JAYASREE DASARI¹, K. R. KOTESWARA RAO²

¹Student, Department of CSE, Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, JNTU Kakinada University, A.P., India
²Assistant Professor, Department of CSE, Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, JNTU Kakinada University, A.P., India

[email protected]¹, [email protected]²
________________________________________________________________________________________________________________________________
Abstract-- Social networking applications have become popular because they provide a virtual platform through which people from all walks of life can communicate over the Internet. These applications allow users to publish their data and share it with other users. In the process, adversaries can launch private information inference attacks to learn sensitive information. In this paper we examine how such applications are vulnerable to private information inference attacks and devise mechanisms to prevent them. We employ various sanitization techniques that prevent private information inference attacks, so that sensitive information is not disclosed to unauthorized people. We built a prototype system as a proof of concept and tested it using the Census dataset. The empirical results are encouraging.
Index Terms – Social networks, data mining, inference attacks, sanitization techniques
_____________________________________________________________________________
I. INTRODUCTION
Social networking has been around for many years. People from all walks of life depend on the Internet for various kinds of information, and the use of social networks such as Facebook has grown steadily across all demographics. As a result, a great deal of information is made available over social networks. When sensitive information is disclosed, it can be misused by unknown parties. Moreover, the security settings provided by social networks are inadequate, so users' private information is not fully protected. When some data is known, missing data can often be obtained from other data sources, because data is replicated across multiple sites. An attack that infers unknown information from known information in this way is called a private information inference attack.
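To make the idea concrete, the following sketch shows one common form of such an attack: joining an "anonymized" release with a public profile list on shared quasi-identifiers. This example is illustrative only; all names, attributes, and values below are invented and are not from the paper's dataset.

```python
# Illustrative linkage-style inference attack (hypothetical data).
# An adversary matches a sanitized release (names removed, sensitive
# column kept) against public profiles on shared quasi-identifiers
# (age, zipcode) to re-attach identities to sensitive values.

public_profiles = [
    {"name": "Alice", "age": 34, "zipcode": "53715"},
    {"name": "Bob",   "age": 51, "zipcode": "53703"},
]

anonymized_release = [
    {"age": 34, "zipcode": "53715", "condition": "diabetes"},
    {"age": 51, "zipcode": "53703", "condition": "healthy"},
]

def infer(profiles, release):
    """Re-identify records by matching on the quasi-identifiers."""
    inferred = {}
    for person in profiles:
        for record in release:
            if (person["age"], person["zipcode"]) == (record["age"], record["zipcode"]):
                inferred[person["name"]] = record["condition"]
    return inferred

print(infer(public_profiles, anonymized_release))
# {'Alice': 'diabetes', 'Bob': 'healthy'}
```

Even though no name appears in the release, the shared attributes are enough to link each sensitive value back to a person.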
Figure 1 – Possible ways in which social networking can be used
As can be seen in Figure 1, the information available over the Internet, and especially on social networks, pertains to alliances, contacts, social media, careers, connections, social skills, relationships, virtual communities, friendship, and business opportunities. Among this information, sensitive information disclosure is the problem to be addressed.
In this paper we use various sanitization mechanisms to prevent private information inference attacks. The remainder of the paper is structured as follows. Section II reviews related work. Section III presents the proposed methodology. Section IV describes the implementation and results. Section V concludes the paper and provides directions for future work.
II. RELATED WORKS
This section reviews prior work. In [5], attacks on anonymized networks are explored; the goal of such attacks is to learn the identity of people. In [7], several anonymization techniques are explored to address the problem of sensitive information disclosure. He et al. [8] focused on building a Bayesian network from the links available in a social network, carrying out their research with hypothetical attributes. Several methods are proposed in [9], and anonymization is explored using the technique known as k-anonymity. In [10], specific usage scenarios in social networks are explored, along with various attacks such as stalking and re-identification. Even when social networking applications publish public data after anonymization, many attacks can still disclose sensitive information. One such attack is the private information inference attack, in which an attacker uses different sources of data to obtain unknown data from known data by matching known attributes. In [11], usage trends of Facebook data were explored.
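The k-anonymity idea mentioned above can be checked mechanically: a table is k-anonymous with respect to a set of quasi-identifiers if every combination of quasi-identifier values appears in at least k rows. A minimal sketch of that check, on invented data:

```python
# Minimal k-anonymity check (illustrative; the table values are made up).
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs in >= k rows."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

table = [
    {"age": "30-39", "zipcode": "537**", "condition": "diabetes"},
    {"age": "30-39", "zipcode": "537**", "condition": "healthy"},
    {"age": "50-59", "zipcode": "537**", "condition": "flu"},
]

print(is_k_anonymous(table, ["age", "zipcode"], 2))  # False: the 50-59 group has only 1 row
print(is_k_anonymous(table, ["age", "zipcode"], 1))  # True
```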
Link-based classification has been studied as a way to address the problem. Various methods related to link-based classification are studied in [12], including mean-field relaxation, loopy belief propagation, and iterative link-based classification. These techniques were not considered in [13], where an alternative approach using Markov networks is proposed. Dynamic methods are employed in [14] to predict class labels for unknown data, and predicting private attributes is carried out in [15]. Classifying the data available on a social network can provide useful insights into its attributes, and it also makes it possible to devise plans for protecting the data. When data is published, it must be ensured that the exposed data does not allow adversaries to perform inference attacks.
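The simplest instance of link-based classification, in the spirit of the relational classifiers surveyed in [12] and [18], predicts an unlabeled node's attribute from the majority attribute of its already-labeled neighbors. A toy sketch, with an invented friendship graph:

```python
# Relational-neighbor classification sketch (hypothetical graph and labels).
# An unlabeled user's hidden attribute is guessed from the majority
# label among their labeled friends.
from collections import Counter

friends = {"u1": ["u2", "u3", "u4"]}          # u1's friend list
labels = {"u2": "private", "u3": "private", "u4": "public"}  # known labels

def predict(node):
    """Majority vote over the labeled friends of `node`."""
    votes = Counter(labels[f] for f in friends[node] if f in labels)
    return votes.most_common(1)[0][0]

print(predict("u1"))  # 'private' (2 of u1's 3 labeled friends are 'private')
```

This illustrates why hiding one's own attribute is not enough: the attribute leaks through the labels of one's links.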
III. PROPOSED SOLUTION
In this paper we propose a solution for preventing private information inference attacks in social networks. An inference attack is an attack used to obtain private and sensitive information from known data. Even though sensitive information is not directly disclosed, it is possible to match the known information against other available data sources and complete an inference attack successfully. This can be prevented with sanitization techniques. In this
paper, we propose a framework that supports both mounting inference attacks and preventing them through sanitization techniques.
Figure 2 – Outline of the proposed architecture
As can be seen in Figure 2, the proposed architecture enables possible inference attacks in order to demonstrate how well the proposed solution prevents them in social networks. Sanitization techniques are used to combat such attacks. Naïve Bayes classification and network classification are used to implement the inference component. More details on the sanitization techniques can be found in [50]. Generalized information loss and structural information loss are the metrics used to evaluate the sanitization techniques.
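The Naïve Bayes inference component mentioned above can be sketched in a few lines. The paper's prototype is in Java and its training data is not given, so the following is an illustrative Python version with invented counts: it guesses a hidden class (e.g., a private attribute) from a single released attribute value using prior × likelihood.

```python
# Tiny Naive Bayes attribute-inference sketch (hypothetical training data).
# Each training pair is (released attribute value, hidden class).
from collections import Counter

train = [
    ("sports", "A"), ("sports", "A"), ("music", "A"),
    ("music", "B"), ("music", "B"), ("sports", "B"),
]

def naive_bayes(feature):
    """Return the class maximizing P(class) * P(feature | class)."""
    classes = Counter(c for _, c in train)
    best, best_p = None, -1.0
    for c, n in classes.items():
        prior = n / len(train)
        likelihood = sum(1 for f, cc in train if cc == c and f == feature) / n
        p = prior * likelihood
        if p > best_p:
            best, best_p = c, p
    return best

print(naive_bayes("sports"))  # 'A' (2 of the 3 'sports' examples are class A)
```

A released attribute that correlates with the hidden one is all an attacker needs; sanitization aims to weaken exactly this correlation.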
IV. PROTOTYPE AND RESULTS
We built a prototype application in the Java programming language. The application demonstrates the inference attacks, the information loss procedures, and the results. The experimental environment is a PC with a dual-core processor and 4 GB RAM running the Windows 7 operating system. The application is tested using census data collected from Internet sources.
Figure 3 – Sanitization process
As can be seen in Figure 3, the data is split into multiple result tables which are then subjected to sanitization. The purpose of sanitization is to allow the data to be exposed to the public without disclosing sensitive information.
Figure 4 – Clustering process
As can be seen in Figure 4, there is provision for loading data, anonymizing it, clustering it, and viewing the table. Two clusters are formed, with the age column used as the clustering criterion.
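The paper does not specify which clustering algorithm the prototype applies to the age column, so the following is only a plausible sketch: a one-dimensional two-means pass that splits ages into two groups, as in Figure 4.

```python
# 1-D two-means sketch for clustering on the age column (illustrative;
# the paper's prototype may use a different algorithm).

def two_means(ages, iters=10):
    """Split a list of ages into two clusters around two moving centers."""
    c1, c2 = min(ages), max(ages)  # initialize centers at the extremes
    for _ in range(iters):
        g1 = [a for a in ages if abs(a - c1) <= abs(a - c2)]
        g2 = [a for a in ages if abs(a - c1) > abs(a - c2)]
        c1 = sum(g1) / len(g1)     # recompute each center as its group mean
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

print(two_means([23, 25, 27, 52, 55, 58]))
# ([23, 25, 27], [52, 55, 58])
```

Clustering before generalization groups similar records together, which keeps the generalized intervals (and hence the information loss) small.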
Figure 5 – Zipcode process for anonymization
As can be seen in Figure 5, the zipcode of the census data is used for anonymization. The UI provides controls for loading clustered data, generating zip codes, viewing zip codes, and re-zipping the zipcode.
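A common way to anonymize zip codes (assumed here; the paper does not state its exact rule) is to suppress trailing digits so that several records share one generalized value:

```python
# Zip-code generalization by suffix suppression (illustrative rule).

def generalize_zip(zipcode, keep=3):
    """Keep the first `keep` digits and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

print(generalize_zip("535558"))          # '535***'
print(generalize_zip("535558", keep=2))  # '53****'
```

Keeping fewer digits merges more records into each group, strengthening anonymity at the cost of higher information loss.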
Figure 6 – General information loss process
General information loss is computed as shown in Figure 6. The UI facilitates loading clustered data and running the general information loss process, and allows users to view the total general information loss and the natural general information loss.
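The paper does not give its exact formula for generalized information loss, but a metric in the style common in the k-anonymity literature measures, for each numeric cell generalized to an interval [lo, hi], the fraction of the attribute's domain that the interval covers, averaged over all cells. A sketch under that assumption:

```python
# Generalization information loss sketch (assumed formula, not the
# paper's): loss of one cell = (hi - lo) / (domain_max - domain_min),
# averaged over all generalized cells.

def gen_info_loss(intervals, domain):
    lo_d, hi_d = domain
    return sum(hi - lo for lo, hi in intervals) / (len(intervals) * (hi_d - lo_d))

# Ages generalized to per-cluster ranges; the age domain is assumed [20, 60].
loss = gen_info_loss([(23, 27), (52, 58)], domain=(20, 60))
print(round(loss, 3))  # 0.125
```

A loss of 0 means nothing was generalized; a loss of 1 means every cell was widened to the full domain, i.e., the value was effectively suppressed.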
Figure 7 – Sensitive value secure process
As can be seen in Figure 7, the sensitive value secure process provides controls for processing the data, calculating sensitive values, securing sensitive data, and viewing the results.
Figure 8 – Total data loss
As can be seen in Figure 8, the total data loss is presented. The screen provides various details and visualizes them using demographic features.
Figure 9 – Generalization information loss
The generalization information loss is computed and presented in Figure 9. The result shows the general information loss for cluster 1, cluster 2, and the total loss incurred.
Figure 10 – Structural information loss
The structural information loss is computed and presented in Figure 10. The result shows the structural information loss for cluster 1, cluster 2, and the total loss incurred.
V. CONCLUSIONS AND FUTURE WORK
In this paper we studied the need for protecting private information in social networks, focusing in particular on private information inference attacks. In these attacks, adversaries infer unknown information from known information, and such attacks expose weaknesses in the security mechanisms of social networking systems. We employed information inference attacks and devised sanitization techniques to prevent them. We built a prototype application as a proof of concept and tested it using the Census dataset obtained from Internet sources. The empirical results are encouraging. In future work we will continue to investigate private information inference attacks in order to build foolproof, privacy-preserving security.
REFERENCES
[1] Facebook Beacon, 2007.
[2] T. Zeller, "AOL Executive Quits After Posting of Search Data," The New York Times, http://www.nytimes.com/2006/08/22/technology/22iht-aol.2558731.html?pagewanted=all&_r=0, Aug. 2006.
[3] K.M. Heussner, "'Gaydar' on Facebook: Can Your Friends Reveal Sexual Orientation?" ABC News, http://abcnews.go.com/Technology/gaydar-facebook-friends/story?id=8633224#.UZ939UqheOs, Sept. 2009.
[4] C. Johnson, "Project Gaydar," The Boston Globe, Sept. 2009.
[5] L. Backstrom, C. Dwork, and J. Kleinberg, "Wherefore Art Thou r3579x?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography," Proc. 16th Int'l Conf. World Wide Web (WWW '07), pp. 181-190, 2007.
[6] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, "Anonymizing Social Networks," Technical Report 07-19, Univ. of Massachusetts Amherst, 2007.
[7] K. Liu and E. Terzi, "Towards Identity Anonymization on Graphs," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '08), pp. 93-106, 2008.
[8] J. He, W. Chu, and V. Liu, "Inferring Privacy Information from Social Networks," Proc. Intelligence and Security Informatics, 2006.
[9] E. Zheleva and L. Getoor, "Preserving the Privacy of Sensitive Relationships in Graph Data," Proc. First ACM SIGKDD Int'l Conf. Privacy, Security, and Trust in KDD, pp. 153-171, 2008.
[10] R. Gross, A. Acquisti, and J.H. Heinz, "Information Revelation and Privacy in Online Social Networks," Proc. ACM Workshop Privacy in the Electronic Soc. (WPES '05), pp. 71-80, http://dx.doi.org/10.1145/1102199.1102214, 2005.
[11] H. Jones and J.H. Soltren, "Facebook: Threats to Privacy," technical report, Massachusetts Inst. of Technology, 2005.
[12] P. Sen and L. Getoor, "Link-Based Classification," Technical Report CS-TR-4858, Univ. of Maryland, Feb. 2007.
[13] B. Taskar, P. Abbeel, and D. Koller, "Discriminative Probabilistic Models for Relational Data," Proc. 18th Ann. Conf. Uncertainty in Artificial Intelligence (UAI '02), pp. 485-492, 2002.
[14] A. Menon and C. Elkan, "Predicting Labels for Dyadic Data," Data Mining and Knowledge Discovery, vol. 21, pp. 327-343, 2010.
[15] E. Zheleva and L. Getoor, "To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles," Technical Report CS-TR-4926, Univ. of Maryland, College Park, July 2008.
[16] N. Talukder, M. Ouzzani, A.K. Elmagarmid, H. Elmeleegy, and M. Yakout, "Privometer: Privacy Protection in Social Networks," Proc. IEEE 26th Int'l Conf. Data Eng. Workshops (ICDE '10), pp. 266-269, 2010.
[17] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, "Inferring Private Information Using Social Network Data," Proc. 18th Int'l Conf. World Wide Web (WWW), 2009.
[18] S.A. Macskassy and F. Provost, "Classification in Networked Data: A Toolkit and a Univariate Case Study," J. Machine Learning Research, vol. 8, pp. 935-983, 2007.
[19] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-based Systems, pp. 557-570, 2002.
[20] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, "L-Diversity: Privacy Beyond K-Anonymity," ACM Trans. Knowledge Discovery from Data, vol. 1, no. 1, p. 3, 2007.
[21] C. Dwork, "Differential Privacy," Automata, Languages and Programming, M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds., vol. 4052, pp. 1-12, Springer, 2006.
[22] A. Friedman and A. Schuster, "Data Mining with Differential Privacy," Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 493-502, 2010.
[23] K. Fukunaga and D.M. Hummels, "Bayes Error Estimation Using Parzen and K-nn Procedures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-9, no. 5, pp. 634-643, http://portal.acm.org/citation.cfm?id=28809.28814, Sept. 1987.
[24] C. Clifton, "Using Sample Size to Limit Exposure to Data Mining," J. Computer Security, vol. 8, pp. 281-307, http://portal.acm.org/citation.cfm?id=371090.371092, Dec. 2000.
[25] K. Tumer and J. Ghosh, "Bayes Error Rate Estimation Using Classifier Ensembles," Int'l J. Smart Eng. System Design, vol. 5, no. 2, pp. 95-110, 2003.
[26] C. van Rijsbergen, S. Robertson, and M. Porter, "New Models in Probabilistic Information Retrieval," Technical Report 5587, British Library, 1980.
[27] D.J. Watts and S.H. Strogatz, "Collective Dynamics of Small-World Networks," Nature, vol. 393, no. 6684, pp. 440-442, June 1998.
AUTHORS
JAYASREE DASARI is currently working towards her M.Tech degree at Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, A.P., India. Her research interests include networking and cloud computing.
K.R. KOTESWARA RAO is working as an Assistant Professor at Gokul Institute of Technology and Sciences, Piridi village, Bobbili mandalam, Vizianagaram district, A.P., India. His main research interests are data mining and big data mining.