Personalized privacy protection in social networks through adversarial modeling

Sachin Biradar and Elena Zheleva
Department of Computer Science, University of Illinois at Chicago
Chicago, IL, USA
[email protected], [email protected]

Abstract

In order to personalize user experiences and improve the reach of advertisers, online companies increasingly rely on powerful predictive models to infer user attributes and preferences. Meanwhile, some of these attributes may be of a sensitive nature and users have little say in keeping such predicted attribute values private. Here, we propose a personalized adversarial approach which can empower users to change parts of their online profile in a way that makes it harder for companies to predict their private information. Our work is tailored to state-of-the-art graph convolutional networks which are increasingly used for attribute prediction in graphs. We mitigate their predictive power by helping users decide which relationships and non-sensitive personal attributes they need to change in order to protect their privacy. Unlike existing adversarial approaches, we consider two constraints that correspond to realistic real-world scenarios: 1) a user has control over changing their own attributes only and no collusion between users can occur, 2) a user is willing to change only certain attributes and relationships but not others. Our approach achieves much better user privacy protection when compared to state-of-the-art adversarial algorithms for graphs.

Introduction

User-generated content on social media has led to increased interest in mining actionable patterns from user data (Zafarani, Abbasi, and Liu 2014). This content varies from declaring personal attributes, such as name, gender, and education, to sharing ideas, photos, events, and interests. It can be broadly classified into explicit and implicit (Novak and Li 2012). Explicit content is information provided by the user on the social media platform; it can be made publicly available or available to a limited set of users on that platform. In contrast, implicit content is content that the users have chosen to withhold from the platform but that the platform can infer based on explicit user content and the content in their social networks. Advancements in data mining techniques have led to the development of powerful models that can predict implicit user information with good accuracy. Predicting implicit user information has many applications, from advertising and recommender systems (Ruining He 2018) to understanding health-related behavior (Sun et al. 2019a).

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

While leading to a better and more personalized user experience and improved reach for advertisers, implicit attribute prediction has also caused concerns regarding the privacy of users who participate in social networks (Lee Rainie 2018). Multiple studies have shown that it is possible to reliably infer sensitive information from publicly available data (Zheleva and Getoor 2009; Lindamood et al. 2009; Kosinski, Stillwell, and Graepel 2013; Garcia 2017), leading to sensitive attribute disclosure (Zheleva, Terzi, and Getoor 2012). In the absence of rules and guidelines on how to ethically perform such predictions, or how much of it is even acceptable, individuals, companies, and government entities can apply machine learning models to public data for classification and prediction of user behaviour without the explicit permission of the users themselves. Solutions such as allowing users to hide their own information have only been successful to a limited extent, because these models also take into account the data generated by other users in the network. A natural question one can ask is whether users have any power over protecting their own private information from being predicted accurately.

We propose a user-centric solution that can help users have more control over the prediction of their sensitive attributes. We take an adversarial learning approach and devise an algorithm, Outwit, which can suggest to users what explicit content to change on their social network profiles in order to mislead predictive algorithms. Unlike a typical adversarial scenario in which the adversary is the person trying to mislead the predictive algorithm, we assume that the predictive algorithm itself is the adversary because it is trying to infer a sensitive attribute. We focus on graph data and assume that the predictive algorithm uses graph convolutional networks (GCN) for sensitive attribute prediction.

Outwit's goal is to mislead the predictive algorithm and help a user protect their privacy. As such, Outwit falls under the umbrella of evasion attacks, in which contaminated test data is introduced; in Outwit's case, the perturbed data of an individual profile. Unlike existing evasion attacks for GCN (Dai et al. 2018b; Wang and Gong 2019), which require the content of multiple profiles to be perturbed, we consider realistic constraints which assume that 1) a user has control over changing their own content only and not the content of other users and 2) they are willing to change some attributes and relationships but not others.


Algorithm                  Type       Model knowledge       Perturbations    User-centric   Target Model
(Zügner et al. 2018)       Poisoning  Black Box             Feature & Edge   No             GCN
(Bojchevski et al. 2019)   Poisoning  White Box             Edge             No             DeepWalk
(Sun et al. 2019b)         Poisoning  Black Box             Edge             No             GCN
(Chang et al. 2019b)       Poisoning  Black Box             Edge             No             DeepWalk, LINE & GCN
(Dai et al. 2018a)         Evasion    Black-box/White-box   Edge             No             GCN
(Wang and Gong 2019)       Evasion    White Box             Edge             No             LinLBP
Outwit                     Evasion    Grey-box              Feature & Edge   Yes            GCN

Table 1: Comparison between Outwit and other adversarial algorithms for node classification.

    Our main contributions are:

• Framing a new privacy problem of user-centric adversarial perturbations with utility constraints.

• Proposing a realistic "grey-box" scenario in which the model type of the predictive algorithm is known and the algorithm parameters can be estimated using publicly available data.

• Developing a new gradient-based algorithm which finds the minimum number of node attribute and edge changes necessary to get the sensitive attribute misclassified while satisfying the utility constraints.

• Showing empirically that our solution leads to an 8-25% absolute accuracy decrease over a state-of-the-art adversarial algorithm with as few as 6 edge deletions.

Related work

Friendship links and group affiliations can leak a surprisingly large amount of sensitive data about individuals (Zheleva and Getoor 2009; Lindamood et al. 2009; Kosinski, Stillwell, and Graepel 2013). There has been an increasing interest in personalized privacy assistants that can help users manage their privacy online (e.g., (Fang and LeFevre 2010; Ghazinour, Matwin, and Sokolova 2013; Wisniewski, Knijnenburg, and Lipford 2017)). Meanwhile, the use of graph neural networks has increased in popularity recently (Gori, Monfardini, and Scarselli 2005; Scarselli et al. 2005, 2009a,b; Li et al. 2016), posing threats to personal privacy. However, none of the personalized privacy assistant studies consider user-centric adversarial approaches to combat the predictive power of neural network algorithms, as we do in this work.

The goal of adversarial attacks is to mislead machine learning models by introducing misleading examples during the testing phase (evasion attack) (Dai et al. 2018a; Wang and Gong 2019), contaminating the training data with malicious examples (poisoning attack) (Zügner, Akbarnejad, and Günnemann 2018; Sun et al. 2019b, 2018; Chang et al. 2019b), or learning as much as possible about the model to be misled (exploratory attack) (Biggio, Fumera, and Roli 2014). Adversarial learning in graphs is fairly new (Zügner, Akbarnejad, and Günnemann 2018; Dai et al. 2018a; Sun et al. 2018). Dai et al. (2018a) demonstrate a black-box evasion attack assuming GCN (Kipf and Welling 2017) as the node classification model. The model is allowed to add or delete edges from the graph but node features remain unchanged. Dai et al. also propose a gradient-based white-box attack algorithm which is closest to our work and which we use as a baseline. In contrast to their greedy combinatorial approach, we use the magnitude of the gradients along with the degree of the nodes to find optimal perturbations. Zügner et al. (2018) focus on poisoning attacks, also using GCN (Kipf and Welling 2017) as the target model but without preserving its non-linearity. The attacker is allowed to manipulate the graph structure as well as the node features while preserving important graph characteristics (degree distribution and feature co-occurrences). Sun et al. (2018) present a poisoning attack against unsupervised node embedding methods. The goal is either to change the similarity score of a target node pair, or to affect the link prediction accuracy of a test set by adding or deleting edges. Wang and Gong (2019) optimize an attack targeting a linearized loopy belief propagation (LinLBP) classifier in order to evade detection by manipulating the graph structure, resulting in an increase of the false negative rate. Table 1 summarizes the differences between these approaches and ours. To the best of our knowledge, we are the first to consider feature perturbations for evasion attacks and to use graph derivative mapping between node features and class labels for determining feature perturbations. While other white-box evasion models (Dai et al. 2018b) greedily add and delete edges based on the gradient information, we use the gradient information to first find the dominant nodes in the graph and then make changes to the target node's edges.

    Preliminaries

    Data model

We represent the social network as an undirected graph. The users in the network are represented by nodes and their profile information is represented by the attributes or features of the nodes. Relationships, such as friendships or following relationships, are represented by edges between the users. Formally, let G = (A, X) be an attributed undirected graph where A ∈ {0,1}^(N×N) is the symmetric binary adjacency matrix representing the relationships between the users, X ∈ {0,1}^(N×D) represents the binary node features, N is the number of users and D is the number of node features. We denote the D-dimensional feature vector of node i as X_i ∈ {0,1}^D. We denote the node IDs by N = {1, 2, ..., N} and the feature IDs by {1, 2, ..., D}.
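For concreteness, the data model above can be held in memory as an adjacency matrix together with a binary feature matrix. The following is a minimal illustrative sketch in Python (NumPy); the variable names and toy sizes are our own assumptions, not part of the paper.

import numpy as np

# Toy graph with N = 4 users and D = 3 binary profile features.
N, D = 4, 3

# Symmetric binary adjacency matrix A in {0,1}^(N x N).
A = np.zeros((N, N), dtype=np.int8)
for i, j in [(0, 1), (1, 2), (2, 3)]:  # undirected edges
    A[i, j] = A[j, i] = 1

# Binary node feature matrix X in {0,1}^(N x D); row i is the feature vector of node i.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]], dtype=np.int8)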

Figure 1: A user in a social network from class A changes some parts of his profile to keep his class label private and successfully gets misclassified to class B.

    Target classifier

We assume that there is a target classifier that is used to predict missing (and potentially sensitive) attributes on user profiles, in order to target users based on hidden characteristics. This classifier can be built by the company behind the social network or by third parties with access to the social network data. The target classifier performs node classification and it considers a partially labeled graph G_L = (A, X, Y_L), where Y_L are the nodes with explicit labels. If there are C classes, we assume the class labels to be C = {1, 2, ..., C}. Given this partially labelled graph G_L, the task of the target classifier is to find an optimal function f : N → C which maps each node i ∈ N to a single class in C in order to correctly classify the unlabeled nodes. Since we have the complete A and X, the features and edges of unlabelled nodes (test nodes) in the graph G_L are known and are part of the training data; hence this corresponds to a transductive learning scenario (Chapelle, Scholkopf, and Zien 2009).

Graph Convolutional Networks (GCNs) have achieved state-of-the-art results on node classification tasks, hence we assume that the target classifier is based on the GCN by Kipf and Welling (2017). GCNs use spectral graph convolutions in which filter parameters are shared over all locations in the graph. A single layer in this model is a non-linear function of the form

    H^(k+1) = σ(H^(k), Â, W)    (1)

where H^(0) = X, K is the number of layers and W is a trainable weight matrix. The model uses ReLU as the non-linear activation function; the adjacency matrix A is normalized to Â for faster computation. The final model is a 2-layer neural network with the following propagation rule:

    Z = f(X, A) = softmax(Â ReLU(Â X W^(0)) W^(1))    (2)

where W^(0) ∈ R^(D×H) is an input-to-hidden weight matrix for a hidden layer with H feature maps and W^(1) ∈ R^(H×C) is a hidden-to-output weight matrix. The softmax activation function, defined as softmax(h_i) = (1/Z) exp(h_i) with Z = ∑_i exp(h_i), is applied row-wise. To optimize node classification, the cross-entropy error over all labeled examples is used as the loss:

    L = − ∑_{Y ∈ Y_L} ∑_{c ∈ C} Y_c ln Z_c    (3)
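As a reference point, the two-layer propagation rule in Equation 2 and the loss in Equation 3 can be written in a few lines of code. The sketch below is an illustrative NumPy version under our own naming (gcn_forward, cross_entropy, A_hat for the normalized adjacency matrix); it is not the authors' implementation.

import numpy as np

def gcn_forward(A_hat, X, W0, W1):
    # Two-layer GCN of Equation 2: Z = softmax(A_hat ReLU(A_hat X W0) W1).
    H = np.maximum(A_hat @ X @ W0, 0.0)            # hidden layer with ReLU
    logits = A_hat @ H @ W1                        # hidden-to-output layer
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)  # row-wise softmax

def cross_entropy(Z, Y_onehot, labeled_idx):
    # Loss of Equation 3, summed over the labeled nodes only.
    eps = 1e-12
    return -np.sum(Y_onehot[labeled_idx] * np.log(Z[labeled_idx] + eps))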

Information available to the adversarial algorithm

The goal of our adversarial algorithm is to perturb the graph attributes of a focus node in a way that makes the target classifier predict the node's sensitive attribute poorly. There are two types of information that the adversarial algorithm has access to: 1) information about the target classifier, and 2) the user attributes and edges that the adversarial algorithm has the power to perturb.

Adversarial attacks can be classified into two main groups based on the amount of target classifier information accessible to them:

• White-box attack: The algorithm has access to the target classifier predictions and to its model parameters (Dai et al. 2018a; Sun et al. 2018).

• Black-box attack: The algorithm has access only to the predictions of the target classifier (Chang et al. 2019a; Zügner, Akbarnejad, and Günnemann 2018; Dai et al. 2018a; Bojchevski and Günnemann 2019).

Here we introduce a third type of attack which we call a grey-box attack. In a grey-box attack, the adversarial algorithm does not have access to the actual target model but has partial access to the training nodes of the target classifier and their true labels. Therefore, it can estimate the parameters of the target model based on this data and make node label predictions using the estimated model. Grey-box attacks are more practical than white-box attacks because the adversarial algorithm is unlikely to have access to the target classifier parameters but can estimate them based on publicly available data. At the same time, black-box attacks are rather limited in their capabilities and are known to perform much worse than white-box attacks (Dai et al. 2018a).

Our work assumes a grey-box attack and that the adversarial algorithm has access to true class labels for some nodes in the graph to train the estimated classifier. We also make the practical assumption that the only information that the adversarial algorithm can change is the focus node's own attributes and friendship links. This user-centric approach is one of the main distinguishing characteristics of our setup as compared to current adversarial algorithms on graphs, which assume that they can perturb multiple nodes' profiles at the same time (Zügner, Akbarnejad, and Günnemann 2018; Dai et al. 2018a; Bojchevski and Günnemann 2019; Chang et al. 2019a; Sun et al. 2018; Zügner and Günnemann 2019; Wang and Gong 2019; Sun et al. 2019b; Chang et al. 2019b).

Privacy protection problem

Adversarial Perturbations

The task of the adversarial algorithm is to modify the profile information and relationships of a target user t in the social network such that the target classifier f cannot predict the true class label of t. Since f's parameters are unknown, we estimate a classifier f̂ trained on a partially labelled social network graph G_L. Let G̃ be the graph after the adversarial algorithm is applied, X_td ∈ {0,1} and X̃_td ∈ {0,1} be the values of feature d for target node t before and after the perturbation, A_ti ∈ {0,1} be a binary value representing whether an edge exists between node t and node i, and let the constants γ, δ ∈ Z≥0 represent the maximum number of changes that are allowed to be made to the feature vector and edges, respectively, for node t. Given the classifier f̂ and (G_t, Y_t), where G_t is the graph visible to the target node t and Y_t is the true class of t, the task of the adversarial algorithm g : G → G̃ is to modify the graph G = (A, X) to G̃ = (Ã, X̃) such that:

    max_G̃  P(f̂(G̃, t) ≠ Y_t)
    s.t.  G̃ = g(f̂, (G_t, Y_t))
          ∑_{d=1}^{D} |X_td − X̃_td| ≤ γ
          ∑_{i=1}^{N} |A_ti − Ã_ti| ≤ δ
    (4)

The constraints ensure that the adversarial algorithm does not drastically modify the target node and that the changes it makes are unnoticeable. These perturbations will allow a social network user to change their profile and safeguard their sensitive attributes from machine learning predictions.
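The two budget constraints in Equation 4 are Hamming-distance bounds on the target node's own feature vector and adjacency row, so they can be checked directly. The following is a small sketch under our own naming (gamma and delta follow the paper's γ and δ); it only illustrates the constraint check, not the full algorithm.

import numpy as np

def within_budget(X, X_tilde, A, A_tilde, t, gamma, delta):
    # Check the perturbation budgets of Equation 4 for target node t:
    # at most gamma feature flips and at most delta edge changes on t's own row.
    feature_changes = int(np.abs(X[t] - X_tilde[t]).sum())
    edge_changes = int(np.abs(A[t] - A_tilde[t]).sum())
    return feature_changes <= gamma and edge_changes <= delta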

The computational complexity of training the estimated model and finding the features and nodes that are most helpful in prediction is O(EHDC) (Kipf and Welling 2017), where E = |{(i, j) : A_ij > 0}| is the number of graph edges. The computational complexity to find edge perturbations is O(E_t + D), where E_t is the degree of the target node, and the complexity to find the optimal feature perturbation is O(D). Hence the complexity to find both feature and edge perturbations for a target user is O(E_t + D).

Utility-constrained Perturbations

There is a utility cost associated with hiding or changing attributes on social media. For example, a user may not be willing to declare a different gender from their real gender online or to unfriend their spouse in order to impact the accuracy of predicting another, sensitive attribute. A user may be more willing to agree to change favorite foods and movies or to "unfriend" an elementary school classmate. Therefore, there is a cost to changing some attributes and our algorithm needs to take this cost into consideration. We define user-specific feature utilities as U ∈ R^(N×D) such that a high utility U_id corresponds to user i being less willing to change feature X_id.

In order to account for feature utilities in deciding which features to change, we add a utility constraint to our problem. The utility-constrained adversarial algorithm is allowed to change only those attributes whose user utilities are below a certain threshold. The following is the final optimization problem with utility constraints:

    max_G̃  P(f̂(G̃, t) ≠ Y_t)
    s.t.  G̃ = g(f̂, (G_t, Y_t))
          ∀t, d : U_td ≥ η ⟹ X̃_td = X_td
          ∀i, j : V_ij ≥ µ ⟹ Ã_ij = A_ij
          ∑_{d=1}^{D} |X_td − X̃_td| ≤ γ
          ∑_{i=1}^{N} |A_ti − Ã_ti| ≤ δ
    (5)

where U_td is the utility of feature d for node t, V_ij is the utility of the edge between nodes i and j, and η and µ are the thresholds below which the features and edges, respectively, are allowed to be modified.
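In practice, the utility thresholds η and µ simply restrict which entries the algorithm may touch. Below is a minimal sketch of turning them into candidate sets, assuming a feature-utility matrix U and an edge-utility matrix V as defined above; the function and variable names are ours.

import numpy as np

def modifiable_sets(U, V, t, eta, mu):
    # Feature ids and candidate endpoints j for edges (t, j) that the utility
    # constraints of Equation 5 still allow the algorithm to change.
    features_ok = np.where(U[t] < eta)[0]   # features with utility below eta
    endpoints_ok = np.where(V[t] < mu)[0]   # edges (t, j) with utility below mu
    return features_ok, endpoints_ok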

Adversarial, user-centric privacy protection

Here, we present Outwit, our algorithm for solving the privacy protection problem through adversarial perturbations, targeting graph convolutional networks. Given a graph G and a target classifier f̂, our goal is to protect the privacy of a target user t by performing a small number of perturbations on their profile, leading to the graph G̃, such that t gets misclassified by f̂. Outwit has two types of perturbations, feature perturbations and edge perturbations, which we describe next. To the best of our knowledge, we are the first to consider feature perturbations for evasion attacks and to use graph derivative mapping between node features and class labels for determining feature perturbations. As far as novelty in the edge perturbations, other white-box evasion models (Dai et al. 2018b) greedily add and delete edges based on the gradient information, but we use the gradient information to first find the dominant nodes in the graph and then make changes to the target node's edges.

Feature perturbations

Given the estimated node classifier f̂, Outwit finds the node's attributes most likely to cause a change to the prediction of the target classifier. In order to do so, it looks at dL/dW^(0), the gradient of the loss in the final layer with respect to the weights in the first layer:

    dL/dW^(0) = dL/dH^(0) × dH^(0)/dW^(0) ∈ R^(D×H)    (6)

Outwit feeds the feature matrix X as the input to the first layer, hence the output of the first layer is Â X W^(0). Because the weight matrix W^(0) is multiplied with X, this gradient indicates the importance given to different features of the nodes. There are H spectral filters in the first hidden layer of our GCN, i.e., W^(0) ∈ R^(D×H), hence there are H gradient vectors, each associated with one of the filter parameters. Each of the spectral filters focuses on specific features to pass to the next GCN layer. Outwit makes a list of all such features in decreasing order of the magnitude of their gradients and selects the top features. Then it finds the association of these features with one of the class labels.

For each node in the labeled set, Outwit finds the distribution of class labels with respect to the feature. For all such features in our final list, it finds one class, or at most a combination of two classes, to have a much higher frequency than the remaining classes:

    d ⟹ c | c = argmax(∑_{i=1}^{N} Y_i where X_{i,d} = X)    (7)

where X is a particular feature. Then Outwit creates a mapping between the classes and the features associated with them.

Given a target node, the top γ features associated with the true class of this node which are visible to the classifier f̂ are set to 0. Then Outwit uses the list of mappings between features and classes to select specific features from these mappings that are associated with a class other than the target user's true class. These features are changed to 1 if they were 0 before, until reaching γ perturbations, the upper limit of allowed feature perturbations.
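A rough sketch of this feature-perturbation step is given below, assuming an autograd framework (PyTorch) so that the gradient of the loss with respect to W^(0) is available. The helper names (rank_features_by_gradient, feature_class_map, perturb_features), the single shared budget, and the tie-breaking choices are our own reading of the description, not the authors' code.

import torch

def rank_features_by_gradient(grad_W0, top_k):
    # grad_W0: (D, H) gradient of the loss w.r.t. the first-layer weights.
    # Take the most important feature of each spectral filter (column), then
    # keep the top_k of those features ordered by overall gradient magnitude.
    per_filter_best = grad_W0.abs().argmax(dim=0)
    order = grad_W0.abs().max(dim=1).values.argsort(descending=True)
    ranked = [f.item() for f in order if f in per_filter_best]
    return ranked[:top_k]

def feature_class_map(X, y, labeled_idx, features):
    # Associate each selected feature with the class that is most frequent
    # among labeled nodes having that feature set to 1.
    fmap = {}
    for d in features:
        mask = (X[labeled_idx, d] == 1)
        if mask.any():
            fmap[d] = torch.bincount(y[labeled_idx][mask]).argmax().item()
    return fmap

def perturb_features(X, t, y_true, fmap, gamma):
    # Flip up to gamma features of target node t: zero out features mapped to
    # the true class, then switch on features mapped to other classes.
    X_tilde, budget = X.clone(), gamma
    for d, c in fmap.items():
        if budget == 0:
            break
        if c == y_true and X_tilde[t, d] == 1:
            X_tilde[t, d] = 0
            budget -= 1
        elif c != y_true and X_tilde[t, d] == 0:
            X_tilde[t, d] = 1
            budget -= 1
    return X_tilde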

Edge perturbations

Edges between nodes in a graph play a prominent role in node classification. The target GCN model can predict the target node's label with considerable accuracy even when none of the features from the test set are revealed to it (refer to Table 4), by using labels from the neighbouring labeled nodes. It is thus important to find the dominant neighboring users in the target user's neighbourhood and remove the connections to them, in order to decrease the confidence of the classifier towards the target node's true class label. To this end, Outwit finds

    dL/d(edges) = dL/dA = dL/dH^(0) × dH^(0)/dA ∈ R^(N×N)    (8)

the gradient of the loss in the final layer with respect to the edges represented by A, the adjacency matrix. Some of the edges have a larger gradient magnitude than others, which suggests that these edges provide significant information to their neighbouring node in the aggregation process of graph convolution. Outwit orders all node pairs (edges) in decreasing order of their gradient magnitude. The users represented by nodes with a significantly large frequency of occurrence are the best classification contributors in the graph; they pass their class label information to their neighbouring nodes. Outwit maintains a mapping of all classes to these dominant nodes. Outwit scans all the edges of the target user in order of decreasing gradient, and connections to the top δ nodes that have the same class as the target node's true class are removed.

It is also possible to connect the target node to dominant users in G_L, in order to increase the confidence of the node classifier towards the class label of these dominant nodes. Outwit finds the dominant users from classes other than the target user's true class and adds edges between the target node and these dominant nodes until it reaches the constraint of maximum edge perturbations δ. Depending on the platform, edge addition perturbations can be harder to achieve than edge removals. For example, it is a lot easier to add a link to someone when reciprocity of the relationship is not required (e.g., a follower relationship on Twitter) than when it is required (e.g., a friendship relationship on Facebook).
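A simplified sketch of the edge-perturbation step follows, again assuming PyTorch autograd provides the gradient of the loss with respect to a dense adjacency matrix. The dominant-node heuristic, the number of top edges inspected, and the function names are our own illustrative choices.

import torch
from collections import Counter

def dominant_nodes(grad_A, top_edges=200):
    # Rank node pairs by |dL/dA| and count how often each node appears among
    # the highest-gradient pairs; frequently occurring nodes are "dominant".
    flat = grad_A.abs().flatten()
    top = torch.topk(flat, min(top_edges, flat.numel())).indices
    n = grad_A.shape[0]
    counts = Counter()
    for idx in top.tolist():
        i, j = divmod(idx, n)
        counts[i] += 1
        counts[j] += 1
    return [node for node, _ in counts.most_common()]

def perturb_edges(A, t, y, y_true, dominant, delta):
    # Remove links from t to dominant nodes of its own class and add links to
    # dominant nodes of other classes, up to delta edge changes in total.
    A_tilde, budget = A.clone(), delta
    for n in dominant:
        if budget == 0:
            break
        if n == t:
            continue
        if y[n] == y_true and A_tilde[t, n] == 1:
            A_tilde[t, n] = A_tilde[n, t] = 0
            budget -= 1
        elif y[n] != y_true and A_tilde[t, n] == 0:
            A_tilde[t, n] = A_tilde[n, t] = 1
            budget -= 1
    return A_tilde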

Utility-constrained Outwit Algorithm

To protect privacy, Outwit perturbs information on the user's profile as described in the previous subsections. The input to the Outwit algorithm is a partially labeled graph G_L, a target node t and its true class label Y_t. The output is G̃ such that the target user profile information is perturbed. Before it begins, it trains the GCN classifier f̂ according to Equation 2 to obtain trained weights W^(0) and W^(1). Algorithm 1 shows the pseudo-code for Outwit, which first looks for features to perturb (lines 8-18). It finds the gradient of the loss L of f̂ with respect to W^(0). It looks for the most important feature in every spectral filter h ∈ H and stores them in the features variable (lines 10-12). These features represent the most important attributes of the user profile, as they help the classifier identify the node's class. But what is the relation between these features and the different classes? Which feature combinations help the classifier predict a certain class? Lines 13-15 find exactly this. For each of the important features in the variable features, Outwit maps it to a class in the feature_map variable. Now that Outwit has the most important node features and their relation to the class labels, it changes the features to decrease the confidence of the node classifier prediction towards y and increase the confidence towards other classes for our target user node (lines 17-18). Then it finds edge perturbations for the target user t (lines 20-27), starting by finding the gradient of the loss L with respect to A, the adjacency matrix. Outwit finds the users which are most dominant in passing their class labels to their neighbours in the GCN aggregation phase (lines 20-25). Now that it has the dominant nodes in the graph, Outwit removes the edges to dominant nodes whose class label is the same as the target node's class y and adds edges to dominant nodes from other classes (lines 26-27). This, combined with the feature perturbation, will outwit the target classifier into misclassifying the target user.

Algorithm 1: Outwit Algorithm
Input:
1  Graph G_L = (A, X, Y_L)
2  Target node t and its true class Y_t
3  Utility U
Output:
4  Perturbed graph G̃ = (Ã, X̃)
5
6  Train f̂ on G_L to obtain W^(0) and W^(1)
7  /* Feature Perturbations */
8  features = []
9  gradient_W = dL/dW^(0)
10 foreach h ∈ H do
11   if U[t][d] ≤ η then
12     features ← sort_descending(d, key = gradient_W[h][d])
13 feature_map = []
14 foreach d ∈ features do
15   distribution = class_distribution(d)
16   feature_map ← (d, index(max(distribution)))
17 Set top γ features of t to 0 from feature_map where class = y
18 Set top γ features of t to 1 from feature_map where class ≠ y
19 /* Edge Perturbations */
20 gradient_A = dL/dA
21 edge_grads ← sort_descending(e, key = gradient_A)
22 node_frequency ← []
23 foreach (n1, n2) ∈ edge_grads do
24   node_frequency ← frequency(n1, n2)
25 sort(node_frequency)
26 Remove top δ edges from G such that t ∈ (·, ·) and the node class = y  // Equation 4
27 Add top δ edges to G such that t ∈ (·, ·) and the node class ≠ y
28
29 Return G̃

Experiments

Datasets

We consider three citation network datasets, Citeseer (Lu and Getoor 2003), Cora (Lu and Getoor 2003), and Pubmed (Sen et al. 2008), together with one Twitter retweet network (Ribeiro et al. 2018). Dataset statistics are summarized in Table 2. We treat each scientific paper as a node and the citation links as (undirected) edges between them. The Cora and Citeseer datasets consist of sparse 0/1-valued bag-of-words feature vectors for each document, indicating the absence/presence of the corresponding word in the document, and a list of citation links between them. Each publication in the Pubmed diabetes dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words. The Twitter node features are spaCy's off-the-shelf 300-dimensional GloVe vectors along with a few profile attributes and a manual annotation of whether a user is hateful or not. The features are averaged across all words in a given tweet, and subsequently across all tweets a user has. Because these features are not binary, we apply only edge perturbations to this dataset and not feature ones.

Dataset    Nodes    Edges    Features   Classes
Twitter    4,971    15,142   320        2
Citeseer   3,327    4,732    3,703      6
Cora       2,708    5,429    1,433      7
Pubmed     19,717   44,338   500        3

Table 2: Dataset statistics.

Experimental setup

Our target model for the node classification task is the Graph Convolutional Network from (Kipf and Welling 2017). The model is trained as a 2-layer GCN. The model hyperparameters, such as the number of epochs, layers and hidden units, the learning rate and the dropout rate, are kept the same as in (Kipf and Welling 2017).

We show results of the Outwit algorithm with equal utility and various settings of δ and γ, ranging from 6 feature perturbations and 2 edge perturbations to 20 feature perturbations and 8 edge perturbations. To fairly compare with the baseline algorithms, we also run experiments with edge perturbations only, varying δ from 2 to 8. We test our algorithm with varying utility distributions over a range of values for α and β. Since the user has no control over the amount of information that is publicly available to train the model, we run experiments with different ratios of training data, varying the set of labelled nodes in the training set from 10% to 100%.

Baselines

We use the white-box models by Dai et al. (2018a) and Wang and Gong (2019) as baseline models in our experiments and adapt them to the user-centric, grey-box assumptions of our privacy protection scenario. Because (Dai et al. 2018a) and (Wang and Gong 2019) do not perturb features, a fair comparison requires showing results after edge perturbations only. We also consider simple feature perturbations as baselines, such as setting all the target node features to zeros, to ones, or to random values. For simple edge perturbations we use random edge rewiring to change the links between the nodes. Table 4 shows classification accuracy for the simple perturbations. The row "unperturbed" shows the result for f̂. As we can see, the average accuracy drops very little for both simple feature and edge perturbations, which makes Outwit's task very challenging.
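The simple baselines above are straightforward to reproduce. The sketch below (NumPy, with names and random seed of our own choosing) shows the feature baselines and random edge rewiring applied to a single target node.

import numpy as np

rng = np.random.default_rng(0)

def baseline_features(X, t, mode):
    # Simple feature baselines: set the target node's features to all 0, all 1, or random.
    X_tilde = X.copy()
    if mode == "zeros":
        X_tilde[t] = 0
    elif mode == "ones":
        X_tilde[t] = 1
    else:
        X_tilde[t] = rng.integers(0, 2, size=X.shape[1])
    return X_tilde

def baseline_rewire(A, t, k):
    # Random edge rewiring: drop k random neighbors of t and connect t to k
    # random non-neighbors instead.
    A_tilde = A.copy()
    neighbors = np.flatnonzero(A_tilde[t])
    non_neighbors = np.setdiff1d(np.flatnonzero(A_tilde[t] == 0), [t])
    drop = rng.choice(neighbors, size=min(k, len(neighbors)), replace=False)
    add = rng.choice(non_neighbors, size=min(k, len(non_neighbors)), replace=False)
    A_tilde[t, drop] = A_tilde[drop, t] = 0
    A_tilde[t, add] = A_tilde[add, t] = 1
    return A_tilde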

Results

Target node perturbation. Figure 2 shows the classification performance of the target model when perturbing one node at a time. The test nodes are perturbed using the Outwit algorithm with different settings of γ and δ. There is a significant decrease in classification accuracy with only 4 edge perturbations and 6 feature perturbations. As expected, the average accuracy drops as we allow more perturbations. With a maximum of just 8 edge perturbations and 10 feature perturbations (0.5% of the total information present in the target node), the average accuracy drastically falls by more than 50%. Figure 3 shows the comparison of Outwit with the baseline model with only edge perturbations.

Figure 2: Average node classification accuracy after perturbing test nodes using the Outwit algorithm. δ and γ represent the maximum number of edge and feature perturbations, with δ/2, γ/2 additions and δ/2, γ/2 deletions.

Figure 3: Average node classification accuracy after perturbing test nodes using the Outwit algorithm, showing a significant decrease (23-47%) in accuracy compared to the baseline when 6 or more edges are deleted.

Different ratios of training data. Since social networks evolve and change over time, the ratio of publicly available and hidden data can change as well. In order to train the target model, we assume that a few of the nodes in the graph are labelled. The ratio of this training data may vary across different social networks and may also vary over time, so we test the perturbations generated by the Outwit algorithm with different ratios of training data. Table 3 shows the performance of the perturbations with varying ratios of labelled nodes. The "Estimated" column shows the performance of the model which is trained on partial training data and the "Target" column shows the performance of the company model trained on 100% of the training data. The perturbations generated by our algorithm work well for all ratios of training data and, as in the previous experiments, we perturb one test node at a time, and not their neighbors' information.

Increasing number of colluding users. One of the assumptions of our paper is that perturbations are user-centric and Outwit is not allowed to change the attributes of multiple nodes. Here we relax this assumption and show empirically how the average accuracy of the target model is further reduced when a randomly selected group of 10 to 50 users simultaneously use adversarial perturbations from the Outwit algorithm. Figure 4 shows the performance of the target model when many random users simultaneously use 6 edge perturbations and 8 feature perturbations with utility constraints with α = β = 0.5. The x-axis represents the number of users simultaneously adding adversarial information and the y-axis represents the average node classification accuracy of the GCN. As the number of users increases, the average classification accuracy of the target model decreases even further, because adversarial information is passed on to neighbouring nodes in the GCN aggregation phase, causing the target model performance to decrease further.

Figure 4: Performance with an increasing number of colluding users for Outwit (OA) and the baseline (BL).

Cora
Training ratio   Estimated perturbed   Estimated unperturbed   Target perturbed   Target unperturbed
10%              19.1%                 70.1%                   40.2%              85.5%
30%              13.1%                 80.8%                   31.7%              85.5%
50%              12.8%                 85.3%                   22.1%              85.5%
70%              15.1%                 86.3%                   20.3%              85.5%
90%              14.2%                 85.4%                   16.8%              85.5%
100%             15.0%                 85.5%                   15.0%              85.5%

Citeseer
Training ratio   Estimated perturbed   Estimated unperturbed   Target perturbed   Target unperturbed
10%              5.8%                  65.1%                   38.4%              77.7%
30%              6.0%                  72.9%                   21.7%              77.7%
50%              8.3%                  74.9%                   19.4%              77.7%
70%              9.6%                  75.8%                   15.1%              77.7%
90%              10.0%                 75.8%                   12.5%              77.7%
100%             9.3%                  77.7%                   9.3%               77.7%

Pubmed
Training ratio   Estimated perturbed   Estimated unperturbed   Target perturbed   Target unperturbed
10%              36.3%                 83.2%                   53.2%              87.9%
30%              38.7%                 85.5%                   49.5%              87.9%
50%              39.8%                 85.8%                   46.9%              87.9%
70%              39.7%                 86.8%                   43.4%              87.9%
90%              40.2%                 86.7%                   41.7%              87.9%
100%             40.8%                 87.9%                   40.8%              87.9%

Twitter
Training ratio   Estimated perturbed   Estimated unperturbed   Target perturbed   Target unperturbed
10%              32.8%                 66.8%                   52.7%              76.5%
30%              29.6%                 68.9%                   44.3%              76.5%
50%              27.9%                 70.4%                   38.5%              76.5%
70%              25.7%                 72.3%                   29.9%              76.5%
90%              26.8%                 75.7%                   27.3%              76.5%
100%             25.2%                 76.5%                   25.2%              76.5%

Table 3: Average accuracy for varying ratios of labelled training nodes. The "unperturbed" columns show the average accuracy of the learned classifier (estimated or target) on test data and the "perturbed" columns show the average accuracy after the Outwit user-centric perturbation.

Strategy                   Cora    Citeseer   Pubmed
Unperturbed                80.8%   72.9%      85.5%
Feature: all set to 0      76.6%   67.3%      81.1%
Feature: all set to 1      76.6%   67.2%      81.4%
Feature: random            76.8%   67.0%      80.3%
Edge: random rewiring      75.7%   65.1%      78.7%

Table 4: Accuracy for simple feature and edge perturbations.

Sensitivity of Outwit to node degree. Here we analyze the sensitivity of Outwit to the node degree and whether it performs poorly for low-degree nodes. Figure 5 shows the relation between the average accuracy of perturbed nodes and their degree. As expected, Outwit can better perturb the node to preserve privacy with increasing node degree. Nevertheless, Outwit still performs much better than all the baselines, as shown earlier.

Figure 5: Sensitivity of Outwit to node degree.

Varying Utility. In the real world, every user has a different utility distribution over their profile attributes. In order to replicate the level of mutability of profile attributes, we generate U, a utility distribution, on top of each dataset. We model the utility values for the attributes of a user as random variables from a Bernoulli distribution. Each feature has a different parameter p for its Bernoulli, drawn from a Beta distribution with priors α and β. We vary the Beta parameters from α = 2, β = 5 (i.e., features tend to have mostly high levels of utility) to α = 5, β = 2 (i.e., features tend to have mostly low levels of utility). This allows us to draw a utility value for each user and attribute pair. Figure 6 shows the average classification accuracy under utility constraints; α and β represent the parameters of the beta distribution used to generate the utility distribution. Even when utility constraints are applied, the perturbations suggested by the algorithm significantly decrease the node classification accuracies when test data is perturbed one node at a time. The accuracies are consistently low for all values of α and β.
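A small sketch of this utility-generation procedure, assuming NumPy; the binary encoding of the utility values and the names are our reading of the description above rather than the authors' exact setup.

import numpy as np

def sample_utilities(num_users, num_features, alpha, beta, seed=0):
    # Draw a per-feature Bernoulli parameter p ~ Beta(alpha, beta), then a
    # binary utility value for every (user, feature) pair.
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta, size=num_features)              # one p per feature
    U = rng.binomial(1, p, size=(num_users, num_features))    # utilities in {0, 1}
    return U

# Example: utilities generated with the alpha = 2, beta = 5 setting from the paper.
U = sample_utilities(num_users=100, num_features=50, alpha=2, beta=5)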

Figure 6: Average accuracy with varying utility. α and β are the parameters of the beta distribution.

Conclusion

In this work, we presented Outwit, an algorithm which helps social network users find small user-centric adversarial perturbations in their profile which lead to a significant decrease in the prediction performance of target classifiers in the form of graph convolutional networks. Outwit is a viable approach by which users can ensure that their sensitive information cannot be reliably used. Our experiments show that only a few perturbations are required to prevent a target classifier from predicting the true label. Even after applying the utility constraints on the perturbations, the algorithm was able to perform well.

Acknowledgments

This material is based on research sponsored in part by the National Science Foundation under Grant No. 1801644.

References

Biggio, B.; Fumera, G.; and Roli, F. 2014. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering.

Bojchevski, A.; and Günnemann, S. 2019. Adversarial Attacks on Node Embeddings via Graph Poisoning. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 695-704. Long Beach, California, USA: PMLR. URL http://proceedings.mlr.press/v97/bojchevski19a.html.

Chang, H.; Rong, Y.; Xu, T.; Huang, W.; Zhang, H.; Cui, P.; Zhu, W.; and Huang, J. 2019a. The General Black-box Attack Method for Graph Neural Networks.

Chang, H.; Rong, Y.; Xu, T.; Huang, W.; Zhang, H.; Cui, P.; Zhu, W.; and Huang, J. 2019b. A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models.

Chapelle, O.; Scholkopf, B.; and Zien, A. 2009. Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]. IEEE Transactions on Neural Networks.

Dai, H.; Li, H.; Tian, T.; Huang, X.; Wang, L.; Zhu, J.; and Song, L. 2018a. Adversarial Attack on Graph Structured Data. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR.

Dai, Q.; Li, Q.; Tang, J.; and Wang, D. 2018b. Adversarial Network Embedding. In AAAI.

Fang, L.; and LeFevre, K. 2010. Privacy Wizards for Social Networking Sites. In Proc. WWW.

Garcia, D. 2017. Leaking privacy and shadow profiles in online social networks. Science Advances.

Ghazinour, K.; Matwin, S.; and Sokolova, M. 2013. Monitoring and recommending privacy settings in social networks. In Proc. EDBT.

Gori, M.; Monfardini, G.; and Scarselli, F. 2005. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks. IEEE.

Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017.

Kosinski, M.; Stillwell, D.; and Graepel, T. 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences.

Lee Rainie. 2018. Americans' complicated feelings about social media in an era of privacy concerns. URL http://www.pewresearch.org/fact-tank/2018/03/27/americans-complicated-feelings-about-social-media-in-an-era-of-privacy-concerns/.

Li, Y.; Tarlow, D.; Brockschmidt, M.; and Zemel, R. S. 2016. Gated Graph Sequence Neural Networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016.

Lindamood, J.; Heatherly, R.; Kantarcioglu, M.; and Thuraisingham, B. 2009. Inferring private information using social network data. In Proceedings of the 18th International Conference on World Wide Web. ACM.

Lu, Q.; and Getoor, L. 2003. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML-03).

Novak, E.; and Li, Q. 2012. A survey of security and privacy in online social networks. College of William and Mary Computer Science Technical Report.

Ribeiro, M. H.; Calais, P. H.; Santos, Y. A.; Almeida, V. A.; and Meira Jr, W. 2018. Characterizing and detecting hateful users on Twitter. In Twelfth International AAAI Conference on Web and Social Media.

Ruining He. 2018. PinSage: A new graph convolutional neural network for web-scale recommender systems. URL https://medium.com/pinterest-engineering/pinsage-a-new-graph-convolutional-neural-network-for-web-scale-recommender-systems-88795a107f48.

Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009a. Computational capabilities of graph neural networks. IEEE Transactions on Neural Networks.

Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009b. The graph neural network model. IEEE Transactions on Neural Networks.

Scarselli, F.; Yong, S. L.; Gori, M.; Hagenbuchner, M.; Tsoi, A. C.; and Maggini, M. 2005. Graph neural networks for ranking web pages. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society.

Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; and Eliassi-Rad, T. 2008. Collective classification in network data. AI Magazine.

Sun, M.; Tang, J.; Li, H.; Li, B.; Xiao, C.; Chen, Y.; and Song, D. 2018. Data Poisoning Attack against Unsupervised Node Embedding Methods. arXiv preprint arXiv:1810.12881.

Sun, M.; Zhao, S.; Gilvary, C.; Elemento, O.; Zhou, J.; and Wang, F. 2019a. Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics. ISSN 1477-4054.

Sun, Y.; Wang, S.; Tang, X.; Hsieh, T.-Y.; and Honavar, V. 2019b. Node Injection Attacks on Graphs via Reinforcement Learning.

Wang, B.; and Gong, N. Z. 2019. Attacking Graph-based Classification via Manipulating the Graph Structure. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS '19, 2023-2040. New York, NY, USA: ACM. ISBN 978-1-4503-6747-9. doi:10.1145/3319535.3354206. URL http://doi.acm.org/10.1145/3319535.3354206.

Wisniewski, P.; Knijnenburg, B. P.; and Lipford, H. R. 2017. Making privacy personal: Profiling social network users to inform privacy education and nudging. Int. J. Human-Computer Studies 98.

Zafarani, R.; Abbasi, M. A.; and Liu, H. 2014. Social Media Mining: An Introduction. New York, NY, USA: Cambridge University Press. ISBN 1107018854, 9781107018853.

Zheleva, E.; and Getoor, L. 2009. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web. ACM.

Zheleva, E.; Terzi, E.; and Getoor, L. 2012. Privacy in social networks. Synthesis Lectures on Data Mining and Knowledge Discovery 3(1): 1-85.

Zügner, D.; Akbarnejad, A.; and Günnemann, S. 2018. Adversarial Attacks on Neural Networks for Graph Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18. New York, NY, USA: ACM. ISBN 978-1-4503-5552-0.

Zügner, D.; and Günnemann, S. 2019. Adversarial Attacks on Graph Neural Networks via Meta Learning. In International Conference on Learning Representations (ICLR).
