GA Based Q-Attack on Community Detection · give two heuristic attack strategies, i.e., Community Detection Attack (CDA) and Degree Based Attack (DBA), as baselines, uti-lizing the

1

GA Based Q-Attack on Community DetectionJinyin Chen, Lihong Chen, Yixian Chen, Minghao Zhao, Shanqing Yu, Qi Xuan, Member, IEEE, Xiaoniu Yang

Abstract—Community detection plays an important role insocial networks, since it can help to naturally divide the networkinto smaller parts so as to simplify network analysis. However, onthe other hand, it arises the concern that individual informationmay be over-mined, and the concept community deception thus isproposed to protect individual privacy on social networks. Here,we introduce and formalize the problem of community detectionattack and develop efficient strategies to attack communitydetection algorithms by rewiring a small number of connections,leading to individual privacy protection. In particular, we firstgive two heuristic attack strategies, i.e., Community DetectionAttack (CDA) and Degree Based Attack (DBA), as baselines, uti-lizing the information of detected community structure and nodedegree, respectively. And then we propose a Genetic Algorithm(GA) based Q-Attack, where the modularity Q is used to designthe fitness function. We launch community detection attack basedon the above three strategies against three modularity basedcommunity detection algorithms on two social networks. Bycomparison, our Q-Attack method achieves much better attackeffects than CDA and DBA, in terms of the larger reduction ofboth modularity Q and Normalized Mutual Information (NMI).Besides, we find that the adversarial networks obtained by Q-Attack on a specific community detection algorithm can be stilleffective on others, no matter whether they are modularity basedor not, indicating its strong transferability.

Index Terms—community detection, community detection at-tack, genetic algorithm, social network, privacy protection, mod-ularity, adversarial network.

I. INTRODUCTION

COMPLEX networks can well represent various complexsystems in our daily life, such as social networks [1],

[2], biological networks [3], [4], power networks [5] andfinancial networks [6]. Network science has been a very activefield because of its highly interdisciplinary nature [7]. Onehot topic in this field is community detection. Since theconcept of community structure was proposed [8], the study ofcommunity detection algorithms has attracted great attentionfrom various disciplines [9]–[11]. Detecting communities inthe network has played an important role in getting a deepunderstanding of its organizations and functions.

However, with the rapid development of community detec-tion algorithms in recent years [12], a new challenging prob-lem arises: information over-mined. People realize that someinformation, even their own privacy, is going to be over-mined

J. Chen, L. Chen, Y. Chen, M. Zhao, S. Yu, and Q. Xuan are withthe Institue of Cyberspace Security and the College of Information En-gineering, Zhejiang University of Technology, Hangzhou 310023, China(e-mail: [email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected]).

X. Yang is with the Institue of Cyberspace Security, Zhejiang University ofTechnology, Hangzhou 310023, China, and with the Science and Technologyon Communication Information Security Control Laboratory, Jiaxing 314033,China (e-mail: [email protected]).

This article has been submitted on October 21th, 2018.Corresponding author: Qi Xuan.

by those social network analysis tools. The web we browse,the people we contact with and more things like that are allrecorded through the Internet. Our interests, circle of friends,partners on business can be easily detected once these data areutilized [13], [14]. Community hiding or deception [15]–[17]is put forward to hide certain community. Still, from individualview and global view, community detection attack is launchedon the network to change some of its structure thereby makingthe performance of related algorithms worse and the accuracyof results lower.

The problem of making communities in the network moredifficult to detect through attack strategies against detectionalgorithm is worthy studying because the good attack strategiescan be a guideline for individuals or organizations to disguisetheir social circles and change the way they interact withothers, lest some privacies get leaked [16]. In this paperwe focus on investigating the attack strategy and testingthe effectiveness of these strategies through experiments. Inparticular, we propose two heuristic attack strategies to addressthe problem shown in the last paragraph. Heuristic strategy isvital because it is easy to implement and plays well sometimes.During the design process of heuristic attack strategies, wetake community detection algorithms and some properties ofcomplex networks into consideration. One question is aboutwhat kind of nodes to be chosen as targets may benefit the at-tack effect. Typically, node centrality is a common-used indexto measure the importance of nodes in the network. It mainlyincludes degree, betweenness and closeness. Compared withbetweenness, whose calculation desires global information,degree is a quantity only based on local information with lesscomputation [18]. Thus, we propose an attack strategy basedon degree and take some tests.

Furthermore, we consider the design of attack strategy as anoptimization problem. Owing to the good performance in solv-ing complex optimization problems, evolutionary algorithms,such as Genetic Algorithm (GA), have been widely used invarious disciplines, including complex networks [19]–[21].Searching for attack schemes which make the performance ofcommunity detection algorithms decrease most under givenattack cost is a typical combinatorial optimization problem.We propose an attack strategy based on GA, namely Q-Attack, where the modularity Q is used to design the fitnessfunction, and show its effectiveness through experiments ondifferent community detection algorithms and several real-world networks. The major contributions of our work aresummarized as:• First, we introduce and formalize community detection

attack problem, which is to implement global communitydeception and privacy protection;

• Second, we propose two heuristic attack strategies andGA based Q-Attack method to launch attack with negli-

arX

iv:1

811.

0043

0v3

[cs

.SI]

11

Nov

201

8

2

gible rewiring to cheat community detection;• Third, we conduct comprehensive experiments to com-

pare the attack effects of different attack strategies againstthree community detection algorithms on several real-world networks;

• Finally, the transferability of Q-Attack is validated and wethink that community detection attack can also be a robustevaluation metric for anti-attack capacity comparison ofcommunity detection algorithms.

The rest of paper is organized as follows. Sec. II reviews therelated work on community detection and attacks on complexnetwork. Sec. III presents our attack strategies for destroyingcommunity structure in the network, including heuristics andoptimization methods. We demonstrate the experiments andresults in Sec. IV and finally draw conclusions in Sec. V.

II. RELATED WORK

A. Community Detection Algorithms

A large number of community detection algorithms havebeen proposed since the problem brought up [12]. And detec-tion algorithms continue to spring up because of the diversityof networks in our real world. Here we mainly give a briefintroduction of some ones.

Girvan and Newman [22] first proposed a communitydetection algorithm based on the betweenness of edges andexperiments showed that it did a good job in small networks.Some researchers made changes on this study and proposednew ones, e.g.,Tyler et al. [23] got approximate betweennessusing the fast algorithm of Brandes, Radicchi et al. [24] usedclustering coefficient of edges instead so that only local infor-mation was required and the efficiency of the algorithm wasimproved. Besides, Newman [25] also proposed a greedy algo-rithm which mainly aimed at optimizing modularity. CNM wasan improvement of Newman’s method, which was proposed byClauset et al. [26]. The main technology used in this algorithmwas the data structures that called stacks. Futhermore, Blondelet al. [27] divided the modularity optimization problem intotwo levels and the computational complexity was essentiallinearly. Spectral clustering algorithms [28], [29] leveragedthe eigenvalue spectrum of several graph matrices to detectthe community structure in networks.

Except these methods based on modularity optimization, re-searchers also considered the problem of community detectionfrom other perspectives. Label Propagation Algorithm (LPA)was proposed by Raghavan et al. [30], whose main idea wasupdating the label of each node according to its neighbors.Rosvall et al. [31] proposed an algorithm called Infomap,which was based on information theory. This algorithm foundout the partition for the network through optimizing theaverage length of description L(M). Further, Ronhovde andNussinov [32] proposed a community detection algorithmbased on the minimization of the Hamiltonian of Potts model.

B. Attacks on Complex Networks

Several earlier researches have studied the influence ofattacks on the network performance. Holme et al. [18] pro-posed four strategies to investigate the attack vulnerability of

different kinds of networks, including ID removal, IB removal,RD removal and RB removal, both for nodes and links. ID re-moval represented removing nodes (or links) according to thedescending order of degrees in the initial network while “B”represented betweenness and “R” represented recalculation. Inthis study, it was validated that the network structure changedand some performances degraded via attacks. Bellingeri etal. [33] found that node deletion according to betweennesscentrality was most efficient while using the size of LargestConnected Component (LCC) as a measure of network dam-age. As for the community structure, Karrer et al. [34] studiedthe robustness of community structure in networks throughperturbing networks in a specific way. They constructed arandom network with the same number of nodes and linksas the original one, then replaced the links in the originalnetwork with that in the random one with the probability ofα. These studies shed lights on the further work about takingattacks on networks.

Taking attacks on some network algorithms is an emergingtopic in recent years. Yu et al. [35] developed both heuristicand evolutionary methods to add subtle perturbations to orig-inal networks, so that some sensitive links would be harder topredict. Dai et al. [36] proposed three methods to modify graphstructure, including RL-S2V, GradArgmax and GeneticAlg,aiming at fooling the classifiers based on GNN models. More-over, Zugner et al. [37] proposed the first adversarial attack onnetworks using GCN, namely NETTACK, and found that thismethod is effective on node classification task. Quite recently,Bojchevski and Gunnemann [38], as well as Chen et al. [39],proposed adversarial attacks on network embedding at thesame time, which may promote the extensive discussion on therobustness of network algorithms, since network embeddingestablishes a bridge between network space and Euclideanspace and thus facilitate many downstream algorithms, suchas node classification and link prediction.

C. Researches on Anti-Detection of Community

As for privacy concerns aroused by community detection,some researchers started paying attention to this problemduring the last two years. Waniek et al. [16] came up withtwo heuristic algorithms ROAM and DICE for hiding in-dividuals and communities, respectively. DICE implementedthrough disconnecting d intra-links of given community Cand connecting b−d inter-links under budget b and a measurewas proposed to evaluate the concealment. Finoda et al. [17]proposed the concept of community deception to cope withcommunity detection. Safeness-based deception algorithm andmodularity-based deception algorithm were mainly testedthrough amounts of experiments. Another job they did wasdefining the deception score to evaluate effects of proposedmethods. Similar to the measure proposed in [16], deceptionscore also took community spread and hiding into accountand it was more comprehensive while reachability was alsoconsidered. Extensive distribution of members in C and lesspercentages of C

′s for each Ci ∈ C showed a good hiding,

where C was the community structure detected by specificcommunity detection algorithm and Ci was one element in C.

3

Fig. 1: The framework of community detection attack. We get the attack scheme Eattack combined with community detectionalgorithms. Two metrics Q and NMI are used to evaluate the attack effect.

There is still little attention that has been paid on theproblem of taking attacks on the network to make it harderfor algorithms to detect the community structure. And bothWaniek’s and Finoda’s researches are dedicated to hide only asmall set of nodes (community C in the broad sense) targetly.In this paper, we address the problem of information over-mined in community detection from another perspective. Sinceit is because of the improvement of the community detectionalgorithms both in detection results and speed that privacyconcerns occur, we aim at degrading the performance of algo-rithms directly. Our goal is to propose some attack strategiesto make the quantification of the communities detected innetworks decrease as much as possible, which means a bettereffectiveness for attacks with subtle network changes.

III. METHODS

First, we briefly demonstrate the major problem we focus onin this paper, using related notations and definitions. Let G =(V,E) be the original network, where V = {v1, v2, ..., vn} isthe set of nodes and E = {e1, e2, ..., em} is the set of links.Eattack = {+e1,−e2, ...,+e2T−1,−e2T } is the solution thatcomprised of a series of links, “+” represents adding the linkto the network while “-” represents removing. Then, we havethe adversarial network G′ = (V ′, E′) defined as follows:{

V ′ = VE′ = E + Eattack

(1)

We thus want to find out such Eattack to change some connec-tions in G and get the adversarial network G′. And for theseG′s, community detection algorithms perform significantlyworse, i.e., the quality of detection results decreases.

Fig. 1 gives the overview of our work. In the following, wegive the details of attack strategies for community detectionalgorithms proposed in this paper, including heuristics andevolutionary ones.

A. Heuristic Attack Strategy

We have introduced that we take link attacks on the networkto change some relations between nodes. In this paper, wechoose rewiring attack to maintain the degree of target nodes,i.e., adding a link to the target node while deleting onefrom it. This reduces the number of nodes whose degreechanges by 2 under one attack, less than 4 for deleting a linkbetween two nodes and adding a link between another two.In social networks, one rewiring attack means getting a falsefriend while hiding a real one for the person chosen as thetarget node, the total number of friends of this person keepsunchanged, such attack way can maintain the person’s statusin the entire network, i.e., rewiring shows a good concealmentof taking attacks on the network.

While thinking of attack strategies, we first come up with aRandom Attack (RA) strategy, described in Algorithm 1, forcomparison. Here, we randomly select K nodes from G toform a target node set Vt. We then randomly select a node vi ∈Vt in each iteration and implement the following operation:deleting an existent link while adding nonexistent one for thetarget node. The neighbor set of the target node vi is denotedby Ni = {vj | < vi, vj >∈ E}, while its non-neighbor set isdenoted by N i. The number of rewiring attacks (or the numberof iterations), denoted by T , is another parameter representingthe attack cost. The procedure of random attack strategy isshown in following pseudo-code.

4

Algorithm 1: Random Attack (RA)Input: G,K,TOutput: G′

1 Vt ← RandomK(G,K);2 AttackNum = 0;3 for AttackNum < T do4 vi ← RandomOne(Vt);5 Ni, Ni ← GetSet(G, vi);6 if Ni 6= ∅ and Ni 6= ∅ then7 vdel ← RandomOne(Ni);8 vadd ← RandomOne(Ni);9 remove < vi, vdel > from E and add

< vi, vadd > to E;10 Update G;11 AttackNum = AttackNum+ 1;12 end13 end14 Get the adversarial network G′ under T rewiring attacks.

Algorithm 2: Community Detection Attack (CDA)Input: G, K, TOutput: G′

1 C = C1, C2, ..., Ch ← Detection(G);2 Vt ← RandomK(G,K);3 AttackNum = 0;4 for AttackNum < T do5 vi ← RandomOne(Vt);6 S1, S2 ← GetSet(G, vi, C);7 if S1 6= ∅ and S2 6= ∅ then8 vdel ← RandomOne(S1);9 vadd ← RandomOne(S2);

10 remove < vi, vdel > from E and add< vi, vadd > to E;

11 Update G;12 AttackNum = AttackNum+ 1;13 end14 end15 Get the adversarial network G′ under T rewiring attacks.

A network with community structure often shows the char-acteristic that having a high density of intra-community linksand a lower density of inter-community links. Deleting linkswithin community and adding links between communitiesthus can weaken the community structure in the network.Therefore, having some knowledge of the community structurein advance can help to design more effective attack strategies.Based on this, we propose a heuristic attack strategy combinedwith the community division results via specific commu-nity detection algorithm, called Community Detection Attack(CDA), the procedure of which is shown in Algorithm 2.

For target node vi, we define its intra-community neighborset as S1 = Ci∩Ni, where Ci is the node set of the communitythat vi belongs to. Similarly, we can get its inter-communitynon-neighbor set S2 = Ci ∩N i, with Ci = V − Ci.

Fig. 2 illustrates how the CDA strategy works graphically.

Fig. 2: C1, C2 represent the communities detected by a specificcommunity detection algorithm. Node 6 is chosen as the targetnode according to CDA. The line with a red “×” representsthe deleted link and the dashed line represents the added link.

Algorithm 3: Target Nodes Chosen Based on Degree1 procedure NodeChoose;2 for len(KNode) < K do3 v ← BiggestDegree(C);4 add node v to KNode;5 remove Ci from C, where v ∈ Ci;6 if K > h then7 update C every turn;8 repeat from step BiggestDegree;9 end

10 end

For the target node, the removal of intra-community linksweakens its connection with the original organization and theaddition of inter-community link generates a force to drag itto the others.

It was found that many real-world networks followed power-law degree distribution [40], consistent with the classic 80/20rule. That means, usually, a small number of nodes possessa large number of connections, while the rest only have afew in real-world networks. Some changes of these hub nodesof larger degree usually have a bigger impact on the wholenetwork structure. For instance, statistically, people with morefriends in social networks always plays more important rolesin the circle and their absence seems to make the party lessactive. Therefore, we propose another heuristic attack strategy,namely Degree Based Attack (DBA), aiming to attack thenodes of large degree in the network.

The only difference between DBA and CDA is that, in DBAwe choose the nodes of large degree as our target nodes, whilein CDA we do it randomly. Thus, next we just explain how tochoose the K target nodes in DBA based on node degree, sincethe rest attack part is totally the same as CDA. The procedureto choose the target nodes based on their degree in DBA isshown in Algorithm 3. The function BiggestDegree gets thenode with biggest degree from the division C, whose initialstate C = {C1, C2, ..., Ch}. And the step update C every turnmeans removing the nodes that have been chosen and one turn

5

Fig. 3: The overview of one iteration of GA based Q-Attack Strategy.

is finished when we choose h nodes from h communities, withone in each community.

B. GA Based Q-Attack

Evolutionary algorithms are a type of intelligent optimiza-tion technologies designed by simulating natural phenomenaor leveraging various mechanisms of living organisms innature [41]–[43]. GA is a typical one of such algorithms. Itsbasic model was firstly introduced and investigated by Hollandin 1975 [41], which mainly leveraged the natural selectionmechanism to solve optimization problems. A populationof chromosomes, which represent individuals with differentcharacteristics, initialize at the beginning of the algorithmand then generate offspring according to the designed geneticoperators, such as crossover and mutation. Individuals withbetter fitness have more chances to survive and multiply sothat the population are able to evolve. Here we propose anattack strategy based on GA to search effective rewiring attackschemes. Fig. 3 gives an overview of the evolutionary process.

Before the implementation of GA, we first give our encod-ing method and fitness function.

• Encoding: Same as the attack strategies we propose inSec. III-A, here we choose a rewiring attack as a gene,including a deleted link and an added link. The length ofeach chromosome represents the number of attacks.

• Fitness function: Modularity is an important evaluationto assess the quality of a particular partition for thenetwork and it has been widely applied in the field ofcommunity detection (more details in Sec. IV-C). Herewe design our fitness function as follows:

f = 2 ∗ e−Q, (2)

indicating that the populations of lower modularity willhave larger fitness. And this is also the reason why wename this attack strategy as Q-Attack.

Now, we are going to find out individuals that can make thepartition of an adversarial network under specific communitydetection algorithms have lower modularity. In the following,we will give a specific illustration about how GA operates incommunity detection attack.• Initialization. An initial population is randomly gen-

erated at a fixed size in GA. One thing we need totake care is avoiding link deletion or addition conflict,which means deleting or adding a link repeatedly. Eachindividual in the population is a combination of rewiringattacks, representing a solution of attacks on the network.

• Selection. Selection is a manifestation of the survival ofthe fittest mechanism. According to the laws of nature’sevolution, the individuals of higher fitness have a greaterchance to survive and multiply than those of lowerfitness. We use a common selection operator, namelyroulette wheel. In this selection way, the probability foran individual to mating is proportional to its fitness, asrepresented by Eq. (3):

pi =f(i)∑nj=1 f(j)

. (3)

• Crossover. Here, we adopt single point crossover whichis the easiest to achieve in GA, i.e., we generate a cutpoint at random and change the gene segments betweenparents. The probability for paired individuals to generateoffspring through crossover operation is denoted by Pc.In order to avoid the conflicts on genes (as mentionedbefore, two identical deleted-links or added-links arenot allowed to appear on one chromosome), we do afeasibility test before two individuals get crossed.

6

Algorithm 4: Q-Attack Based on GAInput: G,PopSize,Generation,Pc,Pm,TOutput: Pop

1 ParentPop← Inatialization(G,PopSize, T );2 while i < Generation do3 SelectedPop← Selection(ParentPop, PopSize);4 CrossoverPop←

Crossover(SelectedPop, PopSize, T, Pc);5 MutationPop←

Mutation(G,CrossoverPop, PopSize, T, Pm);6 OffspringPop←

Elistism(MutationPop, ParentPop);7 ParentPop← OffspringPop;8 i← i+ 1;9 end

• Mutation. To prevent the solution from falling into a lo-cal optimum, mutation operator is a necessary componentin GA. Considering the network characteristics, here, wepropose three kinds of mutation operators: link-deletionmutation, link-addition mutation and link-reconnectionmutation. link-deletion or link-addition mutation indicatesthat the deleted or added link of the gene changesindependently, while link-reconnection mutation indicatesthat a gene is completely replaced. These three mutationsoccur with equal probability and the overall mutation rateis denoted by Pm. Such designs on mutation operatorscan promote the diversity of population. Conflicts detec-tion is also considered while the genes are mutating.

• Elitism. Over the course of evolution, individuals withhigh fitness in the parent may be destroyed. Elitismstrategy has been proposed to solve this problem. In thispaper, we replace the worst 10% of the offspring withthe best 10% of the parents to maintain excellent genes.

• Termination criteria. We set a fixed number of evo-lutionary generation and the algorithm stops when thiscondition is fulfilled.

In summary, GA based Q-Attack mainly involves the encod-ing, fitness function and the design of genetic operators. Thepseudo-code of this attack strategy is shown in Algorithm 4.

IV. EXPERIMENTS

A. Datasets

In order to evaluate the attack effect of our attack strategies,we test them on two real-world datasets, including Zachary’skarate club network [44] and bottlenose dolphin network [45],which are commonly used for community detection withground-truth partitions.• Zachary’s karate club network: It is a dataset about

the relationship in a karate club recorded by Zachary.The network consists of 34 nodes and 78 links, whichis ultimately divided into two groups out of the conflictsbetween the instructor and the administrator.

• Bottlenose dolphin network: This network is created byLusseau, demonstrating the associations of the dolphins

living off Doubtful Sound, New Zealand. The networkconsists of 62 nodes and 159 links. In reality, thesedolphins are divided into two groups.

B. Community Detection Algorithms

Here, we introduce three well-known community detectionalgorithms that are attacked by our strategies proposed inSec. III. Note that these algorithms are mainly used fordetecting non-overlapping community structures.• Fast Newman Algorithm (FN) [25]: It is a hierarchical

agglomerative algorithm. Each node is considered as acommunity at the beginning and we merge communitiesin pairs repeatedly with the aim of optimizing Q, untilall the nodes are merged into one community.

• Spectral Optimization Algorithm (SOA) [28], [29]:Through analyzing the spectral properties of networks,Newman gives a reformulation of modularity:

Q =1

4msT (Aij −

kikj2m

)s =1

4msTBs (4)

where m is the number of links, Aij is the elementof adjacency matrix A, s is a column vector with eachelement si representing the community label of node vi,ki and kj are the degrees of nodes vi and vj , respectively.B is called modularity matrix. The communities thatnodes belong to depend on the signs of the elements inthe leading eigenvector of the modularity matrix.

• Louvain Algorithm (LOU) [27]: The main idea forthis algorithm is considering the community detectionas a multi-level modularity optimization problem. First,it starts with isolated nodes and repeats the process ofremoving the node to the community which obtains maxi-mum gain of modularity until no further improvement canbe achieved. Second, it merges the community detectedby first step into a super-node and constructs a newnetwork. These two steps are performed repeatedly untilthe algorithm is stable.

C. Metrics for Community Structure

• Modularity: It is widely used to measure the qualityof divisions of a network, especially for the networkswith unknown community structure, which was firstlyproposed by Newman and Grivan [46] and is defined by

Q =∑i

(eii − a2i ), (5)

where eii represents the fraction of links in the networkwith two terminals both in cluster Ci, ai =

∑j eij

represents the fraction of links that connect to the nodesin cluster Ci. In other words, modularity Q measures thedifference between actual fraction of within-communitylinks and the expected value of the same quantity withrandom connections. A reformulation of the modularityis represented by Eq. (4).

• Normalized Mutual Information (NMI): It is anothercommonly used criterion to assess the quality of clus-tering results in analyzing network community structure,

7

0 50 100 150 200 250 300 350 400 450 500Generation

1.24

1.26

1.28

1.30

1.32

1.34

1.36

Bes

t Fitn

ess

Pc=0.5Pc=0.6Pc=0.7Pc=0.8

0 50 100 150 200 250 300 350 400 450 500Generation

1.24

1.26

1.28

1.30

1.32

1.34

1.36

Bes

t Fitn

ess

Pm=0.04Pm=0.06Pm=0.08Pm=0.1

Fig. 4: The curves of best fitness for various values of param-eters Pc and Pm when Q-Attack is applied to FN algorithmon Dolphins network.

which was proposed by Danon et al. [47]. For two parti-tions X and Y , the mutual information I(X,Y ) is definedas the relative entropy between the joint distributionP (XY ) and the production distribution P (X)P (Y ):

I(X,Y ) = D(P (X,Y )||P (X)P (Y ))

=∑x,y

p(x, y) logp(x, y)

p(x)p(y). (6)

A noticeable problem for mutual information alone beinga similarity measure is that subpartitions derived from Yby splitting some of its clusters into small ones wouldhave same mutual information with Y . NMI thus isproposed to deal with this problem, which is defined by

Inorm(X,Y ) =2I(X,Y )

H(X) +H(Y ). (7)

The value of NMI indicates the similarity between twopartitions, i.e., the larger value means more similar be-tween the two. When NMI equals 1, partition X isidentical to partition Y .

Table I: Optimal parameters for Q-Attack in different casesFN+Kar FN+Dol SOA+Kar SOA+Dol LOU+Kar LOU+Dol

Pc 0.8 0.6 0.7 0.6 0.7 0.8Pm 0.1 0.06 0.1 0.04 0.1 0.1

D. Experimental Results

Our experiments are performed on a PC i5 CPU 2.60GHzand 4GBs RAM. Our programming environment is python2.7. There are four parameters in GA based Q-Attack. Weset the population size PopSize = 100 and the evolutionarygeneration Generation = 500 uniformly. Then, we tune thecrossover rate Pc and the mutation rate Pm from four differentvalues (0.5, 0.6, 0.7, 0.8 for Pc and 0.04, 0.06, 0.08, 0.1for Pm), respectively, according to the results for parameterconfiguration. Fig. 4 shows the curves of best fitness for a setof experiments when Q-Attack is applied to FN algorithm onDolphins network, and we get the optimal parameters Pc = 0.6and Pm = 0.06 for this case, under T = 4, which is a mediumvalue. Table I gives all the optimal parameters we use for Q-Attack in different situations.

We test our attack strategies under different attack numberT . In Sec. IV-C we give a brief introduction of two metrics,modularity Q and NMI. Modularity evaluates the results fromthe perspective of significance of community structure andNMI evaluates the results from the perspective of accuracy.These two are most common metrics in related researches.Here, we also choose these two metrics to evaluate the effec-tiveness of attack strategies. The results are shown in Figs. 5and 6. For heuristic attack strategies proposed in Sec. III-A,we take the number of target nodes K = 5 (15% nodes) forKarate network and K = 6 (10% nodes) for Dolphins network.Corresponding to each specified T , we generate 50 adversarialnetworks for each strategy and record the mean values of Qand NMI, respectively. For our GA based Q-Attack strategy,we set parameters according to Table I. With T changing from2 to 8, each point on the curve is the mean value over 10runnings. For T = 1, crossover operator described in GA ishard to implement. Here are two solutions for this problem: 1)we can perform an internal crossover for one rewiring attack,i.e. swapping out the deleted link or the added one; 2) we canget the best result by going through all possible situations.Here, we get the results according to the second solution, sincethe two networks we use are both small and thus it only takeslittle time to get the results.

From Figs. 5 and 6, we can see that, in general, for almostall attack strategies, the values of Q and NMI decrease as thebudget T increases, i.e., the attack effect is more significantwhen more links can be changed. As expected, Q-Attackbehaves best to reduce the value of Q. This is not surprisingsince its goal is designed to minimize Q, based on Eq. (2).Moreover, for the other metric NMI, we can still find that thecurves for Q-Attack present sharper decline than the others,indicating its best attack effect on both Karate and Dolphinsnetworks. Such results validate that our GA based Q-Attackis indeed more effective in both weakening the strength ofcommunity structure and reducing the similarity between the

8

1 2 3 4 5 6 7 8

0.25

0.30

0.35

0.40Q

FN+Kar

1 2 3 4 5 6 7 80.10

0.15

0.20

0.25

0.30

0.35

0.40

SOA+Kar

1 2 3 4 5 6 7 8

0.25

0.30

0.35

0.40

LOU+Kar

1 2 3 4 5 6 7 8T

0.2

0.3

0.4

0.5

0.6

0.7

NMI

FN+Kar

1 2 3 4 5 6 7 8T

0.4

0.6

0.8

1.0

SOA+Kar

1 2 3 4 5 6 7 8T

0.1

0.2

0.3

0.4

0.5

0.6

0.7

LOU+Kar

RA CDA DBA Q-Attack

Fig. 5: The attack effects of the four attack strategies on three community detection algorithms and Karate network, for variousnumbers of attacks.

1 2 3 4 5 6 7 80.350

0.375

0.400

0.425

0.450

0.475

0.500

Q

FN+Dol

1 2 3 4 5 6 7 8

0.20

0.25

0.30

0.35

0.40

SOA+Dol

1 2 3 4 5 6 7 8

0.375

0.400

0.425

0.450

0.475

0.500

0.525

LOU+Dol

1 2 3 4 5 6 7 8T

0.30

0.35

0.40

0.45

0.50

0.55

NMI

FN+Dol

1 2 3 4 5 6 7 8T

0.3

0.4

0.5

0.6

0.7

0.8

SOA+Dol

1 2 3 4 5 6 7 8T

0.35

0.40

0.45

0.50

LOU+Dol

RA CDA DBA Q-Attack

Fig. 6: The attack effects of the four attack strategies on three community detection algorithms and Dolphins network, forvarious numbers of attacks.

community detection results and the true labels. Besides, wealso notice that, in most situations, DBA does a better job thanCDA in reducing Q while we get the opposite results on themetric NMI. This seems reasonable by considering that nodesof large degree plays a vital role in maintaining the structureof network and thus attacks on such nodes may affect thenetwork structure more significantly. However, on the other

hand, such large-degree nodes always stay in the centers ofcommunities, and thus are relatively difficult to be groupedinto wrong communities when they are under attack, whichmay lead to the less reduction of NMI for DBA than CDA.

To provide more quantitative comparison, we present therelative reduction of Q and NMI under the four attack strate-gies on the two networks in Table II, where for each column,

9

Table II: Attack results when the attack number is set to 5% of the total number of links in the networks. Here, we onlypresent the relative reduction of Q and NMI, compared with their values in the original network without attack.

StrategyFN+Kar FN+Dol SOA+Kar SOA+Dol LOU+Kar LOU+Dol

Q NMI Q NMI Q NMI Q NMI Q NMI Q NMI

Original 0.381 0.692 0.495 0.573 0.371 1.00 0.390 0.753 0.419 0.587 0.519 0.511

RA 0.24% 19.14% 0.09% 12.52% 5.62% 8.16% 3.81% 11.84% 4.43% 3.79% 2.15% 11.71%

CDA 0.53% 24.23% 2.88% 18.14% 8.64% 19.84% 5.10% 22.94% 6.56% 8.14% 4.71% 11.45%

DBA 4.46% 8.71% 6.49% 15.98% 13.24% 16.84% 12.51% 13.41% 10.75% 2.65% 5.53% 9.48%

Q-Attack 23.64% 41.06% 26.04% 45.27% 37.70% 53.81% 44.53% 62.83% 26.92% 41.30% 24.38% 35.24%

Fig. 7: (a) Communities detected by Louvin algorithm on Karate network.(b) Communities detected by Louvin algorithm onthe adversarial network. Different colors of nodes represent the different communities they belong to, while different shapesrepresent the different communities these nodes belong to according to the given label.

we mark the top two best results as bold. The attack numberis set to 5% of the total number of links in the network, i.e.T = 4 for Karate and T = 8 for Dolphins. Again, we cansee that our GA based Q-Attack has significantly better attackeffects, in terms of larger relative reduction of both Q andNMI, than the three heuristic strategies in all the cases, i.e., forany pair of community detection algorithm and network. DBAhas better attack effects than CDA in all the cases when usingmodularity Q as the performance metric, while the results arereversed for NMI. And as expected, Q-Attack, DBA and CDAbehave much better than the random attack strategy.

In Fig.7, we visualize an example that Q-Attack is appliedon the Louvin algorithm and Karate network. The communitystructure in the original network detected by Louvin algorithmis shown in Fig. 7 (a). The modularity Q and NMI are 0.4188and 0.5866, respectively. We also use Louvin algorithm todetect the community structure in the adversarial networkobtained under 4 rewiring attacks, which is shown in Fig. 7(b). After attack, the modularity Q falls to 0.3053 while NMIfalls to 0.3078 and the number of communities changes from4 to 5. We can see that, indeed, the community detection resultpresents higher-level disorder after attack, indicating the goodattack effect of Q-Attack.

E. Transferability of Q-Attack

As we can see, Q-Attack is typically based on the modular-ity Q which however depends on certain community detectionalgorithm. In other words, there is a threaten that the adver-sarial network designed by Q-Attack strategy to attack a givencommunity detection algorithm may lose its effectiveness onattacking others. If this is true, Q-Attack may be of lesspracticability, since there are a large number of communitydetection algorithms in reality, any of which could be chosento be applied, and we may not know which one it is. In orderto address this concern, we design experiments to testify thetransferability of Q-Attack, i.e., to see whether the adversarialnetwork obtained from certain community detection algorithmcan be still effective for some others.

Except for FN, SOA and LOU, we adopt another twocommunity detection algorithms, including Infomap (INF) andLPA, which are not directly based on modularity Q. Theresults are presented in Table III. Again, for each case, wewe mark the top two best results as bold. We find that the ad-versarial networks obtained by Q-Attack on FN, SOA or LOUseem still effective on other community detection algorithms toa certain extent, no matter whether they are modularity basedor not. In fact, the Q-Attack based on FN, SOA or LOU iseven more effective than heuristic strategies on attacking othercommunity detection algorithms in most cases. These results

10

Table III: Analysis on the transferability of Q-Attack. Here, we use the adversarial networks, generated by Q-Attack onparticular community detection algorithms, to attack other algorithms. And RA, CDA and DBA are adopted as references.

Dataset Metric Network FN SOA LOU INF LPA

Karate

Q

Original 0.381 0.371 0.419 0.402 0.380RA 0.380(0.24%) 0.351(5.62%) 0.400(4.43%) 0.387(3.64%) 0.312(18.13%)

CDA 0.379(0.53%) 0.339(8.64%) 0.391(6.56%) 0.350(12.93%) 0.290(23.89%)DBA 0.364(4.46%) 0.322(13.24%) 0.374(10.75%) 0.338(15.93%) 0.280(26.48%)

Q-Attack(FN) 0.291(23.64%) 0.320(13.84%) 0.376(10.29%) 0.318(20.88%) 0.240(36.91%)Q-Attack(SOA) 0.377(1.00%) 0.231(37.70%) 0.400(4.52%) 0.344(14.49%) 0.274(27.98%)Q-Attack(LOU) 0.341(10.29%) 0.324(12.72%) 0.306(26.92%) 0.263(34.67%) 0.260(31.57%)

NMI

Original 0.692 1.000 0.587 0.699 0.806RA 0.560(19.14%) 0.918(8.16%) 0.564(3.79%) 0.634(9.43%) 0.583(27.58%)

CDA 0.525(24.23%) 0.802(19.84%) 0.539(8.14%) 0.533(23.76%) 0.528(34.45%)DBA 0.632(8.71%) 0.832(16.84%) 0.571(2.65%) 0.648(7.33%) 0.539(33.12%)

Q-Attack(FN) 0.408(41.06%) 0.951(4.89%) 0.560(4.48%) 0.580(17.12%) 0.484(39.91%)Q-Attack(SOA) 0.572(17.47%) 0.462(53.81%) 0.606(-3.37%) 0.556(20.45%) 0.449(44.31%)Q-Attack(LOU) 0.500(27.81%) 0.902(9.77%) 0.344(41.30%) 0.509(27.24%) 0.582(27.71%)

Dolphins

Q

Original 0.495 0.390 0.519 0.524 0.506RA 0.495(0.09%) 0.375(3.81%) 0.507(2.15%) 0.508(3.19%) 0.456(9.83%)

CDA 0.481(2.88%) 0.370(5.10%) 0.494(4.71%) 0.489(6.76%) 0.416(17.74%)DBA 0.463(6.49%) 0.341(12.51%) 0.490(5.53%) 0.479(8.72%) 0.413(18.26%)

Q-Attack(FN) 0.366(26.04%) 0.349(10.55%) 0.481(7.32%) 0.485(7.48%) 0.379(25.17%)Q-Attack(SOA) 0.476(3.92%) 0.217(44.43%) 0.492(5.12%) 0.492(6.25%) 0.432(14.54%)Q-Attack(LOU) 0.455(8.08%) 0 .348(10.85%) 0.389(24.96%) 0.471(10.24%) 0.406(19.82%)

NMI

Original 0.573 0.753 0.511 0.553 0.607RA 0.501(12.52%) 0.664(11.84%) 0.451(11.71%) 0.479(13.47%) 0.575(5.28%)

CDA 0.469(18.14%) 0.580(22.94%) 0.452(11.45%) 0.462(16.53%) 0.598(1.47%)DBA 0.481(15.98%) 0.652(13.41%) 0.462(9.48%) 0.466(15.80%) 0.592(2.41%)

Q-Attack(FN) 0.313(45.27%) 0.637(15.45%) 0.423(17.14%) 0.487(11.93%) 0.603(0.72%)Q-Attack(SOA) 0.483(15.63%) 0.279(62.98%) 0.488(4.50%) 0.463(16.22%) 0.499(17.72%)Q-Attack(LOU) 0.451(21.23%) 0.643(14.64%) 0.382(25.17%) 0.497(10.21%) 0.584(3.81%)

suggest that our Q-Attack has relatively good transferability,and thus rewiring a small number of connections, guided byQ-Attack on a particular community detection algorithm, cansuccessfully attack many others, making it feasible to protectthe individual privacy against community detection.

V. CONCLUSION

In this paper, we propose several strategies, including CDA,DBA and GA based Q-Attack, to attack a number of commu-nity detection algorithms for networks. We perform the exper-iments on two well-known social networks, and find that all ofthese strategies are effective to make the community detectionalgorithms fail partly, in terms of decreasing modularity Qand NMI, by just disturbing a small number of connections.By comparison, Q-Attack behaves much better than CDA andDBA in all the considered white-box cases, where the targetcommunity detection algorithms are known in advance. More-over, this attack strategy also presents certain transferabilityto make effective black-box attacks. That is, the adversarialnetworks, generated by Q-Attack on particular communitydetection algorithms, can still be used to effectively attackmany others. These results indicate that it is feasible to utilize

Q-Attack to protect the individual privacy against communitydetection in reality.

With more and more attentions are focused on the problemof information over-mined in social networks, many worksare left to be studied. For instance, the community detectionalgorithms we considered in the experiments are mainly usedto detect non-overlapping communities, but there are alsomany overlapping communities in the real world, our attackstrategies thus can be extended to such more general situations.Besides, our Q-Attack method only takes modularity Q as theoptimization objective due to its simplicity, therefore, methodsbased on multi-objective optimization are also worth studyingand may present better attack effects.

ACKNOWLEDGMENT

The authors would like to thank all the members in ourIVSN research group in Zhejiang University of Technology forthe valuable discussion about the ideas and technical detailspresented in this paper.

11

REFERENCES

[1] Q. Xuan, Z.-Y. Zhang, C. Fu, H.-X. Hu, and V. Filkov, “Social synchronyon complex networks,” IEEE transactions on cybernetics, vol. 48, no. 5,pp. 1420–1431, 2018.

[2] Q. Xuan, H. Fang, C. Fu, and V. Filkov, “Temporal motifs revealcollaboration patterns in online task-oriented networks,” Physical ReviewE, vol. 91, no. 5, p. 052813, 2015.

[3] E. Bullmore and O. Sporns, “Complex brain networks: graph theoreticalanalysis of structural and functional systems,” Nature Reviews Neuro-science, vol. 10, no. 3, p. 186, 2009.

[4] J. O. Garcia, A. Ashourvan, S. Muldoon, J. M. Vettel, and D. S. Bas-sett, “Applications of community detection techniques to brain graphs:Algorithmic considerations and implications for neural function,” Pro-ceedings of the IEEE, vol. 106, no. 5, pp. 846–867, 2018.

[5] Z. Chen, J. Wu, Y. Xia, and X. Zhang, “Robustness of interdependentpower grids and communication networks: A complex network perspec-tive,” IEEE Transactions on Circuits and Systems II: Express Briefs,vol. 65, no. 1, pp. 115–119, 2018.

[6] S. Schiavo, J. Reyes, and G. Fagiolo, “International trade and financialintegration: a weighted network analysis,” Quantitative Finance, vol. 10,no. 4, pp. 389–399, 2010.

[7] A. Lancichinetti and S. Fortunato, “Community detection algorithms:a comparative analysis,” Physical review E, vol. 80, no. 5, p. 056117,2009.

[8] M. E. Newman, “The structure and function of complex networks,”SIAM review, vol. 45, no. 2, pp. 167–256, 2003.

[9] ——, “The structure of scientific collaboration networks,” Proceedingsof the national academy of sciences, vol. 98, no. 2, pp. 404–409, 2001.

[10] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advancesin complex systems, vol. 6, no. 04, pp. 565–573, 2003.

[11] A. A. Abbasi and M. Younis, “A survey on clustering algorithms forwireless sensor networks,” Computer communications, vol. 30, no. 14-15, pp. 2826–2841, 2007.

[12] S. Fortunato and D. Hric, “Community detection in networks: A userguide,” Physics Reports, vol. 659, pp. 1–44, 2016.

[13] Q. Xuan, M. Zhou, Z.-Y. Zhang, C. Fu, Y. Xiang, Z. Wu, and V. Filkov,“Modern food foraging patterns: Geography and cuisine choices ofrestaurant patrons on yelp,” IEEE Transactions on Computational SocialSystems, vol. 5, no. 2, pp. 508–517, 2018.

[14] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, and Q. Xuan,“Link weight prediction using supervised learning methods and itsapplication to yelp layered network,” IEEE Transactions on Knowledgeand Data Engineering, 2018.

[15] S. Nagaraja, “The impact of unlinkability on adversarial communitydetection: effects and countermeasures,” in International Symposium onPrivacy Enhancing Technologies Symposium. Springer, 2010, pp. 253–272.

[16] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, “Hidingindividuals and communities in a social network,” Nature HumanBehaviour, vol. 2, no. 2, p. 139, 2018.

[17] V. Fionda and G. Pirro, “Community deception or: How to stop fearingcommunity detection algorithms,” IEEE Transactions on Knowledge andData Engineering, vol. 30, no. 4, pp. 660–673, 2018.

[18] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerabilityof complex networks,” Physical review E, vol. 65, no. 5, p. 056109,2002.

[19] M. Tasgin, A. Herdagdelen, and H. Bingol, “Community detec-tion in complex networks using genetic algorithms,” arXiv preprintarXiv:0711.0491, 2007.

[20] H. Liu, X.-B. Hu, S. Yang, K. Zhang, and E. Di Paolo, “Applicationof complex network theory and genetic algorithm in airline routenetworks,” Transportation Research Record, vol. 2214, no. 1, pp. 50–58,2011.

[21] S. Wang, H. Zou, Q. Sun, X. Zhu, and F. Yang, “Community detec-tion via improved genetic algorithm in complex network,” InformationTechnology Journal, vol. 11, no. 3, pp. 384–387, 2012.

[22] M. Girvan and M. E. Newman, “Community structure in social andbiological networks,” Proceedings of the national academy of sciences,vol. 99, no. 12, pp. 7821–7826, 2002.

[23] J. R. Tyler, D. M. Wilkinson, and B. A. Huberman, “E-mail as spec-troscopy: Automated discovery of community structure within organi-zations,” The Information Society, vol. 21, no. 2, pp. 143–153, 2005.

[24] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi,“Defining and identifying communities in networks,” Proceedings of theNational Academy of Sciences, vol. 101, no. 9, pp. 2658–2663, 2004.

[25] M. E. Newman, “Fast algorithm for detecting community structure innetworks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.

[26] A. Clauset, M. E. Newman, and C. Moore, “Finding communitystructure in very large networks,” Physical review E, vol. 70, no. 6,p. 066111, 2004.

[27] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fastunfolding of communities in large networks,” Journal of statisticalmechanics: theory and experiment, vol. 2008, no. 10, p. P10008, 2008.

[28] M. E. Newman, “Modularity and community structure in networks,”Proceedings of the national academy of sciences, vol. 103, no. 23, pp.8577–8582, 2006.

[29] ——, “Finding community structure in networks using the eigenvectorsof matrices,” Physical review E, vol. 74, no. 3, p. 036104, 2006.

[30] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithmto detect community structures in large-scale networks,” Physical reviewE, vol. 76, no. 3, p. 036106, 2007.

[31] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complexnetworks reveal community structure,” Proceedings of the NationalAcademy of Sciences, vol. 105, no. 4, pp. 1118–1123, 2008.

[32] P. Ronhovde and Z. Nussinov, “Multiresolution community detection formegascale networks by information-based replica correlations,” PhysicalReview E, vol. 80, no. 1, p. 016109, 2009.

[33] M. Bellingeri, D. Cassi, and S. Vincenzi, “Efficiency of attack strategieson complex model and real-world networks,” Physica A: StatisticalMechanics and its Applications, vol. 414, pp. 174–180, 2014.

[34] B. Karrer, E. Levina, and M. E. Newman, “Robustness of communitystructure in networks,” Physical Review E, vol. 77, no. 4, p. 046119,2008.

[35] S. Yu, M. Zhao, C. Fu, H. Huang, X. Shu, Q. Xuan, and G. Chen,“Target defense against link-prediction-based attacks via evolutionaryperturbations,” arXiv preprint arXiv:1809.05912, 2018.

[36] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adver-sarial attack on graph structured data,” arXiv preprint arXiv:1806.02371,2018.

[37] D. Zugner, A. Akbarnejad, and S. Gunnemann, “Adversarial attackson neural networks for graph data,” in Proceedings of the 24th ACMSIGKDD International Conference on Knowledge Discovery & DataMining. ACM, 2018, pp. 2847–2856.

[38] A. Bojchevski and S. Gunnemann, “Adversarial attacks on node embed-dings,” arXiv preprint arXiv:1809.01093, 2018.

[39] J. Chen, Y. Wu, X. Xu, Y. Chen, H. Zheng, and Q. Xuan, “Fast gradientattack on network embedding,” arXiv preprint arXiv:1809.02797, 2018.

[40] A.-L. Barabasi and R. Albert, “Emergence of scaling in random net-works,” science, vol. 286, no. 5439, pp. 509–512, 1999.

[41] J. Holland, “Adaptation in natural and artificial systems: an introductoryanalysis with application to biology,” Control and artificial intelligence,1975.

[42] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization bysimulated annealing,” science, vol. 220, no. 4598, pp. 671–680, 1983.

[43] R. Eberhart and J. Kennedy, “A new optimizer using particle swarmtheory,” in Micro Machine and Human Science, 1995. MHS’95., Pro-ceedings of the Sixth International Symposium on. IEEE, 1995, pp.39–43.

[44] W. W. Zachary, “An information flow model for conflict and fission insmall groups,” Journal of anthropological research, vol. 33, no. 4, pp.452–473, 1977.

[45] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M.Dawson, “The bottlenose dolphin community of doubtful sound featuresa large proportion of long-lasting associations,” Behavioral Ecology andSociobiology, vol. 54, no. 4, pp. 396–405, 2003.

[46] M. E. Newman and M. Girvan, “Finding and evaluating communitystructure in networks,” Physical review E, vol. 69, no. 2, p. 026113,2004.

[47] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, “Comparingcommunity structure identification,” Journal of Statistical Mechanics:Theory and Experiment, vol. 2005, no. 09, p. P09008, 2005.

Documents

GA Based Q-Attack on Community Detection · give two heuristic attack strategies, i.e., Community Detection Attack (CDA) and Degree Based Attack (DBA), as baselines, uti-lizing the