
Chapter 6 Community Analysis

This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides Available at: http://dmml.asu.edu/smm

In November 2010, a team of Dutch law enforcement agents dismantled a community of 30 million infected computers across the globe that were sending more than 3.6 billion daily spam mails. These distributed networks of infected computers are called botnets. The community of computers in a botnet transmits spam or viruses across the web without their owners' permission. The members of a botnet are rarely known; however, it is vital to identify these botnet communities and analyze their behavior to enhance internet security. This is an example of community analysis. In this chapter, we discuss community analysis in social media.

Also known as groups, clusters, or cohesive subgroups, communities have been studied extensively in many fields and, in particular, the social sciences. In social media mining, analyzing communities is essential for several reasons. First, individuals often form groups based on their interests, and when studying individuals, we are interested in identifying these groups. Consider how important it is for an online book seller to find groups with similar reading tastes for recommendation purposes. Second, groups provide a clear global view of user interactions, whereas a local view of individual behavior is often noisy and ad hoc. Finally, some behaviors are only observable in a group setting and not on an individual level. This is because the individual's behavior can fluctuate, but group collective behavior is more robust to change. Consider the interactions between two opposing political groups on social media. Two individuals, one from each group, can hold similar opinions on a subject, but what is important is that their communities can exhibit opposing views on the same subject.

In this chapter, we discuss communities and answer the following three questions in detail:

1. How can we detect communities? This question is discussed in different disciplines and in diverse forms. In particular, quantization in electrical engineering, discretization in statistics, and clustering in machine learning tackle a similar challenge. As discussed in Chapter 5, in clustering, data points are grouped together based on a similarity measure. In community detection, data points represent actors in social media, and similarity between these actors is often defined based on the interests these users share. The major difference between clustering and community detection is that in community detection, individuals are connected to others via a network of links, whereas in clustering, data points are not embedded in a network.

2. How do communities evolve and how can we study evolving communities? Social media forms a dynamic and evolving environment. Similar to real-world friendships, social media interactions evolve over time. People join or leave groups; groups expand, shrink, dissolve, or split over time. Studying the temporal behavior of communities is necessary for a deep understanding of communities in social media.

3. How can we evaluate detected communities? As emphasized in our botnet example, the list of community members (i.e., ground truth) is rarely known. Hence, community evaluation is a challenging task and often means evaluating detected communities in the absence of ground truth.

Social Communities

Broadly speaking, a real-world community is a body of individuals with common economic, social, or political interests/characteristics, often living in relatively close proximity. A virtual community comes into existence when like-minded users on social media form a link and start interacting with each other. In other words, formation of any community requires (1) a set of at least two nodes sharing some interest and (2) interactions with respect to that interest.


Figure 6.1: Zachary's Karate Club. Nodes represent karate club members and edges represent friendships. A conflict in the club divided the members into two groups. The color of the nodes denotes which one of the two groups the nodes belong to.

As a real-world community example, consider the interactions of a college karate club collected by Wayne Zachary in 1977. The example is often referred to as Zachary's Karate Club [309] in the literature. Figure 6.1 depicts the interactions in a college karate club over two years. The links show friendships between members. During the observation period, individuals split into two communities due to a disagreement between the club administrator and the karate instructor, and members of one community left to start their own club. In this figure, node colors demonstrate the communities to which individuals belong. As observed in this figure, using graphs is a convenient way to depict communities because color-coded nodes can denote memberships and edges can be used to denote relations. Furthermore, we can observe that individuals are more likely to be friends with members of their own group, hence creating tightly knit components in the graph.

Zachary's Karate Club is an example of two explicit communities. An explicit community, also known as an emic community, satisfies the following three criteria:

1. Community members understand that they are its members.

2. Nonmembers understand who the community members are.

3. Community members often have more interactions with each other than with nonmembers.

In contrast to explicit communities, in implicit communities, also known as etic communities, individuals tacitly interact with others in the form of an unacknowledged community. For instance, individuals calling Canada from the United States on a daily basis need not be friends and do not consider each other as members of the same explicit community. However, from the phone operator's point of view, they form an implicit community that can be targeted with the same promotions. Finding implicit communities is of major interest, and this chapter focuses on finding these communities in social media.

Communities in social media are more or less representatives of communities in the real world. As mentioned, in the real world, members of communities are often geographically close to each other. The geographical location becomes less important in social media, and many communities on social media consist of highly diverse people from all around the planet. In general, people in real-world communities tend to be more similar than those of social media. People do not need to share language, location, and the like to be members of social media communities. Similar to real-world communities, communities in social media can be labeled as explicit or implicit. Examples of explicit communities in well-known social media sites include the following:

• Facebook. In Facebook, there exist a variety of explicit communities, such as groups and communities. In these communities, users can post messages and images, comment on other messages, like posts, and view activities of others.

• Yahoo! Groups. In Yahoo! groups, individuals join a group mailing list where they can receive emails from all or a selection of group members (administrators) directly.

• LinkedIn. LinkedIn provides its users with a feature called Groups and Associations. Users can join professional groups where they can post and share information related to the group.


Because these sites represent explicit communities, individuals have an understanding of when they are joining them. However, there exist implicit communities in social media as well. For instance, consider individuals with the same taste for certain movies on a movie rental site. These individuals are rarely all members of the same explicit community. However, the movie rental site is particularly interested in finding these implicit communities so it can better market to them by recommending movies similar to their tastes. We discuss techniques to find these implicit communities next.

6.1 Community Detection

As mentioned earlier, communities can be explicit (e.g., Yahoo! groups), or implicit (e.g., individuals who write blogs on the same or similar topics). In contrast to explicit communities, in many social media sites, implicit communities and their members are obscure to many people. Community detection finds these implicit communities.

In the simplest form, similar to the graph shown in Figure 6.1, community detection algorithms are often provided with a graph where nodes represent individuals and edges represent friendships between individuals. This definition can be generalized. Edges can also be used to represent contents or attributes shared by individuals. For instance, we can use edges to connect individuals at the same location, with the same gender, or who bought the same product. Similarly, nodes can also represent products, sites, and webpages, among others. Formally, for a graph G(V,E), the task of community detection is to find a set of communities $\{C_i\}_{i=1}^{n}$ in G such that $\bigcup_{i=1}^{n} C_i \subseteq V$.

6.1.1 Community Detection Algorithms

There are a variety of community detection algorithms. When detecting communities, we are interested in detecting communities with either (1) specific members or (2) specific forms of communities. We denote the former as member-based community detection and the latter as group-based community detection. Consider the network of 10 individuals shown in Figure 6.2 where 7 are wearing black t-shirts and 3 are wearing white ones. If we group individuals based on their t-shirt color, we end up having a community of three and a community of seven. This is an example of member-based community detection, where we are interested in specific members characterized by their t-shirts' color. If we group the same set based on the density of interactions (i.e., internal edges), we get two other communities. This is an instance of group-based community detection, where we are interested in specific communities characterized by their interactions' density.

Figure 6.2: Community Detection Algorithms Example. Member-based community detection groups members based on their characteristics. Here, we divide the network based on color. In group-based community detection, we find communities based on group properties. Here, groups are formed based on the density of interactions among their members.

Member-based community detection uses community detection algorithms that group members based on attributes or measures such as similarity, degree, or reachability. In group-based community detection, we are interested in finding communities that are modular, balanced, dense, robust, or hierarchical.


Figure 6.3: A 4-Cycle.

6.1.2 Member-Based Community Detection

The intuition behind member-based community detection is that members with the same (or similar) characteristics are more often in the same community. Therefore, a community detection algorithm following this approach should assign members with similar characteristics to the same community. Let us consider a simple example. We can assume that nodes that belong to a cycle form a community. This is because they share the same characteristic: being in the cycle. Figure 6.3 depicts a 4-cycle. For instance, we can search for all n-cycles in the graph and assume that they represent a community. The choice for n can be based on empirical evidence or heuristics, or n can be in a range $[\alpha_1, \alpha_2]$ for which all cycles are found. A well-known example is the search for 3-cycles (triads) in graphs.
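As a minimal sketch of the triad search mentioned above, the following enumerates all 3-cycles in a small undirected graph. The adjacency-dictionary representation, the function name `find_triads`, and the toy graph are illustrative assumptions, not part of the chapter.

```python
from itertools import combinations


def find_triads(adj):
    """Return every 3-cycle (triad) as a sorted tuple of nodes.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    triads = set()
    for u in adj:
        # Two neighbors of u that are adjacent to each other close a triangle.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triads.add(tuple(sorted((u, v, w))))
    return sorted(triads)


# Hypothetical toy graph: triangle 1-2-3 with a pendant node 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(find_triads(toy))  # [(1, 2, 3)]
```

Each triangle is discovered from all three of its corners, so the set is used to deduplicate; a 4-cycle without chords yields no triads.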

In theory, any subgraph can be searched for and assumed to be a community. In practice, only subgraphs that have nodes with specific characteristics are considered as communities. Three general node characteristics that are frequently used are node similarity, node degree (familiarity), and node reachability.

When employing node degrees, we seek subgraphs, which are often connected, such that each node (or a subset of nodes) has a certain node degree (number of incoming or outgoing edges). Our 4-cycle example follows this property, the degree of each node being two. In reachability, we seek subgraphs with specific properties related to paths existing between nodes. For instance, our 4-cycle instance also follows the reachability characteristic where all pairs of nodes can be reached via two independent paths. In node similarity, we assume nodes that are highly similar belong to the same community.

Figure 6.4: First Four Complete Graphs.

Node Degree

The most common subgraph searched for in networks based on node degrees is a clique. A clique is a complete subgraph, that is, a subgraph in which all pairs of nodes are connected. In terms of the node degree characteristic, a clique of size k is a subgraph of k nodes where all node degrees in the induced subgraph are k − 1. The only difference between cliques and complete graphs is that cliques are subgraphs, whereas complete graphs contain the whole node set V. The simplest four complete graphs (or cliques, when these are subgraphs) are represented in Figure 6.4.

To find communities, we can search for the maximum clique (the one with the largest number of vertices) or for all maximal cliques (cliques that are not subgraphs of a larger clique; i.e., cannot be expanded further). However, both problems are NP-hard, as is verifying whether a graph contains a clique larger than size k. To overcome these theoretical barriers, for sufficiently small networks or subgraphs, we can (1) use brute force, (2) add some constraints such that the problem is relaxed and polynomially solvable, or (3) use cliques as the seed or core of a larger community.

Brute-force clique identification. The brute-force method can find all maximal cliques in a graph. For each vertex $v_x$, we try to find the maximal clique that contains node $v_x$. The brute-force algorithm is detailed in Algorithm 6.1.

Algorithm 6.1 Brute-Force Clique Identification
Require: Adjacency Matrix A, Vertex $v_x$
 1: return Maximal Clique C containing $v_x$
 2: CliqueStack = {{$v_x$}}, Processed = {};
 3: while CliqueStack not empty do
 4:     C = pop(CliqueStack); push(Processed, C);
 5:     $v_{last}$ = Last node added to C;
 6:     $N(v_{last}) = \{v_i \mid A_{v_{last}, v_i} = 1\}$;
 7:     for all $v_{temp} \in N(v_{last})$ do
 8:         if $C \cup \{v_{temp}\}$ is a clique then
 9:             push(CliqueStack, $C \cup \{v_{temp}\}$);
10:         end if
11:     end for
12: end while
13: Return the largest clique from Processed

The algorithm starts with an empty stack of cliques. This stack is initialized with the node $v_x$ that is being analyzed (a clique of size 1). Then, from the stack, a clique is popped (C). The last node added to clique C is selected ($v_{last}$). All the neighbors of $v_{last}$ are added to the popped clique C sequentially, and if the new set of nodes creates a larger clique (i.e., the newly added node is connected to all of the other members), then the new clique is pushed back into the stack. This procedure is followed until nodes can no longer be added.
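The stack-based search of Algorithm 6.1 can be sketched in Python as follows. This is one possible rendering under stated assumptions: the graph is given as an adjacency dictionary rather than an adjacency matrix, and the function name and toy graph are hypothetical.

```python
def brute_force_clique(adj, vx):
    """Largest clique containing vx, found with the stack search of Algorithm 6.1.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    clique_stack = [[vx]]  # lists keep insertion order, so clique[-1] is the last node added
    processed = []
    while clique_stack:
        clique = clique_stack.pop()
        processed.append(clique)
        v_last = clique[-1]
        for v_temp in adj[v_last]:
            # Clique test: v_temp must be adjacent to every current member.
            if v_temp not in clique and all(v_temp in adj[u] for u in clique):
                clique_stack.append(clique + [v_temp])
    return set(max(processed, key=len))


# Hypothetical toy graph: triangle 1-2-3 with a pendant node 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(brute_force_clique(toy, 1))  # {1, 2, 3}
print(brute_force_clique(toy, 4))  # {3, 4}
```

Because the same clique can be reached through several insertion orders, the sketch revisits many candidates; that redundancy is part of why the approach only suits small graphs.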

The brute-force algorithm becomes impractical for large networks. For instance, for a complete graph of only 100 nodes, the algorithm will generate at least $2^{99} - 1$ different cliques starting from any node in the graph (why?).

The performance of the brute-force algorithm can be enhanced by pruning specific nodes and edges. If the cliques being searched for are of size k or larger, we can simply assume that the clique, if found, should contain nodes that have degrees equal to or more than k − 1. We can first prune all nodes (and edges connected to them) with degrees less than k − 1. Due to the power-law distribution of node degrees, many nodes exist with small degrees (1, 2, etc.). Hence, for a large enough k, many nodes and edges will be pruned, which will reduce the computation drastically. This pruning works for both directed and undirected graphs.
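The pruning step can be sketched as below for an undirected graph stored as an adjacency dictionary (an assumption of this example). The optional `repeat` flag is an addition of this sketch, not something the text prescribes: removing nodes lowers the degrees of their neighbors, so repeating the pass until nothing changes can prune further.

```python
def prune_low_degree(adj, k, repeat=False):
    """Remove nodes of degree < k - 1 (and their edges) before a size-k clique search.

    With repeat=True the pass is rerun until every remaining node has degree >= k - 1.
    """
    pruned = {v: set(nbrs) for v, nbrs in adj.items()}
    while True:
        low = {v for v, nbrs in pruned.items() if len(nbrs) < k - 1}
        if not low:
            return pruned
        pruned = {v: nbrs - low for v, nbrs in pruned.items() if v not in low}
        if not repeat:
            return pruned


# Hypothetical toy graph: triangle 1-2-3 with a tail 3-4-5.
tail = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(prune_low_degree(tail, 3)))               # [1, 2, 3, 4]
print(sorted(prune_low_degree(tail, 3, repeat=True)))  # [1, 2, 3]
```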

Even with pruning, there are intrinsic properties with cliques that make them a less desirable means for finding communities. Cliques are rarely observed in the real world. For instance, consider a clique of 1,000 nodes. This subgraph has $\frac{999 \times 1000}{2} = 499{,}500$ edges. A single edge removal from this many edges results in a subgraph that is no longer a clique. That represents about 0.0002% of the edges, which makes finding cliques a challenging task.

Figure 6.5: Maximal k-plexes for k = 1, 2, and 3.

In practice, to overcome this challenge, we can either relax the clique structure or use cliques as a seed or core of a community.

Relaxing cliques. A well-known clique relaxation that comes from sociology is the k-plex concept. In a clique of size k, all nodes have the degree of k − 1; however, in a k-plex, all nodes have a minimum degree that is not necessarily k − 1 (as opposed to cliques of size k). For a set of vertices V, the structure is called a k-plex if we have

$$d_v \geq |V| - k, \quad \forall v \in V, \tag{6.1}$$

where $d_v$ is the degree of v in the induced subgraph (i.e., the number of nodes from the set V that are connected to v).

Clearly, a clique of size k is a 1-plex. As k gets larger in a k-plex, the structure gets increasingly relaxed, because we can remove more edges from the clique structure. Finding the maximum k-plex in a graph still tends to be NP-hard, but in practice, finding it is relatively easier due to smaller search space. Figure 6.5 shows maximal k-plexes for k = 1, 2, and 3. A k-plex is maximal if it is not contained in a larger k-plex (i.e., with more nodes).
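Condition (6.1) translates directly into a membership test. The sketch below checks only the degree condition for a given node set; it does not check maximality, and the function name and toy graphs are illustrative assumptions.

```python
def is_k_plex(adj, nodes, k):
    """True if every node in `nodes` has at least |nodes| - k neighbors inside `nodes`."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)


# Hypothetical toy graphs: a triangle and a path on three nodes.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
path = {1: {2}, 2: {1, 3}, 3: {2}}

print(is_k_plex(triangle, {1, 2, 3}, 1))  # True: a clique is a 1-plex
print(is_k_plex(path, {1, 2, 3}, 1))      # False: the end nodes have degree 1 < 2
print(is_k_plex(path, {1, 2, 3}, 2))      # True: the path is a 2-plex
```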

Using cliques as a seed of a community. When using cliques as a seed or core of a community, we assume communities are formed from a set of cliques (small or large) in addition to edges that connect these cliques. A well-known algorithm in this area is the clique percolation method (CPM) [225]. The algorithm is provided in Algorithm 6.2.

Algorithm 6.2 Clique Percolation Method (CPM)
Require: parameter k
1: return Overlapping Communities
2: $Cliques_k$ = find all cliques of size k
3: Construct clique graph G(V,E), where $|V| = |Cliques_k|$
4: $E = \{e_{ij} \mid \text{clique } i \text{ and clique } j \text{ share } k - 1 \text{ nodes}\}$
5: Return all connected components of G

Given parameter k, the method starts by finding all cliques of size k. Then a graph is generated (clique graph) where all cliques are represented as nodes, and cliques that share k − 1 vertices are connected via edges. Communities are then found by reporting the connected components of this graph. The algorithm searches for all cliques of size k and is therefore computationally intensive. In practice, when using the CPM algorithm, we often solve CPM for a small k. Relaxations discussed for cliques are desirable to enable the algorithm to perform faster. Lastly, CPM can return overlapping communities.

Example 6.1. Consider the network depicted in Figure 6.6(a). The corresponding clique graph generated by the CPM algorithm for k = 3 is provided in Figure 6.6(b). All cliques of size k = 3 have been identified and cliques that share k − 1 = 2 nodes are connected. Connected components are returned as communities ($\{v_1, v_2, v_3\}$, $\{v_8, v_9, v_{10}\}$, and $\{v_3, v_4, v_5, v_6, v_7, v_8\}$). Nodes $v_3$ and $v_8$ belong to two communities, and these communities are overlapping.
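The three steps of CPM can be sketched as follows. The edges of Figure 6.6 are not listed in the text, so the example runs on a small hypothetical graph instead; the brute-force enumeration of k-subsets is a simplification chosen for brevity and is only viable for small graphs.

```python
from itertools import combinations


def clique_percolation(adj, k):
    """Overlapping communities via the clique percolation method (small graphs only)."""
    nodes = sorted(adj)
    # Step 1: all cliques of size k, by testing every k-subset of nodes.
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # Step 2: clique graph, linking cliques that share k - 1 nodes.
    links = {c: set() for c in cliques}
    for c1, c2 in combinations(cliques, 2):
        if len(c1 & c2) == k - 1:
            links[c1].add(c2)
            links[c2].add(c1)
    # Step 3: every connected component of the clique graph is one community.
    communities, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            c = stack.pop()
            members |= c
            for nxt in links[c] - seen:
                seen.add(nxt)
                stack.append(nxt)
        communities.append(members)
    return communities


# Hypothetical graph: triangle 1-2-3, plus triangles 3-4-5 and 4-5-6 sharing edge 4-5.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5},
       4: {3, 5, 6}, 5: {3, 4, 6}, 6: {4, 5}}
print(clique_percolation(toy, 3))  # [{1, 2, 3}, {3, 4, 5, 6}]
```

Node 3 appears in both returned communities, which illustrates how CPM produces overlapping communities.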

Node Reachability

Figure 6.6: Clique Percolation Method (CPM) Example for k = 3.

When dealing with reachability, we are seeking subgraphs where nodes are reachable from other nodes via a path. The two extremes of reachability are achieved when nodes are assumed to be in the same community if (1) there is a path between them (regardless of the distance) or (2) they are so close as to be immediate neighbors. In the first case, any graph traversal algorithm such as BFS or DFS can be used to identify connected components (communities). However, finding connected components is not very useful in large social media networks. These networks tend to have a large-scale connected component that contains most nodes, which are connected to each other via short paths. Therefore, finding connected components is less powerful for detecting communities in them. In the second case, when nodes are immediate neighbors of all other nodes, cliques are formed, and as discussed previously, finding cliques is considered a very challenging process.
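For the first extreme, a breadth-first search is enough to label connected components. The sketch below assumes an undirected graph stored as an adjacency dictionary; the function name and toy graph are illustrative.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components


# Hypothetical toy graph: two separate edges and one isolated node.
toy = {1: {2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(connected_components(toy))  # [{1, 2}, {3, 4}, {5}]
```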

To overcome these issues, we can find communities that are in between cliques and connected components in terms of connectivity and have small shortest paths between their nodes. There are predefined subgraphs, with roots in social sciences, with these characteristics. Well-known ones include the following:

• k-Clique is a maximal subgraph where the shortest path between any two nodes is always less than or equal to k. Note that in k-cliques, nodes on the shortest path need not be part of the subgraph.

• k-Club is a more restricted definition; it follows the same definition as k-cliques with the additional constraint that nodes on the shortest paths should be part of the subgraph.

• k-Clan is a k-clique where, for all shortest paths within the subgraph, the distance is less than or equal to k. All k-clans are k-cliques and k-clubs, but not vice versa. In other words,

k-Clans = k-Cliques ∩ k-Clubs.

Figure 6.7 depicts an example of the three discussed models.

Figure 6.7: Examples of 2-Cliques, 2-Clubs, and 2-Clans.
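The difference between the k-clique and k-club distance conditions can be illustrated with a small check. This sketch tests only the distance requirement for a given node set (not maximality), and the helper names and toy graph are assumptions of the example. With `inside_only=False` shortest paths may leave the node set (the k-clique condition); with `inside_only=True` they must stay inside it (the k-club condition).

```python
from collections import deque


def bfs_distances(adj, source, allowed=None):
    """Shortest-path lengths from source; if `allowed` is given, paths must stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def within_k(adj, nodes, k, inside_only):
    """True if all pairs in `nodes` are at distance <= k under the chosen path rule."""
    nodes = set(nodes)
    allowed = nodes if inside_only else None
    for u in nodes:
        dist = bfs_distances(adj, u, allowed)
        if any(dist.get(v, float("inf")) > k for v in nodes):
            return False
    return True


# Hypothetical toy graph: the path 1-2-3.
path = {1: {2}, 2: {1, 3}, 3: {2}}
print(within_k(path, {1, 3}, 2, inside_only=False))    # True: 1 and 3 are 2 apart via node 2
print(within_k(path, {1, 3}, 2, inside_only=True))     # False: node 2 is outside the set
print(within_k(path, {1, 2, 3}, 2, inside_only=True))  # True
```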

Node Similarity

Node similarity attempts to determine the similarity between two nodes $v_i$ and $v_j$. Similar nodes (or most similar nodes) are assumed to be in the same community. Often, once the similarities between nodes are determined, a classical clustering algorithm (see Chapter 5) is applied to find communities. Determining similarity between two nodes has been addressed in different fields; in particular, the problem of structural equivalence in the field of sociology considers the same problem. In structural equivalence, similarity is based on the overlap between the neighborhood of the vertices. Let $N(v_i)$ and $N(v_j)$ be the neighbors of vertices $v_i$ and $v_j$, respectively. In this case, a measure of vertex similarity can be defined as follows:

$$\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)|. \tag{6.2}$$

For large networks, this value can increase rapidly, because nodes may share many neighbors. Generally, similarity is attributed to a value that is bounded and usually in the range [0, 1]. For that to happen, various normalization procedures such as the Jaccard similarity or the cosine similarity can be applied:

$$\sigma_{Jaccard}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}, \tag{6.3}$$

$$\sigma_{Cosine}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{\sqrt{|N(v_i)|\,|N(v_j)|}}. \tag{6.4}$$

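Example 6.2. Consider the graph in Figure 6.7. The similarity values between nodes $v_2$ and $v_5$ are

$$\sigma_{Jaccard}(v_2, v_5) = \frac{|\{v_1, v_3, v_4\} \cap \{v_3, v_6\}|}{|\{v_1, v_3, v_4, v_6\}|} = 0.25, \tag{6.5}$$

$$\sigma_{Cosine}(v_2, v_5) = \frac{|\{v_1, v_3, v_4\} \cap \{v_3, v_6\}|}{\sqrt{|\{v_1, v_3, v_4\}|\,|\{v_3, v_6\}|}} = \frac{1}{\sqrt{6}} \approx 0.41. \tag{6.6}$$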
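Both normalized measures are short set computations. The sketch below evaluates them on the two neighborhoods used in the example, $N(v_2) = \{v_1, v_3, v_4\}$ and $N(v_5) = \{v_3, v_6\}$; the function names are illustrative, and non-empty neighborhoods are assumed so that no division by zero occurs.

```python
from math import sqrt


def jaccard(n_i, n_j):
    """Jaccard similarity of two neighbor sets."""
    return len(n_i & n_j) / len(n_i | n_j)


def cosine(n_i, n_j):
    """Cosine similarity of two neighbor sets."""
    return len(n_i & n_j) / sqrt(len(n_i) * len(n_j))


n_v2 = {"v1", "v3", "v4"}
n_v5 = {"v3", "v6"}
print(jaccard(n_v2, n_v5))           # 0.25
print(round(cosine(n_v2, n_v5), 2))  # 0.41
```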

In general, the definition of neighborhood $N(v_i)$ excludes the node itself ($v_i$). This, however, leads to problems with the aforementioned similarity values because nodes that are connected and do not share a neighbor will be assigned zero similarity. This can be rectified by assuming that nodes are included in their own neighborhood.

A generalization of structural equivalence is known as regular equivalence. Consider the situation of two basketball players in two different countries. Though sharing no neighborhood overlap, the social circles of these players (coach, players, fans, etc.) might look quite similar due to their social status. In other words, nodes are regularly equivalent when they are connected to nodes that are themselves similar (a self-referential definition). For more details on regular equivalence, refer to Chapter 3.

6.1.3 Group-Based Community Detection

When considering community characteristics for community detection, we are interested in communities that have certain group properties. In this section, we discuss communities that are balanced, robust, modular, dense, or hierarchical.

Balanced Communities

As mentioned before, community detection can be thought of as the problem of clustering in data mining and machine learning. Graph-based clustering techniques have proven to be useful in identifying communities in social networks. In graph-based clustering, we cut the graph into several partitions and assume these partitions represent communities.

Figure 6.8: Minimum Cut (A) and Two More Balanced Cuts (B and C) in a Graph.

Formally, a cut in a graph is a partitioning (cut) of the graph into two (or more) sets (cutsets). The size of the cut is the number of edges that are being cut or, in a weighted graph, the sum of the weights of the edges that are being cut. A minimum cut (min-cut) is a cut such that the size of the cut is minimized. Figure 6.8 depicts several cuts in a graph. For example, cut B has size 4, and A is the minimum cut.

Based on the well-known max-flow min-cut theorem, the minimum cut of a graph can be computed efficiently. However, minimum cuts are not always preferred for community detection. Often, they result in cuts where a partition is only one node (singleton), and the rest of the graph is in the other. Typically, communities with balanced sizes are preferred. Figure 6.8 depicts an example where the minimum cut (A) creates unbalanced partitions, whereas cut C is a more balanced cut.

To solve this problem, variants of minimum cut define an objective function whose minimization (or maximization) during the cut-finding procedure results in a more balanced and natural partitioning of the data. Consider a graph G(V,E). A partitioning of G into k partitions is a tuple $P = (P_1, P_2, P_3, \ldots, P_k)$, such that $P_i \subseteq V$, $P_i \cap P_j = \emptyset$ for $i \neq j$, and $\bigcup_{i=1}^{k} P_i = V$. Then, the objective functions for the ratio cut and normalized cut are defined as follows:

$$\text{Ratio Cut}(P) = \frac{1}{k} \sum_{i=1}^{k} \frac{\text{cut}(P_i, \bar{P}_i)}{|P_i|}, \tag{6.7}$$

$$\text{Normalized Cut}(P) = \frac{1}{k} \sum_{i=1}^{k} \frac{\text{cut}(P_i, \bar{P}_i)}{\text{vol}(P_i)}, \tag{6.8}$$

where $\bar{P}_i = V - P_i$ is the complement cut set, $\text{cut}(P_i, \bar{P}_i)$ is the size of the cut, and volume $\text{vol}(P_i) = \sum_{v \in P_i} d_v$. Both objective functions provide a more balanced community size by normalizing the cut size either by the number of vertices in the cutset or the volume (total degree).
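Equations 6.7 and 6.8 can be evaluated directly for a given partition. The sketch below does so for an unweighted graph; the edge list is read off the adjacency matrix given in Example 6.3, while the helper names and the choice of partition are assumptions of this example.

```python
# Edge list read off the adjacency matrix of Example 6.3 (nodes 1..9).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (4, 7),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cut_size(adj, part):
    """Number of edges with exactly one endpoint in `part`."""
    return sum(1 for u in part for v in adj[u] if v not in part)


def volume(adj, part):
    """Total degree of the nodes in `part`."""
    return sum(len(adj[v]) for v in part)


def ratio_cut(adj, partition):
    return sum(cut_size(adj, p) / len(p) for p in partition) / len(partition)


def normalized_cut(adj, partition):
    return sum(cut_size(adj, p) / volume(adj, p) for p in partition) / len(partition)


adj = build_adjacency(EDGES)
partition = [{1, 2, 3}, {4, 5, 6, 7, 8, 9}]
print(round(ratio_cut(adj, partition), 3))       # 0.5
print(round(normalized_cut(adj, partition), 3))  # 0.175
```

For this partition the cut has two edges (3-4 and 3-5), so the ratio cut is (2/3 + 2/6)/2 and the normalized cut is (2/8 + 2/20)/2.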

Both the ratio cut and normalized cut can be formulated in a matrix format. Let matrix $X \in \{0, 1\}^{|V| \times k}$ denote the community membership matrix, where $X_{i,j} = 1$ if node i is in community j; otherwise, $X_{i,j} = 0$. Let $D = \text{diag}(d_1, d_2, \ldots, d_n)$ represent the diagonal degree matrix. Then the ith entry on the diagonal of $X^T A X$ represents twice the number of edges that are inside community i. Similarly, the ith element on the diagonal of $X^T D X$ represents the total degree of the members of community i (edges inside the community are counted twice). Hence, the ith element on the diagonal of $X^T (D - A) X$ represents the number of edges that are in the cut that separates community i from all other nodes. In fact, the ith diagonal element of $X^T (D - A) X$ is equivalent to the summation term $\text{cut}(P_i, \bar{P}_i)$ in both the ratio and normalized cut. Thus, for ratio cut, we have

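$$\text{Ratio Cut}(P) = \frac{1}{k} \sum_{i=1}^{k} \frac{\text{cut}(P_i, \bar{P}_i)}{|P_i|} \tag{6.9}$$

$$= \frac{1}{k} \sum_{i=1}^{k} \frac{X_i^T (D - A) X_i}{X_i^T X_i} \tag{6.10}$$

$$= \frac{1}{k} \sum_{i=1}^{k} \hat{X}_i^T (D - A) \hat{X}_i, \tag{6.11}$$

where $X_i$ is the ith column of X and $\hat{X}_i = X_i / (X_i^T X_i)^{1/2}$. A similar approach can be followed to formulate the normalized cut and to obtain a different $\hat{X}_i$. To formulate the summation in both the ratio and normalized cut, we can use the trace of a matrix ($\text{Tr}(M) = \sum_{i=1}^{n} M_{ii}$ for an n × n matrix M). Using the trace, the objectives for both the ratio and normalized cut can be formulated as trace-minimization problems,

$$\min_{\hat{X}} \text{Tr}(\hat{X}^T L \hat{X}), \tag{6.12}$$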

where L is the (normalized) graph Laplacian, defined as follows:

$$L = \begin{cases} D - A & \text{Ratio Cut Laplacian (Unnormalized Laplacian);} \\ I - D^{-1/2} A D^{-1/2} & \text{Normalized Cut Laplacian (Normalized Laplacian).} \end{cases} \tag{6.13}$$
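The diagonal entries used in the matrix formulation can be checked numerically with one indicator column. The sketch below builds A, D, and the unnormalized Laplacian L = D − A as plain Python lists for the graph whose adjacency matrix is given in Example 6.3; the helper name `quadratic_form` and the choice of community {1, 2, 3} are assumptions of this example.

```python
# Edge list read off the adjacency matrix of Example 6.3 (nodes 1..9).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (4, 7),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9)]

n = 9
A = [[0] * n for _ in range(n)]
for u, v in EDGES:
    A[u - 1][v - 1] = A[v - 1][u - 1] = 1
D = [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(M, x):
    """x^T M x for a square matrix M (list of rows) and a vector x."""
    size = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(size) for j in range(size))


x1 = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # indicator column for community {1, 2, 3}
print(quadratic_form(A, x1))  # 6: each of the 3 internal edges is counted twice
print(quadratic_form(D, x1))  # 8: total degree (volume) of the community
print(quadratic_form(L, x1))  # 2: edges crossing the cut
```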

It has been shown that both ratio cut and normalized cut minimization are NP-hard; therefore, approximation algorithms using relaxations are desired. Spectral clustering is one such relaxation:

$$\min_{\hat{X}} \text{Tr}(\hat{X}^T L \hat{X}), \tag{6.14}$$

$$\text{s.t. } \hat{X}^T \hat{X} = I_k. \tag{6.15}$$

The solution to this problem is given by the top eigenvectors of L (for more details, refer to [56]). Given L, the top k eigenvectors corresponding to the smallest eigenvalues are computed and used as $\hat{X}$, and then k-means is run on $\hat{X}$ to extract community memberships (X). The first eigenvector is meaningless (why?); hence, the rest of the eigenvectors (k − 1) are used as k-means input.

Example 6.3. Consider the graph in Figure 6.8. We find two communities in this graph using spectral clustering (i.e., k = 2). Then, we have

$$D = \text{diag}(2, 2, 4, 4, 4, 4, 4, 3, 1). \tag{6.16}$$

The adjacency matrix A and the unnormalized Laplacian L are

$$A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}, \tag{6.17}$$

$$L = D - A = \begin{bmatrix}
2 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & -1 & 4 & -1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 4 & -1 & -1 & -1 & 0 & 0 \\
0 & 0 & -1 & -1 & 4 & -1 & -1 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & 4 & -1 & -1 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & -1 & 3 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1
\end{bmatrix}. \tag{6.18}$$

We aim to find two communities; therefore, we get two eigenvectors corresponding to the two smallest eigenvalues from L (row i corresponds to node i):

$$\hat{X} = \begin{bmatrix}
0.33 & -0.46 \\
0.33 & -0.46 \\
0.33 & -0.26 \\
0.33 & \approx 0.01 \\
0.33 & \approx 0.01 \\
0.33 & 0.13 \\
0.33 & 0.13 \\
0.33 & 0.33 \\
0.33 & 0.59
\end{bmatrix}. \tag{6.19}$$

As mentioned, the first eigenvector is meaningless, because it assigns all nodes to the same community. The second is used with k-means; based on the vector signs, we get communities {1, 2, 3} and {4, 5, 6, 7, 8, 9}.
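The example can be reproduced approximately without a linear algebra library. The sketch below is not the procedure prescribed in the text: it assumes a connected graph, finds the second eigenvector of L = D − A by power iteration on a shifted matrix, and replaces k-means with an exhaustive one-dimensional two-cluster split. The entries for nodes 4 and 5 come out numerically close to zero, so a pure sign test can be fragile there; the exhaustive split avoids relying on their sign.

```python
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (4, 7),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9)]


def fiedler_vector(edges, n, iterations=1000):
    """Eigenvector of L = D - A for the smallest nonzero eigenvalue (connected graph)."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][v - 1] -= 1
        L[v - 1][u - 1] -= 1
        L[u - 1][u - 1] += 1
        L[v - 1][v - 1] += 1
    # Eigenvalues of L are at most 2 * max degree, so shift * I - L has no negative ones.
    shift = 2 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant (first) eigenvector
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


def best_two_way_split(values):
    """Exact 1-D two-cluster split: try every cut of the sorted values, minimize SSE."""
    order = sorted(range(len(values)), key=lambda i: values[i])

    def sse(idx):
        m = sum(values[i] for i in idx) / len(idx)
        return sum((values[i] - m) ** 2 for i in idx)

    cut = min(range(1, len(order)), key=lambda c: sse(order[:c]) + sse(order[c:]))
    return sorted(i + 1 for i in order[:cut]), sorted(i + 1 for i in order[cut:])


second = fiedler_vector(EDGES, 9)
print(best_two_way_split(second))  # ([1, 2, 3], [4, 5, 6, 7, 8, 9])
```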

Robust Communities

When seeking robust communities, our goal is to find subgraphs robust enough such that removing some edges or nodes does not disconnect the subgraph. A k-vertex connected graph (or k-connected) is an example of such a graph. In this graph, k is the minimum number of nodes that must be removed to disconnect the graph (i.e., there exist at least k independent paths between any pair of nodes). A similar subgraph is the k-edge connected graph, where at least k edges must be removed to disconnect the graph. An upper-bound analysis on k-edge connectivity shows that the minimum degree for any node in the graph should not be less than k (why?). For example, a complete graph of size n is the unique (n − 1)-connected graph on n nodes under the standard convention, and a cycle is a 2-connected graph.
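For very small graphs, vertex robustness can be checked by brute force: delete every set of fewer than k nodes and test whether the remainder stays connected. The sketch below is exponential in k and ignores the boundary conventions of the formal definition (for example, how complete graphs are treated); the function names and the 5-cycle are assumptions of this example.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected (an empty set counts as connected)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def survives_removals(adj, k):
    """True if deleting any set of fewer than k nodes leaves the remaining graph connected."""
    nodes = set(adj)
    return all(is_connected(adj, nodes - set(removed))
               for size in range(k)
               for removed in combinations(nodes, size))


# Hypothetical example: a cycle on 5 nodes.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(survives_removals(cycle, 2))  # True: removing any single node leaves a path
print(survives_removals(cycle, 3))  # False: two non-adjacent removals split the cycle
```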

Modular Communities

Modularity is a measure that defines how likely the community structure found is created at random. Clearly, community structures should be far from random. Consider an undirected graph G(V,E), |E| = m where the degrees are known beforehand, but edges are not. Consider two nodes $v_i$ and $v_j$, with degrees $d_i$ and $d_j$, respectively. What is the expected number of edges between these two nodes? Consider node $v_i$. For any edge going out of $v_i$ randomly, the probability of this edge getting connected to node $v_j$ is $\frac{d_j}{\sum_{l} d_l} = \frac{d_j}{2m}$. Because the degree for $v_i$ is $d_i$, we have $d_i$ such edges; hence, the expected number of edges between $v_i$ and $v_j$ is $\frac{d_i d_j}{2m}$. So, given a degree distribution, the expected number of edges between any pair of vertices can be computed. Real-world communities are far from random; therefore, the more distant they are from randomly generated communities, the more structure they exhibit. Modularity defines this distance, and modularity maximization tries to maximize this distance. Consider a partitioning of the graph G into k partitions, $P = (P_1, P_2, P_3, \ldots, P_k)$. For partition $P_x$, this distance can be defined as

$$\sum_{v_i, v_j \in P_x} \left( A_{ij} - \frac{d_i d_j}{2m} \right). \tag{6.20}$$

This distance can be generalized for partitioning P with k partitions,

$$\sum_{x=1}^{k} \sum_{v_i, v_j \in P_x} \left( A_{ij} - \frac{d_i d_j}{2m} \right). \tag{6.21}$$

The summation is over all edges (m), and because all edges are counted twice ($A_{ij} = A_{ji}$), the normalized version of this distance is defined as modularity [211]:

$$Q = \frac{1}{2m} \sum_{x=1}^{k} \sum_{v_i, v_j \in P_x} \left( A_{ij} - \frac{d_i d_j}{2m} \right). \tag{6.22}$$

We define the modularity matrix as $B = A - dd^T/2m$, where $d \in \mathbb{R}^{n \times 1}$ is the degree vector for all nodes. Similar to spectral clustering matrix formulation, modularity can be reformulated as

\[
Q = \frac{1}{2m} \mathrm{Tr}(X^T B X), \tag{6.23}
\]

where $X \in \mathbb{R}^{n \times k}$ is the indicator (partition membership) function; that is, $X_{ij} = 1$ iff $v_i \in P_j$. This objective can be maximized such that the best membership function is extracted with respect to modularity. The problem is NP-hard; therefore, we relax X to $\hat{X}$ that has an orthogonal structure ($\hat{X}^T \hat{X} = I_k$). The optimal $\hat{X}$ can be computed using the top k eigenvectors of B corresponding to the largest positive eigenvalues. Similar to spectral clustering, to find X, we can run k-means on $\hat{X}$. Note that this requires that B has at least k positive eigenvalues.
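To make Equation 6.22 concrete, the sketch below evaluates Q for a given partition directly from an edge list. It only scores a partition; it does not perform the eigenvector-based maximization described above. The function name `modularity` and the six-node toy graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def modularity(edges, communities):
    """Modularity Q (Equation 6.22) of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    communities: list of node lists that together partition the nodes.
    """
    edges = list(edges)
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    label = {v: c for c, members in enumerate(communities) for v in members}

    # Sum of A_ij over ordered pairs in the same community:
    # every internal edge contributes twice (A_ij and A_ji).
    internal = sum(2 for u, v in edges if label[u] == label[v])

    # Sum of d_i * d_j / 2m over ordered pairs in one community equals
    # (total degree of the community)^2 / 2m.
    expected = 0.0
    for members in communities:
        d_c = sum(degree.get(v, 0) for v in members)
        expected += d_c * d_c / (2 * m)

    return (internal - expected) / (2 * m)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(toy_edges, [[0, 1, 2], [3, 4, 5]])
q_single = modularity(toy_edges, [[0, 1, 2, 3, 4, 5]])
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives a positive Q, while placing every node in one community gives Q = 0, because the observed and expected edge counts then coincide.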

Dense Communities

Often, we are interested in dense communities, which have sufficiently frequent interactions. These communities are of particular interest in social media where we would like to have enough interactions for analysis to make statistical sense. When we are measuring density in communities, the community may or may not be connected as long as it satisfies the properties required, assuming connectivity is not one such property. Cliques, clubs, and clans are examples of connected dense communities. Here, we focus on subgraphs that have the possibility of being disconnected. Density-based community detection has been extensively discussed in the field of clustering (see Chapter 5, Bibliographic Notes).

The density γ of a graph defines how close a graph is to a clique. In other words, the density γ is the ratio of the number of edges |E| that graph G has over the maximum it can have, $\binom{|V|}{2}$:

\[
\gamma = \frac{|E|}{\binom{|V|}{2}}. \tag{6.24}
\]

A graph G = (V,E) is γ-dense if $|E| \geq \gamma \binom{|V|}{2}$. Note that a 1-dense graph is a clique. Here, we discuss the interesting scenario of connected dense graphs (i.e., quasi-cliques). A quasi-clique (or γ-clique) is a connected γ-dense graph. Quasi-cliques can be searched for using approaches previously discussed for finding cliques. We can utilize the brute-force clique identification algorithm (Algorithm 6.1) for finding quasi-cliques as well. The only part of the algorithm that needs to be changed is the part where the clique condition is checked (Line 8). This can be replaced with a quasi-clique checking condition. In general, because there is less regularity in quasi-cliques, searching for them becomes harder. Interested readers can refer to the bibliographic notes for faster algorithms.
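The quasi-clique checking condition mentioned above can be sketched as a small predicate: a node set qualifies when its induced subgraph is connected and γ-dense. This is only the check, not a search procedure, and the function names and toy graphs are illustrative assumptions. The sketch assumes at least two nodes and an edge list without duplicates.

```python
def density(n_nodes, n_edges):
    """Equation 6.24: edges present divided by C(n, 2); needs n_nodes >= 2."""
    max_edges = n_nodes * (n_nodes - 1) // 2
    return n_edges / max_edges

def is_connected(nodes, edges):
    """True if the undirected graph (nodes, edges) has a single component."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes

def is_quasi_clique(nodes, edges, gamma):
    """Connected and gamma-dense: |E| >= gamma * C(|V|, 2)."""
    n = len(set(nodes))
    max_edges = n * (n - 1) // 2
    return is_connected(nodes, edges) and len(edges) >= gamma * max_edges

# Hypothetical toy graphs on four nodes.
nodes = [0, 1, 2, 3]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # clique
path = [(0, 1), (1, 2), (2, 3)]                          # sparse but connected
two_edges = [(0, 1), (2, 3)]                             # disconnected

print(density(4, len(k4)), density(4, len(path)))
print(is_quasi_clique(nodes, path, 0.5), is_quasi_clique(nodes, two_edges, 0.3))
```

The disconnected example is 0.3-dense but fails the check, which illustrates why connectivity is required in addition to density.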

Hierarchical Communities

All previously discussed methods have considered communities at a single level. In reality, it is common to have hierarchies of communities, in which each community can have sub/super communities. Hierarchical clustering deals with this scenario and generates community hierarchies. Initially, n nodes are considered as either 1 or n communities in hierarchical clustering. These communities are gradually merged or split (agglomerative or divisive hierarchical clustering algorithms), depending on the type of algorithm, until the desired number of communities is reached. A dendrogram is a visual demonstration of how communities are merged or split using hierarchical clustering. The Girvan-Newman [101] algorithm is specifically designed for finding communities using divisive hierarchical clustering.

Figure 6.9: An Example of the Girvan-Newman Algorithm: (a) graph and (b) its hierarchical clustering dendrogram based on edge betweenness.

The assumption underlying this algorithm is that, if a network has a set of communities and these communities are connected to one another with a few edges, then all shortest paths between members of different communities should pass through these edges. By removing these edges (at times referred to as weak ties), we can recover (i.e., disconnect) communities in a network. To find these edges, the Girvan-Newman algorithm uses a measure called edge betweenness and removes edges with higher edge betweenness. For an edge e, edge betweenness is defined as the number of shortest paths between node pairs ($v_i$, $v_j$) such that the shortest path between $v_i$ and $v_j$ passes through e. For instance, in Figure 6.9(a), edge betweenness for edge e(1, 2) is 6/2 + 1 = 4, because all the shortest paths from 2 to {4, 5, 6, 7, 8, 9} have to pass either e(1, 2) or e(2, 3), and e(1, 2) is the shortest path between 1 and 2. Formally, the Girvan-Newman algorithm is as follows:

1. Calculate edge betweenness for all edges in the graph.

2. Remove the edge with the highest betweenness.

3. Recalculate betweenness for all edges affected by the edge removal.

4. Repeat until all edges are removed.
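The steps above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions: betweenness is recomputed for every edge after each removal (rather than only for affected edges), the loop stops once a requested number of communities is reached (rather than when all edges are gone), and the edge list for Figure 6.9(a) is read off the nonzero entries of the edge-betweenness matrix in Example 6.4. All function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    Each unordered node pair counts once; when several shortest paths
    exist, the pair's credit is split equally among them.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, shortest-path counts, and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Push path credit back from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Every pair was visited from both of its endpoints.
    return {e: b / 2.0 for e, b in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, n_communities):
    """Remove highest-betweenness edges until n_communities components exist."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while len(components(adj)) < n_communities and any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Edges of Figure 6.9(a), inferred from the matrix in Example 6.4.
fig_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
             (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]
graph = {}
for a, b in fig_edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

eb = edge_betweenness(graph)
found = sorted(sorted(c) for c in girvan_newman(graph, 3))
print(eb[frozenset((4, 5))], found)
```

Recomputing all scores after each removal is the simplest correct choice for a sketch; restricting the update to the component that contained the removed edge would reduce the work on larger graphs.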

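Example 6.4. Consider the graph depicted in Figure 6.9(a). For this graph, the edge-betweenness values are as follows:

\[
\begin{array}{c|ccccccccc}
  & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline
1 & 0 & 4 & 1 & 9 & 0 & 0 & 0 & 0 & 0 \\
2 & 4 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 1 & 4 & 0 & 9 & 0 & 0 & 0 & 0 & 0 \\
4 & 9 & 0 & 9 & 0 & 10 & 10 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & 10 & 0 & 1 & 6 & 3 & 0 \\
6 & 0 & 0 & 0 & 10 & 1 & 0 & 6 & 3 & 0 \\
7 & 0 & 0 & 0 & 0 & 6 & 6 & 0 & 2 & 8 \\
8 & 0 & 0 & 0 & 0 & 3 & 3 & 2 & 0 & 0 \\
9 & 0 & 0 & 0 & 0 & 0 & 0 & 8 & 0 & 0
\end{array} \tag{6.25}
\]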

Therefore, by following the algorithm, the first edge that needs to be removed is e(4, 5) (or e(4, 6)). By removing e(4, 5), we compute the edge betweenness once again; this time, e(4, 6) has the highest betweenness value: 20. This is because all shortest paths between nodes {1,2,3,4} to nodes {5,6,7,8,9} must pass e(4, 6); therefore, it has betweenness 4 × 5 = 20. By following the first few steps of the algorithm, the dendrogram shown in Figure 6.9(b) and three disconnected communities ({1, 2, 3, 4}, {5, 6, 7, 8}, {9}) can be obtained.

Figure 6.10: Community Detection Algorithms.

We discussed various community detection algorithms in this section. Figure 6.10 summarizes the two categories of community detection algorithms.

6.2 Community Evolution

Community detection algorithms discussed so far assume that networks are static; that is, their nodes and edges are fixed and do not change over time. In reality, with the rapid growth of social media, networks and their internal communities change over time. Earlier community detection algorithms have to be extended to deal with evolving networks. Before analyzing evolving networks, we need to answer the question, How do networks evolve? In this section, we discuss how networks evolve in general and then how communities evolve over time. We also demonstrate how communities can be found in these evolving networks.

6.2.1 How Networks Evolve

Large social networks are highly dynamic, where nodes and links appear or disappear over time. In these evolving networks, many interesting patterns are observed; for instance, when distances (in terms of shortest path distance) between two nodes increase, their probability of getting connected decreases.² We discuss three common patterns that are observed in evolving networks: segmentation, densification, and diameter shrinkage.

Network Segmentation

Often, in evolving networks, segmentation takes place, where the large network is decomposed over time into three parts:

1. Giant Component: As network connections stabilize, a giant component of nodes is formed, with a large proportion of network nodes and edges falling into this component.

2. Stars: These are isolated parts of the network that form star structures. A star is a tree with one internal node and n leaves.

3. Singletons: These are orphan nodes disconnected from all nodes in the network.

Figure 6.11 depicts a segmented network and these three components.

Graph Densification

It is observed in evolving graphs that the density of the graph increases as the network grows. In other words, the number of edges increases faster than the number of nodes. This phenomenon is called densification. Let V(t) denote nodes at time t and let E(t) denote edges at time t,

\[
|E(t)| \propto |V(t)|^{\alpha}. \tag{6.26}
\]

² See [154] for details.


Figure 6.11: Network Segmentation. The network is decomposed into a giant component (dark gray), star components (medium gray), and singletons (light gray).

If densification happens, then we have 1 ≤ α ≤ 2. There is linear growth when α = 1, and we get clique structures when α = 2 (why?). Networks exhibit α values between 1 and 2 when evolving. Figure 6.12 depicts a log-log graph for densification for a physics citation network and a patent citation network. During the evolution process in both networks, the number of edges is recorded as the number of nodes grows. These recordings show that both networks have α ≈ 1.6 (i.e., the log-log graph of |E| with respect to |V| is a straight line with slope 1.6). This value also implies that when V is given, to realistically model a social network, we should generate $O(|V|^{1.6})$ edges.
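Because α is the slope of the log-log plot, it can be estimated from recorded node and edge counts with an ordinary least-squares line fit. The sketch below assumes such recordings are available as two lists; the function name `densification_exponent` is hypothetical, and the numbers are synthetic values generated from a known exponent, not measurements from any real network.

```python
import math

def densification_exponent(node_counts, edge_counts):
    """Least-squares slope of log|E| against log|V| (the alpha of Eq. 6.26)."""
    xs = [math.log(v) for v in node_counts]
    ys = [math.log(e) for e in edge_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic recordings: |E| = 0.5 * |V|^1.6 (illustrative only).
node_counts = [100, 200, 400, 800, 1600]
edge_counts = [0.5 * v ** 1.6 for v in node_counts]

alpha = densification_exponent(node_counts, edge_counts)
print(round(alpha, 3))
```

On real recordings the points will not lie exactly on a line, so the fitted slope should be read as an estimate of α rather than an exact value.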

Figure 6.12: Graph Densification (from [167]).


Figure 6.13: Diameter Shrinkage over Time for a Patent Citation Network (from [167]).

Diameter Shrinkage

Another property observed in large networks is that the network diameter shrinks in time. This property has been observed in random graphs as well (see Chapter 4). Figure 6.13 depicts the diameter shrinkage for the same patent network discussed in Figure 6.12.

In this section we discussed three phenomena that are observed in evolving networks. Communities in evolving networks also evolve. They appear, grow, shrink, split, merge, or even dissolve over time. Figure 6.14 depicts different situations that can happen during community evolution.

Both networks and their internal communities evolve over time. Given evolution information (e.g., when edges or nodes are added), how can we study evolving communities? And can we adapt static (nontemporal) methods to use this temporal information? We discuss these questions next.


Figure 6.14: Community Evolution (reproduced from [226]).

6.2.2 Community Detection in Evolving Networks

Consider an instant messaging (IM) application in social media. In these IM systems, members become “available” or “offline” frequently. Consider individuals as nodes and messages between them as edges. In this example, we are interested in finding a community of individuals who send messages to one another frequently. Clearly, community detection at any time stamp is not a valid solution because interactions are limited at any point in time. A valid solution to this problem needs to use temporal information and interactions between users over time. In this section, we present community detection algorithms that incorporate temporal information. To incorporate temporal information, we can extend previously discussed static methods as follows:

1. Take t snapshots of the network, $G_1, G_2, \dots, G_t$, where $G_i$ is a snapshot at time i.

2. Perform a static community detection algorithm on all snapshots independently.

3. Assign community members based on communities found in all t different time stamps. For instance, we can assign nodes to communities based on voting. In voting, we assign nodes to communities they belong to the most over time.
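The voting step in the list above can be sketched as a simple majority count per node. The sketch assumes that a static algorithm has already been run on every snapshot and, importantly, that community labels have been matched across snapshots so that the same label denotes the same community over time; that matching is not addressed here. The function name `vote_memberships` and the toy snapshots are hypothetical.

```python
from collections import Counter

def vote_memberships(snapshots):
    """Assign each node the community label it holds most often.

    snapshots: list of dicts, one per time stamp, mapping node -> label.
    A node missing from a snapshot (e.g., offline) simply casts no vote.
    """
    votes = {}
    for assignment in snapshots:
        for node, label in assignment.items():
            votes.setdefault(node, Counter())[label] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}

# Hypothetical assignments for three snapshots of a small network.
snapshots = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 0, "b": 1},          # node "c" is absent at this time stamp
]
final = vote_memberships(snapshots)
print(final)
```

Ties are broken arbitrarily in this sketch; a real implementation would need an explicit rule, for instance preferring the most recent snapshot.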

Unfortunately, this method is unstable in highly dynamic networks because community memberships are always changing. An alternative is to use evolutionary clustering.

Evolutionary Clustering

In evolutionary clustering, it is assumed that communities do not change most of the time; hence, it tries to minimize an objective function that considers both communities at different time stamps (snapshot cost or SC) and how they evolve throughout time (temporal cost or TC). Then, the objective function for evolutionary clustering is defined as a linear combination of the snapshot cost and temporal cost (SC and TC),

Cost = α SC + (1 − α) TC, (6.27)

where 0 ≤ α ≤ 1. Let us assume that spectral clustering (discussed in Section 6.1.3) is used to find communities at each time stamp. We know that the objective for spectral clustering is $\mathrm{Tr}(X^T L X)$ s.t. $X^T X = I_m$, so we will have the objective function at time t as

\begin{align}
Cost_t &= \alpha\, SC + (1 - \alpha)\, TC, \tag{6.28} \\
       &= \alpha\, \mathrm{Tr}(X_t^T L X_t) + (1 - \alpha)\, TC, \tag{6.29}
\end{align}

where $X_t$ is the community membership matrix at time t. To define TC, we can compute the distance between the community assignments of two snapshots:

\[
TC = \|X_t - X_{t-1}\|^2. \tag{6.30}
\]

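Unfortunately, this requires both $X_t$ and $X_{t-1}$ to have the same number of columns (number of communities). Moreover, $X_t$ is not unique and can change by orthogonal transformations;³ therefore, the distance value $\|X_t - X_{t-1}\|^2$ can change arbitrarily. To remove the effect of orthogonal transformations and allow different numbers of columns, TC is defined as

\[
\begin{aligned}
TC &= \frac{1}{2} \|X_t X_t^T - X_{t-1} X_{t-1}^T\|^2, \\
   &= \frac{1}{2} \mathrm{Tr}\big((X_t X_t^T - X_{t-1} X_{t-1}^T)^T (X_t X_t^T - X_{t-1} X_{t-1}^T)\big), \\
   &= \frac{1}{2} \mathrm{Tr}\big(X_t X_t^T X_t X_t^T - 2 X_t X_t^T X_{t-1} X_{t-1}^T + X_{t-1} X_{t-1}^T X_{t-1} X_{t-1}^T\big), \\
   &= \mathrm{Tr}\big(I - X_t X_t^T X_{t-1} X_{t-1}^T\big), \\
   &= \mathrm{Tr}\big(I - X_t^T X_{t-1} X_{t-1}^T X_t\big),
\end{aligned} \tag{6.31}
\]

where the factor $\frac{1}{2}$ is for mathematical convenience, and Tr(AB) = Tr(BA) is used together with the orthogonality constraints $X_t^T X_t = I$ and $X_{t-1}^T X_{t-1} = I$.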
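A small numerical check can make the motivation for this definition tangible. The sketch below uses hand-picked 4 × 2 matrices with orthonormal columns (hypothetical values, chosen only for illustration) to show that rotating a solution changes $\|X_t - X_{t-1}\|^2$ but leaves the projection-based TC at zero, and that the trace form agrees with the norm form when both matrices have the same number of columns.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob2(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def tc_projection(X, X_prev):
    """0.5 * ||X X^T - X_prev X_prev^T||^2, the first line of Eq. 6.31."""
    return 0.5 * frob2(matmul(X, transpose(X)), matmul(X_prev, transpose(X_prev)))

def tc_trace(X, X_prev):
    """Tr(I_k - X^T X_prev X_prev^T X); assumes both matrices have the
    same number k of orthonormal columns."""
    M = matmul(transpose(X), X_prev)
    G = matmul(M, transpose(M))
    return sum(1.0 - G[i][i] for i in range(len(G)))

s = 1.0 / math.sqrt(2.0)
X_prev = [[s, 0.0], [s, 0.0], [0.0, s], [0.0, s]]       # two communities of two nodes

theta = 0.7                                              # arbitrary rotation angle
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
X_rot = matmul(X_prev, Q)                                # same solution, rotated

X_new = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # a genuinely different X

print(frob2(X_rot, X_prev), tc_projection(X_rot, X_prev))
print(tc_projection(X_new, X_prev), tc_trace(X_new, X_prev))
```

The first printed pair shows a nonzero naive distance next to a projection-based cost that is zero up to rounding; the second pair shows the two forms of TC agreeing for a different membership matrix.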

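Therefore, the evolutionary clustering objective can be stated as

\[
\begin{aligned}
Cost_t &= \alpha\, \mathrm{Tr}(X_t^T L X_t) + (1 - \alpha)\, \frac{1}{2} \|X_t X_t^T - X_{t-1} X_{t-1}^T\|^2, \\
       &= \alpha\, \mathrm{Tr}(X_t^T L X_t) + (1 - \alpha)\, \mathrm{Tr}(I - X_t^T X_{t-1} X_{t-1}^T X_t), \\
       &= \alpha\, \mathrm{Tr}(X_t^T L X_t) + (1 - \alpha)\, \mathrm{Tr}(X_t^T I X_t - X_t^T X_{t-1} X_{t-1}^T X_t), \\
       &= \mathrm{Tr}(X_t^T \alpha L X_t) + \mathrm{Tr}(X_t^T (1 - \alpha) I X_t - X_t^T (1 - \alpha) X_{t-1} X_{t-1}^T X_t).
\end{aligned} \tag{6.32}
\]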

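Assuming the normalized Laplacian is used in spectral clustering, $L = I - D_t^{-1/2} A_t D_t^{-1/2}$,

\[
\begin{aligned}
Cost_t &= \mathrm{Tr}(X_t^T \alpha (I - D_t^{-1/2} A_t D_t^{-1/2}) X_t) \\
       &\quad + \mathrm{Tr}(X_t^T (1 - \alpha) I X_t - X_t^T (1 - \alpha) X_{t-1} X_{t-1}^T X_t), \\
       &= \mathrm{Tr}(X_t^T (I - \alpha D_t^{-1/2} A_t D_t^{-1/2} - (1 - \alpha) X_{t-1} X_{t-1}^T) X_t), \\
       &= \mathrm{Tr}(X_t^T \hat{L} X_t),
\end{aligned} \tag{6.33}
\]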

where $\hat{L} = I - \alpha D_t^{-1/2} A_t D_t^{-1/2} - (1 - \alpha) X_{t-1} X_{t-1}^T$. Similar to spectral clustering, $X_t$ can be obtained by taking the top eigenvectors of $\hat{L}$.

³ Let X be the solution to spectral clustering. Consider an orthogonal matrix Q (i.e., $QQ^T = I$). Let Y = XQ. In spectral clustering, we are optimizing $\mathrm{Tr}(X^T L X) = \mathrm{Tr}(X^T L X Q Q^T) = \mathrm{Tr}(Q^T X^T L X Q) = \mathrm{Tr}((XQ)^T L (XQ)) = \mathrm{Tr}(Y^T L Y)$. In other words, Y is another answer to our trace-optimization problem. This proves that the solution X to spectral clustering is non-unique under orthogonal transformations Q.


Figure 6.15: Community Evaluation Example. Circles represent communities, and items inside the circles represent members. Each item is represented using a symbol, +, ×, or △, that denotes the item’s true label.

Note that at time t, we can obtain $X_t$ directly by solving spectral clustering for the Laplacian of the graph at time t, but then we are not employing any temporal information. Using evolutionary clustering and the new Laplacian $\hat{L}$, we incorporate temporal information into our community detection algorithm and prevent the community memberships at time t, $X_t$, from changing dramatically from $X_{t-1}$.

6.3 Community Evaluation

When communities are found, one must evaluate how accurately the detection task has been performed. In terms of evaluating communities, the task is similar to evaluating clustering methods in data mining. Evaluating clustering is a challenge because ground truth may not be available. We consider two scenarios: when ground truth is available and when it is not.

6.3.1 Evaluation with Ground Truth

When ground truth is available, we have at least partial knowledge of what communities should look like. Here, we assume that we are given the correct community (clustering) assignments. We discuss four measures: precision and recall, F-measure, purity, and normalized mutual information (NMI). Consider Figure 6.15, where three communities are found and the points are shown using their true labels.


Precision and Recall

Community detection can be considered a problem of assigning all similar nodes to the same community. In the simplest case, any two similar nodes should be considered members of the same community. Based on our assignments, four cases can occur:

1. True Positive (TP) Assignment: when similar members are assigned to the same community. This is a correct decision.

2. True Negative (TN) Assignment: when dissimilar members are assigned to different communities. This is a correct decision.

3. False Negative (FN) Assignment: when similar members are assigned to different communities. This is an incorrect decision.

4. False Positive (FP) Assignment: when dissimilar members are assigned to the same community. This is an incorrect decision.

Precision (P) and Recall (R) are defined as follows,

\[
P = \frac{TP}{TP + FP}, \tag{6.34}
\]

\[
R = \frac{TP}{TP + FN}. \tag{6.35}
\]

Precision is the fraction of pairs assigned to the same community that truly belong together. Recall is the fraction of pairs that should have been in the same community that the community detection algorithm actually assigned to the same community.

Example 6.5. We compute these values for Figure 6.15. For TP, we need to compute the number of pairs with the same label that are in the same community. For instance, for label × and community 1, we have $\binom{5}{2}$ such pairs. Therefore,

\[
TP = \underbrace{\binom{5}{2}}_{\text{Community 1}} + \underbrace{\binom{6}{2}}_{\text{Community 2}} + \underbrace{\left( \binom{4}{2} + \binom{2}{2} \right)}_{\text{Community 3}} = 32. \tag{6.36}
\]

For FP, we need to compute dissimilar pairs that are in the same community. For instance, for community 1, this is (5 × 1 + 5 × 1 + 1 × 1). Therefore,

\[
FP = \underbrace{(5 \times 1 + 5 \times 1 + 1 \times 1)}_{\text{Community 1}} + \underbrace{(6 \times 1)}_{\text{Community 2}} + \underbrace{(4 \times 2)}_{\text{Community 3}} = 25. \tag{6.37}
\]

FN computes similar members that are in different communities. For instance, for label +, this is (6 × 1 + 6 × 2 + 2 × 1). Similarly,

\[
FN = \underbrace{(5 \times 1)}_{\times} + \underbrace{(6 \times 1 + 6 \times 2 + 2 \times 1)}_{+} + \underbrace{(4 \times 1)}_{\triangle} = 29. \tag{6.38}
\]

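Finally, TN computes the number of dissimilar pairs in dissimilar communities:

\[
\begin{aligned}
TN &= \underbrace{\big( \overbrace{5 \times 6}^{\times,+} + \overbrace{1 \times 1}^{+,\times} + \overbrace{1 \times 6}^{\triangle,+} + \overbrace{1 \times 1}^{\triangle,\times} \big)}_{\text{Communities 1 and 2}} \\
   &\quad + \underbrace{\big( \overbrace{5 \times 4}^{\times,\triangle} + \overbrace{5 \times 2}^{\times,+} + \overbrace{1 \times 4}^{+,\triangle} + \overbrace{1 \times 2}^{\triangle,+} \big)}_{\text{Communities 1 and 3}} \\
   &\quad + \underbrace{\big( \overbrace{6 \times 4}^{+,\triangle} + \overbrace{1 \times 2}^{\times,+} + \overbrace{1 \times 4}^{\times,\triangle} \big)}_{\text{Communities 2 and 3}} = 104.
\end{aligned} \tag{6.39}
\]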

Hence,

\[
P = \frac{32}{32 + 25} = 0.56, \tag{6.40}
\]

\[
R = \frac{32}{32 + 29} = 0.52. \tag{6.41}
\]

F-Measure

To consolidate precision and recall into one measure, we can use the harmonic mean of precision and recall:

\[
F = 2 \cdot \frac{P \cdot R}{P + R}. \tag{6.42}
\]


Computed for the same example, we get F = 0.54.
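The pair counts of Example 6.5 can be reproduced by enumerating all node pairs. In the sketch below, the community contents (five ×, one +, and one △ in community 1; six + and one × in community 2; four △ and two + in community 3) are inferred from the terms of the worked equations rather than read from the figure itself, and the function name `pair_counts` and the label strings are hypothetical.

```python
from itertools import combinations

def pair_counts(communities):
    """Count TP, FP, FN, TN over all node pairs.

    communities: list of lists; each inner list holds the true labels of
    the members of one detected community.
    """
    items = [(c, label) for c, labels in enumerate(communities) for label in labels]
    tp = fp = fn = tn = 0
    for (c1, l1), (c2, l2) in combinations(items, 2):
        same_community = c1 == c2
        same_label = l1 == l2
        if same_community and same_label:
            tp += 1
        elif same_community:
            fp += 1
        elif same_label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Composition inferred from Example 6.5: "x" = ×, "+" = +, "tri" = △.
found = [
    ["x"] * 5 + ["+"] + ["tri"],
    ["+"] * 6 + ["x"],
    ["tri"] * 4 + ["+"] * 2,
]

tp, fp, fn, tn = pair_counts(found)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn)
print(round(precision, 2), round(recall, 2), round(f_measure, 2))
```

Enumerating pairs is quadratic in the number of nodes, which is fine for an example of 20 nodes; the closed-form counting used in the worked example scales better.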

Purity

In purity, we assume that the majority of a community represents the community. Hence, we use the label of the majority of the community against the label of each member of the community to evaluate the algorithm. For instance, in Figure 6.15, the majority in Community 1 is ×; therefore, we assume majority label × for that community. The purity is then defined as the fraction of instances that have labels equal to their community’s majority label. Formally,

\[
\text{Purity} = \frac{1}{N} \sum_{i=1}^{k} \max_{j} |C_i \cap L_j|, \tag{6.43}
\]

where k is the number of communities, N is the total number of nodes, $L_j$ is the set of instances with label j in all communities, and $C_i$ is the set of members in community i. In the case of our example, purity is $\frac{5 + 6 + 4}{20} = 0.75$.
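Equation 6.43 translates into a few lines of code. As a hedged sketch, the community contents below are inferred from the worked example for Figure 6.15 (they are not read from the figure), and the function name `purity` and the label strings are hypothetical.

```python
from collections import Counter

def purity(communities):
    """Fraction of nodes whose label equals their community's majority label.

    communities: list of non-empty lists holding the true labels of the
    members of each detected community.
    """
    total = sum(len(labels) for labels in communities)
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in communities)
    return majority / total

# Composition inferred from the worked example: "x" = ×, "+" = +, "tri" = △.
found = [
    ["x"] * 5 + ["+"] + ["tri"],
    ["+"] * 6 + ["x"],
    ["tri"] * 4 + ["+"] * 2,
]
singletons = [[label] for community in found for label in community]

print(purity(found), purity(singletons))
```

The second call places every node in its own community and yields a purity of 1.0 regardless of the labels, so a high purity value alone says little about the quality of the detected communities.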

Normalized Mutual Information

Purity can be easily manipulated to generate high values; consider when nodes represent singleton communities (of size 1) or when we have very large pure communities (ground truth = majority label). In both cases, purity does not make sense because it generates high values.

A more precise measure to solve problems associated with purity is the normalized mutual information (NMI) measure, which originates in information theory. Mutual information (MI) describes the amount of information that two random variables share. In other words, by knowing one of the variables, MI measures the amount of uncertainty reduced regarding the other variable. Consider the case of two independent variables; in this case, the mutual information is zero because knowing one does not help in knowing more information about the other. Mutual information of two variables X and Y is denoted as I(X,Y). We can use mutual information to measure the information one clustering carries regarding the ground truth. It can be calculated using Equation 6.44, where L and H are labels and found communities; $n_h$ and $n_l$ are the number of data points in community h and with label l, respectively; $n_{h,l}$ is the number of nodes in community h and with label l; and n is the number of nodes.

\[
MI = I(X, Y) = \sum_{h \in H} \sum_{l \in L} \frac{n_{h,l}}{n} \log \frac{n \cdot n_{h,l}}{n_h n_l} \tag{6.44}
\]

Unfortunately, mutual information is unbounded; however, it is common for measures to have values in range [0,1]. To address this issue, we can normalize mutual information. We provide the following equation, without proof, which will help us normalize mutual information,

\[
MI \leq \min(H(L), H(H)), \tag{6.45}
\]

where H(·) is the entropy function,

\[
H(L) = -\sum_{l \in L} \frac{n_l}{n} \log \frac{n_l}{n}, \tag{6.46}
\]

\[
H(H) = -\sum_{h \in H} \frac{n_h}{n} \log \frac{n_h}{n}. \tag{6.47}
\]

From Equation 6.45, we have MI ≤ H(L) and MI ≤ H(H); therefore,

\[
(MI)^2 \leq H(H) H(L). \tag{6.48}
\]

Equivalently,

\[
MI \leq \sqrt{H(H)} \sqrt{H(L)}. \tag{6.49}
\]

Equation 6.49 can be used to normalize mutual information. Thus, we introduce the NMI as

\[
NMI = \frac{MI}{\sqrt{H(L)} \sqrt{H(H)}}. \tag{6.50}
\]

By plugging Equations 6.47, 6.46, and 6.44 into 6.50,

\[
NMI = \frac{\sum_{h \in H} \sum_{l \in L} n_{h,l} \log \frac{n \cdot n_{h,l}}{n_h n_l}}{\sqrt{\left( \sum_{h \in H} n_h \log \frac{n_h}{n} \right) \left( \sum_{l \in L} n_l \log \frac{n_l}{n} \right)}}. \tag{6.51}
\]

An NMI value close to one indicates high similarity between communities found and labels. A value close to zero indicates a long distance between them.
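The NMI computation follows directly from Equations 6.44, 6.46, 6.47, and 6.50. The sketch below uses two tiny hypothetical labelings chosen to hit the two extremes; the function name `nmi` is an assumption, and returning 0.0 when either entropy is zero is a convention adopted here to avoid division by zero, not something stated in the text.

```python
import math
from collections import Counter

def nmi(communities):
    """Normalized mutual information between found communities and labels.

    communities: list of non-empty lists holding the true labels of the
    members of each detected community.
    """
    n = sum(len(members) for members in communities)
    n_h = [len(members) for members in communities]
    n_l = Counter(label for members in communities for label in members)

    # Mutual information (Eq. 6.44); cells with n_hl = 0 contribute nothing.
    mi = 0.0
    for h, members in enumerate(communities):
        for label, n_hl in Counter(members).items():
            mi += n_hl / n * math.log(n * n_hl / (n_h[h] * n_l[label]))

    # Entropies of the community sizes and of the label sizes (Eqs. 6.47, 6.46).
    h_found = -sum(c / n * math.log(c / n) for c in n_h)
    h_label = -sum(c / n * math.log(c / n) for c in n_l.values())
    if h_found == 0.0 or h_label == 0.0:
        return 0.0  # assumed convention for the degenerate case
    return mi / math.sqrt(h_found * h_label)

perfect = [["a", "a"], ["b", "b"]]       # communities match labels exactly
unrelated = [["a", "b"], ["a", "b"]]     # communities say nothing about labels
print(nmi(perfect), nmi(unrelated))
```

The logarithm base cancels in the ratio, so natural logarithms are used without affecting the result.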


Figure 6.16: Tag Clouds for Two Communities.

6.3.2 Evaluation without Ground Truth

When no ground truth is available, we can incorporate techniques based on semantics or clustering quality measures to evaluate community detection algorithms.

Evaluation with Semantics

A simple way of analyzing detected communities is to analyze other attributes (posts, profile information, content generated, etc.) of community members to see if there is a coherency among community members. The coherency is often checked via human subjects. For example, the Amazon Mechanical Turk platform⁴ allows defining this task on its platform for human workers and hiring individuals from all around the globe to perform tasks such as community evaluation. To help analyze these communities, one can use word frequencies. By generating a list of frequent keywords for each community, human subjects determine whether these keywords represent a coherent topic. A more focused and single-topic set of keywords represents a coherent community. Tag clouds are one way of demonstrating these topics. Figure 6.16 depicts two coherent tag clouds for a community related to the U.S. Constitution and another for sports. Larger words in these tag clouds represent higher frequency of use.

Evaluation Using Clustering Quality Measures

When experts are not available, an alternative is to use clustering quality measures. This approach is commonly used when two or more community detection algorithms are available. Each algorithm is run on the target network, and the quality measure is computed for the identified communities. The algorithm that yields a more desirable quality measure value is considered a better algorithm. SSE (sum of squared errors) and inter-cluster distance are some of the quality measures. For other measures refer to Chapter 5.

⁴ http://www.mturk.com.

We can also follow this approach for evaluating a single community detection algorithm; however, we must ensure that the clustering quality measure used to evaluate community detection is different from the measure used to find communities. For instance, when using node similarity to group individuals, a measure other than node similarity should be used to evaluate the effectiveness of community detection.


6.4 Summary

In this chapter, we discussed community analysis in social media, answering three general questions: (1) how can we detect communities, (2) how do communities evolve and how can we study evolving communities, and (3) how can we evaluate detected communities? We started with a description of communities and how they are formed. Communities in social media are either explicit (emic) or implicit (etic). Community detection finds implicit communities in social media.

We reviewed member-based and group-based community detection algorithms. In member-based community detection, members can be grouped based on their degree, reachability, and similarity. For example, when using degrees, cliques are often considered as communities. Brute-force clique identification is used to identify cliques. In practice, due to the computational complexity of clique identification, cliques are either relaxed or used as seeds of communities. k-Plex is an example of relaxed cliques, and the clique percolation algorithm is an example of methods that use cliques as community seeds. When performing member-based community detection based on reachability, three frequently used subgraphs are the k-clique, k-club, and k-clan. Finally, in member-based community detection based on node similarity, methods such as Jaccard and Cosine similarity help compute node similarity. In group-based community detection, we described methods that find balanced, robust, modular, dense, or hierarchical communities. When finding balanced communities, one can employ spectral clustering. Spectral clustering provides a relaxed solution to the normalized cut and ratio cut in graphs. For finding robust communities, we search for subgraphs that are hard to disconnect. k-edge and k-vertex graphs are two examples of these robust subgraphs. To find modular communities, one can use modularity maximization and for dense communities, we discussed quasi-cliques. Finally, we provided hierarchical clustering as a solution to finding hierarchical communities, with the Girvan-Newman algorithm as an example.

In community evolution, we discussed how networks and, on a lower level, communities evolve. We also discussed how communities can be detected in evolving networks using evolutionary clustering. Finally, we presented how communities are evaluated when ground truth exists and when it does not.


6.5 Bibliographic Notes

A general survey of community detection in social media can be found in [91] and a review of heterogeneous community detection in [278]. In related fields, [35, 305, 136] provide surveys of clustering algorithms and [294] provides a sociological perspective. Comparative analysis of community detection algorithms can be found in [162] and [168]. The description of explicit communities in this chapter is due to Kadushin [140].

For member-based algorithms based on node degree, refer to [158], which provides a systematic approach to finding clique-based communities with pruning. In algorithms based on node reachability, one can find communities by finding connected components in the network. For more information on finding connected components of a graph refer to [130]. In node similarity, we discussed structural equivalence, similarity measures, and regular equivalence. More information on structural equivalence can be found in [178, 166], on Jaccard similarity in [133], and on regular equivalence in [264].

In group-based methods that find balanced communities, we are often interested in solving the max-flow min-cut problem. Linear programming and Ford-Fulkerson [61], Edmonds-Karp [80], and Push-Relabel [103] methods are some established techniques for solving the max-flow min-cut problem. We discussed quasi-cliques that help find dense communities. Finding the maximum quasi-clique is discussed in [231]. A well-known greedy algorithm for finding quasi-cliques is introduced by [2]. In their approach a local search with a pruning strategy is performed on the graph to enhance the speed of quasi-clique detection. They define a peel strategy, in which vertices that have some degree k along with their incident edges are recursively removed. There are a variety of algorithms to find dense subgraphs, such as the one discussed in [99] where the authors propose an algorithm that recursively fingerprints the graph (shingling algorithm) and creates dense subgraphs. In group-based methods that find hierarchical communities, we described hierarchical clustering. Hierarchical clustering algorithms are usually variants of single link, average link, or complete link algorithms [135]. In hierarchical clustering, COBWEB [88] and CHAMELEON [142] are two well-known algorithms.

In group-based community detection, latent space models [121, 129] are also very popular, but are not discussed in this chapter. In addition to the topics discussed in this chapter, community detection can also be performed for networks with multiple types of interaction (edges) [279, 280]. We also restricted our discussion to community detection algorithms that use graph information. One can also perform community detection based on the content that individuals share on social media. For instance, using tagging relations (i.e., individuals who shared the same tag) [292], instead of connections between users, one can discover overlapping communities, which provides a natural summarization of the interests of the identified communities.

In network evolution analysis, network segmentation is discussed in [157]. Segment-based clustering [269] is another method not covered in this chapter.

NMI was first introduced in [267] and in terms of clustering quality measures, the Davies-Bouldin [67] measure, Rand index [236], C-index [76], Silhouette index [241], and Goodman-Kruskal index [106] can be used.


6.6 Exercises

1. Provide an example to illustrate how community detection can be subjective.

Community Detection

2. Given a complete graph $K_n$, how many nodes will the clique percolation method generate for the clique graph for value k? How many edges will it generate?

3. Find all k-cliques, k-clubs, and k-clans in a complete graph of size 4.

4. For a complete graph of size n, is it m-connected? What possible values can m take?

5. Why is the smallest eigenvector meaningless when using an unnormalized Laplacian matrix?

6. Modularity can be defined as

\[
Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \delta(c_i, c_j), \tag{6.52}
\]

where $c_i$ and $c_j$ are the communities for $v_i$ and $v_j$, respectively. $\delta(c_i, c_j)$ (Kronecker delta) is 1 when $v_i$ and $v_j$ both belong to the same community ($c_i = c_j$), and 0 otherwise.

• What is the range [α1, α2] for Q values? Provide examples for both extreme values of the range and cases where modularity becomes zero.

• What are the limitations for modularity? Provide an example where modularity maximization does not seem reasonable.

• Find three communities in Figure 6.8 by performing modularity maximization.

7. For Figure 6.8:

• Compute Jaccard and Cosine similarity between nodes $v_4$ and $v_8$, assuming that the neighborhood of a node excludes the node itself.

• Compute Jaccard and Cosine similarity when the node is included in the neighborhood.

Community Evolution

8. What is the upper bound on densification factor α? Explain.

Community Evaluation

9. Normalized mutual information (NMI) is used to evaluate community detection results when the actual communities (labels) are known beforehand.

• What are the maximum and minimum values for the NMI? Provide details.

• Explain how NMI works (describe the intuition behind it).

10. Compute NMI for Figure 6.15.

11. Why is high precision not enough? Provide an example to show that both precision and recall are important.

12. Discuss situations where purity does not make sense.

13. Compute the following for Figure 6.17:

• precision and recall

• F-measure

• NMI

• purity

Figure 6.17: Community Evaluation Example.
