Clustering And Community Formation By: Pinakpani Shah


Page 1: Clustering And Community Formation

Clustering And Community Formation

By: Pinakpani Shah

Page 2: Clustering And Community Formation

Many systems can be described as networks, in which the units the system is made of are the nodes and the connections between them are the links.

What is a cluster or community?

• A large group of nodes that are more densely connected to each other than to the rest of the network is called a cluster or community.

• Small complete subgraphs are used as motifs, and their distribution and clustering properties are used to identify such communities.

• An essential feature of a community is that each node should be reachable through a subset of well-connected nodes.

Page 3: Clustering And Community Formation

Uncovering the Overlapping community structure of complex networks in nature and society

Page 4: Clustering And Community Formation

Communities can be formed by friends, relatives, people sharing a hobby or a game, etc.

This social structure is built by picking a random individual and creating a network of his or her friends.

This network is called a random network.

In this kind of network, a single node can be part of multiple communities.

To identify such communities, one of the most widely used methods is to divide the network into small groups.

Page 5: Clustering And Community Formation

Such a method forces each node to be part of only one community.

Overlap, however, is a crucial feature of real communities.

To overcome this problem, another method, called clique percolation, is used.

Erdős-Rényi uncorrelated random graphs are used as a prototype.

In this graph the giant component appears at p = pc = 1/N.

Page 6: Clustering And Community Formation

p = probability that two vertices are connected to each other.

N = number of nodes in the network.

pc = percolation threshold (critical point).

A k-clique is a complete subgraph of k vertices.

k-cliques are used to find overlapping communities.

The basic requirements for a definition of overlapping communities are that it should allow overlaps, should not be too restrictive, should be based on the density of links, and should not yield any cut nodes or cut edges.

Page 7: Clustering And Community Formation

Clique percolation in Random Networks

Page 8: Clustering And Community Formation

Both graphs contain a giant connected component, because the edge probability is much larger than the edge percolation threshold, pc = 1/N = 0.05.

In the left graph p is 0.13, which is less than the 3-clique percolation threshold (about 0.16, from $p_c(k) = [(k-1)N]^{-1/(k-1)}$ with k = 3), while in the right graph p is above the threshold.
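The two threshold values quoted on this slide can be checked numerically. The sketch below evaluates the k-clique percolation threshold of the Erdős-Rényi graph, p_c(k) = [(k-1)N]^(-1/(k-1)). The graph size N = 20 is an assumption inferred from 1/N = 0.05; the function name is illustrative.

```python
def clique_percolation_threshold(k: int, n: int) -> float:
    """Critical edge probability p_c(k) = [(k - 1) * n] ** (-1 / (k - 1))."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    return ((k - 1) * n) ** (-1.0 / (k - 1))


N = 20  # assumed graph size, inferred from 1/N = 0.05 on the slide

for k in (2, 3):
    # k = 2 is ordinary edge percolation, k = 3 is triangle percolation.
    print(k, round(clique_percolation_threshold(k, N), 2))
```

For k = 2 the formula reduces to 1/N, so the edge threshold and the 3-clique threshold come from the same expression.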

Page 9: Clustering And Community Formation

Two k-cliques are adjacent if they share k-1 nodes.

A k-clique chain is a subgraph that is the union of a sequence of adjacent k-cliques.

Two k-cliques are k-clique-connected if they are parts of the same k-clique chain.

The union of all k-cliques that are k-clique-connected to a particular k-clique is called a k-clique percolation cluster.

The graph shown above is a 3-clique percolation cluster.
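The definitions on this slide translate fairly directly into code. The following is a minimal sketch, not the implementation used by any particular tool: it enumerates k-cliques by brute force (only practical for small graphs), joins cliques that share k-1 nodes with a union-find structure, and returns the union of each connected group. The function name and the toy edge list are illustrative.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return the k-clique percolation clusters as a list of frozensets."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Brute-force enumeration of all complete subgraphs on k vertices.
    cliques = [
        frozenset(c)
        for c in combinations(list(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: adjacent cliques end up in the same group.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacency: share k-1 nodes
            parent[find(i)] = find(j)

    # A percolation cluster is the union of the nodes of one group of cliques.
    clusters = {}
    for i, clique in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(clique)
    return [frozenset(nodes) for nodes in clusters.values()]


# Toy graph: triangles {1,2,3} and {2,3,4} are adjacent; {4,5,6} shares
# only node 4 with them, so it forms a second, overlapping community.
toy_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
communities = k_clique_communities(toy_edges, 3)
```

In the toy graph node 4 belongs to both communities, which is the kind of overlap that partition-based methods cannot represent.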

Page 10: Clustering And Community Formation

A k-clique percolation cluster is equivalent to an edge percolation cluster in the k-clique adjacency graph.

In that graph the k-cliques are represented as nodes, and there is an edge between two of them if they are adjacent.

The community structure depends on the value of k: as k increases, the communities become smaller and more disintegrated.

A k-clique template is an object isomorphic to a complete graph of k vertices; it can be placed onto any k-clique of the graph.

Moving one particle of the template from one vertex to another along an edge is called rolling the k-clique template.

Page 11: Clustering And Community Formation

A k-clique template can be placed onto any k-clique and rolled to an adjacent k-clique by relocating one of its particles while keeping the other k-1 particles fixed.

The k-clique percolation clusters of a graph are all the subgraphs that can be fully explored by rolling a k-clique template.

A k-clique percolation cluster can be considered a community.

Different values of k give communities of different strength.

Page 12: Clustering And Community Formation

Properties of the community structure are:

◦ Every member can be reached through subsets of well-connected nodes.

◦ Communities can share nodes with each other, i.e. they overlap.

Page 13: Clustering And Community Formation

Size and age are two basic quantities that characterize a dynamically changing community.

The size s and the age t of a community are correlated.

To quantify the relative overlap of two states of the same community, the auto-correlation function C(t) is used:

$$C(t) = \frac{|A(t_0) \cap A(t_0 + t)|}{|A(t_0) \cup A(t_0 + t)|}$$

where A(t) is the set of members of the community at time t.

The intersection gives the number of nodes common to the two time steps.

The union gives the total number of nodes present in either of the two time steps.
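The ratio of intersection to union described on this slide can be sketched in a few lines. The function name and the sample member sets are illustrative; returning 0.0 when both states are empty is a convention chosen here, not something stated in the source.

```python
def autocorrelation(members_t0, members_t):
    """Relative overlap of two states of one community (intersection / union)."""
    a, b = set(members_t0), set(members_t)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty states
    return len(a & b) / len(union)


# Two snapshots of the same community: two members stay, two leave, one joins.
state_early = {"ann", "bob", "cat", "dan"}
state_late = {"cat", "dan", "eve"}
overlap = autocorrelation(state_early, state_late)
```

Here the two snapshots share 2 members out of 5 distinct members overall, so the overlap is 2/5.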

Page 14: Clustering And Community Formation

The stationarity of a community is calculated as the average correlation between its subsequent states.

t0 denotes the birth and tmax the extinction of the community.
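As a rough illustration of "average correlation between states", the sketch below takes a list of member sets, one per time step from birth to extinction, and averages the overlap of each pair of consecutive states. It uses a plain mean over the consecutive pairs, which is an assumption: the exact normalization of the original equation is not reproduced in the slide text and may differ. All names are illustrative.

```python
def overlap(a, b):
    """Intersection over union of two member sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stationarity(states):
    """Mean overlap between consecutive states of one community.

    `states` is the community's member set at each time step, ordered
    from its first to its last recorded step.
    """
    if len(states) < 2:
        raise ValueError("need at least two states")
    pairs = zip(states, states[1:])
    values = [overlap(earlier, later) for earlier, later in pairs]
    return sum(values) / len(values)


# Three snapshots: one member joins, then one member leaves.
history = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}]
zeta = stationarity(history)
```

A value close to 1 means the membership barely changes from step to step; a value close to 0 means the community is almost completely replaced at every step.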

Page 15: Clustering And Community Formation

Nowadays the web has become a major means of access to information.

Content on the web is difficult to analyze because it is decentralized and unorganized.

Identifying web communities is necessary for applications such as focused search engines, content filtering, and text-based searching.

In web communities, web pages are treated as nodes and the hyperlinks on the pages as the edges between the nodes.

A web community is a collection of web pages such that each member page has more hyperlinks within the community than outside of it.
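The definition above can be tested directly on a candidate set of pages. This sketch assumes that a hyperlink counts for both the page it leaves and the page it points to; the source does not say how direction is handled. The function name and the page names are illustrative.

```python
def is_web_community(links, candidate):
    """True if every page in `candidate` has more links inside than outside.

    `links` is an iterable of (source_page, target_page) hyperlinks.
    Assumption: each link counts for both of its endpoints.
    """
    candidate = set(candidate)
    inside = {page: 0 for page in candidate}
    outside = {page: 0 for page in candidate}
    for src, dst in links:
        if src == dst:
            continue  # ignore self-links
        for page, other in ((src, dst), (dst, src)):
            if page in candidate:
                if other in candidate:
                    inside[page] += 1
                else:
                    outside[page] += 1
    return all(inside[page] > outside[page] for page in candidate)


links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "x"), ("x", "y")]
tight = is_web_community(links, {"a", "b", "c"})   # triangle with one stray link
loose = is_web_community(links, {"c", "x"})        # c links mostly elsewhere
```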

Page 16: Clustering And Community Formation

These communities are collectively self-organized by independent authors.

Community identification is related to the maximum flow problem: given a graph whose edges have capacities, find the maximum flow that can be routed from a source vertex to a sink vertex.

Seed vertices act as the source vertices.

Page 17: Clustering And Community Formation

Self Organization and Identification of Web Communities

Page 18: Clustering And Community Formation

Approximate-Flow-Community

◦ Takes the seed vertices as input and crawls to a fixed depth, following both inbound and outbound hyperlinks.

◦ Applies the Exact-Flow-Community method, ranks the sites, and adds the highest-ranked non-seed members to the seed set.

◦ Initially only a small community may be identified, but as new seeds are added, larger communities can be found.
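The loop described in these three steps might look like the sketch below. It is not the published procedure: the crawler and the exact-flow step are passed in as callables, the ranking criterion (number of crawled links a page has inside the community) and the number of pages promoted per round are assumptions, and the toy stand-ins exist only so the example runs.

```python
def approximate_flow_community(seeds, crawl, exact_flow_community,
                               rounds=3, add_per_round=1):
    """Iteratively grow a community from seed pages (sketch, see assumptions)."""
    seeds = set(seeds)
    community = set(seeds)
    for _ in range(rounds):
        edges = list(crawl(seeds))  # links found around the current seeds
        community = set(exact_flow_community(edges, seeds))
        # Rank non-seed members by how many crawled links they have inside
        # the community (assumed ranking criterion).
        score = {}
        for u, v in edges:
            if u in community and v in community:
                for page in (u, v):
                    if page not in seeds:
                        score[page] = score.get(page, 0) + 1
        ranked = sorted(score, key=lambda page: (-score[page], str(page)))
        if not ranked:
            break  # nothing new to promote
        seeds.update(ranked[:add_per_round])
    return community


# Toy stand-ins so the sketch runs without a real crawler or flow solver.
WEB = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def toy_crawl(seeds):
    # Depth-1 "crawl": every link that touches a seed, inbound or outbound.
    return [(u, v) for u, v in WEB if u in seeds or v in seeds]


def toy_exact(edges, seeds):
    # Placeholder for Exact-Flow-Community: seeds plus their direct neighbours.
    found = set(seeds)
    for u, v in edges:
        if u in seeds or v in seeds:
            found.update((u, v))
    return found


small = approximate_flow_community({"s"}, toy_crawl, toy_exact, rounds=1)
large = approximate_flow_community({"s"}, toy_crawl, toy_exact, rounds=3)
```

With one round only the pages next to the seed are found; with more rounds the promoted seeds widen the crawl and the community grows.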

Page 19: Clustering And Community Formation

Self Organization and Identification of Web Communities

Page 20: Clustering And Community Formation

Exact-Flow-Community

◦ An artificial source s is added and connected with infinite-capacity edges to all vertices in the seed set S.

◦ All existing edges are made bidirectional, with capacity set to a constant k.

◦ All vertices except the source, the sink, and the seeds are routed to an artificial sink vertex with unit capacity.
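The construction in these three steps can be sketched with a small breadth-first augmenting-path maximum flow (Edmonds-Karp). The community is taken to be every vertex still reachable from the source once no augmenting path remains, i.e. the source side of a minimum cut. Using k = |S| as the default capacity is an assumption, as are the function name and the toy graph.

```python
from collections import defaultdict, deque

INF = float("inf")


def exact_flow_community(edges, seeds, k=None):
    """Source side of a minimum cut in the flow network described above."""
    seeds = set(seeds)
    if k is None:
        k = len(seeds)  # assumed default for the constant capacity k
    source, sink = object(), object()  # artificial vertices, cannot clash
    cap = defaultdict(dict)  # residual capacities: cap[u][v]

    nodes = set()
    for u, v in {frozenset(e) for e in edges if e[0] != e[1]}:
        nodes.update((u, v))
        cap[u][v] = k  # every existing edge becomes bidirectional
        cap[v][u] = k
    for s in seeds:
        cap[source][s] = INF  # infinite capacity from source to seeds
        cap[s][source] = 0
    for v in nodes - seeds:
        cap[v][sink] = 1  # unit capacity from non-seeds to the sink
        cap[sink][v] = 0

    def reachable():
        # BFS in the residual graph; stops early once the sink is found.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v is sink:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            break  # no augmenting path left: parent holds the source side
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
    return set(parent) - {source}


# Toy graph: a 4-clique {a, b, c, d} attached to a sparse tail through d-e.
toy = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
       ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g"), ("g", "h")]
found = exact_flow_community(toy, {"a", "b"}, k=2)
```

In the toy graph the cheapest cut severs the single link d-e, so the seeds a and b pull in c and d but not the tail.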

Page 21: Clustering And Community Formation

In biology, proteins form their own networks.

Cellular tasks are not performed by individual proteins but by groups of functionally associated proteins.

These modules are densely connected and form an overlapping network.

CFinder is a stand-alone application used to view overlapping modules in gene networks.

The input to this application is a file in which each line contains two columns of strings (the two linked nodes) and a third column with the weight of the link.
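To make the three-column input concrete, here is a small reader for such lines. It is only a sketch of the layout described above: whitespace as the column separator and skipping of blank lines are assumptions, and the function name and sample node names are hypothetical rather than taken from CFinder.

```python
def parse_links(lines):
    """Parse lines of the form 'node_a node_b weight' into tuples."""
    links = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # assumed separator: any whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        node_a, node_b, weight = fields
        links.append((node_a, node_b, float(weight)))
    return links


sample = [
    "proteinA proteinB 0.5",
    "",
    "proteinB proteinC 2",
]
parsed = parse_links(sample)
```

The same function works on an open file object, since iterating over a file yields its lines.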

Page 22: Clustering And Community Formation

References:

◦ CFinder: locating cliques and overlapping modules in biological networks.

◦ Clique percolation in random networks.

◦ Uncovering the overlapping community structure of complex networks in nature and society.

◦ Quantifying social group evolution.

◦ The critical point of k-clique percolation.