Network Visualization: Gephi and Cytoscape
Caf'E.phe, February 2016
Pablo Ruiz Fabo — LATTICE
Network visualization
• Requires relational data
[ http://cvcedhlab.hypotheses.org/125 ]
Network analysis
• Some terminology:
[ http://cvcedhlab.hypotheses.org/106 ]
Network analysis
• Network: composed of nodes, linked by edges
• Nodes represent actors in our domain
– People, characters, concepts, places, …
• Edges encode the relation between the nodes
– Interacting with someone, citing someone’s work, occurring in the same paragraph, …
• Edges can be weighted: the weight encodes the importance of the link
– E.g. How many times did this link occur?
• Edges can bear direction or not:
– [Being a correspondent] vs. [being the sender vs. being the addressee of a letter]
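As a minimal sketch of the terminology above, a network can be held in plain Python structures. The letter data and all names are hypothetical, chosen to mirror the correspondent vs. sender/addressee example.

```python
from collections import defaultdict

# Hypothetical letters: (sender, addressee, number_of_letters).
letters = [
    ("Ann", "Bob", 3),
    ("Bob", "Ann", 1),
    ("Ann", "Cid", 2),
]

# Nodes are the actors that appear at either end of an edge.
nodes = sorted({n for s, t, _ in letters for n in (s, t)})

# Directed, weighted edges: sender -> addressee, weight = letters sent.
directed = {(s, t): w for s, t, w in letters}

# Undirected, weighted edges: "being a correspondent".
# Letters in both directions are added to the same edge.
undirected = defaultdict(int)
for s, t, w in letters:
    undirected[frozenset((s, t))] += w

print(nodes)
print(directed[("Ann", "Bob")], undirected[frozenset(("Ann", "Bob"))])
```

The directed view keeps Ann-to-Bob and Bob-to-Ann apart, while the undirected view merges them into one weighted link.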
Objectives
• Create a co-occurrence network visualization with Gephi and Cytoscape, for two corpora:
– History corpus on the American crisis of 2008
  • A CSV file representing the network’s edges was used
– Philosophy corpus: Jeremy Bentham’s manuscripts
  • For Gephi, a GEXF file representing the network was used
  • For Cytoscape, a Graphml file representing the network was used (it can also be used for Gephi)
• Export a navigable network so that it can be visualized outside these tools
[ Some example files to import or create networks with, and example exported networks are available at apps.lattice.cnrs.fr/nav/cafephe11 ]
2008 Crisis Corpus: PoliInformatics
Smith et al. (2014) [12]
Bentham Corpus
Transcribe Bentham (Causer & Terras, 2014) [13]
• UCL (London)
• Unpublished manuscripts transcribed by volunteers (crowdsourcing)
• 30,000 pages
Jeremy Bentham: Philosopher, social reformer (1748-1832, London)
Image: blogs.ucl.ac.uk/transcribe-bentham/
Gephi version
• This presentation covers Gephi 0.9, which came out in December 2015, and which works with Java 8 or 7
• Most training materials on Gephi are about version 0.8.2 (worked with Java 7, NOT 8)
• Small UI changes between 0.8.2 and 0.9
Cytoscape version
• Cytoscape 3.3.0, works with Java 8, NOT 7
Import Edges table (1)
• Start Gephi and go to Data Laboratory. You may need to close the Projects popup. Do File / New Project
• Click on Import Spreadsheet and search in the materials for a file whose name ends with “edges.csv”. Import it as an Edges table
Import Edges table (2)
1. Import Edge Table: Weight and Create missing nodes must be checked in the dialogue
2. Once the table is imported, create labels by copying ID with the “Copy data to another column” tab in the bottom row
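For readers who want to build their own edges file, here is a minimal sketch in Python. It assumes co-occurrence is counted per paragraph and that the file uses a Source,Target,Weight header; the entity lists and the output file name are hypothetical, not taken from the workshop materials.

```python
import csv
from collections import Counter
from itertools import combinations

# Hypothetical input: the entities found in each paragraph of a corpus.
paragraphs = [
    ["Bentham", "Mill", "Panopticon"],
    ["Bentham", "Panopticon"],
    ["Mill", "Bentham"],
]

def cooccurrence_edges(units):
    """Count how often each unordered pair of entities shares a unit."""
    counts = Counter()
    for entities in units:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts

def write_edges_csv(counts, path):
    """Write one row per edge under a Source,Target,Weight header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (a, b), weight in sorted(counts.items()):
            writer.writerow([a, b, weight])

edges = cooccurrence_edges(paragraphs)
write_edges_csv(edges, "example_edges.csv")
```

Since the file lists only edges, the nodes exist only implicitly, which is why Create missing nodes has to be checked on import.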
Initial Network
• Click on the Overview tab to see the initial network, which is not yet spatialized (no layout has been applied).
Saving and exporting a project
• It is advisable to both save and export a project
To save a project, just click on Save, as would be expected. It will be saved as a project file with the .gephi extension (it’s a sort of zip file)
Additionally, export the network as a graph file for safety
Network Layout (1)
• Run the Force Atlas layout, with these settings:
1. Choose the Layout
2. Specify Settings
In force-based layouts (like Force Atlas or Force Atlas 2), linked nodes attract each other and unrelated nodes are placed further apart. See [3] and [8].
– One setting determines how far apart nodes will be, thus affecting the readability of the network (how wide it will spread)
– Another setting helps avoid label overlap (but there are other means for this too)
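To make the idea of a force-based layout concrete, here is a toy sketch in Python. It is not Gephi's Force Atlas implementation: the force formulas, the parameter names (repulsion, attraction, step) and the example graph are assumptions chosen for illustration. The repulsion parameter plays a role similar to the setting that determines how far apart nodes will be.

```python
import math
import random

def force_layout(nodes, edges, iterations=1000, repulsion=1.0,
                 attraction=0.1, step=0.1, seed=0):
    """Toy force-directed layout: all pairs repel, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, magnitude repulsion / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = repulsion / dist
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Attraction along each edge, magnitude attraction * distance.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node along its net force, capping the move per iteration.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                scale = step * min(mag, 1.0) / mag
                pos[n][0] += fx * scale
                pos[n][1] += fy * scale
    return pos

# Hypothetical example: two triangles joined by a single bridge edge.
nodes = list("abcdef")
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
pos = force_layout(nodes, edges)
```

Raising repulsion spreads the whole network out, while raising attraction pulls linked nodes closer together.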
Network Layout (2)
• Once the network stabilizes, you can stop Force Atlas.
• The initial layout will look similar to the one in the screenshot
• The zoom slider can be used to see more or less of the network
[Screenshot callouts: Zoom slider; button to toggle the bottom pane]
Node and Edge Appearance
• In Gephi 0.9, unlike in 0.8.2, there are two modes for node and edge appearance, Unique and Attribute-based
• Colour
• Size
• Label colour
• Label size
Attributes correspond to properties of nodes and edges, reflecting their role in the network as per different metrics
Node Size
• Different types of metrics can be encoded in the node size. Here, we use a node’s Degree (how many nodes it is connected to)
In the Appearance tab, choose the Nodes and Attribute buttons, and then:
- Degree in the dropdown menu
- The CIRCLES icon for node size in the button bar
Then hit Apply
After applying the ranking, node size will reflect the ranking criterion. In this case, nodes with more connections will be bigger
For information on other ranking criteria, see [4]
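What a degree-based size ranking amounts to can be sketched as follows. The linear scaling and the size bounds (10 to 50) are assumptions made for illustration, not Gephi's documented behaviour.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node: the number of edges attached to it."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def size_by_degree(edges, min_size=10.0, max_size=50.0):
    """Scale node sizes linearly between min_size and max_size by degree."""
    deg = degrees(edges)
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees are equal
    return {n: min_size + (d - lo) / span * (max_size - min_size)
            for n, d in deg.items()}

# Hypothetical example network.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
sizes = size_by_degree(edges)
print(sizes)
```

Node a has three connections and gets the maximum size, while d, with a single connection, gets the minimum.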
Node Labels (1)
• Other Node Label settings can be accessed from the bottom panel, which can be toggled with the button marked on the screenshot
• If at any point node labels overlap, this can be fixed by running the Label Adjust layout
Node Labels (2)
• Label Sizes are defined with the leftmost button
- In scaled mode, all labels have the same size, scaled for readability
- In fixed mode, all labels have the size specified in the font dropdown (Dialog bold 32 in the example)
- In node size mode, label size matches node size
- Run Label Adjust Layout in case of label overlap
• Label Colour is defined with the rightmost button
Community Detection: Modularity
• The modularity tool can be run to detect communities, i.e. groups of nodes that are more strongly connected among themselves than they are to other groups of nodes [9].
1. In the Statistics pane on the right, look for Modularity and hit Run
2. Go to the Partition tab on the left, select Modularity Class from the dropdown, and hit Apply. The colors can be changed by clicking and right-clicking inside the colored square, or with the Palette button
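The following sketch computes the modularity score of a given partition for an unweighted, undirected network. It only evaluates a partition; it does not implement the community detection method of [9], which searches for partitions with a high score. The example graph and the function name are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected network.

    For each community c: (share of edges inside c) minus
    (share of edge ends attached to c, squared).
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # edge ends attached to the community
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            internal[community[a]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical example: two triangles joined by one bridge edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the network into its two triangles scores higher than putting every node in a single group, which scores zero.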
Preview Pane
Preview after applying a node size criterion and community detection. Settings are default, unless specified on the screenshot.
- Show Labels was activated
- Edge Thickness was reduced to 0.2 to avoid overly thick edges on highly connected nodes
- Hit Refresh after any changes to the Settings or to reset an unreadable preview pane
Filters
• The network can be filtered according to many criteria (see [6]). Here, we filter out nodes that have fewer than six connections, to get rid of generally less relevant nodes and edges
- Expand the Topology dropdown
- Double click on Degree Range
- Move the slider at the bottom up to the desired minimum degree
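A degree filter of this kind can be sketched in Python as follows. This is an illustrative approximation, not Gephi's code; the function name and example data are hypothetical. For the setting described above, min_degree would be 6.

```python
from collections import Counter

def filter_by_min_degree(edges, min_degree):
    """Keep nodes whose degree in the full network is at least min_degree,
    together with the edges that link two kept nodes."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    keep = {n for n, d in deg.items() if d >= min_degree}
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, kept_edges

# Hypothetical example network, filtered with a small threshold.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
kept_nodes, kept_edges = filter_by_min_degree(edges, min_degree=2)
print(sorted(kept_nodes), kept_edges)
```

In this sketch degrees are computed once on the full network, so a kept node may end up with fewer visible connections than the threshold after its neighbours are removed.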
Exporting visualization as PDF or image
• In the Preview pane, there’s a button to export the visualization (bottom left)
Export visualization as an interactive website: sigma.js exporter (1)
• Gephi has several plugins that allow exporting the network in an interactive website format.
– The website allows zooming in and out
– In some cases, the user can selectively focus a part of the network and run searches for nodes
• We’ll be using the sigma.js exporter plugin [10], which has all of the functions above. Depending on your browser, it may need to be run inside a web server (Apache, XAMPP, Wamp, EasyPHP etc.)
• Other plugins allowing some of the above functions:
– Seadragon plugin
– Google Maps Exporter
Network as a website (2): sigma.js
• We need to do three things:
– Install the sigma.js exporter plugin
– Export the network as a sigma.js site
– Make the site available from a web server
• To install the plugin:
– Go to Tools/Plugins, and select Sigma Exporter in the Available Plugins tab (once installed, it will move to the Installed tab)
Network as website (3): Exporting
1. Export the network from File/Export and Sigma.js template
2. Fill in the dialogue: give the path to the folder to export the site to, and the legend to be displayed for the site’s data
Network as website (4): Web Server
• We need to take the exported site from the previous step and put it in a web server. Note: some browsers (e.g. Firefox) allow seeing the networks just by opening the index.html file, with no need for a local web server
• If you don’t have a web server installed, a possibility is to install XAMPP https://www.apachefriends.org
– Windows: • https://blog.udemy.com/xampp-tutorial/
• https://www.apachefriends.org/faq_windows.html
– Linux: https://www.apachefriends.org/faq_linux.html
– Mac: https://www.apachefriends.org/faq_osx.html
• Once you have the web server, to see the network, point a browser to http://localhost/XXX , where XXX corresponds to the name of your sigma.js network (by default the name is network when Gephi exports it).
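If installing XAMPP is not an option, a lightweight alternative (not the setup described above) is Python's built-in http.server module. The sketch below serves an exported folder locally; the function name, the port and the folder name are assumptions.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_site_server(site_dir, port=8000):
    """Create a local HTTP server that serves the files found in site_dir."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=site_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)

# Usage (blocks until interrupted with Ctrl+C), assuming the site was
# exported to a folder called "network" in the current directory:
#     make_site_server("network").serve_forever()
```

With this sketch the site is reached at http://localhost:8000/ rather than http://localhost/XXX, because the exported folder itself is the server root.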
Network as website (5): Config
If edges on the exported network are too thin and node labels are not visible:
Look for config.json inside the folder where the sigma.js site was exported (network by default)
- Increase minEdgeSize and maxEdgeSize for thicker edges
- Decrease labelThreshold to see more labels
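The same change can be scripted. The sketch below makes no assumption about where the three keys sit inside config.json: it searches the whole file for them by name. The function names and the values passed in the usage comment are hypothetical.

```python
import json

def set_key_everywhere(obj, key, value):
    """Set key to value wherever it occurs in nested dicts and lists.
    Returns the number of occurrences that were changed."""
    hits = 0
    if isinstance(obj, dict):
        for k in obj:
            if k == key:
                obj[k] = value
                hits += 1
            else:
                hits += set_key_everywhere(obj[k], key, value)
    elif isinstance(obj, list):
        for item in obj:
            hits += set_key_everywhere(item, key, value)
    return hits

def tune_config(path, min_edge, max_edge, label_threshold):
    """Rewrite config.json with thicker edges and a lower label threshold."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    set_key_everywhere(config, "minEdgeSize", min_edge)
    set_key_everywhere(config, "maxEdgeSize", max_edge)
    set_key_everywhere(config, "labelThreshold", label_threshold)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)

# Usage, with illustrative values only:
#     tune_config("network/config.json", 1, 3, 5)
```

Keeping a copy of the original config.json before running such a script makes it easy to revert.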
Import the network or edges file
The example involves the graphml network for the Bentham corpus. Other graphml networks are available in the materials and can be manipulated similarly. An edges file (CSV) can also be imported the same way (but click ‘advanced’).
Layout
The AllegroLayout app (force-based) was used; install it with Apps / App Manager. Default options were chosen, with the “Spring-electric” option. If you need to modify the layout options, read about their intended effect in the tooltips
Layout (another example)
If you need a clearer layout, the Scale option will spread the network. If the edges have a weight attribute, it can be used from the Edge Weighting tab. The following example follows a graphml import of the American crisis corpus, and the scale was modified. (The screenshot also reflects later modifications to the network appearance, see the following slides).
Node attributes from node table
• We imported a ready network in Graphml; we can read the attributes off the import:
Attributes
Similar to Gephi’s Unique / Attribute buttons:
First column (Def.) defines a unique value
Second column (Map.) defines values based on an attribute
Final column (Byp.) allows defining exceptions
In the example:
- Fill color (default is a blue hue) reflects communities (based on column cluster_universal_index in the node table)
- Size (default 35) is based on the size column of the imported nodes
Attribute value “mapping”
• Discrete: a discrete set of categories
• Continuous: continuous values, the minimum and maximum can be set.
• Passthrough: values read off the import file directly
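The three mapping types can be illustrated with a small Python sketch. This is a conceptual illustration, not Cytoscape's API: the function names, the colours and the numeric ranges are hypothetical, while cluster_universal_index and size are the column names used in the example network.

```python
def discrete_mapping(value, table, default):
    """Discrete: look the category up in a table, else use the default."""
    return table.get(value, default)

def continuous_mapping(value, lo, hi, out_lo, out_hi):
    """Continuous: interpolate linearly between a set minimum and maximum,
    clamping values that fall outside the input range."""
    if hi == lo:
        return out_lo
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)

def passthrough_mapping(value):
    """Passthrough: use the value from the import file as it is."""
    return value

# Hypothetical node row, as it might appear in the node table.
node = {"label": "Bentham", "cluster_universal_index": 2, "size": 30}

fill = discrete_mapping(node["cluster_universal_index"],
                        {1: "#1f77b4", 2: "#ff7f0e"}, "#89d0f5")
size = continuous_mapping(node["size"], 0, 60, 10, 90)
label = passthrough_mapping(node["label"])
print(fill, size, label)
```

A discrete mapping suits community ids, a continuous one suits numeric attributes such as size, and a passthrough one suits labels.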
Node color for communities
Node color is set according to the community id of the node in the imported node table (in this case the id was called cluster_universal_index, but other names may appear). Note: the original network was created with Cortext Manager (manager.cortext.net), and communities were created with the Louvain method [9]
Other appearance options (1)
After adding node color for communities, the node label was read off the label field of the nodes in the imported graphml network (otherwise the label would be the node’s numeric id). Label Font Size was set to 90
Other appearance options (2)
This uses the character co-occurrence example in Les Misérables provided in [2b]. Node size was made dependent on the “size” attribute of the nodes in the graphml. The background color was changed from the Network tab (at the bottom of the control panel). Edge color was changed with the Edge tab.
Importing edge table and analysis
• If we are importing just the edges (Source,Target,Weight), we won’t have all the attributes like node size, communities etc. So the first thing after import is running an analysis:
Analyzing the network: partitioning
• Several possibilities:
Filtering the network
From the Select tab in the control panel (on top), enter a selection criterion and create a new network with the result. Selected nodes are highlighted in yellow.
Visualizing the analyzed network
After running the analysis, a partitioning with the Community cluster (GLay) app was performed. Node color is based on that (__glayCluster attribute). Node size was made dependent on node degree (i.e. how many connections it has)
Exporting the network
• Like in Gephi, the network can be exported as an image, as a graph file, or as a website.
Other
The grid view helps look at different regions of the network (or selected vs. other nodes) at once
Interpretation problems
• Hubs vs. Authorities:
– nlp.stanford.edu/IR-book/html/htmledition/hubs-and-authorities-1.html
• Force Atlas Layout and Force Atlas with Attraction Distribution:
– The latter pushes hubs to the periphery, giving a different view of the same network, see [11]
• Hubs vs. “Sinks” (e.g. air traffic)
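As a rough illustration of the hubs vs. authorities distinction listed above, the scores can be computed with a few lines of Python. This is a simplified sketch of the iterative scheme described at the Stanford IR-book link, on a hypothetical directed network: a good hub points to good authorities, and a good authority is pointed to by good hubs.

```python
import math

def hits(edges, iterations=50):
    """Hub and authority scores for a directed edge list (source, target)."""
    nodes = sorted({n for e in edges for n in e})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the nodes pointing in.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of the authority scores of the nodes pointed to.
        new_hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            new_hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

# Hypothetical example: three pages link to x, and p1 also links to y.
edges = [("p1", "x"), ("p2", "x"), ("p3", "x"), ("p1", "y")]
hub, auth = hits(edges)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Here x is the strongest authority and p1 the strongest hub, even though a plain degree count would not tell the two roles apart.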
Hubs vs. Authorities (1)
B. Rieder (2010) [11]
Hubs vs. Authorities (2)
B. Rieder (2010) [11]
Hubs vs. Authorities (3)
• Force Atlas and Force Atlas with Attraction Distribution:
– The latter pushes hubs to the periphery, giving a different view of the same network, see [11]
– Look at Barry Wellman in the preceding graphs
Hubs vs. “Sinks” (1)
Hubs vs. “Sinks” (2)
• Las Vegas is not a central element in the network. People fly to Las Vegas and back to their departure city, not through Las Vegas.
References: Gephi Tutorials
The format of the reference list is: Description: URL [description of the dataset if applicable]
[1] General Tutorial: https://gephi.github.io/users/quick-start/ [Character cooccurrences in Hugo’s Les Misérables]
[2a] Deeper: By Martin Grandjean http://www.martingrandjean.ch/gephi-introduction/ [many datasets]
[2b] Deeper: By Clément Levallois http://www.clementlevallois.net/gephi.html [several datasets]
[3] Importing edge tables from CSV: http://www.literaturegeek.com/2013/09/09/dataintogephi/ [Character interactions in Joyce’s Ulysses]
[4] Network Layouts: https://gephi.github.io/users/tutorial-layouts/ [Les Misérables, Airlines dataset, Internet Core Routers datasets]
[5] Metrics: http://www.clementlevallois.net/gephi/tuto/en/gephi_advanced%20functions_en.pdf
[6] Formatting the Networks: https://gephi.github.io/users/tutorial-visualization/ [Airlines dataset]
[7] Filters: http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/ [Facebook]
References: Cytoscape
References: Other
[7] Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. http://gephi.org/publications/gephi-bastian-feb09.pdf
[8] Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679
[9] Blondel, Vincent D and Guillaume, Jean-Loup and Lambiotte, Renaud and Lefebvre, Etienne. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. http://arxiv.org/pdf/0803.0476.pdf
[10] Sigma JS exporter, created by the Oxford Internet Institute: http://blogs.oii.ox.ac.uk/vis/
[11] Rieder, B. (2010). One network and four algorithms http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/
[12] Smith, N.A., Cardie. C., Washington, A. L., Wilkerson, J.D. (2014). Overview of the 2014 NLP Unshared Task in PoliInformatics. Proceedings of the ACL Workshop on Language Technologies and Computational Social Science, 5-7.
[13] Tim Causer and Melissa Terras (2014). Crowdsourcing Bentham: Beyond the traditional boundaries of academic history. International Journal of Humanities and Arts Computing, vol. 8(1), pp. 46-64.
Thank you!