Visual Analytics of
Big Data
Seok-Hee Hong
University of Sydney
Bioinformatics Winter School 2014
Big Data and
The Scale Problem
Social networks: Facebook users [chart: years 2004 to 2007; vertical axis 0 to 50M users]
Biological networks: KEGG database [chart: years 1982 to 2006; logarithmic vertical axis, 10^2 to 10^8]
Internet Movie Data Base [figures: Year 1937, Year 1995]
The scale problem
Data sets are growing much faster than the computing systems and tools used to analyse them.
Existing algorithms and methods do not scale well enough to be efficient or effective on big data sets.
Big Graph/Network
Erdos networks Lincoln Lu
Visual Analytics
Good visualisation can enable users:
to understand the structure
to discover new knowledge/insight
to find regular/abnormal patterns/behavior
to generate/confirm/reject hypotheses
to confirm expected and discover unexpected
to reveal the hidden truth
to predict the future
Visual Analytics
Visual Data Mining
Key Scientific Challenge
1. Scalability
2. Visual Complexity
3. Domain Complexity
Visual Analysis Framework for Big Graph
[Diagram: Big Data, Graph, Picture; stages labelled analysis, visualisation, interaction]
GEOMI (Geometry for Maximum Insight)
Visual analytic tool for large and complex networks
Developed by NICTA and University of Sydney
GEOMI (GEOmetry for Maximum Insight)
Network Analysis
Interaction
Graph Layout
GEOMI Features
Network/graph generator
  Scale-free networks
  Clustered graph
  Hierarchical graph
Network analysis
  Centrality: degree, betweenness, closeness, eccentricity, eigenvector, random walk betweenness, uniqueness
  Group analysis: blockmodelling, clustering, k-core, structural equivalence
  Graph algorithms: filtering, shortest path, giant component
Interaction/Navigation
  Zoom, panning, rotation
  Selection
  Graph layout interaction/navigation
  Animation
  Head gesture interaction
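Two of the centrality measures listed under network analysis can be sketched in a few lines. This is a minimal illustration in plain Python, not GEOMI's implementation; the function names and the toy graph are assumptions made for the example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def closeness_centrality(adj):
    """Number of reachable nodes divided by the sum of distances to them."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Toy undirected graph (hypothetical): a hub with three neighbours and a tail.
TOY_GRAPH = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}

degree = degree_centrality(TOY_GRAPH)
closeness = closeness_centrality(TOY_GRAPH)
most_central = max(closeness, key=closeness.get)
```

On this toy graph the hub ranks first under both measures; on real networks the measures often disagree, which is one reason a tool offers several.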
Graph/Network Layout
Node-link representation
  Trees
  Planar graphs
  General undirected graphs
  Directed graphs
  Clustered graphs
  Hierarchical graphs
  Scale-free networks
  Dynamic/Temporal networks
  Multi-relational networks
  Multi-variate networks
  Overlapping networks
Map representation
  Tree/Radial tree map
  Voronoi map
  Temporal map
Hybrid representation
Interaction with Cool Toys
IMDB (Internet Movie Data Base) Network Analysis
Kevin Bacon Network
Days of Thunder (1990)
Far and Away (1992)
A Few Good Men
Hollywood Movie Actor Collaboration Network
Kevin Bacon Network
IMDB (Internet Movie DataBase)
Kevin Bacon
Tom Cruise: Bacon #1
Nicole Kidman: Bacon #2
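A Bacon number is the shortest-path distance from Kevin Bacon in the co-star graph, so it can be computed with a breadth-first search. The sketch below assumes a tiny cast table limited to the three films and three actors named on these slides; the function names are hypothetical and a real IMDB analysis would load full cast lists.

```python
from collections import deque

# Assumed cast table, restricted to the films and actors named on the slides.
CASTS = {
    "A Few Good Men": {"Kevin Bacon", "Tom Cruise"},
    "Days of Thunder (1990)": {"Tom Cruise", "Nicole Kidman"},
    "Far and Away (1992)": {"Tom Cruise", "Nicole Kidman"},
}


def build_costar_graph(casts):
    """Connect two actors whenever they appear in the same film."""
    graph = {}
    for cast in casts.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(cast - {actor})
    return graph


def bacon_numbers(casts, root="Kevin Bacon"):
    """Breadth-first search from the root actor; distance = Bacon number."""
    graph = build_costar_graph(casts)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        actor = queue.popleft()
        for costar in graph.get(actor, ()):
            if costar not in dist:
                dist[costar] = dist[actor] + 1
                queue.append(costar)
    return dist


numbers = bacon_numbers(CASTS)
```

With this table, `numbers["Tom Cruise"]` is 1 and `numbers["Nicole Kidman"]` is 2, matching the slide.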
Evolution of Kevin Bacon Network
GD05: Evolution of IMDB Kevin Bacon #1: 2000
WOS (Web of Science) Analysis
Social Network Co-citation Network
Evolution of Co-citation Network in WOS
Co-citation network of year 2003
Co-citation network of year 2006
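For readers unfamiliar with the construction: two papers are co-cited when a third paper cites both, and the edge weight is the number of such citing papers. A minimal sketch, using made-up paper identifiers rather than Web of Science data:

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of cited papers, how many papers cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted() gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical citing papers P1..P3 and the papers they reference.
REFERENCES = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

weights = cocitation_counts(REFERENCES)
```

Building one such network per publication year, as in the 2003 and 2006 snapshots, only requires filtering the citing papers by year before counting.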
Information Visualisation
Network Analysis
Evolution of research area
Info Vis Collaboration Network
Email Network Virus Detection
History of World Cup
World Cup 2002
Edge Bundling with centrality analysis & k-core analysis
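The k-core analysis mentioned here assigns each node the largest k such that the node survives when all nodes of degree below k are removed repeatedly. The sketch below is a simple quadratic-time peeling procedure for illustration only; it is not the method used in GEOMI, and the toy graph is an assumption.

```python
def core_numbers(adj):
    """Core number of every node, by repeatedly removing a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy undirected graph (hypothetical): a triangle a-b-c with a pendant node d.
TOY_GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

cores = core_numbers(TOY_GRAPH)
two_core = {v for v, k in cores.items() if k >= 2}
```

Filtering a drawing to the nodes with a high core number is one way to reduce clutter before bundling edges.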
US Airline Network Analysis
Integration with Clustering
Clustered Graph Layout
Metabolic Pathway Visualisation
GO-defined Protein Interaction Network
2.5D Scale-free Network Visualisation
Scale-free Network
[Barabasi and Albert 99]
  Exponential growth
  Preferential attachment
Properties
  Power-law degree distribution
  Sparse, but locally dense
  Small-world property: O(log log n) average path length
  High clustering coefficient
  Resilient to random attack, but vulnerable to targeted attack
Examples
  Webgraph
  Social networks
  Biological networks
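Growth with preferential attachment can be sketched as follows: nodes arrive one at a time and each attaches to m existing nodes chosen with probability proportional to their current degree. This is a simplified illustration in the spirit of the Barabasi-Albert model, not a reference implementation; the function name, the clique used as the seed graph, and the parameter values are assumptions.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=0):
    """Return the edge list of an n-node graph grown by preferential attachment."""
    rng = random.Random(seed)
    # Seed graph (assumption): a clique on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Degree of every node appearing in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = preferential_attachment_graph(n=200, m=2)
deg = degrees(edges)
```

Plotting a histogram of `deg.values()` on log-log axes for larger n is the usual way to eyeball the heavy-tailed degree distribution: a few early nodes become hubs while most nodes keep degree close to m.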
Parallel Plane/Concentric Sphere Layout
[Figures: layers G1, G2, G3 shown in each layout]
PPI networks Hawoong Jeong
Visualisation of Patterns
Motif
Overlapping Network Visualisation for
Integrated Analysis
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
Two Overlapping Networks
Glycolysis Pathway [KEGG] and PPI [DIP]: E. coli
9 overlap
1-neighborhood network
Gene Regulatory Network [RegulonDB] and PPI: E. coli
periphery proteins
6 hubs: no overlap
bottleneck proteins
Three Overlapping Networks
GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)
6 hubs in GRN: crp, arcA, fis, hns, ihfAB, lrp
No overlap
[Figure labels: aceE, aceF]
GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)
ptsG: overlap between 3 layers
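Finding nodes shared between layers reduces to set membership once each network is expressed as a set of node names. The sketch below uses toy layers: only the six hub names and ptsG come from the slides, while g1, p1, p2 and m1 are placeholder nodes invented for the example, and nothing here is read from RegulonDB, DIP or KEGG.

```python
from collections import defaultdict


def layer_membership(layers):
    """Map each node to the set of layers it appears in."""
    membership = defaultdict(set)
    for layer_name, nodes in layers.items():
        for node in nodes:
            membership[node].add(layer_name)
    return dict(membership)


def nodes_in_at_least(layers, k):
    """Nodes that occur in k or more layers."""
    return {n for n, ls in layer_membership(layers).items() if len(ls) >= k}


GRN_HUBS = {"crp", "arcA", "fis", "hns", "ihfAB", "lrp"}

# Toy layers (assumption): placeholder nodes stand in for the real networks.
LAYERS = {
    "GRN": GRN_HUBS | {"ptsG", "g1"},
    "PPI": {"ptsG", "p1", "p2"},
    "MN": {"ptsG", "m1", "p1"},
}

shared_by_all = nodes_in_at_least(LAYERS, 3)
hubs_with_overlap = GRN_HUBS & nodes_in_at_least(LAYERS, 2)
```

In this toy setup `shared_by_all` contains only ptsG and `hubs_with_overlap` is empty, mirroring the two observations on the slides.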
Propagation Animation in Diffusion Network