Presentation for the NSF I/UCRC Center for Hybrid Multicore Productivity Research program meeting.
Project Reports and Proposals
Bader Lab
STING: A Framework for Analyzing Spatio-Temporal Interaction Networks and Graphs
Presenter: Jason Riedy
Project report: STING
I/UCRC Review Meeting, Dec 2010
Streaming Data Analysis
• Massive, irregularly structured input data.
• New simulation and analysis methods.
• Widely varied, unexplored response / control methods.
Current needs, future knowledge: analysts need us here. Yesterday. The STING framework is growing to help.
Scale: a friendship graph would put 30k edges per pixel on a 1600×1200 screen!
Our Recent Contributions
• Demonstrating value
– Analysis of public Twitter data
• Analyzing streaming data in parallel
– Streaming connected component tracking
• Exploring new algorithms
– Seeded community detection
– First known comparison of methods
D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds.
“Massive Social Network Analysis: Mining Twitter for Social Good,” International
Conference on Parallel Processing, San Diego, CA September 13-16, 2010.
P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in
Social Networks.” Technical Report. Being revised for submission.
D. Ediger, J. Riedy, D. Bader. “Tracking Structure of Streaming Social Networks,” in
submission.
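The streaming connected-component tracking above must handle both edge insertions and deletions; the insertion-only half can be sketched with a union-find structure. This is a simplification for illustration, not the algorithm from the paper (deletions require recomputation or extra machinery):

```python
class DSU:
    """Union-find tracking connected components under edge insertions.

    A toy sketch: handles inserts only. Deletions, which the streaming
    tracking work addresses, are not covered here.
    """

    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n  # current number of components

    def find(self, x):
        # Path halving keeps trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.count -= 1  # two components merged
```

Each inserted edge either merges two components or is absorbed with no change, so the component count is maintained incrementally rather than recomputed per update.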
“Mining Twitter for Social Good”
• Immense volume of data: Twitter has 165M users and 55M tweets / day.
• Goal: use the interaction network to understand and characterize information flow.
Results from collaboration with PNNL: D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.
[Figure: Twitter social network rendered using Large Graph Layout.]
Twitter Data Sets
• Influenza H1N1 tweets in September 2009
– Keywords: flu, h1n1, influenza, swine flu
• Atlanta flood tweets in September 2009
– Hash tag: #atlflood
• All public tweets on September 1st, 2009
– For performance evaluation
Image sources: http://hippwaters.wordpress.com/2009/09/22/atlanta-flood-images/ and the CDC.
Tweeters ranked by Betweenness Centrality
Rank H1N1 atlflood
1 @CDCFlu @ajc
2 @addthis @driveafastercar
3 @Official_PAX @ATLCheap
4 @FluGov @TWCi
5 @nytimes @HelloNorthGA
6 @tweetmeme @11AliveNews
7 @mercola @WSB_TV
8 @CNN @shaunking
9 @backstreetboys @Carl
10 @EllieSmith_x @SpaceyG
11 @TIME @ATLINtownPaper
12 @CDCemergency @TJsDJs
13 @CDC_eHealth @ATLien
14 @perezhilton @MarshallRamsey
15 @billmaher @Kanye
Exact vs. Approximate Betweenness Centrality Performance

Data set                Vertices    Edges       XMT GraphCT (exact)   Nehalem SNAP-GT (exact, multi-threaded)   XMT GraphCT approx. (128 samples)
2009-09 H1N1            46,457      36,886      39.18s (64 proc.)     17s                                       3.94s (16 proc.)
2009-09 Atlanta flood   2,283       2,774       6.14s (64 proc.)      1s                                        4.08s (16 proc.)
2009-09 Day 1, all      1,242,715   1,020,671   48m50s (128 proc.)    Stopped after 72 hours                    5.12s (16 proc.)
2009-09 Days 1-9, all   4,093,202   7,146,911   20h21m (128 proc.)    Not run                                   14.49s (16 proc.)
2009-09 Days 1-30, all  7,213,879   18,153,410  Not run               Not run                                   33.09s (16 proc.)
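The approximate column estimates betweenness by running the single-source stage of Brandes' algorithm from only a sampled subset of source vertices, then rescaling. A minimal sequential Python sketch of that sampling idea (illustrative only, not the parallel GraphCT implementation):

```python
import random
from collections import deque, defaultdict

def approx_betweenness(adj, n_samples, seed=0):
    """Estimate betweenness centrality on an unweighted graph by
    running Brandes' single-source dependency accumulation from a
    random sample of sources, then rescaling by n / n_samples."""
    rng = random.Random(seed)
    nodes = list(adj)
    k = min(n_samples, len(nodes))
    bc = dict.fromkeys(nodes, 0.0)
    for s in rng.sample(nodes, k):
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    scale = len(nodes) / k  # unbiased estimate of the full-source sum
    return {v: c * scale for v, c in bc.items()}
```

With `n_samples` equal to the vertex count this degenerates to exact Brandes betweenness; the table's 128-sample runs trade a small, controllable error for orders-of-magnitude less work on the large graphs.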
The Cray XMT
• Tolerates latency by massive multithreading
– Hardware support for 128 threads on each processor
– Globally hashed address space
– No data cache
– Single-cycle context switch
– Multiple outstanding memory requests
– Fine-grained, word-level synchronization
– Flexibly supports dynamic load balancing
• An example of an extreme architecture
– Useful lesson: tolerating memory latency assists graph analysis.
– Graph500: XMTs at #3, #4, #6 with one afternoon's work.
• PNNL's 128-processor XMT: 16,384 threads, 1 TiB of shared memory
Image source: cray.com
Handling massive data rates
• Current data rates: handle 240k edge insertions and deletions per second on a Cray XMT.
• Note: GigE packet rates are ~550k/sec.

Scenario                   Updates / sec
Edge adds only             930k
Edge adds + STINGER        300k
Adds + deletes + STINGER   240k

Setup: 32 processors, batches of 1M updates; RMAT graph with 16M vertices, 135M edges (edge factor 8).
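STINGER itself is a blocked, linked adjacency structure tuned for the XMT; as a toy stand-in, the batched insert/delete semantics being measured above can be sketched with a plain dict-of-sets graph (the function name and representation here are illustrative, not STINGER's API):

```python
def apply_batch(adj, insertions, deletions):
    """Apply one batch of undirected edge insertions and deletions.

    `adj` maps vertex -> set of neighbors. A simplified stand-in for
    STINGER's batched update path: real STINGER uses blocked edge
    lists and fine-grained synchronization to process batches in
    parallel; here the point is only the batch-at-a-time semantics.
    """
    for u, v in insertions:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u, v in deletions:
        # discard() ignores edges already absent, so stale deletes
        # in a batch are harmless.
        adj.get(u, set()).discard(v)
        adj.get(v, set()).discard(u)
    return adj
```

Batching matters because amortizing synchronization over a million updates at a time is what makes sustained rates like 240k updates/sec achievable.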
Seed Set Expansion
• Useful for finding communities to which several vertices belong.
• Blue vertices are seeds; red vertices belong to a discovered community.
• Uses: selection for visualization, expensive analysis...
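As an illustrative baseline (not one of the methods compared in the report), a seed set can be grown greedily by repeatedly absorbing the frontier vertex most connected to the current community:

```python
def expand_seeds(adj, seeds, steps):
    """Grow a community from a seed set, one vertex per step.

    At each step, pick the frontier vertex (a non-member adjacent to
    the community) with the most neighbors already inside. A simple
    greedy baseline sketch; the compared methods use modularity or
    personalized PageRank instead of this raw neighbor count.
    """
    community = set(seeds)
    for _ in range(steps):
        frontier = {w for v in community for w in adj[v]} - community
        if not frontier:
            break  # the community's component is exhausted
        best = max(frontier,
                   key=lambda w: sum(1 for x in adj[w] if x in community))
        community.add(best)
    return community
```

Stopping criteria are the hard part in practice: methods differ mainly in how they score candidates and decide when the community is "done".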
Comparing expansion methods
• Results of the (first known) comparison of methods:
– The McCloskey-Bader variation of the widely used agglomerative Clauset-Newman-Moore heuristic produces the smallest sets with good properties.
– Followed closely by a personalized PageRank approach [Andersen, Chung, & Lang, 2006].
• Currently working on a parallel, streaming version...
Results from P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in Social Networks.” Technical report, being revised for submission.
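The personalized PageRank approach of Andersen, Chung, & Lang localizes a PageRank vector around the seed with a "push" procedure that only touches vertices near the seed. A small sequential sketch for an unweighted undirected graph (a simplified non-lazy variant of their formulation, not their exact algorithm):

```python
def ppr_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from one seed vertex.

    p holds the approximation; r holds residual probability mass not
    yet distributed. A vertex is "pushed" only while its residual
    exceeds eps * degree, so work stays local to the seed's region.
    Simplified sketch of the Andersen-Chung-Lang push idea.
    """
    p, r = {}, {seed: 1.0}
    queue = [seed]
    while queue:
        u = queue.pop()
        ru = r.get(u, 0.0)
        deg = len(adj[u])
        if ru < eps * deg:
            continue  # stale queue entry: residual already small
        p[u] = p.get(u, 0.0) + alpha * ru   # keep alpha fraction here
        share = (1.0 - alpha) * ru / deg    # spread the rest evenly
        r[u] = 0.0
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            # Enqueue v only when its residual first crosses threshold.
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p
```

For seed-set expansion, the resulting scores (divided by degree) are swept to find a low-conductance cut; vertices never touched by a push simply stay at score zero, which is what makes the method attractive for streaming, local work.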
References
1. A. Clauset, M.E.J. Newman, and C. Moore. “Finding community structure in
very large networks.” Physical Review E, 70(6):66111, 2004.
2. R. Andersen and K. Lang. “Communities from seed sets.” In Proceedings of
the 15th international conference on World Wide Web, page 232. ACM, 2006.
3. R. Andersen, F. Chung, and K. Lang. 2006. “Local Graph Partitioning using
PageRank Vectors.” In Proceedings of the 47th Annual IEEE Symposium on
Foundations of Computer Science (October 21 - 24, 2006). FOCS. IEEE
Computer Society, Washington, DC, 475-486.
4. J.P. McCloskey and D.A. Bader, “Modularity and Graph Algorithms,”
Minisymposium on Analyzing Massive Real-World Graphs, 2010 SIAM Annual
Meeting (AN10), Pittsburgh, PA, July 12-16, 2010.
Current Bader Lab Personnel
• Faculty: David A. Bader
• Research Scientists:
– Henning Meyerhenke (University of Paderborn, Germany)
– Jason Riedy (UC Berkeley)
• Graduate Students:
– David Ediger (George Washington University)
– Seunghwa Kang (Seoul National University, Korea)
– Xing Liu (Huazhong University of Science and Technology, China & IBM)
– Robert McColl (Vanderbilt University)
– Pushkar Pande (IIT Roorkee, India)
– Emily Rogers (UC Berkeley)
– Vipin Sachdeva (IIT Guwahati, UNM, IBM)
– Vyomkesh Tripathi (IIT-Kharagpur, India)
– Ivan Walker (Jackson State University)
– Zhaoming Yin (Peking Univ, China)
Acknowledgment of Support