Project Reports and Proposals Bader Lab

STING: A Framework for Analyzing Spatio-Temporal Interaction Networks and Graphs


Presentation for NSF IUCRC Center for Hybrid Multicore Productivity Research program meeting.


STING: A FRAMEWORK FOR ANALYZING SPATIO-TEMPORAL INTERACTION NETWORKS AND GRAPHS

Presenter: Jason Riedy

Project report: STING

I/UCRC Review Meeting, Dec 2010


Streaming Data Analysis

• Massive, irregularly structured input data.

• New simulation, analysis methods

• Widely varied, unexplored response / control methods

Current needs, future knowledge: Analysts need us here. Yesterday.

The STING framework is growing to help.


Facebook friendship graph: 30k edges per pixel on 1600x1200 screen!
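As a rough check on the scale this figure implies, the arithmetic can be worked through directly. The sketch below assumes "30k" means 30,000 edges and that the edges are spread evenly over every pixel of the screen; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the "30k edges per pixel" figure.
# Assumption: "30k" means 30,000 edges, spread evenly over every pixel.
screen_width = 1600
screen_height = 1200
edges_per_pixel = 30_000

pixels = screen_width * screen_height        # number of pixels on the screen
implied_edges = pixels * edges_per_pixel     # edges the full graph would need

print(f"{pixels:,} pixels -> roughly {implied_edges:,} edges")
```

Under these assumptions the graph has tens of billions of edges, which is why drawing it directly is not a workable analysis method.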


Our Recent Contributions

• Demonstrating value

– Analysis of public Twitter data

• Analyzing streaming data in parallel

– Streaming connected component tracking

• Exploring new algorithms

– Seeded community detection

– First known comparison of methods

D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.

P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in Social Networks.” Technical Report. Being revised for submission.

D. Ediger, J. Riedy, D. Bader. “Tracking Structure of Streaming Social Networks,” in submission.
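To make the idea of streaming connected component tracking concrete, here is a minimal sketch using a union-find structure. It is an illustration under simplifying assumptions, not the method from the work listed above: it handles edge insertions only, runs sequentially, and the class and method names (`ComponentTracker`, `insert_edge`) are hypothetical.

```python
class ComponentTracker:
    """Insert-only connected component tracking with union-find (illustrative)."""

    def __init__(self):
        self.parent = {}   # vertex -> parent vertex in its tree
        self.size = {}     # root vertex -> number of vertices in its tree
        self.count = 0     # components among the vertices seen so far

    def find(self, v):
        # Register unseen vertices as singleton components.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
            self.count += 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert_edge(self, u, v):
        """Process one streamed edge; return True if two components merged."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru          # union by size
        self.size[ru] += self.size[rv]
        self.count -= 1
        return True


tracker = ComponentTracker()
history = []
for u, v in [("a", "b"), ("c", "d"), ("b", "c"), ("a", "d")]:
    tracker.insert_edge(u, v)
    history.append(tracker.count)
print(history)
```

Edge deletions are the hard part of the streaming problem, because removing an edge may or may not split a component; this sketch deliberately leaves them out.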


“Mining Twitter for Social Good”

• Immense volume of data:

– Twitter: 165M users, 55M tweets / day

• Goal: Use the interaction network to understand and characterize information flow.

Results from collaboration with PNNL: D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.

[Figure: Twitter social network using Large Graph Layout]


Twitter Data Sets

•Influenza H1N1 Tweets in September 2009

–Keywords: flu, h1n1, influenza, swine flu

•Atlanta Flood Tweets in September 2009

–Hash tag: #atlflood

•All public tweets on September 1st, 2009

–For performance evaluation

Image sources: http://hippwaters.wordpress.com/2009/09/22/atlanta-flood-images/; CDC
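One way such a keyword-filtered data set could be turned into an interaction network is sketched below. The slides do not spell out how edges are defined, so this assumes an edge from a tweet's author to every account the tweet mentions. The function name, the `(author, text)` tuple layout, and the sample tweets are all invented for illustration.

```python
import re
from collections import defaultdict

# Keywords taken from the H1N1 data set description above.
KEYWORDS = ("flu", "h1n1", "influenza", "swine flu")
MENTION = re.compile(r"@(\w+)")


def build_mention_graph(tweets, keywords=KEYWORDS):
    """Return {author: set of mentioned accounts} for tweets matching a keyword.

    Assumption: tweets are (author, text) pairs and matching is a plain
    case-insensitive substring test, so "flu" would also match "fluent".
    """
    graph = defaultdict(set)
    for author, text in tweets:
        lowered = text.lower()
        if not any(keyword in lowered for keyword in keywords):
            continue
        for mentioned in MENTION.findall(text):
            if mentioned != author:      # ignore self-mentions
                graph[author].add(mentioned)
    return dict(graph)


# Made-up sample data, not drawn from the real data sets.
sample_tweets = [
    ("alice", "Got my flu shot today, thanks @healthdept"),
    ("bob", "RT @alice: Got my flu shot today, thanks @healthdept"),
    ("carol", "Traffic is terrible this morning @bob"),
]
graph = build_mention_graph(sample_tweets)
print(graph)
```

The third sample tweet contains no keyword, so it contributes no edges.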


Tweeters ranked by Betweenness Centrality

Rank  H1N1             atlflood
1     @CDCFlu          @ajc
2     @addthis         @driveafastercar
3     @Official_PAX    @ATLCheap
4     @FluGov          @TWCi
5     @nytimes         @HelloNorthGA
6     @tweetmeme       @11AliveNews
7     @mercola         @WSB_TV
8     @CNN             @shaunking
9     @backstreetboys  @Carl
10    @EllieSmith_x    @SpaceyG
11    @TIME            @ATLINtownPaper
12    @CDCemergency    @TJsDJs
13    @CDC_eHealth     @ATLien
14    @perezhilton     @MarshallRamsey
15    @billmaher       @Kanye
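For readers unfamiliar with the metric behind this ranking, the sketch below computes exact betweenness centrality on a tiny unweighted graph using Brandes' algorithm. It is a sequential illustration, not the GraphCT implementation; the function name and the four-vertex path graph are assumptions made for the example. Scores count ordered (source, target) pairs, so an undirected graph counts each pair twice.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness via Brandes' algorithm; adj maps vertex -> neighbors."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph a - b - c - d: only the interior vertices lie between others.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness_centrality(path)
ranking = sorted(scores, key=lambda v: (-scores[v], v))
print(scores, ranking)
```

Each source vertex requires a full traversal, which is what makes the exact computation expensive on graphs with millions of vertices.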


Exact vs. Approximate Betweenness Centrality Performance

Data set                Vertices    Edges        XMT exact                 Nehalem exact             XMT approx.
2009-09 H1N1            46,457      36,886       39.18s (64 processors)    17s                       3.94s (16 processors)
2009-09 Atlanta Flood   2,283       2,774        6.14s (64 processors)     1s                        4.08s (16 processors)
2009-09 Day 1 All       1,242,715   1,020,671    48m50s (128 processors)   Stopped after 72 hours    5.12s (16 processors)
2009-09 Days 1-9 All    4,093,202   7,146,911    20h21m (128 processors)   Not Run                   14.49s (16 processors)
2009-09 Days 1-30 All   7,213,879   18,153,410   Not Run                   Not Run                   33.09s (16 processors)

XMT exact: Cray XMT GraphCT. Nehalem exact: Nehalem Exact Snap-GT (multi-threaded). XMT approx.: Cray XMT GraphCT Approx. (128 samples).
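The gap between the exact and approximate columns comes from running the per-source traversal for only a sample of source vertices. A minimal sketch of that idea follows. It assumes sources are drawn uniformly without replacement and that scores are rescaled by n / k; whether GraphCT samples in exactly this way is not stated in the slides, and the function names are hypothetical.

```python
import random
from collections import deque


def single_source_dependencies(adj, s):
    """Brandes-style dependency of source s on every other vertex."""
    sigma = {v: 0 for v in adj}
    dist = {v: -1 for v in adj}
    preds = {v: [] for v in adj}
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0   # a source does not lie between itself and others
    return delta


def approx_betweenness(adj, num_samples, seed=0):
    """Estimate betweenness from a uniform sample of source vertices."""
    rng = random.Random(seed)
    vertices = list(adj)
    k = min(num_samples, len(vertices))
    sources = rng.sample(vertices, k)
    scale = len(vertices) / k          # rescale to the full-source total
    bc = {v: 0.0 for v in vertices}
    for s in sources:
        for v, d in single_source_dependencies(adj, s).items():
            bc[v] += scale * d
    return bc


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
full = approx_betweenness(path, num_samples=4)      # every vertex sampled
estimate = approx_betweenness(path, num_samples=2)  # half the sources
print(full, estimate)
```

With a fixed number of samples the cost grows with the size of one traversal rather than with the number of vertices times that size, which is consistent with the approximate timings above growing slowly as the graphs get larger.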


The Cray XMT

• Tolerates latency by massive multithreading

– Hardware support for 128 threads on each processor

– Globally hashed address space

– No data cache

– Single cycle context switch

– Multiple outstanding memory requests

– Fine-grained, word-level synchronization

– Flexibly supports dynamic load balancing

• Example, extreme architecture

– Useful lesson: Tolerating memory latency assists graph analysis.

– Graph500: XMTs at #3, #4, #6 w/1 afternoon's work.

• PNNL's 128 processor XMT: 16384 threads, 1 TiB of shared memory

Image Source: cray.com


Handling massive data rates

• Current data rates: Handle 240k edge insertions & deletions per second on a Cray XMT.

• Note: GigE packet rates are ~550k/sec

Workload                    Updates / sec
Edge adds only              930k
Edge adds + STINGER         300k
Adds + Deletes + STINGER    240k

Sizes: 32P, 1M batch

RMAT: 16M vertices, 135M edges (edge factor 8)
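The workloads in this table apply batches of edge insertions and deletions to a dynamic graph. The sketch below shows what applying one such batch means, using plain Python sets. It is not the STINGER data structure and makes no attempt at parallelism; the `(action, u, v)` tuple format and the function name are assumptions for the example.

```python
def apply_batch(adj, batch):
    """Apply (action, u, v) updates to an undirected adjacency-set graph.

    adj maps vertex -> set of neighbors and is modified in place.
    Returns (edges_inserted, edges_deleted); duplicate inserts and
    deletes of missing edges are counted as no-ops.
    """
    inserted = deleted = 0
    for action, u, v in batch:
        if action == "add":
            neighbors_u = adj.setdefault(u, set())
            neighbors_v = adj.setdefault(v, set())
            if v not in neighbors_u:
                neighbors_u.add(v)
                neighbors_v.add(u)
                inserted += 1
        elif action == "del":
            if v in adj.get(u, ()):
                adj[u].discard(v)
                adj[v].discard(u)
                deleted += 1
        else:
            raise ValueError(f"unknown action: {action!r}")
    return inserted, deleted


graph = {}
batch = [
    ("add", 1, 2),
    ("add", 2, 3),
    ("add", 1, 2),   # duplicate insert, ignored
    ("del", 2, 3),
    ("del", 4, 5),   # edge never existed, ignored
]
result = apply_batch(graph, batch)
print(result, graph)
```

Grouping updates into large batches, as in the 1M batch configuration above, gives a parallel implementation many independent updates to work on at once.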


Seed Set Expansion

• Useful to find communities to which several vertices belong.

• Blue vertices are seeds, red vertices belong to a discovered community.

• Uses: Selection for viz, expensive analysis...
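As a concrete illustration of seed set expansion, the sketch below grows a community greedily from a set of seeds, adding the neighboring vertex that lowers conductance the most and stopping when no neighbor improves it. This is one simple heuristic chosen for brevity, not one of the methods evaluated in this project; the function names and the example graph are assumptions.

```python
def conductance(adj, members):
    """Cut edges divided by the smaller of the two side volumes."""
    cut = sum(1 for u in members for v in adj[u] if v not in members)
    volume_in = sum(len(adj[u]) for u in members)
    volume_out = sum(len(adj[u]) for u in adj) - volume_in
    smaller = min(volume_in, volume_out)
    return cut / smaller if smaller else 1.0


def expand_seed_set(adj, seeds):
    """Greedily add frontier vertices while conductance keeps decreasing."""
    community = set(seeds)
    best = conductance(adj, community)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        candidates = [
            (conductance(adj, community | {v}), v) for v in sorted(frontier)
        ]
        if not candidates:
            break
        score, vertex = min(candidates)
        if score >= best:
            break                      # no neighbor improves the community
        community.add(vertex)
        best = score
    return community


# Two triangles joined by the single edge c - d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
community = expand_seed_set(graph, {"a", "b"})
print(sorted(community))
```

On this graph the expansion stops at the first triangle, since pulling in a vertex across the bridge would raise the conductance again.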


Comparing expansion methods

• Results of (first known) comparison of methods:

– McCloskey-Bader variation of widely used, agglomerative Clauset-Newman-Moore heuristic produces smallest sets with good properties.

– Followed closely by a personalized PageRank approach [Andersen, Chung, & Lang, 2006].

• Currently working on parallel, streaming version...

Results from P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in Social Networks.” Technical Report. Being revised for submission.
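To illustrate the personalized PageRank approach mentioned above, here is a small sketch that computes a seed-biased PageRank vector and then picks the lowest-conductance prefix of the vertices ordered by rank divided by degree (a sweep cut). It uses plain power iteration rather than the approximate push procedure of Andersen, Chung, and Lang, and it assumes an undirected graph in which every vertex has at least one neighbor. The function names, the `alpha` value, and the example graph are illustrative choices.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iterations=100):
    """Power iteration for PageRank that restarts at the seed vertices.

    Assumption: every vertex has at least one neighbor.
    """
    seeds = set(seeds)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {v: alpha * restart[v] for v in adj}
        for u in adj:
            share = (1.0 - alpha) * rank[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        rank = nxt
    return rank


def sweep_cut(adj, rank):
    """Return the lowest-conductance prefix of vertices by rank / degree."""
    total_volume = sum(len(adj[v]) for v in adj)
    order = sorted(adj, key=lambda v: rank[v] / len(adj[v]), reverse=True)
    members, volume, cut = set(), 0, 0
    best_set, best_score = set(), float("inf")
    for v in order[:-1]:               # the whole graph is not a community
        inside = sum(1 for w in adj[v] if w in members)
        members.add(v)
        volume += len(adj[v])
        cut += len(adj[v]) - 2 * inside   # new cut edges minus healed ones
        score = cut / min(volume, total_volume - volume)
        if score < best_score:
            best_set, best_score = set(members), score
    return best_set, best_score


# Two triangles joined by the single edge c - d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
rank = personalized_pagerank(graph, {"a"})
community, score = sweep_cut(graph, rank)
print(sorted(community), score)
```

The restart probability `alpha` controls how local the result is: larger values keep more of the rank near the seeds.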


References

1. A. Clauset, M.E.J. Newman, and C. Moore. “Finding community structure in very large networks.” Physical Review E, 70(6):066111, 2004.

2. R. Andersen and K. Lang. “Communities from seed sets.” In Proceedings of the 15th International Conference on World Wide Web, page 232. ACM, 2006.

3. R. Andersen, F. Chung, and K. Lang. “Local Graph Partitioning using PageRank Vectors.” In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), October 21-24, 2006. IEEE Computer Society, Washington, DC, 475-486.

4. J.P. McCloskey and D.A. Bader. “Modularity and Graph Algorithms.” Minisymposium on Analyzing Massive Real-World Graphs, 2010 SIAM Annual Meeting (AN10), Pittsburgh, PA, July 12-16, 2010.


Current Bader Lab Personnel

• Faculty: David A. Bader

• Research Scientists:

– Henning Meyerhenke (University of Paderborn, Germany)

– Jason Riedy (UC Berkeley)

• Graduate Students:

– David Ediger (George Washington University)

– Seunghwa Kang (Seoul National University, Korea)

– Xing Liu (Huazhong University of Science and Technology, China & IBM)

– Robert McColl (Vanderbilt University)

– Pushkar Pande (IIT Roorkee, India)

– Emily Rogers (UC Berkeley)

– Vipin Sachdeva (IIT Guwahati, UNM, IBM)

– Vyomkesh Tripathi (IIT-Kharagpur, India)

– Ivan Walker (Jackson State University)

– Zhaoming Yin (Peking Univ, China)