GraphChi: Big Data – small machine
What is it good for – and what's new?

Aapo Kyrölä, Ph.D. candidate @ CMU
http://www.cs.cmu.edu/~akyrola
Twitter: @kyrpov




I am Aapo Kyrölä; I am beginning my fifth year as a Ph.D. student at Carnegie Mellon, advised by Carlos Guestrin and Guy Blelloch.

In this talk I am going to talk about GraphChi. GraphChi is part of the GraphLab project, kind of a spin-off. The basic promise of GraphChi is that you can conveniently do graph computation on extremely large graphs, on just a Mac Mini or a laptop. And I actually mean very big graphs.

This talk is quite high level, more like a marketing talk: I want to encourage you to have a look at GraphChi. But I will also cover some new material.

I would like to also use this opportunity to thank Pankaj, who just talked: last year we met at this workshop and he invited me to work at Twitter last fall. That was a great experience and a unique opportunity, and based on that work I am proud [transition]


GraphChi can compute on the full Twitter follow-graph with just a standard laptop.
~ as fast as a very large Hadoop cluster!
(Size of the graph in Fall 2013: > 20B edges [Gupta et al. 2013])

...that I can say that you can use a basic Mac laptop to compute on the actual, whole Twitter graph. This is data that is not normally available to academic research, and I am really proud of this. It is hard to find results in the literature with experiments on anything this big. And this is just on a laptop. Thanks Pankaj for the opportunity; I especially appreciate Twitter's commitment to open source, which has let me use and release the code I wrote while at Twitter without trouble.

Ok, I said you can compute on the Twitter graph, but how fast? Roughly speaking, as fast as you can on a huge Hadoop cluster. It goes without saying that, energy-wise and cost-wise, GraphChi is incredibly efficient.

Response to Pankaj: GraphChi can compute on the actual Twitter graph, on a MacBook Pro (Fall 2012). Cite Pankaj's paper.

Roughly the same performance as Twitter's Hadoop cluster. Think about energy consumption and costs.

What is GraphChi


Both in OSDI '12!

So as a recap, GraphChi is a disk-based GraphLab. While GraphLab 2 is incredibly powerful on big clusters, or in the cloud, you can use GraphChi to solve equally big problems on just a Mac Mini. Of course, GraphLab can solve the problems way faster, but I believe GraphChi provides performance that is more than enough for many.

Spin-off of the GraphLab project: a disk-based GraphLab. OSDI '12.

Parallel Sliding Windows

Only P large reads for each interval (sub-graph); P² reads on one full pass.
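The P-reads access pattern can be sketched in a simplified model. This is plain Python for illustration only; the shard layout and function names here are assumptions, not GraphChi's actual on-disk format:

```python
# Simplified model of the Parallel Sliding Windows (PSW) I/O pattern:
# vertices are split into P intervals; shard j stores all edges whose
# destination is in interval j, sorted by source. Processing interval j
# loads shard j fully (its in-edges) plus one contiguous window from each
# other shard (interval j's out-edges). Illustrative sketch only.

P = 4
NUM_V = 16

def interval(v):
    """Which of the P vertex intervals vertex v belongs to."""
    return v * P // NUM_V

def shards(edges):
    """Build P shards: shard j holds edges with dst in interval j, by src."""
    out = [[] for _ in range(P)]
    for src, dst in edges:
        out[interval(dst)].append((src, dst))
    for s in out:
        s.sort()  # sorted by source => each interval's edges are contiguous
    return out

def reads_for_interval(j, shard_list):
    """PSW does one full read of shard j plus one sliding-window read per
    other shard: P sequential reads per interval, P^2 per full pass."""
    in_edges = shard_list[j]                       # 1 full-shard read
    windows = [[e for e in shard_list[k] if interval(e[0]) == j]
               for k in range(P) if k != j]        # P-1 contiguous windows
    return 1 + len(windows), in_edges, windows

edges = [(v, (v + 3) % NUM_V) for v in range(NUM_V)]
n_reads, _, _ = reads_for_interval(0, shards(edges))
print(n_reads)      # P sequential reads for one interval
print(n_reads * P)  # P^2 reads over a full pass
```

Because shards are sorted by source, each window is a contiguous chunk on disk, which is why the pattern needs only a handful of large sequential reads instead of many random accesses.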

Details: Kyrola, Blelloch, Guestrin: Large-scale graph computation on just a PC (OSDI 2012)

So how does GraphChi work? I don't have time to go into details now. It is based on an algorithm we invented called Parallel Sliding Windows. In this model you split the graph into P shards, and the graph is processed in P parts. For each part you load one shard completely into memory, and load contiguous chunks of data from the other shards. All in all, you need a very small number of random accesses, which are the bottleneck of disk-based computing. GraphChi is good on both SSD and hard drive!

Why GraphChi

I am now going into the features of GraphChi, which are its selling points. I am going to talk about performance and how it is, perhaps surprisingly, a really scalable system, and then mention that you program GraphChi with the familiar vertex-centric model of GraphLab or Pregel, but GraphChi provides some extensions, like support for dynamic graphs.

Performance Comparison

Notes: the comparison results do not include the time to transfer the data to the cluster, preprocessing, or the time to load the graph from disk. GraphChi computes asynchronously, while all but GraphLab compute synchronously. PageRank: see the paper for more comparisons. WebGraph belief propagation (U Kang et al.), matrix factorization (alternating least squares), triangle counting. On a Mac Mini: GraphChi can solve as big problems as existing large-scale systems, with comparable performance.

Unfortunately the literature is abundant with PageRank experiments, but not much more. PageRank is really not that interesting, and quite simple solutions work. Nevertheless, we get some idea. Pegasus is a Hadoop-based graph mining system, and it has been used to implement a wide range of different algorithms. The best comparable result we got was for a machine learning algorithm, belief propagation. A Mac Mini can roughly match a 100-node Pegasus cluster. This also highlights the inefficiency of MapReduce. That said, the Hadoop ecosystem is pretty solid, and people choose it for the simplicity. Matrix factorization has been one of the core GraphLab applications, and here we show that our performance is pretty good compared to GraphLab running on a slightly older 8-core server. Last, triangle counting, which is a heavy-duty social network analysis algorithm: a paper in VLDB a couple of years ago introduced a Hadoop algorithm for counting triangles. This comparison is a bit stunning. But I remind you that these results are prior to PowerGraph: in OSDI, the map changed totally!

However, we are confident in saying that GraphChi is fast enough for many purposes. And indeed, it can solve as big problems as the other systems have been shown to execute. It is limited only by disk space.

PowerGraph Comparison

PowerGraph / GraphLab 2 outperforms previous systems by a wide margin on natural graphs. With 64× more machines and 512× more CPUs: PageRank: 40x faster than GraphChi. Triangle counting: 30x faster than GraphChi.

OSDI '12

GraphChi has state-of-the-art performance per CPU.
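The per-CPU claim follows directly from the ratios above; a quick check (using the slide's own hypothetical-round numbers, not new measurements):

```python
# Quick check of the performance-per-CPU claim: PowerGraph uses ~512x more
# CPUs than the single GraphChi machine, but is only 40x (PageRank) /
# 30x (triangle counting) faster end to end. Numbers are from the slide.

cpu_ratio = 512
for name, speedup in [("PageRank", 40), ("Triangle counting", 30)]:
    per_cpu_advantage = cpu_ratio / speedup  # GraphChi's relative per-CPU throughput
    print(f"{name}: GraphChi does ~{per_cpu_advantage:.1f}x more work per CPU")
```

So even where the cluster wins on wall-clock time, the single machine does roughly 13x to 17x more work per CPU.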

vs.

PowerGraph really resets the speed comparisons. However, the point about ease of use remains, and GraphChi likely provides sufficient performance for most people. But if you need peak performance and have the resources, PowerGraph is the answer. GraphChi still has a role as the development platform for PowerGraph.

Scalability / Input Size [SSD]

Throughput: number of edges processed per second. Conclusion: the throughput remains roughly constant when the graph size is increased. No worries about running out of memory, or buying more machines, when your data grows.

Here, in this plot, the x-axis is the size of the graph as the number of edges. All the experiment graphs are presented here. On the y-axis we have the performance: how many edges are processed per second. The dots present different experiments (averaged), and the red line is a least-squares fit. On SSD, the throughput remains very nearly constant as the graph size increases. Note that the structure of the graph actually has an effect on performance, but only by a factor of two. The largest graph, yahoo-web, has a challenging structure, and thus its results are comparatively worse.

GraphChi²

[Figure: timeline of length T comparing a distributed graph system (6 machines, then 12 machines) against single-computer systems capable of big tasks, running Tasks 1–12 back to back. The distributed system gets (significantly) less than 2x throughput with 2x machines; the single-computer systems get exactly 2x throughput with 2x machines.]

The fact that GraphChi can scale to very big problems makes it, perhaps surprisingly, an interesting candidate for massive production systems. This is true in cases where you can sacrifice latency for throughput and you have many problems to run on the same graph data. For example, Twitter computes recommendations for each user personally, and they are all on the same graph. So let's say you need to compute millions of new recommendations a day, but you don't need to compute them in a few seconds. Then you have a choice between a distributed, efficient graph system, which needs many machines just to solve this one problem, and using GraphChi to run one task at a time. Note that one task can mean computing recommendations for thousands or even a million users at a time; I will describe such a setting later today.

This is a made-up example to illustrate a point.

Here we have chosen T to be the time in which the single-machine system, such as GraphChi, solves one task. Let's assume the cluster system needs 6 machines to solve the problem, and does it about 7 times faster than GraphChi. Then in time T it solves 7 tasks, while GraphChi, given the same six machines running independently, solves 6 tasks.

Now if we double the size of the cluster to twelve machines: cluster systems never have linear speedup, so let's assume the performance increases by, say, 50%. Of course these are just made-up numbers, but similar behavior happens at some cut-off point anyway. Now GraphChi will solve exactly twice the number of tasks in time T.
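The arithmetic of this made-up example can be checked in a few lines; the 7x speedup and 50%-per-doubling scaling are the hypothetical numbers from above, not measurements:

```python
# Hypothetical throughput comparison: one distributed cluster vs. a fleet
# of independent single-machine (GraphChi-style) workers. All numbers are
# the made-up ones from the example above.

def cluster_tasks(machines, base_machines=6, base_speedup=7.0, scaling=1.5):
    """Tasks a distributed system finishes in time T: with base_machines it
    runs base_speedup tasks per T; each doubling of the cluster multiplies
    throughput only by `scaling` (sub-linear speedup)."""
    doublings = machines // base_machines - 1
    return base_speedup * (scaling ** doublings)

def fleet_tasks(machines):
    """Independent workers scale exactly linearly: one task each per T."""
    return machines

print(cluster_tasks(6), fleet_tasks(6))    # 6 machines:  cluster ahead
print(cluster_tasks(12), fleet_tasks(12))  # 12 machines: fleet ahead
```

With 6 machines the cluster wins (7 vs. 6 tasks per T); at 12 machines the linearly scaling fleet overtakes it (12 vs. 10.5), which is the cut-off point the slide illustrates.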


We are not the only ones thinking this way.

Applications for GraphChi

Graph mining: connected components, approx. shortest paths, triangle counting, community detection.
SpMV: PageRank, generic.
Recommendations: random walks.
Collaborative filtering (by Danny Bickson): ALS, SGD, Sparse-ALS, SVD, SVD++, Item-CF, + many more.
Probabilistic graphical models: belief propagation.
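To give a feel for how such algorithms look in the vertex-centric (GraphLab/Pregel-style) model GraphChi uses, here is a minimal PageRank sketch. This is plain Python for illustration; the Vertex class and update() signature are hypothetical, not GraphChi's actual C++/Java API:

```python
# Minimal sketch of a vertex-centric PageRank update: each vertex gathers
# values from its in-edges, applies the PageRank formula, and scatters its
# new rank along its out-edges. Illustrative only, not the GraphChi API.

class Vertex:
    def __init__(self, vid, out_neighbors):
        self.vid = vid
        self.out_neighbors = out_neighbors  # ids of out-edge targets
        self.value = 1.0                    # current PageRank mass
        self.inbox = []                     # values received on in-edges

def update(vertex, damping=0.85):
    """One vertex-centric step: gather, apply, scatter."""
    gathered = sum(vertex.inbox)
    vertex.value = (1 - damping) + damping * gathered
    share = vertex.value / max(len(vertex.out_neighbors), 1)
    return {nbr: share for nbr in vertex.out_neighbors}

# Tiny 3-vertex cycle, run a few synchronous sweeps:
graph = {i: Vertex(i, [(i + 1) % 3]) for i in range(3)}
for v in graph.values():
    v.inbox = [1.0]  # start with uniform mass on each in-edge
for _ in range(20):
    outboxes = {vid: update(v) for vid, v in graph.items()}
    for v in graph.values():
        v.inbox = [msgs[v.vid] for msgs in outboxes.values() if v.vid in msgs]

print([round(v.value, 3) for v in graph.values()])
```

The engine (not the user) decides how vertices are scheduled and how edge data moves; in GraphChi that scheduling is what Parallel Sliding Windows makes disk-friendly.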

One important factor to evaluate is: is this system any good? Can you use it for anything? GraphChi is an early project, but we already have a great variety of algorithms implemented on it. I think it is safe to say that the system can be used for many purposes. I don't know of a better way to evaluate the usability of a system than listing what it has been used for. There are over a thousand downloads of the source code, plus checkouts which we cannot track, and we know many people are already using the algorithms of GraphChi and also implementing their own. Most of these algorithms are now available only in the C++ edition, apart from the random walk system, which is only in the Java version.

Programming + Special Features

Similar programming model as GraphLab version 1. Dynamic graphs: streaming graphs while computing. Graph contraction algorithms (new): minimum spanning forest.

Easy to Get Started

Java and C++ versions available. No installation, just run. Any machine, SSD or HD.

http://graphchi.org
http://code.google.com/p/graphchi
http://code.google.com/p/graphchi-java

What's New

Extensions

1. Dynamic edge and vertex values: divide shards into small (4 MB) blocks that can be resized separately. [Figure: Shard(j) split into Block 1, Block 2, Block 3, ..., Block N.]
2. Integration with Hadoop / Pig

3. Fast neighborhood queries over shards: sparse indices

4. DrunkardMob: random walks (next)

Random Walk Simulations

Personalized PageRank. Problem: using the power method would require O(V²) memory to compute it for all vertices. It can be approximated by simulating random walks and computing the sample distribution. Other applications: recommender systems, e.g. FolkRank (Hotho 2006) and finding candidates; knowledge-base inference (Lao, Cohen 2009).

Random walk in an in-memory graph: compute one walk at a time (multiple in parallel, of course):

parfor walk in walks:
    for i = 1 to numsteps:
        vertex = walk.atVertex()
        walk.takeStep(vertex.randomNeighbor())

Extremely slow in GraphChi / PSW!

Each hop might require loading of a new interval.

So how would we do this if we could fit the graph in memory?

Random Walks in GraphChi: the DrunkardMob algorithm

Reverse thinking:

parfor vertex in graph:
    mywalks = walkManager.getWalksAtVertex(vertex.id)
    foreach walk in mywalks:
        walkManager.addHop(walk, vertex.randomNeighbor())

Need to encode only the current vertex and the source vertex for each walk: a 4-byte integer is sufficient per walk. With 144 GB RAM, we could run 15 billion walks simultaneously (on Java): recommendations for 15 million users.

Chunks! Load a chunk of the graph and a chunk of walks, and move them forward.

Keeping Track of Walks

[Figure: within an execution interval, GraphChi's vertex–walks table (WalkManager) feeds a walk distribution tracker (DrunkardCompanion) that maintains top-N visit counts per source, e.g. source A and source B.]

Application: Twitter's Who-to-Follow

Based on the WWW '13 paper by Gupta et al.
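The reverse-thinking idea above can be sketched end to end: walks are grouped by their current vertex, and one sweep over the vertices advances every walk by a hop. This is a toy Python model with hypothetical names (make_walk, advance_all, a 16-bit packing), not the actual GraphChi/Java DrunkardMob implementation:

```python
import random

# Toy sketch of DrunkardMob: instead of advancing one walk at a time
# (a random access per hop), iterate over vertices and advance every walk
# currently parked at that vertex. Each walk is packed into one integer
# (current vertex in the high bits, source vertex in the low bits), in the
# spirit of the 4-bytes-per-walk encoding above. Illustrative only.

BITS = 16  # low bits hold the source id (toy size; the real packing is tighter)

def make_walk(source):
    return (source << BITS) | source  # walk starts at its own source

def walk_source(w):
    return w & ((1 << BITS) - 1)

def walk_at(w):
    return w >> BITS

def advance_all(graph, walks, steps):
    """DrunkardMob-style sweeps: group walks by current vertex, then move
    each vertex's walks to a random out-neighbor."""
    for _ in range(steps):
        at = {}
        for w in walks:
            at.setdefault(walk_at(w), []).append(w)
        walks = []
        for v in graph:  # sweep vertices in order, as PSW would
            for w in at.get(v, []):
                nxt = random.choice(graph[v])
                walks.append((nxt << BITS) | walk_source(w))
    return walks

graph = {0: [1, 2], 1: [2], 2: [0, 1]}
walks = [make_walk(0) for _ in range(1000)]  # 1000 walks from source 0
walks = advance_all(graph, walks, steps=10)
counts = {}
for w in walks:
    counts[walk_at(w)] = counts.get(walk_at(w), 0) + 1
print(counts)  # empirical visit distribution for source 0
```

The visit counts approximate the personalized PageRank distribution for the source vertex; on disk, the same sweep processes a chunk of the graph and a chunk of walks at a time.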

Step 1: Compute the Circle of Trust (CoT) for each user (DrunkardMob).
Step 2: Build a bipartite graph with the CoT + the CoT's followees (neighborhood queries over shards).
Step 3: Compute SALSA and pick the top-scored users as recommendations.

Conclusion

GraphChi can run your favorite graph computation on extremely large graphs, on your laptop. Unique features such as random walk simulations and dynamic graphs. Most popular: the Collaborative Filtering toolkit (by Danny Bickson).

Thank you!

Aapo Kyrölä, Ph.D. candidate @ CMU (soon to graduate!)

http://www.cs.cmu.edu/~akyrola
Twitter: @kyrpov