jim-bethancourt
29 June 2016
Java Performance Tweaks
How I brought TopicViewer’s analysis runtime from 2:10 down to 18 seconds
Repo: https://github.com/jimbethancourt/topic-viewer
Contents
Use a Faster Library
Avoid the Unnecessary
Looping
Parallelization
Finally
Use a Faster Library
• Swapping out off-the-shelf components is the first step you’ll want to take and will likely provide the most immediate gain.
• Replacing Colt with ParallelColt was relatively straightforward.
• There were no migration guides, but both libraries are open source and their classes have similar names, which helped.
• Reduced the runtime from 2:10 to 0:55
Migrating from Colt to ParallelColt
Avoid the Unnecessary
Don’t use Objects when primitives will provide the same functionality
Old:
clustersTree.findSet(new Vertex(i)).index == rootIndex
New:
clustersTree.findSet(i).index == rootIndex
Avoid Unnecessary Object Creation
Avoid the Unnecessary
Use Object arrays instead of Maps when you primarily read values and can index them by number.
In DisjointTree:
Original:
Map<Vertex, Vertex> vertexMapping = new HashMap<Vertex, Vertex>();
findSet(Vertex v) { Vertex mapped = vertexMapping.get(v); … }
New:
Vertex[] vertexMapping = new Vertex[numDocuments];
findSet(int v) { Vertex mapped = vertexMapping[v]; … }
Brought runtime down to 42 seconds (if I remember correctly) between using arrays and avoiding object creation.
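The two changes above (int indices instead of wrapper objects, an array instead of a HashMap) can be sketched together as a tiny union-find. This is a hypothetical illustration, not TopicViewer's DisjointTree: the class name `IntDisjointSet` and its methods are assumptions, and it goes one step further than the slide's `Vertex[]` by storing plain `int` parents.

```java
// Hypothetical sketch, not TopicViewer's DisjointTree: an array-backed union-find over int ids.
final class IntDisjointSet {
    private final int[] parent; // parent[i] == i means i is a root

    IntDisjointSet(int size) {
        parent = new int[size];
        for (int i = 0; i < size; i++) parent[i] = i;
    }

    // Finds the root of v with path halving; no objects are allocated and nothing is hashed.
    int findSet(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    // Merges the sets containing a and b.
    void union(int a, int b) {
        int rootA = findSet(a);
        int rootB = findSet(b);
        if (rootA != rootB) parent[rootB] = rootA;
    }

    public static void main(String[] args) {
        IntDisjointSet sets = new IntDisjointSet(4);
        sets.union(0, 1);
        sets.union(2, 3);
        System.out.println(sets.findSet(1) == sets.findSet(0)); // true
        System.out.println(sets.findSet(1) == sets.findSet(2)); // false
    }
}
```

A lookup here is a bounds-checked array read, whereas the HashMap version pays for hashCode(), equals(), and a throwaway key object on every call.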
Use Object Arrays instead of Maps
Avoid the Unnecessary
• Avoid Unnecessary calls when there are collections that already perform the operations for you
• Constantly calling collection.contains() added up – I was surprised
• Avoid complex, object-heavy equals() operations when possible
• Splitting work into producer + consumer and leveraging natural properties of collections cut 15 seconds off of runtime
• See updateCorrelationMatrix() method and Pair.equals()
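One way to read "leveraging natural properties of collections" is that Set.add already reports whether the element was new, so a separate contains() call is redundant. The sketch below is an assumption about the general technique, not the code in updateCorrelationMatrix(); `SeenPairs`, `key`, and `markIfNew` are hypothetical names, and packing the pair into a long stands in for a lighter-weight equals().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: track which unordered index pairs have been processed.
final class SeenPairs {
    // Packs an unordered pair of non-negative ints into one long,
    // so equality is a single numeric comparison rather than a field-by-field equals().
    static long key(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | hi;
    }

    // Set.add returns false when the element was already present,
    // so there is no need to call contains() first.
    static boolean markIfNew(Set<Long> seen, int a, int b) {
        return seen.add(key(a, b));
    }

    public static void main(String[] args) {
        Set<Long> seen = new HashSet<>();
        System.out.println(markIfNew(seen, 1, 2)); // true
        System.out.println(markIfNew(seen, 2, 1)); // false, same unordered pair
    }
}
```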
Avoid Unnecessary Operations
Avoid the Unnecessary
• Use method parameter values when looping heavily
• Method parameters and local variables live in the thread’s stack frame (or in registers), while fields such as this.numDocuments are read from the heap
• As a result, accessing them is much faster
• Shaved off a second or two
Original:
private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, boolean force) {
    for (int i = 0; i < this.numDocuments; i++)
        for (int j = 0; j < this.numDocuments; j++)
New:
private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, int numDocuments, boolean force) {
    for (int i = 0; i < numDocuments; i++)
        for (int j = 0; j < numDocuments; j++)
Use Method Parameter Values
Avoid the Unnecessary
Original:
for (int i = 0; i < this.numDocuments; i++)
    for (int j = 0; j < this.numDocuments; j++) {
        double similarity = correlationMatrix.get(i, j);
        if (i < j && similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity)
New:
for (int i = 0; i < numDocuments; i++)
    for (int j = 0; j < numDocuments; j++) {
        if (i < j) {
            double similarity = correlationMatrix.get(i, j);
            if (similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity)
Make Absolutely Sure you Need I/O
Looping
for (int i = 0; i < numDocuments; i++)
    for (int j = 0; j < numDocuments; j++)
        if (i < j)
is faster than
for (int j = 0; j < numDocuments; j++)
    for (int i = 0; i < j; i++)
Likely faster due to reuse of register values and JITting
Don’t Get Fancy With Inner Loops
Looping
It was cheaper to re-run the operation instead of caching in a map:
private Set<Integer> getClusterSet(int rootIndex, int numDocuments) {
    Set<Integer> clusterSet = new LinkedHashSet<>();
    for (int i = 0; i < numDocuments; i++)
        if (this.clustersTree.findSet(i).index == rootIndex)
            clusterSet.add(i);
    return clusterSet;
}
Don't Cache Where it Won't Help
Looping
https://dzone.com/articles/java-collection-performance was pure gold!
Use the Fastest Collection for your Dominant Operation
Parallelization
What do you do when you need to parallelize something like this?
Original:
for (int i = 0; i < correlationMatrix2D.rows()-1; i++) {…}
New:
for (int i = 0; i < correlationMatrix2D.rows()-1; i++) {
    rowsForUpdating.add(i);
}
rowsForUpdating.parallelStream().forEach(i -> {…});
Shaved 10 seconds off the runtime.
If the number of rows is fixed, populate rowsForUpdating only once.
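An alternative to filling an index collection first is to stream the index range directly with IntStream.range, which is part of the standard library since Java 8. This is a swapped-in technique rather than what the deck does; `ParallelRows` and `rowSums` are hypothetical names, and a plain double[][] stands in for the correlation matrix.

```java
import java.util.stream.IntStream;

// Hypothetical sketch: parallelize an indexed for loop without building an index list.
final class ParallelRows {
    // Sums each row in parallel. Every index writes only its own slot in sums,
    // so the workers never contend for the same element.
    static double[] rowSums(double[][] matrix) {
        double[] sums = new double[matrix.length];
        IntStream.range(0, matrix.length).parallel().forEach(i -> {
            double sum = 0;
            for (double v : matrix[i]) sum += v;
            sums[i] = sum;
        });
        return sums;
    }

    public static void main(String[] args) {
        double[] sums = rowSums(new double[][] {{1, 2}, {3, 4}});
        System.out.println(sums[0] + " " + sums[1]); // 3.0 7.0
    }
}
```

Because the range is generated on the fly, there is nothing to populate or cache between runs.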
Need to Parallelize an Indexed For Loop?
Parallelization
When parallelizing processing, do it at the highest / outermost level possible.
This reduces the cost of context switching for the CPU.
Nesting parallelization was detrimental to runtime when I tried it: processing took longer and CPU utilization was higher.
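A minimal sketch of the "outermost level only" advice, assuming a nested pairwise loop like the ones earlier in the deck: the outer index range is streamed in parallel and the inner loop stays an ordinary sequential for loop. `OuterOnlyParallel` and `countPairs` are hypothetical names chosen for the illustration.

```java
import java.util.stream.IntStream;

// Hypothetical sketch: parallelize the outer loop, keep the inner loop sequential.
final class OuterOnlyParallel {
    // Counts pairs (i, j) with i < j whose values add up to target.
    static long countPairs(int[] values, int target) {
        return IntStream.range(0, values.length)
                .parallel() // the only parallel step: one task stream over the outer index
                .mapToLong(i -> {
                    long hits = 0;
                    // Inner loop is deliberately a plain loop, not another parallel stream.
                    for (int j = i + 1; j < values.length; j++) {
                        if (values[i] + values[j] == target) hits++;
                    }
                    return hits;
                })
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(countPairs(new int[] {1, 2, 3, 4}, 5)); // 2
    }
}
```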
Parallelize (only) at the Highest Level Possible
Parallelization
There is no ConcurrentHashSet class in java.util.concurrent.
Leverage the fact that a ConcurrentHashMap’s keys behave as a thread-safe set:
ConcurrentHashMap<Pair, Integer> calculatedClusters = new ConcurrentHashMap<>();
…
calculatedClusters.put(newPair, 0);
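Before Java 8, a ConcurrentHashMap was divided into segments (16 by default), each of which could be written to at the same time; since Java 8 it locks individual hash bins instead, so writes to different bins still proceed concurrently.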
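Since Java 8 the same idea is also available directly as ConcurrentHashMap.newKeySet(), which returns a Set view backed by a ConcurrentHashMap, so no dummy value is needed. The sketch below is an illustration with hypothetical names (`ConcurrentSetDemo`, `collectRemainders`) and uses Integer keys rather than the deck's Pair.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical sketch: a thread-safe set without a placeholder map value.
final class ConcurrentSetDemo {
    // Adds i % 10 for every i in [0, n) from multiple threads and returns the set.
    static Set<Integer> collectRemainders(int n) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // backed by a ConcurrentHashMap
        IntStream.range(0, n).parallel().forEach(i -> seen.add(i % 10));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectRemainders(1000).size()); // 10
    }
}
```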
Use a ConcurrentHashMap for the Key
Parallelization
• When performing a large volume of writes, use a non-blocking data structure.
• br.ufmg.aserg.topicviewer.util.Double2DMatrix is pretty amazing once I realized how it works
• Each matrix row has its own channel
• Each cell in the row has a calculated offset
• As a result, there is no contention
Non-Blocking Data Structures for Heavy I/O
Finally
-XX:+UseG1GC
Saved 2 seconds of runtime!
Use the G1 Garbage Collector
Finally
Java Mission Control is your new best friend
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
Profile Profile Profile
Finally
• 18 seconds
• Over 7X performance improvement (2:10 down to 0:18)
Final Runtime