Java Performance Tweaks

How I brought TopicViewer’s analysis runtime from 2:10 down to 18 seconds

29 June 2016

Repo: https://github.com/jimbethancourt/topic-viewer


Contents

Use a Faster Library

Avoid the Unnecessary

Looping

Parallelization

Finally


Use a Faster Library

Migrating from Colt to ParallelColt

• Swapping out off-the-shelf components is the first step you’ll want to take, and it will likely provide the most immediate gain.

• Replacing Colt with ParallelColt was relatively straightforward.

• There were no migration guides, but it helped that both libraries are open source and have similarly named classes.

• Reduced the runtime from 2:10 to 0:55.


Avoid the Unnecessary

Avoid Unnecessary Object Creation

Don’t use Objects when primitives will provide the same functionality.

Old:

clustersTree.findSet(new Vertex(i)).index == rootIndex

New:

clustersTree.findSet(i).index == rootIndex
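A minimal sketch of the idea, assuming a simplified disjoint-set forest: `Vertex`, `DisjointTree`, and `union` below are hypothetical stand-ins, not TopicViewer’s actual classes. Because the forest is indexed by `int` and its vertices are created once up front, a membership query never needs a throwaway `Vertex`:

```java
import java.util.ArrayList;
import java.util.List;

public class PrimitiveLookupDemo {
    // Hypothetical stand-in for a vertex: identified by its document index.
    static final class Vertex {
        final int index;
        Vertex parent;

        Vertex(int index) {
            this.index = index;
            this.parent = this; // every vertex starts as its own root
        }
    }

    // Hypothetical disjoint-set forest whose vertices are allocated once, up front.
    static final class DisjointTree {
        private final Vertex[] vertices;

        DisjointTree(int n) {
            vertices = new Vertex[n];
            for (int i = 0; i < n; i++) vertices[i] = new Vertex(i);
        }

        // Takes the primitive index, so callers never write findSet(new Vertex(i)).
        Vertex findSet(int i) {
            Vertex v = vertices[i];
            while (v.parent != v) {
                v.parent = v.parent.parent; // path halving keeps the trees shallow
                v = v.parent;
            }
            return v;
        }

        void union(int a, int b) {
            Vertex ra = findSet(a);
            Vertex rb = findSet(b);
            if (ra != rb) rb.parent = ra;
        }
    }

    public static void main(String[] args) {
        DisjointTree clustersTree = new DisjointTree(5);
        clustersTree.union(0, 2);
        clustersTree.union(2, 4);
        int rootIndex = clustersTree.findSet(4).index;
        List<Integer> members = new ArrayList<>();
        for (int i = 0; i < 5; i++)
            if (clustersTree.findSet(i).index == rootIndex) members.add(i); // the query itself allocates nothing
        System.out.println(members); // [0, 2, 4]
    }
}
```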


Avoid the Unnecessary

Use Object Arrays instead of Maps

Use Object arrays instead of Maps when you primarily read values and can index by number.

In DisjointTree:

Original:

Map<Vertex, Vertex> vertexMapping = new HashMap<Vertex, Vertex>();
Vertex findSet(Vertex v) {
    Vertex mapped = vertexMapping.get(v);
    …
}

New:

Vertex[] vertexMapping = new Vertex[numDocuments];
Vertex findSet(int v) {
    Vertex mapped = vertexMapping[v];
    …
}

Between using arrays and avoiding object creation, the runtime came down to 42 seconds (if I remember correctly).
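A small sketch contrasting the two lookups. It assumes the key is an `Integer` document number (the slide’s original map was keyed by `Vertex`), and the `Vertex` class and builder methods are hypothetical. Both structures return the same answers; the difference is that the map hashes a boxed key on every read, while the array read is a single indexed load:

```java
import java.util.HashMap;
import java.util.Map;

public class ArrayVsMapLookup {
    // Hypothetical vertex type; documents are numbered 0..numDocuments-1.
    static final class Vertex {
        final int index;

        Vertex(int index) {
            this.index = index;
        }
    }

    // Map-based mapping: each get() boxes the int key, hashes it, and may call equals().
    static Map<Integer, Vertex> buildMapMapping(int numDocuments) {
        Map<Integer, Vertex> mapping = new HashMap<>();
        for (int i = 0; i < numDocuments; i++) mapping.put(i, new Vertex(i));
        return mapping;
    }

    // Array-based mapping: the document number is the slot, so a read is one indexed load.
    static Vertex[] buildArrayMapping(int numDocuments) {
        Vertex[] mapping = new Vertex[numDocuments];
        for (int i = 0; i < numDocuments; i++) mapping[i] = new Vertex(i);
        return mapping;
    }

    public static void main(String[] args) {
        int numDocuments = 1_000;
        Map<Integer, Vertex> byMap = buildMapMapping(numDocuments);
        Vertex[] byArray = buildArrayMapping(numDocuments);

        // Same answers from both structures; only the cost of each lookup differs.
        boolean same = true;
        for (int i = 0; i < numDocuments; i++)
            same &= byMap.get(i).index == byArray[i].index;
        System.out.println(same); // true
    }
}
```

The array only works because the keys are dense, zero-based integers; sparse or non-numeric keys still call for a map.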


Avoid the Unnecessary

Avoid Unnecessary Operations

• Avoid unnecessary calls when a collection already performs the operation for you.

• Constantly calling collection.contains() added up, which surprised me.

• Avoid complex, object-heavy equals() operations when possible.

• Splitting the work into a producer and a consumer, and leveraging the natural properties of collections, cut 15 seconds off the runtime.

• See the updateCorrelationMatrix() method and Pair.equals().
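The slide doesn’t reproduce updateCorrelationMatrix() or Pair.equals(), so the following is only a hedged illustration of two of the points: `Set.add()` already reports whether an element was new (making a preceding `contains()` redundant), and an `equals()` that compares two `int`s is cheap. This `Pair` class is hypothetical, not TopicViewer’s:

```java
import java.util.HashSet;
import java.util.Set;

public class AvoidRedundantCalls {
    // Hypothetical unordered pair of document indices; equals() compares two ints, nothing more.
    static final class Pair {
        final int a;
        final int b;

        Pair(int x, int y) {
            this.a = Math.min(x, y); // normalize so (1, 2) and (2, 1) are the same pair
            this.b = Math.max(x, y);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Pair)) return false;
            Pair p = (Pair) o;
            return a == p.a && b == p.b;
        }

        @Override
        public int hashCode() {
            return 31 * a + b;
        }
    }

    // Set.add() returns false for a duplicate, so no separate contains() call is needed.
    static int countDistinct(int[][] candidates) {
        Set<Pair> seen = new HashSet<>();
        int distinct = 0;
        for (int[] c : candidates) {
            // Instead of: if (!seen.contains(p)) { seen.add(p); ... }, which hashes and compares twice
            if (seen.add(new Pair(c[0], c[1]))) distinct++;
        }
        return distinct;
    }

    public static void main(String[] args) {
        int[][] candidates = {{1, 2}, {2, 1}, {3, 4}, {1, 2}};
        System.out.println(countDistinct(candidates)); // 2
    }
}
```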


Avoid the Unnecessary

Use Method Parameter Values

• Use method parameter values when looping heavily.

• Parameters live in the method’s stack frame (and often in CPU registers once the JIT compiles the loop), whereas this.numDocuments is a field of an object on the heap.

• As a result, the loop bound doesn’t have to be re-read from the object on every iteration, a load the JIT can’t always hoist out of the loop.

• Shaved off a second or two.

Original:

private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, boolean force) {
    for (int i = 0; i < this.numDocuments; i++)
        for (int j = 0; j < this.numDocuments; j++)

New:

private int[] getLeastDissimilarPair(DoubleMatrix2D correlationMatrix, int numDocuments, boolean force) {
    for (int i = 0; i < numDocuments; i++)
        for (int j = 0; j < numDocuments; j++)
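A self-contained sketch of the two loop shapes, using a hypothetical class with a plain `double[][]` in place of `DoubleMatrix2D`. It doesn’t measure timing (that depends on the JIT and the surrounding code); it only shows that passing the bound as a parameter is a drop-in change with identical results:

```java
public class LocalLoopBound {
    private final int numDocuments;
    private final double[][] correlation;

    LocalLoopBound(double[][] correlation) {
        this.correlation = correlation;
        this.numDocuments = correlation.length;
    }

    // Loop bound read from the field through `this` (the JIT may or may not hoist the load).
    double sumUpperTriangleField() {
        double sum = 0;
        for (int i = 0; i < this.numDocuments; i++)
            for (int j = 0; j < this.numDocuments; j++)
                if (i < j) sum += correlation[i][j];
        return sum;
    }

    // Same loop with the bound passed in as a parameter, as on the slide.
    double sumUpperTriangleParam(int numDocuments) {
        double sum = 0;
        for (int i = 0; i < numDocuments; i++)
            for (int j = 0; j < numDocuments; j++)
                if (i < j) sum += correlation[i][j];
        return sum;
    }

    public static void main(String[] args) {
        double[][] m = {{0, 1, 2}, {1, 0, 3}, {2, 3, 0}};
        LocalLoopBound l = new LocalLoopBound(m);
        System.out.println(l.sumUpperTriangleField() == l.sumUpperTriangleParam(m.length)); // true
    }
}
```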


Avoid the Unnecessary

Make Absolutely Sure You Need I/O

Only read a matrix cell once you know you will use it. The original read every cell, including the diagonal and lower triangle that the i < j test then throws away.

Original:

for (int i = 0; i < this.numDocuments; i++)
    for (int j = 0; j < this.numDocuments; j++) {
        double similarity = correlationMatrix.get(i, j);
        if (i < j && similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity)

New:

for (int i = 0; i < numDocuments; i++)
    for (int j = 0; j < numDocuments; j++) {
        if (i < j) {
            double similarity = correlationMatrix.get(i, j);
            if (similarity != Double.NEGATIVE_INFINITY && similarity > bestSimilarity)
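To make the saving visible, here is a sketch with a hypothetical read-counting matrix wrapper standing in for a store where each `get()` may be expensive. Both versions find the same best pair, but checking `i < j` first cuts the reads from n² to n(n−1)/2:

```java
public class GuardBeforeRead {
    // Hypothetical matrix wrapper that counts reads; stands in for a store where get() may be costly.
    static final class CountingMatrix {
        private final double[][] data;
        int reads = 0;

        CountingMatrix(double[][] data) {
            this.data = data;
        }

        double get(int i, int j) {
            reads++;
            return data[i][j];
        }
    }

    // Original shape: read first, then test i < j, so every cell is read.
    static int[] bestPairEager(CountingMatrix m, int n) {
        double best = Double.NEGATIVE_INFINITY;
        int[] pair = {-1, -1};
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double similarity = m.get(i, j);
                if (i < j && similarity != Double.NEGATIVE_INFINITY && similarity > best) {
                    best = similarity;
                    pair = new int[] {i, j};
                }
            }
        return pair;
    }

    // New shape: test the cheap index condition first, so only upper-triangle cells are read.
    static int[] bestPairGuarded(CountingMatrix m, int n) {
        double best = Double.NEGATIVE_INFINITY;
        int[] pair = {-1, -1};
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i < j) {
                    double similarity = m.get(i, j);
                    if (similarity != Double.NEGATIVE_INFINITY && similarity > best) {
                        best = similarity;
                        pair = new int[] {i, j};
                    }
                }
            }
        return pair;
    }

    // Sample data: cell (i, j) holds i * 10 + j.
    static double[][] sample(int n) {
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) d[i][j] = i * 10 + j;
        return d;
    }

    public static void main(String[] args) {
        CountingMatrix eager = new CountingMatrix(sample(4));
        CountingMatrix guarded = new CountingMatrix(sample(4));
        int[] a = bestPairEager(eager, 4);
        int[] b = bestPairGuarded(guarded, 4);
        System.out.println(a[0] + "," + a[1] + " " + b[0] + "," + b[1]); // 2,3 2,3
        System.out.println(eager.reads + " vs " + guarded.reads);          // 16 vs 6
    }
}
```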


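Looping

Don’t Get Fancy With Inner Loops

for (int i = 0; i < numDocuments; i++)
    for (int j = 0; j < numDocuments; j++)
        if (i < j)

is faster than

for (int j = 0; j < numDocuments; j++)
    for (int i = 0; i < j; i++)

Likely faster due to reuse of register values and JIT compilation. Another possible factor: the second version varies the row index i in its inner loop, so it walks down columns rather than along rows, which is less cache-friendly for row-major matrix storage.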
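The sketch below doesn’t time anything; it only records the (i, j) pairs each loop shape from this slide visits. Both cover exactly the same pairs, but in a different order: the guarded square loop walks row by row, while the triangular loop with j outermost walks column by column. Method names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class LoopOrder {
    // First form: square loops plus an i < j guard; j varies fastest, so pairs come row by row.
    static List<String> guardedOrder(int n) {
        List<String> visited = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (i < j) visited.add(i + "," + j);
        return visited;
    }

    // Second form: triangular loops with j outermost; i varies fastest, so pairs come column by column.
    static List<String> triangularOrder(int n) {
        List<String> visited = new ArrayList<>();
        for (int j = 0; j < n; j++)
            for (int i = 0; i < j; i++)
                visited.add(i + "," + j);
        return visited;
    }

    public static void main(String[] args) {
        List<String> a = guardedOrder(4);
        List<String> b = triangularOrder(4);
        System.out.println(new HashSet<>(a).equals(new HashSet<>(b))); // true: identical set of pairs
        System.out.println(a); // [0,1, 0,2, 0,3, 1,2, 1,3, 2,3]
        System.out.println(b); // [0,1, 0,2, 1,2, 0,3, 1,3, 2,3]
    }
}
```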


Looping

Don’t Cache Where It Won’t Help

It was cheaper to re-run the operation than to cache its results in a map:

private Set<Integer> getClusterSet(int rootIndex, int numDocuments) {
    Set<Integer> clusterSet = new LinkedHashSet<>();
    for (int i = 0; i < numDocuments; i++)
        if (this.clustersTree.findSet(i).index == rootIndex)
            clusterSet.add(i);
    return clusterSet;
}


Looping

Use the Fastest Collection for Your Dominant Operation

https://dzone.com/articles/java-collection-performance was pure gold!
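The linked article isn’t reproduced here. As a rough illustration of matching the collection to the operation you perform most, using only standard `java.util` classes and their well-known complexity characteristics (the helper method names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DominantOperation {
    // contains() dominates: HashSet.contains is O(1) on average, List.contains is O(n).
    static Set<Integer> forMembership(Collection<Integer> ids) {
        return new HashSet<>(ids);
    }

    // contains() dominates but insertion order also matters: LinkedHashSet keeps both properties.
    static Set<Integer> forOrderedMembership(Collection<Integer> ids) {
        return new LinkedHashSet<>(ids);
    }

    // get(index) dominates: ArrayList.get is O(1), LinkedList.get walks the nodes (O(n)).
    static List<Integer> forIndexedReads(Collection<Integer> ids) {
        return new ArrayList<>(ids);
    }

    // Adding/removing at the ends dominates: ArrayDeque does this in amortized O(1) without per-node objects.
    static Deque<Integer> forQueueing(Collection<Integer> ids) {
        return new ArrayDeque<>(ids);
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(5, 3, 8, 3);
        System.out.println(forMembership(ids).contains(8)); // true
        System.out.println(forOrderedMembership(ids));      // [5, 3, 8]
        System.out.println(forIndexedReads(ids).get(1));    // 3
        System.out.println(forQueueing(ids).pollFirst());   // 5
    }
}
```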


Parallelization

Need to Parallelize an Indexed For Loop?

What do you do when you need to parallelize something like this?

for (int i = 0; i < correlationMatrix2D.rows()-1; i++) {…}

Collect the indices first, then process them with a parallel stream:

for (int i = 0; i < correlationMatrix2D.rows()-1; i++) {
    rowsForUpdating.add(i);
}
rowsForUpdating.parallelStream().forEach(i -> {…});

This shaved 10 seconds off the runtime. If the number of rows is fixed, populate rowsForUpdating only once.
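A runnable sketch of the slide’s approach next to a standard-library alternative, `IntStream.range(...).parallel()`, which avoids building a boxed index list. The per-row work (`updateRow`) is a hypothetical placeholder; it is only safe to run in parallel because each task writes to its own row. Both versions mirror the slide’s `rows()-1` bound, so the last row is left untouched:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class ParallelIndexedLoop {
    // Hypothetical per-row work: each task writes only its own row, so rows update independently.
    static void updateRow(double[][] m, int i) {
        for (int j = 0; j < m[i].length; j++) m[i][j] = i * j;
    }

    // The slide's approach: materialize the row indices, then parallelStream() over them.
    static void viaIndexList(double[][] m) {
        List<Integer> rowsForUpdating = new ArrayList<>();
        for (int i = 0; i < m.length - 1; i++) rowsForUpdating.add(i);
        rowsForUpdating.parallelStream().forEach(i -> updateRow(m, i));
    }

    // A standard-library alternative: IntStream.range skips the boxed list entirely.
    static void viaIntStream(double[][] m) {
        IntStream.range(0, m.length - 1).parallel().forEach(i -> updateRow(m, i));
    }

    public static void main(String[] args) {
        double[][] a = new double[5][4];
        double[][] b = new double[5][4];
        viaIndexList(a);
        viaIntStream(b);
        System.out.println(Arrays.deepEquals(a, b)); // true
    }
}
```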


Parallelization

Parallelize (Only) at the Highest Level Possible

When parallelizing processing, do it at the highest (outermost) level possible. This reduces the task-scheduling and context-switching overhead on the CPU. When I tried nesting parallelization, it hurt: processing took longer and CPU utilization was higher.
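A minimal sketch of “parallelize the outer loop, keep the inner loop sequential”, using row sums as a hypothetical stand-in for the real per-row work. The nested variant is shown only as a comment:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class OutermostParallelism {
    // Parallelize only the outer loop; each task runs its inner loop sequentially.
    static double[] rowSums(double[][] m) {
        double[] sums = new double[m.length];
        IntStream.range(0, m.length).parallel().forEach(i -> {
            double s = 0;
            for (int j = 0; j < m[i].length; j++) s += m[i][j]; // plain sequential inner loop
            sums[i] = s; // each task writes only its own slot, so no locking is needed
        });
        return sums;
    }

    // The nested shape to avoid would look like this (comment only):
    //   IntStream.range(0, m.length).parallel().forEach(i ->
    //       sums[i] = IntStream.range(0, m[i].length).parallel().mapToDouble(j -> m[i][j]).sum());
    // Both levels then split work into tasks on the same common ForkJoinPool,
    // adding scheduling overhead without adding cores.

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        System.out.println(Arrays.toString(rowSums(m))); // [3.0, 7.0]
    }
}
```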


Parallelization

Use ConcurrentHashMap Keys as a Concurrent Set

There is no ConcurrentHashSet class in java.util.concurrent. Instead, leverage the fact that a ConcurrentHashMap’s keys already form a set, and store a placeholder value:

ConcurrentHashMap<Pair, Integer> calculatedClusters = new ConcurrentHashMap<>();

calculatedClusters.put(newPair, 0);

In Java 8 and later, ConcurrentHashMap no longer uses a fixed set of 16 segments; it locks individual hash bins (and uses CAS for empty ones), so threads writing different keys rarely block each other.
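A sketch of the slide’s map-as-set trick next to `ConcurrentHashMap.newKeySet()` (Java 8+), which returns a `Set` view backed by a `ConcurrentHashMap` directly. `Integer` keys stand in for the slide’s `Pair`, and the workload (10,000 parallel inserts of 100 distinct keys) is made up for illustration:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ConcurrentSetDemo {
    // The slide's approach: only the keys matter; 0 is a placeholder value.
    static int viaMapKeys(int n) {
        ConcurrentHashMap<Integer, Integer> calculated = new ConcurrentHashMap<>();
        IntStream.range(0, n).parallel().forEach(i -> calculated.put(i % 100, 0));
        return calculated.size();
    }

    // Java 8+ alternative: newKeySet() returns a Set view backed by a ConcurrentHashMap.
    static int viaNewKeySet(int n) {
        Set<Integer> calculated = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n).parallel().forEach(i -> calculated.add(i % 100));
        return calculated.size();
    }

    public static void main(String[] args) {
        // 10,000 concurrent inserts of only 100 distinct keys leave exactly 100 entries.
        System.out.println(viaMapKeys(10_000));   // 100
        System.out.println(viaNewKeySet(10_000)); // 100
    }
}
```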


Non-Blocking Data Structures for Heavy I/O

• When performing a large volume of writes, use a non-blocking data structure.

• br.ufmg.aserg.topicviewer.util.Double2DMatrix is pretty amazing once I realized how it worked:

• Each matrix row has its own channel.

• Each cell in the row has a calculated offset.

• As a result, there is no contention.
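The slide doesn’t show Double2DMatrix’s code, so the following is only a hypothetical reconstruction of the idea it describes, not the real class: one `FileChannel` (backed by a temp file) per row, a computed byte offset per cell, and positional reads and writes that never move a shared channel cursor. The class and method names are invented for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.IntStream;

public class RowChannelMatrix implements AutoCloseable {
    private final FileChannel[] rows; // one channel (and backing temp file) per row
    private final Path[] files;
    private final int cols;

    RowChannelMatrix(int numRows, int cols) throws IOException {
        this.cols = cols;
        this.rows = new FileChannel[numRows];
        this.files = new Path[numRows];
        for (int r = 0; r < numRows; r++) {
            files[r] = Files.createTempFile("row" + r + "-", ".bin");
            rows[r] = FileChannel.open(files[r], StandardOpenOption.READ, StandardOpenOption.WRITE);
        }
    }

    // Cell (r, c) lives at byte offset c * 8 within row r's file.
    private long offset(int c) {
        if (c < 0 || c >= cols) throw new IndexOutOfBoundsException("column " + c);
        return (long) c * Double.BYTES;
    }

    // Positional write: it doesn't move the channel's own position, so there is no shared cursor.
    void set(int r, int c, double value) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES).putDouble(value);
        buf.flip();
        long pos = offset(c);
        while (buf.hasRemaining()) pos += rows[r].write(buf, pos);
    }

    double get(int r, int c) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES);
        long pos = offset(c);
        while (buf.hasRemaining()) {
            int n = rows[r].read(buf, pos);
            if (n < 0) throw new IOException("cell (" + r + ", " + c + ") was never written");
            pos += n;
        }
        buf.flip();
        return buf.getDouble();
    }

    @Override
    public void close() throws IOException {
        for (int r = 0; r < rows.length; r++) {
            rows[r].close();
            Files.deleteIfExists(files[r]);
        }
    }

    public static void main(String[] args) throws IOException {
        try (RowChannelMatrix m = new RowChannelMatrix(4, 4)) {
            // Each parallel task owns one row, so no two threads ever write to the same channel.
            IntStream.range(0, 4).parallel().forEach(r -> {
                try {
                    for (int c = 0; c < 4; c++) m.set(r, c, r * 10 + c);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            System.out.println(m.get(2, 3)); // 23.0
        }
    }
}
```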


Finally

Use the G1 Garbage Collector

-XX:+UseG1GC

Saved 2 seconds of runtime!
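A quick, hedged way to confirm the flag actually took effect: list the JVM’s garbage collector MXBeans. Under `-XX:+UseG1GC` the names typically start with “G1” (for example “G1 Young Generation”), though the exact set varies by JDK version:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class WhichGc {
    // Names of the collectors the running JVM is actually using.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
            names.add(gc.getName());
        return names;
    }

    public static void main(String[] args) {
        // Run with: java -XX:+UseG1GC WhichGc
        System.out.println(collectorNames());
    }
}
```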


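Finally

Profile, Profile, Profile

Java Mission Control is your new best friend:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

(These flags apply to Oracle JDK 8. Since JDK 11, Flight Recorder ships with OpenJDK and no longer requires them.)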


Final Runtime

• 18 seconds

• Over 7X performance improvement (2:10 is 130 seconds, and 130 / 18 ≈ 7.2)