Notes on concurrency. Resources Java concurrency in practice – Practice-Brian-Goetz/dp/0321349601

Notes on concurrency

Resources

• Java Concurrency in Practice – http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601

• Articles by Brian Goetz
– http://www.ibm.com/developerworks/java/library/j-jtp02244/index.html
– http://www.ibm.com/developerworks/library/j-jtp03304/


Java Memory Model

• Based on happens-before
– Each action in a thread happens-before every action in that thread that comes later in the program order
– An unlock on a monitor happens-before every subsequent lock on that same monitor
– A write to a volatile field happens-before every subsequent read of that same volatile
– A call to Thread.start() on a thread happens-before any actions in the started thread
– All actions in a thread happen-before any other thread successfully returns from a Thread.join() on that thread
• In all other cases, the Java VM is free to reorder execution
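A minimal sketch of three of the rules above (program order, volatile write/read, Thread.start()/Thread.join()). The class and method names are made up for illustration; the point is that the plain fields are read without any lock and are still guaranteed to hold the values written by the other thread.

```java
class HappensBeforeDemo {
    public static void main(String[] args) {
        System.out.println(publishViaVolatile()); // 42
        System.out.println(publishViaJoin());     // 7
    }

    private static int payload;            // plain field
    private static volatile boolean ready; // volatile flag

    // Program order + volatile rule: the writer sets payload before the volatile
    // write, so a reader that sees ready == true must also see payload == 42.
    static int publishViaVolatile() {
        payload = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            payload = 42;  // happens-before the volatile write below (program order)
            ready = true;  // volatile write
        });
        writer.start();
        while (!ready) {   // volatile read; loops until it sees the write
            Thread.yield();
        }
        return payload;    // cannot be the stale 0
    }

    // start() and join() rules: no volatile or lock needed on the array.
    static int publishViaJoin() {
        int[] box = new int[1];
        Thread worker = new Thread(() -> box[0] = 7);
        worker.start();    // actions before start() are visible to the worker
        try {
            worker.join(); // the worker's actions are visible after join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return box[0];
    }
}
```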

Java Memory Model

• Each action on thread A happens-before the next action on thread A

• The unlock of M on thread A happens-before the subsequent lock of M on thread B

• Each action on thread B happens-before the next action on thread B

• At a low level (conceptually): unlock M flushes the cache, lock M invalidates the cache

• This is how the JVM thinks, and reasoning at this level is very taxing. For humans, it’s better to have some constructs built on top of it, to use in the common cases.

Visibility and atomicity

• These are the only two properties that you have to think about
– Visibility: whether the changes made on one thread are visible to another (i.e. whether you see the latest value or a cached one)
– Atomicity: whether you can see changes at “inopportune times” (i.e. halfway through)
• Locks are used to guarantee both these properties
– Which means that sometimes you have to balance them (i.e. to make a change visible, the atomic operation needs to end)
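A minimal sketch of the difference between the two properties, assuming a counter shared by several threads (class and method names are hypothetical). Marking the field volatile fixes visibility but not atomicity, because count++ is a read, an add and a write that other threads can interleave with; java.util.concurrent.atomic.AtomicInteger makes the whole increment atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicityDemo {
    public static void main(String[] args) {
        System.out.println("plain:  " + countWithPlainField(4, 100_000));
        System.out.println("atomic: " + countWithAtomic(4, 100_000));
    }

    // volatile gives visibility, but plainCount++ is still read-add-write:
    // two threads can read the same value and one update is lost.
    private static volatile int plainCount;

    static int countWithPlainField(int threads, int perThread) {
        plainCount = 0;
        runConcurrently(threads, () -> {
            for (int i = 0; i < perThread; i++) plainCount++;
        });
        return plainCount; // often less than threads * perThread
    }

    static int countWithAtomic(int threads, int perThread) {
        AtomicInteger count = new AtomicInteger();
        runConcurrently(threads, () -> {
            for (int i = 0; i < perThread; i++) count.incrementAndGet();
        });
        return count.get(); // always threads * perThread
    }

    private static void runConcurrently(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The plain counter will often, but not always, come out short; how often depends on the machine, which is what makes this kind of bug hard to reproduce.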

AVOIDING CONCURRENCY

Avoid concurrency

• Locks solve visibility and atomicity, at the cost of introducing a number of other problems

• Avoid concurrency whenever possible!
• Let’s look at some patterns

Final static initialization

• Class initialization guarantees that static fields are properly initialized before any thread can use them
– Useful for one-time initializations (singletons, …)

• See also “Initialization on Demand Holder”: http://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom

public class Foo {
    private static final Foo defaultFoo = createDefaultFoo();

    private static Foo createDefaultFoo() { ... }

    public static Foo getFoo() { return defaultFoo; }
}

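A sketch of the Initialization on Demand Holder idiom linked above, reusing the hypothetical Foo/createDefaultFoo names from the example. Unlike the eager version, createDefaultFoo() runs only when getFoo() is first called, because the nested Holder class is initialized only on first use; the JVM's class-initialization guarantees make that thread-safe without an explicit lock. The creation counter exists only so the behavior can be checked.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HolderDemo {
    public static void main(String[] args) {
        System.out.println(Foo.getFoo() == Foo.getFoo()); // true
        System.out.println(Foo.creations());              // 1
    }
}

class Foo {
    // Counts calls to createDefaultFoo(), for illustration only
    private static final AtomicInteger CREATIONS = new AtomicInteger();

    private Foo() { }

    private static Foo createDefaultFoo() {
        CREATIONS.incrementAndGet();
        return new Foo();
    }

    // Holder is not initialized when Foo is, only when getFoo() first reads
    // Holder.DEFAULT; the JVM runs that initialization once, under its own
    // class-initialization lock, and publishes the result to all threads.
    private static final class Holder {
        static final Foo DEFAULT = createDefaultFoo();
    }

    public static Foo getFoo() {
        return Holder.DEFAULT;
    }

    static int creations() {
        return CREATIONS.get();
    }
}
```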

Immutability

• No visibility issues if the state does not change
– Final fields are guaranteed to be initialized for all threads as long as the constructor does not leak the this reference

public class Foo { private final String name;

public Foo(String name) { this.name = name; }

public String getName() { return name; }

}

Immutability

• Make sure the fields themselves are immutable

• If they are not immutable, in general you have to protect them from changes
– The creator of the object may retain a reference and change the data
– The user of the object may change the data through the returned reference

import java.util.Date;
import java.util.List;

public class Foo {
    private final String name;             // Immutable
    private final Date date;               // Not immutable!
    private final List<String> properties; // Not immutable!
    // constructor and getters omitted
}
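A minimal sketch of protecting a java.util.Date field from both sides described above: copy it when it comes in (so the creator's reference is disconnected) and copy it when it goes out (so the caller's reference is disconnected). The Appointment class is hypothetical.

```java
import java.util.Date;

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Date original = new Date(0);
        Appointment appointment = new Appointment("kickoff", original);
        original.setTime(1_000);              // the creator changes its Date...
        appointment.getDate().setTime(2_000); // ...and a user changes the returned one
        System.out.println(appointment.getDate().getTime()); // 0: neither change leaked in
    }
}

// Hypothetical immutable class holding a mutable java.util.Date.
final class Appointment {
    private final String name; // String is immutable: safe to share
    private final Date date;   // Date is mutable: copy it on the way in and on the way out

    Appointment(String name, Date date) {
        this.name = name;
        this.date = new Date(date.getTime()); // the creator keeps no handle on our copy
    }

    String getName() {
        return name;
    }

    Date getDate() {
        return new Date(date.getTime()); // callers get their own copy
    }
}
```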

Immutability

• If reasonable and possible, make an immutable copy

• It may not be possible
– Too expensive (e.g. large arrays or large collections)
– No immutable options (e.g. java.util.Date)

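import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Foo {
    private final List<String> properties; // Private copy, read-only

    public Foo(List<String> properties) {
        // Copy first: the creator's reference can no longer change our state
        this.properties = Collections.unmodifiableList(new ArrayList<>(properties));
    }
}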


• If a copy is not possible, wrap the original in an unmodifiable view and add a warning to the javadocs

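import java.util.Collections;
import java.util.List;

public class Foo {
    private final List<String> properties; // Not immutable!

    /**
     * Warning: the list is not copied. Callers must not modify it
     * after passing it to this constructor.
     */
    public Foo(List<String> properties) {
        this.properties = Collections.unmodifiableList(properties);
    }
}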
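A small sketch of why the warning is needed: Collections.unmodifiableList only prevents changes made through the returned view, while whoever still holds the original list can change what the view shows. Copying first removes that back door, at the cost of O(n) time and memory. viewOf and copyOf are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ViewVersusCopy {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = viewOf(source);
        List<String> copy = copyOf(source);
        source.add("c");          // the creator still holds, and changes, the original
        System.out.println(view); // [a, b, c]
        System.out.println(copy); // [a, b]
    }

    // Cheap (no copying), but read-only only for *our* users: whoever holds
    // `source` can still change what the view shows. Needs the javadoc warning.
    static <T> List<T> viewOf(List<T> source) {
        return Collections.unmodifiableList(source);
    }

    // Costs a copy, but nobody else holds a reference to the contents.
    static <T> List<T> copyOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```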

Immutability

• You can still use lazy initialization for some fields, as long as the initialization is idempotent
– It can happen that two threads will both run it

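public class BigDataset {
    private Double average; // Not volatile: tolerable because Double is immutable

    public double getAverage() {
        Double result = average; // Read the field once: a second read could still see null
        if (result == null) {
            result = calculateAverage(); // Idempotent, may run in more than one thread
            average = result;
        }
        return result;
    }

    private double calculateAverage() { ... }
}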
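A runnable sketch of the lazy field above, assuming the dataset is a double[] and the average is the arithmetic mean (both assumptions, since the slide leaves calculateAverage() open). It calls getAverage() from several threads at once: they all get the same value, while the computation counter shows the calculation may have run more than once, which is why it must be idempotent. The racy read is safe only because Double is immutable (its value lives in a final field).

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyAverageDemo {
    public static void main(String[] args) {
        BigDataset data = new BigDataset(new double[] {1, 2, 3, 4});
        System.out.println(averageFromThreads(data, 8)); // 2.5
        System.out.println(data.computations() >= 1);     // true (may be more than 1)
    }

    // Every thread must get the same answer, however many of them computed it.
    static double averageFromThreads(BigDataset data, int threads) {
        double[] results = new double[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int slot = t;
            workers[t] = new Thread(() -> results[slot] = data.getAverage());
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // makes results[slot] visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        for (double r : results) {
            if (r != results[0]) throw new IllegalStateException("threads disagree");
        }
        return results[0];
    }
}

class BigDataset {
    private final double[] values;
    private final AtomicInteger computations = new AtomicInteger(); // for illustration
    private Double average; // racy on purpose, see getAverage()

    BigDataset(double[] values) {
        this.values = values.clone();
    }

    public double getAverage() {
        Double result = average;         // read the shared field exactly once
        if (result == null) {
            result = calculateAverage(); // idempotent: may run on several threads
            average = result;
        }
        return result;
    }

    int computations() {
        return computations.get();
    }

    private double calculateAverage() {
        computations.incrementAndGet();
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }
}
```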

Immutability

• Don’t assume performance of immutable objects is bad
– Java memory management handles allocation and de-allocation very well: no memory fragmentation issues, and the cost of short-lived objects is similar to stack allocation
– Use of final fields allows extra optimizations
– Escape analysis may optimize away object creation

Use of local variables

• Sometimes we use field variables to break some computation into smaller pieces

public class Foo {
    private State state; // Not thread-safe

    public Result doSomething(Input input) {
        preprocessInput(input);
        refine();
        return calculateFinalResult();
    }
}

Use of local variables

• Use local variables instead
– If not leaked, each concurrent thread will have its own state
• See org.diirt.util.text.CsvParser

public class Foo {
    public Result doSomething(Input input) {
        State state = new State();
        state.preprocessInput(input);
        state.refine();
        return state.calculateFinalResult();
    }
}

Thread confinement

• If all the computation is done within a single thread, no locking is necessary

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class Facade {
    private final Executor exec = Executors.newSingleThreadExecutor();
    private final BigData bigData = new BigData(); // Only ever accessed on exec's thread

    public void doSomethingLongAndComplicated(Input input) {
        exec.execute(() -> bigData.doSomethingComplicated(input));
    }
}
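A runnable sketch of the same idea, with a hypothetical ConfinedLog standing in for BigData: the ArrayList is only ever touched by the executor's single thread, so it needs no lock even though append() and size() are called from several threads. Queries go through the same queue, and Future.get() brings the answer back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConfinementDemo {
    public static void main(String[] args) throws InterruptedException {
        ConfinedLog log = new ConfinedLog();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            int id = i;
            producers[i] = new Thread(() -> log.append("entry from producer " + id));
            producers[i].start();
        }
        for (Thread p : producers) {
            p.join(); // after this, every append() has been queued
        }
        System.out.println(log.size()); // 4
        log.shutdown();
    }
}

class ConfinedLog {
    // Only ever touched by the executor's single thread, so a plain,
    // non-thread-safe ArrayList is fine and no lock is needed.
    private final List<String> entries = new ArrayList<>();
    private final ExecutorService exec = Executors.newSingleThreadExecutor();

    // May be called from any thread: the change is queued, not applied here.
    void append(String entry) {
        exec.execute(() -> entries.add(entry));
    }

    // Reads are queued too. Tasks run one at a time in submission order,
    // so this sees every append() submitted before it.
    int size() {
        Callable<Integer> query = entries::size;
        try {
            return exec.submit(query).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        exec.shutdown(); // already-queued tasks still run
    }
}
```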

Thread confinement

• Scheduling tasks on an Executor/ExecutorService is a proper way of publishing changes
– From the Executor javadocs: “Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.”
– From the ExecutorService javadocs: “Memory consistency effects: Actions in a thread prior to the submission of a Runnable or Callable task to an ExecutorService happen-before any actions taken by that task, which in turn happen-before the result is retrieved via Future.get().”

http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility

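A minimal sketch of the ExecutorService guarantee quoted above: a plain int[] (no volatile, no lock) is written before submission, updated by the task on a pool thread, and read after Future.get(). The method name is hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorPublicationDemo {
    public static void main(String[] args) {
        System.out.println(doubleOnPool(21)); // 42
    }

    static int doubleOnPool(int input) {
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            int[] box = new int[1];
            box[0] = input;          // happens-before the task starts (submission)
            Future<?> done = exec.submit(() -> {
                box[0] = box[0] * 2; // sees input; its write happens-before get() returns
            });
            done.get();
            return box[0];           // guaranteed to see the doubled value
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            exec.shutdown();
        }
    }
}
```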

Thread confinement

• Thread confinement is not a “cop-out”, but a proper technique to use when:
– Operations are embarrassingly parallel and you can scale them horizontally (e.g. web servers have one thread per request)
  • I use this for graphene: the rendering of each plot is thread confined
– A subsystem is too tightly coupled together and little can actually be parallelized (e.g. a UI toolkit)

UI toolkits

• They are typically thread confined: all state must be accessed in one thread
– The subsystem is usually layered (low-level input/output handling, rendering engine, widgets, models, aggregates of widgets, …)
– Events can be triggered from the low level (e.g. user input) or the high level (e.g. model changes, animations)
– The result is a high chance of taking locks in reverse order (deadlocks)

UI toolkits

• Further complication: the UI thread is precious and should not be blocked for an unspecified period of time
– Don’t run long operations on the UI thread
  • My rule of thumb:
    • >5 ms – need a different thread
    • >50 ms – need a wait cursor
    • >0.5 sec – need a progress bar
    • >5 sec – need a cancel button
– Don’t access the network on the UI thread
– Don’t access files on the UI thread
– Don’t use locks on the UI thread
  • You need to be 100% sure of what happens on the other side of the lock

UI toolkits

• Another reason not to use locks and multi-threading on UI threads: stale state
– The UI always needs to be displaying the current state
– Therefore, even if the state (not thread confined) is well synchronized, you still need to trigger an event on the UI thread to make sure the display matches
  • And you’ll have a window in which the state has changed but the UI is behind
– Better to transfer the state when you are dispatching the event: it makes sure the processing happens on the state you want

UI toolkits

• Useful technique: immutable view on mutable model
– Suppose you have a complicated and sizeable model, on which you may want to execute “long” operations (e.g. regular expression substitution for the names of the nodes of a tree of 10,000 elements)
– The UI does not need to reflect the model as it changes: only the initial and final state
– The undo function also needs only the initial and the final state, and so do save, revert, …
– You can make the model thread-safe and mutable, and make it provide “checkpoints”: immutable views before and after the changes
  • Most of the time, most of the model will not change, so most of the new immutable view will point to the same immutable objects as the previous one (e.g. for a tree, the root node will always change, along with the few paths that changed, but most of the nodes will stay the same)
– UI actions will queue operations on some other thread or thread pool, and those operations will dispatch their result as an immutable view
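A minimal sketch of the checkpoint idea, assuming a tree of named nodes where the long operation is a rename (Node, TreeModel and rename are hypothetical names). Each node is immutable, so a checkpoint is just a reference to a root; a change copies only the path from the root to the changed node and shares everything else with the previous checkpoint, which keeps it cheap for a large tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CheckpointDemo {
    public static void main(String[] args) {
        Node root = new Node("root", Arrays.asList(
                new Node("a", Collections.emptyList()),
                new Node("b", Collections.emptyList())));
        TreeModel model = new TreeModel(root);
        Node before = model.checkpoint();
        Node after = model.rename(new int[] {1}, "b2");
        System.out.println(before.children.get(1).name); // b
        System.out.println(after.children.get(1).name);  // b2
        System.out.println(before.children.get(0) == after.children.get(0)); // true: shared
    }
}

// Immutable tree node: a "checkpoint" is simply a reference to a root Node.
final class Node {
    final String name;
    final List<Node> children;

    Node(String name, List<Node> children) {
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    // Returns a new tree in which the node at `path` (child indexes, starting
    // at `depth`) is renamed. Only the nodes along the path are copied; every
    // other subtree is shared with the previous checkpoint.
    Node rename(int[] path, int depth, String newName) {
        if (depth == path.length) {
            return new Node(newName, children);
        }
        List<Node> newChildren = new ArrayList<>(children);
        int index = path[depth];
        newChildren.set(index, children.get(index).rename(path, depth + 1, newName));
        return new Node(name, newChildren);
    }
}

// Thread-safe mutable model: changes are serialized by the lock, and each one
// publishes a new immutable checkpoint that readers (e.g. the UI) use without locking.
class TreeModel {
    private final Object lock = new Object();
    private volatile Node current; // written only under lock; volatile for lock-free reads

    TreeModel(Node root) {
        this.current = root;
    }

    Node checkpoint() {
        return current;
    }

    Node rename(int[] path, String newName) {
        synchronized (lock) {
            current = current.rename(path, 0, newName);
            return current;
        }
    }
}
```

The old checkpoint stays valid and unchanged, so it can be handed to the UI, kept for undo, or saved, while the model moves on.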

THE SYNCHRONIZED KEYWORD

Synchronized keyword

• I’ll assume everybody is already familiar with the synchronized keyword
– It allows you to create critical sections
– Changes made in one section are visible to other sections with the same lock
• In retrospect, it was not a great idea to add that support to the base Object class
• We are going to go through some best practices and some unintended consequences in some common scenarios

Minimize critical sections

• If you can, limit the size of the critical section (synchronized block):
– The longer you hold the lock, the less parallelism you’ll have (I never really had this problem)
– You want to avoid calling other code (especially code that you do not control): it may take other locks, and it may cause deadlocks

Ideas to minimize critical sections

• Pre-calculate the difference outside the critical section, then apply it inside

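public class Statistics {
    private final Object lock = new Object();
    private double max = Double.NEGATIVE_INFINITY; // Guarded by lock

    public void addData(double[] data) {
        double newMax = Double.NEGATIVE_INFINITY;
        for (double value : data) newMax = Math.max(newMax, value); // No lock held
        synchronized (lock) { max = Math.max(max, newMax); }        // Short critical section
    }
}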

Ideas to minimize critical sections

• Pre-calculate the action based on the state in the critical section, and execute the action outside
– Good in cases where you just need the “decision” to be serialized

public class BusinessLogic {
    public void consumeEvent(Event event) {
        Action action;
        synchronized (lock) {
            action = whatToDo(this.state, event);
        }
        switch (action) { ... }
    }
}
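A runnable sketch of this pattern, using a hypothetical threshold alarm in place of the elided whatToDo()/switch: the state transition (is the alarm already raised?) is decided under the lock, and the listener, which is foreign code, is called only after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class AlarmDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Alarm alarm = new Alarm(10.0, log::add);
        alarm.consume(5.0);
        alarm.consume(12.0);
        alarm.consume(15.0);
        alarm.consume(3.0);
        System.out.println(log); // [RAISED at 12.0, CLEARED at 3.0]
    }
}

// Hypothetical example: raises/clears an alarm when a value crosses a threshold.
class Alarm {
    enum Action { RAISE, CLEAR, NOTHING }

    private final Object lock = new Object();
    private final double threshold;
    private final Consumer<String> listener; // foreign code: never called under the lock
    private boolean raised;                  // Guarded by lock

    Alarm(double threshold, Consumer<String> listener) {
        this.threshold = threshold;
        this.listener = listener;
    }

    void consume(double value) {
        Action action;
        synchronized (lock) {
            action = whatToDo(value); // only the decision is serialized
        }
        switch (action) {             // the action runs outside the lock
            case RAISE:
                listener.accept("RAISED at " + value);
                break;
            case CLEAR:
                listener.accept("CLEARED at " + value);
                break;
            default:
                break;
        }
    }

    // Caller must hold lock: reads and updates the guarded state.
    private Action whatToDo(double value) {
        if (!raised && value > threshold) {
            raised = true;
            return Action.RAISE;
        }
        if (raised && value <= threshold) {
            raised = false;
            return Action.CLEAR;
        }
        return Action.NOTHING;
    }
}
```

The trade-off: decisions are serialized, but two threads can still deliver their notifications in the opposite order from their decisions, the same issue that comes up with property change notifications.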

Minimize critical sections

• Don’t try too hard, though: you risk introducing more bugs than performance gains

Avoid synchronized classes/methods

• The synchronized keyword can be applied to methods (locking on this, or on the Class object for static methods), but it is best to avoid doing so
– Granularity is typically too coarse


– Implicit dependencies between methods of the same class, and of the class hierarchy. Also note that the synchronized modifier is not inherited: B.foo() below does not take the lock

public class A {
    public synchronized void foo() { ... }
}

public class B extends A {
    public void foo() { ... } // Overrides foo() without synchronized
}
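A small check of the inheritance point, assuming the A/B classes above with empty bodies: reflection (java.lang.reflect.Modifier) shows that A.foo() carries the synchronized modifier and the override B.foo() does not, so a call to B.foo() takes no lock at all unless B adds one itself.

```java
import java.lang.reflect.Modifier;

class OverrideDemo {
    public static void main(String[] args) {
        System.out.println(isSynchronized(A.class)); // true
        System.out.println(isSynchronized(B.class)); // false
    }

    static boolean isSynchronized(Class<?> type) {
        try {
            return Modifier.isSynchronized(type.getMethod("foo").getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}

class A {
    public synchronized void foo() { }
}

// The override silently drops the lock: B.foo() is not a critical section.
class B extends A {
    @Override
    public void foo() { }
}
```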


– It exposes the lock: external code may mess with the synchronization logic

Guards (privately held locks)

• The recommended technique is to have a private object used as a lock, and document which variables require the lock for read and write

public class Bopper {
    private final Object lock = new Object();
    private int bops; // Guarded by lock

    public int getBops() {
        synchronized (lock) { return bops; }
    }

    public void bop() {
        int bopAmount = calculateBopAmount(); // Computed outside the lock
        synchronized (lock) { bops += bopAmount; }
    }
}

Guards (privately held locks)

• Upside: the user cannot mess with your synchronization

• Downside: the user cannot combine the operations you provide into a single critical section that you can guarantee

// Can’t do this: bopper’s methods use its private lock, not the bopper object
synchronized (bopper) {
    bopper.bop();
    int bops = bopper.getBops();
}

// Need to implement a specific bopper.bopAndGet();
// or make sure that bopper is always guarded by another lock
// (2 locks = potential for deadlock)
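A sketch of the first option, assuming calculateBopAmount() simply returns 1 (the slide leaves it unspecified): bopAndGet() is a hypothetical compound operation that does the increment and the read in one critical section on the private lock. Because no other bop can slip in between, every value it returns is distinct.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class BopperDemo {
    public static void main(String[] args) {
        Bopper bopper = new Bopper();
        Set<Integer> seen = bopConcurrently(bopper, 4, 1_000);
        System.out.println(seen.size() + " " + bopper.getBops()); // 4000 4000
    }

    // Collects every bopAndGet() result from several threads.
    static Set<Integer> bopConcurrently(Bopper bopper, int threads, int perThread) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) seen.add(bopper.bopAndGet());
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}

class Bopper {
    private final Object lock = new Object();
    private int bops; // Guarded by lock

    public int getBops() {
        synchronized (lock) { return bops; }
    }

    public void bop() {
        bopAndGet();
    }

    // The compound "bop, then read" as one critical section the class itself
    // guarantees, since callers cannot synchronize on the private lock.
    public int bopAndGet() {
        int bopAmount = calculateBopAmount(); // computed outside the lock
        synchronized (lock) {
            bops += bopAmount;
            return bops;
        }
    }

    private int calculateBopAmount() {
        return 1; // hypothetical: the original leaves this unspecified
    }
}
```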

Takeaway for locks

• Each piece of state accessed from more than one thread must be synchronized

• Document which lock is going to guard each piece of state

• Make sure that each access (both read and write) is synchronized with the right guard

• Try to minimize access to any other state or calls
– This can be difficult when the new state depends on both previous state and external calls
• Is the state correctly divided?

Notifications

• Suppose you want to make a writable property thread-safe

public class Bar {
    public void addPropertyListener(...) { ... }
    public int getFoo() {
        synchronized (lock) { return foo; }
    }
    public void setFoo(int foo) {
        synchronized (lock) {
            this.foo = foo;
            // Notifications in a lock to prevent another
            // change during notification?
            firePropertyChanged();
        }
    }
}

Notifications

• A user wants to access the data as part of their own critical section

...
synchronized (guard) {
    a = bar.getFoo();
}
...

...
bar.addPropertyListener(e -> {
    synchronized (guard) { ... }
});

Notifications

• Result: deadlock

Thread 1: synchronized(guard) → getFoo() → synchronized(lock)
Thread 2: synchronized(lock) → fireNotifications() → synchronized(guard)

Note: this can happen with several levels of indirection
• A critical section calls a method that calls a method that calls a method that calls getFoo()
• The notification calls a method that calls a notification that calls a method with a critical section

Notifications

• It’s impossible to provide thread-safe callbacks, and synchronous calls that trigger such callbacks, in a way that guarantees all of:
– The changes are serialized (the second change waits until the first is done)
– Each callback sees only its change (the first round of callbacks sees only the first change)
– No deadlocks

• You may think that it’s ok to provide the first two, and to document the lock policy so that the user can simply avoid the deadlock
– Does not work

Notifications (pvmanager case)

• The original implementation of pvmanager both exposed the locks and locked during the notifications
– The idea was to allow users to combine operations into their own critical sections
– Javadocs would explain how to use the lock

• Result
– Users write code that does not follow the locking policy
– “pvmanager has deadlocks”

Notifications (pvmanager case)

Takeaway:
• You can’t assume the user will read/understand the documentation
– As we saw before, everything in the stack is affected (not just the direct user)
• It’s your job to protect your system (not your user’s)
• The more “general purpose” your code, the more both are true

pvmanager now:
• Hides all the locks (can’t participate in deadlock)
• Does guarantee serialization of events
– User code cannot trigger notifications directly
– May give an extra event after pause/close

Notifications part II

• You see the light and decide to hide your lock completely:

public class Bar {
    public void addPropertyListener(...) { ... }
    public int getFoo() {
        synchronized (lock) { return foo; }
    }
    public void setFoo(int foo) {
        synchronized (lock) { this.foo = foo; }
        // Notifications should not expose the lock
        // (the price: notifications may arrive out of sequence)
        firePropertyChanged();
    }
}


• Two clients perform and listen to changes as part of a critical section

...
synchronized (guard) {
    bar.setFoo(...);
}
...

...
bar.addPropertyListener(e -> {
    synchronized (guard) { ... }
});


• Result: deadlock

Thread 1: synchronized(guard1) → setFoo() → synchronized(guard2)
Thread 2: synchronized(guard2) → setFoo() → synchronized(guard1)

The notification leaks the locks held by the setter to all listeners!

Notifications part II (ca datasource)

• The ca datasource needs critical sections to:
– make sure that all the monitors are established as an atomic operation
– make sure all the notifications are handled atomically

• JCA/CAJ have different implementations that have different guarantees on
– what is atomic
– whether the locks are leaked or not
– whether multiple connections share channels

• In one of the permutations, 2 channel handlers in pvmanager were using the same CAJ channel, which made the locks leak to the different handlers

• The solution was to make the channel sharing optional


• A possible solution is to context switch for the callbacks

public class Bar {
    public void addPropertyListener(...) { ... }
    public int getFoo() {
        synchronized (lock) { return foo; }
    }
    public void setFoo(int foo) {
        synchronized (lock) { this.foo = foo; }
        // Notifications fired on a different thread to prevent
        // leaking of the caller locks
        // If executor enforces sequence, no out-of-sync notifications
        // But setFoo returns before callbacks are executed
        exec.submit(() -> firePropertyChanged());
    }
}
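A runnable sketch of the context-switch approach, assuming a single-thread executor and IntConsumer listeners (both choices made for illustration). The demo records, from inside the callback, whether the caller's lock is held: it is not, because the listener runs on the notifier thread. Unlike the slide, this sketch queues the notification while still holding its own private lock, so the queue order always matches the order of the changes; queuing is the only thing done under the lock, and no foreign code runs there.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

class ContextSwitchDemo {
    public static void main(String[] args) {
        Object guard = new Object(); // a lock owned by the calling code
        List<String> seen = new CopyOnWriteArrayList<>();
        Bar bar = new Bar();
        bar.addPropertyListener(v -> seen.add(v + " holdsGuard=" + Thread.holdsLock(guard)));
        synchronized (guard) {
            bar.setFoo(1);
            bar.setFoo(2);
        }
        bar.closeAndWait();
        System.out.println(seen); // [1 holdsGuard=false, 2 holdsGuard=false]
    }
}

class Bar {
    private final Object lock = new Object();
    private final List<IntConsumer> listeners = new CopyOnWriteArrayList<>();
    // One thread delivers all notifications, in submission order
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();
    private int foo; // Guarded by lock

    public void addPropertyListener(IntConsumer listener) {
        listeners.add(listener);
    }

    public int getFoo() {
        synchronized (lock) {
            return foo;
        }
    }

    public void setFoo(int newValue) {
        synchronized (lock) {
            foo = newValue;
            // Only the hand-off to the queue happens under our lock; listeners
            // run later on the notifier thread, holding neither our lock nor
            // the caller's. Queuing under the lock keeps the change order.
            notifier.execute(() -> {
                for (IntConsumer listener : listeners) {
                    listener.accept(newValue);
                }
            });
        }
    }

    // Lets already-queued notifications finish, then stops the notifier thread.
    public void closeAndWait() {
        notifier.shutdown();
        try {
            if (!notifier.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("notifications did not drain");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```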

Notifications takeaway

• Property change notifications in a multi-threaded context are hard!
– Yet another reason why UI toolkits are single-threaded
• Document the assumptions, but don’t expect that people will read/understand them (they are more for yourself and other maintainers)
– Stability/correctness is usually more important than performance/flexibility
– There may be multiple layers between you and the other lock involved in a deadlock (the actual user may not even know your library)
• If you provide a callback
– If you can, hide the lock. If you can’t, document which other methods need access to the same lock
– If you can, context switch. If you can’t, make it very clear which of the caller’s locks the callback will be holding
• If you use a callback
– Don’t add synchronized keywords at random: really think it through
– If you do need them, look at the javadocs for the guaranteed policy

Notifications in pvmanager

• The user sees
– PV objects are thread-safe
  • Can call any method on any thread
– Events are serialized: no new event until the previous event is done
  • No matter the user code, the formula, the datasource, …
– Can choose on which thread the events are processed
  • This is necessary because one should not forward the event to another thread: the framework would think you are ready for the next event and send it
– You need to specify what to do if the rate is too high: queue, skip, …
  • This allows the framework to keep your events serialized and current
– May receive an extra event after pause/close
  • The event processing may already be at the notification level, outside of pvmanager control. Pause and close are not locked, as that may create deadlocks.

Notifications in pvmanager

• The PV object
– Hides its lock
– Does not itself guarantee the event serialization

• The rest of the framework
– Guarantees that a new event is dispatched only if the previous one has been processed
– If a new event comes while processing, it is postponed until after the current event is finished
– Once the event processing starts, it cannot be postponed or canceled
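A sketch of the dispatch rules listed above, under stated assumptions: the class name and API are hypothetical (this is not pvmanager's code), the "rate too high" policy chosen is "skip" (keep only the latest value), and values must not be null. An event is processed only when the previous one is done; a value that arrives meanwhile is postponed, and once processing has started it runs to completion. Processing happens on whatever Executor the user provides, here a 4-thread pool, yet events never overlap.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class DispatcherDemo {
    public static void main(String[] args) {
        List<Integer> received = postAll(1_000);
        System.out.println(received.get(received.size() - 1)); // 1000
    }

    // Posts 1..count from one thread, processes on a 4-thread pool, and returns
    // what the listener saw (intermediate values may have been skipped).
    static List<Integer> postAll(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> received = new CopyOnWriteArrayList<>();
        AtomicInteger active = new AtomicInteger();
        AtomicBoolean overlapped = new AtomicBoolean();
        CountDownLatch lastSeen = new CountDownLatch(1);
        LatestValueDispatcher<Integer> dispatcher = new LatestValueDispatcher<>(pool, value -> {
            if (active.incrementAndGet() > 1) overlapped.set(true);
            received.add(value);
            active.decrementAndGet();
            if (value == count) lastSeen.countDown();
        });
        for (int i = 1; i <= count; i++) dispatcher.post(i);
        try {
            if (!lastSeen.await(10, TimeUnit.SECONDS)) throw new IllegalStateException("last value not delivered");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        if (overlapped.get()) throw new IllegalStateException("events overlapped");
        return received;
    }
}

// Hypothetical sketch: serializes events and, when they arrive faster than
// they are processed, keeps only the latest one ("skip" policy).
class LatestValueDispatcher<T> {
    private final Object lock = new Object();
    private final Executor executor;    // the thread(s) the user chose for processing
    private final Consumer<T> listener; // user code: never called under the lock
    private T pending;                  // Guarded by lock; null means "nothing new"
    private boolean busy;               // Guarded by lock; true while an event is processed

    LatestValueDispatcher(Executor executor, Consumer<T> listener) {
        this.executor = executor;
        this.listener = listener;
    }

    // May be called from any thread; value must not be null.
    void post(T value) {
        synchronized (lock) {
            pending = value;  // replaces any value not yet processed
            if (busy) return; // postponed: the running drain() will pick it up
            busy = true;
        }
        executor.execute(this::drain);
    }

    private void drain() {
        while (true) {
            T next;
            synchronized (lock) {
                if (pending == null) {
                    busy = false; // the next post() starts a new drain()
                    return;
                }
                next = pending;
                pending = null;
            }
            listener.accept(next); // once started, processing is not canceled
        }
    }
}
```

A "queue" policy would replace the single pending field with a queue; the busy flag and calling the listener outside the lock would stay the same.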
