Synchronization
Goals
- Recognize race conditions and prevent them with synchronization.
- Understand and prevent deadlock among synchronized threads.
Concepts
- atomic
- block
- concurrency
- benign (race condition)
- deadlock
- double-checked locking
- fail-fast (iterator)
- lock
- mutex
- race condition
- reentrant
- synchronization
- thread safe
Language
synchronized
Library
java.util.ArrayList<E>
java.util.Collections
java.util.Collections.synchronizedList(List<T> list)
java.util.ConcurrentModificationException
java.util.Vector<E>
Lesson
Threads allow different sequences of program logic to execute concurrently within the process. If the threads are performing completely independent tasks, no problems occur. Many times, however, we use threads to perform the same or similar task more than once concurrently. Sharing resources across threads brings an entirely new level of complexity, as the sequence of resource reading and writing can cause many strange, sporadic, and difficult bugs. Synchronization is the most basic form of controlling resource access among threads, a tool to ensure that only a single thread at a time can access resources crucial to a certain task.
Race Conditions
The central problem that plagues multi-threaded programming is the possibility of a race condition, when two threads tries to access data at the same time. You already learned to use the volatile
keyword for variables shared between threads, to ensure that the value is not cached by the JVM. But providing the current value to each thread is only the tip of the iceberg. Once threads start manipulating those values, race conditions appear.
Consider the bank account class presented in the lesson about methods, which for simplicity uses long
to represent the monetary balance in cents. (Remember that floating point numbers should never be used to represent monetary values.)
Assume that your bank account currently has a balance of $100.00 (10000L
). You receive a birthday present from each of your parents: your mother gives you $20 (2000L
) and your father other gives you $30 (3000L
). After all the birthday celebrations are over, you expect to have $150 (15000L
) in your account.
But what if each of your parents deposit the money at exactly the same? Although using separate tellers at the bank, the computer system uses a central server that allows simultaneous operations using threads. Let's see what might happen if the thread supporting your mother's transaction tries to call BankAccount.deposit(…)
at the same time as the thread supporting your father's transaction:
- momThread calls
deposit(2000L)
. - momThread executes
oldBalance = balance
, which places the value10000L
intooldBalance
. - momThread executes
newBalance = oldBalance + amount
, which places10000L + 2000L
or12000L
intonewBalance
. - At this point dadThread calls
deposit(3000L)
. - dadThread executes
oldBalance = balance
, which places the value10000L
intooldBalance
. Remember thatbalance
has not yet been changed. - dadThread executes
newBalance = oldBalance + amount
, which places10000L + 3000L
or13000L
intonewBalance
. - dadThread executes
balance = newBalance
, which places the value13000L
fromnewBalance
(in dadThread) intobalance
. - momThread continues and executes
balance = newBalance
, which places the value12000L
fromnewBalance
(in momThread) intobalance
.
When both threads have exited BankAccount.deposit(…)
, the value in balance
is 12000L
($120), not the 15000L
($150) you expected. The problem is that the deposit
operation consists of several parts that depend on values shared between threads. When these individual parts of the operation were performed at the same time by different threads, a race condition to read and then update the balance
value occurs.
Synchronization
The race condition above occurred because we expected the operation in BankAccount.deposit(…)
to be atomic. That is, we wanted the calculation and assignment to be completed as one whole operation, not broken down into parts. A non-atomic operation allows another thread to start executing parts the same operation, negating the assumptions of the first operation and creating a race condition.
Java provides the synchronized
keyword which, when applied to a method, prevents more than one thread from accessing the method at the same time.
Because the method uses synchronized
, no two threads can enter the method at the same time. Once momThread enters the method in the sequence above, Java does not allow dadThread to enter the method until momThread has left the method. Because no two threads can enter the method at the same time, the the BankAccount.deposit(…)
method is now atomic.
Behind the scenes each object instance in Java has a token called a lock. Before a thread can enter a synchronized
method, it must acquire this lock. If another thread currently has the lock, the first thread will block, going into the suspended stated BLOCKED
which you learned about previously. The blocked thread will only continue once the other thread has released the lock.
Because instance methods share access to instance variables in the same class instance, making a method atomic may require more than making the method itself synchronized
. If other methods potentially change variables used by the atomic method, a race condition arises again.
Even if BankAccount.deposit(…)
is made synchronized, a race condition now exists, not from multiple threads accessing the same method, but from multiple threads accessing different methods. If another thread, say the monthly billingCycleThread, happens to enter BankAccount.addInterest()
at the same time that momThread or dadThread is inside the BankAccount.deposit(…)
method, a similar race condition develops. BankAccount.deposit(…)
is no longer atomic because another thread can change data used by that method via the BankAccount.addinterest()
method. The solution is to make both methods synchronized
.
Because synchronized
refers to the same class-level lock, a thread cannot enter one of the methods if another thread is has already acquired the lock and entered either of the method.
Synchronization Scope
The scope of synchronized
can be narrowed by using the keyword on smaller-grained blocks than entire methods. If the BankAccount.deposit(…)
performed other actions in addition to making a deposit, those might not need to be part of the atomic operation and could be relegated to outside the synchronized block. Sending an email after making a deposit, for example, would not affect the actual deposit; it would not matter which thread requested an email first, as long as the deposit operation completed atomically.
Synchronizing on a Specific Object
In an effort to reduce the scope of synchronization even further, the synchronized
keyword allows you to specify the object from which to acquire the synchronization lock. The BankAccount
class might keep a separate variable to keep count of the number of transactions that had occurred. Updating this variable would need to be synchronized or there would be a race condition, just as with updating the balance. The two values are separate, however; operations updating the number of transaction don't interfere with operations modifying the balance. You could create separate objects solely for using their locks to synchronize access separately to the two values.
Deadlock
As soon as you have multiple threads and multiple locks, the risk is raised that your program logic could allow a deadlock to occur. A deadlock situation occurs when two threads are blocking, each waiting for a lock that he other holds. Each thread can only continue once the lock it is waiting for is released, but that can never happen because the other thread is blocking and waiting on the lock held by the first thread. In such a case each thread blocks indefinitely and that section of the program hangs.
Rather than a simple transaction counter, let's keep a historical list of all account actions. We will use a simple ArrayList<Action>
, but because ArrayList<E>
is not thread safe we must add a locking object for it, too, or simultaneous modification by multiple threads would corrupt the list contents. (We can't simply reuse the balanceLock
object, because we want to record actions other than deposits such support calls.)
Let us assume a situation in which you make a support call regarding your account at the same time that your mother is depositing your birthday present—a completely plausible scenario:
- momThread calls
deposit(2000L)
. - momThread gets a lock on
balanceLock
. - momThread executes
balance += amount
. - Just then you finish your customer support call and supportThread calls
logSupportCall(…)
. - supportThread gets a lock on
historyLock
. - supportThread tries to get a lock on
balanceLock
, but can't because that lock is held by momThread, so supportThread blocks. - momThread continues execution and tries to get a lock on
historyLock
, but can't because that lock is held by supportThread, so momThread blocks as well.
Now both momThread and supportThread are blocked, each waiting for a lock held by the other. Your birthday present never gets deposited, all because some developer didn't notice the possibility of a race condition.
One solution would be to move retrieval of the balance from outside the scope of the lock onhistory
.
Synchronized Collections
Java's collections and maps were a welcome addition that brought abstract data type implementations for general use. The most common implementations such as java.util.ArrayList
are similar to (but much more extensive) the data structures you implemented personally as part of these lessons. These implementations and those of the basic Java collection implementations are chock full of race conditions.
Think of what would happen if two threads tried to add an object to a linked list, for example; one thread might read the nextNode reference, but the other thread might update the nextNode reference right out from under the other thread before the other finished, and in the end one of the added objects might be lost because the nextNode got replaced with the wrong value. Java's general collection and map implementations similarly make no provision for thread-safety.
When we added a history list in the example above, we controlled access to the list by synchronizing on the list itself: synchronized(history)
. This approach works, but the developer must ensure that all access to the list is correctly synchronized. If one day a another developer adds a new method that accesses the history list, forgetting that it should be synchronized, race conditions will occur and data in the list could become corrupt. Likewise if the history list were to be shared between components in the banking system, it would be difficult to guarantee that the other components correctly performed locking.
java.util.Collections
class provides utility methods for creating unmodifiable collection wrappers. In the same way the Collections
utilities offers several methods for creating synchronized wrappers to collections. The Collections.synchronizedList(List<T> list)
method, for example, can take an existing List<T>
instance and wrap it in a decorator class that only allows synchronized
access to all the list methods.
Now there is no need to maintain a separately lock on the history list, because the list itself provides synchronized access to all its methods internally.
Review
Gotchas
- Don't synchronize an operation on an object that will itself be changed by the operation.
- If you make a list synchronized using one of the
Collections
synchronized wrapper methods, you still need to synchronize on the returned collection if you intend to iterate over its elements. - Double-checked locking is broken in the Java memory model before Java 5.
In the Real World
- Concurrent programming with multiple threads is very, very difficult to get right. If at all possible, redesign your program so that multiple threads do not access shared variables.
- Double-checked locking is confusing and will not work with older versions of Java. Usually there are better alternatives; don't use it unless you have a good reason.
Self Evaluation
- What does it mean that synchronization in Java is
reentrant
?
Task
Your current implementation of FilePublicationRepository
is not thread-safe. If used in an environment allowing multiple access to the repository, one thread could be trying to update files while another is trying to read them. Add synchronization to FilePublicationRepository
to prevent conflicts when accessing local variables and the file system concurrently via multiple threads. Note that synchronization only affects code within the JVM. It is not possible to prevent concurrent modifications to files in the publication directory by other processes in the operating system, or to prevent the user from manually modifying files while the program is running.
The FakePublicationRepository
also needs to be upgraded for concurrent access. You likely used a Java collection type for storing the publications in memory. Add synchronized access to the collection using one of the java.util.Collections
methods that creates a synchronized wrapper for a collection. If you use multiple collections in your implementation of FakePublicationRepository
, create a synchronized wrapper for one of them and use that as the lock object for synchronizing the others.
See Also
- Synchronization (Oracle - The Java™ Tutorials)
- Wrapper Implementations (Oracle - The Java™ Tutorials)
- Java 8 Concurrency Tutorial Part 2: Synchronization and Locks (Benjamin Winterberg)