Monday, February 16, 2009

Synchronized collections in Java 5+

Picture courtesy of Swamibu@flickr
You probably encountered problem of which collection to use while writing concurrent applications. Which implementation to use when you have N concurrent writers and M concurrent readers? I will try to shortly answer this question on simple example.

The code below presents usage of different collection implementations while having 16 concurrent writer threads and 100 concurrent reader threads. At the same time writer threads modify the collection i.e. add new elements and reader threads iterate through actual collection state and operate on its elements.

public class ConcurrentCollectionsTest {
public static final int READERS_COUNT = 100;
public static final int COLLECTION_SIZE = 5000;
public static final int WRITERS_COUNT = 16;

static volatile int readExceptionCount = 0;
static CountDownLatch latch;

public static void main(String[] args) throws Exception {
Map<String, Collection<String>> map =
new LinkedHashMap<String, Collection<String>>();

map.put("ArrayList",
new ArrayList<String>());

map.put("Synchronized ArrayList",
Collections.synchronizedCollection(
new ArrayList<String>()));

map.put("ConcurrentLinkedQueue",
new ConcurrentLinkedQueue<String>());

map.put("CopyOnWriteArrayList",
new CopyOnWriteArrayList<String>());

for (Entry<String, Collection<String>> e: map.entrySet()) {
Collection<String> collection = e.getValue();
latch = new CountDownLatch(READERS_COUNT + WRITERS_COUNT);
readExceptionCount = 0;

long start = System.currentTimeMillis();
for (int i = 0; i < WRITERS_COUNT; ++i) {
Thread t = new Thread(new Writer(collection));
t.start();
}

for (int i = 0; i < READERS_COUNT; ++i) {
Thread t = new Thread(new Reader(collection));
t.start();
}

latch.await();
System.out.println("-----------------");
System.out.println("Stats for [" + e.getKey() + "]:");
System.out.printf("Collection's size: %d\n", collection.size());
System.out.printf("ConcurrentModificationException count [read]: %d\n", readExceptionCount);
System.out.printf("Duration: %d\n\n", System.currentTimeMillis() - start);
}
}

static class Reader implements Runnable {
private final Collection<String> collection;

Reader(Collection<String> col) {
this.collection = col;
}

public void run() {
try {
for (Iterator<String> i = collection.iterator(); i.hasNext();) {
// additional computing operation just to consume some time
String t = i.next() + "_added";
if (t.length() > 10 && i.hasNext()) {
i.next();
}
}
} catch (ConcurrentModificationException e) {
readExceptionCount++;
}
latch.countDown();
}
}

static class Writer implements Runnable {
private final Collection<String> collection;

Writer(Collection<String> col) {
this.collection = col;
}

public void run() {
try {
for (int i = 0; i < COLLECTION_SIZE; ++i) {
collection.add(Thread.currentThread().getName()
+ System.currentTimeMillis());
}
} catch (Exception e) {
// ignore
}
latch.countDown();
}
}
}

Results on my machine (numbers can vary on different hardware but problems/conclusions will be the same):

-----------------
Stats for [ArrayList]:
Collection's size: 77038
ConcurrentModificationException count [read]: 17
Duration: 1047 ms

-----------------
Stats for [Synchronized ArrayList]:
Collection's size: 80000
ConcurrentModificationException count [read]: 4
Duration: 1141 ms

-----------------
Stats for [ConcurrentLinkedQueue]:
Collection's size: 80000
ConcurrentModificationException count [read]: 0
Duration: 1046 ms

-----------------
Stats for [CopyOnWriteArrayList]:
Collection's size: 80000
ConcurrentModificationException count [read]: 0
Duration: 14344 ms

Conclusions
ArrayList (unsurprisingly) doesn't work at all - there is 17 ConcurrentModificationException on read attempt and what is more unacceptable this collection is in unpredictable state i.e. it's size should be 80000 instead of 77038 - it means that many items were not stored by the writer threads.

Synchronized ArrayList works better i.e. its state is correct but still you need additional synchronization in order to get rid of ConcurrentModificationException while iterating through it.

ConcurrentLinkedQueue works best in this case and does not add any performance penalties.

CopyOnWriteArrayList is the worst in this case because traversal operations does not vastly outnumber mutations.

The last conclusion would be to perform such test before deciding which implementation works best in your particular case. Also, read javadoc carefully to see implications and suggested use cases.

Monday, February 09, 2009

Java synchronizers - Semaphore

Picture courtesy of matthewblack@flickr
In this part I will present good-old semaphore that was finally introduced in Java 5. Semaphores are applicable in many different cases. One of them is the case when you have limited number of resources that can be accessed at the same time (imagine that you have 5 printers and 100 employees who print their documents - possibly all at the same time). Users have to wait for their turn - however order in which they requested the resource is not important (someone can wait couple of turns while someone else can get the resource many times turn after turn).

The code below shows this "metaphor" with 3 printers and 5 employees:

public class SemaphoreTest {
public static void main(String[] args) throws InterruptedException {
Semaphore sem = new Semaphore(3);
Thread[] t = new Thread[5];

for (int i = 0; i < t.length; i++) {
t[i] = new Thread(new Task(sem));
t[i].start();
}

Thread.sleep(10000);

for (int i = 0; i < t.length; i++) {
t[i].interrupt();
}
}

private static final class Task implements Runnable {
private final Semaphore sem;

private Task(Semaphore sem) {
this.sem = sem;
}

@Override
public void run() {
try {
while (true) {
if (sem.tryAcquire() == true) {
System.out.printf("%s acquired semafore - %d\n", Thread.currentThread().getName(), sem.availablePermits());
sem.release();
} else {
System.out.printf("%s could not acquire semafore\n", Thread.currentThread().getName());
}
Thread.sleep(1000);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
}

The program above terminates automatically after about 10 seconds from start.

Results (one of the infinite number of possible results) on my machine:

Thread-0 acquired semafore - 1
Thread-3 could not acquire semafore
Thread-4 could not acquire semafore
Thread-2 could not acquire semafore
Thread-1 acquired semafore - 0
Thread-3 acquired semafore - 1
Thread-0 acquired semafore - 1
Thread-4 acquired semafore - 0
Thread-2 could not acquire semafore
Thread-1 acquired semafore - 0
Thread-0 acquired semafore - 1
Thread-3 acquired semafore - 0
Thread-2 acquired semafore - 1
...

As you can see each thread at some point acquires the semaphore and accesses requested resources. Be careful though because when the number of consumers/producers vastly outnumbers semaphore's permits starvation is very likely (i.e. some of the threads will never have an opportunity to enter the critical section). You can even spot it on the presented listing where Thread-2 cannot acquire semaphore twice while Thread-0 is "lucky" and always gets it.

Thursday, February 05, 2009

Seven Principles of Lean Software Development

Lean Software Development has its roots in Toyota Production System and it helps software organizations optimize their processes and production methods in order to deliver their products to the market much faster and with better quality. Lean movement can be considered as a new development method that tries to identify and eradicate all problems and "disabilities" of old methodologies like Waterfall. Lean puts main focus on people and communication - if people who produce the software are respected and they communicate efficiently, it is more likely that they will deliver good product and the final customer will be satisfied.

Lean Software Development subsequently gave birth to Agile Software Development methods and its main branches like Scrum or Crystal Clear. For many people who know the subject Agile is just another word for Lean or Lightweight.

In one of the most popular books on Lean subject, namely "Implementing Lean Software Development - from Concept to Cash", Mary and Tom Poppendieck explain how to implement Lean by following seven principles - principles that are some kind of Lean commandments:

  1. Eliminate Waste
    • Provide market and technical leadership - your company can be successful by producing innovative and technologically advanced products but you must understand what your customers value and you know what technology you're using can deliver

    • Create nothing but value - you have to be careful with all the processes you follow i.e. be sure that all of them are required and they are focused on creating value

    • Write less code - the more code you have the more tests you need thus it requires more work and if you're writing tests for features that are not needed you are simply wasting time


  2. Create Knowledge
    • Create design-build teams - leader of the development team has to listen to his/her members and ask smart questions encouraging them to look for the answers and to get back with encountered problems or invented solutions as soon as possible

    • Maintain a culture of constant improvement - create environment in which people will be constantly improving what they are working on - they should know that they are not and should not be perfect - they always have a field to improve and they should do it

    • Teach problem-solving methods - development team should behave like small research institute, they should establish hypotheses and conduct many rapid experiments in order to verify them


  3. Build Quality In
    • Synchronize - in order to achieve high quality in your software you should start worrying about it before you write single line of working code - don't wait with synchronization because it will hurt

    • Automate - automate testing, building, installations, anything that is routine, but do it smartly, do it in a way people can improve the process and change anything they want without worrying that after the change is done the software will stop working

    • Refactor - eliminate code duplication to ZERO - every time it shows up refactor the code, the tests, and the documentation to minimize the complexity


  4. Defer Commitment
    • Schedule Irreversible Decisions at the Last Responsible Moment - you should know where you want to go but you don't know the road very well, you will be discovering it day after day - the most important thing is to keep the right direction

    • Break Dependencies - components should be coupled as loosely as possible to enable implementation in any order

    • Maintain Options - develop multiple solutions for all critical decisions and see which one works best


  5. Optimize the Whole
    • Focus on the Entire Value Stream - focus on winning the whole race which is the software - don't optimize local inefficiencies, see the whole and optimize the whole organization

    • Deliver a Complete Product - teams need to have great leaders as well as great engineers, sales, marketing specialists, secretaries, etc. - they together can deliver great final products to their customers


  6. Deliver Fast
    • Work in small batches - reduce projects size, shorten release cycles, stabilize work environment (listen to what your velocity tells you), repeat what's good and eradicate practices that creates obstacles

    • Limit work to capacity - limit tasks queue to minimum (one or two iterations ahead is enough), don't be afraid of removing items from the queue - reject any work until you have an empty slot in your queue

    • Focus on cycle time, not utilization - put in your queue small tasks that cannot clog the process for a long time - reduce cycle time and have fewer things to process in your queue


  7. Respect People
    • Train team leaders/supervisors - give team leaders the training, the guidance and some free space to implement lean thinking in their environment

    • Move responsibility and decision making to the lowest possible level - let your people think and decide on their own - they know better how to implement difficult algorithms and apply state-of-the-art software frameworks

    • Foster pride in workmanship - encourage passionate involvement of your team members to what and how they do



If this brief introduction to Lean Software Development is still not enough for you I strongly recommend buying and reading Poppendiecks' book "Implementing Lean Software Development - from Concept to Cash".

Monday, February 02, 2009

Deadlock while terminating JVM - how to avoid it

Quite recently I was implementing some very small command line tool in Java. In order to clean and release all resources in case user presses Ctrl+C I added shutdown hooks. Unfortunately by mistake I added:

System.exit(0);

line within the shutdown hook and my application couldn't close. When I used jps and jstack I saw a deadlock on exit() method? WTF - I thought - I assumed that this method tries to forcibly close the JVM. You now know what was the problem?

If you read documentation carefully you should know :)

If this method is invoked after the virtual machine has begun its shutdown sequence then if shutdown hooks are being run this method will block indefinitely. If shutdown hooks have already been run and on-exit finalization has been enabled then this method halts the virtual machine with the given status code if the status is nonzero; otherwise, it blocks indefinitely.

This explains everything.

Conclusions - never invoke System.exit() from shutdown hooks and READ javadoc even if you're absolutely sure you know Java well...