Monday, April 26, 2010

When multithreading / multitasking your application doesn't help...

Copyright (c) Formula 1
Last week we spotted some behaviors of our system that could be improved significantly. In some specific cases our system is sending multiple independent requests to the underlying subsystems and it is sending them sequentially i.e. is waiting for the first response before sending second request etc. This would not be any problem if this specific processing didn't take more than 30 seconds (which is our system's hard timeout).

First thought was to send all the independent requests simultaneously and to collect responses. This way if each request / response action takes 5 seconds and we have 10 such actions we could get all responses in ~5 seconds, not in over 50. Assumption here is that such processing will take utmost only the time necessary for the longest task to complete. Of course ~5 seconds also depends on thread creation / allocation from the pool overhead but usually could be neglected (we assume that this time is at least order of magnitude lower than time necessary for task to complete).

The best overview of the problem and the solution I'm writing here can be found in my older post Combining Callable or Runnable with Future and FutureTask:

List<String> urls = new ArrayList<String>() {{
  add("http://www.java2jee.blogspot.com");
  add("http://www.yahoo.com");
  add("http://www.msdn.com");
  add("http://c2.com/xp/ExtremeProgrammingRoadmap.html");
  add("http://apache.org/");
  add("http://sourceforge.net/");
}};
List<Future<Integer>> futures = 
  new ArrayList<Future<Integer>>(urls.size());

final ExecutorService service = 
  Executors.newFixedThreadPool(1);

long start = System.nanoTime();
for (String url : urls) {
  // compile error - create this task first
  futures.add(service.submit(new ProcessUrlTask(url)));
}

long result = 0;
for (Future<Integer> future : futures) {
  result += future.get();
}
If you manipulate number of threads Executors.newFixedThreadPool(1) you will see performance boost - default value 1 means that all URLs are processed sequentially.

Coming back to our system - I basically did the same, I created shared thread pool, processing task that was sending request and was waiting for the response, updated unit tests (actually nothing changed here). What was my surprise when I discovered that this "improvement" actually didn't have any effect! Processing time was the same even though thread creation took ~0ms:
16:34:04,738 DEBUG[pool-19-thread-5] ...
16:34:04,741 DEBUG[pool-19-thread-2] ...
16:34:04,741 DEBUG[pool-19-thread-8] ...
16:34:04,741 DEBUG[pool-19-thread-7] ...
16:34:04,741 DEBUG[pool-19-thread-3] ...
16:34:04,741 DEBUG[pool-19-thread-6] ...
16:34:04,741 DEBUG[pool-19-thread-1] ...
16:34:04,741 DEBUG[pool-19-thread-4] ...
The detail I haven't mentioned yet is that messages that were supposed to be sent in multiple threads were sent using fancy middleware Java library our team doesn't develop. But more about it later...

My improvement sucks because it didn't shorten time necessary for request processing - still didn't know why. So, I started my local Tomcat, sent original SOAP request and watched the logs. What I spotted was in fact sequential message send / receive behavior. How is it possible if messages were being sent from different threads? There must be some lock in the middleware library!
This was not so sure and I needed proof before communicating anything to the management. I used my secret weapon i.e. Visual VM. In fact I connected to the JVM on which Tomcat server was running and started the test again. During the test I dumped threads and here's what I got:
Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):

"pool-2-thread-8" prio=6 tid=0x289ab000 nid=0x1690 in Object.wait() [0x2a92f000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x25c34b78> (a java.lang.Object)
 at java.lang.Object.wait(Object.java:485)
 at company.utils.Mutex.lock(Mutex.java:41)
 - locked <0x25c34b78> (a java.lang.Object)
 ...
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
 - <0x24ecfea0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"pool-2-thread-7" prio=6 tid=0x29913400 nid=0x1670 in Object.wait() [0x2a8df000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x24fa2558> (a company.lib.Transaction)
 at java.lang.Object.wait(Object.java:485)
 at company.lib.Transaction.waitForCompletion(Transaction.java:530)
 - locked <0x24fa2558> (a company.lib.Transaction)
 ...
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
 - <0x24ecfb50> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"pool-2-thread-6" prio=6 tid=0x293ffc00 nid=0x1674 in Object.wait() [0x2a88f000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x25c34b78> (a java.lang.Object)
 at java.lang.Object.wait(Object.java:485)
 at company.utils.Mutex.lock(Mutex.java:41)
 - locked <0x25c34b78> (a java.lang.Object)

... remainder omitted

Conclusions: middleware library used to send messages is "queuing" all the messages - if I can write "queue" about basic Java monitor lock (inside company.utils.Mutex.lock() method):
synchronized(lock) {
 while(locked) {
   lock.wait();
 } 
 locked = true;
}
And the lock was the current session object - it means that if I want to use 5 different threads to send messages simultaneously I have to create 5 different sessions (which in turn means that I have to log in to the system five times with the same credentials). This is additional memory and performance overhead we can't accept at this time. Not to mention our questions why session in this middleware does not allow us to send messages from many threads at the same time?

At this moment we abandoned improving this part of the system - it's hardly possible with current middleware implementation.

No comments: