Wednesday, October 22, 2008

Premature optimization is the root of all evil - not only in the Agile world

Picture courtesy of gutter@flickr
I was just reading an excellent book by Josh Bloch, "Effective Java, Second Edition", and I had reached the subject of optimization when it happened. It was a funny coincidence, but I took it as a sign that I should write this post.

It doesn't relate to agility in any way, but it does relate to the quality of software, so it definitely belongs here. And it all started very innocently: with a blog post describing the solution to an annoying problem.

In this post I will tell you how easily you can run into really dangerous and ugly development problems when you start optimizing your software too early. I hope you will like the story.

Start with the simplest possible solutions...


I've been reading a "must-read" book for all Java developers, "Effective Java (2nd Edition)" by Josh Bloch, and I had just reached the "Optimize judiciously" item. At the same time I was doing some Java EE development and ran into a problem with the Struts2 file upload capabilities. I found a solution and posted it on my private blog: http://java2jee.blogspot.com/2008/09/solution-to-struts2-upload-file-error.html. "This has nothing to do with optimization", you may think - and I thought the same, but that was a wrong assumption.

After a few days I received a comment on this post from an anonymous user with an "optimized solution". The commenter wanted to replace this line of Java code:

    if (string.contains(
            "the request was rejected because its size")) {
with this code:

    public static Pattern REJECTED_FILE_SIZE_PATTERN
            = Pattern.compile(".*reject.*size.*");
    ...
    if (REJECTED_FILE_SIZE_PATTERN.matcher(string).matches()) {
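
Speed aside, the two checks do not accept the same inputs. Here is a minimal sketch that compares them on three made-up messages. The sample strings and the class name RegexVsContains are my own assumptions for illustration, not actual Struts2 output:

```java
import java.util.regex.Pattern;

class RegexVsContains {
    static final String LITERAL = "the request was rejected because its size";
    static final Pattern LOOSE = Pattern.compile(".*reject.*size.*");

    // The original check: a plain substring search.
    static boolean byContains(String s) {
        return s.contains(LITERAL);
    }

    // The proposed check: the whole input must match the pattern.
    static boolean byRegex(String s) {
        return LOOSE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical messages, invented for this comparison.
        String[] samples = {
            "the request was rejected because its size (1234) exceeds the limit",
            "we reject any shirt of the wrong size",
            "the request was rejected because its size (1234)\nexceeds the limit"
        };
        for (String s : samples) {
            System.out.printf("contains=%b regex=%b%n", byContains(s), byRegex(s));
        }
    }
}
```

The loose pattern accepts the unrelated second sentence and rejects the multi-line third one, because matches() has to cover the whole input and "." does not match line terminators by default. So the regexp version is not a drop-in replacement even before we measure anything.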


I have always considered myself a seasoned Java developer (hopefully that is still true :) but after receiving this comment I was quite worried. "Why am I not using regular expressions to check strings? Aren't they much faster?", I thought. I was even thinking: "Maybe it's time to become a manager? My Java/technical knowledge is deteriorating..."

"But hey! I will not let it go like this" - I thought. I wrote a simple Java program to test the performance of both solutions:

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        int count = 1000000;
        Pattern p = Pattern.compile(".*reject.*size.*");
        // The trailing space after "because" matters: the two literals
        // must join into the phrase that contains() looks for.
        String matching = "the request was rejected because "
                + "its size (1234) some other text";

        // Time the plain substring check.
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            if (matching.contains(
                    "the request was rejected because its size")) {
                // do nothing
            }
        }
        System.out.printf(
                "contains() matching: %dms%n",
                System.currentTimeMillis() - start);

        // Time the precompiled regular expression.
        start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            if (p.matcher(matching).matches()) {
                // do nothing
            }
        }
        System.out.printf(
                "matches() matching: %dms%n",
                System.currentTimeMillis() - start);
    }
}

On my machine the standard contains() solution is 50 to 80 times faster than the solution with the regexp matcher. What a disastrous effect this could have if it were applied across a whole application! I can't even imagine.
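
A hand-rolled timing loop like this is sensitive to JIT warm-up, and a loop body whose result is never used may be partly optimized away. The sketch below is one way to make the measurement a little more robust: it runs each check once untimed, counts the hits so the result is actually used, and measures with System.nanoTime(). The class and method names (RoughBenchmark, countHits, timeMillis) are my own, and this is still only a rough comparison. A dedicated harness such as JMH would be the more trustworthy tool.

```java
import java.util.function.BooleanSupplier;
import java.util.regex.Pattern;

class RoughBenchmark {
    static final String NEEDLE = "the request was rejected because its size";
    static final Pattern PATTERN = Pattern.compile(".*reject.*size.*");
    static final String INPUT = NEEDLE + " (1234) some other text";

    // Runs the check repeatedly and returns how many calls returned true,
    // so the work has a visible result.
    static int countHits(BooleanSupplier check, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (check.getAsBoolean()) {
                hits++;
            }
        }
        return hits;
    }

    // One untimed warm-up pass, then one timed pass; returns elapsed milliseconds.
    static long timeMillis(String label, BooleanSupplier check, int iterations) {
        countHits(check, iterations);
        long start = System.nanoTime();
        int hits = countHits(check, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s: %d ms (%d hits)%n", label, elapsedMs, hits);
        return elapsedMs;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        timeMillis("contains()", () -> INPUT.contains(NEEDLE), iterations);
        timeMillis("matches()", () -> PATTERN.matcher(INPUT).matches(), iterations);
    }
}
```

The absolute numbers will differ from machine to machine and from JVM to JVM, so treat the ratio between the two lines as an indication only.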

When I took a look at the implementation of the contains() method I saw that it operates directly on the char array that backs the String object. It is fast! And it is the simplest and most obvious method to call in this situation. It even makes the code more readable and more explicit than the regexp matcher does. I see only advantages.
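
To give a feeling for what a substring search over a char array involves, here is a deliberately naive sketch. It is not the JDK's code: the class NaiveContains and its method are hypothetical, written only to illustrate the idea, and the real implementation is more refined.

```java
class NaiveContains {
    // Slides the needle over the haystack and compares character by
    // character; returns true at the first position where all match.
    static boolean naiveContains(String text, String needle) {
        char[] haystack = text.toCharArray();
        char[] target = needle.toCharArray();
        for (int start = 0; start + target.length <= haystack.length; start++) {
            int i = 0;
            while (i < target.length && haystack[start + i] == target[i]) {
                i++;
            }
            if (i == target.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String message = "the request was rejected because its size (1234) some other text";
        String needle = "the request was rejected because its size";
        System.out.println(naiveContains(message, needle));
        System.out.println(message.contains(needle));
        System.out.println(message.indexOf(needle) >= 0);
    }
}
```

All three calls in main agree on this input. There is no pattern compilation, no matcher object and no backtracking here, just a few array reads and comparisons, which goes some way towards explaining the gap measured above.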

The conclusion is simple: DON'T OPTIMIZE YOUR CODE PREMATURELY. USE THE SIMPLEST POSSIBLE SOLUTIONS - THEY WORK!

... and stay with them


Joshua Bloch cites these guys:

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity. (William A. Wulf)


We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. (Donald E. Knuth)


We follow two rules in the matter of optimization:
   Rule 1. Don't do it.
   Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.
(M.A. Jackson)

What else can I add? Actually, nothing. I have just shown with a real example that each of the quotes above holds true.

To rephrase Joshua Bloch: never focus on optimizing your software. If you write good, logically structured code, your software will probably turn out fast enough by itself. Use well-known, standard libraries and the most basic features that meet your requirements - the performance and quality will follow.

Have you had similar adventures with sub-optimal solutions? Maybe you disagree with me? I would gladly read your opinions.

Originally published on AgileSoftwareDevelopment.com
