The Fallacy of Premature Optimization - A must-read

Nov 19 2009

Sir Tony Hoare famously remarked that "premature optimization is the root of all evil." Have we let this view (or a misapplication or misinterpretation of it) dictate too much of our programming methodology? In his article The Fallacy of Premature Optimization, Randall Hyde argues that we have indeed.

I feel it my philosophic duty to point out that there is in fact no fallacy in the statement. But I think that Hyde makes a very good argument, terminological shortcomings aside. Of particular interest are his nine observations in the middle.

Good Observations

Here are a few observations that I think are particularly interesting:

Observation #3
Software engineers use the Pareto Principle (also known as the "80/20 rule") to delay concern about software performance, mistakenly believing that performance problems will be easy to solve at the end of the software development cycle. This belief ignores the fact that the 20 percent of the code that takes 80 percent of the execution time is probably spread throughout the source code and is not easy to surgically modify. Further, the Pareto Principle doesn't apply that well if the code is not well-written to begin with (i.e., a few bad algorithms, or implementations of those algorithms, in a few locations can completely skew the performance of the system).
Observation #5
Software engineers have been led to believe that they are incapable of predicting where their applications spend most of their execution time. Therefore, they don't bother improving performance of sections of code that are obviously bad because they have no proof that the bad section of code will hurt overall program performance.

These observations, and most of the others in the article, are insightful and should prompt a re-assessment of the role of optimization in any program of size. They should also remind us to avoid blindly following patterns when a novel approach can mean a real improvement. And it's not just algorithms that make a difference. Data structures, too, play a large role. For example, take a look at the needlessly complex arrays in Drupal's navigational menus. Nested associative arrays take the place of a more robust data structure (like a classed object). As a result, data is duplicated, structures are hard to traverse, and far more space is allocated than necessary.
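To illustrate the contrast, here is a rough sketch (not Drupal's actual menu structure): a nested associative array forces every consumer to know the magic keys, while a small classed object carries the same data with less duplication and an obvious way to traverse it.

    // Rough sketch only: nested associative arrays force every consumer to
    // know the internal keys and to copy data around.
    $menu_item = array(
      'title' => 'Home',
      'path' => '<front>',
      'children' => array(
        array('title' => 'About', 'path' => 'about', 'children' => array()),
      ),
    );

    // A simple classed object encapsulates the same data and its traversal.
    class MenuItem {
      public $title;
      public $path;
      public $children = array();

      public function __construct($title, $path) {
        $this->title = $title;
        $this->path  = $path;
      }

      public function addChild(MenuItem $child) {
        $this->children[] = $child;
      }
    }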

One serious performance blunder I see made all the time in PHP is re-implementing or wrapping existing PHP functions. I once thought about suggesting an NSFW principle: No Stupid Wrappers. Adding a function call around a function call has a minimal performance impact in compiled languages (many of which will simply optimize out the indirection at compile time). But wrapping a native function in a wrapper function in PHP not only deepens the call stack, it artificially slows down the system, adding a PHP-level hit where there would otherwise only be a binary (C-layer) hit. Worse is the case where PHP re-implementations are used instead of PHP builtins. (I once saw an application with a PHP implementation of MD5, even though PHP already has a built-in md5() function.)
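Here is a hypothetical example of the kind of wrapper I mean (the function name is made up):

    // Hypothetical wrapper: all it accomplishes is adding a PHP-level
    // function call on top of the C-level builtin.
    function my_strtolower($string) {
      return strtolower($string);
    }

    // And when a builtin exists, it is always the right call:
    $hash = md5('example data');   // not a userland re-implementation of MD5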

One danger in a language like PHP is believing you are using a high-performance algorithm when you are not. Yes, a linked list is faster than an array for incremental searching, ceteris paribus, but no PHP-language implementation of a linked list will outperform a builtin PHP array, because arrays are implemented in C and do not carry the extra runtime overhead of a PHP-level data structure.
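As a rough illustration (a sketch, not a benchmark): even a minimal userland linked list pays PHP-level overhead at every step of the walk, while the builtin array functions do the walking in C.

    // Minimal userland linked list: every node access is PHP-level work.
    class ListNode {
      public $value;
      public $next;

      public function __construct($value, $next = null) {
        $this->value = $value;
        $this->next  = $next;
      }
    }

    function list_contains($head, $needle) {
      for ($node = $head; $node !== null; $node = $node->next) {
        if ($node->value === $needle) {
          return true;
        }
      }
      return false;
    }

    // The builtin performs the equivalent walk in C and will win in practice.
    $values = array('a', 'b', 'c');
    in_array('b', $values, true);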

The Unobserved Observation

While I did enjoy the article, I find one observation to be horrible, and perhaps the closest thing to a fallacy in the entire article:

Observation #9
There is little need to ensure that you have the best possible algorithm during initial software design because you can always substitute a better algorithm later. Therefore, there is no need to worry about performance during initial software design because it can always be corrected with a better algorithm later. Unfortunately, people who take this approach to software design often write code that cannot be easily modified down the road.

What's the problem with this one? It's the last sentence: "Unfortunately, people who take this approach to software design often write code that cannot be easily modified down the road." There is no clear link between the first part of the argument and the last; if you have to supply a premise to justify the argument, that premise is:

  • People who do not focus on the best possible algorithm during initial software design write code that cannot be easily modified.

That premise is easily rejected because it makes a gross, unsubstantiated empirical claim. In my experience (granted, I don't often work with poor software developers), I regularly find sections of code that are not optimized, are noted as such, and can easily be fixed without impacting the larger program. (In fact, many OO design patterns exist precisely so that an algorithm can be swapped for a better one later; a small sketch follows the quote below.) In practice, the extra time it takes to optimize is often substantial enough that optimization should be delayed. It is easily justifiable to neglect optimization in early versions of software simply in order to "play it safe" (use the tested rather than the experimental) and to release on time. Here, Hyde provides a very good quote:

Charles Cook's quote was right on the mark: "A good software developer will do this [optimization] automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems."
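To return to the point above about design patterns, here is a minimal sketch of the Strategy pattern (the class names are hypothetical): isolating the algorithm behind an interface lets a faster implementation be swapped in later without touching the callers.

    // The consumer depends only on the interface, so the algorithm can be
    // replaced later without modifying the surrounding program.
    interface Sorter {
      public function sort(array $items);
    }

    class SimpleSorter implements Sorter {
      public function sort(array $items) {
        sort($items);   // good enough for the first release
        return $items;
      }
    }

    class Report {
      private $sorter;

      public function __construct(Sorter $sorter) {
        $this->sorter = $sorter;
      }

      public function render(array $rows) {
        // If SimpleSorter ever shows up in a profile, a faster Sorter can be
        // dropped in here without changing this class or its callers.
        return implode("\n", $this->sorter->sort($rows));
      }
    }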

Bugs introduced by novel algorithms are bad. A delayed release is bad. But a third downside is the case where optimizing one micro-unit of code (a loop, a particular function, a chunk of code) actually damages the overall maintainability and flexibility of the system. Hyde notes this issue as well. A trimmed record may take less space, but valuable data may be lost (the Y2K problem is a good example). Adding a cache or materialized view may speed read time, but it could drastically slow write time and introduce maintenance headaches.
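To make that last point concrete, a hypothetical sketch (the function name is made up): a function-local static cache speeds up repeated reads, but nothing outside the function can clear it, so every writer is quietly coupled to this implementation detail.

    // Hypothetical example: caching a slow lookup behind a static variable.
    function lookup_price($sku) {
      static $cache = array();
      if (!isset($cache[$sku])) {
        // Stand-in for a slow database or file read.
        $cache[$sku] = crc32($sku) % 100;
      }
      return $cache[$sku];
    }

    // Reads are now fast, but code that updates prices has no way to reset
    // the cache, so readers can be served stale values until the process ends.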

So perhaps the hyperbole of Hoare and Knuth has been taken too far. But there is certainly no fallacy. And the intended nugget of wisdom remains: premature optimization can slow down development, cause bugs, introduce unexpected consequences, and indeed violate good programming practices. But both strategic optimization and a workaday focus on good algorithms can be a big win.