Friday, June 08, 2007

Optimizing everything: some details matter a lot, most don't

Optimization is something that all engineers and product designers (all people, really) should understand. However, their words and behavior suggest that most either don't understand it, or haven't really internalized the lessons of optimization. (and by the way, I'm talking about CS-style "make it better" optimization, not "make it optimal" optimization)

Like everything, optimization is a complex topic, so I'm going to simplify it into three easy steps:
  1. Find the problems ("profiling")
  2. Sort the problems according to benefit-of-fixing/cost-of-fixing ("ROI")
  3. Fix them, in the order of highest benefit/cost to lowest

Most programmers probably think of optimization primarily in terms of CPU profiling, but it really applies to everything. (and in fact, this post is about the general process of optimizing, especially for things other than code -- for specific cases of hardcore game engine optimization or whatever it might be totally wrong)

For example, people who continually "pinch pennies" (perhaps driving around town to save a few cents on gas) but then waste thousands of dollars on something stupid like a fancy car or credit card interest are probably not optimizing their spending very well. They are putting most of their effort into low reward activities (saving a few cents on gas) while losing much more money in other places. Big, dumb, Dilbertesque companies are also notorious for this kind of thing, cutting back on pencils while running super-bowl commercials, or something.

This is also the kind of activity that I was talking about in my post titled "Wasting time on things that really don't matter". It's not that there is zero difference between two slightly different ways of testing for the empty string, it's just that your product probably has 100,000 more serious problems, and so time spent worrying about the empty string style is time NOT spent fixing those more important problems. Put another way, fix all of the other problems first, then debate the empty string.

Of course you may enjoy debating the empty string (and in fact there are some interesting arguments), and that's fine. The zeroth step in optimization is to decide what you are optimizing for. If you're looking to maximize the amount of time spent on endless debates, or perhaps code "beauty", then maybe the empty string debate really is the highest benefit/cost activity (because the benefit is so high). If, on the other hand, you are trying to maximize the success of your product, then the empty string debate probably has negative value and should be strongly avoided.

The empty string comparison is obviously a small detail, and it's tempting to think that we should not bother with such small details, but that too is a huge mistake. Another important lesson of optimization is that some small fraction of the details will determine the majority of the outcome. This is commonly referred to as the 80/20 or 90/10 rule, stating for example that "90% of the time is spent in 10% of the code", but it could just as easily be 99/1 or something like that (and for code that has already been well optimized, this 80/20 rule may not hold -- it's just a heuristic). In code it may be just a few lines in the "inner loop", or perhaps it's the interest rate on your mortgage ("what's a few percent?"). I once profiled a server at Google and discovered that something like 90% of the time was being spent on a single line of code: a call to sscanf (repeated millions of times) that someone had snuck into the scoring code (that's the 99999/1 rule).

Determining which details matter a lot (and which don't) is also the key to designing successful products. In the classic slashdot iPod post from 2001, they summarized the newly released iPod thusly: "No wireless. Less space than a nomad. Lame."

History suggests that those weren't the important details, since it's 2007 and there still isn't an iPod with wireless (not until the 29th, at least), and most iPods have less than the original 5GB of storage. Apple did, however, get a few very important details right: it's easy to get your music onto an iPod, it holds "enough" music (unlike the early flash players), and it fits easily into your pocket (unlike the early disk-based players).

So how do we determine which details are important? Observation. (and some judgment, but I think that's mostly formed via observation) If you're optimizing an application, observe which resources are being exhausted or causing the delay (it's not always CPU -- it's often disk seeks), and then observe where they are being consumed. If you're trying to optimize money, first observe where most of the money goes to and comes from. To optimize a product design, observe which things are most "useful", and which are most painful or annoying (waiting or thinking, usually).

If you get all of the important details right, you'll be so far ahead of everyone else that the other details probably won't matter. But remember, your time is finite, so time spent on unimportant details is time not spent on important details! Most engineers are very detail oriented people, and this is good because they can really focus on the important details, and it is bad because they can really focus on the unimportant details! I suspect that learning to tell the difference will be worth your time.

By the way, to get even more details right, decrease the cost side of the benefit/cost equation so that you can do more things in less time (by finding easier solutions, avoiding endless debates, etc).


Unknown said...


Obviously, talking about optimization tends to bring up emotion in a lot of people, and adding a language comparison in just adds gasoline to the fire.

I think something important to recognize, and something you appear to overlook in your blog, is that optimization in different fields means different things. Specifically, typically in domains where performance (size & space) is truly important (games, HPC, embedded), a lot of the classic truisms don't hold.

The most obvious example is the 80/20 rule -- generally performance profiles are much flatter, as the easy things have already been done (or instinctively avoided). See Tim Sweeney's talk ( for a mention of this. As well, things causing a slow-down are often very difficult to detect (cache misses, excessive synchronization, excessive inlining).

Second, waiting until the 'end' to profile & optimize is often too late, for multiple reasons. You may need to change your whole design (or language, or art pipeline, or whatever) if you originally made a poor, non-performant choice. Or you may just not be able to squeeze out that 50% faster you suddenly realize you need, three weeks before your shipping date.

One final note regarding the truism to always 'profile before you optimize': this is, in general, good advice, and is much too often ignored, but it shouldn't be blindly followed. How can you consider performance at the design stage with profiling? Generally you need to go with experience, algorithmic complexity analysis, and mini-tests.

On a separate issue, regarding people nit-picking compiler flags and such, I think that's more a case of your mistake having raised warning flags (along with the, at least from a C/++ perspective, inflammatory title). Small changes can have big results on micro-benchmarks, and some setting you consider arcane may be standard for someone who cares about that environment. Combined with an apparent agenda, people get worked up and loud. Imagine I set the JIT threshold to 1 million, and used BigNums, and wrote an article about how crappy Java still was. (I'm not saying this is what you did, I'm just pointing out why your mistakes may have caused a stronger reaction than you expected)

Paul Buchheit said...

Ben, thanks for the thoughtful comment.

I added a note about the 80/20 not always applying.

I realize that a lot of the attention comes from the c++/java aspect of it -- people get very religious about their languages for some reason.

As for the flags, if someone can demonstrate a real performance improvement, then I'm interested (and in fact I noted a few of those flags), otherwise they are irrelevant.

Babu said...


Interesting article. I have used OptimizeIt, JProbe profilers to profile J2EE applications. I usually face a problem when I try to do cost vs benefit analysis for J2EE applications

For example, usually profilers suggest the following approach for profiling J2EE applications

a) Find out hotspot methods or statements (expensive methods) using profiler.
b) Try to fix them by replacing the hotspot method with an efficient one.
c) Profile again and see the top most method will be different one.

After step c, if the application is load tested to check how much throughput improvement is achieved because of a fix to the top hotspot method, there will be either a positive improvement or no improvement at all.

Most of the times it's very difficult to estimate cost(algorithmic) of one request going through multiple layers using asymptotic analysis.

My biggest problem is to determine max throughput that I can expect to achieve with my code improvements, before actually putting together code improvements.

There is a way to find this out at least in J2EE world where applications are usually built using layered architecture. I can find out throughput (pages/sec) of a system by disabling layers one by one. I can also find out which layer needs attention first and use profilers to profile and fix problems.

The disabling of a layer can be done by introducing a cache in front of that layer. I wrote an article about it.

I had used the above method to categorize which layer (UI(server-side), Business, DTO, DA) contributed to system performance cost and introduce code improvements per layer.

It's one more way to approach optimization and do cost vs benefit analysis for J2EE applications.

We can definitely estimate cost using asymptotic analysis if we can isolate performance problems to a specific algorithm.

My 2 cents.

Unknown said...

>Fix them, in the order of highest benefit/cost to lowest

Paul, it looks to me like a classical _optimization problem_ and you're solving it the wrong way around :)
You've got those task with their costs and benefits, and you've got a limitated resource - time. It's a 0-1 kanpsack problem! So why not solve it with dynamic programming as opposed to greedy algorithm you're using right now?

Welcome to the meta-optimization world...

Isaac Gouy said...

Paul Buchheit wrote As for the flags, if someone can demonstrate a real performance improvement, then I'm interested...

afaict the same applies to
-server -XX:CompileThreshold=1

Java Elapsed 6.734
Java Elapsed 3.988
Java Elapsed 4.033

and just

Java Elapsed 4.893
Java Elapsed 3.995
Java Elapsed 3.996

and of course those times vary