School of Engineering and Applied Science
Department of Computer Science
CSci 133 -- Algorithms and Data Structures I
http://www.seas.gwu.edu/~csci133/spring03
Prof. Michael B. Feldman
mfeldman@gwu.edu

Notes on Algorithm Performance Prediction
January 2003

adapted from Chapter 3 of
M.B. Feldman, Software Construction and Data Structures with Ada 95,
Addison Wesley, 1996.

1. Performance Prediction and the "Big O" Notation

In considering the trade-offs among alternative problem solutions, an important factor is the expected computation time of each of the alternatives. It is difficult to predict the actual computation time of an algorithm without knowing the intimate details of the underlying computer, the object code generated by the compiler, and other related factors. The actual time must really be measured for a given algorithm, language, compiler, and computer system by means of some carefully designed performance tests, usually called benchmarks.

On the other hand, it is very helpful to know the way the running time will vary or grow as a function of the "problem size": the number of elements in an array, the number of records in a file, and so forth. Programmers sometimes discover that programs that have run in perfectly reasonable time, for the small test sets they have used, take extraordinarily long when run with "real world"-size data sets or files. These programmers were deceived by the "growth rate" of the computation.

To take an example, programs whose running time varies with the square of the problem size are not unusual. A program taking, say, 1 second to complete a file-handling problem with 10 records in the file, will require not 2 but 4 seconds for 20 records. Increasing the file size by a factor of 10, to 100 records, will multiply the original run time by 100, to 100 seconds. One thousand records will need 10,000 seconds, or about 3 hours, to complete! And 10,000 records (the number of accounts in a fair-sized bank, or students in a fair-sized university) will need almost 2 weeks! This is a long time by comparison to the 1 second taken by the 1-record test.

Suppose that this program is moved to a newer computer that is, say, twice as fast in every respect as the old one. All the running times will be halved, which means that the 2-week run will now take "only" 1 week. This is probably still much longer than the time that was desired. The difficulty lies not in the original computer being "too slow," but in the poor growth-rate performance of the algorithm, and only real improvement in the growth rate will yield significant performance speedup.

It is also futile to blame this sort of poor performance on a language or a compiler. A compiler only translates the high-level statements of an algorithm into machine instructions; it does not—cannot—change the algorithm in any significant way. A better compiler can effect the sort of incremental speedup expected from a faster computer, but it cannot compensate for a poorly chosen or poorly coded algorithm.

This example shows that it makes sense to know something about growth rates, lest program running time grow in unpleasantly surprising ways when problems grow to meaningful size. Sometimes there is no choice: There may be no alternative solution to that program running in "squared," or quadratic, time. But at least a programmer with some experience in performance estimation will not be surprised!
 

2. Algorithm Growth Rates

Getting a precise estimate of the computation time of an algorithm is often difficult, but as you have seen, it helps to "get a handle on it." We do this by trying to write a formula for the computation time in terms of the problem size N. By the problem size, generally we mean the number of data items that must be processed by the algorithm.

The computation time of an algorithm consists of two factors. One factor depends on the programming language, compiler, and speed and instruction set of the underlying computer. It is often a good assumption that this system-dependent factor is reasonably constant, not varying with the problem size, and so we can "factor it out." (Obviously it’s nice to have a small system dependent-constant as well as a small growth rate, but reducing the size of the constant is hard to do in a general way precisely because it’s system-dependent!)

We give the name growth rate to that part of the formula that does vary with problem size. In discussing the growth rates of algorithms, it is conventional to use the notation O() (read "growth rate," "big O," or "order of magnitude"). The most common growth rates you will normally encounter are the following:

To give you an idea of the computation time of typical file sizes, Figure 1 shows the values of each of these functions for a number of different values of N. The values happen to be powers of 2, but this is just to make the computation of logarithms convenient.
 
Fig. 1
Common Algorithm Growth Rates
N 1 (constant) log N N log N N2
1 1 0 0 1
2 1 1 2 4
4 1 2 8 16
8 1 3 24 64
16 1 4 64 256
32 1 5 160 1024
64 1 6 384 4096
128 1 7 896 16384
256 1 8 2048 65536
512 1 9 4608 262144
1024 1 10 10240 1048576
2048 1 11 22528 4194304
4096 1 12 49152 16777216
8192 1 13 106496 67108864
16384 1 14 229376 268435456
32768 1 15 491520 1073741824

From this table you can see that as N grows, log N remains quite small with respect to N and N log N grows fairly large, but not nearly as large as N2. In studying sorting later, you’ll discover that most sorting methods have growth rates of N log N or N2. In the next section we will look at some common algorithmic structures and discuss ways to estimate their growth rates.
 

3. Estimating the Growth Rate of an Algorithm

While there are no absolute "cookbook" rules that will always work to estimate performance, we can "get a handle on it" by taking advantage of the fact that algorithms are developed in a structured way. Structured algorithms combine statements into usefully complex blocks in four ways: In Figure 2 you can see the Java notation for a number of different variations on these structures. Now let’s take a look at some typical algorithm structures and estimate their "big O’s". We’ll always use N to denote the "problem size."
 
Figure 2
Some Java Control Structures
(a) Sequence

Temp = A;
A = B;
B = Temp;

(b) Decision

if (x > Max)
{
  Max = x;
}

(c) If-then-else

if (x = y)
{
  Max = x;
}
else
{
  Max = y;
}

(d) Counting loop

int x = 0;
for (int i = p; i <= q; i++)
{
  x = x + i;
}

(e) While loop

while (x > 0)
{
  y = y + 3;
  x = x / 2;
{

Simple Statement

A simple statement is, for example, an assignment statement. If we assume that the statement contains no function calls (whose execution time may, of course, vary with problem size), the statement takes a fixed amount of time to execute. This we denote by O(1), because if we factor out the constant execution time, we’re left with 1.

Sequence of Simple Statements

A sequence of simple statements takes time equal to the sum of the individual statement times. If the individual statements are O(1), then so is the sum.

Decision

For purposes of estimating performance, we rely on the fact that both the thenpart and the else part can be arbitrary structures in their own right. Whether the thenpath or the elsepath will be executed depends, of course, on the data and other execution-time conditions. To estimate conservatively, then, we must take the larger of the two individual "big O’s" as the "big O" of the decision.

There are variations of the decision structure. For example, the switch structure is really a multiway if-then-else, so in estimating a switch, we just take the largest "big O" of all of the case choices.

Note that performance estimation can sometimes get complicated: The condition tested in a decision may involve a function call, and the timing of the function call may itself vary with problem size!

Counting Loop

A counting loop is a loop in which the loop counter is incremented or decremented each time the loop is executed. This is different from some loops we will consider a bit later, in which the counter is multiplied or divided by a value.
What is the performance of a simple counting loop? Suppose the body of the loop contains only a sequence of simple statements. Then the performance of the loop is just the number of times the loop executes. Let us use the term trip count to mean the number of times a loop executes (number of "trips" through the loop body). If the trip count is constant—independent of problem size—then the whole loop is O(1). On the other hand, if the loop is something like
for (int counter = 1; counter < N; counter++)
the trip count does depend on N, so the performance is O(N). These two loop structures, in which the body contains only simple statements, are shown in Figure 3.
 
Figure 3
Two Simple Counting Loops
(a) Trip count is constant

for (int counter = 1; counter < 5; counter++)
{
  // something with O(1) performance
}

(b) Trip count depends on N

for (int counter = 1; counter < N; counter++)
{
  // something with O(1) performance
}

Now suppose that the loop body is more complex. Real algorithms have this sort of complexity, so let’s consider a number of possibilities. Remember that we can’t cover every case; we’ll look at some common ones, so that you can recognize these when you see them.

Figure 4 shows three double counting loops. In (a), the outer loop’s trip count is clearly N. However, the inner loop executes N times for each time the outer loop executes, so the body of the inner loop will be executed N x N times, and the performance of the entire structure is O(N2).
 

Figure 4
Three Simple Counting Loops
(a) A double counting loop

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int innerCounter = 1; innerCounter < N; innerCounter++)
  {
    // something with O(1) performance
  }
}

(b) Another double counting loop

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int innerCounter = 1; innerCounter < outerCounter; innerCounter++)
  {
    // something with O(1) performance
  }
}

(c) Yet another double counting loop

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int innerCounter = outerCounter; innerCounter < N; innerCounter++)
  {
    // something with O(1) performance
  }
}

Now let's consider (b). The outer loop surely has a trip count of N. But the trip count of the inner loop depends not only on N but also on the value of outerCounter! If outerCounter is 1, the inner loop has a trip count of 1. If outerCounter is 2, the inner loop trip count is 2; if outerCounter is 3, the inner loop trip count is 3. Finally, if outerCounter is N, the inner loop trip count is N.

How many times will the body of the inner loop be executed? It will be the sum
1 + 2 + 3 +. . .+ N ? 1 + N.

This summation, as you probably learned in an algebra course, is
N x (N + 1)/2 = ((N2) + N)/2.

We will say that the performance of this structure is O(N2), since for large N the contribution of the N/2 term is negligible. For example, if N is 100, including the N/2 term gives 5050; ignoring it gives 5000, a difference of only 1%.
It is interesting that making the inner loop trip count depend on OuterCounter does not alter the "big O," since we neglect the term in N.

The structure in (c) is similar, but the trip count of the inner loop decreases rather than increases as above. If outerCounter is 1, the inner loop has a trip count of N. If outerCounter is 2, the inner loop trip count is N - 1; if outerCounter is 3, the inner loop trip count is N - 2. Finally, if outerCounter is N, the inner loop trip count is 1.

The number of times the body of the inner loop is executed is the sum
N + N - 1 + N - 2+ . . . + 1
which is really the same sum as before:
N x (N + 1)/2 = ((N2) + N)/2.
This structure also has performance O(N2).
Look at the loop structures in Fig. 5 and convince yourself that in all cases the performance is O(N3).
 

Figure 5
Some Triple Counting Loops
(a) 

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int middleCounter = 1; middleCounter < N; middleCounter++)
  {
      for (int innerCounter = 1; innerCounter < N; innerCounter++)
      {
        // something with O(1) performance
      }
 }
}

(b) 

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int middleCounter = 1; middleCounter < outerCounter; middleCounter++)
  {
      for (int innerCounter = 1; innerCounter < middleCounter; innerCounter++)
      {
        // something with O(1) performance
      }
 }
}

(c) 

for (int outerCounter = 1; outerCounter < N; outerCounter++)
{
  for (int middleCounter = 1; middleCounter < outerCounter; middleCounter++)
  {
      for (int innerCounter = middleCounter; innerCounter < N; innerCounter++)
      {
        // something with O(1) performance
      }
 }
}

From these examples we can generalize as follows: a structure with k nested counting loops—loops in which the counter is just incremented or decremented by 1—has performance O(Nk) if the trip count of each loop depends on the problem size. A growth rate of O(Nk) is called polynomial.

Multiplicatively Controlled Loop

Counting loops are additively controlled: the counter is incremented or decremented by some constant value, often 1. By a multiplicatively controlled loop, we mean one in which the variable controlling the loop is multiplied or divided by a constant each time the loop is executed. Multiplicatively controlled loops arise often in various kinds of algorithms you will learn.

Although most programming languages have a special structure for counting loops, they usually have no structure designed specifically to accommodate multiplicative control; we just use a while or until loop of some kind.
Recall that whatever the specific structure used, every loop needs

The difference between a while structure and an until structure is that in the former the termination condition is tested before each iteration, and in the latter the condition is tested at the end of each iteration. Thus an until loop has a trip count of at least 1, where a while loop may have a trip count of 0 under certain conditions.
Consider the structures in Figure 6.
 
Figure 6
Two Multiplicately Controlled Loops
(a) Control is multiplied by 2

int Control = 1;
while (Control <= N)
{
  // something with O(1) performance
  Control = 2 * Control;
}

(b) Control is divided by 2

int Control = N;
while (Control >= 1)
{
  // something with O(1) performance
  Control = Control / 2;
}

In (a), whose performance clearly depends on the problem size N, the variable Control is multiplied by the constant 2 until Control becomes larger than N. Since Control’s starting value is 1, after k iterations,
Control = 2k
The number of iterations k can be found just by taking logarithms of both sides so that we get
log2 Control = k
Since the loop stops when Control >= N, the performance of this algorithm is O(log N).

Looking at the structure a bit more generally, suppose we multiply Control by some other constant factor. Giving this constant the name Factor, we can see that after k iterations
Control = Factork
and so, by the argument above, the performance is O(logk N) instead of O(log2 N). However, in considering the "big O" of an algorithm, it doesn’t matter what base we use for logarithms. This is because the logarithm of a number to one base is just a constant times the logarithm of the same number to a different base. Since constant factors are "factored out" of a "big O," the base doesn’t matter and we usually just refer to O(log N) or O(log2 N). As an exercise, you can fill in the details of a proof that the base really doesn’t matter.

Now look at (b), where the control variable is divided by a factor (2 in this case) instead of multiplied. This is very similar to the previous example. There, Control was started at a small value and multiplied repetitively until it reached some maximum; here, Control is started at a large value and divided repetitively until it reaches a minimum.
What is the "big O" of this structure? Instead of repeating the analysis above, we just say that it is O(log N) and leave the details to you.

Now look at the two structures in Figure 7. Here we have analogies to the nested counting loops we considered earlier. The performance of these structures is O(N log N); you can do the analysis as an exercise.
 

Figure 7
Two N log N Loop Structures
(a) 

int Control = 1;
for (int Counter = 1; Counter <= N; Counter++)
{
  while (Control <= N)
  {
    // something with O(1) performance
    Control = 2 * Control;
  }
}

(b)

int Control = N;
while (Control >= 1)
{
  for (int Counter = 1; Counter <= N; Counter++)
  {
    // something with O(1) performance
  }
  Control = Control / 2;
}

Subprogram Call

We can handle a subprogram call by realizing that the subprogram is also an algorithm with its own "big O," then imagining that this algorithm appears "in line" with the calling program.

In this way, we can deal with it as we have dealt with other complex algorithms above. If the subprogram call appears inside a decision statement, its "big O" is used in determining the maximum of the "big O’s" of the different branches of the decision. If the subprogram call appears inside a loop, its "big O" is, essentially, multiplied by the trip count of the loop.

If the subprogram (call it A) in turn calls another subprogram (call it B), we use B’s "big O" in calculating A’s, and then A’s in calculating the calling program’s "big O," and so on for deeper nesting of subprograms.

The structures you have seen here show examples of the most common "big O" performances: constant, linear, quadratic, log N, and N log N. They also show how to think through the "big O" analysis for composite program control structures.

(end)