Kahan summation algorithm

From Wikipedia, the free encyclopedia

In numerical analysis, the Kahan summation algorithm minimizes the error when adding a sequence of finite precision floating point numbers.

It is of basic use, therefore, on computers. It is also called compensated summation. As the name suggests, this algorithm is attributed to William Kahan.

In pseudocode, the algorithm is:

function kahanSum(input, n)
    var sum = input[1]
    var c = 0.0             //A running compensation for lost low-order bits.
    for i = 2 to n
        y = input[i] - c          //So far, so good: c is zero.
        t = sum + y               //Alas, Sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y         //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t                   //Algebriacally, c should always be zero. Beware eagerly optimising compilers!
    next i                  //Next time around, the lost low part will be added to y in a fresh attempt.
    return sum

An example in six-digit floating-point decimal arithmetic. Suppose Sum has attained the value 100000 and the next value of input(i) is 3.14159 (a six-digit floating point number) and c has the value zero.

  y = 3.14159 - 0
  t = 100000 + 3.14159
    = 100003                       Many digits have been lost!
  c = (100003 - 100000) - 3.14159  This must be evaluated as written! 
    = 3.00000 - 3.14159            The assimilated part of y recovered, vs. the original full y.
    = -0.141590                    The trailing zero because this is six-digit arithmetic.
sum = 100003                       Thus, few digits from input(i) met those of the sum.

The sum is so large that only the high-order digits of the input numbers are being accumulated. But on the next step, suppose input(i) has the value 2.71828, and this time, c is not zero...

  y = 2.71828 - -0.141590          The shortfall from the previous stage has another chance.
    = 2.85987                      It is of a size similar to y: most digits meet.
  t = 100003 + 2.85987             But few meet the digits of sum.  
    = 100006                       Rounding is good, but even with truncation,
  c = (100006 - 100003) - 2.85987  This extracts whatever went in.
    = 3.00000 - 2.85987            In this case, too much.
    = .140130                      But no matter, the excess will be subtracted off next time.
sum = 100006

So the summation is performed with two accumulators: sum holds the sum, and c accumulates the parts not assimilated into sum, to nudge the low-order part of sum the next time around. Thus the summation proceeds with "guard digits" in c which is better than not having any but is not as good as performing the calculations with double the precision of the input. However, if input is already in double precision, few systems supply quadruple precision, and if they did, what if input were quadruple precision...

Another approach is to perform the summation on differences from a working mean (in the hope that the value of sum never becomes much larger than individual differences), except that the values might be quite different from the working mean and thus suffer significant truncation. Alternatively, sort the values and pair positive and negative values so that the accumulated sum remains as close to zero as possible, at great cost in computational effort.

[edit] Caution! Beware Optimising Compilers!

The code produced by the compiler must follow the steps exactly as written! Alas, writers of compilers are often casual in their understanding of floating-point arithmetic and when considering optimisations are seduced by the purity of the mathematical analysis of real numbers when instead it is floating-point arithmetic that is being dealt with. Applied to this code

  t:=sum + y;
  c:=(t - sum) - y;

it will swiftly be deduced that

  c = 0

which is constant and need not be computed within the loop; further, since it is initialised to zero, the statement

  y:=input[i] - c;

can be contracted so that the loop becomes

  y:=input[i];
  t:=sum + y;
  sum:=t;

and further that these variables are just waystations, so

  sum:=sum + input[i];

It might be that some particular optimising compiler would carry its analyses so far as to deduce that a summation of input is intended, and then generate code employing maximum precision and its own version of this algorithm, but far more likely is that it will do something that will result in code that wrecks the workings of the algorithm you have coded.

The algorithm's execution can also be affected by non-mathematical optimisations. For instance, it is quite common for the floating-point computations to be carried out in machine registers that have a precision higher than that of the variables held in main storage, as in the IBM-PC and clones where the registers hold 80-bit floating-point numbers while in main storage they might be held only as 32-bit, or 64-bit as well as 80-bit. The sequence

  y:=input[i] - c;
  t:=sum + y;
  c:=(t - sum) - y;

might be compiled without any of the unwanted mathematical transformations, but, notice that after the code for the first statement is executed, the register holding the result that was stored in y still has (or could still have: the registers might be organised as a stack with overflow to memory) that result and as the next statement refers to y, perhaps the code for it could be arranged so that the value of y need not be fetched from memory; similarly for t in the third statement. If the stored values are held to the same precision as the registers, then all will be well: the registers and main storage are thus equivalent except for the speed of access. But if not, the working of the algorithm will again be ruined. Optimisation options helpful for some parts of the programme will not necessarily be good for all parts of a programme.

It might be that some optimising compiler will recognise that variables y and t are waystations that need not be saved into main memory (especially if their usage is only within the loop), and it may further be that the computer has enough floating-point registers available that the compiler can generate code that employs only full-precision registers to hold sum and the waystations (saving the summation into sum at the end of the loop), but input may be a complex entity such as a computation itself needing registers for its evaluation, too many registers, or worse still, be a function invocation with an arbitrary requirement. Thus it is unlikely that the optimum outcome of a full-precision summation via registers employing this algorithm will be attained without the most careful attention to detail, even inspection of the machine code produced.

Alternatively, some computer languages offer summation features:

  +/input          APL (read this as "Plus, over input")
  SUM(input)       Fortran, a semi-function SUM supplied as a compiler intrinsic.

And it might be that the implementation will indeed use the best method. Alas, the Fortran manual to hand offers no details of its internal workings, merely that the result will be in the same precision as input. Inspection of the code generated by the Compaq Visual Fortran v6.6 compiler for a simple usage reveals the equivalent of

  Load  sum
  Add   input(i)
  Store sum

There is no attempt to hold the value of sum in a register, therefore, even if the addition were performed with a precision greater than that of sum, that will be lost when the result is stored into sum and later retrieved. So there is nothing beyond a straight summation.