0.1 + 0.2 = 0.30000000000000004


Whether you are a newbie or an experienced coder, I deeply believe you have at some point run into a scenario where you do some arithmetic on decimal numbers, like `System.out.println(0.1 + 0.2)`, and surprisingly, the returned value isn't what you expected: you get 0.30000000000000004.

However, `System.out.println(0.1 + 0.3)` shows 0.4 on my screen. At first, I thought my computer was glitching. It's hard to believe that a modern computer can't produce a result as exact as my old hand calculator can. I tried to look it up on Google, but I couldn't find any answer succinct enough to understand. I was daunted. My brain told me: “Let's give up, buddy, we're good without this knowledge.” As a result, I hardly thought about this question again until one day I found out how computers work with floating-point numbers. If you have been wondering, like me, why this magical equation happens, it's time to unveil the secret of the mechanism behind it.
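
To see this for yourself, here is a minimal Java snippet reproducing both results (any recent JDK should behave the same way, since `double` follows IEEE 754):

```java
public class FloatingPointDemo {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2); // prints 0.30000000000000004
        System.out.println(0.1 + 0.3); // prints 0.4
    }
}
```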

First, we have to know that a computer doesn't actually know anything like 0 and 1. These two numbers are symbols representing the two possible electrical states (on/off) of a transistor. Because of that, the computer must represent everything in the form of 0s and 1s. This is different from our standard decimal system, which includes the ten digits from zero to nine. So, as you probably know, when you enter a decimal number, the compiler/interpreter has to convert it to binary digits.

To ensure all computers yield the same result when computing with floating-point numbers, the IEEE Standard for Floating-Point Arithmetic (IEEE 754) came into existence, built on the idea of scientific notation. You factor your number into two parts: a value whose magnitude is in the range \(1 \le n < 2\), and a power of 2. For example:

\(0.1 \quad \textrm{ is written as } \quad 1.6 \times 2^{-4}\)
\(0.2 \quad \textrm{ is written as } \quad 1.6 \times 2^{-3}\)
\(0.3 \quad \textrm{ is written as } \quad 1.2 \times 2^{-2}\)
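
If you want to check these exponents without doing the factoring by hand, the JDK's `Math.getExponent` returns the unbiased base-2 exponent of a `double` (a quick sketch):

```java
public class Exponents {
    public static void main(String[] args) {
        // Math.getExponent returns the power of 2 in the
        // normalized form 1.xxx * 2^exponent.
        System.out.println(Math.getExponent(0.1)); // -4
        System.out.println(Math.getExponent(0.2)); // -3
        System.out.println(Math.getExponent(0.3)); // -2
    }
}
```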

It's pretty easy to represent integers as binary numbers using two's complement, but in order to convert a floating-point number to binary, the first step is to rewrite it in the following form:

$(-1)^{\color{purple}{\textrm{sign bit}}} \times (1 + \color{red}{\textrm{fraction}}) \times 2^{\color{green}{\textrm{exponent}} - \textrm{bias}}$

The second step is to know how to convert this scientific-notation form into the 64-bit floating-point format of IEEE 754, the so-called double-precision floating-point format, which is the most widely used format in modern computers, alongside the single-precision and quad-precision formats. The 64 bits are laid out as follows:

  • The sign occupies 1 bit; it represents whether the number is positive or negative. If it is 0, the value is positive; if it is 1, the value is negative.
  • The exponent occupies 11 bits; this is the superscript part in the forms above, which indicates how far and in which direction the binary point is shifted. However, according to the IEEE 754 standard, instead of storing the exponent as an 11-bit signed two's complement number, the coder's life is easier if the exponent is represented as an 11-bit unsigned number by adding 1023 to it (the exponent bias), giving a range from 00000000001 to 11111111110 (0 and 2047 are excluded as two special values).
  • The fraction (sometimes also called the significand or mantissa) occupies 52 bits. If you are already familiar with converting integers to binary, you can convert the fractional part of a decimal number to binary in much the same way.

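As a quick sanity check of this layout, here is a small Java sketch that pulls the three fields out of a `double` using `Double.doubleToLongBits`, a standard JDK method that returns the raw IEEE 754 bit pattern (the masks and shift amounts simply follow the layout above):

```java
public class DoubleBits {
    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(0.1);

        long sign     = (bits >>> 63) & 0x1L;       // 1 sign bit
        long exponent = (bits >>> 52) & 0x7FFL;     // 11 biased exponent bits
        long fraction = bits & 0xFFFFFFFFFFFFFL;    // 52 fraction bits

        System.out.println(sign);                          // 0
        System.out.println(exponent);                      // 1019 (i.e. -4 + 1023)
        System.out.println(Long.toBinaryString(fraction)); // 1001100110011...1010
    }
}
```
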
For example, let's convert 0.1 to a binary number as follows:

  • The first step is to look at the sign of the number. Since 0.1 is positive, the sign bit is 0:

    \(0.1 = (-1)^0 \times (1 + \color{red}{\textrm{fraction}}) \times 2^{\textrm{power}} \quad \Leftrightarrow \quad 0.1 \,/\, 2^{\textrm{power}} = 1 + \color{red}{\textrm{fraction}}\)

  • Next, we convert the number to base-2 scientific notation by dividing by powers of 2 until the result falls in the range \(1 \le n < 2\)

\[0.1 \,/\, 2^{-1} = 0.2\]
\[0.1 \,/\, 2^{-2} = 0.4\]
\[0.1 \,/\, 2^{-3} = 0.8\]
\[0.1 \,/\, 2^{-4} = 1.6\]

Therefore \(0.1 = 1.6 \times 2^{-4}\)

  • Next, we find the exponent

The power of 2 used above was -4, and the bias for the double-precision format is 1023. Thus,
$\color{green}{\textrm{exponent}} = -4+1023 = 1019 = \color{green}{01111111011}_{\textrm{binary}}$

  • Next, we find the fraction. Since \(1.6 = 1 + 0.6\), the fraction part we need to encode is 0.6, which we convert to binary by repeatedly multiplying by 2:

    0.6 x 2 = 1.2
    0.2 x 2 = 0.4
    0.4 x 2 = 0.8
    0.8 x 2 = 1.6
    0.6 x 2 = …

    Once this process terminates or starts repeating, we read the units digits from top to bottom to reveal the binary form of 0.6: 0.1001… (at this point the list starts repeating); this process is easy to automate, as the sketch after this list shows. So we have: \(\underbrace{1001100110011001100110011001100110011001100110011001}_{\textrm{52 bits}}100110...\)

    Since the pattern doesn't fit in 52 bits, I have to round, and under the default Round to Nearest mode there are two candidates:

    • Smaller fraction (truncated) \(1001100110011001100110011001100110011001100110011001\)
    • Larger fraction (truncated fraction plus 1) \(1001100110011001100110011001100110011001100110011010\)

    Since the 53rd bit is 1, I'm rounding up to the larger fraction.

  • As you can see, 0.6 has a non-terminating, repeating binary form. This is very similar to how a fraction like 5/27 has a non-terminating, repeating decimal form (i.e., 0.185185185…). With only 52 bits available for the stored fraction (53 significant bits counting the implicit leading 1), we finally have the binary string of 0.1 in IEEE 754 format: \(0.1 \quad \textrm{=>} \quad \color{purple}{0} \quad \color{green}{01111111011} \quad \color{red}{1001100110011001100110011001100110011001100110011010}\) Or in another form: \(0.1 \quad => \quad 2^{-4} \quad \times \quad [1].1001100110011001100110011001100110011001100110011010\)

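Here is a minimal Java sketch of that repeated-doubling process (the method name `fractionToBinary` and the bit cap are my own choices for illustration, not part of the standard):

```java
public class FractionToBinary {
    // Convert the fractional part of a number to a binary digit string
    // by repeatedly multiplying by 2 and reading off the units digit,
    // exactly as in the worked example above.
    static String fractionToBinary(double fraction, int maxBits) {
        StringBuilder sb = new StringBuilder("0.");
        for (int i = 0; i < maxBits && fraction != 0.0; i++) {
            fraction *= 2;
            if (fraction >= 1.0) {
                sb.append('1');
                fraction -= 1.0;
            } else {
                sb.append('0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints 0.100110011001... — the repeating pattern for 0.6.
        // Caveat: 0.6 is itself stored inexactly, so the digits are only
        // trustworthy up to roughly 52 bits.
        System.out.println(fractionToBinary(0.6, 56));
    }
}
```
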
Applying the same formula to 0.2, we have: \(0.1 \quad => \quad 2^{-4} \quad \times \quad [1].1001100110011001100110011001100110011001100110011010\) \(0.2 \quad => \quad 2^{-3} \quad \times \quad [1].1001100110011001100110011001100110011001100110011010\)

Or, equivalently:

(Note that \(11001100110011001100110011001100110011001100110011010_{2} \quad = \quad 7205759403792794_{10}\))

\(0.1 \quad => \quad 2^{-56} \quad \times \quad 7205759403792794 \quad = \quad 0.1000000000000000055511151231257827021181583404541015625\) \(0.2 \quad => \quad 2^{-55} \quad \times \quad 7205759403792794 \quad = \quad 0.200000000000000011102230246251565404236316680908203125\)
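
You can print these exact decimal expansions yourself with `java.math.BigDecimal`, whose `double` constructor preserves the exact binary value rather than the rounded string:

```java
import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        // The BigDecimal(double) constructor keeps the exact IEEE 754 value.
        System.out.println(new BigDecimal(0.1));
        // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(new BigDecimal(0.2));
        // 0.200000000000000011102230246251565404236316680908203125
    }
}
```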

To add the two numbers, the exponents need to be the same, so the number with the smaller exponent is shifted right, i.e.:

0.1 => 2^-3 *  0.1100110011001100110011001100110011001100110011001101(0)
0.2 => 2^-3 *  1.1001100110011001100110011001100110011001100110011010
sum =  2^-3 * 10.0110011001100110011001100110011001100110011001100111
or
0.1 => 2^-55 * 3602879701896397  = 0.1000000000000000055511151231257827021181583404541015625
0.2 => 2^-55 * 7205759403792794  = 0.200000000000000011102230246251565404236316680908203125
sum =  2^-55 * 10808639105689191 = 0.3000000000000000166533453693773481063544750213623046875

Since the sum is not of the form 2^n * 1.{bbb}, we increase the exponent by one and shift the binary point to get:

sum = 2^-2  * 1.0011001100110011001100110011001100110011001100110011(1)
    = 2^-54 * 5404319552844595.5 = 0.3000000000000000166533453693773481063544750213623046875

There are now 53 bits in the mantissa (the 53rd is in parentheses in the line above). The default rounding mode for IEEE 754 is ‘Round to Nearest’ - i.e. if a number x falls exactly halfway between two values a and b, the value whose least significant bit is zero is chosen.
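
This “ties go to even” behavior also shows up in Java's `Math.rint`, which applies the same rounding rule at integer granularity (a quick demonstration):

```java
public class RoundHalfEven {
    public static void main(String[] args) {
        // Math.rint rounds halfway cases to the even neighbor,
        // mirroring IEEE 754's default Round to Nearest (ties to even).
        System.out.println(Math.rint(0.5)); // 0.0
        System.out.println(Math.rint(1.5)); // 2.0
        System.out.println(Math.rint(2.5)); // 2.0
    }
}
```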

a = 2^-54 * 5404319552844595 = 0.299999999999999988897769753748434595763683319091796875
  = 2^-2  * 1.0011001100110011001100110011001100110011001100110011

x = 2^-2  * 1.0011001100110011001100110011001100110011001100110011(1)

b = 2^-2  * 1.0011001100110011001100110011001100110011001100110100
  = 2^-54 * 5404319552844596 = 0.3000000000000000444089209850062616169452667236328125

Note that a and b differ only in the last bit; ...0011 + 1 = ...0100. In this case, the value with the least significant bit of zero is b, so the sum is:

sum = 2^-2  * 1.0011001100110011001100110011001100110011001100110100
    = 2^-54 * 5404319552844596 = 0.3000000000000000444089209850062616169452667236328125

whereas the binary representation of 0.3 is:

0.3 => 2^-2  * 1.0011001100110011001100110011001100110011001100110011
    =  2^-54 * 5404319552844595 = 0.299999999999999988897769753748434595763683319091796875

which differs from the binary representation of the sum of 0.1 and 0.2 by only \(2^{-54}\), i.e. one unit in the last place.
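
A quick way to confirm this one-ulp difference in Java (again a sketch; `Math.ulp` returns the spacing between a double and its neighbor):

```java
public class UlpCheck {
    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);    // false
        System.out.println(sum - 0.3);     // 5.551115123125783E-17
        System.out.println(Math.ulp(0.3)); // 5.551115123125783E-17, i.e. 2^-54
        // The raw bit patterns differ by exactly 1.
        System.out.println(Double.doubleToLongBits(sum)
                         - Double.doubleToLongBits(0.3)); // 1
    }
}
```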

The binary representations of 0.1 and 0.2 are the most accurate representations of those numbers allowed by IEEE 754. Adding these representations, because of the default rounding mode, yields a value which differs from the representation of 0.3 only in the least significant bit - and that is the whole secret behind 0.1 + 0.2 = 0.30000000000000004.