The drawing shows the binary code for the ASCII characters that spell “Google”:

01000111 01100111 01101111 01101100 01101111 01100101

(The last ‘1’ is unfinished in the drawing.)

Converted to decimal it reads

71 103 111 108 111 101

Converted to letters by looking up in an ASCII table it reads

G g o l o e
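You can check the decoding with a few lines of Python (the bit strings below are copied from the drawing):

```python
# Decode each eight-bit group as an ASCII character.
groups = "01000111 01100111 01101111 01101100 01101111 01100101".split()
values = [int(g, 2) for g in groups]    # binary -> decimal
print(values)                           # [71, 103, 111, 108, 111, 101]
print("".join(chr(v) for v in values))  # Ggoloe
```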

(Here are two of his papers to check out: De Progressione Dyadica and Explication de l’Arithmétique Binaire.)

This site doesn’t process sensitive data like logins or credit cards, so technically HTTPS is not necessary. But Internet users have come to expect encryption for everything, including browsing a basic informational site like this one. Also, Google adds another incentive: they use HTTPS as a signal for search result rankings. In any case, using HTTPS makes a site look more modern.

I’ve removed all the “mixed content” warnings I could find, but please tell me if you find any. (A page with mixed content won’t have the “padlock” in the browser address bar.) Also tell me if you experience a degradation of performance (I have not).

Please consider changing your links to my site to use HTTPS: https://www.exploringbinary.com/…/. Old HTTP links will redirect to HTTPS, but direct links would be cleaner.

The maximum digit counts are useful if you want to print the full decimal value of a floating-point number (worst case format specifier and buffer size) or if you are writing or trying to understand a decimal to floating-point conversion routine (worst case number of input digits that must be converted).

For integers, it’s easy to determine the maximum number of decimal digits — just count the digits of the largest floating-point number. (This may seem obvious, but wait until we discuss fractions.) Smaller numbers may have as many digits, but they can never have more.

The largest value of a double, known as DBL_MAX, is a significand of 53 1s starting at the place of the largest power of two exponent, 1023; here it is, expressed in normalized binary scientific notation:

`1.1111111111111111111111111111111111111111111111111111 x 2`^{1023}

Written out longhand in binary it is 1024 bits, 53 1s followed by 971 zeros.

This number can be expressed as (2 – 2^{-52}) · 2^{1023} = (1 – 2^{-53}) · 2^{1024} = 2^{1024} – 2^{971} = (2^{53} – 1) · 2^{971}. In decimal it is

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

It has **309 (significant) digits**. You can count them, or use this formula: ⌊log_{10}((2^{53} – 1) · 2^{971})⌋ + 1 = 309.

(Formulas with logarithms can be rewritten so that they are computed more efficiently; for example, the above can be written as ⌊log_{10}(2^{53} – 1) + 971 · log_{10}(2)⌋ + 1. However, to keep things simple, I will not express them that way.)

The largest value of a float, known as FLT_MAX, is a significand of 24 1s starting at the place of the largest power of two exponent, 127:

`1.11111111111111111111111 x 2`^{127}

Written out longhand in binary it is 128 bits, 24 1s followed by 104 zeros.

This number can be expressed as (2 – 2^{-23}) · 2^{127} = (1 – 2^{-24}) · 2^{128} = 2^{128} – 2^{104} = (2^{24} – 1) · 2^{104}. In decimal it is

340282346638528859811704183484516925440

It has **39 (significant) digits**: ⌊log_{10}((2^{24} – 1) · 2^{104})⌋ + 1 = 39.
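Both integer digit counts are easy to verify in Python, whose arbitrary-precision integers let you count the digits of the exact values directly (a quick sketch, not part of the derivation above):

```python
# DBL_MAX and FLT_MAX as exact integers: (2^53 - 1) * 2^971 and (2^24 - 1) * 2^104.
dbl_max = (2**53 - 1) * 2**971
flt_max = (2**24 - 1) * 2**104
# Counting the digits of the exact value sidesteps any logarithm rounding worries.
print(len(str(dbl_max)), len(str(flt_max)))  # 309 39
```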

With fractions, finding the maximum required digits is not as simple as counting the digits of the smallest number. Also, we have to specify what we mean by “maximum” — is it the total length of the fraction, which includes leading zeros, or just the length of the significant digits? ^{1} We will be looking for the maximum significant digits, although the maximum number of digits overall will come out in the process.

We’ll first look at the smallest values of a double — the smallest *normal* and *subnormal* numbers — and then we’ll look at the numbers with the most significant digits.

The smallest (positive) normal value of a double, known as DBL_MIN, is 2^{-1022}. In binary it is 1022 bits, 1021 leading zeros followed by a 1. In decimal it is

0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625

It has **1022 digits**, 307 leading zeros followed by **715 significant digits**. You can count the digits or just use this formula: 1022 + ⌊log_{10}(2^{-1022})⌋ + 1 = 715.

That’s a lot of digits, but it’s neither the maximum total digits nor the maximum significant digits possible.

Let’s look at another number, 2^{-1074}. It’s the smallest subnormal value of a double — the smallest value of a double period. In binary it is 1074 bits, 1073 leading zeros followed by a 1. In decimal it is

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625

It has **1074 digits**, 323 leading zeros followed by **751 significant digits**: 1074 + ⌊log_{10}(2^{-1074})⌋ + 1 = 751.

There aren’t numbers with more total digits, but there are numbers with more significant digits.

That there are numbers with more digits than 2^{-1022} and 2^{-1074} is not surprising if you’ve read this. The most significant digits come from a binary floating-point number that has the maximum length 1s-filled significand ending at the lowest place — place 1074 for doubles.

The double-precision number that fits the bill has a significand of 53 1s starting at the place of the smallest normal power of two exponent, -1022; it is this 53 significant bit number:

`1.1111111111111111111111111111111111111111111111111111 x 2`^{-1022}

Written out longhand in binary it is 1074 bits, 1021 leading zeros followed by 53 1s.

This number can be expressed as (2 – 2^{-52}) · 2^{-1022} = (1 – 2^{-53}) · 2^{-1021} = 2^{-1021} – 2^{-1074} = (2^{53} – 1) · 2^{-1074} = (2^{53} – 1) / 2^{1074}. In decimal it is

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000044501477170144022721148195934182639518696390927032912960468522194496444440421538910330590478162701758282983178260792422137401728773891892910553144148156412434867599762821265346585071045737627442980259622449029037796981144446145705102663115100318287949527959668236039986479250965780342141637013812613333119898765515451440315261253813266652951306000184917766328660755595837392240989947807556594098101021612198814605258742579179000071675999344145086087205681577915435923018910334964869420614052182892431445797605163650903606514140377217442262561590244668525767372446430075513332450079650686719491377688478005309963967709758965844137894433796621993967316936280457084866613206797017728916080020698679408551343728867675409720757232455434770912461317493580281734466552734375

It has **1074 digits**, 307 leading zeros followed by **767 significant digits**: 1074 + ⌊log_{10}((2^{53} – 1) / 2^{1074})⌋ + 1 = 767.

This is the most significant digits a double can have, but it’s not the only double with that many; some (relatively speaking) smaller numbers have just as many. In this case, any significand with bits 1 and 53 equal to 1 — that is, half of the doubles between (2^{52} + 1) / 2^{1074} and (2^{53} – 1) / 2^{1074} — will have a decimal value with 767 significant digits.
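These counts can be checked exactly in Python. The trick, which is proved for binary fractions generally later in this document, is that a/2^{n} = (a·5^{n})/10^{n}, so the full n-digit expansion is just a·5^{n} padded with leading zeros:

```python
# Exact expansion of the worst case double, (2**53 - 1) / 2**1074:
# a/2^n = (a * 5^n) / 10^n, so the expansion is a*5^n zero-padded to n digits.
a, n = 2**53 - 1, 1074
digits = str(a * 5**n).zfill(n)
total = len(digits)
leading = total - len(digits.lstrip("0"))
print(total, leading, total - leading)  # 1074 307 767
```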

There are also some subnormal numbers with the same number of significant digits. Consider the number that is one ULP below DBL_MIN. It has a significand of *52* 1s starting at the place of the largest subnormal power of two exponent, -1023; it is this 52 significant bit number:

`1.111111111111111111111111111111111111111111111111111 x 2`^{-1023}

Written out longhand in binary it is 1074 bits, 1022 leading zeros followed by 52 1s.

This number can be expressed as (2 – 2^{-51}) · 2^{-1023} = (1 – 2^{-52}) · 2^{-1022} = 2^{-1022} – 2^{-1074} = (2^{52} – 1) · 2^{-1074} = (2^{52} – 1) / 2^{1074}. In decimal it is

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000022250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375

It has **1074 digits**, 307 leading zeros followed by **767 significant digits**: 1074 + ⌊log_{10}((2^{52} – 1) / 2^{1074})⌋ + 1 = 767.

Why does it also have 767 digits? Well of course because the logarithm comes out the same. But let’s look at it in terms of the structure of the number. It has one less significant bit than the number described above, which makes it lose a significant digit. But that’s offset by the lowered exponent, which in this case — as is the case about 70% of the time — adds back a significant digit. (This by the way explains the difference in the number of significant digits between this number and 2^{-1074}, which has 751; the 51 extra bits add 51 digits, but moving the exponent 51 places higher subtracts about 0.7 · 51 ≈ 36 digits.)

As it turns out, half of the doubles between (2^{51} + 1) / 2^{1074} and (2^{52} – 1) / 2^{1074} will have a decimal value with 767 significant digits.

The range of 767 significant digit numbers even continues into the next subnormal exponent down (one digit lost, one digit gained), although it does not span all its values; it goes from (2^{50} + 898122626230483) / 2^{1074} through (2^{51} – 1) / 2^{1074}. (I determined that big constant in the first numerator through trial and error.)

Overall, the 767 significant digit numbers live in the range (2^{50} + 898122626230483) / 2^{1074} through (2^{53} – 1) / 2^{1074}. (You can verify this with the logarithm formula.)
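For odd numerators a, the significant digit count of a/2^{1074} is just the digit count of a·5^{1074}, so the endpoints of this range can be checked with exact integer arithmetic (a sketch; the lower boundary constant is the trial-and-error one quoted above):

```python
# Significant digits of a / 2**1074 for odd a: the digit count of a * 5**1074.
def sig_digits(a):
    return len(str(a * 5**1074))

lo = 2**50 + 898122626230483   # lower end of the 767-digit range
assert sig_digits(lo) == 767
assert sig_digits(2**51 - 1) == 767
assert sig_digits(2**52 - 1) == 767
assert sig_digits(2**53 - 1) == 767
```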

I’ll do a similar progression for floats, but with less narration.

The smallest normal value of a float, known as FLT_MIN, is 2^{-126}. In binary it is 126 bits, 125 leading zeros followed by a 1. In decimal it is

0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625

It has **126 digits**, 37 leading zeros followed by **89 significant digits**: 126 + ⌊log_{10}(2^{-126})⌋ + 1 = 89.

That number has neither the maximum total digits nor the maximum significant digits possible.

Let’s look at another number, 2^{-149}. It’s the smallest subnormal value of a float — the smallest value of a float period. In binary it is 149 bits, 148 leading zeros followed by a 1. In decimal it is

0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

It has **149 digits**, 44 leading zeros followed by **105 significant digits**: 149 + ⌊log_{10}(2^{-149})⌋ + 1 = 105.

There aren’t numbers with more total digits, but there are numbers with more significant digits.

A significand of 24 1s starting at the place of the smallest normal power of two exponent (-126) will give us a float with the most significant digits; it is this 24 significant bit number:

`1.11111111111111111111111 x 2`^{-126}

Written out longhand in binary it is 149 bits, 125 leading zeros followed by 24 1s.

This number can be expressed as (2 – 2^{-23}) · 2^{-126} = (1 – 2^{-24}) · 2^{-125} = 2^{-125} – 2^{-149} = (2^{24} – 1) · 2^{-149} = (2^{24} – 1) / 2^{149}. In decimal it is

0.00000000000000000000000000000000000002350988561514728583455765982071533026645717985517980855365926236850006129930346077117064851336181163787841796875

It has **149 digits**, 37 leading zeros followed by **112 significant digits**: 149 + ⌊log_{10}((2^{24} – 1) / 2^{149})⌋ + 1 = 112.

This is not the only float with that many significant digits; some smaller numbers have just as many. In this case, any significand with bits 1 and 24 equal to 1 — that is, half of the floats between (2^{23} + 1) / 2^{149} and (2^{24} – 1) / 2^{149} — will have a decimal value with 112 significant digits.

There are also some subnormal numbers with the same number of significant digits. Consider the number that is one ULP below FLT_MIN. It has a significand of 23 1s starting at the place of the largest subnormal power of two exponent (-127); it is this 23 significant bit number:

`1.1111111111111111111111 x 2`^{-127}

Written out longhand in binary it is 149 bits, 126 leading zeros followed by 23 1s.

This number can be expressed as (2 – 2^{-22}) · 2^{-127} = (1 – 2^{-23}) · 2^{-126} = 2^{-126} – 2^{-149} = (2^{23} – 1) · 2^{-149} = (2^{23} – 1) / 2^{149}. In decimal it is

0.00000000000000000000000000000000000001175494210692441075487029444849287348827052428745893333857174530571588870475618904265502351336181163787841796875

It has **149 digits**, 37 leading zeros followed by **112 significant digits**: 149 + ⌊log_{10}((2^{23} – 1) / 2^{149})⌋ + 1 = 112.

There are more 112 significant digit numbers below that, with the same exponent; they range from (2^{22} + 2941935) / 2^{149} through (2^{23} – 1) / 2^{149}.

Overall, the 112 significant digit numbers live in the range (2^{22} + 2941935) / 2^{149} through (2^{24} – 1) / 2^{149}.

Here are the maximum number of significant digits required for quadruple-precision:

- *Maximum length integer.* An integer with the maximum number of significant digits is (2^{113} – 1) · 2^{16271}. It has ⌊log_{10}((2^{113} – 1) · 2^{16271})⌋ + 1 = **4,933 digits**.
- *Maximum length fraction.* A fraction with the maximum number of significant digits is (2^{113} – 1) / 2^{16494}. It has 16,494 digits: 4,931 leading zeros followed by **11,563 significant digits**: 16494 + ⌊log_{10}((2^{113} – 1) / 2^{16494})⌋ + 1 = 11,563!
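The quadruple-precision counts can be verified the same way as the float and double counts, with exact integer arithmetic (a Python sketch):

```python
# Quad: 113-bit significand, max exponent 16383, min subnormal 2**-16494.
max_int = (2**113 - 1) * 2**16271                 # analog of DBL_MAX, as an exact integer
frac = str((2**113 - 1) * 5**16494).zfill(16494)  # expansion of (2**113 - 1)/2**16494
leading = len(frac) - len(frac.lstrip("0"))
print(len(str(max_int)), len(frac), leading, len(frac) - leading)
# 4933 16494 4931 11563
```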

You can apply the above analysis to find the maximum number of significant digits in other floating-point formats as well.

Some programming languages let you print all these digits. In Python 3, this line will print all 1074 digits of the worst case double-precision example (2^{53} – 1) · 2^{-1074}, as displayed above:

print(format((pow(2,53)-1)*pow(2,-1074),".1074f"))

You can print just the 767 significant digits with this line (uses the ‘g’ presentation type instead of ‘f’):

print(format((pow(2,53)-1)*pow(2,-1074),".767g"))

4.4501477170144022721148195934182639518696390927032912960468522194496444440421538910330590478162701758282983178260792422137401728773891892910553144148156412434867599762821265346585071045737627442980259622449029037796981144446145705102663115100318287949527959668236039986479250965780342141637013812613333119898765515451440315261253813266652951306000184917766328660755595837392240989947807556594098101021612198814605258742579179000071675999344145086087205681577915435923018910334964869420614052182892431445797605163650903606514140377217442262561590244668525767372446430075513332450079650686719491377688478005309963967709758965844137894433796621993967316936280457084866613206797017728916080020698679408551343728867675409720757232455434770912461317493580281734466552734375e-308

In C (GCC and Visual Studio), this line will also print all 1074 digits (must also include *math.h*):

printf("%.1074f\n",(pow(2,53)-1)*pow(2,-1074));

Substituting ‘g’ for ‘f’ will print only the significant digits:

printf("%.767g\n",(pow(2,53)-1)*pow(2,-1074));

(I was happy that all the *pow()* implementations computed 2^{-1074} correctly. It’s a power of two so I expected them to, but you never know.)

Java, PHP, and JavaScript won’t let you print all those digits.

For double-precision, for example, it may seem like 17 significant digits of a decimal input is enough to convert it correctly, but it’s not; 17 digits only helps you return to the proper double once you’ve determined it.

A decimal to floating-point conversion routine has to consider all input digits up to the maximum, plus one digit for rounding. (Any digits beyond can just be considered “sticky”.) For example, for the input decimal representing 2^{-1022} + 2^{-1074} + 2^{-1075}, 1075 digits (768 significant) must be processed.
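This worst case can be made concrete with a Python sketch: build the exact 1075-digit decimal for 2^{-1022} + 2^{-1074} + 2^{-1075} = (2^{53} + 3)/2^{1075} and feed it to Python's correctly rounding float(). The input is exactly halfway between two adjacent doubles, so round-half-to-even applies, and dropping even the final digit changes the answer:

```python
# 2**-1022 + 2**-1074 + 2**-1075 = (2**53 + 3) / 2**1075, exactly halfway
# between the doubles 2**-1022 + 2**-1074 and 2**-1022 + 2**-1073.
a, n = 2**53 + 3, 1075
s = "0." + str(a * 5**n).zfill(n)  # all 1075 fraction digits (768 significant)
assert float(s) == 2**-1022 + 2**-1073       # halfway: rounds to the even neighbor
assert float(s[:-1]) == 2**-1022 + 2**-1074  # truncate one digit: now below halfway
```

Losing the 1075th digit moves the conversion from one double to its neighbor, which is the sense in which all 1075 digits must be processed.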

Here is a summary of the digit counts we derived:

| Format | Max Integer Digits (Total/Significant) | Max Fraction Digits (Total) | Max Fraction Digits (Leading Zero) | Max Fraction Digits (Significant) |
|---|---|---|---|---|
| float | 39 | 149 | 37 | 112 |
| double | 309 | 1,074 | 307 | 767 |
| quad | 4,933 | 16,494 | 4,931 | 11,563 |

It is the maximum number of digits in a fraction that determines the maximum number of digits for a given IEEE format.

Notice the near symmetry between the number of integer digits and the number of fractional leading zeros. (If the absolute values of the minimum and maximum exponents of each format were equal, and if we listed the starting place of the significant digits instead of the count of leading zeros, it’d be symmetric.) For the fractions, we’ve put the biggest significand at the lowest place.

^{1} I’ll use the definition that *significant digits* are all the digits following the leading zeros; don’t think of them as digits of precision.

One digit per bit? We know that’s not true for binary integers. But it *is* true for binary fractions; every binary fraction of length *n* has a corresponding equivalent decimal fraction of length *n*.

This is the reason why you get all those “extra” digits when you print the full decimal value of an IEEE binary floating-point fraction, and why glibc strtod() and Visual C++ strtod() were once broken.

(In the text that follows, I will usually refer to decimal digits as just *digits* and binary digits as just *bits* — but where context allows, I will use *digits* to refer to both generically.)

I will prove, in two steps, that every binary fraction has a corresponding decimal fraction of the same length:

- Show that every binary fraction can be written as a/2^{n}, where *a* is a positive integer < 2^{n}.
- Show that every fraction of the form a/2^{n} can be written as a fraction of the form b/10^{n}, where *b* is a positive integer < 10^{n}.

Any fraction with a power of ten denominator represents a decimal fraction, with a number of digits equal to the exponent of the power of ten.

A binary fraction is a finite sum of negative powers of two. For example, 0.00111011011 =

0·2^{-1} + 0·2^{-2} + 1·2^{-3} + 1·2^{-4} + 1·2^{-5} + 0·2^{-6} + 1·2^{-7} + 1·2^{-8} + 0·2^{-9} + 1·2^{-10} + 1·2^{-11}

= 2^{-3} + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-8} + 2^{-10} + 2^{-11}

= 1/2^{3} + 1/2^{4} + 1/2^{5} + 1/2^{7} + 1/2^{8} + 1/2^{10} + 1/2^{11}.

If a binary fraction has *n* bits, its lowest weighted bit has a place value of 2^{-n}, or 1/2^{n}. For our example, that’s 2^{-11}, or 1/2^{11}.

The denominator of each term in the series divides 2^{n}. (The divisors of a nonnegative power of two are all the nonnegative powers of two less than or equal to it.) As such, each term can be rewritten as a fraction with denominator 2^{n}, with a corresponding power of two numerator. All terms can then be combined into a single value a/2^{n}. For our example, we’d have

2^{8}/2^{11} + 2^{7}/2^{11} + 2^{6}/2^{11} + 2^{4}/2^{11} + 2^{3}/2^{11} + 2^{1}/2^{11} + 2^{0}/2^{11}

= (2^{8} + 2^{7} + 2^{6} + 2^{4} + 2^{3} + 2^{1} + 2^{0})/2^{11}

= (256 + 128 + 64 + 16 + 8 + 2 + 1)/2^{11}

= 475/2^{11}.

(A way to view what we’ve shown: every binary fraction is a multiple of the smallest negative power of two contained within it.)

The above proof suggests a simpler way to convert a binary fraction to the form a/2^{n}. First, treat the digits of the binary fraction as a binary integer, and convert that integer to decimal (ignore superfluous leading zeros). Next, divide that number by 2^{n}, where *n* is the number of bits in the binary fraction. For our example, we have 111011011 = 475 as the numerator, and 2^{11} as the denominator: 475/2^{11}.

This second step is trivial: simply multiply a/2^{n} by 5^{n}/5^{n}, getting (a·5^{n})/(2^{n}·5^{n}) = (a·5^{n})/10^{n} = b/10^{n}, where *b* = a·5^{n}. It has *n* digits.

For our example, we have (475·5^{11})/10^{11} = 23193359375/10^{11} = 0.23193359375.
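The whole procedure fits in a few lines of Python (a sketch, using the example above):

```python
# Convert a binary fraction to its exact, equal-length decimal fraction.
def bin_frac_to_dec(bits):
    a, n = int(bits, 2), len(bits)        # step 1: a / 2**n
    return "0." + str(a * 5**n).zfill(n)  # step 2: (a * 5**n) / 10**n

print(bin_frac_to_dec("00111011011"))  # 0.23193359375
```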

We’ve shown that every binary fraction is a decimal fraction with the same number of digits.

Having understood the above proof, you can reason towards the answer with this shortcut: Think of each place of the *n*-bit binary fraction as having the place value 2^{-m} = 1/2^{m} = (1/2)^{m} = (5/10)^{m} = 5^{m}/10^{m}. This makes it obvious from the outset that the binary fraction is equivalent to an *n*-digit decimal.

In general, a decimal fraction has more significant digits than its corresponding binary fraction has significant bits. ^{1} (A decimal fraction can have an equal number of significant digits, but it can never have fewer.) This may seem counterintuitive at first, since the opposite is true for integers. The difference in the number of significant digits and significant bits depends on three properties of the binary fraction:

**The starting place of the significant bits.** For any given combination of length and value of significant bits, the lower the starting place — that is, the more leading zeros, or the smaller the number — the more significant digits, in general. For example, 0.000011 = 0.046875 (2 bits to 5 digits), 0.0000011 = 0.0234375 (2 bits to 6 digits), and 0.00000011 = 0.01171875 (2 bits to 7 digits).

Lowering the starting place won’t always increase the number of significant digits — it may stay the same. For example, continuing the sequence above, 0.000000011 = 0.005859375 (2 bits to 7 digits). The length of the fractions increased by one, but the number of significant digits remained the same; a leading decimal zero accumulated instead.

As you incrementally lower the starting place, the number of significant digits can increase two or three times in a row, after which a leading decimal zero must be added. From my observations the pattern is: significant digits increase twice in a row, then a leading zero is added; significant digits increase twice in a row, then a leading zero is added; significant digits increase *three times* in a row, then a leading zero is added. This pattern then repeats. This reflects a ratio of about 7 significant digits to 3 leading zeros, which is expected (this is the same as saying you get about 3 leading decimal zeros for every 10 leading binary zeros — see “Where log_{10}(2) Fits In” below).

If the binary fraction has no leading zeros, then the decimal fraction won’t either; both will have the same number of significant digits no matter how you vary length or value.

To emphasize the effect that starting place has on the number of significant digits, consider this example, a very small single-bit number, 2^{-100}:

0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

It has 70 significant digits (and 30 leading zeros):

0.0000000000000000000000000000007888609052210118054117285652827862296732064351090230047702789306640625

**The number of significant bits it has.** For any given starting place, more significant bits equate to more significant digits. For example, 0.0001 = 0.0625 (1 bit to 3 digits), 0.00011 = 0.09375 (2 bits to 4 digits), and 0.000111 = 0.109375 (3 bits to 6 digits). You almost always get just one significant digit per bit, but sometimes you get two (a leading decimal zero is replaced). From my observations, the latter only happens for some starting places, only once per starting place, and only after just a few significant bits.

**The value of the significant bits.** For any given combination of starting place and number of significant bits, a greater value can lead to more significant digits. For example, 0.000000101 = 0.009765625 (3 bits to 7 digits), and 0.000000111 = 0.013671875 (3 bits to 8 digits). The effect is minimal though, since it can only add up to one significant digit. From my observations, this happens about 30% of the time, when you change the value from its minimum (only its most and least significant bits set) to its maximum (all its bits set).
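All of these examples can be reproduced with exact integer arithmetic; here is a Python sketch for the starting-place sequence:

```python
# Exact expansions of 0b11 placed at successively lower starting places.
def bin_frac_to_dec(bits):
    a, n = int(bits, 2), len(bits)
    return "0." + str(a * 5**n).zfill(n)  # a/2^n = (a*5^n)/10^n

for bits in ("000011", "0000011", "00000011", "000000011"):
    print(bits, "=", bin_frac_to_dec(bits))
# 000011 = 0.046875
# 0000011 = 0.0234375
# 00000011 = 0.01171875
# 000000011 = 0.005859375
```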

You can count the number of significant digits indirectly by counting the number of leading zeros and then subtracting that from the length of the binary fraction (decimal fraction). The number of leading zeros is easily deduced from the starting place of the significant digits, which is determined using a logarithm.

The starting place of the significant digits of a number *x* is determined by taking the *floor* of the logarithm of *x* to the base *i* of the number: ⌊log_{i}(x)⌋. ^{2} This value is the exponent of the power of *i* of the place containing the most significant digit. (The logarithm is negative, so remember that *floor* will round it towards negative infinity.) If you negate that value, you get the number of the place (1, 2, 3, etc.) at which the significant digits start, *p*_{i} = -⌊log_{i}(x)⌋. For a binary fraction *b*, **p_{b} = -⌊log_{2}(x)⌋**; for a decimal fraction *d*, **p_{d} = -⌊log_{10}(x)⌋**.

The number of leading zeros z_{i} is simply the starting place of the significant digits minus one. The number of leading binary zeros is **z_{b} = p_{b} – 1**; the number of leading decimal zeros is **z_{d} = p_{d} – 1**.

Knowing the starting place of the significant digits — and hence the number of leading zeros — and the length *n* of the binary fraction (decimal fraction), you can compute the number of significant digits s_{i} = *n* – z_{i}. The number of significant bits is **s_{b} = n – z_{b}**; the number of significant digits is **s_{d} = n – z_{d}**.

For example, for *b* = 0.00111011011, p_{b} = 3, z_{b} = 2, and s_{b} = 9; for *d* = 0.23193359375, p_{d} = 1, z_{d} = 0, and s_{d} = 11.
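Here is that example worked in Python (the floor of a negative logarithm is computed with math.floor, which rounds toward negative infinity):

```python
import math

x = 475 / 2**11  # 0.00111011011 in binary = 0.23193359375 in decimal; n = 11
p_b = -math.floor(math.log2(x))   # starting place of the significant bits
z_b = p_b - 1                     # leading binary zeros
s_b = 11 - z_b                    # significant bits
p_d = -math.floor(math.log10(x))  # starting place of the significant digits
z_d = p_d - 1                     # leading decimal zeros
s_d = 11 - z_d                    # significant digits
print(p_b, z_b, s_b)  # 3 2 9
print(p_d, z_d, s_d)  # 1 0 11
```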

For integers, just like for fractions, the logarithm tells you the starting place of the significant digits. But for integers, it tells you more: it also says *how many* significant digits there are. Furthermore, the ratio of significant digits in one base relative to another is predictable, approaching a constant as the integers get large. For example, the ratio of significant digits to significant bits converges to log_{10}(2) ≈ 0.3.

For fractional values, there is no such relationship. The logarithm does not help you count significant digits. There is no significant digit ratio in any case — it can vary from one to essentially infinite. But from the logarithm you can count *leading zeros*, and from that you can determine the ratio of leading decimal zeros to leading binary zeros, z_{d}/z_{b}, as the numbers get small: log_{10}(2) ≈ 0.3.

All binary fractions are decimal fractions, but the reverse is not true. Some decimal fractions *do* convert to binary fractions (when put in lowest terms as a fraction their denominator becomes a power of two), so in this case, the above analysis applies.

However, most decimal fractions *do not* convert to binary fractions, which means that their binary representations are infinite (repeating bicimals as I call them ^{4}). The only part of the above analysis that applies is the computation of the starting place of the significant digits, and hence the number of leading zeros.

^{1} *Significant digits* are the digits following the leading zeros of the fraction — don’t think of them as digits of precision.

^{2} I use ⌊log_{i}(x)⌋ and not ⌈log_{i}(x)⌉ + 1 because the calculation using *ceiling* fails when *i* = *2* and *x* is a power of two or *i* = *10* and *x* is a power of ten.

^{3} I have not specified how to compute ⌊log_{i}(x)⌋ programmatically, but imagine that *x* is in an internal binary representation.

^{4} A binary fraction can be called a *terminating bicimal*, just as a decimal fraction can be called a *terminating decimal*. In this article, I chose to go with the standard terminology, which I think of as inferior in most contexts.

Most of my traffic is desktop — presumably because my readers are computer programmers and engineers — so there wasn’t an urgent need to upgrade. On the other hand, improved readability and improved Google search ranking may increase my mobile traffic. Whether mobile traffic increases or not, I am happy with the more modern design.

Mobile-friendly means the site will also respond fluidly when resized in a desktop browser window. This is useful, for example, when you want to keep one of my converters or calculators up on part of your screen.

There are still some tweaks to make, mainly to fix minor formatting issues, but the experience should be good. (Landscape mode will generally give the best results.) If you encounter any problems, please let me know.

On the other hand, an arbitrary, arbitrarily long decimal literal rounded or truncated to 17 digits *may not* convert to the double-precision value it’s supposed to. This is a subtle point, one that has even tripped up implementers of widely used decimal to floating-point conversion routines (glibc strtod() and Visual C++ strtod(), for example).

The job of a decimal to floating-point conversion routine is to pick the floating-point number that is closest to the decimal input. The hardest inputs to convert correctly are those that are halfway or near halfway between consecutive floating-point numbers. These are the very inputs for which more than 17 digits — for doubles, up to 768 of them! — may need to be parsed. The extra digits, despite how little they contribute, can put you just at or beyond the halfway point.

Here’s the interesting thing: we know every double has a 17-digit representative that maps to it (actually there are multiple, but for our purposes we’ll consider only one, the nearest), but it may take more than 17 digits of a decimal input to figure out which 17-digit representative to choose! But once we’ve processed our long decimal input, we could replace it with its simpler, 17-digit representative. We’ve still got the correct double, but now our “handle” to it is unambiguously close. It’s like we are using a more precise number to decide which less precise number to pick, although that extra precision is just an illusion.

3.08984926168550152811e-32, an example near-halfway case from Vern Paxson’s paper “A Program for Testing IEEE Decimal–Binary Conversion”, is a 21-digit number that converts to 0x1.40de48676653bp-105. Rounded (same as truncated in this case) to 17 digits it is 3.0898492616855015e-32. However, that converts to 0x1.40de48676653ap-105, which is one binary ULP below the correct answer. Even rounded (truncated) to 20 digits — 3.0898492616855015281e-32 — we still come up one ULP short. So we need all 21 digits.
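
This behavior is easy to check in Python, whose decimal to floating-point conversion is correctly rounded. A minimal check of the example above (the hex constants are taken straight from the text):

```python
# Check the near-halfway example: all 21 digits are needed for correct
# rounding; 17 or even 20 digits land one binary ULP low.
full = float("3.08984926168550152811e-32")  # all 21 digits
d17  = float("3.0898492616855015e-32")      # rounded to 17 digits
d20  = float("3.0898492616855015281e-32")   # rounded to 20 digits

assert full == float.fromhex("0x1.40de48676653bp-105")  # correct double
assert d17  == float.fromhex("0x1.40de48676653ap-105")  # one ULP low
assert d20  == d17                                      # still one ULP low
```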

The problem is that this input converts to a number very nearly halfway between two double-precision numbers. (It is slightly *above* halfway). Here are its first 130 digits in binary to illustrate this (bits 54 through 130 are highlighted):

1.010000001101111001001000011001110110011001010011101010000000000000000000000000000000000000000000000000000000000000000000000000001…p-105

Being so close to halfway, those extra decimal digits are needed to decide which way to go. Ignore those digits, and the conversion will land below the halfway point.

The full decimal value of the desired floating-point number is

3.08984926168550180180110631344083416478369964008307296403646973245891422971698357636226306421889375997125171124935150146484375e-32

Rounded to 17 digits, it’s 3.0898492616855018e-32; that converts to the desired floating-point number.

1.00000000000000033306690738754696212708950042724609375 is an example I concocted using my decimal/binary converter; it is halfway between consecutive doubles:

1.00000000000000000000000000000000000000000000000000011

It is 54 digits, and all 54 are needed to decide that the correct conversion is 0x1.0000000000002p0 (round half to even) and not 0x1.0000000000001p0. Rounding (truncating) to 17 digits — 1.0000000000000003 — does not work.
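
Again this can be verified in Python, assuming its correctly rounded conversion routine:

```python
# The 54-digit value is exactly halfway, so round-half-to-even applies;
# the 17-digit rounding falls below halfway and converts one ULP low.
halfway = float("1.00000000000000033306690738754696212708950042724609375")
short17 = float("1.0000000000000003")

assert halfway == float.fromhex("0x1.0000000000002p0")  # round half to even
assert short17 == float.fromhex("0x1.0000000000001p0")  # one ULP low
```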

The full decimal value of the desired floating-point number is

1.000000000000000444089209850062616169452667236328125

Rounded to 17 digits, it’s 1.0000000000000004, which converts to the desired floating-point number.

8.36168422905420598437e-214, another 21-digit example from Vern Paxson’s paper, is a little less than halfway between two doubles. Here it is in binary:

1.00100000010000000011101001100010100010101001110010100111111111111111111111111111111111111111111111111111111111111111111111111110…p-708

It converts to 0x1.20403a628a9cap-708. But rounded to 17 digits — 8.361684229054206e-214 — it converts up to 0x1.20403a628a9cbp-708. (If you truncated instead of rounded, you would get the correct answer.) Again, assuming you are rounding and not truncating, you need to round it to 18 digits to convert it correctly: 8.36168422905420598e-214.

The full decimal value of the desired floating-point number is

8.36168422905420515990295749156693…(509 digits not shown)…0625e-214

Rounded to 17 digits, it’s 8.3616842290542052e-214.

In case you were thinking I could come up with only contrived examples, take a look at sqrt(2) ≈ 1.414213562373095048801688… . In binary, it’s

1.01101010000010011110011001100111111100111011110011001001… ,

which you can see is somewhat close to halfway. If you test it, you will find that you need 18 digits (rounded to 1.41421356237309505 or truncated to 1.41421356237309504) to get the correct conversion; 1.414213562373095 (rounded/truncated to 17 digits) does not work.

The full decimal value of the desired floating-point number is

1.4142135623730951454746218587388284504413604736328125

Rounded to 17 digits, it’s 1.4142135623730951.
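
Since IEEE 754 requires square root to be correctly rounded, math.sqrt(2) in Python is the double nearest sqrt(2), so the claim can be tested directly:

```python
import math

# 18 digits are needed to recover the double nearest sqrt(2);
# the 17-digit rounding converts to the double just below it.
assert float("1.41421356237309505") == math.sqrt(2)  # 18 digits rounded: works
assert float("1.41421356237309504") == math.sqrt(2)  # 18 digits truncated: works
assert float("1.414213562373095")   != math.sqrt(2)  # 17 digits: one ULP low
```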

As another realistic example, consider pi/3 ≈ 1.047197551196597746154214… . In binary it’s

1.0000110000010101001000111000001011010111001101100101100001…

which is even closer to halfway than sqrt(2).

19 digits — 1.047197551196597746 — are required to convert it correctly; 1.0471975511965977 (rounded/truncated to 17 digits) does not work.

The full decimal value of the desired floating-point number is

1.047197551196597853362391106202267110347747802734375

Rounded to 17 digits, it’s 1.0471975511965979.

The same issue applies to floats of course. Whereas you can always find a 9-digit decimal stand-in for a float, you may need more digits than that to convert it correctly.

When entering decimal literals into a computer program, you need to be aware that you may need more than 17 (9) digits to get the correct conversion. Once you know what a given decimal input converts to, it’s easy to find its 17 (9) digit stand-in. But until you convert it, you must assume you need all the digits you have — or even more if your value represents an infinite decimal. Unless you are willing to do an analysis like mine, you won’t know how many digits you need.

**(Please let me know if you know of any “underspecified” literals in real code.)**

Please check out the related article by “carolomeetsbarolo”: “Mathematical Constants in Program Code”.

]]>For example, does an IEEE single-precision binary floating-point number, or *float* as it’s known, have 6-8 digits? 7-8 digits? 6-9 digits? 6 digits? 7 digits? 7.22 digits? 6-112 digits? (I’ve seen all those answers on the Web.)

Part of the reason for multiple answers is that there *is* no single answer; the question is not as well defined as it seems. On the other hand, if you understand what it really means to equate decimal floating-point precision with binary floating-point precision, then only some of those answers make sense. In this article, I will argue that there are only three reasonable answers: “6 digits”, “6-8 digits”, and “slightly more than 7 digits on average”.

(For double-precision binary floating-point numbers, or *doubles*, the three answers are “15 digits”, “15-16 digits”, and “slightly less than 16 digits on average”.)

A common answer is that floats have a precision of about 7.22 digits. While this may be true for integers, where gaps align and are both of size one, it’s not true for floating point numbers (the fact that it gets you in the ballpark notwithstanding). I can’t say it better than David Matula himself, as he did in his 1970 paper “A Formalization of Floating-Point Numeric Base Conversion”:

“Converting integer and fixed-point data to an “equivalent” differently based number system is generally achieved by utilizing essentially log_{δ}Β times as many digits in the new base δ as were present for representing numbers in the old base Β system. This simplified notion of equivalence does not extend to the conversion of floating-point systems. Actually, conversion between floating-point number systems introduces subtle difficulties peculiar to the structure of these systems so that no such convenient formula for equating the “numbers of significant digits” is even meaningful.”

Bruce Dawson has an excellent article on floating-point precision. Here is his main definition of precision, and the one I will adopt:

“For most of our purposes when we say that a format has n-digit precision we mean that over some range, typically [10^k, 10^(k+1)), where k is an integer, all n-digit numbers can be uniquely identified.”

So d-digit precision (d a positive integer) means that if we take all d-digit decimal numbers over a specified range, convert them to b-bit floating-point, and then convert them back to decimal — rounding to nearest to d digits — we will recover all of the original d-digit decimal numbers. In other words, all d-digit numbers in the range will round-trip.
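
This round-trip test can be run directly. Here is a minimal sketch using Python's struct module to emulate single precision; the function name and arguments are mine, not from the article:

```python
import struct

def to_float32(x: float) -> float:
    # round a double to the nearest IEEE single, then widen it back
    return struct.unpack('f', struct.pack('f', x))[0]

def all_round_trip(d: int, k: int) -> bool:
    # do all d-digit decimals in [10^k, 10^(k+1)) survive a trip
    # through single precision and back, rounding to d digits?
    for m in range(10**(d - 1), 10**d):
        x = float(f"{m}e{k - (d - 1)}")
        f = to_float32(x)
        if float(f"{f:.{d - 1}e}") != x:
            return False
    return True
```

For example, all_round_trip(9, 0) fails almost immediately: 1.00000001 and its neighbors collapse onto the same float.
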

It is the choice of range which leads to the multiple answers. Powers of ten and two interleave to create segments with different relative gap sizes, and it is relative gap size that determines how many decimal digits will round-trip.

So the answer for precision depends on what you are looking for. Do you want to know the precision for one power of ten exponent? For a power of two exponent? For a power of two exponent that crosses a power of ten exponent? For the whole floating-point format?

These are the main differences between my work and Bruce’s:

- I primarily do an analytical analysis based on relative gap sizes instead of running code to check round-trips. (I have run code tests in the past too, but they capture “coincidental” precision, as I’ll explain below.)
- I include analysis for 8-digit precision, not just 7 and 6 digit precision.
- I talk solely about decimal precision of binary numbers so as not to confound precision with floating-point to decimal to floating-point round-trip theory.
- I do a more granular analysis instead of just assigning the maximum guaranteed precision per power of ten exponent.
- I don’t analyze subnormal numbers, where the precision can go down to as low as 0 digits.

I wrote a PARI/GP script to identify all the different segments with different relative gap sizes over the entire single-precision format. For each segment, I calculated its precision — a single number, 6, 7, or 8. Here is the data condensed by power of ten exponent, which results in a range of precisions for most:

| Power | Precision | Power | Precision | Power | Precision |
|---|---|---|---|---|---|
| 10^{-38} | 6-7 | 10^{-12} | 7 | 10^{14} | 7-8 |
| 10^{-37} | 7 | 10^{-11} | 7-8 | 10^{15} | 6-8 |
| 10^{-36} | 7-8 | 10^{-10} | 6-8 | 10^{16} | 7 |
| 10^{-35} | 6-8 | 10^{-9} | 7 | 10^{17} | 7-8 |
| 10^{-34} | 7 | 10^{-8} | 7-8 | 10^{18} | 6-8 |
| 10^{-33} | 7-8 | 10^{-7} | 6-8 | 10^{19} | 7 |
| 10^{-32} | 6-8 | 10^{-6} | 7 | 10^{20} | 7-8 |
| 10^{-31} | 7 | 10^{-5} | 7-8 | 10^{21} | 6-8 |
| 10^{-30} | 7-8 | 10^{-4} | 6-8 | 10^{22} | 7 |
| 10^{-29} | 7-8 | 10^{-3} | 7 | 10^{23} | 7-8 |
| 10^{-28} | 7-8 | 10^{-2} | 7-8 | 10^{24} | 6-8 |
| 10^{-27} | 7-8 | 10^{-1} | 7-8 | 10^{25} | 7 |
| 10^{-26} | 7-8 | 10^{0} | 7 | 10^{26} | 7-8 |
| 10^{-25} | 7-8 | 10^{1} | 7-8 | 10^{27} | 6-8 |
| 10^{-24} | 7-8 | 10^{2} | 7-8 | 10^{28} | 7 |
| 10^{-23} | 7-8 | 10^{3} | 7-8 | 10^{29} | 7-8 |
| 10^{-22} | 6-8 | 10^{4} | 7-8 | 10^{30} | 7-8 |
| 10^{-21} | 7 | 10^{5} | 7-8 | 10^{31} | 7-8 |
| 10^{-20} | 7-8 | 10^{6} | 7-8 | 10^{32} | 7-8 |
| 10^{-19} | 6-8 | 10^{7} | 7-8 | 10^{33} | 7-8 |
| 10^{-18} | 7 | 10^{8} | 7-8 | 10^{34} | 7-8 |
| 10^{-17} | 7-8 | 10^{9} | 6-8 | 10^{35} | 7-8 |
| 10^{-16} | 6-8 | 10^{10} | 7 | 10^{36} | 7-8 |
| 10^{-15} | 7 | 10^{11} | 7-8 | 10^{37} | 6-8 |
| 10^{-14} | 7-8 | 10^{12} | 6-8 | 10^{38} | 7 |
| 10^{-13} | 6-8 | 10^{13} | 7 | | |

There are 77 powers of ten, although being at the extremes, 10^{-38} and 10^{38} are covered only partially.

When the precision shows a range, like 6-8, it means one segment has 6 digits, another has 7 digits, and another has 8 digits. Actually, it is always in reverse; each power of ten range starts with the higher precision segment and ends with the lower precision segment (as the numbers increase).

A few observations from the table:

- There are 19 powers of ten with a constant precision (and it’s always 7 digits).
- There are 18 powers of ten for which precision dips to 6 digits. (Precision as low as 6 digits may surprise you, especially if you have bought into the log_{10}(2^{24}) = log_{10}(16,777,216) ≈ 7.22 argument.)
- There are three long runs where the precision is 7-8 digits: 10^{-30} through 10^{-23}, 10^{1} through 10^{8}, and 10^{29} through 10^{36}.

A power of ten can get one less digit of precision than advertised — if it converts to a floating-point number less than itself. For example, 1e-4 does not have 8 digits, but rather 7: 9.99999974737875163555145263671875e-5.

Let’s look at the precision of decimal floating-point numbers with decimal exponent -4, the range [10^{-4}, 10^{-3}):

Between 10^{-4} and 10^{-3} there are five segments: [10^{-4}, 2^{-13}), [2^{-13}, 2^{-12}), [2^{-12}, 2^{-11}), [2^{-11}, 2^{-10}), and [2^{-10}, 10^{-3}); they have decimal precision of 8, 7, 7, 7, and 6 digits, respectively.
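
The precision of each segment can be derived analytically by comparing gap sizes. Here is a sketch (the function and formula are my own framing, not the article's PARI/GP script): a segment supports d digits when the d-digit decimal gap 10^{k+1-d} still exceeds the binary gap 2^{e-23} of the binade [2^e, 2^{e+1}) containing it.

```python
import math

def segment_precision(k: int, e: int) -> int:
    # largest d such that the decimal gap 10^(k+1-d) exceeds the
    # single-precision binary gap 2^(e-23) of binade [2^e, 2^(e+1))
    return math.floor(k + 1 - (e - 23) * math.log10(2))

# the five segments of [10^-4, 10^-3) start in binades 2^-14 .. 2^-10
assert [segment_precision(-4, e) for e in range(-14, -9)] == [8, 7, 7, 7, 6]
```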

Let’s look at examples of numbers represented to each of the three different levels of precision:

- 1.21e-4 converts to the single-precision floating-point value 1.209999973070807754993438720703125e-4, which has 8 digits of precision: rounded to 8 digits it’s 1.21e-4, but rounded to 9 digits it’s 1.20999997e-4.
- 1.23e-4 converts to 1.2300000526010990142822265625e-4, which has 7 digits of precision: rounded to 7 digits it’s 1.23e-4, but rounded to 8 digits it’s 1.2300001e-4.
- 9.86e-4 converts to 9.860000573098659515380859375e-4, which has 6 digits of precision: rounded to 6 digits it’s 9.86e-4, but rounded to 7 digits it’s 9.860001e-4.

(When using my decimal to floating-point converter to compute these values, check the boxes ‘Single’ and ‘Normalized decimal scientific notation’.)

You can’t get less precision than each segment supports, but you can get what looks like more.

In any given segment, you can find examples of numbers that convert to a higher precision than supported by that segment:

- 1.013e-4 converts to 1.01300000096671283245086669921875e-4, which appears to have 9 digits of precision; but it’s in an 8-digit segment.
- 1.24e-4 converts to 1.23999998322688043117523193359375e-4, which appears to have 8 digits of precision; but it’s in a 7-digit segment.
- 9.8e-4 converts to 9.80000011622905731201171875e-4, which appears to have 7 digits of precision; but it’s in a 6-digit segment.
- 2.350988561514728583455765982071533026645717985517980855365926236850006129930346077117064851336181163787841796875e-38, an exactly representable number, converts to itself, so it looks like it has 112 digits of precision!

This is not precision, at least as we have defined it; precision is a property of a *range* of n-digit numbers, not a specific n-digit number. I’ll call the above **coincidental precision**.

Let’s look at 1.013e-4 and its six nearest 9-digit neighbors:

1.01299997e-4

1.01299998e-4

1.01299999e-4

1.013e-4

1.01300001e-4

1.01300002e-4

1.01300003e-4

All seven of those numbers map to the same float, which in turn maps back (to 9 digits) to 1.013e-4. There’s not enough precision to represent all 9-digit numbers.
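
A quick way to see this collapse, sketched with Python's struct module emulating single precision:

```python
import struct

def to_float32(s: str) -> float:
    # nearest IEEE single to the decimal string, widened back to a double
    return struct.unpack('f', struct.pack('f', float(s)))[0]

neighbors = ["1.01299997e-4", "1.01299998e-4", "1.01299999e-4", "1.013e-4",
             "1.01300001e-4", "1.01300002e-4", "1.01300003e-4"]

# all seven 9-digit decimals collapse onto one float...
assert len({to_float32(s) for s in neighbors}) == 1
# ...which maps back (rounded to 9 digits) to 1.013e-4
assert float(f"{to_float32('1.013e-4'):.8e}") == 1.013e-4
```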

Broadly we can say that precision ranges from 6 to 8 digits — but can we say what the *average* precision is?

I computed a simple average from the 329 segments between FLT_MIN and FLT_MAX. There are 254 powers of two in this range, most of which have a single precision value; those that cross a power of ten have two values. There are 77 powers of ten, 75 of which cross powers of two (10^{-38} is less than FLT_MIN, and 10^{0} = 2^{0}). 254 + 75 = 329.

The denominator for my average was 254, the number of powers of two. For those powers of two with a single precision, I assigned a weight of 1. For those powers of two split by powers of ten, I assigned a fractional weight, proportional to where the split occurs.

For example, 2^{115} has 7 digits of precision, and 2^{116} has 7 digits for about 20% of its length (before 10^{35}) and 8 digits for the remaining 80% of its length (after 10^{35}). The average across just those two powers of two would be (7*1 + 7*0.2 + 8*0.8)/2 = 7.4.

With that methodology, I came up with an average decimal precision for single-precision floating-point: **7.09 digits**. 89.27% of the range has 7 digits, 10.1% has 8 digits, and 0.63% has 6 digits.

It’s hard to say what that average would mean in practice, since you will likely be using numbers in a specific range and with a particular distribution. But it does tell you that you are likely to get 7 digits.

(I kind of “linearized” a logarithmic concept, but since I’m talking about integer digit counts, it feels OK.)

So after that analysis, what is the bottom line?

If you care about the minimum precision you can get from a float, or equivalently, the maximum number of digits *guaranteed* to round-trip through a float, then 6 digits is your answer. (It’s unfortunate we have to be that conservative; only a small percentage of the float range is limited to 6 digits.) If you want to know the *range* of precision, or equivalently, the range of the number of digits that can round-trip through a float (excluding “coincidental” conversions), then 6-8 digits is your answer. If you want to know how much precision you’ll get on average, then your answer is slightly more than 7 digits.

If you could give only one answer, the **safe answer is 6 digits**. That way there will be no surprises (print precision notwithstanding).

By the same argument above, the precision of a double is not log_{10}(2^{53}) ≈ 15.95.

Doing the same analysis for doubles I computed an average decimal precision of **15.82 digits**. 82.17% of the range has 16 digits and 17.83% has 15 digits.

The three reasonable answers are 15 digits, 15-16 digits, and slightly less than 16 digits on average. The **safe answer is 15 digits**.

Some will say 9 is the upper bound of precision for a float, and likewise, 17 digits is the upper bound for a double (for example, see the Wikipedia articles on single-precision and double-precision). Those numbers come from the theory of round-tripping, from conversions in the opposite direction: floating-point to decimal to floating-point. But you’re not getting that much decimal precision in any range of those IEEE formats. You can determine that based on analyzing gaps, as I did above with my PARI script, or by running some code, as I did with a C program.

For floats, you can iterate through all 9-digit decimal numbers and see if they round-trip. I found that about 97% of 9-digit decimals failed to round-trip. Every float had multiple decimals mapping to it. The minimum count I found was 6; for example, 1.00000004e6 through 1.00000009e6. The maximum count I found was 119, occurring from 9.90353153e27 through 9.90353271e27. (This matches theory, as produced by my PARI/GP script.)
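
The minimum-count run cited above can be spot-checked without iterating the whole 9-digit space (again emulating single precision with struct):

```python
import struct

f32 = lambda s: struct.unpack('f', struct.pack('f', float(s)))[0]

# six consecutive 9-digit decimals that all map to the same float
run = [f"1.0000000{d}e6" for d in range(4, 10)]
assert len({f32(s) for s in run}) == 1
```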

Printing to 9 (or 17) digits is just a way to recover a floating-point number; any one of multiple 9 (or 17) digit numbers will serve that purpose (it doesn’t have to be the closest).

]]>In my article “The Shortest Decimal String That Round-Trips May Not Be The Nearest” I describe some shortest string conversions that are tricky to get right. I tried some in Java and guess what? It got them wrong.

After some research I realized the problem is not limited to the tricky power-of-two conversions described in my article; Java fails to print the shortest strings for many other numbers. This is apparently not a bug, because Java is working as designed.

I had once written this small program to test if Java printed shortest strings:

```java
public class ShortestTest1 {
    public static void main(String[] args) {
        double d = 0;
        for (int i = 1; i <= 10; i++) {
            d += 0.1;
            System.out.println(d);
        }
    }
}
```

It gave me the expected output:

```
0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999
```

The floating-point numbers represented by the long strings are printed that way because no shorter strings (e.g., 0.3, 0.8, 0.9, and 1.0) will round-trip. Here are the full-length decimal strings representing the double *d* after each iteration:

```
0.1000000000000000055511151231257827021181583404541015625
0.200000000000000011102230246251565404236316680908203125
0.3000000000000000444089209850062616169452667236328125
0.40000000000000002220446049250313080847263336181640625
0.5
0.59999999999999997779553950749686919152736663818359375
0.6999999999999999555910790149937383830547332763671875
0.79999999999999993338661852249060757458209991455078125
0.899999999999999911182158029987476766109466552734375
0.99999999999999988897769753748434595763683319091796875
```

You can verify that no shorter strings than the ones Java prints will round-trip by using my decimal to floating-point converter.

The values in the Java program are calculated values of 0.1, 0.2, …, 0.9, 1.0, not conversions of them. The conversions are

```
0.1000000000000000055511151231257827021181583404541015625
0.200000000000000011102230246251565404236316680908203125
0.299999999999999988897769753748434595763683319091796875
0.40000000000000002220446049250313080847263336181640625
0.5
0.59999999999999997779553950749686919152736663818359375
0.6999999999999999555910790149937383830547332763671875
0.8000000000000000444089209850062616169452667236328125
0.90000000000000002220446049250313080847263336181640625
1.0
```

The conversions all round-trip when shortened to one digit (because of round-tripping in the other direction, where all decimal numbers 15 digits or less round-trip through floating-point).

This program tests one of the problematic powers of two from my article, 2^{-44}:

```java
public class ShortestTest2 {
    public static void main(String[] args) {
        double d;
        d = 0.00000000000005684341886080801486968994140625; // 2^-44
        System.out.println(d);
    }
}
```

This prints the 17-digit value 5.6843418860808015E-14, not the shortest, non-nearest, 16-digit expected value, 5.684341886080802E-14.

(Other problematic powers of two fail as well.)
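
For comparison, Python 3's repr is designed to produce the shortest round-tripping string, and for 2^-44 it finds the 16-digit (non-nearest) string that Java's output misses:

```python
# repr finds the shortest round-tripping string for 2^-44
assert repr(2.0 ** -44) == '5.684341886080802e-14'
assert float('5.684341886080802e-14') == 2.0 ** -44
```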

I had assumed Java printed shortest strings, and decided to look for a statement to that effect in the specification. The closest thing I found was this paragraph in the documentation for the toString() method of class Double (search within the page for the second occurrence of ‘toString(double d)’):

How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double.

This seems to say Java prints shortest strings. But then I found this old bug report (written in 2001, still open): “Double.toString(double) sometimes produces incorrect results”. (Sometimes that link doesn’t work; try the OpenJDK copy instead.)

In this bug report are:

- Other examples of non-shortest conversions.
- An explanation as to why the Java implementation gets it wrong.
- A link (which unfortunately is dead) to code that would fix the problem.
- Examples of other printing anomalies, like not choosing the nearer of two numbers that round-trip, and printing a number to 18 digits (only up to 17 is ever necessary!).
- A response that explains that Java is working as designed.

It is a very good read.

]]>Sometimes (many) fewer than 17 digits will serve to round-trip; it is often desirable to find the shortest such string. Some programming languages generate shortest decimal strings, but many do not.^{1} If your language does not, you can attempt this yourself using brute force, by rounding a floating-point number to increasing length decimal strings and checking each time whether conversion of the string round-trips. For double-precision, you’d start by rounding to 15 digits, then if necessary to 16 digits, and then finally, if necessary, to 17 digits.

There is an interesting anomaly in this process though, one that I recently learned about from Mark Dickinson on stackoverflow.com: in rare cases, it’s possible to overlook the shortest decimal string that round-trips. Mark described the problem in the context of single-precision binary floating-point, but it applies to double-precision binary floating-point as well — or any precision binary floating-point for that matter. I will look at this anomaly in the context of double-precision floating-point, and give a detailed analysis of its cause.

Let’s try the brute force algorithm on 2^{-44}, which as a hexadecimal floating point constant is 0x1p-44, and as a full-precision decimal value is 5.684341886080801486968994140625e-14. Rounded to 15 digits, it is 5.6843418860808e-14, but that doesn’t round-trip: it converts to 0x1.ffffffffffffep-45. Rounded to 16 digits, it is 5.684341886080801e-14, but that doesn’t round-trip either: it converts to 0x1.fffffffffffffp-45. So we must settle for the 17-digit value, 5.6843418860808015e-14.
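
The brute force algorithm is easy to sketch in Python (the function name is mine); run on 2^-44 it settles for the 17-digit string, missing the 16-digit string discussed next:

```python
def shortest_by_rounding(x: float) -> str:
    # brute force: round to 15, then 16, then 17 digits, returning the
    # first rounding that converts back to x (17 digits always works)
    for d in (15, 16, 17):
        s = f"{x:.{d - 1}e}"
        if float(s) == x:
            return s

p = 2.0 ** -44
assert shortest_by_rounding(p) == '5.6843418860808015e-14'  # 17 digits
```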

But wait! There is a 16-digit number that round-trips, and we missed it: 5.684341886080802e-14. Why does that round-trip, when the closer 16-digit number does not?

The root of the problem is that the size of the gaps between binary floating-point numbers changes at power of two boundaries; gap size above a power of two is double the gap size below a power of two. This asymmetry is a necessary condition, but it alone does not cause the problem; the size of the gaps between decimal numbers around powers of two factors in as well. For double-precision, the problematic decimal gap size occurs only for 16-digit numbers.

Even for 16-digit numbers, not all powers of two exhibit the anomaly; the binary and decimal numbers must align in a certain way. For starters, the nearest 16-digit decimal number must be *below* the power of two, and the next higher 16-digit decimal number must be *above* the power of two. Furthermore, the nearest 16-digit decimal number must be more than halfway towards the next lower 53-bit binary number (it can’t be halfway because round-to-nearest-even would map it to the power of two), while the next higher 16-digit decimal number can be no more than halfway toward the next higher 53-bit binary number. Because the halfway distance is different on either side of the power of two, the farther decimal number will map to the power of two, but the nearer one won’t.
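
These alignment conditions can be verified in exact rational arithmetic for our example, p = 2^-44. This sketch uses Python's fractions module and math.nextafter (Python 3.9+) to get the neighboring doubles:

```python
from fractions import Fraction
import math

p     = Fraction(2) ** -44
below = Fraction(math.nextafter(2.0 ** -44, 0.0))  # preceding 53-bit number
above = Fraction(math.nextafter(2.0 ** -44, 1.0))  # following 53-bit number
n     = Fraction("5.684341886080801e-14")          # nearest 16-digit decimal
n1    = Fraction("5.684341886080802e-14")          # next 16-digit decimal up

# n is more than halfway down toward the preceding double,
# so it converts to that double, not to p...
assert n < (p + below) / 2
# ...while n+1 is within halfway of p on the upper side, so it converts to p
assert p < n1 <= (p + above) / 2
```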

This diagram, drawn to scale, depicts the situation as it applies to our example:

The diagram shows the 53-bit binary floating-point numbers transitioning from the 2^{-45} exponent range to the 2^{-44} exponent range. This occurs under the 10^{-14} range of 16-digit floating-point decimal numbers. The size of the binary gaps changes from 2^{(-45+1-53)} = 2^{-97} ≈ **6.3 x 10^{-30}** to 2^{(-44+1-53)} = 2^{-96} ≈ **1.3 x 10^{-29}**; the 16-digit decimal gap size, 10^{(-14+1-16)} = 10^{-29}, lies between the two.

Let’s derive the range of decimal gap size that sets the stage for the anomaly. First, let’s give things some names:

- *p* is the power of two, *p-1* is the 53-bit binary number that precedes it, and *p+1* is the 53-bit binary number that follows it.
- *n* is the nearest 16-digit decimal number, and *n+1* is the next bigger 16-digit decimal number.
- *p+1/2* is the halfway point between *p* and *p+1*; *p-1/2* is the halfway point between *p* and *p-1*.
- *p+1/4* is one quarter of the way between *p* and *p+1*.

It’s easier to analyze decimal gap size by considering two cases, which I’ll call the *minimum range* and *maximum range*.

For the minimum range, consider *n* fixed just below *p-1/2*, with *n+1* varying from just above *p+1/4* (enough to make it farther away than *n*) to *p+1/2*:

This shows the decimal gap size — the distance between *n* and *n+1* — varies from **a little more than one lower binary gap to a little more than 3/4 upper binary gap**.

For the maximum range, consider *n+1* fixed at *p+1/2*, with *n* varying from just below *p-1/2* to just above *p-1*:

This shows the decimal gap size varies from **a little more than 3/4 upper binary gap to a little less than one upper binary gap**.

Combining the two ranges we see **the problematic decimal gap size is between the lower and upper binary gap size**. Gap size in this range is necessary for the anomaly to occur, but not sufficient.

The problematic decimal gap size only occurs for 16-digit decimal numbers. This is because for decimal numbers of 15 digits or less, the decimal gap size is greater than the largest double-precision binary gap size, and for decimal numbers of 17 digits or more, the decimal gap size is less than the smallest double-precision binary gap size. These latter two facts are a consequence of round-trip decimal to floating-point to decimal and floating-point to decimal to floating-point conversions, respectively.

I wrote a C program to test all 2,046 powers of two in the normal double-precision range: 2^{-1022} to 2^{1023}. There are 54 for which the nearest 16-digit number does not round-trip, yet the next one up does:

2^{976}, 2^{966}, 2^{956}, 2^{896}, 2^{890}, 2^{863}, 2^{803}, 2^{740}, 2^{710}, 2^{594}, 2^{574}, 2^{554}, 2^{544}, 2^{534}, 2^{481}, 2^{405}, 2^{398}, 2^{378}, 2^{345}, 2^{305}, 2^{275}, 2^{182}, 2^{172}, 2^{149}, 2^{132}, 2^{122}, 2^{89}, 2^{-24}, 2^{-44}, 2^{-77}, 2^{-97}, 2^{-140}, 2^{-296}, 2^{-366}, 2^{-383}, 2^{-489}, 2^{-496}, 2^{-499}, 2^{-509}, 2^{-549}, 2^{-569}, 2^{-645}, 2^{-652}, 2^{-662}, 2^{-695}, 2^{-705}, 2^{-778}, 2^{-788}, 2^{-791}, 2^{-808}, 2^{-921}, 2^{-957}, 2^{-1007}, 2^{-1017}
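
A scan equivalent to that C program can be sketched in Python using the decimal module for exact 16-digit arithmetic (the function name and approach are mine):

```python
from decimal import Decimal

def anomalous_powers():
    # powers of two whose nearest 16-digit decimal fails to round-trip
    # while the next 16-digit decimal up succeeds
    found = []
    for e in range(-1022, 1024):
        p = 2.0 ** e
        nearest = Decimal(f"{p:.15e}")                     # 16 significant digits
        step = Decimal(1).scaleb(nearest.adjusted() - 15)  # one unit in the 16th digit
        if float(nearest) != p and float(nearest + step) == p:
            found.append(e)
    return found
```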

Of those, eight have a 15-digit rounded-to-nearest number that round-trips:

2^{966}, 2^{956}, 2^{890}, 2^{740}, 2^{149}, 2^{-499}, 2^{-569}, 2^{-645}

So if you use the brute force testing algorithm, the anomalous behavior only comes into play for 46 floating-point numbers.

(It turns out for each of the eight cases, the 15-digit rounding and non-nearest 16-digit number are the same.)

The same anomaly applies to single-precision, as Mark showed on Stack Overflow. Round-tripping protects rounding to 6 digits or less or 9 digits or more, leaving 7 and 8 digit numbers as candidates. Of the 254 normal powers of two, the anomaly occurs only for three, and only for their 8-digit roundings: 2^{90}, 2^{87}, and 2^{-96}. (None of the three have 6 or 7 digit strings that round-trip.)

For any binary precision, the problem area will be in the “middle ground” where the digit count is between the two round-trip bounds.

For negative numbers, the results are the same, except the drawings would be mirror images.

^{1} I did a quick test on Java, Python, PHP, and Javascript (C doesn’t really have a mechanism for it). Only Python (3) and Javascript (tested on Firefox, Chrome, Edge, and Internet Explorer) appear to be designed to return the shortest decimal string. (Actually, I know Python is designed as such; I did not look into Javascript’s design and code.)

base_convert() may lose precision on large numbers due to properties related to the internal “double” or “float” type used.

The truth is that it works perfectly for integers up to a certain maximum — you just have to know what that is. I will show you this maximum value in each of the 35 bases, and how to check if the values you are using are within this limit.

Many numbers can only be approximated in IEEE binary floating-point, while others can be represented exactly. Integers starting from 0 and going sequentially to some maximum are among those numbers that *are* exactly representable. In single-precision floating-point, the maximum is 2^{24}; in double-precision floating-point, the maximum is 2^{53}. (Technically the maximums are the largest 24 significant bit integer, 2^{24} – 1, and the largest 53 significant bit integer, 2^{53} – 1. But 2^{24} and 2^{53} are exactly representable by virtue of being powers of two.)
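
Since Python floats are doubles, the 2^53 boundary is easy to demonstrate:

```python
# doubles represent every integer up to 2^53 exactly, but not all beyond
assert float(2 ** 53) == 2 ** 53
assert float(2 ** 53 + 1) == float(2 ** 53)  # 2^53 + 1 is not representable
assert float(2 ** 53 + 2) == 2 ** 53 + 2     # even integers just above still are
```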

For the remainder of this article, I will assume PHP uses double-precision floating-point. (I’ve only ever seen double-precision implementations — are there implementations that use single-precision?)

base_convert() takes three arguments: the input number (as a string), the input base, and the output base. The output number is returned as a string. The digits for bases 2 through 36 are drawn from the digits 0-9 and letters a-z (mixed case is accepted, but lower case is returned). You can pass any integer, but the output is guaranteed to be accurate only for integers less than or equal to 2^{53}; I will call these integers *safe* for base_convert().

We have to look at base_convert()’s implementation to make sure it never produces intermediate calculations that exceed the safe range of integers.

base_convert() proceeds in two steps: first it converts the input string to numeric binary, using a C function called *basetozval()*, and then it converts that numeric binary value to the output string, using a C function called *zvaltobase()*.

*basetozval()* is straightforward. It executes this line of code for every digit of the input string:

fnum = fnum * base + c;

fnum is the accumulating numeric binary value of the input, and c is the numeric value of the current digit. Although fnum is declared as a double, its intermediate values will all be integers, and its final integer sum cannot exceed 2^{53} if the input itself does not.
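
The same accumulation can be sketched in Python (a translation of the C line, not PHP’s actual code; the names are mine):

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def base_to_val(s, base):
    # Mimic basetozval(): accumulate fnum = fnum * base + c for each
    # digit, in a double (a Python float).
    fnum = 0.0
    for ch in s.lower():
        fnum = fnum * base + DIGITS.index(ch)
    return fnum

print(base_to_val("ff", 16))   # 255.0
print(base_to_val("101", 2))   # 5.0
```

Every intermediate value is an integer, so no rounding occurs as long as the final result does not exceed 2^{53}.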

*zvaltobase()* is also simple, but it generates intermediate floating-point values, so deserves a closer look. Here is the relevant loop:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue /= base;
} while (ptr > buf && fabs(fvalue) >= 1);

This generates output digits until the numeric input value is consumed. For each iteration, the value of each digit is the remainder of fvalue/base, and the value of fvalue is reset to fvalue/base.

I would have written this code in a slightly different, but equivalent way:

do {
    *--ptr = digits[(int) fmod(fvalue, base)];
    fvalue = floor(fvalue / base);
} while (ptr > buf && fabs(fvalue) > 0);

In this form it is obvious that it works, assuming we start with a value of 2^{53} or less. It essentially performs integer arithmetic: fvalue is always an integer, as is the fmod() result.
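
That integer-arithmetic view can be sketched in Python (my own rewrite, not PHP’s code): fvalue stays an integer-valued double throughout, so every fmod() result is exact.

```python
import math

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def val_to_base(fvalue, base):
    # Mimic the rewritten zvaltobase() loop: peel off the low-order
    # digit with fmod(), then floor-divide, using only doubles.
    buf = []
    while True:
        buf.append(DIGITS[int(math.fmod(fvalue, base))])
        fvalue = math.floor(fvalue / base)
        if fvalue == 0:
            break
    return "".join(reversed(buf))

print(val_to_base(255.0, 16))          # ff
print(val_to_base(float(2**53), 10))   # 9007199254740992
```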

There’s only a very small chance that an integer greater than 2^{53} will be converted correctly; three conditions have to be met:

- The integer has to be exactly representable.
- base_convert()’s “quick and dirty” conversion routines have to convert the integer to the right double-precision floating-point number *and* print it correctly.
- The converted number has to be 64 digits or fewer (that’s the size of the output buffer used).

Furthermore, integers that are not exactly representable may not even be correctly rounded (in the IEEE 754 sense); this is a consequence of doing the conversion in limited precision floating-point.

Here are examples of unsafe conversions, with incorrect results:

<?php echo base_convert('3674675646464747008',10,2) . '<br>'; ?>

3674675646464747008 is exactly representable, as the double-precision number

11001011111111000100100000011111111100010011110110011000000000

base_convert() returns

11001011111111000100100000011111111100010011110110100000000000

which is one ULP too high.

<?php echo base_convert('123456789012345669025792',10,2) . '<br>'; ?>

123456789012345669025792 is exactly representable, as the double-precision number

11010001001001001101100011111000100001010000001101100000000000000000000000000

base_convert() returns

1001101100011111000100001010000001101100000000000000000000000000

which is way, way off. (The 64 digit limit was hit, cutting off the most significant digits.)

<?php echo base_convert('1234567890123456789',10,2) . '<br>'; ?>

1234567890123456789 is *not* exactly representable; the closest double precision number is

1000100100010000100001111010001111101111010011000000100000000

base_convert() returns

1000100100010000100001111010001111101111010011000001000000000

which is one ULP too high (so is not correctly rounded).

This table shows the maximum integer safely convertible by base_convert() — 2^{53} — as represented in each of the 35 bases:

Base | Max Value (2^{53}) |
---|---|
2 | 100000000000000000000000000000000000000000000000000000 |
3 | 1121202011211211122211100012101112 |
4 | 200000000000000000000000000 |
5 | 33421042423033203202432 |
6 | 224404414114114022452 |
7 | 5350140446150306054 |
8 | 400000000000000000 |
9 | 47664754584305345 |
10 | 9007199254740992 |
11 | 2179a75830112628 |
12 | 702273685b77a28 |
13 | 2397b7325802696 |
14 | b4c34aaccadc64 |
15 | 4964cdca1dc7b2 |
16 | 20000000000000 |
17 | f7ded8c9e1f8f |
18 | 7e2c925c889fe |
19 | 416210bi7ca4a |
20 | 23jc3e8722c9c |
21 | 14f01e5ec7fdb |
22 | f92hf53a8cc8 |
23 | 9a9i7gmkbfj6 |
24 | 5m1bec25hbd8 |
25 | 3jb4ed3h3aeh |
26 | 2bko8jf78bb6 |
27 | 1gk4mmhm95ae |
28 | 12bd1h7b56h4 |
29 | lbpf6d7shib |
30 | f7iboftrod2 |
31 | aukoap6ali8 |
32 | 80000000000 |
33 | 5t2d3e17rj8 |
34 | 4cbreicjccw |
35 | 399uaj5f5vw |
36 | 2gosa7pa2gw |

A table of base-specific maximum values is interesting, but it does not lead to a simple, base-independent, programmatic test. Here is such a test:

<?php
function base_convert_is_safe($integer, $base) {
    if (bindec(base_convert($integer, $base, 2)) <= 9007199254740991) {
        $isSafe = TRUE;
    } else {
        $isSafe = FALSE;
    }
    return $isSafe;
}
?>

This test uses base_convert() to convert the input string to a binary string, and then uses another PHP conversion function — bindec(), which by the way is safe up to 2^{53} as well — to convert that binary string to a numeric binary value. This value is then compared to the numeric binary value of (decimal) 9007199254740991, which is 2^{53} – 1. (Testing for 2^{53} is problematic; with the default IEEE rounding mode of *round-to-nearest/ties-to-even*, 2^{53} + 1 is indistinguishable from 2^{53}. We thus have to cut the top number from the safe range.)

This test is not very efficient, since it uses a call to bindec() and a call to base_convert() before the actual base_convert() conversion you want to do.
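
For comparison, a language with arbitrary-precision integers can do this check exactly and directly. A hypothetical Python analogue of the tester (int() parses bases 2 through 36 with no precision limit):

```python
def base_convert_is_safe(integer, base):
    # Arbitrary-precision comparison against 2^53 - 1, the top of the
    # range whose values we can distinguish reliably.
    return int(integer, base) <= 2**53 - 1

print(base_convert_is_safe('2gosa7pa2gv', 36))  # True  (2^53 - 1)
print(base_convert_is_safe('2gosa7pa2gx', 36))  # False (2^53 + 1)
```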

Here are examples of how to use the tester function:

<?php
$integer_from = '2gosa7pa2gv';
$base_from = 36;
if (base_convert_is_safe($integer_from, $base_from)) {
    $integer_to = base_convert($integer_from, $base_from, 2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 2^{53} – 1, so this prints

11111111111111111111111111111111111111111111111111111

<?php
$integer_from = '2gosa7pa2gx';
$base_from = 36;
if (base_convert_is_safe($integer_from, $base_from)) {
    $integer_to = base_convert($integer_from, $base_from, 2);
    echo $integer_to . '<br>';
} else {
    echo 'NOT safe to use base_convert()<br>';
}
?>

The input value equals 2^{53} + 1, so this prints

NOT safe to use base_convert()

There is another way to test for safe conversions that is a little less simple but much more efficient — but you have to sacrifice the top end of the safe range of integers to use it. For example, in decimal, 2^{53} is the 16-digit integer 9007199254740992. If we limit the safe range of decimal integers to 15 digits — that is, integers up to and including 999999999999999 — we can simply take the length of the input and make sure it is less than or equal to 15. We don’t have to consider the value of the number per se.

This works similarly for the other bases; for example:

- 2^{53} in base 7 is the 19-digit integer 5350140446150306054; if we cap it at 18 digits, we get 666666666666666666
- 2^{53} in base 16 is the 14-digit integer 20000000000000; if we cap it at 13 digits, we get fffffffffffff
- 2^{53} in base 36 is the 11-digit integer 2gosa7pa2gw; if we cap it at 10 digits, we get zzzzzzzzzz

(Taking the cutoff as one digit less than the representation of 2^{53} is the same as taking the integer part of the logarithm, to the “from base”, of 2^{53}: floor(log_{fromBase}(2^{53})).)
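
The digit cap is easy to compute programmatically; here is a sketch in Python (counting digits with integer arithmetic avoids floating-point log pitfalls):

```python
def max_safe_digits(base):
    # Count the digits of 2^53 in `base`, then subtract one --
    # equivalent to floor(log_base(2^53)).
    n, count = 2**53, 0
    while n:
        n //= base
        count += 1
    return count - 1

print(max_safe_digits(7))   # 18
print(max_safe_digits(16))  # 13
print(max_safe_digits(36))  # 10
```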

Here is a table showing the digit cap for each base, and the corresponding maximum values:

Base | Max Digits | Max Value |
---|---|---|
2 | 53 | 11111111111111111111111111111111111111111111111111111 |
3 | 33 | 222222222222222222222222222222222 |
4 | 26 | 33333333333333333333333333 |
5 | 22 | 4444444444444444444444 |
6 | 20 | 55555555555555555555 |
7 | 18 | 666666666666666666 |
8 | 17 | 77777777777777777 |
9 | 16 | 8888888888888888 |
10 | 15 | 999999999999999 |
11 | 15 | aaaaaaaaaaaaaaa |
12 | 14 | bbbbbbbbbbbbbb |
13 | 14 | cccccccccccccc |
14 | 13 | ddddddddddddd |
15 | 13 | eeeeeeeeeeeee |
16 | 13 | fffffffffffff |
17 | 12 | gggggggggggg |
18 | 12 | hhhhhhhhhhhh |
19 | 12 | iiiiiiiiiiii |
20 | 12 | jjjjjjjjjjjj |
21 | 12 | kkkkkkkkkkkk |
22 | 11 | lllllllllll |
23 | 11 | mmmmmmmmmmm |
24 | 11 | nnnnnnnnnnn |
25 | 11 | ooooooooooo |
26 | 11 | ppppppppppp |
27 | 11 | qqqqqqqqqqq |
28 | 11 | rrrrrrrrrrr |
29 | 10 | ssssssssss |
30 | 10 | tttttttttt |
31 | 10 | uuuuuuuuuu |
32 | 10 | vvvvvvvvvv |
33 | 10 | wwwwwwwwww |
34 | 10 | xxxxxxxxxx |
35 | 10 | yyyyyyyyyy |
36 | 10 | zzzzzzzzzz |

You can create an array of maximum digit values, indexed by base, and test the length of your number before calling base_convert().

Limiting the maximum number of digits cuts the top range of integers you can convert safely with base_convert() — sometimes significantly. It depends on the relative value of the top (most significant) digit you are cutting. This table shows the percentage of safe integers allowed by the maximum digit value — what I call *coverage* — by base:

Base | Max Digits | Coverage |
---|---|---|
2 | 53 | ≈100% |
3 | 33 | ≈61.7% |
4 | 26 | ≈50% |
5 | 22 | ≈26.5% |
6 | 20 | ≈40.6% |
7 | 18 | ≈18.1% |
8 | 17 | ≈25% |
9 | 16 | ≈20.6% |
10 | 15 | ≈11.1% |
11 | 15 | ≈46.4% |
12 | 14 | ≈14.3% |
13 | 14 | ≈43.7% |
14 | 13 | ≈8.8% |
15 | 13 | ≈21.6% |
16 | 13 | ≈50% |
17 | 12 | ≈6.5% |
18 | 12 | ≈12.8% |
19 | 12 | ≈24.6% |
20 | 12 | ≈45.5% |
21 | 12 | ≈81.7% |
22 | 11 | ≈6.5% |
23 | 11 | ≈10.6% |
24 | 11 | ≈16.9% |
25 | 11 | ≈26.5% |
26 | 11 | ≈40.7% |
27 | 11 | ≈61.7% |
28 | 11 | ≈92.1% |
29 | 10 | ≈4.7% |
30 | 10 | ≈6.6% |
31 | 10 | ≈9.1% |
32 | 10 | ≈12.5% |
33 | 10 | ≈17% |
34 | 10 | ≈22.9% |
35 | 10 | ≈30.6% |
36 | 10 | ≈40.6% |
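
These coverage figures follow directly from the digit cap: the base^d integers of at most d digits are all safe, out of 2^53 total. A sketch in Python, assuming d = floor(log_base(2^53)) as above:

```python
def coverage(base):
    # Fraction of the safe integers (0 through 2^53) that have at most
    # d digits, where d is one less than the digit count of 2^53.
    n, count = 2**53, 0
    while n:
        n //= base
        count += 1
    d = count - 1
    return base**d / 2**53

print(round(coverage(10) * 100, 1))  # 11.1
print(round(coverage(16) * 100, 1))  # 50.0
print(round(coverage(3) * 100, 1))   # 61.7
```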

You can improve your coverage significantly with a hybrid test: you can allow one digit more than the maximum length, as long as the top digit sorts before the top digit of that base’s representation of 2^{53}. For example, in base 29, the maximum digit length number *ssssssssss* (10 digits) gives less than 5% coverage. 2^{53} in base 29 is *lbpf6d7shib* (the top digit is the letter ‘l’, not the digit ‘1’ — which is why my arbitrary-precision base converter offers a character set that excludes that letter). Anything kssssssssss (which is 11 digits) or less is safe, giving over 98% coverage.
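
The hybrid rule can be sketched like so (the helper names are mine; the comparison is on digit values, not raw characters):

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n, base):
    # Render a nonnegative integer in the given base, digits 0-9a-z.
    out = ""
    while n:
        out = DIGITS[n % base] + out
        n //= base
    return out or "0"

def hybrid_is_safe(s, base):
    # Safe if shorter than 2^53's representation, or the same length
    # with a strictly smaller leading digit (sacrificing the top end).
    limit = to_base(2**53, base)
    s = s.lower().lstrip("0") or "0"
    if len(s) != len(limit):
        return len(s) < len(limit)
    return DIGITS.index(s[0]) < DIGITS.index(limit[0])

print(hybrid_is_safe("kssssssssss", 29))  # True  (top digit k < l)
print(hybrid_is_safe("lbpf6d7shib", 29))  # False (top end sacrificed)
```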

You can skip the programmatic checks if you know that the range of numbers you will be converting is safe — by looking at the tables beforehand.

Perhaps this is a better warning for base_convert()’s documentation:

base_convert() can convert integers accurately only up to a certain size. This is due to the limits of the internal floating-point type used. If single-precision floating-point is used, the maximum integer is 2^{24}; if double-precision floating-point is used, the maximum integer is 2^{53}.

I would also add a note saying how to tell whether single or double precision is being used (this can be done by trial and error — but is there a better way?). Better yet, if there are no single-precision implementations, we could eliminate any talk of different precisions.

]]>

The Fretlight guitar is a teaching aid for guitarists. It connects to a computer through a USB cable, and acts as a USB HID. Software on the computer sends it data that tells it which of its 132 LEDs to light, indicating proper placement of your fingers.

I co-opted the interface and turned my Fretlight into a BCD mode binary clock. I had done this with Windows and Android programs, and now I have done it with a Python program — running on the Raspberry Pi.

(See “My Fretlight Guitar As a Binary Clock” for more background and for the USB communication details.)

I wrote a program, fretclock.py, to implement my Fretlight clock in Python. I created a class called FretlightBinaryClock that has two “public” methods, start() and stop(). start() activates the clock, which entails opening the Fretlight USB device, getting the current time from the system, and displaying the updated time every second. stop() deactivates the clock, turning off all LEDs and releasing the device.

For USB communication, I use PyUSB, a USB library for Python. PyUSB itself requires a backend USB library, for which I use libusb.

Here is the code:

# fretclock.py: Rick Regan, https://www.exploringbinary.com/, 1/4/16
# BCD mode (standard time only) binary clock represented on the Fretlight Guitar
#
# For details see https://www.exploringbinary.com/my-fretlight-guitar-as-a-binary-clock/
# and https://www.exploringbinary.com/my-fretlight-guitar-binary-clock-raspberry-pi-edition/
#
# To install PyUSB (and required backend, e.g., libusb):
#   sudo apt-get install libusb-1.0-0-dev
#   sudo pip install --pre pyusb
#
# To run:
#   sudo date -s "4 Jan 2016 10:32:00" (Set current date and time)
#   sudo python fretclock.py
#
# To stop: ctrl-c
#
# Credits:
#   PyUSB calls taken from https://github.com/walac/pyusb/blob/master/docs/tutorial.rst
#   and https://stackoverflow.com/questions/12542799/communication-with-the-usb-device-in-python

import usb.core
import usb.util
import time
import datetime
import signal
import sys

def _signal_handler(signal, frame):
    fretclock.stop()
    sys.exit(0)

class FretlightBinaryClock:

    PACKET_LEN = 7  # Number of bytes in a Fretlight USB HID packet

    def __init__(self):
        self.fretlight = 0
        self.cfg = 0
        self.intf = 0
        self.ep = 0
        self.hour = 0
        self.minute = 0
        self.second = 0
        self.packet = bytearray([0] * FretlightBinaryClock.PACKET_LEN)
        self.packet[6] = 0x03  # All LEDs are in packet 3

    def _open_fretlight(self):
        # Open the Fretlight Guitar device (vendor ID, product ID)
        self.fretlight = usb.core.find(idVendor=0x0925, idProduct=0x2000)
        if self.fretlight is None:
            raise ValueError('Fretlight Guitar not found')
        if self.fretlight.is_kernel_driver_active(0):
            # This will happen once after each time the Fretlight is attached --
            # it must be detached so we can control it ourselves
            self.fretlight.detach_kernel_driver(0)
        self.fretlight.set_configuration()  # Set active configuration (no arguments means the first configuration is activated)
        self.cfg = self.fretlight.get_active_configuration()  # Get active configuration
        self.intf = self.cfg[(0, 0)]  # Get interface
        # Get endpoint (the interrupt OUT endpoint -- match the first OUT endpoint)
        self.ep = usb.util.find_descriptor(
            self.intf,
            custom_match=lambda e:
                usb.util.endpoint_direction(e.bEndpointAddress) ==
                usb.util.ENDPOINT_OUT)
        if self.ep is None:
            raise ValueError('Endpoint not found')

    def _leds_off(self):
        self.packet[0] = 0
        self.packet[1] = 0
        self.packet[2] = 0
        self.packet[3] = 0
        self.packet[4] = 0
        self.packet[5] = 0
        self._send_report()

    def _init_clock(self):
        # Initialize from system clock
        self.hour = datetime.datetime.now().time().hour
        if (self.hour > 12):
            self.hour -= 12  # Convert from military to standard
        self.minute = datetime.datetime.now().time().minute
        self.second = datetime.datetime.now().time().second

    def _increment_clock(self):
        self.second += 1
        if (self.second == 60):
            self.second = 0
            self.minute += 1
            if (self.minute == 60):
                self.minute = 0
                self.hour += 1
                if (self.hour == 13):
                    self.hour = 1

    def _send_report(self):
        # Report should begin with a 0 byte followed by the packet, but for
        # some reason with PyUSB, it won't work if you send the 0
        report = self.packet
        self.ep.write(report)  # Write the data to the Fretlight

    def _create_packet(self):
        # Break time into BCD digits: h1,h2 / m1,m2 / s1,s2
        # (integer division, so this also works under Python 3)
        # Get the two digits of the hour
        h1 = self.hour // 10
        h2 = self.hour % 10
        # Get the two digits of the minute
        m1 = self.minute // 10
        m2 = self.minute % 10
        # Get the two digits of the second
        s1 = self.second // 10
        s2 = self.second % 10

        # Set the appropriate LEDs

        # h1 (range: 0-1)
        if (h1 & 1):  # LED at fret 21, string 6
            self.packet[1] |= 0x10   # Turn on
        else:
            self.packet[1] &= ~0x10  # Turn off

        # h2 (range: 0-9)
        if (h2 & 8):  # LED at fret 18, string 5
            self.packet[3] |= 0x80   # Turn on
        else:
            self.packet[3] &= ~0x80  # Turn off
        if (h2 & 4):  # LED at fret 19, string 5
            self.packet[3] |= 0x02   # Turn on
        else:
            self.packet[3] &= ~0x02  # Turn off
        if (h2 & 2):  # LED at fret 20, string 5
            self.packet[2] |= 0x08   # Turn on
        else:
            self.packet[2] &= ~0x08  # Turn off
        if (h2 & 1):  # LED at fret 21, string 5
            self.packet[1] |= 0x20   # Turn on
        else:
            self.packet[1] &= ~0x20  # Turn off

        # m1 (range: 0-5)
        if (m1 & 4):  # LED at fret 19, string 4
            self.packet[3] |= 0x04   # Turn on
        else:
            self.packet[3] &= ~0x04  # Turn off
        if (m1 & 2):  # LED at fret 20, string 4
            self.packet[2] |= 0x10   # Turn on
        else:
            self.packet[2] &= ~0x10  # Turn off
        if (m1 & 1):  # LED at fret 21, string 4
            self.packet[1] |= 0x40   # Turn on
        else:
            self.packet[1] &= ~0x40  # Turn off

        # m2 (range: 0-9)
        if (m2 & 8):  # LED at fret 18, string 3
            self.packet[4] |= 0x02   # Turn on
        else:
            self.packet[4] &= ~0x02  # Turn off
        if (m2 & 4):  # LED at fret 19, string 3
            self.packet[3] |= 0x08   # Turn on
        else:
            self.packet[3] &= ~0x08  # Turn off
        if (m2 & 2):  # LED at fret 20, string 3
            self.packet[2] |= 0x20   # Turn on
        else:
            self.packet[2] &= ~0x20  # Turn off
        if (m2 & 1):  # LED at fret 21, string 3
            self.packet[1] |= 0x80   # Turn on
        else:
            self.packet[1] &= ~0x80  # Turn off

        # s1 (range: 0-5)
        if (s1 & 4):  # LED at fret 19, string 2
            self.packet[3] |= 0x10   # Turn on
        else:
            self.packet[3] &= ~0x10  # Turn off
        if (s1 & 2):  # LED at fret 20, string 2
            self.packet[2] |= 0x40   # Turn on
        else:
            self.packet[2] &= ~0x40  # Turn off
        if (s1 & 1):  # LED at fret 21, string 2
            self.packet[2] |= 0x01   # Turn on
        else:
            self.packet[2] &= ~0x01  # Turn off

        # s2 (range: 0-9)
        if (s2 & 8):  # LED at fret 18, string 1
            self.packet[4] |= 0x08   # Turn on
        else:
            self.packet[4] &= ~0x08  # Turn off
        if (s2 & 4):  # LED at fret 19, string 1
            self.packet[3] |= 0x20   # Turn on
        else:
            self.packet[3] &= ~0x20  # Turn off
        if (s2 & 2):  # LED at fret 20, string 1
            self.packet[2] |= 0x80   # Turn on
        else:
            self.packet[2] &= ~0x80  # Turn off
        if (s2 & 1):  # LED at fret 21, string 1
            self.packet[2] |= 0x02   # Turn on
        else:
            self.packet[2] &= ~0x02  # Turn off

    def _update(self):
        self._increment_clock()
        self._create_packet()
        self._send_report()

    def _timer_loop(self):
        while True:
            time_begin = time.time()
            self._update()
            time_end = time.time()
            time_elapsed = time_end - time_begin
            time.sleep(1.0 - time_elapsed)  # Delay for 1 second, less overhead to this point

    def _set_signal_handler(self):
        signal.signal(signal.SIGINT, _signal_handler)

    def start(self):
        print("Starting fretclock")
        self._set_signal_handler()
        self._open_fretlight()
        self._leds_off()
        self._init_clock()
        self._timer_loop()

    def stop(self):
        print("Stopping fretclock")
        self._leds_off()
        usb.util.dispose_resources(self.fretlight)

fretclock = FretlightBinaryClock()
fretclock.start()

This is the first real Python program I have written. In the time I allotted to this project, I did my best to research and follow Python’s coding conventions (please comment if you have suggestions for improvement). My goal was to explore my new Raspberry Pi, using the recommended language. (I kind of cheated though, so as not to take on too much new stuff at once: I did an initial implementation in C, which was a straightforward port of my Windows C++ code.)

As for the PyUSB API, I could not find a guide detailing all its functions. I just copied the code from this tutorial, and added a few lines of code from this Stack Overflow answer. The code works insofar as I’ve tested it, but I don’t know how robust it is. I also don’t understand it well enough to explain, for example, why I had to write the packet directly to the endpoint instead of wrapping it in a USB report (like I did in my C, C++, and Java versions).

You must first put an image of Raspbian on the micro SD card; I used Raspbian Jessie. Insert the card in the Pi and power it on — it will boot right up to the graphical desktop. From there, open a terminal window and do the following (as necessary):

Install the USB libraries (the Pi needs to be connected to the Internet for this step):

sudo apt-get install libusb-1.0-0-dev sudo pip install --pre pyusb

Set the system time (the Pi has no built-in real-time clock); for example:

sudo date -s "4 Jan 2016 10:32:00"

Start the clock:

sudo python fretclock.py

(‘sudo’ is required; otherwise, access to the Fretlight is denied with this message: usb.core.USBError: [Errno 13] Access denied (insufficient permissions))

Stop the clock:

ctrl-c

On my Pi, I observed that the clock loses about 3-4 seconds per hour. I don’t know if this is my code, Python’s sleep command, or the lack of “real-timeness” of the Pi. For my purposes, this is acceptable.
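
One likely contributor is how the delay is computed: sleeping for “1 second minus overhead” lets small errors accumulate, since time.sleep() can overshoot. A drift-resistant alternative (a sketch, not the code above; it uses Python 3’s time.monotonic()) sleeps until the next absolute deadline instead:

```python
import time

def timer_loop(update, interval=1.0, ticks=None):
    # Schedule against absolute deadlines so per-tick sleep error
    # does not accumulate into long-term drift.
    next_deadline = time.monotonic()
    while ticks is None or ticks > 0:
        update()
        next_deadline += interval
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        if ticks is not None:
            ticks -= 1

# Example: tick 3 times at 0.05-second spacing
timer_loop(lambda: None, interval=0.05, ticks=3)
```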

This code should run on any Linux, not just Raspbian. (I also ran it on Ubuntu.)

When I started this project I wasn’t sure if the Pi would have enough power for itself and the Fretlight. Apparently it does.

Ideally, you would run this code without the Pi hooked up to a monitor, so that you could just plug in the Fretlight and power up the Pi. (You would have the program set to start up at boot.) However, since there is no real-time clock on the Pi, you need to interact with it first to set the time.

To see the Fretlight binary clock in action, check out my video (the video is of the clock running under Windows, but it looks the same when running the Python code on the Pi).

]]>

(The lesson comes in two forms: a *blocks* version and a *JavaScript* version. The JavaScript version, furthermore, has a blocks mode which allows you to assemble JavaScript with blocks. This article is about the JavaScript blocks mode version.)

Here is a screenshot of puzzle 9, where I’ve coded the value 0.1 for the score increment:

Here specifically is the code block:

Here’s the score after the first encounter (looks OK):

Here’s the score after the second encounter (looks OK):

Here’s the score after the third encounter (what on earth?):

Bingo! A floating-point gotcha! (If this is surprising, you’re not alone.)

The root of the problem is that 0.1 is not 0.1 in binary floating-point. Specifically, in double-precision binary floating-point, it is

0.1000000000000000055511151231257827021181583404541015625.

Decimal numbers such as 0.1 do not have exact binary equivalents.

Furthermore, every time you add this to the score, rounding may occur (double-precision floating-point has a fixed precision of 53 bits). The internal double-precision score after each encounter is as follows (you can do the binary additions and rounding yourself, using my decimal/binary converter and binary calculator):

- 0.1000000000000000055511151231257827021181583404541015625
- 0.200000000000000011102230246251565404236316680908203125
- 0.3000000000000000444089209850062616169452667236328125
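
You can reproduce the same running scores in any language that prints shortest round-trip decimals; for instance, in Python (which, like the JavaScript here, prints the shortest string that round-trips):

```python
# Add 0.1 eight times and record each displayed score, as the game does.
score = 0.0
printed = []
for _ in range(8):
    score += 0.1
    printed.append(repr(score))
print(printed)
# ['0.1', '0.2', '0.30000000000000004', '0.4',
#  '0.5', '0.6', '0.7', '0.7999999999999999']
```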

One strategy for printing double-precision floating-point numbers is to round them to 17 significant digits; that way, they are guaranteed to round-trip convert back to the same double-precision number. But that is not what is done here; that would make the scores 0.10000000000000001, 0.20000000000000001, and 0.30000000000000004, respectively.
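
The 17-digit strategy is easy to demonstrate (Python shown; its ".17g" format specifier plays the role of the fixed-width strategy):

```python
x = 0.1 + 0.1 + 0.1

# 17 significant digits always round-trip for doubles...
s17 = format(x, ".17g")
print(s17)                  # 0.30000000000000004
assert float(s17) == x

# ...but they overspecify values like 0.1:
print(format(0.1, ".17g"))  # 0.10000000000000001
```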

Another strategy for printing floating-point numbers is to round them to the shortest number that will round-trip. In this case, that would be 0.1, 0.2, and 0.30000000000000004. That’s what’s happening here.

You can show this by using my decimal to floating-point converter. 0.10000000000000001 and 0.1 both convert to the original floating point number, as do 0.2 and 0.20000000000000001. However, 0.30000000000000004 converts to the original floating point number, but 0.3 does not; it converts to 0.299999999999999988897769753748434595763683319091796875.

I entered a value of 1e308 as the score increment and the game displayed 1e+308 after the first encounter, as expected. After the second and third encounters, it displayed *Infinity* — also as expected. (The maximum value of a double-precision number is approximately 1.8e308.) This is not so much an anomaly as a demonstration of the limits and properties of floating-point.

In other exercises, where you can run up a higher score, you can see similar printing anomalies using the 0.1 score increment. The next one for which this happens is after the eighth score; it prints the 16-digit value 0.7999999999999999, the shortest number that round-trips.

You can’t fix this, because that’s how the underlying JavaScript code works, and because that’s how the underlying floating-point implementation works. I suppose you could prevent entering floating-point numbers — but then where would the life lesson be?

]]>