Talk:Computer number format/Archive 1

Proposed merge with "Signed number representation"
There seems to be a lot of redundency with [Signed number representation]. On the whole, I'd say [Signed number representation] is the cleaner, better written page, so that material should generally be favored. However, I think [Computer numbering formats] is the more general name. Though perhaps the material here should be split on to multiple pages; in this case, the stuff about signed representation should be gotten rid of on this page.

I'll also say that I'm not happy with the term "numbering" as I found it odd. Why not just number? I can think of some alternate titles:


 * computer number formats
 * number formats (computing)
 * number formats in computers

--Apantomimehorse 06:03, 2 September 2006 (UTC)


 * The articles are about how numbers are represented as binary data. It isn't necessarily specific to computers, and these representations aren't necessarily signed.  Therefore, my vote is for them to be merged under the title of Binary number representation. ~ Booya Bazooka 03:21, 3 October 2006 (UTC)

I am happy with that. But then why not explain what "excess 127 format" is? Bill, 2007 Sept 18

Nitpick: Decimal vs Denary
As I understand it, the representation of integer values using base 10 is called denary. The term decimal is used to describe values that include fractions (hence decimal point and not denary point). So 1, 10, 97 are denary values while 1.5, 10.9 and 97.6 are decimal values. However, various sources (including Wikipedia) don't seem to distinguish between the two these days (I'm sure they used to), and I'm not an SME so I leave it to others to decide whether to modify the article. AncientBrit 19:33, 3 December 2007 (UTC)

'decimal' as an adjective is synonymous with 'denary' (which is only used as an adjective). 'decimal' as a noun refers to fractions, etc. All of the numbers you mention are in base ten, so they are all decimal (or denary) numbers. mfc 20:42, 3 December 2007 (UTC)


 * I'm not so sure. There is a distinction drawn between denary and decimal: denary numbers use the digits 0 to 9 multiplied by 10 to the power of positive integers; decimal numbers permit negative signs and negative powers to represent fractional and negative values, which denary cannot do. One source is Introductory Mathematics: Algebra and Analysis By Geoff Smith, pp 153-154. AncientBrit 02:26, 4 December 2007 (UTC)


 * Denary, "by tens"; decimal "by tenths". Popular usage prefers "decimal" over "denary", because people are taught decimal versus vulgar fractions in schools. Only the more advanced students learn about different number bases, and that is when the term "denary" is introduced. The less mathematically inclined head out into the world, oblivious to this word; then when challenged with defining "base 10" as opposed to binary or "hexadecimal" (ugh), they regurgitate the closest information to what they were fed. This doesn't make it correct terminology in mathematics, which is what we're discussing. Here's a hint: binary, ternary, quarternary, quinary, senary, septenary, octonary, nonary, denary, undenary, duodenary, tridenary, quattuordenary, quindenary, senidenary, etc. You'll notice a few anomalies with popular usage in the list. Fancy that, eh? --199.46.245.231 (talk) 21:49, 9 October 2008 (UTC)

Grammar
This article uses the second person and instructional language extensively. "You" should rarely if ever appear in an encyclopaedia. If anyone happens to be editing a section and finds such luanguage, please rewrite it. Examples: " You will rarely encounter them and NaNs will not be discussed further here", and "Once again, remember that bits are bits. If you have eight bytes stored in computer memory ...". Mmcculloch (talk) 14:46, 28 July 2011 (UTC)
 * Feel free to improve the prose. -R. S. Shaw (talk) 21:22, 28 July 2011 (UTC)

Rewrite
While it does appear a reasonable amount of work has been done since the page was tagged for rewrite and verification two years ago, I think it still needs a fair amount of work. Removal of excessive examples, simplification of some explanations and examples, further explanation for a few terms, and so on. As well as general cleanup for clarity and consistency. Also, citations. I think I'm going to start looking into writing parts of this article. Westphjm (talk) 17:08, 12 April 2012 (UTC)
 * I've gone ahead and begun my rewrite. Thus far I've rewritten the Bit section under the first heading.  Westphjm (talk) 16:02, 24 April 2012 (UTC)
 * I think that these changes have helped. I would continue to change the tables since the format you used makes them more reader friendly. The tables in the next section on octal and hex number display could especially use reformatting. I would suggest that you have make one table with the decimal representation on the left and then put the other representations on right of these. This would greatly assist the comprehension.Jock Echo Loon (talk) 16:53, 26 April 2012 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Computer number format. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive http://arquivo.pt/wayback/20100530092446/http%3A//home1.paulschou.net/tools/xlate/ to http://www.paulschou.com/tools/xlate/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 21:14, 11 August 2017 (UTC)

Untitled section
NOTE: Please remember to sign your comments with four consecutive tildes.

My quick first impression is that this is a long well written and useful article. It's only problem is that it is an orphan. Perhaps someone more familiar with this pard of Wikipedia can create links so that people other than those who stumble onto it can have access. Eclecticology, Friday, April 26, 2002

Two ideas that should be integrated: -- Just to list the article in RecentChange. The article is interesting, having a lot of good points. Alas, the article seems isolated from other wikipedia articles. I am going to integrate the article to wikipedia. If you want, you can help that too! -- Taku 00:42 2 Jun 2003 (UTC) -- What about sign-mangnitude systems? and what about langs / packs that store 'rational' ints? the (1/3)*3 would surely yield 1, the example on top is specific for fix'd float, and not stated as such. I find at least the 'introduction' section too much 'programming-manual' and too little encyclopidic. The (1/3)*3 should state that it's in certian numbering systems, and not in the 'computer'. numbers in the computer will act the way they're programmed, some encoding schemes and certain quirks, others, have, other-quirks.
 * the use of fixed point numbers in business computing, e.g. spreadsheets and COBOL, to represent money values. Floating point precision loss is unacceptable when dealing with money.
 * the introduction of expanded numeric types, such as rationals, bignums, and complex numbers, in high-level programming languages. Programmers in LISP or Python (among others) have some assurance that their programming systems will Do The Right Thing with mathematical operations.

A very nice article, though it doesn't cover different floating point representations in much depth. Redirecting integer (computer science) to this one (after merging pieces, if necessary) may have some merit.

I would strongly recommend replacing the ASCII table given in this page with a link to the much better table given in ASCII. RedWolf 06:51, Dec 7, 2003 (UTC)

The style and freedom in the article should make the material useful somewhere. I don't have time, but if you adjust it, be aware: the section on octal numbers is accurate on its face, but wrong by omission. When (mostly early) hardware, and some programs used Octal numbers, they did it to use then current LEDs,


 * HuH? current LEDs ??

that couldn't display hexadecimal. So they would use two octal digits to represent a byte, and thw only valid values are 00-377. This means two bytes of hex 'FFFF' became '377 377'. Lou I 19:54, 20 Jan 2004 (UTC)


 * Nope, I don't think that's the reason they used octal instead of hex (one may easily display hex digits with seven segment LEDs, as page three of this document (pdf) shows with a nice illustration). Rather, I think it was due to the earlier habit of using multiple-of-three (bit) wordlengths instead of today's multiple-of-fours. With an 18-bit computer, for instance, a word could be conveniently written as six octal digits. Many famous mainframes and minicomputers were 36-bit, 18-bit or 12-bit, using eg. 6-bit 'bytes' for characters, so octal was just fine back then. --Wernher 21:30, 1 Feb 2004 (UTC)

Useful info should be copied from this page and distributed to the already existing Binary number system, Integer (computer science), Floating point, ASCII, Binary coded decimal and Data type. Perhaps a short summary on each type here? Article WAY too long and boring. Listing article for cleanup and removed 'merge with Integer (computing science)' request. Zoney 17:24, 13 Mar 2004 (UTC)


 * I don't think it should be merged with integer article, but does need tidying up. You're right, it is too long, and it essentially covers several different subjects, e.g. binary representation, which are already covered elsewhere.  This article should discuss the relative merits and pitfalls of the various systems, with links to each of the formats mentioned for more info.  --HappyDog 23:26, 18 Mar 2004 (UTC)

This article seems overly "chatty", I'll try and rewrite some of it in a more formal tone. porge 14:15, 25 Jul 2004 (UTC)

Here is text removed from bottom of the article, it doesn't relate directly to numbering systems in computers - although the section about BCD could go back in in some form. The rest of it relates to binary math and conversion and needs to be dispersed amongst various articles. porge 15:46, 25 Jul 2004 (UTC)

Comments: binary conversion, binary math, BCD, indefinite precision
This material may be confusing, but if you find this interesting and want to have a more complete understanding, there are a few more details to be provided.

The first detail is the translation of binary numbers into decimal numbers and the reverse. It's easy to do. There are formal arithmetic methods, "algorithms", for performing such conversions, but most people who do such things a lot have a fancy calculator to do the work for them and if you don't do it a lot, you won't remember how to do anything but the brute-force method. The brute-force method isn't too hard if you have a pocket calculator.

In almost all cases where someone wants to do this, they're converting an unsigned integer decimal value to a hex value that corresponds to a binary number. They almost never convert a signed or floating-point value. The main reason to convert to binary is just to get a specific pattern of bits often for interfacing to computer hardware, since computer mechanics may need to send various patterns of "on" and "off" bits to control a certain device.

Usually the binary value is not more than 16 bits long, meaning the unsigned binary equivalent of the bit pattern is no larger than 65,535, and if you remember your powers of 2, or just have them written down handy, you can perform a conversion very easily.

Suppose you have a decimal number like, say, 46,535, that you want to convert to a 16-bit binary pattern. All you have to do is work your way down the list of powers of two and follow a few simple rules.

First, take the highest power of 2 in the table, or 32,768, and check to see if it is larger than the decimal number or not. It isn't, write down a "1":


 * 1

and then subtract 32,768 from 46,535 on your calculator. This gives 13,767. Go down to the next power of two, or 16,384, and compare. This is larger than 13,767, so write down a "0":


 * 10

Compare 13,767 to the next lower power of 2, which is 8192. This isn't larger than 13,767, so write down a "1":


 * 101

and subtract 8192 from 13,767 to get 5,575. Repeat this procedure until you have all 16 binary digits. In summary, the conversion looks like this:

46,535 greater than or equal to 32,768? Yes, subtract, write:  1 13,767 greater than or equal to 16,384? No, write:             0 13,767 greater than or equal to  8,192? Yes, subtract, write:  1 5,575 greater than or equal to  4,096? Yes, subtract, write:  1

1,479 greater than or equal to  2,048? No, write:             0 1,479 greater than or equal to  1,024? Yes, subtract, write:  1 455 greater than or equal to    512? No, write:             0 455 greater than or equal to    256? Yes, subtract, write:  1

199 greater than or equal to    128? Yes, subtract, write:  1 71 greater than or equal to     64? Yes, subtract, write:  1 7 greater than or equal to     32? No, write:             0 7 greater than or equal to     16? No, write:             0

7 greater than or equal to      8? No, write:             0 7 greater than or equal to      4? Yes, subtract, write:  1 3 greater than or equal to      2? Yes, subtract, write:  1 1 greater than or equal to      1? Yes, subtract, write:  1

This gives:


 * 46,535 decimal = 1011 0101 1100 0111 binary = b5c7 hex

Of course, converting back is a simple multiplication exercise. Doing it in hex is easier:


 * b5c7 hex
 * = 11 &times; 163 + 5 &times; 162 + 12 &times; 16  +  7 &times; 1
 * = 11 &times; 4096 + 5 &times; 256 + 12 &times; 16 +  7
 * = 45,056  +  1,280     +  192      +  7     = 46,535

Again, this is a clumsy means of converting between the number bases, but since most people don't do this often or at all, it pays to have a technique that is at least easy to remember. If you do end up doing it often, you'll find some better way of doing it than by hand.
 * The next interesting topic is the operations that can be performed on binary numbers. We'll consider signed and unsigned integers first.

The most basic operations on binary values are the simple logical operations AND, OR, XOR ("exclusive OR"), and NOT. Performing them on some sample byte data gives:

1111 0000       1111 0000        1111 0000   OR 1010 1010    AND 1010 1010    XOR 1010 1010    NOT 1010 1010 -   -    -      1111 1010        1010 0000        0101 1010        0101 0101

The rules for these operations are as follows:


 * AND: The result is 1 if both values are 1.
 * OR:  The result is 1 if either value is 1.
 * XOR: The result is one if only one value is 1.
 * NOT: The result is 1 if the value is 0 (this is also called "inverting").

A computer uses these operations a great deal, either for checking and setting or "twiddling" bits in its hardware, or as building blocks for more complicated operations, such as addition or subtraction.

Binary addition is a more interesting operation. If you remember the formal rules for adding a decimal number, you add the numbers one digit at a time, and if the result is ten or more, you perform a "carry" to add to the next digit. For example, if you perform the addition:


 * 374 + 452

you first add:


 * 4 + 2 = 6

then:


 * 7 + 5 = 12

which means that you have to "carry" the "1" to the next addition:


 * ( 1 + ) 3 + 4 = 8

giving the result: 374 + 452 = 826.

Performing additions in binary are essentially the same, except that the number of possible results of an addition of two bits is minimal:


 * 0 + 0 = 0
 * 0 + 1 = 1
 * 1 + 0 = 1
 * 1 + 1 = 10

In the last case, this implies a "carry" to the next digit in the binary number. So performing a binary addition on two bytes would look like this:

0011 1010         58    + 0001 1101        + 29   ---             0101 0111          87       CC C

The bits on which a carry occurred are marked with a "C". The equivalent decimal addition is shown to the right. Assuming that we are adding unsigned integers, notice what happens if we add:

1000 1001            137     + 0111 1011           + 123   -     ---   (1) 0000 0100     ( 256 + ) 4 ?       CCCC C CC

The result, equivalent to a decimal 260, is beyond the range of an 8-bit unsigned value (maximum of 255) and won't fit into 8 bits, so all you get is a value of 4, since the "carry-out" bit is lost. A "numeric overflow" has occurred.

OK, now to get really tricky. Remember how we defined signed integers as two's complement values? That is, we chop the range of binary integer values in half and assign the high half to negative values. In the case of 4-bit values:

0000  0001 ...  0110   0111   1000   1001 ...  1110   1111      0      1 ...     6      7     -8     -7 ...    -2     -1

Now we can discuss exactly why this scheme makes life easier for the computer. Two's complement arithmetic has some interesting properties, the most significant being that it makes subtraction the same as addition. To see how this works, pretend that you have the binary values above written on a strip of stiff paper taped at the ends into a loop, with each binary value written along the loop, and the loop joined together between the "1111" and "0000" values.

Now further consider that you have a little slider on the loop that you can move in the direction of increasing values, but not backwards, sort of like a slide rule that looks like a ring, with 16 binary values on it.

If you wished to add, say, 2 to 4, you would just move the slider up two values from 4, and you would get 6. Now let's see what happens if you want to subtract 1 from 3. This is the same as adding -1 to 3, and since a -1 in two's complement is 1111, or the same as decimal 15, you move the slider up 15 values from 3. This takes you all the way around the ring to ... 2.

This is a bizarre way to subtract 1 from a value, or so it seems, but you have to admit from the machine point of view it's just dead simple. Try a few other simple additions and subtractions to see how this works.

The problem is that overflow conditions have become much trickier. Consider the following examples:

0111 0100    116                       1000 1110           -112   + 0001 0101   +  21                     + 1001 0001           -111   ---   -                   -        1000 1001    -119 ( = 137) ?        (1) 0001 1111   ( 256 + ) 21 ?

In the case on the left, we add two positive numbers and end up with a negative one. In the case on the right, we add two negative numbers and end up with a positive one. Both are absurd results. Again, the results have exceeded the range of values that can be represented in the coding system we have used. One nice thing is that you can't get into trouble adding a negative number to a positive one, since obviously the result is going to be within the range of allowed values.

As an aside, if you want to convert a positive binary value to a two's complement value, all you have to do is invert (NOT) all the bits and add 1. For example, to convert a binary 7 to -7:


 * 0000 0111 &rarr; 1111 1000 + 1 &rarr; 1111 1001

Another approach is to simply scan from right to left through the binary number, copy down each digit up to and including the first "1" you encounter, then invert all the bits to the left of that "1".

This covers addition and multiplication, but what about multiplication and division? Well, take the binary value:


 * 0010 1001 = 41

Suppose you move, or "shift", all the bits one place to the left, and shove a zero in on the right. You get:


 * 0101 0010 = 82

Surprise! Shifting a binary quantity left one place multiplies it by two. Suppose you shift it to the right, and shove a zero in on the left:


 * 0001 0100 = 20 (losing a 1 that was shifted off the right side)

This is equivalent to dividing by two. The bit shifted off the right side is the "remainder" that's left over from division. This implies that you can implement multiplication by performing a combination of shifts and adds. For example, to multiply a binary value by five, you would shift the value right by two bits, multiplying it by four, and then add the original value. Similarly, division can be implemented by a shift and subtract procedure. This scheme allows you to use simple hardware to perform these operations.

One interesting feature of this scheme are the complications two's complement introduces. Suppose we have a two's complement number such as:


 * 1001 1010 = -102

If we shift this left, the result is obviously invalid, and there's no way to fix it. Multiplying -102 by 2 would give -204, and that would exceed the range of values available in signed 8-bit arithmetic (-128 to 127). But see what happens if you shift right:


 * 0100 1101 = 77

You'd want to get -51, but since we lost the uppermost "1" bit the value becomes abruptly positive. In the case of signed arithmetic you have to use a slightly different shift method, one that shoves a "1" into the upper bit instead of a "0", and gives the right result:


 * 1100 1101 = -51

This is called an "arithmetic shift". The other method that shoves in a "0" is known in contrast as a "logical shift". This is another example that shows how well 2's complement works from the hardware level, as this would not work with sign-magnitude numbers. There is also a "rotate" operation that is a variation on a "shift", in which the bit shifted out of one end of the binary value is recirculated into the other. This is of no practical concern in this discussion, it's just mentioned for the sake of completeness.

So that covers the basic operations for signed and unsigned integers. What about floating-point values?

The operations are basically the same. The only difference in addition and subtraction is that you have two quantities that have both a significand and an exponent, and they both have to have the same exponent to allow you to add or subtract. For example, using decimal math:


 * 3.13E2 + 2.7E3 = 0.313E3 + 2.73E3 = 3.043E3

The same principle applies to binary floating-point math: you shift the significand of one value left or right and adjust the exponent until both values have the same exponent, and then perform the add, or subtract as the case may be. A sharp reader will realize this is why performing calculations between sets of floating-point quantities with greatly different ranges of values is tricky and dangerous. If the two sets of values are adjusted to have the same exponents, a process known as "normalization", the significand of the much smaller number may be shifted so far down that it doesn't even show up in the result of an addition or subtraction. Also recalling high-school math, to multiply two floating-point values, you simply multiply the significands and add the exponents:


 * 3.13E2 &times; 2.7E3 = (3.13 &times; 2.7)E(2 + 3) = 8.45E5

To divide, you divide the significands and subtract the exponents. The same applies to binary floating-point quantities. Note that it's much easier to do this when both binary significands have the same assumed decimal point, but that will be true by the very definition of the format of IEEE 754 floating-point numbers.

Finally, what about higher functions, like square roots and sines and logs and so on?

There are two approaches. First, there are standard algorithms that mathematicians have long known that use the basic add-subtract-multiply-divide operations in some combination and sequence to generate as good an approximation to such values as you would like. For example, sines and cosines can be approximated with a "power series", which is a sum of specified powers of the value to be converted divided by specific constants that follow a certain rule.

Second, you can use something equivalent to the "look-up tables" once used in math texts that give values of sine, cosine, and so on, where you would obtain the nearest values for, say, sine for a given value and then "interpolate" between them to get a good approximation of the value you wanted. (Younger readers may not be familiar with this technique, as calculators have made such tables generally obsolete, but such tables can be found in old textbooks.) In the case of the computer, it can store a table of sines or the like, look up values, and then interpolate between them to give the value you want.

That covers the basics of bits and bytes and math for a computer. However, just to confuse things a little bit, while computers normally use binary floating-point numbers, calculators normally don't.

Recall that there is a slight error in translating from decimal to binary and back again. While this problem is readily handled by someone familiar with it, calculators in general have to be simple to operate and it would be better if this particular problem didn't arise, particularly in financial calculations where minor discrepancies might lead to a tiresome audit.

So calculators generally really perform their computations in decimal, using a scheme known as "binary-coded decimal (BCD)". This scheme uses groups of four bits to encode the individual digits in a decimal number:


 * 0000 = decimal 0
 * 0001 = decimal 1
 * 0111 = decimal 7
 * 1000 = decimal 8
 * 1001 = decimal 9
 * 1010 = ILLEGAL
 * 1011 = ILLEGAL
 * 1110 = ILLEGAL
 * 1111 = ILLEGAL
 * 1110 = ILLEGAL
 * 1111 = ILLEGAL

With four bits, 10 BCD values can be represented. With 8, 100 BCD values can be handled:


 * 0001 0000 = decimal 10
 * 0001 0001 = decimal 11
 * 0001 0010 = decimal 12
 * 0001 0011 = decimal 13

BCD obviously wastes bits. For example, 16 bits can be used to handle 65,536 binary integers, but only 10,000 BCD integers. However, as mentioned, BCD is not troubled by small errors due to translation between decimal and binary, though you still have resolution and range problems. BCD can be used to implement integer or floating-point number schemes. Of course, it's just decimal digits represented by bits. It can be used in computer programs as well, but except for calculators you won't normally come in contact with it. It makes the arithmetic substantially slower and so is generally only used in applications, such as financial calculations, where minor errors are unacceptable.

One final comment on computer math: there are math software packages that offer "indefinite precision" math, or that is, math taken out to a precision defined by the user. What these packages do is define ways of encoding numbers that can change in the number of bits they use to store values, as well as ways of performing math on them. They are very slow compared to normal computer computations, and many applications actually do fine with 64-bit floating-point math. ---

In my opinion, this page not only doesn't meet the usual format of a Wikipedia article, but is also too informal. This is understandable considering it's imported, but to be a useful article it really has to be cleaned up. Derrick Coetzee 15:54, 14 Aug 2004 (UTC)

notation to describe floating-point formats
I see that many people describe and compare fixed-point number formats using Q (number format) notation, where one system has "Q1.14" fixed-point numbers while another system has "Q7.16" fixed-point numbers.

I see that some people use a similar notation (without naming that notation) in List of computers with on-board BASIC and probably other Wikipedia articles, describing one system as having "FP:31/8" floating-point numbers, while another system uses "FP:64" floating-point numbers. I see that (other?) people use a different notation (without naming that notation) to describe G.711 as having effectively "seeemmmm" floating-point numbers and describe single-precision Microsoft Binary Format as having "xxxxxxxx_smmmmmmm_mmmmmmmm_mmmmmmmm" floating-point numbers.

Is there a name for either of these notations? Is there any other named notation used by people to show the differences between binary16 vs. bfloat16 vs. other floating point formats? --DavidCary (talk) 23:47, 19 June 2020 (UTC)