Code point

A code point, codepoint or code position is a particular position in a table, where the position has been assigned a meaning. The table may be one dimensional (a column), two dimensional (like cells in a spreadsheet), three dimensional (sheets in a workbook), etc... in any number of dimensions.

Technically, a code point is a unique position in a quantized n-dimensional space, where the position has been assigned a semantic meaning. The table has discrete (whole) and positive positions (1, 2, 3, 4, but not fractions).

Code points are used in a multitude of formal information processing and telecommunication standards. For example ITU-T Recommendation T.35 contains a set of country codes for telecommunications equipment (originally fax machines) which allow equipment to indicate its country of manufacture or operation. In T.35, Argentina is represented by the code point 0x07, Canada by 0x20, Gambia by 0x41, etc.

In character encoding
Code points are commonly used in character encoding, where a code point is a numerical value that maps to a specific character. In character encoding code points usually represent a single grapheme—usually a letter, digit, punctuation mark, or whitespace—but sometimes represent symbols, control characters, or formatting. The set of all possible code points within a given encoding/character set make up that encoding's codespace.

For example, the character encoding scheme ASCII comprises 128 code points in the range 0hex to 7Fhex, Extended ASCII comprises 256 code points in the range 0hex to FFhex, and Unicode comprises 1,114,112 code points in the range 0hex to 10FFFFhex. The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 216) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.

In Unicode
For Unicode, the particular sequence of bits is called a code unit – for the UCS-4 encoding, any code point is encoded as 4-byte (octet) binary numbers, while in the UTF-8 encoding, different code points are encoded as sequences from one to four bytes long, forming a self-synchronizing code. See comparison of Unicode encodings for details. Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data. However, code points may also be left reserved for future assignment (most of the Unicode code space is unassigned), or given other designated functions.

The distinction between a code point and the corresponding abstract character is not pronounced in Unicode but is evident for many other encoding schemes, where numerous code pages may exist for a single code space.

History
The concept of a code point dates to the earliest standards for digital information processing and digital telecommunications.

In Unicode, code points are part of Unicode's solution to a difficult conundrum faced by character encoding developers in the 1980s. If they added more bits per character to accommodate larger character sets, that design decision would also constitute an unacceptable waste of then-scarce computing resources for Latin script users (who constituted the vast majority of computer users at the time), since those extra bits would always be zeroed out for such users. The code point avoids this problem by breaking the old idea of a direct one-to-one correspondence between characters and particular sequences of bits.