6b/8b encoding

In telecommunications, 6b/8b is a line code that expands 6-bit codes to 8-bit symbols for the purposes of maintaining DC-balance in a communications system.

The 6b/8b encoding is a balanced code: each 8-bit output symbol contains exactly 4 zero bits and 4 one bits. Consequently the code can, like a parity bit, detect all single-bit errors, since flipping any one bit leaves a symbol with 3 or 5 one bits.
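
The balance property can be checked mechanically. The following is a minimal sketch, not part of any specification; the helper name `is_balanced` and the sample symbol are chosen here purely for illustration.

```python
def is_balanced(symbol: int) -> bool:
    """Return True if the low 8 bits of `symbol` contain exactly four 1 bits."""
    return bin(symbol & 0xFF).count("1") == 4


# An illustrative balanced symbol (four 1 bits, four 0 bits).
symbol = 0b10000111

# Flip each of the 8 bit positions in turn, simulating every single-bit error.
corrupted = [symbol ^ (1 << i) for i in range(8)]

# A single flip always changes the count of 1 bits to 3 or 5,
# so every corrupted symbol fails the balance check.
detected = [not is_balanced(c) for c in corrupted]
```

Every entry of `detected` is true, which is the sense in which the balance check behaves like a parity check for single-bit errors.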

The number of 8-bit patterns with 4 bits set is the binomial coefficient $$\tbinom{8}{4} = 70$$. Further excluding the patterns 11110000 and 00001111 leaves 68 coded patterns: 64 data codes plus 4 additional control codes.
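
The count can be reproduced by brute-force enumeration. This sketch assumes the two excluded patterns are 11110000 and 00001111, which are the only balanced patterns that begin or end with four identical bits; the variable names are illustrative.

```python
import math

# The two balanced patterns assumed to be excluded from the code.
EXCLUDED = {0b11110000, 0b00001111}

# All 8-bit values with exactly four 1 bits.
balanced = [s for s in range(256) if bin(s).count("1") == 4]

# Balanced patterns that remain available as coded symbols.
usable = [s for s in balanced if s not in EXCLUDED]

# Cross-check the enumeration against the binomial coefficient.
expected = math.comb(8, 4)
```

Here `len(balanced)` agrees with `expected`, and `len(usable)` is the number of patterns left for the 64 data codes and 4 control codes.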

Coding rules
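The 64 possible 6-bit input codes can be classified according to their disparity, the number of 1 bits minus the number of 0 bits:
 * Disparity −6 (no 1 bits): 1 code
 * Disparity −4 (one 1 bit): 6 codes
 * Disparity −2 (two 1 bits): 15 codes
 * Disparity 0 (three 1 bits): 20 codes
 * Disparity +2 (four 1 bits): 15 codes
 * Disparity +4 (five 1 bits): 6 codes
 * Disparity +6 (six 1 bits): 1 code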
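
The classification by disparity can be tallied with a few lines of code. This is a sketch only; the function name `disparity` and its `width` parameter are introduced here for illustration.

```python
from collections import Counter


def disparity(code: int, width: int = 6) -> int:
    """Number of 1 bits minus number of 0 bits in a `width`-bit code."""
    ones = bin(code).count("1")
    return ones - (width - ones)


# Tally how many of the 64 six-bit codes fall into each disparity class.
counts = Counter(disparity(code) for code in range(64))
```

The tallies follow the binomial coefficients: 1, 6, 15, 20, 15, 6 and 1 codes for disparities −6 through +6.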

The 6-bit input codes are mapped to 8-bit output symbols as follows:
 * The 20 6-bit codes with disparity 0 are prefixed with 10. Examples: 000111 → 10000111 and 101010 → 10101010.
 * Of the 15 6-bit codes with disparity +2, the 14 other than 001111 are prefixed with 00. Example: 010111 → 00010111.
 * Of the 15 6-bit codes with disparity −2, the 14 other than 110000 are prefixed with 11. Example: 101000 → 11101000.
 * The remaining 20 codes (12 with disparity ±4, 2 with disparity ±6, 001111, 110000, and the 4 control codes) are assigned individually to the 20 balanced symbols beginning with 01.
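
The three prefix rules can be sketched as a small function. This is an illustrative sketch, not a complete encoder: it assumes bits are written most significant first, as in the examples above, and it returns `None` for the codes that are assigned individually to symbols beginning with 01, because that assignment is not derivable from the prefix rules. The name `encode_by_prefix` is hypothetical.

```python
from typing import Optional


def encode_by_prefix(code: int) -> Optional[int]:
    """Encode a 6-bit code using the prefix rules only.

    Returns the 8-bit symbol, or None when the code belongs to the
    group that is mapped individually to symbols beginning with 01.
    """
    if not 0 <= code < 64:
        raise ValueError("code must be a 6-bit value")
    ones = bin(code).count("1")
    disparity = ones - (6 - ones)
    if disparity == 0:
        prefix = 0b10  # the two prefix bits are themselves balanced
    elif disparity == 2 and code != 0b001111:
        prefix = 0b00  # two extra 0 bits cancel the +2 disparity
    elif disparity == -2 and code != 0b110000:
        prefix = 0b11  # two extra 1 bits cancel the -2 disparity
    else:
        return None  # handled by the individual 01-prefixed assignments
    return (prefix << 6) | code


# Codes covered by the prefix rules, mapped to their 8-bit symbols.
prefixed = {c: encode_by_prefix(c) for c in range(64)
            if encode_by_prefix(c) is not None}
```

The prefix rules cover 20 + 14 + 14 = 48 of the 64 data codes; the remaining 16 data codes and the 4 control codes fall to the 01 group.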

No data symbol contains more than four consecutive matching bits, and because the patterns 11110000 and 00001111 are excluded, no data symbol begins or ends with more than three identical bits. Thus, the longest run of identical bits that can be produced is 6. (That is, this is a (0,5) RLL code, with a worst-case running disparity of +3 to −3.)
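
These run-length bounds can be confirmed by enumeration. The sketch below checks all 68 balanced patterns that remain after excluding 11110000 and 00001111 (so it covers control symbols as well as data symbols); the helper name `runs` is illustrative.

```python
from itertools import groupby


def runs(bits: str) -> list:
    """Lengths of the maximal runs of identical characters in `bits`."""
    return [len(list(group)) for _, group in groupby(bits)]


# All balanced 8-bit patterns except the two excluded ones, as bit strings.
symbols = [format(s, "08b") for s in range(256)
           if bin(s).count("1") == 4 and s not in (0b11110000, 0b00001111)]

longest_inside = max(max(runs(b)) for b in symbols)   # run within one symbol
longest_leading = max(runs(b)[0] for b in symbols)    # run at symbol start
longest_trailing = max(runs(b)[-1] for b in symbols)  # run at symbol end

# A run can span at most one boundary, since no symbol is all 0s or all 1s.
longest_overall = max(longest_inside, longest_trailing + longest_leading)
```

The enumeration gives a longest run of 4 inside a symbol, 3 at either edge, and therefore 6 across a symbol boundary.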

Any occurrence of 6 consecutive identical bits constitutes a comma sequence (also called a sync mark or syncword); it identifies the symbol boundaries precisely. Those 6 bits straddle the inter-symbol boundary, with exactly 3 of them at the end of one symbol and 3 at the start of the next.
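
A receiver could use this property to recover symbol alignment. The following is a minimal sketch under the assumption of an error-free bit stream held as a string of '0' and '1' characters; the function name `find_symbol_boundary` is hypothetical.

```python
import re
from typing import Optional


def find_symbol_boundary(bits: str) -> Optional[int]:
    """Return the symbol-boundary offset (modulo 8) implied by the first
    run of six identical bits, or None if the stream contains no such run.
    """
    match = re.search(r"0{6}|1{6}", bits)
    if match is None:
        return None
    # The boundary lies after the third bit of the six-bit run.
    return (match.start() + 3) % 8


# Two symbols from the prefix rules: 10111000 followed by 00011110.
stream = "10111000" + "00011110"
aligned_offset = find_symbol_boundary(stream)

# The same stream with its first three bits lost in transit.
shifted_offset = find_symbol_boundary(stream[3:])
```

For the aligned stream the function reports offset 0; after three bits are dropped it reports offset 5, which is where the original boundary now sits. A practical receiver would also need to tolerate bit errors, which this sketch ignores.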