Talk:Binary-to-text encoding

A few extras to consider

 * "Five-Letter Codegroup Filter": uses only upper case ASCII letters and spaces; includes a CRC. --DavidCary (talk) 08:28, 19 February 2021 (UTC)
 * VLQ
 * LEB128
 * basE91
 * Z85 (uses a different character set from Ascii85)

ASCII85 implementation in JavaScript
I wanted to add a link to my "just written" implementation, but I got a warning about a conflict of interest. If anyone else thinks this information would be helpful, feel free to add the link to the article. NE0sIghT (talk) 21:24, 27 February 2018 (UTC)

basE91
Uses 91 characters, and every pair of characters represents either 13 or 14 bits. Since 2^13 < 91*91 < 2^14, two characters can encode all 13-bit strings and some 14-bit strings; since 91*91 = 8281 and 2^13 = 8192, there are 89 combinations of two characters that are not needed for representing strings of 13 bits. The particular choice is: for a string of 14 bits, if its lowest 13 bits represent a number greater than or equal to 89, the two characters are used to represent only these 13 bits; otherwise, all 14 bits are encoded.

Do not move this explanation to the article; it is undocumented. - Liberatore(T) 13:45, 13 April 2006 (UTC)

basE91 documents: http://base91.sourceforge.net/ ; TB100QM.pdf Ropata (talk) 17:47, 9 September 2019 (UTC)
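
The 13-or-14-bit rule described above can be sketched roughly as follows. This is only an illustration, written on the assumption that it mirrors the bit order of the reference implementation; the names `encode_values` and `decode_values` are hypothetical, and the sketch works with digit values 0 to 90 rather than committing to a particular 91-character alphabet.

```python
def encode_values(data: bytes) -> list[int]:
    """Turn bytes into base-91 digit values (0..90), two digits per 13 or 14 bits."""
    out = []
    acc = 0      # bit accumulator; new bytes are added above the pending bits
    nbits = 0    # number of pending bits in acc
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        if nbits > 13:
            v = acc & 8191            # lowest 13 bits
            if v >= 89:               # pair carries only these 13 bits
                acc >>= 13
                nbits -= 13
            else:                     # pair carries 14 bits
                v = acc & 16383
                acc >>= 14
                nbits -= 14
            out.append(v % 91)
            out.append(v // 91)
    if nbits:                         # flush the remaining bits
        out.append(acc % 91)
        if nbits > 7 or acc > 90:
            out.append(acc // 91)
    return out


def decode_values(values: list[int]) -> bytes:
    """Inverse of encode_values."""
    out = bytearray()
    acc = 0
    nbits = 0
    v = -1                            # -1 means "no first digit of a pair yet"
    for d in values:
        if v == -1:
            v = d
            continue
        v += d * 91
        acc |= v << nbits
        nbits += 13 if (v & 8191) >= 89 else 14
        while nbits > 7:
            out.append(acc & 255)
            acc >>= 8
            nbits -= 8
        v = -1
    if v != -1:                       # a single trailing digit
        out.append((acc | (v << nbits)) & 255)
    return bytes(out)
```

For example, `decode_values(encode_values(b"hello"))` should give back `b"hello"`; mapping the digit values to printable characters is a separate table lookup.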

reviving 'ascii armor' concept
ASCII armor was apparently the original title of this article. It is now 'binary to text encoding'. The ASCII armoring concept consists of more than just encoding binary into ASCII; it also covers encoding plain text for purposes of archival or transmission. Therefore, it should go back into the article. drefty.mac 17:33, 27 October 2006 (UTC)

Base26 should be kept
Base26 is used for binary encoding. It is not as heavily used as other coding formats as it is still fairly new as an encoding format. Eyreland (talk) 03:18, 12 September 2013 (UTC)


 * I apologize. At first glance it looked kind of spammy.  Feel free to add it again, but use a citation format instead of a direct URL as a source.  And please add a citation for the other two entries.  —EncMstr (talk) 03:47, 12 September 2013 (UTC)

Efficiency
It makes little sense to include the term 'efficiency' in the table without any explanation or mention of what it means. Could someone add this information? - 141.213.15.66 (talk) 18:11, 19 March 2015 (UTC)


 * I agree. I am looking at various systems, attempting to find one that I can follow. I see efficiency in the chart and thought it might be based on the ratio of base 10 to the converted base, but 10/16 is not 0.5 (the value given for hexadecimal). It would be better if whoever added the efficiency column could add an explanation. 67.221.121.30 (talk) 12:24, 28 June 2017 (UTC)Yggdrasil


 * Added explanation of the efficiency. Lord Crc (talk) 12:13, 15 August 2017 (UTC)
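
As a rough illustration of the efficiency figure discussed in this thread, the sketch below assumes that each output character occupies one 8-bit byte and carries log2(N) bits of input for an alphabet of N characters. The function name `efficiency` is hypothetical, and the article's table may define or round the figure differently for block-based encodings.

```python
import math


def efficiency(alphabet_size: int, bits_per_char: int = 8) -> float:
    """Input bits carried per output bit, assuming one character per byte."""
    return math.log2(alphabet_size) / bits_per_char


for base in (10, 16, 64):
    print(base, round(efficiency(base), 3))
```

Under these assumptions hexadecimal comes out at 0.5 because each character carries 4 of the 8 bits it occupies, not because of any ratio between base 10 and base 16.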

Decimal (base10)
I'm surprised this article does not even mention one of the most commonly used encodings for integers, namely decimal notation... — Preceding unsigned comment added by 90.9.199.223 (talk) 12:26, 6 February 2020 (UTC)
 * This article is about binary, not decimal, and about text, not integer. --bdijkstra (talk) 13:04, 6 February 2020 (UTC)
 * This article is about encoding of binary. Decimal, hexadecimal, and every text representation of a binary value in a given base are encodings (half of the entries in the table). It is not about encoding *of* text, it is about encoding *to* text. The decimal encoding is as relevant as base58. It is a way to represent a binary integer with text characters. Yes, we are very used to this representation, so it may seem unnecessary to add it. But the efficiency of the decimal representation can be helpful as a baseline (41.5%).  — Preceding unsigned comment added by 90.9.199.223 (talk) 13:30, 6 February 2020 (UTC)

Programming language implementations
I was looking at the programming language implementations for Base58 on this page while doing research for Articles for deletion/Base58. I came to the conclusion that we are on the wrong path here. I could create, say, Base11 (for Spinal Tap fans) and Base666 (for use by daemons), write up some routines for Python, C, Forth, GW-BASIC, LOLCODE and Whitespace, publish them on GitHub, and add them to this list.

I say we should nuke the column. It isn't useful. --Guy Macon (talk) 18:04, 2 July 2020 (UTC)


 * Indeed. Potentially interesting for student programmers, but in my opinion too trivial to classify as knowledge. --bdijkstra (talk) 18:16, 2 July 2020 (UTC)
 * I'm for removing that column as well, unless you find all existing public implementations for all languages out there...--2A02:8070:6394:7A00:1920:6758:18DA:1DF9 (talk) 18:41, 29 December 2021 (UTC)

Identifying the encoding used
Is there any way to guess the encoding used? Do any of these have any headers? Base64, for example, uses padding with "=" at the end. How do other algorithms fill remaining bytes? -- Thunderbolt (talk) 16:11, 7 December 2020 (UTC)

Is PGP word list a Binary-to-Text Encoding?
Should the PGP word list be added to the list?--𝒞𝒽ℯℯ𝓈ℯ𝒹ℴℊ (talk) 01:53, 3 July 2021 (UTC)

What does the ASCII line in the standards table mean?
This line is rather confusing.

Given that ASCII text cannot represent arbitrary data, why is it here? And how can it be 87.5% efficient, given that it cannot represent half of all possible octets?


 * I added a point of clarification. I think you are thinking about one specific subset of what binary-to-text encodings are used for. While it is true that ASCII is not commonly used to represent arbitrary binary data, binary-encoded ASCII is still the most common data format there is. Your claim that it "cannot represent half of all possible octets" is false: any binary sequence can be represented in ASCII. Although a single octet needs 2 characters, 8 ASCII characters (7 bits each, 56 bits in total) hold exactly 7 octets, so the efficiency is 7/8 = 87.5%. Adam McCormick (talk) 00:43, 18 November 2021 (UTC)
 * ASCII is ASCII, a 7-bit encoding. All the schemes listed in the article are forms of encoding binary data as ASCII. If you allow control characters, you could bit-shift 7 octets of 8-bit binary data into 8 octets of 7-bit ASCII data, but this would only be 7/8 (87.5%) efficient for multiples of 7 bytes and is certainly not commonly used. -- Q Chris (talk) 11:09, 18 November 2021 (UTC)
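
The 7/8 figure discussed above can be illustrated with a minimal bit-packing sketch. The names `pack7` and `unpack7` are hypothetical and this is not taken from any standard; the output uses all 128 code points, control characters included, which is part of why such a scheme is rarely practical.

```python
def pack7(data: bytes) -> bytes:
    """Repack 8-bit octets into 7-bit code points (values 0..127)."""
    out = bytearray()
    acc = 0
    nbits = 0
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1        # keep only the bits not yet emitted
    if nbits:                          # pad the last group with zero bits
        out.append((acc << (7 - nbits)) & 0x7F)
    return bytes(out)


def unpack7(chars: bytes) -> bytes:
    """Inverse of pack7; trailing padding bits are discarded."""
    out = bytearray()
    acc = 0
    nbits = 0
    for c in chars:
        acc = (acc << 7) | (c & 0x7F)
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)
```

With this sketch, 7 input octets become exactly 8 output characters; inputs whose length is not a multiple of 7 need an extra, partly padded character, so the ratio is slightly worse in those cases.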

Missing encodings
Is this article missing any other notable encodings, whether ‘modern’ or historic?

Examples
For example, I’ve never heard of these being used in this context, but they seem like they plausibly could (or do) exist:

• Base10 - e.g. inputting binary data into a 10-key numerical keypad (an anonymous user commented on the general topic of Base10 encodings here under the heading Decimal (base10))

• Base12 - e.g. inputting binary data into a 12-key telephone keypad (0-9, plus '*' and '#') or transmitting it (yes, less efficiently than Base16) by sending DTMF tones through an audio channel (including from speaker to microphone)

What about Telex-like systems (and their modern descendants)?

Already mentioned by others
Base 26 (hexavigesimal) was mentioned above* but appears never to have been re-added. Does anyone use this for encoding binary, or is it mostly used for representing integers (as done by Amazon for ASINs) in a more concise form (the opposite aim compared to the encodings in this article; it has an integer-to-text efficiency of greater than 100% - BCD, an integer-to-binary encoding, is a better analogue, as the encoded version of BCD is less efficient than the original)?

''*The last edit by Eyreland prior to reversion and that above discussion, (21:18, September 11, 2013), never appears to have been restored. (Unfortunately, I don’t have time to research that further at the moment.)''

DavidCary mentioned several others here under the heading A few extras to consider in early 2021.

𝒞𝒽ℯℯ𝓈ℯ𝒹ℴℊ (起司狗) mentioned the PGP word list, which seems like it could qualify for inclusion.

Some of these things are not like the others
Certain encodings and formats, while perhaps having some similar characteristics to those already included in the article, don’t belong here and should be listed/classified elsewhere.

This article’s title seems to really mean “binary syllable-to-text encoding” or “byte-to-text encoding”, where the syllable/byte size is commonly 8 bits (but not always; Ascii85/Base85 uses a 32-bit BE grouping of four 8-bit bytes, for example, which could be seen as encoding a syllable/byte size of 32 bits)

• Most character and terminal encodings

• Some data compression standards

• Geocode systems

A larger category of ‘data-to-text encodings’ might include some of those that don’t fit into the scope of this article.

Jim Grisham (talk) 00:08, 17 June 2022 (UTC)
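
The Ascii85 grouping mentioned above (four 8-bit bytes read as one 32-bit big-endian value) can be sketched as follows. The helper name `a85_group` is hypothetical; it handles a single full group only and ignores the "z" shortcut and partial final groups that real encoders deal with. Python's standard `base64.a85encode` is used purely as a cross-check.

```python
import base64


def a85_group(group: bytes) -> bytes:
    """Encode exactly four bytes as five Ascii85 characters."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(group, "big")   # one 32-bit big-endian integer
    digits = []
    for _ in range(5):                     # five base-85 digits, least significant first
        value, d = divmod(value, 85)
        digits.append(d + 33)              # offset into printable ASCII, starting at '!'
    return bytes(reversed(digits))


print(a85_group(b"Man "))
print(base64.a85encode(b"Man "))
```

Both calls should print the same five characters, which shows that the unit being encoded is the 32-bit group rather than the individual byte.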

YEnc
https://en.m.wikipedia.org/wiki/YEnc should be added to the table 69.17.160.164 (talk) 15:40, 10 July 2024 (UTC)