Template talk:Character encodings

Encoding vs. TES
HZ is a Transfer Encoding Syntax (TES; see UTR17) of GB2312, not a character encoding proper. Nor is it a national standard. If it is kept in this template at all, it should be in the misc section.

Similarly, UTF-7 is also a TES, not a UTF (despite the name). So I was thinking of removing UTF-7 from this template. It's included in the "Table Unicode" template, and I think that is enough.

/keka (talk) 08:40, 21 July 2009 (UTC)
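As a rough illustration of the TES point above, here is a minimal Python sketch. It assumes CPython's built-in `gb2312`, `hz` and `utf-7` codecs, and the sample string is an arbitrary choice, not something taken from the comment. It only shows that HZ carries the same GB 2312 code values inside a 7-bit-safe wrapper, much as UTF-7 does for Unicode.

```python
text = "中文"  # sample string (assumption): two characters found in GB 2312

# Python's "gb2312" codec is the 8-bit EUC-CN form of GB 2312.
euc_cn = text.encode("gb2312")

# "hz" wraps the same GB 2312 code values in ~{ ... ~} escapes, high bit cleared.
hz = text.encode("hz")

# "utf-7" plays a similar role for Unicode, using modified Base64 runs.
utf7 = text.encode("utf-7")


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits."""
    return all(b < 0x80 for b in data)


# The HZ payload is the EUC-CN byte sequence with bit 8 stripped from each byte.
stripped = bytes(b & 0x7F for b in euc_cn)

print("EUC-CN:", euc_cn.hex(), "7-bit safe:", is_seven_bit(euc_cn))
print("HZ:    ", hz, "7-bit safe:", is_seven_bit(hz))
print("UTF-7: ", utf7, "7-bit safe:", is_seven_bit(utf7))
print("HZ contains stripped EUC-CN bytes:", stripped in hz)
```

Both wrappers round-trip to the same text; what differs is the transfer syntax, not the underlying character set.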

Grouping
I've tried to group certain encodings in a "logical" way. For instance, even if the GOST standard is/was a national standard, it's for 4, 5, and 6-bit character encodings. Not something used in modern computers. So it's amongst "misc" items. Likewise, HKSCS is near Big5 and CP950 since they are so closely related. Etc.

keka (talk) 08:59, 25 July 2009 (UTC)

The Big5-HKSCS encoding is not really supported by Windows. Windows 950 should not be considered HKSCS compatible by default. Windows Vista only supports the Unicode characters of Big5-HKSCS. Microsoft HKSCS —Preceding unsigned comment added by 69.110.13.196 (talk) 04:57, 26 July 2009 (UTC)

newline
UTF-8, read that article please. It is not a "single character" (like horizontal tabulation, backspace etc.), it is a piece of encoding troubles related to line separation. Incnis Mrsi (talk) 09:06, 15 March 2010 (UTC)

Missing codepages
I notice that there are a few code pages missing, namely the following:
 * Code page 708 (Arabic ASMO);
 * Code page 851 (Greek III);
 * Code page 853 (Latin III);
 * Code page 868 (IBM Persian);
 * Code page 934 (MS-DOS Korean);
 * Code page 938 (MS-DOS Taiwanese);
 * Code page 999 (Yugoslavian ASCII-7).

I have the Korean edition of MS-DOS 6.2, which uses code page 934. It, and code page 938, are also referenced in MS-DOS 6.22 COUNTRY.TXT file.

MS-DOS code page 999 seems to be the code page version of the Yugoslavian ASCII-7 codepage, commonly used especially in Croatia and Slovenia before the advent of code page 852. One notable user of it is the Slovenia SAOP programming corporation's software.

Code page 708 is referenced in Windows. As for 851, 853, and 868, I've seen specifications of them on Google. - 94.140.73.150 (talk) 16:15, 22 August 2010 (UTC)

1259, 1260, 1262-1269
What are these Windows Codepages? What is CP0028?

Proposed changes
The design of this template is getting more and more complete, but a few things could be done to make it clearer. Here are some suggestions:
 * 1) Make a clear distinction between what are “Character encoding methods”, “Character sets” and “Code pages”.
 * 2) The terminology “Code page” is used mainly by IBM and Microsoft, very few other manufacturers / organizations use it. The so called “Miscellaneous code pages” are not code pages. Perhaps, a better name would be “Miscellaneous character sets”.
 * 3) EUC, ISO/IEC 2022 and HZ are not character sets. They are encoding methods (schemes) which are used to encode character sets, namely JIS, KSX, GB and CNS character sets.
 * 4) The same goes for all UTF, which are encoding schemes to encode the ISO 10646 character set.
 * 5) The left column is already arranged according to several platforms. That could be expanded, and some character sets included in the “Platform specific” section could be moved to the “right” place:
 * 6) Adobe: Adobe Standard, Adobe Latin 1, Adobe Symbols, etc.
 * 7) DEC: DEC Multinational, DEC Turkish, DEC Greek, DEC Cyrillic, DEC Hebrew, DEC/8/ASMO, DEC Technical, DEC Kanji, DEC Korean, DEC Hanzi, DEC Hanyu, etc.
 * 8) Data General: Data General International, Data General Turkish, Data General Arabic, Data General Kana, Data General Symbols, etc.
 * 9) Hewlett-Packard: HP Roman-8, HP Turkish-8, HP East-8, HP Greek-8, HP Cyrillic-8, HP Hebrew-8, HP Arabic-8, HP Thai-8, HP Japan-15, HP Korea-15, HP PRC-15, HP ROC-15, HP Math-8, etc.
 * 10) Latex: T1 (Cork Encoding), T2A, T2B, T2C, T3, T4, T5, etc.
 * 11) ISO: ISO is not a platform in itself, but some platforms (for instance, UNIX) are designed to work following the ISO standards. Also, many character sets, non specific to any platform, are designed following the ISO standards. For the sake of convenience, perhaps we could consider ISO as a “platform”.
 * 12) “Acorn” is not a character set but rather a manufacturer (as are IBM or Apple). Perhaps, a better name would be “RISC OS character set”.
 * 13) Is it worthwhile to have an entry called “National standards”? Of course, some Governments or some Official National Bodies have defined their national standards. But, after that, the manufacturers or organizations have implemented them or some variations of them. And in some cases it was the opposite, some Governments or some Official National Bodies have adopted existing standards as their national standard. But that list, as it is, is a mixed bag and rather incomplete. Here is what I have found out so far:
{| class="wikitable"
! Country !! 7-bit standard !! 8-bit standard !! Multibyte standard !! 16-bit standard !! Notes
|-
| Arab countries || ASMO 449 || ASMO 708 || || ||
|-
| Armenia || AST 34.005:1997 || AST 34.002:1997 || || || Commonly called ArmSCII. AST 34.002:1997 defines two variants: ArmSCII-8 for ISO environment; ArmSCII-8a for DOS and Macintosh environment.
|-
| Bangladesh || || BSD 1520:1995, BSD 1520:2000, BSD 1520:2011 || || || BSD 1520:1995 was not approved; BSD 1520:2011 is the same as the Bengali (Unicode block) but assigned to the upper part of an 8-bit character set; commonly called BSCII.
|-
| Brazil || || NBR-9614:1986, NBR-9614:1991 || || || Commonly called BraSCII.
|-
| Canada || CSA Z243.4 - 1985 alt.1-1, CSA Z243.4 - 1985 alt.1-2 || || || || ISO 646-CA.
|-
| China || GB 1988 - 1980 || || GB 2312-80, GB 18030-2000, GB 18030-2005 || || GB 1988 - 1980 = ISO 646-CN.
|-
| Croatia || || HRN I.B1.013:1988 || || ||
|-
| Cuba || NC 99-10 - 1981 || || || || ISO 646-CU.
|-
| Czechoslovakia || || ČSN 36 91 03 || || || Nearly identical to ISO Latin-2.
|-
| Denmark || DS 2089-1974 || || || || Not an official part of ISO 646 series.
|-
| Estonia || || EVS 8:1993 || || || EVS 8:1993 has defined 3 “tables”: table 3.1 for ISO environment; table 3.2 for EBCDIC; table 3.3 for DOS.
|-
| Finland || SFS 4017 || || || || ISO 646-FI; identical to Swedish Standard SEN 850200 b.
|-
| France || NF Z 62-010 - 1973, NF Z 62-010 - 1982 || || || || ISO 646-FR.
|-
| Georgia || || SSP 18.1:1998 || || || Commonly known as Geostd8; the more popular GeoSCII is not the national standard.
|-
| Federal Republic of Germany || DIN 66003 || || || || ISO 646-DE.
|-
| Greece || ELOT 927 || ELOT 928 || || ||
|-
| Hungary || MSZ 7795-3 || || || || ISO 646-HU.
|-
| India || IS 13194:1991 || IS 13194:1991 || || || IS 13194:1991 defines several character sets: EA-ISCII for 7-bit environment; ISCII for ISO environment; PC-ISCII for DOS.
|-
| International || ISO 646-1973 IRV || || || ISO 10646 ||
|-
| Iran || ISIRI 2900 || ISIRI 3342 || || || ISIRI 2900 is glyph-based; ISIRI 3342 is character-based.
|-
| Ireland || IS 433 - 1996 || || || || Not an official part of ISO 646 series.
|-
| Israel || SI 960 || SI 1311:1988, SI 1311:1998, SI 1311:2002 || || || The International Register number went on changing (IR 138 >> IR 198 >> IR 234) as the Standards Institute of Israel went on updating the character set, but ISO kept the name as ISO 8859-8.
|-
| Italy || UNI 0204 - 1970 || || || || ISO 646-IT.
|-
| Japan || JIS C 6220-1969, JIS C 6220-1976 || || JIS C 6226-1978, JIS C 6226-1983, JIS X 0208:1990, JIS X 0212:1990, JIS X 0213:2000, JIS X 0213:2004 || || JIS C 6220 (Roman version, not Katakana version) = ISO 646-JP.
|-
| Kazakhstan || || ST RK 920:91, ST RK 1048:2002 || || || ST RK 920:91 is for DOS; ST RK 1048:2002 is for Windows.
|-
| North Korea || || || KPS 9566-97 || ||
|-
| South Korea || KS C 5636, KS X 1003 - 1989 || || KS C 5601-1987, KS C 5601-1992 || || KS C 5636 is not an official part of ISO 646 series.
|-
| Latvia || || RST 1040-90, LVS 8-92 || || || RST 1040-90 is commonly known as Code Page 866-Latvian.
|-
| Lithuania || || RST 1093-89, RST 1095-89, LST 1282:1993, LST 1283:1993, LST 1284:1993, LST 1590-1, LST 1590-2, LST 1590-3 || || ||
|-
| Malta || ? [1] || MSA ISO 8859-3? [2] || || || [1] There is a character set commonly referred as ISO 646-MT (not an official part of the ISO 646 series), but I don’t know if it has been defined as a Maltese official standard; [2] The MSA has included all the ISO 8859 series among their standards; however, I haven’t seen any document saying specifically that MSA ISO 8859-3 is the national standard.
|-
| Norway || NS 4551-1, NS 4551-2 || || || || ISO 646-NO.
|-
| Poland || BN-74/3101-01 || PN-T-42118:1993 || || || BN-74/3101-01 is not an official part of ISO 646 series.
|-
| Romania || || SR 14111:1998 || || ||
|-
| Soviet Union || GOST 13052-74 || GOST 19768-74, GOST 19768-87 || || || GOST 13052-74 is commonly known as KOI-7; GOST 19768-74 is commonly known as KOI-8; check if they superseded as Russian standards.
|-
| Sri Lanka || || SLS 1134:1990, SLS 1134:1996, SLS 1134:2004 || || || SLS 1134:1990 was not approved; SLS 1134:2004 is the same as the Sinhala (Unicode block) but assigned to the upper part of an 8-bit character set; commonly called SlaSCII.
|-
| Sweden || SEN 850200 b, SEN 850200 c || || || || ISO 646-SE. SEN 850200 b is identical to Finnish Standard SFS 4017.
|-
| Taiwan || CNS 5205-1996 || || CNS 11643-1992 || || CNS 5205-1996 is not an official part of the ISO 646 series; the more popular Big5 is not the national standard.
|-
| Thailand || || TIS 620-2529, TIS 620-2533 || || ||
|-
| Turkey || || TS-5881:1988 || || ||
|-
| United States || ANSI X3.4 - 1968 || || || || Commonly called ASCII; ISO 646-US.
|-
| United Kingdom || BSI 4730 || || || || ISO 646-GB.
|-
| Vietnam || || TCVN 5712-1:1993, TCVN 5712-2:1993, TCVN 5712-3:1993 || TCVN 6056:1995 || TCVN 6909:2001 || TCVN 5712 is also referred as VSCII; the more popular VISCII is not the national standard; TCVN 6056 is for the Chữ Nôm script.
|-
| Yugoslavia || JUS I.B1.002, JUS I.B1.003, JUS I.B1.004 || JUS I.B1.013 || || || In Croatia, JUS I.B1.013 was superseded as the HRN I.B1.013:1988 standard; check if these standards were not followed in the other countries of former Yugoslavia; JUS I.B1.002 = ISO 646-YU.
|}
 * As can be seen, putting all the national standards in the template can be cumbersome. Perhaps it would be better if, in each article about a character set, we put the clear statement “It is the national standard of (country), called (name or code).”.
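The distinction drawn in points 3) and 4) above, between a character set and the schemes that encode it, can be sketched in a few lines of Python. This is only an illustration under assumptions: it relies on CPython's built-in `euc_jp` and `iso2022_jp` codecs, and the sample character is an arbitrary choice. One JIS X 0208 character yields different byte sequences under EUC and ISO/IEC 2022, and one ISO 10646 code point yields different byte sequences under each UTF.

```python
ch = "あ"  # sample character (assumption): HIRAGANA LETTER A, U+3042, in JIS X 0208

# Two encoding schemes carrying the same JIS X 0208 character.
jis_schemes = {name: ch.encode(name) for name in ("euc_jp", "iso2022_jp")}

# Three encoding schemes carrying the same ISO 10646 code point.
ucs_schemes = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in {**jis_schemes, **ucs_schemes}.items():
    # Every scheme decodes back to the same abstract character.
    assert data.decode(name) == ch
    print(f"{name:<11} {data.hex()}")
```

The character set answers "which characters, with which numbers"; the scheme answers "which bytes on the wire", which is why the two kinds of entry arguably belong in separate groups.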

I would like to hear some feedback before making some changes.

Code Page Guy (talk) 16:39, 4 March 2017 (UTC)

Update Apple 1 link & more
Please update the Apple 1 link to point to Apple_I.

The article the link points to now has been deleted. — Preceding unsigned comment added by 84.82.12.118 (talk) 21:35, 30 November 2019 (UTC)

There's a draft of the Apple III character set at Draft:Apple III character set but it will never survive by itself. Consider merging all old Apple sets into one article, source it well, and write up some of the history about them, otherwise it will all just get deleted and you might as well remove them from the infobox now.

The Amstrad link should probably point to Amstrad CP/M Plus character set.

The Apple Sabine link should be removed and that article should be deleted.

The only reference to Elwro Junior is here: List of ZX Spectrum clones Currently the link points to an article about Polish spelling. I'm actually not sure if the Elwro Junior has its own character set; it may just be the same as the ZX Spectrum's character set.

The Mattel Aquarius character set article will not survive on its own; I recommend merging it into the Aquarius article.

The Minitel character set article has been deleted. Either remove it from the infobox, or put the character set in the Minitel article.

The OricSCII article has also been deleted. Put the character set in Oric or remove it from the infobox.

The Sega SC-3000 character set article should probably be deleted. Games at the time tended to use sprites and tiles and the meaning / appearance of a given code would be determined by whatever was in sprite ROM.

The Teletext character set will probably get deleted soon, as will Videotex character set.

Semi-protected edit request on 9 July 2020
The leading word "IBM" and the trailing word "emulations" should not be in this list. These terms don't make any sense next to the words Apple, Adobe, etc. Following are the lines to change - just remove IBM and emulations from each:

IBM Apple Macintosh emulations IBM Adobe emulations IBM DEC emulations IBM HP emulations 66.210.61.254 (talk) 14:41, 9 July 2020 (UTC)
 * Not done: please provide reliable sources that support the change you want to be made. Eggishorn (talk) (contrib) 17:00, 9 July 2020 (UTC)

I don't know of sources, I'm sorry, for the things to be changed are plain: the term "IBM" doesn't precede Apple - why would it. The term "emulations" doesn't follow Apple, why would it? Are you aware of the character sets used in those machines? They aren't emulations of any IBM anything. The terms are unfortunately free of meaning. I didn't know this would be an unusual request. Sorry to have bothered you. — Preceding unsigned comment added by 66.210.61.254 (talk) 17:05, 9 July 2020 (UTC)


 * The phrase "IBM Apple Macintosh emulations" means emulations of Apple Macintosh, as used by IBM; it does not mean emulations of IBM.
 * The Apple encodings are listed by their actual names under the MacOS code pages ("scripts") heading already. The IBM Apple Macintosh emulations heading is listing the code page numbers to the Apple encodings, e.g. Mac OS Roman is numbered 1275 by IBM (see ). These numbers are only used by IBM or by things associated with IBM (e.g. software running under IBM products, or possibly ICU, which started off as an IBM project): for example, Microsoft assigns the same encoding (Mac OS Roman) the completely different code page number 10000 (see ; I'm not entirely sure why these are not also listed).
 * -- HarJIT (talk) 17:57, 9 July 2020 (UTC)
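To make the numbering point in the reply above concrete, here is a small Python sketch. The table and the `codec_for` helper are hypothetical names invented for illustration; only the two code page numbers (IBM 1275 and Microsoft 10000 for Mac OS Roman) come from the reply, and `mac_roman` is the name of Python's built-in codec for that encoding.

```python
# Hypothetical lookup table: vendor-specific code page numbers for one encoding.
# The numbers are the ones cited in the reply above; the structure is illustrative.
VENDOR_CODE_PAGES = {
    ("IBM", 1275): "mac_roman",
    ("Microsoft", 10000): "mac_roman",
}


def codec_for(vendor: str, number: int) -> str:
    """Return the Python codec name for a vendor's code page number."""
    return VENDOR_CODE_PAGES[(vendor, number)]


sample = b"\x8e"  # a single byte from the upper half of the code page

# Different vendor numbers, same encoding, therefore the same decoded text.
via_ibm = sample.decode(codec_for("IBM", 1275))
via_microsoft = sample.decode(codec_for("Microsoft", 10000))
print(via_ibm, via_microsoft, via_ibm == via_microsoft)
```

The heading in the template lists IBM's numbers for Apple's encodings, so the same encoding can legitimately appear once under its Apple name and once under an IBM number.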

Semi-protected edit request on 19 April 2022
The "Symbol" link in the Platform Specific section links to a general Symbol page. Shouldn't it be linked to Symbol_(typeface) instead? 68.9.24.237 (talk) 08:28, 19 April 2022 (UTC)
 * ✅ ScottishFinnishRadish (talk) 11:11, 19 April 2022 (UTC)