Talk:Code page/Archive 1

IBM PC code page

I think it makes sense to keep the different code pages for the IBM PC collected. The entry as it stands is a bit confused in terms of "IBM PC code page", "code page as a more general term" and even "8-bit character sets". Perhaps the IBM PC code pages should have an entry of their own? -- Egil 13:45 Mar 10, 2003 (UTC)

NPOV in Microsoft related text

Someone needs to review the Microsoft related text for NPOV, IMHO. mjb 21:00, 16 Jun 2004 (UTC)

I agree and attempted a starting point for this. Pjacobi 18:32, 9 Jul 2004 (UTC)

Valiant attempt, though 'ISO insisted' was a bit strong (ISO just writes a standard based on what the USA (ANSI) and other national standards bodies agree on). Have rephrased. mfc

The section that discusses the proliferation of the non-standards-compliant Windows code pages seems to me very much written from one "camp". I have broken it out into its own "Criticism" section, but I suggest it be reworked from a "from my own experiences" tone into something a little more neutral. csiefken

To say that the chaos surrounding code pages was part of Microsoft's embrace, extend and extinguish policy shows ignorance of the history of code pages and of the code page standardization process. Microsoft was developing 1252 at least by 1984, possibly 1983. The first versions of Windows were released in 1985! To say that Microsoft was conspiring in 1985 to undermine a standard released in 1992 shows just how biased the author of this section is. It is so flagrant that I am leaving it in, as it says more about the author than about the supposed controversy. Laughingskeptic

This has since been removed. -- Beland (talk) 16:14, 20 July 2020 (UTC)

Code page vs. Codepage

Codepage redirects to Character encoding, but Code page gives the page on vendor specific code pages. Am I the only one puzzled about this? Pjacobi 12:24, 10 Jul 2004 (UTC)

This appears to be fixed now; codepage redirects to code page. -- pne 10:58, 12 Jul 2004 (UTC)

Origin of the term Code Page

The entry should reflect that the term refers to the physical page number in the printed IBM PC Technical Reference Manual (see BIOS) on which the character set was listed ...

I do not know if IBM PC/AT Technical Reference Manual 1502494 is exactly the right part to reference, but I believe so. -Hobart

The article errs in stating that "Whilst the term code page originated from IBM's EBCDIC-based mainframe systems ..." It is a telegraphy term IBM borrowed and predates EBCDIC by several decades. E.g., the 6-bit Teletypesetter (TTS) telegraphy code introduced in 1928 [sic] used two code pages to achieve a larger character set than could otherwise be achieved with only 6 bits to work with.

I worked extensively with TTS in my first career as a newspaper typographer. It would not surprise me to learn that telegraphy codes earlier than TTS also used code pages. It would probably take some library time to find a citeable reference. There is little on the Web about TTS, notwithstanding its incredible but generally unacknowledged importance in the history of computing. Marbux (talk) 15:53, 3 December 2008 (UTC)

Those were not code pages as discussed here, and indeed I don't think they were called that at the time.

A code page (as discussed in this article) is an encoding of a character set, which can represent a text (in a given language or collection of languages for which the code page was designed). For example, code page 1252 contains capitals, lower case, digits, punctuation and various accented characters, making it possible to store things like ‘Wer nicht vorwärts geht, der kommt zurücke.’ in a text file. (Note the capitals, punctuation and accents.)
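A minimal Python sketch of the point about code page 1252, assuming Python's built-in `cp1252` and `cp437` codecs are close enough to the vendor tables for illustration: every character of the quoted sentence becomes exactly one byte, the accented letters land above 0x7F, and the same bytes read under a different code page come out as different characters.

```python
# Encode the sample sentence with Python's cp1252 codec: one byte per character.
sentence = "Wer nicht vorwärts geht, der kommt zurücke."
encoded = sentence.encode("cp1252")
print(len(sentence), len(encoded))  # equal, since cp1252 is a single-byte code page
print(encoded)

# The accented letters sit in the upper half (0x80-0xFF), where code pages disagree.
print(hex("ä".encode("cp1252")[0]), hex("ü".encode("cp1252")[0]))  # 0xe4 0xfc

# Reading the same bytes as code page 437 (the original IBM PC set) garbles the accents.
print(encoded.decode("cp437"))
```

The text only round-trips if the reader decodes with the same code page the writer used.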

The TTS code on the other hand used characters ‘ .. ..’ (shift) and ‘ .....’ (unshift) to turn shifted mode on and off. In unshifted mode you got lower case, digits and some punctuation, whereas in shifted mode you got capitals, special characters and different punctuation. Some codes, like space, control codes (like return, shift and unshift), period and comma were the same in both modes. All these taken together could be used to send English text.
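A toy Python sketch of how an in-band shift scheme like this decodes. The code values are made up for illustration and are NOT the real TTS assignments; the point is only that the meaning of a code depends on the most recent shift or unshift code seen in the stream, rather than on a table chosen for the whole text.

```python
# Toy model of an in-band shift mechanism. All code values below are
# hypothetical and are NOT the real Teletypesetter (TTS) assignments.
SHIFT, UNSHIFT = 0x3B, 0x3F                   # control codes that switch modes (6-bit)
COMMON = {0x04: " ", 0x05: ".", 0x06: ","}    # same meaning in both modes
UNSHIFTED = {0x01: "a", 0x02: "b", 0x03: "1"}
SHIFTED = {0x01: "A", 0x02: "B", 0x03: "!"}


def decode_tts_like(codes: list[int]) -> str:
    """Decode a stream of 6-bit codes, tracking the shift state as we go."""
    shifted = False  # assume transmission starts in unshifted mode
    out = []
    for code in codes:
        if code == SHIFT:
            shifted = True
        elif code == UNSHIFT:
            shifted = False
        elif code in COMMON:
            out.append(COMMON[code])
        else:
            # The same code yields a different character depending on earlier input.
            table = SHIFTED if shifted else UNSHIFTED
            out.append(table[code])
    return "".join(out)


print(decode_tts_like([SHIFT, 0x01, UNSHIFT, 0x02, 0x04, 0x03, SHIFT, 0x03]))  # Ab 1!
```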

So you see that these concepts are quite different and that the TTS's shifting mechanism has nothing to do with these code pages. — Preceding unsigned comment added by 82.139.82.82 (talk) 16:03, 21 November 2015 (UTC)

OEM character set

Other pages (notably FAT) that talk about the "OEM character set" redirect here, but the word "OEM" is missing from this article. Can somebody please fill the gap?

Any good now? Plugwash 19:59, 27 February 2006 (UTC)

And what is an OEM string? --Abdull 12:05, 19 April 2007 (UTC)

See here: http://blogs.msdn.com/b/oldnewthing/archive/2005/08/29/457483.aspx — Preceding unsigned comment added by 82.139.82.82 (talk) 14:56, 21 November 2015 (UTC)

does anyone know

if there was any CJK support available for DOS and, if so, how it worked? Was it limited to 512 characters on screen at once? Did it use graphics mode? Plugwash 01:21, 28 February 2006 (UTC)

Link: http://www.o3one.org/hwdocs/bios_doc/dosref22.html

I believe different systems used different solutions. One possibility is using half-width and full-width characters. Half-width would be "normal" characters, whereas full-width characters would take up two cells and consist of a lead byte and a trailing byte. Alternatively, one could simply stick to graphics mode (the solution most commonly used today).

Sometimes quite innovative solutions were used. For instance, I seem to remember reading about a computer which had display modes for high resolution but low color, and vice versa, capable of using two pages. (This was not uncommon at the time; graphics memory was limited.) The computer allowed the use of different modes for the two pages, using one as an overlay for the other. So you could have nice colour graphics while still having readable kanji. I wish I remembered the model of the thing. Anyone? Shinobu 15:27, 15 April 2007 (UTC)

Probably this one: FM Towns, or a related system. Shinobu 16:16, 22 May 2007 (UTC)

ISO 8859-1

What is the code page for ISO 8859-1? This encoding is mentioned twice in the article, but the code page for it is not given. 99.137.109.95 (talk) 20:15, 19 May 2008 (UTC)

Try Codepage 819. It was made available (along with other ISO-8859 encodings, up to about '8859-10 or so) by Kosta Kostis, for MS-DOS computers. Standard MS-DOS rejects CP 819 and many others, so he provided replacement DOS commands to accept the code pages he made available. Apparently, these code pages were created for IBM minicomputers, such as the AS/400 (?). The nice box-drawing graphics in Codepage 437 are no longer available, and any DOS software that uses them looks pretty bad. Refs: Please read the text at the top of this page first: [1]

IIRC, the first two or three freeware character encoding tools here [2] are what one would want to enable an MS-DOS machine to work with Latin-1 and other earlier Latin-[n] encodings. Years ago, I set up an MS-DOS machine so that the [chcp] command would select any of probably 10 codepages -- very nice, although a given text would look correct only if the given codepage were correct. (You couldn't combine, say, Maltese, Russian, and Icelandic in the same document.) Nikevich (talk) 10:09, 15 April 2009 (UTC)
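A small Python sketch of why DOS software that draws boxes with Codepage 437 looks wrong under CP 819, and of converting such legacy bytes to Unicode once the right code page is known. It relies on Python's `cp437` codec and its `cp819` alias for ISO 8859-1; the byte string is a made-up example.

```python
# Bytes a DOS program might write to draw the top edge of a double-line box.
legacy = bytes([0xC9, 0xCD, 0xCD, 0xBB])

as_cp437 = legacy.decode("cp437")  # IBM PC set: box-drawing characters
as_cp819 = legacy.decode("cp819")  # ISO 8859-1: accented letters and a guillemet instead
print(as_cp437)  # ╔══╗
print(as_cp819)  # ÉÍÍ»

# Once decoded with the correct code page, the text can be stored as UTF-8,
# which no longer depends on knowing which code page was active.
utf8 = as_cp437.encode("utf-8")
print(len(legacy), len(utf8))  # 4 12
```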

Unicode biased?

Not to diss Unicode, but I feel the article is a bit biased towards it.
Phrases like

"code pages have been rendered obsolete by newer and better international standards, such as Unicode."

and

"Many code pages, except Unicode, suffer from several problems."

don't make for a neutral feel. The truth is, Unicode has problems similar to those of other code systems, only more of them.
Pim 2 (talk) 15:12, 22 May 2009 (UTC)

Unicode does not have ‘similar’ problems as the old code pages did, and the few problems that are there are so small in magnitude that they tend to be insignificant in actual use. They don't come close to the crap we used to have to deal with; I remember the pre-Unicode era and I wish I could forget.

Also, Unicode has made code pages obsolete, that is a fact regardless of whether it's better or not. The old code pages no longer have any real utility and are only supported to be able to convert between legacy data and Unicode.

It is not biased to say that Unicode is better, because it is objectively better. It is also not biased to say that evolution happens, that the speed of light is constant regardless of reference frame and that homoeopathy doesn't cure. — Preceding unsigned comment added by 82.139.81.0 (talk) 22:09, 1 December 2012 (UTC)

I have no doubt that Unicode is, in general, an improvement over older multi-character-set systems, especially in its generality. It isn't AT ALL clear why (or whether) it is better for an average (US-English) application or user. Using "objectively" and "better" in the same sentence is silly. Code pages are NOT obsolete, since Microsoft continues to use the term. They will be obsolete when the number of users of OSs/applications using them drops to a lot less than most of the PCs in the world, a long time from now, imho. I'm trying to understand why code pages seem to be an alternative to Unicode under VC++ and how they differ. Sure wish this article helped, but it doesn't. 72.172.1.40 (talk) 23:31, 25 September 2014 (UTC)

> Using "objectively" and "better" in the same sentence is silly.

Not as silly as saying things can't be objectively better. — Preceding unsigned comment added by 82.139.82.82 (talk) 14:58, 21 November 2015 (UTC)

Are we making sense?

QUOTE: Code page is another term for character encoding.

But the character encoding page does not point here or vice-versa - as it would elsewhere in wikipedia for equivalent terms.

Should we not say something like "Code Pages are an approach to character encoding in computer science and software engineering (IBM, Microsoft) and a code page is equivalent to a charmap in unix." ?

The article on character encoding states:

" A code page usually means a byte oriented encoding, but with emphasis to some suite of encodings (covering different scripts), where many characters share same codes in all these code pages (or most). Well known code page suites are "Windows" (based on Windows-1252) and "IBM"/"DOS" (based on code page 437), see Windows code page for details. Most encodings referred to as code pages, but not all of them, are single byte encodings. "
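The quoted point that members of a code page suite share most codes can be checked with a short Python sketch over three Windows code pages, using Python's built-in codecs. Restricting the shared-range check to printable ASCII (0x20-0x7E) is a choice made here so that every byte is decodable in all three; the helper name `char_at` is illustrative.

```python
SUITE = ["cp1250", "cp1251", "cp1252"]  # Central European, Cyrillic, Western European


def char_at(codepage: str, byte: int) -> str:
    """Return the character a single byte stands for in the given code page."""
    return bytes([byte]).decode(codepage)


# Printable ASCII: every member of the suite maps each byte to the same character.
shared = all(
    len({char_at(cp, b) for cp in SUITE}) == 1 for b in range(0x20, 0x7F)
)
print(shared)  # True

# Upper half: byte 0xE3 stands for a different letter in each of the three.
print([char_at(cp, 0xE3) for cp in SUITE])
```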

Yet that quote is itself not even edited into standard English (I will correct that now), so that article cannot yet be considered definitive for purposes of clarification, can it?

But: in en.wikipedia the term "charmap" is mapped to the Windows software utility, which is not a charmap but merely shares part of the name "charmap.exe" (a mapping that looks to me like clear Microsoft bias rather than NPOV!). The result is that my proposed clarification becomes unclear, as "charmap" occurs twice in character encoding, once as "CHARMAP" and once as "charmap", and is itself ambiguous as to whether it names a unix executable file or various unix character mapping files or "character set description files" (per the Solaris docs), for example, a file including the Posix Portable Character Set (see the three terms "charmap", "CHARMAP" and "Character Description File" in Open Group's "Character Set" [3] web page, in which "charmap" is stated to mean "Character Description File").

G. Robert Shiplett 15:23, 25 March 2012 (UTC)

This article is inadequate.

Describing a code page as "...a table of values that describes the character set for encoding a particular language." is inadequate. One problem is that various code pages also contain non-character (control) information. Some also include codes for character (or even word and line) modification, including order, direction, spacing and grouping. These control characters have a profound effect on the representation, as well as the meaning, of the text they are in. Additionally, listing code pages without any discussion of how they compare is not particularly useful, imho (saying CP X is the same as CP Y except for characters x1-xx is also less than illuminating). It's like claiming that a "vehicle" is a thing that gets you to work and then contrasting a car with a pick-up truck (and ignoring bikes, trains, buses and air/space craft, not to mention boats and ships). This article is much too narrowly focused on Euro-American characters, imho. 72.172.1.40 (talk) 20:41, 25 September 2014 (UTC)

Hm, I'd say that ambiguity of the lead section is debatable; anyway, went ahead and touched it up a bit, please check it out. — Dsimic (talk | contribs) 06:25, 29 September 2014 (UTC)

Separate conflicts

I think parts of the article would become clearer if the conflicting code pages were separated. The sections themselves would become cleaner, the history would become clearer and we would be able to reference the new section(s) from the text that discusses the existence of conflicts. — Preceding unsigned comment added by 82.139.82.82 (talk) 14:37, 21 November 2015 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 3 external links on Code page. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

Corrected formatting/usage for https://msdn.microsoft.com/de-de/en-en/library/windows/desktop/dd317756%28v%3Dvs.85%29.aspx
Added archive https://web.archive.org/web/20160527142512/http://permalink.gmane.org/gmane.os.freedos.devel/364 to http://permalink.gmane.org/gmane.os.freedos.devel/364
Added archive https://web.archive.org/web/20090906204346/http://www-01.ibm.com/software/globalization/cp/cp_es.jsp to http://www-01.ibm.com/software/globalization/cp/cp_es.jsp

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

An editor has reviewed this edit and fixed any errors that were found.

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 04:55, 10 August 2017 (UTC)

Relationship to Unicode

This section does/did not really make sense in the version before Jan 6th 2019. There is precious little relationship between Unicode and codepages, except that Unicode tries to retain compatibility with many pre-existing ones. I will shorten and clarify that section. If somebody feels that something essential(!) went missing, please add it to the end. Roeschter (talk) 19:18, 6 February 2019 (UTC)