Talk:Shift JIS

Language...
can someone put this in english for the common man? —Preceding unsigned comment added by 75.108.224.182 (talk) 05:22, 25 February 2008 (UTC)


 * Sure, just take a look at Evil. Jpatokal (talk) 09:57, 25 February 2008 (UTC)

Typo?
"developed by a Japanese company called ASCII" sounds like a typo, but I don't know if it is. The first page of Google results doesn't seem to show anything. JamesBrownJr 22:17, 4 May 2006 (UTC)


 * It's not: the company is for real. Jpatokal 02:03, 5 May 2006 (UTC)


 * Ken Lunde states on page 175 of his authoritative book "CJKV Information Processing", O'Reilly & Associates, 1999, ISBN 1565922247, that shift-JIS was "originally developed by Microsoft Corporation". Later, on page 176, he refers to the ASCII Corporation's version of Japanese TEX as one of four examples of computer platforms or environments that use shift-jis internally.


 * Lunde at least does not seem to be crediting the ASCII Corporation with having invented shift-jis. Morten Johnsen 23:51, 25 June 2006 (UTC)

The article has an underscore in the title which appears nowhere else in the article. Is this correct or a typo? If it's correct, it should be used throughout. User:Fredhoysted 14:57 (UTC)
 * Underscores are not used in normal English. However, MIME names cannot contain spaces; this is a common restriction for identifiers in software, and is also seen in programming languages, for example. So an underscore is substituted for the space in the MIME name. Shinobu 05:23, 4 November 2007 (UTC)

Umlauts
Why has Shift-JIS no code points for umlauts assigned? --84.61.71.163 15:25, 15 May 2006 (UTC)
 * Japanese typers rarely have a need to type the words "Mötley Crüe". You could use ISO-2022-JP-2, which has a mechanism to switch to ISO-8859-1 (includes umlauts), but you might as well just use UTF-8. --150.216.151.171 17:59, 9 July 2006 (UTC)
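
A minimal sketch of the point above, assuming Python's built-in codecs (`shift_jis`, `iso2022_jp_2`, `utf-8`) are an acceptable stand-in for the encodings being discussed; the variable names are illustrative only:

```python
text = "Mötley Crüe"

# Shift JIS (JIS X 0201 plus JIS X 0208) has no code points for ö or ü,
# so the encoder is expected to refuse the string.
try:
    text.encode("shift_jis")
    shift_jis_ok = True
except UnicodeEncodeError:
    shift_jis_ok = False

# ISO-2022-JP-2 can switch to the upper half of ISO-8859-1; UTF-8 covers both.
iso2022 = text.encode("iso2022_jp_2")
utf8 = text.encode("utf-8")

print(shift_jis_ok)
print(iso2022.decode("iso2022_jp_2") == text)
print(utf8.decode("utf-8") == text)
```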

Upper and Lower ASCII
This page uses the terms "upper ASCII" and "lower ASCII". I believe that the writer meant "characters > 127" and "characters <= 127" in a fixed-width 8-bit encoding. But ASCII only defines 128 characters (codes 0 to 127). There is no "upper ASCII".

http://www.xslt.com/html/xsl-list/2002-02/msg00248.html

Browser interpretation of 0x5C
At least Firefox interprets 0x5C in Shift_JIS as '\' and not '¥'. I suspect this is because the '\' character is used to escape characters in Javascript, so having an encoding without a representation of that character would be a security problem. JeffreyYasskin 21:20, 19 December 2006 (UTC)
 * And you would be wrong to suspect that. 0x5c is commonly used as a special character, regardless of what symbol that character value actually represents. So on a Japanese computer you have filesystem paths like "C:¥Program Files¥", the DOS prompt looks like "C:¥>" and a Hello World program might contain the line 'cout << "Hello World!¥n";'. Shinobu 05:29, 4 November 2007 (UTC)
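
To illustrate the reply above: software that treats 0x5C as a separator or escape compares byte values, not glyphs. A small sketch, with a made-up path as the only assumption:

```python
# Hypothetical path bytes. A Japanese font may draw 0x5C as a yen sign,
# but the byte value that programs see is the same either way.
path = b"C:\x5cProgram Files\x5cApp"

# Splitting on the byte value works no matter how 0x5C is displayed.
parts = path.split(b"\x5c")
print(parts)
```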

ASCII Corporation: not a typo
Hi, I wrote a large portion of this article (before I had a sign-in) and that is what I meant when I wrote it. Sadly I could not remember where I read it, but I've done some googling so I'll add a link that backs up what I said. —The preceding unsigned comment was added by Tim Band (talk • contribs) 15:38, 7 May 2007 (UTC).

:1997 - what does that mean?
JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double byte characters) What does :1997 mean? The linked articles don't yield a clue, and both state that the standards were set in 1969 and 1990(?). Were they revised in 1997? If so, why didn't they simply get a new four-digit number? Shinobu 16:16, 31 August 2007 (UTC)
 * 1969 and 1983, yes; AFAIK there are no later revisions. The one you're thinking of in 1990 is JIS-X-0212 Tacitus Prime 11:22, 11 September 2007 (UTC)

So the ":1997"'s in the article are wrong and should go, right? Shinobu 05:32, 4 November 2007 (UTC)
 * Bit late to the discussion, but to avoid confusing future readers: at the time of this writing, JIS X 0208 has four versions (JIS C 6226-1978, JIS C 6226-1983, JIS X 0208-1990, JIS X 0208:1997). JIS X 0201 has three (JIS C 6220-1969, JIS C 6220-1976, JIS X 0201:1997). The colon before the 1997 instead of a hyphen is just a change in convention (and technically, so is the “X” classifier; that was a new category created in 1987 for information processing and such, because while it may have made sense to put that under “Electronic and Electrical Engineering” back when they were first writing these standards, it became a field in its own right). This statement is correct as it stands, and at some point in the next few weeks months years, this should all be fleshed out in their own, appropriate articles. -BRPXQZME (talk) 23:27, 13 June 2009 (UTC)

EUC-JP
The article says "the competing 8-bit format EUC-JP, which does not support halfwidth katakana" - but EUC does indeed have the halfwidth katakana (upper half of JIS-X-0201:1976) in G2 (i.e., as two-byte sequences 0x8E 0xA1 .. 0x8E 0xDF) Tacitus Prime 11:22, 11 September 2007 (UTC)


 * The article means that it doesn't support single-byte encoding of halfwidth katakana. I've added a clarification. Jpatokal 12:53, 11 September 2007 (UTC)
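
A quick check of the distinction drawn above, assuming Python's built-in `shift_jis` and `euc_jp` codecs follow the byte layouts described in this thread:

```python
# U+FF71 is HALFWIDTH KATAKANA LETTER A (JIS X 0201 code 0xB1).
halfwidth_a = "\uff71"

sjis = halfwidth_a.encode("shift_jis")  # single byte in Shift JIS
euc = halfwidth_a.encode("euc_jp")      # 0x8E prefix plus the JIS X 0201 byte

print(sjis.hex(), len(sjis))
print(euc.hex(), len(euc))
```

So EUC-JP does carry the character, but only as a two-byte sequence.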

Recommendation or lobbying?
"it is recommended that Unicode be used instead" recommended by unicode.org, isn't it? Should be added then! —Preceding unsigned comment added by 84.56.91.141 (talk) 01:06, 4 June 2009 (UTC)

Error in the Transformation Formula
The article states some malfunctioning transformation formulas:


 * $$33 \le j_1 \le 94 \Rightarrow s_1 = \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 112\,$$
 * $$95 \le j_1 \le 126 \Rightarrow s_1 = \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 176\,$$
 * $$j_1 \mbox{ is odd } \Rightarrow s_2 = j_2 + 31 + \begin{cases} \left \lfloor \frac{j_2}{95} \right \rfloor & \mbox{if } j_2 \ge 96 \\ 0 & \mbox{otherwise} \end{cases}  \,$$
 * $$j_1 \mbox{ is even } \Rightarrow s_2 = j_2 + 126\,$$

If you apply these formulas to some random examples you will get, e.g.:
 * 朧 (Kuten 59-16) 8E, 2F instead of the correct 9E, 4F
 * 察 (Kuten 27-01) 7E, 20 instead of the correct 8E, 40
 * 鯣 (Kuten 82-40) 99, A6 instead of the correct E9, BE

It appears that you have to increase both j1 and j2 by 32 (0x20) before doing this kind of calculation, which will correct the first two examples and the first byte of the last one, but will give E) instead of BE for the second byte of 鯣. Do you have an idea how to fix this one as well? --Sannaj (talk) 14:55, 11 November 2012 (UTC)


 * Interesting. Here's an alternative formula: http://www.sljfaq.org/afaq/encodings.html#encodings-Shift-JIS Jpatokal (talk) 02:50, 12 November 2012 (UTC)


 * I just realised I misread the article. It stated that the formula needs to be applied to a "double-byte JIS sequence $$j_1 j_2$$", but I used the Kuten code. This explains the increase of 0x20 for both bytes. But it still doesn't explain 鯣. --Sannaj (talk) 20:06, 22 November 2012 (UTC)


 * I think your E9,BE shift JIS code is wrong.
 * A table for Kuten 82-40 matches your symbol.
 * My calculation says shift JIS for 82-40 (j1,j2)=(114,72) [decimal] is (s1,s2)=(E9,C6) [hex].
 * Codepage 932 converts E9,BE to U+9BB4 鮴
 * Codepage 932 converts E9,C6 to U+9BE3 鯣; symbol matches Kuten table.
 * Inverting the formula takes (s1,s2) (E9,BE) [hex] to (j1,j2) (114,64) [decimal] which goes to Kuten 82-32.
 * Kuten 82-32 matches U+9BB4.
 * The given formula is consistent with codepage 932.
 * The formula also makes sense for shifting around the kana.
 * Glrx (talk) 19:18, 3 January 2013 (UTC)
 * O, ok, that would explain my problems with the formula. --Sannaj (talk) 18:31, 5 January 2013 (UTC)
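
For anyone retracing this thread, here is a rough sketch of the formula as quoted at the top of the section, with the Kuten-to-JIS offset of 32 applied first. The function names are made up for illustration, and the odd-row term is written as floor(j2/96), which gives the same result as the conditional form for j2 between 33 and 126:

```python
def jis_to_sjis(j1, j2):
    """Apply the quoted formulas to a double-byte JIS sequence (j1, j2)."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("j1 and j2 must both be in 33..126")
    # First byte: two JIS rows share one Shift JIS lead byte.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    # Second byte: odd rows use the lower half, even rows the upper half.
    if j1 % 2 == 1:
        s2 = j2 + 31 + j2 // 96
    else:
        s2 = j2 + 126
    return s1, s2


def kuten_to_sjis(ku, ten):
    """Kuten values are JIS bytes minus 32 (0x20)."""
    return jis_to_sjis(ku + 32, ten + 32)


for ku, ten in [(59, 16), (27, 1), (82, 40)]:
    s1, s2 = kuten_to_sjis(ku, ten)
    print(f"{ku}-{ten:02d}: {s1:02X} {s2:02X}")
```

This gives 9E 4F, 8E 40 and E9 C6 for the three examples, in line with the values worked out above.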

Formula Error
I've looked at this for a while now, and I'm pretty sure that the formula for $$s_2$$ in the odd $$j_1$$ should have $$\lfloor j_2/96 \rfloor$$ rather than $$\lfloor j_2/95 \rfloor$$. Isn't the purpose of that skip to avoid the non-printing DELETE character (code 127) in the second byte? As it stands it skips code 126 instead, which doesn't make sense. Uranographer (talk) 19:33, 1 February 2013 (UTC)
 * This is the second time the formula has been challenged in the past month. It's completely unsourced and we have no idea where it came from. I propose removing it entirely unless it can be reliably sourced. Without sourcing, it is original research. Regards, Orange Suede Sofa  (talk) 20:42, 1 February 2013 (UTC)
 * I see your point, but it's really just a mathematical way of describing the Shift JIS encoding procedure, so in that sense it really isn't very extensive original research. (Although, I've seen people get papers published with about as much content!)  I think it's probably okay to leave it.  I won't argue, though, if you want to remove it. Uranographer (talk) 21:56, 1 February 2013 (UTC)
 * Looks like I introduced the error when I mis-simplified the previous contorted expression in the section above. Floor(j_2/95) was only used if j_2 ≥ 96. Since floor will then be one, the simpler version is floor(j_2/96) without the conditional (or 1 if j_2 ≥ 96 and 0 otherwise). I thought the expression was just doing the same thing twice. Glrx (talk) 22:54, 1 February 2013 (UTC)
 * Heh, it's easy to do. I've been coding and writing this stuff up all day and I get those simplifications right about half the time--and that's if I'm lucky.96.18.211.87 (talk) 00:12, 2 February 2013 (UTC)
 * (Guess I had logged out--that was me Uranographer (talk) 00:13, 2 February 2013 (UTC))
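
The three variants discussed in this thread can be compared by brute force over the valid range of j2. This is only a sanity check under the assumption that j2 runs from 33 to 126; the list names are illustrative:

```python
J2_RANGE = range(33, 127)

# Original form: floor(j2/95) applied only when j2 >= 96.
conditional = [j2 + 31 + (j2 // 95 if j2 >= 96 else 0) for j2 in J2_RANGE]
# Proposed simplification: floor(j2/96) with no conditional.
simplified = [j2 + 31 + j2 // 96 for j2 in J2_RANGE]
# The mis-simplified form: floor(j2/95) with no conditional.
mis_simplified = [j2 + 31 + j2 // 95 for j2 in J2_RANGE]

print(conditional == simplified)   # same second bytes for every j2
print(0x7F in simplified)          # is DELETE (127) ever produced?
print(0x7F in mis_simplified, 0x7E in mis_simplified)
```

The conditional and floor(j2/96) forms agree everywhere and skip 0x7F, while the unconditional floor(j2/95) form skips 0x7E and lands on 0x7F.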

Removing "UTF-8 is recommended"
On October 30 2011 user BIL added: https://en.wikipedia.org/w/index.php?title=Shift_JIS&diff=458095114&oldid=455096120 "The same thing is valid for UTF-8 which is a world standard, better supported by software, and is predicted to fully replace Shift-JIS and EUC-JP." On December 8 2010 user 131.107.0.81 added: http://en.wikipedia.org/w/index.php?title=Shift_JIS&diff=401316723&oldid=396679673 "..., conflicting with some code points. This is one reason why applications are recommended to use Unicode such as UTF-8 or UTF-16 instead." There was no explanation or citation. I added "By whom?" markers in May 2013. I'll be happy if someone cites some respected authority. Until then, these UTF-8 endorsements don't belong. I removed them. Peter Gulutzan (talk) 02:09, 21 October 2013 (UTC)

Questionable redirect
The JIS X 0213 article links to Shift JIS-2004 but Shift JIS-2004 redirects here. It is *not* the same encoding and no information about Shift JIS-2004 is present on this page. 58.173.133.147 (talk) 10:56, 20 May 2014 (UTC)

Huh?
The lead bytes for the double byte characters are "shifted" around the 64 halfwidth katakana characters in the single-byte range 0xA1 to 0xDF.

Wut? Can someone explain this without the jargon? Maury Markowitz (talk) 16:09, 23 September 2015 (UTC)


 * Here's what I think it means. Look at JIS X 0201, which is a single-byte encoding with halfwidth katakana characters; it does not encode kanji or hiragana. The 0201 code table has unpopulated codes in the upper half; the unpopulated codes are before and after the katakana. The goal of Shift JIS was to be compatible with JIS X 0201 (a JIS X 0201 string is a Shift JIS string), but Shift JIS needed several lead bytes to signal shifts to subpages. Those shifts use the codes before and after JIS X 0201's katakana characters. Look at the lead bytes ("First byte of a double-byte JIS X 0208 character") in the Shift JIS.
 * The sentence is poor and shifted could have two meanings. It might mean the shift bytes (lead bytes) are before and after the katakana, or "shifted around" might mean sprinkled before and after the katakana. Glrx (talk) 16:57, 23 September 2015 (UTC)
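
One way to see the "before and after the katakana" layout is to enumerate the lead bytes produced by the first-byte formula quoted earlier on this page. A sketch covering only the plain JIS X 0208 rows (j1 from 33 to 126); the variable names are made up:

```python
KATAKANA = range(0xA1, 0xE0)  # single-byte halfwidth katakana, 0xA1..0xDF

# Lead bytes from s1 = floor((j1 + 1) / 2) + 112 (or + 176 for j1 >= 95).
lead_bytes = sorted({(j1 + 1) // 2 + (112 if j1 <= 94 else 176)
                     for j1 in range(33, 127)})

below = [b for b in lead_bytes if b < KATAKANA.start]
above = [b for b in lead_bytes if b >= KATAKANA.stop]

print(f"below katakana: {below[0]:02X}..{below[-1]:02X}")
print(f"above katakana: {above[0]:02X}..{above[-1]:02X}")
print(any(b in KATAKANA for b in lead_bytes))
```

The lead bytes fall into two blocks, 0x81 to 0x9F and 0xE0 to 0xEF, and none of them collides with the katakana range.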

Version problem
http://www.dozine.co.jp/duke/ displays this:
 * ‚æ‚¤‚±‚»AŒöŽÝ‚ÌHP‚ÖB
 * ŒöŽÝ‚Í18‹ÖƒQ[ƒ€‚Ìƒ\ƒtƒgƒnƒEƒX‚Å‚·B

http://blasterhead.product.co.jp/ displays this:
 * 当サイトは、18才未満の方は閲覧出来ないページを含んでいます.
 * 18歳未満の方は、これより先のアクセスを固くお断りします.

What problem with JIS could cause that? Ranze (talk) 07:58, 15 August 2016 (UTC)


 * I don't know, but my guess is:
 * the dozine site is serving JIS without specifying the charset, so the JIS is interpreted as UTF-8.
 * the blasterhead site is serving UTF-8 without specifying the charset or serving JIS with specifying the charset.
 * Glrx (talk) 18:04, 21 August 2016 (UTC)
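
The first garbled line can be reproduced in a few lines, which may help narrow the guess. This sketch assumes the page begins with the greeting "ようこそ" and that the browser fell back to Windows-1252 (`cp1252` in Python); both are assumptions, not something confirmed in this thread:

```python
greeting = "ようこそ"  # assumed original text ("welcome")

# Encode as Shift JIS, then misread the bytes as Windows-1252.
raw = greeting.encode("shift_jis")
garbled = raw.decode("cp1252")

print(raw.hex(" "))
print(garbled)
```

The result matches the first eight characters shown for the dozine site, which is consistent with Shift JIS bytes being displayed through a Western single-byte code page.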

ASCII company confusion
I think there should be some sort of clarification that the ASCII company is unrelated to the ASCII character encoding scheme within the article. Possibly something like ASCII, unrelated to the encoding scheme, and then continue on with the article after the comma. 207.131.207.202 (talk) 18:17, 26 June 2023 (UTC)


 * I think that's a good point, particularly since ASCII (the encoding) is mentioned so often. I can't think of a way to cleanly explain it directly in the lead, so I'll add a footnote shortly. Thanks, Orange Suede Sofa  (talk) 20:02, 26 June 2023 (UTC)
 * ✅ I went ahead and added a footnote; my wording is a little too instructive for my taste but that's all I got. Orange Suede Sofa  (talk) 20:30, 26 June 2023 (UTC)