Wikipedia:Reference desk/Archives/Computing/2023 June 15

= June 15 =

Mojibake of surrogate characters
All of these seem to come from surrogate pairs, and I found no information about such mojibake in our article Mojibake, so I'm asking here. They were apparently created by Notepad++.

Some that I worked out are:

Other pairs that I could not work out include: {U+7DB}{U+3E0}, {U+7DA}{U+4BF}, and {U+2B3E}{U+2B33}. I couldn't decode them with the tool at https://www.linestarve.com/tools/mojibake/ - is there any other way? Or maybe someone can work out the relationship between the mojibake and the decoded pairs? ◅ Sebastian 11:52, 15 June 2023 (UTC)


 * Are you trying to find the character encoding of the source? Is the output what is displayed when it is interpreted as UTF-8? Or what is it you're trying to do? There is no 'emojibake' character encoding. NadVolum (talk) 17:16, 15 June 2023 (UTC)
 * My question is independent of the display - it's only about the code; that's why I only put code in the table. I'm trying to understand what happened, similar to this question. My hope is to get a sequence of encodings, transcodings and decodings, as in the answer to that question. (As that example shows, the solution also includes an assumption for the character encoding of the source, but that's not my main goal here.) But it would also be nice if someone who likes riddles could find the original code points of the three pairs I mentioned. ◅ Sebastian 04:02, 16 June 2023 (UTC)
 * Can you give a link to a webpage that uses these combinations? --Lambiam 10:38, 16 June 2023 (UTC)
 * By ‘combinations’, do you mean surrogate pairs? There are many webpages that contain emojis (such as this), but how can one search for those that are coded using surrogate pairs? ◅ Sebastian 19:07, 19 June 2023 (UTC)
 * I mean things like {U+7DA}{U+4A0} and {U+D83D}{U+DEB2}, which I suppose represent sequences of bytes in a file, not sequences of key strikes on a keyboard. I have no idea what makes you think these have anything to do with the emoji 🚲. --Lambiam 22:45, 19 June 2023 (UTC)
 * The first of these ‘{U+7DA}{U+4A0}’, which you took from the mojibake column of the table above, is simply the mojibake encountered. The second pair ‘{U+D83D}{U+DEB2}’, from the column ‘surrogate pair’, is the surrogate pair for the code point given in the previous column. What makes me think that they have something to do with the code points in that column is that there is a well defined relationship between the two, as described at surrogate pair. Because it occurred to me that you might not trust that article, I just added a reference there which describes the concept in more detail than our article does.
 * BTW, for clarity's sake: Since both you and NadVolum latched onto the term ‘emoji’, which is a nice short word, we can agree on that term here for convenience, but it needs to be said that these are actually “characters outside the initial Basic Multilingual Plane”, as our pertinent article correctly calls them. While the samples I listed here happen to all be emoji, the problem is independent of that. The problem never occurs with common emojis such as ‘🙂’, while it should arise with characters such as {U+1F812} (although I haven't actually encountered that yet). ◅ Sebastian 15:23, 20 June 2023 (UTC)
 * This leaves the following questions unanswered. (1) Can you give a link to a webpage that uses the mojibake {U+7DA}{U+4A0}? (2) What makes you think it has anything to do with {U+1F6B2}? --Lambiam 10:25, 21 June 2023 (UTC)
 * (1) No. What makes you think I should be able to? I already replied to that question 2 days ago with a pertinent question that you chose to ignore. (2) Because the person who I got it from told me so (or I didn't have to ask because it was clear from the context in which they used it). ◅ Sebastian 11:09, 21 June 2023 (UTC)
 * You said you couldn't search for surrogate pairs, but question (1) is about mojibake. I apologize if my limited intelligence keeps me from making sense of your question; in any case, it is clear now it will keep me from making any progress in answering it. --Lambiam 09:57, 22 June 2023 (UTC)
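
For readers following the thread, the "well defined relationship" between a code point outside the Basic Multilingual Plane and its surrogate pair can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 arithmetic. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and not taken from any tool mentioned above, and the sketch does not explain how the mojibake pairs such as {U+7DA}{U+4A0} arose.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF have surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F6B2)
    # Prints the pair quoted in the thread for U+1F6B2
    print("U+{:04X} U+{:04X}".format(high, low))
```

Applied to {U+1F6B2}, this gives the pair {U+D83D}{U+DEB2} cited in the discussion. The same result can be cross-checked with Python's built-in codec, since `chr(0x1F6B2).encode("utf-16-be")` yields the bytes D8 3D DE B2.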