Wikipedia:Reference desk/Archives/Computing/2020 September 30

= September 30 =

Abbreviations on the internet
Recently I've spotted some strange abbreviation-like series of letters in YouTube comments and chats, typed in small font, like KZAZTRKGUZTM or AMAMAMAMAMAMAM (screenshot), gaining several likes. What are they? 212.180.235.46 (talk) 13:47, 30 September 2020 (UTC)
 * The website of the Qazaqstan Radio and Television Corporation is at http://kaztrk.kz; perhaps is a typo for . In the screenshot this comment has no likes or dislikes. In the Kazakh language and several other Turkic languages such as Azeri, ам is a vulgarity.  --Lambiam 15:51, 30 September 2020 (UTC)
 * The things I learn from Borat. 2601:648:8202:96B0:0:0:0:DDAF (talk) 19:40, 30 September 2020 (UTC)

Apostrophe replacing question
At work, I discovered a curious piece of code. It is apparently supposed to remove all apostrophes from a string but keep double apostrophes in place. The way it works is by first replacing "''" with "_xx_", then removing all apostrophes by replacing "'" with "", and finally replacing "_xx_" with "''".

This works, but if the string actually contains "_xx_", it gets replaced with "''".

So I thought of a better algorithm. It would go like this:
 * 1) Split the entire string into substrings using "'" as a delimiter.
 * 2) If the first and/or the last substring is empty, ignore it.
 * 3) Of the remainder, replace every empty substring with "''", except that in a consecutive streak of empty substrings, leave every second one empty to get around triple apostrophes.
 * 4) Join all substrings back together.

Would this work? J I P | Talk 20:59, 30 September 2020 (UTC)
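The four steps above can be sketched in C to see how they behave. This is only one reading of the proposal: it assumes the delimiter is a single apostrophe, that each kept empty substring becomes a double apostrophe, and that the ignored first and last substrings do not count towards a streak. The function name `split_join_apostrophes` and the caller-supplied output buffer are hypothetical choices made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper illustrating the split/replace/join idea.
 * dst must have room for at least strlen(src) + 1 bytes; the result is
 * never longer than the input. */
void split_join_apostrophes(const char *src, char *dst)
{
    const char *field = src;  /* start of the current substring */
    int first = 1;            /* is this the first substring? */
    int streak = 0;           /* consecutive empty substrings seen so far */

    assert(src != NULL && dst != NULL);
    for (;;) {
        const char *end = strchr(field, '\'');   /* step 1: next delimiter */
        int last = (end == NULL);
        size_t len = last ? strlen(field) : (size_t)(end - field);

        if (len > 0) {
            memcpy(dst, field, len);             /* ordinary text: copy as is */
            dst += len;
            streak = 0;
        } else if (!first && !last) {            /* step 2: skip empty ends */
            if (++streak % 2 == 1) {             /* step 3: every second one stays empty */
                *dst++ = '\'';
                *dst++ = '\'';
            }
        }
        if (last)
            break;
        field = end + 1;
        first = 0;
    }
    *dst = '\0';                                 /* step 4: the pieces are already joined */
}
```

Under these assumptions the sketch turns `it's` into `its`, leaves `a''b` alone, and reduces `a'''b` to `a''b`, so the idea appears to work, although it is a roundabout way of counting runs of apostrophes.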
 * That's very elaborate. I don't know the language or the regular expressions in play here, but the construction {n} (see ref) means "match exactly n occurrences". What is the code supposed to do for triple, quadruple, etc. apostrophes? Elizium23 (talk) 21:46, 30 September 2020 (UTC)


 * I agree, the substrings are very elaborate, I'd be inclined to just do  (that's Perl syntax, but any language that can do regular expressions should be able to do something similar). --174.89.48.182 (talk) 21:50, 30 September 2020 (UTC)


 * If you can't use regular expressions, I'd do it more directly: just scan the characters of the string. If you find an apostrophe, see if an apostrophe follows it. If not, remove it.  Bubba73 You talkin' to me? 23:42, 30 September 2020 (UTC)
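A minimal sketch of this direct scan in C, assuming that a pair of apostrophes is consumed together (so the second apostrophe of a pair is not examined again on its own). The names `keep_apostrophe_pairs` and `keep_apostrophe_pairs_demo` are hypothetical, and the caller is assumed to supply an output buffer at least as large as the input.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies src to dst, keeping apostrophes only in pairs.
 * dst must have room for at least strlen(src) + 1 bytes. */
void keep_apostrophe_pairs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\'' && src[1] == '\'') {
            *dst++ = '\'';          /* an apostrophe follows: keep both */
            *dst++ = '\'';
            src += 2;
        } else if (src[0] == '\'') {
            src += 1;               /* lone apostrophe: drop it */
        } else {
            *dst++ = *src++;        /* any other character: copy */
        }
    }
    *dst = '\0';
}

/* Small self-check; returns 1 if the sketch behaves as described. */
int keep_apostrophe_pairs_demo(void)
{
    char out[32];
    keep_apostrophe_pairs("don't say ''hi'''", out);
    return strcmp(out, "dont say ''hi''") == 0;
}
```

Reading `src[1]` is safe here because, when `src[0]` is an apostrophe, the next byte is at worst the terminating NUL.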


 * I guess the writer thought _xx_ was unlikely to occur. If there is a possibility it will occur, then replace it with something rather more unlikely. Incidentally, I do something similar to replace any single paragraph break in a block of text while leaving any double breaks. --Shantavira|feed me 06:27, 1 October 2020 (UTC)
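To make the placeholder idea concrete, here is a rough C sketch of the three-pass replace trick. It is not the code from the question: it assumes a single control byte `'\x01'` as the "unlikely" placeholder (an arbitrary choice for the illustration) and a hypothetical function name `strip_via_placeholder`. Rather than silently corrupting the text, the sketch refuses to run when the placeholder already occurs in the input.

```c
#include <assert.h>
#include <string.h>

#define PLACEHOLDER '\x01'   /* assumption: a byte that should not occur in the text */

/* Hypothetical helper: modifies s in place.
 * Returns 1 on success, or 0 (leaving s untouched) if s already contains
 * the placeholder, in which case the trick would give a wrong result. */
int strip_via_placeholder(char *s)
{
    char *r, *w;

    assert(s != NULL);
    if (strchr(s, PLACEHOLDER) != NULL)
        return 0;

    /* Pass 1: mark every pair of apostrophes, scanning left to right. */
    for (r = s; *r != '\0'; ) {
        if (r[0] == '\'' && r[1] == '\'') {
            r[0] = r[1] = PLACEHOLDER;
            r += 2;
        } else {
            r++;
        }
    }

    /* Pass 2: remove the apostrophes that are left over. */
    for (r = w = s; *r != '\0'; r++) {
        if (*r != '\'')
            *w++ = *r;
    }
    *w = '\0';

    /* Pass 3: turn the placeholders back into apostrophes. */
    for (r = s; *r != '\0'; r++) {
        if (*r == PLACEHOLDER)
            *r = '\'';
    }
    return 1;
}
```

The up-front check is the design point: however unlikely the placeholder is, testing for it costs one scan and removes the guesswork.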
 * The code replaces a sequence of n apostrophes by 2×(n/2) apostrophes, where the division rounds down to a whole number. (I first read it as replacing n apostrophes by n/2 apostrophes; I did not understand correctly.) I cannot readily think of a purpose for such an operation. In the syntax of some languages, the apostrophe character is represented in a string denoted between "single quotes" (i.e. apostrophes) by repeating it, so the string   is denoted as  . (Other languages might denote the same string using an escape symbol as  .) But this cannot explain the operation here, because then single apostrophes cannot occur between the delimiters enclosing the string. If the string operations are Unicode-cognizant, the temporary replacement string   is, I think, considerably less likely to occur accidentally than _xx_, but of course also not foolproof. As to a better algorithm for this mystery operation, that is hard to judge without knowing the programming language and the available string-handling library. My first version, since struck, maintained a Boolean flag apo_odd and implemented the mistaken n/2 reading:

apo_odd = 0;
while ((c = *s++)) {
    /* skip every odd-numbered apostrophe of a run, copy every even-numbered one */
    if (c == '\'' && (apo_odd = !apo_odd))
        continue;
    apo_odd = 0;
    *t++ = c;
}

For the actual operation, at a very low level, a program could copy the characters one by one from a source string to a target string while maintaining an integer apo_cnt, initially set to 0. An apostrophe is not simply copied over like other characters; instead, the counter apo_cnt is incremented. Before a non-apostrophe is copied over, apo_cnt is first tested for being positive. If so, it is decremented if it is odd, and then apo_cnt apostrophes are appended to the target string, while apo_cnt is reset to 0. Written in C and its descendants, the code is not much shorter than this description:

apo_cnt = 0;
do {
    c = *s++;
    if (c == '\'') {
        apo_cnt++;              /* count the run; emit nothing yet */
    } else {
        if (apo_cnt & 1)
            --apo_cnt;          /* round an odd run down to an even length */
        while (apo_cnt > 0) {
            *t++ = '\'';
            apo_cnt--;
        }
        *t++ = c;               /* this also copies the terminating NUL */
    }
} while (c);
 * (I have not tested this, so don't use without testing.) --Lambiam 10:04, 1 October 2020 (UTC); redacted 08:06, 2 October 2020 (UTC)


 * It would help to say what kind of code you found. A solution may vary greatly depending on the language and the context of the string processing. For example, is it low-level C char-by-char manipulation, an advanced C++ or Java string library, or generalized regular-expression processing? Does the code modify the string in place, or does it build a new piece of data based on the original? Depending on the amount of data to process and how often it is processed, is readability and flexibility your priority, or do you want it as fast as possible?
 * An answer to each of these questions (and probably some more, which have not occurred to me yet...) may influence the final answer.
 * Anyway, before sketching any code I would try to state the requirement: the code should keep every contiguous block of apostrophes that has an even length, and remove one character from each block of odd length. For example, a single apostrophe should disappear, and runs of 2, 3, 4, 5, 6, 7 apostrophes should become 2, 2, 4, 4, 6, 6, respectively. --CiaPan (talk) 21:59, 1 October 2020 (UTC)
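Stated that way, the requirement maps directly onto a run-length loop. The following is a minimal in-place sketch in C, one of the options mentioned above; the function name `trim_odd_runs` is hypothetical, and it relies only on the standard `strspn` and `memset` from `<string.h>`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: modifies s in place so that every run of n
 * apostrophes keeps n rounded down to an even number.
 * Returns the new length of the string. */
size_t trim_odd_runs(char *s)
{
    char *r = s;   /* read position */
    char *w = s;   /* write position, never ahead of r */

    assert(s != NULL);
    while (*r != '\0') {
        if (*r == '\'') {
            size_t n = strspn(r, "'");        /* length of this run */
            size_t keep = n & ~(size_t)1;     /* clear the lowest bit: 3 -> 2, 5 -> 4 */
            memset(w, '\'', keep);
            w += keep;
            r += n;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

Because the write position never overtakes the read position, no second buffer is needed; for example, a buffer holding `a'''b` ends up holding `a''b`, and the function returns 4.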
 * Oh right. The code is written in C# using the Microsoft .NET Framework. J I P | Talk 22:16, 1 October 2020 (UTC)