Template talk:Lang/Archive 6

Parameter to selectively disable auto-italics in the Lang-xx templates
We need to be able to selectively disable (e.g. with no) the auto-italicization of non-English content in the templates that auto-italicize (, etc.), so that the style is not applied to proper names (e.g. placenames, titles of songs, etc.).

For example, the present code of is:

It hard-codes the italics.

The brute-force way around this is to go template-by-template and do something like:

A more elegant solution is to:
 * 1) Put this test into, to do italics automatically by default, but exclude it when no (or 0, etc., etc.) is passed into it.
 * 2) Change all the  type templates that  auto-italicize by default, to do:

(and whatever other parameters they need, case by case)
 * 3) Change all the  type templates (the non-Latin-script ones) that should  italicize, to do:

(and whatever other parameters they need, case by case) — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  07:09, 30 October 2017 (UTC)
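Here is a minimal sketch of the default-on italic switch proposed above. It is written in Python rather than Lua purely for illustration.

The set of "off" values is an assumption: the proposal names only no and 0 and leaves the rest as "etc.". The default argument stands in for the per-template choice. It would be True for the Latin-script templates that italicize and False for the non-Latin-script ones.

```python
# Hypothetical sketch of an italic on/off switch; the set of "off" values
# beyond "no" and "0" is an assumption, not taken from any live template.
OFF_VALUES = {"no", "n", "0", "false", "off"}


def use_italics(italic_param=None, default=True):
    """Return True if the text should be italicized."""
    if italic_param is None or italic_param.strip() == "":
        return default  # parameter absent or empty: keep the template's default
    return italic_param.strip().lower() not in OFF_VALUES


def render(text, italic_param=None, default=True):
    """Wrap text in wikitext italic markup ('' ... '') when italics are on."""
    if use_italics(italic_param, default):
        return "''" + text + "''"
    return text
```

Under this sketch, render("Don Quixote", "no") returns the title upright. A non-Latin-script template would pass default=False, so it never italicizes unless asked to.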
 * I was hoping you could just put italics around the template when you use it in an article, but that doesn't work:
 * Di me con quien andas....
 * Don Quixote
 * It looks like a systematic solution within Language with name is necessary. – Jonesey95 (talk) 13:43, 30 October 2017 (UTC)
 * Yeah, the presence of the language name necessitates a template-internal fix. There is a grotesque hack one can do in situ, but we should not have to do this, and it's so brittle and ugly that later editors are likely to break or revert it:  – Don Quixote.  An even-worse kluge:   – Don Quixote.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  00:39, 31 October 2017 (UTC)
 * This template's documentation suggests:
 * Don Quixote
 * Don Quixote
 * —Trappist the monk (talk) 11:30, 31 October 2017 (UTC)

converting to lua
Because it amused me to do it, I have hacked up Module:Lang (I was surprised to see that name still available). It is not complete, but in this first iteration it appears to render correctly for languages supported by MediaWiki (not all of the 900+ languages supported by the  templates; see ), so the module will need a table of the language names not supported by MediaWiki.  The module supports italic and appears to render correctly when that parameter is used.  It also appears to handle rtl languages when rtl is set.  The module doesn't deal well with erroneous input and does not yet support categorization; basic rendering of  and  templates comes first.  In these examples, the live  template is followed by the module  : —Trappist the monk (talk) 14:46, 31 October 2017 (UTC)
 * Don Quixote –
 * – yes
 * Don Quixote –
 * – no
 * Don Quixote –
 * – italic
 * הורביץ, אלוף ("לופי") –
 * – no yes
 * Schweet. I'm not sure what the "for languages supported by MediaWiki" means; we'd want it, surely, to try to do the right thing for any arbitrary value given for ?? in .  We're more apt to need something like  or  than  in most contexts (how often do we really need a wikilink explaining what the Spanish language is)? Ideally,, etc. would also work after the Lua adaptation, since we have specific articles on various dialects of English.  I guess that's a lot of work, but hopefully the  code with 900+ of these already worked up can be dumped and munged in a way that makes it easy to adapt to the new Lua code. If there's a convenient way to extrapolate the language code to WP article correspondences in an array that is included that would probably make maintenance and expansion easier.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  16:20, 31 October 2017 (UTC)
 * for languages supported by MediaWiki refers to the languages supported by the magic word .  For example, ISO 639-1 code   (Arabic) is supported:
 * but ISO 639-2 code  (also Arabic) is not:
 * Of those languages that are supported, there are likely to be differences:
 * Don Quixote –
 * in this case 'Western Frisian' agrees with the ISO 639 custodians; see loc 639-1 and 639-2, and sil 639-3
 * I think that the rule we can apply to 639-2 and -3 language codes is to fall back on 639-1 when there is a 639-1: code  →  ;   →  ; etc.  We can keep a table specifically for fall back codes and another table to hold language names for 639-2 and -3 codes that don't fall back to 639-1 (Hopi, for example)
 * —Trappist the monk (talk) 17:21, 31 October 2017 (UTC)
 * I haven't been following the discussion, so apologies if this is irrelevant, but there exists Module:Language. – Uanfala 17:48, 31 October 2017 (UTC)
 * Yep, am aware of that. I haven't given it a close line by line reading but to me it looks to be more tailored to Wiktionary's needs than to Wikipedia's needs.  I'm not opposed to merging this with that if it makes sense to do so.
 * —Trappist the monk (talk) 17:59, 31 October 2017 (UTC)
 * I support the module-ization of this template, especially if it means that categories like will be easier to deal with. I spent a while creating (hundreds?) of ISO 639 templates and matching categories for obscure languages; the error category should more properly be used to track actual errors. I would be happy to help create a list of language codes and their matching full language names. – Jonesey95 (talk) 20:05, 31 October 2017 (UTC)
 * If there should be an array matching ISO 639-3 codes to language names, then it should ideally be in sync with Module:Language/data/ISO 639-3 as well as – whenever possible – with the comprehensive series of ISO 639:xxx redirects. — Preceding unsigned comment added by Uanfala (talk • contribs) 20:17, 31 October 2017 (UTC)
 * Perhaps better for initial experimentation is Module:Language/data/iana_languages which also has 639-1 codes. That file may be dated since a comment at the top of it reads 2014-04-10 and I haven't wrapped my brain around the documentation in Module:Language/name/data.
 * —Trappist the monk (talk) 21:05, 31 October 2017 (UTC)
 * The documentation for this template seems to suggest that BCP47 (IETF language tags) should be used when choosing the code for the template. That being the case, Module:Language/name/data would seem to be the best choice ... except that it includes a file called Module:Language/data/wp languages which has, as its accompanying 'documentation', this: "Wikimedia wikis uses some non-standard codes and a subset of IANA codes, plus composite codes".  Why?  Why 'spoil' the standard that way?
 * —Trappist the monk (talk) 23:16, 31 October 2017 (UTC)
 * might have an opinion here, as he was the last to work on this module. – Uanfala 23:25, 31 October 2017 (UTC)
 * And there is more ... There are lang-xx templates that don't use BCP47 codes:
 * كَیکاوس
 * Presumably we can trawl through and find what appear to be legitimate language codes that aren't part of 639-anything and create a table for use by the module.
 * —Trappist the monk (talk) 12:56, 1 November 2017 (UTC)
 * One answer to my 'why spoil the standard' question might be because the 'official' name associated with code  is 'Modern Greek (1453-)' so we use Module:Language/data/wp languages to overwrite the 'official' name with 'Greek'.
 * —Trappist the monk (talk) 16:56, 1 November 2017 (UTC)
 * The fallback idea sounds good to me. I have to note that many 639-2 codes do not work, even with the current non-Lua templates (including some of the other Frisian languages/dialects). I think we have a big win if we end up with a system in which none of the lang-family templates will redlink (or break entirely) unless a) we have no article on the language/dialect, or b) the code given is simply invalid.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  02:31, 1 November 2017 (UTC)
 * Module:Language/name/data has flaws. For example, that data would return these language names for these codes:
 * → Frisian
 * → Northern Frisian
 * → Eastern Frisian
 * → West Frisian
 * → Saterfriesisch
 * So, I've created an override table in Module:Lang/data so that we can override the BCP47 language names if need be. The initial values assigned produce these results:
 * —Trappist the monk (talk) 15:56, 2 November 2017 (UTC)
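The lookup order discussed in this thread can be sketched in three steps. First, check an override table like the one in Module:Lang/data. Next, fall back from ISO 639-2/-3 codes to their 639-1 equivalents. Last, look the code up in a name table that also holds codes with no 639-1 equivalent, such as Hopi.

This is a hypothetical Python illustration, not the module's code. The tables hold only a few sample entries. The el → Greek override mirrors the "Modern Greek (1453-)" example above.

```python
# Hypothetical sample tables; the real data modules are far larger.
NAMES = {"ar": "Arabic", "el": "Modern Greek (1453-)", "fy": "Western Frisian", "hop": "Hopi"}
FALLBACK_TO_639_1 = {"ara": "ar", "ell": "el", "gre": "el", "fry": "fy"}
OVERRIDES = {"el": "Greek"}  # local replacement for the 'official' name


def language_name(code):
    """Return a display name for a language code, or None if it is unknown."""
    code = code.lower()
    code = FALLBACK_TO_639_1.get(code, code)  # prefer the 639-1 code when one exists
    if code in OVERRIDES:
        return OVERRIDES[code]
    return NAMES.get(code)
```

Keeping the fallback table separate from the name table means each language name is stored only once, under its shortest code.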

I saw that my name was mentioned above. It's a wide-ranging discussion, and I'm not sure exactly what I'm being asked.

But I guess I can explain something about Wiktionary's treatment of languages and scripts, which is very different. Language codes that are allowed in language-tagging and linking templates are listed in language data modules. Each language code corresponds to a single language name that we call a "canonical name". The canonical name appears in level-2 headers in entries. There are two subtypes of languages: what could be called "full" language codes are allowed in regular linking or tagging templates, and etymology languages (codes for subtypes of full languages) are allowed in etymology templates: for instance,  for Attic Greek, a dialect of Ancient Greek. Some of the codes are Wiktionary-specific: for instance,  for Proto-Indo-European.

We also have a script data module that contains information on scripts, such as Ustring patterns for the Unicode characters included in the script. Each language may have an array of script codes indicating which scripts it is written with, either in real life, in linguistic works, or on Wiktionary (for instance, {"Latn", "Brai", "Shaw", "Dsrt"} for English). This list of scripts is used by findBestScript in wikt:Module:scripts to automatically detect the script of text that is being tagged. Thus, script codes are generally not required in tagging templates.
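Here is a rough Python analogue of the script detection described above. It counts how many characters of the text fall in each candidate script and picks the script with the most.

The code-point ranges are a simplified assumption covering only a few Unicode blocks. wikt:Module:scripts uses its own, more complete Ustring patterns. The name find_best_script is hypothetical and is not that module's findBestScript.

```python
# Simplified, hypothetical code-point ranges for a handful of scripts.
SCRIPT_RANGES = {
    "Latn": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Grek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
    "Cyrl": [(0x0400, 0x04FF)],
    "Hebr": [(0x0590, 0x05FF)],
    "Arab": [(0x0600, 0x06FF)],
}


def count_chars(text, script):
    """Count the characters of text that fall in the script's ranges."""
    ranges = SCRIPT_RANGES.get(script, [])
    return sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges))


def find_best_script(text, candidate_scripts):
    """Return the candidate script with the most matching characters, or None."""
    best, best_count = None, 0
    for script in candidate_scripts:
        n = count_chars(text, script)
        if n > best_count:
            best, best_count = script, n
    return best
```

Limiting the search to the scripts a language is listed as using is what keeps this cheap and what lets tagging templates omit a script code.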

Script codes are used as class names (for instance, word for English). Many script codes are from ISO 15924 (for instance, ); others were created to allow wikt:MediaWiki:Common.css to select different fonts for a variant of the script, either for their looks or their character set. (The script code  has the same character pattern as , but having a distinct script code for Persian allows it to be displayed in Nastaliq-style fonts. We don't use the ISO 15924 code   because it does not involve a different character set.)

We don't allow any modifiers to be appended onto language codes: placing,  , or   into a linking or tagging template results in a module error.

As you can see, Wiktionary is much more restrictive than Wikipedia. Many of the features are probably not applicable, but at least you have an overview. One feature that would be nice is script recognition, at least if Wikipedia starts adding CSS classes for scripts. (Or the module could add the very verbose inline CSS that is currently found in and its subtemplates. But inline CSS is best avoided because, to overrule it, you have to add !important to every rule in your personal stylesheet that contradicts it.) I started Module:Language/scripts and Module:Language/scripts/data based on wikt:Module:scripts and wikt:Module:scripts/data, but didn't go anywhere with it, because it would only be for my own use until Wikipedia has a coordinated approach to script tagging and the associated CSS.

As to Module:Lang, I have no objections to it being merged with Module:Language eventually if possible. It's unfortunate to have two modules that do similar things. I did attempt to make Module:Language generate the content of and considered the idea of doing the same for the   templates, but I don't have the motivation to sort out the crazy IETF tags (crazy from my perspective because I don't have to deal with them on Wiktionary), non-Wiktionary language codes, language names, colons, italicization, and the lack of any CSS classes for scripts. But if the distinct purposes of generating a Wiktionary-compatible tagging and linking template and a Wikipedia-style one  can be coordinated, that would be great. — Eru·tuon 07:24, 4 November 2017 (UTC)
 * Thanks for that; it'll take a bit to digest but my initial reaction is that there is a basic lack of compatibility between Wiktionary and en.wiki in that en.wiki attempts, for the most part, to adhere to IETF/IANA language coding and attempts to minimize custom language coding. I do like the css-classes-for-scripting idea.


 * I think that you were mentioned here because you were the last editor to touch Module:Language/name/data so I guess that the mentioning editor presumed that by doing so, you had become the expert.
 * —Trappist the monk (talk) 10:09, 4 November 2017 (UTC)
 * Another feature I forgot to mention is that Wiktionary uses a data module to determine whether a script is RTL. It's probably a bad idea to set text direction per language: many languages are written in more than one script, and direction is a characteristic of the script, not the language. Because script direction can be determined automatically, editors should not have to deal with it at all. (On Wiktionary, this item in the data module is almost never used, because text direction is set for many RTL scripts in wikt:MediaWiki:Common.css with the CSS property direction: rtl;.) I've added script direction data to Module:Language/scripts/data.
 * Another thing I could mention is that we use language and script objects that have several methods (for basic things like retrieving the code and canonical name, or more complex things like retrieving the scripts used by a language, transliterating, or counting the characters in a string that belong to the script). These methods are shared across all objects of the same type using a metatable. This is convenient, because you can use a single variable for the language or the script and retrieve the code or the name from it when needed, and cleaner, because the code that handles the retrieval of the code and name is removed from the functions that use the code and name. But an object is probably overkill at this point if just the code and name are used. Another possibility would be a table containing the code and first name (for instance, { code = "en", name = "English" }). — Eru·tuon 21:20, 4 November 2017 (UTC)
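Here is a sketch of the object approach, using Python classes as a stand-in for Lua metatables. In both cases, methods defined once on the class are shared by every instance.

The sketch follows the point that direction belongs to the script, not the language, so direction is looked up per script. The SCRIPT_DIRECTION table and both class names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-script direction data.
SCRIPT_DIRECTION = {"Latn": "ltr", "Cyrl": "ltr", "Grek": "ltr", "Hebr": "rtl", "Arab": "rtl"}


@dataclass(frozen=True)
class Script:
    code: str

    @property
    def direction(self):
        # Unknown scripts default to left-to-right (an assumption).
        return SCRIPT_DIRECTION.get(self.code, "ltr")


@dataclass(frozen=True)
class Language:
    code: str
    name: str
    script_codes: tuple = ()

    def scripts(self):
        """Return Script objects for every script this language is written in."""
        return [Script(c) for c in self.script_codes]
```

If only the code and name are ever needed, a plain dict such as {"code": "en", "name": "English"} does the same job with less machinery.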

categorization
I've added categorization code to the module. The live and  templates use  to do their categorization. will add when there isn't a  template that matches the language code. The module doesn't use these templates so it uses a different category when the code isn't in Module:Language/name/data: – that name could certainly be less wordy and more concise. Suggestions?

The live templates do not categorize pages that are not in article space. For the time being, I have disabled that discrimination in the module for the purposes of debugging so you will see red-linked categories produced by the module at the bottom of this page (all hidden categories if 'Show hidden categories' is checked at Special:Preferences). If and  templates ever call Module:Lang, namespace discrimination will be reinstated.

The red-linked categories attached to this page appear because 'West Frisian' (the current category name) does not match the code/name defined by BCP47 + Module:Language/data/wp languages, and  because there is no  template and therefore no matching category. For the Hopi case, the live dumps all Hopi-language instances into. I think that philosophy is misguided. I think that red-linked categories are more likely to get 'fixed' than a blue-linked dumping-ground category.

—Trappist the monk (talk) 09:44, 2 November 2017 (UTC)
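The categorization rules above can be sketched in Python under three stated assumptions:

 * Only article space (namespace 0) is categorized unless a debug flag is set. This mirrors the namespace check that is temporarily disabled.
 * Known names go into the "Articles containing X-language text" pattern, assumed here to be the one the live templates use.
 * Unknown codes go to a placeholder category. Its name is hypothetical because the real name is still being discussed.

```python
ARTICLE_NAMESPACE = 0
# Hypothetical placeholder name; the real error category is not settled here.
UNKNOWN_CODE_CATEGORY = "Category:Lang templates with unknown language codes"


def category_for(language_name, namespace, debug=False):
    """Return a category link for the page, or '' if it should not be categorized."""
    if namespace != ARTICLE_NAMESPACE and not debug:
        return ""  # skip talk pages, templates, etc.
    if language_name is None:
        return "[[" + UNKNOWN_CODE_CATEGORY + "]]"
    return "[[Category:Articles containing " + language_name + "-language text]]"
```

Emitting the category even when it does not exist yet is what produces the red links, which matches the preference for red-linked categories over a dumping ground.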
 * Yeah, I wasn't going to get into those yet. Getting all the ISO stuff to work would be first priority, but it would be nice to support codes introduced by others like Glottolog, at least for languages and dialects with no ISO code.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  17:27, 1 November 2017 (UTC)
 * I'm pretty sure that has existed since 2011, but it looks like the non-existence of the category causes the generic categorization. You can see a couple hundred other such templates with gaps at . I created a bunch of them, but it gets tedious, especially because three other categories are also requested by the documentation for each ISO 639 name xxx template. A bot might be helpful in creating all of these red-linked categories. – Jonesey95 (talk) 00:32, 2 November 2017 (UTC)
 * You're right, I've edited my post.
 * I can now see why this 'simple' task of converting the and  templates to a module has been started before but never been completed.  On the face of it, conversion to a module is simple but then you look under the bonnet ...
 * —Trappist the monk (talk) 09:44, 2 November 2017 (UTC)
 * Keep going! If anyone can do it, you can. Let us know how we can help. – Jonesey95 (talk) 21:45, 2 November 2017 (UTC)
 * has become . I have also created  to track those templates that are using the module during the transition period.  Once all templates that can be have been changed to use the module, this category can go away.
 * —Trappist the monk (talk) 13:06, 6 November 2017 (UTC)

translation and transliteration
The templates have support for translation rendering and some support transliteration rendering. I have attempted to add that support to Module:Lang.


 * Literal translation:
 * Im Westen nichts Neues
 * Im Westen nichts Neues
 * Im Westen nichts Neues


 * Literal translation with generic transliteration:
 * Θεοτόκος
 * Θεοτόκος
 * Θεοτόκος


 * Literal translation with ISO 843 transliteration:
 * doesn't allow editors to specify the transliteration standard nor does the underlying which calls  which does; confused yet?

—Trappist the monk (talk) 14:06, 2 November 2017 (UTC)
 * Well, you were definitely right about this being more complicated than it seemed! Definitely appreciate the effort you're putting into this.  We've needed to Lua-ize this for  long (and I don't have the Lua skillz to do it).  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  17:07, 2 November 2017 (UTC)

I got to wondering about the html/css markup around transliteration renderings when it occurred to me that the module doesn't (because doesn't) include the   attribute in the enclosing :
 * → al-Khwarizmi
 * al-Khwarizmi

For this example, shouldn't the module output something like this:

As I understand it, in css,  and   are the defaults. If they are used here then that suggests that the css  class somehow alters those two properties. Where is  defined? Pinging Editors Dbachmann, the author of, and Ruud Koot, the author of.

—Trappist the monk (talk) 12:53, 14 November 2017 (UTC)
 * Found it, and it appears to be gone:
 * came into
 * moved to
 * moved to
 * So then, does that not mean that the html/css markup around transliteration renderings should be:
 * —Trappist the monk (talk) 13:46, 14 November 2017 (UTC)
 * Changed. Results can be seen in the transliteration example above.
 * —Trappist the monk (talk) 15:52, 16 November 2017 (UTC)
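This sketch assumes the transliteration is wrapped in an <i> element with a tool tip and a lang attribute carrying the -Latn script subtag, as in the al-Khwarizmi example. Under that assumption, the markup could be built as below.

The title wording is a placeholder, not the module's actual text. Attribute values are HTML-escaped; the wikitext content is passed through unchanged.

```python
from html import escape


def transliteration_markup(text, lang_code, language_name):
    """Wrap a romanization in <i lang="xx-Latn" title="..."> (hypothetical markup)."""
    tag = lang_code + "-Latn"                   # script subtag marks Latin-script text
    title = language_name + " transliteration"  # placeholder tool-tip wording
    return '<i lang="{}" title="{}">{}</i>'.format(
        escape(tag, quote=True), escape(title, quote=True), text
    )
```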

links=no
If I have a template that renders like this:

If I set no, shouldn't that unlink the primary language (Hebrew) and the transliteration and literal translation static texts?

—Trappist the monk (talk) 00:03, 5 November 2017 (UTC)
 * I would certainly think so. Another issue I was just thinking of again today (and grinding my teeth) is that we need a way to suppress these things entirely e.g. with a no and lang; we don't need the language name, the "translit.", or the "lit." labels after the first occurrence in the same block of material, or sometimes we need the language one only, e.g. when comparing cognates. What we're doing now is using the template once, then abandoning it for manual markup with a  in it; or reusing the  and driving readers nuts by repeating the same crap over and over at them as if they have dain bramage. ;-/   — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  14:18, 5 November 2017 (UTC)
 * For the time being, I'm going to limit 'new features' to the italic switch and perhaps unlinking the translation and transliteration static text so that I can think about making the templates function correctly given a variety of inputs. That I think is mostly done so I'm about to take the module live on a handful of  templates to see what happens – to see if anyone outside of this conversation notices.  You should probably start a new wish-list topic for the label thing.
 * Done, below.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  14:28, 6 November 2017 (UTC)
 * —Trappist the monk (talk) 21:04, 5 November 2017 (UTC)
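Here is a hypothetical Python sketch of how links=no might interact with the language name and the "translit." and "lit." labels. When linking is off, every link collapses to its plain label.

The link targets are assumptions for illustration: the "X language" article, Romanization and Literal translation.

```python
def render_lang_xx(name, text, translit=None, lit=None, link=True, article=None):
    """Build 'Name: text, translit. ..., lit. ...' (hypothetical format)."""

    def maybe_link(target, label):
        # With link=False (links=no) only the plain label is emitted.
        return "[[{}|{}]]".format(target, label) if link else label

    # Hypothetical default article title for the language name.
    parts = [maybe_link(article or name + " language", name) + ": " + text]
    if translit:
        parts.append(maybe_link("Romanization", "translit.") + " " + translit)
    if lit:
        parts.append(maybe_link("Literal translation", "lit.") + " '" + lit + "'")
    return ", ".join(parts)
```

Suppressing a label entirely, as suggested above, would be one more flag per label on top of this.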

sandbox testing
lists several templates that have sandboxen. Of those, where the template also has a /testcases page, I have edited the sandbox to use Module:Lang. So far, these:
 * Template:Lang-ar/testcases
 * Template:Lang-arc/testcases
 * Template:Lang-el/testcases
 * Template:Lang-en/testcases
 * Template:Lang-es/testcases
 * Template:Lang-hbs/testcases
 * Template:Lang-he/testcases

Doing this found a handful of coding errors that have been fixed. The interesting case in these templates is Serbo-Croatian. This language uses both Latin characters and Cyrillic characters (not at the same time, I think) so the issue of italics arises. Rendering is controllable with no but it might be better to create another script parameter (script is currently used to override code when rendering the transliteration tool tip – though I don't know how useful that actually is). In this scheme, if lang-script is set to a valid IANA script, then we would write and if not   would override whatever italic is to no-italic.
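One reading of the script-parameter idea for Serbo-Croatian has three rules:

 * If the tag has a script subtag, italicize Latin-script text.
 * Never italicize text in other scripts, whatever italic is set to.
 * If no script is given, fall back to the template's default.

This is a hypothetical sketch. It relies only on the fact that IETF script subtags are exactly four letters.

```python
def italicize_for_tag(tag, template_default):
    """Decide italics from the script subtag of a tag such as 'sh-Cyrl'."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 4 and subtag.isalpha():  # script subtags are four letters
            return subtag.lower() == "latn"
    return template_default  # no script subtag: keep the template's own choice
```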

The previous sandbox version of had some module code that would automatically transliterate the input text to the other script. That apparently didn't ever become live because there are/were problems transliterating Cyrillic to Latin in the presence (or lack – I'm not quite sure) of certain Unicode characters. I don't think that Module:Lang wants to go there.

The other one that I have found, though I've done nothing with it yet, is. That template introduces l, an alias of link; i, to control italic rendering; and abbr, to replace the language name with an unlinked abbreviation of the name. I am sure that we really don't need l because in the text editor  looks too much like   and because to someone unfamiliar with the internals of these templates, no is meaningless; this latter reason applies to i as well. Is there a standardized list of language abbreviations? If yes, then perhaps we should support abbr; if no, then we should not support abbr. Without a standard list, editors can (and will) write whatever suits them but what they concoct may not be understandable by readers and other editors.

—Trappist the monk (talk) 12:55, 3 November 2017 (UTC)
 * I suppose one could poke through the hundreds of templates to look for parameters, but another way to do it would be to convert the templates one by one to the new module, and have module code that detects unsupported parameters. Like the proposed script, such parameters could be evaluated for their utility and potentially incorporated into the module. Parameters that are determined to be unneeded or non-standard could be removed or converted to standard parameters. – Jonesey95 (talk) 14:53, 3 November 2017 (UTC)
 * Isn't [poking] through the hundreds of templates to look for parameters more-or-less the same as [converting] the templates one by one because to do the latter you are in effect doing the former? These templates are basically similar enough that we will see the oddball parameters straight away; no need for the module to detect anything.  Compare  to  as an example or  to.
 * —Trappist the monk (talk) 15:39, 3 November 2017 (UTC)
 * Modifying the templates will tell us whether or not the unusual parameters are actually used, not just whether they exist in the template. Unused parameters can be discarded. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)
 * Editing to use Module:Lang showed how it is necessary for the module to support IETF language tags so I've modified the module accordingly.  When processing, because that template receives its language code directly from the template in wikitext, editors will be creative in how they set that parameter.  The module now supports the most commonly used (I think) IETF tags:
 * primary language code-script-region
 * where
 * primary language code is the two- or three-character ISO 639 language code; lowercase (ll)
 * script is the four-character IANA script code; title case (Ssss)
 * region is the two-character IANA region code; uppercase (RR)
 * in these forms
 * The module emits an error message when IETF tags don't match these forms or do look right but have invalid content. These tests should probably be added to the  so that we can, if appropriate, create new templates that might make use of it (perhaps  and ).
 * —Trappist the monk (talk) 15:55, 3 November 2017 (UTC)
 * I don't know how the ISO 639 name xx templates fit into all of this, but this list of redirects to Template:ISO 639 name ru might provide some useful examples of scripts that are in use. Some of the redirects appear to be for invalid scripts. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)
 * This is why we want to make a module. The article Film speed transcludes   which transcludes   which redirects to   which returns 'Russian' so that the article is properly categorized in .  With the module, Film speed transcludes   which invokes Module:Lang which renders and categorizes in one go.
 * I imagine that the others serve similar purposes.  is wrong-case language code; should be   because   is the ISO 3166 country code for Russian Federation.  is a misspelling of the IANA script code  .  I have no idea where ru-1708 came from.  Its only use is in ; the redirect  was created at the same minute, both by Editor OwenBlacker who can perhaps explain.
 * I think that the module handles all of these correctly:
 * → ГОСТ
 * —Trappist the monk (talk) 22:45, 3 November 2017 (UTC)
 * That is an excellent explanation. I look forward to getting rid of the current morass of hundreds of templates, redirects, and other madness. Keep up the good work. – Jonesey95 (talk) 23:01, 3 November 2017 (UTC)
 * Hey there, saw your ping.  refers to the 1708 "civil script" reform of the Russian alphabet under Peter the Great. Text written in that specific form of Russian should be tagged   to distinguish it from modern Russian. It's a valid IETF language tag, but using a variant subtag, so not the more common types you're covering here. German has the same kind of tags with   and  ; French has , Portuguese has   and  ; Scottish Gaelic has   and   and so on. While there will always be variant subtags that won't get recognised by something all-encompassing (though you could just truncate off the last section, especially if it matches the regex  ), merging templates together like this is an awesome project. Anything that makes it easier for editors to add language tags to content gets my support :) —  OwenBlacker (talk) 23:48, 3 November 2017 (UTC)
 * Are you sure? There does not appear to be a   variant listed.  There is this, extracted from the current IANA language-subtag-registry file:

%%
Type: variant
Subtag: petr1708
Description: Petrine orthography
Added: 2010-10-10
Prefix: ru
Comments: Russian orthography from the Petrine orthographic reforms of 1708 to the 1917 orthographic reform
 * Same thing?   and   yes, but the others that you mentioned, no.  The data files that the new Module:Lang depends on aren't necessarily current so at the moment I'm working on code that will extract language, script, and region information from the language-subtag-registry file.  Currently there is no 'variant' data file but that could be extracted as well.
 * —Trappist the monk (talk) 00:44, 4 November 2017 (UTC)
 * I have extended the iana data extraction tool so that it also extracts variant data. The result is Module:Language/data/iana_variants.  With that data module, and a bit of new code, Module:Lang can support:
 * but rejects improperly formed tags and emits an error message:
 * The variant data records in the iana language-subtag-registry file include a Prefix item that specifies the language code used with the variant. For variant   the Prefix is   so using that variant with another language code is rejected:
 * These changes also apply to the template support in Module:Lang.
 * —Trappist the monk (talk) 20:54, 5 November 2017 (UTC)
 * BCP47 says that IETF language tags are case insensitive so I have relaxed the checking to allow any mixture of case. The code does, however, prettify its output (not that anyone will see it):
 * I have also added support for three-digit region codes:
 * —Trappist the monk (talk) 13:23, 6 November 2017 (UTC)
 * Fantastic work. Should we also be warning against or disallowing language tags with suppressed script codes, e.g. ?
 * – Quoth (talk) 11:51, 6 November 2017 (UTC)
 * I have not thought about that. Can you make a separate wish-list topic to hold this and other ideas so that they don't get lost?
 * —Trappist the monk (talk) 13:23, 6 November 2017 (UTC)
 * I set up a section for that, and put both my and Quoth's items in it.  — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ<  14:28, 6 November 2017 (UTC)
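The tag checks described in this thread (case-insensitive matching, prettified output, two-letter or three-digit regions, and a variant accepted only with the language named in its registry Prefix) can be sketched in Python. This is a hypothetical illustration, not Module:Lang's Lua code: `VARIANT_PREFIXES` is a hand-made stand-in for Module:Language/data/iana_variants, and the pattern deliberately ignores extensions, private-use subtags, and multiple variants.

```python
import re

# Hypothetical stand-in for Module:Language/data/iana_variants:
# variant subtag -> language subtags allowed by the registry's Prefix field.
VARIANT_PREFIXES = {
    "petr1708": {"ru"},
    "1901": {"de"},
    "1996": {"de"},
}

# language[-script][-region][-variant]; BCP 47 tags are case-insensitive.
TAG_RE = re.compile(
    r"^([a-z]{2,3})"                         # language
    r"(?:-([a-z]{4}))?"                      # script, e.g. Cyrl
    r"(?:-([a-z]{2}|\d{3}))?"                # region: two letters or three digits
    r"(?:-([a-z\d]{5,8}|\d[a-z\d]{3}))?$",   # a single variant
    re.IGNORECASE,
)


def check_tag(tag):
    """Return the prettified tag; raise ValueError if it is malformed or invalid."""
    m = TAG_RE.match(tag)
    if not m:
        raise ValueError(f"malformed language tag: {tag}")
    lang, script, region, variant = m.groups()
    lang = lang.lower()
    parts = [lang]
    if script:
        parts.append(script.title())   # e.g. Cyrl, Latn
    if region:
        parts.append(region.upper())   # e.g. RS, 419
    if variant:
        variant = variant.lower()
        prefixes = VARIANT_PREFIXES.get(variant)
        if prefixes is None:
            raise ValueError(f"unknown variant: {variant}")
        if lang not in prefixes:
            raise ValueError(f"variant {variant} is not valid with {lang}")
        parts.append(variant)
    return "-".join(parts)
```

For example, `check_tag("RU-PETR1708")` returns `ru-petr1708`, while `check_tag("fr-petr1708")` is rejected because the registry gives `ru` as the only Prefix for that variant.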

iana data
Module:Lang uses Module:Language/data/iana languages, Module:Language/data/iana scripts, and Module:Language/data/iana regions, which are, I believe, derived from the 2014-04-10 IANA language-subtag-registry file. There is a newer version that is current as of 2017-08-15. I believe that we should update our data files to be in line with the current registry file. To that end I have cobbled together a data extraction tool that creates the tables held in the data files from the IANA source. You can see the result.

Like the current version of the data modules, the data created by the extraction tool does not include deprecated codes, codes that have preferred alternatives, or codes marked as private use. I do not believe that there is a need for these particular codes but I could be wrong. I'm going to update the data files. If anyone knows of a reason to include the codes that the tool skips, let us know.

—Trappist the monk (talk) 16:16, 4 November 2017 (UTC)
 * Along these lines I've hacked another data extraction tool that will generate a table for Module:Language/data/ISO 639-3. I have used this tool to update that module and the other tool to update the IANA data modules.


 * But what about Module:Language/data/wp languages? Anyone know where the data in that module came from?  Is there an 'official source'?
 * —Trappist the monk (talk) 20:22, 5 November 2017 (UTC)

problems with the data set
List of native plants of Flora Palaestina (E-O) times out before it can be fully rendered. I guess I'm not all that surprised because the data set (all of those modules mentioned in §iana data) is recompiled every time a or  template is called (in this case the template is ). The Lua processing time limit is 10 seconds. As an experiment, I forced the module to use only one of the data modules Module:Language/data/iana languages and 'included' it in Module:Lang with  instead of with. The page rendered properly in about 2 seconds. The differences are significant. allows the included modules to hold executable code but must be reloaded with every  (every 'template' in the wikisource). The modules 'included' with  must not hold executable code but are loaded only once per page.

The obvious solution is to create some sort of static version of the table of tables created by. These tables don't need to be recompiled for every use because they will only change when the standards from which they were created change.

—Trappist the monk (talk) 17:54, 17 November 2017 (UTC)
 * You should be able to do mw.loadData ('Module:Language/name/data'), and the data will not be recompiled each time one of these templates is transcluded. That is the way we load data modules on Wiktionary. — Eru·tuon 20:50, 17 November 2017 (UTC)
 * That works. Thanks.  Failure on my part to grasp this in the documentation: "The value returned from the loaded module must be a table ... [of] booleans, numbers, strings, and other tables"  For a long time I somehow misunderstood that (perhaps not necessarily from the documentation; it could have been from other reading or conversation) because modules always return tables (even if they are tables of functions – something that is used quite a bit in Module:Citation/CS1).  Clearly it means that it doesn't matter how the table is built, just that when the module returns, it can only return a table containing a limited subset of data types.
 * —Trappist the monk (talk) 21:08, 17 November 2017 (UTC)
 * Exactly. The rationale is that functions can "trap" values from one module invocation that could then be transferred to another, or can otherwise change their behavior each time they are called. (For instance, the iterator function returned by ipairs(array) giving a new index and value from the array each time it's called.) So functions would in many cases make unexpected things happen if they were saved in memory and accessed by multiple invocations. Other types (number, string, boolean, nil) don't behave in this way, so they can safely be saved in a table by mw.loadData, accessed through the metatable of a dummy table, and shared between modules. In any case, you can always try loading a module with mw.loadData, and it'll tell you if you're not allowed to. — Eru·tuon 22:14, 17 November 2017 (UTC)
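The restriction Eru·tuon describes can be mimicked in Python to show why it matters. This is an analogy only, not Scribunto's implementation: `load_data`, `_freeze`, and the per-page cache are hypothetical names. Data is built once, shared as a read-only copy, and functions are refused because they could carry state from one invocation to the next.

```python
from types import MappingProxyType

# Scalar types allowed in shared data (Lua: booleans, numbers, strings).
SCALARS = (bool, int, float, str)

_page_cache = {}  # hypothetical: one cache shared by every call on a page


def _freeze(value, path="data"):
    """Reject functions and other non-data values; make containers read-only."""
    if isinstance(value, dict):
        return MappingProxyType({k: _freeze(v, f"{path}[{k!r}]") for k, v in value.items()})
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v, f"{path}[{i}]") for i, v in enumerate(value))
    if isinstance(value, SCALARS):
        return value
    raise TypeError(f"{path}: {type(value).__name__} values cannot be shared")


def load_data(name, build):
    """Build the table for `name` once; later calls get the same frozen copy."""
    if name not in _page_cache:
        _page_cache[name] = _freeze(build())
    return _page_cache[name]
```

The second `load_data` call for the same name returns the cached copy without calling `build` again, which is the saving that let the Flora Palaestina page render.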

multiple text scripts in a single template
There are a couple of issues here:

Abaza apparently has both Cyrillic and Latin scripts so the italicized part could be the correct  or it could simply be a transliteration of the. I don't know how to tell the difference. My gut would say that switching alphabets 'midstream' is inappropriate. The same applies to transliterations;  should not hold text in two alphabets.

Module:Lang detects italic markup in  (it also incorrectly finds bold markup – I'll fix that) because the correct way to control italicization of   is with italic.

All of this suggests that the correct way of writing this would be:

—Trappist the monk (talk) 11:07, 7 November 2017 (UTC)
 * Some languages use three scripts (at least) – kk.wp is available in Latin, Cyrillic and Farsi script, for example. It would be convenient if all could be accommodated within a single template, but the sort of workaround you illustrate above could work too. Justlettersandnumbers (talk) 16:47, 7 November 2017 (UTC)
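One way to detect italic markup in the text without tripping on bold markup (the false positive mentioned above) is to look at the length of each run of apostrophes: two means italic, three means bold, five means both. A minimal Python sketch, assuming plain wikitext input and a simplified reading of MediaWiki's apostrophe rules; it is not the module's own check:

```python
import re


def has_italic_markup(text):
    """Return True if text contains wikitext italic markup ('' or ''''').

    Runs of exactly three apostrophes are bold and are not reported. This
    simplified rule ignores how the parser treats runs of four or 6+.
    """
    return any(len(run) in (2, 5) for run in re.findall(r"'{2,}", text))
```

Single apostrophes, as in l'eau, never match, so ordinary elisions are not flagged.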

As a solution to this languages-with-multiple-scripts problem, I have renamed the existing  parameter script to transl-script and created a new script that applies to the text and to the language code.

In the example above, both alphabets are contained in a single template. That is still wrong and this change does nothing to permit that. But, it does start us on the way to supporting multiple alphabets in a single template as I have suggested at

Above, because Cyrl, the text is not italicized. When italic is not set and script is set, the module will apply italic markup only when the specified script is  (case ignored). When italic is set, it controls:

The module emits an error message if the value assigned to script is not recognized:

The module does not now, but will, compare the IETF script subtag provided to or received from a  to script. If they are not the same, the module will emit a mismatch error message.

Another reason to do this? So we don't have to fork a bunch of templates to properly support script subtags. —Trappist the monk (talk) 13:55, 9 November 2017 (UTC)
 * Revision; script is not needed with . Because the template gets the language code directly from , editors can simply add the appropriate IETF script subtag:
 * →  or
 * Now emits an error message when the script subtag in code does not match the value assigned to script:
 * This error message should be rare because it should not be necessary to have templates that specifically set code to a value that includes an IETF script subtag.
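The comparison described above (a script subtag in code checked against the script parameter, case ignored) might look like this in Python. It is a sketch, not Module:Lang's code; the function names are hypothetical, and it only inspects the subtag immediately after the language subtag.

```python
def script_subtag(code):
    """Return the four-letter script subtag of a tag like 'sr-Cyrl', or None."""
    parts = code.split("-")
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        return parts[1].title()
    return None


def resolve_script(code, script=None):
    """Combine the tag's script subtag with the script parameter; raise on mismatch."""
    tag_script = script_subtag(code)
    if script is None:
        return tag_script
    script = script.title()  # case is ignored
    if tag_script is not None and tag_script != script:
        raise ValueError(f"script mismatch: {code} vs script={script}")
    return script
```

A code without a script subtag, such as `sr`, simply takes whatever the script parameter supplies; only a disagreement between the two raises the error.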


 * I suppose, for completeness, the templates should also support region and variant (also not required in ).
 * —Trappist the monk (talk) 14:40, 9 November 2017 (UTC)
 * I wonder if transl-script should be trans-script instead, to match the trans-title parameter style used in the popular Citation Style 1 templates. – Jonesey95 (talk) 15:27, 9 November 2017 (UTC)
 * Because too close to transcript? Because translit-script just felt too long?  Because  is the subsidiary template used by the current  templates that support transliteration?  Of course, none of these are good reasons.
 * For the most part, there are four different groups, if you will, of parameters in templates:
 * main group has:
 * fixed by the template – language code; module parameter code
 * – text; module parameter text
 * script – language script (only templates rendered by the module); module parameter script
 * transliteration group:
 * translit or  – transliteration of the text in  ; module parameter translit
 * script – not part of but introduced in ; module parameter transl-script
 * std – transliteration standard (only templates rendered by the module); module parameter std
 * translation group:
 * lit or  – literal translation; module parameter lit
 * control group:
 * rtl – fixed by the template; module parameter rtl
 * italic – italic display of  (only templates rendered by the module); module parameter italic
 * Can't do much about existing template parameters here and now (lit? who thought that was a good parameter name?)
 * Still, your point is taken, I'll change transl-script to translit-script, std to translit-std, and the module parameter lit to translation.
 * —Trappist the monk (talk) 16:12, 9 November 2017 (UTC)
 * That all looks better to me. If we have both translation and transliteration, we should not have any parameters that are abbreviated "trans" or "transl". That's just begging for confusion. – Jonesey95 (talk) 20:27, 9 November 2017 (UTC)
 * Would want lit to continue working; it's widely used, since it's short and mnemonic for what it outputs.  — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ<  17:34, 10 November 2017 (UTC)
 * The problem with lit is that in the mind and in the mouth it sounds too much like translit whereas translation doesn't. A possible, and perhaps better, alias for lit instead of translation is literal.  For the time being, lit isn't going away.  And if you are concerned that typing literal or translation or even lit is too onerous, don't use any of them; positional parameters aren't going away either:
 * —Trappist the monk (talk) 20:40, 10 November 2017 (UTC)
 * Following up on my musing that for completeness, the templates should also support region and variant, implemented:
 * —Trappist the monk (talk) 13:53, 10 November 2017 (UTC)

live testing
I have implemented the module in, , and.

—Trappist the monk (talk) 14:42, 6 November 2017 (UTC)
 * +,, and
 * —Trappist the monk (talk) 13:21, 7 November 2017 (UTC)
 * —Trappist the monk (talk) 17:16, 11 November 2017 (UTC)

switching |lang= to the module
I am at the point of switching to use the module. I don't anticipate that this will cause problems. But, with 625,000-ish transclusions, problems may arise. The number is so large because a majority of the templates use  to create the  around the text. I have disabled the italic checking for because such checking will detect the hardcoded italic markup added by many (most) of the  templates that have not been converted to the module.

Objections to proceeding?

—Trappist the monk (talk) 16:54, 13 November 2017 (UTC)
 * Sounds good, though it may not be ideal for lang-xx to be transcluding lang this way; better that it does this in Lua with a call to the same function, to reduce the transclusion count.  — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ<  21:05, 13 November 2017 (UTC)
 * The module supports both. The old versions of  transclude .   templates that use the module don't transclude  because the module does it all.
 * Because the old templates transclude, the module will be doing the work that is now done by the wikitext version of  until all of the  templates are converted to the module.
 * —Trappist the monk (talk) 21:41, 13 November 2017 (UTC)

Switched.

—Trappist the monk (talk) 23:23, 18 November 2017 (UTC)

what about lang-?? with this ?
From :

which gives us the '?' and a link to Help:Multilingual support (Ethiopic):
 * → text

An insource search conducted in the template namespace found:

All of these are Ethiopic languages. If this is all that use this markup, then, for standardization, it would seem best to discontinue support.

—Trappist the monk (talk) 19:57, 13 November 2017 (UTC)
 * Not sure I follow.  — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ<  21:05, 13 November 2017 (UTC)
 * What don't you understand?
 * —Trappist the monk (talk) 21:45, 13 November 2017 (UTC)


 * I work very closely with articles containing Ethiopic script. I agree with discontinuing support. Most modern browsers support rendering Ethiopic script. This is an outdated help page that should be archived. It is no longer necessary. The ? is not needed or helpful any more. —አቤል ዳዊት?(Janweh64) (talk) 08:31, 8 December 2017 (UTC)
 * In fact, it has become a page for software developers to add promotional spam. —አቤል ዳዊት?(Janweh64) (talk) 08:44, 8 December 2017 (UTC)

recent changes and lang-ar
I am minded to revert to this version of the module. A problem was introduced with that made the module ignore the no setting in  so that all Arabic script was rendered in italic font when it should not have been.

The purpose of the module edits was to simplify a handful of  statements. Were this code running on a micro-controller, such optimization might be required. It is not, so we can afford to spend some processor cycles and use up memory space evaluating. There is the added benefit that editors who come after us can know specifically what it is that is needed at that particular point in the code.

—Trappist the monk (talk) 11:16, 18 November 2017 (UTC)
 * Because we managed to break the module and because there are currently some 41k transclusions of it, I have protected it and created Module:Lang/sandbox.
 * —Trappist the monk (talk) 11:32, 18 November 2017 (UTC)
 * Additionally, I have started Module:Lang/testcases; results at Module talk:Lang/testcases. The sandbox produces different (correct) results for these tests.
 * —Trappist the monk (talk) 14:38, 18 November 2017 (UTC)

Auto-italicization of Latin scripts
The module currently seems to auto-italicize language tags which include a  script code, while the previous template didn't. Because the previous template didn't automatically do it, the correct way to format these words was to italicize them using wiki markup, which means that the module now appears to render them with two sets of encapsulating   tags (presumably one from the mark-up and one from the module). This also means the module auto-italicizes Latin scripts some of the time, but not most of the time (such as in the common cases where the  script is redundant/should be suppressed, e.g. for ,  ,  ). I think this should be reverted to the previous behaviour to both avoid this inconsistency and the duplicate HTML.

If, however, anyone wants to go the opposite direction and make the module output for Latin scripts more consistent by auto-italicizing all Latin scripts, I'd also be fine with the relatively small amount of redundant HTML generated by the current formatting in order to remove the need for doing it manually in the future. That might be doable by checking a language's suppressed script codes for  when no script tag has been supplied, and italicizing it if. – Quoth (talk) 16:12, 19 November 2017 (UTC)
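Quoth's proposal can be sketched in Python. Assumptions, none of them taken from the module: Latn is the script that triggers italics, italic is modeled as a boolean rather than the template's yes/no values, and `SUPPRESS_SCRIPT` is a four-entry excerpt of the registry's Suppress-Script data rather than the full table.

```python
# Excerpt of Suppress-Script values from the IANA registry (zh has none).
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl", "ar": "Arab"}


def should_italicize(code, script=None, italic=None):
    """Decide whether text is italicized; italic=True/False overrides everything."""
    if italic is not None:
        return italic
    parts = code.split("-")
    if script is None and len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1]                                # explicit script subtag
    if script is None:
        script = SUPPRESS_SCRIPT.get(parts[0].lower())  # proposed fallback
    return script is not None and script.lower() == "latn"
```

With the fallback, `de` and `zh-Latn` would both be italicized automatically, `ru` and bare `zh` would not, and an explicit italic setting always wins.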
 * Examples of what you mean are always appropriate. Which template are we talking about?  Many of the  templates unconditionally italicize the text in.


 * This is a work in progress. It is not possible (for this human, at least) to, in one go, switch all of the  and  templates to use Module:lang.
 * —Trappist the monk (talk) 18:09, 19 November 2017 (UTC)
 * Right, sorry: you can find an example on this page under the Chinese Mandarin entry with its pinyin transliteration bàng, which uses ; and I'm only talking about usage of the main  template. – Quoth (talk) 21:59, 19 November 2017 (UTC)
 * I'm having a difficult time understanding what the problem is. If I take a step back and view Open back unrounded vowel with the previous version of the template (the last one before Module:lang was introduced), the bàng text looks the same (to me) as it does when that page is rendered with the module.  See for yourself:
 * this link opens the edit window for the previous version of
 * https://en.wikipedia.org/w/index.php?title=Template:Lang&action=edit&oldid=775049579
 * in the Preview page with this template box put:
 * click the adjacent Show preview button
 * That is how it 'used' to look. Compare it against the rendering made by the live template.  How are they different?  They don't seem different to me.
 * —Trappist the monk (talk) 23:15, 19 November 2017 (UTC)
 * The look hasn't changed, only the HTML markup and the circumstances around when the text will be auto-italicized by . If you inspect the HTML you should see two sets of surrounding  tags instead of one; one set from the wiki markup, which was previously required for formatting, and one from the new lang module output. – Quoth (talk) 21:13, 20 November 2017 (UTC)
 * I did your experiment. First I viewed Open back unrounded vowel with the template as it was before the switch to the module (old).  I right-clicked, chose view source to see the HTML that en.wiki serves, and copy/pasted the markup for bàng.  I repeated the procedure with the current template/module (new).  Here are the results:
 * – old
 * – new
 * These look the same to me. Is it possible that you are looking at a cached version of an older page?
 * —Trappist the monk (talk) 21:58, 20 November 2017 (UTC)
 * Curious. I've cleared my caches, and purged the page, but on the current version of that article I see this markup:
 * I should note that I'm looking at the publicly available page, because I'm unable to use the template edit or preview functionality due to it being protected. – Quoth (talk) 20:00, 21 November 2017 (UTC)
 * I'm seeing the markup bàng when I preview the relevant section too. There is no caching involved because I previewed the page before looking at the source code. — Eru·tuon 23:15, 21 November 2017 (UTC)

most lang-?? templates switched to the module
I have switched most templates to use Module:Lang. Most were relatively trivial to switch, the remaining templates less so. These remain to be switched, redirected, deleted, or not: —Trappist the monk (talk) 14:04, 9 December 2017 (UTC)
 * – appears to be a sort of catch-all for 'hard to define' Greek text or for Greek text that doesn't have a specific IANA/ISO 639 language code; internally the template uses ; the template labels this text 'Greek' but the documentation implies that this template is to be used with Ancient Greek text so perhaps the labeling is incorrect; this is another case where private use tags may be useful:   as the catch-all;   for Koine Greek;   for Attic Greek (or the linguist list code  ); etc – 1424 transclusions
 * – special version of to use  to render Hebrew text with Niqqud diacritical marks; not sure what to do with this one – 3521 transclusions
 * – has support for automatic transliteration when  is set to  ; an insource search finds 83 instances of the template that use this functionality; not sure what to do with this one – 3819 transclusions
 * – calls which calls  to wrap   in  tags with several fonts – 1 article transclusion
 * – calls to wrap   in  tags with several fonts – 31 transclusions
 * – to wrap   in  tags with several fonts – 11 transclusions
 * – one of two Ligurian languages, officially 'Ligurian' but the en.wiki article is at Ligurian (Romance language) (the other officially is 'Ligurian (Ancient)' and its article is at Ligurian language (ancient) – there is no ); may require article renaming or the creation of suitable redirects to make this template work with Module:lang – 26 transclusions
 * – has support for two simultaneous transliteration renderings – 47 transclusions
 * – calls to wrap   in  tags with several fonts – 50 transclusions
 * – named using retired code  (see sil.org); internally uses   which does not exist in ISO 639-1 – 76 transclusions
 * – purportedly to be used for North Azerbaijani but uses the code for Coatepec Nahuatl – no article transclusions; delete?
 * – calls to wrap   in  tags with several fonts – 25 transclusions
 * – purportedly to be used for Dutch Low Saxon but uses the code for Southern Nisu – 1 article transclusion
 * – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 197 transclusions
 * – IANA/ISO 639 define code  as 'Prakrit languages', a collective of individual languages; special handling in Module:lang is required for collections – 2 article transclusions
 * – IANA/ISO 639 define code  as 'Romance languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
 * – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 2073 transclusions
 * – IANA/ISO 639 define code  as 'Salishan languages', a collective of individual languages; special handling in Module:lang is required for collections – 1 article transclusion
 * – has support for automatic transliteration when, mechanism is different from that used in  – 3 article transclusions
 * – calls to wrap   in  tags with several fonts – 20 transclusions
 * – IANA/ISO 639 define code  as 'Slavic languages', a collective of individual languages; special handling in Module:lang is required for collections – 4 article transclusions
 * – IANA/ISO 639 define code  as 'Songhai languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
 * – wraps  in a  tag that applies special fonts and sizing; does not provide labeling in the manner of most other  templates – 39 transclusions
 * – provides labeling for simultaneous rendering of Cyrillic, Latin, and Arabic scripts; this functionality apparently never documented – 402 transclusions
 * – provides for simultaneous rendering of multiple transliterations – 235 transclusions
 * – calls which calls  with text wrapped in  tags with several fonts – 23 transclusions
 * – IANA/ISO 639 define code  as 'Sorbian languages', a collective of individual languages; special handling in Module:lang is required for collections – 8 article transclusions


 * As the purpose of the template is to label Classical Attic, Koine, or Byzantine Greek text as "Greek", I'd suggest using  . None of the other special subtags have been abbreviated to three characters, and   is kind of cryptic. — Eru·tuon 04:33, 4 January 2018 (UTC)
 * For the cases where a label different from the label provided by the templates is desired, editors can, after the next update to the live module, use Greek.  It isn't clear to me how the reader benefits from that kind of obfuscation.
 * I don't think that we should specifically support a  code where the defined name associated with that code is 'Greek'.  The module uses the defined name for the rendered label (the  templates) and for categorization (both  and the  templates).  Were we to create a separate  template that directly calls the module, we would be lumping all of these various old Greek languages into the same category used for modern Greek  because they share the same display name.  Using the  with Greek categorizes properly.
 * —Trappist the monk (talk) 11:50, 4 January 2018 (UTC)
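The special handling of private-use tags mentioned in this section could, for example, split a tag at its `-x-` singleton and look up a local name before falling back to the registry name of the base language, so that different old Greek varieties do not share one label and category. A Python sketch; the tag `grc-x-koine` and both tables are hypothetical illustrations, not codes adopted here:

```python
# Hypothetical local names for private-use tags (illustrative only).
PRIVATE_USE_NAMES = {"grc-x-koine": "Koine Greek"}


def split_private_use(tag):
    """Split a tag into its public part and its private-use subtags."""
    public, sep, private = tag.lower().partition("-x-")
    return public, (private.split("-") if sep else [])


def display_name(tag, registry_names):
    """Prefer a local private-use name; otherwise use the base language's name."""
    name = PRIVATE_USE_NAMES.get(tag.lower())
    if name is not None:
        return name
    public, _ = split_private_use(tag)
    return registry_names[public.split("-")[0]]
```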

—Trappist the monk (talk) 17:39, 9 December 2017 (UTC)
 * ✅ – this and similar templates will require special handling either in Module:Lang or by rewriting the templates to use the  function of the module instead of the    function – 7 transclusions
 * ✅ – see Lang-de-AT – no article transclusions; delete?
 * ✅ – see Lang-de-AT – no article transclusions; delete? (previous TfD)
 * ✅ – see Lang-de-AT – no article transclusions; delete? (previous TfD)
 * ✅ – see Lang-de-AT – no article transclusions; delete? (previous TfD)
 * ✅ – see Lang-de-AT – no article transclusions; delete? (previous TfD)

—Trappist the monk (talk) 18:27, 9 December 2017 (UTC)
 * ✅ – see Lang-de-AT – 21 transclusions (previous TfD)
 * ✅ – see Lang-de-AT – 16 transclusions (previous TfD & second TfD)
 * ✅ – see Lang-de-AT – no article transclusions; delete? (previous TfD)

—Trappist the monk (talk) 19:17, 9 December 2017 (UTC)
 * ✅ – similar to Lang-de-AT, IETF language tags like this will require special handling by Module:lang – 4 transclusions in article space (previous TfD)
 * ✅ – redundant ISO 639-3 version of – 3 article transclusions; delete? redirect to ?
 * I think this can be safely redirected. – Uanfala (talk) 14:18, 9 December 2017 (UTC)
 * redirected by Editor Jonesey95.
 * – Module:Zh handles the complexities and nuances of Chinese text; nothing to do here

—Trappist the monk (talk) 20:36, 9 December 2017 (UTC)
 * ✅ – sort of a version of without the annotation; could be easily converted to use Module:lang – 32 transclusions
 * ✅ – sort of a version of without the annotation; could be easily converted to use Module:lang – 14 transclusions
 * ✅ – uses deprecated code ; the correct code is   – 2 article transclusions; redirect to ?
 * redirected

—Trappist the monk (talk) 15:08, 11 December 2017 (UTC)
 * These two templates were not redirected; instead, script was set to the appropriate value. The names 'Serbian Cyrillic' and 'Serbian Latin' were not preserved because that usage is inconsistent with other templates for languages that use multiple scripts and because it is easy to distinguish one script from the other.
 * ✅ – calls to wrap   in  tags with   attribute; the Unicode class no longer exists (see ) – 6239 transclusions; redirect to ?
 * ✅ – not a member of ; does nothing special except label the text as 'Serbian Latin' – 50 transclusions; redirect to ?
 * ✅ – see Lang-de-AT – 24 transclusions

—Trappist the monk (talk) 23:36, 24 December 2017 (UTC)
 * ✅ – IANA/ISO 639-3 name is 'Havasupai-Walapai-Yavapai'; this template requires the use of a code in  to choose one for the language label and link; 29 transclusions
 * Converted to use the module; created three new templates that use private use codes, one each for the three language names:

—Trappist the monk (talk) 00:53, 28 December 2017 (UTC)
 * ✅ – probably an improper use of  defined by sil.org as a collective with the name 'Germanic languages' but used by this template as an individual language named 'Proto German'; we should not be redefining international standards, so if there is no international standard code for Proto German, we should not make one up except perhaps to create a private use variant  ; any private use IETF tags will require special handling by Module:lang or by rewriting the templates to use the   function of the module instead of the    function – 5 article transclusions
 * Created private-use code version;  now redirects to.

—Trappist the monk (talk) 18:16, 3 January 2018
 * ✅ – template name uses a code that is not a legitimate IANA / ISO 639 code ostensibly to refer to Medieval Greek (internally the template uses, Ancient Greek); the correct solution may be to rename the template to use a private use variant:   – 23 transclusions
 * Created private-use code version;  now redirects to.

—Trappist the monk (talk) 19:52, 10 January 2018 (UTC)
 * ✅ – this one expects as  an ISO 15924 script identifier – 244 transclusions
 * changed to use the module; the single Latn  script use fixed.

These templates have been nominated for deletion: —Trappist the monk (talk) 11:04, 25 December 2017 (UTC)
 * – includes several fonts in css in a span around  which don't appear to be necessary – 1 transclusion
 * – includes several fonts in css in a span around  which don't appear to be necessary; if really for Latn script, should be italicized (for pinyin, presumably) – 0 transclusions; should be deleted?
 * – this appears to be an improper use of  which sil.org says is the ISO 639-3 code for German; this template uses it for something called Early New High German but named as 'early German' (sic – a redirect) – no article transclusions; should be deleted?
 * – misuses by giving it the result of a call to ; no documentation, so no clear indication of its purpose – 1 article transclusion; delete?
 * – misuses by giving it the result of a call to ; no documentation, so no clear indication of its purpose – 2 article transclusions; delete?
 * –  is the ISO 639-3 code for Uncoded languages;  is used to label Montenegrin, which apparently does not have a language code; a search of sil.org finds little mention of Montenegrin – no article transclusions; delete?
 * – see – no article transclusions; delete?
 * And relisted. Comments there appreciated.
 * —Trappist the monk (talk) 10:54, 3 January 2018 (UTC)

These survived TfD; no consensus: These were deleted: ISO 639-3 now has  for Montenegrin, so there is a new  template that replaces  and.

—Trappist the monk (talk) 17:21, 15 January 2018 (UTC)

promoting ISO 639-2/3 codes to ISO 639-1
According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." This would explain why the IANA data set has both ISO 639-1 and 639-3 language codes but does not have both -1 and -3 codes for the same language. This issue was brought to my attention because code  was causing a mis-categorization to Letzeburgesch when it should have been Luxembourgish.

It is common practice to promote three-character language codes to equivalent two-character codes. We should adhere to this practice. To that end I have created a tool that creates a Lua table from the data in the table at the custodian's website. The result is Module:Lang/ISO 639 synonyms. Module:Lang uses that table to promote ISO 639-3 codes to ISO 639-1 codes. When this happens, a maintenance category is added so that the template call can be tweaked. is currently only implemented for and cannot be turned off with nocat. Barring any issues or problems, this functionality will be extended to the templates and nocat control will be enabled.

—Trappist the monk (talk) 17:54, 13 December 2017 (UTC)
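As a rough illustration of the promotion step described above, here is a minimal Python sketch. Module:Lang itself is Lua, and none of this is its actual code. The sketch rests on three assumptions: a synonyms table keyed by three-character code, output as a `lang` attribute on a `<span>`, and the later refinement in this thread that categorization happens only in article namespace and honors nocat. The table excerpt, the functions `promote_code` and `render_lang`, and the category name are all hypothetical.

```python
# Hypothetical excerpt: ISO 639-2/3 code -> synonymous ISO 639-1 code.
ISO_639_SYNONYMS = {"ltz": "lb", "pan": "pa", "ave": "ae"}

# Placeholder; not the real maintenance category name.
MAINT_CATEGORY = "Hypothetical promoted-code maintenance category"


def promote_code(code):
    """Return (tag to emit, True if a two-character synonym replaced code)."""
    key = code.lower()
    if key in ISO_639_SYNONYMS:
        return ISO_639_SYNONYMS[key], True
    return code, False


def render_lang(code, text, namespace=0, nocat=False):
    """Wrap text in a span using the (possibly promoted) language tag."""
    tag, promoted = promote_code(code)
    html = '<span lang="' + tag + '">' + text + "</span>"
    # Categorize only in article namespace (0) and only when nocat is not set.
    if promoted and namespace == 0 and not nocat:
        html += "[[Category:" + MAINT_CATEGORY + "]]"
    return html
```

In this sketch the input code is never rejected. It is only replaced when a two-character synonym exists.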
 * So to fix these codes: I look for a three-letter code in a lang template within the page in question, then I look in Module:Lang/ISO 639 synonyms to see if there is an equivalent two-letter code. Then I change the three-letter code to the two-letter code. Like this? If that is correct, it would help to have an error message of some sort, perhaps shown in preview mode only, to give the editor a hint about how to fix the error(s). – Jonesey95 (talk) 20:03, 13 December 2017 (UTC)
 * Hadn't got there yet. Because it isn't really broken, I had thought to do something akin to the maintenance messages emitted by Module:Citation/CS1 but first I wanted to see if this stuff worked properly.
 * Yeah, for that is pretty much the fix.  When  gets categorization functionality, the usual fix will be to the template itself – though it is possible to set code in a  template to override its normal rendering:
 * text
 * text
 * (not sure why one would want to do that – perhaps that is something that should be prevented for )
 * —Trappist the monk (talk) 20:20, 13 December 2017 (UTC)
 * The best fix for lang-??? templates may be to redirect them to the appropriate lang-??. I did a lot of that when cleaning up those template calls in the pre-module days. – Jonesey95 (talk) 20:24, 13 December 2017 (UTC)
 * Concur.
 * —Trappist the monk (talk) 20:26, 13 December 2017 (UTC)
 * Hidden messaging added. To see the messages, add this to your preferred css:
 * .lang-comment {display: inline !important;} /* show lang messages */
 * —Trappist the monk (talk) 23:02, 13 December 2017 (UTC)
 * Categorization limited to article namespace, nocat supported.
 * —Trappist the monk (talk) 00:03, 14 December 2017 (UTC)
 * Curious about the construction of Module:Lang/ISO 639 synonyms. Is there a reason for doing  rather than  ? The latter uses less memory. — Eru·tuon 21:42, 13 December 2017 (UTC)
 * Copy/pasta from another of the tools, otherwise no reason.
 * —Trappist the monk (talk) 23:02, 13 December 2017 (UTC)
 * fixed.
 * —Trappist the monk (talk) 00:03, 14 December 2017 (UTC)
 * I'm not quite sure I see the benefit of running this task. On occasions, the 3-letter code is more intuitive than the 2-letter one: if anything we should encourage the use of for example ave for Avestan rather than ae. – Uanfala (talk) 13:15, 16 December 2017 (UTC)
 * First sentence of this topic says why: According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." (which see). Promotion to ISO 639-1 is the generally accepted convention.  If you look in the IANA language-subtag-registry file for subtag   you will not find it; Wikipedia's   magic word does not understand   but does understand   (the magic word code does not support either of   or   – which is why Module:Lang has its own data modules):
 * By promoting synonymous ISO 639-2/-3 codes to ISO 639-1, Module:Lang aligns with this convention.
 * With regard to your : the and  templates use codes and names from IANA (which gets them from ISO 639, but does sometimes reorder names when there is more than one spelling).  IANA and ISO 639 do not distinguish   from  ; they provide the same names in the same order: Panjabi and then Punjabi so, ,  all produce the same html markup and the latter two would produce the same visible display and links ( redirects to ).  For completeness in my accounting here,  is deprecated, uses an invalid language code in its name, has no article transclusions, so should be deleted.
 * Most important, though, is that the W3C specifies the use of language codes from the IANA subtag registry so that browsers and other html readers understand what is meant by the value assigned to the  attribute.  This is a prime argument for Module:Lang to discontinue support of the two LINGUIST List codes it now supports.
 * —Trappist the monk (talk) 14:35, 16 December 2017 (UTC)
 * So, if I understand correctly, the practical rationale behind the promotion to ISO 639-1 is that these codes are more likely to be understood by browsers? If this is so then it makes sense. But do we really want to have the maintenance burden of having to clean up every time someone uses an ISO 639-3 code instead of the 639-1 one? Won't it be possible for the template to do these conversions internally? – Uanfala (talk) 15:02, 16 December 2017 (UTC)
 * The module does do the promotion so that it produces correct html markup:
 * ਮਾਝੀ
 * ਮਾਝੀ
 * The maintenance message is only visible to those who turn on the display with the css code above. I have an AWB script that will help to clear the hidden maintenance  (you reverted an edit made by that script). Yesterday there were about 550 pages in that category. Most of what remains is there because I didn't let the script make the edit, so that I would have the opportunity to fix the italic markup that will cause errors when the italic error checking code for  is re-enabled.
 * —Trappist the monk (talk) 15:45, 16 December 2017 (UTC)
 * I might have said this somewhere in one of these threads, but it bears repeating: not all the three-letter codes are a 1:1 correspondence with two-letter ones. I have no issue with synonymous longer ones being made more concise (though yes, the longer ones are often more intuitive) as long as the longer ones aren't rejected as input, and most especially as long as three-letter codes for dialects, historical stages, etc., are never collapsed to the generic language name.  — SMcCandlish ☏ ¢ &gt;ʌⱷ҅ᴥⱷʌ&lt;  23:07, 17 December 2017 (UTC)
 * There is no 1:1 mapping of all three-character codes to two-character codes. There is a 1:1 mapping of all two-character codes (ISO 639-1) to three-character codes (ISO 639-2/3).  Three-character codes that have an associated  two-character code are omitted from the IANA language-subtag-registry file so browsers and other html readers are not obligated to know about those synonymous three-character codes.  We do not reject three-character codes as input but where there is a two-character synonym, we use the synonym.
 * The relationship between codes and language names is a frustrating one. ISO 639 establishes the base code-to-name mapping. When a code has more than one possible name, ISO 639 lists them in some order. IANA sometimes chooses a different order for the same code and names. Sometimes the ISO 639/IANA names are not suitable for direct use as labels on Wikipedia:
 * → Old English (ca. 450-1100)
 * So, we have tables of alternate names; of alternate spellings; of names we choose because of ISO 639/IANA list-order differences; and of codes that improperly redefine the standard's definition:
 * ISO 639/IANA:  → Malo
 * but in Module:Language/data/wp_languages
 * → Medieval Latin (there is no ISO 639/IANA code for Medieval Latin)
 * The provenance of the codes/names listed in that module is wholly unknown, and so is suspect. Cleaning that up is just one more task to be done.
 * —Trappist the monk (talk) 11:43, 18 December 2017 (UTC)
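Here is a minimal sketch of the layered lookup described above: a local override table is consulted before the IANA-derived name. It is Python rather than Lua. The tables, their entries, and `language_name` are illustrative assumptions, not the contents of Module:Lang/data or Module:Language/data/wp_languages.

```python
# Hypothetical excerpt of names as listed by IANA (first Description).
IANA_NAMES = {
    "ang": "Old English (ca. 450-1100)",
    "lb": "Luxembourgish",
    "pa": "Panjabi",
}

# Hypothetical local override table: labels preferred on Wikipedia.
LOCAL_OVERRIDES = {
    "ang": "Old English",  # drop the date range from the label
    "pa": "Punjabi",       # prefer the second IANA spelling
}


def language_name(code):
    """Return the label for code: local override first, then IANA, else None."""
    key = code.lower()
    if key in LOCAL_OVERRIDES:
        return LOCAL_OVERRIDES[key]
    return IANA_NAMES.get(key)
```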

using private-use tags
I have written elsewhere in these discussions that we should not be making up our own primary language tags and should not be redefining tags that have already been defined by international standards. Instead we should be operating within the permitted uses of the standard. BCP47 (IETF language tags) provides for private-use tags. I have tweaked Module:Lang/sandbox to accept private-use IETF language tags in the form:

where:
 * is the standard ISO 639-1, -2, -3 language code
 * is the BCP47-required singleton that marks the beginning of a private use tag
 * is the private use tag; one to eight alphanumeric characters
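Here is a minimal Python sketch of how a tag in that form might be checked. It validates only the shape: a two- or three-letter primary code, the `x` singleton, and one private-use subtag of one to eight alphanumerics. It does not look the primary code up in ISO 639. Full BCP47 also allows more private-use subtags than this accepts. `parse_private_use` and the example tags are hypothetical.

```python
import re

# <ISO 639-1/-2/-3 code>-x-<private-use tag of 1-8 alphanumerics>
PRIVATE_USE_TAG = re.compile(r"([a-z]{2,3})-x-([a-z0-9]{1,8})", re.IGNORECASE)


def parse_private_use(tag):
    """Return (primary code, private-use tag) lowercased, or None if malformed."""
    m = PRIVATE_USE_TAG.fullmatch(tag)
    if m is None:
        return None
    return m.group(1).lower(), m.group(2).lower()
```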

I have created three of these tags for :

I use Walapai instead of Hualapai for standardization and because it matches the existing category. The label will link Walapai to Havasupai–Hualapai language because there is an existing redirect. Categorization isn't quite noodled out yet. Simplest and best, I think, is to create three individual categories for the three languages and make them subcategories of.

This sandbox template needs to be implemented as, , , to be compliant with the other  templates.

—Trappist the monk (talk) 10:50, 23 December 2017 (UTC)

collective language codes
See this faq @ LOC for collective-language code description.

In general, I think, and  templates should not use collective-language codes. Such use should be discouraged because these codes don't properly identify the language of the text held by the template:

According to MARC Code List for Languages, code  includes these languages:
 * Anglo-Norman
 * Cajun French
 * Franco-Provençal ( – Arpitan or Francoprovençal in the current IANA list)
 * Franco-Venetian (not in IANA list – possibly  Venetian)
 * Italian, Old (to 1300) (not in IANA list)
 * Ladin
 * Portuñol (not in IANA list)
 * Spanish, Old (to 1500) (not in IANA list by that name – possibly  Old Spanish)

To which of them does the example template refer?

I am not suggesting that such codes should never be used, but they should be used with care.

There are about 110 collective codes listed in the IANA language-subtag-registry file (of which only a handful are in current use at en.wiki) where the language name ends with the word 'languages' (plural). This, according to the LOC faq, is how ISO 639-2 distinguishes individual and macro-language names from collective-language names.
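Here is a minimal sketch of the naming test in that paragraph. It reads records from an abridged copy of the IANA language-subtag-registry file and flags language subtags whose first Description ends in 'languages'. The parser, function names, and sample records are assumptions for illustration. The real registry also marks these subtags with `Scope: collection`, which could serve as a cross-check.

```python
def parse_records(text):
    """Split registry text on '%%' and collect 'Key: value' fields per record."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            if line.startswith(" "):
                continue  # continuation lines are ignored in this sketch
            key, sep, value = line.partition(": ")
            if sep:
                fields.setdefault(key, []).append(value)
        if fields:
            records.append(fields)
    return records


def collective_codes(records):
    """Return language subtags whose first Description ends in ' languages'."""
    codes = []
    for rec in records:
        if rec.get("Type") != ["language"]:
            continue
        descriptions = rec.get("Description", [])
        if descriptions and descriptions[0].endswith(" languages"):
            codes.append(rec["Subtag"][0])
    return codes


# Abridged, illustrative registry text (not a verbatim copy of the file).
SAMPLE = """File-Date: YYYY-MM-DD
%%
Type: language
Subtag: lb
Description: Luxembourgish
Description: Letzeburgesch
%%
Type: language
Subtag: roa
Description: Romance languages
Scope: collection
"""
```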

The and the  templates use language names obtained from the data set for categorization and for language labels. For the occasions when collective-language codes are used, I propose that Module:Lang shall: —Trappist the monk (talk) 14:41, 1 January 2018 (UTC)
 * 1) use the proper collective language name for all template labels
 * → Romance languages: some text
 * 1) standardize category naming for these language codes:
 * Category:Articles with text from the Romance languages collective
 * I have seen instances of these codes used when the derivation of a word is unclear, but where it does appear to be traceable to a root word in a collective set of languages. I agree that there should be a recommendation to use them only in that situation or similar situations. I support the proposal to match the language codes with the "collective" name; if editors want a more specific label, they can use a more specific language code.


 * All of that said, I expect that this change will have some unexpected side effects, and we should be open to refining it as we go. – Jonesey95 (talk) 15:11, 1 January 2018 (UTC)
 * I have tweaked the sandbox to use the category naming convention described above. In mainspace, this:
 * renders this:
 * Module:Language/data/wp_languages redefines these collective codes:
 * → 'Bihari' – Bihari languages [ ]; only two-character collective code
 * → 'Berber' – Berber languages [ ]
 * → 'Proto-Celtic' – Celtic languages [ ]; now redirects to
 * → 'Proto-Germanic' – Germanic languages [ ]; now redirects to
 * → 'Mayan' – Mayan languages [ ]
 * → 'Nahuatl' – Nahuatl languages [ ]
 * → 'Prakrit' – Prakrit languages [ ]
 * → 'Jèrriais' – overridden in Module:Lang/data to 'Romance'
 * → 'Salish' – Salishan languages [ ]
 * → 'Slavic' – Slavic languages [ ]
 * → 'Songhay' – Songhai languages [ ]
 * → 'Sorbian' – Sorbian languages [ ]
 * Module:Lang/data redefines these collective codes
 * → 'Baltic' – Baltic languages [ ]
 * → 'Norman' [ ] – not defined as a collective but has the appearance of a collective – IANA names: Jèrriais, Guernésiais; proper handling of this may require  and   private-use codes
 * → 'Romance' – Romance languages [ ]; overridden in Module:Lang/data to 'Romance'
 * → 'other Semitic' – Semitic languages [ ]
 * So, with the exception of, all that should be required to implement the collective naming convention is to move the categories associated with these codes to the appropriate names and tweak the data set to correctly support them.
 * When the '&lt;something> languages' name is undesirable in article text, label can be used to locally override the template-provided label (category name will remain the same).
 * —Trappist the monk (talk) 13:21, 7 January 2018 (UTC)
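Here is a minimal sketch of the convention proposed above. The label defaults to the collective name, and label overrides only the displayed text. The category name is always built from the collective name. The lookup table and `label_and_category` are hypothetical, and Python stands in for the module's Lua.

```python
# Hypothetical excerpt: collective code -> collective language name.
COLLECTIVE_NAMES = {"roa": "Romance languages", "bat": "Baltic languages"}


def label_and_category(code, label=None):
    """Return (displayed label, category) for a collective-language code."""
    name = COLLECTIVE_NAMES[code]
    shown = label if label else name  # label only changes what readers see
    category = "Category:Articles with text from the " + name + " collective"
    return shown, category
```

Passing label="Romance" changes the visible label to "Romance", while the category stays "Category:Articles with text from the Romance languages collective".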

latn script inside &lt;poem>...&lt;/poem> tags
Because of this conversation, I noticed that was not italicizing Latn-script text inside of  tags. All of the text inside the template at Erde, singe §Text under the German current lyrics heading is written using characters belonging to the Unicode Latin character set so should have been rendered in italics.

It turns out that  tags insert poem strip markers that look like this:

The '?' characters in the strip marker are used here as visible stand-ins for the invisible delete character (U+007F). I do not fully understand how  tag processing works, but when it comes time for to do its work, the text has these strip markers and it has the original newline characters (U+000A, LF, '\n').

I have tweaked the sandbox to account for the delete and newline characters:  —Trappist the monk (talk) 13:30, 5 January 2018 (UTC)
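Here is a minimal Python sketch of the kind of check involved: deciding whether text is Latin script while treating U+007F and U+000A as neutral. It assumes a per-character test. Letters must have a Unicode name beginning with 'LATIN', while punctuation, separators, numbers, symbols and nonspacing marks are ignored. This is not necessarily how the sandbox does it. The strip marker used in the example is simplified and hypothetical.

```python
import unicodedata

# Control characters that <poem> processing can leave in the text, per the
# discussion: U+007F DELETE (strip markers) and U+000A LINE FEED.
POEM_CONTROLS = {"\u007f", "\n"}


def is_latn(text, allow_poem_controls=True):
    """Return True if every letter in text belongs to the Latin script."""
    for ch in text:
        cat = unicodedata.category(ch)
        # Punctuation, separators, numbers, symbols and nonspacing marks are
        # treated as script-neutral in this sketch.
        if cat[0] in "PZNS" or cat == "Mn":
            continue
        if allow_poem_controls and ch in POEM_CONTROLS:
            continue
        # Anything else must carry a Unicode name beginning with "LATIN";
        # control characters have no name, so they fail here.
        if unicodedata.name(ch, "").startswith("LATIN"):
            continue
        return False
    return True
```

With allow_poem_controls=False, the same German text followed by a newline is rejected, which mirrors the failure described above.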