User:Monkbot/Task 6: CS1 language support

Monkbot task 6 was created to modify CS1 citations whose title parameters contain non-Latin script so that they use the new CS1 parameter script-title.

A recent change to Module:Citation/CS1 (the engine underlying the templates) created a new parameter script-title. The new parameter is intended to be used when a citation's title is written in a script that is not a Latin-based alphabet. Usually these scripts should not be italicized (Chinese, Japanese, etc.) and/or may be written right-to-left (Hebrew, Persian, etc.). script-title is supported by all citation templates that use Module:Citation/CS1 except. As of revision b, task 6 does not modify templates.

The purpose of the templates is to identify for readers that certain links are to sources that are not English language sources. Each of these templates adds the page to the appropriate subcategory of. Prior to the 11 October 2014 update to Module:Citation/CS1, CS1 templates with language parameters also added pages to the individual subcategories in Category:Articles with non-English-language external links. Because CS1 citations do not always provide links to external sources, citations that used language to identify the language in which the source is written were improperly categorizing the article. Module:Citation/CS1 now uses. Task 6 locates CS1 citation templates that are adjacent to templates, adds a language parameter with the language code from the  template to the CS1 citation and then deletes the  template.

Task 6 was initially created to work on pages listed in certain subcategories of Category:Articles with non-English-language external links. The criteria are: subcategories that contain 1,000 or more articles; or subcategories for languages that have an ISO 639-1 two-character language code that are listed at right-to-left. The first criterion was an arbitrary cutoff; the second was not.

Task 6 begins by changing redirects to that standard form. For example,, , , and are all redirects to and so are changed to. The purpose of the standardization is to simplify later rules in the script.

After standardization, task 6:
 * 1) protects certain  templates from further edits;
 * 2) moves  templates that are inside a CS1 citation template to a position ahead of the CS1 template for processing by later rules;
 * 3) removes empty language parameters from CS1 citations so that the citation doesn't end up with duplicate language parameters at the end of the task;
 * 4) removes wikilink markup from language parameter values so that Module:Citation/CS1 can properly categorize the citation;
 * 5) removes English, British English, en, or en-GB from CS1 citations that use them (discontinued at task 6n);
 * 6) from task 6n: modifies English language and British English to English; modifies en-GB to en.
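Steps 3 and 4 above can be sketched in a few lines. This is only an illustration: the task itself runs as AWB rules that are not reproduced on this page, and the pattern names and the function name `tidy_language` are hypothetical. The sketch assumes citations with no nested templates.

```python
import re

# Hypothetical patterns; the task's actual AWB rules are not shown on this page.
# An empty |language= is one followed directly by the next pipe or the closing braces.
EMPTY_LANGUAGE = re.compile(r"\|\s*language\s*=\s*(?=\||\}\})")
# A wikilinked value, with or without a piped label: [[Target|Label]] or [[Target]].
LINKED_LANGUAGE = re.compile(
    r"(\|\s*language\s*=\s*)\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]"
)

def tidy_language(citation: str) -> str:
    """Drop an empty |language= and unlink a wikilinked language value."""
    citation = EMPTY_LANGUAGE.sub("", citation)
    return LINKED_LANGUAGE.sub(r"\1\2", citation)

print(tidy_language("{{cite web |title=X |language= |url=u}}"))
print(tidy_language("{{cite book |language=[[German language|German]] |title=T}}"))
```

Removing the empty parameter first means a later rule can append a new language parameter without creating a duplicate.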

Some citations have language parameters that contain RFC 1766-style language codes (code-subcode, where code is an ISO 639-1 language code and subcode is an ISO 3166 country code). CS1 does not support this style of language parameter, so task 6 truncates these codes to just the ISO 639-1 portion. Chinese is written in both simplified and traditional forms; where language is simplified Chinese or traditional Chinese, task 6 removes the qualifier. Where language contains a language name followed by the word language (|language=German language), task 6 removes the qualifier.
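The three value clean-ups described above might look like the following. The function name `normalize_language` and the exact patterns are assumptions made for illustration, not the task's own rules.

```python
import re

def normalize_language(value: str) -> str:
    """Illustrative clean-up of a |language= value (hypothetical helper)."""
    value = value.strip()
    # code-subcode (for example pt-BR) is truncated to the two-letter code.
    match = re.fullmatch(r"([A-Za-z]{2})-[A-Za-z]{2}", value)
    if match:
        return match.group(1)
    # "simplified Chinese" / "traditional Chinese" lose the qualifier.
    value = re.sub(r"^(?:simplified|traditional)\s+(?=chinese\b)", "",
                   value, flags=re.IGNORECASE)
    # "German language" loses the trailing word "language".
    value = re.sub(r"\s+language$", "", value, flags=re.IGNORECASE)
    return value

for raw in ("pt-BR", "Simplified Chinese", "German language", "French"):
    print(raw, "->", normalize_language(raw))
```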

In a CS1 citation, language may either precede or follow title, with or without intervening parameters. Properly evaluating each citation would then require a rule for each case. Alternatively, multiple rules are not needed if each citation is modified to a standard format. Editors generally place language somewhere after title, so task 6 modifies those citation templates where language precedes title by moving language to the end of the citation (the same place it puts language parameters that are created from templates).
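A minimal sketch of the reordering follows. It assumes a citation with no nested templates or piped wikilinks (so that splitting on the pipe character is safe), and the function name `move_language_to_end` is hypothetical.

```python
def move_language_to_end(citation: str) -> str:
    """Move |language= to the end when it precedes |title= (illustrative only)."""
    if not (citation.startswith("{{") and citation.endswith("}}")):
        return citation
    # parts[0] is the template name; the rest are the parameters.
    parts = citation[2:-2].split("|")
    names = [part.split("=", 1)[0].strip() for part in parts]
    if "language" not in names or "title" not in names:
        return citation
    language_index = names.index("language")
    if language_index > names.index("title"):
        return citation  # already in the standard order
    language = parts.pop(language_index).strip()
    parts[-1] = parts[-1].rstrip() + " "
    parts.append(language)
    return "{{" + "|".join(parts) + "}}"

print(move_language_to_end("{{cite web |language=de |title=Titel |url=u}}"))
```

With every citation in the same order, later rules only need to handle the case where language follows title.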

Certain citations shouldn't be edited. Task 6 employs a multilevel protection scheme. Edits to protected elements are prevented by the insertion of a special text string that makes the template unrecognizable to subsequent rules. Elements that include either of the special text strings  and , are never edited by task 6 except to remove the protection string at the task's completion. Reasons for this level of protection are:
 * 1) a citation with leading or trailing templates contains language where the  code (xx) or the code's equivalent language name does not match the language name or code in language; where there is a match,  is removed;
 * 2) the citation includes another template; especially templates like which can confuse the later rules;
 * 3) groups of two or more or  templates, the first and last are protected to prevent later rules from taking one of them as a value for a citation's language parameter;
 * 4) when amongst other  or  templates; it is presumed that such use indicates a multilingual source.

The second level of protection is applied only after the first level protection rules have been applied. This level identifies CS1 citations that have title values containing one or more Latin characters. The script is not smart enough to know if these characters are part of the original writing system, are a transliteration, or are a translation. Under certain circumstances described later, task 6 may edit those citations marked with.

Unprotected templates are then deleted.

For each of the rtl languages, the CJK languages, other non-Latin scripts (Greek, Hebrew, Cyrillic), and in keeping with MOS:Foreign terms, special rules require that the content of title must match the language identified in or language. For example, the rule for Arabic requires an or ar or Arabic and that title contain only punctuation, digits (0–9), and Arabic script. When these conditions are met, task 6 replaces ... with ar:..., adds ar (if appropriate) and deletes the adjacent template (if present).
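The Arabic rule can be approximated as below. Several things here are assumptions rather than the task's actual rule: the Unicode ranges chosen to stand for "Arabic script", the decision to allow whitespace alongside punctuation and digits, and the names `is_arabic_title` and `to_script_title`. The sketch only converts title to script-title; it does not add a language parameter or delete an adjacent template.

```python
import re
import unicodedata

# Assumed ranges: the Arabic and Arabic Supplement Unicode blocks.
ARABIC_RANGES = ((0x0600, 0x06FF), (0x0750, 0x077F))

def is_arabic_title(title: str) -> bool:
    """True when title has Arabic letters and otherwise only spaces, digits, punctuation."""
    has_arabic = False
    for ch in title:
        if any(low <= ord(ch) <= high for low, high in ARABIC_RANGES):
            has_arabic = True
        elif ch.isspace() or ch in "0123456789":
            continue
        elif unicodedata.category(ch).startswith("P"):
            continue
        else:
            return False  # a Latin letter or any other script disqualifies the title
    return has_arabic

TITLE = re.compile(r"\|(\s*)title(\s*=\s*)([^|{}]*?)(\s*)(?=\||\}\})")

def to_script_title(citation: str, code: str = "ar") -> str:
    """Rewrite |title=... as |script-title=code:... when the title qualifies."""
    match = TITLE.search(citation)
    if not match or not is_arabic_title(match.group(3)):
        return citation
    lead, equals, title, trail = match.groups()
    new = "|" + lead + "script-title" + equals + code + ":" + title + trail
    return citation[:match.start()] + new + citation[match.end():]

print(to_script_title("{{cite web |title=\u0645\u0631\u062d\u0628\u0627 2014 |url=u}}"))
```

A title that mixes Arabic with Latin letters is left alone, which matches the caution described earlier about not knowing whether Latin characters are a transliteration or a translation.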

Languages for which task 6 supports script-title are:

• Arabic (ar)

• Armenian (hy)

• Bosnian (bs)

• Chinese (zh)

• Greek (el)

• Hebrew (he)

• Japanese (ja)

• Korean (ko)

• Kurdish (ku)

• Maldivian (dv)†

• Pashto (ps)

• Persian (fa)

• Russian (ru)

• Serbian (sr)

• Sindhi (sd)

• Thai (th)

• Ukrainian (uk)

• Uyghur (ug)

• Yiddish (yi)

† when divehi, dhivehi, maldivian, dv; when citation has adjacent, language parameter must be Maldivian or dv;

For those languages that use Latin or Latin-variant alphabets, task 6 simply adds xx and deletes the adjacent template.

For those CS1 citations with Latin characters in title, and which now contain, task 6 deletes the icon and adds xx to the citation.

As a final step, wherever task 6 added,  , and  , that text is removed.

From 18 April 2015, Module:Citation/CS1 supports a comma-delimited list of language names. From rev. o, task 6 locates cs1|2 templates followed by two to five templates and adds the codes from those templates to a language parameter.
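The multi-language case could be sketched as follows. The name of the trailing templates is not given on this page, so the sketch assumes a hypothetical `{{xx icon}}` form with a lowercase two-letter code; the pattern names and `merge_language_icons` are likewise invented for illustration. It also assumes the citation contains no nested templates.

```python
import re

# Assumed template shape: {{xx icon}} with a lowercase two-letter code.
ICON = r"\{\{\s*[a-z]{2}\s+icon\s*\}\}"
ICON_CODE = re.compile(r"\{\{\s*([a-z]{2})\s+icon\s*\}\}")
# A cs1|2 template followed by two to five of the assumed templates.
CITE_THEN_ICONS = re.compile(
    r"(\{\{\s*(?:cite|citation)\b[^{}]*?)\s*\}\}((?:\s*" + ICON + r"){2,5})"
)

def merge_language_icons(text: str) -> str:
    """Fold the codes of the trailing templates into one |language= list."""
    def replace(match: re.Match) -> str:
        codes = ICON_CODE.findall(match.group(2))
        return match.group(1) + " |language=" + ", ".join(codes) + "}}"
    return CITE_THEN_ICONS.sub(replace, text)

print(merge_language_icons("{{cite web |title=T |url=u}} {{de icon}} {{fr icon}}"))
```

A citation followed by a single such template is left untouched by this sketch, since that case is handled by the earlier rules.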

Hidden under the hood at Module:Citation/CS1 is the process that takes transcription, xx:original writing system title, and translated title and puts them all together with  which both isolates the content for rtl languages and helps the browser to correctly display the script.

If, at the end of all of this, only casing has been changed ( to ) then the change is not saved.

Article pages that contain or that do not contain Module:Citation/CS1-supported templates will not be edited by this task.

Ancillary tasks
This script also:
