Wikipedia:Bots/Requests for approval/Platybot

Platybot
Operator:

Time filed: 08:51, Monday, July 8, 2024 (UTC)

Function overview: Adjusts templates based on provided JSON configuration files. This request is limited to Template:Cite news and Template:Cite web, and is primarily intended to correct issues where the work or publisher is linked to the wrong target.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Not currently

Links to relevant discussions (where appropriate):

Edit period(s): Initially, irregular one-off runs, each held after a significant expansion of the configuration file. Once most citations have been fixed, I will open a request for continuous operation in a maintenance mode.

Estimated number of pages affected: Varies considerably based on configuration. The current configuration, which applies to ten sources, will edit approximately 23,000 pages. A broader configuration, which goes beyond correcting wrong links and also always inserts the correct link when one is missing, would edit approximately 450,000.

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: Adjusts parameters of Cite news and Cite web based on a configuration file. This configuration can be applied to any parameter, but the intent of this request is to apply it to the following:
 * work
 * publisher
 * publication-place
 * department
 * agency
 * url-access

It determines which change to apply based on current parameter field values. Any field or combination of fields can be used, but the intent of this request is to use the "url" field.

Adjustments can be specified as "always", "onEdit", or "never". When "always" is specified and a change to a parameter is identified as desirable, the article will be edited to implement it. When "onEdit" is specified, the change is only implemented if the article is already being edited to make another change. This reduces the impact on watchlists by skipping articles that have no high-priority issues. When "never" is specified, the change is not made.
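As an illustration of how the three modes could combine when deciding whether to edit a page, here is a minimal Python sketch. It is not Platybot's source (which is not currently available); it assumes the bot first collects every candidate change on a page together with its mode, and the names `Mode` and `changes_to_apply` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ALWAYS = "always"
    ON_EDIT = "onEdit"
    NEVER = "never"


def changes_to_apply(candidates):
    """Decide which candidate changes on one page to make.

    candidates: list of (description, Mode) pairs for every change found on the page.
    The page is only edited if at least one change is ALWAYS; in that case every
    ALWAYS and ON_EDIT change is made. NEVER changes are never made.
    """
    if not any(mode is Mode.ALWAYS for _, mode in candidates):
        return []  # nothing high-priority: leave the page (and watchlists) alone
    return [desc for desc, mode in candidates if mode is not Mode.NEVER]


page = [
    ("replace redirect with direct link", Mode("onEdit")),
    ("replace link to wrong outlet", Mode("always")),
    ("add missing publisher", Mode("never")),
]
print(changes_to_apply(page))
# ['replace redirect with direct link', 'replace link to wrong outlet']
print(changes_to_apply(page[:1]))  # []
```

The "onEdit" change piggybacks on the "always" change in the first call; on its own, in the second call, it does not justify an edit.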

The configuration file follows this JSON Schema:

 {
   "$schema": "http://json-schema.org/draft-07/schema#",
   "type": "array",
   "items": {
     "type": "object",
     "properties": {
       "includes": {
         "type": "array",
         "items": {
           "type": "object",
           "properties": {
             "key": { "type": "string", "example": "url" },
             "value": {
               "type": "array",
               "items": { "type": "string", "example": ["www.bbc.com", "www.bbc.co.uk"] }
             }
           }
         },
         "description": "Lists conditions required to be met for this configuration to be applied to the template."
       },
       "excludes": {
         "type": "array",
         "items": {
           "type": "object",
           "properties": {
             "key": { "type": "string", "example": "url" },
             "value": {
               "type": "array",
               "items": { "type": "string", "example": ["www.bbc.com/sport", "www.bbc.co.uk/sport"] }
             }
           }
         },
         "description": "Lists conditions that must not be met for this configuration to be applied to the template."
       }
     },
     "patternProperties": {
       "^[a-zA-Z0-9-]+$": {
         "oneOf": [
           {
             "type": "array",
             "description": "Named for the parameter, and defines what will be done with it. Used when there are multiple possible configurations for the parameter.",
             "items": { "$ref": "#/definitions/parameter-config" }
           },
           {
             "type": "object",
             "description": "Named for the parameter, and defines what will be done with it. Used when there is only one possible configuration for the parameter.",
             "$ref": "#/definitions/parameter-config"
           }
         ]
       }
     }
   },
   "definitions": {
     "parameter-config": {
       "$schema": "http://json-schema.org/draft-07/schema#",
       "$id": "parameter-config",
       "type": "object",
       "properties": {
         "includes": {
           "type": "array",
           "items": {
             "type": "object",
             "properties": {
               "key": { "type": "string", "example": ["url"] },
               "value": {
                 "type": "array",
                 "items": { "type": "string", "example": ["www.bbc.com", "www.bbc.co.uk"] }
               }
             }
           },
           "description": "Lists conditions required to be met for this configuration to be applied to the parameter."
         },
         "excludes": {
           "type": "array",
           "items": {
             "type": "object",
             "properties": {
               "key": { "type": "string", "example": ["url"] },
               "value": {
                 "type": "array",
                 "items": { "type": "string", "example": ["www.bbc.com/sport", "www.bbc.co.uk/sport"] }
               }
             }
           },
           "description": "Lists conditions that must not be met for this configuration to be applied to the parameter."
         },
         "link": { "type": "string", "description": "Where the parameter should normally link to", "example": ["ABC News (Australia)"] },
         "wikitext": { "type": "string", "description": "What the wikitext of the parameter should normally be", "example": ["ABC News"] },
         "blacklist": {
           "type": "array",
           "items": { "type": "string", "example": ["ABC News (United States)", "ABC News"] },
           "description": "Links that will always be removed"
         },
         "greylist": {
           "type": "array",
           "items": { "type": "string", "example": ["Australian Broadcasting Corporation"] },
           "description": "Links that will only be removed when already editing the page. Used to prevent edits that would only fix issues we consider minor."
         },
         "whitelist": {
           "type": "array",
           "items": { "type": "string", "example": ["The Sunday Telegraph (Sydney)"] },
           "description": "Links that will never be removed. Used when we believe editors may have deliberately provided a non-standard value that we wish to respect."
         },
         "fixRedirects": {
           "type": "string",
           "enum": ["always", "onEdit", "never"],
           "default": "onEdit",
           "description": "Specifies when we will replace redirects to the provided link with the provided link."
         },
         "fixDisplay": {
           "type": "string",
           "enum": ["always", "onEdit", "never"],
           "default": "onEdit",
           "description": "Specifies when we will replace the currently displayed text with the displayed version of the provided wikitext."
         },
         "fixOthers": {
           "type": "string",
           "enum": ["always", "onEdit", "never"],
           "default": "always",
           "description": "Specifies when we will replace links to pages that are neither redirects to the link nor on the provided lists."
         },
         "fixMissing": {
           "type": "string",
           "enum": ["always", "onEdit", "never"],
           "default": "onEdit",
           "description": "Specifies when we will add a missing value"
         },
         "priority": {
           "type": "integer",
           "default": 5,
           "minimum": 1,
           "description": "Provides a tie-breaker when multiple array objects meet the inclusion or exclusion criteria. Higher value is preferred. It is unspecified which configuration object is used when both have the same priority level."
         }
       }
     }
   }
 }
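As a rough sketch of how a file conforming to this schema might be consumed, the Python below gathers each parameter's configuration into a list (whether it was given as a single object or as an array) and fills in the defaults the schema declares. It is illustrative only, not Platybot's code; `parameter_configs` is a hypothetical name, and the sample configuration reuses the "work" example given further down this request.

```python
import json

# Defaults taken from the "default" values declared in parameter-config.
DEFAULTS = {
    "fixRedirects": "onEdit",
    "fixDisplay": "onEdit",
    "fixOthers": "always",
    "fixMissing": "onEdit",
    "priority": 5,
}

# Top-level keys that hold conditions rather than parameter configurations.
CONDITION_KEYS = {"includes", "excludes"}


def parameter_configs(source_config):
    """Return {parameter name: [parameter-config dicts with defaults filled in]}."""
    result = {}
    for name, value in source_config.items():
        if name in CONDITION_KEYS:
            continue
        # The schema allows one object or an array of objects; treat both as a list.
        entries = value if isinstance(value, list) else [value]
        result[name] = [{**DEFAULTS, **entry} for entry in entries]
    return result


# A one-entry configuration file built from the "work" example in this request.
CONFIG = json.loads("""
[
  {
    "work": {
      "link": "ABC News (Australia)",
      "wikitext": "ABC News",
      "blacklist": ["ABC News (United States)", "ABC News"],
      "greylist": ["Australian Broadcasting Corporation"],
      "fixMissing": "onEdit",
      "fixRedirects": "onEdit",
      "fixOthers": "always"
    }
  }
]
""")

work = parameter_configs(CONFIG[0])["work"][0]
print(work["fixDisplay"], work["priority"])  # onEdit 5
```

Values given explicitly in the file override the defaults, because the entry is unpacked after `DEFAULTS`.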

What it does to these parameters depends on the configuration. For example:

 "work": {
   "link": "ABC News (Australia)",
   "wikitext": "ABC News",
   "blacklist": ["ABC News (United States)", "ABC News"],
   "greylist": ["Australian Broadcasting Corporation"],
   "fixMissing": "onEdit",
   "fixRedirects": "onEdit",
   "fixOthers": "always"
 }

This will ensure that the "work" parameter only links to ABC News (Australia). When the bot finds a link to any page other than ABC News (Australia), one of its redirects, or Australian Broadcasting Corporation, it will edit the article to correct that link.

When it encounters a redirect to ABC News (Australia), a link to Australian Broadcasting Corporation, or a missing value, it will only correct these if it is already editing the article.

If "fixMissing" were changed to "always", the bot would edit the article to insert the missing value even when no other change is needed.
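The behaviour described above can be sketched as classifying the existing link and then looking up the mode that governs that kind of problem. This is an illustrative Python sketch, not the bot's code: the set of redirects is passed in rather than queried from the wiki, "ABC News Australia" is a hypothetical redirect title, whitelisted links are treated as acceptable, and "fixDisplay" is not modelled.

```python
WORK_CONFIG = {
    "link": "ABC News (Australia)",
    "wikitext": "ABC News",
    "blacklist": ["ABC News (United States)", "ABC News"],
    "greylist": ["Australian Broadcasting Corporation"],
    "fixMissing": "onEdit",
    "fixRedirects": "onEdit",
    "fixOthers": "always",
}


def classify(current, config, redirects):
    """Classify the page the parameter currently links to.

    current is None when the parameter is absent. redirects is the set of titles
    that redirect to config["link"]; here it is supplied by the caller.
    Returns None when no change is wanted.
    """
    if current is None:
        return "missing"
    if current == config["link"] or current in config.get("whitelist", []):
        return None
    if current in config.get("blacklist", []):
        return "blacklist"
    if current in config.get("greylist", []):
        return "greylist"
    if current in redirects:
        return "redirect"
    return "other"


def mode_for(kind, config):
    """Map a classification to the "always"/"onEdit"/"never" mode governing it."""
    return {
        "blacklist": "always",  # blacklisted links are always removed
        "greylist": "onEdit",   # greylisted links are only removed when already editing
        "redirect": config.get("fixRedirects", "onEdit"),
        "other": config.get("fixOthers", "always"),
        "missing": config.get("fixMissing", "onEdit"),
    }[kind]


redirects = {"ABC News Australia"}  # hypothetical redirect title
for current in ["ABC News (United States)", "Australian Broadcasting Corporation",
                "ABC News Australia", "CNN", None, "ABC News (Australia)"]:
    kind = classify(current, WORK_CONFIG, redirects)
    print(current, kind, mode_for(kind, WORK_CONFIG) if kind else "no change")
```

Only the blacklisted link and the link to an unrelated page come out as "always", matching the description above.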

 "agency": {
   "includes": [
     { "key": "agency", "value": ["Reuters"] }
   ],
   "remove": "onEdit"
 }

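This removes the "agency" parameter when it contains "Reuters". Because "remove" is set to "onEdit", this only happens when the article is already being edited. It is used to correct cases where the field has been incorrectly filled with the name of the publisher or work.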
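A sketch of how "includes" and "excludes" conditions might be evaluated, shown on the agency example and on the BBC values from the schema. It assumes a condition is met when the named parameter contains any listed string as a substring, that every "includes" condition must be met, and that no "excludes" condition may be met; citations are modelled as plain dicts, and the function names and sample URLs are hypothetical.

```python
def condition_met(template, condition):
    """True if the named parameter contains any listed value (substring match, assumed)."""
    field = template.get(condition["key"], "")
    return any(value in field for value in condition["value"])


def applies(template, config):
    """All includes conditions must be met and no excludes condition may be met."""
    includes = config.get("includes", [])
    excludes = config.get("excludes", [])
    return (all(condition_met(template, c) for c in includes)
            and not any(condition_met(template, c) for c in excludes))


agency_config = {"includes": [{"key": "agency", "value": ["Reuters"]}], "remove": "onEdit"}

# Hypothetical citation parameters, modelled as a dict.
cite = {"url": "https://www.reuters.com/world/example/", "work": "Reuters", "agency": "Reuters"}
print(applies(cite, agency_config))  # True: the agency rule applies
print(applies({**cite, "agency": "Associated Press"}, agency_config))  # False

bbc_not_sport = {
    "includes": [{"key": "url", "value": ["www.bbc.com", "www.bbc.co.uk"]}],
    "excludes": [{"key": "url", "value": ["www.bbc.com/sport", "www.bbc.co.uk/sport"]}],
}
print(applies({"url": "https://www.bbc.co.uk/sport/football/1"}, bbc_not_sport))  # False
```

Substring matching is convenient for URLs, but it would also match values such as "Thomson Reuters"; exact matching is an equally plausible design for fields like "agency".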

 "department": [
   {
     "includes": [
       { "key": "url", "value": ["reuters.com/world/"] }
     ],
     "wikitext": "World"
   },
   {
     "includes": [
       { "key": "url", "value": ["reuters.com/world/reuters-next/"] }
     ],
     "wikitext": "Reuters Next",
     "priority": 6
   },
   {
     "includes": [
       { "key": "url", "value": ["reuters.com/business/"] }
     ],
     "wikitext": "Business"
   }
 ]

This fills in the "department" parameter based on the source URL. A URL under reuters.com/world/reuters-next/ meets the conditions of both the first and second entries; the second has priority 6, which beats the default of 5, so "Reuters Next" is used. If no entry's conditions are met, the department field is not filled.
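A sketch of how the priority tie-breaker might pick a department, under the same substring-matching assumption as above. The helper names are hypothetical and the URLs are made up; among matching entries the highest priority wins, and ties go to the first entry in the list, which is one valid choice since the schema leaves ties unspecified.

```python
DEFAULT_PRIORITY = 5  # schema default for "priority"

department_config = [
    {"includes": [{"key": "url", "value": ["reuters.com/world/"]}], "wikitext": "World"},
    {"includes": [{"key": "url", "value": ["reuters.com/world/reuters-next/"]}],
     "wikitext": "Reuters Next", "priority": 6},
    {"includes": [{"key": "url", "value": ["reuters.com/business/"]}], "wikitext": "Business"},
]


def matches(template, entry):
    """Substring matching of includes/excludes, as assumed in this sketch."""
    def met(cond):
        return any(v in template.get(cond["key"], "") for v in cond["value"])
    return (all(met(c) for c in entry.get("includes", []))
            and not any(met(c) for c in entry.get("excludes", [])))


def choose(template, entries):
    """Return the highest-priority matching entry, or None if nothing matches."""
    matching = [e for e in entries if matches(template, e)]
    if not matching:
        return None
    # max() keeps the first of equal maxima; the schema leaves ties unspecified.
    return max(matching, key=lambda e: e.get("priority", DEFAULT_PRIORITY))


def department_for(url):
    entry = choose({"url": url}, department_config)
    return entry["wikitext"] if entry else None


print(department_for("https://www.reuters.com/world/reuters-next/example/"))  # Reuters Next
print(department_for("https://www.reuters.com/sports/example/"))  # None
```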

The current configuration file will do the following:
 * ABC News (Australia)
    * Set "work" to ABC News
    * Set "publisher" to Australian Broadcasting Corporation
    * Remove "publication-place"
    * Remove "agency" when incorrect
 * The Daily Telegraph
    * Set "work" to The Daily Telegraph
    * Set "publisher" to Telegraph Media Group
    * Set "publication-place" to "London, United Kingdom"
    * Set "department" when it can be determined
 * Reuters
    * Set "work" to Reuters
    * Set "publisher" to Thomson Reuters
    * Set "publication-place" to "London, United Kingdom"
    * Set "department" when it can be determined
    * Remove "agency" when incorrect
 * The New York Times
    * Set "work" to The New York Times
    * Set "url-access" to "limited"
    * Remove "publisher"
    * Remove "publication-place"
 * BBC News
    * Set "work" to BBC News
    * Remove "publisher"
    * Remove "publication-place"
    * Set "department" when it can be determined
 * BBC Sport
    * Set "work" to BBC Sport
    * Remove "publisher"
    * Remove "publication-place"
 * The Guardian
    * Set "work" to The Guardian
    * Remove "publisher"
    * Set "publication-place" to "London, United Kingdom"
    * Set "department" when it can be determined
 * The Guardian (Swan Hill)
    * Set "work" to The Guardian
 * The Daily Telegraph (Sydney)
    * Set "work" to The Daily Telegraph
    * Set "publisher" to News Corp Australia
    * Remove "publication-place"
 * ABC News (United States)
    * Set "work" to ABC News
    * Set "publisher" to American Broadcasting Company
    * Remove "publication-place"

The intent is that the community will expand the configuration file, increasing the number of citations it can fix.

When editing a template, the bot will also apply a consistent format and naming convention to improve readability. This involves replacing parameter aliases with their primary names and placing the parameters in the following order:


 * 1) author, last1, or vauthors
 * 2) first1
 * 3) author-link1
 * 4) last2
 * 5) first2
 * 6) author-link2
 * 7) lastN
 * 8) firstN
 * 9) author-linkN
 * 10) display-authors
 * 11) author-mask
 * 12) collaboration
 * 13) editor, editor1-last, or veditors
 * 14) editor1-first
 * 15) editor1-link
 * 16) editor2-last
 * 17) editor2-first
 * 18) editor2-link
 * 19) editorN-last
 * 20) editorN-first
 * 21) editorN-link
 * 22) display-editors
 * 23) editor-mask
 * 24) translator1-last or vtranslator
 * 25) translator1-first
 * 26) translator1-link
 * 27) translator2-last
 * 28) translator2-first
 * 29) translator2-link
 * 30) translatorN-last
 * 31) translatorN-first
 * 32) translatorN-link
 * 33) display-translators
 * 34) translator-mask
 * 35) interviewer1-last or vinterviewer
 * 36) interviewer1-first
 * 37) interviewer1-link
 * 38) interviewer2-last
 * 39) interviewer2-first
 * 40) interviewer2-link
 * 41) interviewerN-last
 * 42) interviewerN-first
 * 43) interviewerN-link
 * 44) display-interviewers
 * 45) subject1-last or vsubject
 * 46) subject1-first
 * 47) subject1-link
 * 48) subject2-last
 * 49) subject2-first
 * 50) subject2-link
 * 51) subjectN-last
 * 52) subjectN-first
 * 53) subjectN-link
 * 54) display-subjects
 * 55) subject-mask
 * 56) others
 * 57) display-contributors
 * 58) contributor-mask
 * 59) name-list-style
 * 60) date
 * 61) year
 * 62) orig-date
 * 63) df
 * 64) title
 * 65) script-title
 * 66) trans-title
 * 67) title-link
 * 68) url, article-url, chapter-url, contribution-url, entry-url, map-url, or section-url
 * 69) url-access, article-url-access, chapter-url-access, contribution-url-access, entry-url-access, map-url-access, or section-url-access
 * 70) url-status
 * 71) format
 * 72) work
 * 73) script-work
 * 74) trans-work
 * 75) page
 * 76) pages
 * 77) at
 * 78) department
 * 79) type
 * 80) series
 * 81) language
 * 82) volume
 * 83) issue
 * 84) others
 * 85) edition
 * 86) location
 * 87) publisher
 * 88) publication-date
 * 89) publication-place
 * 90) agency
 * 91) no-pp
 * 92) arxiv
 * 93) asin
 * 94) bibcode
 * 95) bibcode-access
 * 96) doi
 * 97) doi-access
 * 98) doi-broken-date
 * 99) hdl
 * 100) hdl-access
 * 101) isbn
 * 102) issn
 * 103) jfm
 * 104) jstor
 * 105) jstor-access
 * 106) lccn
 * 107) mr
 * 108) oclc
 * 109) ol
 * 110) ol-access
 * 111) osti
 * 112) osti-access
 * 113) pmc
 * 114) pmc-embargo-date
 * 115) pmid
 * 116) rfc
 * 117) ssrn
 * 118) ssrn-access
 * 119) s2cid
 * 120) s2cid-access
 * 121) zbl
 * 122) id
 * 123) archive-url
 * 124) archive-date
 * 125) archive-format
 * 126) access-date
 * 127) via
 * 128) quote
 * 129) trans-quote
 * 130) postscript
 * 131) ref
 * 132) mode
 * 133) postscript
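The alias renaming and reordering step can be sketched as follows, assuming a template's parameters are available as an ordered list of (name, value) pairs. `CANONICAL_ORDER` is a short excerpt of the order above and `ALIASES` is a small illustrative subset of CS1 aliases, not the bot's actual tables. Two further assumptions: parameters missing from the excerpt are kept at the end in their original relative order, and a parameter supplied under two names raises an error rather than being silently merged.

```python
# Excerpt of the canonical order listed above (relative order preserved).
CANONICAL_ORDER = ["last1", "first1", "author-link1", "date", "title", "url",
                   "url-access", "work", "department", "publisher",
                   "publication-place", "agency", "archive-url", "archive-date",
                   "access-date"]

# Illustrative subset of alias -> primary name mappings.
ALIASES = {"last": "last1", "first": "first1", "website": "work", "newspaper": "work"}


def normalize(params):
    """Rename aliases to primary names, then sort into canonical order."""
    renamed = {}
    for name, value in params:
        primary = ALIASES.get(name, name)
        if primary in renamed:
            raise ValueError(f"{primary!r} is set more than once (via {name!r})")
        renamed[primary] = value
    rank = {name: i for i, name in enumerate(CANONICAL_ORDER)}
    # sorted() is stable, so unknown parameters keep their relative order at the end.
    return sorted(renamed.items(), key=lambda kv: rank.get(kv[0], len(rank)))


def render(template_name, params):
    """Serialise the parameters back into single-line wikitext."""
    return "{{" + template_name + "".join(f" |{k}={v}" for k, v in params) + "}}"


params = [("title", "Example"), ("website", "BBC News"), ("last", "Smith"), ("first", "Jo")]
print(render("cite web", normalize(params)))
# {{cite web |last1=Smith |first1=Jo |title=Example |work=BBC News}}
```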

Discussion

 * I'd prefer if this bot (and every bot) stopped short of reordering template parameters. Doing a full reorganisation on any template edited will make it much more difficult to tell what changes have been made when reviewing diffs. Folly Mox (talk) 09:23, 16 July 2024 (UTC)
 * We can trust our bots that much, I'd say. And it shouldn't be much of a problem if you compare the diffs in visual diff mode, try here. In my experience, it's much easier for a bot (program) to reassemble a template in some predefined order. Having data in the order of final appearance does help with readability (BilledMammal: that'd be url?, author(s) data, date, title…). Ponor (talk) 06:48, 18 July 2024 (UTC)
 * Currently it is author(s) data, date, title, url; the full order can be seen in the final collapsed box. However, that is easy to change.
 * It wouldn't be difficult to put it back in the original order (although it would result in new fields being dumped at the end), but personally I believe it is better to reorganize it: while it makes it harder for editors using the non-visual diff viewer to identify the changes, it makes it easier for editors to parse the template going forward. BilledMammal (talk) 23:05, 18 July 2024 (UTC)
 * I support putting the params in some canonical order; my only question is which one it should be. VisualEditor (TemplateData), IAbot, maybe even reFill, probably use the same one ("Full parameter set in horizontal format" from Cite web?), which is what I'd use as well. Up to you, though. Ponor (talk) 14:05, 19 July 2024 (UTC)
 * I started with the full parameter set from Template:Cite news, but quickly found that "full parameter set" doesn’t actually mean "full parameter set".
 * I see the two templates differ in where to put the URL; I think Cite news' method is better, as the URL is difficult to read, so it is better to put it at the end. BilledMammal (talk) 14:11, 19 July 2024 (UTC)