Wikipedia:Bots/Requests for approval/BHGbot 7


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

BHGbot 7
Operator:

Time filed: 15:10, Tuesday, July 28, 2020 (UTC)

Function overview: Mass create Category redirects to resolve the WP:ENGVAR variations in category names using the word "organisation(s)" or "organization(s)". e.g. if we have a Category:Anti-Foobar organizations, then the page Category:Anti-Foobar organisations would be created with the content

Automatic, Supervised, or Manual: Automatic

Programming language(s): Bash and AutoWikiBrowser

Source code available: Yes. There are two components:
 * Bots/Requests for approval/BHGbot 7/Make-BHGbot7-edit-list.sh
 * Bots/Requests for approval/BHGbot 7/BHGbot-7-AWB-module

Links to relevant discussions (where appropriate): WT:WikiProject Categories (permalink, though discussion is ongoing). This discussion was notified to WP:VPP and WP:VPR. Previous related discussion: WP:Bots/Requests for approval/BHGbot 3 (a similar proposal in 2017, which ran into the sands due to lack of prior consensus. My bad)

Edit period(s): Initial run to handle the backlog. Then a followup every few months.

Estimated number of pages affected: ~12,500 in the initial run.

Namespace(s): Category

Exclusion compliant (Yes/No): Yes

Function details: This task supports MOS:COMMONALITY by resolving the s/z WP:ENGVAR variation in the spelling of "organisation"/"organization", by creating a soft category redirect to the title which is in use. This corresponds with the MOS:COMMONALITY guideline to create such redirects in article space.
 * The word "organisation"/"organization" is one of the most common ENGVAR variants in category titles, and the current lack of redirects is a long-standing nuisance for both readers and editors.
 * The bot works in three stages:
 * A set of quarry queries to generate lists of pages
 * A bash script to process these lists and generate a list of category redirect titles to be created
 * An AWB run to create the category redirect pages
 * 1. Get lists
 * The first part of the bot is three quarry queries:
 * query/46899: Gets a list of non-redirect category pages whose title matches  and don't transclude Category redirect or Category disambiguation
 * query/46999: gets a list of all pages in the category namespace
 * query/47001: gets a list of all pages in the main (article) namespace
 * 2. Process the lists
 * The bash script Make-BHGbot7-edit-list.sh:
 * inverts the S/Z spelling in the list of organisation categories
 * removes from that list titles which are in the list of all pages in the category namespace
 * removes from that list titles which are in the list of all pages in the main (article) namespace
 * wikilinks the resulting edit list
 * 3. Create the redirects
 * Using the edit list created in step 2, AWB
 * skips any existing pages (there should be none, but some may have been created since the list was made)
 * applies the AWB custom module BHGbot-7-AWB-module to create the redirect with an explanatory edit summary as in this test edit
 * If the page title to be created is "Foo organisations" (with an S), a category redirect is created to "Foo organizations" (with a Z). And vice versa.
 * Per a request by User:Hellknowz at the 2017 BRFA, the redirect template includes the parameter
 * The module includes sanity checks to:
 * skip any pages whose title does not match the regex
 * skip any case where it is about to create a self-redirect
 * I have done a dry run (AWB in pre-parse mode) on a deliberately-polluted list of test pages, and it correctly skipped them all. I did another test of the full list of ~12,500 pages, where no pages were skipped, which indicates the accuracy of the list-making.
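
The list-processing stage described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual Make-BHGbot7-edit-list.sh: the function names, the `@@Z@@` placeholder, the `[[:Category:...]]` link format, and the assumption that each input file holds one bare title per line (no "Category:" prefix) are all hypothetical.

```bash
# Sketch only: not the actual Make-BHGbot7-edit-list.sh.
# Assumes each input file holds one bare title per line (no "Category:" prefix).

# Swap the S/Z spelling of "organisation"/"organization" in every line of stdin.
# A temporary placeholder stops the second substitution undoing the first.
invert_spelling() {
  sed -E \
    -e 's/([Oo]rgani)s(ation)/\1@@Z@@\2/g' \
    -e 's/([Oo]rgani)z(ation)/\1s\2/g' \
    -e 's/@@Z@@/z/g'
}

# make_edit_list ORG_CATS ALL_CATS ALL_ARTICLES
# Prints wikilinked titles of the redirects that still need to be created.
make_edit_list() {
  local org_cats=$1 all_cats=$2 all_articles=$3 tmp
  tmp=$(mktemp -d)
  # comm needs both inputs sorted in the same collation order.
  invert_spelling < "$org_cats" | LC_ALL=C sort -u > "$tmp/candidates"
  LC_ALL=C sort -u "$all_cats" "$all_articles" > "$tmp/existing"
  # Keep candidates that match no existing title, then wrap each as a wikilink.
  LC_ALL=C comm -23 "$tmp/candidates" "$tmp/existing" \
    | sed -e 's/^/[[:Category:/' -e 's/$/]]/'
  rm -rf "$tmp"
}
```

With three hypothetical input files, `make_edit_list org-cats.txt all-cats.txt all-articles.txt > edit-list.txt` would write the edit list. Using `comm -23` on sorted inputs keeps the subtraction to a single pass over each list, which matters when the namespace lists are large.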


 * Differences from BHGbot 3
 * This proposal tackles the same problem as the 2017 proposal BHGbot 3, but it uses a different approach. The 2017 proposal drew its list from recursing the category tree. This proposal uses quarry to collect a list of category titles. Using quarry gives a complete list, whereas category recursion is usually woefully incomplete. The quarry-generated lists allow rigorous checks against error.

Discussion
Primefac (talk) 22:03, 2 August 2020 (UTC)

Thanks, @Primefac.
 * I used the Linux shuf command to randomly select 50 pages from a list of 12,461 categories which I had built last week while testing the list-making:


 * Category:Religious organisations established in 1928
 * Category:Organisations established in 1718
 * Category:Films about organisations
 * Category:Wikipedia categories named after organizations based in Iran
 * Category:Environmental organisations based in Europe
 * Category:Organisations based in San Diego
 * Category:Organisations based in Mayotte by subject
 * Category:Transport organisations based in Lithuania
 * Category:Organisations based in Oceania by country and subject
 * Category:Student organisations established in 1917
 * Category:Scientific organisations established in 1857
 * Category:Organisations based in American Samoa by subject
 * Category:British Cadet organizations
 * Category:State history organisations of the United States
 * Category:Missing people organisations
 * Category:Arts organisations established in 1887
 * Category:Environmental organizations based in the Bahamas
 * Category:Horticultural organizations based in India
 * Category:Organisations disestablished in 1950
 * Category:Transport organizations based in Gibraltar
 * Category:Defunct organizations based in Zambia
 * Category:Humanitarian aid organisations of World War I
 * Category:Defunct organizations based in the Cook Islands
 * Category:Wikipedia categories named after organisations based in Romania
 * Category:Religious organizations based in Chile
 * Category:Cultural organisations based in Moldova
 * Category:Cultural organizations based in Portugal
 * Category:Ethnic organisations based in the Czech Republic
 * Category:Religious organisations based in the Marshall Islands
 * Category:Animal welfare organizations based in Peru
 * Category:Women's organizations based in Pakistan
 * Category:Islamic organizations based in Mali
 * Category:Arts organisations established in 1988
 * Category:Housing rights organisations
 * Category:Sports organizations of South Ossetia
 * Category:Religious organisations disestablished in 2010
 * Category:National Taiwan University organisations
 * Category:Sports organisations disestablished in 1954
 * Category:Paramilitary organisations based in South America by country
 * Category:Business and industry organisations based in Chicago
 * Category:Music organisations based in the State of Palestine
 * Category:Organizations based in Bhopal
 * Category:Private and independent school organisations in the United States
 * Category:Film organizations in Belgium
 * Category:Organisations based in Orange County, California
 * Category:Members of the Parliamentary Assembly of the Collective Security Treaty Organisation
 * Category:Religious organizations based in Gibraltar
 * Category:Business organisations based in Turkmenistan
 * Category:Research organisations by country
 * Category:Migration-related organisations based in the United States
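
For illustration, a random sample like the one above could be drawn with a small wrapper around `shuf` from GNU coreutils. The function name and file names here are hypothetical, and the exact invocation used for the trial is not recorded in this discussion.

```bash
# sample_titles N FILE
# Print N distinct lines of FILE, chosen at random (sampling without replacement).
sample_titles() {
  shuf -n "$1" "$2"
}
```

For example, `sample_titles 50 edit-list.txt > trial-list.txt` would pick 50 titles for a trial run. Because `shuf -n` never repeats an input line, the sample contains no duplicates as long as the source list has none.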


 * Here are the 50 trial edits.
 * No pages were skipped, and I have reviewed each of the 50 edits. The redirects are all as intended. -- BrownHairedGirl (talk) • (contribs) 10:17, 4 August 2020 (UTC)
 * Primefac (talk) 00:37, 6 August 2020 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.