Wikipedia:Bots/Requests for approval/RonBot 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

RonBot 5
Operator:

Time filed: 15:33, Sunday, June 10, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: User:RonBot/5/Source1

Function overview: Removal of succession Boxes from Music Articles

Links to relevant discussions (where appropriate): Wikipedia_talk:Manual_of_Style/Record_charts via Bot_requests

Edit period(s): Large run to start, then a weekly run

Estimated number of pages affected: Initially around 7000

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: API call to find pages with a music infobox (needs 3 calls), then remove the entire succession box. If the box is preceded by a header line (with only whitespace between the "=" and the "{"), then the header line will be removed as well. If the bot finds a NavBox constructed in the page, it will skip the page and mark it for manual removal (there are only a few succession boxes inside a NavBox, so it is not worth all the extra programming to try to catch all variants).
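The removal step described above can be sketched in Python with a single regular expression. This is a minimal illustration and not the bot's actual source: the pattern, the name `remove_succession_box`, and the assumption that every box opens with `{{S-start}}` and closes with `{{S-end}}` or `{{end}}` are hypothetical.

```python
import re

# Hypothetical pattern: an optional section heading (2 to 6 "=" signs) with
# only whitespace between it and the box, then everything from {{S-start}}
# to the first closing {{S-end}} or {{end}} template.
SUCCESSION_BOX = re.compile(
    r"(?:^={2,6}[^=\n]+={2,6}[ \t]*\n\s*)?"
    r"\{\{\s*s-start\b.*?\{\{\s*(?:s-)?end\s*\}\}",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def remove_succession_box(wikitext: str) -> str:
    """Return wikitext with every matched succession box (and heading) removed."""
    return SUCCESSION_BOX.sub("", wikitext)


page = (
    "Intro text.\n\n"
    "== Chart succession ==\n"
    "{{S-start}}\n{{S-bef|before=A}}\n{{S-ttl|title=B}}\n"
    "{{S-aft|after=C}}\n{{S-end}}\n\n"
    "[[Category:Songs]]\n"
)
print(remove_succession_box(page))
```

A heading is only removed when nothing but whitespace separates it from the box, so a heading that is followed by other content is left alone.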

Discussion
25 for songs, 25 for albums. Headbomb {t · c · p · b} 16:12, 11 June 2018 (UTC)
 * Mainly good so far, boxes removed as planned. A few wrinkles showed up...
 * It helps to have "return" at the end of a subroutine, doh!
 * Led Zeppelin IV - an "=" on the preceding line, got trapped by the RegEx and removed. RegEx changed to look for 2 to 6 "=" signs, not one.
 * The manual template got added again on a second sweep Hallelujah (Leonard Cohen song) - now checks for its presence and skips.
 * A couple of non-music articles (film, video game) got caught - they had an "infobox Album" on the same page - now looks for infoboxes that are not song/album/single in the article; if found, will skip.
 * Difficult to do a second run (I was trialling at 5+5 to start), as wiki obviously does not update its search index very fast. The second run went through all the files done before - now checking each one still has a "S-start" template before processing.
 * Edit summary wrong when just adding the manual template - fixed.
 * Above can be seen at Special:Contributions/RonBot 18:51, 11 June 2018 to 20:47, 11 June 2018. Will finish trial tomorrow (should allow wiki's index to catch up). Ron h jones (Talk) 23:24, 11 June 2018 (UTC)
 * Final 20 of trial now run Special:Contributions/RonBot 18:20, 12 June 2018 to 18:21, 12 June 2018. All articles were music ones. The following were found in the search, but were skipped as they had a non-music infobox as well.
 * Howard the Duck (film)
 * Super Mario Galaxy
 * As Good as It Gets
 * Whiplash (2014 film)
 * Precious (film)
 * Half-Life 2
 * Cats (musical)
 * Ron h jones (Talk) 18:33, 12 June 2018 (UTC)
 * Edits like these are rather problematic. Headbomb {t · c · p · b} 18:54, 12 June 2018 (UTC)
 * I checked the skipped film, etc., articles and the succession boxes are not for record charts, so they're OK. The rest look fine, except the one that didn't remove the === headers (I fixed it).  I think there's only a few where the succession boxes are in a separate section with "succession" in the header. —Ojorojo (talk) 19:27, 12 June 2018 (UTC)
 * I see that was Hound Dog (song) - back to the RegEx tester! Ron h jones (Talk) 21:50, 12 June 2018 (UTC)
 * I made that RegEx more complicated than it needed to be (after it selected the single "="). Looks better now. I'll put that revision into a user sandbox and test again. Ron h jones (Talk) 21:56, 12 June 2018 (UTC)
 * See https://en.wikipedia.org/w/index.php?title=User:Ronhjones/Sandbox4&diff=prev&oldid=845606912 Ron h jones (Talk) 22:37, 12 June 2018 (UTC)
 * Looks good. FWIW, I found about 42 uses of level 2, 3, and 4 headers with "Chart succession". —Ojorojo (talk) 16:30, 13 June 2018 (UTC)
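The guard checks that came out of this first trial (skip pages with a non-music infobox, skip pages that no longer contain "S-start", and do not add the manual tag twice) could look roughly like the sketch below. It is only an assumption-laden illustration: the function names are invented, the set of music infobox names is taken from the song/album/single wording above, and the manual tag text is left as a caller-supplied parameter because the discussion does not name the template.

```python
import re

# Infobox types treated as music articles, per the discussion (assumption:
# the real bot may recognise other names as well).
MUSIC_INFOBOXES = {"song", "album", "single"}

# Captures the infobox type, e.g. "film" from "{{Infobox film".
INFOBOX_NAME = re.compile(r"\{\{\s*infobox[ _]+([^|}\n]+)", re.IGNORECASE)
S_START = re.compile(r"\{\{\s*s-start\b", re.IGNORECASE)


def has_non_music_infobox(wikitext: str) -> bool:
    """True if any infobox on the page is not a song/album/single infobox."""
    return any(
        m.group(1).strip().lower() not in MUSIC_INFOBOXES
        for m in INFOBOX_NAME.finditer(wikitext)
    )


def should_process(wikitext: str) -> bool:
    """Skip pages already cleaned (stale search index) or not purely musical."""
    if not S_START.search(wikitext):
        return False
    return not has_non_music_infobox(wikitext)


def tag_for_manual_removal(wikitext: str, tag: str) -> str:
    """Append the (hypothetical) manual-removal tag unless already present."""
    if tag in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + tag + "\n"
```

Checking the page text directly for "S-start" makes a repeat run safe even when the search index still lists pages that were already processed.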

Again 25 songs, 25 albums. I also take it the problematic edits from the previous bot run were reverted/fixed? 18:17, 14 June 2018 (UTC)
 * All problematic edits from the previous bot run were fixed before run. See Special:Contributions/RonBot 16:11, 15 June 2018 to 16:23, 15 June 2018. All music articles, all look OK to me. Ron h jones (Talk) 16:32, 15 June 2018 (UTC)
 * I reviewed the diffs and they look fine. On Imagine though, the succession boxes were removed, but an empty navbox titled "Chart procession and succession" remained.  I didn't see that it was added to Category:Music pages for manual succession box removal, but I fixed it. —Ojorojo (talk) 17:55, 15 June 2018 (UTC)
 * Trouble with public writing articles, they are never consistent. OK, I see the minor issue - I should have looked for "Navboxes}}" and not "Navboxes". C'est la vie. I suppose we had better think about another trial, maybe a bit bigger? Ron h jones (Talk) 18:12, 15 June 2018 (UTC)
 * Yes, it seems one can always count on one more variation. I thought I removed most of the navboxes with succession boxes, but there are 10 with "Chart procession and succession" (similar to Imagine). I can remove these if it's easier. —Ojorojo (talk) 15:26, 16 June 2018 (UTC)
 * Since we also want this to run as a weekly check once the bulk have been processed - I'd rather get it right now. Ron h jones (Talk) 16:15, 16 June 2018 (UTC)
 * Tested in my sandbox4 - now correctly runs - the debug output (below) is OK (the 1 0 above the "writing" indicates 1 Navbox found, and zero Navbox or Navboxes with the closing braces - i.e. a Navbox constructed on the page)

main.pagepage allow bot to edit page
Pages done so far 0
================================TOP OF ORIG======================
================================BOTTOM OF ORIG======================
1 1 0 0
SStart 1
Manual 0
1 0
writing
page tagged manual
++++++++++++++++++++++++++++++++++++++++++++TOP OF NEW+++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++BOTTOM OF NEW++++++++++++++++++++
User:Ronhjones/Sandbox4 Ron h jones (Talk) 15:26, 17 June 2018 (UTC)
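One reading of the "1 0" counts in the debug output is a comparison between every mention of "Navbox" and the mentions that are immediately closed with "}}". A plain transclusion such as a template whose name ends in "navboxes}}" is closed straight away, while a navbox built on the page continues with parameters. The sketch below illustrates that idea only; the function names are hypothetical and the bot's real check may differ.

```python
import re


def navbox_counts(wikitext: str) -> tuple[int, int]:
    """Return (all 'navbox' mentions, mentions closed at once by '}}')."""
    total = len(re.findall(r"navbox", wikitext, re.IGNORECASE))
    closed = len(re.findall(r"navbox(?:es)?\s*\}\}", wikitext, re.IGNORECASE))
    return total, closed


def has_inline_navbox(wikitext: str) -> bool:
    """True when at least one navbox is constructed on the page itself."""
    total, closed = navbox_counts(wikitext)
    return total > closed


constructed = (
    "{{Navboxes|title=Chart procession and succession|list=\n"
    "{{S-start}}\n{{S-end}}\n}}\n"
)
print(navbox_counts(constructed), has_inline_navbox(constructed))
```

A page flagged this way would be skipped and tagged for manual removal rather than edited automatically.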

One additional issue is that it often replaces succession boxes with empty lines, e.g., and seems to leave some blank lines alone when it should remove them. It should replace them with nothing at all. Headbomb {t · c · p · b} 00:53, 18 June 2018 (UTC)

250 songs, 250 albums. After the navbox/whitespace issues are resolved. Also, point to this BRFA during the trial for the edit summaries. Headbomb {t · c · p · b} 00:57, 18 June 2018 (UTC)
 * OK, I'll check the ones done and see how we can ensure a clean removal. Ron h jones (Talk) 01:40, 18 June 2018 (UTC)
 * OK, I've done some dummy runs to make sure I'm picking up the blank line. The text was being deleted from the start of the first line to the final "end}}", thus leaving the newline following that section intact. Adding a "\n" to the end of the RegEx grabs that newline as well. The question before we do the run is: do you want to eliminate double blank lines? The reason is that most (not all) are like...

<blanklineA>
{{S-start}}
...
end}}
<blanklineB>
 * So last time we got blanklineA, then a blank line (hiding at the end of the "end}}"), then blanklineB, which equals three blank lines. With a \n added at the end of the RegEx, we reduce that to two blank lines (blanklineA then blanklineB). Do we want to cull that to one blank line? One easy way to do that would be another RegEx replace turning two blank lines into one blank line, but that would work over the whole page; otherwise we would have to have a separate check for two \n after the end}}, if we only want to fix that double blank line. Ron h jones (Talk) 19:43, 19 June 2018 (UTC)
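The second option (tidying blank lines only where the box was removed, not across the whole page) can be sketched as follows. This is an illustrative assumption and not the bot's code: the pattern, the function name `remove_box_and_tidy`, and the choice to leave exactly one blank line at the join are all hypothetical.

```python
import re

# Box from {{S-start}} to the closing {{S-end}} or {{end}}, plus the newline
# that directly follows it (the "\n" added to the end of the RegEx).
BOX = re.compile(
    r"\{\{\s*s-start\b.*?\{\{\s*(?:s-)?end\s*\}\}\n?",
    re.IGNORECASE | re.DOTALL,
)


def remove_box_and_tidy(wikitext: str) -> str:
    """Remove the first box and leave one blank line where it used to be."""
    match = BOX.search(wikitext)
    if match is None:
        return wikitext
    # Trim newlines only at the cut, so spacing elsewhere is untouched.
    before = wikitext[: match.start()].rstrip("\n")
    after = wikitext[match.end():].lstrip("\n")
    if not before:
        return after
    if not after:
        return before + "\n"
    return before + "\n\n" + after


page = "Text.\n\n{{S-start}}\n{{S-bef|before=A}}\n{{S-end}}\n\n[[Category:X]]\n"
print(repr(remove_box_and_tidy(page)))
```

Because only the text on either side of the cut is trimmed, runs of blank lines elsewhere in the article are not changed.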


 * Well, basically everything between start/end of those templates should be purged. And leading/trailing whitespace normalized accordingly. Headbomb {t · c · p · b} 19:47, 19 June 2018 (UTC)
 * I've done a quick 50 trial, just to ensure that the changes to the RegEx went as planned. It seems to be doing all we are asking it to do - Special:Contributions/RonBot 18:14 to 18:21 20th June. Even one succession box found in a navbox and tagged properly. Please have a look at the diffs; if they look good, then I'll scale up. Ron h jones (Talk) 18:28, 20 June 2018 (UTC)


 * I looked through all of them and couldn't find any problems. Good to go. —Ojorojo (talk) 19:44, 20 June 2018 (UTC)
 * Famous last words... Made it up to 250 total, and one tiny wrinkle... Rivers of Babylon - has a double header before the boxes. Reverted back, so I can trial it when tweaked. The other 249 are fine. Ron h jones (Talk) 20:18, 20 June 2018 (UTC)
 * Tweaked the code - now deletes both headings (dummy run only). Will do the second lot of 250 tomorrow (after Wiki indexes the pages done properly), including that page. Ron h jones (Talk) 20:49, 20 June 2018 (UTC)
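A double header can be handled by letting the heading part of the pattern repeat. The sketch below is a guess at the shape of such a tweak, not the code that was actually deployed; the pattern and the name `remove_box_with_headings` are hypothetical, and the sample text is invented for illustration.

```python
import re

# Zero or more consecutive heading lines (2 to 6 "=" signs each) directly
# before the box, then the box itself and the newline that follows it.
HEADINGS_THEN_BOX = re.compile(
    r"(?:^={2,6}[^=\n]+={2,6}[ \t]*\n\s*)*"
    r"\{\{\s*s-start\b.*?\{\{\s*(?:s-)?end\s*\}\}\n?",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def remove_box_with_headings(wikitext: str) -> str:
    """Remove each box together with every empty heading stacked above it."""
    return HEADINGS_THEN_BOX.sub("", wikitext)


page = (
    "Body\n\n"
    "==Chart succession==\n"
    "===Example version===\n"
    "{{s-start}}\n{{s-bef|before=A}}\n{{end}}\n"
    "Tail\n"
)
print(repr(remove_box_with_headings(page)))
```

Only headings with nothing but whitespace between them and the box are consumed; a heading followed by other content stops the match and is kept.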

Final 250 in one run. I've looked at them all, they have removed exactly what was required - including the problem Rivers of Babylon with the double headers. Special:Contributions/RonBot 20:07, 21 June 2018 to 20:23, 21 June 2018 Ron h jones (Talk) 20:26, 21 June 2018 (UTC)
 * I've reviewed a few, and all seem fine with the correct whitespace and everything. Gonna take it on faith that all 500 were fine too. Headbomb {t · c · p · b} 20:36, 21 June 2018 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.