Wikipedia:Bots/Requests for approval/TolBot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

TolBot 3
Operator:

Time filed: 19:04, Monday, May 3, 2021 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available:

Function overview: Add NOINDEX to old WikiProject Spam reports

Links to relevant discussions (where appropriate): Bot request

Edit period(s): Continuous (one time)

Estimated number of pages affected: ~6809:
 * ~6799 subpages of Wikipedia:WikiProject_Spam/LinkReports;
 * ~10 subpages of Wikipedia:WikiProject_Spam/Local.

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: The bot:
 * 1) Logs in;
 * 2) Gets the list of pages with a certain prefix;
 * 3) Checks whether each page is in a category (Category:Noindexed pages);
 * 4) If it is not, adds a newline and NOINDEX to the page.
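The check-and-tag steps above can be sketched as follows. This is a minimal illustration, not the bot's actual source: the function names are hypothetical, the page data is passed in as a plain dictionary instead of being fetched from the wiki, and it assumes that "NOINDEX" refers to the `__NOINDEX__` magic word.

```python
# Hypothetical sketch of steps 3 and 4; login and page listing are omitted.
NOINDEX_CATEGORY = "Category:Noindexed pages"
NOINDEX_MARKER = "__NOINDEX__"  # assumption: the magic word form of NOINDEX


def needs_noindex(categories):
    """Step 3: a page needs tagging if it is not in the noindexed category."""
    return NOINDEX_CATEGORY not in categories


def append_noindex(wikitext):
    """Step 4: add a newline and the NOINDEX marker to the page text."""
    return wikitext + "\n" + NOINDEX_MARKER


def plan_edits(pages):
    """Map each title that still needs tagging to its new wikitext.

    `pages` maps a title to a (categories, wikitext) pair.
    """
    return {
        title: append_noindex(text)
        for title, (categories, text) in pages.items()
        if needs_noindex(categories)
    }
```

Keeping the decision logic separate from the network calls makes it possible to dry-run the plan on a handful of pages before any edit is saved.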

Discussion
A quick note: The estimated number of pages is an absolute maximum; the number will be far smaller than that. However, each of those pages must be checked to see how many still have to be NOINDEX-ed (which is taking a very long time). A better estimate will probably come sometime within the next day. Tol | Talk | Contribs 19:10, 3 May 2021 (UTC)
 * I had meant to ask this at BOTREQ, but I suppose now it’s here I might as well ask here! What problem exactly is this solving? I don’t see a massive issue with it, but I’m not sure exactly why it needs to be done... ƒirefly  ( t · c ) 19:41, 3 May 2021 (UTC)
 * Some older spam reports are not properly NOINDEX-ed, so they may show up in search engines. This can cause issues when people using those search engines find these spam reports and misinterpret them. Tol | Talk | Contribs 23:32, 3 May 2021 (UTC)

All the pages seem to contain User:COIBot/Summary/LinkReports (embedded as a template). Can't we just add NOINDEX into that? ProcrastinatingReader (talk) 20:00, 3 May 2021 (UTC)
 * That sounds like a good idea; however, this template transclusion count says there are only 185,724 transclusions. Unless this counter is wildly inaccurate, it would seem that some pages lack this template. Tol | Talk | Contribs 23:32, 3 May 2021 (UTC)
 * Could be some other template on the other pages. Best to get the number affected as low as possible by adding the template to any already-transcluded templates, then redoing the numbers for pages affected to proceed here. ProcrastinatingReader (talk) 07:37, 4 May 2021 (UTC)
 * "However, each of those pages must be checked to see how many still have to be NOINDEX-ed (which is taking a very long time). A better estimate will probably come sometime within the next day." Are you trying to do that with a bot? A search immediately shows there are 21,574 pages. Many of these transclude User:COIBot/Summary/LinkReports, as PR says above. I edited that template to add NOINDEX. Let's wait for some time for the job queue to settle, and we should hopefully have a lower figure. – SD0001  (talk) 11:17, 4 May 2021 (UTC)
 * Adding: this search shows that 14,775 of those pages transclude that template, so should get noindexed soon. – SD0001  (talk) 11:19, 4 May 2021 (UTC)
 * Yes, for some reason I was. The same search now shows 15,206 pages left (which is probably still too high); I looked at a few and most (on the first few pages) have an ambox header. Thanks for editing the template. Tol | Talk | Contribs 19:29, 4 May 2021 (UTC)


 * I think part of the problem is that many of these pages are old and are not re-parsed unless read. As the person who is complaining about one of the pages appearing in a Google result is not disclosing which page it is, I cannot check. I presume it is a very old report that is (or was) not noindexed and hence appears in Google's search results. This exercise would ensure that every single COIBot report is actively noindexed, so that removing it from the Google search results will actually stick (this is not the first time that I have received a 'complaint' that a report is showing up in Google, even though reports have been noindexed by default for years).
 * Probably most of them could be blanket deleted, but some of them are representations of evidence that has been used in admin actions, which I think should be visible to everyone (but not to Google). --Dirk Beetstra T  C 12:18, 5 May 2021 (UTC)
 * A mass null edit run on those pages would probably get them no-indexed if the issue is caching. Headbomb {t · c · p · b} 13:29, 7 May 2021 (UTC)
 * I believe a template edit causes all pages transcluding the template to be re-parsed – as appears to have happened here with the count of search results coming down. – SD0001  (talk) 18:55, 13 May 2021 (UTC)


 * @ProcrastinatingReader; @SD0001: There are no more indexed pages transcluding the template. There appear to be 6799 subpages of Wikipedia:WikiProject_Spam/LinkReports and 10 subpages of Wikipedia:WikiProject_Spam/Local which still need to be NOINDEXed. It does not appear that any of them transclude other templates which could be used for another mass NOINDEXing. Tol | Talk | Contribs 18:34, 13 May 2021 (UTC)
 * Ok, so should the remaining pages be tagged with NOINDEX or with (which applies the former)? Also pinging @Beetstra. –  SD0001  (talk) 18:53, 13 May 2021 (UTC)
 * Guess the latter would be nicer, thanks for the effort! Dirk Beetstra T C 18:49, 14 May 2021 (UTC)
 * I can prepend that to each page. Tol | Talk | Contribs 18:56, 14 May 2021 (UTC)
 * Also please update the task description to that effect. – SD0001  (talk) 19:09, 14 May 2021 (UTC)
 * Done; thank you. I'll get the trial done soon. Tol | Talk | Contribs 20:02, 14 May 2021 (UTC)
 * Diffs are here; sorry it took so long. I changed it to use search instead of checking every single page (which I really should have thought of sooner). Tol | Talk | Contribs 22:31, 14 May 2021 (UTC)
 * Looks quite straightforward. Not seeing any issues. Do set the bot flag on edits though (put bot=true in the mw:API:Edit request). – SD0001  (talk) 03:54, 15 May 2021 (UTC)
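As a rough illustration of the suggestion above, the POST parameters for a flagged prepend edit could be assembled as below. The helper name, the example title, and the placeholder values are hypothetical; `action`, `title`, `prependtext`, `summary`, `bot`, `nocreate`, and `token` are standard mw:API:Edit parameters. The text to prepend is left as an argument, and the `bot` parameter only takes effect if the account holds the bot right.

```python
def build_edit_params(title, prepend_text, csrf_token, summary):
    """Build mw:API:Edit POST parameters for a bot-flagged prepend edit.

    Hypothetical helper: it only assembles the request body and does not
    contact the wiki.
    """
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "prependtext": prepend_text,  # text added at the top of the page
        "summary": summary,
        "bot": "true",       # mark the edit as a bot edit
        "nocreate": "true",  # never create a page that does not exist
        "token": csrf_token,
    }


# Example with placeholder values (not real page names or tokens).
params = build_edit_params(
    title="Wikipedia:WikiProject_Spam/LinkReports/example.com",
    prepend_text="PLACEHOLDER\n",
    csrf_token="PLACEHOLDER_TOKEN",
    summary="Noindexing old spam report",
)
```

The resulting dictionary would be sent as the body of a POST request to the wiki's `api.php` endpoint by whatever HTTP client the bot uses.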
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.