Wikipedia:Bots/Requests for approval/Josvebot 12


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Josvebot 12
Operator:

Time filed: 23:34, Sunday, April 17, 2016 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AutoWikiBrowser

Source code available: AWB

Function overview: Will tag orphaned talk pages (talk pages without an existing corresponding article page) for speedy deletion.

Links to relevant discussions (where appropriate):

Edit period(s): Continuous

Estimated number of pages affected: 100-3000 per run

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Will run this SQL and prepend the speedy deletion tag to all listed talk pages using AutoWikiBrowser. If you are a sysop, see this edit I did with my account.

Discussion

 * Question: Will this bot operate only in article space, i.e. main space? There are many talk pages outside of article space that do not have corresponding non-talk pages, e.g. Help talk:Citation Style 1/Archive 7. Thanks – Jonesey95 (talk) 03:24, 18 April 2016 (UTC)
 * The quarry results are showing pages in all sorts of namespaces - I think there are too many false-positives if you were to do this. — xaosflux  Talk 03:40, 18 April 2016 (UTC)
 * I disagree with this strategy, as it would flood CAT:CSD - would be better to make it in to an on-wiki report for processing. —  xaosflux  Talk 03:40, 18 April 2016 (UTC)
 * I can limit this to namespace-related talk pages if that seems more appropriate. (And I agree, it does. See the question raised by me below.) I can also space out the edits to one tag per minute, in order not to flood CSD (60 tags per hour doesn't seem like flooding, imo). Since this will be continuous, it seems better to tag the pages directly with a bot than to list them and wait for a human to tag them, most likely en masse, which really would flood things. (t) Josve05a  (c) 05:56, 18 April 2016 (UTC)
 * You say the query results in non-mainspace-results; can you give an example? I can only find namespace-matches. (t) Josve05a  (c) 07:48, 18 April 2016 (UTC)
 * The query should exclude results with titles containing slashes, such as subpages. (t)  Josve05a  (c) 07:50, 18 April 2016 (UTC)
 * In your own link above, look at the namespace results; notice 2600:Topic; 711:Timed Text Talk; 447:Education Program talk; 119:Draft Talk, etc. — xaosflux  Talk 11:48, 18 April 2016 (UTC)
 * Oh right, sorry. I haven't coded in exceptions for those namespaces yet, and I was looking at a downloaded query result which had filtered out all !=1 titles. I will only operate on namespace talk pages. (t) Josve05a  (c) 12:27, 18 April 2016 (UTC)
 * Thank you. — xaosflux  Talk 13:00, 18 April 2016 (UTC)
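
The namespace restriction agreed above can be sketched roughly as follows. This is only an illustration in Python, not the bot's actual AWB or SQL configuration: the row format and the name `filter_article_talk` are hypothetical, and the sketch assumes each query result row carries a numeric namespace ID, with 1 being the article Talk namespace.

```python
# Hypothetical sketch: keep only article-talk rows (namespace 1).
TALK_NAMESPACE = 1  # MediaWiki namespace ID for "Talk:"


def filter_article_talk(rows):
    """Return only the rows that sit in the article Talk namespace.

    `rows` is an iterable of (namespace_id, title) tuples; this format is
    an assumption made for the example.
    """
    return [(ns, title) for ns, title in rows if ns == TALK_NAMESPACE]


rows = [
    (1, "Example article"),    # Talk
    (119, "Example draft"),    # Draft talk
    (2600, "Example topic"),   # Topic
    (447, "Example course"),   # Education Program talk
]
print(filter_article_talk(rows))
```

Filtering on the numeric ID rather than on a title prefix avoids depending on how each namespace name happens to be spelled in the query output.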

I'm still concerned with your false positive rate. Can you run your query and produce a wiki page of all the pages that your bot would tag? This needs to be reviewed by humans to find out your error rate before this can even begin live trials. Your bot may run these reports in its own userspace.
 * — xaosflux  Talk 13:00, 18 April 2016 (UTC)
 * Ignoring all the already deleted pages, which the bot will skip, here is the list: User:Josvebot/Orphaned talk pages/2016-04-18. (t) Josve05a  (c) 14:54, 18 April 2016 (UTC)
 * Thank you - a very quick check of some random pages looks clean - would like to give anyone else some time to go through these to see if there is a legitimate reason they have been left behind. I'll post at AN for feedback. — xaosflux  Talk 15:18, 18 April 2016 (UTC)
 * This message is being sent to inform you that there is currently a discussion at Administrators' noticeboard regarding an issue with which you may have been involved. Thank you. —  xaosflux  Talk 15:22, 18 April 2016 (UTC)
 * Note: removed "Trial" tags - the report looks good for review - after comments an actual trial of the bot's actions will be needed. — xaosflux  Talk 15:24, 18 April 2016 (UTC)


 * It seems to mistakenly list talk pages which are redirects and are talk pages of articles which are also redirects. is the talk page of the existing  and should not be deleted as G8 -- and that's the first link I spot-checked! ☺ ·   Salvidrim!   ·  ✉  15:27, 18 April 2016 (UTC)
 * Hm, that's actually an interesting thing. The "redirect" article was created today (merely hours before the query), so there was some server lag... I'll try to work on a fix for that. (t) Josve05a  (c) 15:55, 18 April 2016 (UTC)
 * Actually both the article and its talk page were auto-renamed by AnomieBOT (for dashes reasons), but for some reason the talk page was renamed approx. 9 hours before the article, and your list happened to be generated within the time gap between the first move and the second one (server-wise). ☺ ·  Salvidrim!   ·  ✉  16:01, 18 April 2016 (UTC)
 * Hmm... I didn't see that the article was created after the talk page (for whatever strange reason). This should not happen too often and should be well below the acceptable false positive ratio. (But I'll try and see if I can do something about this.) (t) Josve05a  (c) 16:03, 18 April 2016 (UTC)
 * If the list were smaller, I could effectively do a dry run before each tagging session and remove all occurrences of existing article pages with AWB. But it can't be checked while tagging. (t) Josve05a  (c) 16:19, 18 April 2016 (UTC)

What about subpages? Please set the bot to ignore pages with slashes in the title, unless both (1) the slash-free title doesn't exist [e.g. Talk:AC/DC would be ignored as long as Talk:AC exists], and (2) the corresponding article doesn't exist. I'm just concerned that the bot might start tagging archive pages (after all, United States/Archive 22 isn't an article), making extra work and perhaps causing a few pages to be deleted that shouldn't be. Also, how does the bot handle G8 exceptions? WP:G8 reminds us that we shouldn't delete useful-but-orphaned pages, and that we should tag such pages with G8-exempt; does the bot know that it shouldn't touch those pages? These are the only exceptions that come to mind; otherwise I think this bot is a great idea. Nyttend (talk) 17:49, 18 April 2016 (UTC)
 * Both these things are caught and excluded in the SQL query. See the discussion at the top for slashes. The SQL should not list articles tagged with that template, and if you look at the example I listed in the description at the top (if you are a sysop), you'll see that the edit summary asks you to replace the db-tag with the exempt tag if the page was tagged in error. (t) Josve05a  (c) 18:51, 18 April 2016 (UTC)
 * I also have a custom 'skip-RegEx' for things such as the exempt template, active MfD and CSD templates, etc. when editing. (t) Josve05a  (c)
 * I'm sorry; I missed the part about slashes when I was reading quickly through this discussion, and I didn't pay attention to the edit summary when I looked at the deleted test. No remaining concerns on this issue.  New issue: Sphilbrick's comment about articles in talkspace.  What if the bot checked to see if each orphaned talk page has a deletion-log entry for its corresponding article, and if there's no such entry (i.e. the article never existed), the bot adds a cleanup category to the talk page?  I'm thinking something like "Possibly orphaned talk pages", which of course would be tagged with Empty category; we admins could always run through it at random, deleting or draftifying them as appropriate.  Right now I'm sleepy, so I have to acknowledge that perhaps I missed a spot in which you addressed it.  Nyttend (talk) 05:47, 19 April 2016 (UTC)
 * I have no way to check this with my current set-up. I do, however, believe the number of "real" articles in talk space that are worth keeping is so low as to be minuscule. I could make the bot add something like instead of tagging with, but I do not see that this would warrant a complete re-coding of everything. (t)  Josve05a  (c) 11:21, 19 April 2016 (UTC)
 * I see here that the bot will ignore pages tagged with G8-exempt, but will it also ignore pages in Category:Wikipedia orphaned talk pages that should not be speedily deleted? Asking since all pages tagged with the template will be in that category, but not necessarily all pages in that category will be tagged with that template. Also, G8-exempt has incoming redirects. Steel1943  (talk) 15:51, 22 April 2016 (UTC)
 * It should skip those as well, but they do still show up in the list of articles that the bot will list and "try and edit"; it will not be able to edit or process those. (t) Josve05a  (c) 15:57, 22 April 2016 (UTC)
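
The slash and exemption rules discussed in this thread can be sketched as a single decision function. This is a hedged illustration of the rule as Nyttend phrases it, not the bot's real SQL query or AWB skip-RegEx. The name `should_tag`, the set-based inputs, and the choice to treat everything before the first slash as the "slash-free title" are all assumptions made for the example.

```python
# Hypothetical sketch of the "should this talk page be tagged?" decision.
def should_tag(talk_title, existing_pages, exempt_pages=frozenset()):
    """Decide whether an article talk page looks like a G8 candidate.

    talk_title:     full title such as "Talk:AC/DC" (assumed format).
    existing_pages: set of full titles that currently exist.
    exempt_pages:   set of talk titles marked as G8-exempt (assumed input).
    """
    if talk_title in exempt_pages:
        return False  # never touch exempt pages

    _, _, article = talk_title.partition(":")
    if article in existing_pages:
        return False  # the corresponding article exists

    if "/" in article:
        # Assumption: the "slash-free title" is the part before the first slash.
        base_talk = "Talk:" + article.split("/", 1)[0]
        if base_talk in existing_pages:
            return False  # likely a subpage or archive of an existing talk page

    return True


existing = {"United States", "Talk:United States", "Talk:AC"}
print(should_tag("Talk:United States/Archive 22", existing))
print(should_tag("Talk:AC/DC", existing))
print(should_tag("Talk:Deleted article", existing))
```

Note that the operator's query is described above as excluding all titles with slashes, which is stricter than this sketch.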

Why does Talk:Ocean Beach (Bluff Harbour) appear in your query output? It appears to have a corresponding article page, and both pages were not created recently. I must be missing something. – Jonesey95 (talk) 19:55, 18 April 2016 (UTC)
 * Talk:Ocean Beach (Bluff Harbour) does not appear. does, because Ocean Beach (Bluff Habour) does not exist. ☺ ·   Salvidrim!   ·  ✉  20:01, 18 April 2016 (UTC)
 * I knew I was missing something. Thanks. – Jonesey95 (talk) 20:08, 18 April 2016 (UTC)

A few comments:
 * 1) I'd like to hear from  If that name doesn't register, that editor proposes orphaned talk pages at CSD every Monday morning. I don't know that editor's process, and there is, of course, no requirement that any such editor be exhaustive, but I'm puzzled by the observation that we have a dedicated editor working on this and apparently some items that may have been missed.
 * 2) I'd like to see a little more discussion of false positives. The first few items in this list require some careful review. Per MOS, articles about sports seasons should have an en-dash in the date. Per the very reasonable assumption that some people searching for such a page might do a query with an ordinary dash  rather than an en-dash, it is common to create a redirect with the ordinary dash. We ought to be clear on whether that redirect ought to have a talk page. I think a good argument is that it should not, but it isn't clear to me that it is an orphaned talk page.
 * 3) One of my concerns is that people mistakenly create an article in talk space rather than article space. This used to be more common before the draft space was created, but may still occur. I would prefer that such mistakes be moved to draft space, even though they technically are orphaned talk pages. I've made this proposal before without much success, but if a bot simply checks that the talk page exists and the article does not, it is likely to miss that the page was intended as an article.-- S Philbrick  (Talk)  23:31, 18 April 2016 (UTC)
 * Regarding point two, see above: this was caused by the "article" being created after the talk page but before the query run that made the list, so there was no redirect on the article page then.
 * Regarding point three, I've yet to see an article in talk space which was worth keeping (neither in draft nor in ns space). It is much more likely that someone creates a talk page asking a question such as "why doesn't this article exist" than that someone creates a complete article in talk. Also see my response to Nyttend above. (t) Josve05a  (c) 11:21, 19 April 2016 (UTC)

Adding some of my own comments/questions:
 * 1) Way back when, there was Database reports/Orphaned talk pages which I used to work on.  It broke when the Toolserver went away.  I think I prefer this list based method as it was easier to slam through the list instead of dealing with the bulleted format of CAT:CSD. Would it be easier to fix this report instead of writing a new bot?  used to work on that list a lot with me, so I would like to see what their opinion is.
 * 2) Does the bot honor G8-exempt?
 * 3) I'd like to see a throttle on the bot (assuming it doesn't go to the single page list) so that it doesn't tag more than 50 orphans at one time and then wait until the tagged count goes below a certain number before tagging the next set.  Similar to how HasteurBot was going through the G13 backlog.  That way CAT:CSD doesn't get slammed.
 * 4) Not a comment or question, just a pat on the back for  who has been doing a great job for a long time and I just wanted to take the opportunity to thank them again. -- Gogo Dodo (talk) 03:20, 19 April 2016 (UTC)
 * Regarding point one, not everyone likes every "method". However, I do believe tagging for deletion is better than listing on a "list page" and waiting for someone else (a non-admin) to tag them, especially if we get the list down to below 100. (t) Josve05a  (c) 11:21, 19 April 2016 (UTC)
 * Regarding point two, see multiple responses above. (Yes it does)
 * Regarding point three, this could be done. Or set a time throttle between each edit.
 * Regarding point four, I agree! Thanks for all your work Aleenf1. (t) Josve05a  (c) 11:21, 19 April 2016 (UTC)
 * When the report was running, there really was not much CSD tagging going on. What usually happened is that the report would run and some interested admin (usually myself or Athaenara) would slam through the list, deleting all the appropriate ones or doing whatever else was necessary.  That is the main disadvantage of a list: there are fewer admins potentially working on it because of its lack of visibility.  I just find it easier to deal with the list instead of CAT:CSD, but then I did a lot (IMHO) of the G13 deletions when HasteurBot was working through the backlog.
 * Regarding the throttle, a throttle of the number of orphans it tags at once a la HasteurBot is better than a time throttle per edit. The point of throttling the total number is to not overwhelm CAT:CSD.  If the bot is set to limit say one orphan per 5 seconds, you could still overwhelm CAT:CSD if no admin deletes the tagged orphans.  If there is a limit of 50 open tags, then CAT:CSD will never grow too large. -- Gogo Dodo (talk) 03:18, 20 April 2016 (UTC)
 * I'm all for compromises. I could keep posting these updates, and tag X number of files for csd per day. How does that sound? (t) Josve05a  (c) 05:00, 20 April 2016 (UTC)
 * It isn't tag X per day. It is on given interval N, tag X number of orphans unless X is already tagged either by the bot or somebody else and not deleted (i.e., count of Category:Candidates for speedy deletion as dependent on a non-existent page exceeds X). -- Gogo Dodo (talk) 06:20, 21 April 2016 (UTC)
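
One way to read the throttle described above is as a cap on open tags that gets topped up at each interval. The following Python sketch shows that reading only. It is not HasteurBot's or Josvebot's actual logic, and `next_batch`, `OPEN_TAG_LIMIT`, and the way the category count is supplied are hypothetical.

```python
# Hypothetical sketch of a "cap on open tags" throttle.
OPEN_TAG_LIMIT = 50  # the figure suggested in the discussion


def next_batch(queue, open_tag_count, limit=OPEN_TAG_LIMIT):
    """Return the pages to tag during the current interval.

    queue:          list of talk page titles still waiting to be tagged.
    open_tag_count: pages currently in the G8 candidate category, whether
                    tagged by this bot or by anyone else, and not yet deleted.
    """
    room = max(0, limit - open_tag_count)  # free slots left under the cap
    return queue[:room]


queue = [f"Talk:Orphan {i}" for i in range(120)]
print(len(next_batch(queue, open_tag_count=0)))   # category is empty
print(len(next_batch(queue, open_tag_count=45)))  # category is nearly full
print(len(next_batch(queue, open_tag_count=60)))  # category is over the cap
```

Because the batch size depends on the category count rather than on elapsed time, the bot stops tagging on its own if no admin is clearing the category.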


 * FYI: Thparkth resurrected Database reports/Orphaned talk pages with Community Tech Bot. I just wiped out the entire listing there (I only made a few mistakes).  Of course, now every time I see a fill in box, I want to paste in "G8: Talk page of a nonexistent or deleted page".  I think my deletion count is over 100k now. Woohoo! -- Gogo Dodo (talk) 06:29, 21 April 2016 (UTC)

Determining automated G8 tagging strategy
From the discussion above, it appears that the current backlog has been removed, and a list-based method at Database reports/Orphaned talk pages has been reactivated with another bot. That being said, in order for this tagging bot to go forward we need to know if there is a consensus for automated G8 tagging at all. — xaosflux  Talk 15:18, 22 April 2016 (UTC)


 * Discuss :
 * Community comments for this proposal have been solicited at the talk pages for CAT:CSD, WP:CSD. — xaosflux  Talk 15:28, 22 April 2016 (UTC)
 * Is there anything I can do to help "the discussion"? (t) Josve05a  (c) 07:13, 28 April 2016 (UTC)


 * Support :
 * 1) Support, but only if the bot also adds instructions to its tags to inform the reviewing administrator to remove the tag and replace it with G8-exempt in the event of false positives, and also ignores pages in Category:Wikipedia orphaned talk pages that should not be speedily deleted. Steel1943  (talk) 15:47, 22 April 2016 (UTC)


 * Oppose :

Determining automated G8 tagging strategy

 * Based on all of the discussion above, can you please generate a new list of all of the pages you would tag G8, and include below? This should take in to account all of the exemption criteria previously discussed. —  xaosflux  Talk 09:25, 28 April 2016 (UTC)
 * Creating that list right now. Another exemption note: the bot will skip talk pages of articles which are creation-protected. The bot will run on the article pages and check whether they exist; if they exist, the article will be skipped before being converted to a talk page in the queue. AWB automatically skips creation-protected articles, so those talk pages will not be edited. (t) Josve05a  (c) 13:16, 28 April 2016 (UTC)
 * The list has grown much shorter, and I've added a few more exceptions, which are not discussed above, but here is "today's" list User:Josvebot/Orphaned talk pages/2016-04-28. (t) Josve05a  (c) 13:22, 28 April 2016 (UTC)


 * Looks like you will be fighting with another bot, for example on Talk:2016-17 Southern Football League - in this case I don't think the other bot should be making that orphaned talk page, let's see if can give us some more details? —  xaosflux  Talk 13:35, 28 April 2016 (UTC)
 * Well, technically it is an orphaned talkpage, but I can set the bot to skip tagging if the talk page is a redirect, if that would be wanted/warranted. (t) Josve05a  (c) 13:41, 28 April 2016 (UTC)
 * Ugh. Apparently due to the sort of issue described at T115517 the database replica is missing the page table record for 2016–17 Southern Football League, so AnomieBOT doesn't know about it to create the redirect at 2016-17 Southern Football League. Anomie⚔ 14:23, 28 April 2016 (UTC)
 * As said, the bot can skip redirects pending a fix to this issue. I suggest a real trial, to test the bot live. (t) Josve05a  (c) 19:55, 28 April 2016 (UTC)


 * - OK, back to trial you can live tag these as well. —  xaosflux  Talk 00:48, 6 May 2016 (UTC)
 * So far tagged about 100 pages. Only one of these pages has so far been kept, and moved to draft space. Seems like good numbers to me. (t) Josve05a  (c) 12:41, 7 May 2016 (UTC)
 * Around 125-160 edits have been made. I'm not sure of the exact number, since I can't see deleted edits and the bot has skipped some pages when processing. I'm closing this trial for now, so as not to make more edits than the trial allowed. It seems that only one tagged page was kept, and it was moved to draft space. Seems like a success to me. At least no one has complained yet. (t)  Josve05a  (c) 15:48, 10 May 2016 (UTC)
 * — xaosflux  Talk 02:38, 11 May 2016 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.