Wikipedia:Bots/Requests for approval/AilurophobiaBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

AilurophobiaBot
Operator: ~ Ame I iorate U T C @

Automatic or Manually Assisted: Automatic

Programming Language(s): AWB (possibly assisted by a C# app)

Function Summary: Escaping categories in user space pages

Edit period(s) (e.g. Continuous, daily, one time run): Sporadically. Most likely once or twice a week.

Already has a bot flag (Y/N): N

Function Details: AilurophobiaBot escapes article categories in userspace pages. It does this by replacing  with

The regex includes filtering to prevent it from removing legitimate userspace categories. It ignores any category containing Wiki, Bot, User, task force (inc. taskforce), Possible, Candidates, proofreaders, translators, workgroup, admin or proxies (matching is case-insensitive). It also ignores specific categories: Category:Vandalism Control Network members, Category:Non-talk pages that are automatically signed and Category:The IC Star Recipients.
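The filtering described above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual AWB regex: the function names are hypothetical, and it assumes "escaping" means the usual leading-colon form, so that the page links to the category instead of being placed in it. Dropping the sort key on escaped links is also an assumption.

```python
import re

# Substrings that mark a category as acceptable in userspace (from the list above).
EXCLUDED_KEYWORDS = [
    "wiki", "bot", "user", "task force", "taskforce", "possible",
    "candidates", "proofreaders", "translators", "workgroup",
    "admin", "proxies",
]

# Specific categories that are always left alone (compared in lower case).
EXCLUDED_CATEGORIES = {
    "vandalism control network members",
    "non-talk pages that are automatically signed",
    "the ic star recipients",
}

# Matches [[Category:Name]] and [[Category:Name|sortkey]], but not the
# already escaped form [[:Category:Name]].
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+)(\|[^\]]*)?\]\]", re.IGNORECASE
)


def is_userspace_category(name):
    """Return True if the category should be left untouched."""
    lowered = name.strip().lower()
    if lowered in EXCLUDED_CATEGORIES:
        return True
    return any(keyword in lowered for keyword in EXCLUDED_KEYWORDS)


def escape_categories(wikitext):
    """Escape article categories, leaving userspace categories as they are."""
    def replace(match):
        name = match.group(1).strip()
        if is_userspace_category(name):
            return match.group(0)
        # Assumption: the sort key is dropped, since it would otherwise
        # become the visible link text.
        return "[[:Category:" + name + "]]"

    return CATEGORY_LINK.sub(replace, wikitext)
```

Because the pattern does not match links that already start with a colon, running the function twice on the same text should leave the result of the first pass unchanged.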

The lists of pages to edit are generated using with. At the moment the list of pages to edit will be generated manually through Special:Export [[Category: with  along with some filtering to ignore certain categories. This is 'safer' because if a userspace page has been caught by the RecentChangesLinked result of an article-only category then it should be safe to bulk remove the categories (ignoring the incontestably userspace-only cats.) ~ Ame I iorate U T C @ 11:47, 20 August 2008 (UTC)
 * You say "along with some filtering to ignore certain categories"... How will this filtering work? Details needed. – Quadell (talk) 13:53, 20 August 2008 (UTC)
 * Similar to before, only with a bigger scope. If the category name contains Wiki, Bot, User, task force, Possible or Candidates, it won't be removed - which should cover all categories that are userspace acceptable. ~ Ame I iorate U T C @ 09:53, 21 August 2008 (UTC)

Hmmm... is there a reason why a workflow like this wouldn't work? I could help you with step 1: I have both a toolserver account and some scripts for grepping database dumps. You can probably find people to help you with step 3 if there are too many categories for you to do it alone. —Ilmari Karonen (talk) 10:43, 21 August 2008 (UTC)
 * 1) Obtain a list of all categories used on user pages, either from the toolserver or from the categorylinks database dump.
 * 2) Apply the exclusion rules you suggested above.  This ought to narrow down the list quite a lot.
 * 3) Post the remaining list of categories on-wiki for manual review.
 * 4) Once the list has been reviewed, set the bot to work on it.
 * 5) Remember the results of the review for later runs, so that only categories that haven't been checked before need to be reviewed each time.
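Steps 2 to 5 of the workflow above might look something like this in code. It is only a sketch under stated assumptions: `plan_review`, the `is_excluded` callback and the `decisions` dictionary (category name mapped to whether the bot may escape it) are hypothetical names, and the list from step 1 is taken as given.

```python
def plan_review(found, is_excluded, decisions):
    """Split categories found on user pages into work for reviewers and the bot.

    found       -- category names found on user pages (step 1)
    is_excluded -- callable returning True for userspace-acceptable names (step 2)
    decisions   -- dict of earlier review results, name -> bool, where True
                   means the bot may escape that category (step 5)
    """
    # Step 2: apply the exclusion rules and drop duplicates.
    remaining = {name for name in found if not is_excluded(name)}

    # Step 3: only categories with no recorded decision need manual review.
    to_review = sorted(name for name in remaining if name not in decisions)

    # Step 4: the bot works on categories that a reviewer has already cleared.
    approved = sorted(name for name in remaining if decisions.get(name) is True)

    return to_review, approved
```

Once reviewers have gone through `to_review`, their verdicts would be added to `decisions` and saved, so that later runs only ask about categories that have not been checked before.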


 * Here's a list of suspicious categories found on user pages, and here's the query to generate it (warning: takes a while to run). These are only from root user pages, not from subpages, and excludes any categories that match   or transclude any of, , , , , , ,  or any template beginning with "Usercat".  The numbers tell how many user pages are in each category.  —Ilmari Karonen (talk) 13:01, 21 August 2008 (UTC)
 * Excellent. Per that information, anything containing "proofreaders", "translators" or "workgroup" is now filtered. Category:Rouge admins and Category:Vandalism Control Network members are also excluded; they were the only two I noticed that shouldn't be removed. ~ Ame I iorate U T C @ 13:32, 21 August 2008 (UTC)
 * Probably should filter out anything with "admins", "administrator" or "proxies" in it, as well as Category:Accessibility advocates, Category:Birthday Committee, Category:The IC Star Recipients and Category:Non-talk pages that are automatically signed. By the way, here's the same list as a sortable wikitable.  Feel free to edit it as needed.  —Ilmari Karonen (talk) 05:23, 22 August 2008 (UTC)
 * Actually, it seems Category:Birthday Committee doesn't belong on userpages, but gets there via transclusions of Birthday Committee/Calendar subpages that don't have the category wrapped in <noinclude> tags. Fixing that would probably be a bot job in itself, seeing as there are slightly over 366 of them.  —Ilmari Karonen (talk) 05:29, 22 August 2008 (UTC)
 * Now filters anything containing: admin and proxies as well as The IC Star Recipients and Non-talk pages that are automatically signed. Also, if this is approved I'll file a request for a separate task to fix the cats on the Birthday Calendar pages. That sortable list is excellent, thanks. ~ Ame I iorate U T C @ 06:23, 22 August 2008 (UTC)
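For the Birthday Committee calendar subpages mentioned above, the fix would amount to wrapping each category link in noinclude tags so that it stops being carried over on transclusion. A rough sketch, assuming plain wikitext input; `wrap_categories` is a hypothetical helper and not part of any filed task:

```python
import re

# Existing <noinclude>...</noinclude> blocks, which are left untouched.
NOINCLUDE_BLOCK = re.compile(
    r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE
)

# Category links such as [[Category:Birthday Committee]].
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:[^\]]*\]\]", re.IGNORECASE)


def _wrap(text):
    """Wrap every category link in the given text in noinclude tags."""
    return CATEGORY_LINK.sub(
        lambda m: "<noinclude>" + m.group(0) + "</noinclude>", text
    )


def wrap_categories(wikitext):
    """Wrap category links that are not already inside a noinclude block."""
    parts = []
    position = 0
    for block in NOINCLUDE_BLOCK.finditer(wikitext):
        # Text before the block may contain unwrapped category links.
        parts.append(_wrap(wikitext[position:block.start()]))
        # The block itself is copied through unchanged.
        parts.append(block.group(0))
        position = block.end()
    parts.append(_wrap(wikitext[position:]))
    return "".join(parts)
```

Links that are already wrapped are skipped, so the function can be rerun on a page without nesting the tags.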

As a content editor, I'm aware that lots of people use userspace for sandboxes for articles before moving them into the mainspace when ready. It'd be irritating for them if your bot kept fiddling with Cats prior to page moving. You could address this by setting the bot to ignore User:Foo/Sandbox pages and subpages thereof. For those who use sandboxes but don't call them "sandbox" (!) you could assist by setting a time-related filter, to ignore recently created or perhaps those recently worked on? Just some thoughts. --Dweller (talk) 11:01, 21 August 2008 (UTC)
 * As it isn't an omnipresent "watchdog" type bot (like ClueBot) I can't see this causing problems. An article created in a userspace sandbox should only be categorised right before it is moved, so the category links would have to have been added just before the list of miscategorised pages is created, and the page not moved before the bot got to it. For the bot to mess up here would therefore require either a significant delay before moving the page, or incredibly bad luck and timing. ~ Ame I iorate U T C @ 11:54, 21 August 2008 (UTC)
 * Yeah, I tend to agree with you. Userfied pages would fall foul, but I think there's an argument for supporting the Cats on those pages being "switched off". Re your comment about bad luck/timing, it'd be good to avoid this if possible; perhaps ignoring pages with very recent edits would cover it completely? --Dweller (talk) 12:04, 21 August 2008 (UTC)
 * It now works as follows: I make a list 24 hours before the bot run and right before the bot run. The two are compared, and only pages that appear on both lists will be edited. ~ Ame I iorate U T C @ 22:23, 24 August 2008 (UTC)
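The two-list comparison described above comes down to a set intersection. A minimal sketch, with a hypothetical function name and assuming both lists hold full page titles:

```python
def pages_to_edit(earlier_list, current_list):
    """Return pages present in both the list made 24 hours before the run
    and the list made right before it, in the order of the current list."""
    earlier = set(earlier_list)
    seen = set()
    result = []
    for page in current_list:
        # Pages missing from the earlier list were categorised recently
        # and are skipped; duplicates are only reported once.
        if page in earlier and page not in seen:
            seen.add(page)
            result.append(page)
    return result
```

A page that was categorised within the last 24 hours, or that was moved or fixed in the meantime, appears on only one of the lists and is therefore never edited.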

I have rewritten the full function details to clarify/outline what has been changed. ~ Ame I iorate U T C @ 10:47, 22 August 2008 (UTC)

BAGAssistanceNeeded

It could also link some templates that place the page in a category. BJ Talk 19:19, 26 August 2008 (UTC)
 * Done. ~ Ame I iorate U T C @ 21:39, 26 August 2008 (UTC)

BAGAssistanceNeeded ~ Ame I iorate U T C @ 23:39, 31 August 2008 (UTC)
 * I can't approve this, but I reviewed all the edits and only found one mistake. It also removed a newline in two of the edits for some reason. BJ Talk 12:29, 1 September 2008 (UTC)
 * Skipping that category was by design, as it contains "Candidates", although I think that filter isn't necessary as any "candidates" category is put there through a template anyway. Not sure about the newline-removal, probably just an AWB quirk. ~ Ame I iorate U T C @ 12:56, 1 September 2008 (UTC)


 * I looked over the contribs as well, seems to have gone very well.  SQL Query me!  03:58, 4 September 2008 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.