Wikipedia talk:Database reports/Polluted categories

Exclude hidden cats from this report
Just because. Rich Farmbrough, 23:37, 13 January 2011 (UTC).


 * You'll have to do better than that. --MZMcBride (talk) 01:53, 16 January 2011 (UTC)
 * I've been using this list to find user pages that contain article categories and change them to links. (See Bots/Requests for approval/BattyBot 9)  While my process works well for categories such as Category:Living people, it doesn't work for hidden categories such as the numerous Category:Singlechart categories.  Any changes you would be willing to make to display more non-hidden categories (e.g. exclude hidden categories, split into two lists, raise the limit from 250 categories) would be appreciated.  Thanks!  GoingBatty (talk) 04:05, 14 April 2012 (UTC)

Definition and criteria
What are the criteria for a polluted category? How are they ranked? Leading the list is Stale Userspace drafts, with 15,084 pages. Ranking at 3, 2010 films contains three subcategories and 1,915 pages. Interestingly, way down at no. 86, American film has 21 subcategories and--wait for it--24,190 pages. Furthermore, should the polluted categories be added to the Wikipedia backlog? The backlog has categories for articles needing categorization, for adding additional categories, and so on. If category pollution is considered an issue, is there an ongoing discussion online? Cheers. Encycloshave (talk) 19:05, 9 February 2012 (UTC)


 * Hi. Sorry, I missed this thread. "Polluted" in the context of this report means being an article category and containing non-articles or being a non-article category and containing articles. There's a general theory that categories should be separated at a high level between reader-facing categories and editor-facing categories. A quick way to find overlap is to look at pages where there are mostly non-articles with one or two articles and cases where there are mostly articles with one or two non-articles. There are more cases that could be found, but this report is rather rudimentary.
 * The source code for the script is available here: Database reports/Polluted categories/Configuration. The "No." column is a simple enumeration of the results. It's just there to give you a way to refer to a particular row or figure out how many rows there are. Primary key, &c. :-)
 * I'm not sure if there should be backlog categories for polluted categories. Maybe! It looks like there's infrastructure in place to exclude certain categories from being listed in the report using . So there's that, I guess. --MZMcBride (talk) 01:10, 18 April 2012 (UTC)
 * Thanks for the response. I'll start adding Polluted category on some of the hidden categories in this week's report, and see how it goes.  GoingBatty (talk) 02:09, 18 April 2012 (UTC)
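
The overlap heuristic described in this thread can be sketched in Python as follows. This is only an illustration, not the report's actual script (which lives at Database reports/Polluted categories/Configuration). The function name `is_polluted`, the `max_strays` threshold of two, and the input format (a list of MediaWiki namespace numbers, one per category member, where 0 is the article namespace) are all assumptions made for the example.

```python
def is_polluted(member_namespaces, max_strays=2):
    """Flag a category whose members are mostly articles with a few
    non-articles, or mostly non-articles with a few articles.

    member_namespaces: iterable of namespace numbers (0 = article namespace).
    max_strays: assumed cutoff for "one or two" out-of-place pages.
    """
    namespaces = list(member_namespaces)
    articles = sum(1 for ns in namespaces if ns == 0)
    non_articles = len(namespaces) - articles

    minority = min(articles, non_articles)
    majority = max(articles, non_articles)

    # Polluted: at least one stray page, no more than the cutoff,
    # and a clear majority on the other side.
    return 0 < minority <= max_strays and majority > minority


# Ten articles plus one user page (namespace 2): an article category
# that contains a stray non-article.
mostly_articles = [0] * 10 + [2]
# A category that contains only articles is not polluted.
only_articles = [0] * 10
```

With these inputs, `is_polluted(mostly_articles)` is true and `is_polluted(only_articles)` is false. A category that is evenly mixed is not flagged, which matches the remark that the report is rudimentary and misses some cases.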

Issue with categories that contain parentheses
Could you please fix this report so that categories containing parentheses are listed properly? (e.g. Category:Western Cordillera (North America) is listed as "Western_Cordillera_") Thanks! GoingBatty (talk) 15:42, 1 July 2012 (UTC)
 * This was the result of using the pipe trick. I've fixed this now. --MZMcBride (talk) 00:58, 5 November 2012 (UTC)
 * Thank you! GoingBatty (talk) 01:54, 5 November 2012 (UTC)
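
For readers unfamiliar with the pipe trick: when a link is written with an empty label, MediaWiki fills in the label by dropping the namespace prefix and any trailing parenthetical. The Python sketch below is a simplified emulation of that behaviour, not MediaWiki's own code and not the report's script. The function name `pipe_trick_label` and the regular expression are assumptions, and only the parenthetical rule is modelled.

```python
import re

# Optional single space, then a trailing "(...)" group at the end of the title.
_TRAILING_PARENS = re.compile(r" ?\([^()]*\)$")


def pipe_trick_label(target):
    """Approximate the label the pipe trick generates for a link target."""
    # Drop a namespace prefix such as "Category:" if one is present.
    title = target.split(":", 1)[-1]
    # Drop a trailing parenthetical; an underscore before it is left behind.
    return _TRAILING_PARENS.sub("", title)


with_spaces = pipe_trick_label("Category:Western Cordillera (North America)")
with_underscores = pipe_trick_label("Category:Western_Cordillera_(North_America)")
```

Here `with_spaces` is "Western Cordillera" and `with_underscores` is "Western_Cordillera_", the truncated form quoted in the report above. Writing the link label explicitly instead of relying on the pipe trick avoids the truncation.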

Request for assistance in removing categories from user pages
Where is the best place to ask for help removing article categories from user pages? For example, User:Noahk11 is in Category:1, but I don't see it in the wikicode. Thanks! GoingBatty (talk) 03:10, 13 November 2012 (UTC)
 * I guess my talk page or WP:VPT would both work. This talk page isn't closely watched, so I'd recommend against posting here.
 * It's usually easiest with cases like that to run them through Special:ExpandTemplates. You enter "User:Noahk11" for the context title and then put " " for the wikitext. You can leave the default options checked/unchecked. Then hit submit and wait for the result. From that result, you can see approximately where the category is coming from. In this case, the answer is User:UBX/LGBTYouthQ, which specifies usercategory=1 inexplicably. Check out the page history of User:UBX/LGBTYouthQ to see the evolution of the template's wikicode. If you change that template parameter (or remove it altogether), that'll fix your categorization issue. --MZMcBride (talk) 03:34, 13 November 2012 (UTC)
 * Thank you very much for the quick reply. It worked like a charm - thanks!  GoingBatty (talk) 03:50, 13 November 2012 (UTC)

Request for assistance updating templates that autocategorize
Some articles are in this list because they contain a template that automatically categorizes the article. For example, User:Zythe/Twilight (Buffy the Vampire Slayer) is in Category:2010 in comics due to the fields in Infobox comics story arc. Is there a way to update templates so that the categorization is only done in article space and not user space? Thanks! GoingBatty (talk) 03:36, 13 November 2012 (UTC)
 * Sure. You can add wrapper code such as . This code snippet just means that if the current namespace name is equal to an empty string (ifeq --> if equal;  --> current namespace name; article namespace --> explicitly named "" [unprefixed]), then categorize the page. You can write similar logic for the user namespace if you want to categorize in that way. Or you can do an if/then/else tree. --MZMcBride (talk) 03:40, 13 November 2012 (UTC)
 * That works great - thanks! GoingBatty (talk) 04:00, 13 November 2012 (UTC)
 * I see that User:Drilnoth is using Main other, which appears to be a bit easier. GoingBatty (talk) 14:52, 15 December 2012 (UTC)
 * I've seen at least two cases where the various templates had been fixed, but the category table never got updated. You'd look at the article and the cat wouldn't be listed, but you'd look at the category and it would be.  Inclusion of categories by templates is always a bug-filled puspile.  A WP:NULLEDIT or better of the including article addresses the problem. --j⚛e deckertalk 05:30, 17 December 2012 (UTC)
 * Some of those NULLEDITS aren't sufficient. I'm now trying a few delete/undelete cycles to poke the database (see User:Johnny421/Miss Universe 2013 and User:Steinar259/sandbox). If those drop from the report next cycle, that will indicate that the technique works for clearing out messed-up entries. --j⚛e deckertalk 19:14, 31 December 2012 (UTC)
 * Please don't go around deleting/undeleting pages needlessly. The Toolserver's replicated databases are corrupt. This is a known issue. When the replicated databases are re-imported (rebuilt), they'll stop being corrupt and consequently any database reports that rely on these replicated databases will stop listing bad (inaccurate) results.
 * Unless you think the issue is in the production site? I don't think there's any evidence of this, though. --MZMcBride (talk) 19:23, 31 December 2012 (UTC)
 * Roger, will comply. No, I don't think it's a production database issue.  It was not clear to me that the toolserver db would be rebuilt, but that lack of clarity on my part almost certainly reflects my ignorance and not really being in the loop.  (Suggestions for where to monitor to just absorb more of what's happening on TS would be welcome.) --j⚛e deckertalk 19:26, 31 December 2012 (UTC)
 * Just pinged toolserver-l again. These problems are super-annoying. :-/ --MZMcBride (talk) 19:39, 31 December 2012 (UTC)
 * Thanks! And no worries, I didn't mean to suggest it was urgent for me, I just have a bit of a compulsion to CLEAN UP ALL THE THINGS, *laughs*.  Polluted cats are not a high-urgency issue.  Thanks for the pointer to (and I'm sure I should have known this already) toolserver-l. Appreciated!  --j⚛e deckertalk 19:46, 31 December 2012 (UTC)
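
The namespace check suggested at the start of this thread (categorize only when the current namespace name is the empty string, with an optional if/then/else variant) can be modelled in Python as below. This is a sketch of the logic only, not the wikitext snippet itself. The names `categories_for` and `main_other` are hypothetical, and the second function merely imitates the spirit of the Main other template mentioned above.

```python
# The article namespace has no prefix, so its name is the empty string.
ARTICLE_NAMESPACE = ""


def categories_for(namespace_name, categories):
    """Return the categories only when the page is in the article namespace."""
    if namespace_name == ARTICLE_NAMESPACE:
        return list(categories)
    return []


def main_other(namespace_name, main, other=""):
    """If/then/else form: `main` for articles, `other` for any other namespace."""
    return main if namespace_name == ARTICLE_NAMESPACE else other


# A user-space draft that uses the infobox picks up no article categories.
draft_categories = categories_for("User", ["2010 in comics"])
# The same infobox on an article still categorizes it.
article_categories = categories_for("", ["2010 in comics"])
```

Under this model a draft in user space gets an empty category list, while the same template used in an article still adds "2010 in comics".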

Tool broken
When I click on the links on the page it returns a 404 error. When will this be fixed? Feinoha Talk 05:11, 25 July 2016 (UTC)
 * Hi Feinoha. This issue should be fixed now. --MZMcBride (talk) 18:42, 14 December 2016 (UTC)

Tool broken (again?)
Hello, the list has not been updated for over 5 months. What's going on there? --TheImaCow (talk) 10:27, 23 May 2020 (UTC)
 * Hi TheImaCow. The report is broken and the script used to generate the report needs to be rewritten. Do you have any interest in doing this? --MZMcBride (talk) 09:57, 4 June 2020 (UTC)
 * No, sorry, I have no idea how to write such a script :D But thanks for your answer! --TheImaCow (talk) 12:02, 4 June 2020 (UTC)