Wikipedia talk:Database reports/Archive 4

usernames vs articles
Would there be any way to locate instances where both (X Y) and (user:X Y) exist, and (X Y) is a living person? DS (talk) 14:21, 7 June 2010 (UTC)
 * I did fuzzy matching between article creators and article titles at some point. The data is available here. It'd be neat if something similar could be done with usernames and BLP titles, though you're dealing with much larger sets, so I'm not sure how technically feasible it is.
 * In general, there are two important considerations to make with a task like this: false positives and false negatives. The false positives come from common names (most are), causing coincidental collisions that aren't of any importance. The false negatives come from two sources that I can think of off-hand: the addition of middle names or inclusion of full names in page titles and page titles that disambiguate (e.g. "John Smith (minister)"). The middle name issue is offset by fuzzy matching; the disambiguation issue is offset by stripping. --MZMcBride (talk) 18:25, 7 June 2010 (UTC)
 * If those users did not edit articles about themselves, then identifying them can raise privacy issues. Sole Soul (talk) 22:41, 7 June 2010 (UTC)
 * How so? --MZMcBride (talk) 00:33, 8 June 2010 (UTC)

Database reports/Autoconfirmed users in the confirmed user group
Link to the userrights screen would be keen. Filling in the reason as 'autoconfirmed' even moreso. – xeno talk 01:28, 14 June 2010 (UTC)
 * I've changed the report to use dbr link, allowing any user to customize the output. Other reports will hopefully eventually use this template as well. --MZMcBride (talk) 15:28, 16 June 2010 (UTC)
 * Awesomeness. – xeno talk 15:31, 16 June 2010 (UTC)

Suggestion: sort by "time added"
This is not a request, it is a suggestion concerning all applicable reports. The suggestion is to add a column which marks when an item was first added to the report.

The rationale: If an item has been reported in the last 10 updates of a daily report, chances are that more than one user has checked the item.

If this suggestion was accepted, there is no need to apply it retroactively for the current items. Sole Soul (talk) 03:19, 7 July 2010 (UTC)
 * I can't see any simple or clean way of doing this right now. I'll have to give this more thought. In general, I'd say that better reports are preferred to just adding a timestamp column. That is, if certain results keep showing up and are getting in the way, it'd be good to find a way to make that not happen rather than simply point out to the user how long these items have been getting in the way. Is there a particular report or set of reports where you think this feature would be useful? --MZMcBride (talk) 04:45, 13 July 2010 (UTC)
 * Thank you. My impression is that this is a bot function not a database query function, am I right? Sole Soul (talk) 05:51, 13 July 2010 (UTC)
 * I'm not sure I understand the question. The MediaWiki database doesn't track when items are added to a page. It would have to be stored in a separate table somewhere, I imagine. --MZMcBride (talk) 05:52, 13 July 2010 (UTC)
 * When I made my suggestion I presumed that it would be done by programming BernsteinBot to compare the last update of a report with the new update, and I just asked to know if my understanding was correct or not. Sole Soul (talk) 06:05, 13 July 2010 (UTC)
 * Yes, it would do something like that. BernsteinBot would probably maintain a database in which it stored database report title, a key of some sort (maybe a cryptographic hash of the table row]]), and a timestamp. Then the bot would compare any updates to the database table and update the database table or wiki table as necessary. Or something like that. It's not impossible to code, but it's not a trivial task and I'm not sure I see the benefit (yet). --MZMcBride (talk) 06:08, 13 July 2010 (UTC)
 * Users go through a large report either from the top down or the bottom up or randomly. With time, this will make the bulk of the top and bottom items stabilize (i.e. they will be repeated because many users have checked them). This means that small percent of items are checked many times while the rest of items checked a few times or never. Sole Soul (talk) 06:37, 13 July 2010 (UTC)

More long pages
Is it possible to get more long talk pages for Database reports/Long pages in section "Specified talk pages"? It would be preciated, since the list is good to check for pages that needs to be archived and reduced in size. What about change the size limit from 175,000 bytes to 150,000 bytes? --Kslotte (talk) 16:06, 12 July 2010 (UTC)
 * Sure, done. --MZMcBride (talk) 04:44, 13 July 2010 (UTC)
 * Thanks, 300+ entries found. --Kslotte (talk) 10:58, 14 July 2010 (UTC)
 * Could you also exclude pages that transclude Historical?. Tim  1357  talk  10:26, 16 July 2010 (UTC)
 * I don't see any point why to exclude that. If it is about getting some Wikipedia talk "permanent archives" excluded, then excluding historical template alone isn't enough. There exist also many other templates. --Kslotte (talk) 10:52, 16 July 2010 (UTC)
 * Sounds a bit like "perfect is the enemy of done." If we can say definitively that pages marked with historical should be excluded, then there's no reason to list them, even if we may miss other pages that aren't properly marked or are marked using a different template. --MZMcBride (talk) 01:31, 18 July 2010 (UTC)

Perhaps exclude all noindexed pages. Most of the permenant archive templates dump the pages into the noindex category. Tim 1357  talk  01:01, 18 July 2010 (UTC)
 * I don't know about this. A lot of pages get NOINDEX'd for a lot of reasons. --MZMcBride (talk) 01:31, 18 July 2010 (UTC)

The list have have been reduced. It is time to take next step and lower the size limit to 140K for Database reports/Long pages in section "Specified talk pages". --Kslotte (talk) 09:42, 28 August 2010 (UTC)
 * And, could the pagelinks be used instead of pler? What is the idea with the raw format? do we need it? --Kslotte (talk) 22:55, 29 August 2010 (UTC)
 * Changed the limit per your request. I also set the report to use dbr link, so you can customize the link output to your heart's desire. --MZMcBride (talk) 01:12, 30 August 2010 (UTC)
 * I did this change, making use of pagelinks and an additional link to the raw version. --Kslotte (talk) 17:21, 30 August 2010 (UTC)
 * I did revert it. Seems like there is too many transculations on one page. Links at end of page won't work. Possible some wiki function restrictions. --Kslotte (talk) 18:31, 20 September 2010 (UTC)

The list have have been further reduced. It is time to reduce size limit to 130K for Database reports/Long pages in section "Specified talk pages". --Kslotte (talk) 21:17, 21 October 2010 (UTC)

Atypical deletion log actions
has been disabled. When RevDelete was only enabled for oversighters, this report made a lot more sense. Now it has simply become unwieldy and ineffective. 17293 would address this issue in MediaWiki itself, so write your Congressmen about getting that fixed. --MZMcBride (talk) 04:59, 13 July 2010 (UTC)
 * Can we get a petition going? Killiondude (talk) 19:42, 18 July 2010 (UTC)
 * With God, all things are possible. --MZMcBride (talk) 20:26, 18 July 2010 (UTC)

Most-linked redirects
Would it be possible to have a database of redirects that have the most links to them in articles? For example, finding out how many articles link to Dusty Rhodes (baseball player) when those should all link to Dusty Rhodes (baseball). Running it weekly would work fine. Wizardman Operation Big Bear 16:59, 17 July 2010 (UTC)
 * The pagelinks table is one of the largest. As a basis for comparison, all non-deleted revisions equal roughly 328,876,236 rows. The pagelinks table is currently roughly 507,145,981 rows. So any queries on it generally don't work so well. I'm testing the query to get this data now, but it's been running for a few hours and I'm not sure if it will finish before it gets killed. We'll see. --MZMcBride (talk) 20:58, 17 July 2010 (UTC)
 * Diddo MZMcBride. Finding things like most linked require a function that takes friggin forever to complete. This is because it needs to find how many links exist for every single redirect before it can spit out a list of the most linked. I suggest a list of redirects with over a certian ammount of links (say 500 links), limited to the first 100 found. Tim  1357  talk  01:00, 18 July 2010 (UTC)
 * You mean ditto, not diddo. --MZMcBride (talk) 17:11, 18 July 2010 (UTC)


 * { Tim  1357  talk  18:46, 18 July 2010 (UTC)


 * So, I ran this query, but it took about 20 hours. And the result set wasn't ordered correctly, so I have to run it again. I'm not sure I see the need for this to be a proper database report. I may just throw it in a sandbox. Dunno. --MZMcBride (talk) 17:11, 18 July 2010 (UTC)
 * 20 hours seems like a bit much for such a low priority type of report. Maybe just have this be a "as requested" report. Tim  1357  talk  18:46, 18 July 2010 (UTC)
 * I tried it without any filters and after 3 hours I got a 60 MB file with 2,566,221 rows. However, I'm unsure how to filter the set.  Plus we don't fix redirects.  — Dispenser 23:21, 21 July 2010 (UTC)
 * 3 hours? Can you head or tail the file and post results somewhere? I'm also not sure what "filter the set" means. Maybe paste the query you used? :P --MZMcBride (talk) 01:12, 22 July 2010 (UTC)
 * I looked at the query you used. I think you missed the "in articles" part. Or maybe I mis-read the request. It seems like he only wants to count links from other articles, though. --MZMcBride (talk) 11:10, 22 July 2010 (UTC)
 * It takes 5.2 hours searching only the article namespace, but only 15 minutes when both the page and redirect ends with ')'. However, the requester hasn't come back to clarify what he's going to do with it.  — Dispenser 03:10, 29 July 2010 (UTC)

Images without categories
Would it be possible to have a request run to find images with no categories whatsoever, including categories from templates? In theory, every image should be categorised based on its licensing; if it's not, it'll fail NFCC#10b, as all the copyright tags include licensing categories (or should). This should help us find problem images (or perhaps even problem templates) that the bots and other checks are missing. I don't know the scale of this problem, but I know that it exists. I guess having it checked weekly would be good. J Milburn (talk) 02:42, 27 July 2010 (UTC)
 * In case it wasn't clear, I was thinking about images hosted on enwp, not images hosted on Commons that happen to have some details on pages here. Can't see any harm in including the latter, but just thought I'd note that. J Milburn (talk) 02:45, 27 July 2010 (UTC)
 * Special:UncategorizedFiles? --MZMcBride (talk) 01:56, 28 July 2010 (UTC)
 * Well, I'll be damned. Thank you muchly. J Milburn (talk) 12:04, 28 July 2010 (UTC)
 * Interestingly, it didn't seem to catch the images I was after, which is odd. Very useful though, thanks again. J Milburn (talk) 12:14, 29 July 2010 (UTC)

Redirects
I created Pages with most redirects. Don't know if it belongs on this page. Also working on Templates with names differing only in capitalization. Rich Farmbrough, 13:20, 28 July 2010 (UTC).
 * If you plan on updating them at somewhat frequent intervals, please add them to the subject-space page. --MZMcBride (talk) 16:54, 28 July 2010 (UTC)

Interwiki reports
Would it be worth producing a report on problematic interwiki links? I have recently come across the following errors Now the third one may not really be a problem if we obey the rule that we don't fix redirects but the other two are definitely worth fixing. Boissière (talk) 20:47, 28 July 2010 (UTC)
 * Asymmetric links. i.e Article A points to article B but the en interwiki on the latter doesn't point back to article A.
 * Interwiki links to deleted articles. (These don't appear red like internal links do).
 * Interwiki links to redirects.
 * There is lots of bots that fix the first error. The second one might be worth looking into, if no bot does this already. Also, I think that bot that immediately fixes such errors would be better than just a report. Svick (talk) 21:23, 28 July 2010 (UTC)


 * Bot requests are a bit different from database reports. Though obviously there is some overlap, as some reports are fed into bots and scripts to automatically or semi-automatically fix issues in pages.
 * Off-hand, I'm inclined to say that listing interwikis to redirects isn't going to be helpful and would create the illusion that they should be fixed. I don't see any reason for the redirects to bypassed unless the target page isn't accurate (the same title being re-purposed for a disambiguation page or whatever), obviously. I don't think there's really a way for a bot to know that, though.
 * I agree with Svick regarding points 1 and 2. Bots either exist (such as pywikipedia's interwiki.py, Google it for more info) or could exist to do some of this work. I imagine most of these bots would generate their own lists, though I'd be happy to help if I can. --MZMcBride (talk) 23:31, 28 July 2010 (UTC)
 * OK, I would agree that number 3 is not necessary and number 2 is fairly easy for bots to handle (though it would be interesting to know how good a job they are doing). What I can't see in the case of number 1 is how a bot could work out which link was faulty. I would have thought it would require some examination of the articles in question to decide where the links should actually go. How easy would it be to knock up a query for number 1? If it is not too hard then we could see what the current situation is. Boissière (talk) 20:07, 29 July 2010 (UTC)
 * I have a tool ~merl/reverselanglinks which is used by a gadget on dewiki to search interwikis linking to a single article. Because of these experiences i think the output of this query would be very long.
 * If you create this report you also have to keep in mind that jbowiki is not first-letter-upper-case like all other wikipedia. Merlissimo 20:46, 29 July 2010 (UTC)
 * Your comment about the output being long does imply that it is something that a bot cannot easily fix. It has occurred to me that a database report might not be the best approach as presumably you have to attach to each language wiki's own database. I am now wondering if using the API would be easier and therefore a reporting (but not fixing) bot would be better. Boissière (talk) 15:31, 2 August 2010 (UTC)

Database reports/Pages with the most revisions
Is there any reason why that report hasn't been updated for nearly three months? Does it have anything to do with server load? I for one find that report interesting. I wonder if pending changes has had much of an effect on the data. Graham 87 09:16, 29 July 2010 (UTC)
 * It's supposed to run on the 28th day of each month. It keeps getting killed because it runs for so long, I think. I'm not really sure if there's anything to be done about this right now. --MZMcBride (talk) 02:44, 31 July 2010 (UTC)
 * "data as of 18:11, 28 August 2010 (UTC)" &mdash; wheeeee! --MZMcBride (talk) 02:26, 30 August 2010 (UTC)

Archived page
I've (selectively) archived this talk page. I was going to write a script to do it, but it's not worth the development time when it only takes a few minutes to do it manually.

Some of the threads I removed were either regarding reports I didn't really feel like generating or the original poster failed to respond in a while.

If you actually want a report, re-post here and someone will take care of it, eventually. You may have to nag, though. --MZMcBride (talk) 02:41, 31 July 2010 (UTC)

births and births
Been finding a lot of biographies of dead people mistakenly put in two births categories ( par exemple ), could we get a report of people in two separate births categories? --Closedmouth (talk) 13:30, 14 August 2010 (UTC)
 * Make me! --MZMcBride (talk) 18:20, 14 August 2010 (UTC)
 * Here you go: http://en.wikipedia.org/w/index.php?oldid=378928452
 * If there isn't one already, you should create a (maintenance or talk page) category for biographies that cover more than one individual. It would allow for much easier filtering in the future, for this list and other lists. Once that's in place, I'll write up a report that regularly updates. I think this is a reasonable compromise. --MZMcBride (talk) 20:27, 14 August 2010 (UTC)
 * If you change the WHERE from WHERE c1.cl_to != c2.cl_to to WHERE c1.cl_to < c2.cl_to it would stop the double rows. -- WOSlinker (talk) 20:48, 14 August 2010 (UTC)
 * Aye, nice. :-) I think SELECT DISTINCT would also do it. I was just too lazy to re-run the query. --MZMcBride (talk) 20:49, 14 August 2010 (UTC)

File description pages containing no license
Been mulling this over for a while; similar to Database reports/File description pages containing no templates, I'm looking for "File description pages containing no license" - but the actual finding of them...? Theoretically each image description page would include more than one template - i.e. Logo & FUR - which would probably lead to tons of false positives. Another option would be to scan the image page to see if any of the templates from Category:Wikipedia image copyright templates is present, but it's a huge category. Another possibility - checking for "very short" page length instead of "no" template at all... Suggestions would be greatly appreciated :) Skier Dude  ( talk  07:21, 19 August 2010 (UTC)
 * ~erwin85/shortpages.php seems to be what you're looking for. It allows you to find pages in the file namespace that don't transclude any templates. Killiondude (talk) 16:26, 19 August 2010 (UTC)

Not only does it catch my issue - also catches the description pages for commons images as well! Skier Dude ( talk  04:07, 21 August 2010 (UTC)
 * Changed my mind - it's picking up only the short pages - that means that if there's one template (say, the basic info) it isn't included by the tool. So, my request for "very short pages" isn't the right answer for this :(  Skier Dude  ( talk  04:17, 21 August 2010 (UTC)


 * You've seen, right?
 * If that report isn't really what you want, you should find some specific example files that are problematic. I can take a look at them and see if some patterns are evident. --MZMcBride (talk) 04:31, 21 August 2010 (UTC)
 * The tagged ones are in Category:Wikipedia files with unknown copyright status - I've added a goodly number there ;) Skier Dude ( talk  03:29, 26 August 2010 (UTC)

Unlisted stub types
I think a useful report for WP:WikiProject Stub sorting would be all stub categories (any category ending with the word "stubs") or stub template (ending with the string "-stub") not linked to from WikiProject Stub sorting/Stub types. A good heading would be:
 * Stub types not listed at WikiProject Stub sorting/Stub types. These may be unproposed (in which case they should be listed at WikiProject Stub sorting/Discoveries, or they may have simply not been listed by a user who created a proposed stub type, or renamed one under WP:Stub types for deletion.

עוד מישהו Od Mishehu 08:45, 19 August 2010 (UTC)
 * Shouldn't be too difficult to write. How often should it update? And I assume Database reports/Unlisted stub types will work as an output location and title? --MZMcBride (talk) 03:59, 24 August 2010 (UTC)
 * This was actually one of the many reasons for creating - in that the List Comparer from WP:AWB can be used to compare Category:Stub message boxes with links from WikiProject Stub sorting/Stub types. Unfortunately there are only about 3400 on the page and there are about 16000 in the category.  It is necessary to pick up some subpages too namely
 * Wikipedia:WikiProject_Stub_sorting/Stub_types/Geography
 * Wikipedia:WikiProject_Stub_sorting/Stub_types/Culture
 * Wikipedia:WikiProject_Architecture/Stub_categories


 * But even then the list is far short of the category size.


 * Xeno has also created WikiProject Stub sorting/Complete template list, last updated March.
 * I will paste my results into Database reports/Unlisted stub types and Database reports/Uncategorized stub types, presently. Rich Farmbrough, 14:27, 31 August 2010 (UTC).
 * Most of these stub types wil be listed at WikiProject Stub sorting/Templates to vet. Rich Farmbrough, 15:11, 31 August 2010 (UTC).
 * Haveing remove those there's still abuut 3000 left. oI am adding them as an appenidx to WikiProject Stub sorting/Templates to vet. Rich Farmbrough, 19:32, 31 August 2010 (UTC).

I see the 2 most recent lists in WSS are from March and from August (5 months apart), I think we need a more relyable system. About MZMcBride's list at Database reports/Unlisted stub types - the title is good; one thing I forgot to consider is that the data is in the transcludeed subpages, which means that it takes longer for the data to get to WikiProject Stub sorting/Stub types - I'd say that we should see what's not linked to from the subpages of WikiProject Stub sorting/Stub types. עוד מישהו Od Mishehu 08:06, 2 September 2010 (UTC)

Replacement for Wolterbot

 * ''From WP:VPT

Is it difficult for someone to replace the late Wolterbot with a bot that goes through the FAs and coutns how many cleanup categories are also listed at the bottom to update Featured articles/Cleanup listing. I cannot imagine it to be a difficult task  YellowMonkey  ( new photo poll )  00:54, 23 August 2010 (UTC)
 * Maybe request a listing at WP:DBR?  — fetch ·  comms   02:36, 23 August 2010 (UTC)
 * Try asking at WP:BOTREQ Peachey88 (T · &#32; C) 07:22, 23 August 2010 (UTC)

The old bot was monthly, which is enough, really  YellowMonkey  ( new photo poll )  04:12, 24 August 2010 (UTC)
 * I'm gonna let the WP:BOTREQ have first stab at this. If no one bites, I'll take a look. At a quick glance, it doesn't look like a bot was updating Featured articles/Cleanup listing. And I'm not really sure what you define as a "cleanup category." Is there a master category somewhere of cleanup categories?
 * I hadn't realized B. Wolterding had stopped editing. :-/ --MZMcBride (talk) 04:18, 24 August 2010 (UTC)
 * Well, the it was updated by a bot until March, and when the problems were dealt with, they were struck off manually between the month. I guess if there is a parent cat, the usual ones needed are unsourced, dead links, tagged to POV/weasel, undue weight and all that  YellowMonkey  ( new photo poll )  04:42, 24 August 2010 (UTC)
 * I threw this together in about 5 minutes, what do you think? Tim  1357  talk  20:35, 28 August 2010 (UTC)

Report request- transclusions of deleted templates
I would like to request that a DBR be created: Pages transcluding deleted or nonexistant templates, run Sunday and Wednesday. Would be nice to catch stuff that slips through TFD/H. Note: all pages, regardless of namespace, should be listed. &mdash; Train2104 (talk· contribs· count· email) 00:25, 26 August 2010 (UTC)
 * I just quickly peeked at this yesterday but I was having an issue mapping this out from my mind to the database, so I left it for another day. There are a few points I wanted to make while I'm here, though:
 * Generally I favor "red-linked" instead of "deleted or nonexistant" [sic!]; this is mostly because a template can be red-linked due to never having been created or due to having been deleted (and this can be an important distinction in the context of some reports); the generic term covers both cases fairly well, in my view;
 * There are some oddities in some (complex or "esoteric") templates that cause them to transclude non-existent templates; this is mostly due to poor coding from what I've seen, but it doesn't change the fact that there are glaring red links in reports like currently (e.g., Template:Pubchemcite is listed as having over 4,000 transclusions but has never existed, as far as I can see);
 * When you talk about "templates", the reality is that a template can exist in nearly any namespace (with a few exceptions);   is technically a template just like  | ; I'm going to assume you're only talking about transclusions of pages of in the Template namespace; if that's not the case, please let me know;
 * [Related to my second point] I'm not sure you realize the volume of such a report; it will likely be thousands upon thousands of results, which leaves two options: (1) limit output to 1,000 results (truncate), or (2) paginate; I'm generally hesitant to paginate if it will be more than a few pages as I don't see that providing much value and it just adds needless clutter; is a good example of this phenomenon.
 * I hope I haven't overwhelmed you; and I hope you've enjoyed the excessive semi-colons. --MZMcBride (talk) 02:58, 29 August 2010 (UTC)
 * I've found the use of Pubchemcite in Chembox PubChem. Probably should be a parameter but I haven't changed it yet. -- WOSlinker (talk) 10:16, 29 August 2010 (UTC)
 * If it is limited to 1000, then will it be sorted in a similar way to Templates transcluded on the most pages ? -- WOSlinker (talk) 10:22, 29 August 2010 (UTC)
 * It certainly could be. It'd just require pulling a COUNT(*) and sorting by it. --MZMcBride (talk) 18:00, 29 August 2010 (UTC)

I threw this together, but it seems to be pretty slow. It returned !comment as the first result. Tim 1357  <sup style="font-family:Times new roman; font-size:small;">talk  22:24, 29 August 2010 (UTC)

To cut down on the number of entries, lets put these restrictions on it:


 * 1) Transclusions of pages outside the template namespace wouldn't count.
 * 2) The template in question must have a deletion log entry. (reduces the esoteric issue, and plus, who would want to use a never-existent template?)

&mdash; Train2104 (talk • contribs • count) 23:31, 29 August 2010 (UTC)
 * I'm running this now. We'll see how many results it outputs (and in how much time). --MZMcBride (talk) 01:21, 30 August 2010 (UTC)
 * With or without Train2104's restrictions? Tim  1357  <sup style="font-family:Times new roman; font-size:small;">talk  02:04, 30 August 2010 (UTC)

(unindent) With. The query I used was: It took fifteen minutes to run and output 6,164 rows. A sample of the top:

And some results from the bottom of the file:

This file is obviously sorted by number of transclusions. Train2104: Is this what you want? --MZMcBride (talk) 02:15, 30 August 2010 (UTC)


 * I took a look at Pending deletion, and what links here only gave me 1 transclusion (which I have removed). Where are the other 4787 transclusions?  Also, the last entry is in the userspace.&mdash; Train2104 (talk • contribs • count) 12:43, 30 August 2010 (UTC)


 * Some of the data is just bad. The templatelinks table is referencing non-existent page IDs. I imagine some old bug didn't delete the rows in the templatelinks table appropriately when a page was deleted via the user interface. That seems like a reasonable explanation for the high count for a template like Template:Pending deletion. It may be possible to account for these bad rows in future queries.
 * Most of the other data looks fine. All of the pages are in the Template namespace. For example: Special:WhatLinksHere/Template:User:Encyclopedist/Welcome! and Special:WhatLinksHere/Template:User_Sock_Puppet. --MZMcBride (talk) 14:33, 30 August 2010 (UTC)

Transclusions outside template namespace should be fairly few, excepting userboxen, and there's no reason they shouldn't be cleaned up (redirect to undefined maybe?). It would reduce the report size if you could ignore transclusions from TfD. Rich Farmbrough, 15:20, 31 August 2010 (UTC).
 * Looks fine to me. Paginate the first 4000 rows, truncate the rest until some are dealt with. &mdash; Train2104 (talk • contribs • count) 02:26, 11 September 2010 (UTC)
 * The data should appear at in a few minutes. --MZMcBride (talk) 02:42, 11 September 2010 (UTC)

Medical articles frequently edited by non-autoconfirmed users
Hello. At Wikipedia_talk:Invitation_to_edit we're looking for the medical articles that are most frequently edited by non-autoconfirmed users (to trial a "how to edit a medical article" tutorial). Would it be possible to generate such a list? The top 40 should be sufficient. Anthony (talk) 17:57, 29 August 2010 (UTC)
 * Well, this is fairly complicated:
 * What constitutes a "medical article"? Pages whose talk pages are tagged a particular way? Pages in a particular category? Pages from a particular list?
 * Auto-confirmed status is implicit, not explicit, and it has changed over time. It used to be only four days; now it's four days + 10 edits. You could roughly approximate what the auto-confirmed standards of today are when looking at the database (at this specific moment in time), but you can't be exact. That is, you can say "if these people tried to edit today, we could say with a high degree of confidence that they wouldn't be able to on a semi-protected article due to their current edit count and registration date." But when you look at page histories, the stored edit count will remain current and the registration date will remain constant. So when you evaluate older edits, you won't know if the user was auto-confirmable then without manually calculating it for each edit. Other odd extensions like TorBlock completely change the auto-confirmed settings under certain conditions.
 * If you wanted to limit the report to something more easily definable (like all pages in Category:X sorted by number of IP edits), that would be much simpler. --MZMcBride (talk) 18:07, 29 August 2010 (UTC)

Thanks for the quick feedback! I've asked at the medicine project if there is a comprehensive list of medical articles. And IP edits is a good second best. I'll get back to you. Anthony (talk) 18:25, 29 August 2010 (UTC)

I'm back! Would it be possible to find the 100 articles with the greatest number of unique IP edits in the last year from the C, B, GA and FA class med articles here? Anthony (talk) 19:19, 29 August 2010 (UTC)

The data is at http://en.wikipedia.org/w/index.php?oldid=381806837. A few notes: Hope that helps. The source of the script I used is below. --MZMcBride (talk) 02:18, 30 August 2010 (UTC)
 * 1) it's unsorted;
 * 2) if there were 0 unique IP edits in the past year, the row was omitted (e.g., Timeline of tuberous sclerosis in Category:FL-Class medicine articles);
 * 3) the results are not truncated.


 * Thank you so much. That was perfect. I have reduced the list to the top 20. --Anthony (talk) 09:32, 30 August 2010 (UTC)

Non-free images with multiple files in the history?
Hey, would it be possible to get a report of all the files tagged as non-free which have multiple files in the history? For an example of what I mean, see File:Manchester Mark2.jpg- the image was reduced, but the old version is still visible (and will be until deleted). The old versions fail the NFCC (as they are not in use) and so some cleanup is required in every case. J Milburn (talk) 00:02, 19 September 2010 (UTC)
 * The first 1000 are at User:Svick/Non-free images with multiple files in the history. I can run that report on demand or regularly or extend it to more articles, if you wanted. Svick (talk) 15:25, 19 September 2010 (UTC)
 * Thank you very much, I'm requesting permission for my bot to deal with them, so a longer list may be very helpful- I'll see how the request for approval goes. J Milburn (talk) 22:03, 19 September 2010 (UTC)

Database reports/Articles containing overlapping coordinates
I'd like to up the frequency of this report from monthly to weekly. Doable? --Stepheng3 (talk) 21:03, 20 September 2010 (UTC)
 * Sure, done. --MZMcBride (talk) 21:52, 20 September 2010 (UTC)
 * Thank you! --Stepheng3 (talk) 23:00, 20 September 2010 (UTC)
 * When can I expect another update? --Stepheng3 (talk) 16:13, 28 September 2010 (UTC)
 * Well, it says "0 12 * * 4" on the configuration page. That'd be 0 minutes past hour 12 on the fourth day of the week. So... maybe on Thursday? I'm not really sure why it's been over a week since it updated. I'll try to remember to peek at this later today. --MZMcBride (talk) 16:23, 28 September 2010 (UTC)


 * I poked at this a bit. I verified that the crontab entry is "0 12 * * 4 python ~/scripts/database-reports/coordoverlap.py > /dev/null" and that the script works properly. (I just ran it manually, so the report is now updated.) I think the report should update again this week. Please let me know if you continue to have issues. It's comparing two large sets (Coord/display/inline,title has 202,431 transclusions while Coord/display/title has 349,287 transclusions), but it shouldn't take more than a few minutes to execute and finish. I don't have any e-mail from cron indicating an issue from the script, so I guess we'll just have to wait and see if this continues to be a problem. --MZMcBride (talk) 02:39, 29 September 2010 (UTC)
 * Thanks. I'll let you know if it does. --Stepheng3 (talk) 03:13, 29 September 2010 (UTC)

Request for one-time run
Can we run this report, one time only? I don't need it listed on the table.


 * All files that start with "BSicon " or any case variant of it, and
 * that are not on Commons, or
 * that obscure another file with the same name on Commons?

It would help me find some bad icons, since all of them should be on commons. WP:RDT is a global project. &mdash; Train2104 (talk • contribs • count) 01:03, 6 October 2010 (UTC)
 * I'm not sure why you need a manual query for this. Special:PrefixIndex/File:BSicon and commons:Special:PrefixIndex/File:BSicon should do the trick (perhaps with a few set operators). --MZMcBride (talk) 01:14, 6 October 2010 (UTC)
 * Here's the list you want, I think: http://en.wikipedia.org/w/index.php?oldid=389000355 --MZMcBride (talk) 01:17, 6 October 2010 (UTC)
 * And the inverse: http://en.wikipedia.org/w/index.php?oldid=389000621 --MZMcBride (talk) 01:20, 6 October 2010 (UTC)

Uncategorized location map templates
Would it be possible to have a (one-off) report of all the uncategorised templates of the form "Template:Location map xxx"? Every time the uncategorised templates report is updated, a few are listed in the report, but I suspect there are others that aren't listed (noting that the uncategorized templates report is not a complete list), so I would like to have a complete list of all the uncategorised location map templates, so I can address them all in one go. If there's an easy way I can obtain this information myself, please advise. Thanks. DH85868993 (talk) 04:07, 9 August 2012 (UTC)


 * Report posted at User:DH85868993/Uncategorised location maps for you. - TB (talk) 14:45, 26 August 2012 (UTC)
 * Thanks! DH85868993 (talk) 01:14, 27 August 2012 (UTC)

WikiProjects ranked by liveliness
I would like a list of discussion pages of WikiProjects (so I'm talking about pages called "Wikipedia talk:WikiProject Whatever") ranked by the frequency with which they are edited---in effect the most active WikiProjects listed first. Can that be done? Michael Hardy (talk) 03:46, 24 October 2012 (UTC)


 * There are a couple of relevant reports already being produced. Do either of:


 * Database reports/WikiProject watchers
 * Database reports/WikiProjects by changes
 * closely enough match your needs? - TB (talk) 07:01, 24 October 2012 (UTC)


 * I've spoken with Michael Hardy previously at the WikiProject Council page and recommended he come here. In trying to measure a project's activity or "liveliness", I can see the limits of the two reports you've listed. The current report on how many people are watching gives an impression that some projects are busy when in reality there has been little discussion in years (the dead WikiProject Contents tops the list). The other list provides the number of edits for articles under a project's scope, not necessarily edits made by the project's members. I think Michael Hardy is looking to see how many times the project's talk page is edited in a particular timeframe (maybe six months or a year) to gauge which projects are active places for discussion. –Mabeenot (talk) 15:44, 26 October 2012 (UTC)


 * Okay, initial report at User:Topbanana/WPtalk. It seems to be a good measure of activity for some projects, and an awful one for others.  I suspect the parameters need tweaked to make it more useful before we make it a regular thing. - TB (talk) 17:25, 26 October 2012 (UTC)

These look as if they may be useful.

Thank you. Michael Hardy (talk) 18:29, 26 October 2012 (UTC)

Polluted categories report
I'm trying to improve Wikipedia by removing article categories from user pages per WP:USERNOCAT, and have found the Polluted categories report to be very helpful in identifying categories to fix. Although the Database reports list states that this report is updated weekly, it appears that it hasn't been updated since August. Could some kind developer please look to see how this report can be put back on schedule? Thanks! GoingBatty (talk) 02:17, 4 November 2012 (UTC)
 * All fixed up now. I rewrote the report, which fixed a few outstanding issues: the output limit is now 1000 instead of 250; the pipe trick has been killed; there are now some helper links for users to see which pages are polluting the category; other various fixes. Thanks for a clear and discrete problem to solve! ;-) --MZMcBride (talk) 01:19, 5 November 2012 (UTC)
 * Thank YOU for solving it, and so quickly too! GoingBatty (talk) 01:50, 5 November 2012 (UTC)

New report request: unsourced BLPs not BLPPRODed
A list of articles in that don't transclude. -- Patchy1 08:39, 8 January 2013 (UTC)


 * I have run this report and placed the output at User talk:Patchy1/Unsourced BLPs not BLPPRODed for you. - TB (talk) 11:54, 8 January 2013 (UTC)
 * Excellent, thanks. -- Patchy1 12:41, 8 January 2013 (UTC)
 * Patchy1, you might also find http://toolserver.org/~magnus/catscan_rewrite.php usedful for that sort of query. --j⚛e deckertalk 18:34, 10 January 2013 (UTC)
 * Thanks! -- Patchy1 00:48, 11 January 2013 (UTC)