Wikipedia talk:Database reports/Archive 2

Hyphenation
I hate to be pedantic (that might be a lie), but generally there isn't a hyphen connecting adverbs that end in -ly to another word; several pages here have such hyphens. Namely, Recently-created biographies of living people, Recently-created unreferenced biographies of living people, Indefinitely-blocked IPs, Fully-protected articles with unusually long expiries, Indefinitely fully-protected articles and Indefinitely fully-protected talk pages. I know this isn't mainspace or anywhere important, so the MoS doesn't necessarily need to apply, and this isn't really a big deal either way. But if there's any support for doing so, could they be moved to the correct, un-hyphenated titles? (I won't do it, because I don't know if that would break any of the bots.) ÷seresin 03:39, 24 September 2009 (UTC)
 * Well, fuck. Seems you're right, if only barely. The rules surrounding hyphenation seem to be entirely arbitrary and made-up for the most part, but there seems to be (somewhat) wide agreement that adverbs ending in -ly are a special case. I'll fix the page titles in the next few days (it requires updating the scripts, moving some subpages, and adjusting some redirects). Cheers. --MZMcBride (talk) 04:21, 24 September 2009 (UTC)
 * Should be done. --MZMcBride (talk) 19:19, 24 September 2009 (UTC)
 * Neat. Thanks. ÷seresin 02:27, 25 September 2009 (UTC)

Database reports/Empty categories
Would it be possible to exclude categories tagged with WPFILMS Category from this report, in the same way that categories tagged with empty category are excluded? PC78 (talk) 12:40, 29 September 2009 (UTC)
 * Are these a significant problem? And, more generally, why are people creating empty categories? --MZMcBride (talk) 18:54, 30 September 2009 (UTC)
 * This is for project categories that should not be deleted even when empty, e.g. categories populated by templates. It's a problem if such categories are deleted when they shouldn't be. PC78 (talk) 22:34, 30 September 2009 (UTC)

Bump. Can this be done? There are a number of WP:FILMS categories currently listed in the report that should be excluded. PC78 (talk) 13:07, 27 October 2009 (UTC)
 * Eep. Thanks for the bump. With these two edits, I think your issue is resolved. The next update should reflect these changes. (I hope I didn't break anything....) --MZMcBride (talk) 13:37, 27 October 2009 (UTC)
 * I guess that does it, thanks! You should also exclude categories containing "-priority" from the report as well. PC78 (talk) 11:50, 28 October 2009 (UTC)
 * ... and the "Uncategorised" categories, though they should probably all be tagged with empty category. PC78 (talk) 11:53, 28 October 2009 (UTC)
 * I don't see anything wrong with deleting the "uncategorised xyz" categories. Once they are empty, it would be just as much work to simply add the proper categories than the "uncategorised" category, there's no reason these should ever be populated again. VegaDark (talk) 00:01, 5 November 2009 (UTC)
 * Uncategorised articles are continually sorted into those categories; they shouldn't be deleted when empty. PC78 (talk) 00:43, 5 November 2009 (UTC)

Database reports/Pages containing an unusually high number of non-free files
Could I ask for a couple of minor tweaks, please? Now that we've got the number of articles down to a manageable size, could we drop the lower limit on the report from 10 to ... well, something less than 10 that won't leave us with tens of thousands of articles in the list. How many would be returned if we dropped the lower limit to 8? Or even 6? Also, could the list be sorted descending by default so that the most obvious violations are at the top? Many thanks, Black Kite 06:29, 1 October 2009 (UTC)
 * ~mzmcbride/blackkite-non.txt is the list > 3 (warning, large file, you may want to save as). We can either make a larger report and paginate or you can tell me what kind of limit you'd like. --MZMcBride (talk) 19:55, 1 October 2009 (UTC)


 * Shall we start with 6 as the limit? That would seem to bridge the gap beteen articles which are probably infringing the limit, and a list that's too long.  Cheers, Black Kite 22:27, 3 October 2009 (UTC)
 * Set the limit to > 6 and changed the sort order. It's running now; should be finished in a few hours. --MZMcBride (talk) 04:59, 4 October 2009 (UTC)

Database reports/File description pages without an associated file
Could this be made to exclude image redirects? Thanks! –Drilnoth (T • C • L) 02:28, 4 October 2009 (UTC)
 * Database reports/File description pages without an associated file/Configuration indicates that the query contains page_is_redirect = 0. Do you have an example file? --MZMcBride (talk) 05:00, 4 October 2009 (UTC)
 * File:3133 (16).JPG --Closedmouth (talk) 06:32, 4 October 2009 (UTC)

Seems to have been a fluke? --MZMcBride (talk) 20:29, 4 October 2009 (UTC)
 * Weird; at least on one day there were a lot of these. I cleaned a lot of them up, but it's weird... MediaWiki didn't even indicate that the redirects existed when I went to the image page. –Drilnoth (T • C • L) 17:16, 5 October 2009 (UTC)
 * Yes, there's some sort of bug with image redirects not showing the "redirected from" text when using a foreign repo (or something). --MZMcBride (talk) 20:21, 5 October 2009 (UTC)
 * Any news on this? It's still happening (e.g., ) –Drilnoth (T • C • L) 22:06, 17 October 2009 (UTC)

Interesting. http://en.wikipedia.org/w/index.php?title=File:Beograd-opstine-lat.png&redirect=no&action=edit It's not a local redirect.... I guess we need to check against the foreign repo (Commons) to see if it's a redirect there? --MZMcBride (talk) 22:14, 17 October 2009 (UTC)
 * Should be fixed now. If there's reason to believe it's not, please ping me. --MZMcBride (talk) 00:52, 20 October 2009 (UTC)
 * Great; thank you! –Drilnoth (T • C • L) 00:57, 20 October 2009 (UTC)

Protected templates without documentation section
I think a report of all fully protected templates, which aren't redirects and don't use Documentation would be useful. Even if they don't actually have any documentation, transcluding Documentation will allow all registered users to create it. I think once a month is probably enough for this one. עוד מישהו Od Mishehu 10:38, 14 October 2009 (UTC)
 * Here are the results: ~mzmcbride/od-mish-templates.txt (about 3,600 entries). I think you'll want to refine your criteria before this is made into a proper report. --MZMcBride (talk) 02:20, 15 October 2009 (UTC)

Old stale articles...
Xeno said I should ask here... The other day I was in a particularly impish mood, and thought it would be nice to revisit some old articles that haven't seen the light of day for a while... Go back, and check older articles from when ref, citation, etc. policies weren't as heavily enforced... Is there any way to get a list of articles that were created a long time ago, and have been untouched since (e.g. created more than 5 years ago, and have had less than a dozen edits in their lifetime, and/or haven't had any new edits in the last 2 years)? If so, I'm sure that number is in the thousands, so is there any way to then break said list down into seperate lists by year or month/year? - Adolphus79 (talk) 02:00, 26 October 2009 (UTC)
 * yah id like to know that too. it would have to not include bot edits\. Tim1357 (talk) 00:10, 27 October 2009 (UTC)
 * Well, Special:AncientPages has been disabled for this project. So, I don't expect it's feasible to get the requested data using a query. Perhaps it can be done using database dumps? --Erwin (talk) 21:54, 27 October 2009 (UTC)
 * Actually, the bot part is the only piece that seemed infeasible to me. Doing a MIN(rev_timestamp) for the old articles and doing a COUNT(*) for the number of revisions both seemed easy enough. I think I have something similar that I wrote for BLPs for east somewhere.... --MZMcBride (talk) 22:24, 27 October 2009 (UTC)
 * Bot edits wouldn't matter to me... if one of those dozen edits were a bot, but it has not been edited at all in the last 2 years, then the bot edits could very well be outdated... - Adolphus79 (talk) 04:43, 28 October 2009 (UTC)

(outdent) But it would skew the data, if the article were to have not been edited for 2 years, then SmackBot came along and fixed them, it would make the article look less "stale."Tim1357 (talk) 22:31, 28 October 2009 (UTC)
 * OP, I found this WP:DUSTY a collection of articles with the lease recent edits. May not be useful in your case, but it sorda is. Tim1357 (talk) 22:37, 28 October 2009 (UTC)


 * That's a good start... thanks... With WP:DUSTY, and as far as the Smackbot concern, what if we take off the recent edits variable, and look at just the articles created >5 years and with <15 edits? - Adolphus79 (talk) 00:23, 29 October 2009 (UTC)
 * Bump? WP:DUSTY was a good jump off, but once it;s been edited, it's off the list... how about something along the lines of WP:DUSTY + >15 edits? - Adolphus79 (talk) 06:24, 15 November 2009 (UTC)
 * The query that I was running to generate this report was taking too long. I'm going to try another (hopefully smarter) query. If that doesn't work, you'll have to try a database dump. --MZMcBride (talk) 08:43, 15 November 2009 (UTC)
 * Query failed again. Don't have too much time to deal with this. Perhaps try WP:BOTREQ? --MZMcBride (talk) 09:02, 27 November 2009 (UTC)

Indefinitely protected templates without many transclusions
To catch stale or unwarranted high-risk-template protections. Could someone run a query to see what it would look like? Thanks. -- zzuuzz (talk) 20:40, 3 November 2009 (UTC)
 * Done: Database reports/Indefinitely protected templates without many transclusions --MZMcBride (talk) 05:20, 9 November 2009 (UTC)

BLPs not directly in Category:Living people
Normally, biographies of living persons should be directly in Category:Living people (not through a template), so that filter work for example. But some are not directly in, approximately 1000 based on this search. I wonder if it is possible to retrieve those. Thanks, Cenarium (talk) 23:42, 8 November 2009 (UTC)
 * Hmmmm, if you had a list of all pages directly using the text string "category:living people" and a list of all pages in the category, it should be simple enough to do a comparison of the two, right? This involves database dumps, which there is now very limited support for (haven't announced it here yet). I'll see what I can do. I can definitely say it won't be automated anytime soon, but a one-time report for now should be simple enough. --MZMcBride (talk) 23:46, 8 November 2009 (UTC)
 * From what I can see, all the pages in Category:Living people are in the category directly. --MZMcBride (talk) 04:39, 9 November 2009 (UTC)
 * A template which still transcluded it has been modified to no longer do so. Actually, the difference between the counts may be entirely due to the delay in the indexation. Cenarium (talk) 04:57, 14 November 2009 (UTC)
 * Well, the category description page and the PAGESINCATEGORY magic word give you the stored value of category.cat_pages, which is often wrong for large categories. For example, compare:


 * 416368 vs. 416396. That's too bad compared to some other categories, but it likely helps explain some of the discrepancy you notice. I don't know whether the search index number is accurate (or precise), so that's another factor that could be altering your perception of what's right. --MZMcBride (talk) 19:56, 14 November 2009 (UTC)

Cross-namespace redirects
Could Database reports/Cross-namespace redirects be broken up in those that are in Category:Cross-namespace redirects and not ? So we can readily see which are not and add the category. Incidentally, could redirect=no be added to the links ? Thanks, Cenarium (talk) 23:50, 8 November 2009 (UTC)
 * Done. --MZMcBride (talk) 01:12, 9 November 2009 (UTC)

Uncategorized non-redirect pages in Wikipedia namespace orphaned in Wikipedia namespace with a single contributor
Such pages are often nonsense, test pages, abandoned pages or pages never used for anything meaningful. By orphaned in Wikipedia namespace, I mean with no link in that namespace, but there may be links from other namespaces (this would catch forgotten afds linked from an article for example), if possible. This is along the lines of Orphaned article deletion discussions but larger in scope and with more restrictions. I've looked in the recent page creations, and I don't think it'll include any big set of pages (spam reports for example, are categorized), but it may still, then we'd need to refine the conditions to get a usable report. Cenarium (talk) 04:57, 14 November 2009 (UTC)

Query:

--MZMcBride (talk) 08:42, 15 November 2009 (UTC)
 * Thanks, I've looked into the sample; a report seems worth it. I had forgotten to mention it shouldn't be transcluded into Wikipedia namespace either (it would exclude 'account suspensions', and many self-closed xfds). Cenarium (talk) 17:47, 15 November 2009 (UTC)
 * Needs a better (shorter) title, preferably something incorporating "single-author." And I'm not sure what you mean about Wikipedia transclusions. I'll need an example. --MZMcBride (talk) 05:15, 20 November 2009 (UTC)
 * The page should not be transcluded in another project page (additionally to not be linked from other project pages), because otherwise it would include many things like the 'account suspensions' above, all transcluded in Account suspensions, xfds closed by the nominator before other editors could comment and similar. But it may still be transcluded in other namespaces (typically, user). As for the name, it could be 'orphan single-author project pages', it leaves out specifications but gives the general idea. Cenarium (talk) 00:00, 21 November 2009 (UTC)
 * 15,000 results (which is why the script kept failing when it tried to output to one page). Would you like paginated results or just the first 1,000? --MZMcBride (talk) 20:05, 27 November 2009 (UTC)
 * The first 1,000 will be enough for now. Cenarium (talk) 04:13, 28 November 2009 (UTC)

Running now. Should take about 15 minutes, I imagine. --MZMcBride (talk) 04:16, 28 November 2009 (UTC)

Additional note: I should mention that the query logic is a bit confusing for this, so while I'm reasonably sure this is the data you want, I'm not positive. Please be sure to double-check the results and let me know if there are issues. --MZMcBride (talk) 22:33, 28 November 2009 (UTC)
 * This is the intended data. Thanks. It's very useful. Cenarium (talk) 18:46, 17 December 2009 (UTC)

Templates containing non-free files
Just removed 60 non-free files from Template:Euro coins. These didn't appear on Database reports/Templates containing non-free files. Any idea why? Black Kite 21:23, 28 November 2009 (UTC)
 * Heh, looks like the images were wrapped in &lt;includeonly>, so they didn't show up on the template page itself (see the old version of the page, before your recent edits). &lt;includeonly> means the images would only show up when the template is called (on the article Euro coins in this case). "Euro coins" is listed at Pages containing an unusually high number of non-free files, so no worries. :-) --MZMcBride (talk) 22:31, 28 November 2009 (UTC)

Uncategorized templates
I expect it could be useful. I've nominated Template:Uncategorized template because it's really not practical, and database reports would be more usable and efficient. I don't think we need to impose restrictions, unless there's really too many, except it should probably exclude templates with / in title, like the report for unused templates. A monthly or biweekly report should be sufficient. Cenarium (talk) 18:46, 17 December 2009 (UTC)
 * There are a lot of uncategorized templates. The first 100 get to "Ar" in the alphabet. I'll just make a limited report (first 1000 entries)? --MZMcBride (talk) 19:31, 17 December 2009 (UTC)
 * What about ignoring (for the moment) those that contain the word "userbox" ? –xenotalk 19:31, 17 December 2009 (UTC)
 * Meh. --MZMcBride (talk) 19:59, 17 December 2009 (UTC)
 * Done: Database reports/Uncategorized templates. --MZMcBride (talk) 19:59, 17 December 2009 (UTC)

Database reports/Potential biographies of living people (1)
Could this list exclude all articles that are in a subcat of Category:Deaths by year. Those pages should probably go on a different list but are not likely potential BLPs. Rettetast (talk) 23:27, 22 December 2009 (UTC)
 * Yes, I've been contemplating a "Potential biographies of dead people (X)." Can you do me a favor and list a few sample bios so I know whether the query I write is returning good data? --MZMcBride (talk) 01:35, 23 December 2009 (UTC)
 * Thanks for the reply. User:Suffusion of Yellow has done a lot of good work on the list today so I am going to wait for the next update. Rettetast (talk) 15:38, 23 December 2009 (UTC)
 * I'm running it manually right now. --MZMcBride (talk) 15:45, 23 December 2009 (UTC)
 * 1, 2, 3, 4, 5, 6. Rettetast (talk) 18:29, 23 December 2009 (UTC)
 * Thanks. Split out into and . --MZMcBride (talk) 19:47, 23 December 2009 (UTC)

Could Database reports/Potential biographies of dead people (2) exclude article that includes "_and_", "_&_", "brothers" and "sisters" in the title. Rettetast (talk) 01:38, 30 December 2009 (UTC)
 * Oh. And the dead people lists probably don't need a daily update. Rettetast (talk) 01:47, 30 December 2009 (UTC)
 * Excluded those four words (plus "quintuplets"). I modified the schedule for the dead people lists to run twice a week instead of daily. --MZMcBride (talk) 06:37, 30 December 2009 (UTC)

superimposed title coordinates
I'd be interested in seeing a list of articles that have multiple coordinates in the title area. An example would be this rev which gets one set of title coordinates from Infobox Place Ireland and another from. This fairly common problem often results in illegible coordinates, and I don't know any systematic way to find instances of it. One approach might be to search for pages that transclude Coord/display/title more than once.

A one-off report would be fine for starters.--Stepheng3 (talk) 18:54, 26 December 2009 (UTC)

The immediate issue here is that multiple instances of a template are only recorded once. For example, if I have a page like this, the templatelinks look like this: As you can see, each template is only referenced to the page_id once, so if coord is used once or 200 hundred times on a particular page, there's no easy way to count the transclusions. I'll keep looking for a way to do what you want to do, though really it would probably be best if all "auto-coords" in infoboxes and other similar templates were simply outright banned. --MZMcBride (talk) 03:30, 27 December 2009 (UTC)
 * I doubt this is comprehensive, but it would have caught the test case provided: http://en.wikipedia.org/w/index.php?oldid=334200353 Cheers. --MZMcBride (talk) 04:03, 27 December 2009 (UTC)
 * That's a helpful list, though it has a lot of false positives. I'll study it and see if I get any ideas for how to reduce the F+ rate.  As for Infoboxes that generate Coords, I happen to know of some good reasons why they do, and I don't expect we'll see the ban you desire any time soon. --Stepheng3 (talk) 20:31, 27 December 2009 (UTC)
 * What I was thinking were false positives are actually cases where both Coords generate identical text, so legibility is unaffected. From a maintenance standpoint, these transclusions ought to be merged, so the list is actually quite useful.  Would you be so kind as to make this a regular report -- monthly perhaps? --Stepheng3 (talk) 20:41, 27 December 2009 (UTC)
 * Yeah, didn't mention that. It's probably more visible in certain browsers (Safari makes the coords noticeably more bold when they overlap directly). I've set up the report here: Database reports/Articles containing overlapping coordinates. Cheers. --MZMcBride (talk) 07:22, 28 December 2009 (UTC)
 * Thank you very much. --Stepheng3 (talk) 20:50, 29 December 2009 (UTC)
 * There's been some significant progress. Could I get this report re-run, please? --Stepheng3 (talk) 05:07, 7 January 2010 (UTC)
 * Sure, done. --MZMcBride (talk) 12:07, 7 January 2010 (UTC)
 * Thanks again. --Stepheng3 (talk) 06:29, 9 January 2010 (UTC)

Confirmed users that are autoconfirmed
^^^ for easy cleansing. –xenotalk 19:03, 29 December 2009 (UTC)
 * Sha-zam. Killiondude (talk) 19:06, 29 December 2009 (UTC)
 * LOL. i r dum. Sry =p –xeno<sup style="color:black;">talk  19:09, 29 December 2009 (UTC)
 * Happens to the best of us. :-) Killiondude (talk) 19:37, 29 December 2009 (UTC)
 * -1 for poor hyphenation. --MZMcBride (talk) 21:16, 29 December 2009 (UTC)
 * I thought it would be more emphatic. I also don't type it every day, so maybe not knowing how to write it was an issue. Maybe! Killiondude (talk) 21:26, 29 December 2009 (UTC)

Templates that should be Userpages
A report that shows templates where the start of the name of the template is the same as the name of the editor who initial created it. For example if I had created Template:WOSlinker userboxes then it would be included in the list. These could then be moved to userspace or renamed. -- WOSlinker (talk) 19:14, 29 December 2009 (UTC)
 * This sounds like a bitch to implement. Personally, I've never seen anyone do this; is there a sample page somewhere I can test with? --MZMcBride (talk) 21:15, 29 December 2009 (UTC)
 * It's more common they'll create a page at Template:User:Foo, which is found easily enough by Special:PrefixIndex/Template:User: (don't bother looking right now, I already cleaned it today). –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk 21:35, 29 December 2009 (UTC)
 * I looked. --MZMcBride (talk) 21:37, 29 December 2009 (UTC)
 * Here's a few: Template:Redmarkviolinist/Guestbook, Template:Redmarkviolinist/VC lessons and Template:Bradcro/Userpage -- WOSlinker (talk) 14:13, 30 December 2009 (UTC)

With a bit of manual searching, found the following:
 * 1) Template:A Raider Like Indiana/header2
 * 2) Template:Bradcro/Userpage
 * 3) Template:Chirags/UserTalk
 * 4) Template:Dionyziz/IncorrectMath
 * 5) Template:Domingopilartejr/DomingoPilarte
 * 6) Template:Ellomate/userpage
 * 7) Template:Gerardw/wnotify
 * 8) Template:Ibanez Guy/Zomba
 * 9) Template:Jhfireboy/Uniform
 * 10) Template:Kerrek.smoter/The Breadheel
 * 11) Template:Lil'GKhan/KZK Plot
 * 12) Template:Lil'GKhan/Thanx
 * 13) Template:Lil'GKhan/Warning
 * 14) Template:Lil'GKhan/Welcome
 * 15) Template:Lilac Soul/plainsig
 * 16) Template:Lotrgamemast/Past
 * 17) Template:Merosonox/Wikipedia Mainpage Banner Test
 * 18) Template:Protocoldroid/template/User blender
 * 19) Template:Protocoldroid/template/User popping
 * 20) Template:Redmarkviolinist/Guestbook
 * 21) Template:Redmarkviolinist/VC lessons
 * 22) Template:S.C.Ruffeyfan/Navbox
 * 23) Template:TotoBaggins/StripSpace
 * 24) Template:Wwoods/complicated ship table begin
 * 25) Template:Wwoods/complicated ship table career
 * 26) Template:Wwoods/complicated ship table characteristics
 * 27) Template:Wwoods/complicated ship table end
 * 28) Template:Yourilima/tabs
 * 29) Template:Wōdenhelm/g
 * 30) Template:Wōdenhelm/ȝ
 * Must...resist...urge...to immediately clean... the above... –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk  21:08, 30 December 2009 (UTC)
 * Would the following query help to find similar pages:


 * This is just from looking at the schema docs & the SQL for the other reports, so it probably needs some tweaking. -- WOSlinker (talk) 21:22, 30 December 2009 (UTC)


 * Yeah, it's a bit more complicated than that. page.page_title stores spaces as underscores while user.user_name stores spaces as spaces. So you've got to account for that. You also originally requested that the report only look at page creators. Your query doesn't account for only focusing on page creators. (In addition, "rev_user_text+'%'" isn't valid, as far as I know, at least not in MySQL.) All of your examples also seem to focus on a "Template:Username/..." pattern, which I imagine is most common, though I bet there are some without slashes. I gave this some thought today and I'm fairly sure I know how to implement it. It'll just take another day or so. xeno: as long as you leave me a test case or two, you're free to clean all you like. --MZMcBride (talk) 22:18, 30 December 2009 (UTC)
 * I know that my query doesn't restrict it to just page creators. It's probably not too vital to restrict it to just the page creator if that helps to simplify the query. Thanks. -- WOSlinker (talk) 22:23, 30 December 2009 (UTC)
 * Running now: . --MZMcBride (talk) 06:44, 31 December 2009 (UTC)
 * Thanks. -- WOSlinker (talk) 10:04, 31 December 2009 (UTC)

Toolserver copy of English Wikipedia's database is corrupt
The Toolserver's copy of the English Wikipedia has gone corrupt (announcement here). This may cause quirky reports until this issue is resolved. --MZMcBride (talk) 03:08, 31 December 2009 (UTC)

User reports
A couple of reports that could be useful for tidying up userspace items are: Any thoughts? -- WOSlinker (talk) 11:49, 15 January 2010 (UTC)
 * Redirects of Template: pages to User: pages
 * Transclusions of User: pages into Articles or Template pages.

Sample results below:
 * Redirects of Template: pages to User: pages

Both of these reports need proper names. Thoughts? --MZMcBride (talk) 14:03, 15 January 2010 (UTC)
 * Transclusions of User: pages into Articles or Template pages.


 * Potentially deletable redirects following userfication
 * Articles and templates transcluding userspace content
 * Not too creative, I know. You're the one who came up with "Eponymous templates" so maybe you can do better! ;> –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk 14:14, 15 January 2010 (UTC)
 * Those report names sound fine. -- WOSlinker (talk) 16:47, 15 January 2010 (UTC)

For the first, "User template redirects"? For the second, "Cross-pollinated user templates"? "Transplanted user templates"? "Anachronistic user templates"? --MZMcBride (talk) 16:44, 16 January 2010 (UTC)


 * I've just tidied up some of those listed above. Could the full reports be made available? Thanks. -- WOSlinker (talk) 22:59, 23 January 2010 (UTC)
 * Done: and . --MZMcBride (talk) 05:01, 24 January 2010 (UTC)

Blank talk pages where User:WildBot is the only contributor
Would like the nuke them from orbit so a blue link doesn't lead people to a blank page. –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk 20:41, 22 January 2010 (UTC)
 * At a guess there'll be about a thousand or so. Josh Parris 20:59, 22 January 2010 (UTC)
 * Josh has access to ts now, so this may not be necessary as a DBR. –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk  21:13, 22 January 2010 (UTC)

There's already a report for blank single-author pages:. It looks like it updated recently and includes some entries from WildBot. So... yeah. --MZMcBride (talk) 22:49, 22 January 2010 (UTC)
 * Doh! Josh Parris 22:52, 22 January 2010 (UTC)
 * Damn it! You're 2 for 2 McBride. I am not worthy. –<b style="font-family:verdana; color:black;">xeno</b><sup style="color:black;">talk 01:09, 23 January 2010 (UTC)  —Preceding unsigned comment added by Xeno on an iPhone (talk • contribs)

Exclamation points in the mainspace
I had an idea for tracking down vandalism/spam/unencyclopedic content. It would be great to have a DB report that searches for the "!" symbol in the mainspace.
 * Conditions
 * All occurrences within 'single quotes', "double quotes", italics and bold are excluded.
 * Anything within a wikilink is excluded.
 * Any ! symbol within [single brackets] that begin with a url after the [ is excluded.  e.g.:
 * Anything inside is excluded.
 * Certain articles should be added to a whitelist: i.e., Punctuation (I can compile a list).

I know, this will likely be turned down as too broad of a search. And we'll definitely get false positives. *shrug* I don't know much about db reports, but at least it's worth proposing – it would be really helpful if we could have something to detect those pesky "!" unencyclopedic comments. Jamie  S93  14:56, 7 February 2010 (UTC)
 * This is probably something you'd want to put at WP:BOTREQ. Or WT:DDR, though DDR is fairly dead at the moment. --MZMcBride (talk) 21:26, 7 February 2010 (UTC)
 * Posted to BOTREQ, thanx. Jamie  S93  21:47, 8 February 2010 (UTC)

Is Database reports/Articles containing links to the user space down?
Database reports/Articles containing links to the user space is supposed to be updated weekly (based on the table on WP:DBR, but hasn't been updated during the past 11 days. Any idea why? עוד מישהו Od Mishehu 07:14, 17 February 2010 (UTC)
 * The bot operator appears to be taking a hiatus. Killiondude (talk) 07:22, 17 February 2010 (UTC)
 * You're a hiatus. --MZMcBride (talk) 19:07, 9 March 2010 (UTC)
 * "Data as of 00:16, 06 March 2010 (UTC)" and I fixed the links, so we should be good. --MZMcBride (talk) 19:07, 9 March 2010 (UTC)

Files
Could there be a listing for all files that use a fair use copyright template, but don't have a FUR template? That would be nice. :-) Killiondude (talk) 09:16, 30 November 2009 (UTC)
 * Maybe! Sounds kinda tricky, though. --MZMcBride (talk) 20:59, 30 November 2009 (UTC)
 * I imagine there would be a lot of files that have a non-template FUR. PC78 (talk) 22:03, 30 November 2009 (UTC)
 * I hadn't thought of that, PC78. That does throw a wrench into things. I'm not sure how these reports are gathered, maybe the bot could not collect pages that have a === Fair use in ARTICLE NAME === header? Or something like that. All the non-template examples given on WP:FURG and WP:FURE have that, so that would probably weed out the majority of files that use non-template fair use rationales. I'm not even sure how large of a problem it would be if the files with non-template fair use rationales were included... Do many people use those? Killiondude (talk) 06:10, 1 December 2009 (UTC)

(unindent) Seems to be a lot results like File:'Gaea', glazed stoneware by --Toshiko Takaezu--, 1984-1990, --The Contemporary Museum, Honolulu-- courtesy of the artist.jpg. --MZMcBride (talk) 23:52, 1 December 2009 (UTC)
 * Perhaps you could exclude any file which has "fair use" anywhere in the text? PC78 (talk) 00:11, 2 December 2009 (UTC)
 * I've been told that dealing with pagetext is not the best way to go about doing things. Oh well. It would have been a great idea had we only used templated fair use rationales. Killiondude (talk) 00:32, 4 December 2009 (UTC)

Started working on this. Out of the first 500 files checked (that met the template requirements given above), 49 didn't contain "fair use" in their wikitext. Results: http://en.wikipedia.org/w/index.php?oldid=334828581 There are some other words / phrases / expressions that could be checked to make this list have fewer false positives. Please suggest them and I'll try to get this finished before the new year. Cheers! --MZMcBride (talk) 01:34, 30 December 2009 (UTC)
 * Done (or it will be in an hour or so): . It's going to need some tweaking. I didn't want to remove all files that use "fair use," so I went with phrases matches. The current phrase list is below. Cheers! --MZMcBride (talk) 00:34, 15 January 2010 (UTC)


 * Okay, some strings to look into:
 * File:007FRWLvg2.jpg has "FU in"
 * File:007GoldfingerComic.jpg has "Rationale for "fair use" in"
 * File:008 frank corder cessna.jpg has "Fair use rationale for"
 * File:05unconf gallery1.jpg just has "fair use" as something that could possibly be filtered... dunno.
 * File:0x80 cracker with laptop.jpg not sure what to pull from that
 * File:100kgCanadianGoldOnPedestal.gif "no free image available" maybe? that would probably be in a lot of fair use rationales
 * File:1251MZMS.jpg "used under a claim of fair user" or some smaller string.
 * File:Admin logo.gif is this in here because of Category:Files copyrighted by the Wikimedia Foundation? Dunno what to do about this one, really. I think we're able to use Wikipedia/Wikimedia logos without a fair use rationale... if they were strictly fair use, we couldn't use them in the userspace. Perhaps exclude that category from this search?
 * I need to sleep now. But I noticed that many of the files from the first few hundred now had fair use rationales. Perhaps running this report again (maybe filtering with a few of the strings I listed here?) would cut back on the false positives or whatever. Thanks for your hard work. Killiondude (talk) 09:14, 26 January 2010 (UTC)
 * I hope you enjoyed your rest! I've updated the fair use strings list and bumped the limit on the report. I'm going to archive this thread if there's nothing further. --MZMcBride (talk) 22:53, 21 April 2010 (UTC)

Dump reports
Some people who follow this page may be interested in Dump reports. Cheers. --MZMcBride (talk) 05:59, 25 December 2009 (UTC)

Request : Large non-free audio samples
For all non-free audio samples, identify OGG Vorbis files larger than 1MB or that have a sample rate higher than 96Kbps.

This will allow them to be tagged for reduction, to meet NFCC. Media already tagged for reduction with Non-free reduce or whatever category it uses can be excluded. Sfan00 IMG (talk) 12:08, 4 January 2010 (UTC)
 * I'm not sure we store sample rate anywhere. Do you have an example file? --MZMcBride (talk) 17:50, 4 January 2010 (UTC)


 * File:NobodyLovesMeLikeYou.ogg is an example of the sort of thing I was trying to find, (It's also tagged for reduction currently)Sfan00 IMG (talk) 00:09, 6 January 2010 (UTC)
 * @MZ: Is the duration of the sound file stored anywhere in the API or whatever you use? It looks like it says in two places on that example file the duration (in seconds) of it. If so, you could query for non-free sound files that have a really long (over 30 seconds or whatever) duration. Killiondude (talk) 00:35, 6 January 2010 (UTC)

At least in the database, it's stored as a blob: <pre style="white-space:pre-wrap; overflow:auto;"> mysql> SELECT * FROM image WHERE img_name = 'NobodyLovesMeLikeYou.ogg'\G img_name: NobodyLovesMeLikeYou.ogg img_size: 333652 img_width: 0 img_height: 0 img_metadata: a:4:{s:7:"version";i:2;s:7:"streams";a:1:{i:435487386;a:8:{s:6:"serial";i:435487386;s:5:"group";i:0;s:4:"type";s:6:"Vorbis";s:6:"vendor";s:29:"Xiph.Org libVorbis I 20020717";s:6:"length";d:19.3836961451247162813160684891045093536376953125;s:4:"size";i:333652;s:6:"header";a:9:{s:14:"vorbis_version";i:0;s:14:"audio_channels";i:2;s:17:"audio_sample_rate";i:44100;s:15:"bitrate_maximum";i:0;s:15:"bitrate_nominal";i:160003;s:15:"bitrate_minimum";i:0;s:11:"blocksize_0";i:8;s:11:"blocksize_1";i:11;s:12:"framing_flag";i:0;}s:8:"comments";a:0:{}}}s:6:"length";d:19.3836961451247162813160684891045093536376953125;s:6:"offset";N;} img_bits: 0 img_media_type: AUDIO img_major_mime: application img_minor_mime: ogg img_description: fair use clip from The Flamingos' "Nobody Loves Me Like You" img_user: 8351 img_user_text: TUF-KAT img_timestamp: 20040211052420 img_sha1: g9tujggvyekmnhrajf7m1zu2twmk3nq 1 row in set (0.00 sec) I'm not sure if the API has a parser for audio length built in. I doubt it, though. --MZMcBride (talk) 00:42, 6 January 2010 (UTC)
 * 1. row ***************************
 * Is there anything further with this request? If not, it's getting archived shortly. --MZMcBride (talk) 22:54, 21 April 2010 (UTC)

Red links for disambiguation pages
I've noticed that disambiguation pages seem to lack links. Could we get a report or tool that lists page Foo is missing red links Foo (301) and Foo (bar)? — Dispenser 04:56, 9 February 2010 (UTC)


 * The code above basically does the same as Special:PrefixIndex but with red links. I have written a simple tool that specializes on finding red links with,  in them.  ~dispenser/cgi-bin/redlinks.py?prefix=Foo paging hasn't been implemented yet, results are limited to 50.  — Dispenser 06:12, 10 March 2010 (UTC)

Least revisions?
Could I get a report of 100 articles older than a year, with the least revisions, monthly? Thank you. Maurreen (talk) 08:13, 13 February 2010 (UTC)

I don't see much utility in such a report, though that won't really stop me from generating it. I've put some sample results below. The least revisions is 1, and by "articles" I assumed you meant non-redirects in the article namespace. Due to the way the page and revision tables are structured, the list is the first 100 articles with 1 revision in alphabetical order.

If you still want a report, please choose an acceptable report title and I'll try to knock this out sometime this week. Cheers. --MZMcBride (talk) 02:56, 11 March 2010 (UTC)

Request : All Images ending in *.gif, not tagged for conversion to PNG/SVG or JPEG
Simple request, a list of all single frame gif images, that are not currently tagged for conversion.

Objective: As far as possible GIF elimination based on the results...

Shouldn't need to run this more than once a week...

Sfan00 IMG (talk) 18:20, 14 February 2010 (UTC)
 * How is a script supposed to know if a GIF is single-framed or not? --MZMcBride (talk) 02:57, 11 March 2010 (UTC)


 * Hmmm Point taken... Change of suggestion... Report detailing all items with incorrect MIME type? Sfan00 IMG (talk) 12:10, 20 March 2010 (UTC)
 * ~earwig/cgi-bin/file_extensions.py --MZMcBride (talk) 16:36, 20 March 2010 (UTC)

Possibly unreferenced biographies of living people
This report currently includes a lot of articles that are tagged as unsourced y redirects to BLP unsourced. Could it be set to ignore articles in ? Rettetast (talk) 23:26, 20 February 2010 (UTC)
 * Fixed. --MZMcBride (talk) 03:58, 11 March 2010 (UTC)

Wikipedia:Database reports/Unused templates
On the Database reports/Unused templates report, 75 out of 800 listed are stub templates. Just wondering if it would be better to split off stubs into a separate report. -- WOSlinker (talk) 16:32, 13 March 2010 (UTC)
 * A fine idea. Done: . Cheers. --MZMcBride (talk) 06:06, 14 March 2010 (UTC)

Old XfDs
I think we should have a report of all pages categorized in any f the following categories for over a month: While occasionally this may be the result of a discussion being open that long, it's more likely because the nomination wasn't completed, or because of a partial closure of it. I think that this should be done weekly. עוד מישהו Od Mishehu 06:58, 15 March 2010 (UTC)
 * Category:Articles for deletion
 * Category:Templates for deletion
 * Category:Wikipedia files for deletion
 * Category:Categories for deletion
 * Category:Categories for merging
 * Category:Categories for renaming
 * Category:Redirects for discussion
 * Category:Miscellaneous pages for deletion


 * Do you have a report name in mind? The results that I think you want are below. I'm not sure how reliable categorylinks.cl_timestamp is, but it's the only way I know of to generate this data. I also assumed 1 month was 30 days. --MZMcBride (talk) 07:16, 15 March 2010 (UTC)


 * List of entries looks good, except that I think that pages in Category:Articles for deletion should only be listed if they are in the article namespace. The cl_timestamp appears to be correct - I specificly checked that the time for File:TTC Bessarion - Digital Rendering.PNG looks correct; the time corresponds to the time frame of a template update which moved the Files for deletion pages to their current category. I think that the order should be alphabetical by page name. עוד מישהו Od Mishehu 08:17, 15 March 2010 (UTC)
 * . Note, there's some database corruption that should be resolved today that's currently causing some red-linked titles to appear in the results. --MZMcBride (talk) 08:52, 15 March 2010 (UTC)

Templates linking to disambiguation pages
I think we could use to have a report of templates linking to disambiguation pages. This tool, which WP:Database reports links to, seems to be broken. As to the definition of a disambiguation page, MediaWiki defines it as any page transcluding a template linked to from MediaWiki:Disambiguationspage; I believe that this definition is probably the best. עוד מישהו Od Mishehu 14:57, 15 March 2010 (UTC)
 * Example page? --MZMcBride (talk) 16:43, 15 March 2010 (UTC)
 * Hungarian kings links to House of Anjou, which is marked as a disambiguation page with disambiguation. עוד מישהו Od Mishehu 16:52, 15 March 2010 (UTC)
 * Done: . It's using the master category Category:All disambiguation pages because using templatelinks with MediaWiki:Disambiguationspage was too much of a headache. How often do you want the report updated? --MZMcBride (talk) 00:05, 16 March 2010 (UTC)
 * I think weekly is good. עוד מישהו Od Mishehu 07:16, 16 March 2010 (UTC)
 * Please have it sorted by number of transclusions, not alphabetically. עוד מישהו Od Mishehu 07:45, 16 March 2010 (UTC)
 * Set to weekly and ordered by number of transclusions, descending. Though, it'll still be limited results and due to the way the table is ordered, it'll be only the first X% of the alphabet. --MZMcBride (talk) 18:54, 16 March 2010 (UTC)

Orphaned talk pages
I think a monthly report of orphaned talk pages would be useful. A good definition of an orphaned talk page would be: A talk page of a non-existent page, which isn't in the User talk: namespace, and which isn't the subpage of an existing talk page on any level (i.e, if there's a page called "Talk:Foo", then a page called "Talk:Foo/bar/1/2/3" shouldn't be listed; if that's impracticle, then not containing a slash would be good enough.) עוד מישהו Od Mishehu 17:59, 18 March 2010 (UTC)
 * You mean one different to Database reports/Orphaned talk pages?-- Jac 16888 Talk 18:01, 18 March 2010 (UTC)
 * Sorry, missed that one. עוד מישהו Od Mishehu 18:05, 18 March 2010 (UTC)
 * :-) --MZMcBride (talk) 18:31, 18 March 2010 (UTC)

Orphaned talk subpages - Subpages of pages with slashes
Would it be possible to prevent subpages of pages with lashes, such as Talk:Architects & Engineers for 9/11 Truth/Archive 1 (subpage of Talk:Architects & Engineers for 9/11 Truth, not of Talk:Architects & Engineers for 9) from appearing? עוד מישהו Od Mishehu 19:51, 21 March 2010 (UTC)
 * This is the same thing you posted about here? It's on the list. MySQL doesn't really have a split string function, so I need to write a Pythonic function. --MZMcBride (talk) 21:15, 21 March 2010 (UTC)
 * Fixed. --MZMcBride (talk) 04:59, 22 March 2010 (UTC)

Wikipedia:Database reports/Blocked users in user groups
Should we ask at Meta for the admin bit to be removed from User:Mirwin, I'm not sure what the standard procedure is here-- Jac 16888 Talk 16:34, 22 March 2010 (UTC)
 * I don't think there's any policy in place that covers that. Stewards would not act. Maybe arbcom motion? Maybe the account should just be unblocked, if the account becomes compromised they'll do that anyway. – xeno <sup style="color:black;">talk 16:37, 22 March 2010 (UTC)
 * E-mail ArbCom. I seem to remember a similar situation arising previously. As far as I know, these accounts always have all rights removed. Though, as xeno notes, the directive has to come from ArbCom. --MZMcBride (talk) 17:39, 22 March 2010 (UTC)
 * Emailed them-- Jac 16888 Talk 19:49, 22 March 2010 (UTC)
 * I've posted a request at meta. <span style="font-family:Verdana,sans-serif"> — <b style="color:#060;">Rlevse</b> • Talk  • 22:51, 22 March 2010 (UTC)

WP:GAN
As part of this month's GA review drive, I think it would be helpful to have a list of current GA nominees which still have article tags on them, like unsourced, or article issues, or anything listed at a subcategory of Category:Wikipedia cleanup categories, if possible. Category:Good article nominees awaiting review should be used on any unreviewed GAN's talk page. Updated every few days should be fine.  fetch  comms  ☛ 23:19, 8 April 2010 (UTC)
 * 0 results. --MZMcBride (talk) 02:12, 10 April 2010 (UTC)
 * That's good, I suppose.  fetch  comms  ☛ 02:13, 10 April 2010 (UTC)