Wikipedia talk:Database reports/Archive 3

Request: Files with 'conflicted' licensing
In other words, files that are currently tagged as having both a 'free' and 'non-free' license, or which have a 'free' license but also have a 'fair-use' rationale present.

I'm currently working through an earlier list someone provided on the toolserver here: http://toolserver.org/~betacommand/reports/miss_tagged_files.txt

but feel that doing this via a weekly report would greatly assist cleanup efforts.

I am currently tagging the ones I am finding with conflicted-license which will categorise into Category: Wikipedia files with confilicted copyright information a category whose contents could reasonably be excluded from the report.

Sfan00 IMG (talk) 12:08, 4 January 2010 (UTC)
 * Is there some way to identify images using a free license? Is there a master category of free license templates somewhere? I know all non-free file go into Category:All non-free media, but there doesn't seem to be a corresponding category for free media. --MZMcBride (talk) 17:55, 4 January 2010 (UTC)


 * Category:Free_images and it's subcateogries is a possibility, but it's subcatagories, not just on great big one...

Sfan00 IMG (talk) 22:19, 5 January 2010 (UTC)


 * OK I've looked into this further, potentialy conflicted files are ones in both Category:All free media and Category:All non-free media.Sfan00 IMG (talk) 11:14, 16 January 2010 (UTC)


 * Further to last message, someone called erwin on #wikimedia-toolserver came up with the following -

SELECT page_title FROM page JOIN categorylinks AS c1 ON c1.cl_from = page_id JOIN categorylinks AS c2 ON c2.cl_from = page_id WHERE page_namespace = 6 AND c1.cl_to = 'All_free_media' AND c2.cl_to = 'All_non-free_media'

It would need a report to go with it but in essence the query is straightforward. :) Sfan00 IMG (talk) 11:28, 18 January 2010 (UTC)
 * Done Tim1357 (talk) 04:29, 25 April 2010 (UTC)


 * Done: . --MZMcBride (talk) 05:14, 25 April 2010 (UTC)

Broken reports
--MZMcBride (talk) 19:33, 9 March 2010 (UTC)
 * &mdash; Commons subquery issue Fixed!
 * &mdash; Commons subquery issue Fixed!
 * &mdash; Commons subquery issue Fixed!
 * &mdash; needs to be paginated Fixed!
 * &mdash; Commons subquery issue; also needs a rename to match other reports Fixed!
 * &mdash; Commons subquery issue Fixed!

Update: The Commons subquery issue is described here: https://jira.toolserver.org/browse/TS-549. --MZMcBride (talk) 05:33, 11 March 2010 (UTC)
 * Could I prod you into running the indef semi prot one? There's an ongoing WP:AN about indef semi's and I'd like to see the current data. –xeno talk  18:42, 23 March 2010 (UTC)
 * Cats are evil. --MZMcBride (talk) 07:19, 24 March 2010 (UTC)
 * Lol. Just dogs for future, then =p Cheers, –xeno talk  15:27, 25 March 2010 (UTC)

Candidates for Autoreview privileges
I would like to generate a report on editors who are eligible for autoreview privileges (and don't already have them):


 * 1) Search for all editors with more than 50 created non-redirect articles.
 * 2) Remove entries which already have administrator or autoreview privileges.

I suppose this report could be run quarterly&mdash;once every three months or so. I asked around to see if anyone would be willing to make a bot to do this, and I was pointed here by Tim1357. —Preceding unsigned comment added by Cryptic C62 (talk • contribs)
 * I think someone beat me to this: jira:DBQ-87. I can still make a report with the provided query if you'd like. If not, we'll count this as resolved. --MZMcBride (talk) 03:11, 17 March 2010 (UTC)


 * Yay! I am more than slightly clueless, so bear with me here: Is there a way to use DBQ-87 to generate a new report every few months? Or is it just a static report? In the latter case, I think it would be helpful to make the report here. On the other hand, this query generated over 900 results, so it may not even be necessary to run it again any time soon considering how long it would take to process that many candidates. Your call. --Cryptic C62 · Talk 04:01, 17 March 2010 (UTC)


 * The DBQ is static. The ticket has been resolved and isn't set to re-run in the future. A report here would be set to update every X weeks or months or whatever. (This is actually how most queries got run before I set up "Database reports" with automation.) If/when you need the query re-run, just leave a note here and I'll use that code that Mauro wrote to make something that updates however often you'd like. --MZMcBride (talk) 04:15, 17 March 2010 (UTC)


 * That sounds like an excellent plan to me, thanks mate! I'll come back in a few months when this batch is done. In the meantime, here's something to think about: Of the candidates that these queries generate, some will probably not be granted autoreview privileges. In order to avoid having these users show up in every subsequent report, I could maintain a subpage that lists the rejected candidates. I assume it would be fairly straightforward for the automated report to perform an additional check to see if the user is on that list, correct? --Cryptic C62 · Talk 04:21, 17 March 2010 (UTC)


 * Simple enough to program. Something similar is already done with the edit count list (currently living at User:MZMcBride/Sandbox 3), which has both an opt out list (for unflagged bots) and an anonymize list. You may run into political issues making a list of users, though. Sometimes people get testy with such things (look up "User:Kelly Martin/B" sometime if you're bored). Just let me know sometime in the future and we'll figure out a decent solution then. --MZMcBride (talk) 04:29, 17 March 2010 (UTC)


 * Yup yup, sounds good. --Cryptic C62 · Talk 04:33, 17 March 2010 (UTC)

Yikes, there's a problem! It seems that the line of code  isn't working. Many of the editors on the list are no longer editing. For now I can just check their most recent contributions, but what can be done to rectify this permanently? --Cryptic C62 · Talk 02:18, 18 March 2010 (UTC)
 * So... I guess I didn't get beat out after all? Can you give me an example user that's wrongly listed in the current list and an example user that's correctly listed? I should be able to run this report properly this week. Thanks. --MZMcBride (talk) 02:22, 18 March 2010 (UTC)
 * Certainly. 2fort5r, 45ossington, and 4meter4 were all correctly listed and approved, as they meet all the requirements and have edited in 2010. 40fifw0, 6afraidof7, and 9Nak were all incorrectly listed, as they meet the requirements but have not edited in 2010. On a side-note, it may be useful to add a check that the user is not currently blocked. I assume this would also be easy to implement? --Cryptic C62 · Talk 02:33, 18 March 2010 (UTC)


 * The static report is becoming less useful over time. In the last batch of 6 candidates, 2 had already been given the autoreview privilege since the report was first compiled, plus there's always the problem of inactive users. I've started an exclusion list at User:Cryptic C62/Exclusion, any chance ya'll could run the report again using the new criteria? --Cryptic C62 · Talk 16:05, 23 April 2010 (UTC)
 * What are your exclusion criteria for that user subpage? The same as listed above? --MZMcBride (talk) 05:36, 25 April 2010 (UTC)

If a user is not suitable for the autoreview privilege but might become suitable later (for example, if they haven't edited/created articles in a while), then I don't put them on the exclusion list. If a user is not suitable for the privilege and they are unlikely to become suitable later (for example, if they are indefinitely blocked or have a number of article warnings on their talk page), then I do put them on the exclusion page. Also, if I nominate a user but an admin rejects the nomination, then I will put that user on the exclusion list. Basically, I just want to avoid having to process the same users over and over again if there's no chance that the later nominations will be successful. The ideal report would produce users with all of the following attributes: Does this help? --Cryptic C62 · Talk 18:08, 25 April 2010 (UTC)
 * 50+ created articles
 * Not autoreview, not administrator
 * Most recent edit after 1 April 2010
 * Not blocked
 * Not on exclusion list

Start articles
A report that list articles with quality scale of start that doubled in size since were last rated. Sole Soul (talk) 17:16, 26 March 2010 (UTC)
 * For which wikiProject? –xeno talk  17:19, 26 March 2010 (UTC)


 * (edit conflict) Do you have any example pages? I don't think this is possible to (easily) query, but I won't know for sure until I try it out on a few example pages. Also, is this something you'd like a regularly updated report for or just a one-time thing? What's the ultimate goal of this report (there might be a better way to find similar data)? --MZMcBride (talk) 17:21, 26 March 2010 (UTC)
 * This is not a wikiproject-specific. I will try to find examples; I was thinking that the hard part is to know when was an article last rated, which do not need examples, am I wrong? A one-time report is enough, I think. Sole Soul (talk) 19:29, 26 March 2010 (UTC)
 * If it's not WikiProject specific, the report is going to be huge - and who is going to go through it? Most projects already have a trouble keeping their unassessed articles out of backlog - let alone re-rating existing ones... –xeno talk  19:39, 26 March 2010 (UTC)
 * This is only a report for start pages that were doubled in size since last rated, I doubt that it is gonna be huge. If that is the case then we can start with the first 100. Outdated rate is less useful than no rate. Sole Soul (talk) 19:54, 26 March 2010 (UTC)

I'm no programmer, but maybe this will help: Cheers. --Cryptic C62 · Talk 17:32, 27 March 2010 (UTC)
 * Step 1: Find all talk pages that contain.
 * Step 2: Search through revisions backwards to find timestamp of earliest revision with.
 * Step 3: Search through revisions of the actual article to find revision with the largest timestamp that is older than the aforementioned talk page timestamp. (The revision of the article that was reviewed when  was added.
 * Step 4: Find prose size of that article revision.
 * Step 5: Find prose size of current revision.
 * Step 6: If current prose ≥ 2x old prose, add to report.
 * Yeah, you're clearly not a programmer. :-) --MZMcBride (talk) 18:20, 28 March 2010 (UTC)
 * Lol, why do you say that? --Cryptic C62 · Talk 18:25, 28 March 2010 (UTC)

User subpages of indefinitely blocked users
It could be useful to have a report listing user subpages, and maybe also user talk subpages, of indefinitely blocked users blocked over a year ago, as they often contain inappropriate content. A report on main userpages and usertalkpages would need plenty of exceptions to become usable but it's probably not the case for subpages. Cenarium (talk) 22:57, 11 April 2010 (UTC)
 * Done. I removed all pages ending in .css and .js. Cheers! Tim1357 (talk) 02:49, 21 April 2010 (UTC)

Cenarium: Do you want a regularly updated report for this? If so, how often? --MZMcBride (talk) 05:19, 25 April 2010 (UTC)

Very old user subpages of long-term inactive users
Per discussion at Village_pump_(development), would it be possible to get a report listing very old user subpages of long-term inactive users? We could start with the criteria (on an AND filter basis) i) subpage older than 5 years ii) user's last edit (to any page) more than 5 years ago. It's quite possible that this listing would contain nothing at all, in which case we could try it with 3 years instead of 5. PS Providing a diff and date of the last edit of the subpage could be helpful. Rd232 talk 12:45, 12 April 2010 (UTC)
 * Its running now. When it finishes it will be saved here. The SQL:


 * I think that's what you want. Tim1357 (talk) 02:54, 17 April 2010 (UTC)
 * Hah, parts of that query look pretty familiar. --MZMcBride (talk) 02:57, 17 April 2010 (UTC)
 * Thanks, but that's a list of old userpages. That may be of interest as well, but less so. I'm after user subpages. Also (and this must be a perennial question) can't the report provide clickable links to the pages in question? Copying and pasting will waste a lot of time here. Rd232 talk 07:32, 17 April 2010 (UTC)
 * Yeah, that list from Tim1357 is useless to you; it doesn't contain any subpages. I'll look at making a proper report. --MZMcBride (talk) 17:34, 17 April 2010 (UTC)
 * Wow I fail at reading. I didn't take anybody else's SQL, so maybe just great minds think alike? Ill do another query for subpages and post a link when I am done. Tim1357 (talk) 17:59, 17 April 2010 (UTC)
 * Great minds read documentation, I think you mean. I wasn't accusing you of anything, but the REPLACE, DATE_FORMAT, and SELECT MAX(rev_timestamp) bits all look familiar. Anyway, your problem is that you're JOINing page ON user, which will inherently eliminate all subpages. I wouldn't worry about this one, I'm about to write it. --MZMcBride (talk) 18:04, 17 April 2010 (UTC)
 * I already did. Running and will save to here when done. New SQL:


 * Cheers! Tim1357 (talk) 18:30, 17 April 2010 (UTC)
 * Thanks... but is there really no way for the reports to be linked to the pages in question, so they can be seen with one click? Rd232 talk 22:12, 19 April 2010 (UTC)
 * Ha, of course! That is just the way the query outputs it. here ya go. Tim1357 (talk) 01:13, 21 April 2010 (UTC)
 * Thank you. This way of presenting the info is so much more user friendly! :) Rd232 talk 06:52, 21 April 2010 (UTC)

Rd232: Do you want a regularly updated report for this? If so, how often? --MZMcBride (talk) 05:19, 25 April 2010 (UTC)

Pages with four or more WikiProject banners, no shell, and at least one heading
I would like a list of talk pages with four or more WikiProject banners1, no shell2, and at least one ==heading==3. –xeno talk  18:38, 14 April 2010 (UTC)

Criteria 1 \{\{.*?(WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{.*?(WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{(.*?WikiProject|WP|\|[ ]class[ ]*=).*?\}\}.*?\{\{(.*?WikiProject|WP|\|[ ]class[ ]*=).*?\}\}

Criteria 2 \{\{[ ]*(Template:|)(W(iki|)p(roject|)[ ]*banner[ ]*s(hell|)|(WPBS|WPB|WBS|Shell)[ ]*\|)

Criteria 3 ==

Xeno, Im running a query that matches the first 2 criteria on your list. My guess that parsing for a section header won't be to hard for you. When the query is done, the output will save here and Ill make it pretty after that. Tim1357 (talk) 16:40, 21 April 2010 (UTC)
 * I imagine the first iteration of this will be rather large, so you can just store it somewhere before making it into a auto-refreshing report. –xeno talk  18:39, 14 April 2010 (UTC)
 * Sounds like you're posting to the wrong page. --MZMcBride (talk) 18:51, 14 April 2010 (UTC)
 * bah. you ignore that page like a red-headed stepchild. –xeno talk  18:55, 14 April 2010 (UTC)
 * Toolserver accounts are free and I publish nearly all of the code I write. :-) --MZMcBride (talk) 19:14, 14 April 2010 (UTC)
 * I was applying for a toolserver account and ran into a snag (couldn't log into JIRA). Emailed the people who were responsible and never got a reply. =\ –xeno talk  19:23, 14 April 2010 (UTC)
 * Try again? I don't know of any known issues regarding account creation in JIRA. If you hit a wall, post the error message (here or on my talk page) and I'll see that it gets looked at. --MZMcBride (talk) 19:25, 14 April 2010 (UTC)
 * Why are you suggesting regex on the page text here? You're interested in pages not using Template:WikiProjectBannerShell that contain 5 or more instances of templates that start with "WP" or "Wikiproject"? That seems mostly like database work (the database will be far more accurate than your shitty regex, at least). The additional == header requirement isn't really too difficult to implement in a database report as well. Much as I loathe dealing with page text, some of the current reports do so. Would the implementation I suggested work, and if so, do you have a report title in mind? --MZMcBride (talk) 19:29, 14 April 2010 (UTC)
 * It would probably be more robust to find pages transcluding 4 or more members of Category:WikiProject banners (recursive), not transcluding WikiProjectBannerShell, but I am not a wizard like you. The == part can be left off if it's easier, but I'll probably still be ignoring pages without at least a header (no need to save space when there's nothing below to see) The above is the hack I came up with =)
 * For the name: Pages needing WikiProject banner shells? I am terrible at this name stuff. –xeno talk  19:31, 14 April 2010 (UTC)
 * That's one shell of a name. Killiondude (talk) 00:54, 15 April 2010 (UTC)
 * –xeno talk  16:55, 15 April 2010 (UTC)


 * Thanks! Don't worry about making it pretty as long as it's an importable list. And yes, I can look for section headers easily enough. –xeno talk  16:59, 21 April 2010 (UTC)
 * Grr it got stuck last night. I slapped a  on there for now, and restarted it. MZMcBride, can you find anything that I am doing wrong? Tim1357 (talk) 22:59, 22 April 2010 (UTC)
 * The 500 talk pages finished pretty quickly. Ill try 5,000 tonight. If it works, ill leave another link here. Cheers! Tim1357 (talk) 00:27, 23 April 2010 (UTC)
 * Here are the first 5,000. Damn this list is big! Do you still want more? Tim1357 (talk) 18:27, 24 April 2010 (UTC)
 * Let me run with this 5000 and I'll bother you when I need more. Thanks, –xeno talk  18:28, 24 April 2010 (UTC)

wmf: conversion
Could you run a report similar to, but excluding User:, User talk:, and /Archive pages? Thanks.  MBisanz  talk 19:15, 20 April 2010 (UTC)
 * http://en.wikipedia.org/w/index.php?oldid=357259623 --MZMcBride (talk) 19:20, 20 April 2010 (UTC)

Templates that cause citation errors
Templates that contain inline citations occasionally cause cite erros if transcluded in articles that do not contain or if the template was placed below a. AWB does not fix this error. The conditions are:


 * Templates that contain inline citations, AND
 * Transcluded in mainspace articles that display "Cite error"

example. Sole Soul (talk) 11:47, 21 April 2010 (UTC)
 * ❌ Sorry, but it is impossible to use the database to find this information, as MzMcbride and I cannot do queries on the pages text. I would suggest requesting a dump scan instead. Thanks! Tim1357 (talk) 13:15, 21 April 2010 (UTC)
 * If this only concern the first condition then I may have an alternative suggestion, but if the the "Cite error" is also considered a text, then may be not. Sole Soul (talk) 13:40, 21 April 2010 (UTC)
 * Category:Pages with broken reference names, Category:Pages with missing references list, and Category:Pages with incorrect ref formatting. If you want a report of these (I'm not sure why you would...), just let me know. --MZMcBride (talk) 15:49, 21 April 2010 (UTC)
 * Cool, looks like I was wrong: I was not aware that there were maintenance categories that list pages with broken refs. I thought you wanted us to parse the text for broken refs, which we of course cannot do. Tim1357 (talk) 16:22, 21 April 2010 (UTC)
 * Yes please, but I want to change my request slightly. I want a list of articles not templates. So, articles that appear in Category:Pages with missing references list and transclude these templates. If possible weekly report. Sole Soul (talk) 21:26, 21 April 2010 (UTC)
 * Done. Tim1357 (talk) 00:32, 22 April 2010 (UTC)


 * Thank you very much. Sole Soul (talk) 00:43, 22 April 2010 (UTC)

I have no intention of making this a regularly updated report as I don't understand Tim's output. I don't understand why there are two columns (article title and template title) and why there are results being listed multiple times. It's fairly trivial to obtain a list of pages in namespace 0 that are in Category:Pages with missing references list, if by no other means than by using yanker. --MZMcBride (talk) 05:42, 25 April 2010 (UTC)

request
Sorry but, I tried to use the scripts in italian wikipedia, I have an account on toolserver! what can I do for use this scripts? :( --.snoopy.  10:22, 23 April 2010 (UTC)
 * Hi. You need to install wikitools and check out the git repository. Links are available here: Database reports/Configuration. Let me know if you have any questions. --MZMcBride (talk) 18:36, 23 April 2010 (UTC)


 * Hi, thanks for the reply, Ok, Yesterday morning I download wikitools and after I run this commands on toolserver:
 * tar -xf wikitools-1.1.1.tar.gz
 * and into wikitools-1.1.1's directory I run python setup.py install.
 * but I can't use wikitool module in python on toolserver :|
 * sorry but my english is not good enought --.snoopy.  18:56, 24 April 2010 (UTC)

There is no setup.py. Here's what you need to do:

Navigate to the directory where you want to put the wikitools folder. Then run this command: $ svn co http://python-wikitools.googlecode.com/svn/trunk/wikitools/ wikitools

Then you need to modify your PYTHONPATH. Edit your .bash_profile: $ nano ~/.bash_profile

Add the following line to it (where the path is wherever your wikitools/ directory is): export PYTHONPATH="/home/ /path/to/directory"

For me, this is "/home/mzmcbride/scripts" because my "wikitools" directory is in /home/mzmcbride/scripts.

Save the .bash_profile file and then you need to source it. $ source ~/.bash_profile

You can check your PYTHONPATH by running this command: $ echo $PYTHONPATH

Now you need to test that wikitools is working properly: $ python -c 'import wikitools'

If this fails, paste the full error message onto this page.

You can also test wikitools by creating a file like this:

Save this file as "wikitools-test.py" and then run: $ python wikitools-test.py

It should output "Success!". If it doesn't, paste the full error message onto this page.

Hope that helps. --MZMcBride (talk) 04:55, 25 April 2010 (UTC)


 * yes Now it's all OK. thanks a lot MZMcBride!! :-)) --.snoopy.  06:48, 26 April 2010 (UTC)


 * sorry but I try to use one of your scrips, brokenredirects.py and insert itwiki_p where I find enwiki_p but in .my.cnf I found only my username dario_vet and the password, I don't find any configurations for my .snoopybot. --.snoopy.  07:59, 26 April 2010 (UTC)


 * You need to rename "settings.sample" to "settings.py" and then modify it appropriately (example file here). Be sure to run "chmod 600 settings.py" and "chmod 600 settings.pyc" after modifying the file to prevent others from being able to read it. --MZMcBride (talk) 08:02, 26 April 2010 (UTC)

ok. I create fil ~/wikitools/setting.py do you think is ok? --.snoopy.  08:41, 26 April 2010 (UTC) you speak about settings.pyc, but I don't have any settings.pyc, thanks for you help anyway --.snoopy.  08:55, 26 April 2010 (UTC)
 * The file looks fine, except the "host" line and the "rootpage" line. The host line needs to be 'sql-s2'. The rootpage lines needs to end with "/". When you run brokenredirects.py (or any of the other scripts), it will create the settings.pyc file. --MZMcBride (talk) 12:07, 26 April 2010 (UTC)
 * seems to be ok (link) if I copy line for line on a python shell on toolserver, but if I run python2.5 -c "import wikitool" wiki/brokenredirect.py don't work, I must write "import wikitool" in the command, I hope it's ok :| --.snoopy.  12:39, 26 April 2010 (UTC)
 * You want "python ~/block/brokenredirect.py" --MZMcBride (talk) 15:05, 26 April 2010 (UTC)
 * ok, now it's ok, but the script run only under home directory, not on ~/block, I thinks that is a problem with export, now my configuration for export is:

the script run on ~ but not on ~/block thanks for help. --.snoopy.  15:24, 26 April 2010 (UTC)
 * You could try setting your PYTHONPATH to "/home/dario_vet". Mine is "/home/mzmcbride/scripts" because my wikitools/ directory is inside the scripts folder. Your wikitools directory isn't. Hope that helps. --MZMcBride (talk) 15:26, 26 April 2010 (UTC)
 * I move all file on /home/dario_vet/wikitools/block but not work, mmm I don't think where I done an error :| --.snoopy.  15:55, 26 April 2010 (UTC)

$ echo $PYTHONPATH should output: /home/dario_vet If it does not, you need to modify your PYTHONPATH in ~/.bash_profile and then source the file (see above). --MZMcBride (talk) 16:04, 26 April 2010 (UTC)

Ok, now I think I have a directory tree like your, befoure I read wrong, now I have this situation: conferm path and wikitools is in script directory list file in wikitools: now I start broken.py, the output is: I try another time, and I go to bed. Thanks a lot for your help! --.snoopy.  20:11, 26 April 2010 (UTC)
 * I replace settings with my variable and It's ok... he import correctly wikitools but not settings :-) --.snoopy.  20:20, 26 April 2010 (UTC)
 * You want "settings.py" to be in the same directory as whatever file you're running is in. It should not be in the wikitools/ directory unless your other scripts are also in that directory. So, for example, "broken.py" is in /home/dario_vet/scripts. "settings.py" should also be in /home/dario_vet/scripts. Then it should import properly. --MZMcBride (talk) 06:36, 27 April 2010 (UTC)
 * Ok, now is all ok. thanks very very much --.snoopy.  11:39, 27 April 2010 (UTC)

articles transcluding Infobox settlement without coordinates_display
Per Template talk:Infobox settlement and BOTREQ, we are looking to identify articles that


 * 1) Transclude Template:Infobox settlement
 * 2) Do not transclude Template:Coord/display/inline,title
 * 3) Do not transclude Template:Coord/display/title

Not sure if this belongs here or WT:DBR. I imagine the initial run can be taken from a dump, and future cases being in a database report. If possible, please prioritize over my above requests. –xeno talk  15:24, 4 May 2010 (UTC)
 * This is trivial. I'll try to get to it tonight. You just want pages in ns:0? Do you have a report title in mind and an update frequency? --MZMcBride (talk) 22:19, 4 May 2010 (UTC)
 * Articles lacking title coordinates? I think... Yes, just mainspace-thanks. –xeno talk  22:25, 4 May 2010 (UTC)
 * You say "ugh," but someone has to come up with a report title. This isn't WP:DDR, after all! And I'm already doing all the heavy lifting! --MZMcBride (talk) 22:27, 4 May 2010 (UTC)
 * My original name was the header - but I guess you didn't like it =) –xeno talk  22:31, 4 May 2010 (UTC)
 * It seems like this job could be simple enough fot Catscan 2.0. However, the tool has yet to work for me. Tim 1357  <font face="Times new roman" size = 2 >talk 23:28, 4 May 2010 (UTC)
 * That thing looks cool. –<font face="verdana" color="black">xeno talk  23:32, 4 May 2010 (UTC)

65,000 results. Do you want them in a text file or something? --MZMcBride (talk) 02:00, 5 May 2010 (UTC)
 * Awesome! Text file is fine–<font face="verdana" color="black">xeno talk  02:01, 5 May 2010 (UTC)
 * ~mzmcbride/xeno-settlement-coord-2010-05-04.txt --MZMcBride (talk) 02:03, 5 May 2010 (UTC)
 * Thanks again. This is a huge help. AWB was choking trying to compare a 200,000 list to a 356,000 one =) [I wonder if JIRA is fixed yet...] –<font face="verdana" color="black">xeno talk  02:05, 5 May 2010 (UTC)
 * No problem. And yes. --MZMcBride (talk) 02:07, 5 May 2010 (UTC)

Report Suggestion : Orphaned Fair Use
List all media in File: namespace having a 'Fair Use' tag but which is not indicated as actually being used in an article.

Basic Query suggested by carl-m on the toolserver IRC channel: select concat('File:', page_title) from page join categorylinks on page_id = cl_from where page_namespace = 6 and cl_to = 'All_non-free_media' and not exists (select 1 from imagelinks where il_to = page_title) limit 4;

Sfan00 IMG (talk) 13:44, 9 May 2010 (UTC)
 * My bot, DASHBot does this every night. I could have the list be saved somewhere, but I see no point because the bot tags them immediately. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 16:11, 9 May 2010 (UTC)
 * Done: . --MZMcBride (talk) 17:48, 9 May 2010 (UTC)

Request: ' Free' file lacking information
Should be a relativly straightforward report..

Find and display by date all files that are in Category:All free media, but do not currently have a information block.

Reason for the request is to further identify items for Commons and resolve some of the problems before they get transfered (and deleted :.

Sfan00 IMG (talk) 11:23, 14 May 2010 (UTC)
 * You don't need a database report becase you can do it with CatScan. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 17:17, 14 May 2010 (UTC)
 * Well, CatScan is being stupid so I went ahead and made the list for you. Mind you its friggin' huge. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 20:47, 14 May 2010 (UTC)


 * Thanks :) Looks Like a considerable backlog to clear :) Sfan00 IMG (talk) 15:55, 19 May 2010 (UTC)

PRODed pages with deletion logs
I think we could use to have a daily report of all pages in Category:All articles proposed for deletion (alternatively, all articles containing a dated prod tag), which have deletion logs. Frequently, these are pages are recreated versions of previously PRODed pages, where the recreation is contesting the previous deletion. עוד מישהו Od Mishehu 13:37, 31 May 2010 (UTC)
 * I don't yet have approval to have a bot update a list, but I have a list that update every night here. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 22:04, 31 May 2010 (UTC)
 * Im working on making them clickable links. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 01:47, 1 June 2010 (UTC)

Dab pages with two or less entries
Once/month DBR reporting all pages with disambiguation but have two or fewer wikilinks. Thanks,  — fetch ·  comms   02:03, 3 June 2010 (UTC)
 * Do you realize that all pagelinks count and that there's no way to separate meta-data from legitimate page content? For example, the disambiguation template alone will +1 all counts (due to <tt> disambiguation </tt>). --MZMcBride (talk) 02:18, 3 June 2010 (UTC)
 * I limited the count of the page_links to only links to the main namespace. Do you want a regular report? <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 03:55, 3 June 2010 (UTC)

mysql> select page_title -> from page -> join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title = 'Disambiguation' -> where page_namespace=0 -> and (select count(*) from pagelinks where pl_from = page_id and pl_namespace = 0) <= 2 limit 200; +---+ +---+ +---+ 200 rows in set (5.75 sec)
 * page_title                               |
 * Transport_phenomena                      |
 * Solute                                   |
 * Orientalist                              |
 * Indrani                                  |
 * Web_TV                                   |
 * Shirako                                  |
 * SDK_(disambiguation)                     |
 * War_of_Religion                          |
 * Algazel                                  |
 * Komunistická_strana_Slovenska           |
 * Frente_Amplio                            |
 * Leopoldville                             |
 * Southfield                               |
 * Braces                                   |
 * Partido_dos_Trabalhadores                |
 * Harmodius                                |
 * United_Telecommunications                |
 * Mouth_harp                               |
 * Zhang_Bao                                |
 * Trove                                    |
 * Vara_(length)                            |
 * Nikita_Sergeyevich_Khrushchev            |
 * Realistic                                |
 * Bolckow                                  |
 * Open-loop                                |
 * Beos                                     |
 * Front_Populaire                          |
 * Honey_berry                              |
 * Ackerly                                  |
 * Sfls                                     |
 * LPH                                      |
 * Van_de_wetering                          |
 * Quality_engineering                      |
 * Voron                                    |
 * DSDP                                     |
 * OFI                                      |
 * Det_Liberale_Folkeparti                  |
 * Battle_of_Fei                            |
 * Basting                                  |
 * Fanmail                                  |
 * ITRF                                     |
 * Acción_Popular                          |
 * English_Wars                             |
 * Marxist-Leninist_Party_of_Canada         |
 * Pocket_Hercules                          |
 * Sleeping_pills                           |
 * Mavado                                   |
 * CPI(ML)                                  |
 * Lord_Buckhurst                           |
 * Front_de_Libération_Nationale           |
 * Susan_Moore                              |
 * Unión_Republicana                       |
 * Partido_Acción_Ciudadana                |
 * Tarentel                                 |
 * Tripling                                 |
 * Eidi                                     |
 * Rehab                                    |
 * KMDB                                     |
 * Demokrat_Parti                           |
 * F-Zero:_GP_Legend                        |
 * Bouzoukia                                |
 * Abyssinians                              |
 * Sakkara                                  |
 * Juk                                      |
 * Record_(audio)                           |
 * Funky_Chicken                            |
 * Ancient_Chinese                          |
 * Harikrishna                              |
 * Globo                                    |
 * Garey                                    |
 * Palkar                                   |
 * Iverson_notation                         |
 * Access_Point                             |
 * Studentification                         |
 * Chogye                                   |
 * Oligo                                    |
 * Derechos                                 |
 * B58                                      |
 * Bradley_Field                            |
 * Murren                                   |
 * Jatni                                    |
 * IFBB                                     |
 * Cordillera_Mountains                     |
 * Fleming_Museum                           |
 * Amusement_arcade                         |
 * PZP                                      |
 * Hakkinen                                 |
 * Aggtelek                                 |
 * Luminis                                  |
 * Mosk                                     |
 * Lask                                     |
 * Banjo_Frog                               |
 * ESDS                                     |
 * New_Liberalism                           |
 * Party_of_the_Left                        |
 * Morris_and_Company                       |
 * Dosco                                    |
 * Ponoka                                   |
 * Mercantile_capitalism                    |
 * Logical_Volume_Manager                   |
 * Indian_Tobacco                           |
 * Winlock                                  |
 * Perfect_society                          |
 * Tetu                                     |
 * CPIF                                     |
 * Microsoft.net                            |
 * OneTel                                   |
 * Banana_spider                            |
 * ECWA                                     |
 * YYT                                      |
 * Longwang                                 |
 * Tube_frame                               |
 * Wilson_Administration                    |
 * Hard_metal                               |
 * Fishing_Party                            |
 * YLD                                      |
 * Rabbit_fever                             |
 * ENSAM                                    |
 * MIAT                                     |
 * ARPS                                     |
 * S85                                      |
 * Talkshow                                 |
 * Vivandière_(disambiguation)             |
 * National_football_team                   |
 * Scarlet_Knight                           |
 * Lynnfield                                |
 * Macke                                    |
 * Jōshū                                  |
 * The_Ink_and_Paint_Club                   |
 * Communist_Party_of_Spain_(international) |
 * Unity_List                               |
 * Orok                                     |
 * AIYF                                     |
 * Clinching                                |
 * CYRL                                     |
 * XED                                      |
 * Dwight_Baldwin                           |
 * Combat_mission                           |
 * Religious_drama                          |
 * V21                                      |
 * EUB                                      |
 * Balot                                    |
 * Big_Thinkers                             |
 * Partido_Progressista                     |
 * Kalenda                                  |
 * Information_filter                       |
 * CPTP                                     |
 * MLPC                                     |
 * BADD                                     |
 * G100                                     |
 * Menswear                                 |
 * S._S._Stevens                            |
 * Rock_102                                 |
 * Nonanol                                  |
 * SURS                                     |
 * Windows_4.0                              |
 * W16                                      |
 * The_Man_Comes_Around                     |
 * Bundesgericht                            |
 * Sam.                                     |
 * Vedic_University                         |
 * Gunkanjima                               |
 * RSLC                                     |
 * Paulinerkirche                           |
 * Drop_(album)                             |
 * Bezhta                                   |
 * Treaty_of_Dayton                         |
 * OMLE                                     |
 * Bishi                                    |
 * IIUC                                     |
 * NVM                                      |
 * Posterior_intercostal                    |
 * B29                                      |
 * Parti_Socialiste_Unifié                 |
 * BJCC                                     |
 * Sea_Holly                                |
 * UNLP                                     |
 * Red_Shirt_Movement                       |
 * Cruiserweight_Championship               |
 * D_Series                                 |
 * Academy_for_Jewish_Religion              |
 * Rasse                                    |
 * Margareta_Eriksdotter                    |
 * W27                                      |
 * Well-tempered                            |
 * Helen_Gould                              |
 * Movimiento_Socialista_de_los_Trabajadores |
 * NAMB                                     |
 * Mazandarani                              |
 * LTUC                                     |
 * Cordis                                   |
 * Partido_Social-Democrata                 |
 * VLK                                      |
 * Sagal_Twins                              |
 * Serious_Fraud_Office                     |
 * EESI                                     |
 * Isildur's_Bane                           |
 * MTUC                                     |
 * NSIC                                     |
 * NSIC                                     |


 * Oh, duh. Of course you do. Ill set up a regular report. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 03:57, 3 June 2010 (UTC)
 * ✅ Database reports/Tiny Disambiguation pages <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 04:29, 3 June 2010 (UTC)

Something is broken with that report. The pasted results seem fine, so you must've used a different query for the automated report. The results are simply wrong in the automated report.

There should probably be a better way of adding new reports.... --MZMcBride (talk) 05:49, 3 June 2010 (UTC)
 * Oh, fixed. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 22:05, 3 June 2010 (UTC)
 * Thanks, I'll try to deal with the pages sometime. Didn't realize there would be 500.  — fetch ·  comms   02:41, 4 June 2010 (UTC)
 * It's capped at 500, there could be many more than that. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 03:55, 4 June 2010 (UTC)
 * Here is the full list (a bit over 1,000). I just thought 500 would be the most practical. <font color="Blue" face="Arial" >Tim 1357  <font face="Times new roman" size = 2 >talk 03:57, 4 June 2010 (UTC)

I've been collecting MOS:DAB issues, numbered according to Josh Parris's list, when people use the Dab solver. The file is TSV formatted, and best worked with using filtered tables in Excel and then converted to wiki code in WikEd. The revision id (which increases with time, hint) is included so, in principle, we can prune updated pages. However, I will need to redo it to gather the number occurrences of each issues; this allows identifying pages which are in a particular bad state. — Dispenser 05:59, 4 June 2010 (UTC)

Reviewers
It seems the flagged protection trial is planned for implementation next week, so it will be useful to have a list of candidates for the reviewer usergroup. So I ask for an update of Database reports/Potential reviewer candidates (with links). Since it's kinda lots of users, we should probably use the 'most obvious first' strategy, by using higher standards for the first few reports. I suggest 2,500 edits, their first edit more than a year ago, and their latest edit within the past month for now. It can be one-time for now then daily or every 2-3 days when we start granting the rights. Cenarium (talk) 15:10, 9 June 2010 (UTC)
 * Easy enough to update the report. Personally, I am curious about this, though. I thought RobLa, et al. had declared that all autoconfirmed users were getting review rights. Did that change or am I mistaken? --MZMcBride (talk) 16:33, 9 June 2010 (UTC)
 * I believe that's been quashed. –<font face="verdana" color="black">xeno talk  16:35, 9 June 2010 (UTC)


 * I'm running the updated report now. It may fail due to such a large page size (things like the motherfucking AbuseFilter like to kill large edits, even if they're completely legitimate). If it fails, I can add pagination easily enough or we can limit output, whichever. --MZMcBride (talk) 16:38, 9 June 2010 (UTC)
 * One way to reduce the size would be to exclude: <tt>rollbackers, account creators, autoreviewers</tt>, these people are likely trusted enough by virtue of those rights to receive the right implicitly or explicitly. –<font face="verdana" color="black">xeno talk  16:43, 9 June 2010 (UTC)


 * Cenarium: Do you have a preference between pagination or limitation? Putting all the results on one page simply isn't feasible. --MZMcBride (talk) 20:50, 10 June 2010 (UTC)
 * Ok, we probably need to do several reports. The reports should of course exclude admins and users in the reviewer usergroup, when it comes live. A possibility is
 * a report listing users in the rollbacker, account creator or autoreviewer usergroups whose latest edit is within the past month
 * a report listing users with X edits, their first edit more than Y ago, and their latest edit within the past month, and who are not in any of the above usergroups; where X and Y are gradually downgraded as we grant the rights. We can start with X=2,500, Y= 1 year, and if there are still too many, then paginate. Cenarium (talk) 23:57, 10 June 2010 (UTC)
 * Haven't all kinds of auto-promotion been rejected by the community? Amalthea  13:34, 11 June 2010 (UTC)
 * Autopromotion by the software, yes. But admins will make their own determination when parsing the lists. Cenarium (talk) 15:52, 11 June 2010 (UTC)
 * Ah, using the lists as strong suggestions is of course fine. It should have actually read your initial post, sorry about that. Amalthea  16:01, 11 June 2010 (UTC)


 * I think one report is fine as the proposed report 1 is naturally a subset of the proposed report 2. Adding a "Groups" column that lists the user's current user groups might be nice, though. --MZMcBride (talk) 20:17, 11 June 2010 (UTC)
 * I'll also be implementing an "Exceptions" subpage for users who wish to not be listed (and therefore not be in the "reviewer" user group), like me. --MZMcBride (talk) 20:31, 11 June 2010 (UTC)
 * Thanks, it has been requested at AN too. Cenarium (talk) 03:50, 12 June 2010 (UTC)

This should be done now. The report isn't particularly cheap to update, though. --MZMcBride (talk) 22:28, 12 June 2010 (UTC)


 * Indef-blocked editors are currently on the list, would of course be best if they can be excluded. But what's happening now? If you need me to mass-add people to the group, say the word. Personally I'd just add everyone from the lists and deal with hypothetical false positives once they are reported, but I'd certainly go ahead and add all rollbackers, autoreviewers, and account-creators since they all previously had someone look at them before. Amalthea  09:32, 13 June 2010 (UTC)
 * The vast majority of users probably meet the (not yet established) criteria. There were a suggestion that users should be recently sufficiently active, so more than one edit in the latest month, how many subject to interpretation, and recent blocks should also be considered. Maybe users who had not requested the right should be given some explanatory notice, so either a talk page notice, or we wait until the trial begins where we'll have the sitnotice. Cenarium (talk) 04:09, 14 June 2010 (UTC)
 * Hmm, if people are still not happy with any kind of fixed criteria then I'll hold still for now (won't get around to do any sanity checks myself the next few days). But as indicated, feel free to give me a list of usernames and I'll semi-automatically put them into the group. Amalthea  09:48, 14 June 2010 (UTC)

I manually updated this list this morning. As expected, there was a large drop in the output. --MZMcBride (talk) 15:19, 16 June 2010 (UTC)

Images uploaded from Flickr without renaming
Would it be possible to generate a weekly report of English Wikipedia files whose names follow, more or less, the regex  (don't count the slashes, case-sensitive is OK, and convert to SQL format as appropriate). This would be very useful to identify both files that need renaming and also copyvios (you can prepend http://www.flickr.com/photo.gne?id= to the first space-separated part to get the original description page including licensing information). Thanks, PleaseStand (talk) 05:22, 12 June 2010 (UTC)
 * Simple enough to do; this is pretty much the same methodology that uses. Do you want to query Commons as well? Or avoid Commons images? How often do you want the report updated? And do you have a good report title in mind? For the link bit, I'll probably just use a wrapper template that you can customize to meet your needs. --MZMcBride (talk) 05:51, 12 June 2010 (UTC)
 * Commons images should be in a separate section. Both reports should be run weekly. A good report title would be "Files with Flickr-generated filenames". PleaseStand (talk) 06:16, 12 June 2010 (UTC)
 * To clarify: this large backlog (much from 2006 and 2007) remains from before such filenames were title-blacklisted, so the backlog should not be growing. I am not sure how many such filenames exist on Commons, but I know many exist herePleaseStand (talk) 06:29, 12 June 2010 (UTC)
 * Teach a man to fish... http://toolserver.org/~nikola/grep.php?pattern=%5E%5B0-9%5D%7B7%2C11%7D+%5B0-9a-f%5D%7B10%7D.*%5C.jpe%3Fg%24&lang=en&wiki=wikipedia&ns=6 --MZMcBride (talk) 14:10, 4 July 2010 (UTC)
 * Shit, sorry, I just noticed it's been over two weeks since I looked at this. The month flew by. Silly life, always getting in the way. --MZMcBride (talk) 14:12, 4 July 2010 (UTC)
 * Thank you! I never knew that there was an article title grep program on the toolserver. It is good that the list is manageably short. PleaseStand (talk) 14:31, 4 July 2010 (UTC)

Edit filters warnings effectiveness
I'm not sure if this is doable. A report that list edit filters with "warn" as action (exclude filters that have "disallow" in addition). For each of the filters 1) the number of edits that produced warning in the last 30 days 2) the portion of those edits that were not saved after the warning 3) calculate the percentage. Thank you. Sole Soul (talk) 23:57, 21 June 2010 (UTC)
 * I suppose you want number of edits (not users) that produced warning, right? Also, I'm not sure how to properly decide whether an edit actually followed a warning. But I think something like “warned user made some edit to the page he was warned about during one minute after that warning” should be pretty accurate. Svick (talk) 00:26, 22 June 2010 (UTC)
 * Yes, that's what I meant, corrected. I think the warning filters log the edit initially with the action "warn", and a second time when saved with action "none" or "tag" etc. See this for example. Sole Soul (talk) 00:44, 22 June 2010 (UTC)
 * Ok, I posted the result at User:SvickBOT/Edit filter effectiveness. If you find any inaccuracies, let me know. Svick (talk) 04:12, 22 June 2010 (UTC)
 * Wow, that was fast. Thank you very much. The filters that warn and disable like 9 and 10 show 100% effectiveness, which is inaccurate. Apparently these filters warn users and then disallow them if they choose to save (which really doesn't make sense, either warn the user or disallow him). Thank you again. Sole Soul (talk) 05:27, 22 June 2010 (UTC)
 * I added a column for warning messages links to the report. Sole Soul (talk) 06:31, 22 June 2010 (UTC)
 * I removed the filters that warn and disallow, like you said, it didn't provide any useful data. BTW, it's usually not a good idea to edit page that is generated in some automatic way – the next time it is generated, your changes will be lost (I added the new column to the program that generates the table). Also, if you wanted, I could update the table periodically or on demand. Svick (talk) 13:53, 22 June 2010 (UTC)
 * Yes, that would be great, may be every month. Sole Soul (talk) 21:43, 22 June 2010 (UTC)
 * Ok, done. Svick (talk) 22:33, 22 June 2010 (UTC)
 * Very interesting report! One highlight....linkspamers ignored the warning.  only 22% effective.  --Rocksanddirt (talk) 22:41, 22 June 2010 (UTC)

Non free images without rationales
Is it possible to exclude images that have the following wording :

"It is held that the use of copyrighted images of individuals that are no longer living, are inherently unrepeatable, and where no new free-use image is available is permitted under US copyright law governing the fair use of copyrighted properties, and Wikipedia fair use policies. Its use is solely for the purpose of illustration and in no way limits the copyright holders rights."

in a permission field of an information template? Sfan00 IMG (talk) 09:58, 4 July 2010 (UTC)
 * Sure, done. --MZMcBride (talk) 04:51, 13 July 2010 (UTC)

Non Free Images without Rationales (exclude from report)
Exclude from search-
 * Short-Rationale
 * Standard-Rationale
 * MTG set symbol
 * Non-free Wikimedia logo (and no other non-free tag)

This would assist greatly in finding the images that are actually without rationales (as opposed to having non-standard/non-templated ones.) Sfan00 IMG (talk) 10:36, 6 July 2010 (UTC)
 * I excluded the first three. I'm not sure about the last one, mostly due to this thread on my talk page. --MZMcBride (talk) 04:52, 13 July 2010 (UTC)
 * Yes, it's technically non-free, but it is treated as free- we are allowed to use it in the userspace and rationales are not required. J Milburn (talk) 12:03, 14 July 2010 (UTC)

Related
The configuration for the report is available here: Database reports/Non-free files missing a rationale/Configuration

If you search in your browser for "fair_use_strings", you can see the strings that the bot looks for to determine whether a file description page of a non-free file uses a proper rationale. --MZMcBride (talk) 06:10, 13 July 2010 (UTC)

Unprotected Templates by Number of Transclusions
Hello, It would be nice to see a list of templates that are not protected, sorted by the number of transclusions they have. That way some sysops could go through and protect the higher risk templates. Tim1357 (talk) 06:54, 17 January 2010 (UTC)
 * Would an extra column at be sufficient or would you like a separate report? --MZMcBride (talk) 04:17, 11 March 2010 (UTC)
 * A column would probably be ok, but if you sorted by protection status would it sub-sort by number of transclusions? –<font face="verdana" color="black">xeno talk  14:51, 23 March 2010 (UTC)
 * Yeah, there may actually be an advantage to creating a separate report. Not the sorting issue, necessarily, but it would drastically cut down on the noise of the new report and allow one to be purely statistical (the current report) while the other could be more production (the new report). Though I'm not a huge fan of using template protection, so I'd want to avoid any mass protections simply due to being listed.... In the meantime, desperately needs some love. --MZMcBride (talk) 15:06, 23 March 2010 (UTC)
 * Gah! Can you exclude subpages of Template:Convert? Some of these are also protected because they are "frequently substituted"... Maybe a filter for that might be worthwhile as well. –<font face="verdana" color="black">xeno talk  15:22, 23 March 2010 (UTC)
 * Yes, I can, but I'm not really inclined to. They're sorted alphabetically, just skip over them if you'd like. Or unprotect them, as they're not widely used and their protection is silly. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
 * Need to pick a report name for this and come up with a limit for number of transclusions. I guess it only matters if editing is sysop-only. Nobody should care about move protection, surely. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
 * Done: . --MZMcBride (talk) 03:11, 11 September 2010 (UTC)

While I don't necessarily object to this report, it may be sensible to change the cut-off point given the mess it has inadvertantly caused. Looking at the lower end of the report, do 501 transclusions really constitute "many"? PC78 (talk) 13:29, 13 September 2010 (UTC)
 * What limit would you suggest? --MZMcBride (talk) 19:17, 13 September 2010 (UTC)
 * 1000? Even that may be too low, but it would be enough to cut the report (as it is now) in half. Ultimately it depends on what this report is to be used for. I'll raise the question over at the WP:ANI discussion. PC78 (talk) 00:10, 14 September 2010 (UTC)

Potentially Commons
Would it be possible to have a report that lists Files in Category:All free media which are not currently in Category:Copy to Wikimedia Commons so that they may be more readily identified?

Someone called 'erwin' on the #Wikimedia-toolserver IRC, suggested the following via a pastebin,

1.SELECT page_title 2.     FROM page 3.     JOIN categorylinks 4.     ON cl_from = page_id 5.     WHERE page_namespace = 6 6.     AND cl_to = 'All_free_media' 7.     AND NOT EXISTS (SELECT *   8.                      FROM templatelinks   9.                      WHERE tl_from = page_id  10.                      AND tl_namespace = 10  11.                      AND tl_title IN ('Nocommons', 'Not commons')  12.                      ) 13.     LIMIT 10 but they pointed out that it's only a basic starting point and would need a LOT of tweaking before it's usuable for the intended purpose of finding Commons candidate material that is not currently tagged as such.

It would need to exclude Files already in Category:Copy to Wikimedia Commons, as well as images with PUI, No license or No source tags etc

Sfan00 IMG (talk) 11:31, 18 January 2010 (UTC)


 * Here are the first 100 results of pages in the File namespace that are not in Category:Copy to Wikimedia Commons or Category:Wikipedia files on Wikimedia Commons:


 * Please review this list and provide feedback regarding its results. Thanks. --MZMcBride (talk) 05:33, 25 April 2010 (UTC)


 * So far it's looking good :), I'm tagging the problems I find as well :) Sfan00 IMG (talk) 11:39, 25 April 2010 (UTC)


 * Is this still wanted? Is there a report name in mind? A report update frequency? --MZMcBride (talk) 19:25, 24 October 2010 (UTC)


 * Belated followup - This can't proceed until some other issues relating to finding images tagged for Commons that don't have

recognisable license tags are resolved. Sfan00 IMG (talk) 13:49, 5 January 2011 (UTC)

Stale report?
Database reports/Indefinitely semi-protected articles hasn't been updated since late October? – xeno talk  14:34, 1 March 2010 (UTC)
 * Yar, it got to be too big for a single page and paginating it is an issue as it divides the content into redirects and non-redirects. This isn't the only broken report, see below. --MZMcBride (talk) 19:33, 9 March 2010 (UTC)
 * What about splitting the reports? (Indefinitely semiprotected redirects). – xeno talk  19:35, 9 March 2010 (UTC)
 * That might be an option, though it has two issues: (1) it would make the semi-protected report inconsistent with the fully protected report (though I guess that report could use some pagination too, heh), and (2) it would mean that the non-redirect report would use something clunky like "Indefinitely semi-protected non-redirects" or "Indefinitely semi-protected articles", neither of which are very nice. --MZMcBride (talk) 19:43, 9 March 2010 (UTC)
 * Good god you're obsessive-compulsive like me, aren't you? ;p The two reports (Indefinitely semi-protected (articles|redirects)), when listed side-by-side, should be self-explanatory. Alternatively the landing page could disambiguate the viewer. – xeno talk  19:57, 9 March 2010 (UTC)

Uncatecorized and Unreferenced BLPs
Could someone make a list of articles in that are in no categories when one ignore all maintnance categories,  and Cat:XXXX births. Rettetast (talk) 19:50, 22 February 2010 (UTC)
 * I made the above request at WP:BOTREQ a few weeks ago, but nothing happened. This is probably a better venue.
 * I have chosen to only do Unreferenced BLPs at the moment, but the report could be expanded to do all articles. It depends on how large the output will be. Also articles in is already in a queue and does not need to be listed on this report. Rettetast (talk) 11:30, 15 March 2010 (UTC)
 * Example page? --MZMcBride (talk) 16:43, 15 March 2010 (UTC)
 * 1, 2, 3, 4, 5, 6, 7, 8. Rettetast (talk) 17:27, 15 March 2010 (UTC)
 * Thanks for the examples. I spent about an hour working on this, but I think I need to re-think how to do this. --MZMcBride (talk) 23:40, 15 March 2010 (UTC)
 * Done: . How often do you want the report updated? --MZMcBride (talk) 21:29, 18 March 2010 (UTC)
 * Thank you. Once a week is enough. I'll leave a note here if I need the report too be expanded too all articles and not just the unreferenced BLPs. Rettetast (talk) 22:19, 18 March 2010 (UTC)

Could the report be expanded too all Living people? And please exclude articles that are tagged with catimprove. The latter should also be done to Database reports/Potential biographies of living people (4). Rettetast (talk) 11:38, 7 April 2010 (UTC)
 * I'm not sure expanding this report would be accurate. If you use Category:Living people instead of Category:All unreferenced BLPs, the report title would be "Uncategorized biographies of living people," not "Uncategorized and unreferenced biographies of living people." Presumably the biographies not in Category:All unreferenced BLPs are referenced. Would a separate report work?
 * I'm also not sure why cat improve should be applied to . The goal of this report is to find pages that need Category:Living people. Can you elaborate? --MZMcBride (talk) 12:01, 7 April 2010 (UTC)
 * A separate report would be fine, but I was thinking of expanding the focus to not moving it. Maybe a new report titled, and also include all articles in XXXX births/deaths, would be appropriate.
 * The point with cat improve is that it already is in a maintenance category and is more likely to already be fixed when you get to it. Rettetast (talk) 12:24, 7 April 2010 (UTC)
 * Bump. Rettetast (talk) 21:03, 22 May 2010 (UTC)
 * Are you using to find pages in need of better categorization or are you using it to find bios that need to be in Category:Living people? It sounds like you're using the report for the former, when its purpose is the latter. I think that's why your cat improve suggestion didn't make sense to me. --MZMcBride (talk) 00:01, 23 May 2010 (UTC)

Database reports/Blank single-author pages
Please get Database reports/File description pages containing no templates and/or Database reports/File description pages containing no templates or categories up and runnung again, and then remove the File: namespace from Database reports/Blank single-author pages. There are too many files which were uploaded with empty description pages, and that has nothing to do with CSD G7. עוד מישהו Od Mishehu 10:41, 18 March 2010 (UTC)
 * The broken reports are noted above. I just filed another bug in Wikimedia's bug tracker (22883) to accompany the Toolserver bug report (jira:TS-549). I could probably rewrite these reports to use a different method of comparing against Commons, but I'd really rather see this bug resolved properly (setting the data types as varchar or varbinary consistently). I'm not sure if this is likely to happen in the near future, though, so I may be forced to rewrite these reports in the next few weeks.


 * As for, I don't think the File pages are really overwhelming the report, though if they're being listed elsewhere, I don't have an issue removing them. I'll do it if/when the other reports are fixed. --MZMcBride (talk) 15:05, 18 March 2010 (UTC)

Recently created pages with deleted titles
I think we could use an automatically generated list of recently created (30 days would make sense, based on Special:NewPages) pages, whose author isn't an admin, which have a deletion log entry and no undeletion entry. Pages like this are likely recreations of the original pages, which would frequently be speedy deletable (if the original was a speedy deletion, then under the same CSD; if the original was deleted per XFD, then CSD G4). Weekly seems like a good frequency for this one. עוד מישהו Od Mishehu 09:36, 14 April 2010 (UTC)
 * Example page? --MZMcBride (talk) 16:21, 14 April 2010 (UTC)
 * I think this would be a good idea, except I wouldn't necessarily limit it to those with no undeletion history. Some pages have a history of deletion/restoration only to be finally settled by an XfD resulting in delete. If such a category were recreated, filtering out ones with a restoration in history wouldn't catch this. An example of one that would fit Od Mishehu's original suggestion would be Category:Wikipedian cyclists, and an example of one that wouldn't be caught would be Category:Socialist Wikipedians. VegaDark (talk) 16:30, 14 April 2010 (UTC)

I looked at pages that were new (in the recentchanges table, which goes back about 30 days) and had a deletion entry. There were about 5,000 results total. I put 100 results below for analysis. These are only articles, but there are IP user talk pages, file pages, and other types of pages in the results as well. --MZMcBride (talk) 23:42, 14 April 2010 (UTC)

Incomplete work
--MZMcBride (talk) 15:24, 3 May 2010 (UTC)
 * &mdash; the intention was to split off pages with 0 categories from the "(4)" report; (4) is no longer listing pages with 0 categories, but "(5)" still hasn't been created
 * &mdash; looks like this has been committed and pulled, it just needs to be added to the crontab, WP:DBR, /Announcements, and it needs a /Configuration subpage created
 * With regard to the second report, QCoder said here that there were some false positives, but he seems to enjoy the list nonetheless. I pointed him over here and asked if he could produce some common keywords for the false positives to make the report better. Killiondude (talk) 18:09, 3 May 2010 (UTC)


 * Non-free... is resolved. Still need to work on (5) quite a bit. --MZMcBride (talk) 03:54, 9 May 2010 (UTC)

Categories linking to deleted categories
I think we could use a weekly report for categories linking to deleted categories. These are frequently the result of back-links to renamed categories not being fixed. עוד מישהו Od Mishehu 12:39, 23 May 2010 (UTC)
 * There are and  already. Not sure what you want. --MZMcBride (talk) 13:52, 23 May 2010 (UTC)
 * Not the same thing. I'm talking about a "See also Category:Foo", or a navbox for categories, which isn't necessarily updated when the category is moved. עוד מישהו Od Mishehu 04:43, 24 May 2010 (UTC)
 * Ah, so you're talking about Categories containing deleted red-linked categories, pretty much. (There's a bit of ambiguity in that report title, but oh well.) I posted a list here. The results look pretty noisy. Some results appear to be related to parser magic voodoo. Or perhaps just bad rows in the pagelinks table, though that's less likely. Let me know what you think of the results. --MZMcBride (talk) 05:21, 24 May 2010 (UTC)
 * I think we should exclude any case where the deletion reason contains the string "C1" (the CSD for empty categories), which seems to be filling up the report. עוד מישהו Od Mishehu 10:51, 26 May 2010 (UTC)
 * Updated results. This report idea seems untenable. --MZMcBride (talk) 22:03, 26 May 2010 (UTC)
 * This does contain some useable information (see this edit, where I foxed a link thenks to your report). To make this more useful, I think we should remove categories transcluding Progress box (these are maintenance categories, where the links don't need to be updated), transcluding cat class (these are WikiProject categories, where some of the corresponding levels have been deleted). עוד מישהו Od Mishehu 08:18, 27 May 2010 (UTC)
 * And any transcluding Cfr full, which should catch categories being proposed for renaming to previously deleted titles. עוד מישהו Od Mishehu 08:30, 27 May 2010 (UTC)

Sorry, I seem to have missed your replies here (or I forgot about them). Are you still interested in getting an updated report with the new exclusions? --MZMcBride (talk) 05:01, 13 July 2010 (UTC)
 * Yes. עוד מישהו Od Mishehu 10:07, 13 July 2010 (UTC)
 * Here are the results with the three specified templates excluded: http://en.wikipedia.org/w/index.php?oldid=373363263
 * By the way, you should ping me on my talk page if I don't respond to a thread for a few weeks. --MZMcBride (talk) 01:30, 14 July 2010 (UTC)


 * Update: I would like it if the list would:
 * include cases where the target is a category redirect, regardless of the existrance of a deletion log for such pages;
 * exclude all cases where the target begins with a digit, as there are too many false positives in that section to make it useful;
 * exclude all source categories which transclude Articles by Quality, for the same reason as cat class.
 * עוד מישהו Od Mishehu 08:38, 29 July 2010 (UTC)
 * I'm confused now. This most recent request seems to have veered off of the original request. We haven't been looking at category redirects (or pseudo-redirects, as the case may be with category redirect). I'm not sure how I can modify the current query to accommodate the new request. It seems like it would have to go into a new list (with the exception of point 2, I suppose). --MZMcBride (talk) 01:37, 30 July 2010 (UTC)
 * Category redirects are almost always categories which have been renamed, but at lewast one user thinks that the old name is one people might be looking for. As such, from a human prospective, it's similar to the main thiong I'm looking for - links to categories which have been renamed, where the links haven't been updated. If you need to do it in a separate report, feel free to do so.
 * Points 2 and 3 are consistant with my earlier request - point 2 removes a practcly useless part of the list; point 3 removes a class of categories which, while technicly fit the original request, there is no need for anything to be done with. עוד מישהו Od Mishehu 07:31, 30 July 2010 (UTC)

I need to chart this, because I don't remember what the original request was or what I wrote and rewrote.

Here's what I have so far:
 * pages in the category namespace
 * that don't transclude progress box, cat class, and cfr full
 * that include links to deleted categories
 * where the deleted categories' deletion reason isn't like ".*([Cc]1|[Ee]mpty|Datadump).*"

Now you want to:
 * exclude pages beginning with a digit
 * add Articles by Quality to the list of templates to exclude based upon on pages

These two pieces are easy enough. The part about category redirects is confusing the shit out of me, so you'll need to copy and paste from this list and re-explain what exactly you're after with those. It'll have to be in a separate report, I think. --MZMcBride (talk) 02:54, 31 July 2010 (UTC)

Here you go: http://en.wikipedia.org/w/index.php?oldid=376370319 --MZMcBride (talk) 04:00, 31 July 2010 (UTC)


 * After an other thought, I think that links to category redirects may be a different report. What I want there is any case where:
 * Category:A links to category:B
 * Category:B is a category redirect (defined as containing a Category redirect template), or a catewgory disambiguation (defined as containing a Category ambiguous).
 * עוד מישהו Od Mishehu 10:12, 1 August 2010 (UTC)

Actually a red-link in a progress box means that the category should be created, so there's still an action required. However I seem to remember using AWB's red-links function to try and pick these up, and getting the empty red-linked categories which Progress box knows about but is not showing, as well as the ones I wanted. Rich Farmbrough, 09:22, 15 September 2010 (UTC).