User:MZMcBride/Magic

This page compiles some useful database queries, scripts, methods, and tools that are helpful when doing admin maintenance work.

Toolserver background
The Wikimedia Toolserver is a server with access to a (more or less) live copy of the database of all Wikimedia projects, for use with user scripts and utilities. Access is granted on request on a case-by-case basis.

There are three main servers: S1, S2, and S3. S1 holds the English Wikipedia; S2 holds Bulgarian Wikipedia and Wiktionary, Wikimedia Commons, Czech Wikipedia, German Wikipedia, English Wikiquote and Wiktionary, Esperanto Wikipedia, Finnish Wikipedia, Indonesian Wikipedia, Italian Wikipedia, Dutch Wikipedia, Norwegian Wikipedia, Polish Wikipedia, Portuguese Wikipedia, Swedish Wikipedia, Thai Wikipedia, Turkish Wikipedia, Chinese Wikipedia; S3 holds every other public Wikimedia wiki not previously listed. 1

To have the queries below run, you can either sign up for a Toolserver account, you can use the Query service, or you can ask someone on their talk page or in the Toolserver IRC channel. For more information about the Toolserver, see http://wiki.ts.wikimedia.org/view/Main_Page and http://meta.wikimedia.org/wiki/Toolserver.

One important factor to observe when using the Toolserver is replication lag (replag). Because the Toolserver is a live copy of the Wikimedia servers, from time to time, it gets a bit behind (for whatever reason). If the replag is two weeks, it means that any data obtained will be as of two weeks ago. This can be problematic when dealing with orphaned talk pages and such, as they may have already been de-orphaned or deleted. To view the Toolserver's current replag, see here.

General links

 * mw:Manual:Database layout
 * Getting started
 * Query service
 * Criteria for speedy deletion

CSD G2 and CSD G7
This is good for finding test pages (CSD G2) and blanked pages (CSD G7) as well as other quirky pages. Code courtesy of nl:User:Erwin and de:User:Duesentrieb.
 * http://toolserver.org/~erwin85/shortpages.php

Explanation: All pages that are not in the User: or User_talk: namespaces, are less than 50 bytes in length, have only one contributor, use no templates, and are not redirects.

CSD U2
Generate three lists: one of all users on en.wiki, one of all pages in User:, and one of all pages in User_talk:. Run dpn.py twice. The script will generate lists of pages in User: or User_talk: that do not correspond to a registered user (CSD U2).

CSD G8
Explanation: Finds all pages in the Talk: namespace whose titles don't contain a "/" and who do not have a corresponding subject-space page (CSD G8).
 * http://toolserver.org/~agony/orphantalk/index.php?l=en
 * Find forgotten pages
 * Run "/.*$" on first list.
 * Run both lists through dpn.py.
 * Filter page titles that are redirects, don't exist, or contain "archive".
 * Run both lists through dpn.py.
 * Filter page titles that are redirects, don't exist, or contain "archive".

CSD R1
Code courtesy of MediaWiki.
 * http://toolserver.org/~schutz/output/brokenredirects.html
 * User:SQL/BRedir

Code courtesy of MediaWiki.

Explanation: The first query just finds broken redirects and lists their target. The second query is good at finding pages where the redirect has text below it that should be moved to its target.

CSD C1
Code courtesy of User:CBM.

Explanation: Lists empty categories that are empty and that are not members of Category:Wikipedia category redirects or Category:Disambiguation categories. After generating list, remove all titles that contain "(-importance|-class|non-article|assess)".

CSD T3
Code courtesy of MediaWiki and de:User:Duesentrieb.

Explanation: Lists all pages in the Template: namespace that have no transclusions.

Notes: Stub templates and templates that are called from other templates dynamically should not be deleted.

CSD G6
Code courtesy of User:Autocracy.

Explanation: Lists all pages with no edits in 2008 (or later), are less than or equal to 10 bytes, have no transclusions, and have only 1 or 2 edits.

CSD R2
Code courtesy of User:ST47.

Explanation: Lists all redirects from the (Main) namespace to a non-(Main) namespace.

Notes: Many of the redirects listed are very old and are kept intentionally.

Code courtesy of User:ST47.

Explanation: Lists all redirects from the (Main) namespace to Talk:, User:, or User_talk: (CSD R2).

Inaccurate file extensions
Code courtesy of User:CBM.

Explanation: Lists all images, their major MIME type, and their minor MIME type.

Filter out extensions that don't match the MIME type.

Orphaned subpages
Subpages without any incoming links generally indicate an issue. If the subpage is an archive, it should at least be linked from the active talk page. Code courtesy of MediaWiki.

Explanation: This generates a list of all pages in the Talk: namespace that have no incoming links.

Category:Temporary Wikipedian userpages

 * http://toolserver.org/~misza13/cat_temp.txt (should update semi-regularly)
 * http://toolserver.org/~cbm/cgi-bin/queries/Category?category=Temporary+Wikipedian+userpages&namespace=&project=enwiki_p

Duplicate files
Filter out the duplicates.

Inappropriate cross-namespace links
Explanation: Generates a list of all pages in the (Main) namespace that are not redirects.

Explanation: This script fills the void left by the disabling of Special:CrossNamespaceLinks. Using the query above saved to a file called "all_pages.txt", it goes through each non-redirect in the (Main) namespace and determines if there are links to the User:, User_talk:, Image_talk:, MediaWiki:, MediaWiki_talk:, Help_talk:, or Category_talk: namespaces. If it finds such links, it lists the page in a file called "output.txt". To run this script, replace "Username" and "Password" with the appropriate data.

Note: This script requires the API.py and Wiki.py modules. It also requires an account with the 'apihighlimits' right (usually the sysop and bot groups).

Users by edit count
Explanation: This query will list the first 4,000 users not currently in the 'bot' user group by descending edit count.

Users with edits only in the User: namespace
Explanation: This query will list all users who have (non-deleted) contributions only in the User: namespace.

Miscellaneous Python scripts
Some of these Python scripts require the pywikipedia framework.

Incoming links
Explanation: Using an input list named "file_name_in.txt" with "FullPageName" on each line, this script checks if the page has any incoming links. If the page has no incoming links, the page is added to a file named "file_name_out.txt".

Open proxies
Explanation: This script requires an input file with an IP address on each line. It blocks each IP for two years using the block reason blockedproxy.

dpn.py
Code courtesy of dpn`` in #python.

Explanation: This script takes two lists. The first input file contains a list where no entry has a "/". The second input file has a list of similar entries, but some will contain a "/". The script truncates everything following a "/" in the second input file, and compares it to the first input file. It outputs all mismatches. For example:

This script is incredibly helpful for finding orphaned subpages (CSD G8 or CSD G6) and pages that belong to users who do not exist (CSD G2).

To-do

 * pywikipedia
 * CAT:TEMP header (with stuff above)
 * Pages and subpages of indefinitely blocked users (User: and User_talk:)
 * Caveat: Users blocked indefinitely for dying
 * Userboxes that are still actively transcluded
 * image hash duplicates (local and vs. Commons)