User:Pascal666/check

Though this project is still active, it has been mostly superseded by WikiProject Check Wikipedia, a much larger set of tests that extends across many languages. I am currently working to better document the validity checks I have been performing so they can be merged into the larger project.

Reports currently posted on Wikipedia

 * User:Pascal666/cats
 * User:Pascal666/living
 * User:Pascal666/external

History
This project dates from June 2006, at which time the only test I was running looked for template elements inside articles (as documented much more thoroughly on my userpage) that indicate a substituted template. I have expanded it over the years to include many more checks. I fix most of the problems found myself, but a couple of reports are so large that I have not attempted to fix everything in them. Some of these reports, as well as others requested by other Wikipedians, can be found above.

Technical details
All of my checks are run in Perl on en.wiki database dumps. When analyzing enwiki-xxxxxxxx-pages-articles.xml.bz2 I use Parse::MediaWikiDump and IO::Uncompress::Bunzip2. The latter allows me to analyze the dump without having to expand it to disk; it is instead decompressed in memory on the fly. For the rest of the dump files I use IO::Uncompress::Gunzip to decompress on the fly and access the data directly with my own algorithms.
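The streaming approach can be sketched in Python (not the Perl modules actually used for this project). This is a minimal illustration, assuming the dump is a sequence of page elements that each contain a title and a text element; the function names `local_name` and `iter_pages` are hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(fileobj):
    """Yield (title, text) for each page, parsing the XML incrementally."""
    title, text = None, None
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children

# Tiny in-memory stand-in for a real dump file (assumption: same element names).
sample = (b"<mediawiki><page><title>A</title>"
          b"<revision><text>hello</text></revision></page></mediawiki>")
with bz2.open(io.BytesIO(bz2.compress(sample)), "rb") as stream:
    pages = list(iter_pages(stream))
```

For a real dump, the path to the .bz2 file would be passed to `bz2.open` in place of the in-memory buffer, so the data is decompressed as it is read and never written to disk.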

Many of the checks below include Perl regexes.

Substituted templates
The original purpose of this project was to find templates that had been substituted into articles when they should have been simply included.

Wikimarkup
Certain wikimarkup should never be found on pages that are not included in other pages. This wikimarkup is usually only found in templates that should never be subst'd.

Template wikimarkup

Interwikis
Templates often exist in multiple languages. When a template gets subst'd its interwikis get placed into the article as well. Since an article should never contain an interwiki pointing to the template namespace in a foreign language, the existence of these interwikis can be indicative of a subst'd template. Example:. They can also be caused by a template not having noincludes around its interwikis. Example:. Both problems need to be fixed.

Template interwikis

 * eu:Txantiloi
 * vi:Tiêu bản
 * ml:ഫലകം
 * sk:Sablóna
 * cs:Šablona
 * fr:Modèle
 * pt:Predefiniçao
 * de:Vorlage
 * pl:Szablon
 * nl:Sjabloon
 * es:Plantilla
 * sv:Mall
 * no:Mal
 * fi:Malline
 * ca:Plantilla
 * tr:Sablon
 * cs:Sablona
 * hu:Sablon
 * ro:Format
 * eo:Sablono
 * da:Skabelon
 * id:Templat
 * [a-z][a-z]:Шаблон
 * [a-z][a-z]:Template
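As a rough Python sketch of how the prefixes above could be combined into one search (the project itself uses Perl regexes): the prefix list here is only a subset, and the assumed link shape `[[xx:Namespace:Title]]` and the name `template_interwikis` are illustrative rather than taken from the actual check.

```python
import re

# Subset of the language:namespace prefixes listed above. The last two are
# themselves regex fragments that accept any two-letter language code.
TEMPLATE_PREFIXES = [
    "eu:Txantiloi", "fr:Modèle", "de:Vorlage", "pl:Szablon",
    "nl:Sjabloon", "es:Plantilla", "sv:Mall", "da:Skabelon",
    "[a-z][a-z]:Шаблон", "[a-z][a-z]:Template",
]

# Assumption: an interwiki to a template looks like [[xx:Namespace:Title]].
TEMPLATE_INTERWIKI = re.compile(
    r"\[\[\s*(?:" + "|".join(TEMPLATE_PREFIXES) + r")\s*:[^\]|]+\]\]"
)

def template_interwikis(wikitext):
    """Return every interwiki link that points into a template namespace."""
    return TEMPLATE_INTERWIKI.findall(wikitext)

article = "Intro [[de:Vorlage:Infobox]] and an ordinary interwiki [[fr:Paris]]."
print(template_interwikis(article))  # ['[[de:Vorlage:Infobox]]']
```

Any hit in an article is a candidate for either a subst'd template or a template whose interwikis lack noinclude tags; telling the two apart still needs a look at the page.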

Categories
Certain elements should never appear inside a category link.

Not allowed: carriage return/line feed { } < > [ ]
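One way such a check might look in Python, offered as a sketch under assumptions rather than the actual Perl test: it treats everything between `[[Category:` and the next `]]` as the link body, and the function name `bad_category_links` is hypothetical.

```python
import re

CATEGORY_START = re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE)
BAD_CHARS = set("\r\n{}<>[]")

def bad_category_links(wikitext):
    """Return the bodies of category links that contain a disallowed character."""
    problems = []
    for match in CATEGORY_START.finditer(wikitext):
        end = wikitext.find("]]", match.end())
        # An unclosed link runs to the end of the text (assumption).
        body = wikitext[match.end():] if end == -1 else wikitext[match.end():end]
        if BAD_CHARS & set(body):
            problems.append(body)
    return problems

text = "[[Category:Good]] [[Category:Bad{x}]] [[Category:Split\nname]]"
print(bad_category_links(text))  # ['Bad{x}', 'Split\nname']
```

Because the whole body is examined, a sort key that legitimately contains braces would also be flagged; a real run would likely need exceptions for such cases.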

Least wanted categories
Special:WantedCategories only includes the top 1000 wanted categories. Many categories on this list have simply not yet been created. The bottom of the wanted categories list (the least wanted categories) contains mostly typos. That is, these categories are not really wanted; the pages in them have simply been miscategorized. This report was requested by a user who planned to fix these typos (but didn't realize how many there are) and is currently posted at User:Pascal666/cats.
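The idea of listing the least wanted categories can be sketched in Python as follows. The input shapes (a mapping from page title to category names, and a set of existing category names) and the name `least_wanted` are assumptions for illustration, not the layout of the real dump tables.

```python
from collections import Counter

def least_wanted(page_categories, existing, limit=10):
    """Return (category, page count) pairs for missing categories, rarest first."""
    counts = Counter(
        category
        for categories in page_categories.values()
        for category in categories
        if category not in existing
    )
    # Sort by count, then by name so the order is stable.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]

pages = {
    "A": ["Physics", "Phisics"],
    "B": ["Physics", "Future topic"],
    "C": ["Future topic"],
}
print(least_wanted(pages, existing={"Physics"}))
# [('Phisics', 1), ('Future topic', 2)]
```

Categories used by a single page sort to the top, which is where typos such as the made-up "Phisics" tend to collect.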

Wrong case cats
Categories are case sensitive. If a page is in a non-existent category whose name differs from that of an existing category only in capitalization, the user probably intended to put the page in the existing category. Example:
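A minimal Python sketch of this comparison, assuming the used and existing category names are already available as plain strings (the function name `wrong_case_categories` and the sample names are hypothetical):

```python
def wrong_case_categories(used, existing):
    """Map each missing category to an existing one that differs only in case."""
    by_lower = {}
    for name in existing:
        by_lower.setdefault(name.lower(), name)
    suggestions = {}
    for name in used:
        if name in existing:
            continue  # the category exists as written
        match = by_lower.get(name.lower())
        if match is not None:
            suggestions[name] = match
    return suggestions

used = ["American Novelists", "American novelists", "Unknown things"]
print(wrong_case_categories(used, existing={"American novelists"}))
# {'American Novelists': 'American novelists'}
```

Indexing the existing categories by lowercased name keeps the lookup to one dictionary access per used category, which matters when both lists come from a full dump.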

Wrong hyphenation cats
If a page is in a non-existent category whose name differs from that of an existing category only in hyphenation, the user probably intended to put the page in the existing category. Example:

Duplicate categories
Users will sometimes create a new category not knowing that the category already exists, just with a different capitalization or hyphenation.
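Duplicate detection of this kind can be sketched in Python by grouping existing categories under a normalized key. Which characters count as a hyphen is an assumption here (the ASCII hyphen and the en dash), as are the names `normalize` and `duplicate_groups`.

```python
import re
from collections import defaultdict

def normalize(name):
    """Lowercase the name and treat hyphens, en dashes and spaces alike."""
    return re.sub(r"[\s\-\u2013]+", " ", name).strip().lower()

def duplicate_groups(existing):
    """Return sorted groups of categories that share a normalized name."""
    groups = defaultdict(list)
    for name in existing:
        groups[normalize(name)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)

categories = [
    "Science fiction writers",
    "Science-fiction writers",
    "Science Fiction Writers",
    "Poets",
]
print(duplicate_groups(categories))
# [['Science Fiction Writers', 'Science fiction writers', 'Science-fiction writers']]
```

Each group is only a candidate for merging; two categories with the same normalized name can still be intentionally distinct.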

Included non-templates
Users often accidentally include a page when they intend instead to create a link to it or to place an article into a category. This check started as a simple text search for "{{Category:" but turned into examining enwiki-xxxxxxxx-templatelinks.sql.gz for any includes outside template space (though many of these are valid anyway, so this has required many exceptions). Example:
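The original text search for "{{Category:" can be approximated in Python as below. This is only a sketch of that first version of the check, not of the later templatelinks analysis, and the name `included_categories` is hypothetical.

```python
import re

# Matches the opening of a transclusion such as {{Category:Name}} or
# {{Category:Name|...}} and captures the category name.
INCLUDED_CATEGORY = re.compile(r"\{\{\s*Category\s*:\s*([^{}|]+)")

def included_categories(wikitext):
    """Return category names that were transcluded instead of linked."""
    return [name.strip() for name in INCLUDED_CATEGORY.findall(wikitext)]

text = "Text {{Category:Rivers of France}} and [[Category:Lakes]] {{Infobox river}}"
print(included_categories(text))  # ['Rivers of France']
```

In most cases the intended markup was a category link with square brackets, so each hit points at a likely one-character-pair fix.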

Birth cats
Anyone in Category:Living people should also be in a births category for a year within the last 123 years. This report is currently posted at User:Pascal666/living.

Births cats since 1880

 * Category:19\d\ds? births
 * Category:18[8-9]\ds? births
 * Category:200\ds? births
 * Category:Year of birth uncertain
 * Category:19th-century births
 * Category:20th-century births
 * Category:Year of birth missing (living people)
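The listed patterns can be combined into a single test, sketched here in Python rather than Perl. The parentheses in the last pattern are escaped so they match literally, and the input shape (a mapping from page title to its list of categories) and the name `living_without_birth` are assumptions.

```python
import re

BIRTH_PATTERNS = [
    r"Category:19\d\ds? births",
    r"Category:18[8-9]\ds? births",
    r"Category:200\ds? births",
    r"Category:Year of birth uncertain",
    r"Category:19th-century births",
    r"Category:20th-century births",
    r"Category:Year of birth missing \(living people\)",
]
BIRTH_RE = re.compile("|".join(BIRTH_PATTERNS))

def living_without_birth(page_categories):
    """Return pages in Category:Living people that lack an accepted births category."""
    missing = []
    for title, categories in page_categories.items():
        if "Category:Living people" not in categories:
            continue
        if not any(BIRTH_RE.fullmatch(category) for category in categories):
            missing.append(title)
    return missing

pages = {
    "Alice": ["Category:Living people", "Category:1950 births"],
    "Bob": ["Category:Living people", "Category:1875 births"],
    "Carol": ["Category:1990 births"],
}
print(living_without_birth(pages))  # ['Bob']
```

Using `fullmatch` keeps a name such as "Category:1950 births in fiction" from being accepted by accident.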

Death cats
Anyone in Category:Living people should not also be in a deaths category. Example:

Template parameters
By scanning templates for "{{{\w+" a list of the parameters each template accepts can be created. You can then scan articles that include each template to find parameters that it is not designed to accept. In many cases the parameter simply has the wrong case. User:PascalBot was created to fix many of these. Example:
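The two scans can be sketched in Python as follows. The parameter scan uses the "{{{\w+" pattern quoted above; the call parser is deliberately naive (it assumes no nested templates or piped links inside the call), and the function names are hypothetical.

```python
import re

PARAM_RE = re.compile(r"\{\{\{(\w+)")

def accepted_parameters(template_source):
    """Collect the parameter names a template refers to as {{{name...}}}."""
    return set(PARAM_RE.findall(template_source))

def unknown_parameters(template_name, article_text, accepted):
    """Return (parameter, same-name-different-case suggestion or None) pairs."""
    call_re = re.compile(
        r"\{\{\s*" + re.escape(template_name) + r"\s*\|([^{}]*)\}\}"
    )
    by_lower = {name.lower(): name for name in accepted}
    findings = []
    for call in call_re.finditer(article_text):
        for part in call.group(1).split("|"):
            if "=" not in part:
                continue  # positional parameter, nothing to compare
            name = part.split("=", 1)[0].strip()
            if name not in accepted:
                findings.append((name, by_lower.get(name.lower())))
    return findings

template = "{{{name|}}} was born on {{{birth_date}}}"
article = "{{Infobox person|Name=Ann|birth_date=1970|colour=red}}"
accepted = accepted_parameters(template)
print(unknown_parameters("Infobox person", article, accepted))
# [('Name', 'name'), ('colour', None)]
```

A finding with a suggestion is the wrong-case situation described above; a finding with `None` is a parameter the template does not appear to accept at all.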

External internal links
Links between Wikipedia articles should be accomplished using wikilinks instead of external links to "http://en.wikipedia.org". This report is currently posted at User:Pascal666/external. Example:
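A Python sketch of finding such links and recovering the article title they point to. Matching https as well as http, and restricting the search to /wiki/ paths, are assumptions beyond what is described above; `internal_titles` is a hypothetical name.

```python
import re
from urllib.parse import unquote

# Captures the page name that follows /wiki/ in a link to en.wikipedia.org.
EXTERNAL_INTERNAL = re.compile(r"https?://en\.wikipedia\.org/wiki/([^\s\]|]+)")

def internal_titles(wikitext):
    """Return the article titles that external-style links point at."""
    return [
        unquote(name).replace("_", " ")
        for name in EXTERNAL_INTERNAL.findall(wikitext)
    ]

text = ("See [http://en.wikipedia.org/wiki/Regular_expression regex] "
        "and http://example.org/wiki/Other")
print(internal_titles(text))  # ['Regular expression']
```

Each recovered title is what a wikilink replacement would point to, so the list doubles as a set of suggested fixes.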