User:Rich Farmbrough/Category sorting conventions

Background

 * A category is a relationship between pages. One page is always in the category namespace (the category page - which need not exist), the other (the member page) can be anywhere (including the category namespace).
 * The relationship is created by having a category link, of the form [[Category:Name]], in the text of the member page, either directly, via transclusion or a mixture of both.
 * Generally

Defaultsort
When an article is in a category it is sorted by the article name. Thus Cat comes before Dog; moreover the category has sections for different starting letters, so Cat will be under "C". Sometimes this isn't what we want: for example "John Smith" is usually wanted under the "S" section. The mechanism for this is called a sort key, given after a pipe in the category link: for example we write "Smith, John". The comma is what we would have used in the days of paper to indicate that the name was reversed. In fact, since the article John Smith still shows up as "John Smith", albeit under the S section, the comma is a little ill-advised, but we have it on many thousands of articles.

Now this is all well and good, but often we have eight or ten categories. If they are all keyed on "Smith, John" it is crazy to type that, or cut and paste it, every time, so "DEFAULTSORT" was invented. Any category that isn't given a sort key will use the DEFAULTSORT or, failing that, the pagename.
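
The fallback order described above can be sketched in a few lines of Python. This is only an illustrative model, not MediaWiki's actual code; the function name `effective_sort_key` and its parameters are made up for the example.

```python
from typing import Optional


def effective_sort_key(page_name: str,
                       default_sort: Optional[str] = None,
                       explicit_key: Optional[str] = None) -> str:
    """Return the key a page is sorted by within one category.

    Hypothetical model of the rule in the text: an explicit per-category
    sort key wins, then DEFAULTSORT, then the page name itself.
    """
    if explicit_key is not None:
        return explicit_key
    if default_sort is not None:
        return default_sort
    return page_name


# No key at all: the page name is used.
print(effective_sort_key("John Smith"))
# DEFAULTSORT set once covers every category without its own key.
print(effective_sort_key("John Smith", default_sort="Smith, John"))
# An explicit key on one category overrides DEFAULTSORT for that category only.
print(effective_sort_key("John Smith", default_sort="Smith, John",
                         explicit_key="John"))
```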

Multiple entries
Sometimes we would like to list a page in a category more than once, under different names or in a different sort order. Including the category twice with different sort orders on the member page doesn't work: the software will take the last sort order and ignore any earlier ones.
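
A toy model shows why repeating the category gives only one entry: if sort keys are stored per category name, a later link simply overwrites an earlier one. The regular expression and the function `category_sort_keys` below are assumptions made for illustration and are far simpler than the real parser.

```python
import re

# Simplified pattern for [[Category:Name]] and [[Category:Name|Sort key]].
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|([^\]]*))?\]\]"
)


def category_sort_keys(wikitext: str) -> dict:
    """Map each category name to the sort key of its last link on the page."""
    keys = {}
    for match in CATEGORY_LINK.finditer(wikitext):
        name, key = match.group(1), match.group(2)
        keys[name] = key  # a later link replaces any earlier one
    return keys


page_text = "[[Category:Pets|Dog]]\n[[Category:Pets|Canis familiaris]]"
print(category_sort_keys(page_text))
```

The page ends up in the category once, keyed on "Canis familiaris"; the earlier "Dog" key is lost.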

Sometimes the best approach here is to use separate categories. For example we could have

But that's not always what we want: we can use redirects to get the desired effect - the redirects are just member pages of the category (they show in italics thanks to CSS):

Under different names
For example Canis familiaris could contain


 #REDIRECT [[Dog]]

This shows Canis familiaris under "C" in the pets category. Of course Dog will still be there under "D".

With a different sort order but the same name
This is an unusual device, and not very attractive aesthetically. But it works:

Gregory v. Ηelvering is a redirect to Gregory v. Helvering.

The first page contains the Greek letter eta "Η" in its name, and is a simple redirect-with-categories with a default sort of "Helvering" (spelled with an aitch, not an eta, of course!).

This means that under, say, Category:United States taxation and revenue case law, the article is listed under both plaintiff and defendant (Gregory and Helvering).
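
Both redirect tricks can be modelled the same way: a category is a list of member pages, each with its own title and sort key, and a redirect is just another member. The sketch below is an assumption-laden illustration (the function `category_listing` is hypothetical, and sections are taken from the first character of the key).

```python
from collections import defaultdict


def category_listing(members):
    """Group member page titles into letter sections by their sort keys.

    `members` is a list of (page_title, sort_key) pairs; redirects are
    ordinary entries in this list.
    """
    sections = defaultdict(list)
    for title, key in sorted(members, key=lambda member: member[1]):
        sections[key[0].upper()].append(title)
    return dict(sections)


# Different names: the article and its redirect are separate members.
pets = category_listing([
    ("Dog", "Dog"),
    ("Canis familiaris", "Canis familiaris"),
])

# Same visible name: the redirect title uses Greek eta (U+0397) in place
# of "H", and carries the default sort "Helvering".
ETA = "\u0397"
case_law = category_listing([
    ("Gregory v. Helvering", "Gregory v. Helvering"),
    ("Gregory v. " + ETA + "elvering", "Helvering"),
])

print(pets)
print(case_law)
```

The two case-law titles look alike on screen but are distinct page names, which is what lets the same case appear under both "G" and "H".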

Material

 * http://en.wikipedia.org/wiki/Wikipedia_talk:Categorization#Capitalising_every_word_in_the_defaultsort

Sorting by DEFAULTSORT/PAGENAME
There are two problems with this, one systemic, one ephemeral:


 * 1) Bunching of most or all of the category under one letter.
 * 2) While the addition of case-insensitive sorts is underway, this may break the sort order.

Both these problems are shown at Category_talk:Abies.

However in general categories this is the appropriate sort order: see for example Category:Onions

Sorting by specific epithet
The option of sorting categories of binomials by the second part of the name is a useful one: the benefit is that


 * categories are broken down: instead of having all items under one letter (though correctly ordered) they are under the letter of the specific epithet: [ is a slightly imperfect example.

The drawback is that items which belong in the category under a non-specific key are mixed with those sorted by specific epithet.
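
The difference between the two schemes is easy to see by counting how many entries land in each letter section. The sketch below uses a short, hand-picked list of fir species purely as sample input; the helper names `section_counts` and `epithet_key` are hypothetical.

```python
from collections import Counter


def section_counts(sort_keys):
    """Count entries per letter section, taken from the first character."""
    return Counter(key[0].upper() for key in sort_keys)


def epithet_key(binomial):
    """Return the second word of a binomial, or the whole name if it has none."""
    genus, _, epithet = binomial.partition(" ")
    return epithet or genus


# Sample input only, not the actual contents of any category.
species = ["Abies alba", "Abies balsamea", "Abies grandis", "Abies procera"]

by_pagename = section_counts(species)
by_epithet = section_counts(epithet_key(name) for name in species)

print(by_pagename)  # everything bunched under "A"
print(by_epithet)   # spread across the letters of the epithets
```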

Sorting by lower-case specific epithet
This enables species and variety names to be sorted in a separate cascade from common names, lists, main articles, sub-families etc. An example is at


 * Problem: if large categories contain many "normal" keys then the species will be hidden.
   * Solution: at this point split the category, as has been done with Banksia.
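
The separate cascade relies on case-sensitive ordering, in which every upper-case letter sorts before every lower-case one. Python's default string comparison (by code point) behaves this way for ASCII, so it can stand in for that ordering in a rough sketch. The list of keys is sample input, and "Taxonomy of Banksia" is a hypothetical page title.

```python
# Sort keys as they might appear in one genus category: capitalised keys for
# general pages, lower-case specific epithets for species.
keys = [
    "integrifolia",
    "Banksia",
    "List of Banksia species",
    "serrata",
    "Taxonomy of Banksia",
    "coccinea",
]

# Case-sensitive (code point) order: capitalised keys first, then epithets.
case_sensitive = sorted(keys)

# Case-insensitive order: the two groups are interleaved again.
case_insensitive = sorted(keys, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

The second ordering suggests why a switch to case-insensitive sorting would undo the separation.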

Additional notes
To add an article under one or more names that are distinct from the article name, create a redirect with that name and categorise it appropriately. It may still use DEFAULTSORT and/or individual category sort-keys to order that name as the editors see fit.

Proposed solution

 * 1) Sort in binomial categories by lowercase specific epithet where possible
 * 2) Sort in all other categories by the pagename or default-sort, or category specific sorts (for example top-sorting List of, removing "Flora of" from state articles in cat:Flora of US, etc.)
 * 3) Migrate to canonical DEFAULTSORT as is being done across WP, starting at the beginning of the alphabet. This will mean that no ordering that is not already broken will be broken, and those that are broken will be fixed.


 * I would take care of 1, where an explicit category sort order has not already been set (and will look at lower-casing those with a hard-coded title-case specific sort order).
 * I would take care of 3, where no default has been set, and where it is a casing change.
 * 2 is part of ongoing maintenance.

Detail
Various abbreviations and marks need to be ignored for this sort:
 * "var.", "subsp.", "ssp.", "subvar.", "f.", "subf.", "subg.", "sect.", "subsect.", "ser.", "subser.", "×", "'", '"', "+"
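
One way to derive such a key from a page title is sketched below. It assumes that "ignored" means the abbreviation or mark is dropped entirely, that the genus is the first word, and that the remaining words are lower-cased and joined; the function name `epithet_sort_key` is hypothetical and none of this is an existing tool.

```python
# Rank abbreviations and marks taken from the list above.
IGNORED_TOKENS = {
    "var.", "subsp.", "ssp.", "subvar.", "f.", "subf.",
    "subg.", "sect.", "subsect.", "ser.", "subser.",
}
IGNORED_MARKS = "×'\"+"


def epithet_sort_key(name: str) -> str:
    """Build a lower-case sort key from the words after the genus."""
    # Remove the marks wherever they occur, even when attached to a word.
    cleaned = name.translate({ord(mark): None for mark in IGNORED_MARKS})
    words = [word for word in cleaned.split() if word not in IGNORED_TOKENS]
    if not words:
        return ""
    if len(words) == 1:
        # No epithet available: fall back to the name unchanged.
        return words[0]
    return " ".join(words[1:]).lower()


print(epithet_sort_key("Banksia integrifolia subsp. monticola"))
print(epithet_sort_key("Mentha × piperita"))
print(epithet_sort_key("Banksia"))
```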

DEFAULTSORT

 * September 2009 989000 articles have a DEFAULTSORT, 45,000 have one that breaks the "capitalise every word" convention. 600,000 would need one.
 * August 2009 852665 articles have a DEFAULTSORT including of the articles with a taxobox
 * June 2009 759110 articles had a DEFAULTSORT
 * May 2009 7118431 articles had a DEFAULTSORT
 * March 2009 692630 articles had a DEFAULTSORT

Mediawiki features that might help

 * Allowing multiple sort keys [[:Category:Cat name|Sort 1|Sort 2]]
 * Allowing categories to define their sort order with transform rules