User talk:Stemonitis/Testing ground

Separate surname and forename with both comma and space
It would probably help to explain that the conventional separator between the surname and the forename is both a comma and a space, no matter how those names appear in the article name. For example, index Wong Gong as, and not as it is currently indexed (Wong Ka Keung apparently already does that in living people, for example), so that he appears after Felix Wong. Index Tim Wood as and not the current , so that he appears before Timothy Wood and not after Willie Wood. A lot of people do not realize that ALL characters, including spaces, are indexed, even though many of them are aware of our convention of getting articles to the top of the category listing by using a space as the sort key. I think few of them realize that the space in that context is not a special command but just an ordinary character sorted by the ordinary rules, and that if more than one article is indexed under the space, they can be ordered by adding more characters after the space. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * I agree that requiring both a comma and a space between surname and forename is worth specifying. I'm undecided about adding commas in names which are already in the order "surname forename". I can see advantages to either approach. --Stemonitis 01:24, 19 January 2007 (UTC)
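
A minimal Python sketch of the point that the space is an ordinary character. It assumes sort keys are compared character by character by character code, which is how Python compares strings; that the category software behaves identically is an assumption here. The key "Wood,Tim" and the keys " 1" and " 2" are hypothetical, made up for illustration.

```python
# Assumption: category sort keys compare by plain character code,
# the same way Python compares str values.
keys = ["Wood, Willie", "Wood,Tim", "Wood, Timothy", "Wood, Tim"]

# "Wood,Tim" is a hypothetical key that lacks the space after the comma.
ordered = sorted(keys)
print(ordered)

# A leading space is just another character (code 32), so several
# "top of the list" entries can be ordered by what follows the space.
top = sorted(["Smith, John", " 2", " 1"])
print(top)
```

With the space in place, Tim Wood lands directly before Timothy Wood; the key without the space falls after every "Wood, ..." entry.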

Put "Jr.", "III" after forename, not after surname
You've got that covered; I'm just mentioning that it is surprising how often people get that wrong. I suppose that people who haven't actually tried to look someone up in a category haven't figured out that they need to get John Smith and John Smith close to each other, and not have all the "Smith Jr." people in a separate clump at the end of the Smith listings. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * Yes indeed! --Stemonitis 01:24, 19 January 2007 (UTC)

British peers
You say "(which will not, in such cases, appear in the article title)". What sort of fantasy world are you living in? This is really much more category-dependent than other things, and anyone known by their real name, not just "invariably" but even "often", should be indexed under their real name in any of the real categories, leaving the titles stuff to the peerage categories. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * This was a carry-over from the previous text, which I'm not especially enamoured with. The peers people seem reluctant to change their established practice, though, even in categories that are predominantly filled by commoners. --Stemonitis 01:24, 19 January 2007 (UTC)

First name sorted categories
1. Sorting by first name rather than by last name is a property of the category, not of the person
 * This is seen in some of the arguments about people named "Singh" in some country and some occupation, with some people insisting on not sorting them by Singh.
 * Diacritics still need to be fixed when first name sorted.

2. There should be no first name sorted categories without it being explicitly mentioned on the category page or its talk page, and if there is a discussion somewhere else relevant to more than one category, that needs to be linked from every category talk page to which it is claimed to apply. This isn't something that should remain at the whim of one editor; those who might disagree need a place to discuss it. Furthermore, it is just sensible to point out to people coming to that category what they should expect.

3. These include at least many of the categories related to family names (most of the people in them will have the name that is in the category name; even those who do not should be indexed by first name, I would think, and that ought to be pointed out on the category's talk page). Others that are often first-name indexed include the Icelandic names categories and the cricket categories for Pakistan and a couple of other countries; at least, that's what some of the people on WikiProject Cricket claim it should be. Once again, I think that if there is nothing about it on the category page, we should not follow it. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * This is all new, but I think it's good. I hadn't realised that Pakistani names sort the other way, so I may have caused some bother with that. However, I think you're absolutely right that sort orders are category-specific. I carefully sort Icelandic people by forename in Category:Icelandic people, but by patronym in Category:Living people, and I think that's right. The same would apply to other categories as well, and would mean that all the Thai people currently sorted by first word in non-Thai-specific categories should probably be changed, for instance. --Stemonitis 01:24, 19 January 2007 (UTC)

Apostrophes, okinas, and various and sundry things of that sort
Only apostrophes should be allowed in sorting, and only in some situations.

One problem area is the "Ó Sé" vs. "O'Shea" names. An apostrophe (code number 39) sorts differently from a space (code number 32).

Somebody tried to insist on sorting the Tongan version of the okina as a separate letter after Z, while admitting that the Samoans don't do that, and the English don't do that. And hardly anybody ever writes in Tongan; it's mostly a spoken language. Gene Nygaard 04:45, 18 January 2007 (UTC)


 * I can't see any reason for keeping apostrophes, and I've been removing them. The fact that "Ó Sé" and "O'Shea" sort differently isn't really surprising, since they're spelt differently. What we do with spaces in surnames I'm not sure, but I'm convinced that "O'Shea" is best dealt with as "Oshea". This is the practice followed by other reference works, I believe. There may be a stronger case for characters which function as full letters, like the apostrophe used to transliterate aleph, or the ʻokina. But still, I think, they ought to be removed from the sort key, because few people will be able to predict where they will appear in the sort order. --Stemonitis 01:24, 19 January 2007 (UTC)
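
A rough Python sketch of the apostrophe question, assuming keys are compared by character code as described above. "O Se" stands in for a diacritic-free key for "Ó Sé", "Oakes" and "Osborne" are hypothetical surnames added for contrast, and `drop_apostrophe` is a made-up helper, not an existing tool.

```python
# Character codes mentioned in the discussion.
print(ord(" "), ord("'"))  # 32 39

def drop_apostrophe(key: str) -> str:
    """Hypothetical helper: "O'Shea" becomes "Oshea"; other keys are untouched."""
    if "'" not in key:
        return key
    return key.replace("'", "").capitalize()

keys = ["Oakes", "O'Shea", "Osborne", "O Se"]

# As written, the apostrophe (39) puts O'Shea ahead of every "Oa..." name.
as_written = sorted(keys)

# With the apostrophe removed, Oshea sorts among the other "Os..." names.
without = sorted(drop_apostrophe(k) for k in keys)

print(as_written)
print(without)
```

Either way "O Se" stays at the front, because the space (32) sorts before both the apostrophe and every letter.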

Hyphens, em and en dashes, minus signs, etc.
Only hyphens should be allowed in indexing. And maybe they shouldn't be either, at least not outside of limited situations.

One problem area is the French habit of sticking a hyphen between two given names. I'd prefer all of them changed to spaces in the sort keys. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * This ties in to the previous point: hyphens and spaces sort very differently (the space sorts before the comma that ends the surname, and the hyphen sorts after it), so that "Roger Lloyd Pack" would come before "John Lloyd", but "Roger Lloyd-Pack" (as he is sometimes called) would come after. This seems inconsistent. The trouble is that the only solutions I can think of are either to remove all spaces/hyphens in multiple-word surnames, or to replace all spaces with hyphens. Removing both would probably lead to more reasonable sorting but is likely to be messed up more often. I'm not sure what the solution is. --Stemonitis 01:24, 19 January 2007 (UTC)
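
The Lloyd example can be reproduced with a short Python sketch, assuming "Surname, Forename" keys compared by character code. `squash_surname` is a hypothetical helper illustrating the "remove both" option; it is one possible approach, not an agreed convention.

```python
# Space, comma, and hyphen codes, in that order.
print([ord(c) for c in " ,-"])  # [32, 44, 45]

keys = ["Lloyd-Pack, Roger", "Lloyd, John", "Lloyd Pack, Roger"]

def squash_surname(key: str) -> str:
    """Hypothetical helper: drop spaces and hyphens from the surname part only."""
    surname, sep, rest = key.partition(", ")
    return surname.replace(" ", "").replace("-", "") + sep + rest

# As written, the two spellings of Lloyd Pack end up on either side of "Lloyd, John".
as_written = sorted(keys)

# With both characters removed, the two spellings share one key and sort together.
squashed = sorted(squash_surname(k) for k in keys)

print(as_written)
print(squashed)
```

Replacing every space with a hyphen would also make the two spellings agree, but it would place them after "Lloyd, John" for a different reason (hyphen after comma) that is harder to predict from the title.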

Case insensitivity
Ideally, I'd prefer totally case-insensitive sorting in all Wikipedia categories. But the only realistic way to accomplish that is through a software change, I would think.

Take a gander at a number of the "LaUppercase" and "LeUppercase" pattern names that have been sorted in, e.g., Category:Living people with the second capital letter in the surname lowercased. I'm not one who has done that or anything similar in more than a couple of cases, but I like it. For example:
 * Huguette Labelle
 * Leah LaBelle


 * Donat Leclair
 * John LeClair

Gene Nygaard 04:45, 18 January 2007 (UTC)
 * I'm glad you like this. Again, it's just applying what other reference books do, but it hadn't been explicitly stated before. --Stemonitis 01:24, 19 January 2007 (UTC)
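
A small Python sketch of why lowercasing the second capital helps, assuming case-sensitive comparison by character code, where every uppercase letter sorts before every lowercase one. The "Surname, Forename" key format follows the convention discussed above, and `lowercase_inner_caps` is a hypothetical helper that only handles single-word surnames.

```python
keys = ["Labelle, Huguette", "LaBelle, Leah", "Leclair, Donat", "LeClair, John"]

def lowercase_inner_caps(key: str) -> str:
    """Hypothetical helper: "LaBelle, Leah" becomes "Labelle, Leah"."""
    surname, sep, rest = key.partition(", ")
    return surname.capitalize() + sep + rest

# Case-sensitive: "B" (66) sorts before "b" (98), so LaBelle jumps ahead of Labelle.
case_sensitive = sorted(keys)

# Normalized: identical surnames sit together and are ordered by forename.
normalized = sorted(lowercase_inner_caps(k) for k in keys)

print(case_sensitive)
print(normalized)
```

The normalized order matches the two pairs listed above: Huguette Labelle before Leah LaBelle, and Donat Leclair before John LeClair.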

Mc, Mac, St.
Some places index Mc and Mac names together. Some index "St." as if spelled out Saint, more often in place names such as "St. James" than in personal names such as "St. James". I'm satisfied without doing that, just wondering if it needs to be mentioned. Gene Nygaard 04:45, 18 January 2007 (UTC)
 * The Mc=>Mac issue is separate from everything else. I haven't been converting Mc to Mac, because I wasn't sure whether I thought that was right. Many reference works do so, but perhaps not all. At least for people who are unfamiliar with the system, leaving Mc as Mc is the more transparent. The St.=>Saint case is perhaps similar, but could be seen to apply to many instances of abbreviations. I would rather see them left as they are, if only to increase the transparency. Almost everybody knows that "St" means "Saint" and "Jr" means "Junior", but there may be others that we haven't thought of and which are less familiar. I think it is easiest to have the sort key follow the title as closely as possible, including abbreviations. Thinking about it, that would also suggest a preference for Mc remaining as Mc. At least that way, it's pretty clear from the title alone where it will sort in a category. --Stemonitis 01:24, 19 January 2007 (UTC)