Wikipedia talk:Categorizing articles about people/Archive 7

Case and sorting
It appears that sometime in the last few months, the MediaWiki software's sorting has gone from being a "case-sensitive" sort to a "case-insensitive" sort, the way an encyclopedia should be sorted. What that means for us is that we no longer need to "tweak" sort keys by adjusting the case of letters: "duBarry", "DuBarry", and "Dubarry" all sort the same, which is correct. I will update the text so that we stop doing work that is no longer needed. Studerby (talk) 00:02, 29 June 2011 (UTC)
 * Maybe we could verify that it's going to stay that way before we proceed? --Auntof6 (talk) 02:24, 29 June 2011 (UTC)
 * What I mean is, was this change done on purpose, or is it perhaps an unintended side effect of some other change(s)? Is there a discussion somewhere that indicated the change was going to be made? --Auntof6 (talk) 06:49, 29 June 2011 (UTC)
 * Yes it was done on purpose, and has been requested for *years* at Bugzilla, at least. The actual change happened in early March 2011, about two weeks after the deployment of MediaWiki 1.17. It was not included with the MW changes in case of problems: the MW 1.17 deployment was reversible (and indeed several times was reversed, patched and rolled out again), whereas the change to category sorting was non-reversible. Few people seem to have been aware that it would happen though, and since the event it's been discussed at WP:VPT several times. The most recent related thread is here. -- Red rose64 (talk) 12:14, 29 June 2011 (UTC)
 * Cool! In that case, I join the celebration! --Auntof6 (talk) 19:04, 29 June 2011 (UTC)
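
For anyone who wants to see the difference concretely, here is a minimal sketch of the two behaviours described in this thread. It is an illustration in Python, not MediaWiki's actual collation code; the names other than the three "DuBarry" variants are made up for the example.

```python
# Illustration only: MediaWiki's real collation is more involved than this.
names = ["duBarry", "Dubarry", "DuBarry", "Duarte", "Dubois"]

# Case-sensitive (code point) order: every uppercase ASCII letter
# sorts before every lowercase one, so "DuBarry" lands before "Duarte".
case_sensitive = sorted(names)

# Case-insensitive order: compare casefolded keys. The three DuBarry
# variants get the same key and stay in input order (sorted is stable).
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

Under the case-insensitive comparison the three spellings share one key, which is why adjusting the case of a sort key no longer changes where the page appears.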

Arabic names
The "Sort by surname" section, IMO, inappropriately assumes everyone has a surname.

It correctly notes that while Chinese names have a surname, the surname comes first.

There are two styles of traditional Arabic names. Neither one uses a surname. In one style of Arabic name a child's name does include one of his father's names, but it is the father's first name, not a surname. The same holds true for most Pashtun names, and possibly in other cultures influenced by Arabic culture.

I think that, for names like this, sorting on a surname is worse than a waste of time -- it is misleading. Why do we sort on surnames? Isn't it so that, if a list contains individuals who are closely related, sorting will group them together? But when related individuals don't share an inherited last name, sorting on the last name will misleadingly group together individuals who aren't closely related.

Several years ago I had a series of discussions with another contributor who was leading an initiative to add certain templates to every biographical article. For his team's initiative to work properly, every biographical article was supposed to have a listas field in that template, and a DEFAULTSORT template. The compromise we agreed on was that, in order for the templates his team was placing to work, articles about individuals with Arabic names would have sort keys identical to the current name of the article.

This is not ideal as, due to different transliterations, articles about individuals with Arabic names are frequently renamed, sometimes multiple times. Few people will realize that when they rename an article they should update its sort key.

Some individuals with Arabic-style names who live in Europe or North America will name their children in the European, not Arabic, style. But these exceptions should not fool us into not recognizing that the vast majority of individuals with Arabic names don't have anything like a surname.

Complicating all this is a bot that was run in 2006 and 2007. The author of the bot assumed it was OK to automatically add a Europeanized sortkey. This bot was run over some or all of the hundreds of thousands of biographical articles that existed then. Many people have told me that, since the article already had a sortkey set in either the DEFAULTSORT template or in the other template's listas field, they thought they could trust it was OK to fill in the same sortkey in the other place.

Unfortunately, due to the poorly thought out bot, sortkeys from that period are not reliable.

Worse, the author of that bot went on to contribute to one or more of our semi-automated editing tools, and the bad code that was prepared to plug in a naive and possibly incorrect sortkey was recycled into some versions of some semi-automated editing tools. Consequently one can't rely on any sortkey one comes across having been set by an actual human being.

I won't suggest a specific rewrite of this section. But I feel sure it needs a rewrite that makes clear that no one should put sortkeys that follow the Europeanized style on the articles of individuals who aren't from Europe, or aren't of European heritage.


 * Nope, you're partially wrong. Some Arabic names have "surnames" and others don't.  You are generalizing about ALL Arabic names; please don't.  The current standard is that almost all Arabic names follow western standards.


 * However, there are differing patterns based upon when and where you lived. For example, Malay names, including Arabic ones, follow a patronymic pattern.  Burmese names, Icelandic names and some old European (Scandinavian, Celtic and Russian) names are also patronymic.  With patronymic names, Wikipedia consensus is that the sort order is how the name is written. Pre-Meiji Restoration Japanese names followed the same pattern as Chinese names.  After Meiji, Japanese names follow Western order, except for Sumo wrestlers and Kabuki actors.


 * There are other patterns. Vietnam writes their names Surname Middle First.  A majority of Spanish and Portuguese names have two surnames, one from the father and one from the mother. There are standards for others such as non-Arabic Ethiopian, non-Arabic Somalian names, and many African tribes have differing patterns.  There is no one size fits all.


 * Arabic names before 1900 are mostly patronymic. There are some names that don't follow the pattern, but it is hard to know which ones.  Therefore the majority of pre-1900 names are sorted as they are written.  After 1900 it can get dicey.  Iraqi names are now using Western order.  However, Saddam Hussein is a patronymic because Hussein was his father.  Even so, Hussein is treated as the surname, mostly because of the western press.  Via a decree in 2008 by the Saudi Ministry of Interior, all Saudi people, except royalty, now have a family name or surname.  Some Arabic tribes were still clinging to tribal surnames in 2008.  Most Saudis have had family names for years now due to the influx of western ideas after oil was discovered. Osama bin Laden (full name Osama bin Mohammed bin Awad bin Laden) is from Saudi Arabia and "bin Laden" would be a patronymic name before 1900, but it is not patronymic.  Osama's father, Mohammed bin Awad bin Laden, came from Yemen.  "bin Laden" is the family name or surname. Osama's grandfather is Awad bin Aboud bin Laden.


 * There are some different conventions when using Arabic names, for example "al". Al is sorted like von or de in western names.  So, the last name of Al-Rahman, Al Rahman or al-Rahman is sorted by Rahman.  Another one is abd, examples are Abd Rahman, Abdul Rahman and Abdullah Rahman.  Abd means God.  The name following Abd is the description of God.  Rahman means merciful, so Abd Rahman is God is Merciful.  The lastname in the sort order is Abd Rahman.


 * A good text to read is . This article shows how to sort Arabic names in a way my poor brain can understand.  This article comes from the journal, The Indexer, The International Journal of Indexing, published by the Society of Indexers.  The Indexer is the main journal of the Society of Indexers, the American Society for Indexing, the Indexing Society of Canada/Société canadienne d’indexation, the Australian and New Zealand Society of Indexers, the Association of Southern African Indexers and Bibliographers, and the China Society of Indexers.  The article also follows what is in the Chicago Manual of Style.


 * Currently there is no bot that I'm aware of that indexes Arabic names. AWB does index some Arabic names.  But, it does not index names with "al", "abd", "bin" and certain Arabic names like Ahmed. Those names have to be entered manually.  Bgwhite (talk) 22:47, 4 August 2011 (UTC)
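
The "al" and "abd" conventions described in the comment above can be sketched roughly as follows. This is only an illustration of that one comment's rule, under the assumption that the surname has already been identified by a human; the helper name `surname_sort_key` is hypothetical and is not taken from AWB or any bot.

```python
import re

# Hypothetical helper, not part of any existing tool. It strips a leading
# article "al" (written "Al-", "Al " or "al-") from a surname, and leaves
# compound names such as "Abd Rahman" untouched, as the comment describes.
_ARTICLE = re.compile(r"^al[-\s]+", re.IGNORECASE)


def surname_sort_key(surname: str) -> str:
    """Return the part of an already-identified surname to sort on."""
    return _ARTICLE.sub("", surname.strip())


for s in ["Al-Rahman", "Al Rahman", "al-Rahman", "Abd Rahman", "Abdul Rahman"]:
    print(s, "->", surname_sort_key(s))
```

The pattern requires a hyphen or space after "al", so a name like "Ali" is not altered. Deciding whether a given name has a surname at all is the hard part, and the sketch does nothing to help with that.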


 * Saudi Arabia ordered citizens to adopt a surname in 2008? And did it order them to use whatever was their current last name as a surname?


 * Abd means God? I thought the name Abdullah or Abdallah meant "servant of God" or "slave of God" -- Abd (servant) Allah (God).  Well, perhaps I got that wrong.  You write authoritatively.


 * I know "al" is an article, like von and van. The last time I looked, granted several years ago, I thought that we treated von and van differently here.  You write authoritatively, you may be technically correct that for those Arabic names that should be sorted as if they had a European-style surname, al shouldn't be part of the sortkey.  You are aware that you would be swimming upstream on this.  Ninety to ninety-five percent of people who have tried to sort Arabic names have included "al" or "bin"/"ibn" as part of the surname.


 * You mention Saudi Arabia and Iraq as two Arabic countries that ruled everyone's name should be sorted in the European style. They are two of the more populous.  Should I assume people from the other two dozen or so Arabic-speaking countries are not under this ruling?


 * The bot in question was called something like "kingbotk". It did include the "al" in the sortkey.  I have had multiple people tell me that they altered the sortkey because they trusted their automated editing tool, and their automated editing tool had offered this change as part of the changes it recommended.


 * OK, I have read your document.


 * The clearest thing I think we need to agree upon is that the instructions in it are extremely complicated.


 * The instructions direct indexers to contact the author about some inconsistent usages.  The instructions direct indexers to generally trust that if authors use non-standard transliterations, etc, they know what they are doing.  Practically no one here knows what they are doing.


 * I have encountered exactly one wikipedian who was a genuine Arabic scholar. I used to ask him for advice, but he seemed more interested in being an administrator, and editing, than answering questions or resolving disputes on very complicated aspects of Arabic.


 * No offense, I read the link you referred me to, and it reinforces my original interpretation of "abd" and "abdallah".
 * "beginning with Abd (or more precisely ‘Abd) and followed by the word for God (as in Abdullah)";
 * "Examples of the varieties of romanization spellings of the name meaning ‘servant of the Merciful’ include: Abd al-Rahman, Abdul Rahman, Abdulrahman, Abd ar-Rahman and Abdel-Rahman"


 * Bearing in mind that:
 * the professional sorting of Arabic names is very complicated
 * the advice in your document assumes the authors themselves are well informed about how to choose how to transliterate Arabic names;
 * no offense, but the instructions were so complicated you misunderstood a portion of them.
 * essentially none of the Wikipedia contributors are truly competent to make an informed choice as to how to transliterate or sort Arabic names; doing it right is complicated, and I doubt most contributors have the patience to do this right. You may be a rare exception.


 * For these reasons, particularly that doing it right is so supremely complicated, I suggest we should stick with naive sorting of all Arabic names, starting with the first letter, and proceeding in sequence to the last letter of their name. Geo Swan (talk) 05:37, 5 August 2011 (UTC)


 * Egads. You like to take things literally.  I have not misunderstood a thing, but you have. Please read what I said and do not add things I didn't say.


 * Almost all Saudi Arabians already had surnames, the last vestiges without were those with tribal names. They ordered the ones with tribal names to change, thus all Saudi Arabians are now using surnames.  I don't know what the people with tribal names changed their surnames to.  When Japan's Emperor Meiji ordered the changeover, most people only had one name and they could choose almost anything for their last name.  Nowhere did I say Iraq ordered everyone to use surnames.  I only said they have been changing over time.


 * You can argue all you want on what Abd means as it does not matter. The point is that it is a compound name.  You have to include both words as a last name.


 * Yes, Arabic names are very complicated. I personally think European nobility is worse as some last names are surnames and others are not and it is very hard to tell the difference.   I've yet to understand Latin names of Roman times and I don't do those.


 * I did get one thing wrong. I mentioned I'm not aware of any bots working on sortkeys.  I've been in listas land (the sortkey on talk pages) for awhile and no bot does those.  There are bots that do DEFAULTSORT.  However, they do not sort on names with al, bin, abd or other surnames such as Ahmed.  They leave DEFAULTSORT blank in these cases.  They do sort all Chinese, Korean, Malay and Icelandic names wrong.  They sort the vast majority of new Arabic names right.  So, if you have a problem with Arabic names, then you should have a bigger problem with Chinese names as no bot gets them right.


 * Your beginning statement was, "There are two styles of traditional Arabic names. Neither one uses a surname."  This is completely false as the majority of those LIVING outside of Malaysia are now using surnames.  There are pockets that don't.  I'm aware of some in Iraq, Pakistan and Afghanistan.  However, if they have made the Western press, the last name is to be treated as a surname.  The prime example is Saddam Hussein.  So, sorting them by their first name is completely wrong.


 * I'm following what the Indexers, Chicago Manual of Style and Wikipedians have said what to do. Will I get some wrong, yes.  Will others get some wrong, yes.  There are cases, as the link I gave bears out, that are extremely difficult to know what to do.  But, it is nothing different from Royalty names or other hard ones. (I really hate three word Portuguese names.  Is that a two word last name or is that a middle name?)  We can only do our best.  Having only a professional do it is wrong as then it will never get done.  Wikipedia is built upon volunteers that are not professionals or experts at everything.


 * That said, this is currently how it works.
 * There are currently only three editors that I'm aware of doing sorting. I'm the youngest of the bunch.  User:Mandarax and User:JimCubb are the other two that go around doing manual sorting.  Unfortunately, JimCubb's last post in April said he was out sick and didn't know when he would return.  I really hope he is ok. He is the eldest of us three.
 * A bot (Larabot) goes around tagging the talk pages of new living biographical articles with WikiProject Biography.
 * I get a list of what the bot tagged and I manually, via AWB, enter parameters on the talk page (class, living, other Projects). I also enter the listas parameter.
 * In another AWB window, I'm viewing the main article of those tagged by Larabot. I check to see if DEFAULTSORT is correct or missing, add Persondata tag if not already there.  Add any missing data in Persondata.  Finally, I check to see if the article meets notability requirements and has reliable sources.
 * I also get a list of pages people have added WikiProject Biography to the talk page and are missing either listas or living parameter and go thru the same procedure. Almost nobody adds listas.
 * Typically I see around 150 new articles a day. Some days are over 300. I've never been below 100 names unless the database reports don't run that day for Larabot to get its input.


 * So, long story short: almost every new article's sort key is being looked at manually, and they are being looked at by three people who at least know what they are doing most of the time. Will we get some Arabic names wrong? Yes.  Will we get other names wrong? Yes.  Nobody, especially me, is perfect.  None of us are aware of every little rule. Just last week I learned if a historic painter has Master as a name, it should be sorted under M and not under .  The sorting of Arabic names is not as bad as you think it is.  Bgwhite (talk) 07:09, 5 August 2011 (UTC)

Vietnamese names
There needs to be something about Vietnamese names in the guidelines. The book indexes typically do them family name, middle name, given name. That's the same order used in the article titles. Someone surveyed the indexes here. In Vietnam, it's by given name. This style is recommended by the Chicago Manual. The reason for this is that so many Vietnamese are named "Nguyen" that sorting by surname creates problems. All the same, you never see this style in real-life English-language book indexes. Entertainers nowadays often drop off their surnames to create stage names. I have no idea what to do with them. Kauffner (talk) 19:14, 25 August 2011 (UTC)


 * I agree there needs to be some consensus on how to index Vietnamese names because there are different standards to follow.


 * To Summarize: Let's use the name of Nguyen Thi Dinh as an example on how others index.
 * Chicago Manual of Style - Dinh, Nguyen Thi
 * Library of Congress, U.S. University libraries and the British Library - Nguyen, Thi Dinh
 * In Vietnam, it is usually - Dinh, Nguyen Thi
 * Living in the west or publishes in the west - Nguyen, Dinh Thi (note: All three ways are in Scientific papers, but more tend this way).
 * Bgwhite (talk) 21:59, 25 August 2011 (UTC)
 * A paper where the author's name is given family name last is obviously not comparable to a bio that is titled family name first. Kauffner (talk) 11:32, 26 August 2011 (UTC)
 * Ok, not sure what you are trying to say. There are three ways to sort a Vietnamese name that I'm aware of.  If you have sources for which way is best, please share.  How they do it in Vietnam is not really appropriate as this is the English Wikipedia.  So, how to sort names in an English setting is what is needed.  Normally Chicago MOS and Libraries are in sync, but not in this case.  I'm also not able to find any info from The Indexer, the scholarly journal for indexing, or from genealogical sites.  I'm leaning towards the Library of Congress, British Library and scholarly journal way... Nguyen, Thi Dinh.  This appears to be what is used most when it comes to indexing. Thoughts? Bgwhite (talk) 00:07, 27 August 2011 (UTC)
 * Yes, family name, middle name, given name. Any academic book about Vietnam does it this way. Only a few war memoirs do it any other way. As for the memoirs, it is not so much that they use some other system. It's more like they are just sloppy. I wouldn't worry about what CMOS or the Library of Congress says. Kauffner (talk) 00:30, 27 August 2011 (UTC)
 * Um, is that "Nguyen Thi Dinh" or "Nguyen, Thi Dinh"? Any "academic book" may possibly do it one way, but not every academic paper.  Actually, not every academic library sorts your way either.  Wikipedia goes by reliable references which is why CMOS and the two Libraries matter.  CMOS is sort of the bible of style guides.  The indexing societies are the gold standard to how to sort names. The two Libraries are usually the ones every other Library follows. If we were talking newspapers then the New York Times and AP style guides are the bible.  So far you have shown me a forum post in which three librarians post their thoughts.  Oh, the way academic libraries do things change over time... For McDonald, it used to be (and still is in some places) sorted as Macdonald instead of McDonald.  Without finding info from the indexing societies, I will follow the Library of Congress and the British Library... "Nguyen, Thi Dinh". Bgwhite (talk) 07:23, 27 August 2011 (UTC)
 * I am fine with what you are proposing. Kauffner (talk) 07:47, 27 August 2011 (UTC)
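
To make the three candidate orderings from this thread easier to compare, here is a small sketch that derives each index form from a name written family name first. The function name `vietnamese_index_forms` and the dictionary labels are invented for this illustration; the three forms themselves are the ones summarized earlier in the thread.

```python
def _join(*parts: str) -> str:
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def vietnamese_index_forms(name: str) -> dict:
    """Build the three index forms discussed above.

    Assumes `name` is written family name, middle name(s), given name
    and contains at least two words (a single word raises ValueError).
    """
    family, *middle, given = name.split()
    mid = " ".join(middle)
    return {
        # Chicago Manual of Style / usual practice in Vietnam: given name first
        "chicago": f"{given}, {_join(family, mid)}",
        # Library of Congress / British Library: family name, rest unchanged
        "library": f"{family}, {_join(mid, given)}",
        # Western-order publications: family name, then given, then middle
        "western": f"{family}, {_join(given, mid)}",
    }


forms = vietnamese_index_forms("Nguyen Thi Dinh")
for label, key in forms.items():
    print(label, "->", key)
```

The "library" form is the one the thread settles on. It is also the only one of the three that leaves the written order of the name intact apart from the comma.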

Peers and other "qualified" names
The existing guideline for sorting British peers doesn't really say what to do in some cases that I'm seeing currently in Category:Biography_articles_without_living_parameter, which just got a large influx of various mostly-deceased British peers without the listas parameter being filled in on the WikiProject Biography template. I'm wondering how the categorization/sorting info can be updated to accommodate them. Examples:
 * How does one order the different parts of a name like "Donald Curry, Baron Curry of Kirkharle" - Kirkharle first or Curry first? It looks like Kirkharle first, but an example should be in the material of a case with the last name being in the title.
 * "James Hamilton, 1st Earl of Clanbrassil (second creation)" (where does one put the "second creation"?)
 * "Sir John Chichester, 1st Baronet, of Arlington Court" (where does one put the "Arlington Court"?)
 * "John Hussey-Montagu, Lord Montagu" (the "Lord Montagu" is a courtesy title - what does one do with it?)
 * "John Dawnay, 4th Viscount Downe" (where does the 4th go? It has to go in someplace - there's also a "John Dawnay, 5th Viscount Downe")

Any suggestions? Thanks! Allens (talk) 20:32, 28 November 2011 (UTC)
 * If the article has a  (it's usually near the bottom, just above the categories), you can often use the value from that - all of the above have one already:
 * Donald Curry, Baron Curry of Kirkharle has, so set Curry, Donald Baron Curry of Kirkharle
 * James Hamilton, 1st Earl of Clanbrassil (second creation) has  so Hamilton, James
 * Sir John Chichester, 1st Baronet, of Arlington Court has  so Chichester, John
 * John Hussey-Montagu, Lord Montagu has  so Hussey-Montagu, John
 * John Dawnay, 4th Viscount Downe has  so Downe, John Dawnay, 4th Viscount
 * But some of these don't quite match WP:PEERS.
 * Donald Curry, Baron Curry of Kirkharle is fine as it stands, but could be adjusted slightly to
 * James Hamilton, 1st Earl of Clanbrassil (second creation) →
 * Sir John Chichester, 1st Baronet, of Arlington Court →  (Baronets sort by family name)
 * John Hussey-Montagu, Lord Montagu →
 * John Dawnay, 4th Viscount Downe is fine as it stands.
 * The extra stuff like "(second creation)" is only really necessary in the event that two people having otherwise identical names and titles appear in the same category. -- Red rose64 (talk) 21:12, 28 November 2011 (UTC)
 * Perhaps some of these (in the revised form you list above, to conform with WP:PEERS) could be used as examples in the Categorization of people guideline? Also, would anyone happen to know how the software handles identical DEFAULTSORT keys - does it default back to the page title, or what? I'm wondering this because I'm noticing in Edward Watson, Viscount Sondes. I suspect this is an error needing fixing (should be  ) if I've got it right (I find it odd that it isn't  ...)? Thanks again! Allens (talk) 23:39, 28 November 2011 (UTC)
 * Regarding your second q: the answer is that the software always appends the page title and namespace (in that order) to each sort key, regardless of whether there would be duplication without those (there is a help page describing this, but I can't find it).
 * Using your example Edward Watson, Viscount Sondes, this has a  and four categories, none of which has a sort key. In each of these four cats the page will be sorted as DEFAULTSORT plus page name, i.e. with an actual sort key of "SondesEdward Watson, Viscount Sondes". If we add a sort key to one of those categories to override the DEFAULTSORT - for example,
 * then in this particular category, the page will be sorted as if the sort key were "Watson, EdwardEdward Watson, Viscount Sondes". -- Red rose64 (talk) 14:40, 29 November 2011 (UTC)
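
The behaviour described in the reply above can be sketched like this. It follows the description given here (sort key followed directly by the page title), not the MediaWiki source; whether the real software puts a separator between the two parts is not covered, and the function name `effective_sort_key` is made up for the illustration. The identical key shared by the two Dawnay pages is a hypothetical value chosen to show the tie-break.

```python
from typing import Optional


def effective_sort_key(page_title: str,
                       default_sort: Optional[str] = None,
                       category_key: Optional[str] = None) -> str:
    """Sketch: a per-category key overrides DEFAULTSORT, and the page
    title is always appended to whichever key applies."""
    prefix = category_key if category_key is not None else (default_sort or "")
    return prefix + page_title


title = "Edward Watson, Viscount Sondes"
print(effective_sort_key(title, default_sort="Sondes"))
print(effective_sort_key(title, default_sort="Sondes",
                         category_key="Watson, Edward"))

# Two pages with a (hypothetical) identical key fall back to title order.
pages = [
    ("John Dawnay, 5th Viscount Downe", "Downe, John Dawnay"),
    ("John Dawnay, 4th Viscount Downe", "Downe, John Dawnay"),
]
ordered = sorted(
    pages,
    key=lambda p: effective_sort_key(p[0], default_sort=p[1]).casefold(),
)
print([p[0] for p in ordered])
```

Because the title is always part of the comparison, two pages with the same key still end up in a predictable order, which is why extra qualifiers are only needed when the titles themselves would sort the wrong way round.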

Sorting "Mc" names under "Mac"?
I noticed that some pages where the person's surname begins with "Mc" have DEFAULTSORTs to alphabetize them as under "Mac". So, for example, John McCain has a DEFAULTSORT as "Maccain, John" and Paul McCartney has one as "Maccartney, Paul". I realize that this was standard practice historically, but it isn't supported by this guideline. Furthermore, Collation says "Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British phone books." My objections are that alphabetizing "Mc" under "Mac" is old-fashioned, probably stemming from the time when the spelling of surnames was less standardized than it is now. It also makes names harder to find if the user does know the exact spelling of the individual's name. Other manuals advise alphabetizing "Mc" as "Mc" and not as "Mac": The New York Times Manual of Style and Usage, Cite It Right by Johns & Keller, Writing the Research Paper by Winkler & Metherell, The SBL Handbook of Style. --Metropolitan90 (talk) 20:30, 4 December 2011 (UTC)
 * I wholeheartedly agree. Treating Mc as Mac was removed from WP:MCSTJR in October 2010.  Among the other sources that recommend sorting letter-by-letter as the names appear are The Chicago Manual of Style, the American Library Association, The MLA Style Manual, The Indexing Companion, ISO 999, and The Cambridge Handbook for Editors, Authors and Publishers.  The most likely reason you are seeing names with Mc sorted as Mac is AWB.  I brought this up a year or so ago to have it changed and was told there was no consensus anywhere to be found in Wikipedia land about this.  Start a discussion at Wikipedia talk:AutoWikiBrowser or file this as a bug at Wikipedia talk:AutoWikiBrowser/Bugs and I'll gladly chime in. Bgwhite (talk) 21:29, 5 December 2011 (UTC)
 * Discussion to fix the Mc -> Mac conversion in AWB and if to have a bot run to fix all current "problems" from this conversion in DEFAULTSORT and listas can be located at: Wikipedia talk:AutoWikiBrowser.  Bgwhite (talk) 18:57, 7 December 2011 (UTC)
 * I refer the honourable gentlemen to the section . Rich Farmbrough, 22:07, 8 December 2011 (UTC).

I've restored the removed clause for now. If we're going to make such a major and far-reaching change we should at least have a proper RFC, and if a change is made the clause should be replaced with a statement of the new guideline policy rather than simply removed. --Deskford (talk) 17:52, 19 January 2012 (UTC)
 * This statement was removed from WP:NAMESORT in October 2010 after some discussion. It was removed from AWB after some discussion.  A bot was proposed to make the changes, but was abandoned for being too hard to code.  So, the change is being done manually.  I don't see the point of bringing it up again unless there is a valid reason to do Mc -> Mac sorting.  As noted above and at other discussions, it is now standard practice not to do Mc -> Mac sorting... the library guidelines, all the indexing societies (including UK's Society of Indexers), and major style guidelines (including Cambridge).  At this point, the old wording should be restored and if there is a change to be proposed, it should be brought up for discussion.
 * "clause should be replaced with a statement of the new guideline policy rather than simply removed". It will be, please see discussion below this.  Any comments are extremely welcome. Bgwhite (talk) 18:41, 19 January 2012 (UTC)
 * btw, you added that everything after the first letter should be lowercase... McDonald should be Mcdonald. This is also no longer the case as the MediaWiki software was changed to be case-insensitive.  MediaWiki will now sort McDonald the same as Mcdonald. Bgwhite (talk) 19:18, 19 January 2012 (UTC)
 * Yes, the case-insensitivity was introduced during February-March 2011, a week or two after the MediaWiki 1.17 upgrade. Diacritics still need special treatment though. -- Red rose64 (talk) 19:45, 19 January 2012 (UTC)
 * I deliberately put the text back exactly as it was before the removal in October 2010, but you are right to remove the case-sensitivity stuff. --Deskford (talk) 01:55, 20 January 2012 (UTC)
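
For reference, the practical difference between the two conventions argued over in this section can be sketched as below. Both key functions are illustrations written for this example, not AWB's code; the name list mixes the two names mentioned in the thread with a few invented neighbours.

```python
names = ["MacDonald", "McCain", "Mason", "McCartney", "Madison"]


def letter_by_letter(name: str) -> str:
    """Sort exactly as spelled, ignoring case."""
    return name.casefold()


def mc_as_mac(name: str) -> str:
    """Older convention: file a leading "Mc" as if it were spelled "Mac"."""
    key = name.casefold()
    return "mac" + key[2:] if key.startswith("mc") else key


by_spelling = sorted(names, key=letter_by_letter)
by_mac = sorted(names, key=mc_as_mac)

print(by_spelling)
print(by_mac)
```

With letter-by-letter keys the "Mc" names follow "Mason"; with the older convention they jump ahead of "MacDonald", which is the behaviour a reader who knows the exact spelling would not expect.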