Wikipedia:Bots/Requests for approval/DefaultsortBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

DefaultsortBot
Operator: Mikaey

Automatic or Manually Assisted: Automatic, unsupervised

Programming Language(s): C#

Function Overview: Puts a DEFAULTSORT tag into biography articles that do not have one.

Edit period(s): Continuous

Already has a bot flag (Y/N): N

Function Details:
 * 1) Grab a list of talk pages from Category:Biography articles with listas parameter
 * 2) Check corresponding article page for a DEFAULTSORT tag. If it exists, jump to step 7.
 * 3) Grab value of listas parameter from WPBiography banner on talk page.
 * 4) Insert DEFAULTSORT at end of article with value grabbed from listas.
 * 5) Remove sort keys from any category tags with a sort key equal to the new DEFAULTSORT value.
 * 6) Save page
 * 7) Proceed to next article and jump to step 2.
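Steps 2 through 6 can be sketched as a single per-page routine. This is a minimal Python sketch, not the bot's actual C# code. It assumes the article and talk-page wikitext have already been fetched, and it follows the steps exactly as listed: exact-match sort-key removal, and DEFAULTSORT appended at the end of the article. The name `process_page` and the simplified regular expressions are hypothetical; real wikitext has more variants (templates inside banners, unusual spacing) than these patterns handle.

```python
import re
from typing import Optional

# Simplified, hypothetical patterns for the wikitext this sketch looks at.
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*:", re.IGNORECASE)
LISTAS_RE = re.compile(r"\|\s*listas\s*=\s*([^|}\n]*)", re.IGNORECASE)
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^|\]]+)\|([^\]]*)\]\]", re.IGNORECASE)


def process_page(article: str, talk: str) -> Optional[str]:
    """Return the edited article text, or None if the page should be skipped."""
    # Step 2: skip pages that already have a DEFAULTSORT.
    if DEFAULTSORT_RE.search(article):
        return None
    # Step 3: read listas from the WPBiography banner on the talk page.
    match = LISTAS_RE.search(talk)
    if not match or not match.group(1).strip():
        return None
    key = match.group(1).strip()

    # Step 5: drop category sort keys that exactly repeat the new key.
    def unpipe(m: re.Match) -> str:
        if m.group(2) == key:
            return f"[[Category:{m.group(1)}]]"
        return m.group(0)  # keep deliberate keys such as "301"

    article = CATEGORY_RE.sub(unpipe, article)
    # Step 4: append DEFAULTSORT at the end of the article.
    return article.rstrip("\n") + f"\n{{{{DEFAULTSORT:{key}}}}}\n"
```

Step 6 (saving) and the outer loop over the category (steps 1 and 7) are left out, since they depend on how the bot talks to the MediaWiki API.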

Discussion
I'll come right out with it: this is pretty much ListasBot 1/ListasBot 4 in reverse. This was requested by Carcharoth; you can see the conversation(s) here and here.
 * Roughly how many articles would be edited on the initial run? Nakon  04:12, 20 May 2009 (UTC)
 * Sorry for the delay -- I didn't quite know how to answer that. So, I wrote out the bot's code, and just commented out the part where it actually commits the change back to the Wiki.  I let it run for 500 articles, and it ended up "editing" 139 of them.  So, that's a 27.8% edit rate, and with Category:Biography articles with listas parameter having 609,677 pages as of this writing, I'm going to have to put my estimate at (about) 170,000.  Matt (talk) 05:31, 20 May 2009 (UTC)
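The estimate is a straight proportional extrapolation from the 500-article dry run. It assumes that sample is representative of the whole category:

```latex
\[
  \frac{139}{500} = 0.278, \qquad 0.278 \times 609{,}677 \approx 169{,}490 \approx 170{,}000
\]
```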

As long as you're fairly sure the listas parameter is accurate most of the time, this sounds fine to me. Can you publish the source code somewhere? --MZMcBride (talk) 05:34, 20 May 2009 (UTC)
 * Sure thing. And I think "most of the time" would be pretty accurate.  DEFAULTSORT/listas falls under more scrutiny than I would have thought when I started writing bots.  Matt (talk) 05:44, 20 May 2009 (UTC)

What will you do about sort keys already in place? What if they're the same? What if they're different? Will it remove duplicate sort keys? Will it remove non-duplicates? Should it? --MZMcBride (talk) 20:25, 20 May 2009 (UTC)
 * If you want to add additional tasks: at WP:CHECKWIKI, there is a list of pages that need sortkeys. -- User:Docu
 * Respectfully, I think I'm going to pass on that one, at least for right now. The intent here is to stick to biographical articles, or at least articles where a bot can safely pick out a DEFAULTSORT key on its own.  From glancing at the list you gave, it doesn't appear that a bot would be able to safely pick out a DEFAULTSORT tag in all situations for those articles.  Matt (talk) 19:23, 20 May 2009 (UTC)
 * I can understand; it was just a thought. (BTW, part of it is easy to do: (1) if it's not a bio, the sortkey should basically be the title stripped of diacritics; (2) some articles in the group are false positives (Lifetime isn't taken into account); (3) numbers are probably better checked on a per-category basis; (4) the remainder should be bios.) -- User:Docu
 * Would you be willing to convert special characters and remove apostrophes from the listas values before you put them in more places? – Quadell (talk) 19:51, 20 May 2009 (UTC)
 * Erm...the code already does that. If you look at the code, that's what the StripPunctuationAndUnicode function does.  Matt (talk) 22:34, 20 May 2009 (UTC)
 * Excellent. – Quadell (talk) 23:20, 20 May 2009 (UTC)
 * Are we talking about a situation where a DEFAULTSORT tag is already on the page? In that situation, the bot skips over the page.  Matt (talk) 22:34, 20 May 2009 (UTC)
 * No, no. I'm talking about a page like this:


 * And the listas parameter is . How would the bot deal with such a case? Does it remove the exact duplicates? Does it remove the non-exact duplicates? Should it? --MZMcBride (talk) 22:41, 20 May 2009 (UTC)
 * I've never really thought to look at category tags before, so the answer to "how would the bot deal with it" would be that it would pull the listas value and leave the category tags alone. Best solution I can think of is to remove the sorting keys on category tags and just let DEFAULTSORT take over.  However, which one should the bot pick to use for DEFAULTSORT?  I don't know.  My inclination would be to use listas for everything, for consistency's sake.  Matt (talk) 00:20, 21 May 2009 (UTC)
 * You don't want to remove sorting keys on category tags with a bot. There are some cases where the piped sort key should be different for an individual category. For instance, Category:Richard Nixon in the Richard Nixon article should be piped to a single space; and Category:Dukes of York in Henry VIII of England should be piped to "301". 99% of the time they should be removed, but we don't want a bot to remove them because that last 1% is important. – Quadell (talk) 01:58, 21 May 2009 (UTC)
 * Addendum: I think listas would be your best pick. – Quadell (talk) 02:01, 21 May 2009 (UTC)
 * I was thinking more about the instances where the pre-existing sort keys are identical to the new DEFAULTSORT. --MZMcBride (talk) 02:24, 21 May 2009 (UTC)
 * *scratches head*...so, piping Category:Richard Nixon to a single space in the Richard Nixon article causes the article to appear at the very top of Category:Richard Nixon, and piping Category:Dukes of York to "301" in the Henry VIII of England article causes the article to appear 301st in the list (assuming that every other article in that category was also similarly numbered)? Am I understanding that right?  Matt (talk) 03:16, 21 May 2009 (UTC)
 * Yep. – Quadell (talk) 11:04, 21 May 2009 (UTC)
 * Ok, I understand what MZMcBride is trying to say. So, basically, if any category tags have a sort key equal to what we're putting in as the DEFAULTSORT, they can be taken out, since they would be redundant at that point.  ✅.  Note that I've changed the function details above to reflect that.  Matt (talk) 05:46, 21 May 2009 (UTC)

Let's see it go. – Quadell (talk) 11:04, 21 May 2009 (UTC)

Did fix a minor bug or two along the way. One thing I'm not quite sure about -- if a category sort key has a space before the name, it'll get shot up to the top of the list in that category. If it's still identical to the new DEFAULTSORT value (without the leading whitespace), should it still be removed? I can see instances where you'd want it at the top of the category (such as in the Richard Nixon example above), but that could be done with a sort key of just a single space, instead of the entire name. Matt (talk) 19:32, 21 May 2009 (UTC)
 * I just tested that, and damn, you're right. I'm gonna go out on a limb here and say that if a sortkey is whitespace plus the DEFAULTSORT value, it's always a typo, and should be treated like it was just the DEFAULTSORT value (i.e. taken out). – Quadell (talk) 00:05, 22 May 2009 (UTC)
 * Agreed. The only other (minor) issue is putting DEFAULTSORT above the category links, not below them. It screws with editors to have it below, and we end up with duplicate code from people who don't see it below the categories. --MZMcBride (talk) 00:08, 22 May 2009 (UTC)
 * I think I can swing that. Do you want another (short) trial to see if I got it right?  Matt (talk) 00:12, 22 May 2009 (UTC)
 * Sure. Another 20 edits or so should be fine. Assuming the trial edits are problem-free (I'm sure they will be), I have no objection to approving the bot. --MZMcBride (talk) 00:14, 22 May 2009 (UTC)
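The rule agreed on here treats a sort key that is just leading whitespace plus the DEFAULTSORT value as a typo, so it gets removed like an exact duplicate. A key that is only whitespace, as in the Richard Nixon example, is still kept. This could be expressed as a comparison on the left-stripped key. The following is a minimal Python sketch, and the function name `is_redundant_sort_key` is hypothetical.

```python
def is_redundant_sort_key(sort_key: str, defaultsort: str) -> bool:
    """True if a category's piped sort key adds nothing beyond DEFAULTSORT."""
    # A whitespace-only key (e.g. " ") deliberately pushes the page to the
    # top of the category, so it is never treated as redundant.
    if not sort_key.strip():
        return False
    # Leading whitespace in front of the full DEFAULTSORT value is treated
    # as a typo, as agreed above, so it still counts as a duplicate.
    return sort_key.lstrip() == defaultsort
```

Trailing whitespace is left alone here. MediaWiki may normalize it anyway, but that was not discussed, so the sketch does not assume it.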

The bot should skip pages which already have Lifetime, just as it does for DEFAULTSORT. Also, could it put DEFAULTSORT in its traditional position immediately before the first category? MANdARAX •  XAЯAbИAM  03:41, 22 May 2009 (UTC)
 * I suppose...could we make it skip pages with Lifetime if the template has at least 3 parameters (thereby ensuring that the DEFAULTSORT parameter is filled in)? And the code has been modified to put the DEFAULTSORT before the first category tag; we're just waiting for approval of some sort before it takes effect.  Matt (talk) 03:47, 22 May 2009 (UTC)
 * I've never encountered a Lifetime without the sort key parameter, but it wouldn't hurt to check. The ideal action in such a case would be either to fill in the Lifetime's sort key, or expand the birth/death categories, remove the Lifetime, and add the DEFAULTSORT. But I expect that situation to be extremely rare, so it probably isn't worth the extra effort, and I could live with both DEFAULTSORT and Lifetime on the page. Incidentally, about half of the bot's edits which I examined added DEFAULTSORT to pages with Lifetime.
 * And I discovered another item in need of tweaking. The bot removes parentheses (e.g. from ), but it should also remove what's inside the parentheses. According to WP:Categorization of people, "The sort key should mirror the article's title as closely as possible, while omitting disambiguating terms." MANdARAX  •  XAЯAbИAM  07:23, 22 May 2009 (UTC)
 * Okey doke, ✅. I'll get that change mirrored into ListasBot shortly.  Matt (talk) 07:28, 22 May 2009 (UTC)
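Two checks come out of this exchange. The first is that the sort key should drop the disambiguating term itself, not just the parentheses around it. The second is that a page whose Lifetime template already carries a sort key can be skipped. The Python sketch below combines the disambiguator removal with the diacritic and apostrophe stripping Quadell asked about earlier. It is a hypothetical stand-in, not the bot's StripPunctuationAndUnicode function, whose source is not reproduced here. Using Unicode NFKD decomposition to drop diacritics is an assumption. So is reading "at least 3 parameters" as birth, death, and a non-empty sort key in the third position.

```python
import re
import unicodedata


def normalize_sort_key(value: str) -> str:
    # Remove a parenthetical disambiguator together with its contents,
    # e.g. "Smith, John (footballer)" -> "Smith, John".
    value = re.sub(r"\s*\([^)]*\)", "", value)
    # Decompose accented letters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop straight and typographic apostrophes.
    value = value.replace("'", "").replace("\u2019", "")
    # Collapse any doubled spaces left behind.
    return " ".join(value.split())


def lifetime_has_sort_key(article: str) -> bool:
    # Hypothetical check: {{Lifetime|birth|death|sort key}} with a non-empty
    # third parameter is assumed to already supply the DEFAULTSORT value.
    m = re.search(r"\{\{\s*Lifetime\s*\|([^}]*)\}\}", article, re.IGNORECASE)
    if not m:
        return False
    params = m.group(1).split("|")
    return len(params) >= 3 and bool(params[2].strip())
```

For example, `normalize_sort_key("O'Brien, José (footballer)")` gives `"OBrien, Jose"`.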

Just to make MZ's comments official. – Quadell (talk) 14:48, 22 May 2009 (UTC)

I think it looks pretty good. I didn't have to make any changes mid-run this time around. Matt (talk) 20:02, 22 May 2009 (UTC)


 * Thanks for implementing my suggestion, which I see worked for . I have an additional tiny formatting request. Could you have the DEFAULTSORT immediately above the categories without a blank line in between? That's how it's almost always formatted. MANdARAX  •  XAЯAbИAM  21:21, 22 May 2009 (UTC)
 * Sure thing. Everyone happy with the results?  Matt (talk) 04:52, 24 May 2009 (UTC)
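The final placement requests put DEFAULTSORT in its traditional position directly above the first category, with no blank line in between. This can be sketched as a string insertion just before the first category link. The following is a minimal Python sketch. The name `insert_defaultsort` is hypothetical. The sketch assumes category links start at the beginning of a line, and it falls back to the end of the page when there are no categories, which is also an assumption.

```python
import re

# Assumes category links begin at the start of a line.
FIRST_CATEGORY_RE = re.compile(r"^\[\[\s*Category\s*:", re.IGNORECASE | re.MULTILINE)


def insert_defaultsort(article: str, key: str) -> str:
    tag = "{{DEFAULTSORT:" + key + "}}"
    m = FIRST_CATEGORY_RE.search(article)
    if m is None:
        # No category links found: append at the end instead (assumed fallback).
        return article.rstrip("\n") + "\n" + tag + "\n"
    # Put the tag on its own line directly above the first category,
    # so no blank line separates it from the category block.
    return article[:m.start()] + tag + "\n" + article[m.start():]
```

Any blank line that already precedes the category block stays above the new tag, so the article body is still visually separated from the DEFAULTSORT/category group.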

Looks good. – Quadell (talk) 14:28, 24 May 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.