Wikipedia:Bots/Requests for approval/RjwilmsiBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

RjwilmsiBot
Operator: Rjwilmsi

Automatic or Manually assisted: Automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Set page ranges within the page parameter of citation templates to use en-dashes, per guidelines on Template:Citation etc.

Links to relevant discussions (where appropriate): Guidelines on Template:Citation etc.

Edit period(s): On download of new database dump

Estimated number of pages affected: ~29,000 (first dump), less after

Exclusion compliant (Y/N): Yes

Already has a bot flag (Y/N): N

Function details: AWB has logic to apply en-dashes to page ranges within the 'page' or 'pages' parameter of citation templates such as citation, cite web etc. Many page ranges are incorrectly given using a simple hyphen or (occasionally) an em-dash. Pages not matching this logic will be skipped.
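
As a rough illustration only (this is not AWB's actual code, which is written in C#), the kind of logic described above might be sketched in Python as follows. The regular expression, the name `fix_page_ranges`, and the choice to handle only values of the plain form "number-number" are assumptions made for this sketch; anything that does not match is left untouched, mirroring "pages not matching this logic will be skipped".

```python
import re

# Hypothetical pattern: a |page= or |pages= parameter whose entire value is
# "digits, hyphen or em-dash, digits", followed by the next "|" or closing "}".
PAGE_RANGE = re.compile(
    r"(\|\s*pages?\s*=\s*)(\d+)\s*[-\u2014]\s*(\d+)(?=\s*[|}])"
)


def fix_page_ranges(wikitext: str) -> str:
    """Replace a hyphen or em-dash with an en-dash (U+2013) in simple page ranges."""
    return PAGE_RANGE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}\u2013{m.group(3)}",
        wikitext,
    )
```

For example, `fix_page_ranges("{{cite journal |title=X |pages=29-31 |year=2000}}")` would rewrite only the `pages` value, while a hyphen in the title or in any other parameter is never examined.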

Discussion
29,000 edits to replace '-' with '–'? I think this would be better done as something in AWB's general fixes (if it isn't already) than as a standalone task. Mr.Z-man 00:44, 12 November 2009 (UTC)

Sure, why not take care of it all in one go? --Cyber cobra (talk) 10:40, 12 November 2009 (UTC)
 * Are these considered as cosmetic changes or not? -- Magioladitis (talk) 18:17, 12 November 2009 (UTC)
 * I'd say they're slightly more significant than "cosmetic", especially within citations. I don't think AWB would be practical for 30k edits though. – Juliancolton  | Talk
 * They are cosmetic, but it's supported by the MoS and the task is very specific. --Cyber cobra (talk) 07:49, 13 November 2009 (UTC)
 * There are plenty of things supported by the MoS that are too trivial to have a bot enforce them individually; the specific-ness of the task is part of the problem. If it fixed several problems at once, or did this while fixing something more substantial - like AWB's general fixes - it would be fine. But I'm not convinced this is a significant enough task that it needs a bot to do it. Mr.Z-man 18:19, 13 November 2009 (UTC)
 * I'm not seeing what the downside of running this bot would be. Are you suggesting server load as the problem or...? --Cyber cobra (talk) 18:55, 13 November 2009 (UTC)

I can't find where a "-" is required over a whatever in the links above. Where is the discussion about this change? I thought I asked this before. --IP69.226.103.13 (talk) 08:30, 14 November 2009 (UTC)
 * See WP:MOSDASH. --Cyber cobra (talk) 09:29, 14 November 2009 (UTC)
 * Thanks, the links above don't discuss it. Yes if there are other similar MoS details, the bot could take care of that. Also, this would be an ongoing task. I also don't understand the objection. That's a lot of articles that need a trivial change, and a lot of future maintenance. It seems bot worthy to me. --IP69.226.103.13 (talk) 20:25, 14 November 2009 (UTC)
 * The problem is the idea of "one bot per change" fills up page histories with inconsequential edits. We frequently deny bots that run the Pywikipedia cosmetic_changes.py script, and that does about half a dozen different things. I believe I've suggested in the past that people interested in enforcing the MoS/WP:CHECKWIKI with a bot get together, find the things that can be reliably detected by a bot, and make one bot to do all of them. There's also no real urgency with tasks like this such that all 29000 pages need to be fixed immediately, which is why I don't quite understand the objection (or rather complete lack of reply) to my suggestion to add this to AWB's general fixes, which is pretty much designed for things like this. Mr.Z-man 22:27, 14 November 2009 (UTC)
 * The logic is already in the gen fixes. My idea was to request a simple task as my first bot task, in the expectation that it would get approved more readily, rather than a more complex one. Perhaps I was mistaken. Rjwilmsi  23:45, 14 November 2009 (UTC)
 * Your logic or your reasoning for doing it this way is fine, maybe not the specific task, considering how many pages it impacts. Yes, I agree, single small edits to 29,000 pages when the same edit could fix more things should be considered. How about trying out this change on a small number of edits, then adding something? I don't know. Any ideas from anyone else. I see your points, Mr.Z-man. --IP69.226.103.13 (talk) 04:13, 15 November 2009 (UTC)
 * I could run the bot with all AWB gen fixes enabled, and ensure at least the page range dashes were fixed. Does that help? Rjwilmsi  08:47, 15 November 2009 (UTC)
 * Are all of the AWB gen fixes 100% reliable? I don't think they are, but I could be mistaken. --ThaddeusB (talk) 04:51, 20 November 2009 (UTC)
 * Major concern, then. Is there a known, finite list of 100% reliable AWB general fixes? Yes, this needs more input, but I think it's a great idea for a bot, particularly if it can do a number of edits at once, doesn't have to do all, leaving it still a basic programming exercise. --IP69.226.103.13 (talk) 08:58, 21 November 2009 (UTC)
 * No, they are not. See Administrators' noticeboard/Incidents. Please do not run bots with general fixes enabled. Christopher Parham (talk) 13:47, 9 December 2009 (UTC)
 * BAGAssistanceNeeded - This could use some more comments by BAG members other than myself. Mr.Z-man 22:40, 18 November 2009 (UTC)
 * I've looked at this several times in the past week, and I just can't seem to care much one way or the other. It would be good if any other bottable general fixes were done at the same time, WP:CHECKWIKI might be able to help with that. Since it seems that the only potential controversy here is blowing up people's watchlists with relatively minor edits to 29000 pages, maybe we should run it past WP:VPR to try for a wider consensus for or against? Anomie⚔ 22:52, 25 November 2009 (UTC)
 * Probably appropriate to seek a wider audience. I can't even guess what the result of more input would be. I think details should be fixed. If they can be fixed by bot, so much the better. But I agree with not doing 29,000 edits to fix one thing if the same number of edits could fix a handful of details at the same time. --IP69.226.103.13 (talk) 04:42, 26 November 2009 (UTC)
 * The Manual of Style makes demands, a few of which are "bottable", such as ellipsis spacing and bracketing, curly quotes and apostrophes, and date formatting (apart from the delinking controversy). I doubt if more than a few percent of the articles would have more than one of those categories of problems. Art LaPella (talk) 01:03, 10 December 2009 (UTC)

Please advertise this as discussed above to determine whether there is community consensus for this bot to make minor edits to 29000 pages. Anomie⚔ 03:52, 2 December 2009 (UTC)
 * Has it been advertised, or should I mark this BRFA as "expired"? Anomie⚔ 18:22, 9 December 2009 (UTC)
 * Have done now at WP:VPR and WP:MOSDASH. Anywhere else? Rjwilmsi  18:37, 9 December 2009 (UTC)
 * Maybe WT:CHECKWIKI, it seems up their alley. Anomie⚔ 20:40, 9 December 2009 (UTC)

There are two cases where a dash-like-mark is used with page numbers. When indicating a range, such as 29–31 to mean page 29 through 31, the n-dash is correct. When giving a single page in a book that is numbered by chapter, like the 2000 Dodge Dakota Service Manual, such as 2 - 1, I don't know what the proper punctuation is and I can't find it in the APA Style manual. I think we should find out what the proper punctuation is for the latter case before going any further. --Jc3s5h (talk) 19:26, 9 December 2009 (UTC)

Strong support for automating the replacement of hyphens with en-dashes in page ranges per WP:MOSDASH. A related problem that I see often in citation templates is using the page parameter for multiple pages and using the pages parameter for a single page.—Finell 19:52, 9 December 2009 (UTC)

Question approach A tidy-up to increase consistency is something I would support. I would however query in this particular case whether the real issue has been correctly identified. Perhaps I'm missing the point, but it seems to me that one of the basic benefits of citation templates is that the template takes care of presentational aspects. From that standpoint, I have to question why the dash is considered to be part of the parameterized data in the first place. Given that this situation has come about, would a more correct solution not be to establish new parameters (such as pagefrom= and pageto=) for use when there is a range? Then the template can apply the correct presentation (en-dash), just as it controls all other presentational aspects such as parentheses and full stops. PL290 (talk) 20:03, 9 December 2009 (UTC)
 * I think I just answered my own question: a citation could be to "pp. 1, 3, 6–10" etc., and it would get very silly to try and cater for all possible permutations with parameters. However, could not the template be made to take care of substituting endashes for any hyphens passed? PL290 (talk) 21:26, 9 December 2009 (UTC)
 * Not straightforwardly, if at all, since we don't have decent string manipulation functions. And certainly not straightforwardly for anything besides the common "#-#" case. And given how often citation templates are used, hacking something up might be an exception to WP:PERF. Anomie⚔ 22:11, 9 December 2009 (UTC)

Objection. (Addressed below.) Jc3s5h's objection seems to be crucial: how can a bot tell whether "2-12" (with a hyphen) is a single page number (page 12 of section 2) or a page range (pages 2 through 12)? Most of the time, it's a page range, but not always, and a bot that turns a single page number into a page range will be making an incorrect edit. Eubulides (talk) 21:46, 9 December 2009 (UTC)
 * Oh, a couple percent of the time. Often a human can't tell either, short of a trip to the library for each article which is tantamount to giving up. Art LaPella (talk) 01:03, 10 December 2009 (UTC)
 * Perhaps the cases are unusual, but suppose the correct page number really is "2-12" (with a hyphen)? How is an editor supposed to prevent the bot from repeatedly and incorrectly changing it to an endash? To address this problem I propose the following modifications to the bot:
 * Leave the argument of page alone, since the singular word "page" indicates that it really is just one page and not a page range.
 * If there's a substring that contains some hyphens and endashes but no spaces, leave that substring alone; that will allow constructions such as "2-15–3-17" to continue to work.
 * Leave "&#45;" alone, so that it can be used in the other rare cases where hyphen is what's really wanted, e.g., "2&#45;13, 3&#45;17".
 * (I assume (3) is already the case, but I just thought I'd document it.) Eubulides (talk) 01:38, 10 December 2009 (UTC)
 * Sounds good to me, although I'd say an incorrect "page=347-348" is much more common than a correct "page=2-12" (I've been fixing this stuff with AWB). Art LaPella (talk) 02:22, 10 December 2009 (UTC)
 * Per Art LaPella I disagree about point (1) as the common use is mis-use. I agree with (2) – adds tests to make sure AWB does not touch such constructs. Re (3) I suggest a non-breaking hyphen be used instead (either HTML or Unicode) as in separate logic AWB automatically converts the HTML format of a hyphen to the Unicode one, so your suggestion won't work.  Rjwilmsi  10:21, 10 December 2009 (UTC)
 * OK, thanks, that addresses my objections. I assume, then, that the bot will leave non-breaking hyphen alone? ("‑" or "&#8209;"). Also, the constructs in (2) could also include non-breaking hyphens? Eubulides (talk) 18:09, 10 December 2009 (UTC)
 * Yes to both of those points. Rjwilmsi  08:16, 11 December 2009 (UTC)
 * Yes, it should leave page alone, editing only pages. There are books using numbering such as "6–32" and books using "6-32". (I'm sure, because in both formats I've seen pages using both at least one hyphen and at least one dash in the text, so I could be sure that it was really a dash (or really a hyphen) and not a hyphen in a funny typeface.) --  _ _ _ A. di M. 20:37, 10 December 2009 (UTC)
 * Strong support per Steve Finnell. The use of en dashes is supported by many major style guides; automatic upgrading of typewriter-speak would be very welcome on WP. Tony   (talk)  12:26, 11 December 2009 (UTC)
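
The safeguards worked out in the thread above (leave any space-free substring that already mixes hyphens with en-dashes, and leave non-breaking hyphens, while still converting plain "number-number" ranges in both page and pages) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the name `fix_pages_value`, the whitespace-based tokenising, and the handling of trailing commas are hypothetical and are not taken from AWB.

```python
import re

EN_DASH = "\u2013"
NB_HYPHEN = "\u2011"          # Unicode non-breaking hyphen
NB_HYPHEN_HTML = "&#8209;"    # the same character as an HTML entity

# A token is converted only if it is exactly "digits-hyphen-digits".
SIMPLE_RANGE = re.compile(r"\d+-\d+")


def fix_pages_value(value: str) -> str:
    """Apply en-dashes to simple ranges in a page/pages value, token by token."""

    def fix_token(token: str) -> str:
        # Tokens that already contain an en-dash or a non-breaking hyphen
        # (e.g. "2-15\u20133-17") are deliberate and are left alone.
        if EN_DASH in token or NB_HYPHEN in token or NB_HYPHEN_HTML in token:
            return token
        core = token.rstrip(",;")          # keep list punctuation aside
        if SIMPLE_RANGE.fullmatch(core):
            return core.replace("-", EN_DASH) + token[len(core):]
        return token

    # Split on whitespace but keep the whitespace so the value is rebuilt exactly.
    return "".join(fix_token(t) for t in re.split(r"(\s+)", value))
```

With this sketch a value such as "1, 3, 6-10" has only its last item changed, whereas "2-15–3-17" and values written with a non-breaking hyphen pass through unchanged, which gives editors a way to mark a chapter-style page number that must keep its hyphen.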

I'm fine with how this bot is shaping up. There are a number of involved editors, now. --IP69.226.103.13 (talk) 17:44, 11 December 2009 (UTC)
 * I suggest giving the code to SmackBot, so that each article has the minimal amount of bot-visits. Tim1357 (talk) 05:19, 14 December 2009 (UTC)
 * Since Rjwilmsi has already made 100s of thousands of edits, I'm content to approve as is, rather than make him transfer his knowledge to Rich. I'll approve in the next day or so probably.  MBisanz  talk 21:07, 14 December 2009 (UTC)
 * Nature of the beast: Rjwilmsi's work on AWB has added much value to SmackBot's edits, and conversely many of SB's key tasks have become AWB general fixes. In fact I had to re-run some stuff as it was "no find and replace changes": the GFs had picked up a load of the errors I had written regexes for. I am already seeing the improved page-range handling.  I look forward to RjwilmsiBot joining the MoS cromulence program. 79.79.127.255 (talk)
 * Approved.  MBisanz  talk 20:13, 15 December 2009 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.