Wikipedia:Bots/Requests for approval/Spelian


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.

Spelian
Operator: WikipedianProlific

Automatic or Manually Assisted: Automatic but supervised.

Programming Language(s): AWB

Function Summary: Replacing common misspellings which cannot be the result of anything other than unintentional error.

Edit period(s) (e.g. Continuous, daily, one time run): Daily run lasting several hours.

Edit rate requested: 4 to 5 per minute.

Already has a bot flag (Y/N): No

Spelian is an AWB-based bot, run automatically but under supervision, intended to trawl Wikipedia pages correcting specific recurring spelling errors. For example, the word prominent is very often misspelt as prominant, and it is highly unlikely that this misspelling is ever intentional. Using AWB, the operator (WikipedianProlific) selects a word and does a Google search for it. Five of the resulting pages are then selected at random as a sample to confirm that the misspelling is not intentional. As a rough guide, some misspellings have around 100 occurrences while others have around 2,000. AWB is then set up and run automatically, allowing Spelian to trawl through the affected articles, correcting them as it goes. This process is much faster than a user manually checking every page before editing it. AWB will not make any automatic changes other than changing the spelling of the one word being run, because on occasion AWB can reformat or alter words, pages and links for the worse. As a precaution, records of all lists run will be kept in the extremely unlikely event that a mass revert is required. To ensure that intentional misspellings aren't picked up, the bot will have the 'list of common misspellings' removed from its list, and word selection will be based on strict criteria, which can be found on the bot's userpage.
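To make the "change only the one word being run" restriction concrete, here is a minimal Python sketch. It is not AWB's code or API; the function name `fix_single_misspelling` is hypothetical. It assumes a replacement should hit only whole-word matches (so a longer word containing the misspelling is left alone) and should preserve the capitalisation of the word it replaces.

```python
import re


def fix_single_misspelling(text: str, wrong: str, right: str) -> tuple[str, int]:
    """Hypothetical helper: replace whole-word occurrences of `wrong` with
    `right`, keeping ALL-CAPS or an initial capital, and return the new text
    together with the number of replacements made."""
    # \b on both sides means "perjoratively" is NOT touched by "perjorative".
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)

    def match_case(m: re.Match) -> str:
        found = m.group(0)
        if found.isupper():
            return right.upper()
        if found[0].isupper():
            return right[0].upper() + right[1:]
        return right

    # subn returns (new_text, number_of_substitutions).
    return pattern.subn(match_case, text)
```

For example, `fix_single_misspelling("A perjorative term. PERJORATIVE! Perjoratively.", "perjorative", "pejorative")` gives `("A pejorative term. PEJORATIVE! Perjoratively.", 2)`. Note that nothing here decides whether a given occurrence *should* be changed, which is the point the discussion below turns on.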

Discussion
My position on automatic spelling correction bots is that they should always be run fully manually, to avoid breaking product names, template calls and links, or even subverting the meaning of the text. Sometimes an incorrect spelling is appropriate (e.g. in articles about bad spelling, or in almost any sort of article where scientific terms or quotes use a bad spelling), hence my feeling that such a bot needs to be *fully manual*. Martinp23 18:34, 21 March 2007 (UTC)
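The kind of damage described above (templates, links, quotations) can be partly guarded against by leaving certain regions untouched. The Python sketch below is a hypothetical illustration, not anything AWB or Spelian is known to do: it treats `{{...}}` templates, `[[...]]` links and double-quoted text as protected and replaces only outside them. Real wikitext nests and varies far more than these simple patterns allow, so a heuristic like this is necessarily incomplete.

```python
import re

# Hypothetical "protected" regions: templates {{...}}, wikilinks [[...]],
# and double-quoted text. Nested templates, <blockquote>, single-quoted
# quotations, etc. are NOT handled by this simple pattern.
PROTECTED = re.compile(r'\{\{.*?\}\}|\[\[.*?\]\]|"[^"]*"', re.DOTALL)


def replace_outside_protected(text: str, wrong: str, right: str) -> str:
    """Replace whole-word `wrong` with `right` only outside protected spans."""
    word = re.compile(r"\b" + re.escape(wrong) + r"\b")
    out = []
    last = 0
    for m in PROTECTED.finditer(text):
        out.append(word.sub(right, text[last:m.start()]))  # editable gap
        out.append(m.group(0))  # protected span copied unchanged
        last = m.end()
    out.append(word.sub(right, text[last:]))  # trailing editable text
    return "".join(out)
```

With `text = 'A perjorative term. He said "a perjorative remark". See [[perjorative]] and {{sic|perjorative}}.'`, only the first occurrence is changed; the quoted, linked and templated ones are left alone.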


 * I see there is a potential for an automatic spelling replacer to make an erroneous alteration to an article, and hence the policy of usually not allowing such bots. However, Spelian specifically targets misspelt words which are extremely unlikely to be the product of anything other than unintentional user error, rather than a blanket of generic spelling mistakes. A good example of a word that Spelian might target is pejorative. It is very often misspelt perjorative due to user misunderstanding of the correct spelling. It is also highly unlikely (as much as one can be sure) that the misspelling is intentional. I appreciate that some users will have concerns about a high-volume bot like this; would you support a trial run of perhaps 3 or 4 words, each of which has no more than, say, 50 to 100 erroneous occurrences? Lists of the changed pages will be kept just in case an automatic mass revert is needed. Thanks. WikipedianProlific (Talk) 18:46, 21 March 2007 (UTC)


 * Note that I am not a BAG member, so would be unable to approve a trial off my own back. As a suggestion, however, would it be possible for you to do a "dry run" of the bot on the word pejorative across 1000 pages, outputting just a list of the pages which it could edit if allowed, without actually correcting the pages? This should provide us with a quick and easy list of pages which could be affected on a run, without any potential collateral damage. On the other hand, do you have a rough list of words which you plan to correct in the near future? Martinp23 22:19, 21 March 2007 (UTC)
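A "dry run" of the kind suggested above only needs to search, never to write. The following Python sketch is hypothetical: it assumes the page texts have already been fetched into a dictionary mapping titles to wikitext (no MediaWiki or AWB calls are shown) and reports which pages contain the misspelling as a whole word, and how many times, without editing anything.

```python
import re


def dry_run(pages: dict[str, str], wrong: str) -> list[tuple[str, int]]:
    """Hypothetical dry run: return (title, hit_count) for every page whose
    text contains `wrong` as a whole word. No page is modified."""
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
    report = []
    for title, text in sorted(pages.items()):
        hits = len(pattern.findall(text))
        if hits:
            report.append((title, hits))
    return report


pages = {
    "Slur": "A perjorative term, often perjorative.",
    "Pejorative": "Pejorative words are...",
    "Quote page": 'He wrote "perjorative" in 1900.',
}
print(dry_run(pages, "perjorative"))
```

This prints `[('Quote page', 1), ('Slur', 2)]`. The quotation page is flagged alongside the genuine error, which is exactly the kind of case a human reviewer of the list would need to catch.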


 * This is likely a bad idea. A manual bot is quite reasonable and probably required. You need to look at the misspelled word in context to determine whether or not the spelling mistake is really a mistake in each specific case. A heuristic like the one you want to use is good, but it won't be perfect, and that's the problem. We don't want to be introducing errors that would be really difficult to catch. -- RM 22:41, 21 March 2007 (UTC)


 * I do indeed have a list, Martinp23; I actually manually ran perjorative --> pejorative the other day. Not a single occurrence was anything other than a misspelling. My list is derived from the suitable words on Lists of common misspellings. I also appreciate your concerns, RM, about the potential risk to articles that are currently correct, and I have two thoughts on that. Firstly, has anyone tried this before? It may not be as bad as we think it could be, because it's not really a spell checker, more a word replacer, out to catch common mistakes. Provided the words are chosen carefully, I think it should be fine. I'm just asking for a trial period to test the theory, and see what we come out with. It might be that we're very happy with the results. Secondly, let's say it does produce one error. It wouldn't be the first bot out there to produce one or two anomalies. But if it corrects, say, 10,000 articles before it makes that mistake, is it justified? I'm not sure, but it is something worth mulling over, I think. Comments appreciated, thanks. WikipedianProlific (Talk) 22:53, 21 March 2007 (UTC)


 * I don't know how much you know about this issue, and considering how often it comes up there should probably be a FAQ written by someone. The problem is that a spelling mistake is sometimes justified, such as in a quotation.  I'm not the expert on this issue, but there are other subtle cases where a replacement would be a bad thing.  The problem with these types of edits is that they are very hard to detect because of the difficulty with reverting from a correctly spelled word to an incorrectly spelled word.  It seems counter-intuitive.  Thus this type of error is very hard to catch, and we do not want this type of error when manual spell checkers (which have broad community support) are preferred over automatic bots (which have little if any community support).  We don't outright ban auto-spellcheck bots because we believe that some day someone may come up with a bot that is intelligent enough to handle all the cases.  If you have not reviewed the previous spellcheck bot requests, I can find you the links if you'd like. -- RM 23:07, 21 March 2007 (UTC)


 * I have looked through several, but I do genuinely think this has some good things to offer. The key is carefully selecting the right words to run. However, I see that there are good points on both sides of the argument. Ideally, a trial run of the bot on a controlled set of no more than, say, 300 pages would be a nice way to test its potential. However, I can see why even that may be more than WP:BAG are willing to give at this time, given the hazards it presents. WikipedianProlific (Talk) 23:17, 21 March 2007 (UTC)


 * I suspect this isn't "an AWB based bot", I suspect it is in fact AWB. Wonderful tool that AWB is, I don't believe it's suitable yet for doing automated spelling corrections. Mets knows the codebase better than I so we should probably wait for his comments, but my initial reaction is to decline this request outright per RM. --kingboyk 23:22, 21 March 2007 (UTC)


 * Before outright denying the request, I want to make sure that we exhaust our potential for creating a spellcheck bot that works. In the process, I found this much more complete bot proposal: User talk:PocklingtonDan/Spelling bot.  There is a lot of information on that page that could be useful. -- RM 23:32, 21 March 2007 (UTC)
 * Wouldn't it be better to have spell checking built into Mediawiki? Or encourage folks to use Firefox, which now includes spell checking? I'm not sure that bots should be second guessing human spelling, nor that a bot is the best solution to every problem.
 * That said, I'm happy to hear more arguments and will read that article :). Cheers. --kingboyk 23:47, 21 March 2007 (UTC)
 * And we "recently" had a bot request for a spell checking bot that was well thought out, but I can't seem to find it. Perhaps someone else knows where it is. -- RM 00:31, 22 March 2007 (UTC)
 * Well, even if not this one, why don't we give one a limited trial run (say, fewer than 300 edits) and see how it performs? We may find that in practice it actually exceeds expectations. My main argument is that this is mostly conjecture: until one is tried, it's hard to be sure exactly what the outcome will be, and people will invariably keep asking this question until one day one is successful. WikipedianProlific (Talk) 00:41, 22 March 2007 (UTC)
 * Is this vanilla AWB or have you written your own code modules or plugin? --kingboyk 00:45, 22 March 2007 (UTC)
 * AWB/T is not yet ready for fully automated spelling fixes. All edits must be manually checked and approved if you are using that list. Mets501 (talk) 00:51, 22 March 2007 (UTC)

OK, I've also just realised who the applicant is :) You have less than 500 edits and I only approved you for AWB usage (within the last couple of days) on the basis that you ease yourself in slowly.

Mets - a leading AWB developer - says the code isn't ready for automated use, and I - a new AWB developer - agree. All things considered, this proposal has to be rejected for now. --kingboyk 01:09, 22 March 2007 (UTC)
 * Well, that, and AWB won't let you have AutoMode enabled with RegexTypoFix =) Reedy Boy 11:43, 22 March 2007 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.