Wikipedia:Bots/Requests for approval/CheMoBot 2


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

CheMoBot 2

Operator: Beetstra

Automatic or Manually assisted: Automatic

Programming language(s): Perl / PerlWikipedia

Source code available: Yes

Function overview: Keeping an eye on verifiable numerical data in infoboxes (e.g. 'boilingPtC = 100' for water (molecule) in the chembox, or 'birth_date = May 29, 1917' for John F. Kennedy in the Infobox President).

Edit period(s): Continuous

Estimated number of pages affected: 10-20 mainspace pages / hour (with the three infoboxes currently followed; the set of followed pages is about 9,000)

Exclusion compliant (Y/N): No(t yet?)

Already has a bot flag (Y/N): Y

Function details:

As explained in Bots/Requests_for_approval/CheMoBot, CheMoBot follows verifiable data in infoboxes. E.g. we all know that the boiling point of water is 100 degrees Centigrade, so there is hardly any reason to change the value 'boilingpoint = 100' in the infobox on the wikipage for water to anything else (except perhaps during a rewrite of the infobox, which would have a larger effect, not only for water). If that value is changed, then that should be noted. For water this is clear: practically every editor will recognise that a change of that value needs a good and proper reason. For e.g. the melting point of Sodium bisulfate, however, one would have to dig into the literature to see whether a change from 35.2 to 58.5 corrects the value or not.

CheMoBot works with an index of revids of pages where certain values are correct. E.g. the CAS registry number ('CASNo' field in the chembox) of Acetone was checked by Physchim62 in revid 266420980 (see this revid) and found correct. This revid was recorded in the index for the chembox (stored in WikiProject Chemicals/Index). If the CASNo is now different, CheMoBot can detect that by comparing the value in the verified revid with the value in the current revid.
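The comparison described above can be sketched roughly as follows. This is only an illustration in Python, not CheMoBot's actual Perl code: the function names `infobox_field` and `changed_fields` are hypothetical, and the sketch assumes the simplest infobox layout, with one `| field = value` pair per line.

```python
import re

def infobox_field(wikitext, field):
    """Return the value of `| field = value` in the wikitext, or None if absent.

    Assumes one parameter per line (a simplification of real infobox markup).
    """
    pattern = r"^[ \t]*\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, wikitext, flags=re.MULTILINE)
    return match.group(1) if match else None

def changed_fields(verified_text, current_text, fields):
    """List the fields whose value differs between the verified and current text."""
    return [
        name for name in fields
        if infobox_field(verified_text, name) != infobox_field(current_text, name)
    ]

# Hypothetical wikitext of the verified revision and of the current revision.
verified = "{{chembox\n| CASNo = 122-12-121\n| boilingPtC = 100\n}}"
current = "{{chembox\n| CASNo = 122-12-1213\n| boilingPtC = 100\n}}"

print(changed_fields(verified, current, ["CASNo", "boilingPtC"]))  # ['CASNo']
```

In a real bot the two texts would be fetched by revid from the wiki; here they are inline strings so the comparison itself can be followed.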

Every time a page transcluding one of the followed boxes is edited, CheMoBot checks whether verified values have changed, and the changes are documented in its logs. However, the logs are difficult to read, and one would have to go through all of them to check whether data has changed. The same could be accomplished with proper categorisation of boxes in which these verified values have changed.

'''The proposal here is that CheMoBot adds one or more parameters to the infoboxes in mainspace whose verified or watched values have been changed to values that differ from those in the indexed, verified version. These parameters would enable the template code to categorise these boxes, noting that (critical) values in the box were changed. When the values are correct again, the parameters are removed.'''

The bot is not going to correct the values, it only tags the pages by putting another parameter in the infobox!

I mention 'verified' and 'watched' fields. The verification project under WikiProject Chemicals is currently only verifying the 'CASNo', and the index is set up for that field only (at the moment). However, the infoboxes also contain other numerical fields whose changes interest us, but whose values may not be correct in the indexed version. In short: CheMoBot follows both the 'watched' and 'verified' parameters in the box (as changes to them are of interest), but treats the 'verified' fields specially (as they are correct in the indexed version).

Regarding being 'Exclusion compliant': I don't think that is necessary here. The bot does not strictly affect what is displayed; it only helps in categorising boxes containing verified data. Although I understand that the settings can be used to 'enforce' certain data in boxes, I would say that the problem would then lie with either wrong settings or an index of verified versions that needs adapting, not with deliberately wrong data in the infobox.

For the 'variables' in the settings, see User:CheMoBot/Settings; when I mention a setting here, I will name the variable (e.g., 'boxes' is the variable set in the settings by the line 'boxes=Chembox|Drugbox|Reactionbox|Chembox_new').
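As a rough illustration of how such a settings line could be read, here is a minimal Python sketch. It assumes every setting sits on its own `name=value` line and that lists are separated by `|`, as in the 'boxes' line quoted above; the rest of the settings page format is not described here, and the function names are hypothetical.

```python
def parse_settings(text):
    """Parse `name=value` lines into a dict, skipping lines without '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings

def followed_boxes(settings):
    """Split the 'boxes' setting into a list of infobox names."""
    return [box.strip() for box in settings.get("boxes", "").split("|") if box.strip()]

# Two lines quoted on this page, used here as sample input.
sample = "boxes=Chembox|Drugbox|Reactionbox|Chembox_new\nchembox_addverifiedrevid = 1"
settings = parse_settings(sample)

print(followed_boxes(settings))  # ['Chembox', 'Drugbox', 'Reactionbox', 'Chembox_new']
print(settings["chembox_addverifiedrevid"])  # 1
```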
 * Settings

Pages transcluding watched infoboxes (chembox, drugbox and reactionbox) can either be 'verified' or not 'verified' (i.e., no revid in the index). Verified here means that someone has checked the correctness of the verified fields in the infobox on a page and, if they are verifiably correct, has recorded the revid of the page in the index (for chembox that is in WikiProject Chemicals/Index).

From here on I will talk about a page with a verified chembox. For the chembox, the verified fields and the watched fields are each defined by a setting (variables for the chembox all have the prefix 'chembox_'); other settings are either for other boxes or system-wide. The system works the same for every followed box, each infobox having its own set of settings.

This functionality does not do anything with a not-verified box, i.e. a box on a page which does not have a revid with verified values in the index!
 * So what happens when an editor changes a value in a verified, followed infobox

Functionality is the same for the verified and for the watched fields; the actual infobox code should be able to handle the two differently.
 * 1) The editor is changing a verified or watched field (normal)


 * An editor changes the (correct) 'CASNo' into an incorrect one (my test edit here changes the 'correct' value (122-12-121) to 122-12-1213). The bot loads both the current revid and the verified revid, and sees that the CASNo has changed. The bot reports this immediately. CheMoBot retains the pagename, as there is a changed value, and waits for the configured delay (currently set to 5 minutes) to allow further changes, or to let other editors or antivandalism bots revert. After the delay, the bot loads the current revid and the verified revid again, checks whether the field is still changed, and if so, adds a parameter to the body of the box. The parameter is defined in the settings; here "Verifiedfields=changed" is added, see diff (edit at 7:06, field added at 7:12; the change of the other field will be explained below).
 * The subsequent edit to reset the CAS to the original value (diff) results in the bot edit (diff) to remove the "Verifiedfields=changed" from the box.
 * Similarly, for change to a watched field: diff and diff. Changing them back (diff) results in diff
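The tagging and untagging steps in the list above can be sketched as follows. This is a hedged Python illustration, not the bot's own Perl code: `set_flag`, `remove_flag`, `sync_flag` and `effective_delay` are hypothetical names, the box is assumed to hold one parameter per line and to end with `}}`, and the 300-second floor reflects the mainspace minimum that is only being considered in the discussion on this page.

```python
import re

def _flag_pattern(name):
    # Matches a whole `| name = ...` line (one parameter per line is assumed).
    return r"^[ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=.*$"

def set_flag(box_text, name, value):
    """Add `| name = value` before the closing braces, or update it if present."""
    line = f"| {name} = {value}"
    if re.search(_flag_pattern(name), box_text, flags=re.MULTILINE):
        return re.sub(_flag_pattern(name), lambda m: line, box_text,
                      count=1, flags=re.MULTILINE)
    end = box_text.rfind("}}")
    if end == -1:  # no closing braces found: append as a fallback
        return box_text + "\n" + line
    head = box_text[:end]
    if head and not head.endswith("\n"):
        head += "\n"
    return head + line + "\n" + box_text[end:]

def remove_flag(box_text, name):
    """Remove the `| name = ...` line, including its line break."""
    return re.sub(_flag_pattern(name) + r"\n?", "", box_text,
                  count=1, flags=re.MULTILINE)

def sync_flag(box_text, still_changed, name="Verifiedfields"):
    """Tag the box while a followed value differs, untag it once it matches again."""
    if still_changed:
        return set_flag(box_text, name, "changed")
    return remove_flag(box_text, name)

def effective_delay(setting_seconds, minimum=300):
    """Wait at least `minimum` seconds before re-checking (assumed floor)."""
    return max(setting_seconds, minimum)

box = "{{chembox\n| CASNo = 122-12-1213\n}}"
tagged = sync_flag(box, still_changed=True)
print(tagged)
print(sync_flag(tagged, still_changed=False) == box)  # True
```

Keeping the tag update a pure function of the box text and the comparison result means the same call can be repeated after the delay without producing duplicate parameters.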

The added fields (here 'Verifiedfields=changed' and 'Watchedfields=changed') can be used to trigger categorisation, box colouration, etc. for easy recognition. For the chembox, the effect of changing the value of a verified field can be seen at the bottom of the box (where the box disclaimer is also displayed).


 * 2) The editor is changing a verified or watched field (special, optional)
 * In the chembox, the (verified!) field 'CASNo' is accompanied by a field 'CASNo_Ref'. The field CASNo_Ref has an effect on the display of the CASNo, namely:
 * CASNo_Ref absent: CASNo is displayed with black brackets,
 * CASNo_Ref contains '' : CASNo is displayed with green brackets (cascite refers to the authority/organisation that issues the number; for this number it is the only valid 'reference'),
 * CASNo_Ref contains 'changed': CASNo is displayed in red.
 * A setting defines what happens when the parameter CASNo is changed. It contains 3 parameters: first the dependent parameter in the chembox, second the value of that parameter when the CASNo is correct (i.e., the same as in the verified version), and third the value of that parameter when the CASNo is wrong. The parameter will be added (if not present) immediately before the parameter it depends on (directly before CASNo in this example).


 * Changing the CASNo from the one stored in the verified revid ('122-12-121') to the incorrect '122-12-1213' hence results in a) addition of 'Verifiedfields=changed' as explained above, and b) a change of the value of the parameter 'CASNo_Ref' from '' to 'changed': edit results in diff. The brackets are now displayed in red instead of green when viewing the page.
 * Changing the value back (diff) results in the bot edit (diff) to remove the "Verifiedfields=changed" (there are no other changed verified fields left in this box) and it sets the parameter 'CASNo_Ref' back to ' '.  The colour of the brackets is now again green (i.e., the CASNo is correct).
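The dependent-parameter rule described above (dependent parameter, value when correct, value when wrong) might be sketched like this in Python. The tuple layout of the rule, the function name `set_dependent`, and the placeholder `CORRECT_MARKER` are all assumptions for illustration; the real value used when the CASNo is correct is not reproduced on this page, and only 'changed' is taken from the text.

```python
import re

def _line_pattern(name):
    # Matches a whole `| name = ...` line (one parameter per line is assumed).
    return re.compile(r"^[ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=.*$",
                      re.MULTILINE)

def set_dependent(box_text, field, dependent, value):
    """Set `dependent` to `value`, inserting it directly before `field` if missing."""
    new_line = f"| {dependent} = {value}"
    dep = _line_pattern(dependent)
    if dep.search(box_text):
        return dep.sub(lambda m: new_line, box_text, count=1)
    main = _line_pattern(field).search(box_text)
    if main is None:  # the field it depends on is not in the box: leave untouched
        return box_text
    return box_text[:main.start()] + new_line + "\n" + box_text[main.start():]

# Hypothetical rule: (dependent parameter, value when correct, value when wrong).
# CORRECT_MARKER is a placeholder, not the value CheMoBot actually writes.
rule = ("CASNo_Ref", "CORRECT_MARKER", "changed")

box = "{{chembox\n| CASNo = 122-12-1213\n}}"
wrong = set_dependent(box, "CASNo", rule[0], rule[2])
print(wrong)

restored = set_dependent(wrong.replace("122-12-1213", "122-12-121"),
                         "CASNo", rule[0], rule[1])
print(restored)
```

The first call inserts the dependent parameter directly before CASNo; the second updates it in place once the value matches the verified one again.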


 * Verified revid
 * If 'boxname'_addverifiedrevid is set in the settings (e.g. "chembox_addverifiedrevid = 1"), CheMoBot will, when it edits the page, also add or update the value of a parameter 'verifiedrevid' in the body of the infobox. This will contain the verified revid as set in the index. This parameter in the box can be used to show a 'changes from the verified version' link.
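To illustrate what such a 'changes from the verified version' link could point to, here is a small Python sketch that builds a MediaWiki diff URL from a page title and a verified revid. The function name `verified_diff_url` is hypothetical, and the `oldid`/`diff=cur` query form is an assumption about how the link might be built; in practice the infobox template would construct the link in wikitext rather than in bot code.

```python
from urllib.parse import urlencode

def verified_diff_url(title, verified_revid,
                      base="https://en.wikipedia.org/w/index.php"):
    """Build a URL comparing the verified revision with the current one."""
    query = urlencode({
        "title": title.replace(" ", "_"),  # MediaWiki titles use underscores
        "oldid": verified_revid,
        "diff": "cur",
    })
    return f"{base}?{query}"

# Acetone and its verified revid are the example given earlier on this page.
print(verified_diff_url("Acetone", 266420980))
# https://en.wikipedia.org/w/index.php?title=Acetone&oldid=266420980&diff=cur
```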


 * some notes
 * If an editor edits anything on the page, CheMoBot will follow up with an edit to add the appropriate fields for the first time (such an edit will look like this, depending on the settings). From that point on, CheMoBot will not edit that page again until one of the watched or verified fields is changed.
 * If there is no index defined for the infobox, or if there is no verified revid set for the page in the index of an infobox, then CheMoBot will not edit the page.

Discussion
For people who want to test and play with this functionality: the bot operates it in my userspace (it is strictly turned off for mainspace; it only operates in specified userspace):
 * Example page with a chembox: User:Beetstra/Propane
 * The verified revid is set in WikiProject Chemicals/Index
 * Settings are in User:CheMoBot/Settings
 * Edits to the chembox pages will be shown as day-by-day subpages of WikiProject Chemicals/Log (e.g. WikiProject Chemicals/Log/2009-08-03 for the third of August, 2009).

Feel free to edit the page User:Beetstra/Propane to see how changes will operate. The handle-delay (i.e., the time between a change to the page and the moment that the bot adapts the parameters) is now set to 15 seconds in userspace (to allow reasonably rapid testing ;) ). In mainspace I am thinking of forcing it to a minimum of 5 minutes, even if the delay setting is shorter. If the setting is higher, that number will be used (I expect that something in the order of 5-10 minutes would be reasonable); the actual delay can be set in the settings page. I would appreciate it if people would try to fool the bot on my userspace page before we apply this to mainspace (though I don't think that it can be done).  --Dirk Beetstra T  C 15:13, 3 August 2009 (UTC)
 * I have to say this is a superbly useful piece of functionality; it would also be useful for watching a lot of other carefully entered and verifiable data, such as co-ordinates, death dates etc. Obviously there will be errors that need correcting, and we know that IP corrections often get slapped down by vandal fighters, especially if they have no edit summary, but that is a red herring: this is a useful tool, not just for Chem articles. Rich Farmbrough, 19:04, 20 August 2009 (UTC).


 * I have just added Infobox Person, Infobox Royalty, and Infobox Officeholder to the watched boxes, and made the bot output to #wikipedia-en-blp for these boxes. There may be more boxes, but those can come later; let's not take too big steps, as the bot still has to be able to munch every edit (there is space for more, though). For these boxes no index is defined yet (and hence there is no verified data). The bot does seem to do well there.  --Dirk Beetstra T  C 19:52, 20 August 2009 (UTC)
 * I think the point to be made is that, in chemistry, we have an almost ideal dataset against which to test the bot. We know that our CAS numbers are correct, because we have obtained them from the primary source, but they can also be verified and challenged by any user (see our Signpost article from last May). At a rough estimate, our error rate in CAS numbers has dropped from 1–2 percent to 1–2 permille (an error rate of zero doesn't exist) thanks to our verification efforts: WP:CHEM would like to keep it that low! The principle can certainly be applied to other items of data which we have gone to some (often considerable) effort to verify – such as birth and death dates and places, or geographical coordinates – but we need to check how things will work in practice first. Physchim62 (talk) 23:03, 29 August 2009 (UTC)

I can't see any reason to oppose approval as requested. The bot would be running as a cleanup bot, ie tagging articles that might be in need of cleanup, but leaving intellectual decisions to human editors. The addition of a single parameter to an infobox, one which would have minimal visibility for users – in fact no visibility at all, given recent changes to the chembox since the request for approval was posted – seems to me to be quite within the established bounds of a cleanup bot. It is less intrusive than, say, adding unreferenced tags (even when such tags are justified). Physchim62 (talk) 22:38, 29 August 2009 (UTC)


 *  MBisanz  talk 02:39, 2 September 2009 (UTC)

Thanks, but ... err ... the testing for this is already working in the specified userspace (specifically ONLY in my userspace, see e.g. User:Beetstra/Propane), so we can actually test if it works, and it actually works .. --Dirk Beetstra T C 11:35, 2 September 2009 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.