User:Monkbot/task 11: CS1 multiple authors/editors fixes

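Task 11 trolls through Category:CS1 maint: Multiple names: authors list and Category:CS1 maint: Multiple names: editors list to replace singular author and editor parameters that hold multiple names with a parameter for each name or, when appropriate, with the Vancouver system parameters.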

description
Module:Citation/CS1 adds pages to Category:CS1 maint: Multiple names: authors list and Category:CS1 maint: Multiple names: editors list when an author or editor parameter value has more than one separator character. Separator characters are commas and semicolons. The test isn't perfect and may 'catch' generational suffixes, html entities, etc. These false positives are relatively rare.
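
The separator test described above can be sketched in a few lines. This is only an illustration of the stated rule (more than one comma or semicolon), not the code used by Module:Citation/CS1; the function name `has_multiple_names` is hypothetical.

```python
def has_multiple_names(value: str) -> bool:
    """Return True when a parameter value holds more than one separator.

    Separators are commas and semicolons. Hypothetical sketch of the
    rule described above, not the module's actual implementation.
    """
    separators = sum(value.count(ch) for ch in ",;")
    return separators > 1


# A single 'last, first' name has one separator and is not flagged.
print(has_multiple_names("Smith, John"))
# Two names separated by a semicolon have three separators in total.
print(has_multiple_names("Smith, John; Doe, Jane"))
# False positives: a generational suffix and HTML entities.
print(has_multiple_names("King, Martin Luther, Jr."))
print(has_multiple_names("M&eacute;ndez, Jos&eacute;"))
```

The last two calls show why the test is imperfect: a suffix adds a second comma, and each HTML entity ends with a semicolon.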

Multiple names in a singular parameter cause Module:Citation/CS1 to produce malformed metadata. The solution is not simply to convert |author= to |authors=, because |authors= does not contribute to the citation's metadata: there are too many possible ways to write author name lists for the module to attempt to parse the name-list into meaningful metadata.

The same diversity of author/editor name-list formats constrains what task 11 can accomplish. Task 11 seeks out a few commonly used name-list formats and attempts to rewrite them using more appropriate parameters.

supported name-list formats
There are a few name-list formats that editors commonly use. In no particular order, these are:
 * semicolon separated name-lists
 * These name-lists take the form: name; name; name;.... The semicolon separator makes it relatively easy to create |authorn=name |authorn=name parameters from the original source.


 * comma separated name-lists
 * There are two forms of this type:
 * the form first last, first last, .... The comma separators make it relatively easy to create |authorn=first last |authorn=first last parameters from the original source.
 * (disabled) the form last, first, last, first, .... As long as there is an odd number of comma separators (an even number of name parts), creating |lastn=last |firstn=first pairs from the original source is mostly straightforward. This form is not supported by the bot because it is too susceptible to misinterpretation.
 * Both of these formats are susceptible to editor inconsistencies, primarily switching from one name format to another within the same source parameter, for example: last, first, first last, first last, .... Task 11 attempts to skip mixed-format parameters.


 * Vancouver style
 * Because the Vancouver style imposes a consistent format (last I, last I, last I,...), it is relatively easy to create |vauthors=last I, last I, last I,... from the original source.


 * name and name
 * This form is very common. It is not detected by Module:Citation/CS1 but is equally inappropriate, so task 11 fixes it when possible.
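
The conversions listed above might be sketched as follows. This is a minimal illustration, not the bot's own code: the function names, the `VANCOUVER_NAME` pattern, and the choice of enumerated `author1`, `author2`, ... keys as output are all assumptions made for the example.

```python
import re
from typing import Optional

# Assumed shape of a Vancouver-style name: one or more surname words
# followed by one or two uppercase initials, e.g. "Smith JA".
VANCOUVER_NAME = re.compile(r"^[^\W\d_]+(?:[ -][^\W\d_]+)* [A-Z]{1,2}$")


def split_name_list(value: str, separator: str = ";") -> dict[str, str]:
    """Turn 'name; name; name' into enumerated author parameters."""
    names = [n.strip() for n in value.split(separator) if n.strip()]
    return {f"author{i}": name for i, name in enumerate(names, start=1)}


def split_and_list(value: str) -> dict[str, str]:
    """Turn 'name and name' into enumerated author parameters."""
    names = [n.strip() for n in re.split(r"\s+and\s+", value) if n.strip()]
    return {f"author{i}": name for i, name in enumerate(names, start=1)}


def to_vauthors(value: str) -> Optional[dict[str, str]]:
    """Return a vauthors parameter if every name looks Vancouver-style.

    Returns None for anything else, so mixed-format lists are skipped.
    """
    names = [n.strip() for n in value.split(",") if n.strip()]
    if len(names) > 1 and all(VANCOUVER_NAME.match(n) for n in names):
        return {"vauthors": ", ".join(names)}
    return None


print(split_name_list("Smith, John; Doe, Jane"))
print(split_name_list("John Smith, Jane Doe", separator=","))
print(split_and_list("John Smith and Jane Doe"))
print(to_vauthors("Smith JA, Doe J, Roe RT"))
print(to_vauthors("John Smith, Jane Doe"))  # not Vancouver style, so None
```

Returning `None` rather than a best guess keeps the sketch conservative: a list that does not fit the expected format is left alone.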

avoiding errors
Task 11 takes some steps to reduce improper edits but can't avoid them entirely:
 * editors sometimes include affiliations in author parameters, and these can be interpreted as author names (GIGO: garbage in, garbage out)
 * the word 'and' inside an organization name can be treated as a separator, so that National Aeronautics and Space Administration becomes National Aeronautics Space Administration (this particular error is avoided; see below)

Task 11 avoids:
 * author and editor parameter values that contain digits
 * names with no spaces or with more than three spaces in comma-separated lists ('Bono, Leonard Bernstein' is avoided because it looks like the name is 'Leonard Bernstein Bono')
 * templates that have enumerated author and editor parameters
 * templates that have certain words in the author parameter (journal, national, university, etc.), which may be part of a longer name that contains the important word 'and'
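
The avoidance rules above could be expressed as a pair of checks like the ones below. This is a sketch under stated assumptions: the function names are hypothetical, `STOP_WORDS` holds only the three words named above (the full list is not given here), and the `ENUMERATED` pattern covers only a few enumerated parameter names.

```python
import re

# Only the words named in the text; the real list is longer ("etc.").
STOP_WORDS = ("journal", "national", "university")

# Simplified pattern for enumerated name parameters such as author2 or last1.
ENUMERATED = re.compile(
    r"^(?:author|editor|last|first|editor-last|editor-first)\d+$"
)


def should_skip(params: dict[str, str], name: str) -> bool:
    """Return True when the template should be left alone."""
    value = params.get(name, "")
    if any(ch.isdigit() for ch in value):
        return True  # digits in the author or editor value
    if any(ENUMERATED.match(key) for key in params):
        return True  # template already has enumerated parameters
    if any(word in value.lower() for word in STOP_WORDS):
        return True  # probably an organization name containing 'and'
    return False


def comma_names_look_safe(value: str) -> bool:
    """Reject comma-separated names with zero or more than three spaces."""
    for name in value.split(","):
        spaces = name.strip().count(" ")
        if spaces == 0 or spaces > 3:
            return False
    return True


print(should_skip({"author": "National Aeronautics and Space Administration"}, "author"))
print(should_skip({"author": "John Smith; Jane Doe"}, "author"))
print(comma_names_look_safe("Bono, Leonard Bernstein"))  # 'Bono' has no spaces
print(comma_names_look_safe("John Smith, Jane Doe"))
```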

errors that are not avoided
The conversion process for Vancouver style name-lists does not ensure that the converted name-lists conform completely to the Vancouver style. That is not the purpose of task 11. When the result of an author → vauthors conversion is malformed, Module:Citation/CS1 will add the article to an error category, from which the errors can be corrected.

ancillary tasks
Task 11 also does the housekeeping listed below. If these are the only changes to be made to an article, the edit is abandoned.
 * removes empty author, authors, last, first, author-link, author-mask in their singular and enumerated forms
 * removes empty editor, editors, editor-last, editor-first, editor-link, editor-mask in their singular and enumerated forms
 * removes empty display-authors and display-editors because these parameters are related to the author and editor parameters
 * removes empty others because this parameter is vaguely related to the author parameters
 * removes empty coauthor and coauthors because these are deprecated
 * removes extraneous editor annotation from editor parameter values (redundant to the static text supplied by the templates)
 * removes some pre- and post-nominals from author and editor names (Dr and PH.D., for example)
 * replaces some HTML entities in author names with their Unicode equivalents, because HTML entities end with a semicolon, which can cause Module:Citation/CS1 to add the article to the multiple-names maintenance category
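
Two of the housekeeping steps above, removing empty name parameters and replacing HTML entities, can be sketched as follows. The function names and the parameter pattern are assumptions for illustration. Python's `html.unescape` replaces every entity it recognizes, whereas the task is described as replacing only some.

```python
import html
import re

# Name-related parameters removed when empty, in singular and enumerated
# forms (author, author1, last2, editor-mask3, ...). Assumed pattern.
ENUMERABLE = re.compile(
    r"^(?:author|last|first|author-link|author-mask|"
    r"editor|editor-last|editor-first|editor-link|editor-mask)\d*$"
)
# Parameters removed when empty that have no enumerated form.
FIXED = {
    "authors", "editors", "display-authors", "display-editors",
    "others", "coauthor", "coauthors",
}


def remove_empty_name_params(params: dict[str, str]) -> dict[str, str]:
    """Drop empty name-related parameters; keep everything else."""
    def removable(key: str) -> bool:
        return key in FIXED or bool(ENUMERABLE.match(key))

    return {
        key: value
        for key, value in params.items()
        if value.strip() or not removable(key)
    }


def replace_entities(value: str) -> str:
    """Replace HTML entities with their Unicode equivalents."""
    return html.unescape(value)


template = {"author": "", "last2": " ", "first1": "Jane", "title": "", "coauthors": ""}
print(remove_empty_name_params(template))
print(replace_entities("M&eacute;ndez, Jos&eacute;"))
```

In this sketch an empty `title` survives because only the name-related parameters listed above are treated as removable.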