Wikipedia:Semi-bots

Wikipedians are encouraged to be bold in their edits, or even to ignore all rules. For serialized or "semi-bot" edits, however, extra precaution is advised, and a broad consensus for the type of edits you want to serialize is desirable.

This guideline describes what is understood by serialized edits in this sense, explains which types of possibly contentious edits are better not serialized, and gives some general recommendations for operators and developers of semi-bot auxiliary software (note, however, that many types of serialized behaviour can be performed without the help of software features).

Definitions
A Wikipedia semi-bot is defined as a non-botflagged account that performs repetitive tasks on different articles.

The following are not considered "repetitive tasks on different articles" in the context of this definition:
 * Moving a single page together with its talk page (this creates four edits in one go);
 * Cleanup after a page move or the creation of a disambiguation page (e.g. double redirect or disambiguation repair);
 * Stub sorting;
 * Updating categorization and inter-wiki links;
 * Reverting vandalism or removing obvious errors.

Apart from that, "repetitive" is defined very broadly in this context: two consecutive edits that only insert redundant whitespace can be considered a pattern of repetitive editorial behaviour in the sense of this guideline.

Examples of auxiliary software
Semi-bot tools include (see also Types of bots and Tools/Editing tools):
 * Anti-vandalism plug-ins and other software (see also Counter-Vandalism Unit):
    * VandalProof: AmiDaniel's VandalProof
    * Vandal Fighter: CryptoDerk's "original" (historical); Henna's continuation
    * Anti-vandal tool
 * AutoWikiBrowser
 * Popups, wikEd and other JavaScript tools (see also Scripts and WikiProject User scripts)
 * Conversion of citations/footnotes:
    * Footnote3/numlink2note.pl and Footnote3/order-footnote.pl
    * Cyde Weys' Ref converter
    * Citation Tool (automated/directed fixes to Cite.php errors)
 * Certain uses of the Pywiki bot, such as disambiguating

General principle
Don't serialize edits that have a reasonable chance of being perceived as contentious.

In other words, being bold in updating pages and ignoring all rules are great principles that make Wikipedia what it is. But don't push your luck. If you wish to change something in a single article and you guess that a significant number of Wikipedians would find the change objectionable, or simply redundant in the sense of taking server time without inherent benefit, you can still do it: that is covered by the boldness principle, even if a fellow Wikipedian has the boldness to revert it instantaneously. Serializing the same edit over several articles, however, circumvents discussion and is not covered by the boldness principle.

When in doubt whether it would be a good idea to serialize a particular type of edit, you may want to consult your fellow-wikipedians before proceeding with the serialized edit: for example the proposal can be raised on an appropriate village pump page, Request for Comment, etc.

Rationale

 * Performed tasks need a broad consensus: other Wikipedians might have a different view on page layout issues, on which templates are used on which pages, etc. It's no fun to compete with a semi-bot in such a case.
 * Tasks that need individual interpretation or assessment of context and content of a page may easily lead to errors when performed on a repetitive scale, so if performed repetitively, such tasks can only be performed by accounts that have a proper bot task description.

Examples
This section lists some tasks and whether they should be performed by semi-bots. This is not a comprehensive list limiting the "general principle" above; rather this list should be treated as a set of representative examples.

Policies and Guidelines
Since official policies are covered by a broad consensus, bringing articles in line with these policies in a serialized manner would generally not be perceived as "contentious". This usually can be performed by a semi-bot (but also see next point).

Guidelines are also covered by consensus, but they are more likely to have exceptions. As such, more care needs to be taken when applying guidelines in a serialized manner.


 * When a guideline or policy implies that the appropriateness of certain edits is dependent on context or interpretation, it is wise to consider that different people might choose to make different edits. For example, an ArbCom case decided against serialized reformatting of Harvard references to footnotes. The arbitrators justified this on the grounds that, whatever encouragements were inscribed in guidelines, there was no uniform format imposed by Wikipedia policies and guidelines.

Style Guidelines
Style guidelines (the Manual of Style and other pages in Category:Wikipedia style guidelines) often contain recommendations that are not very suitable to be serialized indiscriminately.

For example, Lead section currently recommends that articles with fewer than 15,000 characters have a lead section of fewer than three paragraphs. Systematically lumping together the lead section paragraphs of such articles would probably not be a good idea: depending on context it might, for example, be better to create subsection headers or otherwise reorganise the content of such articles.

Another example is linking/delinking words. The guideline "Only make links that are relevant to the context" mentions that context (which is always subject to interpretation) is important when deciding whether or not to apply a wiki-link. Because of this, serializing linking or delinking is often contentious.

Adding/removing whitespace
"Whitespace" consists of spaces, empty lines, etc.

Examples:
 * Changing section headers from ==Header== to == Header == (or vice versa). The effect is not visible, and there is no consensus as to which is preferred. Performing such edits by semi-bot also creates edit history entries for no visible change.
 * Adding spaces around pipes: the visual effect is generally nil, and some editors prefer it one way while others prefer it another. In specific cases, however, the visual result may inadvertently be affected, e.g. changing (essay) to ( essay) leads to ( essay).
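
A tool could guard against serializing such edits by checking whether a proposed change touches anything other than whitespace. The following is a minimal sketch under that assumption, not part of any existing semi-bot tool; the function name is_whitespace_only_change and the sample strings are hypothetical.

```python
def is_whitespace_only_change(old_text: str, new_text: str) -> bool:
    """Return True if the two texts differ, but only in whitespace."""
    if old_text == new_text:
        return False  # no change at all, so nothing to flag
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops every space, tab and newline.
    return "".join(old_text.split()) == "".join(new_text.split())


# Illustrative usage: a tool might refuse to save such an edit in a series.
if is_whitespace_only_change("some text", "some  text\n"):
    print("skipping: whitespace-only change")
```

A check like this only detects the pattern; whether to skip the edit or ask the operator is a design choice left to the tool.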

Fixing redirects that aren't broken
See Redirect. The consensus not to do this is narrow, and could perhaps change over time into permission to do this kind of linkfix. Even in that case, the consensus is not expected to be very broad, so until a broad consensus emerges, serializing this kind of edit is not recommended.

Re-ordering category sequences alphabetically
Wikipedians have spent quite some time trying to find the "ideal" order of categories listed at the bottom of an article. No agreement has resulted so far. See Wikipedia talk:Categorization of people for a summary of the discussions.

Page moves
In July 2006, a Wikipedian undertook to move several hundred articles on people to a variant of their name including a full middle name, despite the fact that the applicable guideline (Naming conventions (people)) advises against this unless the person is best known by the name variant including a full middle name. This contentious serial behaviour was stopped, and the moves were reverted.

Recommendations for semi-bot operators
If you're not sure whether a particular serialized edit would be perceived as contentious, one way to find that out is to request approval for that type of edit as a bot job at Bots/Requests for approvals.

Edit summaries
Edit summaries of changes performed by semi-bots should be clear about any auxiliary software used, as well as about the performed changes.
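
As an illustration, a tool could compose its edit summaries so that both the software and the changes are always named. This is a minimal sketch; the function build_edit_summary and the summary layout are assumptions, not a format prescribed by this guideline.

```python
def build_edit_summary(tool: str, changes: list[str]) -> str:
    """Compose an edit summary naming the changes made and the tool used."""
    if not tool.strip():
        raise ValueError("name the auxiliary software used")
    if not changes:
        raise ValueError("describe at least one performed change")
    # Hypothetical layout: "<change>; <change> (using <tool>)"
    return f"{'; '.join(changes)} (using {tool})"


print(build_edit_summary("AWB", ["disambiguation repair"]))
```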

Responsiveness
Operators of semi-bots (whether using a separate account or not) should be reachable and responsive on an en:wiki talk page. Removing remarks about the semi-bot's behaviour before ascertaining that the poster considers the remark properly handled is considered the same as "not being responsive". Once a remark has been posted on the appropriate user talk page, a response is expected within minutes of the next edit by the semi-bot. Not being responsive may lead to the blocking of the semi-bot account, in which case the blocking admin should leave a note on the contact page for the semi-bot. Alternatively, and this is preferred where possible, the semi-bot software is temporarily disabled for that account, likewise with an appropriate note to the semi-bot operator. For example, AWB has such functionality at AutoWikiBrowser/CheckPage.
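
One way a tool could support this is to check for pending talk page messages before every edit and stop the series as soon as one is found. The sketch below assumes the tool can supply two callables, apply_edit and has_new_messages; both names and the function run_serialized are hypothetical and not taken from any existing tool.

```python
from typing import Callable, Iterable


def run_serialized(
    pages: Iterable[str],
    apply_edit: Callable[[str], None],
    has_new_messages: Callable[[], bool],
) -> list[str]:
    """Edit pages one by one, stopping once a talk page message is pending.

    Returns the pages that were actually edited.
    """
    edited = []
    for page in pages:
        # Check before every edit, so that a remark is never left
        # unanswered while further serialized edits go through.
        if has_new_messages():
            break
        apply_edit(page)
        edited.append(page)
    return edited
```

The operator then answers the remark and decides whether to resume; the sketch deliberately has no automatic restart.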

Operators of semi-bots should be prepared to undo edits diverging from the principles of this guideline. These reversions should not destroy intermediate changes by other editors. Improving the semi-bot's settings without properly undoing previous problematic edits does not suffice.
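
When undoing, a simple precaution is to check whether the problematic edit is still the latest revision of the page: only then can the page be restored wholesale without touching anyone else's work. The sketch below assumes the page history is available as a list of (revision id, username) pairs, newest first; that data shape and the function name are hypothetical.

```python
def is_latest_revision(history: list[tuple[int, str]], rev_id: int) -> bool:
    """Return True if rev_id is the newest revision in the history.

    history holds (revision id, username) pairs, newest first.
    """
    return bool(history) and history[0][0] == rev_id


# Illustrative history: another editor changed the page after revision 2.
history = [(3, "OtherEditor"), (2, "SemiBotOperator"), (1, "Author")]
if not is_latest_revision(history, 2):
    print("later edits exist: undo only the semi-bot's change by hand")
```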

Separate account?
Unlike bot operators, semi-bot operators are neither required nor even specifically encouraged to use a separate user account for semi-bot operations. Nor does the Sock puppet guideline discourage the use of a second account for serialized operations.

So, the decision is up to the semi-bot operator. This section lists pros and cons of both approaches, in order to help semi-bot operators make their choice:
 * A separate account might be perceived as some kind of bot, with its disadvantages, e.g. bot accounts are blocked sooner than bot operator accounts if a bot starts to behave strangely. If a semi-bot and its operator share the same account, this would usually make sysops hold back from pre-emptive blocking (which is not a big issue when applied to a separate account).
 * If, in the long run, a semi-bot operator might want to request approval for bot status, it might be a good idea to start with a separate account, so that the behaviour of the account performing serialized operations is already known by the time of the bot account application (which would usually speed up the process of receiving approval as a bot for that account).
 * Reachability: think for yourself what would be the easiest way for wikipedians to contact you with regard to issues on the semi-bot operations: if separate accounts would make it take longer before you notice a message has been sent to you (e.g. while you're logged in with the bot account, and you don't receive notification of messages left on your operator account's talk page), this might be a contra-indication for having separate accounts.
 * Risk of being perceived as sockpuppeteering, e.g. forgetting to change login when expressing your opinion in a straw-poll may be ill-received.
 * Cleaner separation of tasks: glitches performed by a semi-bot that are properly handled don't reflect back as much on the operator: if the two accounts are not separated the "glitch" is more identified with the operator.
 * Starting as a semi-bot under your own account is certainly the solution with less red tape.
 * Sometimes the auxiliary software (if any is used) gives an indication: most of the typical semi-bot software (e.g. anti-vandalism tools) is designed to be used on a single account. Typical bot software (e.g. py framework) is rather designed for a separate account (even when used as semi-bot).

If a semi-bot operator decides to use a separate account, it is recommended not to use "bot" in the semi-bot's account name (in order not to confuse it with bots that have a listed and agreed upon job description). Instead it is recommended to use an account name in the vein of: "(derivative of) user's account name + 'Task'", where "Task" can be either just the word task, or a word that indicates a task. For example, linkcheck could be a "Task" name for a semi-bot that checks whether external links are still live.

Recommendations for auxiliary software developers
Development of auxiliary software is usually not as strongly supervised as the development of the features and implementation of the MediaWiki software itself. Some suggestions (compare Scripts, which is specific to scripts):
 * It's always possible to discuss ideas at Village pump (for example in the "proposals" or "technical" section) or Meta-Wiki;
 * Be careful not to program anything that might easily lead users to perform actions that are unsupported by Wikipedia's guidelines and policies. If that can't be avoided, at least give proper warnings in the software and/or be clear about such risks in the manual. In other words, try to make the tool as fool-proof as possible, taking account of e.g. the KISS principle and Murphy's law.
 * Preferably, the software uses a copyleft licence, such as the GPL, in line with the open nature of Wikipedia. Note that scripts uploaded to Wikipedia (for instance on one's monobook.js user namespace page) are automatically licensed under the GFDL, although you may extend further rights as you wish.