User:Muninnbot/doc

This page describes the basic functionality and reasoning behind the Teahouse archival notification function of Muninnbot.

Contact information
I am Tigraan, the bot's maintainer. For bug reports, feature requests, etc., you should contact me directly at User talk:Tigraan. If you have coded a new feature, use GitHub's pull requests, but drop me a note on the user talk page regardless.

The source code for the bot, written in Python 3, is available on GitHub: https://github.com/Tigraan/Teahouse-bot. It is run as a Cron job on Wikimedia Toolforge (maintainer tigraan / role account tools.muninnbot).

Why the bot exists
In a discussion about the Teahouse's archival settings, it emerged that a frequent scenario was that a new user would post on the Teahouse and then not log in again for a couple of days. They would receive a quick answer, but the thread would be archived before they came back, without any obvious trace of what had happened. Some reposted the same question, but most presumably concluded that their post had disappeared, had not been read, or had been removed for unknown reasons.

The discussion focused on the optimal archival settings (too long and the page would become cluttered; too short and newbies would be lost to the "ask and get archived" effect), but I raised the option of having a bot notify users when their thread is archived. Participants generally agreed with the idea. I asked for more input in a later thread, where we started to actually think about the bot design (specification and architecture). offered invaluable help in that phase (and later on).

The general idea for the bot is thus to tell new users when a thread they started has been archived, on the help forum those users are most likely to use. In the future, the bot's activities might be extended to notify new users of archival in other forums, or to allow users to opt in to archival notifications anywhere they like.

General principles of bot design
To be effective, this bot has to post to user talk pages on an opt-out basis. (New users would not know they could opt in, and user talk pages are the most efficient way to contact newcomers.) That gives it a lot of spam potential. Therefore, it was decided very early on that, when it comes to determining notifications, a false positive is much more problematic than a false negative: the bot should only post when it has identified an archived thread and its original poster with near-100% accuracy, and it is acceptable to miss some notifications.

The bot runs daily. It first reads the recent page history of the Teahouse and identifies new thread creations by their edit summary (which by default is /* Name of section */ new section; changing the default edit summary when creating a new section will cause a miss). Then, it pulls the latest archival edit by lowercase sigmabot III and looks at which sections were removed from the page by this edit. Matching archived thread names to the identified thread creations allows the bot to identify original posters with a high degree of accuracy (for instance, it does not matter if the original poster forgot to sign).
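The matching step described above can be sketched as follows. This is a minimal illustration and not the bot's actual code: the function names, the shape of the revision dictionaries (`user` and `comment` keys), and the choice to discard a title that was created by two different users are all assumptions made for the example.

```python
import re

# Default edit summary for a new section: "/* Name of section */ new section".
# Any other summary (e.g. a customised one) does not match and is skipped.
NEW_SECTION_RE = re.compile(r"^/\*\s*(?P<title>.+?)\s*\*/\s*new section\s*$")


def find_thread_creators(revisions):
    """Map section title -> user who created it, from page-history entries.

    `revisions` is a hypothetical list of dicts with "user" and "comment" keys.
    """
    creators = {}
    ambiguous = set()
    for rev in revisions:
        match = NEW_SECTION_RE.match(rev.get("comment", ""))
        if match is None:
            continue  # not a default new-section edit: accept the miss
        title = match.group("title")
        if title in creators and creators[title] != rev["user"]:
            # Assumption for this sketch: same title from two different users
            # is dropped, since a false positive is worse than a false negative.
            ambiguous.add(title)
        creators.setdefault(title, rev["user"])
    return {t: u for t, u in creators.items() if t not in ambiguous}


def match_archived_threads(archived_titles, creators):
    """Keep only archived sections whose creator was identified."""
    return {t: creators[t] for t in archived_titles if t in creators}


history = [
    {"user": "Alice", "comment": "/* How do I cite? */ new section"},
    {"user": "Bob", "comment": "/* How do I cite? */ reply"},
    {"user": "Carol", "comment": "Asking a question"},
    {"user": "Dave", "comment": "/* Draft help */ new section"},
]
creators = find_thread_creators(history)
to_notify = match_archived_threads(["How do I cite?", "Unknown thread"], creators)
```

Because the creator is taken from the page history rather than from the signature in the thread, an unsigned post is still attributed correctly, while a thread created with a custom edit summary is simply not matched.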

After new thread creations have been identified, notifications (notification template used) are sent to users satisfying certain conditions. As of 19:16, 21 April 2018 (UTC), those conditions are:
 * 1) No notification of users who opted out (via )
 * 2) No notification to blocked users (WP:DENY)

I originally proposed not to notify editors with a somewhat longer tenure, measured either by edit count or by user group (e.g. an extended confirmed user is likely to know about archival). Consensus was somewhat against that idea and the current setting does not differentiate by any age metric, though it would be trivial to implement.
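As a rough sketch, the two conditions listed above amount to a simple filter over the candidate users. The `UserStatus` record and both function names are hypothetical; how the opt-out and block status are actually looked up is not covered here and is represented by plain booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStatus:
    """Hypothetical record; the real bot would fill these flags from the wiki."""
    name: str
    opted_out: bool  # condition 1: user asked not to be notified
    blocked: bool    # condition 2: user is currently blocked


def should_notify(user):
    """A user is notified only if neither exclusion condition applies."""
    return not user.opted_out and not user.blocked


def select_recipients(users):
    """Return the names of users who pass both conditions, in input order."""
    return [u.name for u in users if should_notify(u)]


candidates = [
    UserStatus("Alice", opted_out=False, blocked=False),
    UserStatus("Bob", opted_out=True, blocked=False),
    UserStatus("Carol", opted_out=False, blocked=True),
]
recipients = select_recipients(candidates)
```

A tenure-based exclusion, had it been adopted, would be one more boolean test in `should_notify`.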

Race condition with lowercase sigmabot III
The bot notifies according to the latest archival edit in the Teahouse page history. Consequently, it is important that it does not run twice between two archival edits (else notifications would be sent twice) and that it does run between any two consecutive archival edits (else some notifications would be missed).

There is no verification that the latest archival edit was not already used for notification. The Cron job launching the bot is set to trigger at 19:00 UTC every day, which is far enough from lowercase sigmabot III's usual run (at about 05:00 UTC) that delays in the servers will not prevent smooth operation.

Because the bot only parses the last day of page history at the Teahouse, if no archival is performed on a certain day, the bot will not re-use the previous archival edit. However, if the archival frequency changes, the Cron job's settings will have to be adapted.
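The effect of the one-day window can be illustrated with a short sketch. It is written under assumptions: the function name, the revision dictionaries (`user` and timezone-aware `timestamp` keys), and the exact account name passed as `archiver` are chosen for the example and are not taken from the bot's source.

```python
from datetime import datetime, timedelta, timezone


def latest_archival_in_window(revisions, now, archiver, window=timedelta(days=1)):
    """Return the most recent edit by `archiver` within `window` before `now`.

    Returns None when the archiver made no edit in the window, in which case
    no notification would be sent for that run.
    """
    cutoff = now - window
    candidates = [
        rev for rev in revisions
        if rev["user"] == archiver and cutoff <= rev["timestamp"] <= now
    ]
    return max(candidates, key=lambda rev: rev["timestamp"], default=None)


ARCHIVER = "Lowercase sigmabot III"  # assumed account name for this example
run_time = datetime(2018, 4, 21, 19, 0, tzinfo=timezone.utc)

same_day = [
    {"user": ARCHIVER, "timestamp": datetime(2018, 4, 21, 5, 0, tzinfo=timezone.utc)},
    {"user": "Alice", "timestamp": datetime(2018, 4, 21, 9, 0, tzinfo=timezone.utc)},
]
previous_day_only = [
    {"user": ARCHIVER, "timestamp": datetime(2018, 4, 20, 5, 0, tzinfo=timezone.utc)},
]

found = latest_archival_in_window(same_day, run_time, ARCHIVER)
skipped = latest_archival_in_window(previous_day_only, run_time, ARCHIVER)
```

With a 19:00 UTC run, an archival edit made at about 05:00 UTC the same day falls inside the window, while one from the previous day at 05:00 UTC (38 hours earlier) falls outside it and is not re-used.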

Known issues
(none that were not already decided as WONTFIX at the design stage... yet!)