User:RoySmith/DYK-Tools bot proposal

This is a proposal for a new bot to help out at WP:DYK. A big part of the back-end work of DYK is building prep sets. Each set consists of 8 "hooks", which are chosen from those proposed in nominations. The selection of hooks needs to comply with an absurdly large number of rules. These rules include:


 * The hook must have been approved, indicated by a checkmark icon on the nomination template.
 * Once approved, a hook can be unapproved by somebody raising an objection, requiring that it be re-approved
 * If you are the author of a hook or have approved it, you can't promote it to a set yourself
 * The first hook in a set must include an image (which in turn must be approved)
 * Within a set, it is strongly discouraged to run two hooks that are biographies next to each other
 * It is similarly strongly discouraged to run two hooks about American topics next to each other
 * The total number of biography and/or American topic hooks in a set is capped
 * Between sets, it is discouraged to have the lead hooks be of similar types
 * Certain hooks are tagged to be run on particular dates
 * And so on
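Several of the within-set rules are mechanical enough to check in code. The sketch below is hypothetical: the `Hook` fields and `check_set` are illustrative names, not the POC's code, and it assumes each hook has already been classified as a biography and/or an American topic. It also assumes separate caps for biographies and American hooks, passed in by the caller because the real limits are not given here. Cross-set and date rules are left out.

```python
from dataclasses import dataclass


@dataclass
class Hook:
    """Hypothetical minimal model of a hook; field names are illustrative."""
    name: str
    approved: bool
    is_bio: bool = False
    is_american: bool = False
    has_approved_image: bool = False


def check_set(hooks: list[Hook], bio_cap: int, american_cap: int) -> list[str]:
    """Return human-readable rule problems for one prep set (empty list if none)."""
    problems = []
    # The lead hook must carry an approved image.
    if hooks and not hooks[0].has_approved_image:
        problems.append(f"lead hook {hooks[0].name!r} has no approved image")
    # Every hook must currently be approved (approval can be withdrawn).
    for hook in hooks:
        if not hook.approved:
            problems.append(f"{hook.name!r} is not (or no longer) approved")
    # Adjacency rules: compare each hook with the one directly after it.
    for prev, curr in zip(hooks, hooks[1:]):
        if prev.is_bio and curr.is_bio:
            problems.append(f"adjacent biographies: {prev.name!r}, {curr.name!r}")
        if prev.is_american and curr.is_american:
            problems.append(f"adjacent American hooks: {prev.name!r}, {curr.name!r}")
    # Per-set caps (assumed to be two independent limits).
    if sum(h.is_bio for h in hooks) > bio_cap:
        problems.append("too many biographies in set")
    if sum(h.is_american for h in hooks) > american_cap:
        problems.append("too many American hooks in set")
    return problems
```

Because the adjacency rules are "strongly discouraged" rather than forbidden, a tool would probably present these as warnings for the human to weigh rather than reject the set outright.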

In the current process, people building prep sets scan the list of pending hooks looking for ones that meet all the requirements. It would be good to have a tool which automates as much of this as possible and presents to the human a list of potential hooks that might fit a given slot. It would then be up to the human to confirm the suitability and pick from the suggestions presented (or ignore them completely).
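Such a tool could work slot by slot: given a partly built set and an empty slot, shortlist the pending nominations that pass the hard filters. This is a minimal sketch under assumed names (`Nomination`, `suggest_for_slot`); it checks approval, the rule against promoting your own or your approved hooks, the lead-image rule, and adjacency with the slot's immediate neighbours, and leaves the final pick to the human.

```python
from dataclasses import dataclass, field


@dataclass
class Nomination:
    """Hypothetical nomination record; field names are illustrative."""
    title: str
    approved: bool
    authors: set[str] = field(default_factory=set)
    approvers: set[str] = field(default_factory=set)
    is_bio: bool = False
    is_american: bool = False
    has_approved_image: bool = False


def suggest_for_slot(pending, prep_set, slot, promoter):
    """Return pending nominations that could plausibly fill prep_set[slot].

    prep_set is a list whose entries are Nomination objects or None (empty).
    The result is only a shortlist; a human still makes the final choice.
    """
    # Filled slots directly before and after the target slot.
    neighbours = [prep_set[i] for i in (slot - 1, slot + 1)
                  if 0 <= i < len(prep_set) and prep_set[i] is not None]
    suggestions = []
    for nom in pending:
        if not nom.approved:
            continue
        # You may not promote a hook you wrote or approved.
        if promoter in nom.authors or promoter in nom.approvers:
            continue
        # The lead slot needs an approved image.
        if slot == 0 and not nom.has_approved_image:
            continue
        # Avoid placing a biography or American hook next to another one.
        if nom.is_bio and any(n.is_bio for n in neighbours):
            continue
        if nom.is_american and any(n.is_american for n in neighbours):
            continue
        suggestions.append(nom)
    return suggestions
```

Caps, cross-set lead-hook variety, and date requests could be layered on as further filters or as a ranking step.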

Proof of concept
A POC implementation of the evaluation system is currently running on Toolforge. Source is available on GitHub.

Next steps
The next step is to repackage the nomination evaluation code as a bot which runs under cron on Toolforge. This would:


 * Run at some reasonable interval. Hourly seems like a good starting point.  Based on some initial measurements, I estimate a run will take a couple of minutes to complete.
 * Iterate over the articles in Category:Pending DYK nominations to find nominations to examine.
 * For each unassessed nomination, evaluate it to determine if it's a biography and/or an American topic.
 * Add Category:Pending DYK biographies and/or Category:Pending DYK American hooks to the nomination template as appropriate. The edit summary will include a link back to the bot's user page.  A human can override the automatic assignments by adding or deleting classification templates manually.
 * Keep track of which nominations it has processed so it doesn't keep reprocessing the same ones. Any nomination which already has any of the classification templates will be automatically skipped.  Thus, if a human does a manual evaluation, the bot will never override the human.
 * Iterate over Category:Pending DYK biographies and Category:Pending DYK American hooks to find any templates which are no longer in Category:Pending DYK nominations, and remove the classification categories from them.
 * An alternative would be to have the bot edit the DYKsubpage template which is on every nomination, adding new parameters to indicate the categories. That would clean up the categories automatically when the DYKsubpage is updated during the nomination close process.


 * I'll implement some kind of emergency button so anybody can stop it if it goes haywire.
 * The MediaWiki API's assert parameter will be used to prevent logged-out editing (I need to figure out how that works in pywikibot).
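Once a page's wikitext has been fetched (fetching and saving through pywikibot are not shown), the tagging and cleanup steps listed above reduce to simple text manipulation. This sketch assumes the categories are added as plain `[[Category:...]]` links appended to the end of the nomination page; the helper names are hypothetical, and the DYKsubpage-parameter alternative is not covered. Any page that already carries a classification category is skipped, so a human's manual classification is never overwritten.

```python
import re

# Category names as given in the proposal.
BIO_CAT = "Category:Pending DYK biographies"
AMERICAN_CAT = "Category:Pending DYK American hooks"
CLASSIFICATION_CATS = (BIO_CAT, AMERICAN_CAT)


def _cat_pattern(cat):
    """Match [[Category:Name]] or [[Category:Name|sortkey]] plus one trailing newline.

    Spaces in the name may also be written as underscores.
    """
    name = r"[ _]+".join(re.escape(word) for word in cat.split())
    return re.compile(r"\[\[\s*" + name + r"\s*(?:\|[^\]]*)?\]\]\n?")


def add_classification(wikitext, is_bio, is_american):
    """Return wikitext with classification categories appended, or None to skip.

    Pages that already carry any classification category are skipped so a
    human's manual classification is never overridden.
    """
    if any(_cat_pattern(cat).search(wikitext) for cat in CLASSIFICATION_CATS):
        return None
    wanted = [cat for cat, flag in ((BIO_CAT, is_bio), (AMERICAN_CAT, is_american)) if flag]
    if not wanted:
        return None  # neither a biography nor an American topic: nothing to add
    links = "\n".join(f"[[{cat}]]" for cat in wanted)
    return wikitext.rstrip("\n") + "\n" + links + "\n"


def remove_classification(wikitext):
    """Strip classification category links, e.g. once a nomination has closed."""
    for cat in CLASSIFICATION_CATS:
        wikitext = _cat_pattern(cat).sub("", wikitext)
    return wikitext


def stale_titles(classified_titles, pending_titles):
    """Titles still in a classification category but no longer pending."""
    return sorted(set(classified_titles) - set(pending_titles))
```

A nomination classified as neither type gets no category, so this check alone cannot distinguish it from an unprocessed one; that is why the bot also needs its own record of which nominations it has already processed.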
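For the emergency button, a common bot convention is a shutoff page that anyone can edit: the bot checks it before each run or edit and stops unless it contains an agreed keyword. The page name and the keyword `run` below are hypothetical. The second function shows the MediaWiki API `assert=user` parameter on a raw `action=edit` request, which makes the API refuse the edit rather than save it anonymously if the bot's session has lapsed; how pywikibot exposes this is not shown.

```python
# Hypothetical shutoff page; the real page name is not decided in this proposal.
SHUTOFF_PAGE = "User:DYK-Tools-Bot/Shutoff"


def should_run(shutoff_text):
    """Keep running only while the shutoff page says exactly 'run'.

    This fails closed: an empty, blanked, or changed page stops the bot.
    """
    return shutoff_text.strip().lower() == "run"


def edit_params(title, text, summary, token):
    """Parameters for a MediaWiki action=edit request.

    assert=user makes the API reject the edit if the bot is not logged in.
    """
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "token": token,
        "assert": "user",
        "format": "json",
    }
```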

Future work will be to build a tool that a user can run (probably as part of the existing toolforge web service) to filter based on these categories and/or other criteria. I could also see additional classification categories being added in the future if needed.

Architecture
The code that touches the wiki uses pywikibot. The web app uses Flask.

I don't anticipate the need to persist much data. What little bits of state I need, I'll probably use redis to keep things simple.
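The main piece of state the bot needs (which nominations it has already processed) maps naturally onto a Redis set. This sketch uses only the redis-py `sadd` and `sismember` calls and includes a small in-memory stand-in with the same two methods for dry runs; the key name and class names are hypothetical.

```python
class InMemorySetClient:
    """Stand-in exposing the two redis-py set methods used below (sadd,
    sismember), for dry runs and tests without a Redis server."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        # Like Redis SADD: returns how many values were newly added.
        s = self._sets.setdefault(name, set())
        before = len(s)
        s.update(values)
        return len(s) - before

    def sismember(self, name, value):
        # Like redis-py: returns 1 if present, else 0.
        return int(value in self._sets.get(name, set()))


class ProcessedTracker:
    """Remembers which nominations the bot has already evaluated."""

    KEY = "dyk-tools-bot:processed"  # hypothetical key name

    def __init__(self, client):
        # client: a redis.Redis connection in production, or InMemorySetClient.
        self.client = client

    def seen(self, title):
        return bool(self.client.sismember(self.KEY, title))

    def mark(self, title):
        self.client.sadd(self.KEY, title)
```

In production `client` would be a `redis.Redis(...)` connection; connection details for Toolforge's Redis service are deliberately left out. Because that service is shared between tools, the key would likely need a tool-specific prefix.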

I've created User:DYK-Tools-Bot.