User:TedderBot/NewPageSearch

The New Pages Patrol is the perfect place for assessing articles and finding new contributors. However, it's a busy place to patrol, even with the patrolled flag. User:AlexNewArtBot helped by providing lists of new pages divided by subject area.

When the bot and user vanished in 2011, I coded a replacement. The code is written in Java using User:MER-C's Wikimedia API, and the source is posted on GitHub under a reuse-friendly license. That way, if I'm hit by a bus, another user can get it running quickly.

Do you want to see new features?
Please request them on the talk page; I want to control how features are added to the lists below. If you can help clean up this page, feel free to do so.

Things I need help with

 * "search query clerk" - if you understand the search queries (or even some of them) and want to help maintain and fix queries, let me know.
 * graphical person - I need a mascot/logo. This shed deserves to be painted.
 * documentation - I'm terrible with wording. I need help explaining what this bot does and help explaining what sparklines are.

Specification notes

 * The definition of a lede, for doubling points, is narrowly construed: it is from the beginning of the page until two newlines or the beginning of a section ("=="). Effectively, it's limited to infoboxes, cleanup tags, and the first paragraph.
 * Can't process the \p{charset} thing. TODO: explain.
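
The lede rule above can be sketched in Java roughly as follows. This is a minimal illustration, not the bot's actual code: the class name `LedeExtractor` and the method `extractLede` are hypothetical, and it assumes the first literal "==" anywhere in the text marks the first section (the real bot may anchor headings differently).

```java
public final class LedeExtractor {
    private LedeExtractor() {}

    /**
     * Returns the text from the start of the page up to (but not including)
     * the first blank line ("\n\n") or the first "==", whichever comes first.
     * Hypothetical helper; assumes any literal "==" starts a section heading.
     */
    public static String extractLede(String wikitext) {
        if (wikitext == null) {
            return "";
        }
        // Normalize Windows line endings so the "\n\n" check works.
        String text = wikitext.replace("\r\n", "\n");
        int end = text.length();

        int blankLine = text.indexOf("\n\n");
        if (blankLine >= 0) {
            end = Math.min(end, blankLine);
        }

        int heading = text.indexOf("==");
        if (heading >= 0) {
            end = Math.min(end, heading);
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        // Made-up sample page: infobox line, first paragraph, then more text.
        String page = "{{Infobox ship}}\n'''HMS Example''' was a ship.\n\n"
                + "More text.\n== History ==\nDetails.";
        System.out.println(extractLede(page));
    }
}
```

A scorer could then check whether a rule matches inside the returned string and double the points for that match, which is why only infoboxes, cleanup tags, and the first paragraph end up counting as the lede.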

To implement

 * Leave annotations alone on search result page, both before and after the search text (before for User:Dudemanfellabra to mark 'unrelated', after for User:Nthep, see diff)
 * Self-document search pages. Have a person or project as owner? (for User:SunCreator)
 * RevisionID of article seen (since it is cached)
 * Lazy load rules
 * article title in cached text: example
 * Configuration to turn archives off (for User:SunCreator)
 * Configuration to turn infobox parsing off. (for User:Acroterion and WP:WPARCH)

Completed

 * ✅ Invert output so the newest result is at the top
 * ✅ Change the order of processing so a given ruleset is processed and posted before the next ruleset is processed
 * ✅ Maintain state of each ruleset independently, start with ruleset+1
 * ✅ Respect bot flag
 * ✅ Only run on a given rule if necessary (more than 24 hours since last run)
 * ✅ Logging, not stdout
 * ✅ Detect pages removed and put them in the archive
 * ✅ Log page: User:AlexNewArtBot/ShipsLog (errors page: User:TedderBot/NewPageSearch/Ships/errors)
 * ✅ Turned off caching for fetching rules pages
 * ✅ Added count of inhibitors to search logs
 * ✅ Bug: inhibit/excludes not working correctly? (for User:Lionelt, example is Neil McAuley on User:AlexNewArtBot/Conservatism with inhibitor "right wing back")
 * ✅ RevisionID of the ruleset when loaded (for User:SunCreator)
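
The "only run if more than 24 hours since last run" item in the list above could look something like the sketch below. It is an assumption-laden illustration rather than the bot's real code: `RunScheduler` and `isDue` are hypothetical names, and how the last-run time is stored per ruleset is not shown.

```java
import java.time.Duration;
import java.time.Instant;

public final class RunScheduler {
    // Minimum gap between two runs of the same ruleset.
    private static final Duration MIN_INTERVAL = Duration.ofHours(24);

    private RunScheduler() {}

    /**
     * Returns true if the ruleset has never run (lastRun is null) or if
     * strictly more than 24 hours have passed between lastRun and now.
     */
    public static boolean isDue(Instant lastRun, Instant now) {
        if (lastRun == null) {
            return true;
        }
        return Duration.between(lastRun, now).compareTo(MIN_INTERVAL) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(isDue(now.minus(Duration.ofHours(25)), now));
        System.out.println(isDue(now.minus(Duration.ofHours(1)), now));
    }
}
```

Passing `now` in as a parameter, instead of calling `Instant.now()` inside the method, keeps the check easy to test with fixed timestamps.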

Searches to implement

 * Motorcycling
 * Redirects

Long-term

 * Also watch for redirects turned into articles (there is no longer an edit filter for this, so this task is difficult unless we can steal another list)