Wikipedia:WikiProject TypoScan/roadmap



 * ✅ | Do the initial scan of the Wikipedia database extract with DBScanner against the list of typos in RegExTypoFix
 * ✅ | Produce a basic SQL backend to track progress
 * ✅ | Add initial list of articles with typos to MySQL Database
 * ✅ Build a small application to take the article list from the database scan and add it to the database (takes about 4 seconds to load nearly 70,000 articles into a local database on a reasonably specified server)
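
The bulk load described above can be sketched as follows. This is a minimal illustration, not the project's application: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the MySQL database, and the `articles` table and `load_articles` function are hypothetical names. Inserting every row inside a single transaction with `executemany`, rather than committing once per row, is what keeps a load of tens of thousands of titles fast.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema: one row per article reported by the database scan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL UNIQUE,
    checked_out_at REAL,               -- Unix time of the last checkout, or NULL
    finished INTEGER NOT NULL DEFAULT 0
)
"""


def load_articles(conn: sqlite3.Connection, titles: Iterable[str]) -> int:
    """Insert titles in a single transaction and return how many were new."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    with conn:  # one commit for the whole batch instead of one per row
        conn.executemany(
            "INSERT OR IGNORE INTO articles (title) VALUES (?)",  # skip duplicates
            ((title,) for title in titles),
        )
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_articles(conn, ["Foo", "Bar", "Foo"]))  # 2
```

The `UNIQUE` constraint together with `INSERT OR IGNORE` lets the same loader be rerun without creating duplicate rows.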


 * ✅ | Work with the AWB developers on integration
 * ✅ IListProvider AWB plugin written/modified to parse the XML output generated in PHP from the database, giving the list of articles to work on (known as a workload)
 * Each workload gives the user 100 articles to process (a user can collect more than one workload before processing; the articles are simply appended to the list)
 * ✅ | Produce a check-in/check-out/timeout system to track which articles have and haven't been typo-fixed.
 * An article is timestamped ("checked out") in the database when a list containing it is requested
 * If the timestamp is more than 2 hours old and the article is not marked as finished, it is given out to another user
 * ✅ | Find a way for the client to upload the status to the server (check in/finished)
 * Articles (article IDs) can be posted back to the script and marked as finished
 * ✅ | Decide when to write the status: at time intervals, after a number of edits, or at the end of the program
 * Can be done on demand from the Plugins menu
 * Done automatically when the program closes, if there are articles to be submitted
 * Done automatically every 25 finished articles
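
The check-out/check-in cycle above can be sketched in memory as follows. This is a hypothetical illustration, not the project's PHP script: the `Tracker` and `Client` classes and their method names are invented here, and the real system keeps its timestamps in the database. The constants mirror the figures on this page: 100 articles per workload, a 2-hour checkout timeout, and a submission every 25 finished articles.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

WORKLOAD_SIZE = 100         # articles handed out per request
TIMEOUT_SECONDS = 2 * 3600  # a checkout expires after 2 hours
SUBMIT_EVERY = 25           # client checks in after this many finished articles


@dataclass
class Tracker:
    """Hypothetical server-side state, kept in memory for illustration."""
    pending: list[int]                                            # article ids in scan order
    checked_out: dict[int, float] = field(default_factory=dict)   # id -> checkout time
    finished: set[int] = field(default_factory=set)

    def check_out(self, now: Optional[float] = None) -> list[int]:
        """Hand out up to WORKLOAD_SIZE articles that are free or whose checkout expired."""
        now = time.time() if now is None else now
        workload: list[int] = []
        for article_id in self.pending:
            if len(workload) == WORKLOAD_SIZE:
                break
            if article_id in self.finished:
                continue
            stamp = self.checked_out.get(article_id)
            if stamp is None or now - stamp > TIMEOUT_SECONDS:
                self.checked_out[article_id] = now  # the timestamp marks it "checked out"
                workload.append(article_id)
        return workload

    def check_in(self, article_ids: list[int]) -> None:
        """Mark articles as finished so they are never handed out again."""
        self.finished.update(article_ids)
        for article_id in article_ids:
            self.checked_out.pop(article_id, None)


class Client:
    """Hypothetical client that submits finished articles in batches."""

    def __init__(self, tracker: Tracker) -> None:
        self.tracker = tracker
        self.unsubmitted: list[int] = []

    def mark_finished(self, article_id: int) -> None:
        self.unsubmitted.append(article_id)
        if len(self.unsubmitted) >= SUBMIT_EVERY:
            self.submit()

    def submit(self) -> None:
        """Also intended to be called on demand and when the program closes."""
        if self.unsubmitted:
            self.tracker.check_in(self.unsubmitted)
            self.unsubmitted = []
```

Because expiry is checked only when a new workload is requested, no background job is needed to release abandoned checkouts.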


 * | Build a small application to "sync" the database with a new article list from a new dump
 * List of articles to be removed and added (can be based on ListComparer and the application already written)
 * It is probably worth clearing the database periodically
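
The sync step amounts to comparing the article list from the new dump with the titles already tracked in the database. A minimal sketch, assuming both lists are available as plain collections of titles; `diff_lists` is a hypothetical name, and this is a plain set difference rather than ListComparer itself.

```python
from typing import Iterable


def diff_lists(in_database: Iterable[str],
               in_new_dump: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (to_remove, to_add), both sorted for stable output.

    to_remove: titles tracked in the database that the new scan no longer reports.
    to_add:    titles the new scan reports that are not yet in the database.
    """
    old = set(in_database)
    new = set(in_new_dump)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    removed, added = diff_lists(["A", "B", "C"], ["B", "C", "D", "E"])
    print(removed, added)  # ['A'] ['D', 'E']
```

Titles present in both lists are left untouched, so their checkout and finished state survives the sync.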


 * | Integrate false-positive reporting with AWB, then use the reports to find regular expressions that don't produce good results.
 * Typo statistics have been suggested for this purpose
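
One way typo statistics could single out weak rules: count, for each RegExTypoFix rule, how often its fix was reported as a false positive, and list the rules whose false-positive rate exceeds a threshold. This is a hypothetical sketch; the report format (a rule name plus a true/false flag), the `noisy_rules` function, the 30% threshold and the minimum of 10 reports are all assumptions, not project settings.

```python
from collections import Counter
from typing import Iterable, Tuple


def noisy_rules(reports: Iterable[Tuple[str, bool]],
                threshold: float = 0.3,
                min_reports: int = 10) -> list[tuple[str, float]]:
    """Return (rule, false_positive_rate) for rules above the threshold.

    reports: (rule_name, was_false_positive) pairs, one per reported fix.
    Rules with fewer than min_reports reports are skipped as too few to judge.
    """
    total = Counter()
    false_pos = Counter()
    for rule, was_false_positive in reports:
        total[rule] += 1
        if was_false_positive:
            false_pos[rule] += 1
    flagged = [
        (rule, false_pos[rule] / n)
        for rule, n in total.items()
        if n >= min_reports and false_pos[rule] / n > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)  # worst first
```

The minimum report count keeps a rule from being flagged on the strength of one or two reports.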


 * ✅ | Expand plugin and DB for other projects
 * ✅ | Log whether each article was edited or ignored/skipped
 * The reason is also logged to the database, and statistics are shown from these logs
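
Logging each outcome together with its reason turns the statistics into a simple aggregate query. A minimal sketch, again assuming SQLite as a stand-in for MySQL; the `article_log` table, its `project` column (one way to serve several projects from one database) and the helper functions are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical log table; "project" lets one database serve several projects.
LOG_SCHEMA = """
CREATE TABLE IF NOT EXISTS article_log (
    article_id INTEGER NOT NULL,
    project TEXT NOT NULL,
    action TEXT NOT NULL CHECK (action IN ('edited', 'skipped')),
    reason TEXT                      -- why it was skipped; NULL for edits
)
"""


def init_log(conn: sqlite3.Connection) -> None:
    conn.execute(LOG_SCHEMA)


def log_action(conn: sqlite3.Connection, article_id: int, project: str,
               action: str, reason: Optional[str] = None) -> None:
    """Record one outcome reported by the client."""
    with conn:
        conn.execute(
            "INSERT INTO article_log (article_id, project, action, reason) "
            "VALUES (?, ?, ?, ?)",
            (article_id, project, action, reason),
        )


def stats(conn: sqlite3.Connection, project: str) -> list[tuple[str, str, int]]:
    """Count outcomes per (action, reason) for one project."""
    return conn.execute(
        "SELECT action, COALESCE(reason, ''), COUNT(*) FROM article_log "
        "WHERE project = ? GROUP BY action, reason ORDER BY action, reason",
        (project,),
    ).fetchall()
```

The reason strings a client would send are not specified on this page; any values passed to `log_action` are illustrative.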


 * | Add a plugin expansion so that DBScanner can add articles straight to the TypoScan DB