Wikipedia:Community health initiative on English Wikipedia/Edit filter

The Wikimedia Foundation's Anti-Harassment Tools team is exploring ways to improve the AbuseFilter extension so that communities can use this powerful tool to prevent and monitor potential harassment, in the same way they already prevent and monitor vandalism and spam.

AntiSpoof and performance monitoring improvements are currently in development. We may implement new conditions and functionality later, if the need presents itself.

Goals

 * Improve AbuseFilter so admins have stronger tools at their disposal to prevent, identify, and monitor harassing behavior.
 * Alleviate performance concerns so communities don't need to pick and choose which filters are active.
 * Add functionality to allow more sophisticated detection. (ORES, Detox, anti-spoof, regex captures)

On ENWP
The AbuseFilter extension is enabled on most wikis. On the English Wikipedia it is called 'Edit filter' because it is used to log and monitor more than just abuse; despite the name, filters can also act on other logged actions, not just edits. Wikipedia:Edit filter explains its features, and the tool itself is at Special:AbuseFilter.

There are ~220 users (most of whom are administrators) on ENWP who have permission to create or modify these filters, known as "edit filter managers", or EFMs. Edit filters are written in a language similar to other high-level programming languages. Full documentation can be found at mw:Extension:AbuseFilter/Rules format, with some additional documentation at w:en:Wikipedia:Edit filter/Documentation.

In short, filters can search for actions (e.g. a user making many rapid edits, a user removing a large section of text, account creation) or strings (e.g. curse words, spam URLs) and compare them against information about the user (e.g. username, registration date, lifetime edit count) and the page being edited (e.g. namespace, title, recent contributors). In addition to basic logging, a triggered filter can take one of four actions:
 * Block the user (not used on ENWP).
 * Prevent, or "disallow", the edit from being saved.
 * Warn the user with a custom message and require confirmation before publishing.
 * Tag the edit (en:Wikipedia:Tags).

All hits to enabled filters are publicly logged. However, edit filter managers can set a filter's visibility to private, so that its details are visible only to EFMs. This privacy also applies when searching the logs for hits to a specific filter. When reviewing a user's filter log, hits to private filters are still listed, but the ID of the filter is not disclosed.
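As a rough illustration of the logic described above, the Python sketch below checks one action against conditions on the user, the page, and the added text, and returns an outcome. It is not written in the AbuseFilter rules language and does not reflect how the extension is implemented; the `Action` fields, the thresholds, and `evaluate_filter` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Action:
    # Hypothetical subset of the data a filter can inspect about an edit.
    user_name: str
    user_editcount: int
    user_age_days: int
    page_namespace: int
    added_text: str


def evaluate_filter(action: Action, flagged_words: Set[str]) -> Optional[str]:
    """Return an outcome ("warn") if this hypothetical filter matches, else None."""
    # Condition on the user: a new account with few edits (thresholds are made up).
    is_new_user = action.user_editcount < 10 and action.user_age_days < 4
    # Condition on the page: namespace 3 is User talk in MediaWiki.
    on_user_talk = action.page_namespace == 3
    # Condition on the content: any flagged word in the added text.
    added_words = set(action.added_text.lower().split())
    if is_new_user and on_user_talk and added_words & flagged_words:
        return "warn"
    return None


edit = Action("Example", 2, 1, 3, "You are an idiot")
print(evaluate_filter(edit, {"idiot"}))  # warn
```

In a real filter the same three checks would be written as conditions in the rule text, and the outcome would be whichever action the EFM configured.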

Performance management
When a user publishes an edit, the revision is run through as many edit filters as possible, in increasing numerical order, until 1,000 conditions have been used. In essence, each boolean operation counts as one "condition". If the edit uses up all 1,000 conditions before reaching every filter, the remaining filters are skipped. This keeps the filters lean and reduces the time between clicking "Publish" and the edit being saved.
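The condition budget can be sketched in a few lines of Python. This is a simplified model under a stated assumption: each hypothetical filter consumes a fixed number of conditions, whereas in practice the count depends on how much of each rule is actually evaluated. The `run_filters` helper and the filter tuples are illustrative only.

```python
from typing import Callable, List, Tuple

CONDITION_LIMIT = 1000

# Hypothetical filter record: (filter ID, conditions it consumes, predicate on the action).
Filter = Tuple[int, int, Callable[[dict], bool]]


def run_filters(action: dict, filters: List[Filter], limit: int = CONDITION_LIMIT):
    """Run filters in increasing ID order until the condition budget would be exceeded.

    Returns (matched IDs, skipped IDs, conditions used).
    """
    ordered = sorted(filters, key=lambda f: f[0])
    used = 0
    matched: List[int] = []
    skipped: List[int] = []
    for index, (filter_id, cost, predicate) in enumerate(ordered):
        if used + cost > limit:
            # Budget exhausted: this filter and every later one are skipped.
            skipped = [f[0] for f in ordered[index:]]
            break
        used += cost
        if predicate(action):
            matched.append(filter_id)
    return matched, skipped, used


filters = [
    (1, 400, lambda a: True),
    (2, 500, lambda a: False),
    (3, 200, lambda a: True),  # would bring the total to 1,100, so it never runs
]
print(run_filters({}, filters))  # ([1], [3], 900)
```

Filter 3 would have matched, but because it sits after filters that already used 900 conditions, it never gets a chance to run; this is why EFMs have to budget conditions across all filters.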

As a result, EFMs must manage an inventory of conditions: if all conditions are in use and a new filter is needed, conditions must be reallocated from other filters. This is time-consuming and requires experience and patience. To ensure that all filters are running, EFMs must manually monitor the top line of text on Special:AbuseFilter after modifying any filter: "Of the last 7,694 actions, 0 (0.00%) have reached the condition limit of 1,000, and 74 (0.96%) have matched one of the filters currently enabled."
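The percentages in that line are each count divided by the total number of recent actions (74 / 7,694 is about 0.0096, shown as 0.96%). The hypothetical `status_line` helper below rebuilds the quoted sentence from the raw counts; it illustrates the arithmetic and is not the extension's own message code. Any nonzero value in the first percentage means some edits ran out of conditions before every filter was checked.

```python
def status_line(total: int, over_limit: int, matched: int, limit: int = 1000) -> str:
    """Rebuild the summary sentence from raw counts; each percentage is count / total."""
    return (
        f"Of the last {total:,} actions, {over_limit} ({over_limit / total:.2%}) "
        f"have reached the condition limit of {limit:,}, and {matched} "
        f"({matched / total:.2%}) have matched one of the filters currently enabled."
    )


print(status_line(7694, 0, 74))
```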

Problems to solve
This is an unprioritized laundry list, not a backlog.

Performance management
We've added performance monitoring to AbuseFilter on a handful of wikis, which can be viewed here: https://grafana.wikimedia.org/dashboard/db/mediawiki-abusefilter-profiling

We've also added logging for filters that take over 800 ms to run. These logs are currently private, but we'd like to build them into the AbuseFilter UI directly. We've also re-enabled the once-disabled per-filter profiling on the Portuguese and English Wikipedias to check whether the profiling itself causes a performance degradation.
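The 800 ms check amounts to timing each filter on an action and flagging the ones that exceed the threshold. The sketch below shows that idea in Python; `find_slow_filters` and `sluggish_filter` are hypothetical names, and measuring wall-clock time around Python callables is an illustration, not how AbuseFilter's own profiling is implemented.

```python
import time
from typing import Callable, Dict, List

SLOW_THRESHOLD_MS = 800.0


def find_slow_filters(action: dict, filters: Dict[int, Callable[[dict], bool]],
                      threshold_ms: float = SLOW_THRESHOLD_MS) -> List[int]:
    """Time each filter on one action; return IDs of filters that exceeded the threshold."""
    slow = []
    for filter_id, predicate in filters.items():
        start = time.perf_counter()
        predicate(action)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > threshold_ms:
            slow.append(filter_id)
    return slow


def sluggish_filter(action: dict) -> bool:
    time.sleep(0.05)  # stands in for an expensive regex over a large diff
    return False


# A 10 ms threshold is used here only so the example runs quickly.
print(find_slow_filters({}, {1: lambda a: True, 2: sluggish_filter}, threshold_ms=10))
```

Surfacing the resulting list per filter, rather than only in a private log, is the kind of feedback the UI work above is aiming for.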

From what we've found, there is still enough condition inventory on ENWP to run more filters, so we do not need to 'fix' performance; rather, we need to help edit filter managers more easily find the maximum number of filters they can enable. To accomplish this, we'd like to bring the tracking we've implemented out of these dashboards and into the Special:AbuseFilter UI. Once we know whether per-filter profiling causes a performance degradation, we will begin an on-wiki discussion about how to surface this data in Special:AbuseFilter.

Potential things for the future
 * Filters are combined for performance reasons, so it's hard to know if certain parts of the regular expressions are still needed (e.g. some vandalism trend that has died off).
 * These combined-regex filters can have severe performance effects on large edits to large articles.
 * Give people better feedback on the performance of a filter. ("Hey, this filter needs to be optimized. Here are some best practices for improving it.")
 * Database updates to the abuse filter log table could happen via post-processing (not that big of a deal for Anti-Harassment).
 * Can we optimize execution time on the backend?
 * The larger the diff, the slower AF runs on publish. This can be troublesome on page-blanking or mass content removal reverts and undos.
 * Is runtime a better way to show the performance of each filter?
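Because combined filters join many patterns with `|`, one way to spot alternatives that are no longer needed is to test each one separately against a sample of recent edit text. The helper and the sample strings below are hypothetical, and the sketch assumes the combined regex has already been split into its top-level alternatives by hand.

```python
import re
from collections import Counter
from typing import Iterable, List


def alternative_hits(alternatives: List[str], texts: Iterable[str]) -> Counter:
    """Count, for each alternative of a combined regex, how many sample texts it matches."""
    compiled = [(alt, re.compile(alt, re.IGNORECASE)) for alt in alternatives]
    hits = Counter({alt: 0 for alt in alternatives})  # keep zero counts visible
    for text in texts:
        for alt, pattern in compiled:
            if pattern.search(text):
                hits[alt] += 1
    return hits


# These would be joined as "foo|ba+r|qux" in the single filter regex.
alternatives = ["foo", "ba+r", "qux"]
sample = ["FOO was here", "baaar", "nothing to see"]
hits = alternative_hits(alternatives, sample)
print([alt for alt, n in hits.items() if n == 0])  # ['qux']
```

An alternative with zero hits over a large, representative sample is a candidate for removal, though a quiet pattern may simply be guarding against a trend that has paused.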

Warning effectiveness
AbuseFilter currently has a 'warn' feature that displays a message when a user trips a filter. Testing and implementing improvements to the effectiveness of these messages would take relatively little effort.
 * Allow a different type of display per filter (pop-up, above the edit window, etc.)

Functionality
There are many limitations to the edit filter's functionality.
 * Binary decisions, no heuristics. Should we explore Detox, ORES, or other machine learning integration?
 * Subroutines — Check if another filter was tripped. Allow one rule to trigger / call other rules, i.e. to standardize common elements and remove redundancies.
 * Notifications (Echo? Watchlist? IRC feed? A new Special:AbuseWatchlist?):
   * When a filter is tripped
   * When other managers edit/create filters
 * Detect edit wars, or the likelihood that an edit is part of an edit war?
 * Ability to set an expiry for a filter, so that it can automatically be disabled after some period of time.
 * Add additional variables:
   * Page age (time since creation) / total number of edits, or some other way to tell whether a page is new or old
   * Number of recent edits / time since previous edits, or some other way of detecting floods in the AF rules (separate from the throttle mechanism)
   * A way to identify edits made via revert, undo, etc.
   * new_categories / old_categories
   * new_media / old_media
 * Add additional functions:
   * A function like  but for integers, such as  . Currently this can only be done with less-user-friendly regex, such as , or with multiple comparisons that require one condition for each item in the array.
   * Add the ability to store regex "captures" in a variable that can be used in other parts of the filter.
 * Add additional outcomes for filters:
   * Require the editor to complete a CAPTCHA before saving the edit.
   * Possibly, apply temporary semi-protection to the targeted page.
   * Deferred changes: see & Wikipedia:Deferred_changes
 * Make it clear in the UI that setting a filter to both 'warn' and 'disallow' may be redundant (but still leave it as an option).
 * Throttling: allow a user to perform an action for the first N infractions, then warn for the next N infractions, then disallow any further infractions.
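The escalating-throttle idea from the list above can be sketched as a per-user, per-filter counter. The `EscalatingThrottle` class is hypothetical; as a simplifying assumption, infractions never expire here, while a real design would probably need a time window and would presumably still log the "allow" hits.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class EscalatingThrottle:
    """Allow the first n infractions, warn on the next n, disallow after that."""

    def __init__(self, n: int) -> None:
        self.n = n
        # (user, filter ID) -> number of infractions recorded so far
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def outcome(self, user: str, filter_id: int) -> str:
        key = (user, filter_id)
        self.counts[key] += 1
        count = self.counts[key]
        if count <= self.n:
            return "allow"
        if count <= 2 * self.n:
            return "warn"
        return "disallow"


throttle = EscalatingThrottle(n=2)
print([throttle.outcome("ExampleUser", 42) for _ in range(5)])
# ['allow', 'allow', 'warn', 'warn', 'disallow']
```

Keying the counter on both the user and the filter means tripping one filter does not escalate the response to an unrelated one.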

Anti-Spoof
It can be easy to get around filters that use AntiSpoof. To address this, the Anti-Harassment Tools team expanded the equivset coverage and added a new function to AbuseFilter that makes it easier to compare potentially spoofed words with others. We will not be investing further in this area.
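The general idea behind an equivset is to map look-alike characters (digits, symbols, letters from other scripts) to a canonical form before comparing, so that "h3ll" matches "hell". The sketch below uses a tiny, made-up mapping and hypothetical function names; it is not AntiSpoof's actual equivset or the AbuseFilter function mentioned above.

```python
from typing import List

# A tiny, made-up equivalence set; the real AntiSpoof equivset is far larger
# and its actual mappings may differ from these.
EQUIVSET = {
    "0": "O", "1": "I", "3": "E", "4": "A", "5": "S",
    "@": "A", "$": "S",
    "\u0430": "A", "\u0410": "A",  # Cyrillic small and capital a
    "\u043e": "O", "\u041e": "O",  # Cyrillic small and capital o
}


def normalize(text: str) -> str:
    """Map look-alike characters to a canonical form, then upper-case the result."""
    return "".join(EQUIVSET.get(ch, ch) for ch in text).upper()


def contains_any_spoofed(text: str, words: List[str]) -> bool:
    """True if any word appears in the text after both are normalized."""
    normalized_text = normalize(text)
    return any(normalize(word) in normalized_text for word in words)


print(contains_any_spoofed("go to h3ll", ["hell"]))         # True
print(contains_any_spoofed("\u0430dm1n abuse", ["admin"]))  # True
print(contains_any_spoofed("hello", ["help"]))              # False
```

Normalizing both sides is the important design choice: the filter author can write the plain word once, and every spoofed variant covered by the mapping collapses onto it.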