Wikipedia talk:Moderator Tools/Automoderator

Do you think ClueBot NG has a substantial impact on the volume of edits you need to review?

 * Not really, no. A few years ago my answer would have been different, but there has been a noticeable decrease in how much I encounter ClueBot's edits while I'm patrolling. I think it's primarily due to the increasing use of filters and more aggressive proxy blocking; it's getting harder and harder for a vandal even to hit "publish changes" on a piece of vandalism. On top of that, the pool of counter-vandals is continually increasing and tools are being developed to help them, which makes the whole human side more efficient. Nowadays it feels like ClueBot mainly catches things nobody bothered to revert, or occasionally something I'm hesitating to revert for one reason or another. Ironically, we're taking work away from the automaton. Giraffer (talk·contribs) 18:03, 25 August 2023 (UTC)

How would English Wikipedia evaluate Automoderator to decide whether to use it?

 * To me at least, the really relevant portion is "what edits could Automoderator catch that ClueNG isn't already". So any evaluation would be eased by seeing examples/scale of what it's catching. Attached to that is being confident that Automoderator isn't making more false positives than ClueNG (we could already configure ClueNG to catch way more problematic edits if we accepted twice the false positive rate, so 0.1% or less would be needed). Nosebagbear (talk) 11:16, 25 August 2023 (UTC)
 * Yeah, seconding Nosebagbear. I'm not sure how this would be an improvement over the current system when ClueBot NG already does the same thing. I'm extremely worried about the sentence "But in the event that we find Automoderator is more effective or accurate than ClueBot NG, we want to ensure the door is open to either transitioning to Automoderator or having it run in parallel." Is this saying that if the team finds Automoderator more effective/accurate than ClueBot NG, the community will be barred from rejecting it? This shouldn't normally be an issue, but the WMF has a history of blatantly lying (see talk) about their solutions being better than what is currently in place, so I think the community should be able to reject Automoderator regardless of the Moderator Tools team's assessment of its quality relative to ClueBot NG. casualdejekyll  00:24, 27 August 2023 (UTC)
 * @Casualdejekyll This absolutely will not be forced on any communities that don't want it. When we wrote "ensure the door is open" I meant "ensure that the community feels equipped to make a decision about using this tool or not" - I've hopefully clarified that sentence. Although I think it's entirely possible that Automoderator turns out to be less accurate or have lower coverage than ClueBot, we didn't want to assume so and simply ignore English Wikipedia, so this project page and discussion is our attempt to keep the community in the loop in the event that it actually is more accurate and you might want to explore using it. The way that we want to build this tool would give each community full control, likely via an on-wiki configuration page similar to Special:EditGrowthConfig, over whether the tool runs, and if so with what parameters, messages, etc. (see the illustrative sketch). Samwalton9 (WMF) (talk) 06:43, 27 August 2023 (UTC)
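The 0.1% figure above implies a simple back-of-envelope calculation communities could use when evaluating the tool. A minimal sketch, with purely illustrative numbers (not real enwiki statistics or the actual evaluation method):

```python
# Illustrative only: estimate how many good edits an auto-revert tool
# would wrongly revert per day at a given false positive rate.

def expected_false_positives(daily_edits, revert_fraction, false_positive_rate):
    """Estimate daily false positives.

    daily_edits: edits considered by the tool per day (assumed figure)
    revert_fraction: fraction of considered edits the tool reverts
    false_positive_rate: fraction of the tool's reverts that are wrong
    """
    reverts_per_day = daily_edits * revert_fraction
    return reverts_per_day * false_positive_rate

# Example: 100,000 considered edits, 2% reverted, 0.1% of reverts wrong
# -> 100,000 * 0.02 * 0.001 = 2 bad reverts per day.
print(expected_false_positives(100_000, 0.02, 0.001))
```

Framing the tolerance this way makes the trade-off Nosebagbear describes concrete: a higher catch rate is only acceptable if the absolute number of bad reverts per day stays small enough for patrollers to catch and undo.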

What configuration options would you want to see in the tool?

 * Either a whitelist or blacklist of namespaces in scope. — xaosflux  Talk 13:13, 25 August 2023 (UTC)
 * @Xaosflux Thanks for the suggestion! At the moment the machine learning models that we're using only support the main namespace, which is the same as ClueBot NG as I understand it. If that changes, I think a namespace filter sounds like a fairly straightforward configuration to add. Samwalton9 (WMF) (talk) 13:41, 25 August 2023 (UTC)
 * @Samwalton9 (WMF) thank you, no worries then. Yes, at least here on enwiki CBNG only "reverts" on (article), it leaves messages in usertalk and project space. — xaosflux  Talk 13:53, 25 August 2023 (UTC)
 * The mock-up image shows options for very high, high, medium, and low. I would suggest that Wiki administrators have the competence and desire for more precise control. You could show the numeric value associated with each of those settings, and then have a custom option to enter a desired numeric value. I would also suggest having separate controls for auto-revert and for logging. I can definitely see a wiki setting a high threshold for auto-revert along with a low threshold for log-and-review. (And issue an are-you-sure sanity check warning if someone tries to set the logging threshold higher than the auto-revert threshold.) Alsee (talk) 06:56, 30 August 2023 (UTC)
 * @Alsee This is something we'll need to give more thought to - although absolutely I agree that administrators are competent and might want precise controls, machine learning model scores aren't always the most intuitive things to parse. I'm looking at ORES for inspiration, for example how in Special:RecentChanges you select categories of edit harm rather than defining a particular ORES threshold. One top-of-mind idea I have on this is perhaps, per the conversation below, we could have human-readable options in the UI, but admins would still be able to go in to the configuration file itself to fine-tune the figures. What do you think?
 * Different actions for different thresholds is definitely an idea we're interested to explore further so thanks for sharing that suggestion. Samwalton9 (WMF) (talk) 13:22, 30 August 2023 (UTC)
 * I was imagining there was a single numeric value underlying the "high/medium/low" threshold UI, and it would require little understanding to select 'custom' and guesstimate an intermediate value to put in. However now I'm unclear on what the control value(s) look like under the UI. For example, what does "medium" look like under the UI? Is it a single numeric value? Multiple numeric values? Numeric plus non-numeric control values? Alsee (talk) 16:35, 30 August 2023 (UTC)
 * @Alsee You're right that the model will ultimately just be giving us a single number per edit, the ambiguity I was referring to was more around how folks understand (or don't) that number. The score above which we'd be confident to revert for this bot is probably going to be somewhere around 0.99 on the 0-1 scale provided by the model, so we're talking about changing, for example, 0.9925 to 0.9930 to tweak things further than a natural language scale. There may be other non-numerical configurations going into each option, but that's something we still need to explore. Ultimately I think I agree with you that this kind of fine-grained control should be an option, but I also think there are benefits to the default options presented in the UI being less specific. Samwalton9 (WMF) (talk) 16:46, 30 August 2023 (UTC)
 * Gah, Reply tool barfs on tables. I asked the project manager to build it differently. Pardon my grumble grumble as I post manually.
 * Ok, the values are what I thought. I have superficial familiarity with ORES and similar systems. I was picturing a UI something like this:
 * To be honest I tend to be a bit cynical when it comes to the competence of a random member of the general public. However I think that UI is pretty self-explanatory for someone with the competence to be an admin and with a full-time hobby of pulling useful parameters and meaning out of surrounding wikitext gibberish. Chuckle. I went back and specifically added a zero at the end, to self-document that it was not limited to 3 decimal digits. So if the threshold is already High and you're still getting complaints, you copy-paste the high value and edit it higher.
 * If you want to get fancy you could even add two more columns, estimated number of good reverts per day and estimated number of bad reverts per day. If the custom option is selected, those extra columns could dynamically update the estimates based on the custom value. These calculations would require estimates for total edits per day and estimates for true-bad-edit percentage, but I assume(?) adequate estimates would be available.
 * Oh, and of course set a hard minimum. A typo of .0998 would be bad, chuckle. Maybe also add a low-value-are-you-sure threshold, somewhere above the hard minimum. Alsee (talk) 19:19, 30 August 2023 (UTC)
 * @Alsee I just realised I didn't respond to this message earlier. I think the kind of interface you laid out in this table is the kind of thing we'll have, and I definitely like the idea of presenting the associated estimated reverts per day / false positive rate / etc. - we were quite inspired by this paper where the authors design interfaces for ORES to help visualise and document how changing various settings changed the impact of ORES tools. We'll almost certainly be setting a hard minimum - I know there are some communities that are more relaxed about false positives when it comes to these tools, but there are clearly sensible limits we can impose to avoid drastic issues. Perhaps each wiki could also define its limit, in a way that's harder to edit (e.g. it's in the configuration file but not presented in the UI)? Something for us to think about. Samwalton9 (WMF) (talk) 10:50, 4 October 2023 (UTC)
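Pulling together the ideas in this thread (named presets over a single numeric score, a custom override, a hard minimum, and a sanity check between the logging and auto-revert thresholds), here is a minimal sketch. All level names and numeric values are assumptions for illustration, not the real Automoderator schema:

```python
# Illustrative sketch only: preset names and values are hypothetical.

# Named caution levels mapped to a single underlying model-score threshold.
THRESHOLD_PRESETS = {
    "very high": 0.9950,
    "high": 0.9930,
    "medium": 0.9900,
    "low": 0.9850,
}
HARD_MINIMUM = 0.9500  # hypothetical floor to avoid drastic misconfiguration


def resolve_threshold(level, custom=None):
    """Return the numeric revert threshold for a preset or a custom value."""
    value = custom if level == "custom" else THRESHOLD_PRESETS[level]
    if value < HARD_MINIMUM:
        raise ValueError(f"threshold {value} is below hard minimum {HARD_MINIMUM}")
    return value


def check_thresholds(revert_threshold, logging_threshold):
    """Warn if log-and-review would see fewer edits than auto-revert does."""
    if logging_threshold > revert_threshold:
        return "warning: logging threshold is higher than auto-revert threshold"
    return "ok"
```

This keeps the human-readable options in the UI while leaving a single number underneath that an admin can fine-tune, which matches the two-layer approach (UI presets plus editable configuration) floated above.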

What is the support model for this feature?

 * Who specifically will be responsible for bug fixes and feature requests for this utility? If this includes paid WMF staff, are any SLOs going to be committed to? — xaosflux  Talk 13:10, 25 August 2023 (UTC)
 * @Xaosflux The Moderator Tools team will be responsible for development and ongoing maintenance of the Automoderator configuration interface, and the Machine Learning and Research teams are responsible for the model and its hosting via LiftWing. SLOs aren't something we've talked about for this project in particular, but I think it's an interesting idea, so let me chat about it and I'll get back to you. Samwalton9 (WMF) (talk) 13:48, 25 August 2023 (UTC)
 * Thank you, some issues we have run in to before with anything that runs at massive-scale is that bugs can quickly become serious problems, and conversely if a process gets built that the community becomes dependent upon (to the point where volunteers may stop running other bots, etc) having the process break down also becomes disruptive. — xaosflux  Talk 13:57, 25 August 2023 (UTC)
 * @Xaosflux That totally makes sense - we'll need to be very careful when making code changes once the tool is live on a wiki, given the potential for disruption, and ensure that we have good test coverage and code review processes. Beyond tool bugs, I think the biggest factor in terms of uptime is likely to be the LiftWing infrastructure which will host the models. The Machine Learning team are in the process of defining SLOs for LiftWing, which you can read about at T327620. Samwalton9 (WMF) (talk) 16:29, 28 August 2023 (UTC)

Where does Automoderator sit in the workflow?

 * The description says "prevent or revert", will this tool function at multiple layers? There are many steps of the publish workflow today, where in the workflow would this tool function? Specifically before/after which step, keeping in mind certain existing/planned steps such as editcheck, captcha, sbl, abusefilter, ores, publish, copyvioscore, pagetriage. —  xaosflux  Talk 13:18, 25 August 2023 (UTC)
 * @Xaosflux This is something we haven't decided on yet, so I'm curious where you think it would be best positioned. My initial thinking was that this would be a revert, much like ClueBot NG, both for simplicity and to maximise community oversight. Rather than try to overcomplicate things, allowing an edit to go through and then reverting it means we have all the usual processes and tools available to us - diffs, reverting, a clear history of actions the tool is taking, etc. That said, we're open to exploring other options, which is why I left it vague on the project page. There are benefits to preventing a bad edit rather than allowing and reverting (as AbuseFilter demonstrates), or we might imagine more creative solutions like a Flagged Revisions-style 'hold' on the edit until a patroller has reviewed it. I think that for simplicity we'll at least start with straightforward reversion, but I'd love to hear if any other options seem like good ideas to you. Samwalton9 (WMF) (talk) 13:57, 25 August 2023 (UTC)
 * @Samwalton9 (WMF) I don't think that starting with something that would increase editor workload would be good here for "edits" (e.g. requiring edit approval) - but maybe as part of the new page creation process. (Please advertise for input to Wikipedia talk:New pages patrol/Reviewers and Wikipedia talk:Recent changes patrol.) — xaosflux  Talk 14:06, 25 August 2023 (UTC)
 * While "prevent" is tempting, I think it's probably better to stick with revert.
 * Revert style allows good-edit false positives to go through, then any editor can restore the good edit. I've had to do this on a few rare occasions with ClueBot. Note: It's important that the bot never edit-war; in particular, the bot should never trigger when the edit in question is itself a revert restoring the content to the page.
 * Revert style leaves a record of false positives, which may be needed to evaluate and adjust the tool.
 * A false-positive block may have a particularly strong impact on an editor - especially if they are relatively new. While a false positive-revert may also be dispiriting, we can at least hope they'll see a human revert the bot and restore their content. Ideally that human may even send them a personalized message in the process.
 * Revert style leaves an edit history log of the individual's action. This can be useful for the community, providing evidence allowing the user to be blocked or otherwise sanctioned.
 * I suspect blocking the edit will often just result in the user tweaking the content to evade the autoblock.
 * Alsee (talk) 07:20, 30 August 2023 (UTC)
 * @Alsee This is a great overview of the reasons we might want to revert rather than prevent, thanks for laying it out so clearly. Samwalton9 (WMF) (talk) 13:15, 30 August 2023 (UTC)
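The "never edit-war" guard described above can be sketched in a few lines. The `Edit` fields and bot username here are assumed for illustration, not a real API:

```python
# Sketch of the "never edit-war" guard; data structures are hypothetical.

from dataclasses import dataclass


@dataclass
class Edit:
    user: str
    score: float     # model score, higher = more likely damaging
    is_revert: bool  # True if this edit restores a previous revision


REVERT_THRESHOLD = 0.99  # assumed value


def should_revert(edit, bot_username="Automoderator"):
    """Revert only high-scoring edits, and never fight over a revert."""
    if edit.user == bot_username:
        return False  # never act on the bot's own edits
    if edit.is_revert:
        return False  # an edit that is itself a revert goes to humans
    return edit.score >= REVERT_THRESHOLD
```

The key design choice, per Alsee's list, is that any revert of the bot (even by the original editor) stops the bot cold and leaves the disagreement in the page history for humans to resolve.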

Is the Moderator Tools team open to suggestions for a different name for the Automoderator?
As brought up by Special:Contributions/78.28.44.127 in this edit, Wikipedia does not have "moderators". How are we supposed to have an automated version of a moderator when the term "moderator" in a Wikipedia context is undefined? Furthermore, the term "moderator" is associated with paid editors falsely claiming to be in positions of power. casualdejekyll 00:00, 27 August 2023 (UTC)


 * @Casualdejekyll I've just responded to that comment, which I think answers your question, so thanks for highlighting it. In short, yes, we're definitely open to new names, and in fact it might be best for us to allow each community to name the tool themselves anyway. Samwalton9 (WMF) (talk) 06:51, 27 August 2023 (UTC)
 * If there is a service account, don't make the service account name translatable, there are big problems with this related to abusefilter that are unsolved. But yes, all messages used in logs or used to send to users should be both translatable and localize-able. — xaosflux  Talk 13:31, 30 August 2023 (UTC)
 * @Xaosflux I'd love to know more about the abusefilter account name problems if you have any links where I could read more - this hasn't progressed beyond a vague idea in my head so we haven't looked into the feasibility yet. Samwalton9 (WMF) (talk) 15:49, 30 August 2023 (UTC)
 * Replied on your talk. — xaosflux  Talk 15:58, 30 August 2023 (UTC)

Will Automoderator communicate?
CBNG has 3 primary outputs today: (1) revert an edit, (2) notify the user that was reverted via a user_talk message (example), (3) escalate certain repeat reversions for further attention (example). Will Automoderator perform steps 2 and 3? — xaosflux  Talk 13:35, 30 August 2023 (UTC)


 * @Xaosflux I personally think Automoderator should definitely do step 2 - good faith users need an obvious way to report a false positive, and new users don't get any notification their edit was reverted by default, so sending a talk page message seems a sensible way to notify them and provide links for a report. I'm less sure about step 3 at this stage - I like the idea, but reporting venues and processes are so different from one wiki to the next that I don't know what this would look like, unless we created a new venue or interface for Automoderator to report in. I'm not super enthusiastic about that idea because we're trying to save patrollers time, and giving them yet another feed of things to review may generate more effort than we're saving. What do you think? Samwalton9 (WMF) (talk) 15:55, 30 August 2023 (UTC)
 * Lack of #3 seems to be a problem for this use case: Vandal1 makes an edit, Automoderator reverts it, (REPEAT) (REPEAT) ... (REPEAT) - now someone else that isn't doing the reversions would need to detect this and report it. We've had that sort of feature here for over 15 years (see old example). So some sort of short-term memory that writes a report is probably important. —  xaosflux  Talk 16:03, 30 August 2023 (UTC)
 * @Xaosflux I totally see where you're coming from with the edit warring concern. One thing that I noticed most anti-vandalism bots do is only revert a given editor so many times per page per day, with the idea being that if someone (even the original editor) has reverted the bot, it's ambiguous enough that this should go to community review. That would avoid the edit warring problem, but doesn't solve the issue of someone else now needing to independently notice the issue, so even in this case I agree that some kind of reporting would be valuable. Samwalton9 (WMF) (talk) 16:40, 30 August 2023 (UTC)
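The "revert at most so many times, then escalate" behaviour discussed above could look something like this. The cap, the tracker, and the report queue are all assumptions for illustration:

```python
# Sketch of a per-user, per-page revert cap with escalation to a report
# queue; the mechanism here is hypothetical, not Automoderator's design.

from collections import defaultdict

MAX_REVERTS_PER_USER_PER_PAGE = 1  # assumed daily cap


class RevertTracker:
    def __init__(self):
        self.counts = defaultdict(int)  # (user, page) -> reverts today
        self.reports = []               # queue for human review

    def record_bad_edit(self, user, page):
        """Return True if the bot should revert, else escalate a report."""
        key = (user, page)
        if self.counts[key] < MAX_REVERTS_PER_USER_PER_PAGE:
            self.counts[key] += 1
            return True
        self.reports.append(
            f"{user} repeatedly re-adding reverted content to {page}"
        )
        return False
```

This captures both points in the thread: the bot never re-reverts the same user on the same page past the cap (avoiding edit wars), and the repeat activity still surfaces somewhere a human will see it rather than relying on a bystander to notice.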

Any action on Configuration Panel itself MUST be treated as an edit
The mock-up image has no history tab. This suggests to me that your team may have overlooked that any action on this panel MUST be treated just like any other admin action; it must run through the myriad of systems related to edits. Any such action could be malicious or simply incompetent, and the action needs to appear in every location and tool we use for tracking histories. This includes, but is not limited to: Alsee (talk) 08:07, 30 August 2023 (UTC)
 * Logging the action in the user's edit history.
 * Logging the action on the control panel's own history.
 * Also review User:Risker/Risker's_checklist_for_content-creation_extensions. There does not appear to be any ability to enter freeform content in the control panel, so many items from Risker's list don't apply - for example there's no need to support oversight. However the checklist gives excellent insight regarding logging and visibility.


 * Since this is for on-wiki config, perhaps that can just be a front end for a config file, sort of how Special:ManageMentors is just a fancy front end for MediaWiki:GrowthMentors.json. — xaosflux  Talk 10:06, 30 August 2023 (UTC)
 * @Alsee I absolutely agree that changes to this tool need to be transparent and trackable. The sketch image is simply an illustrative example of the kind of thing we're thinking about for the configuration interface, and is far from comprehensive. As @Xaosflux gestured towards, we're likely to use the Growth team's Community Configuration toolset for this project, which would allow us to build this UI on top of a json page, like Special:ManageMentors (MediaWiki:GrowthMentors.json) or Special:EditGrowthConfig (MediaWiki:GrowthExperimentsConfig.json). That page would have all the usual edit history, reverting, and watchlisting functionality, so we shouldn't need to build anything novel on that front. Thanks for calling this out though, and for linking to Risker's checklist. It's referenced quite regularly at the WMF but I hadn't looked at it in the context of this project, so I'm going to do that now! Samwalton9 (WMF) (talk) 13:10, 30 August 2023 (UTC)
 * Thanx, good to hear that the WMF has really picked up on the issue of integrating with history/tracking/oversight/other systems.
 * I, and many other editors, tend to break out in hives when a project like Gather even suggests the idea of plugging non-wikipage user-generated content into the system. Content might carry a minor's phone number, leaving little tolerance for uncertainty in our toolkit for tracking and elimination. Even adding constrained logged actions makes me itchy, knowing that I can't begin to list everything it needs to support or interact with. Such as watchlisting, which I missed. Good catch. Anyway, enough rambling :) Alsee (talk) 16:05, 30 August 2023 (UTC)
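To make the JSON-page approach concrete: a configuration page in the style of MediaWiki:GrowthExperimentsConfig.json might look like the sketch below. Every key name and value here is an assumption for illustration, not the real Automoderator schema:

```python
# Hypothetical on-wiki JSON configuration page and a validation pass;
# key names are invented for illustration.

import json

config_page_text = """
{
    "enabled": true,
    "threshold_level": "high",
    "skip_namespaces": [2, 4],
    "talk_page_message": "Your edit was reverted; report false positives here."
}
"""

config = json.loads(config_page_text)


def validate(config):
    """Reject obviously malformed configurations before they take effect."""
    assert isinstance(config["enabled"], bool)
    assert config["threshold_level"] in {"very high", "high", "medium", "low"}
    assert all(isinstance(ns, int) for ns in config["skip_namespaces"])
    return config
```

Because the underlying page is an ordinary wiki page, every change automatically gets the edit history, diffs, reverting, and watchlisting behaviour Alsee asks for, with no novel tracking infrastructure needed.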

Measurement plan - input invited
Pinging folks who have participated here so far - 

Hi all! I have an update on data and evaluation for this project. We’ve been working on a measurement plan, detailing the various research questions we have and how we’ll evaluate each during the course of this project. We’ve just published a summary version on MediaWiki at Moderator Tools/Automoderator/Measurement plan, and would value your thoughts and opinions. We want to ensure that, as we evaluate whether this project is successful, we’re doing so in a way that you think is reasonable. Samwalton9 (WMF) (talk) 10:45, 4 October 2023 (UTC)


 * @Samwalton9 (WMF) the measurement plan of detecting "actual vandalism" and other FP/FN issues is going to require human review of the edit involved, is staff committing resources to this? — xaosflux  Talk 11:12, 4 October 2023 (UTC)
 * @Xaosflux There are a few ways we could tackle the question of defining 'actual vandalism' and false positives/negatives, and I don't think they necessarily require additional human effort. For example, looking at fast edit reverts by patrollers could give us data on edits that Automoderator missed, and patrollers reverting Automoderator directly implies a false positive. The other source of this data would be false positive reports - we know that this project is going to need a false positive report/review process, so collecting data on those reports and responses can also inform this data point. I appreciate that the latter two ideas here do involve extra effort, but our goal is that false positive review should require less effort than the original workload of reviewing edits would have (the project would seem to me to be a failure if it generates more work than it saves - this is something we'll capture in our guardrail metrics).
 * Ultimately I think the source of truth on this should come from community decisions (even if indirectly as I describe above) because I don't think staff on our team are necessarily best placed to make judgements on which edits should be reverted. We don't want this to generate additional work for patrollers, so we'll start with the most WMF-intensive methods first (i.e. data analysis) and see how far that gets us on answering these questions. Does this make sense or do you still have concerns? Samwalton9 (WMF) (talk) 11:35, 4 October 2023 (UTC)
 * FP are likely "worse" than FN for this sort of system; measuring and adjusting the crossover acceptance rate is going to be critical towards community acceptance. At the very least, manual random sampling will be needed in addition to relying on error reports so that the lack of an error report isn't counted as a true positive. — xaosflux  Talk 11:41, 4 October 2023 (UTC)
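The random-sampling approach xaosflux suggests could be sketched as follows; the sampling and review representation here are illustrative, not the actual measurement plan:

```python
# Sketch: estimate the false-positive rate from a reproducible random
# sample of the bot's reverts, rather than from error reports alone.

import random


def sample_for_review(revert_ids, sample_size, seed=0):
    """Draw a reproducible random sample of reverts for human review."""
    rng = random.Random(seed)
    return rng.sample(revert_ids, min(sample_size, len(revert_ids)))


def estimate_fp_rate(reviewed):
    """reviewed: list of booleans, True = reviewer judged the revert wrong."""
    return sum(reviewed) / len(reviewed) if reviewed else 0.0
```

The point of sampling is exactly the one raised above: without it, an unreported revert is silently treated as a true positive, so the measured false-positive rate is biased downward.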

Testing Automoderator
We know that one of the most important aspects of Automoderator will be its accuracy. Before we write a single line of code, we want to give interested editors the opportunity to test out Automoderator’s decision-making and share data and your thoughts with us on how accurate it currently is. Automoderator will make a decision based on how it's configured and a score from a machine learning model. While the model will get better with time through re-training, we’re also looking to enhance its accuracy by defining some additional internal rules. For instance, we’ve observed Automoderator occasionally misidentifying users reverting their own edits as vandalism. To improve further we’re seeking similar examples and appreciate your assistance in identifying them.

Please see the Testing subpage on MediaWiki for further details and instructions on how to evaluate past edits and judge Automoderator’s decisions! Samwalton9 (WMF) (talk) 10:47, 24 October 2023 (UTC)
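The self-revert rule mentioned above (not flagging users who undo their own edits) could take a form like this; the function signatures and threshold are assumptions for illustration:

```python
# Sketch of an internal rule exempting self-reverts from auto-revert;
# fields and threshold are hypothetical.

def is_self_revert(editing_user, reverted_edit_user):
    """True when a user is undoing their own earlier edit."""
    return editing_user == reverted_edit_user


def should_flag(editing_user, reverted_edit_user, score, threshold=0.99):
    """Apply the model threshold, but never flag a self-revert."""
    if is_self_revert(editing_user, reverted_edit_user):
        return False  # users cleaning up after themselves are not vandals
    return score >= threshold
```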