Wikipedia:Page Curation/2023 Moderator Tools project/Interview research

In January–February 2023 we interviewed six English Wikipedia New Page Patrollers to learn more about new page patrolling. Interviews consisted of a small number of questions and then typically around 30 minutes of screen share where the participant patrolled new pages. Our primary goals with these interviews were to make sure that our team had a shared understanding of how new page patrolling works on the English Wikipedia, and to identify potential areas for improvement grounded in today’s patrolling practices. We’ve summarised our findings below. Thanks again to the interview participants for making the time to meet with us.

If you have thoughts, questions, additions, or corrections, please share them on the talk page!

-- Sam Walton and Claudia Lo, Moderator Tools (WMF)

Key takeaways

 * New page patrollers require a vast knowledge of English Wikipedia’s content policies and guidelines. This is a barrier for new patrollers.
 * New page patrol is time consuming and can be frustrating, but this is usually not because of software issues.
 * Patrollers employ a number of different modes and tactics for patrolling articles. The boundaries of what is required of patrollers can be unclear.
 * Patrolling can be a draining experience primarily consisting of negative interactions between editors.
 * Domain expertise is an important factor in which patrollers review which articles. There is a noteworthy feature gap in PageTriage around filtering and finding guidance for a topic.
 * Patrollers use a wide array of tools, gadgets, and scripts to patrol articles. These could be more closely integrated with the PageTriage software.

Policies and guidelines
More so than for almost any other kind of contribution to Wikimedia projects, new page patrollers need to be familiar with a vast array of content policies and guidelines. In reviewing a page, a patroller may need to evaluate the potential for copyright violations, assess source reliability, select an applicable speedy deletion criterion, refer to subject-specific notability guidelines (SNGs), and make other decisions which require a deep knowledge of Wikipedia’s rules.

This is challenging for newer patrollers - the points of confusion we heard about included whether a subject-specific notability guideline existed for a particular topic, and what to do in uncommon situations like copy-and-paste moves. Needing to learn the full depth and breadth of content notability policies and guidelines - and the amount of time it can take to review an article - is likely the biggest barrier to growing the number of patrollers.

Because of the large number of tools, policies, and sources which need to be cross-referenced when patrolling articles, it can take a substantial amount of time to patrol a page, particularly if it is long, or on a topic or in a language the patroller is unfamiliar with (more on this below). Although it’s clear that there are areas in which software improvements would be beneficial to patrollers, they are unlikely to have a substantial impact on the average time taken to patrol a given article. On a related note, patrolling new pages does not seem to be a task well suited to mobile devices - it is not easily turned into a ‘microtask’, and often requires substantial multitasking.

Because of the time commitment and deep policy knowledge required of patrollers, it can be frustrating when other editors disagree with a tag placed by a new page patroller. In particular, some patrollers are often frustrated when administrators decline speedy deletion tags - the patroller may have put a substantial amount of work into making the decision to add the tag, and might feel that the administrator has made less effort. See further discussion on this perspective here.

Beyond tagging a page for speedy deletion, if a page doesn’t meet obvious deletion criteria, but isn’t clearly good enough to be marked as reviewed, individual patroller expertise and judgment play a large part in what happens to the article. Draftifying, tagging but not marking as reviewed, nominating at Articles for Deletion, and simply skipping the article were all routes we saw employed for borderline cases. This can be confusing for new patrollers who don’t yet have the expertise and judgment to make this decision.

Modes of patrolling
In speaking to New Page Patrollers, we identified three different ‘modes’ of patrolling - from the front, back, or middle of the NewPagesFeed article queue, with distinctive methods for each. Each patroller also has their own style of patrolling.

In ‘Front of the queue’ patrolling, patrollers are looking at the most recently created articles. This can include pages which are still in the one-hour grace period, but generally refers to looking at articles which are just over one hour old. In this region, patrollers are focusing on identifying the worst articles - particularly those which should obviously be tagged for speedy deletion without delay (e.g. copyright violations or attack pages). This is generally considered to be the normal mode of patrolling, with patrollers ensuring that as few readers as possible see the worst content. Patrollers often make use of the NewPagesFeed ‘possible issues’ filters and respective in-line tags to identify pages to prioritise here. Front of the queue patrolling gives rise to a tension between acting on bad content as soon as possible, and allowing editors time and space to develop a page into a better state. Patrollers have to make judgment calls about how bad an article is - is the creator likely to improve it, or is this its final state?

At the opposite end, ‘Back of the queue’ patrolling involves sorting the queue by oldest pages and working back. This mode of patrolling is highly investigative, as patrollers generally need to uncover why an article is at the back of the queue before doing anything else. This is less clear than at the front of the queue, because articles created in place of a redirect display with the date of their original creation in the NewPagesFeed. Patrollers investigating the back of the queue often find that other editors are already engaged with these articles, reverting back to redirects or debating their status. Past the articles which are old redirects, the oldest articles are generally those which other patrollers have left unreviewed. These articles are often challenging pages to review, requiring more time investment or domain knowledge than average. Some of these articles have been created by an editor with a conflict of interest - patrollers tend to have negative feelings about working on these articles because they feel like they’re helping to create promotional content.

Patrollers also sometimes have a patrolling method which involves a starting point in the middle of the queue. One patroller reviews pages which are 7 or 14 days old, for example, and another prioritises the oldest unreviewed redirects.

Beyond position in the queue, we found that patrollers also differ in how they patrol. Some aim to clear the queue by processing as many articles as possible - they perform surface-level analysis and add tags or mark as reviewed before moving straight on to the next article. Other patrollers are slower and patrol more deliberately, with some looking for articles with potential which they can help bring up to scratch.

The boundaries of what patrollers should be doing when reviewing an article can be unclear. While some patrollers will only engage with the content present on the article page itself, others will go further to research the notability of the subject independently, or even improve the article. Similarly, some patrollers will add categories to uncategorised pages, inbound links to orphaned pages, or WikiProjects to pages without any. This isn’t inherently a problem, but taken with the previous paragraph, it highlights that patrollers are likely to have differing perspectives on what it means to patrol newly created pages.

Another insight we gained on the topic of how patrollers work is that marking a page as ‘reviewed’ is seen by many patrollers as marking the page as ‘free of issues’. Although new page patrolling may once have been designed as a process focused on lightweight triage, patrollers today generally do not mark a page as reviewed until it clearly demonstrates the notability of the subject and is free of, at least, sourcing issues. Patrollers often tag an article with maintenance tags but then do not mark it as reviewed, leaving it in the queue until it gets reviewed by another patroller or the issues are resolved. While this is intended to give the article creator time to improve the article, it also has the effect of leaving that article in the queue of unreviewed pages, despite having been reviewed. This may or may not be desirable, but we thought it worth highlighting.

Negative user interactions
In many areas of the encyclopedia, positive discussions and feedback are commonplace, and discussions are generally civil. In patrolling new pages, however, patrollers are often on the receiving end of a deluge of negative comments and complaints. When a page is tagged with issues or deleted, the page creator may complain to or insult the patroller, and even in more collegial discussions, the patroller probably still needs to spend a substantial amount of time re-explaining Wikipedia’s policies and guidelines. This isn’t balanced out by much in the way of positive feedback, which can make patrolling a draining activity.

Beyond specific negative interactions, it can also be frustrating to see a lack of progress on pages which have been tagged with particular issues. Some patrollers tag articles with issues and then wait some time before checking back in. If the issues haven’t been resolved, the page may then be moved to the Draft namespace or tagged for deletion. With the primary aim of tags (as described by patrollers) being to educate newer editors on the work that is required, it can be discouraging to see them ignored. It is unclear to what degree tags are helpful to new editors - they often link to lengthy and opaque policy pages rather than pages designed for new contributors. Further research could be beneficial to learn how new editors perceive the NPP process and templated messaging.

Importance of domain expertise and guidance
One of the biggest barriers to reviewing an article is the patroller’s familiarity with the article topic. Many patrollers will skip articles on topics they aren’t familiar with, and/or will focus their attention on certain topics. Knowing the subject-specific notability guidelines (SNGs) for a topic area, and which sources are or aren’t reliable, can be a major factor in assessing an article’s quality. Similarly, articles with a particular geographic focus or with sources in an unfamiliar language can also be challenging. Reliable source lists published by WikiProjects (e.g. WikiProject Video games/Sources) can be helpful, but patrollers need to know where these (and SNGs) can be found for a given topic. Special:NewPagesFeed does not allow patrollers to filter by topic, and the Curation toolbar does not have links or guidance on finding topic-specific help pages.

Some patrollers avoid certain kinds of articles because they expect that reviewing one will open up a substantial amount of additional work. For example, an article on a sporting event might have a dozen other articles which are part of a series - reviewing one article makes the patroller feel obligated to review the entire series, which can cause a motivation block.

Patrollers can gain domain expertise over time if they patrol articles on certain topics. Doing research to determine whether a source is reliable once or twice can help that patroller make decisions about the same topic and sourcing in the future. Some patrollers have gained expertise on sourcing on topics they would otherwise have no interest in.

On a similar topic, over time patrollers develop an understanding of which Wikipedia editors are trustworthy, and which to look out for. They might find an editor who always writes high quality articles on a particular subject and know that they don’t need to spend much time patrolling that editor’s articles. Such editors could be nominated for the autopatrolled user right, but doing so takes time and requires the patroller to break out of their patrolling flow. On the other hand, they might repeatedly encounter an editor who writes articles on a particular subject but never adds sources.

Lastly, in the specific example of handling redirects, broad language expertise can be useful. We saw examples where redirects were transliterations of the page name into other languages, or where understanding the original language of a piece of media allowed a patroller to quickly find appropriate subject-specific databases or sources to assess notability. This rarely seemed to be a make-or-break factor, but having a familiarity with multiple languages or multiple writing systems seemed to facilitate the NPP workflow.

Tools and scripts
Every patroller we spoke to uses multiple other scripts and gadgets when patrolling pages. We’ve listed the ones we saw or heard about in our interviews below:

Very common (most patrollers)
 * Twinkle
   * Most patrollers use the Articles for Deletion feature in Twinkle rather than the one in PageTriage because it has more features (e.g. deletion sorting and edit preview).
 * MoveToDraft (either the MPGuy2824 or Evad37 version, both are actively used)
   * Patrollers prefer one script over the other based on their features/familiarity.
 * Earwig’s Copyvio Detector
   * This tool was described as invaluable, but is very slow.
 * copyvio-check
 * rater

Less common (some patrollers)
 * TinEye (reverse image search)
 * Custom Google searches for a topic area
 * DisamAssist
 * reFill

These tools could be better integrated into the NPP process. Some are direct additions to the PageTriage software which could be incorporated into the extension itself. Others are external tools which each patroller needs to independently learn about and integrate into their workflow - we could explore adding a community-configurable list of tools to link to from the Curation toolbar, or investigate other ways of linking these tools more closely.

In terms of PageTriage, we found that patrollers generally didn’t run into significant issues using the extension. During our interviews, no patrollers encountered flow-breaking bugs or confusion with the software, and the bulk of their time was spent reviewing the article content and sources and referring to policy, rather than navigating the Curation toolbar. It’s worth noting that the 2022 WMF letter stated that PageTriage had a number of serious unaddressed bugs - these were fixed by volunteer developers in the months between that letter’s publication and this research.

Specific feature requests and observations
While conducting these interviews, we came across a number of clear and generally small potential improvements to the PageTriage software. Some were directly requested by participants, and we think some will solve problems we noticed when watching editors patrol. We’ve filed these as new tasks on Phabricator already and are noting them here. Please join the discussion in Phabricator on these suggestions:
 * Add a 'Select all' button for Potential issues in NewPagesFeed
 * Allow users to filter NewPagesFeed based on time since creation
 * Clarify whether reviewed message should be signed by the patroller or not
 * Enable patrollers to choose recipient of reviewed message
 * Enable editors to add links to other tools in the Curation Toolbar
 * Allow patrollers to add notes for other patrollers via the Curation Toolbar
 * See percent similarity to top deleted revision
 * NewPagesFeed should only display 'Blocked' note for non-partial blocks

Existing tickets which this research surfaced as potential priorities:
 * Add ORES topic prediction to the NewPagesFeed
 * Add a link to the copyvio score in the information flyout
 * Automatically review pages that were reverted to a previously reviewed state