User:CLo (WMF)/sandbox

Metrics for en-wp SPI, May 2019
Data on administrator actions across Wikimedia projects has historically been scarce beyond basic statistics on edits and other easily logged actions. One long-standing issue with using such metrics to assess administrative work is that the actions are not weighted: a simple count of admin actions per month does not capture, for example, the hours of deliberation that can go into a single action versus a relatively easy judgement call. As part of the Community Health Initiative, and to further the future work of the Anti-Harassment Tools team, I conducted an analysis of Sockpuppet Investigations (SPI) on English Wikipedia as a prototype for future in-depth research into administrator workflows.

I chose SPI as a site for investigation because of its semi-structured nature. Under the section "Cases currently listed", a large bot-maintained table lists all current and recently closed cases, making it relatively easy to gather organized data ready for further cleaning. Additionally, we know from other reports, interviews, and conversations with administrators that multiple-account abuse is a common problem they tackle. Lastly, sockpuppet investigations make heavy use of the CheckUser tool, which is a current focus of some of my research efforts. SPI workflows are also quite complex, involving multiple functionaries with different permissions co-ordinating their work, as well as the occasional need to consult older records to compare against previous sockpuppet patterns. This complexity makes it an interesting site for study.

Methodology
To limit the scope of this investigation, I focused on a single month, May 2019. This was chosen because it was recent, but enough time had passed that I could reasonably expect the majority of cases to be closed, limiting my need to constantly update my data as case statuses changed. Although SPI has explicit archives, they have not been updated since 2010 and are kept for purely historical reasons, so I needed a different method for retrieving closed cases that had already been removed from the table.

To do this, I looked at the edit history of the SPI case table. Because the table is bot-maintained, I could look for edits with large negative byte changes, which indicate a batch of cases being removed. By sorting these removals by date, I could pull closed and archived cases to add to my data. I manually removed duplicate cases, with later updates taking precedence. The resulting information was put into Google Sheets, where a number of additional values were calculated.
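The history-scanning step above can be sketched in a few lines. In practice the revision list would come from the MediaWiki API (for example, `action=query&prop=revisions` with sizes and timestamps); the sample data and the `large_removals` helper below are hypothetical illustrations, and the -2,000-byte threshold is an assumed cut-off, not the one used in this study.

```python
def large_removals(revisions, threshold=-2000):
    """Return (timestamp, delta) pairs for revisions that removed at least
    |threshold| bytes. `revisions` is oldest-first; each dict has a
    'timestamp' and a 'size' (page size in bytes after that edit)."""
    flagged = []
    for prev, curr in zip(revisions, revisions[1:]):
        delta = curr["size"] - prev["size"]
        if delta <= threshold:  # large negative change: bot archived cases
            flagged.append((curr["timestamp"], delta))
    return flagged

# Hypothetical sample shaped like the SPI case table's history.
sample = [
    {"timestamp": "2019-06-01T00:00Z", "size": 90000},
    {"timestamp": "2019-06-02T00:00Z", "size": 91500},  # cases added
    {"timestamp": "2019-06-03T00:00Z", "size": 84000},  # big removal
    {"timestamp": "2019-06-04T00:00Z", "size": 84200},
]
print(large_removals(sample))  # → [('2019-06-03T00:00Z', -7500)]
```

Each flagged revision can then be diffed against its predecessor to recover the case rows that were removed.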

Calculated values
In addition to the information contained in the SPI case table, I calculated:

 * Time to close
 * Number of socks to date
 * Number of involved users
 * Cross-wiki status (true/false)
 * History over one year (true/false)

Time to close is the number of days between the case opening and the last edit on the case, calculated only for cases marked "CU complete", "CU declined", or "Closed". The name is a slight misnomer; more accurately, it is a composite measure of the time to CheckUser response. In any case, I used it as a rough measure of turnaround time.
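A minimal sketch of this calculation, assuming each case record carries its status plus ISO-formatted opening and last-edit timestamps (the field names and the sample case are hypothetical):

```python
from datetime import datetime

# Statuses for which "time to close" is defined, per the text above.
CLOSED_STATUSES = {"CU complete", "CU declined", "Closed"}

def time_to_close(case):
    """Whole days between a case's opening edit and its last edit,
    or None if the case is not in a closed status."""
    if case["status"] not in CLOSED_STATUSES:
        return None  # still open: no turnaround time yet
    opened = datetime.fromisoformat(case["opened"])
    last = datetime.fromisoformat(case["last_edit"])
    return (last - opened).days

case = {"status": "CU complete",
        "opened": "2019-05-03T10:00",
        "last_edit": "2019-05-05T16:30"}
print(time_to_close(case))  # → 2
```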

Number of socks to date was a simple count of all confirmed sockpuppets associated with that master account. IPs blocked for socking each counted for one; ranges were noted where possible but not counted, since they were rarely linked to a single account and I wanted to keep the count conservative. Only accounts marked "Technically indistinguishable", "Confirmed", or "Likely", or explicitly blocked for socking under that case, were counted as confirmed, and the confirmation had to come from a CheckUser or the closing admin. That is, any account that was either confirmed by CheckUser or determined to be a sock through behavioural evidence (for example, per WP:DUCK) counted as a confirmed sockpuppet. For the rare cases involving a long-term abuse (LTA) account, I counted the number of pages in the category, "Confirmed sockpuppets of ".

Number of involved users was a count of how many users participated in the logged case. If an account had multiple cases opened against it during May, I treated them all as a single long case for the purposes of this figure, since they occupied only one row in the table. "Involved" meant any user who left a message on the case.

Both Cross-wiki status and History over one year were simple true/false values. A cross-wiki case was one where the reported master operated sockpuppets on more than one project, as evidenced in their history. Any case with an archived case from before 1 May 2018 was considered to have a history of over one year.

Findings
In May 2019, there were 178 new cases opened at SPI. Of these, 137 (77%) were marked closed by 10 June, when the bulk of these cases were logged for this project. Fourteen of these cases (7.9%) were against a master account that had engaged in cross-wiki abuse, and 32 (18%) had a history of over a year. Only seven cases were both cross-wiki and had a history longer than a year.

The median days-to-close was 2, with a low of 0 and a high of 35. This suggests that SPI admins close cases fairly quickly; however, some cases close quickly because they are declined by CheckUsers, because the accounts or IPs involved have already been blocked, or simply because no sockpuppets are involved.

As for the number of sockpuppets per case reported in May 2019, the median was 2, though the mode was 0. Ten cases had over 50 confirmed sockpuppets each, representing a disproportionate timesink despite their small number. Based on earlier work, we can assume that the number of hours needed to handle a sockpuppet case goes up exponentially with each new alleged sock; these cases with large numbers of sockpuppets therefore take up far more time and energy than their count would suggest. However, the data suggests that most cases involve very few socks, or end with none confirmed at all.
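The summary statistics here are straightforward to reproduce with Python's standard library. The counts below are a made-up sample shaped like the findings (mode 0, median 2, a long tail), not the actual May 2019 data:

```python
from statistics import median, mode

# Hypothetical per-case confirmed-sock counts, illustrating the skewed
# distribution described above: most cases near zero, a few very large.
sock_counts = [0, 0, 0, 1, 2, 2, 3, 5, 12, 60]

print(median(sock_counts))  # → 2.0
print(mode(sock_counts))    # → 0
```

The gap between mode and median, and the distance from both to the maximum, is what makes a plain per-case count a poor proxy for workload.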

For this set, the median number of involved users was 3. Paired with observational data from reading through each case, a basic pattern emerges: cases are generally commented on by the reporting user, a CheckUser, and a closing administrator or CheckUser. This suggests that sockpuppet investigation is a highly collaborative endeavour, but it also points to the process's administrative overhead: multiple people with different permissions are required simply to open and close a case.

Key takeaways and future expansions
The results of this quick study, coupled with feedback from and interviews with current CheckUsers, indicate that while the median case has only a handful of sockpuppets, cases with more than 50 alternate accounts take a vast amount of time and energy to resolve and are a much greater pain point. The vast majority of cases are closed within a few days, but this speed is partly the result of a dedicated body of users who triage cases and decline those that do not require CheckUser to resolve. Not every wiki has such a group, and in this respect English Wikipedia may be more of an outlier than a baseline.

The number of cases declined suggests that this is a worthwhile line of inquiry to pursue. In the absence of further data on the number of CheckUser cases conducted across our projects, understanding how often the tool is requested versus how often it is actually used would help in assessing the impact of any improvements made to CheckUser.