Wikipedia:WikiProject Video games/Newsletter/20210104/Feature2

Follow-up: VG editing in a time of COVID

 *  Submitted by Thibbs 

Well, here we are closing 2020 and looking forward to 2021. The WP:VG Newsletter feature for 1st Quarter 2020 obliquely touched on the topic of COVID, which at that time had caused shutdowns in numerous countries, had begun normalizing mask wearing, and had claimed the lives of nearly 80 thousand people worldwide. Italy and Spain were suffering greatly, and the USA, France, and the UK were closely following them. Politicians were arguing over how best to control things, and Wikipedia was reporting on it all (using RSes, of course). The tongue-in-cheek thought floated by the Newsletter was that with mandatory shutdowns, we editors (at least the "non-essentials") would have more time to play games. And although the proposition remains undiscussed on Wikipedia, the structure here is more amenable to discussing writing than gaming. Talk pages are filled with discussions of editing (structure, notability, sources, etc.), not playing. But it seems that the more time you spend on Wikipedia, the more obvious it becomes that editing is a kind of playing. Crafting an article isn't all that different from building with mined redstone while avoiding the creepers. Heck, according to Wikipedia's Department of Fun, even proper rule-based games are possible within the sandbox.

And that raises a question that can be answered by looking at the data rather than relying on self-reporting: Does widespread mandatory free time (e.g. from pandemic lockdowns) have any effect on video-game-related editing? There are a number of ways we can begin unpacking the question, but let's first consider the possible variables. The most obvious effect of a pandemic would be illness or even death. While illness would slow or hamper yearly statistics, a pandemic death, such as that of editor User:Dmitrismirnov on 9 April 2020, is a final silencing of an editor. Wikipedia is not good at tracking the people behind the usernames (which is probably a good thing), but there can be no question that some unknown number of editors have died of COVID this year. Many editors and their family members in the USA, Brazil, India, Mexico, and so many other countries have been affected by what are now 1.7 million deaths. All of this would have a depressing effect on editing rates. On the other hand, the freedom and idle hands that come with forced furlough may spur editing rates to increase. And there are other considerations like economic status, family structure, political instability, scandals, protests, etc., etc. Without dipping into WP:OR territory, perhaps it is most sensible to simply examine the data and draw our own conclusions.

The 1st Quarter analysis was based in large part upon data from the Newsletter. This time we will be using the WikiProject's data. The initial dataset covers the total number of WP:VG pages (articles, categories, templates, etc.) and spans the entirety of WP:VG's tracking stats, from 30 June 2006 to the present. And it is at this point that we start noticing the unusual shape of the curve. Dates like 3/2011 and 1/2010 provoke immediate curiosity. Did WP:VG really grow by 20k pages in one month? How is this possible? The curve is very rough and not at all familiar from the normal editing experience. Looking into the COVID period that is the focus of our investigation, we see similar eruptions at 8/2020 and at 11/2019 (each increasing the content count by ~1200-1700 pages, or ~40-60 pages daily). Breaking the totals down, we see two factors at work here. First, the set of page categories being counted grows over time. When new categories of pages, like "Category" (which started at 1556 pages) and "Redirect" (starting at 962 pages), are added to the total count, we see artificial jumps in the curve. Secondly, and perhaps more significantly, the "File" category (images, audio, film, and other media) is far more erratic than the other categories. The 20k spike of 3/2011 and both of the COVID-era jumps are primarily attributable to the irregularities of "File" counts.
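For readers who want to spot these eruptions themselves, here is a minimal sketch that flags month-over-month jumps above a threshold and converts each jump to an approximate daily rate (dividing by ~30 days, the same arithmetic behind "~1200-1700 pages, or ~40-60 pages daily"). The totals, the `flag_jumps` helper, and the 1000-page threshold are all hypothetical and made up for illustration; they are not WP:VG's actual tracking stats.

```python
# Hypothetical month-end page totals (illustrative numbers, NOT WP:VG's real data).
monthly_totals = {
    "2019-09": 100_000,
    "2019-10": 100_150,
    "2019-11": 101_650,  # an artificial jump of the kind seen at 11/2019
    "2019-12": 101_800,
}


def flag_jumps(totals, threshold):
    """Return (month, increase, approx. pages per day) for every
    month-over-month increase larger than `threshold` pages.
    Assumes `totals` is ordered chronologically."""
    months = list(totals)
    flagged = []
    for prev, cur in zip(months, months[1:]):
        increase = totals[cur] - totals[prev]
        if increase > threshold:
            # ~30 days per month, matching the "~40-60 pages daily" estimate
            flagged.append((cur, increase, round(increase / 30)))
    return flagged


print(flag_jumps(monthly_totals, threshold=1000))
```

With these made-up numbers only the 1500-page jump into 2019-11 is flagged, at roughly 50 pages per day; the ordinary 150-page months fall well under the threshold.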

To resolve these irregularities, the easiest solution is to remove all earlier portions of the curve that didn't count all page categories (e.g. the "Redirect" category and the "Category" category). From the data we see that the last category to be added was "Featured Media (FM)", which was first tracked on 1 January 2016. This shrinks the viewing area to 5 years but perhaps produces a clearer picture of the possible COVID effect. Or does it? Unfortunately, when starting at 2016 we find ourselves looking at a monumental curve, with each datapoint perched on 70k+ earlier pages. To see the curve more closely, it is a simple matter to subtract 70k from each datapoint, which yields a much more sensible curve focused specifically on the changes from 2016 to 2021.
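The two moves described above (drop everything before the "Featured Media (FM)" start date, then subtract a 70k baseline) can be sketched in a few lines. This is a minimal illustration assuming the stats are available as (date, total) pairs; the `rebase` helper and the sample values are hypothetical, not the WikiProject's actual numbers.

```python
from datetime import date

# Hypothetical (date, total pages) pairs; illustrative values only.
series = [
    (date(2015, 12, 1), 69_500),
    (date(2016, 1, 1), 71_000),
    (date(2016, 2, 1), 71_200),
    (date(2016, 3, 1), 71_450),
]

FM_START = date(2016, 1, 1)  # first date on which every page category is counted
BASELINE = 70_000            # constant offset subtracted to zoom in on the changes


def rebase(series, start, baseline):
    """Keep only points on or after `start`, then subtract `baseline` from each."""
    return [(d, total - baseline) for d, total in series if d >= start]


print(rebase(series, FM_START, BASELINE))
```

Subtracting a constant does not change the shape of the curve or any month-to-month change; it only moves the axis so the chart no longer has to stretch all the way down to zero.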

So the modern landscape is starting to take shape, but the inclusion of the erratic "File" category still produces strange spikes rather than the kind of trend one would expect from traditional article editing, etc. At this point it's difficult to see whether there is anything out of the ordinary in 2020. The next step is to remove the "File" category, but we will have to work from the original data, so we will accomplish this in 2 steps. First we will remove the "File" pages, and then we will flatten the curve by subtracting 40k to avoid another monumental curve.
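Here is a minimal sketch of that two-step process, assuming each month's stats are stored as per-category page counts. The category names mirror those mentioned above, but the counts and the `total_without` / `flattened_series` helpers are hypothetical, chosen only to show how a "File" surge can dominate the total.

```python
# Hypothetical per-category page counts for two months (illustrative only).
snapshots = {
    "2020-07": {"Article": 30_000, "Category": 5_000, "Redirect": 9_000, "File": 28_000},
    "2020-08": {"Article": 30_150, "Category": 5_010, "Redirect": 9_050, "File": 29_400},
}


def total_without(snapshot, excluded):
    """Sum all category counts except the categories named in `excluded`."""
    return sum(count for cat, count in snapshot.items() if cat not in excluded)


def flattened_series(snapshots, excluded=("File",), offset=40_000):
    """Step 1: drop the excluded categories from each monthly total.
    Step 2: subtract a constant offset to avoid another monumental curve."""
    return {month: total_without(s, excluded) - offset for month, s in snapshots.items()}


print(flattened_series(snapshots))
```

With these made-up numbers the full total rises by 1,610 pages from 2020-07 to 2020-08, but only 210 of those are non-"File" pages, which is the kind of distortion removing the category is meant to strip out.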

Ah, at last a "calm" curve. So has there been any effect from COVID? Well, actually there does seem to be a subtle boost. The enterprising reader with a straightedge can see that the curve is not quite linear. There seems to be a subtle dip in 2019 and a subtle rise in 2020 that makes up for a good portion of that dip. Perhaps switching from the simple content increase to the rate of increase would be more illuminating. This can be accomplished by taking the derivative at each point on the curve, which in practice means the change from one datapoint to the next. And the result is... well, it's pretty difficult to read, to be honest. It's far too noisy to make sense of. Perhaps if we switched to a running average to calm down the spikes and crevasses?
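For monthly datapoints, the "derivative" is just the difference between consecutive months, and a running average smooths those differences. Below is a minimal sketch; the 3-month trailing window and the sample totals are assumptions for illustration (the article does not state which window was used), and `monthly_change` / `running_average` are hypothetical helper names.

```python
def monthly_change(values):
    """Discrete 'derivative': the change from each month to the next."""
    return [b - a for a, b in zip(values, values[1:])]


def running_average(values, window=3):
    """Trailing moving average over `window` months; the first
    window - 1 points have no full window and are skipped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical flattened monthly totals (illustrative only).
totals = [1000, 1300, 1360, 1690, 1750, 2080]
changes = monthly_change(totals)
print(changes)
print(running_average(changes, 3))
```

The raw changes swing between 60 and 330 pages from month to month, while the 3-month averages stay between 150 and 240. A wider window smooths more but blurs short-lived bursts, so the window size is a judgment call.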

That's more like it! We seem to have achieved a chart demonstrating a comprehensible rate of change from 2016 to 2021. The subtle dip in editing may actually have started a bit earlier than 2019. The reduced rate of growth may have begun in mid-2018, but by 2019 the rates had generally become sluggish, and then in 2020 the rate of growth seems to have increased dramatically, with months like March, May, and July 2020 bringing WP:VG back to a monthly 200+ average. The reason for this apparent acceleration is obviously hard to pin down, but COVID seems to have been one of the main constants of this year, and it is likely that it has had some effect on the numbers. 2020 has been a decided growth year.