Wikipedia talk:Don't protect Main Page featured articles/December Main Page FA analysis

This is an analysis of the effect of being on the Main Page of Wikipedia on featured articles. It considers all the featured articles in the first week of December 2006.

Purpose
The aim of this analysis is to provide some 'facts' over an extended period of time in order to inform debate at Wikipedia talk:Don't protect Main Page featured articles. In particular, it is interested in the effects of those users who would be prevented from editing an article by semi-protection. Its aim is not to provide conclusive data upon which changes to the policy must be made, nor is it to prove or disprove the rationale of the policy. The month and articles chosen will affect the results, so it cannot be claimed that what is true for these 31 articles at this time is true for all FAs all the time.

What is being counted and analyzed
The following things will be tracked or assessed. The numbered variables refer to their key in the tables:
 * From two diffs, the overall changes in the quality of an article: a 'before and after' diff covering the 24 hours on the Main Page, and one showing the change between 'immediately after' and 48 hours after the article is removed from the front page.
 * The % of IP edits that are vandalising
 * The % of IP edits that are beneficial to the article, in either reverting vandalism or improving quality
 * The % of IP non-reverting edits that, beneficial or not, are in good faith
 * Total time with vandalised versions of the page visible
 * The amount of time spent protected or semi-protected
 * Reasons for any such lockdowns on the article (shown in notes section)
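
As a rough illustration of how the three percentage variables could be computed from a day's coded IP edits, here is a minimal Python sketch. The category labels, the `(category, is_revert)` tuple layout and the function name are assumptions made for the example, not the format used in the study's tables. It also assumes that variable 3 uses non-reverting IP edits as its denominator, which is one reading of the definition above.

```python
from collections import Counter

# Hypothetical category labels, chosen for this example only.
VANDALISM = "vandalism"
BENEFICIAL = "beneficial"
GOOD_FAITH_NOT_BENEFICIAL = "good faith but not beneficial"


def ip_edit_percentages(edits):
    """Compute the three tracked percentages for one day's IP edits.

    edits: list of (category, is_revert) tuples, one per IP edit.
    """
    total = len(edits)
    if total == 0:
        return {"vandalising": 0.0, "beneficial": 0.0, "good_faith": 0.0}

    counts = Counter(category for category, _ in edits)

    # Variable 3 only looks at edits that are not reverts (assumed denominator).
    non_reverts = [category for category, is_revert in edits if not is_revert]
    good_faith = sum(1 for category in non_reverts if category != VANDALISM)

    return {
        "vandalising": 100.0 * counts[VANDALISM] / total,
        "beneficial": 100.0 * counts[BENEFICIAL] / total,
        "good_faith": 100.0 * good_faith / len(non_reverts) if non_reverts else 0.0,
    }


# Made-up sample day: 6 vandalising edits, 2 beneficial reverts,
# 1 beneficial content edit, 1 good-faith edit that did not help.
sample = (
    [(VANDALISM, False)] * 6
    + [(BENEFICIAL, True)] * 2
    + [(BENEFICIAL, False)]
    + [(GOOD_FAITH_NOT_BENEFICIAL, False)]
)
print(ip_edit_percentages(sample))
```

The sample data is invented purely to exercise the function; it does not correspond to any day in the study.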

Template vandalism
Vandalism of templates isn't included in this study (see note on 8 December, below). That's because such vandalism is done by editing the templates, not by editing the article directly, so it does not appear as edits on the history page of the article. Nor is it possible to easily use the edit history to determine when the damage was fixed, since that also could have been done by an edit to a template, not the article.

(Edits done by those trying to repair the damage by removing such templates, or substituting another template, DO appear as edits; it's fairly safe to say that an anonymous editor wouldn't understand templates or their repair.) So template damage (a) is not included as IP damage, below, either by count or by time, and (b) may increase the edit count of registered editors trying to fix the damage (edits made directly to templates to fix them, again, aren't included in the article history).

The issue of template damage appears to be in the process of being resolved by a change in policy (5 December, but just being noticed) that allows semi or full protection of templates in Main Page articles, and presumably there will be a change in process by those who handle FAs being moved to the Main Page. John Broughton |  Talk 15:30, 8 December 2006 (UTC)

Methodology and issues
There have been various issues that have come up:


 * Often an IP editor will make a series of consecutive edits. These have been counted separately.


 * Where a 'save' produces a page that is the same as the previous 'save' (that is, a diff shows no change), this is counted as "Good faith but not beneficial", no matter what the user's other edits have been.


 * Where a user vandalizes the page and then immediately reverts that vandalism, these are counted as two "Good faith but not beneficial" edits.
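
The coding rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the `Save` record, its field names and the label strings are hypothetical, the reviewer's own judgement of each edit is taken as an input, and "immediately reverts" is read as "the same user's very next save restores the earlier text".

```python
from dataclasses import dataclass

# Hypothetical label strings for this example.
VANDALISM = "Vandalism"
BENEFICIAL = "Beneficial"
GOOD_FAITH_NOT_BENEFICIAL = "Good faith but not beneficial"


@dataclass
class Save:
    user: str    # IP address or user name
    before: str  # page text before the save
    after: str   # page text after the save
    judged: str  # reviewer's own judgement of the edit


def code_saves(saves):
    """Return one label per save (consecutive edits are counted separately)."""
    labels = [s.judged for s in saves]

    # A save whose diff shows no change is always "Good faith but not beneficial".
    for i, s in enumerate(saves):
        if s.before == s.after:
            labels[i] = GOOD_FAITH_NOT_BENEFICIAL

    # Vandalism that the same user undoes in the very next save:
    # both edits are coded "Good faith but not beneficial".
    for i in range(len(saves) - 1):
        first, second = saves[i], saves[i + 1]
        if (
            first.user == second.user
            and first.judged == VANDALISM
            and first.before != first.after
            and second.after == first.before
        ):
            labels[i] = labels[i + 1] = GOOD_FAITH_NOT_BENEFICIAL

    return labels


# Made-up history using documentation-range IP addresses.
history = [
    Save("192.0.2.1", "A", "A!", VANDALISM),    # vandalises...
    Save("192.0.2.1", "A!", "A", BENEFICIAL),   # ...then self-reverts
    Save("198.51.100.7", "A", "A", BENEFICIAL), # save with no change
    Save("198.51.100.7", "A", "B", VANDALISM),  # ordinary vandalism
]
print(code_saves(history))
```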

There are probably more things that can be studied. If you want to suggest them, feel free (or add a column to the table, or add comments to the table).

Everyone should feel free to participate in this study.

Initial test on 1st of December
Key:


 * 1 The % of IP edits that are vandalising
 * 2 The % of IP edits that are beneficial to the article, in either reverting vandalism or improving quality
 * 3 The % of IP non-reverting edits that, beneficial or not, are in good faith

Format changed slightly starting on the 2nd; new user edits were no longer included.


 * For December 1, users who vandalized were checked to see if they were newly registered (that is, registered for less than 4 days), since such vandalism would have also been blocked by semi-protection, and this was added to the counts. This approach was not used after the 1st, since it was realized that this would (a) add significant time, and (b) present problems with consistent counts (IP edits are obvious; "new user" edits are not, requiring further checking of each editor).

2 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page
 * There were 124 edits today, 58 of which were by IP editors. A total of 89 edits were either vandalism or reverting of vandalism.
 * This page's introduction changed notably over the day on the Main Page. This was not affected by IP edits and remained of a good quality, though possibly inferior to the starting version. Elsewhere in the article, the only changes were syntactical or improving wikilinks. IP edits were relatively low while the article was on the front page and - for the most part - reverted quite quickly. A small number of repeating vandals were responsible for a large proportion of the IP edits today.
 * There were no vandalizing edits by newly registered users.
 * The changes in the 48 hours after the article left the Main Page were not related to any of the edits from that day that were in place as of the end of that day.

3 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page
 * There were 193 edits today, 47 of which were by IP editors
 * The overall changes to the page were almost entirely syntactical or wikilinks. Today was the first day of the study that saw a substantial number of vandalising edits by newly registered users, who would also be blocked by semi-protection of the article. (New editors are not included in the counts below.)
 * Newly registered editors contributed 29 vandalizing edits (24 by a single userID) which resulted in the main page appearing vandalized for an additional 14 minutes.
 * There were very few beneficial IP edits today. A large proportion of the vandalising IP edits were conducted by one or two users.

4 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page
 * There were 280 edits today, 96 of which were by IP editors. The number would have been higher if the page had not been semi-protected for the last 8+ hours it was on the Main Page.
 * There were numerous edits by newly registered editors, many of which fall under the category of "Good Faith but not beneficial". There were a total of 8 clearly vandalizing edits by newly registered users, which resulted in only an additional 2 minutes during which the article appeared vandalized. Two templates were vandalized for an additional two minutes of vandalism.
 * Space for overall analysis

[1] Of the beneficial edits, only one was content-related; the text that was added was not in the article at the end of the 24-hour period.

5 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page (PLACEHOLDER)
 * There were 535 edits today, 206 of which were by IP editors
 * The Beneficial IP edits were almost all reverts (and often incomplete).
 * Vandalism to this article was of a particularly vile and distressing nature due to the subject.
 * Very hard to gauge how long the article spent vandalised without per-second history.
 * Three templates were vandalized, contributing to an additional 10 minutes of unreverted vandalism.
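
Because history timestamps are only shown to the minute, the time an article spent vandalised can only be bracketed, not measured exactly. The following Python sketch shows one way to get rough bounds. It assumes timestamps are truncated to the minute, so each vandalised stretch may be up to one minute shorter or longer than its nominal length. The function name and the interval format are hypothetical, and overlapping stretches are merged so they are not counted twice.

```python
from datetime import datetime, timedelta


def visible_vandalism_bounds(intervals):
    """Bracket the total time a page was visibly vandalised.

    intervals: list of (vandalised_at, reverted_at) datetimes at minute
    resolution. Returns (nominal, lower, upper) as timedeltas.
    """
    merged = []
    for start, end in sorted(intervals):
        # Merge only stretches that overlap at minute resolution.
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    nominal = sum((end - start for start, end in merged), timedelta())
    # Each merged stretch is uncertain by up to one minute either way.
    slack = timedelta(minutes=1) * len(merged)
    lower = max(nominal - slack, timedelta())
    return nominal, lower, nominal + slack


# The 13:31 to 14:08 stretch noted for 7 December: nominally 37 minutes.
stretch = [(datetime(2006, 12, 7, 13, 31), datetime(2006, 12, 7, 14, 8))]
nominal, lower, upper = visible_vandalism_bounds(stretch)
print(nominal, lower, upper)
```

With many short stretches in one day the slack adds up quickly, which is why a day with very frequent vandalism is much harder to time than a quiet one.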

[1] One vandalizing edit every 90 seconds.

6 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page (PLACEHOLDER)
 * There were 323 edits today, 111 of which were by IP editors
 * There were 13 vandalizing edits by newly registered users, which added 6 minutes to the amount of time the article appeared vandalized. Additionally, there were 4 vandalizing edits by seemingly established user accounts, for an additional 3 minutes.
 * Space for overall analysis

7 December

 * Before and after diff, 24 hours on Main Page
 * Before and after diff, 48 hours after leaving Main Page (PLACEHOLDER)
 * There were 266 edits today, 107 of which were by IP editors
 * There were 9 vandalizing edits by newly registered users, adding 5 minutes to the time the article appeared vandalized. There were 2 vandalizing edits from a seemingly established user, causing 1 additional minute of vandalism.
 * Analysis of change:

Notes:
 * [1] Three of the "Beneficial" edits involved content changes. Two, by 66.215.83.151 at 14:39 and 14:41, were reverted by a pair of edits about 14 minutes later, possibly because the two changes, if looked at separately (as it appears was done), don't automatically appear constructive.
 * [2] At 13:31, 68.250.159.195 changed the name of a person mentioned in the article; this was not reverted until 14:08, 37 minutes later, which possibly distorts this number, since the vandalism was quite subtle and would not have made any difference to the average reader.
 * [3] At 18:15 there was an addition to the article by 66.73.14.2 that I think was excellent (it even included a cite). I've counted it as "Beneficial". It was reverted two minutes later by 82.135.113.226, with an edit summary of rv vandalismus, which I've coded as "Good Faith but not beneficial".  A review by an expert on the subject matter would be appreciated; I may have the coding backwards.  (The information appears to have never been added back to the article, or, if it was, was deleted again.)


 * The apparent content addition with reference by 66.73.14.2 appears to be made up. I can't find any source similar to what the IP described, and for what it's worth, a Google search for author "Heigendozen" returns naught! If you don't mind, I've added one to "vandalize" and subtracted one from "good faith but not beneficial" in the 1800 block. – Outriggr § 01:08, 12 December 2006 (UTC)
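
For reference, the daily totals reported in the sections for 2 to 7 December can be turned into IP shares with a few lines of Python. This is only a convenience sketch: the numbers are copied from the daily bullet points above, while the dictionary layout and function name are made up for the example. Note that the 4 December figure is depressed by the 8+ hours of semi-protection mentioned for that day.

```python
# (total edits, edits by IP editors), as reported in the daily sections.
DAILY_TOTALS = {
    "2 December": (124, 58),
    "3 December": (193, 47),
    "4 December": (280, 96),
    "5 December": (535, 206),
    "6 December": (323, 111),
    "7 December": (266, 107),
}


def ip_share(total_edits, ip_edits):
    """Percentage of a day's edits that were made by IP editors."""
    return 100.0 * ip_edits / total_edits


for day, (total, ip) in DAILY_TOTALS.items():
    print(f"{day}: {ip} of {total} edits by IPs ({ip_share(total, ip):.1f}%)")

all_edits = sum(total for total, _ in DAILY_TOTALS.values())
all_ip = sum(ip for _, ip in DAILY_TOTALS.values())
print(f"Overall: {all_ip} of {all_edits} ({ip_share(all_edits, all_ip):.1f}%)")
```

These shares cover all IP edits, good and bad alike; they say nothing by themselves about the proportion that was vandalism.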

Template vandalism
As of now, (7:22 PST), the templates associated with Macedonia (terminology) were vandalized for a total of 1 hr 58 min. Some of the time might be overlapping.--DaveOinSF 15:23, 8 December 2006 (UTC)


 * See new section, above, on what isn't being counted. John Broughton  |  Talk 15:39, 8 December 2006 (UTC)


 * Well, regardless of whether it's supposed to be counted, it was, in fact, counted, so might as well add this as a note. Certainly this is not particularly salient to the specific aims of this project, but is a useful bit of information when assessing the overall amount of vandalism that readers did, in fact, encounter.--DaveOinSF 18:16, 8 December 2006 (UTC)


 * Apologies; I can see how my comment could be read as suggesting that your contribution isn't useful. In fact, it's extremely useful if someone should argue about the need to at least semi-protect templates. Also, I suspect (from what I've read) that template damage was more disturbing to readers (and embarrassing to Wikipedia) than most of the vandalizing edits directly to the page.   John Broughton  |  Talk 18:45, 8 December 2006 (UTC)

End of counting phase of the study?
I'm hereby proposing to end the counting portion of this analysis, having collected seven days of data, above (the 1st of December is in a slightly different format, but could be reanalyzed if need be). The counting, by three different editors, has produced very similar percentages across a full week of Main Page articles (MPAs). Counting of edits for more days will, I think, add little, unless there are strong arguments by others that (somehow) these seven days are not representative, or not enough of a sample. If so, then we can do more, starting with the 8th, or can go back into late November, or do whatever best responds to such arguments.

Instead of more counting, I think we should turn our attention to discussing what has been found, and, if need be, what additional counts or information might be needed to provide further answers. For example, we now know that MPAs are often semi-protected, often for hours at a time, so it's possible to compare what happens during the periods of semi-protection to what happens during similar periods of the day when not semi-protected. John Broughton |  Talk 15:39, 8 December 2006 (UTC)