User:Alzarian16/RfA participation by year

If you're interested in both statistics and Requests for adminship, you might well have seen such works as User:Majorly/RfA/Stats, User:WereSpielChequers/RFA by month and the plethora of pages under User:NoSeptember/The NoSeptember Admin Project. If you haven't, read them – they're quite interesting. All three projects presented historical and/or chronological data on areas such as the number of admins passing RfA each month, the number of active admins, resysopping of former admins and others. However, there is one area which has yet to be examined in detail: total participation at RfA. This is addressed below.

Data
Data comes from Successful requests for adminship and Unsuccessful adminship candidacies (Chronological). RfAs which were withdrawn or closed early are ignored as unrepresentative since they didn't run for the whole time period; only those listed on the latter as "consensus not reached", "no consensus", "failed" or "unsuccessful" are included in the unsuccessful column. A separate column tallies the number of RfAs closed early in each year following a (very good) suggestion by another user. Since unsuccessful RfAs were not recorded until 2004, the study begins there.

Data is presented in a year-by-year format. I worked out most of it manually; since this involved looking through every completed RfA since records began, it's quite likely I got something wrong somewhere, although I did double-check everything where possible. In addition, the original source pages were created manually and may contain some mistakes of their own. I fixed a few that I spotted, so hopefully this shouldn't be too big an issue.

Summary

 * The total number of !votes cast in all RfAs has been decreasing since reaching a peak in 2006. The large percentage decreases show that participation in RfA as a whole is falling at a quicker rate than participation in other areas of the encyclopedia.
 * The mean number of !votes on successful and unsuccessful RfAs for each year is higher than in the previous year in all but one case. The percentage increases in recent years show that participation in individual RfAs is increasing at a quicker rate than participation in other areas of the encyclopedia. This apparent contradiction is explained by the falling number of completed RfAs.
 * The standard deviation for each year increased dramatically until 2006, then began to level out. This suggests that the spread of participation has stopped widening: any single RfA in 2010 is likely to be no further from the mean than one in the three previous years.
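As a minimal sketch of how the first two points fit together, the Python below uses made-up !vote counts (not taken from the data on this page) to show a year's total falling while the per-RfA mean rises. The `summarise` helper and both lists are hypothetical, and the use of the population standard deviation is an assumption, since the page does not say which form was used.

```python
from statistics import mean, pstdev


def summarise(vote_counts):
    """Summarise one year's completed RfAs from a list of per-RfA !vote counts."""
    return {
        "rfas": len(vote_counts),
        "total": sum(vote_counts),
        "mean": mean(vote_counts),
        # Population standard deviation; an assumption, not the page's stated method.
        "spread": pstdev(vote_counts),
    }


# Hypothetical years: many small RfAs, then a few large ones.
earlier = summarise([40, 50, 60, 50, 40, 60, 50, 50])
later = summarise([90, 110, 100])

for label, year in (("earlier", earlier), ("later", later)):
    print(label, year)
```

With fewer RfAs completed in the later year, the total drops even though each individual RfA attracts more !votes.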

Graphs

 * Bar chart of average successful and unsuccessful !votes against time: Wikipedia RfA !votes by year.jpg
 * Graph of changes in the total number of !votes at all RfAs in a year, including a polynomial approximation of the changes: Total !votes at completed RfAs.jpg
 * Number of active users against time (for comparison): WM strat plan WP Active editors.png
 * Graph of changes in average !votes at successful RfAs with time, including a polynomial approximation of the changes: Wikipedia successful RfA !votes by year.jpg
 * Graph of changes in average !votes at unsuccessful RfAs with time, including a polynomial approximation of the changes: Wikipedia unsuccessful RfA !votes by year.jpg

I generated the graphs using Microsoft Excel 2003, except for the chart of active users, which was borrowed from an earlier WMF study. The polynomial approximations were calculated using an Excel tool which allows the use of any polynomial function up to and including order 6. For total !votes and average !votes in successful RfAs, a function of order 4 appeared most accurate; for average !votes in unsuccessful RfAs, a function of order 3 was closer to the true values. This difference is discussed below.
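For readers who want to try something similar outside Excel, here is a rough sketch of a polynomial fit in plain Python. It assumes the Excel tool performs an ordinary least-squares fit; the function names `fit_polynomial` and `evaluate` are my own, and the yearly values are invented for illustration rather than copied from the data.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit. Returns coefficients, lowest power first."""
    n = order + 1
    # Normal equations A c = b, where A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Years 2004-2010 as offsets 0-6 (small x values keep the arithmetic stable).
year_offsets = [0, 1, 2, 3, 4, 5, 6]
# Hypothetical yearly averages, NOT the figures from this page.
hypothetical_means = [20, 35, 60, 70, 80, 95, 110]

for order in (3, 4):
    coeffs = fit_polynomial(year_offsets, hypothetical_means, order)
    # Offsets 7 and 8 lie beyond the data, so these values are extrapolations.
    print(order, [round(evaluate(coeffs, x), 1) for x in (7, 8)])
```

Different orders can agree closely over the known years and still diverge once evaluated beyond them, which is why the choice of order matters for any prediction.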

Association between participation and success
This section contains a chi-squared test for association between the number of !votes and success for completed RfAs in the year 2010. If there is no association between the two, the test statistic calculated from the four categories in the two-way tables below will approximately follow the chi-squared distribution. The test uses the 5% significance level.

Expected values are all larger than 5, so the chi-squared distribution is a good approximation to the distribution of the test statistic.

X² = 0.1011 + 0.1755 + 0.4413 + 0.7313 = 1.4492

A 2×2 table has 1 degree of freedom, so the critical value at the 5% level can be read from a table of values to be 3.841. Since the value of X² is lower than this, there is no significant evidence of an association between the two variables, despite the differences between the two tables.
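The calculation above can be sketched in a few lines of Python. The four contributions and the critical value are the ones quoted in this section; the function `chi_squared_2x2` and the example table of counts are my own illustration and do not reproduce the observed 2010 table.

```python
CRITICAL_5_PERCENT_1_DF = 3.841  # critical value for 1 degree of freedom at the 5% level


def chi_squared_2x2(observed):
    """Return (statistic, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under no association: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# The four contributions quoted in this section.
reported_contributions = [0.1011, 0.1755, 0.4413, 0.7313]
reported_statistic = round(sum(reported_contributions), 4)
is_significant = reported_statistic > CRITICAL_5_PERCENT_1_DF
print(reported_statistic, is_significant)

# Hypothetical table (rows: low/high participation; columns: passed/failed).
example_statistic, example_expected = chi_squared_2x2([[30, 20], [20, 30]])
print(example_statistic, example_expected)
```

The same function also returns the expected counts, which makes it easy to check the "all larger than 5" condition before trusting the approximation.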

Extrapolation

 * As an RfA can only pass if the support rate is above 70%, an average successful RfA in 2010 would have had to generate a minimum of 82 support !votes. In 2009, this value was 70; in 2004 it was 16. An average failed RfA in 2010 would have generated a minimum of 34 oppose !votes; this value was 31 in 2009, and 8 in 2004.
 * The trend lines, calculated using polynomial approximations of the discrete data available, suggest that participation in individual RfAs will continue to increase for the foreseeable future, and that the number of !votes on successful RfAs will increase faster than those on unsuccessful ones. If present trends continue, the number of !votes at successful RfAs will continue to rise indefinitely, while those at unsuccessful RfAs will begin to level off and ultimately reach an asymptotic value. It is not known for how long this model will remain valid.
 * Participation in RfA as a whole is very volatile. Between 2004 and 2006, the percentage increases were very large (an increase of over 200% in one year, followed by an increase of 100% the next). Since 2006, numbers have been falling; the rate of decrease is not uniform, either in absolute terms or as a percentage. This suggests that future behaviour will be difficult to predict. The polynomial approximation for total !votes suggests that this number has reached its lowest point and will now begin to rise again; an approximation of order 5, which matched the known data slightly better than the order 4 approximation, was rejected because it predicts negative values from 2012 onwards, which is impossible.
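The minimum support and oppose figures in the first point above follow from simple arithmetic, sketched here in Python. The helper names are hypothetical, the 70% pass mark is treated as "at least 70% of all !votes" (an assumption about rounding and about how neutrals are counted), and the totals of 117 and 113 are illustrative inputs chosen to be consistent with the 2010 figures, not values copied from the data.

```python
PASS_MARK_PERCENT = 70  # support rate quoted above as the pass threshold


def min_supports(total_votes, pass_mark=PASS_MARK_PERCENT):
    """Smallest number of supports reaching the pass mark (integer ceiling)."""
    return (total_votes * pass_mark + 99) // 100


def min_opposes(total_votes, pass_mark=PASS_MARK_PERCENT):
    """Smallest number of opposes taking up the remaining share (integer ceiling)."""
    return (total_votes * (100 - pass_mark) + 99) // 100


# Illustrative totals only; assumed, not read from the table.
print(min_supports(117))  # supports needed on a 117-!vote RfA
print(min_opposes(113))   # opposes needed on a 113-!vote RfA
```

Integer arithmetic is used for the ceiling so that the result does not depend on floating-point rounding.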

Disagree? Tell me!