User:TeaDrinker/Welcome study

Abstract
This pilot study investigates the effect of welcoming new users to Wikipedia before they edit. A randomly selected subset of 214 newly registered editors received a standard welcome template and their subsequent editing behavior was monitored for one week following registration. The welcome template seemed to increase the probability the editor would edit from 0.2, for non-welcomed users, to 0.298 for welcomed users, although this result was not statistically significant. Several other variables are measured.

Introduction
Wikipedia has the primary goal of writing an encyclopedia, and around this work a community of editors has developed. The community exists to better the encyclopedia, although little research has been done to quantify this contribution. This study looks at the effectiveness of welcoming new users, prior to their first edit. The primary measured outcome is whether or not the user makes an edit.

Wikipedia's founder, Jimbo Wales, included in his Statement of principles the point that "New users should always be welcomed," although this was probably intended in a broader sense. Welcoming new users with a welcome template provides a contact within the community, a short collection of links intended to help find information on how to contribute, and is intended to provide positive reinforcement for contributing. This study examines the effect of this particular form of welcoming on the subsequent editing of the new user.

Materials and methods
From 02:27 on 4 May 2007 to 19:59 on 10 May 2007, 214 newly registered editors were selected from the top of Special:log/newusers. After each selection, a pseudo-random number generator in R determined, with equal probability, whether or not the user would be welcomed. A total of 113 users were welcomed using the template, under a welcome headline, with the message signed by User:TeaDrinker. The intent was to welcome each user before they made any edits, although it is perhaps inevitable that some users did not see the message until after they had completed their first edit.
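The random assignment described above can be sketched as follows. This is a minimal illustration in Python rather than the R code actually used in the study, which is not reproduced here; the function name `assign_welcome`, the seed, and the usernames are all hypothetical.

```python
import random


def assign_welcome(usernames, seed=None):
    """Assign each username to 'welcome' or 'control' with probability 1/2 each.

    Every user gets an independent coin flip, so the two groups will
    usually differ somewhat in size.
    """
    rng = random.Random(seed)
    return {
        name: ("welcome" if rng.random() < 0.5 else "control")
        for name in usernames
    }


# Hypothetical placeholder usernames, not the accounts from the study.
new_users = [f"ExampleUser{i}" for i in range(1, 215)]
groups = assign_welcome(new_users, seed=2007)

welcomed = sum(1 for g in groups.values() if g == "welcome")
print("welcomed:", welcomed, "not welcomed:", len(new_users) - welcomed)
```

Because each user is assigned independently, unequal group sizes such as the 113 welcomed users out of 214 reported here are expected under this kind of scheme.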

The time of the editor's first edit was recorded as soon as it was seen, and the edit was subjectively classified into one of thirteen types. In cases of vandalism or creation of pages which fit the criteria for speedy deletion, the appropriate action was taken, including warning the user. One week after the user created an account, the number of (non-deleted) contributions was tallied (although in most cases, the other edits were not examined or classified).

The time a first edit was made was recorded even if the edit was subsequently deleted. Deleted pages, however, were not included in the end-of-week count. Since Wikipedia does not keep an accessible record of deleted edits indexed by user, some first edits were likely missed. An examination of the talk pages of editors in the study showed one instance of a notification of a speedy deletion which was not recorded.

Several users were excluded from the study at various stages. One username was an obvious attempt to impersonate an administrator; it was reported to administrator intervention against vandalism and subsequently blocked. No further data was collected on this user. Two users were welcomed by other editors, and one new editor made an edit before they were welcomed. These have all been excluded from analysis.

Although users were selected for inclusion in the study in a non-selective manner, the time of day at which they were selected was not random. Most editors were selected in the 9 hours between 1800 and 0400 (UTC) due to difficulties in acquiring a representative sample.

Results
Of the 211 editors included in the study, 112 were welcomed in the manner described above, while 99 did not receive a welcome. Of the 211 editors, 66 made an edit in the first week. The breakdown of edits by type is shown in the table.

The principal aim of the study is to determine whether welcoming users has an effect on their behavior. With this goal in mind, the principal metric is the number of users who make at least one non-vandalism edit. The proportion of users making non-vandalism edits was higher in the welcomed group, as shown in the table.

Under the null hypothesis that the proportions in the welcomed and non-welcomed groups are both equal to their combined proportion, a chi-squared test of the hypothesis that the welcomed group was more likely to edit is not significant.
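A test of this kind can be sketched in a few lines of Python. This is an illustration only: it assumes a Pearson chi-squared test on a 2x2 table without continuity correction, which may differ from the exact variant used in the study, and the function name `chi_squared_2x2` and the counts in the usage lines are hypothetical placeholders, not the study's data.

```python
import math


def chi_squared_2x2(edited_a, total_a, edited_b, total_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) comparing the proportion of editors in two groups.

    Expected counts come from the pooled (combined) proportion, which is
    the null hypothesis described in the text. Assumes the pooled
    proportion is strictly between 0 and 1.
    """
    n = total_a + total_b
    pooled = (edited_a + edited_b) / n

    statistic = 0.0
    for edited, total in ((edited_a, total_a), (edited_b, total_b)):
        observed = (edited, total - edited)
        expected = (total * pooled, total * (1 - pooled))
        for obs, exp in zip(observed, expected):
            statistic += (obs - exp) ** 2 / exp

    # Upper-tail probability of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only (NOT the study's data):
# 30 of 100 welcomed users edited, 20 of 100 non-welcomed users edited.
stat, p = chi_squared_2x2(30, 100, 20, 100)
print(f"chi-squared = {stat:.3f}, p = {p:.3f}")
```

The p-value returned here is two-sided in the sense that the statistic does not distinguish which group has the higher proportion.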

Other measures
There are other metrics which can be calculated from the data. In most cases, multiple hypotheses can be constructed for how each measure should be affected by the welcome message. Since many such hypotheses can be generated for any number of possible outcomes of the data, it would perhaps be misleading to do a formal hypothesis test on each of them, at least without careful correction for multiple comparisons. As such, these data are best viewed as exploratory and perhaps indicative of directions for future research, rather than as definitive conclusions.
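To illustrate what correcting for multiple comparisons involves, here is a minimal sketch of the Bonferroni adjustment, one common and conservative choice. No such correction was applied in this study; the function name `bonferroni` and the p-values below are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of
    tests performed, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values from three exploratory tests (not study results).
raw = [0.04, 0.20, 0.01]
adjusted = bonferroni(raw)
print(adjusted)
```

With three tests, a raw p-value of 0.04 would no longer fall below a 0.05 threshold after adjustment, which is the sense in which uncorrected exploratory tests can mislead.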

Time of edits
One measure which can be calculated is the time to the first edit. This measure seemed to have two distinct behaviors; many people edited within an hour or so of registering. If an editor did not make their first edit within roughly an hour, however, they tended to wait for hours or days before making their first edit. There was some variability in the outcomes by whether the editor received a welcome, as shown in the table. These data suggest that welcoming users increases the probability they make an edit in that session.

Total number of edits
Total number of edits is a somewhat complicated measure since it is dependent on how the user edits (see Edit counting). Some users tend to make many small edits, while others make multiple substantive changes in one edit. Among editors who edited at least once (although there may be some zeros due to deleted pages) and whose first edit was not vandalism, the median number of edits was higher in the welcomed group. The two editors with the highest counts were also both in the welcomed group, standing out at 44 and 46 edits.

Since not every edit was checked, it is possible that some of these edit counts include vandalism made after the user's first edit.

Discussion
The trend in the principal measure of behavior seems to indicate that users probably do respond positively to receiving a welcome, although the p-value does not rule out chance. Approximately 20-30 percent of users make a non-vandalism edit in the first week after registering, with most of these edits occurring immediately following registration.

Using the estimates found in this study, a welcomed user is approximately 10 percentage points more likely to edit within the first week than a non-welcomed user (0.298 versus 0.2). Since this was a randomized, controlled study, a real difference in proportions between the welcomed and non-welcomed groups would probably be caused by the welcome itself, although the observed difference was not statistically significant. On these estimates, for every 100 welcomes given, about 10 more editors would be expected to contribute to the project.
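The arithmetic behind this estimate can be written out explicitly. The sketch below uses only the two proportions reported in the abstract; the variable names are illustrative.

```python
# Proportions reported in the abstract.
p_control = 0.2     # non-welcomed users who edited
p_welcomed = 0.298  # welcomed users who edited

# Absolute difference in proportions (the "10 more per 100" figure).
risk_difference = p_welcomed - p_control

# Relative increase over the non-welcomed baseline.
relative_increase = risk_difference / p_control

# Expected additional editors per 100 welcomes, on these estimates.
extra_per_100 = 100 * risk_difference

print(f"absolute difference: {risk_difference:.3f}")
print(f"relative increase:   {relative_increase:.0%}")
print(f"extra editors per 100 welcomes: {extra_per_100:.1f}")
```

The distinction matters when reading the figure: the absolute difference is about 10 percentage points, while the relative increase over the non-welcomed baseline is closer to one half.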

Further work, however, is needed to confirm this result and possibly to investigate other questions, including:
 * Which welcome template is most effective;
 * What effect a welcome template has if given after the user's first edit;
 * Whether other measures are more effective in measuring the welcome message's effect.