User:Dcoetzee/The Wikipedia Adventure/Prototype, assessment, and pilot study design

Derrick Coetzee

Prototype
The Wikipedia Adventure is an interactive tutorial for new Wikipedia users.

In the first milestone, we implemented a preliminary version of the first lesson, in which a user is led through the process of making their first edit to an article. They are asked to identify an error in an article, and then to make an edit fixing it. Their fix is verified and displayed. They are then asked to visit the real Wikipedia, use the Random article feature to find an article containing a problem, and repair it in the same manner. Preliminary feedback was sought from collaborators. A live demo is available from:


 * http://wikipediaadventure.moonflare.com/

Implementation
The implementation is based on MediaWiki, the open-source wiki software used by Wikipedia, together with JavaScript extensions. By importing content, extensions, and configuration settings, we precisely replicated the appearance of Wikipedia's Main Page at a fixed point in time, along with at least one article (George Tupou V).



We then used JavaScript to insert an instructions box on top of the existing content and to disable all links and forms except the one the user is directed to use. The script also adds an arrow pointing to the current element of interest, and validates each user action to ensure it was performed correctly.

The preliminary implementation was demonstrated to several online collaborators who completed it and offered ad hoc feedback, such as:
 * Indicating when the highlighted item is outside the window by adding an instruction to scroll
 * Making the interaction more friendly
 * Providing hints if the user gets stuck

Remaining implementation work
The remaining introductory lessons need to be created. Because the first lesson involves only trivial edits, multiple-choice options for text to insert are not yet implemented. These will be needed when dealing with, for example, neutral point of view discussions.

A log-in system with a database backend needs to be implemented in order to track and record progress over time, and to associate each tutorial user with their corresponding Wikipedia account for later analysis.

No score system or on-wiki rewards have yet been implemented.

It is important to separate lesson content from the JavaScript code, so that non-programmer collaborators can participate in lesson creation.

Assessment

Quantitative
The two main goals of the tutorial are retaining editors and encouraging high-quality contributions. Because Wikipedia records all contributions of every user, we have a unique opportunity to evaluate user performance retrospectively, by manually examining each user's contributions and rating them according to fixed criteria. This strategy is preferable to making assessment part of the tutorial, partly because it directly targets the factors of interest, and partly because tutorial participation is voluntary and any unnecessary content could lead to higher abandonment rates.

By asking users to link their Wikipedia Adventure account to their real Wikipedia user account (for example by posting an authentication code), it becomes possible to systematically track the progress of each user through the tutorials, and their resulting performance on the project. At the same time, we will log all responses, timing, and score information for each user completing the tutorial. If feasible, mouse motion may also be recorded. Not all of this information will be used at present, but it may be useful later.
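The account-linking step could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names, the `TWA-` prefix, and the in-memory store are all hypothetical, and fetching the page text from Wikipedia is left out.

```python
import secrets

# Hypothetical in-memory store: tutorial account name -> issued code.
# A real system would presumably keep this in the database backend.
pending_codes = {}


def issue_code(tutorial_account):
    """Create a random, hard-to-guess code for the user to post on-wiki."""
    code = "TWA-" + secrets.token_hex(8)
    pending_codes[tutorial_account] = code
    return code


def verify_code(tutorial_account, page_text):
    """Return True if the code issued to this account appears in page_text.

    page_text is assumed to be the wikitext of a page the claimed
    Wikipedia user was asked to edit (for example their user page).
    """
    code = pending_codes.get(tutorial_account)
    if code is None:
        return False
    return code in page_text
```

Since other editors can also edit most pages, a fuller check would likely confirm which account made the edit that added the code, not only that the code is present.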

A random sample of at least 10 new users will be selected and (with the assistance of online collaborators) the contribution history of each will be evaluated according to the following rubric.

First, each user will be assigned to a participation category based on their level of participation. Only users in the same participation category can be directly compared. Each category carries a participation score, listed in parentheses; the number of users in each category, weighted by these scores, can be used to assess the overall participation level of a group.


 * Low participation (0): User made no edits to Wikipedia article content, or made only minor edits such as spelling corrections.
 * Medium participation (1): User made some major content contributions to Wikipedia, but had little to no interaction with other editors.
 * High participation (2): User made major content contributions to Wikipedia and interacted substantially with other editors.

The users are then evaluated in the following areas. If an item does not apply to a particular participation category it is omitted.


 * 1) Correct use of syntax and formatting (medium and high)
   * 0: User uses no markup/formatting or only uses it incorrectly.
   * 1: User uses only the most basic formatting, or makes many mistakes while using formatting.
   * 2: User is competent with basic to intermediate syntax and formatting, but might make a few errors.
   * 3: User uses advanced syntax or formatting with few errors.
 * 2) Comprehension of and adherence to policies, including:
   * 2a) Verifiability and referencing (medium and high)
     * 0: User never added any references.
     * 1: User included a few references, but did not supply enough information (e.g. used bare URLs) or covered only a small portion of the topic.
     * 2: User's writing was well-referenced, but could have used a few more references and/or supplied more information about some sources.
     * 3: User included a large number of diverse references and consistently supplied full information about each one.
   * 2b) Neutral point of view (medium and high)
     * 0: User's writing is consistently blatantly biased, showing only one side of a topic and using strongly biased language.
     * 1: User's writing mixes biased content with neutral content, or describes neutral facts using heavily biased language.
     * 2: User's writing contains some bias or omissions, but overall presents an acceptably neutral account of the topic.
     * 3: User's writing contains very little bias and effectively describes and integrates multiple points of view.
   * 2c) Copyright policy (medium and high)
     * 0: User only contributed content that violated copyright.
     * 1: User's contributions included major blatant copyright violations, but also contributed original content successfully.
     * 2: Some of the user's contributions were too closely paraphrased or some quotes were too long, but there were no major, blatant copyright violations.
     * 3: All of the user's contributions were original or within the bounds of fair use.
 * 3) Civil interaction with other editors (high)
   * 0: The user was blocked for attacking other editors.
   * 1: The user had severe issues interacting with other editors, but either partly resolved them or is working through them.
   * 2: The user had some negative interactions, but interactions were positive overall. They may have been provoked and/or apologized for any incivility.
   * 3: The user showed extraordinary patience and calmness even in the face of heated disputes in which they had a personal stake.

These values will be added together to produce, for each user, an editing performance score representing the user's overall performance relative to other members of their participation category. To establish a baseline for comparison, the same rubric will be applied to a sample of at least 10 new users who did not participate in the tutorial.
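The scoring rule can be sketched as below. The short item keys (`syntax`, `verifiability`, and so on) and the function name are hypothetical labels chosen for illustration; which items apply to which category follows the rubric above.

```python
# Rubric items that apply to each participation category.
# The key names are illustrative shorthand, not part of the rubric itself.
APPLICABLE_ITEMS = {
    "low": [],
    "medium": ["syntax", "verifiability", "neutrality", "copyright"],
    "high": ["syntax", "verifiability", "neutrality", "copyright", "civility"],
}

# Participation score attached to each category.
PARTICIPATION_SCORE = {"low": 0, "medium": 1, "high": 2}


def editing_performance_score(category, ratings):
    """Sum the 0-3 ratings for the rubric items that apply to this category.

    Ratings for items that do not apply to the category are ignored.
    """
    total = 0
    for item in APPLICABLE_ITEMS[category]:
        rating = ratings[item]
        if not 0 <= rating <= 3:
            raise ValueError(f"{item} rating must be between 0 and 3")
        total += rating
    return total
```

Under this sketch a medium-participation user can score at most 12 and a high-participation user at most 15, which is one reason scores are only comparable within a category.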

The overall participation level of each group will be compared by constructing a confidence interval for the difference in the mean participation scores of the two groups. If the lower end of this interval is positive and reasonably large (say at least 0.3) it suggests that the tutorial might increase participation.
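As an illustration, the interval could be computed along these lines. This sketch uses a normal approximation from the Python standard library; with samples as small as 10 users a t-based interval would be somewhat wider, so treat this as an assumption made for simplicity. The scores in the example are made up.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_interval(tutorial_scores, baseline_scores, confidence=0.95):
    """Approximate confidence interval for mean(tutorial) - mean(baseline)."""
    diff = mean(tutorial_scores) - mean(baseline_scores)
    # Standard error of the difference, allowing unequal variances.
    std_err = math.sqrt(
        variance(tutorial_scores) / len(tutorial_scores)
        + variance(baseline_scores) / len(baseline_scores)
    )
    # Normal critical value; a t critical value would be larger for small samples.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_err, diff + z * std_err


# Made-up participation scores (0, 1, or 2) for two groups of 10 users.
tutorial = [2, 1, 1, 2, 0, 1, 2, 1, 1, 2]
baseline = [0, 1, 0, 1, 0, 0, 1, 2, 0, 1]
low, high = mean_difference_interval(tutorial, baseline)
suggests_increase = low >= 0.3
```

With these made-up numbers the difference in means is 0.7 and the interval runs from roughly 0.10 to 1.30: the lower end is positive but below the 0.3 threshold, so the result would not count as suggestive under the criterion above.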

Because not all users will complete the entire tutorial, some may benefit more than others. To cope with this, we can use a rough metric of completion, such as the number of lessons completed, and compare it to the editing performance score of the user, using zero lessons for users who did not participate in the tutorial. A standard correlation statistic such as the coefficient of determination (r²) can be computed for each participation category; if the correlation is positive and r² is close to 1, it suggests that users who complete more of the tutorial may perform better during real editing.
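A minimal sketch of this calculation, assuming Pearson's correlation is the intended statistic (the function names are illustrative):

```python
import math


def pearson_r(lessons_completed, performance_scores):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(lessons_completed)
    if n != len(performance_scores) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(lessons_completed) / n
    mean_y = sum(performance_scores) / n
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(lessons_completed, performance_scores)
    )
    sxx = sum((x - mean_x) ** 2 for x in lessons_completed)
    syy = sum((y - mean_y) ** 2 for y in performance_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared(lessons_completed, performance_scores):
    """Coefficient of determination for a simple linear relationship."""
    return pearson_r(lessons_completed, performance_scores) ** 2
```

Because squaring discards the sign, a value of r² near 1 can also arise from a strong negative relationship, so the sign of r should be checked alongside it.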

This is not a controlled study, because participants in the tutorial are self-selected. It is nevertheless expected to offer insight into whether the tutorial is potentially effective at all, and low scores in particular rubric areas may point to lessons that need improvement.

Qualitative
During sign-up for the tutorial, users will enter an e-mail address, allowing us to contact them later and ask whether they feel the tutorial was valuable. Because the reply rate is expected to be low, we would contact all users who entered an e-mail address rather than a sample. We would follow up after a few days to a week, allowing them time to make some contributions to the project and get initial feedback from other users. The following survey e-mail text will be used:

Subject: The Wikipedia Adventure: feedback survey

Dear Sir/Madam,

You recently participated in a trial of a Wikipedia tutorial for new editors, The Wikipedia Adventure. We'd like to get your brief feedback on how to improve it. If you could answer some of the following questions, it would help us to help other new editors in the future.

1. Do you feel like the tutorial was useful in helping you contribute to Wikipedia? Why or why not?

2. Are there portions of the tutorial that were especially effective or especially ineffective?

3. Did the tutorial explain things you already knew?

4. Was there anything you wanted or needed to know that the tutorial did not explain?

5. In addition to the tutorial, what other resources did you find useful in helping you complete tasks?

6. If you could build your own lesson/level for the tutorial, what would it be about?

7. Do you have any other comments or suggestions regarding the tutorial?

Thank you for your participation and for your interest in contributing to Wikipedia!

- The Wikipedia Adventure development team

For users whose responses suggest more details may be interesting, we can follow up in an e-mail conversation eliciting more information.

This data would be summarized in the project presentation and in a public report, which would be used to direct further development on the tutorial, as well as to generate anecdotal "success stories" to motivate interest.

Pilot study design
The tutorial will be pilot-tested with a mixture of experienced Wikipedia users, who can assess the content for accuracy and provide detailed feedback, and new users, who will be used for assessment as above. The two groups will be distinguished by the number of edits their Wikipedia account has already made.

Experienced Wikipedia users can be recruited from my set of online collaborators as well as through a variety of Wikipedia community channels (IRC, Village pump, known contacts). A link to the pilot can be posted, and anyone who follows it will be permitted to participate; each participant's associated Wikipedia account and complete quantitative information will be logged. Feedback from experienced Wikipedians will be solicited in a public, free-form discussion on Wikipedia; by the nature of the project, Wikipedians are accustomed to offering feedback in such a setting. By avoiding prompts, we can encourage more disruptive feedback that may result in positive major changes. This discussion will also serve to identify the community's sense of the value of the tutorial, potential negative interactions with project management, advice for how to advertise the tutorial, and advice about where it will be most helpful.

The most effective candidates for new users will be those who have recently expressed an interest in editing but have not yet edited much. Such users can be identified by automatically processing Special:RecentChanges to identify users with few edits, and/or by identifying new authors of submissions at Articles for creation. New users will be surveyed by e-mail with questions, as above.
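One possible way to automate the first approach is through the MediaWiki web API, which exposes recent changes and per-user edit counts. The sketch below only builds the request URLs and filters an already-parsed JSON response; the 20-edit threshold, the function names, and the choice of the English Wikipedia endpoint are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia API.
API_URL = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(limit=500):
    """URL listing the users behind recent edits by registered, non-bot accounts."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "user",
        "rcshow": "!anon|!bot",
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def edit_counts_url(usernames):
    """URL asking for the edit count of each named user."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),
        "usprop": "editcount",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def users_with_few_edits(users_response, max_edits=20):
    """Names of users in a parsed list=users response with at most max_edits edits.

    Entries without an edit count (for example unknown user names) are skipped.
    """
    return [
        user["name"]
        for user in users_response["query"]["users"]
        if "editcount" in user and user["editcount"] <= max_edits
    ]
```

Keeping the filtering step separate from the network requests makes the selection rule easy to test and to adjust as the pilot's definition of a "new user" is refined.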

We intend to follow an iterative approach in which several small pilot studies are done as more lessons are developed, in order to collect useful feedback as early as possible. Some ad hoc feedback has already been given. The study presented for this class will constitute the first major pilot study.