Wikipedia:WikiProject Wikidemia/Studies

Major proposed and ongoing studies of Wikipedia use should be listed and categorized here. Hypotheses, methods, and implementations can be discussed here if they are public, and results should be noted here once experiments have been completed.

Methodologies

 * Randomized subsets - a traditional way to run large-scale, unbiased experiments is to identify a class of visitors, editors, or pages; select a randomized subset of that class; and introduce a variation to that subset. Metrics can then be evaluated for both the subset and the entire class, and inferences drawn about the effects of the variation.


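 * Stratification - with or without randomized subsets, stratification involves identifying a metric and splitting a class into quintiles (or other strata) according to that metric, so that effects can be evaluated separately within each group.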


 * Pilot studies - running small, short initial studies to provide an example of how to run and evaluate a study, and to iron out implementation details specific to Wikipedia and its community.
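The randomized-subset and stratification methods above can be combined. The following Python sketch is illustrative only: the function names are hypothetical, members are any hashable IDs (pages, users, or visitors), and `metric` (the stratifying metric) and `outcome` (the measured result) are assumed to be dicts keyed by member. It compares the treated subset with the untreated members of each stratum, which is one common choice rather than the only one.

```python
import random
from statistics import mean, quantiles


def assign_random_subset(members, fraction, seed):
    """Shuffle a class of IDs and split it into (treated, untreated)."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    shuffled = list(members)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def stratify(members, metric, n_strata=5):
    """Split members into n_strata groups (quintiles by default) by metric value."""
    cuts = quantiles([metric[m] for m in members], n=n_strata)  # n_strata - 1 cut points
    strata = [[] for _ in range(n_strata)]
    for m in members:
        # Stratum index = number of cut points strictly below this member's value.
        strata[sum(1 for c in cuts if metric[m] > c)].append(m)
    return strata


def per_stratum_effects(members, metric, outcome, treated, n_strata=5):
    """Mean outcome of treated minus untreated members, computed within each stratum.

    A stratum with no treated or no untreated members yields None.
    """
    treated = set(treated)
    effects = []
    for stratum in stratify(members, metric, n_strata):
        t = [outcome[m] for m in stratum if m in treated]
        u = [outcome[m] for m in stratum if m not in treated]
        effects.append(mean(t) - mean(u) if t and u else None)
    return effects
```

For example, members could be page IDs, `metric` their edit count before the experiment, and `outcome` the edits made during it, with `treated, _ = assign_random_subset(pages, 0.1, seed=1)`. Looking at each stratum separately helps show whether a variation affects, say, rarely edited pages differently from heavily edited ones.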

First edit
What motivates readers to edit the site?


 * New page creation
 * Minor corrections - What encourages users to fix typos and other quick mistakes?
 * Sandboxes - What encourages users to try editing the sandbox, the introduction page, etc., or to find out about wiki syntax?

Recognition
What effect does recognition have on editing?


 * Sending notices to people when articles they've touched are featured on the main page.

Donations
There have been many good suggestions about ways to improve donation drives and fundraisers for Wikipedia; one specific idea follows. Please add other ideas in their own sections below.

What motivates readers to donate? How do changes to link placement and text affect donations?


 * During fund drives
 * During the rest of the year
 * Stratified by frequency of reading or contributing

What statistics could be gathered to help answer these questions? Both general and specific statistics are welcome.
 * Queries on the entire database, including full page histories
 * Queries on randomized subsets of pages/histories
 * Queries on randomized subsets of users
 * The number of readers per page, and their referrers, by date and time (anonymized)
 * The most popular search queries/terms, and the actions of those who entered them (anonymized)
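As one illustration of gathering the per-page reader statistics above in anonymized form, here is a minimal Python sketch that aggregates request-log records into hourly counts per page and referrer and never keeps the visitor's address. The record layout (timestamp, client IP, page, referrer) and the `min_count` suppression threshold are assumptions made for illustration, not a description of actual Wikimedia logs. It counts hits, not distinct readers.

```python
from collections import Counter
from datetime import datetime


def hourly_page_counts(log_records, min_count=1):
    """Aggregate hits per (page, referrer, hour) without keeping visitor identity.

    Each record is assumed to be a (timestamp, client_ip, page, referrer) tuple,
    a hypothetical layout. Cells with fewer than min_count hits are dropped,
    since very small counts can still point to individual readers.
    """
    counts = Counter()
    for timestamp, _client_ip, page, referrer in log_records:
        # The client IP is read but never stored; only the aggregate count is kept.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(page, referrer or "(none)", hour)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical sample records using documentation-only IP addresses.
    logs = [
        (datetime(2004, 2, 1, 10, 5), "192.0.2.1", "Cat", "google.com"),
        (datetime(2004, 2, 1, 10, 50), "192.0.2.2", "Cat", "google.com"),
        (datetime(2004, 2, 1, 11, 1), "192.0.2.3", "Cat", None),
    ]
    for key, n in sorted(hourly_page_counts(logs).items()):
        print(key, n)
```

Raising `min_count` trades detail for privacy: rare page/referrer/hour combinations disappear from the published table.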

What variations could be tried out to study the influences on contribution and donor relationships over time?
 * Changing the sitenotice shown to anonymous visitors
 * Changing how a group of editors who know about the experiment reacts to the target visitors
 * Changing the default site skin
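When a variation such as a changed sitenotice is shown to a randomized subset of readers, one standard way to check whether donation or click-through rates really differ between the two versions is a two-proportion z-test. The Python sketch below is an assumption about how such a comparison might be run, not an established Wikidemia procedure; the function name and the sample counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare the donation (or click-through) rates of variants A and B.

    Uses the pooled normal approximation; assumes reasonably large samples and
    at least one success and one failure overall (otherwise the standard error is 0).
    Returns (z, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)
```

With hypothetical counts of 100 donations from 10,000 readers who saw the old notice and 150 from 10,000 who saw the new one, `two_proportion_z_test(100, 10_000, 150, 10_000)` gives z of roughly 3.18 and a p-value well below 0.01.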

Retention
What do long-term contributors have in common? How do they interact with one another? What encourages these people to stay, or to leave?

Metrics and statistics

 * Popular pages (from en:WP); pages on English Wikipedia with more than 1000 hits in February 2004 (from Meta)
 * Most referenced articles (en:WP)
 * Breakdown of visitors by country (earliest data from the French cluster; cf. Submarine)