Wikipedia:WikiProject Wikidemia/Quant/ReadershipStats

A lot of these may be untenable or redundant. Thought it'd be best to get as many ideas out there as possible.
 * NB: it would help to add a priority (1-5) before each item to indicate how interesting/important you think it is. +sj+

What among these is both interesting and attainable using some subset (or obfuscated full set) of readership data? What are we missing? We would really like to hear ideas on the matter. Can we think of a clearer way of structuring these?

Reading behavior of contributors:

 * If we want to understand the motivation of contributors, it may be helpful to see the history of their relationship with the articles they contribute to.
 * Whose edits are most long-lasting? Are edits of higher 'quality' if users travel more widely in Wikipedia as they compose their contribution (i.e. in the time period leading up to the submission)? Or do they survive longer if users don't travel far outside the local network of pages linked to a given article?
 * Are there clusters of contributors who are (or are contributors in general):
 * more likely to edit if the article is new to them?
 * more likely to edit if they have been to topologically nearby articles?
 * more likely to edit if they have been to articles in the same category?
 * more likely to edit if they view more articles per unit of time?
 * Are there typical phases to a contributor's relationship to WP? Do surges in pageviews by an individual precede increases in edits, or vice versa?
 * Do contributors contribute more in response to conversations on their talk page, or conversations in the talk pages of various articles?
 * (Does interpersonal interaction encourage or discourage contribution?)
 * Are contributors influenced by the linguistic content of the pages they read?
 * Do people pick up phrasings/unique words from pages and deposit them in their edits?
 * How long is contributor "memory" of Wikipedia articles they have visited?
 * Do most link additions require a visit to the page being linked to?
 * How much more likely is someone to reference an article if they have seen it a few hours before? Days before?  Months?
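The "memory" questions above could be approached by measuring, for each link a contributor adds, how long ago that contributor last viewed the link target. What follows is only a minimal sketch under stated assumptions: the per-user view log, the list of link additions, and every name in the code are hypothetical and do not describe any existing Wikipedia data set.

```python
import bisect

def lag_before_link(view_times, link_time):
    """Seconds between link_time and the most recent view at or before it.

    view_times must be sorted ascending. Returns None if there was no
    earlier view at all.
    """
    i = bisect.bisect_right(view_times, link_time)
    if i == 0:
        return None
    return link_time - view_times[i - 1]

def fraction_preceded_by_visit(views, link_additions, horizon):
    """Share of link additions whose target was viewed within `horizon` seconds.

    views: dict mapping (user, page) -> sorted list of view timestamps (hypothetical format)
    link_additions: list of (user, target_page, timestamp) tuples (hypothetical format)
    """
    if not link_additions:
        return 0.0
    hits = 0
    for user, target, when in link_additions:
        lag = lag_before_link(views.get((user, target), []), when)
        if lag is not None and lag <= horizon:
            hits += 1
    return hits / len(link_additions)

# Invented sample data, for illustration only.
sample_views = {
    ("u1", "Physics"): [100, 5000],
    ("u2", "Cat"): [200],
}
sample_links = [
    ("u1", "Physics", 5600),   # target viewed 600 s earlier
    ("u2", "Cat", 100_000),    # target viewed 99,800 s earlier
    ("u2", "Dog", 300),        # target never viewed
]

within_hour = fraction_preceded_by_visit(sample_views, sample_links, 3600)
within_two_days = fraction_preceded_by_visit(sample_views, sample_links, 2 * 86400)
```

Sweeping `horizon` over hours, days and months would give a rough curve of how the share of "remembered" targets grows with the time allowed, which is one way to read the questions above.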

Reading behavior of non-contributing users:

 * How many links does a user follow from an initial entry page?
 * What dominates browsing habits:
 * Link following?
 * Searching or url-entry?
 * Incoming links?
 * How long is the average Wikipedia browsing 'session'?
 * Are there patterns of use unique to WP which we can find?
 * How does use vary across the week?
 * Do non-contributors have different browsing habits than contributors?
 * Is it just a matter of raw number of accesses, or might there also be a difference in the number of links that non-contributors and contributors follow?
 * Or is the difference in the manner in which they browse (depth-first vs. breadth-first)?
 * What kinds of articles (by category, for example) attract what kinds of browsers/browsing behavior?
 * Do people follow more links from certain types of pages?
 * Does this behavior change with respect to identifiable spikes in readership (such as when a news event, holiday, etc. occurs)?
 * Does the age/length/number of contributors to a page have a relationship to the browsing behavior it fosters?
 * Which links are most likely to be followed when hopping between articles?
 * High/low on the page?
 * Longer/shorter link titles?
 * Does Title or Article get more accesses/link?
 * Do more links/page increase the number of links which users follow per page/per kilobyte of content?
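Questions such as session length and links followed per entry page presuppose some way of cutting a raw access log into sessions. A minimal sketch of one possibility follows, assuming a hypothetical log of (user id, timestamp in seconds, page) records and an arbitrary 30-minute inactivity cutoff; neither the record format nor the cutoff is taken from actual Wikipedia logs.

```python
from collections import defaultdict

# Assumption: 30 minutes without a request ends a session (an arbitrary cutoff).
SESSION_GAP_SECONDS = 30 * 60

def split_sessions(records, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, page) records into per-user sessions.

    Returns a list of sessions; each session is a chronologically ordered
    list of (timestamp, page) tuples.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page in records:
        by_user[user_id].append((timestamp, page))

    sessions = []
    for hits in by_user.values():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > gap:
                # Too long since the previous request: close the session.
                sessions.append(current)
                current = [hit]
            else:
                current.append(hit)
        sessions.append(current)
    return sessions

def session_stats(sessions):
    """Return (mean duration in seconds, mean page views per session)."""
    n = len(sessions)
    durations = [s[-1][0] - s[0][0] for s in sessions]
    views = [len(s) for s in sessions]
    return sum(durations) / n, sum(views) / n

# Invented sample log, for illustration only.
sample_log = [
    ("u1", 0, "Main_Page"),
    ("u1", 120, "Physics"),
    ("u1", 300, "Isaac_Newton"),
    ("u1", 10_000, "Main_Page"),
    ("u2", 50, "Cat"),
    ("u2", 80, "Dog"),
]

sessions = split_sessions(sample_log)
mean_duration, mean_views = session_stats(sessions)
```

Note that a session with a single request gets a duration of zero here, because the time spent on the last page of a session cannot be seen from request timestamps alone.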

Raw page views:

 * Could be used to look at simple ratios between the number of edits/editors and the number of readers in a given article.
 * How does this ratio vary across article parameters and link topology?
 * How many more views does an article get if it's linked to by one other article?
 * Is there a diminishing return on each new inbound link?
 * When a page is included in a category does it increase readership?
 * Is this just because of new inbound links?
 * Is there a trickle-out effect which follows from increases in pageviews of one article?
 * How does the number of raw page views relate to the PageRank, in-degree, etc. of a page?
 * Note that page views are one measure of article quality.
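The edit/reader ratio and the diminishing-returns question above could be explored along the following lines. This is a sketch only: the per-article records, their field names and all figures are invented for illustration, and comparing articles with different in-degrees shows a correlation, not the causal effect of adding a link.

```python
from collections import defaultdict

def edits_per_view(articles):
    """Map title -> edits/views, skipping articles with no recorded views."""
    return {a["title"]: a["edits"] / a["views"] for a in articles if a["views"] > 0}

def mean_views_by_in_degree(articles):
    """Map in-degree -> mean views over the articles with that in-degree."""
    buckets = defaultdict(list)
    for a in articles:
        buckets[a["in_degree"]].append(a["views"])
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

def marginal_gains(mean_views):
    """Extra mean views per additional inbound link, between consecutive observed in-degrees."""
    degrees = sorted(mean_views)
    return {
        (lo, hi): (mean_views[hi] - mean_views[lo]) / (hi - lo)
        for lo, hi in zip(degrees, degrees[1:])
    }

# Invented figures, for illustration only.
sample_articles = [
    {"title": "A", "views": 100, "edits": 5, "in_degree": 0},
    {"title": "B", "views": 200, "edits": 4, "in_degree": 1},
    {"title": "C", "views": 300, "edits": 3, "in_degree": 1},
    {"title": "D", "views": 400, "edits": 2, "in_degree": 3},
]

ratios = edits_per_view(sample_articles)
by_degree = mean_views_by_in_degree(sample_articles)
gains = marginal_gains(by_degree)
```

If the values in `gains` shrink as in-degree grows, that would be consistent with a diminishing return on each new inbound link, though real data would need controls for article age, topic and so on.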

The relationship between out-of-band events (news events, etc.) and in-band user behavior:

 * Is there a change in the ratio of edits to pageviews when there is an identifiable surge in readership following a news event or an inbound link from a high-profile site?
 * How do the previous metrics vary when access surges?
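To compare metrics during and outside surges, the surges first have to be identified. One simple possibility, sketched here under assumptions, is to flag days whose views exceed some multiple of the median of the preceding days. The window length, the threshold factor and the daily counts below are all arbitrary choices made for illustration.

```python
from statistics import median

def find_surges(daily_views, window=7, factor=3.0):
    """Indices of days whose views exceed factor * median of the preceding `window` days."""
    surges = []
    for i in range(window, len(daily_views)):
        baseline = median(daily_views[i - window:i])
        if baseline > 0 and daily_views[i] > factor * baseline:
            surges.append(i)
    return surges

def edits_per_view_split(daily_views, daily_edits, surge_days):
    """Return (edits/views on surge days, edits/views on all other days).

    Either value is None if the corresponding days have no views.
    """
    surge = set(surge_days)

    def ratio(days):
        views = sum(daily_views[d] for d in days)
        edits = sum(daily_edits[d] for d in days)
        return edits / views if views else None

    all_days = range(len(daily_views))
    return (
        ratio([d for d in all_days if d in surge]),
        ratio([d for d in all_days if d not in surge]),
    )

# Invented daily counts for one article, for illustration only.
views = [100, 110, 90, 105, 95, 100, 100, 1000, 120, 100]
edits = [2, 2, 2, 2, 2, 2, 2, 10, 3, 2]

surge_days = find_surges(views)
surge_ratio, normal_ratio = edits_per_view_split(views, edits, surge_days)
```

The median is used for the baseline so that a single earlier spike does not inflate it; a mean-based baseline would be a reasonable alternative with different behavior.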

Backend behavior and user response:

 * How long will a user wait for a page to load before giving up?
 * Can we really ever obtain this from site logs?
 * How long do users stay away if they attempt accesses when system load is very high (and response time slow)? Do they give up at all?
 * These probably require the transmission of more data than we have considered. I include them because they are important considerations which might be useful to WP (and, generally, web) developers.