User:Rich Farmbrough/temp124

Curation effort general model
The curation effort required is approximately:

$$ C = NV + nE $$

Where:
 * C is the curation effort per unit time.
 * n is the number of articles.
 * V is the global visibility of the site.
 * N is the propensity to make negative edits.
 * E is the degradation due to the environment, per article. This is generally a fairly small number.

In more detail

$$ C = \sum_{i=1}^{n} N_i V_i + \sum_{i=1}^{n} E_i $$

where the subscripted terms refer to individual articles (or pages; the result is general).
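The per-article sum can be sketched in a few lines of Python. This is only an illustration: the `Page` class, its field names and the three sample pages are hypothetical, and the numbers are made up rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One article (or any page) in the model. All fields are illustrative."""
    propensity: float   # N_i: propensity to attract negative edits
    visibility: float   # V_i: visibility of this page
    degradation: float  # E_i: environmental degradation per unit time


def curation_effort(pages):
    """Return C = sum(N_i * V_i) + sum(E_i) over the given pages."""
    edit_term = sum(p.propensity * p.visibility for p in pages)
    environment_term = sum(p.degradation for p in pages)
    return edit_term + environment_term


# Three hypothetical pages with made-up values.
pages = [
    Page(propensity=1.0, visibility=10.0, degradation=0.5),
    Page(propensity=2.0, visibility=5.0, degradation=0.5),
    Page(propensity=0.5, visibility=4.0, degradation=1.0),
]
print(curation_effort(pages))  # 22.0 from edits + 2.0 from environment = 24.0
```

If N is the same for every page and the per-page visibilities are assumed to add up to the global V, the first sum collapses to NV; if E is taken as the average per-page degradation, the second collapses to nE, giving the approximate form.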

Definitions

 * Curation effort: a simple metric that treats every attempted negative edit as equal.
 * Number of articles: for conceptual simplicity we think of articles, but we can use any subset of pages (indeed the model is particularly important for redirects).
 * Global visibility: may be taken as a relatively vague metric, related primarily to the number of agents visiting the site and the number of agent page-views, where an agent is an entity capable of editing.
 * Propensity to make negative edits: the likelihood of making a negative edit, which for the purposes of this exercise is any edit that, deliberately or otherwise, makes an article worse.
 * Degradation due to environment: changes to a page that are required even though the page itself has not changed. For some purposes it may be useful to break this down into internal and external causes.  Examples: changes in a reference site's URL schema (external).  The US changing to metric units (external).  New censorship laws (external).  Change in manual of style (internal).  Template deprecated (internal).  (Changes in content due to new discoveries, events etc. are part of content creation, not curation.)

Applications

 * 1) Certain types of article have a greater propensity to attract negative edits than others.  If the general figures are understood, the cost in terms of curation effort can be quantified.  For example, suppose N=1 for articles outside our collection of interest (CoI) and N=1.1 for the CoI, with n=5,000,000 for the site and n(CoI)=50,000.  Ignoring E (which can be subsumed in N for the purposes of this exercise), the additional curation effort for the CoI, compared with an alternative normal collection of 50,000 articles, is an increase of 1/1000 in the site's total curation effort.  This is equivalent to 10% more effort for the collection itself.
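The arithmetic behind the 1/1000 figure can be written out as follows. It relies on an added assumption that the example does not state: visibility is spread evenly, so every article carries visibility V/n. The symbols N_CoI and n_CoI are shorthand introduced here for the propensity and size of the collection of interest, and E is ignored as in the example.

$$ \frac{\Delta C}{C} = \frac{n_{CoI}\,(N_{CoI} - N)\,\frac{V}{n}}{N V} = \frac{n_{CoI}\,(N_{CoI} - N)}{n\,N} = \frac{50{,}000 \times 0.1}{5{,}000{,}000 \times 1} = \frac{1}{1000} $$

Under that assumption the relative increase scales linearly with both the size of the collection and the excess propensity, so doubling either would double the extra effort.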

Limitations

 * 1) Curation effort in the real world is more subtle than this.  Bad edits are prevented by a number of means, ranging from at least three types of embedded warning to denying the edit through various forms of protection, blocking and edit filters.  Those that get through are reverted by bots, recent-change patrol, watchlist curation and general curation.  Curation effort also feeds back, in general though not always, to reduce the propensity to make negative edits.
 * 2) The difficulty of preventing, spotting and reverting negative edits is also not uniform.