Wikipedia:Provenance

On Wikipedia, provenance is the origin of material that appears in articles. Provenance has been a controversial issue on Wikipedia since its inception, and published concerns about it have grown as Wikipedia has grown. (See the external links below.) One of the more prominent critics has been Robert McHenry, former editor-in-chief of Encyclopædia Britannica, who likened Wikipedia to a public restroom, where what you find in an article is whatever the last user deposited.

Intellectual provenance has been a traditional concern of academia. Academics study the intellectual heritage of ideas, concepts, methods, theories, etc. Specifically, they are concerned with properly attributing work in their own field of study, so they ask how this can be done for Wikipedia. For example, should academic credit be given for contributions to Wikipedia?

Provenance has at least two aspects: source and time. (See below.)

Provenance controversies
Providing provenance has proven controversial. Objections raised, for example on Village pump (proposals), include the following:


 * It is technically impossible. (But see proposals below.)
 * No one wants it. (But see the articles in the external links below.)
 * An article is a joint effort by the community that should stand as a whole, without the provenance of its parts being viewable. Showing provenance seems to go against the spirit in which articles have been developed in the past, where we're all in it together and no one lays claim to the work. Editing is really an interactive joint enterprise and should not be viewed as minuscule intervals that do not reflect each real contribution.
 * If an article is viewed as intervals with individual ownership, it will affect the process of cooperation on the article, maybe adversely. It allows some users to take prominence over others, which again goes against the spirit in which Wikipedia operates now.
 * You don't really know who contributed each interval because people can choose any user login they want.

Proposals for provenance
Proposals for providing provenance have been made and discussed on Village pump (proposals).

Source provenance
On Village pump (proposals) (see Wikipedia talk:Provenance), Pseudo Socrates proposed providing source provenance by placing a button on the history page of each article. The button would produce a dynamic page showing a version of the current article modified as follows: each interval of text would be preceded by a source link pointing to the version in the article's history in which the text following the link first appeared. The name of the source link would be the source of that version (login name or IP address). At the bottom of the dynamic page the following notice would appear:

The name of each provenance link above was derived from the second column, i.e., source (login name or IP address), of the history page of the article for which this page was produced. Clicking on a provenance link will produce a dynamic page that shows a (previous) version of the article in which the text following the link first appears in the editing history. Of course the source may not be the real author of any of the text in an article that results from their edit.
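The interval-by-interval attribution described above can be sketched roughly as follows. This is only an illustration, not part of the original proposal: it assumes the history is available as a chronological list of (source, text) pairs, compares versions word by word with Python's standard `difflib`, and uses the hypothetical function names `attribute` and `intervals`. A real implementation would link each interval to the history version itself rather than just naming its source.

```python
from difflib import SequenceMatcher


def attribute(revisions):
    """Attribute each word of the latest version to a source.

    revisions: chronological list of (source, text) pairs, where source is
    a login name or IP address (an assumed input format).
    Returns a list of (word, source) pairs for the final version.
    """
    prev_words, prev_sources = [], []
    for source, text in revisions:
        words = text.split()
        # By default every word is treated as new in this version...
        sources = [source] * len(words)
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        # ...unless it matches a word carried over from the previous version,
        # in which case it keeps the source it already had.
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                sources[block.b + k] = prev_sources[block.a + k]
        prev_words, prev_sources = words, sources
    return list(zip(prev_words, prev_sources))


def intervals(attributed):
    """Group consecutive words with the same source into text intervals."""
    grouped = []
    for word, source in attributed:
        if grouped and grouped[-1][0] == source:
            grouped[-1][1].append(word)
        else:
            grouped.append((source, [word]))
    return [(source, " ".join(words)) for source, words in grouped]


history = [
    ("Alice", "Provenance is the origin of material"),
    ("10.0.0.1", "Provenance is the origin of material in articles"),
]
for source, text in intervals(attribute(history)):
    print(f"[{source}] {text}")
```

As the notice says, the source found this way is only the account or address that saved the version, not necessarily the real author of the text.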

Alternatively, source provenance could use a color-coding scheme: each of the top ten contributors to an article would be assigned a different color. Users could point the mouse at text of any color to see which contributor that color represented, or they could consult a legend at the bottom of the page. However, users who have forced custom text colors in their web browser, and blind users using screen readers, would not be able to take advantage of this feature. The scheme would also require a standard for measuring contribution to an article, e.g. the number of edits made or the number of characters added or changed.
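A minimal sketch of the color assignment, under stated assumptions: contribution is measured by the number of characters a source has in the current text (only one of the possible standards mentioned above), the input is a list of (source, text) intervals, and the palette and the function name `assign_colors` are invented for illustration.

```python
from collections import Counter

# Hypothetical ten-color palette; any ten distinguishable colors would do.
PALETTE = ["red", "blue", "green", "orange", "purple",
           "brown", "teal", "magenta", "olive", "navy"]


def assign_colors(intervals, palette=PALETTE):
    """Map the top contributors to colors.

    intervals: list of (source, text) pairs for the current article.
    Contribution is measured here by characters of surviving text,
    which is an assumption; counting edits would be another option.
    """
    totals = Counter()
    for source, text in intervals:
        totals[source] += len(text)
    ranked = [source for source, _ in totals.most_common(len(palette))]
    return dict(zip(ranked, palette))


legend = assign_colors([
    ("Alice", "Provenance is the origin of material"),
    ("Bob", "in articles"),
    ("Alice", "It has been controversial."),
])
for source, color in legend.items():
    print(f"{color}: {source}")
```

Sources outside the top ten receive no entry in the mapping, so their text would stay in the default color.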

Temporal provenance
On Village pump (proposals) (see Wikipedia talk:Provenance), Pseudo Socrates proposed providing temporal provenance by placing a button on each article. The button would produce a dynamic page showing a version of the current article in which each interval of text is colored by its vintage: text less than 24 hours old would be red, text more than 24 hours but less than one week old would be green, and all remaining text would stay black.
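The coloring rule above amounts to a simple threshold check, sketched here. The function name `vintage_color` and its arguments are hypothetical, and the proposal does not say what happens to text that is exactly 24 hours old; this sketch assumes it is colored green.

```python
from datetime import datetime, timedelta


def vintage_color(added_at, now):
    """Return the display color for text first added at `added_at`."""
    age = now - added_at
    if age < timedelta(hours=24):
        return "red"      # less than 24 hours old
    if age < timedelta(weeks=1):
        return "green"    # at least 24 hours, less than one week (assumed boundary)
    return "black"        # everything older stays black


now = datetime(2005, 1, 8, 12, 0)
print(vintage_color(now - timedelta(hours=2), now))
print(vintage_color(now - timedelta(days=3), now))
print(vintage_color(now - timedelta(days=30), now))
```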

Tom Cross's article "Puppy smoothies: Improving the reliability of open, collaborative Wikis" (First Monday) proposes temporal provenance by coloring text according to the number of edits it has survived. For example, new edits would be colored red, text surviving 50 edits would be yellow, text surviving 100 edits would be green, and text surviving 150 edits would be black. The exact values could be tempered by various factors (e.g., perhaps surviving many reads, or many days, would count for something too). By itself, this probably isn't enough; an attacker could automate "editing" to "promote" some other text. But counting only named edits, by multiple people, and adding a minimum time value (say, 7 days to reach each new level) would be simple to do, and might make the scheme workable.
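One possible reading of this scheme, including the safeguards, is sketched below. Several details are assumptions made for illustration rather than anything stated in the article: edits by IP addresses are treated as unnamed, "multiple people" is read as at least two distinct named editors, and each level requires both 50 more surviving edits and 7 more days. The function names are hypothetical.

```python
import ipaddress

COLORS = ["red", "yellow", "green", "black"]  # levels 0 to 3
EDITS_PER_LEVEL = 50   # 50, 100, 150 edits, as in the example values
DAYS_PER_LEVEL = 7     # assumed minimum time to reach each new level


def is_anonymous(editor):
    """Treat an editor identified by an IP address as unnamed."""
    try:
        ipaddress.ip_address(editor)
        return True
    except ValueError:
        return False


def named_edit_count(editors):
    """Count later edits made by named users.

    editors: the editor of each later edit that the text survived.
    Returns 0 unless at least two distinct named users are involved
    (one reading of "by multiple people").
    """
    named = [e for e in editors if not is_anonymous(e)]
    if len(set(named)) < 2:
        return 0
    return len(named)


def survival_color(editors, days_survived):
    """Color text by the level that both its edit count and its age allow."""
    by_edits = named_edit_count(editors) // EDITS_PER_LEVEL
    by_time = days_survived // DAYS_PER_LEVEL
    level = min(by_edits, by_time, len(COLORS) - 1)
    return COLORS[level]


print(survival_color(["Alice", "Bob"] * 30, days_survived=10))
print(survival_color(["10.0.0.1"] * 200, days_survived=100))
```

Under these assumptions, 200 automated edits from a single IP address promote nothing, because none of them count as named edits.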

Of course, temporal provenance can be combined with source provenance.

Examples

 * Skrapion created an example of how provenance could be displayed with example text taken from the history of this page. You can see his example here: Provenance/Example