User:Filll/Controversial Article Project

There are many controversial articles on Wikipedia, and they describe a variety of controversial topics. These controversial topics include:
 * religion and spirituality
 * evolution, creationism, and intelligent design
 * sex
 * abortion
 * alternative medicine
 * racism, racial issues, and ethnicity
 * politics
 * pseudoscience
 * the paranormal
 * UFOs, alien abduction
 * biographies of living people
 * conspiracy theories
 * global warming and other environmental issues

There are many controversial articles on Wikipedia which address other topics, however. Some of them are somewhat surprising, like the articles on sunscreen or Joan of Arc.

Many who are appointed as administrators have minimal experience dealing with controversial topics, because of the vagaries and requirements of the appointment and vetting process. The same is true of appointments as arbitrators, bureaucrats, and other positions. The experience necessary to understand and make good decisions in these crucial areas is not merely undervalued or devalued, but is sometimes even viewed as a black mark against the candidates.

This is very unfortunate for several reasons. These controversial articles constitute an important part of Wikipedia. They generate a lot of traffic and visibility. They also are areas of significant risk, particularly legally and in the public relations arena, for Wikipedia.

Research in better techniques
There are several ways in which one can improve this situation. One is to study the techniques for dealing with controversial articles and see which work best, or whether the methods can be improved. There are several useful techniques that are used in the evolution, creationism and intelligent design articles. There are several incipient research efforts on new approaches.

Training
Another approach is to improve training of editors and admins in controversial areas, and to expose more people to these controversial situations and develop "Best Practices" for handling difficult situations. The AGF Challenge is an attempt at giving editors a taste of the experience of editing controversial articles in a safe, accessible, sanitized and nondemanding environment. The AGF Challenge consists of 8 short exercises. So far over 100 people have taken the original AGF Challenge. The WP Challenge continues with further exercises in a similar vein.

User:Kim Bruning is developing a set of classes for admins and editors, using the AGF Challenge exercises in part. Bruning has also suggested using AGF Challenge exercises as a way to select administrators, and as part of the RfA questions.

User:Durova has suggested creating methods that give a more realistic impression of arguing with someone who edits disruptively with an agenda. An Eliza-like bot could be created to simulate the experience fairly easily, for example. Interspersing the input from several bots and occasional humans, and recording the results, might give valuable feedback about how an editor handles these situations in a somewhat controlled environment, and/or give them experience.

Why Measure?
Wikipedia is managed in a somewhat haphazard way. People have hunches about what is going on and what is needed, and so they act. And sometimes their intuition is correct, but often it is wrong. However, if the reasoning is appealing, or even for reasons of maintaining a tradition, people continue to do the same thing over and over, even if it fails, and even if it has failed repeatedly before. Wikipedia, if it is to continue to be successful as it grows, has to adapt to changing circumstances. And the most reasonable approach is to begin to move beyond intuition-based management towards evidence-based management. Measurement is part of this.

As an example of a place where Wikipedia could be improved, consider the editing of controversial articles. Editing of controversial articles is not particularly respected. When people come up for RfA or are running for a position on the Arbitration Committee, few if any ask what sort of articles the candidates have experience editing, or whether they have edited controversial articles. There is no generally accepted method for comparing the types of editing experience people have had. In particular, there is no standard method for deciding whether a given editor has edited controversial articles, or how much success they have had in editing articles dealing with contentious subjects.

If we want to understand the editing of controversial articles, and hopefully develop better, more effective and more efficient methods for editing controversial articles, or develop more respect for the editors of controversial articles, we need to make relevant measurements. One of the first things that must be done is develop metrics to measure how controversial an article is, what experience an editor has in editing controversial articles, how successful they have been at editing controversial articles and similar potentially useful information. It is difficult to compare editors or editing styles if they are not measured.

User:Durova has pointed out that it is possible to "game" and trick almost any measurement scheme. While this is true, it is better to attempt to measure the relevant properties than to just carry on blindly. If gaming proves to be a significant problem, the exact nature of the algorithm can be hidden, as with the PageRank algorithm used by Google or the Checkuser procedures used by Wikipedia.

Although exotic metrics could be developed, using techniques such as linear or even nonlinear discriminant analysis and customized software, to start with it might be useful to use just the standard common tools that are already at our disposal. To this end, I propose that the following be considered:

Controversy Intensity Estimation
Purpose: For estimating the level of controversy of an article. A rough idea might be something like the average number of edits per month over the period that the article has existed. Some articles are controversial for some periods and uncontroversial for others, so some temporal dependence is to be expected in this metric. For articles which are locked, one could assume that during the locked period the article would have been edited at the same rate as in the month before it was locked, or take the average of the number of edits in the month before the lock and in the month after the article was unlocked again, or adopt some other reasonable assumption.
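
The metric described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not an existing tool; the function name, the input format (one edit count per month), and the choice of imputation rule are all assumptions.

```python
# Sketch of the proposed controversy-intensity metric: average edits per
# month over an article's lifetime, imputing locked months from activity
# just before and just after the lock.

def controversy_intensity(monthly_edits, locked_months):
    """monthly_edits: list of edit counts, one per month of the article's life.
    locked_months: set of month indices during which the article was locked."""
    imputed = list(monthly_edits)
    for m in sorted(locked_months):
        before = imputed[m - 1] if m > 0 else 0
        # First unlocked month after the lock, falling back to "before".
        after = next((monthly_edits[i] for i in range(m + 1, len(monthly_edits))
                      if i not in locked_months), before)
        imputed[m] = (before + after) / 2  # one "reasonable assumption"
    return sum(imputed) / len(imputed)
```

For an article with 100 and 120 edits in two months, a locked third month, and 80 edits after unlocking, the locked month is imputed as (120 + 80) / 2 = 100, giving an average of 100 edits per month.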

Articles that are very controversial should garner many more edits per month than articles that are less controversial. Similar measures can be made of the talk page activity of articles; some articles have more activity on the talk page yet are still quite controversial, even though there is little edit warring in the mainspace version of the article.

More sophisticated measures can be imagined, but might involve more work. For example, one could measure the number of new accounts that appear to edit only, or mainly, that page. One could count the number of SPAs that edit the page. One could count the number of identified sock and meat puppets that visit the page. One could measure the rate of reversion of the page.

Editor Participation
Purpose: For determining the intensity of editing of a page by an editor. One method for doing this might be to divide the average number of monthly edits by the "rank" of the editor. For example, if there were 100 edits to the page per month on average, and the editor was the fourth most prolific editor of the page, then he would get an intensity score of 100/4, or 25. The 50th most prolific editor of the same page would get an intensity score of 100/50, or 2.

Another method might be to multiply the number of edits the editor has made to the page by the average number of monthly edits. If this number is too large, the average number of weekly or daily edits to the page could be used instead. To make the numbers more manageable, they could be divided by a constant, such as 1000.
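
Both candidate scores are simple arithmetic, and can be written down directly. The function names and the scaling constant of 1000 (the text's own suggestion) are illustrative:

```python
# Two candidate "editing intensity" scores, as described above.

def intensity_by_rank(avg_monthly_edits, editor_rank):
    # Divide the page's average monthly edits by the editor's rank
    # among the page's contributors (1 = most prolific).
    return avg_monthly_edits / editor_rank

def intensity_by_product(editor_edits, avg_monthly_edits, scale=1000):
    # Multiply the editor's edit count on the page by the page's
    # average monthly edits, scaled down to keep the numbers manageable.
    return editor_edits * avg_monthly_edits / scale
```

With the numbers from the example, the fourth most prolific editor of a 100-edit-per-month page scores 100/4 = 25, and the 50th scores 100/50 = 2.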

Editor Contribution
Purpose: Determining the contribution of an editor's edits to a page. This is a more complicated question. Ideally one would like to measure how persistent or how influential an editor's contributions to a page are, if the page is a mainspace article page. For example, if an article is rated as an FA, and one editor has contributed 68% of the bits to the article while another has contributed 12%, that might be useful information to know. The average length of time that an editor's contributions are retained in an article might be another useful measure. Measuring how influential talk page discussions are is even more difficult and fraught with subjectivity problems. Tools have appeared that judge how persistent article contributions are, but these might not yet be refined enough for this purpose.

Editor contributions can be classified as (1) article building or (2) article defense and discussion. It is not clear what tools exist for this. Most talk page edits will be in the category of defense and discussion, although some will be involved with article building. Many mainspace edits are article building, but some are defense, such as reversions. If tools exist to count reversions, that could be useful, although obviously partial reversions, or reversions done by editing rather than by a more automatic procedure for going backwards in the history, would be missed by this.

Editors who received barnstars for their contributions might be judged to have had a beneficial presence, although clearly this can be GAMEd. Articles that are promoted while an editor is contributing might be viewed as evidence of their positive influence, although clearly some if not all of an editor's contributions might in fact impede progress towards promotion. Conversely, just because an article is demoted while an editor is contributing does not mean he or she has not had a positive influence. Evaluating an editor's contributions is clearly a challenging problem.

Editor value
Purpose: To determine the relative potential value to the project of different categories of editors. What is interesting is that, contrary to the wikilove chorus's worries about driving away newbies and FRINGErs by not being more open and accepting and "letting the FRINGE advocates feel dignity" (paraphrasing something a wikilove proponent told me), we are hemorrhaging experienced users, mainstream users, users who understand the principles of Wikipedia and can work within them, and assorted expert users. And frankly, although I have not yet seen any estimates of the relative "value" of an experienced user versus a newbie, or of other kinds of users, I think that when we weigh the loss of these experienced mainstream editors against SPAs and newbies of various sorts, the loss of potential productive value to controversial articles, controversial subject areas, and sometimes even the entire project is far greater.

Calculation
See User:Filll/Controversial Article Project/Calculation

Guilds and ratings
I have proposed that a Science Guild (or potentially other kinds of guilds) be created. I envision the Science Guild as an invitation-only organization, started from a nucleus of those who have dropped anonymity and are demonstrably real scientists. They would invite in others recognized as scientists, or as pro-science-oriented. Members of the guild could "vote" on extending offers of membership once the guild reached a certain size.

A Science Guild could function as a support group and help to socialize those new users who are not familiar with the Byzantine structure of Wikipedia. It could argue for pro-science policies and give input in debates. And I think that candidates for office could be "rated" by the Science Guild.

A rating of 1 to 100, or 1 to 10 or 1 to 5 (or -100 to 100, or -10 to 10 etc) could be given to an RfA or Arbcomm candidate. This would be formed by averaging the ratings of Science Guild members who weighed in on the issue, possibly based on personal impressions, personal recollections, presented evidence etc. A candidate for office could then present their rating from the Science Guild to help "voters" know where the Science Guild stands on a given candidate. Of course, there could be other guilds like a Paranormal Guild that could offer ratings as well. Ideally, one would want a candidate that had high ratings from both the Science Guild and the Paranormal Guild, or something similar.

Just as multiple Wikiprojects and other organizations give a rating to an article on its talk page, a Science Guild or other guild could individually evaluate an article and rate it. The guild rating could be used as a lever to get certain problematic controversial articles into more encyclopedic condition, as feedback for certain groups whose ownership of an article has caused a loss of balance, and as an additional measure of article quality.

POV Outline
Many of the disputes around controversial articles are associated with the idea of balance, or bias of the article. Typically there are multiple points of view, or POV to deal with. In some cases, these POV all have similar prominence. In other cases, there is a dominant mainstream POV and the other POV are FRINGE POV. A large amount of energy is expended on defining what are the mainstream POV and FRINGE POV, how prominent each is, how prominence should be determined, what the overall tone and organization of the article should be, and how much of the article should be devoted to each POV.

It is possible that it would be useful to require all editors to agree to a "content contract", or "POV outline", and abide by its terms before editing the mainspace article page, or even the article talk page. There would be a "form" that editors who wanted to contribute to a very contentious article might have to fill out, and possibly "vote" on its entries. The form would try to establish community consensus and understanding of some of the issues that govern the creation of the article. For example, if there are 3 POV to be represented in the article, we could:
 * try to identify the 3 main POV and have editors agree on what they are
 * try to decide on the relative prominence of a given POV
 * try to decide how relative prominence should be determined
 * try to decide which are mainstream POV and which are FRINGE POV
 * get each editor's views on how they want to divide up the text on each POV, and in what sections

With a set of carefully worded questions, we could focus the discussion and get editors to commit, or to realize that they are arguing against consensus. We could even use an editor's responses to help outsiders like admins understand the issues better, particularly if an article is under probation and someone is arguing against consensus or against NPOV. I want to see if we can design ways to cut down on all this tail-chasing, nonsense, obfuscation and confusion, a lot of it intentional.
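
One way to make such a "POV outline" machine-checkable is to record it as structured data. The sketch below is purely illustrative; the field names, the example values, and the idea of a prominence sanity check are all invented for this example, not part of any existing tool.

```python
# Hypothetical machine-readable "POV outline" / content contract for a
# three-POV article. All field names and values are invented.

pov_outline = {
    "article": "Example contentious topic",
    "povs": [
        {"name": "Mainstream view", "status": "mainstream", "prominence": 0.70},
        {"name": "Minority view A", "status": "FRINGE",     "prominence": 0.20},
        {"name": "Minority view B", "status": "FRINGE",     "prominence": 0.10},
    ],
    "prominence_method": "coverage in reliable secondary sources",
}

# A simple sanity check an admin tool could run: prominences sum to 1.
total = sum(p["prominence"] for p in pov_outline["povs"])
assert abs(total - 1.0) < 1e-9
```

A structured record like this would let outsiders such as admins see at a glance what the agreed POV breakdown is, and whether a given edit is arguing against the recorded consensus.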

Controversial article patterns
After some observation of controversial article editing, some clear patterns start to emerge. A set of behaviors is repeated over and over at almost every controversial article to gain the upper hand in a dispute. For example, tags are added repeatedly to well-known material, or to material that is fully referenced in wikilinked articles that discuss the point in more detail. Assorted templates branding the article are thrown on it repeatedly, such as the claim that an NPOV dispute is going on, when it is more accurate to describe the discussion as revolving around some editor's idiosyncratic interpretation of NPOV in service of a personal agenda. Accusations that the group editing the article "owns" it, because they will not change the consensus to satisfy one malcontent, are common. I have compiled a short, incomplete list of other common tactics that are repeated over and over.

A taxonomy of editors
There is some advantage to trying to develop a categorization of editors with distinguishing characteristics. If these characteristics can be quantified, then automated methods can be used to classify editors by type. This can be used as part of evaluation systems, notification procedures and other assorted processes.

One of the most basic divisions of editors is between those who are more productive than disruptive, and therefore a net positive to the project, and those who are more disruptive than productive, and therefore a net negative. For example, an editor who is a net negative may end up contributing nothing to the project while forcing other editors who are creating and improving content to stop doing so in order to deal with the disruption. At that point, the disruptive editor is actually reducing the volume and quality of the project's potential content.

A crude measure of the costs of a disruptive editor is the number of editor-hours that they consume. That is, if it takes 10 editors 5 hours each to deal with the disruption of a given disruptive editor, this disruptive editor has cost the system 50 editor hours. These 50 editor hours could have been translated into content for the project, but they were devoted instead to dealing with the disruption caused by the disruptive editor. If the disruptive editor drives off a productive editor, as often happens, and this productive editor was statistically likely to devote 30 hours of free labor per month for the next year to the project, then the cost of this retirement is 360 editor hours.
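
The editor-hour bookkeeping above is simple enough to write down directly. A minimal sketch, with illustrative function names and the worked numbers from the text:

```python
# Crude editor-hour cost accounting for disruptive editors.

def disruption_cost(responding_editors, hours_each):
    # e.g. 10 editors spending 5 hours each dealing with one
    # disruptive editor -> 50 editor-hours lost to the project
    return responding_editors * hours_each

def retirement_cost(hours_per_month, months):
    # e.g. a driven-off editor statistically likely to contribute
    # 30 hours/month for the next 12 months -> 360 editor-hours
    return hours_per_month * months
```

These totals are lower bounds on the loss, since they count only the hours diverted from content work, not any longer-term effects on morale or recruitment.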

Designate productive editors as editors of type "P", and designate disruptive editors as editors of type "D". Disruptive editors come in a variety of subtypes. There are editors who create disruption across a wide range of articles on different unrelated topics. Designate these editors as belonging to subtype "V".

There are D type editors who contribute content, but often of an unbalanced nature, or who otherwise neglect the rules for contributing to Wikipedia. Most frequently these editors will contribute to only one or two subject areas. Designate these editors as belonging to subtype "S".

There are D type editors who rarely if ever offer any content for inclusion in Wikipedia, and will usually decline to make suggestions when asked. These editors appear to be mainly interested in getting involved in altercations. Designate these editors as belonging to subtype "T".

An automated procedure for distinguishing between type S and type V editors would employ graph theoretic techniques. Either through the Wikipedia categories or hyperlinks, associated topics that the editor has been involved with can be identified, and the breadth of their editing interests can be assessed automatically.
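
A very rough version of this idea can be sketched without full graph machinery: treat the categories of each edited article as a proxy for topic breadth, and apply a cutoff. The data structures, function names, and the cutoff value are all assumptions made for illustration.

```python
# Rough sketch of automated S-vs-V classification by editing breadth,
# using article categories as a proxy for topic clusters.

def editing_breadth(edited_articles, article_categories):
    """edited_articles: iterable of article titles the editor has touched.
    article_categories: dict mapping article title -> set of categories."""
    touched = set()
    for article in edited_articles:
        touched |= article_categories.get(article, set())
    return len(touched)

def classify_disruptive(edited_articles, article_categories, breadth_cutoff=5):
    # Narrow interests suggest subtype "S" (single-subject);
    # disruption across many unrelated topics suggests subtype "V".
    breadth = editing_breadth(edited_articles, article_categories)
    return "S" if breadth <= breadth_cutoff else "V"
```

A more faithful implementation would follow the graph-theoretic suggestion in the text, clustering categories or hyperlinks into connected topic components rather than simply counting distinct categories.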

Another major category of editors is type "A". Editors of type "A" are not directly involved in article production or article defense, which are the primary activities of editors of types "P" and "D". Editors of type "A" can be constructive or destructive, but more of their activity takes place on the administrative pages of Wikipedia, such as the policy pages, the noticeboards, and pages associated with assorted administrative actions such as RfCs, RfAr, Arbcomm proceedings, and Arbcomm enforcement.

Costs
Important questions to consider include:
 * What is the chance a new editor will actually contribute something to Wikipedia? What is the chance they will become an editor of type D? of type P?
 * What is the average cost of a D type editor to the project? What is the distribution of such costs? Can the cost be predicted?
 * What is the average cost of a V type editor? An S type editor? A T type editor? Are there significant differences?
 * Can Wikipedia become more productive and efficient at minimizing costs?

Gaming the system
It is impossible to create processes that cannot be gamed. All that can be done is to make the effort of "gaming" the system almost as much work as just following normal procedures, while increasing the risks associated with being caught trying to cheat. For example, the list of Huggle users and its history are updated automatically; this list, and lists of other automated tools used by an editor, can be used to reduce the chance of gaming or to correct the data for distortions introduced by the use of automated tools.

Questions to be investigated

 * 1) Does civility affect productivity? If so, how?
 * 2) How much do unproductive editors cost the project? What costs do they impose?
 * 3) Are there editors who are particularly skilled at negotiation who can settle disputes, or can improve productivity on Wikipedia articles?
 * 4) Does banning of certain editors affect productivity, positively or negatively?
 * 5) Do experienced and inexperienced editors answer the WP Challenge differently? Does the type of experience matter? Does the type of editing experience matter?
 * 6) Do editors change their editing styles with time and experience?
 * 7) What is the value of a new editor?
 * 8) Does the WP:BITE policy affect the productivity of new editors?
 * 9) How can new editor productivity be optimized?
 * 10) Does mediation help with article productivity? What sort of mediation?
 * 11) What changes the contentiousness of a given article?