User:Chazpelo/sandbox/Wikipedia:Stiki

STiki is a tool used to detect and revert vandalism or other unconstructive edits on Wikipedia, available to trusted users. STiki chooses edits to show to end users; if a displayed edit is vandalism, STiki then streamlines the reversion and warning process. STiki facilitates collaboration in reverting vandalism; centrally stored lists of edits to be inspected are served to STiki users to reduce redundant effort. STiki is not a Wikipedia bot: it is an intelligent routing tool that directs human users to potential vandalism for definitive classification.

To date, STiki has been used to revert instances of vandalism, spam, and other unconstructive editing on Wikipedia (see the leaderboard and editor milestones).

Download

 * STiki 2.1 – Current version released 2015-07-02 (942 kB)
 * Front-end GUI, distributed as an executable *.JAR. After unzipping, double-click the *.JAR file to launch (Windows, OS X), or issue the terminal command "java -jar STiki_exec_[date].jar" (Unix).
 * STiki remains in active development, in both the front-end GUI and the back-end scoring systems. Check back frequently for updated versions.
 * Software developers: STiki source @ GitHub
 * Full source for the GUI and back-end server. Library dependencies (IRC and JDBC) are not included.
 * Also available statically: STiki Source (2.0 MB); Link Processing Component (114 kB; may have deprecated/broken API code)
 * Note that this also contains the source for the WikiAudit tool

Using STiki
STiki may only be used by editors with a Wikipedia account. Additionally, the account must meet some qualifications to reduce the probability of users misidentifying vandalism. The account must have either: (1) the rollback permission/right, (2) at least 1000 article edits (in the article namespace, not talk/user pages), or (3) special permission via the talk page. We emphasize that users must take responsibility for their actions with STiki.
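The three qualifications above are alternatives: meeting any one of them is sufficient. A minimal sketch of that rule follows. The `Account` class and its field names are hypothetical, chosen for illustration; this is not STiki's actual login check.

```python
from dataclasses import dataclass

MIN_ARTICLE_EDITS = 1000  # threshold stated in the text above


@dataclass
class Account:
    """Hypothetical summary of a Wikipedia account (field names are illustrative)."""
    has_rollback: bool = False
    article_edits: int = 0            # edits in the article namespace only
    special_permission: bool = False  # granted via the talk page


def may_use_stiki(account: Account) -> bool:
    """Return True if any one of the three stated qualifications holds."""
    return (
        account.has_rollback
        or account.article_edits >= MIN_ARTICLE_EDITS
        or account.special_permission
    )
```

For example, `may_use_stiki(Account(article_edits=1000))` is true, while an account with 999 article edits and neither rollback nor special permission does not qualify.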

After login, users primarily interact with the GUI tool by classifying edits into one of four categories: "vandalism", "good faith", "pass", or "innocent".

Uncertainty over constructiveness: If a user is uncertain about whether an edit is constructive, the quickest solution is often to perform a web search (e.g., with Google); this may reveal whether some "fact" is true. Of course, STiki users should consider the reliability of the source found. If no reliable source can be found, the correct response may be to add a Citation needed or Verify credibility tag, using the normal wiki interface. Where content has been removed, common sense is usually the best guide. Does the removed text have citations? (Note that checking the citations themselves may be necessary in content regarding living people.) What is the edit summary? Does that explanation make sense? Is it discussed on the talk page? Regardless of the issue, anything that requires domain-specific expertise to resolve is probably best classified as "innocent" or "pass".

Uncertainty over malice: It can be tricky to differentiate between vandalism and good-faith edits that are nonetheless unconstructive. Test edits should be classified as "vandalism", as initial warnings and edit comments accommodate this case. If the unconstructive edit or its edit summary indicates inexperience with Wikipedia, it may be best to label the edit "good faith" and leave a message on the new user's talk page, offering to help. Beyond that, common sense is usually the best guide. Consider the article in question. Is it something that young editors might be interested in? Is there any truth in what is being said (setting aside formatting, language, and organizational issues)?

Deeper investigation: Sometimes a revert ("vandalism" or "good faith") will not repair all the issues presented in a diff, or the diff does not contain enough evidence to make a definitive classification. In these cases, use the hyperlinks (blue underlined text) to open relevant pages in the default web browser. This is helpful, for example, to: (1) view the article talk page to see if some issue was discussed, (2) make changes using the normal interface, and (3) use other tools like Popups, Twinkle, and wikEdDiff.

When you return to the STiki tool, you will still need to classify the edit. If you used the browser interface to edit the article, pressing "vandalism" or "good-faith revert" will not revert your changes or have any direct effect on Wikipedia. Classify the displayed edit as best you can. Making such classifications will help STiki to identify similar edits in future.

Interface tips: STiki has hotkeys to ease interaction with the tool. After a single edit has been classified with the mouse (giving the button panel "focus"), four hotkeys mark edits as "vandalism", "good faith", "pass", and "innocent". While the panel has focus, the Page Up, Page Down, Up Arrow (↑), and Down Arrow (↓) keys also scroll the diff browser. Hyperlinks that appear in diffs can be opened in your web browser, provided that the "Activate Ext-Links" option (under the "Options" tab) is turned on. STiki stores your settings in a file, so it is possible to quickly edit your settings there.

Comparison with other tools
The following features make STiki distinctive:

Edit prioritization
STiki orders the edits to be displayed to end users into priority queues. The priority an edit takes is based upon its evaluation by an anti-damage scoring system. Different systems produce different scores/queues, and users can explicitly select a queue to access using the "Rev. Queue" menu. All approaches are rooted in machine learning; two are active and two are inactive.

When STiki is experiencing considerable use, the frequency of vandalism found in one queue may drop significantly, a phenomenon called "queue exhaustion". In such cases, it may be wise to try an alternative queue. Users should also recognize that there is a finite amount of vandalism on Wikipedia. The more people who use STiki, the smaller the share of it any one user will see. This does not mean STiki is doing "bad"; it means the encyclopedia is doing "good".
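The idea of one priority queue per scoring system, with the user choosing which queue to draw from, can be sketched as below. This is a toy model under stated assumptions, not STiki's implementation: the class `ReviewQueues`, its method names, and the queue names used in the example are all hypothetical.

```python
import heapq
from collections import defaultdict


class ReviewQueues:
    """Toy model: one max-priority queue of edits per scoring system."""

    def __init__(self):
        # queue name -> heap of (-score, revision_id); heapq is a min-heap,
        # so scores are negated to pop the highest score first.
        self._heaps = defaultdict(list)

    def add(self, queue_name, revision_id, score):
        """Insert an edit into the named queue with its vandalism score."""
        heapq.heappush(self._heaps[queue_name], (-score, revision_id))

    def next_edit(self, queue_name):
        """Pop the highest-scoring edit in the chosen queue, or None if empty."""
        heap = self._heaps.get(queue_name)
        if not heap:
            return None
        neg_score, revision_id = heapq.heappop(heap)
        return revision_id, -neg_score
```

In this model, a queue whose remaining entries all have low scores corresponds to the "queue exhaustion" case, and switching queues amounts to passing a different `queue_name` to `next_edit`.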

Metadata scoring and origins
Here we highlight a particular scoring system, based on machine-learning over metadata properties. This system was developed by the same authors as the STiki frontend GUI, was the only system shipped with the first versions, and shares a code-base/distribution with the STiki GUI. This system also gave the entire software package its name (derived from Spatio Temporal processing on Wikipedia), though this acronymic meaning is now downplayed.

The "metadata system" examines only four fields of an edit when scoring: (1) timestamp, (2) editor, (3) article, and (4) revision comment. These fields are used to calculate features pertaining to the editor's registration status, edit time-of-day, edit day-of-week, geographical origin, page history, category memberships, revision comment length, etc. These signals are given to an ADTree classifier to arrive at vandalism probabilities. The ML models are trained over classifications provided on the STiki frontend. A more rigorous discussion of the technique can be found in a EUROSEC 2010 publication.
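To make the feature step concrete, here is a minimal sketch that derives a few of the simpler signals from the four fields. It is illustrative only and is not STiki's feature code: the function names and dictionary keys are hypothetical, it assumes unregistered editors are identified by an IP address, and it omits the features that need external lookups (geographical origin, page history, category memberships) as well as the ADTree classifier itself.

```python
import ipaddress
from datetime import datetime, timezone


def is_anonymous(editor: str) -> bool:
    """Assumption: an editor name that parses as an IP address is unregistered."""
    try:
        ipaddress.ip_address(editor)
        return True
    except ValueError:
        return False


def metadata_features(timestamp: datetime, editor: str, article: str, comment: str) -> dict:
    """Derive a few simple signals from the four fields named in the text.

    `article` is accepted for completeness; page-history and category
    features would require lookups that this sketch leaves out.
    """
    ts = timestamp.astimezone(timezone.utc)  # expects a timezone-aware datetime
    return {
        "editor_is_registered": not is_anonymous(editor),
        "hour_of_day": ts.hour,       # 0-23, UTC
        "day_of_week": ts.weekday(),  # Monday == 0
        "comment_length": len(comment),
    }
```

A feature dictionary of this kind would then be passed to a trained classifier to obtain a vandalism probability.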

An API has been developed to give other researchers/developers access to the raw metadata features and the resulting vandalism probabilities. A README describes API details.

The paper was an academic attempt to show that language properties were not necessary to detect Wikipedia vandalism. It succeeded in this regard, but since then the system has been relaxed for general-purpose use. For example, the engine now includes some simple language features. Moreover, the decision was made to integrate other scoring systems into the GUI frontend.

Architecture
STiki uses a server/client architecture:

(1) Backend processing: An engine watches all recent changes to Wikipedia and calculates or fetches the probability that each is vandalism. This engine calculates scores for the Metadata Scoring System, and uses APIs/feeds to retrieve the scores calculated by third-party systems. Edits populate a series of inter-linked priority queues, where the vandalism scores are the priority for insertion. Queue maintenance ensures that only the most-recent edit to an article is eligible to be viewed. Backend work is done on STiki's servers (hosted at the University of Pennsylvania), relying heavily on a MySQL database.

(2) Frontend-GUI: STiki's user interface is a Java desktop application. It displays diffs that likely contain vandalism (per the backend) to human-users and asks for definitive classification. STiki streamlines the process of reverting poor edits and issuing warnings/AIV-notices to guilty editors. The interface is designed to enable quick review. Moreover, the classification process establishes a feedback loop to improve detection algorithms.
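The queue-maintenance rule described for the backend (only the most-recent edit to an article is eligible to be viewed) can be sketched with a heap and lazy invalidation. This is one possible in-memory approach, named plainly as an illustration; the actual backend relies on a MySQL database, and the class and method names here are hypothetical. The sketch assumes edits are enqueued in chronological order.

```python
import heapq


class EditQueue:
    """Max-priority queue in which only the latest edit per article is eligible."""

    def __init__(self):
        self._heap = []    # entries: (-score, revision_id, article)
        self._latest = {}  # article -> revision_id of its most recent edit

    def enqueue(self, article, revision_id, score):
        # Recording the newest revision implicitly invalidates older entries
        # for the same article; they are discarded lazily in dequeue().
        self._latest[article] = revision_id
        heapq.heappush(self._heap, (-score, revision_id, article))

    def dequeue(self):
        """Return (article, revision_id, score) for the best eligible edit, or None."""
        while self._heap:
            neg_score, revision_id, article = heapq.heappop(self._heap)
            if self._latest.get(article) == revision_id:
                del self._latest[article]
                return article, revision_id, -neg_score
        return None
```

With this design, a high-scoring edit that has since been superseded by a newer edit to the same article is skipped when it reaches the front of the queue, so reviewers are only shown the current state of each article.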

Related works and cooperation
STiki's authors are committed to working towards collaborative solutions to vandalism. To this end, an API provides access to STiki's internally calculated scores. A live feed of scores is also published to channel "#arm-stiki-scores" on IRC server "armstrong.cis.upenn.edu". Moreover, all STiki code is open source.

In the course of our research, we have collected large amounts of data, both passively regarding Wikipedia, and through users' active use of the STiki tool. We are interested in sharing this data with other researchers. Finally, STiki distributions contain a program called the Offline Review Tool (ORT), which allows a user-provided set of edits to be quickly reviewed and annotated. We believe this tool will prove helpful to corpus-building researchers.

Credits and more information
STiki was written by Andrew G. West (west.andrew.g) while a doctoral student in computer science at the University of Pennsylvania, under the guidance of Insup Lee. The academic paper which shaped the STiki methodology was co-authored by Sampath Kannan and Insup Lee. The work was supported in part by ONR-MURI-N00014-07-1-0907.

In addition to the academic paper already discussed, there have been several STiki-specific write-ups/publications that may prove useful to anti-vandalism developers. The STiki software was presented in a WikiSym 2010 demonstration, and a WikiSym 2010 poster visualizes this content and provides some STiki-revert statistics. STiki was also presented at Wikimania 2010; the presentation slides are available. An additional write-up (not peer reviewed) examines STiki and anti-vandalism techniques as they relate to the larger issue of trust in collaborative applications. Finally, the anti-damage ecosystem and STiki's technical contribution were summarized in the developer's PhD dissertation. That work is novel in analyzing ~1 million STiki classification actions to learn about human/social aspects of the patrolling process.

Beyond STiki in isolation, a CICLing 2011 paper examined STiki's metadata scoring technique relative to (and in combination with) NLP and content-persistence features (the top 2 finishers from the 2010 PAN Competition) and set new performance baselines in the process. A 2011 edition of the PAN-CLEF competition was also held and required multiple natural languages to be processed; the STiki entry won at all tasks. A Wikimania 2011 presentation surveyed the rapid anti-vandalism progress (both academic and on-wiki) of the 2010–2011 time period. Finally, a research bulletin published by EDUCAUSE looks at the issue of Wikipedia/wiki damage from an organizational and higher-education perspective, with particular emphasis on the protection of institutional welfare.

Queries not addressed by these writings should be addressed to STiki's authors.

Userboxes, awards, and miscellanea
For those who would like to show their support for STiki via a userbox, the following have been created/made-available:

Other STiki images, adverts, promotional material, and statistics:
 * An advertisement in the Wikipedia rotation:
 * Some statistics about STiki's market share

        
