Wikipedia:Wikipedia Signpost/2015-07-29/Recent research



Wikipedia as an example of collective intelligence
An article in Social Science Computer Review argues that Wikipedia is an example of collective intelligence. It is primarily a theoretical piece, but the author is well informed about Wikipedia's everyday workings and illustrates the theory with that knowledge. The article relies heavily on Pierre Lévy's notion of "humanistic collective intelligence". The author argues that Wikipedia displays some key characteristics of a collective intelligence process: software optimized for stigmergy (a mechanism of indirect coordination between agents or actions, supported by features such as the edit history and talk pages); distributed cognition (such as the existence of bots, and the division of tasks between various tools and individuals, facilitating their actions); and possibly, though it is not possible to prove beyond any doubt, emergence (a process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties). The author concludes that Wikipedia thus exemplifies a special kind of collective intelligence, the aforementioned humanistic collective intelligence proposed by Lévy.

#Wikipedia and Twitter
review by Kim Osman

This study from OpenSym '15 analysed 2.5 million tweets, collected over a five-month period on Twitter, that linked to Wikipedia pages. The authors found that tweets referencing Wikipedia in English and Japanese linked to pages from their respective language versions of Wikipedia nearly all the time (97 and 94 percent respectively). In other languages, however, tweets linked to a different language version of Wikipedia roughly one fifth of the time. Interestingly, tweets in Indonesian referenced another language version more than half the time (linking to English Wikipedia in half the tweets), and of the links to English Wikipedia, the authors found that 75% of linked articles did not have an equivalent Indonesian version. There was a long-tail distribution of articles among the analysed tweets, with the authors noting certain “events” (like the Gamergate controversy) generating multiple tweets. Of the top 20 Twitter users in the dataset, 19 were bots, with the most prolific tweeter being Wikipedia Stub Bot (@wpstubs). The authors note that their study does not provide enough evidence to support a relationship between “how actively edited a certain article is and its popularity on Twitter.” The study does, however, raise interesting questions about the platform relationship between Wikipedia and Twitter and the role of bots in creating and maintaining this association. The authors suggest that future research could consider the role of events in popularising Wikipedia articles on Twitter, and further examine the motivations for inter-language linking on Twitter.

Briefly
 * How old is the account making an average edit? Among other charts recently created to visualize statistical data about the English Wikipedia community, this one shows that "the long-term trend is for the active community to gain about 6 months in average age for every year of time that passes in real life."
 * Simplifying sentences by finding their equivalent on Simple Wikipedia: A preprint by researchers at the University of Washington describes a method to automatically align sentences on the English Wikipedia and the Simple English Wikipedia about the same facts. Besides a hand-annotated dataset of corresponding (and non-corresponding) sentence pairs used to test and adjust the algorithm, their approach uses a "novel similarity metric" between pairs of words which is based on synonym information from Wiktionary, resulting in a weighted graph called "WikNet" that consists of "roughly 177k nodes and 1.15M undirected edges. As expected, our Wiktionary based similarity metric has a higher coverage of 71.8% than WordNet, which has a word coverage of 58.7% in our annotated dataset". These datasets are available online. The following pair of sentences is presented as an example of a good match found by the resulting method:
   "The castle was later incorporated into the construction of Ashtown Lodge which was to serve as the official residence of the Under Secretary from 1782" (en:Ashtown Castle) vs. "After the building was made bigger and improved, it was used as the house for the Under Secretary of Ireland from 1782." (simple:Ashtown Castle)

Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
 * "The Virtues of Moderation" presents "a novel taxonomy of moderation in online communities", including a case study of Wikipedia (p.88).
 * "Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation" From the abstract: "We show that using the full graph is more effective than just direct links by a large margin, that non-reciprocal links harm performance, and that there is no benefit from categories and infoboxes ..."
 * "Wikidata through the Eyes of DBpedia" From the introduction: "All DBpedia data is extracted from Wikipedia and Wikipedia authors thus unconciously also curate the DBpedia knowledge base. Wikidata on the other hand has its own data curation interface ... While DBpedia covers a very large share of Wikipedia at the expense of partially reduced quality, Wikidata covers a significantly smaller share, but due to the manual curation with higher quality and provenance information."
 * "WikiMirs: A Mathematical Information Retrieval System for Wikipedia"
 * "Content Translation: Computer-assisted translation tool for Wikipedia"
 * "Peer-production system or collaborative ontology development effort: what is Wikidata?" (to be presented at the OpenSym 2015 conference in August)
 * "Big data and Wikipedia research: social science knowledge across disciplinary divides"
 * "Comparing language development in Wikipedia in terms of page views per Internet users" See also Wiki-research-l mailing list discussion
 * "Understanding Graph Structure of Wikipedia for Query Expansion"
 * "Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research"
 * "Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study"
 * "Is it Possible to Enhance our Expert Knowledge from Wikipedia?" From the English-language abstract: "In September 2013 two different questionnaires about medical issues were given to medical students, resident physicians and one medical specialist. The questioning was about diseases/symptoms, examinations/classifications and conservative therapy/surgery of the department of orthopaedics and traumatology. ... The survey has proven the up-to-dateness of Wikipedia articles and their listing on the first or second position on Google. Wikipedia contains a lot of bibliographical references, high-quality images and video material. Almost half (42,5 %) of all evaluated articles are appropriate for use in medical exams and in the daily clinical work."
 * "Predicting elections from online information flows: towards theoretically informed models" From the conclusions: "We have shown good evidence that an 'uncertainty effect' drives much Wikipedia traffic: newer parties which attracted a lot of swing voters received disproportionately high levels of Wikipedia traffic. By contrast, there was no evidence of a 'media effect': there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared biased towards different things: with news favouring incumbent parties, whilst Wikipedia favoured new ones." (See also coverage of an earlier preprint by the same authors: "Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK")

