Wikipedia:Wikipedia Signpost/2010-09-13/News and notes

Global page-edit statistics
Erik Zachte has posted an analysis of page edits on all Wikipedias by region (on his Infodisiac blog, a site dedicated to Wikimedia statistics). The analysis, similar to an earlier one focusing on global page views (see 18 January Signpost), was based on a 1 in 1000 sampling of Wikipedia's squid logs, and excludes known bots and web crawlers. While not perfectly accurate, the analysis does reveal several important editing trends:
 * On average, Wikimedia projects received 2000 views per edit in July 2010.
 * The breakdown according to global region is similar between edits and views, but there were striking differences within the statistics: Europeans contribute 51% of edits and 35% of views, whereas North Americans contribute 23% of all edits and 38% of all views. A respondent to the blog post suggested that the relatively mature stage of the English Wikipedia may cause its edit rate to be lower than those of foreign-language Wikipedias.
 * Of all Wikipedias, the English Wikipedia received 51% of page views and 41% of page edits.
 * North America, Europe, Russia, Australia, New Zealand, and a few associated countries account for 81% of total views plus edits, while having 46% of the world's internet population and just 19% of the world population.
 * Monthly requests from China are roughly 10 times lower than its population would suggest, given the average for Asia (one in 10 for views and one in 14 for edits).
 * In India, 94% of page views and 78% of edits of all Wikipedias are for the English Wikipedia.
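
The arithmetic behind a sampled estimate such as the views-per-edit figure can be sketched as follows. This is only an illustration of how 1-in-1000 sampling scales back up, not Zachte's actual scripts; the function names and the sampled counts are hypothetical, chosen so the numbers are easy to follow.

```python
SAMPLE_RATE = 1000  # one request kept per 1000 logged, as in the analysis above


def estimate_totals(sampled_counts, sample_rate=SAMPLE_RATE):
    """Scale sampled request counts up to estimated real totals."""
    return {kind: count * sample_rate for kind, count in sampled_counts.items()}


def views_per_edit(totals):
    """Ratio of page views to page edits."""
    return totals["views"] / totals["edits"]


# Hypothetical sampled counts (not real data from the squid logs).
sampled = {"views": 14_000_000, "edits": 7_000}

totals = estimate_totals(sampled)
ratio = views_per_edit(totals)
print(totals, ratio)
```

Because views and edits are scaled by the same factor, the ratio is the same whether it is computed on the sample or on the estimated totals; the sampling mainly affects how noisy the figures are for small regions or small wikis.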

Wikimédia France partnership with the French National Library
In April, the French chapter Wikimédia France signed an agreement with the Bibliothèque nationale de France (BnF, the National Library of France) to make about 1,400 public domain books from its digital library Gallica available for Wikisource (see 12 April Signpost).

A team of three volunteers from Wikimédia France then retrieved high-resolution image files (in the lossless but bulky TIFF format) and OCR files from the BnF, and produced DjVu files that were uploaded to Wikimedia Commons in July. The heavy compression used in converting the image files to DjVu resulted in a substantial loss of quality. Since TIFF support was imminent (see Signpost coverage in April and August), all of the original, high-resolution TIFF files were uploaded to Wikimedia Commons at the end of August, for future reference.

The BnF's OCR files, which indicate the position of each word and of all graphical elements such as illustrations in the books, allowed the extraction of more than 22,000 image files. Many of them may be useless (detection errors, mere black lines), of limited interest (stamps, vignettes), or duplicates, and thus require human review before a mass upload to Wikimedia Commons. Nonetheless, many interesting images, such as educational diagrams, novel illustrations, scientific schematics, portraits, and maps, were obtained. The team is currently investigating the possibility of making the files available to Wikisource contributors.

Mass blanking of copyright violations
Darius Dhlomo, a Wikipedia contributor with more than 163,000 edits dating back to 2005, has been indefinitely blocked for extensive copyright infringements. Following debate on the user's talk page, the incident was transferred to contributor copyright investigations. The copy-pasted material brought to light comprises almost 10,000 article creations and possibly 25,000 infringements. Consensus was established for the automated mass blanking of all confirmed and suspected infringements by the user (about 17,000; see Task explanation), or roughly 10% of his article edits. Most of the articles are very short tabular stubs with little prose, which explains why they went unnoticed for so long.

Manual repair efforts faltered due to the sheer number of articles. According to Uncle G, the administrator who manages and coded the bot responsible for the mass blanking, the infringements were "on quite a large scale, and with a regular pattern." All articles created by Darius Dhlomo are now suspect and need to be reviewed for potential copyright infringement. Based on a master list generated by VernoWhitney, the bot will roll back every affected article to the version immediately prior to Darius Dhlomo's first edit. Articles that he created himself have no earlier version; they will not be deleted, but the bot will blank them completely.
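
The rule the bot applies to each listed article can be sketched roughly as below. This is a simplified illustration, not Uncle G's actual bot code: the `Revision` type, the `cleanup_target` function, and the placeholder username are all hypothetical, and a real bot would work through the MediaWiki API rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    user: str
    text: str


def cleanup_target(history: List[Revision], user: str) -> Optional[str]:
    """Return the text a page should be reset to, or None to leave it alone.

    `history` is ordered oldest first (an assumption of this sketch).
    """
    for index, rev in enumerate(history):
        if rev.user == user:
            if index == 0:
                # The user created the page: blank it rather than delete it.
                return ""
            # Otherwise restore the version just before the user's first edit.
            return history[index - 1].text
    # The user never edited this page, so nothing needs to change.
    return None


edited = [
    Revision(1, "SomeEditor", "original stub"),
    Revision(2, "ExampleUser", "copied text"),
    Revision(3, "AnotherEditor", "copied text plus a fix"),
]
created = [Revision(10, "ExampleUser", "copied article")]

print(repr(cleanup_target(edited, "ExampleUser")))
print(repr(cleanup_target(created, "ExampleUser")))
```

Note that rolling back to the earlier version also discards later good-faith edits by others (revision 3 in the example), which is one reason the blanking is described as a short-term measure that needs human follow-up.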

This short-term solution to the problem was announced on the project-wide watchlist notice; the long-term solution will require that editors review the copyright infringements and turn them into proper articles. The hope is that this Signpost article can help spread the word about user involvement in resolving the issue. Uncle G says this mountain can be moved "by a thousand teaspoons all digging together."

Jimbo weighs in on the Pending changes poll
Jimbo Wales has made an Announcement about Pending Changes, having been asked to interpret the results of the Pending changes poll for the Foundation. Wales said his intent was to communicate the community's desires to the Foundation and not to act as a final authority on the matter. There is "absolutely no consensus for simply turning the system off and walking away", he said, citing the result of the poll (65/35% Support/Oppose, despite the large number of contributors who opposed the structure of the poll itself). He conceded there has been substantial, vocal, and articulate opposition to using a system of this kind at all, or to using it in its current form, and addressed three concerns:
 * Openness: "I believe Pending Changes, used properly, can make Wikipedia more open if used on pages that would otherwise be put under some other form of protection." He acknowledged that while he had hoped that Pending Changes would make it possible to open up articles like George W. Bush and Barack Obama for anonymous edits, the trial had shown that for these particular pages "the workload of vetting edits and the polluted edit history weigh greater than the benefit of the system." However, he continues to believe that the system is useful for opening problematic pages to unrestricted editing, and should be further refined and tweaked for that purpose.
 * Effectiveness: On doubts about effectiveness, Wales said the results indicate that there is still much to learn. Noting that Pending Changes does not work as well on high-traffic pages, he said it seemed to be effective on such articles as those dealing with current events. Referring to supporters' suggestions that the feature could be applied to determine whether a page is eligible to be moved from semi-protected to unprotected status, he said it all argued for further careful exploration of the tool.
 * Complexity: In response to complaints about the system's complexity, particularly the user interface, he agreed that the current iteration was "rough around the edges", but was hopeful that over time it would integrate better with the user experience and work as traditional page protection does now. On this basis, he has already asked the Foundation to keep Pending Changes enabled, to streamline the interface, and "to increase the hard-coded limit of pages as the performance characteristics of the system allow it." He suggested a six-month waiting period, and proposed inviting opposers to provide feedback in the hope that by "working together, we can build a system and policies that have even broader support, similar to page protection and user blocking today."

Wales also took part in the ensuing discussion and responded to the comments on his page. Community members expressed their views following his statement on their concerns, suggesting an alternative straw poll for the future and discussing ways to resolve the issue in the meantime. Wales proposed a quick poll to determine what to do pending the availability of version 2.0, saying he has asked the Foundation for a firm schedule and will report back when he hears from them. The two proposed options for the poll would be to stop using the feature altogether or use it only on an evaluation basis. Rob Lanphier from the Foundation has advised that he will make a timeline available by September 17.

Briefly

 * Chief human resources officer hired: The Wikimedia Foundation has announced the hiring of Cyn Skyberg for the position of "Chief Talent and Culture Officer" (CTCO). She will be responsible for the coordination of Wikimedia's human resources and organizational, developmental, and recruiting strategies. The position was originally advertised as "Chief Human Resources Officer" but was renamed so as to show "a strong focus on helping Wikimedia grow and sustain an organizational culture consistent with its values."
 * Three Chapter reports published: Three Wikimedia Chapter reports, for Hungary, for Argentina, and for Hong Kong, have been published. The Hungarian report covered the first ever "WikiCamp" (see earlier Signpost coverage), conference organization, several grants, media and social media coverage, and contact with a group of librarians "who want to start an initiative to systematically improve Wikipedia content on the topics they find important." The Argentinian report covered several representative and board meetings (including one with representatives of Buenos Aires), the bicentennial of the start of the Argentine War of Independence, the third anniversary of Wikimedia Argentina, media coverage, and other topics. The Hong Kong report covered the third annual Anniversary Conference, involvement in liberal studies, annual fundraising preparations, the Wikimedia Asia Conference, and the General Assembly.
 * Wikimania Committee?: Discussions have begun on a possible Wikimania Committee, kick-started by Seddon. For several years, active participants had "floated the notion" that there should be an oversight committee; interest was reignited during the run-up to this year's Wikimania (see Signpost coverage).
 * Gadget installation statistics have been published for the English Wikipedia.
 * CGDO on India: The log for the August 31 IRC Office hours with Barry Newstead, the Wikimedia Foundation's Chief Global Development Officer, has been posted. A large part of the conversation related to the Foundation's plans in India, which Newstead is visiting this month to prepare for the opening of the first Foundation office outside the US (see last week's "In the news").
 * Movement roles workgroup: As reported in last week's "News and notes", the WMF Board (re)formed a Movement roles working group in its July meeting. The group is tasked with "clarifying the roles of different parts of the Wikimedia network and movement", with the goal of drafting a "Wikimedia Charter". It held its first conference call last week; a summary has been posted on Meta.