Wikipedia:Wikipedia Signpost/2014-06-25/News and notes

The US National Archives and Records Administration (NARA) has committed to engaging with Wikimedia projects in its newest Open Government Plan. The biennial effort is a roadmap for how the agency will accomplish its goals in the digital age. In the first plan, issued in 2010, Archivist of the United States David Ferriero wrote "the cornerstone of the work that we do every day is the belief that citizens have the right to see, examine, and learn from the records that document the actions of their Government. But in this digital age, we have the opportunity to work and communicate more efficiently, effectively, and in completely new ways."

These "new ways" included reaching out to Wikipedia, starting in 2011 with the hiring of Dominic McDevitt-Parks as a Wikipedian in residence. The position began as a student internship, but McDevitt-Parks has since become a digital content specialist focusing on the Wikimedia sites. Ferriero has spoken at multiple Wikimedia events, including the Wikipedia in Higher Education summit in 2011 (see Signpost coverage) and Wikimania 2012 (video; transcript; Signpost coverage). He has frequently been quoted as saying some form of "if Wikipedia is good enough for the Archivist of the United States, maybe it should be good enough for you."

How has the Wikimedia movement benefited from NARA and McDevitt-Parks' placement? There are three organized projects dedicated to NARA. On Wikisource, NARA has an ongoing initiative to transcribe US government documents. On Commons, NARA has uploaded over 100,000 images, the most recent of which came [//commons.wikimedia.org/w/index.php?title=Special:ListFiles&user=US+National+Archives+bot a month ago]. The English Wikipedia has produced several articles related to images from NARA, such as Desegregation in the United States Marine Corps. The site has also benefited from images uploaded at the request of individual users, such as those of living Medal of Honor recipients like Charles H. Coolidge, and the lead images for three US battleship articles: Pennsylvania-class battleship, USS Arizona (BB-39), and South Carolina-class battleship (Editor's note: the author of this article has made significant contributions to the last three pages).

All of that is in the past, though. The Open Government Plan lays out what NARA wants to accomplish in the next two years; but as a general plan it suffers from a lack of specifics. The Signpost contacted McDevitt-Parks to learn what the inclusion of Wikipedia in this plan will mean for the site.

He told us that there is no quantitative target for a total number of image uploads, because NARA plans to upload all of its holdings to Commons. "The records we have uploaded so far contain some of the most high-value holdings (e.g. Ansel Adams, Mathew Brady, war posters)", he said. "However, we are not limiting ourselves to particular collections. Our approach has always been simply to upload as much as possible ... to make them as widely accessible to the public as possible."

To accomplish this, volunteers are working with NARA on a new upload script to port images to Commons; the work in progress is posted on GitHub. At NARA itself, an API is in development that will make it easier to extract the metadata of the images. McDevitt-Parks says that these efforts will "allow us to more easily upload all of our existing digitized holdings to Wikimedia Commons and similar third-party platforms, and also that in the future upload to platforms like Commons will be the end of all digitization. Looking at it this way, I would say that in a way all of our digitization efforts are also for upload to Wikimedia Commons."

In the meantime, the special requests process (the first pilot launched by NARA when McDevitt-Parks began his tenure) is still available for Wikipedia editors. In the future, NARA hopes that this ad hoc arrangement can be supplemented with a volunteer citizen scanning program that will be able to "generate greater Wikipedian-initiated digitization."

What do the Vietnamese, Waray-Waray, and Swedish Wikipedias all have in common?
The Vietnamese and Philippines-based Waray-Waray Wikipedias have crossed the one-million-article Rubicon, the tenth and eleventh Wikipedias to do so. Just like the Swedish Wikipedia, the sites have attained this symbolic milestone with the help of bots, a process that has divided opinions among Wikimedians from several languages. For example, for a previous Signpost article on the topic, German Wikipedian Achim Raschka pointed us to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."

In an email to the Wikimedia-l mailing list, Vietnamese Wikipedian Minh Nguyen wrote that some editors on the site shared similar concerns and were "alarmed" at the sharp uptick in bot-created articles. Yet at the same time, crossing the one-million-article mark with a high proportion of auto-articles led the community to look at its small size (its roughly 1250 active editors are fewer than those of the Catalan Wikipedia, whose language has almost 60 million fewer speakers), and it is taking steps to ease the learning curve for new editors.

The question of active users is even more pertinent for fellow millionaire Waray-Waray, which has just 71 active users. The related Cebuano Wikipedia, which has also embraced bot-created articles and will soon join the million article club, has even fewer.

Meanwhile, the Swedish Wikipedia's article-creation bot has started editing again. The bot's operator Lsj told the Signpost that the source code has been rewritten to use the most recent references, though it is currently mostly operating on the Waray-Waray and Cebuano Wikipedias, the latter of which will soon also have one million articles. Other Wikipedias, such as Farsi (mostly spoken in Iran), have also expressed an interest in the bot's operation. Why have other Wikipedias not adopted similar processes, aside from those (like the English and German) that have philosophical objections? Lsj believes "it is mostly a matter of whether there is somebody who knows both bots and the target language well enough, and is prepared to devote the time required. Small language versions likely do not have such a person."


 * This article was updated after publication with information and comments from Minh Nguyen.

In brief

 * Commons devolving into full-blown URAA conflict: Battle lines have formed over the past few months with a split in the Commons community over the American Uruguay Round Agreements Act, which restored US copyrights on several works. While arguments over the Commons mission have gone on for years (in the Signpost in the past year alone—op-ed, reply, and forum), the URAA debate came to a head only recently with a large community request for comment. This was closed with a strong majority in favor of disallowing deletions based on URAA alone, but a subsequent discussion was rejected both on legal grounds and as being, in the words of several editors, "unclear". Recent discussions have taken place on the Wikimedia-l mailing list, but the current situation is muddled. A new proposal to break the deadlock has been started on Commons' administrators' noticeboard, but Wikimedia Foundation board member Samuel Klein has quickly suggested a major change that would modify Commons' policy to read "Keep something that is public domain in its country of origin, as long as there is reason to believe that rights-holders would want it to be used in the rest of the world."
 * Wiknics: US and Dutch Wikimedians are organizing Wiknics for early July.
 * Wikimania volunteers needed: The organizers of this year's London Wikimania have issued a call for volunteers to serve in a number of capacities during the conference week (8–10 August).
 * Engineering goals: The Wikimedia Foundation's Engineering and Product Development department is currently formulating its goals for the 2014–15 fiscal year. Editors are invited to contribute their views on the draft plan's talk page.
 * Quarterly reviews: Three quarterly reviews have been published on Meta: the editing team (formerly VisualEditor), Parsoid team, and Analytics team. Quarterly reviews aim to ensure accountability and allow senior Foundation staff to offer specific guidance on the Foundation's numerous and diverse initiatives.