Wikipedia:Wikipedia Signpost/2011-08-22/Technology report

July Engineering Report published
The Wikimedia Foundation's Engineering Report for July was published last week on the Wikimedia Techblog and on the MediaWiki wiki, giving an overview of all Foundation-sponsored technical operations in that month. Many of the projects mentioned have been covered in The Signpost, including the deployment of the MoodBar, ArticleFeedback and WikiLove extensions. Among the items highlighted in the report that have yet to receive significant coverage were "the successful implementation of a MySQL-based parser cache on Wikimedia wikis" and work on a "Wikimedia Report Card 2.0".

There was good news for performance, too. Seventy-four new servers, to be installed this month, were purchased to increase the capacity of our Apache cluster, whilst the reliability of database dumps also settled down after a rocky June. The work on the parser cache was also successful, improving the hit rate (the percentage of requests which do not force the server to regenerate the page from scratch) from 30% to 80%. On the software side, Roan Kattouw and Timo Tijhof worked on delivering "global gadgets and a gadget manager". According to the report, the "back-end for loading gadgets remotely from another wiki" is now in a workable state, as is an "inventory" of available gadgets.

The HipHop deployment, AcademicAccess, App-level monitoring, and Configuration management projects were "mostly on hold" in July, as was work on LiquidThreads 3.0. Documentation of project status came under more scrutiny during the month under the guidance of Guillaume Paumier, now the Foundation's Technical Communications Manager. Paumier "continued to create, update, clean up and organize the project documentation pages for most engineering activities", according to the report, which he himself authored.

July also saw the arrival of Jeff Green (Operations Engineer for Special Projects), Ben Hartshorne and contractor Daniel Zahn (Operations Engineers) and Ian Baker (Software Developer). At the same time, however, Chief Technology Officer Danese Cooper and Code Maintenance Engineer Priyanka Dhanda left the Wikimedia Foundation.

Fundraiser engineering sprints in progress
Also published this week was a detailed insight into the present fundraising team, which is responsible for making sure Wikimedia websites are able to maximise the fundraising potential that the annual drive affords them. The team is currently led by Arthur Richards and also includes two developers, an operations engineer, a data analyst, and a general business analyst. According to the post, it has been working on this year's fundraiser since approximately May.

From a technical point of view, Richards stressed that good "code hygiene" was a must, including writing unit tests for all the code the team produces in its two-week code "sprints". The sprints focus on specific goals, which the team tracks using the proprietary software Mingle. "While we would much prefer to use an open-source solution, we settled on this proprietary tool as it much more closely meets our needs than any of the others we explored", wrote Richards. Examples of sprint targets include improving the banner tracking system so that results can be filtered. This should help the team refine the range of banners on offer and so maximise the number of visitors who see the value in donating to Wikimedia. This year's campaign will again feature localised banners and user stories.

In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.


 * The role of the "Platform Engineering" team at the WMF was explained in a blog post by its head, Rob Lanphier. He writes that broadly speaking the team is split up into three subgroups: the MediaWiki Core subgroup; the Technical Liaison and Developer Relations subgroup (TL;DR); and the Data Analytics subgroup, all of which are currently hiring.
 * Among the Requests for Bot Approval currently requiring community discussion is a request to create between five and ten thousand stubs on certain animal species.
 * Director of Mobile and Special Projects Tomasz Finc invited testers for two projects within his remit: the second round of mobile testing and the latest beta version of Kiwix, the offline Wikipedia reader that was selected for Wikimedia developer attention.
 * After a discussion on the wikitech-l mailing list, it looks unlikely that MediaWiki will fix a date for dropping PHP 5.2.x support so soon after ending support for platforms running PHP 5.1.x (which it dropped in version 1.17, released in June this year). Such a move would allow use of language features only introduced in PHP 5.3, but would prevent the installation of future MediaWiki versions on older systems.
 * During a special bug triage session targeted at those bugs affecting mobile devices, a number of problems and feature requests were analysed. According to bugmeister Mark Hershberger, his department "hope[s] to do one of these mobile triages every month and, for future ones it would be awesome if we could have Kindles, iPads, and maybe even Nooks as well as BlackBerries, Androids, iPhones and even Nokia phones" (wikitech-l mailing list).
 * Developer Andrew Garrett has detailed the technical lessons that could be learned from the recent referendum-related mass emailing to potential voters, which involved 750,000 outgoing emails.
 * Diederik van Liere wrote on wikitech-l about his own efforts, and those of other Wikimedia-sponsored researchers, to harness the power of Hadoop (a framework for distributed processing of large datasets across clusters of computers) in processing large Wikimedia dumps. The dumps, which can be many terabytes in size, are the most efficient way of grabbing large amounts of a Wikipedia's (or other project's) history at one time. In related news, he also published his suggestions about how the dumps could be made more reuser-friendly.
 * The main page of Wikimedia Commons showed intermittent errors due to heavy server load. The problem was only relieved when more rigorous caching was enforced, causing content elsewhere to become slightly out-of-date (bug #30428).