Wikipedia:Wikipedia Signpost/2011-03-21/Technology report

What is: localisation?
This week's Technology Report launches an occasional editorial series, What is?, which aims to demystify areas of the Wikimedia and MediaWiki technology world for the casual editor. Today's article is on "localisation", the process by which the MediaWiki interface is translated into other languages (over 300 of them).

For the past five years, localisation has been something MediaWiki does very well. For 188 different languages (or language variants), at least 490 of the 500 most-used interface messages (including sidebar items and "Revision as of", for example) have been translated from the default (English) into that language. That list includes big names (French, German, Spanish) but also a myriad of smaller language groups, as diverse as Lazuri (spoken by approximately 32,000 people on the Black Sea) and Tachelhit, a Berber language spoken by 3 to 8 million Moroccans (full list).

In the vast majority of cases, translation cannot be handled by MediaWiki developers alone. Instead, the effort is crowdsourced to a large community of translators at translatewiki.net, an external site with nearly 5,000 registered users (source). The site was built for translating all things MediaWiki, but now also handles a number of other open source projects. When new interface messages are added, they are quickly passed on to translatewiki.net, and the finished translations are then passed back. Every project that uses the LocalisationUpdate extension (including all Wikimedia projects) makes the latest translations of interface messages available to users in hundreds of languages within a few days of translation.
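
The idea of translating interface messages "from the default (English)" can be illustrated with a small lookup. This is only a sketch, not MediaWiki's actual code: the message store `MESSAGES`, the message keys and the function `get_message` are hypothetical, and it assumes that a message with no translation simply falls back to the English default.

```python
# Hypothetical message store: language code -> {message key -> text}.
# The keys and the structure are illustrative, not MediaWiki's real format.
MESSAGES = {
    "en": {"login": "Log in", "revision-as-of": "Revision as of"},
    "fr": {"login": "Se connecter"},  # "revision-as-of" left untranslated on purpose
}


def get_message(key, lang, default_lang="en"):
    """Return the message text for `lang`, falling back to the default language."""
    translated = MESSAGES.get(lang, {})
    if key in translated:
        return translated[key]
    # Assumption: untranslated messages are shown in the default language.
    return MESSAGES[default_lang][key]
```

Under this assumption, a reader using a partly translated language sees a mix of translated and English messages until the community at translatewiki.net fills the gaps.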

Over 100 issues (source) remain with language support for right-to-left languages, languages with complex grammar, and languages in non-Roman scripts, but the situation is slowly improving. For more information about MediaWiki localisation, see MediaWiki.org.

"Personal image filter" to offer the ability to hide sexual or violent media
At the upcoming meeting of the Wikimedia Board of Trustees on March 25/26, a design draft will be presented for the "Personal image filter", a system that will allow readers to hide controversial media, such as images of a sexual or violent nature, from their own view. This modification would be the first major change to come out of the long-lasting debates about sexual and other potentially offensive images. In May last year they culminated in controversial deletions by Jimbo Wales and other admins on Commons, at a time when media reports, especially by Fox News, were targeting Wikimedia for providing such content. Subsequently, the Foundation commissioned outside consultants Robert Harris and Dory Carr-Harris to conduct the "2010 Wikimedia Study of Controversial Content", which was presented at the Board's last physical meeting in October. The study's recommendations were not immediately adopted, with the Board forming a workgroup instead. (See the summary in the Signpost's year in review: "Controversial images".)

The study had recommended that "a user-selected regime be established within all WMF projects, available to registered and non-registered users alike, that would place all in-scope sexual and violent images ... into a collapsible or other form of shuttered gallery with the selection of a single clearly-marked command ('under 12 button' or 'NSFW' button)", but that "no image [should] be permanently denied to any user by this regime, merely its appearance delayed".

In response to an inquiry by the Board as to whether such a feature was feasible and how it might look, the draft design for the Personal Image Filter was developed by the Foundation's tech staff, in particular designer Brandon Harris (User:Jorm (WMF), no relation), and has already been presented to the workgroup, which in turn will present it to the Board this week. The design introduces a global "Content Filter" category on Commons, containing all images that can potentially be hidden according to a user's preferences, with a set of subcategories corresponding to such preferences. As a kind of localisation of these, "individual wikis will be required to maintain a 'Category Equivalence Mapping'", to which they can add (but not remove) their own subcategories. The total number of subcategories is intended to be small, though, with "somewhere between 5-10" global subcategories, and together with local ones "the interface can comfortably support around 10-12 filters before becoming unwieldy". Like the original recommendations from the study, the proposal appears to leave it to the communities to define the set of filterable subcategories, but it sketches a possibility:

A wiki's "Content Filter" category could contain the following sub-categories: "Sexually Explicit", "Graphic Violence", "Medical", and "Other Controversial Content". [For example,] images illustrative of sexual techniques could be placed in the "Sexually Explicit" sub-category while images of Mohammed could be placed in "Other Controversial Content" (or even "Images of Mohammed").

Users (both anonymous and registered) can select which categories they want to filter via an annotation next to filterable images that lists the filter categories the image belongs to, or from a general display setting (accessible via a registered user's preferences, or for anonymous users via a new link next to "Log in/Create account").
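
The selection logic described above can be sketched as a set intersection between the filter categories attached to an image and the categories a reader has chosen to filter. This is only an illustration under stated assumptions: the function names `filters_matching` and `should_hide` are hypothetical, the category names are the examples given in the draft, and nothing here reflects how the feature would actually be implemented.

```python
def filters_matching(image_categories, user_filters):
    """Return the filter categories that would cause this image to be hidden."""
    return set(image_categories) & set(user_filters)


def should_hide(image_categories, user_filters):
    """An image is hidden only if the reader opted in to at least one of its categories."""
    return bool(filters_matching(image_categories, user_filters))


# Example: a reader who has opted in to two of the sketched filter categories.
reader_filters = {"Sexually Explicit", "Graphic Violence"}
hidden = should_hide({"Graphic Violence"}, reader_filters)  # True
shown = should_hide({"Medical"}, reader_filters)            # False
```

A reader with no filters selected sees every image, which matches the opt-in character of the proposal; the set returned by `filters_matching` is the kind of information the annotation next to a filterable image would list.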

Both the recommendations of the Controversial Content study and the workgroup's chair Phoebe Ayers emphasise the opt-in (i.e. voluntary) nature of the filtering. From a technical perspective, the changes needed to arrive at an opt-out version (i.e. one where filtering is on by default) are rather trivial, and indeed until very recently, the proposal encompassed an additional option for "Default Content Filtering", which could be activated on a per-wiki basis if consensus on that project demanded it. Jorm removed the option, explaining that it had originally been included "because I could see this being used by non-WMF sites", but that he had decided to remove it because it was "more of a suggestion for implementation, rather than a requirement, and appears controversial".

In fact, at least on the English Wikipedia, the standard skins have for a long time provided CSS and JavaScript code that allows parts of a page to be hidden for all readers. However, the use of the corresponding templates has generally been restricted to talk pages (collapse), tables and navigational components (hidden), with objections to their use for more encyclopedic content. Still, their use for controversial images has been advocated by some, including Jimmy Wales, who argued in favour of using the "Hidden" template for Muhammad caricatures: "Wiki is not paper, we should make use of such interactive devices far more often throughout the entire encyclopedia, for a variety of different reasons." Wales, who has been a member of the Board's Controversial Content workgroup since a reshuffle in winter (the others being Ayers, Matt Halprin and Bishakha Datta), recently responded to two related proposals on his talk page, supporting "reasonable default settings" for the display of controversial images, based on "NPOV tagging" such as "Image of Muhammad", rather than subjective assessments such as "Other controversial content".

The Controversial Content study's recommendations had suggested that the feature should be "using the current Commons category system", in the form of an option that users can select to partially or fully hide "all images in Commons Categories defined as sexual ... or violent". For registered users, it recommended even more fine-grained options, to restrict viewing "on a category by category or image by image basis" even outside the sexual or violent categories, similar to Wales' "NPOV tagging". But this was rejected as impractical for the Personal image filter proposal. Brandon Harris explained why:

1) Cache invalidation. The way the system is proposed to work, we basically have to send the categories along with the HTML of the article, so that JavaScript [i.e. the code that actually does the hiding] can act on it. We have to invalidate that HTML whenever controversial content categories are applied or removed, which is hard enough with a set of 10-15 categories and some tens or hundreds of thousands of images [let alone] all 10M+ images and every single category operation that's performed on those images.

2) Subcategories. A system that allows you to choose arbitrary categories isn't particularly helpful if it doesn't also traverse subcategories (imagine choosing Nudity in art, and then not getting the actual categories that contain the relevant images). Traversal of subcategory structures completely breaks the approach we're proposing, and is generally very hard to do as you can tell from the absence of any meaningful traversal features in MediaWiki.

3) Localisation. All the complexity that we've talked about for 10-15 categories, but a few order of magnitudes more of it.
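
To make the second point concrete, the following sketch shows what traversing a subcategory structure involves: a breadth-first walk that must visit every descendant and guard against cycles. It is an illustration only. The function `descendant_categories` and the example graph are hypothetical (only "Nudity in art" comes from the quote above), and the category tree is assumed to be available as an in-memory dictionary, which is exactly what a live wiki with millions of pages does not offer cheaply.

```python
from collections import deque


def descendant_categories(root, subcats):
    """Return `root` plus every category reachable from it via subcategory links.

    `subcats` maps a category name to a list of its direct subcategories.
    A `seen` set guards against cycles, which category graphs can contain.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcats.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical category graph; the subcategory names are made up for illustration.
example_graph = {
    "Nudity in art": ["Nude paintings", "Nude sculptures"],
    "Nude paintings": ["Nude paintings by century"],
    "Nude paintings by century": ["Nudity in art"],  # a cycle back to the root
}
```

Choosing "Nudity in art" as a filter would then mean checking each image against the whole returned set rather than against a single category, and recomputing that set whenever any link in the graph changes.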

In brief
''Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks. Users interested in the "tarball" release of MW1.17 should follow bug #26676.''
 * Niklas Laxström (User:Nikerabbit), a MediaWiki developer and the founder of translatewiki.net (see editorial above), blogged to thank developers for fixing important bugs with the site.
 * The WMF's Jessie Wild posted an update on the "offline" Wikipedia project, summarising developments of the past fortnight (Wikimedia blog).
 * A new extension called "PoolCounter", designed to help with spikes in traffic (the "Michael Jackson problem"), was enabled for all Wikimedia wikis (server admin log).
 * Special:NewPages now supports RevisionDeletion (bug #27899).
 * Mark Hershberger, the WMF's current bugmeister, announced his intention to supply regular lists of the highest priority bugs for developers to focus their attention on (wikitech-l mailing list, update).
 * The full history on the English Wikipedia as of January 2011 is now available to download in 15 chunks (full, uncompressed size is around 4 terabytes). A version correct as of March is expected to take "a while" to complete (wikitech-l mailing list).
 * Guillaume Paumier blogged about a new user gallery feature that has recently been deployed to WMF wikis.
 * Following the resolution of bug #28020, page moves will no longer lock-in category sort codes.
 * The maximum number of contributions a user can have and still be renamed has been increased to 50,000 (server admin log).