Wikipedia:Wikipedia Signpost/2010-08-23/Technology report

Plans to improve password security
Head developer Tim Starling has proposed an upgrade of the way the MediaWiki software (and hence Wikimedia sites) hashes passwords before storing them (wikitech-l mailing list). He outlined concerns that if someone could acquire a password hash from the database, they could crack it and log in as that user within 20 minutes, with no special hardware. Highlighting this issue, he requested that any new system be:
 * 1) Future-proof: should be adaptable to faster hardware.
 * 2) Upgradeable: it should be possible to compute the C-type [i.e. new] hash from the B-type [i.e. old] hash, to allow upgrades without bothering users.
 * 3) Efficient in PHP, with default configure options.
 * 4) MediaWiki-specific, so that generic software can't be used to crack our hashes.
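
As an illustration of the first two requirements, here is a minimal Python sketch (MediaWiki itself is written in PHP). It assumes the B-type record has the form `:B:salt:md5(salt-md5(password))`; the `:C:` record layout, the function names and the iteration count are hypothetical, and PBKDF2-HMAC-SHA256 is only a stand-in for whichever slow hash is eventually chosen. The point is that the new hash is computed from the old digest, so stored records can be converted without knowing any passwords, and the iteration count can be raised as hardware gets faster.

```python
import hashlib
import hmac


def b_type_digest(password: str, salt: str) -> str:
    """Salted MD5 digest, assumed B-type form: md5(salt + '-' + md5(password))."""
    inner = hashlib.md5(password.encode("utf-8")).hexdigest()
    return hashlib.md5(f"{salt}-{inner}".encode("utf-8")).hexdigest()


def stretch(b_digest: str, salt: str, iterations: int) -> str:
    """Slow hash of the old digest. PBKDF2 is a stand-in, not the proposed scheme."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", b_digest.encode("ascii"), salt.encode("utf-8"), iterations
    )
    return raw.hex()


def upgrade_stored(b_record: str, iterations: int = 10_000) -> str:
    """Turn ':B:salt:digest' into a hypothetical ':C:iterations:salt:digest'.

    Only the stored record is needed, so users are not bothered.
    """
    _, tag, salt, digest = b_record.split(":")
    if tag != "B":
        raise ValueError("expected a B-type record")
    return f":C:{iterations}:{salt}:{stretch(digest, salt, iterations)}"


def verify(password: str, c_record: str) -> bool:
    """Check a password against a C-type record by repeating both steps."""
    _, tag, iterations, salt, digest = c_record.split(":")
    if tag != "C":
        raise ValueError("expected a C-type record")
    candidate = stretch(b_type_digest(password, salt), salt, int(iterations))
    return hmac.compare_digest(candidate, digest)
```

For example, `upgrade_stored(":B:1a2b3c4d:" + b_type_digest("secret", "1a2b3c4d"))` yields a record that `verify("secret", ...)` accepts. Because the iteration count is stored in the record, it can be increased later without invalidating older records.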

Tim Starling suggested that the "Whirlpool" hash be incorporated as a way of achieving this. The result was a general consensus that the proposed scheme was better than the current process, with a wide-ranging discussion of what might be even better. User:Simetrical played down the threat, arguing that "Hackers go after money, and there's no money in hacking Wikipedia. We have nothing secret or valuable that's not already readily available".

Concerning client-side improvements in password security, a JavaScript-based password complexity checker has recently been written (70520), prompted by the remarks of a security researcher quoted in the Technology Report earlier this month (Study of web passwords includes Wikipedia).

See also earlier Signpost coverage about password security on Wikipedia:
 * "Four administrator accounts desysopped after hijacking, vandalism" and "Administrator status restored to five accounts after emergency desysopping" (about a 2007 incident which led to some changes in MediaWiki and the start of the page Security)
 * "Blank passwords eliminated for security reasons" (2006)
 * "Password security upgraded after Slashdot furor" (2005, about an incident after which salted passwords were introduced)

Google Summer of Code: Brian Wolff
We begin a series of articles about this year's Google Summer of Code (GSoC) with student Brian Wolff (User:Bawolff), who describes his project to improve MediaWiki's image metadata support:

For those not familiar with what image metadata is, it's somewhat similar to how back before digital cameras, people used to write information about photographs (date taken, the subject of the photograph, etc.) on the reverse. In digital photography we cannot write on the back of a digital file, but we can embed such secondary information inside the file itself. My project is to improve how we extract and show this embedded information on file description pages.

Currently MediaWiki does extract some image metadata, specifically Exif data in JPEG files, and as of a couple of days ago, TIFF files (example). However, it misses some Exif data, most noticeably embedded GPS data (example, with embedded GPS data that has had to be manually extracted). Part of my project is to fix up MediaWiki's current Exif support so that it extracts GPS data and other properties currently missed. With that said, Exif is only one of the many types of metadata. The two other (main) types I added support for are IPTC (IIM) and XMP data. IPTC data is often found in more professional archive-type settings. For example, many of the images on Commons from the German federal archive have IPTC metadata and no Exif metadata. XMP metadata is a relatively new metadata standard that is slowly gaining ground. It has the ability to store metadata properties in multiple languages, which I feel aligns very well with the multilingual goals of Wikimedia. XMP data can also be easily embedded into formats such as PNG and GIF images, in addition to JPEG images.

The code I've been working on also allows extracting file format specific metadata. This includes JPEG, GIF, and PNG file comments, as well as PNG textual data chunks (for those familiar with the internals of PNG, the tEXt, zTXt and iTXt chunks). For example, File:Pentdod gruen neu anim.gif has hidden inside it a comment of "Created with The GIMP by Alfons Kolling (Lokilech)" which my project allows us to extract and show to the user. Another example of why this is important is that whenever you download a thumbnail from Wikipedia (or other Wikimedia site), MediaWiki adds a file comment with the URL for the image page. It is kind of ironic we can't show the metadata that we ourselves embed in thumbnails.
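
To illustrate the PNG textual chunks mentioned above, here is a minimal Python sketch; it is not the project's code (MediaWiki is written in PHP), and the function name `png_text_chunks` is hypothetical. It relies on the PNG layout of an 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC. It reads tEXt and zTXt chunks only, skips iTXt, and does not verify CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for the tEXt and zTXt chunks of a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk header: 4-byte big-endian length, then 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # header (8) + body + CRC (4)
        if chunk_type == b"tEXt":
            # keyword, NUL separator, Latin-1 text
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"zTXt":
            # keyword, NUL separator, compression method byte, compressed text
            keyword, _, rest = body.partition(b"\x00")
            if rest[:1] == b"\x00":  # method 0 is zlib/deflate
                text = zlib.decompress(rest[1:])
                found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return found
```

Calling `png_text_chunks(open("example.png", "rb").read())` on a file (the filename here is a placeholder) returns a dictionary of whatever textual chunks it contains, which could then be shown on a file description page.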

Once finished and rounded off, the new code could easily be merged into the MediaWiki base, improving functionality for all new MediaWiki installations and upgrades, including Wikimedia sites. Metadata can also help volunteers to spot low-level image copyright infringement.

In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
 * The final Vector and advanced editing tools rollout will start on 1 September (Wikimedia techblog), to all remaining wikis (mostly the smaller ones).
 * A number of problems with image thumbnails are outstanding; for example, with large thumbnails (bug #24824) and the sharpness of thumbnails (bug #24857).
 * Further to previous coverage, User:Simetrical has begun his overhaul of the category display system, this week improving the  extension, which had previously been disabled on WMF wikis over performance concerns (bug #23682).
 * In last week's Technology report, it was noted that the complexity and informality of wikitext presented a problem in developing WYSIWYG editors. Recently, Andreas Jonsson reported preliminary success in moving to a formalised, predictable model (wikitext-l mailing list).
 * Researcher Dirk Riehle argues that "companies are shying away from bringing commercial innovation and investment to MediaWiki because of the uncertainty around its intellectual property", pointing in particular to the question of whether the GPL would prevent the publishing of proprietary extensions, and to the use of the term "MediaWiki". He suggested setting up a separate "MediaWiki Foundation".