Wikipedia:Wikipedia Signpost/2010-09-20/Technology report

Washington DC "Hack-a-Ton" announced
On the Wikimedia Techblog, contractor Chad Horohoe announced the first Wikimedia "hack-a-ton", an event where developers, amateur and professional, get together with the explicit aim of bug-fixing and generally getting "down and dirty with the code". Designed to act as a counterpoint to the "MediaWiki Developers' Meetup" in Berlin, which is focused on demonstrations, workshops and small group discussions, the event is scheduled for October 22–24 in Washington DC. Bugs for the weekend will be tracked using a new keyword in Bugzilla, "bugsmash". MediaWiki has around 4900 bugs and feature requests outstanding from a total pool of around 25000, though not all relate to the core MediaWiki software.

Google Summer of Code: Samuel Lampa
We continue a series of articles about this year's Google Summer of Code (GSoC) with Samuel Lampa, a biotechnology student at Uppsala University, who describes his project to develop a system for the general import and export of RDF metadata from the Semantic MediaWiki software. Some of you might know Semantic MediaWiki, the MediaWiki extension that (if installed, which is not currently the case on Wikimedia wikis) lets users annotate facts in articles with a special syntax, which makes them "machine readable". This allows external software tools to use the facts for things like integrating data, querying the data in a bandwidth-saving way, providing powerful search facilities, and so on. For example, one would add such an annotation to a fact stated in the Stockholm article. Annotations are of course best embedded in templates such as the infobox on the Methane article, where they can make use of the already formatted information without bothering users with additional syntax.
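To make "machine readable" concrete, here is a minimal sketch of how an external tool might pull annotated facts out of article text. It assumes the inline annotation form `[[Property::value]]` used by Semantic MediaWiki; the property names, the sample sentence and the function `extract_facts` are invented for illustration and are not taken from any real wiki or from the extension's own code.

```python
import re

# Matches inline annotations of the form [[Property::value]]
# or [[Property::value|display text]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_facts(page_title, wikitext):
    """Return (subject, property, value) triples found in the wikitext."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Hypothetical article text; the property names are assumptions.
sample = (
    "Stockholm is the capital of [[Is capital of::Sweden]] "
    "and lies in [[Located in::Europe|the north of Europe]]."
)

facts = extract_facts("Stockholm", sample)
for fact in facts:
    print(fact)
```

Each annotation becomes a small subject, property, value statement, which is the shape of data that the rest of this article builds on.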

Apart from Wikipedia, MediaWiki is used by numerous organizations and companies for all kinds of knowledge bases. In fields such as construction and engineering there are loads of data available in strictly formalized and standardized document formats that, if stored in Semantic MediaWiki, could be turned into "machine readable", queryable databases, by simply adding semantic annotations in the templates, for example.

Now, what if one exposed this data in a standardized format that the rest of the web was using, with everyone using the same identifier for "Stockholm" and "Bosch spark plug no 0001"? This would enable connecting all the available data into a big "web of things" instead of a "web of documents", which can be queried much more smartly: asking explicitly for "all cities in" "Europe", or "all spark plugs that fit" "Volvo V70", for example, instead of guessing the keyword combination that returns such a document on a search engine like Google. Such a format is already available, and is called RDF. Semantic MediaWiki already allows the static export of articles in RDF, but does not allow its import; nor does it provide a method out of the box to remotely select exactly those pieces of data you want.
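The difference between keyword search and an explicit query can be sketched with a toy in-memory store of subject, predicate, object statements. This is only an illustration of the idea behind RDF-style querying, not how Semantic MediaWiki or any RDF library is implemented; the identifiers and the helper functions `match` and `cities_in` are made up, and real RDF would use full URIs rather than short names.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers are invented for illustration.
TRIPLES = [
    ("Stockholm", "is_a", "City"),
    ("Stockholm", "located_in", "Europe"),
    ("Berlin", "is_a", "City"),
    ("Berlin", "located_in", "Europe"),
    ("Washington_DC", "is_a", "City"),
    ("Washington_DC", "located_in", "North_America"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def cities_in(triples, region):
    """Answer "all cities in <region>" by intersecting two patterns."""
    cities = {s for s, _, _ in match(triples, predicate="is_a", obj="City")}
    located = {s for s, _, _ in match(triples, predicate="located_in", obj=region)}
    return sorted(cities & located)


print(cities_in(TRIPLES, "Europe"))
```

Because every source uses the same identifier for the same thing, the two patterns can be combined without any guessing about wording.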

The RDFIO extension, which I built for the Google Summer of Code, addresses these gaps by providing the ability to import RDF, as well as an interface for both querying and updating facts via a so-called "SPARQL endpoint" (see here for an example), which external tools can also talk to very easily.
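As a rough sketch of what "talking to" a SPARQL endpoint involves, the snippet below builds the URL for a query sent as an HTTP GET request, with the query text passed in the `query` parameter as the SPARQL protocol describes. The endpoint address is a placeholder and the helper `build_query_url` is hypothetical; whether a particular RDFIO installation accepts requests in exactly this form is an assumption to check against its documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint address; substitute the URL of a real wiki.
ENDPOINT = "http://wiki.example.org/Special:SPARQLEndpoint"

# A generic query asking for any ten facts the store holds.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_query_url(endpoint, sparql):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

An external tool would fetch this URL and parse the result set it gets back; that step is left out here because it depends on the endpoint and the result format requested.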

This new ability to update semantic facts remotely opens up some interesting use cases. For example, chemists and biologists using Bioclipse can take their working data and export it to a wiki where their peers can make corrections, before importing it again for further analysis, etc. This workflow is in fact already possible, as hinted in this blog post / screencast, and is the focus of my current work (progress documented on the blog). For a more technical description as well as download and install instructions, see the RDFIO Extension page. The development of RDFIO, and the thoughts behind it, were documented on this blog.

In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
 * Nightshade, a login server for the Toolserver, is to be converted from the Linux operating system to Solaris at "some point in the future" (Toolserver-Announce mailing list). The Toolserver's other login server, Willow, has been running Solaris for some time without issue, though there are differences between the two setups for which some tool owners will need to compensate.
 * Users can now embed TIFF files into documents and have them thumbnailed. For multi-page files, the "page" parameter can be used to select a page for rendering.
 * Diffs with "intermediate revisions" will now say how many different users contributed to those revisions (bug #24007).
 * Gadget and script developers can now assume the availability of the jQuery library, after it was added to all skins, says User:DieBuche.
 * There was a temporary problem affecting servers handling the API on Friday, soon fixed, and another with European servers on Wednesday (identi.ca).
 * After the announcement in last week's Signpost that the Article Feedback extension will go live on the English Wikipedia on September 22, in a trial that is part of the Public Policy Initiative, a post on the Techblog and a FAQ provided further technical explanations.
 * As announced last week, longtime Wikimedia tech staff member Mark Bergsma has been promoted to Engineering Program Manager for Operations. The EPM position was recently introduced at the WMF, as explained by Erik Möller some weeks ago.
 * Wikimedia Chief Technical Officer Danese Cooper will hold an IRC office hour (a public chat event on IRC) on Wednesday, 22 September at 23:00 UTC.