User:Tim Starling/Weekly reports/2008-W02

I haven't written a weekly report for a while, and when I did write them, I wasn't sure what their purpose was or what audience I was targeting. They were terse: essentially just a list of project names. Sue eventually complained that she didn't know what was going on in development, so today I've written a report on ongoing and recently completed work with a non-technical audience in mind. Tell me what you think of this format.

Summary:
 * Ongoing projects: new preprocessor, DumpHTML
 * Completed projects: LabeledSectionTransclusion, #tag, CheckUser log

Parser work (new preprocessor)
This is a major project which I've been working on at a low priority since early November. The parser is the module of MediaWiki responsible for processing wikitext, especially converting it to HTML. The main goal of this project is to improve the speed of the parser: large articles on Wikipedia commonly take tens of seconds to parse (i.e. convert to HTML), which seriously degrades the user experience. A faster parser also reduces our hardware costs.

The current project is a partial rewrite of the preprocessor phase of the parser, especially the part that deals with templates. It's very complex code, so each line takes much longer to write and test than in a typical project. The project has been in a testing and bug-fixing stage since late November, and is now (hopefully) nearing completion.

LabeledSectionTransclusion
LabeledSectionTransclusion is an extension written by Steve Sanbeg for transcluding (embedding) sections of one article in another page, where the sections are marked out with labelled tags. It's used on Wikisource. I did a partial rewrite of it this week, in order to:


 * Improve its integration with the new preprocessor, thereby fixing a number of bugs.
 * Provide other developers with an example of how the new preprocessor can perform flexible analysis of wikitext.
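To illustrate what "sections labelled with tags" means, here is a minimal sketch in Python. It assumes markers of the form `<section begin=name />` and `<section end=name />` around each labelled section; the regular expression and the `extract_section` helper are hypothetical. This sketch uses plain text matching and is not how the extension itself is implemented.

```python
import re

# Assumed marker syntax: <section begin=intro /> or <section end="intro" />.
# This regex is an illustrative simplification, not the extension's own parser.
SECTION_TAG = re.compile(
    r'<section\s+(begin|end)\s*=\s*"?([^"/>]+?)"?\s*/>', re.IGNORECASE
)


def extract_section(wikitext: str, label: str) -> str:
    """Return the text of every section named `label`, concatenated.

    Hypothetical helper: text between a matching begin marker and the next
    matching end marker is kept, and markers belonging to other (nested)
    sections are stripped from the result.
    """
    parts = []
    start = None
    for m in SECTION_TAG.finditer(wikitext):
        kind, name = m.group(1).lower(), m.group(2).strip()
        if name != label:
            continue  # marker for some other section; skip it
        if kind == "begin" and start is None:
            start = m.end()  # section content starts after the begin marker
        elif kind == "end" and start is not None:
            parts.append(wikitext[start:m.start()])
            start = None
    # Remove markers of any other sections nested inside the kept text.
    return SECTION_TAG.sub("", "".join(parts))


page = "Intro. <section begin=a />Alpha<section end=a /> Rest of page"
print(extract_section(page, "a"))  # prints: Alpha
```

Transcluding a section into another page then amounts to looking up the source page's wikitext and inserting only the extracted text, leaving the rest of the page out.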

DumpHTML
DumpHTML is a project to dump the contents of a MediaWiki installation to a collection of static HTML files. I started it in 2005 in response to a request from Widernet, who wanted to install copies of Wikipedia on university intranets in Africa as part of their eGranary project. We made a few dumps, but the code has since been neglected and has fallen into disrepair. I restarted work on it shortly before Christmas. The goals are:


 * To resume producing static HTML dumps of Wikipedia, which many external parties wishing to reuse our content have requested.
 * To make DumpHTML easier for non-Wikimedia users of MediaWiki to use. I gave no thought to ease of use in the initial design, yet DumpHTML has become a valuable tool for some of these users, and I often have to answer their frustrated questions about how to use it.

#tag:
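This is a handy little feature which someone implemented as an extension and asked me to install. It lets editors write an extension tag such as <ref> in parser-function form, {{#tag:ref|content}}, so that template parameters and other wikitext inside the tag are expanded before the tag itself is processed. I rewrote it using the new preprocessor and added it to the MediaWiki core. I have no doubt it will become widely used on Wikipedia.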

The original extension is documented at http://www.mediawiki.org/wiki/Extension:TagParser, and my version works in essentially the same way.

CheckUser log
I rewrote the logging code for the CheckUser extension. This took about half a day, and I had put it off for far too long. The old logging code was terrible: I originally wrote it in maybe 10 minutes, and over the following two years various people progressively hacked at it in a vain attempt to keep its behaviour reasonable as the extension grew in popularity.

The CheckUser extension is an abuse investigation tool used on Wikimedia wikis. It allows trusted users to see the IP addresses of logged-in editors, and to search for edits by IP address.