User:MarkAHershberger/Weekly reports/2010-W29

This week I tracked down a UTF-8 problem and refactored bits of LiquidThreads.

UTF-8
While digging through code review, I found r61258, which TimStarling had marked “fixme” back in June (months after any conversation on the revision) but which had not yet been addressed.

After an abortive attempt in r69333 to simply move the iconv code to UtfNormal::cleanUp, I dug into UTF-8 normalization again to figure out what was going on.

The first odd thing I saw was utf8_normalize: where did it come from? My initial Google searches only seemed to turn up old MediaWiki commits or comments on the code. After grep-searching Subversion, though, I turned up the PHP extension that Brion had put together as a wrapper around the ICU Project’s libintl library. (Note: asking about utf8_normalize on #mediawiki would probably have been quicker than searching for its origin. Brion was very helpful in the channel with other questions I had about UtfNormal.)

While Brion's wrapper pre-dates it, the modern way to use libintl appears to be the Intl PECL extension, which many modern systems will have installed or available. The UtfNormal class does contain a pure-PHP normalization function, but it is about 25 times slower than using <tt>libintl</tt>.

One caveat: it is easier to keep the pure-PHP version up-to-date with the latest Unicode normalizations. Older versions of <tt>libintl</tt> are harder for end-users to update, and they might fail some of the newer normalizations found in the Unicode Normalization Test Suite. For example, on IRC someone brought up problems they were having with MediaWiki and “the latest characters” in Burmese. These could have been caused by old versions of <tt>libintl</tt> on translatewiki.net (where the problems were showing up). I haven't yet had a chance to check this with Siebrand.
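
To make the normalization idea concrete, here is a minimal sketch in Python (not PHP, and not the UtfNormal code). It assumes NFC is the target form and uses Python's standard <tt>unicodedata</tt> module in place of <tt>libintl</tt>. The <tt>clean_up</tt> name is only an illustrative stand-in. The last line prints the version of the Unicode data the library was built with, which is the kind of thing that would matter for characters added after that version.

```python
import unicodedata


def clean_up(text: str) -> str:
    """Return text in Unicode Normalization Form C (NFC).

    Illustrative stand-in only; this is not MediaWiki's implementation.
    """
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# The two strings look identical on screen but differ byte-for-byte
# until both are normalized to the same form.
print(decomposed == composed)
print(clean_up(decomposed) == clean_up(composed))

# The Unicode database version bundled with this Python build.
# Characters assigned in later Unicode versions are unknown to it.
print(unicodedata.unidata_version)
```

A library built against older Unicode data has no way to know the composition rules for newer characters, which is one plausible (but unconfirmed) explanation for the Burmese reports.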

LiquidThreads
Again, during code review, I spent some time going through un-reviewed revisions of LiquidThreads.

One awkward bit of code I found was the way LiquidThreads checked to see if the WikiEditor extension was installed and called out to it. The way the dependency was buried deep in the code set off some alarms for me, and I went searching for a way to fix it. (It didn't help that calls to <tt>addJSandCSS</tt> were scattered throughout the various View classes and could be consolidated.)

My initial fix was tinged with irony. As Catrope pointed out, I had made a dependency in the reverse direction. Now WikiEditor had LiquidThreads hard-coded into it. (In my defence, it was “only” a hook name, not a section of code, but Catrope was right.)

Fixing this was easy enough and actually simplified the code, as I just had LiquidThreads use a hook that WikiEditor was already attached to. Code review wins another round!
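
The shape of that fix can be sketched roughly as follows. This is a toy hook registry in Python, not MediaWiki's hook system, and every name in it (<tt>register_hook</tt>, <tt>run_hook</tt>, the <tt>ShowEditForm</tt> hook, the two functions standing in for the extensions) is hypothetical. The point is only that neither side names the other: the editor attaches to a generic hook, and the discussion code runs that hook.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical registry: hook name -> list of handlers.
Handler = Callable[[List[str]], None]
hooks: DefaultDict[str, List[Handler]] = defaultdict(list)


def register_hook(name: str, handler: Handler) -> None:
    """Attach a handler to a named hook."""
    hooks[name].append(handler)


def run_hook(name: str, modules: List[str]) -> List[str]:
    """Call every handler attached to the hook, in registration order."""
    for handler in hooks[name]:
        handler(modules)
    return modules


# "Editor extension" side: attaches to a generic hook it already uses.
def add_toolbar(modules: List[str]) -> None:
    modules.append("toolbar")


register_hook("ShowEditForm", add_toolbar)


# "Discussion extension" side: runs the generic hook and never
# checks whether the editor extension is installed.
def show_reply_form() -> List[str]:
    return run_hook("ShowEditForm", [])


print(show_reply_form())
```

If the editor extension is not installed, no handler is registered and the hook run is simply a no-op, so the explicit "is it installed?" check disappears.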

Finally, the time I spent looking at LiquidThreads led to the discovery that its search box was broken when the Lucene search extension was not installed. I'll probably fix that one this coming week.