User:Haus/Hanzo

Hanzo is an experimental plug-in for jEdit built to partially automate the process of converting from bad, old-style Infobox Ship templates to shiny, new Infobox Ship Begin templates. This is a task undertaken by the Ships Wikiproject and is (briefly) described at Category:Ship articles needing infobox conversion.

It's so named because The Bride was wreaking havoc with a Hattori Hanzo sword on TV while I was searching for a name for a Java class.

The program has absolutely no other use in the universe, and the only way someone else could use it would be to basically do a bit-by-bit copy of my hard drive. It depends on about a zillion other packages. That said, if you want to write a lexer for infoboxes or automate some editing processes in BeanShell, I have some notes below.

Hanzo has helped me convert nearly all of the 3,282 infoboxes I created it for in 5 days. The remaining 50 or so infoboxes have to be cleaned up by hand, which is costing extra time. Hanzo's current status could best be described as "humming along smoothly for a few hundred edits, then bursting into flames."

Feedback
If you're here, you probably saw an edit summary. I have a watch on the discussion page here. Feedback away.

Project history
3,282 infoboxes were converted in a period of 5 days, 8 hours and 27 minutes, from
 * 10:35, 27 March 2008, USS Patrick Henry (SSBN-599) (replaced infobox using Hanzo)
to
 * 19:02, 1 April 2008, HMS Dragon (D35) (Migrating infobox with Hanzo)

This represents about 25.55 conversions per calendar hour over the period of 128.45 hours.
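
The figures above can be rechecked with a few lines of Java. This is only a sketch of the arithmetic, using the two timestamps from the edit list and the count of 3,282; the class and method names are made up for illustration and are not part of Hanzo.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper, not part of Hanzo: rechecks the project-history arithmetic.
class ConversionRate {
    // Elapsed time between the first and last edit, in hours.
    static double elapsedHours(LocalDateTime first, LocalDateTime last) {
        return Duration.between(first, last).toMinutes() / 60.0;
    }

    // Average conversions per calendar hour.
    static double perHour(int conversions, double hours) {
        return conversions / hours;
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2008, 3, 27, 10, 35); // first Hanzo edit
        LocalDateTime last = LocalDateTime.of(2008, 4, 1, 19, 2);    // last Hanzo edit
        double hours = elapsedHours(first, last);
        System.out.printf("%.2f hours, %.2f conversions per hour%n",
                hours, perHour(3282, hours));
    }
}
```

With these inputs the elapsed time comes to 128.45 hours and the rate to roughly 25.55 conversions per hour.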

Related infobox issues
As of 30 March 2008, about 3,750 pages use Infobox Ship Begin (listed here). Ship infoboxes requiring conversion include approximately:


 * (1) 2,500 Infobox Ship remaining, listed here (of the original 3,282),
 * (2) 50 table header 01 conversions,
 * (3) 1,000 table header 02 conversions,
 * (4) 2,161 hand-tagged articles, including subst'ed infoboxes.

I haven't formally analyzed (2) and (3), but they should be mostly amenable to automation. (4) might not be as easy; it may be something of a head-scratcher.

Technical
Hanzo's main functionality comes from a lexical analyzer written in Java with JFlex. To a large extent, the program lives inside a jEdit environment. A single-purpose program, it just barely functions. It was written in four rather arduous days: about a day to write the lexer (twice, per Raymond's law), half a day to uninstall/reinstall/fix jEdit to work with mwjed, and two and a half days to do stuff like:
 * get communication from WP to the lexer and back
 * preserve UTF-8 characters
 * do automatic page loading
 * do local diffs
 * automate to a 1-click process
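
As an illustration of the "preserve UTF-8 characters" item, here is a minimal sketch of the usual Java pitfall: converting between bytes and strings without naming the charset falls back to the platform default, which can mangle ship names with diacritics. I am not claiming this is how Hanzo handles it; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Hanzo's code: always name the charset explicitly.
class Utf8RoundTrip {
    // Decode page bytes as UTF-8 rather than relying on the platform default.
    static String decode(byte[] pageBytes) {
        return new String(pageBytes, StandardCharsets.UTF_8);
    }

    // Encode wikitext as UTF-8 before sending it back.
    static byte[] encode(String wikitext) {
        return wikitext.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the compiler's encoding.
        String name = "\u014Ci and \u00C9mile Bertin";
        String roundTripped = decode(encode(name));
        System.out.println(name.equals(roundTripped)); // true when the round trip is lossless
    }
}
```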

It has one goal in life: to translate Ship-specific infoboxes.

Translating these infoboxes with regular expression search-and-replace seemed nuts to me. I couldn't bring myself to hack out code to do it. On the other hand, a small lexer with a dozen rules and 4 parse states seems to do it pretty nicely.
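
To give a feel for why states help, here is a rough hand-written sketch in plain Java. It is not Hanzo's JFlex specification, and it is smaller (three states plus a nesting counter), but it shows the idea: a pipe only separates fields when the scanner is not inside a nested link or template, which is awkward to express with search-and-replace. The class name, method name and sample field names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the idea, NOT Hanzo's JFlex lexer.
class InfoboxScanner {
    private enum State { NAME, KEY, VALUE }

    // Returns the top-level "|key = value" fields of one template call.
    // Assumes the text starts with "{{" and ends with the matching "}}".
    static Map<String, String> fields(String template) {
        String body = template.trim();
        body = body.substring(2, body.length() - 2); // strip the outer {{ }}
        Map<String, String> out = new LinkedHashMap<>();
        State state = State.NAME;
        int depth = 0; // nesting depth of inner [[...]] and {{...}}
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            boolean doubled = i + 1 < body.length() && body.charAt(i + 1) == c;
            StringBuilder buf = state == State.KEY ? key
                    : state == State.VALUE ? value : null;
            if (doubled && (c == '[' || c == '{' || c == ']' || c == '}')) {
                // Entering or leaving a nested link or template.
                depth += (c == '[' || c == '{') ? 1 : -1;
                if (buf != null) buf.append(c).append(c);
                i++; // consume the second bracket as well
            } else if (c == '|' && depth == 0) {
                // A top-level pipe ends the current field and starts a new key.
                if (state == State.VALUE) {
                    out.put(key.toString().trim(), value.toString().trim());
                }
                key.setLength(0);
                value.setLength(0);
                state = State.KEY;
            } else if (c == '=' && state == State.KEY && depth == 0) {
                state = State.VALUE;
            } else if (buf != null) {
                buf.append(c);
            }
        }
        if (state == State.VALUE) {
            out.put(key.toString().trim(), value.toString().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "{{Infobox Ship\n"
                + "|Ship name = USS Example\n"
                + "|Ship image = [[Image:Example.jpg|300px]]\n"
                + "}}";
        // The pipe inside the image link stays part of the value.
        fields(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Once the fields are separated this way, translating to the new template is mostly a matter of renaming keys, which is where a generated lexer with per-state rules earns its keep.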

Requirements
The BeanShell scripts below need an environment something like this:
 * jEdit version 4.3pre13 or later from http://www.jedit.org
 * the mwjed Wikimedia plugin for jEdit
 * mwjed has some requirements of its own; read the mwjed page carefully
 * the JDiff plugin, available from inside the jEdit Plugin manager (Plugins menu, Plugin manager item, Install tab)

Possibly reusable bits

 * A flex lexer for infoboxes
 * Single load-process-save cycle in Beanshell
 * Single load-process-diff-save cycle in Beanshell