Wikipedia:Bots/Requests for approval/HermesBot 9


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

HermesBot
Operator: Wikihermit

Automatic or Manually Assisted: auto - unsupervised

Programming Language(s): PHP

Function Summary: remove deleted articles from SCV

Edit period(s) (e.g. Continuous, daily, one time run): continuous

Edit rate requested: 5 edits per minute (though actually reaching this rate is extremely unlikely)

Already has a bot flag (Y/N): Y

Function Details: Remove entries for deleted articles from SCV. Cobi has written the source code in PHP for me, and I will run it from the Toolserver once it is approved.

Discussion
The bot should not be making edits like this. For one, it has not yet been approved for trial. For another, it does not need to be resolving HTML entities. You can either edit the code you were provided to stop this, or I can give you new code which we can try out. :) — madman bum and angel 17:16, 16 August 2007 (UTC)
 * Yeah, I saw that. I have contacted Cobi about it. If you have the code already created you could go ahead and give it to me for a trial run. ~ Wikihermit 18:38, 16 August 2007 (UTC)
 * For the record, the code has been fixed. Trial? ~ Wikihermit 18:54, 16 August 2007 (UTC)
 * --ST47 Talk·Desk 18:55, 16 August 2007 (UTC)

The trial edit still touched lines it shouldn't have. Converting the em-dash back to &mdash; doesn't solve the problem; it just masks it. HTML entities shouldn't be converted in the first place. I've talked to that bot writer about his framework, but he's convinced that there is no problem. Why don't you try this? It's not commented yet, but hopefully you can understand most of it. I saved a few diff edits and it seemed to work properly.

I do promise to document it; I'm just allegedly on a half-wikibreak and would hate to ruin that image. ;) — madman bum and angel 22:16, 16 August 2007 (UTC)


 * Oh, and note that HTTP_Client has to be installed. I'm sure it is on the Toolserver; they should have a local mirror of PEAR.  If not, just get river or another root to run pear install HTTP_Client.  — madman bum and angel 22:39, 16 August 2007 (UTC)
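
The point about entity conversion above can be illustrated with a small sketch. This is Python rather than the bot's PHP, it is not the code from either framework, and the function name `naive_round_trip` is hypothetical. It only shows why decoding entities and then re-encoding them cannot reliably restore the original wikitext: once `&mdash;` has been decoded, it is indistinguishable from a dash that was typed literally.

```python
import html

def naive_round_trip(wikitext):
    """Decode entities, then try to put them back (the approach that masks the bug)."""
    decoded = html.unescape(wikitext)            # "&mdash;" becomes U+2014
    # After decoding there is no way to tell which dashes were entities,
    # so every dash gets rewritten, including ones that were typed literally.
    return decoded.replace("\u2014", "&mdash;")

original = "first &mdash; second \u2014 third"
print(naive_round_trip(original) == original)    # False
```

Operating on the raw wikitext string, without decoding it at any point, avoids the problem entirely, because lines the bot does not remove are never rewritten.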

The trial seems to be working fairly well, but I have a couple of minor problems with the code being used currently. It does several unnecessary things. It changes paragraph breaks to line breaks (I'm not sure why). I do like that it will only edit below the <!-- comment -->. I don't really like that it reformats existing lines, and that it has such a fixed format for lines it'll match. For example, the regex to match a reported line is:

/(^|\n)\*\s*([^ ][^\n]*\s*)?\[\[([^|\x5d]+)\]\]\s*([^ ][^\n]*\s*)?(--|—)\s*([^ ](.|\n)*)(?=(\n\*\s*([^ ][^\n]*\s*)?\[\[[^|\x5d]+\]\]\s*([^ ][^\n]*\s*)?(--|—)|$))/iU

Whereas my code just takes out any line that has a redlink in it (without a namespace specification). It's up to you; I just worry that should any particulars about the page change, the bot will break, possibly magnificently. — madman bum and angel 03:47, 17 August 2007 (UTC)
 * It seems to be working reasonably well now - are those issues still there, or is it working fine? Matt/TheFearow (Talk) (Contribs) (Bot) 22:49, 19 August 2007 (UTC)
 * It works well in practice. It touches lines that it shouldn't touch, but it's much needed and non-disruptive.  When/if it breaks, we'll talk about it.   — madman bum and angel 04:18, 20 August 2007 (UTC)
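
For comparison with the fixed-format regex discussed above, the simpler approach (drop any line that contains a redlink without a namespace prefix, and only look below the comment marker) might be sketched as follows. This is an illustrative Python sketch, not the code used by either bot. The names `remove_redlink_lines`, `page_exists` and `marker` are assumptions; in particular, `page_exists` stands in for whatever existence check a real bot would make against the wiki.

```python
import re

# Captures the target of a wikilink: [[Target]] or [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def remove_redlink_lines(wikitext, page_exists, marker="-->"):
    """Drop lines below `marker` that link to a missing page with no namespace prefix.

    `page_exists` is a hypothetical callable(title) -> bool supplied by the caller.
    """
    head, sep, body = wikitext.partition(marker)
    if not sep:
        return wikitext  # marker not found: leave the page untouched
    kept = []
    for line in body.split("\n"):
        targets = [t.strip() for t in WIKILINK.findall(line)]
        # Simplification: any title containing ":" is treated as namespaced
        # and ignored, which also skips main-namespace titles with a colon.
        mainspace = [t for t in targets if ":" not in t]
        if any(not page_exists(t) for t in mainspace):
            continue  # the line links to a deleted article, so remove it
        kept.append(line)  # all other lines are kept exactly as written
    return head + sep + "\n".join(kept)
```

Because lines are either dropped whole or copied through unchanged, this kind of approach never reformats existing entries, and it does not depend on the exact layout of a report line.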


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.