Wikipedia:Bot requests/Archive 2

Black 'n white bot
Several articles about (in particular) Pokemon use cutesy colors for every word. Needless to say this reduces legibility. It would be appreciated if a bot were available that stripped font/color tags from an article. Radiant_* 09:35, Jun 3, 2005 (UTC)
 * There are legitimate places where colour is used usefully, but probably not that many. If a bot could distinguish them, fine, otherwise I don't think it worth it. Can you link to an example article? Thryduulf 10:41, 3 Jun 2005 (UTC)
 * I'll take a look. Most were templated, luckily. Radiant_>|< 08:55, July 15, 2005 (UTC)
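A minimal sketch of the kind of stripping requested above, assuming the coloring is done with plain `<font>` tags. The function name `strip_font_tags` and the sample text are made up for illustration. It removes every `<font>` tag it finds, so it cannot tell legitimate uses of colour from decorative ones, and it does not touch colour applied through inline `style` attributes or templates.

```python
import re

# Matches opening <font ...> tags and closing </font> tags, case-insensitively.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(wikitext):
    """Remove <font> tags but keep the text they wrap."""
    return FONT_TAG.sub("", wikitext)


# Hypothetical sample text, not taken from any real article.
sample = '<font color="red">Pikachu</font> is an <FONT COLOR=yellow>Electric</FONT> type.'
print(strip_font_tags(sample))
```

A real bot would probably need a whitelist of pages or contexts where colour carries meaning before making edits like this unattended.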

HTML Entities to Unicode conversion
Having a bot to do numeric HTML entities to Unicode character conversion would be good. e.g.:
 * &#26085;&#26412;

versus
 * 日本

There are *a lot* of these. I wouldn't ask for non-numeric HTML conversion. That is, there is no reason to convert things like &mdash; etc. Thoughts? --ChrisRuvolo (t) 18:08, 4 August 2005 (UTC)
 * Seems like a good idea. One change I'd like to see, though, is to convert only scripts with traditional characters, that is, those where each character is a separate glyph and the text direction is left to right (e.g. I'd rather you didn't convert text in Hebrew, Arabic, etc., as they are a pain to work with in the edit box for those not familiar with them). Plugwash 19:13, 4 August 2005 (UTC)
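The numeric-only conversion proposed above, with the suggested exception for right-to-left scripts, could be sketched roughly as follows. This is an illustration under stated assumptions, not any existing bot's code: the name `convert_numeric_entities` and the `skip_rtl` flag are invented here, right-to-left characters are detected through Python's `unicodedata.bidirectional` classes `R` and `AL`, and references below U+00A0 are left alone on the assumption that they are often deliberate escapes of markup characters.

```python
import re
import unicodedata

# Decimal (&#26085;) or hexadecimal (&#x65E5;) numeric character references.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Unicode bidirectional classes of strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}


def convert_numeric_entities(text, skip_rtl=True):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Assumption: keep ASCII and control-range references (often escapes
        # of markup characters), surrogates, and out-of-range values as-is.
        if code < 0xA0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        # Optionally leave Hebrew, Arabic, and similar references untouched.
        if skip_rtl and unicodedata.bidirectional(char) in RTL_CLASSES:
            return match.group(0)
        return char

    return NUMERIC_ENTITY.sub(replace, text)


print(convert_numeric_entities("&#26085;&#26412;"))
```

Named entities such as `&mdash;` do not match the pattern, so they pass through unchanged.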

I have written such a bot, see User:Curpsbot-unicodify. It also does non-numeric conversions... there's no real reason to keep &eacute; and others, is there? Regarding Hebrew and Arabic, weirdness in editing only occurs if you edit within the Hebrew or Arabic (right-to-left) string, which you would probably only do if you were familiar with those languages; you can edit the rest of the article (in English) without any problems. -- Curps 09:32, 25 August 2005 (UTC)
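For illustration only, a non-numeric conversion might look like the sketch below. It is not the code of User:Curpsbot-unicodify, which is not shown in this discussion. The function name and the `KEEP_ESCAPED` set are assumptions: entities such as `&amp;` and `&lt;` are kept because replacing them would change how the markup parses, and `&nbsp;` is kept because the literal character is invisible in the edit box. The lookup table is Python's standard `html.entities.name2codepoint`.

```python
import re
from html.entities import name2codepoint

# Assumption: these entities are kept as-is because the literal characters
# would either change the markup or be invisible in the edit box.
KEEP_ESCAPED = {"amp", "lt", "gt", "quot", "nbsp"}

NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def convert_named_entities(text):
    """Replace known named entities such as &eacute; with their characters."""
    def replace(match):
        name = match.group(1)
        if name in KEEP_ESCAPED or name not in name2codepoint:
            return match.group(0)  # leave protected or unknown names alone
        return chr(name2codepoint[name])

    return NAMED_ENTITY.sub(replace, text)


print(convert_named_entities("caf&eacute; &mdash; &amp;"))
```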

I see what you mean about weirdness in the edit box. However, the weirdness only seems to happen when there's non-whitespace ASCII embedded between Arabic characters (or Hebrew) and it's adjacent to an Arabic character (not separated by whitespace on both sides). If that's the case, then the browser editor tries to decide whether it's dealing with an English line of text with Arabic embedded or an Arabic line of text with English embedded. If you start out with lots of English and then delete it one character at a time, you can actually get the browser editor (under IE) to "jump" from normal mode to "weird mode". Actually, at one point I got it to happen just by deleting a letter and typing a number in its place.

Mind you, the weirdness is purely within the editor. As far as the rendered display (what the reader sees) is concerned, it still works just fine.

Editor weirdness only happens in relatively rare situations, like Template:User ar-1, where we actually want to put  around Arabic text. However, the vast majority of cases in ordinary articles consists of just a couple of words of Arabic with only whitespace between each other and with whitespace separating them from the surrounding English text. And that works just fine.

I changed the bot to warn about embedded ASCII within right-to-left characters. It issues a warning and a prompt, and if the edit goes through anyway it puts a warning in the edit summary to check the result. -- Curps 08:03, 28 August 2005 (UTC)
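The check described above might be approximated as in the following sketch. It is a simplification written for illustration, not the bot's actual logic: it flags any non-whitespace ASCII character that sits directly next to a right-to-left character, and the names `has_ascii_next_to_rtl` and `edit_summary`, as well as the summary wording, are hypothetical.

```python
import unicodedata


def is_rtl(ch):
    """True for strongly right-to-left characters (Hebrew, Arabic, ...)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def has_ascii_next_to_rtl(text):
    """Detect non-whitespace ASCII directly adjacent to an RTL character."""
    for left, right in zip(text, text[1:]):
        # Check both orders: RTL followed by ASCII, and ASCII followed by RTL.
        for rtl_candidate, other in ((left, right), (right, left)):
            if is_rtl(rtl_candidate) and other.isascii() and not other.isspace():
                return True
    return False


def edit_summary(text, base="Convert HTML entities to Unicode"):
    """Append a warning to the summary when the risky pattern is present."""
    if has_ascii_next_to_rtl(text):
        return base + " (warning: ASCII embedded in right-to-left text, please check the result)"
    return base


# Arabic words separated from English by whitespace: no warning expected.
print(edit_summary("\u0645\u0631\u062d\u0628\u0627 world"))
# Brackets touching Arabic letters: warning expected.
print(edit_summary("[\u0645\u0631]"))
```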

Obscenity Bot
Has anybody made a bot that looks through a lot of articles and searches for bad words? Usually this is the work of vandals, and the bot could report to people so that they can take a look at it and revert it. Words like "F*ck", "Sh*t", "B*tch", et cetera are sometimes used in context, but are usually inserted by vandals or trolls. Just throwing an idea out there. Kangy 02:19, 5 November 2005 (UTC)


 * What would this do that a simple search would not do? See for example . If anyone decides to do this, be sure to avoid pages outside of the main namespace (talk, user, etc. should remain unchanged). See also: Profanity, WikiProject Wikipedians against censorship. --ChrisRuvolo (t) 03:07, 5 November 2005 (UTC)
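If someone did attempt this, a report-only scan restricted to the main namespace could be sketched as below. Everything here is an assumption for illustration: the prefix list is a partial, hand-written guess at non-article namespaces, the watchlist uses mild placeholder words, and the function names are invented. The sketch only reports matches for a human to review, since such words are sometimes used in context.

```python
import re

# Assumption: a partial, hand-written list of non-article namespace prefixes.
NON_ARTICLE_PREFIXES = (
    "Talk:", "User:", "User talk:", "Wikipedia:", "Wikipedia talk:",
    "Template:", "Template talk:", "Category:", "Image:",
)


def is_main_namespace(title):
    """True when the page title carries none of the listed prefixes."""
    return not title.startswith(NON_ARTICLE_PREFIXES)


def find_flagged_words(text, watchlist):
    """Return (offset, word) pairs for whole-word, case-insensitive matches."""
    if not watchlist:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in watchlist) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


# Placeholder watchlist; a real list would be maintained separately.
watchlist = ["darn", "heck"]
if is_main_namespace("Profanity"):
    print(find_flagged_words("What the Heck is this", watchlist))
```

Matching on whole words avoids flagging longer words that merely contain a watched string.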

Never mind... forget it. Kangy 06:47, 5 November 2005 (UTC)