Wikipedia:Bots/Requests for approval/HBC NameWatcherBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

HBC NameWatcherBot
Operator: HighInBC

Automatic or Manually Assisted: Fully automatic, unsupervised

Programming Language(s): Perl, with the same repaired MediaWiki.pm module that HBC AIV helperbot uses

Function Summary: Monitor the creation of new users, compare names to a whitelist and a blacklist, and report matches to a page for human examination.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: 1 edit per 10 seconds max. However, when I applied the filter to 24 hours of names I got only 40 hits, so the editing rate will be far lower in practice.

Already has a bot flag (Y/N):

Function Details:

The program runs in a loop, reading the last 25 new users every 5 seconds (or as defined by Read rate in the control panel), and monitors #en.wikipedia for new users. It examines each new user once, first removing from the name any string that is found in the whitelist, then comparing the remaining string to the blacklist.
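The two-step check described above can be sketched as follows. This is only an illustration in Python, not the bot's Perl source: the function names are hypothetical, and case-insensitive comparison is an assumption that the request does not state.

```python
def residual_name(name, whitelist):
    """Return the name with every whitelisted substring removed.

    Lower-casing is an assumption made for this sketch.
    """
    remaining = name.lower()
    for allowed in whitelist:
        if allowed:  # skip empty entries
            remaining = remaining.replace(allowed.lower(), "")
    return remaining


def blacklist_hits(name, whitelist, blacklist):
    """Return the blacklist entries found in what is left of the name."""
    remaining = residual_name(name, whitelist)
    return [bad for bad in blacklist if bad and bad.lower() in remaining]


if __name__ == "__main__":
    whitelist = ["badminton"]   # hypothetical entries
    blacklist = ["admin"]
    for candidate in ("BadmintonFan", "SiteAdmin42"):
        print(candidate, blacklist_hits(candidate, whitelist, blacklist))
```

Removing the whitelisted text first is what keeps a harmless name that happens to contain a blacklisted substring from being reported.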

Each pattern can carry several flags that alter what happens when the pattern is matched. Flags include WAIT_TILL_EDIT, which means the pattern will not match unless the user has edited; ALTERNATE_TARGET(page name), which directs reports to another location; and NOTE(message), LOW_CONFIDENCE, and SOCKPUPPET(sock puppet name), which add extended information to the report.
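As a rough illustration of how such flags might be attached to a pattern and consulted, here is a Python sketch. The entry syntax `pattern:FLAG,FLAG(argument)` is an assumption made for this example, as are the function names and the example page name; the request does not specify how entries are written on the list pages.

```python
import re
from dataclasses import dataclass, field

# A flag is an upper-case name with an optional parenthesised argument.
FLAG_RE = re.compile(r"([A-Z_]+)(?:\(([^)]*)\))?")


@dataclass
class Pattern:
    text: str
    flags: dict = field(default_factory=dict)


def parse_entry(entry):
    """Split a hypothetical 'pattern:FLAG,FLAG(arg)' entry into its parts."""
    text, _, flag_part = entry.partition(":")
    flags = {name: (arg or None) for name, arg in FLAG_RE.findall(flag_part)}
    return Pattern(text.strip(), flags)


def should_report(pattern, user_has_edited):
    """WAIT_TILL_EDIT suppresses a match until the user has edited."""
    return user_has_edited or "WAIT_TILL_EDIT" not in pattern.flags


def report_target(pattern, default_target):
    """ALTERNATE_TARGET overrides the default report page."""
    return pattern.flags.get("ALTERNATE_TARGET") or default_target


if __name__ == "__main__":
    p = parse_entry("admin:WAIT_TILL_EDIT,NOTE(possible impersonation)")
    print(p, should_report(p, user_has_edited=False))
```

Keeping the flags in a dictionary lets the informational flags (NOTE, LOW_CONFIDENCE, SOCKPUPPET) be copied into the report without special handling.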

Reports are added to the bottom of the page defined as the Default target, or of the page named by the ALTERNATE_TARGET flag. The bot will only write to pages that contain " ", and it writes at a rate defined by Write rate in the control panel.

The whitelist, the blacklist, and the control panel are all editable on wiki and will be read by the bot every 10 minutes.

Discussion
Existing discussion about this bot is here: Wikipedia talk:Usernames_for_administrator_attention. There seems to be a consensus for the bot, however User:Viridae has expressed some concerns.

I have added functionality such as the ability to wait for the user to edit before certain patterns are used so as to address these concerns. I will leave a note on User talk:Viridae to let him know about this request for approval.

The source code is written and is ready for its testing phase. I would like to have a few days to test the bot before it is rolled out. Once it is in a presentable state I will release the source code under the GFDL as usual. HighInBC(Need help? Ask me) 16:55, 11 May 2007 (UTC)
 * I don't see a reason for the bot to run in a 5 second loop and read the last 25 new names; why not just read the IRC feed? → Aza Toth 17:05, 11 May 2007 (UTC)


 * I can do that; I did not know there was an IRC feed. That will be much better, as Perl has a great IRC interface. What is the channel? HighInBC(Need help? Ask me) 17:07, 11 May 2007 (UTC)
 * #en.wikipedia at irc.wikimedia.org. It's the "rc" bot that is pumping out the data, for example:

 Special:Log/newusers create * Igor baltiski *  New user account → Aza Toth 17:10, 11 May 2007 (UTC)
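A minimal Python sketch of pulling the user name out of a feed line shaped like the example above. It assumes the line layout is exactly as quoted, with the name sitting between two asterisks, and that the live feed may wrap fields in mIRC-style colour codes; the function names are hypothetical and this is not the bot's own parser.

```python
import re

# mIRC-style formatting: \x03 plus optional colour digits, or a bare
# bold/reset/reverse/underline control character.
CONTROL_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1f]")

# Layout assumed from the quoted example:
#   Special:Log/newusers create * <name> *  New user account
NEWUSER_RE = re.compile(
    r"Special:Log/newusers\S*\s+create\s+\*\s+(?P<name>.+?)\s+\*\s"
)


def strip_formatting(line):
    """Remove IRC colour and style control codes from a line."""
    return CONTROL_RE.sub("", line)


def parse_new_user(line):
    """Return the new user's name, or None if the line is not a creation."""
    match = NEWUSER_RE.search(strip_formatting(line))
    return match.group("name") if match else None


if __name__ == "__main__":
    sample = " Special:Log/newusers create * Igor baltiski *  New user account"
    print(parse_new_user(sample))
```

Lines for ordinary page edits do not match the pattern and yield None, so the busy channel can be filtered down to account creations before any list comparison is done.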


 * Wow that is a busy channel, I can get my bot reading it no problem. HighInBC(Need help? Ask me) 17:33, 11 May 2007 (UTC)


 * The bot is now reading from the IRC feed. HighInBC(Need help? Ask me) 18:14, 11 May 2007 (UTC)


 * Looks good, for a day or two. ST47 Talk 18:17, 11 May 2007 (UTC)
 * Just a question on how this bot works, is it using regex to spot the bad names? ——  Eagle 101 Need help? 18:23, 11 May 2007 (UTC)


 * It is using simple string comparison. I could use regexes, but the problem is that people could enter a regex that uses an inordinate amount of computer resources. HighInBC(Need help? Ask me) 18:25, 11 May 2007 (UTC)
 * Regex is better in my experience, where would "people" be able to enter them? (note User:shadowbot uses regex effectively)——  Eagle 101 Need help? 21:30, 11 May 2007 (UTC)


 * Both the blacklist and the whitelist are editable; see User:HBC_NameWatcherBot/Whitelist and User:HBC_NameWatcherBot/Blacklist. I will protect them before the bot is rolled out, but even an honest mistake in a regex can screw up a bot. HighInBC(Need help? Ask me) 22:36, 11 May 2007 (UTC)


 * This is what I would do: admin-protect the pages, then test-run each regex; if it throws an exception or an error, ignore said regex (and perhaps log the error somewhere). You might be interested in seeing some of the bots on #vandalism-en-wp and related channels (on the freenode IRC network). Cheers! ——  Eagle 101 Need help? 22:51, 12 May 2007 (UTC)
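The test-run idea suggested above could look roughly like this in Python. It is a sketch only: the function and logger names are hypothetical, and case-insensitive matching is an assumption.

```python
import logging
import re

log = logging.getLogger("namewatcher.sketch")  # hypothetical logger name


def compile_safe(raw_patterns):
    """Compile each pattern, skipping and logging any that fail to compile."""
    compiled = []
    for raw in raw_patterns:
        try:
            compiled.append(re.compile(raw, re.IGNORECASE))
        except re.error as exc:
            # An honest typo on the list page is logged, not fatal.
            log.warning("ignoring invalid pattern %r: %s", raw, exc)
    return compiled


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    patterns = compile_safe(["adm[i1]n", "(unclosed", "b[o0]t$"])
    print([p.pattern for p in patterns])
```

Note that a compile check only catches syntax errors. It does not address the resource concern raised earlier in the thread, since a syntactically valid pattern can still be very slow on some inputs.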


 * I think regexes may work as it is now, but I have yet to find the need for anything but simple substrings so I have not tried. HighInBC(Need help? Ask me) 14:42, 13 May 2007 (UTC)


 * Current source code is now available at User:HBC NameWatcherBot/source. HighInBC(Need help? Ask me) 14:45, 13 May 2007 (UTC)


 * The bot now has all essential features programmed and tested. The last 50 contributions of the bot represent the current state of the program. I am now seeking final approval. HighInBC(Need help? Ask me) 13:48, 15 May 2007 (UTC)
 * Good. --ST47 Talk 00:53, 16 May 2007 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.