Wikipedia:Bots/Requests for approval/Svenbot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Svenbot
Operator:

Time filed: 21:10, Sunday August 14, 2011 (UTC)

Automatic or Manual: Automatic (with spot checks)

Programming language(s): Java

Source code available: Controlled by Fastily, his call.

Function overview: This is a duplicate copy of Fbot function 2, which was recently approved. Fastily has asked for my assistance in running a second copy of the bot for the following reasons:
 * 1) Because Fastily wants to keep this off the Toolserver, it can only be run on personal computers, and only when an operator is online.
 * 2) The estimated number of files that Fbot needs to go through is over 500,000. Even if it were running constantly (which it can't, because of point 1), it would take an unseemly amount of time for Fbot to get through the backlog alone.
 * 3) I was the one he asked because I've been involved in getting the bot set up, and I know the file namespace well enough to monitor effectively what the bot is doing.

Links to relevant discussions (where appropriate): Bots/Requests for approval/Fbot 2

Edit period(s):

Estimated number of pages affected: 20K (note that the bot has to look through 500K pages, but will only edit the 20K that meet its criteria)

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: See Bots/Requests for approval/Fbot 2

Discussion

 * I'm asking for a procedural speedy approval. Thanks, Sven Manguard  Wha?  21:10, 14 August 2011 (UTC)
 * I endorse speedy approval as well as the account given above. - FASTILY (TALK) 21:31, 14 August 2011 (UTC)
 * Oppose as the request stands; why duplicate? There's no reason for another bot just because it's got a lot to do; we can run the bot more quickly, allocate more resources to it, as necessary.
 * We approve a bot, not a task. When we approved "Fbot function 2", it was within a certain, defined remit, discussed and agreed upon; and one of the deciding factors was how fast it can run. To now accept another overrides that. So saying 'per ' isn't enough justification. Imagine if SOMEONE requests a bot to do X at 500/day, and it's approved, and then ANOTHERPERSON and YETANOTHER say 'please, I'll do it too, per SOMEONE'... and 100 more. That fundamentally affects the original approved remit.
 * A bot request needs to define an objective; to 'help with xxx-bot' isn't enough.  Chzz  ► 01:10, 15 August 2011 (UTC); clarified now, as requested by The Earwig.  Chzz  ►  01:36, 15 August 2011 (UTC)

I'm not sure why we need another account for the exact same thing. Responding to points in the function overview:
 * 1) What's wrong with the Toolserver? If you are against it for some reason, please share why. If it's simply because you don't have an account there and you want to run this 24/7, then someone with a Toolserver account should run it. No reason to duplicate the task only so it can be run by another person without Toolserver access.
 * 2) This is an issue of server load as well. If Fbot causes x amount of server load due to its internal throttling/maxlag, then that's for a reason – running the exact same task will not get around this problem, because the task will still be causing 2x server load or whatever regardless of which bot is doing it. If you want Fbot to be faster, then lower its throttle/maxlag, don't add *another* bot to the equation.
 * 3) No concerns with this part of it.

Chzz edit-conflicted me in the middle of this post, but I agree with what he's saying. If you want the task to be done faster, then we should discuss its speed/throttle, edit times, or Toolserver-ness, not just dump another bot into the mix that could potentially mess things up if we need to stop/block them, change their code, or something else. — The Earwig (talk) 01:25, 15 August 2011 (UTC)


 * I see some concerns have been raised above. I assumed what was stated in the function details was self-evident, but it's now obvious we could use some clarification.  In response to the above:


 * I have nothing against the Toolserver. I elected not to run my bot off the Toolserver because Fbot's edits must be scrutinized. Given the nature of the task, which pertains to copyright, mistakes and false positives must be corrected in a timely manner; there are too few Commons admins to effectively police and remove inappropriate transfers (in my experience as a Commons admin, these always turn out to be blatant copyright violations). An operator of this bot must therefore be online when the bot runs, to ensure that mistakes are quickly corrected and that spot checks are frequently performed.
 * I have asked, and intend to ask, only one user, Sven Manguard, to run a duplicate copy of this bot. He is a user whom I trust, a user who understands the implementation of this bot, and, most importantly, a user who really understands our complex copyright policies. In response to the concerns about server lag, Fbot causes none that I know of. My code is streamlined for maximum efficiency, and the bot does not query the API any more than it has to. If it helps, Sven and I can run our bots at different times of the day. We are regularly in contact, so this will not be a problem. Furthermore, I should note that I am currently very busy in RL, and expect to be for the next few months. As a result, I have very little time to spend online and am unable to run my bot regularly. To put this in context, by my calculations I will have time to run my bot once or twice a week for a few hours. It can review ~4,000 items in each run of 3 to 4 hours. There are currently ~500,000 items to review. Best-case scenario: assuming I run my bot twice a week regularly, it will take me 63 weeks to complete a single run of the task.


 * Hope this helps to clarify things. Regards, FASTILY (TALK) 04:09, 15 August 2011 (UTC)
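The completion estimate above follows directly from the stated figures: about 500,000 items, about 4,000 items per run, and two runs per week. The second line is only an illustration. It assumes Svenbot would add two runs per week at the same rate, a figure this discussion does not state.

```latex
\[
\frac{500\,000\ \text{items}}{4\,000\ \tfrac{\text{items}}{\text{run}} \times 2\ \tfrac{\text{runs}}{\text{week}}} = 62.5\ \text{weeks} \;\Rightarrow\; 63\ \text{weeks (rounded up)}
\]
\[
\frac{500\,000\ \text{items}}{4\,000\ \tfrac{\text{items}}{\text{run}} \times (2 + 2)\ \tfrac{\text{runs}}{\text{week}}} = 31.25\ \text{weeks} \;\Rightarrow\; 32\ \text{weeks (rounded up)}
\]
```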
 * It does help, yes; I struck the 'oppose'. This sounds more like a request for multiple *operators* for a single bot, the bot already having been approved. But I'm not familiar enough with policy on kinda "multi-operator bots" and things; also, I have rather a lot of concerns about files being moved to Commons. I know that quite a number of people object to moving them, on the basis that Commons policy can and does change on a whim, outside of enwiki control; hence, although almost no users explicitly use the 'do not move' tag, I know it's a bit controversial. But this probably isn't the place for that debate, and I see the other one was approved, so I'll just stay out of this discussion now :-) (unless someone shouts me back in here). Thanks for clarifying. Best of luck sorting it out, one way or another.  Chzz  ► 08:43, 15 August 2011 (UTC)
 * Btw, why does the bot need to read every one of the 500k pages? Assuming you use MediaWiki's API, you can easily correlate the embedded templates. The bot would then only need to read/write pages that match your white/blacklist rules. — HELL KNOWZ  ▎TALK 08:59, 15 August 2011 (UTC)
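The template correlation suggested above can be done with the MediaWiki Action API's `list=embeddedin` module, which lists the pages that transclude a given template. Below is a minimal Java (10+) sketch that only builds the request URL. It makes several assumptions. It targets the English Wikipedia endpoint. It restricts results to the File namespace (6). It uses a hypothetical template name, because Fbot's actual whitelist is not given here. Sending the request, parsing the JSON, and feeding each response's continuation value back in are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EmbeddedInQuery {
    // Assumed endpoint: English Wikipedia's Action API.
    public static final String API = "https://en.wikipedia.org/w/api.php";

    /**
     * Builds a list=embeddedin query for File: pages (namespace 6) that
     * transclude the given template. Pass null for the first request, then the
     * eicontinue value from the previous response to fetch the next batch.
     */
    public static String buildUrl(String template, String eicontinue) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("action", "query");
        params.put("list", "embeddedin");
        params.put("eititle", template);
        params.put("einamespace", "6");  // File namespace only
        params.put("eilimit", "500");    // cap for ordinary accounts; bot accounts may ask for more
        params.put("maxlag", "5");       // servers refuse the request if replication lag exceeds 5 s
        params.put("format", "json");
        if (eicontinue != null) {
            params.put("eicontinue", eicontinue);
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return API + "?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical template name, used only for illustration.
        System.out.println(buildUrl("Template:Example license", null));
    }
}
```

Only the File pages returned by such queries would then need to be read and checked against the whitelist and blacklist, rather than every one of the ~500,000 files.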

See no issues here, as long as the operators acknowledge the points of Bot policy. — HELL KNOWZ  ▎TALK 08:59, 15 August 2011 (UTC)


 * There are, as I understand it, two ways to do this. One would involve both Fastily and me logging into Fbot (and differentiating between the two of us by having a login in the bot itself). Both Fastily and I are uncomfortable with that option. The other would involve me running an identical copy of the script on my own bot account, and then having my bot point to his bot for bugs, comments, etc. We'd be able to ensure that we're not trampling each other because 1) we'll split the whitelist in two and each take a different part of it, so we're not working in the same area at the same time, and 2) for most of the rest of the year, I'll be in the time zone UTC+8, meaning I'd be awake (and running the script) when Fastily is asleep (and therefore not running the script himself). A good deal more planning has gone into this than my awkward statement above would indicate. All this is to say that criteria 1 and 2 of the multiple-use policy are met by my having my own account, and criterion 3 is covered by Fastily's endorsement above.  Sven Manguard  Wha?  09:12, 15 August 2011 (UTC)
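The first safeguard, splitting the whitelist so the two bots never work the same area at once, could be as simple as giving each operator a contiguous, non-overlapping slice. This is a minimal Java sketch, not Fbot's actual code. It assumes the whitelist is a plain ordered list of template names shared verbatim by both copies, and the entries shown are hypothetical placeholders. Because both copies run the same code on the same list, each one computes the same split independently.

```java
import java.util.List;

public class WhitelistSplit {
    /**
     * Returns the contiguous slice of the whitelist assigned to operator
     * number `index` (0-based) when the list is shared among `operators` bots.
     * Slices never overlap and together cover the whole list.
     */
    public static List<String> shareFor(List<String> whitelist, int index, int operators) {
        if (operators <= 0 || index < 0 || index >= operators) {
            throw new IllegalArgumentException("index must be in [0, operators)");
        }
        int n = whitelist.size();
        // long arithmetic avoids int overflow for very long lists
        int from = (int) ((long) n * index / operators);
        int to = (int) ((long) n * (index + 1) / operators);
        return whitelist.subList(from, to);
    }

    public static void main(String[] args) {
        // Placeholder entries; the real Fbot whitelist is not reproduced here.
        List<String> whitelist = List.of("Template:A", "Template:B", "Template:C", "Template:D", "Template:E");
        System.out.println("Fbot:    " + shareFor(whitelist, 0, 2)); // [Template:A, Template:B]
        System.out.println("Svenbot: " + shareFor(whitelist, 1, 2)); // [Template:C, Template:D, Template:E]
    }
}
```

A contiguous split keeps each operator's area easy to describe on a shared talk page. If the whitelist ever changes, both copies must be updated together so that their slices still line up.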

Indeed, after discussion with Sven, Fastily, Chzz, and some fellow BAG members on IRC last night, I now support this task as-is. Because it uses the same code as Fbot, I see no reason for another trial, so once we agree on how the bot will be run, we can quickly approve it. As long as Svenbot and Fbot's connection is made clear (talkpages redirecting to one common page, for example), I can't foresee any problems. Of course, your code and blacklists should always be kept in sync, but this should be relatively obvious. A single shutoff page that both bots look at would also be helpful, but not required. — The Earwig (talk) 20:11, 15 August 2011 (UTC)
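A single shutoff page that both bots check could look like the following minimal Java sketch. Two details are assumptions made for illustration and are not specified in this discussion. The first is the page title. The second is the convention that the page must read exactly `true` for the bots to keep editing. Fetching the page's wikitext (for example, through the API before each batch of edits) is left out.

```java
public class SharedShutoff {
    // Hypothetical title of the single page both Fbot and Svenbot would read.
    public static final String SHUTOFF_PAGE = "User:Fbot/Shutoff";

    /**
     * Returns true only if the shared page's wikitext is "true" (ignoring
     * surrounding whitespace and case). Any other content, including an empty
     * or missing page, stops both bots: failing closed is the safer default.
     */
    public static boolean shouldRun(String pageText) {
        return pageText != null && pageText.trim().equalsIgnoreCase("true");
    }

    public static void main(String[] args) {
        System.out.println(shouldRun("true\n"));                    // true: keep editing
        System.out.println(shouldRun("false - whitelist problem")); // false: any editor can halt both bots
        System.out.println(shouldRun(null));                        // false: page missing, stop
    }
}
```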


 * BAGAssistanceNeeded Given that consensus appears to be in favor of approving this bot, could we please have some closure on this matter? Thanks, FASTILY (TALK) 19:04, 16 August 2011 (UTC)

Bot is a duplicate of existing one. Multi-operator usage clarified. Necessity for the task to be split among users clarified. See no reason to delay this, BAG also doesn't appear to have any more concerns. — HELL KNOWZ  ▎TALK 19:08, 16 August 2011 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.