Wikipedia:Bots/Requests for approval/VWBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

VWBot
Operator:

Automatic or Manually assisted: Automatic

Programming language(s): Python

Source code available: Not written yet

Function overview: Clerical tasks for WikiProject Copyright Cleanup.

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: Less than 10 per day

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: I am interested in automating a decent variety of tasks focused on helping out with the copyright area, but they are all relatively minor:
 * 1) transclude new pages created by CorenSearchBot to WP:SCV
 * 2) add adminbacklog and backlog tags to WP:CP and WP:SCV respectively if/when they get backed up
 * 3) act as a backup for DumbBOT and Zorglbot for their tasks at WP:CP if/when they are down (this involves moving transclusions and loading new daily pages)
 * 4) notify authors that their pages have been blanked (by {{subst:copyvio}}) in case they aren't notified by the taggers, so that the pages don't get relisted for an extra week without any action being taken on them
 * 5) add pages newly tagged with close paraphrase to the appropriate daily page at WP:CP

Discussion
If you haven't done so already, please notify WikiProject Copyright Cleanup of this BRfA, and request some input from them. Regarding function 1, I presume you mean pages tagged by the bot (not created)? Doesn't CorenSearchBot do this itself? If not, I feel that it should be something CSB does, rather than having another bot running around after it. Regarding task 4, how do you identify if the user has already been warned? Do you link to the nominator? Will you be using a custom notification message for the author? - Kingpin13 (talk) 12:27, 24 May 2010 (UTC)
 * I am a member of Copyright Cleanup, and I've gone ahead and posted a link and request for input on the talk page. Regarding #1, I actually do mean created: it makes a new page recording every day (e.g., Suspected copyright violations/2010-05-24), and for the past couple of months I've just been transcluding them into the main WP:SCV page (like this), since that's how they were doing it before I joined. No, CSBot doesn't do it itself, and it could be something that it does, but DumbBOT already transcludes the page onto the daily WP:CP page (here), so I guess I just assumed it was something to be done by other bots, leaving CorenSearchBot to its main task of the actual searching and tagging.


 * Regarding #4, this is obviously the most complicated part of the bot, and it may take some time for me to figure out the best practices and algorithm for it. After blanking, a page (or possibly just sections of the page) looks like this, which provides the convenient " ~ " code to notify the uploaders, which when placed looks like this. I believe the bot message should be largely the same, albeit with an additional note at the end that the message was left by a bot and instructions to contact me if the bot is in error or leaving a duplicate notice or the like. I hadn't considered the need to link to the nominator, but now that you mention it I probably should, in case the uploader has any questions about why the article/section was blanked. I was going to have it figure out which new articles were transcluding Template:copyviocore from the previous day and then check for who uploaded the blanked material (when I do this by hand, unless it's a brand new article, I use WikiBlame/Revision history search to find when the offending text was added and identify the contributor, although I'm unsure if there's an easier way to do it with a bot). Once the contributor is found, the bot would check their talk page for a notification regarding the page in question and, if necessary, that day's talk page history to ensure they haven't received and deleted the notification.


 * I will say that there are also some other still-clerical duties that didn't occur to me when filing (such as closing a CCI which is marked as completed, or backing up DumbBOT's other CP function of listing articles that are tagged copyvio but not listed at WP:CP), and I'm sure others that haven't occurred to me yet - would I need to file another BRfA for additional things of that nature? VernoWhitney (talk) 14:27, 24 May 2010 (UTC)
 * I suppose it's a little late now, but should I have filed five separate BRfAs instead of this single one? VernoWhitney (talk) 17:50, 25 May 2010 (UTC)
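The "has this contributor already been notified?" check described for task #4 can be sketched roughly as below. This is only an illustration of the idea, not the bot's actual code (which was not yet written at this point): the function names `mentions_article` and `needs_notice` are hypothetical, and the sketch assumes the talk page wikitext and any same-day earlier revisions have already been fetched as plain strings.

```python
import re


def mentions_article(talk_wikitext: str, article_title: str) -> bool:
    """Return True if the talk page text appears to mention the article.

    Assumption: any mention of the title counts as a notice; spaces and
    underscores are treated as interchangeable, and case is ignored.
    """
    words = article_title.replace("_", " ").split()
    if not words:
        return False
    pattern = re.compile(r"[ _]+".join(re.escape(w) for w in words), re.IGNORECASE)
    return bool(pattern.search(talk_wikitext))


def needs_notice(article_title: str, current_talk: str, earlier_revisions=()) -> bool:
    """Return True if neither the current talk page nor any earlier revision
    from that day mentions the article (covers a notice that was received
    and then deleted)."""
    texts = [current_talk, *earlier_revisions]
    return not any(mentions_article(text, article_title) for text in texts)
```

A plain title match like this would over-count (any passing mention suppresses the notice), which errs on the side of not leaving a duplicate message.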

It's a good idea to include a link to the nominator as this makes it easier for the user being warned, and also reduces the users asking the bot for help. I wouldn't worry too much about CSDWarnBot, the main problems with that was the message, and that it wasn't actually waiting a set amount of time between the tagging of the article, and the warning of the user. I've actually since created a bot which does the same task (User:SDPatrolBot II) and haven't had any serious complaints, it's not actually the task of warning users on behalf of others that there was a problem with, rather just the way in which that bot was carrying out the task. However, it's your call :). It did occur to me it might be easier to have them as separate BRfAs, but don't worry too much about it, we can simply trial them separately (code permitting?), so it's not a large problem. As to future tasks, they would require further BRfAs, unless they are simply uncontroversial changes/fixes to the current tasks, if you're not sure, you can just ask me or another BAG member in the future. - Kingpin13 (talk) 17:45, 26 May 2010 (UTC)
 * After reading about some problems with another bot which left notice templates, I'm thinking that completely automating #4 may not be a good idea, and instead it should probably just make a list in its own userspace (or a notation elsewhere depending on where others at WP:COPYCLEAN may like it) for manual double-checking in case a personal notice was left. VernoWhitney (talk) 16:56, 26 May 2010 (UTC)
 * Okay, so will it actually be waiting for CSBot to create the subpage, or will it just transclude the next subpage every day?
 * I was thinking of having it wait until CSBot created the subpage, just so it's not putting up a useless redlink, but it would work fine either way. I guess if the timing was the big problem with CSDWarnBot then I'll stick with my original idea for #4, since I'll only be running this once a day (and only a dozen articles are blanked on a busy day) the chance of stepping on someone's toes shouldn't be a large concern. As I haven't coded the bot yet (waiting to be sure I'm not wasting my time) I'll make sure they can be trialed separately, although I imagine they could be fairly drawn-out trials as only task #1 is guaranteed to run every day (assuming CSBot isn't down). VernoWhitney (talk) 18:31, 26 May 2010 (UTC)
 * Regarding #4 notifications, WP:Bots/Requests for approval/Erwin85Bot 8 AfD notification is more comparable: it runs once a day for a 7-day process. Flatscan (talk) 04:05, 28 May 2010 (UTC)
 * Not sure about having it wait for CSBot, since that would mean (presumably) that you would have to have it running continuously, repeatedly checking if the page is created, if you're happy with that, then there's no problem (so long as it's waiting a reasonable amount of time in-between checks). Like you say, either way works. Also, will the bot need to remember pages it has previously listed, so it does not list them again? Hmm, well it would be best to have the bot check the article was tagged at least 15 minutes or so before warning, if you're getting the nominator anyway, it shouldn't be too difficult to check the time of that revision? Okay, tasks #1 and #2 seem pretty straightforward, you may want to start on the code for those, and let me know once you're ready to trial them. Regarding #2, what will the number of open pages be before the backlogged template is added/removed? Also, you may want to come up with some drafts for the warning message to be used for task #4, if you want it be used as a on-wiki template, let me know if you also want the template protected - Kingpin13 (talk) 18:47, 26 May 2010 (UTC)
 * I'll just start with it transcluding whether the new daily page exists or not and worry about the timing later. For #1, the bot shouldn't have to keep track of pages at all, just to know what day it is, as there's no reason for them to be un-transcluded until the day is over and every entry has been dealt with. For #2, I was thinking 4 days of backlog warrants the tag (3 days can pile up with trickier cases or a weekend, but 4 is rarer). As far as a delay for #4, I was thinking of having it start its scanning of new tags around 00:00 UTC (since everything's broken down by day), but then wait to do at least some of the edits until around 00:20 UTC, once Zorglbot and DumbBOT should have completed their work (at 00:00 and 00:15 respectively), so I'll put notifications into that second batch too. I'll start working on a notification message in my sandbox. I'm not sure how much detail you're looking for, so feel free to stop me if I'm rambling or ask for more details if I'm not giving enough. VernoWhitney (talk) 19:43, 26 May 2010 (UTC)
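The "4 days of backlog warrants the tag" rule for task #2 could be expressed along these lines. This is a minimal sketch under stated assumptions: the regular expression assumes daily pages are transcluded by their full title in the `Suspected copyright violations/2010-05-24` form mentioned earlier, and the names `DAILY_PAGE`, `BACKLOG_THRESHOLD`, `listed_days` and `is_backlogged` are illustrative, not taken from the bot.

```python
import re

# Assumed form of a daily transclusion on WP:SCV (hypothetical pattern).
DAILY_PAGE = re.compile(
    r"\{\{\s*Wikipedia:Suspected copyright violations/(\d{4}-\d{2}-\d{2})\s*\}\}"
)

# Threshold proposed in the discussion: four listed days counts as a backlog.
BACKLOG_THRESHOLD = 4


def listed_days(wikitext: str) -> list:
    """Return the distinct days currently transcluded, in sorted order."""
    return sorted(set(DAILY_PAGE.findall(wikitext)))


def is_backlogged(wikitext: str) -> bool:
    """Return True once at least BACKLOG_THRESHOLD distinct days are listed."""
    return len(listed_days(wikitext)) >= BACKLOG_THRESHOLD
```

Counting distinct days rather than raw matches means an accidental duplicate transclusion would not trip the tag on its own.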

As an admin part of WP:Copyclean, I think every task here, except perhaps for #2, is sorely needed. 2 is merely in the "nice to have" category as far as I'm concerned. MLauba (Talk) 16:33, 27 May 2010 (UTC)
 * Ditto in all respects. I've added a more detailed note at Wikipedia talk:WikiProject Copyright Cleanup. --Moonriddengirl (talk) 17:04, 27 May 2010 (UTC)

Okay, I now have task #1 programmed and ready for trial. VernoWhitney (talk) 02:39, 9 June 2010 (UTC)
 * And task #2 ready for trial. VernoWhitney (talk) 00:12, 11 June 2010 (UTC)
 * Brilliant. For tasks 1 and 2, that may not be long enough for the backlog, but let's see how it goes. - Kingpin13 (talk) 15:42, 11 June 2010 (UTC)
 * For convenient timing, CorenSearchBot has been out of commission recently, so there will at least be a "backlog" at WP:SCV of links to non-existent pages. VernoWhitney (talk) 20:06, 11 June 2010 (UTC)
 * 7-day trial completed for tasks #1 and #2. The first two days were run manually, after which I set up the last 5 for completely automatic. VernoWhitney (talk) 14:42, 18 June 2010 (UTC)

Brilliant, the trial all looked fairly straightforward. Firstly just a sorry for the late reply, I've not had as much time for Wikipedia recently. Moving back to task #3, have you decided how to identify if the creator has been warned yet? Also, could you provide a link to your sandbox please? - Kingpin13 (talk) 07:38, 21 June 2010 (UTC)
 * No worries. Since task #3 is just acting as a backup to other bot tasks, I'm assuming you're asking about #4 and I think I've figured out how to do it, but I'm still working on fully implementing it yet and seeing if it correctly determines who needs to be notified in a dry run. My sandbox is at User:VernoWhitney/Sandbox2. VernoWhitney (talk) 11:59, 21 June 2010 (UTC)
 * Yeah, sorry about that, meant #4 (spotted that I had it wrong, and then forgot to fix it!) Okay, this is pretty much the same as SDPatrolBot II, and there doesn't seem to be much opposition from users. Let me know once you want to do a (user-space if possible) trial of task 4. If it is a user-space trial, would it be possible to list some pages whose creators the bot thinks have already been notified? Just so we get a larger number of pages to search through; if not, that's fine. - Kingpin13 (talk) 13:44, 21 June 2010 (UTC)
 * Yeah, as is it's currently getting a list of all of the blanked pages (see User:VWBot/Trial), having it just make a note for its conclusions as to whether they've been notified (or don't need to be in certain cases), won't be a problem. VernoWhitney (talk) 13:59, 21 June 2010 (UTC)
 * Great, let's do a user space trial for this, probably for a few days. If you could, please list all the parameters the bot would put in the template. Oh by the way, I meant to mention that I like the look of the template you are using, comparing to the standard one. It's a bit long, but can't really be helped with a copyright notice. Also, if you need protection for a template, just ask me. (for task #4) - Kingpin13 (talk) 13:30, 23 June 2010 (UTC)
 * 7-day userspace trial completed for task #4. I also have the code ready for task #5 and most of #3. #4 has been a work in progress, so I don't know if you want to see more of it or not, but it's been seven days. VernoWhitney (talk) 03:12, 30 June 2010 (UTC)

Section break
Alright, let's see what's going on here: There you go. - The Earwig (talk) 17:03, 1 July 2010 (UTC)
 * Task 1: Overall, seems fine. I'm going to assume that this was a bug and you fixed it, caused by starting the bot at the wrong time? Perhaps the bot should check to make sure it isn't re-adding a day that's already listed. In any event, I don't suspect this should be a problem. Perhaps, if another user transcludes the date before the bot does, but I don't think that will happen.
 * Task 2: Seems fine. The only reported backlog was a mistake, however, due to the pages not having been created by CSBot. I don't suspect that this will be a problem, though. Unless the bot goes down and it's adding pages that don't exist, there shouldn't be any trouble. A few other things: what happens if another user (unintentionally) removes the nobacklog template? Does the bot have a fail-safe? Again, probably not important. Also, any chance we can combine this and this into one edit? Not important, as before, and I'd suspect that it would be a little difficult to program, but would be a nice feature.
 * Task 3: Not exactly sure how to trial this, maybe in the bot's userspace?
 * Task 4: Hm, now... this is a little more interesting. Apparently you're going to go with a WONTFIX on the CCI bug? Alright. Assuming we skip that one, the rest of the results are mostly okay, but I'm still a little worried about trialing this outside of the userspace. For example, AB Aurigae from 29 June could be especially problematic (worse than not notifying a contributor) because it is notifying someone who wrote the article but did not introduce the violation, which would be very frustrating if it happened to me. How do you feel about this?
 * Task 5: I think it's safe to try this one out.
 * Okay, replying in order:
 * Task 1: Yes, that edit was a mistake: I intended it to be another userspace test but didn't change the page-to-edit. I have since added a check for an existing transclusion.
 * Task 2: That edit was actually working as intended, since there were 4 days transcluded. Yeah, it'll look odd if CSBot goes down again and nobody's removing empty dates, but really I was just letting them pile up so that I could see VWBot actually edit it. If someone removes the nobacklog tag, I have it set up to just add a new backlog tag before /doc since that shouldn't be removed. I'm also trying to find the right wording on some html comments to ask people to not remove the tag in the first place. It actually shouldn't be a problem to combine those edits, I just haven't yet.
 * Task 3: I could set most of it up for Userspace just replacing WP:CP/... with User:VWBot/... . There is one CP task that DumbBOT is supposed to be doing that it isn't at the moment which is "list articles that are tagged copyvio but not listed". It worked at least through March, but it hasn't been recently and Tizio hasn't yet answered the question about it which I placed on his talk page a week ago, so that part at least could be done live.
 * Task 4: Yeah, trying to track which CCIs have been recently closed and which subpages they may have and which articles they link to seems like a very messy project to me, so at least for now the workaround is to just not close CCIs for one day to let VWBot see the backlinks. As far as that particular mistagging goes, the copyvio is two sentences out of the 24 in the article (not including references and such) which makes it tricky. I already have the algorithm only looking inside the blanked portion of the article, so I think pagescraping the source(s) may be the only option to improve accuracy there. Any objection to me continuing this task in userspace so I can see if I can increase the accuracy?
 * That's fine, feel free to continue testing it. - The Earwig (talk) 18:24, 1 July 2010 (UTC)
 * Task 5: Great. Will start tonight. VernoWhitney (talk) 18:19, 1 July 2010 (UTC)
 * 5-day trial completed for task #5. It didn't run every night (no new taggings some days), the first run began with a misstep, which was fixed, the second run missed the source parameter in the tag, but last night worked fine (I fixed my regex). Task #4 is still running in userspace, but not ready for the real-world yet. VernoWhitney (talk) 13:01, 6 July 2010 (UTC)
 * Feel free to just consider Task #4 withdrawn for now (although I'd like to revisit it in the future). It's going to take me a while to work out the kinks in it so I'll just keep running it in userspace so I can follow-up manually until I feel it's ready for a live trial. VernoWhitney (talk) 15:51, 8 July 2010 (UTC)
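The "check for an existing transclusion" added to task #1 after the trial mistake might look something like the following. It is a sketch only: the helper names are hypothetical, the page title format is assumed from the `Suspected copyright violations/2010-05-24` example earlier in the discussion, and appending to the end of the page is an assumption about placement.

```python
from datetime import date


def daily_transclusion(day: date) -> str:
    """Build the transclusion line for a given day's SCV subpage (assumed title form)."""
    return "{{Wikipedia:Suspected copyright violations/%s}}" % day.isoformat()


def add_transclusion(wikitext: str, day: date) -> str:
    """Append the day's transclusion unless it is already present.

    Returning the text unchanged when the line exists makes repeated runs
    harmless, whether the earlier edit came from the bot or another user.
    """
    line = daily_transclusion(day)
    if line in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + line + "\n"
```

Because the function is idempotent, a run started at the wrong time would leave the page untouched rather than re-adding a day.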

BAGAssistanceNeeded On the off-chance someone is looking at this, I'm now trialing #3 in userspace, since #1, 2 and 5 have already completed live trials and 4 has been withdrawn for now. VernoWhitney (talk) 13:02, 5 August 2010 (UTC)
 * Verno, just want to say I'm really sorry about the slow progress of this. I'm now back from camping, and don't have further plans to be away much over the rest of summer, so please feel free to badger me as much as you like :D. I'll try and take a look through the trial edits for #5, and #3. Thanks, - Kingpin13 (talk) 08:57, 7 August 2010 (UTC)
 * That's fine. I've been coding this from scratch, so it's taken me a while to get things up and running too. Since it's been making edits for a bunch of different tasks, let me know if you have trouble sorting them out. VernoWhitney (talk) 11:35, 7 August 2010 (UTC)
 * Right, the trial edits for task five look fairly straightforward. Now task three looks a bit more complicated, perhaps you could just outline the different tasks this involves? And if possible, link to a userspace trial edit for each one? - Kingpin13 (talk) 05:14, 11 August 2010 (UTC)
 * Task 3 is a backup for Zorglbot and DumbBOT for the tasks they've been approved for at WP:CP. As such I planned on running most of this task at 00:20 or later, since their edits should all be finished by 00:15. This total task includes:
 * Creating a new daily page (Zorglbot) as at User:VWBot/2010 August 7 or, if one is already created but not populated with {{subst:Cppage}} (DumbBOT), populating it. This second situation has been run once, here (the date's wrong because I ran it a day late) with the live version for comparison here.
 * Updating WP:CP/NewListings (Zorglbot), moving in the new day and moving out the old day as here. This edit should be unnecessary, but must be done in order for Zorglbot to work correctly.
 * Updating the main WP:CP page with the old day which can now be processed (Zorglbot) as here.
 * Listing those articles which have been blanked but not listed on the just-finished daily page (DumbBOT). As DumbBOT is currently not doing this task (it's been broken since March or so if I recall correctly), I have been including this in my userspace listing alongside Task 5 as here so that I could copy/paste the edit by hand as here, where the first article listed is a newly blanked page and the 2nd-4th are Task 5's close paraphrases.
 * I've been running all of this manually and tweaking it regularly, which means the timing is completely inconsistent and has led to a couple of errors, such as not creating User:VWBot/2010 August 11 on time (I didn't copy/paste that line of code into my Python console) and this duplicate entry when my code to make sure that it only acts as a backup was broken. That particular bug has now been fixed. VernoWhitney (talk) 13:56, 11 August 2010 (UTC)
 * Great, thanks very much for sorting it like that, much easier to understand :). I feel that this is pretty much ready for approval, there's been a few problems in the trials, but everything seems to have been addressed, and if anything else pops up it can presumably be sorted on the job. So long as you're happy with it..? I think I'll mark it as approved. Cheers, - Kingpin13 (talk) 16:55, 16 August 2010 (UTC)
 * Yeah, I think all of the bugs have been sorted out of the tasks that are currently up for approval (and I'll keep babysitting it just to make sure), so I'm comfortable running it live. VernoWhitney (talk) 17:03, 16 August 2010 (UTC)
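The backup-only behaviour outlined for task #3 (act at 00:20 UTC or later, create the daily page if missing, populate it if it exists but is empty, otherwise leave it alone) can be sketched as follows. All names here are hypothetical, the `User:VWBot/` prefix is the userspace location used in the trial, and treating a whitespace-only page as "created but not populated" is an assumption rather than the bot's confirmed test.

```python
from datetime import date, datetime, time
from typing import Optional

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Zorglbot and DumbBOT are expected to finish by 00:15 UTC, so wait until 00:20.
EARLIEST_RUN = time(0, 20)


def daily_title(day: date, prefix: str = "User:VWBot/") -> str:
    """Build a daily page title such as 'User:VWBot/2010 August 7'."""
    return f"{prefix}{day.year} {MONTHS[day.month - 1]} {day.day}"


def primary_bots_should_be_done(now_utc: datetime) -> bool:
    """Return True once the primary bots have had their window to edit."""
    return now_utc.time() >= EARLIEST_RUN


def backup_action(existing_text: Optional[str]) -> str:
    """Decide what the backup should do for one daily page.

    None means the page does not exist; an empty page is assumed to be
    created but not yet populated; anything else is left to the other bots.
    """
    if existing_text is None:
        return "create"
    if not existing_text.strip():
        return "populate"
    return "skip"
```

Keeping the decision in one small function makes the "only act as a backup" guarantee easy to test in isolation, which is where the duplicate-entry bug mentioned above arose.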

Very well. Seems to be making good edits, has the support of the community, and a great op ;). Good luck :) - Kingpin13 (talk) 17:07, 16 August 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.