Wikipedia:Bots/Requests for approval/Merge bot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Merge bot
Operator:

Time filed: 01:04, Thursday February 21, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available: User:Merge bot/proposedmergers.php

Function overview: Maintains Proposed mergers/Log and its subpages, for the benefit of WikiProject Merge.

Links to relevant discussions (where appropriate): User talk:Wbm1058, Wikipedia talk:WikiProject Merge/Archive 2

Edit period(s): Daily, at least... maybe more frequently

Estimated number of pages affected: One summary page, plus a page for each merge backlog subcategory (currently about 47)

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: Look at Proposed mergers/Log and pages linked from it to see what this bot does. This is my second takeover of a harej bot. This bot is a fork of the Automated list of proposed mergers operation formerly performed by RFC bot. That bot function stopped working after the category it ran off of was redirected. I patched in the new category name and it seems to be happy again. The bot takes about an hour to process the current large backlog of proposed mergers. This may be a first step towards further improvements to the proposed mergers process; the consensus seems to want something more like RMCD bot. Thanks.

Discussion

 * This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT ⚡ 13:09, 22 February 2013 (UTC)
 * Your bot is editing. Why is that?— cyberpower ChatLimited Access 14:14, 22 February 2013 (UTC)
 * Oops, sorry. I only intended to edit in my own user space. I'll blank the pages that the bot edited in error. There is another bug, my test did not complete. I'll work on fixing that. Wbm1058 (talk) 14:40, 22 February 2013 (UTC)


 * This task looks fairly uncontroversial. Fix the bugs, and I'll give it a 50 edit/7 day trial.  Maxim (talk)  16:35, 23 February 2013 (UTC)
 * Have you fixed the bug yet?  MBisanz  talk 22:49, 6 March 2013 (UTC)
 * I'm still working on tweaking the code. As I'm able to check results on what it writes to my user space, I don't really need to write to the Wikipedia: space. Testing & tweaking may take some more time, in between that I'm working on cleaning up problems that I see in my console output. I'll report back here when I'm ready, OK? It does run successfully to completion. Wbm1058 (talk) 23:23, 6 March 2013 (UTC)
 * Sure thing. Take your time. No hurry.  MBisanz  talk 14:47, 11 March 2013 (UTC)
 * How's the bug coming? Are you ready to run it? For the record of the bot, this request has been approved for trial. — cyberpower ChatLimited Access 13:22, 29 March 2013 (UTC)
 * Thanks for the approval for trial. Unfortunately my time is spread thin and I'm easily distracted from this task. While it's not as robust as I'd like, I think I have it running as well as it did under the previous operator. I'll get what I have running live and maybe come back to improve it later. Wbm1058 (talk) 22:22, 29 March 2013 (UTC)
 * My testing continues to get interesting console results. Last night I got these errors:

The first error msg is one I had my program write, following this library call:

You see that along with returning an error the library function getpage also generates php errors. I haven't attempted to debug the library yet, I assume it's fairly well tested by now. But the funny thing is, I ran the identical program again today—no code changes at all—and it ran cleanly. This isn't the first time I've run into this. Perhaps go live, knowing that some % of runs will fail, but knowing that a majority should complete successfully? Any ideas about what's wrong here? Results from most recent program run are linked here. Wbm1058 (talk) 20:14, 30 March 2013 (UTC)
 * Can I see your source? Maybe I can be of some assistance.— cyberpower ChatOnline 20:40, 30 March 2013 (UTC)
 * The program source here has made its first trial run, which made 44 edits. A second program run could push it over the 50 edit limit, depending on how far editor activity reduces the reported backlog of merge requests. The program ran cleanly, with no PHP errors or warnings reported. I'm guessing that issue might be transitory server load issues that aren't handled gracefully at some lower level in the code. I do note that there are still higher level issues, such as Proposed mergers/Log/July 2011 getting written three times in a single program run, leaving only a single merge request on the last edit. I'll work on tracking that one down. Wbm1058 (talk) 14:06, 1 April 2013 (UTC)
 * It's ok if you go over a limit a bit. The trial is more of a controlled test than a hard and fast limit.  MBisanz  talk 14:30, 1 April 2013 (UTC)


 * It looks like an issue with your framework. Can I see the botclasses.php script? Cp678  (T•C•G•E) 14:42, 1 April 2013 (UTC)
 * I copied that from here last July. Wbm1058 (talk) 16:13, 1 April 2013 (UTC)
 * Below is the function, copied directly from where it runs on my machine. Wbm1058 (talk) 16:21, 1 April 2013 (UTC)


 * Actually, I have a suggestion for your script. It looks like it might be feeding your botclass.php an empty parameter.  May I modify your script on Wikipedia? Cyberpower  |  Penny for your thoughts?  14:47, 1 April 2013 (UTC)
 * I changed your source. Can you try it and tell me if it works? Cp678  (T•C•G•E) 15:06, 1 April 2013 (UTC)
 * I ran Cyberpower's version and from the files it wrote you can see that it was not an improvement. Cyberpower, I hope this wasn't an April first joke. You should sign consistently, for a minute I thought that you and Cp678 were two different editors each wanting to suggest an edit. My time is spread thin and this isn't helping. Thanks, Wbm1058 (talk) 19:21, 1 April 2013 (UTC)
 * My attempt was to resolve syntax issues and the error messages you get. If it busted your script, I apologize.  It was not an April Fools joke. :( (✉→Cyberpower←✎) 20:38, 1 April 2013 (UTC)
 * Oh dear. I see what it has done.  Sorry about that.  Cyberpower  |  Penny for your thoughts?  20:40, 1 April 2013 (UTC)
 * OK. Thanks for trying. Wbm1058 (talk) 14:04, 2 April 2013 (UTC)
 * I can't see any issues on the surface. I would need to run the program myself to try it.  Would you mind if I recoded it to run with my framework to see if it bugs up just the same?  I would of course run it under my bot just once though.— cyberpower ChatOnline 22:09, 2 April 2013 (UTC)
 * If you ask me, it looks like it doesn't support error handling. The error is coming from the foreach statement; my guess, given that it works sometimes, is that the API call is intermittently failing.  That would leave $x null, and trying to iterate over a null variable would throw an error.— cyberpower ChatOnline 22:19, 2 April 2013 (UTC)
 * Yes, that's what I meant by "transitory server load issues that aren't handled gracefully at some lower level in the code". It may not duplicate the error for you, but you could simulate the API call failing to test the response. You're welcome to run the code with your bot. Just either suppress the output or write to your own user space. The console messages I added will help you follow the program flow. When you do get errors it can rain & pour them. That's why I added the code to just die after 5 $getpagefailed errors. This is probably beyond the program's control if the problem persists, just wait till the next scheduled run and hope that works. Though the API code shouldn't throw php errors. If you update the code, let user:Chris G know so he can post the updated version of his library. Wbm1058 (talk) 01:17, 3 April 2013 (UTC)
 * I modified the getpage function. Try installing it into the library and run your code.— cyberpower ChatOnline 02:44, 3 April 2013 (UTC)
 * I think that's fixed it. I've run a couple of clean updates using your new version and there were no getpage errors, so can't tell for sure. RMCD bot is also running clean with the new version. Looks like the server response issues have settled down. Thanks again for the help. – Wbm1058 (talk) 18:18, 4 April 2013 (UTC)
 * I confirm the fix worked. Monday I ran a test that rained five consecutive getpage errors, then died -- with no php warnings or errors. The test had been running successfully without incident for about 40 minutes, and was about two-thirds complete when it died. Wbm1058 (talk) 00:24, 10 April 2013 (UTC)
 * That's because I added an error handler to the function. It attempts to retrieve the page five times and if all five attempts fail, the script will return false.— cyberpower ChatOnline 02:20, 10 April 2013 (UTC)
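
The fix described in the comments above (retry the page fetch up to five times, return false if every attempt fails, and never hand a null result to foreach) can be sketched roughly as follows. This is a minimal illustration, not the actual getpage code from botclasses.php: the function names fetchWithRetries and pageText, the injected $fetch callable, and the simulated response are all hypothetical, and the response shape is assumed to follow the format=php revisions query shown in the log below.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT the real botclasses.php getpage().
 * Calls $fetch up to $maxAttempts times and returns the first response
 * that contains query.pages, or false if every attempt fails.
 */
function fetchWithRetries(callable $fetch, string $title, int $maxAttempts = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch($title);
        // Accept only an array that actually carries the pages list.
        if (is_array($result) && isset($result['query']['pages'])) {
            return $result;
        }
    }
    return false;
}

/**
 * Returns the page text, or false. The response is checked before the
 * foreach, so a failed call never reaches the loop as null.
 */
function pageText(callable $fetch, string $title)
{
    $response = fetchWithRetries($fetch, $title);
    if ($response === false) {
        return false;
    }
    foreach ($response['query']['pages'] as $page) {
        if (isset($page['revisions'][0]['*'])) {
            return $page['revisions'][0]['*'];
        }
    }
    return false;
}

// Simulated API (assumption for the demo): returns null twice, then succeeds.
$calls = 0;
$flaky = function (string $title) use (&$calls) {
    $calls++;
    if ($calls < 3) {
        return null;
    }
    return ['query' => ['pages' => [['revisions' => [['*' => "text of $title"]]]]]];
};

// Simulated API that never recovers.
$alwaysDown = function (string $title) {
    return null;
};

$text = pageText($flaky, 'Throughput');        // succeeds on the third attempt
$failed = pageText($alwaysDown, 'Train set');  // false after five attempts
```

Injecting the fetch function is only a convenience for simulating a failing API call, as suggested earlier in the thread; a real bot would make the HTTP request inside the helper.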

GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Thomas+and+Friends+video+releases&rvlimit=1&rvprop=content|timestamp (280.93706417084 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Thomas+and+Friends+video+releases&rvlimit=1&rvprop=content|timestamp (14.248254060745 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Thomas+and+Friends+video+releases&rvlimit=1&rvprop=content|timestamp (0.0024290084838867 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Thomas+and+Friends+video+releases&rvlimit=1&rvprop=content|timestamp (0.0029768943786621 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Thomas+and+Friends+video+releases&rvlimit=1&rvprop=content|timestamp (0.0014991760253906 s) (0 b)

?? getpage failed: Thomas and Friends video releases

GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Throughput&rvlimit=1&rvprop=content|timestamp (0.0038001537322998 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Throughput&rvlimit=1&rvprop=content|timestamp (0.0020449161529541 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Throughput&rvlimit=1&rvprop=content|timestamp (0.0020802021026611 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Throughput&rvlimit=1&rvprop=content|timestamp (0.0013549327850342 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Throughput&rvlimit=1&rvprop=content|timestamp (0.0010161399841309 s) (0 b)

?? getpage failed: Throughput

GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Town+watch&rvlimit=1&rvprop=content|timestamp (0.0017259120941162 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Town+watch&rvlimit=1&rvprop=content|timestamp (0.0011367797851562 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Town+watch&rvlimit=1&rvprop=content|timestamp (0.0010550022125244 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Town+watch&rvlimit=1&rvprop=content|timestamp (0.002094030380249 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Town+watch&rvlimit=1&rvprop=content|timestamp (0.0006711483001709 s) (0 b)

?? getpage failed: Town watch

GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Traffic+Separation+Scheme&rvlimit=1&rvprop=content|timestamp (0.00051283836364746 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Traffic+Separation+Scheme&rvlimit=1&rvprop=content|timestamp (0.00053310394287109 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Traffic+Separation+Scheme&rvlimit=1&rvprop=content|timestamp (0.00040698051452637 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Traffic+Separation+Scheme&rvlimit=1&rvprop=content|timestamp (0.0003960132598877 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Traffic+Separation+Scheme&rvlimit=1&rvprop=content|timestamp (0.00040602684020996 s) (0 b)

?? getpage failed: Traffic Separation Scheme

GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Train+set&rvlimit=1&rvprop=content|timestamp (0.00041913986206055 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Train+set&rvlimit=1&rvprop=content|timestamp (0.00041508674621582 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Train+set&rvlimit=1&rvprop=content|timestamp (0.00040507316589355 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Train+set&rvlimit=1&rvprop=content|timestamp (0.00040102005004883 s) (0 b)
GET: http://en.wikipedia.org/w/api.php?action=query&format=php&prop=revisions&titles=Train+set&rvlimit=1&rvprop=content|timestamp (0.00039911270141602 s) (0 b)

?? getpage failed: Train set

getpage Error


 * I've scheduled the bot to run automatically once daily, and it successfully completed its first automated run at 02:55, 10 April 2013 (UTC). It will continue running every 24 hours unless you tell me to shut it down. Output (Special:Contributions/Merge bot) is primarily read by editors at Proposed mergers. I may do some further tweaking in the future, but at this point feel that I've fixed the most significant issues and it's time to move on to other tasks. Wbm1058 (talk) 14:34, 10 April 2013 (UTC)
 * Comment – The most recent update died 15 mins. in, with five consecutive getpage errors (25 consecutive GET errors, as shown in the hidden example above). This after the first two automated updates, April 10 and 11, successfully completed. Not sure what can be done to solve that issue. Wbm1058 (talk) 19:15, 12 April 2013 (UTC)
 * Have you tried talking to anyone in the BAG IRC channel? Someone is usually around who can help.  MBisanz  talk 21:06, 12 April 2013 (UTC)
 * Not yet. On the theory that it's a temporary server issue, I inserted a sleep to wait ten seconds after a getpage fail before trying again. With 5 retry attempts that should give it closer to a minute to recover. The last update ran cleanly, that makes 3 of 4 now. I'll watch for the next getpage error and see if sleeping before retries solves the problem. If not, I'll try checking in at the chat channel. Wbm1058 (talk) 13:02, 13 April 2013 (UTC)
 * I think I've finally got that one solved. Sleeping did the trick. A manually run test had five fails, but not consecutively. The worst case had to retry twice, but recovered on the third attempt (after 20 seconds sleep). The others worked after sleeping 10 seconds. The test died after the fifth getpage fail, but now I've rewritten it so that a run can have unlimited fails, so long as there aren't five consecutive on the same page. Updated code is running live now. Wbm1058 (talk) 01:09, 16 April 2013 (UTC)
 * Any news?  MBisanz  talk 04:17, 13 May 2013 (UTC)
 * Yes, I've been meaning to post an update. I suppose I've been in de facto extended trial for the past month. For the most part, the daily updates have been running successfully. I made a couple of program fixes to accommodate two new templates and a new alias (see revision history). I'll need to make changes of this sort as editors change templates ongoing. Windows task scheduler has an option to stop tasks if they run longer than a given time. There is a wide variation in times for this task to complete. At first, I had it set to stop after 2 hours. After finding too many updates hitting that time limit, I bumped it to 4 hours. Then several updates bumped that limit, so the next stop on Windows' menu is 8 hours, and that's where I have it set now. So far no update has run longer than 8 hours. The most recent update ran in 1 hr, 48 min. Of course, if the large backlog of proposed merges can be better managed and more timely addressed so that there aren't so many of them, this bot could potentially finish much faster. The updates running longer probably have more "sleep breaks" in them. Updates have died because my internet connection was down (I still kill them if they sleep too many consecutive times), and I can't tell whether delays are due to ISP problems or Wikipedia server issues, or something else. One time Wikipedia's URL was pulling up a shopping site in my browser – http://www.shopify.com/ – I don't know if there was some major failure or something was hacked; fortunately that rectified after only a few minutes. A couple of times I've restarted the bot after an update failed, so as not to wait a full 24 hours before the next attempt, then the updates were on the new daily start time after that. As far as program output/reports, I'm reasonably happy with that, and no problems or requests have been reported to me by other editors. I was sorry to see Mutleybot withdrawn. 
Possibly I might try to take on that task with this bot, but that's back-burner for now, and I suppose I would open a new request if and when I get to it. Let me know if you have any more questions or issues. Thanks, Wbm1058 (talk) 12:58, 13 May 2013 (UTC)
 * Today's update ran without any problems in 1 hr, 22 min. Nice. – Wbm1058 (talk) 03:46, 15 May 2013 (UTC)
 * Today's update also ran without any problems in 1 hr, 22 min. – Wbm1058 (talk) 02:58, 16 May 2013 (UTC)
 * As long as you're happy with the sleep problem, I'm ok with it. You might look into moving to the WMFLabs setup, but you're fine to go.  MBisanz  talk 10:48, 16 May 2013 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.
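
For readers following the sleep-before-retry approach discussed above (wait ten seconds after a failed getpage, allow unlimited failures across a run, but give up after five consecutive failures on the same page), here is a rough sketch of that logic. It is an assumption-laden illustration rather than the bot's actual code: getPageOrGiveUp, the injected $fetch and $sleep callables, and the simulated failure counts are all hypothetical, and the sleep is injected only so the example runs instantly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the bot's real code. Fetches one page, sleeping
 * $delaySeconds after each failure, and gives up (returns false) once
 * $maxConsecutive failures in a row have occurred for this page.
 */
function getPageOrGiveUp(
    callable $fetch,
    string $title,
    callable $sleep,
    int $maxConsecutive = 5,
    int $delaySeconds = 10
) {
    $consecutive = 0;
    while (true) {
        $result = $fetch($title);
        if ($result !== false) {
            return $result;
        }
        $consecutive++;
        if ($consecutive >= $maxConsecutive) {
            return false;
        }
        // Give a possibly overloaded server time to recover before retrying.
        $sleep($delaySeconds);
    }
}

// Simulated failures per page (assumption for the demo): the first page
// fails twice before succeeding, the second once, the third never.
$failuresLeft = ['Town watch' => 2, 'Train set' => 1, 'Throughput' => 0];
$fetch = function (string $title) use (&$failuresLeft) {
    if ($failuresLeft[$title] > 0) {
        $failuresLeft[$title]--;
        return false;
    }
    return "content of $title";
};

// Stand-in for sleep() that only records the requested delay.
$slept = 0;
$fakeSleep = function (int $seconds) use (&$slept) {
    $slept += $seconds;
};

$pages = [];
foreach (array_keys($failuresLeft) as $title) {
    $content = getPageOrGiveUp($fetch, $title, $fakeSleep);
    if ($content === false) {
        // Five consecutive failures on one page: abandon this run and
        // leave the rest to the next scheduled update.
        break;
    }
    $pages[$title] = $content;
}
```

In a live run the built-in sleep function could be passed as the $sleep argument (the string 'sleep' is a valid callable). The counter is per page, so three scattered failures across the run, as in the simulation, do not stop the update.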