Wikipedia:Bots/Requests for approval/Addbot 31


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Addbot 31
Operator:

Time filed: 19:41, Friday February 1, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available: On GitHub

Function overview: Bot rewrite, performing all tasks at once and including various minor changes

Edit period(s): Continuous

Estimated number of pages affected: Many

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: The bot rewrite consists of two major parts: a listing process and a checking process. The listing process does not edit; it takes articles from various sources (categories, toolserver reports, etc.) and lists them for the bot to check 'at some point'. The checking process then reads this list at a regular interval (currently every minute), performing the checks listed below before editing if required. The bot will try to alter its rate of checking pages depending on how many pages it currently has in its queue.
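The split between listing and checking, and the idea of adjusting the checking rate to the queue size, can be sketched as follows. This is only an illustration in Python (the bot itself is written in PHP); the class and function names, the batch mechanism, and the scaling policy with its `target_length` and `min_seconds` values are all assumptions, not taken from the bot's source.

```python
from collections import deque


def next_interval(queue_length, base_seconds=60.0, min_seconds=5.0, target_length=100):
    """Return the pause before the next checking run.

    Hypothetical policy: keep the base interval while the queue is at or
    below target_length, then shorten the pause in proportion to the
    backlog, never going below min_seconds.
    """
    if queue_length <= target_length:
        return base_seconds
    scaled = base_seconds * target_length / queue_length
    return max(min_seconds, scaled)


class CheckQueue:
    """Titles recorded by the (non-editing) listing process, checked later in order."""

    def __init__(self):
        self._pages = deque()
        self._seen = set()

    def list_page(self, title):
        # Listing never edits; it only records each title once.
        if title not in self._seen:
            self._seen.add(title)
            self._pages.append(title)

    def take_batch(self, size):
        # Hand the checking process up to `size` of the oldest titles.
        batch = []
        while self._pages and len(batch) < size:
            title = self._pages.popleft()
            self._seen.discard(title)
            batch.append(title)
        return batch

    def __len__(self):
        return len(self._pages)
```

With these assumed defaults, a queue of 200 pages would halve the pause from 60 to 30 seconds, while a queue of 50 pages would leave it unchanged.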


 * For Articles
   * If multiple tags exist that can be put in Multiple Issues then add them per Bot_requests
   * Remove duplicate tags on a page, leaving the one with the oldest date, per Bot_requests
   * Check if page is a double redirect and fix it (already approved here)
   * Check if page has an empty section; if it does, add Empty section
   * Check if page is Orphan (either tag or untag) (already approved Bots/Requests_for_approval/Addbot_18)
   * Check if page is Uncat (either tag or untag) (already approved here and here)
   * Check if page is Deadend (either tag or untag) (removing is already approved here)
   * If page only has 2 links add Underlinked in its place
   * Check if page has any ref; if so, remove the unreferenced tag if it appears (i.e. Unreferenced or BLP unsourced)
   * If page only has 2 refs add refimprove
   * Change Unreferenced to BLP unsourced if in Category:Living people and vice versa
   * Change refimprove to BLP Sources if in Category:Living people and vice versa
   * Check if Sections tag can be removed (already approved here)
   * Add Sections tag if no sections and over 1000 words
   * Check if the page has a stub tag that can be removed (more than 500 words) (current trial task 30)
   * Remove outdated templates (currently Wikify) (already approved)
   * Date any other maint tags that haven't been dated (Approved for use with AWB here)
   * If any of the above have happened
     * General template fixes (taken from AWB) (Approved for use with AWB here)
     * Combine any maint templates into multipleIssues if there is more than one
     * Fix whitespace (e.g. multiple new lines in a row) (Approved for use with AWB here)
 * For Images
   * If PDF, tag as bad format (already approved here)
 * For UserTalk
   * If the page contains one of the templates that need to be substed, do so (already approved here)
 * For Categories
   * Checks and adds or removes Underpopulated category (removes if more than 50, adds if fewer than 10)
 * For Sandboxes
   * Check if the header exists at the top; if not, put it there (approved here)
 * My Bot Space
   * If the bot comes across a protected page that it was planning on editing, it will post it in its user space
   * If the bot finds a broken redirect (it redirects to itself), it will post it in its user space
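Several of the checks above are simple threshold decisions. The sketch below, in Python rather than the bot's PHP, shows how the numbers quoted in the list (10 and 50 category members, 500 and 1000 words) might drive the tag or untag decision. The function names and signatures are hypothetical, and how the bot actually counts members or words is not shown here.

```python
def underpopulated_action(member_count, is_tagged, add_below=10, remove_above=50):
    """Decide what to do with the Underpopulated category tag.

    Returns "add", "remove", or None when nothing should change.
    Counts from 10 to 50 inclusive are deliberately left alone.
    """
    if not is_tagged and member_count < add_below:
        return "add"
    if is_tagged and member_count > remove_above:
        return "remove"
    return None


def stub_tag_removable(word_count, limit=500):
    # A stub tag can go once the article has more than `limit` words.
    return word_count > limit


def sections_tag_needed(word_count, section_count, limit=1000):
    # Sections is only added to long articles that have no sections at all.
    return section_count == 0 and word_count > limit
```

Leaving a gap between the add and remove thresholds means a category hovering around one limit is not tagged and untagged repeatedly.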

The main change to the tasks that I am currently performing is that they will all occur in a single edit, along with more small uncontroversial fixes. The bot is configurable from User:Addbot/config, although some variables still need to be added.

Discussion
Any significant tests I performed in the bot sandbox have been posted below.  ·Add§hore· Talk To Me! 21:23, 4 February 2013 (UTC)
 * Checking Kikin Inc
 * > Is Article.orph.uncat.dead.unref.sec.stub-.dep.date.gen
 * > POST: Bot: - Removing Stub Tag (Report Errors 2) output
 * Checking Mabel Fairbanks
 * > Is Article.orph.uncat+.dead-.unref.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Dating  (Report Errors 2) output (Fixed adding edit summary after this)✅
 * Checking Macrinus (Bishop of Eleutheropolis)
 * > Is Article.orph.uncat+.dead-.unref-.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized Dating (Report Errors 2) output
 * Checking Magnetic Tower of Hanoi
 * > Is Article.orph.uncat+.dead-.unref-.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized Dating (Report Errors 2) output (Fixing template links in summaries, spotted error where parser matches templates in comment)✅
 * Checking PTV World
 * > Is Article.orph.uncat+.dead-.unref-.sec-.stub.dep.date.gen
 * > POST: Bot: - Removing Unreferenced (Report Errors 2) output (Need to alter regex to match refs)✅
 * Checking Robert Harvey (Clwyd politician)
 * > Is Article.orph.uncat+.dead-.unref-.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized Dating (Report Errors 2) output (summaries fixed)✅
 * Checking Sammy Barr
 * > Is Article.orph.uncat+.dead-.unref-.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized Dating (Report Errors 2) output
 * Checking Szabadkígyós
 * > Is Article.orph.uncat+.dead-.unref.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized (Report Errors 2)
 * Checking Theatre in Bangladesh
 * > Is Article.orph.uncat+.dead-.unref.sec-.stub.dep.date.gen
 * > POST: Bot: - Adding Uncategorized (Report Errors 2) output

BAD Addshore, bad! Line 28 of run.php is a real no-no.

Really, you shouldn't eval anything taken from an untrustworthy source, but from any unprotected wikipage? That's just asking for trouble. I've protected the page now, and you must remove that from the code. If you have to have your config on a wikipage, I would suggest using something like parse_ini_string.

Seriously, I can't stress enough how dangerous that code is. Not only could it be used to hack your bot, but it could also hack your server as well. (note: I have not reviewed the rest of the code) -- Chris 17:14, 2 February 2013 (UTC)
 * I know it was a terrible way to do it; I was braindead at the time. I did have the page protected at one stage, which is why I was using eval. Going to switch to parse_ini_string now.  ·Add§hore· Talk To Me! 17:22, 2 February 2013 (UTC)
 * ✅ Fixed: the eval FIXME is resolved.  ·Add§hore· Talk To Me! 19:25, 2 February 2013 (UTC)
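The point of the exchange above is that configuration fetched from a wiki page should be parsed as data and never executed. The bot is written in PHP, where parse_ini_string was the suggested replacement; the sketch below shows the same idea in Python with the standard `configparser` module instead. The section names, keys, and sample text are hypothetical and are not the contents of User:Addbot/config.

```python
import configparser

# Hypothetical config text, standing in for what might be fetched from a wiki page.
SAMPLE_CONFIG = """
[general]
enabled = true
check_interval_seconds = 60

[limits]
stub_word_limit = 500
"""


def load_config(text):
    """Parse INI-style text into plain values without executing any of it."""
    # interpolation=None keeps '%' characters in the page from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "enabled": parser.getboolean("general", "enabled", fallback=False),
        "check_interval_seconds": parser.getint(
            "general", "check_interval_seconds", fallback=60
        ),
        "stub_word_limit": parser.getint("limits", "stub_word_limit", fallback=500),
    }
```

If someone edits the page to contain code, the typed getters raise `ValueError` because the value is not a valid boolean or integer, so the text is rejected rather than run. Protecting the page is still worthwhile, since a vandal could otherwise change legitimate values.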

 ·Add§hore· Talk To Me! 12:25, 4 February 2013 (UTC)

Trial

 *  MBisanz  talk 14:15, 4 February 2013 (UTC)
 * Currently running each page through one by one, although the post is made before I check the contents. The first edit prompted me to add more checks when adding an orphan tag; previously I only had checks for removing one. See here. Starting the trial as soon as Labs is fixed.  ·Add§hore· Talk To Me! 15:28, 4 February 2013 (UTC)


 * Edit 1 diff added an orphan tag to an SIA page. Fixed by applying, when adding orphan tags, the extra checks that were previously only used when removing them. See on git.
 * Edit 2 diff formatted and removed one stub tag, but for some reason the second tag was not removed. Adding a second pass over the article to see if this fixes it: git.
 * Edit 3 diff added Orphan correctly.
 * Edit 4 diff added Orphan correctly.
 * Edit 5 diff added Uncategorized correctly.
 * Edit 6 diff correctly added reflist but again only removed one stub tag. Comparing the two diffs, it left behind stub tags with capital letters both times; it turns out I missed this in the regex: git.
 * Edit 7 diff removed both stub tags correctly.
 * Edit 8 diff correctly added Orphan and Uncategorized in Multiple Issues.
 * Checked Greg Brown (businessman) but did not remove Deadend. It turns out the redirects to this template were not added to the config git.
 * Edit 9 diff correctly removed the deadend tag.
 * Edit 10 diff correctly removed the deadend tag.
 * Checked Sinocast but did not remove Orphan even though the article had a link to David_Marchick. I altered a regex, which meant the bot realised the page was not an orphan: git.
 * Checked Sinocast but again it didn't remove {{Orphan}}. I discovered this was because the tag was included in the old style of Multiple Issues. It turns out my function doesn't quite hit all variations of the template yet. When trying to fix this I made a bad edit, so I moved testing to the sandbox until fixed. After a bit of testing I fixed the MI tag (git), then had to work out where the content was going. I ended up getting a good edit after changing the way the tags were removed (git).
 * Edit 11 diff removed the Orphan tag correctly and successfully from the page, as well as reformatting the Multiple issues tag into the currently used format.
 * Edit 12 diff removed the Uncategorized stub tag even though the page didn't have any categories and was a stub. I first fixed the stub-matching regex for this check (git; I need to add these to the config) and then made the function ignore stub categories (git).
 * Checked Ailum again and no edit was made.
 * Edit 13 diff dated the notability tag as well as adding two further tags in Multiple Issues.
 * Checked Madrast_Al-Mushaghebeen. No actions should have occurred and none did (it was an orphan with a linking redirect).
 * Edit 14 diff removed Uncategorized correctly.
 * Edit 15 diff adding Uncategorized correctly.
 * Checked DNA_history_of_Egypt and no edit was made.
 * Edit 16 diff adding Uncategorized correctly.
 * Edit 17 diff added Orphan and Uncategorized, although my reason for choosing the page was to see if the bot added the deadend tag correctly, which it didn't. So I did some digging...
 * Edit 18 diff ran the bot over the same page and my fix worked adding the Dead end tag git
 * Checked Truncatella_caribaeensis, Moesziomyces_bullatus, Edwin_Atkins_Merritt, Asarum_caudatum, Balandiz, Jill_Culton, Yukariulucak,_Beypazari, Bottle_scraper, Box_Hill_High_School, The_Cairnwell, Mineral_exploration and no edits were made. Still have to test removing the wikify tag, removing the unref tag, swapping unref and blpunsourced, adding the sections tag, and removing the sections tag.
 * Edit 19 (FromDB) diff correctly removed orphan and fixed Multiple Issues
 * Edit 20 (FromDB) diff correctly removed orphan and fixed Multiple Issues
 * Edit 21 (FromDB) diff correctly removed orphan and Multiple Issues leaving primarysources
 * Edit 22 diff did something special with another old style multiple issues template while adding a deadend tag that it did not already spot on the page. After looking at the page I have a feeling it is because the Multiple Issues template is half in the new style and half in the old style which I have not accounted for.  git although this can be streamlined at a later date
 * Checked Vistarband again and no edit was made.
 * Edit 23 diff Removing Unreferenced Adding BLP unsourced correctly
 * Note, at this stage the only thing we really have to check is the adding and removal of sections tag
 * Edit 24 diff Removed Sections but yet again broke Multiple Issues in another special way, caused by newlines in the wrong places: git.
 * Edit 25 diff this time Sections was removed and MI didn't break.
 * Edit 26 diff Removed Sections and Multiple Issues leaving one tag.
 * Edit 27 diff Removed Sections and Multiple Issues leaving one tag.
 * Edit 28 (FromDB) diff Removing Unreferenced Adding BLP unsourced and adding Multiple Issues correctly
 * Edit 29 (FromDB) diff Removing Orphan
 * Edit 30 (FromDB) diff Removing Orphan
 * Edit 31 (FromDB) diff Got confused when it hit a tag that it didn't recognise. As this is a unique little notice, I will create a check beforehand that removes it (git). After one more failed edit attempt the bot no longer edits the page, per git.
 * Edit 32 (FromDB) diff Removing Unreferenced Adding BLP unsourced and adding Multiple Issues correctly
 * Edit 33 (FromDB) diff Removing Unreferenced Adding BLP unsourced and adding Multiple Issues
 * Edit 34 (FromDB) diff Removing Orphan
 * Edit 35 (FromDB) diff Removing Unreferenced Adding BLP unsourced and adding Multiple Issues
 * Edit 36 (FromDB) diff Removing Unreferenced Adding BLP unsourced and adding Multiple Issues
 * Edit 37 (FromDB) diff Removing Orphan
 * Edit 38 (FromDB) diff Removing Orphan fixing Multiple Issues
 * Edit 39 diff Adding Dead end Removing Sections
 * Edit 40 diff Removing Sections fixing Multiple Issues
 * Edit 41 diff Removing Sections fixing Multiple Issues and other gen fixes
 * Edit 42 diff Removing Sections and Multiple Issues leaving 1 tag
 * Edit 43 (FromDB) diff Removing Orphan and Multiple Issues leaving 1 tag
 * Edit 44 (FromDB) diff Removing Orphan and Multiple Issues leaving 1 tag
 * Edit 45 (FromDB) diff Removing Orphan leaving 2 tags in Multiple Issues
 * Edit 46 (FromDB) diff Removing Stub incorrectly, turns out my wordcount function did not ignore tables, it now should git
 * Checked Arabic exonyms and no edit was made, so the above is fixed.
 * Edit 47 diff Removing Sections fixing Multiple Issues
 * Edit 48 diff Removing Sections and Multiple Issues leaving 1 tag
 * Edit 49 diff Removing Sections fixing Multiple Issues
 * Edit 50 diff Removing Sections fixing Multiple Issues
 *  ·Add§hore· Talk To Me! 21:22, 4 February 2013 (UTC)
 * As a quick reference: We worked off the list I compiled at GitHub John F. Lewis (talk) 21:30, 4 February 2013 (UTC)
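Edits 2 and 6 in the list above failed because the stub-matching regex missed tags written with capital letters. A minimal sketch of a case-insensitive match is given below in Python (the bot's own regex is PHP and is not reproduced here). The pattern is an assumption: it treats any template whose name is `stub` or ends in `-stub` or ` stub` as a stub tag, which may not cover every redirect the real config lists.

```python
import re

# Hypothetical pattern: {{stub}}, {{Bio-stub}}, {{US-Geo-Stub|date=...}} and similar.
# re.IGNORECASE is the part that Edits 2 and 6 show was missing.
STUB_TAG = re.compile(
    r"\{\{\s*(?:[^{}|]*[-\s])?stub\s*(?:\|[^{}]*)?\}\}[ \t]*\n?",
    re.IGNORECASE,
)


def remove_stub_tags(text):
    """Return (new_text, number_of_tags_removed)."""
    return STUB_TAG.subn("", text)
```

For example, `remove_stub_tags("Text.\n\n{{Bio-stub}}\n{{US-Geo-Stub}}\n")` removes both tags, whereas the same pattern without `re.IGNORECASE` would leave `{{US-Geo-Stub}}` behind.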

Trial 2

 * Let's do another trial to be sure. I appreciate the table, but you don't need to go to that effort for the ones it does right, just any errors.  MBisanz  talk 23:19, 4 February 2013 (UTC)
 * Restarting once Addshore remembers to set it to edit the mainspace and not the sandbox. Posted on behalf of Addshore. John F. Lewis (talk) 23:26, 4 February 2013 (UTC)
 * I wanted to make sure I didn't miss anything :) This was run from the database and was not checked until after all 50 edits were complete. You can see all of the 50 edits made here. See below for 3 more bugs that I have found; I think another trial after these bugs are fixed would be good.  ·Add§hore· Talk To Me! 23:56, 4 February 2013 (UTC)


 * Edit seemed to add deadend when the page was not a dead end. I think this is due to some of the characters used in the links on the page. Am looking into this now.
 * Edit seemed to remove the unref tag when I cannot see any references. No idea why this has happened, so will look into it.
 * Edit has an error in the edit summary: it looks like the bot added Sections when in fact it removed it. Should be an easy fix.

Trial 3

 * Let's do another trial because we're finding and fixing stuff still.  MBisanz  talk 00:37, 5 February 2013 (UTC)
 * Perfect, will run a few sandbox tests over the articles that previously hit bugs before starting the trial.  ·Add§hore· Talk To Me! 00:39, 5 February 2013 (UTC)
 * All of the bugs above are now fine. Also, someone mentioned a bug on my talk page where BLP unsourced and Unreferenced were switched for non-BLPs; this was due to an incorrect regex. I have fixed the regex and added an extra check to this part of the bot. Just about to run the next 50 edits.  ·Add§hore· Talk To Me! 01:00, 5 February 2013 (UTC)
 * 49/50 edits went as expected. The last edit, with discussion here, seems to still have a problem switching Unreferenced and BLP unsourced tags. Looking into it now. It would be good to have another trial after this is fixed; I will also try to find some of the less common changes for the bot to hit, i.e. stubs, deadend, wikify, dating, double redirects.  ·Add§hore· Talk To Me! 01:37, 5 February 2013 (UTC)
 * Fixed the bug per this commit. Would love another trial to try to gain more variety in the types of changes.  ·Add§hore· Talk To Me! 01:48, 5 February 2013 (UTC)

Trial 4

 * Sure thing.  MBisanz  talk 03:07, 5 February 2013 (UTC)
 * Just about to start the trial. Some changes have happened to the bot this morning, so I will try to test these features in particular.  ·Add§hore· Talk To Me! 10:42, 5 February 2013 (UTC)
 * I have a few more small checks regarding the Empty section template to add. Throughout the run I made lots of other small tweaks, and toward the end of the run the bot seemed to be getting everything right. I think one more trial would be good to stand as a final test.  ·Add§hore· Talk To Me! 18:33, 8 February 2013 (UTC)
 * I have made the slight changes to the bot, making one change to when Empty section is added and also adding the tasks below.
 * If multiple tags exist that can be put in Multiple Issues then add them per Bot_requests
 * Remove duplicate tags on a page leaving the one with the oldest date per Bot_requests
 * I am now ready for another trial.  ·Add§hore· Talk To Me! 20:34, 8 February 2013 (UTC)
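The duplicate-tag task added above (keep only the copy with the oldest date) could look roughly like the following. This is a Python sketch under stated assumptions, not the bot's PHP code: the `MAINT_TAGS` set, the `Month Year` date format, and the rule that undated copies lose to dated ones are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime

# Hypothetical list of tags to deduplicate; the real bot reads its list from config.
MAINT_TAGS = {"orphan", "unreferenced", "uncategorized", "dead end"}

# Matches {{Name}} or {{Name|date=Month Year}}, plus one trailing newline.
TAG = re.compile(
    r"\{\{\s*(?P<name>[^{}|]+?)\s*"
    r"(?:\|\s*date\s*=\s*(?P<date>[^{}|]*?)\s*)?\}\}[ \t]*\n?"
)


def parse_tag_date(raw):
    """Parse dates such as 'February 2013'; undated or odd values sort last."""
    if not raw:
        return datetime.max
    try:
        # %B relies on English month names (the default C locale).
        return datetime.strptime(raw, "%B %Y")
    except ValueError:
        return datetime.max


def drop_duplicate_tags(text):
    """Keep, for each known tag, only the copy with the oldest date."""
    oldest = {}  # lower-cased tag name -> (date, position of the copy to keep)
    for match in TAG.finditer(text):
        key = match.group("name").lower()
        if key not in MAINT_TAGS:
            continue
        candidate = (parse_tag_date(match.group("date")), match.start())
        if key not in oldest or candidate < oldest[key]:
            oldest[key] = candidate
    keep = {position for _, position in oldest.values()}

    def replace(match):
        if match.group("name").lower() not in MAINT_TAGS:
            return match.group(0)
        return match.group(0) if match.start() in keep else ""

    return TAG.sub(replace, text)
```

Keeping the oldest date preserves the record of how long the issue has been flagged, which is the behaviour the task description asks for.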

Trial 5

 * Sure thing.  MBisanz  talk 03:10, 9 February 2013 (UTC)
 * I made a few more changes to the identification of BLP articles, which seems to be working now. I also fixed a bug where templates were being added above hatnotes, and another where the bot was dragging hatnotes up from sections to the top. Just spotted one final issue with an edit summary at the end, but other than that I think we are there.  ·Add§hore· Talk To Me! 19:51, 9 February 2013 (UTC)
 *  MBisanz  talk 15:43, 11 February 2013 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.