Wikipedia:Bots/Requests for approval/PearBOT 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

PearBOT 5
Operator:

Time filed: 10:26, Monday, November 4, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: Standard pywikibot

Function overview: Adding automatically generated short descriptions to some biographies

Links to relevant discussions (where appropriate): Wikipedia talk:WikiProject Short descriptions

Edit period(s): One time run

Estimated number of pages affected: 200,000+

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: As discussed at Wikipedia talk:WikiProject Short descriptions, I've created a set of regular expressions to extract short descriptions of the form "[Nationality] [Career]" from the first sentence of the article lead. The main conditions on a generated short description are that it follows "[Name] is/was a/an", starts with a nationality on my list of nationalities, and is at most 40 characters long, as recommended by WP:SHORTDESC. If a description doesn't fulfill these criteria, the bot skips the article and attempts the same process for the next article. Using these criteria, about 20% of biographies can get a short description, based on my tests at User:Trialpears/Automatic biography short descriptions, where 202 out of 1000 attempts generated a short description. This sample was considered acceptable for automatic application by the participants in the discussion linked above.
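
The criteria described above can be illustrated with a minimal sketch. This is not the bot's actual code: the nationality list is a tiny hypothetical stand-in, and the names `NATIONALITIES`, `LEAD_PATTERN` and `extract_short_description` are assumptions made up for this example. It only shows the general shape of the check: find "is/was a/an", require a listed nationality, and skip anything longer than 40 characters.

```python
import re

# Hypothetical, heavily abbreviated list; the bot's real list is not shown here.
NATIONALITIES = ("American", "British", "Dutch", "French", "Swedish")

# Length limit mentioned in the function details (per WP:SHORTDESC).
MAX_LENGTH = 40

# Assumed pattern: "is/was a/an", then a listed nationality, then text up to
# the next period, comma, semicolon or parenthesis.
LEAD_PATTERN = re.compile(
    r"\b(?:is|was) an? "
    r"(?P<desc>(?:" + "|".join(map(re.escape, NATIONALITIES)) + r") [^.,;()]+)"
)


def extract_short_description(first_sentence):
    """Return a "[Nationality] [Career]" description, or None to skip the article."""
    match = LEAD_PATTERN.search(first_sentence)
    if match is None:
        return None  # no "is/was a/an [Nationality] ..." in the sentence
    description = match.group("desc").strip()
    if len(description) > MAX_LENGTH:
        return None  # too long, so the article is skipped
    return description


if __name__ == "__main__":
    # Made-up example sentence, not taken from the trial.
    print(extract_short_description(
        "Jane Doe (born 1 May 1970) is an American film director."
    ))
```

Returning `None` corresponds to the bot skipping the article and moving on to the next one; a sentence without a listed nationality, or one whose description exceeds 40 characters, falls into that case.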

Discussion

 * BAGAssistanceNeeded ‑‑Trialpears (talk) 22:50, 11 November 2019 (UTC)
 * Could you please give an example edit that this bot would make? Thanks! -- The SandDoctor Talk 04:48, 12 November 2019 (UTC)
 * Sure! For the first article on the list the edit would be this, which is MOS:ORDER compliant and contains a parameter to facilitate tracking, as discussed at WT:WPSHORTDESC. ‑‑Trialpears (talk) 06:52, 12 November 2019 (UTC)
 * Thank you very much for helping me visualize this! Please post a link (preferably a permalink) to the contribs page when done. As usual, please take however much time you need to complete this trial (there isn't any rush) and let me know if you have any questions or need any clarifications. -- The SandDoctor Talk 07:05, 12 November 2019 (UTC)
 * 50 edits. The only problem encountered was related to Adrian Vandenberg, where ", best" was appended to the end of the description. This happened because I forgot to sync the bug fix for this from the Trialpears PAWS notebook to the PearBOT PAWS notebook; it is now resolved, as can be seen in the second edit. The third edit was manual and intended to be made through my main account, but was accidentally made using PearBOT because PAWS didn't allow me to be logged in as Trialpears while using it, which is what I usually do. ‑‑Trialpears (talk) 08:13, 12 November 2019 (UTC)
 * Looks good to me so far. How do you feel about running a larger extended trial and then waiting a bit for feedback before we close this request? With over 100k estimated pages affected, I am a bit leery of approving too quickly. -- The SandDoctor Talk 18:34, 12 November 2019 (UTC)
 * Sure thing! No bot flag then I guess? ‑‑Trialpears (talk) 21:26, 12 November 2019 (UTC)
 * The bot is already flagged? (Rights log) -- The SandDoctor Talk 02:03, 13 November 2019 (UTC)
 * Yes, but should I use it if the main purpose of the trial is feedback? ‑‑Trialpears (talk) 06:43, 13 November 2019 (UTC)
 * Oh, I see/understand your question better now. Yes: getting feedback is the main purpose, so I would recommend not using it (set to "false" for this task). If that sounds good to you, I will go ahead and grant an extended trial. -- The SandDoctor Talk 06:57, 13 November 2019 (UTC)
 * Sounds good! ‑‑Trialpears (talk) 09:21, 13 November 2019 (UTC)
 * I don't want to be pushy, but getting the trial started tomorrow would be very nice, since I won't be able to do it until Tuesday otherwise. ‑‑Trialpears (talk) 22:12, 14 November 2019 (UTC)
 * Thanks for the reminder. As before, please post a permalink to the contribs section when done and take as much time as you need. When done I will need some time to review the edits and then we should wait a few days for any community feedback. Thanks! -- The SandDoctor Talk 22:22, 14 November 2019 (UTC)

Looks good here. - FlightTime  ( open channel ) 22:10, 15 November 2019 (UTC)
 * 500 edits. All were reviewed, and there were some edits that I should mention here. Several of them are examples of what I'll call the "best" bug, since it's the same thing that caused "best" to be appended to some previous edits. This happened because these words weren't on the list of words preceding "known for" that shouldn't be included in the description. All of these are now solved, and a ton of other words found by searching for "known for" have also been added, which should have stopped it once and for all. There were also two edits with problems due to abbreviations containing periods. This should also be solved now, since those abbreviations have been added to a list of exceptions. A partial version of this list, including only a few words, was in use throughout both trials; it can be seen at work at Bill Harsey Jr. and probably a couple of other edits, but I'm too tired to go digging for more right now. After these changes I would expect the problem rate to be significantly less than a percent and hopefully as low as 1 in 500 or less. There were of course a few edits that sounded a bit awkward because the opening sentence was awkward. I think the best way to get an impression of this problem is to look through these edits and see if you think it's a problem. I personally think it's a major issue. ‑‑Trialpears (talk) 23:48, 15 November 2019 (UTC)
 * Let's see if we can fix the issues mentioned above. Primefac (talk) 15:50, 8 December 2019 (UTC)
 * 100 edits. No problems. For what it's worth, I reviewed about 300 randomly generated descriptions and some test cases specifically designed to cover edge cases. All edits mentioned above as problematic were also included in these test cases. I also added some final checks that should never be activated, for things like ending with a conjunction or a comma after processing. I would be comfortable letting this run now. ‑‑Trialpears (talk) 21:48, 16 December 2019 (UTC)
 * I spot-checked those edits and didn't see any issues. This bot task is good to see - I wrote code way back with some of the same logic, running on specific kinds of biographies and with a similar check for valid nationalities, but never got it to the point where it could be trialed. Great to see someone put the effort in to fine-tune the code to the point where it can be widely used!
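
The fixes discussed in the trial reports above (words preceding "known for", abbreviations with periods, and the final checks for a trailing conjunction or comma) could look roughly like the following sketch. None of this is taken from the bot's source: the word lists are short hypothetical samples, and the function names `first_sentence`, `trim_known_for` and `passes_final_checks` are invented for illustration.

```python
import re

# Hypothetical sample lists; the bot's real lists are longer and not shown here.
ABBREVIATIONS = ("Jr.", "Sr.", "Dr.")
KNOWN_FOR_QUALIFIERS = ("best", "mostly", "primarily", "widely", "well")
TRAILING_CONJUNCTIONS = ("and", "or", "but")


def first_sentence(text):
    """Return the first sentence without splitting on periods in listed abbreviations."""
    for abbr in ABBREVIATIONS:
        # Temporarily mask the period so it is not taken as a sentence end.
        text = text.replace(abbr, abbr.replace(".", "\0"))
    sentence = text.split(". ")[0]
    return sentence.replace("\0", ".")


def trim_known_for(description):
    """Cut the text at "known for", including a listed qualifier right before it."""
    qualifiers = "|".join(KNOWN_FOR_QUALIFIERS)
    pattern = rf",?\s*(?:(?:{qualifiers})\s+)?known for\b.*$"
    return re.sub(pattern, "", description).strip()


def passes_final_checks(description):
    """Reject descriptions that are empty or end with a comma or a conjunction."""
    if not description:
        return False
    if description.endswith(","):
        return False
    if description.split()[-1].lower() in TRAILING_CONJUNCTIONS:
        return False
    return True
```

In this sketch a description that fails `passes_final_checks` would simply be skipped, in the same way as one that fails the nationality or length criteria.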


 * Just wondering, are descriptions only being added to living people? It would certainly be nice to handle all people, using categories like Category:Dead people, but it's fine if this task is limited to only living people. Galobtter (pingó mió) 05:02, 28 December 2019 (UTC)
 * I intend to run it on all biographies, but have only been running it on Category:Living people since that's the simplest. Perhaps I should have included some biographies of dead people in the trial, and I would be willing to do another trial if that is an issue. ‑‑Trialpears (talk) 14:33, 29 December 2019 (UTC)

Primefac (talk) 18:41, 30 December 2019 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.