Wikipedia:Bots/Requests for approval/RichBot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

RichBot 2
Operator:

Time filed: 22:11, Wednesday, August 18, 2021 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): PHP

Source code available:

Function overview: Add a copyright notice to drafts that may be copyvios

Links to relevant discussions (where appropriate):

Edit period(s): Once every 12 hours

Estimated number of pages affected: Depends on amount of copyright content found

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: RichBot (formerly NoSandboxesHere) already checks a maximum of 200 drafts every 12 hours through Earwig's Copyvio Detector to create User:RichBot/copyvios, which it does under the provisions of WP:BOTUSERSPACE. I am proposing to tag the potential copyvios it finds that have a confidence score above 70% with User:RichBot/maybecopyvio (as a template), which will prompt reviewers to check the report before reviewing. If the tag is removed, the bot will not re-add it during that submission round.
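The selection rule described above could be sketched roughly as follows. This is an illustrative Python sketch only, not the bot's actual code (the bot is written in PHP and its source is not linked here). The names `Draft`, `submission_id`, `select_drafts_to_tag` and `already_tagged` are hypothetical; only the 200-draft cap, the 70% threshold and the "do not re-add in the same submission round" rule come from the request.

```python
from dataclasses import dataclass

MAX_DRAFTS_PER_RUN = 200     # from the request: at most 200 drafts per 12-hour run
SCORE_THRESHOLD = 70.0       # from the request: tag only scores above 70%


@dataclass
class Draft:
    title: str
    score: float         # percentage reported by the copyvio detector
    submission_id: int   # hypothetical identifier for the current submission round


def select_drafts_to_tag(drafts, already_tagged):
    """Return the titles that should receive the tag in this run.

    already_tagged is a set of (title, submission_id) pairs recorded each time
    the tag is added. A draft tagged once in a round is never selected again in
    that round, so a tag removed by an editor is not re-added.
    """
    to_tag = []
    for draft in drafts[:MAX_DRAFTS_PER_RUN]:
        if draft.score <= SCORE_THRESHOLD:
            continue  # at or below the threshold: leave the draft alone
        key = (draft.title, draft.submission_id)
        if key in already_tagged:
            continue  # already tagged during this submission round
        already_tagged.add(key)
        to_tag.append(draft.title)
    return to_tag
```

Keeping a record of what has been tagged per round, rather than checking whether the tag is currently on the page, is one way to honour a manual removal; how the bot would actually track this is not stated in the request.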

Discussion
I think it would be helpful to link from the template to WP's policy on copyright violation. Many people are unaware, as I once was, that inserting copyvio text is not allowed, even in Draft space with the intention of paraphrasing it later. – Jonesey95 (talk) 23:55, 18 August 2021 (UTC)
 * Excellent point, this has been added. I have also added the ability for a percentage to be given as a parameter, which will change the text a little; this can be seen on my sandbox - Rich T|C|E-Mail 13:57, 19 August 2021 (UTC)

I can see this potentially having a lot of FPs (a short article with lyrics, copying from another article, etc.), and thus being bitey. It may have fewer FPs if it only included drafts created by new editors. Either way, it's probably best to advertise this discussion to copyright venues (talk page of the policy, Contributor copyright investigations, etc.) to get some thoughts from those who deal with CVs regularly. ProcrastinatingReader (talk) 21:15, 19 August 2021 (UTC)
 * I'm not going to outright decline this, just in case I'm missing something, but we do not have bots run through Earwig's Copyvio Tool because we only get a limited number of searches. Auto-checking is something we should not be doing, and I am wondering if the updates to User:RichBot/copyvios are what have been causing me lately to run into the max search issue at seemingly earlier points in the day. Primefac (talk) 20:41, 22 August 2021 (UTC)
 * Provided the bot limits its daily searches, this isn't completely unacceptable. Of the past 1,767 tool uses with the search engine enabled (a two-day period), RichBot made 404. This might be a bit high if we're seeing the limit being exceeded frequently, as you say, but 200/day is only about 10-15% of our quota, so I don't think that alone is a problem. Also, the bot would be free to make as many checks as it wants if it disables the engine and uses "links in page", if we need to go down that route.
 * Separately, we should work on the language in User:RichBot/maybecopyvio. "Has a high confidence chance of being a copyright violation" is too strong a determination given the frequency of false positives, and the tool doesn't use the term "confidence" anymore because it is misleading: there is no probabilistic basis for the percentage. You could say that the submission was found to be highly similar to another source found on the web, for example. – The Earwig (talk) 03:56, 23 August 2021 (UTC)
 * I'd ideally also like to see how many searches per day get run, if that's possible. Primefac (talk) 12:32, 11 November 2021 (UTC)
 * There's already a bot that scans drafts for copyvios. See Special:NewPagesFeed, select "articles for creation", and set filter "copyvio". These are marked by EranBot (operated by ערן) using the pagetriage log action. How is this bot's detection different? – SD0001 (talk) 04:08, 12 November 2021 (UTC)
 * Also, as an alternative to adding a banner (which has potential to be problematic due to FPs), we could consider having the bot save some metadata at some place, which could be picked up by a tool like AFCH and shown to AFC reviewers only. – SD0001 (talk) 04:10, 12 November 2021 (UTC)
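
For reference, the usage figures quoted by The Earwig above can be worked through as a short Python calculation. The daily quota itself is never stated in this discussion; the range below is only what the "about 10-15%" remark would imply, and the helper name `implied_quota` is hypothetical.

```python
# Figures quoted in the discussion: over a two-day period, RichBot made
# 404 of the 1,767 tool uses that had the search engine enabled.
richbot_uses = 404
total_uses = 1767
share_of_uses = richbot_uses / total_uses  # roughly 0.229, i.e. about 23%


def implied_quota(searches_per_day, fraction_of_quota):
    """Daily quota implied by a search count and the fraction it represents."""
    return searches_per_day / fraction_of_quota


# "200/day is only about 10-15% of our quota" implies a quota in this range.
# This is an inference from the quoted percentages, not a stated figure.
quota_low = implied_quota(200, 0.15)   # roughly 1,333 searches per day
quota_high = implied_quota(200, 0.10)  # roughly 2,000 searches per day
```

Note that the share of observed tool uses (about 23%) and the share of the quota (10-15%) are different measures, since not every day's usage reaches the quota.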


 * Please see the above and advise asap. -- The SandDoctor Talk 07:29, 29 December 2021 (UTC)
 * Expired due to inactivity. Feel free to reopen if you wish to pick this back up. ProcrastinatingReader (talk) 15:16, 23 January 2022 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.