Wikipedia:Bots/Requests for approval/RetractionBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

RetractionBot
Operator:

Time filed: 22:43, Sunday, April 21, 2019 (UTC)

Function overview: This bot maintains a database of retracted sources from the CrossRef API (and in the future also from PubMed) and would update references citing those sources with the Retracted template, marking them for review.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/Samwalton9/RetractionBot

Links to relevant discussions (where appropriate): Wikipedia talk:WikiProject Medicine/Archive 118 and Wikipedia talk:WikiProject Medicine/Archive 26

Edit period(s): Daily

Estimated number of pages affected: 169 on first run. It's hard to estimate beyond the first run, but the number isn't likely to be higher than a few edits per month.

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Please correct me if I'm wrong, I'm quite new to making bots, but I believe Pywikibot does this automatically unless overridden, so yes.

Function details: This bot maintains a database of retracted papers. At the time of writing it only considers those flagged as retracted or withdrawn by CrossRef via their API, and the database contains 3974 entries. In the future I plan to expand it with entries from the PubMed API. I'm also currently using quite a conservative search for retracted papers - we could also look for other publication types like errata, but a little more research is required there.

On each run, the bot loads the list of retracted papers and searches Wikipedia (mainspace only) for each original DOI. If it finds pages containing that DOI, it checks whether the DOI is inside a ref tag. If so, it then checks whether the Retracted template is already present in that citation. If not, it appends the template, with the retraction DOI filled out, to the citation, and saves the page.
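The per-page check described above can be sketched roughly as follows. This is a minimal illustration and not the bot's actual code: the function name `tag_retracted`, the regular expression, and the `{{Retracted|doi=...}}` parameter form are assumptions made for the example. It also uses plain substring matching, so a DOI that is a prefix of a longer DOI would match as well.

```python
import re

# Matches a full <ref>...</ref> element (named or unnamed), non-greedy.
# Self-closing tags such as <ref name="x" /> are deliberately not matched.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def tag_retracted(wikitext: str, original_doi: str, retraction_doi: str) -> str:
    """Append {{Retracted}} to each <ref> citing original_doi, unless already tagged."""

    def fix(match: "re.Match") -> str:
        whole = match.group(0)
        body = match.group(1)
        if original_doi.lower() not in body.lower():
            return whole  # this citation does not mention the DOI
        if re.search(r"\{\{\s*retracted\b", body, re.IGNORECASE):
            return whole  # already flagged, leave untouched
        # Offset of the end of the ref body, relative to the start of the match.
        end = match.end(1) - match.start(0)
        # Assumed template form; the real parameters may differ.
        template = " {{Retracted|doi=" + retraction_doi + "}}"
        return whole[:end] + template + whole[end:]

    return REF_RE.sub(fix, wikitext)
```

Running the function a second time on its own output leaves the text unchanged, which matters for a task that runs daily.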

On its first run the bot detects 169 citations which need tagging - a full table can be checked at User:RetractionBot/results. A spot check implies that everything should be functioning as intended. The only edge cases I found were articles like Frontiers Media where the citation is flagged as retracted but doesn't use the correct template.

With just shy of 4000 entries in the database and 169 edits to make, a dummy run took ~10 minutes to process. I would plan to update the database and run the bot daily; any more than that would probably be overkill.

The bot has both a local killswitch User:RetractionBot/run, which will stop it running on the English Wikipedia, and a global killswitch User:RetractionBot/run, which will stop it running on any Wiki.
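A killswitch check of this kind might look like the sketch below. It is an illustration under stated assumptions, not the bot's implementation: the helper `bot_may_run`, the convention that a switch page containing `yes` means "keep running", and the `fetch_page_text` callable (standing in for whatever reads a wiki page) are all hypothetical.

```python
from typing import Callable, Iterable


def bot_may_run(
    fetch_page_text: Callable[[str], str],
    switch_pages: Iterable[str],
    enabled_value: str = "yes",  # assumed convention for "bot is allowed to run"
) -> bool:
    """Return True only if every killswitch page holds the enabled value."""
    for title in switch_pages:
        try:
            text = fetch_page_text(title)
        except Exception:
            # Fail closed: if a switch cannot be read, do not edit.
            return False
        if text.strip().lower() != enabled_value:
            return False
    return True
```

Checking the switches before each batch of edits, and not only at start-up, lets an editor stop a run that is already in progress.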

Discussion
Headbomb {t · c · p · b} 00:23, 22 April 2019 (UTC)
 * Nice project! Just in case it might be useful: https://github.com/neilfws/PubMed/tree/master/retractions contains the code used by Saunders for their "PMRetract", which was used for some research on retraction patterns. Nemo 05:40, 22 April 2019 (UTC)
 * That's super useful Nemo! I was struggling to find a method to discover PubMed retractions that was as straightforward as it is for CrossRef, so this is very helpful. Sam Walton (talk) 10:05, 22 April 2019 (UTC)
 * I'm glad. Nemo 17:21, 22 April 2019 (UTC)
 * Thanks Headbomb! I made 5 edits, checking each as it went, and noticed a bug where I hadn't allowed for the DOI to be long enough in the database (this edit). Fixing that now. There was also this invalid DOI, which appears to be a CrossRef data problem rather than a bot problem. Removing 'ret' at the end of the identifier resolves correctly. Sam Walton (talk) 09:48, 22 April 2019 (UTC)
 * 25 edits made, one bug discovered and fixed. Looked through all edits and the bot appears to be behaving as intended. Sam Walton (talk) 10:04, 22 April 2019 (UTC)

Looks good. Three things.
 * 1. How hard would it be to make the bot cross-reference the existing DOIs against pubmed databases to add pmc/pmcid?
 * 2. Could you expand the scope to cover citations in bulleted lists, like those in Quark?


 * 3. Plain text citations like
 * Carberry, Josiah (2008). "Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory". Journal of Psychoceramics. 5 (11): 1–3.
 * would be very tricky to handle. But could the bot make a listing, perhaps at User:RetractionBot/Plaintexts, of all instances of '10.5555/12345678' (and other retracted DOIs) found in articles, either inside or outside doi? This way they could be found and flagged manually. Headbomb {t · c · p · b} 15:45, 22 April 2019 (UTC)
 * 1. If I understand correctly this feels like it would be out of scope for this bot; if we want to add pubmed identifiers to all references with DOIs that sounds like it should be a separate bot task - it seems this bot is only going to edit a handful of references per year. Another bot could tackle all citations much more thoroughly. 2. Seems quite possible. 3. Possible, but probably only in the case that the citation is on its own line; otherwise it might not be clear where the retracted template should be placed. Can you provide an example of 3 (i.e. not in a ref tag, not in a list) in an article? Sam Walton (talk) 20:41, 22 April 2019 (UTC)

I don't mean cross-referencing the existing citation. I mean adding this. You have the retraction DOI, but you could query pubmed relatively easily to figure out the retraction PMID/PMC when they exist. As for 2, those would cover things specifically on their own line. Maybe it could be expanded to cover more, but there will be a point where the complexity gets too high, so I'll let you figure out where that threshold is. Headbomb {t · c · p · b} 20:51, 22 April 2019 (UTC)
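The PubMed lookup suggested above could be sketched roughly as follows. The sketch assumes the NCBI E-utilities `esearch` endpoint with a `[doi]` field tag and a JSON response carrying `esearchresult.idlist`; the helper names are hypothetical, and the network call itself is left out so that only URL construction and response parsing are shown. Resolving a PMC ID would need a further lookup that is not covered here.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(doi: str) -> str:
    """Build an esearch URL asking PubMed for records with this DOI."""
    query = {"db": "pubmed", "term": f"{doi}[doi]", "retmode": "json"}
    return ESEARCH + "?" + urlencode(query)


def pmid_from_response(payload: dict) -> Optional[str]:
    """Return the PMID if the (assumed) response shape holds exactly one match."""
    ids = payload.get("esearchresult", {}).get("idlist", [])
    # Zero or several hits are ambiguous, so report nothing in those cases.
    return ids[0] if len(ids) == 1 else None
```

Returning nothing on ambiguous results keeps the bot from attaching a wrong identifier to a retraction notice.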
 * Oh, and for 3, I initially misread. This wouldn't be something the bot itself would touch, but rather it would create a worklist of articles for people to review (so people can place retracted if appropriate). So this would cover pretty much everywhere the DOI string is found, not just 'nicely' formatted things. E.g., if the article Example cited a retracted article, horribly formatted as
 * Carberry, in 2008 (Published as "Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory" in Journal of Psychoceramics. 5 (11): 1–3. DOI OBJECT IDENTIFIER HTTP://DX.DOI.ORG 10.5555/12345678 ), claimed that ...
 * Then you could have a report at User:RetractionBot/Report, with something like

However, this isn't something that needs to be part of this BRFA (or would need one, since it would operate in its own userspace), just an idea to increase the usefulness of the bot. Headbomb {t · c · p · b} 21:31, 22 April 2019 (UTC)
 * Having spent more than 30 seconds reading your post to properly understand your suggestions, hopefully this answer will be more coherent! For 1, yes that makes sense and I'll take a look at doing that. 2 seems very possible, it would just be another check alongside being in a ref tag. 3 is also a neat idea, and agree that's something worth playing around with following this BRFA. I should have time to implement the first two this weekend. Sam Walton (talk) 09:08, 24 April 2019 (UTC)
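The worklist idea from point 3 can be sketched as below: rather than editing articles, the bot would list every page where a retracted DOI string appears at all, however it is formatted. The function names and the report line format are hypothetical, and the page texts are passed in as a plain dictionary so the example stays self-contained.

```python
from typing import Dict, List, Tuple


def build_worklist(
    pages: Dict[str, str], retracted_dois: List[str]
) -> List[Tuple[str, str, int]]:
    """Return (title, doi, occurrences) for every retracted DOI found in any page."""
    rows = []
    for title in sorted(pages):
        lowered = pages[title].lower()  # DOIs are case-insensitive
        for doi in retracted_dois:
            hits = lowered.count(doi.lower())
            if hits:
                rows.append((title, doi, hits))
    return rows


def render_report(rows: List[Tuple[str, str, int]]) -> str:
    """Render the rows as a wikitext bullet list for a userspace report page."""
    return "\n".join(
        f"* [[{title}]]: {doi} ({hits} occurrence(s))" for title, doi, hits in rows
    )
```

Because the scan is a raw substring count, it also catches DOIs outside ref tags and templates, which is the point of a manual review list.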

If amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. -- The SandDoctor Talk 16:45, 28 April 2019 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.