Wikipedia:OABOT/False positives

April 2017
https://docs.google.com/spreadsheets/d/1zRZeks0MrqfYFuvWp-IjmbytaVKHYyS97-vgvDCpMIg/edit?usp=sharing

Results from the version using restricted filters, including Citeseer and excluding Academia.edu and ResearchGate

Errors:
 * Original not paywalled: 30 (51%)
 * New link not free to read: 2 (3%)
 * Original and new link do not match: 0 (0%)
 * Copyright violation: 4 (7%)
 * SHERPA/RoMEO version wrong: 13 (22%)
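
The size of the test set is not stated above. As a rough cross-check, the sketch below searches for sample sizes that reproduce all of the reported percentages after rounding to whole numbers. The function name `consistent_totals` and the search bound are illustrative assumptions, and the result is an inference from the figures, not a number taken from the spreadsheet. The categories may overlap, so the percentages need not add up to 100.

```python
# Counts and rounded percentages as reported in the Errors list.
counts = {
    "Original not paywalled": 30,
    "New link not free to read": 2,
    "Original and new link do not match": 0,
    "Copyright violation": 4,
    "SHERPA/RoMEO version wrong": 13,
}
reported_pct = {
    "Original not paywalled": 51,
    "New link not free to read": 3,
    "Original and new link do not match": 0,
    "Copyright violation": 7,
    "SHERPA/RoMEO version wrong": 22,
}


def consistent_totals(counts, reported_pct, max_total=200):
    """Return every sample size n for which each count, expressed as a
    percentage of n and rounded to a whole number, matches the report.

    max_total is an arbitrary search bound (an assumption).
    """
    smallest = max(1, max(counts.values()))  # n cannot be below the largest count
    matches = []
    for n in range(smallest, max_total + 1):
        if all(round(100 * c / n) == reported_pct[k] for k, c in counts.items()):
            matches.append(n)
    return matches


if __name__ == "__main__":
    print(consistent_totals(counts, reported_pct))
```

If the percentages were computed against a single total, this points to a test set of 59 citations.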

Summary

 * A significant percentage of “paywalled” links turned out to be free to read already via the original link.
 * In at least one case the tool recommended adding the same link that was already present.
 * In almost all cases the new link recommended by the tool was free to read, although there were intermittent problems accessing Citeseer’s cached PDFs.
 * In at least two cases the tool returned a thesis with the same title as the cited journal article. In another case it returned a translation for a citation to the original in another language.
 * The semi-automated tool demonstrated continued issues with Citeseer publications uploaded by people other than the original authors. One instance may be an unauthorized translation of the original.
 * I used SHERPA/RoMEO, an index of publisher copyright/archiving policies, to estimate compliance of included repositories with publisher contracts, keeping in mind that it is always possible the author negotiated terms other than those usually applied. A significant number of results in both the automated and semi-automated tests pointed to the wrong version (preprint vs. postprint vs. publisher version) or failed to meet other requirements set by the publisher. Most often a publisher’s version was uploaded when this was not permitted. SHERPA/RoMEO has an API that it may be possible to integrate.
 * arXiv and PMC are both generally accepted by publishers as standard in those fields. No other source representing a significant portion of the test sets could be considered “perfect” in terms of likely copyright compliance.
 * Assessment of potential linkvio concerns will be challenging for non-experts even with written instructions.
 * The error page for misspelled/nonexistent titles could be more helpful.
 * Using the tool manually (i.e. entering an article title in the search box) was very slow, and the page occasionally did not load at all. In addition, a significant number of articles from large journals came back with no proposed changes.
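
The version check described in the SHERPA/RoMEO item above could be automated along these lines. This is a minimal sketch under stated assumptions: the policy table is hand-entered and hypothetical, the journal names are placeholders, and `check_version` is an illustrative helper, not part of the tool or of the SHERPA/RoMEO API. A real integration would have to fetch the policies from that service, and an "ok" result would still not cover author-negotiated terms or other publisher conditions.

```python
# Hypothetical policy table: which versions each publisher lets authors
# archive. Entries are placeholders, not real SHERPA/RoMEO data.
ALLOWED_VERSIONS = {
    "Example Journal A": {"preprint", "postprint"},
    "Example Journal B": {"preprint"},
}

KNOWN_VERSIONS = {"preprint", "postprint", "publisher"}


def check_version(journal, found_version, policies=ALLOWED_VERSIONS):
    """Classify a proposed link by comparing the version found in the
    repository with the versions the publisher's policy allows.

    Returns "ok", "version not permitted", or "unknown" (no policy on
    record, so a human has to review the link).
    """
    if found_version not in KNOWN_VERSIONS:
        raise ValueError(f"unrecognised version: {found_version!r}")
    allowed = policies.get(journal)
    if allowed is None:
        return "unknown"
    return "ok" if found_version in allowed else "version not permitted"


if __name__ == "__main__":
    # The most common problem in the tests: a publisher's version
    # uploaded where only earlier versions may be archived.
    print(check_version("Example Journal B", "publisher"))
```

Returning "unknown" instead of guessing keeps journals without a recorded policy in the manual review queue.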