Wikipedia:Wikipedia Signpost/2016-04-24/Special report



Some of you may recall from James Heilman’s previous article that we now have a copyvio detection bot, User:EranBot, that checks prose additions over a certain size for copyright violations using Turnitin / iThenticate. Edits that come back positive are listed at User:EranBot/Copyright/rc for human follow-up. I have been working on assessing these bot reports since December 2015.

EranBot is highly useful, as people are being notified immediately or within a few days that their edit is an unacceptable copyright violation. For repeat / egregious violators, I follow up by monitoring the user's edits, and block if necessary. My hope is that early detection and warning will prevent the introduction of additional serial copyright violators like we have seen in the past and curtail the need to open further cases at the ridiculously backlogged WP:CCI (which has a five-year backlog of 155 cases, 76,218+ articles as of 7 April 2016).

Copying within Wikipedia requires attribution
One thing I've noticed while reviewing the bot reports is that many established editors don't realise that attribution is required within Wikipedia under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License. Some of you may have already received a notice from me letting you know about this requirement!

When copying within Wikipedia, at a minimum, what you need to do is mention in your edit summary that the content has been copied from another article. Here’s a sample edit summary: "Attribution: this material was copied from Example on April 1, 2016. Please see the history of that page for attribution." That's the ideal version, but even a simple edit summary such as "copied from Example" is better than nothing! If you're copying material you wrote yourself, attribution is technically not required, but it would help simplify checking the bot reports—and it would save you from receiving irrelevant notices from me or potential future helpers.

It's good practice, especially if copying is extensive, to also place a template on the talk pages of the source and destination. In cases where you add public domain material from US government websites or other PD sources, you should place the template immediately after your citation, and for material under a compatible Creative Commons license, you should use. There's also a whole host of other useful attribution templates available at Category:Attribution templates.

Helpers needed
As the bot gets more reliable and begins to run 24 hours a day, there’s going to be more work to do than can be managed by one person alone. It would be great if interested people had a look at User:EranBot/Copyright/rc or the archived batches, with a view to helping out by assessing diffs, removing violations from article space, and issuing warnings to the editors involved. If you have any questions about how to perform this task, please let me know on my talk page, or post a message at User talk:EranBot/Copyright/rc.

The WMF Community Tech team has been working with Eran to further improve the bot, based on user feedback. The hope is that usability and reliability will improve over time. Further details are here.