Wikipedia:WikiProject Military history/News/September 2010/Editorials

In Part 1, I talked about how copyright problems are defined on Wikipedia and why we should care. This time, I'll be focusing on the practical side: how to address them.

To minimize damage to the project, we should do everything we can to prevent copyright problems in the first place, but when they do creep in, we need to get them out as quickly as possible. Doing so not only protects copyright holders and reusers, but is also a great courtesy to other Wikipedia contributors. My heart sinks when I'm cleaning copyright problems and encounter an article that shows considerable time and effort from other contributors. When they have built on the base of a copyright infringement, their work must frequently be eliminated too, as a derivative work of the original. Not only are those hours of work lost, but I can't help but worry that the contributor might be lost as well. How many times does this have to happen before a contributor whose work keeps being swept away with someone else's copyright violation begins to wonder if it's worth it?

To clean copyright problems, we have to be able to recognize them and to know what processes are best when we do. We need to give appropriate notice to the person who created the problem, beginning with the assumption that they meant well but did not understand the requirements or know how to comply. And we need to remain conscious of the fact that a person who creates one copyright problem may have created others and that sometimes additional review will be required.

How to recognize it
I have personally found copyright violations in articles at almost every level, from stub to good article. Copyright violations have made it to the main page, in DYKs. They've been placed by IPs and by administrators. In short, there is no reliable profile of where they will appear or who will have placed them.

For your project, the most likely red flag to look for will be the addition of a lengthy stretch of professional-level text, especially if it lacks sources. This test isn't foolproof: some copyright infringers lift the sources directly from the copyrighted content they copy, while some add copyrighted content incrementally, especially if copying from multiple sources. Also, sometimes we do get extensive contributions directly from professional-level writers. But this is one indicator that further investigation may be necessary. Likewise, content whose tone is all wrong for Wikipedia might have been copied from somewhere the tone was right: an editorial, perhaps, or a less formal resource.

When content raises a red flag, a quick run through a few well-chosen search engines can often confirm whether issues exist. I frequently use Google and Google Books, and sometimes also Google News and Google Scholar. The mechanical detectors I use are generally not sensitive and detect only larger amounts of copying, but User:The Earwig has made a nice tool that can get a search started. I search manually for "apt phrases": runs of four or five words that seem unlikely to be common. Taking one of your featured articles, Admiralty Islands campaign, as an example, I would not search for the phrase "no signs of enemy activity." With quotation marks (which I do not always use, in case a few words have been minimally changed), I get over 3,000 exact matches; without them, over 5.2 million! I'd be more likely to look for "the Japanese had not anticipated an assault" or "completed the isolation of the major Japanese base". For some examples of "apt phrasing", see .

Especially if the content has been present for a while, be careful of our reusers: many sites copy Wikipedia, so a match elsewhere may be a copy of our article rather than its source. A list of some of these can be found at Mirrors and forks. If you aren't sure whether the content was published here first, you can do a quick gut-check by (a) looking at the contributor's talk page and (b) looking at the article's history. Has the contributor had copyright warnings before? Another red flag. Was the content placed largely in one edit? Another red flag. A couple of these together are worth at least further investigation.
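The "apt phrase" search described above is done by hand, but its mechanical part, cutting a suspect passage into quoted runs of four or five words, is easy to sketch. The Python below is a minimal, hypothetical helper (`search_phrases` is not an existing tool or part of The Earwig's tool); it only generates candidates. Deciding which runs are distinctive enough to search remains a human judgement, since a run like "no signs of enemy activity" looks no different to code than a genuinely unusual one.

```python
import re


def search_phrases(text, n=5, step=1):
    """Return distinct runs of n words from text, each wrapped in quotation marks.

    Runs never cross sentence boundaries. With step=1 every overlapping run is
    returned; with step=n the text is cut into non-overlapping chunks.
    """
    phrases = []
    # Crude sentence split on common end-of-clause punctuation.
    for sentence in re.split(r"[.!?;:]+", text):
        words = re.findall(r"[A-Za-z0-9'-]+", sentence)
        for start in range(0, len(words) - n + 1, step):
            phrases.append('"' + " ".join(words[start:start + n]) + '"')
    # Drop duplicates while keeping the order in which phrases first appear.
    return list(dict.fromkeys(phrases))


if __name__ == "__main__":
    # Sample text taken from one of the phrases suggested above.
    sample = "The Japanese had not anticipated an assault."
    for phrase in search_phrases(sample):
        print(phrase)
```

Run on the sample sentence, it prints three candidates: "The Japanese had not anticipated", "Japanese had not anticipated an" and "had not anticipated an assault". Pasting the most distinctive of them into Google or Google Books is still a manual step; a phrase that returns thousands of exact matches is probably stock wording and not worth chasing.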

How to handle it
The next step is knowing what to do once you've found copied or very closely paraphrased content and you've identified what may be the source—or at least a source that seems to predate us. It doesn't matter if you've found the first publisher. All that really matters is that we know that we didn't publish it first. It was copied from somewhere.

How to handle it, honestly, depends on how much time you have. Ideally you would investigate fully, but if you can't, doing the minimum is better than doing nothing. At the very least, consider noting your concerns at Wikipedia talk:Copyright problems so that others can investigate. (But please try not to make the very least a habit, as that talk page is not itself heavily watched.) If you have time, you can evaluate more thoroughly by following these steps:
 * 1) Check whether there is an OTRS permission tag or a backwardscopy tag referencing the suspected source at the top of the article's talk page. If there is, either permission has already been verified or somebody has already investigated and found that the content was here first. (We hope with good reason. If you have reason to doubt, feel free to investigate further yourself! You can ask an OTRS volunteer to check the ticket, or ask the person who placed the backwards copy template to explain their actions.) If the tags don't reference the specific source, there may still be problems, and you should at least seek feedback. If these tags are present and seem in order, you're done.
 * 2) If not, check to see how long the content has been here and if there is any sign that the contributor who placed it here had permission. If it's been there forever, and there's no sign of permission, it will probably need to be speedily deleted. At this writing, articles with unsalvageably corrupt history where there is "no credible assertion of permission, public domain, fair use, or a free license, where there is no non-infringing content on the page worth saving" should be tagged . The tag generates a notice for you to give to the contributor. Please do. (See next section for why.)
 * 3) If the content hasn't been there forever or the article is not unsalvageably corrupt (say somebody pasted something into one or two sections, but others are all ours), you can revert to the last clean version or remove/rewrite the copyright problem. (What if you aren't sure where it was added or how to extricate it? Skip this step; we're coming to this situation soon.) Please tag the article's talk page . This can help prevent the content being inadvertently restored. (If it's advertently restored, see step 5.) What you say to the contributor here depends on whether they indicated permission. If they did not, the standard notice is  . If they did, the standard notice is.... Well, we don't have one. I usually copy the cclean notice I've just placed at the article's talk and tweak it a bit. It has all the information they need to verify.
 * 4) If it's been there forever and permission is indicated (or some credible assertion of public domain, fair use or free license), replace the article with . The tag you generate will tell you what to do next, providing the notice to place at Copyright problems and on the contributor's talk page.
 * 5) Suppose the copyright problem is not foundational, but the copyrighted content is thoroughly intertwined with the article... or you are afraid, based on the article's history, that other sources may be involved... or somebody is edit-warring against your efforts to remove the copyvio. In these cases, too, you should replace the article with . Follow the directions the tag generates, and an administrator or copyright volunteer will take over in due course.

Why we notify contributors
Before I started working almost full time on copyright problems, I had already noticed the (to me) interesting fact that the copyvio speedy deletion criterion is the only one under which taggers are required to notify the page's creator. Under all other criteria, notice is an optional courtesy.

There are good reasons for this. First, obviously, creators may learn that they can't paste copyrighted content, and that's a win-win. The best-case outcome here is a contributor who goes on to a happy, productive Wikipedia life without ever violating our copyright policy again. But even more importantly, a consistent practice of notification may itself help protect the project against liability. Without getting bogged down in law: the Online Copyright Infringement Liability Limitation Act doesn't just require that we remove copyright problems when we receive notice of them. Among other requirements, we must also inform people about our policy and their risk of account termination. In the full legalese, a service provider that wants protection must have "adopted and reasonably implemented, and inform[ed] subscribers and account holders of the service provider’s system or network of, a policy that provides for the termination in appropriate circumstances of subscribers and account holders of the service provider’s system or network who are repeat infringers".

The notice clearly serves to inform; it also helps to implement. We don't keep a central record of infringers. These notices serve as red flags. When I'm cleaning copyright, I will usually look at a contributor's talk page history to see if he or she has received multiple notices in the past. If the user has and has persisted (sometimes the infringement I'm cleaning will predate their notices), it's time to consider terminating the account, at least temporarily. And it may be time to launch a contributor copyright investigation.

When to seek admin assistance
Admin assistance is necessary when a contributor has multiple warnings but has persisted in violating copyright. Administrators' noticeboard/Incidents is one place to go, but you can also simply approach an administrator who you know routinely works on copyright.

Admin assistance is also required if contributors are obstructing copyright cleanup of their own or others' work. If your cleanup is reverted or if the copyright problem is replaced with an unusably close paraphrase, I recommend first replacing the content with {{subst:copyvio}}, as the template instructs that it is only to be removed by an administrator or OTRS agent. This can prevent publication of the copyright problem while the matter is straightened out. If the person obstructing you is working in good faith, a friendly note clarifying the problem is also in order at this point. Sometimes bystander dismay leads to knee-jerk reversions of copyright cleanup, as people fear content is being removed without good reason. Sometimes people are attempting to replace a copyright problem with usable content but don't understand how extensively the material must be revised. (A pointer to Close paraphrasing can help.) If the obstruction continues, it's time to head to WP:ANI or to the talk page of an admin who works on copyright. We can't publish non-free content for which we don't have permission. Sometimes contributors need to be blocked to prevent the behavior; sometimes pages need to be protected.

Replacing copyright violations
Beyond locating copyright concerns, it will often be a great help if you can assist in rewriting them. If a copyright problem is foundational, we may lose the entire article. Even if not, it can set an article's development back years. I have myself rewritten hundreds of articles blanked for copyright problems, but I'm sorry to say it's a paltry percentage of what I've had to delete. The Copyright Problems board has daily listings of a handful to even dozens of articles that have been blanked for evaluation; the template blanking them includes a link to a temporary page in which clean content can be proposed. You are very welcome to fill it, even if you are the person who originally blanked the text.

One thing you do need to remember is that rewriting of copyright problems must be done from scratch. Works based upon other creative works are "derivative works". These include translations, annotations, abridgments, condensations, elaborations or modifications. If the original content is copyrighted, only the copyright holder has the right to prepare derivative works. For this reason, incrementally modifying a copyright violation on Wikipedia is not likely to help. If it isn't written from scratch, the rewrite may itself be unusable.

Again, if we can't prove that content is free, we can only use brief, clearly marked quotations for good reasons; all other information should be written in our own language, structure and organization.

A word about CCIs
Beyond helping to address copyright problems when you trip over them (and even seeking them out), probably the biggest assistance members of your project can offer to copyright cleanup on Wikipedia is helping with CCIs.

A CCI is an in-depth evaluation of one contributor's edits; they are launched (usually with great reluctance) only after we have verified copyright infringement in multiple articles or images. Usually, we will notify a project if a CCI is opened that heavily affects articles under its purview, but not always, particularly because contributors often work in multiple areas. However, all active CCIs are listed at the top of Contributor copyright investigations, along with one or two areas in which each contributor works. A spot check can help clarify whether any of them have worked on articles of interest to you.

Any contributor who does not have a history of copyright problems himself or herself is welcome to help clean up CCIs. (Downright celebrated, even.) We have literally dozens of these, with thousands of articles awaiting review. The longer copyrighted content remains in an article, the more damage it may do to reusers, to copyright holders, and to Wikipedia's contributors, who waste their time polishing something we can't retain.

Each CCI has instructions at the top of its individual listings page. In general, these are the same: content can be removed, if copying is found, or presumptively removed, if copying seems likely. (Remember, in these cases, we know the contributor of the content has violated copyright and are simply trying to figure out where. Given our knowledge, we err on the side of protecting the project, by exercising a reasonable duty of care.) There are special templates that can be placed on article talk pages to help avoid inadvertent return and also to help reduce bystander dismay.
