Wikipedia:Wikipedia Signpost/2010-12-06/WikiProject report

This Halloween, Wikipedia unwittingly featured a truly scary article on the main page: a copyright violation produced by a member of the Arbitration Committee. After generating a large amount of buzz, the incident led to the replacement of the featured article and the retirement of an Arb (see Signpost coverage). Just a few weeks before, another long-time editor was indefinitely blocked and over 10,000 articles were blanked because of copyright violations (see Signpost coverage). With copyright policy garnering headlines so frequently, we decided to ask WikiProject Copyright Cleanup for some insight into copyright policies and how editors can avoid getting in trouble. We interviewed Moonriddengirl, MLauba, and Physchim62.

'''What motivated you to join WikiProject Copyright Cleanup?'''


 * MLauba: One of the articles I had on my watchlist was deleted for one reason or another, then recreated and slapped with a CSB notice. I decided to follow up, found that I could clean the article up and source it better, and from there hopped over to WP:SCV where the bot tracks its reports. It was under a huge backlog (in reality, it always is), so I got started on clearing it. After about 30 entries, I thought it might be a good idea to have someone validate what I did, and that's how I ended up on the WikiProject.


 * Moonriddengirl: After getting my tools, I focused on WP:CSD for a while before I even knew we had a board for copyright problems. When I found it, it had a pretty decent backlog, and I felt like I ought to be able to keep it up to date. There were a few other admins who pitched in now and again, and I soon observed a couple of regulars at WP:SCV, but there was no sense of community. So, in March 2009, I proposed the project. Its purpose, as I said then, was "to encourage participation and collaboration in copyright cleanup, currently a rather lonely field. In addition to providing a forum where contributors may discuss copyright matters, my hope is to create a gathering point where efforts can be coordinated to clean up massive infringement." This was before the days of WP:CCI, and the few cases I had stumbled upon were hosted in my userspace! I was hoping that the project would bring more members than it has, but I still regard it as successful. New people find us now and again, and we've got some regulars who pull together well in keeping the various boards up to date. (That said, I am in perpetual recruitment mode. Come on aboard!)

'''There's been a lot of buzz about copyright cleanup recently. Could you provide some insight?'''


 * MLauba: Copyright is something that is perceived as complex, and many editors and admins shy away from anything that has "copyvio" attached to it. Indeed, most of the ANI reports mentioning copyright issues tend to remain very short - the two recent cases that come to mind are completely out of the ordinary.


 * From the practice of cleaning up text copyright issues, I believe all of us are aware that a staggering proportion of our content has been borrowed from third parties, most of it in good faith. But to much of the community, this realization came as a shock - the first time due to the sheer volume of contributions to check, the second time because an arbitrator was affected. And in the latter case, we also had an article that went through two article review processes, DYK and FAC, undetected.


 * Last but not least, the problem was exacerbated because the whole thing was erroneously labeled as plagiarism when it was actually a copyvio. Plagiarism is a particularly loaded term in many circles, most notably academia.


 * Physchim62: The [cleanup] buzz comes in waves, roughly one per academic semester. This semester it's copyright violation because we had two pieces up on the Main Page over the Halloween weekend, in separate sections, that were contrary to Wikipedia's copyright policy. Next semester it might be biographies of living persons (again) or verifiability (again). The important thing is that these issues serve as a wake-up call for all editors that our quality control procedures are not perfect, and indeed never will be. The challenge is to keep copyright issues (and BLP, and verifiability) in the back of editors' minds even when there isn't a "WikiScandal" going on. The problems don't go away just because no one's shouting about them!

'''What does the project do? What is your role in the project?'''


 * MLauba: The project does three things: offer a discussion space for people seeking advice on copyright issues, centralize tools and processes used in copyright cleanup, and discuss new initiatives in terms of processes, practices and guidelines.
 * In terms of roles, there aren't any real roles in the project, but I am one of the people who come up with ideas for tightening up processes and documentation from time to time. And of course normal cleanup activity, mostly at WP:CP, when I have time left to do so.


 * Moonriddengirl: Narrowing specifically to the first point of MLauba's response, the project offers a place for people interested in helping out with copyright problems to go to learn procedures and seek feedback. My personal goal is to help welcome new people to copyright cleanup and be available to provide what guidance and assistance I can as they learn the ropes.

'''How do copyright infringement and plagiarism affect Wikipedia?'''


 * MLauba: First, there's a point to clarify. Copyright infringement is a legal term, and deciding whether a copyright infringement has happened or not is a complex matter left to the courts. If we were ever to be found in infringement by a court of law, the first hit would of course be to the reputation of the project, but in terms of liabilities, the WMF is subject to a so-called "safe harbor provision" for copyright matters: it acts as a repository of data but not as a publisher. This would mean that in the end, the defendant in a copyright lawsuit raised against Wikipedia wouldn't be the WMF, but the editor who copied and pasted content without permission.
 * If you remember, about a year ago there were some headlines because several thousand images from the British National Portrait Gallery were imported onto Wikipedia, an act legal under US law but not necessarily in the UK. Had that matter proceeded to the courts, it would almost certainly have been brought against the uploader.


 * Our copyright policy defines the notion of "copyright violation", or copyvio for short, and it has been designed on purpose to be much more stringent than what current US jurisprudence recognizes as actionable copyright infringement. There are two reasons for that. The first is obviously to protect both our editors and our content - by being much stricter than what current legal practice requires, we minimize risks to our editors, but beyond that, we also future-proof the encyclopedia. Indeed, the trend these past two decades has been to tighten copyright laws in favour of copyright holders, so by keeping a large safety margin, we hope to avoid a situation where Congress passes new laws that suddenly render a large portion of Wikipedia illegal.


 * Then there's the second reason, and the most important one. Our mission is to create a free repository of knowledge, and material copyrighted to third parties is not free - it is essentially not ours to give away. And that's the primary reason why copyright cleanup is absolutely essential to Wikipedia.


 * Plagiarism is a different matter, an ethical one. Here the impact is first and foremost to our reputation. Plagiarism is a failure to attribute content to its authors, to give credit where credit is due. It is taking someone else's work and passing it off as our own. Attributing text we have copied from a free source is essential in terms of credibility, to distinguish between what is our work and what is from another person's labour. And to look at the broader picture, our own licenses to reusers are quite broad, but the one thing that we require is attribution: if you take our content and reuse it, you must state that you got it from us. Plagiarism makes us hypocrites who hold our reusers to a higher standard than we hold ourselves to.


 * Moonriddengirl: To what MLauba says, I would add that copyright infringement also has an impact on our content reusers. Wikipedia could have been published under full copyright; it isn't. We're under a license that permits liberal reuse, even commercially, for a reason: we want our content to be reused. Among the core values of the Wikimedia Foundation we find the following: "An essential part of the Wikimedia Foundation's mission is encouraging the development of free-content educational resources that may be created, used, and reused by the entire human community. We believe that this mission requires thriving open formats and open standards on the web to allow the creation of content not subject to restrictions on creation, use, and reuse." We are here, in part, to be reused. While the WMF has safe harbor, not all of our reusers will. While the WMF can simply pull copyvios when these are identified, some of our content reusers utilize print, which may be difficult and costly to retract. If we do not ensure that our content is free, we risk damage to them and, beyond that, to their trust in us and our content. If they cannot trust that our content is free, why would they risk reusing it? We compromise our values and our mission.


 * Beyond the global concerns, there is also the direct damage copyright problems do to our articles and to our editors. I've seen literally thousands of articles (I don't want to think about how many thousands) come up for copyright review. I've had to delete many; in many others, I've had to chop out content or roll them back sometimes years to the last clean version before the article was tainted. Our contributors waste their time polishing something we can't retain. How discouraging it must be to discover that the article you've spent hours copyediting is an unusable derivative work! In addition to the time lost working on articles with copyright problems, more time is needlessly wasted in cleaning them up. We would all be better off if we could just put that energy from the beginning into creating content we can keep.

'''How can editors help you with the project? What do you recommend for users who need to clean up their own articles?'''


 * Moonriddengirl: Pitch in! We have a project subpage full of instructions: WikiProject Copyright Cleanup/How to clean copyright infringements. (If these can be improved, let us know. If it's unclear or confusing, we'll fix it.) We especially need contributors who have no history of copyright problems to help out with the massive backlog at WP:CCI. As of this writing, there are 44 open "contributor copyright investigations" on individuals who have been verified to have copied content into multiple articles. (Grab any one you want; you don't have to start with the oldest.) Each CCI subpage has instructions at its top. If you aren't sure how to do something, ask us. Ask me. I'll be so happy that you're helping out that I'll be more than willing to take the time to explain. For contributors who need to clean up their own articles, I'd recommend first familiarizing yourselves with the policies and guidelines (WP:C, WP:NFC) as well as some helpful essays (Close paraphrasing, Wikipedia Signpost/2009-04-13/Dispatches under "Avoiding plagiarism": also good advice for avoiding copyvios!). If you aren't sure that you're rewriting completely enough, get feedback. Proper paraphrasing is a learned skill. None of us are born knowing how to do it, and there's no shame in learning how. If you know that you've created problems either with copyright or plagiarism, get systematic about cleaning it up. We have a tool at WP:CCI with which we can list your major edits to articles...all of them. You can conduct your own CCI so you can identify problem areas and clean them up.

Next week, we'll focus on some photosynthetic, unicellular and multicellular eukaryotes dating back to the Precambrian Eon... To return to a simpler time, study our previous reports in the archive.