Wikipedia talk:Plagiarism/Archive 8

A real-life example from outside Wikipedia
Here is an article describing a plagiarism case that led to a very highly-regarded journalist losing his job. It's instructive, as it is the sort of plagiarism that is common in Wikipedia. -- JN 466  18:02, 9 November 2010 (UTC)


 * That's what we'd qualify as a close paraphrase and remove as a copyvio, actually. MLauba (Talk) 00:35, 10 November 2010 (UTC)
 * And rightly so. It's these sorts of cases we need to focus on. -- JN 466  02:34, 10 November 2010 (UTC)
 * That case is not so much plagiarism as professional misconduct! The journalist implies to readers (and, more importantly for him, to his editors) that he has seen things that he hasn't seen, "travelled roads" that he hasn't travelled. It's probably plagiarism as well, but it is an excellent example of why we should not use the word plagiarism when we mean something else, because it woolies our thinking and leads us to miss essential issues. Physchim62 (talk) 04:15, 10 November 2010 (UTC)
 * This is a long-term problem on this page. The plagiarism guideline (née essay) was developed to address primarily PD/free sources and how to properly attribute them. It is not designed to address copyvio issues, yet the two always get conflated. The real world mixes up these concepts too. Raising the possibility of a copyvio seems to be much less fraught than "plagiarism", which despite our efforts to WP:-ify the term always comes across as a personal attack. As witness your mention of "professional misconduct", that is much closer to the connotation the word carries on Wikipedia. It seems that "copyvio" issues can be dealt with much more clinically here. Franamax (talk) 21:54, 11 November 2010 (UTC)

In a nutshell: the first clause is a contradiction
"Don't make the work of others look like your own". Can anyone tell me why this should not be removed, given that WP editors don't claim what they write as their own in the first place (WP:OWNERSHIP) and must surrender any right to copyright protection? The notion of personal gain from theft is also hard to substantiate because authorship is a relatively diffuse concept, both in terms of a snapshot of an article and how it evolves (and sometimes degrades) over time. Tony  (talk)  14:40, 11 November 2010 (UTC)
 * I think that's only a misunderstanding regarding the language (the nutshell line is correct though). "Own" is used in the meaning of self(made) and not legal ownership or copyright status. So "your own work" simply implies it is work that you've done yourself, it does not imply that it is work you legally own (as in possess). Generally speaking you don't necessarily own (=possess) your own (=self) work, as you may have relinquished your ownership by selling it or giving it away for free. So the perceived contradiction is not a real one but merely stems from different meanings of the word "own". --Kmhkmh (talk) 15:08, 11 November 2010 (UTC)
 * The entire "guideline" is only applicable if you take a view of "ownership" which is incompatible with WP:OWN. It should be rewritten as an essay which explains why plagiarism will almost invariably lead to a breach of one of Wikipedia's real policies, such as WP:V or WP:NPOV. Physchim62 (talk) 20:39, 11 November 2010 (UTC)
 * I agree with your disliking of this guideline in general for various reasons. However I see no problem with WP:OWN, as pointed out above the word "own" can be used in somewhat different ways. WP:OWN is talking of "own" as in possessing (moral, legal or just claimed ownership) whereas the nutshell line here talks about "own" as in selfmade/originally created by you. Or to put it in another way it is "belongs to me" versus "I've done/created it" or "holding the title" versus "being the creator". So we should not confuse this misunderstanding of language with the real issues of this guideline.--Kmhkmh (talk) 07:23, 13 November 2010 (UTC)
 * I disagree, otherwise the sentence "If you do not want your writing to be edited, used, and redistributed at will, then do not submit it here." in Ownership of articles would not make sense. The sentence could be rewritten "Don't make the work of others look like your writing", but I think "own" is more succinct.
 * If I copy a paragraph from Encyclopædia Britannica (Eleventh ed.) and do not attribute it, then it looks like my own work and would be plagiarised text. It will probably not be a breach of NOR and unless the content is challenged or likely to be challenged it is not a breach of WP:V to include it without a citation. Here is an article (Daniel Keyte Sandford) created by copying content from The Scottish Nation, by William Anderson, a publication from 1863 now in the public domain, which from its creation on 2 February 2010, was what most editors here consider to be a plagiarised version, an error which was corrected by adding attribution on 13 July 2010. -- PBS (talk) 02:55, 13 November 2010 (UTC)


 * Tony, that's not my understanding. I believe that I absolutely do retain copyright in my creative expressions here. I do grant the right to modify and re-publish my work, but it is still mine. I think you are referring to the effective need to accept that you are surrendering ownership, because pursuing a copyright case through the actual court system would be ruinously expensive. But if someone copies an article you substantially wrote without attributing your authorship, feel free to go at 'em. Of course you will have no monetary damages to claim, so you should either go with "pain and suffering"/"callous disregard" to recover some tort damages, or use a statutory infringement claim. In Canada, that is a cool $1000 per occurrence, leaving you down only 249K or so.
 * The intent of the nutshell (which yes, I wrote) is to address the en:wiki dimension of adding text and then saying it is "an article I wrote". No, it is an article you created or expanded, but you did not write it. It's fine to copy free sources - so long as you are clear that is what you are doing. We seek here to give guidance on how to be sure that you are properly attributing PD and free sources. Franamax (talk) 21:42, 11 November 2010 (UTC)


 * Do we own our own writing here, Franamax? My understanding is that, as soon as it's submitted, all claims to ownership cease. I can take something I've posted here, and sell it to a publisher—if I could find someone silly enough to buy it—but they would not be allowed to publish it without making clear that it was not their property or mine to sell; that is, they would have to credit Wikipedia. SlimVirgin  talk| contribs 04:52, 13 November 2010 (UTC)
 * Noo, the minimal acceptable compromise is that off-site attribution link the en:wiki page (in this case) from which the writing originated. That's the core principle of both the GFDL and recently-adopted CC-BY licensing, the chain of authorship absolutely must be preserved. GFDL had some problems with the "five principal authors" requirement for the History section given a massively collaborative environment and the MediaWiki interface kind of smeared out the "Section"s, so moving over to CC-BY probably made more sense. But the licensing terms definitely do preserve your ownership in every little bit of creative contributions you've made. If you wrote an article from scratch and the only other edits were typos and categories, yes, it is yours. However, you licensed it in perpetuity for re-use and modification, so I can do whatever I want with it - so long as I credit you as one of the principal authors. If I can sell it to someone for $$, more power to me, you already granted me that license, thank you very much (commercial reuse is part of the license).


 * What is the sense in which it is mine? Anyone can reuse it and change it. Anyone can sell it or buy it (though if they buy it, they're fools). It can be reproduced simply by crediting Wikipedia, not me (I have never seen the main authors credited). So in what sense does it remain my property? SlimVirgin  talk| contribs 05:30, 13 November 2010 (UTC)
 * That might take you through the intricacies of the GFDL under which almost all of our content was donated - I've made that journey a few times, and I have a long answer. But the short answer I'd say is that a link to the en:wiki page places in close proximity on arrival another link to the GFDL History and smearily (sic) the Title page. It takes a bit of grappling, but your own original contribs are your own. The fact that you assent to modification and redistribution doesn't subtract from your ownership. If you wanted to truly donate your material free of any restrictions whatever, including the restriction on requiring attribution, you probably should have picked a website with more permissive licensing terms. :) This is Wikipedia, so we go by GFDL and latterly CC-BY, that is fundamental. If you are talking about how to enforce your rights, whole different story, see my comments above where I mention needing a quarter-mil to get it off the ground. But don't confuse justice with access to justice. Franamax (talk) 05:58, 13 November 2010 (UTC)
 * Don't the guys who make printed books from Wikipedia articles print the list of contributors to the articles somewhere in small print? I seem to remember something to that effect. -- JN 466  06:20, 13 November 2010 (UTC)
 * (Rudely inserting myself here) There's a whole lot more guys who do it than that, and some of them flagrantly violate our license (and hence our contributors' copyrights). ISHA Books incorporates major content from Wikipedia articles under claim of full copyright. See this talk page thread and this one. The biggest problems there are that books from this publisher still pop up as sources in articles and that sometimes people mark our text as violating their copyrights. (An understandable error, but a serious concern of mine. We have to make sure our content is copyright compliant, but it really bugs me when reverse infringement endangers the valid contributions of our volunteers.) --Moonriddengirl (talk) 12:45, 13 November 2010 (UTC)
 * Fascinating. Anecdotally, when I was writing an article a while back, I noticed that an Indian sociologist had plagiarised significant parts of his book from a 60-page article in The New Yorker. The original author was occasionally cited, or mentioned, but basically the book was written much like some of our articles. :) I understand it is rather more difficult to enforce copyright in India than in the US. It wouldn't surprise me if this type of thing should become a growing problem. -- JN 466  05:01, 14 November 2010 (UTC)
 * I dunno. If someone reduces Toronto to "my firend Alison is teh coolest!" and I restore the content, will VDM credit me as a principal author? There are articles with well over 50,000 edits, do they reproduce that list in entirety or do they somehow distil the truly creative edits? We have a situation here where "our rights" are pretty clearly spelled out, but so far as I know have never been tried under case law. Franamax (talk) 06:43, 13 November 2010 (UTC)


 * Not to mention PediaPress, a private company that runs the button in every sidebar, "Create book". I take your point about not confusing rights with access to justice, Franamax, but the fact is our work is not our own in any meaningful, enforceable sense, and other people are making money from it. What I only learned yesterday is that the Foundation is helping them to do that, via Pediapress and its sidebar access. SlimVirgin  talk| contribs 06:49, 13 November 2010 (UTC)
 * That's an interesting nugget, I never really did understand the whole "books" thing. I try to only concern myself with on-wiki stuff where on-wiki guidelines are concerned though, and so far as I'm concerned SV, you and anyone else who makes a creative contribution here has an unequivocal ownership interest in your copyright (and in law where the servers are located, I believe moral rights under the Berne Convention). I will defend those rights into the ground. But no, I won't be out looking for a lawyer tomorrow. Franamax (talk) 07:04, 13 November 2010 (UTC)
 * I definitely support your position morally, and I sense that one day it will be tested in court, because for people to be openly making money by selling our words, with Foundation assistance, is somewhat provocative. We'll just have to wait for a rich Wikipedian to get annoyed by it. :) SlimVirgin  talk| contribs 07:09, 13 November 2010 (UTC)
 * I'm not so sure here. This whole thing is imho due to the fact that WP had difficulties coming up with a proper copyright license in time when it started. GFDL was mainly used because it was available, but you might argue we are paying until today for that choice. Ideally WP texts and edits should be released under public domain or a copyright scheme very close to the public domain. Because with such a model all the problems biting us in daily WP work would go away. Unfortunately there is no obvious legal or easy way to downgrade our copyright status to public domain, so we will have to live with that early mistake. I think people focusing a lot on the ownership (by copyright) in WP got somewhat distracted from the primary goal. The primary goal is providing free access to the knowledge of the world and the framework (legal, software, social) should make the creation and management of that knowledge as easy as possible. It was not meant as a place to stake legal or moral claims in intellectual ownership.--Kmhkmh (talk) 07:43, 13 November 2010 (UTC)
 * Well I can claim a fairly notable lack of creative contributions here and I would note that the real writers here do have quite an attachment to, and sense of pride in, the work they have committed. I'm fairly sure they all had a sense that they were giving something freely for the ages to come each time they clicked on "Save page", a little child they were sending off into the world. And I think they have suffered at each of the grievous slings and arrows their children have borne. But for the purpose of this guideline, that's really not material. The fact is that original writing submitted here is in fact protected by copyright. Whether or not to enforce that right is an individual decision, just as it is in all other aspects of life. It makes no difference whether you are an individual or a publishing house - if you want to enforce your copyright (or patent for that matter), then pony up some dollars and go visit a lawyer. That's universally true, it's not a wiki-specific problem. However, for the purpose of this guideline, the issue is copying directly or in substantive structure the writings of other people which is patently not your own. Quite aside from considerations of commercial reuse, the issue being addressed is whether your incorporation of PD/free text has the appearance of being your own original writing. That is where this whole guideline started, consciously separated from copyright considerations so far as possible. (Yes, in-wiki copy-pasting is a complicating factor) We really did try in there to separate out the copyright stream over to the established process. It still does get conflated though... Franamax (talk) 08:28, 13 November 2010 (UTC)
 * I'm aware of what this guideline is about and that is exactly why imho it needs to be abolished. However that is a somewhat separate matter from the point I've raised in my reply to SlimVirgin. Authors should be proud concerning their achievements, but anybody seriously considering legal remedies over his potentially "abused" contribution to WP has lost sight of our primary goal and probably contributed to the wrong kind of project (for him) in the first place.--Kmhkmh (talk) 09:51, 13 November 2010 (UTC)
 * Anyone getting particularly rich off Wikipedia, through "consultancy jobs" or directorships with republishers, would likely be the strongest disincentive for community participation imaginable. In a way, it looks inevitable that this will happen one day, but it will also be the end of Wikipedia as we know it. -- JN 466  05:43, 14 November 2010 (UTC)

←Franamax, is there something basic I'm not getting? When you say: "the issue is copying directly or in substantive structure the writings of other people which is patently not your own.", I wonder why copying your own writing from elsewhere, whether copyrighted or not, would be appropriate on WP (WP:OR). Tony  (talk)  12:11, 13 November 2010 (UTC)
 * Sometimes experts do hang out on Wikipedia, whose writings have been properly vetted and published: COI. Beyond that, we get people submitting their own writings all the time through WP:OTRS. For a random example, Klipsch Audio Technologies contains content submitted by the company itself. We do accept such content in articles about the subjects themselves, of course, under appropriate circumstances. (Haven't read that article; as I said, I selected it randomly from the list.) --Moonriddengirl (talk) 12:32, 13 November 2010 (UTC)
 * Copying your own content is not WP:OR per se, that entirely depends on the actual content itself and whether you source it sufficiently (not by citing yourself but by citing other sources confirming your content). WP:OR issues only arise if your original content contained new previously unpublished or unrecognized knowledge or opinions. To look at a more concrete scenario: let's assume I've written an article on the Pythagorean theorem elsewhere, which summarizes the most important (known) aspects of that theorem, then it would of course be perfectly fine for me to reuse that content in WP to improve the corresponding WP article or create a new one in case no article exists in WP yet. There is no WP:OR problem at all with that scenario. If however my original article on the Pythagorean theorem contains new previously unknown and/or unpublished mathematical properties rather than known facts only, then we have a potential WP:OR issue, if I were to reuse that content in WP. However assuming my original was actually published (and reviewed) in a reputable math journal then I might still be able to reuse it here. There are afaik already some scientific journals expecting/requiring their authors to prepare a "WP-version" of the papers they submit. The idea is here, that if those papers get published, reviewed and accepted by the scientific community, the "WP-version" can be fed directly into WP.--Kmhkmh (talk) 13:43, 13 November 2010 (UTC)
 * P.S: This is essentially the case that moonriddengirl called "experts that actually do hang out in WP" in her posting above. It is a common misconception (in particular by the general public) that this rarely happens, because many WP articles are actually written by people holding an academic degree in the field to which that article belongs. Now this does not hold for the majority of the articles but still for a significant percentage.--Kmhkmh (talk) 13:50, 13 November 2010 (UTC)
 * Look, back to my opening point: "Don't make the work of others look like your own" is appropriate to say to college students. They write assignments in which they are expected to own the text unless they explicitly mark it as borrowed from someone else. Academic assignments are a showcase of students' ability to think with originality. WP article text is not meant to be original—it is meant to be true to its sources. Thus, the marking system of attribution and the signalling system for paraphrased and duplicated material is not the same. This guideline has been pushed too far in the direction of the college assignment genre, without proper assessment of the unique relationship of WP text to sources. The opening needs to be recast. Tony   (talk)  14:46, 13 November 2010 (UTC)
 * I agree that WP is not comparable to the situation in college or academia and I'd also agree that this guideline is problematic to say the least (and imho should be abolished). However your starting point was a perceived contradiction in the nutshell line and that contradiction does not exist. Whether that nutshell line or the whole guideline is well suited for WP and really wanted by the community is an entirely different issue.--Kmhkmh (talk) 16:07, 13 November 2010 (UTC)
 * Welcome. I've been making that point for about a year now. Perhaps we should add a paragraph to the lede explaining the difference between plagiarism in an academic setting, and plagiarism in Wikipedia. Slim made a start, adding a few words to that effect a few days ago, in Plagiarism. -- JN 466  05:38, 14 November 2010 (UTC)

When writing about historical issues
Here's a question, particularly for Moonriddengirl. I'm currently reading sources about when the media first broadcast news of the existence of the gas chambers at Auschwitz. I'm doing it for a summary-style section in an article about an escapee. But I may also want to write a fuller account of it in its own article.

There are just a small number of sources who have written about this in great detail. There is one paper in particular that lays it all out -- who knew what when. If I want to write this article, I'm going to be forced to stick very closely to the arrangement of facts as presented by this source, because to change it would produce inaccuracy. But it seems this will by necessity open me up to the charge of plagiarism and copyright violation as defined by some editors here, even if I pepper the text (as I will) with in-text attribution. SlimVirgin talk| contribs 16:14, 13 November 2010 (UTC)


 * In terms of copyright violation, nobody can say without seeing the source and then seeing what you do with the text. You're a skilled writer, and I have a hard time imagining that you will not adequately revise it to separate from the source. In terms of plagiarism, that's pretty easy; if you follow the source closely, acknowledge that you've followed the source closely. Voila, no plagiarism issues. --Moonriddengirl (talk) 16:35, 13 November 2010 (UTC)


 * But still a copyright violation, according to your earlier argument, because I'll be copying the source's facts, and his arrangement of those facts. SlimVirgin  talk| contribs 16:36, 13 November 2010 (UTC)


 * Sorry, but I'm not able to give an opinion on whether you're violating copyright policies without seeing what you do--neither to say, "Oh, this is fine!" nor "Gosh, you'd better not." --Moonriddengirl (talk) 16:48, 13 November 2010 (UTC)


 * I've told you what I'm going to do. It's a short article -- I'm guessing around 20 pages. And I intend to put it in my own words, but otherwise follow it extremely closely; both the same facts and the same arrangement of facts, because the point of the article (if I create it) will be to give those exact details and not to waver from them in the slightest. SlimVirgin  talk| contribs 16:52, 13 November 2010 (UTC)


 * Telling me what you're going to do is not helpful, I'm afraid. If you take a copyright case to court, they will review the source and they will review the use of the source. Your intentions don't figure into it. But so far as I know, you've never had a copyright problem. I don't see any reason to presume you're about to start. Meanwhile, there's plenty of real (rather than theoretical) copyright evaluation to be done. --Moonriddengirl (talk) 17:05, 13 November 2010 (UTC)


 * I'm not asking about court, but about what advice we give to Wikipedians in these circumstances. SlimVirgin  talk| contribs 17:09, 13 November 2010 (UTC)
 * Are you talking about trying to craft general advice for this or some other guideline or policy? Or if a random user stops by our pages to ask if their intended use is legal? If the latter, I would say in this situation, "For a 20 page source, you aren't likely to create a summary close enough to constitute a copyright problem, unless you are closely following creative elements within the document itself. For example, you can't reproduce tables or lists unless these are completely devoid of creativity. Be careful that you aren't producing an abridgement of the source, however, as this is one form of derivative work which is reserved to the copyright holder by ." --Moonriddengirl (talk) 17:20, 13 November 2010 (UTC)


 * Yes, it will be an abridgement, and necessarily so, because I have to stick closely to the precise arrangement of facts. That's why I'm asking the question. When you're looking at a narrow area that hangs on detail, and there are very few sources who have gone into that detail, you either follow them very closely, or you don't write about it. SlimVirgin  talk| contribs 17:23, 13 November 2010 (UTC)


 * Since abridgments are reserved to the copyright holder, if you create an abridgment you may be by US definition creating a copyright problem. See also Fair use in our article. But this has nothing to do with plagiarism. --Moonriddengirl (talk) 17:27, 13 November 2010 (UTC)
 * I'm trying to catch up CP before I have to head off in about an hour, but I don't want to leave this hanging incomplete. The art of writing to avoid copyright concerns is complex. Your best bet for safety is to put content into your own words and to incorporate multiple sources, thereby creating a new creative work that serves to advance rather than supplant what you've taken from a single source. If you take from a single source, you should be careful not to follow too closely on the structure. The more creative the source you are using, the greater the concern. A chronological recounting of a battle will have far less copyright protection than a nuanced evaluation of the factors that led to a war. If you do effectively create an abridgment of a non-free source, the legal permissibility of it will depend on other fair use factors. --Moonriddengirl (talk) 17:37, 13 November 2010 (UTC)

I guess abridgement is a bit of a red herring here because the point of an abridgement is that it reads basically like the original. So the style is essentially the same, and in a typical case many sentences will be copied. It's much closer to the original than a summary.

My advice would be, just to be on the safe side, to look for a way to organise the information differently. Is the original source presenting everything chronologically as it occurred? Maybe you can separate events into two or more main stories. First tell one of them, then tell the other and explain how it relates to the first. Maybe it makes sense to go backwards in time. Maybe isolate a number of phases which you can arrange chronologically, and discuss each phase according to some ordering principle other than time. Programmers use similar tricks when they write open source software based on an analysis of copyrighted source code. Hans Adler 17:39, 13 November 2010 (UTC)


 * The thing is that I don't want to do that. I want to tell people "this is what Martin Gilbert says, and this is how he says it," because it's a very good article. And the issues are too complex to mess around with. I probably won't write it now, so this is a moot point in terms of this article, but I think it probably applies to a fair chunk of WP articles where the writers are trying to do something serious, because that often involves sticking very closely to the key sources. SlimVirgin  talk| contribs 17:48, 13 November 2010 (UTC)
 * Well, that would be a shame. :/ Is there a way to incorporate it as part of a larger body of text? If you incorporate the information into a larger piece that covers additional elements, it's more likely to be properly "transformative". --Moonriddengirl (talk) 17:54, 13 November 2010 (UTC)
 * I could add it to Auschwitz, but that article is a bit of a mess. I was hoping to write a little stand-alone piece about who knew what when, and the number of good sources who go into sufficient detail is really small. I'll have a rethink and see if I can approach it differently. SlimVirgin  talk| contribs 18:02, 13 November 2010 (UTC)
 * If you want further feedback, you're welcome to drop by my talk page any time. I can't give a definitive opinion without seeing the usage, but I'm happy to brainstorm solutions. I've spent many an hour trying to figure out how to say stuff that has already been tagged as a copyright problem on Wikipedia so we don't lose the information. :) --Moonriddengirl (talk) 18:08, 13 November 2010 (UTC)
 * Okay, thanks, I appreciate that. SlimVirgin  talk| contribs 18:24, 13 November 2010 (UTC)


 * Sigh. That is precisely the sort of thing I was worried about. Remember, copyright law serves to aid education and the spread of knowledge (so I remember reading in one of the legal copyright judgments I read a couple of days ago), and that is why facts are not copyrightable. That includes timelines of historical events. -- JN 466  05:31, 14 November 2010 (UTC)
 * It's a pity that copyright law is complex, but unless we can persuade the United States (since they govern En Wiki) to jettison their current muddy system in favor of something simple and obvious, what we need to do is try to make our own policies as clear as possible within that murky water so that we and our legitimate reusers don't wind up on the wrong side of it. Codifying simple rules is not easy when the law is complex. To go through this again: facts are not copyrightable. Their selection and arrangement can be. The more extensive the compilation of facts and the more creative the selection and arrangement, the more stringent copyright protection is likely to be. Feist v. Rural remains among the primary precedents for nonfiction and Feist says clearly (with respect to compilations that clear the creativity threshold) "a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement." You can have the facts, but if the source's selection and arrangement of facts is creative and if it is used extensively, you need to bring your own creativity to bear on both. --Moonriddengirl (talk) 13:20, 14 November 2010 (UTC)
 * Copyright does no such thing. But it's there, and we have to live with it. Hans Adler 13:58, 14 November 2010 (UTC)
 * In addition to the above two accurate replies, there is the question of whether, even if entirely legal, it would be desirable for Wikipedia to gain a reputation as a website where articles are plagiarised from books, media reports, and other sources. Johnuniq (talk) 03:56, 15 November 2010 (UTC)


 * Hans, you may well be right, but this is the claim copyright law makes for itself. Here is the relevant bit from Feist v. Rural (a case that concerned a telephone directory). It is really worth reading in toto. The relevant claim is highlighted.

"No one may claim originality as to facts." Id. 2.11[A], p. 2-157. This is because facts do not owe their origin to an act of authorship. The distinction is one between creation and discovery: the first person to find and report a particular fact has not created the fact; he or she has merely discovered its existence. To borrow from Burrow-Giles, one who discovers a fact is not its "maker" or "originator." 111 U.S., at 58. "The discoverer merely finds and records." Nimmer 2.03[E]. Census-takers, for example, do not "create" the population figures that emerge from their efforts; in a sense, they copy these figures from the world around them. Denicola, Copyright in Collections of Facts: A Theory for the Protection of Nonfiction Literary Works, 81 Colum.L.Rev. 516, 525 (1981) (hereinafter Denicola). Census data therefore do not trigger copyright, because these data are not "original" in the constitutional sense. Nimmer [499 U.S. 340, 348]  2.03[E]. The same is true of all facts - scientific, historical, biographical, and news of the day. "[T]hey may not be copyrighted, and are part of the public domain available to every person." Miller, supra, at 1369.

Factual compilations, on the other hand, may possess the requisite originality. The compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers. These choices as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original that Congress may protect such compilations through the copyright laws. Nimmer 2.11[D], 3.03; Denicola 523, n. 38. Thus, even a directory that contains absolutely no protectible written expression, only facts, meets the constitutional minimum for copyright protection if it features an original selection or arrangement. See Harper & Row, 471 U.S., at 547. Accord, Nimmer 3.03.

This protection is subject to an important limitation. The mere fact that a work is copyrighted does not mean that every element of the work may be protected. Originality remains the sine qua non of copyright; accordingly, copyright protection may extend only to those components of a work that are original to the author. Patterson & Joyce 800-802; Ginsburg, Creation and Commercial Value: Copyright Protection of Works of Information, 90 Colum.L.Rev. 1865, 1868, and n. 12 (1990) (hereinafter Ginsburg). Thus, if the compilation author clothes facts with an original collocation of words, he or she may be able to claim a copyright in this written expression. Others may copy the underlying facts from the publication, but not the precise words used to present them. In Harper & Row, for example, we explained that President Ford could not prevent others from copying bare historical facts from his autobiography, see 471 U.S. at 556-557, but that he could prevent others from copying his "subjective descriptions and portraits of public figures." [499 U.S. 340, 349]  Id. at 563. Where the compilation author adds no written expression, but rather lets the facts speak for themselves, the expressive element is more elusive. The only conceivable expression is the manner in which the compiler has selected and arranged the facts. Thus, if the selection and arrangement are original, these elements of the work are eligible for copyright protection. See Patry, Copyright in Compilations of Facts (or Why the "White Pages" Are Not Copyrightable), 12 Com. & Law 37, 64 (Dec. 1990) (hereinafter Patry). No matter how original the format, however, the facts themselves do not become original through association. See Patterson & Joyce 776.

This inevitably means that the copyright in a factual compilation is thin. Notwithstanding a valid copyright, a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement. As one commentator explains it: "[N]o matter how much original authorship the work displays, the facts and ideas it exposes are free for the taking. . . . [T]he very same facts and ideas may be divorced from the context imposed by the author, and restated or reshuffled by second comers, even if the author was the first to discover the facts or to propose the ideas." Ginsburg 1868.

It may seem unfair that much of the fruit of the compiler's labor may be used by others without compensation. As Justice Brennan has correctly observed, however, this is not "some unforeseen byproduct of a statutory scheme." Harper & Row, 471 U.S., at 589 (dissenting opinion). It is, rather, "the essence of copyright," ibid. and a constitutional requirement. Art. I, 8, cl. 8. Accord, Twentieth Century Music Corp. v. Aiken, 422 U.S. 151, 156 (1975). To this end, copyright assures authors the right to their original [499 U.S. 340, 350]  expression, but encourages others to build freely upon the ideas and information conveyed by a work. Harper & Row, supra, at 556-557. This principle, known as the idea/expression or fact/expression dichotomy, applies to all works of authorship. As applied to a factual compilation, assuming the absence of original written expression, only the compiler's selection and arrangement may be protected; the raw facts may be copied at will. This result is neither unfair nor unfortunate. It is the means by which copyright advances the progress of science and art.

This Court has long recognized that the fact/expression dichotomy limits severely the scope of protection in fact-based works. More than a century ago, the Court observed: Baker v. Selden, 101 U.S. 99, 103 (1880). We reiterated this point in Harper & Row:


 * "[N]o author may copyright facts or ideas. The copyright is limited to those aspects of the work - termed `expression' - that display the stamp of the author's originality."
 * "[C]opyright does not prevent subsequent users from copying from a prior author's work those constituent elements that are not original - for example . . . facts, or materials in the public domain - as long as such use does not unfairly appropriate the author's original contributions." 471 U.S., at 547-548 (citation omitted).

This, then, resolves the doctrinal tension: Copyright treats facts and factual compilations in a wholly consistent manner. Facts, whether alone or as part of a compilation, are not original, and therefore may not be copyrighted. A factual compilation is eligible for copyright if it features an original selection or arrangement of facts, but the copyright is limited to [499 U.S. 340, 351]  the particular selection or arrangement. In no event may copyright extend to the facts themselves. -- JN 466  17:08, 15 November 2010 (UTC)
 * I've collapsed the lengthy quote here. It's got nothing to do with plagiarism. I don't know that it's necessary to refute Hans' apparent belief that copyright law hampers rather than enhances the spread of information to prove that the Feist court felt differently. --Moonriddengirl (talk) 17:41, 15 November 2010 (UTC)
 * No problem. We somehow came to focus on the copyright implications of SlimVirgin writing her article in the discussion above. At any rate, we seem to have concluded that the plagiarism aspect of it would be remedied by acknowledging the source. -- JN 466  18:24, 15 November 2010 (UTC)
 * Yes, she asked about it and, as you say, the plagiarism aspect is by far the easier to resolve. Give proper credit and there's no problem. Beautifully simple! But this talk page has gotten quite long!--Moonriddengirl (talk) 18:38, 15 November 2010 (UTC)

SlimVirgin writes:


 * The thing is [...] I want to tell people "this is what Martin Gilbert says, and this is how he says it,"

Well, then, how about just saying so? I.e., at the end of the introduction, parenthetically note that the article closely follows Martin Gilbert's. Since you're paraphrasing, you're not in copyright violation territory. And since you're giving credit where it's due, for organization of facts, it's not plagiarism. It's inelegant, I suppose, but if anyone can make up for that, SlimVirgin, you can ;-) Yakushima (talk) 09:13, 17 November 2010 (UTC)
 * If you paraphrase too closely, you are indeed in copyright violation territory under the U.S. laws that govern us. --Moonriddengirl (talk) 11:24, 17 November 2010 (UTC)
 * I'm aware of that. SlimVirgin is, I'm sure, aware of that.  I was assuming that SlimVirgin would, in the text of the WP article, avoid paraphrasing the source article so closely as to be plagiarizing it. That's not what I was addressing.  I was simply suggesting that, if the organization of that essay were largely borrowed, the charge of plagiarism of that structure might be averted by simply saying, in the WP article, that you're borrowing the organization from that source.  I'm certainly not proposing that, say, any section headers that required apparent creative effort be lifted verbatim from the essay.  (But if "Introduction" were the first section header in the essay, well, that's not exactly plagiarism, is it?). Yakushima (talk) 14:26, 18 November 2010 (UTC)

Indirect quoting makes copied phrasing not plagiarism!?
The article used to say this:
 * Adding in-text attribution ("John Smith argues that ...") always avoids accusations of plagiarism. But be cautious when using it, because it can lead to other problems. For example, "According to Professor Susan Jones, human-caused increases in atmospheric carbon dioxide have led to global warming" would be a violation of NPOV, because this is the consensus of many scientists, not only a claim by Jones.

I deleted it. Do I need to explain why?

Let's say that in some article about arms control negotiations, I cite this article after "writing":


 * Mary Beth Sheridan and Walter Pincus wrote that President Obama's top foreign-policy goals suffered a potentially ruinous setback when the Senate's second-ranking Republican said the U.S. nuclear treaty with Russia should not be considered until next year.

Obviously, I'm lifting Sheridan's and Pincus' wording verbatim, while leaving no clue to the reader that I'm copying, not writing.

As to the supposed NPOV violation, there's no reason it would be a violation if the article were about, say, Professor Susan Jones, or the climatology research center she worked for. Yakushima (talk) 08:57, 17 November 2010 (UTC)


 * Adding in-text attribution there is fine, though the number of words copied would be a problem. We can add something about that. As for NPOV, yes, but the text would say that. With the example as it stands it would be problematic. We can add a different example if you like. SlimVirgin  talk| contribs 19:26, 17 November 2010 (UTC)
 * We do not need an example. Just a mention that the text must also meet the content policies. This is because not only is there a problem with NPOV of the type you have described (which for want of another word could be under undue) there are also problems of bias, presenting only one side of an argument, and probably half a dozen more. So a general statement without details keeps this guideline focused on what it is about. -- PBS (talk) 20:16, 17 November 2010 (UTC)


 * SlimVirgin, I just can't agree. The number of words copied from the source in my example doesn't look like a copyright problem at all to me.  Under fair use, you're typically allowed to include far more words (unless the law addresses some special case for indirect quoting -- does it?).  But if I looked at the source and saw how much had been copied verbatim and how original the wording seemed to be, I'd be calling it plagiarism, openly and persistently.  The claim that INTEXT "always" prevents charges of plagiarism doesn't seem to hold up very well.  (The description of INTEXT policy isn't quite as strong, saying "Using in-text attribution avoids inadvertent plagiarism", though I find fault with it anyway, under the same reasoning I use here.)


 * The problem with the case I bring up above is that the reader is far too likely to consider almost all of the words as originating from Wikipedia editors, not from Pincus and Sheridan. That's certainly how it reads to me -- it's not at all clear that it's intended as an indirect quote from the article cited.  In the case of the indirect quote of Rawls, I don't have the same reaction at all.  I'm immediately thinking "'veil of ignorance' -- Rawls is introducing a phrase that requires some elaboration."  Even in that case, though, I'd prefer quotes around "veil of ignorance".


 * I believe Wikipedia policy about attribution should be, if anything, more conservative than in other encyclopedic sources, precisely because Wikipedia is more vulnerable to plagiarism by its open-editing nature, and indeed (as we all know) has been on the receiving end of bad press on this count. The basic rule should be: leave no doubt in any reader's mind as to the precise source of any wording, if the wording could leave any impression at all of creative construction.


 * Indirect quoting deserves special care because of this. It is a textual convention deriving from reported speech.  And it's implicit in reported speech that, even apart from the grammatical transformations required in English, you're not necessarily reproducing the word choices of the source, since human memory typically fails at that task.


 * Reproducing the source's word choices verbatim in all but the shortest indirect quotes therefore tends to violate reader expectations about originality of wording. If you explicitly allow it, up to lengths permitted in direct quotes under fair use doctrine (hundreds of words, in some cases), you could potentially have entire Wikipedia articles stitched together with indirect-quote sutures.  Each sentence of such a Frankenstein-article might start out something like, "According to X", and might contain sentences that are separate in the source but joined by substituting semicolons where the source uses periods.  Such articles would be abiding by "the letter of the law", while clearly violating the spirit, opening Wikipedia up to charges of being soft on plagiarism.


 * And what is the cost if you disallow this kind of thing? Negligible.  You're not putting any great burden on editors.  Quite the contrary.  It's almost always easier to just put quotes around some copied text than to recast it into grammatically correct reported speech. Yakushima (talk) 12:25, 18 November 2010 (UTC)


 * I added some words to deal with your concern about copyright. But bear in mind this page isn't about copyright, and we don't want to do anything to encourage the quote farm mentality. SlimVirgin  talk| contribs 12:41, 18 November 2010 (UTC)


 * I would say (as a repentant quote-farmer myself), "We don't want to do anything to encourage the quote farm mentality, unless some particular form of discouragement of quote-farming could result in more plagiarism (or charges thereof, anyway)." After all, which is worse?  Legally, Wikipedia can get a lot of slack about quoting, under Fair Use, because it's educational and non-profit.  Quote farms are sad, perhaps, but they don't invite the kind of public derision and disrepute you get with plagiarism (not to speak of the internal spite that develops, mea culpa there also).  Finally, farm straw can eventually be spun into the kind of gold you so generously and copiously provide to Wikipedia.  But a long and informative article composed too much of plagiarism poses one of two relatively pressing problems: (1) rewrite chores that are often doubly onerous in being undertaken while angry or ashamed, or (2) an article becoming dramatically less informative until a substitute for the plagiarism can be supplied.


 * By the way, it is plagiarism I'm overridingly concerned with, in my comments above. I'd far prefer that this guideline say less about copyright, in fact. If you somehow skimmed my comment above and assumed I was mainly addressing copyright, well ... could you please try again? Yakushima (talk) 13:58, 18 November 2010 (UTC)

Removed external link
I've provisionally removed the external link added by Fred Bauder here on WP:BLP grounds. While illuminating on the topic, it seems rather unsavoury, especially the email exchanges at the end. Was this accusation upheld by an institutional review board? If not, we are advertising what could be construed as an attack piece directed at a living person, which is not the intent of this guideline. It's arguable that the material is valid as an illustration of the acrimony which accompanies suggestions of plagiarism though, so perhaps there is a place for the link. But the way it looked just gave me a bad feeling. Comments of course are welcome. Franamax (talk) 23:03, 16 November 2010 (UTC)
 * It is genuine, but does focus in an embarrassing way on a scholar who seems to have cut some corners. This is the article in The Chronicle of Higher Education that links to it. I like it as an external link as it illustrates both plagiarism and copyright violations nicely, in that it shows both blocks of text copied and the failure to fully cite sources. This is a recurrent problem faced by our editors. Fred Talk 03:30, 17 November 2010 (UTC)
 * Forum discussion on Chronicle website. Fred Talk 03:33, 17 November 2010 (UTC)
 * Letter to the editor by Frank Fischer. Fred Talk 03:52, 17 November 2010 (UTC)
 * Defense by colleagues. Fred Talk 03:52, 17 November 2010 (UTC)
 * "Alan Sokal Responds to Frank Fischer" Fred Talk 03:52, 17 November 2010 (UTC)
 * The utility of these as external links is that they nicely illustrate the routine problems Wikipedia editors face. The downside, of course, is that they focus on one aspect of the work of an otherwise distinguished scholar without presenting a rounded picture of him and his work, a problem Biographies of living persons attempts to address. 03:52, 17 November 2010 (UTC)

This is interesting too. Fred Talk 04:36, 17 November 2010 (UTC)
 * Very interesting, thanks, Fred. -- JN 466  09:30, 20 November 2010 (UTC)

INTEXT
I have moved:
 * INTEXT: Add in-text attribution when you copy or closely paraphrase a source's words.

here for further discussion. This is phrased as an imperative and implies that when an article is copied from a copyright-expired text, one has to include the author's name as an in-text attribution. This is clearly not what is wanted. -- PBS (talk) 19:12, 17 November 2010 (UTC)


 * Philip, we have a plagiarism problem onwiki, which several people are working to resolve. We can add words to the text you don't like to cover your concerns, but this is about source material, not copying from PD texts. SlimVirgin  talk| contribs 19:24, 17 November 2010 (UTC)


 * You are adding it to the lead INTEXT. The lead is meant to discuss the context of the whole piece. This includes copying from the public domain. Text from the public domain does not need in-text attribution and does not usually get it. It needs attribution in the citation or in the general references list. I think you should stop making changes until the issues that have been raised are agreed upon. -- PBS (talk) 20:05, 17 November 2010 (UTC)


 * This page must be as clear as possible. Editors have been getting into trouble over this, so we need complete clarity. This page is about plagiarism, not copyright (as it explains), so PD text concerns should be dealt with in detail there; then we can link to that section from here, though I believe there is already such a link. This kind of issue is widely agreed, Philip. See the large numbers of people currently working on this in various forms and on multiple pages. SlimVirgin  talk| contribs 20:24, 17 November 2010 (UTC)
 * SV, my concerns are not about copyright, but about how we attribute public domain text copied into Wikipedia. To me it seems that you are concentrating on how to avoid plagiarising text that is under copyright. There is nothing wrong with that in the correct context, but an important part of this guideline is how to deal with copy left and public domain text so that it is attributed and not plagiarised. Unless there is another reason, such as complying with NPOV and OR, we do not usually use in-line text attribution for public domain text. For that reason I think that the lead must reflect all of the guideline and not just that section which concentrates on how to avoid plagiarising text that is under copyright. I think that your changes to the lead are not giving due weight to the issues in the sections on copy left and public domain text. -- PBS (talk) 22:17, 17 November 2010 (UTC)
 * [edit clash] SV, you reverted my revert of your edit with the comment "restoring as this is the norm; we can discuss exceptions, but even non-copyrighted requires in-text acknowledgment of some kind", but this is not true. You have defined INTEXT "is the attribution inside a sentence of material to its source, in addition to an inline citation..." For PD works the norm is to place the attribution either in the citation or, if it involves more than a few lines, in the References section. See the tens of thousands of links to the template 1911, links to the template DNB and links to the template Catholic. The first link to an article containing both DNB and 1911 is Jakob Abbadie; it could do with inline citations, but as far as plagiarism is concerned it is covered in the general references section. The first link to an article containing 1911 and the Catholic templates is Ambrose, which carries more citations. See Murrough O'Brien, 1st Earl of Inchiquin for a fully footnoted example of text from the DNB. In none of these cases is in-text attribution used unless it is to cover POV, as in Murrough O'Brien, 1st Earl of Inchiquin: "Bagwell wrote that Clarendon not unfairly summed up the case ..." -- PBS (talk) 22:53, 17 November 2010 (UTC)


 * PD texts copied into WP do say on the page (not just in a footnote) that the text has been taken from X. There is a section about that on the page, so we could link to it from the lead. SlimVirgin  talk| contribs 22:29, 17 November 2010 (UTC)
 * It depends on the amount of text copied as to whether the wording in a template like citation-attribution is sufficient inside a ref tag (footnote) or if the wording like that in the template source-attribution needs to be placed in the "References section". BTW the line Attribution is not a section header it is just a way of highlighting the attributed text. Making it a section header would complicate other guidelines like WP:CITE and WP:LAYOUT for little gain. Also note that the recent revamping of this guideline has changed the tone of this guideline from that used in the section that suggests these things are done to a more forceful tone. I am not personally against that but, plagiarism has been a minority interest until recently and a more facilitating tone was taken with how to present this information. -- PBS (talk) 23:05, 17 November 2010 (UTC)

This edit is the equivalent of a hack in computer science. It fixes a logical problem, but it is only a workaround/band-aid on the problem of the structure, because we are then faced with "well, what about copy left text"? It leads to the question "As copy left is not now explicitly mentioned in the lead, do we have to use this construction for copy left text?" The question is not answered, so it has made the lead confusing.

Personally I think it better to remove all the bullet points from the lead and go back to a lead that describes the guideline instead of trying to put details in there that out of context are confusing. But even if that is not done it would be better to remove the line "in text" from the lead and let it appear in the body of the guideline as the exceptions to it complicate the lead unnecessarily. -- PBS (talk) 23:35, 17 November 2010 (UTC)


 * INTEXT is vital. It was lack of INTEXT that caused many of the recent troubles. We are not going to water it down because of concerns about PD texts which should arguably never have been added to WP in the first place. But that is a separate issue, and is dealt with separately on this page in its own section. By all means expand your qualifications of INTEXT there. But for the vast majority of WP articles written by Wikipedians, INTEXT is very important because it keeps them safe, used within reason. SlimVirgin  talk| contribs 12:45, 18 November 2010 (UTC)


 * Overwhelmingly, the main audience for this guideline is people who are confused over big issues, not fine points. And there are a lot of fine points.  The lead should hook these editors, then help them get to the section they need.  The last thing it should do is leave them intimidated by the subject of plagiarism and Wikipedia's guidelines on it.  Think triage: commonest and most clueless cases first, physicians who can heal themselves last, with only a few categories sketched in between.


 * For example, I've recently dealt with an editor who copied a PD source almost verbatim into an article section, reworded it slightly in a way that changed the meaning of the first sentence, added citations to the PD source on each paragraph, but then added a citation to yet another source (also PD, and to much the same effect, but not identical) on one of the sentences, and added nothing in the references section to indicate his judgement on the intellectual property status of what was copied in. All in the same edit, which was summarized merely as "expanded article" (it did, after all, include some other contributions besides the one from the PD source.)  The overall impression left was of a polished piece of writing, but one suspiciously over-reliant on one of the two sources it cited.


 * Now, this is the mark of someone under the very common impression that public domain sources can be treated as if they were one's own words. "It's in the public domain, so I can do what I want with it."  I've heard it a million times.


 * You're not helping this person very much if the lead for WP:PLAG summarizes all the coverage of all the fine points. That'll just make readers pointed to the guideline blanch and think "Wow, plagiarism is a complex technical subject -- who knew?"  Actually, at its core, it's not.  It's 99% covered by the nutshell summary (which, by the way, should be right at the top, not tucked under some other public service announcement).  All else is mostly just description of citation and attribution conventions that mainly reflect the application of common sense and scholarly practice to the common-sense nutshell summary.  And the reader should be constantly reminded of this fact.


 * What editors like the one I mention above need to see in the lead is something saying,
 * .... the basic rule is that, if you want to include anything that evidently required creative effort, you should clearly indicate the inclusion and clearly say who produced the source material. Where the difficulty enters in is that there are a number of special cases when it comes to how you should do those two essential things.  Find your own case, among those outlined below, and choose the section that's most relevant to your particular problem.


 * That makes the editor I mentioned above think, "Well, I thought 'public domain' meant I could do whatever I wanted, but since this text I'm copying from PD clearly required creative effort -- heck, that's why I chose it, it's great! -- and since I've been pointed here by some editor who thinks I botched that inclusion, let me check the clearly indicated section about how to do it right."


 * Any lead that doesn't do that isn't doing the job. It might conform to all kinds of formal policies about how to write formal guidelines in Wikipedia.  It might make editors coming from a software background squirm with delight that handling of copyleft sources is addressed.  It might make open-source licensing mavens glow with pride that they recognize all the acronyms for all the licenses mentioned.  But it's still failing 95% of the intended audience, and therefore failing outright for all practical purposes, no matter how much it might please some of us.  If a bulleted list helps the Great Unwashed navigate better, use it.  If some other organization works better to that effect, well, use that.  But don't forget that the Great Unwashed account for the overwhelming majority of cases of plagiarism, and that they require the overwhelming bulk of the effort going into guidance on this issue. Yakushima (talk) 13:29, 18 November 2010 (UTC)


 * Y, sorry, this is too much to read. Can you sum up the main points? SlimVirgin  talk| contribs 14:45, 18 November 2010 (UTC)


 * Okay, I see now, you're saying the lead must be succinct. I agree, which is why I like INCITE, INTEXT, and INTEGRITY. Easy to remember and follow. Which parts do you feel are not succinct enough? SlimVirgin  talk| contribs 14:47, 18 November 2010 (UTC)
 * "Succinct" is not the same as "clear and not intimidating to newbies, and helpful in navigating the immediately relevant part of the guideline." Don't start right in with jargon.  "INCITE" is a codeword, and my first free association is with the verb "to incite".  So you're already getting cryptic and technical.  INTEXT isn't even an English word.


 * As the guidelines are now written, I still believe you're opening the door to what might be called "indirect-quote farming", which in turn exposes practitioners of it to charges of plagiarism -- i.e., the worst of both worlds. Here's something that I think passes INCITE/INTEXT/INTEGRITY tests with flying colors, without exceeding typical Fair Use length for educational purposes:


 * During the commodity price increases of 2007-8, Paul Krugman wrote that Europe's dependence on Russian energy, especially natural gas, looked very dangerous — more dangerous, arguably, than its dependence on Middle Eastern oil. After all, he asserted, Russia had already used gas as a weapon: in 2006, it cut off supplies to Ukraine amid a dispute over prices.  He warned of a global economic disruption he believed would follow if China — which was then about to surpass the United States as the world's largest manufacturing nation — were to forcibly assert its claim to Taiwan.


 * It can avoid charges of WP:NPOV violation because the POV is clearly attributed to Paul Krugman, i.e., it's an objective description of his POV. But how is the reader to know that the wording is 90%+ verbatim from Paul Krugman himself?  If the reader doesn't reach that conclusion almost automatically, the passage is in plagiarism territory.  But those who craft such edits could respond to such criticism by citing WP:PLAG where it says that INTEXT "always" avoids charges of plagiarism.  Really?


 * There's no cookbook formula for avoiding plagiarism. You actually have to think: "Is there creative effort in what I'm including from another source, and if so, could anyone reasonably believe that the creative ingredient was the result of my effort (or that I'm claiming it as mine, implicitly)?"


 * If some newbie simply copied that paragraph from that Krugman op-ed verbatim, and prefaced it with, "i found this in a krugman column: ", who would call it plagiarism? Sure, it would be in violation of multiple Wikipedia style strictures.  But it would be more within the spirit of avoiding plagiarism than the  paragraph I've wordsmithed into compliance with INCITE, INTEXT and INTEGRITY, above.  For one thing, almost any reader would see that the source had been handled incorrectly.  For another, a more experienced editor could immediately see what needed to be done.  Not so with my wordsmithed-compliance version.  With that, you've got a ticking time-bomb: someday, somebody's going to look closely at the source cited and notice that the wording is almost identical to that of one of the best writers in the business of propounding certain economic and political viewpoints to the general public. Yakushima (talk) 06:19, 19 November 2010 (UTC)


 * I added that using a few of a source's words would not be problematic. Do you want to strengthen that? SlimVirgin  talk| contribs 00:49, 20 November 2010 (UTC)

I am reverting the lead (and just the lead) to the version of 15:58, 17 November 2010 (Yakushima) because in your own words SlimVirgin "consensus on talk was against you" -- PBS (talk) 23:33, 19 November 2010 (UTC)


 * You reverted me here because you didn't get your own way at WP:W2W. This is the behavior I've come to expect from you after watching years of tendentious article and policy editing from you, and arguing that black is white on talk. It is time to deal with it. Please see your talk page. SlimVirgin  talk| contribs 00:47, 20 November 2010 (UTC)


 * No, I reverted you here because you have not had anyone else support your recent changes to the lead. I have been waiting for several days to see if anyone did agree with you. But after you reverted a change I made, after waiting several days for people to reply to my requests for evidence at WP:W2W, with the comment "consensus on talk was against you", I presumed that you would be happy to apply the same criteria to changes that you have made to a page. Note I only changed the lead back to what it was and left the other edits in place, as no one had raised an objection to them here. -- PBS (talk) 02:02, 20 November 2010 (UTC)

The lead is not the place to go into detail. That should be left to the individual sections. For example, the lead says "Add in-text attribution when you copy or closely paraphrase a source's words", but we do not always attribute all quotations in the text. It depends on the context; see for example List of events named massacres (but this is a detailed observation for the section in which it resides, "Add in-text"). In the lead it causes problems because you have added PD as an exception but not "copy left" text, which implies copy left texts from, say, Citizendium need to have in-text attribution. It seems to me better to put the lead back to how it was and give a brief overview of each section rather than concentrating on just one aspect of the problem. -- PBS (talk) 02:33, 20 November 2010 (UTC)
 * Could we not refer to "non-free" and "free or public domain" sources in that sentence? -- JN 466  10:59, 20 November 2010 (UTC)


 * What Yakushima says above makes a lot of sense -- both as regards making the lead easy to read for novices, and regarding the example of long attributed paraphrases. Could I suggest that we give ourselves a few days' time to work something out? We can draft things here on the talk page. -- JN 466  10:59, 20 November 2010 (UTC)
 * Fine with me. I don't agree with PBS's reversion, and in fact I would have preferred he just tag the lead as not reflecting consensus.  Of course, I have problems with SV's treatment of the lead, but it's getting on the right track in proposing solutions.  Maybe the discipline of reaching consensus before making a serious edit to this guideline has broken down somewhat, but mea culpa: I've been immediately reverted by SV, and I had it coming, for not getting consensus first.
 * What I think should take priority for treatment in the lead, however, is problems. Most people pointed to WP:PLAG have been pointed to it over a particular problem.  And, much as I hate to say it, you've really got to dumb things down a little, both in describing problems and in recommending solutions.  Under AGF, what else can you do but make so-called "inadvertent plagiarism" the default assumption?  But this means the audience for this guideline is basically the clueless.  I've had to yank out pretty obvious clueless edits just in the last day or so, for wholesale copying from copyrighted sources left utterly unattributed, and I've run across this kind of thing numerous times before, so I don't have a problem with revising WP:PLAG from the assumption of cluelessness.  But dumbing it down has another virtue: where there's true plagiarism, someone who's been pointed to the guideline only once can't really retreat to the position that the guideline was too hard to understand, if they are caught plagiarizing again.
 * What SV proposes is great, if you're an experienced editor. But if you're an experienced editor, you don't really need it anyway. It's time to get this guideline down to the level where a reasonably bright 15-year-old can read it and understand it so easily that virtually nobody can claim that it's vague, dense, jargon-ridden, or otherwise unapproachable.  After all, most "inadvertent plagiarists" pretty much fit the profile of "reasonably bright 15-year-old", from what I've seen so far on WP.  Regardless of how old they might really be. Yakushima (talk) 11:57, 20 November 2010 (UTC)
 * Agreed, good points. -- JN 466  03:06, 22 November 2010 (UTC)
 * I don't disagree. However, doing that could, from some perspectives, lead to reducing the guideline to a few lines or scrapping it altogether (as various people here suggested anyhow).--Kmhkmh (talk) 15:38, 22 November 2010 (UTC)

Query
Colleagues, I'm not up to speed on the status of large slabs of attributed material within quote-marks. Is there a limit to size? Crayon_(film) Tony   (talk)  14:02, 24 November 2010 (UTC)
 * Absolutely. :) This is not a plagiarism issue, though; it's a copyright concern. (Especially since it was originally a straightforward, unattributed paste, although at that point it was a plagiarism issue, too) I've removed it in accordance with WP:NFC. Size limits are vague, but unavoidably so, since "amount and substantiality" depend on so many factors, including the proportion of the material to the original and whether it constitutes the "heart" of the work. Its size relative to its new usage is also a factor. You can certainly defend a single stanza from a multiple-stanza poem, for instance, under "fair use" in an in-depth critical analysis of said poem. It's a different matter if the single stanza is alone on a page. We're stuck with the whole "except brief excerpts" at the bottom of every edit screen and WP:NFC's "Extensive quotation of copyrighted text is prohibited." --Moonriddengirl (talk) 14:24, 24 November 2010 (UTC)
 * WP:NFC could do with a few more examples, especially those that might be reasonably close to the boundary of acceptability, either side. Tony   (talk)  14:43, 24 November 2010 (UTC)
 * What does one do when an established editor says "it's not a crime to copy a sentence" (i.e., without quotation marks, in that case)? User:GFHandel has been trying to reason with this editor, who earlier today pasted a slab of text from the internet into an article, word for word, without quotes. Please see User_talk:Taksen. Tony   (talk)  09:17, 14 January 2011 (UTC)
 * I've removed the text and left Taksen a note. EyeSerene talk 10:51, 14 January 2011 (UTC)
 * For most of last year this guideline advised any standard copyrighted material copied into a Wikipedia article should be limited and put in quotes. SlimVirgin was the one who pushed most strongly to allow in-text attribution without quotation marks. See Wikipedia talk:Plagiarism/Archive 6 and other discussions elsewhere. I would not have made the change (see my comment in the linked section). -- PBS (talk) 11:04, 14 January 2011 (UTC)
 * I don't believe quotes are the only way to distinguish copied material, but your comment "how is a reader to know if what I have just written is a summary of what you said or a direct quote unless quotes are marked as such?" is, for me, the key point. Using blockquote markup or some other formatting technique seems fine too, but the MoS limits these uses and I'd certainly advocate using quotes as best practice in other circumstances. The edit in question here was totally unacceptable without either quotes or a complete rewrite; without quotes it just looked like POV. EyeSerene talk 11:41, 14 January 2011 (UTC) (edited 11:43, 14 January 2011 (UTC))

That much hasn't changed. With respect to copyrighted content, this guideline says, "Limited amounts of text can be quoted if they are clearly indicated in the article with the use of quotation marks, or some other acceptable method (such as block quotations). All quotations must be followed by an inline citation. They cannot be closely paraphrased for copyright concerns, but must be substantially rewritten in original language." That is entirely consistent with the policy at WP:NFC. But even if the content were public domain, there's still no allowance to copy without acknowledging copying. Plagiarism is the lesser concern; copying copyrighted content without acknowledging copying is a copyright problem. (And even if the text were PD, this was not a question of indirect speech; this was still plagiarism as defined by Wikipedia.) --Moonriddengirl (talk) 11:45, 14 January 2011 (UTC)

another query
An example from a very recently created article, with a query as to at what point "rewording" becomes plagiarism:


 * The first American socialists were German Marxist immigrants who arrived following the 1848 revolutions. (article)


 * The first Marxian socialists in the United States were German immigrants who came over after the ill-fated German revolution of 1848. (Draper)


 * A larger wave of German immigrants followed in the 1870s and 1880s, which included social democratic followers of Ferdinand Lasalle.. (article)


 * The next and larger wave of German immigrants in the seventies and eighties however, owed their socialism less to the exiled Marx than to the romantic founder of German social democracy, Ferdinand Lassalle. (Draper)


 * The Lasalleans formed the Social Democratic Party of North America in 1874 and both Marxists and Lasalleans formed the Working Men's Party of the United States in 1876. (article)


 * The Lassalleans formed a Social Democratic Party of North America in 1874. The Working Men's Party of the United States came in 1876 under joint Lassallean-Marxist auspices. (Draper)


 * The organizers of the IWW disageed on the road to socialism, where it would be achieved through political or industrial action. (article)


 * Though most of the organizers of the I.W.W., including Haywood, were avowed socialists, they did not agree on the road to socialism. The fundamental dispute hinged on the old problem of political versus economic action. (Draper)


 * It was successful organizing unskilled migratory workers in the lumber, agriculture, and construction trades in the Western states and immigrant textile workers in the Eastern states and accepted violence as part of industrial action. (article)


 * In the West, where the I.W.W. started, it was mostly sucessful in the lumber, agriculture, and construction trades, which used unskilled migratory workers who shifted from job to job and industry to industry. (Draper)

Covering only the first section as it seemed to show a pattern. Is it, however, plagiarism? Collect (talk) 02:22, 6 February 2011 (UTC)
 * Hmm. "on the road to socialism" seems like a striking phrase. If this is the extent of it, I don't know if the pattern is all that bad, but it would be worth speaking to the contributor about how to attribute close following or paraphrase otherwise. In addition to Close paraphrasing, I'm inclined to point people to Wikipedia Signpost/2009-04-13/Dispatches, specifically material beginning under "Avoiding plagiarism". --Moonriddengirl (talk) 19:57, 16 February 2011 (UTC)

Plagiarism at Johan de la Faille
I've discovered that large chunks of text in the Johan de la Faille article have been copied (without change) from the web sources listed at the end of the article. I don't know which template to place on the article to bring this to the attention of those involved. Could someone please help me there? Thanks. GFHandel . 18:59, 16 February 2011 (UTC)
 * This is a blatant violation of our copyright policy. I have accordingly blanked the content with copyvio and provided the requisite notice to the creator. --Moonriddengirl (talk) 19:10, 16 February 2011 (UTC)


 * Thanks for acting so promptly. I can see that you have left a little of the article (and that's fair enough), however when I now look at it, all I can see are the picture, the references, and the "Possible copyright infringement" notice. I was also expecting to see the little bit of the article that should be remaining (e.g. the dates of birth and death lede). I'm curious why the bits that should display, don't? GFHandel . 19:19, 16 February 2011 (UTC)
 * When copyright problems are extensive and foundational, it is common practice to blank the article to allow the creator an opportunity to rewrite the article or to verify permission for the content. If I could be sure that only the content I've identified was copied, I could remove it, but I cannot verify that any of the creative text in the article is free of copyright concerns; as I noted at the listing, I believe the administrator who evaluates the article on closure (whether that be me or, hopefully, somebody else pitches in :)) will need to carefully review that content as well.


 * By the way, it is not only the creator who may rewrite articles to address these concerns. Frequently, when content is a copyvio at its foundation, the entire article will be deleted. It is greatly appreciated when users make use of the "temporary" space linked from the articles face to propose a clean rewrite which can instead be used to replace it. --Moonriddengirl (talk) 19:23, 16 February 2011 (UTC)


 * Fair enough, but I still can't figure out why some of the remaining text is not displayed. After your notice template starts the following paragraph:
 * Johan de la Faille (1626 or 26 December 1628 - 14 October 1713...
 * Why are the references displaying, but not text like the above? (This is only a technical question now.)
 * GFHandel . 19:39, 16 February 2011 (UTC)
 * Because the template automatically blanks the contents of the page. :) I placed  after the creative text to stop that blanking, so that the references and the categories are still viewable. --Moonriddengirl (talk) 19:41, 16 February 2011 (UTC)
 * Mystery solved. Thanks. GFHandel . 19:50, 16 February 2011 (UTC)

Discussion on another talk page
It was suggested that editors here might be interested in this discussion.--Andrew Lancaster (talk) 21:24, 18 March 2011 (UTC)


 * Actually, it was suggested that the issue be discussed here instead, because it seems to be about plagiarizing ideas. SlimVirgin  TALK |  CONTRIBS 21:42, 18 March 2011 (UTC)

Template
Is there a template for a tag to say "There may be concerns of plagiarism with this article"? ta — Ched : ?  07:45, 23 June 2011 (UTC)


 * Or maybe a flag that one can put in an article? -- kosboot (talk) 17:55, 27 July 2011 (UTC)


 * close paraphrase, non-free, cv-unsure, copyvio, etc. — HELL KNOWZ  ▎TALK 10:38, 30 July 2011 (UTC)

RfC for the explicit auditing of DYKs for compliance with plagiarism policy
An RfC has been launched to measure community support for requiring the explicit checking of DYK nominations for compliance with basic WP policies—including plagiarism policy—and to improve the management of the nominations page through the introduction of a time-limit after which a nomination that does not meet requirements is archived. Tony  (talk)  06:11, 23 July 2011 (UTC)

If you copy from one author, it’s plagiarism. If you copy from two, it’s research
See here. Count Iblis (talk) 00:40, 11 October 2011 (UTC)
 * Plagiarize,
 * Let no one else's work evade your eyes,
 * Remember why the good Lord made your eyes,
 * So don't shade your eyes,
 * But plagiarize, plagiarize, plagiarize -
 * Only be sure always to call it please 'research'.

see: http://www.youtube.com/watch?v=RNC-aj76zI4 --Kmhkmh (talk) 04:23, 11 October 2011 (UTC)

Placement of "Attribution" section
Hi. On the face of it, I do not agree with this change. I don't think it fits as well in the section where it was moved, where it applies to some but not all of the cases, as in its own section. If you're concerned that people will miss it, why not add a reference with an internal link in the sections to which it applies? --Moonriddengirl (talk) 00:52, 20 October 2011 (UTC)
 * My primary motive for moving it is that time and again when discussing plagiarism with someone who has come across text that is a copy of a PD source (or have created a page without "proper" attribution), I have to put in two links to the page to explain what to do. I think that indicates that the two sections need to be closer to each other. -- PBS (talk) 01:02, 20 October 2011 (UTC)
 * Maybe it should be moved to its own section just below the section on "Attributing text copied from other sources"? --Moonriddengirl (talk) 00:53, 20 October 2011 (UTC)


 * Wow, that was quick! I had come to the same conclusion about its indentation. So my solution (which is now half done) was to leave it where it was but indent the others around it by placing a new section called "other licences" at the same level as copyright but indenting the three non copyright sections by one. The attribution will apply only to copyleft and pd sources. -- PBS (talk) 00:58, 20 October 2011 (UTC)
 * No, actually, I've undone that for now, too. :) We need to talk about it. "Public domain" is not a license. Attribution of the sort described does not apply to content copied within Wikipedia, either, so its placement in proximity to the "Where to place attribution" would be misleading. Content copied from within Wikipedia or other Wikis is handled in accordance with Copying within Wikipedia. (ETA: I can see the value of clumping them somehow, but I think we should probably consider the best way to do that first.) --Moonriddengirl (talk) 01:01, 20 October 2011 (UTC)
 * putting to one side the proposed section title (but not the use of a new section). That is why I placed the attribution section above the wikipedia section. -- PBS (talk) 01:06, 20 October 2011 (UTC)
 * I'm not entirely sure that the attribution section belongs in the same overall section with the other material. As it does not apply to one, but two, and as there are three sections, I think it may be more confusing than simply putting an internal link to #Where to place attribution. --Moonriddengirl (talk) 01:11, 20 October 2011 (UTC)

To summarise the changes I am proposing (as they no longer exist in the guideline): I think it is desirable to move the section "Where to place attribution" closer to the sections that mention it ("Sources under copyleft" and "Public-domain sources"). At the moment it is 8 sections away from those two sections. My proposal is to move it up immediately after the two sections that mention it and before the Copying within Wikipedia section. Here is the old layout:
 * 5 Attributing text copied from other sources
 * 5.1 Close paraphrasing
 * 5.2 Sources under copyright
 * 5.3 Sources under copyleft
 * 5.4 Public-domain sources
 * 5.5 Copying within Wikipedia

Here is my proposed layout
 * 5 Attributing text copied from other sources
 * 5.1 Close paraphrasing
 * 5.2 Sources under copyright
 * 5.3 name to be decided upon
 * 5.3.1 Sources under copyleft
 * 5.3.2 Public-domain sources
 * 5.3.3 Where to place attribution
 * 5.3.4 Copying within Wikipedia

If as you say Moonriddengirl "As it does not apply to one, but two, and as there are three sections" fair point about it applying to two sections (hence my second edit that if you had not reverted would have looked like the proposal above), but as to the second one if it comes before the "Copying within Wikipedia" section, it will be more explicit than the current situation where the section "Where to place attribution" is tagged on the end of the guideline. -- PBS (talk) 06:14, 20 October 2011 (UTC)
 * Okay, if that's your proposal, I disagree with it. It should be just as effective to add a "see below" statement to the two sections to which it does apply and less confusing and unnecessarily complex, similarly to the use at FAQ/Copyright, which points at the end to the Derivative works section further down the page. This is easy to do and seems to me better organized according to the overall principles structuring this page. --Moonriddengirl (talk) 11:18, 20 October 2011 (UTC)
 * From the standpoint of someone who refers to this page from time to time, but who does not focus a great deal of time on it – in other words, the sort of person who needs to be able to use this document – I'm inclined to support Moonriddengirl's approach. While I can see what PBS is trying to accomplish, I'm not sure that relying on a more complex, nested layout will be as effective in conveying the point.  The fact that there isn't an immediately obvious name for the new second-level heading should set off a mental alarm that this organizational scheme may be less intuitive than might be hoped.  TenOfAllTrades(talk) 14:51, 20 October 2011 (UTC)

Policy on copy/pasting a whole public domain article with attribution?
There is a minor dispute at Articles for deletion/Resilient control systems concerning the use of wholesale copy/paste content from a government whitepaper. I don't see that this is technically a violation as long as the content is attributed but I may be missing the policy statement that this violates. Joja lozzo  23:10, 20 November 2011 (UTC)
 * It's not. Many of our articles start with more or less a PD copy-paste. Rich Farmbrough, 20:42, 28 December 2011 (UTC).

Wrong
Public-domain attribution notices should not be removed from an article or simply replaced with inline citations unless it is verified that all phrasing and information from the public-domain source has been excised.
 * This is a ridiculous burden. If all information is removed, then no citation is needed at all. Even all phrasing is going too far, there is only one natural way to express many facts. ("Bloggs died 15 November 2012.") Rich Farmbrough, 11:00, 14 December 2011 (UTC).


 * Indeed, this makes no sense at all. It seems to be a bureaucratic burden with no connection to the primary project goals (in particular verification) whatsoever, and possibly driven by plagiarism-phobia.--Kmhkmh (talk) 11:09, 14 December 2011 (UTC)
 * (ec) I agree. When we are using the wording from public domain content we are essentially plagiarising, but the attribution notice makes it OK. I hope what is meant here is that the attribution may only be removed when it's no longer needed because there is no plagiarism left. I suspect that whoever wrote this may have been influenced by questions of copyright: a work that started with a copyrighted work that was rewritten gradually until it was no longer immediately recognisable can still be a derived work, I believe. But I think the public-domain attribution notices should only be about the latest version of the text, as they are about plagiarism not copyright. Hans Adler 11:15, 14 December 2011 (UTC)

Looking at who placed it, I suspect what he meant is that attribution should not be removed while content is here and citation should not be removed while information is. It used to say, "But as creative work consists of more than just the words used in the copied text, also applying to the structure of the article and the way the topics are covered, public domain attribution notices should not be removed from an article or simply replaced with inline citations without verifying that this will not leave uncredited phrasing, language, or presentation from a public domain source." (Rich, I think your "only one natural way" is covered by Plagiarism.) Maybe the passage should be retooled to something more in line with the way it was previously written? --Moonriddengirl (talk) 11:56, 14 December 2011 (UTC)
 * OK I had a hack at it. While PD attribution is important, wording like this is being used as a stick by some editors, which is not good. Rich Farmbrough, 20:41, 28 December 2011 (UTC).

new Nutshell
I think that the emphasis in the nutshell and the lede is wrong. I propose to change the "nutshell" to this:
 * When copyright is not an issue, you may copy, verbatim or otherwise, but you must attribute the source.

I feel that the entire tenor of our guideline is wrong. We should not try to prohibit or inhibit the use of PD material. The academic prohibition against plagiarism (don't claim credit for yourself) is almost completely distinct from our prohibition (credit the original source). Since we do not in general identify our editors, the academic distinction is irrelevant: there is no "you" here. Only the amorphous "Wikipedia reputation" is at risk of sanction for plagiarism, and I frankly do not think that this has ever happened. -Arch dude (talk) 00:21, 3 January 2012 (UTC)


 * I love the positive tone of the proposed new nutshell. Let's adopt it. --Hroðulf (or Hrothulf) (Talk) 15:47, 3 January 2012 (UTC)


 * Can you point to where the guideline tries to "prohibit or inhibit" PD re-use? It was written quite carefully to not do that. Further, there is a "yourself", in your case the user ID Arch Dude, which is connected to a single real person. If someone were to copy your creative contributions here and use them in violation of the CC-BY-SA license, then you the living person could sue them for copyright violation. And your distinction is unclear anyway, the risk of plagiarism in the academic world is also loss of reputation. Also, several editors have quit permanently over the years due to accusations and evidence of plagiarism. Franamax (talk) 17:43, 3 January 2012 (UTC)
 * Well, note he is not talking about reputation (of individuals) in the academic world, but about the reputation of WP, and there he has a point. The reputation that should matter for WP is being known for correct quality content, which is verifiable (cited) and legal. Plagiarism that is legal doesn't matter in that description at all and is strictly speaking a problem for academia only. If we lost productive authors of quality content over issues of legal plagiarism, then I would rather see that as a problem of a misunderstood/misleading guideline, which may lead people to focus on the wrong things.--Kmhkmh (talk) 15:37, 31 January 2012 (UTC)
 * No, I read it as the "Wikipedia reputation" attached to an account name (think section headings at AN/I like "Franamax long history of plagiarism") and that is what I addressed. The cases I'm thinking of are where it came to light that the editor was actually a "productive copier of quality content" (usually from early in their editing career). We developed this guideline quite carefully to define what plagiarism means here on-wiki. For instance, copying PD/free sources without attribution is plagiarism, but the remedy is just to add the attribution, i.e. the editor just made a mistake, and so long as they don't keep making the same mistake, no big deal. And at all times, move issues over to the copyvio stream where possible, as it's a much more neutral process. Editors can deal with having made copyvios, even a lot of them, but react less well when told they are plagiarists. Incidentally, there is no such thing as "legal" plagiarism, as the term is undefined in statute law. It could arise in civil law, as (say) a reason for dismissal from a position, or pertinently here, say, revoking a WMF grant or fellowship awarded based on quality authorship. There is a "civil law" dimension here on-wiki too, as if an editor continues to add unattributed PD/free text after being notified of the requirements, it would indeed threaten the reputation of Wikipedia as a whole. Franamax (talk) 22:03, 31 January 2012 (UTC)
 * Yes, but the point is, the current guideline aside for a moment, that copying PD quality content (without explicit attribution) is not a problem as long as it is referenced (=verifiable) at least. That is exactly the issue with this guideline, or rather why it keeps being a subject of debate. The slightly stricter notion of this guideline does not really follow from the project goal, nor is it really shared by the community as a whole (or even at large). There is no (good) reason to mimic the requirements in academia (partially) in WP, because we operate under a different context with different goals. Polemically speaking, our content is everything and its authorship doesn't matter (legality assumed). What matters primarily is the quality of the article and not who has contributed what. In that sense, in this case the "loss of reputation by constant rule breaking" is caused by the fact that we formulated an unnecessary and problematic rule to begin with.--Kmhkmh (talk) 13:26, 1 February 2012 (UTC)
 * Almost. Facts must be referenced to a cited source. Content must be attributed to the source. But "attribution" should not be onerous (a footnote is perfectly acceptable) and copying should not be discouraged. -Arch dude (talk) 16:12, 1 February 2012 (UTC)
 * : well yes, though from the WP perspective citing references to confirm the facts would be good enough (i.e. good enough for assuring quality/correctness of the content).--Kmhkmh (talk) 20:12, 1 February 2012 (UTC)
 * Yes and no. The guideline doesn't at all discourage copying PD/free material, it just sets out very easy standards for attribution. You have to understand though, that when you contribute text to Wikipedia, you are making an implicit claim of ownership, i.e. that it is your creative contribution which you license to others under the terms of the CC-BY-SA license. You can't actually do that with public domain work, as it's not yours to license. A standard footnote isn't sufficient, as it only indicates the source drawn from, not the licensing status of that work. That is why we have the attribution templates and the citation-attribution and source-attribution wrapper templates. Franamax (talk) 22:43, 1 February 2012 (UTC)
 * Well I have to agree regarding the licensing model, but imho the CC-BY-SA was a mistake too (causing various problems, even more so the original gnu), unfortunately contrary to the guideline that is a problem that cannot be fixed, hence we have to live with the guideline due to the licensing aspect I guess. I do agree with arch regarding the footnote though, i.e. that it is good enough to keep the attribution there.--Kmhkmh (talk) 01:46, 2 February 2012 (UTC)

Mention of in-text attribution in the lead and general definition section
Franamax, you wrote above "Can you point to where the guideline tries to 'prohibit or inhibit' PD re-use?" I think that this edit and this one unbalanced the guideline because they introduce in-text attribution into the lead and into a general section. I know of at least one instance where the current wording has been misunderstood by an experienced editor: ... [PLAGIARISM is] quite clear: "Here the editor is not trying to pass the work off as his own, but it is still regarded as plagiarism, because the source's words were used without in-text attribution. The more of the source's words that were copied, and the more distinctive the phrasing, the more serious the violation." Almost the entire final paragraph in Anne Hungerford, which represents about half of the article, has been copied without in-text attribution.-- PBS (talk) 05:50, 2 February 2012 (UTC)


 * Well yes, I wasn't really paying attention just at that time, and I knew what you meant before ever clicking on your links. Can you reformulate that into a separate thread? We could address that separately and try to get a consensus. Franamax (talk) 06:16, 2 February 2012 (UTC)


 * Done I've put a section heading before the comment and un-indented it. -- PBS (talk) 08:06, 2 February 2012 (UTC)

Clarification of what "with very few changes" means
I need to ask about the exact, technical meaning of one of the guidelines for determining if something is plagiarism. The section "Defining plagiarism: Forms of plagiarism on Wikipedia" states that "inserting a text... with very few changes—then citing the source in an inline citation after the passage that was copied, without naming the source in the text" is plagiarism. According to this definition, what exactly does it mean to paraphrase "with very few changes"? Are there any guidelines to determine exactly how many changes a section of text must have with its source before it ceases to be plagiarism?

For example, consider this paragraph from one of our articles:

"Whatever Mao's opinion of Zhou may have been, the rest of the nation was plunged into mourning. Foreign correspondents reported that Beijing, shortly after Zhou's death, looked like a ghost town. Zhou had willed his ashes to be scattered across the hills and rivers of his hometown, rather than stored in a ceremonial mausoleum. With Zhou gone, it became clear how the Chinese people had revered him, and how they had viewed him as a symbol of stability in an otherwise chaotic period of history."

The source from which this paragraph is clearly cited states:

"But whatever Mao's attitude may have been, the country as a whole seemed plunged into mourning. Peking was described by foreign correspondents as looking like a ghost town, and the news that his ashes be scattered across the rivers and hills of his beloved land, rather than buried in some mausoleum, was received with deep emotion. With Zhou gone it suddenly emerged how many people had revered him, and regarded him as a symbol of an ordered life and of a measure of decency in deeply troubled times."

In this example, the facts and general pattern of prose are very similar, but the paragraph has been paraphrased so that only a few short phrases are worded exactly the same. According to the definition of plagiarism given above, is the paragraph's similarity of facts and order sufficient for it to be considered an insertion of text "with very few changes"? Can you point me to an example of text which violates these rules, and one which is acceptably paraphrased (without having to directly mention the source from which the text's information is cited)? Ferox Seneca (talk) 06:57, 31 January 2012 (UTC)
 * General advice can be found in our essay on paraphrasing. Can you link the article in question, and the source you are consulting? It may be worth looking at in detail. [Added] Note that it is often better to work in terms of copyright violation concerns where there is a clear process to resolution, rather than as a plagiarism concern. Franamax (talk) 08:24, 31 January 2012 (UTC)
 * If it is a copyvio to copy from the source, then we should refer to the copyright guideline and not attempt to add it here. If it is not a copyvio, then our guideline should be simple: when in doubt, attribute. There is no reason to paraphrase to avoid plagiarism, and in fact it is more honest not to resort to paraphrase when the source's wording is suitable for our article. We should not encourage our editors to paraphrase to avoid plagiarism. Just add a footnote that is also an attribution rather than a mere citation. -Arch dude (talk) 14:09, 31 January 2012 (UTC)
 * Indeed--Kmhkmh (talk) 15:18, 31 January 2012 (UTC)
 * That is exactly what this guideline says, and exactly what I just said. Franamax (talk) 19:08, 31 January 2012 (UTC)


 * The above paragraph is from Zhou Enlai, in the "Memorial" section. The source is The Search for Modern China by Jonathan Spence, pp.610-611. The paragraphs include inline citations, and a full citation of the book is included in the "References" section. The section was modified last week to superfluously include the author's name in the offending paragraphs, due to those paragraphs being plagiarism via "close paraphrasing". Apparently, there is a dispute among authors about whether ordinary paraphrasing becomes "close paraphrasing" simply by sharing many words in common with its source, or if those words actually need to be written in the same order.


 * The change of prose to the article is only a minor concern, but it is a concern to me that I can't find a definition of "close paraphrasing" that makes it clear that a section must include numerous words in the exact same order as its source in order for that section to be an example of "close paraphrasing". Some guideline for the exact number or proportion of words that cannot be in the same order as its source would also be helpful. Without a more technical guideline, editors may freely interpret anything that is paraphrased from its source, including almost everything in this article and almost everything that I write, as plagiarism. Ferox Seneca (talk) 19:11, 3 February 2012 (UTC)
 * There is no simple formula that can be given for this. People have asked for similar things for what is a copyright violation, and the general consensus is that we cannot describe it simply. It depends on how notable/creative the language is. For example, a simple chronological list may have hundreds of similar words and be neither a copyright violation nor plagiarism, while a short epigram not attributed to a source is probably both a copyright violation and plagiarism. See the archive Wikipedia talk:Plagiarism/Archive 8.
 * I think SlimVirgin was the driving force behind the wording of in-text attribution and close paraphrasing a year last November. It all blew up because of the Rlevse drama (as it happens, this particular Wikidrama has just had a re-run; see this ANI). I think it is high time in-text attribution and close paraphrasing of copyrighted material were revisited, because I think that "close paraphrasing" is creating so many problems that we should go back to the rule that information from copyrighted sources should either be summarised or quoted (and do away with close paraphrasing). As noted here in the archives, that was how it was before SV started making changes here. Anyway, these archive sections may help flesh out some details:
 * Wikipedia_talk:Plagiarism/Archive 6
 * Wikipedia_talk:Plagiarism/Archive 7
 * See what you think, as there is a lot in those archives (particularly 7 and 8) which addresses this issue. -- PBS (talk) 05:35, 4 February 2012 (UTC)

Close paraphrasing examples
There is a discussion at Wikipedia talk:Close paraphrasing concerning the language of the examples being used there. These examples were generated to give contributors guidance in talking to other editors where problems are sufficient to warrant tagging of the article, either by blanking with copyvio or with close paraphrasing. The specifics at this point seem to revolve around whether or not the examples should be altered to default to use on a single passage of close paraphrasing (by removing the current text "This is an example, there are other passages that similarly follow quite closely") or altered to embrace paraphrasing that may not be as close (by eliminating the term "very" from "very closely paraphrased"). Additional input in this conversation would be welcome to help establish consensus, here. The section immediately above is, really, essential reading. Sorry for the complexity; a content dispute seems to have swelled it a bit. :) Please see the discussion there if you have an opinion and would like to take part. --Moonriddengirl (talk) 19:54, 11 March 2012 (UTC)

Large-scale constructs
User:Nikkimaria has defined "Large-scale constructs" as sentences or paragraphs in an article that are structurally the same as the source, even if the use of synonyms creates the superficial appearance of proper paraphrasing, and has given the following example from the article on Wyman-Gordon (problem version from the WP article, proposed variant to fix the problem by changing the sequence, followed by the source):
 * The company now had complete control of its supply chain and orders began to grow again as customers regained confidence in Wyman-Gordon's delivery capability.
 * Because the company now had complete control of its supply chain, customers were more certain of Wyman-Gordon's delivery capability, leading to increased orders.
 * The company now controlled all the essential technologies involved in producing titanium components, including sponge manufacturing, melting and alloying, and the forging process. As a result, customer orders began to increase due to customers' confidence that Wyman-Gordon could provide an uninterrupted supply of titanium forged parts.
This is to propose that the guideline on paraphrasing be expanded to cover similar large-scale constructs. Aymatth2 (talk) 23:10, 23 May 2012 (UTC)


 * I am not sure about the exact wording, purpose, tests and remedies, so this is a request for initial views and suggestions, that could be followed by more tangible recommendations. Maybe it is a daft idea, or maybe it is impossible to define the idea precisely enough to be useful, but there does seem to be a concept here and it seems to be worth discussion.  "Broad paraphrasing" could be a better title, unless that is what we already recommend. Aymatth2 (talk) 01:29, 24 May 2012 (UTC)
 * I'm not sure this is the correct venue or approach for this issue. The issue raised at the DYK nom was more a paraphrasing than a plagiarism problem, and of course when proposing policy clarifications it is best to remain more general. Perhaps consider WT:Close paraphrasing instead? Nikkimaria (talk) 02:42, 24 May 2012 (UTC)


 * I would agree that WP:Close paraphrasing is the better page to discuss proper paraphrasing practices. --Moonriddengirl (talk) 10:52, 24 May 2012 (UTC)

I disagree. The essay on WP:Close paraphrasing is just an essay. We need a more formal guideline. Editors must avoid close paraphrasing because it is a form of plagiarism, which in turn violates copyright laws due to removal of copyright management information. The question is whether large-scale constructs represent a similar problem. If so, we can develop a new essay on how to identify and manage these constructs, which by definition are not WP:Close paraphrasing. But first, how should Large-scale constructs be defined, and are they a real problem? Aymatth2 (talk) 15:58, 24 May 2012 (UTC)
 * You're starting from a few incorrect premises here. "Large-scale constructs", as I defined them, are a paraphrasing issue. Close paraphrasing may or may not be a copyright issue, depending on circumstances, but it would be wrong to suggest that either close paraphrasing or plagiarism (and no, these aren't necessarily the same thing) always violates copyright laws. As a clear example, if you create an article that is a completely unattributed copy of a public-domain source, that would be a plagiarism issue, but would not constitute a legal violation. Close paraphrasing, plagiarism and copyvio are related, but are not the same. If you see the need for a formal guideline on paraphrasing, I would suggest working to make WP:Close paraphrasing (or an edited version thereof) a guideline, rather than trying to incorporate all of that information into this page. Nikkimaria (talk) 20:11, 24 May 2012 (UTC)


 * Close paraphrasing and plagiarism are quite different. As I walked home from the pub I realized I should make that clarification.  Unattributed close paraphrasing is of course plagiarism and usually also a violation of copyright laws.  The essay on WP:Close paraphrasing mostly discusses how to avoid, detect and handle it.  Where the source is attributed, a direct quotation is preferable to the paraphrase in most but not all circumstances.  But this is not a discussion of close paraphrasing.  With large-scale constructs we are (I think) talking about copying that is so disguised as to be unrecognizable, but for which a WP policy may still be appropriate.  Aymatth2 (talk) 22:29, 24 May 2012 (UTC)
 * This is not the place to discuss paraphrasing. Paraphrasing, close or otherwise, does not cure plagiarism. The only remedy for plagiarism is to attribute the source. If the source is attributed, there is no plagiarism. If the source is not attributed, there is plagiarism. If the source is used at all, then it must at least be referenced. If content other than facts is taken from the source, it must be attributed. The only gray area is whether or not some degree of paraphrasing has reduced the extraction to mere facts so that a reference rather than an attribution suffices, and frankly I just do not care, because if the reference is provided, any reader is free to make their own determination. We should not encourage paraphrasing as a mitigation against plagiarism. In general, it is more honest and a better acknowledgement of the original work to use it (and attribute it) without modification. Paraphrasing is a necessary evil when the original work is copyrighted. This discussion therefore belongs under wp:c, not here. Whether or not large-scale structure is copyrightable is a separate issue, and one that is currently being actively litigated in the United States, but this has nothing to do with plagiarism. Wikipedia has an ethical obligation to avoid plagiarism. We have a legal obligation to avoid copyright violation. -Arch dude (talk) 01:58, 25 May 2012 (UTC)
 * Whether WP indeed has an ethical obligation to avoid plagiarism is debatable. It surely has a legal obligation regarding copyright, and it surely has a fundamental project requirement for sourcing. As far as plagiarism is concerned, the community (or rather its majority) has decided that WP should stay clear of plagiarism for various reasons, but I would not consider that an ethical obligation. It is primarily a procedural decision that is by no means a fundamental requirement for the WP projects, which is also reflected in the fact that it is a guideline and not a policy.--Kmhkmh (talk) 13:59, 25 May 2012 (UTC)


 * Come on guys, that is a different debate. I don't care if WP:Plagiarism is a guideline, not a policy.  It makes sense and is good enough for me.  And 99% of the time plagiarism is a copyright violation anyway.  But I am after some sort of definition of a "large-scale construct" because right now I would not recognize one if it sat down beside me on the bus.  Apparently it is a Bad Thing.  I think it is some sort of plagiarism, but it is not close paraphrasing.  But I don't know.  Aymatth2 (talk) 17:31, 25 May 2012 (UTC)


 * My understanding is that where anything that could be considered creative expression is reproduced, the article should give inline attribution, as in "Smith said that..." Preferably the words should be quoted verbatim, but there are occasional cases where close paraphrasing is needed, as with highly technical material where a more accessible version is needed.  "Robin" may be better than "Erithacus rubecula".  Either way, the source must be clearly attributed inline.  I also understood that if a copy is so disguised as to be unrecognizable, it is not a copy and therefore is neither plagiarism nor a violation of copyright laws.  Ditto with reproduction of mere facts.  In both these cases the source should be cited, but an inline attribution, as in "Smith said that..." is not appropriate and may in fact give offense to the author.


 * Nikkimaria has brought up the concept of the "large-scale structure" of one or two sentences or a paragraph. This is not the overall structure of the work, but is also not the same as the creative wording.  So paraphrasing may not fix it.  In the example given, the similarity seemed to be that the same information was provided in the same sequence as the source, even though the style and wording were quite different.  The proposed solution was to re-sequence the information.  The question is whether an article may plagiarize the large-scale structure of a sentence or paragraph even when the work is cited and there is no close paraphrasing concern.  If so, how do we define "large-scale structure" and what is the remedy?  Aymatth2 (talk) 13:21, 25 May 2012 (UTC)


 * For convenience, the example is given below, first the version that violated the large-scale construct rule, then the re-sequenced version, and last the source, with matching words highlighted to make the similarities clear:
 * The company now had complete control of its supply chain and orders began to grow again as customers regained confidence in Wyman-Gordon's delivery capability.
 * Because the company now had complete control of its supply chain, customers were more certain of Wyman-Gordon's delivery capability, leading to increased orders.
 * The company now controlled all the essential technologies involved in producing titanium components, including sponge manufacturing, melting and alloying, and the forging process. As a result, customer orders began to increase due to customers' confidence that Wyman-Gordon could provide an uninterrupted supply of titanium forged parts.
 * The first and second versions use much the same wording. The second version replaces "confident" with "certain", perhaps a debatable change, reintroduces the concept of causation ("Because" for "As a result"), and reintroduces "increase", so in some ways the second version is closer to the source than the first version.  But neither the first nor the second version could be seen as close paraphrasing.  Both have a very different style of writing, and both present the facts from a business process viewpoint where the source is much more engineering-oriented.  No creative turns of phrase are reproduced and all three versions are mainly dry facts.  The big difference is in sequence, which seems to be the essence of the "large-scale structure" concept.  Aymatth2 (talk) 14:35, 25 May 2012 (UTC)
 * There is no consensus that "attribution" requires "inline attribution." Several of us believe that other forms of attribution are perfectly acceptable, including, for instance, the use of a "ref"-style footnote that includes the phrase "some content taken from..." This credits the original author without cluttering the text, while still letting the interested reader know that content, not merely facts, is from another source. See the project page itself for acceptable forms of attribution. -Arch dude (talk) 19:47, 25 May 2012 (UTC)


 * I personally prefer inline attribution for quotes or close paraphrases, but can see a footnote type of attribution would also work. Either way, the issue with the large-scale structure rule is that an editor may be inadvertently plagiarizing whenever they present the same information in the same sequence as the source, no matter how different their wording is.  And when different sources present the same information in different sequences it becomes a nightmare.  The editor finds a source that says "Because of A, B happened".  They carefully rephrase and re-sequence as "B happened due to A" only to find some other source has used the new sequence.  Either sequence is plagiarism.


 * We could change the citation templates to add a standard phrase like "this article may present information in the same sequence as this source"? It would have to be "may present" because there is always a risk that another editor will think a sequence is awkward and rearrange, thus either taking it out of the source sequence or putting it into the source sequence.  But is presenting the same information in the same sequence without attribution automatically plagiarism, no matter how different the words are?  If so, should not this guideline be updated to spell that out? Aymatth2 (talk) 20:20, 25 May 2012 (UTC)

Proposal
I sense quiet support and propose to add a new section immediately following the section on "Close paraphrasing", with the title "Large-scale constructs": "An author's creative expression may rest in the sequence in which the information is given rather than in the words themselves. An article should therefore not present information without direct attribution in a sequence that has been used by another writer, since this may be a form of plagiarism. Where this is not practical, a footnote should be added with words like 'the above sentences may include information in the same sequence as that used by one or more other writers'." If there are no objections, I will make the change. Aymatth2 (talk) 00:55, 29 May 2012 (UTC)
 * I think this oversteps the purpose of this guideline, which really has been to focus more on the best way to treat PD/free information rather than address the academic-type "moral" issue, or even ethics for that matter. In essence, much the same test should be made for takings from a copyrighted work or PD/free - has there been copying of large-scale structure among other ways of too-closely-paraphrasing or improperly attributing a work? If yes and it's copyrighted, address under WP:CP; if yes and it's PD/free, address as advised in this guideline, which includes use of citation wrappers in addition to inline attribution. If the work is copyrighted, I don't think we're seeking to define a "not copyvio but still plagiarism" test. So this is not the right place for your proposed wording, as it's not specific to only WP:PLAG. Franamax (talk) 01:24, 29 May 2012 (UTC)


 * I don't agree with this proposal. I agree with Franamax. While it's true that an author's creative expression may rest in the sequence, that's just another kind of close paraphrasing. We don't need to adopt neologisms to describe it. The proper handling of close paraphrasing from a plagiarism standpoint is already described here. --Moonriddengirl (talk) 01:30, 29 May 2012 (UTC)


 * In practice plagiarism is almost always a copyright violation. Where there is no copyright it is trivial to provide attribution, and very few people would object to doing so.  The issue is  where the editor wants to use information supported by a copyright-protected source but does not want to quote direct.  I disagree that presenting information in the same sequence as another source is close paraphrasing.  What if there are three sources with three different sequences?  How do we avoid plagiarizing one of them?  The essay on close paraphrasing does not have sufficient scope or force to deal with this issue.  What policy or guideline page is the right place for this proposal?  Aymatth2 (talk) 02:03, 29 May 2012 (UTC)
 * Can you point to a case where that has happened? The specific case you discuss above is from a single source, no? If you are drawing from multiple sources there is an excellent chance you will not be making an unacceptable paraphrase of any of them. Just as with a copyright evaluation, using the simplest order to present facts is not usually a problem, nor is presenting a large number of facts. It is copying of the creative selection of facts or a thematic (say) rather than chronological organization which gives rise to "large-scale" paraphrasing problems. It's hard to understand how using multiple sources would give rise to such a problem. So what exactly are you trying to clarify, other than a theoretical exercise? Franamax (talk) 02:42, 29 May 2012 (UTC)


 * This proposal was triggered by the example below, where a precis and paraphrase is considered plagiarism since the sequence is the same as the source, but where essentially the same words if re-sequenced are not plagiarism. Since reproducing the sequence of a source is plagiarism, whatever the words are, there should be a guideline that spells this out so others do not fall into the same trap.  To reproduce the example once more, first the plagiarism, then the re-sequenced fix and finally the source:


 * The company now had complete control of its supply chain and orders began to grow again as customers regained confidence in Wyman-Gordon's delivery capability.
 * Because the company now had complete control of its supply chain, customers were more certain of Wyman-Gordon's delivery capability, leading to increased orders.
 * The company now controlled all the essential technologies involved in producing titanium components, including sponge manufacturing, melting and alloying, and the forging process. As a result, customer orders began to increase due to customers' confidence that Wyman-Gordon could provide an uninterrupted supply of titanium forged parts.


 * There may be multiple sources for an article, from which the editor has picked one to support a given sentence. The other sources may say much the same thing, but in a different sequence.  Or there may be sources that have not been identified.  Perhaps  the events described above were covered by a 1982 article in Titanium World, now long out of print.  Who knows what sequence that article used?  That is why the proposed solution is open-ended about where the sequence came from. Aymatth2 (talk) 12:32, 29 May 2012 (UTC)


 * Following the sequence of a source certainly can be a copyright infringement, even if you change every word. If switching out words were the ultimate solution to copyright issues, there would be no legal barrier against translating a work from, say, Japanese to English. No word is retained, but translation is still the exclusive right of the copyright holder. The point at which following sequence becomes an issue of either plagiarism or copyright is certainly going to be a subtle one. As Franamax points out, key is the degree of creativity involved. A chronological biography may have little creativity unless the author has a creative means of selecting order. "He was born; he went to school; he got a job; he retired" is not creative enough for protection. The greater the amount of detail and the more subjective the importance of those details, the more likely we are to run into problems. Because of the subtlety here, I do not agree with your wording: "An article should therefore not present information without direct attribution in a sequence that has been used by another writer." Sequences that are not creative are no one's property. Beyond that, I still maintain that this is an issue of close paraphrasing (which we describe as "the superficial modification of information from another source") and any changes should go there. That essay is referred to both in plagiarism and copyright work. I don't think we should use neologisms to define concepts that already fit under more widely used terminology. --Moonriddengirl (talk) 12:17, 30 May 2012 (UTC)
 * Again: This is about copyright, not plagiarism. The only way to avoid plagiarism is to attribute, even if you paraphrase. Paraphrase may or may not avoid copyright infringement, but that is a topic for the copyright page, not the plagiarism page. Whether or not "Sequence, Structure, and organization" ("SSO") can constitute copyright infringement is a topic that is currently being litigated. In general consensus appears to be that if the SSO is a logical consequence of the underlying work (e.g., alphabetical arrangement in a dictionary, chronological arrangement in a history, logic chain in an argument) then the SSO is not creative and is not copyrightable. With respect to re-creating SSO of "B" without actually ever reading "B," this is not plagiarism, by the definition of plagiarism. -Arch dude (talk) 13:49, 30 May 2012 (UTC)
 * I should probably just let this one go. The example floored me because I could not see anything creative about the sequence, and if anything the "correct" version seemed closer to the original in wording and concept than the problem version.  It got me reading and I found that sequence can be creative, but the examples were much more elaborate than a simple "A happened, resulting in B and then C".  The idea cannot be copyrighted, only the way it is expressed.  Shuffling the sequence does not seem to make any difference.  But I got the strong impression from Nikkimaria's comments that the large-scale structure could itself be the author's property regardless of wording: "Be mindful also of larger-scale structural similarities ... sentences or paragraphs that are structurally the same, even if the use of synonyms creates the superficial appearance of proper paraphrasing." I must have misunderstood.


 * To Arch Dude's points, most people see plagiarism as something to be avoided, even if there is no copyright concern. The concept, which I seem to have got wrong, was that using the same sequence as an author, regardless of the wording, was a form of plagiarism if not attributed.  Can you plagiarize "B" without ever reading "B"?  If your version looked suspiciously similar to "B", a judge might find that you had even if you had not.  If the sequence of presentation was considered intellectual property, it would be very easy to "accidentally plagiarize".   Aymatth2 (talk) 14:17, 30 May 2012 (UTC)
 * A judge does not rule on plagiarism. Plagiarism is not a legal matter. A judge may rule on copyright violation. Therefore, it is (arguably) possible to commit a copyright violation without ever reading the original work, but it is not possible to plagiarize. In the world of software copyright, it is legally accepted that a team of programmers can reproduce the functionality of a copyrighted work without violating the copyright, if no members of the team have ever seen the source code of the work: this is called a "clean room design." They are permitted to work from a document called a "functional specification" that is written by analysts who reverse engineer the original work and who then never contact the team. I think that this is exactly analogous to the "inadvertent plagiarism" you refer to. -Arch dude (talk) 19:50, 30 May 2012 (UTC)


 * A judge would rule on plagiarism if that were the basis for a charge of removal of copyright information.


 * The idea of a clean room design is that the developers of the equivalent software have never seen the original software and have no knowledge of its design, only of its function. They know what it does but not how it does it, and will therefore create radically different software to perform the function.  In effect the new software expresses the same idea, but in entirely different words.  It is possible that the new software developers could produce something like the same large-scale structure as the original software.  They might decide that a word processing program should include a user interface module, a text editing tool, a spelling checker and a print formatting module.  The original design might well have that same large-scale structure, even though in detail it would be completely different.


 * "Inadvertent plagiarism" is where one author creates something that appears to copy another's work, even though they have never seen that work. This is highly unlikely if by "copy" we mean reproduce the same words with only superficial changes, but quite likely if it is enough to just present the same information in the same sequence.  Fortunately, that does not seem to be the case.  In general the rather vague idea of "large-scale structure" seems much more open to interpretation, error and accident than the narrow concept of close paraphrasing.  Aymatth2 (talk) 20:33, 30 May 2012 (UTC)


 * I made a start at Structure, sequence and organization (SSO). This is about software, but software is considered a literary work in the U.S. so presumably there are similarities and analogies.  There were some surprises, to me, probably ideas that are well known to anyone else following this discussion.  Any corrections to my garbled explanation of SSO welcome.  Aymatth2 (talk) 19:27, 31 May 2012 (UTC)