Wikipedia talk:Contributor copyright investigations/Archive 1

The purpose of the process page
Wikipedia has several processes in place for dealing with limited copyright concerns--single articles or files, even a small grouping of these--but no workable process for dealing with massive multiple point infringement. While WP:COPYCLEAN has attempted to fill this gap with WikiProject Copyright Cleanup/Contributor surveys, this solution is not ideal. Such clean-up needs to be general, not affiliated with a single project. I hope that generalizing clean-up will encourage other contributors as well as making it easier to publicize the investigation option at relevant policies and guidelines.

To substantiate the need for this, I need only point out the listings currently at WikiProject Copyright Cleanup/Contributor surveys and those few which have already archived. It seems to be uncommon (thankfully) but is not rare that a contributor creates copyright problems in hundreds or even thousands of articles or images.

This page is intended, in a nutshell, to provide a place where contributors can bring such multiple point infringers to the attention of the community. The process proposed is that an administrator or clerk will confirm the problem if it exists and, if it is sufficiently widespread, open a cleanup subpage with instructions for handling. Currently, we use a program created by User:Dcoetzee that groups contributions of contributors by article, listing them (with diffs of contribution) in a hierarchy of most text contributed first. We don't have a program for listing files, but as contributors tend not to edit the same file page so often, Wikipedia's own contribution history listing may suffice for this. If plausible, I might request a bot to do this work on cleanup subpages for us. (I have no idea what bots are capable of. :))
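As a rough illustration of the listing described above (contributions grouped by article, largest text additions first, each with a diff link), here is a minimal Python sketch. It is not Dcoetzee's program: the `Edit` record, its `added` field, and the choice to rank articles by total bytes added are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    title: str   # article title (hypothetical record layout)
    rev_id: int  # revision id, used to build a diff link
    added: int   # bytes of text added by this edit (assumed to be known)


def diff_url(rev_id):
    """Link to the diff between a revision and the one before it."""
    return f"https://en.wikipedia.org/w/index.php?diff=prev&oldid={rev_id}"


def survey(edits):
    """Group edits by article; order articles and their edits largest first."""
    by_article = defaultdict(list)
    for edit in edits:
        by_article[edit.title].append(edit)

    report = []
    for title, group in by_article.items():
        group.sort(key=lambda e: e.added, reverse=True)
        report.append((title, sum(e.added for e in group), group))

    # Assumption: "most text contributed first" means total bytes per article.
    report.sort(key=lambda row: row[1], reverse=True)
    return report
```

Each row of the result is `(title, total_added, edits)`, so a cleanup subpage could be rendered by walking the rows in order and printing `diff_url(e.rev_id)` for each edit.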

I would imagine that except under rare circumstances, instructions will typically center around removing the contributions. For articles, a templated message is appropriate at the talk to explain why and to ask contributors to verify that the text is copyvio free before restoring it. (See Template:CCI.) There have been cases where a project comes together to evaluate materials as they go (notably this one), so I would propose to notify projects that are substantially affected in case members without a history of copyright violation are interested in helping out.

For an example of a listing under the process I propose (as some of the COPYCLEAN listings well predate this idea), please see the newly opened CCI/Singingdaisies.

I have spent more time than I want to contemplate working on copyright cleanup, including multiple point infringement, and I believe a process for handling these is sorely needed. I am not solely invested in the process I propose. To me, the point is coming up with a workable solution. This process may necessarily create some embarrassment for those listed, but I'm afraid that some is unavoidable (and we already risk some embarrassment with WP:SSI, which is not the legal concern copyright problems are). In an effort to minimize this, investigation pages are not indexed, and there are reminders that infringement may not be intentional. Its strengths are (IMO) in centralizing investigations and formalizing processes to avoid some of the sprawling drama I have seen when matters must be raised instead at ANI. --Moonriddengirl (talk) 13:16, 23 October 2009 (UTC)


 * Thanks for the heads up. -- PBS (talk) 13:52, 23 October 2009 (UTC)

Contribution survey listings
What is the thought behind: "All contributors with no history of copyright problems are encouraged to contribute to clean up."? It would seem to me that we want to encourage contributors with a history of copyright problems, particularly if they made the mess in the first place, to clean it up, so is this sentence necessary? -- PBS (talk) 13:52, 23 October 2009 (UTC)
 * That's a good point. :) The thinking is that we don't want UserX, who has a history of copyright violations, randomly declaring that the contributions of UserY are okay, either because UserX may not be operating in good faith or may have no idea what constitutes a copyright violation. I'll see if I can clarify that. --Moonriddengirl (talk) 13:55, 23 October 2009 (UTC)
 * Actually, allowing the perp to clean up their mess is a double edged sword. I'd recommend against it. If the situation has got to a CCI, it means they clearly 'don't get it'. They will usually have had multiple warnings and guidelines pointed out to them before arriving there. They would definitely require supervision, i.e. their clean ups will have to be checked by someone else anyway. I've also noticed in the past that some (most?) perps who do 'help' are so defensive about having been caught out and often so possessive of 'their' articles that they waste everyone's time arguing about what was removed and why, who added it, etc. etc. In my view the only way a perp can/should be allowed to help is to provide the sources of the copypaste where they aren't already known. If they want to 'make nice', they can go off and find references for unreferenced articles written by other people. Voceditenore (talk) 15:07, 24 October 2009 (UTC)
 * There have been several occasions where they have, and we have generally followed up after to be sure that they did so properly. (I would agree with you that they can't be left unsupervised.) The contributor is working on this one and the contributor helped here. --Moonriddengirl (talk) 15:11, 24 October 2009 (UTC)
 * Even so, the amount of other editors' time spent supervising, arguing, explaining, etc. has to be weighed against any help the perp will give. If there's a net gain in time and agro saved, well ok, but I still think it should be the exception rather than the rule. Actually, one clear area where I'd make an exception is the one pointed out below, where a good-faith editor hadn't realized the intricacies of CC licensing. Voceditenore (talk) 15:25, 24 October 2009 (UTC)
 * I agree that if the infringer hinders or disrupts the process, he or she can't be permitted to assist. Few of them, honestly, have ever shown any interest in helping. --Moonriddengirl (talk) 15:40, 24 October 2009 (UTC)

Copy left and intra wikipedia copies
I think it needs to be emphasised that many infringements are not intentional. Not just because people misunderstand the copyright law, but for other reasons as well.


 * Many editors do not realise that copying from one article in Wikipedia to another is a copyright violation unless suitable attribution is given.


 * Some editors fall foul of the greater restrictions that some sites have, for example although David Plant at http://www.british-civil-wars.co.uk uses a Creative Commons License, it is not compatible with the Wikipedia Creative Commons Licence because it is more restrictive and does not allow commercial redistribution.

It might be better if a split was made between clean up where the perpetrator is either not available or unwilling to help clean up, and clean up where the perpetrator, shown the error of their ways, has accepted that they made a mistake and wishes to help clean up. -- PBS (talk) 14:05, 23 October 2009 (UTC)

Perhaps similar to the split in WP:RM "Uncontroversial requests" and "Contested requests". -- PBS (talk) 14:09, 23 October 2009 (UTC)


 * Thank you very much for your feedback! Hopefully I've clarified point one. Maybe a section called "Contributor assisted cleanup" would work? But that term is a little vague, since we're all contributors. I want to avoid the pejorative "Infringer assisted cleanup" but can't think of another word that indicates "The specific contributor who created the problem." --Moonriddengirl (talk) 15:05, 23 October 2009 (UTC)

When should a CCI be started? - queries
First of all, as a member of a project currently dealing with the aftermath of a serial copy-paster complete with multiple sockpuppets (!!), I think something like this is long overdue. Kudos to Moonriddengirl for getting this started.

Just a couple of queries:
 * 1) the meaning of "wholesale copy / pasting". People are bound to ask: How much is wholesale? Two or more sentences? Three or more? It might be a good idea to specify: X or more continuous sentences.
 * 2) by specifically stating only "copy / pasting", I assume you're excluding closely paraphrased passages as 'evidence', or not?

Voceditenore (talk) 14:29, 23 October 2009 (UTC)
 * Didn't intend to, and I hope I've addressed that. :) I've just removed the "wholesale" reference. If people start listing very iffy matters ("these three words are duplicated from that source!") then that'll certainly be something that requires expansion. I just can't think of how to explain it without creeping. :/ --Moonriddengirl (talk) 15:02, 23 October 2009 (UTC)


 * I think this is exactly the point where the CCI clerks come into play. Is it coincidental? Is it plagiarism? Is it true infringement? The clerks will make the call. MLauba (talk) 10:45, 26 October 2009 (UTC)

Getting this active
This proposal is bound to pass; the RfC has already been going on for about two weeks, and has unanimous support. So, to facilitate the beginning of WP:CCI as a real process, I have the following comments:


 * Would it be a good idea to move all of the investigations currently at WikiProject Copyright Cleanup/Contributor surveys to CCI subpages? This would help activate the project, and lay out some work for us to do. We could then redirect WikiProject Copyright Cleanup/Contributor surveys to here, or mark as historical.


 * Almost all processes that use subpages like this one – WP:XFD, WP:RFA, WP:AFC, etc – have the pages as subpages of the project's main name, not the shortcut. For example, we have Articles for deletion/Twice exceptional instead of AFD/Twice exceptional, and we have Requests for adminship/The Earwig instead of RFA/The Earwig. So, wouldn't it be a better idea to move the project's existing subpages to the Contributor copyright investigations prefix? We could leave the ones with a CCI prefix as redirects.

Thanks. -- The Earwig @ 05:44, 8 November 2009 (UTC)
 * This also makes sense.— S Marshall  Talk / Cont  18:17, 8 November 2009 (UTC)
 * Fine by me. The shortcut is sheer laziness on my part. :) I cut as many corners as I can, but if the Contributor copyright investigations hierarchy is easier, then I can buck up. I'm stealing RfA's start wizard for Contributor copyright investigations/Instructions, but I've got to figure out how to preload a page. --Moonriddengirl (talk) 18:29, 8 November 2009 (UTC)
 * Okay. I've got most of them transferred over. There's a couple I'd like more time to structure, since they are special circumstances. (Plus, I'm wiped out. :)) I've also added an investigation listing for a serial image infringer. --Moonriddengirl (talk) 21:49, 8 November 2009 (UTC)

A couple of questions
1) Can the intro text be trimmed? I added a skip box, but it's still tl;dr.

2) When are we going to open the doors?

3) I'd like to be a clerk. I have experience in clearing copyvios at SCV and various image copyright enforcement (e.g. Mario1987). MER-C 07:59, 9 November 2009 (UTC)


 * My thoughts: (1) I'm all open to ideas on trimming text. I'm much better at writing long than short. :) We could remove the whole "What's copyrighted?" section, I suppose. (2) It's probably better if I don't promote this myself, so I'll leave that to somebody a little less involved. :) (3) I'd love for you to be a clerk. I'm familiar with your copyright work. --Moonriddengirl (talk) 11:45, 9 November 2009 (UTC)


 * Done. It's now about the same size as the SSP header once I kill the proposed notice. I'm not quite out of the woods IRL yet but will be early next week. I may make a bit of time to open up this place (and look at the case you filed) when I'm bored. MER-C 13:38, 9 November 2009 (UTC)

Where to host tools? (techno clueless question)
I have a new version of the Contribution Surveyor program which I use to put together these lists. It is quicker than the first generation and more organized, because it creates neat sections of its own. It also runs several options simultaneously: all text contributions; major edits only; major edits with no reversions. I'm sure that Dcoetzee would put it back up online where others could use it (the old version is stored at http://moonflare.com/misc/ContributionSurveyor.zip), but is there someplace on Wikipedia where such a tool could be stored? I don't know so much from the "file" option. Would we be able to upload it that way? Or might we be able to get a bot to do this job? --Moonriddengirl (talk) 18:57, 9 November 2009 (UTC)
 * Update: I've talked to Derrick (aka User:Dcoetzee) about this, and he is at work trying to figure out how to simplify & speed this process. Meanwhile, since I already have the program installed on my computer, I'm happy to run it as needed. Caveat: the current version takes several hours to complete. --Moonriddengirl (talk) 15:30, 11 November 2009 (UTC)
 * Hit & run posting but ideally we'll want the application to run on the toolserver for convenience (and avoiding having one of our clerk's computers running several hours nonstop). MLauba (talk) 18:11, 11 November 2009 (UTC)
 * As the Contribution Surveyor is a Windows binary, it will not run from the Toolserver, which is Linux and Solaris-based. Someone would have to convert it to either a web application or something that can be run from a Linux or Solaris-based console. I would be happy to run it (http://toolserver.org/~earwig), but it would need to be rewritten first. -- The Earwig @ 20:24, 11 November 2009 (UTC)
 * W00t, apparently it does work :D  http://toolserver.org/~earwig/contrib_surveyor/; ran on (unrelated to me) User:Earwig, producing http://toolserver.org/~earwig/contrib_surveyor/page.Earwig.txt . -- The Earwig @ 20:33, 11 November 2009 (UTC)
 * Great! That's temporary, though. Derrick has several improvements in the version I have which he should be able to implement, including organizing output into groups and eliminating reversions & minor edits. --Moonriddengirl (talk) 20:35, 11 November 2009 (UTC)
 * I'm sure it would be significantly faster to retrieve the data using an SQL query, rather than a binary file that queries Wikipedia's API. -- The Earwig @ 21:29, 11 November 2009 (UTC)
 * He said something about that, but I'm afraid that I don't really even know what SQL is. :) Let me see if he can come talk to you about this. --Moonriddengirl (talk) 21:34, 11 November 2009 (UTC)
 * Hey all, I talked a bit to MRG about this - the concept is to have a web interface on Toolserver that lets you submit requests for reports, monitor the progress of reports, and automatically create pages for them on-wiki when they're done. I have a Toolserver account and I can do this. As noted, the database support could potentially massively speed up the necessary queries to generate the report (in fact it's probably possible to do it with a single join query). Dcoetzee 21:42, 11 November 2009 (UTC)
 * Hi. Thanks for doing this. This query:


 * ...will give you the page titles and revision IDs, which you can use to create most of the survey. It also returns the namespace, which you can use to skip non-articles. I did create a surveyor markup (mostly because I was bored :) ) which works for the most part, but can't calculate the size of an edit, and fails on page titles with certain characters such as é. It is, however, significantly faster than the Windows binary; this tool takes less than a second to process 100 edits. Of course, you have way more experience in this department than I do; I'd like to see how your version of the tool works.
 * The only thing I don't quite understand is what you mean by "[...] that lets you submit requests for reports, monitor the progress of reports, and automatically create pages for them on-wiki when they're done"; I'm not sure this is really necessary. A clerk/admin could query the tool when they needed a contribution survey; everything else could be done manually. The system is not that complex. -- The Earwig @ 00:21, 12 November 2009 (UTC)
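The query posted in this thread did not survive in the archive, so the following is only a hedged Python sketch of the kind of single join discussed above: it returns namespace, page title, and revision ID for one contributor and then skips non-articles. It runs against a throwaway in-memory SQLite database; the table and column names (`page`, `revision`, `rev_user_text`, and so on) follow the MediaWiki schema of the time as best understood here and should be treated as assumptions, as should the sample user name.

```python
import sqlite3

# Hypothetical stand-in for the replica database: only the columns used here.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT);
"""

# One join: every revision by a user, with the page it belongs to.
QUERY = """
SELECT page_namespace, page_title, rev_id
FROM revision
JOIN page ON page_id = rev_page
WHERE rev_user_text = ?
ORDER BY page_title, rev_id
"""


def contributions(conn, user, articles_only=True):
    """Return (namespace, title, rev_id) rows for one contributor."""
    rows = conn.execute(QUERY, (user,)).fetchall()
    if articles_only:
        # Namespace 0 is the article namespace; everything else is skipped.
        rows = [row for row in rows if row[0] == 0]
    return rows


def demo():
    """Build a tiny sample database and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Café"), (2, 1, "Café"), (3, 0, "Opera")],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [(10, 1, "Example"), (11, 2, "Example"),
         (12, 3, "Other"), (13, 3, "Example")],
    )
    return contributions(conn, "Example")
```

Passing the user name as a bound parameter rather than formatting it into the SQL string keeps titles and names with unusual characters (such as é) from breaking the query.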

Clerk requests
I've done some copyvio-related work and would love to help organize things (clerk) here. The instructions page says to post a request here. Here we are. :) GrooveDog • i'm groovy. 15:07, 11 November 2009 (UTC)
 * And thank you very much. :D I really don't know how the consensus process works for choosing clerks, because I've been involved in it at SPI (which is where I stole (with correct attribution!) most of that). I think I need to go find out how it's done. --Moonriddengirl (talk) 15:28, 11 November 2009 (UTC)

Meta talk: Clerking?
←(HA! Seems like you could tell me how it's handled at SPI, GrooveDog, being as you're a clerk for them.) :D But I see they do it through e-mail, which is awkward, since we don't have an e-mail list and probably should keep this on Wikipedia. We don't have the privacy issues they do, and we need to be transparent. How on earth are we going to do this with a process instead of a rather random "Yes, I like users with the word 'blue' in their names" or "While user has only been active two days, contribs seem copyvio clear" or "Not a chance! That user voted 'no' on this AfD!" Obviously, this is not a position of community trust akin to the RfA process, so we don't need a raking over the coals. Or pointless bureaucracy. But we also need to have some kind of standard to avoid abuse of the system, as a clerk could manipulate listings to benefit friends, disadvantage enemies. We've currently got two clerk volunteers here (User:MER-C and User:GrooveDog), and I'm very comfortable with both of them. We need all the volunteers for cleaning we can get, but only a few active clerks (I dunno: four?). Thoughts? (And I'm subsectioning this, because I think even potential clerks should help with this part.) --Moonriddengirl (talk) 15:37, 11 November 2009 (UTC)
 * Endorse GrooveDog & MER-C. I think we'll start with these two gentlemen and see how it goes, then make up a more formal process from there. Probably something along the lines of "at least 500 mainspace edits, 6 months tenure, no copyvio problems, able to walk uphill both ways in the snow", and applications up for 3-5 days during which the existing clerks either endorse or reject, once we get the ball going, a candidate is vetted by 3 or 4 clerks and must get 3 net yeahs from the evaluators. As soon as the 3rd yeah vote or second nay is cast, the candidacy is closed as either successful or not, unsuccessful candidates can reapply 2 months later.
 * This also leaves the door open for redeemed candidates - someone who had issues in the past but recognized the problem, helped fix them, went through mentoring, and has since become an advocate for WP:C. MLauba (talk) 18:20, 11 November 2009 (UTC)
 * PS - for the vetting, it can be done both by standing and former clerks, in my opinion. And the last part about uphill in the snow we can probably do without ;) MLauba (talk) 18:21, 11 November 2009 (UTC)
 * Commenting here as an SPI clerk: A lot of SPI work, because of the nature of the IRC bot that we rely on, takes place over IRC. If we have someone that wishes to become a clerk, we would have them post on the clerks' noticeboard (which I would imagine would be here for you guys). If there are no objections after a few days, a full clerk adopts the requester as a trainee clerk, and the trainee would be promoted if there are no issues. If there are objections, we have a fight to the death small discussion and figure out where to go from there. The process has generally worked for us, so if you guys want to adopt it, feel free. NW ( Talk ) 19:53, 15 November 2009 (UTC)
 * Sounds good to me. I propose GrooveDog & MER-C as full clerks immediately and that we follow this process in the future. --Moonriddengirl (talk) 20:47, 15 November 2009 (UTC)

Open for business
Since we are already getting requests for investigation, I've decided to not waste time and open up. Here we go. MER-C 10:35, 12 November 2009 (UTC)
 * Good luck to us. :) Moonriddengirl (talk) 15:22, 12 November 2009 (UTC)

Process board proposal
This is a proposal for a new process board for handling multiple point infringers, whether text or images. Currently, there is no official point where these are addressed. Please see Wikipedia talk:Contributor copyright investigations and help to determine consensus, for or against. --Moonriddengirl (talk) 22:29, 25 October 2009 (UTC)

Reasons for

 * As succinctly as I can, (a) expedience, (b) transparency, (c) official sanction. Multiple point infringement happens. Wikipedia currently has no official home for these (though WP:COPYCLEAN has been filling the gap), and contributors need to know where to take them when they find them. Sometimes they receive good attention on ANI (and sometimes they do not), but when contributions rise into the hundreds or even thousands, sustained effort is needed to address them, and ANI is not built for that. Too, there is currently no structured method for analyzing and dealing with these, as a result of which conversations at ANI that do happen sometimes drift off-topic into other concerns and become unnecessarily dramatic. A board would provide a transparent location (the board can be linked from policies; with investigations collected anyone can see the state of clean-up; if a multiple point-infringer should re-infringe, archived records more easily demonstrate the point to which prior clean-up completed) with official status and clear procedures. --Moonriddengirl (talk) 11:28, 26 October 2009 (UTC)

Support

 * 1) Support as minor co-drafter. We need to move these out of WP:COPYCLEAN and to start treating this on a wider level. While copyright issues are traditionally given a wide berth by many, dealing with multiple massive infringements is something we will have less and less luxury to ignore. MLauba (talk) 10:41, 26 October 2009 (UTC)
 * 2) Support for reasons above. --Moonriddengirl (talk) 11:07, 26 October 2009 (UTC)
 * 3) Support -- Absolutely. One of the inhibitors to resolving the multitude of current copyvio cases is that the copyright violations reports are tucked away in a "back alley" area of WP. Bringing these reports into the open with a dedicated CCI page will encourage drop-by editors to help resolve cases -- in much the same way as SPI, RPP, UAA, AIV and other highly trafficked policy violation areas. — Cactus Writer |   needles  12:31, 26 October 2009 (UTC)
 * 4) Support I've only been involved in one contribution survey so far, and I have to admit, there were times when my heart sank when looking at the sheer volume of edits to be checked, reverted and noted - and that seemed to be a fairly straightforward case of copy-pasting from a couple of different websites. The problem is only going to grow, and any process which provides higher visibility and encourages more participation would be useful. --  Kateshortforbob talk  13:00, 26 October 2009 (UTC)
 * 5) Support as someone who has helped out occasionally at WP:COPYCLEAN and is a member of WikiProject Opera which is faced with cleaning up a huge amount of actual and potential copyvio from a single editor, this is long overdue. Something this important to Wikipedia's reputation and legal position needs to have a Wikipedia-wide solution. Voceditenore (talk) 13:51, 26 October 2009 (UTC)
 * 6) Support Much of the proposed process is already happening, but on an ad hoc, case-by-case basis via ANI and Wikiproject Copyright Cleanup. The proposed process will help to centralise and sanction these efforts, and help focus attention on the incredible backlogs that these copyright violators are generating. As I've noted before, the subpages described by this proposed policy are also excellent to link in edit summaries for the purpose of explanation to confused people watching the affected articles - we may even recruit more people this way. Dcoetzee 19:25, 26 October 2009 (UTC)
 * 7) Support A great idea that has reached its time. I have worked with User:Moonriddengirl on several copyright issues. I am glad to see she has stayed with the work and continues to find ways to improve the difficult and critical task. JeepdaySock (talk) 16:25, 29 October 2009 (UTC) (aka User:Jeepday)
 * 8) Support - I just ran into a case today where this board would have served much better than what is being used now. NW ( Talk ) 21:19, 31 October 2009 (UTC)
 * 9) Support - An official mechanism to deal with large scale copyright violations would allow for more organised and transparent handling of these situations. -- Whpq (talk) 14:36, 3 November 2009 (UTC)
 * 10) Support – A better and more organized alternative to WikiProject Copyright Cleanup/Contributor surveys. -- The Earwig @ 16:17, 3 November 2009 (UTC)
 * 11) Support particularly if it can deal with problems of suspected multiple copyrighted image uploads. MilborneOne (talk) 21:05, 6 November 2009 (UTC)
 * 12) Support. Serial abuse of copyright needs dealing with. Fences & Windows 16:27, 7 November 2009 (UTC)
 * 13) Support. Serious issue with little attention as it is difficult to deal with now. feydey (talk) 19:24, 7 November 2009 (UTC)
 * 14) Reasonable idea. – Juliancolton | Talk 00:30, 8 November 2009 (UTC)
 * 15) Makes sense to me.— S Marshall  Talk / Cont  18:16, 8 November 2009 (UTC)
 * 16) Support per above  The left orium  21:50, 8 November 2009 (UTC)
 * 17) Per above. Absolutely needed. MER-C 07:34, 9 November 2009 (UTC)
 * 18) Support per above; get out the mops! Bearian (talk) 18:26, 9 November 2009 (UTC)
 * 19) Support - good idea!--Blargh29 (talk) 00:35, 11 November 2009 (UTC)
 * 20) Support. I'd rather avoid creating a new board, but if the people who work in this area say it's needed, that's good enough for me. Rd232 talk 19:54, 11 November 2009 (UTC)
 * 21) Support, having had experience of such problems, I think it is a good idea that they are logged in one place, and we learn from shared experience of how to deal with this problem. It could also help if there is ever a legal challenge to show that we are serious about fixing this problem. -- 16:25, 16 November 2009 (UTC)

Image instructions
I've realized that the preloaded instructions are specifically geared for text (not surprising, since I wrote them and text is where I work). I've removed the instructions for now from Contributor copyright investigations/Vlad9. Thoughts on how these should go? --Moonriddengirl (talk) 15:22, 12 November 2009 (UTC)
 * Oh, this is particularly timely, since we have another image issue in the investigation requests which looks like a pretty clear candidate. --Moonriddengirl (talk) 15:26, 12 November 2009 (UTC)

Draft image instructions (is there a "guide to image deletion" equivalent on Commons?) MER-C 04:23, 15 November 2009 (UTC)


 * I don't think there's anything as clear as our brilliantly compiled Guide to image deletion on Commons. :D (Though Commons:Deletion policy is helpful.) The question now, I guess, would be what we do with items that look like a probable copyvio but for which copyvio cannot be confirmed. Do we mass list them at PUF or presumptively delete in accordance with Copyright violations? --Moonriddengirl (talk) 14:30, 15 November 2009 (UTC)


 * I would think that the images should be deleted as a consensus of this process rather than re-listing at PUF. Unless we can list the investigation as one liner at PUF rather than re-list every image, but I think that the evidence from the CCI should be enough. MilborneOne (talk) 12:14, 16 November 2009 (UTC)
 * Works for me. Avoids redundancy of effort, and it should not be implemented by a clerk or admin without strong evidence. --Moonriddengirl (talk) 13:40, 16 November 2009 (UTC)

Contribution surveyor
I'm happy to announce that the precursor to Dcoetzee's Toolserver-based contribution surveyor, ~earwig/cgi-bin/contribution_surveyor.py, is now fully functional. It has all of the features of the old contribution surveyor, as well as minor edit/major edit support, namespace support, and can process about twenty contributions per second. Of course, Derrick's version will include many more features; we can use this briefly until he is finished with it. -- The Earwig @ 00:14, 16 November 2009 (UTC)
 * Cool! Just for comparison purposes, I'm running the latest version Derrick gave me on Jcuk and will run the Toolserver version as well. We can see what kind of differences we get. :) --Moonriddengirl (talk) 15:22, 16 November 2009 (UTC)
 * Hmm. I have no idea why, but we got some pretty major differences: CCI/test. Massive, really. :/ I understand that the current prototype is a work in progress; we probably shouldn't rely on it until we (a term that should be read as "you who understand such things") can work out some of the kinks. :) Meanwhile, we might want to stick with the program. I can e-mail it to anybody who wants it, and I'm perfectly happy to run the program for any clerk or admin who does not until we get it up and running. (I apparently have the code of the latest version in what Derrick e-mailed me, if you want to see that.) --Moonriddengirl (talk) 15:57, 16 November 2009 (UTC)


 * That's very, very strange. I did a few SQL queries, and found the problem: it's an issue on the Toolserver's side. One of the columns I was using to organize everything, revision.rev_parent_id, was blank for a large number of Jcuk's edits. This prevented the tool from charting those edits, as they were not appearing properly in the query. I'm not sure what I can do about that; apparently it is a massive oversight on my part :P Anyway, I look forward to Derrick's version; hopefully he won't encounter such a bug. Heh. -- The Earwig @ 21:32, 16 November 2009 (UTC)
 * I'll have to leave it to you folks to handle. :) This is way out of my field. I just kind of point & click this stuff. :D --Moonriddengirl (talk) 22:30, 16 November 2009 (UTC)
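To illustrate the failure mode described in this thread, here is a small Python sketch of one way a surveyor might tolerate a blank rev_parent_id: edits whose parent is unknown are kept and flagged rather than silently dropped. This is not the Toolserver tool's code. The dictionary layout, the `lengths` lookup, and the convention that a parent ID of 0 marks a page creation are assumptions made for the example.

```python
def edit_sizes(revisions, lengths):
    """Map rev_id to the change in page length, or None when unknown.

    revisions: iterable of dicts with keys rev_id, rev_len, rev_parent_id
               (rev_parent_id may be None when the column is blank).
    lengths:   dict of rev_id to rev_len for every revision we know about,
               including revisions made by other users.
    """
    sizes = {}
    for rev in revisions:
        parent = rev["rev_parent_id"]
        if parent is None:
            # Blank parent: keep the edit in the report, size unknown.
            sizes[rev["rev_id"]] = None
        elif parent == 0:
            # Assumed convention: no parent means the edit created the page.
            sizes[rev["rev_id"]] = rev["rev_len"]
        elif parent in lengths:
            sizes[rev["rev_id"]] = rev["rev_len"] - lengths[parent]
        else:
            # Parent exists but its length was not retrieved.
            sizes[rev["rev_id"]] = None
    return sizes
```

Edits that come back as `None` could be listed in a separate "size unknown" group for manual review, so that two runs of different tools at least cover the same set of edits.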

Template for open CCIs
A quick hack I put together just now. I don't know what to do with that white space - merely giving a description of the subject matter involved (to help involve editors who have an interest in the subject) isn't enough so two columns will do for the time being. MER-C 13:38, 17 November 2009 (UTC)
 * Fabulous! That's a great idea. A summary of subjects makes it particularly helpful for figuring out what project might care. (I need to draft a template for that.) Can we obscure the individual's names and just link to the description? Some of these contributors are indefinitely blocked, but some of them really were working off of a misunderstanding or operating in good faith, and I would hope to emphasize that it's about the content more than the contributor. --Moonriddengirl (talk) 13:46, 17 November 2009 (UTC)


 * Another template: Template:CCI-project. This is designed to alert projects that may be significantly impacted by CCI work and to invite their participation in clean up. --Moonriddengirl (talk) 18:51, 17 November 2009 (UTC)

Alterations to CCIsubpage
I've added MER-C's instructions for images (with attribution :)) with a few notes of my own, since we don't want to open PUF listings for images that are posted here. Look okay? Should we remove the empty User5? If we're copying the listing over, it seems redundant. --Moonriddengirl (talk) 14:23, 17 November 2009 (UTC)

NOINDEX and Robots.txt
I noticed that the case pages are tagged with NOINDEX. All CCI subpages can be excluded at once by adding appropriate entries to MediaWiki:Robots.txt. This was done for the WP:Article Incubator. Flatscan (talk) 03:30, 19 November 2009 (UTC)
 * Great! I think I will let somebody bolder with such things do this, though. I don't even know what a "syntax validator" is. And while I could look it up on Wikipedia (quite probably), I would still fear to poke at one. Can somebody else handle this, please? --Moonriddengirl (talk) 13:04, 19 November 2009 (UTC)
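For anyone nervous about editing robots.txt, a prefix rule can be checked offline before it goes live. The Python sketch below uses the standard library's `urllib.robotparser`. The `Disallow` line is a hypothetical example of an entry covering all CCI subpages, not the actual content of MediaWiki:Robots.txt, and the host and page paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real entries in MediaWiki:Robots.txt may differ.
RULES = """\
User-agent: *
Disallow: /wiki/Wikipedia:Contributor_copyright_investigations/
""".splitlines()


def is_crawlable(path, rules=RULES, agent="*"):
    """Return True if a crawler obeying the rules may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse the rules from memory, no network access
    return parser.can_fetch(agent, "https://en.wikipedia.org" + path)
```

With these rules, `is_crawlable("/wiki/Wikipedia:Contributor_copyright_investigations/Example")` is false while ordinary pages such as `/wiki/Main_Page` remain crawlable, because the rule matches by path prefix.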

Case archives
During the previous incarnations of this process, we've had a longish discussion about a closed investigation no longer remaining irrevocably in plain sight, in particular for contributors who helped clean up their own mistakes.

In fact, as adopted, the text of CCI still has: ''After completion, any CCI case, including those dismissed, will be moved to the archive subpage. On top of the latest revision of the case, the clerk will replace the text with a CCI archive banner listing the date filed, the date closed or dismissed, and the summary finding. Once this is done, he will immediately file all previous revisions for redaction or perform it himself if he is an admin clerk.''

To that effect, I made a new template, presently at User:MLauba/CCI archive, which can be seen in action here. The intent being that once we've completed an investigation, we replace the investigation page's content with this when archiving, then delete (or hide once in effect) the history. This avoids the drama in particular with reformed users who'd otherwise keep getting their past thrown back in their face whenever they're involved in any dispute whatsoever.

Thoughts? MLauba (talk) 14:33, 19 November 2009 (UTC)
 * Hmm. When a case is investigated, I think we need the history so we can verify that a certain article has been evaluated. I think we have some tension here between the need for discretion and the need for transparency, along with potentially useful access to records. If a contributor is going to have his past thrown in his face, I think it's going to be because he or she was investigated at all and not because of a specific article in that investigation. The traces of the investigation are likely to be widespread, with notices at AN (when necessary), at project pages (when appropriate) and at the user's page. Also, many article pages may contain links to the cleanup. (See, for example, Talk:William Haldimand. While the template there is structured to minimize embarrassment by obscuring the contributor's name, it can be located.) So it isn't as if we can make all traces disappear, or necessarily should. Do you believe that it will occasion substantially more embarrassment to contributors to have a record of which articles proved problematic and which did not accessible? --Moonriddengirl (talk) 14:53, 19 November 2009 (UTC)
 * The verification is done before the case is archived. In many cases, the instances of the copyvio will have been removed from the article, whenever possible through redaction or revision deletion. That means that the evidence is, by the time a case is archived, already hidden from non-admins.
 * When we have article Testcase completely history purged of any infringement from User:Example, if we don't "seal" up the archive, the only easily accessible way for the general public to find out that article Testcase had infringement from User:Example becomes the CCI case archive.
 * The only need I see to ever revisit a closed case is if a new one is opened for the same contributor. If that happens, the undelete button is two clicks away from the deleted revisions.


 * Case in point, I remember quite vividly one user this year who was investigated (and blocked for a while) and participated in the cleanup. As he was involved in unrelated issues, the copyright issues kept being thrown at him as a pile-on, and I believe influenced quite a few of the people who called for his ban by the end.


 * A seal with "Case: Example, Finding:multiple infringement, fully cleared, Example helped clean up" would allow redeemed users to carry on.
 * Most of us probably tend to stay well clear of RFAR, but if you look at requests there, you will easily find that some litigants will dig up whatever dirt they can find to bolster their case, regardless of what happened afterwards. We don't need them to be able to harvest long-deleted diffs from our case pages, where most of it is in practice hidden to non-admins. This avoids needless distractions on any other matters, and on our side, the investigation being long closed, the matter is behind us anyway. MLauba (talk) 15:26, 19 November 2009 (UTC)
 * I know verification is done before the case is archived, but that doesn't mean that it won't be necessary at some point to confirm. :) Particularly as the investigations represent our best efforts merely (I'm always worried about missing something, but I remain loath to adopt the sometimes recommended approach of preemptively nuking it all. :/). Revision deletion and redaction are frequently not employed in CCIs, particularly as these are often assisted by non-admin article cleaners. Even with the articles created by GrahamBould, the articles were stubbed with a caution against restoration placed at the article's talk pages. In these cases, expedience rules, I'm afraid, for obvious reasons. Even with expedience, we can't keep on top of them. :/ In the case in point (and, oh dear, this seems to be future development of which I was unaware), I should think that the publicity at ANI and at the user's page would have been far more influential on the pile-on than the investigation page, which if anything would have provided evidence that the contributor helped. I guess my question is whether the dirt of a diff of a specific case is really any more condemnatory than an investigation in itself, given that for an investigation to have happened, multiple copyvios must have been confirmed? --Moonriddengirl (talk) 15:34, 19 November 2009 (UTC)

User Warnings
User:Lou72JG has an open investigation and some of his/her images have been found to be copyright violations. The user continues to question why their images are being removed or challenged and continues to upload images. The user has also re-uploaded images that have been previously deleted using the PUI/PUF process. Although this user has had all the standard template messages, they don't appear to have any formal warnings. Once the investigation has found copyright violations, should we warn these users to stop uploading images and block them if they continue? Perhaps the CCI should have some guidance on this. Thanks. MilborneOne (talk) 12:37, 24 November 2009 (UTC)


 * Final warning issued, if it is not heeded, an indef will be in order. MLauba (talk) 12:51, 24 November 2009 (UTC)


 * Thanks but should it be mentioned in the process using the sort of message you used ? MilborneOne (talk) 13:01, 24 November 2009 (UTC)

Are there standard warning messages
appears to be inserting paragraphs whole into articles with a convenient link to the page from which he plagiarised the information. Are there standard messages somewhere to try and educate about what they have done and maybe enlist their help in clearing up the copyvios that they created?--Peter cohen (talk) 20:26, 25 February 2010 (UTC)
 * Not a lot. The typical user copyright templates are gathered at WikiProject Copyright Cleanup/Resources (all of them, for user pages and otherwise, are here). We have a few for this board, but nothing of the sort that you address. I'd probably go with Uw-copyright and maybe import some of the language from Uw-c&pmove (the "if there are any other pages that you...even if it was a long time ago, please" bit). --Moonriddengirl (talk) 20:38, 25 February 2010 (UTC)
 * Thanks for the reply. I've opted for softer wording as I'm trying to encourage him to fix his own vios. Is there an automatic way to "stalk" an editor and be informed of future edits by them? Or do I have to remember to check up on him myself?--Peter cohen (talk)
 * If there's an automatic way, sadly, I don't know it. :/ --Moonriddengirl (talk) 15:51, 2 March 2010 (UTC)

Presumptive deletion of images
I had a look at some of the entries for files uploaded by Bci2, and there is a wide variety of claims on the copyright ranging from being the copyright holder to stating that the material has been released in the public domain. Based on the two files that I tagged, I suspect that none of the claims for any of the images are credible. It looks like David Eppstein and some other editors have spent some considerable time combing through the listing and carefully sourcing the violations. Aside from some non-creative images of text, and the PDF files, the image files so far are all copyvios. We have a presumptive deletion template for articles. Should we have one for images too? -- Whpq (talk) 15:28, 2 March 2010 (UTC)
 * We do, though we sometimes will extend the benefit of the doubt if one series of pictures by an image uploader has consistent metadata. I'm not entirely sure if we should, since they could be simply snagging the images from one source, but images are not my main area so I bow to those with more experience. Copyright violations has not for some time limited its presumptive deletion to text. I think that the careful combing through is a good idea, as it clearly establishes a pattern of violation, but once a strong pattern has emerged we don't really seem to have any viable option. Usually, what I do is let the image folks have at such CCIs for a while and come in as a last cleanup crew, mopping up anything that has not been tagged as viable and nominating any images that have gone to Commons for deletion. Commons, sensibly, has an option for mass deletion debates for copyright problems. Unfortunately, Commons also has a monster backlog. :/ --Moonriddengirl (talk) 15:50, 2 March 2010 (UTC)

wp:quote
There is a proposal to promote this. 174.3.113.245 (talk) 06:26, 29 March 2010 (UTC)

Convention on talk page note when copyvio has already been edited out?
I found a copyvio in a CCI listed article, but the copyvio text had already been excised by a revision of the article unconnected with the CCI. In such a case, should any note be placed in the article's talk page a la CCI or cclean? thanks --Tagishsimon (talk) 23:38, 8 March 2010 (UTC)
 * Boy, we're really good at this "answering questions" thing. I only just now noticed this! I usually use cclean, fwiw. :) --Moonriddengirl (talk) 20:13, 7 May 2010 (UTC)

How to flag an investigation complete
So a dumb process question. I read through the CCI instructions, and I don't see how we are to flag an investigation as completed so that a clerk or admin can close it. -- Whpq (talk) 15:30, 7 April 2010 (UTC)
 * We haven't had to develop this yet. This is only the second time, I think, that it's come up. I'm not sure how to note it; should we create a new subsection for completed investigations that need closure and ask people who complete them to list them there? --Moonriddengirl (talk) 15:48, 7 April 2010 (UTC)
 * Or perhaps have a parameter on the template that can be set to indicate the investigation which can set some sort of visual cue for a clerk or admin to see. -- Whpq (talk) 16:00, 7 April 2010 (UTC)
 * I was hoping somebody who could technically work this out would show up. Alas, it seems not. I'll track one down. :) --Moonriddengirl (talk) 19:35, 13 April 2010 (UTC)
 * I've taken the liberty of adding an optional "completed=yes" parameter to the template for this purpose. Try it out in preview to see, let me know if you have any feedback. :-) Dcoetzee 20:38, 13 April 2010 (UTC)
 * It works! Yay! Now to update instructions. :) --Moonriddengirl (talk)

Listing generation question
So I noticed something I have a question about, taking Nicolae Ceauşescu from Bci2 for example.

It lists (1 edits, 1 major, +1649) with this diff which is certainly his largest edit to the article, but in looking through the article history I notice that he made 10 edits, all of them marked minor, resulting in this diff for his entire contribution. Is this how it's supposed to work and/or dare I ask about the innards of the program that generates these listings? VernoWhitney (talk) 21:30, 14 April 2010 (UTC)
 * I have absolutely no clue, I'm afraid. You'd have to ask User:Dcoetzee how it works. --Moonriddengirl (talk) 20:19, 7 May 2010 (UTC)
 * Related incident for my own reference: this edit removed text, but replaced it with copyvio. This article was not listed at all in the recent CCI for the editor. Methinks this is a problem. VernoWhitney (talk) 16:21, 10 May 2010 (UTC)
 * Lest I bother the creator unnecessarily I decided to play around with the contribution surveyor and examine the source code. After my review I've determined that both of these appear to result from using the "Hide minor edits" option, which causes the tool to completely ignore edits which do not increase the size of the article by at least 100 characters. So that answers my first relatively pointless musing. VernoWhitney (talk) 17:23, 10 May 2010 (UTC)
 * Since Dcoetzee's attention has already been pointed this direction, I have a more precise question: is there a way to determine the total amount of content changed beyond just counting the characters? VernoWhitney (talk) 17:26, 10 May 2010 (UTC)
 * You have the "Hide minor edits" box checked. Uncheck it and hit "Update" to see all edits. It does not currently combine adjacent edits. It would be nice to have a more accurate way of determining content change than the change in number of characters, but any other method would also be very, very slow. Dcoetzee 19:54, 10 May 2010 (UTC)
 * Okay, I kinda figured it would be performance prohibitive. Thanks for the reply. So now a question to the more veteran CCIers: Is the annoyance of having all of the minor edits showing up worth finding situations like this one, where existing text is replaced by smaller copyvio? I couldn't begin to guess how often this kind of situation occurs. VernoWhitney (talk) 20:05, 10 May 2010 (UTC)
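To make the trade-off in this thread concrete, here is a minimal sketch of a size-based "hide minor edits" filter using the 100-character threshold described above. It is not the contribution surveyor's actual code; the `Edit` class, its field names, and the example data are hypothetical.

```python
from dataclasses import dataclass

MINOR_THRESHOLD = 100  # characters, as described in the discussion above


@dataclass
class Edit:
    rev_id: int      # hypothetical revision identifier
    size_delta: int  # change in article length, in characters


def visible_edits(edits, hide_minor=True, threshold=MINOR_THRESHOLD):
    """Return the edits a reviewer would see in the listing.

    With hide_minor on, any edit that does not grow the article by at
    least `threshold` characters is dropped, including edits that shrink it.
    """
    if not hide_minor:
        return list(edits)
    return [e for e in edits if e.size_delta >= threshold]


# Example data (invented): one large addition, one small tweak, and one
# edit that replaces existing text with a shorter passage.
EXAMPLE_EDITS = [Edit(1, 1649), Edit(2, 40), Edit(3, -250)]

print([e.rev_id for e in visible_edits(EXAMPLE_EDITS)])
print([e.rev_id for e in visible_edits(EXAMPLE_EDITS, hide_minor=False)])
```

The third example edit is the problem case raised above: a replacement that makes the article shorter has a negative size change, so a filter based only on character counts hides it even if the new text is copied.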

CCI Clerkship?
Okay, so MRG convinced me to apply for CCI clerkship, and since I try and hit up CCI a few times a week anyways I figured why not. Anyone know how the process works from here (or want to shoot me down)? ^_^ VernoWhitney (talk) 13:24, 7 May 2010 (UTC)
 * 👍. With User:GrooveDog out of the picture (for school?), the full clerkship duties (such as they are) fall on User:MER-C. Verno has shown himself to be dedicated and on the ball, and I'm quite confident he could help out here. :) --Moonriddengirl (talk) 13:31, 7 May 2010 (UTC)
 * Get him a desk and the clerical stamp already :) . MLauba (Talk) 14:14, 7 May 2010 (UTC)


 * No copyright concern. Material PD or appropriately licensed for use. – Toon 20:21, 7 May 2010 (UTC) (Sorry, couldn't resist it). – Toon 20:21, 7 May 2010 (UTC)
 * LOL! And I do believe a consensus may be emerging. :) --Moonriddengirl (talk) 21:34, 7 May 2010 (UTC)


 * Yeah, OK. MER-C 01:42, 8 May 2010 (UTC)
 * Thanks for the !votes of confidence. I guess 4 people makes for a strong consensus in this neck of the woods. VernoWhitney (talk) 02:20, 8 May 2010 (UTC)


 * I have no objections. I'm sure you'll do a great job. :) Theleftorium (talk) 10:07, 8 May 2010 (UTC)
 * Five now! That's like crossing the 100 person mark at RfA! --Moonriddengirl (talk) 11:55, 8 May 2010 (UTC)
 * Make that six. Verno has my complete confidence in this matter. Not that I do much around here, for which apologies. --Tagishsimon (talk) 16:28, 10 May 2010 (UTC)
 * Well, I'm glad it continues to be unanimous, since I took MRG's listing of me at Contributor copyright investigations/Instructions to heart and went ahead and closed a case I had been working on earlier today. VernoWhitney (talk) 16:42, 10 May 2010 (UTC)


 * Yup. Work looks good. Thanks for stepping up to the plate, VW. Now somebody give him the paper hat and decoder ring and we'll call it official. — Cactus Writer |   needles  17:09, 10 May 2010 (UTC)
 * I'd do it but we all know that MRG is what corresponds most closely to both 'crat and Founder for CCI :) I'll let her the honour of declaring consensus and activating the symbolic bit. MLauba (Talk) 21:13, 10 May 2010 (UTC)
 * Psst, she already did. Although she must've forgotten the decoder ring... VernoWhitney (talk) 21:18, 10 May 2010 (UTC)
 * Yes, I sort of played the averages and put him to work already. :) --Moonriddengirl (talk) 21:51, 10 May 2010 (UTC)

Change the instructions, preserve the diffs
The instructions for poring over lists of edits read, at one point, "After examining an article: replace the diffs after the colon on the listing with indication of whether problem was found." I'd like to propose that we do not replace the diffs after the colon, since anyone coming to a case after someone has been through it is in no easy position to double check via the (now absent) diffs. I can't think of a good reason for trashing the diffs. Thoughts? --Tagishsimon (talk) 14:49, 3 June 2010 (UTC)
 * I agree that it's harder to double check the work (although they could always look at the first history of the page before any work is done, which is what I've done when I was looking at some of MRG's work with Rcpaterson). I think the main reason for removing the diffs (at least why I like it) is readability - at least with the cases where the diffs stretch onto more than one line, it's hard to see which articles have been checked and which haven't at a glance. I guess it's a trade-off between ease of the initial work and ease of a later audit. VernoWhitney (talk) 14:54, 3 June 2010 (UTC)
 * That could be resolved by placing the status marker at the start of the line instead of replacing the diff at the end of the line. -- Whpq (talk) 15:04, 3 June 2010 (UTC)
 * That could work, although I think I'd personally prefer putting it on a new line following the article line so it's the same practice as at CP and SCV. VernoWhitney (talk) 15:18, 3 June 2010 (UTC)
 * I'm open for any approach that works. :) We've taken several different approaches to this before the process was formalized. At one point, we just removed the line listing of the article altogether, but I got to thinking that could be very difficult to follow up. Removing the diffs was primarily to declutter the listings for readability, as some of these can be very long (I've seen listings where contributors have literally added hundreds of edits to an article. yikes!) --Moonriddengirl (talk) 16:22, 3 June 2010 (UTC)
 * Is the clutter a problem when viewing or editing? They could be enclosed in HTML comments to hide them. It's also possible to hide them using CSS, but I'm not sure if show/hide can be toggled easily. Flatscan (talk) 04:12, 4 June 2010 (UTC)
 * For me it's primarily when viewing, I appreciate collapsing completed sections for the same reason. And if we're commenting out the diffs wouldn't it still be easier for an auditor to just look in the history to get the list of diffs than go through the trouble of uncommenting them? VernoWhitney (talk) 04:36, 4 June 2010 (UTC)
 * You're right, comments are no good. I think something similar to can be implemented with   and custom JavaScript. I'll write a prototype. Flatscan (talk) 04:30, 6 June 2010 (UTC)

Prototype
CCI item hides the summary and diffs if the item has a completion comment. User:Flatscan/showCCI.js adds links to show/hide the summary and diffs.


 * Suspected copyvio article: (3 edits, 3 major, +2703) (+2703)(+135)(+421)
 * Suspected copyvio article: Cleaned. User:Example 04:30, 7 June 2010 (UTC)



Flatscan (talk) 04:30, 7 June 2010 (UTC)
 * I added the current formatting to highlight the differences, plus strike-through and bold to show how a patroller marks an entry as checked. Once the .js is tweaked, it should be moved to a subpage here and fully protected. Flatscan (talk) 04:25, 8 June 2010 (UTC)
 * It certainly looks nice, but I'm not sure that requiring custom javascript to display the diffs is really user-friendly for those who don't hang out here all the time anyways (and how often do even we go through and double-check each other's work unless asked?). Going back to reverse Tagishsimon's original question - is the current practice of just removing the diffs actually bothering anyone? VernoWhitney (talk) 15:07, 7 June 2010 (UTC)
 * Without the JavaScript, the rendered text appears exactly the same as outright removal. Even regulars could elect to use the .js only when necessary. Flatscan (talk) 04:25, 8 June 2010 (UTC)
 * I'm afraid that it would be prohibitively time-consuming. :/ We sometimes have thousands of articles in a CCI, and having to fill in those parameters will take much longer than the current practice (I sometimes cut & paste ~ to evaluate and clear several at a time). Since these are courtesy blanked on completion anyway and since the diffs can be found in the CCI history as well as the article's history, I guess I'd have to wonder also whether the removal of the diffs is a major issue. --17:58, 7 June 2010 (UTC)
 * The transition to the template wouldn't be manual. New CCIs would use the template at creation, and I could convert the existing ones with a fairly simple text parser. Flatscan (talk) 04:25, 8 June 2010 (UTC)
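A "fairly simple text parser" of the kind mentioned above might start by pulling the fields out of each listing line. The sketch below assumes the plain-text line format shown in the prototype example; real listings contain wiki markup and links, so the pattern and the field names (`title`, `edits`, `major`, `total`, `diffs`) are illustrative assumptions, not the converter Flatscan had in mind.

```python
import re

# Matches lines such as:
#  * Article title: (3 edits, 3 major, +2703) (+2703)(+135)(+421)
LISTING_RE = re.compile(
    r"^\s*\*\s*(?P<title>.+?):\s*"
    r"\((?P<edits>\d+) edits?, (?P<major>\d+) major, (?P<total>[+-]\d+)\)"
    r"\s*(?P<diffs>(?:\([+-]\d+\))*)\s*$"
)


def parse_listing(line):
    """Split one unchecked listing line into its parts.

    Returns None for lines that do not match, for example entries that
    have already been marked as checked.
    """
    match = LISTING_RE.match(line)
    if match is None:
        return None
    return {
        "title": match["title"],
        "edits": int(match["edits"]),
        "major": int(match["major"]),
        "total": int(match["total"]),
        # One size change per diff, in the order listed.
        "diffs": [int(d) for d in re.findall(r"\(([+-]\d+)\)", match["diffs"])],
    }


UNCHECKED = " * Suspected copyvio article: (3 edits, 3 major, +2703) (+2703)(+135)(+421)"
CHECKED = " * Suspected copyvio article: Cleaned. User:Example 04:30, 7 June 2010 (UTC)"

print(parse_listing(UNCHECKED))
print(parse_listing(CHECKED))
```

Once the fields are separated, emitting them as template parameters would be a matter of string formatting, though the parameter names would have to follow whatever the template actually defines.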

For archiving
G'day all - this CCI seems to be done if someone would like to archive it. --Mkativerata (talk) 21:23, 18 June 2010 (UTC)
 * ✅ VernoWhitney (talk) 21:34, 18 June 2010 (UTC)

Clerk?
May I have permission to be a clerk? I know the laws/rules and have never had a copyright infringement. I'm already helping out at WP:SCV.  Mr. R00t   Talk  19:40, 21 June 2010 (UTC)
 * Hi. We can always use more help in copyright cleanup, but we currently have two active clerks and I'm not sure that there is a pressing need for another. Personally, I would like the opportunity to observe your copyright work first, as you seem to have been contributing to SCV for only a day. I'm not quite sure, for instance, why you marked this one as a "false positive". A comparison of the foundational edit with the tagged site shows that most of the second paragraph was copied. --Moonriddengirl (talk) 19:56, 21 June 2010 (UTC)
 * I'm sorry but for some reason I just don't seem to be seeing anything along the lines of a copy vio on that. Could you point out a little more clearly. Guess I need a new prescription on my glasses or something. If you are talking about a second paragraph on the article I don't know what you mean as the term 501(c)(3) does not appear anywhere.  Mr. R00t    Talk  20:09, 21 June 2010 (UTC)
 * MrG is referring to the second paragraph in the article's first revision: http://en.wikipedia.org/w/index.php?title=Cleveland_State_University_Alumni_Association&oldid=369406562. That paragraph was then deleted by the creator of the article. Always remember to check the article's history. :) Theleftorium (talk) 20:16, 21 June 2010 (UTC)
 * Something would no longer be considered a copy vio if the article had been changed to not include the copyrighted material. Right?  Mr. R00t    Talk  20:20, 21 June 2010 (UTC)
 * Correct, but you shouldn't tag it as a "false positive" as you did. Just add a note that the article has been cleaned. Theleftorium (talk) 20:22, 21 June 2010 (UTC)
 * Yes. A false positive is when the bot flagged an article that had no copied content. Of the options at Template:SCV is probably your best option in a case like that. Clear records at SCV can be helpful if copyright concerns persist with a specific contributor. --Moonriddengirl (talk) 20:25, 21 June 2010 (UTC)

Alright. Thanks for telling me. I am going to assume that all of this is a no on the clerk bit?  Mr. R00t   Talk  20:51, 21 June 2010 (UTC)
 * Yes. It's mainly no ;) --Tagishsimon (talk) 20:54, 21 June 2010 (UTC)
 * Alright. I'll help without  Mr. R00t    Talk  20:57, 21 June 2010 (UTC)
 * Thanks. As I said, we can use all the help we can get. And if we lose a clerk or if traffic picks up significantly, you might want to toss your hat back into the ring. It'll be a bit easier to assess you once we get to know you better. --Moonriddengirl (talk) 21:16, 21 June 2010 (UTC)

Confidentiality questions
So I was wandering around CCI while avoiding work and came up with some process questions, which I don't think are answered anywhere: VernoWhitney (talk) 20:56, 21 June 2010 (UTC)
 * 1) Under what circumstances do we open a CCI under date-only and not their username?
 * 2) If we're opening the CCI under date-only, why do we then list the username at Template:CCIlist?
 * 3) I don't know if it has happened yet, but should a date-only CCI be placed in the archive as date-only or under their username?


 * There have from the beginning been concerns raised about potential damage to reputation from CCIs. Since CCI pages are not indexed, this didn't seem to be a major concern until people started bandying them about in edit summaries and article talk pages. I sort of IARed the date-only system after seeing that done with a mathematician who worked under what was evidently his real name. Later, it occurred to me that "real name" CCIs could also come back to bite uninvolved people with the same real name, if such bandying pops up in a random search. I believe I also did it with an e-mail address.


 * The CCI list is not indexed, like CCI pages. Using the actual name can help with investigation and clean-up, since it may make it easier for people to realize they are familiar with the contributor. For a similar reason, the real name is still listed on the CCI page, to facilitate history checks.


 * Since the archives are not indexed either, I really think archiving under the username is best, even if the username is a real name. The archives are there to document cleanup and help if future additional action is needed.


 * There's always a balance between courteous handling (recognizing particularly that many contributors do not intend to infringe copyright) and facilitating cleanup. :/ --Moonriddengirl (talk) 21:22, 21 June 2010 (UTC)
 * Ok, thanks very much for clearing that up. VernoWhitney (talk) 21:33, 21 June 2010 (UTC)

So I was messing around this morning and noticed that Template:CCIlist, and CCIheader are being indexed, which kind of ruins the whole confidentiality thing. I'd fix it except I'm likely to break something, although if nobody smarter comes along I'll try later today. VernoWhitney (talk)
 * No claims to intelligence here, but will this work? You know this is not my thing. :/ --Moonriddengirl (talk) 15:19, 5 July 2010 (UTC)

Contributor copyright investigations/Arab League
In order to facilitate the completion of this CCI, I have requested an exemption from the Non-free content policy for non-article pages. Please leave any opinions at Wikipedia talk:Non-free content/Archive 46. VernoWhitney (talk) 15:23, 29 June 2010 (UTC)

A little advice
Not a copyright problem but in the same ballpark. I'm finding a number of articles that are copy-pastes of public domain works, but without attribution. For instance an article section will have at its bottom a reference to the work copied from, but it doesn't indicate that the information was lifted wholesale. At best we're ending up with a bunch of encyclopedia articles written in a 19th century style and from a 19th century point of view, but at worst it's potentially misleading plagiarism (someone will believe a sentence or passage is a Wikipedia summary of an older source, when it is in fact the source's words). Is there a template for this? (I vaguely remember seeing boilerplate on articles like "this article was largely copied from Britannica of 1911" or whatever, but can't find it.) Any other thoughts or advice about what to do about this kind of issue would be much appreciated (and if there's some better place to bring this up, I'll take it there if you point me there). Bali ultimate (talk) 14:43, 11 September 2010 (UTC)


 * Hi. :) This is an issue of Plagiarism, but I'm happy to offer some tips here. We do have templates that can help with these located at Category:Attribution templates. There are many specific ones, and a generic PD-old-text. The thing to do beyond attributing when you find this is to let the contributor know. We don't have a template that I know of yet for plagiarism. Hmm. Maybe I'll write one. --Moonriddengirl (talk) 14:47, 11 September 2010 (UTC)


 * I've created it at {{subst:Uw-plagiarism}}. I'll run it up the flag pole and see who salutes, so to speak. --Moonriddengirl (talk) 15:01, 11 September 2010 (UTC)
 * The generic template is a good start. Thanks. It's one editor I'm looking at that seems to do this extensively. To be fair to them, there's some very odd advice at WP:Public domain resources (or was; I just changed the passage). They probably believe it isn't plagiarism if the citation is there. I'm wondering if the plagiarism page provides confusing advice. Will have a read. Bali ultimate (talk) 15:06, 11 September 2010 (UTC)
 * I've pointed to Plagiarism in the lede of WP:Public domain resources in no uncertain terms. --Tagishsimon (talk) 15:14, 11 September 2010 (UTC)
 * This is the first article I'm dealing with - Great Comet of 1556. Are the tags I've placed at the bottom appropriate/sufficient? Bali ultimate (talk) 15:19, 11 September 2010 (UTC)
 * They look great! I've moved them beneath a ==Notes== section, since that's where I've always thought they should live. --Tagishsimon (talk) 15:23, 11 September 2010 (UTC)
 * Thanks for the template improvement, User:Tagishsimon. :) --Moonriddengirl (talk) 15:26, 11 September 2010 (UTC)

Finished CCI
I don't know all the bureaucratic processes here, but I just finished up Contributor copyright investigations/Snjsharma if someone wants to mark it as closed. Calliopejen1 (talk) 14:32, 16 September 2010 (UTC)

Rollback bot
I have filed Bots/Requests for approval/VWBot 9 to have something ready to go in case there is support for rolling back all articles edited-but-not-created by a CCI subject (I know it has been mentioned repeatedly on DD's page, but the actual execution of this bot would obviously wait for consensus on a case-by-case basis). VernoWhitney (talk) 20:40, 22 September 2010 (UTC)
 * So I've gotten approval to run a trial for this doomsday weapon. Are there any candidates amongst our current crop of CCIs? Note that it will still take me some time to finish tweaking the code to ignore small (<100 byte) edits and reversions, I had been working on other things because I lost hope there for a while that anyone would actually read my proposal. VernoWhitney (talk) 00:48, 20 October 2010 (UTC)
 * If my latest sock request goes through, that would be an excellent candidate, since he is both a serial infringer and a banned contributor. For that matter, that's true of your latest sock request, too. I am generally loath to roll back without manual review, but in either of those cases it would probably be warranted. --Moonriddengirl (talk) 02:57, 11 November 2010 (UTC)
 * Okay, so your sock didn't pan out, but has been confirmed as a sock of Earioh/LAVINA4194/etc... so I'm setting up for the first trial on their articles, and I'll be reopening Contributor copyright investigations/Jansport87 since the articles they created from scratch can't be rolled back. VernoWhitney (talk) 22:20, 11 November 2010 (UTC)
 * First test run complete - if anyone would be willing to go through VWBot's recent edits and tell me what I did wrong and what could be done better, it would be much appreciated. VernoWhitney (talk) 12:52, 12 November 2010 (UTC)
 * First, that's a very powerful tool! I'm sure it would go without saying that it would be implemented only with due consideration, but I'm a'sayin' it anyway, for posterity. That said, it seems to have done its job well bar one: it tagged talk:Kevin Coughlin but did not touch the article. I assume its operation was barred by the subsequent edit of Gigs. --Moonriddengirl (talk) 13:33, 12 November 2010 (UTC)
 * It wasn't barred by subsequent edits, since the idea is that whenever this is used, throwing the baby out with the bathwater is okay. In that particular instance he had already reverted it to the version just before CAFESDO touched it, so I need to add another check to make sure that the new text is actually different so it doesn't leave the message on talk (or maybe it just needs a different message saying that the now-reverted content was also likely copyvio in addition to whatever other reason may have been mentioned as the reason for removing it?). I've already fixed a bug with the link to the old version of the article (see the broken link that it added at the bottom of http://en.wikipedia.org/w/index.php?title=Talk:Teresa_Fedor&oldid=396304683).


 * Are the edit summaries good? Is the use of CCI with the added information below it good, or should it be using a custom template? Given the scale of this tool, nitpicking is acceptable and encouraged. VernoWhitney (talk) 14:15, 12 November 2010 (UTC)


 * I had thought it would throw the baby out with the bathwater, but presumed I must have been mistaken. :) Edit summaries: I might avoid naming the contributor in edit summary, but just refer to the talk. I don't have a strong rationale for that. I sometimes name contributors in edit summaries and sometimes not; it depends on the likelihood of "good faith." In this case, there's not much. :/ I would suggest that we cobble together a custom template for it to avoid confusing people. --Moonriddengirl (talk) 14:21, 12 November 2010 (UTC)
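The extra check described in this thread (do not leave a talk page notice when the rollback would not actually change the article, as happened at Kevin Coughlin) could be sketched roughly as follows. This is a minimal illustration only, not VWBot's actual code: the function name `plan_rollback`, its return values, and the choice to ignore trailing whitespace are all assumptions made for the example.

```python
def plan_rollback(current_text: str, target_text: str) -> str:
    """Decide what a rollback bot should do for one article.

    Hypothetical helper: returns "skip" when the article already
    matches the revision the bot would restore (for example because
    another editor reverted first), otherwise "revert_and_notify".
    """
    # Assumption: trailing whitespace differences are not substantive.
    if current_text.rstrip() == target_text.rstrip():
        return "skip"
    return "revert_and_notify"
```

A variant could return a third value for the "already reverted, but still likely copyvio" case, so that a differently worded talk page message can be left instead of none at all.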

Contributor copyright investigations/FlyingToaster
I've just done Pat Finucane Centre and Harold Blauer from the Articles 1-20 list. I would be grateful if someone could look over what I've done so I know I'm doing it right, and then I'll work through the rest of the list. I stubified both, I don't have the interest to re-write them tbh, but what's left preserves the sources for others. PF Centre lifted wholesale from uncited sources, Harold Blauer flew too closely to the text of its sources for my liking – I recognised bits of the article in the sources as I read them, which surely isn't a good thing! Thanks, Bigger digger (talk) 01:22, 29 October 2010 (UTC)
 * Nice work, everything looks good. :) Theleftorium (talk) 15:37, 29 October 2010 (UTC)
 * Thank you! Seems I got into this just before it became the "in" thing! What's all the talk of suppressing the history of an article? I haven't gone back into the history as the article when I investigated was clearly a problem, and there isn't much info on this history purging that I can find. Bigger digger (talk) 15:45, 1 November 2010 (UTC)
 * As far as I know, history purging is not a requirement. But I've been rather inactive on Wikipedia for the last couple of months so you may want to ask User:Moonriddengirl. Regards, Theleftorium (talk) 16:15, 1 November 2010 (UTC)
 * Well she seems a bit busy, and I've followed the directions on the CCI page, and there's a record of me trying to probe a bit further here, so I think I'll just keep going through them, and I can review later on if it becomes necessary. Thanks for the quick response. Bigger digger (talk) 16:25, 1 November 2010 (UTC)

Contributor_copyright_investigations/Jonathan329
I'm just trying to wrap up the text portion of this CCI and there are a couple of outstanding problems that I am not sure how to address. Boissière (talk) 15:46, 1 November 2010 (UTC)
 * List of Southeast Asian leaders - This article has been built up by simply copying the corresponding Wikipedia leader list article (as it was in mid 2007) for each country (hence the myriad of formats). None of these copyings has been attributed. Clearly someone *could* go in and correctly attribute all this lot but I am somewhat dubious as to the usefulness of this article at all. The alternatives seem to be either chop it down to just be the list of the existing leaders or delete the article outright as duplicating existing content.
 * Laguna local elections, 2010 - The lead has been copied (again without attribution) from a similar Philippine 2010 election article (it's difficult to say which one as they all have similar leads). Not sure there is enough copied to be a problem.


 * Sorry for the delay! Things have been crazy lately, as I think you know. :) I would probably let the second go, as it seems to be formulaic on Wikipedia. The former has little creativity, but I would probably go with chopping it down to just the list of existing leaders. I note at the talk page there has previously been comment about the usefulness of the mix. --Moonriddengirl (talk) 03:01, 11 November 2010 (UTC)
 * OK, agreed. I have excised the excess in List of Southeast Asian leaders so it is now just a list of the existing leaders (which hadn't been kept up to date either). That finishes the text portion of the CCI. As for the photos there are just three left. File:Kay-anlog Map.png (on en-wp) and File:Mapa kay-anlog.png (on Commons) are the same map but at different resolutions whilst File:Punta Map.PNG (on en-wp) is a similar map but is unused. However all three are based on commons File:PH Locator Laguna Calamba Barangay.png but they don't attribute it. I'm sure that this is sortable but I have not fiddled with files at all so further advice would be helpful. Boissière (talk) 23:05, 11 November 2010 (UTC)
 * Whoot! I've taken care of the image attribution, and it's a wrap! After-party, all participants! :D --Moonriddengirl (talk) 13:26, 12 November 2010 (UTC)

CCI bot
Following discussion with Moonriddengirl I intend to create a CCI bot. My initial thoughts are here. Any comments would be much appreciated on that page's talk page. Dpmuk (talk) 14:49, 26 November 2010 (UTC)

Hkdollarboy
Re I've found two (Charles Richard Stith & Tahuichi Academy) straightforward full copyvios. There aren't a huge number of articles started by him and many seem fine just stubs that other people have gone over since. Is it worth starting a CCI thingy or shld I just go thru the articles myself?--Misarxist 16:44, 2 December 2010 (UTC)
 * If you can go through them by yourself that would be great (there's a bit of a perpetual CCI backlog as you may have noticed). If you confirm 5 copyvios (our completely arbitrary cutoff) or so we can of course fire up a CCI so that none of their articles get overlooked and you don't have to do them all on your own. VernoWhitney (talk) 17:36, 2 December 2010 (UTC)

Wikipedia:Contributor copyright investigations/Ivankinsman
I am almost finished with the CCI on Ivankinsman, but some copyvios from after the CCI began have been found. Because of this, his CCI should be expanded to include the edits made after 20 October 2009 as well. Could somebody add those so I can check them? Yoenit (talk)
 * Yes, and you are a prince among men...or women for working so hard on this. :) (I don't know if there are gender associations with the name "Yoenit" :D) --Moonriddengirl (talk) 13:31, 3 December 2010 (UTC)
 * = . VernoWhitney (talk) 13:44, 3 December 2010 (UTC)
 * Whoa! How ever did you stumble upon that? :O --Moonriddengirl (talk) 14:00, 3 December 2010 (UTC)
 * I have not a clue. There are random templates and magic words hiding all over the place which do fun things, I just try to make a point of writing them down when I do stumble upon them. VernoWhitney (talk) 14:04, 3 December 2010 (UTC)

Mass rollback, take 2
It turns out that (see CCI) has a sock in  (see SPI) and it has been proposed that I use VWBot's mass rollback tool which is in trial to remove all of their edits to non-created articles. After the feedback I received from the first trial run I'll be posting a proposed edit summary and talk page template later today. Thoughts/questions/concerns are appreciated. Is there support for this? VernoWhitney (talk) 13:08, 9 December 2010 (UTC)
 * For reference the edits from the first trial can be seen here and my proposed messages for the bot to leave can be seen and edited at User:VernoWhitney/Sandbox2. VernoWhitney (talk) 15:18, 9 December 2010 (UTC)


 * I support this. We haven't even had time to evaluate the articles from prior to his block; adding the 1326 articles remaining after the hundreds of articles G5ed is an unreasonable burden. Copyright violations supports indiscriminate removal of his edits, and this is a good time to do that. --Moonriddengirl (talk) 18:28, 9 December 2010 (UTC)
 * I support as well. If a user banned for copyright violations makes a sock that could be doing that, we have no reason to assume good faith and keep them; revert is appropriate. Wizardman  Operation Big Bear 18:33, 9 December 2010 (UTC)
 * Okay, given the activity of this page that's a decent consensus, so in light of that and with the precedent of almost all of their created articles having been deleted earlier under WP:CSD and WP:CV, VWBot is off and running to revert 897 articles (the 1326 number is apparently inaccurate because I ran the contribution surveyor before a few hundred articles were deleted and it must've still had the old number cached). VernoWhitney (talk) 05:33, 10 December 2010 (UTC)
 * I noticed a bug in a spot-check about a quarter of the way through, so it's shut down until I can fix it. VernoWhitney (talk) 06:39, 10 December 2010 (UTC)
 * And we're off again; bug fixed and beginning with a re-edit of a few dozen articles which were rolled back to the wrong version. VernoWhitney (talk) 12:47, 10 December 2010 (UTC)
 * This is not my day. :( VernoWhitney (talk) 15:20, 10 December 2010 (UTC)
 * Sorry. :/ I'm a bit impaired today, but if there is cleanup needed, let me know, and I'll try to pitch in. (FWIW, I would consider resetting size to a bit higher than 100. I've always thought that was a bit low for the CCI program.) --Moonriddengirl (talk) 15:24, 10 December 2010 (UTC)
 * If you have a good suggestion for size let me know - 300 maybe? - I just looked through the two CCIs I closed yesterday and we found copyvios with edits as small as 178. I'll reset the number before I finish the run tonight - I'm going to have to reparse all of their edits anyways because it turns out I only fixed the bug for the first quarter of the articles before I had to run away this morning so it'll have to recalculate the earliest big edit for all of the articles it's reverted since then anyways. (I imagine we could ask for the contribution surveyor to be changed to a different size cutoff too since it's just a single number in the code as I recall.) VernoWhitney (talk) 15:43, 10 December 2010 (UTC)
 * If you've found them at 178, maybe 150? 175? Or maybe for autorevert, a larger number would be better and list the smaller edits for human review, since these are generally more likely to be formatting? I don't guess there's any way to limit it to text strings. --Moonriddengirl (talk) 15:46, 10 December 2010 (UTC)
 * There are some ideas for parsing the actual text introduced by an edit (it's amazing how many people get interested when a bot starts rolling back a few hundred articles!), but I'm not sure how much benefit they would provide for the massive increase in processing time it would require without doing some test runs (without edits) to see where it categorizes things. I'll look into this more after finishing Accotink2's run tonight.
 * I think the idea of auto-reverting only larger edits seems good, but then would it list smaller, earlier edits to those auto-reverted articles for human review too? That sounds odd to me, and it's still a large proportion of acceptable to copyvio edits at those sizes, so maybe the smaller ones should just be written off entirely even for human review as not worth the time? VernoWhitney (talk) 16:01, 10 December 2010 (UTC)
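The idea floated above (auto-revert only the larger additions, list mid-sized ones for human review, and possibly write off the smallest) amounts to a three-way split on edit size. A rough sketch follows; the function name `triage_edits` is hypothetical, and the cutoffs of 300 and 150 bytes are just two of the numbers mentioned in the discussion, used here as placeholder defaults rather than agreed values.

```python
def triage_edits(edit_sizes, auto_revert_min=300, review_min=150):
    """Sort added-byte counts into three buckets.

    Assumed (illustrative) cutoffs:
      size >= auto_revert_min            -> "auto_revert"
      review_min <= size < auto_revert_min -> "human_review"
      size < review_min                  -> "ignore"
    """
    buckets = {"auto_revert": [], "human_review": [], "ignore": []}
    for size in edit_sizes:
        if size >= auto_revert_min:
            buckets["auto_revert"].append(size)
        elif size >= review_min:
            buckets["human_review"].append(size)
        else:
            buckets["ignore"].append(size)
    return buckets
```

Running something like this over past CCIs without making any edits would be one way to see where each cutoff puts the confirmed copyvios before committing to a number.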

I understand the reasoning but I want to voice my opposition to this unless a better solution can be developed. I just watched my watchlist fill up (which in itself is great) but after reviewing 20 - 30 of the "reversions" done by the bot, not 1 had any copyvio in it. Additionally several others have complained about the same problem. After discussing this with VM on their talk page it was suggested I bring it here. Using the example article I used on their talk page, Accotink did some early edits to the Ernest Spybuck article and since then a lot of other edits have been made by several other editors. If this bot goes back and reverts all that it will not only cause harm to the article (and hundreds or thousands more) but it will also anger users, do more harm than good to WP and the articles in it and give this bot, its operator and the process in general a bad name. Now myself and other editors are forced to go through hundreds or thousands of articles to fix this mess that's being caused by a knee-jerk reaction to a problem. Rather than continue to leave comments on a completely hidden page that almost no one has visibility of I suggest some limitations for the bot that at a minimum should be met before going any further.
 * 1) Move this conversation to a more public venue like the Village pump. Especially now that it is causing so many problems
 * 2) Do not revert any article that has been through a GA or better review process or Peer review process since Accotink or one of their socks edited it. If it's gone through these processes any problems would have been caught and fixed.
 * 3) Do not revert the article if more than 3-5 edits have been made since Accotink's last edit
 * 4) Do not revert if more than 1500 bytes of info have changed since their last edit.
 * 5) Do not revert if all that's being reverted is a minor edit
 * 6) Do not revert if the only action was a change to External links or if the editor was adding or expanding an inline citation. --Kumioko (talk) 16:18, 10 December 2010 (UTC)
 * I was originally going to go and fix the articles that this bot messed up but since it just reverted a couple of my reversions I am not going to waste my time. You need to fix this bot and my recommendation would be to go back and undo all the reversions you just did until the code in the bot can be corrected. This is causing more problems than is necessary and is quite frankly ridiculous --Kumioko (talk) 16:29, 10 December 2010 (UTC)
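For anyone weighing the numbered limits proposed above, points 2 to 6 could be expressed as a single "skip this article" test along the following lines. This is only a sketch of the proposal as written, not something the bot does: the `ArticleState` fields and the function `should_skip_revert` are hypothetical names, and "3-5 edits" has been arbitrarily read as a default of 5.

```python
from dataclasses import dataclass


@dataclass
class ArticleState:
    """Hypothetical summary of an article since the flagged contributor's last edit."""
    edits_since: int               # edits made by other users afterwards
    bytes_changed_since: int       # bytes changed afterwards
    reviewed_since: bool           # passed GA or peer review afterwards
    only_minor_edits: bool         # the contributor's edits were all minor
    only_links_or_citations: bool  # only external links or inline citations touched


def should_skip_revert(state, max_later_edits=5, max_bytes_changed=1500):
    """Return True if any of the proposed limits says not to revert."""
    return (
        state.reviewed_since
        or state.edits_since > max_later_edits
        or state.bytes_changed_since > max_bytes_changed
        or state.only_minor_edits
        or state.only_links_or_citations
    )
```

Articles skipped this way would still need to be listed somewhere for human review, since skipping the revert does not show the text is clean.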


 * The bot's not (for the most part - see my bug) causing problems - the contributor has caused problems and the bot is bringing them to light. If you think going through a peer review process is sufficient for presuming clean of copyvio perhaps you missed Contributor copyright investigations/ItsLassieTime and Contributor copyright investigations/Vanished 6551232. Checking the number of changes since their edits is perfectly reasonable, but leaves possible copyvio lingering for longer while it won't be checked by humans. Truly minor edits aren't being reverted. Parsing the actual text has (as of today) been mentioned at my talk page and will be looked into after this trial concludes. VernoWhitney (talk) 17:28, 10 December 2010 (UTC)
 * But if you're going to be done with the run tonight...then investigating parsing text will be pointless. You are correct, I don't think going through a peer review is a 100% sure way of identifying the problems, but I do know 100% that a flat bot run to revert every edit the user ever made is bad. It appears to me, though, that only a small percentage of the edits the individual performed were actually copyvio (less than 10% from what I have seen so far). I guess this means instead of doing actual edits and continuing to build up WPUS as I have been I will have to break from that and comb through this mess for the next several days or weeks. I really have better things to do than manually comb through 900 pages but it's the only way to make sure this is done correctly apparently. I apologize for the tone and I know you are only doing what was dictated by consensus but this is still ridiculous. The bottom line here is that nothing we do short of deleting the entire article is going to 100% ensure that there is no copyvio. That argument is like saying if an article doesn't have inline citations it should be deleted because we don't know it's not a copyvio. --Kumioko (talk) 17:40, 10 December 2010 (UTC)
 * Looking into parsing text is because this isn't just a one-time problem; we see many repeat infringers here at CCI and there aren't enough volunteers to keep up with their business. The backlog of open CCIs has increased by at least 20 cases in the past six months. We appreciate all the help we can get with copyright cleanup, and if there was enough help then this task wouldn't be needed. Obviously we can't ensure that an article is copyvio-free, but we can at least do our best to try and get edits specifically examined for possibilities of copyvio before restoring them to public view. VernoWhitney (talk) 17:51, 10 December 2010 (UTC)
 * I can see there is no argument I can present that will undo the damage your bot has done and that you have no desire or intent to fix it. I would consider submitting this to ANI as an issue but I doubt that would do much good so I guess now I am stuck with chasing after your bot on a recurring basis starting with this group. My original thought was that I would just look at the ones that pertain to WPUS but after some review it appears the vast majority do. Oh well. Sorry for the attitude but I stayed in the shadows and did my edits building up the Medal of Honor recipients for the last couple years and only recently got involved in things like this because I got tired of people taking short cuts, making poor decisions because people weren't taking the time to comment, or otherwise doing the things (or not doing them) that needed to be done. This falls into the short cut one and the bad decisions and not doing the things that need to be done (fix the bot). Unfortunately I am the lone voice so I am left with the cleanup or let it go. --Kumioko (talk) 18:03, 10 December 2010 (UTC)
 * I retract some of my earlier comment, I am not the lone voice and I wasn't left with the cleanup. Apparently a lot of editors had a problem with the bot and your bot's edits are being reverted on a massive scale. Just FYI. So far it looks like about 150 (maybe more) by what I can see. --Kumioko (talk) 18:08, 10 December 2010 (UTC)
 * I'm sorry you feel that way, but I honestly do appreciate your feedback. This bot task is in trial and the purpose of a trial is to fix whatever needs to be fixed and get reactions once people actually see the bot running. I really will look into text parsing if that's what other editors want to see and hopefully it will improve on the results. I can't say I'm fond of mass reversions but at least for now there isn't another way to actually keep up with the repeat copyright infringers, and so if there's a way that the rollbacks can be improved to reduce collateral damage I'd like to do it. VernoWhitney (talk) 18:11, 10 December 2010 (UTC)

Recommendation to change slightly the policy regarding the edits made by known copyright violators
Due to the ongoing discussions here regarding the current policy of dealing with copyright violators I started a discussion at the Village pump (policy) here to modify slightly the wording of the current policy. --Kumioko (talk) 15:11, 11 December 2010 (UTC)

LRBurdak CCI
I think that the above mentioned CCI also needs an image check done. There seem to be rather too many 'stock' images of various politicians and academics (nearly all marked 'self created') for them to plausibly belong to this contributor. Boissière (talk) 23:48, 14 December 2010 (UTC)


 * Verno and Mer-C usually list those. Verno, any chance? --Moonriddengirl (talk) 20:57, 16 December 2010 (UTC)


 * Yeah, I'll do it tonight. VernoWhitney (talk) 21:01, 16 December 2010 (UTC)


 * ✅ Appended to Contributor copyright investigations/20101122 2. VernoWhitney (talk) 01:27, 17 December 2010 (UTC)


 * A good chunk of the images have been transferred to Commons, so there's a broken trail there. A majority of them are scanned images, so there could be a bit of an interpretation issue of "I scanned it so I own the copyright to the digital version"; I've tagged one such image for deletion. -- Spaceman Spiff 06:45, 17 December 2010 (UTC)


 * Thanks for generating that list Verno. As a result of that list I have created a subpage here which gives an overview of certain properties. Feedback welcome on the utility of this in processing the image list.
 * PS : Whilst producing the overview I discovered that File:Jhabua tribe2.JPG was actually a redirect. Should this have appeared in the list? Boissière (talk) 21:34, 18 December 2010 (UTC)


 * My program just pulls up every upload log entry of theirs and checks to see if the page exists or not (with some messing around to check for images moved to commons) so it won't list images twice and it won't list redlinks, but anything else will still be there. I hadn't considered the possibility of redirects since it doesn't happen very often with images, but I'll make a note and see about actually posting the image instead of just the redirect on future lists when I get a chance. VernoWhitney (talk) 01:51, 19 December 2010 (UTC)
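The listing logic described above (walk the upload log, drop redlinks and duplicates, and in future list the actual image rather than a redirect to it) can be illustrated with a small sketch. It is not the real program: the function `build_image_list`, the shape of the `pages` lookup, and the `redirect_to` key are assumptions for the example, and the handling of images moved to Commons is left out entirely.

```python
def build_image_list(upload_log, pages):
    """Return the file pages worth listing for review.

    upload_log: file titles from the contributor's upload log, in order.
    pages: hypothetical lookup of existing pages; a missing title means
           the page is a redlink, and an optional "redirect_to" entry
           names the page a redirect points at.
    """
    listed = []
    seen = set()
    for title in upload_log:
        info = pages.get(title)
        if info is None:
            continue  # redlink: the file page no longer exists
        # List the redirect target rather than the redirect itself.
        target = info.get("redirect_to") or title
        if target not in pages or target in seen:
            continue  # broken redirect, or already listed once
        seen.add(target)
        listed.append(target)
    return listed
```

With this shape, a case like File:Jhabua tribe2.JPG would show up under the name of the file it redirects to instead of as the redirect.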

Tweak Contribution Surveyor?
As was brought up briefly above, there really aren't any copyvios being found at the +100 byte size of contribution. Running with this idea I've gone over every single Archived and Checked-but-never-opened CCI and where possible determined the lowest byte-count addition where copyvio was confirmed or presumed. The lowest I found was at +155 and there were only 6 cases (out of the 50 or so which weren't image-only) where any problems were confirmed or suspected at less than +200.

Given my findings I was thinking of asking Dcoetzee to change Contribution Surveyor's code to set the "default" run to +150 characters instead of +100. That would let us ignore even more of the trivial edits and shouldn't significantly increase the false negatives. Objections? Other opinions? VernoWhitney (talk) 19:46, 16 December 2010 (UTC)
 * No objections here. As we are on the subject there are a couple of further tweaks to the Contribution Surveyor that I feel might be quite useful.
 * To indicate whether a listed article was created by the contributor. It's only a gut feel but copyvios seem to be more likely in new articles.
 * To indicate whether an article is now a redirect. This could be used to flag possible copy/paste moves.
 * Thoughts? Boissière (talk) 20:54, 16 December 2010 (UTC)


 * Agree on all counts. --Moonriddengirl (talk) 20:56, 16 December 2010 (UTC)


 * Hrmm... my suggested change should just involve updating one variable ($major_edit_char_count) if I recall correctly; your ideas could be helpful but would involve additional coding. I'll ask Dcoetzee to stop by. VernoWhitney (talk) 21:13, 16 December 2010 (UTC)


 * Hey all, I've added a field to the form for $major_edit_char_count and set the default to 150. I added a little bold N in front of articles created by the person (as in Special:Contributions). As for redirects, I'm not sure how helpful that would be since normally the page which is now a redirect is the page the text was removed from, rather than the page the text was added to. I'm all up for adding some kind of copy-paste move detection though, if there's a good reliable way to do so. Dcoetzee 00:36, 17 December 2010 (UTC)


 * That's outstanding, thanks! VernoWhitney (talk) 01:05, 17 December 2010 (UTC)


 * I'm trying to recall the possible issues that I have found when discovering that a CCI article is a redirect. I think that is probably along the lines that *something* might not be quite in order, especially if the text has been moved elsewhere, possibly without attribution. On the other hand seeing that an article is a redirect would indicate that the article in question is not currently a copyvio as any inserted text cannot now be seen. Anyway I am not that fussed about this. Boissière (talk) 21:46, 18 December 2010 (UTC)
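To make the change discussed in this section concrete, here is a rough sketch of a size cutoff with a default of 150 and an "N" marker for articles the contributor created. It is not Contribution Surveyor's code: the constant name simply mirrors `$major_edit_char_count` from the discussion, while `list_major_edits`, the tuple layout, the output format, and treating the cutoff as inclusive are assumptions.

```python
MAJOR_EDIT_CHAR_COUNT = 150  # default cutoff agreed above (previously 100)


def list_major_edits(edits, threshold=MAJOR_EDIT_CHAR_COUNT):
    """Format the edits worth checking, largest addition first.

    edits: iterable of (article, added_chars, created_by_contributor).
    Edits adding fewer than `threshold` characters are dropped
    (assumption: the cutoff itself is included).
    """
    lines = []
    for article, added_chars, created in sorted(edits, key=lambda e: -e[1]):
        if added_chars < threshold:
            continue
        prefix = "N " if created else ""  # "N" marks articles the contributor created
        lines.append(f"{prefix}{article} (+{added_chars})")
    return lines
```

Because the cutoff is a parameter rather than a fixed number, a run at a lower value remains possible for contributors whose known copyvios were unusually small.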

Which way?
Gianni Rivera and Sandro Mazzola  are copies though I can't work out which way. Wayback archives the site though I can't find those pages earlier than this year. I've gone thru all of & db-copyvioed about half a dozen, the rest seems fine.--Misarxist 09:59, 17 December 2010 (UTC)
 * I think that these two are fine too. These two pages seem to be a snapshot of the Wikipedia pages sometime in late 2007. They contain quite a bit of text that was added by users other than Hkdollarboy. Boissière (talk) 21:39, 18 December 2010 (UTC)


 * Thanks & another one: Ardeshir Cowasjee seems to turn into here . Is that them copying us?--Misarxist 10:22, 24 December 2010 (UTC)
 * That is definitely them copying us. The Wikipedia article was built over many revisions by different contributors. Yoenit (talk) 10:39, 24 December 2010 (UTC)

"Diff" template and popups
Is there some easy way of converting the links on the older investigations pages to the Diff template so that it doesn't break Popups?--Misarxist 10:24, 24 December 2010 (UTC)

Ivankinsman CCI completed
I just finished the Ivankinsman CCI case, so it can be closed. Yoenit (talk) 01:43, 25 December 2010 (UTC)

Help with cleaning up a copyvio
I was studying Contributor copyright investigations/De Administrando Imperio 2 for any Ethiopia-related articles with problems when I found one in Ogaden National Liberation Front, buried in the middle of a number of revisions. Could someone (1) remove this from the history, & (2) leave me a pointer to how to do it myself in the future? Feel free to ask me for more specific information to find the specific revision. (I'd try to do it myself, but I have a bad cold which is making me more stupid than usual. Right now I'm more likely to make things worse than better.) -- llywrch (talk) 23:00, 17 January 2011 (UTC)
 * I've removed the violation from the current article here (I assume this is the one you identified). Revision deleting it wouldn't be feasible as we'd have to delete all subsequent versions that contained the vio. I've left the standard note on the article's talk page to warn against restoration. Thanks for picking it up. --Mkativerata (talk) 23:08, 17 January 2011 (UTC)
 * That's the one. I was hoping revision deleting could be used here, just wasn't sure how to use it. Thanks. -- llywrch (talk) 06:16, 19 January 2011 (UTC)

Reporting a violation
Hi, sorry, I'm new to this process. On 24 November, User:JeffreyLiu-NJITWILL pasted in a couple of lengthy paragraphs from a website, word for word, which was obvious from the chopped-off line endings in the WP text when I viewed it today. I see other text in that article, not from the diff I've identified below, that is from the same site ("The fact that ...").

Diff

The source is

I suppose his/her contribs will now need to be audited. I'll alert the user to this breach and ask him whether this violation is an isolated instance, perhaps even a lapse that was meant to be paraphrased and attributed almost immediately.

It seems to be useful text. Is the normal procedure to ask the user to paraphrase and attribute it now, or simply to remove it completely? Tony  (talk)  13:47, 4 February 2011 (UTC)


 * Hi there. This board is meant for large scale issues rather than individual articles, and the user would have to have at least 5 different problematic submissions to warrant opening a CCI. As the editor has only ever edited the one single article, there's a couple of things that can be done:
 * Simply excise the content and rewrite it from scratch, in two separate steps
 * If the copy / pasted content is a suitable quote, it can be turned into a proper citation and attributed (not the case here)
 * Otherwise, mark the content at issue with {{subst:copyvio}} and list it on WP:CP, where it will be reviewed in a week.


 * Thanks for spotting this. MLauba (Talk) 14:29, 4 February 2011 (UTC)
 * Thank you, MLauba. Not the case because the URL is not authoritative, I guess you mean; or the segment is too long. I'll try to rewrite it, but it's not my area. Tony   (talk)  14:34, 4 February 2011 (UTC)
 * Yup, had the same issue myself, completely outside my subject matter expertise and too technical for my limited English to rewrite. MLauba (Talk) 14:37, 4 February 2011 (UTC)

Image contribution surveyor tool
Enjoy. MER-C 03:41, 3 March 2011 (UTC)
 * Hey, that's fabulous! I tested it out on Verno. :D It refused to handle me; too many edits, maybe? I'll go link it at the instructions. --Moonriddengirl (talk) 15:49, 3 March 2011 (UTC)
 * Oh, you already did. I revised it a bit to bring it in line with the section above and to give you credit. :) --Moonriddengirl (talk) 15:53, 3 March 2011 (UTC)
 * The timeout was caused by Wikipedia not responding in time which was in turn caused by the large number of image uploads. MER-C 06:47, 4 March 2011 (UTC)

The Darius Dholmo nightmare. Over?
Category:Articles tagged for CCI copyright problems, the category created to handle the Darius Dholmo mess is now empty. Let us rejoice. Nevertheless, I suggest keeping the category for now: there have been numerous cases in which the tag placed by Uncle G's bot was incorrectly removed and I think it's likely that we'll see a few reverts to the tagged revision. Pichpich (talk) 16:59, 8 March 2011 (UTC)
 * I've removed the backlog template - I agree to keep the category for now. Skier Dude  ( talk ) 07:06, 9 March 2011 (UTC)

Spot check on contribs
Hi, while clearing off a WP:CP entry, I met, whose recent article contribs (one deleted, one tagged as close paraphrase) hold an unhealthy dose of the dreaded daily special, the CopyPasta. While I haven't gone deep enough to request a full investigation, I think this gentleman's work might require a deeper spot check to ensure the rest is OK. MLauba (Talk) 14:26, 17 March 2011 (UTC)

Clerking
I noticed that the CCI area seems to be quite backed up, and I'd love to help out. Objections?

Regards, MacMedtalk stalk 19:40, 20 March 2011 (UTC)
 * How much experience do you have with copyright problems? You don't have to be a clerk to check the massive piles of articles we have here. MER-C 04:50, 21 March 2011 (UTC)

Wtimrock CCI - Article count
I have just looked at this CCI and the blurb says that the survey found 416 articles but there are only 300 listed. Which is correct?

PS : Is poor Tobby72 going to get investigated at all?

Boissière (talk) 22:30, 20 March 2011 (UTC)


 * 1) Ask Moonriddengirl, she was the one that created that particular CCI.
 * 2) Yeah, I know, but that request is TLDR. I spot checked and didn't find much. If you find an additional copyvio by this user then feel free to open the investigation. MER-C 04:54, 21 March 2011 (UTC)


 * I got lucky and found a close paraphrase in this edit. Opened. MER-C 02:49, 25 March 2011 (UTC)


 * Sorry I didn't see this sooner. :) I have sometimes cut very small listings from the end of CCIs. --Moonriddengirl (talk) 11:34, 16 April 2011 (UTC)

Becoming involved with this process
A user popped by my talk page, after my unsuccessful RFA, with some suggestions about how I can become involved in this area. I gather that there are not a whole lot of active users that are interested in doing copyright work, so I would like to lend a hand. However, this does appear to be a bit of a walled garden, and I can't quite figure out what the procedure is for checking contributions. Is it just a simple Google search for the added content, combined with an evaluation of the current state of the article, mixed with a bit of common sense and copyright policy? I think more people would become involved if the system itself were easier to understand. -- Nick Penguin ( contribs ) 15:31, 4 April 2011 (UTC)
 * Take a look at WP:Cv101, and don't miss WP:Cv101 at the bottom. Flatscan (talk) 04:46, 8 April 2011 (UTC)
 * Note that some of our copyright violators extensively use print and/or subscription only sources. MER-C 05:22, 16 April 2011 (UTC)
 * Do we need to revise our CCI instructions? Certainly we could stand to link to WP:Cv101. I'm all for encouraging assistance any way we can. :) --Moonriddengirl (talk) 11:36, 16 April 2011 (UTC)

How to update?
I've been working sporadically on Contributor copyright investigations/ItsLassieTime. A few new contributors/socks have been identified; the contributions checked and scrubbed. Do they need to be added to the report? If so, how exactly is that done? Thanks. TK  (talk)  19:19, 25 May 2011 (UTC)

Question
I know that I am rather new to this process, but is there a way that we could help eliminate the backlog that is currently on some of these cases that are lasting two years or more? It doesn't seem as though there are a lot of people who are active in this process in terms of clearing the backlog, but I would be willing to help get people on board should there be a general consensus to create a drive of sorts. I personally have an investigation ongoing against me and I really don't want to see this open in a few years as it will just be a pain should I want to run for something and it isn't even half done. It's just a thought, but I feel like doing this will be quite a good thing for increasing the credibility of a process where things just seem to initially be worked on, and then languish for a number of years. Kevin Rutherford (talk) 23:23, 30 May 2011 (UTC)
 * If you can figure out how to get people involved in any way, shape or form, that would be fantastic. I hate the backlog here. :/ --Moonriddengirl (talk) 23:32, 30 May 2011 (UTC)
 * I don't know if I could commit to it completely, but is there a way that we can bring the process more into the mainstream so that it would garner more attention? It might even be good to implement a clerk process where users who want to can help to not only clear the backlog but maintain the pages. Kevin Rutherford (talk) 19:41, 4 June 2011 (UTC)
 * Thanks; CCI has clerks, though. What we really need are people to do the necessary work of checking articles. --Moonriddengirl (talk) 21:16, 4 June 2011 (UTC)
 * Ah, I forgot this even though a year ago I knew the answer. I know asking the clerks to do this wouldn't be good, but I wonder if we could advertise on a noticeboard, see what kind of response we get, and go from there. Considering there are between 20 and 35 pages that need clearing, it would be good to at least attempt some sort of action at this point. What are your ideas for going about this? Kevin Rutherford (talk) 22:56, 4 June 2011 (UTC)
 * I've tried advertising at AN a couple of times, but so far have not seen much response to it. We did a Signpost piece, although we didn't focus that much on CCI but on copyright in general. Every time I open a new CCI personally I advertise it at relevant noticeboards, and sometimes that has gotten us assistance. What noticeboard did you have in mind? --Moonriddengirl (talk) 23:03, 4 June 2011 (UTC)
 * Well, there is always the option of posting on the Village pump (miscellaneous) page as well as canvassing on the main IRC channel. It might be a stretch, but involving editors over at the Copyright problems might also be a good start since the people over there are doing work in a similar area as this one. Kevin Rutherford (talk) 00:20, 5 June 2011 (UTC)
 * There's nobody who works at WP:CP who doesn't know about this. :) (More specifically, User:NortyNort is very busy with SCV and CP; User:MLauba is only here part time.) --Moonriddengirl (talk) 00:22, 5 June 2011 (UTC)
 * Maybe I'll try to see if Sonia could help us out in her spare time. Maybe we should explore working with the ambassador program to see what they think of involving new students with copyright issues. The idea with that is it would help to show new users how to correct copyright issues, and therefore help prevent them from slipping up. Kevin Rutherford (talk) 01:16, 5 June 2011 (UTC)
 * I think we have better streamlined the system at CP and as always, more "manpower" is needed. I want to help at CCI but I get bogged down at CP and SCV with the time I have. This day recently burned me out. I have been thinking of ways to advertise but if MRG gets lukewarm responses, I don't know how much more I can help. Maybe advertising the space as a great way to gain admin experience can be an advantage.--NortyNort (Holla) 01:37, 5 June 2011 (UTC)
 * I'm all in for advertising on IRC should we need people. Isn't there a centralized copyright channel somewhere or am I just imagining things? Kevin Rutherford (talk) 15:08, 5 June 2011 (UTC)
 * I think Wikipedia talk:WikiProject Copyright Cleanup is a good hub.--NortyNort (Holla) 22:32, 5 June 2011 (UTC)


 * Rutherford, we don't need extra people. We're staring at the obvious: why don't we who are under investigation do each other's investigations instead of searching for people who don't want to do it? Most copyright violators barely know each other, so it would be neutral if we each read up on the policy, then dedicated ourselves to clearing at least one investigation on another user. If every single copyright violator did it, then we would be clear in no time. ΔΥΝΓΑΝΕ (talk) 02:57, 7 June 2011 (UTC)
 * Just to throw an idea out there: how about we have a bot post a template to every article talk page, noting that material may have been copied to the page and listing the relevant diffs, and asking people to examine it and report their findings on the CCI page or WP:CP? We would need to ensure the messages are not archived until the concern is addressed, and remove them (or replace them with cclean) once it has been. Yoenit (talk) 07:08, 7 June 2011 (UTC)
 * A lot of articles aren't viewed at all by many people. The key is to actually get people to the article, and to read it, perhaps by putting them at the top of some WikiProject list or something. For example, they could be tagged with a "possible copyvio" template that automatically moves them up on some sort of list at the WikiProject page. ΔΥΝΓΑΝΕ (talk) 01:18, 8 June 2011 (UTC)
 * It is not the solution for everything, but it would definitely attract more attention than we get now. We have perhaps over 100,000 articles currently in CCI, and a significant number of those are going to have active talk pages. Even if it helps clear out only 10% of the backlog, it is still a big help. Putting a tag in the article itself rather than on the talk page is rather heavy-handed, and might be an idea only if it turns out that talk page notices are not working. With regard to WikiProjects, my experience with them (mainly WP:MIL) is that lists like that do not work. Yoenit (talk) 06:58, 8 June 2011 (UTC)
 * I think talk-page notices are a better route. I am not a fan of tags on articles unless the problem with that article itself is visibly plausible or apparent, not just based off of probable cause. Also, editors interested enough to put time into reviewing the article and fixing a potential problem would monitor the talk page well.--NortyNort (Holla) 11:25, 8 June 2011 (UTC)
 * Thousands of articles have gotten zero edits for years, and it's safe to say that they've been viewed by almost no one. Tagging them will not help at all since there is no one to find the tag. You need to get people to actually reach the article first. ΔΥΝΓΑΝΕ (talk) 20:44, 10 June 2011 (UTC)

Little help needed
Another user has just substantially rewritten College of Arms, and a lot of the phrasing suggests to me there's a lot of copy and paste or insufficiently distant paraphrasing going on. Where do I even begin? → ROUX   ₪  16:06, 27 January 2012 (UTC)


 * By asking him? The researching and writing of the article took me about one year, on and off. You can see for yourself at this sandbox's history: User:Sodacan/Sandbox4/Box4, going back to February 2011. The greatest text used for most of the article is Sir Anthony Wagner's brilliant, but exceedingly large Heralds of England. The "insufficiently distant paraphrasing" is I'm afraid the fact that English is my second language after Thai. No copy and paste have been made, most of my sources are in book form. There is no definitive text on the subject, especially not from a modern viewpoint. Wagner is brilliant but very lengthy, so only snippets were prized out. Mark Noble is another good one, but being published in 1805 is limiting. Finally the College of Arms's own website is very informative and is in itself encyclopedic, so the structure of many parts of the article follow those as set out from the website. All of these are cited and referenced to the appropriate source, in fact not a single paragraph of the article is not cited. If there is issue with the content and research of the article I am happy to go through it sentence by sentence. But if this baseless suggestion is the reason why, then I can't help but think that those discussions we had in the talk page were not in good faith. Unless you have definitive proof of a copyright violation we should refrain from any more discussions on the article, because only one of us would be doing so under good faith. Sodacan (talk) 17:55, 27 January 2012 (UTC)


 * Don't you dare accuse me of not acting in good faith. The article as rewritten by you is riddled with grammatical errors, is largely unreadable in parts, and given the tenuousness of your grasp of the difference between 'evidence' and 'opinion,' I am concerned about how well your information is sourced, and how well the text you wrote is actually supported by the sources given. My concern about copying wholesale from sources is a valid one given the archaic language used in most texts about heraldry and the similarly archaic phrasing you have used. Frankly I don't care what your sources are; I care that copyright is not being infringed. Thus I asked for help here, as history on Wikipedia indicates that the overwhelming majority of people either don't understand when they have infringed copyright (which may be the case here) and are therefore unable to even understand how to help, or they know full well they have copied and therefore do not want to help. → ROUX   ₪  18:05, 27 January 2012 (UTC)


 * How can I not? You have made your opinion very clear. The accusation of copyright infringement is very serious, and it has been made very swiftly by you. It took me a year to complete the article, I put it out for two days and this is what is seriously being considered? The key here is proof, like I said, I am happy to go through it line by line. I am quite aware of the difference between evidence and opinion, that case was a very bad demonstration from me, but I still have full faith in the rest of the article and will stand by it. Funnily enough those reasons you cited are part of my proof that I wrote it, the mistakes and the errors. The article as I wrote it is not perfect and it still needs a lot of work, I know that. The community will deal with that. Sodacan (talk) 18:25, 27 January 2012 (UTC)


 * You're quite right. When I am concerned about an article I shouldn't ask the experts for help in either validating or alleviating my concern. How stupid of me. → ROUX   ₪  18:27, 27 January 2012 (UTC)


 * You are right, you have every right to ask the experts, but clearly this is your issue to sort out and not mine. I was just a little offended you didn't ask me first before you decided to raise this concern. Sodacan (talk) 18:31, 27 January 2012 (UTC)