Wikipedia talk:Contributor copyright investigations/Dr. Blofeld CCI cleanup/Archive 1

Joining as a member
Hi. I'd be interested in helping out a bit, but I'm fairly busy right now. I have to do stuff like study for my upcoming exams. That said, I might be able to help out from time to time. Is that something you'd be okay with? Clover moss (talk) 03:49, 15 January 2020 (UTC)
 * Of course (I also have exams). Helping out here doesn’t need to be a full-time thing; at CCI, even the smallest step forward is a step forward. 💴Money💶💵emoji💷Talk💸Help out at CCI! 04:02, 15 January 2020 (UTC)
 * Thank you. I guess I'll add my name to the list, then. Clover moss  (talk) 04:03, 15 January 2020 (UTC)

Might be helpful
Hi Money Emoji. It might be helpful to have a look at the wee drive we set up for Darius Dhlomo; there might be some useful ideas/content there: WikiProject Copyright Cleanup/Darius Dhlomo Drive. In particular, I found the Progress Chart to be a motivator towards daily progress. — Diannaa 🍁 (talk) 05:03, 15 January 2020 (UTC)

And look at *Kat*'s talk page! Wizardman sent out invitations: User talk:*Kat*— Diannaa 🍁 (talk) 05:08, 15 January 2020 (UTC)
 * Thanks for the suggestions. I've added a progress chart at the bottom of the page, and I'll send out invitations to the usual gang of copyright people when I have the time. 💴Money💶💵emoji💷Talk💸Help out at CCI! 20:07, 15 January 2020 (UTC)
 * Thanks, it looks great! I've made a couple minor changes to the invitation; I hope you don't mind. :) — Diannaa 🍁 (talk) 01:53, 16 January 2020 (UTC)

Pages with more edits than shown in the list.
Hi - first time helping out with something like this, and almost straight away a question: on this subpage there appears the following entry: Kuta (2 edits): (+174). However, looking at the history of the page, there are more than 2 edits by Dr. Blofeld to this page, adding way more than the 174 bytes suggested here. Do these other edits appear in another entry? Am I supposed to be commenting on just the two edits in question or looking at the whole page history? Leaving this particular entry blank until advised. Cheers, Gricehead (talk) 09:54, 15 January 2020 (UTC)
 * No, just go off the diffs listed. The other edits are too small to be listed, and the bigger edits were removed by an automatic culling script, which is why it says (2 edits) when there’s just one listed. The edit in question is basic prose. Thanks for helping out, 💴Money💶💵emoji💷Talk💸Help out at CCI! 11:24, 15 January 2020 (UTC)

If the article has been redirected/deleted...
Example from page 44. The source added in those diffs is long dead and the prose no longer exists in any form on a live Wikipedia page. What do we mark this as?  spryde |  talk  19:16, 15 January 2020 (UTC)
 * If the prose is totally gone from Wikipedia, paste  after the diff. Upon inspection, the prose was actually moved to History of Cumbira, where I removed it as OR, as the source cited for it did not really back it up. After some dirty work on the Internet Archive, I found the news story the info was cited to, but the articles were not similar upon comparison, so  not a vio. 💴Money💶💵emoji💷Talk💸Help out at CCI! 19:57, 15 January 2020 (UTC)
 * Thanks. That helps.  spryde |  talk  20:22, 15 January 2020 (UTC)

Is this basic prose?
I thought it would be good to have a section of the talk page where cleanup-newbies can discuss examples that come close to the basic prose threshold so that we can learn and adapt. For instance, is the following (taken from batch 25) still basic prose? Pichpich (talk) 20:43, 15 January 2020 (UTC)
 * N Asit Sen (director) (2 edits): (+1501)(+159)
 * Yes, since it's just saying where he was born and what films he directed in the most basic of ways. "Sen worked with some of the most prominent actors in Bollywood during his career." is a bit of a borderline case, but I would say it's ok. 💴Money💶💵emoji💷Talk💸Help out at CCI! 21:11, 15 January 2020 (UTC)

Stubs and pictures
I'm mostly attacking the location and movie stubs as I find them, since those are factual information with no embellishments and there are a lot of those. I'm leaving aside the diffs that contain pictures and longer paragraphs as those require more care. Is there a tool that can be used to check for picture copyright, or should I just continue what I'm doing and leave those alone? Zera/talk 21:19, 15 January 2020 (UTC)
 * Unless there is a significant amount of non-basic prose in the file caption, file insertions can be treated like anything else that is considered basic prose, as this investigation is based on text, and a separate one can be opened on his uploads to the File: namespace. That won't be necessary, fortunately, as DrB is not known to have problems with image copyright. 💴Money💶💵emoji💷Talk💸Help out at CCI! 21:51, 15 January 2020 (UTC)

I'm not known to have problems with general copyright in articles either, but I guess the one or two Fram found from my early days make me a serial copyright violator. Funny!♦ Dr. Blofeld  09:45, 16 January 2020 (UTC)

Dr. Blofeld 16
I think I'm done for today, but Contributor copyright investigations/Dr. Blofeld 16 is almost done! For the more technically-minded, running  will hide entries already marked with n, to make it clearer what is left. I count 74 pages left. Thanks, --DannyS712 (talk) 05:00, 16 January 2020 (UTC)
 * Now 11 left!!! DannyS712 (talk) 19:31, 17 January 2020 (UTC)
 * Nice work - I can take out the rest. 💴Money💶💵emoji💷Talk💸Help out at CCI! 19:36, 17 January 2020 (UTC)
 * I wanted to finish it... :( I've never been able to fully finish and courtesy blank a CCI page, and I'd like to. I should be done this weekend DannyS712 (talk) 19:38, 17 January 2020 (UTC)
 * Hehe... alright, it's yours then. 💴Money💶💵emoji💷Talk💸Help out at CCI! 19:52, 17 January 2020 (UTC)
 * Okay, it's done:
 * 1 page listed at Copyright problems/2020 January 17
 * 2 presumptive removals: Special:Diff/936015736, Special:Diff/936016101
 * 3 pages listed at Copyright problems/2020 January 18
 * 5 revision deletions requested: Special:Diff/930045512/936267948, Special:Diff/936321948/936322427, Special:Diff/936321948, Special:Diff/936269727, Special:Diff/936270275
 * I'll leave it for you to double check that everything is okay and closed properly - I'm a bit sick of the page at this point... Thanks, --DannyS712 (talk) 02:35, 18 January 2020 (UTC)
 * If you feel like blanking something, Contributor copyright investigations/Dutchy85 12 and Contributor copyright investigations/Dutchy85 13 contain hundreds of diffs that defeated my autoculling script - references containing only text that were split over two paragraphs. The Dutchy85 CCI is also massive (started off at 13432 articles, now 10780), and it too was populated on Boxing Day last year (bad memories). MER-C 18:55, 18 January 2020 (UTC)

LOL
I'm honoured that you would go to such an enormous amount of effort to check every article. I welcome it as I think it will prove that Fram's concerns were unfounded. Keep in mind though that a considerable percentage were short stub creations or film lists etc. I spent most of my time trying to create new topics without writing them, so having copyvios is impossible for those. I would be surprised if more than 0.25% of the articles had vios, but time will tell.♦ Dr. Blofeld  09:41, 16 January 2020 (UTC)

What is really needed is an expansion drive to get all those short stubs expanded. A lot of them are still placeholders and should have been expanded by now.♦ Dr. Blofeld  10:23, 16 January 2020 (UTC)
 * Maybe you should focus more on improving content instead of mass-producing stubs and expecting people to come along and make something out of them. Der Wohltemperierte Fuchs  talk  19:03, 16 January 2020 (UTC)

I suggest you read my user page, David Fuchs; your advice was more relevant 10 years ago.♦ Dr. Blofeld  19:52, 16 January 2020 (UTC)
 * And yet, here we are, cleaning up your messes. Der Wohltemperierte Fuchs  talk  19:55, 16 January 2020 (UTC)

Haha. Yes, all those hundreds of copyvios and work done here cleaning up my enormous Darius Dhlomo-sized mess. Still a frosty one I see, David.♦ Dr. Blofeld  21:28, 16 January 2020 (UTC)


Infobox additions
A significant number (at least on page 23) of edits are simple additions of an infobox, with some basic facts. Can we perhaps add bullet point 6 about that, and maybe a suggestion to mark those as Addition of infobox? ♠PMC♠ (talk) 13:46, 16 January 2020 (UTC)
 * Added. I hadn't mentioned infoboxes because I thought they would be culled by the script; guess they weren't. You can add Addition of infobox if you want to, but I think the distinction doesn't really matter too much. 💴Money💶💵emoji💷Talk💸Help out at CCI! 14:10, 16 January 2020 (UTC)
 * And of course, thank you for helping out. 💴Money💶💵emoji💷Talk💸Help out at CCI! 14:11, 16 January 2020 (UTC)
 * Cheers, it makes for decent end-of-shift scut work. Just enough attention required to keep me awake, not so much that I have to think very hard. ♠PMC♠ (talk) 14:12, 16 January 2020 (UTC)
 * Me adding infoboxes en masse eh? :-) ♦ Dr. Blofeld  16:01, 16 January 2020 (UTC)

Interwiki translations
I've run into a couple of articles whose content was imported from other wikis, tagged them with translated page, and marked them as n. I'm assuming this is ok per WP:TFOLWP, which pretty much says that the template is sufficient for attribution. ♠PMC♠ (talk) 03:28, 17 January 2020 (UTC)
 * I've been having the content imported - see discussion about this at Requests for page importation. I don't know if one is better than the other though... DannyS712 (talk) 03:33, 17 January 2020 (UTC)
 * Ehhhh, like Graham says at the RFPI page, importing the entire history isn't necessary, as long as there's some attribution. Importing this stuff now seems like a whole lotta work for very little gain to me. ♠PMC♠ (talk) 03:35, 17 January 2020 (UTC)


 * WP:HOWTRANS says that importing isn't necessary and that adding attribution in an edit summary and tagging the talk page is fine. If you see any cases where something is translated without attribution then add it though.  Hut 8.5  07:31, 17 January 2020 (UTC)

How to mark copyright violations
As a non-admin, I don't really have the tools to clean up obvious copyright violations. So how do you guys want the non-admins to tag violations when we've located the source? Just a with the link? I've also found a few instances of close paraphrasing although I'm not sure it's close enough to be considered a copyright violation. In these cases, I've just indicated the source material and am leaving the decision to editors with more experience in copyvios. Pichpich (talk) 23:13, 17 January 2020 (UTC)
 * It's generally assumed that if you mark an edit as a, you have removed the copyvios added in that edit or taken the necessary corrective action. If you need an admin for WP:RD1, you can use copyvio-revdel. There's also copyvio and G12 if article deletion is warranted. copyvio can also be used if you are uncertain or removing the copyvio is onerous. MER-C 18:28, 18 January 2020 (UTC)

How to mark non-basic non-violations
I've been working my way through #10, which is going well enough. For cases where they aren't basic prose, but were either very easy to fix (in a couple of cases it was 1 line making them non-basic, which I just paraphrased myself) or Earwig indicates no breach, what would you like us to do? Fix ping Nosebagbear (talk) 23:47, 17 January 2020 (UTC)
 * (Sorry for the delay) If you’re going to run Earwig, first fix all dead links (check the box when doing so because the bot will sometimes miss some), then run Earwig. If books/other sources that Earwig can’t scan are used, then just leave the listing alone; I can figure out what to do with those. If you check the sources and find nothing, write “sources checked, no violation” next to the listing. Thanks for your help, 💴Money💶💵emoji💷Talk💸Help out at CCI! 19:13, 18 January 2020 (UTC)

Does this count as basic prose?
This diff mentions "Some of village lands are being used by the Israeli army as military training ground and the settlement of Kerem Maharal now occupies part of the old village." Does the inclusion of this information cross the line of "basic prose"? —The Editor's Apprentice (talk) 16:08, 21 January 2020 (UTC)
 * No, I wouldn't consider it basic prose. It doesn't really matter either way, as the text is no longer in the article. I suspect it may have been translated from ar.wiki due to the poor wording. Thanks for helping out! 💴Money💶💵emoji💷Talk💸Help out at CCI! 16:33, 21 January 2020 (UTC)

An actual copyright violation - looking for help
Think I actually found one, Guyana Sugar Corporation, from here on the cleanup list, with some pretty obvious direct copy from the source. And it appears to still be live on the page. If someone can validate, and detail step-by-step what I need to do as a non-admin, it would be appreciated. Ta. Gricehead (talk) 16:54, 21 January 2020 (UTC)
 * Generally, you'd want to start by removing the offending text entirely. Then you can follow the (somewhat-complicated) instructions for requesting a revision deletion at Template:Copyvio-revdel. Alternatively, you can install Enterprisey's revdel request script to make requesting revdels easier. ♠PMC♠ (talk) 17:04, 21 January 2020 (UTC)
 * Not a copyvio: the source it was taken from is in the public domain. It is attributed in the article, but in an odd, older way. I have fixed this. When I have the time later today, I'm going to add guidance on how to deal with copyvios to the instructions. 💴Money💶💵emoji💷Talk💸Help out at CCI! 17:10, 21 January 2020 (UTC)
 * OK thanks. Clearly out of my depth here :/ What's the giveaway that I missed about this being public domain? Cheers, Gricehead (talk) 17:23, 21 January 2020 (UTC)
 * Oop, that's what I get for looking at the nuts and bolts of how to do things rather than the actual text. Thanks. Anything created by the US government is generally public domain (see Copyright status of works by the federal government of the United States). At the bottom of that page, you can see "Source: U.S. Library of Congress", which tells you it's by the US government and would therefore fall automatically into PD. ♠PMC♠ (talk) 17:28, 21 January 2020 (UTC)
 * Thanks, both, for the learning opportunity. Gricehead (talk) 17:30, 21 January 2020 (UTC)

Second pair of eyes please?
Can I get someone to Earwig what's left at Arts of Odisha please? I've already removed a large chunk and revdelled basically the entire history because it was like the fourth edit to the page and was still intact ten years later. Anyway, Earwig still shows approximately a 25% violation of something on Issuu, but my work computer blocks Issuu so it also blocks the Earwig comparison. ♠PMC♠ (talk) 15:35, 22 January 2020 (UTC)
 * The 25% is the entire lede paragraph (Earwig excluded a few words that should have been included) - that Issuu is listed as "Published on May 3, 2019", however. There is also a 76.2% violation from Facebook? Hope this helps. --DannyS712 (talk) 20:46, 22 January 2020 (UTC)
 * Okay thanks, it was the Issuu publication I was most concerned with, because the Facebook comparison showed a date of 2020 (incredibly I can process a Facebook comparison here but not Issuu, god knows why). Looks like it's copying from us, not us copying from them. Gracias. ♠PMC♠ (talk) 20:51, 22 January 2020 (UTC)
 * Very likely they copied us...♦ Dr. Blofeld  12:36, 25 January 2020 (UTC)
 * Well yeah, given the publication date of 2019. Unless you're meaning the big chunk I removed and revdelled? ♠PMC♠ (talk) 13:17, 25 January 2020 (UTC)

Initial pass complete on page 35
Hello all! I've done an initial pass on Contributor copyright investigations/Dr. Blofeld 35 (with some much appreciated help from Money emoji). I was pretty conservative in what I marked as basic prose, especially in the beginning, so if someone wants to do another pass they're welcome to. I think I'm going to take a break now. If this hangs around for a while I may come back and do something more. Tamwin (talk) 06:40, 25 January 2020 (UTC)
 * Thank you very much for your help. I'll review what's left on the page. 💴Money💶💵emoji💷Talk💸Help out at CCI! 17:32, 25 January 2020 (UTC)

Help
I am normally interested in clearing out copyright violations but I don't understand what to do. I am constantly getting an error with the Earwig Copyright Tool so it is quite difficult to help out. Here is the error: I don't understand why I keep getting this error but it is stopping me from helping out. Pkbwcgs (talk) 22:55, 25 January 2020 (UTC)
 * It's because Earwig is popular - wait until it resets every day... sorry --DannyS712 (talk) 23:00, 25 January 2020 (UTC)
 * That's a shame. Because Earwig is not working right now, I can't help out. Pkbwcgs (talk) 23:01, 25 January 2020 (UTC)
 * You can still help by marking edits that are simply basic prose or translations from wikis in other languages. ♠PMC♠ (talk) 23:04, 25 January 2020 (UTC)
 * I am not sure what to do in this situation. Can you please help me out? Pkbwcgs (talk) 10:20, 26 January 2020 (UTC)
 * Countrystudies is a public domain source, so that edit (thankfully) is ok since it was attributed. Wizardman  14:18, 26 January 2020 (UTC)

Subpage 13
Can I get some other eyes on the edits I've moved to a separate section at the top of Contributor copyright investigations/Dr. Blofeld 13? I've done as much searching as I can and can't find sources, but don't want to write them off as kosher without confirmation from others (particularly the plot summary ones as noted). ♠PMC♠ (talk) 07:05, 1 February 2020 (UTC)

Sock
So sorry, I just remembered, Dr Blofeld had a sock: User:Tibetan Prayer. I guess we will have to check there as well :( — Diannaa (talk) 22:44, 26 January 2020 (UTC)
 * It's not that bad, there's only four months worth of edits to check = 1250 articles, 774 (62%) of which were removed by my culling script. MER-C 20:45, 27 January 2020 (UTC)

You talk about me as if I'm some kind of malicious vandal. "Sock" implies I was using the account for negative purposes, which couldn't be further from the truth. You clearly don't have any respect for me as an editor or person.♦ Dr. Blofeld  09:01, 28 January 2020 (UTC)
 * Sorry to have hurt your feelings Dr Blofeld, I thought my comment was extremely neutrally worded. — Diannaa (talk) 12:56, 28 January 2020 (UTC)
 * This "sock" doesn't seem to be declared on your user page, nor vice versa; and the user page of the "sock" account falsely claims you have retired from Wikipedia. Andy Mabbett ( Pigsonthewing ); Talk to Andy; Andy's edits 09:48, 29 January 2020 (UTC)
 * Because I was being hassled at the time it was created; it was a place to edit in peace away from the silly nonsense. And it was only retired after being trolled by people on external sites.♦ Dr. Blofeld  13:13, 3 February 2020 (UTC)
 * With regards to the latter, since we're looking at historical cases, I interpreted it as that account had been retired. Nosebagbear (talk) 09:55, 29 January 2020 (UTC)
 * It explicitly says "user" not "account". Andy Mabbett ( Pigsonthewing ); Talk to Andy; Andy's edits 10:01, 29 January 2020 (UTC)
 * More retirements than Sinatra! Still, only 15,000 more pages to check...  Lugnuts  Fire Walk with Me 17:37, 29 January 2020 (UTC)

Pareto principle
Having been attracted to this by WP:CENT, I sampled one of the articles – Thumpokhara – which turned out to be an insignificant village in Nepal. As this was a pro-forma stub, it seemed that clicking a random page would be as likely to find a copyright violation. I verified this by looking at a random article – Bill Cherry – which proved to be a similar sort of stub.

Looking at stubs of this sort for copyright violations is a big waste of time because you're unlikely to find anything significant. The main copyright issue in such cases is sweat of the brow but that's not accepted in US law and is unlikely to be accepted here as it would be an existential threat to Wikipedia.

To make light work of this, the exercise should be reorganised using the law of the vital few. To determine these, the lists should be filtered and sorted using parameters such as:


 * 1) Exclude stubs
 * 2) Exclude cases where the article has been edited so that Blofeld is no longer a significant author
 * 3) Sort by size of the article's prose

You then go through the list from the largest to the smallest and stop when you're no longer finding anything.

Andrew🐉(talk) 13:12, 3 February 2020 (UTC)
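The filter-and-sort procedure proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Article` fields, the `min_share` threshold and the `patience` stopping rule are hypothetical names invented for this sketch, and nothing here reflects how the CCI listings or the existing culling script actually work.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Article:
    title: str
    is_stub: bool        # hypothetical flag: article is a stub
    author_share: float  # hypothetical: fraction of current prose still attributable to the editor
    prose_size: int      # hypothetical: size of the article's prose in bytes


def prioritise(articles: Iterable[Article], min_share: float = 0.5) -> List[Article]:
    """Apply the three proposed steps: drop stubs, drop articles where the
    editor is no longer a significant author, then sort by prose size."""
    kept = [a for a in articles if not a.is_stub and a.author_share >= min_share]
    return sorted(kept, key=lambda a: a.prose_size, reverse=True)


def review(queue: List[Article],
           has_violation: Callable[[Article], bool],
           patience: int = 20) -> List[Article]:
    """Walk the queue from largest to smallest and stop once `patience`
    consecutive articles turn up clean (an assumed reading of 'stop when
    you're no longer finding anything')."""
    found: List[Article] = []
    clean_streak = 0
    for article in queue:
        if has_violation(article):
            found.append(article)
            clean_streak = 0
        else:
            clean_streak += 1
            if clean_streak >= patience:
                break
    return found
```

The `has_violation` callback stands in for the manual check a reviewer would do; the sketch only orders the work and decides when to stop.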

I admire their tenacity at doing it though, I think it will only prove what I said originally. ♦ Dr. Blofeld  13:21, 3 February 2020 (UTC)
 * There's already been a great deal of auto-culling of the lists, using a script as well as manually by Wizardman. The material is sorted by the size of the edit, not by the size of the article. The size of the article is in some ways irrelevant. I've worked on quite a few CCIs (Racepacket, Epeefleche, Mushroom9, Tobby72 to name a few of the bigger ones) and I find it helpful to start with the smaller edits, which results in me getting a feel for the user's writing style and skill level, which makes assessing the trickier ones a little easier. So no, I personally would not start with the bigger ones; in fact quite the opposite — Diannaa (talk) 23:42, 3 February 2020 (UTC)
 * That all sounds fine but please could Diannaa explain the details of the previous culling as I'm puzzled that a stub like Thumpokhara is still in the list. Andrew🐉(talk) 11:34, 4 February 2020 (UTC)
 * The culling only removes tables/refs/file insertions/infoboxes/super basic prose and such (Example), but for some reason it doesn't cull the many stubs that have just a bit more text. MER-C, the creator of the script, might be able to elaborate on why it isn't able to do that. 💷Money💷emoji💷Talk💸Help out at CCI! 12:22, 4 February 2020 (UTC)
 * My two cents: 1 is untenable, since "stub" is a vague term that typically applies to any article less than 300 words in length, and because many entries in print encyclopedias are shorter than that it could theoretically exclude articles that were copy-pasted in their entirety (was what was meant "sub-stubs" that are only one or two short sentences?); 2 works in theory, except the only way to establish as much would be to go through the page history and establish if any significant amount of Blofeld's text is still present (i.e., virtually all the work of removing it); 3 is irrelevant, as the amount of text Blofeld added is not related to the overall length of the article. Hijiri 88 ( 聖やや ) 01:12, 6 February 2020 (UTC)

Harold S. Gladwin
Harold S. Gladwin had most of its content removed in 2008 by Dsp13 with the summary "rm suspected copyvio material", after, in the previous edit, adding a book with the summary "bio which actually looks like copyvio source to me". Dsp13, is this removal based on a comparison with the book? I.e., can you confirm whether this was or wasn't a copyvio before your edits? --DannyS712 (talk) 03:53, 12 February 2020 (UTC)
 * I think I added an external link, rather than a book (see Diff), and then removed content from the page since it repeated text from that external link: https://web.archive.org/web/20080224174247/http://www.mnsu.edu/emuseum/information/biography/fghij/gladwin_harold.html Dsp13 (talk) 08:09, 12 February 2020 (UTC)
 * Sorry for the misunderstanding. I can't tell when the external link content was originally posted there, so I cannot tell which came first. Presumptively blanking would be the next step, but since it's already removed, the only question is if it should be revdelled. --DannyS712 (talk) 08:21, 12 February 2020 (UTC)