Wikipedia talk:Contributor copyright investigations/Darius Dhlomo/How to help

infringement detection
IMO the paragraph
 * In some cases Darius Dhlomo copied entire sentences at a time, but simply placed them in a different order. If you are using a WWW search engine to search the WWW for original works that were copied, note that this will affect your success if your search strings span multiple sentences.

is too complicated and dangerous. "In some cases" should read as approximating "in all cases", i.e. any sentence written by Darius is presumed copied. And searching is useless if (say) the info was copied from a print source rather than a web source. So I'd just ditch that paragraph. The rest looks good. 67.122.211.178 (talk) 22:29, 6 September 2010 (UTC)

While this page presumably has links in numerous other pages involved in this project, there should be a link on this page "back" to a central listing of the articles that need to be reviewed. Yeah, I know, I can find them, and I know where they are, but this might make it a bit more inviting for some to participate and assist in reviewing another article or two. Steveozone (talk) 06:31, 9 September 2010 (UTC)


 * I take particular interest in a lot of the content Darius Dhlomo has (not) created. I've been going through the articles in his list (there are thousands he has created).  The vast majority I see are results--as in publicly available statistics--of athletic events, and stub articles about names on those lists merely mentioning that they were in that competition and achieved that result.  I do not believe either of those circumstances is subject to copyright, even if he took the mention word for word from another article.  I would like to get a definitive statement on that.  I posed this on the Project:Athletics page days ago, so far with no reply.  If this simply takes a minor word rearrangement we can take on some of that project.  And most important, since I am checking on these articles, apparently some recognized administrators are placing a checkmark to say they have checked articles.  Sure I can copy the code--what a pain--but how is this done properly and who is qualified to make such a check, under what circumstances?  Trackinfo (talk) 17:52, 10 September 2010 (UTC)
 * Sorry this hasn't been straightforward! Use to mark an article as all clear. Use  to mark an article as a violation or questionable. Make sure to leave a short explanatory comment (e.g. "cleaned", "not cleaned", "looks like a violation") on the ones marked as violations. SFB/talk 22:33, 10 September 2010 (UTC)


 * To answer the rest of your question: anyone can make such a check so long as they have no history of copyright problems. We can use all the help we can get! VernoWhitney (talk) 00:05, 11 September 2010 (UTC)


 * Thank you. Now that I have spent considerable time checking and marking many articles, I certainly hope the BOT somebody is programming to blank these articles will check our checklist before blanking.  Quite frankly, I have found precious little content that I consider copyrightable.  As I mark "stats and facts" I find the majority of his work has been translating dry statistical public record into stub articles that just state those facts.  I've already been through a few hundred articles that essentially read:  "Name, birthdate, born in location, competed in . . . for his native country, played for this team, achieved this result."  This is extremely valuable material to WP that frankly I am glad he did.  It gives further information about these subjects a place to go, without all the legwork to set up the article.  Most of these articles need far more attention--but at least they exist.  I have yet to hit on one of the articles where copyrighted material was copied; by now I am guessing they are a small percentage of the whole of this body of work.  I think the folks out to destroy this material with a BOT ought to hold off. Trackinfo (talk) 18:15, 11 September 2010 (UTC)


 * OK, now I have been through the better part of a thousand of this user's articles (so I am suggesting I have looked at close to 10% of his newly created articles). I have seen the potential copyright violations in a handful of articles--there is an obvious problem with creating original material.  It even seems like English is not this user's primary language, or poor translations of these copyright violations are used.  This, however, is microscopic compared to the volume of overall work performed.


 * Having seen a good impression of the scope of this body of work I am going to suggest that this user be welcomed back with open arms. So valuable is the legwork to create these highly accurate wikifications of public record documents, I would be willing to overlook the occasional tendency toward copyright violations in prose.  That can be sanitized by another editor supervising the contributions.  So convinced of the value of this user's contributions, I will offer my own edit time to do that supervision.  Let's get this user back on the job. Trackinfo (talk) 13:25, 12 September 2010 (UTC)


 * Trackinfo, the careful examinations at the CCI discussion page showed an awful lot of copying. Any reversions under consideration should be checked carefully.  A revert that you just did seems to insert or restore copyvio text from, for example. 67.119.12.29 (talk) 23:19, 12 September 2010 (UTC)


 * OK everybody knows more about this than I. Somebody please explain who owns the copyright on, for example:  1993 World Championships in Athletics – Women's high jump which has been blanked.  There is one, very short, factual sentence of prose and the rest is pure statistics.  There are a bunch of blanked articles just like it. Trackinfo (talk) 06:06, 21 September 2010 (UTC)
 * I expect they will be reviewed in short order. --Moonriddengirl (talk) 11:38, 21 September 2010 (UTC)
 * In other words, go away, you're bothering us more important people. You have obviously taken the tack that here on WP, you and your group are more equal than I because I once copied information off of a government agency website (and who owns the copyright to that?).  I'm asking for a statement of policy.  For essentially a pure statistics article like this, yes obviously copied from the source of the statistics, what element of sports statistics is a copyright violation?  How can it be a copyright violation?  What sports organization in its right mind would prevent dissemination of its statistics through copyright protection?  Carrying that forward, at what point does a basic reporting of facts and statistics become a violation--When the order of words matches somebody else's order of the same facts?  When the order of key facts matches somebody else's?  Let's see, name, date of birth, where they came from, they are known for this achievement (say an Olympic medal) . . . are we already getting into a violation situation?  The majority of these articles that are now getting blanked en masse are simple statements of sports statistics--which is largely what is known about many of these subjects.  What more do you expect to be added to these statistics to somehow make them unique to the pure statement of facts about historical public events that they are, without inserting WP:POV, WP:OR or worse, out and out falsified information to supplement?   Does someone hold the copyright to making these statistics orderly or compiling a list of these statistics?  If these are purely statistics, then (if properly compiled) when put in order the resulting summary will be exactly the same.  Mathematics (scores) and alphabetization are constants.  Looking at what is blanked, hundreds of the kinds of articles I just described were called copyright violations.  Publicly explain how? Trackinfo (talk) 16:47, 21 September 2010 (UTC)
 * Many of these articles are not going to be copyright problems, as we have already discussed at great length here and is explained at Contributor_copyright_investigations/Darius_Dhlomo. There is strong consensus for this action, which has been amply publicized and reviewed at all necessary forums. While I understand your dismay at the temporary inconvenience to our readers and to those project volunteers who will help with cleanup, this has nothing to do with the importance of individuals. Arguing the merits of one article is not helpful; arguing that many of them are not copyright violations is not, either, because it's nothing we don't already know. Each article needs to be reviewed by people who understand and comply with copyright policy. Besides your own history of copyright problems (while your confusion about the Ventura County website is understandable, is not a "government agency"), you have already violated our copyright policy by restoring copyrighted content to one of these articles multiple times: here, here and, finally, here. This is the reason why we request assistance only from those with no history of copyright problems, because those who do have a history of copyright problems are not the best representatives of due diligence in our cleanup efforts. --Moonriddengirl (talk) 17:14, 21 September 2010 (UTC)

Clarifying
Since this subpage did not include it, I have copied over from WP:CCI the following important note: "All contributors with no history of copyright problems are welcome to contribute to clean up." Contributors who do have a history of copyright problems should not be reviewing this content. --Moonriddengirl (talk) 00:05, 13 September 2010 (UTC)

Queries from a copyright n00b
Howdy. Athletics at the 1984 Summer Olympics – Men's hammer throw cropped up when I was looking at recent changes. I've been half-following the DD thing and thought I'd see if I could help out, but I've just got myself confused. If someone could clear up my confusion I might be able to help a bit more....

Uncle G's bot blanked the page, I understand why. I started looking at it and found that someone looked at this on the 13th September and removed a para of prose as a copyvio. I have googled snippets of that removed section and only found mirrors. I have looked through the ELs that DD posted in his edits of the article and can't find the text there, so I am unsure where that original copyright infringement was - what am I missing? Furthermore, all that's left now is a whole load of stats with a vanilla intro para. Is any of that a copyright infringement? The guide suggests a degree of severity if I get this wrong (if you revert the blanking and there is still found to be an infringement you can be liable, to paraphrase from my memory of reading it a few hours ago), hence the questions. A big thank you to all the people involved here who actually know what they're doing! Bigger digger (talk) 01:48, 21 September 2010 (UTC)
 * Try a different WWW search engine. I just ran some of the text removed by Sillyfolkboy through Bing, and it found the original work straight away. Uncle G (talk) 02:13, 21 September 2010 (UTC)
 * Wow, first ever argument for Bing over Google, thanks! And can you address the second query, are all the stats a copyvio, or would it be ok for me to revert this now? Bigger digger (talk) 02:33, 21 September 2010 (UTC)
 * There's a detailed discussion of this on Administrators' noticeboard/Incidents/CCI, where Moonriddengirl addresses some of the subtleties. It's not as clear cut as one would like, and Moonriddengirl is probably the better person to consult on it.  However, that particular page is one of the already-checked ones.  Ask at User talk:Moonriddengirl about User:Moonriddengirl/checked and the statistics pages listed thereon.  I pushed all of the already-checked ones through first, in order to get them over with quickly.  Moonriddengirl is going to quickly run through the list at some point in the near future, I understand.  You could probably help.  But ask at User talk:Moonriddengirl first. The more challenging pages are the slightly less than 9,000 ones to come, that the 'bot hasn't yet processed, that aren't on Moonriddengirl's list and that haven't had humans already review them.  Uncle G (talk) 02:56, 21 September 2010 (UTC)

Is it possible to get a list of affected articles by WikiProject?
I could try to help in reviewing Sociology-related articles, for example; such lists could be also reported as "need urgent review" to all WikiProjects. --Piotr Konieczny aka Prokonsul Piotrus 00:09, 23 September 2010 (UTC)
 * The articles that have been blanked are listed at: Category:Articles tagged for CCI copyright problems Trackinfo (talk) 07:40, 23 September 2010 (UTC)
 * Thanks. Don't forget to add : to the category you mention. That's enough for me to get a list with CatScan, which tells me there are no real sociology articles affected. I will repeat the process for other WikiProjects I am involved in soon. --Piotr Konieczny aka Prokonsul Piotrus 15:59, 23 September 2010 (UTC)
 * There's a "Things that you can patrol" section on the page that actually covers this. Uncle G (talk) 16:20, 23 September 2010 (UTC)
 * Right, but CatScan is not perfect, the main problem is that it stops after 4 or so subcategories. I was asking if we have a tool that would look for intersections of the Articles... category and specific project assessment templates on its talk. I know that there are tools that generate similar reports, such as User:WolterBot clean-up reports (which do not seem to have a section on copyright issues, at least in the one I just checked) so... --Piotr Konieczny aka Prokonsul Piotrus 16:37, 23 September 2010 (UTC)
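The intersection asked about here can be sketched in a few lines of Python. This is only an illustration, assuming two lists have already been obtained by some other means: the article titles in the CCI category, and the talk-page titles that carry a given project's assessment banner. The function name `articles_in_project` and both parameter names are hypothetical, and no existing tool is claimed to work this way.

```python
def articles_in_project(tagged_articles, banner_talk_pages):
    """Return the tagged articles whose talk page carries the project banner.

    tagged_articles:   article titles from the CCI category (assumed input)
    banner_talk_pages: talk-page titles transcluding the project's
                       assessment template (assumed input)
    """
    prefix = "Talk:"
    # Map each talk page back to its article by dropping the "Talk:" prefix.
    project_articles = {
        title[len(prefix):]
        for title in banner_talk_pages
        if title.startswith(prefix)
    }
    # Articles present in both sets are the ones the project should review.
    return sorted(set(tagged_articles) & project_articles)
```

Because the comparison is on full titles rather than on category depth, it does not depend on how many subcategory levels a category scanner is willing to follow.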

Sub-pages
"Please make a note at the article's listing at the CCI subpage that you've (dealt with the copyright problem), signed with four tildes, so we can keep a record of the cleanup." There are currently twenty-four subpages split into two separate strands covering tens of thousands of articles. Is there a quick way to determine which sub-page an article appears on? Quick = seconds, effortless. And on a tangent, the language paraphrased above ("if you revert the blanking and there is still found to be an infringement you can be liable") is how I also interpret the instructions, and a part of me wonders (a) what kind of legal power Wikipedia has to hold me accountable for the copyright violations it failed to spot during the two years they were carried out in a foreign country and (b) why on Earth should I leave myself open to legal action, when there's no way to prove conclusively that something isn't a copyright violation? The content could have been borrowed from a source not present on the internet, for example, or behind the Times' paywall, or I might simply not have found the source it was copied from. How can we be absolutely sure that an article is not a copyright violation? -Ashley Pomeroy (talk) 19:10, 30 October 2010 (UTC)
 * Special:WhatLinksHere, linked as What links here in the Toolbox. Example: Lori Ann Mundt → Special:WhatLinksHere/Lori Ann Mundt → WP:Contributor copyright investigations/Darius Dhlomo 9. Flatscan (talk) 05:03, 31 October 2010 (UTC)
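The same What links here lookup can be scripted for anyone checking many articles. A minimal sketch, assuming the MediaWiki Action API's `list=backlinks` query restricted to the project namespace (`blnamespace=4`); the helper names `backlinks_url` and `cci_subpages` are illustrative, and fetching and JSON-decoding the URL is left to the caller.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
# Title prefix shared by the CCI subpages for this investigation.
CCI_PREFIX = "Wikipedia:Contributor copyright investigations/Darius Dhlomo"


def backlinks_url(article_title):
    """Build an API URL listing project-namespace pages that link to the article."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": article_title,
        "blnamespace": 4,   # 4 = "Wikipedia:" (project) namespace
        "bllimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def cci_subpages(api_response):
    """Pick the CCI subpage titles out of a decoded backlinks response."""
    links = api_response.get("query", {}).get("backlinks", [])
    return [link["title"] for link in links
            if link["title"].startswith(CCI_PREFIX)]
```

For example, `backlinks_url("Lori Ann Mundt")` gives the query for the article used in the example above; passing the decoded JSON reply to `cci_subpages` leaves only the CCI listings among the pages that link to it.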


 * E.g. the articles seem to be mostly about sports figures, with tables of data. How can we tell that the tables of data are not held under copyright? (I'm thinking of this kind of thing in particular). I am absolutely confident that the article I cleared was sound, but having to search for tables complicates matters immensely. It seems to me that the purpose of this exercise is to throw the problem out to the wider Wikipedia community, the mass of non-specialist casual editors who blat out edits without much thought, which might well end up with hundreds of problematic articles reverted right back to a problematic state because one detail was missed. -Ashley Pomeroy (talk) 19:15, 30 October 2010 (UTC)