Wikipedia:Contributor copyright investigations/Transporter Guy

Instructions
All contributors with no history of copyright problems are welcome to contribute to clean up. Contributors who are the subject of a contributor copyright investigation are among contributors with a history of copyright problems and so are not welcome to directly evaluate their own or others' copyright violations in CCIs. They are welcome to assist with rewriting any problems identified.

If contributors have been shown to have a history of extensive copyright violation, it may be assumed without further evidence that all of their major contributions are copyright violations, and they may be removed indiscriminately in accordance with Copyright violations. Contributors who are the subject of a contributor copyright investigation are among contributors who have been shown to have a history of extensive copyright violation and so all of the below listed contributions may be removed indiscriminately. However, to avoid collateral damage, efforts should be made when possible to verify infringement before removal.

When every section is completed, please alter the listing for this CCI at CCI to include the tag "completed=yes". This will alert a clerk that the listing needs to be archived.

Text

 * Examine the article or the diffs linked below.
 * If the contributor has added creative content, either evaluate it carefully for copyright concerns or remove it.
 * Evaluating for copyright concerns may include checking the listed sources, spot-checking using Google, Google Books and other search engines, and looking for major differences in writing style. The background may give some indication of the kinds of copyright concerns that have been previously detected. For older text, mirrors of Wikipedia content may make determining which came first difficult. It may be helpful to look for significant changes to the text after it was entered. Searching for the earlier form of text can help eliminate later mirrors. If you cannot determine which came first, text should be removed presumptively, since there is an established history of copying with the editor in question.
 * If you remove text presumptively, place   on the article's talk page.
 * If you specifically locate infringement and remove it (or revert to a previous clean version), place   on the article's talk page. The url parameter may be optionally used to indicate source.
 * If there is insufficient creative content on the page for it to survive the removal of the text or it is impossible to extricate from subsequent improvements, replace it with, linking to the investigation subpage in the url parameter. List the article as instructed at the copyright problems board, but you do not need to notify the contributor. Your note on the CCI investigation page serves that purpose.
 * To tag an article created by the contributor for presumptive deletion, place  on the article's face and   on the article's talk page. List the article as instructed at the copyright problems board, but you do not need to notify the contributor.


 * After examining an article:
 * Replace the diffs after the colon on the listing with an indication of whether a problem was found (add y) or not (add n). If the article is blanked and may be deleted, please indicate as much after the y. The ? template may be used for articles where you were unable to determine whether or not a violation occurred, but are prepared to remove the article from consideration, either because the material is no longer present in the article or because it is adequately paraphrased so as to no longer be a violation (please specify which).
 * Follow with your username and the time to indicate to others that the article has been evaluated and appropriately addressed. This is automatically generated by four tildes ( ~~~~ ).


 * If a section is complete, consider collapsing it by placing collapse top and collapse bottom beneath the section header and after the final listing.
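
The spot-checking step above can also be approximated offline once the article text and a suspected source have been copied into plain strings. The following is a minimal sketch in Python, not a replacement for a search engine or for any on-wiki detection tool. The function names `word_ngrams` and `shared_fraction` and the five-word window are illustrative assumptions, not part of any Wikipedia tool.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams found in text."""
    # Hypothetical helper: words are runs of ASCII letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(article, source, n=5):
    """Fraction of the article's word n-grams that also occur in the source."""
    article_grams = word_ngrams(article, n)
    if not article_grams:
        # Text shorter than n words has nothing to compare.
        return 0.0
    return len(article_grams & word_ngrams(source, n)) / len(article_grams)
```

A high fraction only indicates shared wording. It does not show which text came first or whether the source is freely licensed, so a match still needs the manual evaluation described above.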

Images

 * Examine the images below. For free images:
 * Does the image look non-free? Is it likely the uploader is the copyright holder?
 * Is the image properly licensed and sourced? Be aware of images that say "this image is licensed under X" without specifying who created it.
 * Do a reverse image search using Google Images. Check the license of the source page. Compare the last modified time with the (Commons) upload time.
 * Do a Google image search for phrases that describe the image's contents.
 * See Guide to image deletion on dealing with cases of possible image copyright infringement. There is no need to open a possibly unfree files listing. Administrators may delete images from multiple point infringers presumptively in accordance with Copyright violations. Evaluators who are not administrators may section images into a "deletion requested" section for administrator attention.


 * For non-free images, determine whether each image meets our non-free content criteria.
 * Note that Commons does not accept non-free content.
 * Annotate the listing with the action taken, e.g. if the image was tagged no source write "no source"; if the fair use claim is deemed ok you can write "OK fair use".

Background

 * Check requested by Justlettersandnumbers (talk)
 * The editor works for the Transporter Classification Database, a connection he has declared. He has arranged for the tcdb.org website to carry a CC-BY-SA licence, and has been creating pages by copying content here from the database. Unfortunately it seems that those pages sometimes contain content taken from other, non-free, sources. Pages where these copyright violations have apparently been imported into Wikipedia include:
 * , blanked and listed
 * , blanked and listed
 * , deleted (G7)
 * , blanked and listed
 * , blanked and listed
 * The last of these was created after he had been asked not to create any more such pages before the matter had been looked into. At this point, it seems that all contributions should be checked. Justlettersandnumbers (talk) 11:52, 24 April 2016 (UTC)
 * I did a sample of five of his contributions, and I found copyright violations in three of them. Opening this request. Calliopejen1 (talk) 16:57, 23 May 2017 (UTC)
 * I previously commented here as someone familiar with the subject. Some fragments of these pages might be copy-pasted from TCDB website (we have a page about it, Transporter Classification Database). However, TCDB copyright disclaimer (at the bottom) tells: The text of this website is available for modification and reuse under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License and the GNU Free Documentation License. The author of all TCDB annotations is Dr. Saier. One can not exclude that some fragments of annotations on the TCDB website were similar to texts previously published by the same Dr. Saier in textbooks and articles. However, even if it was the case, I know that authors of the publications in journals are usually allowed to use their own texts on their web sites (such as TCDB). Hence I do not think that TCDB itself is a copyright violation. This is a widely known and used classification and database of transmembrane transporters; everyone in the field knows about it, and no one ever claimed it to be a copyright violation. I do not see what's the problem, and I am not sure if this copyright investigation would be needed. My very best wishes (talk) 13:58, 5 June 2017 (UTC)
 * The issue is not the TCDB website, but rather that the text is copyrighted from the original journal publications. As far as I know, most journals do not allow authors to release the text of their articles under CC licenses (in fact, some journals charge authors ~$1000 if they want to release it under a free license, and many authors don't pay this fee).  Being permitted to post a copy of an article on a website is not the same as releasing that article under a free license... The copied content is not limited to content by Dr. Saier, and in any event AFAIK it is probable that the publishers (not Dr. Saier) own the copyright on Dr. Saier's publications. Calliopejen1 (talk) 19:24, 5 June 2017 (UTC)
 * So, are you telling that having certain overlap with TCDB would probably be OK, however having an overlap with publications in scientific journals and books would be a copyright violation and must be fixed? This is happening because the summaries for the database and publications in books and journals are written by the same author. My area of expertise is different, but I used TCDB in my work and have great respect for its creators... My very best wishes (talk) 19:39, 5 June 2017 (UTC)
 * Yes, exactly. I'm not sure if there is anyone else we can call in to discuss who is more knowledgeable about this issue. I am not an expert on how publishing arrangements work, but I highly suspect that TCDB shouldn't be copying (or closely paraphrasing) copyrighted journal articles like it is doing. This is a complicated area, and it is not uncommon to see people getting tripped up by copyright issues. Calliopejen1 (talk) 19:48, 5 June 2017 (UTC)
 * TCDB classification and the summaries are work by one very good expert. It might be even possible that he effectively published TCDB annotations on the internet prior to publishing these materials in journals (I do not know). But as soon as annotations were in the TCDB, anyone could use them, and many people did. Therefore, Google search will provide a lot of hits. I can look at some of these pages. What automatic tool for copyright check are you using? That was CorenBot for new pages. My very best wishes (talk) 00:36, 6 June 2017 (UTC)
 * I was using this tool which indicated that three of the five pages I checked had content copied from various copyrighted journal articles. It's possible that the other two contained additional copied content this tool did not catch. Calliopejen1 (talk) 03:55, 6 June 2017 (UTC)
 * OK, let's consider Phi11 holin family - it is short, specific and created only by Transporter Guy. This tool shows 52% confidence for TCDB and 13% or less confidence with other publications. After looking at TCDB and other publications it appears that the text was indeed paraphrased and one phrase copy-pasted from the TCDB, but not from other sources. That is exactly what I suspected. The TCDB disclaimer allows it if I understand correctly. Importantly, this page provides a very clear reference to the TCDB. What do we do? My very best wishes (talk) 13:16, 6 June 2017 (UTC)
 * Okay, if you are confident that it only pastes from the TCDB, then it is not a copyright violation. The TCDB is attributed as required, so we are good there.  I'm marking this article okay below.  Like I said, I only saw three of five articles that were copyright violations in the sample I took.  There are bound to be a number of articles that are not copyright violations, but a good share will be.  Calliopejen1 (talk) 17:15, 6 June 2017 (UTC)

When you are using the Earwig tool are you checking for the text "The search ended early because a match was found with high confidence. Do a complete check. " and re-running the search? This will be necessary for almost every article. I'm just a little perplexed that you are finding no violations whatsoever when I was seeing them frequently. I'll do a spot check. Calliopejen1 (talk) 06:09, 9 June 2017 (UTC)
 * I just sampled one article (see below) and found extensive copying -- the vast majority of the article -- and haven't even exhaustively checked the text. Calliopejen1 (talk) 06:20, 9 June 2017 (UTC)
 * And the second article I checked also contained copying (which was found by Googling, not by Earwig). I'm wondering if it would make more sense just to blow all of these articles up and start over.  We may need to evaluate that option after reviewing 10 or 20 articles more thoroughly. Calliopejen1 (talk) 06:33, 9 June 2017 (UTC)
 * I used the tool you gave me and was making pairwise comparisons with each source indicated by the tool as a potential copyright violation, even with low confidence. I did not do any additional checks using other tools because you did not ask me. So far I found and fixed copyvios on ~12 pages. The tool did not find any other copyvios. In a few cases there were overlaps with public domain resources, and in one case there was an external website which copied content from the TCDB. Given your comments above, it will be best if I recuse myself from any further copyright checking here. My very best wishes (talk) 13:59, 9 June 2017 (UTC)
 * OK, here is the problem. See this. It shows 0% confidence for . Let's check the pairwise comparison: . Nothing. But what is it in reality? Google search leads here, which does look like a copyvio to the publication by ... the author of TCDB himself. One should use another automatic web tool to look for copyright violations. My very best wishes (talk) 18:15, 9 June 2017 (UTC)
 * The tool ends its search early and does not complete its check if it finds a confirmed match. But here the confirmed match (TCDB) is a false positive so we need the tool to run the full check. Here is a link to the full check: .  The full check turns up two hits, one of which is the article you identified and one of which is this.  Just because Saier is the author doesn't mean it isn't a copyright violation.  The journal owns the copyright, so Saier cannot release it under CC-BY even if he wanted to.  See e.g. the link you gave which notes "© 2010 S. Karger AG, Basel" (that is, the company).  Calliopejen1 (talk) 18:48, 9 June 2017 (UTC)
 * Also, if you find and fix a copyright violation please mark it so we know that the article had a problem but has been cleaned. Calliopejen1 (talk) 18:49, 9 June 2017 (UTC)
 * Thank you! However, the "genenames" was not a copyvio. If you follow the link it tells the source was Wikipedia. Same with everything else in genenames.org. It takes annotations from TCDB and other similar internet resources. Any overlap with resources like Uniprot and Pfam/InterPro would also be OK, given their copyright disclaimers.  Saying that, the full check does determine another copyvio on the page that I just fixed. Good luck! My very best wishes (talk) 19:47, 9 June 2017 (UTC)
 * P.S. I respect copyright (rules are rules, and I checked pages 1 to 60 using the tool), but here is my problem with this, informally speaking. In this case, copyright is needed for journals to have their profits. However, using these sources in TCDB, even with such occasional copyvios, actually serves as an advertisement for the journals and the scientists who published the papers. Therefore, I did not hear about the journals or the scientists making an issue from such use. There are actually zero chances that anyone will complain. Yes, you do follow the rules and the letter of the law, but are you helping the people and the project in the best way you can? Such situation has been satirized by Nikolay Nekrasov . My very best wishes (talk) 11:39, 10 June 2017 (UTC)
 * Well, I don't speak Russian so I can't comment on the linked text.... The problem here is that Wikipedia's aim is to create unencumbered content that is freely sharable/reusable/remixable, and having copyrighted content could create legal liability for reusers, even more than for Wikipedia itself. In addition, leaving copied content here ultimately is a huge waste of volunteer time. The longer it is here, the more people spend time improving it before it is ultimately deleted -- along with all later revisions thereto. Calliopejen1 (talk) 02:04, 12 June 2017 (UTC)
 * Yes, I certainly agree. Transporter Guy had to fix it. But he did not. That was irresponsible. No doubts. My very best wishes (talk) 02:39, 12 June 2017 (UTC)
 * @Calliopejen1. I guess this resolves the problem with insufficiently reliable searches? My very best wishes (talk) 18:13, 12 June 2017 (UTC)
 * Done - full check using this tool. My very best wishes (talk) 17:24, 18 June 2017 (UTC)

Contribution survey

This report covers contributions to 173 articles from timestamp 2015-12-26 22:27:40 UTC to timestamp 2016-04-26 02:09:38 UTC.