Wikipedia talk:Copyright in lists

Status
This essay is not yet finished. I'm requesting feedback before requesting feedback. :) --Moonriddengirl (talk) 21:18, 15 February 2011 (UTC)

just a couple of comments
I think there need to be a few clarifications overall.


 * 1) First is that a lot of people like to say that Feist v. Rural only said that data/information is free and not copyrightable. What it really said was that the *underlying* data was free but how it is presented may not be. Therein lies a massive understanding issue with many people. I think that somehow needs to be better expressed at the start. Underlying public information vs presentation of such information.
 * 2) The wider consensus is that, because this is the English-speaking Wikipedia with servers located in the US, we only follow US laws. Yes we do, but there is an underlying process that says we respect the laws of other countries. It doesn't say "We only follow them when it suits us." As such, in the United States, a phone book may not be copyrightable but in Australia it is, even if the underlying content (i.e., the data) isn't. In other words, while we could conceptually "cut and paste" the information in a phone book from America, we could not do the same for an Australian phone book. The E.U. has also recently adopted copyright for certain data. In cases such as these do we ignore their copyrights because the US does not have them? Or do we honor those copyrights? In its current form the essay only deals with US copyright, not how to deal with other copyrights in relation to Wikipedia.
 * 3) The concept of "fair use" is not fully the same in the real world as it is here. In the wider sense it is "easier" outside of Wikipedia. Wikipedia has a policy that lays out 10 criteria that must be met. In that regard it is important to look at the source of such data. If such data was collected as part of a story and it is offered up via a commercial content provider it goes beyond the copyvio/formatting issue - because there is a fairly common thought process that, at Wikipedia, anything claimed as "fair use" is not considered a copyvio. The reality is twofold - if the Associated Press offers a story that included "data" we are allowed to source it, but not cut and paste it. However, with "fair use", in the context of a commercial content provider, we really, *really*, would not be allowed to cut and paste it - any portion of it - simply by claiming "fair use" because it would be seen as failing criterion number 2 - "respect for commercial opportunities." This is a very narrow scope I am talking about. I am not talking about a list of who won the Grammy awards that anyone can see and re-purpose; I am talking about a combination of a few things: Formatting, Original Source, Amount of content.
 * 4) Consider touching on things like graphs and other graphic illustrations of data. Could any of that be considered a "free enough for Wikipedia" replacement of a copyrighted presentation? The focus may be on text, but often that can be presented in another way. That is where some other policies, such as OR, might figure in. Re-drawing of figures from scientific works is a current discussion that touches on such issues. It is more about images than numbers, but "data" is "data" - in this case a question is raised about whether "The drawing is a reconstruction of an extinct form of life". The only real information on such a thing would be "data". And that "data" would need to be considered via other policies in order to be presented.

I know this isn't meant to be a one stop solution but sometimes there is a very narrow line between "data" that is freely obtained vs data that is only available at one source, and data that is really free (public knowledge) vs something that Wikileaks would print/publish. Soundvisions1 (talk) 22:27, 15 February 2011 (UTC)


 * Thanks for your input. :)


 * In re: point #2, Wikipedia respects the laws of other countries to the extent that we regard content as copyrightable even if we do not have agreements with the country of origin, but at the same time, per Non-U.S. copyrights, "While Wikipedia prefers content which is free anywhere in the world, it accepts content which is free in the United States even if it may be under copyright in some other countries." We have had OTRS challenges re: sweat of the brow compilation protection, and these have been rejected. It may very well be worth creating a template to note when content may not be legal in other countries, just as we do with images on Commons, though. That way reusers of the content will know that the material may not necessarily be free for their intended uses.


 * In re: points 3 and 4: the 10 criteria of WP:NFCC do not apply to text; the policy says (emphasis added):

"Articles and other Wikipedia pages may, in accordance with the guideline, use brief verbatim textual excerpts from copyrighted media, properly attributed or cited to its original source or author, and specifically indicated as direct quotations via quotation marks, <blockquote>, or a similar method. Other non-free content—including all copyrighted images, audio and video clips, and other media files that lack a free content license—may be used on the English Wikipedia only where all 10 of the following criteria are met."
 * This was created, I'm afraid, with a text bias, since that's the area where I work. Text has its own criteria, set out here and incorporated into policy by reference. I'm not sure if it needs to include other media. Charts and diagrams may also be compilations, but I'm afraid I feel inadequate to even begin to address that. :/ I only know that Godwin used to say that if a chart or diagram could be done in a different form than one claimed under copyright, it should be. Now, I could mention that information presented in copyrightable presentation may be reconfigurable in a chart, if you think that's a good idea. My "What copyrighted selection/arrangement mean for Wikipedia" is a bit skimpy.


 * In re point 1, I'd be happy to expand that, but I'm not sure how without getting considerably longer. I mention the creativity of selection and arrangement pretty early. Do you have any recommendations? Maybe a pull-quote? --Moonriddengirl (talk) 22:53, 15 February 2011 (UTC)


 * Thanks for the response. In regard to 3/4 - it does apply in the correct context. You even quoted it: Articles and other Wikipedia pages may, in accordance with the guideline, use brief verbatim textual excerpts... As with the rest of the criteria, words such as "brief" apply to the context, and meet the Foundation's resolution about such exemption being minimal. The policy on Non-free content applies across the board. In the context of what this is about, if an editor took "data" from a commercial content provider and used it in its entirety, sourcing it or not, it could not be made "safe" by simply claiming "fair use". It's the same reason Wikipedia wouldn't allow a full text of a book, or even a full chapter, to be posted at Wikipedia. Not only would it be failing the "brief" allowance for text, it would most likely be seen as failing "respect for commercial opportunities." But my main point is I think it needs to be said that a claim of "fair use" will not stop such "data" from being deleted if it is seen as *not* meeting the policy.


 * And as for the graphs - it was somewhat of an addition, in regard to "free content" in case such "data" as is being discussed here is copyrightable. I know your main area is text, but in the overall purpose of Wikipedia they go hand in hand. I could very easily see both editors and editor/admin claiming graphs are "free" despite a copyright claim because the "data" used to create it is "free" and a "Something is PD cannot be copyrighted" argument or, in regard to how many images are now being seen here, "public domain" because they lack "creativity." Maybe it could all tie in simply by stating something along the lines of "some underlying data may be free but the means by which that data is presented may not be." (i.e., the concept that "1+1=2" may be free but a pamphlet on why one plus one does, or does not, equal two may not be. A list of Academy Award winners may be "free" but a book about them may not be. etc) And "While Wikipedia allows for brief verbatim textual excerpts under its non-free content policy, moving (cutting and pasting) full data tables would fail any claim of fair use."


 * And I know you (at least seem to) disagree but content taken from any sort of commercial content provider is a copyvio unless presented in the correct manner. Normally that would be in the context of "fair use". But an article that contains a "list of the top 100 grossing films in the county" (yes, county - not country) that is offered for sale cannot be used in its entirety here simply because it is presumed the data is "free". If that data is not available anywhere else *but* that provider it fails our policy when claimed via "fair use", and if it isn't it is a copyvio. Again, going back to Feist v. Rural, part of the argument was that a phone book is a requirement set forth by the government, so that sort of data can't really be under copyright. That is a lot different than if private "data" is being sold somehow, and for what reason.


 * On a related issue that should be mentioned - Privacy law varies, but it does exist and, in relation to certain data, should be considered in relation to our policies and real world implications. And I know that the scope of this is copyright, not privacy issues, but in the internet age I think the line will become (has become) a bit blurred. See Do you know who is looking at your personal information? for information on some of the related lawsuits about "free" data from a legal firm involved in such cases. Just because underlying data may be "free", does it mean Wikipedia needs to use it? Just a thought because I know this somewhat arose out of items such as the "Billboard top 100" or "top grossing films" but in the broader sense "data" is "data". Using the same Feist v. Rural argument, ignoring policies (in this case WP:OUTING), phone-book data is only "data" and has been established as being free. Admins (and editors alike) have a right to not want their "data" (name, address and phone number) to be in an article about "List of Wikipedia admins by state/city/town", even if using the same "free" argument as a basis. Copyright, or lack of, shouldn't fully be the only consideration.


 * Maybe, in looking over what I have said, a more "plain English" wording may be "Use of any data must meet Wikipedia policies, including but not limited to WP:OUTING, WP:NPOV, No original research, Non-free content criteria and Copyright violations" Soundvisions1 (talk) 00:00, 16 February 2011 (UTC)


 * Oh, no, I'm afraid you've misunderstood what that says. The guideline is at WP:NFC. The policy is at WP:NFCC. The 10 points are part of policy, not the guideline. ("Guideline" and "policy" have very specific meanings on Wikipedia. :)) I really feel pretty strongly I know the intention of that one. For example, looking at those 10 points, #3.1 is completely inappropriate, #7 has no bearing (I don't know if the Paltry quote I utilized on the front of this is used in an article or not); and #9 does not apply at all (it goes without saying that neither does #10).


 * "And I know you (at least seem to) disagree but content taken from any sort of commercial content provider is a copyvio unless presented in the correct manner." ? I don't know why you'd think that; excepting, of course, that I don't believe content taken from a commercial content provider needs any special handling whatsoever if it is not copyrightable. We don't have to claim "fair use" for public domain content, right? They have no exclusive right to what they do not own. --Moonriddengirl (talk) 00:20, 16 February 2011 (UTC)


 * I meant guideline - but that is just a plain English example of the policy's wording which says "Minimal extent of use", or, as it would relate to the text example "use brief verbatim textual excerpts." And 3 is actually an "a" or "b" situation and "b" applies. Number 7 would surely apply - I can't simply cut and paste a section of a non-free article and call it a Wikipedia article and state "fair use" in the edit summary. (I know I have discussed this in the past but for text you use a copyvio tag that blanks the page or section, for an image if it is claimed as "fair use" that same copyvio concept is rendered null and void - but it really is the same idea). Number 9 I had discussed with you in the past - such as a user using song lyrics on their user page. The concept is it is ok, again, to "use brief verbatim textual excerpts" but, for me anyway, the exact same argument should be used across the board. (for a short clip of a song or a short clip of a film in a user space for example or why AFI's 100 Years...100 Movie Quotes is acceptable to use all the "quotes", but it wouldn't be to use clips in AFI's 100 Years...100 Thrills to show *why* these are "heart-pounding") (NOTE: Just saying, it is not really related to this but I always ponder such justifications here so that is more of a rhetorical question)


 * I thought at some point you had said you can't call anything claimed under fair use a copyvio. If you didn't sorry about that. :)


 * What was said below was sort of what I was getting at as well - go beyond the oversimple "nutshell" but not as deep as some of the wording. Although the lead in may be swaying too far in the simple direction now - not sure.


 * One section that still bothers me - and mostly because of conversations I have at times - is in the "Background" section. "Discoveries (facts) are not copyrightable" just leaps out at me. It is one of those things that I can see being quoted way out of context over and over again to justify use of any data related material. Idea-expression divide says it nicely: ...limits the scope of copyright protection by differentiating an idea from the expression or manifestation of that idea. To me the key idea with data is the presentation needs to be separated from the information itself. So, maybe, "While discoveries (facts) are not copyrightable, the author who presents such information has wide latitude in choosing what to say and how to say it, so the form in which the data is presented may be copyrightable." That, to me, expands on the concept (ok, the law) that "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work" by explaining it in "plain English".


 * But looking good. :) Soundvisions1 (talk) 01:23, 16 February 2011 (UTC)


 * I've clarified the discoveries are not copyrightable statement; do you think my alteration helps? (In this content: Discoveries (facts) are not copyrightable, but compilations often are. Copyright doesn't only govern fiction; an historical essay may be as much an original work of authorship as a purely speculative science fiction novel. The author of each has wide latitude in choosing what to say and how to say it. Likewise, a list or compilation may be extremely creative. We must determine the degree of creativity (and, hence, usability) on a case-by-case basis.)


 * In terms of copyvio, although I'm capable of saying wonky things sometimes, I don't think I've said that particular wonky thing. :) You may be thinking of a conversation about how to tag items listed under fair use? Currently WP:CSD excludes content "claimed by the uploader to be fair use", which limits our approach to that stuff. As to the NFC stuff, we can go on more about that elsewhere, my talk page if you like, since it's off topic here, but I do have to wonder if you're reading the same #7 that I am. :) ("One-article minimum.") --Moonriddengirl (talk) 12:57, 16 February 2011 (UTC)
 * Yes to the first. As for the second, I don't remember overall. F9, as it relates to commercial content, has always been a thorn in my side because it goes against the Image Use policy that clearly states an invalid claim of fair use is a copyvio. It is accepted here that claiming fair use on any content from a commercial content provider is an invalid claim of fair use. 1+1=2, at least to me. With text that is always a "non-issue" as I've said. And that ties into NFCC#7 somewhat, even without content coming from a commercial content provider. If you think of a mainspace article = a file page it might make more sense. Non-free material needs to be used in order to validate any claim/s, so in the context of text if I cut and pasted a list of data into mainspace and said it was "fair use", without any surrounding article (think of it as "context" or a FUR for other material), it is akin to upping a non-free file and not placing it into any article. For text such a use would be called a copyvio, for an image it wouldn't, but the core implication is exactly the same - without the article it is not usable, and certainly any claim of "fair use" for either should be seen as an invalid claim of fair use.


 * And just a note on the examples. You don't suck at it at all, but is there a reason for the use of the check-marks and "x"'s? When I look at that it causes confusion. Depending on how one looks at it, shouldn't "Safe content" only contain examples with a green check-mark? And shouldn't "Uncreative selection" only contain red x's? To me green suggests "Ok! Great!" and red suggests "Danger, Will Robinson". A good example of "Uncreative selection" would be a really obvious one - the English alphabet. Soundvisions1 (talk) 14:36, 16 February 2011 (UTC)

Talk page stalker comments
Hope you don't mind me commenting here - unsurprisingly I currently have your talk page watchlisted and the section "List of highest-grossing Bollywood films" grabbed my attention as I was for a short while involved in one of these (though I think it may have been the Tamil version) and so ended up here as I've wanted to understand more about the list issues for a while now. Anyway two comments immediately spring to mind: Dpmuk (talk) 23:53, 15 February 2011 (UTC)
 * The lead to me reads more like background information than a lead. I'd like to see more of a summary of the whole page here (but in more detail than in a nutshell so as to avoid duplication).  This may well also help with my next point.
 * I think this page is, as it stands, great for people used to dealing with copyright problems and who are likely to be making the decisions but I don't think it will help the average editor understand the issue. As it stands I think it is overly-legalistic and overly-long for the average editor to easily understand and so I think it needs a section (or the lead) to be more aimed at the average editor.  If I'd been pointed at this a couple of months ago I'm not sure I'd have been any the wiser as to why something was a copyright concern.


 * Okay, so I need to pull it back. Thanks. That's very helpful. :) --Moonriddengirl (talk) 00:20, 16 February 2011 (UTC)


 * Is this any better, or no? I really appreciate your feedback, by the way. One of my big challenges is figuring out how to explain stuff to people who aren't already familiar with it, so your sort-of-fresh eyes are perfect. :D --Moonriddengirl (talk) 00:42, 16 February 2011 (UTC)


 * I think that's much better. My only comment would be to suggest that somewhere in the lead it's stated that we work under US copyright laws as that should make it clear to people why what you summarise may differ from what they understand the situation to be.  I'll leave the discussion of the rest of the page to those that know the relevant law / case law etc as everything I know I've found out on wikipedia :-). Dpmuk (talk) 11:46, 16 February 2011 (UTC)

Simple-minded comments
Looking at this document as a future guide, I have just a couple of comments related to style rather than the issue of copyright:
 * 1) The text uses the phrase "value judgements" and the article would benefit from saying a little more on what is commonly cited that is not a value judgement as well as complex examples that are. For example:
 * ✅ estimates which are based on a repeatable calculation, such as trend analysis or interpolation
 * ✅ forecasts using standard repeatable methods, such as net present value calculations
 * ❌ ordered rankings based on judgement, such as the top 50 most influential Muslims
 * ❌ estimates with no definition as to the source of data or how they are calculated and are either challenged or probably based on unreproducible experience, such as bookie odds
 * ❌ calculations which are themselves based upon numbers created by value judgements
 * 2) I think it is helpful to compare with existing articles to show how the issue was managed successfully. There is one example given in the text, and perhaps a couple of different types of example would be good illustrations.
 * 3) At the moment little is said about recommending how far one can apply fair-use. This is difficult, however it is safe to answer the basic question "do we have to remove all mention of this website?" - obviously it is well within fair-use to cite or quote a demonstrably trivial amount from the data such as one or two numbers per stand-alone article. For example it would be fair to say "BOI estimates that Dil Toh Baccha Hai Ji made Rs. 26 crore of gross box office income by the second week of showing which ranks it as third overall for February 2011 with Yamla Pagla Deewana running in first place" which explicitly quotes one estimate and implicitly uses their estimated rankings. However the question of how much one can quote in a list article (such as extracting the top ten ranking) remains open for debate. Fæ (talk) 10:03, 16 February 2011 (UTC)
 * PS I added "a non-standard thematic list of postage stamps" as an example of creative content. I have such a document giving a thematic breakdown of the collections in the British Library, I dare say they would release it for free use but it is potentially under copyright. Fæ (talk) 19:25, 16 February 2011 (UTC)


 * Thanks for coming up with yet another example. :) As you've seen, I've implemented yours. I hope I've addressed your points above, all very good ones. --Moonriddengirl (talk) 19:25, 16 February 2011 (UTC)

Doh!
I was searching for something else but I came across a few things that should be linked to. Namely sections of the What Wikipedia is not policy. The Wikipedia is not an indiscriminate collection of information section has a few sub-sections that relate to this topic. Just thought I'd toss that out there. Soundvisions1 (talk) 16:21, 16 February 2011 (UTC)
 * Lyrics databases. Perhaps not in the same vein as some of the data being discussed but, in part: Most song lyrics published after 1922 are protected by copyright, and any quotation of them must be kept to a minimum, and used for the purpose of direct commentary or to illustrate some aspect of the style. (This section contains a link to the Do not include the full text of lengthy primary sources guideline)
 * Excessive listing of statistics - in part: articles should contain sufficient explanatory text to put statistics within the article in their proper context for a general reader. It goes on to say if a longer list is needed to use a style similar to Nationwide opinion polling for the United States presidential election, 2008. < - That list seems to also be a good overview of how to also present sourced data and not infringe on any copyrights.
 * Catalogue - Wikipedia is not a stamp catalogue nor a database of collectables. More than the existence of reliable published information regarding specific items is required for inclusion.

Some examples / Questions
Liking this a lot - the clarity of writing & the premise. Though I have some concerns about how far it is pushed. Thoughts/questions:

First: Time's All-TIME 100 Movies appears to be an example of a copyright-violating (under this scheme) list? Am I seeing that right?

Second question: a table of racecar drivers clustered by average protein consumption before a race; what about if we have a table of racecar drivers in general that also has (under a reasonable scope) a column for protein consumption before a race. And what if that table is sortable - theoretically allowing you to sort/cluster the list by protein consumption. Any issues with this? Or is the word clustered meant to imply more than simply "sorted by"?

Third:  a list of notable residents of a town; that the individuals are residents is fact, but the selection criteria "notable" is not; what about presenting this as attributed prose (or even just attributed list). So something like: "In a 2008 article, Joe Bloggs identified Fred Jones, Albert Spangler, Tom, Dick and Harry as the most notable graduates of the college". Would that cause trouble?

I think this is one area we will have to be very careful to nail down because it could easily cover a lot of material. --Errant (chat!) 12:58, 18 February 2011 (UTC)


 * Thanks for the feedback. :)


 * For 1, yes, it most definitely is. Kind of an urgent one. :/ And that's not limited to this particular document, but to clear precedent in U.S. law and consistent guidance we've received over the years from Mike Godwin and our interim associate counsel. A value based judgment like that is copyrightable. It's not remotely "fact".


 * In terms of your second question, the presumption is that the list we are copying is sorted by protein consumption, which would be an unusual element probably with a pretty targeted audience. :) It's a theoretical example chosen in line with Key v. Chinatown, where a yellow page directory was deemed copyrightable in having unusual categories such as "bean sprout dealers". (You don't find that category in every yellow page, to be sure!) In keeping with that particular case, taking the arrangement wholesale would likely lead to infringement. Say our source lists it from high to low; if we copy his list and sort it from low to high, we're unlikely to have made enough changes to avoid "substantial similarity". The courts have been inconsistent with material like this, but generally have been lenient on reuse where that reuse adds something substantially transformative. The problem, of course, is that that's very subjective and hard to describe. If I wanted to include protein consumption of race car drivers in an article I was writing, I might approach it like this: if my source list had 50 drivers, I might reproduce 25 and add columns for other factors (height and weight, maybe; average placement in races; age). In this way, I'd be building on and adding something of substance to the original. That's what copyright laws are meant to encourage.


 * In terms of three, we can do it as an attributed prose or list, but within the confines of WP:NFC, that is, it can't be extensive. Consensus at this point is that a brief selection of the list is usable, which is why The 500 Greatest Songs of All Time includes 10 out of the 500. Context is very important there, too. Fair use considers substantiality of the content related to the whole; if we're writing an article on a college, it is easier to defend a cited statement of Joe Bloggs' opinion, as it will probably not constitute the majority of the article. --Moonriddengirl (talk) 13:32, 18 February 2011 (UTC)
 * Category:Time_(magazine)_100_Lists has a couple more FYI. I am not sure the process for them, so will let someone else bring down the hammer :)
 * Right, so in the race car driver example I was thinking in terms of someone writing a list of Race Car drivers, with all sorts of data that is reasonably considered of relevance. Including protein, that aspect of which is referenced to our fictional source list. The implication being, then, that the ability to display it in the same way is incidental to the primary construction of the list (which is made of accessible data). I see and understand that, great :)
 * For the last one: What if the list is 5 or 6 persons long? And a sort of extension to the question - what is the difference between a long piece of prose that discusses an author's opinion on the 5 most notable people (which we then present as a list) and if they presented it simply as a list themselves. Or is there no difference? Is there a length at which it switches from being a list into being citable or quotable prose? --Errant (chat!) 13:41, 18 February 2011 (UTC)


 * My twenty cents on the questions:
 * Time's All-TIME 100 Movies is not a copyvio because the article is *about* "Time's All-TIME 100 Movies." In a visual context if you looked at The Falling Man and there was no text, only the image, and it was cited to the Associated Press it would be clearly failing our policy on non free content, and it would be seen as a clear copyvio as well. If you add back in the text, it is seen as meeting our policy and is no longer a copyvio. Such an article can also be seen as meeting the Wikipedia is not an indiscriminate collection of information section of the What Wikipedia is not policy. I feel that articles should contain sufficient explanatory text to put statistics within the article in their proper context for a general reader clearly applies in this case.
 * I think, overall, the idea is that any data that is used in an article should be verifiable, and also not seen as original research. In the context of the example and how it is currently presented it doesn't appear to make clear those things are taken into account. In other words such data may "be considered uncreative in selection" - thus the red "x". If it had a green check-mark it would imply the data is "considered uncreative in selection."
 * Again, the data first must be proven to be verifiable, and not seen as original research. As the example points out "the individuals are residents is fact, but the selection criteria "notable" is not", meaning the names would be verifiable, but the notability of them is original research. In your question "Joe Bloggs" would have to, first, be established as somehow "notable" as a reliable source. And the context of the article would have to explain *why* "Fred Jones, Albert Spangler, Tom, Dick and Harry" are named "as the most notable graduates of the college". One presumption might be that "Fred Jones, Albert Spangler, Tom, Dick and Harry" all have their own En Wikipedia stand alone articles.
 * So there is my opinion/s on the matter. Soundvisions1 (talk) 13:53, 18 February 2011 (UTC)
 * (my ½p worth) I would disagree with your conclusion about the Time 100 - if you examine the source quoted (link) the table is virtually a cut and paste of the body of the Time article which has an unambiguous copyright statement underneath it. The table has creative content as it is a ranked list based on unreproducible expert qualitative judgement. I would agree that if the embedded list were removed, then the rest of the article would not represent a copyvio. On a practical note, the list was only recently added and earlier versions of the article are not an issue (for example this version from December 2010). Fæ (talk) 14:01, 18 February 2011 (UTC)
 * I, too, disagree with your conclusion about the Time 100. They've reproduced the entire thing. We cannot copy Harry Potter even if we wrap text about the book around it. :) --Moonriddengirl (talk) 14:12, 18 February 2011 (UTC)


 * That is part of the point I feel. Clearly, on its own, File:The Falling Man.jpg is a "cut and paste" of a copyrighted image...you could say it is a 1:1 "cut and paste." What Time uses for their layout and what Wikipedia uses are not 1:1. The idea, however, that I was getting across is that the source of the list is *only* the list; there needs to be a "transformative nature" to viewing that list in another setting - such as the article here. As with The Falling Man, the Time's All-TIME 100 Movies article takes something and adds to it. Soundvisions1 (talk) 14:14, 18 February 2011 (UTC)


 * I've removed the list. It is plainly "value judgment" and can only use brief excerpts in accordance with Terms of Use. If you disagree, we should discuss it at that talk page. --Moonriddengirl (talk) 14:17, 18 February 2011 (UTC)
 * Well, the good news is that only one other article in that category had copyright problems, and that was also recent. The edit summary when Time's List of the 100 Best Novels was created suggests that it had already been deleted for copyright concerns, but I don't know under what title. There's no log at that title or at TIME's List of the 100 Best Novels, where it was originally created. --Moonriddengirl (talk) 14:32, 18 February 2011 (UTC)

(This is what I originally wrote; I'm sticking with it. I'll address new points above.) I've already written to try to get attorney feedback on List of highest-grossing Bollywood films (since I'm unsure if I should follow up with interim counsel, who gave me the opinion quoted on the front); I've expanded that to include the Time 100 lists. :P I hope to get some feedback soon. And, yes, I believe the courts would consider primarily the factor of whether or not we are appropriating their work or building on it. In Key v. Chinatown, the competing directory escaped infringement even though the material was held to be copyrighted because they did not appropriate the structure and selection. The more ours differs, the safer we are. The amount of protein consumed is not itself copyrightable.

In terms of your last question, it's all very subjective. The substantiality is based on the importance of the material; you can infringe by copying too much, or you can infringe by copying "the heart" of your source. We really should consider how much of our work relies on theirs and how much their work relies on what we've taken. The more we take, and the more important, the greater risk we run of infringement, when the content is creative. It shouldn't really matter whether we are copying prose or a list, or transforming one into the other, so long as we stay within fair use. Non-free content is a bit skimpy on what that means within text, but generally there should be good reason for reproducing non-free text information. The more it is used in the context of a larger work discussing the subject, the better off we are likely to be in that respect. If I created an article and just pasted in Joe Blogg's "100 notable people from Stanford" list, I'd be in trouble. I'd be superseding the original. A more defensible use would be to select individuals from the list and present cited commentary on his selection of them. ("Joe Blogg's list has been criticized for some of its members, such as John Smith, Jane Doe, John Doe and Jane Smith, all of whom have established careers in the dairy delivery industry. One critic, Jim Critic, wrote, 'Does it really matter to a milkman's career that he came from Stanford? I don't think so; and I don't think milkmen are what Stanford is looking to produce.'")

How much alteration is enough to avoid copyright infringement is always going to be a moving line, depending on the viewer. The Supreme Court has reversed a number of lower court decisions on the question, which kind of illustrates that even amongst so-called "experts" there is substantial disagreement. I've seen court judgments heavily criticized on both sides of this particular debate, and I suspect that the judicial definition here is not really firmly established. --Moonriddengirl (talk) 14:11, 18 February 2011 (UTC)
 * If it is a copyvio the exact same concept needs to apply to what I presented earlier - AFI's 100 Years...100 Movie Quotes. And items such as are contained in US Albums could be seen as exactly that same thing. Part of the overall issue is "how much" is too much *and* context. To me List of number-one albums of 1961 (U.S.) comes off as a cut and paste, but is also seen as meeting the Manual of Style (stand-alone lists) guideline (Which, incidentally, also sends people to Lists of people which could also apply to one of the questions) I 100% agree that we do not use the full text of a book in an article about a book. We do not use all of the music on an album in an article about an album. And so on. However we are discussing "data", and, no matter how you slice it, at Wikipedia the Non-free content criteria policy applies. "Critical commentary" is a foundation for use, and helps to provide the "how much" context. List of number-one albums of 1961 (U.S.) really doesn't have any "critical commentary" for example, so I wouldn't see how it would pass our policy. Time's All-TIME 100 Movies does have "critical commentary". I would agree that a "list of the top 1000 films of all time" most likely should *not* use the full 1000, no matter how it was presented. So the "case by case" scenario comes into play. Soundvisions1 (talk) 14:37, 18 February 2011 (UTC)
 * When we discuss presentation it's not really the look of the thing that is being discussed; it is about presenting the data. So #1's of X year is fine, because that is purely data, and if some source lists those #1's no subjective work has gone into compiling it. That those #1's are each part of a list that would be copyrighted is fine because we are only using a portion of the list. I think MRG's point here is that if we are re-using the creative intent behind a list then it is a copyright issue. Time's All-TIME 100 Movies does have commentary, sure, and that means it would be reasonable IMO to present a portion of the list under fair use (if it can be shown this is useful to accompany and illustrate the commentary), but the whole list doesn't have a fair use rationale really. --Errant (chat!) 15:49, 18 February 2011 (UTC)


 * I'm just commenting here because while I see you're getting legal counsel on this (which will resolve the issue), I believe that what films Time selected is simply "Facts", the creative aspect being the blurb that each film has (if you click through to the linked article) being the part that is clearly subject to copyright. What Time, or AFI, have done is very little different from, say, the selection of films for nominees or awards at the Oscars - the how and why are creative aspects that, were they published, we couldn't include wholesale, but certainly the bare facts of what was selected is uncopyrightable.
 * I agree that there's a separate notability issue on these lists and that needs to be met before the lists can even be considered to be used, but my concern is the copyrightability of a list versus the rationales or extra creative effort that discusses that list. I do await what legal says about these, however. --M ASEM (t) 17:50, 18 February 2011 (UTC)
 * That a movie is 90 minutes long is fact; that a movie stars Brad Pitt is fact; that a movie is "best" is opinion...or, if you will, "value based judgment." Your list of "best" films is not going to be the same as their list (unless by remarkable coincidence). No one can argue that the list generated and published here is formulaic. The list is inherently creative, as it relies on subjective human opinion. --Moonriddengirl (talk) 17:57, 18 February 2011 (UTC)
 * Coming from your NFC post, I agree with the approach that if the list is compiled strictly from some human decision metric (as the Time list likely was), it is copyrightable, and thus the question of what amount we can include is an issue. I think we need to be clear (at WP:SAL, at WP:NFC?) to distinguish these types of "Best of" lists from others that are acceptable as they are simply factual. --M ASEM  (t) 18:17, 18 February 2011 (UTC)
 * That's kind of the point of this essay. :) But, yes, if people are not aware that some lists are creative, we may well need that encoded in guideline. The copyright FAQ brushes upon it, but only very lightly, and while it mentions "creative selection" is copyrightable, it doesn't really offer any guidance on how much we can use. --Moonriddengirl (talk) 18:23, 18 February 2011 (UTC)
 * First, sorry if it seemed as though I was crossing the copyright issue with the non-free content issue, it was not my intent. But, as with almost everything at Wikipedia, it all ends up somehow mashing together.


 * As I mentioned in my Doh! post there is the What Wikipedia is not policy, specifically the Wikipedia is not an indiscriminate collection of information section which covers some of this very clearly yet doesn't touch on the copyright issue at all outside of the use of lyrics. For "data" the explanation ties back into the non-free content policy without really saying so and how "critical commentary" is a must by saying that "articles should contain sufficient explanatory text to put statistics within the article in their proper context for a general reader." I admit it is an extremely fine line and someone looking for a "list" of something is not automatically going to associate it with the NFCC, maybe not even copyright - more than likely a noob will be sent to Manual of Style (stand-alone lists), which makes no mention of what I started off talking about and is now tied into this as well. But, as I mentioned above, that guideline does state Stand-alone lists are Wikipedia articles; thus, they are equally subject to Wikipedia's content policies, such as verifiability, no original research and neutral point of view. At face value it has nothing to do with copyright if you are a new user, but to those who have been around the block a few times we could stop at "they are equally subject to Wikipedia's content policies" and understand that Copyrights, Copyright violations and Non-free content criteria also apply to a stand alone list *or* a list within an article.


 * I think that what Moonriddengirl said up above (It shouldn't really matter whether we are copying prose or a list, or transforming one into the other, so long as we stay within fair use. Wikipedia:Non-free content is a bit skimpy on what that means within text...) is very much in line with what I have been (trying to say) saying overall. And I fully agree with what Masem says also (I think we need to be clear to distinguish these types of "Best of" lists from others that are acceptable as they are simply factual.)


 * The "how much is too much" question is probably better left to the question I posted elsewhere, but it is a valid comparison to ask/say that if the actual list section in Time's All-TIME 100 Movies is either (take your pick - all or some) 1. too long, 2. copyvio, 3. fails our non-free content policy, then so does AFI's 100 Years...100 Movie Quotes. In both cases somebody (or a few somebodies) made a "creative selection/value judgment" as to what was a top 100 and Wikipedia, in both cases, made a post of the full top 100 in an article *about* those lists. One was removed, one wasn't - so there is no real clear guideline or policy that explains it either way - and yes, I understand that is part of what *this* essay is going to be for. But I also think things need to be tied in both here and elsewhere (where it currently isn't)


 * Did that make sense? Soundvisions1 (talk) 20:36, 18 February 2011 (UTC)

Preliminary feedback
I've gotten a first response from our associate counsel and have sent her a follow-up e-mail with some additional questions I'm hoping she'll have time to answer.

In terms of guidance on what we can do with copyrighted lists, she says in brief:
 * 1) The more we use, the greater the risk.
 * 2) The more important the content we use, the greater the risk.
 * She explains that "if you list the top 5 out of a top 20 or even top 100 list, it's less likely to be fair use because the top 5 is usually what the public is the most interested in. Whereas if you give #2, #6, and #18-20, even though you are giving up the same percentage, it is more likely to be considered fair use."

She suggests we also consider the following:
 * 1) Using older lists that are out of publication may help a finding of fair use;
 * 2) Republishing the lists as we do "appeals to the same audience as the original" and is not likely to be seen as transformative.

I've asked her if there are any kinds of percentages that we may use as a guideline, recognizing that there may not be, and will report here. --Moonriddengirl (talk) 17:24, 28 February 2011 (UTC)


 * Thanks for looking into this. It's good information to have from the Foundation, as it were, and I am glad you posted it over at the non-free content guideline talk page as well. Soundvisions1 (talk) 18:38, 28 February 2011 (UTC)

Feedback
Hi! I read over this and made some small corrections to accuracy (it suggested freely licensed works aren't copyrighted, which I'm sure wasn't your intention), but the rest looks fine to me. I don't have a strong background in this area but you've done good research on the case law. Like some of the commenters above I'd like to see more about charts and graphs if possible. I think it's also worth commenting that analysis and commentary on a list can improve a claim of fair use (in particular, we should describe what third party reliable sources think of the list, not just the list itself). Hope this helps. :-) Dcoetzee 06:41, 14 March 2011 (UTC)
 * Thank you. :) I've added a bit, and immediately stricken it. I'm trying to draw together an RfC, since my recommended approach is a bit more liberal at this point than our attorney feedback. --Moonriddengirl (talk) 17:26, 16 March 2011 (UTC)


 * I've launched (very belatedly) a discussion at Wikipedia_talk:Copyrights. I'd like to get together some ideas and perhaps a proposed guideline on the question before posting an RfC. --Moonriddengirl (talk) 12:34, 2 July 2011 (UTC)

Copyright of polls
WP:Articles for deletion/200 Greatest Israelis is an AfD on a list formed from a news website poll. The outcome was delete, but the closer dismissed the lengthy dispute over copyvio as irrelevant. Based on the delete precedent, roughly 25 AfDs were initiated. Most have been closed as keep, as the participants agree that there are no copyright concerns.

These were all presented at the first AfD and dismissed by some of the participants:
 * WP:Non-free content,
 * Its footnote WP:Non-free content, and
 * The discussion WT:Non-free content/Archive 51.

I believe that the local AfD consensuses conflict with the attorney feedback. Flatscan (talk) 04:53, 14 October 2011 (UTC)
 * If the community does not believe this is a credible legal issue we can go back to the attorney for clarification or we can refine our interpretation of that advice so that the guidance makes more sense. AfD discussions cannot override the legal advice. Was there a specific recommendation for improvement from the AfDs as opposed to just "keep"? --Fæ (talk) 07:42, 14 October 2011 (UTC)
 * The outcomes that I spot-checked were keeps without additional comment from the closer. The lists that I spot-checked included all 100 people, sometimes with multiple sortings (e.g. both ranking and alphabetical). Flatscan (talk) 04:46, 15 October 2011 (UTC)


 * I'm inclined to agree with Fae here. There is obviously gray area in copyright law, as the parameters are not clearly defined legally (and probably could not be). If attorneys never disagreed, cases would seldom get to trial, because they would not need a judge to settle things. Assuming good faith and general competence, either plaintiff attorney would say, "You're wrong; you'll lose. Back out." or defendant attorney would say, "You're wrong; you'll lose. Settle up." And certainly there'd never be dissenting opinions among judges. :) I do know that our attorney didn't give the advice lightly; it reflects her legal opinion, and it was engineered for safety. While some lawyers might think only of the protection of Wikipedia, our legal team is highly motivated by desire to protect our users as well...and good on them. I would think far less of her than I do if her opinion started and ended with "Well, if it's a problem, we can take it down...." rather than thoughtfully evaluating risks. --Moonriddengirl (talk) 12:09, 14 October 2011 (UTC)
 * Thanks for the replies. I can see the argument that asking "Who is the greatest person from Country X?" and collecting the unfiltered responses have minimal creativity. On the other hand, "greatest" is a value judgment, and the results are not reproducible. Flatscan (talk) 04:46, 15 October 2011 (UTC)
 * The legal advice received was for ranked lists where creative judgement was involved, such as "the 100 most Influential Moslems" as judged by a panel of scholars or "the top 100 Bollywood films of 2011 by (estimated) box office income" where the estimates are not a repeatable calculation or otherwise statistically based on published box office receipts. The advice, as far as I know, did not specifically cover the result of public polls and the nature of creative content is arguably quite different from the situation when asking for the same information from a panel of experts or a single expert estimator even though all these situations have some creative content. Considering how the community response is not comfortable with the interpretation of the legal advice, it seems reasonable to go back to our attorney to clarify that the same advice must specifically apply to the results of public polls where any estimate or ranking does not rely on any expert opinion and where no copyright release is given (such as might be the case for Government sponsored polls). Given such a clarification, if the advice is unambiguously that this is a copyright problem we can then amend any guidance and make any necessary minimal changes to the articles in question (such as removing a copyrighted ranking but keeping the list of names in alphabetical order). --Fæ (talk) 06:35, 15 October 2011 (UTC)
 * Moonriddengirl, would you be able to request clarification? I was worried that the AfDs would snowball further, but I don't see any new ones. Flatscan (talk) 04:26, 19 October 2011 (UTC)
 * Yes, I can. I will do, although it may be a few days before I get response. --Moonriddengirl (talk) 10:23, 19 October 2011 (UTC)

BTW, I have raised a similar consideration at Articles for deletion/Le plus grand Belge where unfortunately the only version of the source material is tucked in an old archive which clarifies nothing about the intended copyright status or the process by which the poll was conducted. --Fæ (talk) 19:09, 5 November 2011 (UTC)


 * Hi Moon/Fae. I raised some of the below points at the AfD pointed to.  But so that they do not get lost, will repeat them here.  My view (informed from having faced this issue in my three decades of practice), is that there is clearly no copyvio here.  Some main points to consider are that: 1) there is no copyright vio here per Feist and its progeny, 2) the opinion expressed by the sysop law professor confirms this, and 3) the readily apparent reflection of aggregate poll results by all manner of top-flight media sources confirms this as well.

--Epeefleche (talk) 20:51, 8 November 2011 (UTC)
 * 1) U.S. law. The U.S. Supreme Court clarified the issue of the application of copyright to lists of this sort in Feist Publications v. Rural Telephone Service Co., 111 S. Ct. 1282 (1991), in which it wrote (emphasis added):  "A factual compilation is eligible for copyright if it features an original selection or arrangement of facts, but the copyright is limited to the particular selection or arrangement. In no event may copyright extend to the facts themselves."  So — a screenshot of the list of the aggregate poll results would, for example, be covered by copyright.  But the mere listing of the compilation of the fact of the names and order in the aggregate poll results is not covered by copyright.
 * 2) What is protected — Attribution and Format. The key to having a list without violating US copyright law is:  a) attribution; and b) format.  There is no copyvio as long as we have: a) attribution (which we do have here; the lists at issue clearly attribute their sources), and b) the format of the list is not a mirror of the original format (that's because the original format is covered by copyright law; we are also OK here).  Per Feist.
 * 3) Not the opinion of the pollster. It is noteworthy that these poll results are not the opinions of the pollsters themselves.  The pollsters did not create the responses.  They have zero "creative content" in the fact of the aggregate responses.  They do not have any "interest" in the aggregate opinions of the polled public.
 * 4) Not an expert opinion. It is also noteworthy that there is no issue of "expert opinion" here.  Note —even where there are expert opinions by a group of experts, results are commonly reflected. As in AP football rankings, and football and other sports polls.  And Academy Award results (the Academy Award is a poll of a professional organization, overseen by a Board of Governors; if there were any copyvio issue, one would think that the opinions of members of an organization would be more likely to attract copyvio protection than would the opinions of disconnected respondents in a national poll). And NME Critics Polls.  But the instant non-expert polls don't even raise the issue.
 * 5) The aggregate opinions of thousands. It is also worth noting that these poll results are the factual aggregation of opinions of the thousands of people polled.  There is a large difference between the opinion of one person, and the aggregate opinion of a multitude of people.
 * 6) Broad top-level RS media reflection of such lists. If there were a copyvio issue, media would never be able to reflect such poll results—often, as in the football results, running into the dozens of entries or more—without violating US law.  The top-level media does in fact report such results.  Without any apparent copyvio concern.  There is no reason—based on actual practice—to think there is a copyvio involved in reflecting such poll results.  No specialized legal knowledge is involved here; even laymen can see that.
 * That is why all manner of highest-level RS media (and wp) reflect every sports list determined by poll such as: lists reflected in Category:College football rankings, every Academy Award list such as the List of Academy Award-winning films, every Gallup list such as the Gallup's List of Most Widely Admired People of the 20th Century, the List of Hot 100 number-one singles of 2011 (U.S.), all lists reflected in Category:Time (magazine) 100 Lists, every poll of critics such as the 1974 NME Critics End of Year Poll, and every election (though some elections may be also subject to other exceptions from copyright laws). These are just a smattering of the hundreds of poll results covered on wikipedia, and in normal course by media in general.  Does anyone honestly think that both the media and wp are forbidden by US copyright law from reflecting these poll results? If so, why is all manner of media in fact reflecting them?
 * 1) If Feist were not the law of the land, but the opposite were true, we would have to delete from wp every reflection of such lists.
 * 2) Law professor opinion. Even a sysop law professor, just 2 months ago, has indicated that there is "clearly" no copyvio issue with lists of this sort.
 * 3) No indication of assertion of copyright protection, and even if there were such an assertion the law would control. As to whether this publication claimed copyright protection vis-a-vis the results of the poll, there is no evidence that it did.  Nor would any hypothetical assertion of a right— where none exists— override the law as reflected in Feist.
 * There are some technical issues here on interpretation. For public polls with random responses from the public there may be no issue, but the selection method, sample size and nature of how lists are generated is often unclear in the sources. In the example of the deleted list of the top 500 most notable Muslims, the judging panel was composed of 12 people; this may well be redefined as an expert "poll" of 12 selected people. Would we now consider undeleting? This may seem slightly wiki-lawyerish, but examples of polls to produce "top 100 most notable" lists might be:
 * A poll of 4 well known experts
 * A poll of a larger number of individuals but of a constrained nature (say a selected set of 20 academics who have published articles in the British Medical Journal)
 * A poll for comments may be considered quite distinct from a simple voting poll (these would more often be called a survey, but there is no firm use of language here)
 * We also have the issue of taking information from sources where no copyright is declared and those sources where a copyright is actively protected. For example some marketing survey reports are almost entirely based on privately commissioned polls (sometimes with the only added value being easily reproducible calculations and sometimes where the contributors were paid to take part) and yet one pays significant amounts of money for a copy of the report, so to claim such reports have no copyright protection seems rather ground-breaking as an interpretation of Feist. --Fæ (talk) 22:00, 8 November 2011 (UTC)
 * Fæ -- this discussion was prompted by our collective discussion at the AfD, which you refer to above, of Le plus grand Belge. That was a poll by a Belgian TV show.  Thousands of votes were cast by its audience, who voted for the greatest Belgian. That raised none of the issues -- even if they are issues -- raised in your above post.  And yet in that AfD, you voted "Delete on grounds of doubtful copyright".  Looking at the specific facts of the poll in question, I don't see any copyvio issues whatsoever.


 * Similarly, the Israel poll first referred to above, which is the other poll mentioned in this string, was also typical of the recent series of AfDs of polls where a nom who is an apparent non-lawyer raised the spectre of copyvio--but without any discussion of applicable law. The Israeli poll was also by the public media.  Of thousands of members of the public.  Reported publicly, in the media.


 * We can of course get into extended discussion of what is not at issue in these AfDs. But that might be a waste of time.  Focusing on the actual attributes of the polls raised by others above, I see no copyvio issue.


 * As to the last point, one can only actively protect a copyright where, by law, one has a copyright interest. It is analogous to the point made in Computer Associates v. Altai, 982 F.2d 693 (2d Cir. 1992), in which the court held that "Feist teaches that substantial effort alone cannot confer copyright status on an otherwise uncopyrightable work".  As discussed above, that's not the case in these publicly reported polls, of thousands of people.  We're not discussing here Mercedes-Benz's privately commissioned and highly confidential consumer testing results of its latest convertible--we are speaking about polls, the results of which are publicly shared.--Epeefleche (talk) 23:34, 8 November 2011 (UTC)
 * To clarify again, by "publicly shared" you include being put on a website at any time (noting that the Belgian website no longer exists) even where that website may state something like "all rights reserved" and where there may be no information on the sample size, selection criteria or analysis that produced a list of ranked value judgements rather than numbers or statistics? --Fæ (talk) 23:45, 8 November 2011 (UTC)
 * By publicly shared I mean being shared with the public at any time -- putting it on a website for one day, or printing it for one day, or announcing it through any medium whatsoever for one day would be the same. Those all are differentiated strongly from a privately commissioned Mercedes poll survey of its customers, that is confidential and not shared with the public.  As to the "all rights reserved" point -- first of all, that is a hypothetical, as none of the lists at AfDs have involved people pointing to any such statements.  None.  But in any event, the additional point is that one can only "reserve" rights that one in fact has.  And it is of no moment for copyvio issues as to whether there is information on sample size, etc -- in fact, those are the unique elements that could be copied by another party, which involve choice and creativity arguably .... the failure to reflect them may be of interest to someone quibbling with poll results (was team x really the # 2 team in the nation, or was the AP Poll flawed in its process?), but it has nothing to do with whether there is a copyvio.--Epeefleche (talk) 23:55, 8 November 2011 (UTC)


 * Epeefleche, would you explain how List of Hot 100 number-one singles of 2011 (U.S.) is comparable to a poll? According to its lead, Nielsen SoundScan tracks sales and airplay, then ranks the singles. I guess a purchase could be considered equivalent to a vote, but that seems like a stretch. Thanks. Flatscan (talk) 05:32, 9 November 2011 (UTC)
 * Hi Flat. Feist applies to compilations of facts, and the format of the arrangement of the facts.  This can be a list of the results of a poll (whether for the Academy Award, or the  weekly AP Poll of sportswriters which in aggregate lead to a list of the rankings of the top 25 NCAA teams in college football, or a poll of the public as to the greatest Belgians), a list of telephone numbers, a list of re-sale values of cars, a list of CUSIP numbers of securities, a list of airplay results, etc.  It is what is similar that is key here, not what is different -- and what is similar is that they are lists of facts as to which no right of copyright adheres to the facts, though there is a requirement of attribution and of respecting any originality in form of presentation.--Epeefleche (talk) 06:12, 9 November 2011 (UTC)
 * An ordered list of "greatest" anything is not normally considered a "fact" and I doubt Feist defines fact to encompass any opinion that may be expressed even if aggregated from a sample of 10 people. --Fæ (talk) 09:28, 9 November 2011 (UTC)
 * Feist certainly applies precisely to ordered lists -- even ordered lists of telephone numbers, and of blue book resale values of cars.--Epeefleche (talk) 17:28, 9 November 2011 (UTC)
 * Sorry for covering this again, but a list of phone numbers is normally sorted alphabetically or by a pre-defined taxonomy and similarly for tables of resale value. Ordering a list of living people by "notability" is not a repeatable process and easy to distinguish in terms of the potential for creative content. I am not claiming there is definitely copyright, only that the scenario is not a direct or tested extension of Feist and where arguments presented are original thought or based on "common sense" rather than legally solid findings then our policy should remain conservative on interpretation. --Fæ (talk) 14:53, 10 November 2011 (UTC)
 * Thanks for your replies. That list struck me as the odd one out, so I thought that I should ask. Flatscan (talk) 05:14, 10 November 2011 (UTC)


 * I realize and so does the Wikimedia Foundation attorney who spoke to me that copyright issues are not a firm, hard line, and her opinion on the matter (as of March) was presented as her best recommendation based on the fact that case law related to this issue is murky. She wrote, "Amassing and publishing facts is one thing, but creating 'facts' through a creative process (which conducting a survey arguably is) is another, and the latter likely to be declared sufficiently creative for copyright protection." She added, "Because I believe survey results can be protected under copyright law, any use of them should be guided by fair use principles. Merely republishing them without any commentary or transformation is not fair use.  As far as percentages and what portions of the lists or survey results can be used and qualify for fair use, as I've indicated before, the case law in this area is also murky at best." I'm not an attorney, and this one is beyond me. :) She is an attorney, neutral to specific content but with the best interests of the projects and the contributors in mind. Now, she did not say "Seek ye out and destroy...." What she offered was a recommendation for best practice, in her considered opinion. I have not been pursuing list or survey articles myself, but when they are brought to CP or otherwise raised for admin attention, I've been addressing them in accordance with her recommendations. --Moonriddengirl (talk) 12:32, 9 November 2011 (UTC)
 * It is accurate that amassing facts is one thing, and creatively creating facts is another. That is why the amassing of the facts of telephone numbers, what people are willing to resell their used cars for, opinions as to the best film or football team or "Belgian", and the like are all the amassing of facts.  The person who collects those facts does not create them.  Hence, they have no copyright interest in the telephone number, or the price at which Joe is willing to resell his car, or the view by Tony that Team A is better than Team B.  On the other hand, the pollster does have an interest in any special format that is used for presentation, which is why a screenshot is viewed as protected.  The starting point that she addressed was correct, but if she looks at the caselaw that followed Feist she will see that its application is precisely what I indicated.--Epeefleche (talk) 17:35, 9 November 2011 (UTC)
 * Evidently, she diverges from you on the creativity of the process. Moving into the realm of speculation and away from her statements (and thus into the possible realm of completely separating from her thought processes), where people are asked to choose their favorite from a finite list of films or football teams or Belgians, the selection of the original list is creative. That said, this all remains beyond me, and at this point I don't see any choice but to follow the practice I've adopted of not looking for trouble, but addressing issues when presented in line with her advice. :/ I tried to raise some interest in developing best practices in this so that it might become a guideline, but I wouldn't know what to recommend myself. The best option I've been able to come up with was to turn the list of top grossing films into an actual discussion of top grossing films at Bollywood_films_of_2011. (She specifically evaluated BOI's lists as problematic, since they are not actual facts but estimates, in line with CCC Information Services v. Maclean Hunter Market Reports. BOI has unfortunately not yet been willing to license their material because - AIR - they don't like the commercial reproduction bit.) --Moonriddengirl (talk) 12:21, 10 November 2011 (UTC)
 * It is not as clear to me, as it is to you, what the extent is to which she diverges from me and from law professor Bearian. As to your (self-described) speculation, please note the difference between an aggregate opinion of thousands of people (whether in a poll, or an election), and the opinion of one person. Also, note the difference between people being asked to choose from a finite list, and otherwise. But it may well be that these are not even relevant differences: see the Academy Awards list, or the lists of top football teams, both of which are repeatedly presented by all manner of RS media, in content though not in original format. My focus in this discussion is on the lists that have been up for AfD, and the similar lists that are the results of polls, and which are reported robustly by the media in full, a number of which I've reflected above. At this point, from what I can see, all but one of the (2 dozen?) AfDs have closed with the articles being kept. As to the Bollywood films, I've not looked at them, and can't opine intelligently as to the relevance of the numbers being estimates; it is possible that this puts them in a different category for our purposes, but I've no opinion on that at this point in time. I appreciate your good work here, Moon. Best.--Epeefleche (talk) 21:23, 15 November 2011 (UTC)

This is an essay, not a guideline
This essay should not be used to delete charts from Wikipedia. It is an essay, and not a guideline. For more info see: Template talk:Corruption Perceptions/Corruption perceptions index and the following talk sections. --Timeshifter (talk) 06:13, 2 January 2013 (UTC)
 * This essay is not used to delete charts from Wikipedia. WP:C is. This essay offers more in-depth information on the copyright status of lists. --Moonriddengirl (talk) 12:03, 2 January 2013 (UTC)
 * Seems all are in agreement that it is not to be used to delete charts. Moon is correct that it is in-depth. Another difference is that it is the advice or opinions of one or more Wikipedia contributors, which may represent widespread norms or minority viewpoints, and, as Time said, essays are therefore not Wikipedia policies or guidelines.--Epeefleche (talk) 17:19, 19 January 2013 (UTC)

Reliable legal sources
We need more lawyers, and links to reliable sources of info on list/chart copyright law. Feel free to add more links.
 * Copyrightability of Charts, Tables, and Graphs. Copyright Office, MPublishing. MLibrary of the University of Michigan. --Timeshifter (talk) 08:19, 2 January 2013 (UTC)

CCC Information Services v. Maclean Hunter Market Reports

 * CCC Information Services v. Maclean Hunter Market Reports.

It seems that some people are trying to invent a broad guideline based on an overly-extended interpretation of one case in one U.S. court of appeals out of 13. See United States courts of appeals.

What I get out of that case is that the judges are trying to defend an intellectual property variation that is tangentially copyright related. A sui generis special case for data that is intrinsically related to a company's bottom line. See:
 * Sui generis
 * Trademark
 * Trade secrets
 * Right of publicity

As long as the data we are sharing is presented in a different format, and we are not causing the originator of that data to lose money, then we are fine in all practical senses of getting sued. No need for us to invent a problem that does not exist. We are not sabotaging some used car publication's means of support by posting data that others have to pay them for. --Timeshifter (talk) 10:13, 2 January 2013 (UTC)
 * The problem with your hypothesis is that it assumes none of our reusers could ever do anything with the data that would cause the originator to lose money. Our content is not merely "content that we can use ourselves": it is "content anyone could use for anything". Ironholds (talk) 12:03, 2 January 2013 (UTC)
 * I see from your WMF user page that you have a law degree as of 2011. That's great, because we need more lawyers looking at this issue. But frankly, we need some experienced lawyers looking at this. Experienced in this area. If you look at the previous talk section you see what a university comes up with concerning the issue of copyrightability of charts, tables, and graphs.


 * As to your point, I agree that it is about "content anyone could use for anything". But it is pretty obvious when the data/info in a table or list used on Wikipedia could ever affect the originating organization's livelihood. And the example discussed here is not such a case: Template talk:Corruption Perceptions/Corruption perceptions index and the following talk sections there. --Timeshifter (talk) 13:41, 2 January 2013 (UTC)
 * As Moonriddengirl makes clear in the section you're citing, we have had experienced lawyers look at this area. WMF counsel. They disagree with you. Ironholds (talk) 13:47, 2 January 2013 (UTC)
 * No they don't. Reread this whole talk page thoroughly if you haven't already. The lawyers disagree with each other. WMF counsel says the whole issue is murky. --Timeshifter (talk) 14:15, 2 January 2013 (UTC)
 * Actually, she said case law as regards surveys is murky. Surveys are only one kind of list. --Moonriddengirl (talk) 14:22, 2 January 2013 (UTC)
 * From reading the differences of opinions on this talk page concerning various lists it is all fairly murky. We need more expert opinion. --Timeshifter (talk) 14:41, 2 January 2013 (UTC)
 * Have you considered the possibility that it may be murky following the involvement of experts because it is a murky field? There is no expert who is going to provide some uniformly acceptable legal panacea. People are divided as to whether or not it is acceptable: when we have something that cannot reasonably be shown to be acceptable, the system is biased towards removing it for fairly transparent reasons. Ironholds (talk) 15:04, 2 January 2013 (UTC)

Related discussion
Please see Wikipedia_talk:WikiProject_Academic_Journals. Comments appreciated on whether such a list as discussed there would be a copyvio or not. --Piotr Konieczny aka Prokonsul Piotrus | reply here 05:14, 15 May 2019 (UTC)