Wikipedia talk:Wikipedia Signpost/2011-09-05/Opinion essay

Wikipedia is in the position of strength here. The search engines need Wikipedia a lot more than the reverse. We should consider blocking spidering of Wikipedia by Yahoo unless they allow Wikipedia, via CorenSearchBot, access to search results for copyvio purposes. It wouldn't take long for Yahoo to be distressed at such a decision. Can we use Google to check the copyvios? Google makes big bucks out of Wikipedia by immediately accessing the updates. Is there a technical issue with using Google, or some policy issue? Regards, SunCreator (talk) 01:58, 6 September 2011 (UTC)
 * Even the idea that Jimbo would announce to the press that Wikipedia was thinking of blocking Yahoo would send its share price down and prompt some immediate attempt by Yahoo to rectify the situation. It's not as if automated access to its search results is a problem in itself; they just don't want everyone doing that, and I'm sure they would make an exception for Wikipedia. Has anyone even asked Yahoo? Regards, SunCreator (talk) 02:08, 6 September 2011 (UTC)
 * Coren and Jimbo are in negotiation with Google with respect to this issue. MER-C 02:57, 6 September 2011 (UTC)
 * If Jimbo does do that it's perfectly allowable by the nonexistent rules of capitalism. -- Σ  talk  contribs  03:49, 6 September 2011 (UTC)


 * Sometimes the state of some of these copyvios shocks me--Guerillero | My Talk  02:56, 6 September 2011 (UTC)
 * While I kind of like the idea of us throwing our weight around, in the spirit of Christmas, let's not. extransit (talk) 05:23, 6 September 2011 (UTC)


 * I think it would be unethical to punish Yahoo for first helping us (do searches) and then deciding (for unknown reasons) that it cannot help us any longer. Why should anyone help us, if we show that we will be vengeful when they stop? JRSpriggs (talk) 06:38, 6 September 2011 (UTC)
 * Quite a while later, but I came across this while deciding on my arbcom votes and agree wholeheartedly. Particularly since we would be punishing Yahoo for something which neither Google nor Bing allowed us to do. Doesn't Yahoo rely on Bing nowadays anyway (i.e. can we even block them independently)? Suggestions like ƒETCH's, that we make noise about all three, are a little fairer but IMO still not likely to be effective. People are more likely to think that just because we're a non-profit doesn't mean others have to let us use their service in a manner that's normally charged for, so we'll come across as whiny complainers. Remember also that search engines work both ways. Yes, they take our resources by indexing, but they also make it easy for people to find our content. Us using a search engine to find copyvios isn't that much of a benefit to search engines except in an abstract 'it's good for us therefore good for them' or 'good publicity' sort of way. Nil Einne (talk) 17:35, 5 December 2011 (UTC)


 * Great article. I try to apply a 'does it look too good to be true?' test to new articles and uploaded images, and this has produced good results (I've caught a largish number of copyright violations and been pleasantly surprised by content that turned out to be fine). In my experience text that looks like it came from a news story probably did. Nick-D (talk) 10:54, 6 September 2011 (UTC)
 * I have been criticized by an experienced Wikipedian for deleting material copied unaltered from a web site. I agree that there is a problem, both in the extent of copyvios and in the blasé attitude of many Wikipedians to copyvios. -- Donald Albury 11:02, 6 September 2011 (UTC)
 * There's no indication that Yahoo did this for the simple pleasure of spitting in Wikipedia's face so I don't see why we should freak out. Wikipedia's reaction was exactly what it should be: regret Yahoo's decision, try to find an agreement with Google. You don't need to act like a bully just because you have enough muscle to do so credibly. Pichpich (talk) 21:29, 6 September 2011 (UTC)
 * Given that Yahoo!'s CEO Carol Bartz has just been kicked out, a new executive might be more open to reversing the API changes. If a Google deal falls through, I think publicly embarrassing the three major English-language search engines a little might push someone to act. / ƒETCH COMMS  /  04:17, 7 September 2011 (UTC)
 * I'd add that the copyvio problem is not limited to articles. I've uncovered a huge number of copyvios in my short stint at AfC as well. When I watch the new user log, I frequently check new userpages, and I'm quite liberal in tagging pages from obvious corporate accounts, because my experience is that even if they don't quite meet G11, they're often copyvios from somewhere. The Blade of the Northern Lights  ( 話して下さい ) 04:30, 7 September 2011 (UTC)

I just wanted to point out that back on August 30th, I proposed a change to Special:NewPages to help us deal with copyvios while CorenSearchBot was down. The thread can still be found at Wikipedia talk:New pages patrol. Singularity42 (talk) 20:01, 7 September 2011 (UTC)


 * As someone who deals with copyright issues in the File namespace on a regular basis, I can attest to the scope of the problem there. Wikipedia has several hundred thousand images and Commons has several million. On a daily basis, images that were just found on the internet and are clearly the work of other people are uploaded by usually well-intentioned users as 'own work' and given free licenses. I place a good deal of blame on the Wizard and its defaults; however, Moonriddengirl is correct that a major cause is the lack of knowledge about copyright among many people. Most troubling is that a good number of people know about the existence of copyright but have major details wrong. I often hear the statement "it's on the internet, therefore it's in the public domain". What is needed is a set of guides, written so clearly that a third grader could understand them, that we can link to as an easy way of showing people the mistakes they are making. Communication with these people is key.  S ven M anguard   Wha?  17:40, 9 September 2011 (UTC)
 * +1 to the idea that most people don't know the first or last thing about copyright law with respect to images. For fun and frustration, if you have a Flickr account, go over there, look over new uploads (especially under certain CC licenses) and find copyright violations like screenshots, or photos of three-dimensional public artwork in the U.S. (for even more of a challenge, don't use a fish-in-the-barrel tag search like that; you'll still find some if you know what you're looking for). Then leave comments for the users who uploaded them telling them about this. Not a single one will have been aware of this; some of them will even tell you off. Yahoo! is (in addition to its other problems) sitting on a huge litigation time bomb here; they are demonstrably negligent even without comparing them to us. We make this even more complicated with a fair-use policy that is more restrictive than U.S. law, so someone who thinks they're OK (and would be elsewhere) is actually not. I have found it interesting that, in surveys of how many new accounts stick around to become members of the community, virtually none of those whose first edit was to create a page outside of article namespace have done so. Hmm ... what kind of new user starts by creating a non-article page? You got it ... someone uploading an image that they thought they could use. (It would be interesting to see how many of them did, indeed, upload third-party copyrighted images that wouldn't be justified under our policies.) Daniel Case (talk) 19:59, 9 September 2011 (UTC)
 * Can we not make file-uploading a userright independent of autoconfirmed? It is harder to wrap one's head around all the fair-use, OTRS, etc. material than to understand "No copy/pasting text". -- Σ  talk  contribs  07:58, 12 September 2011 (UTC)

Yahoo and Google both permit automated queries (which is what Corenbot is/was). They charge for them, though; you can see those costs by following the links to the relevant terms of service mentioned here. The cost wouldn't be minimal for the Foundation (Google, $5 per 1,000 queries, for up to 10,000 queries per day; Yahoo, either 80 cents per 1,000 or 40 cents per 1,000 using a limited index and slower refresh (about 3 days)). The total would be however many thousand new articles per day over all projects, times the number of queries per article (possibly one for each article sentence?). And I'd suppose other non-profits, including university research projects, would like cost exemptions, including those for copyvio searches, and have comparable claims. Novickas (talk) 01:19, 10 September 2011 (UTC)
 * And TinEye (relevant to Commons) is 10 times more costly than Google -- $1,500 for 30,000 queries. Trycatch (talk) 17:36, 12 September 2011 (UTC)
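
The arithmetic in the cost comments above (new articles per day, times queries per article, times the price per query) can be worked through with a rough sketch like the one below. The per-1,000-query prices are the ones quoted in this thread; the article and query counts in the usage note are purely hypothetical placeholders, since the thread gives no actual figures, and the function and variable names are made up for illustration.

```python
# Prices in USD per 1,000 queries, as quoted in this thread.
PRICE_PER_1000 = {
    "google": 5.00,          # quoted: $5 per 1,000, up to 10,000 queries/day
    "yahoo_full": 0.80,      # quoted: 80 cents per 1,000
    "yahoo_limited": 0.40,   # quoted: 40 cents per 1,000, limited index
    "tineye": 1500 / 30,     # quoted: $1,500 per 30,000 queries = $50 per 1,000
}

# Daily query ceiling mentioned for the Google price tier.
GOOGLE_DAILY_CAP = 10_000


def daily_cost(articles_per_day, queries_per_article, price_per_1000):
    """Cost in USD of checking one day's new articles at a given price."""
    queries = articles_per_day * queries_per_article
    return queries * price_per_1000 / 1000


def estimate(articles_per_day, queries_per_article):
    """Daily cost for every provider quoted in the thread, rounded to cents."""
    return {
        name: round(daily_cost(articles_per_day, queries_per_article, price), 2)
        for name, price in PRICE_PER_1000.items()
    }


def exceeds_google_cap(articles_per_day, queries_per_article):
    """True if the daily query volume is above the quoted Google ceiling."""
    return articles_per_day * queries_per_article > GOOGLE_DAILY_CAP
```

As a usage example with made-up volumes, `estimate(1000, 10)` covers 10,000 queries per day and gives $50 for Google, $8 or $4 for the two Yahoo tiers, and $500 for TinEye, which matches the "10 times more costly than Google" comparison. Any volume above that would already exceed the quoted Google daily limit, so `exceeds_google_cap(1000, 11)` is true.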