Wikipedia talk:Wikipedia Signpost/2011-11-07/Special report

Single point of failure

 * I'd like to echo Kudpung's concern that we need to get CorenSearchBot back up and running as soon as possible. We have many policies here at Wikipedia, but only a few have legal implications, and copyright adherence is one of them. Please note that the bot is fine; it's the Yahoo search engine that CorenSearchBot used that is the problem. I know the WMF staff have been trying to work with search engine providers to come up with a solution, but it's been a few months and it might be time to put more effort into finding a solution. This also highlights another issue: single point of failure. We rely heavily on bots here at Wikipedia and certain bots, like CorenSearchBot, are critical to our operation. When those bots go down, it can be a major problem. IMHO, we should be identifying critical bots and making sure we have a backup plan to keep them operational. - Hydroxonium (T•C•V) 12:53, 8 November 2011 (UTC)
 * Just a note: I think we're very close to having a solution to this problem. I think we'll be able to make some announcement around a week from now.  Coren is involved in the resolution of this, and please know that it's being actively worked.  Several staff members have put a great deal of time and energy into getting a resolution.  Philippe Beaudette, Wikimedia Foundation (talk) 13:52, 8 November 2011 (UTC)
 * Very encouraging to hear.  Skomorokh   11:09, 11 November 2011 (UTC)

Copyright in India
Just an observation that Indian copyright law is based on UK law and the Berne convention, and is not substantially different to copyright law anywhere else on the planet. I cannot comment on whether or not there is a culture of plagiarism that is worse than in educational establishments in other places - if so, that would appear to be a structural issue that the Wikimedia Foundation cannot tackle. --Elen of the Roads (talk) 12:59, 8 November 2011 (UTC)
 * There's some discussion about that at the bottom of WT:IEP now, which would seem to indicate that there are some pretty serious issues with people's attitudes towards copyright in Asia (see there for full comments). The Blade of the Northern Lights  ( 話して下さい ) 15:30, 8 November 2011 (UTC)
 * Basically, laws and attitudes are two different things. Take laws on jaywalking for example. There are laws, but most of the people don't follow them. Same concept here. Manish Earth Talk •  Stalk 16:40, 8 November 2011 (UTC)


 * The concept of plagiarism as a bad thing is quite modern. For example, ancient authors copied each other on a regular basis: what we know about the lost books of Polybius' Roman History is due to Livy's unattributed plagiarism of the earlier writer. Even as late as the 18th century, plagiarism was a regular occurrence: there is a comic incident where Benjamin Franklin, after falling out with his partner in the publishing business, then accused him of unethically printing articles from Chambers's Cyclopaedia in their American newspaper -- despite the fact it was Franklin's idea in the first place! (Encyclopedia publishers in the 18th & 19th centuries plagiarized each other as a regular practice.) I believe one reason for this was that until the 19th century, a scholar considered himself very fortunate to have access to even as many books as can be found in the average high school library. Access to information & ideas is more important than giving proper credit for them; only within the last 100 years or so have we in the West achieved the luxury of abundant information, so now we expect honesty in credit. -- llywrch (talk) 17:32, 8 November 2011 (UTC)
 * Some modern authors (H. P. Lovecraft comes readily to mind) have encouraged people to take their ideas as well, so it's not unheard of today, but it's certainly unusual. In Lovecraft's case, it's made the copyright status of his work irredeemably confused.  Totally agree with your points, though.  The Blade of the Northern Lights  ( 話して下さい ) 17:52, 8 November 2011 (UTC)
 * Is it really useful to single out Indian student edits in this way? A much smaller, US-based initiative has had similar recent problems. Anyone that's worked in this area has seen copy-paste additions by editors whose user pages indicate all kinds of origins. See WP:CCI. Wouldn't it be better to just ask all course leaders to verify that they've given their students a session on plagiarism?
 * Another angle - some of the institutions sponsoring and/or requiring WP editing must have subscriptions to the plagiarism detector tool Turnitin. If the WP edits are done as part of an educational assignment it'd probably be acceptable from Turnitin's point of view to submit articles there for checking. Pending a re-instatement of in-house tools, which may or may not include Google books. Novickas (talk) 20:48, 8 November 2011 (UTC)
 * I do think it is fair to single out India (and I have worked extensively with copyright problems in Wikipedia). Though editors from all over the world post copyvios, if you look at the list of investigations at WP:CCI, a disproportionate number are from South Asia. Calliopejen1 (talk) 03:21, 10 November 2011 (UTC)
 * I concur with Calliopejen1. I've been working in education in Southeast Asia (with periods in India) for the last 13 years, and plagiarism is endemic here - at all levels of academia. Kudpung กุดผึ้ง (talk) 09:48, 10 November 2011 (UTC)

If you do a search with Google books, on just about any subject, you'll find books published in India in recent years (in English) that appear to have a liberal borrowing of content from various other copyrighted books. I do think twice about using a book as a reliable source if it's published in India. (Yes, I'm generalizing, but there's a lot of disregard for copyright in some places, not just India.) OttawaAC (talk) 04:45, 11 November 2011 (UTC)

Class size
From :


 * Some of the classes enrolled for the program were very small, with only 18 students.


 * Smaller classes demand equal number of in-class presentations/editing sessions/refresher sessions as any bigger class would do. To get maximum impact, it makes logical sense to enroll classes with larger number of students.

Yikes. This sounds like anti-wisdom to learn from this project. I couldn't disagree more. Plagiarism is a problem when mentors are not closely involved with students' work. Small class sizes and close prof / TA involvement are *vital* for getting good Wikipedia articles; lectures to the class are a dime a dozen and not that important. If the prof of an 18-person course was not familiar with Wikipedia or not monitoring their students at all, the solution is absolutely not to give them a 50-person course! If only a few profs had the knowledge / patience to do this right, then just shrink the program and keep with small class sizes. 800 students participating was part of the problem, anyway. 70 dedicated students across 3 Wikipedia-savvy professors would have done far more good than a giant haphazard program, I'm sure. If Wikipedia absolutely had to be part of a large class project... I'd still want to break it down into "labs" where a TA has a responsibility to chat with a specific set of 15-20 students, and check their work. The difference between "go to library, read reference work on subject at hand, add passages cited to it, add new passages next week off different book from library, etc." and "sudden text dump with no references" should be very obvious - IF people are paying attention early.

The other limitation on quality is that students should choose to do this at least semi-voluntarily. (I believe that the course description of WikiProject Murder Madness and Mayhem mentioned the Wikipedia aspect, for example.) I'm not closely familiar with the project, but did all 800 students really know what they were getting into? Or was this a surprise homework assignment dropped on all of them? SnowFire (talk) 16:48, 8 November 2011 (UTC)


 * Hi SnowFire, thanks for the comment -- that page is actually an early draft of what is at Wikimedia_Foundation_-_India_Programs/Education_Program, and a few things have changed, including that particular point. Our learnings will continue to evolve on the Meta page, which is more accessible than a sandbox on the English Wikipedia. :) I'll replace the content of that page with a link to Meta momentarily. I think you have a great point, though, and I'd encourage you to make comments on the talk page of the Meta page to ensure we are having community feedback on those learning points in one place. -- LiAnna Davis (WMF) (talk) 17:35, 8 November 2011 (UTC)

Get back on the horse
When you get bucked off a horse, you should get back on again as soon as you've checked out that you're still in one piece. Otherwise you're likely to start imagining difficulties and problems and how close you came to getting killed. Wikipedia is still in one piece. Similar projects should now go forward with all deliberate speed, or folks will be reluctant to get involved with this ever again.

Similar projects might include projects in other countries, say Mexico, Brazil, or South Africa. Project size should be limited, say to 100 students, until a fully successful project has been completed. I'd suggest not making the project mandatory for a grade, rather make student contributions "extra credit" assignments. Wikipedia has always been about volunteer contributors - there's no reason to change this now. Professors should review student contributions before they go into article space - that way we can know whether the problem is with the students or with the professors. Having the university administration apply for a Wikipedia grant to implement the program might help as well, by getting the top people at the university involved and putting their credibility and prestige on the line. Smallbones (talk) 17:54, 8 November 2011 (UTC)


 * Well said. This project tried to do too much too quickly, but experimenting with partnering with universities world-wide is exactly the kind of thing we should be doing to broaden and diversify our contributor community, and now's the time to take the lessons from the initial pilot on-board and do better. :-) --Eloquence* 19:04, 8 November 2011 (UTC)
 * Here's another idea; just have people edit their native language wiki instead of trying to get them to edit here. I've long advocated that we should be better, especially with Indian editors, at pointing them to their native language wikis, for many of the same reasons this project has gone awry (see the history of Malhoo for a great example of what happens when we encourage editors with very little command of English to edit en.wiki instead of their native language), and because increasing the size of the other Wikipedias will make us look more diverse and give us higher quality content everywhere, which can be translated into other languages and make better articles overall.  Hindi, Tamil, Telugu, and many other Indian Wikipedias are in the low to mid 10,000s in articles, meaning they're missing a lot more than we are, and they could use the new editors more than us, not to mention the fact that the number of copyvios would probably go down because students won't feel the same pressure they do writing in a foreign language (as a Japanese student, I can relate to that pressure in some ways).  I've met many Indian immigrants where I live, and most of them are great people with the best of intentions, but I would never mistake their speech or writing for Jawaharlal Nehru; there is a reason we have other language wikis, and we should make a more conscious effort to promote them. The Blade of the Northern Lights  ( 話して下さい ) 20:33, 8 November 2011 (UTC)


 * As far as I know, the India Edu folks have strongly promoted the existence of the Indic language Wikipedias. There's a staff person on the India team, Shiju Alex, entirely dedicated to supporting Indic language projects, and you can read a bit more about Indic language specific outreach here. In my department (engineering), we're investing significant time and effort in the development of technologies like Narayam and WebFonts, which help overcome technical barriers to participation in those languages.


 * As you know, the language situation in India is particularly complex. English is promoted as the lingua franca in higher ed, and it's an official language of India that's widely seen as key to professional success. At the same time, the Indic languages are also being promoted and pushed, sometimes for nationalistic reasons, or for reasons of cultural heritage. It's a very difficult context to wade into, and I think WMF is wise to generally avoid being prescriptive as to what language people should work in. My understanding -- and Nikita or Hisham would be able to add some detail on this -- is that the strong preference of the educational institutions approached in the India Edu pilot was to work in English.
 * I certainly do agree that we should define parameters for these programs that serve the best interests of our projects, regardless of the conditions and preferences on the ground, and decline engaging in activities that bring more harm than good.--Eloquence* 06:17, 9 November 2011 (UTC)

While understanding the reasons why the participants wanted to edit en:wp, the result has shown that many of them did not have sufficient command of English to contribute satisfactorily, particularly in technical areas where en:wp is already well-developed, and the result was not only disruptive for en:wp but must also have been extremely frustrating and dispiriting for the students. I suggest that an alternative, which would be much more likely to produce useful results, would be to translate articles from en:wp into Indic language WPs. If good-quality source articles, preferably GA or FA standard, were chosen, the receiving WPs would be improved, and the students would be able to exercise English skills in translation, would learn about article structure and sourcing and about attribution (by using the Translated article template), and would be much more likely to end with a feeling of achievement and to become long-term Wikipedians. JohnCD (talk) 12:33, 10 November 2011 (UTC)


 * Engaging in large-scale translation projects has its very own problems. See, for example, Sodabottle's scathing criticism of Google's translation efforts in Tamil Wikipedia. Regardless of whether all the criticism is fair or not, the simple fact is that correct translation is very, very difficult, and low-quality translation is disrespectful of the Indic language communities which are trying to maintain their own standards of quality.


 * I don't think there are any easy answers. Growing the community of contributors in developing countries is going to be hard, no matter how we do it, and there'll be plenty of failed starts, finger pointing, and Signpost stories along the way. ;-)--Eloquence* 00:15, 12 November 2011 (UTC)


 * Hear, hear.  Skomorokh   11:09, 11 November 2011 (UTC)


 * One of the reasons this project failed is a gross underestimation of the amount of work nearly 1000 novice editors (~5% of the existing body of editors) would push onto the community at large. I would be very wary of implementing these types of programs on a large scale without them being essentially self-contained–having enough supporting editors, ambassadors, and involved professors that they do not place a burden on already swamped NPPers, copyright investigators, and general cleanup crew. Danger High voltage! 22:28, 8 November 2011 (UTC)

Quoting from above:

"When you get bucked off a horse, you should get back on again as soon as you've checked out that you're still in one piece. Otherwise you're likely to start imagining difficulties and problems and how close you came to getting killed. Wikipedia is still in one piece. Similar projects should now go forward with all deliberate speed, or folks will be reluctant to get involved with this ever again."

I absolutely agree with this sentiment. Take the lessons learned, try again. KConWiki (talk) 14:18, 13 November 2011 (UTC)

I am glad Smallbones suggests a smaller "pilot", though I think 100 is too large - try two or three classes, say 50. It would be a disaster to initiate another "pilot" of this size in the spirit of "get back on again". It concerns me a great deal that there has been no acknowledgment by the designers that they started out too big. If you get bucked off the horse because you didn't plan properly and overestimated your skills and capacity, then simply getting back on again without acknowledging and correcting those errors is really stupid. Joja lozzo  16:08, 18 November 2011 (UTC)

Scientific plagiarism in India
This article may be of interest. Curiously, it is now at AFD. *goes to investigate* --Piotr Konieczny aka Prokonsul Piotrus | talk to me 18:07, 8 November 2011 (UTC)

An Ambassador's 2 cents
As an "Online Ambassador" for students and classes in the Public Policy project and current US education initiative, I urge the Foundation to be very careful not to bite off more than it can chew. Starting with 800 students was just way too ambitious. Projects need to start small in each new market to see what the issues will be before expanding. A high ratio of ambassadors to students in smaller pilot projects is essential. Perhaps this is now stating the obvious. Happy editing, everyone. -- Ssilvers (talk) 21:35, 8 November 2011 (UTC)
 * Even if it is stating the obvious, your point is well worth stating and re-stating. I was particularly concerned that so much responsibility, over-and-above what Campus Ambassadors are supposed to do, was placed on those bright, enthusiastic, but very inexperienced shoulders. This was compounded by bringing on a group of "special" Online Ambassadors (not chosen through the normal Online Ambassador processes) who were completely unqualified for the task. I hope too that when the IEP gets back on the horse, the organisers will reach out to the subject-specialised WikiProjects who could have provided an enormous amount of help and advice, but were never even contacted. Please don't ignore that valuable resource. Voceditenore (talk) 14:07, 9 November 2011 (UTC)

Evaluating the Pilot
Just like to say that the WMF Global Development team is taking the feedback from all sources very seriously as we review what happened during the pilot over the past few months. We made mistakes, as Frank and Hisham shared above, and we are already integrating the lessons into our plans going forward. I am in the process of retaining a reporter to do a round of interviews with key actors in the process (the WP community, students, professors, campus and online ambassadors, and WMF staff) to really capture all of the learning systematically. Her report will be shared openly with the community, and I've asked her to be blunt where necessary. While we can't go back in time, we can extract lots of learning from the pilot that will make all of our work more effective. Our team definitely plans on getting back on the horse (appreciate the sentiments, Smallbones) and we plan to follow Beckett's maxim, as you suggest, Skomorokh. --Bnewstead (talk) 00:54, 9 November 2011 (UTC)
 * Glad to hear it, Barry. The Signpost will be very interested in following the review process, and wishes you the best of luck with future initiatives.  Skomorokh   11:09, 11 November 2011 (UTC)

Ethics, anyone?
"Acting as a publisher of content" is a no-no. Forcing Indian kids to "contribute" is a priority. Very well. You wanted unfree labor (not exactly slavery, but not free will either), you've got unfree works (not always plagiarism, but mostly worthless). Unexpected, really?

Perhaps, if this "source" of content is really important for the Foundation, all input should be contained in an incubator or some other sort of a holding pen (and then kill them before they grow).

NVO (talk) 11:45, 10 November 2011 (UTC)
 * A salient point. I only edit Wikipedia with the diligence and passion that I do because I choose to do so of my own volition. Regarding the incubator: might I suggest (as I do in my point below) the Hindi Wikipedia? They could work in what will be (for most of them) their native tongue, and if there's an endemic culture of plagiarism as some other editors have been saying, then it remains confined to a wiki where those problems are dealt with as appropriate to the culture of most of that language's speakers. Brammers (talk/c) 23:27, 10 November 2011 (UTC)


 * Are those factors any different from the Public Policy Initiative, which produced a significant amount of worthy content? Under that analysis, any course requirements are coercive.  Skomorokh   11:09, 11 November 2011 (UTC)
 * I do agree with Brammers that WMF should concentrate on the Indic Wikipedias, and especially Hindi Wikipedia, as Hindi has a very large internet user base in India as well. Nearly 40% of the population are Hindi speakers and more than 60-70% can understand Hindi. Apart from this, our main purpose should be increasing awareness of Wikipedia among Indians, which will increase both our viewers and our editors. -- Ma yur (talk•Email) 11:29, 15 November 2011 (UTC)

Why en.wikipedia?
I haven't had time to rummage round the paper trail for the pilot, but why was the English language Wikipedia chosen as the target of the pilot? We have almost four million articles and stringent standards. I would have thought that more good would have come from bolstering the 100,000-ish article Hindi Wikipedia. 800 new articles there would have raised awareness of the Hindi language Wikipedia in the country where it is most relevant and increased its article count by almost 1%. In addition, there is much more low-hanging fruit to be had: whereas students creating new articles for en-wp have to scrabble around for a ridiculously niche topic, hi-wp will still have many of the more obvious potential articles still waiting to be written. Not unlike anglophones who mosey over to Simple just for the buzz of writing article after article from scratch. Brammers (talk/c) 23:19, 10 November 2011 (UTC)
 * Plus if the students' first language is Hindi, it would result in much better text quality than if they were to write in faltering English. And (purely conjecture here) if they can communicate an idea better in their native language, they'll probably be less likely to rip something wholesale out of a book or webpage. Brammers (talk/c) 23:22, 10 November 2011 (UTC)
 * So they learn the ropes at English Wikipedia and then move to their native language WP with an editing skill set. That would be a great use of the program. We're all trying to develop more serious editors here, as far as I'm concerned... The faltering first articles are immaterial. Carrite (talk)
 * I've often wondered about this myself - and haven't made up my mind either way. One point to keep in mind: many of the Indic language editing communities are tiny and would experience the same strains as the en:wp editing community given the same circumstances. Going forward, the same lessons hold, no matter which language Wikipedia is selected for the next round. Bishdatta (talk) 18:03, 13 November 2011 (UTC)
 * As has been mentioned elsewhere, the preferred language of instruction in higher education in India is English. While Hindi is the most common language in India, it is spoken by only about 40% of the population. While some schools may have a majority of students who speak Hindi (or some other language), it is unlikely that any institute of higher education in India will have all its students proficient in any one Indian language. One reason that English is an official language in India is that many non-Hindi speakers resist the idea that Hindi should be the common language of India. It is also the case that English is the language for access to global science, technology and commerce, which is why many Asian countries require or strongly encourage learning English, and use it as a language of instruction in higher education. -- Donald Albury 10:58, 14 November 2011 (UTC)

In defense of Indian editors
My experiences thus far with the Indian Education Project editors have been positive, mainly because out of 25 or whatever assigned projects, only a handful have made significant edits to date. So, instead of being swamped by 25 people going willy-nilly, I've been getting serious and good questions from three, and I feel like they're learning the ropes and progressing. Is their material totally smooth and clean of potential copy vio? I hope so, but y'know, we all learn as we go here. There's nothing blatant that I've seen and I feel like a few difficult economics pages have been substantially improved. And, hopefully, the editing spark has ignited a few new content creators.

The concept of getting Indian students involved is good. English Wikipedia is Anglo-American centric and getting increased content creation from other English-speaking regions of the globe is highly desirable, to my way of thinking. It's going to be a learning process for everybody. The ratio of newbie content-creators to mentors should be no greater than 5-to-1, I'll say that for sure. Expecting and forcing entire classes to contribute is a failed idea; there are probably a serious few in each class of 25 or 50 that will be willing to learn the ropes — these should be encouraged to participate. Forced participation of lower echelon students is going to result in a mess which exceeds benefit created.

The idea is good, it's a question of scale. There needs to be more one-on-one hand-holding, content creator to content creator, and less throwing shit against the wall and letting quality control supervisors pick through it for copy vio. That's not gonna work. Scale it back, keep it going. And hurray for the good Indian contributors. Carrite (talk) 04:24, 11 November 2011 (UTC)

Quality of Indian Students and Scale
While many are focusing on the lack of local volunteers, Ambassadors, etc., I feel these will matter very little. NASSCOM observes that only 25% of students passing out are employable in Indian IT. Now, as anyone who has worked with Indian IT firms knows, the bare qualifications to secure a job in these firms are basic written / oral communication skills and analytical abilities, which amount to just 10th grade math. So if we make a large number of such students participate in any program from a college, we are going to get the same quality. GIGO.  Srikanth (Logic)  11:45, 11 November 2011 (UTC)


 * Exactly. There are a top 5 or 10% of Indian students who should be targeted (or who will identify themselves by their actions), and THOSE are the ones who need to be carefully cultivated. In terms of the class project idea, I suspect the way to frame it is for a Wikipedia article to be a completely optional, extra credit exercise: everyone in the class registers to write on a topic, and from there the student is completely free to write or not write on that topic, according to their own wishes. Then Wikipedia mentors appear (people who actually write articles, not quality control inspectors) and offer their services. The best and the brightest, those actually interested in learning the ropes at WP, will self-identify. Those who don't give a crap should not be pushed into participating. The best and the brightest who are interested then should be reassigned to other mentors, if necessary, to maintain a manageable ratio of 4-to-1 or 5-to-1, something like that. Carrite (talk) 15:43, 11 November 2011 (UTC)


 * By the way, this is not a dig on India or Indian students per se. I suspect that a huge percentage of students from the USA or Canada or the UK would be similarly unsuited — although probably a lower percentage, due to the fact that some Indian editors are not completely fluent in English, which makes participation at English Wikipedia problematic. Carrite (talk) 15:48, 11 November 2011 (UTC)


 * That's not an untested hypothesis, you know.  Skomorokh   15:56, 11 November 2011 (UTC)
 * I agree mostly with the view expressed here. The project should target a better selection of students. This may be done by targeting top institutes, such as the Indian Institutes of Technology or the Indian Institutes of Management, or institutes of such stature. Although some universities included in this pilot project could be quite good (and of course some students must have done fabulous jobs), middle-grade institutes like Symbiosis would mostly present mediocre students. Now, one or two or a few students even from such institutes may be really good contributors, but to increase the probability of getting contributors, better institutes need to be targeted. Regards.--Dwaipayan (talk) 05:26, 16 December 2011 (UTC)

Consultants' experience and exposure to Wikipedia editing, policies and guidelines

 * I don't know whether this played a part and perhaps I'm completely wrong. But did the consultants recruited have enough understanding of Wikipedia's policies and guidelines? Did they personally have established editing experience? Yes, students did commit plagiarism; but they could have been mentored at stricter levels. Just one view... Wifione  Message 15:40, 31 January 2012 (UTC)

Update on the India Education Pilot
Just wanted to inform you that we have put up a post about the India Education Pilot here. Please feel free to initiate, advance or follow the conversation on the same page. Thanks, Nitika.t (talk) 10:23, 19 June 2012 (UTC)