Wikipedia:Wikipedia Signpost/2005-12-05/DDR copyright

The German Wikipedia has discovered widespread copyright infringement in its articles, with text copied from a number of printed reference works. The incident has also drawn coverage from a few online media sources (mostly German, but one report is available in English).

Over the past two years, one or more people have been copying text from a variety of reference works published in the former German Democratic Republic and adding it to Wikipedia. So far, copied material has been traced to seven different publications, and others that have yet to be identified may also be involved. The editor(s) responsible have never used an account, making it difficult to hunt down every possible infringement. The reported IP addresses belong primarily, but not exclusively, to the internet service provider Deutsche Telekom.

This problem was first discovered in mid-November, but the addition of these copyrighted texts appears to have been going on since at least November 2003. Since the discovery, editors on the German Wikipedia have been busy remedying the situation, which involves hundreds of affected articles. Articles suspected of containing copied text are tagged and quarantined until someone with access to the original text can check them. The typical approach to locating copyright problems, which relies heavily on checking blocks of text against search engine results, proved inadequate in this instance because these texts are not otherwise available online.

To deal with media reports, the German Wikipedia prepared a media information page explaining the situation. Although the situation is serious and has required several weeks of work, one fortunate aspect is that Wikipedia editors discovered the problem themselves, rather than facing a public complaint from an offended copyright owner.