Wikipedia:Wikipedia CD Selection



This is the project page for a series of Wikipedia CDs/DVDs being produced by Wikipedians and SOS Children. The latest version can be browsed at schools-wikipedia.org and is available for download.

The 2007 DVD was a huge success: it was distributed to schools in four countries and used by the Hole in the Wall project, thousands of copies were downloaded or supplied on disc, and around 14,000 unique IP addresses a day visited the online version.

Downloads started on 15 October 2008. The 2008/9 version has 5,502 articles, 33,716 images and 19.8 million words. Compared with the 2007 version, about 1,500 articles were added and about 600 removed, as more articles relevant to children reached adequate quality thresholds. Almost all of the articles have been updated to more recent versions since the 2007 edition. They are still sorted by curriculum topic, and some portal pages for curriculum subjects have been added. The selection remains free by any route.

As well as updating checked content and improving the selection, this edition includes some portal pages, such as SchoolsWP:Portal:Early Modern Britain, which closely match UK curriculum subjects. A joint press release was issued to coincide with the availability of downloads.

Volunteers considered all articles listed on the Version 0.7 selectionbot pages and in previous editions. All submitted articles then went through a clean-up script, which removes Fair Use images; sentences whose only purpose is to link to articles not included (e.g. "see also"); stubs and editorial content; sections containing material unsuitable for children; and external links. Where an article had been vandalised or contained questionable material, the most recent good version was used.

Previous versions
For the 2007 Schools Wikipedia, a joint press release from SOS Children and the Wikimedia Foundation announced the launch of the new Wikipedia Selection (see Press releases/SOSChildrenUK2007).

Current issues
General article quality is improving, but hand checking is still needed because of vandalism. Some articles on very basic topics (Farm, for example) are still disappointing.

In 2007 the vandalism rate was much higher than in the previous year: more than 50 of the 4,500 articles were found to be vandalised, versus 5 of 2,000 the year before. This corresponds to what seems to be a recognised fall in standards on Wikipedia, which is why the WP community threw in the towel and implemented nofollow etc. Using historical versions of articles is not enough to protect against graffiti, since images get copied over and templates get vandalised. Some sort of manual check is needed too.
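To put the two counts on the same scale, the short Python check below converts them to rates. It is only arithmetic on the figures quoted here, and it treats "more than 50" as exactly 50, so the 2007 rate and the ratio are lower bounds.

```python
# Vandalism counts quoted in the text; 50 is a lower bound ("more than 50").
vandalised_2007, total_2007 = 50, 4500
vandalised_prev, total_prev = 5, 2000

rate_2007 = vandalised_2007 / total_2007
rate_prev = vandalised_prev / total_prev
ratio = rate_2007 / rate_prev

print(f"2007: at least {rate_2007:.2%}")        # 1.11%
print(f"previous year: {rate_prev:.2%}")        # 0.25%
print(f"increase: at least {ratio:.1f} times")  # 4.4 times
```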

Frequently asked questions
We have been asked by several other WP projects to explain the process of selecting article versions from the article history in more detail.

For 2007 we asked for lists of URLs of good historical article versions from page histories. For 2008, because we made extensive use of volunteers for this project, we set up an easy-to-use system for them.

1) Lists of proposed articles were taken from community proposals, from the Version 1.0 lists of articles, and by working directly from school curriculum topic lists and the 2006 WP.

2) These lists of proposed articles are browsed on our database system, pictured below. The English Wikipedia can be browsed manually in the frame, comparing the current DB version (the case below shows a DB version), a recent version by a known editor, and the version previously used for the project (or alternatively an older version by a different known editor), so that vandalism can be spotted in the diffs. When a "good" version of an article has been selected, the edit fields are filled in. These list the specific sections to delete (there is a global delete list too) because they are unsuitable for children or otherwise need deleting (empty sections, for example); the text lines to delete (usually some reference to non-included content which has not been picked up automatically, for example where people have done disambiguation without using DAB templates); and "notes" on anything outstanding to check on the page. An overnight batch job takes these instructions, collects the new article version from the main Wikipedia and cleans it up. Collecting a new article only requires access to the new articles, but the cleanup depends on the whole article list and has to be completely re-run to avoid red links.
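As an illustration of how the delete instructions from step 2 might be applied, here is a minimal Python sketch. It is not the project's script, which is undocumented. The function name, the contents of the global delete list and the assumption that articles are handled as raw wikitext with "== Heading ==" section markers are all hypothetical.

```python
import re

# Hypothetical global delete list; the project's real list is not given here.
GLOBAL_DELETE_SECTIONS = {"external links", "see also"}

# Matches wikitext headings such as "== History ==" or "=== Details ===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")

def clean_article(wikitext, delete_sections=(), delete_lines=()):
    """Drop the listed sections (with their subsections) and exact text lines."""
    unwanted = GLOBAL_DELETE_SECTIONS | {s.lower() for s in delete_sections}
    drop_lines = {line.strip() for line in delete_lines}
    kept = []
    skip_level = None  # heading level of the section currently being dropped
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # A heading at the same or a higher level ends the dropped section.
            if skip_level is not None and level <= skip_level:
                skip_level = None
            if skip_level is None and match.group(2).lower() in unwanted:
                skip_level = level
        if skip_level is not None:
            continue
        if line.strip() in drop_lines:
            continue
        kept.append(line)
    return "\n".join(kept)

sample = "\n".join([
    "For the band, see Farm (band).",
    "A farm is an area of land.",
    "== History ==",
    "Farming is old.",
    "== Trivia ==",
    "Some trivia.",
    "=== More trivia ===",
    "Even more.",
    "== See also ==",
    "* [[Ranch]]",
    "== Economics ==",
    "Farms sell food.",
])
cleaned = clean_article(
    sample,
    delete_sections=["Trivia"],
    delete_lines=["For the band, see Farm (band)."],
)
print(cleaned)
```

In this sketch the per-article instructions are just two lists, which mirrors the edit fields described above; the "notes" field is for human follow-up and is not processed.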

What do we check? Generally we do not try to check the information given against external sources, but we do check that it has survived the scrutiny of several credible editors active in the area. On some things (year pages, for example) we have generally checked information against the rest of Wikipedia. However, we would not be doing this project if we did not believe in the Wikipedia model and that, in general, content which complies with policy is of good quality.

3) The browsing feature allows you to check whether this article is definitely the most suitable one (for example, do you want Great Pyramid of Giza or Giza Necropolis?) and to add related articles, for example those listed in info boxes, if they are relevant and not yet included.

4) The drop-down menus on the left-hand side then allow the topic to be placed in any number of categories for the automatically generated category index, which is based on the relevant school subjects.

5) Other buttons are mainly there to help debug the automated script (which renders links to articles not included as plain text, removes non-free images, tidies formatting etc.). One, for example, gives the WP article from which the current Schools WP page was generated, so that the effect of the script can be compared.
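The plain-text rendering of links to non-included articles can be sketched as below. This is a hedged Python illustration, not the project's code: the function name and the regular expression are assumptions, titles are compared case-insensitively for simplicity, and links with a namespace prefix (files, categories) are left untouched.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested links are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def unlink_missing(wikitext, included_titles):
    """Replace links to articles outside the selection with their visible text."""
    included = {title.strip().lower() for title in included_titles}

    def replace(match):
        target = match.group(1).split("#")[0].strip()
        label = match.group(2) or match.group(1)
        # Leave namespaced links (e.g. File:, Category:) for other clean-up steps.
        if ":" in target:
            return match.group(0)
        if target.lower() in included:
            return match.group(0)
        return label

    return WIKILINK.sub(replace, wikitext)

text = ("The [[Great Pyramid of Giza]] stands in the "
        "[[Giza Necropolis|Giza complex]] near [[Cairo]].")
result = unlink_missing(text, ["Great Pyramid of Giza", "Cairo"])
print(result)
# The [[Great Pyramid of Giza]] stands in the Giza complex near [[Cairo]].
```

Because the outcome for each link depends on the full set of included titles, adding or removing any article can change the output for other pages, which is consistent with the clean-up needing a complete re-run to avoid red links.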

6) (As it is a Frequently Asked Question): can other projects which wish to make a selection from WP have this script? Answer: the database program is not documented, and our "techy" guy doesn't have time to answer questions on it. Our best "easy" offer is to host another copy for you on our server as is. If you really want to try, you can have a copy, but you aren't likely to get far with it.