Wikipedia:Version 0.5/To do

We have to try and get some content balance for Version 0.5 by September 30, 2006, the date by which we aim to complete article reviews. I would like every reviewer to sign up for at least one of the following, and aim to review 40 articles by Sept 30 (ten per week from today). Many of the articles should be quick to review because they pass automatically on quality (FA or A, unless quality lost) or on importance (countries, global cities, core topics/supplement). Please sign up if you can work on one of the following article sets:

UPDATE (November 12, 2006): Reviewing has stopped, so I'm archiving the tasks and adding new ones. Eyu100 02:23, 13 November 2006 (UTC)

Archive1

Complete list
We now have a complete list of articles to be included in Version 0.5. The release will also include the GNU Free Documentation License and the front page. So far, the list has not been updated, and it does not yet contain the countries, the license, or the front page. Walkerma will produce an article tree to help people locate a particular article; for examples of this, see the Arts listing and the Countries tree (please give feedback).

Redirects
If a search capability is available (see below), we should make sure that some simple redirects are included, such as Los Angeles as a redirect to Los Angeles, California. BozMo has suggested that even without a search capability, we could still have a lookup table (he called it a keyword (pre-indexed) search). This is seen as an important feature.

However, the Linterweb search engine delivers all pages that include certain terms without any real ranking of those hits, so redirects may be less important than we thought.
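As a rough illustration of the lookup-table idea, here is a minimal Python sketch. The table contents, the `normalise` and `resolve` helpers, and the idea of matching case-insensitively are all assumptions made for the example; this is not BozMo's actual design.

```python
from typing import Optional, Set

# Hypothetical redirect table: normalised alternative title -> canonical title.
REDIRECTS = {
    "los angeles": "Los Angeles, California",
}


def normalise(title: str) -> str:
    """Collapse whitespace and case so near-identical queries match."""
    return " ".join(title.split()).lower()


def resolve(title: str, articles: Set[str]) -> Optional[str]:
    """Return the canonical title for a query, or None if nothing matches."""
    if title in articles:
        return title
    target = REDIRECTS.get(normalise(title))
    if target in articles:
        return target
    return None
```

With a set of included titles such as `{"Los Angeles, California", "Paris"}`, calling `resolve("Los Angeles", articles)` would return the full article title, and a title that is neither an article nor a redirect returns `None`.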

Publishing
Linterweb, a small software company from near Paris, France, is interested in publishing Version 0.5 during December 2006 or January 2007. The CD would cost 10 Euros, with 1.5 Euros going to the Wikimedia Foundation. This company has some scripts for cleanup, and an offline reader/search engine is under development. Their initial dump can be seen here.

User:Wikiwizzy and User:BozMo have tools to make a CD, but there are still some problems, namely that Wizzy doesn't have enough hard drive space for the full image dump. BozMo has also performed a dump and will make this available after cleanup. He will also be able to run the articles through the scripts used for the 2006 Wikipedia CD Selection.

Reader / Search
As mentioned above, User:Kelson (fr) has located Linterweb, which has developed an offline reader/search engine. This is still being tested, but the latest demo versions look promising. It is a Windows-only solution at present, but Linux and Mac versions are planned.

Wizzy's use case is in a computer lab, in a school. The 'CD' (or image) is centrally stored, and a browser is used to access it. He wants a search capability. Javascript is a possibility, or a static HTML page alphabetically sorted by search word, linked to relevant articles.
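The static HTML option could be generated along the following lines. This is only a sketch: the `articles/<Title>.html` file layout, the minimum word length, and the function names are assumptions for illustration, and the simple `[a-z]+` tokeniser ignores non-ASCII words.

```python
import html
import re
from collections import defaultdict
from urllib.parse import quote

MIN_WORD_LENGTH = 3  # assumed cut-off to skip words like "of" and "in"


def build_index(articles):
    """Map each lowercase word to the sorted titles of articles containing it.

    `articles` is a dict of article title -> plain article text.
    """
    index = defaultdict(set)
    for title, text in articles.items():
        for word in re.findall(r"[a-z]+", (title + " " + text).lower()):
            if len(word) >= MIN_WORD_LENGTH:
                index[word].add(title)
    # Sort by word so the rendered page is alphabetical.
    return {word: sorted(titles) for word, titles in sorted(index.items())}


def render_index(index):
    """Render the index as one static HTML page of word -> article links."""
    lines = ["<html><body><h1>Keyword index</h1><dl>"]
    for word, titles in index.items():
        links = ", ".join(
            '<a href="articles/{}.html">{}</a>'.format(
                quote(t.replace(" ", "_")), html.escape(t)
            )
            for t in titles
        )
        lines.append("<dt>{}</dt><dd>{}</dd>".format(html.escape(word), links))
    lines.append("</dl></body></html>")
    return "\n".join(lines)
```

A page built this way needs no Javascript and works in any browser, at the cost of a large file if every word is indexed; restricting the index to title words would keep it smaller.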

We have also been looking at the Polish search engine, to be used on their upcoming DVD release, but at this point (alpha version only!) it does not seem easy to use, and there is no English language documentation. Therefore at present we are tending to favour the Linterweb software for use with Version 0.5.

Images
We need to run a check on all image copyright tags, and remove images that don't comply - we can use this list as a guide. Does someone have a script for this? The default should be that an image will be automatically excluded unless it has a copyright tag known to be acceptable.
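The "exclude by default" rule might look something like the sketch below. The tag names in `ACCEPTABLE_TAGS` are placeholders only; the real whitelist would need to come from the list mentioned above, and the input format (image name mapped to its set of tags) is an assumption.

```python
# Placeholder whitelist; the real set of acceptable tags must be agreed on.
ACCEPTABLE_TAGS = {"PD-self", "GFDL", "cc-by-sa-2.5"}


def filter_images(image_tags, acceptable=ACCEPTABLE_TAGS):
    """Split images into (kept, excluded) lists.

    `image_tags` maps an image name to the set of copyright tags found on
    its description page. An image is kept only if at least one of its tags
    is on the whitelist; untagged or unknown-tag images are excluded.
    """
    kept, excluded = [], []
    for name, tags in sorted(image_tags.items()):
        if set(tags) & acceptable:
            kept.append(name)
        else:
            excluded.append(name)
    return kept, excluded
```

Because anything not explicitly recognised falls into the excluded list, a new or misspelled tag results in an image being left out rather than wrongly included.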

We also need to convert all of the images to a smaller size (thumbnail), so that they can fit onto the CD.

Dealing with vandalised pages
We need to tag certain versions as unvandalised. Until stable versions become a reality, this will involve several things:
 * Check edit histories until we find a version by an editor we can trust. We don't have a "whitelist", but we can at least look for the last version by a non-anonymous editor. If there is some way of checking that the editor has at least 250 edits (say), that would be even better. Do we have a script to do this?
 * Once that fork has been made, we need to create a collection of only those versions. We need to work with only those versions for the final CD unless there is really a specific major development we want to include - e.g. death of a major figure whose biography is included, say.
 * Once we have this collection, run a script to check every article for certain standard vandalism words, such as "poop", "gay", "f**k". Of course sometimes these words may be appropriate!  User:Tawker has a vandalism bot we could try using, and User:BozMo has a script for this.
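The first and last steps above could be sketched roughly as follows. This is not Tawker's bot or BozMo's script; the `Revision` record, the newest-first history order, and the function names are assumptions for the example, and the word list contains only sample entries.

```python
import re
from typing import NamedTuple


class Revision(NamedTuple):
    """Hypothetical summary of one entry in an article's edit history."""
    rev_id: int
    editor: str
    is_anonymous: bool
    editor_edit_count: int


MIN_EDITS = 250  # threshold suggested above


def last_trusted_revision(history, min_edits=MIN_EDITS):
    """Return the newest revision by a registered editor with enough edits.

    `history` must be ordered newest first. Returns None if none qualifies.
    """
    for rev in history:
        if not rev.is_anonymous and rev.editor_edit_count >= min_edits:
            return rev
    return None


SUSPECT_WORDS = ("poop", "gay")  # sample entries only; extend as needed


def suspect_words(text, words=SUSPECT_WORDS):
    """Return the suspect words that appear as whole words in `text`."""
    found = []
    for word in words:
        pattern = r"\b{}\b".format(re.escape(word))
        if re.search(pattern, text, re.IGNORECASE):
            found.append(word)
    return found
```

A hit from `suspect_words` only flags an article for a human to look at. For instance, an article on the Enola Gay would be flagged even though the word is entirely appropriate there.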