Wikipedia talk:Wikipedia CD Selection/archive

Barcelona page (which is going to be included on the CD) vandalized for 1.5 months and no one did anything?
Today I looked at the Barcelona article and was amazed to find that the history section had been vandalized for a month and a half without anyone doing anything about it. It all started when a user removed some material from that section and added a lot of nonsense. Other people then removed some of what he had written, and he was blocked, but mysteriously no one was smart enough to remove all the false information he had added about the history of Barcelona. It stayed there until today, when I noticed it and reverted the history section to an older version (the one from a month and a half ago, before he vandalized it).

You can check the story of all this here: http://en.wikipedia.org/wiki/Talk:Barcelona#.22Modern_Barcelona.22.3F_Who_wrote_this.3F.3F

With this, I want to say that it's very surprising that an article that is going to be included on this CD was vandalized for a month and a half, and that although some people removed part of the vandalism, no one removed all of it and fully restored the article. Since this article is going to be included on the CD, why hasn't an admin been watching it all the time?

If there are more articles with stories like this, this CD will be trash!

Onofre Bouvila 05:26, 25 December 2006 (UTC)

Initial discussion
This section is copied from Wikipedia talk:Version 1.0 Editorial Team. Maurreen 07:19, 6 April 2006 (UTC)

It might help to have a look at [2006 Wikipedia CD Selection] where a group of volunteers for a charity have gone through some of this process and produced a 2000 article selection for UK schools (as a free download).
 * Unsigned post by BozMo


 * That's nifty, thanks. Did you pretty much take stuff as it was, clean it up within Wikipedia, or clean it up after forking? Maurreen 02:43, 5 April 2006 (UTC)


 * Nice job! Thanks for alerting us to this! Hopefully what we are doing here will make this type of thing easier & better. Walkerma 03:47, 5 April 2006 (UTC)


 * Just a thought, maybe we should consider somehow incorporating that into what we're doing and possibly using it as a first test version. Maurreen 03:54, 5 April 2006 (UTC)


 * I'd definitely support that, though what about the "places" project? Which should come first? Could we do a quick assessment ourselves of each article, by trying to recruit a bunch of people from this project to each do a batch of 100?  I think if people knew we were planning on actually releasing something pretty soon, they'd be much more likely to pitch in. I really like the way they chose articles as both important and decent quality.  Walkerma 04:19, 5 April 2006 (UTC)


 * Thanks for the compliments. BozMo (me) is, of course, the CEO of SOS Children UK as his day job... The clean-up was done after forking on 28 Feb and was remarkably difficult, with a vast number of (mainly short) sections needing deletion (most of the work was done on my user pages here). Even good articles have section stubs etc., and on a CD you don't want external links lists or lists of WP articles not included on the CD. Some of the clean-up was also about child-friendliness rather than absolute standards (there are featured articles including material about sodomizing schoolkids and the like). Articles also get worse as well as better, even over a short period, so one lesson is that in practice you need a list of historical copies of good articles rather than the live articles themselves. A good example is Christianity, which fluctuates in and out of disputed status, whereas the old versions from when it was featured are much better. I guess the hard work has shown me that the official version is quite a long way off, and I think we would like to get adopted as a schools version in the meantime. The good news is that we have all the scripts, and if we are provided with a list of (dated copies of) articles and a list of sections to delete, we can produce a WP copy soon. Also, on size: 2000 articles with thumbnails is 180M, which I hope helps with deciding how many articles to go for. On volunteers: we are an international charity based in a university town, so we get plenty of volunteers, but they are mainly interested in doing things for orphans rather than for Wikipedia. I think you should be able to find people to check articles here--BozMotalk 06:38, 5 April 2006 (UTC)


 * It looks very nice. However, one possible problem I can see is that the licensing information for images seems not to be included. For a lot of images released under the GFDL this is not a problem, I think, but some are released under other licenses, Creative Commons for example, which might require proper attribution. Have you looked into that? &mdash; mark &#9998; 07:02, 5 April 2006 (UTC)


 * Nope. Only thumbnails are included and these are attributed to Wikipedia. We can add some other attributions fairly easily but we would need to know what. --BozMotalk 06:59, 5 April 2006 (UTC)
 * It took me some time to find an example; a lot of images are licensed under the GFDL or even released into the public domain. However, here's one that requires attribution and a share-alike license: Image:KualaLumpurAbdulSamadBldg.jpg in Malaysia. Please note that I'm not an expert on these matters; I don't know what would be the best way to fix this. &mdash; mark &#9998; 07:02, 5 April 2006 (UTC)
 * The easiest fix is to identify these images and remove them. --BozMotalk 07:20, 5 April 2006 (UTC)
 * I guess the alternative is to extend the copyright section to explain to people how to find the attribution. I note WP itself doesn't use on-page attribution, so I guess explaining to people how to find a photo's attribution via WP history pages would do --BozMo talk
 * This seems to be a good idea. To me at least. Having discovered a few of my own maps now (in the article on the LRA), I would agree with any method that makes clear that not all of the images included on the CD are in the public domain, and that licensing information can be found by going to the original article on Wikipedia. &mdash; mark &#9998; 08:00, 5 April 2006 (UTC)


 * About BozMo's product versus the "places" plan (or any plan for a release version) -- It seems like whatever we do essentially comes down to choosing between quick and dirty and very slow but better quality, or deciding where on that spectrum we're going to land.
 * Having only looked at the intro pages, it seems like BozMo's group has done a big slice, and it seems like they've found a good compromise between quick and good. I am impressed that BozMo's group did this.
 * I also agree that immediacy or a finished product is likely to get more people involved.
 * It seems like we have a consensus that this is at least worth exploring further.
 * The current stuff we're already doing probably should continue, but maybe we ought to:


 * make a new subproject page,
 * sample x number of BozMo's articles and look over the contents to better know what we're talking about,
 * discuss more ...
 * Maurreen 03:36, 6 April 2006 (UTC)


 * Oh, and I'm leaning against an assessment of each article. I'm thinking if we could randomly sample x number, that could give us an idea of how much work we want or need to do for a first test version. Does that make sense? Maurreen 03:38, 6 April 2006 (UTC)

End of copied text. Maurreen 07:19, 6 April 2006 (UTC)

Very rough proposed plan
Some of these steps could run concurrently. Please feel free to edit this as you see fit.


 * 1) Review scope and sample articles to determine overall general quality level.
 * 2) If we're not satisfied with the quality, figure out and carry out Plan B.
 * 3) Intermediary steps -- handle:
 * 4) Copyright and licensing concerns
 * 5) Mechanics -- scripts, storage and whatnot
 * 6) Write introduction.
 * 7) Other?
 * 8) Publish and publicize. Solicit feedback.
 * 9) Correct any serious problems (mainly any factual errors), rinse and repeat for next version.
 * 10) Plan for and proceed with future versions.
 * Maurreen 07:38, 6 April 2006 (UTC)

Proposed initial minimum quality standard
That which won't embarrass us. 07:38, 6 April 2006 (UTC)

Sampling
Does anyone know much about statistics? Given that there are 2,000 articles in BozMo's release, does anyone have a good estimate of the number we'd need to look at to be reasonably confident that the sample represents the whole? And how can we make sure we draw a reasonably scientific random, but small, sample? Maurreen 07:38, 6 April 2006 (UTC)

If you're really worried, look at 200; otherwise 100 should be sufficient. ~user:orngjce223how am I typing? 12:43, 2 January 2007 (UTC)
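For anyone who wants the arithmetic behind figures like these, here is a minimal sketch. It assumes the goal is to estimate what fraction of the 2,000 articles meets the quality bar, and it uses Cochran's sample-size formula with a finite-population correction, a standard textbook method that the thread itself does not specify. The function names, the default 95% confidence level (z = 1.96) and the fixed random seed are illustrative choices, not anything agreed here.

```python
import math
import random


def sample_size(population: int, margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample size with a finite-population correction.

    population: number of articles in the release (e.g. 2000)
    margin: acceptable margin of error as a fraction (0.10 = +/- 10 points)
    z: z-score for the confidence level (1.96 is roughly 95%)
    p: expected share of acceptable articles; 0.5 is the most conservative guess
    """
    n0 = z * z * p * (1 - p) / (margin * margin)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # correct for a finite list
    return math.ceil(n)


def draw_sample(titles, n, seed=2006):
    """Pick n distinct titles uniformly at random; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(titles, n)


if __name__ == "__main__":
    print(sample_size(2000, 0.10))  # 92
    print(sample_size(2000, 0.07))  # 179
```

A margin of 10 percentage points gives 92 articles and 7 points gives 179, broadly in line with the 100/200 suggestion above. Using p = 0.5 maximises p(1 - p), so if most articles are expected to pass, fewer would be needed.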

Version names
Hypothetically, if we were to have umpteen versions before 1.0, would they be:
 * alpha, beta ... zeta
 * zeta ... beta, alpha
 * alpha, bravo ... zulu :)
 * 0.1, 0.2, 0.3 ...
 * Maurreen 07:38, 6 April 2006 (UTC)


 * Just here because you tagged my favorite article -- not sure if I'm going to join this particular project. But, thought I'd offer you some input as to versioning:


 * [number of intended release version]a1, a2 (e.g. 1.0a1, 1.0a2, 1.0a3) &larr; alpha version, really shaky, starting off
 * [number of intended release version]b1, b2 (e.g. 1.0b1, 1.0b2, 1.0b3) &larr; beta versions, working out the kinks and getting to a ready state
 * [number of intended release version]rc1, rc2 (e.g. 1.0rc1, 1.0rc2, 1.0rc3) &larr; "release candidate," a.k.a. very last proofreads, very last examinations -- a release candidate usually goes final within a week or two at most.


 * This is the versioning system that NetNewsWire seems to follow. (It's not unusual for the alpha and beta numbers in various software products to go fairly high, i.e. 1.0b32 ... ).


 * &mdash; WCityMike (talk &bull; contribs &bull; where to reply) 19:57, 15 May 2006 (UTC)
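The ordering this scheme implies (all alphas, then betas, then release candidates, then the final release) can be made concrete with a small sort key. This is a hypothetical sketch for illustration only: it accepts just strings of the forms listed above (e.g. 1.0, 1.0a1, 1.0b32, 1.0rc1), and the names `version_key` and `_STAGE_RANK` are invented here. Python's own packaging convention (PEP 440) orders a/b/rc pre-releases the same way.

```python
import re

# Rank of each pre-release stage; None means a final release, which sorts last.
_STAGE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")


def version_key(version: str):
    """Turn '1.0b32' into a tuple that sorts alpha < beta < rc < final."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    release, stage, num = m.groups()
    numbers = tuple(int(part) for part in release.split("."))
    # Compare numerically, so b32 sorts after b2 rather than before it.
    return (numbers, _STAGE_RANK[stage], int(num) if num else 0)


if __name__ == "__main__":
    versions = ["1.0", "1.0rc1", "1.0b32", "1.0a1", "1.0b2", "0.1"]
    print(sorted(versions, key=version_key))
    # ['0.1', '1.0a1', '1.0b2', '1.0b32', '1.0rc1', '1.0']
```

Comparing the stage number as an integer is what lets counts go "fairly high" (1.0b32) without sorting ahead of 1.0b4.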

Child-friendly
BozMo suggested keeping the material acceptable for children. I support that, at least until we have many more articles we are ready to release. Two thousand is enough to stand on its own, but small enough that we have to discriminate anyway. Maurreen 07:57, 7 April 2006 (UTC)

Next cut
My congratulations also to BozMo and his volunteers. I have put complete wikipedia dumps (May 2004 snapshot) down at South African schools, with great success. I did it using MySQL and the dumps. However, the picture archive does not quite match the text archive, and there are quite a few missing pictures, so it is not trouble-free. It does have the advantage that Search works. I love the CD idea - and would like to scale it up to DVD-size, and so a process to get there would be wonderful.

Slightly-related question - BozMo's zipped CD is 175Meg. That tells me we could fit 4x as much onto the CD, if we could keep the HTML text compressed - can a browser be persuaded to unzip on the fly from the CD ?


 * Some comments for those who have not looked at the CD :


 * CD size is 232 Meg - 175 Meg compressed on the SOS Children website
 * About 235 Meg of the CD is wikipedia content - 47Meg (uncompressed) text, 235 Meg images.
 * images have been downsized - great.

I converted the wikipedia portion of BozMo's CD to a Plucker database so I could carry it on an SD card on my Palm. I have put it up for download at ftp://ftp.wizzy.com/pub/wizzy/palm/Wikipedia.pdb - it weighs in at 44 Megabytes, with pictures downsized to 150x150, 8 bit color. Wizzy&hellip; &#9742;   11:47, 9 April 2006 (UTC)

Process
Actually we did most of this on my user pages (or at least the volunteers used spreadsheets which got version-managed on my user pages). The process is described here: User:BozMo/2006 Wikipedia CD Selection. The starting point is a list of articles on a wiki page (in our case User:BozMo/version_1_list ), together with a list of sections which need to be deleted in the clean-up, like User:BozMo/tidy & User:BozMo/tidy, plus other tidy-up notes. The only big change we would make a second time around is that we would list historical versions of articles like http://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia-CD/Download&oldid=47383198 rather than live ones, since the checking process would be much easier.

Note the whole thing started at the end of January and we had part-time help from 5 main volunteers. Now that we have a script, if you provide a list of historical versions, section deletes, etc., we can in principle do new runs pretty quickly. --BozMo talk 10:50, 7 April 2006 (UTC)


 * Sorry, I guess that's what happens when I read too fast. I'll blame excitement and distractions.
 * So, if I understand you right, very little would need to be done to publish the same material as a Wikipedia version, right? Could your material be imported to Wikipedia instead of further checking and cleaning up? Maurreen 18:07, 7 April 2006 (UTC)
 * No, I didn't say that. Quite a few clean-ups cannot be done on the main Wikipedia because they only fix things which don't work on a CD. For example, look at Finland: it has a section that is a list of other articles, most of which (say, for the sake of argument) we do not want on the CD. So it looks silly unless we delete the section (since it only makes sense as a list of articles). But do sections like this have any kind of standard name, or is there any way of identifying them? Heck no. It is a very TDS hand-checking job. I cannot delete them on en, but you do not want them on a CD. By the way, if you look at User:BozMo/version_1_list you might recognise a (slightly modified) version of your list, for which thanks... Meanwhile, though, I am not sure what you mean by "publishing as a Wikipedia version". We have published this as a Wikipedia version and applied to Wikimedia to put the logo on it; we have restricted changes to deletions only, to keep NPOV. Clearly I would like it to be the 0.1 version which we all improve, but personally I would call it a Wikipedia version. --BozMo talk 20:31, 7 April 2006 (UTC)


 * Thanks, I'm glad you could use my initial list.
 * Maybe I'm thick, but I think I'm making progress. Are you looking for help with the tidying up or with a list of historical versions? Maurreen 03:24, 8 April 2006 (UTC)
 * And is your intention to make a new version with essentially the latest version of the same topics? Maurreen 07:19, 8 April 2006 (UTC)

I mentioned something similar at Village pump (proposals). BozMo's list, User:BozMo/version 1 list, is Wikipedia 0.1. We should ensure everything on the list keeps or reaches "featured article" status. I hope BozMo doesn't mind, but I will tidy up the list a little. BTW, BozMo, congrats as well; very impressive! Samw 03:07, 8 April 2006 (UTC)


 * No problem with tidying. In general I would take any help we can get. As for our plans, doing a cut requires a lot of human checking effort, so we will do the next one when it could be a lot better. If in six months we have somehow worked out a list of 5000 articles (which would just fit on a CD: more prominent articles have more images), or many of the original 2000 have been improved, we would do another check and cut. Personally I wonder if we could use the category pages to hone the selection of 5000 articles... category "included in WP1" and category "would like to include in WP1 but needs work". If there is enough interest we could set 1 Oct as the next cut date or something and choose historical versions in Sept... am open to views--BozMo talk 14:59, 8 April 2006 (UTC)


 * Thanks, that is more clear. Maurreen 17:35, 10 April 2006 (UTC)

Images and Copyrights
A couple of people have raised the issue of image copyrights. There are also some images WP shouldn't really be using. AFAICT the CD complies with the various licences: all images are clearly attributed to their creators as listed on Wikipedia, and it is stated that a few images are not under the GFDL but under other licences listed there, i.e. creators are named in a way which uniquely identifies them, but perhaps not with their preferred title. See copyright & disclaimer. However, it is extremely easy to remove images, so it seems a good idea to ask people to list any non-GFDL images on the CD that they come across, so we can remove them if needed.

Although I know some people submit their own images and claim they are under different licences, both Contributing_FAQ and Copyrights are (and were at all dates before the CD was generated) explicit that all material and images submitted are under the GFDL. I reckon these are the terms under which people submit. The only grey area is people who submit someone else's CC work on the basis that CC is close enough to the GFDL, and I have not yet found an example of this on the CD. Personally I think the WP community should clean up all the multi-licence and Creative Commons stuff and return to the basics. It would certainly make doing a 1.0 properly much easier. Thanks for any help listing images here:

Image:KualaLumpurAbdulSamadBldg.jpg (CC Only: but the creator submitted it to WP. Hmm)

Image:China (172).jpg (Apparently illegal use by WP?)

End of list--BozMo talk 07:39, 10 April 2006 (UTC)

Here are the collected remarks from others on images:


 * In fact, as far as I can tell, the GFDL does not permit you to simply link to WP for the authorship. Section 4b requires that you "List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement." (emphasis added). Now, since WP has said that it considers linking to WP to be in compliance with the license, one could argue that contributors to WP implicitly release you from this requirement, but the ccby photos are not taken by WP contributors. The terms of ccby require that you state that the image is licensed under ccby, provide the ccby license or a link to it, and provide the name of the creator in a manner "appropriate to the medium". I would think that a copyright holder might consider providing their name on a CD to be appropriate to the medium. Also see Creative_commons for more information on using CC material.

— Preceding unsigned comment added by MattKingston (talk • contribs)
 * Thanks. I have read the Creative Commons licences and don't believe they require naming their creators on the CD. Attribution requirements look the same as the GFDL's to me, which is to say that creators can be acknowledged/attributed indirectly. WP itself does not acknowledge them on the page where they are reproduced, of course. The attribution at http://fixedreference.org/2006-Wikipedia-CD-Selection/disclaimer.htm seems to me to comply with the licence. The Reuters one, though, I will look at. --BozMo talk 16:21, 9 April 2006 (UTC)


 * Thanks. Could you direct me to the part about using the creator's preferred name, rather than attributing? At present the images are clearly attributed to their creators, named as listed at WP. This seems appropriate given that the target audience is UK schoolchildren, who are not going to remember names listed on the PC anyway... and other people are going to have to look up other details, since the actual names used are often WP user names etc. WP itself does not list the artist on the displayed page; the name is just hidden somewhere that people reach by clicking on the thumbnail, which is a form of hidden attribution (and the thumbnail is all I use, of course). Clearly I am not going to put people's names on the CD; that would take a long time. However, it is fairly quick and easy to remove a list of images, and I will have to look at that if there is an advertising requirement.


 * Also I do not understand your comment about 4b. SOS has made no modification to the text, so AFAICP we are not required to list ourselves as a modifier; what is required is a list of the authors of our own modifications, not of the previous publisher's. As a verbatim copier we are required to attribute and to use the front and back pages if they exist. Unfortunately, we are also not allowed to use front and back pages (as far as I can work out what these may be: there is no clear statement of their existence), since they have logos on them which require permission, a catch-22. We want to comply with this and applied some time ago to Wikimedia for permission to use the logos. No clue when we will get an answer. --BozMo talk 06:36, 10 April 2006 (UTC)


 * Sorry about this, I replied to this a few days ago, but I guess I forgot to hit the save button. It's exam time, my mind is a bit frazzled. Anyway, I'm not going to type out my response all over again, I'll just summarize it in point form:
 * Not all images on WP are GFDL, some are CCBY or other licenses, some are fair use, some are copyright violations tagged as fair use. Some copyright violations are tagged as PD (I know of a dozen or so at the moment).
 * Although WP bills itself as a "free encyclopedia", the truth is that you have to do a lot of work (particularly with images) to ensure that you can use it. WP states (somewhere) that what's fair use for WP may not be fair use for you.
 * Because it's a web-site, WP can get away with a lot. If a user uploads a copyvio, then the user is the one responsible for it, not WP. WP only has to respond to a DMCA takedown but can't be sued (easily) for damages. If you make a CD, you're the one who put the content on the CD so you're the one that's responsible to ensure that all the content is legal.
 * The history pages of an article are part of the document, if you don't include them on the CD, you've modified the document (possibly image description pages too). I believe WP has said that they're ok with this as long as there is a link back to WP. But that may only apply to websites, if the content is on a CD you can't assume that the end user has an internet connection and can access the WP content.
 * I think this is highly debatable in most of the different assertions it makes...--BozMo talk 07:57, 21 April 2006 (UTC)
 * I'm just bringing this to your attention. I'm not going to sue you, and maybe no one will, but it might save you a lot of money in the long run to consult a lawyer. Remember the web is a dynamic place: if there's a problem with WP content, it's easy enough to change. Once you've burned something to a CD and sold it, it's a trickier situation. I wish you the best of luck, and I hope it works out well for you. An English CD is well overdue. Matt 00:47, 13 April 2006 (UTC)
 * Thanks. Of course, we don't sell the disks; we give them away. I am taking out the cases I can find, and if anyone points out others I will happily remove them.


 * Re: your Q on my talk page. Unfortunately, I had no fancy way of finding these images; I just looked at a few articles at random and checked the license of images which looked like they had been created with some effort. A lot of the older ones are GFDL-licensed, but in recent years CC-by (i.e. attribution) has become popular. I myself release all of my maps under CC-by-2.5. I won't require on-page attribution in the case of your project, though; as I said before, I'd be content with just a note somewhere on the copyrights page to the effect that some images are not as free as others and that creators etc. can be found by going to the image page on Wikipedia. &mdash; mark &#9998; 06:53, 11 April 2006 (UTC)

Showcase
I'm a bit confused by the "off-site" part of this project. Why don't we make an online version of Wikipedia which displays suitable Wikipedia 1.0 core articles + selected FA's - without the option to edit - so it's simply a stable showcase. Articles would only be copied from WP when they had proven to be superior to the ones in the showcase - ensuring a constant growth in quality. This would also tackle the "Wikipedia is not credible because it's getting vandalized all the time blablabla" argument against WP as a whole. The showcase version would in effect deal with most criticism against the WP principles. In addition, it would make the whole WP 1.0 project more visible... and a high quality, stable encyclopaedia is the end goal anyway, right? Gardar Rurak 17:08, 15 April 2006 (UTC)


 * I guess because everyone would argue about what was to go in it? The only reason we could do this at Fixed Reference so fast was that we could use a conventional management system (a volunteer checks an article, proposes deletions and raises issues; one editor decides and deletes). Doing this by wikicommittee would be much better but would take years. --BozMo talk 19:48, 19 April 2006 (UTC)


 * I disagree. FA's would be a straightforward matter of simply copying them when they appear and updating them per consensus as they improve over time. Core articles and 'good articles' can be copied per consensus a la the FA model to ensure a minimum standard of quality. Bureaucracy would essentially not be a factor since the workload would be minimal. 1) WP 1.0 team nominates articles. 2) People vote. 3) Bureaucrats update the showcase when needed. The nomination procedure can even be simplified to the FA model if needed.


 * In any event, I don't understand why it's restricted to off-site and CD publications. I consider this to be the logical climax of Wikipedia the encyclopedia - the end goal - what we are here for - the final kick in EB's butt. Let's make it open and observable as all other procedures in Wikipedia. Gardar Rurak 01:09, 22 April 2006 (UTC)


 * Okay, then let's agree on criteria for inclusion and on how many articles we want, and go for it. I suggest using category pages and also a section delete list. --BozMo talk 12:57, 25 April 2006 (UTC)

I think that this is what they are trying to do over at Stable versions.--JK the unwise 10:00, 27 June 2006 (UTC)


 * It doesn't look like Stable versions will get off the ground anytime soon. Maurreen 13:06, 27 June 2006 (UTC)

Clarifying
In response to the "Showcase" section above, I think I was premature in making this page. I was coming at it differently than BozMo. Probably to avoid further confusion, we should just note his project on the main 1.0 page, and redirect all of this to an appropriate page of his. Maurreen 19:12, 23 April 2006 (UTC)


 * That is possible. The alternative is that you take it over entirely. To do this, all you need to do is mark all the pages you want to include with some category tag, maintain a list of sections which are inappropriate for a CD version, and then agree on periodic production of a download.--BozMo talk 12:54, 25 April 2006 (UTC)


 * I am open and interested in hearing from other people on this either way. Maurreen 16:51, 25 April 2006 (UTC)


 * I think the WPCD category for talk pages is great. I come across articles not on BozMo's CD (Dubai, Pigeon) and rather than chit-chat I would like to just stick them on. I would also love to have an automated script, that picked all WPCD articles, and builds the CD! Wizzy&hellip; &#9742;   13:53, 16 May 2006 (UTC)


 * The category works: try it. On the other thing, sorry. I've got an automated script which will pick them up, run the section deletes but it was done for us on the understanding we have to use it for the charity not for open release... what I miss is a check script on image copyrights and also something which will pick up all the redirects to page name variants. --BozMo talk 15:38, 16 May 2006 (UTC)


 * pick up all the redirects to page name variants - It has happened to me twice - with China and Pigeon. I want a release early, release often approach to this - I think it will encourage people to get involved if they see the next weekly CD release contain their articles. What do we do about vandalism of the category ? Or disputes, like Homosexuality ? Can you fix the licencing of the script ? Wizzy&hellip; &#9742;   08:34, 17 May 2006 (UTC)


 * Vandalism is tricky. If you list a historical version of an article instead of the current one, the script picks up the old version, but I don't really see that as a good solution. I think we will stay with a hand check and six-monthly update unless I can think of a better way. I have plenty of server space for weekly copies if wanted, but... The other good thing about the WPCD flag is that in itself it will bring lots of attention to the project (but I have only flagged about 20% of the pages so far). Controversy is even harder. A lot of important topics in history and religion are difficult, so in the short term I just steer clear. --BozMo talk 08:48, 17 May 2006 (UTC)


 * I really meant vandalism of the category - like adding Porn to the WPCD category. Re: variants - I guess we could tag the redirects - I will do so with China and Pigeon so you see what I mean. Wizzy&hellip; &#9742;   09:00, 17 May 2006 (UTC)

Section delete list
What is this ?

I presume it is to delete sections considered unnecessary/harmful to a widely distributed list - too long, unverified, what else ? If there is such a thing, the list should also be on the Talk page ? Project page ? Format ? Wizzy&hellip; &#9742;   13:24, 17 May 2006 (UTC)


 * Either things which do not work on a CD (e.g. external links lists) or poor-quality sections; see User:BozMo/tidy and User:BozMo/tidy, one for all sections with a given title and one as an article-by-article list. We need a list, and it needs to be somewhere on a working page. I don't mind where.--BozMo talk 15:14, 17 May 2006 (UTC)


 * Eek. I am always a bit worried by things that require human intervention. External links I can understand, but See also are often internal. Also, for anyone else reading and wishing to pitch in, articles on WP:VA Talk could all be added to  , as could articles from List of articles all languages should have. Wizzy&hellip;  &#9742;   09:17, 18 May 2006 (UTC)


 * I sympathise, but we looked at the CD without them. It could never look like a finished product to rival Britannica or whatever. The problem is: what does a CD user make of a "See Also" list of 20 articles not included on the CD? There are vast numbers of internal link lists to stubs, and also people who add empty stub sections (e.g. "culture" to every country which doesn't have it). Also, all Good Articles could be included, and I don't mind doing that automatically (e.g. by adding WPCD to the categories on the Good Article template). Featured articles, though, are often not child-friendly (weird/porn etc.). Thanks for help tagging articles, by the way. I am trying to do 100 a day to get the current list in over three weeks --BozMo talk 09:22, 18 May 2006 (UTC)

Well, one good thing is that ...
The category tagging has brought half a dozen other interested parties into the project. Thanks! --BozMo talk 12:23, 22 May 2006 (UTC)
 * Currently standing at 1172 articles! I am still interested in experimenting with the script that pulls all these to a CD, to find out how the section delete stuff works, and what we can do about redirects/synonyms. Can we mention directly in the template about the Category ? I have had questions about it. When does the category addition take place ? At edit time, or (somehow) at view time ? Can you really edit the GA template and magically add all articles to this WPCD template ? Wizzy&hellip; &#9742;   14:10, 23 May 2006 (UTC)


 * The input to the script is just a list of urls for the articles plus a list of section deletes (general and by article) and any funnies. Therefore, at run time you can do a manual process and put the list of GAs into the funnel with the rest. It may be possible to get the GA articles listed on the category page e.g. by tweaking the GA template to add another category section which would make it even better. --BozMo talk 14:50, 23 May 2006 (UTC)
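BozMo's script itself was not released (it was written for the charity's own use), so the following is only a hypothetical sketch of the section-delete step described above: given an article's wikitext, a general list of section titles to drop everywhere (such as "External links") and a per-article list, remove those sections. It assumes the input is raw wikitext with `== Heading ==` markup rather than rendered HTML, and the function name, parameters and the sample "Culture" entry are invented for illustration.

```python
import re

# A wikitext heading line such as "== See also ==" or "=== Sources ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def delete_sections(wikitext, title, general, per_article):
    """Drop every section whose heading is listed for deletion (hypothetical).

    general: section titles removed from every article (e.g. "External links").
    per_article: dict mapping an article title to extra section titles to remove.
    A deleted section runs until the next heading of the same or higher level,
    so its subsections are removed along with it.
    """
    to_delete = {t.casefold() for t in general}
    to_delete |= {t.casefold() for t in per_article.get(title, [])}

    kept = []
    skip_level = None  # level of the heading currently being deleted, if any
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            if skip_level is not None and level <= skip_level:
                skip_level = None  # a same-or-higher heading ends the deletion
            if skip_level is None and match.group(2).casefold() in to_delete:
                skip_level = level
        if skip_level is None:
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "\n".join([
        "Intro text.",
        "== History ==",
        "Some history.",
        "== See also ==",
        "* [[Some article]]",
        "=== Sub-list ===",
        "* [[Another article]]",
        "== External links ==",
        "* a link",
        "== Culture ==",
        "Stub.",
    ])
    print(delete_sections(sample, "Finland", ["See also", "External links"],
                          {"Finland": ["Culture"]}))
```

Matching on heading text only catches sections with a predictable name; as noted earlier in the thread, list sections like the one in Finland have no standard name, which is what the per-article list and the hand checking are for.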


 * Done. The GA Template now adds to our category too. So there are more than 2200 articles... Bit of a hack that --BozMo talk 14:55, 23 May 2006 (UTC)


 * But stuff like "1987 (What the Fuck Is Going On?)" will have to get screened out..--BozMo talk 15:01, 23 May 2006 (UTC)
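BozMo describes the script input as a list of article URLs plus a list of section deletes, both general and per article. The actual script was not posted here, so the following is only a minimal Python sketch of how such delete lists might be applied; the function name `apply_section_deletes` and the data shapes are hypothetical. It only removes sections and copies everything else unchanged, in keeping with the delete-only editorial funnel.

```python
from typing import Dict, Set

# Hypothetical shape: each article is an ordered mapping of section title -> text.
Article = Dict[str, str]


def apply_section_deletes(
    articles: Dict[str, Article],
    general_deletes: Set[str],
    per_article_deletes: Dict[str, Set[str]],
) -> Dict[str, Article]:
    """Return copies of the articles with the listed sections removed.

    Only deletion is performed; surviving sections are copied unchanged
    and the input dictionaries are not modified.
    """
    result: Dict[str, Article] = {}
    for url, sections in articles.items():
        # Sections to drop: the general list plus any article-specific entries.
        drop = general_deletes | per_article_deletes.get(url, set())
        result[url] = {
            title: text for title, text in sections.items() if title not in drop
        }
    return result


if __name__ == "__main__":
    articles = {
        "http://en.wikipedia.org/wiki/Hippocrates": {
            "Biography": "...",
            "See also": "...",
            "External links": "...",
        },
    }
    cleaned = apply_section_deletes(
        articles,
        general_deletes={"See also", "External links"},
        per_article_deletes={},
    )
    # Prints ['Biography']: both generally deleted sections are gone.
    print(list(cleaned["http://en.wikipedia.org/wiki/Hippocrates"]))
```

Keeping the general and per-article lists separate means a section name like "See also" only has to be listed once, while odd cases can still be handled article by article.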

CD
Is this the place to dispute inclusion on the CD? A tag was added to Talk:Hippocrates, yet the article needs major attention from an expert on the subject (per the comment at the top of the talk page), so ATM I don't think it's a very good example for inclusion. Arniep 21:59, 22 May 2006 (UTC)
 * I think this is as good a place as any. I have been going through WP:VA and List of articles all languages should have, and Hippocrates is listed. I added the tag. It may be that the page is not at its best, but for someone with no access to the 'net (one of the targets for the CD), isn't any information better than none? Wizzy… ☎ 07:24, 23 May 2006 (UTC)
 * You can discuss inclusion on the category's talk page, Category_talk:Wikipedia_CD_Selection. However, once an article is in the CD category, inclusion is not automatic. The article will go into an editorial funnel when the next run happens and may have sections, or the whole thing, deleted if its quality is not high enough at that moment. The editorial funnel is only allowed to delete, not otherwise alter, though (would you like to volunteer?). Some topics which are not in a good state, like this one, I think should stay in for now, as the topic is important enough and central to the school curriculum. The next run isn't planned til Sept, so someone may have sorted it by then. --BozMo talk 08:40, 23 May 2006 (UTC)
 * Thanks for the responses. Arniep 00:10, 26 May 2006 (UTC)

Also Wikimedia Commons. Why on earth would you include this article? Are kids/schools going to care at all?? It seems only self-serving. I assume Wikipedia would be included. That seems enough. pfctdayelise (translate?) 01:09, 28 May 2006 (UTC)
 * Thanks, I agree. I have taken the tag off that one. --BozMo talk 05:25, 28 May 2006 (UTC)

Use Real Dates
Please use real dates; avoid ambiguous phrases like "by the end of the year". End of which year? Please say "by the end of 2017", or whatever... 69.87.193.9 01:45, 29 May 2006 (UTC)

Splitting included and candidates
Would it not be beneficial to split the included articles and the candidates into two different categories and so helping us to review the candidates? I would really love to help this project but it seems like it is being made difficult needlessly. Ciraric 19:34, 29 May 2006 (UTC)


 * I guess I could, but it is loads of work. I will have a look at whether there is another way to let people tell them apart easily, by tweaking the box.

--BozMo talk 09:17, 13 June 2006 (UTC)

Wikipedia 0.5
Just checking in here: things are progressing over at WP:V0.5, and I think we can probably both help each other out. I'm hoping that by August 31 (our cutoff) we will have a good number of articles nominated at WP:V0.5N and approved, many of which will be appropriate for this project to use. I would suggest that in August both projects look at each other's lists and see what other articles we may want to "grab" from the other. For example, I notice that you had Goethe listed, who might have been missed by our group because he is not GA, VA or anything, but we already have James Joyce, who is FA. As I see it, this project has a lot in common with 0.5, with the revised SOS CD aimed at (older) children rather than adults.

Another thing - you may want to consider using the bot, which will generate a nice collection of tables and statistics using talk page templates. We now have a guide for adding a new project list. As a project which already has templates on many pages, this would probably be a useful thing for the project. Please keep in touch with us at 0.5. Thanks, Walkerma 01:32, 11 June 2006 (UTC)


 * More clue on how to do this would help. --BozMo talk 09:45, 20 June 2006 (UTC)

Name
Hi, sorry I haven't been around for a while. I wonder if it would be good to give this a different name. Maybe "School version"? Maurreen 17:14, 20 June 2006 (UTC)


 * Maybe but I am not sure whether it will merge into 0.5 first...--BozMo talk 22:38, 21 June 2006 (UTC)

Equations
Equations still don't appear (at least not in the browseable version -- see e.g. Special Relativity). Does anyone know how easy this is to fix?

Rnt20 13:14, 5 July 2006 (UTC)


 * Having a look, but very busy the next couple of weeks. In theory it shouldn't be a problem...

Has anyone looked at this yet? I think this is very important. There aren't any equations in e.g. Redshift although the text refers to them. --Hjb26 18:41, 27 July 2006 (UTC)


 * Wiki formulas tagged with <math> are rendered by a TeX engine. Please see Help:Displaying a formula for an explanation of how this works, and why the formulas need to be modified for display in an offline database. (I am not an expert at this.) Templates would also not be displayed properly in an offline database. --Blainster 02:50, 1 August 2006 (UTC)


 * This is now fixed in the 2007 version --BozMo talk 16:57, 4 January 2007 (UTC)

Criteria for inclusion
Can someone point me to an outline of the criteria for inclusion in the CD selection, or the procedure for tagging articles for inclusion? I ask because I recently removed the template from the talk page of the article on driving on the left or right, which is in a poor state and has unfortunately been the subject of back-and-forth reverts in a dispute over its lack of references. Someone readded the template to the talk page a few hours later, but I really don't think it's worthy of inclusion in its present state. It appears that anyone is free to add an article as a candidate for selection, and my concern is that when it comes to checking this article it might have been reverted to an apparently healthy state (i.e., without the tags indicating that it's largely unreferenced). Terminal emulator 23:10, 2 August 2006 (UTC)


 * Broadly, the tag is not intended to be a quality mark: the process is that the proposed articles will subsequently be checked for quality, and boxed, and some sections will be taken out. The tag should be put on any child-friendly topic interesting enough and important enough (from the perspective of an 8-15 year old child or their educators) to be in the top 5000 articles in WP. If it stimulates improvement, all the better. However, it is hard to argue that the one you mention is a top 5000 article in terms of importance.
 * Aha. I came here for a similar reason, tagging of the Racism page. (Which is a sorry mess and certainly shouldn't go on any CDs.) Maybe the template could make this more clear. Arbor 20:46, 5 August 2006 (UTC)
 * Seconded. A couple of paragraphs on the topic and a link in the template should do the trick, but right now it’s a bit confusing. —xyzzyn 21:57, 5 August 2006 (UTC)

Original CD gets a mention in the press
Folks here may be pleased to see User:Sj's mentioning the original CD in this interview in The Phoenix (newspaper). Walkerma 01:08, 28 August 2006 (UTC)


 * Actually there is quite a bit of news coverage around the place, including at least three articles in Norwegian e.g. http://magasinet.telenor.no/default.asp?page=27&article=1325. Perhaps we should add the page to ones cited in the press? --BozMo 19:07, 12 September 2006 (UTC)

Featured lists
How about including Wikipedia's featured lists in the 1.0 CD? There are only about 150 of them and they've all passed quality review. I've added the selection template to the two where I contribute Cultural depictions of Joan of Arc and List of notable brain tumor patients. Some of the others such as The Oz Books are truly impressive. Durova 15:16, 7 September 2006 (UTC)


 * My general view is that the featured articles are too esoteric and of lower quality than the good articles. I will have a quick look through the 150. --BozMo talk 19:13, 7 September 2006 (UTC)


 * For 1.0 itself, we will be using lots of lists - but with lists I would say the main criteria are usefulness and importance rather than quality (assuming the list is something like complete). For example, in Version 0.5, we will include List of rivers by length because we are including a lot of rivers and we want people to be able to find them. Whenever we have a large group of articles on a topic, we will include the relevant list.  Walkerma 20:44, 12 September 2006 (UTC)


 * When will the 0.5 list of articles be ready? --BozMo talk 21:15, 12 September 2006 (UTC)
 * I'd hoped by October 1st, but things are a bit slow so we may need a week or so of October to finish things off. Don't tell them that over at the 0.5 page though!  We should probably talk soon on how things with this version vs. the Version 0.X series might go. Walkerma 22:02, 12 September 2006 (UTC)

Objection to inclusion of articles
Where does one object to the inclusion of an article? If this is the right place then - I object to the inclusion of the following articles; British Isles and Ireland. Both of these articles are currently being used as a vehicle for anti-British sentiment, particularly regarding the term "British Isles". A minority of editors are closely guarding these articles and mercilessly reverting any content that doesn't adhere to the anti-British Isles POV. As such, these articles are not fit for inclusion on the CD. Arcturus 13:42, 12 November 2006 (UTC)


 * Here was fine. Thank you for the heads up --BozMo talk 18:43, 12 November 2006 (UTC)

Inclusion of specific versions of Ireland and British Isles
BozMo, you've indicated on your Talk page that Arcturus' specific edits to these articles will be included on the CD version of Wikipedia. I hope this isn't the case. His portrayal of himself as a noble NPOV knight fighting off anti-British-POV barbarians is very clever, but a review of the respective Talk pages will tell a different story. His selected version of Ireland, in particular, contains extensive references to the British Isles, an outdated term which many feel implies British hegemony over the Republic of Ireland. It's considered offensive to many Irish people. That's not merely the opinion of "a minority of editors" to those articles. Diplomats in both Ireland and the UK conscientiously avoid using the term, and a publisher recently eradicated it from its Irish edition of an atlas. The CD edition of Wikipedia would do well to follow those examples. As for his British Isles edit, he seems unwilling to acknowledge that there's any controversy surrounding the term at all, which there clearly is. If the compilers of the CD edition are uncomfortable taking a stand in the controversy, I would propose that both articles be left out of the CD edition entirely. If that's not feasible, then on the CD edition (as elsewhere on Wikipedia) consensus should carry the day. Arcturus' preferred versions are in opposition to the consensus. Dppowell 20:17, 14 November 2006 (UTC) (one of those supposed "anti-British POV" guys)


 * Answered on your talk page --BozMo talk 20:31, 14 November 2006 (UTC)

GFDL and Creative Commons
I see this has been dealt with in part above, but that was a few months ago and I think it's worth reopening the discussion. I'm working on Ben Nevis, which is tagged as a candidate for inclusion, and which uses a couple of images from the Geograph British Isles project. There are several more photos on Geograph which I'm thinking of including in the article, and the project is generally an immensely valuable resource for British geographical articles. Now, Geograph images are released under CC-BY-SA 2.0, and contributors to the project are given to understand that they will be explicitly credited by name whenever their photos are re-used. There was a fair amount of disquiet on the site's discussion boards when Geograph images first started to be used on Wikipedia, as the photographer wasn't credited on the article page; that seems to have been dealt with, but I don't think the argument above that "[CC image] creators can be acknowledged/attributed indirectly" will go down well.

So, can we be sure that authors will be directly credited on the CD somehow? (Most Geograph images use a standardised geograph template, so I expect it should be easy to write a script to extract the authors' names, then the remaining CC images could be manually checked.) Or will all CC-BY images be removed from the CD? (Which would be a shame.) Or should CD candidates avoid CC images altogether? (Which would be an even greater shame, especially if the article isn't in the end included on the CD.) Thanks, Blisco 11:44, 25 November 2006 (UTC)


 * Not quite at a decision point on this one. If the technology proves easy to master, we will credit the photographers in a similar way to WP. We will certainly improve on the first CD, which, whilst probably legal, isn't really fair. Should be able to give a proper answer in a couple of weeks. --BozMo talk 15:41, 25 November 2006 (UTC)

Ideas for the next release
Since BozMo noticed my acid blog post on this CD, here's a suggestion.

Why not have a start page with some major themes placed as the petals of a flower, so that the child (or any reader) could more easily find what he might be looking for?

Especially for geography, a clickable map would help. For kids, a *clickable* 3D map, with some mountains, monuments, etc., drawn in a cartoon style, would be extremely useful in finding some areas of interest.

At the same time, the editors of the CD could easily see "what's missing on the map". Is it the Statue of Liberty? Niagara Falls? Victoria? La Tour Eiffel? The Great Wall of China? Any major city? Etc.

The start page could have some broad categories; subcategories can then be shown on each category page.

Examples of possible "broad categories" (main page) and subcategories (subpages):
 * - Earth and Universe: (1) Geography (2) History (3) Astronomy
 * - Life: (1) Vegetable kingdom (2) Animal kingdom (3) Human being
 * - Science and Technology: (1) Exact Sciences (2) Natural Sciences (3) Technics (4) Energy (5) Communication and IT
 * - Arts and Architecture: (1) Music and Dance (2) Visual arts (3) Architecture
 * - Society and Civilization: (1) People (2) Beliefs and Religion (3) Politics
 * - Life: (1) Food (2) House (3) Clothing (4) Study (5) Work (6) Sports and games (7) Transportation

Each of these should be represented by a pictogram too.

Some topics should be reachable from more than one (sub-)subcategory.

P.S. Check out shot10 and shot11, taken from this product, as ideas for visual menus.

Create 2007 Wikipedia CD Selection page, and article collection?
If the new release is almost out, maybe it's time to create a "2007 Wikipedia CD Selection" page?

Also, at one point, either for the CD Selection or the Wikipedia 1.0 project, there was discussion of including the content for a specific archived date for a given page, rather than the most recent page, as the archived page was not subject to vandalism in the same way. I'm just wondering if this is how you're doing things this time around?

Dialectric 15:10, 20 March 2007 (UTC)

It's a good point. I haven't really thought about it. --BozMo talk 15:22, 20 March 2007 (UTC)

Suggestion for article removal / article selection strategy
After looking over a few letters from the page listing articles pending, I have a suggestion for some article removals, generally along the following lines:


 * Video Games
 * - it would make the most sense to include only a few key games of high historical importance.
 * - video game characters/worlds/terms should not be included.
 * Music
 * - (almost) no individual songs, only key albums of high historical importance.
 * And here are some articles which I would suggest might be left out, with a brief reason for each:


 * -a-
 * Autobiography (Ashlee Simpson album)


 * -c-
 * Cowboys Are Frequently, Secretly Fond of Each Other - non-notable song
 * CM Punk - non-notable band
 * Chill Out (album)
 * Chrono Trigger - video game
 * Clone Wars (Star Wars) - main Star Wars article already included
 * Common Unix Printing System - somewhat obscure techy article
 * Charmander - Pokémon


 * -e-
 * Electric Six - non-notable band
 * Early life of Joseph Smith, Jr. should be moved to "Joseph Smith, Jr."
 * Eevee - Pokémon
 * Extraordinary Machine - nn album


 * -j-
 * Joe Beevers - poker player
 * Jack White - The White Stripes already included
 * JoJo - nn actress
 * Justified and Ancient - nn song
 * Jack Thompson (attorney) - not internationally important
 * Janet Farrar - British Wiccan
 * John Henninger Reagan - American politician of USA-specific importance
 * John Floyd (Virginia politician) - American politician of USA-specific importance


 * -p-
 * Pedro López - serial killer, not suitable for children
 * Perfect Dark - video game, not high historical importance
 * -r-
 * Radical Dreamers: Nusumenai Houseki - non notable video game
 * Rena Mero - female pro wrestler
 * Ross Boatman - poker player
 * Ryan Leaf - American football quarterback, not record-breaking
 * Ralph Bakshi - adult content comic artist
 * Robert Clark (actor) - non notable actor
 * -s-
 * Stargate (device) - confusing to have a fictional sci-fi technology article
 * Supernature (Goldfrapp album) - band is not included, why album?
 * Striptease (film) - not high historical importance
 * Star Fox Assault - video game, not high historical importance
 * Star Fox Adventures - video game, not high historical importance
 * Star Fox 64 - video game, not high historical importance
 * Star Trek XI - unreleased movie, Star Trek series should be included instead
 * Spira (Final Fantasy X) - video game world
 * Squall Leonhart - video game character
 * Sparty - USA college mascot, not important
 * Sarlacc - fictional Star Wars character
 * Streetlight Manifesto - American bar band
 * Shadow (song) - song, not high historical importance


 * -t-
 * Tenebrae (film) - non notable horror film
 * The Star Wars Holiday Special - non-notable, Star Wars already included
 * Tumbling Dice - The Rolling Stones already included
 * The KLF discography / KLF Communications - non-notable band, profanity
 * The Protocols of the Elders of Zion - article on controversial racist text
 * The Reputation - non notable band
 * This Charming Man - non notable song, Smiths might be notable?
 * Tidus - video game character
 * Tila Nguyen - non notable model
 * The Legend of Zelda: Ocarina of Time - The Legend of Zelda already included
 * Truthiness - not sure, maybe not encyclopedic

Dialectric 15:57, 20 March 2007 (UTC)
 * Thanks, that's very helpful. I will add them on the project page where the script can pick them up. --BozMo talk 16:01, 20 March 2007 (UTC)


 * Just to add that the remaining letters which no one has run through (or agreed to do) are the second half of S (which it looks like you went through), T, M and articles starting with a number. We have a few days left, but looking over those would be especially helpful. —The preceding unsigned comment was added by BozMo (talk • contribs) 16:11, 20 March 2007 (UTC).

- One more thought: you should do a search through your article list for the text 'pokemon'; there must be at least a dozen in there under strange names. Maybe the main article 'pokemon' could be included, but these individual creatures should go. Dialectric 17:31, 20 March 2007 (UTC)


 * Thanks. There are a couple of automated check processes which should pick up too many Pokémon characters, but to be honest they probably are of some interest to children (unlike all the adverts and non-notable songs, but like, say, dinosaurs) and I think some non-Britannica content might be part of the appeal. I have three long emails of comments on articles to go through, but light is appearing at the end of the tunnel. --BozMo talk 17:49, 20 March 2007 (UTC)

and a short list of possible additions
Dialectric 17:40, 20 March 2007 (UTC)
 * Hip hop music
 * jazz
 * aye aye
 * red panda

Page move?
Since this page doesn't only cover the 2006 Wikipedia CD Selection anymore, but also the 2007 edition (and will probably also cover future releases), I think we should move it to Wikipedia CD Selection or Wikipedia CD Selection series or something like that... --Fritz S. (Talk) 13:36, 26 April 2007 (UTC)


 * I agree; why not WP:Be Bold? --BozMo talk 13:48, 26 April 2007 (UTC)

Business Studies
BozMo asked me to look at http://schools-wikipedia.org/wp/index/subject.Business_Studies.htm, because I've been trying to improve some of the business articles in Wikipedia. I'm not sure I can help much, but here are some suggestions. Hope this is helpful. --SueHay 14:17, 27 May 2007 (UTC)
 * Since this CD selection is directed toward British school children, you might want to consider using an A level outline in choosing what to include in Business Studies. Something like the outline at http://www.s-cool.co.uk/topic_index.asp?subject_id=8, though I'm sure you can find a better one.
 * Check the SMOG Index of the articles for reading level. I ran a quick online check on the first paragraph of your Accountancy article and got a SMOG value of 23 -- post-graduate reading level. I don't think it matters what reading level index you use, but it's no good putting that sort of article on a CD for children.
 * I don't know if you have these articles elsewhere, but running a business involves understanding bookkeeping, obeying laws and regulations, and understanding basic contracts, at minimum.
 * I suggest you use some business examples that the students can see, such as food stores, petrol stations, and hairdressers.
 * Thanks --BozMo talk 19:23, 27 May 2007 (UTC)
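For reference, the SMOG grade mentioned above is usually computed with McLaughlin's formula: grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where "polysyllables" are words of three or more syllables. The minimal Python sketch below assumes the polysyllable and sentence counts have already been made (counting syllables reliably is the hard part and is not attempted here); the function name `smog_grade` is illustrative only.

```python
import math


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade from pre-counted totals (McLaughlin's formula).

    polysyllables: words with three or more syllables in the sample.
    sentences: number of sentences in the sample (must be positive).
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    # Scale the polysyllable count to a 30-sentence sample, as the formula expects.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # 30 polysyllables in 30 sentences gives roughly grade 8.8.
    print(round(smog_grade(30, 30), 1))
```

Inverting the formula, a score of 23 corresponds to roughly 363 polysyllables per 30 sentences, i.e. about 12 long words in every sentence. SMOG was designed for samples of about 30 sentences, so a figure from a single paragraph is only a rough indication.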

template
Is there any way someone could create a template that would state on the talk page whether the article had been selected for the SOS CD and which version had been selected? Remember 14:20, 29 May 2007 (UTC)
 * Good idea but not sure if technically possible...I can easily provide the list of versions selected. --BozMo talk 20:39, 29 May 2007 (UTC)

CVG Suggestion
BozMo popped up on the CVG WikiProject asking for suggestions. I think History of video games would be a good place to start for CVG contributions. - X201 13:27, 12 June 2007 (UTC)

I would suggest PlayStation, PlayStation 2, PlayStation Portable, PlayStation 3, Xbox and Xbox 360, so that NPOV can be maintained: currently Nintendo is the only manufacturer with systems listed, unless we want to present only Nintendo. Xtreme racer 21:07, 12 June 2007 (UTC)

I would suggest dropping Star Fox Assault. The article selection is going to be limited, and, in the long run, that game's impact is nothing compared to the likes of Pac-Man. GarrettTalk 04:41, 13 June 2007 (UTC)