User:Fritzpoll/Refinement

Alternative: The proposal I never got to make
This entire page was 12 hours old before I even knew it existed, by which time misunderstandings had arisen and raging arguments were taking place. I have read what has been said, and believe there is net support for at least the principle of evening up (to a greater or lesser extent) the geographical coverage of Wikipedia. However, there are many legitimate concerns, and I have taken these on board and now present an amended proposal for the community's consideration.

Proposal
The executive summary of my proposal is this: bot automation driven by WikiProjects, operating within community-defined guidelines.

Here is the meat of it.

1. A new WikiProject is created to coordinate the activities of the bot. This allows for a central group of volunteers to assist with the generic tasks involved in making this project work, and gives a centralised place for questions to be asked, and new proposals and requests to be made.

2. Before beginning work on a new country, the relevant WikiProjects are contacted. These will include the country WikiProjects, continental WikiProjects, and subject-based WikiProjects. We will seek some volunteers - if no volunteers, or too few, can be obtained for a country, we ignore this country for the time being.

3. Together with the WikiProjects, a collection of sources will be obtained. The default will be an amalgamation of the US GNS data and the census data of various countries obtainable from the following list of resources: http://www.census.gov/main/www/stat_int.html. If census data is not available, unreliable, or incomplete, work on the country will be suspended until it is, or until other, reliable sources can be found to add this kind of data.

4a. Once source collection has occurred, the bot will be tuned to output lists similar in format to those already being created, but with the addition of population data, and hopefully other elements such as elevation data. The output will be separated into subpages, with a subpage devoted to those places the bot is unable to reconcile between databases.

4b. The bot will not upload any data for places where the census data indicates that the dwelling is too small, with size to be determined here by community consensus (not voting!). More on this below.

5. Data will be checked, as per the old proposal, by human editors to ensure correct spellings, check for disambiguation, etc. In the case where the bot cannot automatically reconcile data from the existing sources, human editors must add a reference to their corrections to indicate how they reconciled the data (looking at an atlas, for instance). Most data reconciliation failures are likely to be failures to correlate census data to coordinate data. These references will ultimately also be uploaded by the bot.

6. Once the project agrees that the data has been checked, and is ready for upload, relevant parties (such as New Page Patrol) will be given notice of an upload - I propose 30 minutes' notice - and the bot will automatically create the articles according to a template agreed with the WikiProjects. The articles will include all the above data, and all the references to it.

7. The bot will watchlist the articles to prevent flooding Special:UnwatchedPages and create a list of articles it created - this list will be posted to allow the WikiProject volunteers to watch the pages that they helped to create.

8. When they are first determined, the relevant notability policy will state specifically the initial minimum standards for "inherent notability" of villages, including global standards and any national exceptions. The initial specifics may be more narrow (such as minimum three or four independent reliable sources, and minimum 50% of population of capital city as determined by a specified benchmark source); over time the minima can be ratcheted down to broaden them slowly until the community and WikiProjects indicate when to stop. The bot's new articles will always observe the current notability standard strictly. Added by JJB 14:02, 2 June 2008 (UTC)
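As an illustration of the reconciliation described in points 4a and 5, here is a minimal sketch in Python. It is not the bot's actual code: the record types GnsPlace and CensusRow, the normalise helper and the exact-name matching rule are all assumptions made for this example. The idea is simply that a census row is matched automatically only when exactly one gazetteer entry shares its name; everything else goes to the "unreconciled" list for human editors to resolve and reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GnsPlace:
    """Hypothetical gazetteer record: a name plus coordinates."""
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class CensusRow:
    """Hypothetical census record: a name plus a population figure."""
    name: str
    population: int


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling differences match."""
    return " ".join(name.lower().split())


def reconcile(gns, census):
    """Return (matched, unreconciled).

    A census row is matched only when exactly one gazetteer place has the
    same normalised name. Missing or ambiguous names are left for humans.
    """
    by_name = {}
    for place in gns:
        by_name.setdefault(normalise(place.name), []).append(place)

    matched, unreconciled = [], []
    for row in census:
        candidates = by_name.get(normalise(row.name), [])
        if len(candidates) == 1:
            matched.append((candidates[0], row))
        else:
            unreconciled.append(row)
    return matched, unreconciled
```

A real run would need a more forgiving comparison (transliteration variants, administrative divisions to separate same-named villages), which is exactly the kind of detail the WikiProject volunteers would be asked to agree on.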

What use is this?
The advantages to the above are that, although a little slower, we end up with more than one-line stubs, and because countries can be worked on in parallel by multiple WikiProjects on their own subpage within a separate WikiProject (the new one proposed above), the speed factor is also maintained. Thus there is an increase in quality with a minor cost in speed compared to the old proposals. By involving the WikiProjects in the way described, we ensure that there is sufficient interest in the articles, we obtain new and useful sources, and we ensure that there is someone to watchlist the pages afterwards.

The difference, therefore, between this and the old proposal is the increase in quality, and breadth of sourcing. These will not be single-sourced articles, and we will be able to devote our time to finding new and reliable sources of data. The WikiProjects also end up with a series of extra articles that they wanted, in the format they wanted. An example of how the project has already been moving in this direction is a discussion I have had with a member of WikiProject Russia, who is collating a list of sourced data in a database, and we want to help them by uploading the data when it is complete.

Other points
This proposal will probably drastically reduce the number of articles created, but I hope people will understand that this proposal by its very nature will not yield a good estimate of the number of articles created. It will be nowhere near the predictions of the first proposal, however.

I also hope the community will understand that an example is difficult to give, since I would have to first go and collect the data and sources for an entire country to create a handful of articles. This would not be in the interest of the articles in question. The rough layout of the articles created under this proposal would not be significantly different to the original - there would be an infobox, categories and text. The text would be more substantial given the additional sources, and the external links currently in the article would not exist in this new iteration.

Onwards to discussion...
I believe this proposal will quell the legitimate fears of vandalism, unsure notability and low quality of stubs. The one point, beyond acceptance of the proposal itself, that needs to be considered is point 4b. The easiest automatic criterion is size. There is no need to have a permanent, everlasting limit - a limit that can later be reviewed if it is found to be inadequate is probably best, so that we introduce articles slowly. My suggestion is that the community pick a percentage representing the lowest size of town/village to be included - the percentage would be "as a percentage of the capital city of the country". So if you picked 50%, all dwellings that had a population greater than half the population of the capital city would be included by the bot.

The reason for doing this is that it is fairer than selecting a fixed number, like 30,000, since less developed countries will not necessarily have reached the levels of urbanisation that such a fixed number assumes.
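To make the percentage rule concrete, here is a small Python sketch of the size test in point 4b. The function names and the figures in the usage note are invented for illustration; the only thing taken from the proposal is the rule itself, namely that a place is included when its population is greater than the chosen percentage of the capital city's population.

```python
def minimum_population(capital_population: int, percentage: float) -> float:
    """Cutoff implied by a community-chosen percentage of the capital's population."""
    return capital_population * percentage / 100


def places_to_create(populations: dict[str, int],
                     capital_population: int,
                     percentage: float) -> list[str]:
    """Names of places strictly above the cutoff, in alphabetical order."""
    cutoff = minimum_population(capital_population, percentage)
    return sorted(name for name, pop in populations.items() if pop > cutoff)
```

For a made-up country whose capital has 1,000,000 inhabitants, a 50% setting gives a cutoff of 500,000, so a town of 600,000 is included while towns of 500,000 or 40,000 are not. Lowering the percentage later widens the set without changing anything else, which is the "review the limit later" behaviour described above.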

This proposal should satisfy most of those "on the fence" for the previous proposal, should continue to garner support from those supporting it, and may even address some of the concerns of those who opposed. But let's not make the following discussion divisive. I beg, no more straw polls, no more "voting" - let's just talk about this rationally.

Let the games begin! Fritzpoll (talk) 11:53, 2 June 2008 (UTC)