Wikipedia:Merging encyclopedias


 * Note effort underway: WikiProject Missing encyclopedic articles.

Some larger part in the future of the English Wikipedia will be played by the "missing articles" issue. While those who work on Featured Articles are at one extreme of the project, concerned with pushing the quality of articles as far as possible, the other side of the way we judge any encyclopedia is certainly its comprehensiveness: the sheer power of very broad coverage speaks for itself, once you start consulting a reference work. Not only is this side of our project less obviously prominent (which is inevitable), it is much less discussed in detail, meaning conceptually, technically, and in terms of project management and co-ordination. Even the basic aims of working on missing articles are still up for grabs.

A major source of lists of topics is other reference works: listings that are prima facie encyclopedic can be extracted from any reputable specialist encyclopedia. Where such an encyclopedia is free of copyright (e.g. public domain works), or (for example) is another reference wiki under the GFDL, it is in theory possible to merge it into enWP.

Experience shows that this is not as easy as it may sound. This essay tries to develop some basic theory. As a working title for the encyclopedia being merged in, take EV (Encyclopedia of Various).

Listing
It is fundamental to set up a suite of project pages, listing (preferably alphabetically, and completely) the titles of articles to be merged in. Some other pages should be set up around the listing, dealing with project management and other issues, and any auxiliary listings.

First ideas
The initial, rather naive view is that the topics fall now into two groups: the Blue Group where there is an existing WP article, and a Red Group where such an article is to be created.

Problems with this "first pass":


 * (A) Blue Group
 * On the face of it, the bluelinks from the topic listing indicate articles from EV to be merged into the matching articles in WP – probably a whole pile of merges to consider, some of which aren't worth starting.
 * Anyway, there will have to be a preliminary sorting: some bluelinks on the EV listing will not go to the matching topic, because of ambiguities.


 * (B) Red Group
 * Suppose articles are simply created: will they be orphans?
 * Anyway, a significant proportion will be titles that can and should be redirected to WP articles. That means a pass finding redirects and making them, which then moves an EV topic into the Blue Group.
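
The first pass and the follow-up redirect pass can be sketched in a few lines. This is only an illustration, assuming the EV listing, the set of existing WP titles, and a hand-built redirect mapping are available as plain Python data. The function name `first_pass` and all sample titles are hypothetical.

```python
def first_pass(ev_titles, wp_titles, redirects=None):
    """Split EV titles into a Blue Group and a Red Group.

    ev_titles: iterable of titles from the EV listing
    wp_titles: set of titles that already exist on WP
    redirects: optional mapping {EV title: existing WP title}, found by hand
    """
    redirects = redirects or {}
    blue, red = [], []
    for title in sorted(ev_titles):
        if title in wp_titles:
            blue.append(title)
        elif redirects.get(title) in wp_titles:
            # Creating the redirect moves this EV topic into the Blue Group.
            blue.append(title)
        else:
            red.append(title)
    return blue, red


# Hypothetical sample data
ev = ["Abacus", "Zeugma (rhetoric)", "Counting frame", "Quipu"]
wp = {"Abacus", "Zeugma"}
blue, red = first_pass(ev, wp, redirects={"Counting frame": "Abacus"})
print("Blue Group:", blue)
print("Red Group:", red)
```

A title match is not a topic match: everything this sketch puts in the Blue Group still needs the preliminary sorting for ambiguity mentioned under (A).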

Practical management
Since EV may easily have thousands of articles, collaboration on the project will be required. That means that the various steps indicated just now should probably be going on simultaneously, and yet comprehensibly.

There really needs to be a master Red List, because the "headline figure" for the whole project is the percentage of redlinks from the EV listing still remaining. That raises the question of what happens to the bluelinks that are taken out of the Red List.
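
As a rough sketch of the "headline figure", assuming the full EV listing and the current Red List are held as collections of titles (the function name `redlink_percentage` and the sample titles are hypothetical):

```python
def redlink_percentage(red_list, ev_listing):
    """Percentage of titles on the EV listing that are still redlinks."""
    listing = set(ev_listing)
    if not listing:
        return 0.0
    # Only count Red List entries that really belong to the EV listing.
    remaining = len(set(red_list) & listing)
    return 100.0 * remaining / len(listing)


# Hypothetical sample data: 8 EV titles, 2 of them still red
ev_listing = ["A", "B", "C", "D", "E", "F", "G", "H"]
red_list = ["C", "G"]
figure = redlink_percentage(red_list, ev_listing)
print(f"{figure:.1f}% of EV titles are still redlinks")
```

The figure only falls when a title leaves the Red List, which is why it matters where the removed bluelinks go.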

It seems there should be at least two Blue Lists. (This is drawn from "good practice" at the Catholic Encyclopedia project.) There should be:


 * Blue List of Merges
 * This is a list of articles where the EV article is significantly fuller in some aspect than the existing WP article. Articles wait here until a decent merge is carried out.


 * Blue List of Dabs
 * This is a list of articles which have a bluelink as far as the EV title is concerned, but that link is misleading. This list is really a holding pen for the Red List.

In other words, there is a sophisticated maintenance cycle: a bluelink on the Red List should be


 * (i) removed totally if the WP article completely covers the EV article;
 * (ii) sent to the Blue List of Merges if the WP article overlaps the EV article;
 * (iii) sent to the Blue List of Dabs if the WP article is in fact on a disjoint topic.

A bluelink on the Blue List of Merges should be removed only after a good merge.
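
The cycle for cases (i) to (iii) can be written down as a small triage routine. This is a sketch under stated assumptions: the three lists are modelled as Python sets, the judgment about coverage is made by an editor and passed in, and the names `Coverage`, `triage` and `finish_merge` are hypothetical.

```python
from enum import Enum


class Coverage(Enum):
    COMPLETE = "complete"   # (i) WP completely covers the EV article
    OVERLAP = "overlap"     # (ii) WP overlaps the EV article
    DISJOINT = "disjoint"   # (iii) WP article is on a disjoint topic


def triage(title, coverage, red_list, merges, dabs):
    """Move a bluelink off the Red List according to cases (i)-(iii)."""
    red_list.discard(title)
    if coverage is Coverage.OVERLAP:
        merges.add(title)
    elif coverage is Coverage.DISJOINT:
        dabs.add(title)
    # Coverage.COMPLETE: nothing more to do, the title is removed totally.


def finish_merge(title, merges, is_good_merge):
    """Take a title off the Blue List of Merges only after a good merge."""
    if is_good_merge:
        merges.discard(title)


# Hypothetical sample data
red_list = {"Abacus", "John Smith", "Quipu", "Tally stick"}
merges, dabs = set(), set()

triage("Abacus", Coverage.COMPLETE, red_list, merges, dabs)
triage("Quipu", Coverage.OVERLAP, red_list, merges, dabs)
triage("John Smith", Coverage.DISJOINT, red_list, merges, dabs)
finish_merge("Quipu", merges, is_good_merge=False)  # not merged yet, stays listed

print(sorted(red_list), sorted(merges), sorted(dabs))
```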

A bluelink on the Blue List of Dabs is really a "virtual" redlink. When a dabbed title is chosen or set up, it could even go back to the Red List with suitable annotations. A template has been created to help with maintenance. If John Smith should be on the Blue List of Dabs, John Smith can be displayed in a main listing (rather than moving John Smith to a separate list). This is supplementary to the standard John Smith. In other words, tag in one way if the bluelink goes to a dab page and requires attention to check down the dab page for a match. Tag in the other way if the target of the bluelink is just plain wrong. This will all help document matters.

Not creating orphans
This is a pressing issue for the Red List workers. It can indicate the need for:


 * thorough searching of the site to find where an EV article would fit in;
 * the creation of "hooks to hang on", snippets added legitimately to existing WP articles containing wikilinks to titles of EV topics;
 * addition of lists, perhaps sourced from EV;
 * priorities set, to add survey-level articles from EV early on, thus setting up good "link farms" for EV topics.
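
Checking for orphans before creating articles can be sketched as follows. The sketch assumes a mapping from each existing page to the set of titles it links to has already been obtained somehow; the function name `find_orphans` and the sample pages are hypothetical.

```python
def find_orphans(new_titles, outgoing_links):
    """Return the new titles that no other page links to.

    new_titles: titles of EV topics about to be created
    outgoing_links: mapping {page title: set of titles it links to}
    """
    linked = set()
    for source, targets in outgoing_links.items():
        # A page linking to itself does not rescue it from being an orphan.
        linked.update(t for t in targets if t != source)
    return sorted(t for t in new_titles if t not in linked)


# Hypothetical sample data
links = {
    "History of counting": {"Abacus", "Quipu"},
    "Quipu": {"Quipu"},
}
orphans = find_orphans(["Quipu", "Tally stick"], links)
print(orphans)
```

Each title reported this way is a candidate for one of the remedies listed above, such as a "hook" in an existing article or an entry on a list.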

Anyway, what is the brief?
Once the early interest in adding much-needed articles has passed, what precisely is the aim of a merging project? Clearly such material as is added from EV to WP must conform to content policies. It is not usually put that way: the apparent objective is to turn redlinks blue on the working list. Starting from the simple statement of "missing articles" (that WP would like to include articles for all topics in EV), there is a more nuanced discussion to have. For example:


 * (A) We can treat inclusion in EV as prima facie evidence of notability of the topic, since it speaks to its having been noted as important in what is presumably a serious specialist work. But this isn't completely decisive. For example, EV may be an old work and the past reasons for being interested may have been perishable.
 * (B) If a topic has receded in importance in a field, over time, as is quite common in science, it may not be worth a separate article. Therefore, topics may certainly become redirects, and the appropriate step may be to create an article section on WP and set up a section-anchored redirect.
 * (C) Topics from EV may well be considered, in WP terms, as POV forks. As such, they are likely not suitable as article topics on WP. Indeed, if the EV article is written from a strongly-held POV, there may be little left when NPOV is applied. In that case, it is probably better only to use the EV article as a reference, explicitly attributing the views (if possible) to the article's author.
 * (D) In general terms, all material copied into WP should be edited as a preliminary. This is to clean up POV, wikify, format references and find better ones, restructure logical flow and so on. The idea is to end up with an article that conforms to content policies and the Manual of Style. It is also inevitable that there will be a need to update the article. In some cases the transformation will be so major that authoring a completely new article is also an option.

In the light of these arguments, what is left of the idea of merging? Well, in the non-prescriptive way of old-school wiki thinking, there is no need to lay down the precise scope of the merge. The discussion sets some boundaries for what is to be done with EV material on WP. The brief doesn't have to go into the project management issues.

Allied projects
Big merge projects don't come to a neat end, and they exist in a broader context.

Need for ratings
The inherent complexity of doing it right (maximising the value of material added, not just plundering the easier parts and forgetting about the last 20% of more exclusive material) means that such projects probably taper off, or change in nature.

It would be quite natural to see some sort of rating project develop, using article Talk pages to note matters relating to the standard of merge achieved. Older material should also be rated for how well it has been updated.

Project tracking
The biggest merge effort so far has been the EB1911 merge, and we can learn from it. One important lesson is that we have lost much of the history of the process, and it is difficult to determine if a current Wikipedia article has reached the completion of the merge process.

We need a set of tags that define the state of the article with respect to EB1911. The goal of the project should be to move all articles into the "final" merge state, where "final" means that an editor has determined that there is nothing more from the source article that can be usefully added and that the source is properly referenced.

However, there can be strong opposition from editors to the use of tags that are seen as soliciting content from old sources. Therefore, this "tag" might best be set up as a template that does not add any visible text.
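
A set of merge-state tags could be modelled along these lines. Only the "final" state comes from the discussion above; the other state names, the `MergeState` enum and the `outstanding` helper are hypothetical, chosen purely for illustration.

```python
from enum import Enum


class MergeState(Enum):
    UNREVIEWED = "unreviewed"  # hypothetical: nobody has compared the articles
    PARTIAL = "partial"        # hypothetical: some source material merged
    FINAL = "final"            # nothing more to add, source properly referenced


def outstanding(states):
    """Titles that have not yet reached the final merge state."""
    return sorted(t for t, s in states.items() if s is not MergeState.FINAL)


# Hypothetical sample data
states = {
    "Abacus": MergeState.FINAL,
    "Quipu": MergeState.PARTIAL,
    "Tally stick": MergeState.UNREVIEWED,
}
todo = outstanding(states)
print(f"{len(todo)} of {len(states)} articles still to finish: {todo}")
```

Recording the state per article, rather than relying on edit histories, is one way to avoid losing track of where the process stands.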

Preserving the source
Some older public-domain sources, such as the 1911 Encyclopædia Britannica and the Dictionary of National Biography, are available online but in disconnected fashion or at unreliable locations. In these cases it may be a good idea to create a project at Wikisource to capture the original sources of each article before merging the article into Wikipedia. The Wikipedia article can then reference the Wikisource article, and the Wikisource article's description can worry about all the ugly details of the provenance and description of the data capture. The projects s:1911 Encyclopædia Britannica and the s:Dictionary of National Biography, 1885-1900 are already started, but are not complete.

For any but the smallest sources, the Wikisource will be divided into articles. Points to note about the interplay of WP and WS here:


 * Scanned text will have OCR errors, and both sites will want to correct those. The criteria are different, though. For WP, copy-editing will possibly paraphrase the text and in any case can simply omit anything garbled, while WS will want to check against the original before claiming to represent it faithfully.
 * It is certainly desirable that the merge to WP should proceed in parallel with the WS effort. The trouble is that this will impose an overhead on the merge, and there is no positive way to "enforce" a synchronized approach. So it is a matter of tracking.
 * Feedback from WP workers to WS will be significant. There can quite often be issues with titling on WS, and those busy with the WP merge are some of the most likely people to notice. For that reason some project coordination should be set up, to provide a forum for communication.

Wiki merges
If it really were a question of merging a GFDL wiki into WP, there would be some question of checking how well the hypertext had carried across. This adds another level of generality, though.

Reminder
Wikipedia is a triumph of the approach of making the work comprehensible enough that volunteers are attracted to doing it. Whatever is set up for a merging project, it will not be helped if it is arcane rather than accessible.