User:Wbm1058/Analysis of merging processes on Wikipedia

Evolution of merging, analysis preliminary to merging Project Merge
The backlog at WikiProject Merge remains a tough nut to crack. One problem is that the WikiProject itself isn't setting a good example, as we have pages that themselves need to be merged. Merge-related discussions are ongoing at four venues:
 * Wikipedia talk:WikiProject Merge
   * WikiProject Merge was started 25 February 2012
 * Wikipedia talk:Merging
   * Merging began as a redirect to Duplicate articles at 08:32, 5 October 2004, and became a content page at 00:43, 10 June 2009
 * Help talk:Merging
   * Merge began as a redirect to Duplicate articles at 08:33, 5 October 2004, i.e., a redirect to this page, and became a content page 23 May 2005
   * At 21:00, 21 July 2005, Merge moved to Merging and moving pages (not just about mergers)
   * At 10:39, 30 May 2007, Merging and moving pages moved to Help:Merging and moving pages: how-to page
   * At 06:03, 10 June 2009, Help:Merging and moving pages moved to Help:Merging over redirect, per technical requested move
 * Wikipedia talk:Proposed mergers
   * Duplicate articles was started 31 July 2002, and at 21:02, 25 December 2005 moved to Proposed mergers: Per the talk page discussion, user:David Levy refactored this largely superseded page into something potentially useful

Take a look back to the very beginning, to see where things went off the rails. This is the first step towards getting the WikiProject back on track.

2002–03: Birth of merging, as a listing page for "Duplicate articles"
The concept of merging was born on Wikipedia on July 31, 2002. A new link, Duplicate articles (the original name of Wikipedia:Proposed mergers), was added to the To Do Lists section of Utilities, which was later moved to Topical index (diff). Note the many red links on that diff. At the time, apparently much of project namespace was still in mainspace. I believe these items were later moved to project space without leaving a redirect. The first version of Wikipedia:Duplicate articles had just a single merge item on it. You need to go back to Nostalgia Wikipedia to find the item's origins. Crimean war was created 13:13, 10 December 2001. User:Conversion script logged its phase II conversion edit at 15:51, 25 February 2002. The article was rewritten and expanded 09:38, 6 March 2002, just prior to the introduction of perhaps the first unintentional content fork, or at least the first reported one: Crimean War, at 15:51, 30 March 2002. This fork went unreported for four months, but then was merged by a different editor just an hour and a half after it was reported, without attribution – which is understandable since the only guidance given was Wikipedia:Naming conventions and Wikipedia:Canonization.

A couple weeks later, someone listed a merger item over at Pages needing attention, under Articles that have good information, but need work for some reason: (diff) – Black Metal and Black metal music. This was either half a cut-and-paste move or an intentional content fork (it's not clear which), as the creator of Black Metal had just edited Black metal music the day before. In any event, three days later Black Metal was redirected to Black metal music. Eventually Black metal music moved to Black metal.

At 09:48, 20 September 2002 and 09:58, 20 September 2002, Duplicate articles became a "subpage" of Pages needing attention, which in turn was a subpage of Wikipedia utilities.

By the end of September, after its first two months of operation, the list had grown to five items, when the first talk page discussion decided to remove merged items from the list rather than mark them as "DONE" (diff). But a few weeks later, another editor said "I would be in favor of making a list of fixed issues at the bottom of the article. Good for follow-up purposes. If the list grows too long, it can be moved to an archives page."

By the end of November, there were 13 items listed, and a short discussion had spontaneously developed around one of them.

At 16:59, 14 January 2003, the first suggestion to tag articles with "boilerplate text" is made, but there are no templates yet. There are 16 items on the list.

At 20:34, 25 January 2003, three items are spun off into a new Some other articles that were duplicates and still need consolidation: section. It's unclear what this means, but it seems that either some of these needed a history merge (db-histmerge), or an editor had just dumped text from one article at the bottom of another (diff), asking someone else to "please merge". As yet, it seems there is still no official guidance on how to merge.

At 15:15, 7 February 2003, Some other articles that were duplicates and still need consolidation was renamed Merged articles that need work. At the time the list had been emptied, so the items reported two weeks prior had been promptly tended to. An important insight was added: "The creation of duplicate articles, and the wasted effort this causes, can be avoided by creating lots of redirects." Also, when creating new articles, search for existing articles on the same subject.

At 05:30, 25 February 2003, a new item was added to the Merged articles that need work section. Plot and brief characterisation of Richard II was concatenated to (appended to the end of) Richard II (play), and needed "merging" or "to be edited". Three weeks passed before any copyediting of the appended content began. I have just put the copied template on the talk pages; that template wasn't created until 6 July 2009.

At 02:28, 27 April 2003, an editor removes an item from the list and initiates a discussion about it on the talk page. At 03:02, 27 April 2003, the item returned to the list, with edit summary "there is a dispute about merging. Consult (one of the articles' talk page) about merging.; Agree?" The Duplicate articles talk page won't be edited again until August. There are 29 Articles to be Merged and four Merged articles that would benefit from further editing.

At 13:16, 14 August 2003, an editor puts the list in alphabetical order. The list has grown to 54 Articles to be Merged and 10 Merged articles that would benefit from further editing. At 06:29, 20 October 2003, an editor comments on the talk page, "Actually I liked the previous order, time-based order. What happened??"

At 12:05, 16 September 2003, the list was grouped into alphabetical sections. There are 64 items in the list.

At 12:57, 17 September 2003, the Merged articles that would benefit from further editing section, now with 15 items, is moved and merged to Pages needing attention (diff).

On December 6, 2003, the MediaWiki namespace is introduced.

At year-end 2003, there are 95 items on the list. Note that a few of them seem to be "merged articles that would benefit from further editing", but now rather than appended in a special section at the end, these are co-mingled with yet-to-be merged articles, e.g.,
 * A (includes two versions) – as of 12:28, 27 December 2003
 * Domain name registry (article includes two versions) – large section moved from Registry article - needs merging
 * Hop (plant) with itself – Data from "hops" moved here; 1881 stuff not yet reduced; as of 12:43, 12 December 2003
 * Huldrych Zwingli (article includes two versions) – as of 7:49, 22 September 2003 (note the "Text to integrate from Schaff-Herzog Encyc of Religion" and, further down, "Text to integrate from the 1913 Catholic Encyclopedia")

2004: Birth of merge, and expansion of scope—it's not just for duplicates (content forks)
At 01:18, 29 February 2004, MediaWiki:Merge is created, and at 1:36, 29 February 2004, it replaced the boilerplate text. The implementation was clunky. Were parameters supported yet?

At 16:33, 3 March 2004, a talk-page suggestion to change policy: "... in the opening couple of paragraphs, I think we should distinguish another valid fix for duplicate pages: leaving the two pages as distinct, but rewriting the pages so they no longer duplicate information."

At 1:56, 9 March 2004, an editor "took a leap" and "Made the policy clearer and (hopefully) more friendly for newbies." There were 86 items on the list.

At 21:04, 12 May 2004, "See also: Special:Whatlinkshere/MediaWiki:Merge" is added to the instructions, perhaps in recognition that not all templated articles are added to the list.

At 4:49, 4 June 2004, the Template namespace initialisation script moved the message from the MediaWiki namespace to Merge, in the new Template namespace. At 2:35, 6 June 2004, an edit followed with the summary: "MediaWiki"->"template". Rm "msg:"

At 22:37, 25 June 2004, MergeDisputed became the second member of the merge template family. At 22:46, 25 June 2004, a new instruction is added: "If you disagree with a 'merge' indication then change the template from [merge] to [MergeDisputed] and discuss it on this page until concensus [sic] is reached."

At 02:12, 18 July 2004, Summary style is created as an alternative to writing articles in news style.

At 15:14, 31 July 2004, merge was modified to populate Category:Articles to be merged, and at 15:15, 31 July 2004, Category:Articles to be merged was created as a sub-category of Category:Wikipedia maintenance. At 11:44, 6 August 2004, a category link was added to Duplicate articles.

At 13:35, 5 August 2004, Mergefrom became the third member of the merge template family, and the second populating Category:Articles to be merged. At 13:43, 5 August 2004, the instructions were modified to include use of this new template. The template adds the hatnote: This article should include material from, which seems to leave the door open for copying (moving) a portion of the content while preserving both articles.

At 03:20, 16 August 2004, an editor refactored the talk page to group all procedure discussions to date into a single section.

At 17:41, 17 August 2004, an editor briefly adds the first parameter to merge, but quickly realizes this is incompatible with existing usage and opens discussion at template talk:Merge. At the time, there were 93 items on the list, and an unknown number of pages possibly transcluding one of the three merge templates, but not listed at Duplicate articles.

In late August 2004, a discussion about listing new candidates: Q: After inserting the merge tags in each article, do we need to add them to the list on the project page, or are they automagically added? In other words, how is the list updated? A: Articles need to be added to this page manually. Reply: Perhaps this should be noted in the "Mark current duplicates" section. (Done at 17:39, 28 August 2004.) Gosh, we have a bot for that. And the need for it was apparent back at the end of August '04.

At 11:00, 8 October 2004, the instructions were changed to say that those disagreeing with a merge indication could just remove it, rather than replace Merge with MergeDisputed, and pleaded with editors to "Consider using these tags sparingly, and use the discussion page to discuss how to merge articles where it's not obvious whether or how the article should be merged", after concerns were expressed on the talk page that merge tags were being overused. At the time, merge was just a simple hatnote saying This article should be merged with, and merge/doc did not exist—in fact, documentation for the merge template would not be created for another two years. So, while Duplicate articles leads with "Below is a list of duplicate articles that have been created mostly by mistake. They have to be merged into a single piece of work", an editor had tagged Federal Assault Weapons Ban, a subsection of the Violent Crime Control and Law Enforcement Act of 1994, to merge with Violent Crime Control and Law Enforcement Act. Comments were left on each talk page; paraphrasing: "It was suggested that these articles be merged. I deleted the merge tag from each article because I think the appropriate place to suggest merging articles is on the talk page, where the pros and cons can be discussed, rather than in the articles themselves. While some content is duplicated, it's not obvious to me whether or not the articles should be merged, and I take no strong view either way." As of 2014, Federal Assault Weapons Ban and Violent Crime Control and Law Enforcement Act are still separate articles.

At 03:34, 9 October 2004, Merging became the fourth member of the merge template family. At 21:00, 9 October 2004, the instructions were modified to include use of this new template, at 6:05, 10 October 2004, the Duplicate articles content was "clarified" a little, and at 06:13, 10 October 2004 Merging became the third template to populate Category:Articles to be merged.

An early October 2004 talk page conversation: Q: How long does it take before a pair that gets mentioned on this article can actually be merged?? Feel free to include what it can depend on. A: It can be merged anytime. It only depends on who is willing to put the effort to merge the articles involved, and that can take time.

2005–09
At 19:10, 3 April 2005, Content forking is created. While its initial focus was on POV forking, it evolved to include redundant content forks which were unintentionally created.

At 23:41, 4 December 2005, Merge-section is created: "The following section overlaps with other sections and should be merged with the rest of the article."

At 16:13, 29 December 2005, Merge-section was moved to Cleanup-merge.

At 02:27, 12 March 2006, Category:Articles in need of internal merging, populated by Cleanup-merge, was created as a sub-category of Category:Wikipedia cleanup categories.

At 18:00, 5 December 2006, Cleanup-merge was moved to Cleanup-combine.

At 17:35, 5 October 2009, Category:Duplicate articles, populated by Duplication, was created as a sub-category of Category:Wikipedia maintenance.