User talk:Jarry1250/Findings

Suspicious number
The total for Special Groupings is 65535 (2^16 - 1). Is this a coincidence or is there a systematic reason? OrangeDog (talk • edits) 01:29, 12 May 2009 (UTC)
 * Yes, there is. I will clarify that on the main page (I see what you mean), but also here: that figure is the number of articles in the whole sample (included to give perspective for the other numbers, which are counts within the sample, not the population), not a sum of the other numbers in that table. 65535 was picked, somewhat unwittingly, as the sample size because it's the maximum number of rows Microsoft Excel can handle without help. - Jarry1250 (t, c) 08:20, 12 May 2009 (UTC)
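Two points from the reply above can be made concrete. First, 65535 = 2^16 − 1; pre-2007 Excel worksheets (.xls) have 65,536 rows, so one plausible reading (an assumption, not stated above) is a header row plus 65,535 data rows. Second, because the table's figures are sample counts, a population figure has to be scaled up. This is a minimal sketch assuming a simple random sample; the helper name and the population size used in the example are hypothetical.

```python
# Sample size used in the findings: 2**16 - 1.
SAMPLE_SIZE = 2**16 - 1  # 65535

def estimate_population_count(sample_count, sample_size, population_size):
    """Scale a count observed in a simple random sample to the whole population.

    Hypothetical helper: assumes every article had an equal chance of
    being sampled, so the sample proportion estimates the population one.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < sample_size <= population_size")
    return sample_count * population_size / sample_size

# Hypothetical figures: 650 matching articles in the sample, and a
# population exactly 40 times the sample size.
print(estimate_population_count(650, SAMPLE_SIZE, SAMPLE_SIZE * 40))  # 26000.0
```

A point estimate like this says nothing about its uncertainty; for small sample counts the error can be large.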

Preemptive disambiguation
One other detail that would be useful to know is how often an article is named "Title (qualifier)" even when "Title" does not (yet) need disambiguation.


 * Method

According to your method, you "discarded" the list of all names. If you still have it, sort by name and discard every title that genuinely needs disambiguation, i.e. where the text outside the brackets appears at least twice. This will leave the possibly preemptive disambiguation pages. These would then need to be checked against all names, including those without brackets (and, of course, those ending in "(disambiguation)").
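The filtering described above can be sketched in Python. This is a minimal sketch, not Jarry1250's or Mark Hurd's actual code: it assumes titles use spaces and that a disambiguating qualifier is a single trailing bracketed group. The helper names and the sample titles are hypothetical.

```python
import re
from collections import Counter

# Assumption: a qualifier is one trailing " (...)" group, e.g. "Foo (band)".
QUALIFIED = re.compile(r"^(?P<base>.+?) \((?P<qualifier>[^()]+)\)$")

def split_title(title):
    """Split "Foo (band)" into ("Foo", "band"); bare titles give (title, None)."""
    m = QUALIFIED.match(title)
    if m:
        return m.group("base"), m.group("qualifier")
    return title, None

def preemptive_candidates(titles):
    """Return bracketed titles whose base text appears in no other title."""
    # Bare titles, bracketed titles and "(disambiguation)" pages are all
    # counted under their shared base, so any sibling rules a title out.
    base_counts = Counter(split_title(t)[0] for t in titles)
    candidates = []
    for t in titles:
        base, qualifier = split_title(t)
        if qualifier is not None and base_counts[base] == 1:
            candidates.append(t)
    return sorted(candidates)

titles = [
    "Foo (band)", "Foo (film)", "Foo (disambiguation)",
    "Bar", "Bar (river)",
    "Baz (album)",
]
print(preemptive_candidates(titles))  # ['Baz (album)']
```

Counting a "(disambiguation)" page under its base is what makes the final check against all names automatic: "Foo (band)" is excluded because "Foo (film)" and "Foo (disambiguation)" share its base, and "Bar (river)" because the bare "Bar" exists.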

Rather than leave it to you, I downloaded the latest list and did it myself. I get 6437566 pages, 498694 disamb pages (excluding one original page, whether it has 's or not), and 97122 preemptively disambiguated pages.

You may still want to do it for your snapshot as well, just so it corresponds to the rest of your data.

After downloading and extracting enwiki-latest-all-titles-in-ns0.txt, I:
 * executed DOS SORT to ensure it was ordered
 * then ran this in DotLisp:
Mark Hurd (talk) 02:43, 14 May 2009 (UTC)
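The following is not Mark's DotLisp code but a minimal Python sketch of the same sort-then-scan idea. It assumes the titles file lists one title per line with underscores in place of spaces (e.g. `Foo_(band)`); the exact file format may differ between dumps. A plain alphabetical sort can interleave other titles between, say, `Foo` and `Foo_(band)`, so this sketch sorts on the text outside the brackets instead, which lets a single `itertools.groupby` pass see every group whole. The function names and sample data are hypothetical, and the counts it returns are not claimed to reproduce the figures above.

```python
import itertools
import re

# Assumption: underscores in place of spaces, and a disambiguating
# qualifier is a single trailing bracketed group, e.g. "Foo_(band)".
QUALIFIED = re.compile(r"^(.+?)_\(([^()]+)\)$")

def base_of(title):
    """Text outside the trailing brackets, or the whole title if there are none."""
    m = QUALIFIED.match(title)
    return m.group(1) if m else title

def count_titles(lines):
    """Return (total, bracketed, preemptive) counts for an iterable of title lines."""
    titles = [line.rstrip("\r\n") for line in lines if line.strip()]
    # Sorting by base text puts every title that shares a base next to
    # each other, so one groupby pass sees each group whole.
    titles.sort(key=base_of)
    total = bracketed = preemptive = 0
    for _, group in itertools.groupby(titles, key=base_of):
        members = list(group)
        qualified = [t for t in members if QUALIFIED.match(t)]
        total += len(members)
        bracketed += len(qualified)
        # A bracketed title with no sibling (bare, bracketed or
        # "_(disambiguation)") is a possible preemptive disambiguation.
        if len(members) == 1 and qualified:
            preemptive += 1
    return total, bracketed, preemptive

# Hypothetical sample; a real run would pass an open file object instead.
sample = ["Foo_(band)\n", "Bar\n", "Foo\n", "Baz_(album)\n", "Foo_(film)\n"]
print(count_titles(sample))  # (5, 3, 1)
```

On a full dump this holds every title in memory; that was fine for a one-off count, but an external sort keyed on the base text would avoid it.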

(disambiguation)
Note that a lot of redirects to disambiguation pages were created in this time frame. Rich Farmbrough, 09:29 14 May 2009 (UTC).
 * All redirects were ignored, so I don't think that would have affected anything, unless you think it has? I'm only human; it's possible I made a mistake somewhere. - Jarry1250 (t, c) 11:06, 14 May 2009 (UTC)