User:Kokiri/WQA2

Two years since my last attempt (WQA 1) to assess the quality of the Pedia, here comes an update. On the one hand, I have included the same measures as last time, allowing for a comparison where possible. On the other hand, I have included a few extra measures. Please feel free to edit this page if you feel the presentation could be improved.

The other quality assessment I am aware of was by Adam Carr, carried out in October 2003: English Wikipedia Quality Survey.

I have sampled 275 articles on 18 December 2005, using the random article function. This means, in contrast to WQA 1, WQA 2 uses a representative random sample with a confidence level of 90% and a margin of error of 5%. I have not assessed the quality of the articles themselves, something Nature have recently done with their non-representative sample. I have also not assessed the level of plagiarism that I expect the Pedia to be guilty of.
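As a sanity check on the sample size, the usual formula for estimating a proportion (worst case p = 0.5) gives the required N for a given confidence level and margin of error. The sketch below is illustrative and not part of the original analysis:

```python
from statistics import NormalDist

def sample_size(confidence, margin, p=0.5):
    """Required sample size for estimating a proportion.

    Uses the worst case p = 0.5 unless told otherwise."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-value
    return z * z * p * (1 - p) / (margin * margin)

# 90% confidence level, 5% margin of error:
print(round(sample_size(0.90, 0.05)))  # 271
```

With N = 275 drawn, the sample indeed exceeds the roughly 271 entries this formula requires.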

Definitions
entry: what is commonly referred to as an article on the Pedia, regardless of its length or quality

article: a proper entry in the Pedia, not a stub or any other category listed here; articles can be quite short according to the criteria I applied

bot: a bot-generated article; in practice this refers to Rambot articles

fragment: an entry with the structure of an article, but mostly consisting of titles and other non-content (called gaps in WQA 1)

list: an entry that is just a list of links

disambiguation: disambiguation page

stub: a stub

spam: I came across one entry that was spam (which I flagged for speedy deletion)

Overview: Crude statistics
I guess that most people will be interested in the following table. Bear in mind the margin of error (5%) and the confidence level (90%) throughout the analysis.

This means that only 34.2% of all Wikipedia entries are articles of some sort.

Compared to WQA 1 two years ago
Two years ago, in my first assessment I came up with the following numbers:

Bear in mind the smaller sample (N=50) back then.

Frequencies of article features
What follows is a bunch of frequency tables on some of the variables in WQA 2. These are features that any article and stub can have. A low percentage of entries with a certain feature does not necessarily indicate poor quality: not all features are equally desirable in all articles.

Categories
Two years ago, we did not have categories. Now almost every article and stub is categorized in some way.

Formulae
A rare sighting.

Maps
Not very common, but then again, not every article should have one. Still, many place articles do not have a map. I counted 29 entries on places, and 13 maps...

Pictures and Illustrations
Most articles still come without illustration. I have not come across any animations or videos, something other encyclopaedias brag about... I know they exist, but the fact that they do not show up in this assessment suggests that there are not many of them.

Tables
As with illustrations, not very common. There is, of course, the question whether tables should be used in some entries at all: they are not always a useful way to summarize information.

Length
Again, like two years ago, I have used a rather convenient measure of entry length: the number of screens an entry fills.

Britannica comparison
As last time, I have checked whether the entry was also in Britannica. Last time I used the 2002 DVD version of Britannica; this time I used search.eb.com, so in Britannica also covers the Britannica Student Encyclopaedia. Not in Britannica means that there were no matches; In Britannica means that there is an article with the same or an equivalent title; Within Wider Britannica Article means that the topic is treated within a Britannica article that covers a larger topic. This is possibly my pet peeve: many stubs have little potential to grow because they would be better dealt with in a more general article (see WQA 1).

Areas covered
Here are two tables on the areas the entries cover. First I included a separate category for persons; then I split this category among the others. So, in the first table, a politician would be counted as a person; in the second, she or he would be found under politics. The areas are to a large extent suggested by the articles themselves and do not follow an existing categorization. This must be borne in mind when considering the systematic bias of the Pedia.

Predicting what is an article
The following table is the result of a regression analysis, trying to predict what makes an article (as opposed to a stub, fragment, or the like). Hits in Google and Scirus are insignificant predictors, meaning that some articles have many Google hits, others just a few. If the topic occurs in Britannica, it is 1.5 times as likely to be an article as a topic that does not. This, however, is not statistically significant once the number of incoming links is considered. The number of links to an entry is the single most powerful predictor of whether an entry will be an article or anything less: every incoming link increases the odds of being an article by 8%.
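Since these are logistic regressions, the quoted effects are odds ratios, which compound multiplicatively rather than add up. A minimal sketch of the interpretation (the coefficients are back-derived from the effect sizes quoted in the text, not re-estimated from the data):

```python
import math

# Illustrative logit coefficients, reconstructed from the quoted effects:
b_links = math.log(1.08)       # +8% odds of being an article per incoming link
b_britannica = math.log(1.5)   # topic also in Britannica: 1.5x the odds

def odds_ratio(b):
    """Convert a logit coefficient back to an odds ratio."""
    return math.exp(b)

# Ten extra incoming links multiply the odds by 1.08**10, not by 1.80:
print(round(odds_ratio(b_links) ** 10, 2))  # 2.16
```

So an entry with ten more incoming links than another has roughly twice the odds of being a full article, all else being equal.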

I have also run the regression splitting the different kinds of Britannica entries (equivalent article, or covered within another article). Again, the number of Google hits is irrelevant, and the number of incoming links is a very good predictor. Moreover, we can see why the distinction between the kinds of Britannica entry is important: a Wikipedia entry with an equivalent article in Britannica is 2.3 times as likely to be an article as an entry without one. Wikipedia entries whose topic is covered within a wider Britannica article do not fare significantly better than entries with no match in Britannica at all. I take this as a sign that we have too many entries that have no potential to grow...

Predicting what is a stub
Here are the results of a regression analysis that predicts whether an entry is a stub as opposed to anything else. We find the mirror image here. Again, hits in Google and Scirus are insignificant predictors. The number of incoming links is significant: for every extra incoming link, the odds of being a stub drop by 8%. If there is an entry in Britannica, the odds of being a stub drop by 34.7%.

Predicting article length
This table summarizes the prediction of article length. All these predictors are statistically significant (.1 level), with the exception of the number of hits in Scirus. An entry which can also be found in Britannica is expected to be about half a screen longer than one that cannot. The effects of hits in the search engines are significant but very small: it takes about 1 million Google hits to increase the article length by half a screen (0.000000492 screens for every Google hit), or about 5000 Scirus Journal hits for the same effect (0.0000935 screens for every Scirus Journal hit).
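The quoted per-hit coefficients can be checked against the half-screen summaries by simple multiplication:

```python
# Sanity-check the quoted marginal effects on length (unit: screens).
per_google_hit = 0.000000492   # screens per Google hit (from the regression)
per_scirus_hit = 0.0000935     # screens per Scirus Journal hit

print(round(per_google_hit * 1_000_000, 2))  # 0.49 screens per million Google hits
print(round(per_scirus_hit * 5_000, 2))      # 0.47 screens per 5000 Scirus hits
```

Both come out at roughly half a screen, consistent with the summary above.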

Features in longer articles
Longer articles tend to have more of the features measured. All correlations are positive and significant at the .01 level. The number of incoming links once again shows up as the strongest effect.

The data
Feel free to make use of the data for your own analyses: Data. Bear in mind the sampling limits (confidence level and margin of error) outlined in the introduction.