User:Physchim62/ITN stats

This is an interim statistical analysis of stories posted on the In the news section of the Main Page of English Wikipedia in 2009. At present, the statistics only cover the first five months of 2009.

Dataset
The dataset is every story posted on ITN between 2009-01-01 00:00 (UTC) and 2009-05-31 23:59 (UTC). There were 256 stories posted during this period.

All normal stories
For the purposes of this analysis, a "normal" story is one which was removed from ITN to make room for newer items. Hence, it excludes:
 * stories which were removed "early" because of complaints or other procedural problems (13 stories);
 * April Fool's Day items (8 stories).
This leaves 235 "normal" stories.
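The count of normal stories can be checked with a minimal sketch in Python. The `removal` field and its labels (`displaced`, `early`, `april_fool`) are hypothetical names chosen for illustration and are not part of the original dataset; only the counts come from the text.

```python
def is_normal(story):
    """A story is 'normal' if it was removed only to make room for newer items."""
    return story["removal"] == "displaced"

# Placeholder records built from the counts quoted in the text.
# The field name and labels are assumptions for illustration.
stories = (
    [{"removal": "displaced"}] * 235
    + [{"removal": "early"}] * 13
    + [{"removal": "april_fool"}] * 8
)

normal_stories = [s for s in stories if is_normal(s)]

print(len(stories), len(normal_stories))  # 256 235
```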

Time on the Main Page
I have the raw data for this, but I haven't finished analysing it yet.

Viewing figures
The main statistic for viewing figures is the maximum daily viewing figure achieved by the article linked by the bolded link in the ITN story.

For individual articles, this statistic is subject to a number of systematic and semisystematic biases which I shall discuss below when I get round to it. These biases do not prevent its use for finding median viewing figures and similar statistics.
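As a rough sketch of how the statistic and its median could be computed in Python: the article titles and daily counts below are invented, and the helper name `peak_daily_views` is hypothetical. This is not the procedure actually used to produce the figures on this page.

```python
from statistics import median

def peak_daily_views(daily_views):
    """Maximum daily viewing figure over the days supplied."""
    return max(daily_views)

# Invented daily page-view counts for three hypothetical bolded articles.
daily_views_by_article = {
    "Article A": [120, 45_000, 30_000, 900],
    "Article B": [80, 1_900, 1_200],
    "Article C": [300, 210_000, 150_000, 40_000],
}

# One peak figure per article, then the median across articles.
peaks = {title: peak_daily_views(views)
         for title, views in daily_views_by_article.items()}
median_peak = median(peaks.values())

print(peaks)
print(median_peak)
```

The median is used here rather than the mean because a single very large peak (such as Article C in the invented data) would otherwise dominate the summary.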

As stories are usually on the Main Page for two to three days, the maximum daily viewing figure will systematically underestimate the total number of page views. No correction has been made for this effect, which is assumed to be proportionally similar for all articles.

No baseline correction has been made because, for ITN stories, baseline viewing figures are almost always far lower (by at least two orders of magnitude) than peak viewing figures while the article is featured on the Main Page.
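To illustrate why leaving out a baseline correction changes little, here is a small worked example in Python. The figures are invented and the function name `baseline_share` is hypothetical; the only thing taken from the text is the assumption that the baseline is at least two orders of magnitude below the peak.

```python
def baseline_share(peak_views, baseline_views):
    """Fraction of the peak daily figure attributable to baseline traffic."""
    return baseline_views / peak_views

# Invented figures: a baseline exactly two orders of magnitude below the peak,
# i.e. the largest baseline the stated assumption allows.
peak_views = 45_000
baseline_views = 450

share = baseline_share(peak_views, baseline_views)
corrected_peak = peak_views - baseline_views

print(f"baseline share: {share:.1%}, corrected peak: {corrected_peak}")
```

In this limiting case, subtracting the baseline would lower the peak figure by 1%; any wider gap between baseline and peak makes the correction smaller still.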

The highest peak viewing figure was for swine influenza, with 1.1M page views; the lowest peak viewing figure was for Slovak presidential election, 2009, with 1.9k page views.

Procedural aspects
In the news has specific criteria for several types of story:
 * recurring events (sports events, elections, awards, space launches and meteor showers)
 * obituaries
 * April Fool's Day stories
All other stories have been classified as "standard discussions".

The list of recurring events changed considerably during 2009. A story has been listed as a recurring event if:
 * it was listed on In the news/Recurring items at the time the story was posted; or
 * the event was added to the list as a result of the story being posted.

For obituaries, articles have only been classified as obituaries if the death of the person was the only news story. Hence, stories featuring people who died during other newsworthy events (e.g. Velupillai Prabhakaran, leader of the Sri Lankan Tamil Tigers) have been classified as "standard discussions".
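The procedural rules above can be sketched as a small classifier in Python. All field names (`april_fool`, `on_recurring_list`, `added_to_recurring_list`, `death_is_only_news`) are hypothetical, and the order in which the rules are tested is an assumption: the text does not say which category wins when more than one could apply.

```python
def classify_procedure(story):
    """Assign a procedural category to a story record (a dict of flags)."""
    if story.get("april_fool"):
        return "April Fool's Day"
    # Recurring: on the list when posted, or added to it because of this story.
    if story.get("on_recurring_list") or story.get("added_to_recurring_list"):
        return "recurring event"
    # Obituary only if the death itself was the sole news story.
    if story.get("death_is_only_news"):
        return "obituary"
    return "standard discussion"

# Invented example records.
examples = [
    {"on_recurring_list": True},
    {"added_to_recurring_list": True},
    {"death_is_only_news": True},
    {"death_is_only_news": False},  # died during another newsworthy event
    {"april_fool": True},
]

for story in examples:
    print(classify_procedure(story))
```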

Subject matter
An attempt was made to classify each story into one of a limited number of subject areas. The choice of subject area is, by nature, somewhat subjective, but this should not overly affect the validity of the medians. To give just one example, different editors might have different dividing lines between "War", "Terrorism" and "Crime".

For the "Other disasters & crime" category, almost all the stories involved homicide or accidental death. "Other disasters" means disasters other than war, terrorism or natural disasters.

"Science & technology" includes medicine, as well as space launches etc.

Regional distribution
An attempt was made to assign a country to each story, based on the ISO 3166-1 alpha-3 classification. This proved unsatisfactory for a number of reasons, particularly the large number of countries which feature in ITN stories, which makes statistical analysis unreliable. Instead, the stories (in practice, the countries) were classified into regions, based on the common news regions used by international news providers such as BBC News or Al-Jazeera. Even then, some modifications had to be made to cover the variety of ITN stories.
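The step from countries to regions can be sketched as a lookup table in Python. The country codes are genuine ISO 3166-1 alpha-3 codes, but the region names and the assignment of countries to them are illustrative assumptions, not the scheme actually used for this analysis.

```python
from collections import Counter

# Illustrative mapping only: the region labels are assumptions.
REGION_BY_COUNTRY = {
    "GBR": "Europe",
    "SVK": "Europe",
    "USA": "Americas",
    "LKA": "South Asia",
}

def region_for(country_code):
    """Look up the region for a country code, flagging unmapped codes."""
    return REGION_BY_COUNTRY.get(country_code, "Unclassified")

# Invented list of per-story country codes.
story_countries = ["GBR", "USA", "SVK", "LKA", "USA", "XXX"]

stories_per_region = Counter(region_for(code) for code in story_countries)

print(stories_per_region)
```

Grouping in this way trades geographic detail for larger counts per category, which is what makes medians per region usable when most individual countries appear only once or twice.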

The choice of country, or even region, is, by nature, somewhat subjective.