Wikipedia:Wikipedia Signpost/2022-02-27/By the numbers


 * Wikidata contains a huge amount of data. How can it be used to answer the questions that Wikipedia editors are interested in? This article shows one example. Of course, academics and other professional researchers may study questions similar to yours, but their results can be hard to find. It can take months or even years before a peer-reviewed paper is published, and the published article may be costly to read. Sometimes a do-it-yourself approach is faster and answers your specific question better than any available peer-reviewed paper. PAC2 originally published this article with the full dataset and graphics available. CC BY-SA 3.0

How does your birthplace affect your probability of being covered on Wikipedia? Having a Wikipedia page can be a sign of how successful you are in certain aspects of life. We know that the probability of being successful depends on your birthplace. So in this article I look at the probability of having a Wikipedia page depending on your birthplace.

While browsing Wikipedia in French, I was surprised by the number of people born in Neuilly-sur-Seine (Hauts-de-Seine) or in Paris, so I wanted to know whether people born in these places are over-represented. To find out, I looked at place-of-birth data for France. The national statistical institute (Insee) publishes the number of people born in each department ("département", i.e. district; France is divided into about 100 of them) from 1975 to the present. Unfortunately, no older data on births by department is available. This is a hard limit: our dataset covers only people born in 1975 or later, who are 47 years old or younger in 2022. Some people, of course, may become notable after this age. However, this is the best available French data for examining my question.

I collected data about the number of people with a page in Wikipedia in French born in each department using a SPARQL query from Wikidata. I also used data from the Code officiel géographique (the official list of French departments). All my data were collected using a Jupyter notebook written in the R language. The data set is stored in a CSV file.
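The query itself is not reproduced in this article. As a rough sketch of what such a query might look like, here is a Python version (the original notebook is in R). Everything in it is an assumption made for illustration: the choice of Wikidata properties (P19 place of birth, P569 date of birth, P131 located in, P2586 INSEE department code), the helper names `build_query` and `run_query`, and the User-Agent string. A query of this shape may also time out on the public endpoint and need to be split by department.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(first_year: int, last_year: int) -> str:
    """Build a SPARQL query counting people with a French Wikipedia page,
    grouped by the department of their place of birth.

    The property choices are assumptions, not the author's actual query.
    """
    return f"""
SELECT ?deptCode (COUNT(DISTINCT ?person) AS ?people) WHERE {{
  ?person wdt:P31 wd:Q5 ;          # instance of: human
          wdt:P19 ?place ;         # place of birth
          wdt:P569 ?birth .        # date of birth
  ?place wdt:P131* ?dept .         # walk up to the containing department
  ?dept wdt:P2586 ?deptCode .      # INSEE department code (assumed property)
  ?article schema:about ?person ;  # the person has a page on fr.wikipedia
           schema:isPartOf <https://fr.wikipedia.org/> .
  FILTER(YEAR(?birth) >= {first_year} && YEAR(?birth) <= {last_year})
}}
GROUP BY ?deptCode
ORDER BY ?deptCode
"""


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the JSON result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "birthplace-sketch/0.1 (hypothetical example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Only prints the query text; call run_query() yourself to hit the network.
    print(build_query(1975, 1990))
```

The result rows could then be written to a CSV file and joined with the Insee birth counts by department code.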

Comments and feedback are welcome on my Wikidata talk page!

Probability of having a Wikipedia page by department of birth
For each department, we divide the number of people born in the department between 1975 and 1990 who have a Wikipedia page by the total number of people born in the department over the same period. We express this probability "per mille" (per thousand, i.e. ‰).


 * People born in Paris between 1975 and 1990 have a 2.7‰ chance of having a French Wikipedia page.
 * People born in Pas-de-Calais between 1975 and 1990 have a 0.3‰ chance of having a French Wikipedia page.
 * People born in Creuse between 1975 and 1990 have a 0.2‰ chance of having a French Wikipedia page.
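The per mille computation is simple enough to sketch in a few lines of Python (the original notebook uses R). The department names and counts below are invented purely for illustration; they are not the Insee or Wikidata figures, and the function name `per_mille` is my own.

```python
def per_mille(people_with_page: int, total_births: int) -> float:
    """Share of births that led to a Wikipedia page, per thousand."""
    if total_births <= 0:
        raise ValueError("total_births must be positive")
    return 1000 * people_with_page / total_births


# Hypothetical counts: (people with a page, total births) per department.
counts = {
    "Department A": (27, 10_000),
    "Department B": (6, 20_000),
    "Department C": (1, 5_000),
}

# Compute the rate for each department, then sort from highest to lowest.
rates = {name: per_mille(pages, births) for name, (pages, births) in counts.items()}
ranking = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for name, rate in ranking:
    print(f"{name}: {rate:.1f}‰")
```

With the real data, `counts` would be built by joining the Wikidata counts and the Insee birth totals on the department code.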

Of course, interpretation is tricky. This may reflect a real inequality of opportunity to achieve success in the real world, or it could be an encyclopedic bias. However, unlike gender bias, it is difficult to imagine reasons why the encyclopedia would be biased towards certain departments. So it is probably inequality of opportunity between French departments that drives the results.

I was not surprised to find Paris in first place. The five departments that come after Paris are much more surprising: Hautes-Alpes, Pyrénées-Atlantiques, Hautes-Pyrénées, Pyrénées-Orientales and Alpes-Maritimes, all mountain departments. Further analysis shows a high concentration of rugby players in the South West departments such as Hautes-Pyrénées, Pyrénées-Atlantiques, Pyrénées-Orientales and Haute-Garonne, which might explain their ranking. People born in those departments would have a higher probability of having a Wikipedia page because they are more likely to become rugby players. In the Hautes-Alpes, we find a high concentration of ice hockey players. Further investigation is definitely needed to understand these geographical disparities.