Wikipedia:WikiProject Women in Red/Metrics/Wikidata

WikiProject Women in Red's article creation metrics, and the article gender-gap statistics provided by Denelezh's tool and by WHGI, all depend on Wikidata having a record (termed an 'item') which has a sitelink to the en.wikipedia article, and on that item specifying that its subject is a human with female gender. Accurate gender-gap percentages depend on such items existing for all male, as well as all female, biography articles.
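The conditions above can be sketched as a small Python check. This is only an illustration, not the method used by any of the statistics tools: it assumes the usual Wikidata entity JSON layout (the inner entity object, for example as served by Special:EntityData), and it only tests for Q6581072 ("female"), whereas the real tools may treat further gender values as female. The function names and the example item are hypothetical.

```python
HUMAN = "Q5"          # Wikidata item for "human"
FEMALE = "Q6581072"   # Wikidata item for "female"


def claim_ids(entity, prop):
    """Return the item IDs used as values of `prop` on a Wikidata entity."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        # "unknown value" / "no value" statements carry no datavalue
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def counts_as_female_biography(entity, site="enwiki"):
    """True if the item has an en.wikipedia sitelink, P31=Q5 and P21=female."""
    has_sitelink = site in entity.get("sitelinks", {})
    is_human = HUMAN in claim_ids(entity, "P31")
    is_female = FEMALE in claim_ids(entity, "P21")
    return has_sitelink and is_human and is_female


# Hypothetical, hand-written item used only to exercise the check.
example = {
    "sitelinks": {"enwiki": {"site": "enwiki", "title": "Example Person"}},
    "claims": {
        "P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}],
        "P21": [{"mainsnak": {"datavalue": {"value": {"id": "Q6581072"}}}}],
    },
}
print(counts_as_female_biography(example))
```

An item that fails any one of the three tests (no sitelink, no P31=Q5, or no female P21) is invisible to the female article counts, which is why the reports on this page look for each gap separately.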

There is no automatic process by which wikidata items are created to match the creation of wikipedia articles. Wikidata items are added by hand, or by individuals running reports and bots which lead to item creation and population.

This page provides links to a number of Petscan and Listeria reports that can be used to identify and remedy missing and incomplete wikidata records. The reports list wikipedia articles that lack wikidata items, or wikidata items that lack a gender property or lack any properties at all.

To use all the facilities of Petscan, it may be useful to authorise WiDaR to allow Petscan to make edits, under your control, on your behalf. It is also highly recommended that you add the Wikidata framework to your Preferences / Appearance / Shared CSS/JavaScript for all skins: / Custom JavaScript page. The Framework provides new left-side menu options enabling the creation and editing of wikidata pages from wikipedia articles.

All of the reports below take a considerable time to run. Note that Petscan (or the toolserver on which it runs) often fails, either hanging (it just does not produce a result) or else returning an error "502 - Bad Gateway" ... the solution is to run the report again - sometimes repeatedly - or run the report later.
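For anyone scripting report runs rather than clicking the links, the "run it again, sometimes repeatedly" advice can be sketched as a retry loop. This is a minimal sketch under assumptions: `run_report` is a hypothetical callable that fetches one report and raises `OSError` (which covers Python's timeout and `urllib` HTTP errors) when Petscan hangs or returns a 502; the number of attempts and the delays are arbitrary.

```python
import time


def run_with_retries(run_report, attempts=5, first_delay=30.0, sleep=time.sleep):
    """Call run_report() until it succeeds or `attempts` is exhausted.

    run_report is a hypothetical zero-argument callable that returns the
    report result, or raises OSError on a timeout or a "502 - Bad Gateway".
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return run_report()
        except OSError:
            if attempt == attempts:
                raise  # give up for now; run the report later
            sleep(delay)
            delay *= 2  # wait longer before each further attempt
```

Doubling the wait between attempts is a design choice intended to avoid hammering a tool that is already struggling; a fixed delay would work too.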

Petscan works by looking at articles in categories and sub-categories: the deeper you ask Petscan to look, the longer Petscan will take to do its job. The deeper down the category:Women tree you descend, the more male & non-biography articles you find, because the Wikipedia category tree is somewhat idiosyncratic.
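The effect of depth can be illustrated with a depth-limited walk over a toy category tree. This is not Petscan's implementation, only a sketch of the idea; the category and article names below are invented for the example.

```python
from collections import deque


def articles_within_depth(root, depth, subcats, articles):
    """Collect articles in `root` and its sub-categories down to `depth` levels.

    subcats and articles are dicts mapping a category name to its
    sub-categories and to its member articles respectively.
    """
    seen = {root}
    found = set(articles.get(root, []))
    queue = deque([(root, 0)])
    while queue:
        category, level = queue.popleft()
        if level == depth:
            continue  # do not descend past the requested depth
        for child in subcats.get(category, []):
            if child not in seen:
                seen.add(child)
                found.update(articles.get(child, []))
                queue.append((child, level + 1))
    return found


# Invented toy data: the deeper category holds a non-biography article.
subcats = {"Women": ["Women writers"], "Women writers": ["Works by women writers"]}
articles = {"Women writers": ["Writer A"], "Works by women writers": ["Novel B"]}

print(sorted(articles_within_depth("Women", 1, subcats, articles)))
print(sorted(articles_within_depth("Women", 2, subcats, articles)))
```

At depth 1 only the biography is returned; at depth 2 the non-biography appears as well, which mirrors what happens further down the real category:Women tree.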

High priority
Probably the highest-priority reports of all those below are the following, which track new articles without wikidata items.


 * Articles with no wikidata item:
 * from category:Living people to 0 levels of depth - manually run - auto-run
 * from category:People stubs to 6 levels of depth - manually run - auto-run
 * from category:Births by decade to 3 levels of depth - manually run - auto-run
 * from category:Deaths by century to 4 levels of depth - manually run - auto-run
 * from category:Women to 5 levels of depth - manually run - auto-run

Articles with wikidata items having no properties

 * Articles from category:Sportspeople
 * to 6 levels of depth - manually run - auto-run


 * Articles from category:Men
 * to 3 levels of depth - manually run - auto-run


 * Articles from category:People by occupation
 * Alpha ordered to 3 levels of depth - manually run - auto-run
 * Page ID ordered (~ in date or creation order) to 3 levels of depth - manually run - auto-run


 * Articles from category:People by nationality
 * Alpha ordered to 3 levels of depth - manually run - auto-run
 * Page ID ordered (~ in date or creation order) to 3 levels of depth - manually run - auto-run


 * Articles from category:Women
 * Alpha ordered to 5 levels of depth - manually run - auto-run
 * Page ID ordered (~ in date or creation order) to 5 levels of depth - manually run - auto-run


 * Articles from category:Living people
 * Alpha ordered to 0 levels of depth - manually run - auto-run


 * Articles from category:Births by decade
 * Alpha ordered to 3 levels of depth - manually run - auto-run


 * Articles from category:Deaths by century
 * Alpha ordered to 4 levels of depth - manually run - auto-run


 * Articles from category:Alumni by educational institution
 * Date of creation ordered to 9 levels of depth - manually run - auto-run


 * Articles from category:Faculty by university or college
 * Date of creation ordered to 6 levels of depth - manually run - auto-run


 * Articles from category:People by company
 * Date of creation ordered to 7 levels of depth - manually run - auto-run

Articles with wikidata items having no P31 nor P279 property
A top tip for navigating these lists is to produce a list of all articles starting with a particular letter. To do this, select one of the manual run options, go to the Output tab and, in the Regexp filter, enter the following: ^A.*$ ... which means "give me articles that start with the capital letter A". Adjust for whichever letter of the alphabet you choose.
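To see what the ^A.*$ filter does, here is a short Python illustration. It uses Python's own `re` module, which is only assumed to behave like Petscan's filter for a simple pattern such as this one; the helper name and the sample titles are made up for the example.

```python
import re


def starts_with(letter):
    """Build the '^A.*$' style pattern for any starting letter."""
    return re.compile("^" + re.escape(letter) + ".*$")


pattern = starts_with("A")
titles = ["Ada Lovelace", "Grace Hopper", "anna example", "Annie Easley"]

# Keep only titles beginning with a capital A; the match is case-sensitive.
matching = [title for title in titles if pattern.match(title)]
print(matching)  # ['Ada Lovelace', 'Annie Easley']
```

The ^ anchors the match to the start of the title, A is the literal first letter, and .*$ accepts whatever follows up to the end of the title.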

Wikidata property P31 is used to code an item as human (P31=Q5). (Property P279 codes an item as a subclass of something.) All items should have a P31 or a P279, and must have P31=Q5 to be counted in our statistics. It's a fair bet that a lack of P31 implies a lack of property P21, which is used to code for gender. Most of these reports will list items that are not biographies, but there are biographies to be found, which need both P31 and P21 added. And to the extent you can be bothered, coding non-biographies with an appropriate P31 or P279 helps to clear the wood so that we can see the trees.
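The way the P31, P279 and P21 rules sort items into the report groups on this page can be sketched as follows. This is an illustration only: it assumes each item has already been reduced to a simple mapping of property to values, and the function name and bucket labels are hypothetical.

```python
def triage(claims):
    """Name the report group an item falls into, given {property: [values]}."""
    if not claims:
        return "no properties"
    if "P31" not in claims and "P279" not in claims:
        return "no P31 nor P279"
    if "Q5" in claims.get("P31", []):
        if "P21" not in claims:
            return "human, no gender code"
        return "human with gender code"
    return "coded as something other than human"


# Hypothetical items, written as {property: [values]}.
print(triage({}))
print(triage({"P569": ["1950"]}))
print(triage({"P31": ["Q5"]}))
print(triage({"P31": ["Q5"], "P21": ["Q6581072"]}))
```

Only items in the last of those four example states can be counted in the statistics; the first three each correspond to one of the report sections on this page.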


 * Articles from category:Sportspeople
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run
 * to 6 levels of depth - manually run - auto-run


 * Articles from category:Women - will return 10,000 articles, the vast majority of which will not be biographies.


 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:People by nationality


 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:People by occupation


 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:Stub categories
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:Living people


 * to 0 levels of depth - manually run - auto-run - mainly done


 * Articles from category:Births by decade
 * to 3 levels of depth, alpha order - manually run - auto-run


 * Articles from category:Deaths by century
 * to 3 levels of depth, alpha order - manually run - auto-run


 * Wikidata MWAPI search for items with no P31 nor P21
 * search - amend the query to search for specific given or family names

Articles with no wikidata item
Most of the articles listed have names that match the label of one or more existing items, so the challenge is to check whether an existing item matches the article (in which case add a sitelink to it, and check that it has a P21 gender code) and, if not, to create a new item.


 * Articles from category:Living people
 * to 0 levels of depth - manually run - auto-run


 * Articles from category:People stubs
 * to 6 levels of depth - manually run - auto-run


 * Articles from category:Women
 * to 5 levels of depth - manually run - auto-run
 * to 6 levels of depth - manually run - auto-run
 * to 7 levels of depth - manually run - auto-run


 * Articles from category:People by occupation
 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:People by nationality
 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:Births by decade
 * to 3 levels of depth - article creation order - manually run - auto-run
 * to 3 levels of depth - alpha order - manually run - auto-run


 * Articles from category:Deaths by century
 * to 4 levels of depth - article creation order - manually run - auto-run
 * to 4 levels of depth - alpha order - manually run - auto-run


 * Articles from Special pages
 * Articles not connected to items - a contemporaneous listing of articles with no wikidata items, most recent articles listed first. Useful for spotting women biogs as they arrive, such that a wikidata item can be added.


 * Articles from the Duplicity tool
 * Articles not connected to items, of all sorts, greater than 2 weeks old, listed in order of age (oldest first) and (fwiw) stats on number of en.wiki articles with no wikidata item; provided by Magnus Manske's Duplicity tool, updated once daily.

Articles with wikidata items coded as human, but with no gender code

 * Listeria reports looking at batches of 1 million humans in WD.
 * 1 to 1M
 * 1M to 2M
 * 2M to 3M
 * 3M to 4M
 * 4M to 5M
 * 5M to 6M
 * 6M to 7M
 * 7M to 8M
 * 8M to 9M
 * 9M to 10M
 * 10M to 11M
 * 11M to 12M


 * Articles from category:Women
 * to 5 levels of depth - manually run - auto-run
 * to 6 levels of depth - manually run - auto-run
 * to 7 levels of depth - manually run - auto-run


 * Articles from category:People by occupation
 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:People by nationality
 * to 3 levels of depth - manually run - auto-run
 * to 4 levels of depth - manually run - auto-run
 * to 5 levels of depth - manually run - auto-run


 * Articles from category:Living people
 * to 0 levels of depth - manually run - auto-run


 * Articles from category:Living people - note, these reports look for no P21, but do not check either way for a P31.


 * manually run - auto-run


 * Articles from category:Births by decade - note, these reports look for no P21, but do not check either way for a P31.
 * to 3 levels of depth - manually run - auto-run


 * Articles from category:Deaths by century - note, these reports look for no P21, but do not check either way for a P31.
 * to 3 levels of depth - manually run - auto-run


 * Articles from person stub categories
 * to 3 levels of depth - manually run - auto-run

Females given male gender coding
Again, this area is probably mostly under control, although new instances of wrongly coded items can appear at any time.

The following reports list biographies which are found beneath Women categories in Wikipedia, but which have gender = male in Wikidata. Most listed items are false positives - go deep enough into any women category in wikipedia and you find men. Note that the reports are ordered with earliest items listed first ... incorrectly coded items will tend to be found at the bottom of lists.

Four likely outcomes are suggested after inspection of an item / article: 1) change the gender coding on wikidata 2) remove inappropriate women categories from the article 3) remove inappropriate women categories from categories found on the article 4) shake your head in wonder at the vagaries of the wikipedia category tree.


 * Women by Occupation (excluding Women in sports)
 * to 5 levels - manually run - auto-run
 * to 6 levels - manually run - auto-run


 * Women in sports to 6 levels - manually run - auto-run


 * Women by nationality to 4 levels - manually run - auto-run