Module:PopulationFromWikidata/doc

Reason for the module
The aim is to make it easier to keep population values (and associated references) up-to-date in Australian place article Infoboxes. This module looks at population claims in a linked Wikidata item and filters for the latest and most appropriate population value. It extracts this value, along with all referencing information, and gives this to the article Infobox.

Who made the module
Wikimedia Australia designed this project to coincide with the first release of the 2021 census data (in June 2022). This module was created as part of a funded project with work done by m:User:MaiaCWilliams in collaboration with (really...HUGE amounts of help from) User:Samwilson, User:99of9 and User:Canley. The project was coordinated by User:tenniscourtisland.

It is an ongoing project and we will continue to refine the module. Of course anyone is welcome to contribute!

Head to the Module_talk:PopulationFromWikidata page if you have anything to discuss.

We wrote a summary of the project for the Wikimedia Australia blog here.

Population sources
The module is designed to be invoked from the Infobox Australian place template and gathers data from the Wikidata item linked to each article. The module may be modified and used in other places/cases in the future.

Currently, this module is invoked in such a way that it will only give the Infobox a population figure if one isn't manually given for the Infobox Australian place pop argument. This means that initially the module will not impact many articles. Over time, once we're certain it is working well, we can remove the manually added population figures in favour of the Wikidata figures brought in by the module.
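
The fallback rule described above can be sketched as follows. This is only an illustration in Python (the module itself is Lua and is invoked from the template); the function name and the callable argument are hypothetical and do not appear in the template or the module.

```python
from typing import Callable, Optional


def infobox_population(
    manual_pop: Optional[str],
    from_wikidata: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the manually entered pop value if there is one,
    otherwise whatever the Wikidata lookup returns.

    Both parameter names are hypothetical, for illustration only.
    """
    # A blank or whitespace-only pop field counts as "not given".
    if manual_pop is not None and manual_pop.strip():
        return manual_pop.strip()
    # Only consult Wikidata when no manual figure was supplied.
    return from_wikidata()
```

Passing the Wikidata lookup as a callable means it only runs when the manual value is missing, which mirrors the "only if one isn't manually given" behaviour.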

See line 110 of the Infobox Australian place template for the module invoke.

How to see the module in action
Currently the module will only give a population figure to the Infobox if one has not been manually added via the Infobox Australian place template pop field. This means if you want to see the module in action for a particular place article, you should follow these steps:
 * 1) Pick a Wikipedia place article and check that the linked Wikidata item has a valid population claim (most now do, but some values will be old because not all 2021 Census data has been released yet).
 * 2) If the Wikidata item looks good, then edit the Infobox Australian place template part of the article. Remove the pop value and replace it with a comment like: “<!--Leave blank to draw the latest automatically from Wikidata-->”. Remove the pop_year and pop_footnotes fields. Check if the old pop_footnotes reference had been used elsewhere in the article.
 * 3) Check the output in the article Infobox. If the output is not as expected then edit the Wikidata item or if it’s really broken, get in touch here.

Here's an example of an article with Infobox using the module, and the diff of the edit made.

The list of articles using population values from Wikidata (via this module) is here.

Assumptions
The module works with the following assumptions:
 * All Australian place Wikipedia articles are linked to relevant Wikidata items (true because Canley and 99of9 have done this work).
 * The type field of the Infobox Australian place template is a required field and always has a value specified.
 * Only population values associated with the Australian Bureau of Statistics' defined Australian Statistical Geographic Standard areas are considered.
 * The linked Wikidata item will likely have population statements for multiple Australian Bureau of Statistics geographic areas that encompass the item place.
 * Any ranking of population statements is ignored.

Population selection
The high-level steps of the module workflow are outlined in the diagram below. There are three major steps in the process of selecting the best population figure from a Wikidata item.

Step 1. Check which population claims have enough information to be considered
As a minimum they are required to have:
 * 1) A point in time qualifier date (this helps to choose the most recent population figures).
 * 2) An applies to part qualifier value (this states which ABS geography type the population is for and helps choose the most appropriate geographic area for the place article).
 * 3) A determination method qualifier item (this specifies if it is a census population figure or a non-census population estimate and helps define the reference components).
 * 4) Some reference information (it is a requirement to have something with which to build a reference but more than the minimum is recommended - see the Population data in Wikidata section).

After filtering for these requirements a subset of population claims is carried forward.
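
A minimal sketch of this Step 1 filter, written in Python for illustration (the module itself is Lua). It assumes each claim has been flattened into a dictionary; the key names and function names are hypothetical and are not the Wikidata property names or the module's own identifiers.

```python
# Hypothetical key names for the three required qualifiers.
REQUIRED_QUALIFIERS = ("point_in_time", "applies_to_part", "determination_method")


def is_usable(claim: dict) -> bool:
    """True if the claim has all three qualifiers and at least one reference."""
    has_qualifiers = all(claim.get(key) for key in REQUIRED_QUALIFIERS)
    has_reference = bool(claim.get("references"))
    return has_qualifiers and has_reference


def filter_usable(claims: list) -> list:
    """Keep only the claims that carry enough information to be considered."""
    return [claim for claim in claims if is_usable(claim)]
```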

Step 2. Check which population claims match the Infobox Australian place type value
The next part of the module separates the valid population claims into those which have applies to part values (defined ABS geography types) that match the Infobox type and those that don't. For the Infobox types that can map to multiple ABS geography types (eg. type = town), the most common mapping is considered a match initially and the other mappings are considered later in the module if the first preference isn't available. For example, type = town is matched to Urban Centres and Localities (UCL) as a first preference, but the module falls back to population values for Suburbs and Localities (SAL) and Indigenous Locations (ILOC), if they exist, when no UCL value is available.
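
The matching described above could look something like the sketch below (Python for illustration; the module itself is Lua). The preference table lists only the two mappings mentioned on this page (town and city) and is not the module's full mapping. The three-way split, the names, and the dictionary layout are all assumptions.

```python
# Only the mappings mentioned in this documentation; the first entry
# in each list is the first preference. Not the module's full table.
TYPE_PREFERENCES = {
    "town": ["UCL", "SAL", "ILOC"],
    "city": ["UCL"],
}


def split_by_match(claims: list, infobox_type: str):
    """Split claims into first-preference matches, later-preference
    matches, and claims whose geography does not match the type."""
    preferences = TYPE_PREFERENCES.get(infobox_type, [])
    first_choice = preferences[:1]
    later_choices = preferences[1:]
    matched = [c for c in claims if c["applies_to_part"] in first_choice]
    fallback = [c for c in claims if c["applies_to_part"] in later_choices]
    unmatched = [c for c in claims if c["applies_to_part"] not in preferences]
    return matched, fallback, unmatched
```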

The mappings are based on the outputs of summary SPARQL queries comparing the Infobox place type with the ABS geography types specified in the linked Wikidata item (for all Australian place articles). The module uses the following mappings.

Step 3. Check which population claims have the most recent figures
The next step is to check within the two sets of claims (applies to part geography matched or not) and find the most recent population figure for each applies to part value. For example, in the list of claims with applies to part geography not matching the Infobox, there are likely multiple applies to part values (UCL, SA1 etc) and multiple point in time values (2006, 2011, 2016 etc). This step finds the most recent population figures for each geography type (eg 2016 UCL; 2021 SA1).
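
As a rough illustration of this step (in Python; the module itself is Lua), the sketch below keeps the most recent claim for each applies to part value. The dictionary keys and the function name are hypothetical, and point in time is simplified to a year.

```python
def latest_per_geography(claims: list) -> dict:
    """Return a dict mapping each geography type to its most recent claim."""
    latest = {}
    for claim in claims:
        geography = claim["applies_to_part"]
        current = latest.get(geography)
        # Keep the claim if it is the first seen for this geography
        # or newer than the one already stored.
        if current is None or claim["point_in_time"] > current["point_in_time"]:
            latest[geography] = claim
    return latest
```

With claims for UCL in 2006, 2011 and 2016 and for SA1 in 2016 and 2021, this keeps the 2016 UCL claim and the 2021 SA1 claim, as in the example above.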

There are then three different types of outputs depending on the outcomes of the Step 2 and Step 3 filtering.

Step 3A. Outputs for claims with geography match to Infobox type
This is Output Scenario 1 and gives the Infobox one formatted population figure, with the relevant applies to part, point in time year and full Cite web reference(s). Eg. 5,089 (Suburb and Locality 2021)[1]
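
The text part of such an output can be sketched as below (Python for illustration; the module itself is Lua). The function name and parameters are hypothetical, and the reference marker ([1]) is left out because it is generated from the Cite web reference when the page is rendered.

```python
def format_population(population: int, geography: str, year: int) -> str:
    """Format a figure as 'population (geography year)' with a
    thousands separator, e.g. for display in the Infobox."""
    return f"{population:,} ({geography} {year})"
```

For the example above, `format_population(5089, "Suburb and Locality", 2021)` gives `5,089 (Suburb and Locality 2021)`.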

Step 3B Towns. Second preference output for Infobox type = town
This is Output Scenario 2 and gives the Infobox up to two formatted population figures, each with the relevant applies to part, point in time year and full Cite web reference(s). This happens when there is no valid UCL population claim and is the second preference output for type = town places. E.g. the first preference output would be:
 * 100 (Urban Centre and Locality 2021)[1]
OR, when there is no valid UCL claim, the second preference output:
 * 90 (Indigenous Location 2021)[1]
 * 100 (Suburb and Locality 2021)[2]

Step 3B. Outputs for claims with no geography match to Infobox type
This is Output Scenario 3 and gives the Infobox (possibly) multiple formatted population figures (one for each applies to part value), each with the relevant applies to part, point in time year and full Cite web reference(s). E.g. Infobox type = city is mapped to UCL (which leads to Output Scenario 1), but if there are no UCL population values you might get this output:


 * 100 (GCCSA 2021)[1]
 * 100 (SUA 2016)[2]
 * 120 (SA1 2016)[3]
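
Putting the three output scenarios together, the choice between them might be sketched like this (Python for illustration; the module itself is Lua). It assumes the most recent claim per geography has already been found and is passed in as a dictionary, and that the ordered list of geography preferences for the Infobox type is supplied by the caller. The names are hypothetical, and applying Scenario 2 to any type with more than one preference is a simplification of the town-specific rule described above.

```python
def choose_output(latest: dict, preferences: list):
    """Return (scenario number, list of claims to display).

    latest: most recent claim for each geography type.
    preferences: geography types for the Infobox type, first preference first.
    """
    # Scenario 1: the first-preference geography has a valid claim.
    if preferences and preferences[0] in latest:
        return 1, [latest[preferences[0]]]
    # Scenario 2: fall back to the remaining preferences (e.g. SAL and
    # ILOC for type = town), in preference order.
    second = [latest[g] for g in preferences[1:] if g in latest]
    if second:
        return 2, second
    # Scenario 3: no geography match, so list one figure per geography.
    return 3, list(latest.values())
```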

Example outputs
There are some example outputs in the Infobox Australian place Sandbox Test Cases page here.

What it doesn't do - next steps
There are some issues that we are aware of, have considered but haven't dealt with yet. These will be tackled in time in collaboration with other place article contributors. (No doubt there are many more to add to the list - please do).


 * Some tidying up of the output within the infobox:
    * removing unnecessary bullet points when there's only one item
    * adding links to information about the relevant Census
    * adding tooltip description
    * changing geography to abbreviation
    * adding links to explanations of ABS geographic boundaries (add this info to the Census articles and link to sections there)
 * Make a table of historic population values (from those available in Wikidata that meet the other module criteria) and test this as a new addition to place articles, as one possible method of preserving historic population figures in articles. Possibly a better solution than having multiple old values listed in Infoboxes (eg. Basket_Range,_South_Australia) or having to maintain them individually in-text? (Not the same idea, but there is a table of historic population values listed in this article.)
 * Figure out the case of two Infoboxes: Jimbour East, Queensland
 * Suppress the population figures for protected areas. (eg, no output for type = protected). Yes?
 * Population density figures need to be computed and added to the Infobox using the same population (and area from corresponding geography) as this module outputs now. With area data uploaded to Wikidata?
 * Test that city rank can still be displayed in the Infobox if the population comes from the module.
 * How to integrate (merge correctly) named references from the module with those used in-text. And how to retain historic population values (and references) as the Infobox population automatically updates with the most current figures. The module produces named references that are unique to the population value, but there are currently reference merging bugs associated with references from templates (and modules).
 * Should we change it so that pop2 still displays even if pop is replaced by the module population? So you can have both the automated population and a specific other population that's relevant to the article for some reason.
 * Add some more documentation to WikiProject Australian places/Population data.
 * Figure out interactions with the Coord template that's used in the majority of Australian place articles. The Coord template takes a population argument and uses that to determine the display scale of the Coordinates interactive map. Should we make an equivalent module (similar to this one) to bring place coordinates (with appropriate map zoom scales) from Wikidata to the Infobox Australian place template? Then the coordinates (and map scale) can be kept up-to-date with Wikidata. This would require parallel work to determine most appropriate place coordinate definition (eg centroid? of which geographic area?) so coordinates can be bulk imported to Wikidata? Or just rely on people adding the coordinates values to Wikidata manually but cut out the need to use the Coord template to set map zoom scale? Or keep using the Coord template but give it the appropriate population value as selected by (a modified version) of this PopulationFromWikidata module.
 * Connect Aboriginal and Torres Strait Islander community Wikidata items with ABS ILOC IDs so ILOC population counts can be uploaded in bulk. Will then need to revisit the ILOC vs SAL preferencing because for some towns ILOC will be more appropriate than SAL (due to geographic area covered).
 * Need to revisit mapping of regions to ABS geographies. Maybe they should be mapped to SA3s? Eg: Kimberley region article. Also, need to update this article and other equivalents.
 * Discrepancy with places with zero population, such as Essendon Fields (Q5399482):
    * In QuickStats it says: "No information can be provided because the area selected had no people or a very low population in the 2021 Census."
    * But it does have data in Wikidata (population 13, for SAL20886, which is what is in the DataPack).
    * This means that the reference URL ends up not backing up the displayed population figure.
    * There may be a difference in how this is handled between 2016 and 2021. For example, has zero population and doesn't show in either 2016 or 2021 QuickStats, but  had 3 people in 2016 (shown in QuickStats) and 4 people in 2021 (not shown in QuickStats). Both places have both figures in the DataPack.

What if the outputs are incorrect
All the references produced by this module are followed by an Edit at Wikidata pencil icon with a link to the relevant Wikidata item (and specific population claim). This is where people should go to fix any errors in the population figure outputs or references. See the next section for lists of what should ideally be included in a Wikidata population claim.

Wikipedia - Wikidata links
In parallel to development of this module User:99of9 and User:Canley have been working on ensuring all Australian place Wikipedia articles are linked to corresponding Wikidata items (describing that same place). This has largely been done. This enables the use of this module.

Census data
Population data has historically been entered manually into individual Wikidata items. Recently (since ~2017) User:99of9, User:Canley and others have used QuickStatements to do bulk imports of population data from Australian Bureau of Statistics datasets. Part of developing this module was to refine the list of metadata (qualifiers and reference fields) that should be imported alongside the population values.

As at July 2022 the first release of the 2021 census population data has been uploaded for the geographic areas relevant to Australian place Infoboxes. This includes data for Suburbs and Localities (SAL), Indigenous Locations (ILOC) and Local Government Areas (LGA). The Urban Centres and Localities (UCL) data is due to be released in October 2022.

The module requires these qualifiers and reference components to have values in the Wikidata population claim.
 * applies to part
 * point in time
 * determination method
 * reference: reference URL
 * reference: title
 * reference: published in
 * reference: retrieved
 * reference: Australian Statistical Geography 2021 ID (optional)

An example of a Wikidata item with a correctly filled 2021 population claim (using Census data) is Q2821571.

Non-census data
Bulk uploads have been done for census data. They have not been done for between-census estimated residential population (ERP) or Data by Region figures, for example. These estimates are useful for capital cities, LGAs and regions.

The module requires that non-census population claims have these components:
 * applies to part
 * point in time
 * determination method
 * reference: reference URL
 * reference: title
 * reference: published in
 * reference: retrieved
 * reference: publication date
 * reference: Australian Statistical Geography 2021 ID (optional)
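
The two lists of reference components differ only in the publication date, which is required for non-census claims. A sketch of a check for missing components is shown below (Python for illustration; the module itself is Lua). It assumes a reference has been flattened into a dictionary keyed by the labels used on this page; the function and constant names are hypothetical.

```python
# Reference components required for every population claim.
COMMON_REFERENCE_FIELDS = ("reference URL", "title", "published in", "retrieved")
# Additional component required for non-census claims only.
NON_CENSUS_EXTRA_FIELDS = ("publication date",)


def missing_reference_fields(reference: dict, is_census: bool) -> list:
    """List the required reference components that are absent or empty.

    The Australian Statistical Geography 2021 ID is optional, so it is
    never reported as missing.
    """
    required = COMMON_REFERENCE_FIELDS
    if not is_census:
        required = required + NON_CENSUS_EXTRA_FIELDS
    return [field for field in required if not reference.get(field)]
```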

An example of a Wikidata item with a correctly filled 2021 estimated resident population claim (not the other population claims) is Q11568. An example of a Wikidata item with a correctly filled 2020 LGA Data by Region population claim (not the other population claims) is Q704257.

Usage
The module exposes one function.

ListForInfobox( type, wikidata )
Parameters:
 * type: the type parameter from Infobox Australian place. Required.
 * wikidata: Wikidata ID to override that of the current article. Optional.