Wikipedia:WikiProject Women in Red/Wikidata redlist guide

This Wikidata redlist guide provides step-by-step guidance for creating Women in Red redlists. Although the guide focuses on Women in Red, it may also be useful for creating Wikidata-based lists for other purposes.

Preliminaries
In order to create a Wikidata-based redlist, you will need:
 * Basic understanding of template usage, see Help:Transclusion.
 * Basic understanding of what Wikidata is.
 * A grasp of SPARQL queries, see wikidata:Wikidata:SPARQL tutorial. You can learn even more at wikidata:Wikidata:SPARQL query service/Wikidata Query Help.

You will use the following tools:
 * Wikidata Query Service (query.wikidata.org).
 * Wikidata list and Wikidata list end templates.

Simple example
Let's start with a trivial Wikidata list. It will have a single entry for Ada Lovelace and we'll use the following query:

The above query will get every Wikidata item that fulfills these conditions:
 * 1) Is a human.
 * 2) Is female.
 * 3) Has given name Ada.
 * 4) Has family name Byron.
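
As a rough sketch, the four conditions could be expressed as shown below. This is not necessarily the query the guide originally used: it matches the given name and family name by their English labels rather than by item ID, and it assumes the standard Wikidata identifiers P31/Q5 (instance of: human), P21/Q6581072 (sex or gender: female), P735 (given name) and P734 (family name). The wd:, wdt: and rdfs: prefixes are predefined by the Wikidata Query Service. Because it matches on labels, it may return more than one item.

```sparql
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .            # 1) is a human
  ?item wdt:P21 wd:Q6581072 .      # 2) is female
  ?item wdt:P735 ?given .          # 3) has a given name ...
  ?given rdfs:label "Ada"@en .     #    ... whose English label is "Ada"
  ?item wdt:P734 ?family .         # 4) has a family name ...
  ?family rdfs:label "Byron"@en .  #    ... whose English label is "Byron"
}
```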

Now that we have a SPARQL query that returns the entries we want, we can create the redlist using Wikidata list (remembering to include a Wikidata list end template):

ListeriaBot will take care of updating it automatically, producing the following output:

Notice that the query returns only ?item. The columns in the generated table are specified in the columns parameter of the Wikidata list template. See Template:Wikidata list for more information on Wikidata list parameters.

Missing articles
In order to list only items without a corresponding article in the English Wikipedia, every redlist needs the following SPARQL fragment:

OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . } FILTER(!(BOUND(?w)))

You will also see the following equivalent form:

FILTER NOT EXISTS { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . }

Number of sites
When looking for notable subjects, it is often useful to look at how many Wikimedia projects have a page for a given item. This number can be retrieved with the following SPARQL fragment:

?item wikibase:sitelinks ?linkcount.

Here's a version of the simple example modified to add a column with the link count:
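
On the query side, the change could look roughly like the sketch below, which selects ?linkcount alongside ?item. It makes assumptions of its own: the names are matched by English label rather than by item ID, and P31/Q5, P21/Q6581072, P735 and P734 are taken to be the standard Wikidata identifiers for human, female, given name and family name. How the new variable is then shown as a column depends on the Wikidata list template parameters, which are not covered here.

```sparql
SELECT ?item ?linkcount WHERE {
  ?item wdt:P31 wd:Q5 ;                 # human
        wdt:P21 wd:Q6581072 ;           # female
        wdt:P735 ?given ;               # given name
        wdt:P734 ?family ;              # family name
        wikibase:sitelinks ?linkcount . # number of Wikimedia sites with a page
  ?given rdfs:label "Ada"@en .
  ?family rdfs:label "Byron"@en .
}
```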

Handling large results
The number of results for a SPARQL query can often be in the thousands or tens of thousands. That is far more than we can handle in a wiki redlist, so we need to cut it down. The number of results of a query can be limited by adding a LIMIT clause at the end, for example LIMIT 1000 to limit the results to 1000.

However, if we use LIMIT alone, the results that make it into the list will be arbitrary, and they might not be the most relevant. So it is a good idea to always apply order criteria. A limit with our recommended order follows:

ORDER BY DESC(?linkcount) ASC(?item) LIMIT 1000

This limits the results to the top 1000 by number of sites. If two items have the same number of sites, the one with the lowest item number takes precedence. This makes the result deterministic, meaning that in the absence of actual data changes, the query will always return the same set of 1000 results. If we didn't do this, the bot would repeatedly remove and add back items in subsequent updates.

Occupation
One of the most common criteria for redlists is occupation (P106). Check out current redlists by occupation. We specify one or more occupations as follows:

?item wdt:P106 ?occ
VALUES ?occ {
  wd:Q5468707 # forensic entomologist
  wd:Q27645949 # paleoentomologist
  wd:Q3055126 # entomologist
}

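This will include items whose occupation (P106) is forensic entomologist, paleoentomologist, or entomologist. The comments in the query (e.g. # entomologist) are optional, but they make the query more readable to humans. Each comment runs to the end of its line, so every commented value must sit on its own line.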

Here's a full example of a redlist of 5 entomologist women (see also the actual Entomologists redlist):
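
A query for such a list could be assembled from the fragments in this guide roughly as follows. It is a sketch rather than the query used by the actual redlist: P31/Q5 (human) and P21/Q6581072 (female) are assumed to be the standard Wikidata identifiers, and DISTINCT is added here so that a woman with two of the listed occupations does not take up two of the five rows.

```sparql
SELECT DISTINCT ?item ?linkcount WHERE {
  ?item wdt:P31 wd:Q5 .                  # human
  ?item wdt:P21 wd:Q6581072 .            # female
  ?item wdt:P106 ?occ .                  # occupation is one of the values below
  VALUES ?occ {
    wd:Q5468707   # forensic entomologist
    wd:Q27645949  # paleoentomologist
    wd:Q3055126   # entomologist
  }
  ?item wikibase:sitelinks ?linkcount .  # number of Wikimedia sites with a page
  # keep only items without an English Wikipedia article
  FILTER NOT EXISTS {
    ?w schema:about ?item ;
       schema:isPartOf <https://en.wikipedia.org/> .
  }
}
ORDER BY DESC(?linkcount) ASC(?item)     # most sites first, item ID as tie-breaker
LIMIT 5
```

The query can be pasted into the Wikidata Query Service to preview the results before it is placed in a Wikidata list template.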

Country
See our country redlists. A simple approach would be to use the country of citizenship property (P27). However, Wikidata may be missing the country of citizenship for an item while having other geographical properties that are good enough for our purposes. So we can use a combination of country of citizenship and those other geographical properties, with the following SPARQL fragment:

VALUES ?country {
  wd:Q189 # Iceland
}
{ { ?item (wdt:P27
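
For comparison, the simpler citizenship-only approach could look like the sketch below. It deliberately does not attempt the broader combination of geographical properties, so it will miss women whose item lacks a country of citizenship. It reuses the Iceland item from the fragment above and assumes P31/Q5 (human) and P21/Q6581072 (female) are the standard Wikidata identifiers.

```sparql
SELECT ?item ?linkcount WHERE {
  VALUES ?country {
    wd:Q189  # Iceland
  }
  ?item wdt:P31 wd:Q5 ;                 # human
        wdt:P21 wd:Q6581072 ;           # female
        wdt:P27 ?country ;              # country of citizenship
        wikibase:sitelinks ?linkcount . # number of Wikimedia sites with a page
  # keep only items without an English Wikipedia article
  FILTER NOT EXISTS {
    ?w schema:about ?item ;
       schema:isPartOf <https://en.wikipedia.org/> .
  }
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
```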

Here's a full example of a redlist of 5 women from Honduras (see also the actual Honduras redlist):

Killed by OS for overloading memory
A list may fail to update because the bot ran out of memory. This is signaled by the error Killed by OS for overloading memory on a manual update. This is a known problem with ListeriaBot, usually caused by many links to large entities. A workaround is to reduce the number of links to geographical entities, for example by removing a column that links to them.