Wikipedia:Meetup/Christchurch/21/Artists

Messing with wikitable export from OpenRefine

Todo

 * figure out how to tell from Wikidata whether an item has a Wikipedia article
 * export OpenRefine project as wikitable below
 * attach OpenRefine project for others to work with
 * consider workflow, e.g. "allocated to" person column
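
For the wikitable export item, one possible approach is to export the project rows (for example as CSV) and render them as table markup with a small script. The following is a minimal sketch, not an OpenRefine feature: the function name `to_wikitable` and the chosen columns are illustrative assumptions, and it assumes cell text contains no bare "|" character.

```python
def to_wikitable(headers, rows):
    """Render rows as MediaWiki table markup.

    Assumption: cell text contains no bare "|", which would need
    escaping before it could be used inside a table cell.
    """
    lines = ['{| class="wikitable sortable"']
    # One header row: cells separated by "!!"
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")  # row separator
        # None becomes an empty cell
        cells = ["" if c is None else str(c) for c in row]
        lines.append(("| " + " || ".join(cells)).rstrip())
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two rows taken from the original file, after processing
    print(to_wikitable(
        ["Name", "Birth date", "Death date", "Allocated to"],
        [
            ("Laurence Aberhart", "1949", None, None),
            ("Chrystabel Aitken", "1904", "2005", None),
        ],
    ))
```

An empty "Allocated to" column like the one in the usage example could carry the workflow idea from the last todo item.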

Original file
from Tim Jones, 11 March 2020, 702 rows:
 Last, First Years
 Abbott, Marie
 Aberhart, Laurence (b.1949)
 Adams, Mark Bentley (b.1949)
 Aitken, Chrystabel (b.1904, d.2005)
 ...snipped...

Processing

 * 1) In OpenRefine split into columns Last, First, Birth date raw, Death date raw.
 * 2) Create new column Name in First +space+ Last format
 * 3) Reconcile against Wikidata using Name as label
 * 4) Create new column Wikidata ID from the reconciliation result, using the GREL expression cell.recon.match.id
 * 5) Add two new columns from reconciled values, selecting Country of citizenship and Sex or gender
 * 6) Add new columns Birth date and Death date using value.match(/.*(\d{4}).*/)[0].toString() (keeps only a four-digit year from the raw value; if the value contains several, the last one is kept)
 * 7) Add intermediate column Wikidata sitelinks URL holding the response from a Q ID lookup, fetched with the GREL URL expression "https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&languages=en&props=sitelinks&ids=" + cell.recon.match.id
 * 8) Add column Wikipedia with the value from the parsed response, using the GREL expression "https://en.wikipedia.org/wiki/" + value.parseJson().entities.get(cells["Wikidata ID"].value).sitelinks.enwiki.title.replaceChars(' ', '_')
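
Steps 6 to 8 can also be tried outside OpenRefine. The following Python sketch mirrors the three GREL expressions under some assumptions: the function names are made up for illustration, the sample response is hand-written in the shape returned by wbgetentities (using Q42, Douglas Adams, rather than an artist from this list), and no network request is made. Returning None when there is no enwiki sitelink is one possible way to flag items that have no English Wikipedia article.

```python
import json
import re
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def extract_year(raw):
    """Step 6: return the last four-digit run in the raw text, or None."""
    m = re.fullmatch(r".*(\d{4}).*", raw or "", flags=re.DOTALL)
    return m.group(1) if m else None


def sitelinks_url(qid):
    """Step 7: build the wbgetentities URL for one item's sitelinks."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "languages": "en",
        "props": "sitelinks",
        "ids": qid,
    }
    return API + "?" + urlencode(params)


def enwiki_url(response_text, qid):
    """Step 8: English Wikipedia URL from the response, or None if absent."""
    data = json.loads(response_text)
    title = (
        data.get("entities", {})
        .get(qid, {})
        .get("sitelinks", {})
        .get("enwiki", {})
        .get("title")
    )
    if title is None:
        return None  # item exists but has no enwiki sitelink
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")


# Hand-written sample in the shape of a wbgetentities response (assumption)
SAMPLE = json.dumps({
    "entities": {
        "Q42": {
            "type": "item",
            "id": "Q42",
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams", "badges": []}
            },
        }
    },
    "success": 1,
})

if __name__ == "__main__":
    print(extract_year("(b.1949)"))
    print(sitelinks_url("Q42"))
    print(enwiki_url(SAMPLE, "Q42"))
```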

380 out of 702 rows matched to Wikidata so far.

Data note
Some birth dates are actually "active" dates:
 * Edgar B. Vaughan (active 1898-1920)
 * William Tiller (active 1907-1918)
 * Marcus King (active 1920-1965)
 * F. G. Shewell (active 1950-1969)
 * B. L. Gray (active 1970-1975)
 * George Edmund Pruden (active c. 1890-1925)
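
One way to keep "active" ranges out of the Birth date column is to classify the raw years text before extracting a year. This is a rough sketch only: the function name `parse_years` and the output keys are assumptions, and the patterns cover just the forms seen in this list ("b.1949", "b.1904, d.2005", "active 1898-1920", "active c. 1890-1925").

```python
import re

# "active 1898-1920" or "active c. 1890-1925"; hyphen or en dash
ACTIVE = re.compile(r"active\s+(?:c\.\s*)?(\d{4})\s*[-\u2013]\s*(\d{4})")
BORN = re.compile(r"\bb\.\s*(\d{4})")
DIED = re.compile(r"\bd\.\s*(\d{4})")


def parse_years(raw):
    """Split raw years text into birth, death and active-range fields."""
    out = {"birth": None, "death": None, "active_from": None, "active_to": None}
    text = raw or ""
    m = ACTIVE.search(text)
    if m:
        # An active range is not a birth or death date, so stop here
        out["active_from"], out["active_to"] = m.group(1), m.group(2)
        return out
    b = BORN.search(text)
    d = DIED.search(text)
    if b:
        out["birth"] = b.group(1)
    if d:
        out["death"] = d.group(1)
    return out


if __name__ == "__main__":
    for raw in ["(b.1904, d.2005)", "(active c. 1890-1925)", ""]:
        print(repr(raw), parse_years(raw))
```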