Wikipedia:Bots/Requests for approval/JhealdBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved

JhealdBot
Operator: Jheald

Time filed: 23:36, Monday December 8, 2014 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Perl

Source code available: Still under development.

Function overview: Maintenance of subpages of Wikipedia:GLAM/Your paintings, in particular the subpages listed there. There is currently a drive to identify Wikidata items for the entries on these lists that are not yet matched. I seek approval to keep the corresponding pages on Wikipedia up to date.

Initially I would just use the bot as an uploader, to transfer wikipages edited off-line into these pages (including fixing some anomalies in the present pages -- which I would probably do sequentially, through more than one stage, reviewing each fix stage before moving on to the next).

Once the off-line code is proven, I would then propose to move to a semi-automated mode, automatically updating the pages to reflect new instances of items with d:Property:P1367 and/or corresponding Wikipedia and Commons pages.

Links to relevant discussions (where appropriate):

Edit period(s): Occasional (perhaps once a fortnight), once the initial updating has been completed. And on request.

Estimated number of pages affected: 17

Exclusion compliant (Yes/No): No. These are purely project tracking pages, so there is no reason to expect a bots template on them. If anyone has any issues with what the bot does, they should talk to me directly and I'll either change it or stop running it.

Already has a bot flag (Yes/No): No. I have one on Commons, but not yet here.

Function details:
 * Initially: simple multiple uploader bot -- take updated versions of the 17 pages prepared and reviewed offline, and upload them here.
 * Subsequently: obtain a list of all Wikidata items with property P1367. Use the list to regenerate the "Wikidata" column of the tables, plus corresponding sitelinked Wikipedia and Commons pages.
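The "subsequently" step, pulling every item that carries P1367 and regenerating the Wikidata column from it, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bot's actual code (which was written in Perl and not published here). It substitutes a SPARQL query against the Wikidata Query Service for the WDQ tool used in this request. `QUERY` and `rows_by_yp_id` are hypothetical names. Only parsing of the standard SPARQL JSON results format is shown; the HTTP request is left out. It also assumes each Your Paintings identifier sits on at most one item.

```python
# Hypothetical SPARQL query for the Wikidata Query Service (a substitute for
# the WDQ tool used in this request). Properties as named in this BRFA:
# P1367 Your Paintings identifier, P373 Commons category, P214 VIAF, P650 RKDartists.
QUERY = """
SELECT ?item ?yp ?commons ?viaf ?rkd ?article WHERE {
  ?item wdt:P1367 ?yp .
  OPTIONAL { ?item wdt:P373 ?commons }
  OPTIONAL { ?item wdt:P214 ?viaf }
  OPTIONAL { ?item wdt:P650 ?rkd }
  OPTIONAL { ?article schema:about ?item ;
                      schema:isPartOf <https://en.wikipedia.org/> }
}
"""


def rows_by_yp_id(sparql_json):
    """Group SPARQL JSON result bindings by Your Paintings identifier.

    OPTIONAL clauses with several values produce one binding per combination,
    so multi-valued properties are collected into sets.
    Assumes each identifier appears on at most one Wikidata item.
    """
    table = {}
    for binding in sparql_json["results"]["bindings"]:
        yp_id = binding["yp"]["value"]
        row = table.setdefault(yp_id, {
            "qid": None, "commons": set(), "viaf": set(), "rkd": set(), "article": None,
        })
        # Item values are entity URIs such as http://www.wikidata.org/entity/Q42.
        row["qid"] = binding["item"]["value"].rsplit("/", 1)[-1]
        for key in ("commons", "viaf", "rkd"):
            if key in binding:
                row[key].add(binding[key]["value"])
        if "article" in binding:
            row["article"] = binding["article"]["value"]
    return table
```

Grouping by identifier rather than by item keeps the tables keyed the same way as the Your Paintings lists themselves.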

Discussion

 * Regarding uploading offline edits: Are these being made by anyone besides the operator? What license are they being made under?  —  xaosflux  Talk 23:44, 18 December 2014 (UTC)
 * The pages have been prepared by me using Perl scripts, drawing on Wikidata.
 * I've slowly been making the scripts more sophisticated -- so I've recently added columns for VIAF and RKDartists links, both taken from Wikidata, defaulting to searches if there's no link, or if no Wikidata item has yet been identified. Content not drawn from Wikidata (typically legacy entries from the pages as I first found them) I have prefixed with a question mark, meaning "to be confirmed". For the most part these are blue links, which may go to completely the wrong people.
 * So at the moment I'm running a WDQ search to pull out all Wikidata entries with one (or more) values for the P1367 "BBC Your Paintings identifier" property, along with the properties for Commons category name (P373), VIAF (P214) and RKDartists (P650). I'm also running an Autolist search to get en-wiki article names for all Wikidata items with a P1367. Plus I have run a look-up to get Wikidata item numbers for all other en-wiki bluelinks on the pages (this gives the Q-numbers marked with question marks). But the latter was quite slow, so I have only run it once. At the moment I'm still launching these searches by hand, and making sure they've come back properly, before updating and re-uploading the pages.
 * As to the licensing -- Wikidata is licensed CC0. My uploads here are licensed CC BY-SA like any other contribution to the site, though in reality there is very little originality, creativity or expression in them, apart from the choice of overall page design, so (under U.S. law at least) there quite possibly is no new copyrightable content in the diffs. Various people are of course updating Wikidata: I've been slowly working down this list (so far only to the middle of the 1600s page), though unfortunately not all of the Wikidata updates seem to be picked up by WDQ at the moment; the Your Paintings list is also on Magnus's Mix-and-Match tool; and various others are currently working on it, particularly to add RKD entries for painters with works in the Rijksmuseum in Amsterdam. But Wikidata is all CC0, so that all ought to be fine.
 * What would help, though, would be permission for a (limited) multiple uploader, so that I could upload the updates to all 17 pages just by launching a script, rather than laboriously uploading all 17 by hand each time I want to refresh them or slightly improve the treatment of one of the columns.
 * I'm not sure if that entirely answers your question, but I hope it makes clearer what I've been doing. All best, Jheald (talk) 00:45, 19 December 2014 (UTC)
 * Please post your results here after the trial. — xaosflux  Talk 01:48, 19 December 2014 (UTC)
 * First run of 16 edits made successfully -- see contribs for 19 December, from 15:59 to 16:55.
 * (Links to RKD streamlined + data updated; one page unaffected).
 * All the Captchas were a bit of a pain to have to deal with; but they will go away. Otherwise, all fine.  Jheald (talk) 17:31, 19 December 2014 (UTC)
 * Sorry about that; I added a flag to avoid this for now. —  xaosflux  Talk 17:34, 19 December 2014 (UTC)
 * New trial run carried out smoothly (see this related changes page).
 * Update still prepared by executing several scripts manually, before a final uploader script; but I should have these all rolled together into a single process for the next test. Jheald (talk) 09:11, 11 January 2015 (UTC)
 * Run again on January 21st, adding a column with the total number of paintings in the PCF for each artist. Jheald (talk) 17:13, 24 January 2015 (UTC)
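The cell logic described in this discussion could look something like the following in outline. It shows explicit VIAF links where Wikidata has them, a search link otherwise, and a question mark on Q-numbers not drawn from Wikidata. This is an illustrative sketch only, since the bot's own Perl code is not shown here. The helper names `viaf_cell`, `qid_cell` and `table_row` are hypothetical. The VIAF permalink form `https://viaf.org/viaf/<id>` is assumed, and the search URL is a placeholder, not a documented VIAF endpoint.

```python
from urllib.parse import quote

# Assumed VIAF permalink form; the search URL is a hypothetical placeholder.
VIAF_LINK = "https://viaf.org/viaf/{}"
VIAF_SEARCH = "https://viaf.org/viaf/search?query={}"


def viaf_cell(viaf_ids, painter_name):
    """Wikitext for the VIAF column: explicit links if known, else a search link."""
    if viaf_ids:
        return " ".join(f"[{VIAF_LINK.format(v)} {v}]" for v in sorted(viaf_ids))
    return f"[{VIAF_SEARCH.format(quote(painter_name))} search]"


def qid_cell(qid, from_wikidata):
    """Wikitext for the Wikidata column; legacy, unconfirmed matches get a '?' prefix."""
    if qid is None:
        return ""
    link = f"[[d:{qid}|{qid}]]"
    return link if from_wikidata else "? " + link


def table_row(cells):
    """One wikitable row in single-line form: '|-' then '| a || b || c'."""
    return "|-\n| " + " || ".join(cells)
```

For example, `table_row([qid_cell("Q1", False), viaf_cell(set(), "John Constable")])` yields a row whose Wikidata cell is flagged "to be confirmed" and whose VIAF cell falls back to a search link. That is why pages shrink as explicit VIAF ids arrive on Wikidata.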

Have you completed the trial? Josh Parris 10:20, 4 March 2015 (UTC)
 * I was going to go on running it once a month or so, the next run probably in a day or two, until someone progressed this further, possibly making tweaks to my offline processing scripts as I went along. Obviously I'm open to suggestions as to anything I can improve or do better, though the actual unsupervised part itself is just an upload script, refreshing a dozen or so pages, so nothing very complicated. (The off-line preprocessing is a bit more involved, but still pretty trivial.) Jheald (talk) 00:33, 5 March 2015 (UTC)
 * I note that further edits have been made. Out of interest, why do http://viaf.org IDs change?  The painter's been dead for centuries. Are they merges of duplicates? Also, is the trial finished now? Josh Parris 14:54, 9 March 2015 (UTC)
 * Clearly there has been a significant update of VIAF ids on Wikidata in the last three weeks, with a lot of new VIAF ids added -- I think by one of Magnus Manske's bots. This is why there are significant reductions in length for a lot of pages, with VIAF searches being replaced by explicit VIAF links.
 * I imagine that this may be catch-up resynchronisation for several months of updates at VIAF; but it may also be that VIAF is now explicitly targeting Wikidata items rather than just en-wiki articles, and is actively doing matching at the VIAF end, which would explain the sudden rush of new VIAF <--> Wikidata matches.
 * You're right that there are a few VIAF matches that have changed. I haven't looked into any in detail, but two strong possibilities would be either erroneous matches that have been corrected (i.e. we used to point to the VIAF record for somebody quite different), or a group of duplicate entries on VIAF that have been merged, e.g. if there had been one VIAF record for the Library of Congress id and another for the Getty ULAN id, and the two had not previously been connected.
 * As to where we're at, matching of the Your Paintings painter identifiers continues to move forward using mix-n-match. About 80% of the YP identifiers have now been triaged into has / doesn't have / shouldn't have a Wikidata item, with progress ongoing; plus I've now got as far as painters born before 1825, using mix-n-match search to match to RKDartists and other databases. Then there will also be a stage where new Wikidata items are created for YP ids that currently don't have them but should; and these new items will in turn have RKDartists (etc.) entries to be matched. So there's still a lot to do going forward, and the tracking pages will continue to need updates if they are to reflect that.
 * At the moment it's still done using about four scripts that I run sequentially by hand on an occasional basis. The one I'd have to write a bit more code to integrate is the one that merges in the en-wiki article names for the Wikidata items, because these are currently obtained using an Autolist query whose results are then saved manually. I'd need to look into how to replace that batch look-up with an API call if I were to make the whole thing more integrated and run it on a regular basis (weekly?). I'm happy to do that work if anybody wants it, but for the time being it's just as easy to go on doing what I've been doing, generating the updates in a partially manual way. So I'm open to views, if anybody has any strong preferences either way. Jheald (talk) 23:27, 4 May 2015 (UTC)
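Replacing the manually saved Autolist query with an API call, as contemplated above, could be sketched with the Wikidata API's `wbgetentities` module, requesting only English Wikipedia sitelinks. This is a sketch under stated assumptions. The batch size of 50 ids per request reflects the module's usual limit for ordinary clients and should be checked against the API documentation. The helper names `sitelink_urls` and `enwiki_titles` are hypothetical. The HTTP fetch itself is omitted, so only URL building and response parsing are shown.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
BATCH = 50  # assumed per-request limit on ids for wbgetentities


def sitelink_urls(qids, batch=BATCH):
    """Yield one wbgetentities request URL per batch of Q-ids."""
    for start in range(0, len(qids), batch):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(qids[start:start + batch]),
            "props": "sitelinks",
            "sitefilter": "enwiki",
            "format": "json",
        }
        yield f"{API}?{urlencode(params)}"


def enwiki_titles(response):
    """Map Q-id -> en-wiki article title from one decoded JSON response.

    Items without an enwiki sitelink, and missing items, are skipped.
    """
    titles = {}
    for qid, entity in response.get("entities", {}).items():
        link = entity.get("sitelinks", {}).get("enwiki")
        if link:
            titles[qid] = link["title"]
    return titles
```

Batching keeps the number of requests proportional to the number of items divided by 50. This is what would make a weekly, fully scripted run practical compared with the slow one-at-a-time look-up mentioned earlier.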

Jheald, what is to be done here? I have not followed the entire discussion, to be honest. -- Magioladitis (talk) 08:49, 15 July 2015 (UTC)


 * ping Magioladitis (talk) 17:59, 14 August 2015 (UTC)


 * Hi. What I am looking for is permission to go on making script-driven updates to the 17 pages linked from Wikipedia:GLAM/Your paintings/header, as the data continues to develop on Wikidata. Thanks, Jheald (talk) 17:42, 17 August 2015 (UTC)

Jheald, I notice that the bot has already run an updated script? -- Magioladitis (talk) 18:21, 17 August 2015 (UTC)


 * Hi. Thanks for the extended trial. You're right, I ran another update earlier this afternoon. But what I'm really looking for now is for the permission to be extended indefinitely. I have run the scripts on and off for over 9 months now; it's a very small set of pages, in project space rather than main space, and primarily used by myself. Can we not just sign off the permission permanently now? Jheald (talk) 18:55, 17 August 2015 (UTC)

Jheald, I'll do it 5 days from now if this is not a problem. Just ping me in 5 days and I'll immediately approve it. -- Magioladitis (talk) 18:59, 17 August 2015 (UTC)

-- Magioladitis (talk) 10:07, 26 August 2015 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.