User talk:Janhp78

Facto Post – Issue 10 – 12 March 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

 

Milestone for mix'n'match
Around the time in February when Wikidata clicked past item Q50000000, another milestone was reached: the mix'n'match tool uploaded its 1000th dataset. Concisely defined by its author, it works "to match entries in external catalogs to Wikidata". The total number of entries is now well into eight figures, and more are constantly being added: a couple of new catalogs each day is normal.

Since the end of 2013, mix'n'match has gradually come to play a significant part in adding statements to Wikidata. That is particularly so in areas with the flavour of digital humanities, but datasets can of course be about practically anything. There is a catalog on skyscrapers, and two on spiders.

These days mix'n'match can be used in numerous modes, from the relaxed gamified click through a catalog looking for matches, with prompts, to the fantastically useful and often demanding search across all catalogs. I'll type that again: you can search 1000+ datasets from the simple box at the top right. The drop-down menu top left offers "creation candidates", Magnus's personal favourite. See Mix'n'match/Manual for more.

For the Wikidatan, a key point is that these matches, however carried out, add statements to Wikidata if, and naturally only if, there is a Wikidata property associated with the catalog. For everyone, however, the hands-on experience of deciding what is a good match is an education in a scholarly area, biographical catalogs being particularly fraught. Underpinning recent rapid progress is an open infrastructure for scraping and uploading.
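
The "if, and only if" rule above can be pictured with a minimal sketch. The names `Catalog` and `statement_for_match` are hypothetical, chosen for illustration, and are not taken from mix'n'match itself; the external ID used in the usage note is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Catalog:
    """Hypothetical stand-in for an external catalog."""
    name: str
    # Wikidata property linked to the catalog, or None if there is none.
    property_id: Optional[str] = None


def statement_for_match(
    catalog: Catalog, item_id: str, external_id: str
) -> Optional[Tuple[str, str, str]]:
    """Return an (item, property, value) triple for a confirmed match.

    Returns None when the catalog has no associated property, in which
    case the match yields no statement.
    """
    if catalog.property_id is None:
        return None
    return (item_id, catalog.property_id, external_id)
```

For example, `statement_for_match(Catalog("Example", "P214"), "Q42", "example-id-123")` gives a triple, while the same call on a catalog created without a property gives `None`.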

Congratulations to Magnus, our data Stakhanovite!

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery MediaWiki message delivery (talk) 12:26, 12 March 2018 (UTC)
 * Wikipedia goes 3D allowing users to upload .STLs for digital reference, Beau Jackson for 3dprintingindustry.com, February 22 2018
 * WikiCite report (video)
 * Formal publication and announcement of ISBN citation dataset, see Twitter post, February 23 2018
 * Plotting the Course Through Charted Waters, workshop on data visualization literacy from Mikhail Popov, Wikimedia Foundation
 * Using Wikidata to build an authority list of Holocaust-era ghettos, Nancy Cooey, United States Holocaust Memorial Museum, February 12 2018
 * Why Should You Learn SPARQL? Wikidata! Mark Longair, blogpost November 29 2017
 * Back to the future: Does graph database success hang on query language?, George Anadiotis for Big on Data, March 5 2018
 * }

Facto Post – Issue 11 – 9 April 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

 

The 100 Skins of the Onion
Open Citations Month, with its eminently guessable hashtag, is upon us. We should be utterly grateful that in the past 12 months, so much data on which papers cite which other papers has been made open, and that Wikidata is playing its part in hosting it as "cites" statements. At the time of writing, there are 15.3M Wikidata items that can do that.

Pulling back to look at open access papers in the large, though, there is less reason for celebration. Access in theory does not yet equate to practical access. A recent LSE IMPACT blogpost puts that issue down to "heterogeneity". A useful euphemism to save us from thinking that the whole concept doesn't fall into the realm of the oxymoron.

Some home truths: aggregation is not content management, if it falls short on reusability. The PDF file format is wedded to how humans read documents, not how machines ingest them. The salami-slicer is our friend in the current downloading of open access papers, but for a better metaphor, think about skinning an onion, laboriously, 100 times with diminishing returns. There are of the order of 100 major publisher sites hosting open access papers, and the predominant offer there is still a PDF. From the discoverability angle, Wikidata's bibliographic resources combined with the SPARQL query are superior in principle, by far, to existing keyword searches run over papers. Open access content should be managed into consistent HTML, something that is currently strenuous. The good news, such as it is, would be that much of it is already in XML. The organisational problem of removing further skins from the onion, with sensible prioritisation, is certainly not insuperable. The CORE group (the bloggers in the LSE posting) has some answers, but actually not all that is needed for the text and data mining purposes they highlight. The long tail, or in other words the onion heart when it has become fiddly beyond patience to skin, does call for a pis aller. But the real knack is to do more between the XML and the heart.
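
As a rough illustration of the discoverability point, here is a minimal sketch that builds a SPARQL query over "cites" statements and the matching request URL. It assumes the "cites work" property P2860 and the public Wikidata Query Service endpoint; the helper names `build_cites_query` and `build_request_url` are hypothetical, and any item ID passed in is the caller's choice. Nothing is sent over the network.

```python
import re
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed here as the target).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_cites_query(qid: str, limit: int = 10) -> str:
    """Return a SPARQL query listing the works that item `qid` cites."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?cited ?citedLabel WHERE {\n"
        f"  wd:{qid} wdt:P2860 ?cited .\n"  # P2860 = "cites work"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The query asks a structured question (which items does this paper cite?) that a keyword search over PDFs cannot express directly, which is the sense in which the approach is "superior in principle".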

Links
Newsletter delivered by MediaWiki message delivery (talk) 16:25, 9 April 2018 (UTC)
 * Crossref as a new source of citation data: A comparison with Web of Science and Scopus, CWTS blogpost 17 January 2018, Nees Jan van Eck, Ludo Waltman, Vincent Larivière, Cassidy Sugimoto
 * Citations with identifiers in Wikipedia, figshare dataset
 * Making women more visible online—with Wikidata tools!, Wikimedia blogpost 29 March 2018 by Sandra Fauconnier
 * Village pump discussion "Turn on mapframe? We’re ready if you are" reaches conclusions
 * The Power of the Wikimedia Movement beyond Wikimedia, Forbes 28 March 2018, Michael Bernick
 * Tracing stolen bitcoin, blogpost 26 March 2018 by Ross J. Anderson
 * }

Facto Post – Issue 12 – 28 May 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

 

ScienceSource funded
The Wikimedia Foundation announced full funding of the ScienceSource grant proposal from ContentMine on May 18. See the ScienceSource Twitter announcement and 60-second video.

The proposal includes downloading 30,000 open access papers, aiming (roughly speaking) to create a baseline for medical referencing on Wikipedia. It leaves open the question of how these are to be chosen.
 * A medical canon?

The basic criteria of WP:MEDRS include a concentration on secondary literature. Attention has to be given to the long tail of diseases that receive less current research. The MEDRS guideline supposes that edge cases will have to be handled, and the premature exclusion of publications that would be in those marginal positions would reduce the value of the collection. Prophylaxis misses the point that gate-keeping will be done by an algorithm.

Two well-known but rather different areas where such considerations apply are tropical diseases and alternative medicine. There are also a number of potential downloading troubles, and these were mentioned in Issue 11. There is likely to be a gap, even with the guideline, between conditions taken to be necessary but not sufficient, and conditions sufficient but not necessary, for candidate papers to be included. With around 10,000 recognised medical conditions in standard lists, being comprehensive is demanding. With all of these aspects of the task, ScienceSource will seek community help.

Links
Reminder: WikiFactMine pages on Wikidata are at WD:WFM. ScienceSource pages will be announced there, and in this mass message. Newsletter delivered by MediaWiki message delivery (talk) 10:16, 28 May 2018 (UTC)
 * d:Wikidata:Lexicographical data, Wikidata's multi-lingual dictionary project gets going
 * Ordia tool, a basic search interface for Wikidata lexemes and forms
 * OpenRefine tool 3.0, May update allows wrangling of tabular information into Wikidata
 * d:Wikidata:WikiProject British Politicians pushes ahead with data modelling and imports
 * #1Lib1Ref Returns for a Second Time in 2018, IFLA blogpost 25 May 2018, second chance this year to participate in referencing Wikipedia
 * }

Facto Post – Issue 13 – 29 June 2018
MediaWiki message delivery (talk) 18:19, 29 June 2018 (UTC)

Facto Post – Issue 14 – 21 July 2018
MediaWiki message delivery (talk) 06:10, 21 July 2018 (UTC)

Facto Post – Issue 15 – 21 August 2018
MediaWiki message delivery (talk) 13:23, 21 August 2018 (UTC)

Facto Post – Issue 16 – 30 September 2018
MediaWiki message delivery (talk) 17:57, 30 September 2018 (UTC)

Facto Post – Issue 17 – 29 October 2018
MediaWiki message delivery (talk) 15:01, 29 October 2018 (UTC)

Facto Post – Issue 18 – 30 November 2018
MediaWiki message delivery (talk) 11:20, 30 November 2018 (UTC)

Facto Post – Issue 19 – 27 December 2018
MediaWiki message delivery (talk) 19:08, 27 December 2018 (UTC)

Facto Post – Issue 20 – 31 January 2019
MediaWiki message delivery (talk) 10:53, 31 January 2019 (UTC)

Facto Post – Issue 21 – 28 February 2019
MediaWiki message delivery (talk) 10:02, 28 February 2019 (UTC)

Facto Post – Issue 22 – 28 March 2019
MediaWiki message delivery (talk) 11:45, 28 March 2019 (UTC)

Facto Post – Issue 23 – 30 April 2019
MediaWiki message delivery (talk) 11:27, 30 April 2019 (UTC)

Facto Post – Issue 24 – 17 May 2019
MediaWiki message delivery (talk) 18:52, 17 May 2019 (UTC)