User talk:Lydia Pintscher (WMDE)

Welcome
Welcome to the English Wikipedia. I asked a question at WP:VPM. Thanks. Biosthmors (talk) 20:42, 8 February 2013 (UTC)

Courtesy note re: citations on Wikidata
Hi Lydia. Thanks for all your work on the Wikidata project. There's a discussion at Village pump (proposals) about moving citations to Wikidata. I used a quote from you in the Alternative proposal subsection and thought it would be courteous to let you know about it. Kind regards. 64.40.54.208 (talk) 05:14, 31 March 2013 (UTC)
 * Thank you for letting me know! --Lydia Pintscher (WMDE) (talk) 09:15, 31 March 2013 (UTC)

Denny in Seattle
Re Wikipedia_talk:Meetup/Seattle: I'm interested. If nothing else comes together, I'll gladly take him out for coffee or a drink & a chance to talk. I can arrange to be free most weekday evenings or, if that doesn't work, we can see what does. - Jmabel | Talk 00:56, 12 April 2013 (UTC)
 * Cool. I'll send you an email for further coordination. --Lydia Pintscher (WMDE) (talk) 15:41, 12 April 2013 (UTC)

Hi Lydia
I wonder if you might take a look at and comment on an idea I've put together here: Wikipedia_talk:Category_intersection. This is a reaction to the nasty mess we got into in the media as a result of gendered categories that were supposed to be non-diffusing but weren't. I'd like to see what we could get done quickly, and how doing something like that might eventually help feed into a Wikidata-like solution. Anyway, please take a look and send your comments there. Best, --Obi-Wan Kenobi (talk) 03:47, 5 May 2013 (UTC)
 * Hey :)
 * Unfortunately I don't really have any comments on that, beyond the fact that we talked very briefly with Erik Moeller about categories and Wikidata a while ago. The only presentable outcome so far is a sentence here, though. So I'd say it is planned, but not for the next few weeks. Hope that helps. --Lydia Pintscher (WMDE) (talk) 15:25, 6 May 2013 (UTC)
 * Thanks for your response. With a few tweaks to the approach I've proposed above, some templates, a few bots, and moving the catscan tool over to Wikimedia Labs, I do think it would be possible to de-ghettoize most of the tree without a massive amount of work - and then, when the whole Wikidata thing becomes a reality, we'd be in a better position. Check out for a more refined example than the one I had the other day. In any case, today, in spite of all of the hoopla over the  category, people remain ghettoized by gender, ethnicity, and so on throughout the tree - so quick action might help here. Is there any possibility of having a few devs chat with me to discuss whether it would be possible to do something quickly, not with respect to categories in general, but especially with respect to gender/ethnic/religious/sexuality categories, which, as I said above, currently serve to ghettoize (intentionally or not) tens of thousands of bios? A new op-ed could be written tomorrow about how we ghettoize Indian women novelists or African-American journalists or gay actors or many other things besides, so Wikipedia is still completely exposed to the critiques leveled at it in recent days - fixing American novelists was just the tip of the iceberg. Cheers! --Obi-Wan Kenobi (talk) 15:32, 6 May 2013 (UTC)
 * For this I think the best way would be an email to the wikitech-l mailing list. Good luck! Thanks for pushing this. It's important. --Lydia Pintscher (WMDE) (talk) 15:37, 6 May 2013 (UTC)
 * OK - I haven't used that list before - who are its members? I guess I'm not sure who I'd be talking to there. But on the broader point: if we do come up with what seems like a workable solution and there seems to be editor consensus around it, can the WMF throw some short-term dev resources at this issue (since bots may need to be written, the cat_scan UI refreshed, etc.)? Just to emphasize: more New York Times articles could be written today, and tomorrow, and the next day - we're not out of the woods yet, and yet we don't even have a broad wiki-wide consensus to start de-ghettoizing at scale. --Obi-Wan Kenobi (talk) 16:36, 6 May 2013 (UTC)
 * It's the list MediaWiki developers use for coordination. I can't say if the Foundation will spend money on something or not. I have no say in that :) --Lydia Pintscher (WMDE) (talk) 10:48, 7 May 2013 (UTC)
 * Thanks Lydia. I'm not really looking for a final answer, but more for a pathway. If you think the best pathway is to go to the devs, I'm fine with that, but part of me thinks this should be escalated - e.g. have some top-level WMF people say "OK, let's fix this ghettoization problem within a month - Wikidata is the long-term fix, but this particular fix can be implemented at relatively low cost (most of the work can be done by regular editors, and other bits by bots, and it doesn't require any changes to MediaWiki itself), so let's throw some resources at it". How could I get this idea in front of those decision-makers? Thanks again, --Obi-Wan Kenobi (talk) 13:45, 7 May 2013 (UTC)
 * I would use the same list. The relevant people should be reading it. --Lydia Pintscher (WMDE) (talk) 13:46, 7 May 2013 (UTC)

Multiple values; linking equivalent items and properties
Hi Lydia,

Good to meet you last weekend. There's something I forgot to mention then:

Please be aware of this discussion about en.Wikipedia's use of Plain list and Flat list in infoboxes, and how they should receive multiple values (e.g. if a person has two occupations) as separate list items. Please let me know if you need further explanation, or if you need me to raise this elsewhere.

Also, as a reminder, I raised with you there the issue that there is no logical connection between items and the equivalent properties (for example, for ORCID: Q51044 and P496), which you kindly said you would raise. Again, please let me know if I can do anything to help.

Andy Mabbett ( Pigsonthewing ); Talk to Andy; Andy's edits 20:28, 31 May 2013 (UTC)


 * Hey :) It was good meeting you at the hackathon! As for the list issue: as someone already mentioned in the discussion there, this should be done with Lua, and as far as I know it is possible. As for the second issue: I think the best way to solve this is probably a note in the property documentation. There's some ongoing discussion at d:Wikidata:Requests for comment/Place of the property documentation. Maybe this would be a good place to raise this? --Lydia Pintscher (WMDE) (talk) 12:08, 3 June 2013 (UTC)
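To illustrate the list issue, here is a minimal sketch, written in Python rather than Lua, of turning several values for one infobox field (for example, two occupations) into separate list items inside the Plainlist template, instead of a comma-joined string. The function name `to_plainlist` is hypothetical; on-wiki this logic would live in a Lua module.

```python
def to_plainlist(values):
    """Render several values as separate list items wrapped in
    {{Plainlist}}, one '* ' line per value."""
    items = "\n".join(f"* {v}" for v in values)
    # Plain (non-f) strings, so the double braces are emitted literally.
    return "{{Plainlist|\n" + items + "\n}}"


print(to_plainlist(["novelist", "journalist"]))
```

Emitting one item per value keeps the list semantics intact for screen readers and for anyone reusing the infobox data.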

Bot programming / operators
Hello Lydia!

I have made some modest proposals here, Wikipedia:Bot requests#New REFBot, and now also here, de:Wikipedia:Bots/Anfragen, but unfortunately they have had little lasting effect. It is simply frustrating and tedious to spend days on en.WP "cleaning up" Category:Pages with citation errors... By my estimate, 95% of these errors could be detected automatically by a bot, which could then notify the users who caused them, so that the errors would be fixed by those users themselves. Many experienced users skip the preview or do not notice the big, bold red warning. IPs and newcomers often do not yet know how to deal with it, but in my experience they are a small minority.

Whom does one need to contact to get something set up by paid staff? How much work would such a REFBot be, along the lines of DPLBot and BrackBot, which work excellently? It would certainly pay off, for example if you convert the man-hours wasted over the past ten years into weeks and months. Thanks. Please notify me if you reply here. Thanks --Frze > talk  12:56, 25 October 2013 (UTC)


 * Hello Lydia! On the English Wikipedia, something seems to be getting started, at least in its early stages. I have now also put the whole thing on the wish list here: de:Wikipedia:Umfragen/Technische Wünsche/Einzelnachweise. Could you please help out, push it forward and rally lots of people? Thanks, and as always, thanks for letting me know on my talk page. --Frze > talk   23:36, 4 November 2013 (UTC)



Frze > talk  — is wishing you a Happy New Year! This greeting (and season) promotes WikiLove and hopefully this note has made your day a little better. Spread the WikiLove by wishing another user a Happy New Year, whether it be someone you have had disagreements with in the past, a good friend, or just some random person. Happy New Year!

Spread the New Year cheer by adding {{subst:New Year 1}} to their talk page with a friendly message.

Wikidata: German to English OK, but wrong in the other direction.
Hi Lydia, as a Wikidata newcomer I have the following problem: the German article Beinlinge points to the English article Hose_(clothing), but that one points back to the wrong German article, Hose. The entry on Wikidata nevertheless seems to be correct. I'm at a loss. --Tobias "ToMar" Maier (talk) 17:04, 2 March 2014 (UTC)
 * Hey :) Sorry, my Internet connection is acting up, which is why I couldn't reply to you in the chat. The problem was that the interwiki link was still set locally: https://en.wikipedia.org/w/index.php?title=Hose_%28clothing%29&diff=597828060&oldid=571770295 That overrides whatever comes from Wikidata. --Lydia Pintscher (WMDE) (talk) 17:30, 2 March 2014 (UTC)
 * Ah, thanks. Then the error message saying that the entry already exists on Wikidata was probably not clear enough, or I didn't read it properly. --Tobias "ToMar" Maier (talk) 17:50, 2 March 2014 (UTC)
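The precedence rule behind the Beinlinge/Hose problem (an interlanguage link set locally in the article's wikitext overrides the sitelink supplied by Wikidata for the same language) can be modelled as a dictionary merge. This is a simplified sketch for illustration, not MediaWiki's actual code; `effective_interwikis` is a hypothetical name, and the keys follow Wikidata's sitelink naming (`dewiki` for the German Wikipedia).

```python
def effective_interwikis(wikidata_links, local_links):
    """Combine Wikidata sitelinks with interlanguage links set in the
    article's wikitext. Local links win for any wiki they cover."""
    merged = dict(wikidata_links)  # copy so the inputs stay unchanged
    merged.update(local_links)     # local entries override Wikidata ones
    return merged


wikidata = {"dewiki": "Beinlinge"}  # what the Wikidata item says
local = {"dewiki": "Hose"}          # stale link left in the en article
print(effective_interwikis(wikidata, local))  # {'dewiki': 'Hose'}
```

Removing the stale local link, as in the diff Lydia linked, leaves only the Wikidata entry, so the correct article is shown again.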

Wikidata: German to English. Problems again.
There is a standalone German article, Beugungsunschärfe, which corresponds to Diffraction blur in English, but that is only a small part of the English article Depth of field. I can't get them linked, not even via the redirect. --Tobias "ToMar" Maier (talk) 09:35, 17 September 2014 (UTC)
 * Hi Tobias. Links to sections of articles are not possible. The enwp article is already connected to a different article on dewp. You can see this in Q215932 and Q850930. In cases like this, you can still link in the wikitext. --Lydia Pintscher (WMDE) (talk) 09:45, 17 September 2014 (UTC)

email to you
On Wikipedia talk:Wikidata, I describe a problem I am having in editing Wikidata. I have sent you an email with some further details about this, which I hope will be helpful. Risker (talk) 19:47, 2 July 2015 (UTC)

Wikidata deployment in Bengali Wikipedia
Hi Lydia,

After attending the wm2016:User digest/Wikidata session and another Wikidata session by Asaf Bartov, I became curious about using Wikidata to help develop content on my home wiki, the Bengali Wikipedia. But when I tried to include data from Wikidata in articles on the Bengali Wikipedia, I found that it was not working, as Wikidata has not yet been deployed on the Bengali Wikipedia. I am not well versed in Wikidata. Could you please tell me when Wikidata will be deployed on the Bengali Wikipedia? Or is there anything, such as taking an initiative, that I can do to get Wikidata deployed there? Thank you :) Tanweer  talk  07:49, 11 July 2016 (UTC)


 * Hey Tanweer :) The software was enabled on Bengali Wikipedia a long time ago. I also just tried a simple test and it works for me. Have you had a look at the tutorial that is currently being developed? --Lydia Pintscher (WMDE) (talk) 08:09, 11 July 2016 (UTC)


 * Oh, on Bengali Wikipedia I had tried to do what Asaf did here. I made a small mistake while typing the code, but once I corrected it, it gave the result. Now it works! Thank you :) Tanweer  talk  13:25, 11 July 2016 (UTC)

A kitten for you!
Hello Lydia, This is a belated thank you for the Wikidata t-shirt you gave me at Wikimania. I love it and wear it proudly. Till we meet again...

Rosiestep (talk) 18:24, 15 July 2016 (UTC) 


 * Awwww thank you ;-) It was good seeing you! --Lydia Pintscher (WMDE) (talk) 12:37, 16 July 2016 (UTC)

Commons creator templates
Lydia, lately I have been working on c:Module:creator, which rewrites Creator infobox templates on Commons to rely as much as possible on Wikidata. There are 1.37M pages on Commons that pull information from Wikidata using c:Module:creator. In case your team notices any issues with how those pages interact with Wikidata, please let me know on my Commons talk page. By the way, something is broken with your Wikidata talk page, and it is not possible to leave you a message there. --Jarekt (talk) 15:29, 12 June 2017 (UTC)
 * Thanks so much for the heads-up! We'll keep an eye on it. And I'll have a look at my Wikidata talk page. --Lydia Pintscher (WMDE) (talk) 08:49, 13 June 2017 (UTC)

Facto Post – Issue 1 – 14 June 2017
MediaWiki message delivery (talk) 09:33, 14 June 2017 (UTC)

Facto Post – Issue 2 – 13 July 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color:	#7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

 

Editorial: Core models and topics
Wikimedians interest themselves in everything under the sun — and then some. Discussion on "core topics" may, oddly, be a fringe activity, and was popular here a decade ago.

The situation on Wikidata today does resemble the halcyon days of the English Wikipedia in 2006. The growth is there, and the reliability and stylistic issues are not yet pressing in on the project. Its Berlin conference at the end of October will have five years of achievement to celebrate. Think Wikimania Frankfurt 2005.

Progress must be made, however, on referencing "core facts". This has two parts: replacing "imported from Wikipedia" in referencing by external authorities; and picking out statements, such as dates and family relationships, that must not only be reliable but be seen to be reliable.

In addition, there are many properties on Wikidata lacking a clear data model. An emerging consensus may push to the front key sourcing and biomedical properties as requiring urgent attention. Wikidata's "manual of style" is currently distributed over thousands of discussions. To make it coalesce, work on such a core is needed.

Links

 * WikiFactMine project pages on Wikidata, including a SPARQL library (in development).
 * Fatameh tool for adding items on scientific papers to Wikidata, by User:T Arrow. It has made a big recent impact. Offline for maintenance as we go to press, it is expected back soon.
 * As of July 2017, Zotero has a Wikidata translator. A personal Zotero library acts as an intermediary in managing and storing citation metadata.
 * GLAM Newsletter June 2017, Wikidata report. This is a good monthly round-up to follow, and welcomes contributions.
 * Exciting and Impressive! The Initiative for Open Citations (I4OC) was launched in April: Infodocket on the first three months.
 * Olivia Solon in San Francisco: the net neutrality protest matters, opinion piece in The Guardian on 11 July.

Editor. Please leave feedback for him. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery
|}

Facto Post – Issue 3 – 11 August 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color:	#7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

Wikimania report
Interviewed by Facto Post at the hackathon, Lydia Pintscher of Wikidata said that the most significant recent development is that Wikidata now accounts for one third of Wikimedia edits. She also stressed the essential growth of human editing. Impressive development work on Internet-in-a-Box featured at the WikiMedFoundation annual conference on Thursday. The hardware is a Raspberry Pi, running Linux and the Kiwix browser. It can operate as a wifi hotspot and support a local intranet in parts of the world lacking phone signal. The medical use case is for those delivering care who have smartphones but have to work in clinics in just such areas, with few reference resources. Wikipedia medical content can be served to their phones, with power supplied by standard lithium battery packs.

Yesterday Katherine Maher unveiled the draft Wikimedia 2030 strategy, featuring a picturesque metaphor, "roads, bridges and villages". Here "bridges" could do with illustration. Perhaps it stands for engineering round or over the obstacles to progress down the obvious highways. Internet-in-a-Box would then do fine as an example.

"Bridging the gap" explains a take on that same metaphor, with its human component. If you are at Wikimania, come talk to WikiFactMine at its stall in the Community Village, just by the 3D-printed display for Bassel Khartabil; come hear talk at 3 pm today in Drummond West, Level 3.

Link

 * Plaudit for the Medical Wikipedia app, content that is loaded into Internet-In-A-Box with other material, such as per-country documentation.

Editor. Please leave feedback for him. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery MediaWiki message delivery (talk) 10:55, 12 August 2017 (UTC)
|}

Deploy Wikibase notifications to Wikimedia projects
Hi Lydia. I'm not sure, but you appear to be in charge of the Wikibase Notifications project. I am going to start by thanking Trizek (WMF). He gave you good advice in Phab T142102. Wikidata on Wikipedia is indeed a controversial topic, and you indeed should have set this as opt-in. The feature itself is relatively innocuous (and unneeded). However the intent and implementation are seriously bad. I was half way through writing an RFC, when I decided to try to talk to you first. You'll get roasted. Seriously seriously roasted. There are two problems.


 * 1) The discussion in the Phab task makes it abundantly clear that the sole purpose for this project is as an advertisement for Wikidata.
 * 2) The discussion in the Phab task makes it abundantly clear that you set existing users opt-in and new users opt-out as a deliberate stealth deployment to force it on new users.

One of the few things that editors hate more than advertisements on Wikipedia is deceptive/bad-faith tactics by the WMF. You will get roasted.

I really really want to run the RFC because I really really want a consensus asking the WMF-as-a-whole to quit the "new user opt-out" tactic. However in the interest of avoiding yet another firebomb on the already strained community-WMF relationship, and to avoid getting you roasted, I came here first. Do you think new users should be set to opt-in? Globally. Or maybe just scrap it? You said yourself "If we do that opt-in we might as well not have this feature at all." Alsee (talk) 07:56, 12 September 2017 (UTC)


 * Hi Alsee :) It was a project developed by a volunteer. I helped get the code rolled out with my team at WMDE. The WMF has nothing to do with this part of the feature.
 * I can see how you came to your conclusion. But I acted in the best interest of the Wikimedia movement. It was not, and is not, my intention to sneak anything past the community. I have received several complaints from editors on various projects that they have to teach new people to connect new articles to the corresponding Wikidata item. This feature is intended to remove some of that friction and act as a small educational hint. Many existing users don't need this hint and would be more annoyed than helped by it, so making it opt-in for them makes sense. For new people the situation is different. They very likely have no idea about Wikidata and are extremely unlikely to go into their settings to turn on something they know nothing about. Setting this to opt-out makes more sense for them.
 * I hope this makes my decision a bit more understandable. Let me know if you have more questions. --Lydia Pintscher (WMDE) (talk) 19:48, 12 September 2017 (UTC)
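The default under discussion (the hint off for accounts that existed before the rollout, so they must opt in; on for newly created accounts, which may opt out) boils down to a small per-account rule. The sketch below is a hypothetical model for illustration only: the function name, the registration-date comparison and the rollout date are assumptions, and the real deployment was configured through MediaWiki's own preference mechanism, which is not shown here.

```python
from datetime import datetime
from typing import Optional


def hint_enabled(registered_at: datetime, rollout_at: datetime,
                 user_choice: Optional[bool] = None) -> bool:
    """Effective setting of the hint for one account (hypothetical model)."""
    # An explicit choice in the user's preferences always wins.
    if user_choice is not None:
        return user_choice
    # Otherwise: off for accounts predating the rollout (opt-in),
    # on for accounts created at or after it (opt-out).
    return registered_at >= rollout_at


rollout = datetime(2017, 9, 1)  # hypothetical rollout date
print(hint_enabled(datetime(2012, 5, 1), rollout))  # False: existing user
print(hint_enabled(datetime(2017, 9, 5), rollout))  # True: new user
```

The disagreement in this thread is about exactly which branch applies to whom by default, not about the mechanism itself.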

Facto Post – Issue 4 – 18 September 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color:	#7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

Editorial: Conservation data
The IUCN Red List update of 14 September led with a threat to North American ash trees. The International Union for Conservation of Nature produces authoritative species listings that are peer-reviewed. Examples used as metonyms for loss of species and biodiversity, and theoretical discussion of extinction rates, are the usual topics covered in the media to inform us about this area. But actual data matters. Clearly, conservation work depends on decisions about what should be done, and where. While animals, particularly mammals, are photogenic, species numbers run into millions. Plant species lie at the base of typical land-based food chains, and vegetation is key to the habitats of most animals.

ContentMine dictionaries, for example as tabulated at d:Wikidata:WikiFactMine/Dictionary list, enable detailed control of queries about endangered species, in their taxonomic context. To target conservation measures properly, species listings running into the thousands are not what is needed: range maps showing current distribution are. Between the will to act, and effective steps taken, the services of data handling are required. There is now no reason at all why Wikidata should not take up the burden.

Links

 * What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata (paywall)
 * Wikimedia and the free knowledge ecosystem by Maria Cruz
 * Another Year Again: 2017 this time (long), blog by Joe Wass of CrossRef
 * Attack of the 50 Foot Blockchain, blog by User:David Gerard
 * WikiTribune in beta

Editor. Please leave feedback for him. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery MediaWiki message delivery (talk) 14:46, 18 September 2017 (UTC)
|}

Facto Post – Issue 5 – 17 October 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color:	#7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

Editorial: Annotations
Annotation is nothing new. The glossators of medieval Europe annotated between the lines, or in the margins of legal manuscripts of texts going back to Roman times, and created a new discipline. In the form of web annotation, the idea is back, with texts being marked up inline, or with a stand-off system. Where could it lead? ContentMine operates in the field of text and data mining (TDM), where annotation, simply put, can add value to mined text. It now sees annotation as a possible advance in semi-automation, the use of human judgement assisted by bot editing, which now plays a large part in Wikidata tools. While a human judgement call of yes/no, on the addition of a statement to Wikidata, is usually taken as decisive, it need not be. The human assent may be passed into an annotation system, and stored: this idea is standard on Wikisource, for example, where text is considered "validated" only when two different accounts have stated that the proof-reading is correct. A typical application would be to require more than one person to agree that what is said in the reference translates correctly into the formal Wikidata statement. Rejections are also potentially useful to record, for machine learning.
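The Wikisource-style rule mentioned above (a statement counts as validated only once two different accounts have assented, while rejections are also recorded for later machine learning) can be sketched as a small record type. The class and method names are hypothetical, and the threshold of two is taken from the Wikisource example; this is an illustration, not WikiFactMine's design.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedStatement:
    """A candidate Wikidata statement awaiting human review (hypothetical)."""
    text: str
    approvals: set = field(default_factory=set)   # accounts that said yes
    rejections: set = field(default_factory=set)  # accounts that said no

    def record(self, account: str, agrees: bool) -> None:
        # Sets ensure that repeated judgements by one account count once.
        (self.approvals if agrees else self.rejections).add(account)

    def validated(self, required: int = 2) -> bool:
        # Validated only when enough *distinct* accounts have approved.
        return len(self.approvals) >= required


s = ProposedStatement("example claim drawn from a reference")
s.record("Alice", True)
s.record("Alice", True)   # same account again: still one approval
print(s.validated())      # False
s.record("Bob", True)
print(s.validated())      # True
```

Keeping the rejections alongside the approvals provides the audit trail and the negative examples the editorial mentions.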

As a contribution to data integrity on Wikidata, annotation has much to offer. Some "hard cases" on importing data are much more difficult than average. There are for example biographical puzzles: whether person A in one context is really identical with person B, of the same name, in another context. In science, clinical medicine requires special attention to sourcing (WP:MEDRS), and is challenging in terms of connecting findings with the methodology employed. Currently decisions in areas such as these, on Wikipedia and Wikidata, are often made ad hoc. In particular there may be no audit trail for those who want to check what is decided.

Annotations are subject to a World Wide Web Consortium standard, and behind the terminology constitute a simple JSON data structure. What WikiFactMine proposes to do with them is to implement the MEDRS guideline, as a formal algorithm, on bibliographical and methodological data. The structure will integrate with those inputs the human decisions on the interpretation of scientific papers that underlie claims on Wikidata. What is added to Wikidata will therefore be supported by a transparent and rigorous system that documents decisions.
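To show what "a simple JSON data structure" means in practice, the sketch below builds a minimal annotation in the shape defined by the W3C Web Annotation Data Model: the `@context`, `id`, `type`, `body` and `target` keys, and the `TextualBody` and `TextQuoteSelector` types, come from that standard. The helper name, URLs, quoted passage and comment are placeholders, not WikiFactMine data.

```python
import json


def make_annotation(annotation_id, source_url, quoted_text, comment):
    """Build a minimal W3C Web Annotation that attaches a comment (body)
    to an exact passage (target) of a source document."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": quoted_text,
            },
        },
    }


anno = make_annotation(
    "https://example.org/annotations/1",  # placeholder annotation IRI
    "https://example.org/paper",          # placeholder source document
    "randomised controlled trial",        # placeholder quoted passage
    "Methodology: RCT",                   # placeholder human judgement
)
print(json.dumps(anno, indent=2))
```

Because the human decision sits in the body and the exact evidence in the target, each judgement can later be checked against its source, which is the audit trail the editorial calls for.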

An example of the possible future scope of annotation, for medical content, is in the first link below. That sort of detailed abstract of a publication can be a target for TDM, adds great value, and could be presented in machine-readable form. You are invited to discuss the detailed proposal on Wikidata, via its talk page.

Links

 * Jon Udell, blogpost Annotating to extract findings from scientific papers, 15 December 2015
 * TDM and Libraries, Virginia Tech report
 * Magnus Manske, The Whelming: Scaling up Wikidata editing
 * OCLC and Internet Archive collaborate to expand library access to digital collections, metadata and linking exchange
 * GLOW week in November: Wikidata workshops on politician info

Editor. Please leave feedback for him. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery MediaWiki message delivery (talk) 08:46, 17 October 2017 (UTC)
|}

Facto Post – Issue 6 – 15 November 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

WikidataCon Berlin 28–9 October 2017
Under the heading rerum causas cognoscere, the first ever Wikidata conference got under way in the Tagesspiegel building with two keynotes. One was on YAGO, a knowledge base conceived ten years ago on the assumption of automatic compilation from Wikipedia. The other was from manager Lydia Pintscher, on the "state of the data". Interesting rumours flourished: the mix'n'match tool and its 600+ datasets, mostly in digital humanities, is to be taken off the hands of its author Magnus Manske by the WMF; a Wikibase incubator site is on its way. Announcements came in talks: structured data on Wikimedia Commons is scheduled to make substantive progress by 2019. The lexeme development on Wikidata is now not expected to make the Wiktionary sites redundant, but may facilitate automated compilation of dictionaries. And so it went, with five strands of talks and workshops, through to 11 pm on Saturday. Wikidata applies to GLAM work via metadata. It may be used in education, raises issues such as author disambiguation, and lends itself to different types of graphical display and reuse. Many millions of SPARQL queries are run on the site every day. Over the summer a large open science bibliography came into existence there.

Wikidata's fifth birthday party on the Sunday brought matters to a close. See more than a dozen reports by others.

Links

 * Wikidata statistics
 * I4OC progress in its first year, with 47% of scientific citation data now open (announced two days ago)
 * The flowering ORCID, Magnus Manske blogpost on identifying authors of scientific papers
 * @querybook, a Twitter feed devoted to SPARQL queries
 * Massive progress on Wikidata coverage of the UK parliament
 * Reminder: WikiFactMine pages on Wikidata are at WD:WFM

Editor. Please leave feedback for him. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery MediaWiki message delivery (talk) 10:02, 15 November 2017 (UTC)
|}

Facto Post – Issue 7 – 15 December 2017
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

A new bibliographical landscape
At the beginning of December, Wikidata items on individual scientific articles passed the 10 million mark. This figure contrasts with the state of play in early summer, when there were around half a million. In the big picture, Wikidata is now documenting the scientific literature at a rate that is about eight times as fast as papers are published. As 2017 ends, progress is quite evident.

Behind this achievement are a technical advance (fatameh) and bots that do the lifting. Much more than dry migration of metadata is potentially involved, however. If paper A cites paper B, and both papers have an item, a link can be created on Wikidata, and the information presented to both human readers and machines. This cross-linking is one of the most significant aspects of the scientific literature, and a long-sought open version of it is now rapidly being built up. The campaign to lift copyright restrictions on citation data of this kind had real momentum behind it during 2017. WikiCite and the I4OC have been pushing hard, with the result that over 50% of the citation data on CrossRef is now open. Now the holdout publishers are being lobbied to release rights on citations.
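The cross-linking step described above (if paper A cites paper B and both have items, a link can be created) can be sketched as follows. On Wikidata such links use the "cites work" property, P2860. The function name, the DOI-to-item lookup table and the sample identifiers are hypothetical, and a real bot would write the statements through the Wikidata API rather than return tuples.

```python
def citation_statements(citations, doi_to_item):
    """Turn (citing DOI, cited DOI) pairs into (subject, property, object)
    triples, skipping any pair where either paper lacks a Wikidata item."""
    CITES_WORK = "P2860"  # Wikidata property "cites work"
    statements = []
    for citing, cited in citations:
        a, b = doi_to_item.get(citing), doi_to_item.get(cited)
        if a and b:  # both papers must already have items
            statements.append((a, CITES_WORK, b))
    return statements


# Hypothetical DOIs and item IDs, for illustration only.
items = {"10.1000/a": "Q_A", "10.1000/b": "Q_B"}
pairs = [("10.1000/a", "10.1000/b"), ("10.1000/a", "10.1000/c")]
print(citation_statements(pairs, items))  # [('Q_A', 'P2860', 'Q_B')]
```

The second pair is skipped because its cited paper has no item yet; as more papers receive items, rerunning the same step fills in the missing links.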

But all that is just the beginning. Topics of papers are identified, authors disambiguated, with significant progress on the use of the four million ORCID IDs for researchers, and proposals formulated to identify methodology in a machine-readable way. P4510 on Wikidata has been introduced so that methodology can sit comfortably on items about papers.

More is on the way. OABot applies the unpaywall principle to Wikipedia referencing. It has been proposed that Wikidata could assist WorldCat in compiling the global history of book translation. Watch this space.

And make promoting #1lib1ref one of your New Year's resolutions. Happy holidays, all!



Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 14:54, 15 December 2017 (UTC)
 * WikidataCon: Giving more people more access to more knowledge, report by Peter Kraker of Open Knowledge Maps
 * This is a story of my knowledge adventure in New Zealand moths via Wikicommons, Wikipedia and Wikidata, @SiobhanLeachman
 * Wikidata and Arabic dialects, research paper, DOI: 10.1109/AICCSA.2017.115
 * c:Commons:British Library/Mechanical Curator collection/georeferencing status, Mechanical Curator project on Commons hits 50K maps milestone
 * Historical dataset on the provenance of Wikipedia text: Who wrote this?, by Tilman Bayer, WMF blogpost
 * "Anyone can edit", not everyone does: Wikipedia and the gender gap (PDF), journal paper, Heather Ford and Judy Wajcman
 * Alpha Zero’s "Alien" Chess Shows the Power, and the Peculiarity, of AI, MIT Technology Review, by Will Knight, December 8, 2017
 * }

Facto Post – Issue 8 – 15 January 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

Metadata on the March
Since the days of hard-copy liner notes on music albums, metadata have stood outside a piece or file, while adding to our understanding of where it comes from and of what needs to be appreciated about its content. In the GLAM sector, the accumulation of accurate metadata for objects is key to the mission of an institution, and to its presentation in cataloguing.

Today Wikipedia turns 17, with worlds still to conquer. Zooming out from the individual GLAM object to the ontology in which it is set, one such world becomes apparent: GLAMs use custom ontologies, and those introduce massive incompatibilities. From a recent article by, we quote the observation that "vocabularies needed for many collections, topics and intellectual spaces defy the expectations of the larger professional communities." A job for the encyclopedist, certainly. But the data-minded Wikimedian has the advantages of Wikidata, starting with its multilingual data and its facility with aliases. The controlled vocabulary (sometimes called a "thesaurus" as a term of art) simplifies search: if a "spade" must be called that, rather than a "shovel", it is easier to find all references to spades. That control comes at a cost, and case studies in that article show what can lie ahead. The schema crosswalk, in jargon, is a potential answer to the GLAM Babel of proliferating and expanding vocabularies. Even if you have no interest in Wikidata as such, consider two vocabularies V and W: if both are matched to Wikidata, then a "crosswalk" arises from term v in V to term w in W whenever v and w both match the same item d in Wikidata.
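The crosswalk construction is, in effect, a join on the shared Wikidata item. A minimal sketch, assuming each vocabulary's matches are held as a simple term-to-item mapping; the terms and item IDs below are hypothetical placeholders, not real Wikidata matches.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def crosswalk(v_to_wd: Dict[str, str], w_to_wd: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair term v in V with term w in W whenever both match the same Wikidata item d."""
    # Invert W's matches: item d -> all terms in W matched to d.
    w_by_item: Dict[str, List[str]] = defaultdict(list)
    for w, d in w_to_wd.items():
        w_by_item[d].append(w)
    pairs = []
    for v, d in v_to_wd.items():
        # .get avoids creating empty entries for items W never matched.
        for w in w_by_item.get(d, []):
            pairs.append((v, w))
    return sorted(pairs)


# Hypothetical English and German vocabularies matched to made-up item IDs.
v_matches = {"spade": "Q101", "trowel": "Q102", "hoe": "Q103"}
w_matches = {"Spaten": "Q101", "Kelle": "Q102", "Rechen": "Q104"}
print(crosswalk(v_matches, w_matches))
# [('spade', 'Spaten'), ('trowel', 'Kelle')]
```

Neither vocabulary needs to know about the other: each is matched once to Wikidata, and every pairwise crosswalk then comes for free. Terms with no shared item ("hoe", "Rechen") simply drop out.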

For metadata mobility, match to Wikidata. It's apparently that simple: infrastructure requirements have turned out, so far, to be challenges that can be met.

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 12:38, 15 January 2018 (UTC)
 * <div style="color: #936c29; font-family: Copperplate, 'Copperplate Gothic Light', serif">1lib1ref campaign starts today, see The Wikipedia Library/1Lib1Ref: also #1lib1ref introductory video by
 * Funders should mandate open citations, article 9 January 2018 in Nature by David Shotton
 * From snowflake to avalanche: Possibilities of using free citation data in libraries, translation from the German original of Annette Klein, Mannheim University Library
 * GLAM/Newsletter/December 2017/Contents/WMF GLAM report
 * Why Mickey Mouse’s 1998 copyright extension probably won't happen again: Copyrights from the 1920s will start expiring next year if Congress doesn't act, Timothy B. Lee, 8 January 2018, Arstechnica
 * }

Facto Post – Issue 9 – 5 February 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

m:Grants:Project/ScienceSource is the new ContentMine proposal: please take a look.

Wikidata as Hub
One way of looking at Wikidata relates it to the semantic web concept, which has been around for about as long as Wikipedia and is realised in dozens of distributed Web institutions. It sees Wikidata as supplying central, encyclopedic coverage of linked structured data, and looks ahead to greater support for "federated queries" that draw together information from all parts of the emerging network of websites. Another perspective might be likened to a photographic negative of that one: Wikidata as an already-functioning Web hub. Over half of its properties are identifiers on other websites. These are Wikidata's "external links", to use Wikipedia terminology: one type for the DOI of a publication, another for the VIAF page of an author, with thousands more such. Wikidata links out to sites that are not nominally part of the semantic web, effectively drawing them into a larger system. The crosswalk possibilities of systematically constructing these links were covered in Issue 8.

Wikipedia's External links guideline speaks of such links as kept "minimal, meritable, and directly relevant to the article." Here Wikidata finds more of a function. On viaf.org one can type a VIAF author identifier into the search box and find the author page. The Wikidata Resolver tool, which these days also covers OpenStreetMap, Scholia and more, allows this kind of lookup. The hub tool by goes a major step further, allowing both lookup and crosswalk to be encoded in a single URL.
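The difference between plain lookup and lookup-plus-crosswalk can be sketched against a tiny local snapshot of items. This is a minimal illustration only: it does not model the Resolver or hub tools' real URL formats or APIs. It assumes P214 (VIAF ID) and P496 (ORCID iD) as the two external-identifier properties; the item IDs and identifier values are made up.

```python
from typing import Dict, Optional

# Hypothetical local snapshot: item ID -> {property: external identifier}.
# P214 is Wikidata's VIAF ID property and P496 its ORCID iD property;
# the item IDs and identifier values below are invented for illustration.
ITEMS: Dict[str, Dict[str, str]] = {
    "Q900001": {"P214": "123456789", "P496": "0000-0000-0000-0001"},
    "Q900002": {"P214": "987654321"},
}


def resolve(prop: str, value: str, items: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Lookup: find the item carrying identifier `value` for property `prop`."""
    for qid, ids in items.items():
        if ids.get(prop) == value:
            return qid
    return None


def crosswalk_id(prop: str, value: str, target: str,
                 items: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Lookup plus crosswalk: go from one external identifier to another via the item."""
    qid = resolve(prop, value, items)
    if qid is None:
        return None
    return items[qid].get(target)


print(resolve("P214", "123456789", ITEMS))               # Q900001
print(crosswalk_id("P214", "123456789", "P496", ITEMS))  # 0000-0000-0000-0001
print(crosswalk_id("P214", "987654321", "P496", ITEMS))  # None
```

The item acts purely as the hub: the caller starts and ends with identifiers from other websites, and never needs to know the Wikidata item ID.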

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 11:50, 5 February 2018 (UTC)
 * What galleries, libraries, archives, and museums can teach us about multimedia metadata on Wikimedia Commons, Wikimedia Foundation blogpost, 29 January 2018, by Jonathan Morgan and Sandra Fauconnier
 * The Wikipedia Library/1Lib1Ref/Connect, 2018 institutional participation in the #1lib1ref campaign
 * Newspeak House queries, created at 3 February 2018 event in London led by
 * Cochrane–Wikipedia Initiative, Wikipedia Signpost special report 5 February 2018, by
 * What is the Last Question?, 5 February 2018
 * }

Facto Post – Issue 10 – 12 March 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

Milestone for mix'n'match
Around the time in February when Wikidata clicked past item Q50000000, another milestone was reached: the mix'n'match tool uploaded its 1000th dataset. Concisely defined by its author,, it works "to match entries in external catalogs to Wikidata". The total number of entries is now well into eight figures, and more are constantly being added: a couple of new catalogs each day is normal.

Since the end of 2013, mix'n'match has gradually come to play a significant part in adding statements to Wikidata. It has been used particularly in areas with the flavour of digital humanities, but datasets can of course be about practically anything. There is a catalog on skyscrapers, and there are two on spiders.

These days mix'n'match can be used in numerous modes, from a relaxed, gamified click-through of a catalog looking for matches, with prompts, to the fantastically useful and often demanding search across all catalogs. I'll type that again: you can search 1000+ datasets from the simple box at the top right. The drop-down menu at the top left offers "creation candidates", Magnus's personal favourite. See Mix'n'match/Manual for more.

For the Wikidatan, a key point is that these matches, however carried out, add statements to Wikidata if, and naturally only if, there is a Wikidata property associated with the catalog. For everyone, however, the hands-on experience of deciding what is a good match is an education in a scholarly area, with biographical catalogs being particularly fraught. Underpinning recent rapid progress is an open infrastructure for scraping and uploading.
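The "if, and only if, there is a property" rule can be sketched as follows. This is a simplified model, not mix'n'match's actual code or data format. It assumes a catalog is just a name, an optional property ID and a list of confirmed (entry, item) matches. The catalog names, entry IDs and item IDs are hypothetical, and "P_EXAMPLE" stands in for a real property ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Catalog:
    name: str
    # Wikidata property for this catalog's identifiers, if one exists.
    property_id: Optional[str]
    # Confirmed matches as (catalog entry ID, Wikidata item ID) pairs.
    matches: List[Tuple[str, str]] = field(default_factory=list)


def statements_from_matches(catalog: Catalog) -> List[Tuple[str, str, str]]:
    """Return (item, property, entry ID) statements, or none if the catalog has no property."""
    if catalog.property_id is None:
        return []
    return [(qid, catalog.property_id, entry) for entry, qid in catalog.matches]


# Hypothetical catalogs; "P_EXAMPLE" stands in for a real property ID.
with_prop = Catalog("spiders-a", "P_EXAMPLE", [("sp-17", "Q123")])
without_prop = Catalog("skyscrapers", None, [("sk-4", "Q456")])
print(statements_from_matches(with_prop))     # [('Q123', 'P_EXAMPLE', 'sp-17')]
print(statements_from_matches(without_prop))  # []
```

In this model, matches made against a catalog without a property are still recorded; they simply have no statement to become until a property is created for that catalog.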

Congratulations to Magnus, our data Stakhanovite!

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 12:26, 12 March 2018 (UTC)
 * Wikipedia goes 3D allowing users to upload .STLs for digital reference, Beau Jackson for 3dprintingindustry.com, February 22 2018
 * WikiCite report (video)
 * Formal publication and announcement of ISBN citation dataset, see Twitter post, February 23 2018
 * Plotting the Course Through Charted Waters, workshop on data visualization literacy from Mikhail Popov, Wikimedia Foundation
 * Using Wikidata to build an authority list of Holocaust-era ghettos, Nancy Cooey, United States Holocaust Memorial Museum, February 12 2018
 * Why Should You Learn SPARQL? Wikidata! Mark Longair, blogpost November 29 2017
 * Back to the future: Does graph database success hang on query language?, George Anadiotis for Big on Data, March 5 2018
 * }

Facto Post – Issue 11 – 9 April 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

The 100 Skins of the Onion
Open Citations Month, with its eminently guessable hashtag, is upon us. We should be utterly grateful that in the past 12 months, so much data on which papers cite which other papers has been made open, and that Wikidata is playing its part in hosting it as "cites" statements. At the time of writing, there are 15.3M Wikidata items that can do that.

Pulling back to look at open access papers in the large, though, there is less reason for celebration. Access in theory does not yet equate to practical access. A recent LSE IMPACT blogpost puts that issue down to "heterogeneity": a useful euphemism that saves us from concluding that the whole concept falls into the realm of the oxymoron.

Some home truths: aggregation is not content management, if it falls short on reusability. The PDF file format is wedded to how humans read documents, not how machines ingest them. The salami-slicer is our friend in the current downloading of open access papers, but for a better metaphor, think about skinning an onion, laboriously, 100 times with diminishing returns. There are of the order of 100 major publisher sites hosting open access papers, and the predominant offer there is still a PDF. From the discoverability angle, Wikidata's bibliographic resources combined with the SPARQL query are superior in principle, by far, to existing keyword searches run over papers. Open access content should be managed into consistent HTML, something that is currently strenuous. The good news, such as it is, would be that much of it is already in XML. The organisational problem of removing further skins from the onion, with sensible prioritisation, is certainly not insuperable. The CORE group (the bloggers in the LSE posting) has some answers, but actually not all that is needed for the text and data mining purposes they highlight. The long tail, or in other words the onion heart when it has become fiddly beyond patience to skin, does call for a pis aller. But the real knack is to do more between the XML and the heart.

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 16:25, 9 April 2018 (UTC)
 * Crossref as a new source of citation data: A comparison with Web of Science and Scopus, CWTS blogpost 17 January 2018, Nees Jan van Eck, Ludo Waltman, Vincent Larivière, Cassidy Sugimoto
 * Citations with identifiers in Wikipedia, figshare dataset
 * Making women more visible online—with Wikidata tools!, Wikimedia blogpost 29 March 2018 by Sandra Fauconnier
 * Village pump discussion, Turn on mapframe? We’re ready if you are reaches conclusions
 * The Power of the Wikimedia Movement beyond Wikimedia, Forbes 28 March 2018, Michael Bernick
 * Tracing stolen bitcoin, blogpost 26 March 2018 by Ross J. Anderson
 * }

Facto Post – Issue 12 – 28 May 2018
{| style="position: relative; margin-left: 2em; margin-right: 2em; padding: 0.5em 1em; background-color: #7FFFD4; border: 2px solid #00FFFF; border-color: rgba( 109, 193, 240, 0.75 ); border-radius: 8px; box-shadow: 8px 8px 12px rgba( 0, 0, 0, 0.7 );"

<div style="position: absolute; top: -20px; right: -12px; background-color: white; border: 3px solid black; padding:10px;"> <hr style="border-bottom: 1px solid rgba( 109, 193, 240, 0.75 );" />

ScienceSource funded
The Wikimedia Foundation announced full funding of the ScienceSource grant proposal from ContentMine on May 18. See the ScienceSource Twitter announcement and 60 second video.

The proposal includes downloading 30,000 open access papers, aiming (roughly speaking) to create a baseline for medical referencing on Wikipedia. It leaves open the question of how these are to be chosen.
 * A medical canon?

The basic criteria of WP:MEDRS include a concentration on secondary literature. Attention has to be given to the long tail of diseases that receive less current research. The MEDRS guideline supposes that edge cases will have to be handled, and the premature exclusion of publications that would be in those marginal positions would reduce the value of the collection. Prophylaxis misses the point that gate-keeping will be done by an algorithm.

Two well-known but rather different areas where such considerations apply are tropical diseases and alternative medicine. There are also a number of potential downloading troubles, and these were mentioned in Issue 11. There is likely to be a gap, even with the guideline, between conditions taken to be necessary but not sufficient, and conditions sufficient but not necessary, for candidate papers to be included. With around 10,000 recognised medical conditions in standard lists, being comprehensive is demanding. With all of these aspects of the task, ScienceSource will seek community help.
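The gap between necessary and sufficient conditions can be made concrete with a three-way triage. This is a minimal sketch under stated assumptions: the two criteria used ("open access" as necessary, "is a review" as sufficient) are illustrative stand-ins and are not ScienceSource's actual selection rules. The paper records are hypothetical.

```python
from typing import Callable, Dict, List

Paper = Dict[str, object]
Condition = Callable[[Paper], bool]


def triage(papers: List[Paper], necessary: List[Condition],
           sufficient: List[Condition]) -> Dict[str, List[str]]:
    """Sort candidate papers into include / exclude / gap buckets."""
    result: Dict[str, List[str]] = {"include": [], "exclude": [], "gap": []}
    for paper in papers:
        if not all(cond(paper) for cond in necessary):
            # Fails a necessary condition: cannot be included.
            result["exclude"].append(str(paper["id"]))
        elif any(cond(paper) for cond in sufficient):
            # Meets a sufficient condition: include outright.
            result["include"].append(str(paper["id"]))
        else:
            # Passes every necessary test but no sufficient one: needs human judgement.
            result["gap"].append(str(paper["id"]))
    return result


# Illustrative criteria only, not ScienceSource's actual rules.
necessary = [lambda p: bool(p["open_access"])]
sufficient = [lambda p: bool(p["review"])]
papers = [
    {"id": "p1", "open_access": True, "review": True},
    {"id": "p2", "open_access": True, "review": False},
    {"id": "p3", "open_access": False, "review": True},
]
print(triage(papers, necessary, sufficient))
# {'include': ['p1'], 'exclude': ['p3'], 'gap': ['p2']}
```

The "gap" bucket is where an algorithm alone cannot decide, and where community help would be needed.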

Links
To subscribe to Facto Post go to Facto Post mailing list. For the ways to unsubscribe, see below. Editor, for ContentMine. Please leave feedback for him. Back numbers are here. Reminder: WikiFactMine pages on Wikidata are at WD:WFM. ScienceSource pages will be announced there, and in this mass message. If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page. Newsletter delivered by MediaWiki message delivery (talk) 10:16, 28 May 2018 (UTC)
 * d:Wikidata:Lexicographical data, Wikidata's multi-lingual dictionary project gets going
 * Ordia tool, a basic search interface for Wikidata lexemes and forms
 * OpenRefine tool 3.0, May update allows wrangling of tabular information into Wikidata
 * d:Wikidata:WikiProject British Politicians pushes ahead with data modelling and imports
 * #1Lib1Ref Returns for a Second Time in 2018, IFLA blogpost 25 May 2018, second chance this year to participate in referencing Wikipedia
 * }

Facto Post – Issue 13 – 29 June 2018
MediaWiki message delivery (talk) 18:19, 29 June 2018 (UTC)

Facto Post – Issue 14 – 21 July 2018
MediaWiki message delivery (talk) 06:10, 21 July 2018 (UTC)

Facto Post – Issue 15 – 21 August 2018
MediaWiki message delivery (talk) 13:23, 21 August 2018 (UTC)

Facto Post – Issue 16 – 30 September 2018
MediaWiki message delivery (talk) 17:57, 30 September 2018 (UTC)

Facto Post – Issue 17 – 29 October 2018
MediaWiki message delivery (talk) 15:01, 29 October 2018 (UTC)

Facto Post – Issue 18 – 30 November 2018
MediaWiki message delivery (talk) 11:20, 30 November 2018 (UTC)

Facto Post – Issue 19 – 27 December 2018
MediaWiki message delivery (talk) 19:08, 27 December 2018 (UTC)

Facto Post – Issue 20 – 31 January 2019
MediaWiki message delivery (talk) 10:53, 31 January 2019 (UTC)

Facto Post – Issue 21 – 28 February 2019
MediaWiki message delivery (talk) 10:02, 28 February 2019 (UTC)

Facto Post – Issue 22 – 28 March 2019
MediaWiki message delivery (talk) 11:45, 28 March 2019 (UTC)

Facto Post – Issue 23 – 30 April 2019
MediaWiki message delivery (talk) 11:27, 30 April 2019 (UTC)

Facto Post – Issue 24 – 17 May 2019
MediaWiki message delivery (talk) 18:52, 17 May 2019 (UTC)