Wikipedia:GLAM/NZThesisProject

Background
This project is focussed on uploading metadata for New Zealand academic theses to Wikidata, in order for them to be more openly citable and accessible. We believe this is the first attempt to upload a national dataset of theses.

The project came about while Giantflightlessbirds was a Wikipedian in Residence at Lincoln University. During that short residency, librarian Zeborah raised the possibility of adding Lincoln University's theses to Wikidata. She had an opportunity to present to her academic librarian colleagues on adding thesis metadata to Wikidata at the online conference Aotearoa Institutional Repositories Community Days (30 September – 1 October 2021). In preparation for this presentation she reached out to Giantflightlessbirds, who in turn invited Ambrosia10 and DrThneed to join the discussion. This group met several times to discuss the proposal of uploading all New Zealand academic theses into Wikidata and to prepare for the presentation at the conference.

Discussion documents, slides and other project documentation are being collated in Google Docs folders, as some of the participating academic librarians are not Wikimedians. Some of this documentation is linked from the Documentation section of this page.

Scope
We collected metadata for theses from New Zealand universities and polytechnics, and uploaded a core set of statements for each thesis in the first instance. After this core set of statements was uploaded, we increased the findability and linkage of the theses by adding keywords, often from controlled vocabularies such as ANZSRC, which could be mapped to main subject statements. We also connected theses to degree programmes and advisors.

A dataset of approximately 66,500 theses has been compiled from 13 New Zealand institutions. The theses range from diploma and bachelor's theses through to Doctor of Science, and span the period 1907 to 2022. Whilst many of the theses are digitised and available through an institutional repository, others are represented only by their metadata. Because the data varied both within and between institutions, a lot of clean-up and standardisation was required.

Funding
The thesis dataset is large and complex, with 66.5k items in several languages. It included some apparent duplicate items within and between institutions that needed to be clarified with the academic librarians involved, and some incomplete data that needed follow-up. The inconsistencies in data format between institutions took a lot of time to standardise and clean up. The data cleaning and wrangling was supported by a grant from Wikimedia Aotearoa New Zealand (WANZ). WANZ also supported the presentation of this work at LIANZA in 2023.

Progress
The upload of a core set of statements for the full theses dataset was completed in May/June 2022. Work to match main subjects, authors and advisors continues as of 2024.

A small team met in the second half of 2022 to work on ANZSRC vocabularies in Wikidata, as a necessary prelude to uploading keywords to the thesis items. Although the vocabularies are not yet completely mapped to Wikidata, all the terms used in the project were mapped, so that main subject statements could be uploaded.

Events

 * First meeting with librarians 2021
 * Second update with project members and librarians, 25 March 2022 (Dissertations sample data progress WC.pdf): DrThneed presented her findings to the project participants and contributors, and requested feedback from the contributing libraries on issues raised by the trial upload.
 * Third update, 28 July 2022 (NZThesisProject update July 2022.pdf): showed how theses are connected in Wikidata and cited in Wikipedia, some of the data visualisations now possible, and tools to improve the data.
 * Presentation at Wikimania in Singapore on 20 August 2023 (see Documentation for recording)
 * Presentation to LIANZA conference in Christchurch 31 October 2023
 * Presentation to Christchurch librarians 22 April 2024

Documentation

 * Cradle model
 * Data schema for theses, authors and advisors, on Wikidata
 * Google doc documenting the process and progress of the project
 * Documentation giving recommendations to librarians when providing data
 * This Month in GLAM March 2022 report
 * YouTube video of DrThneed's report on project progress
 * Recording of DrThneed's presentation at Wikimania 2023 in Singapore on 20 August 2023

Tools
DrThneed has made some Wikidata property dashboards to track progress on the project. They are linked from the Wikidata project page. One table shows properties for theses, and another shows properties for people (thesis authors). A third table shows some properties we don't expect to find, like volume number and published in. This helps check that our thesis items haven't been inappropriately merged with other types of publications.
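The idea behind the third table can be sketched in a few lines of Python. This is only an illustration, not how the dashboards are built: the helper name `unexpected_properties` and the sample items are hypothetical, and the check assumes each item's statements are available as a mapping from Wikidata property IDs to lists of values (P478 is volume, P1433 is published in).

```python
# Properties that normally belong to articles or chapters, not theses.
UNEXPECTED_PROPERTIES = {
    "P478": "volume",
    "P1433": "published in",
}


def unexpected_properties(claims):
    """Return the labels of unexpected properties present on an item.

    `claims` maps property IDs to lists of values (an assumed, simplified
    stand-in for the statements on a Wikidata item).
    """
    return sorted(
        label
        for pid, label in UNEXPECTED_PROPERTIES.items()
        if claims.get(pid)
    )


# Hypothetical sample data: placeholder values, not real Wikidata items.
thesis_item = {"P50": ["Q_AUTHOR"], "P921": ["Q_SUBJECT"]}
merged_item = {"P50": ["Q_AUTHOR"], "P478": ["12"], "P1433": ["Q_JOURNAL"]}

print(unexpected_properties(thesis_item))  # []
print(unexpected_properties(merged_item))  # ['published in', 'volume']
```

An item that returns a non-empty list is a candidate for manual review, since it may have been merged with a journal article or book chapter.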

The Wikidata project page also contains a link to some Histropedia timelines, and some SPARQL queries to visualise the data, e.g. a map of where authors have been educated or employed, bubble charts of main subjects or author occupations, and links between advisors and students.

Tasks
If you would like to help, some easy tasks are making sure the theses are cited on relevant author Wikipedia pages, or matching authors to author name strings in the [https://mix-n-match.toolforge.org/? Mix'n'match tool].

Citing theses on Wikipedia
This Google Sheet shows theses by people who have Wikipedia pages (updated 23 March 2023). Unfortunately we have discovered that CiteQ is not currently helpful for citing theses, as the citations are not tracked by Altmetric. That means the impact of all the work is harder to see. We are replacing CiteQ citations with the "cite thesis" template instead. To make this easier, the Google Sheet now contains the citation with ref tags, ready to paste into the Wikipedia page without any need for source editing. A 4-minute "how to" video has been uploaded to YouTube showing how to create a new citation or replace an existing one.
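For anyone preparing citations outside the sheet, a ready-to-paste citation could be assembled along these lines. This is a minimal sketch, not the method used to build the sheet: the function name `cite_thesis_ref` and the sample values are hypothetical, and it assumes the common "cite thesis" parameters last, first, date, title, type, publisher and url.

```python
def cite_thesis_ref(last, first, title, degree_type, publisher, year, url=None):
    """Build a <ref>-wrapped {{cite thesis}} citation as wikitext.

    Empty or missing values are skipped. Values containing a pipe
    character would need escaping, which this sketch does not handle.
    """
    params = [
        ("last", last),
        ("first", first),
        ("date", year),
        ("title", title),
        ("type", degree_type),
        ("publisher", publisher),
        ("url", url),
    ]
    body = " ".join(f"|{key}={value}" for key, value in params if value)
    return f"<ref>{{{{cite thesis {body}}}}}</ref>"


# Placeholder values for illustration only, not a real thesis record.
print(cite_thesis_ref(
    last="Example",
    first="Alex",
    title="An example thesis title",
    degree_type="PhD",
    publisher="Lincoln University",
    year="2001",
))
```

The resulting string can be pasted into a Wikipedia page wherever the citation is needed.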

For reference purposes, here is the old Google Sheet.

Do you like working in other language Wikipedias?
This Google Sheet has a short list of thesis authors who do not have an English Wikipedia page, but do have one in another language (the languages are shown in the last column). It would be great to cite the theses on those pages, so that non-English speakers can see the work exists. The first sheet in the file contains some instructions for how to go about this if you don't speak the language concerned. If you are fluent you will, of course, find it much faster!

Mix'n'match
The Mix'n'match tool is a way to match the author name strings from the thesis project to authors on Wikidata. If you search Wikidata and do not find the author, try removing middle names, initials etc. If you are sure the person is not in Wikidata, click the 'new' button to create an item for them. You may be able to find other identifiers to add to the new record, e.g. ORCID or ResearchGate. If they have a university profile page, you can add the university as an 'employer' statement and use their profile URL as the reference URL for the statement. You do NOT need to link the author and the thesis item. DrThneed will periodically download matches from the Mix'n'match catalogue and match the authors and theses, and also add other information such as advisors.
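The advice about removing middle names and initials can be illustrated with a small Python sketch. It is not part of Mix'n'match: the function name `name_variants` and the sample name are hypothetical, and it simply lists progressively shorter search strings to try by hand.

```python
def name_variants(full_name):
    """Return progressively simpler search strings for an author name."""
    # Treat full stops after initials as spaces, then split into words.
    parts = full_name.replace(".", " ").split()
    if len(parts) < 2:
        return [" ".join(parts)]
    first, *middle, last = parts
    candidates = [
        # Full name with punctuation removed.
        " ".join(parts),
        # Middle initials dropped, full middle names kept.
        " ".join([first] + [m for m in middle if len(m) > 1] + [last]),
        # First and last name only.
        f"{first} {last}",
    ]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))


# Placeholder name for illustration only.
for variant in name_variants("Jane A. Mary Smith"):
    print(variant)
# Jane A Mary Smith
# Jane Mary Smith
# Jane Smith
```

Working from the most specific variant to the least keeps false matches down, since short names are more likely to be shared by several people.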

If you are not familiar with the Mix'n'match tool, this screen capture shows how to match items, using the Alexander Turnbull Library catalogue as an example.

Participants

 * Obedmakolo
 * Zeborah
 * Ambrosia10
 * Giantflightlessbirds
 * DrThneed
 * Canley
 * Schwede66
 * Oronsay

Outcomes and impact
 * July 2022 After DrThneed presented in July to the librarian community who provided the thesis data, Ambrosia10 posted a Twitter thread explaining the project and its progress to the wider Wikidata community and others on Twitter. Dr Amanda Whitmire, librarian at Stanford Hopkins Marine Station, responded by expressing a desire for the theses from that station to be added to Wikidata. In the exchange that followed, DrThneed and Ambrosia10 encouraged and supported Dr Whitmire in preparing the thesis data for upload into Wikidata. As at 9 August 2022 Dr Whitmire had made over 1000 edits to Wikidata, including adding 353 Stanford theses by people who worked at Hopkins Marine Station. She has also created numerous items for the authors of those theses, and has learned how to cite them on Wikipedia.
 * August 2022 As a result of DrThneed creating a YouTube video about the project and her workflow using OpenRefine, she was contacted by a PhD student in Leipzig who is doing a PhD on dissertations.
 * September 2022 DrThneed's Twitter and Wikidata outreach raised awareness of the NZ Thesis Project, and the London School of Economics Wikidata Thesis project was able to adapt queries and visualisations used in the NZThesisProject for its own use (and vice versa).
 * September 2022 User:Schwede66 wanted to work on New Zealand Rhodes scholars. DrThneed scraped a list and imported it into OpenRefine, matched it to existing Wikidata items, and then created a Mix'n'match catalogue for the remaining scholars to be matched or created. As most of the scholars completed a degree at a university in New Zealand and many returned to teach in New Zealand institutions, there is a large overlap between Rhodes scholars and the thesis project. Additionally, we have been able to match some scholars to their Oxford theses.
 * October 2022 DrThneed presented on the project to the Australia Wikimedia Community Meeting. DrThneed encouraged anyone who knows an institution keen to put thesis data into Wikidata to contact her.
 * 12 February 2023 DrThneed presented on the project to the ESEAP meeting (etherpad https://etherpad.wikimedia.org/p/ESEAP29), supported by Giantflightlessbirds.