Wikipedia:GLAM/Oxford/Final report


 * Project Title: Embedding Innovative use of Wikimedia across the University
 * Project Manager: Martin Poulter, Martin.poulter@bodleian.ox.ac.uk
 * Project Start Date: 24th October 2016
 * Project End Date: 23rd October 2017

Project Summary
Please provide a summary of your project for a wider audience.

The Wikimedian In Residence (WIR) worked with research projects to give tailored support in making their work more visible to the public. This involved sharing research outputs, including data and text, to enhance their impact. The WIR used these pieces of work, as well as events and articles, to advocate for a change in attitude to open data across the university. A lot of the work has ongoing consequences:
 * As a result of the WIR’s advocacy of Wikidata, the OxLOD (Oxford Linked Open Data) project will be building a prototype tool based on Wikidata to aid discovery of the university’s collections and databases. IT Services are funding a continuation of the WIR placement to support this.
 * Work with bibliographic researchers led to a project idea around creating open data representations of Enlightenment-era books. This will be submitted as a bid for funding.
 * Since Oxford University became the first university to share metadata about its doctoral theses on Wikidata, the University of Edinburgh has announced plans to do the same.
 * The WIR has been added to the editorial board of the open access journal Vestiges: Traces of Record to advise on adapting papers into Wikipedia articles.

The WIR shared data about locations, people, publications and artefacts in Wikidata, Wikimedia’s structured knowledge base. These include more than four thousand records from the Atlas of Hillforts of Great Britain and Ireland, more than three thousand identifiers from the Electronic Enlightenment biographical dictionary, data about more than three thousand doctoral theses whose text is available through the Oxford Text Archive, and at the end of the placement the WIR received more than 100,000 records from the Beazley Archive Pottery Database. These data have been used in visualisations, apps and Wikimedia sites, with links back to the source databases.

As well as data, the WIR delivered tailored training and advice. For example, two anthropology projects were writing Wikipedia articles to raise public awareness of their fields. The WIR helped to get the new articles created and even to restore an article that had been deleted. Wikimedia UK supported the project in many ways, including endorsing a bid for funding.

Project Aims and Objectives
''Did you achieve what you set out to do? Did you do anything that you hadn’t originally planned (if this was the case, please tell us why and how it changed, and what did it mean for the project)?''

The project set out to produce a change in the way the University works with Wikipedia and the other Wikimedia web sites to achieve engagement and impact with its outputs.

The original scope of the project was to work across research impact and educational practice. Events and meetings with academics raised interest in running educational assignments on Wikipedia, but nothing concrete materialised. Meanwhile, the demand for activity around research impact covered more work than could be done in the span of the project. So the project became almost entirely about supporting research projects.

The original plan was to work with a different researcher or research team each month. However, researchers’ timescales were variable and, for good reasons, some activity had to happen months after it was decided on. For instance, the Atlas of Hillforts web site was being built during the placement. Although we had discussed the nature of the data sharing at the start of the project, the data was not ready for sharing until July. This meant it was not practical to divide up the work into month-long chunks, and so multiple activities had to be done in parallel, along with responding to queries.

Because the research projects varied in nature, activities varied greatly in how much time they took. Advising staff on improving Wikipedia articles in their subject could be done relatively quickly, while the projects involving large data sets usually took multiple stages of activity spread over a few months.

Researchers’ timescales meant that some work could not happen. Three of the projects approached (Archive of Performances of Greek and Roman Drama, the Cairns Library in the John Radcliffe Hospital, and Cult of Saints) were interested in Wikipedia-related activity, but only after the one-year placement ended in October 2017. A Bodleian archivist is interested in using Wikidata as an authority file for religious houses in England and Wales, but this query came in at the end of the project, as did a data dump of records from the Beazley Archive Pottery Database. Since so much of the project involved processing data sets, we spent a small amount of funding on a Wikidata Assistant to work on data reconciliation and free up time for the WIR.

How did you go about your project?
Tell us the story of what you did.

I put out a call for interested researchers, which quickly brought in a lot of interest. This was followed up with face-to-face meetings with Principal Investigators or project teams to discuss their needs and the kind of activity we could do.

I also approached different parts of the university to speak in staff meetings or seminars about new developments in the Wikimedia family of projects. I presented at meetings of the Department of Psychiatry, the Digital Humanities Working Group, the Negotiated Texts Network, the Cairns Library, Bodleian Digital Library Systems and Services, communications and public engagement staff in Medicine and the Technology Enhanced Learning group.

During the project, the university’s GLAM (Gardens, Libraries, and Museums) Group was beginning to implement its new strategy to transform the accessibility of the university’s collections and their use in education and research. To influence the process, I met with the GLAM Programme Manager and presented at a meeting of the GLAM Strategy Implementation Group. I wrote guest posts for the Bodleian Digital Library blog and Oxford Museums blog to argue that Wikidata could have a central role in joining up data from different sources, and visualising those data in useful ways for education and discovery.
 * "Wikidata: the new hub for cultural heritage"
 * “Resource Discovery and Wikidata”
 * “Report from Wikimania”

These articles argue that Wikimedia projects can be platforms for work in Digital Humanities, and I reinforced that message by delivering a session in the Oxford Digital Humanities Summer School, a training session for library staff on “Working with the Open Culture Movement”, and a training session for staff and postgraduates at the Women In German Studies conference.

The nature of the work depended on the partner project, but it usually involved sharing some data or text under a free licence:

The Electronic Enlightenment/Oxford Text Archive team were interested in biographical dictionaries of booksellers and printers, and getting these from page scans into a more usable format. We used a combination of Wikisource, Wikidata and Google Sheets to produce a digital text version of one of these books, a data set representing the book, and the book’s text as a data set, with crowdsourced help from the Wikisource volunteers. This work is described in the case study “Turning a historical book into a data set”.

This team also wanted to explore getting more incoming links to the EE biographical dictionary. We arranged for Oxford University Press to give free access to EE for active Wikipedia editors who asked for it, and publicised this to the Wikimedian community. So far, 48 Wikipedia editors have been given free access to the service. We matched thousands of identifiers from EE with Wikidata, then used these data to identify people in EE that had no Wikipedia article, or had no English Wikipedia article but an article in another language. This work is described in the case study "Reconciling database identifiers with Wikidata".
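The sitelink check described above can be sketched in Python. This is a minimal illustration, not the workflow the project used: it assumes each matched Wikidata item has already been downloaded in Wikidata's standard JSON entity format, whose "sitelinks" map is keyed by site IDs such as "enwiki" or "frwiki". The function names, the short list of non-Wikipedia sites and the EE identifiers in the example are hypothetical.

```python
# Site IDs in "sitelinks" that end in "wiki" but are not Wikipedias
# (an illustrative, not exhaustive, list; an assumption of this sketch).
NON_WIKIPEDIA_SITES = {"commonswiki", "metawiki", "specieswiki", "wikidatawiki"}


def wikipedia_sites(entity):
    """Return the Wikipedia site IDs (e.g. "enwiki") linked from an entity."""
    sites = entity.get("sitelinks", {})
    # Wikipedia site IDs end in "wiki"; sister projects such as Wikisource
    # use other suffixes (e.g. "enwikisource") and are skipped here.
    return {s for s in sites if s.endswith("wiki") and s not in NON_WIKIPEDIA_SITES}


def classify_by_sitelinks(matches):
    """Split matched records into (no_article, no_english_article) lists.

    `matches` maps a (hypothetical) EE identifier to the Wikidata entity
    dict it was reconciled with.
    """
    no_article, no_english_article = [], []
    for ee_id, entity in matches.items():
        sites = wikipedia_sites(entity)
        if not sites:
            no_article.append(ee_id)          # no Wikipedia article in any language
        elif "enwiki" not in sites:
            no_english_article.append(ee_id)  # covered elsewhere, not in English
    return no_article, no_english_article


# Hypothetical sample data in the shape of Wikidata entity JSON.
sample = {
    "ee-0001": {"sitelinks": {"enwiki": {"title": "A"}, "frwiki": {"title": "A"}}},
    "ee-0002": {"sitelinks": {"frwiki": {"title": "B"}}},
    "ee-0003": {"sitelinks": {}},
    "ee-0004": {"sitelinks": {"commonswiki": {"title": "Category:C"}}},
}
no_article, no_english = classify_by_sitelinks(sample)
print(no_article)  # ['ee-0003', 'ee-0004']
print(no_english)  # ['ee-0002']
```

The second list is a natural starting point for translation work, since an article already exists in another language.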

With the Atlas of Hillforts, the interest was in getting incoming links to this newly-launched web site. We imported selected data fields into Wikidata, with links back to the Atlas for the full information. We matched Atlas records against Wikidata records where they existed, and tagged images in Wikimedia Commons with links to the relevant entry in the Atlas. We created a project page to tell the Wikipedia community what was happening, and used a tool to generate list articles in Wikipedia from the imported data. This work is described in the case study “Creating Wikipedia articles from research data”.
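As a rough illustration of how a list article can be generated from imported data, the sketch below renders records as a sortable wikitext table, with each row linking to its Wikidata item through the `d:` interwiki prefix. It is not the tool the project used; the record fields (`name`, `county`, `qid`) and the sample values are hypothetical, and a real list would also carry the link back to the Atlas entry.

```python
def list_article_wikitext(records):
    """Render records as a sortable wikitext table, sorted by name.

    Each record is a dict with hypothetical keys "name", "county" and
    "qid"; the last column links to the Wikidata item via the "d:" prefix.
    """
    lines = ['{| class="wikitable sortable"', "! Name !! County !! Wikidata"]
    for rec in sorted(records, key=lambda r: r["name"]):
        lines.append("|-")  # start a new table row
        lines.append(
            f"| {rec['name']} || {rec['county']} || [[d:{rec['qid']}|{rec['qid']}]]"
        )
    lines.append("|}")  # close the table
    return "\n".join(lines)


# Placeholder records, not real Atlas data.
table = list_article_wikitext([
    {"name": "Example Fort B", "county": "Cork", "qid": "Q2"},
    {"name": "Example Fort A", "county": "Kerry", "qid": "Q1"},
])
print(table)
```

Generating the table from data, rather than typing it, means the list can be regenerated whenever the underlying Wikidata items change.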

The Cultures of Knowledge group (who run Early Modern Letters Online) are interested in combining data about notable people and locations from historical sources, and having those data used in research and education. They included some of my time in a bid for a European project in which they were part of the consortium. I obtained a letter of support for the bid from Wikimedia UK. That bid is presently wait-listed. I have advised on how these data sources (including EMLO identifiers) are reconciled with Wikidata, influencing a couple of bids that are now being prepared.

The Oxford Research Archive had digitised thousands of doctoral theses and made them openly available. I created a new property in Wikidata to represent the institution to which a thesis had been submitted, and added data about the Oxford theses in a bulk upload. We used Google Sheets to find Wikipedia articles about notable people among the thesis authors, such as John Vickers, H. A. Berlin, and Ronald Hutton. ORA staff and I added thesis links to these articles. This work is described in "A step forward in the sharing of open data about theses".

With the Voltaire Foundation, we worked on describing Voltaire’s most important works in Wikidata, including the non-fiction books that were the subject of a project funded by the AHRC. These data were used in two custom interactive timelines, using the Histropedia software library. These timelines were added to the Voltaire Foundation site and the data also appeared in Wikidata. The outcomes are described in the blog post "If Voltaire had used Wikipedia…".

With Prof. David Zeitlyn, a cultural anthropologist whose work covers people and archives in Cameroon, we found that a limiting factor on coverage of this topic in Wikipedia was the lack of published secondary sources. I recommended a way to adapt existing open-access journal articles into Wikipedia articles. This is described in the case study "Extending the reach of a journal with Journal-to-Wiki publishing".

The Ritual Modes research group in Cognitive and Evolutionary Anthropology had tried to add articles to Wikipedia about their topic, but had been discouraged because an article had been deleted. I got the article undeleted and gave them advice on getting further articles accepted. This is described in the case study “Deletion is not the end: making an academic article stick on Wikipedia.”

The Women In German Studies group wanted training on how to improve Wikipedia articles in their field. In their workshop, they learned about the behind-the-scenes efforts to improve coverage of women and to facilitate translation between different language versions of Wikipedia.

For IT Services, I documented two kinds of event that use Wikimedia platforms to engage the public in research, including the instructions that would be given to participants. I also provided training workshops for two different groups of staff in IT Services.

There were other queries that did not develop into packages of work, or would have developed into work outside the one-year placement. For example, John Mittelmeier, a doctoral student in Geography, uses Wikipedia article hits as a measure of public interest in conservation topics. He had been screen-scraping this from an online stats tool and I showed him how to get the data directly from a machine-readable interface. Professor Kate McLoughlin was bidding for funding to make an educational app with elements of crowdsourcing, and I advised on how Wikidata could provide some of the infrastructure for this, and its limitations.
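The machine-readable interface is not named above. One candidate is the Wikimedia REST API's per-article pageviews endpoint; the sketch below assumes that endpoint's URL pattern and response shape (an `items` list whose entries each have a `views` count). The helper names and the User-Agent string are placeholders, not part of any official client.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Wikimedia REST API per-article pageviews endpoint.
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia", granularity="monthly"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces; other special characters
    # (including "/") are percent-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{API}/{project}/all-access/all-agents/{article}/{granularity}/{start}/{end}"


def total_views(payload):
    """Sum the "views" counts across the items of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total_views(title, start, end):
    """Fetch and total the views over the network (needs internet access)."""
    request = urllib.request.Request(
        pageviews_url(title, start, end),
        # A descriptive User-Agent is expected by Wikimedia APIs; placeholder here.
        headers={"User-Agent": "pageviews-sketch (contact: you@example.org)"},
    )
    with urllib.request.urlopen(request) as response:
        return total_views(json.load(response))
```

For example, `pageviews_url("Identity fusion", "20170101", "20171231")` builds the monthly query for one of the anthropology articles listed later in this report. The sketch does not handle HTTP errors or articles with no recorded views.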

Project Deliverables and Outcomes
''What did your project produce? This will be your project deliverables but could also include other outcomes - such as increased knowledge and skills, a change in the way things are done, impact. Please also add where/how we can access the project outputs, if relevant (for example a URL)''

The case study outputs are described in the previous section. A principal outcome of the project is more informed awareness of Wikimedia projects across the many projects and researchers that have been involved.

In terms of changes to the web, these data have been shared on Wikidata:
 * 3,238 data items about doctoral theses
 * 4,147 data items about hillforts
 * More than 3,000 links to Electronic Enlightenment (the number increasing over time thanks to crowdsourced matching)
 * Data sharing by the Beazley Archive Pottery Database will likely result in tens of thousands of items being added in the future. The total number of records is 117,000 but some of these are too sparse to be shared on Wikidata.

These data items are used in some language versions of Wikipedia to make “article placeholders”: pages that give basic information and links about a topic when no article has been written for it. Examples:
 * https://ht.wikipedia.org/wiki/Espesyal:AboutTopic/Q28914893
 * https://cy.wikipedia.org/wiki/Arbennig:Am_y_Pwnc/Q31055478
 * https://nn.wikipedia.org/wiki/Spesial:AboutTopic/Q1291966

They also appear in other queries and apps that draw on Wikidata’s freely reusable data.

The Oxford Research Archive gained incoming links from Wikidata, and 42 from Wikipedia, including from high-traffic biographies such as Neil Gorsuch and Rowan Williams. We also arranged a change to Wikipedia’s infoboxes so that philosophers’ biographies can display their doctoral advisor, thesis title and thesis link at the top of the article.

The Voltaire outputs are described in the blog post: there was a measurable increase in the readership of Voltaire articles on English Wikipedia and French Wikipedia.

Archaeological articles:
 * https://en.wikipedia.org/wiki/List_of_hillforts_in_Ireland
 * https://en.wikipedia.org/wiki/List_of_hillforts_in_Northern_Ireland
 * https://en.wikipedia.org/wiki/List_of_hillforts_on_the_Isle_of_Man
 * and numerous changes to other hillfort-related articles. Again, the number of links is increasing over time as more Wikipedia editors use the resource.

Some hillfort images have been shared on Wikimedia Commons, and hundreds of existing images have been tagged with links to the Atlas.

Anthropological articles:
 * https://en.wikipedia.org/wiki/Modes_of_religiosity
 * https://en.wikipedia.org/wiki/Identity_fusion
 * https://en.wikipedia.org/wiki/Cameroon_Press_Photo_Archive
 * https://en.wikipedia.org/wiki/Jacques_Toussele
 * https://en.wikipedia.org/wiki/Florence_Ayisi

Together, these articles are viewed around 8,000 times per year.

Benefits
''How has the project benefitted the University? Who else may benefit from it, and how?''

The university’s outputs benefit from being published, from being online where they can be read by anyone with an internet connection, and from being structured where they can be discovered and repurposed. This project has been about the natural extension of that: putting text and data in high-traffic platforms to fully connect the research projects with the most popular platforms for finding and repurposing information.

Everything that has been shared on Wikipedia or Wikidata has links back to the scholarly source for further information. The public thereby benefit, and so do the sources themselves, which gain thousands of incoming links. For example, users of the Monumental app and of Wikipedia infoboxes now have more links to relevant information (about hillforts and doctoral theses respectively), which extends the reach of the source databases. The case studies produced in this project have outlined a workflow that could be used in almost any subject: getting primary data (EE/OTA book transcription); sharing biographical, geographical or bibliographic data to link against other data sets (EE, Hillforts, theses); publishing papers (Vestiges); sharing lay summaries (anthropology, Women In German Studies); and engaging a wider public in the topic (Voltaire timelines, public events for IT Services).

In the process of sharing data, all the contributing databases have been scrutinised and improved. The sharing process brought up a small number of inconsistencies, duplicates or gaps, which were reported back to the source databases. The Wikidata Query Service, and the Google Sheets tools that were used to prepare data for sharing, made it easy to create maintenance queries: queries that return no results as long as the data are clean and consistent.
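The idea of a maintenance query, a check that should return nothing while the data are clean, can be sketched in Python over exported records. This is an illustrative stand-in for the Wikidata Query Service and Google Sheets checks mentioned above, not a reproduction of them; the function name and the field names `source_id` and `label` are hypothetical.

```python
from collections import Counter


def maintenance_problems(records, id_field="source_id", required=("label",)):
    """Return a list of problem descriptions; an empty list means clean data.

    Checks for duplicated source identifiers, records with no identifier,
    and records missing any of the required fields.
    """
    problems = []
    # Duplicates: the same source identifier attached to more than one record.
    counts = Counter(r.get(id_field) for r in records)
    for ident, n in counts.items():
        if ident is not None and n > 1:
            problems.append(f"duplicate {id_field} {ident!r} ({n} records)")
    # Gaps: missing identifiers or empty required fields.
    for i, r in enumerate(records):
        if r.get(id_field) is None:
            problems.append(f"record {i}: missing {id_field}")
        for field in required:
            if not r.get(field):
                problems.append(f"record {i}: missing {field}")
    return problems


clean = [{"source_id": "A", "label": "x"}, {"source_id": "B", "label": "y"}]
dirty = [
    {"source_id": "A", "label": "x"},
    {"source_id": "A", "label": "y"},
    {"source_id": "B", "label": ""},
]
print(maintenance_problems(clean))  # []
print(maintenance_problems(dirty))
```

Running such a check after each import gives an immediate signal: any non-empty result is a list of issues to report back to the source database.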

The design of the Atlas of Hillforts was influenced by this project to give each hillfort its own direct link, and this benefits anyone who wants to point to a hillfort on social media. Similarly, the WIR fed into the future design of the online presence of the Bodleian’s archives.

Finances - Actual against Plan
The project was funded assuming the Wikimedian would be employed at the top of the pay grade. There were savings because the actual pay level was lower.

What Next?
''What is going to happen with your project’s outcomes and deliverables? How are you promoting it? Will you be developing anything further? Is there a user community?''

This project worked with community-based sites. Wikidata reports more than 17,000 active contributors. English Wikipedia reports more than 128,000 active contributors. English Wikisource has a much smaller, but extremely helpful, community. All the activity discussed in this report has benefited from work by volunteers in these communities.

The project has been documented in a Wikipedia project page from the start, and dedicated project pages have been created around particular efforts, such as hillforts or people of the Enlightenment. These project pages and blog posts outline activities that volunteers can do to build on the University of Oxford collaboration: reconciling database identifiers for Electronic Enlightenment (EE) or Early Modern Letters Online (EMLO) with Wikidata (described in this blog post); translating or creating articles about people in EE, or creating articles about hillforts; and translating the textual labels of shared data into languages other than English. A group at the University of Edinburgh, including that university’s Wikimedian In Residence, is taking forward the writing of hillfort articles.

The free access to EE for active Wikipedians is an ongoing relationship being managed by The Wikipedia Library, a dedicated team employed by the Wikimedia Foundation.

IT Services are involved with the OxLOD project which, among other activity, is building a resource discovery tool based on Wikidata. They are funding me to do some of the work. If successful, this will make the kind of activity done over the past year central to the joining-up of research projects and cultural collections across the university.

With the EE/OTA team I am working on a bid for a project around Enlightenment-era books, built on work during this project.

Queries from researchers about aspects of Wikipedia and Wikidata, including offers of data, continue to come in: some of these can be handed over to the volunteer Wikidata community.

Lessons Learnt and Feedback
''Based on your experiences of running a project, is there anything you can share that can inform the Innovation programme? Do you have any top tips to share with future project managers?''

Some staff in the university are still suspicious of Wikipedia and related sites, and of open access more generally. Still, there are many who see the benefit of actively working with these platforms: more queries and potential work came in than could be handled by one person working for a year at 0.5FTE.

Turning interest into enthusiasm, or into a plan of activity, is often just a matter of showing people the tools that exist in the Wikimedia sphere. Wikidata is a knowledge base. Reasonator is an overview of what Wikidata knows about a given item. Histropedia is a software library to generate interactive timeline views of these data. Women In Red is a set of lists of notable women who lack, and should have, Wikipedia articles. It is very easy to be aware of Wikipedia, even to take formal training in Wikipedia, without knowing anything of these behind-the-scenes tools and initiatives. So just looking behind the scenes can change how people do their work. For instance, Mix’n’Match is a crowdsourcing platform for linking across Wikidata and hundreds of external databases. Telling contacts about this service (and giving concrete examples of its use with Oxford datasets) influenced the discussion about the use of Wikidata in OxLOD, and influenced a project bid by Cultures of Knowledge.

So academics benefit from being informed about this area, but not from simply being given a link. It needs to be done with awareness of the legal and technical issues faced within Wikimedia and by projects within the university. For example, having seen a demo of Wikidata, researchers and librarians want to know how data are modelled in it, how that data model evolves, and how it can be expanded to meet the requirements of an external partner. Academics have asked if an oral lecture can be used as a source in Wikipedia, whether an object’s creation can be represented in Wikidata as a date range rather than a single date, and whether figures from a research paper can be used to illustrate a Wikipedia article.

The complex internal structure of the university presents a challenge for diffusion of these ideas. However, the centralised services do an admirable job for similar topics: academics across the university can get expert help with open access, with data visualisation and other topics that span subjects. The demand for expert advice specifically on Wikimedia platforms will only grow in the foreseeable future.