Wikipedia:GLAM/Bodleian/Final report

This is an impact report of the Bodleian Libraries' Wikimedian In Residence, a residency that ran from 1 April 2015 to 31 March 2016.

Background
The Bodleian Libraries are a group of research libraries in Oxford University, including the four-century-old Bodleian Library itself. Collectively, they employ around 700 staff and hold more than 12 million books, plus many millions of digital files.

The Bodleian Libraries' connection with Wikimedia goes back to 2012, when a Bodleian staff member wrote an article for The Signpost about knowledge representation, and when the Libraries and Oxford University's IT Services jointly hosted a Wikipedia editathon about women in science as part of the Finding Ada campaign. Since then, Ada Lovelace editathons have become an annual event in Oxford, bringing together University staff and members of the public.

The Bodleian Libraries applied to Wikimedia UK for a Wikimedian In Residence in August 2014. An agreement was signed in November that year, with recruitment beginning in December. The post was funded by Wikimedia UK, with in-kind support from the Bodleian Libraries. I worked as the WIR at the Bodleian from April 2015 to March 2016. I was hosted by the Communications team, working half-time in their office in the Old Clarendon Building, line-managed by Liz McCarthy. The "Wikimedian", rather than "Wikipedian", title reflects the aim to promote and improve not just Wikipedia but other projects including Wikidata, Wikisource and Wikimedia Commons.

The main goals of the residency were:
 * 1) to share content from Bodleian special collections to enhance Wikimedia projects;
 * 2) to expand the community of contributors, sharing skills with University members as well as the public;
 * 3) to shape policy and workflows to increase the amount of open knowledge.

Although based in the libraries, I was encouraged to work with colleges, research projects, museums, and other parts of the university to pursue these goals.

During the year, there were important changes for both partners: the launch of the new image platform Digital Bodleian and, on the Wikimedia side, of some helpful new reporting and visualisation tools. There was also the bicentenary of Ada Lovelace, which called for a more ambitious set of events than the usual editathon.

Distinctive things about this residency

 * Strengths
   * The diversity of Bodleian content: not just a local or even national library, but a collection of cultural treasures from many different civilisations, covering a huge range of encyclopaedic topics
   * The Bodleian's position within the University of Oxford, bringing me into contact with other cultural institutions, departments and academic projects
   * The support of Academic IT Services who hosted editathon events and encouraged me to give Wikimedia-related training in their programme of workshops for staff
   * The diversity focus: from the outset, both the events and content-sharing were geared to address the themes of non-Western cultures and the representation of women, and these are themes that staff and the public have responded to.
   * Active support from the highest levels of the host organisation


 * Barriers
   * Oxford's cultural institutions are very successful at making commercial use of their archives, so some parts of the university are wary of free sharing. This is definitely changing, but the process is slow.
   * Even by the standards of universities, Oxford is a very complex organisation. Working simultaneously with different parts of the organisation has taken a lot of time, and some partnerships are just about to blossom as the residency ends.
   * I have had a great deal of help with communications within the Bodleian and the University, yet it has been hard to get the message out to all potentially interested staff; word-of-mouth has played an unexpectedly large role.

Key statistics and events delivered

 * 92 user names recorded, 70 of which were new accounts
 * 2,672 edits by trainees, adding 938,000 bytes of content to the Wikimedia projects, creating 20 new articles and improving 123 more so far
 * 1,991 edits by the resident, adding 1,008,000 bytes of content
 * 8,005 files uploaded
 * 363 attendees total, from hands-on events, workshops about working with Wikimedia that did not involve editing, and from presentations at events.

Changing intellectual property policy
As a result of this project, the Libraries have enabled large amounts of their digital content to be shared under free licences for the first time. The terms and conditions of the Bodleian Libraries' own sites allow personal or educational use but forbid users from making unlimited free use of the downloaded images. A newly-formed Access and Reuse Committee first met on 28 May 2015 and considered a briefing from the WIR requesting images to illustrate Wikipedia articles. This committee gave permission for authorised staff to upload web-resolution versions of images from Digital Bodleian under an attribution-only licence. For 40 cases where the agreed resolution was insufficient to make good use of the image, the WIR asked the Committee for permission to upload higher-resolution versions, and this was approved. This precedent makes possible an ongoing process of bulk uploads to Commons from Digital Bodleian.

The Bodleian Libraries are part of a recently-formed group called Academic Services and University Collections (ASUC), along with the University's four museums. There is an ASUC Libraries and Collections IP Policy which has been inspired and influenced by the Bodleian residency.

There has also been progress, albeit slow and ongoing, with the licensing policy of some colleges and museums: see below.

Sharing content and building an ongoing process
There have been 8,005 files uploaded but in the longer term we can expect larger quantities. Some highlights of the uploads so far:
 * Bodleian Oriental Collections: These collections cover a wide variety of different civilisations and eras, including Middle Eastern.
 * The John Johnson Collection of Printed Ephemera: nearly a thousand images of many kinds of unconventional publication, including theatre playbills, handbills, board games, and news clippings.
 * The Curzon Collection: more than 1,000 political cartoons from the era of the Napoleonic Wars, including cartoons from Germany, France and Russia as well as Great Britain.
 * The Bodleian Maps Collection: more than 1,000 files including local, national and world maps from a wide variety of historical periods.
 * The Polonsky Foundation Digitisation Project: a new project which is digitising Hebrew and Latin manuscripts relating to the history of Judaism and Christianity.

This total figure also includes some small exploratory contributions from other parts of the university; see below.

Some of the John Johnson Collection images are held in the commercial image library ProQuest. I asked ProQuest for permission to share eight of these on Commons under free licences, which they agreed to.

Historically, the Bodleian's digital image collections have been spread across various different platforms. During the residency, Digital Bodleian was launched: a single platform which makes it easy to preview pictures and export metadata. Work is ongoing in the Bodleian to transfer existing image collections and new scans into this platform, with a large upload coming in Summer 2016.

My own work has built tools linking Digital Bodleian with the GLAM Wiki Toolset (GWT). These take the IIIF/JSON metadata output by Digital Bodleian and convert it to the flat XML required by the GWT. At the end of the residency, I have put in a request to have my software added to the Bodleian's servers, and for a nominated staff member to take responsibility for continuing bulk uploads once or twice per year.
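The conversion step can be illustrated with a minimal sketch. This is not the residency's actual software: it assumes the input resembles a IIIF Presentation 2.x manifest with a `metadata` list of label/value pairs, and the sample record, the element names (`records`, `record`, `source_id`) and the helper functions are all hypothetical. The real Digital Bodleian output and the field mapping expected by the GWT may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a IIIF Presentation 2.x manifest;
# the real Digital Bodleian output may be structured differently.
SAMPLE_MANIFEST = json.loads("""
{
  "@id": "https://example.org/iiif/manifest/abc123",
  "label": "Example item",
  "metadata": [
    {"label": "Title", "value": "Example map"},
    {"label": "Date", "value": "1800"}
  ]
}
""")


def tag_name(label):
    """Turn a metadata label into a safe XML element name."""
    cleaned = "".join(c if c.isalnum() else "_" for c in label.strip().lower())
    # XML names cannot be empty or start with a digit, so add a prefix if needed.
    if not cleaned or not (cleaned[0].isalpha() or cleaned[0] == "_"):
        cleaned = "field_" + cleaned
    return cleaned


def manifest_to_record(manifest):
    """Flatten one manifest into a <record> with one child element per field."""
    record = ET.Element("record")
    ET.SubElement(record, "source_id").text = manifest.get("@id", "")
    ET.SubElement(record, "label").text = str(manifest.get("label", ""))
    for entry in manifest.get("metadata", []):
        value = entry.get("value", "")
        if isinstance(value, list):
            # Join multi-valued fields so the output stays flat.
            value = "; ".join(str(v) for v in value)
        ET.SubElement(record, tag_name(str(entry.get("label", "")))).text = str(value)
    return record


def manifests_to_flat_xml(manifests):
    """Wrap all records in a single root element and serialise to a string."""
    root = ET.Element("records")
    for manifest in manifests:
        root.append(manifest_to_record(manifest))
    return ET.tostring(root, encoding="unicode")


flat_xml = manifests_to_flat_xml([SAMPLE_MANIFEST])
print(flat_xml)
```

Keeping every record as a single level of child elements is what makes the XML "flat": each element can then be matched to one field of a Commons file description during a bulk upload.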

In the first months of the residency, the emphasis was on illustrating Wikipedia articles about non-Western cultures using files from the Oriental Collections. At the end of 2015, the effort went into creating and testing the different parts of the bulk upload process. The uploads themselves were packed into the final two months of the placement. This means that for most of the 8,000 images, there has not yet been time to find uses for them in Wikipedia, but this will happen organically as time goes on. This is why the views for Bodleian uploads reported by BaGLAMa jumped early on, have stayed relatively constant since then, but should be expected to increase from now on.

Images from this residency have been getting roughly two and a half million views per month on Wikimedia projects, though this is known to be an overestimate because it includes hits where the reader does not actually see the image because they do not scroll down to the relevant section. Digital Bodleian has had 597,000 pageviews in ten months, so images from the Bodleian are seen roughly 40 times as often on Wikimedia sites as on the Bodleian's own site.
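The "roughly 40 times" comparison can be checked with a short calculation. The figures are the ones quoted above; the variable names are only illustrative, and the Wikimedia figure is itself described as an overestimate.

```python
# Figures quoted in the report.
wikimedia_views_per_month = 2_500_000   # "roughly two and a half million"
digital_bodleian_pageviews = 597_000    # total over the period below
digital_bodleian_months = 10

# Put both figures on the same monthly basis before comparing them.
bodleian_views_per_month = digital_bodleian_pageviews / digital_bodleian_months
ratio = wikimedia_views_per_month / bodleian_views_per_month

print(f"Digital Bodleian: about {bodleian_views_per_month:,.0f} pageviews per month")
print(f"Wikimedia views are about {ratio:.0f} times higher")
```

This gives about 59,700 Digital Bodleian pageviews per month and a ratio of about 42, consistent with the rounded figure of 40.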

How the images benefit from sharing
Sharing on Commons enables a variety of things to happen to the images. These uses are at an early stage since most of the material is so new.

 * Translation of descriptions or text into other languages: This image from the John Johnson Collection is used in the French Wikipedia article about the Irish famine. The caption, translated into French by the Wikipedians, has been copied back into the record on Commons.
 * Combining images into composites: The Bodleian's image archive shows the individual panels of this panorama of the life of the Buddha, but sharing on Commons enabled them to be combined to recreate the whole panorama.
 * Splitting images: Similarly, elements from composite images can be extracted to serve an educational purpose. This image from the John Johnson collection supplied portraits for two actors' Wikipedia biographies.
 * Cropping or contrast-enhancing individual images: This helps their visibility when used in the encyclopaedia. For example, this image from the Curzon Collection.
 * Transcription into Wikisource: For example the sonnet written by Ada Lovelace and a handbill from the John Johnson Collection.
 * Connecting related images: Images with a thematic or historical connection can be hyperlinked to each other. For example, this cartoon by Voltz from the Curzon Collection is linked to versions of the same image by other artists.
 * Providing context: This takes many forms, including embedding in Wikipedia articles; linking names and terms from descriptions to identities in Wikipedia or Wikidata; or adding categories on Commons so that Bodleian images are visible alongside those from other institutions.

Extending the impact of funded projects
I re-established contact with Dr Liz Leach, a lecturer who had attended my Jisc workshop. She was on sabbatical during my residency, but put me in touch with Dr Katherine Butler and the AHRC-funded Tudor Partbooks project. Together we ran a Tudor Music editathon creating articles relating to partbooks, two of which were linked from a DYK hook on Wikipedia's front page. I summarised this in a short case-study for the Bodleian's staff newsletter, Outline. Dr Leach has expressed interest in doing a Wikipedia educational assignment.

The Wellcome Trust have funded a project to catalogue and make accessible archives relating to the physiologist Mabel Purefoy FitzGerald. As part of outreach for the project, the archivists intend to improve the Wikipedia article about FitzGerald. I have given advice to the archivists by email, met one of the archivists to discuss how Wikipedia works, trained her in editing at an editathon event, and given feedback by email on proposed changes.

Colleges

 * Somerville College, St Hilda's College, and Lady Margaret Hall (former women-only colleges) have each, for the first time, released images to Wikimedia Commons. Somerville has released images relating to Ada Lovelace and Mary Somerville. St Hilda's and LMH each released four photos of past principals.
 * Jesus College has archives of historic value (especially for Welsh history) as well as images of its fellows. I have been in touch with the college's librarian/archivist, who is seeking to release images of manuscripts under an open licence. Changing an intellectual property policy is a slow process and wasn't managed during my residency, but a decision is expected in the Summer.

Oxford University's museums
The University has four museums, all of which were represented at a workshop I ran on how museum staff can work with Wikimedia. The museums are still wary, to differing extents, about free licensing of their images and this rules out, for now, a mass upload of images. The GLAMorgan tool has been useful for showing the impressive numbers of hits on images relating to each museum. Staff from the Museum of the History of Science (MHS) and Museum of Natural History (MNH) have been in touch about further activity. Although we did not manage to arrange anything within my residency, the MHS is interested in hosting a backstage pass event and in possibly merging its inventor database with Wikidata.

The MHS released three of its images under a free licence to illustrate the Wikipedia article on Armillary Sphere. I also tagged some existing images as relevant to the Museum (including photos that visitors had taken to illustrate articles). GLAMorgan says that this category of images is getting about 160,000 hits per month (approximately two million per year).

The Oxford Internet Institute
When I heard about a proposed event for World Computer Day, I suggested moving the event to Wikipedia's birthday and having a Wikipedia-themed event. The OII not only hosted a Wikipedia training event, but included short presentations from six of their researchers on different aspects of Wikipedia, which were videoed and shared on YouTube. The event also included extensive discussion with the researchers about their mix of positive and critical perspectives. The videos and the OII write-up of the event, published on Medium, raised awareness of Wikipedia and the way it is used in research at the OII.

The Voltaire Foundation
Having had a custom workshop about Wikipedia, Wikidata, Commons, Wikisource and Wikiquote, the Voltaire Foundation have expressed interest in further work, noting that most notable works by Voltaire lack articles on Wikipedia. They have shared four images on Commons. I will follow this up after the residency.

The Oxford Text Archive
I was hoping there could be a bulk upload of text content to Wikimedia, or to the Internet Archive from where it is easy to import to Wikisource, but this didn't happen, for several reasons:
 * Books from the Bodleian that had been digitised for the Google Books project had usually already been uploaded to the Internet Archive.
 * Wikisource requires texts to retain some aspects of their original formatting. In practice, this requires page scans as well as text, though for a lot of its content OTA lacks page scans.
 * Some texts in OTA have been professionally validated, but many have not and so there are missing characters or other errors. Hence transfer into Wikisource needs careful checking rather than a bulk upload.

One change I did achieve was to get some 18th-century texts from the OTA linked to their page scans in Jisc Historic Texts, making it easier for people in UK higher education institutions to create Wikisource versions of the texts. In volunteer time, I have copied some texts into Wikisource and created relevant author profiles. I have started a category on Wikisource for texts from the OTA.

Beyond the University
In various ways the project affected the wider sector:
 * The Oxford Internet Institute event was attended by staff from City University, Public Health England and Loughborough University, who expressed interest in future Wikipedia training for their colleagues.
 * A double-page article on Wikimedia and special collections appeared in CILIP Update magazine.
 * I spoke at the DIY Digitisation Conference hosted by the Bodleian with attendees from various libraries. The conference organisers are making a proposal to publish an open-access book based on the conference sessions, in which case there will be a chapter from me about sharing images through Wikimedia.
 * I was invited to a seminar for History postgraduates at the University of London where I spoke about the benefits of writing for Wikipedia.
 * Other media exposure included interviews for Sky News and BBC Radio 5, plus a short segment of a regional ITV broadcast.

Some lessons learned

 * Events were more popular out of term-time. Students did not usually attend the events in large numbers. Some of the most interested participants were specialist staff such as librarians and archivists, and they find it easier to attend outside of term. The most popular student-focused event was a one-hour lunchtime workshop, so perhaps this reflects how students allocate their time differently.


 * Some of the new event formats were definitely a success, but others were not. Changing the complexity of what was expected of trainees had big effects on participation, in both directions.
 * The Transcribe-a-thon got a good attendance and was a very positive event in terms of mood, attendee feedback, and attendee involvement in later events. The Image-a-thon did not attract many sign-ups (in a week where there were four Wikipedia events), but it was well-received. A lunchtime workshop "looking under the bonnet" at hidden features of Wikipedia was good in terms of attendance and evaluation. Promoting informed comprehension of Wikipedia and giving people a simple task to do with a visible outcome (as opposed to getting them to create articles) both seemed to make events a success. I will try to make both of these central to my future training.
 * On the other hand, the Open Knowledge Ambassador course, which was aiming to give a deep understanding of Wikimedia and how to build partnerships, struggled to get attendees, though the evaluation was positive.


 * What provoked the best reactions in workshops were the varied ways to visualise and play with data: timelines, map interfaces, concept graphs and so on. People associate Wikipedia with text and images; these other ways to visualise open knowledge change the way they think about Wikimedia. The Wikidata Weekly Summary has been extremely useful for highlighting these tools.
 * It's hard to see the strengths and weaknesses of a Wikipedia category just by looking at Wikipedia itself, but a Histropedia timeline built from that category makes clear the usefulness and the gaps, creating interest in improving that category. It's also impressive that this demonstration can be done in just a few seconds.
 * Similarly, "what Wikipedia knows about central Oxford" is a very abstract concept, but becomes concrete when the audience see a map visualisation.


 * I usually introduced workshops by talking about Wikimedia's mission and its role as a social good, using examples from the Wikimedia Foundation film "Knowledge for Everyone" and the 2011 documentary "Life in a Day". This was a powerful message and was mentioned by people even many months afterwards as something that had changed how they thought about Wikimedia. In the Oxford Internet Institute workshop where I was not able to set up this framing, people took a more negative message about Wikipedia.


 * A common driver of interest in the events was that someone would attend one workshop out of interest, have a positive experience with it, and ask for a similar event for their workplace. For example, the Ada Lovelace events led to the Oxford Internet Institute event, which in turn led to the Voltaire Foundation workshop. I was trying to change how staff think about Wikimedia and to show them opportunities they had not thought of. A lot of people have already "made up their minds" about Wikipedia—for or against—so perhaps it's not surprising that people respond to word-of-mouth from colleagues much more than to the usual mailouts.

Recommendations

To Wikimedia UK

 * WIRs should be proactively looking for research projects, either nearing completion or being planned, since funding often depends partly on public impact and public engagement, both of which Wikimedia projects can help with. I have given the University of Edinburgh WIR a briefing on this.
 * The GLAM-Wiki community should produce a general brochure on Wikimedia Commons for special collections, setting out the various kinds of functionality that Commons gives to an uploaded image. I'll be happy to work on this, when I have time.

To the Bodleian Libraries

 * The Bodleian Libraries are very successful at using social media platforms such as Twitter and Instagram to highlight items from their collections, often using creative remixes of the digital image. When the item is one that has been shared on Commons under this agreement, it should be routine to include the Commons link as well, so that the audience are encouraged to make their own remixes.
 * Digital Bodleian is still a new project, and I hope it rapidly becomes the definitive repository for Bodleian images and metadata. There was content that I had permission to share but which either was held in a different database or, when I did share it, had metadata in Digital Bodleian that was not the latest and fullest version.
 * The Bodleian Libraries have an active staff development programme. Awareness of open culture platforms such as Wikimedia Commons should be a normal part of the training provided to staff (this was suggested by an attendee at one of my workshops).
 * Consider a higher-resolution future upload for some categories of material. Maps, in particular, are not very usable at 1000px because written labels are illegible. The same applies to some manuscripts and cartoons that include small text.
 * Cataloguing within the Bodleian depends on a lot of controlled vocabularies, for example controlled names for all the British politicians and aristocrats depicted in this cartoon from the Curzon Collection. With nearly 20 million items, rapidly increasing, Wikidata has sufficient granularity for a lot of cataloguing purposes. So it would be worth mapping the identifiers used in cataloguing to Wikidata identifiers, as is already being done on a large scale by the Virtual International Authority File (VIAF). By linking to Wikidata, vocabulary items can draw on a huge amount of additional information, including family relations between people, authorship relations between people and texts, membership relations between societies or professions and people, and depiction relations between images and people.
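
One way such a mapping could start, sketched here under stated assumptions, is to look up a Wikidata item by an identifier the catalogue already holds. The sketch assumes a catalogue record carries a VIAF identifier and uses Wikidata's VIAF property (P214) with the public Wikidata Query Service endpoint. The function names and the placeholder identifier are hypothetical, and the code only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_viaf_lookup_query(viaf_id):
    """Build a SPARQL query that finds the Wikidata item with a given VIAF ID."""
    if not viaf_id.isdigit():
        raise ValueError("VIAF identifiers are expected to be numeric strings")
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f'?item wdt:P214 "{viaf_id}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } '
        "}"
    )


def build_lookup_url(viaf_id):
    """Return the GET URL that would run the query and request JSON results."""
    params = {"query": build_viaf_lookup_query(viaf_id), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


# "12345" is a placeholder, not a real person's identifier.
print(build_lookup_url("12345"))
```

Fetching that URL would return any matching item, whose Wikidata identifier could then be stored alongside the controlled name in the catalogue. Names with no VIAF identifier would need a different matching route, which this sketch does not cover.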