Wikipedia:WikiProject CRUK/November & December Report

Cancer Research UK ‘Wikipedian in Residence’ project Wellcome People Award no. WT103116AIA

Fourth bimonthly update – for November and December 2014

Introduction
Cancer Research UK (CRUK) has a Wikipedian in Residence, John Byrne, who started on 1st May 2014 on a four-day-a-week contract originally due to run until mid-December 2014, now extended to mid-February 2015. The main funder of the project is the Wellcome Trust. The overall aim of the project is to forge links and dialogue between CRUK, Wikipedia, Wikimedia UK and the wider cancer research community, and so begin to improve the cancer-related content on Wikipedia.

This report covers November and December 2014.

The project focuses on four areas:

1. Research: a short project to understand a) how people use Wikipedia when searching for cancer information online, and b) how they rate the information on Wikipedia before and after article improvements.

2. Content: forging links between CRUK experts, clinicians and Wikipedians, to improve Wikipedia content. We will initially focus on improving four cancer type articles – Pancreatic cancer, Oesophageal cancer, Lung cancer, and Brain tumour – since these are the four cancers of ‘unmet need’ identified in the CRUK research strategy.

3. Creative Commons: investigating the possibility of releasing CRUK content to Wikimedia Commons under a Creative Commons license.

4. Training: To increase awareness of how Wikipedia works, and how to edit and review articles, across CRUK staff, researchers and more widely.

A further objective is to disseminate lessons learnt during the project.

Progress so far:

Extension of project period
Thanks to additional funding, John’s role as the employed and dedicated CRUK Wikipedian in Residence will now continue for an extra 2 months until the middle of February 2015. The first month is funded by Wellcome Trust, followed by CRUK, then Wikimedia UK for the last period in February. Many thanks to our funders!

Objective 1 – Research
The qualitative research finally got underway, after much consideration of the details with UCL’s Senior Lecturer in Health Informatics, Dr Henry Potts, a Wikipedian of several years’ standing who kindly agreed to supervise the research. He previously supervised a research project interviewing editors of Wikipedia’s medical content, which was published in December. Henry is based at the new Farr Institute of Health Informatics, and we are lucky to be able to use a training room there as a neutral setting for the qualitative research sessions. John is conducting the sessions, and has had some training from Henry on doing this. Henry has also been present for the initial two sessions.

Description of the qualitative research protocol

The research takes lay subjects, and observes them as they research an unfamiliar medical topic on the web, with interviews about their experiences doing this, before and during the session. The subjects are given a scenario in which someone they are interested in (but not too close to) has been diagnosed with pancreatic cancer. The subjects are asked to find out about it on the web for a period of about 15 minutes.

Pancreatic cancer has been chosen for this, as one of the four cancers with unmet need emphasized in CRUK’s 2014 Research Strategy. At the beginning of December 2014 the Wikipedia article was nominated at Wikipedia’s Featured Articles Nominations page.

The web journey is recorded with screen recorder software, and tracked with tracking software, recording the pages visited and the time spent on them. If the subjects do not look at Wikipedia in their main search period, they are asked to do so at the end.

At the conclusion of their research period the subject is asked to answer two simple questions on pancreatic cancer, then given a semi-structured interview of some 15-20 minutes on their experience, aided by playback of the screen recording. This covers why they went to and left particular pages, and what they thought of them. It also covers which sites and pages they found most helpful and why, and any other issues that emerge in the course of the interview. They are asked to complete short questionnaires rating each site where they spent any significant time on key points, with 5-way rating boxes. Only the interviews are audio recorded, to be fully transcribed (in India) later.

Research progress
After discussions with a UCL researcher using comparable methods, an ad was placed in Gumtree London (free) seeking subjects at £15 for a session of an hour or a little more. This produced 25 subjects, relatively few of them students (a nice change from the usual in such studies) and with a good spread of gender, age, and occupation. The very handy central location probably helped here. It can easily be repeated.

Equipment was borrowed from Wikimedia UK – audio recorders and, for technical reasons, a dedicated laptop. In December there were two initial sessions. These were planned as isolated sessions a few days apart to allow time for adjusting the protocol and technical equipment as necessary. The first was a session with a UCL administrator, not at full length, but running through all the parts of the protocol, to test the use of the equipment, paperwork, and facilities. A few days later the second session was a full session with a Gumtree recruit, which can be included in the research results. Henry Potts observed both sessions, and made many useful comments. He has also helped draft the consent forms and other documentation.

Bulk bookings have been made for January; we anticipate seeing up to about 30 subjects in all, rather more than we first planned.

The quantitative study will be run through YouGov in the New Year, with a much larger group being sent online to one of 3 different webpages, again on pancreatic cancer. These are: Wikipedia latest (hoped to be a Featured Article by then); Wikipedia as at May 2014, at the start of the project; and NHS Choices.

Objective 2 – Content
The article on Pancreatic cancer was nominated for Wikipedia Featured Article status, the top level of reviewed content, at the start of December. The review process remains ongoing at the time of writing, with Featured_article_candidates/Pancreatic_cancer/archive1 showing sections with comments from 9 different editors, a higher-than-average number [update: now it's passed]. In addition, several other editors have commented on that page or the article talk page, or just made edits, so that over 15 editors can be said to have contributed to the “Featured Article Candidate” process. This seems to be drawing to a close now, and we are hopeful of being awarded “FA” status in January, given the many favourable comments. Altogether 430 edits were made to the article in December, and the start of January has also been busy.

The article also now reflects the comments from an outside expert review by Andrew Biankin, Regius Professor of Surgery at Glasgow University, a leading researcher into pancreatic cancer as well as a surgeon. Earlier internal CRUK reviews were covered in previous reports.

The Brain tumor article has been the most difficult of the targeted articles to approach, as the subject is divided into a huge range of different tumour types, with only certain things in common. An internal workshop was held to map out an approach to making what will be largely a new article. Several parts of CRUK were represented, giving an excellent all-round view of the article and its subject. This produced a considerable number of notes, which we plan to use in January.

The CRUK statistics team, after receiving training, have added basic UK statistics to the Wikipedia articles on the 35 cancer types they cover in detail.

On December 9th “Endometrial cancer”, which the project helped to reach Featured Article status in October, was “Today’s Featured Article”, with the top left spot on Wikipedia’s front page. This added 20,000 extra views over a few days to the usual 500 or so views per day the article receives.

Work has continued on “Oesophageal cancer”, though the “Pancreatic cancer” FA candidacy diverted editor effort in December. Once that is over it will receive renewed attention.

Objective 3 – Content licensing and release
CRUK has now agreed in principle to release large parts of the web copy used on the CRUK website under Creative Commons BY-SA open licenses, subject to technical implementation.

A number of newly-created diagrams have been uploaded to Wikimedia Commons, and some other images such as photos of CRUK buildings. The upload of the first animations has so far been delayed by technical uncertainty, as Wikimedia Commons requires open file formats, which are difficult for animations.

The high view counts for articles using the CRUK images already uploaded (see last report) continue. In November the 80 Wikipedia articles, in a total of 8 languages, that used CRUK images received 1.35 million page views. As before, the ~30% of views on mobile devices are not included in these figures. December figures are not yet available (but are always lower than average, because of the holidays).
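As a rough illustration of what the missing mobile views might add, the short sketch below scales up the November figure. It assumes that the "~30%" means mobile views make up about 30% of all views (desktop plus mobile); the report does not state this precisely, so the result is only an order-of-magnitude estimate, and the function and variable names are illustrative.

```python
# Rough estimate only: scales the counted (non-mobile) views up to a total.
# ASSUMPTION: the report's "~30%" is the mobile share of ALL page views.

def estimate_total_views(counted_views: float, uncounted_share: float) -> float:
    """Return estimated total views when `uncounted_share` of the total is missing."""
    if not 0 <= uncounted_share < 1:
        raise ValueError("uncounted_share must be in the range [0, 1)")
    return counted_views / (1 - uncounted_share)

counted_views = 1_350_000  # November page views quoted in the report
mobile_share = 0.30        # approximate mobile share quoted in the report

estimated_total = estimate_total_views(counted_views, mobile_share)
estimated_mobile = estimated_total - counted_views

print(f"Estimated total views:  {estimated_total:,.0f}")
print(f"Estimated mobile views: {estimated_mobile:,.0f}")
```

Under that assumption the November total would be a little over 1.9 million views, of which somewhat under 0.6 million would be on mobile devices.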

The vast bulk of these views are on the English Wikipedia, but the 15 images used in non-English versions of Wikipedia received 63,000 views, a sharp increase. In November, 5 images on pancreatic cancer had the text labels within the image translated into German, and they are now used in the German article. This is the first time this has been seen, the other non-English usage being of images either with no text or keeping the English text.

This was done with no contact with the CRUK project, and it was always envisaged that such translations would be done organically. One of the strengths of the .svg file format used is that this is easy for those who understand graphics. A difficulty may be ensuring that all translations are correctly tagged or categorized as CRUK-originated, for collecting their viewing statistics.

Objective 4 – Training
November and December saw the completion of the planned programme of Wikipedia training sessions, with full workshop sessions at the CRUK research institutes in Glasgow, Oxford and Cambridge, as well as one in London for a UCL research group supported by CRUK. There was also the final planned session for CRUK in London, although now that the role is extended there is the possibility of another session in 2015.

Feedback from these sessions, using standard forms with 7 questions and 5-way multiple choice answers, has been good, and increasingly so as the presentation for these particular groups has been refined. Most participants have signed consent forms allowing their subsequent edits to be tracked as groups by the Wikimedia Foundation. The final report will include analysis of the feedback and subsequent editing. The amount of subsequent editing is usually highly variable, depending on a number of factors, including the type of attendee, but the raw figures are often not high, especially where senior local opinion-formers are concerned.

The researcher attendees ranged from doctoral students to full professors and team leaders, along with some administrative and support staff. Typically their backgrounds were in basic research areas such as cell biology or genetics. This is an area where Wikipedia’s coverage is often strong on specialized topics, though tending to become outdated as the period when editing was at its peak (c. 2006-11) recedes.

Dissemination
Henry and John attended and presented at the SPOTON conference for new media science communicators, this year held at the Wellcome Collection on November 14-15. The session, entitled “Improving Science on Wikipedia”, was well attended, and included a full run-down by Henry of the progress of the CRUK Wikipedia project to date (see accompanying PowerPoint slides). This was followed by a panel-led discussion, bringing in John and also Dr Mike Peel, a Manchester astronomer and long-serving Wikipedian, who had kindly agreed to join us in London, as well as Holly Millward from the Cochrane Collaboration, who discussed her organisation’s Wikipedian in Residence project.

There was a lively discussion between the panel and an audience of about 30.

At the conference, John led an unexpected Wikipedia training session in the “unconference” session on the 15th, assisted by Mike. This was in the last slot of the event on Saturday afternoon, and rather better attended at the start than the end.

Legacy
The transition to “business as usual” after the project is in progress, with various ways of keeping CRUK-Wikimedia co-operation going being established and explored in the various project areas. For example, the new diagrams were uploaded by regular CRUK staff on the CancerHelp team who had been trained to do this. The statistics team intend to make it part of their routine to keep their information up to date on Wikipedia as well as on their own CRUK webpages.