User:John Cummings/Old/Mapping content gaps on Wikimedia

Wikipedia and other Wikimedia projects have large gaps in information. It is estimated that Wikipedia could cover over 100 million topics; English Wikipedia, the largest Wikipedia, currently has articles on only a fraction of them. There are very large gaps in many topics, for example fewer than 20% of biographies on English Wikipedia are about women and non-binary people.

But how do we map what subjects are missing from Wikimedia? One way is to collate existing sources, such as databases and reference sources, and to document expert knowledge.

The value for Wikimedia

Some parts of Wikimedia are already doing this work. Women in Red is a project which writes about women and non-binary people on Wikipedia and uses lists compiled by organisations and crowdsourcing to map who is missing from Wikipedia.

Collecting information from a wide range of sources and groups is extremely important in order to include as many topics and viewpoints as possible; for more information see Standpoint Theory. There is a wide range of people with expert knowledge, including academics, professionals, people with (tacit) lived experience, activists, policy makers etc. Sharing an overview of a topic area is an easy and quick way for people with knowledge to start contributing to Wikimedia projects. Note: not everyone who says they have in-depth knowledge of a topic does, e.g. people who believe in conspiracy theories; Wikipedia:Reliable sources can provide some guidance.

Traditionally Wikipedia has found it hard to engage with experts to share information on Wikimedia projects; retention rates for in-person training workshops are below 1%. Working with experts to compile topics provides an opportunity to separate knowledge sharing from editing skills, which are very often an unrealistic barrier to entry. It allows us to work with experts to quickly collect information on many topics in a way that can be used by Wikimedia projects in 300 languages.

The value for others

Experts

People are often very happy to work with Wikipedia to share information; they recognise it as a valuable source of knowledge and understanding of their area of work.

People using Wikimedia as a resource

Compiling data can also help others who use Wikimedia in their work. For example, only one fifth of experts interviewed in the media worldwide are women. Could this change if journalists had easy-to-access lists of all experts in specific areas? In turn this would make more reference sources available on women experts, allowing us to write about them on Wikipedia.

Examples

Working with experts

Several projects have been run to collate information from different groups with knowledge on a topic area. Each takes a different approach and collects information in different ways based on the audience.

COVID related topics Wikiproject COVID 19 Main messages

Wikimedians worked with UN agencies to compile important messages, missing topics, and reference sources related to the COVID pandemic. These are people with extensive subject matter expertise on a very fast-moving topic which is very popular on Wikipedia.

Sexuality topics Wikidata: Switched On: Working with experts to collate a worldwide database of sexuality topics

Wikimedians worked with experts on sexuality and sex education at a UN conference to map which sexuality topics and reference materials are missing from Wikimedia projects, creating a resource that is useful to Wikimedia and that the experts can use and contribute to.

FindingGLAMs Meta:FindingGLAMs

Wikimedia Sverige (Sweden), UNESCO and the Wikimedia Foundation are working to build a truly worldwide database of cultural heritage institutions and their collections on Wikipedia. They have worked with government delegations to UNESCO remotely to collate datasets on cultural heritage institutions.

Icelandic women Wikidata:Icelandic Women

Wikimedians collaborated with Icelandic institutions to compile on Wikidata the most exhaustive list of Icelandic women available online, adding over 1900 new women to Wikidata. Because Women in Red uses Wikidata, this work automatically feeds into their work.

Working with a community

Mapping the open movement Github:Mapping the open movement

Wikimedians ran a social media campaign to map organisations working in the open movement on Wikidata.


 * 1) Scope: approaches to mapping a topic
 * 2) Types of information: kinds of information to collate
 * 3) Identifying sources: which sources or people may hold the information

Scope

There are several ways to approach mapping a topic area and many valuable kinds of information to collect. Wikimedia projects have different notability rules, so collecting all the information available makes it useful for all Wikimedia projects.

Types of information

There are several kinds of information that can be mapped to help improve the knowledge available on Wikimedia projects, including:


 * Topics: mapping the topics in a subject is extremely useful. These could be subjects, people, events etc, anything that could be covered on Wikipedia, Wikidata and other Wikimedia projects.
 * Databases: existing resources that contain useful data. Including data sources of different kinds and granularity is important, as they can be used in different ways: data can be used to create lists of missing Wikipedia articles, facts can be imported into Wikidata etc.
 * Reference sources: information sources that can be used to create a reading list for a topic, to help people writing about the topic cover it better, e.g. content that could be used as sources for Wikipedia articles. These could be books, blogs, publications, videos, video channels etc, anything that can be used as a reference source on a Wikimedia project.

Information sources

There are many ways to collect this information, and it can be done both online and in person:


 * Social media: asking for input from a wide range of people or from individuals. This could be done in conjunction with professors, professional organisations etc to expand reach.
 * Online groups of knowledgeable people: these could be organisations or groups of practitioners or hobbyists.


 * Conferences: as a session or an activity (see Switched On and Conference Tables for more information). Sessions allow for in-depth collating of information, often with fewer people but a longer opportunity to contribute. Tables or other displays offer the opportunity to contribute throughout the conference, giving contributions from more people but often fewer contributions per person and less depth.


 * Professional organisations: work with a professional society to get information from their members.
 * A literature review of available sources: collecting a list of reference sources available on a subject.
 * Librarians: have a good understanding of the available resources on many different topics.

This section explains the process of collecting the information:
 * 1) Communication: creating messages when asking others to help compile the information.
 * 2) Ways to collect information: tools used to collect information
 * 3) Campaign / Workshop: Working with others to compile the information

Communication

When asking people to take part in sharing their knowledge it is important to be clear and make contribution easy.

Clarity
 * Clearly state what you are asking of participants.
 * Provide clear instructions on how to contribute knowledge.
 * Clearly explain the value of doing the work, the impact it might have and how it will benefit an audience, using terms the audience will be familiar with. E.g. ‘help build a worldwide database of LGBTQ+ topics on Wikipedia’, rather than ‘We are adding data about LGBTQ+ topics to Wikidata please help us’.
 * Don’t use jargon or specialist language the audience may not know, e.g. it’s often helpful to say Wikipedia rather than mentioning Wikidata, Wikimedia etc, because people already know and have strong positive feelings towards Wikipedia.

Ease of use
 * Make contributing as easy as possible, with as few steps as possible. The more complicated it is, the more people will give up.
 * Prioritise ease of contribution over saving yourself the extra work of collating the information later. E.g. when working with a non-Wikimedia audience, ask people to tweet a list of important female scientists rather than use wiki pages, which are technically difficult. If someone has 10 minutes to spend sharing knowledge, signing up for a Wikimedia account and learning how to edit will take much more time than they have available, meaning you get nothing from them.

Ways to collect information

There are several options for collecting information:

Collaborative document

Online collaborative documents, like Google Docs and Google Sheets, are easier for people to use than a wiki page. Many online collaborations have used Google Docs and Google Sheets to collate information on a subject.

Survey

Another option is creating a form, e.g. with Google Forms, which people fill in individually and which feeds into a spreadsheet. This is good for getting structured feedback but is bad for avoiding repeated information and for creating a sense of community and shared work.

Wiki page

Wiki pages are significantly more difficult and take more time to learn than the other options. They are only really realistic to use with existing Wikimedia community members or others who are familiar with using a wiki.

This section describes the process of collecting information
 * 1) Option 1: Online collaboration: collating information online
 * 2) Option 2: Workshop: how workshops can be run with people with knowledge to collate information.

Option 1: Online collaboration Coming soon

Examples of online or remote collaboration include FindingGLAMs and Mapping the Open Movement.

An offline version of this work can also be run as a pinboard or other interactive display at a conference or other physical event. See Wikidata:Switched On as an example.

Option 2: Workshop

A workshop can be run as a conference session or a stand-alone event. It can also be mixed with other activities like sharing knowledge on a pinboard using post-it notes. An example of this activity is the Switched On Conference session. Here is a suggested structure for a workshop:


 * 1) An introduction to Wikipedia, describing what Wikipedia is, its audience and how it fits into a larger context, e.g. the Sustainable Development Goals. Often it is helpful to use Wikipedia as a catch-all term to save time explaining Wikimedia vs Wikipedia.
 * 2) Examples of projects that can be done with Wikipedia, including sharing text, images and data, and supporting Wikipedia by hosting events and promoting initiatives. Also describe the value of collating data, e.g. using the FindingGLAMs project, which created a worldwide database of cultural heritage institutions, as an example.
 * 3) A workshop to collate the experts' knowledge, asking people to share what they know: people, projects and organisations, books, publications and databases, topics, videos, main messages etc. Ask people to write down answers to each topic on post-it notes for several minutes, then have everyone come together to compile the information on a large sheet of paper where people can see the information growing as it is added.
 * 4) The messages can be displayed on a table to allow everyone at the conference to contribute their knowledge.

This section outlines the steps to process and understand the information collected.
 * 1) Processing the information: transforming the information into a usable format
 * 2) Analysis: understanding patterns, mistakes and themes in the information collected.

Processing the information

Sort the information into categories, or organise it in another way that helps others understand it, and import it into a spreadsheet or list.
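
As a minimal sketch of this step, the Python below groups raw notes by category, drops repeated entries and produces CSV text that a spreadsheet can open. The category names, the sample notes and the `process_notes` and `to_csv` helpers are illustrative assumptions, not part of any of the projects described on this page.

```python
import csv
import io
from collections import defaultdict


def process_notes(notes):
    """Group (category, text) notes by category and drop repeats.

    Repeats are detected case-insensitively after tidying whitespace;
    the first spelling seen is the one that is kept.
    """
    grouped = defaultdict(list)
    seen = set()
    for category, text in notes:
        category = category.strip().lower()
        text = " ".join(text.split())  # collapse stray whitespace
        key = (category, text.casefold())
        if text and key not in seen:
            seen.add(key)
            grouped[category].append(text)
    return dict(grouped)


def to_csv(grouped):
    """Return the grouped notes as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "entry"])
    for category in sorted(grouped):
        for entry in grouped[category]:
            writer.writerow([category, entry])
    return buffer.getvalue()


# Hypothetical notes transcribed from post-it notes.
sample_notes = [
    ("Topics", "Comprehensive sexuality education"),
    ("topics", "comprehensive  sexuality education "),  # repeat of the first note
    ("Databases", "Example national statistics portal"),
    ("Topics", ""),  # empty note, ignored
]

grouped_notes = process_notes(sample_notes)
csv_text = to_csv(grouped_notes)
```

The resulting text can be saved as a `.csv` file and opened in any spreadsheet tool. Matching repeats only on case and whitespace is a deliberately cautious choice; near-duplicates with different wording still need a human to review them.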

Analysis

It can be very helpful to analyse the information you’ve collected before sharing it more widely, to understand whether there are any issues, such as obviously missing or incomplete information. Include your notes on the information you’ve collected when sharing it, to help others understand what to consider when using it.
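
A first pass over the data can be automated. The sketch below, in Python, counts entries per category and flags rows with empty fields and categories that received no entries at all. The field names, the list of expected categories and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical rows as they might look after processing; the field names
# ("category", "entry", "country") are assumptions for illustration.
rows = [
    {"category": "people", "entry": "Example Person A", "country": "Fiji"},
    {"category": "people", "entry": "Example Person B", "country": ""},
    {"category": "topics", "entry": "Example topic", "country": "Kenya"},
    {"category": "databases", "entry": "", "country": "Sweden"},
]

EXPECTED_CATEGORIES = {"people", "topics", "databases", "reference sources"}


def summarise(rows, expected_categories):
    """Count entries per category and list gaps worth noting."""
    # Only rows with a non-empty entry count towards a category.
    counts = Counter(row["category"] for row in rows if row["entry"].strip())
    # Positions of rows where at least one field was left empty.
    incomplete = [
        index
        for index, row in enumerate(rows)
        if any(not value.strip() for value in row.values())
    ]
    # Expected categories that received no usable entries.
    empty_categories = sorted(expected_categories - set(counts))
    return counts, incomplete, empty_categories


counts, incomplete_rows, empty_categories = summarise(rows, EXPECTED_CATEGORIES)
```

Counts like these only point to where to look; judging why a category is thin, or whether a pattern is a mistake, still needs someone who knows how the information was collected.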

Example: Switched On workshop
 * Many of the things people would like to see as Wikipedia articles appear to be information they would like to use in their work, e.g. lists of sexuality education apps, content producers etc. This points to the practical value of Wikipedia in people's professional work.
 * Many of the topics people suggested are about how two topics relate to one another, e.g. gender and feminism. The structure of Wikidata and Wikipedia normally focuses on one topic, not the relationship between subjects; it is unclear how to work with this.
 * Many of the people suggested are from countries that are not well covered in the media, e.g. Fiji, Eswatini. This shows that there is a significant amount of knowledge that may only be available within small communities and local media, or may not be documented at all. Since Wikipedia and Wikidata rely on referencing, it’s unclear how these topics can be covered.
 * Whilst most of the topics raised can be imported into Wikidata, the 'main messages' category does not really fit; it’s unclear what to do with this information.


 * 1) Making the information available: to the Wikimedia community and the people who share information.
 * 2) Helping people find the information: People can only use the information if people are aware of it
 * 3) Sharing your process: so other people can learn from it.

Making the information available

Once the information has been collected it can be imported into Wikimedia projects in many different ways; look at existing projects for ideas of how to use the data collected. It is also important to provide people with the information in a format that they can use, e.g. don’t expect people to be able to use the Wikidata query service or to have the time or interest to learn.


 * Lists on Wikiprojects: Compiling lists for Wikiprojects; these can be reading lists or redlists like those Women in Red use.
 * Wikipedia Diversity Observatory: The Wikipedia Diversity Observatory is a tool which can be used to map cultural gaps between different language projects along a number of parameters. The tool is particularly good at surfacing content that is covered well on one language Wikipedia project but not yet covered on another. You can also prioritize article lists in terms of pageviews, links to other articles, and other parameters.
 * Wikidata: Adding data to Wikidata is very useful; it can be used by all Wikimedia projects in all languages and it can be combined with existing data. Several Wikiprojects already use Wikidata. You don’t have to add the data yourself: if you have a large spreadsheet you can request that it is added on Wikidata:Datasets and leave a message on Wikidata project chat. If you have a small number of topics and you would like to add the information yourself, you can learn by taking the Wikidata Tours.
 * Wikidata queries: If you import the data into Wikidata, queries can be created which show which of the Wikidata items already have Wikipedia articles. If you don’t know how to write queries you can ask at Wikidata:Request a query or learn yourself.
 * Raw data: It’s also really helpful to share your raw data in a spreadsheet. You can copy it into a Wikitable and also provide an online spreadsheet in something like Google Sheets, which can be a really easy way for others to access your raw data and use it in new ways. Currently Wikimedia Commons does not support spreadsheets, see this Phabricator task for more information.
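
To illustrate the kind of Wikidata query mentioned above, here is a minimal Python sketch that builds a SPARQL query listing which items from a given set do not yet have an English Wikipedia article. The item IDs are placeholders and the helper names are assumptions; the query uses the common `schema:about` / `schema:isPartOf` pattern for sitelinks and relies on the prefixes that the Wikidata Query Service predefines.

```python
import re
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def missing_articles_query(qids, wiki_url="https://en.wikipedia.org/"):
    """Build a SPARQL query listing the given items that have no article
    on the Wikipedia identified by wiki_url."""
    for qid in qids:
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""SELECT ?item ?itemLabel WHERE {{
  VALUES ?item {{ {values} }}
  FILTER NOT EXISTS {{
    ?article schema:about ?item ;
             schema:isPartOf <{wiki_url}> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def query_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Placeholder item IDs; replace them with the items from your own list.
example_query = missing_articles_query(["Q42", "Q7259"])
example_url = query_url(example_query)
```

The text in `example_query` can be pasted into the Wikidata Query Service to try it out. The sketch only builds the query and the URL; if you send requests from a script, set a descriptive User-Agent header and keep the request rate low.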

Helping people find the information

People can only use the information if they are aware of it and know where to find it, so tell people who can use it. Places you can tell people that the information is available include:
 * Wikiprojects
 * Village pumps
 * Wikipedia social media channels
 * Wikipedia Facebook groups like Wikipedia Weekly

Sharing your process

Sharing your process allows people to understand how you collected the information, what can be done to build on your work and how to run their own projects.

Process

This includes information about who you are working with, the steps you took, and how and why you did them that way. Share any resources you created for the work so others can use them.

Example: The Switched On workshop

The workshop was a conference session to educate people about Wikipedia's role in sexuality education and to crowdsource people's knowledge on the topic. The presentation used in the workshop is available here. The workshop had four stages:
 * 1) An introduction to Wikipedia, describing what Wikipedia is, its audience and how it fits into the Sustainable Development Goals. Wikipedia was used as a catch all term to save time explaining Wikimedia vs Wikipedia.
 * 2) Examples from UN agencies of projects that can be done with Wikipedia, including sharing text, images and data, and supporting Wikipedia by hosting events and promoting initiatives. The value of collating data was also described, using the FindingGLAMs project to create a worldwide database of cultural heritage institutions as an example.
 * 3) A workshop to collate the experts' knowledge, asking people to share what they know: people, projects and organisations, books, publications and databases, topics, videos, main messages. People were asked to write down answers to each topic on post-it notes for several minutes and then everyone came to the front to compile the information on a large sheet of paper. People could see the information growing as it was added. Note: a less developed version of this workshop was run at a UNFPA event in Nairobi, Kenya; the results have been combined.
 * 4) The messages were displayed on a table to allow everyone at the conference to contribute their knowledge. Around 20% of the information was added outside of the workshop.

Suggest improvements to the process

What did you learn from the work you did? What improvements could be made in the process, either generally or for working with a specific audience?

Example: The Switched On workshop

The most important improvements to make for the next version of the workshop are to explain the value of crowdsourcing the knowledge in the room earlier in the presentation and to ask people to write as clearly as possible.

Suggest follow-on work

What could be done to build upon the work you’ve done to map this topic area? What has mapping this topic area highlighted as missing? What else could be done in this area?

Example: The Switched On workshop

400+ ideas and suggestions were created in half an hour by 20 people; this volume of information demonstrates the breadth of content available in this area. The majority of the people attending the conference were from Europe and North America, meaning that local knowledge from other areas of the world was not included.