Wikipedia:WikiProject Open/Open access task force/Signalling OA-ness



This page is about how Wikipedia pages could signal to readers whether a particular reference is open access, as outlined in this Signpost op-ed. The main purpose of such signalling is to spare readers the disappointment of clicking through to a resource only to find that they do not have access rights to read it. The scheme is also useful for Wikipedia editors, who can see at a glance whether a given reference is licensed in a way that allows its images, media or even text to be reused in Wikipedia articles.

Project summary
Some automated tools that work with open access articles have already been created. They impose nothing on anyone who does not wish to use them. For those who do, they automate parts of the citation process and produce an unusual, Wikipedia-specific citation which, contrary to academic tradition, notes whether a work is free to read rather than subscription-only. The tools also extract everything reusable from open access works, including the text of the article, the pictures or media used, and some metadata, and then place this content in several Wikimedia projects, including Wikimedia Commons, Wikisource, and Wikidata, as well as generating the citation on Wikipedia.

In practice, someone making a citation uses the "Signalling OA" tool. The citation is generated as usual, but if the paper is open access, it is also mirrored on Wikisource, its images are uploaded to Wikimedia Commons, and metadata about the paper goes to Wikidata. From the user's perspective they have simply made a citation, but with this tool, making a citation can also automatically trigger the collection of any content that is free to take. For further background, see here.
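The flow just described can be sketched roughly as follows. This is a minimal illustration and not the project's actual tool: the `Paper` record, the `make_citation` function, the wording of the access label, and the placeholder DOI are all hypothetical, and a real importer would talk to Wikisource, Wikimedia Commons and Wikidata rather than return a list of action descriptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for a cited work; the field names are illustrative."""
    title: str
    doi: str
    is_open_access: bool
    image_names: list[str] = field(default_factory=list)


def make_citation(paper: Paper) -> tuple[str, list[str]]:
    """Return the citation text plus the follow-up actions it would trigger."""
    # The citation itself signals whether the work is free to read.
    label = "open access" if paper.is_open_access else "access may be restricted"
    citation = f"{paper.title}. doi:{paper.doi} ({label})"

    # Only open access works trigger the collection of reusable content.
    actions: list[str] = []
    if paper.is_open_access:
        actions.append("mirror text on Wikisource")
        actions.extend(
            f"upload {name} to Wikimedia Commons" for name in paper.image_names
        )
        actions.append("send metadata to Wikidata")
    return citation, actions


# Placeholder paper; the DOI is made up for illustration.
citation, actions = make_citation(
    Paper("Example paper", "10.1000/example", True, ["fig1.png"])
)
print(citation)
for action in actions:
    print(" -", action)
```

For a paper that is not open access, the same call returns only the citation and an empty list of actions, which matches the point that the user simply makes a citation and nothing else is forced on them.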

In a nutshell


Citations used in Wikimedia projects should signal whether a person may actually read the work cited, rather than encounter a paywall. Furthermore, if the source is open access, it can be mirrored on Wikisource by a robot which already exists and is being tested. This would have many effects in every language of every Wikimedia project that uses citations. On English Wikipedia, it would allow (but not force) writers to make "open access-signalling" citations, track the use of citations, give readers access to the works being cited, and import non-text media from open access sources into Commons. The tools to automate this work are in place - see examples of works imported into Wikisource. Working with the community, we can engage with the literature on Wikisource, Wikimedia Commons, and Wikidata, and Wikipedia readers and editors can build on this citation infrastructure. This inherently affects every language, as it implies massive potential for translating and suggesting sources across languages. All of this can be ignored and forces no changes on anyone, but the tools, and a large amount of mirrored content on Wikimedia projects, would be available to anyone who wants to use them. To typical readers, the only obvious change would be that citations look like the one below when the writer chooses to use the Signalling OA tool suite, which signals rich content and opportunities for reuse:

While our initial focus is on the English Wikipedia, there are a number of articles on the Dutch Wikipedia that have statements like "Dit is een open access artikel, beschikbaar onder de licentie Creative Commons Naamsvermelding (CC-BY; versie 2.0)." ("This is an open access article, available under the Creative Commons Attribution license (CC-BY; version 2.0).") We will work on coordinating such activities across languages and Wikimedia projects. Initial tests are being conducted here.

Longer summary
"Open access" is a term for academic publications (research articles) which can be read and remixed for free. Social movements, especially since the early 2000s, have called for more access to these publications. This "Signalling OA-ness" project might be the first and only viable proposal to catalog every academic publication that exists and to provide access to every open access publication among them.

This proposal is more viable than others because the Wikimedia projects have as strong a claim as any to being the best platform for hosting a list of every article ever published: anyone can add content to them, and they already have a user base that has been doing so since the projects were founded. The following points make Wikimedia projects more likely to achieve this:
 * No system exists anywhere in the world to catalog all academic papers and signal which ones are open access.
 * Wikimedia projects already have the world's largest base of volunteers citing academic papers in a single platform.
 * Wikimedia content on any given subject is almost always either the world's most popular content on that subject or among the most popular content. This is because Internet search directs people who ask for information to check Wikimedia projects.
 * Wikipedia guidelines say that information in Wikipedia should come from reliable sources, and since Wikipedia's founding academic papers have been recognized as ideal reliable sources. Wikimedia projects have therefore already established a need for, and a culture of practice around, citing academic papers.
 * MediaWiki software is an ideal technical platform for hosting a catalog of citations and the text of open access sources, and for granting access to both to anyone who wants it.
 * Wikimedia projects are probably the least expensive and most organizationally neutral place to develop a system for managing all citations while hosting and enriching all open access sources, and doing so is practical.
 * No major opposition to setting this up on a Wikimedia project has been identified. Please comment if you can imagine a reason to oppose; it would be a favor to the organizers.
 * In short, no standard exists; Wikimedia projects have the user base, popularity, need, and technical capacity to create one; it is practical to do it here; and nothing opposes trying. If a team wants to implement the system, it is a project worthy of support.

The actual project works like this:
 * 1) Create a database listing every academic publication ever. This is easier than it sounds.
 * 2) Instead of keeping citations hosted locally on each Wikimedia project, put them into Wikidata. Suppose for simplicity that one paper is cited in 10 Wikipedia articles, and that each of these exists in 200 languages. With current practices, the citation would have to be recreated 10 × 200 = 2,000 times, which is extremely inefficient. On Wikidata it could be created once and then called from there.
 * 3) While this catalog is continually being developed, if a paper cited on any Wikimedia project is open access, copy it to Wikisource. This means putting the text on Wikisource, the media on Wikimedia Commons, the citation on Wikidata, and then the reference on Wikipedia or wherever it is used. See the example below to get an idea of how it looks, and look especially at the "text on Wikisource". That text is automatically generated by a bot, and it looks good.
 * 4) The end result is that whenever anyone cites an academic paper in any Wikimedia project, like Wikipedia, then a bot checks out that paper. It puts a citation for it on Wikidata; checks to see whether the paper is open access; if the paper is open access, then it copies the text and media automatically, and it sets up metrics for how the citation is used.
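The saving in step 2 and the usage metrics in step 4 can be illustrated with a small sketch. It assumes a hypothetical in-memory `CitationStore` standing in for Wikidata; the class, its methods, the placeholder DOI, and the language and article names are invented for illustration and are not the project's actual data model.

```python
from collections import defaultdict


class CitationStore:
    """Hypothetical stand-in for a central citation catalog such as Wikidata."""

    def __init__(self):
        self.records = {}                # DOI -> citation record, created once
        self.used_in = defaultdict(set)  # DOI -> set of (language, article) uses

    def cite(self, doi, title, is_open_access, language, article):
        """Create the record on first use, then only register where it is used."""
        if doi not in self.records:
            self.records[doi] = {"title": title, "open_access": is_open_access}
        self.used_in[doi].add((language, article))
        return self.records[doi]

    def usage_count(self, doi):
        """Number of distinct (language, article) pairs that call this citation."""
        return len(self.used_in[doi])


store = CitationStore()

# One paper cited in 10 articles, each existing in 200 language editions.
for lang in range(200):
    for art in range(10):
        store.cite("10.1000/example", "Example paper", True,
                   f"lang{lang}", f"article{art}")

local_copies = 10 * 200  # copies needed if every page keeps its own citation
print(local_copies)                          # 2000
print(len(store.records))                    # 1
print(store.usage_count("10.1000/example"))  # 2000
```

The record is created once and every later call only adds a usage entry, so the same structure that removes the duplication also provides the per-citation usage count mentioned in step 4.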

It is extremely motivating to authors, publishers, organizations, governments, and universities to know how their papers are cited, and very empowering to the Wikimedia community to be able to police the use of citations at the source rather than at the article level. This could generate a level of quality control that attracts new and expert users in new ways.

There are some example open access papers already placed on Wikisource and Wikimedia Commons as tests for the importer robot, and these can be seen at s:Wikisource:WikiProject Open Access/Programmatic import from PubMed Central. There are some bugs, but the papers are quite usable. The proposal is to do this kind of importing for everything that can be imported.

As further incentive, once these papers are imported, they can be enriched in all kinds of ways by categorization, interlinking, annotations or limitless other proposals.

Code
A repository has been set up at https://github.com/wpoa/OA-signalling to host all software related to this project.

Funding
The project has received support through a grant from the Open Society Foundations, the proposal for which is available here. A report is available here.