Open Science Monitor



An Open Science Monitor or Open Access Monitor is a scientific infrastructure that aims to assess the spread of open practices in a scientific context.

Open Science Monitors have generally been built at the scale of a specific country or a specific institution. They require an accurate assessment of the total scientific output and a further breakdown between open and closed content. They rely on a variety of data sources and methodologies to achieve this end. Consequently, Open Science Monitors have also become relevant tools for bibliometric analysis.

While initially conceived to track publications in academic journals, Open Science Monitors have diversified their scope and indicators. A recent trend has been to map other major outputs of open science research such as datasets, software or clinical trials.

Definition
Open Science Monitors are scientific infrastructures that provide a "good knowledge of the state" of scientific outputs and their "open access rate". They are also a policy tool that aims to better assess the discrepancy between actual practices and long-term objectives: they "can inform future strategies at institutional and national levels, provides guidance for policy development and review, helps to assess the effects of funding mechanisms and is crucial to negotiate transformative agreements with traditional subscription publishers."

Open Access Monitors are a specific variant of Open Science Monitors that focuses on open access publications. They aim to track the share of open access among journal articles, but also "books, book chapters, proceedings, and other publication types". In contrast, generic Open Science Monitors have a more expansive scope and will in effect include all forms of scientific outputs and activities: "By definition, open science concerns the entire cycle of the scientific process, not only open access to publications".

Nearly all Open Science Monitors have been created at a national scale, as part of a general policy of enhanced visibility of public costs and investments with regard to scientific publications. Major examples include the Baromètre de la science ouverte in France, the Open Access Monitor in Germany, JUULI in Finland, the Open Access Barometer in Denmark, NARCIS and later openaccess.nl in the Netherlands and the Swiss Open Access Monitor. A prototype of an open science monitor was also conceived in the United Kingdom in 2017 but "apparently not realized."

International initiatives include the Australian-based Curtin Open Knowledge Initiative (COKI), the Open Science Monitor of the European Union and OpenAIRE. Yet, the scope of their data is more limited than that of national monitors, as they do "not offer evaluation options on an institutional level".

Context
Open science monitors belong to a global ecosystem of open scientific infrastructures. This ecosystem emerged in the first decades of the 21st century as an alternative to the closed infrastructures built by large scientific publishers and analytic companies.

After the Second World War, scientific publishing faced a "periodical crisis": funders, institutions and journals could not keep up with the rapidly increasing scientific output. New infrastructure and tools also had to be developed to keep track of scientific investment. Due to the limited success of public initiatives like SCITEL or MEDLINE in the United States, large private organizations filled this need. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with the Federal administration into a profitable business. The Science Citation Index and, later, the Web of Science had a massive and lasting influence on global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals." Consequently, funders increasingly relied on analytics created by the Science Citation Index and its main competitors to assess the performance of institutions or individual researchers.

After 1990, leading academic publishers started to diversify their activities beyond publishing and moved "from a content-provision to a data analytics business." By 2019, Elsevier had either acquired or built a large portfolio of platforms, tools, databases and indicators covering all aspects and stages of scientific research: "the largest supplier of academic journals is also in charge of evaluating and validating research quality and impact (e.g., Pure, Plum Analytics, SciVal), identifying academic experts for potential employers (e.g., Expert Lookup), managing the research networking platforms through which to collaborate (e.g., SSRN, Hivebench, Mendeley), managing the tools through which to find funding (e.g., PlumX, Mendeley, SciVal), and controlling the platforms through which to analyze and store researchers' data (e.g., Hivebench, Mendeley)." Metrics and indicators are key components of this vertical integration: "Elsevier's further move to offering metrics-based decision making is simultaneously a move to gain further influence in the entirety of the knowledge production process, as well as to further monetize its disproportionate ownership of content." The new market for scientific publication and scientific data has been compared with the business models of social networks, search engines and other forms of platform capitalism: while content access is free, it is indirectly paid for through data extraction and surveillance.

Early developments
The first open science monitors were created in the 2000s and the early 2010s. They were usually conceived as a natural outgrowth of new national and international policies in favor of open access and open science. The Berlin Declaration of 2003 in particular introduced the concept of a global "transition of scientific publishing toward an open access system", which requires "information on publication output and on subscription and publication fees."

Additionally, the diversification of open science publishing into various publication venues (journals, repositories, overlay journals...) and formats (articles, conferences, datasets...) created unprecedented challenges.

One of the earliest forms of open science monitor was the Dutch project NARCIS ("National Academic Research and Collaborations Information System"), which started operating in December 2005. NARCIS is primarily a national scientific portal that aims to integrate "all kinds of types of information from scientific institutes in the Netherlands." Yet it also has a special focus on "academic OAI repositories" and publishes global statistics on the rate of open, restricted and embargoed scientific works since 2000.

By 2013, Finland pioneered the influential Jyväskylä Model through its national portal JUULI. First tested at the Open Science Centre of the University of Jyväskylä, this approach aims "to centralize all aspects of the self-archiving and open access processes lying within the responsibility of the professionals at the university library" in order to ease the process of data collection: "Researchers do as little as possible and, in some cases, nothing at all."

From open access to open science
After 2015, the European Union started to implement ambitious programs and goals within its own funding mechanisms, like Horizon 2020. This created an unprecedented impetus for the development of monitoring tools and methodologies at a supranational scale: "there has also been a general push for increased monitoring, aiming for both increased transparency to enable each country to see what others are doing". By 2018, 81% of the scientific organizations from Science Europe stated that they "planned to develop Open Access monitoring mechanisms in the future".

In their preparatory work for the Open Science Monitor, Smith et al. underlined that "open science is much more than simply open access, despite the fact that open access tends to dominate discussions at present." Beyond research publications, their approach singled out open research data and a wider range of communication activities related to open science, including preprints, evaluations, comments and online discussions on social networks.

In May 2018, the European Commission unveiled its plan for a European Open Science Monitor through a detailed methodological note. While the core features of the Monitor were in line with previous research, it was also announced that Elsevier would be the leading subcontractor for the creation of the platform, despite the academic publisher's past opposition to open science, and that the metrics would combine the metadata of Scopus and Unpaywall to assess the rate of open access publications. The proposal was met with significant backlash, with nearly 1000 researchers and open science activists signing a formal complaint to the European Ombudsman. In an op-ed in The Guardian, Jon Tennant stated that "it is a cruel irony that Elsevier are to be paid to monitor the very system that they have historically fought against."

The European Open Science Monitor has subsequently been reworked in a different direction. As of 2023, the website includes data only up to 2018. In 2022, the European Council clearly stated that "data and bibliographic databases used for research assessment should, in principle, be openly accessible and that tools and technical systems should enable transparency".

The European Open Science Monitor has entailed a significant shift in the objectives and ambitions of similar projects in the member states. In 2018, the French feedback on the Monitor included a detailed plan for the elaboration of open science indicators beyond publications, which would prove to have a direct influence on the Barometer of open science.

International infrastructures
Leading open science infrastructures commonly used in Open Science Monitors include Unpaywall, Crossref and the Directory of Open Access Journals (DOAJ). Crossref is the primary information source of the French Open Science Monitor, as it only considers "publications associated with a Crossref DOI".

Due to significant developments during the 2010s, international infrastructures have a larger scope of "publications, languages and sources" than proprietary databases. Yet "they offer insufficiently standardized metadata, which complicates their collection and processing" and may lack key information for the creation of open science monitors, such as author affiliations.

Local infrastructures and repositories
Local infrastructures include Current Research Information Systems directly managed by scientific institutions and universities that "help manage, understand, and evaluate research activities". At the institutional level they can bring the most extensive coverage of scientific output, especially by taking into account locally published journals that would not necessarily be indexed in global scientific infrastructures. Due to their direct connections with scientific communities, local infrastructures can incentivize researchers to "enter their publications into those systems" and implement a more varied range of indicators than what is commonly available in international databases.

Local infrastructures are managed in a decentralized way, with varying levels of coverage and information depending on the institution. In some cases, local repositories are "fed solely by the large commercial databases" and will not have any added value.

The integration of diverse local sources of data into a common and standardized scheme is a major challenge for open science monitors. The preexistence of an ambitious funding policy considerably eases this process, as institutions will already be encouraged to adopt specific norms and metadata requirements.

While local infrastructures are generally thought of as providers of data for an open science monitor, the relationship can go both ways. In France, the University of Lorraine implemented its own Open Science Monitor, which worked as a local expansion of the French Open Science Monitor.

Proprietary databases
Proprietary databases like the Web of Science or Scopus have long been leading providers of publication metadata and analytics. Yet their integration into open science monitors is contested.

Proprietary databases have long raised issues of data bias that are especially problematic in the national context of most open science monitors. Their coverage is usually centered on English-language publications and neglects resources with a significant local impact. Moreover, reliance on proprietary platforms creates long-term dependency, with added costs and risks of unsustainability: "Commercial providers require licences to access their services, which vary in price and access type".

The French Open Science Monitor is committed to the exclusive use of "public or open datasources". Conversely, the German Open Access Monitor currently relies on Dimensions, Web of Science and Scopus, especially to recover "corresponding author information", even though it "looks out for emerging new data sources, especially open sources".

Methodology
Open science monitors generally aim to bring diverse sources of publication metadata and data into a "central interface" that "enables continuous monitoring at a national level and provides a basis for fact-based decisions and actions." Due to "the complexity of the scholarly publishing system", the building of effective open science monitors is "no trivial task and involves a multitude of decisions".

Data reconciliation
The combination of various bibliometric sources creates several challenges. Key metadata can be missing. Entries are also frequently duplicated, as articles are indexed both in local and international databases.

Persistent identifiers (PIDs) are a critical component of open science monitors. In theory they make it possible to "unambiguously identify publications, authors, and associated research institutions". Publications in scientific journals can be associated with internationally recognized standards such as DOIs (for the actual publications) or ORCID (for authors), managed by leading international infrastructures like Crossref.
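As an illustration of how persistent identifiers support deduplication, the following minimal Python sketch merges records from several sources on a normalized DOI. It is not the method of any particular monitor: the record fields (doi, title, affiliation, oa_status) and the function names are hypothetical, and the sketch only relies on the fact that DOIs are case-insensitive and are often written with a resolver prefix.

```python
from __future__ import annotations

from typing import Iterable

# Common ways a DOI is written in metadata; stripped before comparison.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Lowercase a DOI and strip a resolver prefix, if present."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def merge_records(sources: Iterable[Iterable[dict]]) -> dict[str, dict]:
    """Merge records sharing a DOI; earlier sources win on conflicts.

    Hypothetical record shape: a dict with a "doi" key plus any other
    metadata fields. Empty or missing values are filled from later sources.
    """
    merged: dict[str, dict] = {}
    for records in sources:
        for record in records:
            raw = record.get("doi")
            if not raw:
                # Records without a DOI need another matching strategy.
                continue
            key = normalize_doi(raw)
            target = merged.setdefault(key, {"doi": key})
            for field, value in record.items():
                if field != "doi" and value and not target.get(field):
                    target[field] = value
    return merged
```

In this sketch, an international record lacking an affiliation can be completed by a local record carrying the same DOI, while the duplicate entry disappears. Records without a DOI are simply skipped here, which is exactly the limitation of relying on identifiers alone.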

Despite the preexistence of international standards, open science monitors usually have to introduce their own standardization schemes and identifiers. Limiting the analysis to these standards would immediately "rule out a certain number of journals that do not adhere to this very general technology of persistent identifiers". Furthermore, other forms of scientific outputs or scientific activities (like funding) do not have the same level of standardization.

Even when sources already include persistent identifiers, "some manual standardisation is required", as the original metadata is not always consistent or will not have the same focus. Author affiliation is a crucial piece of information for most open science monitors, as it makes it possible to single out the scientific production of a given country. Yet it is not always available, nor recorded in a systematic manner.

Text & data mining
Open science monitors have recently experimented with a range of text mining methods to reconstruct missing metadata. Even leading databases can miss key information: on Crossref, institutional affiliations are missing for "75% of the indexed content".

Since 2022, the French Open Science Monitor has successfully experimented with natural language processing methods and models to detect disciplines or institutional affiliations. For discipline classification, this has led to the development of scientific-tagger, a word embedding model based on fastText and trained on two annotated databases, PASCAL and FRANCIS.

In 2022, Chaignon and Egret published a systematic reproduction and assessment of the methodology of the French Open Science Monitor (Baromètre de la science ouverte, BSO) in Quantitative Science Studies. Using a mix of proprietary and open databases, they found nearly the same rate of open access publications for the year 2019 (53% vs. 54%). Overall, the open-source strategy used by the BSO proved to be the most efficient approach in comparison with alternative proprietary sources: "The open-source strategy used by the BSO effectively identifies the vast majority of publications with a persistent identifier (DOI) for Open Science monitoring." Additionally, the BSO makes it possible to provide metadata at a "sufficiently fine level to shed light on the geographical, thematic, linguistic, etc. disparities that affect bibliometric studies".

Text and data mining methods are especially promising for the indexation of a wider range of open science outputs. Datasets, code, reports or clinical trials have never been systematically cataloged. Since 2022, the French national plan for open science aims to implement indicators beyond publications, and consequently the French Open Science Monitor is working on the extraction of "references to software and research data" from full-text articles with experimental deep learning models.

Tracking open science adoption
The French Open Science Monitor was conceived from the start to capture "open access dynamic". This has significant implications in terms of design and data flow, as the "OA status of a publication evolves over time" due to embargo policies as well as the retrospective opening of past content.
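The effect of an evolving open access status can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data model of the French monitor: the Publication class, its open_since field and the open_access_rate function are hypothetical. The point it shows is that the open access rate of a single publication year depends on the date at which it is observed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class Publication:
    doi: str
    published: date
    # Date from which an open version exists (hypothetical field).
    # None means no open version has been observed so far.
    open_since: Optional[date]


def is_open(pub: Publication, observed: date) -> bool:
    """True if the publication was open on the observation date."""
    return pub.open_since is not None and pub.open_since <= observed


def open_access_rate(
    pubs: Sequence[Publication], publication_year: int, observed: date
) -> float:
    """Share of one publication year that is open at the observation date."""
    cohort = [p for p in pubs if p.published.year == publication_year]
    if not cohort:
        return 0.0
    return sum(is_open(p, observed) for p in cohort) / len(cohort)
```

With such a structure, a monitor can report several rates for the same publication year, one per observation date, instead of a single figure that silently changes as embargoes expire.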

Despite significant differences with regard to methodologies or data sources, Pierre Mounier underlined in 2022 that "we observe the same dynamic" in the open access monitors of "three different European countries": the French, German and Dutch monitors all converge to show that slightly more than 60% of research is published in open access.

Economic analysis
Open Science Monitors aim to facilitate the estimation of scientific publishing costs. Without any aggregation of publication data, "information on expenditure for open access publication fees and especially for non-open access publication fees is often not available centrally".

The monitor can also help to better assess the economic impact of open science across the entire academic ecosystem. While it is generally assumed that the conversion to open access publishing should not be costlier than the existing system, there can still be significant variations, especially with an APC-based model: institutions with a high volume of publications but limited needs for subscriptions can be in a "worse position financially".
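The arithmetic behind this variation can be sketched as follows. This is a deliberately simplified model with hypothetical function names: it assumes a single average article processing charge (APC) and that every article of the institution is paid for by the institution, neither of which holds in practice.

```python
def apc_model_cost(articles_per_year: int, average_apc: float) -> float:
    """Yearly cost if every article is paid for through an APC."""
    return articles_per_year * average_apc


def cost_difference(
    articles_per_year: int,
    average_apc: float,
    current_subscription_spend: float,
) -> float:
    """APC-model cost minus current subscription spend.

    A positive value means the institution would pay more under
    a purely APC-based model than it pays for subscriptions today.
    """
    return apc_model_cost(articles_per_year, average_apc) - current_subscription_spend
```

With purely illustrative figures, an institution publishing 2,000 articles a year at an average APC of 2,000 and spending 3,000,000 on subscriptions would pay 1,000,000 more under this model, whereas an institution publishing 100 articles and spending 500,000 on subscriptions would save 300,000.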