Uses of open science

The open science movement has expanded the uses of scientific output beyond specialized academic circles.

The non-academic audience of journals and other scientific outputs has always been significant but was not recorded by the leading metrics of scientific reception, which favor citation data. In the late 1990s, the first open access online publications started to attract a large number of individual visits. This transformation has renewed the theories of scientific dissemination, as direct access to publications curtailed the classic model of scientific popularization. Social impact and potential uses by lay readers have become focal points of discussion in the development of open science platforms and infrastructures.

Analysis of open science uses has required the development of new methods, including log analysis, crosslinking analysis and altmetrics, as standard bibliometric approaches failed to record the non-academic reception of scientific productions.

In the 2010s, several detailed studies have been devoted to the reception of specific open science platforms, due to the increasing availability of use data. Log analysis and surveys showed that professional academics do not make up the majority of the audience, as recurrent reader profiles include students, non-academic professionals (policy makers, industrial R&D, knowledge workers) and "private citizens" with various motivations (personal health, curiosity, hobby). Traffic on open science platforms is stimulated by a larger ecosystem of knowledge sharing and popularization, which includes non-academic productions like blogs. Non-academic audiences tend to prefer the use of the local language, which has created new incentives in favor of linguistic diversity in science.

Bibliometrics and its limitations
After the Second World War, the reception of scientific publications has been increasingly measured by quantitative counts of citations. The field of bibliometrics coalesced in parallel with the development of the first computerized search engine, the Science Citation Index, originally conceived by Eugene Garfield in 1962. Founding figures of the field, like the British historian of science Derek John de Solla Price, were proponents of bibliometric reductionism, that is, the reduction of all possible bibliometric indicators to citation data and citation graphs. Bibliometric indicators like the impact factor have had a significant influence over research policy and research evaluation since the 1970s.

Academic search engines, citation data collection and the related metrics were intentionally designed to favor English-language journals. Until the development of open science platforms "very little [was] actually known about the impact of Latin American journals overall". The use of standard bibliometric indicators like the impact factor yielded a very limited outlook on the breadth and diversity of the academic publishing ecosystem in this region and other non-Western areas: "Putting aside issues of equity, the underrepresentation and shear low number of journals from developing countries mean that journals that are geared towards the developing world will have less of its citations counted than one geared towards journals that are in the dataset."

In its early development, the open science movement partly co-opted the standard tools of bibliometrics and quantitative evaluation: "the fact that no reference was made to metadata in the main OA declarations (Budapest, Berlin, Bethesda) has led to a paradoxical situation (…) it was through the use of the Web of Science that OA advocates were eager to show how much accessibility led to a citation advantage compared to paywalled articles." After 2000, a substantial bibliometric literature was devoted to the citation advantage of open access publications.

By the end of the 2000s, the impact factor and other metrics were increasingly held responsible for a systemic lock-in of prestigious non-accessible sources. Key figures of the open science movement like Stevan Harnad called for the creation of "open access scientometrics" that would take "advantage of the wealth of usage and impact metrics enabled by the multiplication of online, full-text, open access digital archives." As the public of open science expanded beyond academic circles, new metrics should aim for "measuring the broader societal impacts of scientific research."

Non-academic audience
Academic journals have always had a significant non-academic audience, made up of students, professionals and amateurs. In 2000, one third of the readers had never authored a scientific publication. This rate may be higher for social science journals, which may also act as intellectual periodicals. During the second half of the 20th century, the non-academic audience may have continuously expanded in Western countries, along with the increasing prevalence of high school education: "the percentage of U.S. adults with a minimal level of understanding of the meaning of scientific study has increased from 12 percent in 1957 to 21 percent in 1999".

The prevalence of a non-academic audience raises additional issues about the relevance and scope of classic bibliometric measures, as these readers would "never appear in citation data". The infrastructures and business models put in place by leading scientific publishers do not consider non-academic uses. Following the periodical crisis of the 1980s and the inflation of subscription prices, major journals have largely become unattainable for lay readers or independent researchers not affiliated with a large research institution. Search engines and bibliographic databases developed since the 1960s and the 1970s were meant to be used by professional librarians. Leading scientific publishers tacitly rely on a "gap" model of scientific reception, where specialized scientific knowledge is not directly accessible but mediated and popularized.

The shift of academic journals to electronic publishing and open access has underlined the significant discrepancy between citation counts and actual use. By the late 1990s, online journals and archive repositories had evidently attracted a very large audience: "Within individual disciplines the change has been nearly instantaneous. As an example, in mid-1997 the number of papers downloaded from astronomy’s digital library, the Smithsonian/NASA Astrophysics Data System (ADS; ads.harvard.edu) exceeded the sum of all the papers read in all of astronomy’s print libraries". Log studies have regularly underlined that open access publications have much higher rates of use and downloading than publications behind a paywall.

The enlargement of the audience of scientific work to non-academics has always been a key objective of the open access movement: "even the earliest formulations of the concept of open access included the general public as a potential audience for open access". The Budapest Open Access Initiative of 2001 includes among the beneficiaries of open access "scientists, scholars, teachers, students, and other curious minds".

In an open science context, the non-academic audience has been associated with a wider figure: the lay reader or unexpected reader. Once universally accessible, an academic work can have unplanned readers or users. In 2006, John Willinsky conjectured that "it is not difficult to imagine occasions when a dedicated history teacher, an especially keen high school student, an amateur astronomer, or an ecologically concerned citizen might welcome the opportunity to browse the current and relevant literature pertaining to their interests." Unexpected forms of reception did happen: the editor-in-chief of PLOS once received a promising piece of research on the modelling of pandemics, which turned out to be written by "a fifteen-year old high school student". The lay reader is not necessarily part of a non-academic audience, as a professional scientist may become one if "the information sought is outside his or her area of expertise". Not all unexpected readers behave similarly or have the same capacity to use academic resources. Even when they are not dealing with their main domain of expertise, academic researchers and some professionals (the knowledge workers) have acquired generic skills for bibliographic analysis, such as following citations in the literature.

Unanticipated academic uses
Paywalled journals did not satisfy a larger range of unanticipated academic uses: because of subscription costs, access has been conditioned on the field of work or on the resources available at the institutional level. In 2011, Michael Carroll introduced a typology of five "unanticipated readers" who are beyond the scope of the reading expectations of online academic journals: serendipitous readers (who discover the publication through a complex reading path), under-resourced readers (presumably uninitiated, like high school students), interdisciplinary readers (scientists who belong to a different field), international readers (scientists who work within a different national frame) and machine readers (bots that retrieve a corpus, for instance as part of a text mining project).

The development of academic pirate platforms like Sci-Hub or Libgen highlighted structural inequalities on a global scale: "The geography of Sci-Hub usage generally looks like a map of scientific productivity, but with some of the richer and poorer science-focused nations flipped." High rates of Sci-Hub use have been especially found in Russia, Algeria, Brazil, Turkey, Mexico and India, which are all countries with significant local academic production despite having fewer resources than OECD countries: "relatively to their national scientific production, middle-income countries had the more intensive use of pirated academic works". The audience of pirate academic platforms remains significant even in North American and European universities endowed with large library subscriptions, as access is commonly perceived as more straightforward than in paywalled libraries: "even for journals to which the university has access, Sci-Hub is becoming the go-to resource".

From impact factor to social impact
The development of large open science platforms and infrastructure after 2010 entailed a shift in the measurement of scientific impact, from a strong focus on highly cited English-language journals to an enlarged analysis of the social circulation of scientific publications. This transformation has been especially noticeable in Latin America, due to the early development of publicly funded international publishing platforms like Redalyc or SciELO: "There is a definite sense in Latin America that the investment in science will result in development in a more broadly defined sense—beyond simply innovation and economic growth."

In 2015, Juan Pablo Alperin introduced a systematic measure of social impact by relying on a diverse set of indicators (log analysis, surveys and altmetrics). This approach entailed a redefinition of key concepts of scientific reception such as impact, reach or reader:

"I turn our attention to these alternative, public forms of research impact and reach by examining the Latin American case. In this study, impact will be assessed through evidence of the research literature being saved, discussed, forwarded, recommended, mentioned, or cited, both within and beyond the academic community (…) Reach refers, in this study, to the extent to which the research literature is viewed or downloaded by members of various audiences, beginning with the traditional academic readership and extending outward through related professions, and perhaps journalists, teachers, enthusiasts, and members of the public (…) By looking at a broad range of indicators of impact and reach, far beyond the typical measures of one article citing another, I argue, it is possible to gain a sense of the people that are using Latin American research, thereby opening the door for others to see the ways in which it has touched those individuals and communities."

The unprecedented focus on the social impact of science fits with alternative models of scientific popularization. In 2009, Alesia Zuccala introduced a radiant model of open science dissemination with a variety of mediated and unmediated connections between non-academic audiences and academic production: "Sometimes [research] engages the lay public — this is the co-production model of science communication—and sometimes self-selected intermediaries tell members of the public what they should know—the education model of science communication".

Methods
While open science has been largely theorized to have a significant impact on academic and non-academic access to literature, research in this area has proven challenging: it has been "the subject of many discussions and indeed was the basis for a lot of the advocacy work and many funding agencies’ OA policies, but rarely so in formal published studies". By definition, open science productions are non-transactional, and as such their use leaves far fewer traces than the distribution of commercialized scientific outputs. Overall, it is very difficult to retrieve "data on user demographics from currently available information sources (e.g. repositories and publisher platforms)".

The classic methods of bibliometric studies, including citation analysis, are largely unable to capture the new forms of reception created by open science. Alternative approaches had to be developed in the 2000s and the 2010s, and for a long time open science advocates and policy-makers had to rely on limited evidence.

Survey
Surveys were the primary method of analyzing scientific reception before the development of bibliometrics.

After the development of electronic publishing and open access, survey methods also migrated online. Pop-up surveys were introduced for academic publications in the early 2000s: they make it possible to query the user at the exact moment when the resource is retrieved and can be correlated with log data. Yet, "response rates of pop-up surveys tend to be low", which may ultimately distort the representativeness of the survey.

Since 2002, large international surveys of the uses of academic resources have been conducted by Simon Inger and Tracy Gardner with the support of several major scientific organizations and publishers. While not specifically focused on open science, these surveys have strived to include a more diverse subset of potential users beyond academic authors.

Log analysis
Academic publications were among the earliest corpora used for log analysis. The first applied studies in the area long predate the web, as interconnected scientific infrastructures were already commonly used in North America and Europe by the 1970s and the 1980s.

In 1983, several studies from the Online Computer Library Center pioneered the analysis of "transaction logs" left by database users. Logs were at the time stored on magnetic tapes, and a large part of the analysis was devoted to the reformatting and the standardization of the data. Standard methods of log analysis were already implemented in these early studies, such as the use of probabilistic approaches based on Markov chains to identify the more regular patterns of user behavior, or the comparison with user surveys.
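
The Markov chain approach can be illustrated with a minimal Python sketch. It estimates first-order transition probabilities between user actions from session logs. The action labels and the sample sessions are invented for illustration and are not taken from the early studies; the function name is likewise hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    Each session is an ordered list of action labels (hypothetical names).
    """
    counts = defaultdict(Counter)
    for actions in sessions:
        # Count each observed pair (current action -> next action).
        for current, following in zip(actions, actions[1:]):
            counts[current][following] += 1
    # Normalize each row of counts so that its probabilities sum to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Invented sample data: three short search sessions.
sessions = [
    ["search", "display", "search", "display", "quit"],
    ["search", "search", "display", "quit"],
    ["search", "help", "search", "quit"],
]

probabilities = transition_probabilities(sessions)
for action, row in probabilities.items():
    print(action, row)
```

The most probable transitions in each row give a rough picture of the regular patterns of behavior; a real study would work on far larger logs and would need to decide how sessions are delimited.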

The use of logs and other forms of readership metrics to measure the reception of academic work has remained marginal. Large commercial databases like the Web of Science and Scopus had no incentive to divulge reading statistics and mostly used them for internal purposes. Bibliometric indices based on aggregated counts of citations, like the impact factor or the h-index, have been favored instead as the leading measures of academic impact.
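
For reference, the h-index is conventionally defined as the largest number h such that an author has h publications with at least h citations each. The following Python sketch implements that definition; the citation counts are invented.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Invented citation counts for six publications.
sample_citations = [10, 8, 5, 4, 3, 0]
result = h_index(sample_citations)
print(result)
```

The only input is a list of citation counts: nothing in the computation records reading or downloading, which is why readership falls outside such indices.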

Beyond the restrictions put in place by leading publishers, log analysis has raised significant methodological issues. Data logging processes differ significantly depending on the structure of the interface: "The number of full-text downloads may be artificially inflated when publishers require users to view HTML versions before accessing PDF versions or when linking mechanisms". Automated access, including search engine indexers or robots, can also largely distort aggregated visit counts. This uncertainty impedes the comparability of data: "issues such as journal interfaces continue to affect how users interact with content users, making even standardized reports difficult, if not impossible, to compare."
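
A minimal Python sketch of the kind of cleaning these issues call for is given here. It drops requests whose user agent looks automated and counts at most one full-text view per visitor and article, so that an HTML view followed by a PDF download is not counted twice. The record fields, the list of bot markers and the sample data are all hypothetical; real standards define their own, more detailed rules.

```python
# Hypothetical substrings used to flag automated user agents (far from exhaustive).
BOT_MARKERS = ("bot", "crawler", "spider")


def count_fulltext_views(records):
    """Count full-text views per article, skipping bots and double counts."""
    seen = set()
    views = {}
    for record in records:
        agent = record["user_agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # ignore automated access
        if record["format"] not in ("html", "pdf"):
            continue  # only full-text formats are counted
        key = (record["visitor"], record["article"])
        if key in seen:
            continue  # HTML then PDF by the same visitor counts once
        seen.add(key)
        views[record["article"]] = views.get(record["article"], 0) + 1
    return views


# Invented log records.
records = [
    {"visitor": "v1", "article": "a1", "format": "html", "user_agent": "Mozilla/5.0"},
    {"visitor": "v1", "article": "a1", "format": "pdf", "user_agent": "Mozilla/5.0"},
    {"visitor": "v2", "article": "a1", "format": "pdf", "user_agent": "Mozilla/5.0"},
    {"visitor": "v3", "article": "a1", "format": "html", "user_agent": "ExampleBot/1.0"},
    {"visitor": "v2", "article": "a2", "format": "abstract", "user_agent": "Mozilla/5.0"},
]

views = count_fulltext_views(records)
print(views)
```

Without the two filters, the same five records would yield four full-text hits on article a1 instead of two, which shows how strongly interface design and robots can inflate raw counts.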

Log analysis was revived in the 2010s due to technological developments and the emergence of large open science platforms. Standards for the retrieval of academic log data, such as COUNTER, PIRUS or MESUR, were introduced in the early 2010s. These standards were by design limited to specialized research use, due to their integration into academic infrastructures.

The development of open source web analytics software like Matomo has created an emerging standard for the collection of logs. During the same period, publicly funded scientific platforms have started to share use data openly, as part of their enlarged commitment to open science. In Latin America, both Redalyc and SciELO "provide such usage statistics to the public", although they have remained largely underused: "It is surprising that given the availability of these data, nobody has conducted a study analyzing different dimensions of downloads, beyond the overall view counts and "top 10" lists of articles available from time to time on the respective Web portals."

In 2011, Michael J. Kurtz and Johan Bollen called for the development of usage bibliometrics, an emerging field that "provides unique opportunities to address the known shortcomings of citation analysis". Increased access to log data from open science platforms has made it possible to publish extensive case studies on SciELO and Redalyc, Érudit, OpenEdition.org, Journal.fi or The Conversation.

Crosslinking
The web itself and some of its key components (such as search engines) were partly a product of bibliometric theory. In its original form, it was derived from ENQUIRE, a bibliographic scientific infrastructure commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. "The onset of the World Wide Web in the mid-1990s made Garfield's citationist dream more likely to come true. In the world network of hypertexts, not only is the bibliographic reference one of the possible forms taken by a hyperlink inside the electronic version of a scientific article, but the Web itself also exhibits a citation structure, links between web pages being formally similar to bibliographic citations." Consequently, bibliometric concepts have been incorporated into major communication technologies such as the search algorithm of Google: "the citation-driven concept of relevance applied to the network of hyperlinks between web pages would revolutionize the way Web search engines let users quickly pick useful materials out of the anarchical universe of digital information."

While the web immediately affected reading practices by creating seamless connections between texts, it did not transform to a similar extent the quantitative analysis of citation data, which remained mostly focused on academic connections. Global analysis of hyperlinks and backlinks makes it possible to extend citation analysis beyond scholarly publications and recover the expanding scope of open science circulation: "We have witnessed a proliferation of means of disseminating scholarly publications via academic blogs, scientific magazines destined to a wider audience." In 2011, a log analysis of the Kyoto University website identified a highly diversified set of links to scientific publications. In 2019, a study of crosslinks to the French open science platform OpenEdition, supported by Aix-Marseille University, highlighted that "scientific literature from a largely open access hosting platform is re-appropriated and repurposed for various uses in the public arena."

Altmetrics
In the 2000s and the 2010s, the web became increasingly dominated by very large social media platforms that curate and shape a significant part of the digital public sphere. The public reception of scientific literature has also largely migrated to these platforms. This evolution has prompted the development of new metrics and quantitative methods aiming to map the circulation of publications on social media: altmetrics.

The concept of altmetrics was introduced in 2009 by Cameron Neylon and Shirley Wu as article-level metrics. In contrast with the focus of leading metrics on journals (impact factor) or, more recently, on individual researchers (h-index), article-level metrics make it possible to track the circulation of individual publications: "article that used to live on a shelf now lives in Mendeley, CiteULike, or Zotero – where we can see and count it". As such, they are more compatible with the diversity of publication strategies that has characterized open science: preprints, reports or even non-textual outputs like datasets or software may also have associated metrics. In their original research proposition, Neylon and Wu favored the use of data from reference management software like Zotero or Mendeley. The concept of altmetrics evolved and came to cover data extracted "from social media applications, like blogs, Twitter, ResearchGate and Mendeley." Social media sources proved to be more reliable on a long-term basis, as specialized academic tools like Mendeley came to be integrated into proprietary ecosystems developed by leading scientific publishers. Major altmetrics indicators that emerged in the 2010s include Altmetric.com, PlumX and ImpactStory.

As the meaning of altmetrics shifted, the debate over the positive impact of the metrics evolved toward their redefinition in an open science ecosystem: "Discussions on the misuse of metrics and their interpretation put metrics themselves in the center of open science practices." Social media altmetrics are limited to a specific subset of social media platforms and, within the platforms, to numeric metrics of reception left by users such as likes, shares or comments: "However, 'altmetrics' has continued in the same tradition as the older biblio/scientometrics by basing its indicators on numerical trace, i.e., computing the number of likes, posts, downloads, tweets or retweets a scholarly publication gets on the web with the result that neither of these fields provide information on the actual use of the scholarly publications cited nor the reasons for which they were cited."

While altmetrics were initially conceived for open science publications and their expanded circulation beyond academic circles, their compatibility with the emerging requirements for open metrics has been brought into question: social network data, in particular, is far from transparent and readily accessible. The conversations tracked on social media may not be that representative of the social impact of research, as researchers are overrepresented in these spaces: "about half of the tweets mentioning journal articles are from academics". In 2016, Ulrich Herb published a systematic assessment of the leading publication metrics with regard to open science principles and concluded that "neither citation-based impact metrics nor alternative metrics can be labeled open metrics. They all lack scientific foundation, transparency and verifiability."

Current uses
Most empirical information on open science use is platform-specific.

User demographics
Studies of the use of open science resources have generally highlighted the diversity of user profiles, with academic researchers representing only a minor segment of the audience. In 2015, the audience of the two leading Latin American platforms, Redalyc and SciELO, consisted mostly of university students (50% and 55% respectively) and professionals in non-academic sectors (20% in SciELO and 17% in Redalyc). Once other university employees are discounted, "researchers only make up 5–6% of the total users". On the Finnish platform journal.fi, students are also the main demographic group (40% of users), but academic researchers still make up a large group (36%).

Convergent estimates of the share of lay readers have been given by the different open science platform studies: 9% of amateur/personal uses in SciELO and 6% in Redalyc, 8% of "private citizens" in the reader survey of journal.fi.

Open science platforms have a balanced gender distribution. The two Latin American platforms, Redalyc and SciELO, tend to have a relative "predominance of women users" (about 60%).

The discipline of a resource has a varying impact on its uses. Personal interest is more prevalent in the humanities in SciELO. In contrast, "little variability between disciplines" has been observed in Redalyc. Analysis of the bookmark data left by the readers of F1000Prime on Mendeley highlighted a significant share of uses by disciplines totally distinct from the expected audience.

User practices and motivations
Studies of user practices have been mostly devoted to specific user profiles. Few general surveys have been undertaken. In Japan, a 2011 poll of 800 adults showed that a "majority of respondents (55%) claimed that Open Access is useful or slightly useful to them", which suggests a rather large awareness of open science in a population with a significant share of high school education.

The issues met by medical patients have been especially highlighted. An important field of research on health-information-seeking behaviour (HISB) emerged prior to the development of open science. In a 2003 survey, half of American Internet users had attempted to find qualified information about their health, but regularly faced access issues: "Many current Internet health users want to expand access to information-laden sites that are currently closed to non-subscribers". In a qualitative study of English medical patients, subscription paywalls were cited as the main barrier to access to scientific knowledge, along with the complexity of scientific terminology. While the specific needs of patients make a strong case for open science, they have also overshadowed the variety of potential uses of academic research: "open access is not just a public health matter: It has a much more general research-enhancing mission".

Research has also focused on professional non-academic uses, due to their potential economic impact. In 2011, a JISC report estimated that there were 1.8 million knowledge workers in the United Kingdom working in R&D, IT and engineering services, most of whom were "unaffiliated, without corporate library or information center support." Among a representative set of English knowledge workers, 25% stated that access to the literature was fairly difficult or very difficult and 17% had a recent access problem that had never been resolved. A 2011 survey of Danish businesses highlighted a significant dependence of R&D on academic research: "Forty-eight per cent rated research articles as very or extremely important". The non-profit sector is also significantly impacted by increasing access to literature, as a survey of 101 NGOs from the United Kingdom showed that "73% reported using journal articles and 54% used conference proceedings". In 2018, a log analysis on OpenEdition highlighted corporate access as a significant source of readership, especially among "the aircraft industry, the bank, insurance, car selling and energy sectors and, even more significantly for the further circulation of science in the public sphere, media organizations." These results showed that open access had a direct commercial impact on small and large companies.

Language diversity
Scientific publications in languages other than English have been marginalized in large commercial databases: they represent less than 5% of the publications indexed in the Web of Science.

The development of open science platforms has gradually entailed a change of focus, as local-language publications have become acknowledged as important actors in the social dissemination of scientific knowledge. In the 2010s, quantitative studies started to highlight the positive impact of local languages on the reuse of open access resources in varied national contexts such as Finland, Québec, Croatia or Mexico.

Measures of social impact tend to reverse the incentives of international academic metrics like the impact factor: while they are less featured in academic indexes, publications in a local language fare better with an enlarged audience. In Finland, a majority of the audience of the academic platform journal.fi favors publications in Finnish (67%). Yet the linguistic practices of the visitors vary significantly depending on their academic status. Lay readers (private citizens) and students have a clear preference for the local language (81% and 78% of publications accessed). In contrast, professional researchers slightly favor English over Finnish (55%).

Due to the ease of access, open science platforms in a local language can also attain a more global reach. The French-Canadian journal consortium Érudit has a mostly international audience, with less than one third of the readers coming from Canada.

Sharing ecosystem
Open science resources are more likely to be shared in non-scientific settings such as "Twitter, News, Blogs and Policies". In 2011, a log analysis study in Japan highlighted "a remarkable variety of websites linked to these OA papers including blogs about personal hobbies, websites by patients or their families, Q&A website and Wikipedia."

The diversity of the open science ecosystem has been hypothesized to affect the life cycle pattern of publications. In the classic framework of bibliometrics, most publications are expected to experience an exponential decline in citations over the years (a pattern also characterized by a "half-life", by analogy with the decay of radioactive elements). In contrast, open science publications "have the feature of keeping sustained and steady downloads for a long time". This sustained reception over a longer timeframe may be partially caused by recurrent episodes of "unexpected access", where old publications suddenly attract a new wave of readers due to a newfound relevance.
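
The "half-life" pattern can be made concrete with a short Python sketch. It assumes, as a simplification, that yearly citations follow c(t) = c0 · exp(−λt), in which case the half-life is ln 2 / λ. The initial count and the decay rate are invented values, and the function names are hypothetical.

```python
import math


def yearly_citations(initial, decay_rate, year):
    """Citations expected in a given year under pure exponential decay."""
    return initial * math.exp(-decay_rate * year)


def half_life(decay_rate):
    """Time after which the yearly citation count has halved."""
    return math.log(2) / decay_rate


# Invented parameters: 40 citations in year 0, decay rate of 0.2 per year.
initial = 40.0
decay_rate = 0.2
t_half = half_life(decay_rate)

print(f"half-life: {t_half:.2f} years")
print(f"citations at half-life: {yearly_citations(initial, decay_rate, t_half):.1f}")
```

Sustained and steady downloads, as reported for open science publications, would correspond to a decay rate close to zero and therefore to a very long half-life under this simplified model.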

Reuse of data and software
In contrast with publications, open scientific data and software frequently require a higher level of technical skills: "access is not enough to guarantee that Open Data can be reused effectively because reuse requires not only access, but other resources such as skills, money and computing power". Even firms and organizations may lack the "necessary skills such as information literacy to fully benefit from open resources".

Yet, recent developments like the growth of data analytics services across a large variety of economic sectors have created further needs for research data: "There are many other values (…) that are promoted through the longterm stewardship and open availability of research data. The rapidly expanding area of artificial intelligence (AI) relies to a great extent on saved data." In 2019, the combined data market of the 27 countries of the European Union and the United Kingdom was estimated at 400 billion euros and had a sustained growth of 7.6% per year. Although no estimate was given of the specific value of research data, research institutions were identified as important stakeholders in the emerging ecosystem of "data commons".